Research Article | Open Access
Sen Jia, Zhangwei Zhan, Meng Xu, "Shearlet-Based Structure-Aware Filtering for Hyperspectral and LiDAR Data Classification", Journal of Remote Sensing, vol. 2021, Article ID 9825415, 25 pages, 2021. https://doi.org/10.34133/2021/9825415
Shearlet-Based Structure-Aware Filtering for Hyperspectral and LiDAR Data Classification
The joint interpretation of hyperspectral images (HSIs) and light detection and ranging (LiDAR) data has developed rapidly in recent years owing to continuously evolving image processing technology. Most current feature extraction methods convolve the raw data with fixed-size filters, so the structural and texture information of objects at multiple scales cannot be sufficiently exploited. In this article, a shearlet-based structure-aware filtering approach, abbreviated as ShearSAF, is proposed for HSI and LiDAR feature extraction and classification. Specifically, superpixel-guided kernel principal component analysis (KPCA) is first applied to the raw HSI to reduce its dimensionality. Then, the KPCA-reduced HSI and LiDAR data are converted to the shearlet domain for texture and area feature extraction. In parallel, a superpixel segmentation algorithm is applied to the raw HSI data to obtain the initial oversegmentation map. Subsequently, by utilizing a well-designed minimum merging cost that fully considers spectral (HSI and LiDAR data), texture, and area features, a region merging procedure is gradually conducted to produce a final merging map. Further, a scale map that locally indicates the filter size is obtained by calculating the distance from each pixel to the nearest edge. Finally, the KPCA-reduced HSI and LiDAR data are convolved with the locally adaptive filters for feature extraction, and a random forest (RF) classifier is adopted for classification. The effectiveness of our ShearSAF approach is verified on three real-world datasets, and the results show that ShearSAF achieves higher accuracy than the comparison methods when only a small number of training samples is available. The codes of this work will be available at http://jiasen.tech/papers/ for the sake of reproducibility.
Recently, the continuously evolving remote sensing sensor technologies have contributed to the capture of multisource data in the same area [1, 2]. Among those numerous remote sensing data, hyperspectral images (HSIs) contain joint spectrum and space information, providing a distinctive discriminating ability for Earth’s surface objects. HSIs have hundreds or thousands of narrow spectral bands, covering the spectral region from the visible to the infrared field . In particular, HSIs have both spatial and spectral smoothness, which not only produces detailed and accurate descriptions of objects but also results in a high correlation between adjacent bands [4–6]. Based on the above reasons, some obstacles and challenges exist regarding the interpretation of HSI information. Specifically, HSIs are prone to include information redundancy as a result of high correlation and the Hughes phenomenon caused by a high spectral dimension [7, 8]. In addition, environmental factors, such as clouds and noise, will also cause information confusion when the remote sensor captures scene data . Compared with HSI, LiDAR integrates a laser ranging system, a global positioning system, and an inertial navigation system, so that it can collect the position and intensity information of objects in a three-dimensional space [10–12]. However, LiDAR works in a single band and lacks semantic information; thus, it has poor discriminative ability in distinguishing targets with similar heights but different spectra .
Many works in the literature have proven the effectiveness of combined HSI and LiDAR interpretation, indicating that the intensity information provided by LiDAR can supplement the HSI deficiencies regarding the target height and shape information [5, 14–17]. In 2013, the HSI and LiDAR data fusion competition, organized by IEEE Geoscience and Remote Sensing Society, greatly promoted the research on HSI and LiDAR data fusion methods for classification . In general, these fusion methods can be roughly divided into three categories: pixel-level fusion, feature-level fusion, and decision-level fusion. The strategy of pixel-level fusion relies on concatenating multisource data directly on the original data, which requires geometric registration. Feature-level fusion is considered as a better approach in that it can achieve better classification performance in most cases [19–22]. It conducts the feature extraction for each source individually and then combines them. Decision-level fusion methods are aimed at integrating several rough classification results of multisource data into the final classification [23, 24]. Although the computational complexity is relatively low, it relies heavily on the original classification results and integration strategies. In fact, due to the inherent shortcomings of single data fusion methods, there are many articles exploring the classification framework that combines feature-level fusion and decision-level fusion [25–27].
In addition, deep learning-based approaches have also been widely favored in recent years for hyperspectral feature extraction and classification. The pioneering work is the stacked autoencoder proposed by Chen et al., which has been used for hyperspectral high-level feature extraction. Subsequently, convolutional neural networks (CNNs) [29–31] and recurrent neural networks (RNNs) [32, 33] have been widely developed. Among them, the representative 3D-CNN can effectively extract the joint spatial-spectral information and has shown good performance. Furthermore, deep learning-based frameworks that combine traditional features with network structures have also been explored [35, 36]. However, the parameter optimization of these deep learning-based models generally relies on a large number of training samples, which greatly limits their applicability because labeling samples is difficult in the remote sensing field. Motivated by this small-sample setting, some new strategies have been developed. For example, Yu et al. proposed a novel CNN model by combining data augmentation and convolutional kernel. More recently, semisupervised CNN and PCANet-derivative methods have also received constant attention because of their performance with limited training samples [38–41].
Alternatively, wavelet analysis is an important mathematical tool because of its optimal approximation properties in signal processing. However, for multidimensional images with discontinuities along curves, traditional wavelets lose their sparsity on the edge response [42, 43]. Thus, multidimensional directional wavelets are required. The Gabor wavelet, which is widely used in texture analysis and feature extraction, can be considered an early directional wavelet [44, 45]. Its inherent drawback is that the directions are fixed on each scale once sampled. A series of directional wavelets has emerged in the past few decades, such as contourlets, bandlets, curvelets, and shearlets [46–49]. They all provide a flexible mathematical framework while capturing geometric features in applications. Among these methods, the shearlet possesses remarkable properties: it accurately captures the edge direction, has an optimal sparse representation for multidimensional data, uses a well-organized multiscale structure, and admits fast and efficient algorithmic implementations [50–52]. In its simplest form, the shearlet starts with the construction of the so-called mother function and then applies three basic operations (scaling, shearing, which controls the direction, and translation) to generate a family of functions with varied shapes and directions. Two well-known properties of shearlets are highlighted as follows: (1) if a point is far away from the edge, then its shearlet coefficient decays rapidly as the scale decreases; (2) if a point is an edge point or a corner point, then its shearlet coefficient decays slowly in the normal direction, while it decays rapidly in other directions. Therefore, shearlets have been widely used for edge and corner detection [53–55]. In addition, owing to their partition of the frequency domain, some pioneering works have also used them for denoising, feature extraction, and data fusion [56–60].
During the past two decades in the computer vision field, another emerging and rapidly spreading concept is the use of superpixel segmentation. Specifically, a superpixel is considered a homogeneous area containing pixels with similar texture or structure [61–63]. The edge of a superpixel is a continuous closed curve, which differs from the output of edge extraction algorithms, where scattered, disconnected edge points may exist. Moreover, the superpixel should also possess region compactness, shape regularity, and boundary smoothness. Currently, superpixel algorithms are roughly divided into three categories: cluster-based approaches represented by simple linear iterative clustering (SLIC) and simple noniterative clustering (SNIC) [65, 66], graph-based approaches represented by entropy rate superpixel segmentation (ERS) and normalized cuts (NCut) [67, 68], and gradient-based methods represented by spatial-constrained watershed (SCoW) and superpixels using the shortest gradient distance (SSGD) [69, 70]. Notably, fuzzy superpixels were proposed in the past two years and have further enriched the content of superpixels, especially in cases of low spatial resolution such as in remote sensing images [71, 72]. However, regardless of the kind of superpixel algorithm, direct or indirect spatial constraints exist for compactness requirements. For example, SLIC directly adds spatial distance to the clustering metric, and ERS requires each superpixel to be as close to the same size as possible. This spatial constraint inevitably leads to conflicts between oversegmentation and undersegmentation. Moreover, the situation worsens since the resistance of this constraint grows as a power series. In other words, under a strong spatial constraint, a large homogeneous region has to be divided into several small superpixels, whereas under a weak spatial constraint, a small area may contain several objects because there is almost no spatial restriction.
Although some heuristic solutions have been proposed for superpixel number selection [73, 74], this inherent limitation of superpixels has not been relaxed effectively.
In particular, filters play an extremely important role in image processing. There are already many filters based on different applications, such as the mean filter, Gaussian filter, and Gabor filter, for feature extraction [75, 76], and the Laplacian, Sobel, and Laplacian of Gaussian (LoG), for edge capture. However, fixed-size filter does not have the ability to obtain the best description of surface objects with various scales, and thus, it is undeniably difficult for a uniform-size filter to achieve globally satisfactory results. In other words, near the edge, a larger size of a filter will cause more confusing information, whereas in the center of the region, a smaller size of the filter hardly works well when abnormal points exist. Some researchers have made some attempts in this area, such as the multiscale spectral-spatial classification method with adaptive filtering [77–79] and the spatial adaptive multiscale filtering technique [80, 81]. However, the features or classification results in multiple filtering scales were simply concatenated or combined, and it is more desirable to take full advantage of the internal structure of objects and achieve local structure-aware filtering (i.e., automatically adjusting the size of the filter kernel according to the local position).
In this article, we propose a shearlet-based structure-aware filtering framework, abbreviated as ShearSAF, for HSI and LiDAR data classification with the help of the above tools. First, superpixel-guided kernel principal component analysis (KPCA) is applied to the raw HSI for dimension reduction and information concentration, which greatly helps the subsequent calculation and processing. Then, the shearlet transform is applied to the KPCA-reduced HSI and LiDAR data, and a structural description in the frequency domain is obtained; i.e., the low-frequency and high-frequency parts contain the region information and texture information, respectively. They are further processed by energy superposition and time-frequency conversion to attain region features and texture features. Second, a gradual region merging procedure is developed to alleviate the superpixel spatial constraints and enhance the robustness of the proposed ShearSAF method. Specifically, the SNIC superpixel algorithm is applied to the raw HSI to produce a strongly oversegmented map that ensures the homogeneity of each small region, and then, the superpixels are progressively merged according to a well-designed merging criterion that takes the spectral information, region information, and texture information into account, eventually yielding the final merging map. Third, by locating the edges in the final merging map and calculating the shortest distance between each pixel and the edge, we obtain a distance map that reflects the relationship between points and edges. Through geometry optimization and threshold processing, a scale map is then extracted to control the filter size, in which the value at each point indicates the size of the convolution kernel. Thus, an adaptive-size filter is obtained for each spatial pixel. Finally, the KPCA-reduced HSI and LiDAR data are convolved with these structure-aware mean filters for feature extraction and classification to verify the effectiveness of the proposed ShearSAF approach.
For a better description, the detailed process of our ShearSAF is shown in Figure 1, and the main contributions of this article are summarized as follows:
(i) First, we design a structure-aware filtering scheme for HSI and LiDAR feature extraction. These locally adaptive-size filters have a small kernel near the edge to protect the information from being disturbed by nearby objects, while the kernel is larger at the center of a region to filter noise and abnormal points. Since the structure-aware filter size reflects the spatial structure of objects more precisely, the convolutional procedure is better matched to the scene and the discriminability of the extracted features is improved. To the best of our knowledge, this is the first time that a method of extracting structure-aware features for HSI and LiDAR data processing has been discussed.
(ii) Second, the shearlet transform is employed for structure description on both HSI and LiDAR data. After dividing the shearlet features into low-frequency and high-frequency energy parts, they are converted into the area feature and texture feature, respectively. Since the local region and edge structure of objects can be well characterized by the extracted area and texture features, both features are taken into account (conventional methods use only one of them, for edge detection or noise reduction), which provides valuable guidance for the subsequent superpixel fusion.
(iii) Third, we propose an elaborate design that effectively combines HSI and LiDAR information in a distance measurement for gradual region merging. The well-designed evaluation criterion integrates the spectral (from the Euclidean distance viewpoint), area, and texture (from the statistical distance viewpoint) features, which provides a more comprehensive description of the real scene and could greatly increase the structural representation ability of objects.
(iv) Finally, all the parameters can be either preset and kept unchanged for different HSI and LiDAR datasets or heuristically determined; therefore, the robustness and generalization ability of the proposed ShearSAF method can be ensured. Meanwhile, since the structure-aware feature extraction procedure is independent of the training samples and only needs to be calculated once, our ShearSAF is also highly efficient. The source codes of this work will be available at http://jiasen.tech/papers/ for the sake of reproducibility.
We would like to point out that the structure-aware filtering design presented here is essentially very general and can be easily utilized for other features (such as morphological and attribute features). Experimental results with several state-of-the-art methods on three real datasets demonstrate the effectiveness of the proposed ShearSAF approach.
The organization of this paper is as follows. Section 2 introduces related works about shearlet transform, SNIC superpixel algorithm, and KPCA. Section 3 describes the process of designing the shearlet-based structure-aware filtering in detail. Section 4 presents the experimental data and two ablation experiments. The experimental results of the proposed ShearSAF method with a number of alternatives are given in Section 5. Finally, Section 6 provides the conclusions of this paper.
2. Related Works
This section introduces the theory of shearlet transform, SNIC superpixel algorithm, and KPCA.
2.1. Shearlet Transform
Shearlets have received great attention for their optimal approximation properties in representing images and were first introduced in [43, 82]. The shearlet is regarded as a multiscale representation system and possesses the ability to capture the direction and geometric features [50, 53]. Suppose there exist a dilation matrix and a shear matrix where and are called the dilation factor and shear factor, respectively, the shearlet mother function is expressed as where is a translation factor, and is the coordinate in the spatial domain. and , respectively, control the scale and orientation of the shearlet. In the frequency domain, function is written as and can be factorized as where and are the two coordinates in the frequency domain. Consequently, in the frequency domain can be expressed as follows: where is the coordinate in the frequency domain, which is just the concatenation of and . and are the continuous wavelet function and bump function, respectively, meeting a certain support domain.
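For orientation, the quantities described in this paragraph are commonly written as follows in the shearlet literature. The symbols ($a$ for the dilation factor, $s$ for the shear factor, $t$ for the translation, $\psi_1$ for the wavelet function, and $\psi_2$ for the bump function) and the normalization are the conventional choices and are assumed here; the article's own equations may use different notation.

```latex
\begin{aligned}
A_a &= \begin{pmatrix} a & 0 \\ 0 & \sqrt{a} \end{pmatrix}, \qquad
S_s = \begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix}, \qquad a > 0,\ s \in \mathbb{R}, \\
\psi_{a,s,t}(x) &= a^{-3/4}\, \psi\!\left(A_a^{-1} S_s^{-1} (x - t)\right), \qquad x, t \in \mathbb{R}^2, \\
\hat{\psi}(\omega) &= \hat{\psi}_1(\omega_1)\, \hat{\psi}_2\!\left(\frac{\omega_2}{\omega_1}\right), \qquad \omega = (\omega_1, \omega_2), \\
\hat{\psi}_{a,s,t}(\omega) &= a^{3/4}\, e^{-2\pi i \langle \omega, t \rangle}\,
\hat{\psi}_1(a\,\omega_1)\, \hat{\psi}_2\!\left(a^{-1/2}\left(\frac{\omega_2}{\omega_1} + s\right)\right).
\end{aligned}
```

The last line follows from the second by the scaling and shift rules of the Fourier transform: the factor $a^{3/4}$ is $a^{-3/4}\,\lvert\det(S_s A_a)\rvert$, and the argument of $\hat{\psi}$ becomes $A_a^{T} S_s^{T}\omega = \left(a\,\omega_1,\ \sqrt{a}\,(s\,\omega_1 + \omega_2)\right)$.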
In fact, the shearlet compact framework in the frequency domain can be divided into three parts: the low-frequency region, horizontal cone, and vertical cone. Notably, the above factorization (4) and equation (5) are used for the horizontal cone (in the following section, we renamed in equation (5)as ). Alternatively, in the vertical cone, they are denoted as
For the low-frequency domain, the shearlet has neither dilation nor shear, so it can be written as where is called the scaling function. The frequency support of the low-frequency region, representative horizontal cone, and representative vertical cone are shown in Figure 2.
In practice, the above continuous shearlet needs to be discretized in digital image processing. Let us consider a single-band digital image mapped into a two-dimensional grid; therefore, the related parameters, such as , , and , can be computed as where is the number of scales. By substituting these discretized parameters (9) into equations (5), (7), and (8), three shearlets in the frequency domain can be rewritten as (the fixed factor is ignored) where and . To avoid overly complicated mathematical formulas and theoretical derivations, we directly give the final expression of the shearlet transform: where represents two-dimensional inverse Fourier transform and is the response of in the frequency domain. In this equation, the support domains of the first, second, and third parts are the low-frequency region, horizontal cone, and vertical cone, respectively. Additional details of shearlet construction and derivation can be found in .
2.2. Multichannel SNIC
SNIC denotes the simple noniterative clustering superpixel algorithm, which has both low computational complexity and good segmentation results. Through parameter control in SNIC, the number of superpixels and the weights of the spectral and spatial information can be set manually. In this article, the multichannel SNIC is adopted, which is more suitable for multichannel hyperspectral image segmentation. Specifically, SNIC starts by initializing the centroids and adding them as elements to a priority queue. Next, whenever an element is taken from the priority queue, it is labeled, and its unlabeled surrounding pixels are added to the queue. At the same time, the centroid of the corresponding superpixel is updated accordingly. This process continues until the queue is empty. The comparison criterion of the priority queue is the distance between the elements and centroid , which is defined as follows: where and represent the spectral vector and space coordinates, respectively, and and are their corresponding weights.
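The queue-driven procedure described above can be sketched as follows. This is a simplified illustration rather than the implementation used in the article: the function name `snic_sketch`, the regular seed grid, the 4-connected neighbourhood, and the exact way the two weights enter the distance are all assumptions made for the example.

```python
import heapq
import numpy as np

def snic_sketch(image, n_seeds_per_axis=2, w_spectral=1.0, w_spatial=0.5):
    """Simplified SNIC-style clustering of an (H, W, B) array into superpixels."""
    h, w, b = image.shape
    labels = -np.ones((h, w), dtype=int)          # -1 means "not yet labelled"
    # Seeds at the centres of a regular grid (an assumption of this sketch).
    ys = ((np.arange(n_seeds_per_axis) + 0.5) * h / n_seeds_per_axis).astype(int)
    xs = ((np.arange(n_seeds_per_axis) + 0.5) * w / n_seeds_per_axis).astype(int)
    seeds = [(int(y), int(x)) for y in ys for x in xs]
    k = len(seeds)
    sum_spec = np.zeros((k, b))                   # running spectral sums
    sum_pos = np.zeros((k, 2))                    # running coordinate sums
    count = np.zeros(k)
    heap = [(0.0, y, x, idx) for idx, (y, x) in enumerate(seeds)]
    heapq.heapify(heap)
    while heap:
        _, y, x, idx = heapq.heappop(heap)
        if labels[y, x] >= 0:                     # already claimed by a closer centroid
            continue
        labels[y, x] = idx
        # Update the centroid of superpixel idx online.
        sum_spec[idx] += image[y, x]
        sum_pos[idx] += (y, x)
        count[idx] += 1
        c_spec = sum_spec[idx] / count[idx]
        c_pos = sum_pos[idx] / count[idx]
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] < 0:
                d_spec = float(np.sum((image[ny, nx] - c_spec) ** 2))
                d_pos = float((ny - c_pos[0]) ** 2 + (nx - c_pos[1]) ** 2)
                # Weighted spectral-spatial distance (assumed form).
                d = float(np.sqrt(w_spectral * d_spec + w_spatial * d_pos))
                heapq.heappush(heap, (d, ny, nx, idx))
    return labels
```

Because a pixel is labelled the first time it leaves the queue, every pixel is visited once and no iteration over the whole image is needed, which is where the low computational cost comes from.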
2.3. Superpixel-Guided KPCA
As a generalization of principal component analysis (PCA), KPCA maps the input data into a high-dimensional or Hilbert space by a mapping function and can well reflect the complex structures in the corresponding high-dimensional space [84, 85]. However, both sample selection strategies of the original KPCA are problematic: random sampling usually degrades the extracted features, and using all samples (possibly millions) makes the computation explode. Therefore, a superpixel-based KPCA scheme that takes advantage of superpixel homogeneity is applied for dimension reduction.
Specifically, for a raw HSI ( and are, respectively, the spatial and spectral dimensions), SNIC is applied (here, the number of superpixels is simply set as ) and a superpixel map is obtained. Then, the mean vector of each region is calculated, and these mean vectors form the input sample set . Subsequently, the mapping function converts the input low-dimensional sample data into a high-dimensional feature . Let us consider the following covariance matrix:
Therefore, characteristic equation can be denoted as where is a diagonal matrix composed of eigenvalues arranged from large to small and is an matrix composed of corresponding eigenvectors. For convenience of calculation, an ingenious substitution is made for : where is the coefficients matrix, which is used for explaining the relationship between and . Next, through simultaneously multiplying a matrix by the left-hand side of equation (16), we can obtain where is known as the kernel function. For equations (18) and (19), can be optimized into , which is regarded as a new characteristic equation. Finally, for each spectral vector of , the dimensionality reduction process can be expressed as where represents the reserved dimension.
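The superpixel-guided scheme can be illustrated with a short NumPy sketch: the kernel eigenproblem is solved only on the superpixel mean vectors, and every pixel is then projected onto the resulting components. The function names, the fixed `n_components` (the article instead retains a fixed share of the energy), and the value of `gamma` are assumptions of this example.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """RBF kernel matrix between the rows of a (n, B) and b (m, B)."""
    sq = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def superpixel_kpca(pixels, labels, n_components, gamma=1.0):
    """pixels: (N, B) spectra; labels: (N,) superpixel ids. Returns (N, n_components)."""
    ids = np.unique(labels)
    # One training sample per superpixel: its mean spectrum.
    means = np.stack([pixels[labels == i].mean(axis=0) for i in ids])
    s = len(ids)
    k = rbf_kernel(means, means, gamma)
    one = np.full((s, s), 1.0 / s)
    k_c = k - one @ k - k @ one + one @ k @ one            # centre in feature space
    vals, vecs = np.linalg.eigh(k_c)
    order = np.argsort(vals)[::-1][:n_components]          # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    alphas = vecs / np.sqrt(np.maximum(vals, 1e-12))       # normalised coefficients
    # Project every pixel using the kernel between pixels and superpixel means.
    k_new = rbf_kernel(pixels, means, gamma)
    one_new = np.full((len(pixels), s), 1.0 / s)
    k_new_c = k_new - one_new @ k - k_new @ one + one_new @ k @ one
    return k_new_c @ alphas
```

With S superpixels the eigendecomposition involves an S × S matrix instead of an N × N one, which is the point of using superpixel means as the sample set.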
3. Shearlet-Based Structure-Aware Filtering
We now consider the proposed shearlet-based structure-aware filtering design. Briefly, in order to take full advantage of the structural information of objects, the following idea is adopted: when a point approaches an edge, the size of the filter shrinks, whereas when it lies at the center of a region, the size of the filter grows. Our ShearSAF approach to obtain this adaptive-size filter involves the following four steps: preprocessing, shearlet-based feature extraction, gradual region merging, and structure-aware filter designing. Table 1 summarizes some important mathematical symbols used in this paper for additional clarification.
3.1. Preprocessing
This part mainly includes two aspects: superpixel-guided KPCA for HSI dimension reduction and multichannel SNIC for superpixel oversegmentation.
3.1.1. Superpixel-Guided KPCA for HSI Dimension Reduction
For high-dimensional HSI data with complex structures, KPCA has superior capabilities for dimensionality reduction. In our superpixel-guided KPCA, the radial basis function (RBF) kernel is adopted and 99.5% energy is maintained in the principal components. Afterward, the information-focused hyperspectral data is attained.
3.1.2. Multichannel SNIC for Superpixel Oversegmentation
SNIC is an emerging superpixel algorithm containing both low computational complexity and good segmentation results. The multichannel SNIC is applied on raw HSI data , and an initial oversegmentation map can be obtained, in which the homogeneity of each superpixel can be largely ensured. Instead of directly providing the number of superpixels, the number of pixels inside each superpixel is set as (the value of this parameter will be discussed in the experimental section), and weight parameters and are set as and 0.5 as default, respectively.
3.2. Shearlet-Based Feature Extraction
The shearlet is a tight framework with clear mathematical meaning that provides directional scale decomposition. In the high-frequency part, it can effectively obtain texture information and is thus used for edge detection and corner detection . Alternatively, in the low-frequency part, it can effectively obtain area information and is thus used for denoising .
Let us start with the single-band LiDAR data . The number of scales in equation (9) is set as 3 by default in our shearlet transform, while the construction of (i.e., the Meyer wavelet function) and (i.e., the bump function) is the same as .
In the shearlet compact frame, when the scale is 0, 1, and 2, there are 4, 8, and 16 support cones with different directions, respectively (including the horizontal cone and vertical cone). Among them, the 16 highest-frequency parts are related to the texture information, while the remaining parts characterize the area information well; therefore, the shearlet-based frequency features are divided into two parts as follows: where and , respectively, represent the highest-frequency and the remaining frequency information of the LiDAR data .
Furthermore, for the highest frequency parts , i.e., , the sum of coefficients in 16 directions is used as the measure of texture feature . For the other remaining 13 frequency parts , including the low-frequency region and remaining high-frequency cones with and , the inversion of the shearlet transform is applied to acquire the area information . They can be computed as follows: where is the inversion operator and is the absolute value operator.
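The split into a texture feature and an area feature can be sketched as below. The sketch assumes that the transform is a Parseval frame applied by pointwise multiplication in the Fourier domain and that the coefficient stack is ordered from coarse to fine, so that the last 16 entries are the finest scale; the function name and the array layout are hypothetical, and any shearlet toolbox with these properties could supply `coeffs` and `filters`.

```python
import numpy as np

def split_texture_area(coeffs, filters, n_high=16):
    """coeffs:  (K, H, W) spatial-domain frame coefficients of one band.
    filters: (K, H, W) frequency-domain filters that produced them.
    The last n_high entries are assumed to be the finest-scale cones."""
    # Texture: sum of coefficient magnitudes over the finest-scale directions.
    texture = np.abs(coeffs[-n_high:]).sum(axis=0)
    # Area: partial inverse transform using only the remaining parts
    # (fft2/ifft2 act on the last two axes).
    low_spec = np.fft.fft2(coeffs[:-n_high]) * np.conj(filters[:-n_high])
    area = np.real(np.fft.ifft2(low_spec.sum(axis=0)))
    return texture, area
```

For three scales with 4, 8, and 16 cones plus the low-frequency part, K is 29, so the area feature is reconstructed from the first 13 entries, matching the count given in the text.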
Correspondingly, for the KPCA-reduced HSI data , each component performs the above frequency separation process, and then the results are concatenated. Therefore, the texture information and area information for HSI data can be expressed as where and contain the texture and area information of the th component , respectively. The detailed procedure of shearlet-based feature extraction is displayed in Figure 3.
3.3. Gradual Region Merging
The previous steps provide an oversegmentation map () and three different types of description of objects, including spectral information ( and ), area information ( and ), and texture information ( and ). Clearly, it is advantageous to consider the three features in a unified framework to guide the fusion process of the oversegmentation map .
Specifically, the oversegmentation map is mapped onto an undirected graph. Each superpixel is regarded as a node, and an edge exists between two nodes only when the corresponding superpixels are adjacent. To make the description of the proposed progressive region merging process clearer and more intuitive, it is divided into two parts: merging cost definition and region merging procedure. Figure 4 illustrates the gradual region merging procedure.
3.3.1. Merging Cost Definition
It can be easily found that the merging cost between two adjacent regions is not only related to the size of the region and the length of shared edges but also related to the similarity among the three different kinds of features, including spectral, area, and texture features. Suppose there are two adjacent regions and in the oversegmentation map , the distance between the two regions in the spectral domain is calculated by the mean gaps and can be defined as follows: where and represent the mean value of region in the th band of KPCA-reduced HSI data and LiDAR data , respectively.
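As an illustration of the spectral term, the sketch below computes per-region mean vectors over the stacked KPCA bands and the LiDAR band and takes the Euclidean norm of their difference. The function names are hypothetical, and the plain Euclidean norm is an assumption about the exact form of the distance.

```python
import numpy as np

def region_means(features, labels):
    """features: (H, W, C) stack of KPCA bands plus LiDAR; labels: (H, W) region ids."""
    return {int(i): features[labels == i].mean(axis=0) for i in np.unique(labels)}

def spectral_distance(means, i, j):
    """Euclidean distance between the mean feature vectors of regions i and j."""
    return float(np.linalg.norm(means[i] - means[j]))
```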
On the other hand, for the area information and texture information , it is necessary to measure the region distance in a statistical manner since all the area and texture features are extracted in the frequency domain. Taking the LiDAR texture information as an example, it is first normalized into the interval [0, 256] for convenience. Then, we select interval endpoints (, including 0 and 256) to divide the whole interval into parts with the same length. At the same time, these endpoints are considered as bins in the histogram and its value in region is denoted as follows (this process can be seen in Figure 5): where represents the LiDAR texture information value in region . Thus, the frequency histogram in region is calculated by
After obtaining the frequency distribution in each region, the -statistic distance measurement is applied for two adjacent regions and :
Thus, the LiDAR texture distance can be expressed as . Similarly, the statistical distance of LiDAR area feature , denoted as , can be computed in the same way. Concerning and extracted from the hyperspectral feature , the above statistical calculation procedure is applied on each band, and , , ..., and , , ..., can be obtained correspondingly.
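The histogram-based comparison of two regions can be sketched as follows. The number of bins is a placeholder, and the chi-square distance is used here only as a stand-in histogram statistic; it is not claimed to be the statistic used in the article.

```python
import numpy as np

def normalise_0_256(x):
    """Linearly rescale a feature map to the interval [0, 256]."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else 256.0 * (x - x.min()) / span

def region_histogram(values, n_bins=32, lo=0.0, hi=256.0):
    """Relative-frequency histogram of the feature values inside one region."""
    counts, _ = np.histogram(values, bins=n_bins, range=(lo, hi))
    return counts / max(counts.sum(), 1)

def chi_square_distance(p, q, eps=1e-12):
    """Stand-in statistical distance between two frequency histograms."""
    return 0.5 * float(np.sum((p - q) ** 2 / (p + q + eps)))
```

A typical use is `chi_square_distance(region_histogram(t[labels == i]), region_histogram(t[labels == j]))`, where `t` is a normalized feature map and `labels` is the segmentation map; for multiband HSI features the same computation is repeated per band.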
Since the information contained in each spectral band of and LiDAR is inconsistent, it is necessary to fuse these distance measures in a weighted manner, which is computed based on the homogeneity of each segmentation area . Specifically, if the segmentation area has good homogeneity, it should have high weight. Conversely, if the segmented area is heterogeneous, the weight value should be small. In our framework, a locally adaptive approach is implemented. where is the maximal frequency in the corresponding band, while and represent the area distance and texture distance, respectively.
As indicated so far, the dissimilarity of two adjacent regions can be defined by where is the balance factor ensuring that the spectral distance and statistical distance are at the same order of magnitude. In our experiment, is set as and the interval endpoints is set as .
As mentioned before, the merging cost of region and is not only related to the dissimilarity but also related to the size of the region and the length of shared edges. The smaller the size of the region and the larger the shared boundary between the two regions, the easier the two regions merge together. Based on this point of view, the merging cost of region and is defined as where is the length of shared boundary of region and , while and represent the number of pixels in regions and , respectively.
3.3.2. Region Merging Procedure
A progressive region merging technique is introduced to effectively alleviate the conflict between oversegmentation and undersegmentation of superpixels and largely guarantee the homogeneity of the final merging map. Oversegmented superpixels ensure the homogeneity of each region, while region merging that gradually combines two adjacent similar regions does not introduce an undersegmentation problem with the help of shearlet extracted features.
Specifically, for the initial oversegmentation map , a data structure is utilized to record each pair of adjacent nodes with their merging cost, and a priority queue (denoted as ) is built to store all these structures. Based on the queue, the structure with the smallest cost is chosen, and the corresponding two regions (called and for simplicity) are obtained as well. Subsequently, all structures related to and in the priority queue are removed. After all points in region are added into region , new structures are created to record the merged region and its neighborhood, and these are then put into the priority queue. This progressive region merging procedure is carried out until the number of regions reaches a predefined value . At last, the oversegmentation map is gradually transformed into a final merging map .
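The merging loop can be sketched with a binary heap as below. Two simplifications are assumed for illustration: outdated queue entries are skipped when popped (lazy deletion) instead of being removed eagerly, and the cost combines region sizes, shared boundary length, and a Euclidean dissimilarity between region means in a hypothetical form that only mirrors the qualitative behaviour described in the text.

```python
import heapq
import numpy as np

def merge_regions(sizes, means, adjacency, n_final):
    """sizes: {id: pixel count}; means: {id: feature vector};
    adjacency: {id: {neighbour id: shared boundary length}} (symmetric).
    Returns {original id: id of the final region that contains it}."""
    sizes = dict(sizes)
    means = {r: np.asarray(m, dtype=float) for r, m in means.items()}
    adjacency = {r: dict(nb) for r, nb in adjacency.items()}
    version = {r: 0 for r in sizes}     # bumped whenever a region changes
    parent = {r: r for r in sizes}

    def cost(i, j):
        # Hypothetical cost: small regions, long shared boundaries, and
        # similar features all make a merge cheaper.
        dissim = float(np.linalg.norm(means[i] - means[j]))
        return sizes[i] * sizes[j] / (sizes[i] + sizes[j]) * dissim / adjacency[i][j]

    heap = [(cost(i, j), i, j, 0, 0) for i in adjacency for j in adjacency[i] if i < j]
    heapq.heapify(heap)
    while len(sizes) > n_final and heap:
        _, i, j, vi, vj = heapq.heappop(heap)
        if i not in sizes or j not in sizes or version[i] != vi or version[j] != vj:
            continue                    # stale entry: one of the regions has changed
        total = sizes[i] + sizes[j]     # merge region j into region i
        means[i] = (sizes[i] * means[i] + sizes[j] * means[j]) / total
        sizes[i] = total
        version[i] += 1
        for nb, length in adjacency.pop(j).items():
            del adjacency[nb][j]
            if nb != i:
                adjacency[i][nb] = adjacency[i].get(nb, 0) + length
                adjacency[nb][i] = adjacency[i][nb]
        del sizes[j], means[j]
        parent[j] = i
        for nb in adjacency[i]:         # re-queue the merged region with its neighbours
            heapq.heappush(heap, (cost(i, nb), i, nb, version[i], version[nb]))

    def root(r):
        while parent[r] != r:
            r = parent[r]
        return r

    return {r: root(r) for r in parent}
```

Lazy deletion keeps the code short because Python's `heapq` has no operation for removing arbitrary entries; the version counters play the role of the explicit removal step described above.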
3.4. Structure-Aware Filter Designing
For a point close to the edge, the surrounding labels are more likely to be different for object classifications, indicating that the neighboring spatial relationship should have less consideration. However, when it is located in the center of a local region, the surrounding objects tend to be the same; thus, the neighboring spatial relationship should be paid more attention. This perspective motivates us to design an adaptive structure-aware filter whose kernel size changes with the distance from a point to the edge.
In fact, it is difficult to obtain accurate edges between objects in HSIs because of the inherently low spatial resolution of remote sensing images. Fortunately, by applying the shearlet-based gradual region merging scheme to the SNIC oversegmentation map, a final merging map with fewer spatial constraint conflicts is obtained, in which the homogeneity of local regions is largely ensured. The junctions between regions are then regarded as edges. In particular, for each point in the final merging map, its region boundary is a continuous closed curve, which means that the number of edge points is finite. Therefore, the spatial distances between this point and all points on its region boundary can be calculated, and the smallest value is selected to form the distance map. In other words, the distance map stores, for each point, the minimum distance between its two-dimensional spatial coordinates and those of the points on its region boundary.
However, the direction from a point to its nearest boundary point is not fixed, so directly using the distance-map value as the filter size may make the filter too large and introduce disturbing information from other ground objects. The diagonal is the longest line segment inside a square. In other words, as long as the diagonal of the adaptive-size filter is short enough relative to the distance-map value of a point, the filter centered on that point will not cross the boundary. Therefore, the distance map is converted into the so-called scale map.
In addition, when a point lies at the center of a large region, an overly large filter may still cover points outside the region, which could degrade the feature representation ability. Thus, the scale map is truncated by a threshold so that no filter size exceeds it.
In our experiments, the threshold is simply set as 55.
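The path from the merging map to the truncated scale map can be sketched in Python. The sketch rests on several assumptions that are not taken from this article: 4-connectivity, the image border counted as part of every region boundary, a brute-force nearest-boundary search, and one plausible conversion rule (the largest odd side length whose half-diagonal does not exceed the distance to the nearest boundary pixel). The function names `boundary_mask` and `scale_map` are hypothetical; only the threshold of 55 comes from the text.

```python
import math


def boundary_mask(labels):
    """Mark pixels that touch the image border or a differently labelled 4-neighbour."""
    h, w = len(labels), len(labels[0])
    mask = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                outside = not (0 <= rr < h and 0 <= cc < w)
                if outside or labels[rr][cc] != labels[r][c]:
                    mask[r][c] = True
                    break
    return mask


def scale_map(labels, max_size=55):
    """Per-pixel odd filter size derived from the distance to the region boundary."""
    h, w = len(labels), len(labels[0])
    edge = boundary_mask(labels)

    # Collect the boundary pixels of every region.
    boundary = {}
    for r in range(h):
        for c in range(w):
            if edge[r][c]:
                boundary.setdefault(labels[r][c], []).append((r, c))

    scales = [[1] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Squared Euclidean distance to the nearest boundary pixel of the same region.
            d2 = min((r - br) ** 2 + (c - bc) ** 2 for br, bc in boundary[labels[r][c]])
            # Assumed rule: the corner of an s x s window lies (s - 1) / sqrt(2) away
            # from the centre, so require (s - 1)^2 <= 2 * d^2 and keep s odd.
            k = math.isqrt(2 * d2)
            side = k - (k % 2) + 1
            scales[r][c] = min(side, max_size)    # threshold truncation
    return scales
```

Under these assumptions, an edge pixel receives size 1, and the center of a 7 × 7 single-region image receives size 5. For full-size scenes, the brute-force search would normally be replaced by a distance transform.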
Figure 6 illustrates the three circumstances of filter size determination procedure. Concretely, the dotted frame centered on is the filter with , while the solid frame centered on is the filter with . For the point , the dotted frame and solid frame represent the filters without a threshold process and with threshold process, respectively. Clearly, the dotted frames of and are more precise for filter size than those two solid frames. Besides, for the region edge point such as , the filter size is only , which obeys our filter size calculation process as well.
A final note is that every point in the scale map is assigned an odd value ranging from 1 to 55, which gives the filter size for that pixel. For each spatial pixel, the corresponding structure-aware filter is an all-ones square matrix whose side length is the scale-map value of that pixel, normalized by its number of elements. In other words, it is a mean filter with an adaptive size for each spatial pixel, as can be seen in Figure 1. Hence, the adaptive-size filter is structure-aware because its size depends on the geometric position of the convolution center. This flexible filter preserves the differences between objects near edges while suppressing abnormal points in the interior of regions.
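As an illustration, an adaptive-size mean filter of this kind can be written in a few lines of Python. The sketch assumes a single band stored as a list of rows and a scale map of odd window sizes with the same shape. The function name `structure_aware_mean` is hypothetical, and clipping the window at the image border is an assumption, since the border handling is not specified here.

```python
def structure_aware_mean(band, scales):
    """Average each pixel over a square window whose odd side length is scales[r][c]."""
    h, w = len(band), len(band[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            half = scales[r][c] // 2
            total, n = 0.0, 0
            # Window clipped to the image (assumed border handling).
            for rr in range(max(0, r - half), min(h, r + half + 1)):
                for cc in range(max(0, c - half), min(w, c + half + 1)):
                    total += band[rr][cc]
                    n += 1
            out[r][c] = total / n
    return out
```

A pixel with scale 1 keeps its own value, while a pixel with scale 3 is replaced by the mean of its 3 × 3 neighborhood. Applying the function band by band and stacking the outputs gives a feature cube of the same spatial size as the input.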
3.5. Feature Extraction and Classification
Since the edges in the final merging map may not be accurate, classification errors can occur more frequently near them. Hence, the structure-aware filter is used solely for feature extraction rather than for regularization of the classification results. Taking the LiDAR data as an example, each spatial pixel is filtered by convolving the LiDAR image with the structure-aware filter of that pixel. After applying this convolution to every pixel, the LiDAR feature is obtained. Similarly, applying the convolution to each band of the KPCA-reduced HSI yields the corresponding feature cube. Concatenating the two features along the spectral direction gives the final feature.
For classification, the random forest (RF) classifier is chosen because it achieves high classification accuracy at a low computational cost. RF is also resistant to overfitting and to noise. RF consists of two steps: drawing training subsets at random with replacement (bagging) and building a decision tree on each subset. In the experiments, the default feature-subspace size of RF is the floor of the logarithm of the number of features, and the number of trees in the forest is set to 500. Finally, employing the RF classifier on the extracted feature produces the classification map. The pseudocode of the proposed ShearSAF approach for HSI and LiDAR feature extraction and classification is outlined in Algorithm 1.
The computational complexity of our proposed ShearSAF can be divided into three parts. Firstly, the complexity of SNIC and SNIC-guided KPCA are and , respectively, while computational complexity of shearlet transform is . Secondly, since the number of adjacent nodes for a region is limited, the computational complexity of priority queue is ( in our experiments). Thus, the complexity of the region merging process is . Finally, the computational complexity of the convolution process and RF classification is and , respectively.
4. Experimental Data and Ablation Analysis
In this section, three real HSI and LiDAR datasets in diverse areas are used to evaluate the effectiveness of the proposed ShearSAF framework. Firstly, the three HSI and LiDAR datasets are presented. Secondly, the parameters contained in ShearSAF are analyzed. Thirdly, two ablation experiments are carried out to validate the advantage of the well-designed structure-aware filtering scheme and the superiority of the proposed ShearSAF method over other related filters.
4.1.1. Houston Dataset
The first dataset is captured over the University of Houston campus, in which the Houston HSI contains 144 spectral bands ranging from 380 to 1050 nm. Each band contains 349 × 1905 pixels with 2.5 m of spatial resolution. Meanwhile, the corresponding LiDAR data has the same spatial size with the height information of surface materials. Fifteen land-cover classes and 15,029 labeled samples are given in the ground-truth image, as shown in Table 2 and Figure 7.
4.1.2. Trento Dataset
The second dataset is collected over the south of Trento, Italy, consisting of 63 spectral bands that range from 400 to 980 nm. Each band is 600 × 166 pixels with a spatial resolution of 1 m. Likewise, the LiDAR data only has one band of the same spatial size. The six land-cover classes and 30,414 labeled pixels are listed in Table 3 and Figure 8.
4.1.3. MUUFL Gulfport Dataset
The third dataset was collected over the Gulf Park Campus of the University of Southern Mississippi [90, 91]. The spatial size of both HSI and LiDAR data is 325 × 220 with a spatial resolution of 1 m. After removing eight noisy bands from the original 72 bands of the HSI data, 64 spectral bands are employed in the experiment. The details are given in Table 4 and Figure 9.
4.2. Parameter Setting
In our proposed ShearSAF framework, there are several parameters that should be carefully specified. Concerning the scale parameter () for Shearlet transform, it is set as according to their original paper. With respect to the weight parameters for SNIC, and , they, respectively, correspond to the spectral and spatial dimension and thus are set as and . Meanwhile, the number of internal endpoints is set as to facilitate the subsequent -statistic distance computation. For the dimension in KPCA, the corresponding number should guarantee that 99.5% energy is reserved.
In fact, there are two parameters in the gradual region merging procedure that are necessary to be determined: the initial number of pixels inside superpixel block in the oversegmentation map and the number of regions in the final merging map . In fact, it is difficult to obtain the final number of homogeneous regions as a fixed value for different datasets because of the impacts of object distributions, spatial complexity, and so on. Here, we propose a heuristic way to calculate , which contains the class number (), spatial complexity (), and space size ( and ). where is the floor operator. is defined as follows: the Sobel operator is adopted on the three normalized principal components of HSI and normalized LiDAR to calculate their gradients, and then, the sum of absolute values divided by is used as the spatial complexity. By this heuristic method, is , , and and is , , and for Houston, Trento, and MUUFL Gulfport datasets, respectively.
To validate this strategy, we conduct a series of experiments that track the gradual region merging process and record the overall accuracy (OA, computed by dividing the number of correctly predicted samples by the number of testing samples) for different initial superpixel sizes and different numbers of final regions. Figure 10 shows how the OA varies with these two parameters for the Houston, Trento, and MUUFL Gulfport datasets. Here, the initial superpixel size ranges from 20 to 100 in steps of 10, and the number of regions ranges from 50 to 600 in steps of 50 for the Trento and MUUFL Gulfport datasets and from 1000 to 6000 in steps of 500 for the Houston dataset. For the small-sample scenario, only 3, 5, 10, and 15 samples per class are randomly chosen from the labeled set, and the remaining labeled samples are used for testing. Each experiment is executed 20 times to obtain the mean value. It can be seen from Figure 10 that the OA is better when the number of regions is 3500, 100, and 200 for the three datasets, respectively, which is close to the values heuristically calculated by equation (39).
Two more observations can be made from Figure 10. First, as the number of regions decreases, the OA first increases and then decreases. This is reasonable because, at the beginning, adjacent regions covering similar objects are merged, which improves the features, whereas once the number of regions falls below a critical value, adjacent areas covering different objects are merged, which leads to a decline in classification performance. Second, the OA increases slightly as the initial superpixel size decreases on the three datasets. This parameter is used to ensure the homogeneity of each oversegmented region, and too many pixels inside a superpixel would decrease that homogeneity. In the experiments, the initial superpixel size is set to 50 for the three datasets, which both keeps the regions homogeneous and speeds up the region merging. For illustration, Figure 11 shows the result of the gradual region merging procedure on the three datasets. It can be observed that the structure information of the various materials is well represented.
Finally, the parameter setting of the proposed ShearSAF approach is summarized in Table 5. All of the parameters are either preset and kept unchanged across the experimental datasets or heuristically computed; hence, ShearSAF does not require dataset-specific tuning, which supports its robustness and generalization ability and is a distinct advantage of the approach.
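The small-sample protocol above (a fixed number of training samples per class, all remaining labeled samples for testing, 20 repetitions) can be sketched in Python as follows. The function names and the `classify` callable are hypothetical placeholders, and using the run index as the random seed is an assumption made only so that the sketch is reproducible.

```python
import random
import statistics
from collections import defaultdict


def few_shot_split(labels, per_class, seed):
    """Pick `per_class` random training indices per class; the rest are for testing."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train = []
    for y in sorted(by_class):
        train.extend(rng.sample(by_class[y], per_class))
    chosen = set(train)
    test = [i for i in range(len(labels)) if i not in chosen]
    return sorted(train), test


def overall_accuracy(y_true, y_pred):
    """OA: correctly predicted samples divided by the number of testing samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def repeated_oa(labels, per_class, classify, runs=20):
    """Mean and standard deviation of OA over `runs` random splits.

    classify(train_idx, test_idx) is a hypothetical callable that returns
    one predicted label per test index.
    """
    scores = []
    for run in range(runs):
        train, test = few_shot_split(labels, per_class, seed=run)
        predicted = classify(train, test)
        scores.append(overall_accuracy([labels[i] for i in test], predicted))
    return statistics.mean(scores), statistics.pstdev(scores)
```

In practice, `classify` would train a classifier on the features at the training indices and predict the labels at the test indices. Repeating the split matters here because, with only a handful of samples per class, a single draw can shift the OA noticeably.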
4.3. Ablation Analysis
In this part, two ablation experiments are carried out to validate the effectiveness and superiority of the proposed structure-aware filtering scheme. On the one hand, our ShearSAF is compared with fixed-size mean filters, whose kernel sizes range from 1 to 55 with a step of 2. That is, the features are obtained by convolving the KPCA-reduced HSI and LiDAR with the mean filter that has the fixed spatial size all the time, and the RF classifier is then employed. Similarly, the experiment is executed 20 times due to the small training sample scenario, and the OA of the mean filters with different sizes on the Houston, Trento, and MUUFL Gulfport datasets is illustrated in Figure 12. It should be mentioned that the four curves from the bottom to the top (blue, red, green, and black) indicate the performance of the mean filters with a fixed size under the conditions of 3, 5, 10, and 15 training samples per class as the training set, respectively, and correspondingly, the horizontal dotted lines from the bottom to the top (blue, red, green, and black) represent the performance of ShearSAF with the same training set, respectively.
It can be observed from Figure 12 that the OA rises while the filter size is still relatively small. The reason is that a larger kernel takes more neighborhood relations into account and therefore filters noise and abnormal points better. The OA then drops as the filter size keeps increasing, because ever larger filters damage the features at the junctions between objects. Moreover, our ShearSAF approach always shows the best performance, implying that the structure-aware filter design does protect the edges while filtering the noise in the center of regions. It is also worth mentioning that the optimal kernel size differs between datasets. For the three real datasets concerned here, as illustrated in Figure 12, the optimal filter size is 9, 7, and 3 for Houston, Trento, and MUUFL Gulfport, respectively; thus, the filter size is hard to determine in advance in practice. By contrast, our structure-aware filter design automatically adjusts the filter size according to the scale map and achieves higher accuracy, indicating the advantage and feasibility of the proposed ShearSAF approach.
In the second ablation experiment, our structure-aware design is combined with other filters, as illustrated in Figure 13. Here, both Gaussian and Gabor filters are considered. Specifically, ShearSAF-Gaussian means that a two-dimensional (2D) Gaussian filter with structure-aware size is applied to the stacked HSI and LiDAR data. In other words, the scale map is obtained in the same way as in ShearSAF, and each point in the scale map gives the size of the corresponding Gaussian filter. The structure-aware Gaussian filters are then convolved with the stacked HSI and LiDAR data to obtain the features, and the RF classifier is used for classification. Similarly, a series of 2D Gabor filters (four scales and six orientations) with adaptive spatial size is applied to the stacked HSI and LiDAR data for feature extraction, which is called ShearSAF-Gabor. It can be seen from Figure 13 that ShearSAF-Gabor performs better than ShearSAF-Gaussian on the Trento dataset, while the opposite holds on the Houston and MUUFL Gulfport datasets. This is reasonable: the spatial distribution of objects in the Trento dataset (as shown in Figure 8) is more regular than in the other two datasets (as shown in Figures 7 and 9), so the features obtained by the 2D Gabor filters with various orientations and scales can be more specific than those extracted by the Gaussian filter. Furthermore, the proposed ShearSAF consistently achieves the best results on the three datasets, validating the suitability of the simple mean filter for our ShearSAF approach.
5. Experimental Results
In this section, a number of state-of-the-art feature extraction and fusion algorithms are compared with the proposed ShearSAF approach. Firstly, the two simplest methods, namely, the RF classifier on the raw HSI data (named Raw-H) and on the concatenation of both HSI and LiDAR data (named Raw), are used as the benchmark. Secondly, three deep learning-based methods are considered for HSI and LiDAR data classification: 3D-CNN (3D convolutional neural network, a classic deep learning-based method that simultaneously captures spatial-spectral joint information), miniGCN (minibatch graph convolutional network, an emerging deep learning-based method that allows large-scale GCNs to be trained in a minibatch fashion), and SAE-LR (stacked autoencoder with logistic regression, an autoencoder-based deep learning method that preserves abstract and invariant information in deeper features). Thirdly, five widely used feature extraction and fusion algorithms are also applied to both HSI and LiDAR data: NMFL (nonlinear multiple feature learning-based classification, which explores different types of available features in a collaborative and flexible way), EMAP (extended morphological attribute profile), GGF (generalized graph-based fusion), EPCA (a novel ensemble classifier), and OTVCA (orthogonal total variation component analysis, which obtains the best low-rank representation and shows strong robustness to noise). For classification, 3 to 15 samples per class are randomly selected from the labeled dataset to form the training set, while the rest are used as the testing set. Each experiment is run twenty times in order to reduce the effects of random factors, and both the mean values and standard deviations are reported. In addition to the OA, the kappa coefficient (κ), which corrects the observed agreement for the agreement expected by chance, is also adopted to evaluate the classification performance.
Figures 14–16 show the OA and κ of the eleven compared methods, including Raw-H, Raw, 3D-CNN, miniGCN, SAE-LR, NMFL, EPCA, GGF, EMAP, OTVCA, and our ShearSAF, when the number of training samples per class ranges from 3 to 15. It should be noted that the OA obtained with LiDAR data alone is much lower than that of the other methods; thus, it is not included in the comparison. Generally, the classification performance improves as the number of training samples grows on the three datasets. Compared to the Raw method, Raw-H, which uses only HSI data, shows lower classification accuracies, confirming that the additional LiDAR information improves HSI classification. Specifically, HSI data provides abundant spectral information for distinguishing materials with different physical properties, while LiDAR provides shape and height information that can be used to distinguish different targets of the same material. Among all compared methods, the proposed ShearSAF consistently yields the highest results, which is reasonable since the structure-aware filters shrink near edges to avoid interclass interference and grow at region centers to bring in more neighborhood information and reduce the environmental impact.
In addition, it should be noted that the deep learning-based methods perform poorly on the three datasets because of the limited training samples. Specifically, SAE-LR gives the worst performance on the Houston and MUUFL Gulfport datasets, while 3D-CNN performs the worst on the Trento dataset and the second worst on the Houston dataset. miniGCN is also less accurate than the traditional classification methods in most cases. The reason is that deep learning-based methods usually need a large quantity of training samples to fit their massive number of parameters during training, and the small sample sets used in the experiments significantly limit their performance. Meanwhile, training deep learning models is also time-consuming.
Furthermore, for the case of only five training samples per class, the classification performance of the eleven methods, including the accuracy of each class, OA, and κ, is summarized in Tables 6–8 for the Houston, Trento, and MUUFL Gulfport datasets, respectively. It can be seen that ShearSAF gives the best performance in most cases, which supports the superiority of our ShearSAF method. In more detail, considering the C5 class (vineyard) of the Trento dataset, the ground-truth map (Figure 8) shows that the spatial distribution of C5 is very regular, and ShearSAF effectively filters the noise inside the area and protects the edges; thus, the accuracy increases from 72.58% for the Raw method to 98.23% for our approach, as reported in Table 7. By contrast, the C10 class (yellow curbs) of the MUUFL Gulfport dataset in Table 8 is scattered across the scene and is hard to see even in Figure 9. Although our method is not the best on the C10 class, the structure-aware filter still works by reducing its own size and keeping the target information from being interfered with by neighboring objects. For illustration, the ground truth and the complete classification maps of the eleven compared methods on the three datasets are shown in Figures 17–19. It can be observed that our ShearSAF approach stands out from the others, demonstrating the effectiveness of the proposed method.
Finally, for the case of five training samples per class, the computation time is given in Table 9; it was recorded on a workstation with a 24-core Intel processor at 2.20 GHz and 128 GB RAM. As expected, the deep learning-based methods (3D-CNN, miniGCN, and SAE-LR) take more time than the others because model training and parameter optimization are time-consuming. It can be observed that the time cost of our ShearSAF method is less than that of the other methods, mainly because the structure-aware feature extraction procedure does not depend on the training set. That is to say, the feature extraction procedure in ShearSAF is executed only once, and the RF classifier has a low computational cost; therefore, the proposed ShearSAF method is computationally efficient and applicable to remote sensing images with a large spatial size.
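For reference, the kappa coefficient can be computed directly from the true and predicted labels of the testing set. The sketch below uses the standard definition of Cohen's kappa, with observed agreement equal to the OA and chance agreement taken from the class marginals. The function name `cohen_kappa` is hypothetical, and returning 1.0 in the degenerate case where chance agreement equals 1 is a convention chosen for this sketch.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    # Observed agreement p_o, which equals the overall accuracy.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement p_e from the products of the class marginals.
    p_e = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: a single class in both label sets
    return (p_o - p_e) / (1.0 - p_e)
```

For example, with true labels `[0, 0, 1, 1]` and predictions `[0, 0, 1, 0]`, the observed agreement is 0.75 and the chance agreement is 0.5, which gives a kappa of 0.5.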
6. Conclusion
In this paper, a newly designed shearlet-based structure-aware filtering approach has been proposed for HSI and LiDAR feature extraction. Specifically, the shearlet transform is implemented on the KPCA-reduced HSI and LiDAR data for area and texture feature extraction. Then, the spectral, area, and texture features are used to guide the gradual region merging procedure, which converts the initial oversegmentation map into a final merging map, and the spatial structure of objects can be well characterized. By calculating the edge distance in the final merging map, the scale map can be acquired, which is utilized to adaptively select the filter size for convolution. Finally, the RF classifier is used for classification.
In summary, the most important contribution of this article is the structure-aware filtering design. In this process, we proposed a shearlet-based area and texture feature representation that effectively measures the distance between two adjacent areas. At the same time, the structure-aware filter is constructed so that a pixel near an edge has a small kernel, which protects its information from being disturbed by nearby objects, while a point at the center of an area has a larger kernel, which filters noise and abnormal points. Two ablation experiments with various fixed-size mean filters and other adaptive-size filters (Gaussian and Gabor) demonstrate the effectiveness of the proposed ShearSAF method. Meanwhile, comparison with several state-of-the-art methods (3D-CNN, miniGCN, SAE-LR, NMFL, EPCA, GGF, EMAP, and OTVCA) consistently shows the superiority of the proposed ShearSAF approach. Finally, we emphasize that the structure-aware filtering design presented here can be combined with other kinds of features. For instance, manifold learning-based methods, such as LLE (locally linear embedding) and ISOMAP (isometric mapping), can be used for dimension reduction and feature extraction. Furthermore, the structure-aware filter design pattern can be integrated with other center-based filters (including Gaussian and median filters) to extract discriminative features and improve the robustness of the whole framework. All these aspects are worthy of more attention.
The codes of this work are available at http://jiasen.tech/papers/.
Conflicts of Interest
The authors declare no conflicts of interest.
S. Jia, Z. Zhan, and M. Xu proposed the method. Z. Zhan implemented the experiments. S. Jia, Z. Zhan, and M. Xu wrote the manuscript. All authors read and approved the final manuscript.
This work was supported in part by the National Natural Science Foundation of China under Grant 41971300 and Grant 61901278; in part by the Key Project of Department of Education of Guangdong Province under Grant 2020ZDZX3045; and in part by the Shenzhen Scientific Research and Development Funding Program under Grant JCYJ20180305124802421 and Grant JCYJ20180305125902403.
- J. Richards, Remote Sensing Digital Image Analysis: An Introduction, Springer, 2013.
- G. Camps-Valls, D. Tuia, L. Gómez-Chova, S. Jiménez, and J. Malo, “Remote Sensing Image Processing,” Synthesis Lectures on Image, Video, and Multimedia Processing, vol. 5, no. 1, pp. 1–192, 2011.
- J. Bioucas-Dias, A. Plaza, N. Dobigeon et al., “Hyperspectral unmixing overview: geometrical, statistical, and sparse regression-based approaches,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 5, no. 2, pp. 354–379, 2012.
- J. Bioucas-Dias, A. Plaza, G. Camps-Valls, P. Scheunders, N. Nasrabadi, and J. Chanussot, “Hyperspectral remote sensing data analysis and future challenges,” IEEE Geoscience and Remote Sensing Magazine, vol. 1, no. 2, pp. 6–36, 2013.
- M. Khodadadzadeh, J. Li, S. Prasad, and A. Plaza, “Fusion of hyperspectral and LiDAR remote sensing data using multiple feature learning,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 8, no. 6, pp. 2971–2983, 2015.
- M. Kishore and S. Kulkarni, “Approches and challenges in classification for hyperspectral data: a review,” in 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), pp. 3418–3421, Chennai, India, 2016.
- S. Jia, Z. Zhu, L. Shen, and Q. Li, “A two-stage feature selection framework for hyperspectral image classification using few labeled samples,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 4, pp. 1023–1035, 2014.
- Y. Zhou, J. Peng, and C. Chen, “Dimension reduction using spatial and spectral regularized local discriminant embedding for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 2, pp. 1082–1095, 2015.
- P. Hartzell, C. Glennie, and S. Khan, “Terrestrial hyperspectral image shadow restoration through lidar fusion,” Remote Sensing, vol. 9, no. 5, p. 421, 2017.
- S. Sun and C. Salvaggio, “Aerial 3d building detection and modeling from airborne LiDAR point clouds,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6, no. 3, pp. 1440–1449, 2013.
- C. Paris and L. Bruzzone, “A three-dimensional model-based approach to the estimation of the tree top height by fusing low-density LiDAR data and very high resolution optical images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 1, pp. 467–480, 2015.
- P. Ghamisi and B. Höfle, “LiDAR data classification using extinction profiles and a composite kernel support vector machine,” IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 5, pp. 659–663, 2017.
- J. Rau, J. Jhan, and Y. Hsu, “Analysis of oblique aerial images for land cover and point cloud classification in an urban environment,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 3, pp. 1304–1319, 2015.
- M. Soleimanzadeh, A. Karami, and P. Scheunders, “Fusion of hyperspectral and LiDAR images using non-subsampled shearlet transform,” in IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 8873–8876, Valencia, Spain, 2018.
- M. Zhang, P. Ghamisi, and W. Li, “Classification of hyperspectral and LiDAR data using extinction profiles with feature fusion,” Remote Sensing Letters, vol. 8, no. 10, pp. 957–966, 2017.
- B. Bigdeli and P. Pahlavani, “A Dempster Shafer-based fuzzy multisensor fusion system using airborne LiDAR and hyperspectral imagery,” International Journal of Remote Sensing, vol. 39, no. 21, pp. 7718–7737, 2018.
- C. Ge, Q. Du, W. Li, Y. Li, and W. Sun, “Hyperspectral and LiDAR data classification using kernel collaborative representation based residual fusion,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 12, no. 6, pp. 1963–1973, 2019.
- C. Debes, A. Merentitis, R. Heremans et al., “Hyperspectral and LiDAR data fusion: outcome of the 2013 GRSS data fusion contest,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 6, pp. 2405–2418, 2014.
- Z. Zhong, B. Fan, K. Ding, H. Li, S. Xiang, and C. Pan, “Efficient multiple feature fusion with hashing for hyperspectral imagery classification: a comparative study,” IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 8, pp. 4461–4478, 2016.
- B. Rasti, P. Ghamisi, and R. Gloaguen, “Hyperspectral and LiDAR fusion using extinction profiles and total variation component analysis,” IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 7, pp. 3997–4007, 2017.
- W. Chen, X. Dai, B. Pan, and T. Huang, “A novel discriminant criterion based on feature fusion strategy for face recognition,” Neurocomputing, vol. 159, pp. 67–77, 2015.
- W. Liao, A. Pizurica, R. Bellens, S. Gautama, and W. Philips, “Generalized graph-based fusion of hyperspectral and LiDAR data using morphological features,” IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 3, pp. 552–556, 2015.
- Z. Ye, S. Prasad, W. Li, J. Fowler, and M. He, “Classification based on 3-D DWT and decision fusion for hyperspectral image analysis,” IEEE Geoscience and Remote Sensing Letters, vol. 11, no. 1, pp. 173–177, 2014.
- K. Schindler, “An overview and comparison of smooth labeling methods for land-cover classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 11, pp. 4534–4545, 2012.
- W. Liao, R. Bellens, A. Pizurica, S. Gautama, and W. Philips, “Combining feature fusion and decision fusion for classification of hyperspectral and LiDAR data,” in IEEE Geoscience and Remote Sensing Symposium, pp. 1241–1244, Quebec City, QC, Canada, 2014.
- R. Luo, W. Liao, H. Zhang, Y. Pi, and W. Philips, “Classification of cloudy hyperspectral image and LiDAR data based on feature fusion and decision fusion,” in 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 2518–2521, Beijing, China, 2016.
- R. Luo, W. Liao, H. Zhang et al., “Fusion of hyperspectral and LiDAR data for classification of cloud-shadow mixed remote sensed scene,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 10, no. 8, pp. 3768–3781, 2017.
- Y. Chen, Z. Lin, X. Zhao, G. Wang, and Y. Gu, “Deep learning-based classification of hyperspectral data,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 6, pp. 2094–2107, 2014.
- K. Makantasis, K. Karantzalos, A. Doulamis, and N. Doulamis, “Deep supervised learning for hyperspectral data classification through convolutional neural networks,” in 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 4959–4962, Milan, Italy, 2015.
- M. Zhang, W. Li, and Q. Du, “Diverse region-based CNN for hyperspectral image classification,” IEEE Transactions on Image Processing, vol. 27, no. 6, pp. 2623–2634, 2018.
- A. Ben Hamida, A. Benoit, P. Lambert, and C. Ben Amar, “3-D deep learning approach for remote sensing image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 8, pp. 4420–4434, 2018.
- L. Mou, P. Ghamisi, and X. X. Zhu, “Deep recurrent neural networks for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 7, pp. 3639–3655, 2017.
- A. Ma, A. Filippi, Z. Wang, and Z. Yin, “Hyperspectral image classification using similarity measurements-based deep recurrent neural networks,” Remote Sensing, vol. 11, no. 2, p. 194, 2019.
- Y. Li, H. Zhang, and Q. Shen, “Spectral-spatial classification of hyperspectral imagery with 3d convolutional neural network,” Remote Sensing, vol. 9, no. 1, p. 67, 2017.
- P. Ghamisi, B. Höfle, and X. X. Zhu, “Hyperspectral and LiDAR data fusion using extinction profiles and deep convolutional neural network,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 10, no. 6, pp. 3011–3024, 2017.
- Q. Cao, Y. Zhong, A. Ma, and L. Zhang, “Urban land use/land cover classification based on feature fusion fusing hyperspectral image and LiDAR data,” in IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, pp. 8869–8872, Valencia, Spain, 2018.
- S. Yu, S. Jia, and C. Xu, “Convolutional neural networks for hyperspectral image classification,” Neurocomputing, vol. 219, pp. 88–98, 2017.
- B. Liu, X. Yu, P. Zhang, X. Tan, A. Yu, and Z. Xue, “A semi-supervised convolutional neural network for hyperspectral image classification,” Remote Sensing Letters, vol. 8, no. 9, pp. 839–848, 2017.
- H. Wu and S. Prasad, “Semi-supervised deep learning using pseudo labels for hyperspectral image classification,” IEEE Transactions on Image Processing, vol. 27, no. 3, pp. 1259–1270, 2018.
- B. Pan, Z. Shi, and X. Xu, “R-VCANet: a new deep-learning-based hyperspectral image classification method,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 10, no. 5, pp. 1975–1986, 2017.
- B. Pan, Z. Shi, and X. Xu, “MugNet: deep learning for hyperspectral image classification using limited samples,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 145, pp. 108–119, 2018.
- K. Guo, D. Labate, W.-Q. Lim, G. Weiss, and E. Wilson, “Wavelets with composite dilations and their MRA properties,” Applied and Computational Harmonic Analysis, vol. 20, no. 2, pp. 202–236, 2006.
- G. Easley, D. Labate, and W.-Q. Lim, “Sparse directional image representations using the discrete shearlet transform,” Applied and Computational Harmonic Analysis, vol. 25, no. 1, pp. 25–46, 2008.
- S. Jia, L. Shen, J. Zhu, and Q. Li, “A 3-D Gabor phase-based coding and matching framework for hyperspectral imagery classification,” IEEE Transactions on Cybernetics, vol. 48, no. 4, pp. 1176–1188, 2018.
- S. Jia, Z. Lin, B. Deng, J. Zhu, and Q. Li, “Cascade superpixel regularized Gabor feature fusion for hyperspectral image classification,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 5, pp. 1638–1652, 2020.
- E. Candès, L. Demanet, D. Donoho, and L. Ying, “Fast discrete curvelet transforms,” Multiscale Modeling & Simulation, vol. 5, no. 3, pp. 861–899, 2006.
- J. Ma and G. Plonka, “A review of curvelets and recent applications,” IEEE Signal Processing Magazine, vol. 27, 2011.
- A. L. Da Cunha, J. Zhou, and M. N. Do, “The nonsubsampled contourlet transform: theory, design, and applications,” IEEE Transactions on Image Processing, vol. 15, no. 10, pp. 3089–3101, 2006.
- W. Lim, “The discrete shearlet transform: a new directional transform and compactly supported shearlet frames,” IEEE Transactions on Image Processing, vol. 19, no. 5, pp. 1166–1180, 2010.
- G. R. Easley, D. Labate, and W. Lim, “Optimally sparse image representations using shearlets,” in 2006 Fortieth Asilomar Conference on Signals, Systems and Computers, pp. 974–978, Pacific Grove, CA, USA, 2006.
- P. S. Negi and D. Labate, “3-D discrete shearlet transform and video processing,” IEEE Transactions on Image Processing, vol. 21, no. 6, pp. 2944–2954, 2012.
- W. Lim, “Nonseparable shearlet transform,” IEEE Transactions on Image Processing, vol. 22, no. 5, pp. 2056–2065, 2013.
- S. Yi, D. Labate, G. R. Easley, and H. Krim, “A shearlet approach to edge analysis and detection,” IEEE Transactions on Image Processing, vol. 18, no. 5, pp. 929–941, 2009.
- K. Guo, D. Labate, and W.-Q. Lim, “Edge analysis and identification using the continuous shearlet transform,” Applied and Computational Harmonic Analysis, vol. 27, no. 1, pp. 24–46, 2009.
- M. A. Duval-Poo, F. Odone, and E. De Vito, “Edges and corners with shearlets,” IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 3768–3780, 2015.
- G. R. Easley, D. Labate, and F. Colonna, “Shearlet-based total variation diffusion for denoising,” IEEE Transactions on Image Processing, vol. 18, no. 2, pp. 260–268, 2009.
- S. Häuser and G. Steidl, “Convex multiclass segmentation with shearlet regularization,” International Journal of Computer Mathematics, vol. 90, no. 1, pp. 62–81, 2013.
- Y. Li, L. Po, C. Cheung et al., “No-reference video quality assessment with 3D shearlet transform and convolutional neural networks,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 6, pp. 1044–1057, 2016.
- M. Zaouali, S. Bouzidi, and E. Zagrouba, “3-D shearlet transform based feature extraction for improved joint sparse representation HSI classification,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 4, pp. 1306–1314, 2018.
- H. Rezaei, A. Karami, and P. Scheunders, “Hyperspectral and multispectral image fusion based on spectral matching in the shearlet domain,” in IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, pp. 8070–8073, Valencia, Spain, 2018.
- A. Moore, S. Prince, J. Warrell, U. Mohammed, and G. Jones, “Superpixel lattices,” in 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, Anchorage, AK, USA, 2008.
- W. Wang, D. Xiang, Y. Ban, J. Zhang, and J. Wan, “Superpixel segmentation of polarimetric SAR images based on integrated distance measure and entropy rate method,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 10, no. 9, pp. 4045–4058, 2017.
- F. Meng, H. Li, Q. Wu, B. Luo, C. Huang, and K. N. Ngan, “Globally measuring the similarity of superpixels by binary edge maps for superpixel clustering,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 4, pp. 906–919, 2018.
- S. Patel and B. Kadhiwala, “Comparative analysis of cluster based superpixel segmentation techniques,” in 2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI), pp. 1454–1459, Tirunelveli, India, 2018.
- R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, “SLIC superpixels compared to state-of-the-art superpixel methods,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2274–2282, 2012.
- R. Achanta and S. Süsstrunk, “Superpixels and polygons using simple non-iterative clustering,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4895–4904, Honolulu, HI, USA, 2017.
- J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888–905, 2000.
- M. Liu, O. Tuzel, S. Ramalingam, and R. Chellappa, “Entropy rate superpixel segmentation,” in CVPR 2011, pp. 2097–2104, Colorado Springs, CO, USA, 2011.
- Z. Hu, Q. Zou, and Q. Li, “Watershed superpixel,” in 2015 IEEE International Conference on Image Processing (ICIP), pp. 349–353, Quebec City, QC, Canada, 2015.
- N. Zhang and L. Zhang, “SSGD: superpixels using the shortest gradient distance,” in 2017 IEEE International Conference on Image Processing (ICIP), pp. 3869–3873, Beijing, China, 2017.
- Y. Guo, L. Jiao, S. Wang, S. Wang, F. Liu, and W. Hua, “Fuzzy superpixels for polarimetric SAR images classification,” IEEE Transactions on Fuzzy Systems, vol. 26, no. 5, pp. 2846–2860, 2018.
- C. Wu, J. Zheng, Z. Feng et al., “Fuzzy SLIC: Fuzzy Simple Linear Iterative Clustering,” IEEE Transactions on Circuits and Systems for Video Technology, p. 1, 2020.
- S. Jia, X. Deng, J. Zhu, M. Xu, J. Zhou, and X. Jia, “Collaborative representation-based multiscale superpixel fusion for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 10, pp. 7770–7784, 2019.
- Q. Leng, H. Yang, J. Jiang, and Q. Tian, “Adaptive multiscale segmentations for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 58, no. 8, pp. 5847–5860, 2020.
- L. Shen and S. Jia, “Three-dimensional Gabor wavelets for pixel-based hyperspectral imagery classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 12, pp. 5039–5046, 2011.
- F. Mirzapour and H. Ghassemian, “Multiscale Gaussian derivative functions for hyperspectral image feature extraction,” IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 4, pp. 525–529, 2016.
- Y. Teng, Y. Zhang, Y. Chen, and C. Ti, “Adaptive morphological filtering method for structural fusion restoration of hyperspectral images,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 9, no. 2, pp. 655–667, 2016.
- S. Wu, J. Zhang, C. Shi, and W. Li, “Multiscale spectral-spatial hyperspectral image classification with adaptive filtering,” in IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, pp. 2591–2594, Valencia, Spain, 2018.
- Z. Sun, Z. Zhang, Y. Chen, S. Liu, and Y. Song, “Frost filtering algorithm of SAR images with adaptive windowing and adaptive tuning factor,” IEEE Geoscience and Remote Sensing Letters, vol. 17, no. 6, pp. 1097–1101, 2020.
- Y. Yang, W. Wan, S. Huang, F. Yuan, S. Yang, and Y. Que, “Remote sensing image fusion based on adaptive IHS and multiscale guided filter,” IEEE Access, vol. 4, pp. 4573–4582, 2016.
- C. Kadam and S. B. Borse, “An improved image denoising using spatial adaptive mask filter for medical images,” in 2017 International Conference on Computing, Communication, Control and Automation (ICCUBEA), pp. 1–5, Pune, India, 2017.
- D. Labate, W.-Q. Lim, G. Kutyniok, and G. Weiss, “Sparse multidimensional representation using shearlets,” in Wavelets XI, San Diego, California, USA, Aug. 2005.
- S. Häuser and G. Steidl, “Fast finite shearlet transform,” 2014, https://arxiv.org/abs/1202.1773.
- M. Fauvel, J. Chanussot, and J. A. Benediktsson, “Kernel principal component analysis for the classification of hyperspectral remote sensing data over urban areas,” EURASIP Journal on Advances in Signal Processing, vol. 2009, no. 1, pp. 1–14, 2009.
- H. Halim, S. Isa, and S. Mulyono, “Comparative analysis of PCA and KPCA on paddy growth stages classification,” in 2016 IEEE Region 10 Symposium (TENSYMP), pp. 167–172, Bali, Indonesia, 2016.
- S. Jia, Z. Zhan, M. Zhang et al., “Multiple feature-based superpixel-level decision fusion for hyperspectral and LiDAR data classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 59, no. 2, pp. 1437–1452, 2021.
- A. Karami, R. Heylen, and P. Scheunders, “Band-specific shearlet-based hyperspectral image noise reduction,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 9, pp. 5054–5066, 2015.
- Z. Hu, Z. Wu, Q. Zhang, Q. Fan, and J. Xu, “A spatially-constrained color–texture model for hierarchical VHR image segmentation,” IEEE Geoscience and Remote Sensing Letters, vol. 10, no. 1, pp. 120–124, 2013.
- M. Zhang, W. Li, Q. Du, L. Gao, and B. Zhang, “Feature extraction for classification of hyperspectral and LiDAR data using patch-to-patch CNN,” IEEE Transactions on Cybernetics, vol. 50, no. 1, pp. 100–111, 2020.
- P. Gader, A. Zare, R. Close, J. Aitken, and G. Tuell, MUUFL Gulfport hyperspectral and LiDAR airborne data set, University of Florida, Gainesville, 2013.
- X. Du and A. Zare, “Technical report: scene label ground truth map for MUUFL Gulfport data set,” Tech. Rep., University of Florida, Gainesville, 2017.
- D. Hong, L. Gao, J. Yao, B. Zhang, A. Plaza, and J. Chanussot, “Graph convolutional networks for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, pp. 1–13, 2020.
- J. Li, X. Huang, P. Gamba et al., “Multiple feature learning for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, pp. 1592–1606, 2015.
- M. Dalla-Mura, J. Atli-Benediktsson, B. Waske, and L. Bruzzone, “Extended profiles with morphological attribute filters for the analysis of hyperspectral data,” International Journal of Remote Sensing, vol. 31, no. 22, pp. 5975–5991, 2010.
- J. Xia, N. Yokoya, and A. Iwasaki, “Fusion of hyperspectral and LiDAR data with a novel ensemble classifier,” IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 6, pp. 957–961, 2018.
- B. Rasti, D. Hong, R. Hang et al., “Feature extraction for hyperspectral imagery: the evolution from shallow to deep: overview and toolbox,” IEEE Geoscience and Remote Sensing Magazine, vol. 8, no. 4, pp. 60–88, 2020.
Copyright © 2021 Sen Jia et al. Exclusive Licensee Aerospace Information Research Institute, Chinese Academy of Sciences. Distributed under a Creative Commons Attribution License (CC BY 4.0).