Research Article | Open Access
Meiyu Huang, Yao Xu, Lixin Qian, Weili Shi, Yaqin Zhang, Wei Bao, Nan Wang, Xuejiao Liu, Xueshuang Xiang, "A Bridge Neural Network-Based Optical-SAR Image Joint Intelligent Interpretation Framework", Space: Science & Technology, vol. 2021, Article ID 9841456, 10 pages, 2021. https://doi.org/10.34133/2021/9841456
A Bridge Neural Network-Based Optical-SAR Image Joint Intelligent Interpretation Framework
The current interpretation technology for remote sensing images mainly focuses on single-modal data, which cannot fully utilize the complementary and correlated information of multimodal data with heterogeneous characteristics, especially synthetic aperture radar (SAR) data and optical imagery. To solve this problem, we propose a bridge neural network- (BNN-) based optical-SAR image joint intelligent interpretation framework, which optimizes the feature correlation between optical and SAR images through optical-SAR matching tasks. It adopts BNN to effectively improve the common feature extraction capability for optical and SAR images, thereby improving the accuracy and broadening the application scenarios of specific intelligent interpretation tasks for optical-SAR/SAR/optical images. Specifically, BNN projects optical and SAR images into a common feature space and mines their correlation through pair matching. Further, to deeply exploit the correlation between optical and SAR images and ensure the strong representation learning ability of BNN, we build the QXS-SAROPT dataset containing 20,000 pairs of perfectly aligned, high-resolution optical-SAR image patches with diverse scenes. Experimental results on optical-to-SAR crossmodal object detection demonstrate the effectiveness and superiority of our framework. In particular, based on the QXS-SAROPT dataset, our framework can achieve an accuracy of up to 96% on four benchmark SAR ship detection datasets.
With the rapid development of deep learning, remarkable breakthroughs have been made in deep learning-based land use segmentation, scene classification, object detection, and recognition in the field of remote sensing over the past decade [1–4]. This is mainly due to the powerful feature extraction and representation ability of deep neural networks [5–8], which can effectively map remote sensing observations to the desired geographical knowledge. However, the current mainstream interpretation technology for remote sensing images still focuses mainly on single-modal data and cannot make full use of the complementary and correlated information of multimodal data acquired by different sensors with heterogeneous characteristics, resulting in insufficient intelligent interpretation capabilities and limited application scenarios. For example, optical imaging is easily restricted by illumination and weather conditions, so accurate interpretation cannot be obtained at night or under complex weather such as clouds and fog. Compared with optical imaging, synthetic aperture radar (SAR) imaging can achieve full-time, all-weather earth observation. However, due to the lack of texture features, SAR images are difficult to interpret even for well-trained experts. Therefore, gathering sufficient training SAR data with diverse scenes and accurate labels is a challenging problem, which heavily affects research on and application of intelligent SAR image interpretation. To address the above issues, multimodal data fusion [9–12] has become one of the most promising application directions of deep learning in remote sensing, especially the combined utilization of SAR and optical data, because these two modalities differ completely in geometric and radiometric appearance [13–17].
However, the existing optical-SAR fusion techniques mainly concentrate on the matching problem. The proposed optical-SAR image matching methods can be divided into three types: signal-based, hand-crafted feature-based, and deep learning-based approaches. Among the signal-based similarity measures, cross-correlation (CC) [18] and mutual information (MI) [19, 20] have been widely applied to optical-SAR matching tasks. Since MI is an intensity-based statistical measure with good adaptability to geometric and radiometric changes, it has extensively outperformed CC in optical-SAR image matching. Nevertheless, signal-based approaches, which do not exploit any local structure information, are not robust and accurate enough for matching multisensor images. Feature-based methods commonly utilize invariant key points and feature descriptors. Their better results may stem from the feature descriptors, which are less sensitive to geometric and radiometric changes. Many traditional hand-crafted methods have been proposed for optical-SAR image matching, such as SIFT [21], SAR-SIFT [22], and HOPC [23, 24]. However, considering the high divergence between SAR and optical images and the required computing power, hand-crafted feature-based matching approaches offer limited room for further improvement. Owing to the powerful feature extraction and representation learning ability of deep networks, exploiting convolutional neural networks to extract deep features has achieved high matching accuracy. The mainstream architecture for optical-SAR image matching is the Siamese network [25–28], which is composed of two identical convolutional streams. The dual network extracts deep feature information from the input image pairs, so the deep features lie in the same space and can be measured under the same metric. However, the Siamese network has so far been applied only to the optical-SAR image matching problem, with no subsequent work on joint optical-SAR image interpretation.
Based on the analysis above, we propose a novel bridge neural network- (BNN-) based optical-SAR image joint intelligent interpretation framework, which utilizes BNN to enhance the general feature embedding of optical and SAR images and thus improve the accuracy and broaden the application scenarios of specific optical-SAR image joint intelligent interpretation tasks. Completely different from the Siamese network, BNN contains two independent feature extraction networks and projects the optical and SAR images into a subspace to learn the desired common representation, where features can be measured with the Euclidean distance. The proposed framework is shown in Figure 1. BNN is trained on an optical-SAR image matching dataset to learn the common representation of optical and SAR images, so that the BNN model can be transferred to the feature extraction module for fine-tuning the interpretation model on optical-SAR/SAR/optical image interpretation datasets.
Further, to verify the effectiveness and superiority of our proposed framework and promote research on deep learning-based optical-SAR image fusion, it is essential to obtain a large number of perfectly aligned optical-SAR images. The existing optical-SAR image matching datasets either lack scene diversity due to the huge difficulty of pixel-level matching between optical and SAR images, have a low resolution limited by the remote sensing satellites, or cover only a single area, and thus cannot fully exploit the relevance of optical and SAR images. We therefore publish the QXS-SAROPT dataset, which contains 20,000 high-resolution optical-SAR patch pairs from multiple scenes. Specifically, the SAR images are collected from the Gaofen-3 satellite [31], and the corresponding optical images are from Google Earth. These images cover areas of San Diego, Shanghai, and Qingdao. The QXS-SAROPT dataset is publicly available under the open-access license CC BY at https://github.com/yaoxu008/QXS-SAROPT.
On this basis, we conduct experiments on optical-to-SAR crossmodal object detection to demonstrate the effectiveness and superiority of our framework. In particular, based on the QXS-SAROPT dataset, our framework achieves an accuracy of up to 96% on four benchmark SAR ship detection datasets.
The contributions of this paper can be summarized as follows:
(i) We propose a BNN-based optical-SAR image joint intelligent interpretation framework, which can effectively improve the generic feature extraction capability for optical and SAR images, thereby improving the accuracy and broadening the application scenarios of specific intelligent interpretation tasks for optical-SAR/SAR/optical images
(ii) We publish the optical-SAR matching dataset QXS-SAROPT, which contains 20,000 optical-SAR image pairs from multiple scenes at a high resolution of 1 meter, to support the joint interpretation of optical and SAR images
(iii) The BNN-based optical-SAR image joint intelligent interpretation framework is applied to SAR ship detection and achieves high accuracy on four SAR ship detection benchmark datasets
In this section, the details of the bridge neural network (BNN) and the proposed BNN-based joint interpretation framework are introduced.
2.1. Bridge Neural Network
Given a SAR-optical image matching dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$, where $X = \{x_i\}$ is the set of SAR images and $Y = \{y_i\}$ is the set of corresponding optical images, we consider samples $(x_i, y_i)$, whose image pairs come from the same region, as positive samples and mismatched samples $(x_i, y_j)$, $i \neq j$, as negative samples. Different from the Siamese network, BNN contains two separate feature extraction networks: a SAR network $f_s(\cdot; \theta_s)$ and an optical network $f_o(\cdot; \theta_o)$ with parameters $\theta_s$ and $\theta_o$, respectively, extracting features from SAR images $x$ and optical images $y$. To decrease the feature dimension, following the feature extraction backbone, we employ a convolution layer and an average-pooling layer on the feature map. Finally, a linear layer with a sigmoid activation function projects the feature map into the $n$-dimensional common feature representations $f_s(x), f_o(y) \in [0, 1]^n$. Then, BNN outputs the Euclidean distance between $f_s(x)$ and $f_o(y)$ to measure the relevance of the input SAR-optical image pair, which is described as
$$ d(x, y) = \sqrt{\frac{1}{n} \sum_{k=1}^{n} \left( f_s(x)_k - f_o(y)_k \right)^2 }, $$
where $n$ is the dimension of $f_s(x)$ and $f_o(y)$. The Euclidean distance indicates whether the input data pair has a potential relation: the closer the distance, the more relevant the pair. Specifically, the distance between positive samples tends to 0, while the distance between negative samples is close to 1. Therefore, the losses on the positive samples $P$ and the negative samples $N$ are set as follows:
$$ \mathcal{L}_p = \frac{1}{|P|} \sum_{(x, y) \in P} d(x, y)^2, \qquad \mathcal{L}_n = \frac{1}{|N|} \sum_{(x, y) \in N} \left( d(x, y) - 1 \right)^2. $$
Hence, the problem of learning the common representations of SAR-optical images is cast as a binary classification problem. The overall loss of BNN can be written as
$$ \mathcal{L} = \mathcal{L}_p + \lambda \mathcal{L}_n, $$
where $\lambda$ is a hyperparameter that balances the weights of the positive and negative losses. Then, the best weights are obtained by solving the optimization problem
$$ (\theta_s^*, \theta_o^*) = \arg\min_{\theta_s, \theta_o} \mathcal{L}(\theta_s, \theta_o). $$
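As a minimal NumPy sketch of the distance and losses above (function and variable names are our own, not from the paper's code; the feature vectors are assumed to be sigmoid outputs so that the normalized distance lies in [0, 1]):

```python
import numpy as np

def bnn_distance(f_sar, f_opt):
    """Normalized Euclidean distance between n-dimensional feature vectors.
    With sigmoid outputs in [0, 1], the distance also lies in [0, 1]."""
    f_sar = np.asarray(f_sar, dtype=float)
    f_opt = np.asarray(f_opt, dtype=float)
    n = f_sar.shape[-1]
    return np.sqrt(np.sum((f_sar - f_opt) ** 2, axis=-1) / n)

def bnn_loss(d_pos, d_neg, lam=1.0):
    """Overall BNN loss: positive pairs are pulled toward distance 0,
    negative pairs are pushed toward distance 1; lam balances the two terms."""
    loss_pos = np.mean(np.asarray(d_pos, dtype=float) ** 2)
    loss_neg = np.mean((np.asarray(d_neg, dtype=float) - 1.0) ** 2)
    return loss_pos + lam * loss_neg
```

For example, identical feature vectors give distance 0, while an all-zeros versus all-ones pair gives the maximal distance 1, matching the target values for positive and negative samples.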
2.2. Optical-SAR Image Joint Intelligent Interpretation
Since BNN projects optical-SAR image patches into a common feature subspace, the model can well mine the correlation between optical and SAR images and thus improve the feature learning ability for optical-SAR images. Based on BNN, we propose the optical-SAR image joint interpretation framework, which can jointly utilize optical and SAR features enhanced by BNN and improve the performance of specific interpretation tasks on optical-SAR/SAR/optical images. As depicted in Figure 3, we present two different usage scenarios of the proposed BNN-based optical-SAR image joint intelligent interpretation framework.
As shown in Figure 3(a), for applications of optical-SAR fusion intelligent interpretation, such as object detection, classification, and segmentation, our framework can learn the common representations of SAR and optical images and enhance their feature learning ability for better interpretation performance under all-weather, full-time conditions. As for applications of crossmodal intelligent interpretation, see Figure 3(b), our framework first utilizes BNN to project optical and SAR images into a common feature space and mines their complementary and correlated information through optical-SAR image matching. Then, the feature extraction module of BNN for the image modality to be interpreted is used as the pretrained model for feature embedding in the specific crossmodal intelligent interpretation task. In this way, plentiful complementary features transferred from images of the other modality during the learning of the common feature space can be used to enhance the feature embeddings of the modality to be interpreted. Benefiting from the enhanced feature embeddings, our framework can effectively improve the interpretation performance on crossmodal SAR/optical images.
Take SAR ship detection as an example. SAR ship detection in complex scenes is a highly challenging task, and CNN-based SAR ship detection methods have drawn considerable attention because of their powerful feature embedding ability. Due to the scarcity of labeled SAR images, pretraining is usually adopted to support these CNN-based SAR ship detectors. Since SAR images are completely different from optical images, directly leveraging ImageNet [33] pretraining hardly yields a good ship detector. However, our proposed framework can transfer rich texture features from optical images to SAR images to obtain a specific feature extraction model with better SAR feature embedding capabilities. Specifically, our proposed framework derives a SAR feature embedding operator from common representation learning based on the optical-SAR image matching task using BNN.
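In practice, this transfer amounts to keeping only the SAR-branch backbone weights of the trained BNN and loading them into the detector's backbone. A minimal sketch of that checkpoint surgery is shown below; the key prefix `sar_net.backbone.` is a hypothetical naming convention, not the paper's actual checkpoint layout:

```python
def extract_sar_backbone(bnn_state_dict, branch_prefix="sar_net.backbone."):
    """Keep only the SAR-branch backbone weights from a trained BNN
    checkpoint and strip the branch prefix, so the remaining keys line up
    with a detector backbone (e.g., ResNet50 in faster R-CNN) for fine-tuning."""
    return {
        key[len(branch_prefix):]: value
        for key, value in bnn_state_dict.items()
        if key.startswith(branch_prefix)
    }
```

The optical branch and the matching head are discarded; only the SAR feature extractor, enhanced by common representation learning, initializes the ship detector.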
3. QXS-SAROPT Dataset
To fully exploit the relevance of optical and SAR images and verify the effectiveness of our proposed framework, a large, perfectly aligned optical-SAR image dataset with diverse high-resolution scenes is needed. Considering that the existing optical-SAR image matching datasets either lack scene diversity or have a low resolution, we have published the QXS-SAROPT dataset, which contains 20,000 pairs of SAR and optical image patches with a high resolution of 1 m extracted from multiple Gaofen-3 [31] and Google Earth scenes. As far as we know, QXS-SAROPT is the first dataset to provide high-resolution coregistered SAR and optical satellite image patches covering three big port cities in the world: San Diego, Shanghai, and Qingdao. The coverage of these images is shown in Figure 4. Algorithm 1 shows the procedure for constructing the QXS-SAROPT dataset. Finally, 20,000 high-quality image patch pairs are preserved in our dataset, some of which are shown in Figure 5 as examples.
To verify the effectiveness and superiority of our BNN-based optical-SAR image joint intelligent interpretation framework, the framework is applied to a typical optical-to-SAR crossmodal object detection task.
We conduct optical-to-SAR crossmodal object detection tasks on four benchmark SAR ship detection datasets. We select two representative CNN-based ship detection methods, faster R-CNN [34] and YOLOv3 [35], as the benchmarks in our work. We first utilize BNN to pretrain the feature extraction modules of the selected SAR ship detectors, namely, the ResNet50 [8] backbone for faster R-CNN and the Darknet53 [35] backbone for YOLOv3, based on QXS-SAROPT, with 14,000 image pairs as the training set and the remaining 6,000 image pairs as the testing set. Then, the pretrained model, whose SAR feature embedding capabilities are improved by common representation learning from optical and SAR images, is used to initialize and fine-tune the corresponding SAR ship detector.
The AIR-SARShip-1.0 dataset consists of 31 high-resolution large-scale images from the Gaofen-3 satellite; 21 images are randomly selected as training and validation data, and the remaining 10 images are used for testing. The AIR-SARShip-2.0 dataset includes 300 images with resolutions ranging from 1 m to 5 m from the Gaofen-3 satellite; 210 images are randomly selected as training and validation data, and the remaining 90 images are used for testing. Images in the AIR-SARShip-1.0 and AIR-SARShip-2.0 datasets are cropped into overlapping patches. The HRSID dataset contains 5604 SAR images and is divided into training and testing sets. The resolutions of images in the SSDD dataset range from 1 m to 15 m, and its 1160 images are divided into 928 images for training and 232 images for testing.
For faster R-CNN [34], we directly input each image of AIR-SARShip-1.0, AIR-SARShip-2.0, and HRSID into the network and resize the images in SSDD. As for YOLOv3 [35], all the images are resized to a fixed input size. Because the multiscaling training strategy is necessary for YOLOv3, no data augmentation except for scaling is applied.
4.2. Parameter Settings
4.2.1. Optical-SAR Image Matching
The BNN models with ResNet50 [8] and Darknet53 [35] as backbones are both trained with SGD for 200 epochs with a batch size of 20. The initial learning rate is set to 0.01 and then divided by a factor of 2 at the 30th and 100th epochs. The SAR and optical images are encoded into a common feature representation subspace. The positive-to-negative sample ratio and the adjusting factor $\lambda$ are kept fixed throughout training.
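The step schedule described above (initial rate 0.01, halved at the 30th and 100th epochs) can be sketched as a simple function; the name `matching_lr` is ours:

```python
def matching_lr(epoch, base_lr=0.01, milestones=(30, 100), factor=0.5):
    """Step learning-rate schedule for BNN pretraining: the rate is
    multiplied by `factor` at each milestone epoch that has been reached."""
    lr = base_lr
    for milestone in milestones:
        if epoch >= milestone:
            lr *= factor
    return lr
```

So epochs 0-29 train at 0.01, epochs 30-99 at 0.005, and epochs 100-199 at 0.0025.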
4.2.2. SAR Ship Detectors
For the faster R-CNN [34] benchmark, all models are trained with SGD for 14 epochs with a weight decay of 0.0001 and a momentum of 0.9, and the batch size is set to 8. The initial learning rate is 0.02 and is divided by 10 at the 8th and 12th epochs. For the YOLOv3 [35] benchmark, all models are trained with SGD for 240 epochs with 12 images per minibatch. The initial learning rate is set to 0.001 and is divided by 10 at the 160th and 200th epochs. The IoU threshold is set to 0.5 during training and testing for rigorous filtering of bounding boxes with low precision. Warm-up is used for the first 500 iterations of the training stage to avoid gradient explosion. The same settings are applied in all experiments for a fair comparison.
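Combining the YOLOv3 settings above, the learning rate over training can be sketched as a warm-up followed by step decay. The linear warm-up shape and its starting value are assumptions; the paper only states that warm-up covers the first 500 iterations:

```python
def detector_lr(iteration, iters_per_epoch, base_lr=0.001,
                warmup_iters=500, milestones=(160, 200), gamma=0.1):
    """YOLOv3-benchmark-style schedule: linear warm-up over the first
    500 iterations (an assumed ramp shape) to avoid gradient explosion,
    then a x0.1 step decay at the 160th and 200th epochs."""
    if iteration < warmup_iters:
        # Ramp linearly from near zero up to base_lr.
        return base_lr * (iteration + 1) / warmup_iters
    lr = base_lr
    epoch = iteration // iters_per_epoch
    for milestone in milestones:
        if epoch >= milestone:
            lr *= gamma
    return lr
```

After warm-up the rate stays at 0.001 until epoch 160, drops to 0.0001, and drops again to 0.00001 at epoch 200.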
4.3.1. Optical-SAR Image Matching
Table 1 shows the results of optical-SAR image matching using BNN [32] on the QXS-SAROPT dataset, which suggests that BNN achieves outstanding performance with both the ResNet50 [8] and Darknet53 [35] backbones. The high matching accuracy achieved with both backbones demonstrates that BNN can learn useful common representations and well predict the relationship between SAR and optical images. Image pairs with different matching results by BNN are shown in Figure 6.
To explore the relationship between the training set size and matching results, we randomly select 4000 and 8000 optical-SAR image pairs as the training sets to train BNN on ResNet50. Table 2 shows the results of three sizes of training sets, which indicates that BNN can learn a good common representation even with a small number of training image pairs, and more training data can lead to better matching results. Besides, we show the accuracy, precision, and recall curves of BNN with 8000 image pairs as the training set in Figure 7, which show the convergence process of BNN.
4.3.2. SAR Ship Detection
Table 3 shows the average precision (AP) of the SAR ship detection results on four ship detection datasets using the ImageNet pretraining-based SAR ship detector (ImageNet-SSD) and our BNN-based SAR ship detector (BNN-based-SSD) pretrained on QXS-SAROPT. As shown in Table 3, compared with ImageNet-SSD, the AP of the detection results is generally improved by BNN-based-SSD. In particular, on the SAR ship detection dataset AIR-SARShip-1.0 [36], performance improvements of 1.32% and 1.24% are achieved using the two-stage detection benchmark faster R-CNN [34] and the one-stage detection benchmark YOLOv3 [35], respectively. The average precision of ImageNet-SSD and BNN-based-SSD during training on the test sets of HRSID and AIR-SARShip-2.0 with the two detectors is displayed in Figure 8. Taking YOLOv3 on the AIR-SARShip-2.0 dataset as an example, BNN-based-SSD achieves higher average precision than ImageNet-SSD throughout the training process, indicating the significant improvement of our BNN-based-SSD. Similar phenomena are also observed on the other datasets for both benchmarks, demonstrating the superiority of our BNN-based optical-SAR image joint intelligent interpretation framework. All these improvements prove that our framework can enhance the feature extraction capability of SAR ship detectors by common representation learning with BNN and thus boost ship detection in SAR images even without additional annotation information of ships. To qualitatively compare the two methods, we visualize some detection results in Figure 9, which shows that our BNN-based-SSD clearly outperforms ImageNet-SSD and significantly reduces missed detections and false alarms.
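For reference, the AP metric reported in Table 3 can be computed from the ranked detections roughly as follows. This is a generic all-point AP sketch under the assumption that each detection has already been labeled true/false positive via the 0.5 IoU threshold; the exact evaluation protocol of each benchmark may differ:

```python
import numpy as np

def average_precision(scores, is_true_positive, num_gt):
    """All-point AP: sort detections by confidence, accumulate
    precision/recall, and integrate precision over recall increments."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / num_gt
    precision = cum_tp / (cum_tp + cum_fp)
    # Rectangular integration of the precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recall, precision):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

For example, two correct detections ranked above one false alarm, with two ground-truth ships, yield an AP of 1.0, since full recall is reached before any precision loss.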
5. Conclusion and Future Work
In this paper, we propose a bridge neural network- (BNN-) based optical-SAR image joint intelligent interpretation framework, which can effectively improve the generic feature extraction capability for optical and SAR images by mining their feature correlation through matching tasks with BNN, and thereby improve the accuracy and broaden the application scenarios of specific optical-SAR image joint intelligent interpretation tasks. To fully exploit the correlation between optical and SAR images and ensure the strong representation learning ability of BNN, we publish the QXS-SAROPT dataset containing 20,000 optical-SAR patch pairs from multiple scenes at a high resolution of 1 meter. Experimental results on the optical-to-SAR crossmodal object detection task demonstrate the effectiveness and superiority of our framework. Notably, based on the QXS-SAROPT dataset, our framework achieves an accuracy of up to 96% in SAR ship detection.
This research is in its early stage. In the future, we will consider exploring the performance of the proposed framework on optical-SAR fusion intelligent interpretation tasks, such as classification of land use and land cover and building segmentation. To support the research in intelligent interpretation fusing optical-SAR data, we will add label annotations and positions for scenes/objects of interest to every patch pair of the QXS-SAROPT dataset. In addition, to further explore the potential value of the QXS-SAROPT dataset, we are going to release an improved version of the dataset in the future, which will cover more land areas with versatile scenes and different sized patch pairs suitable for various optical-SAR data fusion tasks.
At a more macroscopic level, there are plentiful aspects that deserve deeper investigation. Currently, our approach to interpreting multimodal remote sensing images is verified by experiments on the ground. However, the onboard processing of remote sensing images will be a trend in the future.
Unfortunately, running deep learning models tends to be a high-power-consumption process, let alone under the tight constraints of onboard memory and computing resources. In this case, deep learning model compression is an effective and necessary technique to achieve onboard processing in our future work. The purpose of model compression is to obtain a model with fewer parameters, less computation, and a smaller RAM footprint, without significantly diminished accuracy. Popular model compression methods include pruning, quantization, low-rank approximation and sparsity, and knowledge distillation [44, 45].
Furthermore, the formation of SAR images from echoes is currently the first, inevitable step of SAR data processing, based on algorithms such as back projection, compressed sensing, or other signal processing. Therefore, the SAR application pipeline consists of multiple operations and a variety of complex calculations. In our future work, we will attempt to develop a deep learning framework that performs an integrated SAR processing workflow end to end, from the reflected echoes to the interpretation results. This will help reduce the complexity of the onboard processor and further improve processing efficiency.
The QXS-SAROPT dataset released by this work is publicly available at https://github.com/yaoxu008/QXS-SAROPT under open access license CCBY. Four SAR ship detection datasets used in this paper are publicly available. AIR-SARShip-1.0 and AIR-SARShip-2.0 can be accessed at http://radars.ie.ac.cn/web/data/getData?dataType=SARDataset. HRSID is available at https://github.com/chaozhong2010/HRSID. SSDD can be downloaded at https://github.com/CAESAR-Radi/SAR-Ship-Dataset.
Conflicts of Interest
The authors declare that there are no conflicts of interest.
X. Xiang and M. Huang conceived the idea of this study and supervised the study. M. Huang, Y. Xu, L. Qian, and W. Bao conducted the experiments. M. Huang and L. Qian performed data analysis. M. Huang, Y. Xu, and L. Qian contributed to the writing of the manuscript. L. Qian, W. Shi, Y. Zhang, N. Wang, and X. Liu participated in the construction of the QXS-SAROPT dataset. Meiyu Huang, Yao Xu, and Lixin Qian contributed equally to this work.
This work was supported by the Beijing Nova Program of Science and Technology under Grant Z191100001119129 and the National Natural Science Foundation of China under Grant 61702520.
- L. Zhang, L. Zhang, and B. Du, “Deep learning for remote sensing data: a technical tutorial on the state of the art,” IEEE Geoscience and Remote Sensing Magazine, vol. 4, no. 2, pp. 22–40, 2016.
- X. X. Zhu, D. Tuia, L. Mou et al., “Deep learning in remote sensing: a comprehensive review and list of resources,” IEEE Geoscience and Remote Sensing Magazine, vol. 5, no. 4, pp. 8–36, 2017.
- J. E. Ball, D. T. Anderson, and C. S. Chan, “Comprehensive survey of deep learning in remote sensing: theories, tools, and challenges for the community,” Journal of Applied Remote Sensing, vol. 11, no. 4, 2017.
- G. Tsagkatakis, A. Aidini, K. Fotiadou, M. Giannopoulos, A. Pentari, and P. Tsakalides, “Survey of deep-learning approaches for remote sensing observation enhancement,” Sensors, vol. 19, no. 18, article 3929, 2019.
- T. N. Sainath, A.-r. Mohamed, B. Kingsbury, and B. Ramabhadran, “Deep convolutional neural networks for LVCSR,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 8614–8618, Vancouver, BC, Canada, 2013.
- K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations, San Diego, CA, USA, 2015.
- J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, 2015.
- K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, 2016.
- M. Schmitt and X. X. Zhu, “Data fusion and remote sensing: an ever-growing relationship,” IEEE Geoscience and Remote Sensing Magazine, vol. 4, no. 4, pp. 6–23, 2016.
- Z. Zhang, G. Vosselman, M. Gerke, D. Tuia, and M. Y. Yang, “Change detection between multimodal remote sensing data using siamese cnn,” 2018, https://arxiv.org/abs/1807.09562.
- P. Feng, Y. Lin, J. Guan et al., “Embranchment cnn based local climate zone classification using sar and multispectral remote sensing data,” in IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, pp. 6344–6347, Yokohama, Japan, 2019.
- Z. Zhang, G. Vosselman, M. Gerke, C. Persello, D. Tuia, and M. Y. Yang, “Detecting building changes between airborne laser scanning and photogrammetric data,” Remote sensing, vol. 11, no. 20, article 2417, 2019.
- M. Schmitt, F. Tupin, and X. X. Zhu, “Fusion of sar and optical remote sensing data–challenges and recent trends,” in 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 5458–5461, Fort Worth, TX, USA, 2017.
- M. Schmitt, L. Hughes, and X. Zhu, “The SEN1-2 dataset for deep learning in SAR-optical data fusion,” ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. IV-1, no. 1, pp. 141–146, 2018.
- Q. Feng, J. Yang, D. Zhu et al., “Integrating multitemporal sentinel-1/2 data for coastal land cover classification using a multibranch convolutional neural network: a case of the yellow river delta,” Remote Sensing, vol. 11, no. 9, article 1006, 2019.
- S. C. Kulkarni and P. P. Rege, “Pixel level fusion techniques for sar and optical images: a review,” Information Fusion, vol. 59, pp. 13–29, 2020.
- X. Li, L. Lei, Y. Sun, M. Li, and G. Kuang, “Multimodal bilinear fusion network with Second-Order attention-based channel selection for land cover classification,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 13, pp. 1011–1026, 2020.
- W. Burger and M. J. Burge, Principles of Digital Image Processing: Core Algorithms, 2010, Springer Science & Business Media.
- J. Walters-Williams and Y. Li, “Estimation of mutual information: a survey,” in Rough Sets and Knowledge Technology. RSKT 2009, P. Wen, Y. Li, L. Polkowski, Y. Yao, S. Tsumoto, and G. Wang, Eds., Lecture Notes in Computer Science, pp. 389–396, Springer, Berlin, Heidelberg, 2009.
- S. Suri and P. Reinartz, “Mutual-information-based registration of terrasar-x and ikonos imagery in urban areas,” IEEE Transactions on Geoscience and Remote Sensing, vol. 48, no. 2, pp. 939–949, 2010.
- D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
- F. Dellinger, J. Delon, Y. Gousseau, J. Michel, and F. Tupin, “Sar-sift: a sift-like algorithm for sar images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 1, pp. 453–466, 2015.
- Y. Ye and L. Shen, “HOPC: a novel similarity metric based on geometric structural properties for multi-modal remote sensing image matching,” ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. III-1, pp. 9–16, 2016.
- Y. Ye, J. Shan, L. Bruzzone, and L. Shen, “Robust registration of multimodal remote sensing images based on structural similarity,” IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 5, pp. 2941–2958, 2017.
- S. Zagoruyko and N. Komodakis, “Learning to compare image patches via convolutional neural networks,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4353–4361, Boston, MA, USA, 2015.
- N. Merkle, W. Luo, S. Auer, R. Müller, and R. Urtasun, “Exploiting deep matching and sar data for the geo-localization accuracy improvement of optical satellite images,” Remote Sensing, vol. 9, no. 6, article 586, 2017.
- L. Mou, M. Schmitt, Y. Wang, and X. X. Zhu, “A cnn for the identification of corresponding patches in sar and optical imagery of urban scenes,” in 2017 Joint Urban Remote Sensing Event (JURSE), pp. 1–4, Dubai, United Arab Emirates, 2017.
- L. H. Hughes, M. Schmitt, L. Mou, Y. Wang, and X. X. Zhu, “Identifying corresponding patches in sar and optical images with a pseudo-siamese cnn,” IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 5, pp. 784–788, 2018.
- Y. Wang and X. X. Zhu, “The sarptical dataset for joint analysis of sar and optical image in dense urban area,” in IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, pp. 6840–6843, Valencia, Spain, 2018.
- J. Shermeyer, D. Hogan, J. Brown et al., “Spacenet 6: Multi-sensor all weather mapping dataset,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 196-197, Seattle, WA, USA, 2020.
- Q. Zhang, “System design and key technologies of the gf-3 satellite,” Acta Geodaetica et Cartographica Sinica, vol. 46, no. 3, pp. 269–277, 2017.
- Y. Xu, X. Xiang, and M. Huang, “Task-driven common representation learning via bridge neural network,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 5573–5580, 2019.
- O. Russakovsky, J. Deng, H. Su et al., “Imagenet large scale visual recognition challenge,” International Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015.
- S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: towards real-time object detection with region proposal networks,” Advances in Neural Information Processing Systems, pp. 91–99, 2015.
- J. Redmon and A. Farhadi, “Yolov3: an incremental improvement,” 2018, https://arxiv.org/abs/1804.02767.
- S. Xian, W. Zhirui, S. Yuanrui, D. Wenhui, Z. Yue, and F. Kun, “AIR-SARShip-1.0: high-resolution SAR ship detection dataset,” Journal of Radars, vol. 8, no. 6, pp. 852–862, 2019.
- “2020 gaofen challenge on automated high-resolution earth observation image interpretation,” 2020, http://en.sw.chreos.org.
- S. Wei, X. Zeng, Q. Qu, M. Wang, H. Su, and J. Shi, “Hrsid: A high-resolution sar images dataset for ship detection and instance segmentation,” IEEE Access, vol. 8, pp. 120234–120254, 2020.
- J. Li, C. Qu, and J. Shao, “Ship Detection in Sar Images Based on an Improved faster r-Cnn,” in 2017 SAR in Big Data Era: Models, Methods and Applications (BIGSARDATA), pp. 1–6, Beijing, China, 2017.
- S. Han, J. Pool, J. Tran, and W. J. Dally, “Learning both weights and connections for efficient neural network,” Advances in Neural Information Processing Systems, MIT Press, 2015.
- S. Han, H. Mao, and W. J. Dally, “Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding,” in Proceedings of International Conference on Learning Representations, San Juan, Puerto Rico, 2016.
- M. Jaderberg, A. Vedaldi, and A. Zisserman, “Speeding up Convolutional Neural Networks with Low Rank Expansions,” in Proceedings of the British Machine Vision Conference, University of Nottingham, UK, 2014.
- C. Bucilua, R. Caruana, and A. Niculescumizil, “Model compression,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 535–541, 2006.
- G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” Advances in Neural Information Processing Systems, MIT Press, 2014.
- L. A. Gorham and L. J. Moore, “SAR image formation toolbox for MATLAB,” in Algorithms for Synthetic Aperture Radar Imagery XVII, vol. 7699, International Society for Optics and Photonics, 2010.
- R. Baraniuk and P. Steeghs, “Compressive radar imaging,” in 2007 IEEE Radar Conference, pp. 128–133, Waltham, MA, USA, 2007.
Copyright © 2021 Meiyu Huang et al. Exclusive Licensee Beijing Institute of Technology Press. Distributed under a Creative Commons Attribution License (CC BY 4.0).