
Research Article | Open Access

Volume 2022 |Article ID 9805707 |

Yanru Sun, Zongxia Xie, Yanhong Chen, Qinghua Hu, "Accurate Solar Wind Speed Prediction with Multimodality Information", Space: Science & Technology, vol. 2022, Article ID 9805707, 13 pages, 2022.

Accurate Solar Wind Speed Prediction with Multimodality Information

Received 30 Mar 2022
Accepted 10 Jul 2022
Published 10 Aug 2022


When the solar wind reaches the Earth, it can cause geomagnetic storms, disrupt short-wave communications, and threaten the safety of infrastructure such as power grids and oil pipelines. Accurate prediction of the solar wind speed allows adequate preparations to be made, avoiding wasted resources and disruption to people's lives. Most existing methods use only single-modality data as input and do not consider the complementary information between different modalities. This paper proposes a multimodality prediction (MMP) method that jointly learns vision and sequence information in a unified end-to-end framework for solar wind speed prediction. MMP includes three modules: Vmodule, Tmodule, and a Fusion module. Vmodule, which uses a pretrained GoogLeNet, learns visual representations from extreme ultraviolet (EUV) images. Tmodule, which combines a one-dimensional CNN with bidirectional long short-term memory (BiLSTM), learns sequence representations from multivariate time series. Finally, a multimodality fusion method is applied to improve the overall performance. Our experiments use the EUV images observed by the Solar Dynamics Observatory (SDO) satellite and the OMNIWeb dataset measured at Lagrangian point 1 (L1). Comparative experiments show that the proposed MMP achieves the best performance on many metrics. The ablation experiments also verify the validity of each module and the rationality of the hyperparameter settings.

1. Introduction

As more and more high-tech systems are exposed to the space environment, space weather prediction can provide better protection for these devices [1, 2]. In the solar system, space weather is mainly influenced by solar wind conditions. The solar wind is a stream of supersonic charged plasma particles ejected from the Sun that affects the Earth, causing phenomena such as auroras and geomagnetic storms [3]. Accurate prediction of solar wind speed is therefore of great significance to the Earth and humanity.

In recent years, as the quality and quantity of observation equipment have improved, a large amount of data has been generated, and more and more methods have emerged that use these data to improve the accuracy of solar wind speed prediction. From the perspective of input data, they can be divided into two categories: methods based on sequence data and methods based on images [4].

Sequence data reflect how a phenomenon changes over time, so historical time series are a natural input for solar wind speed prediction. Wang et al. [5] propose the Wang-Sheeley-Arge (WSA) model, the most widely used solar wind prediction model. This method is based on the negative correlation between the solar wind speed at 1 AU and the expansion factor of the coronal magnetic field on the source surface. By adding angular distances from coronal hole boundaries, Arge et al. [6] give the current most common form of the function. Wintoft et al. [7, 8] propose a method based on machine learning to predict solar wind speed up to 3 days in advance. They adopt the potential field model to extend the photospheric field of the magnetic map to 2.5 solar radii and obtain the flux tube expansion coefficient. Then, the time series of the source surface magnetic field is input into an RBF neural network to predict the daily average solar wind speed. The correlation coefficient (CORR) of the best model is 0.58, and the root mean square error (RMSE) is 90 km/s. Liu et al. [9] predict the hourly average solar wind speed using support vector regression (SVR). They use only a few data periods (about four 27-day solar rotation periods) as the model's input and obtain accurate and reliable predictions. Yang et al. [10] propose a hybrid intelligent source surface model (HISS) based on neural networks. The inputs to the model include six features calculated by the potential field source surface (PFSS) model and the solar wind speed observed 27 days earlier. Combining these input features with an artificial neural network, the solar wind speed is predicted four days in advance with a CORR of 0.74 and an RMSE of 68 km/s. Sun et al. [11] propose a two-dimensional attention mechanism (TDAM) that captures the relationships along the feature dimension and the time dimension of multivariate time series data, respectively, and obtain a 24-hour advance prediction with a CORR of 0.78 and an RMSE of 62.88 km/s. Compared with methods based on manual feature extraction, deep learning-based methods [11] save the time and effort of designing suitable features, which usually requires professional knowledge.

In addition, Krieger et al. [12] demonstrate a correlation between high-speed solar wind streams and coronal holes for the first time. Rotter et al. [13] also demonstrate a correlation of up to 0.78 between the coronal hole area in EUV images and the solar wind speed. In EUV images, the position, brightness, and size of coronal holes can be seen directly, which makes these images well suited to extracting coronal hole information. Deep learning has strong advantages in image data processing and is widely used in image classification [14, 15], semantic segmentation [16, 17], and other fields [18, 19]. Some deep learning-based methods have emerged that predict solar wind speed using extreme ultraviolet (EUV) images and magnetic maps as input. Upendran et al. [20] propose the WindNet model, which uses pretrained GoogLeNet layers [21] for feature extraction and long short-term memory (LSTM) [22] for regression to predict the daily average solar wind speed. The model predicts three days in advance and obtains a best-fit correlation of 0.55 ± 0.03 with the observed data and a threat score of 0.357 ± 0.03. Hemapriya et al. [23] propose a convolutional neural network (CNN)-based deep learning model for solar wind speed prediction. They use EUV images from the Atmospheric Imaging Assembly (AIA) at the 193 Å wavelength for training and obtain an RMSE of 76.3 km/s and a CORR of 0.57 for 2018. Similarly to [11], these methods also eliminate the need to extract features manually. However, whether sequence data or EUV images are used as input, only single-modality information is considered, and complementary information between different modalities is not utilized.

Multimodality is one of the rapidly developing research fields in deep learning, which has extensively promoted the development of visual question answering (VQA) [18, 24], visual commonsense reasoning (VCR) [25, 26], speech emotion recognition (SER) [27], and so on [28, 29]. Modal fusion is mainly divided into early fusion and late fusion, suitable for information fusion from different sources. These methods can make full use of complementary information between different modalities to improve the performance of downstream tasks. For example, VL-BERT [30] and VideoBERT [31] effectively aggregate the multimodal information in both the visual and linguistic domains and obtain good performance on several downstream tasks. STSER [32] achieves good accuracy of speech emotion recognition by integrating speech and text information. Similarly, different modalities of solar wind data contain different information, as shown in Figure 1. EUV images of the Sun contain static information such as coronal hole position and brightness. When some large coronal holes appear near the center of the Sun, they eject large amounts of charged particles toward the Earth, creating high-speed solar wind. In comparison, time series data provide more accurate information about historical trends of solar wind speed. When historical period solar wind speed data show an upward trend, it is highly likely to produce high-speed solar wind. We can use the complementary information of the two modalities to infer whether a high-speed solar wind will occur.

Inspired by this, we develop the multimodality prediction model (MMP), a dual-flow structure for solar wind speed prediction, as shown in Figure 2. First, Vmodule, composed of pretrained GoogLeNet layers [21], obtains the visual representation from the EUV image. For multivariate time series data, we propose Tmodule, which learns a sequence representation to assist prediction. Finally, the two representations are combined by late fusion to exploit their complementary information and improve performance. Compared with traditional methods, MMP not only integrates the complementary information of different modalities for prediction but also does not require manual feature extraction. Comparison with other models on publicly available datasets from 2011 to 2017 shows that our proposed MMP achieves the best results, and a large number of ablation experiments verify the necessity of each module of MMP.

The rest of the paper is structured as follows: Section 2 introduces the structure of the proposed model and its details. Section 3 gives the dataset description and preprocessing method. Section 4 then presents the comparison between the MMP model and the baseline models, together with the results of the ablation experiments. We conclude in Section 5.

2. Materials and Methods

This section introduces the proposed MMP framework combining EUV image and multivariate time series for solar wind speed prediction. We first describe the overall structure of MMP. Then, the structures of vision feature extractor, Vmodule, and time series encoder, Tmodule, are introduced, respectively. Finally, the design of the multimodality fusion predictor is presented.

2.1. Overview Structure of MMP

In order to solve the problem of solar wind speed prediction, a multimodality solar wind speed prediction method based on a dual-flow model is proposed in this paper. In this method, the information complementarity between EUV images and multivariate time series data is used to improve the prediction accuracy. The overview structure of the proposed MMP framework is shown in Figure 2.

Compared with the single-modality methods [10, 23], MMP not only can capture the location and size of coronal holes from EUV images but also learn implicit trend information from historical data. It can use the information of the two modalities to complement each other to improve the prediction performance.

As shown in Figure 2, image data and sequence data are processed by the vision feature extractor, Vmodule, and the time series encoder, Tmodule, respectively. After feature extraction, the two feature vectors are concatenated into one vector for multimodality fusion. Finally, the prediction is obtained by a multimodality prediction regressor. In this paper, $W$ represents the input length and $H$ represents the time-in-advance of prediction. For example, $(W, H) = (1, 1)$ means that the input includes a 1-day EUV image and a 1-day time series and the output is the prediction for the next day. The resolutions of the EUV images and the time series differ: the 1-day time series includes 24 data points, which contain the trend information. Similarly, $(W, H) = (2, 1)$ means that the input includes 2-day EUV images and a 2-day time series (48 hourly data points) and the output is the speed of the next day.
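The pairing of inputs and target described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function name `build_sample`, zero-based day indexing, contiguous storage of 24 hourly records per day, and the convention that the target is the day H days after the last input day are all assumptions made for the example.

```python
def build_sample(last_day, W, H, hours_per_day=24):
    """Return the index sets of one (W, H) sample whose input ends on `last_day`.

    Assumptions (not from the paper): days are zero-based integers, hourly
    records are stored contiguously with 24 per day, and the target is the
    daily speed H days after the last input day.
    """
    if W < 1 or H < 1:
        raise ValueError("W and H must be positive")
    first_day = last_day - W + 1
    if first_day < 0:
        raise ValueError("not enough history for the requested window")
    return {
        # One EUV image per input day.
        "image_days": list(range(first_day, last_day + 1)),
        # Half-open range [start, stop) of hourly time series records.
        "hour_range": (first_day * hours_per_day, (last_day + 1) * hours_per_day),
        # Day whose daily solar wind speed is predicted.
        "target_day": last_day + H,
    }


sample = build_sample(last_day=9, W=2, H=1)
print(sample)
```

With W = 2 the sample holds two daily images and 48 hourly records, matching the (48 hours data) remark in the text.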

2.2. Vision Feature Extractor

Vision feature extractor, Vmodule, is mainly composed of pretrained layers. We compare the performance with other pretrained models in Section 4, which verifies the effectiveness of pretrained models in capturing image features [33–36]. Similar to Upendran et al. [20], we take GoogLeNet as an example. We use the pretrained GoogLeNet model as a feature extractor to extract EUV image features. We obtain the GoogLeNet weights from

As shown in Figure 2, GoogLeNet [21] introduces a deep convolutional neural network architecture, Inception, and obtained state-of-the-art performance in the ImageNet Large-Scale Visual Recognition Challenge 2014. Although the structure of GoogLeNet is complex, its careful design keeps the amount of computation constant. GoogLeNet's design follows a practical intuition: when we look at things, visual information is first processed at different scales and then aggregated. Accordingly, GoogLeNet extracts cross-scale information from images by using convolutions with different kernel sizes in parallel.

The GoogLeNet model parameters are pretrained on the ImageNet dataset, a large dataset proposed by Deng et al. [37] that is built on the hierarchy provided by WordNet and aims to contain on the order of 50 million cleanly labeled full-resolution images. However, the EUV images in our dataset do not share the same features as these natural images. So instead of freezing the parameters pretrained on ImageNet, we feed GoogLeNet with properly formatted input and finetune all the network parameters on our data. Figure 2 shows the structure of the GoogLeNet model. The inputs of GoogLeNet are the EUV images, which we resize to [256, 256]. In MMP, GoogLeNet acts as a feature extractor. After this module, MMP obtains a set of generic feature embeddings, and the output dimension of the pretrained layer is [32, 1000].

Through the Vmodule, we extract the vision feature representation $R_v$.

2.3. Time Series Encoder

As shown in Figure 2, Tmodule consists of CNN and bidirectional long short-term memory (BiLSTM) to encode sequence data features for assisting prediction. According to [38, 39], there are complicated implicit relationships between different features, so we propose a one-dimensional CNN layer to capture the dependencies between different features. In addition, recurrent neural network (RNN) [40] is suitable for time series prediction due to its time dimension. However, the main problem of RNN is that it is easy to encounter gradient disappearance. To alleviate this problem, LSTM [22] is proposed to learn long-term dependencies and contextual information by introducing a gating mechanism.

We use BiLSTM [41, 42], a variant of LSTM, to learn the long-term dependence of features extracted from the CNN layer. Because of its bidirectional structure, BiLSTM can effectively use both past and future input features. The number of BiLSTM layers is 1, and the length of the input time series in the encoder is $W$. $W = 1$ indicates that the input data is the relevant time series of the day before the current moment. The resolution of the time series is 1 hour, so the input time series then contains 24 data points.

Through the Tmodule composed of the CNN and BiLSTM, we get the feature representation of the time series $R_t$.

2.4. Multimodality Fusion Prediction

In order to obtain better prediction performance, we propose a multimodality fusion predictor, which includes feature fusion and prediction regression. The feature fusion module fuses the extracted vision and sequence features, $R_v$ and $R_t$, and the predictor uses the fused features to make regression predictions. Both modules are designed to improve overall performance.

We adopt the concatenation strategy for feature fusion, directly concatenating the vision and sequence representations to generate a new representation vector. This vector is then fed into the multimodality prediction linear regressor to obtain the final prediction:

$$\hat{y} = \mathbf{w}^{\top} \left( R_v \oplus R_t \right) + b \tag{1}$$

where $\oplus$ means concatenation and $R_v$ and $R_t$ mean the vision representation and the sequence representation, respectively. $\mathbf{w}$ and $b$ are the parameters of the network. Specifically, we resize the EUV image to [256, 256] and set the dimension of each module, including the fully connected layers in MMP, before training. After Vmodule and Tmodule, the dimensions of the vision and sequence embeddings are [32, 1000] and [32, 100], respectively. Through concatenation, the dimension becomes [32, 1100], where 32 is the batch_size hyperparameter. We conduct multimodality fusion regression prediction based on these embeddings.
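The concatenation and linear regression step can be illustrated with a small dependency-free sketch. It assumes a single linear output layer; the random embeddings, the zero weights, and the offset of 400 km/s are placeholders chosen for the example and are not trained values from the paper. Only the dimensions (32, 1000, 100) come from the text.

```python
import random


def fuse_and_predict(vision, sequence, weights, bias):
    """Late fusion by concatenation followed by a linear regressor.

    vision:   per-sample vision embeddings   (batch x d_v, list of lists)
    sequence: per-sample sequence embeddings (batch x d_t, list of lists)
    weights:  flat list of length d_v + d_t
    bias:     scalar offset
    """
    if len(vision) != len(sequence):
        raise ValueError("batch sizes differ")
    predictions = []
    for v, t in zip(vision, sequence):
        fused = v + t  # list concatenation gives d_v + d_t values
        if len(fused) != len(weights):
            raise ValueError("weight vector does not match fused dimension")
        predictions.append(sum(w * x for w, x in zip(weights, fused)) + bias)
    return predictions


random.seed(0)
batch, d_v, d_t = 32, 1000, 100  # dimensions quoted in the text
vision = [[random.gauss(0, 1) for _ in range(d_v)] for _ in range(batch)]
sequence = [[random.gauss(0, 1) for _ in range(d_t)] for _ in range(batch)]
weights = [0.0] * (d_v + d_t)  # placeholder parameters, not trained values
bias = 400.0                   # arbitrary illustrative offset in km/s

preds = fuse_and_predict(vision, sequence, weights, bias)
print(len(preds), len(vision[0] + sequence[0]))
```

The fused vector has 1000 + 100 = 1100 entries per sample, and one scalar speed is produced for each of the 32 samples in the batch.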

2.5. Loss Function

Our model is trained with mean square error loss using the label $y_{t+H}$:

$$\mathcal{L} = \frac{1}{n} \sum \left( y_{t+H} - \hat{y}_{t+H} \right)^2 \tag{2}$$

where the sum runs over the $n$ samples of a batch, $t$ is the current moment, and $t+H$ means the $H$-th day in the future. $y_{t+H}$ and $\hat{y}_{t+H}$ mean the observed and the predicted daily solar wind speed, respectively. Coronal holes change slowly, so the resolution of the EUV images is 1 day. The time series data contain more trend information, and their resolution is 1 hour. The resolution of the output is 1 day, which means we predict the daily solar wind speed $H$ days in advance.

3. Dataset Description and Preprocessing

This section introduces the data used in this paper, including image data and solar wind time series data. Besides, we also introduce the methods of missing data processing and data split.

3.1. SDO/AIA Image

The SDO satellite was launched by NASA in 2010 and has been continuously monitoring solar activity and providing valuable scientific data [4]. The Atmospheric Imaging Assembly (AIA) captures full-disk images of the Sun in seven EUV wavelength bands, two ultraviolet bands, and one visible band. As Yang et al. [10] note, AIA 193 Å is more sensitive to the size and location of coronal holes, so we adopt the AIA 193 Å EUV images taken by the SDO satellite as one of the input modalities. We download the original images from the website with 1-hour resolution. There are some data gaps caused by temporary shutdowns that protect the equipment from high-energy magnetic storms; the longest gap is about 1-2 days. Our dataset contains images taken by AIA from 2011 to 2017 at 512 × 512 resolution at a 1-day sampling frequency, and we replace missing images with adjacent images from the original hourly data. For example, if the EUV image for 2012.1.5 00:00 is missing, we replace it with the EUV image for 2012.1.5 01:00 or 2012.1.4 23:00.
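The replacement rule for missing images can be sketched as a nearest-timestamp lookup. The function name `nearest_available`, the 48-hour search limit, and the tie-breaking rule are assumptions made for this example; the text only states that an adjacent hourly image is used.

```python
from datetime import datetime, timedelta


def nearest_available(target, available, max_gap_hours=48):
    """Pick the timestamp in `available` that is closest to `target`.

    `available` is a collection of datetimes for which an hourly image exists.
    `max_gap_hours` is an assumed search limit (the text reports gaps of about
    1-2 days); None is returned if no image lies within it.
    """
    limit = timedelta(hours=max_gap_hours)
    candidates = [ts for ts in available if abs(ts - target) <= limit]
    if not candidates:
        return None
    # Ties are broken in favour of the earlier timestamp (an arbitrary choice).
    return min(candidates, key=lambda ts: (abs(ts - target), ts))


available = {
    datetime(2012, 1, 4, 23, 0),
    datetime(2012, 1, 5, 2, 0),
}
print(nearest_available(datetime(2012, 1, 5, 0, 0), available))
```

Here the image taken at 23:00 on the previous day is one hour away from the missing midnight image and is therefore preferred over the one taken at 02:00.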

3.2. Solar Wind Data

Yang et al. [39] find that features such as particle temperature and density are correlated with solar wind speed. Similar to Sun et al. [11], we use six related features as inputs for the multivariate time series, as shown in Table 1. In particular, we binarize the ICME list to introduce the ICME information: 1 means that a coronal mass ejection occurs at that moment, and 0 means that it does not. The MMP is trained with the solar wind speed from the OMNI dataset as the target variable. We use cubic B-spline interpolation [43] to fill in missing data. In addition, in order to eliminate the influence of dimension on the results, standard normalization is carried out so that each normalized feature has a mean of 0 and a variance of 1: $x' = (x - \mu)/\sigma$, where $\mu$ is the mean of all sample data and $\sigma$ is the standard deviation of the feature.
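The normalization and the ICME binarization can be illustrated as follows. This is a minimal sketch: the use of the population standard deviation and the encoding of ICME events as (start_hour, end_hour) index pairs are assumptions for the example, since the text does not specify either. The interpolation step is not shown.

```python
import statistics


def standardize(values):
    """Z-score a feature: subtract its mean and divide by its standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    if std == 0:
        raise ValueError("a constant feature cannot be standardized")
    return [(v - mean) / std for v in values]


def binarize_icme(n_hours, icme_intervals):
    """Return one 0/1 flag per hour; 1 marks hours inside any ICME interval.

    `icme_intervals` holds (start_hour, end_hour) index pairs, end exclusive.
    This interval encoding is hypothetical.
    """
    flags = [0] * n_hours
    for start, end in icme_intervals:
        for hour in range(max(start, 0), min(end, n_hours)):
            flags[hour] = 1
    return flags


speeds = [350.0, 400.0, 450.0, 500.0]  # illustrative values in km/s
z = standardize(speeds)
flags = binarize_icme(6, [(2, 4)])
print(z, flags)
```

In practice the mean and standard deviation would normally be estimated on the training years only and reused for the validation and test years, although the text does not say which convention was followed.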


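| Data type | Feature | Resolution | Time range |
|---|---|---|---|
| Time series of solar wind data | Proton density | 1 hour | 2011-2017 |
| | Proton temperature | 1 hour | 2011-2017 |
| | Flow pressure | 1 hour | 2011-2017 |
| | Sigma_B | 1 hour | 2011-2017 |
| | Bulk speed | 1 hour | 2011-2017 |
| | ICME list | 1 hour | 2011-2017 |
| EUV image data | AIA 193 Å | 1 day | 2011-2017 |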

3.3. Dataset Split

Due to the availability of data, we preprocess EUV images and the solar wind data from 2011 to 2017. Since time series data have continuity in the time dimension, we split data from 2011 to 2015 as the training set, data from 2016 as the validation set, and 2017 as the test set. The data we used is shown in Table 1.
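The chronological split can be written as a simple mapping from calendar date to subset. The function name `assign_split` is hypothetical, and how samples whose input window crosses a year boundary are handled is not stated in the text, so this sketch assigns a sample by a single date only.

```python
from datetime import date


def assign_split(day):
    """Map a calendar date to the split described in the text."""
    if 2011 <= day.year <= 2015:
        return "train"
    if day.year == 2016:
        return "validation"
    if day.year == 2017:
        return "test"
    raise ValueError(f"{day} is outside the 2011-2017 range")


print(assign_split(date(2014, 6, 1)),
      assign_split(date(2016, 2, 29)),
      assign_split(date(2017, 12, 31)))
```

Splitting by year rather than at random keeps the test period strictly later than the training period, so no future observations leak into training.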

4. Results

In order to verify the effectiveness of the MMP model, we conduct a large number of experiments. This section first describes the experimental setup and then presents the experimental results and analysis.

4.1. Experimental Setup
4.1.1. Hyperparameters

We finetune the GoogLeNet pretrained on the ImageNet dataset to extract EUV image features. Tmodule adopts as the convolution kernel size, where is the number of features of multivariate time series. And the number of output channels in CNN is set to 32. Moreover, the dimension of hidden states in BiLSTM is 100. The learning rate is 0.001, the batch size is 32, and the entire network is trained for 30 epochs. The Adam [44] optimization algorithm is used to optimize the model parameters.

4.1.2. Metrics for Comparison

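Metrics such as RMSE, MAE, and CORR are used to evaluate the continuous prediction performance of the model. RMSE is the square root of the arithmetic mean of the squared differences between the observed and predicted values, as shown in Equation (3). MAE is the mean of the absolute errors between the predicted and observed values, as shown in Equation (4). CORR represents the similarity between the observed and the predicted sequences, as shown in Equation (5).

(1) Root mean square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \tag{3}$$

(2) Mean absolute error (MAE):

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{4}$$

(3) Pearson correlation coefficient (CORR):

$$\mathrm{CORR} = \frac{\sum_{i=1}^{n} \left( y_i - \bar{y} \right) \left( \hat{y}_i - \bar{\hat{y}} \right)}{\sqrt{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2} \sqrt{\sum_{i=1}^{n} \left( \hat{y}_i - \bar{\hat{y}} \right)^2}} \tag{5}$$

where $H$ means the prediction horizon at which the values are evaluated, $y_i$ is the $i$-th observed speed value, $\hat{y}_i$ is the $i$-th predicted speed value, $\bar{y}$ and $\bar{\hat{y}}$ are the mean values of the observed and predicted speed, respectively, and $n$ is the batch_size of data. For RMSE and MAE, a lower value is better; for CORR, a higher value is better.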
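The three metrics can be computed directly from paired observed and predicted speeds. The following sketch uses only the Python standard library; the sample values are invented for illustration and are not results from the paper.

```python
import math


def rmse(observed, predicted):
    """Square root of the mean squared difference."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean of the absolute differences."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def corr(observed, predicted):
    """Pearson correlation coefficient between the two sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


observed = [400.0, 450.0, 500.0, 550.0]   # illustrative speeds in km/s
predicted = [410.0, 440.0, 510.0, 540.0]
print(rmse(observed, predicted), mae(observed, predicted), corr(observed, predicted))
```

Every prediction in this toy example is off by exactly 10 km/s, so RMSE and MAE coincide; they differ as soon as the errors vary in size, because RMSE weights large errors more heavily.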

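In addition, according to Table 2, we adopt the Heidke skill score (HSS) to evaluate whether the model can capture the peak solar wind speed accurately:

$$\mathrm{HSS} = \frac{2 \left( ad - bc \right)}{\left( a + c \right) \left( c + d \right) + \left( a + b \right) \left( b + d \right)} \tag{6}$$

where $a$, $b$, $c$, and $d$ are the numbers of hits, false alarms, misses, and correct rejections, as defined in Table 2.

| | Observation ≥ threshold | Observation < threshold |
|---|---|---|
| Prediction ≥ threshold | $a$ (hits) | $b$ (false alarms) |
| Prediction < threshold | $c$ (misses) | $d$ (correct rejections) |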
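The contingency counts and the skill score can be computed as in the sketch below. It uses the standard form of the Heidke skill score for a 2 × 2 contingency table; treating a value equal to the threshold as an event (≥) is an assumption, and the speeds are invented for illustration.

```python
def contingency(observed, predicted, threshold=500.0):
    """Count hits, false alarms, misses, and correct rejections."""
    hits = false_alarms = misses = correct_rejections = 0
    for o, p in zip(observed, predicted):
        if p >= threshold and o >= threshold:
            hits += 1
        elif p >= threshold:
            false_alarms += 1
        elif o >= threshold:
            misses += 1
        else:
            correct_rejections += 1
    return hits, false_alarms, misses, correct_rejections


def heidke_skill_score(hits, false_alarms, misses, correct_rejections):
    """Heidke skill score: 1 is perfect, 0 is no better than chance."""
    a, b, c, d = hits, false_alarms, misses, correct_rejections
    denominator = (a + c) * (c + d) + (a + b) * (b + d)
    if denominator == 0:
        raise ValueError("score is undefined when only one class occurs")
    return 2.0 * (a * d - b * c) / denominator


observed = [520.0, 480.0, 610.0, 430.0, 505.0, 390.0]   # illustrative, km/s
predicted = [530.0, 510.0, 470.0, 420.0, 515.0, 400.0]
counts = contingency(observed, predicted)
print(counts, heidke_skill_score(*counts))
```

In this toy example there are two hits, one false alarm, one miss, and two correct rejections, which gives a score of one third.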

4.2. Experimental Results and Analysis
4.2.1. Benchmark Models

(i) 27-day persistence model [45]. According to the Carrington rotation period, the Sun's rotation period is about 27 days, so we use the 27-day persistence model as a baseline. The speed of 27 days ago is used as the predicted output value: $\hat{y}_{t+H} = y_{t+H-27}$, where $t$ represents the current moment, $W$ is the input window size, and $H$ means the horizon ahead of the current moment.
(ii) Persistence model. The most recent speed value is also informative for the value in the near future. Therefore, the most recent observation is used as the predicted output value: $\hat{y}_{t+H} = y_{t-1}$, where $t$ and $H$ have the same meaning as in the 27-day persistence model and "1" means 1 day before the current time.
(iii) SVR model [46]. SVR uses SVM to fit the curve for regression analysis. In this paper, SVR implemented by Scikit-learn [47] is used for prediction, and three kernel functions are used for benchmark testing: the linear kernel (SVR_linear), the polynomial kernel (SVR_Poly), and the radial basis kernel (SVR_RBF).
(iv) LSTM model [22]. LSTM is a special type of RNN that can learn long-term dependencies in sequence data and largely alleviates the problems of vanishing and exploding gradients.
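The two persistence baselines in the list above need no training and can be sketched in a few lines. The indexing conventions are assumptions: days are zero-based positions in a list of daily speeds, the 27-day model looks back 27 days from the target day, and the plain persistence model reuses the value one day before the current day. The synthetic series is invented for illustration.

```python
def persistence_27_day(daily_speed, target_day, period=27):
    """Predict the speed on `target_day` as the speed one solar rotation earlier."""
    source_day = target_day - period
    if source_day < 0:
        raise IndexError("no observation one period before the target day")
    return daily_speed[source_day]


def persistence(daily_speed, current_day):
    """Predict any future day with the speed observed one day before `current_day`."""
    if current_day < 1:
        raise IndexError("no observation before the current day")
    return daily_speed[current_day - 1]


# Synthetic series that repeats exactly every 27 days (illustrative only).
daily_speed = [300.0 + 10.0 * (d % 27) for d in range(60)]
print(persistence_27_day(daily_speed, target_day=40), persistence(daily_speed, current_day=30))
```

On a perfectly recurrent series the 27-day model is exact, which is why it remains a strong reference for recurrent high-speed streams even though it ignores recent observations.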

4.2.2. Experimental Results

As shown in Table 3, we compare MMP with the benchmark models on the metrics defined above: RMSE, MAE, and CORR. We vary the input length $W$ and the time-in-advance of prediction $H$. When $(W, H) = (1, 1)$, the input includes one EUV image and the 24 time series points of the preceding day, and the output is the daily solar wind speed of the following day. The best results obtained by each method under different horizon settings are in bold. The best results of all methods are underlined. We can see that MMP outperforms the benchmarks over each horizon. It can also be seen from Table 3 that, as the horizon is extended, the performance of all models declines to varying degrees.



Note. The 27-day persistence model gives an RMSE of 91.37, an MAE of 68.95, and a CORR of 0.59. The unit of RMSE and MAE is km/s.

In addition to the quantitative results, we also show representative solar wind time sequence profiles and the corresponding observations in Figures 3–6. From top to bottom are the 27-day persistence model, persistence model, SVR_Linear, SVR_Poly, SVR_RBF, LSTM, and MMP methods, respectively. As can be seen from Figures 3–6, compared with the other models, MMP fits the trend of the observed values well, especially when $H = 1$. When $H = 1$, we predict the solar wind speed 1 day in advance, which is easier to fit than the other settings. However, in the case of 2-, 3-, or 4-day advance prediction, the predicted minima and maxima are far from the observations. We think this is because there are many more samples near the mean than near the extreme values. The model cannot learn enough features from so few samples, which is a common phenomenon in machine learning that more and more researchers are working on [48–50]. We will continue to research how to predict high-speed streams accurately in the future.

As shown in Table 4, we calculate the Heidke skill score for all methods. By analyzing all the data from the training and validation sets, we draw a solar wind speed interval-frequency histogram. As can be seen from Figure 7, most solar wind speeds are distributed between 300 and 500 km/s. When the solar wind speed is greater than 500 km/s, the frequency declines sharply. Therefore, we set the threshold to 500 km/s to calculate the Heidke skill score according to Subsection 4.1.2. The best result is shown in bold. As we can see from Table 4, MMP outperforms the other methods except the 27-day persistence model. We attribute this to the Sun's rotation period of about 27 days: the 27-day persistence model contains more trend and period information than the other models, so it performs better in the Heidke skill score and in Figures 3–6.



Note. The 27-day persistence model gives a Heidke skill score of 0.40.
4.2.3. Ablation Study

To prove the effectiveness of each module in the MMP model, we conduct ablation experiments, which are shown in Table 5. We name the different ablation experiments as follows:
(i) MMP w/o Vmodule: the MMP model without the vision feature extractor module
(ii) MMP w/o Tmodule: the MMP model without the time series encoder component

| (W, H) | MMP w/o Vmodule | MMP w/o Tmodule | MMP |


Note. The unit of RMSE and MAE is km/s.

The results are shown in Table 5. For different horizon settings, the best results of each model are in bold, and the best results of all models are underlined. We highlight the following findings:
(i) Removing the Vmodule leads to a decline in the results, especially for long-term prediction. According to our analysis, multivariate time series play an important role in short-term prediction, so removing Vmodule has little impact on short-term prediction. However, for predictions 3 and 4 days in advance, the solar wind speed is mainly influenced by coronal holes and other driving sources. Therefore, it is necessary to capture the EUV visual information, and removing the Vmodule degrades the performance significantly.
(ii) In contrast to the removal of Vmodule, removing Tmodule has a more significant impact on short-term prediction. We think that most solar wind with a short propagation time is background solar wind. The EUV images are relatively calm at such times, and the historical trend information is critical. Therefore, after removing Tmodule, the short-term prediction performance is significantly reduced.
(iii) Under different horizon settings, our MMP model achieves the best results, with the lowest RMSE and MAE and the highest CORR. This shows that combining EUV image data and historical time series data provides richer information and helps to improve both short-term and long-term prediction, and it also shows that MMP can effectively utilize the complementary information of the two modalities.

4.2.4. Different Pretrained Models

This section compares the effects of different pretrained models. As shown in Table 6, we compare the performance of VGG [33], ResNet [34], DenseNet [35], and SqueezeNet [36] as the pretrained models used to capture the vision feature representation. For different horizon settings, the best results of each pretrained model are in bold, and the best results of all models are underlined. From Table 6, we can see that GoogLeNet obtains the best results on most metrics. However, the other pretrained models obtain similar performance at some (W, H) combinations, which shows the potential of models pretrained on large-scale datasets to capture image features.



Note. The unit of RMSE and MAE is km/s.
4.2.5. Hyperparameter Effect

In this section, we analyze the influence of essential hyperparameter settings on performance. Table 7 shows the influence of the number of BiLSTM layers on the experimental results, and the best results are shown in bold. The BiLSTM module in Tmodule is used to extract features along the time dimension of the multivariate time series. As shown in Table 7, both too many and too few BiLSTM layers lead to performance degradation.

| Horizon | BiLSTM-layer = 0 | BiLSTM-layer = 1 (MMP) | BiLSTM-layer = 2 | BiLSTM-layer = 3 |


Note. The unit of RMSE and MAE is km/s.

5. Discussion

In this paper, a multimodality solar wind speed prediction method (MMP) is proposed. MMP combines the complementary information of EUV image data and multivariate time series data to capture the dynamic information of solar surface and historical sequence trend information. Firstly, we propose a dual-flow structure to extract the two modalities’ features and then fuse them for prediction. In contrast to the baseline model, our model uses information from different modalities to achieve the best results. In addition, we conduct ablation experiments to verify the effectiveness of the vision feature extractor and time series encoder. We also compare the performance of different pretrained models to verify the effectiveness of them to capture image features and conduct hyperparameter comparison experiments to verify the rationality of our model parameter selection.

There are several promising directions for future work. First, for every horizon, this paper treats the contributions of image data and time series data to performance as equal. However, it can be seen from Table 5 that image data and sequence data have different importance in the 1-day advance prediction and in the 2-, 3-, and 4-day advance predictions. Future research will focus on the impact of different modalities on performance, assign different weights to different modalities, and use their complementary relationship to improve performance. Second, it can be seen from Table 4 and Figures 3–6 that our model cannot capture high-speed solar wind streams well, which is very difficult but essential for applications. We will focus on how to improve peak prediction in the future.

Data Availability

The data and code used to support the findings of this study will be published on Github after the paper is accepted:

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this article.

Authors’ Contributions

All authors participated in the research design and conducted the experiments. Yanru Sun performed data analysis and accomplished the writing of the manuscript.


We acknowledge the use of NASA/GSFC’s Space Physics Data Facility’s OMNIWeb (or CDAWeb or ftp) service, and OMNI data and courtesy of NASA/SDO and the AIA, EVE, and HMI science teams. We would also like to acknowledge the anonymous reviewers for their careful work and thoughtful suggestions which have helped improve the manuscript substantially. This work was supported by the National Natural Science Foundation of China under Grant Nos. 61925602 and 61732011.


  1. R. Schwenn, “Space weather: the solar perspective,” Living Reviews in Solar Physics, vol. 3, no. 1, pp. 1–72, 2006. View at: Publisher Site | Google Scholar
  2. M. Hapgood, “Towards a scientific understanding of the risk from extreme space weather,” Advances in Space Research, vol. 47, no. 12, pp. 2059–2072, 2011. View at: Publisher Site | Google Scholar
  3. K. L. Bedingfield, “Spacecraft system failures and anomalies attributed to the natural space environment,” NASA, vol. 1390, 1996. View at: Google Scholar
  4. R. Galvez, D. F. Fouhey, M. Jin et al., “A machine-learning data set prepared from the nasa solar dynamics observatory mission,” The Astrophysical Journal Supplement Series, vol. 242, no. 1, p. 7, 2019. View at: Publisher Site | Google Scholar
  5. Y.-M. Wang and N. Sheeley Jr., “Solar wind speed and coronal flux-tube expansion,” The Astrophysical Journal, vol. 355, pp. 726–732, 1990. View at: Publisher Site | Google Scholar
  6. C. Arge and V. Pizzo, “Improvement in the prediction of solar wind conditions using near-real time solar magnetic field updates,” Journal of Geophysical Research: Space Physics, vol. 105, no. A5, pp. 10465–10479, 2000. View at: Publisher Site | Google Scholar
  7. P. Wintoft and H. Lundstedt, “Prediction of daily average solar wind velocity from solar magnetic field observations using hybrid intelligent systems,” Physics and Chemistry of the Earth, vol. 22, no. 7-8, pp. 617–622, 1997. View at: Publisher Site | Google Scholar
  8. P. Wintoft and H. Lundstedt, “A neural network study of the mapping from solar magnetic fields to the daily average solar wind velocity,” Journal of Geophysical Research: Space Physics, vol. 104, no. A4, pp. 6729–6736, 1999. View at: Publisher Site | Google Scholar
  9. D. Liu, C. Huang, J. Lu, and J. Wang, “The hourly average solar wind velocity prediction based on support vector regression method,” Monthly Notices of the Royal Astronomical Society, vol. 413, no. 4, pp. 2877–2882, 2011. View at: Publisher Site | Google Scholar
  10. Y. Yang, F. Shen, Z. Yang, and X. Feng, “Prediction of solar wind speed at 1 au using an artificial neural network,” Space Weather, vol. 16, no. 9, pp. 1227–1244, 2018. View at: Publisher Site | Google Scholar
  11. Y. Sun, Z. Xie, Y. Chen, X. Huang, and Q. Hu, “Solar wind speed prediction with two dimensional attention mechanism,” Space Weather, vol. 19, no. 7, article e2020SW002707, 2021. View at: Publisher Site | Google Scholar
  12. A. Krieger, A. Timothy, and E. Roelof, “A coronal hole and its identification as the source of a high velocity solar wind stream,” Solar Physics, vol. 29, no. 2, pp. 505–525, 1973. View at: Publisher Site | Google Scholar
  13. T. Rotter, A. Veronig, M. Temmer, and B. Vršnak, “Relation between coronal hole areas on the sun and the solar wind parameters at 1 au,” Solar Physics, vol. 281, no. 2, pp. 793–813, 2012. View at: Publisher Site | Google Scholar
  14. S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “Cbam: convolutional block attention module,” in Computer Vision – ECCV 2018. ECCV 2018, V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss, Eds., vol. 11211 of Lecture Notes in Computer Science, pp. 3–19, 2018. View at: Publisher Site | Google Scholar
  15. J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7132–7141, Salt Lake City, UT, USA, 2018. View at: Publisher Site | Google Scholar
  16. G. Huang, Z. Liu, L. Van DerMaaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4700–4708, Honolulu, HI, USA, 2017. View at: Publisher Site | Google Scholar
  17. K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask r-cnn,” in Proceedings of the IEEE international conference on computer vision, pp. 2961–2969, Venice, Italy, 2017. View at: Google Scholar
  18. H. Tan and M. Bansal, “Lxmert: learning cross-modality encoder representations from transformers,” 2019, View at: Google Scholar
  19. Z. Huang, Z. Zeng, B. Liu, D. Fu, and J. Fu, “Pixel-bert: aligning image pixels with text by deep multi-modal transformers,” 2020, View at: Google Scholar
  20. V. Upendran, M. C. Cheung, S. Hanasoge, and G. Krishnamurthi, “Solar wind prediction using deep learning,” Space Weather, vol. 18, no. 9, article e2020SW002478, 2020. View at: Publisher Site | Google Scholar
  21. C. Szegedy, W. Liu, Y. Jia et al., “Going deeper with convolutions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1–9, Boston, MA, USA, 2015. View at: Google Scholar
  22. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997. View at: Publisher Site | Google Scholar
  23. H. Raju and S. Das, “Cnn-based deep learning model for solar wind forecasting,” Solar Physics, vol. 296, no. 9, pp. 1–25, 2021. View at: Publisher Site | Google Scholar
  24. J. Lu, D. Batra, D. Parikh, and S. Lee, “Vilbert: pretraining task-agnostic visiolinguistic representations for vision-and-language tasks,” Advances in Neural Information Processing Systems, vol. 32, 2019. View at: Google Scholar
  25. L. H. Li, M. Yatskar, D. Yin, C.-J. Hsieh, and K.-W. Chang, “Visualbert: A Simple and Performant Baseline for Vision and Language,” 2019, View at: Google Scholar
  26. G. Li, N. Duan, Y. Fang, M. Gong, and D. Jiang, “Unicoder-vl: a universal encoder for vision and language by cross-modal pre-training,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11336–11344, 2020. View at: Publisher Site | Google Scholar
  27. D. Zhang et al., “Multi-modal multi-label emotion recognition with heterogeneous hierarchical message passing,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 14338–14346, 2021. View at: Google Scholar
  28. S. M. S. A. Abdullah, S. Y. A. Ameen, M. A. Sadeeq, and S. Zeebaree, “Multimodal emotion recognition using deep learning,” Journal of Applied Science and Technology Trends, vol. 2, no. 2, pp. 52–58, 2021. View at: Publisher Site | Google Scholar
  29. P. Anderson, X. He, C. Buehler et al., “Bottom-up and top-down attention for image captioning and visual question answering,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6077–6086, Salt Lake City, UT, USA, 2018. View at: Publisher Site | Google Scholar
  30. W. Su, X. Zhu, Y. Cao et al., “Vl-bert: pre-training of generic visual-linguistic representations,” 2019, View at: Google Scholar
  31. C. Sun, A. Myers, C. Vondrick, K. Murphy, and C. Schmid, “Videobert: a joint model for video and language representation learning,” in IEEE/CVF International Conference on Computer Vision (ICCV),, pp. 7464–7473, Seoul, Korea (South), 2019. View at: Publisher Site | Google Scholar
  32. M. Chen and X. Zhao, “A multi-scale fusion framework for bimodal speech emotion recognition,” in Interspeech 2020, pp. 374–378, Shanghai, China, 2020. View at: Publisher Site | Google Scholar
  33. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2015, View at: Google Scholar
  34. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, 2016. View at: Publisher Site | Google Scholar
  35. G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2261–2269, Honolulu, HI, USA, 2017. View at: Publisher Site | Google Scholar
  36. F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “Squeezenet:Alexnet-level accuracy with 50x fewer parameters and< 0.5 mb model size,” 2016, View at: Google Scholar
  37. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: a large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, Miami, FL, USA, 2009. View at: Publisher Site | Google Scholar
  38. J. Gosling, R. Hansen, and S. Bame, “Solar wind speed distributions: 1962–1970,” Journal of Geophysical Research, vol. 76, no. 7, pp. 1811–1815, 1971. View at: Publisher Site | Google Scholar
  39. Z. Yang, F. Shen, J. Zhang, Y. Yang, X. Feng, and I. G. Richardson, “Correlation between the magnetic field and plasma parameters at 1 au,” Solar Physics, vol. 293, no. 2, pp. 1–13, 2018. View at: Publisher Site | Google Scholar
  40. J. L. Elman, “Finding structure in time,” Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990. View at: Publisher Site | Google Scholar
  41. A. Graves, A.-R. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649, Vancouver, BC, Canada, 2013. View at: Publisher Site | Google Scholar
  42. A. Graves and J. Schmidhuber, “Framewise phoneme classification with bidirectional LSTM and other neural network architectures,” Neural Networks, vol. 18, no. 5-6, pp. 602–610, 2005. View at: Publisher Site | Google Scholar
  43. W. Boehm, “Inserting new knots into b-spline curves,” Computer-Aided Design, vol. 12, no. 4, pp. 199–201, 1980. View at: Publisher Site | Google Scholar
  44. D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” 2014, View at: Google Scholar
  45. M. J. Owens, R. Challen, J. Methven, E. Henley, and D. Jackson, “A 27 day persistence model of near-earth solar wind conditions: a long lead-time forecast and a benchmark for dynamical models,” Space Weather, vol. 11, no. 5, pp. 225–236, 2013. View at: Publisher Site | Google Scholar
  46. H. Drucker, C. J. Burges, L. Kaufman, A. Smola, and V. Vapnik, “Support vector regression machines,” Advances in Neural Information Processing Systems, vol. 9, pp. 155–161, 1996. View at: Google Scholar
  47. F. Pedregosa, G. Varoquaux, A. Gramfort et al., “Scikit-learn: machine learning in python,” The Journal of machine Learning research, vol. 12, pp. 2825–2830, 2011. View at: Google Scholar
  48. J. Liu, Y. Sun, C. Han, Z. Dou, and W. Li, “Deep representation learning on long-tailed data: a learnable embedding augmentation perspective,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2970–2979, Seattle, WA, USA, 2020. View at: Publisher Site | Google Scholar
  49. T. Wu, Q. Huang, Z. Liu, Y. Wang, and D. Lin, “Distribution-balanced loss for multi-label classification in long-tailed datasets,” in Computer Vision – ECCV 2020. ECCV 2020, A. Vedaldi, H. Bischof, T. Brox, and J. M. Frahm, Eds., vol. 12349 of Lecture Notes in Computer Science, pp. 162–178, Springer, 2020. View at: Publisher Site | Google Scholar
  50. F. Zhou, L. Yu, X. Xu, and G. Trajcevski, “Decoupling representation and regressor for long-tailed information cascade prediction,” in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1875–1879, Canada, 2021. View at: Publisher Site | Google Scholar

Copyright © 2022 Yanru Sun et al. Exclusive Licensee Beijing Institute of Technology Press. Distributed under a Creative Commons Attribution License (CC BY 4.0).
