Research Article  Open Access
Yanru Sun, Zongxia Xie, Yanhong Chen, Qinghua Hu, "Accurate Solar Wind Speed Prediction with Multimodality Information", Space: Science & Technology, vol. 2022, Article ID 9805707, 13 pages, 2022. https://doi.org/10.34133/2022/9805707
Accurate Solar Wind Speed Prediction with Multimodality Information
Abstract
When the solar wind passes over the Earth, it causes geomagnetic storms, affects shortwave communications, and threatens infrastructure such as electricity and oil pipelines. Accurate prediction of the solar wind speed allows people to make adequate preparations to avoid wasting resources and disrupting daily life. Most existing methods use only single-modality data as input and do not consider the complementary information between different modalities. This paper proposes a multimodality prediction (MMP) method that jointly learns vision and sequence information in a unified end-to-end framework for solar wind speed prediction. MMP includes three modules: the V-module, the T-module, and a fusion module. The V-module, which uses pretrained GoogLeNet, learns visual representations from extreme ultraviolet (EUV) images. The T-module, combining a one-dimensional CNN with bidirectional long short-term memory (BiLSTM), learns sequence representations from multivariate time series. Finally, a multimodality fusion method is applied to improve overall performance. We experiment on the EUV images observed by the Solar Dynamics Observatory (SDO) satellite and the OMNIWeb dataset measured at Lagrangian point 1 (L1). Comparative experiments show that the proposed MMP achieves the best performance on many metrics. Ablation experiments also verify the validity of each module and the rationality of the hyperparameter settings.
1. Introduction
As more and more high-tech systems are exposed to the space environment, space weather prediction can provide greater protection for these devices [1, 2]. In the solar system, space weather is mainly influenced by solar wind conditions. The solar wind is a stream of supersonic charged plasma particles that flows out of the Sun's surface and affects the Earth, producing phenomena such as aurorae and geomagnetic storms [3]. Accurate prediction of the solar wind speed is therefore of great significance to the Earth and humanity.
In recent years, as the quality and quantity of observation equipment have improved, a large amount of data has been generated, and more and more methods have emerged to make full use of these data for modeling to improve the accuracy of solar wind speed prediction. From the perspective of input data, they can be divided into two categories: methods based on sequence data (https://omniweb.gsfc.nasa.gov/index.html) and methods based on images [4].
Sequence data can reflect the changing trend of a phenomenon over time, so historical time series are a natural input for solar wind speed prediction. Wang and Sheeley [5] propose the Wang-Sheeley-Arge (WSA) model, the most widely used solar wind prediction model. It is based on the negative correlation between the solar wind speed at 1 AU and the expansion factor of the coronal magnetic field on the source surface. By adding the angular distance from coronal hole boundaries, Arge and Pizzo [6] give the currently most common form of the function. Wintoft and Lundstedt [7, 8] propose machine learning-based methods to predict the solar wind speed up to 3 days in advance. They adopt the potential field model to extend the photospheric field of the magnetogram out to the source surface at 2.5 solar radii and obtain the flux tube expansion coefficient. Then, the time series of the source surface magnetic field is input into an RBF neural network to predict the daily average solar wind speed. The correlation coefficient (CORR) of their best model is 0.58, and the root mean square error (RMSE) is 90 km/s. Liu et al. [9] predict the hourly average solar wind speed using SVR. They adopt only a few periods of data (about four 27-day solar-rotation periods) as the model's input and obtain accurate and reliable prediction performance. Yang et al. [10] propose a hybrid intelligent source surface model (HISS) based on neural networks. The inputs to the model include six features calculated by the potential field source surface (PFSS) model and the historical solar wind speed 27 days earlier. Combining these input features with an artificial neural network, they predict the solar wind speed four days in advance with a CORR of 0.74 and an RMSE of 68 km/s. Sun et al. [11] propose a two-dimensional attention mechanism (TDAM) that captures the relationships along the feature dimension and the time dimension of multivariate time series, respectively, and obtain 24-hour-advance predictions with a CORR of 0.78 and an RMSE of 62.88 km/s. Compared with manual feature extraction methods, deep learning-based methods [11] save the time and effort of designing suitable features, which usually requires professional knowledge.
In addition, Krieger et al. [12] demonstrate a correlation between high-speed solar wind streams and coronal holes for the first time. Rotter et al. [13] also demonstrate a correlation of up to 0.78 between the coronal hole area in EUV images and the solar wind speed. In EUV images, the position, brightness, and size of coronal holes can be seen directly, which makes them very suitable for extracting coronal hole information. Deep learning has strong advantages in image processing and is widely used in image classification [14, 15], semantic segmentation [16, 17], and other fields [18, 19]. Some deep learning-based methods have emerged to predict the solar wind speed using extreme ultraviolet (EUV) images and magnetograms as input. Upendran et al. [20] propose the WindNet model, which uses pretrained GoogLeNet layers [21] for feature extraction and long short-term memory (LSTM) [22] for regression to predict the daily average solar wind speed. The model predicts three days in advance and obtains a best-fit correlation of 0.55 ± 0.03 with the observed data and a threat score of 0.357 ± 0.03. Hemapriya et al. [23] propose a convolutional neural network (CNN)-based deep learning model for solar wind speed prediction. They train on EUV images from the Atmospheric Imaging Assembly (AIA) at the 193 Å wavelength and obtain an RMSE of 76.3 km/s and a CORR of 0.57 for 2018. Similar to [11], these methods also eliminate the need to extract features manually. However, whether sequence data or EUV images are used as input, only single-modality information is considered, and the complementary information between different modalities is not utilized.
Multimodality learning is one of the rapidly developing research fields in deep learning and has greatly promoted visual question answering (VQA) [18, 24], visual commonsense reasoning (VCR) [25, 26], speech emotion recognition (SER) [27], and so on [28, 29]. Modal fusion is mainly divided into early fusion and late fusion, which suit fusing information from different sources. These methods can make full use of the complementary information between modalities to improve the performance of downstream tasks. For example, VL-BERT [30] and VideoBERT [31] effectively aggregate multimodal information from the visual and linguistic domains and perform well on several downstream tasks. STSER [32] achieves good speech emotion recognition accuracy by integrating speech and text information. Similarly, different modalities of solar wind data contain different information, as shown in Figure 1. EUV images of the Sun contain static information such as coronal hole position and brightness. When large coronal holes appear near the center of the solar disk, they eject large amounts of charged particles toward the Earth, creating high-speed solar wind. In comparison, time series data provide more accurate information about the historical trend of the solar wind speed. When the historical solar wind speed shows an upward trend, high-speed solar wind is highly likely. We can use the complementary information of the two modalities to infer whether a high-speed solar wind will occur.
Inspired by this, we develop the multimodality prediction model (MMP), a dual-flow structure for solar wind speed prediction, as shown in Figure 2. Firstly, the V-module, composed of pretrained GoogLeNet layers [21], obtains the visual representation from the EUV image. For the multivariate time series data, we propose a T-module that learns a sequence representation to assist prediction. Finally, the two representations are combined by late fusion to realize information complementarity and improve performance. Compared with traditional methods, MMP not only integrates the complementary information of different modalities for prediction but also requires no manual feature extraction. Comparisons with other models on publicly available data from 2011 to 2017 show that the proposed MMP achieves the best results, and extensive ablation experiments verify the necessity of each module of MMP.
The rest of the paper is structured as follows: Section 2 introduces the structure of the proposed model and its details. Section 3 gives the dataset description and preprocessing method. Section 4 then presents the comparison between the MMP model and the baseline models, along with the results of the ablation experiments. We conclude in Section 5.
2. Materials and Methods
This section introduces the proposed MMP framework, which combines EUV images and multivariate time series for solar wind speed prediction. We first describe the overall structure of MMP. Then, the structures of the vision feature extractor (V-module) and the time series encoder (T-module) are introduced, respectively. Finally, the design of the multimodality fusion predictor is presented.
2.1. Overview Structure of MMP
To solve the problem of solar wind speed prediction, a multimodality method based on a dual-flow model is proposed in this paper. In this method, the complementarity between EUV images and multivariate time series data is exploited to improve the prediction accuracy. The overall structure of the proposed MMP framework is shown in Figure 2.
Compared with single-modality methods [10, 23], MMP can not only capture the location and size of coronal holes from EUV images but also learn implicit trend information from historical data. The information from the two modalities complements each other to improve prediction performance.
As shown in Figure 2, the image data and the sequence data are processed by the vision feature extractor (V-module) and the time series encoder (T-module), respectively. After feature extraction by these two modules, the two feature vectors are concatenated into one vector for multimodality fusion. Finally, the prediction is obtained by a multimodality regression head. In this paper, T denotes the input length in days and H denotes the time-in-advance (horizon) of the prediction. For example, T = 1, H = 1 means that the input includes a 1-day EUV image and a 1-day time series, and the output is the prediction for the next day. The resolutions of the EUV image and the time series differ: a 1-day time series contains 24 hourly data points, which carry the trend information. Similarly, T = 2, H = 1 means inputting 2-day EUV images and a 2-day time series (48 hourly points) and predicting the next day's speed.
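To make the (T, H) windowing concrete, the following sketch pairs each T-day input window of a daily series with the target H days ahead. The function name and array layout are illustrative, not the paper's actual data pipeline.

```python
import numpy as np

def make_samples(speed_daily, T, H):
    """Pair each T-day input window with the daily speed H days ahead.

    speed_daily: 1-D array of daily-averaged solar wind speeds.
    Returns (inputs, targets) as arrays.
    """
    X, y = [], []
    for t in range(T, len(speed_daily) - H + 1):
        X.append(speed_daily[t - T:t])    # the T days before day t
        y.append(speed_daily[t + H - 1])  # the H-th day ahead
    return np.array(X), np.array(y)
```

With T = 2 and H = 1, for instance, each sample consists of the two previous daily values and the very next day's speed as the target.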
2.2. Vision Feature Extractor
The vision feature extractor, V-module, is mainly composed of a pretrained layer. In Section 4, we compare its performance with other pretrained models, which verifies the effectiveness of pretrained models in capturing image features [33–36]. Similar to Upendran et al. [20], we take GoogLeNet as an example and use the pretrained GoogLeNet model as a feature extractor for the EUV images. We obtain the GoogLeNet weights from http://www.deeplearningmodel.net/.
As shown in Figure 2, GoogLeNet [21] introduces a new deep convolutional neural network architecture, Inception, which obtained state-of-the-art performance in the ImageNet Large-Scale Visual Recognition Challenge 2014. Although the structure of GoogLeNet is complex, careful design keeps the amount of computation constant. GoogLeNet follows a practical intuition: when we look at things, visual information is first processed at different scales and then aggregated, and GoogLeNet correspondingly extracts cross-scale information from images by applying convolutions with different kernel sizes in parallel.
The GoogLeNet model parameters are pretrained on the ImageNet dataset, a large dataset proposed by Deng et al. [37] that is built on the hierarchy provided by WordNet and aims to contain approximately 50 million clearly labeled full-resolution images. However, the EUV images in our dataset do not share the same characteristics as these images. So instead of freezing the parameters pretrained on ImageNet, we fine-tune all the GoogLeNet parameters on our data, feeding the network properly formatted input. As shown in Figure 2, the inputs of GoogLeNet are the EUV images, which we resize to [256, 256]. In MMP, GoogLeNet acts as a feature extractor: after this module, MMP obtains a set of generic feature embeddings, and the output dimension of the pretrained layer is [32, 1000].
Through the V-module, we extract the vision feature representation F_v.
2.3. Time Series Encoder
As shown in Figure 2, the T-module consists of a CNN and a bidirectional long short-term memory (BiLSTM) that encode the sequence features to assist prediction. According to [38, 39], there are complicated implicit relationships between different features, so we use a one-dimensional CNN layer to capture the dependencies between features. In addition, the recurrent neural network (RNN) [40] is suitable for time series prediction because of its recurrence over the time dimension. However, the main problem of the RNN is that it easily suffers from vanishing gradients. To alleviate this problem, the LSTM [22] was proposed to learn long-term dependencies and contextual information by introducing a gating mechanism.
We use BiLSTM [41, 42], a variant of the LSTM, to learn the long-term dependencies of the features extracted by the CNN layer. Thanks to its bidirectional components, the BiLSTM can effectively use both past and future input features. The number of BiLSTM layers is a hyperparameter (analyzed in Section 4), and the length of the input time series in the encoder is T days; T = 1 indicates that the input is the relevant time series of the day before the current moment. The resolution of the time series is 1 hour, so a 1-day input contains 24 data points.
Through the T-module composed of the CNN and the BiLSTM, we obtain the feature representation of the time series, F_t.
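The T-module described above can be sketched as follows. The sizes follow the paper (32 convolution channels, BiLSTM hidden size 100, k = 6 input features), but the class and layer names are ours, the pointwise `Conv1d` stands in for the 1 × k kernel that mixes the features at each time step, and the final linear projection is an assumption added to match the stated [32, 100] sequence-embedding size.

```python
import torch
import torch.nn as nn

class TModule(nn.Module):
    """Sketch of the T-module: a 1-D conv over the k features at each
    hourly step, then a BiLSTM over the 24*T time steps."""
    def __init__(self, k=6, conv_channels=32, hidden=100, layers=2):
        super().__init__()
        # kernel_size=1 mixes the k features independently per time step
        self.conv = nn.Conv1d(k, conv_channels, kernel_size=1)
        self.bilstm = nn.LSTM(conv_channels, hidden, num_layers=layers,
                              batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, 100)  # -> sequence embedding F_t

    def forward(self, x):                 # x: [batch, time, k]
        z = self.conv(x.transpose(1, 2)).transpose(1, 2)
        out, _ = self.bilstm(z)           # [batch, time, 2*hidden]
        return self.proj(out[:, -1])      # last step -> [batch, 100]
```

For a 1-day input (T = 1), `x` would contain 24 hourly steps of the 6 features, and the output F_t has 100 dimensions per sample.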
2.4. Multimodality Fusion Prediction
To obtain better prediction performance, we propose a multimodality fusion predictor consisting of feature fusion and regression. The feature fusion module fuses the extracted vision and sequence features F_v and F_t, and the regressor uses the fused features to make predictions. Both modules are designed to improve overall performance.
We adopt a concatenation strategy for feature fusion, directly concatenating the vision and sequence representations into a new vector, which is then fed into the multimodality linear regressor to obtain the final prediction: $\hat{y} = W[F_v; F_t] + b$, where $[\cdot;\cdot]$ means concatenation, $F_v$ and $F_t$ are the vision and the sequence representation, respectively, and $W$, $b$ are the parameters of the network. Specifically, we resize the EUV image to [256, 256] and set the dimensions of each module, including the fully connected layers, before training. After the V-module and T-module, the dimensions of the vision and sequence embeddings are [32, 1000] and [32, 100], respectively. Through concatenation, the dimension becomes [32, 1100], where 32 is the batch_size hyperparameter. The multimodality regression prediction is made from these fused embeddings.
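The late-fusion step above amounts to a concatenation followed by one linear layer; a minimal sketch (with random tensors standing in for the real embeddings) is:

```python
import torch
import torch.nn as nn

# Late fusion sketch: concatenate the vision embedding F_v [32, 1000] and
# the sequence embedding F_t [32, 100], then map the fused [32, 1100]
# vector to one predicted speed per sample with a linear regressor.
f_v = torch.randn(32, 1000)           # stand-in for the V-module output
f_t = torch.randn(32, 100)            # stand-in for the T-module output

regressor = nn.Linear(1100, 1)        # the W, b of the prediction head
fused = torch.cat([f_v, f_t], dim=1)  # [32, 1100]
y_hat = regressor(fused).squeeze(-1)  # [32] predicted speeds
```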
2.5. Loss Function
Our model is trained with the mean square error loss using the label $y_{t+H}$: $\mathcal{L} = \frac{1}{n}\sum \left(y_{t+H} - \hat{y}_{t+H}\right)^2$, where $t$ is the current moment, $H$ means the $H$-th day in the future, and $y_{t+H}$ and $\hat{y}_{t+H}$ are the observed and the predicted daily solar wind speed, respectively. Because coronal holes change slowly, the resolution of the EUV images is 1 day, while the time series data, whose resolution is 1 hour, contain more trend information. The resolution of the output is 1 day; that is, we predict the daily solar wind speed H days in advance.
3. Dataset Description and Preprocessing
This section introduces the data used in this paper, including image data and solar wind time series data. Besides, we also introduce the methods of missing data processing and data split.
3.1. SDO/AIA Image
The SDO satellite was launched by NASA in 2010 and has been continuously monitoring solar activity and providing valuable scientific data [4]. Its Atmospheric Imaging Assembly (AIA) module captures images of the full Sun in ten wavelength bands, including seven EUV channels, two UV channels, and one visible channel. As Yang et al. [10] note, AIA 193 Å is the most sensitive to the size and location of coronal holes, so we adopt the AIA 193 Å EUV images taken by the SDO satellite as one of the input modalities (https://sdo.gsfc.nasa.gov/assets/img/browse/). We download the original images from the website at 1-hour resolution. There are some data gaps caused by temporary shutdowns that protect the equipment from high-energy magnetic storms; the longest gap is about 12 days. Our dataset contains AIA images from 2011 to 2017 at 512 × 512 resolution with a 1-day sampling frequency, and we replace missing images with adjacent ones from the original higher-frequency data. For example, if the EUV image for 2012-01-05 00:00 is missing, we replace it with the image for 2012-01-05 01:00 or 2012-01-04 23:00.
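The nearest-hour fallback described above can be sketched as a small search over the hourly archive; the function name, the set-of-datetimes representation, and the 12-hour cap (taken from the longest-gap remark) are all illustrative assumptions.

```python
from datetime import datetime, timedelta

def pick_image(wanted, available, max_gap_hours=12):
    """If the daily 00:00 image is missing, fall back to the nearest
    available hourly image, searching outward hour by hour.

    wanted:    the desired datetime (e.g. 2012-01-05 00:00)
    available: a set of datetimes for which an image exists
    """
    for offset in range(max_gap_hours + 1):
        for sign in (1, -1):
            cand = wanted + sign * timedelta(hours=offset)
            if cand in available:
                return cand
    return None  # gap too long: no usable adjacent image
```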
3.2. Solar Wind Data
Yang et al. [39] find that features such as particle temperature and density correlate with the solar wind speed. Similar to Sun et al. [11], we use the six related features shown in Table 1 as the multivariate time series input. In particular, we binarize the ICME list to introduce ICME information, where 1 means a coronal mass ejection occurs at that moment and 0 means it does not. MMP is trained with the solar wind speed from the OMNI dataset as the target variable. We use cubic B-spline interpolation [43] to fill in missing data. In addition, to eliminate the influence of feature scale on the results, standard normalization is carried out so that each feature has mean 0 and variance 1: $x' = (x - \mu)/\sigma$, where $\mu$ is the mean of the sample data and $\sigma$ is the standard deviation of the feature.
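The standard normalization above is a per-feature z-score. One detail the text leaves open is which split supplies the statistics; the sketch below assumes, as is conventional, that the mean and standard deviation are computed on the training years and reused for validation and test data.

```python
import numpy as np

def zscore_fit_transform(train, other):
    """Per-feature standard normalization x' = (x - mu) / sigma.

    Statistics come from `train` only (an assumption; the paper does not
    state which split supplies mu and sigma) and are applied to both
    arrays. Arrays are shaped [samples, features].
    """
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    return (train - mu) / sigma, (other - mu) / sigma
```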

3.3. Dataset Split
Due to data availability, we preprocess the EUV images and the solar wind data from 2011 to 2017. Since time series data are continuous in the time dimension, we use the data from 2011 to 2015 as the training set, the data from 2016 as the validation set, and the data from 2017 as the test set. The data we used are shown in Table 1.
4. Results
To verify the effectiveness of the MMP model, we conduct extensive experiments. This section first describes the experimental setup; the experimental results and analysis are then presented.
4.1. Experimental Setup
4.1.1. Hyperparameters
We fine-tune the GoogLeNet pretrained on the ImageNet dataset to extract the EUV image features. The T-module adopts a convolution kernel of size 1 × k, where k is the number of features of the multivariate time series, and the number of output channels of the CNN is set to 32. The dimension of the hidden states in the BiLSTM is 100. The learning rate is 0.001, the batch size is 32, and the entire network is trained for 30 epochs. The Adam [44] optimization algorithm is used to optimize the model parameters.
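A training step wired up with these hyperparameters (Adam, lr = 0.001, batch size 32, MSE loss, 30 epochs) might look as follows. A single linear layer stands in for the full MMP here, and the random tensors stand in for the fused embeddings and labels, so this only illustrates the optimization setup, not the real model.

```python
import torch
import torch.nn as nn

# Toy stand-in for MMP: one linear head over a fused [32, 1100] embedding.
model = nn.Linear(1100, 1)
opt = torch.optim.Adam(model.parameters(), lr=0.001)  # lr from the paper
loss_fn = nn.MSELoss()

fused = torch.randn(32, 1100)   # stand-in fused multimodality embeddings
target = torch.randn(32, 1)     # stand-in daily solar wind speed labels

for epoch in range(30):         # 30 epochs, as stated in the paper
    opt.zero_grad()
    loss = loss_fn(model(fused), target)
    loss.backward()
    opt.step()
```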
4.1.2. Metrics for Comparison
RMSE, MAE, and CORR are used to evaluate the continuous prediction performance of the model. RMSE is the square root of the mean squared difference between the observed and predicted values, as shown in Equation (3). MAE is the mean absolute error between the predicted and observed values, as shown in Equation (4). CORR measures the similarity between the observed and the predicted sequence, as shown in Equation (5):

(1) Root mean square error (RMSE): $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$

(2) Mean absolute error (MAE): $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$

(3) Pearson correlation coefficient (CORR): $\mathrm{CORR} = \frac{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)\left(\hat{y}_i - \bar{\hat{y}}\right)}{\sqrt{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2 \sum_{i=1}^{n}\left(\hat{y}_i - \bar{\hat{y}}\right)^2}}$

where $y_i$ is the $i$th observed speed value, $\hat{y}_i$ is the $i$th predicted speed value, $\bar{y}$ and $\bar{\hat{y}}$ are the mean observed and predicted speed values, respectively, and $n$ is the number of samples. For RMSE and MAE, lower values are better; for CORR, higher values are better.
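The three metrics translate directly into a few lines of NumPy; the helper names below are ours.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error between observed and predicted speeds."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error between observed and predicted speeds."""
    return float(np.mean(np.abs(y - y_hat)))

def corr(y, y_hat):
    """Pearson correlation between observed and predicted speeds."""
    yc, pc = y - y.mean(), y_hat - y_hat.mean()
    return float(np.sum(yc * pc) / np.sqrt(np.sum(yc ** 2) * np.sum(pc ** 2)))
```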
In addition, according to Table 2, we adopt the Heidke skill score to evaluate whether the model captures the peak solar wind speeds accurately: $\mathrm{HSS} = \frac{2(ad - bc)}{(a+b)(b+d) + (a+c)(c+d)}$, where the meanings of $a$, $b$, $c$, and $d$ are shown in Table 2.
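The score can be computed from binary high-speed events as below. The mapping of a/b/c/d to hits, false alarms, misses, and correct rejections is our assumption about the layout of Table 2; the default 500 km/s threshold follows the choice made later in Subsection 4.2.2.

```python
import numpy as np

def heidke(obs, pred, threshold=500.0):
    """Heidke skill score for high-speed events above `threshold` km/s.

    a: hits, b: false alarms, c: misses, d: correct rejections
    (an assumed layout of the contingency table in Table 2).
    """
    o = obs >= threshold
    p = pred >= threshold
    a = np.sum(p & o)
    b = np.sum(p & ~o)
    c = np.sum(~p & o)
    d = np.sum(~p & ~o)
    num = 2.0 * (a * d - b * c)
    den = (a + b) * (b + d) + (a + c) * (c + d)
    return float(num / den) if den else 0.0
```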

4.2. Experimental Results and Analysis
4.2.1. Benchmark Models
(i) 27-day persistence model [45]. According to the Carrington rotation, the Sun's rotation period is about 27 days, so we use the 27-day persistence model as a baseline. The speed observed 27 days earlier is used as the prediction: $\hat{y}_{t+H} = y_{t+H-27}$, where $t$ represents the current moment, $T$ is the input window size, and $H$ is the horizon ahead of the current moment.

(ii) Persistence model. The speed at the most recent moment is also informative for the predicted value, so the most recent daily value is used as the prediction: $\hat{y}_{t+H} = y_{t-1}$, where $t$, $T$, and $H$ have the same meaning as in the 27-day persistence model and "1" means 1 day before the current time.

(iii) SVR model [46]. SVR uses SVM to fit a regression curve. In this paper, the SVR implemented in Scikit-learn [47] is used for prediction, with three kernel functions for benchmarking: the linear kernel (SVR_Linear), the polynomial kernel (SVR_Poly), and the radial basis function kernel (SVR_RBF).

(iv) LSTM model [22]. The LSTM is a special type of RNN that can learn long-term dependencies in sequence data and largely alleviates the vanishing and exploding gradient problems.
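The two persistence baselines above are simple shifts of the daily series; the following sketch implements them with a configurable lag (27 for the Carrington-rotation model, 1 for the ordinary persistence model). The function name is ours.

```python
import numpy as np

def persistence_forecast(speed_daily, lag):
    """Persistence baseline on a daily series: the forecast for day t is
    simply the speed observed `lag` days earlier. lag=27 gives the
    27-day (Carrington-rotation) model, lag=1 the persistence model.
    The first `lag` entries have no valid forecast and are left as NaN.
    """
    n = len(speed_daily)
    pred = np.full(n, np.nan)
    for t in range(lag, n):
        pred[t] = speed_daily[t - lag]
    return pred
```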
4.2.2. Experimental Results
As shown in Table 3, we compare MMP with the benchmark models on the defined metrics: RMSE, MAE, and CORR. We vary the input length T and the time-in-advance of prediction H. When T = 1, the input includes one EUV image and the 24 time-series points before the day, and the output is the daily solar wind speed H days after the day. The best results obtained by each method under the different horizon settings are in bold, and the best results across all methods are underlined. MMP outperforms the benchmarks at every horizon. Table 3 also shows that, as the horizon grows, the performance of all models declines to varying degrees.
 
^{1}Note. The 27day persistence gives an RMSE of 91.37, MAE of 68.95, and CORR of 0.59. The unit of RMSE and MAE is km/s. 
In addition to the quantitative results, we show representative solar wind speed profiles and the corresponding observations in Figures 3–6. From top to bottom are the 27-day persistence model, the persistence model, SVR_Linear, SVR_Poly, SVR_RBF, LSTM, and MMP, respectively. As Figures 3–6 show, MMP fits the trend of the observed values better than the other models, especially when H = 1. When H = 1, we predict the solar wind speed 1 day in advance, which fits the observations more easily than the other settings; therefore, MMP fits the observed values well for H = 1. However, for the 2-, 3-, and 4-day-ahead predictions, the predicted minima and maxima are far from the observations. We believe this is because there are many more samples near the mean than near the extremes, and the model cannot learn enough from the scarce extreme samples, a common phenomenon in machine learning that more and more researchers are addressing [48–50]. We will continue to study how to predict high-speed streams accurately in the future.
As shown in Table 4, we calculate the Heidke skill score for all methods. By analyzing all the data in the training and validation sets, we draw a solar wind speed interval-frequency histogram. As can be seen from Figure 7, most solar wind speeds lie between 300 and 500 km/s; above 500 km/s, the frequency declines greatly. Therefore, we set the threshold to 500 km/s to calculate the Heidke skill score as described in Subsection 4.1.2. The best result is shown in bold. As Table 4 shows, MMP outperforms all other methods except the 27-day persistence model. We believe the reason is that the Sun's rotation period is 27 days, so the 27-day persistence model contains more trend and period information than the other models, which is why it performs better on the Heidke skill score and in Figures 3–6.
 
^{1}Note. The 27day persistence gives a Heidke skill score of 0.40. 
4.2.3. Ablation Study
To prove the effectiveness of each module of MMP, we conduct the ablation experiments shown in Table 5. The ablated variants are named as follows:

(i) MMP w/o V-module: the MMP model without the vision feature extractor.

(ii) MMP w/o T-module: the MMP model without the time series encoder.
 
^{1}Note. The unit of RMSE and MAE is km/s. 
The results are shown in Table 5. For the different horizon settings, the best results of each model are in bold, and the best results across all models are underlined. We highlight the following findings:

(i) Removing the V-module degrades the results, especially for long-term prediction. In our analysis, the multivariate time series plays the dominant role in short-term prediction, so removing the V-module has little impact there. However, for predictions 3 and 4 days in advance, the solar wind speed is mainly influenced by coronal holes and other driving sources. Capturing the EUV visual information is therefore necessary, and removing the V-module impacts performance significantly.

(ii) In contrast, removing the T-module has a larger impact on short-term prediction. We believe that most solar wind with a short propagation time is background solar wind: the EUV images are relatively calm at such times, and the historical trend information is critical. Therefore, removing the T-module significantly reduces the short-term prediction quality.

(iii) Under every horizon setting, the full MMP model achieves the best results, with the lowest RMSE and MAE and the highest CORR. This shows that combining EUV image data with historical time series provides richer information, improves both short-term and long-term prediction, and confirms that MMP can effectively exploit the complementary information of the two modalities.
4.2.4. Different Pretrained Models
This section compares the effects of different pretrained models. As shown in Table 6, we compare VGG [33], ResNet [34], DenseNet [35], and SqueezeNet [36] as pretrained backbones for capturing the vision feature representation. For the different horizon settings, the best results of each pretrained model are in bold, and the best results across all models are underlined. From Table 6, GoogLeNet obtains the most numerous and best metric results, but the other pretrained models achieve similar performance at some (T, H) combinations, which demonstrates the potential of models pretrained on large-scale datasets to capture image features.
 
^{1}Note. The unit of RMSE and MAE is km/s. 
4.2.5. Hyperparameter Effect
In this section, we analyze the influence of the essential hyperparameter settings on performance. Table 7 shows the influence of the number of BiLSTM layers on the experimental results, with the best results in bold. The BiLSTM in the T-module extracts features along the time dimension of the multivariate time series. As Table 7 shows, both too many and too few BiLSTM layers degrade performance.
 
^{1}Note. The unit of RMSE and MAE is km/s. 
5. Discussion
In this paper, a multimodality solar wind speed prediction method (MMP) is proposed. MMP combines the complementary information of EUV image data and multivariate time series data to capture the dynamic information of the solar surface and the historical trend of the sequence. We first propose a dual-flow structure to extract the features of the two modalities and then fuse them for prediction. In contrast to the baseline models, our model uses information from different modalities and achieves the best results. In addition, we conduct ablation experiments to verify the effectiveness of the vision feature extractor and the time series encoder, compare different pretrained models to verify their effectiveness in capturing image features, and conduct hyperparameter comparison experiments to verify the rationality of our parameter choices.
There are several promising directions for future work. Firstly, this paper treats the contributions of the image data and the time series data as equal for every horizon. However, Table 4 suggests that the two modalities have different importance for 1-day-ahead prediction versus 2-, 3-, and 4-day-ahead prediction. Future research will study the impact of each modality on performance, assign different weights to different modalities, and exploit their complementary relationship to improve performance. Secondly, as Table 4 and Figures 3–6 show, our model cannot capture high-speed solar wind streams well, which is difficult but essential for applications. We will focus on improving peak prediction in the future.
Data Availability
The data and code used to support the findings of this study will be published on GitHub after the paper is accepted: https://github.com/syrGitHub/MMP.
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this article.
Authors’ Contributions
All authors participated in the research design and conducted the experiments. Yanru Sun performed data analysis and accomplished the writing of the manuscript.
Acknowledgments
We acknowledge the use of NASA/GSFC's Space Physics Data Facility's OMNIWeb (or CDAWeb or ftp) service and OMNI data, as well as data courtesy of NASA/SDO and the AIA, EVE, and HMI science teams. We would also like to thank the anonymous reviewers for their careful work and thoughtful suggestions, which have helped improve the manuscript substantially. This work was supported by the National Natural Science Foundation of China under Grant Nos. 61925602 and 61732011.
References
[1] R. Schwenn, "Space weather: the solar perspective," Living Reviews in Solar Physics, vol. 3, no. 1, pp. 1–72, 2006.
[2] M. Hapgood, "Towards a scientific understanding of the risk from extreme space weather," Advances in Space Research, vol. 47, no. 12, pp. 2059–2072, 2011.
[3] K. L. Bedingfield, "Spacecraft system failures and anomalies attributed to the natural space environment," NASA, vol. 1390, 1996.
[4] R. Galvez, D. F. Fouhey, M. Jin et al., "A machine-learning data set prepared from the NASA Solar Dynamics Observatory mission," The Astrophysical Journal Supplement Series, vol. 242, no. 1, p. 7, 2019.
[5] Y.-M. Wang and N. Sheeley Jr., "Solar wind speed and coronal flux-tube expansion," The Astrophysical Journal, vol. 355, pp. 726–732, 1990.
[6] C. Arge and V. Pizzo, "Improvement in the prediction of solar wind conditions using near-real time solar magnetic field updates," Journal of Geophysical Research: Space Physics, vol. 105, no. A5, pp. 10465–10479, 2000.
[7] P. Wintoft and H. Lundstedt, "Prediction of daily average solar wind velocity from solar magnetic field observations using hybrid intelligent systems," Physics and Chemistry of the Earth, vol. 22, no. 7–8, pp. 617–622, 1997.
[8] P. Wintoft and H. Lundstedt, "A neural network study of the mapping from solar magnetic fields to the daily average solar wind velocity," Journal of Geophysical Research: Space Physics, vol. 104, no. A4, pp. 6729–6736, 1999.
[9] D. Liu, C. Huang, J. Lu, and J. Wang, "The hourly average solar wind velocity prediction based on support vector regression method," Monthly Notices of the Royal Astronomical Society, vol. 413, no. 4, pp. 2877–2882, 2011.
[10] Y. Yang, F. Shen, Z. Yang, and X. Feng, "Prediction of solar wind speed at 1 AU using an artificial neural network," Space Weather, vol. 16, no. 9, pp. 1227–1244, 2018.
[11] Y. Sun, Z. Xie, Y. Chen, X. Huang, and Q. Hu, "Solar wind speed prediction with two dimensional attention mechanism," Space Weather, vol. 19, no. 7, article e2020SW002707, 2021.
[12] A. Krieger, A. Timothy, and E. Roelof, "A coronal hole and its identification as the source of a high velocity solar wind stream," Solar Physics, vol. 29, no. 2, pp. 505–525, 1973.
[13] T. Rotter, A. Veronig, M. Temmer, and B. Vršnak, "Relation between coronal hole areas on the Sun and the solar wind parameters at 1 AU," Solar Physics, vol. 281, no. 2, pp. 793–813, 2012.
[14] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "CBAM: convolutional block attention module," in Computer Vision – ECCV 2018, V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss, Eds., vol. 11211 of Lecture Notes in Computer Science, pp. 3–19, 2018.
[15] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7132–7141, Salt Lake City, UT, USA, 2018.
[16] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4700–4708, Honolulu, HI, USA, 2017.
[17] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969, Venice, Italy, 2017.
[18] H. Tan and M. Bansal, "LXMERT: learning cross-modality encoder representations from transformers," 2019, https://arxiv.org/abs/1908.07490.
[19] Z. Huang, Z. Zeng, B. Liu, D. Fu, and J. Fu, "Pixel-BERT: aligning image pixels with text by deep multi-modal transformers," 2020, https://arxiv.org/abs/2004.00849.
[20] V. Upendran, M. C. Cheung, S. Hanasoge, and G. Krishnamurthi, "Solar wind prediction using deep learning," Space Weather, vol. 18, no. 9, article e2020SW002478, 2020.
[21] C. Szegedy, W. Liu, Y. Jia et al., "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9, Boston, MA, USA, 2015.
[22] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[23] H. Raju and S. Das, "CNN-based deep learning model for solar wind forecasting," Solar Physics, vol. 296, no. 9, pp. 1–25, 2021.
[24] J. Lu, D. Batra, D. Parikh, and S. Lee, "ViLBERT: pretraining task-agnostic visiolinguistic representations for vision-and-language tasks," Advances in Neural Information Processing Systems, vol. 32, 2019.
[25] L. H. Li, M. Yatskar, D. Yin, C.-J. Hsieh, and K.-W. Chang, "VisualBERT: a simple and performant baseline for vision and language," 2019, https://arxiv.org/abs/1908.03557.
[26] G. Li, N. Duan, Y. Fang, M. Gong, and D. Jiang, "Unicoder-VL: a universal encoder for vision and language by cross-modal pre-training," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11336–11344, 2020.
[27] D. Zhang et al., "Multi-modal multi-label emotion recognition with heterogeneous hierarchical message passing," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 14338–14346, 2021.
[28] S. M. S. A. Abdullah, S. Y. A. Ameen, M. A. Sadeeq, and S. Zeebaree, "Multimodal emotion recognition using deep learning," Journal of Applied Science and Technology Trends, vol. 2, no. 2, pp. 52–58, 2021.
[29] P. Anderson, X. He, C. Buehler et al., "Bottom-up and top-down attention for image captioning and visual question answering," in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6077–6086, Salt Lake City, UT, USA, 2018.
[30] W. Su, X. Zhu, Y. Cao et al., "VL-BERT: pre-training of generic visual-linguistic representations," 2019, https://arxiv.org/abs/1908.08530.
[31] C. Sun, A. Myers, C. Vondrick, K. Murphy, and C. Schmid, "VideoBERT: a joint model for video and language representation learning," in IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7464–7473, Seoul, Korea (South), 2019.
[32] M. Chen and X. Zhao, "A multi-scale fusion framework for bimodal speech emotion recognition," in Interspeech 2020, pp. 374–378, Shanghai, China, 2020.
[33] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2015, https://arxiv.org/abs/1409.1556.
[34] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, 2016.
[35] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2261–2269, Honolulu, HI, USA, 2017.
[36] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size," 2016, https://arxiv.org/abs/1602.07360.
[37] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: a large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255, Miami, FL, USA, 2009.
[38] J. Gosling, R. Hansen, and S. Bame, "Solar wind speed distributions: 1962–1970," Journal of Geophysical Research, vol. 76, no. 7, pp. 1811–1815, 1971.
[39] Z. Yang, F. Shen, J. Zhang, Y. Yang, X. Feng, and I. G. Richardson, "Correlation between the magnetic field and plasma parameters at 1 AU," Solar Physics, vol. 293, no. 2, pp. 1–13, 2018.
[40] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
[41] A. Graves, A.-R. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649, Vancouver, BC, Canada, 2013.
[42] A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures," Neural Networks, vol. 18, no. 5–6, pp. 602–610, 2005.
[43] W. Boehm, "Inserting new knots into B-spline curves," Computer-Aided Design, vol. 12, no. 4, pp. 199–201, 1980.
[44] D. P. Kingma and J. Ba, "Adam: a method for stochastic optimization," 2014, https://arxiv.org/abs/1412.6980.
[45] M. J. Owens, R. Challen, J. Methven, E. Henley, and D. Jackson, "A 27 day persistence model of near-Earth solar wind conditions: a long lead-time forecast and a benchmark for dynamical models," Space Weather, vol. 11, no. 5, pp. 225–236, 2013.
[46] H. Drucker, C. J. Burges, L. Kaufman, A. Smola, and V. Vapnik, "Support vector regression machines," Advances in Neural Information Processing Systems, vol. 9, pp. 155–161, 1996.
[47] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[48] J. Liu, Y. Sun, C. Han, Z. Dou, and W. Li, "Deep representation learning on long-tailed data: a learnable embedding augmentation perspective," in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2970–2979, Seattle, WA, USA, 2020.
[49] T. Wu, Q. Huang, Z. Liu, Y. Wang, and D. Lin, "Distribution-balanced loss for multi-label classification in long-tailed datasets," in Computer Vision – ECCV 2020, A. Vedaldi, H. Bischof, T. Brox, and J. M. Frahm, Eds., vol. 12349 of Lecture Notes in Computer Science, pp. 162–178, Springer, 2020.
[50] F. Zhou, L. Yu, X. Xu, and G. Trajcevski, "Decoupling representation and regressor for long-tailed information cascade prediction," in Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1875–1879, Canada, 2021.
Copyright
Copyright © 2022 Yanru Sun et al. Exclusive Licensee Beijing Institute of Technology Press. Distributed under a Creative Commons Attribution License (CC BY 4.0).