Support vector machine tropical wind speed retrieval in the presence of rain for Ku-band wind scatterometry

Wind retrieval parameters, i.e., quality indicators and the 2DVAR analysis speeds, are explored with the aim of improving wind speed retrieval during rain in tropical regions. We apply the well-researched support vector machine (SVM) method from machine learning (ML) to solve this complex problem as a data-oriented regression. To guarantee the effectiveness of the SVM, the inputs are extensively analysed to evaluate their appropriateness for this problem before the results are produced. Subsequently, triple collocation shows that the resolved Ku-band (OSCAT-2) wind speed in rain is more similar than the 2DVAR speed to the collocated C-band (ASCAT) speed, which is much less affected by rain. Comparisons of the distributions of, and the differences between, rain-contaminated winds, corrected winds and good-quality C-band winds illustrate that the rain-distorted wind distributions become more nominal with SVM, hence much reducing the rain-induced biases and error variance. Further confirmation is obtained from a case with synchronous Himawari-8 observations indicating rain (clouds) in the scene. Furthermore, the estimation of the simultaneous rain rate is attempted with some success, to retrieve both wind and rain. Although additional observations or higher resolution may be required to better assess the accuracy of the wind and rain retrievals, the ML results demonstrate the benefits of such a methodology in geophysical retrieval and nowcasting applications.

normalized radar cross-sections (NRCS) of the wind-roughened ocean surface in different azimuthal directions from oblique incidence angles. The winds are then obtained by maximum likelihood estimation (MLE) from the measured NRCSs within a wind vector cell (WVC) with reference to a Geophysical Model Function (GMF). Generally, a WVC is a 25 km × 25 km square, and GMFs are empirical models mapping NRCSs from scatterometers at different frequencies, polarizations and observing geometries to winds.
Rain products provide further important information on air-sea interaction. In the Global Precipitation Measurement (GPM) mission, one of the core instruments is the Dual-frequency Precipitation Radar (DPR), working at Ku- and Ka-band in nadir-looking mode. Rain is then obtained by relating the radar cross-sections to a chosen distribution of precipitation particles. Meanwhile, rain products from infrared observations are also widely used, for example, rain rates from the Spinning Enhanced Visible and Infrared Imager (SEVIRI) on board the Meteosat Second Generation (MSG) satellite, which are derived by considering the retrieved cloud condensed water path (CWP), particle distribution, and cloud thermodynamic phase (Wolters et al., 2011). Both rain products described above are good references for rain in Ku-band wind scatterometry (Xu et al., 2020a), though the high spatial and temporal variability of rain generally makes it difficult to achieve small collocation errors and high correlation between instantaneous rain data sets (e.g., Liu et al., 2020).
Combined retrievals of wind and rain generally apply synchronous passive measurements from radiometers for rain in the scatterometer case (Stiles et al., 2010), while winds are retrieved in addition to rain in GPM research (Li et al., 2014). Radiometer winds are of coarser spatial resolution and are not well suited to wind direction retrieval, which would require the third and fourth Stokes parameters, which are now generally obtained at low signal-to-noise ratio (SNR). Scatterometers are not specifically designed for acquiring precipitation profiles. When rain clouds affect the observations, the winds obtained from a wind GMF will deviate from the truth, resulting in biases in the retrieved wind and an increased retrieval residual, called MLE. Since rain is spatially more heterogeneous than wind, rain can be captured and estimated in the NRCS set within a WVC. Considering the distances of the NRCS observations to the wind GMF and the retrieved wind, and with reference to rain observations from the Tropical Rainfall Measuring Mission (TRMM) Precipitation Radar (PR), wind and rain may be segregated (Owen et al., 2011; Draper et al., 2004). Furthermore, the heterogeneous rain within a WVC can be depicted from indicators applied in scatterometer Quality Control (QC) (Lin et al., 2017). Joss is a recent indicator [...] occurs simultaneously with rain in convective downdrafts for C-band winds, hence illustrating the physical integrity of C-band winds in the presence of rain. C-band rejections correspond to the most extreme variability in WVCs, including wind gradients induced by heavy precipitation downdrafts (King et al., 2017). The different rain signatures in C- and Ku-band scatterometers can shed light on the development of methods for correcting the rain-affected winds in Ku-band scatterometer retrievals by referring to their C-band collocations.
Particularly, the combination of MLE and Joss appears promising to segregate wind variability and rain effects in [...]
To derive the complexly associated wind and rain information referred to above, machine learning (ML) may prove to be a powerful tool, which can be applied with knowledge of the validity of the underlying principles (Reichstein et al., 2019). In fact, ML methods have long been researched in wind scatterometry (Thiria et al., 1993; Stiles et al., 2010). For common roughness conditions, ML cannot exceed the performance of GMF-based methods (Cornford et al., 1999), but it may be effective in rainy conditions. Among the ML methods, the support vector machine (SVM) is based on the Mercer theorem; by complementing empirical risk minimization with the Vapnik-Chervonenkis (VC) confidence, it infers statistical relations without a priori distributions and has no local minima (Vapnik, 1998). It can establish an information space based on the training set and, if the training data are well representative of the problem, it also requires fewer samples than other ML methods. In addition, SVM has already provided good results in rain rate estimation (Kumar et al., 2021).
In this research, SVM is applied for the correction of rain-affected winds from Ku-band scatterometers, considering the quantified rain and rain-effect information captured in the QC indicators of Ku-band observations. The GPM rain products and collocated accepted winds from C-band products are used as references. Once this SVM model has been established, the rain-contaminated winds can be corrected with Ku-band winds and their QC indicators alone, without C-band collocations. First, in the method part, the underlying principles of the problem of rain signatures in scatterometry are addressed in detail, with a brief account of the error requirements for assimilation applications, before the data description. Then, in the experimental part, results for the testing set, which is not applied in the training procedure, demonstrate a minimum mean difference of -0.12 m/s at about 8 m/s and a largest difference of -3.25 m/s at about 14 m/s ASCAT speed. The distributions of the corrected winds are obtained and, together with the scatter plots against C-band winds, are inspected, with a check on the differences of the original and corrected winds from ASCAT winds in each wind speed bin, proving the more unbiased and symmetric error of the corrected set and illustrating the advantage of applying SVM. The similarity of the corrected distributions to the references provided by collocated ASCAT winds, together with the reduced mutual differences, indicates that to a certain extent the local (WVC) wind scales are recovered by the SVM corrections. The results suggest that the method resolves the heterogeneity induced by rain clouds in MLE and Joss with the settings of the proposed SVM. Furthermore, a case with data that lack rain collocations, and are thus not involved in deriving the corrections, is provided as a verification case study, where simultaneous images from Himawari-8 provide a concrete view of the rain clouds in the scene.
In the discussion part, rain labelling and regression SVMs are established with the same inputs, attempting rain estimation from scatterometer winds. The rain identification accuracy is 72% for the independent test set not applied in the training procedures. For rain rate estimation, the correlation coefficient of SVM rain with GPM products reaches 0.47 for the independent testing set. An analysis of the uncertainties in the SVM model and possible improvements in the rain estimation procedure are also discussed. The corrected winds increase the global wind coverage and, in synergy with the rain information provided, benefit nowcasting applications (Majumdar et al., 2021). This research illustrates an example of data-driven ML methods, complementary to traditional methods, in a complex problem, which motivates and demonstrates the use of ML methods in meteorological applications.

Method
Research on observation errors, i.e., the deviations from the truth, together with the monitoring information obtained from differences between scatterometer winds and models, supports NWP. Among the errors, undetermined geophysical dependencies, including rain effects, are to be corrected to better understand model biases (Stoffelen et al., 2021), which cannot be achieved by a first-order correction. Apart from this, the control variables define multivariate background errors, and correlated errors between variables are modeled by linear regression (Descombes et al., 2015). Also, 3DVAR and the Kalman filter assume linear or quasi-linear observation operators and Gaussian error distributions, while 4DVAR considers additional dynamical constraints in the time dimension (Parrish et al., 1992; Courtier et al., 1994). Hence linearized Gaussian or quasi-Gaussian errors are vital for the assimilation of observations. We seek to address and correct biases in Ku-band scatterometer wind retrievals due to rain. In the following part, the complex rain signatures in wind scatterometer observations are first analyzed, demonstrating non-Gaussian error features, before the principles of SVM are introduced.

Rain characteristics in MLE, Joss and the fractal parameter
When compared to the C-band winds that are of good quality (accepted), collocated Ku-band QC-rejected WVCs in tropical regions are affected by rain due to the shorter observing wavelength (Xu et al., 2020a). The wind Quality Control (QC) is determined by QC indicators, and the indicator widely applied in operational wind products is the MLE residual obtained through wind inversion. Maximum Likelihood Estimation (MLE) procedures are applied for wind retrieval, using all N NRCS measurements obtained within a WVC. The MLE residual is a normalized Euclidean distance to the cone determined by the GMF (Stoffelen et al., 1997). Its terms are the i-th of the N NRCS measurements within a WVC, a dimensionless constant determined by instrument noise, the variance of the NRCS within this WVC, and the NRCS from a wind GMF, indexed by observing geometry and the local wind vector. Before wind inversion, NRCSs are well calibrated for instrumental as well as GMF uncertainties, which are generally small (~2%) and reproducible or systematic. NRCS calibration and GMF bias-term uncertainties lead to variations in the wind speed probability density function. Errors in the harmonic terms of the GMF may lead to wind direction errors, and to systematic wind speed errors that have associated wind direction errors and vice versa. During the 2DVAR procedure that optimizes wind vector selection, essentially the WVC MLE associated with the selected direction is determined. At the same time, the 2DVAR low-pass filtered analysis winds, which are here referred to as 2DVAR winds, are calculated. When rain affects the NRCS, the GMF does not represent the NRCS measurements well, as rain effects are not considered in the wind GMFs (Stoffelen, 1998). Therefore, this part of the GMF error, due to missed or incompletely modelled rain processes, generates errors of a class that cannot be eliminated by calibration, and causes the error distributions to deviate from the well-calibrated random Gaussian shape.
Note that the KNMI QC flag is based on MLE values, and for WVCs rejected at Ku-band but accepted at C-band in tropical regions, the rejections are mainly caused by rain. Hence, MLE values of the 2DVAR-selected Ku-band wind can be related to rain effects that alter the amplitudes of NRCSs.
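As an illustration of the normalized distance described above, the following minimal sketch computes a mean squared difference between measured and GMF-simulated NRCSs, each term scaled by an expected variance. The exact normalization used operationally is not reproduced here; the function name, the 5% noise level and all NRCS values are hypothetical.

```python
import numpy as np

def mle_residual(sigma_measured, sigma_gmf, variance):
    """Mean squared measured-minus-GMF NRCS difference, normalized per view.

    Assumed form for illustration only: (1/N) * sum((s_m - s_gmf)^2 / var).
    """
    sigma_measured = np.asarray(sigma_measured, dtype=float)
    sigma_gmf = np.asarray(sigma_gmf, dtype=float)
    variance = np.asarray(variance, dtype=float)
    return float(np.mean((sigma_measured - sigma_gmf) ** 2 / variance))

# Hypothetical GMF NRCS values (linear units) for four views of one WVC
gmf = np.array([0.020, 0.015, 0.018, 0.022])
var = (0.05 * gmf) ** 2  # assumed 5% noise level, not a value from the paper

# A wind-only scene stays close to the GMF; a rain-affected scene does not
clean = mle_residual(gmf * np.array([1.02, 0.98, 1.01, 0.99]), gmf, var)
rainy = mle_residual(gmf * np.array([1.30, 1.10, 1.25, 0.95]), gmf, var)
print(clean, rainy)
```

In this toy case the rain-perturbed NRCS set yields a much larger residual than the wind-only set, which is the behaviour that MLE-based QC relies on.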
However, at the same time, the 2DVAR winds do not use QC-flagged WVCs and are hence not affected by local disturbances introduced by rain. The wind speed correction procedure employed here hence changes neither the 2DVAR analysis field nor the selected wind direction at the rain-affected WVCs obtained during the elaborate 2DVAR Multiple Solution Scheme (MSS). Then, the rain effect is estimated by the difference between the 2DVAR analysis wind speed f and the selected observational wind speed fs, corresponding to the wind direction obtained by 2DVAR (Xu et al., 2020a): Joss = f - fs. Note that the 2DVAR winds are low-pass filtered and of relatively coarse resolution, and ignore rain-affected WVCs through MLE-based QC (Vogelzang, 2017). Since the spatially heterogeneous tropical rain clouds are generally of smaller spatial scale than a WVC, rain effects in the 2DVAR analysis winds can be ignored and these winds taken as the true winds (Vogelzang and Stoffelen, 2018). Hence, Joss values can screen and eliminate false alarms (false alarm rate, FAR) in MLE-based QC results for Ku-band wind products after 2DVAR processing, indicating rain information (Xu et al., 2020b). Usually rain clouds will cause negative Joss for wind speeds below 15 m/s. A WVC is usually only partially covered by heavy rain, and since Ku-band rain saturates around 18 m/s, the parameter for the area fraction for Ku-band winds can be expressed as: [...] As 18 m/s winds cannot be distinguished from rain, and to allow rain sensitivity, the rain effect correction set is limited to: [...] for retrieved 2DVAR speeds smaller than or equal to 11 m/s. For 2DVAR wind speeds larger than 11 m/s, the FAR set is limited to Joss < -1.33 (Xu et al., 2021). Negative values of this parameter, corresponding to positive Joss when wind speeds are smaller than 18 m/s, can be attributed to the local variance of the ocean surface. Wind speeds larger than 18 m/s with positive Joss may occur when both rain and winds are large in the scene. For tropical rain this practically only occurs in hurricanes, but it has not yet been investigated with respect to Joss in the criterion above. Thus, this parameter can provide relative information on rain within the WVC from 2DVAR residuals.
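The speed difference between the 2DVAR analysis and the selected observational wind can be sketched as follows. This assumes the sign convention implied by the text (analysis speed minus observed speed, so that rain-elevated Ku-band speeds give negative values); the array values are hypothetical, and the `rain_suspect` mask is an illustrative flag only, not the screening rule of the paper.

```python
import numpy as np

def joss(analysis_speed, selected_speed):
    """Speed difference: 2DVAR analysis speed f minus selected observed speed fs."""
    f = np.asarray(analysis_speed, dtype=float)
    fs = np.asarray(selected_speed, dtype=float)
    return f - fs

# Hypothetical speeds in m/s for three WVCs
f = np.array([7.5, 8.0, 9.0])    # 2DVAR analysis speeds
fs = np.array([7.4, 12.5, 14.0])  # selected observational speeds

j = joss(f, fs)
rain_suspect = j < 0  # illustrative flag: observed speed elevated above analysis
print(j, rain_suspect)
```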
Enhanced wind variability increases MLE, due to beam collocation errors. In particular, extreme wind convergence and divergence are associated with heavy rain (King et al., 2017). The wind variability associated with heavy precipitation may enhance the wind speed, just as rain does at Ku-band; this has been investigated by comparing the 2DVAR winds with ASCAT winds. ASCAT winds are equally sensitive to wind speed variations at the surface, but much less sensitive to rain cloud scattering effects. Hence, the effect of amplitude alterations of a single NRCS in a tropical scene with rain clouds can be obtained through the rain screening ability of Joss.
From the above, rain effects can be represented by MLE, Joss, and the observational wind in the Ku-band retrieval, while the 2DVAR analysis wind provides information on rain sensitivity. In this research, for the WVCs accepted by C-band QC and rejected by Ku-band QC, after the FAR set is eliminated, the Ku-band WVCs are collocated with rain rates from GPM products. Then MLE, Joss values, the 2DVAR winds and the observational winds are applied as inputs to the SVM model, with the collocated C-band winds as the training target. With the established model, corrected winds closer to the observed C-band winds can be obtained for rain-affected Ku-band WVCs, by eliminating non-Gaussian errors within a WVC caused by rain. Moreover, the SVM model, once established, can be applied to Ku-band rejections.

The principle of SVM regression
The SVM regression procedure maps input vectors to a space of higher dimension before the regression is conducted. When the mapping is obtained, and thus described by kernel functions determined from the training sets, non-linear features are linearized. This makes it possible to solve problems that are non-convex and difficult to solve in the original input space, as well as to linearize intricate relations. Specifically, during the training procedure, weights for the input vectors of the training set in the mapped space are determined, and the corresponding support vectors (SVs) can be identified by the values of the corresponding weights. The weights are applied to scale similarities with other vectors in the training set, and they are obtained by minimizing distances to the targets of the training vectors. Moreover, the similarity is measured between the kernel-function-mapped inputs. In this way, the data involved in training embody the underlying model in a space that facilitates information extraction. Furthermore, L2-norm distance minimization is achieved by an objective function expressed as the distances between the vectors in the training sets and the plane fixed by the weighted support vectors in the mapped space (Vapnik, 1998).
The kernel functions generally employed are linear, polynomial or Gaussian radial basis functions (RBF). Among them, the RBF, or Gaussian kernel, is superior in mapping to unlimited dimensions and easier in hidden parameter setting. For the RBF, the similarity between a vector x and the selected support vector is expressed as (Vapnik, 1998; Smola et al., 1998): [...] where σ is the scale parameter weighting the similarity of x and the support vector; the larger the value of σ, the more x and the support vector can be taken as similar. If the L2 distance is applied: [...] With θi as weights and the target value corresponding to each training vector, the objective function can be expressed as: [...] where C is the relaxation coefficient and the L2 distance is applied as the cost functions 1 and 0 (Smola et al., 1998; Chang et al., 2019).
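The role of the scale parameter σ can be illustrated with the standard Gaussian kernel, exp(-||x - l||² / (2σ²)), where l denotes a support vector. This is the textbook form and is assumed here rather than taken from the paper's own equation; the function name and the example vectors are hypothetical.

```python
import numpy as np

def rbf_similarity(x, support_vector, sigma):
    """Gaussian (RBF) similarity: exp(-||x - l||^2 / (2 * sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(support_vector, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

x = np.array([1.0, 2.0])   # hypothetical input vector
l1 = np.array([2.0, 4.0])  # hypothetical support vector

identical = rbf_similarity(l1, l1, sigma=1.0)  # a vector compared with itself
narrow = rbf_similarity(x, l1, sigma=0.5)      # small sigma: low similarity
wide = rbf_similarity(x, l1, sigma=5.0)        # large sigma: high similarity
print(identical, narrow, wide)
```

For the same pair of vectors, the similarity grows with σ, in line with the statement above. In scikit-learn's RBF kernel, exp(-gamma * ||x - l||²), this form corresponds to gamma = 1 / (2σ²).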

The expression of rain in wind retrieval parameters
The representativeness of the data sets from which the featured SVs are obtained is vital in the SVM procedure. In this research, OSCAT-2 WVCs [...] of 4.8 min. Furthermore, rain products are area-weighted over the OSCAT-2 WVCs to obtain WVC-representative rain rates (Xu et al., 2020c). Finally, for validation, the images of the 11th band (medium infrared, MI, 8.6 μm) with 2 km resolution in the tropics are also used for reference (Otemachi, C.-k. et al., 2015).
In Figure 1 (a) we note that the observed wind distributions from ASCAT and OSCAT-2 are similar, while in Figure 2 (a) the Ku-band winds are much elevated with respect to ASCAT and clearly suspect, while the latter distribution appears nominal and similar to that in Figure 1 [...] We note from Figure 1 and Figure 2 that rain affects OSCAT-2 data while collocated ASCAT winds remain of acceptable quality. The winds distorted by rain (clouds) are clearly segregated by the FAE, resulting in a deformed speed distribution, as well as much elevated MLE and Joss, all of which can potentially be related to the WVC rain rate.
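The area weighting of rain products over a WVC mentioned above can be sketched as a weighted mean, with each rain pixel weighted by the area it shares with the WVC. The pixel rain rates and overlap areas below are hypothetical, and the actual collocation geometry of the paper is not reproduced.

```python
import numpy as np

def wvc_rain_rate(rain_rates, overlap_areas):
    """Area-weighted mean rain rate (mm/h) over one wind vector cell."""
    rain_rates = np.asarray(rain_rates, dtype=float)
    overlap_areas = np.asarray(overlap_areas, dtype=float)
    return float(np.average(rain_rates, weights=overlap_areas))

# Three hypothetical rain pixels covering a 25 km x 25 km WVC (625 km^2 in total)
rates = [0.0, 2.0, 10.0]       # mm/h
areas = [300.0, 200.0, 125.0]  # km^2 of overlap with the WVC

rate = wvc_rain_rate(rates, areas)
print(rate)
```

A small patch of heavy rain is strongly diluted by the rain-free part of the cell, which is one reason why WVC-representative rain rates are lower than the peak rates in the rain product.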

SVM for Ku-band wind correction in rain
For the correction of rain effects an SVM model is established, where the inputs are determined by the wind-rain related parameters, as described in the previous sections. Specifically, the inputs and outputs are listed in Table 1.
Table 1: [...] OSCAT-2 observational speed [...]
The SVM tool from sklearn is applied, which is based on libsvm, to realize the procedure described in section 2 for SVM (Chang and Lin, 2019). In total, there are 18,528 WVCs obtained from FAE in OSCAT-2 collocations for ASCAT-A and ASCAT-B together. Among them, 70% (12,969 WVCs) are used in training and 30% (5,559 WVCs) for testing or validation. Note that the testing set is not applied in the training procedure.
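The training set-up described above can be sketched with scikit-learn's `SVR` on synthetic stand-in data. Everything about the data is an assumption made for illustration: the synthetic relations between rain, MLE and speeds, the feature scaling step, and the values of `C` and `epsilon` are not taken from the paper. Only the four inputs (MLE, Joss, 2DVAR speed, observational speed), the C-band speed as target, the RBF kernel choice and the 70/30 split follow the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n = 2000  # synthetic sample count, far smaller than the 18,528 WVCs in the text

# Synthetic stand-ins (assumptions, not real scatterometer data)
true_speed = rng.uniform(3.0, 14.0, n)                        # plays the role of ASCAT speed
rain_bias = rng.gamma(2.0, 1.5, n)                            # rain-induced speed elevation
speed_2dvar = true_speed + rng.normal(0.0, 0.8, n)            # smooth analysis speed
speed_obs = true_speed + rain_bias + rng.normal(0.0, 0.5, n)  # rain-elevated Ku-band speed
joss = speed_2dvar - speed_obs
mle = 1.0 + 2.0 * rain_bias + rng.normal(0.0, 0.5, n)

X = np.column_stack([mle, joss, speed_2dvar, speed_obs])
y = true_speed

# 70% training, 30% testing; the test set is never seen during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# RBF-kernel support vector regression; hyperparameters are placeholders
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X_train, y_train)
corrected = model.predict(X_test)

rmse_obs = float(np.sqrt(np.mean((X_test[:, 3] - y_test) ** 2)))
rmse_corrected = float(np.sqrt(np.mean((corrected - y_test) ** 2)))
print(rmse_obs, rmse_corrected)
```

On this toy data the corrected speeds are much closer to the reference than the uncorrected observational speeds, but the numbers say nothing about the performance on real collocations.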

Results
Starting from the large input biases illustrated in Figure 3 (a), typically 5 m/s, Figure 3 shows the corrected winds against the accepted winds from ASCAT-A and ASCAT-B for the training set in (a) and the validation set in (b). In (c) and (d), the observational winds and 2DVAR winds of OSCAT-2 are also plotted against ASCAT winds. Some of the corresponding statistics for (a) to (d) are listed in Table 2.
Table 2: Mean and standard deviation of difference (SDD) statistics corresponding to Figure 3 (a-d).
As can be seen from Figure 3 [...] it is noteworthy that there is a sign change for these speed differences, suggesting an excessive speed range suppression for wind speeds both lower and higher than around 8 m/s. This trend also exists in Figures 3(c) and 3(d) for the observational and 2DVAR winds against ASCAT winds, as seen from the curvature of the red lines representing mean bin values, though the 2DVAR and observational speeds are generally smaller and larger than the ASCAT wind speed, respectively, and the distances are larger in absolute value for the observational winds. This is consistent with the fact that the OSCAT 2DVAR wind filters out the details of the local wind changes, ignoring wind variability due to rain that is captured by the good-quality C-band observations at finer resolution. We further note that Figure 3 and Table 2 are based on a conditional binning of ASCAT winds, while ASCAT winds are not perfect and OSCAT is not perfectly collocated with ASCAT. Such uncertainty in ASCAT also tends to flatten the red curves in Figure 3.
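The conditional binning behind such statistics can be sketched as follows: differences are grouped by the reference (ASCAT) speed, and the mean and standard deviation of difference (SDD) are computed per bin. The sign convention (estimate minus reference), the bin edges and the speed values are assumptions chosen for illustration.

```python
import numpy as np

def binned_difference_stats(reference, estimate, bin_edges):
    """Per-bin (low, high, count, mean difference, SDD), binned on the reference speed."""
    reference = np.asarray(reference, dtype=float)
    diff = np.asarray(estimate, dtype=float) - reference
    idx = np.digitize(reference, bin_edges) - 1  # bin index of each sample
    stats = []
    for b in range(len(bin_edges) - 1):
        d = diff[idx == b]
        if d.size:  # skip empty bins
            stats.append((float(bin_edges[b]), float(bin_edges[b + 1]),
                          int(d.size), float(d.mean()), float(d.std())))
    return stats

# Hypothetical reference and corrected speeds in m/s
ascat = np.array([4.2, 4.8, 8.1, 8.4, 8.9, 13.5])
corrected = np.array([4.9, 5.3, 8.0, 8.2, 8.9, 11.0])
edges = np.array([4.0, 6.0, 8.0, 10.0, 12.0, 14.0])

stats = binned_difference_stats(ascat, corrected, edges)
for low, high, count, mean_diff, sdd in stats:
    print(f"{low:4.1f}-{high:4.1f} m/s  n={count}  mean={mean_diff:+.2f}  SDD={sdd:.2f}")
```

In this toy example the mean difference is positive in the lowest bin and negative in the highest, mimicking the sign change around moderate speeds described above.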

Figure 3 panels: (a) corrected winds in the training set; (b) corrected winds in the validation set.
In Figure 4, the distributions of the OSCAT-2 observational wind speed, the OSCAT-2 2DVAR speed, the collocated ASCAT speed and the SVM-corrected speed are displayed for the testing set.

In Figure 4 (a), the blue curve indicates that rain-affected OSCAT-2 winds are elevated and skewed to higher speeds, peaking at around 12 m/s. They also deviate from the corresponding 2DVAR speeds (purple) as well as the collocated ASCAT winds (green). Similar to the latter two, the SVM-corrected winds (lighter blue) peak at a speed of around 8 m/s. This is also consistent with Figure 1(a). Moreover, note that the 2DVAR wind distribution extends to the lowest speeds and deviates more than the corrected winds from the ASCAT observations. The corrected winds show a very similar shape to the ASCAT distribution, proving the effectiveness of the SVM. Figure 4 (b) shows the speed errors, defined as the differences with respect to the ASCAT observations. Consistent with (a), the errors are distributed more symmetrically and over the smallest range for the corrected winds. The more Gaussian-like features of this speed error as compared to the other groups can be more easily [...] can be observed that deviations from the C-band accepted collocations due to rain vary with the reference wind speeds in a similar linear way, while for each wind speed there are multiple differences induced by rain. This is consistent with the quasi-linear relationship between Joss and rain rates in Figure 2, and explains why such second-order (speed difference vs. speed) relations involving multiple parameters (rain, wind and wind-rain correlations) cannot be corrected by simpler linear methods. In (b), the corresponding density of samples indicates non-uniform characteristics of the distribution of the differences for each reference speed (horizontal axis), implying skewed error distributions.
At the same time, in (c) and (d), it can be seen that most of the differences are corrected by the SVM, while (d) shows more evenly distributed difference patterns for the moderate wind speeds, where rain contamination effects appear better resolved, implying more uniform and normal difference values. This goes along with the distribution of corrected OSCAT winds being slightly skewed away from the diagonal, which may be due to the lack of samples at higher wind speeds.

Spatial consistency of corrected winds
In this section, to obtain a spatial view of the results, figures of the collocated data on a randomly selected date (22 May 2017) are provided in Figure 6, where (a) shows the wind speed of OSCAT-2 in both QC-I and QC-II collocations, and that of the remaining FAE set. The same set is displayed in (b), but with the FAE OSCAT-2 wind speeds taken from the SVM corrections. In (c) the regressed wind is replaced by the ASCAT accepted winds. Furthermore, the data in Figure 4 are without GPM collocations, and the SVM winds are retrieved directly from the model established in section 3.2.

Figure 6: OSCAT-2 speed (m/s, in colour bars) for the QC-I collocation set and the FA and FAE in the QC-II set (a); that of the QC-I and QC-II FA sets when the FAE in QC-II are replaced by SVM-regressed speeds (b); and when the FAE wind speeds are substituted by collocated ASCAT-A and ASCAT-B speeds (c). (d) shows the differences of the speeds in (c) from their corresponding ASCAT speeds, (e) indicates the FAE locations, and (f) shows the statistics of the corrected winds against ASCAT winds.
In Figure 6, the abscissas are longitudes and the ordinates latitudes, both in degrees, and the colour bars indicate wind speeds in m/s. The ascending and descending tracks are displayed together, with later observations replacing earlier ones. It can be observed that the colour red in (a) is suppressed in (b), while (b) is also more consistent with (c) than (a) is. This can be directly observed from (d), with the corrected wind locations from (e). (f) shows a generally acceptable correction in this region, with speeds higher than 12 m/s overestimated. Similar trends can also be noted in regions becoming much bluer, especially in cases that can be found near the red regions. Nota bene, the higher wind regions with speeds larger than 15 m/s have fewer samples and are also limited by the FA rule limiting Joss to -1.33 m/s, above which the wind-rain tangling at higher speeds cannot be well resolved. Moreover, a region with no GPM collocation, thus not involved in the training procedures, is selected from the data set generating Figure 7, and is shown in Figure 8 [...]

Figure 7: Wind speed of the QC-I, QC-II FA and QC-II FAE (a), the FAE set replaced by the SVM-regressed speed (b) and by speeds from their ASCAT collocations (c), with the synchronous MIR (e) images from Himawari-8, where the green rectangle indicates the region in (a), (b) and (c).

It can be seen from Figure 8 that higher wind due to rain is suppressed by the proposed method, while for higher wind speeds around 12 m/s, the SVM-regressed winds become somewhat less consistent with the ASCAT truth, as discussed in the previous section. The effectiveness of the SVM-regressed winds is further confirmed with the data in Figure 8, as they have not been applied in the derivation of the SVM.

Discussion
Air-sea interaction in the vicinity of rain is complex and difficult to observe. In this research, the effect of rain in Ku-band wind scatterometry is explored for the correction of retrieved wind under rainy conditions. The method employed is as follows: on the basis of the analysis of signatures induced by rain in parameters obtained during wind retrieval from scatterometers, rain effects are corrected as a function of these signatures. Specifically, to quantify the heterogeneity induced by rain and its effect on the wind speed, the quality indicators MLE and Joss are analyzed, with reference to the low-pass filtered 2DVAR winds and collocated ASCAT winds (Xu et al., 2020a). Accepted C-band ASCAT winds are used as reference to identify the rain effects and form the basis of a correction after establishing an SVM. Results show that the correction is adequate, especially at speeds with abundant information in the Ku-band to segregate wind and rain (under 12 m/s). The corrected winds are found to be spatially more consistent with the ASCAT observational winds than the 2DVAR winds are. Subsequently, a case is provided with comparison to MIR images to check for rain occurrence. This confirms that the proposed SVM method is effective. Hereafter, rain information extraction from scatterometers is established, followed by further analysis and discussion of the remaining uncertainties, with a view to improvements in future work.

SVM for rain identification and regression
To view the uncertainties left unresolved by wind-rain tangling in Ku-band wind scatterometry, SVMs with the same inputs are established for rain identification and regression, as listed in Table 3.
Table 3: Inputs and outputs for the SVMs of rain classification and regression [...]
The data set is the same as for the wind correction, while the training target is changed to GPM rain. The classification accuracies of the rain identification SVM for the training and testing sets are both 72%. The results for rain regression are shown in the following figure, where the correlation coefficients of the SVM-regressed and GPM rain rates for the training set and the testing set are both 0.47. Little skill for rain rate appears below 5 mm/h, while GPM produces more extreme rain rates > 10 mm/h. The corresponding scatter plots of the regressed rain rates in the training set and testing set are depicted in [...]
From visualization of the classification results (details not shown), non-rainy WVCs are less often incorrectly classified than rainy WVCs. WVCs with larger 2DVAR speeds are well clustered and can be better discriminated in MLE, Joss and α to the correct class, while this is more difficult for WVCs with smaller 2DVAR speeds. Light rain clouds have small effects on the wind observations. Correspondingly, Figure 10 (a) shows the distribution of rain rates from GPM (blue), from SVM regression (purple), and that of the error defined as the GPM rain rate minus the regressed values (green). The corresponding CDF of the error is shown in (b). In addition to Figure 9, Figure 10 (a) shows in detail that the SVM-regressed rain fails to capture the non-convex feature at lower rain rates and to predict higher rain rates. This may be due to the L2 distance norm applied and to the lack of information as well as samples. For GPM rain above 10 mm/h, OSCAT rain rates are rather randomly distributed and presumably lack skill. However, from (b) in Figure 10, it can be observed that the error is symmetric and that its CDF increases steadily. Errors within the range of [-2, 2] mm/h account for 34%, and those within [-5, 5] mm/h for about 80%, consistent with the correlation coefficient value of 0.47. Applying the L1 distance and including other sources of observation, with an increasing number of samples, may help improve the results. Xu et al. (2020b) find a similar spread in rain products at the scatterometer spatial resolution, hence illustrating the applicability of the SVM rain product derived here.
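The summary statistics quoted above (correlation coefficient and the share of errors within a given range) can be computed as in the following sketch. The error is taken as GPM minus regressed rain rate, as defined in the text; the function name and the rain rate values are hypothetical and do not reproduce the reported 0.47, 34% or 80%.

```python
import numpy as np

def rain_regression_summary(gpm_rain, svm_rain, half_widths=(2.0, 5.0)):
    """Pearson correlation and fraction of errors within +/- each half-width (mm/h)."""
    gpm_rain = np.asarray(gpm_rain, dtype=float)
    svm_rain = np.asarray(svm_rain, dtype=float)
    error = gpm_rain - svm_rain  # GPM minus regressed value
    corr = float(np.corrcoef(gpm_rain, svm_rain)[0, 1])
    fractions = {w: float(np.mean(np.abs(error) <= w)) for w in half_widths}
    return corr, fractions

# Hypothetical collocated rain rates in mm/h
gpm = np.array([0.5, 1.0, 3.0, 6.0, 12.0])
svm = np.array([1.0, 2.0, 2.5, 3.0, 5.5])

corr, fractions = rain_regression_summary(gpm, svm)
print(corr, fractions)
```

The toy values also mimic the underestimation of the highest GPM rain rates, which dominates the errors outside the [-5, 5] mm/h range.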

Conclusions and further research
Rain features in Ku-band wind scatterometry can trigger QC rejections. These effects also provide opportunities to identify rain and perform wind corrections. The proposed SVM method performs well for medium and lower wind speeds, while the wind-rain tangling remains severe for higher wind speeds. This can also be noted from the rain identification and regression.
On the other hand, the rain features in MLE and Joss, as well as in the uncorrected speed, show that uncertainties can be introduced by the training parameters. The normalized MLE is designed to characterize errors that result in large deviations from the GMF for QC, but its accuracy depends on the relative wind vector and on the azimuthal diversity of the NRCS views. The 2DVAR speed, in turn, is derived by balancing errors in the observation space of a grid of WVCs and the NWP background, and thus represents larger spatial scales. The 2DVAR speeds can therefore be considered lower-bound estimates of the true values, and the uncertainties in the wind speeds can differ due to spatial heterogeneity. This may hamper the effectiveness of the rain screening ability of Joss. To bound those uncertainties for better SVM results, extra observations of rain (clouds) can help, while higher spatial resolution is obtained in the next generation of scatterometers for simultaneous ocean surface wind and current measurements (e.g., Chelton et al., 2018; Du et al., 2021).
OSCAT-2 and ASCAT collocations provided a unique opportunity to study rain effects in Ku-band scatterometers. Rain effects are rather transient in nature, as the moist convection time scale is about 30 minutes. This implies that updrafts, downdrafts and rain patterns in a WVC change very fast, and rather strict collocation criteria would be needed to resolve rain effects well. With WindRad on FY3E, a combined C- and Ku-band scatterometer was launched on 5 July 2021, which will provide parts of the swath with excellent azimuth diversity and both C- and Ku-band retrieval capability. Hence, this mission will be useful to further elaborate this research.
Above all, the SVM can effectively represent the increasing effect of rain in elevating wind speeds as the true wind speed decreases, showing the advantage of the ML method for such complex problems involving multiple interrelated variables. The method corrects deviations that are non-uniform and skewed, bringing them towards Gaussian-like features. This demonstrates the effectiveness of an ML method when used with representative parameters for addressing more complex problems. The corrected winds provide information that was previously lacking, yet is vital for nowcasting winds in the presence of moist convection and for improving the initialization of NWP models in dynamic conditions. The rain regression in SVM indicates the potential of additional rain observations for further exploration, as well as the promise of improved hybrid wind and rain estimation methods based on ML, with parameters chosen considering their physical features for the problem at hand.

Code availability
None. The experiments can be reproduced upon request.

Appendix:
The mean values and standard deviations of differences
For the comparison of two collocated groups of data, one group is set as the reference group. Figures and values are then obtained by grouping the reference data (horizontal axis) and the data set to compare (vertical axis) into i bins with the same number of samples j. For each mean value of the reference data, Refi (first column in the tables), there is a corresponding mean value (second column in the tables) and a standard deviation value (third column) calculated for the data to compare (vertical axis in the figures). Specifically, the following equations describe the calculation of the mean value and the standard deviation of difference (SDD): where the values of the group to compare are _ .
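The binning procedure described above can be sketched in code as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `binned_mean_sdd` is hypothetical, the pairs are assumed to be sorted by reference value before binning, the SDD is assumed to be the population standard deviation (normalization by j) of the differences between the compared and reference values within a bin, and a trailing incomplete bin is dropped.

```python
import math


def binned_mean_sdd(ref, cmp, samples_per_bin):
    """Return one (ref_mean, cmp_mean, sdd) tuple per bin.

    Assumptions (not confirmed by the source): pairs are sorted by the
    reference value, each bin holds exactly `samples_per_bin` pairs, and
    the SDD is the population standard deviation of (cmp - ref).
    """
    pairs = sorted(zip(ref, cmp))  # sort by reference value
    rows = []
    last_start = len(pairs) - samples_per_bin
    for start in range(0, last_start + 1, samples_per_bin):
        chunk = pairs[start:start + samples_per_bin]
        j = len(chunk)
        ref_mean = sum(r for r, _ in chunk) / j
        cmp_mean = sum(c for _, c in chunk) / j
        diffs = [c - r for r, c in chunk]
        diff_mean = sum(diffs) / j
        sdd = math.sqrt(sum((d - diff_mean) ** 2 for d in diffs) / j)
        rows.append((ref_mean, cmp_mean, sdd))
    return rows


# Hypothetical collocated wind speeds in m/s (illustrative values only).
reference = [1.0, 2.0, 3.0, 4.0]
compared = [2.0, 2.0, 5.0, 5.0]
rows = binned_mean_sdd(reference, compared, samples_per_bin=2)
```

Each returned tuple corresponds to one table row as described in the text: the reference mean in the first column, the mean of the data to compare in the second, and the standard deviation value in the third.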

Competing interests
Ad Stoffelen is associate editor of AMT.