Retrieval of ice water path from the FY-3B MWHS polarimetric measurements based on deep neural network

Abstract. Ice water path (IWP) is an important cloud parameter in atmospheric radiation, and its retrieval remains difficult. Artificial neural networks have become a popular method in atmospheric remote sensing in recent years. This study presents a global IWP retrieval based on deep neural networks using measurements from the Microwave Humidity Sounder (MWHS) onboard the FengYun-3B (FY-3B) satellite. Since FY-3B/MWHS has quasi-polarization channels at 150 GHz, the effect of the polarimetric radiance difference (PD) is also studied. A retrieval database is established using collocations between MWHS and CloudSat 2C-ICE. Two types of networks are then trained, for cloud scene filtering and IWP retrieval, respectively. For the cloud filtering network, the microwave channels show limited capacity, with a false alarm ratio (FAR) of 0.26 and a probability of detection (POD) of 0.63. For the IWP retrieval network, different input combinations of auxiliaries and channels are compared. The results show that the five MWHS channels combined with scan angle, latitude, and ocean/land mask perform best. Applying the cloud filtering network and the IWP retrieval network, the final root mean squared error (RMSE) is 916.76 g m⁻², the mean absolute percentage error (MAPE) is 92%, and the correlation coefficient is 0.65. A tropical cyclone case measured simultaneously by MWHS and CloudSat is then chosen to test the performance of the networks, and the result shows a good correlation with 2C-ICE. Finally, the 2015 annual mean IWP from MWHS shows a similar overall shape to that of MODIS, 2C-ICE and ERA5, and is very close to 2C-ICE in magnitude.


Introduction
Ice clouds play an important role in the global climate (Liou, 1986), and their distribution strongly affects precipitation and the water cycle (Eliasson et al., 2011; Field and Heymsfield, 2015). Long time series and global observations of ice clouds are essential for understanding the Earth's climate system. Depending on the observation wavelength, satellite remote sensing can measure different cloud microphysical properties. Microwave measurements can penetrate deeper into cloud layers to measure thick and dense ice clouds, while infrared and visible instruments are mainly used to measure thin clouds near the cloud top (Liu and Curry, 1998; Weng and Grody, 2000; Stubenrauch et al., 2013). Although the ice water path (IWP) obtained from different instruments differs by a factor of several (Stephens and Kummerow, 2007; Wu et al., 2009), remote sensing remains of great importance for obtaining the microphysical properties of clouds. Active observations such as lidar and radar, as well as passive measurements such as visible/infrared imaging spectrometers and microwave radiometers, have been used to produce cloud products (King et al., 1998; Austin et al., 2009; Delanoë and Hogan, 2010; Deng et al., 2010; Boukabara et al., 2011). Millimeter-wave radiometers are sensitive to larger precipitating hydrometeors, while sub-millimeter frequencies are sensitive to smaller ice particles (Buehler et al., 2007). Cloud radar has higher vertical resolution and sensitivity than passive radiometers and can determine the vertical structure of ice clouds. However, this usually comes at the cost of a limited spectral range and low spatial coverage of the observations (Pfreundschuh et al., 2020).
The brightness temperature (TB) depression caused by scattering from ice particles is usually proportional to the IWP, which simplifies retrieval from radiometric measurements (Liu and Curry, 2000). Studies of ice cloud retrieval using radiometers such as AMSU, SSMIS, MHS and MWHS, as well as limb sounders such as MLS, SMR and SMILES, have been published for years (Zhao and Weng, 2002; Eriksson et al., 2007; Wu et al., 2008; Sun and Weng, 2012; Millán et al., 2013; Wang et al., 2014). However, these spaceborne radiometers lack polarization measurement capability, while dual-polarization measurements above 100 GHz show obvious polarized scattering signals from ice clouds. Recent theoretical modeling indicates that non-spherical and oriented ice particles are the main cause of the polarization signal.
With increasing frequency, polarimetric measurements will lead to a new understanding of clouds and their microphysics (Buehler et al., 2012; Eriksson et al., 2018; Coy et al., 2020; Fox, 2020). Most passive microwave sensors that have dual-polarization channels are limited to frequencies below 100 GHz. However, these sensors are strongly affected by surface contamination. Currently, only GMI and MADRAS have observed polarimetric signals from ice clouds above 100 GHz (Defer et al., 2014; Gong and Wu, 2017). By analyzing the polarization differences of the 89 GHz and 166 GHz channels of GMI, Gong and Wu (2017) found that large polarization occurs mainly near the convective outflow regions (anvil or stratiform precipitation), while in the inner deep convective core and the distant cirrus regions, the polarization signal is smaller. It is roughly estimated that neglecting the polarimetric signal in the IWP retrieval will lead to errors of up to 30% (Gong et al., 2018). Their further study showed that the main source of the high polarimetric radiance difference (PD) at 166 GHz is horizontally oriented snow aggregates or large snow particles, while the low polarization signal could come from small cloud ice, randomly oriented snow aggregates, foggy snow, or supercooled water (Gong et al., 2020). The Ice Cloud Imager (ICI) will provide a more comprehensive observation of ice clouds. By covering 176 GHz to 668 GHz, ICI has good sensitivity to both large and small ice particles, and its dual-polarization channels also allow observation of horizontally oriented particles.
The Microwave Humidity Sounder (MWHS) onboard the Fengyun-3B (FY-3B) satellite has been proven to give information about IWP (He and Zhang, 2016). The MWHS has quasi-polarization channels at 150 GHz that can provide polarization information on cloud ice. A neural network is an easy way to find the nonlinear relationships between TBs and IWP; the main problem is the lack of true IWP values. CloudSat is recognized as a relatively accurate instrument for cloud measurement, and its official Level-2C product is used in this paper. Numerous studies have compared CloudSat products with in-situ measurements; the results show that the Level-2C product is quite reliable when using a combination of the Cloud Profiling Radar (CPR) and lidar. Its ice water content (IWC) is fairly close to in-situ observations (Deng et al., 2013; Heymsfield et al., 2017). Although CloudSat products still have considerable uncertainties (Duncan and Eriksson, 2018), they provide a relatively accurate reference for IWP and IWC. Holl et al. (2010, 2014) present an IWP product (SPARE-ICE) that uses collocations between MHS, AVHRR, and CloudSat to train a pair of artificial neural networks. The 89 GHz and 150 GHz channels were excluded since they are surface sensitive.
However, the 150 GHz channel shows good sensitivity to precipitation-sized ice particles Bauer, 2003). Brath et al. (2018) retrieve IWP from airborne radiometers of ISMAR and MARSS using neural networks.
In this paper, we present an analysis of IWP retrieval from FY-3B/MWHS observations based on deep neural networks. Both 150 GHz channels (QV and QH) and their PD are investigated. First, we collocate the MWHS measurements with the CloudSat/2C-ICE IWP according to observation time and geolocation. Second, we train deep neural networks (DNNs) to filter cloud scenes and retrieve the IWP. The effects of different channels (including PD) and auxiliary information on the DNN retrieval are also discussed. Finally, the performance of the final network configuration is evaluated. The trained neural networks are applied to a tropical cyclone case and used to produce the global annual mean IWP map of MWHS. The zonal mean IWP of MWHS is also compared with the Aqua/MODIS L3 product, 2C-ICE and ERA5 reanalysis data. The main aim of this study is to analyze the ability of MWHS in IWP retrieval, especially the role played by the dual-polarization channels. The paper is organized as follows: the data are described in Sect. 2, followed by the retrieval method in Sect. 3. The IWP retrieval results and analysis are discussed in the subsequent section, with conclusions at the end.

FY-3B/MWHS
The FY-3B satellite was launched on November 5, 2010, with the MWHS as one of its main payloads. The MWHS performs cross-track scanning at angles of up to ±53.35° from nadir, making 98 nominal measurements per scan line, which corresponds to a swath of 2645 km scanned in 2.667 s with a resolution of 15 km at nadir. It measures at frequencies from 150 GHz to 190 GHz (two window channels at 150 GHz and three channels near the water vapor absorption line at 183 GHz); these channels are labeled CH.1 to CH.5 hereafter. The details of each channel are shown in Table 1 (Wang et al., 2013). Unlike its successor (i.e. MWHS-II) onboard the FY-3C/D/E satellites, the MWHS has 150 GHz channels with quasi-horizontal and quasi-vertical polarization, which can provide unique cloud information.
These channels can provide information about the Earth's surface and the lower atmosphere, and can also be used to measure atmospheric cloud parameters. For the 150 GHz channels, Zou et al. (2014) investigated the polarization information and concluded that the polarization signal is related to the scan angle and also to surface wind speed, wind direction and salinity, especially under clear-sky conditions. Under all weather conditions except heavy precipitation, all five channels of MWHS can observe water vapor and ice in the atmosphere. In this study, the Level-1B brightness temperature data set of MWHS is used.

CloudSat/CALIPSO
CloudSat is a cloud observation satellite launched into the NASA A-Train in April 2006, carrying a 94 GHz cloud profiling radar that provides continuous cloud profile information (Stephens et al., 2008). The footprint of a CPR observation is about 1.3 km × 1.7 km, with a vertical resolution of 240 m. The scan time for each profile is about 0.16 s, and its sensitivity is -30 dBZ. It has an orbital inclination of 98.26°, similar to that of the FY-3B satellite. The Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) satellite was launched together with CloudSat, and the two were designed to fly close to each other in the A-Train constellation to make synergistic observations. The Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) carried on CALIPSO is a dual-wavelength polarized lidar, providing 532 nm and 1064 nm backscatter profiles with a footprint of 75 m cross-track and 1 km along-track.
The CloudSat and CALIPSO Ice Cloud Characterization product (2C-ICE) contains retrieved estimates of IWC, effective radius and extinction coefficient for identified ice clouds measured by CPR and CALIOP. The 2C-ICE cloud product uses a combined input of the radar reflectivity factor measured by the CPR and the attenuated backscatter coefficient measured by the lidar at 532 nm to constrain the ice cloud retrieval more tightly than the radar-only product and to produce more accurate results (Mace and Deng, 2019). The combination of CPR and CALIOP provides a more complete measurement of ice clouds than any other current spaceborne sensor. Further study showed that this combined retrieval method is less sensitive to changes in the assumed microphysical properties than retrievals from CPR or CALIOP alone (Delanoë and Hogan, 2010).
The 2C-ICE retrieval relies on forward model assumptions. Lidar is sensitive to small particles near the cloud top but cannot penetrate deep into the cloud, which can lead to an unquantifiable error (Mace et al., 2009). A sensitivity study shows that multiple scattering and the assumptions regarding particle habits and size distribution shapes are critical to the accuracy of the retrieval (Deng et al., 2010). That study also finds that the ratio between the IWC product and in-situ measurements is similar to the ratio between two independent in-situ measurements (around a factor of 2) and concludes that the retrieval agrees well with in-situ data. Since 2C-ICE is used to train the retrieval network in this work, the trained network directly inherits all the systematic errors and limitations of the product.

Collocation
A collocated measurement occurs when two or more sensors observe the same region at the same time. One factor in the collocation window requirements is the specific observation target. Ice cloud is a fast-changing (minutes to hours) atmospheric parameter that requires a window that is short in time and small in space. Another factor in defining the collocation window is the number of meaningful samples available for training.
The ascending node time of CloudSat is between 13:30 and 13:45 local solar time (LST), which is close to that of FY-3B (13:30 LST). Because of the close orbits and ascending node times of FY-3B and CloudSat, the number of collocated measurements is large. In this study, a collocation data set of MWHS and 2C-ICE was created by setting the collocation window to 15 min in time and 15 km in space. Since the footprint of MWHS is an order of magnitude larger than that of CPR, multiple 2C-ICE pixels can be found within one MWHS measurement. Thus, the IWP values of 2C-ICE within a circular window (with a radius of 7.5 km) were averaged to represent the mean IWP for the MWHS measurement pixel.
According to this collocation strategy, 1207731 collocations were found between FY-3B/MWHS and CloudSat/2C-ICE for the year 2014. Because of the different observation geometries of MWHS and CPR/CALIOP, only 14 pixels of 2C-ICE are contained in a collocation in the best case (see Fig. 1a). The CloudSat footprints therefore cover at most 13.75% of the area of an MWHS footprint, so an error from imprecise collocation is unavoidable and the representativeness of the data set must be considered. Figure 1 illustrates the statistics of 2C-ICE IWP within the MWHS footprints in the collocations. In most cases, more than 10 pixels of 2C-ICE were averaged in the corresponding MWHS pixel. However, there are still many MWHS pixels that cover only a small number of 2C-ICE pixels, which means these collocations are poorly represented. The coefficient of variation of each collocation pixel is shown in Fig. 1b. It represents the IWP dispersion of the 2C-ICE pixels within each MWHS pixel. A small coefficient of variation means that the IWP values of the 2C-ICE pixels averaged in the MWHS pixel are homogeneous and represent the scene observed by MWHS relatively well. Since the collocation error cannot be estimated, the criteria discussed in Holl et al. (2010) are applied to reduce the sampling effect of collocations. In this study, MWHS pixels containing more than 10 pixels of 2C-ICE and having a coefficient of variation of less than 0.6 were selected for subsequent processing. However, when highly inhomogeneous clouds exist outside the CloudSat field of view, a larger uncertainty in the IWP within MWHS pixels cannot be eliminated. After the removal of inhomogeneous collocations, 665519 collocations were retained.
Since the data set is used for global retrieval, it must have sufficient samples and their distribution must represent the real world. According to the statistics of the collocated MWHS pixels shown in Fig. 2, most of the collocations occurred on one side of the flight direction (from the 40th to the 90th scan pixel). In terms of observation latitude, the collocations near the nadir scan (the 49th pixel) cover latitudes from 80°S to 80°N, while those at the edge of the observation (the 90th pixel) only cover the tropical regions. In terms of observation time and latitude, Fig. 3 illustrates an obvious lack of data poleward of 60°S from April to September, and there are also few data between 0° and 30°S in December. The data distribution suggests that the training in polar regions may be inadequate. Due to the high number of collocations near the poles, 121500 observations at high latitudes were randomly excluded to obtain a balanced data set. For IWP retrieval, collocations are classified into two bins (clear-sky scene and cloudy scene) according to a specific IWP threshold. A threshold of IWP > 100 g m⁻² is preliminarily selected to identify cloudy scenes. Thus, 81490 collocations are recognized as cloudy scenes and 462529 collocations as clear-sky scenes in this data set.
The density plots of the PD and TB at 150 GHz (clear-sky and cloudy scenes) and the corresponding IWP from 2C-ICE over the ocean and land are depicted in Figs. 4 and 5. Scan angles from ±40.15° to ±53.35° are selected to compare the results with observations from conical scanners. In the cloudy case, the TBs are distributed between 150 K and 290 K, with the largest PD occurring at 230 K (corresponding to IWP > 1000 g m⁻²). This is similar to the results of Gong et al. (2017, 2020). However, due to the cross-track scanning mode, the PD of MWHS is much lower than that of conical scanners. The lowest TB generally appears in the center of deep convective clouds, where the PD is small due to randomly oriented ice particles; the largest PD, due to horizontally oriented particles, generally appears in warmer ice clouds. From Fig. 4, it can be seen that the lower the TB, the larger the IWP, but the TB is also influenced by the local atmospheric temperature. Comparing Fig. 4 and Fig. 5, the TB of the clear sky is generally above 240 K. The PD from the ocean surface is relatively large, while the PD from land is small.
Figure 4. The PD-TB150V density plots for the collocations in the cloudy scenes over the ocean (a) and land (b). Panels (c) and (d) show the corresponding IWP from 2C-ICE.
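The collocation and homogeneity screening described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the processing code used in the study: the dictionary keys (`lat`, `lon`, `time`, `iwp`), the function names, the use of the haversine distance, and the handling of all-zero (clear-sky) footprints are assumptions made here for the example.

```python
import math
from statistics import mean, pstdev

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used for the haversine distance


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def collocated_mean_iwp(mwhs_pixel, cloudsat_pixels,
                        radius_km=7.5, max_dt_s=15 * 60,
                        min_pixels=10, max_cv=0.6):
    """Average the 2C-ICE IWP inside one MWHS footprint.

    Returns the mean IWP, or None when the collocation is rejected because it
    holds too few 2C-ICE pixels or the pixels are too inhomogeneous.
    The record layout (dicts with lat, lon, time in seconds, iwp) is hypothetical.
    """
    iwps = [
        c["iwp"] for c in cloudsat_pixels
        if abs(c["time"] - mwhs_pixel["time"]) <= max_dt_s
        and great_circle_km(mwhs_pixel["lat"], mwhs_pixel["lon"],
                            c["lat"], c["lon"]) <= radius_km
    ]
    if len(iwps) <= min_pixels:       # the text keeps "more than 10 pixels"
        return None
    m = mean(iwps)
    # Assumption: an all-zero (clear-sky) footprint counts as perfectly homogeneous.
    cv = pstdev(iwps) / m if m > 0 else 0.0
    if cv >= max_cv:                  # the text keeps "less than 0.6"
        return None
    return m
```

With this layout, a footprint is kept only if it passes both tests, so the count and the coefficient of variation act as two independent filters on the same list of matched 2C-ICE pixels.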

Retrieval method
The collocations are used as a retrieval database to train the networks; the processing flow is shown in Fig. 6. The DNN is a feed-forward neural network that contains an input layer, several hidden layers, and an output layer. It is fully connected: the neurons in each layer connect with all neurons in the next layer. The hidden layers perform the nonlinear calculation that maps the input data to the output data. The DNN uses the backpropagation learning algorithm to minimize a loss function (such as the mean squared error between predicted and reference data), iteratively adjusting the thresholds (biases) and weights to approach the reference data. Its outstanding nonlinear mapping capability makes the DNN popular for geophysical retrieval.
In this study, a DNN with 6 layers is selected. The first layer is the input layer, in which each input quantity uses one neuron to connect with the next layer. The second to fifth layers are the hidden layers, each with 256 neurons; tanh and the Rectified Linear Unit (ReLU) are selected as the activation functions for the cloud filtering network and the IWP retrieval network, respectively. Since networks are prone to overfitting during training, early stopping and dropout are used to improve the training. To remove the effect of the order of the data, random shuffling and normalization are performed before the hidden layers. The final layer is the output layer, which uses the IWP of 2C-ICE (transformed to log space) as reference. The activation function of the last layer is selected according to the target of the network. For the determination of cloudy and clear-sky scenes, the sigmoid function is used for binary classification. For the IWP retrieval, the results are output directly. Because the data set is imbalanced between clear-sky and cloudy scenes, the "focal loss" function, which was designed to address the severe imbalance between positive and negative samples in one-stage object detection, is used instead of the cross-entropy loss function (Lin et al., 2017). During the iterative training of the networks, the models with the best results on the validation data are retained. The hyperparameters were chosen by comparing the performance of DNNs with different numbers of hidden layers, numbers of hidden neurons and regularization parameters. Each network mentioned in the next section uses the same hyperparameters to ensure that the performance of the network is only affected by the input parameters.
Figure 6. The schematic of the MWHS retrieval based on the DNN model.
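To illustrate why the focal loss helps with the imbalance between clear-sky and cloudy scenes, the following Python sketch implements the binary focal loss of Lin et al. (2017). The values `gamma=2.0` and `alpha=0.25` are the defaults suggested in that paper; the values actually used for the cloud filtering network are not stated in the text, so they are assumptions here, as is the function name.

```python
import math


def binary_focal_loss(labels, probs, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss, FL = -alpha_t * (1 - p_t)**gamma * log(p_t).

    labels: iterable of 0/1 reference classes (1 = cloudy scene).
    probs:  iterable of predicted cloud probabilities (sigmoid outputs).
    gamma and alpha are assumed values, not taken from the study.
    """
    total = 0.0
    n = 0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)           # avoid log(0)
        p_t = p if y == 1 else 1.0 - p            # probability given to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
        n += 1
    return total / n


# Well-classified samples (p_t close to 1) are strongly down-weighted, so the many
# easy clear-sky scenes contribute little to the gradient.
easy = binary_focal_loss([1], [0.9])
hard = binary_focal_loss([1], [0.6])
```

Setting `gamma=0` and `alpha=0.5` reduces the expression to half the ordinary cross-entropy, which is a convenient way to check an implementation.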
The sensitivity to ice clouds is discussed by Holl et al. (2010) and Eliasson et al. (2013); their studies show no significant radiance signals at IWP < 100 g m⁻² for MHS measurements. Thus, this value is used as the threshold for the cloud filtering network. From the collocations, we randomly assign 75% to training and 25% to validation. The training data are used for model fitting. The validation data are used to tune the hyperparameters of the network and for preliminary evaluation of the model. Collocations during January 2015 are used for testing. These data are not used to train the networks or to adjust the hyperparameters, but serve as independent data to test the performance of the final networks. The performance metrics employed for the retrieval are defined in the following.
The commonly used binary classification metrics are chosen for the cloud filtering network. A confusion matrix M is defined as

$$M = \begin{pmatrix} TP & FP \\ FN & TN \end{pmatrix},$$

where TP and TN are the numbers of true positives (both MWHS and CloudSat find ice clouds) and true negatives (both MWHS and CloudSat find no ice clouds), respectively, and FP and FN are the numbers of false positives (MWHS finds ice clouds but CloudSat does not) and false negatives (CloudSat finds ice clouds but MWHS does not), respectively. From the confusion matrix above, the accuracy (AC), False Alarm Ratio (FAR), Probability of Detection (POD), F1 score and Critical Success Index (CSI) can be derived, in their standard forms, as

$$AC = \frac{TP + TN}{TP + TN + FP + FN}, \qquad FAR = \frac{FP}{TP + FP}, \qquad POD = \frac{TP}{TP + FN},$$

$$F1 = \frac{2\,TP}{2\,TP + FP + FN}, \qquad CSI = \frac{TP}{TP + FP + FN}.$$

The performance evaluation for the IWP retrieval network is based on the root mean square error (RMSE), mean absolute percentage error (MAPE), BIAS and Pearson correlation coefficient (CC), defined in their standard forms as

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}, \qquad MAPE = \frac{100\%}{N}\sum_{i=1}^{N}\left|\frac{\hat{y}_i - y_i}{y_i}\right|,$$

$$BIAS = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right), \qquad CC = \frac{\sum_{i=1}^{N}\left(\hat{y}_i - \bar{\hat{y}}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{N}\left(\hat{y}_i - \bar{\hat{y}}\right)^2}\,\sqrt{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}},$$

where $\hat{y}_i$ is the IWP retrieved from MWHS, $y_i$ is the 2C-ICE reference IWP, overbars denote sample means, and N is the number of samples.

Results
To retrieve the IWP from the MWHS measurements, two networks were trained for different purposes. The first classifies a scene as clear or cloudy. The second retrieves the IWP. The two networks are used separately, and the IWP of scenes classified as clear is set to 0. Because of the randomness in the assignment of training and validation data, 20 models were trained for each input combination to ensure the stability of the model results.

Cloud Filtering Network
The network structure, training data set and cloud IWP threshold are discussed above. The sigmoid activation function constrains the output of the network to between 0 and 1, representing the probability of cloud occurrence. Thus, a threshold on the cloud probability must be assigned to identify cloudy scenes. After testing, a threshold of 0.4 was found to be the most appropriate for this cloud filtering. The results show that all channels carry cloud information, and CH. 4 (183±3 GHz) is the best for cloud detection. This channel is also used by the traditional method to distinguish cloudy from clear sky. However, the detection of ice clouds using MWHS channels is still limited. The FAR and POD of the best network are 0.26 and 0.63, respectively.
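As an illustration of how the probability threshold turns the sigmoid output into a cloud mask and how the classification scores follow from it, consider the Python sketch below. It uses the standard definitions of the scores; the function names and the choice of `>=` (rather than `>`) at the threshold are assumptions made for this example.

```python
def confusion_counts(cloud_probs, reference_cloudy, threshold=0.4):
    """Count TP, FP, FN, TN after thresholding the predicted cloud probability."""
    tp = fp = fn = tn = 0
    for p, ref in zip(cloud_probs, reference_cloudy):
        pred = p >= threshold          # assumed convention at the boundary
        if pred and ref:
            tp += 1
        elif pred and not ref:
            fp += 1
        elif ref:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def _ratio(num, den):
    """Return num/den, or NaN when the score is undefined (zero denominator)."""
    return num / den if den else float("nan")


def classification_scores(tp, fp, fn, tn):
    """Standard scores derived from the confusion matrix."""
    return {
        "AC": _ratio(tp + tn, tp + tn + fp + fn),
        "FAR": _ratio(fp, tp + fp),    # false alarm ratio
        "POD": _ratio(tp, tp + fn),    # probability of detection
        "F1": _ratio(2 * tp, 2 * tp + fp + fn),
        "CSI": _ratio(tp, tp + fp + fn),
    }


# Toy example with three cloudy and three clear reference scenes.
probs = [0.9, 0.5, 0.3, 0.45, 0.1, 0.2]
ref = [True, True, True, False, False, False]
scores = classification_scores(*confusion_counts(probs, ref))
```

Lowering the threshold raises POD at the price of a higher FAR, which is why the value has to be tuned on validation data rather than fixed at 0.5.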

IWP Retrieval Network
For the global IWP retrieval, clear-sky scenes were excluded from the training data. Different combinations of network inputs are compared to find the best retrieval strategy. The auxiliary information cases and their retrieval errors are listed in Table 3. The errors in Table 3 show that adding latitude or ocean/land mask information significantly improves the retrieval performance, while adding only the scan angle does not. In MWHS measurements, the signal from ice clouds is a reduction in TB caused by scattering. Without latitude information, it is difficult to distinguish whether a decrease in TB is due to ice particles or to low radiance from the surface or atmosphere. The same holds for the ocean/land mask information. According to cases 1, 2 and 4 in Table 3, the CC improves from 0.50 to about 0.62, and RMSE and MAPE also improve significantly. However, MAPE and BIAS are in conflict: reducing MAPE increases BIAS. Thus, the correlation is an important metric for evaluating the model. Combining auxiliaries can further improve the retrieval results, although the effect of using the scan angle alone is not obvious. Cases 5 and 6 in Table 3 indicate that the scan angle combined with latitude and ocean/land mask can further improve the retrieval capability. The retrieval MAPE of each IWP bin is shown in Fig. 7a, which gives a more detailed comparison. Compared to the model without auxiliaries, adding auxiliaries significantly reduces the retrieval errors, especially at IWP < 200 g m⁻² and IWP > 1000 g m⁻². The performance of the different channel combinations (with all the auxiliary information added) is presented in Table 4.
Since the 183 GHz channels (CH. 3-5) of MHS have proved to have good sensitivity to CloudSat IWP, the focus here is mainly on the influence of the 150 GHz channel and its PD. The results of cases 2 and 3 in Table 4 show that adding the 150 GHz window channel (CH. 2) improves all the metrics. Regarding the contribution of PD to the retrieval, the results show that adding PD alone (case 4) contributes to the retrieval of IWP, while the combination including both H and V polarization channels performs best (case 1). Figure 7b illustrates the MAPE of the different channel combinations. Comparing case 3 with case 4 in Table 4, the addition of PD gives an obvious improvement in the retrieval results at IWP > 2000 g m⁻². This conclusion is consistent with the analysis of Fig. 4. In general, all channels of MWHS contribute to ice cloud retrieval.
Figure 7. Comparison of the performance of the IWP retrieval networks using different auxiliary and channel combinations as input.
The final retrieval models (case 1 in Table 2 and case 8 in Table 3) were selected according to the metrics. Combining the cloud filtering network and the IWP retrieval network on the test data gives the final results shown in Table 5. The performance over the ocean and land is also listed. After adding the cloud filtering network, the accuracy of the IWP retrieval decreased, significantly for MAPE and BIAS and slightly for CC and RMSE. The results are better over the ocean than over land, especially the correlation. Figure 8 shows the scatter plot of MWHS IWP against 2C-ICE IWP for January 2015. The result shows reasonable agreement, but the MWHS IWP has significant dispersion at low IWP, which may be due to the lack of sensitivity of MWHS to thin ice clouds. The final model underestimates the true value overall but overestimates it when IWP < 300 g m⁻².
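The way the two networks are chained, and how the retrieval scores are then computed against 2C-ICE, can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function names are hypothetical, the scores use their standard definitions with BIAS taken as retrieved minus reference, and MAPE is evaluated only over samples with a positive reference IWP because it is undefined otherwise.

```python
import math


def combine_networks(cloud_probs, retrieved_iwp, threshold=0.4):
    """Set the IWP to 0 wherever the cloud filtering network reports a clear scene."""
    return [iwp if p >= threshold else 0.0
            for p, iwp in zip(cloud_probs, retrieved_iwp)]


def regression_scores(retrieved, reference):
    """RMSE, MAPE (%), BIAS and Pearson CC of retrieved versus reference IWP."""
    n = len(reference)
    diffs = [r - t for r, t in zip(retrieved, reference)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    bias = sum(diffs) / n                      # negative means underestimation
    # Assumption: MAPE is taken over samples with reference IWP > 0 only.
    rel = [abs(d) / t for d, t in zip(diffs, reference) if t > 0]
    mape = 100.0 * sum(rel) / len(rel)
    mean_r = sum(retrieved) / n
    mean_t = sum(reference) / n
    cov = sum((r - mean_r) * (t - mean_t) for r, t in zip(retrieved, reference))
    var_r = sum((r - mean_r) ** 2 for r in retrieved)
    var_t = sum((t - mean_t) ** 2 for t in reference)
    cc = cov / math.sqrt(var_r * var_t)
    return {"RMSE": rmse, "MAPE": mape, "BIAS": bias, "CC": cc}


# Toy example: the first scene is filtered out as clear and therefore set to 0.
final_iwp = combine_networks([0.1, 0.8, 0.9], [150.0, 200.0, 400.0])
scores = regression_scores(final_iwp, [100.0, 200.0, 300.0])
```

The toy example also shows the conflict noted above: a sample wrongly set to 0 adds a full 100% to its percentage error, so the cloud filter can degrade MAPE and BIAS much more than CC or RMSE.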

Tropical Cyclone IWP retrieval
Tropical cyclone Bansi, observed by MWHS and CloudSat nearly simultaneously (the time difference is about 3 minutes) on 12 January 2015, is selected for the validation of the final networks. The TBs of the cyclone observed by MWHS are shown in Fig. 9. Quite low TBs (as low as 150 K) can be found in the 150 GHz and 183±7 GHz channels in the regions of the eyewall (the eye is not seen) and the spiral rain bands, mainly caused by the scattering of ice particles in the clouds. The 183±1 GHz and 183±3 GHz channels are strongly influenced by water vapor, so the shape of the cyclone is not observable, but clear low TBs can still be seen in the eyewall and rain bands. The PDs at 150 GHz (TB_QV − TB_QH) have the same distribution characteristics as the low TBs. The PD reaches its maximum in the anvil precipitation regions (around 5 K, consistent with the result in Fig. 4) and decreases in the remote clear-sky or cirrus regions.
Applying the two neural networks trained above to the tropical cyclone, the retrieved IWPs are shown in Fig. 10 in comparison with 2C-ICE, and the retrieval errors are listed in Table 6. Due to the narrow field of view of CloudSat, a total of 21 MWHS pixels are collocated in the tropical cyclone region. The results show that the MWHS IWP has a high correlation with 2C-ICE, and the MAPE and BIAS are better than those in Table 5; although the RMSE is larger, this is reasonable for tropical cyclones.

Global mean IWP comparison
Figure 11 shows the global mean IWP for 2015 from the Aqua/MODIS L3 product (MYD08_M3, C61, Platnick et al., 2017), CloudSat 2C-ICE, the FY-3B/MWHS retrieval and the ERA5 reanalysis data set. The ERA5 IWP shown here combines its total column snow water (CSW) and cloud ice water (CIW), since ERA5 differentiates between precipitating and non-precipitating ice. The overall distribution of the annual mean IWP is similar for the four data sets. The MODIS product has a significantly higher IWP than the other three products, while ERA5 has a lower IWP overall. The IWP from 2C-ICE is similar to that of MODIS near the equator and lies between ERA5 and MODIS elsewhere. Since 2C-ICE is used to train the networks, the MWHS IWP naturally approaches that of 2C-ICE.
The zonal means of IWP for 2015 are given in Fig. 12. The overall shape of the IWP zonal averages is fairly consistent across data sets. However, there are large differences in the overall magnitude of the IWP. These differences are particularly pronounced at mid-latitudes, especially between the MODIS product and the other three products. Compared to the IWP maps in Duncan and Eriksson (2018), this version of MODIS IWP is more similar to 2C-ICE near the equator (10°S-10°N), but with increasing latitude, its IWP is much larger than that of the other products. The MWHS IWP is very close to that of 2C-ICE but lower than 2C-ICE in the mid-latitudes of the southern hemisphere. This may be due to the lack of training data in the middle and high latitudes of the southern hemisphere.
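A zonal mean of the kind compared here can be obtained by binning the retrieved IWP by latitude and averaging within each band. The short Python sketch below shows the idea; the bin width of 10° and the function name are assumptions for the example, and no area or sampling weighting is applied, which a full comparison between data sets would need to consider.

```python
from collections import defaultdict


def zonal_mean(latitudes, iwps, bin_deg=10.0):
    """Average IWP in latitude bands; returns {band centre latitude: mean IWP}."""
    n_bins = int(180.0 / bin_deg)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, iwp in zip(latitudes, iwps):
        # Put lat = 90 into the last band instead of opening an extra one.
        idx = min(int((lat + 90.0) // bin_deg), n_bins - 1)
        sums[idx] += iwp
        counts[idx] += 1
    return {-90.0 + (i + 0.5) * bin_deg: sums[i] / counts[i]
            for i in sorted(counts)}


# Toy example: two samples fall in the 10°S-0° band, one each in two other bands.
bands = zonal_mean([-5.0, -2.0, 3.0, 45.0], [100.0, 300.0, 50.0, 10.0])
```

Bands that contain no samples are simply absent from the result, which makes gaps in coverage (for example at high southern latitudes) visible rather than filling them with zeros.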

Discussion
Ice cloud misidentification is an important and unavoidable problem in this study. One reason is that the microwave channels detect ice clouds through a large decrease in TB, but low temperatures in high-altitude regions or other temperature anomalies can also lead to low TB. In the final results above, although geographic information is added to the training data, there are still many misclassification cases, such as over the Tibetan Plateau in winter. Therefore, knowledge of the surface temperature or the near-surface air temperature would help ice cloud detection. The other reason is the spatial and temporal mismatch between the CloudSat and MWHS footprints. Since the CloudSat pixels cover less than 15% of the MWHS pixel, the 2C-ICE scenes cannot fully represent the MWHS observations, especially in the case of thin clouds.
For the IWP retrieval, the 150 GHz window channel has a significant ice cloud response, which in combination with the 183 GHz channels provides a better retrieval of IWP. The PD at 150 GHz, although contaminated by polarization from the ocean surface, also contributes positively to the retrieval, especially when the IWP is larger than 1000 g m⁻². In addition, the PD of the quasi-polarization channels of MWHS depends on the scan angle and does not fully represent the polarization information of the ice particles, especially near the 45° scan angle. From the perspective of polarization measurements only, a cross-track scanner does not provide as much polarization information as a conical scanner but is more convenient for data assimilation.
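The loss of polarization information near the 45° scan angle can be illustrated with a commonly used idealization for cross-track scanners, in which the measured polarization rotates with the scan angle θ so that each quasi-polarized TB is a mixture of the true vertical and horizontal TBs. The sketch below assumes this mixing model, takes the PD as QV minus QH, and uses hypothetical function names. Whether the model holds exactly for MWHS is not established here.

```python
import math


def quasi_polarized_tbs(tb_v, tb_h, scan_angle_deg):
    """Quasi-V and quasi-H TBs under an idealized polarization-rotation model.

    Assumed model (not taken from this study):
        TB_QV = TB_V cos^2(theta) + TB_H sin^2(theta)
        TB_QH = TB_V sin^2(theta) + TB_H cos^2(theta)
    """
    theta = math.radians(scan_angle_deg)
    c2 = math.cos(theta) ** 2
    s2 = math.sin(theta) ** 2
    tb_qv = tb_v * c2 + tb_h * s2
    tb_qh = tb_v * s2 + tb_h * c2
    return tb_qv, tb_qh


def quasi_pd(tb_v, tb_h, scan_angle_deg):
    """Quasi-polarization difference, equal to (TB_V - TB_H) cos(2 theta)."""
    tb_qv, tb_qh = quasi_polarized_tbs(tb_v, tb_h, scan_angle_deg)
    return tb_qv - tb_qh
```

Under this model the measured PD equals the true PD at nadir, shrinks by the factor cos(2θ) as the scan angle increases, and vanishes at 45° regardless of how strongly the ice particles polarize the signal. The true PD itself also varies with the viewing geometry, which this sketch does not represent.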
However, there are some limitations to using neural networks for IWP retrieval. Collocation is the first limitation, since there are uncertainties in matching the fields of view of MWHS and CloudSat due to the large difference in resolution. These uncertainties are represented in the training data and can be predicted using, for example, quantile regression neural networks.
The most important issue is the reference data (2C-ICE) used in training, which has uncertainties that are difficult to quantify. It is therefore also impossible to make accurate error estimates of the model results. Without access to a large number of real samples, a neural network can at best converge to the most accurate available product (such as 2C-ICE). An alternative approach is to use simulation results (typical profiles) from radiative transfer models, in which case the generalization ability of the network will strongly depend on the model itself and the input field. In addition, the microwave band below 200 GHz is sensitive only to large ice particles and thick clouds and is relatively less effective for cloud detection.
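Quantile regression neural networks, mentioned above as a way to predict retrieval uncertainty, are typically trained with the quantile (pinball) loss rather than the squared error. The following is a minimal sketch of that loss for a single quantile level. It is not part of the networks trained in this study, and the function name and list-based interface are hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss for quantile level tau in (0, 1).

    Under-predictions (y > q) are penalized with weight tau and
    over-predictions with weight (1 - tau), so minimizing the loss
    drives the prediction q toward the tau-quantile of y.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)
```

Training one output per quantile level (for example 0.05, 0.5, and 0.95) would give a predicted interval for each retrieved IWP, which is one way the collocation uncertainty contained in the training data could be exposed to the user.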

Conclusions
In this paper, an analysis of global IWP retrieval from FY-3B/MWHS radiance measurements based on neural networks is presented. MWHS onboard the FY-3B satellite has two quasi-polarization channels at 150 GHz, which can provide more information about ice clouds. For IWP retrieval, CloudSat/2C-ICE is chosen as the reference data set for the neural networks because it is publicly available and meets the requirements in terms of data volume and measurement accuracy. Two types of networks (cloud filtering and IWP retrieval) are trained using the collocation data set of MWHS and 2C-ICE. A cloud filtering network is trained to classify cloudy and clear-sky scenes. For the IWP threshold of 100 g m⁻², all channels of MWHS show sensitivity to ice clouds, and CH. 4 is the most powerful for cloud detection. The FAR and POD of the final network are 0.26 and 0.63, respectively. IWP retrieval networks with different combinations of channels and auxiliary information as input are compared to find the best retrieval strategy. The retrieval results show that adding the 150 GHz channel gives an obvious improvement in IWP retrieval and that the PD also makes a positive impact. Comparing the MWHS IWP with 2C-ICE gives CC = 0.65, RMSE = 916.76 g m⁻², MAPE = 92.90%, and BIAS = -213.12 g m⁻². Applying the networks to cyclone Bansi, the results show a relatively high correlation (0.73) between MWHS IWP and 2C-ICE. The 2015 annual mean IWP from MWHS shows a similar overall shape to that of MODIS, 2C-ICE, and ERA5, and is very close to 2C-ICE in magnitude, making the retrieved IWP more credible.
Neural networks are widely used to statistically characterize the mapping between radiometric measurements and related geophysical variables. The advantages of neural networks are their simplicity and ease of use, their ability to effectively learn the complex nonlinear mapping relationships in samples, and their better robustness to noisy data. By using the collocated measurements, there is no need to establish a complicated radiative transfer model with many possible sources of error. Although the retrieval accuracy can never be as good as that of 2C-ICE, the spatial and temporal coverage will be much larger, which is important for long time series in climate research.