Integrated water vapor and liquid water path retrieval using a single-channel radiometer

Microwave radiometers are widely used to retrieve Liquid Water Path (LWP) and Integrated Water Vapor (IWV) in the context of cloud and precipitation studies. This paper presents a new site-independent retrieval algorithm for LWP and IWV that relies on a single-frequency 89-GHz ground-based radiometer. A statistical approach is used, based on a neural network trained and tested on a synthetic data set constructed from radiosonde profiles worldwide. In addition to the 89-GHz brightness temperature, the input features include surface measurements of temperature, pressure and humidity, as well as geographical information and, when available, estimates of IWV and LWP from reanalysis data. An analysis of the algorithm is presented to assess its accuracy, the impact of the various input features, its sensitivity to radiometer calibration and its stability across geographical locations. The new method is then applied to real data collected during a field deployment in Switzerland and during the ICE-POP 2018 campaign in South Korea. The new algorithm is shown to be quite robust, especially in mid-latitude environments with a moderately moist climate, although its accuracy is inevitably lower than that of state-of-the-art multi-channel radiometers.

institute (MeteoSwiss) was used. MeteoSwiss's facilities in Payerne include a multi-frequency radiometer with tipping-curve calibration, HATPRO (Rose et al., 2005; Löhnert and Maier, 2012). This state-of-the-art instrument retrieves LWP and IWV with nominal accuracies of 20 g m⁻² and 0.2 kg m⁻², respectively (RPG Radiometer Physics GmbH, 2014). During this deployment, both WProf and HATPRO measured brightness temperatures at a high temporal resolution, of the order of a few seconds.
In addition, radiosondes are launched twice daily in Payerne, allowing IWV to be computed directly from the measured profiles; these values are used as a further source of validation for the IWV retrieval algorithm.
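As a minimal sketch of how IWV can be computed directly from a radiosonde profile, the code below integrates specific humidity q over pressure, IWV = (1/g) ∫ q dp, from the surface to the top of the sounding. It assumes levels ordered from the surface upward, pressure in Pa and q in kg kg⁻¹ (soundings reporting relative humidity or mixing ratio would first need to be converted); the trapezoidal rule and the function name `integrated_water_vapor` are illustrative choices, not the procedure documented in the paper.

```python
# Minimal sketch: IWV = (1/g) * integral of specific humidity q over pressure.
# Assumptions: levels ordered from the surface upward, pressure in Pa,
# specific humidity in kg/kg, trapezoidal integration.

G = 9.80665  # standard gravity, m s^-2


def integrated_water_vapor(pressure_pa, specific_humidity):
    """Return IWV in kg m^-2 (numerically equal to mm of precipitable water)."""
    if len(pressure_pa) != len(specific_humidity):
        raise ValueError("pressure and humidity profiles must have the same length")
    total = 0.0
    for i in range(len(pressure_pa) - 1):
        dp = pressure_pa[i] - pressure_pa[i + 1]  # positive when moving upward
        q_mean = 0.5 * (specific_humidity[i] + specific_humidity[i + 1])
        total += q_mean * dp
    return total / G


# Example with a hypothetical three-level profile
p = [95000.0, 85000.0, 70000.0]
q = [0.008, 0.005, 0.002]
print(round(integrated_water_vapor(p, q), 2))  # prints 11.98
```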

ICE-POP 2018
The second dataset on which the new algorithm was tested was gathered during the ICE-POP 2018 campaign, which took place in South Korea during the winter of 2017-2018 in the context of the 2018 Olympic and Paralympic Winter Games in Pyeong Chang (Gehring et al., 2020). WProf was deployed from November 2017 to April 2018 in Mayhills, 50 km south-east of Pyeong Chang, at an altitude of 789 m. This allows the algorithm to be applied in a context different from Payerne: in winter conditions and at an entirely different geographical location, at a lower latitude and closer to the sea.
In this case, unlike in Payerne, no independent measurements of LWP were available; however, radiosondes were launched every 3 hours, providing a means of comparison for the IWV retrievals.

3 Forward model
In order to develop a statistical algorithm, a large amount of data is required to reliably perform the statistical learning phase. For this purpose, a synthetic database was built, using as a starting point the radiosonde profiles worldwide described in the previous section. A two-step forward model was implemented: it first identifies clouds in each profile and derives the corresponding liquid water content, then computes the resulting 89-GHz brightness temperature. The different steps of this forward model are illustrated in the flowchart in Fig. 1.

Cloud liquid model
To derive profiles of liquid water content (LWC) from radiosonde profiles of atmospheric variables, the cloud model of Salonen and Uppala (1991) was used. Cloud boundaries are identified using a threshold on relative humidity, this threshold being pressure- and temperature-dependent. The coefficients of the Salonen model include the corrections of Mattioli et al. (2009). More sophisticated and accurate models can be defined on a local geographical scale (e.g., Pierdicca et al., 2006), but given the stated objective of this study to design a non-site-specific algorithm, it was considered preferable to assume a single universal liquid cloud model, in spite of its potential limitations.

droplet size deviates from this regime, for instance as droplets grow larger near the onset of precipitation, the Rayleigh assumption falls short and higher-order terms in the Mie equations become non-negligible, which alters the modeling of TB (e.g., Zhang et al., 1999). The algorithm is therefore expected to produce biased results when applied to rain cases, and it should not be trusted in those cases. This is an intrinsic limitation of the algorithm.
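The relative-humidity criterion at the core of the cloud liquid model described at the start of this section can be illustrated with a short sketch. It uses a Salonen-type critical humidity U_c = 1 - α σ (1 - σ) [1 + β (σ - 0.5)], with σ = p/p_s, in the form commonly cited for Salonen and Uppala (1991), with α = 1.0 and β = √3 assumed. The temperature dependence of the threshold, the Mattioli et al. (2009) coefficient corrections and the LWC parameterization itself are not reproduced. Contiguous levels whose relative humidity reaches U_c are grouped into cloud layers, and a given LWC profile is then integrated over height to obtain LWP. All function names are hypothetical.

```python
import math

# Assumed coefficients of the critical-humidity function (not the corrected
# values of Mattioli et al., 2009)
ALPHA = 1.0
BETA = math.sqrt(3.0)


def critical_humidity(p, p_surface):
    """Salonen-type critical relative humidity (fraction) at pressure p."""
    sigma = p / p_surface
    return 1.0 - ALPHA * sigma * (1.0 - sigma) * (1.0 + BETA * (sigma - 0.5))


def cloud_layers(pressure, rel_humidity):
    """Return (first, last) index pairs of contiguous levels with RH >= U_c.

    Levels are ordered from the surface upward; rel_humidity is a fraction.
    """
    p_surface = pressure[0]
    layers, start = [], None
    for i, (p, rh) in enumerate(zip(pressure, rel_humidity)):
        cloudy = rh >= critical_humidity(p, p_surface)
        if cloudy and start is None:
            start = i  # cloud base
        elif not cloudy and start is not None:
            layers.append((start, i - 1))  # cloud top was the previous level
            start = None
    if start is not None:
        layers.append((start, len(pressure) - 1))
    return layers


def liquid_water_path(height_m, lwc_g_m3):
    """Trapezoidal integral of LWC (g m^-3) over height (m), giving LWP in g m^-2."""
    return sum(
        0.5 * (lwc_g_m3[i] + lwc_g_m3[i + 1]) * (height_m[i + 1] - height_m[i])
        for i in range(len(height_m) - 1)
    )
```

Note that with this form U_c equals 1 at the surface (σ = 1), so a surface level is only flagged as cloudy at saturation.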
While radiative emission from solid hydrometeors has a minor influence on brightness temperature compared to emission from liquid droplets and water vapor, solid hydrometeors do contribute to microwave brightness temperature through the backscattering of surface radiation. Scattering by snowfall particles is difficult to model accurately, but Kneifel et al. (2010) suggest that this effect could be notable during snowfall, in a way that is highly dependent on the microphysical properties of the snowfall particles and that would increase with their size. The present study does not account for this process and could therefore yield biased results during intense snowfall events.

4 Design of the IWV and LWP retrieval algorithms

Input features
When a single frequency is available for the measurement of TB, the underdetermination of the problem can be partially alleviated by including other available information in the measurement vector of the retrieval. In this study, several categories of variables were included in the input features. The first category consists of TB and its powers up to the fourth degree, and is expected to have the greatest importance in the retrieval. Second, surface measurements are included (temperature, sea-level pressure and relative humidity); in the radar-radiometer set-up used here, a weather station is collocated, so these measurements are available at the location of the instruments. The third category of input features comprises geographical descriptors: latitude, longitude and altitude; the day of year is also included in this group, as a means to account for seasonal variability in atmospheric and meteorological conditions. When available, a fourth category is added to the input features: reanalysis data (precipitable water and liquid water) from ERA5 (Copernicus Climate Change Service, 2020). The spatial and temporal resolution of these reanalysis data is too coarse for them to be treated as ground truth, but they can serve as a reasonable rough estimate in the statistical learning process. These four groups of features are used for the retrieval of both IWV and LWP. In the case of LWP, an additional input feature can be added: the output of the IWV retrieval algorithm. The impact of each of these feature groups on the retrieval is discussed in Sect. 5.
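A minimal sketch of how one measurement vector could be assembled from the four feature groups described above, plus the optional IWV prediction used by the LWP retrieval. The function name, argument names and feature ordering are hypothetical; the day of year is passed as a raw number because the text does not state how it was encoded (a sine/cosine encoding would be a common alternative).

```python
def build_features(tb, t_surf, slp, rh_surf, lat, lon, alt, day_of_year,
                   era5_iwv=None, era5_lwp=None, iwv_pred=None):
    """Assemble one input vector (hypothetical layout) for the retrieval."""
    # Group 1: brightness temperature and its powers up to the fourth degree
    features = [tb ** k for k in range(1, 5)]
    # Group 2: collocated surface measurements
    features += [t_surf, slp, rh_surf]
    # Group 3: geographical descriptors and day of year
    features += [lat, lon, alt, day_of_year]
    # Group 4 (optional): ERA5 precipitable water and liquid water
    if era5_iwv is not None and era5_lwp is not None:
        features += [era5_iwv, era5_lwp]
    # LWP retrieval only (optional): output of the IWV algorithm
    if iwv_pred is not None:
        features.append(iwv_pred)
    return features


# Hypothetical values, for illustration only
x = build_features(150.0, 285.0, 1013.0, 70.0, 45.0, 7.0, 500.0, 180)
print(len(x))
```

Because TB⁴ is many orders of magnitude larger than TB itself, the per-feature normalization with training-set statistics described in the subsection on the neural network is essential for such a vector.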

Dataset preprocessing
Strong rain events should be excluded from the training set, since they are outside the algorithm's range of validity, as explained in Sect. 3. Profiles with LWP > 1000 g m⁻² (i.e., in the range of heavy rain according to Cadeddu et al., 2017) are therefore removed. The resulting dataset contains ∼10⁶ profiles and is used for the design of the IWV retrieval algorithm.

In the case of LWP retrieval, additional preprocessing is needed, since the forward model produced a large majority of clear-sky cases. If left unchanged, the training phase would result in a strong bias of the retrieval toward low LWP values. To avoid this, the dataset was subsampled so that clear-sky and cloudy cases (up to 600 g m⁻²) would be equally represented; this threshold results from a trade-off between bias reduction and preservation of overall accuracy. The resulting histogram is shown in Fig. 2, and the LWP dataset thus contains ∼10⁵ profiles. In the case of IWV, the distribution is also not uniform, but it is much less skewed than the initial LWP data set. After some trials, it was considered preferable to use the full IWV data set rather than subsample it, since subsampling did not bring significant improvements in this case. It should also be noted that the additional preprocessing needed for the LWP retrieval algorithm led us to design two separate algorithms, rather than a single one that would retrieve IWV and LWP at once. Indeed, while LWP retrieval is mostly relevant in cloudy cases, IWV can show significant variability in clear-sky cases, which should therefore not be excluded from the training stage.
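The two preprocessing steps can be sketched as follows, as one possible reading of the text: profiles above 1000 g m⁻² are discarded, and the LWP distribution below 600 g m⁻² is flattened by randomly subsampling equal-width LWP bins (clear-sky cases fall into the lowest bin) to a common size. The bin width, the per-bin target, the dictionary-based sample format and the function names are assumptions made for illustration; the exact subsampling scheme behind Fig. 2 may differ.

```python
import random


def remove_heavy_rain(samples, max_lwp=1000.0):
    """Drop profiles with LWP above max_lwp (g m^-2)."""
    return [s for s in samples if s["lwp"] <= max_lwp]


def balance_lwp(samples, cap=600.0, bin_width=50.0, per_bin=None, seed=0):
    """Randomly subsample LWP bins below `cap` to a common size.

    Samples at or above `cap` are kept unchanged. If per_bin is None, the
    size of the least populated non-empty bin is used.
    """
    rng = random.Random(seed)
    bins, kept = {}, []
    for s in samples:
        if s["lwp"] < cap:
            bins.setdefault(int(s["lwp"] // bin_width), []).append(s)
        else:
            kept.append(s)
    if not bins:
        return kept
    if per_bin is None:
        per_bin = min(len(members) for members in bins.values())
    for members in bins.values():
        kept.extend(rng.sample(members, min(per_bin, len(members))))
    return kept
```

In practice the per-bin target would be tuned, since, as stated above, the 600 g m⁻² cap itself results from a trade-off between bias reduction and overall accuracy.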

Statistical retrieval using a neural network
Both LWP and IWV datasets were randomly split into training, validation and testing sets (70 %-15 %-15 %), and normalized using the mean and standard deviation of each input feature in the training set. A densely connected neural network architecture was chosen over linear regression and decision-tree-based retrieval techniques because it was found to produce more reliable results: it is more accurate than the former and less prone to overfitting than the latter. The algorithm was designed using the Keras library in Python (Chollet et al., 2015). The hyperparameters of the neural networks were tuned on the validation set. As the training curve of the LWP retrieval in Fig. 3 shows, the training dataset is large enough to ensure that the algorithm is not prone to overfitting. Figure 4 and Table 1 summarize the resulting architecture and relevant parameters. Different versions of the algorithm were trained, using various sets of input features, to assess the importance of each category (discussed below).

Figure 5 illustrates the distribution of the error over the validation set. In panels c) and d), the target variable (IWV and LWP, respectively) is binned into intervals on which the root mean square error (RMSE) is calculated. This illustrates the behavior of the algorithm across the entire range of values, rather than summarizing its performance with a single metric such as the total RMSE, which can conceal specific behaviors related to the distribution of the target variable in the dataset. Along the same lines, we emphasize that comparisons between these total RMSE values and those of other studies should be made carefully, because they strongly depend on the dataset from which they are calculated. Figure 6 shows how this total error, represented by the RMSE (left panels) and by the squared correlation coefficient (R²) (right panels), is affected by the addition or removal of input features.
For each set of input features, a full tuning of the algorithm was performed, and the results presented correspond to the tuned (i.e., best-performing on the validation set) version.
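The data handling that precedes training can be sketched as follows: a random 70/15/15 split, then normalization of every subset with the mean and standard deviation of the training set only, so that no information from the validation or testing sets leaks into training. The neural network itself (built with Keras in the paper; see Fig. 4 and Table 1) is not reproduced. The function names, the fixed seed and the use of the population standard deviation are assumptions; the text does not say whether population or sample statistics were used.

```python
import random
import statistics


def split_dataset(rows, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly split rows into training, validation and testing sets."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)
    n_train = round(fractions[0] * len(rows))
    n_val = round(fractions[1] * len(rows))
    train = [rows[i] for i in order[:n_train]]
    val = [rows[i] for i in order[n_train:n_train + n_val]]
    test = [rows[i] for i in order[n_train + n_val:]]  # remainder
    return train, val, test


def fit_normalizer(train):
    """Per-feature mean and standard deviation, computed on the training set only."""
    columns = list(zip(*train))
    means = [statistics.fmean(col) for col in columns]
    # Guard against constant features to avoid division by zero
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def normalize(rows, means, stds):
    """Apply the training-set statistics to any subset (train, val or test)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]
```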

IWV algorithm
The IWV retrieval algorithm yields an RMSE of 1.6 kg m⁻² on the testing set, which corresponds to a relative error of 5.2 %, and the error is reasonably well distributed across the dataset. There is, however, a bias for large IWV values (> 60 kg m⁻²), which are underrepresented in the dataset and which the algorithm therefore tends to underestimate (see Fig. 5a).
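The binned error analysis of Fig. 5 c) and d), which exposes behaviors such as the underestimation of large IWV values mentioned above, can be sketched as follows: the target variable is split into intervals and the RMSE is computed separately within each. Bin edges and function names are illustrative assumptions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def binned_rmse(y_true, y_pred, edges):
    """RMSE within each [edges[i], edges[i+1]) interval of the target variable.

    Empty bins yield NaN so that they can be skipped when plotting.
    """
    result = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        pairs = [(t, p) for t, p in zip(y_true, y_pred) if lo <= t < hi]
        if pairs:
            truths, preds = zip(*pairs)
            result.append(rmse(truths, preds))
        else:
            result.append(float("nan"))
    return result
```

Binning on the true value (rather than the prediction) is what makes a systematic underestimation at the high end visible as a growing RMSE in the last bins.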

Figure 6a shows that the IWV retrieval is significantly improved by the addition of secondary input features.
Using TB measurements alone as input degrades the RMSE to nearly 6 kg m⁻². The most important secondary features appear to be the ERA5 estimates, but IWV is also significantly correlated with climatological features such as surface atmospheric variables, as well as geographical and temporal information.
Overall, the IWV retrieval is reliable and accurate, in particular when using all available secondary information.

LWP algorithm
The LWP retrieval algorithm has an RMSE of 86 g m⁻² at best on the testing set (see Fig. 6c), i.e., a relative error of 24.6 % on the total testing set. If clear-sky cases are removed using 30 g m⁻² as a threshold value, following Löhnert and Crewell (2003), the relative error is 17.5 %. Here again, the RMSE is rather homogeneous across the range of LWP values, with, however, a bias for low LWP values, which are slightly overestimated, and for large LWP (> 800 g m⁻²), which are underestimated (see Fig. 5b and d). The latter is due to the lack of data in this range, and is likely acceptable since such values would correspond to (light to moderate) rain events; yet this highlights once again that these cases are outside the algorithm's scope. As already mentioned, the total RMSE values given here should be interpreted with care, since they depend on the distribution of the data set. For comparison, when the retrieval is applied to the full dataset, i.e., without the subsampling described in Sect. 4, the total RMSE drops to 40 g m⁻².
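The text does not define the relative error explicitly. A minimal sketch, assuming it is the RMSE divided by the mean of the reference values, shows how the clear-sky exclusion (30 g m⁻², following Löhnert and Crewell, 2003) changes both the numerator and the denominator. Both the definition and the function name are assumptions.

```python
import math


def relative_error(y_true, y_pred, min_truth=None):
    """RMSE / mean(y_true), in percent (assumed definition).

    If min_truth is given, cases with y_true below it (e.g. clear sky)
    are excluded before computing both quantities.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred)
             if min_truth is None or t >= min_truth]
    if not pairs:
        raise ValueError("no cases left after thresholding")
    err = math.sqrt(sum((p - t) ** 2 for t, p in pairs) / len(pairs))
    mean_truth = sum(t for t, _ in pairs) / len(pairs)
    return 100.0 * err / mean_truth


# Hypothetical values: the same absolute errors give a smaller relative
# error once low-LWP cases are removed, because the mean increases.
truth = [0.0, 100.0, 300.0]
pred = [10.0, 110.0, 290.0]
print(relative_error(truth, pred), relative_error(truth, pred, min_truth=30.0))
```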
Figure 6c shows that the LWP retrieval is less affected than the IWV retrieval by secondary input features: most categories other than brightness-temperature-derived features have a relatively small impact on the error. This is reasonable, since liquid water typically contributes more to TB than water vapor does (Löhnert and Crewell, 2003). Furthermore, LWP at a given location can have a high temporal variability due to cloud dynamics in the atmospheric column, which might not always be captured in the time series of surface atmospheric variables. Similar reasons can help explain why the addition of reanalysis data significantly improves the IWV retrieval but only marginally increases the accuracy of the LWP retrieval: liquid water content can vary on shorter spatial and temporal scales than those resolved by ERA5.
Still, the accuracy of the algorithm drops severely when no features other than brightness temperature are considered (RMSE of 140 g m⁻²). This means that, albeit second-order when taken individually, the secondary input features efficiently incorporate statistical trends and climatological information into the retrieval during the training phase.

Adding the IWV prediction as an input feature to the LWP retrieval has only a very minor impact. This is not surprising, since the IWV prediction is itself the output of an algorithm that relies on essentially the same input features. However, the slight improvement that is seen can be understood by recalling that the IWV retrieval algorithm was trained on a much larger dataset, which in particular includes a larger number of clear-sky cases (cf. Sect. 4).

Sensitivity to instrument calibration
In order to assess the stability of the algorithm with respect to potential miscalibration or calibration drift of the radiometer, synthetic TB offsets were added to the testing dataset before applying the retrieval. Figure 7a shows that a 5 K offset in TB results in a 30 % increase in RMSE for the IWV estimates, which is non-negligible. Proper radiometer calibration thus appears crucial to constrain the error of this retrieval. For comparison, the 89-GHz radiometer presented in Küchler et al. (2017) has a nominal accuracy of 0.5 K after calibration.
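The offset experiment can be sketched as a loop that adds a constant bias to the raw TB before it enters the retrieval (and hence before any powers of TB are formed) and reports the relative change in RMSE. `retrieve` stands for any trained retrieval taking TB and the remaining inputs; the toy linear retrieval in the usage example is purely hypothetical and is not the neural network of the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def offset_sensitivity(retrieve, tb, other_inputs, targets, offsets_k):
    """Percentage change in RMSE when a constant offset (K) is added to TB."""
    def run(offset):
        preds = [retrieve(t + offset, x) for t, x in zip(tb, other_inputs)]
        return rmse(targets, preds)

    baseline = run(0.0)
    return {dk: 100.0 * (run(dk) - baseline) / baseline for dk in offsets_k}


def toy_retrieval(tb_k, other):
    """Hypothetical stand-in for a trained retrieval."""
    return 0.5 * tb_k + other


print(offset_sensitivity(toy_retrieval, [100.0, 120.0], [0.0, 0.0],
                         [51.0, 59.0], [0.0, 2.0]))
```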

In relative terms, the LWP algorithm is less affected (Fig. 7b), with an RMSE increase of less than 10 % for a 5 K offset in TB, which makes it reasonably robust to inaccuracies in TB measurements. The different versions of the algorithm are also affected in a similar way by offset TB values. On closer examination, the general trend confirms the intuition that the retrieval is slightly more stable when more input features are included, as these partly counterbalance the effect of miscalibration.

Geographical distribution of the error
One of the motivations of this study was to design an algorithm that could be used across the globe with a constrained uncertainty. Figure 8 illustrates the geographical distribution of the error for both the LWP and IWV retrievals, using the synthetic radiosonde-based dataset. Two approaches were used to assess this error: first, RMSE values were calculated on the entire set of data available for each location, excluding LWP greater than 1000 g m⁻². Second, the RMSE was normalized by the mean value of LWP (resp. IWV) at each site, excluding low values (LWP less than 20 g m⁻², a conservative threshold to exclude clear-sky cases). Note that this normalized error is not equal to the relative error; rather, it indicates how large the RMSE of the retrieval is compared to the mean values observed at a given location.

K17A and K17B were applied to the Payerne campaign dataset, and their results are compared to those of the new algorithm in Fig. 10. The error metrics are calculated using HATPRO's values as a reference. The algorithms perform similarly, with slightly better results for the new algorithm when at least one of the secondary input features is included. Note that K17A and K17B were specifically tuned on Payerne data, while the new algorithm was tuned globally, on a dataset that did not include radiosonde profiles from Payerne.
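The two per-site error metrics of the geographical analysis (Fig. 8) can be sketched for LWP as follows. The sketch assumes that the RMSE is computed on all retained cases of a site and that only the normalizing mean excludes values below 20 g m⁻²; the wording of the text could also be read as applying the exclusion to both quantities. The record format and the function name are hypothetical.

```python
import math
from collections import defaultdict


def site_errors(records, max_truth=1000.0, min_truth_for_mean=20.0):
    """Per-site (RMSE, normalized error) for LWP.

    records: iterable of (site_id, truth, prediction).
    Cases with truth above max_truth are discarded; the normalizing mean only
    includes cases with truth >= min_truth_for_mean (assumed reading).
    """
    by_site = defaultdict(list)
    for site, truth, pred in records:
        if truth <= max_truth:
            by_site[site].append((truth, pred))
    out = {}
    for site, pairs in by_site.items():
        err = math.sqrt(sum((p - t) ** 2 for t, p in pairs) / len(pairs))
        retained = [t for t, _ in pairs if t >= min_truth_for_mean]
        norm = err / (sum(retained) / len(retained)) if retained else float("nan")
        out[site] = (err, norm)
    return out
```

For IWV, the LWP-based filtering would instead be applied to the profiles beforehand, with the low-value threshold disabled (`min_truth_for_mean=0.0`).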

As detailed in Sect. 2, the South Korean deployment of WProf in 2017-2018 also offers an opportunity to compare results from the IWV retrieval to IWV from radiosonde measurements.
The analysis of the TB time series showed that a miscalibration of the radiometer led to unrealistic (negative) values, which had to be corrected by adding a constant offset to the TB measurements. The value of this offset (18 K) was determined by computing theoretical brightness temperatures from clear-sky radiosonde profiles and comparing them to measured TBs, following the approach of Ebell et al. (2017). This is, however, only a first-order correction whose output should be interpreted with care, especially in light of Sect. 5.2, which underlined the importance of TB accuracy for the IWV retrieval.
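A minimal sketch of the constant-offset correction described above, assuming that clear-sky TBs simulated from radiosonde profiles (with a forward model not shown here) have already been matched in time with the measured TBs. Only the basic idea stated in the text is reproduced, not the full procedure of Ebell et al. (2017); a median could be substituted for the mean to reduce sensitivity to outliers. The function names are hypothetical.

```python
import statistics


def clear_sky_offset(tb_simulated, tb_measured):
    """Mean (simulated - measured) TB difference over matched clear-sky cases, in K."""
    if not tb_simulated or len(tb_simulated) != len(tb_measured):
        raise ValueError("need two non-empty sequences of equal length")
    return statistics.fmean(s - m for s, m in zip(tb_simulated, tb_measured))


def apply_offset(tb_measured, offset):
    """Constant first-order correction of the measured TB series."""
    return [tb + offset for tb in tb_measured]
```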
After this correction, the IWV retrieval gives coherent results (see Fig. 12), with a total RMSE slightly lower than that obtained on the testing data set (1.2 kg m⁻² at best). Here again, the best results are obtained when several input features are included, and the accuracy drops severely when no secondary input features are used. Figure 12 also shows that the algorithm is consistently outperformed by ERA5 products, which have both a lower RMSE and a higher R², making the algorithm less relevant for the study of this specific campaign. This was not the case in Payerne or in the radiosonde database, where the algorithm is more accurate than ERA5. A possible explanation is that the dry and cold weather observed during the ICE-POP campaign (Gehring et al., 2020) featured little short-term variability and was associated with stable atmospheric conditions that were particularly well captured by the ERA5 reanalyses.
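The comparison against radiosonde IWV can be sketched with the two metrics used in Fig. 12: the RMSE and R², the latter taken here as the squared Pearson correlation coefficient (the squared correlation coefficient used for Fig. 6). Product names and function names are hypothetical.

```python
import math


def rmse(reference, estimate):
    """Root mean square error of an estimate against a reference series."""
    return math.sqrt(sum((e - r) ** 2 for r, e in zip(reference, estimate)) / len(reference))


def r_squared(reference, estimate):
    """Square of the Pearson correlation coefficient."""
    n = len(reference)
    mr = sum(reference) / n
    me = sum(estimate) / n
    cov = sum((r - mr) * (e - me) for r, e in zip(reference, estimate))
    var_r = sum((r - mr) ** 2 for r in reference)
    var_e = sum((e - me) ** 2 for e in estimate)
    return cov * cov / (var_r * var_e)


def compare_products(reference, products):
    """Map each product name to (RMSE, R^2) against the reference series."""
    return {name: (rmse(reference, est), r_squared(reference, est))
            for name, est in products.items()}
```

The two metrics are complementary: a product scaled by a constant factor keeps R² = 1 while its RMSE grows, so a retrieval can be well correlated with radiosondes yet biased.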