Using artificial neural networks to predict riming from Doppler cloud radar observations

Abstract. Riming, i.e. the accretion and freezing of SLW on ice particles in mixed-phase clouds, is an important pathway for precipitation formation. Detecting and quantifying riming using ground-based cloud radar observations is of great interest, however, approaches based on measurements of the mean Doppler velocity (MDV) are unfeasible in convective and orographically influenced cloud systems. Here, we show how artificial neural networks (ANNs) can be used to predict riming using ground-based zenith-pointing cloud radar variables as input features. ANNs are a versatile means to extract relations from labeled data sets, which contain input features along with the expected target values. Training data are extracted from a data set acquired during winter 2014 in Finland, containing both Ka-band cloud radar and in-situ observations of snowfall. We focus on two configurations of input variables: ANN #1 uses the equivalent radar reflectivity factor (Ze), MDV, the width from left to right edge of the spectrum above the noise floor (spectrum edge width; SEW), and the skewness as input features. ANN #2 only uses Ze, SEW and skewness. The application of these two ANN configurations to case studies from different data sets demonstrates that both are able to predict strong riming (riming index = 1) and yield low values (riming index ≤ 0.4) for unrimed snow. In general, the predictions of ANN #1 and ANN #2 are very similar, advocating the capability to predict riming without the use of MDV. It is demonstrated that both ANN setups are able to generalize to W-band radar data. The predictions of both ANNs for a wintertime convective cloud fit coinciding in-situ observations extremely well, suggesting the possibility to predict riming even within convective systems. Application of ANN #2 to an orographic case yields high riming index values coinciding with observations of solid graupel particles at the ground.



Introduction
Mixed-phase clouds are important components of the climate system because they play a major role both for the radiation budget (Tan et al., 2016) and for the hydrological cycle (Mülmenstädt et al., 2015). In this type of cloud, ice particles and supercooled liquid water (SLW) can coexist in the temperature range between 0 and −38 • C. The availability of both ice and SLW allows for riming, i.e., the accretion and freezing of SLW on frozen ice particles. Riming is a key process for ice growth and, eventually, precipitation formation (Lamb and Verlinde, 2011;Heymsfield et al., 2020). Detecting and quantifying riming using ground-based remote sensing observations is a non-trivial task, but of great interest for several reasons.
First, ice-phase microphysical growth process rates in general are associated with a large uncertainty, posing a major challenge for microphysics schemes in numerical weather forecast and climate models (Morrison et al., 2020). Recently, considerable effort has been made to improve the representation of riming in models, ranging from processoriented approaches (e.g., Seifert et al., 2019) to novel microphysics schemes (e.g., Morrison and Milbrandt, 2015). Deriving the rime mass fraction from remote sensing measurements would enable observation-model comparison studies, which are an important step in evaluating and constraining the parameterizations of riming in models. Second, those regions where riming occurs in clouds coincide with icing conditions, which pose a hazard to aircraft traffic (Cao et al., 2018). The ability to detect riming conditions, e.g., using ground-based vertically pointing remote sensing instruments, would allow for real-time warnings near airports (Serke et al., 2010). Finally, long-term statistics of riming are needed in order to better understand under which conditions riming is taking place in the atmosphere.
In the past, several approaches to retrieving riming from ground-based remote sensing observations were developed. Riming, at least in its initial stage, increases the particle density, but not its size, because the freezing SLW droplets first fill the cavities in the particle structure (Heymsfield, 1982;Seifert et al., 2019). As a result, its terminal velocity increases (Mosimann et al., 1994;Mosimann, 1995) and also the aspect ratio and backscattering cross section are changed (Garrett et al., 2015;Leinonen and Szyrmer, 2015). While it is quite challenging to detect riming in polarimetric measurements (Vogel and Fabry, 2018;Li et al., 2018), robust signatures were identified by putting the radar signals of multiple frequencies in relation to one another. Kneifel et al. (2015Kneifel et al. ( , 2016 utilized observations of collocated cloud radars with three different wavelengths to pin down the fingerprints of riming and aggregation in the triple-frequency Doppler spectral space. They found that rimed particles can be connected with a combination of low dual-wavelength ratios (DWRs) of X-and Ka-band radar (DWR X,Ka < 3 dB) and DWR Ka,W values larger than 3 dB. These signatures are clearer for larger particles. For smaller particles, i.e., small DWR values, this distinction becomes ambiguous. Under these conditions, the fall velocity of the hydrometeors can give an additional constraint. Mason et al. (2018) developed an optimal estimation retrieval to obtain a density factor parameter, which is sensitive to riming, using observations of mean Doppler velocity (MDV) and DWR Ka,W . Li et al. (2020) developed a snow observation classification with a rimed and an unrimed category, using MDV and DWR X,Ka . Oue et al. (2021) combined MDV and DWR Ka,W with polarimetric observations and were able to observe even different stages of the riming process. However, only a few sites worldwide are equipped with cloud radars of two or more different frequencies, and correct alignment and volume matching is associated with a certain effort. Another approach which does not rely on observations at multiple wavelengths is based on MDV only (Mosimann et al., 1994;Mosimann, 1995). Recently, this technique has been further developed by Kneifel and Moisseev (2020), who were able to derive a robust estimate of rime mass fraction using MDV averaged in both spatial and temporal domains over a 100 m/20 min height-time window. They applied this method to the time series of radar measurements at several stations located within the CloudNet (Illingworth et al., 2007) network.
While there is an evident correlation between MDV and riming due to the larger density and, thus, increased fall speed of rimed ice particles, it is not always possible to rely on this relation. The method fails when the assumption that the MDV is equivalent to the particle fall speed in the observation volume does not hold. This is, e.g., the case in convective systems, which can cause persistent upor downdrafts. Also, in complex terrain, orographically induced waves can shift the observed MDV up or down by several meters per second. These gravity waves can be temporally persistent and, consequently, not be removed by temporal averaging , making MDV-based approaches such as the one by Kneifel and Moisseev (2020) unfeasible. This is unfortunate, especially considering the fact that riming plays an important role in the microphysics of convective and orographic systems (Woods et al., 2005;Houze and Medina, 2005). In these cases, other radar variables have to be exploited in the search for the fingerprints of riming, which allow for a quantitative detection of this process. During riming, multiple hydrometeor populations are present in the same radar observation volume. Due to their different terminal fall velocities, these hydrometeor types result in multiple peaks in the cloud radar Doppler spectra . A broadening of the spectra is the result, which is, e.g., evident in an increase in the spectrum width (the second moment). Another variable which is impacted by riming is the skewness (the fourth moment). Multipeaked spectra have nonzero absolute skewness values, with zero being the value of a Gaussian distribution. The impact of riming on these other radar variables is, however, not as straightforward as for MDV. For example, riming and aggregation processes may result in similar signatures because aggregation can also lead to bimodal spectra (e.g., Barrett et al., 2019). Consequently, the often ambiguous information contained in the higher radar moments cannot easily be extracted from the radar variable space using, e.g., simple thresholding techniques (Maahn and Löhnert, 2017). More sophisticated methods to derive the relationship between riming and the set of available radar variables are required.
Machine learning (ML) algorithms offer ways to discern relationships from complex data sets. The interest in ML techniques has been increasing rapidly over the past few years, and exciting advances have been accomplished in many scientific fields. Also in atmospheric sciences, the use of ML offers a promising path for scientific discoveries (e.g., Seifert and Rasp, 2020). Deep learning is a type of ML, where artificial neural networks (ANNs) are trained to make predictions. Their potential to extract useful relations from ground-based radar observations has been demonstrated in work published by, e.g., Luke et al. (2010) and van den Heuvel et al. (2020).
In this study, the overall goal is to develop a technique which does not rely on MDV and can, consequently, be applied to data sets acquired in complex terrain, where orographically induced vertical air motions render MDV unusable. More precisely, our motivation is to derive riming estimates for a data set acquired in Punta Arenas, Chile. This site is strongly influenced by stationary gravity waves, which make the application of MDV-based riming retrievals challenging. At the same time, information about the occurrence of riming over Punta Arenas would be of special interest due to its location in the vicinity of the Southern Ocean and the pristine aerosol conditions encountered there (Foth et al., 2019;Radenz et al., 2021). We assess, in this work, how ANNs can be used to predict riming using ground-based, zenith-pointing cloud radar measurements as input features. We optimize and train ensembles of ANNs using different combinations of radar variables. These variables include the equivalent radar reflectivity factor (Ze), the MDV, the spectrum width from left to right edge of the spectrum above the noise floor (spectrum edge width -SEW), and the skewness.
2 Data and methods

Field experiments
We are using data from the "Biogenic Aerosols -Effects on Clouds and Climate" (BAECC; Petäjä et al., 2016) campaign to train, validate, and test ensembles of ANNs. During BAECC, in situ observations of snow were collocated with Ka-and W-band radar measurements at a site in Hyytiälä, Finland. The ensembles of trained ANNs are then applied to observations collected during the "TRIple-frequency and Polarimetric radar Experiment for improving process observation of winter precipitation" (TRIPEx-pol; Mróz et al., 2020) in Sect. 3.2. The motivation to use these data is twofold. First, this allows us to demonstrate that the ANNs are able to generalize to different meteorological conditions and radar settings and to check whether the predictions are consistent between the Ka-and W-band setup. Second, the riming retrieval can additionally be compared with expected signatures of riming in triple-frequency radar observations. Another validation study is performed using data obtained at the Leipzig Institute for Meteorology (LIM) on 19 March 2021 in Sect. 3.3. In this data set, W-band radar data are complemented by ground-based in situ observations of graupel and snowflakes. In Sect. 3.4, we apply one set of ANNs to a graupel case measured by the same W-band radar during the "Dynamics, Aerosol, Cloud And Precipitation Observations" (DACAPO-PESO) field experiment in Punta Arenas. In this section, each of the four measurement setups is briefly introduced.

The BAECC experiment
The BAECC campaign was a joint effort between the U.S. Department of Energy Atmospheric Radiation Measurement Program (DOE ARM), the University of Helsinki, the Finnish Meteorological Institute (FMI), the U.S. National Aeronautics and Space Administration (NASA), and Colorado State University. From February to September 2014, the second ARM mobile facility (AMF2) was deployed at the Station for Measuring Ecosystem-Atmosphere Relations II (SMEAR II) in Hyytiälä, Finland (61 • 51 N, 24 • 17 E). A detailed description of the setup of the remote sensing and in situ instrumentation can be found in Petäjä et al. (2016). The suite of remote sensing instruments include a Ka-band ARM zenith-pointing radar (KAZR) and a Marine W-Band ARM Cloud Radar (MWACR), which are both Doppler cloud radars. Moreover, several ground-based in situ instruments for measuring microphysical properties of snow were deployed at the site. We use data from the ground-based in situ Precipitation Imaging Package (PIP; Pettersen et al., 2020), which is a video disdrometer that measures the size and velocity of hydrometeors reaching the surface.

The TRIPEx-pol experiment
The TRIPEx-pol campaign took place from November 2018 until February 2019 at the Jülich Observatory for Cloud Evolution Core Facility, Germany (JOYCE-CF; 50 • 54 N, 6 • 25 E; Löhnert et al., 2015). At that site, vertically pointing pulsed X-and Ka-band (Görsdorf et al., 2015) systems (manufactured by METEK GmbH) and a frequency-modulated continuous wave (FMCW) vertically pointing W-band radar (RPG Radiometer Physics GmbH; Küchler et al., 2017) are employed on a roof platform within 10 m distance. Calibration, quality control, matching, and resampling procedures described in Dias Neto et al. (2019) were applied to yield high-quality data from all three instruments on the same time-height grid. We are using data from 7 measurement days, between 24 November 2018 and 10 January 2019, during which thick cloud systems were observed, covering a range of meteorological conditions. The specific days chosen for this analysis are 24 November, 3, 8, 23, and 27 December, and 7 and 10 January, all featuring thick mixed-phase cloud cases, possibly with riming.

LIM roof platform
The University of Leipzig Institute for Meteorology (LIM) remote sensing instrument suite is located on the observatory's roof platform (51 • 20 N, 12 • 22 E). This CloudNet site encompasses LIM's 94 GHz radar (LIMRAD94) and a video in situ snowfall sensor (VISSS; Maahn et al., 2021). LIM-RAD94 is a W-band FMCW radar very similar to the system which was employed during TRIPEx-pol. The VISSS is an optical observation system for snow particles, measur-ing 140 frames per second at an optical resolution of 59 µm. While particle size distribution and fall velocity are planned to be made available as data products in the future, we here make use of the recorded images of snow particles in a qualitative manner.

The DACAPO-PESO experiment
Since December 2018, the Leipzig Aerosol and Cloud Remote Observations System (LACROS; Bühl et al., 2016) has been employed in Punta Arenas, Chile (53 • 10 S, 70 • 56 W). LACROS comprises a suite of active and passive remote sensing instruments, and for the time period from December 2018 to October 2019, LIMRAD94 was employed next to the LACROS shipping container. The measurement site in Punta Arenas is located at the most southerly continental part of South America, where the meteorological conditions are characterized by prevailing strong westerly to northwesterly winds. As the air flow hits the continental mass, it is forced over mountainous orography and, at the same time, strongly decelerated, resulting in orographic wave motions. From the ground-based radar perspective, persistent up-or downdrafts (lee waves) and updrafts followed by downdrafts, or vice versa, can often be observed. The site and the DACAPO-PESO experiment (http://dacapo.tropos. de; last access: 16 January 2022) have been described in more detail in Floutsi et al. (2021).

Attenuation corrections
The radar reflectivity of all the data used in this study was corrected for attenuation by atmospheric gases, liquid water, melting particles, and ice. The Passive and Active Microwave TRAnsfer (PAMTRA) forward operator (Mech et al., 2020) was used in combination with the CloudNet products, which are either freely available on the CloudNet data portal (https: //cloudnet.fmi.fi/; last access: 16 January 2022) or were processed locally using the cloudnetpy Python package (Tukiainen et al., 2020). Attenuation due to atmospheric gases, including water vapor, was estimated using PAMTRA and the profiles of temperature and humidity included in the Cloud-Net model files. For liquid water, we used the measured liquid water path (LWP) and distributed the mass among the pixels classified as "liquid containing" in the CloudNet classification mask weighted by the measured reflectivity. We used PAMTRA to obtain the attenuation caused by the mass liquid in the respective height bin, assuming a monodisperse particle size distribution for cloud droplets and an exponential distribution for rain (Mech et al., 2020). This means that attenuation caused by SLW droplets is only corrected for in the SLW layers detected by the CloudNet classification mask; i.e., it is limited by the lidar signal attenuation. If pixels were classified as "melting" in CloudNet, the melting layer attenuation was assumed following Matrosov (2008), who derived relations for the Ka and W bands as a function of the rainfall rate. Attenuation due to snow and ice was neglected for the Ka band and estimated according to Protat et al. (2019) for the W-band radars. If the cumulative attenuation correction for a pixel exceeded 10 dBZ, then the profile was removed from the analysis.

Sampling of training data
Masses of individual snowflakes can be retrieved by applying hydrodynamic theory to PIP observations of particle velocity and size (von Lerber et al., 2017). A data set containing observed snow particle number size distributions, along with retrieved particle masses collected between 2014 and 2015 in 5 min temporal resolution, is freely available on GitHub (Moisseev, 2018). Using these retrieved masses, m, and the maximum dimensions of the observed particles, D max , we first estimated the rime fraction. The mass of unrimed snow, m us , is assumed, as in Moisseev et al. (2017) and Li et al. (2020), as follows: where the values for α and β are α = 0.0053 and β = 2.05 (in centimeter-gram-second units). The rime fraction is then defined, as in Kneifel and Moisseev (2020), as follows: Here, IWC us is the estimated ice water content for unrimed snow with the same N (D max ) as the observed particles. It is obtained by integrating m us computed from Eq. (1) over the measured particle number size distribution N (D max ). FR PIP values smaller than zero were removed. In the next step, continuous PIP sampling periods longer than 1 h were identified for which KAZR and/or MWACR observations are available as well. This results in six snow cases between 1 February and 20 March 2014, which are listed in Table 1.
During BAECC, a turbulent surface layer was often present, resulting in a broadening of the Doppler spectra and impacting their width and other higher-order moments relevant for the training. The following sampling procedure for the training data set was chosen, striving to achieve the best possible spatiotemporal match between remote sensing and in situ observations, while at the same time avoiding the sampling of spectra which are strongly impacted by surface induced turbulence. We compute the turbulent eddy dissipation rate (EDR) for 5 min time periods  and determine r, the lowest range at which EDR is < 10 −3 m 2 s −3 . This threshold was determined empirically in a sensitivity study in which spectral broadening was simulated using the convolution of a spectral broadening term σ T with measured Doppler spectra. The time t it takes for the particle to travel from height r to the surface can be estimated using the measured MDV at r, making the assumption  of temporal and spatial coherence, meaning that the properties measured at time t and r will be the same as the properties at time t + t at r + t · MDV(t, r). Note that negative MDV values indicate downward motions. A schematic for the visualization of the applied sampling technique is shown in Fig. 1. For each PIP measurement, the rime fraction was obtained using Eq. (2). All the KAZR and MWACR spectra at range r, which were matched with a 5 min PIP measurement, were extracted, and Ze, MDV, and the skewness were computed. MDV was corrected for air density change as a function of altitude (Vogel and Fabry, 2018). An additional measure of the width of the spectra was computed and stored; the SEW is defined as the distance (in meters per second) between the left edge and the right edge above the noise threshold. This threshold is the mean noise, according to Hildebrand and Sekhon (1974), plus 3 standard deviations of the noise. SEW offers the advantage that it is, in contrast to the spectrum width, not weighted by the reflectivity, making it more sensitive to broadening of the spectrum due to small peaks, e.g., caused by liquid water. While SEW and skewness are not among the traditional radar moments, they are often available; e.g., the ARM MicroARSCL product contains the skewness and edges of the spectra (Kollias et al., 2007;Luke et al., 2008).
The resulting extracted training data set includes 105 115 Ka-band and 209 965 W-band radar observations, along with the FR PIP retrieved from the corresponding PIP measurements. It is available on GitHub (https://doi.org/10.5281/zenodo.5751820; last access: 16 January 2022). A 3D plot of three of the contained features (Ze, SEW, and skewness) colored by FR PIP is shown in Fig. 1c. The video supplement contains an animated visualization of the same variables.

Machine learning methods
The retrieval described in this section reflects a regression problem. Machine learning techniques are used to relate Doppler cloud radar moments and SEW to FR PIP , using a fully connected neural network. Multiple ANN models are trained to make predictions, given many input (features) and output (target) pairs, with the goal to search for a function that both fits the given data well and is also able to generalize to new values (Goodfellow et al., 2016). For the steps described in the following, tools provided in the Python library of Sklearn (Pedregosa et al., 2011), were used.

Data preparation
Radar moments and the SEW extracted from the BAECC data set are used as input features, and FR PIP values are the desired target. One problem emerges because only a limited number of cases with high FR, i.e., values > 0.7, were observed during the period when KAZR, MWACR, and PIP were collocated in Hyytiälä. By their nature, high FR cases are rather short-lived, so the duration and, therefore, length of such observations are rather small. This skewed distribution of target values has impacts on the training of machine learning algorithms and needs to be carefully taken into account. ANN models will primarily be trained to predict values in the range where most of the observations lie. In order to ensure that trained models also succeed to predict high values of FR, which are of special interest, the training data set is resampled using the Synthetic Minority Oversampling TEchnique for Regression (SMOTER) problems (Torgo et al., 2013). SMOTER is an algorithm which oversamples rare cases and undersamples frequent cases in the training data set, leading to a more balanced distribution.
For many ML applications, the standardization of the input data set is a common requirement. In order to avoid effects caused by outliers, it is advantageous to use a robust scaler which scales each of the input features according to the interquartile range (IQR) and removes the median. The same robust scaler which is defined using the training data set will also be applied later on to new data being sent through the model.
After scaling, the labeled data are split into two parts, with one training and validation set (90 %) and the other retaining 10 % of the data for the testing phase. The choice of these ratios is subjective and associated with a tradeoff between learning and the assessment of the generalization ability.

ML model specifications
In our study, we are using a multilayer perceptron (MLP), which is composed of one input layer, at least one hidden layer, and one output layer. In this section, the hyperparameters, which determine the network's architecture and the training process, are discussed.
The input layer consists of one neuron for each input feature, and the output layer yields the output value(s). In between the input and output layers, one or more hidden layers are defined. Each hidden layer contains a certain number of neurons, which transform the values from the previous layer and then apply a non-linear (rectified linear unit; f (x) = max(0, x)) activation function. In a fully connected model, a given neuron is connected to every neuron in the previous layer. During training, an optimizer iteratively adjusts the weights of the model, until a minimum in error (loss) is reached. This error is defined by the loss function, in our case the squared error. We are using the Adaptive Moment Estimation (Adam) optimizer, a stochastic, gradient-based method (Kingma and Ba, 2017), which requires computing the gradient of the loss function with respect to the model parameters by a back-propagation algorithm. The learning rate controls the step size for updating the weights and is, in our case, set to a constant value of 0.001.
The number of hidden layers (which is also referred to as the layer depth) and respective number of neurons are hyperparameters, which require tuning in the validation phase. In Goodfellow et al. (2014), plots of accuracy vs. layer depth are used to determine the optimal ANN architecture. Following this concept, we plotted the root mean squared error (RMSE) of the validation data set predictions and targets vs. the number of neurons/hidden layers (not shown) to find an architecture which is simple yet accurate. For the limited number of input features used in our approach, a relatively simple setup with one hidden layer and two neurons fulfills these requirements.
Here, we use a k-fold cross validation (k-fold cv) approach with three folds. This means that the training and validation set is split into three equally sized chunks (folds). Then, training is performed in three separate setups, in each of which a different fold is used to test the model, while the remaining two folds are used to train the model. This method enables us to compute the mean error of the three folds, which is a more reliable measure of the ANN performance than the error obtained for one split between training/validation set only. In addition, this approach results in three trained models which can be applied as an ensemble to yield an average prediction and a spread, which can (at least to some degree) be regarded as an uncertainty estimate of the prediction. In this study, we are using the following three different parameter combinations as input features to the ANNs: 1. ANN 0: Ze and MDV 2. ANN 1: Ze, MDV, SEW, and skewness 3. ANN 2: Ze, SEW, and skewness.
The set of input parameters in ANN 0 was chosen because it is similar to existing riming retrieval methods relying on MDV. In order to check whether adding more radar variables can improve the riming estimate, ANN 1 uses the SEW and skewness as additional input features. ANN 2 represents the set of input parameters if MDV cannot be used, e.g., due to persistent up-or downdrafts.
In the testing phase, the model performance is evaluated using the test set prediction RMSE in connection with a visual inspection of the scatterplots of ANN test set predictions and target values. In addition, time-height plots of ANN predictions are examined with regard to the physical plausibility of the predicted features. We decided to put an additional focus on the ability of the networks to predict high FR values ≥ 0.5. The reason for this choice is that the extracted Doppler spectra features are expected to contain little information about riming up to a moderate riming stage. For high FR values, the prediction should be more accurate because the riming signal should be clearer in the input features. Table 2 summarizes the test set RMSEs found for the three different ANN input parameter sets and the two radar frequencies. We will refer to the predicted quantity in the following as FR ANN to distinguish it from the FR PIP retrieved from in situ observations. Figure 2 shows scatterplots of ANN predictions and test set FR PIP target values for the three different ANN ensembles and for the Ka and W bands. The slopes of the linear models (red dashed lines in Fig. 2) are listed in Table 2. From Fig. 2, it becomes apparent that ANN 0 seems to have a problem in the W-band setup because two populations of values are separated in the predictions. The slopes of the linear fits, which should ideally be 1, are lowest for ANN 0 for both considered radar frequencies. This observation is, however, not reflected in the RMSEs listed in Table 2, which hardly differ between the three ANN ensembles for the W band. The RMSEs, in turn, show that ANN 0 performs worse than the other two setups for the Ka band. This illustrates the limits of using a single quantity like RMSE as a quality metric. ANN 1 and 2 also have issues predicting high FR values, and all ANNs feature a cutoff at low FR values.

T. Vogl et al.: Using ANNs to predict riming
This might be because riming signatures for low FR are not very pronounced in the cloud radar Doppler spectra, making it impossible for the ANNs to accurately predict FR ≤ 0.3 from the extracted features. Similarly, the issues of the ANNs when predicting high FR values could be explained by the spectra of pure graupel not necessarily being bimodal, which would result in a less clear signature in SEW and skewness. For the remaining study, we will focus only on ANN 1 and 2.

Error consideration
We acknowledge that several sources of uncertainty impact the riming predictions and only some of them can be quantified. For the training data set, assumptions about the density of unrimed snow and the viewing-geometry-corrected D max were made. The m-D relation used to derive the mass of unrimed snow  is assumed to overestimate m us by a maximum of 5 %. Furthermore, errors are introduced by the spatiotemporal matching of radar and PIP observations. To grasp the overall effect of measurement uncertainties which propagate into the FR ANN predictions, a sensitivity study was performed. We assumed an error of 0.2 dBZ for Ze, one Doppler bin for MDV and SEW, and 0.2 for the skewness and added this uncertainty to selected Doppler spectra. This was accomplished by random picking of n = 10 000 samples from a Gaussian distribution defined by the measured values as the mean and above errors as the standard deviation. The resulting standard deviation in the predicted FR ANN is approximately 1 %. In addition to the 5 % uncertainty from the mass of unrimed snow, and other error sources which cannot be quantified, we propose assuming an overall uncertainty of approximately 10 % due to the training data. This uncertainty has to be added to the uncertainty of the ANN models (Table 2).

Results and discussion
This section is structured as follows: in Sect. 3.1, a welldefined benchmark riming case from the BAECC data set is presented. The application of the ANNs to the TRIPEx-pol data set is demonstrated in Sect. 3.2. In Sect. 3.3, the performance of the ANNs for the W-band radar in Leipzig on 19 March 2021 is evaluated by comparison to in situ observations. Finally, in Sect. 3.4, we present ANN predictions for a case obtained during the DACAPO-PESO field campaign using the same W-band radar.

BAECC benchmark riming case
In addition to using the BAECC data set for training, validation, and testing, we also capitalize on the fact that these data have already been extensively studied with respect to microphysical growth processes, and several well-defined case studies are available. We evaluate the performance of our newly developed riming estimation technique for a 1 h period between 22:00 and 23:15 UTC on 21 February 2014, which is part of the 10 % of the labeled data that were retained for the test set. Riming started at around 17:00, with the LWP reaching its maximum value of more than 1000 g m −2 at around 22:00 UTC ( Fig. 4e; see Moisseev et al., 2017). Parts of the focus period considered here between 22:00 and 23:15 UTC have been previously analyzed in detail by, e.g., Kalesse et al. (2016Kalesse et al. ( , 2019, Moisseev et al. (2017), Mason et al. (2019), and Kneifel et al. (2015Kneifel et al. ( , 2016. Fig. 3 shows Ze, MDV, spectral width, skewness, and SEW measured by the KAZR. The case is characterized by a seeder-feeder situation, where a frontal snow cloud merges into a mid-level mixed-phase cloud. In the mixed-phase cloud, SLW layers are present at 0.7 to 0.9 km and slightly below 3 km. As snow starts to fall from the frontal (seeder) cloud into the lower-level (feeder) cloud, intense riming happens along a slanted fall streak feature at around 22:40 to 22:45 UTC, resulting in Doppler spectra with multiple peaks . During the following time period, a transition from strongly rimed particles to unrimed snow aggregates at between 23:03 and 23:10 UTC was observed by Kneifel et al. (2015Kneifel et al. ( , 2016; Moisseev et al. (2017); Mason et al. (2019). Figure 4 shows the predicted FR ANN for the two different ANN ensembles and the two different radar frequencies, respectively. The ensembles yield very similar predictions, which is remarkable given the fact that ANN 2 does not use MDV as input feature. The surface-induced turbulent layer close to the ground is masked with gray pixels where EDR exceeds the threshold of 10 −3 m 2 s −3 . For the W band, columns where the estimated attenuation exceeded 10 dBZ were masked. For all four configurations, a clear increase in FR ANN is visible at around 22:40 UTC when snow starts falling from the seeder cloud through the SLW layers in the lower-level, mixed-phase (feeder) cloud. This is the period for which Kalesse et al. (2016Kalesse et al. ( , 2019 reported riming signatures in Doppler spectra featuring multiple peaks. Unfortunately, due to low precipitation intensity at the ground, no PIP-based FR retrieval was available for this time period (Fig. 4f). However, a continuous decrease in LWP points to the depletion of SLW by the strong riming (Fig. 4c). Later, after around 22:50 UTC, the predictions differ between Ka and W band. While the two ANN ensembles trained on Kaband data only predict riming during a short period around 23:00 UTC, the FR ANN predictions in the W-band setup remain elevated up to a range of around 4 km. The Ka-band ANNs are apparently able to detect the transition from increased riming to less riming around 23:05 UTC, which was reported in existing studies of the event. Around this time, the LWP has reached its minimum, indicating that no SLW for further riming is available in the column. Coinciding with this period from around 23:05 UTC onwards, the FR ANN is very low (≤ 0.5) throughout the cloud system, which is in accordance with the measured decrease in FR PIP (Fig. 4f).
This first case study is promising with respect to the usability of our trained ANNs to predict riming from cloud radar  moments. Furthermore, the similarity between ANN 1 and 2 for both radar frequencies shows that their application might be possible even without the use of MDV.

TRIPEx-pol case study and triple-frequency signatures for seven cases
To answer the question of whether the developed methods are able to generalize to conditions that are different to the ones they were trained on, it is required that another data set be considered. For this reason, the two ANN sets are applied to data from a different site and obtained by radars with different settings. We will first focus in more detail on the 24 November 2018 case from the TRIPEx-pol campaign. This precipitation event has been analyzed with respect to rain and ice microphysics by Mróz et al. (2020), who found strong signatures of aggregation during the period from 06:45 to 07:45 UTC and riming during a shorter time interval at around 09:00 UTC. Figure 5 shows the radar moments measured during the period from 02:00 to 11:30 UTC on 24 November 2018. The period, characterized as aggregation by Mróz et al. (2020), clearly shows up as a patch of increased signal in the DWR X,Ka between the 1 and 4 km range, while, during the riming period around 09:00 UTC, an obvious increase in absolute MDV values can be observed in a similar range interval, along with an increase in SEW.
In Fig. 6, the predictions of the two ANNs are shown side by side for W band and Ka band. In both cases, the CloudNet classification mask was used only to apply the ANNs to those parts of the cloud which were classified as ice or ice and liquid. For the two different frequencies, and for the two ANNs, the predicted FR ANN features are very similar. For the aggregation period, which has the strongest signal in the DWR X,Ka (Fig. 5c), from approximately 06:45 to 07:30 UTC, some riming is predicted; however, there are relatively low values. During the riming period, which clearly shows up as increased MDV in Fig. 5b between 08:00 and 09:00 UTC, strong riming is predicted for both wavelengths and by both ANN sets. Again, it is remarkable how the MDV features in Fig 5b can be discovered even in the predictions by ANN 2, which does not use MDV (Fig. 6c, d). Despite the differences in scattering properties and attenuation due to the hydrometeors at the Ka and W bands, as well as the different noise levels of the two radars, the retrieved FR ANN is rather similar. The common time-height grid in the TRIPEx-pol data set allows for a convenient direct comparison of the two radar frequencies for each of the ANN setups. The correlation of the predicted FR ANN values for all considered TRIPEx-pol cases is high for both ANN sets, and the R 2 of the Ka-vs. W-band-based predictions are 0.73 for ANN 1 and 0.81 for ANN 2, respectively.
As mentioned earlier on, riming and aggregation can produce distinct signatures in the triple-frequency space of the X-, Ka-and W-band reflectivity. When considering a plot with DWR X,Ka on the abscissa and DWR Ka,W on the ordinate, observations obtained during riming events fall onto a line at low DWR X,Ka (pink line in Fig. 7). Aggregates, in contrast, tend to yield a hook-like feature, due to the comparably large size at which DWR Ka,W is in saturation because of Mie scattering. This conceptual model was presented by Kneifel et al. (2015) and further explored by Mason et al. (2019), who found that not only the density but also the shape parameter of the particle size distribution and the internal structure of aggregates can have strong impacts on triplefrequency signatures.
In Fig. 7, we plotted the observations for which an increased FR ANN > 0.5 was predicted by the ANNs on 2D histograms in the triple-frequency space and colored by the median FR ANN of all the observations in the respective pixel. A pink line is drawn along the line of the increasing median volume diameter expected for rimed particles (according to Fig. 15 in Kneifel et al., 2015). With respect to frequency of occurrence (not shown here), the largest portion of the pixels for which a FR ANN > 0.5 was predicted falls around that line for both ANN sets and both frequencies. This finding suggests that both ANN ensembles for both frequencies are capable of predicting elevated FR values.

Convective riming and aggregation case study in Leipzig
The findings in the previous sections motivate the need for additional comparisons to in situ observations. The 19 March 2021 Leipzig case is characterized by a wintertime convective mixed-phase cloud system, with the cloud top at 3-4 km, and with embedded strong snow and graupel showers. Cold air aloft, combined with some solar warming near the ground, causes weak instability. Additional lifting, e.g., due to convergences, triggers showers and, as in this case study, is even sufficient to cause thunderstorms. In Fig. 8, the first two moments of the cloud radar Doppler spectra during that day are shown, along with the SEW. Around 15:00 UTC, a strong updraft is visible in the MDV, followed by strongly negative Doppler velocities coinciding with increased Ze and SEW values. Around that time, alternating graupel and snow showers were observed in the Leipzig area. This case is moderately convective, and most of the data would be excluded by filtering criteria such as the convection index κ used in the approach by Mosimann (1995). Figure 9a and b show the ANN predictions for the time between 14:00 and 19:00 UTC. In the panels below (Fig. 9c), hydrometeors observed by the VISSS are shown for three selected time periods. In both ANN predictions, a clear tran- Figure 6. Predicted riming during the case on 24 November 2018 by ANN 1 and 2 using W-band moments (a, c) and Ka-band moments (b, d) as input features. The ANNs were applied only to those parts of the cloud which were classified as containing ice or ice and liquid by the CloudNet algorithm. Pixels with EDR above the threshold of 10 − 3 m 2 s −3 are masked in gray. Figure 7. The 2D histograms for seven cases, chosen from the TRIPEx-pol data set, where the rime index > 0.5 was predicted. Each pixel contains at least 10 observations, and the color indicates the median FR ANN of all the observations contained in the pixel. FR ANN predicted using the W band (a, c), and Ka-band (b, d) radar observations are displayed. The pink line is drawn where rimed particles are expected (according to Fig. 15 in Kneifel et al., 2015). This change can be confirmed by the VISSS observations. In the period from 14:40 to 14:50 UTC, the images show dense and roundish particles (rimed aggregates and graupel). In contrast, during the period from 15:00 to 15:10 UTC, fluffy, unrimed aggregates are observed. Later, from approximately 15:45 to 16:20 UTC, the FR ANN features increased values throughout the vertical column (≈ 0.7-1.0), but these are not as pronounced as in the period before 15:00 UTC. VISSS images taken during the period from 15:50 to 16:00 UTC reveal a mixture of particles which have different sizes and degrees of riming. Small hydrometeors, with diameters well below 1 mm, coincide with aggregates and graupel particles having sizes of several millimeters. The ANN predictions fit extremely well with these observations. In the first case, very high FR ANN values around 1 are predicted, whereas in the second case, the predicted riming indices are below 0.4. In the third case, even though a portion of the column is masked due to the EDR threshold, it is visible that intermediate FR ANN values are predicted by both ANNs. This leads to the assumption that the ANNs are not only capable of detecting strong riming but are also sensitive to the degree of riming or the fraction of rimed particles compared to the total hydrometeor population. In this case, and in the previously presented results, the predictions of ANN 1 and 2 are strikingly similar. These findings show that predicting riming is possible -even without the use of MDV. It has to be acknowledged, though, that the vertical distribution of FR ANN cannot be verified by the VISSS observations. Since the ANNs proved to perform well for this wintertime convective case, this could open a door to detect and even quantify riming in convective systems.

Punta Arenas gravity wave case
The previous findings make us confident that ANN 2 can be applied to the W-band radar data in the DACAPO-PESO data set. We do not apply ANN 1 because it would be biased by the orographic wave motions. Here, we analyze a case observed on 21 February 2019 (Fig. 10) from 13:30 to 22:00 UTC. A precipitating cloud system, with a cloud top around 2.5 to 3 km, is present from around 15:00 to 18:00 UTC. Above, a mid-level cloud, topped at around 6 km and with varying cloud base, is observed. Especially in the higher-level cloud, in the range between 3 and 6 km, a wave pattern is visible in the MDV (Fig. 10b), including a prominent downdraft around 18:00 UTC, with MDV ≈ −2 ms −1 . Precipitation reached the ground between 15:00 and 18:00 UTC, and another precipitation event occurred at around 21:00 UTC. At 16:30 UTC, marked by the red cross in Fig. 10d, graupel particles were observed on the ground at the site.
In Fig. 10d, the FR ANN predicted by ANN 2 is shown. The predicted FR ANN is only increased in the lower part of the cloud system up to around 3.5 km range, which probably contains liquid water. No increased FR ANN values are predicted for the higher part of the cloud system. The maximum in FR ANN predictions is reached around 16:30 UTC, coinciding with the observation of graupel particles.

Summary, conclusions, and outlook
In this work, we have demonstrated the ability of artificial neural networks (ANNs) to estimate riming using groundbased, zenith-pointing cloud radar measurements as input features. Training data were extracted from the BAECC data set by temporally matching PIP-based riming retrievals with cloud radar observations. Ensembles of ANNs were trained to predict a FR ANN , separately for Ka-band and W-band observations, using three different combinations of input variables. ANN 1 uses the equivalent radar reflectivity factor (Ze), the mean Doppler velocity (MDV), the spectrum edge width (SEW), and skewness as input features, and ANN 2 uses Ze, SEW, and skewness. One set of ANNs, using only Ze and MDV as input features, was not further considered due to low performance indicating that these two quantities are not sufficient for quantifying riming. We evaluated the trained models using four case studies and a longer data set comprising observations of seven mixed-phase cloud systems. In general, the predictions of ANN 1 and 2 were found to be very similar across all considered cases, despite the different input variables. Both ANNs were able to predict strong riming and capture the subsequent transition to unrimed snow reported in the literature for a case from the BAECC experiment (Fig. 4). A turbulence threshold of EDR = 10 −3 m 2 s −3 was applied to prevent FR ANN ANN estimates that are too high due to broadened Doppler spectra. It was shown that the models are able to generalize to a new data set, i.e., different radar systems for both considered wavelengths (Fig. 6) and different meteorological conditions. ANN predictions for seven cloud cases were shown to match the expected signatures of riming in the triple-frequency observation space of X, Ka, and W bands (Fig. 7). Large FR ANN values mostly fall into the region for which riming is expected. The application of both ANNs to a convective wintertime cloud case showed that the method can also be applied to convective systems (Fig. 9). Because ANN 2 does not depend on MDV, it was applied to an orographic case, yielding high FR ANN values for the period during which solid graupel particles were observed at the site (Fig. 10). These findings indicate that retrieving riming is possible even without the use of MDV.
One of the major constraints of this study is the limited training data set. As better and longer-term training data sets become available, the ML techniques at hand can be more fully exploited, and further improvement of the ANN performance is expected. Also, an extended data set would better allow us to quantify the errors of the ML method and to understand the limitations with respect to identifying very high and very low FR values.
This study closes an important gap in our abilities to quantify the riming process with cloud radars. Further validation, e.g., by comparison of this technique with airborne in situ observations would be a useful extension of this work. Future applications will focus on longer-term data sets to investigate the drivers of riming, including orographic conditions.
Video supplement. A 3D visualization of the training data set is available to be viewed online via the TIB AV-Portal (Vogl, 2021a).
Author contributions. The method was developed by TV, MM, and HK with contributions by all co-authors. SK provided the RIPExpol data set, DM the in situ data for the BAECC data set, and MM the in situ data for the Leipzig data set. TV performed data visualization and analysis, with help from all co-authors. The text was written by TV and reviewed by all co-authors.
Competing interests. At least one of the (co-)authors is a member of the editorial board of Atmospheric Measurement Techniques. The peer-review process was guided by an independent editor, and the authors also have no other competing interests to declare.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Special issue statement. This article is part of the special issue "Fusion of radar polarimetry and numerical atmospheric modelling towards an improved understanding of cloud and precipitation processes (ACP/AMT/GMD inter-journal SI)". It is not associated with a conference.
Acknowledgements. We acknowledge the provision of physical access to the LACROS resources in the frame of DACAPO-PESO, which is provided via the European Research Infrastructure for the observation of Aerosol, Clouds and Trace Gases (ACTRIS; grant nos. 654109 and 739530) as part of the European Union's Horizon 2020 research and innovation programme. We also want to thank Boris Barja from the Universidad de Magallanes, for granting access to the site and his support throughout the field campaign. The German Weather Service (DWD) is acknowledged for providing ICON model data. Work provided by Stefan Kneifel was funded by the German Research Foundation (DFG; grant no. KN 1112/2-1) as part of the Emmy Noether Group OPTIM-Ice. The TRIPEx-pol campaign has been supported by the DFG Priority Program SPP2115 "Fusion of Radar Polarimetry and Numerical Atmospheric Modelling Towards an Improved Understanding of Cloud and Precipitation Processes" (PROM) under PROM-IMPRINT (grant no. 408011764). Teresa Vogl acknowledges funding from the German Academic Exchange Service for a research stay at University of Colorado, Boulder in Boulder, CO (grant no. 57504619). We would like to thank Leonie von Terzi, for her help processing the TRIPEx-pol data set. A big thank you to Martin Radenz, who saw the graupel in the grass in Punta Arenas after coming back from a lunch break and wrote it into the logbook. We also want to thank David John Gagne from NCAR, for his advice on machine learning. Thank you to Anton Kötsche, for the synoptic analysis, and special thanks to Jen Kay and Patric Seifert, for the fruitful discussions.
Financial support. We gratefully acknowledge the funding of the German Research Foundation (DFG) in the frame of the special priority program on the Fusion of Radar Polarimetry and Atmospheric Modelling (SPP-2115, PROM; grant no. KA 4162/2-1).
Review statement. This paper was edited by Patric Seifert and reviewed by two anonymous referees.