ERUO: a spectral processing routine for the MRR-PRO

The Micro Rain Radar (MRR) PRO is a K-band Doppler weather radar, using frequency modulated continuous wave (FMCW) signals, developed by Metek Meteorologische Messtechnik GmbH (Metek) as successor to the MRR-2. Benefiting from four datasets collected during two field campaigns in Antarctica and Switzerland, we developed a processing library for snowfall measurements, named ERUO (Enhancement and Reconstruction of the spectrUm for the MRR-PRO), with a two-fold objective. Firstly, the proposed method addresses a series of issues plaguing the radar variables, which include interference 5 lines, power drops at the extremes of the Doppler spectrum and abrupt cutoff of the transfer function. Secondly, the algorithm aims to improve the quality of the final variables, by lowering the minimum detectable equivalent attenuated reflectivity factor and extending the valid Doppler velocity range through antialiasing. The performance of the algorithm has been tested against the measurements of a co-located W-band Doppler radar. Information from a close-by X-Band Doppler dual-polarization radar has been used to exclude unsuitable radar volumes from the comparison. Particular attention has been dedicated to verify the 10 estimation of the meteorological signal in the spectra covered by interferences.


Introduction
While most of the densely populated areas of the planet benefit from the coverage by operational radar networks (Saltikoff et al., 2019), remote locations are often scarcely monitored. Among the latter, Antarctica constitutes an extreme example, with precipitation often estimated from glaciological surface-based observations (Bromwich, 1990), numerical models (Bromwich PRO and WProf. Thanks to the dual-polarization nature of the measurements, we have access to an estimate of the hydrometeor types present in the observed volumes, computed using the classification described in Besic et al. (2016) and refined using the demixing algorithm described in Besic et al. (2018). The information on the hydrometeor population above the airport site is the only product of MXPol used in this study.

130
This section presents the procedure followed by ERUO to process the raw spectra collected by the MRR-PRO. A schematic representation of it can be seen in Figure 1. As hinted by the scheme, the algorithm and its description are divided in three stages: preprocessing, processing and postprocessing, each described in a dedicated subsection below.

Preprocessing
The preprocessing uses the information contained in a whole dataset, collected from a single continuous deployment of the 135 MRR-PRO, to compute three products: the interference line mask, IM(i, n), which highlights the regions of the raw spectra of the campaign likely affected by interference; the border correction, BC(i, n), a spectral density offset used to compensate the drop in the raw spectra values observed at i close to 1 or m;

140
the raw signal-free spectrum profile estimate, P e (n), a guess of the signal recorded in each range gate in ideal clear sky conditions, likely due to the thermal noise of the receiver.
The procedure starts by loading all the raw spectra available in the dataset, which are concatenated along the temporal axis.
From the resulting single 3-dimensional (t, i, n) matrix, the algorithm extracts the 2-dimensional (i, n) median raw spectrum across all time steps, referred to asS(i, n). Panel a of Figure 2 shows how this median spectrum looks like for the MRR-PRO 06 145 dataset.
In our case, the signature of precipitation was never visible in this matrix, being instead relegated to lower quantiles. However, it may be possible that other datasets contain a higher proportion of precipitation measurements, due to a higher frequency of events or to the lack of recording during prolonged clear-sky conditions. The user is advised to displayS(i, n), and look for the signature of precipitation in this matrix. If needed, a higher quantile can be used instead of the standard 0.5 value used to 150 5 https://doi.org/10.5194/amt-2021-294 Preprint. Discussion started: 20 October 2021 c Author(s) 2021. CC BY 4.0 License. computeS(i,n). This value can be easily changed by the user in the dedicated section of the configuration file of the ERUO library.
We were able to identify a set of characteristics recurrent across the median MRR-PRO spectra analyzed so far. First of all, the dependence of the median power from the range gate number seems to follow always the same pattern: after a sharp rise in the lowest gates, the power level briefly plateaus before starting a consistent descent that continues up to the last range gate. 155 The exact position of the range gate in which the transition in trend happens may vary between MRR-PRO and campaigns.
Another behavior consistently present in theS(i, n) matrix is the drop in raw spectrum observed at the extreme values of the spectral line number, already mentioned when introducing the quantity BC(i, n). This drop is, usually, of lower intensity when compared with other features, such as marked precipitation signal. Therefore it is extremely important to make sure thatS(i, n) has been computed over a campaign with a relatively low frequency of precipitation (such as the ones used for this 160 study) or over a subset of clear-sky measurements collected throughout the campaign. Finally, peaks of different size and entity, generating the interference lines in the final MRR-PRO products, may appear in several location across the median spectrum.
Given the presence of these anomalies in both clear-sky and precipitation measurements, their signature is clearly visible in any quantile ofS(i, n), including of course the default value of 0.5 used in this study. These peaks can be roughly divided in two categories:

165
the isolated peaks, occupying only a handful of spectral lines and range gates, appearing as circular or oval shape when plottingS(i, n) versus both n and i; the lines, covering the whole i axis, from i = 1 to i = m, and spanning only a few range gates.
Two examples of the main category are visible in Figure 2, together with the other typical features described before. Additional interference lines of the second type are displayed in subsection 4.3, alongside a description of how the ERUO library handles 170 their presence and a discussion on some possible impacts on the final radar variables.
Each of these observations plays a role in the preprocessing, allowing the use of theS(i, n) matrix as the basis from which the three products of this section will be derived. From this point onward, the algorithm continues following a two-step, quasiiterative procedure. For clarity purposes, we will describe one iteration at a time. 175 The main aim of this first step of the preprocessing is the elimination of isolated and anomalous spectral density peaks iñ S(i, n). Their presence can seriously impact P e (n), by artificially raising the estimated clear-sky power level at the range gates affected by these spikes.

First iteration
In order to identify these spurious signals, we first need to define a rough estimate of the background power level at each range gate. Therefore, a quantity akin to an early guess of P e (n) is defined following this procedure: and excluding the range gates that do not satisfy the thresholds described above. After some tests, the degree of the polynomial has been set to 4. While it appears that the fit is not very sensitive to small changes, such as increasing or lowering the degree by one, the chosen value seems to provide a curve that better maintains the profile trend, even if we were to extend the range to n > n max . This may reduce the risk of contaminating the final part of the profile with the effect of a spurious inversion of the trend a few gates after the end of the fitted range. We will refer to the fit result as 205S f it (n) 5. Finally, we can define our early guess for the power received in each range gate in ideal clear sky conditions. Since the curve presented here is just a first estimate, we use the symbol P (1) e (n) for it. For n ≤ n up , we set P (1) e (n) equal toS(n). For the upper part, instead, we set P (1) e (n) equal to the element-wise minimum betweenS f it (n) andS(n), to avoid the risk of artificially increasing the baseline power level.   Both features are enhanced when looking at the anomaly from this baseline raw spectrum, defined as . This matrix is used to compute IM (1) (i, n), a partial interference mask, covering only the isolated anomalous spikes in the spectrum. Both A (1) (i, n) and IM (1) (i, n) (for MRR-PRO 06) can be seen in panel e of Figure 2.
The computation of IM (1) (i, n) starts by masking (i.e. setting equal to 1) all regions where A (1) (i, n) > 0.2 S.U.. Once again, this assumption is only valid ifS(i, n) does not contain any leftover precipitation signal, either due to the relatively low 220 frequency of precipitation, as in our case, or by being derived from a subset of clear-sky measurements collected throughout the campaign. In this way, we can be sure that the threshold does not affect regions of the spectrum containing precipitation signal for more than 50% of the campaign. Then, the columns i = 1, 2, 3, m − 2, m − 1, m are all unmasked (i.e. set equal to 0), since the power drop at those spectral lines has not been corrected yet. Finally, all range gates containing at least m − 6 masked values are also artificially removed from the mask, since IM (1) (i, n) aims to cover only the isolated peaks, and not the ones The threshold 0.2 S.U. has been found to be the lowest one that allowed the masking of all the most visible interference in our four datasets, among all the tested values. The decision of excluding the three last spectral line numbers on the two extremes of the velocity range is motivated by the drop in raw spectrum, more marked in those columns for all our datasets. 230 The border correction BC(i, n), the first real product of the preprocessing, is introduced to correct for this drop. After masking all regions ofS(i, n) where IM (1) (i, n) = 1, the median of the remaining valid values at each range gate is computed and tiled m times, resulting into a 2-dimensional matrix,S rec . This matrix is usually similar to the one displayed in panel d of Figure 2. However, while the latter is based on our reconstruction,S (2) rec is entirely based on a subset of the measured data. Then, BC(i, n) is defined as equal toS (2) rec −S(i, n) for the couples (i, n) for whichS (2) rec −S(i, n) > 0 S.U., and 0 everywhere 235 else. Panel f of Figure 2 shows the result for the MRR-PRO 06.

Second iteration
The main product of the first iteration, BC(i, n), allows us to refine most of the previous estimates by reducing the impact of the power drop for i close to 1 or m. Without this first estimate of the border correction, it would have been impossible to accurately define an interference mask. The typical power drop at the extreme i can be lower than 1 S.U., while the faintest of 240 the interference lines can be constituted by anomalies as low as 0.2 S.U. , by definition. Therefore, the first can hide the second in some regions of the spectrum. This would result in the final products of the processing retaining these unmasked section of the interference lines, defying the whole purpose of the preprocessing.
As starting point of this second iteration of the preprocessing, we define a new quantity:S (3) (i, n) =S(i, n) + BC(i, n).
Using it in place ofS(i, n), we repeat the steps 1-5 of subsection 3.1.1. The resulting reconstructed profile is P e (n), one of the 245 final products of the preprocessing.
As in the previous subsection, a 2-dimensional matrix is derived by repeating m times P e (n) across a new dimension. This matrix is subtracted fromS (3) (i, n), giving us the new anomaly, A (2) (i, n). To compute IM(i, n), the algorithm starts by masking (setting equal to 1) all entries (i, n) where A (2) (i, n) > 0.2 S.U. If at a specific range gate number more than 0.9 × m are masked, all the remaining values at that n will be masked, too. Finally, the mask undergoes a series of 3 binary dilations.

250
Each of them expands the mask to pixels adjacent to already masked ones (excludign the diagonal from the contiguity).
As mentioned when IM(i, n) was first introduced, the mask aims to cover the regions likely to contain interference. The latter appears in the spectrum as peaks or lines, whose exact position is changing during the campaign, and it may vary slightly in both i and n. While the median spectrum, at the base of this preprocessing, captures the most likely position for the various peaks and lines, variations in their positions may appear too seldom to leave a trace strong enough to be detected as interference.

255
This is why the binary dilation and the threshold on the maximum number of masked spectral lines at any range gate have been introduced: they artificially expand the mask, resulting in a more likely coverage of these rare variations. The exact values for the parameters controlling the two operations have been set after a series of trials on our datasets. The final choice was a compromise: while the spectrum reconstruction (subsection 3.2.2) improves for smaller masked regions, the contamination by interference in the final output of ERUO is reduced when the mask is expanded. The two parameters can also be modified by 260 the user in the configuration file.

Loading large datasets
The size of the matrix resulting from the loading of all raw spectra is a consequence of both the frequency of acquisition and of the duration of the campaign. Depending on the hardware used at this stage, it may be impossible (or extremely tedious) to load all the raw spectra at once, to computeS(i, n). Therefore, user discretion is advised at this stage, and some tests may 265 be necessary to assess the limits of the computer used for the analysis. For reference, the laptop used for the development of ERUO had no difficulties in loading each of the four datasets presented in the Data section.
In case of problems we suggest two possible solutions: splitting the dataset in smaller chunks (one month duration at 10 s resolution should work on most commercially available laptops), and run ERUO separately on each of them;

270
manually selecting a sample of clear sky measurements, spaced throughout the campaign, and loading only those files at the beginning of the preprocessing.

Processing
This section describes how ERUO processes a single MRR-PRO file, using the raw spectra stored in it as starting point for the computation of a set of radar variables: Z (P roc) ea (t, n), V (P roc) (t, n), SW (P roc) (t, n), and SNR (P roc) (t, n). These quan-275 tities are stored in a new file, alongside the 3-dimensional matrix containing the spectral attenuated equivalent reflectivity, SZ (P roc) ea (t, i, n). As schematized in Figure 1, the processing can be seen as a series of short operations, each building on top of the output of the preceding one. These steps are detailed separately in the next subsections, following the same order as the library where the procedure is implemented. As usual, all numerical parameters introduced in the following subsections can be customized by the user through the configuration file.

Transfer function reconstruction
The transfer function is used by FMCW radars to include the receiver gain dependence on the range gate number in the conversion of the raw spectra into spectral reflectivity. As visible in Figure 3, the codomain of these functions is approximately confined between 0 and 1.
In both datasets collected by the MRR-PRO 23 and used in this study, this function appears to have a sudden jump at n = 128

285
(approximately 2000 m above the radar in our configuration), with its value increasing by a factor of approximately 10 38 . The jump is displayed in Figure 3, represented by the almost vertical line connecting the last valid transfer function entry and the next, extremely high one. Therefore, the final radar variables available in the data file appear truncated at that range gate.
Even though we cannot derive exactly what the value of the transfer function for n > 128 would have been without the sudden increase, we decided to include in ERUO a step, preceding the real processing, that estimates it. By comparing the 290 transfer function of the MRR-PRO 23 with the ones of the MRR-PRO 06 and MRR-PRO 22, we noticed that the location of the maximum for the former is at a significantly lower range gate than the ones of the other two MRR-PRO.
We hypothesize that this shift and the lack of valid measurements above n > 128 are both a consequence of the incorrect handling, by the MRR-PRO 23, of the number of range gates set in its configuration. Since n max can be only increased or decreased of a factor of 2, we suspect that the transfer function saved in the files is just a re-sampled version of the real transfer 295 function over the wrong number of range gates, equal to the lower possible step in the radar configuration.
Therefore, we decided to introduce an extra step in the processing routine, called transfer function reconstruction, to resample TF(n) over the correct n max . This reconstruction is executed only if a problem analogous to the one observed in the MRR-PRO 23 dataset arises. Therefore, the algorithm starts by checking for entries in TF(n) above a certain threshold, set by default to 9 × 10 9 , and masks them. Then, using the "scipy.signal.resample" function (Virtanen et al., 2020), the unmasked 300 section of the function is re-sampled over a number of samples equal to n max . The result is finally re-scaled to ensure that the height of its maximum peak is the same as the unmasked section of the original function. In Figure 3, the final output for the MRR-PRO 23 (and ICEGENESIS) dataset is displayed as a brown line.

Spectrum reconstruction
From this point onward, the algorithm moves to the processing of each time step separately. The starting point of this spectrum-305 by-spectrum processing can be considered as the most delicate one of the whole ERUO library. As mentioned before, the removal of interference lines from the final products is among the topmost priority of the ERUO routine. Their manifestations in the raw spectrum, spurious peaks and lines, sometimes hide, in part or completely, the meteorological signal. During precipitation, they can artificially increase the amplitude of the recorded signal at the affected locations, resulting in the overestimation of Z ea and, possibly, alterations of V and SW , depending on the particular shape of the interference. Additionally, in clear-310 sky conditions, these peaks and lines generate anomalies that are recognized as signal by standard techniques such as the one proposed by Hildebrand and Sekhon (1974), and therefore often result in false positives of detected precipitation. Therefore, ERUO intervenes by identifying the regions of each spectra likely affected by interference, masks part of them and reconstructs the masked regions using information from its surroundings. It should be noted that this part of the processing is optional, and it can be avoided by setting the flag controlling it in the configuration file equal to 0. However, we strongly recommend running 315 the spectrum reconstruction, to keep the output products as clean as possible.
For each time step t j of the input file, the process starts by loading of the raw spectra and summing BC(i, n) to it, obtaining the corrected raw spectra, S(t j , i, n). An ideal clear-sky guess of the power return is defined, similarly to the preprocessing case, by repeating m times along a new dimension the profile P e (n), obtaining S (cs) (t j , i, n). This matrix is used to compute the power anomaly, defined as A(t j , i, n) = S(t j , i, n) − S (cs) (t j , i, n) The first guess of the regions of this matrix affected by 320 interference is obtained by selecting the couples (i, n) where IM(i, n) = 1 and A(t j , i, n) > 1.0 S.U.. The chosen threshold is significantly larger than the one used as minimum prominence for the signal detection, to avoid creating vast regions of masked values. As discussed later in the result section, the reconstruction performs better on smaller masked areas. The main aim of this reconstruction is the removal of the most noticeable interferences, which are usually associated to peaks taller than the minimum detectable prominence threshold. Therefore, this choice of threshold suits the objective, while smaller, leftover 325 interference is easily eliminated by the postprocessing.
This first set of locations can be seen as a collection of few contiguous regions of masked (i, n) couples. Among those regions, only the ones that satisfy one of two conditions are kept: the masked section is an isolated peak. In practice, A(i, n) > 1.0 s.u. for less than 5 couples (i, n) in the surroundings of the current masked region. The surrounding area in this case is computed by dilating twice the current contiguous 330 masked region.
for some of the affected n, the masked region covers at least 80% of the interval i = 1, .., m. This may indicate the presence of an interference spanning all spectral line numbers, extreme folding of the meteorological signal or particularly wide spectra, such as the ones recorded in presence of strong turbulence.
The second set of masked regions is further refined by unmasking, at each range gate number, the entry (i, n) with the largest 335 value of A(i, n), if strong anomalies are detected immediately above or below the region. This condition allows the preservation of the position and intensity of the maximum return power at each rage gate, constraining the following reconstruction. In practical terms, the algorithm first checks if A(i, n) > 5.0 S.U. for at least one entry (i, n) within the masked region. The second condition is the existence of locations (i, n) satisfying the same condition in (at least) 3 of the 5 closest range gates below or above the masked region. If the median i coordinate of these peaks differs from the i coordinate of the maximum 340 peak of A(i, n) within the masked region by 5 or less, the algorithm proceeds with the unmasking of the location of the largest anomaly at each range gate.
The matrix A(i, n) is copied and, in this copy, all the previously masked region are substituted by the value "Not a Number" (NaN). These missing region are reconstructed using the "astropy.convolution.interpolate_replace_nans" function of the Python3 library Astropy (Astropy Collaboration et al., 2013. As described in the documentation, this function replaces 345 the NaN values with a kernel-weighted interpolation from their neighbors. We decided to use a Gaussian kernel for the reconstruction, after performing a few tests with the different types available in the library. While along the i-axis the kernel standard deviation has been fixed to 1, on the n-axis the value is defined by the number of consecutive range gates containing at least one NaN divided by a scaling factor, set by default to 3. The kernel size is then left to the Astropy default value, which is eight times the standard deviation. Therefore, on the n-axis, the kernel always contains at least a not-NaN value. On the i-axis, 350 a visual inspection revealed that a standard deviation larger than 1 often causes an artificial broadening of the reconstructed peaks, when compared to the precipitation signal directly above and below. Finally, the first 15 range gates are excluded from the whole procedure, due to the difficulties encountered in accurately reconstructing the peculiar behavior of the A(i, n) in the lowest gates, especially at the extremes of the i-axis.

355
The reconstructed anomaly is added back to the baseline clear-sky return, S (cs) (t j , i, n), and it is converted to linear units. An  provides an example of peak detection for a spectrum collected by the MRR-PRO 06. The peaks with the largest prominence are highlighted in green, while the ones eliminated by the absolute and relative prominence threshold are displayed in red and blue, respectively. In the example, the blue peaks would have been (barely) acceptable, if no larger one was detected in the spectrum. This allows signals as low as the blue ones of panel c of Figure 4 to be detected if no clearer peak is present, while at the same time preventing them from interfering with the next step when stronger signal exists.

370
Once all the (i, n) coordinates of the peaks are identified, the algorithm proceeds by trying to connect them in vertical lines.
For each candidate peak, the algorithm looks in a window of 5 along the n-axis and 10 on the i-axis around its coordinate. If another peak falls in this window, the two are connected in a line, and the procedure continues by trying to connect more peaks to the same line. Each peak can belong only to one line, each line can contain only a single peak per range gate, and multiple lines can occupy the same n (for example, in case of bimodality in the spectrum). If multiple peaks fall in the same window, 375 the choice always favors the one in the closest range gate, using the distance along the i-axis as secondary decision parameter.
Following with the example of Figure 4, four lines have been detected in this specific case, and they have been superimposed Once the lines have been determined, the ones with less than 3 elements are excluded from the analysis. Among the remaining ones, the algorithm tries to exclude the duplicate ones caused by the three repetitions of the original spectrum. Couples of lines 380 whose median difference along the i-axis is close to m (the tolerance is 1 ms −1 , converted in spectral line numbers) are flagged as duplicates. Among these couples, the algorithm keeps the line whose upper half is closer, in median, to i = 0. This operation per-se does not count as full dealiasing. However, given the focus of ERUO on snow and the usual slow fall-speed of solid hydrometeors, especially in the upper part of the precipitation systems, this choice results in a reasonably good handling of many of the aliasing cases of our four datasets.

385
Once all duplicates have been removed, the line that spans the largest number of range gates is considered the main one, and all the ones at a distance larger than m along the i-axis are eliminated. Figure 4, in panel d, displays the resulting selection and the main line for the example spectra.

Signal identification
The set of lines derived in the previous section denotes only the position of the main peak in the spectrum at each range gate.

390
To proceed with the analysis, it is necessary to extract m spectral lines around each of these peaks. The selection of the section of the spectrum to keep at each range gate starts from the peak, expanding symmetrically from it in decreasing order of power.
Once the spectrum at each n has been determined, the signal is separated from the noise following the Decreasing Average (DA) method. This technique, already used by Maahn and Kollias (2012) for the processing of some of the MRR 2 measurements, appears more reliable than Hildebrand and Sekhon (1974) in our case. In short, the method starts by flagging the highest 395 peak in the spectrum as signal. In a series of iteration, adjacent peaks are flagged, too, favoring always the largest one, until the average power of the remaining, unflagged spectrum continues to decrease. In our case, we stop the algorithm slightly before, by setting a minimum threshold of 0.001 s.u. for the decrease between consecutive iterations. This small modification prevents a problem that we often encountered with the original version, where the algorithm included sometimes the whole spectrum in the signal.

400
Once the borders of the signal have been determined, we compute the noise level and its standard deviation (along the i dimension). In the unfortunate case in which the signal has been found to occupy the whole spectrum, such as in the case of extreme folding or some leftover interference line, the noise level and standard deviation are set equal respectively to the minimum of the signal and 0 s.u.. The resulting profile of noise level, NL (1) (t j , n), undergoes further refinement. One of the products of the preprocessing, P e (n), gives us the median raw spectrum per range gate for the campaign in logarithmic form.

405
After we convert it to linear units, we can compare it to NL (1) (t j , n), flagging all n at which the computed noise level is more than 0.2 s.u above P e (n). A copy of NL (1) (t j , n) with the entries at those range gates substituted by NaN is created, and convoluted with a box kernel 5 gates wide. The aim of this operation is the removal of gates potentially contaminated by remaining interference. The convolution is performed using, once again, the Astropy library, which has been set to perform a prior interpolation in the NaN regions. The resulting smoothed noise level is merged with the original NL (1) (t j , n) only in the 410 flagged range gates, giving us the final noise level, NL(t j , n).

Moments computation
The noise level and standard deviation identified in the previous section allow us to proceed with the computation of the final product of the processing section. The process starts by adding the noise standard deviation multiplied by 3 to the noise level, as a mean to eliminate the risk 415 of including spurious noise fluctuations in the signal. After subtracting the result from the elements of H(t j , i, n) within the signal borders identified in the previous step, we obtain the signal matrix H (sig) (t j , i, n).
The algorithm converts it to spectral reflectivity using the same formula used by Maahn and Kollias (2012) and Garcia-Benadi et al. (2020) for the MRR-2, which in turn is in agreement with the documentation provided by Metek: The same formula is used for the conversion of the noise floor and standard deviation. The former is also integrated across all m spectral line to derive the noise floor, NF (lin) (t j , n). All quantities computed so far are in linear units, mm 6 m −3 .
Using SZ (lin) ea (t j , i, n) as starting point, we can proceed with the computation of the radar variables. The first one is the attenuated equivalent reflectivity, in its logarithmic form and therefore expressed in dBZ: In the equation, λ = 0.01238 m is the wavelength of the instrument and |K| 2 0.92 is the dielectric factor of water (Segelstein, 1981). To conclude the example of Figure 4, the spectral reflectivity computed for this specific case is displayed in panel e.
Interestingly, relatively small peaks in the upper part of the spectrum may reach spectral reflectivity values comparable to much larger peaks in the lowest gate, due to the presence of the transfer function in equation 1.
Higher order moments of the spectrum give use two additional variables, the Doppler velocity and the spectral width, both 430 expressed in ms −1 : The quantity vel(i) indicates the velocity associated at each spectral line number. 435 The last variable computed is the signal-to-noise ratio, in dB: At the end, the quantities Z (P roc) ea (t, n), V (P roc) (t, n), SW (P roc) (t, n), SNR (P roc) (t, n) and SZ (P roc) ea (t, i, n), together with the noise floor and level are saved in a file, at the location specified by the user in the configuration file. The optional "quickplots" routine included in the ERUO library can provide a simple visualization of some of these products.

Postprocessing
Most of the parameters for the processing of raw spectra have been tuned to prioritize an increase in sensitivity, over the need to produce a clean set of variables, not contaminated by noise or interference lines. The spectrum reconstruction and the refinement of the noise level are the only two steps completely dedicated to the minimization of the impact of spurious peaks in the raw spectra. While their contribution to the elimination of the most noticeable interference lines is remarkable, 445 the processing output is often still affected by artifacts of lower intensity. The postprocessing aims to identify and mask these leftover non-meteorological signals.
The procedure starts by setting a minimum threshold of -20 dB on SNR, defining a new matrix of measurements to exclude, denoted by EXC(t, n). This quantity acts as a generic mask that can be applied over any of the radar variables: EXC(t, n) = 0 in the regions that are left untouched by the postprocessing, EXC(t, n) = 1 in the ones removed. The next subsections describe 450 the two steps that modify EXC(t, n) in order to reach the goal of this section. These two steps are executed independently of each other. Therefore, the user can decide to execute both of them, to skip one or to ignore the whole postprocessing.

Interference line removal
The first part of the postprocessing is designed to remove the remaining interference lines, typically characterized as relatively fixed in height and persistent for prolonged periods of time. Given the persistent nature of these lines, only range gates contain-455 ing valid measurements in more than 20% of the time span of the file are considered for the analysis. By valid measurements we denote the coordinates (t, n) where the radar variables computed by the processing have a not-NaN value and EXC(t, n) = 0.
For each set of coordinates (t j , n k ), the algorithm starts by defining a window of 40 time steps around t j at n = n k . This window is centered around t j , with the only exception of the first and last 20 time steps in the file, where the selection becomes asymmetrical to compensate for the borders of the matrix. We also define a set of Gaussian weights spanning the same time 460 interval, with the maximum of the curve coinciding with t j . The standard deviation of the Gaussian curve is set equal to 1/8 of the window size.
If at least 20% of the window contains valid measurements, we count the valid measurements in an interval of 40 range gates around n k , at t = t j . Similarly to the previous case, the selection of range gates is symmetrical in all cases, except the ones close to the bottom and top of the profile. A minimum threshold of 2 is imposed on the ratio between the number of valid 465 measurements in the two windows along t and n.
If the condition on the ratio is satisfied, the weights are summed to a temporary matrix, covering the same 40 time steps around t j at n = n k . After repeating the procedure for all candidate coordinates, the resulting matrix will contain higher values at those locations where signal persists for a long time and occupies only few range gates. So, we set EXC(t, n) = 1 in all locations where the value of the matrix is above 20.

470
All numerical parameters have been chosen according to their performances in our datasets. Therefore, the user may be required to change their value in the configuration file, if the setup of the MRR-PRO undergoing the postprocessing differs from the one listed in Table 1.

Leftover noise removal
The second part of the postprocessing focuses on the sporadic small scale noise, sometimes visible in the lowest range gates 475 in clear-sky conditions. As for the previous subsection, with valid measurements we indicate the coordinates (t, n) where the radar variables computed by the processing have a not-NaN value and EXC(t, n) = 0 (the latter retains the filtering from the previous step).
In this case, the postprocessing treats the matrix of valid measurements like an image, where each couple (t, n) represents a pixel. Using the multidimensional image processing library of Scipy (Virtanen et al., 2020), we detect all contiguous regions 480 of valid measurements (diagonal connectivity excluded) and we count the pixels inside it. If this number is lower than 4, we set EXC(t, n) = 1 for all coordinates (t, n) inside the region.

Validation
To test the validity of the proposed method, we designed a verification phase divided in three stages. The first one ensures that the processing and postprocessing products do not diverge excessively from the original ones (for the common detected signal) 485 while trying to quantify the improvements in terms of sensitivity. The second one aims to provide an independent validation, testing the consistency with the variables collected by a second radar. The last one, instead, focuses on a specific phase of the processing: the spectrum reconstruction. This part of the algorithm is arguably the one that diverges the most from libraries presented in other studies for the processing of similar radars, such as the MRR-2. Therefore, we deemed necessary a dedicated verification, to ensure that the reconstruction indeed improves the quality of the products.

Comparison with initial MRR-PRO products
The comparison between the ERUO products and the original variables, extracted directly from the MRR-PRO data files, is performed over the four datasets presented in Section 2. Given the issue in the transfer function experienced by the MRR-PRO 23 during the two campaigns, particular care will be taken in comparing its results with the ones derived from the other two datasets.

Sensitivity curves
To highlight the improvement in sensitivity, we computed the 2-dimensional histogram of attenuated equivalent reflectivity factor and range gate number. For each of the four datasets, three histograms have been produced: the first one using Z ea from the original MRR-PRO files, the second and third ones using the ERUO processing and postprocessing output, respectively.
We extracted the minimum value and the quantile 0.01 from the empirical distribution of Z ea at each range gate. These 500 quantities are interpreted as approximate indicators of the minimum detectable signal at each height. The difference between the set of statistics computed from the original variables and the ones derived from the processed (or postprocessed) ERUO products provides us with an estimate of the improvements in sensitivity that our algorithm is able to deliver.
As a summary of the typical behavior observable in such histograms, we decided to display in Figure 5  The effects of interference lines on a dataset is clearly visible in both panels, where small, isolated clusters of bins with relatively high counts can be seen starting from an height of around 3 km. In panel b, most of the effect of interference is not visible anymore, except for the strongest one, at approximately 4.5 km, and a smaller one 1 km below. Additionally, the count 510 associated to their bins is significantly lower than the one seen in panel a.
In panel d the interference clusters are accompanied by a large amount of noise, characterized by relatively low counts and reflectivity spanning a small interval close to the minimum detectable values. Even though the position of the interference is similar between the two datasets, their appearance in the spectra differ significantly (see subsection 4.3), and ERUO is able to produce a cleaner result for the MRR-PRO 22 dataset.

515
In each of the mentioned four panels, two sets of dots are visible: the blue ones denote the minimum of the empirical distribution of Z ea at each range gate, while the green ones show the quantile 0.01. Panels c and f display the difference between each set of statistics for the two campaigns, with a positive value indicating that the ERUO products reach a lower Z ea at that specific range gate. We can observe how the ERUO processing improves the sensitivity of more than 10 dBZ in the lowest part of the profile, up to approximately 2 km above the first gate. Above that, the improvement is gradually 520 reduced, reaching 0 dBZ at an height between 2.1 and 2.5 km, depending on the MRR-PRO model and the statistic considered.
Both datasets have been collected at PEA, where precipitation events were relatively faint and shallow, resulting in a lack of measurements at higher altitudes. The few isolated values above that height, visible in panel c, can be safely ignored, since they refer to the interference lines visible in both products. Lower values of the two statistics may actually indicate that those interference lines are fainter in the ERUO postprocessed outputs. 525 The same set of statistics computed over all datasets are displayed in Figure 6, with the results for the ERUO processing output in the top row of panels and postprocessing ones on the bottom row. Note that the transfer function reconstruction radically affects the MRR-PRO 23 measurements, and a direct comparison of its sensitivity with the ones of the other two MRR-PRO is not possible.
At first, the sensitivity gain after the processing appears to be the largest for all datasets. However, we suspect that most Metek, probably linked with the low raw spectra values typical of this region of the spectrum (see panel a of Figure 2). time-series reveals that noise is often recorded in clear sky situations. By analyzing the ICEGENESIS dataset, during which we have reliable co-located observations of precipitation, we noticed that the noise is more noticeable after snowfall events, with the antenna still covered in snow due to the absence of heating. The same may hold true for the MRR-PRO 23 dataset at PEA, even though we cannot confirm it since we lack independent precipitation measurements at the site. More significant differences can be seen for the spectral width, in panel c. We suspect that it may be due to differences in signal detection: the higher sensitivity of ERUO may result in a lower noise level, which may cause the peak associated with the detected signal to appear wider, and therefore have a larger SW .

Comparison with WProf
Given the presence of WProf alongside the MRR-PRO during the ICEGENESIS field campaign, we have access to an independent dataset for the verification of the ERUO products. Unfortunately, the MRR-PRO deployed at the site was affected by the issue in the transfer function previously described in subsection 3.2.1. Any improvement in the agreement with WProf 600 will be partially due to its reconstruction. Therefore, this section is not intended as an estimate of the improvements of the ERUO products compared to the original MRR-PRO variables, but as an independent verification of the validity of the signal recovered by the proposed method.
Since the two radars record data at a different temporal and vertical resolution, we decided to remap the measures of WProf on the MRR-PRO resolutions, which are the coarsest. An example of the remapped data, alongside the two sets of MRR-PRO 605 measurements, can be seen in Figure 9. In panel c we can immediately spot the interference line typical of the ICEGENESIS dataset, just above the 1 km height mark, of which we can only see some traces in the ERUO products of panel b. The difference of sensitivity between the two panels is also evident, especially at higher altitude. A third issue affects the profiles of panel c: Z ea increases with height much faster than the WProf measurements of panel a. This behavior is particularly evident after 15:45, when the MRR-PRO measures an increase of reflectivity with height above 1.5 km, while WProf sees a decrease. A 610 discussion of this effect and its possible causes is presented in Appendix-A.
Before extending the comparison of Z ea to the whole ICEGENESIS campaign, we need to define clearly what measurements are suitable for it. Given the difference in frequency, we can expect that measurements in rainfall would be particularly attenuated for WProf, therefore we decided to manually exclude the only rain event recorded in the period considered from the comparison. Additionally, large hydrometeors can be problematic, since we could have cases in which the measurements 615 at W-band are dominated by Mie scattering, while the Rayleigh scattering is still valid at K-band. This would artificially increase in the difference between the Z ea measured at the two bands, spoiling the comparison. Given the presence of MXPol at few kilometers of distance from the site, we could use the results from both the hydrometeor classification (Besic et al., 2016) and the demixing algorithm (Besic et al., 2018) to extract profiles directly above the MRR-PRO and WProf site, giving us information on both the dominant hydrometeor type and the proportion of each hydrometeor class. After remapping them 620 on the same MRR-PRO temporal and spatial resolution mentioned before, we decided to use this information to select which radar volumes are suitable for the comparison. The first rule we defined is the most relaxed one: only volumes dominated by crystals, according to the classification results of Besic et al. (2016), will be used for the comparison between the MRR-PRO and WProf. Crystals are among the most common hydrometeor type in our dataset, and they are the smallest in size, therefore the most unlikely to cause an increase in the difference between the two bands. Naturally, we can expect some crystals to be 625 particularly large and still become Mie scatterers at W-band, but this would still be less common than with other hydrometeor classes with larger diameters. A second comparison is then performed on a stricter set of rule: in addition to enforcing a majority of crystals in the volumes, we also required that less than 20% of the hydrometeors in it are aggregates, using the percentages provided by Besic et al. (2018). Aggregates are also common in our measurements and they can reach sizes considerably larger than crystals. As can be seen in Kneifel et al. (2015), the difference between the logarithmic difference of 630 reflectivity values between X-band and W-band deviates the most from linearity when considering aggregates. This deviation is caused by aggregates becoming Mie scatterers at W-band, a fact that would hold true also in our comparison, even though our comparison involves K-band instead of X-band.
Two 2-dimensional histograms obtained using this second set of rules are shown in Figure 10. Z W ea (t, n) is compared with Z ea (t, n) (panel b) and with the ERUO postprocessed products (panel d). By comparing the two we can immediately spot curve, parallel to the identity line. However, the spread remains large, with ERUO recording a particularly high number of reflectively values lower than their WProf counterpart. One possible explanation is the noise in the MRR-PRO measurements, in particular during precipitation events too faint to be recorded by the latter and strong enough to be detected by WProf. A second contributing factor could be the artificial lowering of Z ea at the lowest n. The phenomenon is more marked in the original MRR-PRO data, but it is not completely solved by ERUO. The stronger attenuation at W-band compared to k-band 645 can also be a contributing factor.
Similarly to the comparison presented in the previous subsection, we decided to summarize the results of the comparison for each variable in Figure 11. ERUO appears to perform better than the original products in all panels. However, as explained at the beginning of this subsection, a significant part of the improvement for Z ea can be attributed to the transfer function reconstruction. Additionally, in the case of panel a, the exact value of the median difference loses importance when we are 650 unsure on the contribution of calibration differences on this offset. However, the improvements in interquartile range and correlation can, at least partially, be explained by the reduction of the contribution of interference lines. Intuitively, the latter appear in the 2-dimensional histogram as a cluster of high Z ea for the MRR-PRO and low Z ea for WProf, which enlarges the spread and worsens the overall correlation.
On the contrary, differences in Doppler velocity cannot be linked as directly to the transfer function, since the variable 655 depends mostly on the position of the recorded signal in the velocity range. Panel b shows that the ERUO products have a significantly smaller median difference from V W (t, n). This is obviously linked with the issue with the Doppler velocity in the original MRR-PRO files previously mentioned. We observe a decrease of the interquartile range of the difference with WProf also for V , probably caused by the antialiasing.
Overall, even though a direct comparison of the original and ERUO products for this dataset is not completely fair, the 660 improvement in the series of statistics displayed in Figure 11 still confirms that ERUO is recovering valid measurements.
As hinted by Figure 9, many measurements that were below the sensitivity in the original products are now present in the ERUO ones. The improved agreement with WProf suggests that those newly visible signals are indeed valid precipitation measurements and they brings valid information, previously unavailable, to the dataset. 665 The spectrum reconstruction plays a crucial role in creating a clean output, eliminating the strongest interference before they can affect any of the variables. In all four datasets, interferences appear as spurious peaks in raw spectrum anomaly (A(i, n)).

Performance of the spectrum reconstruction
These peaks cover any return of lower intensity and they are masked by stronger signals. Since we cannot know what the real meteorological spectrum would have looked like without the interference, we cannot directly test the performances of our reconstruction.

670
However, we can extract some of these spurious peaks in A(i, n) and apply them to unaffected regions of the spectra from our datasets. Then, we can apply the spectrum reconstruction to these modified spectra, process them with ERUO and compare outputs with the respective unaffected ones. If we process the modified spectra without first reconstructing them, we can also obtain a baseline that allows us to estimate the improvement brought by the reconstruction. among the four datasets. We used the previously defined IM(i, n) to define the extent of each anomaly. Four of them can be seen in Figure 12, in which each panel shows the typical shape of the spurious peaks in A(i, n) for its respective dataset.
Their exact position changes during the time span of the campaign, so the region extracted is usually larger than the main anomaly peak. These four interference types have been chosen to represent as much as possible the variety that we observed. Therefore, even though the MRR-PRO used for the ICEGENESIS campaigns is the same MRR-PRO 23 deployed at PEA, 680 different interference lines have been chosen to represent the two datasets. Additionally, since the anomaly in panel a is not symmetrical with respect to the vertical axis, we flipped half of them. As a consequence, in the flipped anomalies the rightmost peak becomes closer to lower velocity values, which are more likely to contain precipitation.
Since the extracted interferences are stored in a i = 32 × n = 32 grid, we had to identify a region of similar size in each An example of how the extracted anomalies are applied to a spectrum and of the subsequent reconstruction is shown in Figure 13. Since the reconstruction acts on anomalies (A(i, n)) and not directly on the raw spectrum, we decided to display the 690 former in the example figure. In panel a we highlighted the region to which the interference will be applied by delimiting its border with two dashed lines. As expected, the meteorological signal in the region seems unaffected by any noticeable spurious peak. In panel b, one of the anomaly matrices extracted from the ICEGENESIS dataset has been applied to the profile. At each sets of coordinates (i, n), the resulting value has been chosen as the maximum between the original A(i, n) and the extracted interference anomaly. As a result, it is possible to see both the meteorological signal and two interference lines of different 695 intensity. The spectrum reconstruction takes the resulting matrix as input, and masks only a fraction of the entries, as shown in panel c, following the rules detailed in subsection 3.2.2. Finally, the reconstructed anomaly is show in the last panel. This output undergoes the standard processing, from which we derive Z ea , V and SW . The whole operation has been repeated for 120 profiles, divided equally among the three datasets collected at PEA.
The variables derived from the reconstructed and altered profiles are compared with the ones computed from the original, 700 unaltered raw spectra. The result of the comparison are shown in Figure 14. The criteria used for the comparison are: the mean absolute error (MEA) in the first row of panels, the root mean square error (RMSE) in the second, and the Pearson correlation coefficient (PCC) in the third. All comparison criteria are computed only over the range gates covered by newly added anomalies. Since the reconstruction often only affects a small fraction of these gates, the median difference could not be used to estimate the offset (or error, in this case) as done in the previous subsections, since its value was almost always null. 705 The value of the three statistics changes drastically between the three MRR-PRO. Given the different central height at which the anomalies have been applied, the difference may be due to the relative scarcity of precipitation data at higher altitudes. To compute each of the three statistics, we need that a signal is detected, at least for a fraction of the range gates considered, in all three profiles: the reconstructed one, the modified one and the unaltered one. Therefore, the comparison shown in the plot only takes in account precipitation data.
It should be noted that the overlapping of meteorological signal and interference is the most complex situation to reconstruct for ERUO, therefore the skills discussed here represent a "worst case scenario". As exemplified by Figure 9, the traces of the original position of the interference line in the ERUO products can often be seen only over precipitation measurements, and not in the clear sky sections.
In all cases the reconstruction reduces the error on the radar variables, while improving their correlation with the ones 715 derived from the unaltered profiles. However, both MAE and RMSE indicate that the variables differ considerably from their ideal value. By observing simple example of reconstruction, we could identify two critical situations that are not handled correctly by ERUO. The first one is the overlapping of precipitation signal with interference in the shape shown in panel a of Figure 12. As described in subsection 3.2.2, the algorithm checks whether the interference is an isolated peak, or a line spanning the whole range i = 1, .., m. When one of the two peaks shown in panel a of Figure 12 intersect the precipitation signal, it falls 720 in neither of the two categories, and therefore it is not reconstructed. The second case is caused by weak interference, such as the one in panel b of Figure 12, or the bottom line in the anomaly displayed in panel d. As can be seen in panel c of Figure 13, sometimes the anomaly is below the threshold used for masking it, and therefore is never flagged for reconstruction.
This behavior can be controlled by the user by setting lower thresholds in the configuration file. However, this may lead to an increase in false positive when flagging the pixels to reconstruct.

Summary and conclusions
In this study we presented the ERUO library, a new processing technique for the MRR-PRO raw spectra, developed specifically for measurements in snowfall. The method has the two-fold objective of minimizing the effect of a series of issues affecting the raw spectra, such as interference lines, and improving the quality of the output radar variables. The algorithm is divided in three stages, each with its own set of configurable parameters that can be tuned by the user to better fit their measurements. 730 The first step, the preprocessing, uses information accumulated over an extended period of time to extract some quantities used in the correction of the raw spectra. In particular, the quantity BM(i, n), which compensates for the power drop at the extremes of the Doppler velocity range, is computed at this stage. Previous scientific work used interpolation to solve an analogous issue with the MRR-2 spectra, but this could lead to underestimation of the signal is case of aliasing (Maahn and Kollias, 2012). The ERUO preprocessing, instead, is able to make use of the measurements of the instrument by simply 735 adding back an offset purely derived from the dataset itself. On the other hand, the preprocessing contains the first of the major limitations of the ERUO library. Given the need to load all available data at once to extract the median spectra, large datasets (due to high frequency of acquisition or large temporal duration of the measurement campaign) may be problematic to process. While this particular problem can be circumvented as described in subsection 3.1.3, the proposed solution still requires a manual intervention of the user. 740 The next step, the processing, computes a series of variables using the preprocessing outputs and the raw spectra as starting point. A crucial point during this procedure is the spectrum reconstruction. The IM(i, n) matrix computed during the previous stage indicates region of the spectra likely to contain interference. A series of conditions aims to isolate only the location of the interference peak, and the meteorological signal that has been covered by it is estimated from the surrounding unaffected spectra. In the verification stage, particular attention has been dedicated to this reconstruction. We tested its performances 745 by applying real interference anomalies from all our datasets to unaffected regions of the spectra collected by three MRR-PRO. The reconstruction results in variables closer to the ones obtained from the unaltered spectra, when compared to their not-reconstructed counterparts. Better skills are obtained for interference lines placed at higher n values, hinting that the reconstruction may have more difficulties when precipitation and interference coexist in the same profile. A visual inspection of the MRR-PRO 06 reconstructed products shows that the reconstruction fails when isolated peaks, like the ones shown A comparison of the Z ea and V measurements with a second radar, WProf, has also been performed. Only the ICEGENESIS dataset has been used in this case, since it is the only one in which the two radars were co-located. Unfortunately, MRR-PRO 23 was used for this campaign, and its measurements are affected by the transfer function issue. The analysis has been repeated 775 for two different sets of conditions, imposing first a condition on the sole dominant hydrometeor type (ice crystals) and then also on the maximum proportion of aggregates (20%). In both cases the ERUO products have shown a lower median and IQR of the difference between the two radars, and a higher Pearson correlation coefficient. The same velocity computation issue mentioned before also led to a significantly reduced median difference of the ERUO product from the WProf counterpart, when compared with the original MRR-PRO velocity. For Z ea , however, it is impossible to distinguish exactly the effect 780 of the transfer function reconstruction from any other improvements brought by the ERUO processing. The original truncated transfer function results in a reflectivity profiles that increases with height much faster than WProf, while the difference is much more stable when the reconstructed one is used. This implies that the results of the comparison can be used as independent verification of the validity of the ERUO products, but not as fair comparison between the latter and the original MRR-PRO variables. Overall, the agreement between the ERUO variables and the ones collected by WProf suggests that the former is 785 recovering valid measurements.
Finally, an achievement of the ERUO processing that may be hard to quantify is the enhancement in readability of the precipitation structure in time-height plots. ERUO is able to recover precipitation signals that were too faint to be recorded in the original data, especially at higher altitudes. As can be seen in Figure 9, the removal of the main interference line and the improvement in sensitivity result in a much clearer series of profile, in which the structure of the precipitation system can be 790 more easily be understood.
Code availability. The ERUO library is available for download at https://github.com/alfonso-ferrone/ERUO.git Appendix A: Height dependence of the offset in the ICEGENESIS dataset An inspection of the time-series of attenuated equivalent reflectivity factor recorded by the MRR-PRO and by WProf reveals some differences in the vertical trends of Z ea . While the original MRR-PRO variable increases much faster than its WProf 795 counterpart, the ERUO products appears to be more in agreement with the W-band radar. Therefore, we decided to investigate the Z ea comparison more closely. By studying the change of the median difference value with altitude, we were able to quantify the discrepancy in behavior between the two sets of MRR-PRO products. As can be seen in panel a of Figure A1, the difference between the WProf and original MRR-PRO attenuated reflectivity factor decreases steadily until the height of approximately 1 km, after which it stabilizes. The interquartile range of the difference also 800 experiences a decrease, even though its value is significantly smaller than the variation in the median value. On the contrary, the median difference for the ERUO products does not experience a significant drift with height, and the value appears to oscillate around 2 dBZ for the whole profile. A less marked tapering of the IQR can also be seen in the upper range gates. We suspect that the IQR remains larger than for the original MRR-PRO case because a higher amount of measurements are recovered by ERUO at those heights. 805 The discrepancy between the two sets of products can be explained by looking at the transfer function before and after the reconstruction, described in subsection 3.2.1 and illustrated by Figure 3. Since the original TF(n), used by the MRR-PRO to computed the variables available in the initial data files, is usually lower than the reconstructed one for high n, the original spectral reflectivity is necessarily higher than the one derived by ERUO. This results in an artificial increase of Z ea with height for the original products, which is absent in ERUO thanks to the reconstruction. 810 The Pearson correlation coefficient, in panel b, shows a better agreement between the ERUO and original products. Both experience a decrease with height, followed by a small rise in value for the last 500 m of the profile. Only minor differences exist between the original Z ea values and the ERUO ones, with the latter slightly outperforming the former in the range 1 km to 2 km. Overall, the transfer function issue has only a limited impact on correlation, when smaller section of the profiles are analyzed independently from each other.