A Gaussian Mixture Method for Specific Differential Phase Retrieval at X-band Frequency

Specific differential phase Kdp is one of the most important polarimetric radar variables, but the variance σ(Kdp), regarding the errors in the calculation of the range derivative of differential phase shift φdp, is not well characterized due to the lack of a data generation model. This paper presents a probabilistic method based on Gaussian mixture model for Kdp estimation at X-band frequency. The Gaussian mixture method can not only estimate the expected values of Kdp by differentiating the expected values of φdp, but also obtain σ(Kdp) from the product of the square of the first derivative of Kdp and the 5 variance of φdp. Additionally, ambiguous φdp and backscattering differential phase shift are corrected via the mixture model. The method is qualitatively evaluated with a convective event of a bow echo observed by the X-band dual-polarization radar in the University of Missouri. It is concluded that Kdp estimates are highly consistent with the gradients of φdp in the leading edge of the bow echo, and large σ(Kdp) occurs with high variation of Kdp. Furthermore, the performance is quantitatively assessed by three-year radar-gauge data, and the results are compared to linear regression model. It is clear thatKdp-based rain 10 amounts have good agreement with the rain gauge data, while the Gaussian mixture method gives improvements over linear regression model, particularly for far ranges.


Introduction
Apart from radar reflectivity (Z H ) and differential reflectivity (Z DR ), polarimetric radars also measure differential phase shift 15 (φ dp ) to reflect the forward scattering property of hydrometeor scatterers (Seliga and Bringi, 1978;Sachidananda and Zrnić, 1986). Its range derivative, also called specific differential phase (K dp ), has some advantages over Z H and Z DR (Zrnić and Ryzhkov, 1996), including insensitivity to attenuation, clutter, partial beam blockage and radar absolute calibration. The specific differential phase has played a key role in various meteorological applications-such as hydrometeor classification (Lim et al., 2005;Park et al., 2009), raindrop size distribution retrieval (Bringi et al., 2002;Williams et al., 2014) and quantitative 20 precipitation estimation (Ryzhkov et al., 2005;Cifelli et al., 2011)-since K dp is a phase variable independent of Z H and Z DR and almost linearly proportional to rain rate. σ 2 (φ dp ) (section 4.2). Next, we correct the ambiguous φ dp and δ co via the mixture model (section 4.3). Last, we calculate the expected value and variance of K dp (section 4.4), and improve the K dp profile by reducing σ 2 (K dp ) (section 4.5). To evaluate the algorithm, section 5 gives a case study and a comparison between radar and gauge. Section 6 summarizes the paper.

Background
The specific differential phase is the first derivative of differential phase shift φ dp along the radar range, giving a way to estimate 5 K dp by radar measurement of φ dp . Furthermore, the probability density function of φ dp can be modelled as a Gaussian mixture, which is often obtained via an expectation-maximization (EM) approach. Therefore, the mean and variance of the Gaussian mixture may lead to the improvement of the K dp estimation.
In this section, we introduce the physical interpretation of K dp and the regression model for estimating K dp . Since the Gaussian mixture is adopted as the data generation model, we also give a brief description of mathematical definition of the 10 Gaussian mixture model and the EM approach.
2.1 Specific differential phase (K dp ) For linear polarization, K dp is proportional to the integral of the raindrop size distribution and the real part of the difference of forward scattering amplitudes at orthogonal polarizations. It is mathematically formulated as K dp = 0.18λ  15 where λ is radar wavelength in millimeters, D is raindrop size in millimeters, N (D) is size spectrum in m -3 mm -1 , f hh,vv (0, D) is forward scattering amplitudes at horizontal and vertical polarizations, respectively.
By considering the Rayleigh-Gans scattering from identical and horizontally-oriented oblate spheroids, such as raindrops, the forward scattering amplitudes are proportional to the inverse square of radar wavelength, i.e., f hh,vv (0, D) ∝ 1/λ 2 , leading to the fact that K dp is inversely proportional to radar wavelength, i.e., K dp ∝ 1/λ. Therefore, the values of K dp at X-band 20 are often larger than that at S-band by a factor of 3, indicating that X-band radar can provide better K dp data than S-band radar when retrieving the rainfall rate. The conclusion is still valid even if the Mie effect is taken into account (Bringi and Chandrasekar, 2001;Chandrasekar et al., 2006).
However, K dp cannot be detected by polarimetric radar directly, whereas its integral φ dp is measurable. Hence, K dp can be estimated as the range derivative of the profile of φ dp , i.e., K dp = ∆φ dp 2∆r ,where r is the radar range in kilometers. An alternative 25 approach to estimating K dp is to apply a regression fit to the profile of φ dp , and the first order polynomial is usually considered as the fitting function (Balakrishnan and Zrnić, 1990;Ryzhkov and Zrnić, 1995). Subsequently, if the φ dp measurements are equally spaced in range by ∆r, K dp is then estimated by K dp = n i=1 φ dp (r i ) −φ dp [6i − 3(n + 1)] n(n − 1)(n + 1)∆r , where n is the number of gates, andφ dp is the mean value of φ dp within the n radar gates. Equation (2) shows that the accuracy of K dp estimates is determined by the number of gates (n), and the accuracy of φ dp . By assuming σ 2 (φ dp ) is relatively stable for all gates along a ray and noting that φ dp (r i ) is the only variable in Eq.
In Eq. (3), σ 2 (K dp ) is proportional to σ 2 (φ dp ), which is related to the spectrum width, cross-correlation coefficient, and the 5 dwell time (Sachidananda and Zrnić, 1986;Hubbert et al., 1993), and inversely proportional to n 3 . Moreover, it is notable that the backscattering phase shift is not negligible at X-band, thus the total propagation phase shift φ dp consists of φ dp and the backscattering differential phase, δ co , i.e.,φ dp = φ dp + δ co . The backscattering phase shift is often showed as a sudden jump over one or few range gates in a monotonically increasingφ dp profile of rain (May et al., 1999), with a value much larger than standard deviation σ (φ dp ). The presence of δ co over a small number of consecutive gates can be eliminated by a simple filter 10 (Hubbert and Bringi, 1995).
The specific differential phase is a unique polarimetric variable in terms of statistical errors in the rain rate estimation, since it is the range derivative of the phase measurement φ dp . The errors in the calculation of the first derivative also needs to be taken into account. In this study, we consider a Gaussian mixture as the data generation model, which plays an important role in the estimation of K dp and σ 2 (K dp ).

Gaussian mixture model
The Gaussian mixture is a statistical model for data probability density estimation, assuming that the data points are generated by a mixture of a finite number of Gaussian distributions associated with their weights (McLachlan and Peel, 2000;Sung, 2004). It is expressed as 20 where m is the number of components in the Gaussian mixture, w i is a weight with It is prevalent to use an Expectation-Maximization (EM) algorithm to estimate the parameters, w, µ, and Σ, by constructing the lower bound of log-likelihood based on Jensen inequality (Dempster et al., 1977). The EM algorithm is divided into two 25 steps, namely, an expectation (E) step and a maximization (M) step. In the E step, a degree of membership toward to the jth cluster is calculated, i.e., where i is the ith data with a total number of n data points, and y is a latent variable that determines the corresponding cluster.
Here, Q gives a tight lower bound for the log-likelihood, equivalent to maximizing the expectation.
By maximizing the lower bound with respect to each parameter, w j , µ j , and Σ j are updated as (Petersen and Pedersen, 2012) Notably, the M step increases the log-likelihood monotonically, and the covariance retains positive definite with sufficiently large data samples. Finally, the E step and M step are iteratively operated until the log-likelihood converges to a value with the difference between two successive steps below a certain threshold. In addition, the EM algorithm requires a specification of the number of clusters, m, prior to the E and M steps, and an inappropriate choice of m may lead to meaningless values of 10 the parameters. To tackle this problem, the Bayesian information criterion is often calculated to select the optimal m, while a Dirichlet process may also be used to model a prior probability to construct an infinite Gaussian mixture.
One of interpretations of the Gaussian mixture is to view each distribution as a cluster with a Gaussian probability density, while the individual data point is attributed to a specific cluster or a weight toward the cluster, regarded as unsupervised learning (Hastie et al., 2009). The clustering procedures based on Gaussian mixture have been applied to the identification of 15 storm structure (Veneziano and Villani, 1996), and the particle identification at S-band (Wen et al., 2015(Wen et al., , 2016b(Wen et al., , 2017) and X-band (Wen et al., 2016a) frequencies. Furthermore, the Gaussian mixture model can be extended to fit a set of unknown parameters in the prior probability of the Bayesian framework, forming a Bayesian Gaussian mixture model (Li et al., 2012).
The prior is then multiplied with the known conditional probability of data given the parameters to be estimated, yielding the posterior probability with a new set of parameters. The expectation of the posterior is often used to retrieve the conditional 20 mean of the new parameters based on least square criteria.
For the regression problem, the characteristics of the Gaussian mixture imply that the direct modeling of a regression function is very difficult. Nevertheless, the joint probability of the measurements and the estimated parameters may be modeled as a Gaussian mixture, leading to a regression function derived from the joint density model. Due to the asymptotic consistency of a Gaussian mixture model, it is capable of estimating a general density function in R n in any shape (Sung, 2004). Moreover, 25 the speed of calculating unknown parameters within a Gaussian mixture linearly depends on the number of the training data points, and the computation of the outputs is independent of the size of the training data. Consequently, regression based on a Gaussian mixture can be achieved very rapidly, compared to Gaussian process regression that grows with the data size. In addition, the Gaussian mixture can also be used to solve the regression problem with multiple dimensions, and a subset of dimensions can be selected to handle the missing data (Wen et al., 2015). The details of the radar characteristics are described in Simpson and Fox (2017).
The primary objective is to provide the observations of precipitation near the surface by means of low-cost and fine-scale 5 X-band radar, and to fill the observational gaps of the S-band radar network in Saint Louis (KLSX), Kansas City (KEAX), and Springfield (KSGF). Within the MZZU radar coverage, the Hinkson Creek located near Columbia, MO, flows through a catchment basin and eventually merges into the Missouri river, forming a typical urban watershed (Hubbart and Zell, 2013).
The radar can provide timely flash flooding warning for the Hinkson Creek watershed and surrounding areas.
In this study, we analyze the data collected by the X-band MZZU dual-polarization radar. The maximum detectable range of The raw radar data are organized and processed by an open-source software package called Python ARM Radar Toolkit (Py-ART: Helmus and Collis, 2016). Moreover, to validate the K dp estimation algorithm, we also use the data from tipping-bucket Bradford and Sanborn may be affected by ground clutter, since the radar beams are very close to the ground, with heights of 314.6 m and 336.9 m, respectively, and therefore the second elevation at 2 • is selected for validation. In contrast, the first elevations at Auxvasse and Williamsburg reach about 723.8 m and 999.0 m, which are less contaminated by ground clutter. 20 Furthermore, the point measurement of rain gauge is different from the volumetric measurement of radar, imposing additional errors on the comparison between radar and gauge (Anagnostou et al., 1999). The radar-based rain rate is then derived by averaging K dp over three successive range gates and three successive azimuthal rays with a total of 9 values centered over each gate in order to obtain good consistency between the instruments. In addition, the rain gauges are carefully calibrated in terms of instrumentation failure, clogging, and other discrepancies between the devices (Simpson and Fox, 2017), and well 25 documented to provide long-term data for rainfall observations. and Auxvasse. Moreover, Sanborn gives the highest hourly rain amount, the lowest total rain amount, and the second lowest duration out of the four sites, due to the effects of urban heat island (Hubbart et al., 2014). The second highest maximum hourly rain amount is recorded at Williamsburg, however, the total rain amount and duration are also the highest among the four sites, implying that convective rain is the most frequent at Williamsburg. In contrast, stratiform rain is more common at Bradford, since the gauge records the lowest maximum hourly rain amount and duration, and the second total hourly rain amount. In addition, it can be seen that Auxvasse also provides useful data for the comparisons between gauges and between radar and gauge, though the statistics are all ranked in the middle of the four sites. Overall, the rain gauge data at Bradford, Sanborn, Auxvasse and Williamsburg are representative and sufficiently large, leading to a valid dataset for testing the K dp and K dp -based rain amounts. 5 4 K dp estimation As discussed in section 2, the joint probability density function (PDF) based on a Gaussian mixture can be used to derive the regression model for K dp estimation. The Gaussian mixture method (GMM) not only estimates the expected values of K dp by differentiating the conditional expectation of φ dp , but also gives an estimation of K dp variance by regarding the errors in the calculation of the first derivative of φ dp . In this section, we describe GMM for the K dp estimation using MZZU radar data.
10 Figure 1 illustrates the flowchart of GMM (Fig. 1b), comparing to that of the linear regression model (LR; Fig. 1a).
From the chart of LR in Fig. 1a, we can see that after the radar measurements are collected, the φ dp is unfolded, and then the clutter is removed. After these corrections, an iterative filtering method is applied to the φ dp profile. An adaptive method is finally used to estimate the K dp profile according to the values of Z H . The Gaussian mixture model, on the other hand, processes φ dp differently. First of all, the clutter is masked out according to the thresholds of Z H and the variation of φ dp .

15
Secondly, the range measurements r and processed φ dp are fitted into a Gaussian mixture to yield the joint PDF, while a smoothed φ dp profile and the variances are obtained by taking the first raw and second central moments of the conditional PDF of φ dp given r. Thirdly, some specific clusters in the Gaussian mixture PDF are adjusted to solve the issues, such as ambiguous φ dp and backscattering differential phase shift δ co . Fourthly, a raw K dp profile is calculated from the first derivative of the expected values of φ dp , and the associated variances are obtained via a Taylor series expansion. Finally, the raw K dp profile is 20 smoothed, and consequently, the variances are reduced. In addition, new φ dp with lower variances can be re-constructed from the K dp estimates.

Data masking
The presence of clutter in the φ dp measurements may severely affect the K dp estimation, producing significantly large variations on the estimates. It is well known that the effect of clutter can be reduced by applying a spectrum filter to the time-series 25 data (e.g., May and Strauch, 1998;Hubbert et al., 2009). However, some residual clutter echoes are still shown on the radar measurements including φ dp (Wen et al., 2017). Therefore, the clutter needs to be well handled in GMM, prior to the deviation of the regression model based on the joint PDF.
In LR, the clutter is often eliminated by some criteria based on φ dp or ρ hv . For instance, we use the thresholds of local standard deviation of φ dp less than 10 • to classify valid points. Further, ten consecutive range gates of valid points signify 30 the beginning of a rain cell, and five consecutive gates of invalid points finish the associated rain cell. Overall, the thresholds give a fairly good performance on the MZZU radar, however, the clutter may be incorrectly identified in the regions of high reflectivity or for the clutter-contaminated weather echoes, which are often associated with large φ dp variation.
In contrast, GMM adopts sophisticated procedures, as depicted in Figure 2. It is clear that there are five stages in the data masking, beginning with the input of raw φ dp and ending with masked data. At the first stage, the raw data are fitted to a Gaussian mixture initialized by the k-means clustering, while the covariance is set to be diagonal for simplicity. The clusters 5 with no more than 5 points are promptly masked out, before they pass to the second stage. Stages two, three and four of the process all involve the clusters. At the second stage, the clusters are validated according to two sets of thresholds with respect to mean reflectivity. For the MZZU radar, the ratio of the standard deviations, σ(φ dp )/σ(r), less than 14.2 • km −1 , and σ(φ dp ) less than 4.1 • are used for Z less than 41 dBZ. To reduce the effect of hail contamination, the thresholds are increased for higher Z, resulting in σ(φ dp )/σ(r) < 47.9 • km −1 and σ(φ dp ) < 6.3 • . Next, the entire φ dp profile is divided into multiple rain 10 cell segments by considering the gaps between two consecutive clusters. Similar to the first stage, the segments containing no more than 5 points are excluded from the output of masked data. Following this, the dominant is determined for each segment by comparing the the weight accumulations of weather and clutter clusters. For a clutter segment with mean height below 200 m, the clusters within the segment are re-evaluated by thresholds of σ(φ dp )/σ(r) < 2.0 • km −1 and σ(φ dp ) < 0.8 • , on the other hand, the clusters in a weather segment are re-examined using σ(φ dp )/σ(r) < 34.7 • km −1 and σ(φ dp ) < 6.1 • . This step can 15 efficiently identify the clutter-contaminated weather echoes, which are often associated with large variances. At the last stage, some isolated points along the azimuth are obscured in the final results. Figure 3 illustrates two examples for data masking, including a convective case (Fig. 3a) and a stratiform case (Fig. 3b).
The data points in the two cases show steadily increasing trends related to anisotropic media along the wave propagation path.
However, between 1.3 and 15 km at an azimuth of 252 • in the convective case (Fig. 3a), the data present significant fluctuations 20 with the minimum value at about a few hundredths π rad but the maximum value at π rad. Since the dynamic range of φ dp is from 0 to 180 • for the MZZU radar, the measurements near the ground are likely to be the clutter returns, verifying the results of data masking. After 15 km, the φ dp points start from about π/4 rad and go all the way up to π rad. Notwithstanding this trend, the points sharply decrease to a few hundredths π rad at about 40 km, indicating the occurrence of ambiguous φ dp .
The data masking can effectively detect the ambiguous φ dp , and provide valid masked data for deriving the joint PDF. On the 25 other hand, the weather echoes are more frequently observed at 1 • in azimuth in the stratiform case (Fig. 3b). By taking a closer inspection on the φ dp data, we can discern that the points largely fluctuate between 40 and 80 km due to low signal-tonoise ratio. In LR, these points may be incorrectly discarded based on σ(φ dp ) thresholds, leading to some missing data in the stratiform regions. In contrast, the data masking accurately identifies weather echoes characterized by a number of verticallyoriented density ellipses. The continuous and uniformly-distributed regimes are consistent with the physical interpretation of 30 stratiform precipitation. In addition, the data masking is also sensitive to sudden jumps at the beginning of the φ dp data, which may be caused by δ co . 4.2 φ dp density estimation In the previous section, it is shown that the φ dp profile varies along the range r. It rises quickly for horizontally-oriented anisotropic scatterers, and conversely, it falls steadily for vertically-oriented particles. To estimate the relationship between r and φ dp , we consider r as an independent variable, denoted as x, and φ dp as a dependent variable, denoted as y. If the minimization of mean square error is required, the regression function is obtained by taking the average value of y at fixed x, 5 equivalent to estimating the expected values of y conditioned on x, i.e., where β is a set of unknown variables, for example, β = (m, w, µ, Σ) for the mixture model. Since the Gaussian mixture can be used to model any shapes of probability density with a rapid speed, the (x,y) points are then assumed to follow a joint PDF of Gaussian mixture, as defined in Eq. (4). Moreover, the properties of the multivariate Gaussian distribution in each cluster 10 determine the Gaussianity of the marginal distribution of either variable and the conditional distribution of one variable given the other (Bishop, 2006). Therefore, the conditional PDF of y given x is expressed as where  are obtained by the EM algorithm. In Eq. (14), f (x) is the marginal PDF of x with the parameters identical to the mixture, and f i (x) is the weighted marginal PDF of each cluster, i.e., . By substituting Eq. (11) into Eq. (10) and noting the linearity of the mathematical expectation, the expected value of y conditioned on x is then obtained as and the conditional variance is given as (see Appendix A)

25
Equations (15) and (18) play an important role in the joint PDF-based regression analysis, called the regression and skedastic functions (Spanos, 1999). In Eq. (15), it can be seen that the regression function in GMM consists of multiple linear kernels, is not determined by the local structure but the marginal PDF of global data x. Comparing to LR, GMM is more flexible to capture the data information, while it still retains a finite set of parameters. Moreover, Eq. (18) readily estimates the point-wise variances σ 2 (y|x) that characterize the random errors in the measurements, whereas these errors σ(φ dp ) are often considered as constant in LR. Figure 4 compares the φ dp profiles given by Eqs. (15) and (18) with that obtained by LR. Figure 4.a gives the same example 5 as Fig. 3.a, but the EM algorithm is configured differently. In the φ dp density estimation, the mixture with full covariance yields density ellipses of random shapes. Furthermore, the algorithm repeats the fitting procedures three times to avoid the local maxima of the log-likelihood. Meanwhile, the choice of the cluster number relies on the Bayesian information criterion calculated for each m, starting at 10 clusters. It can be seen that the mixture composed by density ellipses well characterizes the data points, since the root-mean-square error is small relative to the expected values. Between 15 and 35 km, the narrow 10 ellipses result in φ dp with a rising trend consistent with LR. On the other hand, the mixture has very small variances, giving a high confidence for the fitted parameters. From 35 km, the ellipses become wider, and the associated variances increase due to low signal-to-noise ratio at the edge of radar echoes. What is notable, however, is that the φ dp profile dramatically increases to a large value, whereas LR remains a relatively steady trend. It indicates the importance of the φ dp unfolding for the φ dp density estimation.
15 Figure 4.b presents another example of the density estimation. It is clear that the φ dp profiles produced by GMM and LR both rise considerably along the range, and the trends for the two methods are very similar with a strong correlation of 0.998.
The profile starts at about 1 rad, and remains relatively stable before rising dramatically between 35 and 55 km. By 65 km, φ dp has more than doubled, and then, there is a steady increase for φ dp reaching about 2.3 rad at the end of the profile, which is around 1.3 rad up on the ranges of 0 and 35 km, and 0.3 rad more than recorded at the ranges of 55 and 65 km. If we examine 20 φ dp measured at X-band frequency, we can see that some points fall out of the error bars corresponding to one standard error (i.e., 95% interval). Most notably, between 18 and 20 km, the φ dp profile shows a sudden slump, indicating the occurrence of backscattering differential phase shift. In conclusion, the expected value and the variance of φ dp can be obtained from the joint PDF, but the mixture needs to be tuned in terms of φ dp unfolding and δ co elimination.
4.3 φ dp unfolding and δ co elimination 25 According to the continuity and consistency of φ dp data, we can discern that some issues exist in the density estimation, such as ambiguous φ dp and δ co . Since φ dp is an range accumulative measurement of propagation phase, depending on the initial φ dp (0). The measurements may exceed the dynamic range of 0-180 • when the wave propagates through a rain medium. This situation is even more significant at X-band frequency than S-band due to the inverse relation of the wavelength and the rate of phase shift. Nevertheless, it can be noted that φ dp gives a non-negative trend along the range for rain, and therefore, the 30 ambiguous φ dp may be corrected accordingly (Wang and Chandrasekar, 2009).
In LR, φ dp is first averaged over a small window for weather data, and a linear fit is then performed to obtain the increment for the range gate next to the window. In the following stage, a reference is predicted by summing up the average and the increment, and compared to the observed value at the same gate. If the difference between the predicted and observed values is larger than 90 • , the observed φ dp is then increased by 180 • . Finally, the correction process is iteratively operated until the last gate.
On the other hand, the φ dp unfolding is more straightforward in GMM. After the initial stage of generating the mixture, the density ellipses corresponding to more than 6 data points are transformed into the next stage, which compares the means µ i of two consecutive density ellipses along the range. At this point, the mixture is ready to be corrected for ambiguous φ dp . The 5 mean of the latter density ellipse is finally added up 180 • , if the former mean is larger than the latter one by 80 • .
As illustrated in Fig. 4.a, the profile φ dp reaches π rad at about 38 km, and then becomes ambiguous between 38 and 42 km.
In LR, the φ dp values at these locations are interpolated according to the trend of the previous few gates, and the maximum value is π rad. In contrast, the corrected density ellipses in GMM show an upward trend between 38 and 42 km, while the φ dp profile reaches a maximum value of about 3.5 rad, indicating the effectiveness of the φ dp unfolding in the region of heavy rain. 10 In addition to ambiguous φ dp , the estimation of the joint PDF may also be affected by non-zero δ co , which occurs more frequently at X-band frequency than S-band due to Mie effects. δ co is shown as a sudden phase change over a small number of gates in a monotonically increasing trend for rain. According to this manifestation, the magnitude and gate number of the φ dp perturbation can be used to eliminate δ co (Matrosov et al., 2002;Otto and Russchenberg, 2011;Trömel et al., 2013).
The linear regression model often adopts an iterative filter technique, which generates a new φ dp profile from either the raw 15 data or the filtered one based on a threshold (Hubbert and Bringi, 1995). If the filtering alters the data by 4 • , the new profile selects the filtered data, otherwise the raw data are remained. The new profile is then used as input in the next iteration until the convergence condition is satisfied.
In GMM, the δ co elimination is embedded into the process of the φ dp unfolding. For two consecutive density ellipses, the latter density ellipse is removed if its mean is larger than the former one by 85 • . Prior to this step, the mean of the first density 20 ellipse in the mixture should be below 90 • to reduce the δ co effect at the first few gates. Since δ co occurs over a small number of range gates, a mixture pruning is also employed to remove the density ellipses with weights less than 0.0501, equivalent to 2% of the data.
It is clear from Fig. 4.b that δ co has occurred at multiple locations in the data. The φ dp profile starts at a high value and drops somewhat over the first two gates. Notably, there is a narrow gap between 18 and 20 km, which is non-zero δ co . These data are 25 characterized by a density ellipse with a slightly decreasing trend in GMM, and the resulting expected values are consistent with the filtered data in LR. Between 70 and 90 km, a few isolated points beyond the density ellipses are associated with δ co .
Both of the two methods can produce φ dp following the main trend of the data, which suggests that the process is effective for the δ co elimination.

K dp estimation 30
As discussed in section 2.1, K dp is the first derivative of φ dp with respect to the range r. According to the mean value and dominated convergence theorems, the derivative of the expected value of φ dp conditioned on r is equal to the expected value of the derivative of φ dp with respect to r, i.e., K dp (see Appendix B). Following the notation in section 4.2, we denote K dp as y . Therefore, the expected value of K dp is obtained by taking the derivative of Eq. (15), yielding The variance of y conditioned on x can be approximated by the first-order Taylor series expansion (see Appendix C), i.e., where σ 2 (y|x) is given in Eq. (18). By taking the derivative of Eq. (19), E (y|x) is expressed as From Eq. (B8) in Appendix B, it is clear that where g i (x) is the summation term. Subsequently, the second derivative of w y|x i is given as Equations (19) and (20) are the regression and skedastic functions for the K dp estimation. In Eq. (19), it is clear that the expected value of K dp can be divided into two components, including Eqs. (B7) and (B11). On one hand, Eq. (B7) is related to the changing rate a i weighted by the marginal distribution of each cluster in the mixture, equivalent to a linearly 15 weighted combination of small portions of data. If a data point is dominated by a specific cluster, i.e., the weight of a cluster is significantly larger than the others, K dp is determined by the coefficients of the cross-correlation and auto-correlation of r, and independent of the means and auto-correlation of φ dp , yielding a constant value within the dominated cluster. On the other hand, Eq. (B11) shows that the weighting function also contributes to the K dp estimates by considering the Gaussian derivative of the φ dp estimates in two or three adjacent clusters along the range. The sign of K dp is then determined by the marginal 20 means and variances of the clusters, weighted by the difference of their contributions to φ dp .
In Eq. (20), it can be seen that σ 2 (K dp ) is proportional to σ 2 (φ dp ), which is similar to Eq. (3) in LR. However, σ 2 (φ dp ) varies along the range due to the random errors of the φ dp estimates in GMM, whereas σ 2 (φ dp ) is stable in LR. In addition, the statistical errors with respect to signal processing may be included in Eq. (20) as an additive term, independent of φ dp .
Moreover, the radar gate spacing and gate number for the K dp estimation are often fixed in Eq. (3), indicating σ 2 (K dp ) is also constant in LR. In contrast, σ 2 (K dp ) in GMM is closely related to the first derivative of K dp in Eq. (20). As the changing rate of K dp increases, the random errors associated with the K dp estimates rise dramatically. Figure 5.b illustrates K dp and its variance estimated from φ dp in Fig. 5.a, which is the same case as given in Figs. 3.a and   4.a. It is apparent that the K dp estimates present a large fluctuation, while the associated variances are significant. In GMM, K dp starts from about 0.5 deg km −1 , and then fluctuates between 17 and 20 km and between 24 and 42 km. In the profile, 5 there are six local peaks with the maximum at about 8.5 deg km −1 . Meanwhile, the K dp variances vary as the K dp estimates change. Between 15 and 17 km and between 20 and 24 km, the K dp estimates stand at a value, leading to small K dp variances in these regions. When short excursions are present, such as that between 18 and 20 km, K dp variances increase significantly due to the contribution of the first derivative of K dp in Eq. (20). Furthermore, the large φ dp variances between 35 and 42 km also result in an increase of the K dp variances. In contrast, LR gives less fluctuation in K dp estimates with two peaks at about 10 20 and 34 km. The comparison of K dp obtained by the two methods may suggest that a smoothing procedure is required to reduce the significant variance in GMM.

K dp smoothing
As discussed previously, the K dp variance is small for high K dp , but relatively large for low K dp . Therefore, an adaptive estimation is adopted in LR. For radar reflectivity (Z H ) less than 20 dBZ, the gate number n in Eq.
(2) is set as 15, while n is 15 8 for 20 ≤ Z H < 35 dBZ, and 2 for Z H ≥ 35 dBZ, respectively. On the other hand, GMM also applies an adaptive technique based on finite impulse filter (FIR) to the expected values of K dp in order to reduce the associated variances. Figure 6 shows the time responses of FIR. The impulse response ( Fig. 6.a) is peaked at the center, and gradually decreases towards the two ends. If a longer window is required, the cut-off bounds are extended accordingly. Furthermore, the step response ( Fig. 6.b) gives the accumulation of the impulse response, indicating that the magnitudes around the center change faster than that at the 20 two ends. In this study, the length of the FIR window is subject to relative square error of two adjacent iterations. For profiles with sufficiently large data points, the window length is between 29 and 33 for the MZZU radar.
To obtain the reduced variance, we consider the filter as a number of weighting functions, denoted as h i (x), and subsequently, the smoothed data become where y is a smoothed data point, x i is the original data within the smoothing window, and n is the window length. By taking the variance on both sides of Eq. (26), we have Therefore, the variance of the smoothed data is the weighted sum of the variances of the original data within the smoothing window. Since the FIR coefficients are much less than unity, σ 2 (y) is smaller than σ 2 (x) at the same gate. Furthermore, the 30 K dp estimates with the reduced variances can be used to re-construct φ dp to obtain smaller φ dp variances. For a fixed gate spacing ∆r, the re-constructed φ dp for the jth range gate is K i dp ∆r, and (28) The red curves in Figs. 5.a and 5.b illustrate the re-constructed φ dp and the smoothed K dp using FIR, respectively. The smoothed K dp in Fig. 5.b is more consistent with the LR results compared to the original K dp produced by the GMM. In the 5 first few kilometers, the smoothed K dp gradually rises, and then peaks at about 21 km. With no fluctuations, the smoothed K dp falls gradually, followed by a growth before reaching a plateau at 33 km. After a slight decrease between 33 and 36 km, K dp rises dramatically, which is very different from LR. Meanwhile, the variances are small at the beginning, but get larger as K dp is climbing up. Between 20 and 33 km, the K dp estimates do not change very much, leading to small variances in this region. But after 33 km, the variances begin to increase and retain large values until the end of the profile. Overall, the smoothed 10 K dp is stable, producing a profile considerably consistent with LR, and the variances are significantly reduced comparing to the original data. In addition, the re-constructed φ dp (Fig. 5.a) constantly increases with few local fluctuations, while the associated variances are smaller than the φ dp variances in GMM.

Evaluation
In this section, a case study is first presented to qualitatively analyze the storm structure and evolution based on K dp . The 15 radar-gauge dataset is then used to provide a quantitative evaluation for the K dp estimation in terms of root mean squared error (RMSE), normalized bias (NB) and Pearson correlation coefficient (ρ RG ), which are defined as 20 where N is the sample size, R i is the individual radar hourly rain amount, G i is the gauge data, andR andḠ are the sample means for radar and gauge, respectively. The radar hourly rain amount is calculated based on the CASA radar rainfall algorithm (Wang and Chandrasekar, 2010;Chen and Chandrasekar, 2015).

Case study
On 24  the storm structure at high levels, since each radar was about 150 km away from the storm. Notably, the Kansas City radar showed positive and negative Doppler velocities in a small area (not shown), indicating the occurrence of a downburst. On the other hand, the MZZU radar illustrated a bow echo of Z H close to the radar center (Figs. 7.b). In addition to Z H , the GMM-based K dp (Figs. 7.d, e and f) was also obtained to investigate the storm structure near the surface. Figure 7 illustrates that the convective storm evolves from a strong and large echo to a bow shape echo, and then dissipates 5 at far range. At 0304 UTC (Fig. 7.a), a cell with strong Z H moves into the radar area, while K dp is moderate with a maximum of about 3 deg km −1 (Fig. 7.d). As the cell is transforming to a bow shape, the radar echo becomes intensive, and forms a rain band with embedded convective cores ( Fig. 7.b). It is clear to see that K dp reaches over 10 deg km −1 in these core regions (Fig.   7.e), indicating very heavy precipitation at the surface. With the fast movement of the storm, the downburst has been weaken, and the storm starts to dissipate (Fig. 7.c). At 0441 UTC, it can be seen that K dp is gradually reduced at the far range, while its 10 maximum is much less than that at the mature stage.
In this storm, the bow echo is shown as a number of convective cores embedded in a rain band, while the downbursts occured at the leading edge near the echo center. The bow echo can be considered as a mesoscale convection with a horizontal dimension of more than 60 km. To gain a further insight, Fig. 8 shows raw φ dp and K dp for the bow echo. In Fig. 8.a, raw φ dp presents large gradients along the leading edge, rising from about 50 • to over 140 • . Due to the sharp increase, φ dp exceeds the 15 maximum dynamic range, leading to ambiguity in the areas of X:-20~-18 km and Y:12~18 km and X:-40~-23 km and Y:-8~-5 km. In addition, the echoes behind the convective cores occasionally vanish as a result of signal attenuation. Nevertheless, LR ( Fig. 8.b) produces continuous K dp by φ dp unfolding and linear interpolation according to the trends of the profiles, but some missing data still exist within the storm. In contrast, GMM (Fig. 8.c) corrects these data with the expected values derived from the joint PDF, and simultaneously obtains the statistical errors in the production of K dp . It is evident that the GMM method 20 can efficiently handle the missing data via the mixture model, which is another advantage over the LR model. Furthermore, the statistical errors are not very large in these areas, since the missing data are filled by the distribution of the entire data profile.
Additionally, the GMM K dp estimates are generally a few deg km −1 higher than the LR ones, particularly for the regions of high Z H .
By taking a closer look at GMM K dp , we can see that the bow echo is generally characterized by K dp of above 2.5 deg 25 km −1 , while five pockets of high K dp are identified. In the bow head, the first pocket presents very high K dp associated with a rapid growth of φ dp . Behind this pocket, there is a region of negative K dp , whereas LR generally yields positive values.
It may be due to a reduction of cross-correlation coefficient caused by low signal-to-noise ratio, since the signals have been significantly attenuated after propagating through the pocket. In the middle of the second and third pockets in the bow center, LR and GMM both show lower K dp comparing to the two pockets, while K dp is substantially consistent with the gradient 30 of φ dp in the area. By considering the high Z H in Fig. 7, these moderate K dp values may indicate less anisotropic scatterers, such as small hail in the process of wet growth. Similarly, a hail signature with maximum Z H of above 66 dBZ and small K dp of 1~2 deg km −1 can also be identified in the middle of the fourth and fifth pockets in the bow tail. Along with the expected values of GMM K dp , Fig. 8.d depicts the statistical errors σ(K dp ) in the calculation of the expected values. The five pockets of high K dp are generally associated with small σ(K dp ) of a few tenths deg km −1 . However, the estimates behind the top four pockets yield very large σ(K dp ) with a maximum above 10 deg km −1 , and the expected values of K dp are sometimes below 0 deg km −1 , such as X:-25~-20 km and Y:11~20 km. In contrast, a region of high σ(K dp ) appears in front of the bottom pocket, superimposed on the high Z H area associated with hail. In conclusion, the GMM K dp estimates of high confidence give good agreement with the gradients of φ dp in the leading edge of the bow echo, while large σ(K dp ) are expected at the region of high variation of the K dp estimates.

Statistical analysis
In order to quantitatively evaluate the accuracy of GMM K dp , hourly accumulated rain amounts are derived from the Xband rainfall rate algorithm (Chen and Chandrasekar, 2015), and compared to the rain gauge data collected at Bradford, Sanborn, Auxvasse and Williamsburg between 1 April 2016 and 2 June 2018. The scatterplots presented in Fig. 9 illustrate the comparison between GMM-based radar and gauge rain amounts, and the accompanying table (Table 2) gives RMSE, NB and 10 ρ RG results obtained by GMM and LR.
Consistent with the data in Table 1, the rainfall at the four sites is predominately made up of light rain with hourly rain amounts no more than 2.5 mm h −1 . Nevertheless, according to Fig. 9, moderate rain with amounts between 2.6 and 8 mm h −1 gives a considerable contribution to the total rain events, followed by a small portion of heavy rain with amounts more than 8 mm h −1 . When we study the scatterplots and statistics for each of the four sites, it is apparent that Bradford ( Fig. 9.a) and 15 Sanborn ( Fig. 9.b) are more concentrated on the red line than Auxvasse (Fig. 9.c) and Williamsburg ( Fig. 9.d), since Bradford and Sanborn are closer to the radar. Accordingly, RMSEs for Bradford and Sanborn (Table 2) are relatively small, about 13%~35% lower than Auxvasse and Williamsburg. Furthermore, it can be seen that Bradford and Sanborn show negative bias associated with negative NBs, indicating an underestimation of rain amounts by GMM K dp . In contrast, a slight overestimation may be concluded for Auxvasse and Williamsburg by considering the point trends and the positive NBs. Additionally, Sanborn 20 claims the highest ρ RG out of the four sites, yielding the best consistency between radar and gauge.
When compared to LR statistics as given in which may be due to the local complex terrain near the radar. Overall, the rain amounts deduced from GMM K dp are highly consistent with the rain gauge data, and GMM gives a better performance than LR in terms of RMSE, NB and ρ RG at the far ranges.

Summary and discussions
In this study, we proposed a probabilistic method based on Gaussian mixture model to estimate the specific differential phase 30 K dp , which is the range derivative of differential phase shift φ dp . The Gaussian mixture method (GMM) not only obtained the expected values of K dp by differentiating the conditional expectation of φ dp , but also yielded the variance σ 2 (K dp ) regarding the errors in the calculation of the first derivative of φ dp .
As an initial step of GMM, the data masking was performed to eliminate the the residual clutter in the φ dp measurements.
The data of r and φ dp were first fitted into a simplified Gaussian mixture to generate a number of clusters, which were validated against the two sets of the σ(φ dp ) and σ(φ dp )/σ(r) thresholds given by radar reflectivity Z H . The clusters were then combined 5 to form the rain cell segments, and the segments were classified by comparing the weight accumulations of weather and clutter clusters. Next, the clusters within each segment were re-evaluated by the thresholds according to the segment types. Finally, the azimuthally isolated points were masked out.
Secondly, the joint PDF was obtained by fitting the data of r and φ dp into a mixture model with full covariance, where the cluster number m, weight w, mean µ and covariance Σ were optimized via the Expectation-Maximization (EM) algorithm. 10 Subsequently, the PDF of φ dp conditioned on r was also a mixture with parameters related to the joint PDF. Finally, new φ dp was estimated by the conditional expectation, and the statistical errors σ 2 (φ dp ) were given by the conditional variance, which was not always constant, but varied with w and the marginal PDF of r.
Thirdly, the ambiguous φ dp and backscattering differential phase shift δ co were corrected by examining the two adjacent density ellipses in the mixture. On one hand, if the former density ellipse had a mean larger than the latter one by 80 • , the latter 15 mean was added to 180 • for φ dp unfolding. On the other hand, if the former mean was smaller than the latter one by 85 • , the latter density ellipse was removed as δ co . Moreover, for δ co elimination, the first density ellipse mean was assumed as below 90 • , while the density ellipses with small weights were also removed.
Fourthly, the joint PDF of r and φ dp was used in the calculations of K dp and σ 2 (K dp ). Since K dp was the range derivative of φ dp , the expected values of K dp were then obtained via the derivative of the expected value of φ dp . Moreover, by taking the 20 first-order Taylor series expansion, σ 2 (K dp ) was the product of the square of the first derivative of K dp and σ 2 (φ dp ), yielding non-constant values of σ 2 (K dp ).
In the final step, the expected values of K dp were smoothed to reduce the associated σ 2 (K dp ). An FIR filter was implemented, and iteratively applied to the data to search for an optimal window length. Subsequently, the reduced σ 2 (K dp ) was obtained by the sum of the original σ 2 (K dp ) weighted by the FIR coefficient squares within the window. Additionally, new 25 φ dp were re-constructed from the smoothed K dp , while σ 2 (φ dp ) was also reduced.
The experimental results with a severe storm observed by the X-band polarimetric radar in the University of Missouri (MZZU) revealed the advantages of GMM. By studying the structure and evolution of a bow echo in the storm, it was concluded that the GMM K dp was consistent with the gradients of raw φ dp along the leading edge of the bow echo, while large σ 2 (K dp ) occurred with high variation of K dp . The GMM method produced results similar to the LR method, with the ability to handle 30 the missing data. Moreover, the hourly rain amounts based on K dp were compared to the rain gauge data, showing good agreement between radar and gauge measurements. The rain amounts obtained by GMM K dp gave improvements over the linear regression model, particularly for the far ranges.
The potential applications of GMM K dp and σ 2 (K dp ) include quantitative precipitation estimation (Cifelli et al., 2011;Chen et al., 2017) and attenuation correction (Park et al., 2005). For quantitative precipitation estimation, the relationship between K dp and rain rate R is almost linear, since K dp is about the fourth-order moment of raindrop size distribution, and R is the 3.67th-order moment. As illustrated in Fig. 9, the R (K dp ) algorithm is consistent with the in situ measurements. To further investigate the R errors, it is necessary to consider the K dp errors in the calculation of the first derivative of φ dp . The standard deviation σ(K dp ) is then related to σ(R) by a factor of R/K dp (Bringi and Chandrasekar, 2001). In a similar manner, K dp is linearly proportional to specific attenuation A h and specific differential attenuation A hv (Bringi and Hendry, 1990). Therefore, 5 the errors of radar reflectivity Z H and differential reflectivity Z DR may also be proportional to σ(K dp ) after the attenuation correction, and eventually contribute to the R errors via R(Z H ) and R(Z H , Z DR ). In addition, the error estimates can be used to provide streamflow trends in hydrological model.
Appendix A: Variance of φ dp We consider the range r as an independent variable, denoted as x, and φ dp as a dependent variable, denote as y. The joint 10 distribution of z = (x, y) follows a Gaussian mixture as where w i , µ i and Σ i are the weight, mean and covariance for each component, respectively. The probability of y conditioned on x is also a Gaussian mixture with parameters w Meanwhile, the second term is given as Based on the properties of Gaussian function, the derivatives of f i (x) and f (x) are expressed as By substituting Eqs. (B9) and (B10) into Eq. (B8), the second term is transformed as By substituting Eqs. (B7) and (B11) into Eq. (B6), we obtain are zeros if the series is expanded at the mean value of y. By taking mathematical expectations on both sides of Eq. (C1), it is transformed as From Eqs. (C1) and (C3), the variance of g(y) is approximated as = g (θ) 2 σ 2 (y) Let g(y) be y , and then we have σ 2 (y |x) = [E (y|x)] 2 σ 2 (y|x).

15
From Eq. (A9), we can see that By taking the derivative of Eq. (B6), the second derivative of the expected value of y conditioned on x becomes