Interpreting estimated observation error statistics of weather radar measurements using the ICON-LAM-KENDA system

Zeng, Yuefei; Janjic, Tijana; Feng, Yuxuan; Blahak, Ulrich; de Lozar, Alberto; Bauernschubert, Elisabeth; Stephan, Klaus; Min, Jinzhong

doi:https://doi.org/10.5194/amt-14-5735-2021

Articles | Volume 14, issue 8

Special issue:

Fusion of radar polarimetry and numerical atmospheric modelling...

Research article

20 Aug 2021

Abstract

Assimilation of weather radar measurements, including radar reflectivity and radial wind data, has been operational at the Deutscher Wetterdienst with a diagonal observation error (OE) covariance matrix. To implement a full OE covariance matrix, the statistics of the OE have to be estimated a priori, for which the Desroziers method has often been used. However, the resulting statistics consist of contributions from different error sources and are difficult to interpret. In this work, we use an approach that is based on samples for truncation error in radar observation space to approximate the representation error due to unresolved scales and processes (RE) and compare its statistics with the OE statistics estimated by the Desroziers method. It is found that the statistics of the RE help explain several important features in the variances and correlation length scales of the OE for both reflectivity and radial wind data. Other error sources from the microphysical scheme, the radar observation operator and the superobbing technique may also contribute, for instance, to differences among different elevations and observation types. The statistics presented here can serve as a guideline for selecting which observations are assimilated and for assigning the OE covariance matrix, which can be diagonal or full and correlated.

Received: 01 Apr 2021 – Discussion started: 12 Apr 2021 – Revised: 21 Jul 2021 – Accepted: 24 Jul 2021 – Published: 20 Aug 2021

1 Introduction

Assimilation of weather radar measurements has been widely adopted by many weather services for convective-scale numerical weather prediction (NWP) models (Gustafsson et al., 2018). For instance, in the 3D-Var system of Météo-France, Doppler radial wind measurements are assimilated (Montmerle and Faccani, 2009), and radar reflectivity measurements are assimilated by a combined 1-D and 3D-Var method (Caumont et al., 2010), which first derives relative humidity profiles from reflectivity data. At the Met Office, volume scans of radar reflectivity data are directly assimilated (Hawkness-Smith and Simonin, 2021) by the hourly cycling 4D-Var (Milan et al., 2020). At the Deutscher Wetterdienst (DWD), the Kilometre-scale ENsemble Data Assimilation (KENDA) system (Schraff et al., 2016) has been developed for the COSMO (COnsortium for Small-scale MOdelling, Baldauf et al., 2011) and the ICON (ICOsahedral Nonhydrostatic, Zängl et al., 2015) models. Since June 2020, the radial wind and reflectivity data have been assimilated via the local ensemble transform Kalman filter (LETKF, Hunt et al., 2007) combined with latent heat nudging (Stephan et al., 2008) for the COSMO model in the operational suite. The ICON-LAM (ICON – Limited Area Model) is the limited area version of the ICON model and is to replace the COSMO model in the operational forecasting system. The ICON-D2 (D: Deutschland (Germany); 2: 2 km) is an ICON-LAM setting at approximately 2 km grid spacing, which is restricted to Germany and the neighboring countries and has been operational for very-short-range forecasting since February 2021.
Despite rapid progress, convective-scale data assimilation is still at an early phase of development, and a number of challenges remain for both variational and ensemble-based methods, e.g., imbalance due to rapid update (Bick et al., 2016; Lange et al., 2017; Zeng et al., 2021 b), strong nonlinearity of models and observation operators (Wang and Wang, 2017), model error due to unresolved scales (Zeng et al., 2019, 2020) and parameters (Ruckstuhl and Janjić, 2020), and representation error of observations (Janjic et al., 2018). In the present work, we focus on the last topic.

As stated in Janjic et al. (2018), the observation error consists of two components in the context of data assimilation. The first is the instrument error that occurs during the measurement process. The second is the representation error, understood as the difference between the actual observation and its modeled representation, which can be primarily categorized into three types: observation operator error, pre-processing or quality control error, and error due to unresolved scales and processes. In this work, for brevity, we denote the observation error with “OE” and the instrument error with “IE”. We group the observation operator error together with the pre-processing or quality control error as forward model error, denoted with “FE”, and we denote the representation error due to unresolved scales and processes with “RE” (i.e., OE = IE + FE + RE). In general, the FE and the RE are larger than the IE, and the IE is better understood (e.g., standard deviations of the IE for radar reflectivity observations are proportional to the measured values, Doviak and Zrnic, 1993; Xue et al., 2007). To quantify the OE statistics, the methods of Hollingsworth and Lönnberg (1986) and Desroziers et al. (2005) have been widely used in practice. The former is based on the first-guess departure, while the latter is based on the first-guess and analysis departures and has enjoyed more popularity in recent years. For instance, the Met Office uses the Desroziers method to calculate the interchannel error covariances for satellites and incorporates them in the OE covariance matrix in the 3D-Var analysis (Weston et al., 2014; Waller et al., 2016 a), and so does ECMWF (Bormann et al., 2016). The DWD specifies the OE variances for conventional observations (Schraff et al., 2016) and MODE-S observations (Lange and Janjić, 2016) based on the Desroziers diagnostic in the KENDA system.
Furthermore, Météo-France, the Met Office and JMA (Japan Meteorological Agency) have also applied the method to radial wind observations to estimate spatial error correlations that are then accounted for in the data assimilation (Wattrelot et al., 2012; Simonin et al., 2019; Fujita et al., 2020). In the present work, we use the Desroziers method to explore characteristics of the OE for reflectivity and radial wind in the operational ICON-LAM-KENDA system of the DWD. It is the first application of radar data assimilation using this framework (a similar study has been done by Waller et al., 2019, but for the COSMO-KENDA system and only for the radial wind). To the authors' knowledge, it is also the first in-depth attempt to investigate the OE statistics (variances and correlations) of reflectivity data. However, the estimated OE statistics comprise contributions from the IE, FE and RE, and it is not clear how much each individual error contributes. To approximate the RE, we assume that a high-resolution model is the truth and regard the model equivalents of radar data calculated from the truth as observations (e.g., Waller et al., 2014, 2021). We then evaluate the statistics from a set of samples of differences between these observations and the model equivalents of the low-resolution model run, which can be compared with the OE statistics estimated by the Desroziers method.

The paper is organized as follows. Section 2 describes the concepts of the two methods used here to compute the observation error statistics. Section 3 gives details about the ICON model and the radar observation operator. Section 4 presents the experimental settings and results, and Sect. 5 provides a summary.

2 Methodology

In this section, we describe two methods used for calculating statistics of the RE and OE.

2.1 Samples of error due to unresolved scales and processes

In spite of increasing resolution in operational NWP models, convection cannot be completely resolved and shallow convection has to be parameterized. It is known that with a higher horizontal resolution the model can better resolve updrafts and the vertical transport of energy and can describe orography more accurately (Wedi, 2014). To mimic the RE, one can treat the mapping of states from a high-resolution model as observations (Waller et al., 2014), with the low-resolution model considered as a truncation.

Following an approach similar to that of Zeng et al. (2019), differences between forecasts of two model runs with different resolutions, expressed in the observation space, are used to represent the RE:

\begin{matrix} (1) & \eta_{k} = \mathcal{H} \{\mathcal{M}^{H} [x^{H} (t_{k} - t)]\} - \mathcal{H} \{\mathcal{M}^{L} [\mathcal{T} (x^{H} (t_{k} - t))]\}, \end{matrix}

where ℋ is the observation operator, ℳ^H and ℳ^L are models at high and low resolutions, respectively, x^H is the state of ℳ^H, 𝒯 is the interpolation operator, t is the predefined forecast time, and t_k is an arbitrary valid time. For any t_k, we can calculate an η_k that is a sample for the RE. A flowchart of this approach is given in Fig. 2 of Zeng et al. (2019).
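As a rough illustration of how Eq. (1) produces samples, the following minimal Python sketch replaces every operator with a toy stand-in on a one-dimensional periodic grid. The functions `truncate`, `model_step`, `observe` and `re_sample` are hypothetical names introduced here; they are not part of ICON, EMVORADO or any DWD tool, and the toy dynamics (pure advection) are an assumption chosen only to keep the example self-contained.

```python
import numpy as np

def truncate(x_high, factor):
    # Hypothetical stand-in for T: block-average onto a grid `factor` times coarser.
    return x_high.reshape(-1, factor).mean(axis=1)

def model_step(x, shift):
    # Hypothetical stand-in for M^H / M^L: periodic advection by `shift` grid cells.
    return np.roll(x, shift)

def observe(x, n_obs):
    # Hypothetical stand-in for H: linear interpolation from cell centres on the
    # unit interval to n_obs evenly spaced observation locations.
    grid = (np.arange(x.size) + 0.5) / x.size
    obs_loc = (np.arange(n_obs) + 0.5) / n_obs
    return np.interp(obs_loc, grid, x)

def re_sample(x_high, factor, shift_high, n_obs):
    # eta_k = H{M^H[x^H]} - H{M^L[T(x^H)]}, both expressed in observation space.
    y_high = observe(model_step(x_high, shift_high), n_obs)
    x_low = truncate(x_high, factor)
    y_low = observe(model_step(x_low, shift_high // factor), n_obs)
    return y_high - y_low

# Collect samples from many (here random) high-resolution states and
# compute the standard deviation of the RE at each observation location.
rng = np.random.default_rng(0)
samples = np.stack([re_sample(rng.standard_normal(64), 4, 8, 16)
                    for _ in range(200)])
re_std = samples.std(axis=0)
```

A spatially constant state gives a zero sample in this toy setup, since truncation loses nothing; the RE arises only from the scales that the coarse grid cannot represent.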

Running the models over a period (with a certain weather pattern) produces a set of samples. If the sample size is sufficiently large, the sample statistics should provide useful information on the nature of the RE (under certain weather conditions). More details about the settings of the model runs can be found in Sect. 4.1.

2.2 The Desroziers method

The Desroziers method (Desroziers et al., 2005) calculates the expected value of the outer product of the first-guess departure (also called the innovation) $d_{o-b} = y^{o} - \mathcal{H} (x_{b})$ and the analysis departure $d_{o-a} = y^{o} - \mathcal{H} (x_{a})$ to approximate the observation error covariance matrix:

\begin{matrix} (2) & R_{est} = E [d_{o-a} d_{o-b}^{T}], \end{matrix}

where y^o is the observation vector. R_est is optimal in the case of a linear observation operator and uncorrelated background error and OE covariances (denoted by P^b and R) that are perfectly specified (Reichle et al., 2002). Although these assumptions are usually not satisfied in practice, R_est is still widely used as a qualitative indicator for the OE statistics. Desroziers et al. (2005) initially suggested applying Eq. (2) in successive iterations to converge to the truth; however, a useful estimate can often be obtained in the first iteration (Waller et al., 2016 b). Therefore, considering the computational cost, most studies with operational NWP models have performed only the first iteration (e.g., Weston et al., 2014; Lange and Janjić, 2016; Waller et al., 2016 a; Bormann et al., 2016). In this work, we also compute the Desroziers diagnostic in one iteration. Furthermore, as in Waller et al. (2016 a), the means of d_o-a and d_o-b are subtracted to ensure that the bias does not affect R_est.
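The diagnostic in Eq. (2), including the mean subtraction, can be sketched in a few lines of Python. The function name `desroziers_r` and the toy check that follows are assumptions made for illustration: the check uses an identity observation operator, Gaussian errors and a perfectly specified gain, which are exactly the idealized conditions under which the estimate is expected to recover the true covariance.

```python
import numpy as np

def desroziers_r(d_oa, d_ob):
    """Estimate R_est = E[d_oa d_ob^T] from departures stored row-wise,
    i.e. arrays of shape (n_samples, n_obs)."""
    d_oa = d_oa - d_oa.mean(axis=0)   # remove the mean so bias does not enter
    d_ob = d_ob - d_ob.mean(axis=0)
    return d_oa.T @ d_ob / d_oa.shape[0]

# Toy linear-Gaussian check (all values below are illustrative assumptions).
rng = np.random.default_rng(1)
n_obs, n_samples = 3, 50000
B = np.eye(n_obs)                                  # background error covariance
R_true = np.array([[1.0, 0.5, 0.0],
                   [0.5, 1.0, 0.5],
                   [0.0, 0.5, 1.0]])               # correlated observation error
eps_b = rng.multivariate_normal(np.zeros(n_obs), B, size=n_samples)
eps_o = rng.multivariate_normal(np.zeros(n_obs), R_true, size=n_samples)

d_ob = eps_o - eps_b                               # y - H(x_b) with H = I
K = B @ np.linalg.inv(B + R_true)                  # optimal gain for H = I
d_oa = d_ob @ (np.eye(n_obs) - K).T                # y - H(x_a) = (I - K) d_ob

R_est = desroziers_r(d_oa, d_ob)
```

With a finite sample the estimate is generally not exactly symmetric; in this toy setting it agrees with `R_true` up to sampling noise.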

In the following, we estimate statistics of the RE of radar reflectivity and radial wind data by using the method from Sect. 2.1 and statistics of the OE by applying the Desroziers method to a data assimilation experiment with a low-resolution model, i.e., $d_{o-b} = y^{o} - \mathcal{H} (x_{b}^{L})$ and $d_{o-a} = y^{o} - \mathcal{H} (x_{a}^{L})$. It should be mentioned that, due to the logarithmic unit of reflectivity, it is well-established practice in the radar data assimilation community to set a threshold value for very small reflectivities (e.g., with negative values) to avoid unrealistically large increments (Zeng et al., 2021 a) and spurious convection (Aksoy et al., 2009). In the operational settings of KENDA, the threshold value is 0 dBZ, which means all reflectivity values lower than 0 dBZ are set to 0 dBZ, and we call 0 dBZ data “clear-air reflectivity data”. Before the superobbing, the same threshold value is applied to all observations and to all simulated reflectivities in each background ensemble member. However, the Desroziers diagnostic may then underestimate the standard deviations of the OE, since the same threshold value is applied to both observations and backgrounds (Zeng et al., 2021 a). To mitigate this problem, we calculate Desroziers diagnostics for reflectivities with values ≥5 dBZ either in observations or in backgrounds or in analyses. To be consistent, we also calculate the statistics of the RE for reflectivity data ≥0 and ≥5 dBZ, respectively.
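The thresholding and the sample selection described above can be illustrated with a short Python sketch. The function names and the example values are hypothetical; the sketch only mirrors the two rules stated in the text (values below 0 dBZ are set to 0 dBZ, and a data point enters the Desroziers statistics if the observation, the background or the analysis is at least 5 dBZ).

```python
import numpy as np

def apply_clear_air_threshold(dbz, threshold=0.0):
    # Set every reflectivity below the threshold to the threshold value.
    return np.maximum(dbz, threshold)

def desroziers_mask(obs, bg, ana, min_dbz=5.0):
    # Keep a point if at least one of the three values reaches min_dbz.
    return (obs >= min_dbz) | (bg >= min_dbz) | (ana >= min_dbz)

# Illustrative values in dBZ (made up for this example).
obs = apply_clear_air_threshold(np.array([-10.0, 3.0, 12.0, 0.0]))
bg = apply_clear_air_threshold(np.array([-5.0, 6.0, 1.0, 0.0]))
ana = apply_clear_air_threshold(np.array([-2.0, 4.0, 2.0, 0.0]))

mask = desroziers_mask(obs, bg, ana)
```

Points where observation, background and analysis are all clear-air values are excluded, which avoids counting the many exact zeros in the departures that would otherwise pull the estimated standard deviation down.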

3 The ICON model, radar observations and the observation operator

The ICON global model, which has been in operation at the DWD since January 2015 (Zängl et al., 2015), is non-hydrostatic and is based on an icosahedral (triangular) grid with a horizontal resolution of 13 km and 90 vertical levels. The ICON-LAM is the regional model and the ICON-D2 is one version of the ICON-LAM, with the domain as shown in Fig. 1 and with a horizontal resolution of 2.1 km and 65 vertical levels. The ICON-D2 model has been operational at the DWD since February 2021. Lateral boundary conditions for the ICON-LAM Ensemble Prediction System (EPS) are provided by the global ICON EPS, with a resolution of 40 (20) km globally and 13 (6.5) km over Europe for the ensemble (deterministic run). Deep convection is explicitly resolved and shallow convection is parameterized with the Tiedtke scheme (Tiedtke, 1989). The turbulent kinetic energy (TKE) scheme for turbulence was developed by Raschendorfer (2001). The Lin–Farley–Orville-type one-moment bulk microphysical scheme is used, which predicts cloud droplets q_c, cloud ice q_i, rain q_r, snow q_s and graupel q_g (Lin et al., 1983; Reinhardt and Seifert, 2006).

https://amt.copernicus.org/articles/14/5735/2021/amt-14-5735-2021-f01

Figure 1. Illustration of the ICON-D2 domain with the orography and of the radar network of the DWD (each station is denoted by a red bullet and the station number; the scanning range is denoted by a circle).

The DWD utilizes a network of 17 C-band Doppler radars covering Germany and part of the adjacent countries (see Fig. 1). A complete radar volume scan lasts 5 min and consists of 180 range bins (resolution of 1.0 km), 360 azimuths (resolution of 1.0°) and 10 elevations (0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 8.0, 12.0, 17.0 and 25.0°). To transform model variables to synthetic radar observations, the Efficient Modular VOlume scanning RADar Operator (EMVORADO, Zeng, 2013; Zeng et al., 2014, 2016) has been developed. EMVORADO is coded in a modular way and is able to simulate effects such as beam bending/broadening/shielding, fall speed and reflectivity weighting for radial wind, attenuation, detectable signal, etc. Reflectivities are first calculated on the model grid points and then interpolated onto radar coordinates. Two scattering schemes are implemented: the Rayleigh approximation for simple near-spherical hydrometeors whose size is small compared to the wavelength and the Mie method for one- and two-layered spherical hydrometeors of arbitrary size. To simulate radial wind, the three wind components are interpolated onto radar coordinates and then radial winds are calculated. In the operational settings, EMVORADO is run with the Mie method (using look-up tables) and takes beam shielding, fall speed, attenuation and detectable signal into account. Beam bending and broadening effects as well as reflectivity weighting are omitted for the sake of efficiency (computational costs can be found in Zeng et al., 2016). The $4 / 3$ Earth radius model, which assumes a standard atmosphere, is used to mimic the beam propagation (Zeng et al., 2014).
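For orientation, the beam geometry implied by the $4 / 3$ Earth radius model can be sketched in Python using the standard textbook relations for beam height and surface distance. This is not EMVORADO code: the function names, the mean Earth radius of 6371 km, the neglect of the station height and the interpretation of the azimuthal width as the arc length spanned by one azimuth step are all assumptions of this sketch.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0   # assumed mean Earth radius
K_E = 4.0 / 3.0            # effective Earth radius factor (standard atmosphere)

def beam_height_km(range_km, elev_deg):
    # Height of the beam centre above the radar (station height omitted).
    a_e = K_E * EARTH_RADIUS_KM
    theta = np.deg2rad(elev_deg)
    return np.sqrt(range_km**2 + a_e**2
                   + 2.0 * range_km * a_e * np.sin(theta)) - a_e

def surface_distance_km(range_km, elev_deg):
    # Great-circle distance from the radar to the point below the beam.
    a_e = K_E * EARTH_RADIUS_KM
    theta = np.deg2rad(elev_deg)
    h = beam_height_km(range_km, elev_deg)
    return a_e * np.arcsin(range_km * np.cos(theta) / (a_e + h))

def azimuthal_width_km(range_km, az_res_deg=1.0):
    # Arc length covered by one azimuth step at a given radial range.
    return range_km * np.deg2rad(az_res_deg)

# Geometry at the outermost range bin (180 km) for four elevations.
elevations = np.array([0.5, 1.5, 3.5, 5.5])
heights = beam_height_km(180.0, elevations)
distances = surface_distance_km(180.0, elevations)
width = azimuthal_width_km(180.0)
```

In this sketch the azimuthal width grows linearly with range and reaches roughly 3 km at the outermost bin, so distant bins sample a much larger volume than bins close to the radar.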

4 Experimental settings and results

In this section, experiments are performed to create samples for estimation of the OE statistics of radar observations. For each elevation, standard deviations and horizontal correlations of the OE at different heights are calculated as in Waller et al. (2016 c, 2019). For comparison, the same is done for the RE. Results are shown for elevations of 0.5, 1.5, 3.5 and 5.5°. Elevations higher than 5.5° are not shown due to the small number of samples. As in Liu and Rabier (2002) and Waller et al. (2016 c, 2019), the correlation length scale is defined as the distance at which the correlation coefficient is no longer greater than 0.2. Standard deviations are averaged over all samples, and correlations are calculated for each elevation at each radar station for a specific time and then averaged. As in Waller et al. (2019), if the number of samples available for estimation is too small (e.g., <1000), the estimated standard deviations and correlations might be considerably contaminated by sampling error and are therefore not reliable. Furthermore, it is noted in Waller et al. (2017) that for the local ensemble data assimilation scheme the error correlation between two observations y_i and y_j estimated by the Desroziers method is correct if the observation operator applied to calculate the model counterpart of y_i acts only on states updated by y_j; however, the LETKF does not seem to suffer strongly from this issue, as shown in Waller et al. (2019).
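The length-scale definition and the sample-size criterion above can be written as a small Python sketch. The function names and the exponential test correlation are hypothetical; the sketch assumes correlations are available on a set of increasing separation distances and simply returns the first distance at which the correlation is no longer greater than 0.2.

```python
import numpy as np

def correlation_length(distances_km, correlations, threshold=0.2):
    # First separation distance at which the correlation is <= threshold;
    # NaN if the correlation stays above the threshold at all distances.
    distances_km = np.asarray(distances_km, dtype=float)
    correlations = np.asarray(correlations, dtype=float)
    below = np.nonzero(correlations <= threshold)[0]
    return distances_km[below[0]] if below.size else np.nan

def enough_samples(n_samples, minimum=1000):
    # Estimates from fewer than `minimum` samples are treated as unreliable.
    return n_samples >= minimum

# Illustrative correlation function decaying with an e-folding scale of 10 km.
d = np.arange(0.0, 51.0, 1.0)
corr = np.exp(-d / 10.0)
length_scale = correlation_length(d, corr)
```

Because the result is restricted to the available separation distances, its precision is limited by the bin spacing (1 km in this example).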

4.1 Observation error statistics estimated by samples of error due to unresolved scales and processes

4.1.1 Experimental settings

To create samples for the RE, the ICON-D2 model (equipped with EMVORADO) is run with a resolution of 1.0 km for a training period from 26 May 2016, 00:00 UTC to 25 June 2016, 00:00 UTC, which has been investigated in a number of studies (Zeng et al., 2018, 2019, 2020). During this period, a large area of southeastern and central Europe was hit by severe thunderstorms with heavy rain. The hourly outputs of the model run at 1.0 km are interpolated onto a coarser grid with the operational resolution of 2.1 km using the iconremap utility from the DWD ICON tools (Prill, 2014), and the interpolated states are used as initial conditions for 1 h forecast runs at the resolution of 2.1 km. Both high- and low-resolution model runs are driven by hourly boundary conditions. For any hour during this period, one can compute the difference between the two model runs. In total, there are 720 samples of differences. Since EMVORADO is run together with the model, the samples are also available in radar observation space. Each sample contains differences in radar volume scans of all radar stations. No superobbing has been applied, and no data assimilation has been conducted here since we are interested in the climatology of the RE instead of exact positions and intensities of convection.

https://amt.copernicus.org/articles/14/5735/2021/amt-14-5735-2021-f02

Figure 2. Illustration of variations of beamwidths (in km for azimuth resolution of 1.0°) with height (a) and surface distances (away from the radar station) with corresponding heights (b) for elevations of 0.5, 1.5, 3.5 and 5.5°, based on the $4 / 3$ Earth radius model. The height of the radar station is omitted. The figure for radial ranges with corresponding heights looks very similar to panel (b).
