Reply on RC2

In this paper, the impact of assimilating GPS-ZTD data and sounding observations at low and high vertical resolution is evaluated on a case study of heavy precipitation. The COSMO model is employed at 3 different resolutions over 3 domains and observations are assimilated by a nudging technique. Verification of experiments is performed using several metrics, especially regarding precipitation. Regardless of the model resolution, only the assimilation of operational low vertical resolution radiosondes improves precipitation accuracy, while both high-resolution soundings and GPS observations have a negative impact. This is probably due to deficiencies in model physics and, for GPS, to lack of vertical information.


Major Comments
High vertical resolution radiosondes (HR) are assimilated without performing any thinning or data reduction. As far as I understand, since HR vertical levels are much more than model levels (700 compared to 40-80), this means that HR observations are overweighted. I think that this point should be reported and discussed in the manuscript.
We agree that this aspect should be discussed in the manuscript.
The nudging procedure of the COSMO model reads radiosonde reports as they are made available and, after quality and consistency checks, the observations are either averaged over each model layer (temperature, wind) or vertically interpolated to the height of the mid model layer (humidity). This is done both for operational soundings (RAD) and for high-resolution soundings (HR). Any improvement gained from the DA therefore comes from how the larger number of levels affects the layer averages (wind, temperature) or the vertical interpolations to the mid model layer (humidity). This aspect becomes more relevant as the number of vertical levels increases with finer model resolution (40 levels in the 7 km setup, 50 at 2.8 km and 80 at 500 m).
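To illustrate the two treatments described above, the following minimal Python sketch reduces a sounding profile to a single model layer. It is not the COSMO implementation: the function names, the unweighted mean, the linear interpolation in height and the sample profile are assumptions made only for illustration.

```python
def layer_average(obs_heights, obs_values, layer_bottom, layer_top):
    """Unweighted mean of the observations lying inside one model layer.

    Assumption: a plain arithmetic mean; the operational scheme may weight
    or quality-filter the levels differently.
    """
    inside = [v for z, v in zip(obs_heights, obs_values)
              if layer_bottom <= z < layer_top]
    return sum(inside) / len(inside) if inside else None


def interpolate_to_height(obs_heights, obs_values, z_mid):
    """Linear interpolation of a profile to the mid-layer height z_mid.

    Assumption: heights are strictly ascending. Returns None if z_mid lies
    outside the observed profile.
    """
    pairs = list(zip(obs_heights, obs_values))
    for (z0, v0), (z1, v1) in zip(pairs, pairs[1:]):
        if z0 <= z_mid <= z1:
            return v0 + (v1 - v0) * (z_mid - z0) / (z1 - z0)
    return None


# Hypothetical profile: heights in m, values in arbitrary units.
heights = [0.0, 100.0, 200.0, 300.0]
values = [10.0, 12.0, 14.0, 16.0]
temperature_like = layer_average(heights, values, 0.0, 200.0)
humidity_like = interpolate_to_height(heights, values, 150.0)
```

In such a scheme a sounding with many more levels changes the result only through the values entering each layer mean or bracketing each mid-layer height.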
To bring this discussion into the manuscript, the following changes will be introduced in Sect. "Nudging of GPS and radiosondes" within Section 2.2.1.

We acknowledge that the manuscript needs further explanation of how the 99th percentile of the 3 h precipitation aggregates is computed. To this end, we have extended Sect. 2.4 (Verification Metrics). In a nutshell, the 99th percentiles of the 3 h precipitation aggregates are obtained as follows: we first compute 3-hourly precipitation aggregates for the grid points within the investigation area. The 99th percentiles are then obtained from the sample of all 3-hourly intensities at each grid point during the day of precipitation, i.e. for eight time steps during 24 September 2012.
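The procedure can be sketched in a few lines of Python. This is an illustrative sketch and not the code used for the manuscript: the function names are hypothetical, the input is assumed to be 24 hourly totals per grid point, the sample is assumed to be pooled over all grid points and time steps, and the percentile definition (linear interpolation, "inclusive" method) is also an assumption.

```python
from statistics import quantiles


def three_hourly_aggregates(hourly_series):
    """Sum consecutive 3 h windows: 24 hourly totals give 8 aggregates."""
    return [sum(hourly_series[i:i + 3])
            for i in range(0, len(hourly_series), 3)]


def pooled_p99(hourly_by_grid_point):
    """99th percentile of all 3-hourly aggregates in the area.

    Assumption: aggregates from every grid point and every time step are
    pooled into one sample before the percentile is taken.
    """
    pooled = []
    for series in hourly_by_grid_point:
        pooled.extend(three_hourly_aggregates(series))
    # quantiles(n=100) returns 99 cut points; the last is the 99th percentile.
    return quantiles(pooled, n=100, method="inclusive")[-1]
```

If the percentile is instead wanted separately for each grid point, the same `quantiles` call can be applied to the eight aggregates of a single series.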
Moreover, how are the raingauges treated? For example, for the domain average, are they aggregated to the same grid of MSWEP to take into account spatial variability?
We used the RG 24-hourly and 3-hourly precipitation aggregates at each station to obtain the spatial average (Fig. 5a) and the 99th percentile (Fig. 5b). Hence, in the previous version of the manuscript, the RG data were not interpolated to a common grid (the 0.1° MSWEP grid). As pointed out, this can limit the comparability between the different data sets.
We have performed supplementary analyses for this review, interpolating the RG 24-hourly and 3-hourly aggregates to the native 0.1° MSWEP grid by means of an Inverse Distance Weighting method (Hodam et al., 2017). The analysis revealed spurious artifacts around the point stations and unrealistic precipitation gradients that do not agree with the original RG distribution. This illustrates the complications of interpolating station information. We have therefore decided to restrict the verification to the MSWEP product, to avoid the artifacts produced by gridding the RG data.
More information and supplementary plots will be provided in the detailed answers to the reviewers.
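For reference, a generic Inverse Distance Weighting estimate at one target point can be sketched as follows. This is a minimal sketch and not the configuration used in our supplementary analysis: the function name, the exponent of 2 and the use of planar distances are assumptions.

```python
import math


def idw(stations, x, y, power=2.0):
    """IDW estimate at (x, y) from a list of (xs, ys, value) stations.

    Assumptions: planar Euclidean distance and a weight of 1 / d**power.
    """
    numerator = 0.0
    denominator = 0.0
    for xs, ys, value in stations:
        d = math.hypot(x - xs, y - ys)
        if d == 0.0:
            return value  # the target coincides with a station
        weight = d ** -power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Two hypothetical stations with 10 mm and 20 mm.
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint = idw(stations, 1.0, 0.0)
```

Because the estimate reproduces the station value exactly at each station and relaxes towards a weighted mean away from it, isolated stations tend to appear as localized maxima or minima in the gridded field.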
Finally note that the title of subplot "b" has to be swapped with that of subplot "c".
Indeed, panels b) and c) within Fig. 5 were interchanged. This has been corrected in the new version of the manuscript.

L56-58. In contrast to GPS, satellite and radar are claimed to not be all-weather observations. Regarding radar reflectivity, even if it is particularly useful in case of precipitation, it can be gainfully assimilated also in non-precipitating conditions to suppress spurious model rainfall (see for example Bick et al. (2016) and Gastaldo et al. (2021) for COSMO-LETKF, but the same holds for nudging schemes). About satellite observations, clear-sky observations have been assimilated for many years, but there are several studies dealing with the all-sky assimilation (see for example Geer et al. (2018) for a review). So, please explain more in detail what you mean.
This statement was incorrect. Satellites and ground radars also measure atmospheric variables in cloudy and precipitating situations. The intention was to highlight the advantages of GPS in measuring IWV as opposed to satellite products for the same variable. For example, MODIS only provides IWV estimates in clear conditions or, in cloudy areas, above the cloud tops, whereas GPS can also provide IWV information in the presence of clouds. The statement in the introduction has been rephrased.

L61-62. I am not sure that Davolio et al. (2017) restrict the correction to boundary layer. Looking at their Table 3, the moisture correction is smoothed in the boundary layer.
Yes, this is correct. Davolio et al. (2017) explain: "The main role of the parameter is to limit the specific humidity adjustment in the boundary layer, in order to avoid too unstable profiles that can produce excessive convective activity".
Indeed, the correction is not restricted to the boundary layer; it is only truncated at a height of 8 km. This will be corrected in the manuscript.

We compute the FSS using moving boxes of neighbour length N = 20, not 18. This will be corrected in the manuscript. This means that the fractions of precipitation (f = n_precip/n_tot) for the model (f_mod) and the observations (f_obs) are computed using 2*20+1 = 41 grid points in both directions (a total of n_tot = 1681 grid points). This neighbour length is selected because the skill of the forecast is largest when N is as large as possible. Given the shortest dimension of the investigation area RhoAlps, N = 20 is the maximum neighbour length that complies with n = 2N+1. This corresponds to what Roberts and Lean (2008) define as the Asymptotic Fractions Skill Score (AFSS), which theoretically has a value of 1 when there is no bias between the model and the observations. This is the upper limit of the forecast skill.
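The computation described above can be sketched in Python as follows. This is a hedged illustration rather than the verification code used for the manuscript: the function names are hypothetical, the threshold test uses >=, and only boxes lying fully inside the grid are evaluated, which is an assumption about the boundary treatment.

```python
def to_binary(field, threshold):
    """1 where precipitation reaches the threshold, 0 elsewhere."""
    return [[1 if v >= threshold else 0 for v in row] for row in field]


def fractions(binary, N):
    """Fraction f = n_precip / n_tot in every (2N+1) x (2N+1) box.

    Assumption: only boxes fully inside the grid are used.
    """
    n = 2 * N + 1
    rows, cols = len(binary), len(binary[0])
    out = []
    for i in range(rows - n + 1):
        for j in range(cols - n + 1):
            n_precip = sum(binary[r][c]
                           for r in range(i, i + n)
                           for c in range(j, j + n))
            out.append(n_precip / (n * n))
    return out


def fss(model, obs, threshold, N):
    """Fractions Skill Score following the form in Roberts and Lean (2008)."""
    f_mod = fractions(to_binary(model, threshold), N)
    f_obs = fractions(to_binary(obs, threshold), N)
    # Sums are used instead of means; the common factor cancels in the ratio.
    mse = sum((m - o) ** 2 for m, o in zip(f_mod, f_obs))
    ref = sum(m * m for m in f_mod) + sum(o * o for o in f_obs)
    return 1.0 - mse / ref if ref > 0 else float("nan")


# Hypothetical 3 x 3 fields with one wet point each, displaced by two cells.
model = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
obs = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
score_grid_scale = fss(model, obs, 0.5, 0)
score_full_domain = fss(model, obs, 0.5, 1)
```

In this toy case the score is 0 at grid scale (N = 0) because the wet points do not overlap, and 1 when the box covers the whole domain (N = 1) because both fields contain the same wet fraction, which is the unbiased asymptotic limit mentioned above.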

L190-231 Several symbols employed in the equations and in the text
This explanation has been reworked to better describe how the FSS is computed. It now reads as follows:

L311-312. MSWEP clearly underestimates precipitation over Liguria region compared to RG. This should be reported. Moreover, this may also be taken into account for the subsequent qualitative verification (Fig. 3 and 4).
The underestimation over Liguria will be noted in the new version of the manuscript in Sect. 3.