Reply on RC1

Various combinations of model resolutions and observations are tested. The performance of these forecasts is mostly assessed from the resulting precipitation compared against observations. The overall conclusion is that the assimilation of operational radiosonde data is important but assimilating extra high resolution observations is not. Deficiencies in modeled moist processes and lack of vertical information in GPS observations are given as factors that could explain the results obtained.

With their heterogeneous distributions and difficult statistical properties, physical state variables such as moisture and precipitation remain challenging to data assimilation and verification. As such, this manuscript takes place in the context of an active topic of research. While the experiments and analyses presented are not fundamentally novel, they contribute to a better understanding of data assimilation for moist processes. The topic is interesting and within the scope of the Weather and Climate Dynamics journal.
The manuscript is well organized and generally easy to follow. The in-depth examination of the meteorological impacts (i.e. changes in moisture) brought by the assimilation process is interesting.
We would like to thank Dr. Dominik Jaques for his valuable comments and corrections. We have accepted most of the remarks. In the following, we provide detailed answers to his questions/requests.
Perhaps the area that needs the most improvement is the description of results related to figure 5. As discussed in major comment 1 below, the description of certain scores is missing or unclear. There is also a labeling error in figure 5.
A detailed review of the changes carried out related to Fig. 5 is included later in this document in the Major Comments section.
Only one precipitation case is presented in this study. On the one hand, this allows for an in-depth analysis of the factors contributing to this precipitation event. On the other hand, this imposes a strong limitation on the generalization of conclusions drawn from the various analyses. Luck (good or bad) cannot be ruled out of the many factors influencing the forecasts. Interestingly, the analysis reveals that the assimilation of one radiosonde in the operational network has a significant impact on the forecasts being performed. One can wonder if the conclusions of the manuscript would have been different had this radiosonde been part of the extra high resolution observations being tested. The examination of only one precipitation event should not prevent the publication of this manuscript. However, the limitations that come from this should be emphasized in the concluding statements. Special care should be taken with respect to the model's treatment of moist processes (section 5a), which seems to be supported by other studies but which may only be applicable to this one case.
The reason for using just one case study was to be able to focus on the different impacts of each observation type. With a total of 21 simulations, covering 3 observation types (and their combinations) at 3 different resolutions, considering several cases would have been challenging.
We planned these experiments as an illustrative means of assessing the improvement potential of each observation type. Indeed, these experiments belong to a series of GPS assimilation experiments reproducing the whole autumn 2012 period, from which we gained further insights into the model biases regarding water vapour and precipitation. Analysing the impact of the different observation types on all cases of the 3-month period would not have allowed such an in-depth assessment. This is why we simulated IOP6 separately.
Nevertheless, as pointed out, the manuscript should clearly state that the findings relate to this one case study, and that generalisation of the results is therefore constrained. Several modifications have been included to clearly stress this point in the new version of the manuscript.
We have added a new subsection (2.4.1), within section 2.4 Verification metrics, to introduce how the percentiles are calculated. In a nutshell, we obtain 3-hourly precipitation aggregates for the grid points within the investigation area. The 99th percentile is obtained from the sample of all 3-hourly precipitation intensities at each grid point during the day of precipitation, i.e., for eight time steps during 24 September 2012. More details will be provided in the new version of the manuscript.
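The percentile calculation summarised above could be sketched as follows. This is only an illustration under stated assumptions: the input is taken to be 24 hourly precipitation fields, each flattened to a list of grid-point values, and all grid points and all eight 3-hourly steps are pooled into a single sample. The function names are hypothetical and do not come from the manuscript.

```python
from statistics import quantiles


def three_hourly_sums(hourly_fields):
    """Aggregate hourly fields into 3-hourly sums.

    hourly_fields: list of 24 lists, one per hour, each holding the
    precipitation (mm) at every grid point of the investigation area.
    Returns 8 lists, one per 3-hourly step.
    """
    return [
        [sum(values) for values in zip(*hourly_fields[start:start + 3])]
        for start in range(0, len(hourly_fields), 3)
    ]


def pooled_p99(aggregates):
    """99th percentile of all 3-hourly values, pooled over grid points
    and time steps (pooling is an assumption of this sketch)."""
    sample = [value for field in aggregates for value in field]
    # quantiles(..., n=100) returns the 99 cut points; index 98 is the 99th.
    return quantiles(sample, n=100, method="inclusive")[98]
```

If the percentile were instead wanted separately at each grid point, the same helper could be applied to the eight values of one grid point at a time.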

Major comment
The Fraction Skill Score is in the range [0,1] but the y-axis of panel c) goes from 5 to 35.

Because of this, most of the discussion on lines ~350-380 is difficult to follow and/or interpret. It is believed that this part of the text and figure 5 should be reworked before publication of the manuscript.
Panels b) and c) in Fig. 5 were wrongly interchanged. This has been corrected in the new version of the manuscript and the text has been reworked.
Still on the topic of verification, the use of anomaly correlation for the verification of precipitation in a day-to-day forecasting context is unusual and somewhat confusing. If the concept of anomalies makes sense in a climatological context, it is more difficult to apply in a weather context. In my understanding, the anomalies should refer to some departure from a preferred mode for the model solution. Because the modes of high-dimensional pdfs are generally difficult to estimate, they are often replaced by the average of a large number of such solutions. Many seasons are averaged for climate forecasts, many ensemble members may be averaged for ensemble forecasts. In the present case, for a single weather event in a deterministic context it does not seem possible to know the normal mode about which the anomalies could be estimated. In particular, the daily average precipitation for one case cannot be thought of as a normal baseline against which anomalies can be estimated.
That said, the correlation coefficient between two fields can be used in the context of verification. To avoid the confusion that arises from the concept of anomalies, it is suggested that correlations be estimated from the fields themselves. Just remove the \overbar{mod} and \overbar{obs} from eq. 5. The results previously obtained will be unchanged since Pearson's correlation coefficient is invariant to such offsets by constant values. As a final note, one should remember that due to its non-linear response, Pearson's correlation coefficient is difficult to interpret in the context of verification. This problem is discussed in the appendix of https://doi.org/10.1175/MWR-D-18-0118.1.
We agree with the reviewer that the precipitation average of one case cannot be understood as the normal baseline of the event, and we have changed this to a correlation of the full time series.
We have computed the correlation coefficient as suggested by the reviewer, removing the subtraction of the temporal means, and the result is unchanged.
However, the formulation of Eq. 5 remains the same, as it is the formulation of Pearson's correlation coefficient. We have adapted the text to better explain that, in Eq. 5, obs and mod stand for the spatially averaged precipitation at time step t=i measured by MSWEP and simulated by COSMO, respectively, without prior subtraction of the period mean, as suggested by the reviewer. The corresponding explanations will be included in the revised version of the manuscript.
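The invariance of Pearson's correlation coefficient to constant offsets, which is why the result did not change, can be checked numerically. The sketch below uses made-up eight-step time series purely for illustration; they are not MSWEP or COSMO values, and the function name is hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Illustrative 3-hourly area-mean precipitation (mm); invented numbers.
obs = [0.1, 0.4, 1.2, 2.5, 3.1, 1.8, 0.6, 0.2]
mod = [0.0, 0.3, 0.9, 2.9, 2.6, 2.0, 0.9, 0.1]

# Correlation of the raw series.
r_raw = pearson(obs, mod)

# Correlation after subtracting each series' period mean beforehand.
obs_mean, mod_mean = sum(obs) / len(obs), sum(mod) / len(mod)
r_anom = pearson([o - obs_mean for o in obs], [m - mod_mean for m in mod])
```

Both calls return the same value up to floating-point rounding, because the coefficient already centres each series internally.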

Most descriptions of results repeat a lot of information that can be read from the figures. This makes these descriptions quite lengthy and somewhat difficult to read. For example, the beginning of section 4.2 is especially hard to follow. The paragraph ~445-450 also repeats a lot of information accessible in the table being discussed. It is suggested that the description of results be shortened or summarized wherever possible.
We have shortened the description of the results wherever it was possible, aiming at providing clearer descriptions of the findings.

References to supplementary material: Often, figures found in the supplementary material are referred to alongside the other figures. For example, on line 377 we find "... western side of the Alps (Figs. 4b, S1b and S2b)." If the supplementary material will not be immediately available to the readers of the manuscript, it is suggested that the supplementary figures not be referred to directly. If these figures are necessary to the comprehension of the text, they should be included in the manuscript.
We have adapted the manuscript so that it no longer refers to the SM repeatedly. The manuscript now includes only the graphs needed to comprehend and validate the results presented.

Minor comments
If any minor comment posted by the reviewer is not answered here in our reply, it is because we have accepted the correction. Only minor comments that need further explanation are addressed here.

Line 133: Because of image compression, the red squares in figure 1b look a lot like circles.
This is true. Still, thanks to the colour difference between the operational soundings (blue triangles) and the high-resolution ones (red squares), we believe these two observation types are readily distinguishable by eye. Additionally, in line 133, we have replaced the word "squares" with "markers".

Equation 3: Out of curiosity, what is the value of "s" being used? Does it change with the resolution of the model or the observations being assimilated? Should it?
"s" is defined as the correlation scale and provides a factor for attenuating the assimilation impact when spreading the information horizontally. "s" varies with altitude and is a parameter pre-defined in the model. For humidity (q) and temperature (T), the correlation scale is given in km for pressure levels between 1000 hPa and 50 hPa; the values can be found in the model documentation (Schraff and Hess, 2012). As an illustration, the value applied for humidity at 500 hPa implies that the weight of the observation for the horizontal spreading is halved at a distance of 135 km from the observation's location.
We did not perform supplementary experiments varying this parameter, as our main goal was to assess the added value of the observation types and their impact on model variables rather than to assess how model parameters could be fine-tuned. We believe such experiments would fall outside the scope of the paper. Nevertheless, our interpretation is that adapting the values of s is more sensible for different observation types than for different model resolutions, so as not to degrade the information from closely neighbouring observations (as in the case of GPS due to its larger coverage). However, conflicts could arise from the use of too large a correlation scale for observations close to the surface at different resolutions. The better representation of the model's orography at 2.8 km and 500 m resolution could impose orographic boundaries that should be considered in order to truncate an overly large horizontal spreading. This aspect is discussed briefly in the revision in Sect. 2.2.1 The COSMO Model, the nudging scheme.
In figure 4d) we can clearly see artifacts caused by the inflow through the model boundaries. Visibly, it takes some time for the model's parametrizations to generate precipitation from the inflow through the boundaries. Presumably, some of the microphysical species being modeled are initialized at zero at the boundaries. While this does not seem to affect the main areas of interests for this study, this illustrates the difficulties associated with such high resolution forecasts. This phenomenon would probably be worth mentioning.
We have included this observation in the revision, in the description of Fig.4.
Line 334: "no dynamic impacts" In the Canadian system, the assimilation of radar-inferred precipitation through latent heat nudging is shown (see paper references above) to reduce RMSE for upper-level winds by a few percent on average over a two-month verification period (~110 forecasts). One would not expect to be able to observe such a small signal on the model dynamics for only one precipitation event.
This information has been included in the manuscript to provide further insight into why the nudging of thermodynamic profiles had a low impact on the wind components for our case study.

Line 421 Altitude-based corrections can sometimes be significant, especially in mountainous terrain where the difference between the model terrain and observation height can be large. Do we know if this is the case here?
We follow the procedure suggested by Bock and Parracho (2019), where stations with height differences (station altitude vs. altitude of the selected grid point) larger than 500 m are dismissed from the calculations. The IWV corrections applied to the remaining stations (dIWV/IWV = -4*10^-4*dh) are, averaged in time and space, no larger than 0.2 %. For a specific date, the corrections spatially averaged over all stations within the investigation domain RhoAlps are about 1 %, and for particular stations they can be as large as ±20 %. These corrections are necessary, especially over complex terrain, to account for the height differences. However, they have only a marginal impact (~1 %) on the results presented in Fig. 7 and Tab. 2, since the values presented are spatially and temporally averaged.
We acknowledge that the GPS black line is hard to see, precisely because of the good performance of the runs with assimilated observations, whose lines overlay the black GPS line. We have added the note "underneath the coloured lines" in Sect. 4.2 to make clear to the reader that the GPS line lies underneath the lines of the simulations with assimilated observations.
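As a minimal sketch of the height screening and correction described in this reply, the relation dIWV/IWV = -4*10^-4*dh and the 500 m threshold can be written as below. The function and constant names are hypothetical, dh is assumed to be given in metres, and the sign convention of dh (which altitude is subtracted from which) is an assumption, since it is not spelled out here.

```python
MAX_HEIGHT_DIFF_M = 500.0      # stations with a larger |dh| are dismissed
REL_CORRECTION_PER_M = -4e-4   # dIWV/IWV per metre of height difference


def correct_iwv(iwv, dh):
    """Return the height-corrected IWV, or None if the station is dismissed.

    iwv: integrated water vapour at the station (e.g. in mm)
    dh:  height difference between station and selected grid point (m);
         the sign convention is an assumption of this sketch.
    """
    if abs(dh) > MAX_HEIGHT_DIFF_M:
        return None
    return iwv * (1.0 + REL_CORRECTION_PER_M * dh)
```

At the 500 m threshold the relative correction reaches 20 %, which matches the upper bound of ±20 % quoted for individual stations.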
Line 532: In other instances of the text, the great heterogeneity of the moisture field is mentioned as a source of complications. It seems reasonable to assume that this likely explains why high moisture content was measured by only one sounding.
We have adapted the corresponding paragraph to clearly state that the large spatiotemporal heterogeneity and variability of atmospheric moisture might have played a decisive role in this measurement.
figure 10 show no obvious differences that would be statistically different between the various experiments. Since this section is quite detailed and the manuscript already long, it is suggested that this section be moved to the supplementary materials. If it is believed that the section should remain in the manuscript, lines ~560-575 should be reworked to improve readability.