Review of the first revision of the manuscript "Rainfall retrieval algorithm for commercial microwave links: stochastic calibration" by Wolff et al.
The authors have provided a substantial major revision. They have redone their analysis according to my suggestions and have updated the manuscript. The updated analysis did not change the results much, but the methodology is now much more sound. My only complaint is that the newly added text is often of poor quality and should therefore be revised carefully. In summary, I only have some minor and specific comments that can be addressed in a minor revision.
1. The impact of false positives and false negatives that result from the wet-dry classification could be discussed in more detail. Besides providing optimized RAINLINK parameters, this manuscript also shows the sensitivity of the wet-dry classification to its parameters. Most importantly, it also shows the challenge of wet-dry classification and its limitations, i.e. there is still a significant number of false positives and false negatives even after optimisation. This is important information and hence should be pointed out more clearly. See also my comment on section 3.2.1.
2. I am missing a table with the optimised parameters so that they can be grasped quickly. Maybe this info could just be added to Table 1 and Table 2. Or the optimised parameters could be shown together with their performance metrics in comparison to the default values and those from other RAINLINK calibrations. The latter option might be hard to fit into one table without making it confusing, though.
Specific comments (line numbers refer to the diff version):
L19-23: This still sounds as if there is an ongoing decline of rain gauges that is evident from the data availability plot of GPCC. Writing that the GPCC database "underwent a decline" sounds as if GPCC would get less and less data. As I understand the last GPCC report, which I referenced in my last review, this is not true. There is a constant increase of data. The largest portion arrives at irregular intervals and with large delay, though. There might be a global decrease of rain gauges which are in operation, but the GPCC data availability plot cannot be used to deduce such a trend. I suggest reformulating this section once more.
L62: I would not say that "data-driven solutions are not feasible for places or countries without sufficient reference data". The training can, of course, only be carried out in regions with sufficient reference data. But the trained methods, like the ones in the references that you cite, can potentially be used with data from any region. Transferability can be questioned, though. But this is also true for most other CML processing methods, which are typically developed with data from only one climatological region. The big disadvantage I see with data-driven solutions is that you cannot readjust them to a new dataset just by tuning two or three parameters. I suggest slightly rephrasing this new section.
L85: This new sentence is hard to understand. Please reformulate and/or split into two sentences.
L94: The part "..., also we..." does not sound like correct English to me. In any case, a new sentence could be started here.
L176: My question from the last review is still not answered: "How is this relative importance related to the parameter range that was selected?" Let's say you select a too-small parameter range because you do not yet know the sensitivity. Then the "relative partial effect", which, as far as I understand, depends on the absolute step size, will be very small for this too-small parameter range. So my question is not what the relative step size is, but how the parameter range, which influences the absolute step size, could impact the importance of a parameter in this analysis.
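To make this concern concrete, here is a minimal synthetic sketch (not the authors' code, and the response function is purely hypothetical): if the partial effect is estimated from finite differences over the sampled range, then narrowing the range directly shrinks the apparent effect, even though the underlying model sensitivity is unchanged.

```python
# Illustration of how the chosen parameter range can scale a
# finite-difference "partial effect" estimate (synthetic example).
import numpy as np

def partial_effect(f, lo, hi, n=101):
    """Mean absolute change in f over one sampling step within [lo, hi]."""
    grid = np.linspace(lo, hi, n)
    vals = f(grid)
    return np.mean(np.abs(np.diff(vals)))

# Hypothetical model response to a single parameter p.
f = lambda p: np.sin(p)

wide = partial_effect(f, 0.0, 3.0)    # generously chosen parameter range
narrow = partial_effect(f, 1.0, 1.2)  # range chosen too small

print(wide, narrow)  # the narrow range yields a much smaller apparent effect
```

The same model therefore appears far less "important" when sampled over the narrow range, which is exactly why the relation between the selected range and the importance ranking should be clarified in the text.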
L241-244: Since WD_p2 is now by far the most important parameter, this should be explained in the text. I would also like to understand why WD_p2 suddenly is so much more important than before.
L249: Remove "the" before "all solutions"
Fig 2.: It is hard to see relations between the different WD parameters. I suggest rearranging the individual plots into a scatter plot matrix, e.g. using https://ggobi.github.io/ggally/reference/ggpairs.html, because this way all relations and potential correlations would be visible. The distributions that are now shown at the bottom would also fit on the diagonal of the scatter plot matrix.
L271: "Due to the similar value of WDp1..." I do not understand this sentence. Why are data excluded due to similarity of WDp1 values?
Table 4: Similar to Table 3, what is the reason that the order of relative importance changed and that there is a clear leader, RR_p5, here?
L309: I do not understand what "bears to similarity" means.
L320: Here you probably mean RR_p4 and not RR_p5.
L330-336: It could be noted here that the k-R relation is not very sensitive to DSD variations for frequencies in the range of approx. 20-35 GHz. Compared to errors from wet antenna or wrong wet-dry classification, the DSD dependence of the k-R relation in this frequency range can be considered to be small.
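As a small illustration of why the retrieval is robust in this frequency range, consider the power law k = a·R^b used to invert attenuation to rain rate. The coefficient values below are placeholders for illustration only, not recommended values for any specific frequency; the point is that with b close to 1 the inversion is nearly linear, so moderate DSD-driven shifts in (a, b) translate only mildly into the retrieved R.

```python
# Illustrative inversion of the k-R power law k = a * R**b
# (placeholder coefficients; not frequency-specific values).
def rain_rate_from_attenuation(k, a, b):
    """Invert k = a * R**b for the path-averaged rain rate R (mm/h)."""
    return (k / a) ** (1.0 / b)

# Near 20-35 GHz the exponent b is close to 1, so the relation is close
# to linear and comparatively insensitive to DSD variations.
a, b = 0.1, 1.05   # placeholder coefficients
R = 10.0           # mm/h
k = a * R ** b     # specific attenuation in dB/km
print(rain_rate_from_attenuation(k, a, b))  # recovers R = 10.0
```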
L346: The MCC of 0.4 for the validation is significantly smaller than the minimum MCC of 0.53 of all "behavioral" solutions from which the mean parameters were taken. What is an explanation for this strong decrease in performance?
L353: Maybe write "wet-dry observations of the reference" to make it clear that these labels are derived from the reference.
L353-355: While 97% (number of dry data points in the validation period) and 93% (number of dry data points in the calibration period) are numbers which seem close to each other, I want to point out that the relative number of wet periods is more than twice as high in the calibration period (7%) compared to the validation period (3%). I am, however, not sure about the exact impact on the results, e.g. the significantly decreased MCC in the validation period. You might want to think about this issue and add a comment to the text.
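A small synthetic sketch of why this imbalance matters: even if the classifier's hit rate (TPR) and false-alarm rate (FPR) were identical in both periods, the MCC drops as the wet fraction shrinks. The TPR/FPR values below are hypothetical, chosen only to demonstrate the effect.

```python
# MCC as a function of the wet/dry class balance, with fixed
# (hypothetical) classifier hit and false-alarm rates.
import math

def mcc_for_prevalence(p_wet, tpr=0.80, fpr=0.02):
    """MCC for a given wet fraction, assuming fixed TPR/FPR."""
    tp = p_wet * tpr
    fn = p_wet * (1 - tpr)
    fp = (1 - p_wet) * fpr
    tn = (1 - p_wet) * (1 - fpr)
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den

# 7% wet (calibration) vs 3% wet (validation), as noted above:
print(mcc_for_prevalence(0.07))  # higher MCC
print(mcc_for_prevalence(0.03))  # lower MCC for the same classifier skill
```

So part of the MCC decrease in the validation period could be attributable to the lower wet fraction alone, independent of any real change in classifier performance.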
Section 3.2.1: In my opinion it should be pointed out here that the absolute number of false-positives is higher than the number of true positives. This is important for the interpretation of CML rainfall estimates because it means that more than 50% of the data points where CMLs estimate rainfall can be considered artifacts. As can be see in Polz et al. 2020 (https://doi.org/10.5194/amt-13-3835-2020) in Fig 9 this is not uncommon. The impact of the false-positives on the resulting rainfall amount is, however, smaller than the count of the false-positives suggest, as can be seen in in Polz et al. 2020 Fig 9d and 9f. In your case the impact of the false-positives on the rainfall amount might be different, though. Given the impact of false-positives on PBIAS in your analysis, the false-positive rain rates might play a larger role here. This should be discussed in more detail, maybe also in the conclusion section because the frequent occurrence and impact of false-positives seems to be a peculiar characteristic of CML rainfall estimates that all potential users or producers of CML QPE should be aware of.
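The distinction between the count share and the amount share of false positives can be illustrated with synthetic numbers (these are not the manuscript's data): spurious wet classifications typically carry low rain rates, so they can dominate the event count while contributing little to the rainfall sum.

```python
# Synthetic illustration: false positives outnumber true positives,
# yet contribute a much smaller share of the total rainfall amount.
import numpy as np

rng = np.random.default_rng(1)
tp_rates = rng.exponential(scale=2.0, size=100)   # true wet points: mm/h (synthetic)
fp_rates = rng.exponential(scale=0.3, size=150)   # false wet points: lower rates

count_share_fp = len(fp_rates) / (len(fp_rates) + len(tp_rates))
amount_share_fp = fp_rates.sum() / (fp_rates.sum() + tp_rates.sum())

print(count_share_fp)   # more than half of all "wet" data points are FP
print(amount_share_fp)  # but a much smaller share of the rainfall sum
```

Whether this holds for the manuscript's data is exactly the question raised above; reporting both shares would settle it.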
Fig 5: Why did the results for the default parameters change compared to the same plot, Fig 4, in the initial submission? E.g. KGE is now 0.37 for the default parameters. It was 0.45 in the initial submission for the default parameters.