Revisions have improved the manuscript, helping me understand both the new retrieval and the existing SCA retrieval better. I have some new questions about the various hypotheses about what makes the new retrieval better, and I think these could be clarified. Additionally, the authors have decided to completely remove the analytically derived error estimate from the manuscript because it was found to be incorrect, and I agree that removing the incorrect derivation was the right decision. However, this leaves some cases without any error characterization. Because this retrieval is still very impacted by large measurement noise (despite being much improved over the existing retrieval), characterizing the distribution of error in the retrieved results is quite important.
Line numbers in the following comments refer to the "Tracked changes" version of the document, amt-2021-212-ATC1.pdf.
Major Theme 1: there are various stated reasons for why the current method is better than the SCA; it would be good to emphasize if there is just one that is the key reason (which I think there is), and downplay those that are more minor.
First, regarding the paragraph at L250-256: thanks to the authors for adding this note about how the MLE and SCA solutions are algebraically the same when the box constraint is removed from the MLE and the zero-flooring is removed from the SCA. This insight was very helpful for my understanding of both retrievals, and makes the paper better. But this better understanding has now caused me to question several other details, and I think it is possible to present a more consistent picture throughout the manuscript of how the methods differ and what the impacts are.
The various reasons are first introduced at the paragraph at L76-91. (1) "we account for the noise of the signals" (2) the new retrieval is simultaneous with respect to backscatter and extinction (3) constraints are applied (which are applied differently than the zero-flooring of the SCA) and (4) "dominant anti-correlated noise that originates from the cross-talk correction" is "automatically detect[ed] and suppress[ed]".
Reading between the lines of the manuscript, it seems the SCA implements its zero-flooring constraint after the mathematical retrieval of backscatter and extinction. Does this mean it breaks the connection with the measurements, such that the final reported backscatter and extinction solution cannot reproduce the measurements well? If that is correct, I think that is probably the key difference: the new retrieval, in contrast, implements the box constraints as part of the retrieval and requires the retrieved backscatter and extinction to be consistent with the measurements subject to the constraints. So it is a better retrieval because it is self-consistent and preserves consistency with the measurements. Do the authors agree? If so, this is essentially the same as the reason marked (3) above, but I believe it should be possible to clarify the introduction, discussion and conclusions to make it more obvious. It would also be helpful to remove or downplay the other, less informative reasons, or to relate them to this main reason.
As for the other three reasons, in reverse order: first, the "dominant anti-correlated noise". In my first review, I had trouble interpreting the impact of the anti-correlated noise, and the authors' response says it is actually not important. In that case, I think this reason should be dropped from L84.
At L 520 "With this [the introduction of constraints] the anti-correlated noise in the cross-talk corrected signals can be traced back and effectively suppressed", I offer an alternate version that I think is more in line with the author response and revisions in the Appendix: "Since the box constraints are integral to the simultaneous retrieval of backscatter and extinction, noise is suppressed in both products simultaneously, in contrast to the zero-flooring of the SCA which works on the channels independently without regard to the fact that the errors are correlated due to channel cross talk (Appendix A)."
Next, the simultaneous retrieval of backscatter and extinction. I think this is important too, but not in the way I first interpreted the writing. L 516 says "A coupled retrieval may improve the precision". However, the authors have now convinced me that if it weren't for the constraints (box constraints for MLE and zero-flooring for SCA), both retrievals are the same algebraic noise-fitting solution, since they have the same number of measurements as unknowns. So, in the absence of the constraints, it doesn't matter at all whether the backscatter and extinction are retrieved sequentially or coupled. Rather, as the authors emphasize, it's the better implementation of constraints, one that minimizes disagreement with the measurements, that improves the precision. It's much easier to implement these better constraints in a simultaneous retrieval, of course. So, I think it's something like "A retrieval that implements constraints simultaneously with the retrieval of backscatter and extinction improves precision".
Finally, does accounting for the noise in the signals improve the retrieval? As the authors have pointed out in the discussion and revisions, there are the same number of unknowns as measurements, and so the solution (in the absence of the box constraint) is just the algebraic solution. In that case, it doesn't matter what the measurement error covariance matrix looks like; the same solution will be found. What about with the box constraints? Even then, with no prior or regularization term, I believe the minimum of the cost function is still independent of the measurement error covariance matrix, although that minimum won't be zero. In the implementation of the retrieval, if the iterations are cut off at some threshold that depends on the measurement error covariance matrix, then of course that matrix would impact the solution in that way, but if the algorithm actually converges to the minimum of the cost function (which I believe is the intent here) then I think the measurement error covariance matrix will not impact the solution. Am I correct?
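To make the unconstrained part of this argument concrete, here is a minimal numerical sketch with a toy linear forward model (not the authors' model; all names and numbers are illustrative): with as many measurements as unknowns, the weighted least-squares minimum is the algebraic solution for any diagonal error covariance. The box-constrained case remains my open question above.

```python
import numpy as np
from scipy.optimize import minimize

# Toy LINEAR forward model with as many measurements as unknowns
# (illustrative only). Without constraints, the weighted least-squares
# minimum is the algebraic solution F^-1 y, whatever the diagonal
# measurement error covariance S is.
rng = np.random.default_rng(0)
F = rng.normal(size=(3, 3)) + 3.0 * np.eye(3)   # invertible forward model
y = rng.normal(size=3)                          # "measurements"

def cost(x, s_inv):
    r = y - F @ x
    return r @ (s_inv * r)                      # r^T S^-1 r with S diagonal

x_exact = np.linalg.solve(F, y)                 # algebraic solution

# The same minimizer is found for very different diagonal weightings.
for s_inv in (np.ones(3), np.array([1.0, 10.0, 0.1])):
    res = minimize(cost, np.zeros(3), args=(s_inv,))
    assert np.allclose(res.x, x_exact, atol=1e-3)
```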
If so, then almost every mention of the measurement error covariance matrix is confusing and somewhat spurious, and should probably be reconsidered. E.g. Line 80 (mentioned above, where better performance is attributed to accounting for signal error); L 273, where I understand of course the desire to avoid the complication of the off-diagonal terms in the covariance matrix, but if the covariance matrix doesn't impact the solutions, perhaps this isn't very relevant; and L 279, "As pointed out by Povey et al (2014) unbiased estimates are a prerequisite": in Povey et al, it's a prerequisite for an optimal estimation retrieval that has a prior term that must balance with the measurement term, but this retrieval does not have that feature.
Major Theme 2: There's a need for better characterization of the error distribution of the box-constrained MLE retrieval results. The authors have shown that the new retrieval produces a better-looking solution than the SCA and SCA MB retrievals, but they also have shown that it is still quite impacted by measurement noise and the resulting error is quite significant (100% or more). For that reason, the results should not be used without an understanding of their uncertainty. The simulation cases, therefore, are a hugely important part of this manuscript. It is also important to include some kind of estimate of the spread of solutions for the included real data cases as well. Here are the easiest ways I can think of to do this.
Figure 7. Each solution appears to be an average of multiple bins. Can the figure include an indicator of the actual spread of solutions obtained from each averaging interval?
Figure 8. I appreciate that this case is harder, since it is only two individual profiles. One approach is to produce another simulation with profiles taken from the solution of this real data case (i.e. same backscatter and extinction as one of the profiles or the mean of them). If it is difficult to use the instrument simulator to do this, then I think it would be acceptable to use the assumed measurement error covariance matrix (i.e. from Eq 13) to estimate it. (This is the clarification of my earlier suggestion that the authors requested.)
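To clarify the suggestion further, the covariance-based estimate I have in mind is a simple Monte Carlo one. A sketch follows, where `retrieve` is a hypothetical stand-in for the box-constrained MLE retrieval (the real retrieval would be called in its place) and all numbers are illustrative:

```python
import numpy as np

# `retrieve` is a placeholder for the box-constrained MLE retrieval; only
# the perturb-and-rerun sampling logic is the point of this sketch.
def retrieve(signals):
    # placeholder "retrieval": some clipped function of the signals
    return np.clip(signals.mean(), 2.0, 200.0)

rng = np.random.default_rng(1)
signals = np.full(24, 60.0)    # measured profile (illustrative)
sigma = np.full(24, 15.0)      # assumed error std. dev., as from Eq. 13

# Draw perturbed measurements from the assumed (diagonal) error covariance,
# re-run the retrieval on each draw, and report a percentile spread.
samples = np.array([retrieve(signals + rng.normal(0.0, sigma))
                    for _ in range(2000)])
lo, hi = np.percentile(samples, [25, 75])
assert lo <= np.median(samples) <= hi
```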
Ultimately, the authors would like an analytical error calculation since only an analytical solution is believed to be fast enough for practical use in real data processing. However, since both the input errors and the forward model (when box constraints are included) are non-linear, any analytic solution with assumptions will be hard to accept until sufficient research and analysis demonstrate consistency with existing numerical solutions for real data. So, a fair number of numerical solutions will be required to be calculated anyway. This underscores my hope that it's not unreasonable to wish for numerical results for the specific cases in the manuscript.
I would also suggest completely removing the analytic error propagation, e.g. the paragraph at lines 324-334 and Appendix C. The authors know it does not correctly represent the error propagation of their retrieval because it doesn't include the box constraint. For that reason, they have deleted all the results relating to this error propagation. So, it should not still be included in the methodology. I think everything after "whereas" at line 325 should be deleted (as well as Appendix C). If desired, the authors can make a short statement acknowledging that an analytical error propagation for the current retrieval would have to include a way of representing the impact of the box constraint, which is a topic of future work.
Minor Theme 1, related to Major Theme 2. The representation of the errors for the simulation cases can be improved.
Figure 2 and Figure 4 make clear that the standard deviation is not a good representation of the spread of the errors in the box-constrained MLE method (so perhaps for the SCA and SCA MB as well), in that the shaded area for the lidar ratio goes well below 2 sr, although the constraint makes it impossible for any solution to lie in that range. I encourage the authors to remake the figures with a different visualization of the spread that is more representative of the actual range of the results. An alternative is to use a percentile spread, such as the 25-75% limits. (In Figure 3, since the standard deviation is shown along with a bias statistic, I think it's more acceptable.)
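A small synthetic example of the concern (numbers purely illustrative, not from the manuscript): for a skewed quantity hard-bounded at 2 sr, the mean-minus-one-standard-deviation band extends below the bound while a percentile band cannot.

```python
import numpy as np

# Synthetic "solutions" bounded at 2 sr with a skewed tail: 90% of values
# at 2.5 sr, 10% at 32 sr (illustrative only).
rng = np.random.default_rng(2)
s = rng.choice([2.5, 32.0], p=[0.9, 0.1], size=10_000)

mean, std = s.mean(), s.std()
p25, p75 = np.percentile(s, [25, 75])

# The +/- 1 std band dips below the 2 sr bound although no solution lies
# there; the 25-75% band respects the bound by construction.
assert mean - std < 2.0
assert p25 >= 2.0
```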
Previously I asked about the potential for bias caused by the box constraint algorithm and the characterization of the error output to account for potential bias. The authors have added a line late in the paper that suggests that an observed bias may be due to the box constraint, and I appreciate that, but they also admitted they do not have a good understanding of the algorithm. I think more analysis is required to understand how the algorithm affects the distribution of solutions. I suggest one way to gain a better understanding is to show a histogram of the solutions from the synthetic cases, particularly a histogram of the lidar ratio with mean and median marked. For instance, it would be good to see the behavior near the edges (e.g. 2 sr), whether the histogram looks truncated or rather "piled up".
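To illustrate what this diagnostic could distinguish, here is a sketch on synthetic data (all numbers hypothetical): a distribution of "lidar ratio" solutions piled up at the 2 sr box edge, as hard clipping would produce, versus one that is merely truncated there.

```python
import numpy as np

# Hypothetical unconstrained "solutions", then two edge behaviours:
rng = np.random.default_rng(3)
raw = rng.normal(5.0, 4.0, size=20_000)
piled = np.clip(raw, 2.0, None)        # clipping -> spike exactly at 2 sr
truncated = raw[raw >= 2.0]            # rejection -> smooth edge, no spike

edges = np.arange(2.0, 12.0, 0.5)
h_piled, _ = np.histogram(piled, bins=edges)
h_trunc, _ = np.histogram(truncated, bins=edges)
frac_piled = h_piled[0] / piled.size   # mass in the first bin [2.0, 2.5)
frac_trunc = h_trunc[0] / truncated.size

# Pile-up shows as an outsized first bin, and as the mean pulled away
# from the median.
assert frac_piled > 3 * frac_trunc
assert np.median(piled) < piled.mean()
```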
Figure 3. I like the new Figure 3 very much; it is very helpful to see the behavior specifically in the region with significant aerosol, and helpful to see a bias calculation paired with the standard deviation, and to have equations at line 364-365 specifying the statistics. I would also like to see panels showing the statistics for lidar ratio (which is, after all, a directly retrieved quantity from the MLE retrieval.)
Figure 5. Figure 5 should have the addition of the zoomed in boxes, like Figure 3 has.
Minor theme 2. Items related to the discussion of ratioing of signals and discontinuities where the range bin size changes.
L 367-370 (a). First, I suggest replacing "seems to be triggered by the refined range bin" with a more definite description of the observation, holding the hypotheses for the next sentence. That is, "the bias is colocated with the change in range bin size". I suggest this because I think the change in range bin size only makes the problem more obvious, but does not actually cause the problem (more below).
L 367-370 (b). Next, about the first theory about the bins that are not uniformly filled: does this make sense? Wouldn't the requirement for uniformly filled bins be more badly violated by large bins and better met by small bins? If the suggestion is that it is worse here due to the dramatically increasing slope of the aerosol extinction profile, then the bias would logically be colocated with the slope but only coincidentally colocated with the change in bin size. Flamant et al. is quoted elsewhere as predicting a bias in extinction due to this reason, but is it also expected to affect backscatter, as here? If this part is kept, Flamant et al. should be referenced here, with a specific description of their work showing how non-uniformly filled bins cause a bias.
L 367-370 (c). Finally, I find the other theory more convincing (ratioing of noisy signals). But why doesn't it apply to the MLE retrieval as well?
L412-413, "due to its noise suppression capabilities" is presumably the answer to my question but I find it somewhat vague. Can it be made more specific?
L579. "if this varying reliability of the signals is not taken into account". I believe from L367-370 that the bias is linked to taking the inverse of a noisy signal, not because of a discontinuity in bin-size, although it is more noticeable because the discontinuity in bin-size results in a discontinuity in the bias.
Miscellaneous minor suggestions in line-number order.
L 37. Illingworth et al. 2015 is a secondary reference with respect to the idea that the indirect effect of aerosol on clouds is the largest uncertainty in radiative forcing. I suggest referencing some primary sources, or a major climate change review such as the IPCC report.
L303. "not invariant under variable transforms". While admittedly I only quickly skimmed Zhu et al (1997), it caught my eye that they say the algorithm is indeed invariant under transforms with the exception of the first step away from the first guess. This suggests to me that efforts to find a better first guess might be better rewarded than attempts to find a better transform to deal with a poor first guess. The aerosol-free atmosphere is a difficult first guess to work with. The standard HSRL technique provides backscatter in one algebraic step. Perhaps consider calculating backscatter from the ratio of the channels, and with this estimate optical depth using your first guess lidar ratio of 60. (I am not suggesting this is required as a response to this review but offer the suggestion in case it's helpful.)
L312. I'm curious about the description of running 40000 iterations so that "the estimate should fit as close as possible to the signal data". Does the cost function really continue to decrease for 40000 iterations? I believe it's common for the cost function to begin to jump around after a while and not continuously decrease. I agree with the point that cutting off the iteration prematurely leaves some unnecessary impact from the first guess (especially if the measurement error covariance matrix is not strictly correct), but I do think it should be cut off when the cost function ceases to decrease.
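For concreteness, the stopping rule I have in mind is the standard relative-decrease tolerance, shown here on a toy cost function (not the authors' retrieval); SciPy's L-BFGS-B implementation, for example, exposes exactly this as `ftol`:

```python
import numpy as np
from scipy.optimize import minimize

# Toy smooth cost with a box constraint: declare convergence when the
# relative decrease of the cost falls below `ftol`, rather than running a
# fixed 40000 iterations.
def cost(x):
    return np.sum((x - np.array([3.0, -1.0]))**2) + 0.1 * np.sum(x**4)

res = minimize(cost, x0=np.zeros(2), method="L-BFGS-B",
               bounds=[(-10.0, 10.0)] * 2,
               options={"ftol": 1e-12, "maxiter": 40_000})

# Convergence is reached long before the iteration cap.
assert res.success
assert res.nit < 1000
```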
L 378. "the feedback" is vague. Does it mean that when the SNR decreases there is a greater proportion of negative solutions that get filtered out by the zero-flooring, and therefore bias the mean solution?
L380. At the start of the added section, it would be good to say "with the exception of the bin closest to the surface". This is mentioned in the following paragraph, but the first paragraph is confusing with this omission.
L 469. Copolarized lidar ratios of 80 sr to 120 sr for depolarizing desert dust are attributed to Wandinger et al. 2015. Does that paper really present copolarized lidar ratios? Or is this a calculation of the authors' based on non-polarized lidar ratios from Wandinger et al. 2015?
L 415-421. What does the averaging kernel look like below the cloud in the region that has up to 100% bias and up to 500% relative error, but where the average lidar ratio "remains quite accurate"? It would boost confidence in the conclusion if the averaging kernel also shows that the optical depth is not reliable but the lidar ratio is.
L 517. Geometric overlap? Is that relevant for a satellite lidar like AEOLUS? Is this sentence meant as a more general discussion that also encompasses ground-based lidar?
L 517. I can't see the cross-talk calibration as part of any tradeoff between a coupled or sequential retrieval, since errors in the cross-talk calibration will significantly impact either style of retrieval.
L 522 (approximately). It should be repeated in the conclusions that in the simulations moderate amounts of aerosol still cannot be distinguished from zero (despite the fact that the new retrieval does significantly better than the existing one). This can be part of the motivation for the future work with signal accumulation. (And by the way, the manuscript has quite a good explanation of the motivation for the scene-based retrieval strategy.)
L 594. "the contribution from the Mie channel in the particle-free atmosphere is pure noise". But C1=C4=1, so the molecular backscatter is distributed evenly across the two channels, so I don't think this is true. Perhaps "the signal in the Mie channel in the particle-free atmosphere is more than half noise".
L35. I suggest avoiding the awkward parentheses. Specifically, I suggest removing the parentheses around "optical" and simply deleting "(change)".
L40. Consider deleting "so-called". "So-called" usually has a connotation that the speaker does not agree with the label or as a way of using an informal-sounding name in a more formal context, neither of which apply here. I know the intent is "what is called" but it's redundant here anyway, so could just be deleted.
L276-277. The sentence that starts "This accounts" is confusing, and I'm not sure I really understand it. Can the authors please reword this?
L297. Consider changing to "can cause the retrieval results to underestimate the true particle extinction by a factor of 16, and therefore underestimate the lidar ratio".
L314. I didn't follow what "even in unfavorable conditions" refers to.
L386-387. I suggest deleting "likely due to the diminishing influence of the lowermost optical depth on the cost function". The insensitivity of the cost function at this point does not determine whether the retrieval will over- or underestimate, though it does show that a large error was to be expected, so it is not speculation. The second part of the sentence can stand on its own without the "likely" clause.
L394. Not everywhere but nearly everywhere.
Figure 6 labels and caption. "Lidar ratio" should be "copolarized lidar ratio" (everywhere, but particularly important where measured data is described).
Figure 7 caption. The sentence beginning "The upper error bound" is no longer relevant.
L505. The sentence "It should be stressed...lower error margins" is no longer relevant.
L579. Change "if" to "since"