Comment on amt-2021-141

The paper by Schneider et al. presents an extension of the former TROPOMI column data set published by the authors last year that was restricted to cloud-free scenes. It is great that the authors now can also retrieve the cloudy scenes and thus extend their previous data set. This allows now for a much wider scope of scientific application than the previous data set and presents therefore a quite valuable data set.

Thank you for your review and your positive evaluation of our work. In the following, all individual comments are quoted in italics and our response is given below.
I think the only drawback is that the structure of the paper is quite similar to the previous paper by the authors (Schneider et al., 2020). It would have been nicer and maybe more interesting to set the focus on the comparison of the old and new data set and the additional gain and application possibilities of the new data set instead of just extending the previous analyses with some more stations (though this of course is also nice and valuable).
The new data set (scattering retrieval) is independent of the old one introduced in the previous paper (non-scattering retrieval) and thus requires a separate validation. For comparison, we have additionally included the performance of the old data set. The direct comparison for the same scenes found at the end of Section 4.1 in the discussion paper has now been extended into a separate subsection and supplemented with a plot which we had left out in the initial submission due to the length of the paper.
I have following comments I would like to ask the authors to consider before publication.

P1, L1: The first sentence of the abstract is not clear without knowing what you have done. I would suggest rephrasing and being more precise.
We have rephrased this sentence as follows: "This paper presents an extended scientific HDO/H2O column data product from short-wave infrared (SWIR) measurements by the Tropospheric Monitoring Instrument (TROPOMI) including clear-sky and cloudy scenes."

P1, L4: Clouds are usually present over the oceans and over land. How does it then comes that you in case of cloud-free scenes derive data over land and if you consider cloudy scenes over oceans?
Cloud-free scenes can be retrieved only over land, because bodies of water are too dark in the short-wave infrared. Cloudy scenes are retrieved over both oceans and land. This is explained in the main text, but we have added a clause to the abstract to mention it there as well. The sentence now reads: "… particularly enabling data over oceans as the albedo of water in the SWIR spectral range is too low to retrieve under cloud-free conditions."

P2, L36-38: Same as for P1, L4. An explanation or more information would be quite helpful.
As explained in L31-32, the albedo of water in the short-wave infrared is too low to retrieve in cloud-free conditions over oceans. To make more clear that cloudy scenes are used over both oceans and land, the sentence in L38-40 is rephrased as follows: "This can be remedied by also considering scenes over low clouds, which enables data over oceans and greatly extends coverage over land. To this end, an updated retrieval is employed which accounts for scattering and estimates effective cloud parameters additionally to the trace gases."

P2, L40-41: This sentence is not easy to understand, please rephrase.
We have rephrased this sentence as follows: "Any loss of sensitivity to the partial column below the cloud is reflected by the column averaging kernel."

P2, L42: "Section 2….." this comes a bit suddenly. Add an introductory sentence to begin this paragraph of paper structure description as e.g. "The paper is structured as follows:" or write "In the next section we describe the retrieval set-up…."
We have rephrased it to "The next section describes the retrieval setup …"

P3, L59 and 61: fitted to what? Please be more precise.
We have rephrased this as follows: "The inversion derives the target trace gases H2O and HDO together with the interfering species CH4 and CO and a Lambertian surface albedo from the observed spectrum in the spectral window from 2354.0 nm to 2380.5 nm (Scheepmaker et al., 2016). The isotopologue H2(18)O is included in the forward model but not estimated in the inversion (i.e. the abundance is fixed at the prior value)."

P4, L99: "except if the data are assimilated using averaging kernels" I would suggest to make an extra sentence since also the data of all scenes can be used when the averaging kernels are applied. This does not only hold for data assimilation.
We start a new sentence after "recommended to be taken into account by the user" as follows: "If averaging kernels are taken into account, e.g. when assimilating the data, all scenes can be used, although shielding by high clouds may result in quite low information content."

P6, L101: Retrieve what exactly? A specific gas or trace gases in general? Also here I would suggest to be more precise.
We mean trace gases in general. We have rephrased the sentence as follows: "Such a surface albedo filter is not applied to cloudy scenes because clouds usually have high reflectivity, which allows the retrieval algorithm to work over very low surface albedos with high signal-to-noise ratio."

P6, L103ff: How is it guaranteed that the missing data that is filled in is reliable? Is there another filter used to check the quality or to filter out the non realistic filled up information?
The a priori profile, which is obtained from the ECMWF analysis product, is the best available estimate of the truth. Under cloudy conditions there will always be a null-space error. The restriction to low clouds (cloud height filter) limits that error. To avoid null-space errors completely, the user needs to take the averaging kernel into account.
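As an illustration of what taking the averaging kernel into account can look like in practice, the following is a minimal sketch. It assumes the common formulation in which a reference profile, given as partial columns per layer, is weighted by the column averaging kernel, and the prior fills the part of the column the retrieval cannot see. The function name, the three-layer grid and the numbers are hypothetical and are not taken from the data product.

```python
import numpy as np


def smoothed_column(avg_kernel, ref_partial_cols, prior_partial_cols):
    """Total column a retrieval with this averaging kernel would report.

    All inputs are per-layer arrays on the same vertical grid.
    Assumed formulation: sum_k [ A_k * ref_k + (1 - A_k) * prior_k ].
    """
    a = np.asarray(avg_kernel, dtype=float)
    ref = np.asarray(ref_partial_cols, dtype=float)
    prior = np.asarray(prior_partial_cols, dtype=float)
    if not (a.shape == ref.shape == prior.shape):
        raise ValueError("all inputs must share the same vertical grid")
    return float(np.sum(a * ref + (1.0 - a) * prior))


# Hypothetical case: the lowest layer is fully shielded by a cloud (A = 0),
# so the prior value (4) replaces the reference value (5) in that layer.
shielded = smoothed_column([0.0, 1.0, 1.0], [5.0, 3.0, 1.0], [4.0, 3.0, 1.0])
```

With a kernel of one in every layer the result reduces to the plain sum of the reference partial columns, which is why the comparison is free of null-space error once the kernel is applied to both sides.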
P6, 110: How was this correction factor derived? By comparing FTIR to TROPOMI? If yes, how can you then be sure that the correction factor holds for both data sets (old and new)?
The correction factor is determined by comparing TCCON to MUSICA-NDACC for instruments in both networks, since MUSICA δD is validated by aircraft measurements but TCCON HDO is not verified. The correction thus depends on FTIR measurements only and in particular is independent of the satellite retrieval. We have added a short summary on how the correction factor is determined, see below.
P6, L111: missing calibration of what? Be more precise.
TCCON HDO lacks an aircraft correction factor. Such a factor corrects systematic biases due to uncertainties in the spectroscopy, which tend to be highly reproducible. Usually, the aircraft correction factor is determined by a comparison to aircraft measurements at TCCON sites, but no such reference measurements of HDO exist. We have changed the manuscript as follows: "This factor accounts for a missing aircraft correction factor of TCCON HDO. The aircraft correction factor corrects systematic biases due to uncertainties in the spectroscopy which tend to be highly reproducible (Wunch et al., 2015). It is usually obtained from a comparison to airborne reference measurements at TCCON sites, however such measurements are lacking for HDO. Thus, Schneider et al. (2020) determined an effective factor by fitting TCCON a posteriori δD to MUSICA-NDACC δD because MUSICA-NDACC δD is validated with aircraft measurements."

P6, L127: Also in this paragraph you could be a bit more precise. So it seems the altitude difference is generally a problem and as higher the station as higher the problem/error gets? You could e.g. more clearly state here that this is the reason why a high-altitude stations are considered separately.
To explain this better, the end of this paragraph now reads: "High-altitude stations are typically located on mountains and thus most co-located ground pixels have significantly lower surface height. Therefore, such stations are treated separately in Sec. 4.3."

P7, L135: Extended to 0 hPa? This is really high. Is this realistic? Are measurements made that high up?
0 hPa is the top of the layering of the forward model. To explain that, we have added the following half-sentence: "… to match the layering of the forward model." Since the abundance of water vapour is very low at these high altitudes, the contribution to the total column is very small, thus this choice influences the result very little.

The correction factor can only be determined at the few stations that are in both networks, TCCON and NDACC. Most stations are only in one network, thus it is not possible to determine the correction factor individually at each station. We show that the difference between stations is small. Therefore, it is reasonable to use the average of the stations in both networks at all stations. To clarify this, we have appended the following sentence to the paragraph: "This correction is applied to all MUSICA-NDACC stations, i.e. also those not in the TCCON network."

P10, L160: Why should one fill the null-space with MUSICA-NDACC profiles if this data set is then used for validation? TROPOMI would then not be an independent data set. However, for validation rather an independent data set should be used.
The reason for such a potential filling of the null-space would be to account for the limited sensitivity of the retrieval; however, the prior profiles of MUSICA-NDACC are not realistic enough to do that.

P10, L173: This is a very complicated sentence and very hard to follow. Please rephrase and distribute the content over several sentences.
The sentence is rephrased as follows: "In order to derive total columns, the aircraft profiles are extended to the ground by assuming a constant mixing ratio equal to the lowest observed value and extended to the top with the scaled prior profile. These extended profiles are then vertically integrated to obtain total columns."
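The two steps in the rephrased sentence can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing used for the paper: layers are ordered from the ground upwards, NaN marks layers without aircraft data, the prior is scaled to match the uppermost observed value (the manuscript text does not specify the scaling), and the column is integrated over pressure with constant gravity and the molar mass of dry air. All names and numbers are hypothetical.

```python
import numpy as np

G = 9.80665           # standard gravity in m s^-2 (assumed constant with height)
M_AIR = 28.9647e-3    # molar mass of dry air in kg mol^-1
N_A = 6.02214076e23   # Avogadro constant in mol^-1


def extend_profile(vmr_obs, vmr_prior):
    """Fill layers below and above the aircraft profile.

    Below the lowest observation: constant mixing ratio (lowest observed value).
    Above the highest observation: prior scaled to the uppermost observed value
    (assumed scaling). Gaps inside the observed range are not handled here.
    """
    obs = np.asarray(vmr_obs, dtype=float)
    prior = np.asarray(vmr_prior, dtype=float)
    valid = np.flatnonzero(~np.isnan(obs))
    if valid.size == 0:
        raise ValueError("profile contains no observations")
    lowest, highest = valid[0], valid[-1]
    extended = obs.copy()
    extended[:lowest] = obs[lowest]
    extended[highest + 1:] = prior[highest + 1:] * (obs[highest] / prior[highest])
    return extended


def total_column(vmr, p_edges_hpa):
    """Integrate a layer-mean mixing ratio over pressure (molecules per m^2)."""
    dp = -np.diff(np.asarray(p_edges_hpa, dtype=float)) * 100.0  # layer thickness in Pa
    return float(np.sum(np.asarray(vmr, dtype=float) * dp) * N_A / (G * M_AIR))


# Hypothetical four-layer example with the model top at 0 hPa.
obs = [np.nan, 2e-3, 1e-3, np.nan]
prior = [4e-3, 3e-3, 2e-3, 1e-4]
edges = [1000.0, 800.0, 500.0, 200.0, 0.0]
column = total_column(extend_profile(obs, prior), edges)
```

The top layer contributes little in this example because the scaled prior is small there, in line with the remark that extending the profile to 0 hPa influences the result very little.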

P11, L181: add "which is then used for the validation"
Added at the end of Section 3.3.

P12, Figure 5: The figure could be improved by using lines and symbols that are a bit thicker/larger. Looking at the number of data points it looks like there are almost no data points from TROPOMI.
The symbols are now larger and the colormap has been changed.
A TCCON scan takes 2×78 s, so a TCCON station can record spectra every ∼3 minutes under good conditions. Although the measurement schedule varies by station, there can be many measurements during the co-location time of 2 hours around a satellite overpass. In order not to confuse the reader, we no longer show the number of TCCON measurements.

P13, Figure 6: Why is TCCON corrected? I thought MUSICA-NDACC is the data set that needs to be corrected?
Both TCCON and MUSICA-NDACC need corrections to resolve the inconsistency between the two datasets in H2O and HDO. TCCON HDO is corrected so that TCCON δD matches MUSICA-NDACC δD because the latter is validated (see Section 3.1). MUSICA H2O (and HDO by the same factor, so as not to change δD) is corrected because TCCON H2O is better validated than MUSICA-NDACC H2O (see Section 3.2).
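Why the two corrections act differently on δD can be seen from its definition. The sketch below assumes unscaled HDO and H2O total columns and the VSMOW reference ratio HDO/H2O = 3.1152e-4; the column values and scaling factors are hypothetical and are not the correction factors used in the paper.

```python
R_VSMOW = 3.1152e-4  # HDO/H2O abundance ratio of Vienna Standard Mean Ocean Water


def delta_d(hdo_col, h2o_col):
    """delta-D in per mil from unscaled HDO and H2O total columns."""
    return (hdo_col / h2o_col / R_VSMOW - 1.0) * 1000.0


# Hypothetical columns (molecules per m^2) giving delta-D = -150 per mil.
h2o = 1.0e27
hdo = 0.85 * R_VSMOW * h2o

base = delta_d(hdo, h2o)
# Scaling H2O and HDO by the same factor leaves delta-D unchanged.
same_factor = delta_d(1.1 * hdo, 1.1 * h2o)
# Scaling HDO alone shifts delta-D.
hdo_only = delta_d(1.02 * hdo, h2o)
```

This is the reason a common factor can be applied to MUSICA H2O and HDO without touching the validated δD, whereas a factor applied to TCCON HDO alone is what brings TCCON δD into agreement with MUSICA-NDACC δD.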

P14, L228: Why is this connected to the different conditions in cloudy and clear-sky weather? Isn't this simply due to the nature of the relative differences? These are usually more severe (higher) for lower values. The differences of course (i.e lower values and larger differences) may be related to cloudy or clear-sky conditions.
We do not completely understand this statement. We show in our validation that the relative bias in δD is comparable for clear-sky and cloudy scenes, but the absolute bias is higher for cloudy scenes than for clear-sky scenes. That means that the absolute δD values are systematically more negative in cloudy scenes, which we attribute to different weather conditions.

P16, L241: Refer here again to Figure 8. Add also a table with the biases? The approximate values can be derived from the figure, but if one needs the exact value a table would be quite useful.
This sentence does not refer to Figure 8, but to a separate comparison taking only the same ground pixels for both scattering and non-scattering retrieval into account. To make this difference more clear, we have made a separate subsection for this comparison.
The bias values at individual stations that can be read from the plot are in our opinion precise enough, especially since there is a large spread in the differences (see violin and box plots), so we do not consider a separate table necessary.

P17, Figure 9: As for Figure 5, the lines and symbols are too thin and the colors hard to differentiate. For the clear-sky data the corrected and uncorrected data are shown, but for cloudy only the corrected data are shown. Why? Shouldn't also here both been shown, corrected and uncorrected?
The size of the symbols is increased and the colours changed.
In the cloudy case, the retrieval is not sensitive to the partial column below the cloud, and thus to the station height, and is therefore not reliable. The station altitude is significantly higher than the maximum cloud height allowed by the cloudy scene filter.

P18, L278: What is the "scaled" and what the "depleted" prior?
We are not sure which place in the manuscript is meant. We have added the following sentences at line 247: "This prior is referred to as 'depleted' prior because a depletion in HDO is assumed to compute it from the humidity profile. The standard prior is also referred to as 'scaled' prior because it consists of a scaled humidity profile (i.e. corresponding to 0‰ δD)." We have also added a sentence in the figure caption of Figure 9: "The red points correspond to the standard prior which is scaled from the humidity profile, while the green points correspond to the prior computed assuming a more realistic δD profile."

P18, Figure 10: Also here the figure could be improved by using a thicker line style.
Line width is increased.

P21, L294:"The data reduction method is described in Sect. 3.3"? You mean the collocation criteria is described in Sec. 3.3?
We mean not only the co-location criteria but also the computation of the total column from the aircraft profile with limited height coverage. We have now written "co-location method".

P21, Figure 13: Use also here colors that are better differentiable and line styles that are thicker and better visible.
Symbol size is increased and colours changed.
P22ff: I really appreciate that you demonstrate the application of the data set, however, since your paper is already quite long and complex I wonder if it wouldn't be better to have an own, more sophisticated application paper where you also could include model simulations. You could only show here Figure 14 and give a short description of where exactly you gain more information and in which areas thus more sophisticated studies are possible. Another option would be to keep this example short. In that case, I would suggest skipping the details on Figure 14 (the discussion of the dD distribution, L307-L333, it does definitely not need to be that detailed) and put the focus only on the case study itself.
The case study based on single overpasses over the ocean really demonstrates the novelty of this dataset, thus we consider it an important part of the paper. Therefore, we concentrate on that case study and skip the detailed description of the monthly mean global plot (L307-L333). To provide a concrete insight into the benefit of δD total column data compared to H2O alone in process-based studies of the atmospheric water cycle, an additional figure with the distribution of the (H2O, δD) pairs in the tropics is shown in the new Fig. 17. This figure is briefly discussed in Section 5.1.
P23, L305: Here you jump from data coverage to different dD amounts. Of course, different weather conditions result in different dD amounts, but that relation should be better explained.
Yes, here we briefly draw the reader's attention to the fact that a comparison between the monthly δD distributions from the scattering and non-scattering retrievals is not trivial, because the monthly means result from sampling over different types of weather conditions, which makes a direct comparison over given regions difficult. We find this an important point and have therefore kept this statement. The discussion in lines 307-333 of the first submission was intended to explain exactly how different weather situations can lead to differences in δD. However, we agree with the reviewer's previous comment and have therefore removed that text in the interest of keeping the paper focussed and short.