After reading the paper a second time I must admit that I have even stronger reservations than the first time. Instead of my initial recommendation "major revision" I now disagree with publication in a form which resembles the present structure of the paper. In general I note that the revisions made in response to my comments have been minor, not addressing in full my major concerns.
Let me clarify my judgement by revisiting the major general comments from my first review (reproduced in quotes):
"My first major reservation: the purpose of the paper is the formulation of the list of recommendations for a more uniform and complete error reporting in level-2 satellite data products (see the last line of the abstract and section 7). However, the bulk of the material presented in the paper is basically a review of real-world implementations of optimal-estimation based (and related) profile retrievals. As such the authors could consider to split the document into two papers, a review of profile retrievals and a shorter more focussed paper about unified error reporting."
"Several sections of the paper are providing useful functional background information for section 7. But for quite some sections I could not find the link with the final recommendations. Examples are section 5.4 and also parts of 4, 5.1 and 6.4 (e.g. 6.4.3 to 6.4.6), 6.7. Because of these sections the paper is very long. "
"Section 5.4 is a review of (profile) retrieval approaches, but contains a lot of material which is not directly relevant for the paper. Personally, I would propose to shorten this section, keeping possibly the tables (and references) and keeping those remarks which are important in the context of error reporting. This review-like section also leads to a very long list of references. It would be good to mention only those references that bring new information to the discussion how to present the retrieval errors. "
In their response the authors now state that the paper is meant to be a review. However, this is not at all clear from the title and abstract. The abstract contains several sentences which would fit better in the body of the paper. Only the last sentence of the abstract specifies what the paper will contribute to the existing literature, namely a list of recommendations. This is in contrast with some words that have now been added to the introduction about the paper being meant as a review. The link between the conditions section (2) and the recommendations section (7) on the one hand, and the rest of the paper on the other hand, is still not very strong.
The authors have responded to this major reservation, but only by implementing minor fixes such as cross-references and clarifications of links between the sections. This does not address my more fundamental observation that the review part of the paper does not serve the recommendations part at the end of the paper.
I would strongly urge the authors to re-think the purpose of the paper. I suggest:
- either a review of profile retrievals (focussing on the error formulation),
- or a shorter recommendation paper on how to present retrieval data products to users,
- or both (two papers).
In its present form I think the material presented in the paper is not fully compliant with either a review or a recommendation paper. Resolving this would require major adjustments to the text and the logical structure of the paper. Currently the paper is very long because of its review nature.
"After reading the first sections of the paper it was not fully clear to me what is really the problem which is addressed? In what sense are retrieval products not comparable? Please provide (generic) examples of retrieval products which miss information which makes a direct comparison between retrievals, or comparisons with independent data difficult or impossible. In what sense is there a need for a new set of recommendations, e.g. what is missing after the work of e.g. QA4EO or the GUM?"
The authors address this first point with an extra sentence in the introduction, which is useful. However, I would like this to be discussed more systematically/methodologically in the main text with a clear link to the recommendations.
"The final set of recommendations are focussing on profile retrievals. But the tables include also total column retrieval examples (e.g. DOAS). I think this is a missed opportunity, and I would encourage the authors to formulate explicitly what their recommendations are for column retrieval products (some recommendations are generic, but several parts of section 7 explicitly refer to profiles)."
The authors clarify how the formalism (sec 6.4) includes column retrievals as a limiting case. In practice, total column products are organized differently from profile products, and error treatments also differ. Many satellite products are total column products, so I really miss a more detailed discussion of the variability in total column approaches. The addition of specific recommendations on how to harmonise column products would be very useful.
"Arguably the atmospheric composition data assimilation community is the main user of satellite retrieval products. This community and their needs are basically not discussed in the paper. More generally the users of the data do not receive much attention, and the requirements are discussed from a L2 data provider point of view. This is my second major reservation. Some parts of the text refer to the validation activities, but this is not presented in a very structured way. The needs and feedback from the validation and assimilation communities on existing L2 satellite products would be an important starting point to discuss requirements for satellite data products. Some assimilation users would prefer to work directly with the level-1b data, an option which is also not discussed. "
I still find it hard to accept that recommendations are given on how to report the errors without referring to the user community. Different applications have different needs. The authors mention that a companion paper is addressing this. The reporting is the interface between data provider and user, so both sides should be addressed. As an example, there are also very practical considerations like file sizes: error information can easily become by far the largest part of the dataset. The role and activities of the space agencies (NASA / EUMETSAT / ESA / CNSA / JAXA) in unifying data products should also be mentioned.
Personally I think a generalised set of recommendations for the L2 retrieval teams that fit all (unspecified) applications of the data is of limited use.
"The recommendations in section 7 are not always formulated as a recommendation, but leave room for interpretation and implementation. I sometimes found the CoA points in section 2 even more clear and explicit than the recommendation points. It may be useful to split the list in section 7 in actual (strong) recommendations and related discussion points. Sometimes it is not so clear what is actually recommended by the authors, e.g. due to a trade-off between completeness and data volume, or aspects are left to the retrieval teams to decide (e.g. point 1, 2, 3, 4, 16, 18). "
This is acknowledged by the authors, but the number of recommendations has not changed, and the authors have not divided the list into recommendations and related discussion points. I still think that reducing the number would make the concluding recommendations more useful.
"I was expecting recommendations also regarding the naming (see section 3). The authors discuss in particular 'error' versus 'uncertainty', but do not really provide a clear guidance on what to use. Also, the consistency or inconsistency with the GUM activity are not clear to me after reading the section. The reader is referred to a paper in preparation. Retrieval data files contain parameters labelled as 'precision', 'accuracy', 'trueness' etc. and different guidelines exist from different space agencies and for different application areas. It would be useful if the authors can discuss naming conventions also in this paper and express a clear opinion/recommendation."
The authors reply that this would always be in conflict with part of the community. I had hoped for a stronger recommendation. For instance, in the data assimilation community there have been papers devoted to unified naming and notation. Could the work of Rodgers not serve as a starting point, since much of the retrieval formalism was discussed in a systematic way by him? I think there is a need for recommendations on which word means what.
"Machine learning approaches are getting more and more popular and deserve some special attention. Several machine learning implementations for retrievals are limited on the error information they provide. It would be useful to have some targeted recommendations for these approaches as well."
A discussion on machine learning is added in the final discussion section. The bulk of the paper addresses more traditional optimal estimation type profile retrievals based on full radiative transfer models. It is not clear to me (and likely to the reader) to what extent the recommendations are general and apply to important classes of alternative approaches as well, such as the various forms of machine learning which will become much more prominent in the future, or popular approaches such as DOAS.