[1] "getting shortnames from database"
Reading data of C:/LuftBlick/Review_AMT_2024/L2/Pandora25s1_HoustonTX_L2_rfus5p1-8.txt
Review amt-2024-114
General Comments
Overview
The presented manuscript proposes an alternative data flagging procedure to the standard PGN flagging for HCHO and NO2 column densities retrieved from MAX-DOAS and direct sun measurements, in order to increase the amount of data that can be used for scientific studies. As such, the topic of the manuscript is highly important for users of PGN data products. It can help data users and readers of this manuscript to better understand the standard flagging and, most importantly, to apply their own filter criteria with the presented approach, or even go beyond it. The authors use the linear correlation coefficient as a metric to validate their novel approach for both species, although the focus and interest is more on HCHO. The correlation of HCHO to surface O3, and airborne data for both HCHO and NO2, are presented as case studies.
The motivation to increase the sample size for scientific analysis is certainly important, but the reason why data are flagged still needs to be taken into account. Unfortunately, the review of the quality flags in the main part and the supplement is rather vague, with both missing parts and incorrect statements about the current flagging. Therefore, the manuscript would highly benefit from a more in-depth analysis of the standard quality flags to highlight the driving quality indicators that lead to the flagging, and from corrections of the statements about the current flagging.
Title
The title is misleading in terms of which PGN data products the approach applies to. The presented study focuses on HCHO and NO2, with a strong emphasis on HCHO. However, the Pandora data pool also covers O3, SO2, and H2O, which have not been demonstrated in the manuscript. For both O3 and SO2 there are no MAX-DOAS data products available, which limits the presented combined approach to HCHO and NO2. H2O would be available from both the direct sun and MAX-DOAS measurements, but was not considered here. Therefore, the presented approach is not generic enough to be applied to all Pandora data products, which should be properly reflected in the title.
Specific Comments
Here I refer to the lines of the manuscript:
18 change “PGN standard quality assurance” to “PGN standard quality flagging”, since the assurance part does not change the high, medium, or low quality categorization
25 Have other uncertainty components been analyzed?
26 The statement “independent uncertainty filter” is confusing. Does this refer to the “independent uncertainty” component reported in the L2 file or to the presented approach, which uses the “independent uncertainty”?
56 “interferences” would refer more to an optical problem. With respect to the Delrin problem, it was HCHO that was measured and retrieved, but it was not just the atmospheric HCHO in the light path.
60 replace https://blickm.pandonia-global-network.org with https://www.pandonia-global-network.org/ since not all data users have a blickm account. Moreover, blickm is a monitoring tool that does not provide reports, software, or data for download.
65 LuftBlick with capital “B”
75 With respect to the direct sun HCHO retrieval, I would additionally cite the Readme: https://www.pandonia-global-network.org/wp-content/uploads/2023/11/PGN_DataProducts_Readme_v1-8-8.pdf since Spinei et al. is the originator of the MAX-DOAS retrieval but not of the direct sun retrieval.
91 At the end I would mention that the approach has been demonstrated solely for HCHO and NO2.
92 H2O is also an official data product provided by the PGN, with rcodes wvt1 and nvh3 for direct sun and MAX-DOAS retrievals, respectively.
105 Not all Pandoras are stabilized at 20 °C; some are measuring at 15 °C, which highly depends on the location and environment where the instrument is set up.
111 The latency correction is not applied in a characterization step, and it is also not characterized in the laboratory, since that would require opening the spectrometer and flipping the CCD.
112 Stray light characterizations are not applied in processor version 1.8, which limits the stray light correction to the simple stray light method, i.e. subtracting the signal below 290 nm. The stray light correction matrix method is currently not applied.
161 Here the user might benefit from the information that the highest angle is used as reference in the spectral fitting. This further means that if this angle is contaminated by an obstruction (e.g. a tree), a spectral signal could enhance the wrms and thereby affect the data product. What was the azimuth angle of the datasets used in this study? Were the instruments looking in the same direction?
185-189 Uncertainties are not used in any part of the processor version 1.8 flagging procedure. It is true that the “total uncertainty” of processor version 1.7, which is the independent uncertainty, was used, but it was removed from the flagging criteria in version 1.8. The reason was that this parameter was too dependent on the instrument’s sensitivity and on the schedule/routines the Pandora is measuring, since longer exposure times typically result in larger independent uncertainties. However, processor version 1.8 provides a detailed uncertainty budget, which might also be of interest as a decision criterion for which data to use.
190-202 What is the reason for “much of the data” being unavailable? Which parameter is the driving quality indicator?
226 It is expected that the independent uncertainty overlaps across all quality flag categories, since it does not reflect issues on the L1 side (e.g. a small number of cycles) or on the L2Fit side (spectral features which cannot be captured). It would also not reflect an air mass factor error on the L2 side, if, for example, the instrument has been using the wrong PC time. This problem highly impacts the L2 columns in terms of the diurnal shape, which is of interest for satellites like TEMPO. Pandora25s1 at HoustonTX has this problem; here the affected periods have been categorized as unusable (20, 21, 22) by the quality assurance part. How does the presented approach account for such situations if no quality assurance has been applied to the dataset? This is not reflected in the independent uncertainty, because the instrument can still be properly aligned and looking into the sun.
255 Figure 6. How is the independent uncertainty related to the atmospheric variability parameter?
265 How is this threshold defined, and what is the objective approach behind it?
270-272 Is this improvement related to the data removal due to the wrms < 0.01 threshold?
Figure 10 As soon as MAX-DOAS comes into the recipe, the approach is not applicable for O3 and SO2. It would also be necessary to analyze H2O to demonstrate the applicability in a broader context.
322-324 This strong increase is great! However, since the flagging approach takes into account neither L1-related problems nor potential slant column biases in the spectral fitting (covered by the wrms), some justification is missing as to whether each retrieval is really usable or not. Is this increase already attributable to bypassing one or two of the standard flagging criteria? And if so, which ones?
355-360 Is the R^2 the proper measure to demonstrate the applicability? Under the assumption of a linear relationship, the correlation and R^2 should remain similar whether 100 or 1000 data points are used. If the R^2 values differ significantly, could this imply undersampling or a wrong assumption about the relationship? Can the R^2 of two different populations be compared directly? The relationship in Figure 12 appears slightly non-linear for the MAX-DOAS columns. Can you provide some uncertainty range for the R^2, for example by cross-validation or bootstrapping? Or is there an expected correlation between HCHO and surface O3 from the literature which supports a certain R^2 value to which the sample should converge?
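For illustration, such an uncertainty range could be obtained with a simple pairs bootstrap. A minimal Python sketch, assuming two paired arrays (the variable names hcho_columns and surface_o3 are hypothetical placeholders for the data used in Figure 12):

import numpy as np

def bootstrap_r2(x, y, n_boot=2000, seed=0):
    # Resample pairs with replacement and collect the squared Pearson correlation.
    rng = np.random.default_rng(seed)
    n = len(x)
    r2 = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)
        r = np.corrcoef(x[idx], y[idx])[0, 1]
        r2[i] = r * r
    # 95 % percentile interval
    return np.percentile(r2, [2.5, 97.5])

# hypothetical usage with paired samples:
# r2_lo, r2_hi = bootstrap_r2(hcho_columns, surface_o3)

Such an interval would also make the R^2 values of the differently sized populations directly comparable.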
487 What is meant by other methods?
524 Is the mean bias value showing some seasonality due to different mixing heights in summer and wintertime? In summer the MAX-DOAS would see a smaller fraction of the total column than in wintertime, which could result in a smaller mean bias in winter than in summer.
Table 2 Can you provide any uncertainty ranges of the R^2?
565 Is the wrms threshold of 0.001 site-specific or generally applicable? How is this 0.001 related to the wrms threshold of 0.01 reported in Figure 10 and on line 265?
575 It would be very interesting to see why so many data points are discarded, and whether this is related to one or two quality indicators. I encourage the authors to look into the L2 file, where all the needed information is reported (see the example of L2 flag propagation below).
Example of L2 flag propagation
The following example uses the direct sun HCHO data (rfus5p1-8) from P25s1 at HoustonTX. Starting from the left (L0 data amount), the stacked columns represent the errors on the different processing levels:
eL1: error code on the L1 data
eL2Fit: error code on the L2Fit data
eL2: error code on the L2 data
QF : data quality flag
This gives an overview of the detailed flag propagation from L0 to the data quality category, all of which is given in the L2 file of each PGN data product. From this it can be seen that there is one dominating parameter on the L1 side and two dominating quality indicators on the L2Fit side.
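Such a tally can be reproduced directly from the L2 file. A minimal Python sketch, assuming the usual PGN text layout (dashed separator lines delimiting the header, whitespace-separated data columns) and hypothetical 0-based column positions for the eL1, eL2Fit, eL2, and QF codes; the actual positions have to be taken from the column descriptions in the file header:

from collections import Counter

L2_FILE = "Pandora25s1_HoustonTX_L2_rfus5p1-8.txt"
# Hypothetical column positions; the real ones are listed in the file header.
COLS = {"eL1": 10, "eL2Fit": 11, "eL2": 12, "QF": 13}
counts = {name: Counter() for name in COLS}

with open(L2_FILE) as f:
    dashed = 0
    for line in f:
        if line.startswith("-----"):
            dashed += 1          # header blocks are delimited by dashed lines
            continue
        if dashed < 2:           # assumption: data start after the second dashed line
            continue
        fields = line.split()
        if len(fields) <= max(COLS.values()):
            continue             # skip blank or short lines
        for name, c in COLS.items():
            counts[name][fields[c]] += 1

for name, c in counts.items():
    print(name, c.most_common(5))   # dominating error codes per level

The most frequent codes per level then point directly to the driving quality indicators.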
Comment on quality flags
The quality flags are propagated from L1 to L2Fit to L2 and end up in the different clusters for high, medium, and low data quality, which can be unassured, assured, or unusable. This means that if a single retrieval is identified as low data quality on the L1 side, it cannot be of better quality on higher levels. If the number of dark cycles is too low, if saturated data already occurred on the L0 side, or if the spectrometer temperature is too far away from the temperature characterized in the laboratory, the data will already be flagged as medium or low quality. The same applies on higher levels: if, for instance, the L1 retrieval is of high quality, but the spectral fitting wrms is exceeded due to a spectral signal which cannot be captured by the retrieval polynomials, the data are flagged into the categories based on the threshold which is exceeded.
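The propagation principle amounts to taking the worst category across the levels. A one-line illustration of this principle in Python (not the actual processor code), with 0 = high, 1 = medium, 2 = low:

def propagated_quality(q_l1: int, q_l2fit: int, q_l2: int) -> int:
    # A retrieval flagged medium/low at L1 can never become high again later.
    return max(q_l1, q_l2fit, q_l2)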
The thresholds for some of the quality indicators come from the Gaussian Mixture Regression model approach. This approach is applicable to individual datasets, such as P25s1 HoustonTX. However, the PGN flagging does not use instrument-specific thresholds but a PGN average over multiple datasets. This could indeed lead to some datasets being flagged too strictly and some too weakly. Has this approach been tested, for instance using a HoustonTX-specific threshold of the wrms, so as not to bypass this filter?
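To illustrate what a site-specific threshold could look like, here is a sketch using a plain two-component Gaussian mixture on log10(wrms) instead of the full Gaussian Mixture Regression (an assumption for illustration, not the PGN implementation); the threshold is placed where the posterior switches from the low-wrms to the high-wrms component:

import numpy as np
from sklearn.mixture import GaussianMixture

def site_specific_wrms_threshold(wrms):
    x = np.log10(np.asarray(wrms)).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(x)
    good = int(np.argmin(gmm.means_.ravel()))   # component with the smaller wrms
    grid = np.linspace(x.min(), x.max(), 1000).reshape(-1, 1)
    post = gmm.predict_proba(grid)[:, good]
    # first grid point where the high-wrms component dominates
    crossing = grid[np.argmax(post < 0.5), 0]
    return 10.0 ** crossing

# hypothetical usage with the HoustonTX wrms values:
# wrms_threshold = site_specific_wrms_threshold(wrms_houston)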
More generally, the wrms is THE quality indicator of the spectral fitting. By bypassing this quality indicator as a flagging criterion, potential slant column biases introduced by spectral signals are ignored. The same applies to the unusable categories 20, 21, and 22, which should not be ignored.
On the other hand, there might be quality indicators which are too strict and can lead to the filtering of valid retrievals, which is the motivation of this manuscript. Those quality indicators would need to be identified. Maybe there are one or two quality indicators which are responsible for the majority of the filtering of ‘valid’ retrievals. It would be interesting to see whether simply removing them already leads to the same effect as the proposed approach.