Validation of MUSES NH3 observations from AIRS and CrIS against aircraft measurements from DISCOVER-AQ and a surface network in the Magic Valley
Karen E. Cady-Pereira
Xuehui Guo
April B. Leytem
Chase Calkins
Elizabeth Berry
Markus Müller
Armin Wisthaler
Vivienne H. Payne
Mark W. Shephard
Mark A. Zondlo
Valentin Kantchev
Download
- Final revised paper (published on 05 Jan 2024)
- Supplement to the final revised paper
- Preprint (discussion started on 31 Jan 2023)
Interactive discussion
Status: closed
- RC1: 'Comment on amt-2022-336', Anonymous Referee #1, 27 Apr 2023
Validation of NH3 observations from AIRS and CrIS against aircraft measurements from DISCOVER-AQ and a surface network in the Magic Valley
Overall summary:
I welcome any validation study for ammonia satellite products as, like the authors noted, there are precious few available. Aircraft-based measurements provide an almost ideal method for validation, which makes this study highly relevant. Even more relevant is the fact that this is, to my knowledge (limited as it is), the first detailed evaluation of CrIS retrieved profiles under inversion conditions. It is also the first study describing the MUSES retrieval and its validation. However, some of the approaches taken in the study seem counterintuitive, and the manuscript/method will either need some adjustments or at least some further detailing.
Major comments
- As this seems to be the first study describing the MUSES retrieval for AIRS and CrIS, that in itself could be given more focus by adding it to the title of the manuscript. Furthermore, illustrate the value by adding a summary to the conclusions on the strengths of the retrieval / improvements over CrIS-FPR.
- As described in the manuscript, there is a clear difference between the CrIS-FPR and CrIS-MUSES retrievals, as well as between the previous AIRS and MUSES retrievals, and this should be described more clearly as such. The current title and some of the statements (“Preliminary comparisons have shown excellent agreement between the two algorithms”) might make it seem that this validation study is applicable to CFPR as well, which is not the case. To avoid confusing any future reader, please make it clearer which retrieval is used in the study and leave out any comparisons with other products. An alternative option is of course to include the CrIS-FPR retrieval in this validation study to illustrate the differences.
- On several occasions throughout the manuscript, the authors stress the importance of applying the observation operator for a fair comparison to any second set of observed or modelled concentrations and/or to reduce the impact of the a priori choice. While it's clear, as stated several times, that the retrievals add information beyond the a priori, the information available is still limited and therefore the a priori will have a large influence. Already in the introduction it's made clear that a comparison with in-situ, ground-based observations is complicated and highly uncertain due to the strong influence of local atmospheric conditions and the vertical distribution (lines 166-168)! Several of the comparisons, however, are made without applying the operator (e.g. Figure 3, Figure 7, and any in-situ observation comparison). Arguments such as “there are many end-users who will want to use the data as is in their own analysis and will want to know the corresponding uncertainties” or “IASI studies also don't use an operator” are no reason to skip the averaging kernel and thereby reduce the overall quality of the validation study. If anything, it should be stressed once more that comparing satellite observations to in-situ data is not trivial. Assumptions can be made for the vertical profiles (e.g. based on the a priori shape, modelled profiles, or mixing layer height), after which an AVK can be applied. For the two-week averaged concentrations an effective averaging kernel can be approximated.
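For concreteness, the standard form of the requested observation operator (Rodgers, 2000), with x_a the retrieval a priori and A the averaging-kernel matrix, is
\[
\hat{x} = x_a + \mathbf{A}\,\bigl(x_{\mathrm{insitu}} - x_a\bigr),
\]
applied in ln(VMR) space if that is the space in which the MUSES kernels are defined (an assumption on my part, but common for IR NH3 retrievals). For the two-week averages, an effective kernel could be approximated by averaging A and x_a over the co-located soundings before applying the same expression.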
- Validation vs evaluation: one could argue that the study is not a validation but an evaluation of the profiles, as the vertical extent of the flights is limited to either 500 or 700 hPa (and only the surface for the in-situ obs). The further assumption that all concentrations above those levels are zero doesn't help bring the comparisons closer to one another. Please change the title to evaluation and/or make a better assumption for the concentrations above 500/700 hPa.
- Which brings us to the noise in the observed concentrations: it makes it hard to trust any of the observed concentrations above a certain level (950 hPa, top of Figure 2, and 800 hPa, bottom of Figure 2) and above a certain concentration value (>5 ppbv). Instruments capable of measuring ammonia at high temporal resolution are prone to large biases/artefacts (Bobrutzki et al., 2010; Twigg et al., 2022), especially at “low” (<10 ppbv) concentrations (funny standard). The concentrations during the flights show an overall variation of 10 and 4 ppbv depending on the flight/profile/direction etc. This does not match the assumption of 0 ppbv above the measured profiles. While the variations could simply be instrument noise/artefacts, that does reduce the value of all measurements above a certain altitude (950/800 hPa). Is it possible to perform any further QA/QC and provide ancillary information, such as the measurement error and mixing layer height, to rule out the possibility that the instrument is simply measuring noise/offsets/artefacts?
https://amt.copernicus.org/articles/3/91/2010/
https://amt.copernicus.org/articles/15/6755/2022/
- Not sure whether to place the next comment under minor or major comments:
Line 158-169: A great summary of the pitfalls of previous studies… that this study then proceeds to walk into. Each of these factors can, to a degree, be accounted for, which would improve the validation study:
- Sub-pixel inhomogeneity: a fun point that almost no study does anything with besides mentioning it; some parts of Souri et al. (2022) could help: https://amt.copernicus.org/articles/15/41/2022/.
- Time scales: the mismatch in representativeness between in-situ networks and the satellite will result in a bias. Only a rough statement is made (line 554), but it's unclear what the exact value is in this case. With a rough lifetime of 4-12 hours, the concentrations that CrIS/AIRS observe will be a combination of emissions over the last few hours and not just at the overpass. You could argue that CrIS will be more representative of morning/night-time emissions and not the peak afternoon values (see the sketch after this list). Please make the potential impact more quantitative by adjusting for the impact or adding a rough uncertainty estimate to the observations.
- Noise of the in-situ or satellite instruments?
- As stated, the horizontal and vertical distribution of ammonia can have a huge impact on the estimated total columns (e.g. a factor of 2; Van Damme et al., 2014). https://acp.copernicus.org/articles/14/2905/2014/acp-14-2905-2014.pdf
As stated above, make an effort to reduce the potential impact of each of these factors or add a factor of uncertainty to the comparison to account for it.
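To make the time-scale point above more concrete, here is a back-of-envelope sketch assuming a well-mixed column with first-order loss at lifetime \tau and neglecting advection:
\[
N(t) \;\approx\; \int_{-\infty}^{t} E(t')\, e^{-(t - t')/\tau}\, \mathrm{d}t' .
\]
With \tau = 6 h, an emission pulse released 6 h before the early-afternoon (~13:30 LST) overpass still contributes a factor e^{-1} \approx 0.37 of its initial burden, i.e. the observed column integrates a substantial fraction of the morning emissions rather than only the overpass-time values.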
Minor comments
- Abstract, lines 35-37, rewrite needed: the way it's currently written makes it sound, to me, like the validation study only represents a very tiny set of conditions and is … not important. Be proud of the study; the highly detailed (smaller) set of observations is a strength!
- Line 35-40 add quantities to the bias/errors.
- In several sections of the manuscript there are statements about the outcomes of other validation studies, but it is not clear which retrieval was validated. There have been several IASI products over recent years, with large differences between them (typically updates). Please add the version numbers.
- Line 67: “within the European union”: the way it's currently written makes it sound like the EU regulates US pollutants.
- Line 85: add a reference for “measure accurately”, for example Bobrutzki et al. (2010) / Twigg et al. (2022): https://amt.copernicus.org/articles/3/91/2010/
https://amt.copernicus.org/articles/15/6755/2022/
- Line 91: there are several instruments measuring via an open-path that are used in measurement networks (e.g. https://amt.copernicus.org/articles/10/4099/2017/amt-10-4099-2017.pdf, https://amt.copernicus.org/articles/16/529/2023/amt-16-529-2023.html)
- Line 100-105 add the spatial coverage/footprints of the individual sensors.
- Line 120: reference? (e.g. https://acp.copernicus.org/articles/22/6595/2022/acp-22-6595-2022.html, https://acp.copernicus.org/articles/22/951/2022/acp-22-951-2022.html)
- Line 125 onward: add version numbers and retrieval names.
- Line 125 onward: instead of good/high/etc add quantities
- Line 141 what was the result for IASI(-NN and -LUT)?
- Line 142-144: Incorrect statement. Most of the FTIR sites are located away from high-source regions, which limits the applicability for high-concentration regions. Several of the NDACC sites, however (e.g. Hefei, Mexico City, Bremen, Boulder), are within or near regions with high concentrations, which makes the complete network quite applicable and of great interest to the air quality community.
- Line 143: What is of greatest interest to the air quality community?
- Line 158: Move a part of the sub-pixel bit (169-180) above this section for better readability.
- Line 180-182: again, this makes it sound like the study is not that relevant, while it is!
- Chapters 2 & 3.3: integrate these into a single MUSES chapter.
- Line 199: can you add an example of the a priori profiles and typical surface values?
- Line 214-218: either add a section comparing the two retrievals, provide a source for these results or remove this section.
- Line 216: specifically “excellent agreement”: while I understand the comparison is beyond this paper, at least specify where anyone can find the comparison.
- Line 256-261: please add some quantities (ppbv/%) for what can be expected from each of the error sources. Show that III in particular is essential, as it seems to be the part that is new within the MUSES retrieval.
- Line 284-287: Either remove, or add a few lines/statement to the discussion/conclusions that this study indicates the potential hazards of, and large levels of uncertainty in, simply using the CrIS retrieved surface concentrations.
- Line 290: After the whole description, simply truncating the averaging kernel seems counter-intuitive. Of the limited information contained in each observed profile, quite a bit will be above the 500/700 hPa level. Smoothing etc. will have an effect on the profile/column comparison. Please show that this effect is minimal or (better) redo the comparison without truncating and assume a value for the levels above 500/700 hPa.
- Line ~295: add the dates/period to California 2013 and Colorado 2014.
- Line 310-311: why not bin the observations to the CrIS and AIRS retrieval intervals?
- Line 311: add the detection limit, and concentration interval that the 35% is representative of.
- Line 314: “higher detection limit” how much higher?
- Line 315: same, what amounts are we talking about?
- Line 323: technology and protocols: add reference.
- Line 325: what type of sampler? There can be quite large quality differences between samplers.
- Line 346: add LSTs.
- Line 351: if CrIS-JPSS-1 is not used leave it out of the manuscript.
- Line 361: add some information on why 60 minutes and 15 km are used; especially for low (<5 km/h) and high (>30 km/h) wind speeds, differences in the observed air mass are possible.
- Line 362: summarize the approach of Guo et al. (2021).
- Line 363-364: other studies were stricter for a reason; e.g. Tournadre et al. (2020) provide some information on the limits for NH3; please add a reference.
https://amt.copernicus.org/articles/13/3923/2020/amt-13-3923-2020.pdf
- Line 365-367: add values for future reproducibility
- Line 369: Why the median and not the mean? Did the data have strong outliers / a skewed distribution?
- Line 372-377: move up to line 347, as it's appropriate to already mention the differences there.
- Line 382: “amounts as high as 100 ppbv …”: add “during a (weak) inversion”?
- Line 385-390: excellent case study; no negativity/toning down needed, adjust the text.
- Line 391: “However, when averages over long periods and/or broad regions are desired, it would be reasonable to exclude cases with inversions”: I have to disagree and argue the opposite. Any ground-based measurement will also measure during these inversions, so for an accurate comparison you'll need to include such events in the satellite mean, otherwise it's not representative of the situation on the ground.
- Line 395-407: to be honest, this whole section could be removed. From previous studies and your summary in the introduction it's already clear that large biases (or representation errors) are to be expected when not applying the averaging kernel or accounting for spatial heterogeneity; it's not needed to show it here.
- Line 426: please add the percentage that the 1.0 ppbv represents compared to the total observed concentrations, or add those values to the text.
- Line 427-429: the importance was already stated in the introduction. If you want to leave this section in, add some colouring of the profiles (Fig. 3, Fig. 5) based on the a priori profile concentrations. A percentage-based plot would also help put the values into perspective.
- Line 436: “…has been argued…”: add a reference.
- Line 451: similar to the comment above, add some values for the order of uncertainty/error/bias we can expect for the individual error sources. A plot like Fig. A2/A3 in Dammers et al. (2017) comes to mind. https://amt.copernicus.org/articles/10/2645/2017/
- Line 453-455: “The measured uncertainties range from 5%-50% …. point to the need for averaging…”: why is that the case? Most in-situ instruments/measurements observe with comparable levels of uncertainty; similarly, the uncertainties in the emissions can be substantial (a factor of 2.5, with higher values also mentioned in Van Damme et al., 2018, and Dammers et al., 2019, both cited in the introduction).
- Line 459-461: again, this is not an argument to make the same (incorrect) comparison here. Either replace it entirely with, or add, a comparison including the application of the averaging kernel and show the impact on the comparison.
- Line 461: incorrect statement, Dammers et al., 2017 (and 2016) did apply the averaging kernel to IASI profiles. The IASI retrieval uses a profile assumption and profiles can be derived from the columns. https://acp.copernicus.org/articles/16/10351/2016/
- Line 469: “are assumed to be zero”: as stated above, this is quite an assumption and the impact should be quantified. Alternative choices, such as using the values from the a priori profile or scaling the a priori profile with the observed values at the top of the spiral, are also viable. The lifetime is of the order of hours to days, which means there should be a non-negligible amount of ammonia above the mixing layer and in the upper troposphere. The July-August measurements in Colorado coincide with the fire season; long-distance, high-altitude plumes could occur during this period and interfere with the comparison (e.g. Lutsch et al., 2016; https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2016GL070114).
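As an illustration of the scaling alternative, a minimal sketch (hypothetical variable names, not the authors' code; it assumes the averaging kernels are applied in ln(VMR) space) could look like this:

```python
import numpy as np

def extend_and_smooth(p_levels, x_aircraft, x_apriori, avk, p_top):
    """Fill the levels above the spiral top by scaling the a priori to the
    topmost measured value, then apply the full averaging kernel.

    p_levels   : pressure grid [hPa], ordered surface -> top of atmosphere
    x_aircraft : aircraft NH3 [ppbv] interpolated to p_levels (values above
                 p_top are ignored and replaced)
    x_apriori  : retrieval a priori NH3 [ppbv] on p_levels
    avk        : full averaging-kernel matrix (n_levels x n_levels)
    p_top      : pressure at the top of the spiral [hPa], e.g. 500 or 700
    """
    p_levels = np.asarray(p_levels, dtype=float)
    x_full = np.asarray(x_aircraft, dtype=float).copy()
    x_apriori = np.asarray(x_apriori, dtype=float)

    covered = p_levels >= p_top                # levels sampled by the aircraft
    i_top = np.where(covered)[0][-1]           # topmost aircraft-covered level
    scale = x_full[i_top] / x_apriori[i_top]   # tie the a priori shape to the spiral top
    x_full[~covered] = x_apriori[~covered] * scale

    # observation operator, x_hat = x_a + A (x_true - x_a), assumed here in ln(VMR) space
    x_full = np.maximum(x_full, 1e-4)          # guard against log of non-positive values
    ln_hat = np.log(x_apriori) + avk @ (np.log(x_full) - np.log(x_apriori))
    return np.exp(ln_hat)
```

Varying p_top and the fill assumption (zero versus scaled a priori) would show directly how sensitive the profile and column comparisons are to this choice.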
- Line 486-487 and beyond: at the altitudes where the measured concentrations seem valid enough, there is no indication of a missing error source. If the aircraft values are not representative of the concentrations observed above the 500/700 hPa levels, how can we still conclude anything about potentially missing error sources?
- Line 515: “…water vapour retrieval errors…”: add a bit of discussion on this outcome, as the sub-pixel retrievals of water vapour etc. were one of the additions of MUSES over the old retrievals (unless I am mistaken, always possible). If this only increases the bias/uncertainty, is it smart to keep doing the retrievals as such?
- Line 545: “biased high” should be “biased low”. Puchalski et al. also gave a range of +18 to -32%.
- Line 554: see earlier statement on using emissions for concentration representativity.
- Line 559: “possibly” missing a closing bracket
- Figure 9: an excellent example of the temporal variability picked up by the satellite instrument. However, accounting for the a priori effects could bring these comparisons a lot closer.
- Figure 11: colorbar values are missing
- Line 616: add “within the United States” or something similar; recent EU emission databases typically distribute emissions based on livestock numbers at each farm, with some inventories (e.g. the Netherlands) even at facility level, which are then aggregated to 1x1 km2.
- Line 621: oddly specific, leave out, or add reference? (Herrera, 2022, https://acp.copernicus.org/articles/22/14119/2022/acp-22-14119-2022.html)
- Line 624-626: Be proud! You’re reducing the relevance of this paper, while it definitely is relevant!
- Conclusions: update statements following the above comments.
- Line 624: show the shape and values in the retrieval section.
Citation: https://doi.org/10.5194/amt-2022-336-RC1
- AC1: 'Reply on RC1', Karen Cady-Pereira, 27 Apr 2023
I would like to thank Reviewer 1 for some very well thought out observations. I am writing now, before submitting a formal response, to get some clarification on the reviewer's suggestions. First, the reviewer pointed out that we should apply the averaging kernel to the aircraft and surface data. For the aircraft profiles we did it both ways (Figures 3 and 5). Based on my collaborations with end-users, I feel pretty strongly that showing both comparisons is useful. Is the reviewer suggesting we do the same for the column comparisons and the surface data? Second, it's not clear how much the reviewer believes we should be comparing the MUSES retrievals with the CFPR algorithm. Is a discussion of the differences sufficient?
I certainly agree that "evaluation" better describes this study.
Thanks in advance for any guidance.
Citation: https://doi.org/10.5194/amt-2022-336-AC1
- AC2: 'Reply on RC1', Karen Cady-Pereira, 02 Aug 2023
- RC2: 'Comment on amt-2022-336', Anonymous Referee #2, 26 May 2023
The present study describes the validation of ammonia (NH3) data retrieved from the AIRS and CrIS satellite instruments using the MUSES algorithm. The study focuses on comparing NH3 profiles and columns derived from aircraft measurements obtained during the DISCOVER-AQ campaign in California and Colorado, specifically in source regions of ammonia. Additionally, it includes a comparison with three years of surface NH3 measurements from a monitoring network in Idaho.
The manuscript is well written, properly structured, and aligns well with the scopes of AMT. It contributes to the extensive validation efforts of various NH3 products derived from satellite measurements, which is crucial considering the growing utilization of NH3 satellite data. There is an evident need for such validation. Overall, the comparisons between satellite and in-situ data are well executed and yield interesting results.
However, I have some general concerns regarding the significant uncertainties inherent in the comparisons involving satellite-derived mixing ratios and in-situ measurements at specific altitudes/pressure levels, particularly at the surface. This concern arises due to the absence of vertical information that can be obtained from trace gas retrievals like NH3. Additionally, I believe the manuscript lacks thorough discussion and investigation into the reasons behind the remaining biases/uncertainties. Addressing these aspects would provide valuable material for the paper's conclusion.
Therefore, I recommend publication once the following major comments listed below are addressed.
Major comments
A major conclusion of the study is that a portion of the biases between satellite and in-situ measurements can be attributed to smoothing errors and unaccounted error sources, which can be substantial. This is evident in Figure 5, where the standard deviation of the biases remains large even after applying the AVKs. However, I believe the study could go deeper into understanding the sources of these biases and uncertainties, and additional tests within this framework would be beneficial. Specifically:
- The manuscript should discuss the uncertainties and errors associated with temperature and H2O profiles, as they often have significant impacts on trace gas retrievals from nadir-viewing satellite observations, particularly for NH3 that mainly resides in the boundary layer. It would be helpful to explore whether the remaining biases between satellite and in-situ measurements are dependent on errors in these two variables. Additionally, what is the influence of uncertainties in temperature and H2O profiles on the overall NH3 uncertainty budget?
- Information about the DOFS associated with the AIRS and CrIS NH3 retrievals would be interesting to discuss. Did you exclude observations based on their DOFS before conducting the validation? Do the biases between satellite and in-situ measurements decrease when filtering out observations with low DOFS? Exploring this aspect would provide valuable insights.
- The choice of the a priori profile is crucial. Are the selected a priori profiles representative enough for the conditions in California and Colorado during the in-situ measurements? How would the biases between satellite and in-situ measurements change if you adopted an a priori profile that peaks closer to the surface? Would such a choice help reduce the biases? This is an aspect worth investigating and discussing.
I have concerns when comparing trace gas mixing ratios retrieved from ground-based or spaceborne observations at a specific altitude/pressure level directly with in-situ measurements taken at the same level. It's important to recognize that, with the optimal estimation, the value retrieved at this level alone lacks meaning as it heavily relies on information obtained from other levels. This is particularly relevant for trace gases like NH3, where only a single piece of information (total column) can be obtained. It becomes even more complex when comparing surface in-situ measurements with near-surface mixing ratios derived from retrieved profiles, considering the decreased sensitivity of satellite IR sounders in the lowermost tropospheric layers. Although the sum of each row of the AVKs shows some sensitivity to the near surface, the influence of the a priori profile remains substantial in these layers. Furthermore, the shapes of the AVKs indicate a tendency for the retrievals to overcompensate in the free troposphere (Figure 4), suggesting that part of the information used to estimate near-surface values originates from higher levels. Additionally, I assume that the constraint matrix restricts the variability in these layers, incorporating inter-layer correlations through the extra-diagonal elements, in order to prevent abnormal oscillations in the retrieved profile and maintain it within a reasonable range relative to the a priori. Considering these factors, it is crucial to thoroughly investigate and discuss the extent to which these retrieval characteristics impact the comparisons between satellite and in-situ data before drawing conclusions.
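For clarity, this can be written compactly with the standard linear retrieval equation (neglecting retrieval noise, and in whichever space, linear or logarithmic, the kernels are defined):
\[
\hat{x}_{\mathrm{sfc}} = x_{a,\mathrm{sfc}} + \sum_{j} A_{\mathrm{sfc},j}\,\bigl(x_j - x_{a,j}\bigr).
\]
The retrieved surface value therefore mixes the a priori surface value with the true departures from the a priori at all levels j, weighted by the surface row of the averaging kernel; with roughly one piece of information in the profile, an individual level cannot be interpreted in isolation.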
In Guo et al. (2021), it was demonstrated that notable differences exist between the ascent and descent aircraft profiles of NH3 measurements obtained during the DISCOVER-AQ campaign. These profiles are utilized for validating the AIRS and CrIS NH3 observations in the current study. The observed disparities arise due to the slow response time of the PTR-MS instruments, leading to a sampling lag when the aircraft moves from the boundary layer to the free troposphere, and vice versa. Since the majority of NH3 is concentrated in the boundary layer, this response lag results in an overestimation of NH3 in the free troposphere during upward spirals and an underestimation of NH3 in the boundary layer during downward spirals. These biases between ascent and descent profiles have been identified as significant and, when combined with the 35% uncertainties associated with NH3 measurements, can considerably impact the comparisons with satellite measurements. However, this point is not discussed or accounted for in the present study.
In the comparison of NH3 columns between satellite and in-situ measurements (Figure 7), it is evident that both AIRS and CrIS tend to overestimate NH3 columns in California and Colorado, as indicated by the slopes ranging from 1.6 to 2.3. However, the application of AVKs has not been performed on the aircraft profiles in this comparison. Since satellite sounders exhibit reduced sensitivity and obtain less information in the near-surface layers, where NH3 is predominantly abundant, it is expected that the AVKs would diminish the influence of these layers when computing NH3 total columns from the smoothed aircraft profiles. Consequently, the aircraft NH3 columns may be lower, potentially accentuating the overestimation of NH3 by AIRS and CrIS. This aspect should be discussed.
On the other hand, in section 5.2, it is revealed that CrIS NH3 exhibits an overall low bias when compared to surface in-situ measurements (slope of 0.65). This finding contradicts the previously noted large overestimation observed for CrIS NH3 columns (as mentioned in my previous comment). This discrepancy raises additional concerns regarding the reliability of quantitative comparisons between surface in-situ and satellite data. It suggests that such comparisons at the surface level are particularly uncertain and prone to biases.
The study identifies sampling differences between satellite data and in-situ measurements as another significant source of uncertainties and biases, particularly following the application of satellite AVKs. To gain further insights into this aspect, it would be beneficial to conduct tests on the co-location criteria in terms of both spatial and temporal alignment. For instance, reducing the co-location time to 30 minutes could improve the representativeness of the satellite measurements with respect to the in-situ data. Conversely, extending the co-location time would provide a larger statistical dataset for the comparisons, enabling a more robust analysis of uncertainties and biases.
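A minimal sketch of such a co-location sensitivity test (hypothetical column names, not the authors' code) is given below; the distance and time thresholds can then be varied (e.g. 30 vs 60 min, 7.5 vs 15 km) to see how the comparison statistics respond:

```python
import numpy as np
import pandas as pd

def colocate(sat, site_lat, site_lon, site_times, max_km=15.0, max_minutes=60.0):
    """Keep satellite pixels within max_km of the site and within max_minutes
    of any in-situ sample.

    sat        : DataFrame with columns 'lat', 'lon', 'time' (pandas Timestamps)
    site_times : sequence of in-situ measurement times
    """
    # great-circle (haversine) distance from each pixel to the site, in km
    lat1, lon1 = np.radians(sat['lat'].to_numpy()), np.radians(sat['lon'].to_numpy())
    lat2, lon2 = np.radians(site_lat), np.radians(site_lon)
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist_km = 2 * 6371.0 * np.arcsin(np.sqrt(a))

    # minimum time separation (minutes) between each pixel and the in-situ samples
    obs_times = pd.to_datetime(pd.Series(list(site_times)))
    dt_min = np.array([(obs_times - t).abs().min().total_seconds() / 60.0
                       for t in pd.to_datetime(sat['time'])])

    return sat[(dist_km <= max_km) & (dt_min <= max_minutes)]
```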
Minor comments / typos
- Lines 226-228 and 399-401: I find it unclear whether the retrievals are conducted over cloudy scenes as well. If retrievals are performed over cloudy areas, it could pose a concern since the presence of clouds can impact the baseline temperature of the spectra and the thermal contrast, consequently affecting the retrieved column. These effects should be taken into consideration, as they have the potential to introduce biases and uncertainties in the NH3 retrieval process.
- Lines 263-264: Why are the systematic errors not evaluated for CrIS?
- Lines 288-290: I believe truncating AVK matrices in such a manner may not be appropriate. It implies that the influence of the upper layers on the remaining layers is entirely disregarded, consequently affecting the smoothing of the aircraft profiles. A possible workaround could involve complementing the aircraft profiles at the top with additional information, such as model profiles scaled to background NH3 values, and applying the complete AVK matrices. Although this approach would introduce an impact from the profile at the top, it seems more reasonable to me than truncating the AVK matrices entirely.
- Line 292: Müller
- Lines 350-351: Why are the observations from CrIS/JPSS-1 not used here? They could have filled the gap of CrIS/SNPP in 2019. And it would have been interesting to check the consistency between these two CrIS instruments for NH3.
- Lines 365-361: Could you be more specific on the filters you applied to the retrieved data?
- Figure 2: It might be useful to superimpose the mean and standard deviation of the profiles on these plots, as it was done for Figure 3.
- Line 437: “It has been argued”. As is, it sounds a bit weird. Do you refer to a specific study?
- Line 477: What is the typical detection limit of NH3 for these satellite instruments?
- Lines 482-485: I don’t understand what is meant here.
- Line 495: “some poor quality CrIS retrievals”. It is surprising, since CrIS has much lower instrumental noise compared with AIRS.
- Line 535 and line 545: There is likely a full stop punctuation missing at the end of each of these two lines.
- Line 559: Could the winter values be underestimated because of the general weaker thermal contrast (measurements closer to the detection limit)?
- Line 559: “(possibly,”? There might be a typo here.
- Lines 587-588: “at the high end of the values reported in the literature” Please provide references.
- Figure 10: The shapes/limits of the subplots are not consistent between NH3 data vs. number of dairies. Please consider using the same projection. Also, the values of the NH3 colour bars are hidden.
- Line 631: delete the full stop punctuation after “measurements”
Citation: https://doi.org/10.5194/amt-2022-336-RC2
- AC3: 'Reply on RC2', Karen Cady-Pereira, 02 Aug 2023