the Creative Commons Attribution 4.0 License.
Understanding the potential of Sentinel-2 for monitoring methane point emissions
Daniel J. Varon
Itziar Irakulis-Loitxate
Luis Guanter
Download
- Final revised paper (published on 10 Jan 2023)
- Preprint (discussion started on 04 Oct 2022)
Interactive discussion
Status: closed
RC1: 'Comment on amt-2022-261', Cristina Ruiz Villena, 05 Nov 2022
General Comments
In this work, the authors have developed a new benchmarking framework for the validation of Sentinel-2 methane retrievals, including an estimation of detection thresholds for various scenes with different characteristics, and an analysis of the uncertainties. Existing work on Sentinel-2 methane monitoring has demonstrated the usefulness of this sensor to detect and quantify large methane emissions at a global scale, but has mostly focused on examples of real plumes. Though similar work has already been done for other sensors such as PRISMA or WorldView-3, to my knowledge, this is the first time that such a benchmarking framework has been developed for Sentinel-2 with simulated methane plumes and the detection thresholds and uncertainties characterised in a more systematic way. This work is valuable as it will aid the validation of future S2 quantification of real-world methane plumes, which is challenging and/or costly to do otherwise (e.g. with ground-based or aircraft observations).
The manuscript is well written and its contents are of high quality and scientific interest. Methane monitoring is currently a hot topic, given the high potential to mitigate further warming by reducing methane emissions, particularly from the oil and gas sector, where it is most cost-effective. The Global Methane Pledge signed by more than 100 countries at COP26 last year is a testament to the current relevance of this topic.
Specific Comments
My comments are mostly focused on making things clearer for the reader in the manuscript. A detailed list can be found in the annotated manuscript.
Technical Corrections
I have spotted a few typos and other technical details to correct, which, in my opinion, would improve the quality of the manuscript. A detailed list can be found in the annotated manuscript.
AC1: 'Reply on RC1', Javier Gorroño, 07 Dec 2022
Dear Cristina,
Thanks for your constructive and careful review. We have incorporated your comments in the revised version. Here we list a small subset of the comments, each with a brief answer.
- L118: Why didn't you run the simulations at 20 × 20 m² to begin with?
- Running this type of simulation is costly in processing time and file size. It is possible to run the simulations at a pixel size of 20 m or smaller, but at the expense of more processing time and larger files. We checked that the interpolation error was negligible.
- L142: Can you explain this further?
- We have added a sentence at the end of the paragraph which better explains this.
- L230: Which one?
- The correct package is now cited.
- L236: What do you mean? Can you clarify? It sounds like you use the same reference for both sites, which I don't think is what you mean.
- We have rewritten the sentence to clarify this point.
- L276: I would refer to these plots as something like 'masked plumes' or 'masked ΔXCH4' rather than 'masks'. I understand a mask as a 2-D boolean array that determines what pixels to keep and what pixels to discard.
- Done
- Fig.7: I would refer to these plots as something like 'masked plumes' or 'masked ΔXCH4' rather than 'masks'. I understand a mask as a 2-D boolean array that determines what pixels to keep and what pixels to discard.
- Done
- L348: what scenarios? do you mean masking criteria?
- This has been clarified in the text.
- L369: What does this mean?
- This has been clarified in the text.
- L377: What do you mean exactly? Do you mean that if you apply a linear fitting, there is higher disagreement for higher U10?
- When comparing the two linear fits, the disagreement in slope results in an error that grows as U10 increases. It is an illustrative scenario, since the error will also depend on the fitting process itself.
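The last answer above, on the two linear fits, can be illustrated with a short sketch. It assumes each fit has the form y = slope · U10 + intercept; the function name and the coefficient values are hypothetical and chosen only for illustration, not taken from the manuscript.

```python
def fit_difference(u10, fit_a, fit_b):
    """Absolute difference between two linear fits evaluated at wind speed u10."""
    slope_a, intercept_a = fit_a
    slope_b, intercept_b = fit_b
    return abs((slope_a - slope_b) * u10 + (intercept_a - intercept_b))


# Hypothetical (slope, intercept) pairs; NOT values from the manuscript.
FIT_A = (0.33, 0.45)
FIT_B = (0.30, 0.45)

# With equal intercepts, the disagreement is proportional to U10.
differences = [fit_difference(u, FIT_A, FIT_B) for u in (1.0, 2.0, 4.0, 8.0)]

for u, d in zip((1.0, 2.0, 4.0, 8.0), differences):
    print(f"U10 = {u:4.1f} m/s -> fit disagreement = {d:.3f}")
```

With a slope mismatch of 0.03 and identical intercepts, the disagreement doubles each time U10 doubles. In practice the intercepts may differ as well, which is one reason the answer calls the scenario illustrative.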
Citation: https://doi.org/10.5194/amt-2022-261-AC1
RC2: 'Comment on amt-2022-261', Anonymous Referee #2, 07 Nov 2022
In the past few years, satellite imaging has been shown to be a very important tool to track methane emissions. This has proved to be an important step toward reducing unwanted emissions and fighting global warming.
The goal of this work is to propose a novel methane emission detection and quantification method for Sentinel-2, an open-access satellite constellation from the ESA. On top of this method, the authors show how the proposed framework can also be used, alongside simulated plume shapes, to simulate realistic synthetic data and perform an in-depth analysis of quantification errors as well as detection limits.
While such work had already been done for other satellites, such as PRISMA, this is the first time, as far as I know, that a quantitative study is performed for Sentinel-2 using realistic synthetic data. This allows for a more extensive study than real on-site experiments, such as Sherwin et al. (2022), that are both costly and polluting (since they require potentially large releases of methane).
This work is mostly clear and very impactful. It is a very important building block in the design of new methods for detecting and monitoring methane emissions, and it also helps to better understand the optimal conditions and limits of the current methods.
I list here more specific comments:
- While the conclusions are indubitably important, one can regret the low amount of testing. Indeed, only 5 plume shapes and three locations (at a specific date) were used. I think it is too little to derive an accurate conclusion, especially given the variance of the results:
- I think that more plume shapes are necessary. Given the high variability of the plume shapes used, I feel that it is quite difficult to draw sound conclusions. Indeed, the authors mention that a small but highly concentrated plume interacts differently than a widespread plume, but this is derived using only one or two examples. This is even more important since results with similar shapes (such as plumes 1 and 5) are not very similar (see Fig.9 and Fig.10).
- All dates selected are during the summer for the respective countries. This corresponds to optimal sensing conditions (in the sense that the illumination of the scene is the best). It would be interesting to see similar results for other dates to measure the variability of the method and the impact of seasonality, for example.
- More examples of sites in these environments would also have been beneficial, for similar reasons to those mentioned previously.
- With only three sites used, one can regret the lack of more diverse locations. In particular, no sites from the Southern Hemisphere were selected. At least one site with different weather conditions, such as Canada or Russia (with snow, for example), would also have been beneficial. Even though these sites can be difficult to monitor in practice, the potential (and limits) can be interesting nonetheless. This is even more important since the SNR is signal dependent and as such depends on the albedo of the location. While probably not necessary for this paper, I think that it can be a very interesting follow-up work for the future.
- How stable is the simulation process with regard to the spectral characterization? The ESA regularly updates the spectral characterization of both S2A and S2B, so I assume that the analysis needs to be done again each time, but one can wonder how large the error would be if this wasn't done (and, as such, what error is made when using an imprecise spectral characterization).
- The authors mention the spectral sensitivity difference between S2A and S2B. It is not entirely clear which satellite is used for the different experiments (it seems that the observation is from S2A and the reference from S2B based on l.106-112, and thus the synthetic experiment uses purely S2B). I think it could be interesting to study in that case the difference in detection limit between the two satellites (I assume that the S2A limit is smaller than that of S2B since it is more sensitive to methane) and whether there is a gain in using the reference image from the same satellite as well.
- l.60: The difference with the approach from Cusworth et al. needs to be more explicit. From the text, I notice at least two differences (simulated vs real images and the proposed correction term); are there other differences? I also think that comparing the proposed retrieval approach to the one described in Cusworth et al. could be a valuable additional experiment. As it stands, it is difficult to figure out the added value of the proposed quantification method compared to the one previously presented, since no explicit comparison is done.
- Some of the analysis is missing (or should be cited appropriately if it comes from another source), such as l.170 "it was found that ..." or l.178 "". I think that the experiments that lead to these assumptions should be presented as well, to prove that these are indeed valid assumptions.
- In section 3, the authors go from a 3D plume simulation to a 2D mass field, thus neglecting the impact of the altitude of the plume. This should be clarified in the text (i.e. clearly stating the assumptions + justification). Clearly specifying all hypotheses for the study is very important.
- Given that the correction term is scene dependent, it would have been interesting to have Fig.5 for the other sites as well, to see if there are noticeable differences between sites.
- It is not entirely clear when the temporal normalization is the same as the observation image. I think this should be clarified for the different experiments. Another interesting experiment is the measure of the variability of the estimation as a function of the reference date. The method recommends using the closest reference, but this would allow answering important questions such as "What happens when the closest is a poor reference?" or "Is the closest always the best choice or could another criterion be better?"
- l.~300: I think that a metric such as the SNR between the plume (delimited by the mask) and the rest of the scene used in Ehret et al. (2022) could be useful to describe the different behavior in this section.
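To make the last suggestion concrete, here is a minimal sketch of one possible plume-versus-background SNR. The definition used (mean in-plume enhancement minus mean background, divided by the background standard deviation) is an assumption for illustration and is not necessarily the metric of Ehret et al. (2022). The function names and the toy 3 × 3 scene are hypothetical.

```python
from statistics import mean, pstdev


def masked_plume(enhancement, mask):
    """Apply a 2-D boolean mask: keep plume pixels, replace the rest with None."""
    return [[v if keep else None for v, keep in zip(row, mrow)]
            for row, mrow in zip(enhancement, mask)]


def plume_snr(enhancement, mask):
    """Assumed SNR: (mean inside - mean outside) / std of the pixels outside the mask."""
    inside, outside = [], []
    for row, mrow in zip(enhancement, mask):
        for value, keep in zip(row, mrow):
            (inside if keep else outside).append(value)
    if not inside or len(outside) < 2:
        raise ValueError("need plume pixels and at least two background pixels")
    background_std = pstdev(outside)
    if background_std == 0:
        raise ValueError("background has zero variability")
    return (mean(inside) - mean(outside)) / background_std


# Toy enhancement map (arbitrary units) and its boolean plume mask.
ENHANCEMENT = [[0.0, 0.2, 3.0],
               [-0.2, 2.8, 3.2],
               [0.1, -0.1, 0.0]]
MASK = [[False, False, True],
        [False, True, True],
        [False, False, False]]

snr = plume_snr(ENHANCEMENT, MASK)
print(f"plume-vs-background SNR: {snr:.1f}")
```

A single standard deviation treats the background as spatially uncorrelated noise, so a scene whose retrieval error contains structured features would need a more careful definition.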
Additional form comments:
- I suggest that the authors cite the 2021 report of the Intergovernmental Panel on Climate Change, which contains a very detailed analysis of the impact of methane emissions on the planet.
- l.44, "high-quality calibration and high temporal revisit": I think that this should be clarified. The calibration and temporal revisit are poor compared to other satellites such as Sentinel-5P, so I think it would be better to state the characteristics of Sentinel-2 explicitly instead of just qualifying them as "high".
- For me, some information is difficult to find in the text. For example, at l.198 the authors state that "methane plume quantification is obtained from isolation of the term ...". It is only two sentences later that it is explained how (I think, since no clear link is made). These links must be made explicit so that these sections are easier to understand and follow.
- I suggest adding an equation that makes explicit that $L_{B12} = \int_{B12} L_{TOA}(\lambda)\,d\lambda$ (l.~150)
- The paper is filled with small typos. I will list the ones that stood out to me, but I'm sure there are many others:
- l.96: missing ref
- "normalisation" (l.108) vs "normalization" (l.204)
- l.235: "acquitions" -> "acquisitions"
- l.390: 5̃0 -> ~50
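For reference, the band-integration equation suggested in the comment on l.~150 can be typeset as below. The first line is the relation as stated in the comment. The second line is a common response-weighted form, given only as an assumption: the symbol $S_{B12}(\lambda)$ for the B12 spectral response function does not appear in this discussion, and the manuscript may define the quantity differently.

```latex
% Band-integrated top-of-atmosphere radiance, as written in the comment
L_{B12} = \int_{B12} L_{TOA}(\lambda)\, d\lambda

% Assumed response-weighted variant (S_{B12} is a hypothetical symbol)
\bar{L}_{B12} = \frac{\int S_{B12}(\lambda)\, L_{TOA}(\lambda)\, d\lambda}
                     {\int S_{B12}(\lambda)\, d\lambda}
```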
Citation: https://doi.org/10.5194/amt-2022-261-RC2
AC2: 'Reply on RC2', Javier Gorroño, 07 Dec 2022
Dear reviewer,
We have carefully considered your comments in the revised version of the manuscript. These comments have been very helpful in improving the overall quality of the manuscript. Below we respond individually to each of the comments:
- While the conclusions are indubitably important, one can regret the low amount of testing. Indeed, only 5 plume shapes and three locations (at a specific date) were used. I think it is too little to derive an accurate conclusion, especially given the variance of the results.
This is a very important comment and we thank the reviewer for bringing it up. In this revised version we have added a new subsection named “Extending the validation to a large plume dataset and season changes”. This section includes 221 methane plumes and the winter vs. solstice acquisitions. The three sites are representative of a typical scenario for S2 methane detection. However, we agree that more sites could be studied. As the reviewer points out, follow-up work is expected to focus on a large number of sites and conditions, building on an already consolidated methodology.
- How stable is the simulation process with regard to the spectral characterization? The ESA regularly updates the spectral characterization of both S2A and S2B, so I assume that the analysis needs to be done again each time, but one can wonder how large the error would be if this wasn't done (and, as such, what error is made when using an imprecise spectral characterization).
We use version 3.0 of the spectral response. The changes in the spectral characterization mostly affect B1, B2 and B8. For methane detection and quantification we use B11 and B12, where the changes are not significant; see Issue #39 of the ESA Sentinel-2 MSI Level-1C Data Quality Report.
- The authors mention the spectral sensitivity difference between S2A and S2B. It is not entirely clear which satellite is used for the different experiments (it seems that the observation is from S2A and the reference from S2B based on l.106-112, and thus the synthetic experiment uses purely S2B). I think it could be interesting to study in that case the difference in detection limit between the two satellites (I assume that the S2A limit is smaller than that of S2B since it is more sensitive to methane) and whether there is a gain in using the reference image from the same satellite as well.
We have studied a couple of examples over Hassi Messaoud which suggest a small increase in the detection limit (somewhere between 0 and 500 kg/h). It is an interesting point that we mention only briefly at the end of the manuscript, as we believe it should be addressed with a more detailed analysis in a follow-up study. Temporal normalisation is a complex multi-criteria decision. We explicitly mention in the manuscript that we start from the idea that an acquisition separated by 5 or 10 days (same viewing) is a priori the best observation available. However, on a case-by-case basis this varies depending on spectral homogeneity, geolocation, or simply cloud coverage.
- The difference with the approach from Cusworth et al. needs to be more explicit. From the text, I notice at least two differences (simulated vs real images and the proposed correction term); are there other differences? I also think that comparing the proposed retrieval approach to the one described in Cusworth et al. could be a valuable additional experiment. As it stands, it is difficult to figure out the added value of the proposed quantification method compared to the one previously presented, since no explicit comparison is done.
No comparison with Cusworth et al. was included, since the missions and scope of that work differ from those of this study. The correction term proposed here amounts to a correction of 15-20%, as shown in Fig. 5. It results in a reliable benchmark product, which we believe is an important added value.
- Some of the analysis is missing (or should be cited appropriately if it comes from another source), such as l.170 "it was found that ..." or l.178 "". I think that the experiments that lead to these assumptions should be presented as well, to prove that these are indeed valid assumptions.
We have made an extra effort to provide details of the methodology and results. It is possible that minor details that do not affect the content reproducibility are not included in order to reduce the manuscript length and improve readability.
- In section 3, the authors go from a 3D plume simulation to a 2D mass field, thus neglecting the impact of the altitude of the plume. This should be clarified in the text (i.e. clearly stating the assumptions + justification). Clearly specifying all hypotheses for the study is very important.
We agree with the reviewer that altitude might introduce small differences in the column of dry air. This is mentioned among the upcoming improvements in the conclusions section.
- Given that the correction term is scene dependent, it would have been interesting to have Fig.5 for the other sites as well, to see if there are noticeable differences between sites.
The correction is scene dependent, but variations are expected to be small. The major component of the correction is the downwelling irradiance, which will be highly similar between sites in the SWIR due to the high atmospheric transmittance. The surface effect is considered a second-order error. In further adaptations of the study to this or other missions, this scene dependence (e.g. the surface) could be simplified at the cost of a minor error in the simulated products. In the conclusions section we have now completed the paragraph with these comments.
- It is not entirely clear when the temporal normalization is the same as the observation image. I think this should be clarified for the different experiments. Another interesting experiment is the measure of the variability of the estimation as a function of the reference date. The method recommends using the closest reference, but this would allow answering important questions such as "What happens when the closest is a poor reference?" or "Is the closest always the best choice or could another criterion be better?"
The new subsection, “Extending the validation to a large plume dataset and season changes”, introduces a discussion about normalization. It raises, for example, the importance of phenological considerations. Different periods of the year might show different vegetation covers (for example, grass in the summer). Cloud and cirrus would definitely be another important criterion. Temporal normalization is a topic that requires understanding of multiple conditions (orbit, phenology, cloud, viewing...). We covered different cases and highlighted some of these issues. However, this will require a dedicated follow-up study. The conclusions section now includes a better explanation of temporal normalization.
- I think that a metric such as the SNR between the plume (delimited by the mask) and the rest of the scene used in Ehret et al. (2022) could be useful to describe the different behavior in this section.
This is an important suggestion that could be added as a potential metric in future reviews. However, we feel at this point such a metric could be counter-intuitive. The error in the enhancement image is not spatially uncorrelated but contains features (blob-like shapes are mentioned here).
- I suggest that the authors cite the 2021 report of the Intergovernmental Panel on Climate Change that contains a very detailed analysis of the impact of methane emissions on the planet.
Done
- "high-quality calibration and high temporal revisit": I think that this should be clarified. The calibration and temporal revisit are poor compared to other satellites such as Sentinel-5P, so I think it would be better to state the characteristics of Sentinel-2 explicitly instead of just qualifying them as "high".
The sentence has been changed to “detailed instrument characterization, and a 5-day temporal revisit”.
- For me, some information is difficult to find in the text. For example, at l.198 the authors state that "methane plume quantification is obtained from isolation of the term ...". It is only two sentences later that it is explained how (I think, since no clear link is made). These links must be made explicit so that these sections are easier to understand and follow.
This sentence has been rewritten to improve readability.
- I suggest adding an equation that makes explicit that $L_{B12} = \int_{B12} L_{TOA}(\lambda)\,d\lambda$ (l.~150)
Done
- The paper is filled with small typos. I will list the ones that stood out to me, but I'm sure there are many others:
- l.96: missing ref
- "normalisation" (l.108) vs "normalization" (l.204)
- l.235: "acquitions" -> "acquisitions"
- l.390: 5̃0 -> ~50
These changes have been correctly implemented.
Citation: https://doi.org/10.5194/amt-2022-261-AC2