Detecting and quantifying methane emissions from oil and gas production: algorithm development with ground-truth calibration based on Sentinel-2 satellite imagery

Abstract. Sentinel-2 satellite imagery has been shown to be capable of detecting and quantifying methane emissions from oil and gas production. However, current methods lack performance validation through calibration against ground-truth testing. This study developed a multi-band-multi-pass-multi-comparison methane retrieval algorithm that enhances Sentinel-2 sensitivity to methane plumes. The method was calibrated using data from a large-scale controlled release test in Ehrenberg, Arizona in fall 2021, with three algorithm parameters tuned based on the true emission rates. The tuned parameters are the pixel-level concentration upper bound threshold during extreme value removal, the number of comparison dates, and the pixel-level methane concentration percentage threshold used when determining the spatial extent of a plume. We found that a low value of the upper bound threshold during extreme value removal can result in false negatives. A high number of comparison dates helps enhance the algorithm sensitivity to the plumes in the target date, but values in excess of 12 days are neither necessary nor computationally efficient. A high percentage threshold when determining the spatial extent of a plume helps enhance the quantification accuracy, but it may harm the yes/no detection accuracy. We found that there is a trade-off between quantification accuracy and detection accuracy. In the scenario with the highest quantification accuracy, we achieved the lowest quantification error and had zero false positive detections; however, the algorithm missed 3 true plumes, which reduced the yes/no detection accuracy. In contrast, all the true plumes were detected in the highest detection accuracy scenario, but the emission rate quantification had higher errors. We also illustrated a two-step method that updates the emission rate estimates in an interim step, which improves quantification accuracy while keeping high yes/no detection accuracy.

estimate suggested that >80 Tg of methane emissions were from the oil and gas sector across the globe in 2021, ∼30% higher than the 62 Tg in 2000 (IEA, 2022). The most detailed studies to date have been performed in the United States, where the methane loss rate from oil and gas supply in 2015 was estimated at 2.3% of the gross natural gas production (Alvarez et al., 2018). Studies also claim that the U.S. official inventories have been consistently underestimating methane emissions in oil and natural gas systems, suggesting a more important role for methane in GHG emissions reduction in the oil and gas sector (Alvarez et al., 2018; Brandt et al., 2014; Zavala-Araiza et al., 2015; Rutherford et al., 2021).
Reducing methane loss from oil and gas systems will require measurement and monitoring. Because of the large spatial scale of the oil and gas industry, there has been significant interest in methane measurement methods using aircraft or satellites to detect methane emissions across large areas (Karion et al., 2013; Hausmann et al., 2016; Chen et al., 2022; Cusworth et al., 2021). In particular, satellite detection has been considered a promising methane emissions monitoring technology because of its frequent revisit time, wide spatial coverage and low labor cost. SCIAMACHY (2003-2012) and the Greenhouse Gases Observing Satellite (GOSAT, 2009-present) were the first two satellites to measure total methane columns by solar backscatter in the shortwave infrared (SWIR) (Jacob et al., 2016). The EO-1 Hyperion spectrometer achieved the first orbital detection of a methane superemitter plume from the Aliso Canyon release in 2016 (Thompson et al., 2016). The TROPOspheric Monitoring Instrument (TROPOMI) on the Sentinel-5 Precursor satellite, launched in 2017, maps methane columns with daily global coverage at up to 7 × 5.5 km² resolution (Veefkind et al., 2012; Hu et al., 2018). The GHGSat constellation instruments, launched from 2016 to 2022, each provide methane measurements with 25-50 m spatial resolution over a ∼12 × 12 km² domain (Varon et al., 2018, 2020). More recently, the Sentinel-2 twin land-surveying satellites launched in 2015 and 2017 were shown to have moderate sensitivity to methane at specific wavelength bands (Varon et al., 2021).
The Sentinel-2 constellation has two polar-orbiting satellites placed in the same sun-synchronous orbit, phased at 180° to each other. The main Sentinel-2 data products are imagery from 13 spectral bands from the visible to the SWIR (Phiri et al., 2020).
Among these spectral bands, bands 11 (∼1560-1660 nm) and 12 (∼2090-2290 nm) integrate radiances over methane's 1650 and 2300 nm SWIR absorption features, thus enabling methane detection and quantification. Because of its global coverage, fine spatial resolution (20 × 20 m² in bands 11 and 12) and frequent revisit time (2-5 days), Sentinel-2 is believed to have potential for large-scale high-frequency monitoring of methane plumes in oil and gas producing regions (Ehret et al., 2021).
Varon et al. (2021) developed three retrieval approaches to derive methane enhancements across a scene of a methane point source based on the Sentinel-2 data in bands 11 and 12. The single-band-multi-pass (SBMP) retrieval method uses the changes in band-12 reflectance between a satellite pass with a plume and a pass sampling a reference scene with no plume to derive methane column enhancements. The multi-band-single-pass (MBSP) retrieval compares reflectance in bands 11 and 12 from a single pass. The multi-band-multi-pass (MBMP) retrieval takes the difference between MBSP retrievals from two passes to remove artifacts from the retrieval field. In that work, two case studies of applying these approaches to methane point-source plume detection from oil and gas facilities were presented, one in the Hassi Messaoud oil field of Algeria and the other in the Korpezhe oil and gas field of Turkmenistan. The Korpezhe retrieval results were shown to be consistent with GHGSat-D satellite instrument observations in 2018-2019, although with higher observation density. Among the three retrieval methods, the MBMP method generally performs the best, mainly because it increases the contrast of the plumes by combining two spectral bands and having one pass sampling a reference scene.
However, the retrieval methods from Varon et al. (2021) might still be improved. First, the retrieved emission source rates need to be calibrated against ground-truth values to validate the performance of the sensor and the retrieval method. Varon et al. (2021) validated the retrieval results by comparing them with GHGSat observations, since GHGSat has relatively higher precision; however, ground-truth calibration with controlled release volumes is still essential for performance validation and for fine-tuning the retrieval method. Second, the retrieval methods include tunable parameters, such as the percentage threshold during plume mask extraction, but the optimal values of these parameters were not discussed. Lastly, because of Sentinel-2's limited sensitivity to methane, the MBMP retrieval method can generate false detections if the atmospheric conditions between satellite passes are different or if some ground features have higher reflectance in band 11 than in band 12. Removing these false detections still relies on manual verification, such as checking whether a similar shape occurs in the satellite observation of the other bands or in the imagery basemap. New modifications are needed to remove the false detections at scale in a reasonable and convenient way.

Here we present a multi-band-multi-pass-multi-comparison (MBMPMC) retrieval algorithm based on the MBMP approach from Varon et al. (2021). The new algorithm extends the MBMP approach to enhance its sensitivity to methane plumes and reduce false detections. Additionally, we calibrated the method using data from a single-blind controlled release test in Ehrenberg, Arizona in fall 2021. During calibration, three algorithm parameters were tuned based on the ground-truth emission rates to improve the algorithm performance. Furthermore, we show two simple application studies of the new algorithm, one over an extended time period in the same region as the controlled release test, and one in a different region over the same time period. To our knowledge, this is the first time that a methane detection and quantification algorithm based on Sentinel-2 imagery has been calibrated with ground-truth emission rates.
The MBMPMC retrieval algorithm is an improved retrieval method with modifications based on the MBMP retrieval method from Varon et al. (2021). The new algorithm follows the same logic of retrieving the vertical column concentrations of atmospheric methane ∆Ω (kg m⁻²) from Sentinel-2 SWIR reflectances (see Figure 1). The main idea is to retrieve methane column concentrations from one spectral measurement featuring methane absorption and one without it, such as two observations from different passes with or without a methane plume, or two adjacent spectral bands with different methane absorption properties. For a given scene, the method compares the Sentinel-2 measurements with the top-of-atmosphere (TOA) radiance simulated by a 100-layer, clear-sky radiative transfer model at 0.02 nm spectral resolution over the band 11 and 12 wavelength ranges.
The specific steps are as follows. First, in a specific pass 1, the methane concentration enhancements are retrieved by minimizing the difference between the fractional change of Sentinel-2 reflectance and a fractional absorption model based on the simulated TOA radiance in bands 11 and 12. The same process is then repeated in another pass 2, and the difference of these two retrieved column enhancements (two MBSP retrievals) is the MBMP methane column enhancement in pass 1 (Equation (1)). Here the subtraction between two passes aims to remove systematic errors in the MBSP retrieval due to wavelength separation between bands 11 and 12. In other words, the MBSP retrieval in pass 2 is mainly used for removing artifacts of the MBSP retrieval in pass 1. Therefore, for clarity, in this paper we call pass 1 the "target date (TD)" and pass 2 the "comparison date (CD)". The TD in our method is the date for which the plume size is estimated. By default, the target date is assumed to be chronologically after the comparison date, although in practice this need not be the case.
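Written out from the description above, the subtraction in Equation (1) takes the following form. This is a sketch for reference; the superscripts TD and CD are labels assumed here for the target-date and comparison-date MBSP retrievals.

```latex
\Delta\Omega_{\mathrm{MBMP}} = \Delta\Omega_{\mathrm{MBSP}}^{\mathrm{TD}} - \Delta\Omega_{\mathrm{MBSP}}^{\mathrm{CD}} \tag{1}
```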
We make some modifications during the column retrieval process since the MBMP retrieval can still lead to false detections, especially in the MBMP subtraction step (Equation (1)). In theory, in the background with no methane plume, we expect the two MBSP retrievals to have similar values of methane column enhancements since they are at the same scene. However, this is not always true because: (1) the MBSP retrieval can be greatly affected by atmospheric conditions such as cloud coverage; (2) the MBSP retrieval in one pass may have a similar spatial distribution but with all the pixel values higher or lower than the MBSP retrieval in another pass due to differences in various atmospheric or earth properties (e.g., solar zenith angle, surface albedo) between different dates; and (3) other unpredictable random measurement errors can occur in a specific pass. Therefore, we add the following steps to further reduce the number of false detections (see Figure 1 for sequence):

Choose clear-view passes. First, we only select passes with a clear view for both the target date and comparison dates since clouds can result in false detections by affecting reflectance. Here we use Sentinel-2 cloud probability, a data product created with the sentinel2-cloud-detector library, to select clear-view passes with no large cloud coverage. Specifically, we select the passes with less than 10% of cloud coverage (i.e., the area with cloud probability higher than 65% is less than 10% of the total area of the study region).
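The clear-view criterion above can be sketched in a few lines. This is only an illustration under stated assumptions: the cloud probability is taken to be available as a per-pixel grid in percent, and the function names and toy grids are hypothetical, not part of the sentinel2-cloud-detector library.

```python
def cloud_fraction(cloud_prob, prob_threshold=65.0):
    """Fraction of pixels whose cloud probability (percent) exceeds the threshold."""
    pixels = [p for row in cloud_prob for p in row]
    cloudy = sum(1 for p in pixels if p > prob_threshold)
    return cloudy / len(pixels)


def is_clear_view(cloud_prob, prob_threshold=65.0, max_cloud_fraction=0.10):
    """True when less than 10% of the study region is flagged as cloudy."""
    return cloud_fraction(cloud_prob, prob_threshold) < max_cloud_fraction


# Hypothetical 4 x 5 cloud-probability grids (percent), one per pass.
clear_pass = [
    [0, 5, 10, 3, 2],
    [1, 0, 70, 4, 6],
    [2, 3, 8, 9, 1],
    [0, 0, 5, 7, 2],
]
cloudy_pass = [
    [90, 85, 10, 3, 2],
    [80, 95, 70, 4, 6],
    [2, 3, 8, 9, 1],
    [0, 0, 5, 7, 2],
]

print(is_clear_view(clear_pass))   # 1 of 20 pixels is cloudy (5%)
print(is_clear_view(cloudy_pass))  # 5 of 20 pixels are cloudy (25%)
```

Only passes for which `is_clear_view` holds would be kept as target or comparison dates.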

Normalization. If two MBSP retrievals of Equation (1) have a uniform value difference in all the pixels, artifacts will still be preserved after the MBMP subtraction. So we normalize both MBSP retrievals before the MBMP subtraction to maximize the effects of artifact removal. For example, in Figure 2, the MBMP retrievals with normalization show more plume contrast with the background compared with the ones without normalization. Some artifacts, such as the straight line in the unnormalized retrieval with 09/19/2021 as the comparison date, are also removed in the normalized retrieval. Therefore, changing MBSP retrievals to the same scale helps enhance the ability to detect true methane plumes. However, note that the resulting concentration enhancements after normalization are no longer "actual" enhancements, and thus should not be used to calculate the emission rates. In other words, normalization is only used for detecting the plume location and shape.
Remove extreme values. In some cases extremely high methane column enhancements can be generated for a small number of pixels because of the appearance of random features in one of the two passes. Thus we also remove extreme values for the two MBSP retrievals before normalization. The removal method is based on setting upper and lower bound thresholds, and truncating values outside the bound thresholds to the threshold values. Here we set the lower bound threshold as 0 kg m⁻², and the upper bound threshold will be tuned using the controlled release experimental data below. As with normalization, this step is only used for plume detection and not for quantification.
Include multiple comparison dates. Instead of using a single comparison date, we include multiple comparison dates to help with plume detection. Unlike the "sliding window" method from Ehret et al. (2021), which uses a multi-linear regression onto 1-20 previous passes, we directly take the average of the comparison date retrievals as the subtrahend in the MBMP subtraction. Using multiple comparison days helps to stabilize the background, since the background values can vary among different passes due to weather, temperature, surface albedo differences, and other variation. As shown in Figure 3, more comparison dates provide a more stable background, and therefore are more likely to increase the contrast of the plumes. On the other hand, it is possible that in real applications the comparison date may also have methane plumes at the same location and with a similar shape as the plumes in the target date. In this case, it is harder for the algorithm to detect the target date plumes after the MBMP subtraction. Using the average of multiple comparison dates lowers the chance that a high-volume methane plume dominates the subtrahend, thus enhancing the algorithm sensitivity to the plumes in the target date. Here the comparison dates are selected as consecutive clear-view passes before the target date, and the number of comparison dates is a parameter that will be tuned using the controlled release experimental data below. Because the new algorithm considers multiple comparison dates for the multi-band-multi-pass approach, it is named the "multi-band-multi-pass-multi-comparison" (MBMPMC) retrieval algorithm.
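The detection-stage modifications described above (extreme value removal, normalization, and averaging over comparison dates) can be sketched as follows. This is a minimal illustration and not the authors' implementation: each MBSP retrieval is treated as a flattened list of pixel values, z-score normalization is an assumed choice because the text does not name the normalization scheme, and all function names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def clip(values, lower, upper):
    """Truncate values outside [lower, upper] to the nearest bound."""
    return [min(max(v, lower), upper) for v in values]


def normalize(values):
    """Z-score normalization (assumed here; the scheme is not specified in the text)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def detection_field(target, comparisons, b_u, lower=0.0):
    """Clipped, normalized target retrieval minus the pixel-wise mean of the
    clipped, normalized comparison-date retrievals (for detection only)."""
    def prepare(scene):
        return normalize(clip(scene, lower, b_u))

    target_n = prepare(target)
    background = [mean(pixels) for pixels in zip(*(prepare(c) for c in comparisons))]
    return [t - b for t, b in zip(target_n, background)]


# Hypothetical flattened MBSP retrievals (kg m^-2); the plume sits at index 3.
target_date = [0.01, 0.02, 0.01, 0.09, 0.01, 0.02]
comparison_dates = [
    [0.02, 0.03, 0.02, 0.03, 0.02, 0.03],  # same background pattern, uniform offset
    [0.03, 0.04, 0.03, 0.04, 0.03, 0.04],
]
field = detection_field(target_date, comparison_dates, b_u=0.10)
print(max(range(len(field)), key=field.__getitem__))  # index of the strongest pixel
```

In this toy case the uniform offsets between dates vanish after normalization, so the plume pixel stands out in the difference field. The resulting values are unitless and serve only to locate the plume, consistent with the note that normalized enhancements should not be used for quantification.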
After column retrieval, the methane column enhancements ∆Ω_MBMPMC are further used to calculate the emission source rate Q using the integrated mass enhancement (IME) method described by Varon et al. (2021) (Equation (2)) (Frankenberg et al., 2016; Varon et al., 2018). In this equation, IME is the integrated mass enhancement (kg), U_eff is the effective wind speed (m/s), and L is the plume size (m).
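With the symbols defined above, the IME relation of Equation (2) takes the standard form used in the cited works. It is shown here as a sketch for reference; with these units Q comes out in kg/s and would be multiplied by 3600 to obtain kg/h.

```latex
Q = \frac{U_{\mathrm{eff}} \cdot \mathrm{IME}}{L} \tag{2}
```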
To calculate IME, we first generate Boolean plume masks based on ∆Ω_MBMPMC by selecting methane columns above some percentage threshold for the scene, and smooth with a 3 × 3 median filter and a 3 × 3 Gaussian filter (see Figure 1(e)). Here the percentage threshold is a parameter that will be tuned using the controlled release experimental data below. This plume mask generation step sets the location and shape of the methane plumes.
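The mask generation step can be illustrated with a small sketch. It assumes that the percentage threshold p is applied as a quantile of the scene's pixel values, which is one reading of the description above, and it shows only the median filter; the Gaussian smoothing is omitted. The function names and the toy 8 × 8 scene are hypothetical.

```python
from statistics import median


def percentile_threshold(grid, p):
    """Value below which roughly a fraction p of the scene's pixels fall."""
    values = sorted(v for row in grid for v in row)
    rank = min(int(p * len(values)), len(values) - 1)
    return values[rank]


def median_filter_3x3(mask):
    """3 x 3 median filter on a 0/1 mask; edge pixels use the available neighbours."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            window = [
                mask[a][b]
                for a in range(max(i - 1, 0), min(i + 2, rows))
                for b in range(max(j - 1, 0), min(j + 2, cols))
            ]
            out[i][j] = int(median(window) >= 0.5)
    return out


def plume_mask(columns, p):
    """Pixels above the p-quantile of the scene, then median-smoothed."""
    cut = percentile_threshold(columns, p)
    raw = [[int(v > cut) for v in row] for row in columns]
    return median_filter_3x3(raw)


# Toy 8 x 8 scene (kg m^-2): uniform background, a 3 x 3 plume, one noisy pixel.
scene = [[0.001] * 8 for _ in range(8)]
for i in range(2, 5):
    for j in range(2, 5):
        scene[i][j] = 0.05
scene[0][7] = 0.06

mask = plume_mask(scene, p=0.80)
for row in mask:
    print("".join("#" if m else "." for m in row))
```

In this toy scene the isolated noisy pixel is removed by the median filter, while the core of the plume survives with its corners eroded. This matches the qualitative behaviour described later in the text, where a stricter mask reduces false positives but can shrink or remove true plumes.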
The IME is then defined as the sum, over all mask pixels, of the column enhancement multiplied by the pixel area. Note that the column enhancements here are the original enhancements, without any of the data transformations (normalization or extreme value removal) that are applied to aid detection of the plume shape. The effective wind speed U_eff is a function of the local 10 m wind speed U_10 derived by Varon et al. (2021) and calibrated with large-eddy simulations. We collect local wind speed data from the High-Resolution Rapid Refresh (HRRR) atmospheric model from the U.S. National Oceanic and Atmospheric Administration (U.S. NOAA, 2021). The plume size L is taken in a simplified form as the square root of the plume mask area.
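The quantification step described above can be sketched as follows. This is a minimal example under stated assumptions: the effective wind speed is passed in directly rather than computed from U_10, because the functional form from Varon et al. (2021) is not reproduced here, and the function name and input values are hypothetical.

```python
import math


def emission_rate_kg_per_h(columns, mask, pixel_area_m2, u_eff_m_s):
    """Source rate Q = U_eff * IME / L, converted from kg/s to kg/h."""
    # IME (kg): column enhancement (kg m^-2) times pixel area, summed over mask pixels.
    ime_kg = sum(
        c * pixel_area_m2
        for col_row, mask_row in zip(columns, mask)
        for c, m in zip(col_row, mask_row)
        if m
    )
    n_mask = sum(1 for row in mask for m in row if m)
    if n_mask == 0:
        return 0.0  # no plume mask, so no emission is reported
    # Plume size L (m): square root of the plume mask area.
    plume_size_m = math.sqrt(n_mask * pixel_area_m2)
    return u_eff_m_s * ime_kg / plume_size_m * 3600.0


# Hypothetical original (untransformed) column enhancements in kg m^-2.
columns = [
    [0.05, 0.05, 0.0],
    [0.05, 0.05, 0.001],
]
mask = [
    [1, 1, 0],
    [1, 1, 0],
]
# 20 m x 20 m Sentinel-2 pixels give 400 m^2; u_eff = 2 m/s is an assumed value.
q = emission_rate_kg_per_h(columns, mask, pixel_area_m2=400.0, u_eff_m_s=2.0)
print(q)  # IME = 80 kg over L = 40 m at 2 m/s gives 4 kg/s, i.e. about 14400 kg/h
```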

Performance assessment
To validate the performance of the new algorithm, calibration is required to compare the algorithm outcome with the ground truth. The goal of calibration is to assess the algorithm performance in both detection and quantification. Accurate yes/no detection is defined as the algorithm being able to detect a methane plume when it appears, and detecting nothing when no plume appears. Accurate quantification means that the emission rate estimates derived from the algorithm are consistent with the ground-truth measured release volumes.
Additionally, the algorithm performance can be improved by tuning parameters to best match the ground truth. The three parameters that affect the algorithm outcome are described below.
The upper bound threshold b_u: b_u is the parameter used during extreme value removal; retrieval values higher than b_u are considered to be extreme outliers and are replaced by the threshold value. A lower b_u therefore means a stricter constraint during extreme value removal. Ideally, an optimal b_u helps remove false detections due to the extreme highs. However, if b_u is too low, a true methane plume may also be ignored since its retrieval values could be removed.
The number of comparison dates n: We expect that a higher n gives a more stable background and thus increases the contrast of the plume. However, this stability increase is not linear, so increasing n may not help much when n is already very large. In addition, the computation workload increases approximately linearly with n.
The percentage threshold p: The higher p is, the fewer pixels are included in the plume mask, so a higher p means a smaller plume mask area. This may help with removing false positives and enhancing quantification accuracy, but may also lead to false negatives or to underestimation of plume volume if p is set too high.
To quantify the algorithm performance, we use two assessment factors that focus on different aspects. First, we choose the F1 score to assess the performance of detection. The F1 score is a function of "precision" and "recall", measures of false positives and false negatives respectively (Equations (3)-(5)). The F1 score ranges from 0 to 1, with higher values representing better algorithm performance. In addition, we choose the average absolute error (AAE) to assess the performance of quantification (Equation (6), where x_i and x̂_i are the emission rate estimate and ground-truth emission rate on day i, and N is the number of days).

Results
In fall 2021, a single-blind controlled release test was conducted by the Stanford University Environmental Assessment & Optimization Group. The test was performed in Ehrenberg, Arizona. The testing methods are described in detail in Rutherford et al. (2022), and the test was generally similar to previous tests of airplane-based methane plume detection from the same group (Sherwin et al., 2021). This test aimed at assessing the performance of various aircraft and satellite methane detection technologies. During the test, the participants were given the time and location of the potential release, although the methane plume volumes (including zero, i.e., no methane plume) were unknown to them. Participants were asked to estimate the mass emissions rate during each observation in kg CH₄/h. Specifically for Sentinel-2, there are 7 clear-view satellite passes and one cloud-covered pass covered in this test from 10/17/2021 to 11/03/2021. Here we consider only the 7 clear-view passes, and also add three dates after the test with zero emission, so that in total 10 target dates with ground-truth emission rates are used for the ground-truth calibration. Of the 10 target dates, 5 have methane plumes with non-zero emission rates, and 5 have no methane plumes. Figure 4 region A is the study region that covers the controlled release point source. After calibration, we also provided two simple application studies to validate the algorithm performance (Section 3.2).
Because we lacked other ground-truth data to use as a blind test set, one goal of these application studies was to test if the algorithm can avoid generating false positives in the case of no methane plumes.

Controlled release calibration
We selected a wide value range for each algorithm parameter during the parameter tuning. For b_u, we noticed that the magnitudes of the pixel-level column enhancements of a methane plume are usually from 10⁻³ to 10⁻¹ kg m⁻². So we selected 10 values from 0.01 to 0.1 kg m⁻² with increment 0.01 kg m⁻², and 4 other values: 0.005, 0.12, 0.15 and 0.20 kg m⁻². For n, for each target date 15 clear-view passes were selected, with the earliest comparison date around 45 days before the target date, so n ranges from 1 to 15 with increment 1. For p, 16 values were selected from 0.80 to 0.95 with increment 0.01. Therefore, there are in total 3360 scenarios of different combinations of the three parameters. Each of these 3360 parameter settings was run to quantify volumes from all 10 study days.
Figure 5 shows how each parameter affects the algorithm outcome. In each panel, an assessment factor (AAE or F1 score) is shown as a function of two parameters, based on a fixed value of the third parameter (i.e., a "slice" through 2 parameters keeping the third constant). Here the fixed values are from the parameter setting with the lowest AAE. Figures 5(a)(b) show that a small b_u value (0.005-0.02 kg m⁻²) leads to bad algorithm performance with high AAE and low F1 score (AAE > 1.3, F1 score < 0.4). This suggests that the b_u constraint is too strict in this range and removes retrievals not only from the extreme highs, but also from true methane plumes. Thus the algorithm starts to generate false negatives. In particular, in Figure 5(b), when b_u is 0.005 kg m⁻², we see NaN values of F1 score because there is no true positive detection at all. Aside from the low value range, AAE and F1 score show less sensitivity to b_u at the other values. Therefore, the conclusion from b_u tuning is that one should avoid excessively low values of b_u (< 0.02 kg m⁻²).
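The size of the parameter grid described at the start of this subsection can be checked with a short enumeration. This is only an illustrative sketch; the variable names are hypothetical and the values are those listed in the text.

```python
from itertools import product

# Upper bound threshold b_u (kg m^-2): 0.01 to 0.10 in steps of 0.01, plus four extra values.
b_u_values = [round(0.01 * k, 2) for k in range(1, 11)] + [0.005, 0.12, 0.15, 0.20]
# Number of comparison dates n: 1 to 15.
n_values = list(range(1, 16))
# Percentage threshold p: 0.80 to 0.95 in steps of 0.01.
p_values = [round(0.80 + 0.01 * k, 2) for k in range(16)]

# Every (b_u, n, p) combination is one scenario to run on all target dates.
scenarios = list(product(b_u_values, n_values, p_values))
print(len(b_u_values), len(n_values), len(p_values), len(scenarios))  # 14 15 16 3360
```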
Figures 5(a)(c) show a roughly decreasing trend of AAE with higher n when n < 12. This suggests that a higher n helps with quantification accuracy by providing a more stable background and lowering the possibility of a high-volume plume in the comparison dates. However, AAE does not show an obvious decrease when n ≥ 12, which suggests that 12 or more comparison dates are not necessary, or at least cease to improve performance. Figures 5(b)(d) show low F1 scores when n is low (for example, F1 scores < 0.67 when n = 2). This is because some target dates have earlier comparison dates with higher methane plume volumes, and a low value of n does not effectively reduce the average volume in the comparison dates, thus resulting in more false negatives. In real applications, this may be a more serious problem if the plume is continuous over a long time period with varying volumes. Additionally, computational cost is roughly proportional to n, so too high a value of n can incur excessive computational costs with little benefit to accuracy. Therefore, the value of n should be neither too low nor too high, and from the figures we can conclude that a reasonable choice of n is in the range 10-12.
Figures 5(c)(e) show that AAE decreases with higher p at first, but starts to increase when p > 0.92. The decreasing trend is due to smaller plume volumes and fewer false positives resulting from smaller plume masks during the Boolean plume mask generation. The increasing trend in the high p range, however, arises because p becomes sufficiently high that no mask is generated even for the dates with real methane plumes. This also explains why in Figures 5(d)(f) the F1 score is low in high p ranges. Low AAEs occur in the p range 0.91-0.93, while high F1 scores occur in the p range 0.85-0.86. This suggests a trade-off between accurate quantification and accurate yes/no detection: accurate quantification usually requires a high p value, but accurate yes/no detection needs a lower p value (though not excessively low). Therefore, when selecting the best p value, we can choose to emphasize quantification accuracy and accept the possibility of missing plumes (p > 0.90), or we can choose to detect more plumes and accept the possibility of emission rate overestimation (p ≈ 0.85).

Two specific scenarios shown in Tables 1 and 2 further illustrate the trade-off between accurate quantification and accurate yes/no detection. The "Min AAE" scenario is an example of pursuing quantification accuracy. It has the lowest AAE of all the parameter settings and the highest precision, meaning that it also has the minimum number of false positives. However, this scenario has three false negatives that reduce the F1 score. Aside from this specific scenario, the top 1% of scenarios with low AAEs have b_u ranging widely over 0.03-0.15, n in a middle-to-high range of 7-14 and p staying high at 0.91-0.92. On the other hand, the "Max F1 score" scenario has the highest F1 score. It has no false negatives, but in order to find all plumes it becomes too aggressive, leading to one false positive. Note that multiple scenarios have the same highest F1 score, and the scenario we show here is the one with the lowest AAE among them. The top 1% of scenarios with high F1 scores have b_u ranging widely over 0.02-0.12, n in a wide range of 1-15 and p in the middle range of 0.82-0.85.
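The two assessment factors can be sketched with the standard definitions of precision, recall, F1 score and average absolute error. The detection counts below follow from the numbers stated in the text (5 target dates with plumes, three false negatives and no false positives for "Min AAE", no false negatives and one false positive for "Max F1 score"); the emission rates in the AAE example are hypothetical, and the function names are illustrative.

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall)."""
    if tp == 0:
        return float("nan")  # no true positive detections at all
    precision = tp / (tp + fp)  # penalizes false positives
    recall = tp / (tp + fn)     # penalizes false negatives
    return 2 * precision * recall / (precision + recall)


def average_absolute_error(estimates, truths):
    """AAE = (1/N) * sum of |estimate - ground truth| over the N target dates."""
    return sum(abs(x - t) for x, t in zip(estimates, truths)) / len(truths)


# "Min AAE": 2 of 5 plumes found, no false positives.
print(round(f1_score(tp=2, fp=0, fn=3), 2))  # 0.57
# "Max F1 score": all 5 plumes found, one false positive.
print(round(f1_score(tp=5, fp=1, fn=0), 2))  # 0.91

# Hypothetical estimates and ground-truth rates (same units) for three dates.
print(average_absolute_error([7.0, 0.0, 2.5], [7.38, 0.0, 1.5]))
```

These values are implied by the counts alone and illustrate why three missed plumes pull the F1 score down far more than a single false positive.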
As a compromise, we developed a method that applies the approaches in sequence to reduce the quantification error further while keeping a high F1 score. The specific steps are: (1) apply a scenario with a high F1 score as the base case to generate the first round of emission rate estimates; (2) raise the value of p and apply the updated scenario again to generate the second round of emission rate estimates; (3) for the passes with non-zero emission rates in both scenarios, update the base case estimates to the new ones, since they are likely to be closer to the ground-truth volumes. We name this method the "two-step application" method. Here we only change the value of p, since the mask extraction step where p is applied comes after the column retrieval step where b_u and n are applied. Keeping b_u and n unchanged greatly reduces the computation workload, as we only need to redo the mask extraction. Table 1 shows an example of the two-step application ("Two-step hybrid" scenario) with the "Base case" scenario. Results show that the "Two-step hybrid" scenario achieves a lower AAE than the "Base case" scenario with the F1 score remaining the same. Specific locations and shapes of detected plumes in the "Min AAE", "Max F1 score" and "Two-step hybrid" scenarios are shown in Figure 6.
We also compared the performance of the MBMPMC algorithm with the MBMP, MBSP and SBMP methods from Varon et al. (2021) in Figure 7. The top row is for a true emission rate of 7.38 t CH₄/h, while the bottom row is for a true emission rate of 0 t CH₄/h. Results show that the MBMPMC algorithm performs the best, with both true positive and true negative detections. Its emission rate estimates are also the closest to the ground-truth volumes. The MBMP method has a true negative detection on 10/17/2021, but shows a small false positive detection on 10/19/2021. Its emission rate estimate on this date is also much lower than the ground truth. The MBSP and SBMP retrievals perform worst, with multiple large-area false positive plumes.
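The update rule of the two-step application method described above can be sketched as follows. This is an illustration of step (3) only, assuming the two rounds of estimates are already available as per-pass lists; the function name and the emission rates are hypothetical.

```python
def two_step_estimates(base, high_p):
    """Keep the base-case detections; where both runs report a non-zero rate,
    replace the base-case rate with the rate from the higher-p run."""
    return [hp if (b > 0 and hp > 0) else b for b, hp in zip(base, high_p)]


# Hypothetical emission rate estimates (t CH4/h) for five passes.
base_case = [8.1, 0.0, 3.2, 1.4, 0.0]   # lower p: more detections, larger masks
high_p_run = [7.2, 0.0, 0.0, 1.1, 0.0]  # higher p: fewer, tighter masks

print(two_step_estimates(base_case, high_p_run))  # [7.2, 0.0, 3.2, 1.1, 0.0]
```

Because the set of detected passes is taken from the base case, the yes/no detection outcome and hence the F1 score are unchanged, while the rates on passes detected by both runs come from the tighter mask.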

Broader application to examine false positives in cases of no ground release
To test the algorithm's performance in avoiding false positives, we applied the algorithm with the "Min AAE" scenario, since it achieved zero false positives in the ground-truth calibration above. Two application studies were designed, one in an extended three-month time period from 10/01/2021 to 12/31/2021 in the same region as the controlled release test (Figure 4 Figure 8.

Conclusion
This study presented a multi-band-multi-pass-multi-comparison (MBMPMC) methane retrieval algorithm using Sentinel-2 satellite imagery, with several modifications based on the multi-band-multi-pass (MBMP) retrieval method from Varon et al. (2021). The major modification is including multiple comparison dates in the retrieval, which helps increase the contrast of the plume by stabilizing the background.
The new retrieval algorithm was then calibrated by a controlled release test in Ehrenberg, Arizona in fall 2021. During calibration, three algorithm parameters were tuned based on the ground-truth emission rates to improve the algorithm performance. They are the pixel-level concentration upper bound threshold b_u for extreme value removal, the number of comparison dates n, and the pixel-level methane concentration percentage threshold p used when determining the spatial extent of a plume. We found that although the algorithm sensitivity to b_u is generally not very high, a low b_u value can decrease its accuracy by producing false negatives. The value of n should be high enough to enhance the algorithm sensitivity to the plumes in the target date, but values > 12 are neither necessary nor computationally efficient. A high p value helps enhance the quantification accuracy, but it may harm the yes/no detection accuracy by missing some true plumes.
The controlled release calibration suggests that there is a trade-off between quantification accuracy and detection accuracy. If the algorithm aims to guarantee the quantification accuracy and avoid false positives, then a b_u in the range 0.03-0.15, an n in the range 7-14 and a p in the range 0.91-0.92 are preferable. If the algorithm is expected to guarantee the detection accuracy, particularly with the fewest false negatives, then it would be more appropriate to choose b_u in the range 0.02-0.12, n in the range 1-15 and p in the range 0.82-0.85. We also illustrate a two-step method that changes the parameter values and updates the emission rate estimates in an interim step, which improves quantification accuracy while keeping high yes/no detection accuracy.
To our knowledge, this is the first study that validates the performance of a Sentinel-2 methane detection and quantification algorithm by calibrating it with ground-truth emission rates. We believe the ground-truth calibration offers researchers an opportunity to optimally tune methane retrieval algorithms and have confidence in their widespread deployment. In the future, the MBMPMC algorithm can be validated with more systematic experiments wherein the algorithm can be adjusted or tuned to meet different detection expectations.
We believe that the algorithm can still be improved further in the following aspects. First, the optimal values of the three parameters may vary in different situations. For example, b_u may vary with the methane plume volumes; n is affected by whether the plume is continuous or discrete in time; and p also depends on the area of the plume and the area of the study region, so it may vary with the study region size. This study simplifies these problems since our controlled release test covers only one region over a short time period. We hope to explore these questions further with more abundant ground-truth data in the future. Additionally, the current algorithm focuses more on removing false positives resulting from the background noise of the comparison dates. In real applications, however, more false positives due to the background noise of the target dates may be generated. Removing these false positives requires more work after the plume mask generation, such as removing the plume masks that are far away from known well pad or pipeline locations. Other options may involve applying machine-vision-based shape learning methods to filter out plume masks with shapes unlikely to be generated by a gas cloud. We hope to develop an efficient method of false detection removal so that Sentinel-2 can play a more important role in routine oil and gas methane monitoring at the global scale.
Code and data availability. The methane detection and quantification algorithm code will be made available upon request. The methane column retrieval code will be made available for non-commercial use upon request (GHGSAT Data and Products - Copyright © 2021 GHGSAT Inc. All rights reserved). The Sentinel-2 satellite imagery is available in the Google Earth Engine (GEE) cloud platform, and the HRRR wind data are available in the AWS HRRR GRIB2 Archive. The data collection code for both will be made available upon request.
Two-step hybrid 0.04 10 0.87 → 0.91 1.09 0.67 1 2
"Min AAE" scenario: the scenario with the lowest AAE; "Max F1 score" scenario: the scenario with the highest F1 score; "Base case" scenario: the base case of the two-step application method example; "Two-step hybrid" scenario: the two-step application method example.