Constraining the Accuracy of Flux Estimates Using OTM 33A

Other Test Method 33A (OTM 33A) is a near-source flux measurement method developed by the Environmental Protection Agency (EPA) and used primarily to locate and estimate emission fluxes of methane from oil and gas (O&G) production facilities without requiring site access. A recent national estimate of methane emissions from O&G production included a large number of flux measurements of upstream O&G facilities made using OTM 33A and concluded that the EPA National Emission Inventory underestimates this sector by a factor of ∼2.1 (Alvarez et al., 2018). The study presented here investigates the accuracy of OTM 33A through a series of test releases performed at the Methane Emissions Technology Evaluation Center (METEC), a facility designed to allow quantified amounts of natural gas to be released from decommissioned O&G equipment to simulate emissions from real facilities (Fig. 1). This study includes test releases from single and multiple points and from equipment locations at different heights, with methane release rates ranging from 0.16 to 2.15 kg h⁻¹. Approximately 95% of individual measurements (N=45) fell within ±70% of the known release rate. A simple linear regression of OTM 33A versus known release rates at the METEC site gives an average slope of 0.96 with 95% CI (0.66, 1.28), suggesting that an ensemble of OTM 33A measurements may have a small but statistically insignificant low bias.

during the day to meet meteorological requirements, and diurnal variability of emissions associated with onsite maintenance could impact aircraft-based emission estimates in some basins (Schwietzke et al., 2017; Vaughn et al., 2018; Zaimes et al., 2019).
A final type of measurement technique used to estimate emissions from O&G production facilities, and the focus of this study, is the downwind measurement, which uses methane mixing ratio and wind measurements to derive the source flux. Downwind emission flux estimates are made using parameters measured in the field combined with additional parameters found with Gaussian or atmospheric dispersion models (Brantley et al., 2014; Rella et al., 2015; Caulton et al., 2018; Robertson et al., 2017; Lan et al., 2015; Foster-Wittig et al., 2015). Downwind measurements do not require site access, but may not be able to identify or capture all sources onsite, especially buoyant ones. Similar to TFR, these techniques require downwind roadways (50-200 m away) and a consistent wind direction. Operator-approved site access can improve OTM 33A measurement success in regions with limited downwind roadway infrastructure or complex topography. Though sampling can be considerably faster than with TFR or onsite techniques, it is hard to measure enough sites to get a representative sample (and therefore a flux) of an entire O&G basin (Harriss et al., 2015). As a whole, all of the emission measurement techniques mentioned here are only representative of a timescale between seconds and hours, and therefore have difficulty capturing emission sources with large temporal variability (U.S. EPA, 2014; Brantley et al., 2014; Robertson et al., 2017; Bell et al., 2017; Caulton et al., 2018; Vaughn et al., 2018).
This study focuses on a ground-based mobile emissions measurement approach, Other Test Method 33A (OTM 33A). OTM 33A is among the most common downwind methods, along with TFR, used to measure methane and VOC fluxes from O&G sources (Brantley et al., 2014, 2015; Robertson et al., 2017). A recent study by Bell et al. (2017)  ∼40-60% of emissions measured or estimated by onsite teams in the Fayetteville when the dominant emission source was an onsite direct measurement rather than a simulated emission source. OTM 33A had a larger low bias when manual or automated unloadings were measured. Manual or automated unloadings occur when the well pressure is not great enough to move liquids from the geologic formation, preventing gas flow to the pressurized sales line. To maximize the pressure differential, the well is vented directly to the atmosphere in order to remove accumulated liquids. This process can be performed manually or automatically, and may use a plunger to assist with liquid removal. This creates an emissions plume with high vertical velocity. It is likely that the majority of this plume would pass over the mobile laboratory unless ideal conditions and road access allow a downwind measurement site 200 m or less from the source. The results of the Bell et al. study add uncertainty to recent national methane emission estimates, which relied heavily on OTM 33A measurements in five O&G basins (Alvarez et al., 2018). However, the Alvarez et al. study also found that basin-wide emission estimates based on OTM 33A facility measurements agreed with airborne basin-wide flux estimates to within measurement uncertainty. Additionally, no significant low bias (>10%) was detected in numerous (>100) OTM 33A test releases conducted by multiple groups (Brantley et al., 2014; Robertson et al., 2017).
These test releases were all single point-source releases conducted in open terrain without obstacles, which may not be a reliable comparison to the types of emission sources encountered in O&G fields. The discrepancy between the results of the Bell et al. study and previous test releases, along with the potentially significant impact on national emission estimates, motivated the suite of more realistic test releases described here.

Mobile Laboratory
The University of Wyoming mobile laboratory is a customized Freightliner Sprinter van. The front of the van is equipped with a horizontal mast that projects instrumentation and the inlet at a fixed height of 4 meters above the ground, slightly beyond the vehicle's front bumper. Meteorological instruments on the mast include a 3-D sonic anemometer and an all-in-one compact weather station. The mast also includes a camera, an AirMar differential GPS, and a Teflon inlet (1/4" OD) for gas-phase species. Ambient air is pulled through the Teflon inlet at a rate of 6.5 L min⁻¹. For the test releases described here, the laboratory was instrumented with a G2204 Picarro Cavity Ringdown Spectrometer (CRDS), which has been modified to measure water vapor and dry methane concentrations at a frequency of 2 Hz. The Picarro has an additional meter of 1/8" OD Teflon tubing that branches from the main inlet line, resulting in a total sample transit time through the inlet to the instrument of one second. This lag is accounted for during data processing. Additionally, the van contains a battery bank, which allows the instrumentation and data acquisition system to be used while the vehicle engine is turned off.

The Picarro response was tested using two NIST-certified methane-zero air mixtures (2.538 ± 0.05 ppm, 101 ± 5 ppm) and ultra-high-purity zero air (UHPA) at intervals throughout the campaign to confirm stability and accuracy. The instrument was always within ±0.01 ppm of the lower NIST standard, ±1 ppm of the higher standard, and ±0.003 ppm of zero when tested with UHPA. The 5-second instrument precision is ±0.002 ppm. Due to the observed instrument stability and accuracy, no calibration adjustments were made to methane concentrations during data processing.

OTM 33A Measurement Method
OTM 33A is one of the EPA Geospatial Measurement of Air Pollution Remote Emission Quantification (GMAP-REQ) techniques designed to observe, characterize, and/or quantify emissions from a variety of sources, though OTM 33A has been used most often to measure emissions from O&G operations (U.S. EPA, 2014; Thoma, 2012; Brantley et al., 2014, 2015; Robertson et al., 2017). While several quantification approaches are possible with OTM 33A, the one most commonly employed is an inverse Gaussian approach, which is the focus of this manuscript. OTM 33A has three operational parts: concentration mapping, source characterization, and emission rate quantification. Detection of emissions occurs by driving downwind of possible emission sources in an attempt to transect an emissions plume, measure the ambient background trace gas mixing ratio, and, if possible, rule out any emissions from upwind sources. Source characterization includes observations of temporal variability and emissions composition. If enhancements of methane or other trace gases are detected during downwind transects of a possible source, the laboratory is parked 20-200 m directly downwind within the emission plume to quantify emissions. Care is taken to point the mast directly into the dominant wind direction to minimize the impact of turbulent eddies around the vehicle. Once the laboratory is safely positioned, the vehicle is turned off and an OTM 33A flux measurement begins. During the ∼20 minute measurement, 2 Hz measurements of wind direction (in x, y, and z), wind speed, temperature, and the methane mixing ratio are collected and time-stamped with a universal data system time. Meanwhile, distances from the mast of the laboratory to the possible emission sources are measured using a TruePulse laser range finder (Model 200). If possible, the most likely emission source is identified using an infrared camera (FLIR GF300). Site photos and observations are also collected.
The OTM 33A analysis program, written in MATLAB (2015), estimates an emission mass flux, Q [g s⁻¹], using the Gaussian dispersion equation (Eq. 1). The terms of this equation are found as follows. First, the lowest 5% of measured mixing ratios during the ∼20 minute measurement are averaged and considered ambient background, which was around 1.9 ppm (±0.15 ppm) of methane for this study. The background value is subtracted from the data to yield the methane enhancement. The analysis program bins observed methane enhancements by wind direction into 10° bins (Fig. 2(a)), and then calculates the average methane enhancement observed in each wind bin. A plot of methane enhancement vs. wind direction is then generated and fit to a Gaussian distribution (Fig. 2(b)). The Gaussian fit's apex is C_peak [g m⁻³]. To determine the expected spreading of
Equation 1 does not include any terms for ground reflection of the plume, plume buoyancy/velocity, or differences in height of the emission source and measurement inlet. OTM 33A assumes a single emission point. For this reason, OTM 33A is best suited for measuring O&G facilities with equipment concentrated in one area that have downwind roadways. OTM 33A struggles to quantify plumes with a particularly high vertical velocity or buoyancy (such as manual unloadings, lit or unlit flares, or very hot emissions). In this scenario, the calculated C_peak will not represent the center of the emission plume, leading to underestimation of these sources (Bell et al., 2017). The estimated lower detection limit of the method is 0.01 g s⁻¹ (0.036 kg h⁻¹) (Brantley et al., 2014).
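The first processing steps described above (background from the lowest 5% of samples, then averaging the enhancement in 10° wind-direction bins) can be sketched in a few lines. This is an illustrative Python sketch, not the MATLAB analysis program; the function names are hypothetical, the Gaussian fit itself is left out, and `point_source_flux` uses an assumed form of Eq. 1 (Q = 2π·σ_y·σ_z·U·C_peak), since Eq. 1 is not reproduced in this section. That assumed form is consistent with the statement that Eq. 1 has no reflection, buoyancy, or height terms, but it should be checked against the OTM 33A documentation.

```python
import math

def background(mixing_ratios, fraction=0.05):
    """Mean of the lowest `fraction` of samples, taken as ambient background."""
    ordered = sorted(mixing_ratios)
    n = max(1, int(len(ordered) * fraction))
    return sum(ordered[:n]) / n

def bin_enhancement(wind_dirs_deg, mixing_ratios, bin_width=10.0):
    """Average background-subtracted enhancement per wind-direction bin.

    Returns a dict keyed by the left edge of each bin, in degrees.
    """
    bg = background(mixing_ratios)
    sums, counts = {}, {}
    for wd, c in zip(wind_dirs_deg, mixing_ratios):
        left = math.floor((wd % 360.0) / bin_width) * bin_width
        sums[left] = sums.get(left, 0.0) + (c - bg)
        counts[left] = counts.get(left, 0) + 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}

def point_source_flux(c_peak, wind_speed, sigma_y, sigma_z):
    """ASSUMED form of Eq. 1 (not taken from this section):
    Q = 2*pi*sigma_y*sigma_z*U*C_peak, with c_peak in g m^-3,
    wind_speed in m s^-1, and sigmas in m, giving Q in g s^-1."""
    return 2.0 * math.pi * sigma_y * sigma_z * wind_speed * c_peak
```

In practice the binned averages would be fit to a Gaussian to obtain C_peak, and the mixing ratio (ppm) would be converted to a mass concentration before the flux is computed.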
A series of built-in data quality indicators (DQI) will flag an OTM 33A flux estimate for a variety of reasons, including poor Gaussian fit, inadequate sampling time within the emission plume, too variable wind speed or direction, or a maximum methane enhancement that is too small. Flags are then added up, and measurements are broken into categories that represent the probability an OTM measurement is a good flux estimate. For the current study, the same approach as Robertson et al. (2017) and Bell et al. (2017) was used where most of the Category 1 and a few Category 2 measurements that were only flagged for low methane concentrations (max enhancement less than 100 ppb above background) were considered. Occasionally, measurements with very few DQI flags (Category 1 measurements) will be thrown out after review of the Gaussian fit or if IR camera images suggest we are missing most of the emission plume. Full descriptions of the DQI can be found in SI Sect. 1.2, Robertson et al.

Test Releases
The University of Wyoming performed two sets of test releases to assess the ability of OTM 33A to quantify methane emissions. The first set of tests, the Christman Field Test Releases (CF-TR), were conducted in conjunction with Colorado State University in July and August of 2014 at the abandoned Christman Airfield in Fort Collins, CO. These releases consisted of two configurations, a simple point source (an opened gas cylinder) and a manifold (an elevated ∼6-foot length of PVC pipe with many perforations). Neither source of methane gas was obstructed, and they were, in essence, single point sources, one slightly broader than the other. Release rates were set using calibrated mass flow controllers and are correct to within 5%. These tests  (2015) and Robertson et al. (2017).

The more recent set of tests was performed at the Methane Emissions Technology Evaluation Center (METEC) in Fort Collins, CO in June of 2017. METEC contains multiple faux O&G facilities ranging in size and complexity with decommissioned O&G equipment that has been plumbed to release a known amount of natural gas (>94% methane) from a multitude of points. For this study, we used one METEC site representative of a small O&G facility that included a condensate storage tank, separator, and well head, all of which were plumbed to be possible emission sources, 11 of which were used in this study (Fig. 1). This resulted in 15 release configurations that had from 1-3 release points at different heights (0.33-4 meters), up to 6 meters apart from one another. The relative complexity of the site also introduced obstructions (the methane release would have to flow around a large tank or other piece of equipment to reach the mobile lab), which could potentially impact release quantification. Releases spanned 0.17 to 2.15 kg h⁻¹ and were controlled by combining flows from a number of critical orifices, resulting in a 4σ release error of less than 5%. Meteorological conditions ranged from sunny to partly cloudy, with average winds of 2-9 m s⁻¹ from the E/SE. The calculated PGI ranged from 3-6, which roughly correspond to Pasquill-
of the METEC-TR are within ±38% of the known release, perhaps suggesting that a slightly higher 1σ error is appropriate, especially if measuring emission fluxes less than 0.5 kg h⁻¹. For the combined set of test releases, greater than 85% of the data are within ±50% of the known value, and 95% of the data are within ±73%. If a Gaussian curve is fit to all of the test release data (N=45), the 95% confidence interval is found to be +54% to −84%, suggesting a low bias of −15% and a 2σ error of ±69% (Fig. S5). The rounded 2σ confidence interval for test releases of ±70% would become 0.58q and 3.33q when q is an OTM 33A estimate made of an unknown emission source in an O&G basin.
The number of replicate measurements of METEC release configurations was too small to perform a similar statistical analysis (N=10), but multiple measurements did not decrease the mean OTM 33A measurement error (14.7% for replicate measurements, 13.1% for all measurements).
Replicate measurements have been shown to improve flux estimates, but at the expense of measuring a number of unique sites (Brantley et al., 2014).
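The interval arithmetic in this section can be checked with a short calculation. The sketch below assumes (this reading is not stated explicitly in the text) that a ±70% error means the estimate q lies between 0.3 and 1.7 times the true rate, so the true rate lies between q/1.7 and q/0.3; it also recovers the bias and 2σ half-width from the asymmetric +54% to −84% interval as its midpoint and half-range. Function names are hypothetical.

```python
def true_rate_bounds(q, rel_error=0.70):
    """Bounds on the true emission rate for an estimate q, assuming
    estimate = truth * (1 +/- rel_error). Returns (lower, upper)."""
    return q / (1.0 + rel_error), q / (1.0 - rel_error)

def bias_and_half_width(lower_pct, upper_pct):
    """Midpoint (bias) and half-range (symmetric error) of an interval in percent."""
    center = (upper_pct + lower_pct) / 2.0
    half_width = (upper_pct - lower_pct) / 2.0
    return center, half_width

lower, upper = true_rate_bounds(1.0)                 # multiples of q
bias, two_sigma = bias_and_half_width(-84.0, 54.0)   # percent
```

Under that assumption the bounds come out near 0.59q and 3.33q, close to the 0.58q and 3.33q quoted above, and the midpoint and half-range reproduce the −15% bias and ±69% error.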

Ordinary Least Squares Regression
Another approach to assessing the performance of OTM 33A is an ordinary least squares (OLS) regression applied to a correlation plot of the OTM 33A flux estimate versus the known release rate. Assuming the OTM-measured flux and known release rate converge at (0,0) yields OLS slopes of 0.91 for CF-TR and 0.92 for METEC-TR (Fig. 6). This suggests OTM 33A may have a ∼10% negative bias when an ensemble of measurements is considered. Notably, the increased complexity of the METEC-TR did not yield a more significant bias like that reported by Bell et al. (2017).
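For reference, an OLS fit forced through (0,0) has the closed-form slope Σxy / Σx². The sketch below is a minimal illustration with made-up release rates, not the study's data; the function name is hypothetical.

```python
def ols_slope_through_origin(known, estimated):
    """Slope b of the least-squares line y = b*x with the intercept fixed at zero."""
    sxx = sum(x * x for x in known)
    if sxx == 0:
        raise ValueError("known release rates must not all be zero")
    return sum(x * y for x, y in zip(known, estimated)) / sxx

# Hypothetical example values in kg/h (not measurements from this study).
known = [0.5, 1.0, 2.0]
estimated = [0.6, 0.9, 1.8]
slope = ols_slope_through_origin(known, estimated)
```

Because each point enters the sums multiplied by x, the largest releases dominate the slope: here the 0.5 kg/h point is overestimated by 20%, yet the slope stays close to the 0.9 ratio of the two larger points.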

Bland-Altman Analysis
Because the rates of methane release for both the METEC and Christman tests are known to within a small margin of error (<5%), OLS regression, which assumes no error in the independent variable, is a reasonable approach. However, OLS analysis is weighted by larger release rates and may not give an accurate representation of OTM 33A performance at all methane emission rates. Bland-Altman (BA) analysis removes this bias by considering the difference between the test release and OTM measurements (known release − OTM flux) as a function of the known release rate (Fig. 7) (Giavarina, 2015). Bland-Altman analysis also assumes that the method difference (y-axis) comes from a normal distribution. Kolmogorov-Smirnov statistical tests supporting the normality of the method difference can be found in SI Sect. 3. For BA analysis, if the 2σ range of the method difference includes zero, the methods are considered to be statistically equivalent, i.e., there is no bias (Giavarina, 2015). The BA plot also illustrates the amount by which OTM can overestimate (negative numbers) or underestimate (positive numbers) the known release. On average, the CF-TR and METEC-TR both underestimate the known releases, with mean differences of 0.028 kg h⁻¹ and 0.025 kg h⁻¹, respectively. However, since the 2σ interval includes zero, the BA analysis identifies no statistical difference between the OTM 33A flux estimate and the known release rate.
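The Bland-Altman quantities described above reduce to a mean difference and a 2σ band around it. The following is a minimal sketch using the sign convention of the text (known release minus OTM flux) and a multiplier of 2 on the sample standard deviation; the function names and the example values are hypothetical.

```python
import statistics

def bland_altman(known, otm):
    """Return (mean difference, (lower, upper) 2-sigma limits) for known - OTM."""
    diffs = [k - o for k, o in zip(known, otm)]
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    return mean_diff, (mean_diff - 2.0 * sd, mean_diff + 2.0 * sd)

def limits_include_zero(limits):
    """True when the 2-sigma band contains zero, i.e. no detectable bias."""
    lower, upper = limits
    return lower <= 0.0 <= upper

# Hypothetical example values in kg/h (not measurements from this study).
mean_diff, limits = bland_altman([0.5, 1.0, 2.0], [0.6, 0.9, 1.8])
unbiased = limits_include_zero(limits)
```

A positive mean difference indicates underestimation by OTM 33A on average, and the check on the limits corresponds to the equivalence criterion attributed to Giavarina (2015).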

Other approaches for minimizing the influence of larger release rates on the OLS fit include orthogonal distance regression (ODR) and variance weighted least squares regression (VWLS). These methods take into account error in both the x and y variables, and require that each measurement has an independent uncertainty estimate on both axes. Since uncertainty of OTM 33A flux estimates is taken as a fixed percentage of the estimated value, these methods tend to perceive a higher confidence (smaller absolute uncertainty) in smaller estimates, and a lower confidence (higher absolute uncertainty) in larger estimates.

This results in a fit with a low bias since the estimates with smaller absolute uncertainty are strongly weighted, and less weight is given to estimates with larger uncertainties. To examine these approaches, the METEC-TR are used as an example below.
Applying a measurement uncertainty of ±50% (representing the % error that roughly 85% of the data points are within) for each OTM measurement and the metered uncertainty for each METEC-TR in kg h⁻¹ yields an ODR slope of 0.79 ± 0.09 when the intercept is set to (0,0) (Fig. 8). A lower slope of 0.67 ± 0.1 is found using the VWLS method. In this case the ODR and VWLS regressions suggest the OTM flux estimates are 20-33% lower than the known releases, whereas an OLS regression indicates the method is only 8% low. Total emissions estimated by OTM 33A (23.074 kg h⁻¹) are 2.5% lower than the total known emission rates (23.67 kg h⁻¹), suggesting OLS regression is a better fit for this data set.
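The low bias introduced by a fixed-percentage uncertainty can be reproduced with a toy example. The sketch below is a simplified stand-in for VWLS: a weighted least-squares slope through the origin with weights 1/σ² on the estimates only, ignoring uncertainty in the known release (it is not ODR). The function name and the two data points are hypothetical.

```python
def weighted_slope_through_origin(x, y, sigma_y):
    """Slope of y = b*x through the origin, weighting each point by 1/sigma_y**2."""
    weights = [1.0 / s ** 2 for s in sigma_y]
    num = sum(w * xi * yi for w, xi, yi in zip(weights, x, y))
    den = sum(w * xi * xi for w, xi in zip(weights, x))
    return num / den

# Two hypothetical estimates of the same 1.0 kg/h release: one low, one high.
known = [1.0, 1.0]
otm = [0.5, 1.5]

equal_weights = weighted_slope_through_origin(known, otm, [1.0, 1.0])
# Uncertainty taken as 50% of each estimate, as in the text.
pct_weights = weighted_slope_through_origin(known, otm, [0.5 * v for v in otm])

# Shortfall of the summed OTM 33A estimates relative to the summed known releases.
total_shortfall = 1.0 - 23.074 / 23.67
```

With equal weights the slope is 1.0, since the two errors cancel. With uncertainty proportional to the estimate, the low estimate receives nine times the weight of the high one and the slope drops to 0.6, which is the mechanism described above.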
VWLS and ODR should be used with caution where the measurement uncertainty is not independent of the measurement (i.e. kg h -1 ) also supports the conclusion that the OTM 33A flux estimate was biased low relative to the onsite measurements at these paired facilities.

OTM 33A Sensitivity to Source Distance
Because OTM 33A assumes a point source, the distance to the release point has a large influence, as this impacts the modeled plume spread and therefore the final calculated flux. The importance of an accurate source distance in Gaussian plume modeling has been noted in previous studies (Lan et al., 2015; Caulton et al., 2018). During the METEC-TR, the University of Wyoming measurement team had site access and was able to determine the exact emission point(s) using an IR camera. With this knowledge, we were able to calculate the exact source distance, or the average distance in the case of multiple emission sources. In the field, site access is often not available and it is often not possible to detect the most likely emission point(s). For this reason, the average distance of possible emission points is used when calculating source distance.
OTM 33A sensitivity to source distance was tested in two ways for the METEC test releases. The first test was performed during the data analysis stage and compared the flux estimated using the average distance of all the components that could be sighted with the range finder from the van (e.g. wellhead, separator, tank) to the flux estimated using the distance to the known release point, or point distance (identified using the FLIR camera). Although the well pad measured at the METEC facility was quite small (∼6 m by 6 m), the average source distance was larger than the specific source distance ∼60% of the time.
The change in the OTM 33A flux (∆Flux) as a result of changing the measurement distance (∆Distance) was found using Equations 3 and 4.
A correlation plot of %∆Distance and %∆Flux suggests that for a 5% change in source distance, the OTM 33A flux estimate would increase by almost 10% (Fig. 9(a)). In terms of mass error, the choice between the average and the specific source distance has very little impact on the over- or underestimation of the METEC known release (Fig. 9(b)). Allowing this fit to have an intercept changes the linear fit to y = 0.978x − 0.03, a negligible difference. Source-distance-related error is small in the context of the ±70% measurement error, but this analysis underscores how determination of the exact emission point can further reduce errors in the field.
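A rough way to see why a 5% distance change maps to roughly a 10% flux change is sketched below. It rests on two assumptions that are not taken from the source: that the flux scales with the product of the horizontal and vertical plume spreads, and that each spread grows about linearly with distance over the small change considered, so the flux scales with distance raised to a power of about 2. The percent-change helper is a generic definition and is not claimed to match Equations 3 and 4.

```python
def pct_change(new, reference):
    """Percent change of `new` relative to `reference`."""
    return 100.0 * (new - reference) / reference

def pct_flux_change_for_distance(pct_distance, exponent=2.0):
    """Percent flux change for a percent distance change, ASSUMING
    flux is proportional to distance**exponent (exponent=2 is an assumption)."""
    factor = (1.0 + pct_distance / 100.0) ** exponent
    return 100.0 * (factor - 1.0)

# Hypothetical distances in metres: averaged component distance vs. known release point.
d_pct = pct_change(105.0, 100.0)
f_pct = pct_flux_change_for_distance(d_pct)
```

Under these assumptions a +5% distance change gives a flux change of about +10%, in line with the slope reported above; an exponent slightly below 2 would give a value just under 10%.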
OTM 33A sensitivity to distance was also tested in the field during the METEC test releases. For configurations that had both a "closer" (generally <70 m) and "farther" (generally >100 m) measurement distance for replicate measurements, the closer measurement had a flux estimate closer to the known release 78% of the time (SI Sect. 1.4). The average distance of the closer replicate measurements (78 m) is comparable to the average measurement distance for the CF-TR of 78 m, smaller than the mean METEC-TR distance of 114 m, and larger than the measurement distances during the Arkansas campaign of 46 m (20-113 m) (Robertson et al., 2017; Bell et al., 2017). For both the CF-TR and the METEC-TR, there is no obvious increase in % error as measurement distance increases (Fig. 10(a)), suggesting the underestimation reported by Bell et al. cannot be blamed solely on closer measurement distances.

One hypothesized reason for the underestimation of OTM 33A compared to onsite methods reported in the Bell et al. study is the lower wind speeds (<2 m s⁻¹) experienced in that study. The CF-TR and METEC-TR both had wind speeds higher than 2 m s⁻¹, making an absolute conclusion impossible, but for the wind speeds measured there is no obvious trend between the mean measurement wind speed and OTM 33A error (Fig. 10(b)).

The METEC-TR included multiple emission points, both slightly above and below the sampling inlet height. There is no obvious trend between the number of release points and % error, though the sample size for two or more sources is relatively small (N=6). The height of the sources tested also shows no obvious influence on OTM 33A accuracy.

Ensemble Mass Flux
OTM 33A measurements are often used to find an average emission rate per well or per facility in an O&G basin (Robertson et al., 2017; Alvarez et al., 2018). To assess the accuracy of the mean of a number of OTM 33A measurements, the mean mass flux measured by OTM 33A is compared to the mean mass flux of the known release through bootstrapping. The bootstrapping approach is used to generate more statistically robust results without the need for assuming Gaussian distributions. The OTM 33A flux estimates and known releases (including their respective measurement uncertainties) are sampled with replacement, summed, and compared following Robertson et al. (2017). This approach suggests that the addition of complexity in the METEC-TR did not significantly impact the accuracy of OTM 33A (Fig. 12), and for both sets of test releases there is a large amount of overlap between the OTM 33A and known release distributions (Fig. S6). These results also indicate that OTM 33A does not drastically underestimate the total emissions for an ensemble or group of measurements, and that scaling up mean emissions measured with OTM 33A to an entire basin is a valid approach.
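The resampling idea can be illustrated with a short sketch. This is a simplified, paired bootstrap of the ratio of summed OTM 33A estimates to summed known releases; unlike the procedure described above, it does not perturb each value by its measurement uncertainty, and it is not the implementation of Robertson et al. (2017). The function name, defaults, and example values are hypothetical.

```python
import random

def bootstrap_ratio(otm, known, n_boot=2000, seed=0):
    """Bootstrap the ratio sum(OTM) / sum(known) by resampling release pairs.

    Returns (mean ratio, (2.5th percentile, 97.5th percentile)).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(otm)
    ratios = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ratios.append(sum(otm[i] for i in idx) / sum(known[i] for i in idx))
    ratios.sort()
    lower = ratios[int(0.025 * n_boot)]
    upper = ratios[int(0.975 * n_boot) - 1]
    return sum(ratios) / n_boot, (lower, upper)

# Hypothetical example values in kg/h (not measurements from this study).
known = [0.2, 0.5, 1.0, 1.5, 2.0]
otm = [0.3, 0.4, 1.1, 1.2, 2.1]
mean_ratio, ci = bootstrap_ratio(otm, known)
```

A 95% interval that contains 1.0 would indicate that the ensemble total from OTM 33A is not distinguishable from the known total for that set of releases.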

Conclusions
The more realistic test releases described in this study build on preexisting test releases and suggest a single OTM 33A measurement can have a 2σ error of ±70%. Analysis of both the simple CF-TR and more complex METEC-TR indicates that under these measurement conditions and release rates, an ensemble of OTM 33A measurements may have a slight negative bias (∼5%) when compared to a known release rate through an OLS model. The mean and 95% CI found through bootstrapping are 0.96 (0.56, 1.47) and 0.96 (0.66, 1.28) for the CF-TR and METEC-TR, respectively. The 40-60% underestimation reported in the Bell et al. study was not replicated during either test release experiment.
OTM 33A flux estimates are sensitive to the assumed source distance, with a +5% change in source distance corresponding to a ∼+10% change in the OTM flux. However, the error caused by uncertainty in source distance is small compared to the measurement method error determined through these test releases. During field measurements, uncertainty in source distance can be mitigated by having site access and an IR camera to detect the emission source(s). Uncertainty did not correspond to wind speeds observed during the test releases, but was relatively higher for smaller release rates. Sensitivity of OTM 33A to the number or height of emission sources was inconclusive.
measured by the University of Wyoming range from 0.68-3.7 kg h⁻¹ (Robertson et al., 2017), suggesting the range of these test releases may not be representative of the largest emission rates observed in the field (Fig. 13).
OTM 33A has been used to estimate mean facility emissions and basin-wide facility emissions in a number of O&G basins.
The mean mass fluxes and 95% CI for each test release experiment are not statistically different. This analysis lends confidence to national emission estimates from the O&G production sector using OTM 33A measurements. Despite the OTM 33A estimated limit of detection (0.01 g s⁻¹) and relative overestimation of smaller release rates, the analyses reported here and the study by Bell et al. suggest that OTM 33A does not overestimate an ensemble of flux estimates.
Code and data availability. Available on request.