Evaluating the performance of a Picarro G2207-i analyser for high-precision atmospheric O 2 measurements

Abstract. Fluxes of oxygen (O 2 ) and carbon dioxide (CO 2 ) in and out of the atmosphere are strongly coupled for terrestrial biospheric exchange processes and fossil fuel combustion but are uncoupled for oceanic air-sea gas exchange. High-precision measurements of both species can therefore provide constraints on the carbon cycle and can be used to quantify fossil fuel CO 2 (ffCO 2 ) emission estimates. In the case of O 2 , however, due to its large atmospheric mole fraction (∼ 20.9 %) it is very challenging to measure small variations to the degree of precision and accuracy required for these applications. We have tested an atmospheric O 2 analyser based on the principle of cavity ring-down spectroscopy (Picarro Inc., model G2207-i), both in the laboratory and at the Weybourne Atmospheric Observatory (WAO) field station in the UK, in comparison to well-established, pre-existing atmospheric O 2 and CO 2 measurement systems.
In laboratory tests analysing dry air in high-pressure cylinders, we found that the best precision was achieved with 30 min averaging and was ±0.5 ppm (∼ ±2.4 per meg). Also, from continuous measurements of a cylinder of dry air, we found the 24 h peak-to-peak range of hourly averaged values to be 1.2 ppm (∼ 5.8 per meg). These results are close to atmospheric O 2 compatibility goals as set by the UN World Meteorological Organization. However, from measurements of ambient air conducted at WAO we found that the built-in water correction of the G2207-i does not sufficiently correct for the influence of water vapour on the O 2 mole fraction. When sample air was dried and a 5-hourly baseline correction with a reference gas cylinder was employed, the G2207-i's results showed an average difference from the established O 2 analyser of 13.6 ± 7.5 per meg (over 2 weeks of continuous measurements). Over the same period, based on measurements of a so-called "target tank", analysed for 12 min every 7 h, we calculated a repeatability of ±5.7 ± 5.6 per meg and a compatibility of ±10.0 ± 6.7 per meg for the G2207-i. To further examine the G2207-i's performance in real-world applications we used ambient air measurements of O 2 together with concurrent CO 2 measurements to calculate ffCO 2 . Due to the imprecision of the G2207-i, the ffCO 2 calculated showed large differences from that calculated from the established measurement system and had a large uncertainty of ±13.0 ppm, which was roughly double that from the established system (±5.8 ppm).

Introduction
Oxygen (O 2 ) is the most abundant molecule in the atmosphere after nitrogen (N 2 ), with an atmospheric background mole fraction of approximately 20.9 %. Due to this large atmospheric background, O 2 measurements are sensitive to variations in the mole fractions of other atmospheric species, such as carbon dioxide (CO 2 ), due to dilution effects. O 2 measurements are therefore typically reported on a relative scale calculated as the change in the ratio of O 2 to N 2 relative to a standard O 2 /N 2 ratio, as given in Eq. (1), and expressed in "per meg" units.

$$\delta(\mathrm{O_2/N_2}) = \left( \frac{(\mathrm{O_2/N_2})_{\mathrm{sample}}}{(\mathrm{O_2/N_2})_{\mathrm{standard}}} - 1 \right) \times 10^{6} \qquad (1)$$
In practice, atmospheric N 2 is far less variable than O 2 , meaning that changes in the O 2 /N 2 ratio can be assumed to be representative of O 2 mole fraction (Keeling and Shertz, 1992). In comparing changes in O 2 to changes in CO 2 , on a mole for mole basis, a 1 per meg change in O 2 is equivalent to a 0.2094 ppm (parts per million) change in CO 2 mole fraction. Over the past 3 decades, atmospheric O 2 has been decreasing at an average rate of ∼ 15 per meg per year, primarily owing to fossil fuel combustion (Keeling and Manning, 2014); over the same period, atmospheric CO 2 has been increasing at an average rate of 2 ppm yr −1 (Dlugokencky and Tans, 2022), also predominantly due to fossil fuel combustion. For most processes that cause variability in atmospheric O 2 , there is an anti-correlated change in atmospheric CO 2 ; therefore, high-precision measurements of atmospheric O 2 play an increasingly important role in our understanding of atmospheric CO 2 , carbon cycling, and other biogeochemical processes (e.g. Pickers et al., 2017; Resplandy et al., 2019; Battle et al., 2019; Tohjima et al., 2019). Fluxes of O 2 and CO 2 in and out of the atmosphere are strongly coupled for terrestrial biosphere exchange with a global average oxidative ratio (OR) in the range of 1.03 to 1.10 mol mol −1 (Severinghaus, 1995). For fossil fuel combustion, dependent on fuel type, the OR is in the range of 1.17 to 1.95 mol mol −1 (Keeling, 1988b). In contrast, O 2 and CO 2 fluxes are uncoupled for oceanic air-sea gas exchange, primarily due to inorganic reactions in the water involving the carbonate system and not O 2 , as well as differences in air-sea equilibration times between the two gases.

Published by Copernicus Publications on behalf of the European Geosciences Union.
The relationship between O 2 and CO 2 fluxes has also allowed for the derivation of the tracer "atmospheric potential oxygen" (APO), as defined in Eq. (2) (Stephens et al., 1998).

$$\mathrm{APO} = \mathrm{O_2} - (-1.1 \times \mathrm{CO_2}) \qquad (2)$$
where the factor −1.1 represents the mean value of the O 2 : CO 2 OR for terrestrial biosphere photosynthesis and respiration (Severinghaus, 1995), and where we have ignored very minor influences from methane and carbon monoxide. APO is therefore, by definition, invariant with respect to the terrestrial biosphere. Changes in APO mainly reflect changes in ocean-atmosphere exchange of O 2 and CO 2 (primarily on seasonal and longer timescales), with a contribution from fossil fuels on both shorter and longer timescales. APO can thus be used to examine oceanic CO 2 fluxes and to quantify fossil fuel CO 2 (ffCO 2 ) emissions.

The World Meteorological Organization (WMO) Global Atmosphere Watch (GAW) programme has established a measurement compatibility goal for O 2 of ±2 per meg (±0.4 ppm) (Crotwell et al., 2019), where compatibility refers to the acceptable level of agreement between two field stations or laboratories when measuring the same air sample. This is the scientifically desirable level of compatibility required to resolve, for example, latitudinal gradients and long-term trends (Crotwell et al., 2019). There is also a WMO extended compatibility goal of ±10 per meg (±2 ppm), which is suitable for some specific applications when expected variations are relatively large (Crotwell et al., 2019), such as fossil fuel quantification in large cities. In order to meet the WMO compatibility goals, it is recommended that a measurement system's repeatability should not exceed half of the compatibility goal (i.e. ±1 per meg; ±0.2 ppm). Repeatability refers to the closeness of agreement between results of repeated measurements of the same measure (which is also sometimes referred to as the measurement system's precision). However, a measurement repeatability of ±1 per meg is not routinely achievable for almost any laboratory or field station making high-precision measurements of atmospheric O 2 .
The large atmospheric background of O 2 makes it extremely challenging to measure the relatively small variations to the level of repeatability required, since measuring a change of, for example, 0.2 ppm against the background (∼ 209 400 ppm) requires a relative repeatability of 0.0001 %.
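The arithmetic behind these figures can be checked with a short sketch. It uses only numbers quoted in the text (the 0.2094 mole fraction and the ∼ 209 400 ppm background); the function names are illustrative and not part of any analyser software.

```python
# Illustrative arithmetic only; function names are hypothetical.
S_O2 = 0.2094                  # standard O2 mole fraction in dry air
O2_BACKGROUND_PPM = 209_400.0  # approximate background quoted in the text


def per_meg_to_ppm(per_meg):
    """Mole-for-mole conversion used in the text: 1 per meg = 0.2094 ppm."""
    return per_meg * S_O2


def relative_repeatability_percent(delta_ppm, background_ppm=O2_BACKGROUND_PPM):
    """Size of a change relative to the O2 background, in percent."""
    return 100.0 * delta_ppm / background_ppm


repeatability_goal_ppm = per_meg_to_ppm(1.0)             # about 0.2 ppm
relative_goal = relative_repeatability_percent(0.2)      # about 0.0001 %
```

Resolving 0.2 ppm against ∼ 209 400 ppm corresponds to roughly one part in a million, which is why pressure, temperature, and flow stability matter so much for the techniques described next.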
Presently, there are several different analytical techniques available for measuring atmospheric O 2 to high precision: interferometry (Keeling, 1988a), isotope ratio mass spectrometry (Bender et al., 1994), paramagnetic techniques (Manning et al., 1999), vacuum ultraviolet absorption (VUV; Stephens et al., 2011), gas chromatography (Tohjima, 2000), and electrochemical fuel cells (Stephens et al., 2007). The most precise of these current methods is the VUV absorption technique; however, VUV O 2 analysers are "homemade" and are not commercially available, thus limiting their widespread application. None of these techniques are "off-the-shelf" systems; all of them are complex and time-consuming to design, build, and optimise, with very precise pressure, temperature, and flow control needed. All of the techniques also require frequent interruption to sample measurement to carry out calibration procedures (Kozlova and Manning, 2009). The supply of calibration gases for such systems is particularly labour intensive, due to both their relatively rapid consumption rate and the fact that no commercial gas supply company is able to provide suitable gas mixtures for atmospheric O 2 research. Accurate, high-precision atmospheric O 2 measurements therefore remain challenging. An alternative commercially available O 2 analyser with fewer requirements for external gas handling, air-sample drying, and calibration procedures could consequently advance the field of atmospheric O 2 measurements if the required performance could be achieved and if it were relatively easy to operate with low maintenance requirements and a lower rate of calibration gas consumption.
In this paper we present the results from the analysis of a Picarro Inc. G2207-i oxygen analyser, which operates on the principle of cavity ring-down spectroscopy technology (CRDS) (hereafter referred to as the G2207-i) and evaluate its performance in comparison to established O 2 measurement systems in the University of East Anglia (UEA) Carbon Related Atmospheric Measurements (CRAM) Laboratory and at the Weybourne Atmospheric Observatory (WAO; North Norfolk, UK). Unlike most other analytical techniques used for atmospheric O 2 measurements, it is intended that the G2207-i should not require a continuous reference gas supply, and it has built-in pressure and flow control and the potential for greatly reduced sample drying requirements due to a built-in water measurement and correction procedure. These features make the G2207-i a potentially desirable analyser for high-precision atmospheric O 2 research, but we note that it would still require the same rigorous calibration procedures as other analysers (Kozlova and Manning, 2009), albeit possibly at reduced frequency. In this paper we quantify the compatibility, repeatability, and drift rates in the context of the WMO/GAW guidelines (Crotwell et al., 2019). In order to further examine the performance of the G2207-i in real-world applications, we also calculated ffCO 2 from concurrent O 2 and CO 2 measurements, using the novel methodology presented by Pickers et al. (2022). We compare ffCO 2 calculated with O 2 measurements from the G2207-i installed at WAO with ffCO 2 calculated from the established O 2 system employing a Sable Systems International Inc. "Oxzilla II" fuel cell analyser.

Picarro G2207-i O 2 analyser
The Picarro G2207-i O 2 analyser measures the mole fractions of the two most abundant atmospheric O 2 isotopologues, 16 O 16 O and 16 O 18 O (Berhanu et al., 2019). The design principles of this analyser have been described in detail by Berhanu et al. (2019). In our study we evaluate only what is called the "O 2 concentration" mode, measuring only the 16 O 16 O isotopologue. In the other mode, called the "δ 18 O plus O 2 concentration" mode, O 2 mole fraction values are considerably less precise, as the analyser is not optimised for 16 O 16 O measurements (primarily via a different set point for the pressure in the cavity). The analyser reports both "wet" and "dry" O 2 mole fraction values. The wet values (O 2,NC ; NC stands for "not corrected") do not have any correction applied to them, whereas the dry values (O 2,WC ; WC stands for "water corrected") are corrected for the dilution effect of water vapour on the O 2 mole fraction, as well as spectroscopic interference, using the analyser's parallel water vapour mole fraction measurements. The G2207-i data sheet states a measurement precision of 5 ppm + 0.1 % of the reading (1σ, 5 s) for the water vapour mole fraction.

CRAM laboratory measurement of cylinder gases
The performance of the G2207-i was evaluated in the UEA CRAM Laboratory by measuring a suite of 12 gas cylinders all containing dry natural air with varying O 2 mole fractions. The cylinders were stored horizontally in a thermally insulated "Blue Box" enclosure in order to prevent gravitational and thermal fractionation of O 2 relative to N 2 (Keeling et al., 2007). The O 2 composition of each of these cylinders was precisely defined on the Scripps Institution of Oceanography (SIO) O 2 scale (Keeling et al., 2007) using a VUV O 2 analyser, which is also in the CRAM Laboratory. The CO 2 mole fraction was defined on the "WMO CO 2 X2007" scale (Zhao and Tans, 2006) using a Siemens Corp. Ultramat model 6F non-dispersive infrared (NDIR) CO 2 analyser. Five of these cylinders were working secondary standards (WSSes), which were used to calibrate the G2207-i, one was a reference tank (RT; explained below in Sect. 2.3.2), while the other six were treated as cylinders with unknown mole fractions (Table 1). The six "unknown" cylinders were used to evaluate the performance of the analyser with a CO 2 mole fraction range of 375 to 443 ppm and an O 2 /N 2 ratio range of −915 to +435 per meg, a much larger range than would typically be observed in ambient air.
The cylinders were run consecutively, starting with the six "unknowns" and ending with the five WSSes, with the RT run at the beginning and end; this sequence was repeated twice. Each of the gas cylinders was flushed for 20 min prior to running on the G2207-i to allow for removal of stagnant air and equilibration of the pressure regulators; air from each cylinder was then passed through the analyser for 20 min, with the first 8 min of data discarded to allow flushing of the previous cylinder's air from the cavity and to maintain consistency with the flushing time employed in subsequent WAO tests (Sect. 2.3.2). The remaining 12 min for each cylinder was then averaged to give the "raw" O 2,NC value for each cylinder as measured on the G2207-i.
The G2207-i has a linear response to O 2 mole fraction (Eq. 3):

$$y = Bx + C \qquad (3)$$

where B and C are the coefficients derived from the slope and intercept of the linear regression calculated from the measurement of the WSSes. A minimum of two WSS cylinders is therefore required to determine the B and C coefficients, but by using five we are able to calculate the coefficient of determination (R 2 ) and obtain a more robust fit. The calibration equation was used to convert the "raw" O 2,NC values taken from the G2207-i (x in Eq. 3) into what we call "ppm equivalent" (ppmEquiv) O 2 units (y in Eq. 3), as described in Kozlova and Manning (2009). A linear interpolation between the RT at the beginning and end of each run was used as a baseline for the run and subtracted from all other cylinder measurements to correct for short-term analyser variations. The calibration curve (Eq. 3) for the G2207-i was also determined relative to the interpolated RT values (WSS − RT); thus, all the unknown cylinder measurements could be converted into ppmEquiv. The ppmEquiv O 2 units were then converted to per meg units, providing a δ(O 2 /N 2 ) value for each unknown cylinder, using Eq. (4).
$$\delta(\mathrm{O_2/N_2}) = \frac{\delta\mathrm{O_2} + (\mathrm{CO_2} - 363.29) \times S_{\mathrm{O_2}}}{S_{\mathrm{O_2}} \times (1 - S_{\mathrm{O_2}})} \qquad (4)$$

where δO 2 is the calibrated G2207-i O 2,NC value in ppmEquiv units, CO 2 is the declared cylinder CO 2 mole fraction from the Siemens analyser in ppm, S O 2 is 0.2094, which is the standard mole fraction of O 2 molecules in dry air (Tohjima et al., 2005), and 363.29 is an arbitrary CO 2 reference value in ppm, inherent to the SIO O 2 scale (Stephens et al., 2007).

(Wilson, 2013). O 2 is measured with an Oxzilla II O 2 analyser (Sable Systems International Inc.) (hereafter referred to as the "Oxzilla"), and CO 2 is measured with an Ultramat 6E NDIR analyser (Siemens Corp.). These analysers are arranged in series, with the air sample first passing through the Ultramat 6E and then the Oxzilla, with rigorous gas handling and calibration protocols followed (as in Stephens et al., 2007).

Weybourne Atmospheric Observatory field tests
The G2207-i was installed at WAO from 23 October to 2 November 2019, sampling from a solar shield aspirated air inlet (AAI) at a height of 10 m above ground level (a.g.l.; 20 m above sea level, a.s.l.). The AAI protects the inlet from solar radiation and generates a continuous air flow over the inlet, thus preventing the differential fractionation of O 2 molecules relative to N 2 molecules due to ambient temperature variations (Blaine et al., 2006) and relatively slow inlet flow rates (Manning, 2001). A diagram of the gas handling set-up for the G2207-i at WAO is displayed in Fig. 1.

Drying
Figure 1. Gas handling set-up for the G2207-i at WAO (RT, reference tank; TT, target tank). Calibration gases were shared with the established O 2 and CO 2 system (using V4), but the established system has its own AAI, pump, drying system, and pressure and flow control (not depicted here).

Water vapour mole fractions in the troposphere vary from a few parts per million to a few percent over small temporal and spatial scales. This water vapour has a diluting effect on atmospheric gas measurement. A 1 ppm increase of water vapour will dilute the measured atmospheric O 2 by approximately 1.3 per meg (Stephens et al., 2007); thus, the existing method for high-precision atmospheric O 2 measurements is to dry the sample air to less than 1 ppm water vapour content before measurement. All calibration and RT gases are also dried to less than 1 ppm water vapour. Furthermore, measurements using spectroscopic techniques are also sensitive to water vapour variability due to changes in the degree of pressure broadening of the spectroscopic lines used to measure the O 2 and δ 18 O 2 . Water vapour correction has previously been successfully implemented for measurements of CO 2 and methane (CH 4 ) with CRDS analysers (Chen et al., 2010); however, in order to achieve accuracies within the WMO goal of 1 % H 2 O, custom coefficients must be obtained for each analyser (Rella et al., 2013).
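One way to see where a dilution sensitivity of about 1.3 per meg per ppm of water vapour can come from is sketched here. It rests on two assumptions that are ours rather than the source's: that adding h ppm of water lowers the apparent O 2 mole fraction by roughly 0.2094 × h ppm, and that an apparent mole-fraction change is converted to per meg by dividing by S(1 − S) with S = 0.2094. The function name is hypothetical.

```python
# Hedged sketch of the water-vapour dilution effect on apparent O2.
S_O2 = 0.2094  # standard O2 mole fraction in dry air


def dilution_effect_per_meg(h2o_ppm):
    """Apparent O2 change (per meg) caused by diluting air with h2o_ppm of water.

    Assumption: the O2 mole fraction drops by about S_O2 * h2o_ppm (in ppm),
    and that drop is scaled to per meg by 1 / (S_O2 * (1 - S_O2)).
    """
    d_o2_ppm = -S_O2 * h2o_ppm
    return d_o2_ppm / (S_O2 * (1.0 - S_O2))


effect_1ppm = dilution_effect_per_meg(1.0)  # close to -1.3 per meg
```

Under these assumptions the effect scales linearly with water content, which illustrates why even partially dried air carries a dilution signal far larger than the per-meg-level variations of interest.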
As discussed in Sect. 2.1, O 2 measurements are reported by the G2207-i as "wet" (O 2,NC ) and, after the implementation of water correction, "dry" (O 2,WC ). In order to evaluate the effectiveness of the built-in water correction procedure for compensating for water vapour dilution, ambient air was sampled with three different drying regimes: no drying, partial drying, and full drying. Under the full-drying conditions (which is the current standard practice), the sample air passed through a fridge trap (∼ 1 °C) and a cryogenic chiller trap (∼ −90 °C), removing water vapour to < 1 ppm. Under partial drying the chiller was bypassed, so the sample air only passed through the fridge trap, which dries the air to approximately 5000 ppm of water vapour. With no drying, both the chiller and fridge were bypassed. Air was simultaneously sampled through a separate AAI (10 m a.g.l.) into the pre-existing O 2 and CO 2 system with full drying during each of these stages. The time difference between air travelling from the AAIs to each of the two analysers was accounted for.
To evaluate the built-in water correction procedure of the G2207-i, the O 2,WC values were compared with measurements from the Oxzilla (which was continuously sampling fully dried air) for the no drying and partial drying periods, and the O 2,NC and O 2,WC G2207-i values were compared to the Oxzilla when sampling fully dried air.

Calibration procedure
A tailor-made calibration protocol was developed for the G2207-i following ICOS atmospheric station specifications (ICOS-RI, 2020). The calibration cylinders were stored horizontally in a thermally insulated "Blue Box" enclosure in order to prevent gravitational and thermal fractionation of O 2 and N 2 . The calibration gases consisted of three WSSes with precisely defined O 2 and CO 2 values that span the unpolluted atmospheric range (traceable to the SIO O 2 and WMO CO 2 X2007 scales) and a reference tank (RT) with O 2 and CO 2 values close to ambient air conditions at the site. The repeatability and compatibility of the analyser were evaluated using a target tank (TT) (sometimes known as a "surveillance tank") with precisely defined O 2 and CO 2 values. With full drying of the sample air, each of the WSSes, the RT, and the TT were run for 20 min, of which the first 8 min was discarded due to the sweep-out time of the G2207-i and equilibration after valve switching and surface effects. The final 12 min were averaged to determine the cylinder value for the given run. A flushing period of 8 min and averaging time of 12 min were chosen to match that of the established system. Under partial and no drying the run time of the cylinders was increased in order to fully flush the G2207-i of water vapour; each cylinder was therefore run for 32 min, with the first 20 min being discarded and the final 12 min averaged.
A full three-gas WSS calibration of the G2207-i was run every 23 h; this frequency is intentionally not a multiple of 24 h in order to prevent aliasing the data by calibrating under environmental conditions that may occur at the same time each day. This calibration corrects for drift in the span or non-linearity of the analyser. As in the CRAM laboratory tests (see Sect. 2.2), the WSSes were used to define a calibration equation to convert the raw analyser O 2 values into ppmEquiv O 2 units. Equation (4) and the concurrent CO 2 measurement from the Ultramat 6E NDIR analyser were then used to convert this into per meg units.
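The reasoning for the 23 h interval can be illustrated with a small sketch that lists the time of day at which successive calibrations would start. The start time of 00:00 and the function name are illustrative assumptions, not details from the station set-up.

```python
# Illustrative only: time of day (hours) at which each calibration starts.
def calibration_start_hours(interval_h, n_cycles, first_start_h=0.0):
    """Return the hour of day of each calibration for a fixed repeat interval."""
    return [(first_start_h + i * interval_h) % 24 for i in range(n_cycles)]


hours_23 = calibration_start_hours(23, 24)  # steps back by 1 h each cycle
hours_24 = calibration_start_hours(24, 24)  # always the same time of day
```

With a 23 h interval the calibration start time cycles through every hour of the day over 24 cycles, whereas a 24 h interval would always sample the same daily conditions.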
The RT is used to correct the data for short-term analyser drift and was run every 5 h. A linear interpolation between each of the RT run averages was treated as a baseline and subtracted from all subsequent air and cylinder measurements. The calibration curve for the G2207-i was also determined relative to the RT values (WSS − RT), and thus the air measurement differences can be easily converted into per meg units.
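The RT baseline subtraction and the linear calibration described in this section can be sketched as follows. This is a minimal illustration with made-up numbers, not the processing code used at WAO. The function names are hypothetical, and the per meg conversion assumes the form δ(O 2 /N 2 ) = [δO 2 + (CO 2 − 363.29) × 0.2094] / [0.2094 × (1 − 0.2094)] after Kozlova and Manning (2009).

```python
import numpy as np

S_O2 = 0.2094         # standard O2 mole fraction in dry air
CO2_REF_PPM = 363.29  # CO2 reference value inherent to the SIO O2 scale


def rt_baseline(t, rt_times, rt_values):
    """Linearly interpolate the RT run averages to the measurement time(s) t."""
    return np.interp(t, rt_times, rt_values)


def fit_calibration(wss_raw_minus_rt, wss_declared_ppmequiv):
    """Least-squares line y = B*x + C through the WSS points (x is WSS - RT)."""
    B, C = np.polyfit(wss_raw_minus_rt, wss_declared_ppmequiv, 1)
    return B, C


def ppmequiv_to_per_meg(d_o2_ppmequiv, co2_ppm):
    """Assumed ppmEquiv to per meg conversion (see lead-in)."""
    return (d_o2_ppmequiv + (co2_ppm - CO2_REF_PPM) * S_O2) / (S_O2 * (1.0 - S_O2))


# Synthetic example values (hypothetical, for illustration only).
rt_times_h = np.array([0.0, 5.0])          # RT runs 5 h apart
rt_raw = np.array([209400.0, 209401.0])    # raw RT averages
wss_raw_minus_rt = np.array([-10.0, 0.0, 10.0])
wss_declared = np.array([-11.5, -0.5, 10.5])

B, C = fit_calibration(wss_raw_minus_rt, wss_declared)
air_minus_rt = 209395.5 - rt_baseline(2.5, rt_times_h, rt_raw)
air_ppmequiv = B * air_minus_rt + C
air_per_meg = ppmequiv_to_per_meg(air_ppmequiv, co2_ppm=410.0)
```

Working relative to the interpolated RT means that slow changes in the analyser offset between RT runs cancel out of both the calibration points and the air measurements.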
Finally, the TT was run every 7 h; this cylinder is used to quantify the repeatability and compatibility of the analyser. "Repeatability" is defined as the closeness of agreement between results of successive measurements of the same measure carried out under the same measurement conditions and is considered as a proxy for the precision of a measurement system. "Compatibility" is defined as the averaged O 2 value of all TT runs over time, compared to the values declared by the VUV, and provides a measure of the compatibility to the SIO scale over time (Kozlova and Manning, 2009). The TT air does not pass through the AAI or drying lines (Fig. 1), and it is therefore mainly representative of the analyser's repeatability and compatibility only.
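One possible reading of these two definitions is sketched here. The exact statistics used in the study are not spelled out in this section, so the choices made (standard deviation of each pair of successive TT runs for repeatability, absolute difference from the declared value for compatibility) are assumptions, and the function names are hypothetical.

```python
import numpy as np


def repeatability(tt_runs):
    """Mean and SD of the standard deviations of successive pairs of TT runs.

    For two values a and b the sample standard deviation is |a - b| / sqrt(2).
    """
    tt = np.asarray(tt_runs, dtype=float)
    pair_sd = np.abs(np.diff(tt)) / np.sqrt(2.0)
    return float(pair_sd.mean()), float(pair_sd.std(ddof=1))


def compatibility(tt_runs, declared):
    """Mean and SD of the absolute differences from the declared TT value."""
    diff = np.abs(np.asarray(tt_runs, dtype=float) - declared)
    return float(diff.mean()), float(diff.std(ddof=1))


# Synthetic TT run averages in per meg (hypothetical values).
tt_example = [10.0, 12.0, 8.0, 10.0]
rep_mean, rep_sd = repeatability(tt_example)
comp_mean, comp_sd = compatibility(tt_example, declared=10.0)
```

Reporting each quantity as a mean with its own spread matches the "± value ± value" style used for the TT results, although the source does not confirm that this is how those figures were derived.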

Quantifying fossil fuel CO 2 using atmospheric potential oxygen
In order to further assess the G2207-i's performance in real-world applications, the O 2,NC observations from the full-drying regime period at WAO were used to isolate the fossil fuel component of the concurrent CO 2 observations and then compared to the ffCO 2 values calculated from atmospheric potential oxygen (APO) defined from the Oxzilla O 2 observations following the methodology outlined in Pickers et al. (2022). The tracer APO, derived by Stephens et al. (1998), was first calculated using Eq. (5) (using both G2207-i O 2,NC and Oxzilla O 2 values); these APO values were then used to calculate ffCO 2 using Eq. (6).
$$\mathrm{APO} = \mathrm{O_2} - \frac{-1.1 \times (\mathrm{CO_2} - 350)}{0.2094} \qquad (5)$$

where O 2 and CO 2 are in per meg and parts per million, respectively, −1.1 is the global average O 2 : CO 2 terrestrial biosphere-atmosphere exchange ratio (Severinghaus, 1995), 0.2094 is the standard mole fraction of O 2 in dry air (Tohjima et al., 2005), and 350 is an arbitrary reference value for CO 2 (in ppm). Multiplying CO 2 by −1.1 and dividing by 0.2094 converts the CO 2 data from parts per million to per meg units.
$$\mathrm{ffCO_2} = \frac{\mathrm{APO} - \mathrm{APO_{bg}}}{R_{\mathrm{APO:CO_2}}} \qquad (6)$$

where APO is derived from Eq. (5) in per meg units, APO bg is the APO background, or baseline, value determined using a statistical baseline fitting procedure, and R APO:CO 2 is the APO : CO 2 combustion ratio for fossil fuel emissions. The APO bg values were determined using the rfbaseline function from the IDPmisc package in R, which implements robust fitting of local regression models, with a smoothing window of 1 week (Ruckstuhl et al., 2012). The APO : CO 2 emission ratio (R APO:CO 2 ) used is −0.3 mol mol −1 , an approximate mean value for WAO as determined from the COFFEE inventory (a typical value for fossil fuel emissions, given that the APO : CO 2 ratio = O 2 : CO 2 + 1.1; Pickers, 2016; Steinbach et al., 2011). The uncertainty on the ffCO 2 mole fractions was calculated using Eq. (6) with the upper and lower uncertainty limit for each variable (where the measurement uncertainty for APO was calculated by summing in quadrature the CO 2 and O 2 measurement uncertainty for each analyser) and then taking the standard deviation (SD) of the resultant ffCO 2 value of each combination for each hourly time stamp.
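The APO and ffCO 2 calculation, including the combination-based uncertainty, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the APO expression is written as O 2 − [−1.1 × (CO 2 − 350)] / 0.2094, the APO anomaly in per meg is assumed to be converted to ppm equivalent (× 0.2094) before dividing by the mol mol −1 ratio, the background value is supplied by the caller instead of being fitted with rfbaseline, and all uncertainty values and function names are hypothetical.

```python
from itertools import product

import numpy as np

S_O2 = 0.2094  # standard O2 mole fraction in dry air


def apo_per_meg(o2_per_meg, co2_ppm):
    """APO in per meg from O2 (per meg) and CO2 (ppm); assumed form of Eq. (5)."""
    return o2_per_meg - (-1.1 * (co2_ppm - 350.0)) / S_O2


def ffco2_ppm(apo, apo_bg, r_apo_co2=-0.3):
    """ffCO2 in ppm from the APO anomaly (per meg) and the APO:CO2 ratio.

    Assumption: the anomaly is converted to ppm equivalent before the division.
    """
    return (apo - apo_bg) * S_O2 / r_apo_co2


def ffco2_uncertainty(apo, apo_bg, r, u_apo, u_bg, u_r):
    """SD of ffCO2 over all upper/lower-limit combinations of the inputs."""
    combos = [
        ffco2_ppm(apo + sa * u_apo, apo_bg + sb * u_bg, r + sr * u_r)
        for sa, sb, sr in product((-1, 1), repeat=3)
    ]
    return float(np.std(combos, ddof=1))


# Hypothetical hourly values for illustration.
ff_example = ffco2_ppm(apo=-210.0, apo_bg=-200.0)
u_example = ffco2_uncertainty(-210.0, -200.0, -0.3, u_apo=5.0, u_bg=2.0, u_r=0.05)
```

A quick consistency check on the APO definition is that a purely terrestrial exchange (CO 2 up by 1 ppm, O 2 down by 1.1 ppm, i.e. 1.1 / 0.2094 per meg) leaves APO unchanged.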

Results and discussion

Precision and drift
To assess the short-term precision and optimal averaging time of the G2207-i, the Allan deviation technique (Werle et al., 1993) was used whilst sampling a compressed-air cylinder in the laboratory (50 L, 200 bar). The cylinder was run for 24 h with a sample flow rate of 94 mL min −1 and cavity pressure and temperature of 340 mbar and 45 °C, respectively. The results of this Allan deviation analysis are in agreement with those obtained by Berhanu et al. (2019), where a precision of 1 ppm (∼ 4.8 per meg) was achieved after an averaging time of 300 s. Precision then continues to improve until around a 30 min averaging time where a precision of ∼ 0.5 ppm (∼ 2.4 per meg) is reached, and it remains around that value for averaging times up to around 1 h (Fig. 2). It should be noted that unlike the hourly average and standard deviation obtained from measurement of cylinder air, the hourly averages of atmospheric data also contain natural variability in addition to analyser-related noise and drift.
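For readers unfamiliar with the technique, a minimal non-overlapping Allan deviation can be sketched as follows. This is a generic textbook form and is not taken from the study; the function name and the assumption of evenly spaced 1 s data are ours.

```python
import numpy as np


def allan_deviation(y, m):
    """Non-overlapping Allan deviation for an averaging window of m samples.

    y is an evenly sampled series (e.g. 1 s O2 readings in ppm). The series is
    cut into consecutive bins of m samples, each bin is averaged, and the
    deviation is sqrt(0.5 * mean(squared differences of adjacent bin means)).
    """
    y = np.asarray(y, dtype=float)
    n_bins = y.size // m
    if n_bins < 2:
        raise ValueError("need at least two full bins for this window length")
    bin_means = y[: n_bins * m].reshape(n_bins, m).mean(axis=1)
    return float(np.sqrt(0.5 * np.mean(np.diff(bin_means) ** 2)))


# Example: evaluate a few averaging times on synthetic white noise.
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, size=86_400)  # 24 h of hypothetical 1 s data
curve = {m: allan_deviation(noise, m) for m in (1, 10, 300, 1800)}
```

For pure white noise the deviation falls with the square root of the averaging time; a flattening or rise at longer windows, as described for averaging times beyond about 30 min, indicates that drift starts to dominate.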
To evaluate the analyser drift (i.e. the changing sensitivity of the analyser's response with time), O 2,NC values from the G2207-i were averaged to 1 h (Fig. 3b; reported in ppm where 1 ppm corresponds to a change of 4.8 per meg in the O 2 /N 2 ratio). The G2207-i data sheet states a maximum drift at STP (standard temperature and pressure) (over 24 h, peak-to-peak, 1 h internal average at 21 % O 2 ) of < 6 ppm. We found that over 24 h, the maximum peak-to-peak drift of the hourly averages is ∼ 1.2 ppm (approximately 5.8 per meg); this is better than stated by Picarro Inc. but does not meet the WMO compatibility goal of ±2 per meg, as the internal drift of the analyser is greater than this goal. The standard deviation of each of these hourly averages is ∼ 14.5 ppm (∼ 69.6 per meg) (Fig. 3a); this is caused by the large amount of analyser noise in the raw 1 s data points, spanning ∼ 100 ppm (∼ 480 per meg) (Fig. 3c). The overall drift over the 24 h of raw data, however, is very small, shown by a linear regression slope of −4.26 × 10 −6 ppm s −1 (Fig. 3c).
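The drift statistics described here (hourly means, their peak-to-peak range, and the slope of the raw data) can be sketched as follows. The function names and the synthetic input are illustrative assumptions; the study's own processing is not shown in the source.

```python
import numpy as np


def hourly_stats(o2_ppm_1s):
    """Hourly means and standard deviations from evenly spaced 1 s data."""
    y = np.asarray(o2_ppm_1s, dtype=float)
    n_hours = y.size // 3600
    blocks = y[: n_hours * 3600].reshape(n_hours, 3600)
    return blocks.mean(axis=1), blocks.std(axis=1, ddof=1)


def peak_to_peak(hourly_means):
    """Range between the largest and smallest hourly mean."""
    return float(np.max(hourly_means) - np.min(hourly_means))


def drift_slope_ppm_per_s(o2_ppm_1s):
    """Slope of a least-squares line through the raw 1 s data."""
    y = np.asarray(o2_ppm_1s, dtype=float)
    t = np.arange(y.size, dtype=float)
    slope, _intercept = np.polyfit(t, y, 1)
    return float(slope)


# Synthetic 2 h series with a small linear drift (hypothetical values).
t_s = np.arange(7200, dtype=float)
series = 209400.0 + 1e-5 * t_s
means, sds = hourly_stats(series)
p2p = peak_to_peak(means)
slope = drift_slope_ppm_per_s(series)
```

Comparing the peak-to-peak range of the hourly means with the slope of the raw data separates slow wander of the analyser from any net trend over the full record.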

CRAM laboratory measurement of cylinder gases
The G2207-i analyser performance was evaluated by measuring six gas cylinders with precisely defined O 2 and CO 2 values as measured on a VUV O 2 analyser and Siemens Ultramat 6F NDIR CO 2 analyser (Table 1). The differences between the O 2,NC values (per meg) as measured by the G2207-i and the declared values from the VUV are shown in Table 2 for runs both with and without the RT interpolation applied. This procedure was carried out twice, referred to as "Run 1" and "Run 2" in Table 2. For both sets of runs without the application of the RT interpolation the difference between the VUV declared value and that measured by the G2207-i is very large and far outside of an acceptable range (Table 2), with an average difference from the declared values for all cylinders of 22.0 ± 10.3 per meg. For all cylinders, except for cylinders 5 and 6, a large improvement in the difference is seen after the application of the RT correction. Due to the large differences between the declared and measured values without the RT correction applied, only the results with the RT correction will be discussed hereafter.
Cylinders 5 and 6 contain O 2 values far higher than that found in ambient air (411.7 and 434.6 per meg, respectively) and outside of the range spanned by the WSSes used for calibration. For these two cylinders, the difference between the declared value and that measured by the G2207-i is far larger than the other cylinders and also more variable between the two runs, with a standard deviation of the absolute values between the two runs of ±14.9 and ±19.5 per meg, respectively (Table 2). Berhanu et al. (2019) found that the accuracy of the G2207-i was reduced when the CO 2 mole fraction was much higher than that of ambient air but did not observe the same reduction in accuracy with high O 2 mole fractions. Ignoring the two cylinders with positive O 2 , the average absolute difference for the remaining four unknown cylinders and the declared values over the two runs is 3.4 ± 2.5 per meg, this is slightly greater than the WMO compatibility goal of ±2 per meg but does fall within the extended goal of ±10 per meg and is similar to what can be achieved with an Oxzilla II (Pickers et al., 2017). There is also no correlation between the accuracy and the declared O 2 value when excluding the two cylinders with positive O 2 (R 2 = 0.07 for run 1, R 2 = 0.53 for run 2). Although the accuracy of the O 2 values measured by the G2207-i for these cylinders is variable, particularly for the cylinders with high O 2 , the standard deviation of the 2 min data points used to calculate the final cylinder O 2 value as defined by the G2207-i within each run is more consistent. However, the repeatability, used as a proxy for precision, and defined here as the ±1σ standard deviation of the average of the two measurements of each cylinder, is variable. For the two cylinders with high O 2 (cylinders 5 and 6) the repeatability is more than 5 times greater than the WMO extended repeatability goal of ±5 per meg. 
For the remaining four cylinders the repeatability is far lower, with cylinder 1 and cylinder 3 both falling within the extended repeatability goal.

Partial and no drying of ambient air measurements
The results from no drying and partial drying of the sample air into the G2207-i at WAO are displayed in Figs. 4 and 5, respectively. The O 2 mole fractions reported (in ppm) by the G2207-i were converted to per meg units using the calibration equations produced through the measurement of the three WSS cylinders every 23 h and the concurrent CO 2 observations from the Ultramat 6E analyser. During the period where there was no drying of the G2207-i air sample there is a significant difference between the O 2 values reported by the Oxzilla (dried air) and the G2207-i O 2,NC values (Fig. 4b). This is to be expected due to the diluting effect of water vapour; however, there is also a significant difference between the Oxzilla O 2 and the G2207-i O 2,WC values. Over the entire no-drying period the average difference between the Oxzilla observations and the G2207-i O 2,NC is −965.4 ± 272.8 per meg. The average difference between the Oxzilla and the G2207-i O 2,WC values is −849.8 ± 31.1 per meg. Although the difference is smaller with the application of the G2207-i built-in water correction procedure, it is still unusably large, with no similarity between the Oxzilla and G2207-i signals and with both the O 2,NC and O 2,WC G2207-i values correlating with the H 2 O variability (Fig. 6a and b). This demonstrates that the algorithm currently applied for water correction is unsuitable for precise O 2 measurement.
As seen during the no-drying period of the sample air, there is also a significant difference between the reported O 2 values of the Oxzilla and G2207-i under the partial drying regime for both O 2,NC and O 2,WC (Fig. 5b). With partial drying, the time series of the difference between the O 2 values of the two analysers is much smoother than with no drying. This is due to the fridge trap removing some of the natural variability in the water vapour mole fraction. Over the entire partial drying period the average difference between the Oxzilla observations and the G2207-i O 2,NC is −7144.1 ± 258.6 per meg. The average difference between the Oxzilla and the G2207-i O 2,WC values is −612.7 ± 31.8 per meg. There is a large improvement with the application of the water correction procedure; however, as with the no-drying results, the difference in O 2 values between the Oxzilla and G2207-i O 2,WC is too large to be usable for any application, with the O 2,NC and O 2,WC values correlating with the H 2 O variability (Fig. 6c and d).
Under both partial-drying and no-drying regimes, the difference between the Oxzilla and G2207-i values is strongly correlated with the water vapour mole fraction, although the correlation weakens with the application of the built-in water correction procedure (Fig. 6). The R 2 value decreases from 0.996 to 0.803 for no drying and from 0.967 to 0.301 for partial drying once the water correction has been applied. Given the remaining correlation between the water vapour mole fraction and the O 2,WC reported by the G2207-i, these values are not usable without significant improvements to the water correction procedure by Picarro Inc.
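As an illustration of the R 2 statistic quoted above, the following minimal sketch computes the coefficient of determination for a straight-line least-squares fit of the analyser difference against water vapour. The function name and the sample values are hypothetical and are not data from this study; the fitting routine actually used for Fig. 6 is not specified here.

```python
from statistics import mean

def r_squared(x, y):
    """Coefficient of determination for a least-squares straight-line fit of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # For a simple linear fit, R^2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)

# Hypothetical values (not from the study): H2O in percent and the
# Oxzilla minus G2207-i O2 difference in per meg.
h2o_percent = [0.8, 0.9, 1.0, 1.1, 1.2]
o2_diff_per_meg = [-700.0, -790.0, -845.0, -930.0, -1010.0]

r2 = r_squared(h2o_percent, o2_diff_per_meg)
print(f"R^2 = {r2:.3f}")
```

An R 2 close to 1 indicates that almost all of the variability in the difference between the analysers is explained by water vapour, which is the behaviour described for the uncorrected values.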
Due to the large differences observed between the Oxzilla and G2207-i reported O 2 values under no drying and partial drying, no further investigation of these regimes was undertaken; only the fully dried sample air data are considered hereafter.

Full drying of ambient air measurements
The results from fully drying the sample air between 24 October and 7 November 2019 are displayed in Fig. 7. The O 2 mole fractions reported in ppm units by the G2207-i were converted to per meg units using the calibration equations produced through the measurement of the three WSS cylinders every 23 h, and the concurrent CO 2 observations. There is a greater difference between the Oxzilla and G2207-i O 2,WC values than the O 2,NC values, with an average difference over the entire full-drying period of 22.6 ± 7.4 per meg compared to 13.6 ± 7.5 per meg, respectively. This may be due to overcorrection of the O 2,NC values as the water vapour mole fraction is below the G2207-i's lower detection limit and precision. That is, the G2207-i reports H 2 O mole fractions of approximately 7 ppm (Fig. 7a), with frequent spikes due to equilibration after switching of V1 (Fig. 1) from cylinder to sample air, whereas air that is fully dried by passing through the chiller and fridge trap has a water vapour mole fraction below 1 ppm. This overestimated water correction whilst sampling fully dried air was also found by Berhanu et al. (2019). We therefore only refer to the O 2,NC values, which we believe to be more accurate, in the analysis from now onwards.
The large jumps in the G2207-i O 2,NC values following WSS calibrations (see Fig. 7b, grey points) are caused by a drift in the analyser's baseline, which is only accounted for in the data after each calibration. These jumps were vastly reduced through the application of the 5 h RT interpolation procedure (see Fig. 7b, blue points), which constrained the baseline drift between WSS calibrations (refer to Sect. 2.3.2).
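The idea behind an RT-based baseline correction can be sketched as follows. This is a simplified illustration only: it assumes that the drift is estimated by linear interpolation in time between successive RT measurements and is subtracted relative to the first RT run. The procedure actually used is the one described in Sect. 2.3.2 and may differ; all function names and numbers below are hypothetical.

```python
from bisect import bisect_right

def interpolate(t, times, values):
    """Linearly interpolate values at time t; hold the end values outside the range."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t)  # index of the first RT run after time t
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def drift_correct(sample_times, sample_values, rt_times, rt_values):
    """Subtract the interpolated RT drift, taken relative to the first RT run."""
    reference = rt_values[0]
    return [
        v - (interpolate(t, rt_times, rt_values) - reference)
        for t, v in zip(sample_times, sample_values)
    ]

# Hypothetical numbers: an RT run every 5 h drifting upwards by 2 per meg per run.
rt_times = [0.0, 5.0, 10.0]            # hours
rt_values = [-500.0, -498.0, -496.0]   # per meg
sample_times = [2.5, 7.5]              # hours
sample_values = [-649.0, -647.0]       # per meg, contaminated by the same drift

corrected = drift_correct(sample_times, sample_values, rt_times, rt_values)
print(corrected)
```

In this toy case the apparent 2 per meg change between the two sample points is entirely baseline drift, so both corrected values coincide.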

Repeatability and compatibility
The repeatability and compatibility of the analyser were evaluated through the running of a TT every 7 h during the full-drying period using O 2,NC values, the results of which are presented in Fig. 8 and Table 3. For O 2 the WMO repeatability goal is ±1 per meg (with an extended goal of ±5 per meg) and the compatibility goal is ±2 per meg (with an extended goal of ±10 per meg; indicated by the dashed lines in Fig. 8; Crotwell et al., 2019).
The repeatability is determined from the mean ± 1σ standard deviations of the average of two consecutive measurements of the TT. For the G2207-i this is equal to ±5.7 ± 5.6 per meg, compared to ±2.2 ± 2.0 per meg on the Oxzilla. Prior to applying the RT interpolation to the G2207-i data, the repeatability of the G2207-i was ±11.9 ± 13.8 per meg, roughly twice as large as after the RT application; this is because the RT interpolation removed the large jumps in the TT value after a WSS calibration. In the context of the WMO repeatability goals, neither the Oxzilla nor the G2207-i meets the goal of ± 1 per meg. For O 2 , the WMO goals are very ambitious and not currently achievable by the O 2 measurement community; hence, the "extended" O 2 repeatability goal of ± 5 per meg (Crotwell et al., 2019). The Oxzilla TT results lie within this extended goal; however, the G2207-i results do not, even after the application of the RT interpolation.
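One possible reading of this repeatability definition is sketched below: the standard deviation of each pair of consecutive TT measurements is calculated, and the mean and ±1σ spread of these pairwise standard deviations are reported. The use of overlapping (sliding) pairs is an assumption, as are the function name and the TT values, which are hypothetical.

```python
from statistics import mean, stdev

def pairwise_repeatability(tt_values):
    """Mean and 1-sigma spread of the standard deviations of consecutive TT pairs."""
    # Overlapping pairs (1st-2nd, 2nd-3rd, ...) are an assumption of this sketch.
    pair_sds = [stdev(pair) for pair in zip(tt_values, tt_values[1:])]
    return mean(pair_sds), stdev(pair_sds)

# Hypothetical TT results in per meg (not data from the study).
tt_values = [-718.0, -714.0, -722.0, -716.0]

rep_mean, rep_sd = pairwise_repeatability(tt_values)
print(f"repeatability = ±{rep_mean:.1f} ± {rep_sd:.1f} per meg")

# WMO goals quoted in the text (Crotwell et al., 2019).
WMO_GOAL, WMO_EXTENDED_GOAL = 1.0, 5.0
print("within extended goal:", rep_mean <= WMO_EXTENDED_GOAL)
```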
The compatibility of the analyser, which is used here as a proxy for accuracy, is determined by calculating the mean absolute difference between the TT O 2 as measured by the G2207-i and the VUV declared value (−718 per meg). The mean absolute difference from the declared value on the VUV for the Oxzilla is 3.0 ± 2.6 per meg; this is well within the extended WMO compatibility goal of ± 10 per meg and is quite close to the more stringent goal of ± 2 per meg. The compatibility of the G2207-i prior to the application of the RT interpolation is 22.9 ± 34.1 per meg, which is far greater than even the extended compatibility goal of ± 10 per meg. After the application of the RT interpolation the compatibility of the G2207-i O 2,NC is 10.0 ± 6.7 per meg. Although this is not within the WMO compatibility goal, it is just within the extended goal, which is deemed suitable for some applications in specific circumstances, such as where the signals are large enough that reduced repeatability and compatibility do not preclude obtaining useful information from the measurements.
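The compatibility metric can be illustrated with a short sketch that takes the mean and ±1σ standard deviation of the absolute differences between each TT measurement and the declared value. The declared value of −718 per meg is taken from the text; the measured values and the function name are hypothetical.

```python
from statistics import mean, stdev

def compatibility(measured, declared):
    """Mean and 1-sigma spread of the absolute differences from the declared value."""
    abs_diffs = [abs(m - declared) for m in measured]
    return mean(abs_diffs), stdev(abs_diffs)

DECLARED_TT = -718.0  # per meg, VUV declared value quoted in the text

# Hypothetical TT measurements in per meg (not data from the study).
measured_tt = [-708.0, -730.0, -722.0, -714.0]

comp_mean, comp_sd = compatibility(measured_tt, DECLARED_TT)
print(f"compatibility = {comp_mean:.1f} ± {comp_sd:.1f} per meg")

# WMO goals quoted in the text (Crotwell et al., 2019).
WMO_GOAL, WMO_EXTENDED_GOAL = 2.0, 10.0
print("within goal:", comp_mean <= WMO_GOAL)
print("within extended goal:", comp_mean <= WMO_EXTENDED_GOAL)
```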
The compatibility and repeatability of the G2207-i measurements were vastly improved after the application of a 5-hourly RT; however, if one ignores the TT results immediately after a new WSS calibration (i.e. after the large jumps when the RT was not applied), the repeatability without the RT interpolation is 5.2 ± 4.5 per meg, improving to 4.3 ± 4.6 per meg when the RT is applied. This is because the RT corrects for baseline drift between WSS calibrations, but it does not correct for drift within the calibration period. However, as the TT results are imprecise (as illustrated by the large error bars in Fig. 8), even if any baseline drift within a calibration period were corrected for, there would likely be little improvement in the final TT results, as the noise in the RT-corrected TT values is primarily caused by imprecision rather than baseline drift.

Applications of the G2207-i O 2 measurements in the calculation of fossil fuel CO 2
In order to further assess the G2207-i's performance in real-world applications, the fully dried, RT-corrected O 2,NC observations from WAO were used to isolate the fossil fuel component of the concurrent CO 2 observations, and the results were compared to the ffCO 2 values calculated from the Oxzilla O 2 observations following the APO methodology outlined in Pickers et al. (2022). The resultant ffCO 2 values calculated from each analyser are displayed in Fig. 9. The measurement uncertainty was calculated as the average hourly SD on 30 October 2019; this date was chosen as it was a particularly stable period with little variation in the TT results for both analysers (Fig. 8). The resultant uncertainty for the G2207-i is ±11.2 per meg compared to ±4.9 per meg for the Oxzilla. The uncertainty in the baseline determination (±28 %) and the emission ratio uncertainty (±22 %) are significantly larger than these measurement uncertainties, but as these are the same for both analysers the additional measurement uncertainty for the G2207-i caused by analyser noise increases the uncertainty of the calculated ffCO 2 values. The average final calculated uncertainty on the ffCO 2 values calculated from the Oxzilla measurements is ±5.8 ppm, compared to ±13.0 ppm on the G2207-i.
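The general form of an APO-based ffCO 2 calculation can be sketched as follows. This is an illustration under stated assumptions and not the implementation used in this study: it uses the commonly applied definition of APO with a terrestrial O 2 :CO 2 exchange ratio of 1.1 and a standard O 2 mole fraction of 0.2094, and it divides the APO deviation from a baseline by an APO:CO 2 ratio for fossil fuel emissions. The baseline value and the ratio of −0.3 mol mol −1 below are placeholders; the values and baseline fitting actually used follow Pickers et al. (2022).

```python
X_O2 = 0.2094   # assumed standard O2 mole fraction of dry air
ALPHA_B = 1.1   # assumed terrestrial biosphere O2:CO2 exchange ratio

def apo_per_meg(o2_per_meg, co2_ppm):
    """APO in per meg from O2 (per meg) and CO2 (ppm)."""
    return o2_per_meg + (ALPHA_B / X_O2) * (co2_ppm - 350.0)

def ffco2_ppm(apo, apo_baseline, r_apo_co2):
    """ffCO2 in ppm from the APO deviation (per meg) and an APO:CO2 ratio (mol/mol)."""
    delta_ppm_equiv = (apo - apo_baseline) * X_O2  # per meg -> ppm equivalent
    return delta_ppm_equiv / r_apo_co2

# Hypothetical inputs (not data from the study).
apo = apo_per_meg(o2_per_meg=-720.0, co2_ppm=420.0)
ffco2 = ffco2_ppm(apo, apo_baseline=-340.0, r_apo_co2=-0.3)
print(f"APO = {apo:.1f} per meg, ffCO2 = {ffco2:.1f} ppm")
```

Because the fossil fuel APO:CO 2 ratio is small in magnitude, a given O 2 measurement error is amplified when converted to ffCO 2 , which is consistent with the larger ffCO 2 uncertainty found for the noisier analyser.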
The average ffCO 2 value over the entire full-drying period for the Oxzilla is 5.1 ppm, compared to 7.9 ppm on the G2207-i (Table 4); the calculated ffCO 2 from the G2207-i is higher than that of the Oxzilla 73 % of the time. This difference is predominantly due to the higher O 2 values reported by the G2207-i as discussed in Sect. 3.3.2; some of this difference also comes from the jumps in the G2207-i O 2 values, which mean that the calculated baselines used for each analyser follow different trends. For example, on 27 and 30 October 2019 the largest difference between the calculated ffCO 2 values is observed (Fig. 9); on both of these dates there is a large jump in O 2 values from the previous day measured by the G2207-i following a WSS calibration (Fig. 7). Although the O 2 difference between the two analysers on these days is low, there was a large difference on the preceding day. Days with a larger difference in observed values (due to a higher O 2 value reported by the G2207-i) pull the baseline to become more positive, thus making the difference between the ffCO 2 calculated from the two analysers larger on days where the observed O 2 difference is smaller.
Although the ffCO 2 values calculated from the G2207-i are often higher than those from the Oxzilla, they still follow the same trend (with some jumps in the G2207-i values); however, the maximum and minimum values occur at different times. The differences in ffCO 2 calculated from the G2207-i and the Oxzilla would become problematic if using the G2207-i analyser for top-down ffCO 2 quantification on an hourly basis.
Table 4
            Oxzilla ffCO 2 (ppm)    G2207-i ffCO 2 (ppm)
Average     5.1 ± 5.9               7.9 ± 6.6
Maximum     25.2                    29.4
Minimum     −3.7                    −6.5

Conclusions
The performance of the Picarro G2207-i under both laboratory and field conditions has been thoroughly evaluated. When running a cylinder on the G2207-i over 24 h in the laboratory, we observed a large amount of noise in the raw 1 s data, resulting in a large standard deviation in averaged data. This standard deviation is reduced over longer averaging times. During the laboratory measurement of cylinder gases with declared O 2 values, the G2207-i performed within the WMO extended compatibility goal of ±10 per meg when measuring cylinders with a negative O 2 per meg value. When measuring cylinders with a positive O 2 value, the precision and accuracy of the result worsened; thus, the G2207-i is not recommended for use in this range. When sampling ambient air, we found that the G2207-i's built-in water correction does not, at present, sufficiently correct for the influence of water vapour even when the sample air is partially dried, and we therefore recommend full drying (< 1 ppm H 2 O) of air samples. When sampling fully dried air, large step changes in the reported O 2 values from the G2207-i were observed after each WSS calibration; the addition of an RT every 5 h vastly reduced these jumps, although they were still observable. When the RT interpolation was applied, the repeatability of the G2207-i was ±5.7 ± 5.6 per meg, falling just outside of the WMO extended goal of ±5 per meg; it is possible that with a more frequent RT interpolation this repeatability will improve. The compatibility was ±10.0 ± 6.7 per meg, falling within the WMO extended compatibility goal for O 2 of ±10 per meg. Future work should investigate whether running an RT more frequently, to reduce jumps in the observed O 2 values after a WSS calibration, improves both the repeatability and compatibility of the analyser.
A key benefit of CRDS analysers is that they do not require drying of the air sample; however, this is not currently the case with the G2207-i for O 2 measurements.
Data availability. The G2207-i data from the WAO and CRAM Lab tests are available at: https://doi.org/10.5281/zenodo.6802657. The WAO in situ datasets are available at the CEDA data archives, for O 2 : https://catalogue.ceda.ac.uk/uuid/b3f9714c956f428a840211e0184e23eb (last access: 1 July 2022; Forster, 2012b), and for CO 2 : https://catalogue.ceda.ac.uk/uuid/87fc265aab6b4aeb961e62da2cd6ca91 (last access: 1 July 2022; Forster, 2012a).
Author contributions. LSF, ACM, and PAP developed the measurement methodology, and the measurements were conducted by LSF at UEA and WAO. AJE developed the software used to run the analyser. Investigation and visualisation were completed by LSF. Writing was carried out by LSF. Reviewing and editing were done by LSF, ACM, PAP, and GLF.