Recent advances in measurement techniques for atmospheric carbon monoxide and nitrous oxide observations

Carbon monoxide (CO) and nitrous oxide (N 2 O) are two key parameters in the observation of the atmosphere, relevant to air quality and climate change, respectively. For CO, various analytical techniques have been in use over the last few decades. In contrast, N 2 O was mainly measured using gas chromatography (GC) with an electron capture detector (ECD). In recent years, new spectroscopic methods have become available which are suitable for both CO and N 2 O. These include infrared (IR) spectroscopic techniques such as cavity ring-down spectroscopy (CRDS), off-axis integrated cavity output spectroscopy (OA-ICOS) and Fourier transform infrared spectroscopy (FTIR). Corresponding instruments have recently become commercially available and are increasingly used at atmospheric monitoring stations. We analysed results obtained through performance audits conducted within the framework of the Global Atmosphere Watch (GAW) quality management system of the World Meteorological Organization (WMO). These results reveal that current spectroscopic measurement techniques have clear advantages with respect to data quality objectives compared to more traditional methods for measuring CO and N 2 O. Further, they allow for a smooth continuation of historic CO and N 2 O time series. However, special care is required concerning potential water vapour interference on the CO amount fraction reported by near-IR CRDS instruments. This is reflected in the results of parallel measurement campaigns, which clearly indicate that drying the sample air leads to an improved accuracy of CO measurements with such near-IR CRDS instruments.


Introduction
The Global Atmosphere Watch (GAW) Programme of the World Meteorological Organization (WMO) coordinates a network of atmospheric composition observations comprising 31 global stations, more than 400 regional stations, and around 100 contributing stations operated by contributing networks (GAWSIS, 2018). These stations provide long-term observations of atmospheric greenhouse gases (GHGs) and reactive gases such as carbon dioxide (CO 2 ), methane (CH 4 ), nitrous oxide (N 2 O), and carbon monoxide (CO), which are essential for understanding the GHG budget, both regionally and globally. To make full use of these observations, the uncertainty of these measurements must be reduced in order to obtain consistent data series with traceability to common reference standards. Within the GAW programme, Central Calibration Laboratories (CCLs) provide reference standards that are linked to internationally accepted calibration scales (Rhoderick et al., 2016). In addition, World Calibration Centres (WCCs) evaluate GAW stations through independent assessments by on-site system and performance audits. The Laboratory for Air Pollution/Environmental Technology of the Swiss Federal Laboratories for Materials Science and Technology (Empa) has been operating the WCC for carbon monoxide (CO), methane (CH 4 ), carbon dioxide (CO 2 ), and surface ozone (WCC-Empa) since 1996 as a Swiss contribution to the GAW programme and has conducted over 90 system and performance audits over the past 20 years. Furthermore, WCC-Empa collaborates closely with the WCC for nitrous oxide (WCC-N 2 O) hosted by the Karlsruhe Institute of Technology (KIT) Institute of Meteorology and Climate Research - Atmospheric Environmental Research (IMK-IFU) to increase the number of N 2 O audits.
In order to address scientific needs for interpreting regional or global-scale atmospheric observations, the GAW programme sets ambitious network compatibility goals, which are continuously reviewed and, if necessary, revised during biannual meetings of the WMO/GAW community (WMO, 2018). Network compatibility goals are set for amount fraction ranges observed in the unpolluted troposphere, while extended network compatibility goals reflect the less stringent requirements for urban and regional studies with larger local fluxes. The network compatibility goals currently stand at 2 nmol mol −1 for CO and 0.1 nmol mol −1 for N 2 O, whilst the extended goals are set to 5 nmol mol −1 for CO and 0.3 nmol mol −1 for N 2 O. These goals represent the maximum bias that can generally be tolerated in measurements of well-mixed background air used in global models to infer regional fluxes. Some network compatibility goals may not be currently achievable within current measurement and/or scale transfer uncertainties. However, they are targeted for applications which require the smallest possible bias among different datasets or data providers, such as for the detection of small trends and gradients (WMO, 2018).
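The goals quoted above lend themselves to a simple three-way classification of an observed bias. The following is a minimal sketch, not part of the WMO/GAW procedure itself: the function name, the dictionary layout, and the return labels are illustrative assumptions, and only the four threshold values come from the text.

```python
# Compatibility goals quoted in the text, in nmol/mol.
# The names below (GOALS, classify_bias) are hypothetical, chosen for illustration.
GOALS = {
    "CO": {"network": 2.0, "extended": 5.0},
    "N2O": {"network": 0.1, "extended": 0.3},
}


def classify_bias(species: str, bias_nmol_mol: float) -> str:
    """Return 'network', 'extended', or 'outside' for a given bias."""
    goals = GOALS[species]
    magnitude = abs(bias_nmol_mol)  # the goals apply to the size of the bias
    if magnitude <= goals["network"]:
        return "network"
    if magnitude <= goals["extended"]:
        return "extended"
    return "outside"
```

For instance, a CO bias of 3 nmol mol −1 falls within the extended goal only, whereas the same bias for N 2 O lies far outside both goals.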
In situ measurements of tropospheric CO and N 2 O have been available since the late 1960s (Weiss, 1981; Rasmussen, 1983, 1988). While early measurements were mainly based on the analysis of flask samples, quasi-continuous measurements have been available since the early 1980s (Brunke et al., 1990). Although continuous measurements of CO and N 2 O began at approximately the same time and were often collocated, the challenges posed to the measurement techniques were completely different. Carbon monoxide shows high temporal and spatial variability, whilst the detection of very small changes is needed for N 2 O observations. In the past, atmospheric CO and N 2 O measurements at remote locations were almost exclusively made by gas chromatographic (GC) techniques. GC with an electron capture detector (GC/ECD) was by far the most abundant measurement technique for N 2 O, whereas flame ionisation detection (GC/FID) in combination with a methaniser and GC with a mercuric oxide reduction detector (GC/HgO) were the two most commonly used techniques for CO measurements.
Alternatives to GC/ECD for N 2 O are not as abundant, but several methods have been proposed in recent years. These include instruments deploying optical techniques in the mid-IR, e.g. CRDS, FTIR, OA-ICOS, QC-TILDAS, and difference-frequency-generation-based (DFG-based) systems. Lebegue et al. (2016) published a comprehensive overview of these techniques as well as their performance under controlled conditions.
The recently developed optical techniques for CO and N 2 O have clear advantages concerning sensitivity, repeatability, linearity, time response, and temporal coverage, resulting in new measurement setups and calibration strategies. However, only a few published studies comparing spectroscopic techniques with GC systems exist for CO (Zellweger et al., 2012; Ventrillard et al., 2017) and N 2 O (Vardag et al., 2014; Lebegue et al., 2016). Such comparisons of traditional and new techniques are crucial for a smooth continuation of multi-decadal time series when introducing new analytical techniques.
In this paper, we analyse data collected during CO and N 2 O performance audits made by WCC-Empa and WCC-N 2 O from 2002 through 2017 from the perspective of the measurement techniques used. We further present ambient-air CO comparisons made with a NIR-CRDS travelling instrument during WCC-Empa audits and show limitations of the NIR-CRDS technique with respect to water vapour interference. Assessment of atmospheric measurements through parallel measurements with a travelling instrument is complementary to performance audits with travelling standards and round-robin experiments and is thus an essential, valuable quality control measure (Hammer et al., 2013b; Zellweger et al., 2016).

C. Zellweger et al.: Advances in measurement techniques for CO and N 2 O observations

Experimental methods
System and performance audits (hereafter only called audits) by WCCs are part of the quality management framework of the GAW programme (WMO, 2017a). Empa is the designated WCC for CO (since 1997), and since 2009 a collaboration between WCC-Empa and the WCC for N 2 O has allowed WCC-Empa to include N 2 O comparisons during station audits. The concept of station audits has been described elsewhere (Klausen et al., 2003; Buchmann et al., 2009; Zellweger et al., 2016). WCCs use two different approaches to conduct performance audits: (i) comparisons of travelling standards (TSs), i.e. high-pressure cylinders with known nominal values of CO and N 2 O amount fractions, and (ii) parallel measurements using a travelling instrument (TI). The TS method is widely applied, while the TI concept is used less frequently and limited to CO, CO 2 , and CH 4 by WCC-Empa.

Comparisons using travelling standards
The audit concept using TS supplies gases from high-pressure cylinders, usually dry natural air or synthetic air, to the instruments of the audited station. Usually, multiple analyses of a set of three or more TSs are made and averaged for the final assignment of the TS value by the audited laboratory. Calibrations of the TS against reference standards before and after the station audit ensure traceability to the CCL, which is run by the National Oceanic and Atmospheric Administration Earth System Research Laboratory (NOAA/ESRL). The results are then analysed by a linear regression of the values measured by the station vs. the reference values assigned by the WCC. At WCC-Empa, N 2 O and CO amount fractions in the TS have been calibrated since 2010 with an Aerodyne quantum cascade laser spectrometer (QC-TILDAS-CS, Aerodyne Research Inc., MA, USA). Before that, an AL5001 vacuum ultra-violet resonance fluorescence analyser (AL5001, Aerolaser GmbH, Germany) was used for CO calibrations. Both instruments are described in more detail in Zellweger et al. (2012). Amount fractions are assigned to the TS using a set of several reference standards purchased from the CCL. The WCC-N 2 O uses a set of TSs traceable to a set of tertiary standards, which are regularly recalibrated against secondary standards at the CCL. For N 2 O, the calibration scales in use were the WMO-X2000 for audits before 2006 and WMO-X2006 and X2006A (Hall et al., 2007; NOAA, 2018c) afterwards. CO values refer to the WMO-X2000, WMO-X2004, X2014, and X2014A (NOAA, 2018a) calibration scales.
We analysed WCC-Empa performance audit results based on the TS method for carbon monoxide (2005-2017) and nitrous oxide (2009-2017), as well as results of N 2 O audits conducted by the WCC-N 2 O (2002-2013). Details on analytical techniques, instruments, and calibration scales of these audits are summarised in Table 1 for CO and Table 2 for N 2 O. Since the focus of the paper is on instrument performance, only comparisons involving fully functional instruments were considered. Furthermore, if data have been reprocessed due to any known biases, e.g. in working standards, only the results of the final comparison were considered, since they best represent the performance of the measurement instruments at the time of the audit. CO audits made by WCC-Empa before 2005 were not considered for the comparison for the following reasons. (i) Stations and WCC-Empa were often not referring to the same CO calibration scale. WCC-Empa was using the WMO-X2000 carbon monoxide scale, while many GAW stations were still reporting on the older WMO-X88 scale (Novelli et al., 2003) or other scales. (ii) WCC-Empa at that time based its calibration of travelling standards only on CO standards above 185 nmol mol −1 ; the WMO-X2000 calibration scale had linearity issues, which have been corrected by the use of the WMO-X2004, X2014, and X2014A calibration scales. WCC-Empa continued using the WMO-X2000 calibration scale until 2011 but used only standards with an amount fraction larger than 185 nmol mol −1 . At these amount fractions, the differences between the WMO-X2000 and WMO-X2004 CO scales are very small and questionably significant within their uncertainties. We therefore consider these two scales as being identical for calibrations made at WCC-Empa.
For CO, the assessment has been made in the same standardised way as for carbon dioxide (CO 2 ) and methane (CH 4 ) described in Zellweger et al. (2016), while a slightly different approach has been chosen for N 2 O due to the fact that ambient-air amount fractions increased significantly during the period of observation. The results section gives further details on the methodology.

Ambient-air comparisons
Assessments based on TS comparisons, e.g. during station audits or round-robin experiments, have limitations. They only cover the analytical system and exclude other aspects that might also be relevant, such as inlet or drying systems. The low water content of the TS may, for example, lead to a systematic bias, especially for analysers based on spectroscopic techniques with implemented water vapour correction algorithms. The assessment during on-site audits should therefore include parallel measurements with a TI whenever feasible (WMO, 2011, 2012, 2014, 2016, 2018). WCC-Empa implemented this additional approach for CO, CO 2 , and CH 4 audits in 2011. Details of the setup and procedure as well as results for CO 2 and CH 4 are published in Zellweger et al. (2016). Audits involving parallel measurements for CO were conducted using a NIR-CRDS analyser (G2401, Picarro Inc., USA) as a travelling instrument. The Picarro G2401 instrument has an internal water vapour correction mechanism for CO and reports the dry-air amount fraction only. However, these factory-based corrections are often not adequate. Due to the higher analytical noise compared to CO 2 and CH 4 measurements, corrections require a more comprehensive approach. The internal water vapour correction of the TI was evaluated using the water droplet method (Zellweger et al., 2012; Rella et al., 2013). Approximately 0.8 mL of ultra-pure water is injected into a constant flow of about 500 mL min −1 of a dry working standard and delivered to the instrument using a bypass overflow. The CO amount fraction of the standards used for the determination of the water vapour interference ranged from 57 to 741 nmol mol −1 . No dependency of the water vapour interference on the CO amount fraction was observed.
For the WCC-Empa CO analyser, the water vapour influence on the CO amount fraction, which is already corrected by the internal water vapour compensation of the Picarro instrument, was then fitted by a quadratic function. Due to the relatively large uncertainties of individual experiments, we were not able to determine a reliable correction function and, therefore, relied on the factory settings for our experiments.
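To illustrate what fitting the residual water vapour influence by a quadratic function involves, the sketch below performs an ordinary least-squares fit of a CO ratio against water vapour. It is not the authors' processing code, and as stated above no correction derived this way was applied in this work. The function names, the use of the ratio of humid to dry readings as the fitted quantity, and the data points (which are synthetic, not measured) are all assumptions made for the example.

```python
def fit_quadratic(h2o, ratio):
    """Least-squares fit of ratio = a + b*h + c*h**2; returns (a, b, c)."""
    # Power sums that make up the 3x3 normal equations.
    s = [sum(h ** k for h in h2o) for k in range(5)]
    t = [sum(r * h ** k for h, r in zip(h2o, ratio)) for k in range(3)]
    # Augmented matrix [A | rhs] with A[i][j] = sum(h**(i+j)).
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                f = m[row][col] / m[col][col]
                m[row] = [x - f * y for x, y in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def ratio_at(h, coeffs):
    """Evaluate the fitted quadratic at water vapour content h."""
    a, b, c = coeffs
    return a + b * h + c * h * h


# Synthetic droplet-test points (NOT measured data): H2O in %, CO ratio.
h2o = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ratio = [1.0 + 0.010 * h - 0.004 * h * h for h in h2o]
coeffs = fit_quadratic(h2o, ratio)
```

With real droplet-test data the scatter of individual points would propagate into the coefficients, which is the practical difficulty described in the paragraph above.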
Parallel measurements with the TI at the following GAW stations are shown in this paper:
- Puy de Dôme (PUY), France, is a global GAW station that is part of the European Integrated Carbon Observation System (ICOS). A separate inlet system leading to the same location as the air intake of the station analyser was in place for the comparison with the TI. An additional pump at a flow rate of approximately 2 L min −1 flushed this WCC-Empa inlet line. For the last days of the comparison, the TI sampled from the station inlet using the same cryogenic dryer as the station instrument. During this period, the air was dried to a dew point of approximately −50 °C.
- Anmyeon-do (AMY), South Korea, is a regional GAW station run and managed by the Environmental Meteorology Research Division of the National Institute of Meteorological Sciences (NIMS). Air was taken with both instruments from the AMY air inlet system, and the air was dried to a dew point of approximately −50 °C using a cryogenic trap.
Table 3 gives an overview of the comparisons, including duration and instruments used. Detailed information about the stations is available from the GAW Station Information System (GAWSIS, 2018).

Analysis of travelling standard comparison
One of the objectives of this work was to evaluate the performance of instruments for measuring CO and N 2 O at remote atmospheric-research observatories. Of particular interest is the question of whether modern spectroscopic techniques such as NIR-CRDS, TILDAS, OA-ICOS, or FTIR have a significant advantage compared to traditional methods and whether spectroscopic techniques improve the results of the performance audits carried out by the WCCs for the corresponding compounds with respect to precision and uncertainty. WCC-Empa made 60 comparisons during station audits using travelling standards for CO (2005-2017). Each comparison listed in Tables 1 and 2 involved a set of travelling standards and was then evaluated by a linear regression analysis of the values measured by the stations vs. the WCC-assigned amount fractions, which are traceable to the CCL. To judge whether the combinations of the resulting slope and intercept meet the WMO/GAW network compatibility or extended network compatibility goals, a previously described method for CO 2 and CH 4 (Zellweger et al., 2016) was applied to CO and N 2 O. For CO, the bias at 165 nmol mol −1 , which is the centre of the amount fraction range of 30-300 nmol mol −1 representing the unpolluted troposphere (WMO, 2018), was plotted against the slope of the individual travelling standard comparisons. This amount fraction range sufficiently covers the inter-hemispheric gradient, year-to-year variability, seasonal cycles as well as observed trends for the period of consideration at remote stations. For N 2 O, however, using a fixed amount fraction range might not be appropriate due to the significant upward trend of the N 2 O mixing ratio in the atmosphere over the past decades.
The range currently representing the unpolluted troposphere has been recently identified as 325-335 nmol mol −1 (WMO, 2018), which corresponds well to the mean global atmospheric N 2 O amount fraction of 328.9 ± 0.1 nmol mol −1 observed in 2016 (WMO, 2017b). A trend analysis made by Blunden and Arndt (2017) showed an increase of about 0.8 nmol mol −1 per year over the last decade, which is in agreement with a fairly constant growth rate of 0.81 nmol mol −1 per year from 1977 until today as determined by the National Oceanic and Atmospheric Administration (NOAA, 2018b). Based on this, our analysis of N 2 O audit results was made using a variable amount fraction range covering 10 nmol mol −1 with the centre being representative for the unpolluted troposphere for the year of the audit. Table 2 gives the corresponding ranges used for the analysis. This method allows displaying the result of each individual CO and N 2 O audit involving comparisons with travelling standards as a single dot in a bias vs. slope plot.
Figure 1 shows the bias in the centre of the relevant amount fraction range of the unpolluted troposphere (30-300 nmol mol −1 CO) vs. the slope for the CO audits listed in Table 1. Perfect agreement would result in bias-slope pairs of (0 nmol mol −1 /1). The allowed bias-slope combinations meeting the network compatibility (green area) and extended network compatibility goals (yellow area) of 2 and 5 nmol mol −1 (WMO, 2018), respectively, are indicated. The distribution of the observed biases and slopes gives further information about potential systematic offsets, which could be present either at the WCC or at the stations. If results are not systematically biased (e.g. by different calibration scales), a normal distribution of the observed bias and slope pairs around 0 nmol mol −1 (bias) and 1.0 (slope) is expected.
This was the case for the slope with a mean value of 0.994 and dispersion (1σ) of 0.068, which is not significantly different from 1 (t test, p = 0.47). However, the bias with a mean value of −2.6 nmol mol −1 and dispersion (1σ) of 8.7 nmol mol −1 was significantly different from 0 (t test, p = 0.02). A potential reason for this could be an upward drift in standards, which is common for CO in air mixtures at ambient amount fractions (Novelli et al., 2003; Gomez-Pelaez et al., 2013). Drift rates are usually on the order of up to 1 nmol mol −1 per year. To account for this, WCC-Empa frequently retrieves reference standards from the CCL. This might not always be the case at measurement sites. There, standards are often in use over long periods without recalibration or the acquisition of new standards. Calibrating an instrument with standards whose amount fractions have increased through drift will then result in an underestimation of ambient CO, which potentially explains the observed mean bias. Figure 1 shows that reaching the network compatibility goals for CO is extremely challenging. The variety of measurement techniques is quite large, and there are clear performance differences between methods. Newer spectroscopic techniques such as the QCL-based TILDAS or OA-ICOS spectroscopy (QCL hereafter) or CRDS generally show better performance compared to GC methods or NDIR. Moreover, they also yield higher data coverage due to the truly continuous observations in contrast to the semi-continuous GC measurements and the less frequently required application of reference gases compared to NDIR measurements. Higher data coverage further reduces the uncertainty caused by incomplete sampling.
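The one-sample t tests quoted above for the mean slope and the mean bias can be checked approximately from the summary statistics alone. The sketch below is a plausibility check only: it assumes that all 60 CO comparisons entered the test (the text does not state the sample size used), and it compares the t statistic with an approximate two-sided 5 % critical value of about 2.00 for 59 degrees of freedom instead of computing an exact p value.

```python
import math


def one_sample_t(mean, sd, n, mu0):
    """t statistic for H0: population mean == mu0, from summary statistics."""
    return (mean - mu0) / (sd / math.sqrt(n))


# Means and dispersions quoted in the text; n = 60 is an assumption.
N_COMPARISONS = 60
t_slope = one_sample_t(0.994, 0.068, N_COMPARISONS, 1.0)
t_bias = one_sample_t(-2.6, 8.7, N_COMPARISONS, 0.0)

# Approximate two-sided 5 % critical value of Student's t for 59 degrees of freedom.
T_CRIT = 2.00

print(f"slope: t = {t_slope:.2f}, significant at 5 % = {abs(t_slope) > T_CRIT}")
print(f"bias:  t = {t_bias:.2f}, significant at 5 % = {abs(t_bias) > T_CRIT}")
```

Under these assumptions the slope does not differ significantly from 1 while the bias differs significantly from 0, which is consistent with the reported outcome.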
Figure 2 summarises the percentage of comparisons that met the network compatibility and extended network compatibility goals for (a) all comparisons, (b) GC/HgO and GC/FID systems only, (c) NDIR instruments only, (d) VURF instruments only, and (e) NIR-CRDS and QCL instruments. FTIR is not shown separately, since only two comparisons of one instrument were made. Out of the 60 comparisons, only 13 (21.7 %) met the network compatibility goal and an additional 14 (23.3 %) met the extended goal in the amount fraction range relevant to the troposphere. Good performance over the entire relevant amount fraction range is required, since atmospheric CO variability is large and pollution episodes, e.g. through long-range transport, are common even at remote locations. Calibration strategies therefore should cover the entire range, which is easier to implement for techniques with a linear response such as VURF, NIR-CRDS, and QCL. The analysis of the performance audit results shows that 90 % of the NIR-CRDS and QCL comparisons met the network compatibility or extended network compatibility goal, while this was the case for less than 40 % of the NDIR analysers or GC systems. Of the 10 TS comparisons with NIR-CRDS/QCL instruments, five (50 %) were within ±2 nmol mol −1 , and an additional four (40 %) were within ±5 nmol mol −1 . The corresponding numbers are significantly smaller for GC-based methods (total of 18 comparisons) and NDIR (total of 23 comparisons), which clearly indicates an advantage of the recent methods compared to more traditional techniques.
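The link between a regression result and the compatibility goals can be made concrete with a few lines of code. The sketch below assumes, as one reading of the requirement for good performance over the entire relevant range, that a slope and intercept pair meets a goal only if the absolute bias stays within the goal everywhere between 30 and 300 nmol mol −1 . The function names and the example slope and intercept are hypothetical.

```python
def bias_at(x, slope, intercept):
    """Station-minus-reference bias at reference amount fraction x (nmol/mol)."""
    return slope * x + intercept - x


def meets_goal(slope, intercept, lo, hi, goal):
    """True if |bias| <= goal over [lo, hi].

    The bias is linear in x, so its largest magnitude occurs at an end point,
    and checking lo and hi covers the whole range.
    """
    worst = max(abs(bias_at(lo, slope, intercept)),
                abs(bias_at(hi, slope, intercept)))
    return worst <= goal


CO_LO, CO_HI = 30.0, 300.0        # range of the unpolluted troposphere (text)
CO_CENTRE = (CO_LO + CO_HI) / 2   # centre of that range, 165 nmol/mol

# Hypothetical regression result of one travelling standard comparison.
slope, intercept = 1.02, -2.0
centre_bias = bias_at(CO_CENTRE, slope, intercept)
in_network = meets_goal(slope, intercept, CO_LO, CO_HI, 2.0)
in_extended = meets_goal(slope, intercept, CO_LO, CO_HI, 5.0)
```

In this example the bias at the centre of the range is small, yet the slope pushes the bias at the upper end of the range beyond the 2 nmol mol −1 goal, so only the extended goal is met.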

Evaluation of CO comparisons
However, these results also depend on calibration and potential issues or differences in the calibration scales. For example, an instrument with perfect repeatability and reproducibility but an incorrect calibration, e.g. by a bias in the calibration standard, can be outside the quality goals only because of calibration issues. In this case, the uncertainty of the linear regression of the travelling standard comparison is expected to be smaller compared to instruments with poorer repeatability and reproducibility. Therefore, the uncertainty of the linear regression analysis is another measure of the instrument performance. Figure 3 shows a boxplot of the standard uncertainty of the slopes of all CO performance audits grouped by different analytical techniques. The results also confirm the better performance of the QCL and NIR-CRDS instruments compared to GC techniques and NDIR. Interestingly, the performance of NDIR analysers and GC/HgO systems is similar, but this is likely due to different reasons. While the repeatability of GC/HgO systems is generally superior to that of NDIR, appropriate compensation of their non-linearity evidently remains difficult, whereas NDIR analysers are normally linear but noisy. This results in a similar performance of both techniques in the field for the amount fraction range from 30 to 300 nmol mol −1 .
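The standard uncertainty of the regression slope used above as a performance measure can be computed directly from the travelling standard data. The sketch below assumes an ordinary, unweighted least-squares regression, which the text does not specify, and uses hypothetical amount fractions. It shows that a calibration offset alone leaves the slope uncertainty small, whereas poorer repeatability increases it.

```python
import math


def regress_with_slope_se(x, y):
    """Ordinary least squares y = intercept + slope*x.

    Returns (slope, intercept, standard error of the slope).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(ss_res / (n - 2) / sxx)
    return slope, intercept, se_slope


# Hypothetical WCC-assigned values (nmol/mol) and two hypothetical analysers:
# both share the same calibration error, but differ in repeatability.
reference = [50.0, 100.0, 150.0, 200.0, 250.0]
scatter = [1.0, -1.0, 1.0, -1.0, 1.0]
quiet = [1.01 * r - 0.5 + 0.1 * s for r, s in zip(reference, scatter)]
noisy = [1.01 * r - 0.5 + 2.0 * s for r, s in zip(reference, scatter)]

slope_q, icpt_q, se_q = regress_with_slope_se(reference, quiet)
slope_n, icpt_n, se_n = regress_with_slope_se(reference, noisy)
```

Both hypothetical analysers yield the same slope, so they would appear at a similar position in a bias vs. slope plot, but the slope uncertainty separates them.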
Comparison with the recent WMO/IAEA Round Robin Comparison Experiment, as done for N 2 O (see below), is not straightforward. Changes in the calibration scale during the round-robin experiment jeopardise the direct comparison of the audit results with the round-robin results.
[Figure caption, beginning missing] ... Table 2 along with the allowed bias-slope combinations meeting the network compatibility (green area) and extended network compatibility goals (yellow area) of 0.1 and 0.3 nmol mol −1 (WMO, 2018), respectively. Only results of comparisons with fully functional instruments were considered.

Evaluation of N 2 O comparisons
The results presented in Fig. 4 show that reaching the WMO/GAW network compatibility goals remains difficult for N 2 O. However, calibration ranges at stations can be intentionally limited to the ambient amount fraction typical for their location and time. These ranges are normally significantly smaller than those used in Fig. 4 in the case of N 2 O. Therefore, bias-slope pairs outside the network compatibility goals do not necessarily imply that the measurements at a station are biased, but they are indicative of the performance of the instrument and its calibration over a given amount fraction range. The dashed green and yellow lines in Fig. 4 denote the limits for meeting the network compatibility and extended network compatibility goals at the relevant amount fraction.
As discussed above for CO, the distribution of the observed biases and slopes is an indicator of potential systematic offsets, either at the WCCs or at the stations. No significant deviations were observed for audits carried out by WCC-Empa, with a mean value of the bias of 0.32 nmol mol −1 and a dispersion (1σ) of 1.09 nmol mol −1 (t test, p = 0.11), and a mean value of the slope of 0.965 with a dispersion (1σ) of 0.093 (t test, p = 0.21). WCC-N 2 O comparisons showed no significant deviations with a mean bias of −0.12 nmol mol −1 and a dispersion (1σ) of 0.89 nmol mol −1 (t test, p = 0.35); however, the deviation of the slope with a mean value of 0.954 and a dispersion (1σ) of 0.067 was significant (t test, p = 0.01). This result indicates that at the launch of the audits in 2002 the linearity problem of the ECD was not fully considered in the data evaluation by the audited stations. The GC/ECD technique, which contributes most to the results, is known to be highly non-linear (Lebegue et al., 2016), and consequently, deviations are expected for amount fractions away from the relevant level if the non-linearity of the systems had not been determined accurately enough.
Figure 6. (a) Percentage of the results of the sixth round-robin experiment that were for the range of the relevant amount fraction ±5 nmol mol −1 within the WMO/GAW network compatibility goals (green), the extended network compatibility goals (yellow), or outside the network compatibility goals (red area). (b) Same as (a) but at the relevant amount fraction (see text for details).
With ongoing data quality assurance activities and the implementation of linearity corrections for the ECD response, the slope is now close to 1 for more recent performance audits. Figure 5 presents the result of the above analysis as percentages of comparisons meeting the network compatibility and extended network compatibility goals. Until now, none of the performance audits conducted by either the WCC-N 2 O or WCC-Empa achieved the compatibility goal of 0.1 nmol mol −1 , and only one third of the results were within the extended goals of 0.3 nmol mol −1 when an amount fraction range of 10 nmol mol −1 is considered. This slightly improves if we consider only the bias at the relevant amount fraction. The relevant amount fraction corresponds to the value at the centre of the relevant range for the corresponding year. Under these less stringent conditions, we find 19.4 % compliance with the network compatibility goal and 36.1 % with the extended network compatibility goal. This is in line with the small variations in N 2 O at remote locations and the corresponding limited calibration range of many stations mentioned above. Lebegue et al. (2016) recognised that measuring small variations in the N 2 O amount fraction using GC/ECD is very challenging, which is in agreement with the TS comparison results from the station audits of this work.
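The idea of a year-dependent relevant amount fraction for N 2 O can be illustrated with a simple linear rule. The ranges actually used in the analysis are those listed in Table 2. The sketch below is only an assumed approximation that anchors the centre at the 2016 global mean quoted in the text and shifts it by the quoted growth of about 0.8 nmol mol −1 per year; the function and constant names are hypothetical.

```python
REF_YEAR = 2016
REF_CENTRE = 328.9   # global mean N2O in 2016 (WMO, 2017b), nmol/mol
GROWTH = 0.8         # approximate growth per year (Blunden and Arndt, 2017)
HALF_WIDTH = 5.0     # half of the 10 nmol/mol range used in the analysis


def n2o_relevant_range(year):
    """Return (low, centre, high) of an ASSUMED relevant range for a given year.

    Linear extrapolation from the 2016 mean; not the values of Table 2.
    """
    centre = REF_CENTRE + GROWTH * (year - REF_YEAR)
    return centre - HALF_WIDTH, centre, centre + HALF_WIDTH


low_2006, centre_2006, high_2006 = n2o_relevant_range(2006)
low_2016, centre_2016, high_2016 = n2o_relevant_range(2016)
```

The bias at the relevant amount fraction is then the regression bias evaluated at the centre, while the stricter criterion evaluates it over the full 10 nmol mol −1 window.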
The results obtained during the performance audits by WCC-Empa and the WCC-N 2 O compare well with the recent WMO/IAEA Round Robin Comparison Experiment organised and coordinated by the CCL for N 2 O hosted by NOAA. The sixth round-robin experiment took place in 2014-2015, and involved the comparison of two standards, one containing a lower (average of 321.6 nmol mol −1 ) and the other a higher (average of 333.7 nmol mol −1 ) N 2 O amount fraction (NOAA, 2018d). A total of 25 laboratories participated in this exercise. With this dataset, we made the same analysis as described above after the exclusion of two laboratories using calibration scales other than WMO-X2006A. The percentage of laboratories fulfilling the WMO network compatibility and extended network compatibility goal was very similar to the results from the station audits by WCC-Empa and the WCC-N 2 O, as shown in Fig. 6.
Out of the 25 laboratories in the round-robin experiment, only two (8 %) were entirely within the WMO/GAW network compatibility goal of 0.1 nmol mol −1 for the 10 nmol mol −1 range. At the relevant amount fraction, the percentage of laboratories that did not meet the quality goals was very similar for the WCC audits (44 %) and the round-robin experiment (40 %).
The above results, both for TS comparisons during audits and the round-robin experiment, clearly illustrate that it remains highly challenging to reach the network compatibility and extended network compatibility goals for N 2 O. In contrast to advances made for the detection of CH 4 , CO 2 (Zellweger et al., 2016), and CO, measurements of N 2 O were in most cases still based on gas chromatography, and only a few recent comparisons involved spectroscopic techniques. The data for N 2 O clearly indicate the advantages of the spectroscopic techniques compared to gas chromatography. The uncertainty of the observed intercepts and slopes of the linear regression gives information on the linearity and repeatability of the system. The uncertainty of the slope of the linear regression was significantly smaller for QCL and FTIR analysers (median of 0.0028, standard deviation of 0.0031) compared to GC/ECD systems (median of 0.0126, standard deviation of 0.0284). Despite the better performance regarding the linearity and repeatability of the spectroscopic techniques compared to GC/ECD, no clear advantage of the spectroscopic methods was observed during the performance audits. A potential reason could be the uncertainty of the calibration standards, which in the case of N 2 O is of the same order or even larger than the WMO/GAW network compatibility goal. The CCL determined a reproducibility of N 2 O calibrations in the ambient range of ∼ 0.22 nmol mol −1 (95 % confidence level) (Hall et al., 2007; NOAA, 2018c), which is larger than the network compatibility goal. However, this uncertainty is low compared to uncertainties associated with the gravimetric preparation of standards, which highlights the importance of maintaining and propagating calibration scales (Brewer et al., 2018) as implemented in the WMO/GAW programme.
Therefore, it is still too early to quantify this improved performance of spectroscopic techniques for N 2 O and to give a final statement with respect to the network compatibility goals.

Ambient-air comparisons
The above results, as well as round-robin experiments, are travelling standard comparisons and therefore do not cover all aspects of ambient-air measurements. Other aspects include bias due to sampling procedures, drying, or insufficient accounting for potentially relevant spectral interferences, e.g. by water vapour. For example, Chen et al. (2013) demonstrated that accurate measurements of CO in humid air are possible with the NIR-CRDS technique implemented by Picarro. Correction functions, however, are different for each individual instrument, and such functions have been implemented in Picarro NIR-CRDS CO analysers since 2012.
WCC-Empa started with parallel measurements of ambient air for CO, CO 2 , and CH 4 during station audits in 2011. The results of the greenhouse gas comparisons showed that additional information, e.g. related to air inlet systems, is obtained by these comparisons (Zellweger et al., 2016). However, these comparisons were in many cases less conclusive for CO. Some parallel measurements showed differences that were not present in the travelling standard comparisons. Sampling issues were unlikely because the ambient-air comparison of CH 4 and CO 2 agreed well. Therefore, other issues like interferences of ambient-air constituents may cause an additional bias.
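A parallel measurement is commonly summarised by the mean difference between the two analysers (bias) and the standard deviation of that difference (dispersion, 1σ). The sketch below shows one way to compute both from paired hourly averages and to compare the bias with the WMO/GAW network compatibility goal for CO of 2 nmol mol −1. The function names and the example values are hypothetical and serve only as an illustration.

```python
from statistics import mean, stdev

# WMO/GAW network compatibility goal for CO (nmol mol-1).
CO_COMPATIBILITY_GOAL = 2.0


def bias_and_dispersion(station, reference):
    """Mean and sample standard deviation of station minus reference.

    Pairs in which either value is missing (None) are skipped.
    """
    diffs = [s - r for s, r in zip(station, reference)
             if s is not None and r is not None]
    if len(diffs) < 2:
        raise ValueError("need at least two valid pairs")
    return mean(diffs), stdev(diffs)


def within_goal(bias, goal=CO_COMPATIBILITY_GOAL):
    """True if the absolute bias does not exceed the goal."""
    return abs(bias) <= goal


# Hypothetical hourly CO averages (nmol mol-1), illustration only.
station_co = [105.0, 110.0, None, 120.0, 98.0]
reference_co = [100.0, 104.0, 108.0, 113.0, 93.0]
bias, dispersion = bias_and_dispersion(station_co, reference_co)
print(f"bias = {bias:.2f}, dispersion (1 sigma) = {dispersion:.2f}")
print("within goal:", within_goal(bias))
```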
For example, the comparison made at the global GAW station Puy de Dôme (PUY) in 2016 showed significant deviations in ambient CO measurements, as illustrated in Fig. 7, while the TS comparison showed good agreement. During this period, the PUY analyser was measuring on average 5.85 nmol mol −1 (standard deviation of 0.94 nmol mol −1 ) higher than the TI. Despite this bias, both instruments captured the temporal variation well. The WCC-Empa travelling instrument was sampling from the same air intake location but with a completely independent sampling line. In contrast to the PUY instrument, which sampled air dried to a dew point of −50 °C, the air sampled by the travelling instrument was not dried. As discussed in Sect. 2.2, the factory-implemented water vapour correction was used. The observed bias correlates with the measured water vapour, as shown in Fig. 8, which indicates issues with the internal water vapour compensation of the TI. Water vapour correction functions of this instrument were determined three weeks before and three weeks after the comparison campaign with a droplet test, in analogy to the method described by Rella et al. (2013). Figure 9 shows the ratio of CO (humid, corrected) / CO (dry) against the measured water vapour content of the TI; CO (dry) is the amount fraction measured by the instrument in the absence of water, and CO (humid, corrected) the water-vapour-corrected CO amount fraction reported by the Picarro G2401 during the humidification by the droplet test. Since the Picarro G2401 reports CO only as a dry-air amount fraction, the measured ratio should be equal to 1 and not depend on water vapour content. However, a significant change in the CO response in relation to water vapour was observed. The TI was underestimating the CO amount fraction in the experiment before the campaign (Fig. 9a), and it then changed to an overestimation after the campaign (Fig. 9b).
Possibly, this has been influenced by the upgrade to a new software version of the TI between the two periods. Unlike for CO 2 and CH 4 , individual water vapour correction functions for CO cannot currently be determined with sufficient accuracy to achieve the WMO/GAW network compatibility goal of 2 nmol mol −1 . Individual experiments using the droplet test have a large uncertainty due to higher instrumental noise for CO compared to CH 4 or CO 2 . Furthermore, CO correction functions seem to be less stable over time, and sudden changes are possible. Figure 10 shows fitted ratios of CO (humid, corrected) / CO (dry) vs. the measured water vapour content for two different instruments over a period of several years. Both instruments show significant variation over time in the humidity-corrected CO reported by the analyser. Consequently, drying of the sample air could improve CO measurements with Picarro G2401 instruments and likely with Picarro G1302 and G2302 CO/CO 2 /H 2 O analysers. This has been confirmed by a period of dry ambient-air measurements of both instruments at PUY. Figure 11 shows the comparison of the two analysers during the audit collocation measurement. In this case, the TI was connected to the same sampling line as the PUY instrument after the cryogenic trap, and both instruments were measuring dry air. The bias of the PUY analyser significantly decreased to 1.20 nmol mol −1 with a dispersion (1σ) of 0.57 nmol mol −1 . This agrees well with the observed bias during the travelling standard comparison. Potentially, the change of the inlet system could also have been the reason for the reduction in the bias. However, this is unlikely because no change in the bias of the CH 4 and CO 2 amount fractions, which were both measured simultaneously with CO over the same inlet line, was observed. Figure 12 summarises the results of the performance audits at PUY with TS, as well as the bias observed during the comparison campaign with humid and dry measurement of the TI.
Figure 10. Ratios of CO (humid, corrected) / CO (dry) amount fractions vs. the water vapour mixing ratios of two different Picarro G2401 NIR-CRDS analysers over time. The legend shows the date (dd-mm-yy) of the experiment. The coloured areas are the limits for the WMO/GAW network compatibility goal (green) and extended (yellow) network compatibility goal at the amount fraction of 300 nmol mol −1 CO.
Figure 11. Comparison of hourly averages of CO at PUY between the WCC-Empa travelling instrument and the PUY Picarro G2401 for the period when the TI sampled dry air. (a) CO time series. (b) CO bias of the station analyser vs. time. The green and yellow areas correspond to the WMO network compatibility and extended network compatibility goals.
Figure 12. Bias of the PUY Picarro G2401 CO instrument vs. WCC-Empa assigned values. Black dots represent the average of the data at a given level from a specific TS comparison. The error bars show the standard deviation of individual measurement points. The green and yellow lines correspond to the WMO network compatibility and extended network compatibility goals, and the green and yellow areas to the amount fraction range relevant for PUY. The dashed lines around the regression lines are the Working-Hotelling 95 % confidence intervals. The coloured dots show the bias during the ambient-air comparison without (blue) and with (red) drying of the air sampled by the TI.
Figure 13 shows another example of a CO ambient-air comparison made at the regional GAW station Anmyeondo, South Korea, over a period of 1 month in 2017. The comparison was made between the AMY OA-ICOS analyser (LGR-30-EP, Los Gatos Research, USA) and the WCC-Empa Picarro G2401 travelling instrument. Both analysers were measuring ambient air dried to a dew point of −50 °C using a cryogenic trap. Temporal variability at this site is significantly larger than at PUY, and except for a few spikes, it was well captured by both instruments. The bias of the AMY analyser averaged to 0.10 nmol mol −1 with a dispersion (1σ) of 3.20 nmol mol −1 over the entire period of the campaign. However, during the first third of the campaign, the AMY instrument was slightly underestimating the CO amount fraction compared to WCC-Empa (bias of −2.28 nmol mol −1 , dispersion of 2.91 nmol mol −1 ), followed by a slight overestimation in the second third (bias of 1.47 nmol mol −1 , dispersion of 2.81 nmol mol −1 ). The last third then showed good agreement between the two systems (bias of 0.57 nmol mol −1 , dispersion of 2.80 nmol mol −1 ). These differences are likely due to different calibration strategies. The TI was measuring three standard gases to calibrate and compensate for drift of the instrument every 30 h.
In contrast, the AMY analyser was calibrated manually every 14 d with one calibration standard (dried ambient air traceable to the WMO-X2014A scale); each calibration was applied as a stepwise change fortnightly, with no further corrections applied in the meantime. These manual calibrations coincide with the observed change in the bias. Consequently, more frequent calibrations or automated measurements of a working standard to compensate for drift would have further improved the agreement. The ambient-air measurements made at AMY were also in agreement with the TS comparison, which is illustrated in Fig. 14. The scatter in the bias is significantly larger for ambient-air measurements compared to the TS comparison. Firstly, part of this may be explained by the calibration strategy, as discussed above. Secondly, differences in the response time of the two instrument types as well as residence time in the inlet might further add to the observed scatter, especially in the case of rapid changes in the CO amount fraction, which frequently occurred at AMY.
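The effect of the calibration strategy on a drifting analyser can be illustrated with a simple sketch. It compares a stepwise correction, in which the most recent single-standard calibration factor is held until the next calibration, with a linear interpolation between bracketing calibrations. The interpolation is used here only as a stand-in for more frequent drift compensation and is not the processing applied at either station; the drift rate, the 14 d interval, and all names are assumptions for this example.

```python
from bisect import bisect_right


def stepwise_factor(t, cal_times, cal_factors):
    """Factor of the most recent calibration at or before time t."""
    i = bisect_right(cal_times, t) - 1
    if i < 0:
        raise ValueError("t precedes the first calibration")
    return cal_factors[i]


def interpolated_factor(t, cal_times, cal_factors):
    """Factor linearly interpolated between the bracketing calibrations."""
    i = bisect_right(cal_times, t) - 1
    if i < 0:
        raise ValueError("t precedes the first calibration")
    if i >= len(cal_times) - 1:
        return cal_factors[-1]  # no later calibration available yet
    t0, t1 = cal_times[i], cal_times[i + 1]
    f0, f1 = cal_factors[i], cal_factors[i + 1]
    return f0 + (f1 - f0) * (t - t0) / (t1 - t0)


def sensitivity(t):
    """Hypothetical analyser sensitivity drifting by 0.1 % per day."""
    return 1.0 + 0.001 * t


# Calibrations every 14 d; factor = assigned / measured = 1 / sensitivity.
cal_times = [0.0, 14.0, 28.0]
cal_factors = [1.0 / sensitivity(t) for t in cal_times]

true_co = 150.0  # nmol mol-1, hypothetical ambient value
t = 7.0          # days, halfway between two calibrations
raw = true_co * sensitivity(t)

step_error = raw * stepwise_factor(t, cal_times, cal_factors) - true_co
interp_error = raw * interpolated_factor(t, cal_times, cal_factors) - true_co
print(f"stepwise error: {step_error:.3f} nmol mol-1")
print(f"interpolated error: {interp_error:.3f} nmol mol-1")
```

With a stepwise correction, the residual error grows between calibrations and resets at each calibration, which would appear as a change in bias coinciding with the calibration dates.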
Both campaigns show that accurate measurements of CO are possible if the sample air is dried. So far, drying has not been implemented at all measurement stations. The above case study at PUY as well as the experiments involving the droplet tests only investigated the internally implemented water vapour correction of the Picarro G2401, which proved to be not sufficiently stable to achieve the network compatibility goals of the WMO/GAW programme. Alternatively, better determination of the remaining water vapour interference is needed. The droplet method might not be suitable due to the relatively fast drying process, which results in relatively high uncertainties due to the analyser's noise. Alternative methods, e.g. as described by Reum et al. (2019) or as implemented by the ICOS Metrology Laboratory, which uses a Bronkhorst vapour delivery module (VDM) to humidify a gas stream from a tank, might give better results. In addition to improvements of the droplet method, alternative ways to compensate for the water-vapour-dependent CO bias need to be explored. Chen et al. (2013) showed that the main uncertainty of the water vapour correction is due to the fact that the weak CO absorption line is bracketed by adjacent absorption lines of CO 2 and H 2 O. Our results indicate that the compensation of the water vapour interference based on the work of Chen et al. (2013), which has been implemented in Picarro analysers since 2012, does not appropriately correct all the bias and may change over time. Therefore, frequent determination of the water vapour interference will be needed to ensure long-term stability of the correction function or to characterise its change over time. However, this will most likely be insufficient to detect the sudden changes in the correction function that were observed in our experiments. Consequently, drying the sample air should be considered when measuring CO with a Picarro G2401 instrument.
Both cryogenic traps and Nafion dryers can be used. WCC-Empa now uses Nafion dryers for the parallel measurements during station audits. Both single-tube (MD-070-48S-4) and multi-tube (PD-50T-12MPS) Nafion dryers in reflux mode using the Picarro pump for the vacuum in the purge air were employed successfully. This reduced the amount of water to approximately 0.06 %-0.22 % (single tube) and 0.01 %-0.03 % (multi-tube), depending on ambient-air humidity. When Nafion dryers are used, the standard gases must be passed through the dryer to compensate for a potential loss in the dryer.
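Whether a residual water vapour dependence matters can be judged by converting the ratio CO (humid, corrected) / CO (dry) into an absolute bias at a given amount fraction and comparing it with the compatibility goal. The sketch below does this for the goal of 2 nmol mol −1 at 300 nmol mol −1 CO, the reference level used for the limits in Fig. 10. The function names and the droplet-test readings are hypothetical and shown for illustration only.

```python
def ratio_limits(goal, amount_fraction):
    """Lower and upper ratio limits equivalent to +/- goal at amount_fraction."""
    rel = goal / amount_fraction
    return 1.0 - rel, 1.0 + rel


def bias_from_ratio(ratio, amount_fraction):
    """Absolute bias (nmol mol-1) implied by a humid/dry ratio."""
    return (ratio - 1.0) * amount_fraction


def exceeds_goal(ratio, goal, amount_fraction):
    """True if the bias implied by the ratio is larger than the goal."""
    return abs(bias_from_ratio(ratio, amount_fraction)) > goal


GOAL = 2.0        # nmol mol-1, WMO/GAW network compatibility goal for CO
CO_LEVEL = 300.0  # nmol mol-1, reference level used for the limits in Fig. 10

low, high = ratio_limits(GOAL, CO_LEVEL)
print(f"ratio must stay within [{low:.5f}, {high:.5f}]")

# Hypothetical droplet-test readings: (H2O in %, humid/dry ratio).
readings = [(0.5, 1.002), (1.0, 1.005), (1.5, 1.009)]
flagged = [exceeds_goal(r, GOAL, CO_LEVEL) for _, r in readings]
for (h2o, r), bad in zip(readings, flagged):
    print(f"H2O {h2o:.1f} %: bias {bias_from_ratio(r, CO_LEVEL):+.2f} nmol mol-1, "
          f"exceeds goal: {bad}")
```

At 300 nmol mol −1, the goal corresponds to a ratio deviation of less than 0.7 %, which shows how small the tolerable error of the water vapour correction is.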

Conclusions
The different elements of the WMO/GAW quality management framework, including round-robin experiments, performance audits with travelling standards, and parallel measurements at stations, provide complementary information, which is essential for reducing the bias and uncertainty of time series measured by atmospheric-research stations.
The assessment of performance audit results of CO and N 2 O with respect to different measurement techniques showed clear advantages of newer spectroscopic techniques such as NIR-CRDS or QCL spectroscopy in the case of CO. However, parallel measurements made using a Picarro NIR-CRDS analyser identified issues with the implemented water vapour compensation, and further improvement is currently only possible by drying the sample air. Drying can be implemented with either cryogenic traps or Nafion dryers.
For N 2 O, one of the limitations is the uncertainty of calibration standards. This highlights the importance of maintaining traceability to an internationally accepted calibration scale as implemented by the GAW programme.
By introducing modern spectroscopic measurement techniques such as CRDS or QCL, the number of GAW stations complying with the WMO/GAW network compatibility goals for CO and N 2 O will increase. However, reaching the network compatibility goal of 2 nmol mol −1 for CO and 0.1 nmol mol −1 for N 2 O will remain challenging. Careful calibration strategies and appropriate water vapour corrections or drying the sample air are required for both CO and N 2 O.
Data availability. Data from the performance audits made by WCC-Empa are available from the corresponding audit reports (http://www.empa.ch/web/s503/wcc-empa, last access: 30 October 2019). Data of the WMO/IAEA Round Robin Comparison Experiment are publicly available on the NOAA Earth System Research Laboratory Global Monitoring Division web page (https://www.esrl.noaa.gov/gmd/ccgg/wmorr, last access: 30 October 2019). Other data used in the paper are available upon request to the corresponding author.
Author contributions. CZ led and designed this study. BB supervises the activities of WCC-Empa. RS made N 2 O comparisons during station audits of the WCC-N 2 O and provided N 2 O comparison data. CZ and MS performed CO and N 2 O comparisons during station audits by WCC-Empa. OL contributed to in situ measurements of CO at Puy de Dôme. HL and SK contributed to in situ measurements of CO at Anmyeon-do. CZ wrote the paper. All co-authors (RS, OL, HL, SK, LE, MS, BB) were involved in scientific discussions and commenting on the paper.