Comparison of continuous in situ CO 2 observations at Jungfraujoch using two different measurement techniques

Since 2004, atmospheric carbon dioxide (CO 2 ) has been measured at the High Altitude Research Station Jungfraujoch by the division of Climate and Environmental Physics at the University of Bern (KUP) using a nondispersive infrared gas analyzer (NDIR) in combination with a paramagnetic O 2 analyzer. In January 2010, CO 2 measurements based on cavity ring-down spectroscopy (CRDS) were added by the Swiss Federal Laboratories for Materials Science and Technology (Empa) as part of the Swiss National Air Pollution Monitoring Network. To ensure a smooth transition – a prerequisite when merging two data sets, e.g.,

Additionally, in situ CO 2 measurements can be used in modeling studies as input parameters or to verify model output (Uglietti et al., 2011; Chevallier et al., 2010; Peters et al., 2010). Therefore it is crucial that the data sets are well calibrated and traced back to a common reference scale and that the measurement systems demonstrate high precision and accuracy. This minimizes biases between different measurement sites and laboratories and improves the compatibility of the data sets. For CO 2 measurements, the World Meteorological Organization (WMO) recommends a compatibility of ±0.1 ppm for the northern hemisphere (WMO, 2011).
High-altitude observatories such as the High Alpine Research Station Jungfraujoch (JFJ) are well suited to monitoring the background mole fraction of trace gases like CO 2 because they are mostly in the free troposphere.
For this reason, the division of Climate and Environmental Physics of the Physics Institute, University of Bern (KUP) started measuring CO 2 with a non-dispersive infrared (NDIR) gas analyzer (Sick Maihak, Germany, model S710) embedded in a combined CO 2 / O 2 analyzing system at JFJ in late 2004. At the beginning of 2010, a cavity ring-down spectrometer (until July 2011 a Picarro Inc., USA, model G1301, afterwards a Picarro Inc., USA, model G2401) was installed by the Swiss Federal Laboratories for Materials Science and Technology (Empa) as part of the Swiss National Air Pollution Monitoring Network (NABEL), enabling long-term CO 2 measurements at JFJ by Empa. To ensure that both records could be merged smoothly, both systems ran in parallel for a significant period, not only to detect potential biases but also to provide information about the compatibility of the two different measurement systems, as had been done at other stations, e.g., at Mauna Loa (Komhyr et al., 1989; Peterson et al., 1977). Because of the year-round manned infrastructure and excellent accessibility by train, it was possible to test the compatibility of the two systems in situ under real conditions at a background site featuring only limited atmospheric variation.
In this study we report the data of the KUP and the Empa in situ CO 2 measurement system from January 2010 until December 2012.

Sampling site
The High Altitude Research Station Jungfraujoch (JFJ) is located at 7°59′20″ E, 46°32′53″ N on the northern ridge of the Swiss Alps. The station itself is on a mountain saddle between the mountains Mönch (4099 m a.s.l.) and Jungfrau (4158 m a.s.l.), at an altitude of 3580 m a.s.l., and is accessible by train the whole year round. Because of the high elevation, the station is above the planetary boundary layer most of the time and receives predominantly air from the free troposphere. It is therefore an ideal location to measure the atmospheric background air of continental Europe (Henne et al., 2010; Zellweger et al., 2003; Baltensperger et al., 1997). Nevertheless, the station is sometimes influenced by polluted boundary layer air, especially during Föhn events (Zellweger et al., 2003) or during hot summer days, when air from the surrounding valley is thermally uplifted to JFJ (Zellweger et al., 2000; Baltensperger et al., 1997). A comprehensive in situ measurement program with more than 70 trace gases and a large suite of aerosol properties is run continuously at JFJ by an international consortium of research institutions. JFJ is also one of the currently 29 Global Atmosphere Watch (GAW) stations.

NDIR measurements (KUP)
The KUP CO 2 measurements are based on a combined system to monitor CO 2 and O 2 changes in the atmosphere. The ambient air enters through a strongly ventilated (600 m 3 h −1 ) common inlet on the observatory's roof to a manifold, which serves many air detectors, where an aliquot is drawn to the KUP system. The air is cryogenically dried to a dew point of −90 °C (FC-100D21, FTS systems, USA). CO 2 is measured by an NDIR spectrometer (Maihak S710) with a working range of 350 to 450 ppm and a frequency of 1 Hz; O 2 is measured by a paramagnetic cell. To avoid influences caused by ambient air temperature and density fluctuations, the temperature as well as the pressure is stabilized. The pressure and flow control system is embedded in a temperature-controlled box (45 ± 0.05 °C); the precision of the pressure regulation is ±0.05 mbar and the measurement cell of the NDIR analyzer is heated to 55 ± 0.05 °C. Measurements are done in a cyclic sequence of 18 h: each gas is measured for 6 min, with only the last 115 s of a 6 min period used for mole fraction determination to allow for signal stabilization after changing the sample source. At the beginning of each sequence, the system is calibrated with two reference gases (high and low span). A working gas is measured between two ambient air measurements to correct for short-term variations. In the notation of the 18 h measurement sequence, B represents the working gas, G the high span CO 2 /low span O 2 , K the low span CO 2 /high span O 2 , H the target cylinder and A represents ambient air.
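The averaging step described above (6 min per gas at 1 Hz, with only the last 115 s retained) can be sketched as follows. This is a minimal illustration and not the KUP processing code; the function name `block_means` and the assumption that the record consists of back-to-back, gap-free 360-sample blocks are ours.

```python
from statistics import mean

BLOCK_SECONDS = 360  # each gas is measured for 6 min at 1 Hz
USED_SECONDS = 115   # only the last 115 s enter the mole fraction


def block_means(samples_1hz, block=BLOCK_SECONDS, used=USED_SECONDS):
    """Average the last `used` samples of every complete `block`-sample window.

    Assumes (hypothetically) a gap-free 1 Hz record made of consecutive
    6 min blocks; an incomplete trailing block is ignored.
    """
    means = []
    for start in range(0, len(samples_1hz) - block + 1, block):
        window = samples_1hz[start:start + block]
        means.append(mean(window[-used:]))  # discard the stabilization phase
    return means
```

Discarding the first 245 s of each block removes the transient that follows a change of the sample source, at the cost of using only about a third of the raw samples.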
All measurements ending at a particular hour are used for the calculation of hourly mean CO 2 observations, which in our case includes six ambient observation values per hour. Cylinder measurements (see Table 1) with a known mole fraction show a precision better than 0.04 ppm for 1 h analysis. The precision is calculated as the square root of the sum

Cavity ring-down spectroscopy measurements (Empa)
From late December 2009 until July 2011, Empa measured CO 2 with a commercially available wavelength-scanned cavity ring-down spectrometer (Picarro Inc., USA; G1301) coupled to a custom-built calibration/drying unit. Initially, the sample air from the manifold was dried prior to analysis by means of a Nafion dryer to a dew point of < −30 °C. Along with CO 2 , the instrument also measured CH 4 and H 2 O at the same sample frequency of approximately 0.5 Hz. The H 2 O measurements allowed for correction of the CO 2 mole fraction in case of interferences (i.e., dilution and pressure broadening) due to residual moisture in the system. From August 2010 on, the Nafion dryer was bypassed, no water vapor removal was used and CO 2 dry air mole fractions were determined after application of an empirical humidity correction. The empirically determined water vapor corrections for CRDS analyzers do not significantly change over time and are very similar for different analyzers. According to Rella et al. (2013), the error in the reported dry air CO 2 mole fractions when using different correction coefficients is < 0.1 ppm, up to a H 2 O volume mixing ratio of 2.5 Vol-%. At JFJ, 2.5 Vol-% of H 2 O corresponds to a dew point of more than 14 °C, a value that was never reached. The error can be further reduced when using the dedicated correction coefficients for the respective analyzer and doing repeated water vapor interference measurements, as is the case here.
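The text does not give the functional form of the empirical humidity correction. As an illustration only, the sketch below assumes a quadratic dependence of the wet/dry ratio on the reported H 2 O (in Vol-%), a form that is often used for CRDS analyzers. The function name and the default coefficients `a` and `b` are placeholders, not the analyzer-specific values determined by Empa.

```python
def dry_mole_fraction(co2_wet_ppm, h2o_percent, a=-0.012, b=-2.674e-4):
    """Convert a wet-air CO2 reading to a dry-air mole fraction.

    Assumed form: CO2_wet / CO2_dry = 1 + a*H + b*H**2, with H in Vol-%.
    The coefficients are illustrative placeholders; in practice they are
    determined per analyzer from water vapor interference measurements.
    """
    return co2_wet_ppm / (1.0 + a * h2o_percent + b * h2o_percent ** 2)
```

With negative coefficients the correction always raises the wet reading, which reflects the dilution of CO 2 by water vapor.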
After a system breakdown due to a faulty electronic board in July 2011, the G1301 analyzer was replaced in September 2011 by the newer G2401 model (Picarro Inc., USA) that is also capable of monitoring CO mole fractions. Since then, the sample gas has been dried again for the beneficial effects with respect to the precision of the CO analysis.
Calibrations are performed every 46 h with two calibration gases (high and low span). In addition, a target gas is analyzed every 15 h to detect potential shorter-term instrument sensitivity changes. The mole fractions of the calibration gases are determined by the World Calibration Center for CH 4 , CO 2 , CO and O 3 at Empa (Table 3). Measurements of used KUP standard cylinders at JFJ show a precision better than 0.04 ppm for the 15 min analysis (Table 1). Similar to the KUP, CO 2 mole fractions are also reported on the WMO X2007 scale.

Calibration gases
The gas used to calibrate the KUP measurements is compressed outside air filled in steel cylinders delivered by Carbagas (Switzerland). Normally, each gas cylinder is first delivered to our laboratory at the University of Bern and measured for its CO 2 mole fraction and δO 2 / N 2 and δAr / N 2 values by mass spectrometry and a combined Licor 7000 NDIR and Oxzilla (Sable Systems, USA) system (CO 2 and O 2 ). The measured values are calibrated with a subset of three standards from the WMO/GAW Central Calibration Laboratory (CCL) run by the National Oceanic and Atmospheric Administration (NOAA, Boulder, USA) and are used as assigned values of the KUP system at JFJ (see Table 2). The cylinders are shipped to JFJ and stored in the basement until their usage as calibration gases. A maintained stock of calibration gases at JFJ is necessary because the KUP system has to be calibrated frequently for O 2 measurements (Uglietti et al., 2008) and has a rather high gas consumption of up to 150 mL per minute.
The calibration gases used by Empa consist of compressed dry natural air in aluminium cylinders, filled in-house with a modified oil-free compressor (Rix Industries, USA). The mole fractions of the gases are determined by the World Calibration Center for CH 4 , CO 2 , CO and O 3 at Empa (Table 3). After measuring the calibration gases, the cylinders are shipped to JFJ.
Lists of the calibration gases used by both systems during the period of intercomparison are included in Tables 2 and 3.

Comparison of the two data sets
The CO 2 measurements of both systems for the period 2010-2012 are shown in Fig. 1. In general, the two data sets show a good agreement. There are two longer gaps in the CRDS record from 11 June to 16 September 2011 and 5 to 18 January 2012. The first gap was caused by technical problems of the Picarro, which was replaced by a new four-channel model. In January 2012 a hard-disk failure was responsible for the roughly 2-week break. The NDIR data set shows four longer gaps from 31 January to 16 February 2011, 21 to 28 December 2011, 14 to 25 April 2012 and 7 to 24 October 2012. The reason for these gaps in the NDIR data set was technical issues with the Maihak analyzer, eventually leading to a replacement of the analyzer with an identical one. Furthermore, a power failure was responsible for the failure of a gas flow controller and, in April and October 2012, the dysfunction of the gas drying system. Throughout the record, small gaps are also present due to system crashes, failures in the regulation of the gas flows, etc. The remaining hourly mean values, where we have overlap of both systems (20 460 common hours), are in good agreement. The data sets show a similar seasonality as well as peaks caused by pollution events (e.g., advection of air masses from the Po basin in Italy or from northeastern Europe).
The average seasonality of the monthly mean values, spanning 36 months, is 10.01 ± 0.33 ppm and 10.05 ± 0.37 ppm for the CRDS and the NDIR data set, respectively. Due to missing data, one monthly value in each data set has to be interpolated. For the complete data sets, the CRDS data show an annual increase of 1.89 ± 0.01 ppm year −1 , whereas the NDIR data show a slope of 1.69 ± 0.01 ppm year −1 ; the uncertainties correspond to the error of the linear increase fitted to the data. Selecting only data points where both systems have overlapping data points, the difference in the slopes decreases slightly, as documented by values of the annual increase of CO 2 of 1.91 ppm year −1 (CRDS) and 1.72 ppm year −1 (NDIR). A correlation plot of the CRDS data set against the NDIR data set reveals a very high correlation with a slope of 0.991 ppm ppm −1 , an intercept of 3.662 ppm and an R 2 of 0.9909 (Fig. 2, grey points). It can be seen in Fig. 3 (grey points) that the differences of the two data sets are very stable over the 3-year period of the comparison. The average of the difference between the CRDS data set and the NDIR data set is 0.04 ± 0.40 ppm (Fig. 3, grey points); negative and positive differences cancel out to almost zero. During periods of rapid CO 2 changes, the standard deviation of the differences is larger than during stable periods and significantly exceeds the precision of both instruments. This point can be emphasized by zooming in to higher-resolution data (115 s averages). For this exercise, the high-resolution CRDS data are aggregated to 115 s averages corresponding to the averaging intervals of the NDIR system. It can be clearly seen that the difference becomes noisier with fast changes (e.g., during the late afternoon in Figs. 4 and 5) and decreases again with more stable conditions (e.g., during the nighttime in Figs. 4 and 5).
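The comparison statistics quoted above (slope, intercept and R 2 of the correlation, and the mean and standard deviation of the differences) can be computed from paired hourly means along the following lines. This is a sketch under stated assumptions: an ordinary least-squares fit with NDIR as the independent variable is assumed, the function names are ours, and the inputs are lists of common hourly values.

```python
from statistics import mean, stdev


def linear_fit(x, y):
    """Ordinary least squares y = slope*x + intercept; returns (slope, intercept, R^2)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r2


def difference_stats(crds, ndir):
    """Mean and sample standard deviation of the CRDS - NDIR differences."""
    diffs = [c - n for c, n in zip(crds, ndir)]
    return mean(diffs), stdev(diffs)
```

A call such as `linear_fit(ndir_hourly, crds_hourly)` would correspond to the correlation plot, and `difference_stats(crds_hourly, ndir_hourly)` to the difference time series. Both list names are hypothetical.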
The behavior of short-term variations is different for both systems, as shown by a change of statistical characteristics when discarding values for which the change rate is above a certain threshold (Table 4). These changes are most probably due to different volumes and gas flows, resulting in different residence times and leading to dispersion effects. For example, the NDIR system uses a water trap with a large volume, potentially dampening the CO 2 signal despite the higher flow rate, compared to the rather small Nafion dryer of the CRDS system. The cutoff criterion of the change rate is set to 0.75 ppm h −1 and values with a higher change rate are discarded. Below 0.75 ppm h −1 there is still an improvement of the agreement, but the loss in data points is too high. Additionally, the standard error starts to increase again below a change rate of 0.75 ppm h −1 because many data points are omitted by this criterion. By setting the threshold to 0.75 ppm h −1 , roughly 20 % of the common data points are omitted. The correlation between the remaining data for both systems gets slightly closer to the ideal 1 : 1 function with a slope of 0.996 ppm ppm −1 , an intercept of 1.5696 ppm and an R 2 of 0.9924 (Fig. 2, black points). Also, the differences between the CRDS and the NDIR data set are smaller; considering only the data points of the more stable conditions (Fig. 3, black points), the average of the difference remains almost 0, namely 0.05 ppm ± 0.32 ppm.
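The change-rate cutoff can be sketched as a simple filter on the paired hourly values. The text does not state how the change rate is computed; the sketch assumes it is the absolute difference between consecutive hourly CRDS means and that hours following a data gap are skipped. The function name and the integer hour index are illustrative.

```python
def stable_pairs(hours, crds, ndir, max_rate=0.75):
    """Keep (hour, crds, ndir) triples whose hour-to-hour CO2 change is <= max_rate.

    `hours` are integer hour indices of the common data points. The change
    rate (ppm per hour) is taken from the CRDS record, which is an
    assumption made for this sketch.
    """
    kept = []
    for i in range(1, len(hours)):
        dt = hours[i] - hours[i - 1]
        if dt != 1:
            continue  # no change rate available across a data gap
        rate = abs(crds[i] - crds[i - 1]) / dt
        if rate <= max_rate:
            kept.append((hours[i], crds[i], ndir[i]))
    return kept
```

Lowering `max_rate` tightens the agreement but removes more points, which is the trade-off summarized in Table 4.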
However, there are problematic features worth mentioning. By replacing the working gas B26 with B27 and B28 with B29, small shifts occur that are most likely caused by an inaccurate assignment of the CO 2 values of the working gases (Fig. 3). By excluding the periods of the cylinders B27 and B29, 11 038 common data points remain; the average of the differences, however, stays almost the same, namely −0.03 ppm but with a reduced standard deviation of 0.25 ppm.
In late 2010, the difference between the two data sets increased significantly because of technical issues (see Sect. 3.1). After replacing the analyzer, the difference was again very small and stable (Fig. 3). Furthermore, with the exchange of the Picarro in summer 2011, a small jump in the difference between the two systems occurred (Fig. 3). Since the replacement happened during a single working gas period of the NDIR system (B27), the offset was probably caused by the change of the Picarro analyzer.

Drift of calibration gases and their corrections
Figure 2. The hourly averages of the CRDS CO 2 measurements versus the hourly averages of the NDIR CO 2 measurements (grey points) and the CRDS CO 2 measurements versus the hourly averages of the NDIR CO 2 measurements without periods of rapid mole fraction changes (black points) over the whole period of comparison. The dashed diagonal represents the ideal 1 : 1 agreement. Considering all data points, the R 2 is 0.9909 with a slope of 0.9908 ppm ppm −1 and an intercept of 3.6623 ppm. Excluding periods with rapid mole fraction changes, the agreement of the two data sets is slightly better with an R 2 of 0.9924, a slope of 0.9961 ppm ppm −1 and an intercept of 1.5696 ppm.
Figure 3. The difference of the CRDS - NDIR CO 2 measurements of all common hourly data points (grey points) and only during stable periods with a CO 2 change of less than 0.75 ppm h −1 (black points) against time. The green bars represent changes of the working gases of the NDIR system, with the green number indicating the corresponding cylinder, and the red bars indicate the changes of the low span with the corresponding number, also in red. The high span was not replaced throughout the entire period.
In the case of regular periodic calibrations, the pressure of a calibration gas cylinder decreases approximately linearly over its lifetime. Because steel cylinders show pressure-dependent adsorption and desorption effects of gases such as CO 2 or H 2 O (Langmuir, 1918), CO 2 continuously desorbs from the cylinder walls and increases the CO 2 mole fraction of the calibration gas during its usage. To avoid a large drift of the CO 2 mole fraction in calibration gases, Keeling et al. (1998) recommend using calibration gas cylinders only to a remaining pressure of 25 to 30 bar. The adsorption of CO 2 to the cylinder wall can be calculated according to Langmuir (Langmuir, 1918) and corrected for with Eq. (1), where WG meas corresponds to the measured CO 2 value of the working gas [ppm], a and b are constants (related to the Avogadro constant, the number of elementary spaces of the surface, the molecular mass and the temperature), p 0 is the start pressure and p(t) is the pressure at the time of the measurement. This enrichment of CO 2 is more pronounced at lower pressures compared to high pressures, as seen in Eq. (1). For example, at 100 bar cylinder pressure the correction is only about 0.05 ppm, whereas at 30 bar it is already up to 0.3 ppm (with a = 5 715 797 and b = 568 897; the a and b values were derived from measurements of cylinder B23). In steel cylinders, any water vapor present would cause an even more pronounced desorption effect. However, all cylinders used here contain dry gas. In Leuenberger et al. (2014), an in-depth discussion of this adsorption/desorption influence of gases on steel and aluminium cylinders is presented, based on dedicated experimental results in our laboratory as well as in a climate chamber. In April 2010 the working gas tank B23 ran completely empty due to a faulty pressure reader and therefore showed such an enrichment effect. Hence the CO 2 values of the NDIR calibrated with the uncorrected working gas are underestimated compared to the CRDS measurements. By recalculating measurements of the working gas using only the low and the high span (Fig. 6, black points), it is possible to approximate the evolution of the working gas CO 2 enrichment (Fig. 6, grey points) and correct for it (Fig. 7). The NDIR data are in much better agreement with the CRDS measurements after applying this correction. Comparing the uncorrected values of the B23 period to the corrected period, the R 2 of the NDIR values against the CRDS values increases from 0.89 to 0.99 (Fig. 8).
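The recalculation of the working gas from the low and high span can be illustrated with a plain two-point (linear) calibration. This is a simplified sketch: the raw signal units, the function names and the assumption of a linear analyzer response between the two spans are ours, and the actual KUP evaluation may differ.

```python
def two_point_calibration(raw_low, raw_high, assigned_low, assigned_high):
    """Return a function mapping raw analyzer signals to mole fractions (ppm).

    Assumes a linear response between the low and high span cylinders.
    """
    gain = (assigned_high - assigned_low) / (raw_high - raw_low)

    def to_ppm(raw):
        return assigned_low + gain * (raw - raw_low)

    return to_ppm


def working_gas_enrichment(raw_wg, to_ppm, assigned_wg):
    """Apparent CO2 enrichment of the working gas relative to its assigned value."""
    return to_ppm(raw_wg) - assigned_wg
```

Tracking `working_gas_enrichment` over the lifetime of a cylinder gives a drift curve of the kind shown for B23, to which a smooth function can then be fitted and used as a correction.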
The remaining working gas cylinders are recalculated in the same manner. Most of them show similar behavior to B23, even when they are replaced at a remaining pressure of around 25 bar. The only difference is the intensity of the enrichment, which is weaker than with B23 because they did not run completely empty. Note that for B23 we use time instead of pressure as the independent variable because the pressure reader did not work properly during this period. Although not preferred, this is possible because the pressure decreased linearly in the long term.

Polynomial correction of the NDIR Maihak measurements
NDIR spectrometers are nonlinear in response and hence require a correction of the data, generally of polynomial form. Ideally, the analyzer response function is implemented by the producer, mostly for a restricted mole fraction range. This is also the case for the Maihak system. The Maihak measurements at JFJ are done in differential mode; thus the reference cell is always flushed with gas of known CO 2 mole fraction and compared to the cell with the sample. To set the polynomial function to overcome the nonlinearity of the device, the mole fraction of the reference gas has to be entered into the analyzer's system.
Table 4. The statistics of different cutoff criteria of rapid CO 2 mole fraction changes and their influence on the number of points remaining, the average of the hourly differences of the NDIR and the CRDS system, the standard deviation of the differences and the corresponding standard errors. The differences between the cutoff criteria are most probably caused by different volumes and flow rates of the systems and small time lags in the time stamps. The measurement precision of each system is about 10 times better than the standard deviation of the differences.
Figure 6. The CO 2 values of the calibration gas cylinder B23 calculated by using only the low and the high span cylinders against time (black points) and the calculated best fit function according to Langmuir (grey points) with a = 5 715 797 and b = 568 897. The CO 2 values increase with time (corresponding to decreasing pressure). The best fit function was used to correct the measured working gas CO 2 values of the NDIR system.
In fall 2011, 11 used calibration gas cylinders of the KUP system were measured with the CRDS system and remeasured with the NDIR system. The remeasuring of these cylinders reveals that the polynomial function of the Maihak is insufficient. The second order polynomial of the difference CRDS (Empa) - NDIR (KUP) as a function of CRDS (Empa) yields an R 2 of 0.86. A very similar polynomial dependence is found between the values measured by NDIR (KUP) and the assigned values according to Table 2 when discarding two cylinders with low remaining pressure (Table 1). We prefer the first polynomial because the cylinders might have already faced desorption effects due to changing pressure between the assignment measurement in Bern (> 2 years before) and the remeasurement at JFJ, whereas the comparison with the CRDS took place within a few days. Since the cavity ring-down technique is linear in the considered mole fraction range, the factors of the polynomial are used to correct the measurement values of the NDIR system, removing this dependence completely (Fig. 9 and Table 1). The correction is only applied to the data measured by the NDIR analyzer currently in use at JFJ. The data of the first NDIR analyzer, which had to be replaced in the beginning of 2011 due to technical problems, remain unaltered since the working points of the two NDIR instruments are different. There is also no possibility to correct the data of the older NDIR analyzer properly, because at the time there were no comparable measurements with a nearly linear measurement system. However, values in the center of this span, where most of the ambient CO 2 mole fractions were measured, should not have been affected strongly; roughly 75 % of all NDIR values are affected by less than 0.1 ppm by the polynomial correction.
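A second-order polynomial correction of this kind can be sketched as a least-squares fit of the cylinder differences, followed by adding the fitted difference to each NDIR value. The sketch below is not the authors' implementation: the function names are ours, the fit is centered on the mean of the reference values for numerical stability, and evaluating the fitted difference at the NDIR reading (instead of the unknown CRDS value) is a simplifying assumption that is reasonable only while the differences are small.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = c0 + c1*(x - x0) + c2*(x - x0)**2; returns a callable."""
    x0 = sum(x) / len(x)
    u = [xi - x0 for xi in x]
    # Power sums for the 3x3 normal equations.
    s = [sum(ui ** k for ui in u) for k in range(5)]
    t = [sum(yi * ui ** k for ui, yi in zip(u, y)) for k in range(3)]
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [a - factor * b for a, b in zip(m[row], m[col])]
    c0, c1, c2 = (m[i][3] / m[i][i] for i in range(3))
    return lambda value: c0 + c1 * (value - x0) + c2 * (value - x0) ** 2


def correct_ndir(ndir_ppm, diff_model):
    """Add the modeled CRDS - NDIR difference to an NDIR value (assumed usage)."""
    return ndir_ppm + diff_model(ndir_ppm)
```

Here `diff_model` would be obtained as `fit_quadratic(crds_cylinder_values, differences)`, with one entry per remeasured cylinder; both list names are hypothetical.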
The accuracy of the standard measurements is calculated as the square root of the sum of squared trueness and squared precision (Menditto et al., 2007) and estimated to be 0.08 ppm over the working range from 350 to 450 ppm based on the CRDS - NDIR comparison including both instrumental uncertainties.
Figure 7. Time series of the CO 2 mole fractions determined with CRDS (red points) and NDIR (uncorrected in grey, after correction in black). The desorption-corrected NDIR CO 2 values show a much better agreement with the CRDS CO 2 values than the uncorrected, which are severely underestimated towards the end of the B23 period. After replacement of the working gas cylinder B23 with B24, the uncorrected NDIR CO 2 values show a much better agreement with the CRDS CO 2 values.
Figure 8. Correlation of ambient CO 2 mole fraction measured by CRDS versus the uncorrected CO 2 mole fraction of the NDIR instrument (grey points) and the desorption-corrected CO 2 NDIR mole fractions (black points), respectively, during the period of the working gas cylinder B23. The diagonal represents the ideal 1 : 1 agreement. Due to the CO 2 enrichment in the working gas cylinder B23 the uncorrected KUP values seem to be much lower than the Empa values. By applying the correction, the values of the two systems are in a much better agreement.
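The accuracy definition quoted from Menditto et al. (2007), the square root of the sum of squared trueness and squared precision, amounts to a one-line calculation. The function name is ours, and the example values in the usage note are arbitrary rather than the values underlying the 0.08 ppm estimate.

```python
import math


def accuracy(trueness_ppm, precision_ppm):
    """Combine trueness (bias) and precision in quadrature."""
    return math.hypot(trueness_ppm, precision_ppm)
```

For instance, a hypothetical trueness of 0.03 ppm combined with a precision of 0.04 ppm gives an accuracy of 0.05 ppm.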

Discussion
Long periods of overlapping measurements are helpful for assessing the compatibility of different analyzers. In this study we analyze two different data sets which were measured at the same station over 3 years (the simultaneous measurements are still ongoing) to determine the compatibility and quantify differences and biases. Achieving compatibility is challenging since Empa uses the cavity ring-down technique (Picarro Inc.) and the KUP a non-dispersive infrared spectrometer (Maihak) to measure the CO 2 mole fraction of the air. Also, the calibration procedure, given the two techniques used, is different and therefore has to be investigated.
The seasonalities based on monthly measured averages (10.01 ± 0.33 ppm and 10.05 ± 0.37 ppm for the CRDS and the NDIR data set) agree well between the two data sets and are also in good agreement with the results of flask measurements done at JFJ (10.54 ± 0.18 ppm) (van der Laan-Luijkx et al., 2013). The trends of the two records (1.89 and 1.69 ppm year −1 for the complete data sets) deviate by about 0.2 ppm year −1 , for the complete as well as for the common data periods. In the first half of this comparison, the difference of CRDS - NDIR is negative, indicating that the NDIR values are higher than the CRDS values. In the second half the differences are mainly positive, especially during the periods with the working gases B27 and B29. Additionally, changing the CRDS instrument caused a small offset. This offset, together with the imperfectly assigned working gases, results in a slope of 0.2 ppm year −1 for the differences and a slope difference of the same amount between the two CO 2 records. The annual CO 2 increase of the CRDS data set is in good agreement with the average global CO 2 trend of 1.97 ppm year −1 for 2003-2012 (Tans, 2014), whereas the NDIR trend is a bit too low. Nevertheless, it is important to mention that for proper trend analysis the period of 3 years is too short. The CO 2 trend of the whole NDIR data set from the end of 2004 until the end of 2013 is 1.99 ppm year −1 , which is in much better agreement with the global trend of the same period of 2.03 ppm year −1 (Tans, 2014).
Despite the fact that it is common knowledge to use well-assigned standard cylinder values for accurate measurements, we find larger offsets for two calibration cylinders, namely B27 and B29 used by the NDIR system. Therefore the calculated CO 2 values of the NDIR data set are a bit too low for these two periods. This effect would have been undetected without the comparison measurements, at least for the B27 cylinder: the offset is only around 0.2 ppm, which is quite small compared to a seasonality of roughly 10 ppm and even more so compared to the daily variations in extreme cases of up to 25 ppm. Experiences of ongoing international comparisons of traveling cylinders highlight similar issues (Zellweger et al., 2011). Therefore, a thorough check of assigned values of calibration and working cylinders is crucial.
Another problem concerning the calibration cylinders is their CO 2 mole fraction. Ideally the low span is lowest in CO 2 mole fraction, the working gas intermediate, preferably close to the expected value of the specimen, and the high span accordingly highest. However, in some periods (e.g., B26, B27, B29) the CO 2 mole fraction of the working gas is higher than, or very close to, the CO 2 mole fraction of the high span (e.g., B28) and therefore not ideal for calibrating the measurements with a very high accuracy. However, because of logistic problems these cylinders have to be used as working gases.
Also, the stability of the CO 2 mole fraction in the calibration cylinders is very important. Adsorption/desorption effects occur when the pressure of steel cylinders drops under 20 to 30 bar (2 to 3 MPa) (Keeling et al., 1998), leading to an enrichment of CO 2 in the cylinder and therefore a CO 2 value different from the assigned one. Hence, cylinders should not be used until they are empty. A recalculation of the working gas cylinders with a two-point calibration using the low and the high span shows that the effect already starts at roughly 80 to 90 bar (8 to 9 MPa) and has to be corrected for. This finding is supported by Leuenberger et al. (2014). At higher pressures the effects are still very small, but with decreasing pressure the difference between the measured CO 2 mole fraction of the calibration gas and the assigned value increases, thus leading to a decrease in the accuracy of the calculated CO 2 values.
By applying the corrections mentioned above (see Sects. 3.2 and 3.3), discarding two periods of badly assigned working gases (B27 and B29) and excluding periods of fast CO 2 changes, the average difference of the hourly values decreases to −0.03 ppm ± 0.25 ppm. Whereas the average value of almost zero is very good, the standard deviation is still high in comparison to the WMO goal of 0.1 ppm and the precision of each individual system, which is roughly five times better. The rather high standard deviation is probably caused by differing averaging intervals of the hourly values and dispersion of the air parcels in the two systems due to different volumes and flow rates of the air.
Both systems obtain the outside air from the same air inlet at the Sphinx observatory at JFJ, but the air is led to the two systems by separate tubing. Due to different flow rates and volumes of the two systems, the origin of an air parcel measured at a certain time is not necessarily exactly the same. For example, the volume of the NDIR drying unit leads to a residence time of 3 minutes, significantly longer than for the CRDS system. Therefore, the NDIR system records slightly dampened ambient signals. Furthermore, it is possible that the corresponding air parcels are slightly shifted in time, thereby leading to small differences of the CO 2 values of the two systems. This could be the reason for the larger observed differences between the two data sets during fast CO 2 mole fraction changes of the outside air.
Overall, the CO 2 mole fraction measured by the two systems shows the same pattern not only during stable periods but also during extreme events and even in the high-resolution data. Most of the noisier periods can be traced to specific technical problems, whereas systematic shifts are most probably caused by badly assigned calibration gases or drifts of the CO 2 mole fraction in the cylinders due to adsorption/desorption effects. While it is possible to correct for the pressure-dependent drifts of the calibration gases mathematically, it is harder to correct for the shifts caused by badly assigned calibration gases since there are few independent comparison measurements that can be used for correction.
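The link between volume, flow rate, residence time and signal dampening can be illustrated with a simple model. The sketch treats the drying unit as a single well-mixed volume with a first-order response, which is our simplifying assumption and not a description of the actual gas handling. The 450 mL volume used in the usage note is merely the value implied by a 3 min residence time at 150 mL per minute.

```python
def residence_time_min(volume_ml, flow_ml_per_min):
    """Mean residence time of gas in a flushed volume, in minutes."""
    return volume_ml / flow_ml_per_min


def well_mixed_response(inlet_ppm, tau_s, dt_s=1.0):
    """Outlet signal of a well-mixed volume with time constant tau_s (seconds).

    Implicit-Euler discretization of d(out)/dt = (in - out) / tau.
    """
    out = [inlet_ppm[0]]
    alpha = dt_s / (tau_s + dt_s)
    for value in inlet_ppm[1:]:
        out.append(out[-1] + alpha * (value - out[-1]))
    return out
```

For example, `residence_time_min(450.0, 150.0)` returns 3.0, and feeding a step change through `well_mixed_response(..., tau_s=180.0)` shows the outlet lagging the inlet for several minutes, which is qualitatively the kind of dampening that fast ambient changes would experience.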

Conclusions
The CO 2 data sets of two different CO 2 measurement systems, a nondispersive infrared analyzer and a cavity ring-down spectrometer, running in parallel at the High Altitude Research Station JFJ are compared. The comparison of the CO 2 record from the NDIR system of the KUP with that from the CRDS system of Empa shows that the two systems are generally in good agreement, not only long-term but also down to the 1 Hz scale, despite the fundamentally different measurement principles and gas handling systems. Therefore the two data sets can be used to complement each other. However, the comparison also reveals (i) adsorption/desorption effects in the calibration gas steel cylinders used by the NDIR system, (ii) insufficient nonlinearity correction of the NDIR analyzer and (iii) periods of small biases because of badly assigned calibration gas cylinders. Adsorption effects can be corrected for with the monolayer adsorption equation (Langmuir, 1918). Nonlinearity of the NDIR analyzer is constrained with a second order polynomial, resulting in better agreement between the two data sets, in particular for values strongly deviating from the average. More research on adsorption/desorption effects of calibration gas cylinders has been done at the KUP and is reported in Leuenberger et al. (2014).
Finally, we would like to emphasize that using steel cylinders is not adequate for high-precision trace gas determinations due to gas adsorption/desorption. Therefore, we conclude that all laboratories using steel cylinders for calibration purposes should switch to calibration gases stored in aluminium cylinders to minimize gas composition changes and allow an improved assignment of the cylinder.
It is helpful to continue this comparison exercise for another couple of years to show that once aluminium cylinders are in use for both systems, the compatibility will improve further.