Atmospheric CO2, δ(O2/N2) and δ13CO2 measurements at Jungfraujoch, Switzerland: results from a flask sampling intercomparison program

We present results from an intercomparison program of CO2, δ(O2/N2) and δ13CO2 measurements from atmospheric flask samples. Flask samples are collected on a biweekly basis at the High Altitude Research Station Jungfraujoch in Switzerland for three European laboratories: the University of Bern, Switzerland, the University of Groningen, the Netherlands, and the Max Planck Institute for Biogeochemistry in Jena, Germany. Almost 4 years of CO2, δ(O2/N2) and δ13CO2 measurements are compared in this paper to assess the measurement compatibility of the three laboratories. While the average difference in CO2 between the laboratories in Bern and Jena meets the compatibility goal defined by the World Meteorological Organization, the standard deviations of the average differences between all laboratories are not within the required goal. However, the annual trends and seasonalities obtained by the three laboratories are the same within their estimated uncertainties. For δ(O2/N2), significant differences are observed between the three laboratories. The comparison for δ13CO2 yields the least compatible results, and the required goals are not met between the three laboratories. Our study shows the importance of regular intercomparison exercises to identify potential biases between laboratories and the need to improve the quality of atmospheric measurements.


Introduction
Atmospheric measurements of greenhouse gases and related tracers are important for studies of the global carbon cycle and for climate change research. The carbon cycle includes all processes involving the exchange of CO2 between the atmosphere, oceans and terrestrial biosphere. δ(O2/N2) and δ13CO2 measurements offer additional information on the exchange of CO2 between these reservoirs (Battle et al., 2000; Ciais et al., 1995; Keeling et al., 1993, 2011). Modelling studies use atmospheric measurements from many locations around the globe to estimate carbon fluxes, which are subsequently used in climate models to understand and predict climate change. One of the major challenges in this field is to minimize the measurement uncertainties, and especially the biases between laboratories and measurement locations. A bias between measurement stations can cause a large difference in the estimated carbon fluxes. For example, the data assimilation system CarbonTracker (Peters et al., 2007) yields considerably different estimated surface fluxes if a constant bias is (artificially) introduced into the measurements of a single observation site. A linear relationship was found between the measurement bias introduced at one station and the resulting surface flux estimates. For the North American terrestrial carbon flux, this relationship is 68 Tg C yr⁻¹ (about 10 %) for each 1 ppm of bias introduced in the CO2 measurement record (Masarie et al., 2011).
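As a rough illustration of this sensitivity, the linear relationship can be applied to a given bias. The following minimal Python sketch assumes that the reported proportionality of 68 Tg C yr⁻¹ per ppm also holds for biases well below 1 ppm; the function name is hypothetical and the sketch is not part of CarbonTracker.

```python
# Sensitivity of the CarbonTracker North American terrestrial flux to a constant
# CO2 bias at a single site, as reported by Masarie et al. (2011).
# Assumption: the relationship stays linear for biases well below 1 ppm.
TG_C_PER_YR_PER_PPM = 68.0


def flux_shift_tg_c_per_yr(bias_ppm: float) -> float:
    """Approximate shift of the estimated flux (Tg C/yr) for a given bias (ppm)."""
    return TG_C_PER_YR_PER_PPM * bias_ppm


for bias in (1.0, 0.1):
    print(f"{bias:.1f} ppm bias -> {flux_shift_tg_c_per_yr(bias):.1f} Tg C/yr")
```

Under this assumption, even a bias of 0.1 ppm at a single site would shift the estimated flux by roughly 7 Tg C yr⁻¹.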
To emphasize the importance of the quality of atmospheric measurements, the World Meteorological Organization (WMO) has defined goals for the measurement compatibility of different atmospheric species. The goals are based on the data quality required for use in, e.g., inversion studies or the interpretation of large-scale atmospheric data measured by different laboratories. The defined goals for CO2, δ(O2/N2) and δ13CO2 are ±0.1 ppm (±0.05 ppm in the Southern Hemisphere), ±2 per meg and ±0.01 ‰, respectively (WMO, 2011). The first step towards these compatibility goals between laboratories is that the internal reproducibility within each individual laboratory is below these goals. For CO2, this is achieved by most laboratories with present-day instrumentation. For δ13CO2, it is not achieved by all laboratories, as it is difficult to reach with currently available techniques. δ(O2/N2) measurements are in general very challenging. The absolute atmospheric variations of O2 are of the same order as those of CO2, because the two are stoichiometrically related. However, they have to be detected against a very high O2 background of 21 % (e.g. Keeling, 1988), compared to the CO2 background of about 0.04 %. The goal of 2 per meg for δ(O2/N2) corresponds to a relative precision of about 0.0002 % and is currently not yet reached by the laboratories able to perform high-precision δ(O2/N2) measurements. The compatibility of δ(O2/N2) measurements between any two laboratories is at present not better than ±5 per meg (WMO, 2011). While an international scale for δ(O2/N2) measurements is not yet available, most laboratories use the scale provided by the Scripps Institution of Oceanography, United States (SIO) (Keeling et al., 2007). This scale is also used in this paper. All CO2 and δ13CO2 measurements are reported on the WMO X2007 and VPDB scales, respectively.
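The per meg unit and the relative size of the δ(O2/N2) goal can be made concrete with a short calculation. The sketch below uses the usual definition δ(O2/N2) = [(O2/N2)sample / (O2/N2)reference − 1] × 10⁶ per meg and, as an assumption for the comparison, a CO2 background of 400 ppm (about 0.04 %); the function names are illustrative only.

```python
def delta_per_meg(ratio_sample: float, ratio_reference: float) -> float:
    """delta(O2/N2) in per meg: relative deviation of the O2/N2 ratio times 1e6."""
    return (ratio_sample / ratio_reference - 1.0) * 1e6


def per_meg_to_percent(value_per_meg: float) -> float:
    """Express a relative quantity given in per meg (1e-6) as a percentage."""
    return value_per_meg * 1e-6 * 100.0


# WMO goal for delta(O2/N2): 2 per meg, i.e. about 0.0002 % of the O2/N2 ratio.
o2_goal_percent = per_meg_to_percent(2.0)

# For comparison: the CO2 goal of 0.1 ppm relative to an assumed 400 ppm background.
co2_goal_per_meg = 0.1 / 400.0 * 1e6

print(f"delta(O2/N2) goal: {o2_goal_percent:.4f} %")
print(f"CO2 goal relative to background: {co2_goal_per_meg:.0f} per meg")
```

In relative terms, the δ(O2/N2) goal is thus more than 100 times stricter than the CO2 goal.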
To improve the quality of atmospheric measurements and to verify that measurements at different locations, by different laboratories, are not biased by the sampling methods, materials, analytical techniques, calibration strategies and scales used, intercomparison programs between laboratories have been started (e.g. Manning et al., 2009; Masarie et al., 2001; WMO, 2011). These programs are used to assess the compatibility between laboratories and measurement locations. They use either real air samples or sets of cylinders containing different concentrations. The "super-site" approach requires that flasks are filled with air at the same time and location using the individual sampling protocols of the different laboratories, and that the flasks are then measured in the respective laboratories. Especially for δ(O2/N2) measurements, few studies of this kind of compatibility exist. The first "super-site" intercomparison program for δ(O2/N2) measurements was started in 1991 at Cape Grim, Tasmania, Australia by three laboratories: the Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia, the University of Rhode Island, United States, and SIO (Battle et al., 2006; Langenfelds et al., 1999). The main global intercomparison program for δ(O2/N2) measurements is the Global Oxygen Laboratories Link Ultra-precise Measurements (Gollum) program, in which sets of 3 cylinders are shipped around the world and measured in the 11 laboratories currently able to perform high-precision δ(O2/N2) measurements (http://gollum.uea.ac.uk). Furthermore, another "super-site" intercomparison program is ongoing at Alert, Canada, including δ(O2/N2) analyses by SIO and the Max Planck Institute for Biogeochemistry in Jena, Germany (MPI).
In 2007, three European laboratories started a new intercomparison project at the High Altitude Research Station Jungfraujoch in Switzerland. Flasks are filled on a biweekly basis for the laboratories of the University of Bern, Switzerland (UBE), the University of Groningen, the Netherlands (RUG) and MPI. For each laboratory, the flasks filled at Jungfraujoch are identical to those the laboratory uses at its own field stations. This has yielded unique datasets for the comparison of three different atmospheric species by three laboratories.
This paper first describes the sampling location, sampling procedures and measurement techniques. Subsequently, the results of the CO2, δ(O2/N2) and δ13CO2 measurements are presented and discussed. Additionally, the results from another flask intercomparison program ("sausage") are included.

Sampling location
The High Altitude Research Station Jungfraujoch is located at 7°59′20″ E, 46°32′53″ N in the Swiss Alps. It is situated at an altitude of 3580 m a.s.l. on a mountain saddle between the Jungfrau and Mönch mountains (http://www.ifjungo.ch). Owing to its high elevation, the station is above the planetary boundary layer most of the time, and the air is mainly influenced by the free troposphere, representing atmospheric background conditions of continental Europe. A flask sampling program was started on site in 2000 by the University of Bern, initially on a biweekly basis and later on a weekly basis. In December 2007, the program was extended with additional biweekly sampling for the other two laboratories in this intercomparison program. Flasks are usually filled in the morning (typically on Fridays) around 07:00 local time (LT) to ensure that the samples represent clean background air and to minimize the influence of air masses uplifted from the boundary layer (Uglietti et al., 2008).

Flask types
For this intercomparison program, glass flasks are filled every 2 weeks with ambient air at Jungfraujoch for the three participating laboratories. Each laboratory uses its own flasks, with slightly different designs. The UBE flasks are 1 L glass flasks with two valves, one at each end of the flask. The flasks are fitted with glass valves from Louwers (Hapert, the Netherlands) with Viton O-rings. The RUG glass flasks have identical valves, but differ in design in that both valves are on the same side of the flask. One of the valves is designated as the inlet; on this side, a dip tube connected to the inlet extends into the flask, so that the air always flushes the entire flask. The volume of the RUG flasks is 2.5 L. The MPI flasks are 1 L glass flasks with two valves, one at each end of the flask. Their valves have seals made of Kel-F (PCTFE). More details about the flasks, valves and seals are given by Sturm et al. (2004) and Rothe et al. (2005).

Flask sampling
Since the end of 2007, flasks have been filled for all three laboratories every 2 weeks; in the alternate weeks, flasks are filled for UBE only. This paper includes flasks filled between December 2007 and August 2011, which amounts to 96 different sampling dates. Flasks are filled in pairs for UBE and RUG, and in triplicate for MPI. The flask sampling system is shown in Fig. 1. Its design was changed during the course of the intercomparison project. Before March 2009, all flasks were connected in series in the order MPI, UBE, RUG, using a single pump (KNF Neuberger N022AN.18). From March 2009 onwards, two parallel filling setups have been used: the MPI flasks are filled using a dedicated pump (KNF Neuberger UN05 ATI), and the UBE and RUG flasks are filled in series using the original pump. This change was made so that all flasks are flushed at their respective required filling pressures, to avoid artefacts during pressurisation at the end of the filling, which were mainly expected for δ(O2/N2). The MPI flasks require a higher filling pressure because of their analytical procedures.
Prior to sampling, the air is dried using U-shaped glass tubes filled with anhydrous magnesium perchlorate (Mg(ClO4)2) and sealed with glass wool plugs. Dedicated intake lines are used for the flask filling, consisting of 15 m of PVC tubing connected to the sampling units with Synflex tubing (type 1300, 6 mm outer diameter). Before March 2009, a single intake line and drying tube were used; since then, one intake line and drying tube have been used for the MPI flasks and another for the UBE and RUG flasks combined.
To ensure that the entire flask volumes are flushed, the flasks are flushed for about 30 min with a flow of about 2-3 L min⁻¹. The flasks are flushed and filled to a pressure of 1600 hPa for MPI and 950 hPa for UBE and RUG using pressure relief valves (type Fisher 289A and type KNF FDV 31 KTZ, respectively), while the average air pressure at Jungfraujoch is about 650 hPa. After filling, the flasks are transported back to their respective laboratories. The MPI flasks are transported back soon after each sampling, whereas the UBE and RUG flasks are returned in batches, so that they are stored at Jungfraujoch for up to a couple of weeks. The difference between the pressure in the flasks and the local air pressure (also during storage in the laboratories) can alter the composition of the air in the flasks, especially its δ(O2/N2) value, through permeation through the O-rings used to seal the flasks. This effect was studied by Sturm et al. (2004), and it makes the compatibility goals for δ(O2/N2) more difficult to meet.
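Whether 30 min of flushing is sufficient can be checked with a simple dilution estimate. The sketch below is an illustration, not the procedure used by the laboratories: it assumes an isothermal ideal gas, a well-mixed flask (exponential dilution, which is conservative for the dip-tube RUG flasks), the lower end of the stated flow range, and that the flow is expressed at the ambient Jungfraujoch pressure of about 650 hPa, which the text does not specify. In a series setup, every flask sees the full flow. All names are hypothetical.

```python
import math


def volumes_exchanged(flow_l_per_min, minutes, flask_volume_l, p_flow_hpa, p_flask_hpa):
    """Number of flask volumes of air passed through a flask during flushing.

    The flushed gas volume is converted from the pressure at which the flow is
    expressed to the flask pressure (isothermal ideal gas: V2 = V1 * p1 / p2).
    """
    return flow_l_per_min * minutes * (p_flow_hpa / p_flask_hpa) / flask_volume_l


def residual_fraction(n_exchanges):
    """Fraction of the original flask air left in a well-mixed flask."""
    return math.exp(-n_exchanges)


AMBIENT_HPA = 650.0   # assumption: flow expressed at Jungfraujoch ambient pressure
FLOW_L_PER_MIN = 2.0  # lower end of the stated 2-3 L/min range
MINUTES = 30.0

flasks = {
    "UBE 1 L at 950 hPa": (1.0, 950.0),
    "RUG 2.5 L at 950 hPa": (2.5, 950.0),
    "MPI 1 L at 1600 hPa": (1.0, 1600.0),
}
for name, (volume_l, p_flask) in flasks.items():
    n = volumes_exchanged(FLOW_L_PER_MIN, MINUTES, volume_l, AMBIENT_HPA, p_flask)
    print(f"{name}: {n:5.1f} flask volumes, residual fraction {residual_fraction(n):.1e}")
```

Under these assumptions, the least-flushed case (the 2.5 L RUG flask) still exchanges about 16 flask volumes, leaving a residual fraction of the original air well below 10⁻⁶.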

Measurement techniques
After filling at Jungfraujoch, the flasks are measured in their respective laboratories. For CO2, the method used at UBE differs from the methods used at RUG and MPI. At RUG and MPI, the CO2 concentration is measured using a Hewlett-Packard gas chromatograph (GC), model 6890, in a setup comparable to those described by Worthy et al. (2003) and van der Laan et al. (2009). More details are given by Sirignano et al. (2010) for RUG and Jordan and Brand (2003) for MPI. At UBE, the CO2 concentration is measured by mass spectrometry, simultaneously with δ(O2/N2). In this case, CO2 is measured as the ratio of CO2 to N2, and the resulting δ value is converted to a CO2 concentration using the known CO2 concentration of the machine reference gas. A correction is applied for the N2O background that is produced in the ion source by reactions between sample nitrogen and oxygen and that, like CO2, has a nominal mass of 44. More details about this method are given by Leuenberger et al. (2000b).
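To first order, the conversion from δ(CO2/N2) to a CO2 mole fraction scales the reference gas concentration by the measured relative ratio. The following sketch assumes that sample and reference have the same N2 mole fraction and omits the N2O correction; δ is taken in per mil here as an assumption, and the reference value in the example is hypothetical. The actual UBE procedure is described by Leuenberger et al. (2000b).

```python
def co2_from_delta(delta_co2_n2_permil: float, co2_reference_ppm: float) -> float:
    """First-order CO2 mole fraction (ppm) from delta(CO2/N2) against a reference gas.

    Assumes equal N2 mole fractions in sample and reference, so that the CO2/N2
    ratio scales directly with the CO2 mole fraction. No N2O correction is applied.
    """
    return co2_reference_ppm * (1.0 + delta_co2_n2_permil / 1000.0)


# Hypothetical example: reference gas at 380 ppm, sample 10 permil higher in CO2/N2.
print(f"{co2_from_delta(10.0, 380.0):.2f} ppm")
```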
The δ(O2/N2) and δ13CO2 measurements are performed by mass spectrometry in all three laboratories. For δ(O2/N2), dual-inlet isotope ratio mass spectrometers (DI-IRMS) are used in a manner similar to that described by Bender et al. (1994). UBE and MPI use a Finnigan MAT DELTA plus XL/XP from Thermo Electron (Bremen, Germany), and RUG uses a Micromass Optima (Micromass, now Isoprime, Manchester, UK). More details about the specific measurements in each laboratory are given by Leuenberger et al. (2000a) for UBE, van der Laan-Luijkx et al. (2010) for RUG and Brand (2005) for MPI.
δ13CO2 is the last of the three species presented in this paper to be measured, since for this analysis the CO2 first has to be extracted from the air sample. At UBE, a Finnigan MAT DELTA XL mass spectrometer is combined with a GC column: CO2 is extracted online from the air sample with liquid nitrogen, and the column is used to separate N2O from the CO2. At RUG, a second Micromass Optima is used. The CO2 is extracted from the air sample with liquid air (a mixture of about 80 % nitrogen and 20 % oxygen), and a correction is applied for the co-trapped N2O. At MPI, a Finnigan MAT mass spectrometer is used in combination with the custom-developed BGC-AirTrap to separate CO2 from the air sample. More details are given by Sturm et al. (2006) for UBE, Sirignano et al. (2004) for RUG and Werner et al. (2001) for MPI.

CO2
For the intercomparison of CO2 measurements at the different laboratories, results from 96 filling dates have been included in the analysis. For some dates, not all 3 laboratories have valid flask results, owing to, e.g., logistical problems, measurement issues or leaking flasks. Flask results affected by measurement problems or leakage have been removed from the data set, as would be done for any regular atmospheric background air sample (e.g. when a flask clearly contained laboratory air, with a CO2 concentration of over 500 ppm). The resulting numbers of sampling dates with valid CO2 results are 90 for UBE, 84 for RUG and 82 for MPI.
For UBE, the values for 80 dates are averages of 2 flasks, and for 10 dates there was only 1 valid flask. For RUG, 75 values are averages of 2 flasks and 9 are single-flask measurements. For MPI, 64 values are averages of 3 flasks, 16 are averages of 2 flasks, and for 2 sampling dates only 1 flask was included. For the sampling dates with more than 1 valid flask, the internal reproducibility, i.e. the average standard error of the mean of the duplicate or triplicate flasks, is 0.05 ppm for UBE, 0.06 ppm for RUG and 0.06 ppm for MPI (see also Table 1). For comparison, the WMO goal for compatibility between laboratories is 0.1 ppm.
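The internal reproducibility defined above can be computed as in the following minimal sketch with made-up replicate values. It assumes the sample standard deviation (n − 1 in the denominator) is used for the standard error, which the text does not specify; with that choice, the standard error for a flask pair reduces to half the absolute difference between the two flasks.

```python
import statistics
from math import sqrt


def standard_error(values):
    """Standard error of the mean of replicate flasks (sample std / sqrt(n))."""
    return statistics.stdev(values) / sqrt(len(values))


def internal_reproducibility(replicates_by_date):
    """Average standard error over dates with at least 2 valid flasks.

    Single-flask dates are skipped, since no standard error can be computed.
    """
    errors = [standard_error(v) for v in replicates_by_date.values() if len(v) >= 2]
    return statistics.mean(errors)


# Made-up CO2 values (ppm): one pair, one triplicate and one single flask.
example = {
    "date A": [390.00, 390.10],
    "date B": [391.00, 391.20, 391.10],
    "date C": [389.50],
}
print(f"internal reproducibility: {internal_reproducibility(example):.3f} ppm")
```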
Figure 2 shows the results of the CO2 measurements of the flasks sampled at Jungfraujoch. As indicated above, these values are averages of 2 or 3 flasks, or the single value for sampling dates with only 1 valid flask. The fits shown in the figure consist of a linear trend plus a double harmonic seasonal component and exclude points considered outliers of the fit, based on a 2.7 sigma exclusive filter of the residuals. This filter checks whether a value is more than 2.7 times the standard deviation away from the average and, if so, whether this is still the case for the average and standard deviation recomputed without that value; if it is, the value is considered an outlier. The filter excludes 4 values for UBE, 3 for RUG and 3 for MPI. The figure shows that the flasks from the three laboratories follow the same trend and seasonality. In some cases, all three laboratories show a value far from the fit, but with the three data points close together. These data represent, e.g., local or nearby pollution events. There are also sampling dates with large differences between the value obtained by one laboratory and those of the other two, most likely due to, e.g., measurement issues or small flask leakages.
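The exclusive filter described above can be sketched as a single pass over the fit residuals. This is one reading of the description, not the authors' code: the text does not state whether the filter is iterated, so it is applied once here, and the threshold k is left as a parameter so that stricter settings can be tested. The residuals in the example are made up.

```python
import statistics


def exclusive_sigma_outliers(residuals, k=2.7):
    """Indices of residuals flagged by a k-sigma 'exclusive' filter (single pass).

    A residual is flagged if it lies more than k standard deviations from the mean
    of all residuals AND still does so when the mean and standard deviation are
    recomputed without that residual.
    """
    values = list(residuals)
    if len(values) < 3:
        return []  # too few points for a meaningful exclusive standard deviation
    mean_all = statistics.mean(values)
    sd_all = statistics.stdev(values)
    flagged = []
    for i, r in enumerate(values):
        if abs(r - mean_all) <= k * sd_all:
            continue  # within k sigma of the full-sample statistics
        others = values[:i] + values[i + 1:]
        mean_ex = statistics.mean(others)
        sd_ex = statistics.stdev(others)
        if abs(r - mean_ex) > k * sd_ex:
            flagged.append(i)
    return flagged


# Made-up residuals (ppm): small scatter plus one clear outlier at index 20.
residuals = [0.1, -0.1] * 10 + [5.0]
print(exclusive_sigma_outliers(residuals))         # default k = 2.7
print(exclusive_sigma_outliers(residuals, k=1.9))  # stricter threshold
```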
Figure 3 shows the differences between each pair of laboratories. The average differences and their standard deviations are given in Table 2. The average difference is 0.20 ppm for UBE-RUG, 0.08 ppm for UBE-MPI and 0.14 ppm for MPI-RUG. The difference between the UBE and MPI measurements is the smallest, both in absolute value and in the standard deviation of the differences. The RUG values are slightly lower than those of the other two laboratories. Although the mean difference of 0.08 ppm between UBE and MPI is within the WMO compatibility goal, the majority of the individual differences are outside this range: for UBE-MPI, only 29 % of the differences are within the −0.1 to +0.1 ppm limits, for UBE-RUG 18 % and for MPI-RUG 20 %. We therefore conclude that these flask measurements do not yet meet the required compatibility goals for CO2.
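The comparison statistics used here (mean difference, standard deviation, standard error and the fraction of individual differences within the WMO goal) can be computed from per-date values of two laboratories. A minimal sketch, assuming each laboratory's results are stored as a mapping from sampling date to the flask-averaged value and that only dates valid for both laboratories are compared; the function name and data are made up.

```python
import statistics
from math import sqrt


def pairwise_comparison(lab_a, lab_b, limit):
    """Compare two laboratories on their common sampling dates (difference = a - b)."""
    common = sorted(set(lab_a) & set(lab_b))
    diffs = [lab_a[d] - lab_b[d] for d in common]
    sd = statistics.stdev(diffs)
    return {
        "n": len(diffs),
        "mean": statistics.mean(diffs),
        "sd": sd,
        "sem": sd / sqrt(len(diffs)),
        # Share of individual differences inside +/- limit (e.g. 0.1 ppm for CO2).
        "fraction_within": sum(abs(x) <= limit for x in diffs) / len(diffs),
    }


# Made-up flask-averaged CO2 values (ppm) keyed by sampling date.
lab_a = {"d1": 390.00, "d2": 391.00, "d3": 392.00, "d4": 393.00}
lab_b = {"d1": 390.05, "d2": 390.70, "d3": 392.00, "d5": 394.00}
print(pairwise_comparison(lab_a, lab_b, limit=0.1))
```

A mean difference within the goal can coexist with a low fraction of individual differences within it, which is the situation found here for UBE-MPI.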
As stated in Sect. 2.3, the sampling setup was changed in March 2009. Before this date, the flasks from all three laboratories were sampled in series; afterwards, the MPI flasks were sampled in parallel to those of UBE and RUG, which remained in series. The average differences between the laboratories before and after this change are included in Table 2. These values show that the MPI values shifted by about +0.2 ppm relative to the other two laboratories (the changes in the UBE-MPI and MPI-RUG differences have the signs expected for such a shift, given the direction of each comparison), suggesting that the changed setup removed a bias related to the pressurisation of the flasks. In the first setup, the magnesium perchlorate in the drying tube was also pressurised (to fill the MPI flasks), which can remove CO2 from the sample. The standard deviations of the average differences increase slightly from the first to the second period. This is most likely because the second period is considerably longer and contains larger variations, in summer/autumn 2009 as well as between June 2010 and February 2011.
The fits and the derived annual trends and seasonalities for the individual data series of each laboratory are given in Table 3. The annual trends obtained are 1.76 ± 0.17 ppm yr⁻¹ for UBE, 1.94 ± 0.18 ppm yr⁻¹ for RUG and 1.83 ± 0.17 ppm yr⁻¹ for MPI. These values agree within their estimated uncertainties. Their average, 1.85 ± 0.09 ppm yr⁻¹, agrees well with the average global CO2 trend of 1.89 ppm yr⁻¹ for the period 2008-2011 (Tans, 2013). The three seasonal amplitudes also agree within their uncertainties, although the UBE result is lower than the other two, on the edge of significance. The average amplitude is 10.54 ± 0.18 ppm, a small seasonal variation, as expected for the high-altitude continental background station Jungfraujoch. Seasonal cycles at other European sampling locations are more pronounced due to local and regional influences of the biosphere and fossil fuel combustion (e.g. Thompson et al., 2009; van der Laan et al., 2010).
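A fit of this form (offset, linear trend and harmonics of one year) can be obtained by ordinary least squares. The NumPy sketch below is an illustration only: it assumes time in decimal years, reports no parameter uncertainties, and takes the seasonal amplitude as the peak-to-peak range of the fitted harmonics, since the text does not define which amplitude convention is used. `n_harmonics=2` corresponds to the double harmonic used for CO2 and `n_harmonics=1` to a single harmonic; the synthetic data are made up.

```python
import numpy as np


def design_matrix(t_years, n_harmonics):
    """Columns: offset, linear trend, then a sin/cos pair per harmonic of 1 year."""
    t_years = np.asarray(t_years, dtype=float)
    cols = [np.ones_like(t_years), t_years]
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2.0 * np.pi * k * t_years))
        cols.append(np.cos(2.0 * np.pi * k * t_years))
    return np.column_stack(cols)


def fit_trend_seasonal(t_years, y, n_harmonics=2):
    """Least-squares fit of offset + trend + harmonics; returns trend, amplitude, R^2."""
    y = np.asarray(y, dtype=float)
    X = design_matrix(t_years, n_harmonics)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    r2 = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)
    # Peak-to-peak range of the fitted seasonal cycle, evaluated over one year.
    grid = np.linspace(0.0, 1.0, 1001)
    seasonal = design_matrix(grid, n_harmonics)[:, 2:] @ coef[2:]
    return {
        "trend_per_yr": float(coef[1]),
        "amplitude_peak_to_peak": float(seasonal.max() - seasonal.min()),
        "r2": float(r2),
        "residuals": residuals,
    }


# Synthetic, noise-free example: 1.9 ppm/yr trend and a 10 ppm peak-to-peak cycle.
t = np.linspace(0.0, 4.0, 97)
y = 385.0 + 1.9 * t + 5.0 * np.sin(2.0 * np.pi * t)
result = fit_trend_seasonal(t, y, n_harmonics=2)
print(f"trend {result['trend_per_yr']:.3f} ppm/yr, "
      f"amplitude {result['amplitude_peak_to_peak']:.2f} ppm, R^2 {result['r2']:.4f}")
```

The returned residuals are what an outlier filter such as the exclusive filter described above would be applied to before refitting.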
Table 2. Average CO2, δ(O2/N2) and δ13CO2 differences between each pair of laboratories and their standard errors of the mean, together with the standard deviations. Results are given for the entire data set as well as for two sub-periods: before March 2009 (part 1) and after March 2009 (part 2). The results from the "sausage" program are also included.

δ(O2/N2)

The internal reproducibility of the δ(O2/N2) measurements (see Table 1) is 6 per meg for UBE, 8 per meg for RUG and 3 per meg for MPI. To put these values in perspective: the WMO goal for the compatibility between two laboratories is 2 per meg. The WMO report, however, states that this goal has not yet been reached and that the current compatibility between any two laboratories is not better than 5 per meg.

Figure 4 shows the δ(O2/N2) values of the atmospheric samples for the three laboratories. The error bars in the figure are the standard errors of the mean of 2 or 3 flasks; values representing a single flask have no error bar. The fits shown in the figure consist of a linear trend plus a single harmonic seasonal component. The 2.7 sigma residual filter described in Sect. 3.1 rejects 1 value for UBE, 4 for RUG and 4 for MPI. The figure shows a large variability between the δ(O2/N2) values of the three laboratories. Samples that represent local pollution events (as seen in Fig. 2 for CO2) cannot be recognised as such in δ(O2/N2), owing to the higher variability of the data sets.
Figure 5 shows the differences between each pair of laboratories, and the average differences are included in Table 2. Figure 4 already shows that the δ(O2/N2) values of UBE are significantly lower than those of the other two laboratories. The average difference is −3 per meg for MPI-RUG, whereas it is −33 per meg for UBE-RUG and −31 per meg for UBE-MPI. This bias most likely reveals a problem with the scale definition at UBE, which requires further study and intercomparison. The standard deviations of the differences are also larger for the comparisons involving UBE than for MPI-RUG. For MPI-RUG, the average difference is within 5 per meg; however, only 14 % of the individual differences are within the ±5 per meg limits. For UBE-RUG this is 12 % and for UBE-MPI 3 %. For δ(O2/N2), significant improvements are needed to meet the WMO goals. Based on these results, the bias in the UBE measurements requires further investigation, but the sampling procedures, including flask storage, should also be studied, possibly to improve the internal reproducibility of the laboratories and thereby the compatibility between them. Table 2 also includes the differences for the samples collected before and after March 2009. As for CO2, the MPI values shift, by about +17 to +24 per meg, due to the improved pressurisation of the flasks.
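Because UBE − RUG = (UBE − MPI) + (MPI − RUG) holds for any single sampling date, the three mean differences should approximately close. They need not close exactly, since each pair is averaged over its own set of common dates and the reported values are rounded. A short check using the mean differences reported in this paper for all three species (the function name is illustrative):

```python
def misclosure(ube_rug: float, ube_mpi: float, mpi_rug: float) -> float:
    """UBE-RUG minus the sum (UBE-MPI) + (MPI-RUG); zero for identical date sets."""
    return ube_rug - (ube_mpi + mpi_rug)


# Mean differences (UBE-RUG, UBE-MPI, MPI-RUG) as reported in the text.
reported = {
    "CO2 (ppm)": (0.20, 0.08, 0.14),
    "delta(O2/N2) (per meg)": (-33.0, -31.0, -3.0),
    "delta13CO2 (permil)": (-0.03, -0.02, -0.02),
}
for species, values in reported.items():
    print(f"{species}: misclosure {misclosure(*values):+.2f}")
```

In all three cases the misclosure is small compared with the scatter of the individual differences (only a minority of which fall within the WMO goals), so the pairwise mean differences are mutually consistent.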
The fits to the three data sets in Fig. 4 differ considerably. The obtained parameters for each laboratory are given in Table 3. The data sets cover almost four years, which is short for obtaining robust estimates of the long-term annual trend, given the large variability in the data. The seasonal cycles of the fits, in contrast, should be comparable between the three laboratories over this period. The large variability of the δ(O2/N2) data does, however, lead to significant differences between the laboratories in the quality of the fits and in the estimated trends and seasonal amplitudes. The coefficients of determination (R²) of the fits are 0.58 for UBE, 0.73 for RUG and 0.87 for MPI. The estimated annual decrease rates also differ significantly. Especially for UBE, the trend estimate is unrealistic because of the high variability of the data set, when compared to the global average trend of −19 per meg yr⁻¹ over the past two decades (e.g. Keeling, 2013), which was slightly more negative during the period of our study. Since the focus of this study is the comparison between the measurements of different laboratories, we have included most of our data in the analysis. However, if this data set were used for trend analysis, a stricter filtering strategy could be applied. If a 1.9 sigma exclusive filter were used instead of the 2.7 sigma filter (see Sect. 3.1), the trend estimate for UBE would become more robust: −21 ± 2 per meg yr⁻¹ (with R² = 0.81). For RUG and MPI, the trend estimates are already more robust (given their higher initial R² values), and removing more data points does not change them much. For the seasonality, the amplitudes compare well between MPI and RUG: 85 ± 4 per meg and 84.1 ± 2.2 per meg, respectively. As for CO2, this amplitude is lower than at stations within the European atmospheric boundary layer (e.g. Kozlova et al., 2008; Popa et al., 2010; Thompson et al., 2009; van der Laan-Luijkx et al., 2010). The value for Jungfraujoch represents the signal of a background station influenced mostly by the free troposphere.

δ13CO2
For the analysis of δ13CO2, we have included 88 values for UBE, 82 for RUG and 67 for MPI. For UBE, 75 are averages of two flasks and 13 are single-flask measurements. For RUG, 53 are averages of two flasks and 29 are single flasks. For MPI, 53 values are averages of three flasks, 10 are averages of 2 flasks and 4 are single values. The internal reproducibility, i.e. the average standard error of the mean for the duplicate and triplicate samples, is 0.08 ‰ for UBE, 0.07 ‰ for RUG and 0.009 ‰ for MPI (see Table 1). For comparison, the WMO compatibility goal is 0.01 ‰.
Figure 6 shows the results of the δ13CO2 measurements of flasks sampled at Jungfraujoch. The standard errors of the averaged values are shown as error bars; single-flask values have no error bar. Filtering the data with the method described above removes 3 values for UBE, 5 for RUG and 1 for MPI. The figure shows the seasonal cycle in δ13CO2 as well as a small decreasing trend, which is not clearly visible owing to the short time span. The results from the three laboratories follow the same pattern. The fits shown in the figure consist of a linear trend plus a single harmonic seasonal component.
Figure 7 shows the differences between the laboratories; the average differences, given in Table 2, are close to each other. However, the variability of each comparison is large. The average differences are −0.03 ‰ for UBE-RUG, −0.02 ‰ for UBE-MPI and −0.02 ‰ for MPI-RUG. The WMO goal for δ13CO2 is therefore not met between any of the three laboratories. The percentages of individual differences within the WMO goal of 0.01 ‰ are 6 % for UBE-RUG, 2 % for UBE-MPI and 5 % for MPI-RUG. This low compatibility of the δ13CO2 results from the flasks sampled at Jungfraujoch shows that the data series presented here are not optimal, although they can be used to interpret the trend and seasonality over longer periods. Table 2 also includes the values obtained before and after March 2009; within the uncertainty limits, these do not show a difference due to the change in setup.
The obtained trend and seasonality parameters are presented in Table 3. The results from the three laboratories do not agree with each other within their estimated uncertainties. The trend estimates for UBE and RUG of −0.081 ± 0.018 ‰ yr⁻¹ and −0.069 ± 0.015 ‰ yr⁻¹ are much steeper than the MPI estimate of −0.016 ± 0.014 ‰ yr⁻¹. The latter is in good agreement with the trend of −0.02 ‰ yr⁻¹ for our latitude in the GLOBALVIEW-CO2C13 dataset (GLOBALVIEW-CO2C13, 2009). The much better internal reproducibility of MPI compared with the other two laboratories (see Table 1) enables this better trend estimate over the relatively short period of four years. The other two laboratories would need a longer data record to obtain a valid trend estimate. For UBE, additional flasks are sampled at Jungfraujoch, and data from these flasks are available for the entire period 2000-2012. The trend estimated from the complete UBE record is −0.013 ± 0.004 ‰ yr⁻¹, much closer to the MPI estimate. The average seasonal amplitude for the three laboratories is 0.51 ± 0.07 ‰. This is lower than at other European stations and lower than the seasonal amplitude of 0.7 ‰ for δ13CO2 at our latitude in the GLOBALVIEW-CO2C13 dataset (GLOBALVIEW-CO2C13, 2009), indicating again that Jungfraujoch is less influenced by regional and local emissions.
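One way to quantify how strongly two trend estimates disagree is to express their difference in units of the combined uncertainty. The sketch below assumes that the quoted uncertainties are independent 1σ values, which the text does not state explicitly; it uses the δ13CO2 trends above and, for contrast, the CO2 trends reported earlier. The function name is illustrative.

```python
from math import sqrt


def trend_difference_z(a, sigma_a, b, sigma_b):
    """Difference of two independent estimates in units of their combined uncertainty."""
    return (a - b) / sqrt(sigma_a**2 + sigma_b**2)


# delta13CO2 trends (permil/yr) quoted in the text
print(f"UBE vs MPI: {trend_difference_z(-0.081, 0.018, -0.016, 0.014):+.1f}")
print(f"RUG vs MPI: {trend_difference_z(-0.069, 0.015, -0.016, 0.014):+.1f}")
# CO2 trends (ppm/yr) quoted earlier in the text, for contrast
print(f"RUG vs UBE (CO2): {trend_difference_z(1.94, 0.18, 1.76, 0.17):+.1f}")
```

Under these assumptions, the UBE and RUG δ13CO2 trends differ from the MPI trend by roughly 2.6 to 2.9 combined standard uncertainties, whereas the CO2 trends differ by less than one.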

Results from the "sausage" flask intercomparison program
All three laboratories also participated in the so-called "sausage flask comparison" experiment (Levin et al., 2003).
In this program, nine laboratories receive three flask pairs on a bimonthly basis. These flasks are filled with dry ambient air from high-pressure cylinders of three different compositions, covering a CO2 range of 360-410 ppm. The flasks of all participants are flushed and filled in series, each to the pressure required by its participant: the MPI flasks are filled to 1500 hPa and the RUG and UBE flasks to 1000 hPa. The flasks are shipped to the participants within a few days after filling. They are analysed for CO2 concentration and δ13CO2 in the individual laboratories using the same methods as described above for the flasks sampled at Jungfraujoch. δ(O2/N2) measurements were not part of this intercomparison exercise.
We include additional results from this intercomparison activity here to help explain the differences found in the previous sections. Only flasks measured in the same period as the flasks from Jungfraujoch are included. Whereas the inter-laboratory differences for the samples collected at Jungfraujoch result from the combined uncertainties of sample collection, storage and laboratory measurement, the results from the "sausage" flasks only contain information on the compatibility of the laboratory measurements.
The CO2 inter-laboratory differences for the "sausage" flasks are generally smaller than (or in the same range as) those for the samples collected at Jungfraujoch, but they do vary by up to 1 ppm on timescales of months to a year. This suggests that the large inter-laboratory differences in the Jungfraujoch samples (both before and after March 2009) are more likely to result from sampling and/or storage issues than from measurement differences between the laboratories. In contrast, the δ13CO2 compatibility based on the "sausage" data is in the same range as that based on the flask samples collected at Jungfraujoch. This indicates that the dominant source of uncertainty for δ13CO2 is likely to be the laboratory measurements.
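If the sampling/storage and laboratory measurement errors are independent, their variances add, which gives a simple way to estimate the sampling and storage contribution from the two types of comparison. This is an illustrative sketch under that independence assumption, not an analysis performed in this paper; the function name is hypothetical and the inputs in the example are arbitrary numbers, not values from Table 2.

```python
from math import sqrt


def sampling_storage_sd(sd_field_differences: float, sd_lab_differences: float) -> float:
    """Spread of field-sample differences not explained by laboratory measurement.

    Assumes independent error sources, so variances add:
    sd_field**2 = sd_sampling_storage**2 + sd_lab**2.
    Returns 0.0 when the laboratory-only spread is at least as large as the field
    spread, i.e. when there is no evidence of an extra sampling/storage term.
    """
    excess = sd_field_differences**2 - sd_lab_differences**2
    return sqrt(excess) if excess > 0.0 else 0.0


# Arbitrary illustrative inputs (same units, e.g. ppm).
print(f"{sampling_storage_sd(0.5, 0.3):.2f}")
print(f"{sampling_storage_sd(0.3, 0.3):.2f}")
```

When the two spreads are similar, as reported here for δ13CO2, the estimated sampling/storage term is small, which is consistent with the laboratory measurements dominating the uncertainty.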

Discussion, conclusion and recommendations
The study presented in this paper covers a long-term comparison of CO2, δ(O2/N2) and δ13CO2 measurements on flasks sampled in situ. Intercomparison programs are important for three reasons: they document inter-laboratory compatibility, indicate the need for improvement and detect measurement problems in specific laboratories. Global intercomparison programs are quite time-consuming and costly. Existing programs focus either on comparing in situ and flask measurements or on intercomparing measurements of cylinders or of flasks filled under laboratory conditions. In intercomparison programs such as the Cucumbers project (Manning et al., 2009), cylinders are shipped between laboratories to compare their measurements, so each laboratory measures the cylinders only about once per year. Compatibility between laboratories under laboratory conditions differs considerably from that in a field study. In the field, biases can be introduced not only by the measurements but also by the sampling procedure or by storage in the different flasks. Our study has shown that the three laboratories do not yet meet the required WMO compatibility goals for the presented flask sampling program. However, Jungfraujoch is a very challenging measurement location, especially for δ(O2/N2) because of its low air pressure. Compatibility between the laboratories is also better when based on, e.g., the Cucumbers program.
The quality of flask sample data matters because flask sampling is used at many locations. It is easier to implement at remote sites, and its cost-effectiveness compared with continuous measurements allows more locations to be sampled. Flask samples are therefore widely used in carbon cycle studies. Further efforts should be made to improve the internal reproducibility of each laboratory as well as the compatibility between laboratories for this sampling method. Our intercomparison program is therefore an important tool to assess inconsistencies, which is the first step towards minimizing them. One of the most important steps to improve our measurements is to further investigate the bias in CO2 for RUG and in δ(O2/N2) for UBE.
Intercomparison programs are especially rare for δ(O2/N2) measurements. Only a few laboratories achieve the desired high precision and accuracy for δ(O2/N2). Because δ(O2/N2) is difficult to measure, more collaboration and intercomparison are needed to establish better compatibility between laboratories. Combined trend analysis of CO2 and δ(O2/N2) is an important tool for studying global oceanic CO2 uptake. Differences in the CO2 and δ(O2/N2) trends obtained by different laboratories can therefore have a large impact on these estimates. The global oceanic CO2 uptake has, for example, been estimated by Manning and Keeling (2006) and van der Laan-Luijkx et al. (2010), who found 2.2 ± 0.6 PgC yr−1 and 1.8 ± 0.8 PgC yr−1, respectively. Using the same approach as van der Laan-Luijkx et al. (2010), we obtain the following estimates of global oceanic CO2 uptake from our data:

- UBE: 6.4 ± 1.7 PgC yr−1, or 3.0 ± 1.2 PgC yr−1 with the stricter data filtering described in Sect. 3.2
- RUG: 3.6 ± 1.4 PgC yr−1
- MPI: 1.5 ± 1.0 PgC yr−1

These large differences are mainly caused by the large differences in the δ(O2/N2) trend between the three laboratories (see Table 3). Because these values are based on short time series only, extending the records would improve them significantly. Longer time series are necessary before these estimates can be used to determine the global oceanic CO2 uptake. Nevertheless, our estimates show that differences between the measurements of different laboratories can have a large impact on global carbon cycle estimates. This demonstrates that the ambitious WMO compatibility goals are scientifically justified. Laboratories should continue to improve their measurement precision and accuracy and to assess them in regular intercomparison programs.
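The sensitivity of the ocean uptake estimate to the δ(O2/N2) trend follows from the atmospheric O2/CO2 budget. The Python sketch below is a simplified version of that widely used budget. It is not necessarily identical to the method of van der Laan-Luijkx et al. (2010), and it omits refinements such as time-varying fossil-fuel oxidative ratios. It rests on several assumptions:

- the constants are typical literature values rather than numbers taken from this paper: 2.12 PgC per ppm, an O2 mole fraction of 0.2095, a fossil-fuel oxidative ratio of 1.4 and a land-biosphere oxidative ratio of 1.1;
- ocean O2 outgassing is treated as a user-supplied term that defaults to zero;
- the function name and all inputs are hypothetical.

```python
# Assumed constants (typical literature values, not taken from this paper).
PGC_PER_PPM = 2.12   # PgC per ppm of atmospheric CO2
X_O2 = 0.2095        # O2 mole fraction: 1 per meg of O2/N2 ~ 0.2095 ppm O2
ALPHA_FOSSIL = 1.4   # mol O2 consumed per mol CO2 from fossil-fuel burning
ALPHA_LAND = 1.1     # mol O2 released per mol CO2 taken up by the land biosphere


def ocean_land_sinks(co2_trend_ppm, o2n2_trend_per_meg, fossil_ppm,
                     ocean_o2_outgassing_ppm=0.0):
    """Simplified O2/CO2 budget; returns (ocean_sink, land_sink) in PgC/yr.

    All rates are in ppm/yr, with sinks counted positive:
        dCO2 = F - S_ocean - S_land
        dO2  = -ALPHA_FOSSIL * F + ALPHA_LAND * S_land + Z
    """
    d_o2 = o2n2_trend_per_meg * X_O2  # per meg/yr -> ppm O2 equivalent per yr
    # Solve the O2 equation for the land sink, then the CO2 equation for the ocean sink.
    s_land = (d_o2 + ALPHA_FOSSIL * fossil_ppm - ocean_o2_outgassing_ppm) / ALPHA_LAND
    s_ocean = fossil_ppm - s_land - co2_trend_ppm
    return s_ocean * PGC_PER_PPM, s_land * PGC_PER_PPM


# Hypothetical inputs: CO2 trend 2.0 ppm/yr, O2/N2 trend -20 per meg/yr,
# fossil-fuel emissions 4.0 ppm/yr.
ocean, land = ocean_land_sinks(2.0, -20.0, 4.0)
print(f"ocean {ocean:.2f} PgC/yr, land {land:.2f} PgC/yr")
```

With these assumed constants, a difference of 5 per meg yr−1 in the δ(O2/N2) trend shifts the ocean sink by about 5 × 0.2095 / 1.1 × 2.12 ≈ 2.0 PgC yr−1. This illustrates why inter-laboratory differences in the δ(O2/N2) trend dominate the spread of the estimates above.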

Fig. 1. Schematic diagram of the setup for flask sampling at Jungfraujoch, before March 2009 (A) and after March 2009 (B).

Fig. 2. CO2 concentration at Jungfraujoch, Switzerland, from flask samples measured by three laboratories: University of Bern (UBE) (pink squares), University of Groningen (RUG) (orange diamonds) and Max Planck Institute in Jena (MPI) (blue circles). The values are the averages of 1, 2 or 3 flasks. The fits through the data are linear trends plus double harmonic seasonal components. Open symbols represent values that are outliers to the fit of the individual data set (based on the 2.7 sigma exclusive filter on the residuals). The error bars represent the standard error of the average value of 2 or 3 flasks. For single flask measurements, error bars are not shown.
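The fitted curves in Figs. 2, 4 and 6 combine a linear trend with one or two seasonal harmonics, and outliers are flagged with a 2.7 sigma filter on the residuals. The Python/NumPy sketch below shows one way to implement such a fit. Several choices in it are our assumptions and may differ from the procedure the laboratories actually used:

- "exclusive" is taken to mean that flagged points are left out of both the fit and the standard deviation that sets the threshold;
- the fit is repeated until the set of flagged points stops changing;
- the seasonal amplitude is taken as peak-to-peak;
- all function names are hypothetical.

```python
import numpy as np


def design_matrix(t_years, n_harmonics):
    """Columns: offset, linear trend, then a sin/cos pair for each harmonic."""
    cols = [np.ones_like(t_years), t_years]
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * t_years))
        cols.append(np.cos(2 * np.pi * k * t_years))
    return np.column_stack(cols)


def fit_with_sigma_filter(t_years, y, n_harmonics=2, n_sigma=2.7, max_iter=10):
    """Least-squares fit of trend + harmonics with iterative outlier flagging.

    Returns (coefficients, keep_mask); keep_mask is False for flagged outliers.
    """
    t_years = np.asarray(t_years, dtype=float)
    y = np.asarray(y, dtype=float)
    A = design_matrix(t_years, n_harmonics)
    keep = np.ones(y.size, dtype=bool)
    for _ in range(max_iter):
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        resid = y - A @ coef
        # Residual standard deviation of the retained points only.
        sigma = resid[keep].std(ddof=A.shape[1])
        new_keep = np.abs(resid) <= n_sigma * sigma
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
    # Final fit on the final set of retained points.
    coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
    return coef, keep


def seasonal_peak_to_peak(coef, n_harmonics):
    """Peak-to-peak amplitude of the fitted seasonal component over one year."""
    phase = np.linspace(0.0, 1.0, 1000, endpoint=False)
    season = design_matrix(phase, n_harmonics)[:, 2:] @ coef[2:]
    return season.max() - season.min()
```

- `n_harmonics=2` corresponds to the CO2 fits, and `n_harmonics=1` to the δ(O2/N2) and δ13CO2 fits.
- Setting `n_sigma=1.9` mimics the stronger filter mentioned in the notes to Table 3.
- `coef[1]` is the linear trend in units per year when `t_years` is in years.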

Fig. 3. Differences in the CO2 concentration measured by each pair of laboratories. The average differences are 0.20 ppm for UBE-RUG, 0.08 ppm for UBE-MPI and 0.14 ppm for MPI-RUG. The error bars represent the quadratically added standard errors of the measurements of the two laboratories.
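The error bars in the difference plots (Figs. 3, 5 and 7) combine the standard errors of the two laboratories quadratically. The Python sketch below shows that bookkeeping for a single sampling date. It assumes that the standard error of a 2- or 3-flask average is the sample standard deviation divided by √n. A single flask carries no standard error, matching the captions in which such error bars are not shown. The function names and example values are hypothetical.

```python
import math
import statistics


def flask_mean_and_se(values):
    """Mean of 1-3 flask values and its standard error (None for a single flask)."""
    mean = statistics.fmean(values)
    if len(values) < 2:
        return mean, None
    return mean, statistics.stdev(values) / math.sqrt(len(values))


def lab_difference(values_a, values_b):
    """Difference A - B of the flask averages and its error bar.

    The two standard errors are added in quadrature; the error is None if
    either laboratory has only a single flask for this date.
    """
    mean_a, se_a = flask_mean_and_se(values_a)
    mean_b, se_b = flask_mean_and_se(values_b)
    err = None if se_a is None or se_b is None else math.hypot(se_a, se_b)
    return mean_a - mean_b, err


# Hypothetical CO2 values (ppm) from two laboratories for one sampling date.
diff, err = lab_difference([390.1, 390.3], [390.0, 390.0, 390.3])
print(f"difference {diff:.2f} ppm +/- {err:.2f} ppm")
```

The average differences quoted in the captions would then be the mean of these per-date differences over the whole record.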

Fig. 4. δ(O2/N2) observations at Jungfraujoch, Switzerland, from flask samples measured by three laboratories: UBE (pink squares), RUG (orange diamonds) and MPI (blue circles). The values are the averages of 1, 2 or 3 flasks. The fits through the data are linear trends plus single harmonic seasonal components. Open symbols represent values that are outliers to the fit of the individual data set (based on the 2.7 sigma exclusive filter on the residuals). The error bars represent the standard error of the average value of 2 or 3 flasks. For single flask measurements, error bars are not shown.

Fig. 5. Differences in the δ(O2/N2) values measured by each pair of laboratories. The average differences are −33 per meg for UBE-RUG, −31 per meg for UBE-MPI and −3 per meg for MPI-RUG. The error bars represent the quadratically added standard errors of the measurements of the two laboratories.

Fig. 6. δ13CO2 observations at Jungfraujoch, Switzerland, from flask samples measured by three laboratories: UBE (pink squares), RUG (orange diamonds) and MPI (blue circles). The values are the averages of 1, 2 or 3 flasks. The fits through the data are linear trends plus single harmonic seasonal components. Open symbols represent values that are outliers to the fit of the individual data set (based on the 2.7 sigma exclusive filter on the residuals). The error bars represent the standard error of the average value of 2 or 3 flasks. For single flask measurements, error bars are not shown.

Fig. 7. Differences in the δ13CO2 values measured by each pair of laboratories. The average differences are −0.03 ‰ for UBE-RUG, −0.02 ‰ for UBE-MPI and −0.02 ‰ for MPI-RUG. The error bars represent the quadratically added standard errors of the measurements of the two laboratories.

Table 1. Average standard errors of the mean of the duplicate or triplicate flasks for the CO2, δ(O2/N2) and δ13CO2 measurements from each of the three laboratories, for the flasks sampled at Jungfraujoch as well as for the flasks from the "sausage" program.

Table 3. CO2, δ(O2/N2) and δ13CO2 trends and seasonal amplitudes based on the fit of the data sets from each laboratory: UBE, RUG and MPI. The fit is a linear trend plus a double (for CO2) or single (for δ(O2/N2) and δ13CO2) harmonic seasonal component. The stated errors are the uncertainties of the fit only.
* More realistic values are obtained when a stronger filter (i.e. 1.9 sigma instead of 2.7 sigma) is applied to the data: −21 ± 2 per meg yr−1 and 73 ± 3 per meg for the linear trend and seasonal amplitude, respectively.
** The trend estimate based on the complete record available for UBE between 2000 and 2012 is −0.013 ± 0.004 ‰.