Results from the International Halocarbons in Air Comparison Experiment (IHALACE)

The International Halocarbons in Air Comparison Experiment (IHALACE) was conducted to document relationships between calibration scales among various laboratories that measure atmospheric greenhouse and ozone-depleting gases. Six stainless steel cylinders containing natural and modified natural air samples were circulated among 19 laboratories. Results from this experiment reveal relatively good agreement among commonly used calibration scales for a number of trace gases present in the unpolluted atmosphere at pmol mol⁻¹ (parts per trillion) levels, such as chlorofluorocarbons (CFCs), hydrochlorofluorocarbons (HCFCs), and hydrofluorocarbons (HFCs). Some scale relationships were found to be consistent with those derived from bilateral experiments or from analysis of atmospheric data, while others revealed discrepancies. The transfer of calibration scales among laboratories was found to be problematic in many cases, meaning that measurements tied to a common scale may not, in fact, be compatible. These results reveal substantial improvements in calibration over previous comparisons. However, there is room for improvement in communication and coordination of calibration activities with respect to the measurement of halogenated and related trace gases.


Introduction
Halogenated trace gases, such as chlorofluorocarbons (CFCs), hydrochlorofluorocarbons (HCFCs), and chlorinated solvents, are involved in stratospheric ozone depletion (Montzka and Reimann, 2011). Some of these, along with hydrofluorocarbons (HFCs), are also strong greenhouse gases. In an effort to characterize global distributions and sources/sinks of these gases, several international research groups measure the atmospheric abundance of CFCs, HCFCs, HFCs, and halogenated solvents on a routine basis.

Collaborative efforts utilizing measurements from multiple groups have led to more robust estimates of the global distributions and emissions of N2O (Huang et al., 2008; Saikawa et al., 2013), CCl4 (Xiao et al., 2010a), CH3Cl (Xiao et al., 2010b), HCFC-22 (Saikawa et al., 2012), and SF6 (Rigby et al., 2010). Integrating results from different research groups to produce a consistent picture of the global or regional atmospheric distribution can be challenging. Many factors can lead to differences in the data records collected by different groups (e.g., sampling or analytical artifacts, calibration differences, site selection). Perhaps the most fundamental of these is the calibration scale upon which the measurements are based. Nearly all measurements of ozone-depleting and greenhouse gases are made on a relative basis; that is, the abundance is determined relative to a calibration standard measured in a similar manner. Most calibration standards consist of mixtures of trace gases with known mole fractions stored in compressed gas cylinders. Calibration standards are typically designed to match the atmospheric composition in order to minimize interference or bias.
The larger CO2 measurement community, under the auspices of the World Meteorological Organization Global Atmosphere Watch (WMO/GAW) program, has adopted a single reference scale for WMO/GAW CO2 measurements (WMO/GAW, 2009; Zhao et al., 1997). Ongoing efforts to compare laboratory measurements and assess how well cooperating laboratories are linked to the calibration scale are fundamental to the WMO/GAW program (WMO/GAW, 2009). Protocols for CH4, N2O, CO, SF6, and H2 are also in place. However, there have been few efforts to characterize differences between calibration scales and measurement programs for halogenated gases. Early comparison studies (Rasmussen, 1978; Fraser, 1979) found large differences in mole fractions of the major ozone-depleting gases (CFC-11, CFC-12, CH3CCl3, and CCl4) related to analytical methods and calibration. These studies revealed standard deviations of 10-25 % among independent laboratory scales, but showed good agreement (3 %) for CFC-12 and CCl4 among laboratories using commonly derived standards. In recent years, most of the research in this area has been carried out on a bilateral or ad hoc basis. While a few scale differences can be calculated from global mean estimates (Clerbaux and Cunnold, 2007; Montzka and Reimann, 2011), these represent only a subset of the research groups involved in the measurement of these trace gases, and they do not differentiate between fundamental calibration scale differences and those associated with sampling, measurement location, or analytical technique.

The International Halocarbons in Air Comparison Experiment (IHALACE) was conceived as a first step toward assessing the variability of a number of common calibration scales for halogenated trace species measured in the atmosphere. While the existence of independent calibration scales is important for verifying trends and estimating uncertainties, it is also important to understand the relationships between independent scales.
Experiments designed to assess calibration and analytical differences have been conducted for greenhouse gases (WMO, 2011) and select hydrocarbons (Apel et al., 1994; Slemr et al., 2002; Apel et al., 2003). Only a limited number of such experiments have focused on halocarbons (Rasmussen, 1978; Fraser, 1979; Prinn and Zander, 1998; Jones et al., 2011). The goals of IHALACE were to (1) establish a calibration matrix that relates the calibration scales among different laboratories at a specific point in time, and (2) enhance communication and cooperation among laboratories in order to improve data quality (e.g., through regular comparisons). We do not explore analytical or scale development uncertainties in depth. Typical scale uncertainties at ambient mole fractions are about 1-4 % (95 % confidence level). While comparison results might agree within these uncertainties, small differences between measurement programs can be important for gases with small spatial gradients. It is therefore important to understand even small differences between scales rather than treat scales as equivalent because they agree within their uncertainties.

Six electro-polished stainless steel cylinders (Essex Cryogenics, St. Louis, MO), divided into two sets, were distributed among the participants (Table 1). Each group received three cylinders, two at ambient mole fraction and one containing a mixture of 80 % ambient air and 20 % ultra-pure zero air (Table 2). Mole fractions were not disclosed at the time of distribution. To the extent possible, groups that develop their own calibration scales received the same set of three samples, while groups that adopt scales from other laboratories received the other set. bringing clean continental background air to the site. Cylinders (34 L empty volume) initially contained ∼ 100 hPa dry nitrogen upon receipt from the manufacturer. They were evacuated to 2 Pa and then filled with 6.2 MPa dry natural air via transfer from a pressurized Aculife-treated aluminum cylinder (filled previously at Niwot Ridge). Approximately 0.65 mL HPLC-grade water was added to each cylinder to humidify the air. Cylinders were conditioned with this humidified air for one month, evacuated to 6 Pa, and humidified again by adding 0.65 mL HPLC-grade water and ∼ 100 L dry natural air as before. Dry synthetic zero-grade air was added to two cylinders to create sub-ambient air samples. The zero-grade air (Linweld, Lincoln, NE) was scrubbed of residual contamination by passing it through 5Å molecular sieve and activated charcoal at −78 °C. Final pressurization to 6.
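As a sketch of what the 80 %/20 % blend implies, the snippet below computes the nominal mole fraction of a diluted cylinder and the percent departure of a measured diluted/undiluted ratio from 0.8. It assumes the blend proportions are molar fractions, that the scrubbed zero air contributes none of the trace gas, and that the diluted and undiluted cylinders share the same parent air; the function names and example values are hypothetical, not IHALACE data.

```python
def expected_diluted(ambient: float, ambient_share: float = 0.8,
                     zero_air: float = 0.0) -> float:
    """Mole fraction expected after blending ambient air with zero air.

    Assumes the blend proportions are molar (ideal-gas) fractions and that
    the zero air contains `zero_air` of the trace gas (0.0 by default).
    """
    if not 0.0 <= ambient_share <= 1.0:
        raise ValueError("ambient_share must lie between 0 and 1")
    return ambient_share * ambient + (1.0 - ambient_share) * zero_air


def dilution_deviation_pct(diluted: float, undiluted: float,
                           ambient_share: float = 0.8) -> float:
    """Percent departure of the measured diluted/undiluted ratio from the
    nominal ambient share; a value near zero is consistent with a linear
    response. Only meaningful if both cylinders share the same parent air."""
    return 100.0 * ((diluted / undiluted) / ambient_share - 1.0)


# Hypothetical CFC-12-like example (illustrative values, not IHALACE data)
print(expected_diluted(545.0))               # nominal diluted value, ppt
print(dilution_deviation_pct(438.0, 545.0))  # > 0: diluted sample reads high
```

A persistent nonzero departure across laboratories would point to a dilution or fill issue; a departure seen by only some laboratories would point to non-linearity in their analysis or standards.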

Analysis and data reporting
Each participant was instructed to analyze the air samples in a manner similar to other air samples from their measurement program. Most participants employ gas chromatography with electron capture, mass selective, or flame ionization detection. While laser-based systems have been developed for some species (e.g., N2O and CH4), they were not widely used at the time of this experiment. A dedicated pressure regulator (Veriflow 959TDR, Veriflow Division of Parker Hannifin, Richmond, CA) was supplied with each cylinder, along with 1 m lengths of 1/16-inch stainless steel tubing. Participants were instructed to use the regulators provided unless their analysis method required a different procedure. Each laboratory was instructed to forward the cylinders to the next laboratory according to a predetermined schedule. Cylinders were initially distributed in September 2004. One set of cylinders was returned to Boulder for final analysis in 2006. The second set was returned a year later, having taken an additional year to complete the circuit. Each cylinder was analyzed at NOAA at the beginning and end of the distribution period. At the end of the experiment, four of the six cylinders remained at high pressure (4-5 MPa), while two were accidentally partially vented during the final weeks of the experiment. The final analysis at NOAA was performed while all cylinders still contained large amounts of air, so differences between initial and final mole fractions are not expected to be related to changes in cylinder pressure. Only minor differences were observed between the initial and final analyses.
Data were submitted to two referees and held until all analyses were complete. At that point, data were released to participants in anonymous form, with laboratories identified by number. Participants were informed only of their own laboratory number. While IHALACE was operated as a "blind" comparison, one of the referees also acted as a participant. Although this is not generally considered standard protocol for a blind comparison, all participants were informed in advance, and there were no strong objections. The participant/referee submitted results to the other referee and to another participant (B. Hall). Further, the participant/referee ensured that handling and analysis were performed by laboratory personnel not associated with the IHALACE referee role.
It was requested that all data be properly identified with the corresponding calibration scale (see Table S1 in the Supplement). Data submitted on obsolete scales were converted to more recent scales according to known conversion factors (e.g., CH4 data on the CMDL-93 scale were converted to NOAA-04, Dlugokencky et al., 2005; CFC- -2006). In other cases, scale differences were small and did not significantly affect the results. For example, some data were submitted on SIO-98 scales even though SIO-05 is more recent. The conversion from SIO-98 to SIO-05 for CFC-12 was estimated from SIO results submitted on both scales by the same laboratory. The scale ratio for CFC-12 (SIO-05/SIO-98 = 0.9999 at ∼ 545 ppt) is sufficiently close to 1.0 that results reported on the SIO-98 scale can be compared directly with those submitted on SIO-05. Likewise, conversion of N2O from the NOAA-2006 scale to NOAA-2006A is not necessary for comparative purposes. Finally, some laboratories reported data on more than one scale or from more than one analytical instrument, such as gas chromatographs with an electron capture detector (ECD) and a mass selective detector (MS). These results are presented in figures and tables as non-integer laboratory numbers. For example, additional results submitted by laboratory 2 are presented as laboratory 2.1. See Table S1 in the Supplement for additional laboratory information.
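The reasoning about the SIO-98 to SIO-05 CFC-12 ratio can be illustrated with a minimal sketch. It assumes a single multiplicative factor relates the two scales near the mole fraction of interest (real scale conversions can depend on mole fraction), and the 0.1 % tolerance is a hypothetical threshold, not one used in this study.

```python
def convert_scale(value: float, ratio_new_over_old: float) -> float:
    """Re-express a mole fraction on a newer scale, assuming the two scales
    differ by a single multiplicative factor (new/old) near this level."""
    return value * ratio_new_over_old


def conversion_negligible(ratio_new_over_old: float,
                          tolerance_pct: float) -> bool:
    """True when the scale ratio departs from 1 by at most tolerance_pct."""
    return abs(ratio_new_over_old - 1.0) * 100.0 <= tolerance_pct


# CFC-12 example from the text: SIO-05/SIO-98 = 0.9999 at about 545 ppt.
converted = convert_scale(545.0, 0.9999)
print(f"shift: {545.0 - converted:.3f} ppt")
# The 0.1 % tolerance below is a hypothetical threshold for illustration.
print(conversion_negligible(0.9999, tolerance_pct=0.1))
```

At 545 ppt the ratio shifts the value by only about 0.05 ppt, which is why the two SIO scales can be compared directly for this gas.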

Results and discussion
To examine the results, we focus first on laboratories that prepare their own scales. This provides an indication of how well atmospheric mole fractions are known on an absolute basis and avoids scale propagation issues. For each trace gas, we report the variation of results (one standard deviation) exclusively from laboratories that prepare their own standards (Table 5). Of course, no calibration scale is known absolutely, but good agreement among a number of scales would suggest that errors in determining the atmospheric mole fraction of a particular trace gas are likely small. Next, we examine the extent to which certain scales are propagated among different laboratories, since two laboratories on the same scale should agree to the level at which the scale can be propagated (typically 2σ from the laboratory of scale origin). We also separate results by the season during which the cylinders were filled (late winter vs. early summer), as seasonal mole fraction differences are expected for some gases. For most comparisons, we focus on the undiluted air samples, since calibration and analysis procedures are likely to be optimized for ambient samples. We use the NOAA results as the basis for many of the comparisons because all six cylinders were analyzed at NOAA. Finally, we compare results for undiluted and diluted samples. Taken together, results from diluted and undiluted samples may shed light on non-linearities associated with analysis or standards development, which could affect how datasets compare over the long term. We focus primarily on the results for major halogenated species (CFCs, HCFCs, HFCs, etc.) and other greenhouse gases such as N2O, SF6, and CH4. We limit the analysis primarily to trace gases for which three or more laboratories provided results and at least two scales are represented. The full set of results is available in the Supplement (see Table S2).
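The dispersion statistic described above (one standard deviation among laboratories that prepare their own standards) might be computed as in the following sketch. It assumes the sample (n − 1) standard deviation and uses hypothetical laboratory results; the function name is illustrative.

```python
from statistics import mean, stdev


def scale_dispersion(results: dict[str, float],
                     scale_origins: set[str]) -> tuple[float, float]:
    """Sample standard deviation (absolute, and as % of the mean) of results
    from laboratories that prepare their own scales; other labs are ignored."""
    values = [v for lab, v in results.items() if lab in scale_origins]
    if len(values) < 2:
        raise ValueError("need results from at least two scale-origin labs")
    sd = stdev(values)
    return sd, 100.0 * sd / mean(values)


# Hypothetical results (ppt) for one cylinder; lab "3" adopts another lab's
# scale, so it is excluded even though it is a large outlier.
results = {"1": 250.0, "2": 252.0, "7": 254.0, "3": 300.0}
sd, sd_pct = scale_dispersion(results, scale_origins={"1", "2", "7"})
print(f"{sd:.2f} ppt ({sd_pct:.2f} %)")
```

Excluding dependent laboratories keeps propagation errors, such as the outlier here, from inflating the estimate of how well the independent scales agree.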
Average differences (%) compared to NOAA for select trace gases are shown in Tables 3 and 4.
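A percent difference relative to NOAA, averaged over cylinders analyzed by both laboratories, could be computed as follows. This is a sketch assuming simple pairing by cylinder and equal weighting; the function name and values are hypothetical.

```python
from statistics import mean


def mean_pct_difference(lab: list[float], reference: list[float]) -> float:
    """Average percent difference of one lab's results from reference results
    (e.g. NOAA) for the same cylinders, paired in the same order."""
    if not lab or len(lab) != len(reference):
        raise ValueError("need equal-length, non-empty result lists")
    return mean(100.0 * (x - r) / r for x, r in zip(lab, reference))


# Hypothetical values for two undiluted cylinders (ppt).
print(mean_pct_difference([101.0, 102.0], [100.0, 100.0]))
```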

Chlorofluorocarbons (CFCs)
Both CFC-11 and CFC-12 have a long history of measurement in the atmosphere, and extensive work on scale development has been done over the years. For CFC-11, good agreement was observed across six scales for the undiluted air samples, with a variation of only 1 % (one standard deviation) (Table 5, Fig. 1a). There was some clustering, with three scales (1, 2, 7) at lower values and three scales (15, 16, 19) ∼ 4 ppt higher, but in general there is good agreement among scales (recall that the variability among scales is based only on laboratories that prepare their own standards, and not on those that derive scales from others). The difference between laboratories 1 (NOAA) and 7 (NIST) was 0.1 %. This is less than the average difference of 0.9 % reported by Rhoderick and Dorko (2004) based on a comparison of two compressed gas standards.
Scale relationships for three commonly used scales (NOAA, SIO, and UCI-2) were
Like CFC-11, the five CFC-12 calibration scales represented show a dispersion of 1 % (Fig. 1b, Table 5). NOAA re-analysis of the IHALACE cylinders suggests that the initial NOAA assignments were ∼ 0.8 ppt too low for unknown reasons (the average of the second NOAA analysis was 0.8 ppt higher than the first), and this was confirmed by further analysis at NOAA. All CFC-12 comparisons shown in Tables 3-6 are based on the second NOAA analysis.
CFC-12 scale factors derived from undiluted IHALACE cylinders for SIO/NOAA and UCI-2/NOAA are nearly identical to those derived from global mean mole fraction estimates (Table 6). The NIST-NOAA average difference (Table 3)
While the overall scale differences for CFC-11 and CFC-12 are not large, scale propagation could be improved. Differences among laboratories reported to be on the same scale are nearly as large as differences among scales. Some laboratories (3, 11) reported results more than 10 ppt higher than the laboratory that developed the scale (scale origin). This important finding was also observed for other trace gases. Measurements that are supposedly comparable (traceable to the same scale) may not be compatible (see JCGM 200:2008; WMO/GAW, 2011) owing to scale propagation or sampling/measurement issues. This could limit the utility of combining data from different networks/sites even when the programs are linked to common scales. One likely reason for these propagation problems is the lack of regular communication between laboratories regarding calibration scale changes. Equally important are efforts to verify that mole fractions of calibration standards are not changing over time. Efforts to ensure data quality and scale transfer are needed on a continuing basis to minimize potential bias. Examples of such efforts include routine comparison of standards or air samples, and co-located sampling, where measurements are made by independent groups at the same site.
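The propagation check implied above, comparing each laboratory with the laboratory of scale origin for its scale, can be sketched as follows. The text describes the expected level of agreement as typically 2σ of the scale origin; here that level is replaced by a hypothetical fixed tolerance in percent, and the laboratory results are invented for illustration (only the assignment of laboratories 1 and 2 as origins of the NOAA and SIO scales follows the text).

```python
def flag_propagation(results: dict[str, float], scale_of: dict[str, str],
                     origin_of: dict[str, str],
                     tolerance_pct: float) -> dict[str, float]:
    """Return {lab: percent difference from its scale origin} for labs whose
    difference exceeds tolerance_pct. Scale-origin labs are skipped."""
    flagged: dict[str, float] = {}
    for lab, value in results.items():
        origin = origin_of[scale_of[lab]]
        if lab == origin or origin not in results:
            continue
        ref = results[origin]
        diff_pct = 100.0 * (value - ref) / ref
        if abs(diff_pct) > tolerance_pct:
            flagged[lab] = diff_pct
    return flagged


# Hypothetical CFC-12-like results (ppt) for one cylinder.
results = {"1": 540.0, "3": 552.0, "12": 540.5, "2": 535.0, "9": 535.2}
scale_of = {"1": "NOAA", "3": "NOAA", "12": "NOAA", "2": "SIO", "9": "SIO"}
origin_of = {"NOAA": "1", "SIO": "2"}
flagged = flag_propagation(results, scale_of, origin_of, tolerance_pct=1.0)
print(flagged)
```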
It is important to note that, with regard to potential scale transfer errors, some groups within this study are more closely linked than others. For example, laboratories 2, 9, 14, and 17 are members of or affiliated with the Advanced Global Atmospheric Gases Experiment (AGAGE) (Prinn et al., 2000). Standard preparation, scale propagation, and data processing are likely more centralized within this group than between other groups operating on common scales. Scale transfer errors between AGAGE-affiliated laboratories should therefore be smaller than those between laboratories with little or no formal cooperative ties. The same would be expected for other measurement facilities operating within one agency. Select members of the AGAGE group were included in the experiment because of past experience with calibration scale development and trace gas records at important long-term sites (e.g., Cape Grim, Australia, operated by the Australian Bureau of Meteorology in collaboration with CSIRO; and Mace Head, Ireland, operated by the University of Bristol).
CFC-113 results are similar to those for CFC-11. The standard deviation of results from five scales is 1.7 ppt, or 2.1 %. Again, scale propagation is problematic in some cases (Fig. 1c). Laboratory 12 agrees very well with laboratory 1 (scale origin), and laboratories 9 and 17 agree with laboratory 2 (scale origin), but laboratory 3 shows a large difference relative to laboratory 1 (scale origin). Scale conversion factors derived from undiluted samples are consistent with those derived from global mean mole fraction estimates in 2004 (Table 6). The SIO/NOAA ratio is 0.972, compared with 0.975 from 2004 global means, while the UCI-2/NOAA ratio is 0.974, compared with 0.978 based on global means. There are small differences between results from the same laboratory using different instruments. Laboratory 2 reported a difference of ∼ 0.5 ppt between ECD and MS results, with the ECD results likely affected by a co-elution. However, there is no difference between ECD and MS results for laboratory 17. Differences between ECD and MS results from laboratory 1 are partially traceable to the standards used to define the scale. When the same standards are used on both ECD and MS instruments, agreement is within 0.5 % for these air samples. While these differences are small, they suggest that CFC-113 results may be influenced by co-elution, matrix effects, or analytical non-linearities.
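A scale ratio derived from cylinders and its comparison with a ratio from global means can be sketched as below. The sketch assumes the cylinder ratio is the mean of per-cylinder ratios (the averaging actually used for Table 6 is not stated here); the CFC-113 ratios are the ones quoted in this paragraph, and the function names are hypothetical.

```python
from statistics import mean


def scale_ratio(lab: list[float], reference: list[float]) -> float:
    """Mean of per-cylinder ratios lab/reference (e.g. SIO/NOAA)."""
    if not lab or len(lab) != len(reference):
        raise ValueError("need equal-length, non-empty result lists")
    return mean(x / r for x, r in zip(lab, reference))


def gap_vs_global_pct(ratio_cylinders: float, ratio_global: float) -> float:
    """Percent difference between a cylinder-based scale ratio and the
    ratio implied by independently estimated global means."""
    return 100.0 * (ratio_cylinders / ratio_global - 1.0)


# CFC-113 ratios quoted in the text (IHALACE vs. 2004 global means).
print(f"SIO/NOAA:   {gap_vs_global_pct(0.972, 0.975):+.2f} %")
print(f"UCI-2/NOAA: {gap_vs_global_pct(0.974, 0.978):+.2f} %")
```

Both gaps are about 0.3-0.4 %, small compared with the 2.1 % dispersion among CFC-113 scales, which is the sense in which the two approaches are consistent.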
Fewer laboratories reported results for two additional CFCs: CFC-114 (CClF2CClF2) and CFC-115 (CClF2CF3). The variability among four scales reported for CFC-114 was 0.7 %. While differences among primary scales were small, overall differences between laboratories were larger, with differences due to scale propagation exceeding differences between primary scales (Fig. 1d). Some of the CFC-114 differences could result from chromatographic co-elution of CFC-114 and CFC-114a (CCl2FCF3) and from differences in the relative amounts of CFC-114 and CFC-114a in laboratory standards compared with the IHALACE samples. The variability among three scales reported for CFC-115 was 4.1 %, with comparable differences due to scale propagation (Fig. 2a).

Chlorinated solvents: CCl4, CH3CCl3, and CHCl3
Carbon tetrachloride (CCl4) was reported by 12 laboratories on five independent scales (Fig. 2b). The standard deviation of results among the five scales was 1.8 ppt (1.9 %). The difference between the NOAA scale (laboratory 1) and the SIO-05 scale (laboratory 2, ECD results) was 2.7 %. This is comparable to both the 2.6 % difference reported by Xiao et al. (2010a) based on co-located sampling results and the 2.6 % difference based on 2007-2008 global mean estimates (Table 6).
There remains a discrepancy between bottom-up inventories and top-down, measurement-based estimates of global CCl4 emissions (UNEP, 2007; Montzka and Reimann, 2011). In the IHALACE study, the largest difference between scales (laboratory 2 versus laboratory 7) is 4.8 ppt, or 5 % of the average Northern Hemisphere mole fraction in 2004. If we assume that this represents the full range of calibration uncertainty, then top-down estimates of CCl4 emissions could be subject to a 5 % uncertainty due to calibration alone. This relatively small uncertainty is not enough to explain the discrepancy between top-down and bottom-up emission estimates.
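The step from a 5 % scale difference to a 5 % emission uncertainty can be made explicit with a one-box mass balance, E = dX/dt + X/τ. If every mole fraction is multiplied by the same factor, both terms scale by that factor, so the derived emissions do too. The sketch below assumes a well-mixed single box, a purely multiplicative scale error (a constant offset would bias only the loss term), a hypothetical mole-fraction series, and a placeholder lifetime that is not taken from this study.

```python
def box_model_emissions(x: list[float], tau_years: float,
                        dt_years: float = 1.0) -> list[float]:
    """One-box emissions E = dX/dt + X/tau, in mole-fraction units per year.

    Uses forward differences and the mean abundance over each interval; the
    conversion from mole fraction to mass emissions is omitted.
    """
    return [(x[i + 1] - x[i]) / dt_years
            + 0.5 * (x[i] + x[i + 1]) / tau_years
            for i in range(len(x) - 1)]


# Hypothetical declining series (ppt) and placeholder lifetime (years).
series = [100.0, 99.0, 98.0]
tau = 26.0
base = box_model_emissions(series, tau)
scaled = box_model_emissions([1.05 * v for v in series], tau)  # +5 % scale
print([s / b for s, b in zip(scaled, base)])  # each ratio is 1.05 up to rounding
```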
Comparison results for CH3CCl3 from 12 laboratories on six calibration scales are shown in Fig. 2c. Results from laboratories that prepare primary standards show a variation of 0.8 ppt (3.4 %) for winter samples and 1.0 ppt (4.7 %) for summer samples.
The fact that all scales agree within a few ppt is remarkable considering that it has been difficult to obtain pure CH3CCl3 in the past. A previous calibration scale developed by NOAA in the late 1990s was based on a CH3CCl3 reagent that contained as much as 7 % impurities. As with CFC-113, instruments can give different results for CCl4 and CH3CCl3 even when the same standards are used to define the scale. Laboratories 1, 2, 9, and 17 all reported CH3CCl3 results from both ECD and MS instruments. Small differences, generally less than 0.5 ppt (2-3 %), are evident in each case. Laboratories 2, 9, and 17 reported both ECD and MS results for CCl4 on the SIO-05 scale and are aware of a systematic problem in their MS method, probably due to the chromatographic column. These results imply that one needs to be careful when comparing data collected by different instruments. Small analytical differences can lead to discrepancies even within the same measurement program, and differences need to be assessed on an instrument-by-instrument basis.
Results for CHCl3 are shown in Fig. 2d. The dispersion among five scales was 4.5 % and 15.5 % for summer and winter samples, respectively. The large standard deviation for the winter samples reflects a low mole fraction reported by laboratory 7 for the winter sample. Excluding laboratory 7, results on four scales show a variability of ∼ 5 % for both summer and winter samples. Scale transfer issues appear to be relatively minor. Differences due to analytical methods are on the order of 2-3 %, similar to those for CH3CCl3 and CCl4.
Despite relatively small differences among independent scales, there are some substantial scale propagation issues for both CH3CCl3 and CCl4. While some laboratories were able to reproduce results on existing scales, others were not. CCl4 results reported by laboratory 3 were ∼ 30 ppt higher than those of laboratory 1, from which the scale is derived (outlier in Fig. 2b). This could be caused by downward drift of CCl4 in standards used by laboratory 3, as CCl4 is known to drift in some types of cylinders (untreated aluminum, for example). A smaller, but still significant, difference is evident for laboratory 11, whose scale is derived from laboratory 2. Laboratory 3 reported mole fractions of CH3CCl3 that were 18 and 65 ppt larger than those of laboratory 1 (see Table 3). A large positive offset could be related to downward drift of CH3CCl3 in standards used by laboratory 3, but this would not explain the large difference between the mole fractions reported for the two undiluted samples.

HCFCs and HFCs
The atmospheric abundances of HCFCs and HFCs (first- and second-generation replacements for CFCs) are lower than those of the major CFCs, and their measurement history is not as extensive. Thus, one might not expect measurement scales for these gases to be as advanced as those for CFCs. Scale variations range from 1-2 % for HCFC-22 and HCFC-141b (four scales) to 4 % for HCFC-142b (four scales), and 3-6 % for HFC-152a and HFC-134a (three scales) (Figs. 3a-d and 4a). While the overall scale differences are larger than those for CFC-11 and CFC-12, it is encouraging that HCFCs and HFCs, which require more advanced measurement techniques than CFCs, still show relatively good agreement among major scales. It is likely that efforts to develop and improve CFC calibration scales through the years have translated into improved scales for HCFCs and HFCs as well. Once again, however, scale propagation is problematic in some cases; e.g., propagation errors for HFC-152a are as large as 10 %.
Four scales for HFC-134a vary by 2.7 % and 4.9 % for summer and winter samples, respectively. A fifth scale (UB-98) also agrees with the other scales (Fig. 3d) but is not included in Table 5 because laboratory 11 is not the scale origin for UB-98. Scale transfer is very good among AGAGE laboratories (2, 9, 14, 17) and among those linked to the NOAA-04 scale (1, 4, 12). Laboratories 15 and 19 show an 11 % discrepancy based on undiluted samples.
It is encouraging that nearly all laboratories detected a mole fraction difference between cylinders filled in winter and summer, and that this was true for nearly all HCFCs and HFCs. In most cases the seasonal differences were similar among all laboratories, except for HCFC-141b (Fig. 3b). Laboratories 1, 2, 4, and 17 observed a 1.0-1.5 % difference between summer and winter samples, while laboratories 14 and 16 observed smaller differences, and laboratories 11 and 19 observed differences of opposite sign. Thus, while there is generally good agreement among scales for HCFC-141b, there is room for improvement in analytical techniques. Observed SIO/NOAA ratios for HCFC-141b (1.012) and HCFC-142b (1.037) are similar to those derived from global means (Table 6). Likewise, SIO/NOAA and UCI-2/NOAA ratios for HCFC-22 are comparable to those based on global means.
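Checking whether laboratories agree on the sign of the seasonal difference, as discussed above for HCFC-141b, could look like the following sketch. The function names and the (summer, winter) values are hypothetical, and the reference laboratory is assumed to show a nonzero difference.

```python
def seasonal_diff_pct(summer: float, winter: float) -> float:
    """Percent difference of the summer-filled sample relative to winter."""
    return 100.0 * (summer - winter) / winter


def opposite_sign_labs(pairs: dict[str, tuple[float, float]],
                       reference_lab: str) -> set[str]:
    """Labs whose summer-minus-winter difference has the opposite sign to
    that of reference_lab (labs with a zero difference are not flagged)."""
    diffs = {lab: seasonal_diff_pct(s, w) for lab, (s, w) in pairs.items()}
    ref_positive = diffs[reference_lab] > 0
    return {lab for lab, d in diffs.items()
            if lab != reference_lab and d != 0 and (d > 0) != ref_positive}


# Hypothetical HCFC-141b-like results (summer, winter) in ppt.
pairs = {"1": (18.3, 18.05), "2": (18.5, 18.25), "11": (17.9, 18.0)}
print(opposite_sign_labs(pairs, reference_lab="1"))
```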

Halons
Halon results are reported on two to three independent scales, with several other laboratories reporting on dependent scales. Halon-1211 was measured by nine laboratories (Fig. 4b) on four scales, with a standard deviation of 2.2 %. In contrast to most other trace gases measured in this experiment, scale transfer is excellent (< 1 % in most cases). Results on two different scales are shown for laboratory 1, one determined by ECD and the other by MS.
The range of values reported for halon-1301 (Fig. 4c) was larger than that for halon-1211 (7 % versus 2 %), but scale transfer issues were also relatively minor. Note that SX-3537 was not analyzed for halon-1301 at NOAA. A NOAA value was estimated from SX-3538 (filled at the same time) using the summer/winter ratio from cylinders SX-3527 and SX-3538. This estimate does not affect the above conclusions because the mole fractions of all undiluted samples were similar for this gas.
Halon-2402 mole fractions, reported on two scales, show good agreement within 0.05 ppt (10 %) (Fig. 4d). Although SX-3537 was not analyzed by NOAA, no attempt was made to estimate halon-2402 in this cylinder because both undiluted cylinders contained similar mole fractions according to results from laboratories 15, 17, and 19. Two laboratories (14 and 17) reported halon-2402 values based on provisional scales (i.e., scales adopted in a non-traditional manner, such as through the exchange of a sub-scale origin (laboratory 1) by up to a factor of two. Results from laboratories 15 and 19, which are on the NCAR/UM scale, agree within 0.03 ppt (6.5 %).

Methyl bromide and methyl chloride
Results for CH3Br from different laboratories differ by only a few percent. The standard deviations among five laboratories with independent scales were 2.2 % and 1.6 % for winter and summer samples, respectively (Table 5). Differences between summer and winter samples were detected by all laboratories (Fig. 5a). Scale transfer issues were minor in most cases, although a ∼ 7 % difference between laboratories 15 and 19 is apparent. The SIO/NOAA ratio (0.998) differs by a few percent from those based on global mean estimates (Table 6). For laboratory 9, one set of results (9.1 in Fig. 5a) is known to be overestimated because of a drifting calibration standard. Standard drift was taken into account in the second set of results (9.2 in Fig. 5a), which explains why this set is in better agreement with laboratory 2 (scale origin) than the first. The seasonal difference in CH3Br mole fractions allows scales to be compared over a broad range. The five independent scales represented are, for the most part, linearly related to each other (Fig. 7).
CH3Cl results are similar to those for CH3Br, with relatively small differences among six scales (standard deviation ∼ 2.5 %) (Fig. 5b). The large apparent scale difference between SIO-05 and UB-98 (compare laboratories 2 and 11) is complicated by the scale propagation error between laboratories 2 and 11 (SIO-05 scale). Other laboratory comparisons (P. K. Salameh, personal communication, 2010) indicate that the UB-98 scale is 1.5 % higher than SIO-05, which implies that the laboratory 11 results are ∼ 25 ppt too low. The difference between the NOAA scale and the SIO-05 scale (laboratories 1 and 2) is 0.8 %, similar to the difference of 1.01 % used by Xiao et al. (2010b) based on co-located sampling.
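The statement that the CH3Br scales are, for the most part, linearly related can be tested by regressing one laboratory's results on another's across cylinders spanning a range of mole fractions. The sketch below uses ordinary least squares on hypothetical values; it is not necessarily the fitting method behind Fig. 7, and because both axes carry measurement error, an errors-in-variables fit (for example, Deming regression) may be more appropriate in practice.

```python
from statistics import mean


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept of y regressed on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired points")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical CH3Br-like values (ppt) on two scales for several cylinders
# spanning a range of mole fractions (seasonal and dilution differences).
scale_a = [6.4, 7.2, 8.0, 9.0]
scale_b = [1.02 * v + 0.1 for v in scale_a]
slope, intercept = fit_line(scale_a, scale_b)
print(f"slope {slope:.3f}, intercept {intercept:.3f} ppt")
```

A slope different from 1 with a near-zero intercept indicates a proportional scale difference; a significant intercept or curvature in the residuals would suggest non-linearity in one of the scales.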

Short-lived organo-halides
Few laboratories reported results for short-lived halogenated compounds such as CHBr3, CH2Br2, and CH3I. However, recent interest in these gases (Read et al., 2008; Carpenter et al., 2009; Jones et al., 2011) warrants their inclusion. For CH2Br2 and CHBr3, only laboratories 1 and 15 provided results on independent scales, and laboratory 12 provided results on scales obtained from laboratory 1. Because the mole fractions of these gases in the IHALACE cylinders were low (< 1 ppt, consistent with continental background air) and the relative analytical uncertainties were larger than those for many other gases, we compare laboratories 15 and 12 to the average of the initial and final NOAA analyses. There does not appear to have been a significant change in the mole fractions of CHBr3 and CH3I in the IHALACE cylinders during the experiment. An upward drift of ∼ 10 % is suggested for CH2Br2, although this is within the range of uncertainties.
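Using the mean of the initial and final NOAA analyses as the reference, and expressing the change between them as an apparent drift, can be sketched as follows. The function name and values are hypothetical, and the sketch does not test whether the drift exceeds measurement uncertainty (for CH2Br2 the text notes that the ∼ 10 % drift is within the range of uncertainties).

```python
def reference_and_drift(initial: float, final: float) -> tuple[float, float]:
    """Average of the initial and final reference analyses, plus the apparent
    change between them in percent of the initial value."""
    reference = 0.5 * (initial + final)
    drift_pct = 100.0 * (final - initial) / initial
    return reference, drift_pct


# Hypothetical sub-ppt CH2Br2-like values from two reference analyses.
ref, drift = reference_and_drift(0.80, 0.88)
print(f"reference {ref:.3f} ppt, drift {drift:+.1f} %")
```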
For CHBr3, the difference between laboratories 1 and 15 was 30 %, while the difference between laboratories 1 and 12 (same scale) was 6 % (Fig. 6a). Jones et al. (2011) reported scale differences as high as 70 % and scale transfer differences of ∼ 15 %.
For CH3I, results from most laboratories were in good agreement, with the exception of laboratory 15, which was a factor of 2 higher than the rest (Fig. 5c). Jones et al. (2011) also reported differences of a factor of 2 for CH3I. Overall, the comparison of CH2Br2, CHBr3, and CH3I scales is promising considering that these gases are typically more difficult to measure than CFCs and HCFCs, and mole fractions in the IHALACE cylinders were less than 1 ppt. Comparisons carried out at higher mole fractions (2-5 ppt) might make quantifying scale differences easier for these gases.

Nitrous oxide, SF6, methane, and carbonyl sulfide
The long atmospheric lifetime and small spatial gradients of nitrous oxide (N2O) mean that compatibility requirements are stringent. For multiple datasets to be optimally useful in inverse modeling, data should be compatible to within 0.1 ppb (WMO/GAW, 2009). This level of compatibility is often not met using ECD-based methods (WMO/GAW, 2011). However, progress has been made in recent years, and studies involving multiple datasets have been performed (Hirsch et al., 2006; Huang et al., 2008; Nevison et al., 2011; Saikawa et al., 2013).
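The 0.1 ppb compatibility goal and the paired calibration ratios discussed for N2O in this section can be expressed as two small helpers. This sketch assumes the goal applies to the absolute difference between laboratories, and that a parenthetical uncertainty on a ratio, such as 1.0025 (0.0002), is a standard error of paired ratios, which the text does not state; the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def meets_goal(diff_ppb: float, goal_ppb: float = 0.1) -> bool:
    """True if an inter-laboratory N2O difference is within the goal."""
    return abs(diff_ppb) <= goal_ppb


def mean_ratio(lab_a: list[float], lab_b: list[float]) -> tuple[float, float]:
    """Mean of paired ratios a/b and its standard error (>= 2 pairs)."""
    if len(lab_a) != len(lab_b) or len(lab_a) < 2:
        raise ValueError("need at least two paired results")
    ratios = [a / b for a, b in zip(lab_a, lab_b)]
    return mean(ratios), stdev(ratios) / sqrt(len(ratios))


# Differences quoted in the text for undiluted samples (ppb).
print(meets_goal(0.08))  # NOAA vs. SIO
print(meets_goal(1.37))  # NIST vs. NOAA
```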
Nitrous oxide results varied by 0.72-0.87 ppb (0.23-0.27 %) among three scales (Fig. 6b). The average difference between NOAA and SIO (undiluted samples) was 0.08 ppb, which is comparable to differences reported by Hall et al. (2007) and Huang et al. (2008). The calibration ratio between laboratory 17 (CSIRO) and laboratory 2 (SIO) was 1.0025 (0.0002), which differs only slightly from the ratio of 1.0017 reported by Huang et al. (2008). There also appears to be good agreement between these scales and the NIST scale, except that the best agreement is shown by laboratory 15 (UM-2, adopted scale) and not by laboratory 7 (NIST, scale origin). The difference between NIST and NOAA based on undiluted samples is 1.37 ppb, or 0.4 %. This is larger than, and of opposite sign to, the difference reported by Hall et al. (2007). Among laboratories on the same scale, compatibility is excellent for some (1, 5, 8; 2, 9, 17) and poorer for others (1, 3; 2, 13). We note that laboratory 13 recently adopted the NOAA-2006 N2O scale and that its compatibility is much improved. The average difference between laboratories 1 and 8 (KIT) is < 0.1 ppb for undiluted samples. This is an important result because of the roles served by these laboratories within the WMO/GAW program (NOAA as the Central Calibration Laboratory for N2O, and KIT as the World Calibration Center). It is essential that these laboratories remain closely linked. Finally, the summer/winter difference between the two undiluted cylinders (∼ −0.2 ppb) was detected by most laboratories (1, 2, 3, 5, 8, 9, 13, 15, 17) and overestimated by some (laboratories 7, 12, 14). While the results are encouraging overall, there is room for improvement in inter-laboratory compatibility.

SF6 was reported on four scales (Fig. 6c), three of which are in excellent agreement. Based on undiluted samples, the ratios of commonly used scales relative to the NOAA-2006 scale are 0.9954 (University of Heidelberg) and 0.9991 (SIO). The SIO/NOAA ratio is close to the mean scale factor of 0.998 ± 0.005 reported by Rigby et al. (2010) based on co-located sampling at five stations. While the three primary scales in use by the atmospheric science community show good agreement, scale transfer issues exist: relatively large differences between laboratories 1 and 4 (NOAA-2006) and laboratories 2, 11, and 14 (SIO-05) are apparent. However, it is encouraging that the precision reported by some laboratories is excellent. The average difference between summer and winter samples measured by laboratory 1 was 0.03 ppt. This difference, as measured by laboratories 2, 5, 6, 9, 13, and 14, was 0.03, 0.02, 0.06, 0.03, and 0.02 ppt, respectively. Thus, some laboratories are capable of resolving very small mole fraction differences.
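Applying the SF6 ratios above amounts to dividing a result by the ratio of its scale to NOAA-2006. The following is a minimal sketch, assuming, as their description relative to NOAA-2006 implies, that the ratios are defined as (scale value)/(NOAA-2006 value) for the same sample. The dictionary keys, function names, and the 6.50 ppt input are hypothetical. The sketch also checks the SIO/NOAA ratio against the 0.998 ± 0.005 factor of Rigby et al. (2010).

```python
# Ratios X_scale / X_NOAA-2006 for SF6 from undiluted IHALACE samples (values from the text).
SF6_RATIO_TO_NOAA = {"Heidelberg": 0.9954, "SIO-05": 0.9991}


def to_noaa_2006(value_ppt: float, scale: str) -> float:
    """Express an SF6 result reported on `scale` on the NOAA-2006 scale."""
    return value_ppt / SF6_RATIO_TO_NOAA[scale]


def consistent_with(ratio: float, mean: float = 0.998, uncertainty: float = 0.005) -> bool:
    """True if a ratio falls within the stated uncertainty of a reference factor."""
    return abs(ratio - mean) <= uncertainty


if __name__ == "__main__":
    print(f"{to_noaa_2006(6.50, 'SIO-05'):.3f} ppt on NOAA-2006")  # hypothetical input
    print(consistent_with(SF6_RATIO_TO_NOAA["SIO-05"]))
```

Such a conversion only accounts for the difference between primary scales. It cannot correct the scale transfer differences noted above for laboratories that adopt these scales.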
Although methane was not a focus of IHALACE, twelve laboratories reported CH4 mole fractions on three scales (Fig. 6d). Scale differences are small. The relationship between the NOAA04 scale and the Tohoku University scale derived by Dlugokencky et al. (2005), a ratio of 1.0003, is confirmed here: the average ratio of four laboratories on the Tohoku University scale relative to the NOAA results is 1.0003 ± 0.0002. Both the NOAA04 and Tohoku University scales appear to have been propagated to within 2 ppb, which is the WMO/GAW compatibility goal for measurements on the same scale (WMO/GAW, 2009). All laboratories also detected the 24-28 ppb summer/winter difference to within a few ppb. The only disagreement is between laboratories 7 and 15, …

Carbonyl sulfide (COS) data were not part of the original data submission and are not shown. However, scale comparison information is of interest, particularly since measurements of COS may be useful as a tracer of photosynthesis (Montzka et al., 2007; Campbell et al., 2008). The standard deviation of COS data from four independent scales (winter samples) was 25 ppt (3.9 %). Two scales (1, 10) showed higher COS amounts, while two scales (15, 19) tended to be lower. All laboratories detected a large difference between summer and winter samples, consistent with the seasonal drawdown of COS over the continental US in summer (Montzka et al., 2007) (Table S2, Supplement). The average difference between winter and summer values was 169 ppt (laboratories 1, 10, 19). This large seasonal difference, combined with results from the diluted sample, allows linear relationships among COS scales to be estimated. Here we relate each scale to the NOAA scale as $Y = aX + b$, where $X$ is the NOAA value and $Y$ is the value on another scale. The coefficients are $a = 1.064$, $b = -33$ ppt (laboratory 10); $a = 0.928$, $b = 17$ ppt (laboratory 15); and $a = 0.985$, $b = -35$ ppt (laboratory 19). For example, the relationship between laboratory 10 and NOAA is $Y_{10} = 1.064\,X_{\mathrm{NOAA}} - 33$ ppt.
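Because each relationship is linear, it can be applied in either direction. The sketch below uses the coefficients listed above (in ppt) to map a NOAA-scale COS value onto the scales of laboratories 10, 15, and 19 and back. The 600 ppt input and the function names are hypothetical. The fits should not be assumed valid outside the mole fraction range spanned by the IHALACE samples.

```python
# Y = a * X + b, with X on the NOAA scale and Y on the laboratory's scale (ppt).
COS_FIT = {10: (1.064, -33.0), 15: (0.928, 17.0), 19: (0.985, -35.0)}


def noaa_to_lab(x_noaa_ppt: float, lab: int) -> float:
    """Convert a NOAA-scale COS value to the scale used by `lab`."""
    a, b = COS_FIT[lab]
    return a * x_noaa_ppt + b


def lab_to_noaa(y_lab_ppt: float, lab: int) -> float:
    """Invert the linear relationship: laboratory scale back to the NOAA scale."""
    a, b = COS_FIT[lab]
    return (y_lab_ppt - b) / a


if __name__ == "__main__":
    x = 600.0  # hypothetical NOAA-scale value, ppt
    for lab in sorted(COS_FIT):
        y = noaa_to_lab(x, lab)
        print(f"lab {lab}: {y:.1f} ppt (round trip {lab_to_noaa(y, lab):.1f} ppt)")
```

At 600 ppt this places laboratory 10 slightly above NOAA and laboratories 15 and 19 below it, consistent with the grouping of higher and lower scales described above.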

Linearity issues
The atmospheric mole fractions of most of the trace gases studied in this experiment have not been constant over time. CFC mole fractions increased rapidly in the 1980s and have been declining slowly over the last decade. Atmospheric mole fractions of some HCFCs (CFC replacements) continue to increase (Montzka et al., 2009; O'Doherty et al., 2004). Thus, a scale comparison based on air samples collected at one point in time may not be valid for other time periods. We address this briefly by comparing results for diluted and undiluted samples, focusing on gases for which sampling issues and precision are less likely to influence the results. To simplify the analysis, we define a linearity factor (LF) as the ratio between the scale factors $X_i/X_1$ obtained for the diluted and undiluted samples, where $X_i$ is the result from laboratory $i$ and $X_1$ is the NOAA result. This factor indicates whether a constant scale factor might be applied over a 20-30 % mole fraction range: an LF of 1.0 results when two scales differ by the same constant factor at both ambient and sub-ambient mole fractions.
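The linearity factor can be computed directly from the four results involved. The source does not preserve the orientation of the defining equation, so this sketch assumes LF is the diluted-sample scale factor divided by the undiluted-sample scale factor. The function names and mole fractions are hypothetical. The usage lines check the property stated above: if a laboratory differs from NOAA by the same constant factor at both mole fractions, LF is 1.0.

```python
def scale_factor(x_lab: float, x_noaa: float) -> float:
    """Scale factor X_i / X_1 of laboratory i relative to NOAA for one sample."""
    return x_lab / x_noaa


def linearity_factor(lab_dil: float, noaa_dil: float,
                     lab_undil: float, noaa_undil: float) -> float:
    """LF as the ratio of diluted to undiluted scale factors (assumed orientation)."""
    return scale_factor(lab_dil, noaa_dil) / scale_factor(lab_undil, noaa_undil)


if __name__ == "__main__":
    # Constant 1 % offset at both mole fractions (hypothetical ppt values): LF = 1.0.
    print(round(linearity_factor(448 * 1.01, 448, 535 * 1.01, 535), 6))
    # Hypothetical offsets of 1 ppt at 535 ppt and 10 ppt at 448 ppt (sign assumed):
    # LF departs from 1.0 by about 2 %.
    print(round(linearity_factor(458, 448, 536, 535), 4))
```

With the opposite orientation, the second value would be the reciprocal (about 0.98). The magnitude of the departure from 1.0 is essentially unchanged.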

For CFC-113, linearity factors from four laboratories that prepare primary standards are close to the same value (1.02), and one laboratory (15) shows a ratio close to 1.00 (Fig. 8a). Because several laboratories show similar results relative to the NOAA ECD-based CFC-113 measurements, the NOAA ECD-based CFC-113 scale may be affected by a co-elution, or the non-linear response of the NOAA ECD may not have been fully characterized.
CFC-12 ECD results from NOAA and SIO differ by only 1 ppt at 535 ppt, but by 10 ppt at 448 ppt (LF = 1.0218 ± 0.0032, 1 s.d.). This suggests that long-term records based on NOAA and SIO measurements might diverge at lower mole fractions. While these differences are relatively small on a percentage basis, they are larger than the typical analytical precision. SIO MS results are more consistent with NOAA ECD results over a 20 % mole fraction range (LF = 1.0058 ± 0.0030). Similarly, mole fraction-dependent differences relative to NOAA ECD results were also small for laboratories 7, 15, and 19.
We can use the LF results to estimate potential errors introduced by using fixed scale factors to adjust calibration scales over a 20 % mole fraction range. For example, CFC-11 LFs are generally within 1 % of 1.0, but the difference between laboratories 7 and 15 is nearly 3 %. Thus, if results on scale 7 (NIST) were adjusted to scale 15 (UM-2) using a fixed scale factor based on undiluted samples from this experiment, errors of up to 3 % could result at mole fractions 20 % lower than those at which the fixed factor was derived. In contrast, results from laboratories 2 and 19 would likely be subject to much less uncertainty when adjusted by fixed scale factors over this range, since the LFs of these laboratories are nearly identical.
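The reasoning above can be made explicit. Suppose a fixed factor derived from undiluted samples is used to convert results from one scale to another. The relative error at the diluted mole fraction then equals the ratio of the two laboratories' LFs minus one, because the NOAA results cancel. The sketch below assumes the same diluted-over-undiluted orientation of LF used in the illustration of the definition. With the opposite orientation the ratio inverts, but for small departures its magnitude is nearly the same. The LF values are hypothetical, chosen only to reproduce differences of the size described in the text.

```python
def fixed_factor_error(lf_source: float, lf_target: float) -> float:
    """Relative error from converting source-scale results to the target scale
    with a factor derived at the undiluted mole fraction, evaluated at the
    diluted mole fraction.

    Assumes LF = (X_i/X_NOAA)_diluted / (X_i/X_NOAA)_undiluted for both
    laboratories, so the NOAA values cancel and the error is
    lf_source / lf_target - 1.
    """
    return lf_source / lf_target - 1.0


if __name__ == "__main__":
    # Hypothetical LFs about 3 % apart (compare laboratories 7 and 15 for CFC-11).
    print(f"{100 * fixed_factor_error(0.985, 1.013):+.1f} %")
    # Hypothetical, nearly identical LFs (compare laboratories 2 and 19).
    print(f"{100 * fixed_factor_error(1.004, 1.005):+.1f} %")
```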
Linearity factors for CH3CCl3 are close to 1.0 for most laboratories. However, LFs for several laboratories are less than 1.0, with an average of 0.986 for laboratories 9, 11, 14, 15, 16, 17, and 19 (Fig. 8b). This is likely due to the choice of reference values (NOAA ECD) used to calculate LF. If NOAA MS results are used instead, LFs increase by an average of 1.2 %, and the same group of laboratories would then show an average LF of 0.999. This suggests a slight non-linearity in the NOAA ECD data. The linearity of the CH3CCl3 response could be important when interpreting historical CH3CCl3 data because of the rapid decline in CH3CCl3 mole fraction over the last two decades.
Linearity factors for CCl4 (Fig. 8b) show little variation among laboratories that prepare primary standards (1, 2, 7, 15, 19), with most LFs within 1 % of 1.00. This suggests that non-linear effects are not a major factor contributing to the observed 5 % scale differences.
Only small concentration-dependent scale differences were observed for HCFC-141b, HCFC-22 (Fig. 8c), and HCFC-142b (not shown) between commonly used scales (laboratories 1, 2, 15, 16). Therefore, application of a constant scale factor for these gases is unlikely to result in large errors over a limited mole fraction range. Linearity factors for HCFC-22 are nearly all within 1 % of 1.00. LFs for HCFC-141b range from 0.98 to 1.02, but in most cases the differences between diluted and undiluted samples are of the same order as the analytical precision. HFC-134a also shows good linearity in this comparison, with most LFs within 1 s.d. of 1.0. Better scale transfer and linearity factors close to 1.0 for HCFCs may be partly due to the fact that MS instruments are more commonly used to measure HCFCs, and their response tends to be more linear than that of an ECD.
Nitrous oxide, which is typically measured using ECDs, showed discrepancies in scale relationship and scale transfer in some cases (Fig. 6b). While the NOAA-NIST (1, 7) difference is consistent for both diluted and undiluted samples, the NOAA-SIO (1, 2) difference increases substantially at the lower mole fractions, and this difference is not consistent among other laboratories linked to the SIO-98 scale (2, 9, 13, 14, 17).

Laboratories 9 and 17 show LFs close to 1.0 on the SIO-98 N2O scale, but laboratory 2 (scale origin) does not (Fig. 8). This discrepancy could be due to the fact that the SIO-98 N2O scale was developed over a limited mole fraction range, and the diluted samples measured here are outside that range.
For halon-1211, scale transfer appears to be excellent for both diluted and undiluted samples (Fig. 8d), with linearity factors remarkably consistent near 1.0. Good scale comparability and transfer were also achieved for CH4 (not shown). CH4 is commonly measured using a flame ionization detector, which typically has a linear response. While relative uncertainties are larger for SF6 than for N2O and halon-1211, most linearity factors are close to 1.0. This is important because SF6 mole fractions are in…

Linearity factors shown here are based on a limited dataset and do not include time-dependent sampling issues that might influence real-world data. Long-term data records from similar locations should always be considered when applying scale factor adjustments across changes in mole fraction and time. Further, agencies responsible for collecting the original data should be consulted whenever the application of scale factors is considered.

Summary
A comparison of numerous halogenated and other trace gases was carried out among 19 laboratories. Overall, scale differences are modest for a number of compounds. These results reveal substantial improvements in calibration over previous comparisons (Rasmussen, 1978; Fraser, 1979; Prinn and Zander, 1998). However, scale differences for most compounds are large compared to atmospheric gradients, and merging data on independent scales without regard to scale differences is not advised. Further, in many cases, differences due to scale propagation were found to be as large as or larger than differences between independent scales. Scale differences range from 2 % for CFC-11 and CFC-12 to a factor of two for CH3I. Relatively large discrepancies among major scales were identified for CHCl3, CH2Cl2, CH3I, CH2Br2, and CHBr3. The standard deviation of CCl4 results on five scales was 1.9 %. Uncertainties in top-down CCl4 emission estimates due solely to calibration uncertainties are less than 5 %. Scale differences for CH4, N2O, and SF6 reported previously appear to be robust. Scale propagation errors are relatively small for some gases (halon-1211, HFC-134a) and larger for others (CH3CCl3). Differences between measurement methods (ECD versus MS) are apparent, suggesting that co-elution or matrix effects may be important for some gases.
As a result of this experiment, cooperation among laboratories making similar measurements has improved. These results, available to participants since 2008, have stimulated the exchange of calibrated air samples and data in efforts to understand some of the observed differences on a bilateral or multilateral basis. While these results provide a framework for relating calibration scales and measurement results among measurement programs, they should not be the sole basis upon which such relationships are derived. A one-time assessment of measurement differences is not sufficient to fully characterize all aspects of the measurement of these and other trace gases.

Levin, I., Naegler, T., Heinz, R., Osusko, D., Cuevas, E., Engel, A., Ilmberger, J., Langenfelds, R. L., Neininger, B., Rohden, C. v., Steele, L. P., Weller, R., Worthy, D. E., and Zimov, S. A.: The global SF6 source inferred from long-term high precision atmospheric measurements and its comparison with emission inventories, Atmos. Chem. Phys., 10, 2655-2662, doi:10.5194/acp-10-2655-2010, 2010.

… Lett., 36, L03804, doi:10.1029/2008GL036475, 2009.

Simmonds, P. G., O'Doherty, S., Nickless, G., Sturrock, G. A., Swaby, R., Knight, P., Ricketts, J., Woffendin, G., and Smith, R.: Automated gas chromatograph mass spectrometer for routine atmospheric field measurements of the CFC replacement compounds, the hydrofluorocarbons and hydrochlorofluorocarbons, Anal. Chem., 67, 717-723, 1995.

Slemr, J., Slemr, F., Partridge, R., D'Souza, H., and Schmidbauer, N.: Accurate Measurements of Hydrocarbons in the Atmosphere (AMOHA): Three European intercomparisons, J. Geophys. Res., 107, 4409, doi:10.1029/2001JD001357, 2002.

Table 6. Scale factors (relative to NOAA) derived from tropospheric global mean mole fractions reported in Table 1.1 of Montzka and Reimann (2011and 2007, and from undiluted IHALACE samples (mean and standard deviation) for representative laboratories.
From these data we can compare global mean factors from AGAGE and UCI (University of California Irvine) with IHALACE factors from SIO and UCI-2, respectively. Unless otherwise specified, ratios were derived relative to NOAA ECD results.

… (mole fraction, ppt = pmol mol⁻¹, parts per trillion) color-coded by calibration scale, with scale identifiers shown along the top axis: similar colors denote similar scales; open (closed) symbols correspond to cylinders filled in winter (summer); circles denote laboratories that prepare primary standards, and diamonds denote laboratories that adopt existing scales. Non-integer laboratory numbers indicate additional results submitted by or associated with the corresponding laboratory (different instruments, different calibration scales, etc.). For example, for CFC-11, laboratory 2 submitted data from two instruments on the same scale, while laboratory 6 submitted data on two different scales.

Fig. 8. Linearity factors relative to NOAA for select gases. Filled symbols denote laboratories that prepare their own standards, while open symbols denote those that derive scales from others (see Table S1 in the Supplement for scale definitions). Note that symbol colors do not indicate common scales as they do in Figs. 1-6. Data have been shifted on the x axis for clarity. A ratio of 1.0 corresponds to scale factors that are the same for both diluted and undiluted samples (NOAA results used for comparison are 1.0 by default and are not shown). Error bars are 1 s.d. Linearity factors are relative to NOAA ECD results in (a), (b), and (d), and to NOAA MS results in (c).