Revision of the World Meteorological Organization Global Atmosphere Watch (WMO/GAW) CO2 calibration scale

The NOAA Global Monitoring Laboratory serves as the World Meteorological Organization Global Atmosphere Watch (WMO/GAW) Central Calibration Laboratory (CCL) for CO2 and is responsible for maintaining the WMO/GAW mole fraction scale used as a reference within the WMO/GAW program. The current WMO-CO2-X2007 scale is embodied by 15 aluminum cylinders containing modified natural air, with CO2 mole fractions determined using the NOAA manometer from 1995 to 2006. We have made two minor corrections to historical manometric records: fixing an error in the applied second virial coefficient of CO2 and accounting for loss of a small amount of CO2 to materials in the manometer during the measurement process. By incorporating these corrections, extending the measurement records of the original 15 primary standards through 2015, and adding four new primary standards to the suite, we define a new scale, identified as WMO-CO2-X2019. The new scale is 0.18 μmol mol−1 (ppm) greater than the previous scale at 400 ppm CO2. While this difference is small in relative terms (0.045 %), it is significant in terms of atmospheric monitoring. All measurements of tertiary-level standards will be reprocessed to WMO-CO2-X2019. The new scale is more internally consistent than WMO-CO2-X2007 owing to revisions in propagation and should result in an overall improvement in atmospheric data records traceable to the CCL.

Introduction
Measurements of the atmospheric distribution of carbon dioxide (CO2) are essential to understanding sources and sinks of this powerful greenhouse gas. We need well-calibrated measurements to track the history of the global abundance of CO2 because it is the main driving force of anthropogenic climate change. Small differences in the relative abundances of CO2 and other trace gases observed at different locations, combined with information on atmospheric transport and mechanisms for land-atmosphere-ocean exchange, can provide constraints on estimates of the sources and sinks of CO2. Measurements are made at numerous sites around the globe in conjunction with the World Meteorological Organization (WMO) Global Atmosphere Watch (GAW) program and through regionally coordinated programs (e.g., Integrated Carbon Observing System, ICOS).
Because the atmospheric gradients of CO2 are small in the background atmosphere far from sources of pollution, the WMO/GAW has adopted a single reference scale, maintained and disseminated by a designated Central Calibration Laboratory (CCL), upon which to base all measurements made within the program. The quantity to be measured is the mole fraction of CO2 in dry air (µmol mol−1, abbreviated as ppm, from parts per million) because it is conserved when air expands or contracts or when water vapor is added or removed. The WMO community has set network compatibility goals for the measurements, 0.1 ppm in the Northern Hemisphere and 0.05 ppm in the Southern Hemisphere, aimed at minimizing bias between measurement sites in the network (WMO, 2020). To help meet these stringent goals, the WMO/GAW community voted in their 1995 meeting for the NOAA Climate Monitoring and Diagnostics Laboratory (subsequently known as the Global Monitoring Division and currently known as the Global Monitoring Laboratory) to serve as the CCL for CO2. The Scripps Institution of Oceanography (SIO) initially served in this capacity (Keeling et al., 1986) before responsibilities were transferred to NOAA. The WMO/GAW CO2 calibration scale also serves as a reference linking other measurement programs, such as those involving aircraft and total column measurements, to the surface measurement networks (Wunch et al., 2010; Messerschmidt et al., 2011).
As the CCL for CO2, NOAA maintains a set of 15 aluminum high-pressure gas cylinders containing modified natural air, with CO2 spanning the range 250-520 ppm. CO2 mole fractions were determined using an absolute method based on manometry (Zhao et al., 1997). These cylinders serve as primary standards and along with their assigned mole fractions constitute the WMO-CO2-X2007 mole fraction scale, where X is used to denote mole fraction and 2007 is the year in which the assigned values were adopted (hereafter simplified to X2007). The scale is distributed in high-pressure aluminum cylinders containing natural air (tertiary-level standards) with value assignment made by comparison against secondary standards (also natural air), which are traceable to the primary standards. The CCL at NOAA is a designated institute of WMO, which is a signatory to the Comité International des Poids et Mesures Mutual Recognition Arrangement (CIPM-MRA). Accordingly, calibration and measurement capabilities are listed in the Key Comparison Database maintained by the Bureau International des Poids et Mesures (BIPM) (http://kcdb.bipm.org/, last access: 1 November 2020). It is through primary methods, such as manometry, and comparison to other validated methods, such as gravimetry, that traceability to the International System of Units (SI) is established (Milton, 2013).
Since 1995, primary standards have been measured every 2-3 years to develop a measurement history and monitor for possible drift. Each measurement period is called an "episode". The X2007 scale was developed following the 2006 measurement episode (Tans et al., 2011). We have performed three measurement episodes since 2006 (2009, 2012, and 2015) to assess the X2007 assigned values using methods similar to those in use in 2006 (Zhao and Tans, 2006). Results from the 2009, 2012, and 2015 episodes were sufficiently close to the X2007 assignments that no updates to the scale have been made since 2007.
While the X2007 scale has served the community well for more than a decade, there are some compelling reasons to update the scale: (1) we discovered an error in the computer code used to reduce the manometer data; (2) we have improved our experimental methods in recent years, leading to a more accurate measure of CO2 in the primary standards; (3) we would like to expand the range of the WMO/GAW scale to 800 ppm to better constrain instrument response and also to provide support for measurements obtained closer to emission sources, such as urban areas; and (4) we have recently developed a new measurement system used to transfer the scale to reference gases (Tans et al., 2017), which now allows us to harmonize the primary standards and define the scale with higher precision than what can be done with a single standard (see Sect. 6).
Here we introduce a revision of the WMO/GAW CO2 scale, with the new scale identified as WMO-CO2-X2019 (hereafter referred to as X2019), and describe its implementation. This article is organized as follows. We first provide some background on the manometric method. We then describe two corrections to previous manometric results: one rectifies a calculation error related to the second virial coefficient of CO2, and the other accounts for CO2 absorption or adsorption to manometer surfaces (most likely O-rings) that occurs during the measurement process. The magnitude of the overall correction is small (∼ 0.18 ppm at 400 ppm) but significant in terms of network compatibility goals (WMO, 2020). We have applied these corrections to 23 years of manometric measurements. By reassigning CO2 mole fractions to previous and newly introduced primary standards, we define the X2019 scale and explore differences between X2019 and X2007. We provide an estimate of the uncertainty associated with CO2 reference gases, updating the work of Zhao and Tans (2006). Finally, we propagate the X2019 scale to all reference gases analyzed by the CCL and discuss the implementation of the X2019 scale.

The NOAA manometer
The manometric procedure is described in Zhao et al. (1997) and Zhao and Tans (2006). Briefly, the manometer consists of two glass volumes housed in a temperature-controlled oven, two glass traps for cryogenically extracting CO2 from air and purifying the CO2, and devices to measure pressure and temperature (Fig. 1). During a measurement experiment, the manometer is evacuated to ∼ 5 mtorr (0.7 Pa), and then gas from a cylinder is loaded into the larger of the two volumes (large volume, ∼ 6 L). The large volume is flushed for 10 min at 200 mL min−1, and the exit gas stream is monitored by non-dispersive infrared (NDIR) to ensure a stable CO2 signal. Inability to observe a stable CO2 signal (< 0.1 ppm) can result in the run being aborted. The large volume is then sealed off and allowed to equilibrate for 5 min, and the large volume temperature and pressure are recorded. The air sample is then pumped across the glass traps, which are held at liquid nitrogen temperature, to cryogenically extract the CO2 from the air sample. The CO2 is then purified (to remove H2O) by alternately freezing using liquid nitrogen and warming to ∼ −67 °C. Finally, the purified CO2 is cryogenically trapped into the smaller of the two volumes (∼ 7 mL) and allowed to sublimate. The pressure and temperature of CO2 in the small volume are recorded at ∼ 30 s intervals as the CO2 warms and equilibrates to the oven temperature.
The mole fraction of CO2 is determined from measurements of pressure, temperature, and the ratio of the two volumes. The volume ratio is determined by a gas expansion method using two additional volumes, which are also housed in the oven. A gas, usually air or nitrogen, is expanded into successive volumes, with P and T measured at each stage, to bridge the difference between small and large volumes (Zhao et al., 1997). The mole fraction of CO2, X_CO2, is calculated using

$$X_{\mathrm{CO_2}} = \frac{P_{\mathrm{CO_2}}}{P_{\mathrm{air}}}\,\frac{T_{\mathrm{air}}}{T_{\mathrm{CO_2}}}\,\frac{1}{V_{\mathrm{ratio}}}\,\frac{1 + \beta_{\mathrm{air}} P_{\mathrm{air}}/(R\,T_{\mathrm{air}})}{1 + \beta_{\mathrm{CO_2}} P_{\mathrm{CO_2}}/(R\,T_{\mathrm{CO_2}})} - X_{\mathrm{N_2O}}, \qquad (1)$$

where T and P are the temperatures and pressures of air in the large volume (air) and nearly pure CO2 in the small volume (CO2), β_air and β_CO2 are second virial coefficients, R is the gas constant, V_ratio is the volume ratio (large / small), and X_N2O is the mole fraction of N2O in the air sample (measured separately by gas chromatography with electron capture detection) (Hall et al., 2011). Equation (1) is an alternate form of Eq. (8) from Zhao et al. (1997).
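The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the first-order virial form n = PV / (RT (1 + βP/(RT))) for both gases; the function name, argument names, and all numerical inputs (including the value used for β_air) are hypothetical and are not taken from the manometer software or records.

```python
R = 8.314462618  # molar gas constant, J mol-1 K-1


def co2_mole_fraction_ppm(p_air, t_air, p_co2, t_co2, volume_ratio,
                          beta_air, beta_co2, x_n2o_ppm=0.0):
    """Return the CO2 mole fraction (ppm) for one manometer run.

    Pressures in Pa, temperatures in K, virial coefficients in m3 mol-1,
    volume_ratio = large volume / small volume (dimensionless).
    """
    # First-order virial factors for air (large volume) and CO2 (small volume).
    z_air = 1.0 + beta_air * p_air / (R * t_air)
    z_co2 = 1.0 + beta_co2 * p_co2 / (R * t_co2)
    mole_ratio = (p_co2 / p_air) * (t_air / t_co2) * (z_air / z_co2) / volume_ratio
    # N2O is extracted together with CO2, so its mole fraction is subtracted.
    return mole_ratio * 1.0e6 - x_n2o_ppm


# Illustrative inputs only (assumed, not values from the manometer records).
x = co2_mole_fraction_ppm(p_air=100_000.0, t_air=310.0,
                          p_co2=34_280.0, t_co2=310.0,
                          volume_ratio=857.0,
                          beta_air=-5.0e-6, beta_co2=-112.8e-6,
                          x_n2o_ppm=0.33)
print(f"X_CO2 = {x:.2f} ppm")
```

Setting both virial coefficients to zero reduces the function to the ideal-gas ratio, which is a convenient sanity check on the inputs.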

Reprocessing historical manometer data
Manometer data were obtained using software designed to read and store temperature and pressure data during a manometer run and calculate the CO2 mole fraction. Prior to each manometric episode, temperature and pressure were referenced to national standards (and to the SI) through calibration at accredited laboratories. Volume ratio experiments were performed prior to and during each episode (e.g., Fig. S6 in the Supplement). Pressure and temperature calibration coefficients needed to convert measured variables to P and T, as well as the volume ratio, were hard-coded in this software. During the final stage of measurement, CO2 was calculated periodically as the gas in the small volume warmed and equilibrated to oven temperature. An example of CO2 mole fraction calculated as a function of time is shown in Fig. 2. Mole fractions of CO2 were previously determined as the maximum X_CO2 calculated during the final stage (Fig. 2), adjusted for X_N2O. There are two minor issues associated with this method that we correct with the implementation of the X2019 scale. First, we recently discovered an error in the software used to calculate X_CO2. The second virial coefficient for CO2 (β_CO2) (Sengers et al., 1971) was calculated for a temperature that was 10 K higher than the actual T_CO2 (320 K instead of 310 K) due to an interpolation error. Temperature was recorded correctly, but β_CO2 was calculated incorrectly. Consequently, X_CO2 was underestimated by about 0.03 ppm at 400 ppm. Second, we recognize that the pressure in the small volume decreases slowly with time after the temperature of the small volume stabilizes (Fig. 2). For the 380 ppm sample shown in Fig. 2, the rate of change in pressure is −1 × 10^−5 kPa s−1, i.e., −0.036 kPa h−1. We suspect that CO2 absorbs to Viton O-rings and possibly adsorbs to surfaces of the small volume (Fig. 3).
Separate tests conducted with pure CO2 and Viton O-rings in a test tube revealed CO2 loss rates comparable to what is observed in the manometer small volume (unpublished data). Essential to the development of the X2019 scale was revisiting previous data and making corrections for the incorrect β_CO2 and the loss of CO2 that occurred prior to the maximum measured X_CO2.
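The results from all manometric determinations are stored in a database. Historical manometer results were adjusted using the following equation:

$$X_{\mathrm{CO_2,corrected}} = X_{\mathrm{CO_2,original}} + X_{\mathrm{virial\_correction}} + X_{\mathrm{loss\_correction}}. \qquad (2)$$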

Correcting for β_CO2
For X_virial_correction, we first updated the data reduction software to calculate β_CO2 by correctly interpolating between the same β_CO2 coefficients used to define X2007 (−112.8 cm3 mol−1 at 310 K and −104.8 cm3 mol−1 at 320 K; Zhao et al., 1997). We then used the correct β_CO2 to calculate X_CO2 from pressure and temperature recorded in manometric data files and compared the result to X_CO2 calculated using the original (incorrect) values for β_CO2. Figure 4 shows differences between the updated results (β_CO2 correct) and the original X_CO2 (β_CO2 incorrect). There are three representative periods that correspond to three nominally different volume ratios. The data show compact relationships with CO2 mole fraction, as expected, since the mole fraction determined is largely a function of the pressure of CO2 collected in the small volume. During each manometric determination, several temperatures were recorded. Since there are periods for which we do not know specifically which temperature records were used or the exact volume ratio used in the original calculation, we used three polynomial functions to estimate X_virial_correction corresponding to three time periods (1996-1999, 1999-2003, and 2004-2016). The uncertainty associated with the estimated X_virial_correction is less than 0.01 ppm.
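To give a feel for the size of the virial correction, the sketch below interpolates β_CO2 between the two coefficients quoted above and compares the result at the true temperature with the result at a temperature 10 K too high. It is a simplified point estimate, not the polynomial functions used for the scale: linear interpolation, the function names, and the small-volume CO2 pressure of 25 kPa are assumptions made for illustration.

```python
R = 8.314462618  # molar gas constant, J mol-1 K-1

BETA_310 = -112.8  # cm3 mol-1 at 310 K (quoted in the text)
BETA_320 = -104.8  # cm3 mol-1 at 320 K (quoted in the text)


def beta_co2(t_kelvin):
    """Linearly interpolate the second virial coefficient of CO2 (cm3 mol-1)."""
    frac = (t_kelvin - 310.0) / 10.0
    return BETA_310 + frac * (BETA_320 - BETA_310)


def virial_correction_ppm(x_original_ppm, p_co2_pa, t_co2_k, t_error_k=10.0):
    """Estimate the correction for beta_CO2 evaluated t_error_k too high."""
    rt = R * t_co2_k
    # 1.0e-6 converts cm3 mol-1 to m3 mol-1.
    z_correct = 1.0 + beta_co2(t_co2_k) * 1.0e-6 * p_co2_pa / rt
    z_wrong = 1.0 + beta_co2(t_co2_k + t_error_k) * 1.0e-6 * p_co2_pa / rt
    # X_CO2 scales with 1/z, so the corrected value is X_original * z_wrong / z_correct.
    return x_original_ppm * (z_wrong / z_correct - 1.0)


# Assumed CO2 pressure in the small volume for a 400 ppm sample: 25 kPa.
print(f"{virial_correction_ppm(400.0, 25_000.0, 310.0):.3f} ppm")
```

Because β_CO2 is less negative at the higher temperature, the original calculation underestimated X_CO2, so the correction returned here is positive.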

Correcting for CO2 loss
To correct for CO2 loss, we assume that loss of CO2 to materials in the small volume begins soon after CO2 sublimes and occurs at a constant rate. By extending the manometer run time out by several hours, we can see that the loss rate decreases with time (see Sect. S1.2 in the Supplement). However, the loss rate is sufficiently linear over the short term that a linear correction is a reasonable approach. We derive loss rates by fitting a linear function to the calculated X_CO2, beginning ∼ 3 min after the maximum CO2 and fitting 10-12 min of data (Fig. 5). This period corresponds to near-constant temperature and a steady decrease in pressure. After obtaining a loss rate from each data file, we correct the existing CO2 record using the loss rate and elapsed time (expressed in terms of a measurement cycle, each approx. 30 s in duration):

$$X_{\mathrm{loss\_correction}} = -a\,\left(t_{\mathrm{maxCO_2}} + t_0\right), \qquad (3)$$

where a is the slope calculated from a record of CO2 vs. time as in Fig. 5, t_maxCO2 is the time corresponding to the CO2 maximum, and t_0 is the time at the start of the record, where we expect CO2 loss to begin. Since a < 0, X_loss_correction is positive. As an example, the maximum CO2 shown in Fig. 5 occurs at cycle 35 in the data file, ∼ 1050 s after the liquid nitrogen was removed from the small volume. The slope (a) is −0.0074 ± 0.0002 ppm min−1. If the loss of CO2 begins at time t_0 = 0, the correction required would be 0.13 ppm. After the liquid nitrogen is removed from the small volume, we estimate that the purified CO2 reaches a temperature of 273 K within 1 min and 300 K within 3 min. Adsorption of CO2 probably begins about 1 min after the liquid nitrogen is removed. For many data records, we know that there was a software delay of 3 min between the time the small volume was sealed off (and the liquid N2 removed) and the first data record. While this cannot be confirmed for all records, we include a 2 min delay: t_maxCO2 + 2 min (t_0 = 2 min). An error of 2 min in elapsed time would correspond to 0.015 ppm for a typical 400 ppm sample. Using an elapsed time of t_maxCO2 + 2 min (17.5 + 2 min, or 39 measurement cycles) in the above example, the loss correction is 0.14 ppm. All loss rates and estimated uncertainties are shown in Fig. 6.

Figure 1. … (1), which is shown as blue lines. After the temperature and pressure of the air in the large volume are recorded, the air is drawn from the large volume in the direction of arrow (2) (red lines) and through traps 1 and 2 to cryogenically trap the CO2. The CO2 is cryogenically purified in glass traps 1 and 2 and then transferred to the small volume where its pressure and temperature are determined. Auxiliary volumes (AVs) are used in separate experiments to determine the ratio of large and small volumes (volume ratio). The dashed line depicts a temperature-controlled oven housing the glass volumes and pressure gauge.

Figure 2. … Historical manometric records are time-stamped with "measurement cycle", which is shown on the upper x axis. Here, each measurement cycle corresponds to ∼ 30 s. Temperature probe T3 is adjacent to the small volume and is cooled to liquid nitrogen temperature during extraction.
There is some time dependence to the corrections applied, possibly due to changes in materials (valves, O-rings, etc.). The rate of CO2 loss has generally increased over time; however, it may have improved slightly after a new air-actuated valve with new Viton O-rings was installed in 2013.
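The linear loss correction described in this section can be sketched as follows. The example assumes the correction takes the form −a (t_maxCO2 + t_0), which reproduces the 0.13 ppm and 0.14 ppm values quoted in the text; the function names and the synthetic record are hypothetical, and an ordinary least-squares slope stands in for whatever fitting routine was actually used.

```python
def fit_slope(times_min, x_ppm):
    """Ordinary least-squares slope of X_CO2 (ppm) versus time (min)."""
    n = len(times_min)
    t_bar = sum(times_min) / n
    x_bar = sum(x_ppm) / n
    sxx = sum((t - t_bar) ** 2 for t in times_min)
    sxy = sum((t - t_bar) * (x - x_bar) for t, x in zip(times_min, x_ppm))
    return sxy / sxx


def loss_correction_ppm(slope_ppm_per_min, t_max_min, t0_min=2.0):
    """Loss correction for a negative slope and an elapsed time t_max + t0."""
    return -slope_ppm_per_min * (t_max_min + t0_min)


# Synthetic record (assumed): maximum at 17.5 min, fit window starts 3 min
# later and spans 11 min at 30 s intervals, with a loss rate of 0.0074 ppm/min.
times = [20.5 + 0.5 * i for i in range(23)]
record = [380.0 - 0.0074 * (t - 17.5) for t in times]

a = fit_slope(times, record)
print(f"slope = {a:.4f} ppm/min")
print(f"correction (t0 = 0 min) = {loss_correction_ppm(a, 17.5, 0.0):.2f} ppm")
print(f"correction (t0 = 2 min) = {loss_correction_ppm(a, 17.5, 2.0):.2f} ppm")
```

With the slope and timing from the worked example in the text (a = −0.0074 ppm min−1, maximum at 17.5 min), the two calls correspond to the t_0 = 0 and t_0 = 2 min cases.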

Summary of manometer results
The X2007 scale was derived by averaging results from seven manometric episodes (1996, 1998, 2000, 2001, 2003, 2004, 2006) (Table 1). In developing X2019, we examined data files back to 1996 and applied the corrections discussed previously (Fig. 7). There is not a 1 : 1 correspondence between original and reprocessed results. In a few cases, the original data appeared abnormal and were flagged when developing X2019. In other cases, we were either unable to find raw files corresponding to results in the database or the records were not sufficient to calculate a CO2 loss rate (data not stored for sufficient time). In all, we were able to recover and apply corrections to 93 % of the original data records. Higher variability in 1998 could be related to higher water vapor in samples extracted during that period. Manometric records from 1998 often did not show the characteristic single CO2 maximum. Instead, those records show an initial "CO2" peak, followed by a short decline, and then a secondary peak followed by the normal decline (see Fig. S1). This secondary peak could be related to H2O desorbing from surfaces in the small volume. We have seen this pattern recently when the manometer has not been run for several weeks and tends to show characteristics of residual moisture (more time required to evacuate the manometer and higher than normal X_CO2 results).

Figure 6. CO2 loss correction (X_loss_correction) applied to manometer results in developing the X2019 scale (a) and estimated uncertainty associated with the loss correction (b), which have been color-coded by year (see Sect. S2.4). The corrections are mole fraction and time-dependent. Filled circles correspond to runs in which two CO2 maxima were observed (1998 and 2004), and loss rates were determined from data after the second maximum. A total of 48 manometric runs were processed this way (8.9 % of the total).

For most of the records from 1998 and some records from 2004, X_loss_correction was determined from the time associated with the first peak in CO2, and the loss rate was determined after the second peak in CO2. We used the later loss rates because it appears that the initial slopes (loss rates) are impacted by evolution of H2O, and the loss rates calculated after the second peak in CO2 are more consistent with loss rates determined during other episodes. Although this introduces additional uncertainty, results from 1998 are generally consistent with those from other years (Fig. 7). Comparing 1998 results to other years, it would appear any potential impact of additional water vapor as an impurity is less than 0.1 ppm. Further, if we used the time associated with the second peak instead of that associated with the first peak, manometer results from 1998 and 2004 would be slightly greater, but this would translate into an increase of only 0.01 ppm in the average manometric values for primary standards in the 250-520 ppm range.

Figure 7. History of manometric results, showing CO2 (ppm) from the manometric database before corrections (black triangles), CO2 values after applying X_virial_correction and X_loss_correction (red circles), and results from 2017 and 2020 for which the second virial coefficient β_CO2 was calculated correctly and only the loss correction was applied (green diamonds). Note that cylinders CC71578, CA08231, CB11054, and CC71605 have a shorter measurement history.

It is also important to note that in May 2014 we damaged the small volume during routine maintenance. New glassware and a new air-actuated valve (Glass Expansion, Pocasset, MA) were installed in August 2014. This meant that the volume ratio, which had been essentially constant since 2004, needed to be re-established. After establishing traceability for temperature and pressure, we performed a number of volume ratio experiments and obtained a new volume ratio that was 2 % larger than the previous one. Results from the 2015 episode, with the new small volume and volume ratio, agree well with those from previous episodes. The mean difference between the 2012 episode and 2015 episode, for all primary standards in the 250-520 ppm range, is only 0.03 ppm.

Drift assessment
The mole fraction of CO2 (in air) in aluminum cylinders can increase with use (Langenfelds et al., 2005; Leuenberger et al., 2015; Schibig et al., 2018). Our experience suggests that X_CO2 is relatively stable over the useful life of a cylinder when used sparingly at flow rates ∼ 0.3 L min−1 or lower but can increase as the pressure drops below about 15 % of the fill pressure. However, it is worth noting that detecting small drift rates over decades is very difficult because it requires a stable reference with comparably low uncertainties. At the end of the 2015 measurement episode, all 15 primary standards contained at least a third of the original gas, with pressures of at least 4.4 MPa (600 psi), and most contained more than 6 MPa. Drift in the X2007 scale was assessed through repeated manometric measurement. Only AL47-103 (no longer in use) was found to be drifting. With the update to X2019, we applied corrections to the primary standards that were both a function of mole fraction and time. We therefore need to reassess the possibility of drift in the primary standards. We performed a weighted least-squares linear fit to the mean mole fraction determined during each episode.

Table 1. Primary standard CO2 mole fractions (ppm) determined using the NOAA manometer. A lower case "x" is used here to indicate that these are mean values determined from manometric measurement and have not yet been harmonized into a calibration scale. For X2007 we report the average manometric results from seven episodes (as the mean of the episode averages). For X2019 we averaged all valid recoverable data from 1996 to 2017 after correcting for β_CO2 and CO2 loss. Note that primary ND17440 was put into service in 2010 to replace a standard that was thought to be drifting upward. ND17440 was not part of the original X2007 scale. CC71605 includes data from 2020.

Uncertainties were estimated by combining the manometer repeatability during each episode (σ_i/√N_i), where σ_i is the standard deviation of results within episode "i" and N_i is the number of measurements during that episode, with the relative uncertainty in the volume ratio and the average uncertainty associated with X_virial_correction and X_loss_correction for each episode (0.02-0.04 ppm). We lack sufficient information to fully evaluate the uncertainty in the volume ratio dating back to the earliest periods, so we assume that our current uncertainty assessment is valid for the entire record. We consider each episode independent since traceability to national standards for temperature and pressure was established prior to each episode and do not include uncertainty components common to all episodes (which include components of the volume ratio uncertainty related to temperature gradients in the oven and differences in volume ratio obtained using different gases, i.e., N2, air, and argon). We estimate the total uncertainty in the volume ratio to be 0.014 % (see Sect. S2.3.4). Excluding components common to all episodes, we use 0.013 % for uncertainty on the volume ratio in the drift assessment.
Drift rates, in parts per million per decade, are summarized in Fig. 8 (see also Table S1). For primary standards with X_CO2 > 530 ppm, the manometric histories are too short to adequately assess drift. For those with X_CO2 in the range 250-520 ppm, all but three show positive drift, although none is significant at the 95 % confidence level. While some calculated drift rates are of the order of 0.05 ppm per decade, we are unable to detect drift rates less than ∼ 0.08 ppm per decade, owing mostly to the uncertainties associated with the volume ratio and reproducibility of the manometric measurements. The average drift rate among standards in the 350-450 ppm range is 0.02 ppm per decade, which would have only a minor impact on the heart of the X2019 scale if drift rates shown in Fig. 8 were incorporated, except when making comparisons across decades. Thus, while relative drift among cylinders can be observed over short time periods, as in Leuenberger et al. (2015) and Schibig et al. (2018), detecting long-term drift on an absolute basis is difficult. Still, drift in cylinders is typically small compared to the growth rate of atmospheric CO2 (∼ 2 ppm yr−1).
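The drift assessment described in this section can be illustrated with a short sketch: per-episode uncertainties are combined and then used as weights in a straight-line fit of episode means against time. Combining the components in quadrature is an assumption here, as are the function names and the episode values; the 0.013 % volume-ratio term and the 0.02-0.04 ppm correction term follow the text.

```python
import math


def episode_uncertainty(sd_ppm, n_runs, x_ppm,
                        rel_u_volume_ratio=0.00013, u_corrections_ppm=0.03):
    """Combine repeatability, volume-ratio and correction terms (quadrature assumed)."""
    u_repeat = sd_ppm / math.sqrt(n_runs)   # sigma_i / sqrt(N_i)
    u_volume = rel_u_volume_ratio * x_ppm   # 0.013 % of the mole fraction
    return math.sqrt(u_repeat ** 2 + u_volume ** 2 + u_corrections_ppm ** 2)


def weighted_drift_fit(years, means_ppm, u_ppm):
    """Weighted least-squares slope and its uncertainty, in ppm per decade."""
    w = [1.0 / u ** 2 for u in u_ppm]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, years)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, means_ppm)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, years))
    sxy = sum(wi * (x - x_bar) * (y - y_bar)
              for wi, x, y in zip(w, years, means_ppm))
    slope = sxy / sxx
    u_slope = math.sqrt(1.0 / sxx)
    return slope * 10.0, u_slope * 10.0


# Hypothetical episode means for one ~400 ppm standard (not real data).
years = [1996, 1998, 2000, 2001, 2003, 2004, 2006, 2009, 2012, 2015]
means = [400.02, 400.10, 400.01, 400.05, 399.98, 400.06, 400.03,
         400.08, 400.04, 400.07]
u = [episode_uncertainty(0.10, 6, m) for m in means]

drift, u_drift = weighted_drift_fit(years, means, u)
print(f"drift = {drift:.3f} +/- {u_drift:.3f} ppm per decade")
print("significant at ~95 %:", abs(drift) > 2.0 * u_drift)
```

Centering the years on their weighted mean before forming the sums avoids the loss of precision that comes from squaring values near 2000.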

Defining the X2019 WMO CO2 mole fraction scale
Primary standards were analyzed using the laser spectroscopy system described in Tans et al. (2017). These data were then used to harmonize the standards and define a scale. Each primary standard was analyzed six times relative to a ∼ 400 ppm reference cylinder. On this analysis system we treat the three major isotopologues of CO2 separately to eliminate subtle biases due to variations in isotopic compositions among the standards and between samples and reference cylinders. We harmonized the primary standards using only the major (¹⁶O¹²C¹⁶O) isotopologue (Tans et al., 2017). The isotopic assignments for … (White et al., 2015). Additionally, there is uncertainty in the tie to the JRAS-06 scale realization, which is currently being evaluated as part of the conversion of INSTAAR data to the JRAS-06 scale realization. Based on a re-evaluation of recent comparisons with other laboratories, it is expected to be less than 0.05 ‰ for both δ¹³C and δ¹⁸O (Sylvia Michel, personal communication, 2020). These uncertainties are insignificant relative to the uncertainty in the manometric determination of total CO2 in terms of the calculated mole fraction of the ¹⁶O¹²C¹⁶O isotopologue.
The uncertainties on flask measurements at INSTAAR listed above are determined for ambient atmospheric samples (∼ −7.5 to −9 ‰). Several of the primary standards are depleted relative to the atmosphere (see Table 2), and this could increase the uncertainty of these measurements due to scale contraction in the measurements at INSTAAR. At δ¹³C = −20 ‰ and δ¹⁸O = −20 ‰, Wendeberg et al. (2013) found the INSTAAR realization of VPDB-CO2 to be offset from JRAS-06 by approximately 0.2 ‰ in δ¹³C and 0.8 ‰ in δ¹⁸O. This was primarily due to scale contraction caused by the instrumentation in use at INSTAAR. Subsequent conversion of the INSTAAR records to JRAS-06 is not expected to correct for the scale contraction in historical measurements since these measurements were not done with two-point normalization. Errors in the isotopic assignments of the primary standards of this magnitude due to scale contraction issues will result in errors of less than 0.01 ppm in the calculated ¹⁶O¹²C¹⁶O mole fraction. We therefore feel confident that we can harmonize the primary standards based on the ¹⁶O¹²C¹⁶O measurements only.
A linear fit (orthogonal distance regression) was applied to the normalized analyzer response and the ¹⁶O¹²C¹⁶O component of the average manometer results. This was repeated six times over 3 years. To test the sensitivity of the harmonization process, we performed an orthogonal distance regression with two variations of manometric average values and two variations of weighting factors for each primary standard (four combinations). For the manometric data, we used either the average of all manometric measurements of each primary standard or the weighted average from each measurement episode. For the weights in the regression, we used either the inverse variance (1/σ^2) (as in Table 1) or the square of the inverse standard error. All four variations give essentially the same result (within 0.01 ppm near 400 ppm). Therefore, the X2019 scale is defined from an orthogonal distance linear regression using the average manometric result and standard deviation (using 1/σ^2 as weighting factors) for each cylinder (avg. X2019 and SD X2019 in Table 1). Figure 9 shows the residuals from six analysis periods over 3 years associated with harmonization. There is good agreement among the different analysis periods, indicating that variability seen in the residuals relates to the manometer average values. For each primary standard, we corrected the CO2 mole fraction by the mean residual from the linear fit (Table 2). The X2019 scale is defined as the average residual-corrected mole fraction, determined over six analysis periods, for each primary standard. In this way, the scale is defined over a range, with better consistency and smaller uncertainty compared to individual primary standards. For X2019, we include the 15 primary standards used to define the X2007 scale, plus four additional primary standards with X_CO2 > 530 ppm. Additional primary standards in the upper range help to constrain the fit and reduce end effects.
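The harmonization step can be sketched with a simplified errors-in-variables line fit. The example below uses an unweighted Deming regression (which reduces to orthogonal regression when the error-variance ratio delta is 1) as a stand-in for the weighted orthogonal distance regression described above; it does not apply the 1/σ^2 weights. Reading "corrected by the residual" as replacing each manometric value with the fitted value is also an assumption, and all names and numbers are hypothetical.

```python
import math


def deming_fit(x, y, delta=1.0):
    """Deming regression line; delta = var(y errors) / var(x errors)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
    return slope, y_bar - slope * x_bar


def harmonize(response, manometric_ppm, delta=1.0):
    """Return fitted (residual-corrected) values and residuals in ppm."""
    slope, intercept = deming_fit(response, manometric_ppm, delta)
    fitted = [intercept + slope * r for r in response]
    residuals = [m - f for m, f in zip(manometric_ppm, fitted)]
    return fitted, residuals


# Hypothetical normalized responses and manometric averages (not real data).
response = [0.625, 0.875, 1.000, 1.125, 1.300]
manometric = [250.02, 349.97, 400.05, 449.98, 520.01]

fitted, residuals = harmonize(response, manometric)
for m, f, r in zip(manometric, fitted, residuals):
    print(f"manometric {m:7.2f}  harmonized {f:8.3f}  residual {r:+.3f}")
```

In practice this would be repeated for each analysis period, and the residual-corrected values averaged per cylinder, as the text describes for the six periods.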
Many residuals are less than 0.05 ppm, but the newer standards in the upper CO2 range show larger residuals. Some of this may be due to their short measurement history compared to standards in the 250-520 ppm range. Finally, while harmonization is not strictly necessary if all primary standards are to be analyzed at the same time when propagating the scale, it provides some insurance against the potential loss of a primary standard. By assigning mole fractions consistent with the best fit response, loss of 1 or 2 standards from the suite of 19, especially in the middle of the X_CO2 range, would not be catastrophic. Figure 10 shows differences between primary standard assignments on X2019 and X2007. As expected, the differences are a function of mole fraction, since both the virial correction and loss correction are functions of mole fraction. The scale difference based on primary standards alone (not including scale transfer) is 0.17 ppm at 400 ppm, and the average scale correction over the range 250-520 ppm is 0.04 %. Some of the scatter in Fig. 10 is due to updated assignments owing to a longer measurement record for X2019 compared to X2007. However, the largest deviation is due to a misassigned value: the assigned value for AL47-146 was inadvertently listed as 389.55 in our database instead of 389.64. The implications of this misassignment are discussed in Sect. 9.

Table 2. WMO primary standard assignments on X2007 and X2019 scales. Assignments were determined following analysis and residual correction by NDIR (X2007) and laser spectroscopy (X2019). The average ratio of primary standards on scales X2019 / X2007 is 1.00040, with a standard deviation of 0.00011.

Independent assessment
Revision of the X2007 scale relies on the assumption that the loss of CO 2 to Viton O-rings in the small volume of the manometer can be adequately addressed by linear extrapolation (Fig. 5). Knowledge of CO 2 losses prior to the availability of representative pressure and temperature measurements (during the time while the small volume is warming) is lacking. Experiments in which pure CO 2 was loaded into the small volume by overpressure (not transfer by cryogenic extraction) suggest that the loss process is initially nonlinear and approaches a linear rate after about 10 min. If this is true, then the correction we apply is too small (by ∼ 0.2 ppm) (see Sect. S1.2). However, these experiments were not carried out under the same conditions used to extract CO 2 from air, so we cannot be sure that they are representative. Therefore, we explored an independent method to provide insight into potential bias in the X2007 scale and our attempt to correct for that bias.

Comparison to in-house gravimetrically prepared standards
We prepared CO 2 primary standards using a gravimetric method (Hall et al., 2019). Briefly, known masses of highly pure CO 2 were introduced into 29.5 L aluminum cylinders and diluted with known masses of CO 2 -free air. Uncertainties were reduced by preparing standards in one step and by accounting for CO 2 likely to be adsorbed to cylinder walls at high pressure (Schibig et al., 2018). These standards were analyzed by laser spectroscopy and assigned X CO 2 values on the X2019 scale (Table 3). The X2019 assignments are consistent with the gravimetrically prepared values, with an average difference of 0.03 ppm and an average ratio of 1.00008 (Table 3). If the gravimetric standards were used to define a calibration scale, it would, on average, be 0.045 % greater than the X2007 scale (avg. ratio 1.00045, SD 0.00017) (Hall et al., 2019). This is very close to the average ratio of 1.00040 derived by correcting historical manometric data (Table 2).
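As a rough illustration of the one-step gravimetric calculation, the sketch below converts the masses of CO 2 and diluent air into a mole fraction. It is a simplification under stated assumptions: the molar masses are approximate values that we assume here, and the corrections for reagent purity, buoyancy, and wall adsorption used in the actual preparation are omitted. The function name and the example masses are hypothetical.

```python
M_CO2 = 44.0095  # g/mol, molar mass of CO2 (assumed value)
M_AIR = 28.96    # g/mol, approximate molar mass of CO2-free dry air (assumed value)


def gravimetric_mole_fraction_ppm(mass_co2_g, mass_air_g,
                                  m_co2=M_CO2, m_air=M_AIR):
    """CO2 mole fraction (ppm) of a one-step dilution.

    Ignores purity, buoyancy, and adsorption corrections.
    """
    n_co2 = mass_co2_g / m_co2   # moles of CO2 added
    n_air = mass_air_g / m_air   # moles of CO2-free air added
    return 1e6 * n_co2 / (n_co2 + n_air)


# Hypothetical masses, for illustration only.
x_ppm = gravimetric_mole_fraction_ppm(mass_co2_g=1.216, mass_air_g=2000.0)
print(f"gravimetric X CO2 = {x_ppm:.2f} ppm")
```

At these levels a 1 mg error in the CO 2 mass shifts the result by several tenths of a ppm, which is why the weighing and adsorption corrections dominate the uncertainty of such standards.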

Comparison with NIST
Based on an exchange of 30 tertiary standards in 2010, Rhoderick et al. (2016) compared the National Institute of Standards and Technology (NIST) gravimetric CO 2 scale to the NOAA X2007 scale, reporting an average difference of 0.19 ± 0.03 ppm (NOAA lower) over the range 388-394 ppm. After adjusting NOAA results to X2019, differences range from −0.08 to +0.07 ppm, with a mean difference of 0.0 ± 0.03 ppm.

Key comparison CCQM-K120a
NOAA recently participated in an international comparison (CCQM-K120a) organized under the auspices of the

Uncertainty analysis
Here, we estimate the total uncertainty associated with a CO 2 determination on the X2019 scale. We extend the work of Zhao and Tans (2006), following accepted methods for uncertainty propagation (JCGM, 2008). To arrive at an uncertainty estimate, we use Eq. (4), which is a modified version of Eq. (1), and propagate uncertainties over a range of CO 2 mole fractions. We include the terms X virial_correction and X loss_correction since the X2019 scale was derived based on these corrections. Future manometric analysis will not include the term X virial_correction since β CO 2 is now correctly determined. We also include the term X H 2 O and its estimated uncertainty even though we do not correct for water vapor in the final sample (X H 2 O = 0).
We establish traceability of manometric measurements to national temperature and pressure standards. Prior to a measurement episode, three platinum resistance thermometers, one thermistor, and a piston gauge are typically sent to an accredited laboratory for calibration (National Voluntary Laboratory Accreditation Program, NVLAP). We estimate the uncertainties associated with measurement of temperature and pressure from uncertainties reported by the calibration laboratories, repeatability, and experience. Uncertainty components are described in the Supplement and are similar to those estimated by Zhao and Tans (2006) except for the uncertainty associated with the volume ratio. We calculate a larger uncertainty for the volume ratio, in part because we observed small temperature gradients in the oven, and hence our ability to measure the gas temperature at each stage of the expansion sequence with existing equipment is probably less certain than previously estimated (Zhao and Tans, 2006). By calculating expanded uncertainties over a range of mole fractions, we arrive at a general expression for the expanded uncertainty (µX CO 2 ) as a function of X CO 2 (Eq. 5). From Eq. (5), the expanded uncertainty at 400 ppm is 0.17 ppm or 0.043 %. This estimate is only slightly larger than that estimated by Zhao and Tans (2006) (2 × 0.069 = 0.14 ppm). We acknowledge that the uncertainty could be larger, owing to nonlinear loss processes in the early stages of the final pressure and temperature measurements. However, the magnitude of this potential bias could not be quantified experimentally under conditions consistent with manometric experiments. We include the scale transfer uncertainty in our uncertainty estimate, which is particularly relevant for users comparing data traceable to the same scale. From repeated measurements of multiple cylinders, we estimate the scale transfer uncertainty based on laser spectroscopy to be 0.01 ppm (1σ ), similar to what was reported by Tans et al. (2017). For cylinders value-assigned by NDIR (∼ 1995 to 2016) we estimate the scale transfer uncertainty at 0.03 ppm (1σ ) (see Sect. S2.5).
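The arithmetic behind these figures can be checked with a short sketch. It reproduces the coverage-factor and relative-uncertainty conversions quoted above, and then combines two scale transfer uncertainties in quadrature to estimate the uncertainty of a difference between two value assignments on the same scale. The quadrature step assumes the two assignments are independent, which is our assumption for illustration; the function names are hypothetical.

```python
import math


def expanded(u_standard, k=2.0):
    """Expanded uncertainty for coverage factor k (k = 2 by default)."""
    return k * u_standard


def relative_percent(u, x):
    """Uncertainty u expressed as a percentage of the value x."""
    return 100.0 * u / x


def combine_in_quadrature(*standard_uncertainties):
    """Root-sum-of-squares of independent standard uncertainties."""
    return math.sqrt(sum(u ** 2 for u in standard_uncertainties))


# Figures quoted in the text.
print(f"Zhao and Tans (2006), k = 2: {expanded(0.069):.2f} ppm")
print(f"0.17 ppm at 400 ppm: {relative_percent(0.17, 400.0):.4f} %")

# Assumed independent assignments compared on the same scale (1 sigma).
u_laser_laser = combine_in_quadrature(0.01, 0.01)
u_ndir_laser = combine_in_quadrature(0.03, 0.01)
print(f"laser vs. laser difference: {u_laser_laser:.3f} ppm")
print(f"NDIR vs. laser difference:  {u_ndir_laser:.3f} ppm")
```

For two standards traceable to the same scale, the scale uncertainty itself is common to both and does not enter the comparison, which is why only the transfer terms appear in the last two lines.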

Scale implementation
As discussed above, the implementation of the scale involves the harmonization of primary standard manometric results through analysis, with assigned mole fractions derived using a linear response function based on spectroscopic analysis. These assigned mole fractions are then used to define the X2019 scale and transfer that scale to lower-order standards.
In the hierarchy of value assignment, standards used to support NOAA atmospheric measurements and those distributed by the CCL are known as "tertiary standards". Recalculating tertiary standard values on the X2019 scale involves three steps: (1) updating primary standards to X2019, (2) reassigning secondary standards based on primary-secondary comparisons (note that some secondaries were reassigned based on additional data not available upon initial assignment), and (3) reassigning tertiary standards based on updated daily response functions, relative to secondaries. Here we present the impact of the X2019 scale update on tertiary value assignments dating back to 1995. In a subsequent section we present the implications of the scale update on NOAA atmospheric measurements.
Tertiary standards are value-assigned based on analysis vs. secondary standards (Zhao and Tans, 2006). From 1995 to October 2016, value assignment was performed by NDIR (Siemens Ultramat-3 or Ultramat-6F; Li-Cor Li-6251, Li-6252, or Li-7000), and from November 2016 by laser spectroscopy (Picarro G2301; Los Gatos Research CCIA-46-EP; Aerodyne Research, Inc., QC-TILDAS-CS). There was an approximately 12-month overlap period where tertiary standards were run on both systems. The NDIR response to CO 2 is typically nonlinear. For analysis on a given day, a quadratic response function was determined based on four secondary standards, which were previously value-assigned based on similar mole-fraction-dependent subsets of the suite of primary standards. Secondary standards were selected such that X CO 2 spanned the range of tertiary standards to be calibrated. For example, analysis of a nominal 380 ppm tertiary standard would typically involve secondary standards at 370, 380, 390, and 400 ppm (10 ppm spacing). For X CO 2 greater than 450 ppm, three secondaries, spaced ∼ 25 ppm apart, were used. For analysis by laser spectroscopy, 16 secondary standards over the range 250-800 ppm (prior to April 2020, 14 secondary standards covering 250-600 ppm) are used to define response curves for the three major isotopologues of CO 2 ( 16 O 12 C 16 O, 16 O 13 C 16 O, and 16 O 12 C 18 O). The mole fraction of each of the three major isotopologues is measured and then converted into total CO 2 , δ 13 C, and δ 18 O, accounting for the unmeasured minor isotopologues as described in Tans et al. (2017).
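The daily NDIR response function described above can be sketched as a least-squares quadratic through four secondary standards, evaluated at the signal of a tertiary standard. This is an illustration only: the detector signals are hypothetical, the function names are ours, and the operational data reduction includes steps (such as drift handling within a run) that are not shown.

```python
def fit_quadratic(x, y):
    """Least-squares quadratic in the centered variable d = x - x0.

    Returns (x0, [c0, c1, c2]) so that y ~ c0 + c1*d + c2*d**2.
    Centering keeps the normal equations well conditioned.
    """
    x0 = sum(x) / len(x)
    d = [xi - x0 for xi in x]
    s = [sum(di ** p for di in d) for p in range(5)]      # power sums
    a = [[s[i + j] for j in range(3)] for i in range(3)]  # normal matrix
    b = [sum(yi * di ** i for yi, di in zip(y, d)) for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 3):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (b[i] - tail) / a[i][i]
    return x0, coef


def evaluate(x0, coef, x):
    """Evaluate the fitted response function at signal x."""
    d = x - x0
    return coef[0] + coef[1] * d + coef[2] * d * d


# Hypothetical analyzer signals (arbitrary units) for secondaries assigned
# 370, 380, 390, and 400 ppm, bracketing a nominal 380 ppm tertiary.
signals = [0.9262, 0.9507, 0.9751, 0.9994]
assigned = [370.0, 380.0, 390.0, 400.0]
x0, coef = fit_quadratic(signals, assigned)
print(f"tertiary value: {evaluate(x0, coef, 0.9520):.3f} ppm")
```

With only three or four closely spaced secondaries, an error in any one assigned value shifts the whole curve, which is the mechanism behind the mole-fraction-dependent structure discussed for X2007.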
Upon revision to X2019, all secondary standards used as far back as 1979 were re-evaluated. Secondary standards were compared to primary standards multiple times during their use. A statistical test and expert judgment were employed to evaluate drift in secondary standards. The statistical test was occasionally overruled in cases where we suspect a step change due to change in instrumentation was the underlying driver rather than drift in the secondary standard. If drift was suspected, a weighted linear or polynomial function was fit to the data (weighted by instrument reproducibility, see Sect. S2.5) and a time-dependent mole fraction used. Note that it is easier to detect drift in secondary standards compared to primary standards because we evaluate secondary standards relative to the scale defined by many standards. Thus, the limiting factor is measurement reproducibility and not the absolute uncertainty of the scale.
During this re-evaluation, the drift status of some secondary standards was updated, with more data being available compared to when drift rates were first assigned. Thus, some standards that had previously assigned time-dependent values are now held constant, and vice versa. Generally, the X2019 scale is more consistent across mole fraction and time, and therefore the new evaluations for secondary standard drift are considered more reliable. After updating secondary standard value assignments to X2019, X CO 2 values for all tertiary standards dating to 1979 were reassigned from raw data. We focus here mainly on the period from 1995 onward because our role as a WMO/GAW CCL began in 1995. Figure 11 shows differences between tertiary standard assignments on X2019 and X2007 from 1995 through February 2020. The overall scale difference is clearly a function of mole fraction, with the difference approximately 0.18 ppm at 400 ppm. It is immediately obvious that differences are not a perfect linear function of mole fraction. Differences that are consistent over several months can be seen as coherent traces in Fig. 11. The coherent differences are due to secondaries being exhausted and replaced by others at slightly different mole fractions. Even though tertiary standards were bracketed by secondaries during analysis, limitations in the ability to value-assign any particular secondary standard, coupled with the limitations associated with fitting a quadratic response function to three or four secondaries, contribute to variability. Even so, most of the year-to-year variability at a particular mole fraction is less than 0.02 ppm (1σ ). Outliers, such as those corresponding to analysis performed in the mid-1990s above 400 ppm (red and purple symbols), are the result of extrapolation beyond the range of the secondaries. Prior to 1997, the highest secondary standard in regular use was 390 ppm.
The more prominent variations evident in Fig. 11 stem from reassignment of primary and secondary standards, the nonlinear response of NDIR instruments, and the nature of the value-assignment process. Scale differences appear significantly larger during 2008-2009 over the 360-390 ppm range (light green symbols). These value assignments, which involved around 600 analysis records (less than 3 % of the total number), are inconsistent with most other data due to a revision of X CO 2 assigned to a particular secondary standard (CA01982) in use at the time. This particular secondary was assigned a value of 391.87 on the X2007 scale in 2008 when compared to primary standards. However, subsequent analysis of this cylinder against primary standards made it evident that the cylinder was drifting upward rapidly. This secondary standard drifted ∼ 0.2 ppm in 2 years (not common), but the drift was not accounted for in the X2007 value assignment, which caused the value used for data reduction to be too low. The drift is accounted for in the X2019 value assignment, leading to larger X2019-X2007 differences for tertiary standards measured against this secondary standard.

Figure 11. Differences between X2019 and X2007 assignments to tertiary standards from 1995 to 2020. Each data point represents one analysis record (over 25 000 records shown), and a full calibration of a tertiary standard involves multiple analysis records.
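The effect of an unaccounted drift in a secondary standard can be made concrete with a small sketch. It compares a constant assignment with a time-dependent one, assuming for illustration that the ∼ 0.2 ppm change over 2 years was linear (0.1 ppm yr −1 ) and starting from the 391.87 value quoted above. It isolates the drift effect on a single scale and ignores the X2019-X2007 scale difference itself; the function name is hypothetical.

```python
def secondary_value(year, ref_year, ref_value, drift_per_year=0.0):
    """Mole fraction (ppm) of a secondary standard at a given decimal year,
    assuming a linear drift from its reference assignment."""
    return ref_value + drift_per_year * (year - ref_year)


REF_YEAR = 2008.0
REF_VALUE = 391.87   # ppm, assignment quoted in the text
DRIFT = 0.1          # ppm per year; assumed linear (about 0.2 ppm in 2 years)

for year in (2008.0, 2008.5, 2009.0, 2009.5, 2010.0):
    constant = secondary_value(year, REF_YEAR, REF_VALUE)
    drifting = secondary_value(year, REF_YEAR, REF_VALUE, DRIFT)
    # A constant assignment is too low by this amount once the cylinder drifts.
    print(f"{year:.1f}  constant {constant:.2f}  "
          f"time-dependent {drifting:.2f}  shortfall {drifting - constant:+.2f}")
```

Tertiary standards calibrated against the cylinder inherit a shortfall of similar size, which is consistent with the larger X2019-X2007 differences seen for 2008-2009.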
The more recent data based on analysis by laser spectroscopy are represented as dark purple and maroon colors in Fig. 11. These show a more linear relationship without the wavy structure, as expected for an instrument with a linear response calibrated over the entire scale range. The fact that the laser spectroscopic results do not agree with the NDIR data in the upper X CO 2 range (> 420 ppm) is due to the use of secondary standards on this system that were not well characterized. Value assignments for these secondary standards were determined on the NDIR system and thus incorporate the biases associated with that system on X2007. They were not well characterized when they went into service, especially at the upper end of the range, where we effectively expanded the calibration range in anticipation of the X2019 revision. We now have more information on these secondary standards, including analysis vs. the primary standards on the laser spectroscopic system, and can better define them on X2019.
It is important to note that differences in value assignment between the NDIR and laser spectroscopic system (Fig. 11) are only present on the X2007 scale. The X2019 revision resolves the underlying cause of the offsets. Figure 12 shows the results from the ∼ 12-month overlap during which tertiary standards were analyzed on both systems. There is a clear mole fraction dependence to the offset on the X2007 scale. Tans et al. (2017) attributed this to the assigned values of the primary standards coupled with the method used for scale transfer using the NDIR but were not able to rule out other potential issues such as gas handling on the NDIR-based system. The X2007 primary standard assignments (Table 2), based on harmonization by NDIR analysis, were not as robust as we thought. The X2007 scale was based on relatively few NDIR analysis runs, and as such the residuals were not as well defined as they are for X2019 (Fig. 9). By using small subsets of standards to calibrate the NDIR, the data reduction of the NDIR system tracked errors in the assigned values rather than averaging those errors over the entire range of the scale. By normalizing the primary standards on a linear system, using the full suite of primary standards multiple times over several years (as was done for X2019), we can better define the assigned values of the primary standards. After converting to X2019, the NDIR system is still subject to end effects and errors in value assignments of the primary standards, but these errors are much smaller compared to X2007, and the comparison data show much better agreement between the two systems (lower panel in Fig. 12). The good agreement between the two systems on X2019 leads us to believe that the mole fraction dependence in the offsets on X2007 (Fig. 12a) is due to the assigned values of the primary standards and not to some other issue related to gas handling. This also indicates that the agreement is probably relatively stable in time and there is likely no mole-fraction-dependent bias in the NDIR results prior to the comparison period.

Approximating X2019 using a linear scale conversion
For users of standards obtained from the CCL, the best way to update to the X2019 scale is to implement the X2019 reassignments and propagate through to atmospheric data. A database management system allows for efficient propagation of scale changes to atmospheric data. However, for datasets in which a full reprocessing is not possible or practical, a linear scale conversion could be an option. The linear function (Eq. 6) is based on primary standard assigned values (weighted linear regression). It is clear from Fig. 13 that the linear conversion, shown as the solid black line, will introduce errors in the 370-390 ppm range compared to full reprocessing, as the line does not pass through the majority of the data in that range. This is an unfortunate consequence of the misassigned primary standard (AL47-146) and the misassigned secondary in use in 2008. Nevertheless, the linear conversion introduces errors less than 0.05 ppm for 94 % of the tertiary standards in the range 320-460 ppm (Fig. 14). Errors are less than 0.03 ppm for 78 % of the data in the same range, although there is a persistent low bias between 380 and 390 ppm.

Figure 12. Differences between NDIR and laser spectroscopic systems used for tertiary value assignment on X2007 (a) and X2019 (b) during a 12-month overlap period. Open symbols denote tertiary standards with significantly lower 13 C-CO 2 isotopic ratios compared to the others (δ 13 C < −20 ‰), which are thus subject to bias in the NDIR measurement. Dashed lines are the expected reproducibility of the NDIR system (±0.03 ppm).
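A linear scale conversion of this kind can be applied with a few lines of code. In the sketch below, the default slope and intercept are assumed values, chosen to be consistent with the 0.17 ppm primary-standard difference at 400 ppm reported above; they are placeholders to be replaced by the coefficients of Eq. (6) before any real use. The function names are hypothetical.

```python
# ASSUMED coefficients (placeholders); confirm against Eq. (6) before use.
SLOPE = 1.00079
INTERCEPT = -0.142  # ppm


def x2007_to_x2019(x2007_ppm, slope=SLOPE, intercept=INTERCEPT):
    """Approximate X2019 mole fraction from an X2007 value (linear conversion)."""
    return slope * x2007_ppm + intercept


def conversion_bias(x2007_ppm, x2019_reprocessed_ppm):
    """Linear conversion minus fully reprocessed value (sign as in Fig. 14)."""
    return x2007_to_x2019(x2007_ppm) - x2019_reprocessed_ppm


for x in (350.0, 400.0, 450.0):
    print(f"X2007 {x:.2f} ppm -> approx. X2019 {x2007_to_x2019(x):.3f} ppm")
```

Where reassigned X2019 values are available for a laboratory's standards, `conversion_bias` gives a direct check on how far the linear approximation departs from full reprocessing for that particular calibration history.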

Revision of NOAA atmospheric data
We have reprocessed NOAA atmospheric data back to ∼ 1979 for internal evaluation. This involved reassigning X CO 2 values for working (tertiary-level) standards to X2019 by reprocessing the original tertiary-secondary comparisons. For data prior to 1995, this also involved converting from a Scripps Institution of Oceanography (SIO) scale to X2019. Complete detail of the conversion from the SIO scale to X2019 is beyond the scope of this paper and will be addressed in a separate publication. After fully converting to X2019, NOAA data prior to ∼ 1979 will still be traceable to the SIO scale in use at the time of measurement.

Figure 14. Scale conversion bias seen in tertiary standard assignments when using the linear scale conversion, shown as the difference between the linear scale conversion and the reprocessed values.
We include examples of atmospheric data here to provide a comparison of two methods used for propagating the X2019 scale: full reprocessing using updated tertiary standard values and response functions and a simple linear scale conversion applied to atmospheric records. Actual bias introduced into atmospheric records by implementing the linear conversion will depend on the calibration procedures used in a particular laboratory and the range and calibration history of standards. For example, if a particular set of standards used by a laboratory was analyzed multiple times by the CCL over several years, the impact of the 2008-2009 secondary standard misassignment would be reduced.
The lower panel in Fig. 15 shows the difference between the linear scale conversion and full reprocessing applied to in situ CO 2 at Mauna Loa, HI (MLO). Generally, the linear scale conversion is fairly close to the fully reprocessed data but has a negative bias that is larger during 2007-2009 due to the 2008 secondary misassignment issue. There are time periods of larger differences, such as in late 2014, due to a reassessment of drift in the working standards. In the case of the 2014 period, one of the working standards had a relatively large drift correction (0.2 ppm yr −1 , which is not common), but the drift correction was implemented on X2007 in a way that exaggerated the effect (this only applies to relatively few cylinders in 2014). Without fully reprocessing, this error would be preserved in the dataset.

Figure 15. Hourly averaged CO 2 measurements from Mauna Loa Observatory fully reprocessed to X2019 (a), the difference between the fully reprocessed X2019 data and X2007 (b), and the difference between using the linear scale conversion and full reprocessing methods to determine X2019 values (c).
In addition to MLO, we reprocessed in situ data from the other NOAA baseline observatories (Utqiaġvik (formerly Barrow), AK; American Samoa; South Pole) and flask samples from marine boundary layer (MBL) sites using both the linear scale conversion and full reprocessing methods. Biases in the linear scale conversion were binned by year to get a sense of how well the linear scale conversion approximates the scale difference over time. Again, differences due to reassessment of drift in the working standards are included in these binned bias terms. Figure 16 shows the average annual bias in each of these data records that would be included if the records were converted to X2019 using the linear function rather than fully reprocessed (note, only hourly averages and flask samples identified as representing baseline conditions were used for this comparison). Average bias across the whole period is −0.03 ppm, but there are years in individual records with biases up to −0.09 ppm. These measurement systems are tightly tied to the calibration chain. The larger biases during 2007-2009 show that these systems all follow the bias in the scale due to the 2008-2009 misassigned secondary standard. The effect is moderated slightly due to the use of multiple standards and the fact that most standards have pre- and post-deployment value assignments and that typically only one of these would have occurred during the 2008-2009 excursion.

We also conducted a numerical experiment to examine scale conversion bias without the added complications from a reassessment of drift in working standards. We randomly selected sets of three and five individual tertiary standards measured within a calendar year. Each set required a standard within ±10 ppm of the global average from a particular year (https://www.esrl.noaa.gov/gmd/ccgg/trends/global.html, last access: 7 July 2020). The other standards were required to be at least 10 ppm but less than 30 ppm apart and cover mole fractions above and below the initial selected standard. Quadratic fits to the actual X2019-X2007 differences vs. the X2007 assignments were made. The point on this curve corresponding to the calendar year global average (on the X2007 scale) was compared to the global average converted to X2019 using the linear scale conversion. The experiment was run 50 times for each year. In essence, this lets us approximate the bias due to the use of the linear scale conversion on a hypothetical sample equal to the global average for 50 different sets of standards. The average biases due to the use of the linear scale conversion for three-standard and five-standard suites are shown in Fig. 16 expressed as 3-year running means. The results show good agreement with the bias seen in the in situ and flask MBL records. It is important to note that both the results of the numerical experiment and these particular atmospheric records are tightly tied to the CO 2 scale transfer system in time. Atmospheric data from 2007 to 2009 measured by external programs would not be as sensitive to the 2008 bias if their standards were not calibrated by the CCL during that time. Conversely, measurements at other times tied to standards that were only measured during the 2007-2008 period (without subsequent reanalysis) would be more sensitive.

Figure 17. Potential bias that could exist in archive datasets traceable to NOAA standards prior to the release of X2007, shown as the difference between a hypothetical archived result and that result expressed on scale X2007 (derived from a sample of standards analyzed from 1993 to 2005).
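The three-standard version of the numerical experiment can be sketched as follows. For three points, a quadratic fit is an exact interpolation, so the sketch uses Lagrange interpolation. Several details are our assumptions: the spacing rule is interpreted as each outer standard lying 10 to 30 ppm from the initially selected one, the bias is taken as linear conversion minus fitted difference (the sign convention of Fig. 14), and the data and the `linear_diff` function are hypothetical.

```python
import random


def pick_three(x2007_values, global_mean, rng):
    """Pick (low, mid, high): mid within +/-10 ppm of global_mean, with one
    standard 10-30 ppm below it and one 10-30 ppm above it."""
    mids = [x for x in x2007_values if abs(x - global_mean) <= 10.0]
    rng.shuffle(mids)
    for mid in mids:
        lows = [x for x in x2007_values if 10.0 <= mid - x < 30.0]
        highs = [x for x in x2007_values if 10.0 <= x - mid < 30.0]
        if lows and highs:
            return rng.choice(lows), mid, rng.choice(highs)
    return None


def quadratic_through(points, x):
    """Value at x of the quadratic through three (x, y) points (Lagrange form)."""
    (x0, y0), (x1, y1), (x2, y2) = points
    return (y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
            + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
            + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1)))


def mean_conversion_bias(standards, global_mean, linear_diff, rng, n_trials=50):
    """Average over random sets of: linear-conversion difference minus the
    fitted actual X2019-X2007 difference, both evaluated at global_mean."""
    biases = []
    for _ in range(n_trials):
        picked = pick_three(list(standards), global_mean, rng)
        if picked is None:
            continue
        pts = [(x, standards[x]) for x in picked]
        biases.append(linear_diff(global_mean) - quadratic_through(pts, global_mean))
    return sum(biases) / len(biases) if biases else None


# Hypothetical linear conversion difference and hypothetical tertiary
# standards: X2007 value (ppm) -> actual X2019 - X2007 difference (ppm).
def linear_diff(x):
    return 0.0008 * x - 0.14


standards = {360.2: 0.15, 371.5: 0.17, 384.9: 0.19, 386.3: 0.20,
             398.7: 0.18, 410.4: 0.19}
rng = random.Random(1)
print(mean_conversion_bias(standards, 385.0, linear_diff, rng))
```

A negative result indicates that the linear conversion underestimates the scale difference for a sample at the global average, as reported for 2007-2009.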

Historical scales
The impact of the revision from X2007 to X2019 is well understood and the linear conversion agrees with full reprocessing within 0.03 ppm for nearly 80 % of standards value-assigned since 1995 over the range 320-460 ppm (Fig. 14). However, data traceable to NOAA scales prior to the release of X2007 that cannot be fully reprocessed are an additional concern. The implementation of NOAA scales prior to X2007 was not rigorously documented. Prior to 2001, NOAA scales were partially based on SIO value assignments of the NOAA primary standards and thus were sensitive to revisions of the SIO scale. The incorporation of SIO revisions over time at NOAA and how these translated into distributed scales is not well documented, and therefore it is difficult to determine relationships between X2019 and historical scales prior to the full conversion to X2007. Note that the CCL has taken multiple steps since then to ensure these lapses do not occur again and that the evolution of the scale is transparent and fully documented.
To assess the magnitude of potential bias relative to X2007 that could exist in archived datasets still traceable to historical NOAA scales, we examined records from CSIRO (Australia), NIWA (New Zealand), and Environment Canada, who provided records of tertiary standard value assignments prior to the formal adoption of the X2007 scale. Figure 17 shows the difference between the original reported value (assigned by NOAA at that time) and the value reassigned on scale X2007 upon its release.
NOAA primary standards were initially value-assigned by SIO from 1992 to 1995. From 1996 to 2000, we used a mixture of NOAA and SIO manometric results, and from 2001 onward we used only NOAA manometric results. Scales propagated by NOAA from 1993 to 2000 were effectively a mixture of the SIO scale in use at the time (now obsolete) and the NOAA manometric data up to that time. Bias is largest and shows more scatter prior to 1994 because the NOAA scale was based on relatively few SIO measurements of the NOAA primary standards (Keeling et al., 2012). Primary assignments improved over time as the number of measurements increased. Data traceable to these unnamed NOAA scales are biased relative to X2007 (Fig. 17). However, any potential bias in atmospheric records would be related to the date the standards were value-assigned and not necessarily the date the atmosphere was measured. The potential bias in historical datasets relative to X2019 would increase due to the X2019 to X2007 relationship. The linear conversion (Eq. 6) is not strictly applicable to data not traceable to X2007 but would be a close approximation for data traceable to scales in use between 2001 and 2006. These limitations should be considered with regard to the uncertainty of historical data.

Conclusions
We have applied two corrections to manometric data used to define the WMO/GAW CO 2 scale and included four additional standards to define a new scale, identified as WMO-CO 2 -X2019. The net result of the scale update is twofold: (i) the X2019 scale is more accurate and internally consistent than the previous X2007 scale, and (ii) tertiary assignments on X2019 are more consistent across time because scale propagation has been improved with additional manometric analysis of primary standards and additional information on secondary assignments. While the scale difference at the tertiary-standard level (∼ 0.18 ppm at 400 ppm) is small in relative terms (0.045 %), it is significant in terms of atmospheric monitoring. Measurement laboratories will need to update to the X2019 scale to avoid misinterpretation of scale-induced (artificial) atmospheric gradients as real signals.
For users of standards obtained from the CCL, the best way to update to the X2019 scale is to implement the X2019 reassignments and propagate through to atmospheric data. However, for datasets in which a full reprocessing is not possible or practical, a linear scale conversion is an option. The linear conversion will result in bias compared to full reprocessing, but the bias is relatively small in many cases and is less than 0.03 ppm for nearly 80 % of standards value-assigned since 1995 over the range 320-460 ppm.
Data availability. Data used in this work are available in the Supplement.
Author contributions. All authors played a role in experiment design and analysis. BDH, DRK, TM, BRM, and MFS performed the experiments. BDH, AMC, and PPT performed the calculations. BDH prepared the manuscript with contributions from AMC (Sects. 5, 6, 8, and 9) and PPT (Sects. 3, 4, and 5). All authors provided feedback on the manuscript.