Calculating uncertainty for the RICE ice core continuous flow analysis water isotope record

We describe a systematic approach to the calibration and uncertainty estimation of a high-resolution continuous flow analysis (CFA) water isotope (δ2H, δ18O) record from the Roosevelt Island Climate Evolution (RICE) Antarctic ice core. Our method establishes robust uncertainty estimates for CFA δ2H and δ18O measurements, comparable to those reported for discrete sample δ2H and δ18O analysis. Data were calibrated using a time-weighted two-point linear calibration with two standards measured both before and after continuously melting 3 or 4 m of ice core. The error at each data point was calculated as the quadrature sum of three factors: Allan variance error, scatter over our averaging interval (error of the variance) and calibration error (error of the mean). The final mean total uncertainty for the entire record is 0.74 ‰ for δ2H and 0.21 ‰ for δ18O. Uncertainties vary through the data set and were exacerbated by a range of factors, which typically could not be isolated due to the requirements of the multi-instrument CFA campaign. These factors likely occurred in combination and included ice quality, ice breaks, upstream equipment failure, contamination with drill fluid, and leaks or valve degradation. We demonstrate that our methodology for documenting uncertainty was effective across periods of uneven system performance and delivered a significant achievement in the precision of high-resolution CFA water isotope measurements.


Introduction
Stable water isotopes (δ2H, δ18O) are a fundamental part of ice core studies. They are particularly important as a temperature proxy (Dansgaard, 1964; Epstein et al., 1963) and are a key component in establishing the age-depth scale and chronology of ice cores (NGRIP Members, 2004; Vinther et al., 2006; Winstrup et al., 2017). They also provide other information about climate, including accumulation rates, precipitation source region, atmospheric circulation and air mass transport, and sea ice extent (e.g. Küttel et al., 2012; Sinclair et al., 2013; Steig et al., 2013; Bertler et al., 2018; Emanuelsson et al., 2018).
Historically, water isotopes from ice cores were analysed as a set of discrete water samples using isotope ratio mass spectrometry (Dansgaard, 1964). Recent advances in laser absorption spectrometry have allowed continuous flow analysis (CFA) to become common in ice core studies, and CFA is now an essential measurement technique for obtaining high-resolution climate records (e.g. Kaufmann et al., 2008; Gkinis et al., 2011; Kurita et al., 2012; Emanuelsson et al., 2015; Jones et al., 2017). However, the simultaneous operation of seven measurement systems (Winstrup et al., 2017; Pyne et al., 2018) and the continuous nature of CFA pose challenges for calibration and uncertainty estimation. Because of the size and resolution of CFA ice core data sets and the relatively new application of laser spectroscopy to ice cores, few established methods exist for calculating point-by-point uncertainty throughout measurements. Building on previous studies (e.g. Gkinis et al., 2011; Kurita et al., 2012; Emanuelsson et al., 2015), we have developed a systematic approach to calibration and error calculation that allows for unique uncertainty estimates at each data point in a CFA water isotope record. In this study, we report our methodology for the calibration and calculation of uncertainty and demonstrate the application of the method on the Roosevelt Island Climate Evolution (RICE) ice core δ2H and δ18O data set.
Published by Copernicus Publications on behalf of the European Geosciences Union. E. D. Keller et al.: RICE CFA water isotope record uncertainty
The RICE collaboration retrieved a 760 m ice core from the north-eastern edge of the Ross Ice Shelf over Roosevelt Island in Antarctica (79.39° S, 161.46° W, 550 m a.s.l.) during the austral summer 2011-2012 and 2012-2013 field seasons (Bertler et al., 2018). The RICE ice core provides a valuable record of a high snow accumulation site in coastal West Antarctica with annual or sub-annual resolution at the upper depths, representing the late Holocene. The climate reconstruction at the RICE site for the last 2700 years using the CFA water isotope record is available in a separate publication (Bertler et al., 2018). In addition to the value in the methodology itself, this paper provides confidence in the precision of the RICE data set and the climatic interpretation on annual and sub-annual timescales. This method can be applied to other high-resolution CFA ice core water isotope records in the future and may be suitable for other continuous water isotope measurement applications.
This paper is structured as follows: in Sect. 2, we give an overview of our data processing and data quality control procedure. We also detail our methods for calibrating the isotope data and calculating the uncertainty for each data point. Section 3 contains the resulting estimates for each component of the total error of our data set and an analysis of the different sources of error. We conclude in Sect. 4 with a summary and recommendations for future CFA measurement campaigns.

Methods
The abundance of the rare isotope in a sample is conventionally reported in delta notation, defined as
$$\delta = \frac{R_{\mathrm{sample}}}{R_{\mathrm{standard}}} - 1,$$
where R is 18O/16O or 2H/1H for water stable isotopes, evaluated for the sample and for the reference standard (Coplen, 2011). Results in this paper are reported as δ values in parts per thousand (‰), normalized to the international standard Vienna Standard Mean Ocean Water and Standard Light Antarctic Precipitation (VSMOW-SLAP) scale (Gonfiantini, 1978).
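As a small numerical illustration of the delta notation (not part of the original analysis), the sketch below converts an isotope ratio into a δ value in ‰. The function name `delta_permil` and the sample ratio are hypothetical; the reference ratio is an approximate 18O/16O value for VSMOW used only for illustration.

```python
def delta_permil(r_sample: float, r_standard: float) -> float:
    """Return delta in per mil: (R_sample / R_standard - 1) * 1000."""
    return (r_sample / r_standard - 1.0) * 1000.0


# Approximate 18O/16O ratio of VSMOW (illustrative value).
R_VSMOW_18O = 2005.2e-6

# Hypothetical sample whose ratio is 3 % lower than the reference.
r_sample = R_VSMOW_18O * 0.97

print(f"delta18O = {delta_permil(r_sample, R_VSMOW_18O):.2f} per mil")
```

A sample depleted in the heavy isotope relative to the reference gives a negative δ value, as is typical for Antarctic precipitation.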

Melting and data processing
Cores were melted and processed at the Ice Core Laboratory at the GNS National Isotope Centre in Lower Hutt, New Zealand. There were two separate melting campaigns, one in June-July 2013, in which the top 500 m were melted, and the other in June-July 2014, in which the remaining 260 m (500-760 m) were melted (Pyne et al., 2018). There were several important differences between the 2 years in the CFA set-up (Emanuelsson et al., 2015; Pyne et al., 2018), which necessitated that the data from each melting campaign be processed separately. These differences are noted where they are relevant to the calibration and uncertainty calculations; some factors were calculated individually for each melting campaign and applied only to the data from that campaign.
The ice was cut into 1 m segments and melted at a controlled rate of approximately 3 cm min⁻¹, producing a liquid flow rate of ~16.8 mL min⁻¹ (Pyne et al., 2018). The melting set-up is based on Bigler et al. (2011) and is discussed in more detail in Emanuelsson et al. (2015), Pyne et al. (2018) and Winstrup et al. (2017). Briefly, the cores were placed vertically on a gold-coated copper melting plate and were allowed to melt continuously under gravitational pull. The water from the clean, inner part of the core was drawn from the centre of the melt head and pumped to instruments for CFA of stable isotopes, methane, black carbon, insoluble dust particles, calcium, pH and conductivity and discrete samples for major ion and trace element analyses. The water from the outer part of the core was saved in vials for discrete stable and radioactive isotope analysis. Either three or four 1 m core segments were stacked on top of each other and melted without interruption (referred to here as a "stack"). At least one calibration cycle of three water standards was run between each stack. An optical encoder that rested on top of the core stack recorded the vertical distance displacement as the core melted. This displacement was translated into depth in millimetres and, along with the melting rate and other system information, was written to a log file every 1 s using LabVIEW software (National Instruments). These log files were used to align all CFA instrument data to the depth scale. Breaks in the ice were measured and recorded to 1.0 mm precision before melting. Any ice that was cut out and removed was recorded as a gap in the depth scale. The raw data files were processed using a graphical user interface (GUI) and a semi-automated script written in Matlab (Matlab Release 2012b, The MathWorks, Inc., Natick, Massachusetts, United States). Occasionally, poor-quality ice (i.e. ice containing fractures and slanted breaks) caused the upper part of the stack to stick to the sides of the core holder; the depth encoder failed to register any change in depth for a time, while the base of the stack continued to melt. These intervals required linear interpolation (assuming a constant melt rate) and introduced a small amount of uncertainty (Pyne et al., 2018). This occurred more frequently deeper in the core in the brittle ice zone (below 500 m). Given that the melt rate was fairly constant throughout the campaign, the error introduced in the depth assignment was negligible. More details of the data processing are available in Pyne et al. (2018).
Water isotope values (δ2H, δ18O) were measured using CFA with a water vapour isotope analyser (WVIA) using off-axis integrated cavity output spectroscopy (OA-ICOS; Baer et al., 2002) and a modified Water Vapour Isotopic Standard Source (WVISS) calibration unit (manufactured by Los Gatos Research, LGR). This system is described in detail in Emanuelsson et al. (2015). The 2013 and 2014 set-ups were largely the same but differed in the construction of the vaporizer and the delivery of the mixed vapour to the isotope analyser. In 2014, the heating element of the vaporizer was modified, and a higher sample flow was delivered directly to the IWA through an open split (Emanuelsson et al., 2015). Data were recorded in an output file at a rate of 2 Hz (0.5 s) in 2013 and at 1 Hz (1.0 s) for the remaining 260 m in 2014. The change in the recording rate of the isotope data in 2014 was made to match the rate at which the depth was recorded in both years (1 Hz). Note that this was not a change in the instrument's internal data acquisition rate, only in the rate of output aggregation.
The campaigns altogether required processing and alignment of over 5 million raw data points. Depth alignment across multiple measurement systems is a key issue for ice core campaigns and a fundamental requirement for producing an age chronology (Winstrup et al., 2017). The interpretation and identification of key events in the climate history thus depend on accurate depth alignment. This is particularly important deeper in the core, where a misalignment of a few centimetres could equate to hundreds or even thousands of years (Lee et al., 2018). Alignment of the isotope data to the depth scale is based on the time lag between the depth log file and the WVIA instrument output. The time lag was determined with an automated algorithm to detect the end of the calibration cycle and the beginning of the ice core melt stream using the abrupt increase in the change in numeric derivatives of adjacent data points. The calculated time lags during each measurement campaign averaged 418 s in 2013 and 156 s in 2014 but varied slightly from day to day by 10-20 s. (The lag was shorter in 2014 due to the reduction in length of tubing between the melter and WVIA. Variations occurred from the periodic replacement of the tubing.) There were a few occasions of equipment failure where manual depth alignment was necessary. Poor ice quality also affected the accuracy of the depth log files, as mentioned above (Pyne et al., 2018). The precise quantification of the uncertainty introduced from the depth assignment is beyond the scope of this paper; based on the variation in time lags, we estimate that, at most, it is of the order of 1-10 mm.

Data quality control
We applied several basic selection criteria to identify and eliminate poor-quality data from the raw δ2H and δ18O data set. The two main reasons for data removal were (1) changes in the water vapour concentration (H2O ppm) in the LGR analyser, and (2) the finite response time of the analyser and the transitional period when switching between water standards from the calibration cycle and RICE ice core meltwater (which by design had very different isotopic values). In addition, some gaps were introduced as a result of cutting the core into 1 m segments and the fractures in the ice that occurred during the drilling, recovery and handling process (Pyne et al., 2018).
The isotope ratio is dependent on water vapour concentration in the analyser (Sturm and Knohl, 2010; Kurita et al., 2012). To minimize the need to correct the data for this, the concentration in the analyser was kept as close to 20 000 ppm as possible. This value was monitored and recorded at the same frequency as the isotope data. For the most part this concentration was stable, but fluctuations and sudden changes did sometimes occur (for example, when air bubbles passed through the line). We removed data when the difference between the H2O ppm moving average over the short-term system response time of ~60 s and over a longer-term, stable time of ~200 s was greater than the standard deviation of the short-term average (Emanuelsson et al., 2015): avg_s − avg_l > σ_s, where σ_s is the standard deviation of the short-term average. In addition, data were removed if the water vapour concentration fell below 15 000 ppm for an extended period. This filtering removed the need to further correct for variations in water vapour concentration in the record (Emanuelsson et al., 2015). Figure 1 shows a typical day of raw data, including both RICE ice core stacks and calibration cycles. Data marked in red were removed using these criteria. The majority of these points occur during the switch from one water standard to another in the calibration cycle and do not affect the data from the ice core itself. The percentage of data removed using these criteria was 0.4 % of the total.
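The water vapour criterion described above can be sketched in a few lines of Python. This is a simplified illustration rather than the processing script used for the record: the function name `flag_unstable`, the trailing (rather than centred) windows, the use of the absolute difference, the population standard deviation, and the flagging of every individual sample below 15 000 ppm (instead of only extended periods) are all assumptions made for the example, and 1 Hz data are assumed so that window lengths in samples equal seconds.

```python
from statistics import mean, pstdev


def flag_unstable(h2o_ppm, short_n=60, long_n=200, floor_ppm=15000.0):
    """Return a list of booleans, True where a sample would be removed.

    A sample is flagged when the short moving average differs from the
    long moving average by more than the standard deviation of the short
    window, or when the concentration is below floor_ppm.
    """
    flags = []
    for i, value in enumerate(h2o_ppm):
        short = h2o_ppm[max(0, i - short_n + 1): i + 1]
        long_ = h2o_ppm[max(0, i - long_n + 1): i + 1]
        sigma_s = pstdev(short) if len(short) > 1 else 0.0
        drifted = abs(mean(short) - mean(long_)) > sigma_s
        flags.append(drifted or value < floor_ppm)
    return flags


# Hypothetical series: stable at 20 000 ppm, then a sudden drop to 19 000 ppm.
series = [20000.0] * 250 + [19000.0] * 60
flags = flag_unstable(series)
print(sum(flags), "of", len(series), "samples flagged")
```

The comparison against the short-window standard deviation means that a noisy but stationary signal is retained, while a sustained shift in concentration is flagged once the short window has moved away from the long one.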
It was also necessary to remove some data points at the beginning and end of every stack during the transition period between the Milli Q (18.2 MΩ) laboratory water standard and ice core. This transition is illustrated in Fig. 2. The Milli Q standard is composed of local de-ionized water and has an isotopic value much greater than the RICE ice core (Table 1). Milli Q was run immediately before and after each stack, and there is a period of instrumental adjustment and mixing when switching between them due to memory effects and the finite response time of the spectrometer (see Emanuelsson et al., 2015 for a full discussion). To ensure that the data are not influenced by mixing at the beginning and end of the stack while including as much data as possible, we calculated the numerical derivative (or the rate of change) between consecutive δ2H data points during the transition until the derivative fell below a threshold; all points prior were then excluded. The same process was performed at the end of the stack in reverse. The threshold was found empirically and is different in 2013 and 2014 because of the difference in the response times of the two set-ups and the precision of the data. Data were inspected manually for cases where the algorithm was inadequate. Approximately 2-5 cm at the beginning and end of every stack was removed using this condition. These appear as gaps in the depth of the final data set. There were also a few occasions when melting was interrupted due to equipment failure, and Milli Q was run through the system until melting could resume; these periods were removed using the same procedure. A typical stack showing a portion of data removed is shown in Fig. 2 (δ2H vs. depth). The fraction of total data removed was 5.4 %. This resulted in short data gaps of 2-5 cm every 3 or 4 m.
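A minimal sketch of the derivative-based trimming described above is given below. The function names, the threshold and the δ2H values are hypothetical; the thresholds actually used were determined empirically for each campaign and are not reproduced here. The sketch stops at the first small step, so a practical version would need some protection against a single small difference occurring by chance within the transition.

```python
def first_stable_index(delta, threshold):
    """Index of the first point kept: the first point whose change from the
    previous point is smaller than threshold. All earlier points are dropped."""
    for i in range(1, len(delta)):
        if abs(delta[i] - delta[i - 1]) < threshold:
            return i
    return len(delta)


def trim_stack(delta, threshold):
    """Trim the transition at the start of a stack, then apply the same rule
    to the reversed series to trim the transition at the end."""
    start = first_stable_index(delta, threshold)
    end_rev = first_stable_index(delta[::-1], threshold)
    return delta[start: len(delta) - end_rev]


# Hypothetical d2H profile: standard water -> ice core -> standard water.
profile = [-60.0, -120.0, -160.0, -175.0, -176.0, -176.5,
           -176.2, -170.0, -120.0, -60.0]
print(trim_stack(profile, threshold=2.0))
```

With these made-up values only the two central points survive, which mirrors the conservative behaviour described in the text: a few centimetres are sacrificed at each end of the stack to avoid retaining mixed water.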
The entire data set was manually inspected for any other regions of poor quality, and points that visibly fell outside the normal range or were affected by known instrument problems were removed. This only applied to a few isolated sections of data and was a very small portion (< 0.1 %) of the total.

Calibration
It is necessary in laser spectroscopy to normalize the isotopic values to the VSMOW-SLAP scale and to correct for instrumental drift. To accomplish this, we used a two-point linear calibration method (Paul et al., 2007; Kurita et al., 2012). Before and after each ice core stack, we ran calibration sequences consisting of four laboratory water standards: Milli Q, Working Standard 1 (WS1), RICE snow (RICE) and US International Trans-Antarctic Scientific Expedition West Antarctic snow (ITASE). An example of a calibration cycle is shown in Fig. 3. Assigned or "true" values for these standards measured against the VSMOW-2-SLAP-2 scale are listed in Table 1. We note that there is a difference in the assigned values for RICE and ITASE between 2013 and 2014. We have denoted them RICE-13, RICE-14, ITASE-13 and ITASE-14 in Table 1 to indicate that these standards were prepared and stored in different batches in each year, from water sources that had not been treated as standards or homogenized, and thus are slightly different in composition. We emphasize here that our standards are local working standards, selected or mixed by our laboratory to match the isotope ratios of the sample (melt stream). It is not unexpected that their isotopic value will change between batches during long measurement campaigns, as it is not practical to prepare and store all of the material in one batch.
Part of the difference in assigned values might be attributed to the difference in measurement systems. The assigned values for the 2013 calibrations were determined using discrete laser absorption spectroscopy measurements on an Isotope Water Analyzer (IWA) 35EP system. In 2014, our instrument was upgraded with a second laser to IWA-45EP, and the 2014 calibrations utilize values from standards measured continuously with this system. We were regrettably not able to calibrate our working standards using the 2013 CFA set-up before the set-up was modified for the 2014 campaign, so we use the assigned values from the 2013 discrete measurements in the 2013 calibrations. We thus consider the 2014 melting campaign to be better calibrated than the 2013 campaign. This follows from the principle of identical treatment (IT) of stable isotope analysis wherein samples and reference materials should be subject to identical preparation, measurement pathways and data processing to the greatest extent possible (Werner and Brand, 2001; Carter and Fry, 2013; Meier-Augenstein, 2017).
The working standards used for the calibration, RICE and ITASE, have assigned values which form an upper and a lower bound, respectively, for the majority of the ice core isotopic values (the ice core samples from the younger, top portion of the core occasionally fall slightly above the RICE standard). The third water standard (WS1) served as a quality control to enable us to check and quantify the accuracy of the calibration. Each standard was run continuously for approximately 10 min (varying between 8 and 15 min over the course of the melting campaigns), of which the first and last 100-200 s were discarded to ensure only the middle, stable portion of the measurement was used for calibrations. Around 300 s of data were averaged to arrive at the mean value of the measurement.
Frequent measurements of calibration standards are necessary to correct isotopic measurements for instrumental drift over time. At least one cycle of all three standards was run between stacks, and in many cases, there were several cycles. Melting a stack of three or four cores took around 2-2.5 h, so the measurement at the midpoint of a stack (the points furthest from a calibration) is about 1-1.25 h from the nearest calibration. While this is longer than would be ideal for isotope laser spectroscopy, the stability of other elements of the CFA system (in particular, continuous flow methane measurements) required long uninterrupted periods of melting. δ18O is typically more affected by drift than δ2H. Drift can be worsened by experimental conditions such as drill fluid contamination and leaks in the system as the analyte proceeds toward the vacuum in the laser cavity. We have quantified the error introduced by the amount of drift occurring between calibrations using the Allan deviation, discussed in Sect. 2.4.1.
We have used a two-point linear normalization procedure, which is routinely used to adjust measured δ values to an isotopic reference scale (Paul et al., 2007). The correction takes the form of a linear regression:
$$y = m x + b,$$
where m is the slope of the line and b is the y intercept. The measured δ values of two laboratory standards are regressed against their assigned δ values. The slope m can be calculated by plotting the measured values of the standards on the x axis and their assigned values on the y axis and then using trigonometric formulas to relate them to the true value of the sample (Paul et al., 2007). The result is the ratio of the difference between the true RICE and ITASE δ values and the actual difference measured:
$$m = \frac{\delta^{T}_{\mathrm{RICE}} - \delta^{T}_{\mathrm{ITASE}}}{\delta_{\mathrm{RICE}i} - \delta_{\mathrm{ITASE}i}}, \qquad (2)$$
where δ^T_RICE and δ^T_ITASE are the assigned true values and δ_RICEi and δ_ITASEi are the ith measured values of the standards RICE and ITASE, respectively. The correction then takes the following form:
$$\delta_{\mathrm{corr}} = m \left( \delta_{\mathrm{raw}} - \delta_{\mathrm{RICE}i} \right) + \delta^{T}_{\mathrm{RICE}}, \qquad (3)$$
where δ_raw is the measured value and δ_corr is the corrected value. By design, the y intercept or offset b is equal to the difference between δ^T_RICE and δ_RICEi when the slope m is 1. We applied this correction to each data point by weighting the factors calculated from the RICE and ITASE calibration measurements both before and after the stack with the time difference between the data point and the calibration:
$$\delta_{\mathrm{corr}} = (1 - f) \left[ m_1 \left( \delta_{\mathrm{raw}} - \delta_{\mathrm{RICE1}} \right) + \delta^{T}_{\mathrm{RICE}} \right] + f \left[ m_2 \left( \delta_{\mathrm{raw}} - \delta_{\mathrm{RICE2}} \right) + \delta^{T}_{\mathrm{RICE}} \right], \qquad (4)$$
where δ_raw is the uncalibrated, raw δ2H or δ18O value of the ice core sample, δ_RICE1 and δ_RICE2 are the measured values of the RICE standard before and after the stack, respectively, m_1 and m_2 are the slopes (Eq. 2) calculated from the calibrations before and after the stack, t is the time of the δ_raw measurement, and f is a dimensionless weighting factor, f = (t − t_1)/(t_2 − t_1), where t_1 is the starting time of the δ_RICE1 measurement before the stack and t_2 is the ending time of the δ_RICE2 measurement after the stack. We note that this method assumes that drift is approximately linear over the measurement period. Our calibration procedure was validated by comparison with discrete measurements in Emanuelsson et al. (2015). The values of the slope corrections and the RICE and ITASE raw measurements used to calibrate the data in each year are shown in Figs. S2-S4 in the Supplement; mean values and standard deviations are in Table S1 in the Supplement.
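The time-weighted two-point calibration described above can be illustrated with the following sketch. It is written from the description in the text rather than taken from the original Matlab processing script, and all function names, dictionary keys and numerical values (assigned values and measured standards) are hypothetical.

```python
def two_point(raw, true_rice, true_itase, meas_rice, meas_itase):
    """Two-point linear correction anchored on the RICE standard."""
    m = (true_rice - true_itase) / (meas_rice - meas_itase)  # slope
    return m * (raw - meas_rice) + true_rice


def calibrate(raw, t, cal1, cal2, true_rice, true_itase):
    """Blend the corrections from the calibrations before (cal1) and after
    (cal2) a stack, weighted by f = (t - t1) / (t2 - t1)."""
    f = (t - cal1["t"]) / (cal2["t"] - cal1["t"])
    before = two_point(raw, true_rice, true_itase, cal1["rice"], cal1["itase"])
    after = two_point(raw, true_rice, true_itase, cal2["rice"], cal2["itase"])
    return (1.0 - f) * before + f * after


# Hypothetical d2H values in per mil; times in seconds.
TRUE_RICE, TRUE_ITASE = -175.0, -300.0
cal_before = {"t": 0.0, "rice": -174.0, "itase": -298.0}
cal_after = {"t": 7200.0, "rice": -173.0, "itase": -297.5}

value = calibrate(-200.0, 3600.0, cal_before, cal_after, TRUE_RICE, TRUE_ITASE)
print(f"calibrated d2H at mid-stack: {value:.2f} per mil")
```

By construction, a raw value equal to the measured RICE standard maps onto the assigned RICE value, and likewise for ITASE; the weighting simply interpolates linearly in time between the two bracketing calibrations, which is where the assumption of approximately linear drift enters.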

Uncertainty calculation
We identified three main sources of uncertainty in our measurements: (i) the Allan variance error (a measure of our ability to correct for drift, a systematic source of uncertainty due to instrumental instability), (ii) the scatter or noise in the data over our chosen averaging interval, and (iii) a general calibration error relating to the overall accuracy of our calibration.
Our three error factors can be formally categorized as follows:
i. Allan variance error is the systematic error or bias due to our imperfect ability to correct for drift;
ii. scatter error is the error of the variance, precision or random variation of replicate measurements;
iii. calibration error is the error of the mean or trueness.
The last two can be quantified with general analytical expressions (Kirchner, 2001). Systematic error does not have a general analytical form; isotopic drift is fortunately amenable to correction, but the method is imperfect.
We assume that the three error factors are uncorrelated to a large degree. This is supported by the general framework that we have used (Kirchner, 2001; Analytical Methods Committee, 2003) and the actual errors calculated at each data point (R² < 0.05 in each year for both isotopes). In practice it is impossible for all error factors to be completely uncorrelated, as some underlying sources of error will affect all aspects of the system. However, our data suggest that these interactions are small and/or short-lived and negligible to the total uncertainty. With this assumption, we calculate each error factor separately and add them in quadrature to arrive at the total uncertainty estimate:
$$\sigma_{\mathrm{total}} = \sqrt{\sigma_{\mathrm{AVE}}^{2} + \sigma_{\mathrm{scatter}}^{2} + \sigma_{\mathrm{calib}}^{2}},$$
where σ_AVE, σ_scatter and σ_calib are the Allan variance error, scatter error and calibration error, respectively. Each data point in the final record is assigned a unique error value. A detailed explanation of the calculation of each source of uncertainty follows.

Allan variance error
The Allan variance σ²_allan, or two-sample frequency variance (Allan, 1966), is often used as a measure of signal stability and instrumental precision in laser spectroscopy (Werle, 2011; Aemisegger et al., 2012). In the context of CFA isotope measurements, it is also used as an estimate of how much instrumental drift accumulates over a specified period. It is defined by
$$\sigma^{2}_{\mathrm{allan}}(\tau) = \frac{1}{2n} \sum_{j=1}^{n} \left( \delta(\tau)_{j+1} - \delta(\tau)_{j} \right)^{2},$$
where τ is the averaging time, n is the number of time intervals, and δ(τ)_j and δ(τ)_{j+1} are the mean values of adjacent time intervals j and j + 1 with length τ. The Allan deviation is the square root of the variance, σ_allan. We calculated the Allan deviation of our system using measurements of the Milli Q standard, run continuously for 24-48 h. We conducted these tests periodically during both measurement campaigns (usually over the weekend when the instruments were otherwise idle; see Emanuelsson et al., 2015, for details). On a log-log plot of the Allan deviation vs. averaging time (τ), there is a minimum at the averaging time where the precision is highest; before this point, at very short averaging times, instrumental noise affects the signal, and after, at longer averaging times, the effects of instrumental drift can be seen. Thus, the Allan deviation provides an estimate of the optimal averaging time, before and after which precision decreases.
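For readers who wish to experiment with the Allan deviation, a minimal non-overlapping estimator is sketched below. It is an illustration only: the function name is hypothetical, the window length is given in samples rather than seconds, and the sum is normalized by twice the number of adjacent window pairs, which is one common convention and may differ in detail from the implementation used for this record.

```python
from math import sqrt
from statistics import mean


def allan_deviation(series, m):
    """Allan deviation for an averaging window of m samples.

    The series is split into consecutive, non-overlapping windows of m
    samples; the squared differences of adjacent window means are averaged
    and halved, and the square root is returned.
    """
    bins = [mean(series[i:i + m]) for i in range(0, len(series) - m + 1, m)]
    if len(bins) < 2:
        raise ValueError("need at least two complete windows")
    sq_diffs = [(b - a) ** 2 for a, b in zip(bins, bins[1:])]
    return sqrt(sum(sq_diffs) / (2 * len(sq_diffs)))


# Hypothetical signal that alternates between two values: averaging over
# two samples removes the alternation entirely.
signal = [0.0, 1.0] * 4
print(allan_deviation(signal, 1), allan_deviation(signal, 2))
```

Evaluating such a function over a range of window lengths and plotting the result on log-log axes gives the characteristic curve described above, with noise-dominated behaviour at short averaging times and drift-dominated behaviour at long ones.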
The Allan deviation can also provide an indication of the uncertainty due to instrumental drift as a function of the time difference between the measurement and the nearest calibration. For our system to stay under the precision limit of 1.0 ‰ and 0.1 ‰ for δ2H and δ18O, respectively (and to permit analysis with deuterium excess, d = δ2H − 8 · δ18O), a calibration cycle to correct for drift should occur at least every ~1 h during ice core measurements (Emanuelsson et al., 2015). However, as noted above, system limitations prevented us from running calibrations as frequently as would have been optimal. We use the Allan deviation here to estimate how quickly instrumental drift increased and thus how well we were able to correct for drift using our calibrations.
We plot the mean σ_allan for all tests performed against averaging time τ on a log-log scale (done separately for 2013 and 2014) and perform a linear regression on the curve for averaging times greater than the minimum σ_allan. The equation of the linear fit gives what we refer to as the Allan variance error (denoted by σ_AVE to distinguish our error from the official definition of the Allan deviation):
$$\log \sigma_{\mathrm{AVE}}(t) = a \log t + b,$$
where t is the time difference between the data point and the calibration (as measured from the start of the measurement of the RICE standard), and a and b are constants determined from the linear regression. This error factor is calculated for each data point as a function of t. Because we calibrated using standards measured both before and after each stack, there are two factors at each point that are combined with a time-weighted average, using the same weighting used for the calibration (Eq. 4):
$$\sigma_{\mathrm{AVE}} = (1 - f)\, \sigma_{\mathrm{AVE},1} + f\, \sigma_{\mathrm{AVE},2},$$
where σ_AVE,1 and σ_AVE,2 are the errors evaluated for the time differences to the calibrations before and after the stack, respectively, and f is defined as before in Sect. 2.3. Allan variance error vs. depth over the whole data set is shown in Fig. 4. The local maximum for each stack occurs in the middle, at the point furthest away in time from the two calibrations bracketing the stack, reflecting that it is at this point that we are most uncertain of the amount of instrumental drift.
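The log-log fit and the time-weighted combination described above might be sketched as follows. The base-10 logarithm, the function names and the example Allan deviation values are assumptions made for illustration; the fitted constants for the RICE campaigns are not reproduced here.

```python
from math import log10


def fit_loglog(taus, sigmas):
    """Least-squares fit of log10(sigma) = a * log10(tau) + b; returns (a, b)."""
    xs = [log10(t) for t in taus]
    ys = [log10(s) for s in sigmas]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    a = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    return a, ybar - a * xbar


def sigma_ave(dt, a, b):
    """Drift error predicted by the fit for a time difference dt > 0."""
    return 10.0 ** (a * log10(dt) + b)


def weighted_sigma_ave(t, t1, t2, a, b):
    """Combine the errors relative to the calibrations at t1 and t2 using
    the weighting f = (t - t1) / (t2 - t1); requires t1 < t < t2."""
    f = (t - t1) / (t2 - t1)
    return (1.0 - f) * sigma_ave(t - t1, a, b) + f * sigma_ave(t2 - t, a, b)


# Hypothetical drift-dominated branch of an Allan deviation curve.
a, b = fit_loglog([100.0, 1000.0, 10000.0], [0.01, 0.1, 1.0])
for t in (600.0, 3600.0, 6600.0):
    print(t, round(weighted_sigma_ave(t, 0.0, 7200.0, a, b), 3))
```

With these made-up numbers the combined error is largest at the midpoint of the two-hour stack and smaller near either calibration, which is the behaviour described in the text.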

Scatter error
A second error derives from the scatter or noise in the signal over our averaging interval (15 s). This averaging interval was chosen by the RICE project team as a suitable scale over which to smooth measurement noise without obscuring important features in the data. This equates to approximately 7-8 mm on the depth scale. Due to this deliberate choice, the error calculation that follows applies over this interval.
To quantify this analytical uncertainty, we calculate the standard deviation for every 15 s time interval contained in each measurement of the RICE standard using a moving window (so that each adjacent, overlapping interval is advanced by 1 s) and average over the duration of the measurement:
$$\sigma_{\mathrm{scatter}} = \frac{1}{N} \sum_{i=1}^{N} \frac{\sigma_{i}}{\sqrt{n_{i}}},$$
where σ_i is the standard deviation, N is the total number of intervals, and n_i is the number of data points in the ith interval (n ≈ 30 in 2013 and ≈ 15 in 2014). We note that the number of points that are contained in the interval is different in 2013 and 2014, resulting from the difference in output aggregation (not the instrument's internal data acquisition rate). This could affect the amount of noise in the data. However, we have not attempted to analyse this in detail, as we are only concerned here with quantifying the uncertainty associated with our averaging interval, regardless of the number of data points averaged. Again, because the RICE standard was measured both before and after each stack, we calculate σ_scatter for both measurements and linearly combine them using a time-weighted average. Note this error is linear with time within a stack but is discontinuous at the points at which a stack begins and ends. This linearity is rooted in the fact that the noise in a set of measurements from the same sample can in general be modelled as a Gaussian process, with a normal distribution of independent random variables. The mean-squared displacement is linear with time. Scatter error vs. depth for the length of the core is shown in Fig. 5.
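A simplified sketch of the moving-window scatter calculation follows. It assumes that each window contributes its standard deviation divided by the square root of the number of points it contains, that the sample (n − 1) standard deviation is used, and that the window advances by one sample at a time (equivalent to 1 s only for 1 Hz data). The function names and the test values are hypothetical.

```python
from math import sqrt
from statistics import stdev


def scatter_error(values, window):
    """Average of s_i / sqrt(n_i) over every full moving window of
    `window` consecutive samples, advancing one sample at a time."""
    terms = []
    for i in range(len(values) - window + 1):
        chunk = values[i:i + window]
        terms.append(stdev(chunk) / sqrt(len(chunk)))
    return sum(terms) / len(terms)


def time_weighted(before, after, f):
    """Linear combination of the values from the standard measurements
    before and after a stack, with weight f between 0 and 1."""
    return (1.0 - f) * before + f * after


# Hypothetical standard measurement that alternates between two values.
measurement = [0.0, 2.0] * 10
s = scatter_error(measurement, window=4)
print(round(s, 4), round(time_weighted(s, 2 * s, 0.25), 4))
```

Because the two bracketing estimates are combined linearly, the resulting error varies linearly in time within a stack and can jump at stack boundaries, as noted above.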

Calibration error
Finally, we calculate the error of the mean after applying our calibration procedure to quantify the trueness of the measurement with respect to our reference scale, denoted by σ_calib. This captures both random, unsystematic components of uncertainty and systematic biases in the calibration stemming from a variety of (unspecified) sources. This quantity is often calculated as a check on the overall quality of the calibration procedure. Because it encompasses multiple sources of error, we expect it to be a relatively large error. Here, we make use of the large set of WS1 measurements that were taken during the calibration cycles. To calculate this factor, we apply the calibration formula using the RICE and ITASE standards (Eqs. 2 and 3) to the third quality-control standard, WS1, measured in the same cycle. The error is defined as the difference between the corrected, measured value and the assigned value of the WS1 standard. An example is shown in Fig. 6. We calculated this difference for all calibration cycles containing measurements of all three standards (RICE, ITASE and WS1) of sufficient quality (there were 221 such calibration cycles in 2013 and 318 in 2014) and then took the mean of the differences. Separate error estimates for the 2013 and 2014 melting campaigns were calculated and applied only to the data points from the respective year. The calibrated values obtained for all of the WS1 measurements throughout both campaigns are shown in Fig. S1 in the Supplement.
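The calibration error described above amounts to calibrating the quality-control standard with the other two standards and averaging the residuals. The sketch below illustrates this with hypothetical values; the text does not state whether signed or absolute differences were averaged, so the signed mean used here is an assumption, as are the function names and dictionary keys.

```python
def two_point(raw, true_rice, true_itase, meas_rice, meas_itase):
    """Two-point linear correction using the RICE and ITASE standards."""
    m = (true_rice - true_itase) / (meas_rice - meas_itase)
    return m * (raw - meas_rice) + true_rice


def calibration_error(cycles, true_rice, true_itase, true_ws1):
    """Mean difference between calibrated and assigned WS1 values over a
    list of calibration cycles (signed mean; an assumption)."""
    diffs = [
        two_point(c["ws1"], true_rice, true_itase, c["rice"], c["itase"]) - true_ws1
        for c in cycles
    ]
    return sum(diffs) / len(diffs)


# Hypothetical d2H values (per mil) for two calibration cycles.
cycles = [
    {"rice": -174.0, "itase": -299.0, "ws1": -78.5},
    {"rice": -174.2, "itase": -299.2, "ws1": -78.9},
]
err = calibration_error(cycles, true_rice=-175.0, true_itase=-300.0, true_ws1=-80.0)
print(f"calibration error: {err:.2f} per mil")
```

In this made-up example WS1 lies well above both calibration standards, so the check involves extrapolation beyond the calibrated range, a situation that tends to inflate the residual.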

Results and discussion
Total error vs. depth for the whole record is shown in Fig. 7 and summarized in Table 2. The mean total errors for all data points are 0.74 ‰ (δ2H) and 0.21 ‰ (δ18O). Separated by melting campaign, mean total errors in 2013 are 0.85 ‰ (δ2H) and 0.22 ‰ (δ18O) and in 2014 they are 0.44 ‰ (δ2H) and 0.19 ‰ (δ18O). The total error reduces sharply at a depth of 500 m due to the switch between 2013 and 2014 campaigns and the greatly reduced calibration error in 2014. However, we observe a larger variability in the error in the 2014 data. This is mainly a result of the highly variable amount of noise in the measurements, which is discussed below.
The mean Allan errors for all data are 0.12 ‰ for δ2H and 0.14 ‰ for δ18O. Calculated separately by melting campaign, the mean errors are 0.13 ‰ (δ2H) and 0.16 ‰ (δ18O) in 2013 and 0.083 ‰ (δ2H) and 0.11 ‰ (δ18O) in 2014. As expected, the Allan error peaks at the points in the middle of the stack, furthest from a calibration (Fig. 4). It is both absolutely and proportionally larger for δ18O, as δ18O is typically more affected by drift.
The amount of scatter in the data varies considerably over the length of the record, particularly in 2014. The mean scatter errors over the whole record are 0.29 ‰ (δ2H) and 0.10 ‰ (δ18O). Separated by melting campaign, the mean errors are 0.26 ‰ (δ2H) and 0.093 ‰ (δ18O) in 2013, and 0.37 ‰ (δ2H) and 0.13 ‰ (δ18O) in 2014. On average, the scatter error is larger in 2014, although during the periods of best instrumental performance, σscatter is lower than at any point in 2013. The instrument performance was much more variable in 2014 than in 2013: the standard deviations of σscatter are 0.11 ‰ (δ2H) and 0.045 ‰ (δ18O) in 2014, as opposed to 0.026 ‰ (δ2H) and 0.012 ‰ (δ18O) in 2013.
Among the three error factors, the general calibration error is the largest contributor to the total error in 2013: σcalib(δ2H) = 0.80 ‰ and σcalib(δ18O) = 0.12 ‰. However, this error is greatly reduced for 2014: σcalib(δ2H) = 0.22 ‰ and σcalib(δ18O) = 0.078 ‰, reflecting the improved measurement of the assigned values of the standards. We were not able to measure the standards against VSMOW-SLAP using the 2013 CFA set-up, because our instrument was sent to the manufacturer for modification after the 2013 campaign concluded and time constraints did not permit additional measurements. Such measurements would have provided a better comparison between measured and assigned values, following the principle of identical treatment (Werner and Brand, 2001). The 2013 σcalib is thus likely to be a very conservative estimate of the error. In addition, the assigned value of WS1 is well outside the range of the RICE ice core and is much greater than the values of the RICE and ITASE standards, and thus RICE and ITASE could be considered poor choices for calibrating WS1. The two calibration standards, RICE and ITASE, were chosen to be similar in isotopic value to the ice core samples being measured (Werner and Brand, 2001), with the quality-control standard being of secondary concern. Ideally, we would use a quality-control standard that falls within the range of the values of our two calibration standards. While we could have used WS1 and ITASE as our calibration standards and RICE as the quality-control standard, WS1 is less appropriate than RICE for calibrating the range of isotopic values found in the ice core. Testing the sensitivity of the calibration error to our selection of quality-control standards, however, is outside the scope of this paper.
The scatter error dominates the total error in 2014. The magnitude of this error was highly variable from day to day, and thus the total error also varied considerably. There were some periods in which the instrument performed exceptionally well. During these periods, total errors were as low as 0.3 ‰ (δ2H) and 0.1 ‰ (δ18O). These represent the high end of system capability. However, for much of the 2014 melting campaign the total errors were closer to the average of 0.44 ‰ (δ2H) and 0.19 ‰ (δ18O). Because the campaign operated many measurement systems simultaneously, as is characteristic of ice core CFA campaigns, it was typically not possible to conduct comprehensive performance tests and systematic evaluations during the 1 day of downtime in each 7-day cycle. As a result, the precise sources of performance deterioration were difficult to isolate. Our method for calculating uncertainty is designed to capture the changing day-to-day conditions resulting from a range of system variations and performance issues, even if it is not possible to pinpoint the exact cause.
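As a rough consistency check on the numbers reported in this section, the Python sketch below combines the mean Allan, scatter and calibration errors of each campaign in quadrature and compares the result with the reported mean total error. This is only an approximation: the total error of the record is computed point by point, and the mean of per-point quadrature sums need not equal the quadrature sum of the mean components. The dictionary layout and function names are illustrative.

```python
import math

# Mean error components reported in the text, in per mil:
# (Allan, scatter, calibration)
components = {
    ("2013", "d2H"): (0.13, 0.26, 0.80),
    ("2013", "d18O"): (0.16, 0.093, 0.12),
    ("2014", "d2H"): (0.083, 0.37, 0.22),
    ("2014", "d18O"): (0.11, 0.13, 0.078),
}

# Mean total errors reported in the text, in per mil.
reported_total = {
    ("2013", "d2H"): 0.85,
    ("2013", "d18O"): 0.22,
    ("2014", "d2H"): 0.44,
    ("2014", "d18O"): 0.19,
}

def quadrature_sum(*errors):
    """Square root of the sum of squared error terms."""
    return math.sqrt(sum(e * e for e in errors))

def variance_shares(allan, scatter, calib):
    """Fraction of the total variance contributed by each term."""
    total_var = allan**2 + scatter**2 + calib**2
    return {
        "allan": allan**2 / total_var,
        "scatter": scatter**2 / total_var,
        "calib": calib**2 / total_var,
    }

for key, errs in components.items():
    total = quadrature_sum(*errs)
    shares = variance_shares(*errs)
    print(key, round(total, 3), reported_total[key],
          {name: round(share, 2) for name, share in shares.items()})
```

On these mean values the quadrature sums agree with the reported totals to within about 0.003 ‰. For δ2H the calibration term carries most of the variance in 2013 and the scatter term most of it in 2014, while the δ18O shares are more evenly split among the three terms.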

Summary and conclusions
We have described a systematic approach to the data processing and calibration for the RICE CFA stable water isotope data set and presented a novel methodology to calculate uncertainty estimates for each data point derived from three factors: Allan deviation, scatter, and calibration accuracy. The mean total errors for all data points are 0.74 ‰ (δ2H) and 0.21 ‰ (δ18O). Mean total errors in 2013 are 0.85 ‰ (δ2H) and 0.22 ‰ (δ18O) and in 2014 they are 0.44 ‰ (δ2H) and 0.19 ‰ (δ18O). This represents a significant achievement in the precision of high-resolution CFA water isotope measurements and in the documentation of uncertainty calculations for isotope analyses in a continuous measurement campaign comprising multiple complex measurement systems. The isotope analyser system performed exceptionally well during some time intervals in 2014, demonstrating high capability, even though this was not sustained. The variability in quality could be due to poor ice quality, interruptions in the CFA measurements, the build-up of residual drill fluid in the instrument, and/or leaks and valve degradation. Most likely, it was a combination of all of these factors.
The more accurate measurement of our laboratory water standards for the 2014 melting campaign enabled us to reduce the uncertainty considerably for the data at depths greater than 500 m. More generally, a reduction in the uncertainty in the system could be achieved through more rapid calibration cycles, enabling both the insertion of calibration during stacks and more rapid troubleshooting to isolate causes of degraded performance.
Our uncertainty estimates do not take into account the additional uncertainty introduced from the smoothing of the data during the melting procedure and the measurement response time. This is an important issue, particularly for deep, older ice, where annual layers are greatly compressed and measurement resolution is crucial to the ability to date the core accurately. The degree of mixing in the melting procedure itself can be controlled through the melting rate and the diameter of tubing leading from the melter to the CFA instruments. Our system was designed primarily for high throughput and multiple, simultaneous measurements. However, these parameters can be adjusted to increase resolution for older ice (the very bottom of the RICE core has yet to be measured).
The volume of the evaporation chamber is usually a limiting factor in the temporal resolution and response time of the IWA and can introduce a significant amount of uncertainty. While we reduced the volume of the chamber from the manufacturer's default of 1.1 L to 40 mL (Emanuelsson et al., 2015), a finite amount of time is still required to flush the chamber and refill it with new sample. We estimate that our depth resolution was between 1.0 and 3.0 cm (Pyne et al., 2018). A more comprehensive evaluation of the effect of the mixing inherent in the melting and measurement procedure on the overall uncertainty is beyond the scope of this paper but is an important consideration for future work.
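The link between chamber volume and response time can be illustrated with a simple first-order model. As an assumption made here for illustration only, suppose the chamber of volume V behaves as a perfectly mixed reservoir flushed at a constant volumetric flow rate Q, with incoming isotopic composition C_in and chamber composition C. The flow rate is not specified in this section, so only the ratio of time constants is evaluated:

```latex
\frac{dC}{dt} = \frac{Q}{V}\,\bigl(C_{\mathrm{in}} - C\bigr),
\qquad
\tau = \frac{V}{Q},
\qquad
\frac{\tau_{\mathrm{default}}}{\tau_{\mathrm{modified}}}
  = \frac{1100\ \mathrm{mL}}{40\ \mathrm{mL}} = 27.5
```

Under this idealized model and at equal flow rate, the smaller chamber shortens the e-folding response time by a factor of about 27.5, but the response time remains non-zero, so some smoothing of the signal persists.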

Figure 1. An example of the raw data from a full day of ice melting and calibration cycles (2-3 July 2014): (a) δ2H, (b) δ18O and (c) water vapour mixing ratio. Isotope data that were removed because of water concentration anomalies are marked in red in panels (a) and (b).

Figure 2. A selected example section of δ2H vs. depth. The data marked in red represent the transitions between the Milli Q standard and ice core at the boundaries of each 3 m stack. These data points (and other poor quality data) were removed from the final data set.

Figure 3. Time vs. raw δ18O (uncalibrated) for 1 day of melting (3 July 2014). Values of standards drift noticeably over the course of the day. An example of one calibration cycle of three water standards run between ice core stacks is marked in colour: WS1 (red), RICE (green) and ITASE (blue).

Figure 4. Allan variance error vs. depth in per mil. δ2H is in blue and δ18O is in red. The low points of the dips are the start and end of a stack, between which calibrations were carried out.

Figure 5. Scatter error vs. depth in per mil. δ2H is in blue and δ18O is in red.

Figure 6. Representative δ18O calibration of ice core stack and WS1, using RICE and ITASE standards from the same cycle, 15 s moving average vs. time (measured on 2 July 2014). The difference between the true value of WS1 (blue) and the calibrated measured value of WS1 (red) is the calibration error. The error that was applied to the CFA data set is the average difference of all WS1 calibration measurements during the melting campaign.

Figure 7. Total uncertainty vs. depth, along with each individual error factor in per mil. (a) δ2H. (b) δ18O. There is a noticeable discontinuity at 500 m; the melting campaign was paused at 500 m in 2013, and melting was resumed in 2014 with a modified set-up. The reduced calibration error in 2014 is responsible for the large step down in total error.

Table 1. Accepted values (VSMOW-SLAP scale) for water standards used for calibrations in per mil (‰).

Table 2. Summary of uncertainty estimates in per mil (‰).
* n/a stands for "not applicable".