Intercomparison of Total Carbon Column Observing Network (TCCON) data from two Fourier transform spectrometers at Lauder, New Zealand

We describe the change in operational instrument for the routine measurement of the column-averaged dry-air mole fraction of several greenhouse gases (denoted Xgas) at the Lauder Total Carbon Column Observing Network (TCCON) site, and the steps taken, following a systematic methodology, to demonstrate comparability between the two observation systems. Further, we intercompare retrieved Xgas values over an intensive intercomparison period in October and November 2018, when both instruments were performing optimally, and on subsequent, less frequent occasions. The average difference between the two observing systems was found to be well below the expected level of uncertainty for TCCON retrievals for all compared species. In the case of XCO2, the average difference was 0.0264 ± 0.0465 % (0.11 ± 0.19 μmol mol−1).


Introduction
The Total Carbon Column Observing Network (TCCON; Wunch et al., 2011) coordinates globally distributed measurements of near-infrared solar absorption spectra, from which high-precision retrievals of the column-averaged dry-air mole fraction of several greenhouse gases (denoted Xgas), including CO2, CH4 and CO, can be made.
The National Institute of Water & Atmospheric Research (NIWA) atmospheric observatory at Lauder, New Zealand, was one of the first TCCON sites and has been operating since 2004. The site initially used a Bruker IFS 120HR (serial number (SN) 39, TCCON identifier lh) Fourier transform spectrometer (FTS) to take both near-infrared (NIR) TCCON measurements and mid-infrared (MIR) observations for the Network for the Detection of Atmospheric Composition Change Infrared Working Group (NDACC-IRWG; De Mazière et al., 2018). This required regular instrument interventions to change optical components. In 2010 a dedicated Bruker IFS 125HR (SN 072, TCCON identifier ll) was purchased to continue the TCCON measurements in parallel with MIR measurements on the IFS 120HR. The history of the instrument systems used for the TCCON dataset and a thorough description of the site, retrieval scheme and validation of the dataset were previously presented in Pollard et al. (2017), hereafter referred to as Pollard17, and a summary of the instrument changes is given in Table 1.
Because the Bruker IFS 120HR is no longer supported by the manufacturer, it was decided to purchase a second IFS 125HR (SN 132, TCCON identifier lr) to continue the TCCON dataset and to switch the existing instrument to MIR measurements for the NDACC, ensuring the continued reliability of both datasets.
The purpose of this article is to define the testing and comparisons needed to ensure that the two instrument systems give comparable results, and to demonstrate that the Lauder TCCON dataset meets these requirements and can be considered continuous across the change in instruments.
Several past studies have compared the measurements of low-resolution, portable Bruker EM27/SUN FTS instruments with the IFS 125HRs of TCCON stations, e.g. Gisi et al. (2012) and Hedelius et al. (2016). Side-by-side comparisons of high-resolution instruments […] Messerschmidt et al. (2010) compared NIR measurements from two IFS 125HR instruments side by side at the TCCON site in Bremen, Germany, and the comparison of the IFS 120HR with the original IFS 125HR at Lauder was described in Pollard17. The work described here represents the first time that an operational TCCON station has transferred its measurements from one IFS 125HR instrument to another, and it describes the steps needed to ensure the comparability of their measurements.
In the next section we will briefly describe the instrumentation and retrieval schemes. Section 3 will outline the tests undertaken to ensure comparability between the retrievals carried out using both instruments. Conclusions will be drawn in Sect. 4.

Experimental setup
In this section we outline both the instrumentation and the retrieval scheme used to produce the Lauder TCCON site dataset. These have already been described in detail in Pollard17, so this section gives a broad overview and concentrates on details specific to the change in instrument.

Instrumentation and data collection
The Bruker Optik GmbH IFS 125HR FTS is the primary instrument of the TCCON. Over the course of the Lauder TCCON time series, measurements have been made with three instruments, as outlined in the introduction and detailed in Table 1. For clarity, hereafter we refer to the instruments by their two-letter TCCON identifiers (i.e. lr for the new 125HR, ll for the previous 125HR and lh for the original 120HR, which is not discussed in detail herein).
The two instruments compared in this work are functionally identical, each using a calcium fluoride beam splitter and a maximum optical path difference of 45 cm to give a spectral resolution of 0.02 cm−1. The DC outputs of two detectors, InGaAs (spectral range 3800-12 000 cm−1) and silicon (9000-16 000 cm−1), are measured simultaneously.
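The link between the 45 cm maximum optical path difference (MOPD) and the quoted 0.02 cm−1 resolution can be checked with one line of arithmetic. The sketch below assumes the Bruker convention that nominal resolution equals 0.9/MOPD; the function name is hypothetical.

```python
def bruker_resolution(max_opd_cm: float) -> float:
    """Nominal resolution (cm^-1), ASSUMING Bruker's convention res = 0.9 / MOPD."""
    return 0.9 / max_opd_cm


# 45 cm maximum optical path difference, as used by both ll and lr
res = bruker_resolution(45.0)
print(f"nominal resolution: {res:.3f} cm^-1")
```

Other resolution conventions (e.g. the width of the unapodised line shape) give somewhat different numbers for the same MOPD, so the 0.9 factor should be read as a labelling convention rather than a physical constant.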
The high-resolution FTS instruments at Lauder are accommodated in a purpose-built, temperature-controlled building. In May 2018 instrument lh was removed from the building and replaced by lr, leaving ll in its original position.
Each instrument is positioned below a dedicated solar tracker with optical feedback providing a pointing accuracy of 0.02° (Robinson et al., 2020).
Ancillary meteorological measurements are made at a nearby climate station; its surface pressure data are required for the greenhouse gas (GHG) retrievals.
Through the use of automatic scheduling software (Geddes et al., 2018), continuous operation of the solar trackers and automated tracker covers, which close a hatch over the solar pointing elevation mirror in the presence of precipitation or winds above a set threshold, the operational TCCON instrument (lr) is able to make unattended measurements at any time. During the intensive intercomparison period in October and November 2018, ll was also left configured for NIR measurements and able to operate unattended in parallel with lr. Since November 2018, intercomparison measurements have been made with ll on an opportunistic basis. This has resulted in 34 d on which both instruments recorded NIR spectra, spread over the 12 months to September 2019.

Retrieval scheme
The GGG suite of processing software, currently version GGG2014 as described by Wunch et al. (2015), is used across the TCCON and includes software to process raw interferograms into spectra (i2s) and a non-linear least-squares fitting algorithm (GFIT). The implementation of GGG2014 used for the lr instrument is the same as for ll and has previously been described in Pollard17.
It is important to note that the outputs of the retrieval scheme are dry-air mole fractions (DMFs or Xgas), where the vertical column of the retrieved gas (VCgas) is scaled by the co-retrieved vertical column of oxygen (VCO2) in order to remove instrumental biases common to both:

$$X_{\mathrm{gas}} = 0.2095\,\frac{VC_{\mathrm{gas}}}{VC_{\mathrm{O_2}}},$$

where 0.2095 is the assumed dry-air mole fraction of O2. The DMF of dry air (Xair) is a special case given by

$$X_{\mathrm{air}} = 0.2095\,\frac{VC_{\mathrm{air}} - VC_{\mathrm{H_2O}}\,m_{\mathrm{H_2O}}/m_{\mathrm{dry\,air}}}{VC_{\mathrm{O_2}}},$$

where m_H2O and m_dry air are the mean molecular masses of water (18.02 g mol−1) and dry air (28.964 g mol−1), and VCair is calculated from the surface pressure (Ps):

$$VC_{\mathrm{air}} = \frac{P_s\,N_a}{\langle g\rangle\,m_{\mathrm{dry\,air}}},$$

where ⟨g⟩ is the column-averaged acceleration due to gravity and Na is Avogadro's constant.
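To make the DMF definitions concrete, the sketch below assumes the standard TCCON forms Xgas = 0.2095 VCgas/VCO2, Xair = 0.2095 (VCair − VCH2O m_H2O/m_dry air)/VCO2 and VCair = Ps Na/(⟨g⟩ m_dry air), in SI units. All input numbers (surface pressure, gravity, water column and the 2 % O2 bias) are synthetic illustrations, not Lauder data, and the function names are hypothetical.

```python
# Sketch of the DMF relations (SI units throughout).
O2_DMF = 0.2095          # assumed dry-air mole fraction of O2
M_H2O = 18.02e-3         # kg mol^-1, mean molecular mass of water
M_DRY = 28.964e-3        # kg mol^-1, mean molecular mass of dry air
N_A = 6.02214076e23      # mol^-1, Avogadro constant


def x_gas(vc_gas, vc_o2):
    """Dry-air mole fraction of a gas from its column and the O2 column."""
    return O2_DMF * vc_gas / vc_o2


def vc_air(p_s, g_mean):
    """Pressure-derived air column (molecules m^-2) from surface pressure (Pa)."""
    return p_s * N_A / (g_mean * M_DRY)


def x_air(vc_o2, vc_h2o, p_s, g_mean):
    """Xair: pressure-derived dry-air column relative to the O2-derived one."""
    vc_dry = vc_air(p_s, g_mean) - vc_h2o * M_H2O / M_DRY
    return O2_DMF * vc_dry / vc_o2


# Illustrative, synthetic inputs (not Lauder data)
p_s, g_mean, vc_h2o = 97_000.0, 9.78, 3.0e26
vc_dry_true = vc_air(p_s, g_mean) - vc_h2o * M_H2O / M_DRY
vc_o2 = O2_DMF * vc_dry_true / 0.98     # O2 column retrieved ~2 % too high
vc_co2 = 410e-6 * vc_dry_true           # "true" CO2 mole fraction of 410 ppm
print(f"Xair = {x_air(vc_o2, vc_h2o, p_s, g_mean):.4f}")
print(f"XCO2 = {x_gas(vc_co2, vc_o2) * 1e6:.1f} ppm")
```

The example shows why Xair is a useful diagnostic: its numerator comes from surface pressure, not from the spectra, so a bias in the retrieved O2 column appears directly in Xair instead of cancelling as it would for effects shared by VCgas and VCO2.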
In an idealised case Xair would be unity, but limitations in the spectroscopic databases used for the retrievals mean that the actual value typically lies within 1 % of 0.98. The value and stability of Xair are used as a diagnostic of the measurement system because VCair is independent of the instrument system, so instrumental biases are not removed by the scaling. Deviations from the nominal value can therefore indicate instrumental and systematic problems such as timing or pointing errors.

Hedelius et al. (2016) attempted to identify all parts of the measurement and retrieval system that could lead to differences in the retrieved Xgas quantities of two different FTS systems, which they summarised in Table 6 of that article, and we have used this as the basis for systematically demonstrating the comparability of the two instrument systems.
Several factors listed in Table 6 of Hedelius et al. (2016) are not relevant to the intercomparison considered in this work, for the following reasons:
- Because the two instruments are functionally identical, the incoming radiation attenuation effect, optimum averaging time and resolution effects do not need to be considered.
- Solar zenith angle (SZA) artefacts are eliminated by comparing temporally coincident observations made in parallel.
- Spectral fitting windows and the uncertainty budget for the fitting algorithm do not need to be considered because the same retrieval scheme is used for both instruments. The same is true for the averaging kernels. These do depend on the instrument signal-to-noise ratio (SNR), but this is a lower-order effect than their variation with SZA (Wunch et al., 2011), especially as the SNR is very similar for both instruments (see Sect. 3.1), so it need not be considered in this work.
- Long-term artefacts are not relevant over the period of this study.
- Because the instruments are co-located, region or zone dependence, surface pressure effects, sensitivity to the profile of meteorological parameters and differences in the a priori information can be ignored.

In the subsections below, we first examine the signal-to-noise characteristics of the two instruments and then address each of the remaining items in Table 6 of Hedelius et al. (2016) in its own subsection.

Signal-to-noise ratio
There are several methods for calculating the SNR of a spectrum. Here we use the method implemented in the upcoming version of the GGG processing suite, GGG2020, which smooths the spectrum to remove instrumental noise and thereby estimate the signal level, and then compares the unsmoothed spectrum with the smoothed spectrum (rms difference) in regions where the signal is close to zero to estimate the noise. Figure 1 shows histograms of the spectral SNRs calculated for both instruments during October 2018. Only spectra that passed the initial GGG quality checks (convergent solution, and volume mixing ratio scaling factor, rms fit residuals, frequency shift and solar gas shift within thresholds) were included in the statistics, and outliers are not shown. The median SNR is 154 for ll and 157 for lr, and the means are 153.5 and 154.1 respectively (standard deviations 8.2 and 10.3 respectively). The lr SNRs include a larger number of low-value outliers, which pull the lr mean further below its median than is the case for ll. However, the median values are similar, and we conclude that the two instruments perform to a similar standard in this regard.
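The smoothing-based SNR idea described above can be sketched in a few lines of NumPy. This is a hypothetical simplification for illustration, not the GGG2020 code: the boxcar smoother, the window length and the 2 % "near-zero" threshold are all assumptions, and the test spectrum is synthetic with a known noise level.

```python
import numpy as np


def estimate_snr(spectrum, window=15, zero_frac=0.02):
    """Rough SNR: peak of the smoothed spectrum divided by the rms of
    (unsmoothed - smoothed) where the smoothed signal is close to zero.
    Hypothetical simplification, not the GGG2020 implementation."""
    spectrum = np.asarray(spectrum, dtype=float)
    kernel = np.ones(window) / window
    smooth = np.convolve(spectrum, kernel, mode="same")  # boxcar smoothing
    interior = slice(window, -window)                     # drop edge-affected points
    s, sm = spectrum[interior], smooth[interior]
    signal = sm.max()                                     # signal level
    near_zero = sm < zero_frac * signal                   # "dark" regions only
    noise = np.sqrt(np.mean((s - sm)[near_zero] ** 2))    # rms residual there
    return signal / noise


# Synthetic spectrum: one broad band plus white noise of known amplitude
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20_000)
true_signal = 100.0 * np.exp(-((x - 0.5) / 0.15) ** 2)
noisy = true_signal + rng.normal(0.0, 0.5, x.size)   # true SNR = 100 / 0.5 = 200
snr = estimate_snr(noisy)
print(f"estimated SNR ~ {snr:.0f}")
```

Because each smoothed point includes the noise of the point it is compared with, the residual rms is about sqrt(1 − 1/window) of the true noise, so this sketch biases the SNR a few per cent high; a production implementation would account for that.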

Instrument line shape
The instrument line shape (ILS), retrieved from lamp measurements of a gas cell containing a known amount of HCl, is used across the TCCON as a diagnostic of instrument alignment and stability. The retrieval uses the LINEFIT 14.5 software and the methodology outlined in Hase et al. (2013).
Since Pollard17, the retrieval settings used at Lauder to obtain the ILS have changed, in accordance with TCCON-wide guidance, from a description of the ILS in terms of two typical misalignment parameters (shear and angular) to a full retrieval of the ILS as a function of optical path difference (OPD).
Over the period presented here, the mean modulation efficiency (ME) at the maximum OPD for ll is 1.022 ± 0.002 and the maximum phase error (PE) is 0.002 ± 0.002 rad. Following a realignment of ll in October 2017, this represents an increase in ME at maximum OPD and a reduction in maximum PE compared with the values presented in Pollard17. The resulting 2.2 % overmodulation at maximum OPD nevertheless remains below the 4 % limit required to ensure the necessary retrieval accuracy for XCO2 (Hase et al., 2013). For lr, the mean ME at maximum OPD is 0.994 ± 0.005 and the mean maximum PE is −0.002 ± 0.001 rad. Figure 2 shows both the ME at maximum OPD and the maximum PE for both instruments at approximately monthly intervals during and since the intercomparison period, demonstrating the quality and stability of the alignment of both instruments.
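The ME tolerance check reduces to simple arithmetic on the deviation of ME from unity. The sketch below applies the 4 % limit quoted above; treating it as a symmetric limit that also covers undermodulation is an assumption, and the function and constant names are hypothetical.

```python
ME_TOLERANCE = 0.04  # fractional deviation of ME from 1 at max OPD (4 % limit from the text)


def me_deviation_ok(me_at_max_opd: float, tol: float = ME_TOLERANCE) -> bool:
    """True if |ME - 1| at maximum OPD is within tol (symmetric limit ASSUMED)."""
    return abs(me_at_max_opd - 1.0) <= tol


# Mean ME at maximum OPD reported above for each instrument
for name, me in {"ll": 1.022, "lr": 0.994}.items():
    print(f"{name}: ME deviation {(me - 1.0) * 100:+.1f} %, within limit: {me_deviation_ok(me)}")
```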

Laser sampling error
It is a known feature of IFS 125HR instruments that the metrology laser can be sampled incorrectly, causing some of the spectral information above the Nyquist frequency of 7899 cm−1 to be folded below it, and vice versa, and producing features known as "ghosts" (Messerschmidt et al., 2010). This is mitigated in two ways. Firstly, the zero level on the laser amplifier board is tuned to minimise the effect and subsequently checked annually, a process more fully described in Pollard17. Secondly, within the GGG2014 i2s software, the spectra are resampled based on the spectra from the silicon detector, which lie wholly in the upper half of the alias, as described in Wunch et al. (2015). Figure 3 shows the laser sampling error (LSE) determined using this method for both instruments during the intercomparison period; the mean and standard deviation are (1.475 ± 1.315) × 10−4 for ll and (1.167 ± 1.612) × 10−4 for lr. These diagnosed values are small relative to the range of LSE that can be corrected by resampling in i2s, and therefore will not have a detrimental effect on the retrievals.
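The "folded below it and vice versa" behaviour is mirror-image aliasing about the Nyquist frequency: a feature at wavenumber ν produces a ghost at 2νN − ν. The minimal sketch below uses the 7899 cm−1 value quoted above; the function name is hypothetical.

```python
NU_NYQUIST = 7899.0  # cm^-1, folding wavenumber quoted in the text


def ghost_position(nu: float) -> float:
    """Wavenumber (cm^-1) at which a feature at nu appears after folding
    about the Nyquist frequency."""
    return 2.0 * NU_NYQUIST - nu


# A silicon-band feature folds into the InGaAs band, and back again
print(ghost_position(9000.0))
print(ghost_position(ghost_position(9000.0)))
```

This is why LSE ghosts from the silicon region can contaminate the InGaAs microwindows below 7899 cm−1 and why the silicon spectrum, lying entirely above the folding point, is a convenient reference for diagnosing the LSE.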

Frequency shifts
The absolute calibration of the measured spectral grid can be affected by either a discrepancy between the actual and expected laser wavenumber of the metrology laser or a Doppler shift of the absorbing species in the atmosphere caused by atmospheric motion parallel to the solar pointing direction.
GGG retrieves this frequency shift for each microwindow from the idealised spectroscopy of the telluric absorption features. For the purposes of this comparison we examine the fitted shift in the oxygen window centred at 7885 cm−1, as this is the broadest microwindow and thus limits the sensitivity to specific species. For a 1-month period of the intercomparison we find that the median shift relative to the central wavenumber (Δν/ν) is −0.469 × 10−6 (standard deviation 0.028 × 10−6) for ll and −0.507 × 10−6 (0.026 × 10−6) for lr. This demonstrates similar performance of the metrology laser in both instruments. The variability of the shift is likely dominated by the wind-induced Doppler shift, hence the similarity of the standard deviations.
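The fractional shifts above can be translated into more intuitive quantities: an absolute shift at the 7885 cm−1 window centre and an equivalent line-of-sight velocity (non-relativistic Doppler, v = c Δν/ν). The sketch below is illustrative only; function names are hypothetical.

```python
C = 299_792_458.0  # speed of light, m s^-1


def shift_to_velocity(rel_shift: float) -> float:
    """Line-of-sight velocity (m s^-1) equivalent to a fractional shift dnu/nu."""
    return C * rel_shift


def shift_to_wavenumber(rel_shift: float, nu_centre: float = 7885.0) -> float:
    """Absolute shift (cm^-1) at the O2 window centre."""
    return rel_shift * nu_centre


# Median and standard deviation of dnu/nu reported above
for name, med, sd in [("ll", -0.469e-6, 0.028e-6), ("lr", -0.507e-6, 0.026e-6)]:
    print(f"{name}: median {shift_to_wavenumber(med):.5f} cm^-1, "
          f"spread ~{shift_to_velocity(sd):.1f} m s^-1")
```

The spread corresponds to only a few metres per second, a plausible magnitude for wind-induced Doppler shifts, whereas the median would correspond to well over 100 m s−1, which suggests it reflects the metrology laser calibration rather than atmospheric motion.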

Solar gas shifts
Similarly to the frequency shift, GFIT retrieves the shift of the solar spectral lines from their expected positions. GFIT accounts for the Doppler shift caused by the Earth's rotation and orbital eccentricity, so the remaining solar gas shift (SGS) is attributed wholly to the Doppler shift induced by the Sun's rotation when the solar tracker is not pointed at the centre of the solar disc. Figure 4 shows the retrieved SGS as a function of solar zenith angle for both instruments on 7 February 2019, together with the equivalent pointing accuracy required to achieve the TCCON target precision. The retrieved SGS remains well within this limit throughout the day, despite a small deviation for lr at high solar zenith angles in the morning as the solar tracker acquired the Sun and transitioned from passive to active tracking, indicating that acceptable solar pointing is achieved.
This provides confidence that the performance of the entire measurement system (in this case the solar tracker, FTS and retrieval scheme) is similar for both instruments. A more thorough discussion of the solar tracking system and its assessment is provided in Robinson et al. (2020).
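The conversion between SGS and angular mispointing can be sketched under the same assumption used for the right axis of Fig. 4, namely that any mispointing is perpendicular to the Sun's rotation axis. For rigid rotation, the line-of-sight velocity then scales linearly with the offset from disc centre. The solar equatorial speed (2.0 km s−1) and apparent solar radius (0.2666°) below are assumed round values, and the function names are hypothetical.

```python
C = 299_792_458.0       # m s^-1, speed of light
V_EQ = 2.0e3            # m s^-1, ASSUMED solar equatorial rotation speed
THETA_SUN = 0.2666      # deg, ASSUMED mean apparent solar radius


def sgs_to_pointing_error(sgs: float) -> float:
    """Angular offset (deg) from disc centre implied by a fractional solar gas
    shift, assuming rigid rotation and an offset perpendicular to the Sun's axis."""
    v_los = C * sgs                  # residual line-of-sight velocity, m s^-1
    return THETA_SUN * v_los / V_EQ  # v_los grows linearly with offset from the axis


def pointing_error_to_sgs(delta_deg: float) -> float:
    """Inverse mapping: fractional shift expected for a given offset (deg)."""
    return delta_deg / THETA_SUN * V_EQ / C


# A 0.02 deg offset (the tracker pointing accuracy quoted earlier)
print(f"SGS for 0.02 deg: {pointing_error_to_sgs(0.02):.2e}")
print(f"offset for SGS = 1e-7: {sgs_to_pointing_error(1e-7):.4f} deg")
```

Differential solar rotation, the inclination of the solar axis and the finite field of view are ignored, so these numbers are order-of-magnitude guides rather than a replacement for the tracker assessment in Robinson et al. (2020).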

Air mass dependence
Due to spectroscopic limitations, the retrievals of XCO2 and a number of other species exhibit an SZA or air mass dependence at all TCCON sites. An air-mass-dependent correction factor (ADCF) is derived for these species following Appendix A(e) of Wunch et al. (2011), based on fitting a symmetric and an antisymmetric function to the diurnal variation about the mean value. The symmetric variation is assumed to be an artefact of the spectroscopy used in the retrieval, whereas the antisymmetric component is assumed to be real. For the TCCON-wide correction, an ADCF is computed from long-term retrievals at a subset of sites. The ADCFs derived separately for each instrument, together with the prescribed TCCON values, are given in Table 2. Because the symmetric term can also be affected by instrumental problems (e.g. zero level offsets, continuum curvature and ILS uncertainties), it is reassuring that the ADCFs derived for both instruments are consistent with one another and with the prescribed TCCON values.
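The symmetric/antisymmetric decomposition can be sketched as a linear least-squares fit of X = X̄ (1 + α S(θ) + β A(t)), where S depends on SZA θ and A is odd about solar noon. The specific forms below, S(θ) = ((θ + a)/(90 + a))^b − ((45 + a)/(90 + a))^b with a = 13° and b = 3, and A(t) = sin(2πt) with t in days from solar noon, are assumptions for illustration; consult Wunch et al. (2011) for the exact TCCON definitions. The data are synthetic.

```python
import numpy as np


def symmetric_term(sza_deg, a=13.0, b=3.0):
    """ASSUMED symmetric basis function of SZA, zero at SZA = 45 deg."""
    return ((sza_deg + a) / (90 + a)) ** b - ((45 + a) / (90 + a)) ** b


def antisymmetric_term(t_from_noon_days):
    """ASSUMED antisymmetric basis: odd about solar noon (t in days)."""
    return np.sin(2 * np.pi * t_from_noon_days)


def fit_adcf(x, sza_deg, t_from_noon_days):
    """Fit X = Xbar * (1 + alpha*S + beta*A); returns (Xbar, alpha, beta)."""
    design = np.column_stack([np.ones_like(x), symmetric_term(sza_deg),
                              antisymmetric_term(t_from_noon_days)])
    (c0, c1, c2), *_ = np.linalg.lstsq(design, x, rcond=None)
    return c0, c1 / c0, c2 / c0


# Synthetic day: SZA symmetric about noon, known alpha and beta plus noise
rng = np.random.default_rng(1)
t = np.linspace(-0.3, 0.3, 500)                 # days from solar noon
sza = 30 + 50 * (t / 0.3) ** 2                  # 30 deg at noon, 80 deg at the ends
x_true = 410 * (1 + 0.006 * symmetric_term(sza) + 0.001 * antisymmetric_term(t))
x_obs = x_true + rng.normal(0.0, 0.1, t.size)
xbar, alpha, beta = fit_adcf(x_obs, sza, t)
print(f"Xbar={xbar:.2f}, alpha={alpha:.4f}, beta={beta:.4f}")
```

Because the model is linear in (X̄, X̄α, X̄β), an ordinary least-squares solve suffices; comparing the α fitted independently for each instrument is the kind of consistency check summarised in Table 2.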

X gas comparison
In this section we present data from both instruments retrieved from measurements made during the October-November 2018 intercomparison period and intermittently since.
In order to make meaningful comparisons between the two instruments, the data are first averaged into 10 min bins. A 10 min period was chosen to ensure that temporally coincident values are compared without aliasing in effects due to air mass dependence or natural variability.
This yields 833 coincident 10 min averages from the two instruments. Correlation plots for Xair, XCO2, XCH4 and XCO are shown in Fig. 5 and summarised in Table 3.
For Xair, lr is on average 0.0855 % higher than ll (standard deviation 0.1272 %), and the spread of the difference increases slightly at higher SZAs, as shown in Fig. 6. This is likely caused by small differences in the time each instrument takes to conduct a measurement, due to slightly different firmware versions or hardware, leading to small errors in the computed air mass that differ in magnitude between the forward and reverse scans. This effect is amplified at high SZAs, where the air mass changes more rapidly, as detailed in Pollard17.
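The 10 min binning and pairing step can be sketched with the standard library alone: average each instrument's values into fixed 600 s bins, keep only bins populated by both, and express the difference as a percentage. Expressing the difference relative to the second instrument is an assumption, and the timestamps and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

BIN_S = 600  # 10 min bins, in seconds


def bin_means(times_s, values, bin_s=BIN_S):
    """Mean value per fixed-width time bin, keyed by bin index."""
    bins = defaultdict(list)
    for t, v in zip(times_s, values):
        bins[int(t // bin_s)].append(v)
    return {k: mean(vs) for k, vs in bins.items()}


def coincident_differences(t_a, x_a, t_b, x_b, bin_s=BIN_S):
    """Percentage differences (a - b) / b * 100 for bins populated by both."""
    a = bin_means(t_a, x_a, bin_s)
    b = bin_means(t_b, x_b, bin_s)
    common = sorted(a.keys() & b.keys())
    return [100.0 * (a[k] - b[k]) / b[k] for k in common]


# Hypothetical timestamps (s since midnight) and XCO2 values (ppm)
t_ll, x_ll = [0, 100, 300, 700, 1300], [410.0, 410.2, 410.1, 409.9, 410.3]
t_lr, x_lr = [50, 250, 650, 2000], [410.0, 410.0, 409.8, 410.5]
d = coincident_differences(t_ll, x_ll, t_lr, x_lr)
print(f"{len(d)} coincident bins, ll - lr = {mean(d):.4f} ± {stdev(d):.4f} %")
```

Fixed, clock-aligned bins keep the pairing simple; a sliding or spectrum-matched window would retain more pairs but complicates the independence of the averages.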
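Why a small timing offset matters more at high SZA follows from differentiating a plane-parallel air mass m = sec θ: the fractional error is dm/m = tan θ dθ. The sketch below assumes that approximation (which degrades near the horizon) and holds the rate of change of SZA fixed so that only the tan θ amplification is shown; the rate and timing offset are hypothetical values.

```python
import math


def relative_airmass_error(sza_deg, sza_rate_deg_per_s, timing_error_s):
    """First-order fractional air-mass error, dm/m = tan(SZA) * d(SZA), for a
    plane-parallel air mass m = sec(SZA) evaluated at a slightly wrong time."""
    d_sza_rad = math.radians(sza_rate_deg_per_s * timing_error_s)
    return math.tan(math.radians(sza_deg)) * d_sza_rad


rate = 0.2 / 60.0   # deg s^-1, hypothetical rate of change of SZA
dt = 2.0            # s, hypothetical timing offset
for sza in (30, 60, 80):
    print(f"SZA {sza} deg: {100 * relative_airmass_error(sza, rate, dt):.4f} %")
```

The same air-mass error enters both VCgas and VCO2, so it largely cancels in Xgas, but it does not cancel in Xair, whose numerator is derived from surface pressure.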
The average XCO2 is virtually the same for both instruments (ll is 0.0264 % higher than lr, with a standard deviation of the differences of 0.0465 %) and well within the expected uncertainty of the retrieval scheme (0.2 %, Wunch et al., 2010) and the expected site-to-site bias (0.2 %, Wunch et al., 2015). As can be seen in Fig. 7, there is no discernible variation with SZA, because the effect of the small timing error largely cancels when the gas column is scaled by the vertical column of O2 to derive the dry-air mole fraction. The difference in the average retrieved XCH4 is also very small (ll 0.0561 % lower, standard deviation 0.0647 %), which is again well below the expected retrieval uncertainty (0.4 %, Wunch et al., 2010) and the expected site-to-site bias (0.4 %, Wunch et al., 2015).

Figure 4. Retrieved solar gas shift (left axis) and corresponding angular pointing error (right axis, assuming that any mispointing is perpendicular to the Sun's axis of rotation) as a function of solar zenith angle for both instruments during the course of 7 February 2019. The pointing accuracy required to maintain the TCCON precision target, equivalent to a 0.2 % error in XCO2, is indicated by the black line.

Conclusions
We have taken a systematic approach to demonstrating the comparability of retrieved XCO2, XCH4 and XCO from the existing and the new Bruker IFS 125HR instruments at the Lauder TCCON site. The approach considered each instrument system, including the solar tracker and the processing and retrieval scheme, as a whole. Most potential sources of discrepancy, both instrumental and methodological, can be discounted because the two instruments are co-located and share a common processing and retrieval scheme. We addressed each of the remaining, instrument-specific, sources in turn to ensure comparability.
Finally, we compared the retrieved data from each instrument over a 2-month comparison period in October and November 2018 and found excellent agreement, with an average difference between 833 10 min averages of 0.0264 % for XCO2 (ll − lr, standard deviation 0.0465 %), which is well below the expected TCCON site-to-site bias of 0.2 %. The difference in XCO2 reported here also compares favourably with previous work: in Pollard17, the comparison of the lh and ll instruments showed a mean difference in daily averages of XCO2 of 0.068 ± 0.113 %, and Messerschmidt et al. (2010) reported an average difference of 0.07 % for 1 h of data.
We therefore conclude that users of the Lauder TCCON dataset can consider it to be continuous across the change in instruments.