Research article 24 May 2018
Is it feasible to estimate radiosonde biases from interlaced measurements?
 ^{1}Bodeker Scientific, 42 Russell Street, Alexandra, New Zealand
 ^{2}Institute for Meteorology, Freie Universität Berlin, Carl-Heinrich-Becker-Weg 6–10, Berlin, Germany
 ^{3}National Institute of Water and Atmospheric Research, Lauder, New Zealand
Correspondence: Stefanie Kremser (stefanie@bodekerscientific.com)
Upper-air measurements of essential climate variables (ECVs), such as temperature, are crucial for climate monitoring and climate change detection. Because of the internal variability of the climate system, many decades of measurements are typically required to robustly detect any trend in the climate data record. It is imperative for the records to be temporally homogeneous over many decades to confidently estimate any trend. Historically, records of upper-air measurements were primarily made for short-term weather forecasts and as such are seldom suitable for studying long-term climate change as they lack the required continuity and homogeneity. Recognizing this, the Global Climate Observing System (GCOS) Reference Upper-Air Network (GRUAN) has been established to provide reference-quality measurements of climate variables, such as temperature, pressure, and humidity, together with well-characterized and traceable estimates of the measurement uncertainty. To ensure that GRUAN data products are suitable to detect climate change, a scientifically robust instrument replacement strategy must always be adopted whenever there is a change in instrumentation. By fully characterizing any systematic differences between the old and new measurement system a temporally homogeneous data series can be created. One strategy is to operate both the old and new instruments in tandem for some overlap period to characterize any inter-instrument biases. However, this strategy can be prohibitively expensive at measurement sites operated by national weather services or research institutes. An alternative strategy that has been proposed is to alternate between the old and new instruments, so-called interlacing, and then statistically derive the systematic biases between the two instruments. Here we investigate the feasibility of such an approach specifically for radiosondes, i.e. flying the old and new instruments on alternating days. Synthetic data sets are used to explore the applicability of this statistical approach to radiosonde change management.
Radiosondes are indispensable for monitoring the upper air as they provide high vertical resolution in situ observations of temperature, pressure, and water vapour between the surface and the upper troposphere–lower stratosphere. Determining long-term temperature trends from radiosonde measurements is challenging because changes in instrumentation can, among other things, introduce discontinuities in the measurement time series (see Fig. 1). Since radiosonde measurements are primarily made to provide the data needed to constrain weather forecasts and not to detect long-term changes in climate, little attention has been paid to ensuring the long-term homogeneity of the measurement record when changing from one instrument to another. As a result, radiosonde data records typically fall short of the standard required to reliably detect changes in climate. Another cause of inhomogeneities in the record is undocumented changes in data processing (Thorne et al., 2011). While much effort has been spent attempting to remove discontinuities in radiosonde data records (e.g. Sherwood et al., 2005; Randel and Wu, 2006; Haimberger et al., 2012), lack of confidence in the long-term homogeneity erodes confidence in derived trends. Seidel and Free (2006) used upper-air temperatures from the NCEP–NCAR reanalysis (Saha et al., 2010) to investigate the effects of sampling frequency, changes in observation schedule, and the introduction of inhomogeneities on the radiosonde climate data record. Their results indicate that introducing inhomogeneities into a temperature time series provides the most significant source of uncertainty in trend estimates. Maintaining the temperature measurement stability to within 0.1 K for periods of 20 to 50 years avoids uncertainties in trend estimates in at least 99 % of cases (Seidel and Free, 2006). With a weaker stability requirement of 0.25 K, the uncertainty in a 50-year trend estimate increases by about 5 % for twice-daily sampling. Rust et al. (2008) showed that inhomogeneities in temperature measurements can cause spurious memory, leading to larger uncertainty for statistics derived from these series. The results of these studies demonstrate the need to account for any inhomogeneities in the measurement time series prior to any trend analysis.
The GCOS (Global Climate Observing System) Reference Upper-Air Network (GRUAN) was established to provide reference-quality measurements of atmospheric ECVs suitable for reliably detecting changes in global and regional climate on decadal scales. To avoid compromising the integrity of the long-term climate record, it is essential that any change, e.g. in the instrumentation or data processing, is adequately assessed before the change is implemented. For example, when transitioning from one radiosonde type to another, intercomparison between the two radiosonde types is required to assess a potential systematic difference between the radiosondes and to correct for it, ensuring a continuous homogeneous data set without any introduced discontinuities. Typically, intercomparisons of measurements from dual or quadruple (two of each instrument type) radiosonde flights are used to robustly detect systematic differences between the instruments (e.g. Luers and Eskridge, 1998; Steinbrecht et al., 2008; Kobayashi et al., 2012; Jensen et al., 2016). Results presented in Steinbrecht et al. (2008) indicated that temperature biases often increase significantly with increasing altitude, particularly in the lower stratosphere. In the past, WMO conducted several radiosonde intercomparison campaigns (e.g. Jeannet et al., 2008; Nash et al., 2011) with the objective of investigating the performance of operational radiosonde systems. The results of these campaigns are used in part to improve the accuracy of daytime operational radiosonde measurements and the associated correction procedures to provide temperature and relative humidity accuracies currently possible with nighttime measurements. The knowledge of the performance that can be expected from various radiosonde systems allows the users to make a well-informed decision on the choice of future equipment. For a measurement network like GRUAN, it is essential to have more than one good-quality radiosonde type for operations.
Instrument biases are also influenced by clouds, as shown in Jensen et al. (2016), who found systematic differences in temperature measurements greater than 2 K between the Vaisala RS92 and RS41 radiosondes when exiting cloud layers. This large difference in temperature measurements between the two radiosondes was attributed to the wet-bulb effect, in which the temperature sensor gets wet while passing through a cloud layer and is subject to evaporative cooling after entering drier parts of the atmosphere. Below 28 km of altitude, Jensen et al. (2016) found a mean systematic difference between the temperature measurements of the two radiosondes of 0.13 K. For radiosonde measurements performed at GRUAN sites, it is suggested that sites conduct dual sonde launches for at least 6 months when changing from one instrument type to another (GCOS-171, 2013). However, analysis of data from dual sonde launches conducted at the GRUAN Lead Centre suggests that at least 200 dual flights over a period of 1 year are required to accurately assess the systematic difference between the two sonde types (GCOS-171, 2013). The number of dual sonde flights required may be site dependent, and therefore site-specific analysis is likely required to determine the required number of dual flights at any site. Furthermore, it is possible that instrument biases at one site may not be the same in different atmospheric conditions at other sites, though this has not been extensively evaluated. Therefore, it would be ideal if all GRUAN sites could complete thorough radiosonde intercomparisons by performing dual radiosonde launches for at least 6 months prior to any instrument change. However, the costs of such a measurement campaign can be significant, preventing some stations from performing extensive dual launches.
In this study, we investigate the feasibility of quantifying the difference in biases of two instrument types by alternating between the two different instruments and then applying a statistical model to infer any systematic biases between the two instruments. For this study, we conduct the investigation by applying the statistical model developed to synthetic data sets, in which the persistence of weather conditions is a controllable parameter, that represent such interlaced radiosonde flights. Specifically, we investigate (i) whether a combination of interlaced measurements together with an appropriate statistical model can be used to estimate the differences in biases of two instrument types and, (ii) if so, how effective the approach is. This method, if feasible, could reduce the financial burden for sites seeking to manage such a transition, since an interlacing approach would not require additional measurements above what is needed for normal daily operation.
2.1 Background
Any modification of instrumentation might introduce a systematic change to the measurement time series. This change is typically assumed to be a constant difference (Δ) as a first-order approximation resulting from differences in the individual instrument biases, i.e. their systematic deviations from the true value. As the true value of the quantity being measured is unknown in practice, it is not possible to estimate each instrument's individual bias. It is possible, however, to estimate the difference $\Delta = \text{Bias}_{A} - \text{Bias}_{B}$ in the biases $\text{Bias}_{A}$ and $\text{Bias}_{B}$ of instruments A and B. If temporally and spatially coincident measurements are made using instruments A and B (i.e. dual flights), this difference can be easily obtained: consider some quantity of interest, e.g. air temperature (T), measured with instrument A and instrument B at the same location and time t. The bias of each instrument is the difference between the expectation value of the instrument's measurement and the unknown true value T_{t}:
$$\text{Bias}_{A} = \mathrm{E}[T_{t,A}] - T_{t}, \qquad \text{Bias}_{B} = \mathrm{E}[T_{t,B}] - T_{t}, \tag{1}$$
where T_{t,A} and T_{t,B} are the temperatures at time t measured with instrument A and B, respectively. The difference in the instrument bias is therefore
$$\Delta = \text{Bias}_{A} - \text{Bias}_{B} = \mathrm{E}[T_{t,A}] - \mathrm{E}[T_{t,B}]. \tag{2}$$
Consider now that T_{t,B} differs from T_{t,A} only by a constant offset Δ, i.e.
$$T_{t,A} - T_{t,B} = \Delta, \tag{3}$$
which is independent of the true value and thus the measurement time t. Under this assumption, an estimate for the stationary difference in biases can be obtained from N dual measurements according to
$$\widehat{\Delta} = \frac{1}{N} \sum_{t=1}^{N} \left( T_{t,A} - T_{t,B} \right), \tag{4}$$
with $\widehat{\Delta}$ denoting an estimate of the constant offset Δ. This equation applies even if the true value T_{t} is changing with time as it depends only on anomalies $T_{t,A/B} - T_{t}$. Under suitable conditions, the uncertainty (expressed in terms of standard deviation, SD) of this estimate decreases in proportion to $1/\sqrt{N}$ and depends on the persistence (i.e. autocorrelation) of the time series (Wilks, 2011).
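To make the dependence on N concrete, the following sketch works through the variance of the dual-flight estimate. It assumes, beyond what is stated above, that each measurement equals the true value plus the instrument bias plus independent zero-mean noise; the symbols $\epsilon_{t,A}$, $\epsilon_{t,B}$, $\sigma_{A}^{2}$, and $\sigma_{B}^{2}$ are used in the sense of Sect. 2.2, and the numbers in the last line are the noise variance and flight count quoted elsewhere in this paper, combined here purely for illustration.

```latex
\begin{align}
T_{t,A} - T_{t,B} &= \Delta + \epsilon_{t,A} - \epsilon_{t,B}, \\
\operatorname{Var}\bigl(\widehat{\Delta}\bigr)
  &= \frac{1}{N^{2}} \sum_{t=1}^{N} \bigl(\sigma_{A}^{2} + \sigma_{B}^{2}\bigr)
   = \frac{\sigma_{A}^{2} + \sigma_{B}^{2}}{N}, \\
\mathrm{SD}\bigl(\widehat{\Delta}\bigr)
  &= \sqrt{\frac{\sigma_{A}^{2} + \sigma_{B}^{2}}{N}}, \\
\sigma_{A}^{2} = \sigma_{B}^{2} = 0.1\,\mathrm{K}^{2},\; N = 200
  &\;\Rightarrow\; \mathrm{SD}\bigl(\widehat{\Delta}\bigr) = \sqrt{0.001}\,\mathrm{K} \approx 0.032\,\mathrm{K}.
\end{align}
```

Under these assumptions the true temperature cancels in every pair, so its persistence does not enter the result; persistence would matter for the dual-flight estimate only if the measurement errors themselves were autocorrelated.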
2.2 A statistical model for interlaced measurements
As dual measurements using both instrument types require additional resources and therefore incur additional costs, estimating a systematic difference between the instruments using interlaced measurements, i.e. using instrument A on odd days $t \in \{1, 3, 5, \dots\}$ and instrument B on even days $t \in \{2, 4, 6, \dots\}$, is explored in this study. Using this approach, at every time t only one measurement from one instrument is available, and hence Eq. (4) is not applicable.
The underlying assumption for the approach outlined here to work is that the quantity of interest fluctuates around a smooth climatological signal (i.e. a seasonal cycle) and the fluctuations show a certain degree of persistence at the weather timescale; e.g. the fluctuations show a day-to-day dependence. A typical difference in the biases between radiosondes, such as the one tested here, is smaller than the day-to-day fluctuations themselves. This persistence (i.e. autocorrelation) is therefore key to the idea of estimating a bias from interlaced measurements, as it carries information from measurement A to measurement B.
In the following, a simplified model for air temperature time series complying with the above-mentioned assumptions is constructed. The true (unobserved) time series is represented by a smooth seasonal cycle with an autoregressive process of first order (AR[1], e.g. Box and Jenkins, 1976; Wilks, 2011) added to it; i.e.
$$T_{t} = S(d_{t}) + X_{t}, \tag{5}$$
$$X_{t} = a\,X_{t-1} + \eta_{t}, \tag{6}$$
with $S(d_{t})$ denoting the smooth seasonal cycle, $d_{t} \in [1, \dots, 365]$ giving the day in the year for date t, and $X_{t}$ the AR[1] anomaly. Here a is the autocorrelation coefficient, which describes the degree of persistence in the time series at the weather timescale (i.e. the day-to-day dependence of the fluctuations), and $\eta_{t} \sim \mathcal{N}(0, \sigma^{2})$ is the driving noise of the AR[1] process, taken to be Gaussian white noise with zero mean and variance σ^{2}. This is a well-established model for the persistence of e.g. daily air temperatures (e.g. Wilks, 2011).
Pseudo-observations are now obtained from a realization of T_{t} (Eq. 5) with an instrument bias and random measurement noise added. Here, we aim for interlaced temperature measurements T_{t,A} and T_{t,B} from instruments A and B and thus add the instrument biases c_{A} and c_{B}, respectively, and independent Gaussian measurement uncertainties $\epsilon_{t,A} \sim \mathcal{N}(0, \sigma_{A}^{2})$ and $\epsilon_{t,B} \sim \mathcal{N}(0, \sigma_{B}^{2})$:
$$T_{t,A} = T_{t} + c_{A} + \epsilon_{t,A}, \tag{7}$$
$$T_{t,B} = T_{t} + c_{B} + \epsilon_{t,B}. \tag{8}$$
For simplicity, we assume equal variances $\sigma_{A}^{2} = \sigma_{B}^{2}$ for the measurement uncertainties. The continuous series of combined interlaced measurements T_{t,AB} for $t \in \{1, 2, 3, \dots\}$ is therefore
$$T_{t,AB} = \chi(t \in t_{A})\, T_{t,A} + \chi(t \in t_{B})\, T_{t,B}, \tag{9}$$
with the indicator function χ being 1 if t is a member of the respective set t_{A} (odd days) or t_{B} (even days) and 0 otherwise. Figure 2 shows an example of such a synthetic time series of interlaced measurements. This example is based on a simulated temperature time series using a realization of an AR[1] process with an autocorrelation coefficient of a=0.5 in Eq. (6), similar to the autocorrelation coefficient of radiosonde measurements at 300 hPa above Lindenberg, Germany (see Sect. 2.4).
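The construction of such an interlaced series can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the study: the sinusoidal seasonal cycle and its 5 K amplitude are hypothetical choices, the function name `simulate_interlaced` and its parameters are invented for the example, and the default bias and variance values are those quoted in Sect. 2.4.

```python
import numpy as np

def simulate_interlaced(n_days, a, sigma2_anom=10.0, sigma2_meas=0.1,
                        c_a=-0.1, c_b=0.2, seasonal_amp=5.0, seed=0):
    """Return days, both instrument series, the interlaced series and the B-day mask."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, n_days + 1)          # day index 1, 2, 3, ...
    d = (t - 1) % 365 + 1                 # day in the year, 1..365
    # Assumed seasonal cycle: a single annual sine wave (illustrative only)
    seasonal = seasonal_amp * np.sin(2 * np.pi * d / 365)
    # AR(1) anomalies; driving variance scaled so the anomaly variance is sigma2_anom
    sd_drive = np.sqrt((1 - a**2) * sigma2_anom)
    x = np.empty(n_days)
    x[0] = rng.normal(0.0, np.sqrt(sigma2_anom))
    for i in range(1, n_days):
        x[i] = a * x[i - 1] + rng.normal(0.0, sd_drive)
    truth = seasonal + x
    # Each instrument sees the truth plus its bias plus independent noise
    sd_meas = np.sqrt(sigma2_meas)
    t_a = truth + c_a + rng.normal(0.0, sd_meas, n_days)
    t_b = truth + c_b + rng.normal(0.0, sd_meas, n_days)
    # Interlacing: instrument A on odd days, instrument B on even days
    is_b = t % 2 == 0
    t_ab = np.where(is_b, t_b, t_a)
    return t, t_a, t_b, t_ab, is_b
```

Returning both full series alongside the interlaced one makes it possible to evaluate dual-flight and interlaced estimates on the same realization.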
2.3 Estimating the difference in instrument biases
A direct approach to estimate the difference in instrument biases $\Delta = c_{A} - c_{B}$ is an estimation using the difference in the means $\overline{T}_{A}$ and $\overline{T}_{B}$ of instruments A and B, respectively, over a common time period t_{1} to t_{2}; i.e.
$$\widehat{\Delta} = \overline{T}_{A} - \overline{T}_{B}, \tag{10}$$
with
$$\overline{T}_{A} = \frac{1}{N_{A}} \sum_{t \in t_{A}} T_{t,A}, \qquad \overline{T}_{B} = \frac{1}{N_{B}} \sum_{t \in t_{B}} T_{t,B} \tag{11}$$
being the arithmetic means for the individual instruments, with the sums running over the measurement days of each instrument within the given time period; N_{A} and N_{B} are the number of measurements made by instrument A and B, respectively, in that period. The uncertainty in this estimate of the difference in instrument biases decreases with increasing N_{A} and N_{B} but also depends on the persistence of the underlying time series: larger persistence leads to larger uncertainties when calculating arithmetic means (e.g. von Storch and Zwiers, 1999).
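A minimal Python sketch of this difference-of-means estimate is given below. The function name `difference_of_means` and the boolean mask `is_b` (true on days flown with instrument B) are hypothetical names introduced for the example; the toy input has a constant true signal of zero so that only the two biases remain.

```python
import numpy as np

def difference_of_means(t_ab, is_b):
    """Estimate Delta = c_A - c_B as mean(A-day values) minus mean(B-day values)."""
    t_ab = np.asarray(t_ab, dtype=float)
    is_b = np.asarray(is_b, dtype=bool)
    return t_ab[~is_b].mean() - t_ab[is_b].mean()

# Toy example: constant true signal (zero), no noise, biases -0.1 K and 0.2 K
days = np.arange(1, 11)
is_b = days % 2 == 0                  # instrument B on even days
t_ab = np.where(is_b, 0.2, -0.1)      # interlaced series contains only the biases
estimate = difference_of_means(t_ab, is_b)
print(estimate)                       # close to -0.3
```

With real or synthetic weather noise the two means are taken over different days, so day-to-day temperature fluctuations do not cancel as they do for dual flights.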
Here, we exploit the persistence and suggest an approach based on the estimation of a slowly varying signal common to both instruments. Imagine, for example, a smooth temperature time series in the absence of weather-induced noise. Measurements are then made of that signal using instrument A, and this measurement series is represented by s(t) and an additional measurement noise ϵ_{t}. Analogously, measurements of the same slowly varying signal are made using instrument B and can be represented by the same s(t) but with the difference in instrument biases Δ and again measurement noise ϵ_{t}; i.e. $s(t) + \Delta + \epsilon_{t}$. A model for these interlaced measurements T_{t,AB} is constructed using the indicator function χ:
$$T_{t,AB} = s(t) + \Delta\, \chi(t \in t_{B}) + \epsilon_{t}. \tag{12}$$
For t∈t_{B}, the indicator function χ(t∈t_{B}) returns 1 and we obtain a measurement with instrument B, i.e. $\widehat{T}_{t,B} = s(t) + \Delta + \epsilon_{t}$. For other time steps t∈t_{A} the indicator function returns 0 and we obtain a measurement of instrument A, i.e. $\widehat{T}_{t,A} = s(t) + \epsilon_{t}$, excluding the difference in instrument bias Δ. The statistical model described in Eq. (12) belongs to the class of generalized additive models (GAMs; e.g. Chambers and Hastie, 1992), a fundamental class of regression models. GAMs extend generalized linear models (or linear regression) by adding a smooth term s to the classical linear components. This smooth term can be estimated using a smoothing spline fit with its degrees of freedom (i.e. its flexibility or smoothness) determined by generalized cross validation (Wood, 2006). This functionality is implemented in the R package mgcv (Wood, 2006).
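The structure of this additive model can be illustrated with a small stand-in written in Python. The sketch below is not the mgcv fit: it swaps the spline with cross-validated smoothness for a discrete smoother with a fixed second-difference penalty (a Whittaker-type smoother), chosen only because it needs nothing beyond NumPy. The function name `fit_offset_smoother` and the penalty weight `lam` are assumptions of the example, and the fitted coefficient is the offset of B-day values relative to the smooth term, following the parameterization described above.

```python
import numpy as np

def fit_offset_smoother(y, is_b, lam=10.0):
    """Fit y_t = s_t + delta * chi_t + noise, penalising roughness of s only.

    s has one free value per day; lam weights the squared second differences
    of s. Returns (s_hat, delta_hat), where delta_hat is the fitted offset
    carried by the days flagged in is_b.
    """
    y = np.asarray(y, dtype=float)
    chi = np.asarray(is_b, dtype=float)
    n = len(y)
    # Design matrix: identity columns for s, one indicator column for delta
    X = np.hstack([np.eye(n), chi[:, None]])
    # Second-difference operator, rows of the form [1, -2, 1]
    D = np.diff(np.eye(n), n=2, axis=0)
    # Penalty acts on s only; the offset coefficient is unpenalised
    P = np.zeros((n + 1, n + 1))
    P[:n, :n] = lam * (D.T @ D)
    beta = np.linalg.solve(X.T @ X + P, X.T @ y)
    return beta[:n], beta[n]
```

Because an alternating day-to-day pattern has very large second differences, the penalty prevents the smooth term from absorbing the offset, which is what makes the offset identifiable. How well it is estimated in practice depends on how much of the weather-scale variability the smooth term can follow, i.e. on the persistence.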
2.4 Simulation setup
To investigate whether interlaced measurements diagnosed using the methodology described above can be used to estimate potential biases between instruments, we design a simulation study wherein an ensemble of synthetic upper-air temperature time series is generated using a stochastic process. For each member of the ensemble, interlaced measurements for two instruments are obtained by adding a systematic measurement uncertainty (i.e. bias) for each instrument plus some random measurement noise. As the instrument biases are known, their difference Δ is also known. The questions to be answered in this study are the following.
(i) Can a combination of interlaced measurements, together with an adequate statistical model, be used to estimate the difference in instrument biases?
(ii) If so, how effective is this estimation compared to an approach requiring dual measurements?
An analysis of the 300 hPa temperatures measured by radiosondes at Lindenberg, Germany forms the basis for this simulation study. After subtracting the seasonal cycle, the temperature anomalies show a variance of about $\sigma_{\text{anomalies}}^{2} = 10\,\mathrm{K}^{2}$ and can be adequately described with an AR[1] process as in Eq. (6) with a ≈ 0.5. To provide a realistic synthetic time series for analysis, we use driving Gaussian white noise $\eta \sim \mathcal{N}(0, \sigma_{a}^{2})$ with variance $\sigma_{a}^{2} = (1 - a^{2})\,\sigma_{\text{anomalies}}^{2}$. This choice of $\sigma_{a}^{2}$ ensures that the anomaly variance is fixed at $\sigma_{\text{anomalies}}^{2} = 10\,\mathrm{K}^{2}$ independent of the value of a. This is necessary as we vary the persistence parameter (i.e. the autocorrelation coefficient) $a \in (0, 1)$ to study time series with different persistence but identical anomaly variance.
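The scaling of the driving-noise variance follows from the stationarity of the AR[1] process. As a short worked derivation, write the anomaly process as $X_{t} = a X_{t-1} + \eta_{t}$ (the symbol $X_{t}$ is a notational assumption) and assume $|a| < 1$ with $\eta_{t}$ independent of $X_{t-1}$; the two numerical lines simply evaluate the formula for two values of a.

```latex
\begin{align}
\operatorname{Var}(X_{t}) &= a^{2} \operatorname{Var}(X_{t-1}) + \sigma_{a}^{2}, \\
\operatorname{Var}(X_{t}) = \operatorname{Var}(X_{t-1}) = \sigma_{\text{anomalies}}^{2}
  &\;\Rightarrow\; \sigma_{\text{anomalies}}^{2} = \frac{\sigma_{a}^{2}}{1 - a^{2}}
  \;\Leftrightarrow\; \sigma_{a}^{2} = (1 - a^{2})\,\sigma_{\text{anomalies}}^{2}, \\
a = 0.5:\quad \sigma_{a}^{2} &= 0.75 \times 10\,\mathrm{K}^{2} = 7.5\,\mathrm{K}^{2}, \\
a = 0.9:\quad \sigma_{a}^{2} &= 0.19 \times 10\,\mathrm{K}^{2} = 1.9\,\mathrm{K}^{2}.
\end{align}
```

The stronger the persistence, the smaller the driving noise has to be to keep the total anomaly variance at 10 K².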
The synthetic temperature series is generated using Eq. (9) that includes a seasonal cycle and a realization of an AR[1] process. The instrument biases in Eq. (9) are prescribed at $c_{A} = -0.1$ K and $c_{B} = 0.2$ K and are added to the time series together with a measurement uncertainty specified as Gaussian white noise $\epsilon \sim \mathcal{N}(0, \sigma^{2})$. The resulting two time series for instruments A and B are combined to (a) a synthetic time series of dual measurements and (b) an interlaced observational counterpart. The difference in instrument biases between the two time series is prescribed as $\Delta = c_{A} - c_{B} = -0.1 - 0.2 = -0.3\,\mathrm{K}$. To investigate the influence of (i) persistence in the temperature series, (ii) measurement noise, and (iii) the number of measurements on our ability to estimate the difference in biases between two instruments, the persistence parameter a and the number of measurements N are prescribed and varied in our study,
leading to $6 \times 7 = 42$ combinations, i.e. 42 synthetic time series to be analysed. The instrument noise is fixed at σ^{2} = 0.1 K^{2}. To generate a synthetic time series for a given a, N, and σ, the following steps were taken.

1. Generate a time series of length N consisting of an annual cycle and a realization of an AR[1] process as described above.
2. Add an offset of −0.1 K (instrument bias of instrument A) and Gaussian noise with variance σ^{2}=0.1 to produce a synthetic time series for instrument A.
3. Add an offset of 0.2 K (instrument bias of instrument B) and Gaussian noise with variance σ^{2}=0.1 to produce a synthetic time series for instrument B.
4. Select measurements from A for odd days and from B for even days to generate an interlaced time series.
5. Repeat steps 1 to 4 many times (e.g. M=1000, where M denotes the number of repetitions) to generate 1000 synthetic time series to derive statistically robust estimates of $\widehat{\Delta}$.
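The steps above can be sketched as a small Monte Carlo experiment in Python. This is an illustration under assumptions rather than the study's code: the seasonal cycle is a hypothetical 5 K sine wave, the function name `monte_carlo_sd` is invented, and, to keep the example short, the interlaced series is evaluated with the simple difference of means instead of the additive model. The defaults for `n_days`, `a`, and `m` are arbitrary illustrative values.

```python
import numpy as np

def monte_carlo_sd(n_days=200, a=0.5, m=300, sigma2_anom=10.0,
                   sigma2_meas=0.1, c_a=-0.1, c_b=0.2, seed=1):
    """Return (SD of dual-flight estimates, SD of interlaced mean-difference estimates)."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, n_days + 1)
    # Assumed seasonal cycle (illustrative sine wave with 5 K amplitude)
    seasonal = 5.0 * np.sin(2 * np.pi * ((t - 1) % 365 + 1) / 365)
    # Step 1: m AR(1) realizations generated side by side
    sd_drive = np.sqrt((1 - a**2) * sigma2_anom)
    x = np.empty((m, n_days))
    x[:, 0] = rng.normal(0.0, np.sqrt(sigma2_anom), m)
    for i in range(1, n_days):
        x[:, i] = a * x[:, i - 1] + rng.normal(0.0, sd_drive, m)
    truth = seasonal + x
    # Steps 2 and 3: add instrument biases and independent measurement noise
    sd_meas = np.sqrt(sigma2_meas)
    t_a = truth + c_a + rng.normal(0.0, sd_meas, (m, n_days))
    t_b = truth + c_b + rng.normal(0.0, sd_meas, (m, n_days))
    # Dual flights: both instruments on every day
    delta_dual = (t_a - t_b).mean(axis=1)
    # Step 4: interlacing, A on odd days and B on even days
    odd = t % 2 == 1
    delta_inter = t_a[:, odd].mean(axis=1) - t_b[:, ~odd].mean(axis=1)
    # Step 5: spread of the estimates over the m repetitions
    return delta_dual.std(ddof=1), delta_inter.std(ddof=1)

sd_dual, sd_inter = monte_carlo_sd()
print(sd_dual, sd_inter)
```

Calling the function for several values of `a` and `n_days` gives a quick impression of how the spread of each estimator depends on persistence and record length.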
The difference in instrument biases is then estimated based on
The box plots in Fig. 3 summarize the distribution of M=1000 bias estimates $\widehat{\Delta}$ for a varying number of interlaced flights N. Figure 3a is based on the simulated temperature time series with an AR[1] coefficient a=0.5, being similar to the autocorrelation coefficient found for temperature measurements at 300 hPa above Lindenberg. Figure 3b and c are examples for stronger persistence, i.e. a=0.8 and a=0.9, respectively. All panels show that the estimated difference in bias between instruments A and B ($\widehat{\Delta}$) converges towards the true value ($\Delta = -0.3$ K), with its spread narrowing, for increasing N. The rate of this convergence depends on the persistence (i.e. autocorrelation) in the underlying time series. Weak persistence (small a) leads to slower convergence (Fig. 3a), while strong persistence (a approaching 1) shows faster convergence.
The SD of $\widehat{\Delta}$ (see Fig. 4), representing the uncertainty with which the difference in the bias between instruments A and B can be estimated, depends on the number of interlaced flights and on the AR[1] coefficient a (coloured lines in Fig. 4). The SD can be used to construct asymptotic confidence intervals for the estimates using the standard normal assumption (e.g. Wilks, 2011, chap. 5); i.e. for a 95 % confidence interval, the estimated bias needs to be within 1.96 times the SD. For all a, the SD decreases with increasing N; however, the SD is generally larger for weak persistence (small $a \in (0, 1)$) and smaller for strong persistence (large $a \in (0, 1)$).
The synthetic time series of dual flights performed with instrument A and B simultaneously at N times (i.e. 2N measurements, solid black line in Fig. 4) provides the most reliable estimate of the biases between the instruments; i.e. the SD is smallest for any N. To provide a robust comparison of the results from the dual flights to the results from N interlaced measurements, the results from the dual flights need to be compared to the results of 2N interlaced flights. For a time series with an autocorrelation coefficient of a=0.5, at least 2000 days of consecutive interlaced daily measurements would be required to estimate the difference in instrument biases with a SD of 0.22 K. Consider the following example: a station operator seeks to detect the difference in bias between two radiosondes in a temperature time series showing an autocorrelation coefficient of 0.95. The station operator requires a SD of $\widehat{\Delta} \le 0.05$ K, which leads to a 95 % confidence interval with a half-width of about 0.1 K ($\approx 0.05 \times 1.96$). Then, from Fig. 4 it can be inferred that 500 interlaced measurements are required to achieve this. Furthermore, if an operator has a given number of radiosondes of each type available from which the difference in instrument biases needs to be estimated, it is clear from Fig. 4 that dual flights result in better estimates (i.e. smaller SD in Fig. 4) than interlacing the instrument types from one day to the next. The results presented here (from dual and interlaced flights) also depend on the variance of the signal; for a higher measurement noise, the number of required days will increase and vice versa (not shown).
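The arithmetic behind this example can be written out explicitly. The sketch below assumes that the estimate is approximately normally distributed, as in the asymptotic interval mentioned above; the threshold in the last line is an added illustration of what SD would be needed to distinguish an offset of magnitude 0.3 K from zero, not a result taken from the simulations.

```latex
\begin{align}
\text{95\,\% confidence interval:}\quad
  &\widehat{\Delta} \pm 1.96\,\mathrm{SD}\bigl(\widehat{\Delta}\bigr), \\
\mathrm{SD}\bigl(\widehat{\Delta}\bigr) = 0.05\,\mathrm{K}
  &\;\Rightarrow\; 1.96 \times 0.05\,\mathrm{K} = 0.098\,\mathrm{K} \approx 0.1\,\mathrm{K}, \\
1.96\,\mathrm{SD}\bigl(\widehat{\Delta}\bigr) < |\Delta| = 0.3\,\mathrm{K}
  &\;\Leftrightarrow\; \mathrm{SD}\bigl(\widehat{\Delta}\bigr) < 0.153\,\mathrm{K}.
\end{align}
```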
The results indicate that for typical differences in biases between radiosonde types, the presented method based on interlaced measurements is unlikely to provide a robust estimate of the difference in biases for a reasonable length of the measurement period (reasonable is considered as 2 years here). That said, there might be cases of larger instrument biases and/or larger persistence in which the interlaced method could provide an alternative method to dual measurements, requiring fewer resources. Vertical profiles of autocorrelation coefficients as calculated from temperature data obtained from ERA5 reanalyses (https://www.ecmwf.int/en/forecasts/datasets/archive-datasets/reanalysis-datasets/era5, last access: 4 April 2018) are shown in Fig. 5. Temperature data were interpolated to the locations of six GRUAN sites, including sites in the tropics and the middle and high latitudes. Here we calculated the autocorrelation coefficient from ERA5 data rather than from radiosonde measurements, as long-term continuous measurements are required to obtain a robust estimate of the seasonal cycle of the temperature time series before calculating the autocorrelation coefficients. Such continuous observations, covering at least 2 years of daily radiosonde flights, are currently only available at a small subset of GRUAN sites, which does not cover all latitude bands. ERA5 is the latest reanalysis provided by the ECMWF and the calculated autocorrelation coefficients are expected to provide a good estimate of the autocorrelation coefficient at each of the selected sites. Figure 5 shows that the persistence varies strongly with altitude, and if the interlacing method is used, it has to be applied at different altitudes separately. For lower altitudes (pressures greater than 250 hPa), the autocorrelation coefficients vary between 0.4 and 0.8, with the lowest coefficients at the southern middle latitudes (e.g. Lauder, New Zealand). The persistence increases at higher altitudes (pressures lower than 250 hPa), ranging from 0.7 in the tropics to 0.95 at higher latitudes. The results indicate that the interlacing method may be able to provide an estimate of the difference in biases for high altitudes at e.g. Ny-Ålesund, a GRUAN site showing the highest autocorrelation coefficients. However, a detailed case study needs to be performed to investigate potential benefits; this is beyond the scope of this study, which focuses on describing and presenting the methodology.
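For a single pressure level, an autocorrelation coefficient of this kind could be computed along the following lines. This is a sketch under assumptions: the seasonal cycle is removed with a least-squares fit of a mean plus a few annual harmonics, which is one common choice and not necessarily the procedure used for Fig. 5, and the function name `lag1_autocorrelation` and the parameter `n_harmonics` are hypothetical.

```python
import numpy as np

def lag1_autocorrelation(temps, day_of_year, n_harmonics=2):
    """Lag-1 autocorrelation of daily temperatures after removing a harmonic seasonal cycle."""
    temps = np.asarray(temps, dtype=float)
    d = np.asarray(day_of_year, dtype=float)
    # Regression matrix: constant plus sine/cosine pairs of the annual cycle
    cols = [np.ones_like(d)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * d / 365.25))
        cols.append(np.cos(2 * np.pi * k * d / 365.25))
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, temps, rcond=None)
    anomalies = temps - X @ coef
    # Correlation between each day's anomaly and the next day's anomaly
    return np.corrcoef(anomalies[:-1], anomalies[1:])[0, 1]
```

The function expects a gap-free daily series; with missing days the pairs of consecutive values would have to be selected explicitly before computing the correlation.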
We have used synthetic time series representing temperature measurements to investigate the possibility of using interlaced measurements performed with two different instrument types together with generalized additive models to obtain an estimate of the difference in the bias between the two instrument types. Performing dual radiosonde flights with both instrument types is costly, and therefore we investigated the feasibility of using interlaced flights to obtain an estimate of the difference in the bias. This would be more sustainable and less costly. Information about typically small differences in instrument biases can be obtained from non-simultaneous measurements using a persistence assumption; i.e. some information from one day's measurement is carried over to the next day. As atmospheric temperatures tend to be autocorrelated in time (e.g. Wilks, 2011; Maraun et al., 2004), the persistence assumption is justifiable. However, the strength of the autocorrelation depends in part on the geographical location of the measurement site and on altitude. Here we investigated how a statistical approach to estimate the difference between two instrument biases is affected by the persistence of a time series.
The results presented here indicate that while it is in principle possible to estimate the difference between two instrument biases from interlaced measurements, the number of interlaced flights required to obtain a satisfactory accuracy is very large for reasonable values of the autocorrelation coefficient. Strongly autocorrelated signals require fewer data for an accurate estimate of the difference in biases and therefore fewer interlaced flights than time series with low autocorrelation. The results show that for very strong persistence (e.g. an AR[1] coefficient of 0.99) about twice the number of measurements is needed compared to parallel measurements to obtain a comparable uncertainty in estimates for interlaced measurements. Hence, the described approach may be used for measurements with very strong persistence or for which the costs for sufficient parallel measurements exceed the costs for sufficient interlaced measurements to confidently infer the difference in the instrument bias. However, if, for example, it were possible to derive a robust estimate of the difference in instrument biases from interlaced measurements in some reasonable time period (e.g. 2 years) and even if this period was more than 2 or 3 times longer than would be required from a dual measurement strategy to achieve the same level of confidence, the interlacing approach would provide a cost-saving alternative to an approach that would start with dual flights and then continue with flights using only the new instrument.
The code can be obtained by contacting the corresponding author. The GRUAN data used in this publication are available from ftp://ftp.ncdc.noaa.gov/pub/data/gruan/processing/level2/ (Sommer et al., 2012).
The authors declare that they have no conflict of interest.
We would like to thank the NOAA GCOS office, through the Meteorological Service of New Zealand Limited, for supporting this research. Henning W. Rust acknowledges support from the Freie Universität Berlin within the Excellence Initiative of the German Research Foundation. We would also like to thank Fabio Madonna and Alessandro Fasso for helpful discussion around the alternative approach of interlaced measurements. We thank Matt Hanson and Jared Lewis for their initial comments on and contributions to the discussions about the methodology. We thank the GCOS Reference Upper-Air Network (GRUAN) for providing the data used in this publication. The authors confirm that these data have been used in a manner consistent with the GRUAN data use policy, as articulated in the GRUAN Guide, and have not been used for commercial gain.
Edited by: Roeland Van Malderen
Reviewed by: two anonymous referees
Box, G. E. P. and Jenkins, G. M.: Time Series Analysis: forecasting and control, Prentice Hall, New Jersey, USA, 1976.
Chambers, J. M. and Hastie, T. H. (Eds.): Statistical Models in S, Wadsworth & Brooks/Cole, Pacific Grove, California, USA, 1992.
GCOS-171, W. T. R. N.: The GCOS Reference Upper-Air Network (GRUAN) GUIDE, WMO, Geneva, Switzerland, 2013.
Haimberger, L., Tavolato, C., and Sperka, S.: Homogenization of the Global Radiosonde Temperature Dataset through Combined Comparison with Reanalysis Background Series and Neighboring Stations, J. Climate, 25, 8108–8131, https://doi.org/10.1175/JCLI-D-11-00668.1, 2012.
Jeannet, P., Bower, C., and Calpini, B.: Global criteria for tracing the improvements of radiosondes over the last decades, WMO/TD No. 1433, IOM Report No. 95, World Meteorological Organization, Geneva, Switzerland, 32 pp., 2008.
Jensen, M. P., Holdridge, D. J., Survo, P., Lehtinen, R., Baxter, S., Toto, T., and Johnson, K. L.: Comparison of Vaisala radiosondes RS41 and RS92 at the ARM Southern Great Plains site, Atmos. Meas. Tech., 9, 3115–3129, https://doi.org/10.5194/amt-9-3115-2016, 2016.
Kobayashi, E., Noto, Y., Wakino, S., Yoshii, H., Ohyoshi, T., Saito, S., and Baba, Y.: Comparison of Meisei RS2-91 rawinsondes and Vaisala RS92-SGP radiosondes at Tateno for the data continuity for climatic data analysis, J. Meteorol. Soc. Jpn., 90, 923–945, https://doi.org/10.2151/jmsj.2012-605, 2012.
Luers, J. and Eskridge, R.: Use of radiosonde temperature data in climate studies, J. Climate, 11, 1002–1019, 1998.
Maraun, D., Rust, H. W., and Timmer, J.: Tempting long-memory – on the interpretation of DFA results, Nonlin. Processes Geophys., 11, 495–503, https://doi.org/10.5194/npg-11-495-2004, 2004.
Nash, J., Oakley, T., Vömel, H., and Wei, L.: WMO intercomparison of high quality radiosonde systems, Yangjiang, China, 12 July–3 August 2010, WMO/TD No. 1580, IOM Report No. 107, World Meteorological Organization, Geneva, Switzerland, 248 pp., 2011.
Randel, W. and Wu, F.: Biases in Stratospheric and Tropospheric Temperature Trends Derived from Historical Radiosonde Data, J. Climate, 19, 2094–2104, 2006.
Rust, H. W., Mestre, O., and Venema, V. K. C.: Fewer jumps, less memory: Homogenized temperature records and long memory, J. Geophys. Res., 113, D19110, https://doi.org/10.1029/2008JD009919, 2008.
Saha, S., Moorthi, S., Pan, H.-L., et al.: The NCEP Climate Forecast System Reanalysis, B. Am. Meteorol. Soc., 91, 1015–1057, https://doi.org/10.1175/2010bams3001.1, 2010.
Seidel, D. and Free, M.: Measurement Requirements for Climate Monitoring of Upper-Air Temperature Derived from Reanalysis Data, J. Climate, 19, 854–871, 2006.
Sherwood, S., Lanzante, J., and Meyer, C.: Radiosonde Daytime Biases and Late-20th Century Warming, Science, 309, 1556–1559, 2005.
Sommer, M., Dirksen, R., and Immler, F.: RS92 GRUAN Data Product Version 2 (RS92-GDP.2), GRUAN Lead Centre, https://doi.org/10.5676/GRUAN/RS92-GDP.2, 2012.
Steinbrecht, W., Claude, H., Schönenborn, F., Leiterer, U., Dier, H., and Lanzinger, E.: Pressure and Temperature Differences between Vaisala RS80 and RS92 Radiosonde Systems, J. Atmos. Ocean. Tech., 25, 909–927, https://doi.org/10.1175/2007JTECHA999.1, 2008.
Thorne, P., Lanzante, J., Peterson, T., Seidel, D., and Shine, K.: Tropospheric temperature trends: history of an ongoing controversy, WIREs Climate Change, 2, 66–88, https://doi.org/10.1002/wcc.80, 2011.
von Storch, H. and Zwiers, F.: Statistical analysis in Climate Research, Cambridge University Press, Cambridge, UK, https://doi.org/10.1017/CBO9780511612336, 1999.
Wilks, D. S.: Statistical methods in the atmospheric sciences, 3rd edn., Academic Press, San Diego, CA, USA, 2011.
Wood, S.: Generalized Additive Models: An Introduction with R, Chapman and Hall/CRC, Taylor & Francis Group, Boca Raton, FL, USA, 2006.