Introduction
Radiosondes are indispensable for monitoring the upper air as
they provide high vertical resolution in situ observations of temperature,
pressure, and water vapour between the surface and the upper troposphere–lower
stratosphere. Determining long-term temperature trends from radiosonde
measurements is challenging because changes in instrumentation can, among
other things, introduce discontinuities in the measurement time series (see
Fig. ). Since radiosonde measurements are primarily made
to provide the data needed to constrain weather forecasts and not to detect
long-term changes in climate, little attention has been paid to ensuring the
long-term homogeneity of the measurement record when changing from one
instrument to another. As a result, radiosonde data records typically fall
short of the standard required to reliably detect changes in climate. Another
cause of inhomogeneities in the record is undocumented changes in data
processing. While much effort has been spent attempting
to remove discontinuities in radiosonde data records, lack of
confidence in the long-term homogeneity erodes confidence in derived trends.
An earlier study used upper-air temperatures from the NCEP-NCAR reanalysis
to investigate the effects of sampling frequency, changes
in observation schedule, and the introduction of inhomogeneities on the
radiosonde climate data record. Their results indicate that introducing
inhomogeneities into a temperature time series provides the most significant
source of uncertainty in trend estimates. Maintaining the temperature
measurement stability to within 0.1 K for periods of 20 to 50 years
avoids uncertainties in trend estimates in at least 99 % of cases.
With a weaker stability requirement of 0.25 K,
the uncertainty in a 50-year trend estimate increases by about 5 % for
twice-daily sampling. Other work showed that inhomogeneities in
temperature measurements can cause spurious memory, leading to larger
uncertainty for statistics derived from these series. The results of these
studies demonstrate the need to account for any inhomogeneities in the
measurement time series prior to any trend analysis.
The GCOS (Global Climate Observing System) Reference Upper-Air Network
(GRUAN) was established to provide reference-quality measurements of
atmospheric essential climate variables (ECVs) suitable for reliably detecting changes in global and
regional climate on decadal scales. To avoid compromising the integrity of
the long-term climate record, it is essential that any change, e.g. in the
instrumentation or data processing, is adequately assessed before the change
is implemented. For example, when transitioning from one radiosonde type to
another, inter-comparison between the two radiosonde types is required to assess
a potential systematic difference between the radiosondes and to correct for
it, ensuring a continuous homogeneous data set without any introduced
discontinuities. Typically, inter-comparisons of measurements from dual or
quadruple (two of each instrument type) radiosonde flights are used to
robustly detect systematic differences between the instruments.
Previously published results indicated that temperature
biases often increase significantly with increasing altitude, particularly in
the lower stratosphere. In the past, WMO conducted several radiosonde
inter-comparison campaigns with the
objective of investigating the performance of operational radiosonde systems.
The results of these campaigns are used in part to improve the accuracy of
daytime operational radiosonde measurements and the associated correction
procedures to provide temperature and relative humidity accuracies currently
possible with night-time measurements. The knowledge of the performance that
can be expected from various radiosonde systems allows the users to make
a well-informed decision on the choice of future equipment. For a measurement
network like GRUAN, it is essential to have more than one good-quality
radiosonde type for operations. Instrument biases are also influenced by
clouds: a previous comparison found systematic differences in
temperature measurements greater than 2 K between the Vaisala RS92
and RS41 radiosondes when exiting cloud layers. This large difference in
temperature measurements between the two radiosondes was attributed to the
wet-bulb effect, in which the temperature sensor gets wet while passing through
a cloud layer and is subject to evaporative cooling after entering drier
parts of the atmosphere. Below 28 km of altitude, a mean systematic
difference between the temperature measurements of
the two radiosondes of 0.13 K was found. For radiosonde measurements performed
at GRUAN sites, it is suggested that sites conduct dual sonde launches for at
least 6 months when changing from one instrument type to another.
However, analysis of data from dual sonde launches conducted
at the GRUAN Lead Centre suggests that at least 200 dual flights over
a period of 1 year are required to accurately assess the systematic
difference between the two sonde types. The number of dual
sonde flights required may be site dependent, and therefore site-specific
analysis is likely required to determine the required number of dual flights
at any site. Furthermore, it is possible that instrument biases at one site
may not be the same in different atmospheric conditions at other sites,
though this has not been extensively evaluated. Therefore, it would be ideal
if all GRUAN sites could complete thorough radiosonde inter-comparisons by
performing dual radiosonde launches for at least 6 months prior to any
instrument change. However, the costs of such a measurement campaign can be
significant, preventing some stations from performing extensive dual
launches.
In this study, we investigate the feasibility of quantifying the difference
in biases of two instrument types by alternating between the two different
instruments and then applying a statistical model to infer any systematic
biases between the two instruments. We conduct the
investigation by applying the statistical model to synthetic data
sets that represent such interlaced radiosonde flights and in which the
persistence of weather conditions is a controllable parameter. Specifically, we
investigate (i) whether a combination of interlaced measurements together with an
appropriate statistical model can be used to estimate the differences in
biases of two instrument types and, (ii) if so, how effective the approach
is. This method, if feasible, could reduce the financial burden for sites
seeking to manage such a transition, since an interlacing approach would not
require additional measurements above what is needed for normal daily
operation.
Methodology
Background
Any modification of instrumentation might introduce a systematic change to
the measurement time series. This change is typically assumed to be
a constant difference (Δ) as a first-order approximation resulting
from differences in the individual instrument biases, i.e. their systematic
deviations from the true value. As the true value of the quantity being
measured is unknown in practice, it is not possible to estimate each
instrument's individual bias. It is possible, however, to estimate the
difference $\Delta = \mathrm{Bias}_A - \mathrm{Bias}_B$ between the biases $\mathrm{Bias}_A$ and
$\mathrm{Bias}_B$ of instruments A and B. If temporally and spatially
coincident measurements are made using instruments A and B (i.e. dual
flights), this difference can be easily obtained: consider some quantity of
interest, e.g. air temperature ($T$), measured with instrument A and
instrument B at the same location and time $t$. The bias of each instrument
is the difference between the expectation value of the instrument's
measurement and the unknown true value $T_t$:
$$\mathrm{Bias}(T_{t,A}) = E[T_{t,A}] - T_t \quad\text{and}\quad \mathrm{Bias}(T_{t,B}) = E[T_{t,B}] - T_t,$$
where $T_{t,A}$ and $T_{t,B}$ are the temperatures at time $t$ measured with
instruments A and B, respectively. The difference in the instrument biases
is therefore
$$\Delta_t = \mathrm{Bias}(T_{t,A}) - \mathrm{Bias}(T_{t,B}) = E[T_{t,A}] - E[T_{t,B}].$$
Consider now that $T_{t,B}$ differs from $T_{t,A}$ only by
a constant offset $\Delta$, i.e.
$$T_{t,A} = T_{t,B} + \Delta,$$
which is independent of the true value and thus of the measurement time $t$.
Under this assumption, an estimate for the stationary difference in biases
can be obtained from $N$ dual measurements according to
$$\hat{\Delta} = \frac{1}{N}\sum_{t=1}^{N}\left(T_{t,A} - T_{t,B}\right) = \frac{1}{N}\sum_{t=1}^{N}\left[\left(T_{t,A} - T_t\right) - \left(T_{t,B} - T_t\right)\right],$$
with $\hat{\Delta}$ denoting an estimate of the constant offset $\Delta$.
This equation applies even if the true value $T_t$ is changing with time, as
it depends only on the anomalies $T_{t,A} - T_t$ and $T_{t,B} - T_t$. Under suitable conditions, the
uncertainty (expressed in terms of standard deviation, SD) of this estimate
decreases with $N$ and depends on the persistence (i.e.
autocorrelation) of the time series.
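As a minimal sketch of the dual-measurement estimate, the following Python snippet applies the mean of paired differences to synthetic data. The shape of the true signal, the sample size, the random seed, and the function name `dual_bias_difference` are illustrative assumptions, not values taken from the study; the biases and noise variance mirror those of the simulation set-up. Under the additional assumption of independent white measurement noise with equal variance for both instruments, the SD of the estimate is sqrt(2 * noise_var / N), which the snippet prints for reference.

```python
import numpy as np


def dual_bias_difference(t_a, t_b):
    """Estimate Delta = Bias_A - Bias_B as the mean of paired differences."""
    diff = np.asarray(t_a, dtype=float) - np.asarray(t_b, dtype=float)
    return diff.mean()


# Illustrative (assumed) set-up: a time-varying true signal seen by two
# biased, noisy instruments at the same times.
rng = np.random.default_rng(0)
n = 500
truth = 220.0 + 5.0 * np.sin(2 * np.pi * np.arange(n) / 365.0)  # arbitrary signal
bias_a, bias_b, noise_var = -0.1, 0.2, 0.1

t_a = truth + bias_a + rng.normal(0.0, np.sqrt(noise_var), n)
t_b = truth + bias_b + rng.normal(0.0, np.sqrt(noise_var), n)

delta_hat = dual_bias_difference(t_a, t_b)

# The true signal cancels in each pair, so only measurement noise remains.
expected_sd = np.sqrt(2.0 * noise_var / n)
print(f"delta_hat = {delta_hat:.3f} K, expected SD = {expected_sd:.3f} K")
```

Because the true signal cancels in every paired difference, its seasonal variation has no influence on the estimate in this sketch.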
Example time series for interlaced measurements of instrument A
(red dots) and instrument B (green dots). Horizontal lines are the means of
the measurements using instrument A (red) and instrument B (green).
Smooth dashed lines (red for instrument A, green for instrument B) are
spline estimates with the differences being an estimate for the differences
in the instrument biases.
A statistical model for interlaced measurements
As dual measurements using both instrument types require additional
resources and therefore inherent additional costs, estimating a systematic
difference between the instruments using interlaced measurements, i.e. using
instrument A on odd days t∈{1,3,5,…} and instrument B on even
days t∈{2,4,6,…}, is explored in this study. Using this approach,
at every time t only one measurement from one instrument
is available, and hence Eq. () is not applicable.
The underlying assumption for the approach outlined here to work is that the
quantity of interest fluctuates around a smooth climatological signal (i.e.
a seasonal cycle) and the fluctuations show a certain degree of persistence
at the weather timescale; i.e. the fluctuations show a day-to-day
dependence. This persistence (i.e. autocorrelation) is key to the idea of
estimating a difference in biases from interlaced measurements, as it carries
information from measurement A to measurement B. The typical difference in
biases between radiosondes tested here is smaller than the day-to-day
fluctuations themselves.
In the following, a simplified model for air temperature time series
complying with the above-mentioned assumptions is constructed. The true
(unobserved) time series is represented by a smooth seasonal cycle with an
autoregressive process of first order (AR[1]) added to it; i.e.
$$T_t = \mu_0 + \mu_1 \sin\left(\frac{2\pi d_t}{365} - \frac{\pi}{2}\right) + \mu_2 \sin\left(\frac{2\pi\, 2 d_t}{365} - \frac{\pi}{2}\right) + \epsilon_t,$$
$$\epsilon_t = a\,\epsilon_{t-1} + \eta_t,$$
with $d_t \in \{1,\dots,365\}$ giving the day of the year for date $t$. Here
$a$ is the autocorrelation coefficient, which describes the degree of
persistence in the time series at the weather timescale (i.e. the day-to-day
dependence of the fluctuations), and $\eta_t \sim N(0,\sigma^2)$ is the driving noise of the AR[1] process,
taken to be Gaussian white noise with zero mean and variance $\sigma^2$. This is
a well-established model for the persistence of e.g. daily air temperatures.
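The model above can be sketched in a few lines of Python. This is an illustrative implementation only: the function name `simulate_true_temperature`, the seasonal coefficients `mu` (in K), and the seed are assumptions, since the text does not state the values of μ0, μ1, and μ2. The first anomaly is drawn from the stationary distribution of the AR[1] process, whose variance is σ²/(1 − a²).

```python
import numpy as np


def simulate_true_temperature(n_days, a, sigma2, mu=(220.0, 5.0, 1.0), seed=None):
    """Seasonal cycle plus AR[1] anomalies; returns (series, seasonal cycle).

    mu = (mu0, mu1, mu2) are hypothetical seasonal coefficients in K.
    sigma2 is the variance of the driving white noise eta_t; requires |a| < 1.
    """
    rng = np.random.default_rng(seed)
    mu0, mu1, mu2 = mu
    d = (np.arange(n_days) % 365) + 1  # day of year, 1..365
    seasonal = (mu0
                + mu1 * np.sin(2 * np.pi * d / 365 - np.pi / 2)
                + mu2 * np.sin(2 * np.pi * 2 * d / 365 - np.pi / 2))

    eta = rng.normal(0.0, np.sqrt(sigma2), n_days)
    eps = np.empty(n_days)
    # Start in the stationary distribution so no spin-up period is needed.
    eps[0] = rng.normal(0.0, np.sqrt(sigma2 / (1.0 - a**2)))
    for t in range(1, n_days):
        eps[t] = a * eps[t - 1] + eta[t]
    return seasonal + eps, seasonal


temps, seasonal = simulate_true_temperature(n_days=20000, a=0.5, sigma2=7.5, seed=1)
anom = temps - seasonal
lag1 = np.corrcoef(anom[:-1], anom[1:])[0, 1]
print(f"lag-1 autocorrelation = {lag1:.2f}, anomaly variance = {anom.var():.1f} K^2")
```

For a long realization, the sample lag-1 autocorrelation of the anomalies should be close to the prescribed `a`.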
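Pseudo-observations are now obtained from a realization of $T_t$
(Eq. ) with an instrument bias and random measurement noise
added. Here, we aim for interlaced temperature measurements $T_{t,A}$ and
$T_{t,B}$ from instruments A and B and thus add the instrument biases
$c_A$ and $c_B$, respectively, and independent Gaussian measurement
uncertainties $\epsilon_{t,A} \sim N(0,\sigma_A^2)$ and
$\epsilon_{t,B} \sim N(0,\sigma_B^2)$:
$$T_{t,A} = T_t + c_A + \epsilon_{t,A}, \quad t \in t_A = \{1,3,5,\dots\}, \quad\text{and}\quad T_{t,B} = T_t + c_B + \epsilon_{t,B}, \quad t \in t_B = \{2,4,6,\dots\}.$$
For simplicity, we assume equal variances $\sigma_A^2 = \sigma_B^2$ for the
measurement uncertainties. The continuous series of combined interlaced
measurements $T_{t,AB}$ for $t \in \{1,2,3,\dots\}$ is therefore
$$T_{t,AB} = T_t + c_A\,\chi(t \in t_A) + c_B\,\chi(t \in t_B) + \epsilon_t,$$
with the indicator function $\chi$ being 1 if $t$ is a member of the indicated set ($t_A$ or
$t_B$) and 0 otherwise, and with $\epsilon_t$ here denoting the measurement noise of the instrument used at time $t$. Figure shows an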
example of such a synthetic time series of interlaced measurements. This
example is based on a simulated temperature time series using a realization
of an AR[1] process using an autocorrelation coefficient of a=0.5 in
Eq. (), similar to the autocorrelation coefficient of radiosonde
measurements at 300 hPa above Lindenberg, Germany (see
Sec. ).
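The construction of the interlaced pseudo-observations can be sketched as follows. The function name `interlace` and its arguments are hypothetical; the snippet only assumes that a realization of the true series is already available as an array and that day numbering starts at t = 1, so that odd days belong to instrument A.

```python
import numpy as np


def interlace(truth, c_a, c_b, noise_var, seed=None):
    """Return interlaced observations and a mask marking instrument A days.

    Odd days (t = 1, 3, 5, ...) use instrument A with bias c_a,
    even days (t = 2, 4, 6, ...) use instrument B with bias c_b.
    """
    rng = np.random.default_rng(seed)
    truth = np.asarray(truth, dtype=float)
    n = truth.size
    day = np.arange(1, n + 1)
    is_a = (day % 2 == 1)
    noise = rng.normal(0.0, np.sqrt(noise_var), n)  # equal variance for A and B
    obs = truth + np.where(is_a, c_a, c_b) + noise
    return obs, is_a


# Noise-free illustration on a flat "true" series: only the biases remain.
obs, is_a = interlace(np.zeros(6), c_a=-0.1, c_b=0.2, noise_var=0.0, seed=1)
print(obs)
```

With zero noise and a flat true series, the output simply alternates between the two prescribed biases, which makes the indicator-function structure of the combined series easy to see.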
Estimating the difference in instrument biases
A direct approach to estimate the difference in instrument biases
$\Delta = c_A - c_B$ is to use the difference in the means
$\overline{T}_A$ and $\overline{T}_B$ of instruments A and B,
respectively, over a common time period $t_1$ to $t_2$; i.e.
$$\hat{\Delta}_{\mathrm{mean}} = \overline{T}_A - \overline{T}_B,$$
with
$$\overline{T}_A = \frac{1}{N_A}\sum_{\substack{t_1 \le t \le t_2 \\ t \in t_A}} T_{t,A} \quad\text{and}\quad \overline{T}_B = \frac{1}{N_B}\sum_{\substack{t_1 \le t \le t_2 \\ t \in t_B}} T_{t,B}$$
being the arithmetic means for the individual instruments; $N_A$ and $N_B$
are the numbers of measurements made by instruments A and B, respectively,
in the given time period. The uncertainty in this estimate of the difference
in instrument biases decreases with increasing $N_A$ and $N_B$ but also depends
on the persistence of the underlying time series: larger persistence
leads to larger uncertainties when calculating arithmetic means.
Here, we exploit the persistence and suggest an approach based on the
estimation of a slowly varying signal common to both instruments. Imagine,
for example, a smooth temperature time series in the absence of
weather-induced noise. Measurements of that signal made using
instrument A are represented by $s(t)$ plus
measurement noise $\epsilon_t$, i.e. $s(t) + \epsilon_t$. Analogously, measurements of the
same slowly varying signal made using instrument B are
represented by the same $s(t)$ plus the difference in instrument biases
$\Delta$ and again measurement noise $\epsilon_t$; i.e. $s(t) + \Delta + \epsilon_t$. A model for these interlaced measurements $T_{t,AB}$ is
constructed using the indicator function $\chi$:
$$\hat{T}_{t,AB} = s(t) + \Delta\,\chi(t \in t_B) + \epsilon_t.$$
For $t \in t_B$, the indicator function $\chi(t \in t_B)$ returns 1 and we
obtain a measurement with instrument B, i.e. $\hat{T}_{t,B} = s(t) + \Delta + \epsilon_t$. For the other time steps $t \in t_A$, the indicator function returns
0 and we obtain a measurement with instrument A, i.e. $\hat{T}_{t,A} = s(t) + \epsilon_t$, excluding the difference in instrument bias $\Delta$. The
statistical model described in Eq. () belongs to the class
of generalized additive models (GAMs),
a fundamental class of regression models. GAMs extend generalized linear
models (or linear regression) by adding a smooth term $s$ to the classical
linear components. This smooth term can be estimated using
a smoothing spline fit with its degrees of freedom (i.e. its flexibility or
smoothness) determined by generalized cross validation.
This functionality is implemented in the R package mgcv.
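To illustrate the structure of such a model outside R, the following Python sketch fits a smooth term plus an instrument indicator by penalised least squares. It is not the mgcv implementation and makes several simplifying assumptions: the smooth term is a cubic truncated-power regression spline with a fixed number of knots, the penalty weight is fixed by hand rather than chosen by generalized cross validation, and the function name `fit_offset_with_smooth` and all numerical settings are hypothetical. The coefficient on the B indicator is the offset of instrument B relative to instrument A, i.e. c_B − c_A in the notation of the pseudo-observations.

```python
import numpy as np


def fit_offset_with_smooth(y, is_b, n_knots=40, penalty=1.0):
    """Fit y = s(t) + offset * [t in t_B] + noise; return the offset estimate."""
    y = np.asarray(y, dtype=float)
    n = y.size
    x = np.linspace(0.0, 1.0, n)                      # rescaled time axis
    knots = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]  # interior knots only
    poly = np.vander(x, 4, increasing=True)           # 1, x, x^2, x^3
    trunc = np.clip(x[:, None] - knots[None, :], 0.0, None) ** 3
    design = np.column_stack([poly, trunc, np.asarray(is_b, dtype=float)])

    # Ridge penalty on the knot coefficients only, applied through
    # augmented rows so that a plain least-squares solver can be used.
    pen = np.zeros(design.shape[1])
    pen[4:4 + n_knots] = penalty
    design_aug = np.vstack([design, np.diag(np.sqrt(pen))])
    y_aug = np.concatenate([y, np.zeros(design.shape[1])])
    coef, *_ = np.linalg.lstsq(design_aug, y_aug, rcond=None)
    return coef[-1]


# Noise-free check: a linear signal plus a known offset of 0.3 on B days.
n = 200
is_b = (np.arange(1, n + 1) % 2 == 0)
y_exact = 1.0 + 2.0 * np.linspace(0.0, 1.0, n) + 0.3 * is_b
offset_exact = fit_offset_with_smooth(y_exact, is_b)

# Synthetic interlaced series with AR[1] anomalies (assumed settings).
rng = np.random.default_rng(42)
n, a, anomaly_var = 1000, 0.9, 10.0
eta = rng.normal(0.0, np.sqrt((1 - a**2) * anomaly_var), n)
eps = np.empty(n)
eps[0] = rng.normal(0.0, np.sqrt(anomaly_var))
for t in range(1, n):
    eps[t] = a * eps[t - 1] + eta[t]
is_b = (np.arange(1, n + 1) % 2 == 0)
y = 220.0 + eps + np.where(is_b, 0.2, -0.1) + rng.normal(0.0, np.sqrt(0.1), n)
offset_hat = fit_offset_with_smooth(y, is_b)
print(f"estimated c_B - c_A = {offset_hat:.2f} K (prescribed 0.30 K)")
```

A fixed penalty keeps the sketch short; a data-driven choice of smoothness, as described in the text, would replace the `penalty` argument.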
Simulation set-up
To investigate whether interlaced measurements diagnosed using the
methodology described above can be used to estimate potential biases between
instruments, we design a simulation study wherein an ensemble of synthetic
upper-air temperature time series is generated using a stochastic process.
For each member of the ensemble, interlaced measurements for two instruments
are obtained by adding a systematic measurement error (i.e. bias) for
each instrument plus some random measurement noise. As the instrument biases
are known, their difference $\Delta$ is also known. The questions to be
answered in this study are the following.
(i) Can a combination of interlaced measurements, together with an adequate statistical model, be used to estimate the difference in
instrument biases?
(ii) If so, how effective is this estimation compared to an approach requiring dual measurements?
An analysis of the 300 hPa temperatures measured by radiosondes at
Lindenberg, Germany forms the basis for this simulation study. After
subtracting the seasonal cycle, the temperature anomalies show a variance of
about $\sigma^2_{\mathrm{anomalies}} = 10\,\mathrm{K}^2$ and can be adequately
described with an AR[1] process as in Eq. () with $a \approx 0.5$.
To provide a realistic synthetic time series for analysis, we use driving
Gaussian white noise $\eta \sim N(0,\sigma_a^2)$ with variance
$\sigma_a^2 = (1-a^2)\,\sigma^2_{\mathrm{anomalies}}$. This choice of
$\sigma_a^2$ ensures that the anomaly variance is fixed at
$\sigma^2_{\mathrm{anomalies}} = 10\,\mathrm{K}^2$ independent of the value of
$a$. This is necessary as we vary the persistence parameter (i.e. the
autocorrelation coefficient) $a \in (0,1)$ to study time series with different
persistence but identical anomaly variance.
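The scaling of the driving-noise variance follows directly from the stationarity of the AR[1] process. As a short derivation, assuming the process is stationary and that the driving noise at time $t$ is independent of the anomaly at time $t-1$:

```latex
\begin{align}
\mathrm{Var}(\epsilon_t) &= a^2\,\mathrm{Var}(\epsilon_{t-1}) + \sigma_a^2
  \quad\text{with}\quad \mathrm{Var}(\epsilon_t) = \mathrm{Var}(\epsilon_{t-1}) = \sigma^2_{\mathrm{anomalies}}, \\
\sigma^2_{\mathrm{anomalies}} &= \frac{\sigma_a^2}{1-a^2}
  \quad\Longleftrightarrow\quad
  \sigma_a^2 = (1-a^2)\,\sigma^2_{\mathrm{anomalies}} .
\end{align}
```

For $\sigma^2_{\mathrm{anomalies}} = 10\,\mathrm{K}^2$ this gives, for example, $\sigma_a^2 = 7.5\,\mathrm{K}^2$ for $a = 0.5$, $1.9\,\mathrm{K}^2$ for $a = 0.9$, and $0.199\,\mathrm{K}^2$ for $a = 0.99$.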
The synthetic temperature series is generated using Eq. ()
that includes a seasonal cycle and a realization of an AR[1] process. The
instrument biases in Eq. () are prescribed at
$c_A = -0.1$ K and $c_B = 0.2$ K and are added to the time series
together with a measurement uncertainty specified as Gaussian white
noise $\epsilon \sim N(0,\sigma^2)$. The resulting two time series
for instruments A and B are combined into (a) a synthetic time series of
dual measurements and (b) an interlaced observational counterpart. The
difference in instrument biases between the two time series is prescribed as
$\Delta = c_A - c_B = -0.1 - 0.2 = -0.3$ K. To investigate the influence of
(i) persistence in the temperature series, (ii) measurement noise, and (iii)
the number of measurements on our ability to estimate the difference in biases
between two instruments, the following parameters are prescribed and
controlled in our study:
persistence of the time series: $a \in \{0.5, 0.7, 0.8, 0.9, 0.95, 0.99\}$
number of measurements: $N \in \{50, 100, 250, 500, 1000, 2000, 3000\}$,
leading to $6 \times 7 = 42$ parameter combinations to
be analysed. The instrument noise is fixed at $\sigma^2 = 0.1$. To
generate a synthetic time series for a given $a$, $N$, and $\sigma$, the
following steps were taken.
1. Generate a time series of length $N$ consisting of an annual cycle and a realization of an AR[1] process as described above.
2. Add an offset of -0.1 K (instrument bias of instrument A) and Gaussian noise with variance $\sigma^2 = 0.1$ to
produce a synthetic time series for instrument A.
3. Add an offset of 0.2 K (instrument bias of instrument B) and Gaussian noise with variance $\sigma^2 = 0.1$ to produce
a synthetic time series for instrument B.
4. Select measurements from A for odd days and from B for even days to generate an interlaced time series.
5. Repeat steps 1 to 4 many times (e.g. $M = 1000$, where $M$ denotes the number of repetitions) to generate 1000 synthetic time
series from which statistically robust estimates of $\hat{\Delta}$ are derived.
The difference in instrument biases is then estimated based on
(a) the calculated mean values of $N$ dual measurements (Eq. ), i.e. $N$ measurements for A and $N$ measurements
for B made simultaneously, and
(b) results from the statistical model (Eq. ) using the time series of $N$ interlaced measurements, i.e. $N/2$
measurements for A and $N/2$ measurements for B.
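The ensemble procedure above can be sketched as a small Monte Carlo experiment in Python. Two simplifications are made here and should be read as assumptions: the seasonal cycle is omitted for brevity, and the interlaced series is evaluated with the simple difference of means rather than with the GAM, so the interlaced SD printed below is not the one reported for the statistical model. The function name `simulate_sd`, the ensemble size, and the seed are hypothetical.

```python
import numpy as np


def simulate_sd(n, a, m=400, anomaly_var=10.0, noise_var=0.1,
                c_a=-0.1, c_b=0.2, seed=0):
    """Return (SD dual, SD interlaced, mean dual, mean interlaced) over m runs."""
    rng = np.random.default_rng(seed)

    # Step 1: AR[1] anomalies for all m ensemble members at once.
    eta_sd = np.sqrt((1 - a**2) * anomaly_var)
    eps = np.empty((m, n))
    eps[:, 0] = rng.normal(0.0, np.sqrt(anomaly_var), m)
    for t in range(1, n):
        eps[:, t] = a * eps[:, t - 1] + rng.normal(0.0, eta_sd, m)

    # Steps 2 and 3: add instrument bias and independent measurement noise.
    t_a = eps + c_a + rng.normal(0.0, np.sqrt(noise_var), (m, n))
    t_b = eps + c_b + rng.normal(0.0, np.sqrt(noise_var), (m, n))

    # Dual flights: both instruments at every time step (2n measurements).
    dual = (t_a - t_b).mean(axis=1)

    # Step 4: interlace, A on odd days (index 0, 2, ...), B on even days.
    inter = t_a[:, 0::2].mean(axis=1) - t_b[:, 1::2].mean(axis=1)

    # Step 5 is the ensemble dimension m; summarise across members.
    return dual.std(ddof=1), inter.std(ddof=1), dual.mean(), inter.mean()


sd_dual, sd_inter, mean_dual, mean_inter = simulate_sd(n=500, a=0.9)
print(f"dual:       mean = {mean_dual:.3f} K, SD = {sd_dual:.3f} K")
print(f"interlaced: mean = {mean_inter:.3f} K, SD = {sd_inter:.3f} K")
```

Both estimators recover the prescribed difference on average; the sketch is meant to show how their spread can be compared for a given combination of `n` and `a`.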
Box and whisker plots of bias estimates (Δ^) against the number of interlaced flights N (50 flights means 25
flights of instrument A and 25 flights of instrument B) as derived from M=1000 simulations using an autocorrelation
coefficient of a=0.5 (a), a=0.8 (b), and a=0.9 (c) and a measurement noise of σ2=0.1. The boxes show the
inter-quartile range. The upper and lower whiskers represent the maximum and minimum, respectively, excluding
outliers. Suspected outliers are shown as dots and are located outside the fences (“whiskers”) of the box plot (i.e. more than 1.5
times the inter-quartile range above the upper quartile or below the lower quartile). The true difference in biases
$\Delta = -0.3$ K is marked with a red line.
SD of Δ^ against the number of flights N for different
AR[1] coefficients a. The black solid line represents the
reference experiment with dual flights of instruments A and B, i.e. 2N measurements. To compare the results from the dual
flights (black solid line) with the results obtained from interlaced flights, the number of dual flights has to be doubled. Note the
logarithmic vertical scale.
Results
The box plots in Fig. 3 summarize the distribution of
$M = 1000$ bias estimates $\hat{\Delta}$ for a varying number of interlaced
flights $N$. Figure 3a is based on the
simulated temperature time series with an AR[1] coefficient $a = 0.5$,
similar to the autocorrelation coefficient found for temperature measurements
at 300 hPa above Lindenberg. Figure 3b and c are examples
for stronger persistence, i.e. $a = 0.8$ and $a = 0.9$, respectively. All panels
show that the estimated difference in bias between instruments
A and B ($\hat{\Delta}$) converges towards the true value
($\Delta = -0.3$ K) and that its spread narrows with increasing $N$. The rate of
convergence depends on the persistence (i.e.
autocorrelation) in the underlying time series. Weak persistence (small $a$)
leads to slower convergence (Fig. 3a), while strong
persistence ($a$ approaching 1) shows faster convergence.
The SD of Δ^ (see Fig. ), representing the
uncertainty with which the difference in the bias between instruments A and
B can be estimated, depends on the number of interlaced flights and on the
AR[1] coefficient a (coloured lines in Fig. ). The SD can be
used to construct asymptotic confidence intervals for the estimates using the
standard normal assumption; i.e. a 95 % confidence interval extends
1.96 times the SD on either side of the estimate. For all $a$, the SD decreases with increasing $N$; however, the
SD is generally larger for weak persistence (small $a \in (0,1)$) and smaller
for strong persistence (large $a \in (0,1)$).
The synthetic time series of dual flights performed with instrument A and
B simultaneously at N times (i.e. 2N measurements, solid black line in
Fig. ) provides the most reliable estimate of the biases
between the instruments; i.e. the SD is smallest for any $N$. For a fair
comparison in terms of the number of radiosondes used, the results from
$N$ dual flights need to be
compared to the results from $2N$ interlaced flights. For a time series
with an autocorrelation coefficient of $a = 0.5$, at least 2000 days of
consecutive interlaced daily measurements would be required to estimate the
difference in instrument biases with an SD of 0.22 K. Consider the
following example: a station operator seeks to detect the difference in bias
between two radiosondes in a temperature time series showing an
autocorrelation coefficient of 0.95. The station operator requires an SD of
$\hat{\Delta}$ of at most 0.05 K, which corresponds to a 95 % confidence
interval of about ±0.1 K (≈ 1.96 × 0.05 K). Then, from
Fig. it can be inferred that 500 interlaced measurements are
required to achieve this. Furthermore, if an operator has
a given number of radiosondes of two types available from which the
difference in instrument biases needs to be estimated, it is clear from
Fig. that dual flights result in better estimates (i.e.
smaller SD in Fig. ) than interlacing the instrument types
from one day to the next. The results presented here (from dual and
interlaced flights) also depend on the variance of the signal; for a higher
measurement noise, the number of required days will increase and vice versa
(not shown).
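The arithmetic behind the station-operator example can be written out explicitly. The second line is a back-of-envelope inference from the synthetic set-up described above (independent measurement noise of variance $\sigma^2 = 0.1$ for each instrument, with the true temperature cancelling in each paired difference), not a value read from the figure:

```latex
\begin{align}
\text{95\,\% half-width} &= 1.96 \times \mathrm{SD}(\hat{\Delta})
  = 1.96 \times 0.05\,\mathrm{K} = 0.098\,\mathrm{K} \approx 0.1\,\mathrm{K}, \\
\mathrm{SD}_{\mathrm{dual}}(\hat{\Delta}) &= \sqrt{\frac{2\sigma^2}{N}}
  \quad\Longrightarrow\quad
  N = \frac{2\sigma^2}{\mathrm{SD}^2} = \frac{2 \times 0.1}{0.05^2} = 80 .
\end{align}
```

Under these assumptions, about 80 dual flights (160 radiosondes) would reach the same SD of 0.05 K for which the example requires 500 interlaced measurements.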
Vertical profiles of calculated autocorrelation coefficients for six
GRUAN sites (colour coded as shown in the
legend). Autocorrelation coefficients were calculated from ERA5 temperature data interpolated to the location of the GRUAN sites.
The results indicate that, for typical differences in biases between radiosonde
types, the interlacing method presented here is unlikely to provide
a robust estimate of the difference in biases within a reasonable
measurement period (taken here to be 2 years). That said,
there might be cases of larger instrument biases and/or larger persistence
in which the interlaced method could provide an alternative method to dual
measurements, requiring fewer resources. Vertical profiles of autocorrelation
coefficients as calculated from temperature data obtained from ERA5
reanalyses
(https://www.ecmwf.int/en/forecasts/datasets/archive-datasets/reanalysis-datasets/era5, last access: 4 April 2018) are shown in Fig. .
Temperature data were interpolated to the locations of six GRUAN sites,
including sites in the tropics and the middle and high latitudes. Here we calculated
the autocorrelation coefficient from ERA5 data rather than from radiosonde
measurements, as long-term continuous measurements are required to obtain
a robust estimate of the seasonal cycle of the temperature time series before
calculating the autocorrelation coefficients. Such continuous observations,
covering at least 2 years of daily radiosonde flights, are currently only
available at a small subset of GRUAN sites, which does not cover all latitude
bands. ERA5 is the latest reanalysis provided by the ECMWF and
the calculated autocorrelation coefficients are expected to provide a good estimate of
the autocorrelation coefficient at each of the selected sites.
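A lag-1 autocorrelation coefficient of this kind can be estimated from a daily temperature series as sketched below. The text does not state how the seasonal cycle was removed, so the harmonic regression used here, the number of harmonics, and the function name `lag1_autocorrelation` are assumptions made for illustration; the input series in the example is synthetic, not ERA5 data.

```python
import numpy as np


def lag1_autocorrelation(temps, day_of_year, n_harmonics=2):
    """Remove a harmonic seasonal cycle, then return the lag-1 autocorrelation."""
    temps = np.asarray(temps, dtype=float)
    d = np.asarray(day_of_year, dtype=float)

    # Least-squares fit of a mean plus sine/cosine pairs of the annual cycle.
    cols = [np.ones_like(d)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * d / 365))
        cols.append(np.cos(2 * np.pi * k * d / 365))
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, temps, rcond=None)

    anom = temps - design @ coef
    anom = anom - anom.mean()
    return float(anom[1:] @ anom[:-1] / (anom @ anom))


# Synthetic test series: seasonal cycle plus AR[1] anomalies with a = 0.8.
rng = np.random.default_rng(7)
n, a_true = 5000, 0.8
eps = np.empty(n)
eps[0] = rng.normal(0.0, np.sqrt(10.0))
eta = rng.normal(0.0, np.sqrt((1 - a_true**2) * 10.0), n)
for t in range(1, n):
    eps[t] = a_true * eps[t - 1] + eta[t]
doy = (np.arange(n) % 365) + 1
series = 220.0 + 5.0 * np.sin(2 * np.pi * doy / 365 - np.pi / 2) + eps

a_hat = lag1_autocorrelation(series, doy)
print(f"estimated lag-1 autocorrelation = {a_hat:.2f}")
```

Applied level by level, such an estimate yields a vertical profile of the persistence parameter for a given site.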
Figure shows that the persistence varies strongly with
altitude; if the interlacing method is used, it therefore has to be applied
separately at different altitudes. For lower altitudes (pressures greater than
250 hPa), the autocorrelation coefficients vary between 0.4 and 0.8,
with the lowest coefficients at the southern middle latitudes (e.g. Lauder,
New Zealand). The persistence increases at higher altitudes (pressures less than
250 hPa), ranging from 0.7 in the tropics to 0.95 at higher
latitudes. The results indicate that the interlacing method may be able to
provide an estimate of the difference in biases for high altitudes at e.g. Ny-Ålesund, a GRUAN site showing the highest autocorrelation coefficients.
However, a detailed case study needs to be performed to investigate potential
benefits; this is beyond the scope of this study, which focuses on
describing and presenting the methodology.