This paper is motivated by the fact that, although temperature readings made by Vaisala RS41 radiosondes at GRUAN sites (

In particular, a general framework for computing interpolation uncertainty based on a Gaussian process (GP) set-up is developed. Using the GP characteristics, a simple formula for the standard error of linear interpolation is given. Moreover, GP interpolation is proposed as an alternative interpolation method that comes with its own standard error.

For the Vaisala RS41, an extensive cross-validation approach based on the block-bootstrap technique shows that the two approaches provide similar interpolation performance. Statistical results about interpolation uncertainty at various GRUAN sites and for various missing gap lengths are provided. Since both approaches result in an underestimation of the interpolation uncertainty, a bootstrap-based correction formula is proposed.

Using the root mean square error, it is found that, for short gaps, with an average length of 5

The quality of climate variable profiles in the atmosphere is relevant in various scientific fields. In particular, it is important for numerical weather prediction, satellite observation validation, and climate change understanding, including extreme events such as droughts and tornadoes.
In this framework, more than 10 years ago, the GCOS (Global Climate Observing System) Reference Upper-Air Network (GRUAN,

GRUAN data processing for the Vaisala RS92 radiosonde was developed to meet the above criteria for reference measurements

Although temperature readings made by the Vaisala RS41 radiosonde at GRUAN sites are given at 1

The literature has considered the interpolation of atmospheric profiles from various points of view.
In some cases, interpolation is applied to measurement uncertainty. For example, considering the AERONET Version 3 aerosol retrievals,

A second and more relevant use of interpolation relates to the measurement itself.
In this field,

In the framework of radiosonde co-location uncertainty, considering relative humidity,

Comparisons of radiosonde and satellite data are sometimes based on low-vertical-resolution radiosonde profiles, especially for historical data.
In some cases, interpolation is not required because of the higher vertical resolution of satellite profiles

The literature above shows that interpolation of atmospheric profiles is common practice, but a systematic analysis of interpolation uncertainty per se is not yet available.
A general approach to interpolation is the geostatistics approach

In this paper, the uncertainty of the one-dimensional linear interpolation is discussed using two approaches. In the first stage, the closed-form formula of the linear interpolation uncertainty is presented under the assumption that the observed atmospheric profile is generated by a GP. In the second stage, thanks to the availability of good profiles without missing data, the GP assumption is relaxed, and a block-bootstrap correction of the uncertainty formula is constructed. This approach is valid for any atmospheric profile data set. Considering the motivating application, which focuses on temperature readings of the Vaisala RS41 at GRUAN sites, this paper's objective is to contribute to the understanding of interpolation uncertainty expressed as a function of missing gap length, missing frequency, altitude, and site. This objective amounts to studying the feasibility of an algorithm and/or a lookup table providing interpolation uncertainty in a future version of GRUAN data processing.

To achieve this objective, each good profile is divided into a learning set and a testing set. Firstly, data from the testing set are considered missing and are estimated by interpolation of the learning set data. Secondly, the comparison of estimated and true data in the testing set is used for interpolation uncertainty assessment.
This assessment is done for various missing patterns that resemble observed bad launches, which are characterised by many missing measurements.
In particular, increasing gap average lengths will be analysed.
The testing sets will be extracted using a block-bootstrap approach
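The learning/testing split described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the function names, the geometric distribution of block lengths, the 20 % testing fraction, and the synthetic random-walk profile are all assumptions made for illustration only.

```python
import numpy as np

def make_gap_mask(n, test_fraction, mean_gap, rng):
    """Boolean mask (True = held out for testing) built from random blocks.

    Assumption: block lengths are geometric with mean `mean_gap`;
    overlapping blocks may merge into longer gaps.
    """
    mask = np.zeros(n, dtype=bool)
    target = int(test_fraction * n)
    while mask.sum() < target:
        length = rng.geometric(1.0 / mean_gap)
        start = rng.integers(1, n - 1)
        # The first and last samples are never masked, so every gap
        # has a learning-set neighbour on both sides.
        mask[start:min(start + length, n - 1)] = True
    return mask

def cv_rmse_linear(time, temp, mask):
    """RMSE between linearly interpolated and true values on the testing set."""
    estimate = np.interp(time[mask], time[~mask], temp[~mask])
    return float(np.sqrt(np.mean((estimate - temp[mask]) ** 2)))

rng = np.random.default_rng(0)
time = np.arange(3000.0)  # hypothetical flight sampled every 1 s
# Synthetic profile: linear trend plus a random walk (not real RS41 data).
temp = 15.0 - 0.02 * time + np.cumsum(rng.normal(0.0, 0.02, time.size))

rmse_by_gap = {}
for mean_gap in (5, 30, 60):
    replicates = [
        cv_rmse_linear(time, temp, make_gap_mask(time.size, 0.2, mean_gap, rng))
        for _ in range(200)
    ]
    rmse_by_gap[mean_gap] = float(np.mean(replicates))
```

In this sketch, `rmse_by_gap` maps each average gap length to the mean cross-validation RMSE over the bootstrap replicates, which is the kind of summary that can be tabulated by site and gap length.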

The rest of the paper is organised as follows.
Section

There are several possible reasons for temporary gaps in data reception. These include the presence of obstacles that may interfere with radio transmission to the ground site (trees, buildings, local geography), extraordinary meteorological conditions, or instrument-related reasons. Considering an ascent as a trajectory, rather than a vertical profile, the probability of data gap occurrence increases with the horizontal distance from the launch site (weaker radio signal), which can significantly exceed the vertical distance, depending on wind conditions.

A preliminary statistical analysis of the occurrence of data gaps in RS41 radiosonde soundings performed at 15 GRUAN stations in the 2014–2019 period shows that gaps occur in more than 20 % of the soundings, virtually independently of the height ranges, with the majority (

Frequency distribution of temperature gaps in the stratospheric height section between 20 and 25 km, based on 13 667 RS41 profiles, years 2014–2019.

The GRUAN data processing is based on the raw data from the physical radiosonde sensors, namely temperature, relative humidity, positioning data provided by the Global Navigation Satellite System (GNSS), and pressure if an onboard sensor is present. The raw data are corrected for known or experimentally evidenced systematic effects, such as adjustments from pre-flight ground checks, corrections of sensor time lags, or solar radiative effects. Some intermediate variables are, in turn, calculated (e.g. effective air speed or ventilation) as components of the correction algorithms. A number of secondary variables are finally derived, for example, altitude, geopotential height, water vapour content, or wind components. At different processing stages, smoothing filters are applied for estimation and separation of the signal's noise components. Through all these steps, the regular grid of the measured raw data is maintained; that is, all variables and uncertainties in the product variables are given with the original high resolution.

This procedure inevitably leads to specific technical difficulties if data gaps occur randomly or intermittently. For example, smoothing may introduce effects which are difficult to handle when running over gaps, especially for gap sizes comparable to or exceeding the actual kernel length. The same difficulty applies to uncertainty estimates to be associated with the averaged (smoothed) values. Another example is related to magnitudes which are calculated cumulatively with height, such as pressure derived from positioning, temperature, and humidity data, or the integrated water vapour content. As a consequence, processing-related irregularities or deviations may occur in the profile data and uncertainty estimates, the systematics and extent of which are difficult to predict. Depending on the purpose for which the GRUAN data product is further used (e.g. process studies based on high-resolution data or average-based long-term studies for climate), such systematics may have a different impact.

In this section, formulas of the uncertainty for both linear and stochastic interpolation are considered under some stochastic assumptions about the data generation mechanism.

In particular, considering a radiosonde flight, we assume that

In Eq. (

From Eq. (

The GP is characterised by the parameter set

Considering an observation gap in the interval

Assuming that the true signal

Equation (

Figure

Linear interpolation standard error (SE), Eq. (

The above thresholds may be overcome in the presence of correlation. In general, for a GP with

Linear interpolation SE, Eq. (

Linear interpolation SE, Eq. (

The assumption that the temperature profile

Two data sets provided by the GRUAN Lead Centre (

GRUAN sites included in the Few_nan data set with the respective number of profiles and the number of profiles selected for the analysis, which have gaps shorter than 5

As a preliminary analysis of the bad data set Many_nan, Fig.

Frequency distribution of the missing data fraction in the Many_nan data set.

For further interpolation analysis, those profiles in Few_nan with very few missing data are selected. In particular, the

Frequency distribution of profile duration in the Few_nan data set.

The block bootstrap is a well-known technique

This section presents a rule for partitioning each original profile

The gap scheme is obtained by randomly generating and sorting the

We are interested in collecting information about the interpolation error in a dense vertical grid, even if the testing fraction

The main results of the next section are obtained using linear interpolation of temperature versus time, based on the neighbouring values, and GP interpolation given by the expectation of

Each bootstrap replicate

The GP interpolator depends on the local structure

The GP model selection considered the two autocovariance functions

Considering the choice of layering resolution, the results showed little sensitivity to layer-size variations, and a 400

The best results for the basis functions were obtained with a piecewise linear function of time. Other predictor set-ups were also considered: a piecewise quadratic function of time and vector predictor set-ups including altitude, coordinates, and wind. These more complex models did not yield any relevant improvement in RMSE; worse still, they caused problems concerning the singularity of the information matrix at various combinations of sites and layers. Hence, invoking Occam's razor and looking for a robust and general model set-up, we settled on the simplest option, a piecewise linear function of time.
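As an illustration of what a piecewise linear basis in time can look like, the following Python sketch builds a design matrix from hinge functions and checks its conditioning. The hinge parameterisation, the knot positions, and the synthetic trend are assumptions; the source does not specify how the piecewise linear function was parameterised.

```python
import numpy as np

def hinge_basis(time, knots):
    """Design matrix with columns 1, t, and max(t - k, 0) for each knot k.

    Assumption: hinge functions are one common way to encode a continuous
    piecewise linear trend; the paper's actual basis may differ.
    """
    columns = [np.ones_like(time), time]
    columns += [np.clip(time - k, 0.0, None) for k in knots]
    return np.column_stack(columns)

time = np.linspace(0.0, 400.0, 401)  # one hypothetical 400 s layer
knots = [100.0, 200.0, 300.0]        # hypothetical knot positions
H = hinge_basis(time, knots)

# Synthetic trend with a slope change at 200 s (not real data).
truth = 10.0 - 0.05 * time + 0.08 * np.clip(time - 200.0, 0.0, None)
beta, *_ = np.linalg.lstsq(H, truth, rcond=None)
fitted = H @ beta

# A very large condition number would signal the near-singularity
# problems reported for richer predictor sets.
condition_number = np.linalg.cond(H)
```

Monitoring `condition_number` while adding predictors is one simple way to detect when a richer basis becomes numerically unstable for a given layer.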

This section's bootstrap campaign aims to assess the uncertainty of the linear interpolation, Eqs. (

Comparison of cross-validation RMSEs between GP and linear interpolation for increasing average gap length

Table

When comparing the two interpolation approaches, Table

Figure

Linear interpolation uncertainty by GRUAN site and average gap size

Figure

Linear interpolation uncertainty of GRUAN sites. The cross-validation uncertainty (

In order to re-interpret the GP-based linear interpolation uncertainty formula of Figs.

Frequency distribution of estimated GP model parameters from all bootstrapped profiles and all model-related atmospheric layers.

In general, the connection between the uncertainty curves of Figs.

Figure

Each line shows the cross-validation RMSE of linear interpolation as a function of the interpolation distance (s) for a specific atmospheric layer in the range of 2–37

In addition, Fig.

Each line shows the linear interpolation SE as a function of the interpolation distance (s) for a specific atmospheric layer in the range of 2–37

Figures

For the above reasons, we propose a bootstrap-corrected interpolation uncertainty estimate by
merging the information of the single profile

As an illustration of the method, the profile of the Sodankylä site on 3 March 2017 12:00 UTC is considered in Fig.

RS41 temperature profile at the Sodankylä site on 3 March 2017 12:00 UTC.

Detail of the RS41 temperature profile at the Sodankylä site on 3 March 2017 12:00 UTC, around 22.5

Figure

It follows that the implementation of GRUAN data processing providing interpolated temperature profiles along with their uncertainty requires some effort divided into two different phases. First, massive GP offline computation is needed to prepare the lookup table related to Eqs. (

This paper offers a multifaceted assessment of the interpolation uncertainty of Vaisala RS41 temperature profiles at various altitudes, using an extensive data set from seven GRUAN sites. Moreover, it provides a general framework for the interpolation of generic atmospheric profiles.

Two complementary uncertainty approaches have been developed and integrated.
The first is a cross-validation approach based on block bootstrap, which shows that the average root mean square error of linear interpolation is about 0.1 K for small gaps and increases to 0.58 K for gaps with an average duration of 60

Since the cross-validation outputs are averages, the individual profile contribution to the uncertainty is not considered. Hence, the second approach addresses interpolation uncertainty using Gaussian process assumptions. This approach allows for obtaining a formula for the interpolation uncertainty which depends on the autocorrelation structure of each single profile.
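To make the dependence on the autocorrelation structure concrete, the following Python sketch computes the standard error of linear interpolation inside a gap directly from the variance of the interpolation error. It is a generic derivation, not necessarily the paper's equation: it assumes a stationary covariance function, a mean that is linear across the gap (so that the interpolation is unbiased), and independent measurement noise at the two gap end points. The exponential covariance and all parameter values are hypothetical.

```python
import numpy as np

def exp_cov(lag, sigma2=1.0, theta=20.0):
    """Exponential autocovariance sigma2 * exp(-|lag| / theta) (hypothetical)."""
    return sigma2 * np.exp(-np.abs(lag) / theta)

def linear_interp_se(t, t1, t2, cov, noise_var=0.0):
    """Standard error of linear interpolation at t, with t1 <= t <= t2.

    The error is y(t) - w1*z(t1) - w2*z(t2), where z = y + noise and
    w1, w2 are the linear interpolation weights.
    """
    w2 = (t - t1) / (t2 - t1)
    w1 = 1.0 - w2
    var_signal = (
        cov(0.0) * (1.0 + w1**2 + w2**2)
        - 2.0 * w1 * cov(t - t1)
        - 2.0 * w2 * cov(t2 - t)
        + 2.0 * w1 * w2 * cov(t2 - t1)
    )
    var_noise = noise_var * (w1**2 + w2**2)
    # Guard against tiny negative values caused by rounding.
    return float(np.sqrt(np.maximum(var_signal + var_noise, 0.0)))

# Midpoint standard error for increasing gap lengths (in seconds).
se_midpoint = [
    linear_interp_se(gap / 2.0, 0.0, gap, exp_cov, noise_var=0.01)
    for gap in (10.0, 60.0, 600.0)
]
```

With an uncorrelated signal, the midpoint variance reduces to 1.5 times the signal variance, which is the upper limit approached by the correlated case as the gap grows well beyond the correlation range.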

Integrating the above two approaches, a bootstrap-corrected formula for the individual interpolation uncertainty is proposed. Based on these results, a future version of GRUAN data processing could implement interpolated temperature profiles, uncertainty included.

The extension of this approach to other essential climate variables (ECVs) and/or other instruments requires some consideration. From the modelling point of view, provided that enough field data are available, the extension is relatively straightforward. Indeed, the approach is quite general, and model selection and optimisation are data-driven. Hence, similar results may be expected for temperature profiles obtained by other instruments, provided that vertical resolution and instrumental error are comparable to the present case. Further, similar results are also expected for other smooth variables, such as pressure.

On the other hand, the interpolation uncertainty could be greater for ECVs which are known to have large small-scale variations.
For example, relative humidity commonly shows highly intermittent profiles in the troposphere, with very large and very fast-changing gradients. In these cases, we can expect that the cross-validation uncertainty could be high even for small gaps. In addition, the vertical autocorrelation could have a shorter range, and the corresponding GP model could provide interpolation uncertainties close to the white noise case considered in Sect.

To see Eq. (

The underlying MATLAB code is available from the corresponding author upon request.
The data are available from the GRUAN Lead Centre (

Sections 1 and 9 were written jointly. Section 2 is due to MS and CvR. Sections 3–8 are due to AF.

The authors declare that they have no conflict of interest.

The authors wish to thank the GRUAN Quality Task Force for the extensive discussions.

This paper was edited by Brian Kahn and reviewed by two anonymous referees.