Total column ozone variations estimated using ground-based stations provide an important independent source of information in addition to satellite-based estimates. This estimation has been vigorously challenged by data inhomogeneity in time and by the irregularity of the spatial distribution of stations, as well as by interruptions in observation records. Furthermore, some stations have calibration issues and thus observations may drift. In this paper we compare the spatial interpolation of ozone levels using the novel stochastic partial differential equation (SPDE) approach with covariance-based kriging. We show that these new spatial predictions are more accurate, less uncertain and more robust. We construct long-term zonal means to investigate the robustness against the absence of measurements at some stations as well as instrument drifts. We conclude that time series analyses can benefit from the SPDE approach compared to covariance-based kriging when stations are missing, but the positive impact of the technique is less pronounced in the case of drifts.

The ground-based total column ozone data set is based on Dobson and
Brewer spectrophotometer and filter ozonometer observations available
from the World Ozone and UV Data Centre (WOUDC)
(

The TCO data set and corresponding satellite measurements have also
been widely discussed in the statistics literature. Some authors have
noticed space–time asymmetry in ozone data

The aim of this article is to apply a new technique, the stochastic
partial differential equation (SPDE) approach in spatial statistics

Section 2 gives a brief introduction to the theoretical framework of the SPDE technique, a basic description of the covariance-based kriging and related model selection and diagnostic techniques. Section 3 describes the spatial analysis using TCO data from WOUDC on a monthly, seasonal and annual basis. Furthermore, the estimated results of SPDE and covariance-based kriging are compared with the Total Ozone Mapping Spectrometer (TOMS) satellite data to examine which method yields approximations closer to satellite data. Finally, the long-term zonal mean trends enable us to conduct a sensitivity analysis by removing stations at random and by introducing long-term drifts at some ground-based stations.

Our main problem is to estimate ozone values at places where they are not
observed. Models in spatial statistics that enable this task are
usually specified through the covariance function of the latent
field. Indeed, in order to assess uncertainties in the spatial
interpolation with global coverage, we cannot build models only for
the discretely located observations; we need to build an approximation
of the entire underlying stochastic process defined on the sphere.
We consider statistical models for which the unknown functions are assumed to be realizations
of a Gaussian random spatial process. The standard fitting approach, covariance-based
kriging, spatially interpolates values as linear combinations of the
original observations, and this constitutes the spatial predictor. Not only can large data sets
be computationally demanding for a kriging predictor, but covariance-based models also generally struggle to
account for nonstationarity (i.e., when physical spatial
correlations differ across regions) because of the fixed
underlying covariance structure. Recently, a different computational
approach (for identical underlying spatial covariance models) was
introduced by

The Matérn covariance function is an advanced covariance
structure used to model dependence of spatial data on the plane. On
the sphere,

Let

For locations on spherical domain,

For the covariance-based approach, the hurdle that we are facing is that we have to define
a valid (but flexible enough) covariance model and, furthermore,
compared to data on the plane, we must employ a distance on the
sphere. Two distances are commonly considered. The chordal distance
between the two points
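As a minimal sketch of the two distances commonly used on the sphere (chordal and great-circle), the following Python functions compute both for two points given as (latitude, longitude) pairs in degrees. The function names and the unit-radius default are illustrative assumptions, not taken from the original analysis.

```python
import math


def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3-D point on the unit sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def chordal_distance(p, q, radius=1.0):
    """Straight-line (Euclidean) distance through the sphere between p and q."""
    a = to_unit_vector(*p)
    b = to_unit_vector(*q)
    return radius * math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def great_circle_distance(p, q, radius=1.0):
    """Shortest distance along the surface, using chord = 2 sin(angle / 2)."""
    chord = chordal_distance(p, q)  # chord on the unit sphere
    angle = 2.0 * math.asin(min(1.0, chord / 2.0))  # clamp against rounding
    return radius * angle
```

For two points on the equator separated by 90 degrees of longitude on the unit sphere, the chordal distance is the square root of 2 and the great-circle distance is pi/2. The two distances nearly coincide for nearby points and diverge as the separation grows.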

Surface predicted ozone (DU) mean and SD for SPDE (strategy D) and CBK (strategy B) from January 2000. The red points indicate the locations of stations.

In this section, we produce statistical estimates of monthly ozone maps, using TCO data from WOUDC. As an illustration, we consider TCO data from January 2000, which comprise 150 ground-based ozone observations around the world. All ozone values in this article are in Dobson units (DU). We first choose the model setups for both the SPDE and covariance-based approaches.

With the SPDE approach, as the smoothness parameter

To compare the performance of the SPDE approach with the covariance-based
kriging, the same

In order to achieve a better estimation, the monthly mean “norms”

Specifications of the different strategies in the spatial estimations.

We now compare the results estimated by the four strategies
described in Table

Comparison of the generalized cross-validation error (

Surface predicted ozone (DU) mean from SPDE approach (strategy D) by season from 2000.

Surface predicted ozone (DU) standard deviation from SPDE approach (strategy D) by season from 2000.

Table

Figure

Total ozone (DU) difference mapping of SPDE and covariance-based kriging (CBK) estimated mean with respect to satellite data from January, April and July 2000.

Total ozone (DU) difference mapping of SPDE and covariance-based kriging (CBK)
estimated mean with respect to satellite data from October
2000. The October predictions are worse than those for other
months; hence a different scale is used here than in
Fig.

Ozone mapping from TOMS data in

Ozone mapping from TOMS data in

Seasonal ozone data are obtained by averaging the corresponding
monthly data (but all months of every season must be available to
create such seasonal averages). Table

Generalized cross-validation error (

Generalized cross-validation error (

The annual ozone data are obtained by creating an annual average, which
also means that stations with record interruptions are not
used. Therefore fewer stations are available for this exercise. To see
the improvement of the annual analysis over the seasonal and
monthly analyses, Table

RMSEs of covariance-based kriging (CBK) and SPDE predictions against satellite data for all months, averaged over 2000–2005.

In this section, we assess the match between satellite observations
and spatial predictions based on ground-level stations. The TOMS data on monthly averages are
obtained from the NASA website (

From this point on, we compare only the results of a nonstationary
SPDE-based model and a covariance-based model with

Figures

The seasonal predicted total ozone is obtained by averaging the
corresponding monthly means. We excluded stations that have
interruptions in their records. Therefore fewer observations are used
to predict seasonal means. The RMSEs between predicted surface and
satellite data are presented in Table

In this section, we show how variations in time of the zonal means can be improved by employing the more accurate SPDE-based mapping technique instead of covariance-based kriging.
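To make the notion of a zonal mean concrete, here is a minimal sketch that averages a predicted ozone surface over a latitude band. It assumes the surface is available on a regular latitude-longitude grid and uses cosine-of-latitude area weights; both the grid layout and the function name `zonal_mean` are illustrative assumptions rather than details given in the text.

```python
import math


def zonal_mean(lat_centers, field, lat_min, lat_max):
    """Area-weighted mean of a gridded field over lat_min <= latitude <= lat_max.

    lat_centers: latitude (degrees) of each grid row.
    field: field[i][j] is the value (e.g. ozone in DU) at row i, longitude j;
           None marks a missing cell.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, row in zip(lat_centers, field):
        if not (lat_min <= lat <= lat_max):
            continue
        values = [v for v in row if v is not None]
        if not values:
            continue  # skip rows with no data
        # Grid cells shrink towards the poles in proportion to cos(latitude).
        weight = math.cos(math.radians(lat))
        weighted_sum += weight * sum(values) / len(values)
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no data in the requested latitude band")
    return weighted_sum / weight_total
```

Calling `zonal_mean(lats, surface, 30, 60)` on the predicted surface for each month or year would give a time series for the northern midlatitude band. Without the area weights, high-latitude rows would be over-represented in the average.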

To see how the ozone zonal means change over time over the same
stations with different algorithms, we choose the stations which
supplied data for at least 25 years between 1979 and 2010. Hence
67 stations are used to construct these zonal mean time series. There
is a strong asymmetry between the Southern Hemisphere (6 stations) and
the Northern Hemisphere (48 stations); there are 13 stations in the tropics
(defined as 30

In order to overcome the underestimation over the South Pacific (see
Fig.

RMSEs of covariance-based kriging (CBK) and SPDE predictions against satellite data for all seasons, averaged over 2000–2005.

In this study, we compare the zonal mean time series estimated by SPDE
and covariance-based kriging with Solar Backscatter Ultraviolet (SBUV) satellite
instrument merged ozone data described by

Time series of zonal means from SBUV satellite data (black), the WOUDC data set (green), covariance-based kriging (red) and SPDE (blue) from 1979 to 2010.

Time series of 30–60

Time series of 30–60

To investigate the pattern of zonal mean long-term changes in detail,
Fig.

The final step is to conduct a sensitivity analysis for the long-term
zonal mean estimations against either randomly removed stations or
drifts in some of the ground-based observations. To see the impact of
removing stations on the long-term ozone zonal mean change, we choose
57 stations (39 stations in the Northern Hemisphere, 10 stations in
the tropics and 8 stations in the Southern Hemisphere) which provided
data over the entire period from 1990 to 2010. We randomly remove 5, 10, 20
and 30 stations out of this set of stations by taking into account
the relative weights of the respective regions and estimate the zonal
mean trends in each case. The stations removed are randomly chosen by
the design in Table

Furthermore, to illustrate possible variations in the sensitivity
analysis, we randomly draw 5 sets of stations which need to be removed,
labeled cases 1–5. The time series for different zonal mean
trends over the latitude band 30–60

Annual zonal mean deviances from SBUV data (black), WOUDC
data set (green), using all 57 available ground-based data
(blue), and randomly removing 5 (red), 10 (yellow), 20 (brown) and 30
(grey) stations in SPDE and covariance-based kriging (CBK) estimation over the (1)
global (60

Time series of 30–60

We use case 1 for further illustration, as both the SPDE and covariance-based approaches
performed well in this case relative to the other cases. Figure

Time series of 30–60

Annual zonal mean deviances from SBUV data (black), WOUDC
data set (green), using all 57 available ground-based data
(blue), adding drift to 5 (red), 10 (yellow), 20 (brown) and 30
(grey) stations in SPDE and covariance-based kriging (CBK) estimation over the (1)
global (60

Design of the sensitivity analysis: stations to be removed are randomly selected within each region.
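The station-removal step can be sketched as stratified random sampling. The version below allocates the number of removed stations to each region in proportion to its size (largest-remainder rounding) and then samples within each region. This proportional rule is an assumption made for illustration; the actual per-region counts are those of the design table, which is not reproduced here. The function names and region labels are hypothetical.

```python
import random


def allocate_removals(region_sizes, n_remove):
    """Split n_remove across regions in proportion to the number of stations."""
    total = sum(region_sizes.values())
    if not 0 <= n_remove <= total:
        raise ValueError("n_remove must be between 0 and the number of stations")
    quotas = {r: n_remove * size / total for r, size in region_sizes.items()}
    counts = {r: int(q) for r, q in quotas.items()}  # round down first
    leftover = n_remove - sum(counts.values())
    # Hand the remaining removals to the regions with the largest remainders.
    ranked = sorted(region_sizes, key=lambda r: quotas[r] - counts[r], reverse=True)
    for region in ranked[:leftover]:
        counts[region] += 1
    return counts


def remove_stations(stations_by_region, n_remove, seed=None):
    """Return the stations kept in each region after random removal."""
    rng = random.Random(seed)
    sizes = {r: len(s) for r, s in stations_by_region.items()}
    counts = allocate_removals(sizes, n_remove)
    kept = {}
    for region, stations in stations_by_region.items():
        removed = set(rng.sample(sorted(stations), counts[region]))
        kept[region] = [s for s in stations if s not in removed]
    return kept
```

With the 39, 10 and 8 stations of the Northern Hemisphere, tropics and Southern Hemisphere, removing 30 stations under this rule takes 21, 5 and 4 stations respectively. Repeating the call with different seeds gives independent random cases.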

For the second part of the sensitivity analysis, we add random long-term
drifts to the observations to mimic instrument-related problems. In
reality, all observations from a ground-level station are often
biased by 5–10 DU (2–3 %) over a period of several years. For
the setting of drifts, let

Using the same 57 stations which provided data consistently from 1990
to 2010, the zonal mean trends were estimated with these added drifts at
randomly selected subsets of 5, 10, 20 and 30 stations. We again consider five
sets of random drifts to account for possible random
variations in the selection process. The time series in each case are
shown in Figs.

Table

RMSEs of annual, seasonal and monthly total mean ozone from WOUDC data set, and SPDE and covariance-based kriging (CBK) estimated means (using 57 stations) against SBUV data over 1990–2010.

In summary, the covariance-based kriging method may perform fairly well globally, but displays misfit locally. The misfits will be averaged out when zonal means are estimated, but they will reveal themselves as relatively higher errors in estimations compared to the SPDE spatial prediction method for mapping. Moreover, the estimation uncertainty of both the SPDE and covariance-based methods depends considerably on the location of stations, but the SPDE approach outperforms covariance-based kriging in terms of uncertainty quantification in areas with few stations. The estimation of trends in time series is more accurate for both methods over the Northern Hemisphere than over the Southern Hemisphere, as the network of stations is much denser in the Northern Hemisphere. The sensitivity analysis also suggests that the ground-based network can provide a reliable source of data for estimation of the long-term ozone trends. In the Northern Hemisphere, annual means can be successfully estimated even if half of the available sites are excluded from the analysis. This is not the case for the tropical belt and Southern Hemisphere, where the number of sites is very limited. Additional 3 % biases over 5-year intervals at up to half of the network have a relatively small impact on the estimated zonal means. This suggests that the network can tolerate some systematic errors as long as instruments are calibrated on a regular basis (every 5 years in our tests) in order to remove such biases. Overall, when stations are removed or have added drift, the SPDE approach shows more robustness than covariance-based kriging and thus, for current observations, should be the preferred method.

The parameter estimation algorithm for the SPDE approach works as
follows. Let

As pointed out in

For the ozone data, we specified a second order linear polynomial for

To assess the performance of model fitting, the residuals are
considered. Raw residuals are defined as the difference between the
observed values and the fitted values. They can be interpreted as
estimators of the errors

In practice, the GCV function computes the weighted residual sum of
squares when each data point (i.e., station) is omitted and predicted
from the remaining points. The value of

After the optimal

Kai-Lan Chang was supported by the Taiwanese government sponsorship for PhD overseas study. S. Guillas was partially supported by a Leverhulme Trust research fellowship on stratospheric ozone and climate change (RF/9/RFG/2010/0427). Edited by: L. Bianco