Global Ensemble of Temperature over 1850–2018: Quantification of Uncertainties in Observations, Coverage and Spatial Modelling (GETQUOCS)

An interesting paper with a few issues to be resolved. In their analysis of global temperature data using the multi-resolution lattice kriging method, the authors extend the work of Ilyas et al. (2017) by exploring hyperparameter estimation uncertainty with a Monte Carlo sampling method. It is interesting to see the impact of this hyperparameter uncertainty assessment: it is a potentially important source of uncertainty in assessments of observed global temperature change that has not previously been investigated. There are some issues with the paper's structure, including a lack of concluding remarks. Additional discussion of the effects controlled by the sampled parameters, and illustration of the impacts of their sampling throughout the temperature record, is needed.


Introduction
Instrumental surface temperature data sets are frequently used to determine natural variability of changing surface temperatures on Earth (e.g. Hansen et al., 2010; Good, 2016; Menne et al., 2018). Climate models also use …

2.1 Multi-resolution lattice kriging using ABC

The multi-resolution lattice kriging (MRLK) model was introduced by Nychka et al. (2015). It models spatial observations as the sum of a Gaussian process, a linear trend and a measurement-error term. The MRLK can flexibly adjust to complicated shapes of the spatial domain and has the property of approximating standard covariance functions. The methodology extends spatial methods to very large data sets, accounting for all scales, for the goals of spatial inference and prediction. It is computationally efficient for large data sets because it exploits sparsity in the covariance matrices. The underlying spatial process is a sum of independent processes, each of which is a linear combination of the chosen basis functions. The basis functions are fixed and the coefficients of the basis functions are random.
Consider observations y(x) at n spatial locations x_1, x_2, ..., x_n in the spatial domain D. The aim is to predict the underlying process at an arbitrary location x ∈ D and to estimate the uncertainty in the prediction. For x ∈ D,

y(x) = d + g(x) + e(x),    (1)

where d is the mean and e(x) is the error term. The unknown spatial process g(x) is assumed to be the sum of L independent processes having different scales of spatial dependence,

g(x) = Σ_{l=1}^{L} g_l(x),

and each process is a linear combination of basis functions,

g_l(x) = Σ_{j=1}^{m(l)} c_{j,l} φ_{j,l}(x),

where m(l) is the number of basis functions at level l. The basis functions φ_{j,l} are fixed. These are constructed at each level using unimodal and symmetric radial basis functions, i.e. functions that depend only on the distance from the center.
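The layered structure above can be illustrated with a minimal 1-D sketch. This is not the LatticeKrig implementation: the Wendland-type kernel, the doubling of the number of basis functions per level, and the level-dependent coefficient variances are illustrative assumptions.

```python
import numpy as np

def wendland(d):
    """Compactly supported, unimodal radial basis function of distance d >= 0."""
    d = np.minimum(d, 1.0)          # zero outside the unit support
    return (1 - d) ** 4 * (4 * d + 1)

def simulate_field(x, n_levels=3, m0=5, rho=1.0, alpha=2.0, rng=None):
    """Simulate g(x) = sum_l sum_j c_{j,l} * phi_{j,l}(x) on points x in [0, 1].

    Level l has m(l) = m0 * 2**l regularly spaced centres; the basis width
    shrinks with the level, so coarse levels capture long-range dependence
    and fine levels capture local detail.
    """
    rng = np.random.default_rng(rng)
    g = np.zeros_like(x)
    for l in range(n_levels):
        m_l = m0 * 2 ** l                        # number of basis functions at level l
        centres = np.linspace(0.0, 1.0, m_l)
        width = 2.5 / m_l                        # support shrinks with resolution
        var_l = rho * alpha ** (-l)              # coarse levels carry more variance
        c = rng.normal(0.0, np.sqrt(var_l), m_l) # random basis coefficients c_{j,l}
        phi = wendland(np.abs(x[:, None] - centres[None, :]) / width)
        g += phi @ c                             # add this level's contribution
    return g

x = np.linspace(0, 1, 200)
g = simulate_field(x, rng=42)
```

Each level adds an independent layer of variation, mirroring the sum of L independent processes in the model above.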
The inference methodology of Nychka et al. (2015) follows directly from maximizing the likelihood function. This inference framework does not account for the uncertainty in the model parameters within the MRLK (Nychka et al., 2015).
Here, we estimate the MRLK parameters and quantify the uncertainty in these parameters. For this purpose, a Bayesian framework is created in which the posterior densities of the multi-resolution lattice kriging parameters are estimated using Approximate Bayesian Computation (ABC). Our new technique allows the spatial predictions to be accompanied by a quantification of uncertainties that reflects not only the coverage gaps but also the uncertainties in the MRLK parameters.

ABC posterior density estimation
Consider an n-dimensional spatial random variable y(x). The multi-resolution lattice kriging model depends on the unknown p-dimensional parameter θ. The probability distribution of the data given a specific parameter value θ is denoted by f(y|θ). If the prior distribution of θ is denoted by π(θ), then the posterior density is given by

f(θ|y) ∝ f(y|θ) π(θ).

Here, θ = [λ, aw]^T, where λ and aw are respectively the smoothing parameter and the autoregressive weight, the two main parameters of the MRLK. The autoregressive weight aw is the key covariance parameter; it is essential for specifying and fitting the spatial model. The smoothing parameter λ influences the whole calculation: an inappropriate estimate can lead to over- or under-fitting of the spatial model and can result in imprecise interpolated values and prediction uncertainties (Nychka et al., 2015). The posterior distribution of these parameters given the data, f(θ|y), is approximated using ABC. The next section develops the ABC acceptance-rejection technique, based on the variogram as a summary statistic, that is used to approximate these posterior densities.
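For intuition, when the likelihood is tractable the posterior can be evaluated directly from f(θ|y) ∝ f(y|θ)π(θ). The toy below does this on a grid for a Gaussian mean with known unit variance and a N(0, 10) prior (the data values are illustrative). For the MRLK this likelihood evaluation is intractable, which is what motivates ABC.

```python
import numpy as np

# Toy data: four observations of a Gaussian with unknown mean theta, unit variance.
y = np.array([0.8, 1.1, 0.9, 1.3])

theta = np.linspace(-3, 5, 801)                  # grid over the parameter
log_lik = -0.5 * ((y[:, None] - theta[None, :]) ** 2).sum(axis=0)   # log f(y|theta)
log_prior = -0.5 * theta ** 2 / 10.0                                # log pi(theta), N(0, 10)

post = np.exp(log_lik + log_prior)               # unnormalized posterior
w = post / post.sum()                            # normalize over the grid
post_mean = (theta * w).sum()                    # posterior mean, analytically 1.0 here
```

With sum(y) = 4.1 and posterior precision n/σ² + 1/τ² = 4 + 0.1 = 4.1, the exact posterior mean is 4.1/4.1 = 1.0, which the grid evaluation reproduces.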

Variogram-based ABC algorithm
Approximate Bayesian computation (e.g. Busetto and Buhmann, 2009; Beaumont, 2010; Dutta et al., 2017; Beaumont, 2019) is a family of algorithms for situations where the likelihood of a statistical model is intractable but it is possible to simulate data from the model for a given parameter value. ABC bypasses the evaluation of the likelihood function by comparing observed and simulated data. Additionally, its algorithms are very easy to parallelize. There are several forms of ABC algorithms.

https://doi.org/10.5194/amt-2020-454 Preprint. Discussion started: 4 January 2021. © Author(s) 2021. CC BY 4.0 License.

The standard rejection algorithm is the classical ABC sampler (e.g. Pritchard et al., 1999; Beaumont et al., 2002), widely used e.g. for model calibration (Gosling et al., 2018). The algorithm draws values of the parameters from the prior distribution. A data set is simulated for each draw of the parameters, and a chosen summary statistic is computed from it. A distance metric is then computed between the summary statistics of the observed and simulated data.
The parameters that produce distances less than a tolerance threshold are retained. These accepted parameters form a sample from the approximate posterior distribution.
The basic idea of ABC is to simulate from the multi-resolution lattice kriging model for a given set of parameters θ.
Simulations are run for a large number of parameter draws in order to produce meaningful posterior distributions. The parameter values are retained when the simulated data y* match the observed data y up to a tolerance threshold. For the similarity metric, we choose the sum of the squared differences between the semivariances, at various lag distances, of the observed (γ(h)) and simulated (γ*(h)) data; these semivariances are traditional descriptors of correlation across space. The retained parameters form the approximate posterior sample. The algorithm proceeds as follows:

1. Simulate from the priors, i.e. λ* ∼ π(λ) and aw* ∼ π(aw).
2. Substitute λ* and aw* into the model (Eq. 1) and simulate y* ∼ f(y|λ*, aw*).
3. Compute the summary statistic, i.e. the empirical semivariogram γ*(h).
4. Compute the similarity metric M comparing the simulated data y* with the observed data y.
5. Given a tolerance threshold t, check whether γ*(h) is close enough to γ(h), i.e. M ≤ t; if so, retain (λ*, aw*).

Figure 1. ABC acceptance-rejection algorithm using the variogram as summary statistic for multi-resolution lattice kriging.
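A minimal sketch of this acceptance-rejection scheme follows. To stay self-contained, the MRLK simulator is replaced by a simple one-parameter Gaussian process (exponential covariance with range theta); the prior, the lag set and the 5% tolerance rule are illustrative assumptions, not the paper's choices.

```python
import numpy as np

def semivariogram(y, x, lags, tol=0.05):
    """Empirical semivariance gamma(h) = 0.5 * mean[(y_i - y_j)^2] over pairs with |x_i - x_j| near h."""
    d = np.abs(x[:, None] - x[None, :])
    sq = 0.5 * (y[:, None] - y[None, :]) ** 2
    return np.array([sq[(d > h - tol) & (d <= h + tol)].mean() for h in lags])

def simulate(theta, x, rng):
    """Stand-in simulator: zero-mean GP with exponential covariance of range theta."""
    cov = np.exp(-np.abs(x[:, None] - x[None, :]) / theta)
    return rng.multivariate_normal(np.zeros(len(x)), cov)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 60)
y_obs = simulate(0.2, x, rng)                    # "observed" data, true range 0.2
lags = np.array([0.05, 0.1, 0.2, 0.4])
s_obs = semivariogram(y_obs, x, lags)            # summary statistic of the observations

draws, dists = [], []
for _ in range(500):
    theta_star = rng.uniform(0.01, 1.0)          # 1. draw from the prior
    y_star = simulate(theta_star, x, rng)        # 2. simulate data
    s_star = semivariogram(y_star, x, lags)      # 3. summary statistic
    draws.append(theta_star)
    dists.append(np.sum((s_star - s_obs) ** 2))  # 4. similarity metric M

t = np.quantile(dists, 0.05)                     # 5. tolerance: keep the closest 5%
posterior = np.array(draws)[np.array(dists) <= t]
```

The accepted `posterior` values approximate the posterior of the range parameter; in the paper the same loop runs over (λ*, aw*) with the MRLK simulator.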

Spatial estimate based on ABC posteriors
The existing MRLK inference methodology (Nychka et al., 2015) uses maximum likelihood estimators of the smoothing parameter λ and the autoregressive weight aw. Here, the spatial estimate is instead computed for each posterior sample: for the k-th posterior sample (λ_k, aw_k), the corresponding spatial prediction and its uncertainty are obtained. However, d and ρ are still estimated using the maximum likelihood approach. Hence, the spatial predictions in ABC-based multi-resolution lattice kriging are carried out using Eq. (5) and the associated uncertainties are evaluated based on Eq. (6) below.

The primary data used in this paper are the well-known HadCRUT4 (version 4.5.0.0) temperature anomalies (Morice et al., 2012).
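Since Eqs. (5) and (6) are not reproduced here, the sketch below shows the generic recipe they play the role of: compute a kriging prediction and variance for each posterior draw, then pool them, with the spread of the per-draw means contributing additional uncertainty (law of total variance). The exponential-covariance `krige` helper and the uniform stand-in "posterior" are assumptions for illustration, not the paper's model.

```python
import numpy as np

def krige(theta, x_obs, y_obs, x_new, nugget=1e-6):
    """Simple kriging mean/variance under an exponential covariance with range theta."""
    def cov(a, b):
        return np.exp(-np.abs(a[:, None] - b[None, :]) / theta)
    K = cov(x_obs, x_obs) + nugget * np.eye(len(x_obs))
    k = cov(x_new, x_obs)
    w = k @ np.linalg.inv(K)              # kriging weights
    mean = w @ y_obs
    var = 1.0 - np.sum(w * k, axis=1)     # prediction variance (unit process variance)
    return mean, var

rng = np.random.default_rng(1)
x_obs = np.array([0.1, 0.3, 0.5, 0.9])
y_obs = np.array([0.2, 0.7, 0.4, -0.1])
x_new = np.linspace(0, 1, 50)
theta_post = rng.uniform(0.1, 0.5, 200)   # stand-in for the ABC posterior sample

means, variances = zip(*(krige(t, x_obs, y_obs, x_new) for t in theta_post))
means, variances = np.array(means), np.array(variances)

pred = means.mean(axis=0)                               # pooled prediction
pred_var = variances.mean(axis=0) + means.var(axis=0)   # law of total variance
```

The second term of `pred_var` is exactly the parametric-uncertainty contribution that the maximum likelihood approach omits.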

Ensemble members
Errors in weather observations can be either random or systematic, and they can lead to complex spatial and temporal correlation structures in the gridded data. An ensemble approach is used by Morice et al. (2012) to represent the observational uncertainties. CRUTEM4 is not available as an ensemble data set, so it was converted to an ensemble by Morice et al. (2012).
The sea surface temperature anomalies are typically measured using engine room intake measurements, bucket measurements, …

Homogenization adjustment error Systematic biases occur due to changing station locations, measurement times, equipment and methods of calculating monthly averages. Homogenization adjustments are applied to the data to remove these non-climatic signals. These adjustments typically do not fully capture the systematic biases; this residual error is referred to as the homogenization adjustment error. It is modeled using a Gaussian distribution (Brohan et al., 2006).
For station records, a value of the homogenization adjustment error is drawn from a zero-mean Gaussian distribution with a …

Urbanization bias The warming rate is sampled from a Gaussian distribution; if a negative value is drawn, it is set to 0.0 °C. Prior to 1900, the urbanization bias is likewise assumed to be 0.0 °C.
Exposure bias It has been observed that bias in temperatures can be introduced by station siting and exposure. Changes in instrumentation can broadly be grouped into two classes. There were few standards for thermometer exposure or instrument shelters before the 19th century. By the early 20th century, thatched (or covered) enclosures had largely been replaced by free-standing louvered shelters or Stevenson-type screens (Trewin, 2010). A Stevenson-type screen is a shelter or enclosure that protects meteorological instruments from precipitation and direct heat radiation while allowing free circulation of the air. Changes in the thermometers' exposure to the atmosphere and in their shelter from direct or indirect solar radiation introduce exposure bias in temperatures (Parker, 1994; Moberg et al., 2003). This error is modeled on a regional scale in HadCRUT using a Gaussian distribution. The exposure bias uncertainty takes a value of 0.2 °C and 0.1 °C for stations that respectively lie within and outside the range of …
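A sketch of how one ensemble realization of these station-error terms could be drawn. The 0.2 °C / 0.1 °C exposure values and the zero-mean / truncated-at-zero Gaussian forms follow the text; `sigma_homog`, the urbanization-rate parameters and the region mask are placeholders, not the paper's values.

```python
import numpy as np

rng = np.random.default_rng(7)
n_stations = 1000

# Homogenization adjustment error: zero-mean Gaussian (std dev is a placeholder).
sigma_homog = 0.4
homog_err = rng.normal(0.0, sigma_homog, n_stations)

# Urbanization bias: warming rate drawn from a Gaussian, negative draws set
# to 0.0 degC; bias assumed 0.0 degC before 1900 (rate parameters are placeholders).
rate = rng.normal(0.05, 0.05, n_stations)        # degC per decade, placeholder
rate = np.maximum(rate, 0.0)                     # truncate negative draws at zero
year = 1980
urban_bias = rate * (year - 1900) / 10.0 if year >= 1900 else np.zeros(n_stations)

# Exposure bias: regional Gaussian, std 0.2 degC inside the stated latitude
# range and 0.1 degC outside it (the range itself is truncated in the text,
# so a random mask stands in for it here).
inside = rng.random(n_stations) < 0.5
exposure_err = rng.normal(0.0, np.where(inside, 0.2, 0.1))
```

Summing such independently drawn terms per station, repeated across members, is what builds a correlated-error ensemble of the kind described above.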

Hyperparameter temperature ensemble data
The ensemble temperature data set created by Ilyas et al. (2017) presumed perfect knowledge of the multi-resolution lattice kriging covariance parameters. The approximate Bayesian computation based multi-resolution lattice kriging developed in Section 2.1 is applied to the sparse HadCRUT4 ensemble data (Section 3; Morice et al., 2012). As a result, a new 100,000-member ensemble data set is created, updating the data set discussed in Ilyas et al. (2017). The key difference between the two data sets is the inference methodology: the updated data set is produced using the ABC-based posterior densities of the multi-resolution lattice kriging covariance parameters, whereas the first data set used pointwise estimates obtained via a likelihood approach. The use of the posterior distribution of the model parameters creates a data set that accounts for the multi-resolution lattice kriging parametric uncertainties. The algorithm of Figure 1 is not yet extended to obtain posteriors of α, as this parameter has little influence on the uncertainties compared to the others. The other parameters are estimated for each monthly field, since the spatial characteristics can vary considerably.
The geodesic grid and the great-circle distance are used to handle the spherical domain. To implement multi-resolution lattice kriging, version 6.4 of the LatticeKrig R package is used. As an example, the posterior densities of the smoothing parameter λ and the autoregressive weight aw are shown in Figure 2 for one spatial field. These posterior distributions result from the HadCRUT4 spatial field with the minimum spatial coverage, i.e. May 1861.
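The great-circle distance on the sphere can be computed with the standard haversine formula; a generic sketch follows (LatticeKrig handles this internally, and the 6371 km Earth radius is the usual approximation).

```python
import numpy as np

def great_circle(lat1, lon1, lat2, lon2, radius=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees,
    via the haversine formula."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * np.arcsin(np.sqrt(a))

d_quarter = great_circle(0.0, 0.0, 0.0, 90.0)   # a quarter of the equator
```

Ninety degrees along the equator gives 2πR/4 ≈ 10 007.5 km, a quick sanity check on the formula.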

Spatial field with minimum coverage
ABC-based multi-resolution lattice kriging (Figure 1) is used to predict this sparse spatial field. The spatial predictions and associated uncertainties are shown in Figure 3a and Figure 4a, respectively. The spatial predictions (Figure 3a) are computed using the available spatial observations, the multi-resolution basis functions (Section 4.1) and the ABC posterior distributions of the autoregressive weight aw and smoothing parameter λ (Figure 2). Equation (5) is used for the calculation of the spatial predictor.
For comparison with the previous reconstruction (Ilyas et al., 2017) of this spatial field, the spatial predictions based on the profile maximum likelihood approach are presented in Figure 3b. The difference between these reconstructions (Figure 3c) indicates that the spatial predictions using ABC-based multi-resolution lattice kriging (Figure 3a) are higher in magnitude in some regions than the likelihood-based reconstruction (Figure 3b), e.g. for predicted temperatures in the northeast of North America, southern South America and northern Asia. It can also be observed (Figure 3c) that the ABC-based spatial predictions are smaller in certain regions (e.g. Russia and north of Kazakhstan) than in the previous reconstruction (Figure 3b). The uncertainties in the predictions resulting from ABC-based multi-resolution lattice kriging are shown in Figure 4a; they are computed using equation (6). Figure 4c compares these uncertainties with those of the previous reconstruction (Figure 4b). It can be
observed that ABC-based multi-resolution lattice kriging produces higher uncertainty estimates close to the observed spatial sites. This was expected, since more sources of uncertainty are now accounted for. However, the unobserved grid locations show smaller uncertainties under ABC-based multi-resolution lattice kriging. This is possibly because the LKrig function of the LatticeKrig R package underwent substantial modifications: for the old ensemble (Ilyas et al., 2017) LatticeKrig version 6.2 was used, whereas the hyperparameter ensemble uses LatticeKrig version 6.4.

Discussion
The hyperparameter ensemble (i.e. the update on Ilyas et al. (2017)) provides an improved version of the global temperature anomalies since 1850. Instead of a classical or frequentist approach, a Bayesian methodology for the quantification of uncertainties in large-data settings is developed. The impact of including the parametric uncertainties is evident at regional (Figure 4) and global (Figure 5) scales.

A conditioned Latin hypercube approach is used to draw a sample that captures the variation of multiple environmental variables; this sample accurately represents the distribution of the environmental variables over their full range.
To draw a subsample from the 100,000 ensemble members, we considered a set of prominent environmental variables, i.e. monthly area averages and IPCC AR5 regional means. The conditioned Latin hypercube sample is drawn from the distribution of these environmental variables. The subsample accurately approximates the variation of this set of environmental variables over their full range. Stated differently, the distributions of the environmental variables in the conditioned Latin hypercube sample of size 100 are approximately the same as their distributions over the full 100,000-member ensemble. As an example, the distribution of one grid box is shown in Figure 6 for May 1861. The full ensemble distribution is based on the grid box's 100,000 ensemble values (Section 4.3), and the subsample distribution results from the conditioned Latin hypercube subsample of 100 members. The two distributions mostly overlap; however, the extreme values in the tails are not captured by the subsample. It is also important to note that the subsample ensemble only captures the variation of the specified environmental variables discussed above (i.e. AR5 regional means and monthly area averages).
This subsample would not be suitable for exploring any other environmental variable; in that case, the full hyperparameter ensemble should be used.
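Conditioned Latin hypercube sampling itself is more involved than can be shown briefly; as a stand-in that conveys the same idea of matching distributions over the full range, the following random-search sketch keeps the candidate subsample whose marginal distributions best match the full ensemble. The synthetic data, `marginal_distance` and `select_subsample` are illustrative assumptions, not the cLHS algorithm used in the paper.

```python
import numpy as np

def marginal_distance(full, idx):
    """Sum over variables of the max CDF gap (a KS-type statistic) between
    the subsample marginals and the full-ensemble marginals."""
    total = 0.0
    for j in range(full.shape[1]):
        a = np.sort(full[idx, j])
        # empirical CDF of the full sample, evaluated at the subsample points
        cdf_full = np.searchsorted(np.sort(full[:, j]), a, side="right") / full.shape[0]
        cdf_sub = np.arange(1, len(a) + 1) / len(a)
        total += np.abs(cdf_full - cdf_sub).max()
    return total

def select_subsample(full, k=100, n_tries=300, rng=None):
    """Random search: keep the candidate subsample of size k whose marginals
    best match the full data over all variables."""
    rng = np.random.default_rng(rng)
    best_idx, best_d = None, np.inf
    for _ in range(n_tries):
        idx = rng.choice(full.shape[0], k, replace=False)
        d = marginal_distance(full, idx)
        if d < best_d:
            best_idx, best_d = idx, d
    return best_idx, best_d

rng = np.random.default_rng(3)
full = rng.normal(size=(10_000, 3))   # stand-in "environmental variables"
idx, d = select_subsample(full, k=100, rng=3)
```

As with the cLHS subsample in the text, the selected members reproduce the marginals of the chosen variables well while the extreme tails remain under-represented.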