Most studies on validation of satellite trace gas retrievals or atmospheric
chemical transport models assume that pointwise measurements, which roughly
represent an element of space, should compare well with satellite (model)
pixels (grid boxes). This assumption implies that the field of interest must
possess a high degree of spatial homogeneity within the pixels (grid boxes),
which may not hold true for species with short atmospheric lifetimes or in
the proximity of plumes. As a result, this assumption often leads to the
perception of a nonphysical discrepancy between datasets that arises from
their different spatial scales, potentially making the comparisons prone to
overinterpretation. The semivariogram is a mathematical expression of
spatial variability in discrete data. Modeling the semivariogram behavior
permits optimal spatial linear prediction of a random process field using
kriging. Kriging can extract the spatial information (variance) pertaining
to a specific scale, which in turn translates pointwise data to a gridded
space with quantified uncertainty such that a grid-to-grid comparison can
be made. Here, using both theoretical and real-world experiments, we
demonstrate that this classical geostatistical approach can be well adapted
to evaluating model-predicted or satellite-derived atmospheric trace gases.
This study suggests that satellite validation procedures using the present
method must take kriging variance and satellite spatial response functions
into account. We present
the comparison of Ozone Monitoring Instrument (OMI) tropospheric

Most of the literature on validation of satellite trace gas retrievals or
atmospheric chemical transport models assumes that geophysical quantities
within a satellite pixel or a model grid box are spatially homogeneous.
Nevertheless, it has long been recognized that this assumption is often
violated; spatially coarse atmospheric models or satellites are often unable
to represent features or physical processes occurring at fine spatial
scales. Janjić et al. (2016) used the term

Numerous scientific studies have reported on this matter. Simulations of
short-lived atmospheric compounds such as nitrogen dioxide (NO

The spatial representation issue is not limited to models. Satellite trace gas retrievals optimize the concentration of trace gases and/or atmospheric states to best match the observed radiance using an optimizer coupled with an atmospheric radiative transfer model. This procedure requires various inputs such as surface albedo, cloud and aerosol optical properties, and trace gas profiles, all of which come with different scales and representation errors. Moreover, the radiative transfer model itself involves different levels of physical complexity. Many studies have reported that satellite-derived retrievals underrepresent spatial variability whenever the prognostic inputs used in the retrieval are spatially unresolved (e.g., Russell et al., 2011; Laughner et al., 2018; Souri et al., 2016; Goldberg et al., 2019; Zhao et al., 2020). Additionally, the large footprint of some sensors relative to the scale of spatial variability of species inevitably introduces some degree of representativity error (e.g., Souri et al., 2020b; Tang et al., 2021; Judd et al., 2020). For this reason, several validation studies resorted to downscaling their relatively coarse satellite observations using high-resolution chemical transport models so that they could compare them to spatially finer datasets such as in situ measurements (Kim et al., 2018; Choi et al., 2020). Nonetheless, their results largely depend on modeling experiments, which might themselves be biased.

The validation of satellites or atmospheric models is widely done against pointwise measurements. Mathematically, a point is an element of space; hence, it is not meaningful to associate a point with a spatial scale. Comparing a grid box to a point sample (i.e., comparing apples to oranges) assumes that the point is representative of the grid box. The fundamental question is therefore the following: can the average of the spatial distribution of the underlying compound be represented by a single value measured at a subgrid location? This question was addressed by Matheron (1963), who advocated the notion of the semivariogram, a mathematical description of spatial variability, which ultimately led to the development of kriging, the best linear unbiased estimator of a random field. A kriging model can estimate a geophysical quantity on a common grid. This capability alone is not unique; a simple interpolation method such as nearest neighbor serves the same purpose. The power of kriging lies in the fact that it takes data-driven spatial variability information into account and provides an error estimate for the interpolated map. This strength not only makes kriging superior to simpler interpolation methods but also, through its variance, reflects the level of confidence pertaining to the spatial heterogeneity dictated by both the data and the semivariogram model used (Chilès and Delfiner, 2009).
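To make the notion concrete, the experimental semivariogram of scattered samples is commonly estimated with Matheron's method-of-moments estimator, $\gamma(h) = \frac{1}{2N(h)}\sum_{(i,j) \in N(h)} [z(\mathbf{x}_i) - z(\mathbf{x}_j)]^2$, where $N(h)$ is the set of sample pairs whose separation falls in the lag class around $h$. The Python sketch below assumes an isotropic field and user-chosen distance bins; the function name and binning choice are illustrative and are not taken from the MATLAB functions used in this work.

```python
import numpy as np

def experimental_semivariogram(xy, z, bin_edges):
    """Isotropic method-of-moments (Matheron) semivariogram estimator.

    xy: (n, 2) sample coordinates; z: (n,) sample values;
    bin_edges: increasing lag-class boundaries.
    Returns mean lag per class, semivariance per class, and pair counts.
    """
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    # All unique pairs (i < j) and their separation distances
    i, j = np.triu_indices(len(z), k=1)
    dist = np.linalg.norm(xy[i] - xy[j], axis=1)
    sq_diff = (z[i] - z[j]) ** 2
    centers, gammas, counts = [], [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (dist >= lo) & (dist < hi)
        n_pairs = int(in_bin.sum())
        if n_pairs == 0:
            continue  # skip empty lag classes
        centers.append(dist[in_bin].mean())
        gammas.append(0.5 * sq_diff[in_bin].mean())
        counts.append(n_pairs)
    return np.array(centers), np.array(gammas), np.array(counts)
```

For a constant field (as in C1) every squared difference is zero, so the estimator returns a flat, zero semivariogram, whereas a ramp (as in C2) yields semivariances that grow with lag.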

Several studies have leveraged this classical geostatistical method to map the concentrations of different atmospheric compounds at very high spatial resolutions (Tadić et al., 2017; Li et al., 2019; Zhan et al., 2018; Wu et al., 2018). To the best of our knowledge, Swall and Foley (2009) is the only study that used kriging for chemical transport model validation, in their case with respect to surface ozone. They suggested that kriging estimation should be executed over grids rather than at discrete points. Because kriging uses a semivariogram model in continuous form, the estimation can be performed at any grid size, and optimizing this grid size (i.e., the domain discretization) is essential for extracting the maximum spatial information from the data. Another important caveat of Swall and Foley (2009) is that averaging discrete estimates (points) to build grids is not applicable to remote sensing data. Depending on the optics and the geometry, the spatial response function can range from an ideal box (simple average) to a more sophisticated shape such as a super Gaussian function (weighted average) (Sun et al., 2018). Moreover, the footprint of satellites is not spatially constant. We address these complications in this study using both theoretical and real-world experiments.
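The difference between a simple box average and a spatial-response-function (SRF) weighted average can be sketched as follows. We assume a separable two-dimensional super-Gaussian weight, $w \propto \exp[-(|\Delta x/w_x|^k + |\Delta y/w_y|^k)]$; this generic form and the function names are illustrative assumptions, and the actual OMI parameterization of Sun et al. (2018) may differ (for example, in how the widths vary across the swath).

```python
import numpy as np

def super_gaussian_srf(x, y, x0, y0, wx, wy, k):
    """Hypothetical separable 2-D super-Gaussian weight centred on (x0, y0).

    k = 2 gives a Gaussian; large k approaches an ideal box with half-widths wx, wy.
    """
    return np.exp(-(np.abs((x - x0) / wx) ** k + np.abs((y - y0) / wy) ** k))

def srf_weighted_pixel(field, std_err, x, y, x0, y0, wx, wy, k):
    """SRF-weighted pixel value of a gridded estimate and of its standard error."""
    w = super_gaussian_srf(x, y, x0, y0, wx, wy, k)
    w = w / w.sum()  # normalize so the discrete weights sum to one
    value = np.sum(w * field)
    # Weighting the standard error itself is equivalent to treating the errors
    # within the footprint as fully correlated (a conservative assumption).
    err = np.sum(w * std_err)
    return value, err
```

With a large exponent the weights are close to one inside the half-widths and vanish outside, so the result tends to the plain box average; with `k = 2` the centre of the footprint dominates.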

The paper is organized as follows. Section 2 reviews the concepts of the semivariogram and kriging in detail. We then present different theoretical cases, their uncertainty, and their sensitivities with respect to different tessellations, grid sizes, and numbers of samples. Section 3 proposes a framework for satellite (model) validation using sparse point measurements and elaborates on the representation error using idealized experiments. Section 4 introduces several real-world experiments.

The semivariogram is a mathematical representation of the degree of spatial
variability (or similarity) in a function describing a regionalized
geophysical quantity (

If a reasonable number of samples is present, one can describe

The kriging estimator predicts a value of interest over a defined domain
using a semivariogram model derived from samples (Chilès and Delfiner,
2009). The kriging model is defined as (Matheron, 1963)

The present section illustrates the application of ordinary kriging for
several numerical cases. Five idealized cases are simulated in a grid of

First column: five theoretical fields randomly sampled with 200 points (dots), namely, a constant field (C1), a ramp starting from zero in
the lower left to higher values in the upper right (C2), an intersection
with concentrated values in four corridors (C3), a Gaussian plume placed in
the center (C4), and multiple Gaussian plumes spread over the entire domain
(C5). Second column: the corresponding isotropic semivariograms computed
based on Eq. (2); the red line shows the stable Gaussian fitted to the
semivariogram based on the Levenberg–Marquardt method. Third column: the
kriging estimate at the same resolution of the truth (i.e.,

As for C1, the uniformity results in a flat (zero) semivariogram, and the
estimate is identical to the truth. This result reflects the unbiasedness of
ordinary kriging. C1 never occurs in reality; however, it is possible to
assume some degree of uniformity among data restricted to background
values; a typical example of this can be seen in the spatial distribution of
a number of trace gases in pristine environments such as

Concerning C2, the semivariogram shows a linear shape, meaning that data
points separated by larger distances exhibit larger differences. Geophysical
samples generally become uncorrelated at large distances; one therefore
expects the semivariogram to increase more slowly as the distance increases.
The steady increase in

C3 is an example of an extremely inhomogeneous field manifested in the
stabilized semivariogram at a value of

C4 closely resembles a point-source emitter under weak winds and turbulence. The semivariogram exhibits a bell shape: as the lag distance increases, the semivariance grows, stabilizes, and then sharply decreases. This is essentially because many pairs of low-valued points located far apart have negligible differences. This tendency is known as the hole effect, which arises when high values are systematically surrounded by low values (and vice versa). It is possible to mask this effect by fitting a semivariogram model that stabilizes at a certain sill (like the one in Fig. 1). Nonetheless, if the semivariogram shows periodic holes, the fitted model should be replaced by a periodic cosine model (Pyrcz and Deutsch, 2003).
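Fitting a bounded model to an experimental semivariogram can be sketched with SciPy's `curve_fit`, whose `method="lm"` option uses the Levenberg–Marquardt algorithm. We assume a stable (powered-exponential) model, $\gamma(h) = c_0 + c\,[1 - \exp(-(h/a)^s)]$ with $0 < s \le 2$ ($s = 2$ being the Gaussian model); this parameterization and the starting values are our assumptions and may differ from the stable Gaussian implementation in the MATLAB functions used here. A periodic (hole-effect) model would need a different functional form.

```python
import numpy as np
from scipy.optimize import curve_fit

def stable_model(h, nugget, psill, a, s):
    """Stable semivariogram: nugget + psill * (1 - exp(-(h/a)^s)); s = 2 is Gaussian."""
    # abs() keeps the range and exponent positive while the unbounded LM solver iterates
    return nugget + psill * (1.0 - np.exp(-(h / np.abs(a)) ** np.abs(s)))

def fit_stable(h, gamma, p0=None):
    """Least-squares fit of the stable model with Levenberg-Marquardt."""
    h = np.asarray(h, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    if p0 is None:
        # Crude starting values: no nugget, sill near the largest semivariance,
        # range near half the largest lag, intermediate exponent.
        p0 = [0.0, gamma.max(), 0.5 * h.max(), 1.5]
    params, _ = curve_fit(stable_model, h, gamma, p0=p0, method="lm", maxfev=20000)
    params[2:] = np.abs(params[2:])  # report range and exponent as positive numbers
    return params
```

Because Levenberg–Marquardt in `curve_fit` does not accept bounds, the fitted exponent is not forced to stay at or below 2 and should be checked before the model is used for kriging.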

The last case, C5, shows a less severe hole effect than the one observed in C4. This is due to the presence of more structured patterns in different parts of the domain. The range is roughly twice as large as in the previous case (C4), denoting that there is more information (variance) among the samples at larger distances. A number of experiments using this particular case are discussed in the following subsections.
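Given a fitted semivariogram, ordinary kriging estimates and variances can be sketched in a few lines. The Python function below solves the standard ordinary kriging system in semivariogram form, $\sum_j \lambda_j \gamma(\mathbf{x}_i - \mathbf{x}_j) + \mu = \gamma(\mathbf{x}_i - \mathbf{x}_0)$ with $\sum_j \lambda_j = 1$, and returns the kriging variance $\sigma^2_{\mathrm{OK}} = \sum_i \lambda_i \gamma(\mathbf{x}_i - \mathbf{x}_0) + \mu$. It is a dense, global-neighborhood illustration (every sample is used for every target), not the implementation used in this study, and the semivariogram passed in must return zero at zero lag.

```python
import numpy as np

def ordinary_kriging(xy, z, targets, gamma):
    """Ordinary kriging with a global neighborhood.

    xy: (n, 2) sample coordinates; z: (n,) sample values; targets: (m, 2)
    gamma: vectorized semivariogram function with gamma(0) == 0.
    Returns the estimates (m,) and kriging variances (m,).
    """
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n = len(z)
    # Left-hand side: sample-to-sample semivariances bordered by the unbiasedness constraint
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1))
    A[n, n] = 0.0
    # Right-hand side: sample-to-target semivariances plus the constraint row of ones
    b = np.ones((n + 1, len(targets)))
    b[:n, :] = gamma(np.linalg.norm(xy[:, None, :] - targets[None, :, :], axis=-1))
    sol = np.linalg.solve(A, b)
    lam, mu = sol[:n, :], sol[n, :]  # kriging weights and Lagrange multipliers
    estimate = lam.T @ z
    variance = np.sum(lam * b[:n, :], axis=0) + mu
    return estimate, variance
```

Because the weights sum to one, a constant field (C1) is reproduced exactly; at sample locations the estimate equals the data with zero variance, and far from all samples the variance rises to at least the sill.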

It is often essential to optimize the number of samples used for kriging.
To some extent, the kriging estimator recognizes its own ability to capture
the spatial variability through Eq. (11). Thus, if the target is spatially too complex and/or the samples are too limited, the estimator essentially
indicates that

First column: the multi-plume case (C5) randomly sampled with different numbers of samples (5, 25, 50, 100, and 500); second column: the corresponding isotropic semivariogram; third column: the kriging estimate; fourth column: the difference between the estimate and the truth; and fifth column: the kriging standard error.

A common application of kriging is to optimize the tessellation of data points for a fixed number of samples to achieve a desired precision. In practice, the objective of such optimization is very purpose-specific; for example, one might prefer a spatial model that best represents a particular plume within the domain. Various data selection methods exist (e.g., Rennen, 2008), but for simplicity, we focus on four categories: purely random, stratified random, a uniform grid, and an optimized tessellation. Figure 3 demonstrates the estimation of C5 using 25 samples chosen with each of these four procedures.
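The first three sampling strategies can be sketched on a regular grid as follows. The zoning used for stratified sampling here (quantile classes of a prior field) is our assumption for illustration and is not necessarily how the zones in Fig. 3 were derived; the uniform design also assumes that the number of samples is a perfect square.

```python
import numpy as np

def random_design(shape, n, rng):
    """Purely random: n distinct grid cells drawn uniformly."""
    idx = rng.choice(shape[0] * shape[1], size=n, replace=False)
    return np.column_stack(np.unravel_index(idx, shape))

def quantile_zones(prior, n_zones=4):
    """Hypothetical zoning: classify cells by quantiles of a prior field."""
    edges = np.quantile(prior, np.linspace(0, 1, n_zones + 1)[1:-1])
    return np.digitize(prior, edges)  # labels 0 .. n_zones - 1

def stratified_design(zones, n, rng):
    """Stratified random: split n as evenly as possible and sample within each zone."""
    labels = np.unique(zones)
    per_zone = np.full(len(labels), n // len(labels))
    per_zone[: n % len(labels)] += 1  # distribute the remainder
    picks = []
    for lab, k in zip(labels, per_zone):
        cells = np.flatnonzero(zones.ravel() == lab)
        picks.append(rng.choice(cells, size=k, replace=False))  # zone must hold >= k cells
    return np.column_stack(np.unravel_index(np.concatenate(picks), zones.shape))

def uniform_design(shape, n):
    """Uniform grid: sqrt(n) x sqrt(n) evenly spaced interior cells."""
    side = int(round(np.sqrt(n)))
    rows = np.linspace(0, shape[0] - 1, side + 2)[1:-1]
    cols = np.linspace(0, shape[1] - 1, side + 2)[1:-1]
    r, c = np.meshgrid(rows, cols, indexing="ij")
    return np.column_stack([r.ravel(), c.ravel()]).round().astype(int)
```

Each function returns (row, column) indices, which can be used directly to pick sample values from a gridded truth or prior field before computing the semivariogram and kriging.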

The multi-plume case (C5) sampled by four different sampling strategies using a constant number of samples (25). The sampling strategies include purely random (first row), stratified random (second row), uniform grids (third row), and an optimized tessellation proposed based on kriging (fourth row). Columns represent the truth, the isotropic semivariogram, the kriging estimate, the difference between the estimate and the truth, and the kriging standard error.

With purely random selection, the lack of samples over two minor plumes
causes the estimate to deviate substantially from the truth. While a random
selection may seem practical because it is independent of the underlying
spatial variability, it can suffer from undersampling and is thus
inefficient. As a remedy, it might be advantageous to partition the domain
into similar zones and randomly sample from each, which is commonly known as
stratified random selection. We classify the domain into four zones by
running the

As for the uniform grid, we notice that there are fewer points in the
semivariogram because many pairwise distances are repeated, which is
indicative of redundant (correlated) information. Nonetheless, if the
desired tessellation is neutral with regard to location, meaning that all
parts of the domain are of equal scientific interest, the uniform grid is
the optimal design for the prediction of

To execute the last experiment, we draw 25 random samples 1000 times and
select the optimal estimate as the one that minimizes the sum of
A lingering concern over these numerical experiments is that the truth is assumed to be known. The truth is never known; this means we may never know exactly how well or poorly the kriging estimator performs. However, it is rare to have no prior understanding or expectation of the truth at all. In that rare case, a uniform grid should intuitively be preferred to deliver local estimates of average values in uniform blocks. In contrast, if prior knowledge is available from previous site visits, model predictions, theoretical experiments, pseudo observations, or other relevant data, the tessellation needs to be optimized.
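A brute-force version of the optimized tessellation, drawing many random designs and keeping the one whose kriged map best reproduces a prior field, can be sketched as below. Several simplifications are our assumptions: sample values are taken from the prior field itself (pseudo observations), the semivariogram is held fixed rather than refitted for every draw, and the selection criterion is taken to be the sum of squared differences between the kriged field and the prior.

```python
import numpy as np

def ok_estimate(xy, z, targets, gamma):
    """Ordinary kriging estimate with a global neighborhood; gamma(0) must be 0."""
    n = len(z)
    A = np.ones((n + 1, n + 1))
    A[n, n] = 0.0
    A[:n, :n] = gamma(np.linalg.norm(xy[:, None] - xy[None], axis=-1))
    b = np.ones((n + 1, len(targets)))
    b[:n] = gamma(np.linalg.norm(xy[:, None] - targets[None], axis=-1))
    lam = np.linalg.solve(A, b)[:n]  # drop the Lagrange multiplier row
    return lam.T @ z

def optimize_tessellation(prior, n, trials, gamma, rng):
    """Random search for the n-sample design whose kriged map best matches `prior`."""
    ny, nx = prior.shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    grid = np.column_stack([yy.ravel(), xx.ravel()]).astype(float)
    truth = prior.ravel()
    best_idx, best_cost = None, np.inf
    for _ in range(trials):
        idx = rng.choice(ny * nx, size=n, replace=False)  # candidate design
        est = ok_estimate(grid[idx], truth[idx], grid, gamma)
        cost = np.sum((est - truth) ** 2)  # assumed criterion: sum of squared differences
        if cost < best_cost:
            best_idx, best_cost = idx, cost
    return grid[best_idx], best_cost
```

The search cost grows linearly with the number of trials; with a quantified prior error, the same loop could instead rank designs by a criterion that includes that error.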

It is important to recognize that the uncertainties associated with the prior knowledge directly affect the level of confidence in the final answer. Accordingly, the prior knowledge error should ultimately be propagated to the kriging variance. The prior error is often determined pragmatically. For example, if the goal is to design the locations of thermometer sites to capture surface temperature during heat waves using a yearly averaged map of surface temperature, it would be wise to assign a large error to this prior information to down-weight the proposed design, primarily because the averaged map underrepresents such atypical events. A possible extension of this example would be to use a weather forecast model with quantified errors that is capable of capturing past heat waves. Although a reasonable forecast in the past does not necessarily guarantee a reasonable one in the future, it is rational to assume that a tessellation designed with the weather model forecast carries lower uncertainty than one designed with the averaged map.

A general roadmap for the data tessellation design is shown in Fig. 4. As
proven in Chilès and Delfiner (2009), if the field is purely isotropic, the
uniform grid is the most sensible choice when prior information on the
spatial variability is lacking. When prior knowledge with quantified errors
is available, an optimum tessellation can be achieved by running a large
number of kriging models with suitable
A schematic illustrating a framework for optimum sampling (tessellation) strategy. The prior knowledge refers to any data capable of describing our quantity of interest, including site visits, theoretical models, satellite observations, and emissions.

A kriging model can estimate a geophysical quantity at a desired location
considering the data-driven spatial variability information. Since the
kriging model is in continuous form, the desired locations can be anywhere
within the field of

Figure 5a depicts an experiment comparing the estimates of C2 at different
grid sizes with the truth. The departure of the estimate from the truth is
rather negligible for several coarse grids (e.g.,

Finding an optimum grid size for kriging.

The complexity of directly using the range for choosing the optimal grid
size arises from the fact that the level of spatial homogeneity can vary
within the domain. In fact, the range is derived from a semivariogram model
representing a crude estimate of varying ranges occurring at various scales.
It is intuitively clear that depending on the degree of heterogeneity, which
is spatiotemporally variable, the grid size needs to be adaptively adjusted
(Bryan, 1999). For the sake of simplicity, but at a higher computational
cost, we adopt a numerical solution: we first estimate on a coarse grid and
then on progressively finer ones until the difference with respect to the
previous grid size across all pixels reaches an acceptable value (

To minimize the complications of different spatial scales between two
gridded data, we first need to upscale the finer-resolution data to match
the coarse ones. In the case of numerical chemical transport or weather
forecast models, the size of the grid box is definitive. Likewise, a
satellite footprint, mainly dictated by the sensor design, the geometry, and
signal-to-noise requirements (Platt et al., 2021), is known. However, the
grid size of the kriging estimation is a variable subject to optimization,
as discussed previously.
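For an ideal box response, upscaling a fine kriging grid to a coarser footprint reduces to block averaging, and a box kernel of a given size can also be applied as a moving window that keeps the original grid. A minimal Python sketch, assuming the coarse cells align exactly with integer multiples of the fine cells (real satellite footprints generally do not); the function names are illustrative:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def box_upscale(field, factor):
    """Average non-overlapping factor x factor blocks (ideal box SRF, aligned grids)."""
    ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be divisible by the upscaling factor")
    return field.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))

def box_smooth(field, size):
    """Moving-window box convolution on the fine grid (edges use nearest values)."""
    return uniform_filter(field, size=size, mode="nearest")
```

Block averaging changes the grid spacing to the target footprint, while the moving-window form is convenient for comparing several kernel sizes on one common grid.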
When we compare the grid size of the kriging estimate to that of a satellite
(or a model), three situations arise: first, the kriging spatial resolution
is coarser than the satellite, a condition occurring when either the field
is homogeneous or the field is undersampled. In situations where the field
is homogeneous (

To demonstrate the upscaling procedure, we use C5opt (

First row: C5Opt outputs convolved with an ideal box kernel with
different sizes (

We further directly compare

Illustrating the problem of spatial scale: comparisons of the kriging estimates at seven different spatial scales with the samples used for the C5opt estimation. The perceived discrepancies are purely due to the spatial representativeness.

To elaborate on the problem of scale, we design an idealized experiment in
which pseudo satellite observations are validated against pseudo point
measurements. The pseudo satellite observations are created by
upscaling the C5 truth

Yet, the comparison misses an important point: the kriging estimate is
treated as error-free. We attempt to incorporate the kriging variance
through a Monte Carlo linear regression method. Here, the goal is to find an
optimal linear fit

Figure 9 summarizes the general roadmap for satellite (and model)
validations against point measurements. Fitting a semivariogram model with
at least two parameters requires a minimum of three samples. Therefore, it
is infeasible to derive the spatial information from the point data where
sampling is extremely sparse (

The proposed roadmap for transforming pointwise measurements to gridded data in satellite (model) validation.

We begin with focusing on tropospheric

First column: the spatial distribution of TROPOMI tropospheric

The preceding TROPOMI data enabled us to optimize a tessellation of
ground-based point spectrometers over Houston. Our goal here is to propose
an optimized network for winter 2021 given our knowledge of the spatial
distribution of

Figure 11 shows the optimized tessellation given 5, 10, 15, and 20 spectrometers over Houston. The Houston plume is better represented as more samples are used. All cases share the same feature: the optimized samples are clustered near or within the plume. This tendency is intuitive, because spectrometers should be placed where a substantial gradient (variance) in the field is expected. The difference between the kriging estimate and the TROPOMI observations using 20 samples is not substantially different from that using 15 samples. Therefore, to keep the cost low, a preferable strategy is to keep the number of spectrometers as low as possible while achieving a reasonable accuracy. Based on these results, the optimized tessellation using 15 samples is preferred because it achieves roughly the same accuracy as the one with 20 samples.
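The cost-accuracy trade-off described above can be phrased as a simple rule: choose the smallest network whose error is within a chosen tolerance of the best achievable error. The rule, the 10 % tolerance, and the error values in the usage line below are hypothetical illustrations and are not the values behind Fig. 11.

```python
def smallest_adequate_network(n_values, errors, tolerance=0.1):
    """Return the smallest network size whose error is within `tolerance`
    (as a fraction) of the best error among all candidate sizes."""
    best = min(errors)
    for n, err in sorted(zip(n_values, errors)):
        if err <= (1.0 + tolerance) * best:
            return n
    return None  # not reached for non-empty, non-negative errors

# Hypothetical errors (e.g., RMSE of the kriged map against a prior map)
print(smallest_adequate_network([5, 10, 15, 20], [0.90, 0.50, 0.31, 0.30]))  # -> 15
```

Tightening the tolerance to zero always returns the size with the lowest error, so the tolerance directly encodes how much accuracy one is willing to trade for fewer instruments.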

Finding an optimum sample tessellation for wintertime over Houston given a different number of spectrometers (5, 10, 15, and 20).

In order to understand ozone pollution (e.g., Mazzuca et al., 2016; Pan et
al., 2017b, 2015), characterize anthropogenic emissions (Souri et
al., 2016, 2018), and validate satellite data (Choi et al., 2020), an
intensive air quality campaign was carried out in September 2013 over Houston
(DISCOVER-AQ). The campaign included a network of Pandora spectrometer
instruments (PSI) (11 stations) measuring total

The spatial distribution of OMI tropospheric

We then follow the validation framework shown in Figure 9 in which the
number of point measurements and the level of heterogeneity are the main
factors in deciding if we should directly compare them to the satellite
pixels. Figure 13 shows the monthly-averaged PSI measurements along with the semivariogram and resulting kriging estimate at an optimized resolution
(

The Pandora tropospheric

Convolving both kriging estimates and errors with the OMI spatial response function formulated in Sun et al. (2018). The differences against the pre-convolved fields are also depicted.

We ultimately conduct two sets of comparisons: directly comparing PSI to
OMI pixels and comparing convolved kriged PSI to OMI. It is worth noting
that PSI measurements are monthly-averaged; similarly, OMI data are
oversampled on a monthly basis. For PSI, we only consider grid boxes whose
kriging error is below

Increased attention needs to be paid to spatial representativity in the validation of satellites (models) against pointwise measurements. A point is an element of space, whereas satellite (model) pixels (grid boxes) are, at best, the product of the integration of infinitesimal points weighted by a normalized spatial response function. If the spatial response function is assumed to be an ideal box, the resulting grid box represents the average. There is no theoretical justification for expecting the average of a population to match a single sample exactly, unless all samples are identical (i.e., a spatially homogeneous field). This fact is often overlooked in the atmospheric science community. At a conceptual level, we are required to translate pointwise data to a grid format (i.e., rasterization). This can be done by modeling the spatial autocorrelation (or semivariogram) extracted from the spatial variance (information) among measured sample points. Assuming that the underlying field is a random function with an unknown mean, the best linear unbiased predictions of the field can be obtained by kriging using the modeled semivariograms.
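The pixel-as-integral statement can be written compactly. In our own notation (not the paper's), for a normalized spatial response function $w$ centred on a pixel location $\mathbf{x}_0$:

$$\bar{z}(\mathbf{x}_0) = \int w(\mathbf{x} - \mathbf{x}_0)\, z(\mathbf{x})\, \mathrm{d}\mathbf{x}, \qquad \int w(\mathbf{x})\, \mathrm{d}\mathbf{x} = 1 .$$

For an ideal box over a pixel of area $|V|$, $w = 1/|V|$ inside $V$ and zero elsewhere, so $\bar{z}$ reduces to the plain average $z_V$. For an intrinsic random function with semivariogram $\gamma$, the expected squared difference between a point value at $\mathbf{x}_i \in V$ and this box average is the standard extension variance of geostatistics:

$$\mathrm{E}\big[(z(\mathbf{x}_i) - z_V)^2\big] = 2\,\bar{\gamma}(\mathbf{x}_i, V) - \bar{\gamma}(V, V),$$

where $\bar{\gamma}(\mathbf{x}_i, V)$ is the mean of $\gamma$ between $\mathbf{x}_i$ and all points of $V$, and $\bar{\gamma}(V, V)$ is the mean of $\gamma$ over all pairs of points within $V$. This expected mismatch vanishes when $\gamma$ is zero at all lags within the pixel, i.e., for a homogeneous field, and otherwise grows with the spatial variability resolved by the semivariogram.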

In this study, we discussed methods for the kriging estimation of several
idealized cases. Several key tendencies were observed in these experiments:
first, the range corresponded to the degree of spatial heterogeneity; a
larger range indicated less heterogeneity. Second, the kriging variance,
which reflects the density of information, quickly increased from zero to
large values when the field exhibited large spatial heterogeneity. This
tendency mandates increasing the number of samples (observations) for
those cases. Third, while the semivariogram models were constructed from
discrete pairs of samples, they are mathematically in continuous form. For
this reason, we determined the optimal spatial resolution of the kriging
estimate by incrementally making the grids finer until a desired precision
(

The present study applied kriging to achieve an optimum tessellation given a certain number of samples such that the difference between our prior knowledge of the field (articulated by previous observations, models, or theory) and the estimate is minimal. The prior knowledge is usually uncertain, and this uncertainty should be propagated to the final estimates. The optimum tessellation for a range of idealized and real-world data consistently favored placing more samples in areas with significant gradients in the measurements, such as those close to point emitters.

This study also revisited the spatial representativity issue, which limits
the realistic determination of biases associated with satellites (models).
In one experiment, we convolved the kriging estimate for a multi-plume field
with box filters of various sizes. The perfect agreement (

We further validated monthly-averaged Ozone Monitoring Instrument (OMI)
tropospheric

Pointwise measurements are the central component of satellite and model
validation. Our experiments led to a clear roadmap explaining how to
transform these pointwise datasets to a spatial scale comparable to
satellite (model) footprints. It is no longer necessary to ignore the
problem of scale. The validation against point measurements can be
carefully conducted in the following steps:

construct the experimental semivariogram if the number of point measurements allows (usually

drop the quantitative assessment if the number of point measurements is insufficient to derive the spatial variance and the prior knowledge suggests a high likelihood of spatial heterogeneity within the field;

choose an appropriate function to model the semivariogram;

estimate the field with kriging (or any other spatial estimator capable of digesting the semivariogram) and calculate the variance;

find the optimum grid resolution of the estimate;

convolve the kriging estimate and its variance with the satellite (model) spatial response function (which is sensor-specific);

conduct the direct comparison of the convolved kriged output and the satellite (model), considering their errors through a Monte Carlo (or weighted least-squares) method.
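The final step can be sketched as a Monte Carlo regression: both the convolved kriging estimates and the satellite values are perturbed within their errors, an ordinary least-squares line is fitted to each realization, and the spread of slopes and intercepts summarizes the uncertainty. The function name and the independent Gaussian error model are our assumptions. Ordinary least squares on noisy x-values attenuates the slope, so an errors-in-variables fit (e.g., York or Deming regression) could be substituted in the loop.

```python
import numpy as np

def monte_carlo_fit(x, y, sx, sy, n_draws=1000, seed=0):
    """Monte Carlo OLS fit of y (satellite) against x (convolved kriged data).

    sx and sy are 1-sigma errors (scalars or per-point arrays), assumed Gaussian
    and independent. Returns (slope mean, slope std), (intercept mean, intercept std).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes = np.empty(n_draws)
    intercepts = np.empty(n_draws)
    for k in range(n_draws):
        xp = x + rng.normal(0.0, sx, size=x.shape)  # perturb kriged values
        yp = y + rng.normal(0.0, sy, size=y.shape)  # perturb satellite values
        slopes[k], intercepts[k] = np.polyfit(xp, yp, 1)  # degree-1 fit: slope, intercept
    return (slopes.mean(), slopes.std(ddof=1)), (intercepts.mean(), intercepts.std(ddof=1))
```

Passing the convolved kriging standard errors as `sx` lets grid boxes with poorly constrained kriging estimates contribute more scatter, which widens the resulting slope and intercept distributions.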

Recent advances in satellite trace gas retrievals and atmospheric models have helped extend our understanding of atmospheric chemistry, but an important task before us in improving our knowledge of atmospheric composition is to embrace the notion of the semivariogram (or spatial autocorrelation) when validating satellites/models using pointwise measurements, so that we can make more robust quantitative use of the data and models.

The analyses presented in this work used the MATLAB functions of Schwanghart (2021a, b).

Tropospheric NO

The supplement related to this article is available online at:

AHS designed the research, executed the experiments, analyzed the data, made all figures, and wrote the paper. KS implemented the oversampling method, provided the spatial response functions, and oversampled TROPOMI data. KC, XL, and MSJ helped with the conceptualization of the study and the interpretation of the results. All authors contributed to discussions and edited the paper.

The contact author has declared that neither they nor their co-authors have any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Amir Souri and Matthew Johnson were funded for this work through NASA's Aura Science Team (grant no. 80NSSC21K1333). Kang Sun acknowledges support from NASA's Atmospheric Composition Modeling and Analysis (ACMAP) program (grant no. 80NSSC19K09). We thank many scientists whose concerns motivated us to tackle the presented problem. In particular, we thank Chris Chan Miller, Ron Cohen, Jeffrey Geddes, Gonzalo González Abad, Christian Hogrefe, Lukas Valin, and Huiqun (Helen) Wang.

This research has been supported by the National Aeronautics and Space Administration (grant nos. 80NSSC21K1333 and 80NSSC19K09).

This paper was edited by Can Li and reviewed by three anonymous referees.