Forward model emulator for atmospheric radiative transfer using Gaussian processes and cross validation
Otto Lamminpää
Jouni Susiluoto
Jonathan Hobbs
James McDuffie
Amy Braverman
Houman Owhadi
Remote sensing of atmospheric carbon dioxide (CO2) carried out by NASA's Orbiting Carbon Observatory-2 (OCO-2) satellite mission and the related uncertainty quantification effort involve repeated evaluations of a state-of-the-art atmospheric physics model. The retrieval, or solving an inverse problem, requires substantial computational resources. In this work, we propose and implement a statistical emulator to speed up the computations in the OCO-2 physics model. Our approach is based on Gaussian process (GP) regression, leveraging recent research on kernel flows and cross validation to efficiently learn the kernel function in the GP. We demonstrate our method by replicating the behavior of the OCO-2 forward model within measurement error precision and further show that in simulated cases, our method reproduces the CO2 retrieval performance of the OCO-2 setup with computational time that is orders of magnitude faster. The underlying emulation problem is challenging because it is high-dimensional. It is related to operator learning in the sense that the function to be approximated maps high-dimensional vectors to high-dimensional vectors. Our proposed approach is not only fast but also highly accurate (its relative error is less than 1 %). In contrast with artificial neural network (ANN)-based methods, it is interpretable, and its efficiency is based on learning a kernel in an engineered and expressive family of kernels.
© 2025 California Institute of Technology. Government sponsorship acknowledged.
Climate change, one of the most significant global environmental challenges, is primarily attributed to anthropogenic carbon emissions, which have accelerated the increase of carbon dioxide (CO2) in the atmosphere, posing a threat to Earth's future. The industrial revolution marked the onset of increased CO2 emissions due to the extensive use of fossil fuels in various industries, such as transportation, manufacturing, and agriculture. The Intergovernmental Panel on Climate Change underscores CO2's potent effect on planetary warming due to significant radiative forcing (IPCC, 2023). The atmospheric concentration of this trace gas is increasing at an ever faster rate, and as of May 2023, the measured CO2 at Mauna Loa station was 424.0 ppm, a 3 ppm increase from a year before (421.0 ppm in May 2022). Although the global terrestrial biosphere and oceans each take up about 25 % of these emissions (Friedlingstein et al., 2022), this balance may not be sustainable, which might lead to unpredictable feedbacks in the carbon cycle and the global climate system. These couplings between the Earth's climate system and the carbon cycle can introduce significant uncertainty in future climate change projections (Friedlingstein et al., 2014), which further renders mitigation efforts increasingly challenging.
For reliable climate modeling and future scenario prediction, it is crucial to estimate carbon flux accurately (e.g., CarbonTracker, Peters et al., 2007), which involves quantifying both the sources and natural sinks of carbon. However, current in situ measurement networks are primarily deployed in the northern midlatitudes, leaving areas like the tropics underrepresented (Schimel et al., 2015). This lack of extensive coverage results in large uncertainties in flux estimates, underscoring the need for a more comprehensive global measurement network.
To provide a significant increase in coverage and resolution to the ground-based data set, global estimates of total column-averaged mole-fraction CO2 (the average amount of CO2 over a vertical column of air at a specific ground pixel/location), denoted XCO2, are collected using satellite-borne spectrometers. These instruments include the Japanese Greenhouse gases Observing SATellite (GOSAT, Kuze et al., 2009), operational since January 2009; the follow-on GOSAT-2 (Imasu et al., 2023) launched in October 2018; the Orbiting Carbon Observatory-2 from NASA (OCO-2, Crisp et al., 2012), launched in July 2014; the OCO-3 instrument (Eldering et al., 2019) taken to the International Space Station in May 2019; and the Chinese TanSat (Ran and Li, 2019) and TanSat 2 (Wu et al., 2023). Planned future missions include the Geostationary Carbon Cycle Observatory (GeoCarb, Moore et al., 2018), the European CO2 Monitoring Mission (CO2M, Sierk et al., 2019), and the Global Observing Satellite for Greenhouse gases and Water cycle (GOSAT-GW, Kasahara et al., 2020). In this work, we focus exclusively on OCO-2, which, like all the abovementioned missions, measures solar radiance at the top of the atmosphere, reflected by Earth's surface and attenuated by atmospheric scattering and absorption by trace gases and aerosols. From these observed radiances, the OCO-2 mission uses a framework called optimal estimation (OE, Rodgers, 2004) to solve the related Bayesian inverse problem (see, e.g., Kaipio and Somersalo, 2005), referred to as a retrieval. OE is an iterative algorithm, returning an estimate of the posterior mean and covariance as a Gaussian approximation to the nonlinear retrieval problem. Operationally, the retrieval problem is solved using the Atmospheric Carbon Observations from Space (ACOS) software (O'Dell et al., 2018), which implements OE using a state-of-the-art atmospheric full-physics (FP) model. Processing OCO-2 measurements with the ACOS algorithm is a computationally intensive task, and currently about one-third of prescreened clear soundings are used in a low-latency data processing stream (O'Dell et al., 2018). As the data record grows, computational speed is also a major hindrance for retrospective processing of the full collection of cloud-free soundings for the current and any future improved algorithms. Thus, computational efficiency is a limiting factor in releasing the improved data to the user community. These issues are certain to get even worse with upcoming wider-swath missions like CO2M and GOSAT-GW, as evidenced by another greenhouse gas imaging mission, the TROPOspheric Monitoring Instrument (TROPOMI, Veefkind et al., 2012), which regularly reprocesses its data record, more than 20 times larger than that of OCO-2.
As with all inverse problems, some approximations and assumptions have to be made in the ACOS algorithm. The resulting XCO2 estimates have to be validated and bias-corrected using ground-based measurements from the Total Carbon Column Observing Network (TCCON, Wunch et al., 2017) and the COllaborative Carbon Column Observing Network (COCCON, Frey et al., 2019) as a reference. These sites are concentrated on the northern midlatitudes, and as a result of this coverage issue and the imperfections in the FP model, significant systematic errors persist in the data set. (See, e.g., Kiel et al., 2019, for the effect of systematic errors and Cressie, 2018, for an overview of statistical treatment of and issues in the retrieval.) Considerable effort has been exerted to tackle the high accuracy (less than 0.3 parts per million (ppm) in scenes with background levels of around 410 ppm) and high precision (standard errors less than 0.5 ppm) requirements of ingesting OCO-2 into flux inversion, which is the primary application of the data product (Gurney et al., 2002; Patra et al., 2007; Liu et al., 2017; Palmer et al., 2019; Crowell et al., 2019; Peiro et al., 2022; Byrne et al., 2023). Recent advancements in applying Markov chain Monte Carlo (MCMC, Brynjarsdóttir et al., 2018; Lamminpää et al., 2019) for non-Gaussian posterior characterization and simulation-based uncertainty quantification (Braverman et al., 2021) for capturing the overall uncertainty in the retrieval pipeline have been successfully deployed for addressing persisting retrieval errors. These methods, although comprehensive, suffer equally from computational speed issues as they require an extensive number of FP evaluations.
Computational speed issues in OE retrievals have been addressed in several ways. Neural network (NN)-based machine learning approaches (David et al., 2021; Mishra and Molinaro, 2021; Bréon et al., 2022) have been applied to a combination of real-world radiance data and model atmospheres (outputs of computational atmospheric models, like the Copernicus Atmospheric Monitoring Services (CAMS) model; Chevallier et al., 2010). The OCO-2 forward model itself was sped up by using a surrogate model (Hobbs et al., 2017) that only partially considered the physical processes present in the FP model and more recently by using a Gaussian process (GP) emulator (Ma et al., 2019) for replicating the output of the FP model. In this paper, we will take a similar approach using GPs but with several improvements and an application to solving the retrieval problem with the help of closed-form Jacobians required in the gradient-based algorithm. GPs are a well-suited technique for forward model emulation, since they can be used with less data (130 K, David et al., 2021, for NN vs. 20 K for GP; see Sect. 4) and trained more quickly than NN-based approaches. Additionally, a GP provides uncertainty estimates and closed-form Jacobians trivially, which are not straightforward to extract from a NN. For training efficiency, our approach will leverage recent novel techniques for GP parameter learning called kernel flows (Owhadi and Yoo, 2019) and training data generation via evaluating the FP model using the Reusable Framework for Atmospheric Composition (ReFRACtor) (McDuffie et al., 2020). We will demonstrate the accuracy of forward model emulation against a holdout test set of FP evaluations and further demonstrate the ability of our emulator to replicate the OE retrieval performance of the ReFRACtor FP model in a fraction of the computational time. Our approach achieves a remarkably low prediction error, less than 1 % (“within measurement error limits”), which is an excellent result in the field of more general operator learning. Strategies for learning more complicated operators, like the FP model in our case, often involve a NN-based architecture (Lu et al., 2021; Li et al., 2024). Our approach follows the example set by Batlle et al. (2023), which shows that kernel methods are competitive in operator learning.
The rest of the paper is organized as follows. Section 2 will describe in detail the GP regression, kernel learning, and the resulting forward model emulator. Section 3 will further elaborate on the details of the OCO-2 retrieval algorithm, the state vector, and the FP model describing atmospheric radiative transfer. Section 4 will detail the emulator implementation of the ReFRACtor FP model and assess its performance. Section 5 will show results of our emulator used in a simulated XCO2 retrieval context, and finally Sect. 6 will provide concluding remarks and ideas on future work and applications.
Gaussian process (GP) regression (Rasmussen and Williams, 2006) (also called kriging in the spatial context: Cressie, 1993; Stein, 1999) is a well-studied methodology for approximating any continuous function to an arbitrary accuracy, leveraging training data and a kernel function prescribed a priori. In addition, once trained, the GP model can be used to obtain fast and accurate predictions of a computationally demanding physics model, to estimate prediction uncertainty, and to compute closed-form derivatives and Jacobians for the prediction. Physical constraints like positive parameter values can be accounted for in training data design (which we address in later sections) so that predictions happen within the support of the training data set. This is to say that our training data set will cover the expected minimum and maximum values of each parameter in the state vector. For example, surface albedo is physically restricted between 0 (no reflected light) and 1 (full reflection). If our training data are sufficiently well spread covering this range, we can make good predictions essentially by interpolation within the physically feasible interval. Potential departure from this support can be detected by large prediction uncertainty values, as the prediction uncertainty of a GP gets large by design if the input location does not have training points near it. In this section, we outline the basic theory of GP regression and describe our approach to modeling the continuous function between atmospheric state vectors x and radiances y observed by the OCO-2 instrument. We also provide background on maximum likelihood estimation for fitting GP models and present a novel root mean square error (RMSE) cross-validation extension for the kernel flow (Owhadi and Yoo, 2019) approach, which we employ for the rest of this work.
2.1 Gaussian process regression
To construct an emulator for the forward model F(x), we employ Gaussian process (GP) regression to predict a label at a new state x*∈𝒳. A GP is defined by a kernel function (defined explicitly later) k: 𝒳×𝒳→ℝ, where in the cases studied in this work 𝒳=ℝm. We denote by Γ[X,X] the matrix of all kernel function evaluations over the training data X of N points, with entries Γ[X,X]ij = k(xi,xj), where xi and xj are the ith and jth training data points, respectively. Furthermore, Γ[x*,X] denotes the vector of kernel evaluations of state x* against all training points X. Using the training data X together with the vector of corresponding labels z∈ℝN, a GP prediction of the label (or function value) at a new state x* is given by

ẑ(x*) = Γ[x*,X] (Γ[X,X] + σ𝕀)^(-1) z,     (1)

where we have assumed without loss of generality that the training data are centered, and thus the GP has a zero mean. We add that the term (Γ[X,X] + σ𝕀)^(-1) z does not depend on the new input state, so it can be precomputed. This makes the predictions take minimal computational time by avoiding repeatedly inverting the potentially large matrix Γ[X,X] + σ𝕀.
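The prediction in Eq. (1) is simple to implement once the kernel is fixed. The following minimal Julia sketch, with an illustrative generic kernel function k and centered labels, shows the precomputation of the weight vector and the resulting fast prediction; names and signatures are ours and not part of the operational code.

```julia
using LinearAlgebra

# Precompute (Γ[X, X] + σ𝕀)⁻¹ z once; X is a vector of training inputs, z the labels.
function precompute_weights(k, X, z, σ)
    N = length(X)
    Γ = [k(X[i], X[j]) for i in 1:N, j in 1:N]   # Γ[X, X]
    return (Γ + σ * I) \ z
end

# Evaluate Eq. (1) at a new state xstar using the precomputed weights w.
function gp_predict(k, X, w, xstar)
    γ = [k(xstar, xi) for xi in X]               # Γ[x*, X]
    return dot(γ, w)
end
```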
In GP literature, the variance term σ𝕀 is usually taken to be the measurement error or local-scale unexplained variability in the training labels z. However, since we are interested in reproducing the outputs of a computer code, the “measurements” are exact, and hence there is no measurement error. It was shown in Owhadi and Yoo (2019) that learning the parameters of GP models from noiseless data can lead to unstable predictive models and numerical singularities. For this reason, we treat σ as a regularization parameter, which captures the empirical mismatch between the model and the actual data, and optimize it together with other kernel parameters.
In addition to point predictions, a GP prediction can be associated with a prediction uncertainty (the posterior variance of the GP), given by

Var[ẑ(x*)] = Γ[x*,x*] − Γ[x*,X] (Γ[X,X] + σ𝕀)^(-1) Γ[X,x*].     (2)
The ability to include prediction uncertainties sets GP regression apart from many modern NN-based machine learning methods, which only provide a point estimate as a prediction. A large prediction variance can be an indication of departure from the support of the training data set, indicating that the GP is likely to lose its prediction skill. Additionally, uncertainty from the predictions can be propagated forward and accounted for in further applications of GP-based emulators.
The GP formulas presented here rely on conditional Gaussian distributions and thus have a similar structure to that of optimal interpolation (OI; e.g., p. 157 of Kalnay, 2002) and related iterative optimal estimation (OE) algorithms (e.g., Eq. 15), both of which are widely used methods in atmospheric remote sensing. OI and OE use Gaussian assumptions to derive a mean and covariance for the posterior distribution as a solution to an inverse problem, i.e., data assimilation or a retrieval. In Gaussian process regression, the target function (here, the forward model) is represented similarly by a Gaussian distribution that has a mean (prediction) and covariance (error estimate of the prediction).
Our interest will be in replicating the results of a gradient-based optimization problem. Hence, in addition to fast evaluations of F(x), we would also benefit from fast derivatives obtained from closed-form expressions. Combining Eqs. (1) and (4), we get

∂ẑ(x*)/∂x* = (∂Γ[x*,X]/∂x*) (Γ[X,X] + σ𝕀)^(-1) z,     (3)

which describes taking the derivative of Eq. (1) with respect to x*.
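A corresponding sketch of Eq. (3) is given below; it assumes that a closed-form kernel gradient ∇k(x*, xi) with respect to the first argument is available (cf. Appendix A) and reuses the precomputed weight vector from the prediction sketch.

```julia
# Jacobian of the GP prediction (Eq. 3): stack the kernel gradients as columns
# of ∂Γ[x*, X]/∂x* and multiply by the precomputed weights w.
function gp_jacobian(∇k, X, w, xstar)
    dΓ = reduce(hcat, [∇k(xstar, xi) for xi in X])   # m × N matrix
    return dΓ * w                                     # ∂ẑ(x*)/∂x* ∈ ℝᵐ
end
```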
While other machine learning methods, such as artificial neural networks and multilayer perceptrons, can in principle be differentiated, computing the derivatives of a large architecture is computationally more demanding than evaluating Eq. (3), which motivates our use of GP regression. Other similar approaches (e.g., radial basis function networks) can be shown to be universal approximators as well and could be used in place of GPs. As will be shown, our approach yields a fast and accurate predictor that is intuitive and relatively easy to implement, so comparison against other machine learning methods will not be pursued further in the scope of this work.
2.2 Kernel function
A crucial modeling choice in GP regression is specification of a kernel function. This task involves either expert knowledge of the domain structure or some iterative trial-and-error search. In our application, we have empirically observed that a kernel function consisting of the sum of Matérn and linear kernels yields excellent predictive performance. This is likely due to a locally near-linear behavior commonly assumed with the OCO-2 forward model being captured by the linear kernel, together with a largely flexible Matérn term that is known to capture a large variety of nonlinear effects. The Matérn kernel is a more expressive choice of kernel compared to the usual Gaussian/radial basis functions used by default in Gaussian process regression, which tend to be “too smooth” to capture more abrupt changes in the function that is being approximated. Furthermore, such a kernel can also be differentiated in closed form. The kernel function used throughout this work is given by
where d is the 𝒲-weighted distance between the inputs, 𝒲=diag(w) is a diagonal matrix, w∈ℝm is a vector of weights, l∈ℝ is a length scale parameter, and α1 and α2 are positive weights that are restricted to sum to 1.
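As an illustration, a kernel of this type could be implemented as follows; the Matérn smoothness order (5/2 below) and the exact weighting of the linear term are assumptions made for the sketch, since only the general Matérn-plus-linear structure is prescribed above.

```julia
using LinearAlgebra

# Sum of a Matérn term and a linear term with a shared diagonal weighting 𝒲 = diag(w);
# α1 + α2 = 1 and l is the length scale.
function kernel(x1, x2, w, l, α1, α2)
    W = Diagonal(w)
    d = sqrt(dot(x1 - x2, W * (x1 - x2))) / l                      # weighted distance
    matern = (1 + sqrt(5) * d + 5 * d^2 / 3) * exp(-sqrt(5) * d)   # Matérn 5/2 (assumed order)
    linear = dot(x1, W * x2)                                       # weighted linear term
    return α1 * matern + α2 * linear
end
```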
To compute Jacobians, we need an expression for the derivative of the kernel function in Eq. (3). This can be computed in closed form from Eq. (4) using known matrix identities. The derivation of a closed-form expression can be found in Appendix A.
2.3 Parameter learning
Prediction quality of GP regression depends on identifying the hyperparameters θ that best fit the training data. In our case, following the form of our kernel function, we have θ = {w, l, α1, α2, σ}. Hyperparameters are commonly learned via optimization, using maximum likelihood estimation (MLE, Rasmussen and Williams, 2006). This amounts to minimizing

ℒ(θ) = zᵀ Γθ^(-1) z + log det Γθ,     (5)
where Γθ = Γ[X,X] + σ𝕀 is evaluated at parameter values θ. Although this method is usually robust and performs well, GP applications with high-dimensional inputs and a large amount of training data are known to be challenging due to inverse matrix and log-determinant calculations. Numerous approaches have been suggested to tackle this problem (e.g., local approximations, Vecchia, 1988; Datta et al., 2016). Inspired by the kernel flow approach (Owhadi and Yoo, 2019), where kernel parameters are learned by minimizing a relative reproducing kernel Hilbert space (RKHS) norm, we propose a cross-validation RMSE-based method to be used in this work. Intuitively, the RKHS norm is a way to measure the smoothness of a function approximation achieved with the kernel method. While smooth methods generally yield discretization-invariant predictors, we propose to directly minimize the prediction error instead. The intuition of this approach is to iteratively select small mini-batches of the training data set and individually leave points out one by one while using the rest of the mini-batch to predict the left-out values (via Eq. 1). Learning the kernel parameters that minimize this prediction error leads to globally good predictions given new inputs. This approach leverages the known screening effect associated with Matérn kernels, where the effects of faraway points on prediction accuracy diminish, and only close-by points are necessary for prediction accuracy. The same intuition is the basis of numerous nearest-neighbor GP methods (e.g., Vecchia, 1988). The upside of our approach is the ability to select small mini-batches on each training iteration, allowing for faster computations while avoiding the expensive log-determinant calculations and inversions of the large covariance matrices required in MLE. We will later show that our proposed method converges reliably and yields excellent predictions.
We start by selecting a mini-batch Xbatch and zbatch of size Nbatch by randomly sampling from the training data. We define a leave-one-out (LOO) cross-validation loss function with respect to the L2 error (also known as RMSE) by first considering taking out one data point from the training data and using the rest to predict it. This can be achieved by modifying the GP prediction formula from Eq. (1) and leaving out the ith data point. This is achieved via a rank-one downdate that removes the effect of the ith data point from the inverse covariance matrix. (See Stewart, 1998, and Zhu et al., 2022, for details.) The modified LOO prediction formula is then given by
where Γθ denotes the Nbatch×Nbatch covariance over the mini-batch evaluated at parameter values θ. Here, the notation [⋅]:,i means all rows of the ith column. We then define the final loss function by using Eq. (6) to predict zi (the ith training label removed from the mini-batch) as
where P is a subset of p≤Nbatch indices denoting elements of the mini-batch selected for prediction, which can be chosen as, for example, the entire mini-batch or the p nearest neighbors of the center point of the mini-batch. The regularization term, with error norm ‖θ−θ0‖, some penalty magnitude ϵ, and mean θ0, is included to ensure that kernel amplitude parameter values do not grow uncontrollably. This is done since we have observed empirically that letting non-identifiable parameters grow during optimization can lead to the optimizer getting “stuck”, whereas this problem is not observed when regularizing the loss function. One may, for example, set θ0 to be a vector of 1's.
We can now optimize the kernel parameters iteratively by repeatedly selecting mini-batches and updating θ along the gradient of ρ(θ), which is obtained by automatic differentiation using Julia's Zygote package (Innes, 2019). We note that closed-form kernel derivatives could be used here as well, but since automatic differentiation with the mini-batch sizes we use takes negligible computational time, we will not pursue this idea further in this work. We note that as the mini-batch is selected at random, this method can be viewed as stochastic gradient descent. For this reason, we use the adaptive moment estimation (ADAM, Kingma and Ba, 2017) optimizer to find the optimal value. Use of a momentum-based optimizer is further recommended in this application as we have observed that the cost function often has several local minima. The optimization procedure is summarized in Algorithm 1. The final parameter value can be selected to be the one corresponding to the smallest loss function value achieved during training.
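To make the procedure concrete, the following sketch evaluates the mini-batch LOO loss described above for a kernel k(x, x′, θ); the rank-one downdate is replaced by an explicit refit for readability, and the quadratic form of the penalty is an assumption. In the actual training, the gradient of this loss is obtained with Zygote and the parameters are updated with ADAM, as described above.

```julia
using LinearAlgebra

# Leave-one-out RMSE over the prediction indices P of a mini-batch (Xb, zb),
# plus a penalty keeping the kernel parameters θ close to θ0.
function loo_loss(k, θ, Xb, zb, σ, P, θ0, ϵ)
    err = 0.0
    for i in P
        keep = setdiff(1:length(Xb), i)                    # leave the ith point out
        Γ = [k(Xb[a], Xb[b], θ) for a in keep, b in keep]
        γ = [k(Xb[i], Xb[a], θ) for a in keep]
        zhat = dot(γ, (Γ + σ * I) \ zb[keep])              # LOO prediction of the ith label
        err += (zb[i] - zhat)^2
    end
    return sqrt(err / length(P)) + ϵ * sum(abs2, θ .- θ0)  # RMSE plus regularization
end
```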
2.4 Training data generation
As we aim to reproduce the performance of a function represented as computer code, we take advantage of the freedom to use a space-filling design for x in ℝm for training data creation. We first span the unit cube [0,1]m with a Sobol' sequence (Sobol, 1967; Press et al., 1992) of N points. In practice we employ Julia's Sobol.jl (Johnson, 2020) package for this step. Then, using information about the minimum and maximum physically feasible value of each input dimension, we scale the unit cube to span the whole state space. During research, we tested other methods like random sampling and Latin-hypercube-based methods, which turned out to leave “holes” in the training data set, resulting in uneven predictive performance over the input space. Sobol' sequences, meanwhile, span the entire input space more evenly; they are a space-filling design akin to Latin hypercubes and, in our experiments, yielded the best training data sets. We further evaluate the computational model F(x) at each training point, obtaining states X and the corresponding model outputs Y.
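A sketch of this design step is given below; the bounds lo and hi are placeholders for the physically feasible minima and maxima of each input dimension.

```julia
using Sobol

# N quasi-random points from a Sobol' sequence on [0, 1]^m, scaled to the state space.
function sobol_design(m, N, lo, hi)
    s = SobolSeq(m)
    return [lo .+ next!(s) .* (hi .- lo) for _ in 1:N]
end
```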
In this section, we describe OCO-2 and the related measurements, physics model, state vector, and retrieval algorithm. Further information on these topics can be found in, for example, Connor et al. (2008), O'Dell et al. (2012), Crisp et al. (2012), O'Dell et al. (2018), and in the Algorithm Theoretical Basis Document (ATBD) (Boesch et al., 2015).
3.1 The OCO-2 instrument
OCO-2 is a NASA-operated satellite mission dedicated to providing data products of global atmospheric carbon dioxide concentrations (Crisp et al., 2004). The satellite is pointed towards Earth as it measures solar light reflected by Earth's surface and atmosphere, recorded as radiances. The OCO-2 instrument itself is composed of three spectrometers that measure light reflected from Earth's surface in the infrared part of the spectrum in three separate wavelength bands. These bands are centered around 0.765, 1.61, and 2.06 µm and are called the O2 A-band (O2), the weak CO2 band (WCO2), and the strong CO2 band (SCO2), respectively. Each observation consists of 1016 radiances on separate wavelengths from each band (for more information, see, e.g., Crisp et al., 2017; Rosenberg et al., 2017). These measurements are then used to infer a state vector containing information on atmospheric properties like CO2 concentration on 20 pressure levels, surface pressure, temperature, and aerosol optical depth (AOD). The state vector also includes surface properties like albedo and solar-induced chlorophyll fluorescence (SIF). The primary scalar quantity of interest is the column-averaged CO2 concentration (XCO2).
3.2 Atmospheric radiative transfer
A key part of inferring XCO2 from observed radiances is the construction of a computational atmospheric radiative transfer model which describes how solar radiation is propagated, reflected, and scattered by Earth's surface and atmosphere. Together with an instrument model, this computer code is known as the full-physics (FP) model, referred to in this work as

y = F(x, b),     (8)
where y is the output of the FP model (a wavelength-by-wavelength radiance), x is a state vector containing atmospheric and surface information, and b denotes model parameters held fixed during data processing. A thorough description of the FP model is given in the ATBD (Boesch et al., 2015). To motivate our emulation approach, we will here describe parts of the forward model physics, which is not intended to be a full description of the included physics. Rather, we leverage this information to better design our emulator.
Part of the radiance comes from absorption of radiation by atmospheric molecules, given by
where λ is wavelength, the jth wavelength corresponds to the jth entry of the radiance y, f0(λ) is the solar flux at the top of the atmosphere, the surface reflectance enters as a multiplicative factor, g(λ) is an integral over radiation path length that sums over the contributions of all modeled absorbers, θ and φ are the observation zenith and azimuth angles, and θ0 and φ0 are the corresponding solar zenith and azimuth angles. Observation and solar angles have a significant effect on the observed and modeled radiances, which will be important later in this work.
After calculating the absorption with Eq. (9), equations further describing atmospheric scattering are employed to solve for atmospheric radiative transfer (RT), which describes the total effect of the atmosphere and surface on the scattered photons. The FP framework further includes an instrument model, which describes the effects of the observing system on the top-of-the-atmosphere radiances. These effects include instrument Doppler shift, spectral dispersion, and convolution with the instrument line shape (ILS) function, reducing the resolution from the finer RT grid to the coarser observational grid. On an abstracted level, this corresponds mathematically to
where C1(λ) and C2(λ) denote the instrument effects other than convolution that can be expressed as multiplication and addition. Generally speaking, the instrument effects depend on different physical properties that can vary between detector arrays, while the RT portion of the forward model is constant within the instrument. This observation motivates us to focus on emulating the outputs of the RT, referred to as monochromatic radiances, after which instrument functions can be applied appropriately after the fact. Looking forward to operational integration of our emulator, this will reduce the complexity of the emulated system and arguably make our task easier.
3.3 OCO-2 state vector
The state vector elements comprising x for the FP model are summarized in Table 1. Notably, we have divided the table into two parts. The upper half lists the previously mentioned atmospheric and surface state vector elements that affect the RT part only, and the rest having to do with the instrument effects are in the lower half. This collection includes scaling factors for empirical orthogonal functions (EOFs) that capture unmodeled offsets in the observed radiances (O'Dell et al., 2018).
In addition to state vector elements, the FP model is parameterized by a set of parameters that are held fixed based on auxiliary information, such as laboratory measurements or meteorological data sets. These parameters include instrument calibration details, spectroscopy properties for absorbing gases, land elevation, and aerosol microphysical parameters. These aerosol parameters arise from the selection of two dominant aerosol types as a function of space and time. All aerosol types have different optical properties. This choice is determined a priori by global maps based on meteorological knowledge and measurements (see Fig. 1). The possible dominant aerosol types are dust (DU), sulfate (SO), sea salt (SS), organic carbon (OC), and black carbon (BC). While constructing the emulator, we will consider data sets with a fixed pair of dominant aerosol species in order to decouple their physical effects from the rest of state vector. Separate emulators can then be constructed for each pair of aerosol species, and a selection of which one to use can be done by matching the measurement location with the appropriate types.
3.4 ReFRACtor
This work develops a proof-of-concept version of OCO-2 forward model emulator for a simulated case. For this reason and ease of access, we implement our simulations using the Reusable Framework for Atmospheric Composition (ReFRACtor, McDuffie et al., 2020). ReFRACtor is an extensible multi-instrument atmospheric composition retrieval framework that supports and facilitates combined use of radiance measurements from different instruments in the ultraviolet, visible, near-infrared, and thermal-infrared. It has been open-source since 2014 when it was first developed as the Level-2 processing code for OCO-2. Since 2017 the development team has worked to create a more general framework that supports more instruments and spectral regions. This framework has been developed to provide the broader Earth science community a freely licensed software package that uses robust software engineering practices with well-tested, community-accepted algorithms and techniques. ReFRACtor is geared not only towards the creation of end-to-end production science data systems, but also towards scientists who need a software package to help investigate specific Earth science atmospheric composition questions. Although ReFRACtor includes an implementation of a version of the OCO-2 production algorithm, the two have drifted since the initial intercomparison work was done. At that time it was validated against the B9.2.00 version of the software. For the most part mainly bug fixes have been kept in sync between the two versions. Additionally the core radiative transfer algorithms are the same, which justifies the use of ReFRACtor for constructing our emulator at this stage. Some minor additional algorithmic features made their way into the ReFRACtor version of OCO-2 from the production version. For the most part the major discrepancy will be due to changes in configuration values not implemented in ReFRACtor. These include values such as a priori and covariance versions, EOF data sets, ABSCO versions, and the solar model.
3.5 Retrieval algorithm
Inferring XCO2 from measured radiances is an ill-posed inverse problem, which is referred to as performing a retrieval. The relationship between measurement and state is first modeled as

y = F(x) + ε,     (11)

where the data y∈ℝn are a radiance vector, the unknown x∈ℝm is the state vector, F: ℝm→ℝn is the OCO-2 FP model, and ε∈ℝn is the measurement uncertainty. For completeness, we summarize the operational retrieval algorithm used in OCO-2 processing. The retrieval proceeds with solving the inverse problem by using a Bayesian formulation, in which the additive error ε and the prior for x are assumed to be Gaussian such that

ε ∼ 𝒩(0, Sε),  x ∼ 𝒩(xa, Sa).     (12)
The measurement error covariance matrix Sε is assumed to be diagonal, with elements for each wavelength j given by
where k1 and k2 are calibration parameters adjusted by the instrument calibration team. The a priori covariance is taken to be diagonal for non-CO2 parameters, and the CO2 profile is assumed to have a correlation structure shown in Fig. 2, which promotes continuous concentration profiles and limits the variability higher up in the atmosphere.
The retrieval is operationally carried out using iterative gradient-based methods to solve for the maximum a posteriori estimate, which is equivalent to minimizing the cost function

c(x) = (y − F(x))ᵀ Sε^(-1) (y − F(x)) + (x − xa)ᵀ Sa^(-1) (x − xa).     (14)
This optimization problem is solved using the Levenberg–Marquardt algorithm, in which at iteration i the state is updated according to

xi+1 = xi + [(1 + γ) Sa^(-1) + Kiᵀ Sε^(-1) Ki]^(-1) [Kiᵀ Sε^(-1) (y − F(xi)) − Sa^(-1) (xi − xa)],     (15)
where γ is a damping parameter and Ki is the Jacobian of F(x) at iteration i. After each iteration, before updating the state, the effect of forward model nonlinearity is assessed by computing the quantity

R = (ci − ci+1) / (ci − cFC),     (16)

where ci is the value of the cost function (14) at iteration i, ci+1 is similarly the value at iteration i+1, and cFC is the cost function value assuming a linear forward model update, i.e., F(xi+1) ≈ F(xi) + Ki(xi+1 − xi). Based on the value of R, one of the following is executed (see the sketch after this list):
- R ≤ 0.0001: γ is increased by a factor of 10. The state is not updated.
- 0.0001 < R < 0.25: γ is increased by a factor of 10, and the state is updated to xi+1.
- 0.25 < R < 0.75: γ is kept unchanged, and the state is updated to xi+1.
- R > 0.75: γ is decreased by a factor of 2, and the state is updated to xi+1.
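The branching above can be summarized by a short sketch; lm_step is a placeholder for the Levenberg–Marquardt update of Eq. (15) evaluated at the current damping value, and the behavior at the exact thresholds follows the list.

```julia
# Update the damping parameter γ and (possibly) the state based on the ratio R of Eq. (16).
function damping_update(R, γ, x, lm_step)
    if R <= 1e-4
        return 10γ, x              # increase damping, reject the step
    elseif R < 0.25
        return 10γ, lm_step(γ)     # increase damping, accept the step
    elseif R <= 0.75
        return γ, lm_step(γ)       # keep damping, accept the step
    else
        return γ / 2, lm_step(γ)   # decrease damping, accept the step
    end
end
```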
After each nondivergent step, convergence is assessed by computing the error variance derivative (see Boesch et al., 2015, for details). The operational retrieval further provides an estimate for the posterior covariance as a Laplace approximation:

Ŝ = (Kᵀ Sε^(-1) K + Sa^(-1))^(-1).     (17)
This is done together with the so-called averaging kernel:

A = Ŝ Kᵀ Sε^(-1) K,     (18)
which can be interpreted as the sensitivity of the retrieved state to the true atmospheric state x. These quantities are important for downstream users of OCO data products, which highlights the value of producing closed-form Jacobians during data processing.
In this section, we will describe the practical implementation of our method laid out in Sect. 2 applied to the OCO-2 retrieval problem in Sect. 3. This includes data transformations and dimension reduction, training data generation, convergence of the optimizer in kernel parameter learning, and assessment of forward model output quality. We stress that in order to be implemented in an operational retrieval algorithm, the emulator is required to perform with superior accuracy. We ensure accurate performance by making sure that the error in predicted radiances, compared to FP outputs, is less than the radiance measurement error standard deviation. This way, any systematic errors in emulation will be masked by measurement noise, and retrieval performance using emulation will closely resemble that of using the FP model.
4.1 Data transformations
As GPs tend to perform worse with increasing input dimension and because the standard GP formulation is developed for one-dimensional outputs, we will need to reduce the dimension of both the atmospheric state and the radiance. For the atmospheric state x, we leverage the fact that OCO-2 measurements are made at three separate wavelength bands, which leads to the state vector having band-specific elements, which can be ignored when dealing with other bands. This partition has been summarized in Table 1. Earlier work by Ma et al. (2019) considered cross-band correlations while emulating OCO-2 radiances, but the authors finally showed that the bands are distant enough from one another in wavelength space that they can be treated independently. With this insight, we proceed by constructing separate GPs for each band and using only the sensitive dimensions of x as inputs. We further notice that the 20-element CO2 profile is continuous and can be expressed as loadings obtained using principal component analysis (PCA). The most straightforward way to do this is by truncated singular value decomposition (SVD) of the empirical covariance matrix of state vectors (Tukiainen et al., 2016). To accomplish this, we use a simulation distribution derived by Braverman et al. (2021) for one selected set of realistic geophysical conditions as a basis for our experiments and perform SVD on the covariance matrix of this distribution. Analysis of singular value decay suggests that the CO2 profile can be represented with just four principal components, which we collect to a matrix Px as the four leading singular vectors. We then project the CO2 profile to a principal component space and further standardize the states by using the mean and variance of the simulation distribution, leading to
where μp is the CO2 profile mean, μx and σx are the state mean and state standard deviation, diag(A,B) denotes a block diagonal matrix with blocks A and B, 𝕀c is a c-dimensional identity matrix (as the profile is represented by 4 dimensions instead of the original 20), and [μp,0c] is a stacked vector of the CO2 profile mean and a c-dimensional zero vector.
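A minimal sketch of this transformation is given below; the ordering of the state vector elements and the exact standardization are illustrative assumptions consistent with the description above.

```julia
# Project the 20-level CO2 profile onto the leading singular vectors Px and
# standardize the transformed state (cf. Eq. 19); μp, μx, and σx are the means and
# standard deviations derived from the simulation distribution.
function transform_state(x, Px, μp, μx, σx)
    co2  = x[1:20]                          # CO2 profile block of the state vector
    rest = x[21:end]                        # remaining state vector elements
    pc   = Px' * (co2 .- μp)                # principal-component loadings of the profile
    return (vcat(pc, rest) .- μx) ./ σx     # standardized GP input
end
```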
Next, we generate training data using a Sobol' sequence (see Sect. 2.4). For this study, we can omit the dispersion, EOF, and SIF parts of the state vector (see Table 1) and fix them to the prior mean. This follows from the discussion in Sect. 3 focusing on monochromatic radiances. Omitting dispersion simplifies computations, as the wavelength grid would otherwise shift, making SVD for radiance dimension reduction hard. Ma et al. (2019) solved this problem by employing functional principal component analysis, while we can proceed with ordinary SVD. The empirical orthogonal functions (EOFs) are included in the operational retrieval to reduce fit residuals and therefore make convergence analysis easier. These have no direct impact on our study and can be safely omitted. Furthermore, the SIF parameters are fit on the O2 band only as part of the instrument effects, and we do not include them in the emulation for this reason. As is evident from Eq. (9), the measurement geometry has a significant impact on the output of the FP model. For this reason we include three extra parameters, θ, θ0, and φ−φ0, in our training data vector. Sufficient and realistic limits to these parameters are obtained from the simulation distribution of Braverman et al. (2021) by considering a 4σ interval around the mean values. In all, the input space now consists of the profile PCs, the other included state vector elements, and the measurement geometry. We create a Sobol' sequence of 20 000 points for training and scale all dimensions of the hypercube to [−4, 4], corresponding to 4 standard deviations on the normalized basis. We further obtain the training data set in the original space by reversing the transformation (19).
Training data Y (radiances) are obtained by evaluating the FP model on each x from the scaled Sobol' sequence. For this work, we choose a single realistic land nadir measurement to represent physical parameters not included in state vector x. We perturb sampling geometry to reflect relevant solar and instrument angles. For a real-world application this approach can be extended to include different scenes and other location-dependent parameters. To obtain the labels z, we similarly perform truncated SVD on the radiances Y separately on each wavelength band and collect the leading nB singular vectors in matrices PB. The four leading singular vectors for each band are included in Fig. 4 to present the kinds of features the most significant principal components, or basis vectors, encode. While this decomposition could hold additional information on physical processes behind the radiative transfer model, we do not pursue such an analysis further in this work. With additional standardization of the variables, we obtain the following transformations for each wavelength band B:
where μB is the radiance mean, and μz and σz are the principal component mean and standard deviation for band B.
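A sketch of this per-band decomposition, assuming the training radiances of one band are stored as rows of a matrix Y, is given below.

```julia
using LinearAlgebra, Statistics

# Truncated SVD of the centered radiances and standardization of the loadings (cf. Eq. 20).
function radiance_labels(Y, nB)
    μB = vec(mean(Y, dims = 1))
    Yc = Y .- μB'                                # centered radiances
    PB = svd(Yc).V[:, 1:nB]                      # leading nB radiance basis vectors
    Z  = Yc * PB                                 # principal-component loadings
    μz = vec(mean(Z, dims = 1)); σz = vec(std(Z, dims = 1))
    return PB, μB, μz, σz, (Z .- μz') ./ σz'     # standardized training labels
end
```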
The quality of this approximation is assessed by plotting the reconstruction over a holdout data set not used in computing the SVD. We illustrate in the upper panel of Fig. 5 the distribution of relative reconstruction error from this data set. We have further applied the instrument function to each residual and further divided them by the measurement error standard deviation given by Eq. (13). This metric is justified by the rationale that if the reconstruction error is less than or comparable to measurement error on the radiances, no significant amount of information is lost.
The final emulator g(x) can now be summarized in Fig. 6, where the GP predictions are given by Eq. (1) and the indices i, j, and k run through the number of principal components included in a given band. The effect of this choice will be examined further later in this work. The state is first normalized according to Eq. (19) for each band B, after which the GP-predicted principal component loadings are assembled back to radiances using the relation in Eq. (20). When evaluating the emulator, each index is independent and can thus be computed in parallel.
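Putting the pieces together, the per-band emulator can be sketched as follows, reusing the prediction, state transformation, and radiance decomposition sketches above; gps is assumed to be a collection of per-loading GP models with fields k, X, and w.

```julia
# Emulate the monochromatic radiance of one band: normalize the state, predict each
# principal-component loading with its own GP, and map the loadings back to a radiance.
function emulate_band(x, gps, PB, μB, μz, σz, Px, μp, μx, σx)
    xn   = transform_state(x, Px, μp, μx, σx)
    zhat = [gp_predict(g.k, g.X, g.w, xn) for g in gps]   # one GP per loading (parallelizable)
    return PB * (zhat .* σz .+ μz) .+ μB                   # reconstructed radiance
end
```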
4.2 Training
Having obtained training data X and z, we can now proceed to optimize the kernel parameters as described in Sect. 2.3. We prescribe an individual GP per output parameter zi. We have N = 20 000 for the training data size, and we set M = 100 for the mini-batch size, set p = 5 for the number of prediction points per mini-batch, and run the ADAM optimizer for 5000 iterations with a small learning rate – in our case 0.02. We initialize all other parameters at 1, except for the linear component weight, which is initialized at 0, and the nugget, which is initialized at a small value. As outlined in Sect. 3.3, we further reduce the dimension of the input space by selecting only the indices that a given wavelength band is sensitive to, given in Table 1.
For testing the performance of the algorithm, we draw a random sample Xtest from the same simulation distribution from Braverman et al. (2021) as independent test data, which is then used to evaluate the FP model to create radiances Ytest. For the test data, we fix dispersion, EOFs, and SIF at prior values as before. Example behavior of the loss function together with the evolution of the kernel parameter values and true vs. predicted z values is shown in Fig. 7. We see that during training, the loss function values converge to a small value close to 0. The fluctuations of the loss function values are due to the ADAM optimizer effectively being a stochastic gradient descent method: each mini-batch is sampled randomly, so individual steps can result in higher loss values even while the running average decreases. We can also see the evolution of the different kernel parameters: the weight of the linear component, for example, was initially set very close to 0. During training, the algorithm correctly identifies the significance of the linear term, and the relative importance of this term overtakes the nugget (corresponding to random noise). The resulting predictions of principal component loadings can be seen to land very tightly on the one-to-one line, indicating good performance over the whole test data set. The distribution of true vs. predicted z values for each component on each wavelength band is further illustrated in Fig. 8. We conclude that the predicted values correspond to the true values most of the time. Some principal components show a larger spread in prediction errors (e.g., O2 PC 9 or WCO2 PC 9). These principal components are likely redundant and do not contain meaningful information about the radiance, since the total predictive performance still remains very accurate.
4.3 Predictive performance
Finally, we assemble the predicted z values back to radiances and compute the relative differences with the test data, shown in the upper panel of Fig. 9. On the lower panels, as before, we apply the instrument function to these residuals and divide by the measurement error standard deviation to underline that the desired performance would be to make less prediction error than measurement error.
After constructing the emulator that produces radiances as outputs, we can further apply Eq. (3) to compute the Jacobians. We can then reverse the normalizing transformations on both the inputs and the outputs and further apply the instrument functions to our Jacobians to get back to the operational observation units. The Jacobians obtained by evaluating both the FP model and the emulator on an example state vector, together with the resulting profile averaging kernels, are shown in Fig. 10. Accurate averaging kernels are important for downstream usage of the retrieved XCO2, as they are used, for example, in flux inversion models to obtain the vertical sensitivity of XCO2 to modeled atmospheric CO2 profiles. We note that we have normalized the Jacobians and averaging kernels by the maximum values of each row in the matrix for visual clarity. Although not perfectly similar, we conclude that these two outputs share significant similarity. The differences in the averaging kernels mainly result from the choice of modeling concentration profiles by principal component loadings.
As noted in previous work by Ma et al. (2019), an emulator provides substantial appeal in terms of computational efficiency. For the current work, the average computational times for model evaluation and Jacobians are summarized in Table 2 on a 2023 MacBook Pro. It is worth mentioning that ReFRACtor computes Jacobians via automatic differentiation, while our emulator does this analytically. Three cases are contrasted: the standard ReFRACtor FP evaluation, the emulator for monochromatic radiances plus ILS, and the emulator alone.
4.4 Faster research version
In recent years, the uncertainty quantification and statistics community has benefited enormously by utilizing the surrogate model by Hobbs et al. (2017) to explore the OCO-2 retrieval in numerous applications (Brynjarsdóttir et al., 2018; Lamminpää et al., 2019; Nguyen and Hobbs, 2020; Hobbs et al., 2021; Patil et al., 2022). We remark that for similar purposes, our emulator can be used as an even faster surrogate. As we see from Table 2, the majority of the computational cost for the emulator comes from the instrument effects, which are part of the ReFRACtor software. If one is not interested in including the effects of dispersion, SIF, and EOFs during the retrieval, we notice from Eq. (10) that instrument corrections to RT amount to multiplication, addition, and convolution, which is associative with respect to multiplication. We can then write the emulator as
where g(x) is the overall emulator, ILS(⋅) is a function applying the instrument corrections from Eq. (10), PB is a projection matrix consisting of the radiance basis functions that correspond to transforming predicted labels z back to radiances y following the last step in Fig. 6, and η(x) is the emulator predicting labels z from inputs x. Done this way, we can evaluate the instrument corrections on the basis vectors once, after which OE or MCMC can proceed an order of magnitude faster (according to Table 2).
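The idea can be sketched as follows; apply_ils stands in for the linear instrument operations (convolution with the ILS and the multiplicative correction), so applying it once per basis vector and once to the band mean removes the instrument model from the per-evaluation cost. Any purely additive instrument term would be folded into the transformed mean only.

```julia
# Precompute instrument-corrected basis vectors and mean so that each emulator call
# only requires the cheap GP predictions of the loadings zhat.
function precompute_ils_basis(apply_ils, PB, μB)
    PB_ils = reduce(hcat, [apply_ils(PB[:, j]) for j in 1:size(PB, 2)])
    return PB_ils, apply_ils(μB)
end

fast_radiance(zhat, PB_ils, μB_ils, μz, σz) = PB_ils * (zhat .* σz .+ μz) .+ μB_ils
```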
We are now ready to compare the performance of the emulator against the FP model when performing simulated retrievals. After obtaining the minimizer and a Laplace approximation of the posterior covariance, the quantity of interest is further given by multiplying the retrieved CO2 profile by the pressure weighting function h, which puts an appropriate weight on each pressure level, resulting in

XCO2 = hᵀ x̂CO2.
The reported uncertainty coming with the quantity of interest (QoI) is given by

σXCO2 = (hᵀ ŜCO2 h)^(1/2),

where ŜCO2 is the CO2 profile block of the posterior covariance.
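Both quantities are inexpensive to evaluate; a minimal sketch, with h the pressure weighting function and S_co2 the CO2 profile block of the posterior covariance, is given below.

```julia
using LinearAlgebra

xco2(co2_profile, h) = dot(h, co2_profile)         # pressure-weighted column average
xco2_sigma(S_co2, h) = sqrt(dot(h, S_co2 * h))     # reported XCO2 uncertainty
```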
We present two test cases for assessing retrieval performance of our emulator. First, we create synthetic observations by evaluating the FP model on our test set of states x and by adding a realization from the Gaussian noise distribution:
where ε ∼ 𝒩(0, Sε). Second, we follow the methods outlined in Braverman et al. (2021) to further corrupt the simulated measurement by a realistic model discrepancy (MD) adjustment, given by
The shape of this adjustment is illustrated in Fig. 11. As noted by the authors, model discrepancy as presented here is a statistical representation of forward modeling mismatches, so that our simulated measurements better correspond to real data.
We then perform XCO2 retrievals both using the full-physics model F(x) and the emulator g(x), following the algorithm laid out in Sect. 3. Results for the retrieved XCO2 for both cases, with and without MD, are illustrated in Fig. 12. The corresponding XCO2 uncertainty values are compared in Fig. 13. We conclude that using the emulator in place of the FP model in the retrieval preserves the accuracy and replicates the same biases as the FP model, and the two sets of retrievals correlate well with each other. On the other hand, the output uncertainty estimates do not seem to correspond to each other, and further analysis of this output will be required in future research work.
5.1 Effect of PCA dimensionality
Previously in this work we did not prescribe a certain number of principal components to use in radiance dimension reduction. Figure 14 illustrates the retrieved XCO2 root mean square error (RMSE) and mean absolute error (MAE) against the true known value, together with Fig. 15, illustrating radiance reconstruction and prediction RMSE and MAE similarly to Figs. 5 and 9, all as a function of the number of PCs used. We can collectively deduce that using more than 25 principal components per band does not yield any additional performance benefits. We remark that compared to the earlier work by Ma et al. (2019), who argued for one to three principal components per band, our results show that many more components are needed for accurate retrievals. This highlights the importance of empirically checking the effect of dimensionality reduction and not relying on rules of thumb such as conserving 95 % of the variability.
5.2 Effect of aerosol types
To assess the effect of changing the dominant aerosol types on the performance of the retrievals, we repeat the training and retrieval procedure described in this section with two separate pairs of dominant aerosol types. Firstly, we consider dust (DU) and sea salt (SS) and, secondly, DU and sulfate (SO). These are among the most common aerosol combinations encountered in the OCO-2 operations. We repeat the retrievals for both cases with additional MD adjustment as before. Results of this experiment are summarized in Fig. 16. We conclude that the proposed method is robust in changing physical conditions, which indicates fitness for further operational integration.
In this work, we have constructed and implemented a fast and accurate forward model emulator for the ReFRACtor implementation of the OCO-2 full-physics forward model. The emulator produces closed-form Jacobians and, as such, provides a convenient way of performing XCO2 retrievals. We have demonstrated the accuracy of these retrievals and analyzed the effect of PCA dimension, aerosol types, and model discrepancy on the retrieval. All these tests indicate robustness and excellent reliability of our method and offer an encouraging proof of concept for future operational implementation with the latest ACOS algorithm and real-world OCO-2 data.
This work has significantly advanced the kernel flow methodology (Owhadi and Yoo, 2019) by including a cross-validation-based training strategy using an RMSE cost function and a new strategy for mini-batching. With this method, we have achieved a relative error of less than 1 %, which on its own is a significant improvement from the point of view of operator learning (e.g., see Batlle et al., 2023, for comparisons of GP and NN methods on various nonlinear problems). Our approach is computationally fast and, when the training data set is properly engineered, performs consistently within the span of the training data. Combined with our ability to compute Jacobians in closed form, our approach holds potential to solve current and future data processing issues in atmospheric remote sensing stemming from computationally intensive forward models.
While Gaussian process methods offer an attractive means of including uncertainty propagation in emulation pipeline, our tests have shown that the predicted posterior standard deviation given by GPs was not adequate in providing reliable coverage of true labels after prediction. This is likely due to the kernel flow method's focus on optimizing the posterior mean prediction without assessing the prediction uncertainty. This could easily be remedied by including an uncertainty tuning penalty in the kernel flow loss function. Another disclaimer comes from evaluations of retrieval uncertainty in XCO2: our method did not agree with operational OE. This does not mean our estimates were better or worse, and further research is needed in calibrating retrieval uncertainties.
A logical next step would be to implement the GP emulator for an operational ACOS forward model instead of ReFRACtor, which requires closer collaboration with the OCO algorithm team. After demonstration on OCO-2, our approach is directly applicable to a myriad of other satellite missions. We note that future work will have to deal with the training data design, which was simplified in this work. Assessing different temporal and spatial variability in forward model parameters together with feasible distributions of state vectors will be key in this design effort. These efforts might benefit from including a cost–benefit analysis on training a global model usable everywhere versus, for example, retraining the emulator for sufficiently specified spatiotemporal data sets.
To obtain a closed-form equation for the Jacobians used in the XCO2 retrievals, we must explicitly compute the term ∂Γ[x*,X]/∂x* in Eq. (3). To accomplish this, we compute the partial derivative of the kernel function (Eq. 4) with respect to the first input:
After computing this partial derivative, we obtain ∂Γ[x*,X]/∂x* element by element, with x being the new input and x′ a training data point. The final Jacobian is then obtained via Eq. (3), followed by reversing the transformations in Eqs. (19) and (20).
Code and data are available on an OSF repository at https://doi.org/10.17605/OSF.IO/U2T8A (Lamminpää, 2024). The software requires ReFRACtor and ReFRACtorUQ GitHub repositories, which are freely available as well.
OL: project administration, conceptualization, methodology, software, formal analysis, data curation, writing (original draft and review and editing), visualization. JS: conceptualization, methodology, software, writing (original draft and review and editing). JH: project administration, conceptualization, methodology, supervision, software, data curation, writing (review and editing). JM: methodology, software, data curation, writing (original draft). AB: conceptualization, methodology, writing (review and editing). HO: conceptualization, methodology, writing (review and editing).
The contact author has declared that none of the authors has any competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.
The research described in this paper was performed at the Jet Propulsion Laboratory, California Institute of Technology, under contract with NASA. The authors thank Pulong Ma and Chris O'Dell for helpful guidance.
This paper was edited by Peer Nowack and reviewed by two anonymous referees.
Batlle, P., Darcy, M., Hosseini, B., and Owhadi, H.: Kernel Methods are Competitive for Operator Learning, arXiv [preprint], https://doi.org/10.48550/arXiv.2304.13202, 8 October 2023.
Boesch, H., Brown, L., Castano, R., Christi, M., Connor, B., Crisp, D., Eldering, A., Fisher, B., Frankenberg, C., Gunson, M., Granat, R., McDuffie, J., Miller, C., Natraj, V., O'Brien, D., O'Dell, C., Osterman, G., Oyafuso, F., Payne, V., Polonski, I., Smyth, M., Spurr, R., Thompson, D., and Toon, G.: Orbiting Carbon Observatory-2 (OCO-2) Level 2 Full Physics Retrieval Algorithm Theoretical Basis, Version 2.0, Rev 2, NASA Earth Data, https://doi.org/10.5067/8E4VLCK16O6Q, 2015.
Braverman, A., Hobbs, J., Teixeira, J., and Gunson, M.: Post hoc Uncertainty Quantification for Remote Sensing Observing Systems, SIAM/ASA Journal on Uncertainty Quantification, 9, 1064–1093, https://doi.org/10.1137/19M1304283, 2021.
Bréon, F.-M., David, L., Chatelanaz, P., and Chevallier, F.: On the potential of a neural-network-based approach for estimating XCO2 from OCO-2 measurements, Atmos. Meas. Tech., 15, 5219–5234, https://doi.org/10.5194/amt-15-5219-2022, 2022.
Brynjarsdóttir, J., Hobbs, J., Braverman, A., and Mandrake, L.: Optimal Estimation Versus MCMC for CO2 Retrievals, J. Agr. Biol. Envir. St., 23, 297–316, https://doi.org/10.1007/s13253-018-0319-8, 2018.
Byrne, B., Baker, D. F., Basu, S., Bertolacci, M., Bowman, K. W., Carroll, D., Chatterjee, A., Chevallier, F., Ciais, P., Cressie, N., Crisp, D., Crowell, S., Deng, F., Deng, Z., Deutscher, N. M., Dubey, M. K., Feng, S., García, O. E., Griffith, D. W. T., Herkommer, B., Hu, L., Jacobson, A. R., Janardanan, R., Jeong, S., Johnson, M. S., Jones, D. B. A., Kivi, R., Liu, J., Liu, Z., Maksyutov, S., Miller, J. B., Miller, S. M., Morino, I., Notholt, J., Oda, T., O'Dell, C. W., Oh, Y.-S., Ohyama, H., Patra, P. K., Peiro, H., Petri, C., Philip, S., Pollard, D. F., Poulter, B., Remaud, M., Schuh, A., Sha, M. K., Shiomi, K., Strong, K., Sweeney, C., Té, Y., Tian, H., Velazco, V. A., Vrekoussis, M., Warneke, T., Worden, J. R., Wunch, D., Yao, Y., Yun, J., Zammit-Mangion, A., and Zeng, N.: National CO2 budgets (2015–2020) inferred from atmospheric CO2 observations in support of the global stocktake, Earth Syst. Sci. Data, 15, 963–1004, https://doi.org/10.5194/essd-15-963-2023, 2023.
Chevallier, F., Ciais, P., Conway, T. J., Aalto, T., Anderson, B. E., Bousquet, P., Brunke, E. G., Ciattaglia, L., Esaki, Y., Frohlich, M., Gomez, A., Gomez-Pelaez, A. J., Haszpra, L., Krummel, P., Langenfelds, R. L., Leuenberger, M., Machida, T., Maignan, F., Matsueda, H., Morgu, J. A., Mukai, H., Nakazawa, T., Peylin, P., Ramonet, M., Rivier, L., Sawa, Y., Schmidt, M., Steele, L. P., Vay, S. A., Vermeulen, A. T., Wofsy, S., and Worthy, D.: CO2 surface fluxes at grid point scale estimated from a global 21 year re-analysis of atmospheric measurements, J. Geophys. Res.-Atmos., 115, D21307, https://doi.org/10.1029/2010JD013887, 2010.
Connor, B. J., Boesch, H., Toon, G., Sen, B., Miller, C., and Crisp, D.: Orbiting Carbon Observatory: Inverse Method and Prospective Error Analysis, J. Geophys. Res., 113, D05305, https://doi.org/10.1029/2006JD008336, 2008.
Cressie, N.: Statistics for Spatial Data, John Wiley & Sons, Inc., https://doi.org/10.1002/9781119115151, 1993.
Cressie, N.: Mission CO2ntrol: A Statistical Scientist's Role in Remote Sensing of Atmospheric Carbon Dioxide, J. Am. Stat. Assoc., 113, 152–168, https://doi.org/10.1080/01621459.2017.1419136, 2018.
Crisp, D., Atlas, R. M., Breon, F.-M., Brown, L. R., Burrows, J. P., Ciais, P., Connor, B. J., Doney, S. C., Fung, I. Y., Jacob, D. J., Miller, C. E., O'Brien, D., Pawson, S., Randerson, J. T., Rayner, P., Salawitch, R. J., Sander, S. P., Sen, B., Stephens, G. L., Tans, P. P., Toon, G. C., Wennberg, P. O., Wofsy, S. C., Yung, Y. L., Kuang, Z., Chudasama, B., Sprague, G., Weiss, B., Pollock, R., Kenyon, D., and Schroll, S.: The Orbiting Carbon Observatory (OCO) mission, Adv. Space. Res., 34, 700–709, https://doi.org/10.1016/j.asr.2003.08.062, 2004.
Crisp, D., Fisher, B. M., O'Dell, C., Frankenberg, C., Basilio, R., Bösch, H., Brown, L. R., Castano, R., Connor, B., Deutscher, N. M., Eldering, A., Griffith, D., Gunson, M., Kuze, A., Mandrake, L., McDuffie, J., Messerschmidt, J., Miller, C. E., Morino, I., Natraj, V., Notholt, J., O'Brien, D. M., Oyafuso, F., Polonsky, I., Robinson, J., Salawitch, R., Sherlock, V., Smyth, M., Suto, H., Taylor, T. E., Thompson, D. R., Wennberg, P. O., Wunch, D., and Yung, Y. L.: The ACOS CO2 retrieval algorithm – Part II: Global XCO2 data characterization, Atmos. Meas. Tech., 5, 687–707, https://doi.org/10.5194/amt-5-687-2012, 2012.
Crisp, D., Pollock, H. R., Rosenberg, R., Chapsky, L., Lee, R. A. M., Oyafuso, F. A., Frankenberg, C., O'Dell, C. W., Bruegge, C. J., Doran, G. B., Eldering, A., Fisher, B. M., Fu, D., Gunson, M. R., Mandrake, L., Osterman, G. B., Schwandner, F. M., Sun, K., Taylor, T. E., Wennberg, P. O., and Wunch, D.: The on-orbit performance of the Orbiting Carbon Observatory-2 (OCO-2) instrument and its radiometrically calibrated products, Atmos. Meas. Tech., 10, 59–81, https://doi.org/10.5194/amt-10-59-2017, 2017. a
Crowell, S., Baker, D., Schuh, A., Basu, S., Jacobson, A. R., Chevallier, F., Liu, J., Deng, F., Feng, L., McKain, K., Chatterjee, A., Miller, J. B., Stephens, B. B., Eldering, A., Crisp, D., Schimel, D., Nassar, R., O'Dell, C. W., Oda, T., Sweeney, C., Palmer, P. I., and Jones, D. B. A.: The 2015–2016 carbon cycle as seen from OCO-2 and the global in situ network, Atmos. Chem. Phys., 19, 9797–9831, https://doi.org/10.5194/acp-19-9797-2019, 2019. a
Datta, A., Banerjee, S., Finley, A., and Gelfand, A.: Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geostatistical Datasets, J. Am. Stat. Assoc., 111, 800–812, https://doi.org/10.1080/01621459.2015.1044091, 2016. a
David, L., Bréon, F.-M., and Chevallier, F.: XCO2 estimates from the OCO-2 measurements using a neural network approach, Atmos. Meas. Tech., 14, 117–132, https://doi.org/10.5194/amt-14-117-2021, 2021. a, b
Eldering, A., Taylor, T. E., O'Dell, C. W., and Pavlick, R.: The OCO-3 mission: measurement objectives and expected performance based on 1 year of simulated data, Atmos. Meas. Tech., 12, 2341–2370, https://doi.org/10.5194/amt-12-2341-2019, 2019. a
Frey, M., Sha, M. K., Hase, F., Kiel, M., Blumenstock, T., Harig, R., Surawicz, G., Deutscher, N. M., Shiomi, K., Franklin, J. E., Bösch, H., Chen, J., Grutter, M., Ohyama, H., Sun, Y., Butz, A., Mengistu Tsidu, G., Ene, D., Wunch, D., Cao, Z., Garcia, O., Ramonet, M., Vogel, F., and Orphal, J.: Building the COllaborative Carbon Column Observing Network (COCCON): long-term stability and ensemble performance of the EM27/SUN Fourier transform spectrometer, Atmos. Meas. Tech., 12, 1513–1530, https://doi.org/10.5194/amt-12-1513-2019, 2019. a
Friedlingstein, P., Meinshausen, M., Arora, V., Jones, C., Anav, A., Liddicoat, S., and Knutti, R.: Uncertainties in CMIP5 Climate Projections due to Carbon Cycle Feedbacks, J. Climate, 27, 511–526, https://doi.org/10.1175/JCLI-D-12-00579.1, 2014. a
Friedlingstein, P., O'Sullivan, M., Jones, M. W., Andrew, R. M., Gregor, L., Hauck, J., Le Quéré, C., Luijkx, I. T., Olsen, A., Peters, G. P., Peters, W., Pongratz, J., Schwingshackl, C., Sitch, S., Canadell, J. G., Ciais, P., Jackson, R. B., Alin, S. R., Alkama, R., Arneth, A., Arora, V. K., Bates, N. R., Becker, M., Bellouin, N., Bittig, H. C., Bopp, L., Chevallier, F., Chini, L. P., Cronin, M., Evans, W., Falk, S., Feely, R. A., Gasser, T., Gehlen, M., Gkritzalis, T., Gloege, L., Grassi, G., Gruber, N., Gürses, Ö., Harris, I., Hefner, M., Houghton, R. A., Hurtt, G. C., Iida, Y., Ilyina, T., Jain, A. K., Jersild, A., Kadono, K., Kato, E., Kennedy, D., Klein Goldewijk, K., Knauer, J., Korsbakken, J. I., Landschützer, P., Lefèvre, N., Lindsay, K., Liu, J., Liu, Z., Marland, G., Mayot, N., McGrath, M. J., Metzl, N., Monacci, N. M., Munro, D. R., Nakaoka, S.-I., Niwa, Y., O'Brien, K., Ono, T., Palmer, P. I., Pan, N., Pierrot, D., Pocock, K., Poulter, B., Resplandy, L., Robertson, E., Rödenbeck, C., Rodriguez, C., Rosan, T. M., Schwinger, J., Séférian, R., Shutler, J. D., Skjelvan, I., Steinhoff, T., Sun, Q., Sutton, A. J., Sweeney, C., Takao, S., Tanhua, T., Tans, P. P., Tian, X., Tian, H., Tilbrook, B., Tsujino, H., Tubiello, F., van der Werf, G. R., Walker, A. P., Wanninkhof, R., Whitehead, C., Willstrand Wranne, A., Wright, R., Yuan, W., Yue, C., Yue, X., Zaehle, S., Zeng, J., and Zheng, B.: Global Carbon Budget 2022, Earth Syst. Sci. Data, 14, 4811–4900, https://doi.org/10.5194/essd-14-4811-2022, 2022. a
Gurney, K., Law, R., Denning, A., Rayner, P., Baker, D., Bousquet, P., Bruhwiler, L., Chen, Y., Ciais, P., Fan, S., Fung, I., Gloor, M., Heimann, M., Higuchi, K., John, J., Maki, T., Maksyutov, S., Masarie, K., Peylin, P., Prather, M., Pak, B., Randerson, J., Sarmiento, J., Taguchi, S., Takahashi, T., and Yuen, C. A.: Towards robust regional estimates of CO2 sources and sinks using atmospheric transport models, Nature, 415, 626–630, https://doi.org/10.1038/415626a, 2002. a
Hobbs, J., Braverman, A., Cressie, N., Granat, R., and Gunson, M.: Simulation-Based Uncertainty Quantification for Estimating Atmospheric CO2 from Satellite Data, SIAM/ASA Journal on Uncertainty Quantification, 5, 956–985, https://doi.org/10.1137/16M1060765, 2017. a, b
Hobbs, J., Katzfuss, M., Zilber, D., Brynjarsdóttir, J., Mondal, A., and Berrocal, V.: Spatial Retrievals of Atmospheric Carbon Dioxide from Satellite Observations, Remote Sensing, 13, 571, https://doi.org/10.3390/rs13040571, 2021. a
Imasu, R., Matsunaga, T., Nakajima, M., Yoshida, Y., Shiomi, K., Morino, I., Saitoh, N., Niwa, Y., Someya, Y., Oishi, Y., Hashimoto, M., Noda, H., Hikosaka, K., Uchino, O., Maksyutov, S., Takagi, H., Ishida, H., Nakajima, T. Y., Nakajima, T., and Shi, C.: Greenhouse gases Observing SATellite 2 (GOSAT-2): mission overview, Progress in Earth and Planetary Science, 10, 33, https://doi.org/10.1186/s40645-023-00562-2, 2023. a
Innes, M.: Don't Unroll Adjoint: Differentiating SSA-Form Programs, arXiv [preprint], https://doi.org/10.48550/arXiv.1810.07951, 9 March 2019. a
IPCC: Summary for Policymakers, IPCC, 1–34, https://doi.org/10.1017/CBO9781107415324.004, 2023. a
Johnson, S. G.: The Sobol module for Julia, GitHub [code], https://github.com/JuliaMath/Sobol.jl (last access: 31 January 2025), 2020. a
Kaipio, J. and Somersalo, E.: Statistical and Computational Inverse Problems, Springer, https://doi.org/10.1007/b138659, 2005. a
Kalnay, E.: Atmospheric Modeling, Data Assimilation and Predictability, Cambridge University Press, https://doi.org/10.1017/CBO9780511802270, 2002. a
Kasahara, M., Kachi, M., Inaoka, K., Fujii, H., Kubota, T., Shimada, R., and Kojima, Y.: Overview and current status of GOSAT-GW mission and AMSR3 instrument, in: Sensors, Systems, and Next-Generation Satellites XXIV, 21–25 September 2020, edited by: Neeck, S. P., Hélière, A., and Kimura, T., International Society for Optics and Photonics, SPIE, 11530, p. 1153007, https://doi.org/10.1117/12.2573914, 2020. a
Kiel, M., O'Dell, C. W., Fisher, B., Eldering, A., Nassar, R., MacDonald, C. G., and Wennberg, P. O.: How bias correction goes wrong: measurement of XCO2 affected by erroneous surface pressure estimates, Atmos. Meas. Tech., 12, 2241–2259, https://doi.org/10.5194/amt-12-2241-2019, 2019. a
Kingma, D. P. and Ba, J.: Adam: A Method for Stochastic Optimization, arXiv [preprint], https://doi.org/10.48550/arXiv.1412.6980, 30 January 2017. a
Kuze, A., Suto, H., Nakajima, M., and Hamazaki, T.: Thermal and near infrared sensor for carbon observation Fourier-transform spectrometer on the Greenhouse Gases Observing Satellite for greenhouse gases monitoring, Appl. Optics, 48, 6716–6733, https://doi.org/10.1364/AO.48.006716, 2009. a
Lamminpää, O.: Forward Model Emulator for Atmospheric Radiative Transfer Using Gaussian Processes And Cross Validation, OSF [code/data set], https://doi.org/10.17605/OSF.IO/U2T8A, 2024. a
Lamminpää, O., Hobbs, J., Brynjarsdóttir, J., Laine, M., Braverman, A., Lindqvist, H., and Tamminen, J.: Accelerated MCMC for Satellite-Based Measurements of Atmospheric CO2, Remote Sensing, 11, 2061, https://doi.org/10.3390/rs11172061, 2019. a, b
Li, Z., Huang, D. Z., Liu, B., and Anandkumar, A.: Fourier Neural Operator with Learned Deformations for PDEs on General Geometries, arXiv [preprint], https://doi.org/10.48550/arXiv.2207.05209, 2 May 2024. a
Liu, J., Bowman, K. W., Schimel, D. S., Parazoo, N. C., Jiang, Z., Lee, M., Bloom, A. A., Wunch, D., Frankenberg, C., Sun, Y., O'Dell, C. W., Gurney, K. R., Menemenlis, D., Gierach, M., Crisp, D., and Eldering, A.: Contrasting carbon cycle responses of the tropical continents to the 2015–2016 El Niño, Science, 358, eaam5690, https://doi.org/10.1126/science.aam5690, 2017. a
Lu, L., Jin, P., Pang, G., Zhang, Z., and Karniadakis, G. E.: Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators, Nature Machine Intelligence, 3, 218–229, https://doi.org/10.1038/s42256-021-00302-5, 2021. a
Ma, P., Mondal, A., Konomi, B. A., Hobbs, J., Song, J. J., and Kang, E. L.: Computer Model Emulation with High-Dimensional Functional Output in Large-Scale Observing System Uncertainty Experiments, Technometrics, 64, 65–79, https://doi.org/10.1080/00401706.2021.1895890, 2022. a, b, c, d, e
McDuffie, J., Bowman, K., Hobbs, J., Natraj, V., Sarkissian, E., Mike, M. T., and Val, S.: Reusable Framework for Retrieval of Atmospheric Composition (ReFRACtor), Version 1.09, Zenodo [code], https://doi.org/10.5281/zenodo.4019567, 2020. a, b
Mishra, S. and Molinaro, R.: Physics informed neural networks for simulating radiative transfer, J. Quant. Spectrosc. Ra., 270, 107705, https://doi.org/10.1016/J.JQSRT.2021.107705, 2021. a
Moore III, B., Crowell, S. M. R., Rayner, P. J., Kumer, J., O'Dell, C. W., O'Brien, D., Utembe, S., Polonsky, I., Schimel, D., and Lemen, J.: The Potential of the Geostationary Carbon Cycle Observatory (GeoCarb) to Provide Multi-scale Constraints on the Carbon Cycle in the Americas, Front. Environ. Sci., 6, 109, https://doi.org/10.3389/fenvs.2018.00109, 2018. a
Nguyen, H. and Hobbs, J.: Intercomparison of Remote Sensing Retrievals: An Examination of Prior-Induced Biases in Averaging Kernel Corrections, Remote Sensing, 12, 3239, https://doi.org/10.3390/rs12193239, 2020. a
O'Dell, C. W., Connor, B., Bösch, H., O'Brien, D., Frankenberg, C., Castano, R., Christi, M., Eldering, D., Fisher, B., Gunson, M., McDuffie, J., Miller, C. E., Natraj, V., Oyafuso, F., Polonsky, I., Smyth, M., Taylor, T., Toon, G. C., Wennberg, P. O., and Wunch, D.: The ACOS CO2 retrieval algorithm – Part 1: Description and validation against synthetic observations, Atmos. Meas. Tech., 5, 99–121, https://doi.org/10.5194/amt-5-99-2012, 2012. a
O'Dell, C. W., Eldering, A., Wennberg, P. O., Crisp, D., Gunson, M. R., Fisher, B., Frankenberg, C., Kiel, M., Lindqvist, H., Mandrake, L., Merrelli, A., Natraj, V., Nelson, R. R., Osterman, G. B., Payne, V. H., Taylor, T. E., Wunch, D., Drouin, B. J., Oyafuso, F., Chang, A., McDuffie, J., Smyth, M., Baker, D. F., Basu, S., Chevallier, F., Crowell, S. M. R., Feng, L., Palmer, P. I., Dubey, M., García, O. E., Griffith, D. W. T., Hase, F., Iraci, L. T., Kivi, R., Morino, I., Notholt, J., Ohyama, H., Petri, C., Roehl, C. M., Sha, M. K., Strong, K., Sussmann, R., Te, Y., Uchino, O., and Velazco, V. A.: Improved retrievals of carbon dioxide from Orbiting Carbon Observatory-2 with the version 8 ACOS algorithm, Atmos. Meas. Tech., 11, 6539–6576, https://doi.org/10.5194/amt-11-6539-2018, 2018. a, b, c, d
Owhadi, H. and Yoo, G. R.: Kernel Flows: From learning kernels from data into the abyss, J. Comput. Phys., 389, 22–47, https://doi.org/10.1016/j.jcp.2019.03.040, 2019. a, b, c, d, e
Palmer, P. I., Feng, L., Baker, D., Chevallier, F., Bösch, H., and Somkuti, P.: Net carbon emissions from African biosphere dominate pan-tropical atmospheric CO2 signal, Nat. Commun., 10, 3344, https://doi.org/10.1038/s41467-019-11097-w, 2019. a
Patil, P., Kuusela, M., and Hobbs, J.: Objective Frequentist Uncertainty Quantification for Atmospheric CO2 Retrievals, SIAM/ASA Journal on Uncertainty Quantification, 10, 827–859, https://doi.org/10.1137/20M1356403, 2022. a
Patra, P., Crisp, D., Kaiser, J., Wunch, D., Saeki, T., Ichii, K., Sekiya, T., Wennberg, P., Feist, D., Pollard, D., Griffith, D., Velazco, V., Maziere, M., Sha, M., Roehl, C., Chatterjee, A., and Ishijima, K.: The Orbiting Carbon Observatory (OCO-2) tracks 2–3 peta-gram increase in carbon release to the atmosphere during the 2014–2016 El Niño, Sci. Rep., 7, 13567, https://doi.org/10.1038/s41598-017-13459-0, 2017. a
Peiro, H., Crowell, S., Schuh, A., Baker, D. F., O'Dell, C., Jacobson, A. R., Chevallier, F., Liu, J., Eldering, A., Crisp, D., Deng, F., Weir, B., Basu, S., Johnson, M. S., Philip, S., and Baker, I.: Four years of global carbon cycle observed from the Orbiting Carbon Observatory 2 (OCO-2) version 9 and in situ data and comparison to OCO-2 version 7, Atmos. Chem. Phys., 22, 1097–1130, https://doi.org/10.5194/acp-22-1097-2022, 2022. a
Peters, W., Jacobson, A. R., Sweeney, C., Andrews, A. E., Conway, T. J., Masarie, K., Miller, J. B., Bruhwiler, L. M. P., Pétron, G., Hirsch, A. I., Worthy, D. E. J., van der Werf, G. R., Randerson, J. T., Wennberg, P. O., Krol, M. C., and Tans, P. P.: An Atmospheric Perspective on North American Carbon Dioxide Exchange: CarbonTracker, P. Natl. Acad. Sci. USA, 104, 18925–18930, https://doi.org/10.1073/pnas.0708986104, 2007. a
Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T.: Numerical Recipes in FORTRAN 77: The Art of Scientific Computing, Cambridge University Press, 2nd edn., ISBN 052143064X, 1992. a
Ran, Y. and Li, X.: TanSat: a new star in global carbon monitoring from China, Sci. Bull., 64, 284–285, https://doi.org/10.1016/j.scib.2019.01.019, 2019. a
Rasmussen, C. and Williams, C.: Gaussian Processes for Machine Learning, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA, USA, http://gaussianprocess.org/gpml/ (last access: 31 January 2025), 2006. a, b
Rodgers, C. D.: Inverse Methods for Atmospheric Sounding: Theory and Practice, World Scientific Publishing Co. Pte. Ltd., Singapore 596224, Reprint edn., ISBN 981022740X, 2004. a
Rosenberg, R., Maxwell, S., Johnson, B. C., Chapsky, L., Lee, R. A. M., and Pollock, R.: Preflight Radiometric Calibration of Orbiting Carbon Observatory 2, IEEE T. Geosci. Remote, 55, 1994–2006, https://doi.org/10.1109/TGRS.2016.2634023, 2017. a
Schimel, D., Pavlick, R., Fisher, J. B., Asner, G. P., Saatchi, S., Townsend, P., Miller, C., Frankenberg, C., Hibbard, K., and Cox, P.: Observing terrestrial ecosystems and the carbon cycle from space, Glob. Change Biol., 21, 1762–1776, https://doi.org/10.1111/gcb.12822, 2015. a
Sierk, B., Bézy, J.-L., Löscher, A., and Meijer, Y.: The European CO2 Monitoring Mission: observing anthropogenic greenhouse gas emissions from space, in: International Conference on Space Optics – ICSO 2018, Chania, Greece, 9–12 October 2018, edited by: Sodnik, Z., Karafolas, N., and Cugny, B., International Society for Optics and Photonics, SPIE, 11180, p. 111800M, https://doi.org/10.1117/12.2535941, 2019. a
Sobol, I.: Distribution of Points in a Cube and Approximate Evaluation of Integrals, Zh. Vych. Mat. Mat. Fiz., 7, 784–802, 1967. a
Stein, M. L.: Interpolation of spatial data: some theory for kriging, Springer Science & Business Media, https://doi.org/10.1007/978-1-4612-1494-6, 1999. a
Stewart, G. W.: Matrix algorithms, Volume I: Basic Decompositions, SIAM (Society for Industrial and Applied Mathematics), i–xix, https://doi.org/10.1137/1.9781611971408.fm, 1998. a
Tukiainen, S., Railo, J., Laine, M., Hakkarainen, J., Kivi, R., Heikkinen, P., Chen, H., and Tamminen, J.: Retrieval of atmospheric CH4 profiles from Fourier transform infrared data using dimension reduction and MCMC, J. Geophys. Res.-Atmos., 121, 10312–10327, https://doi.org/10.1002/2015JD024657, 2016. a
Vecchia, A. V.: Estimation and Model Identification for Continuous Spatial Processes, J. Roy. Stat. Soc. B Met., 50, 297–312, http://www.jstor.org/stable/2345768 (last access: 31 January 2025), 1988. a, b
Veefkind, J., Aben, I., McMullan, K., Förster, H., de Vries, J., Otter, G., Claas, J., Eskes, H., de Haan, J., Kleipool, Q., van Weele, M., Hasekamp, O., Hoogeveen, R., Landgraf, J., Snel, R., Tol, P., Ingmann, P., Voors, R., Kruizinga, B., Vink, R., Visser, H., and Levelt, P.: TROPOMI on the ESA Sentinel-5 Precursor: A GMES mission for global observations of the atmospheric composition for climate, air quality and ozone layer applications, Remote Sens. Environ., 120, 70–83, https://doi.org/10.1016/j.rse.2011.09.027, 2012. a
Wu, K., Yang, D., Liu, Y., Cai, Z., Zhou, M., Feng, L., and Palmer, P. I.: Evaluating the Ability of the Pre-Launch TanSat-2 Satellite to Quantify Urban CO2 Emissions, Remote Sensing, 15, 4904, https://doi.org/10.3390/rs15204904, 2023. a
Wunch, D., Wennberg, P. O., Osterman, G., Fisher, B., Naylor, B., Roehl, C. M., O'Dell, C., Mandrake, L., Viatte, C., Kiel, M., Griffith, D. W. T., Deutscher, N. M., Velazco, V. A., Notholt, J., Warneke, T., Petri, C., De Maziere, M., Sha, M. K., Sussmann, R., Rettinger, M., Pollard, D., Robinson, J., Morino, I., Uchino, O., Hase, F., Blumenstock, T., Feist, D. G., Arnold, S. G., Strong, K., Mendonca, J., Kivi, R., Heikkinen, P., Iraci, L., Podolske, J., Hillyard, P. W., Kawakami, S., Dubey, M. K., Parker, H. A., Sepulveda, E., García, O. E., Te, Y., Jeseck, P., Gunson, M. R., Crisp, D., and Eldering, A.: Comparisons of the Orbiting Carbon Observatory-2 (OCO-2) XCO2 measurements with TCCON, Atmos. Meas. Tech., 10, 2209–2238, https://doi.org/10.5194/amt-10-2209-2017, 2017. a
Zhu, X., Huang, L., Ibrahim, C., Lee, E. H., and Bindel, D.: Scalable Bayesian Transformed Gaussian Processes, arXiv [preprint], https://doi.org/10.48550/arXiv.2210.10973, 20 October 2022. a