This work is distributed under the Creative Commons Attribution 4.0 License.
Forward Model Emulator for Atmospheric Radiative Transfer Using Gaussian Processes and Cross Validation
Abstract. Remote sensing of atmospheric carbon dioxide (CO2) carried out by NASA's Orbiting Carbon Observatory-2 (OCO-2) satellite mission and the related Uncertainty Quantification (UQ) effort involves repeated evaluations of a state-of-the-art atmospheric physics model. The retrieval, i.e., solving an inverse problem, requires substantial computational resources. In this work, we propose and implement a statistical emulator to speed up the computations in the OCO-2 physics model. Our approach is based on Gaussian Process (GP) regression, leveraging recent research on Kernel Flows and cross validation to efficiently learn the kernel function in the GP. We demonstrate our method by replicating the behavior of the OCO-2 forward model within measurement error precision, and further show that in simulated cases, our method reproduces the CO2 retrieval performance of the OCO-2 setup with orders of magnitude faster computational time. The underlying emulation problem is challenging because it is high dimensional. It is related to operator learning in the sense that the function to be approximated maps high-dimensional vectors to high-dimensional vectors. Our proposed approach is not only fast but also highly accurate (its relative error is less than 1 %). In contrast with Artificial Neural Network (ANN) based methods, it is interpretable, and its efficiency is based on learning a kernel in an engineered and expressive family of kernels.
Status: final response (author comments only)

RC1: 'Comment on amt-2024-63', Hristo Georgiev Chipilski, 04 May 2024
SUMMARY
The authors of this study propose a data-driven surrogate of the forward model needed to relate atmospheric state to CO2 measurements in the context of the OCO-2 satellite mission. The approach is based on a powerful regression-based technique known as Gaussian Processes (GPs), which can approximate continuous functions to arbitrary accuracy (i.e., they enjoy the universal function approximation property). The paper describes and implements an enhanced GP emulator inspired by some recent developments in the field (namely, kernel flows with cross validation), and further incorporates dimension reduction techniques to efficiently deal with large volumes of data. The resulting algorithm is tested on synthetic datasets and shown to effectively recover the true forward model within a 1% relative error. Findings are encouraging and pave the way for future implementations on real CO2 datasets.
This is a very well written paper that documents all necessary technical aspects of the new GP surrogate model and offers a detailed evaluation of its performance. The authors provide an excellent motivation of the research topic, raising awareness of both the climate impacts of CO2 levels and the difficulties in measuring the concentrations of this trace gas. A particularly strong feature of the paper is the set of carefully justified sensitivities to various design choices in the GP algorithm; it was insightful to see how rule-of-thumb criteria for selecting an appropriate number of PC components may not be adequate (Section 5.1) and how GP predictions are robust to various dominant aerosol species (Section 5.2). Along similar lines, it was important to emphasize how domain knowledge can play a crucial role in optimizing the accuracy and cost of GP predictions (e.g., the justifications for wavelength band separation).
On account of this positive feedback, I recommend that this paper be accepted for publication in AMT after the authors address my comments below. Most of them are fairly minor and line-specific, but the next section discusses a few more general suggestions pertaining to the manuscript's presentation that will help further increase its impact.
GENERAL COMMENTS
- Methodology (in particular L66-L84, but also other related parts of the paper discussing the novelty of your work): Given the choice of journal, I expect the primary audience of this article to be experts in measurement techniques who might not be fully aware of all recent advances in the statistical and ML literature. Hence, I advise the authors to make their presentation slightly less technical, and instead focus on some of the practical advantages of using their enhanced GP emulator for retrieval problems. In simple terms, why is the GP method better than standard neural networks? What aspect of the retrieval problem can kernel flows help with: accuracy, computational speed, or both? In what sense is the new cross-validation-based training considered an extension of kernel flow techniques (i.e., what is the intuition behind a "flow" in cross validation)? How do Gaussian processes handle bounded datasets (CO2 concentrations are non-negative)? What is the alternative to Gaussian processes and neural networks for emulating complex forward models like ReFRACtor? These are some of the questions practitioners would like answered before adopting your GP-based methodology.
- Technical comments on the GP presentation (particularly in terms of Eq. 1 and related text): The GP regression in Eq. (1) has a very similar structure to that of Optimal Interpolation (OI; e.g., page 157 of Kalnay's 2003 textbook "Atmospheric modeling, data assimilation and predictability") and related iterative Optimal Estimation (OE) algorithms (e.g., Eq. 15 later in the manuscript). Specifically, the terms premultiplying "z" look similar to the OI weight matrix W = BH^T (HBH^T + R)^{-1}, where B is the "background" covariance associated with z, R (which is \sigma \Gamma in your paper) is the measurement error covariance matrix, and H (K in your notation) is the Jacobian of the observation operator (forward model in your nomenclature). To improve the reader's intuition about GP models, it would be nice to include a short discussion of these connections as well as the underlying differences with GP (e.g., OI/OE do not have kernel evaluations in the covariance matrix definitions).
- Figure captions and in-text figure interpretations: I sometimes found that figure captions do not provide a sufficient description of their content (e.g., Fig. 6). Related to that issue, the in-text interpretation of some figures was fairly minimal relative to their complexity (see next section for specific examples). I think the authors should add a few more details to make this paper more self-contained.
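[Editorial note: the OI analogy raised above can be made concrete in a few lines of NumPy. The following is a hypothetical 1-D toy sketch, not the paper's emulator; the kernel choice, length scale, and data are invented for illustration. It shows that the GP posterior mean has the same weight-matrix structure as OI's W = BH^T (HBH^T + R)^{-1}, except that every covariance entry is a kernel evaluation.]

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: n inputs X in [0, 1], noisy labels z.
n = 20
X = np.sort(rng.uniform(0.0, 1.0, n))
z = np.sin(2 * np.pi * X) + 0.05 * rng.standard_normal(n)

def kernel(a, b, ell=0.2):
    """Squared-exponential kernel; plays the role of Gamma in Eq. (1)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

sigma2 = 0.05**2          # measurement-error variance (the R of OI)
x_star = np.array([0.33])

# GP posterior mean: Gamma(x*, X) (Gamma(X, X) + sigma2 I)^{-1} z.
# The matrix product below is the analogue of the OI weight matrix
# W = B H^T (H B H^T + R)^{-1}, with kernel evaluations in place of B, H.
W = kernel(x_star, X) @ np.linalg.inv(kernel(X, X) + sigma2 * np.eye(n))
mean = W @ z              # approximates sin(2 * pi * 0.33)
```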
SPECIFIC COMMENTS
L28: It will be helpful to include references showing the lack of CO2 measurements in the Tropics.
L31-L32: Please elaborate on what total column mole-fraction CO2 is.
L46: If you are aware how many soundings are discarded in the process, include this in the text.
L68: What do you mean by model atmospheres?
L90: Please ensure consistent notation: radiances are denoted by y here, but Section 2.1 uses z for the labels.
Eq. (2): I assume there exists a more general expression for the posterior covariance?
L117: Either provide references for these kernels or write them explicitly.
L119-L120: Explain why differentiating the kernel function is important; this comes slightly later on L125, but I think it would be better to expand on this point earlier.
L125: I would say "Taking the derivative of eq. (1) w.r.t. x*" here.
Eq. (5): Either provide a reference for this likelihood function or explain/derive how it can be obtained.
L145: L^2 space not explicitly defined, it might be better to continue using the notion of mean squared errors to improve readability.
Eq. (6): It is not clear whether this expression can be derived from Eq. (1). The text around L150 suggests a rank-based approximation is used, but the authors should clarify the connection to the original GP regression in Eq. (1).
L156: Clarify what you mean by a k-norm.
L160-L166: It will be good to discuss how the ability to compute the kernel derivatives in closed form factors into the automatic differentiation procedure.
Line 1 of Algorithm 1: Shouldn’t R be initialized according to the \theta_1 values, which are set to 1? That is, the initial loss should be consistent with the initial choice of GP parameters.
Section 2.4: What other options are there for generating the parameters? What motivated your choice for using Sobol sequences?
FP model (L190-L215): It was not very clear to me how Eqs. (9) and (10) participate in the general forward model described by Eq. (8). For example, is F (loosely) defined as the sum of Eqs. (9) and (10)?
Fig. 1: In the figure caption, please clarify where these aerosol types are used.
L285-L287: At this point a significant amount of time has passed since you last described what K is; to fully appreciate the usefulness of closed-form Jacobians of the forward model (i.e., the matrix K), it will be helpful to remind the reader again what K is.
L204: Define loadings.
L307: What do you mean by template?
Eq. (19): diag(*,*) operation not defined, only diag(*) discussed. I assume diag(A,B) concatenates A and B?
L314: Can you confirm that the unit matrix here is different from the identity matrix?
Eq. (20): Shouldn’t \tilde z have a subscript B?
Fig. 4: It will be good to include intext interpretation of these results.
Fig. 6 and text describing it (L344-L349): Please have a more detailed explanation of these steps.
L353: Write down the learning rate value explicitly.
L356: You mean the simulation distribution of Braverman et al. (2021)?
L358-L360: Please expand the discussion here; there is lots of information in Figs. 7 and 8 that is left uncommented.
L366-L372: Remind the readers of the practical significance of having good approximations of the averaging kernel A.
Table 2: Did you compute the Jacobian for the ReFRACtor model analytically or numerically? It will be insightful to know how challenging it is to compute the derivatives of the physical model.
L398: I believe the abbreviation QoI has not been defined before. Do you mean OI?
Eq. (25): Are the random variables epsilon and delta independent from each other? If so, one can perhaps write Eq. (25) more compactly with a single noise term that combines the mean and covariances of epsilon and delta.
L440: A reference on the 1% relative error in operator learning frameworks will be helpful.
Citation: https://doi.org/10.5194/amt-2024-63-RC1
RC2: 'Comment on amt-2024-63', Anonymous Referee #2, 19 Jun 2024
The Authors present a machine-learning-based radiative transfer emulator to speed up OCO CO2 retrievals. The method is based on a novel formulation of Gaussian process regression, and also provides uncertainty estimates alongside simple regression outputs. The emulator is evaluated against independent radiative transfer simulations and is also used in CO2 retrievals from synthetic spectra, revealing a performance similar to that of the full algorithm, although with somewhat different retrieval uncertainties.
The presented methodology appears sound, even though I found Section 2 hard to follow at times, as a number of highly technical concepts (Matérn kernel, Sobol sequence, reproducing kernel Hilbert space norm) are introduced without much explanation of what they are. I would invite the Authors to clarify these concepts better in the revised version to make the paper more readable.
MAIN COMMENTS
- P4, eq. 1. I find the notation confusing. It is not clear to me what the function Gamma operates on. Does it operate on a vector and a matrix (as in Gamma(x*, X)) or on two matrices (as in Gamma(X, X))? Consider using a clearer notation.
- P4, eq. 2. What relationship exists between Gamma(x*, X) and Gamma(X, x*)? Are they the same? Is one the transpose of the other?
- P4, L110-111. "... sets GP regression apart from many neural network based machine learning models". Given that models such as, e.g., multilayer perceptrons and radial basis function networks are also analytically differentiable, input uncertainty can be linearly propagated through them. In this sense, what is it that can be done with GP but not with MLP or RBF?
- P4, L117. The concept of a Matérn kernel is invoked but it is not defined. What is this kernel and why is it relevant here?
- P6, L139. It is taken for granted that the reader knows what a "relative reproducing kernel Hilbert space norm" is, but I am quite sure this is not the case for many.
- P6-7, L169-170. What is a Sobol sequence and why is it relevant in this context? I think it is too much to expect the reader to go through Sobol (1967) without at least a summary of what this is all about. Furthermore, as far as I have seen, this paper is only available in Russian, so I would have had a hard time going through it even if I had decided to embark on the endeavour while writing my review.
- P14, L316. Again, the reader is referred to Section 2.4 for the concept of a Sobol sequence, but there too the concept is just mentioned, and not much is said about how this method works.
MINOR COMMENTS
- P1, L10. Isn't there a more recent report?
- P2, L32. Also add the Chinese TanSat (Ran and Li, 2019, doi: 10.1016/j.scib.2019.01.019).
- P2, L36. Also add TanSat-2 (Wu et al., 2023, doi: 10.3390/rs15204904). Furthermore, I think the GeoCarb mission was cancelled towards the end of 2022.
- P3, L69. "to" > "for", "with"?
- P4, L110. "to preclude prediction uncertainties". Do you mean "to predict"?
- P7, eq. 9. The notation is rather uncommon. Usually zenith angles are denoted by theta instead of tau (you also do that in subsequent sections), whereas tau is reserved for optical depths, which, I think, you denote by g(lambda). (At least, if I correctly interpret your equation, g(lambda) seems to be the slant optical depth; you may want to mention that.)
- P8, L205. "to" > "on"
- P8, L206. In principle fluorescence is additional radiation coming from the surface; it is not an effect of the observing system. I see later on that in your retrieval it is treated as an instrumental offset, but it may be best to keep the concepts separate at this point of the manuscript.
- P13, L299. "wave length" > "wavelength"
- P13, L318. Remove "of" between "simplifies" and "computations".
- P31. Owhadi and Yoo (2018) and Owhadi and Yoo (2019) seem to be the same paper. Either it is a duplicated entry or one of the two entries is incorrect.
Citation: https://doi.org/10.5194/amt-2024-63-RC2
AC1: 'Reply on RC2', Otto Lamminpää, 10 Aug 2024
We want to thank the reviewer for a thorough review and for the constructive criticism and comments provided. These are sure to make the article a more coherent publication.
The issue identified with technical concepts will be remedied in the revised manuscript, and we will provide further details to improve the readability of the article. In short, the Matérn kernel is a more expressive choice of kernel compared to the usual Gaussian / radial basis functions used by default in Gaussian Process regression, which tend to be "too smooth" to capture more abrupt changes in the function that is being approximated. Sobol sequences are a space-filling design akin to Latin Hypercubes, providing optimality (observed experimentally) in the generation of training data. The reproducing kernel Hilbert space norm is, at a high level, a way to measure the smoothness of a function approximation achieved with kernel methods.
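[Editorial note: the "too smooth" point can be illustrated in a few lines of NumPy. This is a hypothetical sketch with a unit length scale, not the paper's configuration: near the origin the Matérn-3/2 correlation falls off faster than the Gaussian/RBF one, which is what allows GP sample paths with a Matérn kernel to vary more abruptly.]

```python
import numpy as np

def rbf(r, ell=1.0):
    """Gaussian / squared-exponential kernel: infinitely smooth samples."""
    return np.exp(-0.5 * (r / ell) ** 2)

def matern32(r, ell=1.0):
    """Matérn kernel with nu = 3/2: samples are only once differentiable,
    so abrupt changes in the target function are easier to capture."""
    s = np.sqrt(3.0) * np.abs(r) / ell
    return (1.0 + s) * np.exp(-s)

# Both kernels equal 1 at r = 0, but for small separations the Matérn
# correlation decays faster (quadratic coefficient 3/2 versus 1/2).
r = np.linspace(0.0, 3.0, 7)
print(np.round(rbf(r), 3))
print(np.round(matern32(r), 3))
```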
Reply to Main Comments, in order:
- P4, eq. 1: Gamma is shorthand for operating on both vectors and matrices, and denotes a covariance matrix. We will clarify the notation by explicitly writing out what is meant by Gamma(x*, X) and Gamma(X, X).
- P4, eq. 2: Gamma(x*, X) and Gamma(X, x*) are transposes of each other. We will clarify this in the revised manuscript.
- P4, L110-111: Linear propagation of uncertainty is indeed possible using e.g. MLP or RBF. GP regression is different since it provides, as an output, a distribution whose mean is the predicted value and whose (co)variance is the calibrated uncertainty. When calibrated correctly, this makes the GP directly self-aware of its own prediction uncertainty. This comes on top of the linear error propagation provided by differentiable models, and can be added to the total error budget, e.g., by modifying the likelihood function to include prediction uncertainty in addition to measurement error. We will expand on this comment in the revised version.
- P4, L117: as stated above, we will clarify the benefits of using a Matérn kernel as opposed to more standard choices.
- P6, L139: as stated above, we will clarify the meaning of the RKHS norm and why it is relevant.
- P6-7, L169-170: we will add further references to Sobol sequences. A more recent description, in English, can be found in e.g. the Numerical Recipes books (https://iate.oac.uncor.edu/~mario/materia/nr/numrec/f77.pdf).
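[Editorial note: as a hands-on alternative to the references above, recent SciPy versions ship a Sobol generator, so the space-filling property can be checked directly. The dimension d = 4 below is an arbitrary stand-in, not the emulator's actual input dimension.]

```python
import numpy as np
from scipy.stats import qmc

# Unscrambled Sobol sequence: 2^8 = 256 points in the unit hypercube.
sampler = qmc.Sobol(d=4, scramble=False)
sobol_pts = sampler.random_base2(m=8)

# Compare against plain i.i.d. uniform draws: the discrepancy (a measure
# of deviation from a perfectly even filling) is lower for Sobol points.
iid_pts = np.random.default_rng(0).uniform(size=(256, 4))
print(qmc.discrepancy(sobol_pts), qmc.discrepancy(iid_pts))
```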
Reply to Minor Comments, in order:
- P1, L10: there is indeed a more recent report; we will reference it accordingly in the revised version.
- P2, L32: thank you for the suggestion, we will add a mention of TanSat.
- P2, L36: we will add a mention of TanSat-2 and omit GeoCarb, as suggested.
- P7, eq. 9: we will change the mentions of tau to theta to comply more closely with standard notation in the field.
- P8, L206: fluorescence is part of the forward model F(x) we are trying to emulate, but the reviewer is correct in pointing out that it is not part of the instrument effects, although it is applied after the radiative transfer calculations, which are our main concern in emulation. We will clarify this distinction as suggested and mention that SIF is added after the fact as additional radiation coming from the surface.
Citation: https://doi.org/10.5194/amt-2024-63-AC1
