the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
Forward Model Emulator for Atmospheric Radiative Transfer Using Gaussian Processes And Cross Validation
Abstract. Remote sensing of atmospheric carbon dioxide (CO2) carried out by NASA’s Orbiting Carbon Observatory-2 (OCO-2) satellite mission and the related Uncertainty Quantification (UQ) effort involves repeated evaluations of a state-of-the-art atmospheric physics model. The retrieval, or solving an inverse problem, requires substantial computational resources. In this work, we propose and implement a statistical emulator to speed up the computations in the OCO-2 physics model. Our approach is based on Gaussian Process (GP) Regression, leveraging recent research on Kernel Flows and Cross Validation to efficiently learn the kernel function in the GP. We demonstrate our method by replicating the behavior of OCO-2 forward model within measurement error precision, and further show that in simulated cases, our method reproduces the CO2 retrieval performance of OCO-2 setup with orders of magnitude faster computational time. The underlying emulation problem is challenging because it is high dimensional. It is related to operator learning in the sense that the function to be approximated is mapping high-dimensional vectors to high-dimensional vectors. Our proposed approach is not only fast but also highly accurate (its relative error is less than 1 %). In contrast with Artificial Neural Network (ANN) based methods, it is interpretable and its efficiency is based on learning a kernel in an engineered and expressive family of kernels.
- Preprint
(8320 KB) - Metadata XML
- BibTeX
- EndNote
Status: open (until 04 Jul 2024)
-
RC1: 'Comment on amt-2024-63', Hristo Georgiev Chipilski, 04 May 2024
reply
SUMMARY
The authors of this study propose a data-driven surrogate of the forward model needed to relate atmospheric state to CO2 measurements in the context of the OCO-2 satellite mission. The approach is based on a powerful regression-based technique known as Gaussian Processes (GPs), which can approximate continuous functions to arbitrary accuracy (i.e., they enjoy the universal function approximation property). The paper describes and implements an enhanced GP emulator inspired by some recent developments in the field (namely, kernel flows with cross validation), and further incorporates dimension reduction techniques to efficiently deal with large volumes of data. The resulting algorithm is tested on synthetic datasets and shown to effectively recover the true forward model within a 1% relative error. Findings are encouraging and pave the way for future implementations on real CO2 datasets.
This is a very well written paper that documents all necessary technical aspects of the new GP surrogate model and offers a detailed evaluation of its performance. The authors provide an excellent motivation of the research topic, raising awareness of both the climate impacts of CO2 levels and the difficulties in measuring the concentrations of this trace gas. A particularly strong feature of the paper are the carefully justified sensitivities to various design choices in the GP algorithm; it was insightful to see how rule-of-thumb criteria for selecting an appropriate number of PC components may not be adequate (Section 5.1) and how GP predictions are robust to various dominant aerosol species (Section 5.2). Along similar lines, it was important to emphasize how domain knowledge can play a crucial role for optimizing the accuracy and cost of GP predictions (e.g., the justifications for wavelength band separation).
On account of this positive feedback, I recommend this paper to be accepted for publication in AMT after the authors address my comments below. Most of them are fairly minor and line-specific, but the next section discusses a few more general suggestions pertaining to the manuscript’s presentation that will help further increase its impact.
GENERAL COMMENTS
- Methodology (in particular L66-L84, but also other related parts of the paper discussing the novelty of your work): Given the choice of a journal, I expect the primary audience of this article to be experts in measurement techniques who might not be fully aware of all recent advances in the statistical and ML literature. Hence, I advise the authors to make their presentation slightly less technical, and instead focus on some of the practical advantages of using their enhanced GP emulator for retrieval problems. In simple terms, why is the GP method better than standard neural networks? What aspect of the retrieval problem can kernel flows help with — accuracy, computational speed, both? In what sense is the the new cross-validation based training considered an extension of kernel flow techniques (i.e., what is the intuition behind a “flow” in cross validation)? How do Gaussian processes handle bounded datasets (CO2 concentrations are non-negative)? What is the alternative to Gaussian processes and neural networks for emulating complex forward models like ReFRACtor? These are some of the questions practitioners would like to know before adopting your GP-based methodology.
- Technical comments on the GP presentation (particularly in terms of Eq. 1 and related text): The GP regression in Eq. (1) has a very similar structure to that of Optimal Interpolation (OI; e.g., page 157 of Kalnay 2003’s textbook “Atmospheric modeling, data assimilation and predictability”) and related iterative Optimal Estimation (OE) algorithms (e.g., Eq. 15 later in the manuscript). Specifically, the terms pre-multiplying “z” look similar to the OI weight matrix W = BH^T (HBH^T + R)^{-1}, where B is the “background” covariance associated with z, R (which is \sigma \Gamma in your paper) is the measurement error covariance matrix and H (K in your notation) is the Jacobian of the observation operator (forward model in your nomenclature). To improve the reader’s intuition about GP models, it will be nice to include a short discussion on these connections as well as the underlying differences with GP (e.g., OI/OE do not have kernel evaluations in the covariance matrix definitions).
- Figure captions and in-text figure interpretations: I sometimes found that figure captions do not provide a sufficient description of their content (e.g., Fig. 6). Related to that issue, in-text interpretation of some figures was fairly minimal relative to their complexity (see next section for specific examples). I think the authors should add a few more details to make this paper more self-contained.
SPECIFIC COMMENTS
L28: It will be helpful to include references showing the lack of CO2 measurements in the Tropics.
L31-L32: Please elaborate on what total column mole-fraction CO2 is.
L46: If you are aware how many soundings are discarded in the process, include this in the text.
L68: What do you mean by model atmospheres?
L90: Please ensure consistent notation: radiances are denoted by y here, but Section 2.1 uses z for the labels.
Eq. (2): I assume there exists a more general expressions for the posterior covariance?
L117: Either provide references for these kernels or write them explicitly.
L119-L120: Explain why differentiating the kernel function is important; this comes slightly later on L125, but I think it will better to expand on this point earlier.
L125: I would say “Taking the derivative of eq. (1) wrt to x*” here.
Eq. (5): Either provide a reference for this likelihood function or explain/derive how it can be obtained.
L145: L^2 space not explicitly defined, it might be better to continue using the notion of mean squared errors to improve readability.
Eq. (6): It is not clear whether this expression can be derived from Eq. (1). The text around L150 suggests a rank-based approximation is used, but the authors should clarify the connection to the original GP regression in Eq. (1).
L156: Clarify what you mean by a k-norm.
L160-L166: It will be good to discuss how the ability to compute the kernel derivatives in closed form factors into the automatic differentiation procedure.
Line 1 of Algorithm 1: Shouldn’t R be initialized according to the \theta_1 values, which are set to 1? That is, the initial loss should be consistent with the initial choice of GP parameters.
Section 2.4: What other options are there for generating the parameters? What motivated your choice for using Sobol sequences?
FP model (L190-L215): It was not very clear to me how Eqs. (9) and (10) participate in the general forward model described by Eq. (8). For example, is F (loosely) defined as the sum of Eqs. (9) and (10)?
Fig. 1: In the figure caption, please clarify where these aerosol types are used.
L285-L287: At this point a significant time period has passed since you last described what K is; to fully appreciate the usefulness of closed-form Jacobians of the forward model (i.e., the matrix K), it will be helpful to remind them again what K is.
L204: Define loadings.
L307: What do you mean by template?
Eq. (19): diag(*,*) operation not defined, only diag(*) discussed. I assume diag(A,B) concatenates A and B?
L314: Can you confirm that the unit matrix here is different from the identity matrix?
Eq. (20): Shouldn’t \tilde z have a subscript B?
Fig. 4: It will be good to include in-text interpretation of these results.
Fig. 6 and text describing it (L344-L349): Please have a more detailed explanation of these steps.
L353: Write down the learning rate value explicitly.
L356: You mean the simulation distribution of Braverman et al. (2021)?
L358-L360: Please expand the discussion here, there is lots of information in Figs. 7 and 8 that is left uncommented.
L366-L372: Remind the readers of the practical significance for having good approximations of the averaging kernel A.
Table 2: Did you compute the Jacobian for the ReFRACtor model analytically or numerically? It will be insightful to know how challenging it is to compute the derivatives of the physical model.
L398: I believe the abbreviation QoI has not been defined before. Do you mean OI?
Eq. (25): Are the random variables epsilon and delta independent from each other? If so, one can perhaps write Eq. (25) more compactly with a single noise term that combines the mean and covariances of epsilon and delta.
L440: A reference on the 1% relative error in operator learning frameworks will be helpful.
Citation: https://doi.org/10.5194/amt-2024-63-RC1 -
RC2: 'Comment on amt-2024-63', Anonymous Referee #2, 19 Jun 2024
reply
The Author present a machine learning based radiative transfer emulator to speed up OCO CO2 retrievals. The method is based on a novel formulation known as Gaussian process regression, and also provides uncertainty estimates alongside simple regression outputs. The emulator is evaluated against independent radiative transfer simulations and is also used in CO2 retrievals from synthetic spectra, revealing a performance similar to that of the full algorithm although somewhat different retrieval uncertainties.
The presented methodology appears sound, even though I found section 2 hard to follow at times, as a number of highly technical concepts (Matérn kernel, Sobol sequence, reproducing Hilbert space norm) are introduced without much explanation for what they are. I would invite the Authors to clarify these concepts better in the revised version to make the paper more readable.
MAIN COMMENTS
- P4, eq. 1. I find the notation confusing. It is not clear to me what the function Gamma operates on. Does it operate on a vector and a matrix (as in Gamma(x*, X)) or on two matrices (as in Gamma(X,X))? Consider using a clearer notation.
- P4, eq. 2. What relationship exists between Gamma(x*, X) and Gamma(X, x*)? Are they the same? Are they one the transpose of the other?- P4, L110-111. "... sets GP regression apart from many neural network based machine learning models". Given that models such as, e.g., multilayer perceptron and radial basis function networks are also analytically differentiable, input uncertainty can be linearly propagated through them. In this sense, what is it that can be done with GR but not with MLP or RBF?
- P4, L117. The concept of Matérn kernel is invoked but it is not defined. What is such kernel and why is it relevant here?- P6, L139. It is taken for granted that the reader knows what a "relative reproducing kernel Hilbert space norm" is, but I am quite sure this is not the case for many.
- P6-7, L169-170. What is a Sobol sequence and why is it relevant in this context? I think it is too much to expect the reader to go through the Sobol (1967) without at least a summary of what this all is about. Furthermore, as far as I have seen, this paper is only available in Russian, so I would have had a hard time going through it even if I decided to embark on the endeavour while writing my review.
- P14, L316. Again, the reader is referred to section 2.4 for the concept of Sobol sequence, but also there the concept is just mentioned, and not much is said of how this method works.
MINOR COMMENTS
- P1, L10. Isn't there a more recent report?
- P2, L32. Also add the Chinese TanSat (Ran and Li, 2019, doi: 10.1016/j.scib.2019.01.019).
- P2, L36. Also add TanSat-2 (Wu et al., 2023, doi: 10.3390/rs15204904). Furthermore, I think the GeoCarb mission was cancelled towards the end of 2022.
- P3, L69. "to" -> "for", "with"?
- P4, L110. "to preclude prediction uncertainties". Do you mean "to predict"?
- P7, eq. 9. Notation is rather uncommon. Usually zenith angles are denoted by thetas instead of taus (you also do that in subsequent sections), whereas tau is reserved for optical depths, which, I think, you denote by g(lambda) (at least, if I correctly interpret your equation g(lambda) seems to be the slant optical depth, you may want to mention that)
- P8, L205. "to" -> "on"
- P8, L206. In principle fluorescence is additional radiation coming from the surface, it is not an effect of the observing system. I see later on that in your retrieval it is treated as an instrumental offset, but it may be best to keep the concepts separate at this point of the manuscript.
- P13, L299, "wave length" -> "wavelength"- P13, L318, remove "of" between "simplifies" and "computations"
- P31, Owhadi and Yoo (2018) and Owhadi and Yoo (2019) seem to be the same paper. Either it is a duplicated entry or one of the two entries is incorrect.
Citation: https://doi.org/10.5194/amt-2024-63-RC2
Viewed
HTML | XML | Total | BibTeX | EndNote | |
---|---|---|---|---|---|
265 | 106 | 12 | 383 | 13 | 7 |
- HTML: 265
- PDF: 106
- XML: 12
- Total: 383
- BibTeX: 13
- EndNote: 7
Viewed (geographical distribution)
Country | # | Views | % |
---|
Total: | 0 |
HTML: | 0 |
PDF: | 0 |
XML: | 0 |
- 1