the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
Forward Model Emulator for Atmospheric Radiative Transfer Using Gaussian Processes And Cross Validation
Abstract. Remote sensing of atmospheric carbon dioxide (CO2) carried out by NASA’s Orbiting Carbon Observatory-2 (OCO-2) satellite mission and the related Uncertainty Quantification (UQ) effort involves repeated evaluations of a state-of-the-art atmospheric physics model. The retrieval, or solving an inverse problem, requires substantial computational resources. In this work, we propose and implement a statistical emulator to speed up the computations in the OCO-2 physics model. Our approach is based on Gaussian Process (GP) Regression, leveraging recent research on Kernel Flows and Cross Validation to efficiently learn the kernel function in the GP. We demonstrate our method by replicating the behavior of OCO-2 forward model within measurement error precision, and further show that in simulated cases, our method reproduces the CO2 retrieval performance of OCO-2 setup with orders of magnitude faster computational time. The underlying emulation problem is challenging because it is high dimensional. It is related to operator learning in the sense that the function to be approximated is mapping high-dimensional vectors to high-dimensional vectors. Our proposed approach is not only fast but also highly accurate (its relative error is less than 1 %). In contrast with Artificial Neural Network (ANN) based methods, it is interpretable and its efficiency is based on learning a kernel in an engineered and expressive family of kernels.
- Preprint
(8320 KB) - Metadata XML
- BibTeX
- EndNote
Status: open (until 08 Jun 2024)
-
RC1: 'Comment on amt-2024-63', Hristo Georgiev Chipilski, 04 May 2024
reply
SUMMARY
The authors of this study propose a data-driven surrogate of the forward model needed to relate atmospheric state to CO2 measurements in the context of the OCO-2 satellite mission. The approach is based on a powerful regression-based technique known as Gaussian Processes (GPs), which can approximate continuous functions to arbitrary accuracy (i.e., they enjoy the universal function approximation property). The paper describes and implements an enhanced GP emulator inspired by some recent developments in the field (namely, kernel flows with cross validation), and further incorporates dimension reduction techniques to efficiently deal with large volumes of data. The resulting algorithm is tested on synthetic datasets and shown to effectively recover the true forward model within a 1% relative error. Findings are encouraging and pave the way for future implementations on real CO2 datasets.
This is a very well written paper that documents all necessary technical aspects of the new GP surrogate model and offers a detailed evaluation of its performance. The authors provide an excellent motivation of the research topic, raising awareness of both the climate impacts of CO2 levels and the difficulties in measuring the concentrations of this trace gas. A particularly strong feature of the paper are the carefully justified sensitivities to various design choices in the GP algorithm; it was insightful to see how rule-of-thumb criteria for selecting an appropriate number of PC components may not be adequate (Section 5.1) and how GP predictions are robust to various dominant aerosol species (Section 5.2). Along similar lines, it was important to emphasize how domain knowledge can play a crucial role for optimizing the accuracy and cost of GP predictions (e.g., the justifications for wavelength band separation).
On account of this positive feedback, I recommend this paper to be accepted for publication in AMT after the authors address my comments below. Most of them are fairly minor and line-specific, but the next section discusses a few more general suggestions pertaining to the manuscript’s presentation that will help further increase its impact.
GENERAL COMMENTS
- Methodology (in particular L66-L84, but also other related parts of the paper discussing the novelty of your work): Given the choice of a journal, I expect the primary audience of this article to be experts in measurement techniques who might not be fully aware of all recent advances in the statistical and ML literature. Hence, I advise the authors to make their presentation slightly less technical, and instead focus on some of the practical advantages of using their enhanced GP emulator for retrieval problems. In simple terms, why is the GP method better than standard neural networks? What aspect of the retrieval problem can kernel flows help with — accuracy, computational speed, both? In what sense is the the new cross-validation based training considered an extension of kernel flow techniques (i.e., what is the intuition behind a “flow” in cross validation)? How do Gaussian processes handle bounded datasets (CO2 concentrations are non-negative)? What is the alternative to Gaussian processes and neural networks for emulating complex forward models like ReFRACtor? These are some of the questions practitioners would like to know before adopting your GP-based methodology.
- Technical comments on the GP presentation (particularly in terms of Eq. 1 and related text): The GP regression in Eq. (1) has a very similar structure to that of Optimal Interpolation (OI; e.g., page 157 of Kalnay 2003’s textbook “Atmospheric modeling, data assimilation and predictability”) and related iterative Optimal Estimation (OE) algorithms (e.g., Eq. 15 later in the manuscript). Specifically, the terms pre-multiplying “z” look similar to the OI weight matrix W = BH^T (HBH^T + R)^{-1}, where B is the “background” covariance associated with z, R (which is \sigma \Gamma in your paper) is the measurement error covariance matrix and H (K in your notation) is the Jacobian of the observation operator (forward model in your nomenclature). To improve the reader’s intuition about GP models, it will be nice to include a short discussion on these connections as well as the underlying differences with GP (e.g., OI/OE do not have kernel evaluations in the covariance matrix definitions).
- Figure captions and in-text figure interpretations: I sometimes found that figure captions do not provide a sufficient description of their content (e.g., Fig. 6). Related to that issue, in-text interpretation of some figures was fairly minimal relative to their complexity (see next section for specific examples). I think the authors should add a few more details to make this paper more self-contained.
SPECIFIC COMMENTS
L28: It will be helpful to include references showing the lack of CO2 measurements in the Tropics.
L31-L32: Please elaborate on what total column mole-fraction CO2 is.
L46: If you are aware how many soundings are discarded in the process, include this in the text.
L68: What do you mean by model atmospheres?
L90: Please ensure consistent notation: radiances are denoted by y here, but Section 2.1 uses z for the labels.
Eq. (2): I assume there exists a more general expressions for the posterior covariance?
L117: Either provide references for these kernels or write them explicitly.
L119-L120: Explain why differentiating the kernel function is important; this comes slightly later on L125, but I think it will better to expand on this point earlier.
L125: I would say “Taking the derivative of eq. (1) wrt to x*” here.
Eq. (5): Either provide a reference for this likelihood function or explain/derive how it can be obtained.
L145: L^2 space not explicitly defined, it might be better to continue using the notion of mean squared errors to improve readability.
Eq. (6): It is not clear whether this expression can be derived from Eq. (1). The text around L150 suggests a rank-based approximation is used, but the authors should clarify the connection to the original GP regression in Eq. (1).
L156: Clarify what you mean by a k-norm.
L160-L166: It will be good to discuss how the ability to compute the kernel derivatives in closed form factors into the automatic differentiation procedure.
Line 1 of Algorithm 1: Shouldn’t R be initialized according to the \theta_1 values, which are set to 1? That is, the initial loss should be consistent with the initial choice of GP parameters.
Section 2.4: What other options are there for generating the parameters? What motivated your choice for using Sobol sequences?
FP model (L190-L215): It was not very clear to me how Eqs. (9) and (10) participate in the general forward model described by Eq. (8). For example, is F (loosely) defined as the sum of Eqs. (9) and (10)?
Fig. 1: In the figure caption, please clarify where these aerosol types are used.
L285-L287: At this point a significant time period has passed since you last described what K is; to fully appreciate the usefulness of closed-form Jacobians of the forward model (i.e., the matrix K), it will be helpful to remind them again what K is.
L204: Define loadings.
L307: What do you mean by template?
Eq. (19): diag(*,*) operation not defined, only diag(*) discussed. I assume diag(A,B) concatenates A and B?
L314: Can you confirm that the unit matrix here is different from the identity matrix?
Eq. (20): Shouldn’t \tilde z have a subscript B?
Fig. 4: It will be good to include in-text interpretation of these results.
Fig. 6 and text describing it (L344-L349): Please have a more detailed explanation of these steps.
L353: Write down the learning rate value explicitly.
L356: You mean the simulation distribution of Braverman et al. (2021)?
L358-L360: Please expand the discussion here, there is lots of information in Figs. 7 and 8 that is left uncommented.
L366-L372: Remind the readers of the practical significance for having good approximations of the averaging kernel A.
Table 2: Did you compute the Jacobian for the ReFRACtor model analytically or numerically? It will be insightful to know how challenging it is to compute the derivatives of the physical model.
L398: I believe the abbreviation QoI has not been defined before. Do you mean OI?
Eq. (25): Are the random variables epsilon and delta independent from each other? If so, one can perhaps write Eq. (25) more compactly with a single noise term that combines the mean and covariances of epsilon and delta.
L440: A reference on the 1% relative error in operator learning frameworks will be helpful.
Citation: https://doi.org/10.5194/amt-2024-63-RC1
Viewed
HTML | XML | Total | BibTeX | EndNote | |
---|---|---|---|---|---|
167 | 75 | 8 | 250 | 6 | 5 |
- HTML: 167
- PDF: 75
- XML: 8
- Total: 250
- BibTeX: 6
- EndNote: 5
Viewed (geographical distribution)
Country | # | Views | % |
---|
Total: | 0 |
HTML: | 0 |
PDF: | 0 |
XML: | 0 |
- 1