Interactive comment on “A Singular Value Decomposition framework for retrievals with vertical distribution information from greenhouse gas column absorption spectroscopy measurements”

• P 7: forward model definition. I find this section confusing. The measurement vector y is described as the deviation in the absorption from that corresponding to xu, but that is not a measurable quantity and the statement is contradicted by Eq. (4). Near the bottom of page 8 it is claimed a valid choice for the reference profile is xu = 0, so that y is a ‘deviation’ from zero. I believe this is correct in the end, and exploits the assumed linearity of the problem, but it is still not entirely clear to me, and I think the concepts should be better explained.


Introduction
In the past few decades, anthropogenic climate change has brought a renewed interest in carbon cycle science and thus in accurate sensing of greenhouse gases (GHGs). GHG column remote-sensing measurements are made using satellite-based optical spectrometers such as those aboard the Greenhouse gas Observing Satellite (GOSAT, Kuze et al., 2009) and the Orbiting Carbon Observatory (OCO-2, Boesch et al., 2011), ground-based spectrometers such as the Total Column Carbon Observing Network (TCCON, Wunch et al., 2011) and other instruments (Gisi et al., 2012). Atmospheric measurements have also been made using airborne integrated path differential absorption (IPDA) lidar instruments (Abshire et al., 2018; Lin et al., 2015; Menzies et al., 2014; Refaat et al., 2016). While column-averaged mixing ratios are retrieved from measurements using methods ranging from simple differential absorption ratioing (Refaat et al., 2016) and least-squares line-fitting (Wunch et al., 2011) to traditional optimal estimation (OE) (Connor et al., 2008), information about the GHG vertical distribution (which we shall refer to as vertical information) is more difficult to obtain and typically not routinely reported as part of GHG retrievals.
Although in principle the traditional OE (Rodgers, 2000) is capable of extracting vertical information in the measurement, in practice the assumption of a prior GHG distribution, which is necessary for the regularization of the problem, makes the retrieval potentially bias-prone. Here, by traditional OE we mean an application of the optimal retrieval framework as described in Rodgers (2000) where the input prior covariance matrix is informative (i.e., the prior covariance matrix has at least one finite eigenvalue). In contrast, the singular value decomposition (SVD) approach (Hansen, 1990) can extract vertical information from the measurement without assuming any prior GHG distribution. The SVD method is based on retrieving the leading principal components of the trace gas mixing ratio state vector from the measurement. The vertical information contained in the principal components can provide useful information for carbon flux inferences, thanks to the correlations between the pressure broadening (and thus absorption lineshape) of two layers and their GHG mixing ratios (due to GHG vertical transport).
Published by Copernicus Publications on behalf of the European Geosciences Union.
The theoretical basis of the SVD method has been previously laid out in the context of the general underdetermined inversion problem (Hansen, 1990). Rodgers (2000) also has a discussion on the topic. Borsdorff et al. (2014) present a review of the SVD and related methods in the context of trace gas retrievals and the connections to the traditional OE as well as simple profile-scaling methods. The SVD method has also been applied to remote sensing for ozone (Hasekamp and Landgraf, 2001) and methane (Butz et al., 2010). Previous work has used the SVD method primarily to regularize the underdetermined retrieval problem but also for computational efficiency and to eliminate the need for knowledge of the prior distribution.
In this work, we choose a specific greenhouse gas measurement system, study the principal components and illustrate how they provide useful, quantifiable information about the vertical distribution of the gas. In choosing to evaluate the retrieval method via the principal components, the implicit prior used is strictly uninformative and does not cause any bias in the retrieved principal components, which we explicitly show. In addition, we explore the instrument spectral resolution necessary to obtain vertical information. Finally, we illustrate the theory using numerical simulations.
This paper also attempts to make the theoretical framework of the SVD method more accessible to readers who may not be as familiar with the matrix algebra conventions used in books like Rodgers (2000). It should be noted that many of the articles cited in Table 1 use nonmatrix equations for performing retrievals, even though the matrix formalism is more complete and general. By choosing a relatively simple CO2 IPDA lidar system to focus on, we are able to make a direct connection between the retrieval problem and the underlying physics, with no major assumptions or simplifications. We also illustrate the most important matrices so that the reader is able to get an intuitive sense of the physics beneath the matrix algebra.
The SVD method works similarly to least-squares line-fitting retrieval approaches but offers a more formal framework (Borsdorff et al., 2014). Here, we extend the approach to retrieve vertical GHG profile information without incurring bias from the regularization process. In contrast, the regularization process in the traditional OE method incurs bias when the prior GHG vertical profile is not close to the true GHG vertical profile. Biases are a concern for atmospheric carbon dioxide (CO2) measurements, since even small biases are known to affect carbon flux inversions (Chevallier et al., 2014).
The paper is organized as follows. In Sect. 2, we introduce the problem of regularization, which is intimately tied to the challenge of extracting information about the vertical distribution, and set up the radiative transfer equations and retrieval equations. We follow it up in Sect. 3 with a description of the SVD method, its ability to extract vertical information and its robustness against bias in the absence of prior information on the GHG vertical profile. In Sect. 4, we apply the SVD method to the specific case of the CO2 Sounder lidar instrument, and proceed in Sect. 5 to perform numerical simulations comparing the SVD and traditional OE methods. We then describe the implications of this work in Sect. 6 before concluding.

Retrievals from GHG absorption measurements
A retrieval seeks to extract certain information from a measurement. Even when the number of measurement samples far exceeds the number of retrieved parameters (as with column GHG absorption measurement spectra), retrieval problems may or may not be fully determined depending on the information content of the samples with respect to the retrieved parameters. In situations where the retrieval problem is fully determined, one can obtain a unique solution of the parameters of interest. When the problem is overdetermined, one can perform a least-squares fit to solve for the parameters of interest. However, for column GHG absorption measurement spectra obtained from remote sensing, the retrieval is generally underdetermined, and thus needs some kind of regularization to make it more deterministic.

Regularization of the retrieval problem and vertical information
The traditional Bayesian OE method (Rodgers, 2000) recommends linearization of the problem close to the solution followed by regularization by a term corresponding to a prior distribution for the state. SVD and related methods (Hasekamp and Landgraf, 2001) perform an unconstrained retrieval, equivalent to the use of an uninformative prior, on subspaces of the trace gas column that are informed by the measurement. These regularization methods allow a solution to be computed, but may also induce bias on either certain dimensions (SVD and related methods) or all dimensions (traditional OE) of the solution space. At this point it is useful to qualify what we mean by prior information. The use of prior information in some form is unavoidable in any kind of GHG remote-sensing retrieval, since it is not possible to simultaneously measure all the parameters needed for determining the GHG mixing ratio. For instance, the absorption depends on the spectroscopic parameters, which are determined from laboratory measurements, and the atmospheric pressure and temperature profile, which are typically obtained from weather models. A comprehensive quantification of uncertainty that includes errors arising from all these sources of "prior" information is well beyond the scope of this work. Rather, we will focus on how the assumption of a prior GHG distribution in the atmosphere could affect the retrieved estimate of the GHG profile.
Table 1. Comparison of retrieval algorithms used for GHG remote sensing based on regularization method and source of vertical information. The approximate spectral resolution (instrument linewidth; see Sect. 4.4) is given in brackets for each type of measurement. The SVD method proposed in this work extracts information on the vertical GHG distribution strictly from the measurement, making no assumption of a prior distribution. Note that use of a uniform column for vertical information is equivalent to the use of an uninformative prior.
An uninformative prior is one that fills in information necessary for a retrieval (here a GHG profile) while remaining as vague as possible. In this paper, our uninformative prior makes use of the principle of indifference, which assigns equal probability to all possibilities. Although the uninformative prior is used to determine the principal component basis for retrieval, no bias is incurred as long as the validation of the retrieved parameters is also done in the principal component basis, even if the uninformative prior differs significantly from the actual GHG profile.
Although traditional OE has become the de facto standard for satellite GHG remote sensing (Oshchepkov et al., 2013), ground-based spectrometers and airborne IPDA lidar (see Table 1) have largely avoided it and other regularization methods by resorting to dimension reduction. Typically, a fixed profile shape is assumed (Wunch et al., 2011; Abshire et al., 2018), and only a simple vertical profile-scaling parameter is retrieved. Such simple methods have the advantage of enabling more feedback on instrument performance by virtue of forcing the retrieval to derive certain information strictly from the measurement even when nonoptimal. Despite preliminary evidence to the contrary (Wunch et al., 2010), there remains the open question of whether biases are introduced by the assumption of a fixed vertical GHG profile, the potential underfitting of the absorption spectrum and the failure to exploit all the information contained in the measurement. In addition, this simple scaling of a vertical profile also precludes such instruments from discerning any information about the vertical GHG distribution.
Between the traditional OE retrieval and the least-squares fitting via a simple scaling method, there exist some intermediate choices. In a recent advance, Kulawik et al. (2017) extract the GHG mixing ratio of two vertical layers from GOSAT data using the OE method with a reduced vertical basis and an uninformative prior. The authors choose to use an uninformative prior for regularization to ensure that any vertical information can be attributed to the measurement alone. There have also been attempts to retrieve vertical information from ground-based sun spectrometer measurements by easing the constraints imposed by OE, as given in Wunch et al. (2011). Kuai et al. (2012) and Dohe (2013) used a reduced number of vertical levels and applied additional constraints via the choice of the prior covariance matrix. In fact, Cressie et al. (2017) show that, for a fully determined problem, the (non-Bayesian) least-squares fit is simply a special case of the optimal estimate using an uninformative prior. Thus, one can move back and forth along the spectrum of retrieval methods from fully Bayesian-like formalism to non-Bayesian by varying the prior assumption and the dimension of the basis describing vertical structure.
Dimension reduction via SVD has been previously used for satellite retrievals (Masiello et al., 2012; Thompson, 1992; Butz et al., 2010), ground-based spectrometers (Tukiainen et al., 2016) and laboratory laser absorption measurements (Bomse and Kane, 2006). The SVD approach described here comes closest to the one applied for satellite methane retrievals (Butz et al., 2010) but performs the retrieval in the principal component basis to eliminate bias originating from the choice of the uninformative prior used (see Sect. 3.5). Components in the reduced dimensional principal component space can be directly assimilated into flux models similarly to the way XCO2 is presently assimilated (Basu et al., 2013). Joiner and Da Silva (1998) describe a method that can ingest such components into an assimilation model based on their information content.

The radiative transfer problem
Remote-sensing measurements of GHGs are typically assimilated into a carbon flux inversion system or other modeling (see Fig. 1). We set up the radiative transfer problem and retrieval keeping in mind that the measurements are not an end in themselves. In addition, to best illustrate the SVD method, we choose a simplified measurement geometry and atmospheric conditions, all of which are satisfied by a nadir-pointed IPDA lidar instrument such as that of Abshire et al. (2018):
1. We choose a nadir sounding geometry with light traveling along a perfect vertical path. Lidar instruments satisfy this condition since they are pointed nadir and have the source and detector on the same platform.
2. We assume perfect knowledge of the optical path with a clear atmosphere. Lidar instruments are pulsed (Abshire et al., 2018; Refaat et al., 2016), or alternatively have some modulation (Lin et al., 2015), and simultaneously measure the surface elevation (via ranging) and thus the precise light path length. In addition, this ranging capability enables the time-gating of the surface returns so as to exclude aerosol backscatter, a common cause of bias.
3. We assume an undistorted measure of atmospheric transmittance with negligible instrument broadening. Lidar instruments have narrow laser linewidths, which determine their instrument lineshape function and are typically 3-4 orders of magnitude narrower than those of spectrometers. The laser linewidth is negligible compared to the width of the molecular absorption lineshape, so the laser can be assumed to be monochromatic.
4. We assume negligible interference from other atmospheric species via a careful line choice. Lidar instruments typically sample a single absorption line, rather than a full absorption band. For this narrow spectral range, absorption from other species can be ignored.
5. We assume a sufficient number of wavelength samples. Due to complexities in generating precisely tuned laser light for wavelength samples, many lidar GHG-sensing instruments (Refaat et al., 2016; Lin et al., 2015; Menzies et al., 2014) use only two wavelength samples. Here, we assume at least a few wavelength samples across the absorption line.
We divide the atmosphere into $m$ layers. We make the layers in equal intervals of pressure to keep the number of air molecules in each layer the same. The atmospheric transmittance can be expressed as the exponential of the negative sum of the absorption (expressed in optical depth units) of the individual layers of height $h_i$:

$$T(\lambda, x, b) = \exp\left(-\sum_{i=1}^{m} x_i \, \mathrm{OD}(\lambda, b_i) \, h_i\right), \tag{1}$$

where $T$ is the two-way transmittance and $\mathrm{OD}(\lambda, b_i)$ represents the spectroscopic model calculating the two-way GHG absorption in units of optical depth per distance at wavelength $\lambda$ for the atmospheric conditions $b_i$ (consisting of the atmospheric pressure and temperature). $b$ is a vector containing the profiles $b_i$. $x$ is the vector containing the GHG mixing ratio profile $x_i$. The total path length $h = \sum_i h_i$ is given in units of distance.
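As a minimal illustrative sketch of the layer-sum transmittance described above, the following Python snippet evaluates the exponential of the negative summed layer absorption. The function name `two_way_transmittance`, the array layout and all numerical values are assumptions made for illustration; in particular, the optical depth table stands in for a real spectroscopic model, which is not reproduced here.

```python
import numpy as np

def two_way_transmittance(x, od_per_length, h):
    """Layer-sum transmittance: T = exp(-sum_i x_i * OD_i * h_i).

    x             : (m,) mixing ratio of each layer (arbitrary scaled units)
    od_per_length : (n, m) two-way optical depth per unit distance,
                    one row per wavelength sample, one column per layer
    h             : (m,) layer thicknesses
    """
    x = np.asarray(x, dtype=float)
    # Optical depth contributed by each layer at each wavelength sample.
    layer_od = np.asarray(od_per_length, dtype=float) * np.asarray(h, dtype=float)
    return np.exp(-(layer_od @ x))

# Hypothetical toy numbers: 3 wavelength samples, 2 layers.
od = np.array([[0.1, 0.2],
               [0.5, 0.4],
               [0.1, 0.2]])
h = np.array([1.0, 2.0])
x = np.array([1.0, 1.0])

T = two_way_transmittance(x, od, h)
log_T = np.log(T)  # the logarithm is linear in the mixing ratio profile x
```

Taking the logarithm of the transmittance makes the dependence on the mixing ratio profile linear, which is the property the retrieval framework relies on.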
Next we define a measurement vector $y$ consisting of $n$ samples of an absorption line and define the measurement equation with noise, assuming perfect knowledge of the forward model,

$$y = F(x) + \epsilon, \tag{2}$$

where $F(x)$ is the forward model (Eq. 3) and $\epsilon$ represents the measurement noise, which will be described in Sect. 2.4. The atmospheric conditions and absorption path have been assumed fixed for each sounding and thus left out of the explicit notation. We have incorporated a measurement amplitude term $x_0$, which includes all signal attenuation and loss factors, in the vector $x$.
Additionally, we have normalized $T(\lambda, x, z)$ by $T(\lambda, x_u, z)$, where $x_u$ is the uninformative prior. We have also taken the natural logarithm to make the problem linear with respect to the change in the GHG concentration $x$, enabling the use of the tools of linear algebra. With that, $y$ is defined as the deviation in the absorption from that of a column defined by $x_u$ rather than the absorption itself. A schematic of the model parameter and measurement spaces is given in Fig. 1.
Figure 1. Schematic of the various terms involved in a greenhouse gas (GHG) measurement, retrieval and end use. The singular value decomposition (SVD) method introduces a new retrieval basis space z, which is different from the model parameter space x. In using the z basis, the SVD retrieval makes no assumptions regarding the prior GHG distribution, thus avoiding a potential source of bias and making the validation and flux modeling more straightforward.
As with most atmospheric measurements, the retrieval problem for GHG remote sensing cannot be expressed as a nonsingular analytic expression based on the forward model. In the remainder of this section, we will set up the retrieval problem analogously to Rodgers (2000) and define the various matrices needed for the solution.

Forward-model Jacobian
Having set up the radiative transfer problem in Eq. (1) and defined the forward model in Eq. (3), one can see that the problem is already linear with respect to the change in GHG concentration. For the rest of this paper, we will assume that the forward model is linear (i.e., $F(x) = Kx$). For problems that are not linear, one can take the linear approximation for small perturbations, a standard technique used extensively by Rodgers (2000).
Here, we will linearize the problem around the prior mean, $x_u$. As we will later show mathematically (Sect. 3.5) and through numerical simulations (Sect. 5), the retrieval in the principal component basis is insensitive to the choice of the uninformative prior. We can now express the measurement vector as

$$y = Kx + c + \epsilon, \tag{4}$$

where $K$ is an $n \times (m + 1)$ matrix of partial derivatives with the following form,

$$K_{ij} = \frac{\partial F_i(x)}{\partial x_j}, \tag{5}$$

and $c$ is a known constant vector. Without loss of generality, we assume that $c = 0$, since in principle it is known and could be subtracted from $y$.
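The following sketch illustrates how a Jacobian of partial derivatives could be evaluated numerically and checked against the analytic result for a log-transmittance forward model. The helper names `forward_model` and `finite_difference_jacobian`, the toy layer optical depths and the omission of the amplitude term $x_0$ are all assumptions made for illustration only.

```python
import numpy as np

def forward_model(x, layer_od, x_u):
    """Toy forward model: F(x) = ln T(x) - ln T(x_u) for a layer-sum transmittance."""
    return -(layer_od @ x) + layer_od @ x_u

def finite_difference_jacobian(f, x0, step=1e-6):
    """Approximate K[i, j] = dF_i/dx_j at x0 using forward differences."""
    x0 = np.asarray(x0, dtype=float)
    f0 = f(x0)
    K = np.empty((f0.size, x0.size))
    for j in range(x0.size):
        dx = np.zeros_like(x0)
        dx[j] = step
        K[:, j] = (f(x0 + dx) - f0) / step
    return K

# Hypothetical per-layer optical depths: 3 wavelength samples, 2 layers.
layer_od = np.array([[0.1, 0.4],
                     [0.5, 0.8],
                     [0.1, 0.4]])
x_u = np.zeros(2)  # reference column

K = finite_difference_jacobian(lambda x: forward_model(x, layer_od, x_u), x_u)
```

Because this toy forward model is exactly linear, the numerical Jacobian matches the negative layer optical depths regardless of the linearization point, consistent with the statement that the retrieval does not depend on where the problem is linearized.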

Measurement noise matrix
The measurement $y$ is associated with noise, which we characterize using the measurement error covariance matrix $S_\epsilon$ (Rodgers, 2000), which has dimensions $n \times n$. The noise is assumed to be Gaussian (random noise only) and the diagonal elements of $S_\epsilon$ represent the variance (in a large sample of identical, repeated observations) of the individual wavelength samples. For a perfect instrument, which we assume here, the off-diagonal terms, which represent covariances between different wavelength samples, are zero.

Retrieval equations
To derive an estimate of the state $x$ from measured radiance $y$, we define a loss function (or weighted least-squares error)

$$L(x) = [y - F(x)]^T S_\epsilon^{-1} [y - F(x)]. \tag{6}$$

Note that Eq. (6) is the same as the method of least squares, except that here we weight the sum of squared errors by the inverse of the measurement error covariance matrix $S_\epsilon$. This weighted sum of squared errors is widely used in regression frameworks, and it is the loss function of choice for retrievals that are not based on optimal estimation (e.g., Atmospheric Infra-Red Sounder or AIRS, Chahine et al., 2005 and Cressie et al., 2017). In contrast to the more common Bayesian treatments of the problem (Rodgers, 2000), we are not required to explicitly specify the a priori distribution for $x$ in Eq. (6).
To find the optimum $x$, we take the derivative of $L(x)$ with respect to $x$:

$$\frac{\partial L(x)}{\partial x} = \frac{\partial}{\partial x}\left[(y - Kx)^T S_\epsilon^{-1} (y - Kx)\right] = -2 K^T S_\epsilon^{-1} (y - Kx).$$

In the above equations, we have carefully exercised our choice in linearly mapping the physical world to $x$ by setting $x_u = 0$ for simplicity, and scaling $x$ such that $x_i = -1$ corresponds to the GHG concentration of the $i$th layer in the atmosphere being zero. As per Eq. (3), $F(x_u)$ is a constant, which can also be set to zero with no loss in generality. These sorts of transformations are fairly standard in the literature (Rodgers, 2000) and make the equations less complicated. The optimal state vector $\hat{x}$ that minimizes the loss function in Eq. (6) can be found by setting the derivative to 0 and solving as follows:

$$K^T S_\epsilon^{-1} K \hat{x} = K^T S_\epsilon^{-1} y. \tag{7}$$

Equation (7) can be used to solve for the optimal estimate $\hat{x}$ from a single measurement $y$. Solving for a unique $\hat{x}$ is usually not possible since it requires the inversion of the matrix $K^T S_\epsilon^{-1} K$, which is typically singular. This implies that the complete information required to retrieve a unique $\hat{x}$ is not present in the measurement $y$. The standard practice, as described in Rodgers (2000), is to use a priori information to regularize Eq. (7), but here we will explore the alternative SVD method.
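To make the singularity issue concrete, the sketch below builds a toy Jacobian and inspects the conditioning of the matrix that would have to be inverted. The helper `toy_jacobian`, which uses Lorentzian lineshapes whose widths increase from layer to layer as a crude stand-in for pressure broadening, and the noise level are assumptions for illustration; the amplitude term is omitted, so the toy state has $m$ rather than $m + 1$ elements.

```python
import numpy as np

def toy_jacobian(n=41, m=10):
    """Hypothetical Jacobian: one Lorentzian lineshape per layer.

    Rows are wavelength samples (detuning from line centre), columns are
    layers with half-widths growing from 0.1 to 1.0 (arbitrary units).
    The minus sign reflects that more gas lowers the log-transmittance.
    """
    nu = np.linspace(-3.0, 3.0, n)
    gamma = np.linspace(0.1, 1.0, m)
    return -(gamma / np.pi) / (nu[:, None] ** 2 + gamma ** 2)

K = toy_jacobian()
n, m = K.shape
sigma = 0.01                          # assumed noise standard deviation
S_eps_inv = np.eye(n) / sigma ** 2    # uncorrelated noise, equal variances

M = K.T @ S_eps_inv @ K               # matrix to be inverted in the normal equations
cond = np.linalg.cond(M)
print(f"condition number of K^T S^-1 K: {cond:.2e}")
```

Even though this toy problem has more wavelength samples than layers, the lineshapes of neighbouring layers are so similar that the matrix is numerically close to singular, so a direct inversion would amplify noise enormously.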

The singular value decomposition approach
The singular value decomposition (SVD) approach (Hansen, 1990) involves regularizing Eq. (7) by only solving for the principal components of the $(m + 1) \times (m + 1)$ matrix $K^T S_\epsilon^{-1} K$. Alternatively, it can be interpreted as inverting Eq. (7) using a reduced-rank pseudo-inverse (discussed in Sect. 3.3). Matrix SVD is a standard tool in matrix algebra which has applications that include least-squares fitting, principal component analysis (Wall et al., 2003; Madsen et al., 2004) and calculating the pseudo-inverse of a matrix, all of which are related to the approach used here.
Before getting into the formal derivation of the principal component basis, it is useful to bring in some physical intuition. The nature of the principal components is tied to the lineshapes of the various atmospheric layers. Pressure broadening of the lineshape in the atmosphere leads to the first principal component being shaped like a "mean" lineshape and representing a sort of column average. Higher-order principal components represent higher-order moments in the atmospheric profile and, as one would expect, are more challenging to measure.
The remainder of this section formally reviews and describes the SVD framework along the lines of Butz et al. (2010). In contrast to previous SVD work (Hansen, 1990; Hasekamp and Landgraf, 2001; Butz et al., 2010), we describe the mathematics underlying the SVD approach using the retrieval basis z of the principal components of $K^T S_\epsilon^{-1} K$, which we will refer to as the principal component basis. In Sect. 3.3, we connect the SVD retrieval method to a rank-reduced pseudo-inverse applied to the retrieval equation. In Sect. 3.5, we show how using the SVD method with the principal component basis can avoid bias from regularization and thus render the prior truly uninformative. Readers with a preference for an intuitive understanding based on the underlying physics can, as they read along, refer to Sect. 4, which illustrates the SVD framework applied to a specific instrument and measurement.

The z retrieval basis of principal components
To calculate the principal component basis z, we perform a singular value decomposition (Wall et al., 2003) of the matrix

$$S_\epsilon^{-1/2} K = U \Lambda V^T, \tag{8}$$

where
- $U$ is an $n \times n$ orthogonal matrix (rows consist of unit vectors that are normal to each other),
- $\Lambda$ is an $n \times (m + 1)$ matrix having all nonmain diagonal elements $(i, j : i \neq j)$ equal to zero,
- $V$ is an $(m + 1) \times (m + 1)$ orthogonal matrix.
The matrix singular value decomposition described in Eq. (8) is a standard function available in most numerical software packages. It is also equivalent to extracting the principal components of $K^T S_\epsilon^{-1} K$ via eigenvector decomposition. In a singular value decomposition, the first few rows of $V^T$ capture the most significant information contained in $S_\epsilon^{-1/2} K$. The new principal component z basis is defined by

$$z = \tilde{V}^T x = I_{m+1,p}^T V^T x, \tag{9}$$

where $\tilde{V}^T$ is a row-truncated version of $V^T$. Both $\tilde{V}$ and $I_{m+1,p}$ have dimensions $(m + 1) \times p$, where $p < (m + 1)$ and $p < n$. The truncation size $p$ depends on the information content in the measurement, with typically $2 \leq p \leq 4$ for GHG measurements described here. The choice of $p$ will be discussed in more detail in Sect. 5. We note that the truncation of $V$ leads to the matrix multiplication of $\tilde{V}$ and $\tilde{V}^T$ being noncommutative for the general case:

$$\tilde{V}^T \tilde{V} = I_p,$$

$$\tilde{V} \tilde{V}^T \neq I_{m+1}.$$

The subscript to $I$ denotes the dimensions of the identity matrix. This noncommutative behavior has implications for the types of biases resulting from the SVD truncation, as we will later see in Sect. 3.5. Finally, for completeness, we will look at transformations between the x and z bases. Given a vector $z$, one can project it back onto the x basis using

$$x = \tilde{V} z. \tag{12}$$

However, conversion using Eq. (12) only projects onto a subspace of x. Mathematically, in making a transformation from x to z using Eq. (9) and back to x using Eq. (12), any information corresponding to the $m + 1 - p$ dimensions not present in the z basis is lost. But, starting with the reduced basis space z, one can transform to the x basis and back with no loss of information.
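The construction of the truncated basis and the transformations between the x and z bases can be sketched as follows. The toy Jacobian (Lorentzian lineshapes with layer-dependent widths), the noise level and the truncation size $p = 3$ are assumptions chosen for illustration, and the amplitude term is omitted so the toy state has $m$ elements.

```python
import numpy as np

def toy_jacobian(n=41, m=10):
    """Hypothetical Jacobian: one Lorentzian lineshape per layer."""
    nu = np.linspace(-3.0, 3.0, n)
    gamma = np.linspace(0.1, 1.0, m)
    return -(gamma / np.pi) / (nu[:, None] ** 2 + gamma ** 2)

K = toy_jacobian()
n, m = K.shape
sigma = 0.01                            # assumed noise standard deviation
S_eps_inv_sqrt = np.eye(n) / sigma      # S_eps^(-1/2) for uncorrelated noise

# Full SVD of the noise-weighted Jacobian; singular values come sorted descending.
U, lam, Vt = np.linalg.svd(S_eps_inv_sqrt @ K, full_matrices=True)

p = 3                                   # assumed truncation size
V_tilde = Vt[:p, :].T                   # leading right singular vectors, shape (m, p)

x = np.ones(m)                          # arbitrary state vector
z = V_tilde.T @ x                       # x basis -> z basis
x_back = V_tilde @ z                    # z basis -> subspace of the x basis
```

Going from z to x and back to z is lossless, whereas `x_back` generally differs from `x` because the discarded dimensions cannot be recovered.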

Retrieval equations in the z basis
By substituting Eq. (12) into Eq. (7), effectively projecting the retrieval onto the subspace of x spanned by z, one can solve the retrieval equation:

$$K^T S_\epsilon^{-1} K \tilde{V} \hat{z} = K^T S_\epsilon^{-1} y,$$

$$\tilde{V}^T K^T S_\epsilon^{-1} K \tilde{V} \hat{z} = \tilde{V}^T K^T S_\epsilon^{-1} y.$$

In the second equation line above, we have multiplied both sides by $\tilde{V}^T$ to also reduce the column space of the equation to the z basis. This yields an estimate $\hat{z}$ in the z basis, $\hat{z} = G_{SVD}\, y$, where

$$G_{SVD} = [\tilde{V}^T K^T S_\epsilon^{-1} K \tilde{V}]^{-1} \tilde{V}^T K^T S_\epsilon^{-1}. \tag{13}$$

$G_{SVD}$ is a $p \times n$ matrix analogous to the $G$ or "gain" matrix used in Rodgers (2000). In determining $G_{SVD}$, one needs to ensure sufficient truncation in $\tilde{V}$ so that the $p \times p$ matrix $[\tilde{V}^T K^T S_\epsilon^{-1} K \tilde{V}]$ is invertible. Since $\tilde{V}$ consists of the eigenvectors of $K^T S_\epsilon^{-1} K$, truncation can easily be done by selecting only eigenvectors with positive eigenvalues. Equation (13), by selecting just the principal components, offers a way of regularizing and solving Eq. (7) without relying on the assumption of a prior distribution in x. This allows an alternative retrieval method to the commonly used Bayesian optimal estimation method.
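A minimal sketch of a gain matrix of this form is given below, applied to a noise-free synthetic measurement. The function `svd_gain`, the toy Jacobian, the noise level, the truncation size and the "true" profile are all hypothetical choices made for illustration rather than the configuration used in this work.

```python
import numpy as np

def toy_jacobian(n=41, m=10):
    """Hypothetical Jacobian: one Lorentzian lineshape per layer."""
    nu = np.linspace(-3.0, 3.0, n)
    gamma = np.linspace(0.1, 1.0, m)
    return -(gamma / np.pi) / (nu[:, None] ** 2 + gamma ** 2)

def svd_gain(K, S_eps_inv, p):
    """Return (G_svd, V_tilde) for a rank-p principal component retrieval."""
    M = K.T @ S_eps_inv @ K
    eigvals, eigvecs = np.linalg.eigh(M)       # eigenvalues in ascending order
    V_tilde = eigvecs[:, ::-1][:, :p]          # p leading eigenvectors
    lhs = V_tilde.T @ M @ V_tilde              # p x p, invertible for positive eigenvalues
    rhs = V_tilde.T @ K.T @ S_eps_inv
    return np.linalg.solve(lhs, rhs), V_tilde

K = toy_jacobian()
n, m = K.shape
S_eps_inv = np.eye(n) / 0.01 ** 2              # assumed uncorrelated noise
G_svd, V_tilde = svd_gain(K, S_eps_inv, p=3)

x_true = np.linspace(1.0, 1.2, m)              # hypothetical true profile
y = K @ x_true                                 # noise-free synthetic measurement
z_hat = G_svd @ y                              # retrieved principal components
```

In this noise-free case the retrieved components coincide with the projection of the true profile onto the leading eigenvectors, even though no prior profile was supplied.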

Relationship between SVD and OE retrieval
In this section we will explicitly describe the SVD retrieval as an OE retrieval with a particular uninformative prior and the replacement of the inverse with the pseudo-inverse in computing the gain matrix $G_{OE}$. Although the algebra here has been shown previously (Rodgers, 2000; Butz et al., 2010), we find it useful to think of the SVD method as simply implementing a pseudo-inverse in lieu of an inverse to solve the underdetermined retrieval equations.
We start with the analogous traditional OE version of Eq. (13) as described in Rodgers (2000):

$$\hat{x}_{OE} = x_a + G_{OE} (y - K x_a),$$

where

$$G_{OE} = (K^T S_\epsilon^{-1} K + S_a^{-1})^{-1} K^T S_\epsilon^{-1},$$

and where $x_a$ and $S_a$ are the a priori mean and covariance matrix of the state vector $x$, respectively. With no loss in generality, we set $x_a = 0$. We then use an uninformative prior where $S_a$ is infinitely large such that $S_a^{-1} = 0$. We note that the uninformative prior distribution $N(x_a, S_a \to \infty)$ is technically an improper prior in that it is not a well-defined probability distribution. However, it does yield a well-defined Bayesian posterior distribution. Note that this prior contains no information on the distribution of the state $x$, hence the name "uninformative" prior. The above equations then reduce to the following:

$$\hat{x}_{uOE} = G_{uOE}\, y,$$

$$G_{uOE} = (K^T S_\epsilon^{-1} K)^{-1} K^T S_\epsilon^{-1},$$

where the subscript uOE on $x$ and $G$ indicates that we are using an uninformative prior within OE. Without the term $S_a^{-1}$ in Eq. (17), $K^T S_\epsilon^{-1} K$ might not be full rank and hence noninvertible. We will replace its inverse with the Moore-Penrose pseudo-inverse, which is well defined for singular matrices:

$$G_{uOE} = (K^T S_\epsilon^{-1} K)^{+} K^T S_\epsilon^{-1},$$

where the superscript + above a matrix denotes its pseudo-inverse. Since $S_\epsilon$ is positive definite, $K^T S_\epsilon^{-1} K$ is positive-semidefinite and its singular value decomposition is identical to its eigenvalue decomposition. Therefore, we can express the singular value decomposition of $K^T S_\epsilon^{-1} K$ as follows:

$$K^T S_\epsilon^{-1} K = V D V^T,$$

where $D = \Lambda^T \Lambda$, an $(m + 1) \times (m + 1)$ diagonal matrix. $\Lambda$ here is the same as defined in Eq. (8). We can truncate the right-hand side of the equation to remove degenerate rows in $V$ and degenerate rows and columns in $D$, without affecting the equality. However, we choose to truncate further to rank $p$ to get numerical stability:

$$K^T S_\epsilon^{-1} K \approx \tilde{V} \tilde{D} \tilde{V}^T, \tag{20}$$

where $\tilde{V}$ and $\tilde{D}$ are truncated versions, with $\tilde{V}$ being identical to that used in Eq. (9). If the truncation is applied only to the degenerate rows, the approximation in Eq. (20) can be replaced by equality and the pseudo-inverse can be constructed from the singular value decomposition per Petersen and Pedersen (2012). The key results still hold even when we choose to more aggressively truncate $D$ to rank $p$ in our SVD method. This is equivalent to replacing the term $K^T S_\epsilon^{-1} K$ in Eq. (17) with the closest rank-$p$ matrix approximation under the Frobenius norm and then computing its pseudo-inverse (Eckart and Young, 1936):

$$(K^T S_\epsilon^{-1} K)^{+} \approx \tilde{V} \tilde{D}^{-1} \tilde{V}^T. \tag{21}$$

Substituting $\tilde{D}$ from Eq. (20) into Eq. (21), we see that OE with an uninformative prior and pseudo-inverse is equivalent to SVD:

$$\hat{x}_{uOE} = \tilde{V} \tilde{D}^{-1} \tilde{V}^T K^T S_\epsilon^{-1} y = \tilde{V} [\tilde{V}^T K^T S_\epsilon^{-1} K \tilde{V}]^{-1} \tilde{V}^T K^T S_\epsilon^{-1} y = \tilde{V} G_{SVD}\, y = \tilde{V} \hat{z}. \tag{22}$$

The result in Eq. (22) indicates that the above-modified OE retrieval equation with $x_a = 0$ and $S_a^{-1} = 0$ is mathematically identical to an SVD retrieval. In other words, the SVD retrieval may be viewed as a special case of the OE retrieval that uses an uninformative prior for the state $x$ and a pseudo-inverse for computation of the gain matrix. This has also been found by Cressie et al. (2017) in their analysis of the AIRS retrieval algorithm.
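The equivalence between the SVD gain and an uninformative-prior gain built from a rank-$p$ pseudo-inverse can be checked numerically, as in the sketch below. The toy Jacobian, noise level and truncation size are assumptions for illustration, and the rank-$p$ pseudo-inverse is assembled explicitly from the leading eigenpairs rather than through a library pseudo-inverse routine.

```python
import numpy as np

def toy_jacobian(n=41, m=10):
    """Hypothetical Jacobian: one Lorentzian lineshape per layer."""
    nu = np.linspace(-3.0, 3.0, n)
    gamma = np.linspace(0.1, 1.0, m)
    return -(gamma / np.pi) / (nu[:, None] ** 2 + gamma ** 2)

K = toy_jacobian()
n, m = K.shape
S_eps_inv = np.eye(n) / 0.01 ** 2            # assumed uncorrelated noise
M = K.T @ S_eps_inv @ K

# Eigen-decomposition of the symmetric matrix M, reordered to descending eigenvalues.
eigvals, eigvecs = np.linalg.eigh(M)
p = 3                                        # assumed truncation size
V_tilde = eigvecs[:, ::-1][:, :p]
d_tilde = eigvals[::-1][:p]

# SVD gain in the z basis.
G_svd = np.linalg.solve(V_tilde.T @ M @ V_tilde, V_tilde.T @ K.T @ S_eps_inv)

# Rank-p approximation of M, its pseudo-inverse, and the resulting x-basis gain.
M_p = V_tilde @ np.diag(d_tilde) @ V_tilde.T
M_pinv_p = V_tilde @ np.diag(1.0 / d_tilde) @ V_tilde.T
G_uoe = M_pinv_p @ K.T @ S_eps_inv
```

Mapping the SVD gain back to the x basis with the truncated eigenvector matrix reproduces `G_uoe`, which is the numerical counterpart of the algebraic identity derived above.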

SVD retrieval error covariance matrix and averaging kernels
One of the strengths of the OE method is the ability to propagate errors from the inputs to the final estimate of the state vector $x$. Given the prior covariance matrix $S_a$ and measurement-error covariance matrix $S_\epsilon$, Rodgers (2000) demonstrated that the posterior covariance matrix for the OE estimate in Eq. (16) is

$$\hat{S} = (K^T S_\epsilon^{-1} K + S_a^{-1})^{-1}. \tag{23}$$

Since we have demonstrated in Sect. 3.3 that the SVD approach is equivalent to optimal estimation with $x_a = 0$ and $S_a^{-1} = 0$, we can substitute those values into Eq. (23) to obtain the SVD posterior covariance matrix $S_{x,svd}$ as follows:

$$S_{x,svd} = (K^T S_\epsilon^{-1} K)^{-1}. \tag{24}$$

In some applications, $K^T S_\epsilon^{-1} K$ might not be full rank, and thus the expression in Eq. (24) may be approximated using a pseudo-inverse:

$$S_{x,svd} \approx (K^T S_\epsilon^{-1} K)^{+} = \tilde{V} \tilde{D}^{-1} \tilde{V}^T.$$

Note that the SVD posterior matrix in Eq. (24) is in the x basis. It is straightforward to transform it to the z basis using the linear transformation in Eq. (9):

$$S_z = \tilde{V}^T S_{x,svd} \tilde{V} = I_{m+1,p}^T (\Lambda^T \Lambda)^{+} I_{m+1,p} = \tilde{D}^{-1}.$$

Since $\Lambda$ and $I_{m+1,p}$ have nonzero elements only on the main diagonal, the retrieval error covariance matrix $S_z$ has no off-diagonal terms, implying that errors in the retrieved parameters are uncorrelated.
The averaging kernels of the z retrieval elements can be calculated using (Eskes and Boersma, 2003)

$$A_{SVD} = \frac{\partial \hat{z}}{\partial x} = G_{SVD} K = [\tilde{V}^T K^T S_\epsilon^{-1} K \tilde{V}]^{-1} \tilde{V}^T K^T S_\epsilon^{-1} K,$$

$$A_{SVD} = \tilde{V}^T. \tag{28}$$

Thus, in calculating the averaging kernel, one obtains the simplified Eq. (28), where $\tilde{V}^T$ can be directly obtained from the singular value decomposition step with the appropriate truncation. The reader should note that the degree of truncation only affects the number of components (rows) in the averaging kernel $A_{SVD}$ but not the information content (number of columns) or shapes of the individual components themselves.
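Both properties discussed in this section, the diagonal error covariance in the z basis and the averaging kernel reducing to the truncated eigenvector matrix, can be verified numerically. The sketch below does so under the same illustrative assumptions as before (toy Lorentzian Jacobian, uncorrelated noise, $p = 3$); none of these values are taken from the instrument studied in this work.

```python
import numpy as np

def toy_jacobian(n=41, m=10):
    """Hypothetical Jacobian: one Lorentzian lineshape per layer."""
    nu = np.linspace(-3.0, 3.0, n)
    gamma = np.linspace(0.1, 1.0, m)
    return -(gamma / np.pi) / (nu[:, None] ** 2 + gamma ** 2)

K = toy_jacobian()
n, m = K.shape
S_eps_inv = np.eye(n) / 0.01 ** 2            # assumed uncorrelated noise
M = K.T @ S_eps_inv @ K

eigvals, eigvecs = np.linalg.eigh(M)         # ascending eigenvalues
p = 3                                        # assumed truncation size
V_tilde = eigvecs[:, ::-1][:, :p]

reduced = V_tilde.T @ M @ V_tilde            # p x p matrix, equal to D_tilde
G_svd = np.linalg.solve(reduced, V_tilde.T @ K.T @ S_eps_inv)

S_z = np.linalg.inv(reduced)                 # retrieval error covariance in the z basis
A_svd = G_svd @ K                            # averaging kernel of the z components

off_diagonal = S_z - np.diag(np.diag(S_z))
```

The off-diagonal elements of `S_z` vanish to numerical precision, and each row of `A_svd` reproduces one retained eigenvector, so adding or removing components changes only the number of rows.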

SVD principal components free of bias from regularization and use of a prior
In practice, biases in GHG measurements occur due to several reasons, many of which are out of the scope of this paper. To limit the discussion to biases arising from solving an underdetermined problem using some form of regularization (retrieval error, which is universal to all GHG measurements), we make two further assumptions:
1. Negligible error in the knowledge of the atmospheric pressure, temperature and water vapor profile ($b$). These errors have been found to be small in practice (Abshire et al., 2018) and can be further reduced with auxiliary measurements.
2. Negligible errors in radiative transfer equations (forward model), instrument calibration or other similar systematic effects.
Retrieval errors in the traditional OE method can arise from incorrect assumptions about the true greenhouse gas profile distribution. For the SVD method, we see a potential bias in retrievals in the original x basis but not in the principal component z basis.
We will first derive the expected bias for the OE method where the input prior mean and covariance matrix are incorrect. We assume that the true state process x has the true mean $x_t$ and the true covariance matrix $S_t$. However, we assume that in practice the OE algorithm is using the prior mean $x_a$ and covariance matrix $S_a$. Note that the true prior distribution, {$x_t$, $S_t$}, and the one used in the computations, {$x_a$, $S_a$}, are not necessarily the same. When they do differ, there is an expected bias, which we will show below.
The expected bias is defined as $E(\hat{x} - x_t)$, where E() denotes the expectation value averaged over several measurements, such that the random noise $\epsilon$ averages out to zero. We now substitute the OE retrieval equation (Eq. 15), applying the forward-model equation for a true state $x_t$ with noise $\epsilon$. Simplifying the equation and substituting for $G_{OE}$, we obtain Eq. (30). Looking at Eq. (30), we see that the expected bias in OE retrievals is proportional to $(x_a - x_t)$, i.e., the difference between the prior mean used in practice and the true mean. Equation (30) also shows that when the constraint on the Bayesian OE is set too high ($S_a$ is small and thus $S_a^{-1}$ is large), there is a significant bias in the retrieval from the mismatch between the true mean and the prior mean assumed in the retrieval.
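The dependence of the OE bias on the strength of the prior constraint can be checked numerically. The sketch below assumes the standard linear OE estimator of Rodgers (2000), $\hat{x} = x_a + G_{OE}(y - K x_a)$ with $G_{OE} = (K^T S_\epsilon^{-1} K + S_a^{-1})^{-1} K^T S_\epsilon^{-1}$, and a diagonal prior covariance. The kernel, the profiles and the function name `oe_expected_bias` are hypothetical and chosen only for illustration.

```python
import numpy as np

def oe_expected_bias(K, S_eps_inv, S_a_inv, x_a, x_t):
    """Expected OE bias: retrieval of the noise-free measurement minus truth."""
    A = K.T @ S_eps_inv @ K + S_a_inv
    G = np.linalg.solve(A, K.T @ S_eps_inv)    # gain matrix G_OE
    x_hat = x_a + G @ (K @ x_t - K @ x_a)      # E[y] = K x_t because E[eps] = 0
    return x_hat - x_t

rng = np.random.default_rng(0)
n, m = 15, 20
K = rng.normal(size=(n, m))      # hypothetical kernel
S_eps_inv = np.eye(n)            # unit measurement noise (assumption)
x_t = rng.normal(size=m)         # "true" state
x_a = np.zeros(m)                # prior mean that differs from the truth

bias_norms = {}
max_closed_form_error = 0.0
for prior_sigma in (1.0, 0.1, 0.001):          # weak to strong constraint
    S_a_inv = np.eye(m) / prior_sigma**2
    bias = oe_expected_bias(K, S_eps_inv, S_a_inv, x_a, x_t)
    # Closed form: (K^T S_eps^-1 K + S_a^-1)^-1 S_a^-1 (x_a - x_t)
    closed = np.linalg.solve(K.T @ S_eps_inv @ K + S_a_inv,
                             S_a_inv @ (x_a - x_t))
    max_closed_form_error = max(max_closed_form_error,
                                np.max(np.abs(bias - closed)))
    bias_norms[prior_sigma] = np.linalg.norm(bias)
```

In this toy setting the norm of the bias grows as the prior uncertainty shrinks and approaches the norm of $(x_a - x_t)$ for the strongest constraint, in line with the qualitative reading of Eq. (30) given above.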
We now derive the bias when using the SVD method. We assume a true state $x_t$ and an uninformative prior $x_u$. We again start with the bias equation, where $\hat{x}$ now denotes the SVD retrieved result $\hat{z}$ (Eq. 13) transformed to the x basis using Eq. (12). In expanding this expression, we have deliberately left in the uninformative prior $x_u$ for better illustration of the bias. We then include the linearized forward model (Eq. 4) and the noise (Eq. 2), and apply the pseudo-inverse derivation (Eqs. 18 and 22) of the SVD retrieval to arrive at Eq. (33).
As we can see in Eq. (33), when the term $K^T S_\epsilon^{-1} K$ is singular, its product with its pseudo-inverse, $(K^T S_\epsilon^{-1} K)^{+} K^T S_\epsilon^{-1} K$, is not equal to the identity matrix, and hence the bias will generally be nonzero. This can be further illustrated by applying Eqs. (19) and (21) to obtain Eq. (34), where $V V^T$ is of rank p and not equal to the rank-(m+1) identity matrix (see Eq. 11).
Fortunately, when we look at the retrievals in the z space, the retrievals are unbiased. Given that there is no loss of information in projecting the retrieval results from the z basis to the x basis as was done above, we can simply project Eq. (34) back to the z basis using Eq. (9) and find that the expected bias vanishes, where we have used $V^T V = I_p$ from Eq. (10). It should be noted that the bias-free result holds regardless of the degree of truncation or choice of the uninformative prior. A different choice of the uninformative prior may change the Jacobian K and thus all downstream calculations including the principal component z basis, but the above bias-free result would nevertheless hold in the changed z basis. The bias-free result will be illustrated via numerical simulations in Sect. 5. As a caveat, we note that the bias derivation above assumes that the forward model is linear (as is the case for GHG retrievals discussed here; see Sect. 2.3) for both the OE and SVD retrieval, and therefore the bias equations for OE and SVD (Eqs. 30 and 34) should only hold when the assumption is true or mostly true. For the more general nonlinear forward model, both the OE and SVD retrievals might be biased; a thorough exploration of this nonlinear case is required but is beyond the scope of this paper.
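The truncation-independent, bias-free behavior in the z basis can be illustrated numerically. The sketch below assumes that z is defined by the right singular vectors of a noise-whitened kernel and that the retrieval keeps the leading p components. The random kernel `K_w`, the state `x_t` and the helper `svd_retrieve` are hypothetical stand-ins for the quantities in Eqs. (9) to (13), not the implementation used in this work.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 15, 100
K_w = rng.normal(size=(n, m + 1))   # hypothetical noise-whitened kernel
x_t = rng.normal(size=m + 1)        # arbitrary "true" state (relative to x_u)
U, lam, Vt = np.linalg.svd(K_w, full_matrices=False)

def svd_retrieve(y_w, p):
    """Least-squares estimate of the leading p principal components."""
    return (U[:, :p].T @ y_w) / lam[:p]

y_expected = K_w @ x_t              # expected measurement: noise averages to zero

z_bias = {}
for p in (1, 2, 4):                 # strong to weak truncation
    z_hat = svd_retrieve(y_expected, p)
    z_true = Vt[:p] @ x_t           # truth projected onto the same components
    z_bias[p] = np.max(np.abs(z_hat - z_true))

# Projecting back to the x basis keeps only a rank-p part of the truth,
# so the x-basis result differs from x_t even without noise.
x_hat = Vt[:4].T @ svd_retrieve(y_expected, 4)
x_bias = np.linalg.norm(x_hat - x_t)
```

Here `z_bias` is at the level of floating-point round-off for every truncation, whereas `x_bias` is of the same order as the norm of `x_t`, mirroring the contrast between the z-basis and x-basis results discussed above.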

SVD retrieval validation
SVD retrievals can be validated directly in the retrieval z basis by transforming the validation data from the parameter-space x basis using Eq. (9). Since the retrieval error covariance matrix $S_z$ is defined in this basis, the expected scatter based on the assumed noise distribution, calculated using Eq. (26), can be compared against the actual scatter based on a large number of measurements.

SVD approach applied to IPDA lidar CO2 measurements
We choose the NASA Goddard CO2 Sounder instrument concept (Abshire et al., 2018) as an example with which to describe the SVD technique. The CO2 Sounder is a lidar instrument that probes the 1572.335 nm CO2 absorption line with multiple (between 15 and 30) wavelength samples (Abshire et al., 2018). To best illustrate the matrix algebra, we choose the 15-wavelength sampling scheme from a recent field campaign (n = 15). The column absorption lineshape and wavelength sampling are shown in Fig. 1. For CO2, the atmosphere can be modeled using 100 layers (m = 100), where the layers are spaced almost evenly in pressure to have equal weight.

Forward model
The forward model can be linearized to produce the kernel matrix K as shown in Eq. (5). In Fig. 2, we illustrate K by plotting two columns and two rows of the kernel matrix. The heterogeneity of the K matrix in row and column space is key for the SVD technique to be able to extract principal components.

Measurement noise
For IPDA lidar instruments, one primary limitation is photon shot noise, which is a fundamental quantum noise with variance equal to the number of photons detected.Photon shot noise is the fundamental limiting factor of measurement precision when lidar instruments are laser power limited, which is often the case.Although other forms of noise, such as detector dark current noise, laser speckle noise and solar background noise, also play a role, their effect on the principal components is limited.For this reason and for simplicity, we will assume a photon shot noise limited lidar instrument.
For this example, we will assume an integrated photon count of $s_0$ for a wavelength sample with no CO2 absorption. This gives the optical signal level of Eq. (36) for each of the wavelength samples $\lambda_j$. From the definition of the forward model in Eq. (3), we can then set the photon shot noise of each sample (Eq. 37). Using Eq. (37), we can now define the measurement error covariance matrix $S_\epsilon$ (Eq. 38).
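As a rough sketch of the shot-noise model described above, assuming the measurement is expressed in detected photon counts so that the noise variance equals the count itself, a diagonal error covariance can be assembled as follows. The transmission values are hypothetical and do not come from the instrument model; $s_0 = 10^6$ is the value used for the simulations in Sect. 5.

```python
import numpy as np

s0 = 1.0e6                                   # photon count with no CO2 absorption
# Hypothetical transmissions across the line (wings -> center -> wings)
transmission = np.array([1.0, 0.9, 0.6, 0.4, 0.6, 0.9, 1.0])
signal = s0 * transmission                   # expected counts per wavelength sample

shot_variance = signal                       # Poisson statistics: variance = mean
S_eps = np.diag(shot_variance)               # shot noise is independent per sample
snr = signal / np.sqrt(shot_variance)        # equals sqrt(signal)
```

With these numbers the samples with no absorption have an SNR of 1000, and samples closer to the line center, where fewer photons are detected, have a lower SNR.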

SVD averaging kernels and error covariance
The CO2 column dependence of the averaging kernels is plotted in Fig. 3. The first term of the singular value decomposition is mostly derived from $x_0$, a measure of the mean signal amplitude (see Fig. 3, left). This is to be expected since every wavelength sample is, in a sense, independently measuring the signal amplitude, making it the most prominent term. It is more sensitive to wavelength samples on the wings of the absorption line since those see little CO2 absorption. The second term of the SVD is the first CO2 principal component (PC) and behaves like a column-averaged CO2 mixing ratio, or XCO2, with units of ppm. It should be noted that what is commonly referred to as the column mean in the retrieval community (including this work) should not be construed as a true column mean, i.e., one that has a flat averaging kernel and is thus insensitive to vertical transport of GHG molecules. As one can see in Fig. 3 (right), the column mean averaging kernel has some vertical dependence.
The third term from the SVD, or second CO2 PC, behaves analogously to a dipole moment (for instance, the electric dipole moment in physics) and can be assigned dipole moment units of ppm B$^2$. Analogous to the electric dipole moment, which has units of electric charge × distance, this dipole moment has units of ppm B ("charge") × B ("distance") = ppm B$^2$.
The vertical dipole PC carries information about the vertical distribution of CO2 and will be examined in detail in the next section. Typical values are between −0.5 and 0.5 ppm B$^2$, with more extreme values going up to ±1.5 ppm B$^2$. One requires a precision of about 0.1 ppm B$^2$ in retrieving the CO2 vertical dipole moment in order to provide some useful vertical information about the CO2 distribution. The fourth term, or third CO2 PC, is the quadrupole moment of the column CO2 mixing ratio profile. The averaging kernels for the first three CO2 principal components are plotted in Fig. 3 (right).
As seen earlier in Eq. (26), the SVD approach ensures that the random errors in the retrieved quantities are uncorrelated. The variance of the retrieved quantities increases with principal component order, with the vertical dipole moment being about 4.5× less precise (standard deviation) than the column mean and the vertical quadrupole moment being a further 7× less precise.

Effect of spectral resolution on SVD retrievals
Higher-order CO2 PCs rely on the differential pressure broadening in the CO2 absorption lineshape along the atmospheric column to provide information about the vertical distribution of CO2. For this reason, unlike the column XCO2, they are expected to be sensitive to the instrument spectral resolution.
For passive spectrometers that work by resolving sunlight passing through the atmosphere, the instrument spectral resolution and sampling density are directly related and often close to one another. In contrast, lidar instruments probe the atmosphere with essentially monochromatic light (i.e., the laser spectral width is much narrower than the gas absorption linewidth) and have spectral resolutions orders of magnitude better than the sampling density. It is important for the reader to note that high-quality measurements for the purposes of obtaining vertical information require high spectral resolution but not necessarily high sampling density. As we shall see in this section, for a given sampling density, the measurement precision depends strongly on the instrument spectral resolution.
We calculate the expected random noise in retrieving the first two PCs for a range of instrument spectral resolutions (see Fig. 4). The precision of the column XCO2 shows little change with poorer spectral resolution, as one would expect. In contrast, the precision of the CO2 vertical dipole moment degrades very quickly with instrument line broadening. The result has been calculated for the specific case of a wavelength sampling scheme sampling a single absorption line, i.e., the CO2 Sounder lidar sampling scheme. Nevertheless, the results still give some indication of the importance of spectral resolution.
The ability of satellite-based passive spectrometers to resolve the CO2 vertical structure is expected to be significantly hampered by their poorer spectral resolutions compared to instruments like TCCON or the CO2 Sounder. Although, in theory, random errors can be overcome by longer integration times or having more wavelength samples, in practice, as has been the experience of satellite GHG instruments as of the time of writing, systematic effects ultimately limit the accuracy of the measurement. Thus, spectral resolution is crucial in trying to resolve information on the vertical GHG distribution.
Figure 4. Retrieval uncertainty versus instrument spectral linewidth for the first two CO2 principal components (PCs): while the column XCO2 is largely unaffected by the spectral resolution, the precision of the CO2 vertical dipole moment degrades strongly with poorer resolution. We assume a CO2 instrument model, but with some instrument line broadening. The x axis denotes the full width at half maximum of the triangular instrument lineshape used to broaden the CO2 absorption. We assume photon shot noise with an SNR of 1000 for points with no CO2 absorption. The spectral resolutions of TCCON (Wunch et al., 2011), satellite GHG-sensing spectrometers (Kuze et al., 2009; Connor et al., 2008) and the CO2 Sounder instrument are indicated, though the calculations done in this work apply only to the CO2 Sounder instrument. Typical actual precisions for XCO2 are 0.15 ppm for TCCON (Wunch et al., 2011), 0.6 ppm (land only) for ACOS GOSAT (O'Dell et al., 2012), and 0.55 ppm (land only) for OCO-2 (Connor et al., 2016a).
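To give a feel for what instrument line broadening does to an absorption feature, the following sketch convolves a hypothetical Lorentzian line with a unit-area triangular instrument lineshape of varying full width at half maximum (FWHM). The line parameters, the grid and the function name `triangular_ils` are assumptions for illustration only; this is not the forward model used to produce Fig. 4.

```python
import numpy as np

def triangular_ils(fwhm, dx):
    """Unit-area triangular lineshape sampled on a grid of spacing dx."""
    half_base = fwhm                      # a triangle's half-base equals its FWHM
    t = np.arange(-half_base, half_base + dx / 2, dx)
    kernel = np.clip(1.0 - np.abs(t) / half_base, 0.0, None)
    return kernel / kernel.sum()

dx = 0.01                                 # grid spacing (arbitrary frequency units)
nu = np.arange(-50.0, 50.0 + dx / 2, dx)  # detuning from line center
line = 1.0 / (1.0 + nu**2)                # hypothetical Lorentzian, peak 1, HWHM 1

peaks = {}
for fwhm in (0.5, 2.0, 8.0):              # narrow to broad instrument lineshape
    broadened = np.convolve(line, triangular_ils(fwhm, dx), mode="same")
    peaks[fwhm] = broadened.max()
```

The peak of the broadened line drops as the instrument FWHM grows, which smears out the narrow line-center structure. Under the assumptions of this toy example, that is the structure the higher-order principal components rely on.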

Numerical simulations comparing singular value decomposition with Bayesian optimal estimation
In this section, we will look at the retrieval performance of the SVD and traditional OE (i.e., OE retrievals with finite $S_a$) methods for simulated data. After describing the methodology used for comparisons, we will highlight the pitfalls of using too strong or too weak a constraint with the traditional OE method, and how the SVD method provides useful information in the principal component basis independently of the degree of constraint. Then, we will show a case in which the SVD approach successfully extracts vertical CO2 information from the absorption measurement.

Methodology
For the simulations, we use a CO2 Sounder instrument model with 30 wavelength samples (n = 30) to better illustrate the shape of the residuals. The measurement is made over a vertical air column from the surface to the top of the atmosphere.
For the full model basis (x basis), we divide the atmosphere into 100 equal levels (m = 100), each spanning a 10 mB pressure interval. We make comparisons for three different cases, which will be described in the following subsections. Comparing SVD and OE retrievals: using simulated data y generated from a CO2 profile $x_{truth}$, we perform SVD and traditional OE retrievals and compare their results averaged over an N = 1000 ensemble projected onto the z basis. Specifically, we look at the variance and the bias compared to $z_{truth}$, which is projected from $x_{truth}$. We specify an uninformative prior $x_u$ for the SVD retrievals and a Bayesian prior mean $x_a$ and prior covariance matrix $S_a$ for the OE retrievals.
For each case, we define a "true" CO2 profile, $x_{truth}$, and compute the total absorption lineshape. We then compute the signal at the sample wavelengths (using Eq. 36) and add photon shot noise as per Eq. (37) to create a "measurement" y (see Fig. 5). For all simulations, we set $s_0 = 10^6$, which implies SNR = 1000 for points with no CO2 absorption. The measurement error covariance matrix is computed using Eq. (38). We then perform retrievals with the traditional OE (using Eq. 15) and SVD (using Eq. 13) approaches in their respective bases. By doing this for an ensemble of measurements, using the same $x_{truth}$ and $s_0$ but a different instance of noise each time, we get a set of results that can be characterized by a mean and standard deviation.
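The ensemble procedure can be sketched in a few lines. The example below is a toy version under stated assumptions: the kernel is a random noise-whitened matrix rather than the CO2 Sounder forward model, the shot noise is approximated as Gaussian with unit variance after whitening (reasonable at photon counts of order $10^6$), and the retrieval keeps the leading p components of the SVD. All variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, p, N = 30, 100, 2, 1000          # samples, layers, retained PCs, ensemble size
K_w = rng.normal(size=(n, m + 1))      # hypothetical noise-whitened kernel
U, lam, Vt = np.linalg.svd(K_w, full_matrices=False)

x_truth = 0.02 * rng.normal(size=m + 1)  # arbitrary "true" state
z_truth = Vt[:p] @ x_truth               # truth projected onto the z basis

# Same truth, a different noise instance for each ensemble member
noise = rng.normal(size=(N, n))          # whitened noise: unit variance (assumed)
y = K_w @ x_truth + noise                # shape (N, n)
z_hat = (y @ U[:, :p]) / lam[:p]         # SVD retrieval per member, shape (N, p)

bias = z_hat.mean(axis=0) - z_truth      # ensemble bias per component
scatter = z_hat.std(axis=0, ddof=1)      # ensemble standard deviation
expected = 1.0 / lam[:p]                 # predicted standard deviation
```

Comparing `scatter` with `expected`, and `bias` with the standard error `expected / sqrt(N)`, is the kind of check that the ensemble statistics reported in the following subsections are based on.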
Since the SVD principal components are unbiased, we make quantitative comparisons between the two techniques in that z basis.While the idea of using the z basis might seem new, in practice, column-averaged measurements are typically used for flux estimations.Thus, results in the z basis for the OE method do have wider implications.We also project the SVD results back onto the x basis using Eq. ( 12) to get an intuitive sense of how the SVD and OE approaches work.This last projection is essentially a reduced-rank pseudoinverse calculation (see Sect. 3.3).
For the SVD approach, we set the uninformative prior $x_u$ to be a uniform 400 ppm CO2 profile and anchor our definition of x to it. From this, x = −0.02, 0 and 0.02 would correspond to mixing ratios of 392, 400 and 408 ppm, respectively. For the OE approach, a proper choice of a Bayesian prior would factor in local meteorology, vertical mixing and confidence in global GHG models at the location in question. However, for the purpose of illustrating the workings of the OE method, we have kept the Bayesian prior mean and covariance simple. The Bayesian prior mean and variance (diagonal terms of the covariance matrix) are chosen on a case-by-case basis. For the prior covariance (off-diagonal terms in the covariance matrix), we assume a 200 mB 1/e$^2$ vertical correlation distance in the CO2 concentration in the atmosphere.
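The scaling of the state vector and a simple prior covariance of this kind can be sketched as follows. The mapping `x_to_ppm` follows directly from the values quoted above. The exponential form of the correlation, the mid-level pressure grid and the function names are assumptions, since only the 1/e$^2$ correlation distance is specified.

```python
import numpy as np

X_U_PPM = 400.0   # uniform uninformative reference profile (from the text)

def x_to_ppm(x):
    """Convert fractional deviations from the reference into ppm."""
    return X_U_PPM * (1.0 + np.asarray(x, dtype=float))

def prior_covariance(pressure_mb, sigma, corr_len_mb=200.0):
    """Prior covariance whose correlation falls to 1/e^2 at corr_len_mb.

    The exponential shape exp(-2 |dp| / L) is an assumed functional form.
    """
    dp = np.abs(pressure_mb[:, None] - pressure_mb[None, :])
    return sigma**2 * np.exp(-2.0 * dp / corr_len_mb)

pressure = np.arange(5.0, 1000.0, 10.0)      # 100 levels, 10 mB each (mid-levels)
S_a = prior_covariance(pressure, sigma=0.01) # 1 % prior uncertainty per level
```

For example, `x_to_ppm([-0.02, 0.0, 0.02])` returns values of 392, 400 and 408 ppm to within floating-point rounding, and `S_a[0, 20]`, which links two levels 200 mB apart, is smaller than the diagonal variance by a factor of $e^2$.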

Constraining the retrieval for regularization
GHG retrievals require some sort of constraint to regularize the retrieval problem (see Sect. 2.1). The level of constraint of an OE retrieval can be expressed as the relative strength of the weighting on the prior value, which is inversely proportional to the prior uncertainty. This uncertainty is specified in the prior covariance matrix $S_a$. In our simulations, the prior uncertainty of the CO2 concentration at each level ($x_i$ in x) is varied between 0.1 % and 100 % (strong to weak) depending on the case. SVD retrievals are constrained by the number of principal components used in the line-fitting. While the constraint is applied in qualitatively different ways to the two retrieval methodologies, the effect is somewhat similar, particularly for weak constraints, since the SVD method is the limiting case of a weak prior constraint (discussed in Sect. 3.3). For the SVD method, we retrieve between 1 and 4 (strong to weak) CO2 principal components depending on the case.

Case 1: Underconstrained fit
We choose a sample atmospheric CO2 profile measured in situ from an aircraft during an airborne campaign over California in 2016 (Abshire et al., 2018). For the underconstrained case, we set the prior uncertainty in $S_a$ for the traditional OE method to be 100 %. For the SVD retrieval, we include four CO2 principal components (see Fig. 3 for a description of the components) in the fit. The results for a single simulated measurement are shown in Fig. 6 (this can be contrasted with Fig. 7, which has results for an overconstrained fit). As expected, the OE retrieval (and SVD retrieval projected to the x basis) results in a CO2 column with widely varying mixing ratios. Nevertheless, in the SVD z basis, both methods produce meaningful column-averaged XCO2 results. This is due to the orthogonality of the principal component basis, ensuring that lower-order components are unaffected by large swings or errors in higher-order components.
Ensemble results (Fig. 8, left and center) further confirm that the SVD and OE methods both produce bias-free results in the principal component z basis. In addition, we see that the calculated uncertainty from Eq. (26) is in good agreement with the variance in the SVD ensemble as well as the weakly constrained OE ensemble.

Case 2: Overconstrained fit
A strong constraint puts restrictions on the state vector and prevents a retrieval from fully minimizing the residual. Here, we set the prior uncertainty in $S_a$ for the traditional OE method to be 0.1 %. For the SVD retrieval, we allow just one CO2 principal component in the fit. The effects of a strong constraint in each case are shown in Fig. 7. While the OE method shows a clear bias towards the prior mean ($X_a$), the SVD method is still able to retrieve an accurate XCO2 column mean. Again, this is due to the orthogonality of the principal component basis, ensuring that lower-order components are unaffected by the absence of higher-order components in the fitting.
Ensemble results (Fig. 8, right) further illustrate the bias in the traditional OE method with strong weighting towards the prior mean. Correspondingly, with a low uncertainty in the prior mean, the traditional OE retrievals produce a lower variance. Thus, in order to benefit from the availability of prior information, the prior mean needs to be in good agreement with the true mean. Ensemble results for the SVD method show that it remains bias-free. The calculated uncertainty for the SVD method, which is independent of the number of principal components, is unchanged, and ensemble results confirm the same. Although a rather extreme constraint has been applied for the traditional OE method, the results show that there are intrinsic problems in using a constraint that is too strong. Often, such biases are subtle and less obvious, but nevertheless affect flux measurements, which are based on several thousand soundings and are sensitive to small biases. In contrast, the SVD approach is more robust.
Figure 7. Sample retrieval for a single simulated measurement under strong constraint (one principal component for SVD, 0.1 % prior uncertainty for each CO2 level for OE). (a) Both the SVD and traditional OE approaches produce persistent residuals well above the noise levels due to the strong constraint. For the SVD method, the XCO2 column mean (first principal component in the z basis) is nevertheless bias-free. However, the OE method shows a bias when the results are projected to the z basis. (b) Results projected to the x basis also show a clear bias for the OE method, though the CO2 profile is well behaved. This shows that, when using the Bayesian prior as a regularization to get a well-behaved CO2 profile, one runs the risk of overconstraining the retrieval and incurring a bias in the column mean. See Fig. 8 (right) for ensemble results.
Figure 8. Ensemble results for retrieved CO2 parameters from numerical simulations. For a weak constraint (see Fig. 6), the SVD and OE methods both produce good results in the z basis for both the CO2 vertical dipole moment (a) and XCO2 column mean (b), which constitute the first two principal components (note that SVD and OE histograms are almost perfectly overlapped). Results are in line with the expected variance, $S_z$, from the SVD method. Under a strong constraint (c, Fig. 7), the OE method produces a smaller standard deviation but starts to incur a bias, whereas the SVD method continues to produce accurate results but with no reduction in the variance. Note that for the strong constraint case, the SVD CO2 vertical dipole moment is not retrieved.

Case 3: Extracting vertical CO2 information using the vertical dipole moment term
Having demonstrated the SVD method's general robustness, we now look at the extraction of vertical information about the CO2 distribution. During a flight over Iowa during the summer crop season in 2011, in situ measurements of the atmospheric CO2 concentration profile showed a sharp 15 ppm drawdown in the boundary layer compared to the free troposphere (Ramanathan et al., 2015). When projected onto the basis of principal components, this corresponded to a significant vertical dipole moment of −1.53 ppm B$^2$. Figure 9 shows the SVD method capturing the vertical dipole moment with an uncertainty of ±0.15 ppm B$^2$. When projected back to the x basis, the CO2 vertical profile reconstructed from the principal components shows that the SVD method is able to reproduce the overall shape but not the sharp increase in the planetary boundary layer. The OE method produces similar results despite a helpful prior from climatology data being used. Biases in the x basis are still rather high, at >5 ppm.
Figure 9. Despite a helpful Bayesian prior, the OE retrieval still differs significantly from the true profile. In addition, despite the CO2 profile differing significantly from the uninformative prior used in the SVD method, the bias in the retrieved XCO2 is small and, for this instance, likely due to random errors (see Fig. 10 for more precise comparisons using an ensemble of simulations).
Figure 10 highlights the performance of the SVD and traditional OE methods in measuring the column mean. For the SVD method, we look at ensemble results for several choices in the number of principal components, ranging from 4 (underconstrained) to 1 (overconstrained). For the OE method, we correspondingly vary the prior mean uncertainty (for each layer) from 100 % (underconstrained) to 0.1 % (overconstrained). We look at the variance and the bias of the XCO2 column mean. At the weakest constraints, the SVD and OE methods behave similarly, as expected. As the level of constraint is increased, the OE measurement starts to have a lower variance, but incurs a bias since the assumed prior CO2 profile differs from the truth (see Fig. 9). The SVD method column mean, in contrast, is unaffected despite the uninformative prior (400 ppm uniform column) also differing significantly from the truth. This illustrates the robustness of the SVD method.
Figure 11 shows similar behavior for the retrieved vertical dipole moment. As with the column mean, the SVD and traditional OE methods behave similarly at weak constraints. As the constraint is increased, the OE method starts to have a lower variance, but also incurs a significant bias.

Discussion
The SVD framework and its use of principal components provide a mathematical basis on which to determine what information can be extracted from GHG column absorption measurements. Section 3.5 confirms the notion that the retrieval of a column mean using least-squares line fitting of an absorption spectrum yields an estimate of the XGHG without incurring bias from the regularization or retrieval, regardless of the shape of the profile used in the prior (which turns out to be uninformative). Beyond the retrieval of the column mean, the SVD framework identifies higher-order modes such as the vertical gradient (vertical dipole moment), which can potentially be retrieved with sufficient measurement precision.
Although the numerical results from the SVD method have been shown for the CO2 Sounder lidar instrument, the SVD method itself can also be applied to total column absorption measurements from ground-based and satellite spectrometers, since those instruments also measure pressure-broadened absorption lineshapes in the atmosphere. A key parameter affecting the principal components and the precision to which they can be retrieved is the instrument spectral resolution or linewidth (see Fig. 4). While ground-based spectrometers like TCCON and mini-LHR have a high spectral resolution and can retrieve more than one principal component, others such as the lower-resolution Bruker EM27 (0.5 cm$^{-1}$ resolution, Gisi et al., 2012) will have significantly poorer precision for higher-order principal components. Furthermore, satellite instruments like GOSAT and OCO-2, besides having coarser spectral resolution, have the additional complication of aerosol scattering mixed with the signal (due to the lack of range gating of the surface reflected signal), which can limit the accuracy of the retrieved principal components.
Figure 10. Robust measurement of the XCO2 column mean by the SVD method. Results from ensembles of 1000 numerical experiments show that for retrievals using the SVD method for a range of constraints (one to four principal components), the variance (a) and bias (b) in the column mean XCO2 are robust. In contrast, a similar change in constraint when using the OE retrievals (changing prior uncertainty in the layer CO2 mixing ratio from 100 % to 0.1 %) produces a decrease in the variance of XCO2, but also a sharp increase in bias, above the 0.5 ppm accuracy needed for reasonable CO2 flux inversions. Insets in the lower plot illustrate the ensemble distributions at different constraint points as done in Fig. 8 (center and right plots).

Advantages of using principal components
The primary benefits of using the SVD method with retrievals in the principal component basis can be summarized as follows:
1. retrieval of higher-order terms of the greenhouse gas vertical distribution (beyond the column mean) in the atmosphere,
2. no bias from the use of an uninformative prior,
3. orthogonality of principal components leading to robust retrievals independent of the degree of constraint (number of components solved for).
The robustness of the SVD method makes it useful in situations where the prior state is not well known or the uncertainty in the prior is not well quantified. For instance, CO2 vertical profiles are measured only at a few locations around the Earth. While CO2 retrievals over those select locations could benefit from the use of a Bayesian prior, retrievals over remote regions far from those places would be better served by the SVD method since the CO2 profile there is not well known a priori (see Sect. 6.4 for when to choose SVD over OE). This is a key virtue of the SVD method.
Figure 11. Robust measurement of the CO2 vertical dipole moment by the SVD method. As in Fig. 10, we look at the variance (a) and bias (b) in the retrieved CO2 vertical dipole moment for ensembles of 1000 numerical experiments at varying constraints. As with the column XCO2, the SVD results are robust and unaffected by the changing number (two to four) of principal components. In contrast, the OE results incur a significant bias (0.1 ppm B$^2$) when the prior uncertainty (the regularizing constraint) for the CO2 mixing ratio at each layer is set at 1 %. Insets in the lower plot illustrate the ensemble distributions at the minimum and maximum constraints as done in Fig. 8 (left plot).
The robustness of the SVD method may also make it easier to use in an operational environment where atmospheric and surface conditions can change the measurement precision significantly.Rather than using advanced retrieval methods to get vertical information separately from the main retrieval of the column mean (as in Kulawik et al., 2017), one can simply retrieve several principal components in the main retrieval itself (and keep them as part of the main product), but only assimilate the components that have sufficient precision in GHG flux models.
Furthermore, when performing the retrieval in the principal component basis, the SVD method requires fewer computations than the OE method, which works in the full model basis. This has the potential to make the retrieval faster and more efficient. In addition, the reduced basis of mutually orthogonal principal components makes retrieval analysis easier. Troubleshooting systematic or forward-model errors is also simpler in the principal component basis since the basis is smaller and the prior is uninformative, allowing one to more easily see the effects (manifested as a bias) on the different components.

Practical application of the SVD method to GHG retrievals
In practice, interference from other gas species in the atmosphere (for instance, water vapor) and instrument systematic errors prevent the simultaneous realization of all benefits listed in Sect. 6.1 with the use of the SVD method. Nevertheless, one can use the SVD framework to analyze the problem and try to get most of the benefits. While a full analysis of all interferences and systematic errors and their effects on SVD retrievals is beyond the scope of this work, we give a simple example to show how certain types of interferences can be treated within the SVD framework.
The presence of a water vapor line at the shoulder of the CO 2 absorption line described in this work (also see Abshire et al., 2018) causes the principal components to have combinations of water vapor and CO 2 mixing ratios that are not physically meaningful.If one chooses to use the principal component retrieval basis, one gets benefits 2 and 3 described above but not benefit 1.If one chooses to keep the CO 2 mixing ratio principal components separate from the water vapor components in the retrieval basis, one gets benefits 1 and perhaps benefit 2 but not benefit 3.One can also use techniques like clumped fitting (Abshire et al., 2018) to use information based on spatial correlations of the water vapor mixing ratio as well as other systematic effects to try and get at all three key benefits.Kulawik et al. (2016) The SVD method discussed here bears some similarity to the approach used by Kulawik et al. (2017) to extract vertical information.Both methods use an uninformative prior and retrieve two pieces of information about the CO 2 column.Although the LMT (lowermost troposphere) CO 2 product is easier to relate to given that it represents the mixing ratio of the bottom 2.5 km of the atmosphere, it does have some sensitivity (with opposite sign) of higher altitude CO 2 concentrations similar to that of the CO 2 vertical dipole moment term discussed in this work.

Comparing the SVD method to
There are also some important differences between the two methods. In using principal components, the SVD retrieval produces orthogonal parameters that have uncorrelated errors, and thus errors in the XCO2 are uncorrelated with those of the CO2 vertical dipole moment. In contrast, given the way the information is partitioned in Kulawik et al. (2017), the LMT product is expected to be negatively correlated with the U (upper atmosphere) product. In addition, it is expected to have higher precision than the SVD vertical dipole moment for the same data. Future work will involve a quantitative comparison of retrievals using the two techniques on the same absorption data, which could better illustrate the advantages of each of these methods.
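The statement that principal-component retrievals have uncorrelated errors can be checked numerically. The following is a minimal sketch, assuming a linear forward model y = Kx + noise with independent, equal-variance noise; the random matrix K, the sizes and the noise level are hypothetical stand-ins and not the forward model or instrument parameters used in this work. In this toy setup the two leading components are arbitrary and do not correspond to a column mean or vertical dipole moment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 30 wavelength samples, 10 atmospheric layers, 2 retained components.
n_samples, n_layers, k = 30, 10, 2
K = rng.normal(size=(n_samples, n_layers))  # toy Jacobian, not a real absorption model
sigma = 0.05  # assumed standard deviation of independent measurement noise

# K = U diag(s) Vt; the rows of Vt define the principal-component basis z = Vt @ x.
U, s, Vt = np.linalg.svd(K, full_matrices=False)

# Retrieval operator for the k leading components: z_hat = G @ y.
G = (U[:, :k] / s[:k]).T

# Error covariance of z_hat for noise covariance sigma**2 * I.
S_z = sigma**2 * G @ G.T

off_diagonal = S_z - np.diag(np.diag(S_z))
print("largest off-diagonal covariance:", np.abs(off_diagonal).max())
print("component variances:", np.diag(S_z))
```

Under these assumptions the off-diagonal terms vanish to machine precision and each variance equals sigma**2 divided by the squared singular value, which is why the leading components are also the best-determined ones.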

Implications of using the traditional OE method
Going beyond the domain of trace gas retrievals to the broader problem of atmospheric sounding, the simulations shown in this paper underscore the importance of choosing a proper Bayesian prior and prior covariance if using the OE method. Ideally, the choice of these parameters will be from a large sampling of the true state space. In the absence of such data, the prior mean may be different from the true mean.
Setting or tuning of the constraint from the Bayesian prior for the purpose of regularization of the retrieval problem runs the risk of overstating prior knowledge and thus causing a bias.
In choosing between the SVD method and the traditional OE method, one needs to factor in the quality of the prior information (see Fig. 12) relative to the signal-to-noise ratio of the measurement. While the SVD method is always the safer option (less susceptible to bias), in situations when the measurement is noisy but the Bayesian prior is well characterized, the OE retrieval will result in a lower variance. Finally, the SVD method can also be used to check the validity of an OE prior used for retrieval.
We have described an approach to deducing vertical information from column GHG retrievals based on the singular value decomposition. The SVD approach does not require an assumption of a prior distribution of the GHG profile for regularizing the retrieval problem, and by using the principal component basis for retrievals, the prior is rendered uninformative. Simulations comparing the SVD method to the traditional Bayesian OE (using an informative prior) show that the SVD method is more robust and better suited to situations where prior knowledge of the CO2 concentration and distribution is lacking or poorly characterized.
Intuitively, OE derives an estimate of the state using both the measurement and prior knowledge, while SVD only uses the measurement to inform its estimate. When the prior information is correct, OE will have lower posterior uncertainty, since it can leverage an extra source of information to derive its estimate more efficiently. However, this efficiency comes at a potential cost when the prior is incorrect. For instance, we showed that when OE uses an incorrect prior mean, the estimate is guaranteed to be biased. Estimates from the SVD method in the principal component basis, on the other hand, are insensitive to incorrect information coming from the prior. The choice between SVD and OE then mostly comes down to how well one understands the prior distribution of the state of interest.
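The effect of an incorrect prior mean can be illustrated with a small numerical sketch. It assumes a linear forward model, independent noise of standard deviation sigma, and the standard linear OE estimator of Rodgers (2000); the matrix K, the profile values and the deliberately tight prior are hypothetical choices made only to make the bias visible. The retrievals are evaluated on the noise-free measurement, which gives their expectation over noise.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy problem: 30 wavelength samples, 10 layers, 2 retained components.
m, n, k = 30, 10, 2
K = rng.normal(size=(m, n))                      # toy Jacobian
sigma = 0.05                                     # assumed noise standard deviation
x_true = 400.0 + rng.normal(scale=5.0, size=n)   # "true" profile
x_a = np.full(n, 390.0)                          # deliberately wrong prior mean
S_a_inv = np.eye(n) / 0.01**2                    # deliberately tight prior covariance

U, s, Vt = np.linalg.svd(K, full_matrices=False)
y_mean = K @ x_true  # noise-free measurement, i.e. the expected measurement

# Linear OE estimate: x_a + (K^T Se^-1 K + Sa^-1)^-1 K^T Se^-1 (y - K x_a).
hessian = K.T @ K / sigma**2 + S_a_inv
x_oe = x_a + np.linalg.solve(hessian, K.T @ (y_mean - K @ x_a) / sigma**2)

# SVD estimate of the k leading principal components; no prior mean is involved.
z_svd = (U[:, :k].T @ y_mean) / s[:k]

z_true = Vt[:k] @ x_true
bias_svd = z_svd - z_true
bias_oe = Vt[:k] @ x_oe - z_true
print("SVD bias in leading components:", bias_svd)
print("OE bias in leading components: ", bias_oe)
```

In this sketch the SVD components match the projected truth to numerical precision, while the OE components are pulled toward the incorrect prior mean by an amount that grows as the prior covariance is tightened.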
In this work, we have assumed a perfect forward model and only random errors in the measurement. This is a necessary first check of the feasibility of the method. However, in practice, other sources of error such as imperfect instrument calibration, imperfect knowledge of atmospheric state and forward-model approximations play important roles. Our preliminary attempts using CO2 Sounder data from airborne field campaigns have shown that small errors in spectroscopy arising from neglecting the non-Voigt component of the lineshape can cause significant biases. These errors are beyond the scope of this paper and will be addressed in future work.
Another interesting topic is extending this work to nonlinear forward models, where the minimization of the loss function in Eq. (6) amounts to solving a nonlinear least-squares problem. In the traditional OE framework, the maximum a posteriori solutions are commonly found using some variation of Newton's method. Since we have shown that the SVD method can be viewed as an OE algorithm, its extension to the nonlinear forward model can similarly make use of the iterative Newton's method (Rodgers, 2000) to solve for the maximum a posteriori solutions. Preliminary numerical simulations indicate that the SVD method is still unbiased for nonlinear forward models as long as F(·) is sufficiently "smooth" at the retrieved state, though further studies are required.
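As an illustration of the kind of iteration involved, the following is a minimal sketch of a generic Gauss-Newton scheme for a nonlinear least-squares problem. It is not the nonlinear SVD retrieval itself, which is left to future work; the transmission-like forward model F(x) = exp(-Kx), the matrix K, the state values and the first guess are all hypothetical, and the measurement is taken to be noise-free so that convergence is easy to see.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy transmission-like forward model F(x) = exp(-K @ x); all values are hypothetical.
K = rng.uniform(0.1, 0.5, size=(20, 3))
x_true = np.array([0.5, 1.0, 0.8])

def forward(x):
    return np.exp(-K @ x)

def jacobian(x):
    # dF_j/dx_i = -F_j(x) * K[j, i]
    return -forward(x)[:, None] * K

y = forward(x_true)        # noise-free measurement, for clarity
x_hat = np.full(3, 0.7)    # first guess (initial linearization point)

for _ in range(10):
    residual = y - forward(x_hat)
    # Gauss-Newton step: least-squares solution of J @ dx = residual.
    dx, *_ = np.linalg.lstsq(jacobian(x_hat), residual, rcond=None)
    x_hat = x_hat + dx

print("retrieved state:", x_hat)
```

Each step relinearizes the forward model about the current estimate and solves a linear least-squares problem, which is the point at which a truncated principal-component solve could in principle replace the full solve.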
Future work will also explore other aspects of the measurement problem such as determining the optimal wavelength sampling. In contrast to passive spectrometers, which can have a large number of samples, lidar instruments bear some cost for each additional sample. While in theory one needs a wavelength sample for each principal component retrieved, in practice one needs to oversample the line to help reduce systematic errors (control biases). Determining the optimal wavelength sampling to best obtain information about the vertical distribution of the GHG while keeping biases low is important in the design of space-based IPDA lidar instruments for GHG measurements.
Figure 1. Schematic of the various terms involved in a greenhouse gas (GHG) measurement, retrieval and end use. The singular value decomposition (SVD) method introduces a new retrieval basis space z, which is different from the model parameter space x. In using the z basis, the SVD retrieval makes no assumptions regarding the prior GHG distribution, thus avoiding a potential source of bias and making the validation and flux modeling more straightforward.

Figure 2. Forward-model K matrix: (a) we illustrate the K matrix by plotting two columns, each corresponding to the absorption due to a certain slice of the atmosphere. With increasing atmospheric pressure, the absorption lineshape is pressure-broadened. (b) We plot three rows of K, each showing the dependence of the absorption on different parts of the atmosphere for a given measurement wavelength sample. The sample wavelength corresponding to each row has been expressed as a deviation from the absorption line center at 1572.335122 nm. The absorption is lower the further one deviates from the absorption line center.

Figure 3. SVD retrieval basis: (a) the rows of the G matrix are plotted as a function of the wavelength of the measurement samples. (b) The averaging kernels of the first three CO2 principal component terms are plotted. Each subsequent term has an additional zero crossing in the averaging kernel.
Figure 5. Comparing SVD and OE retrievals: using simulated data y generated from a CO2 profile x_truth, we perform SVD and traditional OE retrievals and compare their results, averaged over an N = 1000 ensemble and projected onto the z basis. Specifically, we look at the variance and the bias compared to z_truth, which is projected from x_truth. We specify an uninformative prior x_u for the SVD results and a Bayesian prior mean x_a and prior covariance matrix S_a for the OE results.

Figure 6. Sample retrieval for a single simulated measurement (noise instance) under a weak constraint (four principal components for SVD, 100 % prior uncertainty for each CO2 level for OE). (a) The SVD and traditional OE approaches successfully minimize the fit residual to match that of the noise, thus demonstrating convergence. Results projected to the z basis show reasonable performance of the XCO2 column mean (first principal component) but poor performances for higher-order terms (not shown), indicating overfitting to the noise. (b) Results projected to the x basis show highly oscillatory and divergent profiles due to the instability in overfitting. Thus, traditional OE results in the full model x basis are not useful and need projection onto the z basis or other transformation. Note that this has been shown for illustrative purposes. A proper evaluation of the methods requires an ensemble average of such simulations (see Fig. 8, left and center).

Figure 7. Sample retrieval for a single simulated measurement under strong constraint (one principal component for SVD, 0.1 % prior uncertainty for each CO2 level for OE). (a) Both the SVD and traditional OE approaches produce persistent residuals well above the noise levels due to the strong constraint. For the SVD method, the XCO2 column mean (first principal component in the z basis) is nevertheless bias-free. However, the OE method shows a bias when the results are projected to the z basis. (b) Results projected to the x basis also show a clear bias for the OE method, though the CO2 profile is well behaved. This shows that, when using the Bayesian prior as a regularization to get a well-behaved CO2 profile, one runs the risk of overconstraining the retrieval and incurring a bias in the column mean. See Fig. 8 (right) for ensemble results.
Figure 8. Ensemble results for retrieved CO2 parameters from numerical simulations. For a weak constraint (see Fig. 6), the SVD and OE methods both produce good results in the z basis for both the CO2 vertical dipole moment (a) and XCO2 column mean (b), which constitute the first two principal components (note that SVD and OE histograms are almost perfectly overlapped). Results are in line with the expected variance, S_z, from the SVD method. Under a strong constraint (c, Fig. 7), the OE method produces a smaller standard deviation but starts to incur a bias, whereas the SVD method continues to produce accurate results but with no reduction in the variance. Note that for the strong constraint case, the SVD CO2 vertical dipole moment is not retrieved.

Figure 9. Sample simulated measurement of vertical dipole moment using the SVD method and appropriate constraint. (a) The SVD and OE approaches demonstrate good convergence, and since the SNR is relatively high, the residuals are small. Results projected to the z basis show reasonable performance of both techniques for retrieving the XCO2 column mean (first principal component) and vertical dipole moment, with agreement within the expected variance. (b) Both methods detect the overall decrease in the CO2 concentration at low altitudes. Despite a helpful Bayesian prior, the OE retrieval still differs significantly from the true profile. In addition, despite the CO2 profile differing significantly from the uninformative prior used in the SVD method, the bias in the retrieved XCO2 is small and, for this instance, likely due to random errors (see Fig. 10 for more precise comparisons using an ensemble of simulations).
Figure 12. Decision tree for a suitable retrieval approach when the first principal component is a column mean. The quality of prior information compared to the signal-to-noise ratio (SNR) determines which retrieval method would be better suited. The SVD method (with principal components retrieved) is robust and can be applied to a range of situations. However, in situations where the prior information is good (relative to the measurement SNR), the OE method offers a clear advantage of a lower variance in the retrieved XCO2.
used the prior distribution for regularizing the CO2 retrievals but the SVD-reduced dimensionality for CH4