Maximum likelihood representation of MIPAS profiles

In order to avoid problems connected with the content of a priori information in volume mixing ratio vertical profiles measured with the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS), a user-friendly representation of the data has been developed which will be made available in addition to the regular data product. In this representation, the data will be provided on a fixed pressure grid coarse enough to allow a virtually unconstrained retrieval. To avoid data interpolation, the grid is chosen to be a subset of the pressure grids used by the Chemistry– Climate Model Initiative and the Data Initiative within the Stratosphere-troposphere Processes And their Role in Climate (SPARC) project as well as the Intergovernmental Panel of Climate Change climatologies and model calculations. For representation, the profiles have been transformed to boxcar base functions, which means that volume mixing ratios are constant within a layer. This representation is thought to be more adequate for comparison with model data. While this method is applicable also to vertical profiles of other species, the method is discussed using ozone as an example.


Introduction
The often ill-posed nature of inverse problems in remote sensing of the atmosphere is typically fought by formal regularization, i.e. by inclusion of prior information in a Bayesian or related sense.The most common of such methods is optimal estimation (e.g.Rodgers, 1976), which was later renamed maximum a posteriori retrieval (Rodgers, 2000).Another common regularization method is that developed independently by Tikhonov (1963), Twomey (1963), and Phillips (1962).While all these problems provide profiles which are in some sense optimal, the major drawback is that the data product contains a certain amount of a priori information.This a priori content can precisely be characterized and quantified by the averaging kernel, which is the derivative of the retrieved atmospheric state with respect to the true atmospheric state.Some problems, however, still remain: (a) the need to communicate averaging kernels to the data user multiplies the amount of data to be transmitted; (b) many data users are not willing to consider averaging kernels in their analysis and prefer to take the data as they are, ignoring their content of prior information; and (c) averaging kernels vary with time, which leads to unsolved problems in trend analysis (e.g.Yoon et al., 2013) or analysis of annual cycles (e.g.Hegglin et al., 2013, andSchieferdecker et al., 2015, particularly their Fig. 2 and related discussion).On the other hand, every statistics toolbox offers solutions to deal with varying error bars.Thus, it appears desirable to offer an alternative representation of the data which is user-friendly in a sense that the data user need not worry about averaging kernels, and no averaging kernels have to be provided to the user.Further, all variation of the information content of the measurement shall solely be reflected by the error bar, while the altitude resolution shall remain constant with time.In this paper we present a retrieval scheme which provides vertical abundance profiles of atmospheric constituents measured with the Michelson Interferometer for Passive Atmospheric Sounding (MI-PAS) in a maximum likelihood representation, i.e. a representation free of formal prior information.After a description of the MIPAS instrument and observations (Sect.2), we introduce the retrieval scheme along with related terminology and describe how the maximum likelihood representation is achieved (Sect.3).In Sect. 4 resampling on a user-friendly profile representation is discussed.Recommendations on the adequate use of the resampled data are given in Sect.5, while in Sect.6 we show examples of MIPAS retrievals and compare the maximum likelihood representation to the regular data version.Finally, in Sect.7, we critically discuss the ben-efits and caveats of the new representation of the MIPAS data.

MIPAS
As a test case for the method suggested and for purposes of demonstration, measurements obtained with the Michelson Interferometer for Passive Atmospheric Sounding are used.MIPAS, a limb-viewing mid-infrared Fourier transform spectrometer (Fischer et al., 2008), made global measurements of spectrally resolved atmospheric emission between June 2004 and April 2012.These spectrally resolved radiance measurements in the wavelength range from 4.1 to 14.7 µm (685-2410 cm −1 ) are processed to produce global temperature and composition distributions.Analyses discussed in this study are based on data retrieved with the data processor developed and operated jointly by the Institute for Meteorology and Climate Research (IMK) at the Karlsruhe Institute of Technology (KIT) and the Instituto de Astrofísica de Andalucía (IAA/CSIC).
The IMK-IAA data processor relies on multi-parameter non-linear least-squares fitting of measured and modelled spectra within a framework of Newtonian iteration (von Clarmann et al., 2003).Its extension to retrievals involving nonlocal thermodynamic equilibrium emissions is described in Funke et al. (2001), while the adaption to the reduced spectral resolution measurements as recorded since 2005 is documented in von Clarmann et al. (2009).These "regular" IMK-IAA retrievals, which contain a certain amount of prior information, are taken as the basis for the alternative representation, which is henceforth called the "maximum likelihood" (ML) product.The underlying concept of a "maximum likely estimate" has been developed by Fisher and Lucka (1956) and refers to the most plausible solution of a well-posed inverse problem without consideration of prior information, as opposed to solutions within a Bayesian framework, where background or prior knowledge is used.

The maximum likelihood representation
"True" (whatever this is) vertical profiles of atmospheric state variables are regarded as continuous.This issue is critically discussed in von Clarmann ( 2014), but for the purpose of this maximum likelihood representation it is not relevant whether the ideal representation of the profile requires an infinite number of values or just a too-large number of unknowns for a well-posed retrieval.In any case, some kind of prior information or prior assumptions are needed to get an unambiguous solution to the inverse problem involved in the inference of atmospheric state variables from a finite number of measurements.The prior information or prior assumption can be used either in an explicit formal way or in an implicit way.In the latter case a sufficiently coarse sampling of the atmosphere allows a retrieval without formal prior informa-tion.In this case, the interpolation scheme which describes how the atmosphere behaves between the sampling points arguably can be regarded as a prior assumption in itself.
Constant values within layers or linear interpolation between adjacent levels are two common options to implement prior assumptions in the ML case.The ML estimator xml of the true state x is where K is the Jacobian ∂y/∂x, S y is the measurement error covariance matrix, y is the vector of measurements, f (x 0 ) are the simulated measurements as calculated for an initial guess atmospheric state x 0 , and where T indicates a transposed matrix.Variants of this estimator in the context of Newtonian iteration exist but are ignored for the moment because they are not relevant to the current discussion.
The estimator using a formal constraint is Where R is a constraint matrix which in one way or the other pushes the estimate xc towards the a priori assumption x a .Commonly used constraints are R = S −1 a , where the inverse a priori covariance matrix S −1 a pushes the estimate towards the prior information and yields an optimal estimate in a Bayesian sense (e.g.Rodgers, 2000), or setting R a squared first-order finite differences operator (Tikhonov, 1963;Twomey, 1963;Phillips, 1962), and x a = 0, where the estimate will be a smoothed version of the true profile (Steck and von Clarmann, 2001;von Clarmann et al., 2009).The presence of a formal constraint term implies that the grid on which the atmospheric state is represented can be chosen finer than the ML grid without running into problems of singular matrices.
The advantage of ML representation of vertical profiles is that the averaging kernel matrix, which includes the partial derivatives of the retrieved profile values at altitude i with respect to the true profile values at altitude j , is the unity matrix, at least when represented on the grid on which the retrieval is made (this issue will be critically discussed below).The price to pay is that the ML representation of vertical profiles is only possible if the altitude grid is coarse enough to allow a stable retrieval.A fine vertical grid in an unregularized retrieval will lead to ill-posed retrieval problems which boost the error bars (see, e.g., Rodgers, 2000, for a more detailed discussion).The most user-friendly altitude grid is the one the data user works with, because this avoids interpolation problems and saves the data user from transforming the unity averaging kernel to the new grid.For this reason, we use for our retrievals a well-established pressure grid as a "master vertical grid", which has been used (except for the 700 and 400 hPa values newly added) in the context of the SPARC Chemistry-Climate Model Validation activities (e.g.Eyring et al., 2006;Eyring et al., 2007;Hegglin et al., 2010) or the SPARC Data Initiative (Hegglin and Tegtmeier, 2011).The pressure grid points of the master grid are 1000.0,700.0, 500.0,400.0, 300.0, 250.0, 200.0, 170.0, 150.0, 130.0, 115.0, 100.0, 90.0, 80.0, 70.0, 50.0, 30.0, 20.0, 15.0, 10.0, 7.0, 5.0, 3.0, 2.0, 1.5, 1.0, 0.7, 0.5, 0.3, 0.2, 0.15, 0.1, 0.03, 0.01, 0.003, 0.001, 0.0003, 0.00003, and 0.00001 hPa.
This master vertical grid, however, is still too fine to allow stable unregularized retrievals.In order to avoid any unnecessary interpolation, the actual retrieval grid is chosen to be a subset of the master grid.The approximate allowed minimum gridwidth is determined with the method presented by von Clarmann and Grabowski ( 2007), who allow one vertical grid point for the ML retrieval per altitude range over which the sum of the diagonal of the averaging kernel of a preceding regularized fine-grid retrieval is unity.Since our ML data product is only an alternative representation to the already available regularized product, it is generated by only one iteration where the regular product is used as an initial guess.Technically, the ML product is also generated with the same processor which generates the regular product, and the only difference is the altitude grid and the regularization.The regularization of the new product uses a Tikhonovtype (Tikhonov, 1963) first-order smoothing constraint matrix with altitude-dependent regularization strength.Over a wide part of the profile the regularization strength is set so weak that the retrieval remains practically unregularized and averaging kernel diagonal elements are in good approximation unity.Only at the upper and the lower end of the profile is some noticeable regularization allowed.Profile points at these altitudes, however, are not included in the ML data product.The ML characteristics are guaranteed by providing only data points for which the diagonal of the averaging kernel matrix is larger than 0.995.The term "quasi-MLrepresentation" might be more adequate, but for the sake of avoiding complication we still stick with the term "ML representation".
For reasons of computational accuracy, the forward model used for the retrieval solves (i.e.numerically integrates) the radiative transfer equation on an altitude grid which is finer than the retrieval grid.For this purpose, the constituent and temperature profiles are interpolated to the forward model grid points linearly with altitude.This means that the profile is represented by triangular base functions.The larger peak values of the ML profiles compared to the regular representation where the peaks are more rounded is a typical feature of triangular base functions: their peak is slimmer, which is compensated by a larger peak value to be consistent with the same amount of molecules (Fig. 1).

Staircase profiles
Often profiles based on triangular base functions are not directly comparable to modelled profiles, because the latter frequently are layer models, which assume constant atmo- A MIPAS ozone profile from the regular processing (black) and the same profile in a maximum likelihood representation where linear interpolation of vmr between altitude grid points was assumed (red).This particular example is a measurement at 17.75 • S, 96.70 • W on 02 October 2009.The original retrieval has 20.6 degrees of freedom, yielding 19 steps in the maximum likelihood representation, of which 17 are actually free of formal prior information.The regular MIPAS ozone data product used here has been validated by Laeng et al. (2014Laeng et al. ( , 2015)).spheric conditions within a layer around the vertical grid point.In other words, modelled profiles often have rectangular base functions; i.e. they are "staircase profiles".Thus the retrieved ML profiles are converted in a postprocessing step into staircase profiles containing the same amount of molecules.The transformation is reported in Appendix A, and results are shown in Fig. 2.
The staircase profiles could easily be obtained without the postprocessing described above simply by using rectangular base functions in the retrieval and the underlying forward model.We think, however, that linearly interpolated profiles are more adequate because they represent the true atmosphere better than the staircase profiles with their discontinuities at the layer boundaries.To avoid related inaccuracies in the radiative transfer calculations, we consider the triangular base functions as more appropriate and accept the additional effort implied by the transformation to the rectangular base functions.

How to use the ML data
In cases when the ML data are sampled exactly on the model grid, direct comparison of observations and model data without any transformation is possible and adequate.It is sufficient that the grids match in the altitude range considered for the intercomparison.Occasionally, the observations will, for reasons discussed above, be represented in a subset of the master grid.In these cases a mass-weighted mean of the respective two or more contiguous model layers will be the adequate quantity to be compared with the observation.This approach, while still simple and intuitive, is roughly equivalent with the application of the averaging kernel evaluated on the master grid, assuming linearity in a sense that the instrument is equally sensitive to the target gas regardless of its actual distribution within the combined retrieval layer.

Results
Here we compare the differences between the regular and the ML representation of typical ozone retrievals on the basis of zonal mean volume mixing ratio cross sections (Fig. 3).The distributions are virtually equivalent with respect to amounts and morphology.No differences in information content or atmospheric state are discernable except at the ozone peak in equatorial latitudes.Slightly lower peak values are a typical characteristic of the staircase representation.The mass within a layer, however, is the same as that resulting from integration over a layer with altitude-dependent volume mixing ratios.

Discussion and conclusion
Admittedly the statement that the averaging kernel of the ML retrieval is unity can be challenged: while algebraically this is certainly true when the averaging kernel is calculated on the coarse ML retrieval grid, the correct sensitivity of the retrieval with respect to the true atmospheric state would require consideration of atmospheric variability on a finer scale (Rodgers, 2000, Sect. 10.1).The fact that the ML averaging kernel is unity only on a coarse grid implies that it does not represent the response of the retrieval to true atmospheric structures of scales too fine to be represented on the coarse grid.These finer structures, however, cannot be seen by the instrument anyway, because the coarse grid has been chosen as fine as possible to allow an unconstrained retrieval.From the fact that any finer grid would lead to an ill-posed retrieval problem, we conclude that the respective rows of the Jaco- bian are linearly dependent and that the coarse-grid averaging kernel thus is a sufficiently accurate approximation to the true averaging kernel.This concept is referred to as "weak gridding criterion" by von Clarmann (2014).
For each species the altitude grid is chosen time-invariant.Thus, the altitude resolution, which in an ML retrieval is determined solely by the vertical grid chosen, does not vary with time either.Variations of the information content of the measurements with time thus change only the error bars, while the vertical resolution remains constant.Variable error bars, however, can be handled by any advanced statistical toolbox, as opposed to varying averaging kernels.In consequence, problems in time series analysis encountered by Yoon et al. (2013) or Hegglin et al. (2013) are avoided.
In the upcoming version of MIPAS data, both representations of the data will be made available: the regular retrievals will be available along with the usual diagnostics including averaging kernels and error estimates.The easy-to-use ML data product will be made available as an alternative, for applications where related problems cannot easily be solved by application of the averaging kernels (amplitudes of annual variation; trends) or for data users who have no experience in working with data containing a priori knowledge.It must be mentioned that for some species the ML representation is inadequate.Abundances of species like CO, NO, or NO 2 vary locally by orders of magnitudes, and it is not possible to to define a global grid on which a reasonable unconstrained profile retrieval is always possible.For these species the data user will have to use the regular data product.
Our method is initially targeted at trace gas distributions measured with the limb-sounding technique and should be applicable to a wide range of instruments measuring limb emission or solar/stellar/lunar occultation.Application to nadir-viewing instruments could be more difficult because of their limited vertical resolution.Nevertheless, similar efforts were made in the context of the nadir-looking Tropospheric Emission Spectrometer (TES) by Payne et al. (2009) Figure1.A MIPAS ozone profile from the regular processing (black) and the same profile in a maximum likelihood representation where linear interpolation of vmr between altitude grid points was assumed (red).This particular example is a measurement at 17.75 • S, 96.70 • W on 02 October 2009.The original retrieval has 20.6 degrees of freedom, yielding 19 steps in the maximum likelihood representation, of which 17 are actually free of formal prior information.The regular MIPAS ozone data product used here has been validated byLaeng et al. (2014Laeng et al. ( , 2015)).

Figure 2 .
Figure 2.An ozone profile represented in triangular (black) and rectangular (red) base functions.

Figure 3 .
Figure 3. Zonal mean ozone distributions from the regular retrieval on a fine grid (top panel) and the maximum likelihood staircase representation on a coarse grid (bottom panel). .