Tomographic airborne limb sounder retrievals on an irregular grid with second-order regularisation

Multiple limb sounder measurements of the same atmospheric region taken from different directions can be combined in a 3D tomographic retrieval. Mathematically, this is a computationally expensive inverse modelling problem. It typically requires the introduction of some general knowledge of the atmosphere (regularisation) due to its underdetermined nature. This paper introduces a consistent, physically motivated (no ad hoc unphysical parameters) variant of the Tikhonov regularisation scheme based on spatial derivatives of first and second order. As shown by a case study with synthetic data, this scheme, combined with irregular grid retrieval methods, improves both the quality and the computational cost of 3D tomography, while eliminating grid dependence and the need to tune parameters for each use case. The few physical parameters required can be derived from in situ measurements and model data. An efficient Monte Carlo technique was adopted for retrieval accuracy estimation.

solution that is in best agreement with our prior knowledge of the atmospheric state and general understanding of the physics involved. The mathematical framework of the tomographic retrieval is outlined in section 2.1.
The choice of regularisation algorithm has a profound influence on retrieval quality. The classic Tikhonov regularisation scheme given in equation (2) is an established technique used for infrared limb sounding instruments such as the Michelson Interferometer for Passive Atmospheric Sounding (Steck et al., 2005; Carlotti et al., 2001) and GLORIA (Ungermann et al., 2010; Ungermann et al., 2011). It quantifies the spatial continuity of an atmospheric state by evaluating first-order spatial derivatives of the retrieved quantities. This technique serves the purpose of ruling out very pathological, oscillatory retrieval results, but requires fine tuning of several unphysical ad hoc parameters and subsequent validation. In this paper, we introduce an improved, more physically and statistically motivated approach to regularisation that requires less tuning. It relies on the calculation of both first and second spatial derivatives of atmospheric quantities, which provide more complete information about the smoothness and feasibility of a particular atmospheric state. The regularisation parameters we use are physical, in the sense that they can be estimated from in situ measurements or model data, and less grid dependent: their values, once established, can be more readily used for new retrievals.
Data retrievals from limb sounders are typically performed on a rectilinear grid. In this paper, we define a rectilinear grid by taking a set of longitudes, a set of latitudes and a set of altitudes and placing a grid point at each possible combination of these coordinates. Such grids can be a limiting factor for the efficiency of numerical calculations. Due to the exponential decrease of atmospheric density with altitude, most of the radiance along any line of sight of a limb imager comes from the vicinity of the lowest-altitude point on that line of sight (called the tangent point). The resolution of the retrieved data depends on the density of tangent points in the area. For airborne observations this density is highly inhomogeneous. The densely measured regions are limited in size, located below flight altitude only, and rarely rectangular. The large, poorly resolved areas with few or no tangent points still need to be included in the grid as long as the lines of sight of any measurements pass through them.
A rectilinear grid retrieval tends to either underresolve the well-measured area or waste memory and computation time on regions with few measurements. To avoid this problem, a tomographic retrieval on an irregular grid with Delaunay triangulation was developed (section 2). It allows for significant computational cost improvements without compromising retrieval quality.
Most retrieval error estimation techniques would be unreasonably computationally expensive in the case of a 3D tomographic retrieval. Monte Carlo methods allow a relatively quick error estimation. In order to apply Monte Carlo methods to a 3D tomographic retrieval on an irregular grid with our newly developed regularisation, dedicated algorithms had to be developed. These are presented in section 3.
The new methods described in sections 2 and 3 were tested with synthetic measurement data to compare their results and computational costs. The tests, and their results, are described in detail in section 4.
2 Regularisation improvements and irregular grids

2.1 The retrieval

The main focus of this paper is the algorithm for retrieving atmospheric quantities, such as temperature or trace gas mixing ratios, from limb sounder measurements in 3D. In this section, we outline the general inverse modelling approach to this problem and identify some aspects we aim to improve upon.
Let an atmospheric state vector x ∈ R^n represent the values of some atmospheric quantities on a finite grid and let y ∈ R^m be a set of m remote measurements taken within the region in question. The physics of the measurement process itself has to be understood for the instrument to be practical, so one can usually build a theoretical model of the measurement (the forward model) F : R^n → R^m. The inverse problem of determining x given y is then typically both underdetermined and ill-posed: many atmospheric states would result in the same measurement, but due to instrument and model errors no state would yield the exact measurement result y. One can solve this problem by finding an atmospheric state x that minimises the following quadratic cost function

J(x) = (F(x) - y)^T S^{-1} (F(x) - y) + (x - x_a)^T S_a^{-1} (x - x_a).   (1)

The first term of (1) quantifies the difference between the actual measurement and a simulated one (given that the atmospheric state is x). The matrix S^{-1} ∈ R^{m×m} represents the expected instrument errors. The second term introduces regularisation, i.e. it is larger for x that are unlikely given our general, measurement-unrelated, knowledge about the atmosphere. In practice this is achieved using an a priori state x_a, usually derived from climatologies. It is meant to introduce the best of our prior knowledge into the retrieval process without imposing any features of the kind to be measured. The precision matrix S_a^{-1} (the inverse of a covariance matrix for atmospheric states, S_a) contains the prior knowledge about the probability distribution of atmospheric states.
For a more detailed discussion of this approach refer to Rodgers (2000).

Finding the precision matrix S_a^{-1} that would correctly represent the physical and statistical properties of the atmosphere is challenging. The Tikhonov regularisation approach (Tikhonov and Arsenin, 1977) is widely used for this purpose:

S_a^{-1} = α_0 L_0^T L_0 + α_h (L_x^T L_x + L_y^T L_y) + α_v L_z^T L_z.   (2)

Here the matrix L_0 is diagonal and contains the standard deviations of the atmospheric quantities, and L_x, L_y, L_z represent spatial derivatives in the respective directions on a regular rectangular grid, estimated by forward differences. The positive constants α_0, α_h, α_v are chosen ad hoc. This is a convenient way to construct S_a^{-1}, and it serves the purpose of ruling out very pathological, oscillatory retrievals. The constants α_0, α_h, α_v are, however, unphysical and grid dependent, and hence have to be estimated (typically by trial and error) for each use case. They are not related to the properties of the atmosphere in any simple fashion. Also, the usage of first-order derivatives only is justified mostly by simplicity and the need to keep the computations cheap.
We chose a slightly different, more physically and statistically motivated approach to regularisation, which is introduced in the following section.
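As an illustration, a Tikhonov precision matrix of this form can be assembled from forward-difference operators on a small rectilinear grid. The following sketch is purely illustrative (the operator layout, the use of inverse standard deviations in L_0, and all parameter values are assumptions, not the paper's implementation):

```python
import numpy as np

def forward_diff(n):
    # (n-1) x n forward-difference operator along one axis
    D = np.zeros((n - 1, n))
    D[np.arange(n - 1), np.arange(n - 1)] = -1.0
    D[np.arange(n - 1), np.arange(1, n)] = 1.0
    return D

def tikhonov_precision(nx, ny, nz, a0, ah, av, sigma):
    """Assemble a0*L0^T L0 + ah*(Lx^T Lx + Ly^T Ly) + av*Lz^T Lz.

    L0 is taken diagonal with inverse standard deviations (an assumption;
    the text only states that L0 is diagonal).
    """
    N = nx * ny * nz
    L0 = np.eye(N) / sigma
    Ix, Iy, Iz = np.eye(nx), np.eye(ny), np.eye(nz)
    # state vector ordering: the x index varies fastest
    Lx = np.kron(Iz, np.kron(Iy, forward_diff(nx)))
    Ly = np.kron(Iz, np.kron(forward_diff(ny), Ix))
    Lz = np.kron(forward_diff(nz), np.kron(Iy, Ix))
    return (a0 * L0.T @ L0
            + ah * (Lx.T @ Lx + Ly.T @ Ly)
            + av * Lz.T @ Lz)
```

Note that with a0 > 0 the result is symmetric positive definite, which is what makes it usable as a precision matrix in the cost function.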

2.2 Covariance
This section describes the theoretical background for the regularisation algorithm used for our retrieval, i.e. the motivation for S_a^{-1} in equation (1). If the atmospheric state x has the mean x_apr and covariance matrix S_a, then its entropy is maximised if x has a multivariate normal distribution (proved e.g. by Tarantola, 2013). Hence by treating x as a random vector with this distribution we impose the smallest restriction possible while still introducing a priori information from S_a into our retrieval. Also, the probability density of x is then

p(x) ∝ exp( -(1/2) (x - x_apr)^T S_a^{-1} (x - x_apr) ).   (3)

This immediately justifies the second term of equation (1), provided that S_a^{-1} represents, to some extent, the actual statistics of the atmosphere. The precision matrix from equation (2) is, however, just a mathematical device not meant to represent the physical world, as discussed in the previous section. Furthermore, we would like to derive a coordinate-independent expression for the regularisation term of the cost function (also unlike equation (2)), as this would be useful for work with irregular grids and allow us to use the same regularisation on completely different grid geometries. To achieve these goals, we go back to the theoretical foundations of the classic Tikhonov approach, but do not limit ourselves to just minimising first-order spatial derivatives of the result.

Let us denote the departure of the atmospheric quantity f (r) from the a priori by φ (r) = f (r) − f apr (r) for some r ∈ R 3 .
In some finite volume V ⊂ R^3, we define the covariance operator C by

(C φ)(r) = ∫_V C_k(r, r') φ(r') d^3 r',   (4)

where C_k : R^3 × R^3 → R is the covariance kernel. Then we can treat the scalar fields φ(r) as elements of a Hilbert space with the product

⟨φ, ψ⟩ = ∫_V φ(r) (C^{-1} ψ)(r) d^3 r.   (5)

It induces the norm ‖φ‖^2 = ⟨φ, φ⟩, which is represented, in our discrete numerical approach, by the second term of (1).
It now remains to find an appropriate covariance kernel C_k. Let us first consider an atmosphere that is isotropic and has the same physical and statistical properties everywhere. Then one would expect C_k(r, r') = g(‖r - r'‖) with some monotonically decreasing function g : [0, ∞) → [0, ∞). The most common kernel used in the literature for fluid dynamics, meteorology and similar applications is the parametric Matérn covariance kernel (introduced, e.g. by Lim and Teo (2009)),

C_k(d) = σ^2 (2^{1-ν} / Γ(ν)) (√(2ν) d / L)^ν K_ν(√(2ν) d / L),   (6)

where d = ‖r - r'‖, and σ and L are the standard deviation and typical length scale of structures (correlation length), respectively, of the atmospheric quantity in question. K_ν is the modified Bessel function of the second kind and Γ(ν) is the Gamma function. The exponential covariance kernel

C_k(d) = σ^2 exp(-d / L)   (7)

is the special case of (6) with ν = 0.5 (the Gaussian kernel is obtained in the limit ν → ∞). We choose the exponential covariance as it closely resembles the Matérn kernel with the ν values that are typically used for fluid problems and allows for analytic derivation of the subsequently required quantities. Also, with a significant number of parameters to estimate already, we could not make good use of the flexibility provided by the free parameter of the Matérn covariance in any case.
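The relation between the Matérn and exponential kernels can be checked numerically. A short sketch (using one common Matérn parameterisation, which may differ from the convention of Lim and Teo (2009) by a rescaling of L):

```python
import numpy as np
from scipy.special import gamma, kv

def matern(d, sigma, L, nu):
    # Matérn covariance kernel; one common parameterisation (an assumption)
    x = np.sqrt(2.0 * nu) * d / L
    return sigma**2 * (2.0**(1.0 - nu) / gamma(nu)) * x**nu * kv(nu, x)

def exponential(d, sigma, L):
    # exponential covariance kernel, the nu = 0.5 special case
    return sigma**2 * np.exp(-d / L)
```

For ν = 0.5 the two kernels coincide exactly, which is the special-case relation used in the text.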
It can be shown (Tarantola, 2013) that the norm associated with the covariance (7) is

‖φ‖^2 = (1 / (8πσ^2 L^3)) ∫_V [ φ^2 + 2 L^2 (∇φ)^2 + L^4 (Δφ)^2 ] dV.   (8)

In a more realistic picture of the atmosphere, the correlation length L depends on altitude and strongly depends on direction: the correlation between vertically separated air parcels is much weaker than between air parcels separated by the same distance horizontally. We propose to deal with an anisotropic or variable L by performing a coordinate transformation such that L becomes isotropic and constant in the resulting coordinates. In particular, let U, V ⊂ R^3 and consider a bijective map ξ : V → U such that both ξ and ξ^{-1} are twice differentiable on their respective domains and have non-zero first and second derivatives everywhere.
Then, using integration by substitution and basic vector calculus identities, equation (8) can be rewritten in the transformed coordinates as equation (9). Here (Dξ)_ij = ∂ξ_i/∂u_j is the Jacobian, and we define the matrix (D^2 f)_ij = ∂^2 f/(∂u_i ∂u_j). The map ξ can be chosen so that all its spatial derivatives can be computed analytically, and ∇(φ(ξ)), D^2(φ(ξ)) are the derivatives of φ in the transformed space, so the numerical evaluation of (9) is similar in complexity to that of (8), the only notable difference being the need to compute the mixed derivatives ∂^2 φ/(∂u_i ∂u_j), i ≠ j, if Dξ is not diagonal. A large and relevant class of suitable maps ξ can be defined, for example, by assuming that the correlation lengths in the vertical and horizontal direction are smooth functions of altitude, L_v(z),

L_h(z) (we use Cartesian coordinates (x, y, z), where the z axis is vertical). Then the map

ξ(x, y, z) = ( x / L_h(z), y / L_h(z), ∫_0^z dz' / L_v(z') )   (10)

defines a corresponding transformation. If, in addition, L_h is constant, mixed derivatives are not required and we can simply evaluate ∇(φ(ξ)), Δ(φ(ξ)) in the transformed space. Transformations of this type could be used, for example, to perform the regularisation in tropopause-based coordinates and use different correlation lengths for the troposphere and the stratosphere. For the study described in this paper, we only introduce anisotropy (i.e. L_v and L_h are different, but do not depend on altitude) and equation (8) becomes

‖φ‖^2 = (1 / (8πσ^2 L_h^2 L_v)) ∫_V [ φ^2 + 2 (L_h^2 φ_x^2 + L_h^2 φ_y^2 + L_v^2 φ_z^2) + (L_h^2 φ_xx + L_h^2 φ_yy + L_v^2 φ_zz)^2 ] dV,   (11)

where subscripts of φ denote partial derivatives. To implement the calculation of this cost function on an irregular grid one needs to have a triangulation for that grid, an interpolation algorithm (both described in section 2.3), a method for calculating the spatial derivatives ∇φ and Δφ (section 2.4), and a way to compute volume integrals (section 2.5). The estimation of the physical parameters in equation (11) is discussed in section 4.3.
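A minimal sketch of the stretching map for the constant-L case used in this study (the function name and interface are illustrative assumptions):

```python
import numpy as np

def xi(r, L_h, L_v):
    # Stretch physical coordinates so that the correlation structure becomes
    # isotropic with unit correlation length (constant L_h, L_v special case).
    x, y, z = r
    return np.array([x / L_h, y / L_h, z / L_v])
```

In these coordinates the isotropic norm with L = 1 can be evaluated directly, which is the basis of the anisotropic regularisation.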

2.3 Delaunay triangulation and interpolation
We aim to perform a retrieval on an arbitrary, finite grid. This section explains how vertically stretched Delaunay triangulation with linear interpolation is employed for that purpose.
To maintain generality and compatibility with arbitrary grids, we partition the retrieval volume into Euclidean simplexes (tetrahedrons in our 3D case) with vertices at grid points. Many such partitions exist, but it is beneficial to ensure that the number of very elongated tetrahedrons, which increase interpolation and volume integration errors, is kept to a minimum.
The standard technique to achieve this is to use a Delaunay triangulation (Delaunay (1934), Boissonnat and Yvinec (1998)), which maximizes the minimum solid angle of any tetrahedron employed. Recall, also, from section 2.2 that atmospheric quantities in the UTLS and the stratosphere above tend to have much more variation vertically than they do horizontally within similar length scales. This difference can be quantified by comparing the horizontal and vertical correlation lengths L_h and L_v.
Hence, stretching the space vertically by means of coordinate transformation (x, y, z) → (x, y, ηz) with η ∼ L h /L v before constructing a Delaunay triangulation ensures that, on average, the amount of variation of the retrieved quantity in vertical and horizontal directions is similar. This, in turn, allows us to make use of the aforementioned benefits of Delaunay partitioning.
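This stretch-then-triangulate step can be sketched with scipy's Delaunay implementation (the point cloud, volume dimensions and correlation lengths are illustrative assumptions):

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
# illustrative point cloud in a 400 km x 400 km x 10 km retrieval volume
pts = rng.uniform(size=(200, 3)) * np.array([400.0, 400.0, 10.0])

L_h, L_v = 100.0, 0.5          # illustrative correlation lengths (km)
eta = L_h / L_v                # vertical stretch factor
stretched = pts * np.array([1.0, 1.0, eta])

tri = Delaunay(stretched)      # triangulate in the stretched coordinates
```

The resulting tetrahedra tend to be well-shaped in the stretched space, where the variation of the retrieved quantity is, on average, similar in all directions.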
In our implementation, we retrieve several quantities on the same grid, and L_h and L_v may differ for each of them. The cost function of equation (11) is evaluated separately for each quantity, so the regularisation can still be calculated correctly as long as the spatial structures of every quantity are adequately sampled by the grid. In practice, this means that as long as the ratios L_h/L_v are of the same order of magnitude for all quantities, they can all be retrieved on one grid. Fortunately, this is usually the case.
In order to keep the computational costs low, we use a simple linear interpolation scheme: in each Delaunay cell we express an atmospheric quantity f at any point r as

f(r) = f(r_0) + k · (r - r_0).   (12)

If r_i, 0 ≤ i ≤ 3, are the four vertices of the Delaunay cell, the (unique) constant gradient k can be obtained from a system of 3 linear equations. This method ensures that the interpolated quantity is continuous and consistent with the volume integration scheme described in section 2.5. The implementation is simple and fast, as any point inside a given Delaunay cell can be interpolated with data about that cell only. The gradient ∇f of the atmospheric quantity is, however, generally discontinuous at cell boundaries, making the interpolation unsuitable for direct use in spatial derivative evaluation.
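The constant gradient k of the linear interpolant in one cell follows from a 3-by-3 linear system built from the vertex values; a sketch:

```python
import numpy as np

def cell_gradient(vertices, values):
    """Constant gradient k of the linear interpolant in one Delaunay cell.

    vertices: (4, 3) array of the cell vertex coordinates r_0..r_3
    values:   (4,) array of f(r_0)..f(r_3)
    """
    A = vertices[1:] - vertices[0]      # 3x3 matrix of edge vectors from r_0
    b = values[1:] - values[0]          # f differences along those edges
    return np.linalg.solve(A, b)        # solves the 3-equation system for k
```

For a quantity that is exactly linear in space, the recovered gradient is exact; for smooth fields it is the cell-averaged gradient of the interpolant.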

2.4 Derivatives
In order to evaluate the cost function based on the 3D exponential covariance (11), we need to estimate ∇φ and Δφ at every grid point (as before, φ = f - f_apr is the departure of the atmospheric quantity f from the a priori). This section describes the algorithm we use to achieve that on irregular Delaunay grids.
We begin by establishing some requirements that our derivative-estimation algorithm will have to meet. Firstly, recall that our inverse model assembles the precision matrix S_a^{-1} so that (y - y_apr)^T S_a^{-1} (y - y_apr) is a numerical representation of (11) (or, more generally, (9)). Let us write S_a^{-1} = A + B, where A is a diagonal matrix representing the first integral of (11) (or the first term of (9)), and B represents the remaining terms. If B were an exact representation, it would be positive definite by construction, but this may not hold with ∇φ and Δφ obtained numerically on a finite grid. If indeed an atmospheric state vector q exists so that q^T B q = 0, then for every atmospheric state y

(y ± q - y_apr)^T B (y ± q - y_apr) = (y - y_apr)^T B (y - y_apr) ± 2 q^T B (y - y_apr),

hence one of the states y ± q would be favoured (have a smaller cost function) over y by the inverse modelling algorithm.
Therefore, a scaled version of q would appear in any retrieval as noise. This particular type of noise would only be suppressed by the cost function term (y - y_apr)^T A (y - y_apr). This guarantees only that y is not very far from the a priori, but does nothing to ensure that it is even continuous. This often results in retrievals with unphysical periodic structures (noise) and is unacceptable.
We deal with this problem by explicitly ensuring that B is positive definite. Consider the estimation of derivatives at a single grid point a ∈ R^3. ∇φ(a) and Δφ(a) are numerically estimated from the values of φ at some (say m) grid points near a. We can write this as a map

D_a : R^m → R^6, D_a(φ) = (φ_x, φ_y, φ_z, φ_xx, φ_yy, φ_zz)(a).   (15)

Now q^T B q = 0, q ≠ 0, implies that at every grid point a ∈ R^3 we have ∇φ(a) = 0, Δφ(a) = 0 for the φ(a) corresponding to the state q. Hence such a q will not exist if we require that D_a have a trivial null space for all grid points a (i.e. D_a(φ) = (0, . . . , 0) only for φ = 0). The converse is not always true, so the trivial null space requirement is a stronger condition than absolutely necessary. It is, however, advantageous because it restricts the derivative-estimation algorithm at each grid point independently and hence can be implemented without computationally expensive operations on S_a^{-1} as a whole. Since (15) will be used directly to construct S_a^{-1}, which does not depend on the atmospheric state y and hence on φ(a), D_a must be a linear map, so its null space is a vector space of dimension at least max{0, m - 6}; hence we need m ≤ 6. We also want to avoid the derivative estimation being underdetermined (m ≥ 6), so we choose m = 6, i.e. we aim to construct a linear map (15) that estimates the derivatives at a grid point a using atmospheric quantity values at 6 points around a. This prevents us from using interpolation for derivative calculations, since each interpolated value depends on atmospheric quantity values at multiple (in our case 4) grid points and, consequently, even if the same grid point is reused for several interpolated values, the total number of grid points employed exceeds 6 in any interpolation-based scheme we could come up with.
A solution satisfying the above criterion and able to deal with grid points with neighbours at irregular positions is based on polynomial fitting. For 6 grid points around a grid point a = (x_a, y_a, z_a), denoted as r_i, 1 ≤ i ≤ 6, we write a second-order Taylor expansion without mixed terms,

φ(r_i) - φ(a) = φ_x Δx_i + φ_y Δy_i + φ_z Δz_i + (1/2) φ_xx Δx_i^2 + (1/2) φ_yy Δy_i^2 + (1/2) φ_zz Δz_i^2,   (16)

where (Δx_i, Δy_i, Δz_i) = r_i - a. We solve this as a linear system for the unknowns φ_x, φ_y, φ_z, φ_xx, φ_yy, φ_zz, which can then be directly identified with the derivatives we seek. Note that, as we required, the system can be solved by inverting a 6-by-6 matrix that only depends on r_i and not on φ(r_i), so we can use this method to precompute a matrix that acts on the atmospheric state vector to return the value of the volume integral over the derivatives in (11).
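The 6-point polynomial fit can be sketched as follows: each of the 6 neighbours contributes one row of a 6-by-6 system whose unknowns are the first and (unmixed) second derivatives. This sketch covers the fitting step only, not the point-selection logic:

```python
import numpy as np

def fit_derivatives(a, points, phi_a, phi_points):
    """Estimate (phi_x, phi_y, phi_z) and (phi_xx, phi_yy, phi_zz) at a.

    Each row of the 6x6 system is a second-order Taylor expansion of
    phi(r_i) - phi(a) without mixed terms.
    """
    d = points - a                        # (6, 3) offsets r_i - a
    M = np.hstack([d, 0.5 * d**2])        # 6x6 system matrix
    sol = np.linalg.solve(M, phi_points - phi_a)
    return sol[:3], sol[3:]               # gradient, unmixed second derivatives
```

On a quadratic field without cross terms the fit is exact, which is a convenient correctness check.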
The only remaining issue is selecting a suitable set of grid points r i so that the resulting matrix that needs to be inverted would not be singular or have a very high condition number. We found that this is much easier to achieve by imposing simple geometric criteria for point selection rather than algebraic conditions.

Let a be a point on the Delaunay grid. The main idea of the algorithm we propose is to pick, for each spatial dimension d, a pair of points r_1, r_2 suitable for estimating the derivative in that direction. The points of such a pair should be sufficiently separated in the d direction (i.e. the projections of the vectors r_i - a onto the d direction should be sufficiently different), lie on opposite sides of a (unless a is at the boundary of the grid and this is not possible), and be reasonably close to a. We obtain the required 6 points by choosing one pair of points for each of the 3 dimensions. This can be implemented as follows.

Let a = (x_a, y_a, z_a), and let us write the coordinates of the other points on this grid as r_i = (x_i, y_i, z_i). Let 0 < β < 1 and γ > 1 be constants; suitable numeric values will be discussed further on. Then:
4. If neither condition of step 2 is satisfied for at least 1 dimension, repeat steps 1-3 with not only direct Delaunay grid neighbours of a, but also second neighbours (i.e. neighbours of neighbours). If this fails as well, do not calculate derivatives at the point a and simply assume them to be zero.
We found that the polynomial interpolation method is robust, and a rather low value of β = 0.3 was sufficient to produce suitable sets of neighbouring points (i.e. almost all 6-by-6 matrices resulting from such a choice of points can be reliably inverted for polynomial fitting). In practice this means that the algorithm can find a suitable pair of points for dimension d unless all the Delaunay neighbours of the point a are located very close to the other two axes w.r.t. the point a.
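Since the individual steps of the selection procedure are not fully reproduced above, the following is only a plausible reconstruction of the pair-selection idea; the acceptance criterion using β and the tie-breaking by distance are assumptions:

```python
import numpy as np

def pick_pair(a, neighbours, d, beta=0.3):
    """Pick two neighbours suitable for the derivative along axis d.

    Hypothetical reconstruction: keep neighbours whose offset projects onto
    axis d by at least beta times its length, prefer opposite sides of a,
    and take the closest candidates. Returns None when no pair is found
    (the caller would then fall back to second neighbours).
    """
    offs = neighbours - a
    proj = offs[:, d]
    dist = np.linalg.norm(offs, axis=1)
    cand = np.flatnonzero(np.abs(proj) >= beta * dist)
    if cand.size < 2:
        return None
    pos = [i for i in cand if proj[i] > 0]
    neg = [i for i in cand if proj[i] < 0]
    if pos and neg:  # opposite sides of a
        return (min(pos, key=lambda i: dist[i]),
                min(neg, key=lambda i: dist[i]))
    return tuple(sorted(cand, key=lambda i: dist[i])[:2])
```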

2.5 Volume integration
Computing (11) requires a way to estimate volume integrals on the Delaunay grid, which is described in this section.
Let a Delaunay cell have vertices r_i and face areas S_i for 0 ≤ i ≤ 3, and volume V. Then integrating an atmospheric quantity f as interpolated in (12) yields

∫_V f dV = (V/4) (f(r_0) + f(r_1) + f(r_2) + f(r_3)).   (17)

One can prove this by writing (12) with each r_i, 0 ≤ i ≤ 3, in place of r_0 and adding up the resulting 4 equations to obtain

f(r) = (1/4) Σ_i f(r_i) + k · (r - r̄),  r̄ = (1/4) Σ_i r_i.   (18)

The second term of (18) integrates to k · I with I = ∫_V (r - r̄) dV. As we can choose such coordinates for any vertex instead of r_0, the integral I has zero component in 3 linearly independent directions, so I = 0. Then integrating (18) over V yields (17) and completes the proof.
We can now see that (17) gives the exact value of the volume integral of f provided that f behaves exactly as prescribed by our interpolation scheme. Since the interpolation is linear, we may expect the values of the volume integrals to be correct to first order in the cell dimensions. Since, generally, all data fed into the inverse modelling problem is interpolated beforehand, there is little reason to expect that a more elaborate and accurate volume integration scheme would improve the accuracy of retrievals on its own. A method based on Voronoi cells was also attempted and did not produce meaningfully different results.
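The V/4 vertex-average rule for the volume integral of a linearly interpolated quantity can be checked against an analytic integral on a single tetrahedron; a sketch:

```python
import numpy as np

def tetra_volume(v):
    # volume of a tetrahedron with vertex array v of shape (4, 3)
    return abs(np.linalg.det(v[1:] - v[0])) / 6.0

def integrate_linear(v, f_vals):
    # V/4 times the sum of vertex values: exact for a linearly
    # interpolated quantity (the vertex-average rule)
    return tetra_volume(v) * f_vals.mean()
```

On the unit tetrahedron, the analytic integral of f(r) = x is 1/24, which the rule reproduces exactly.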

3 Accuracy estimation

Estimating the accuracy of remote sensing data products generated by means of inverse modelling is, in general, rather difficult due to the indirect manner in which they are obtained from the actual measurements. These estimates are, however, very valuable not only for the users of the final data, but also for the evaluation and optimisation of the inverse modelling techniques.
Detailed quantitative descriptions of data accuracy can be derived in theory (see validation in section 4.6 and Rodgers (2000)), but they are, in the case of large retrievals, too numerically expensive to calculate in practice. The accuracy calculation typically requires O(n^3) operations for an atmospheric state vector of n elements (for 3D tomography n might be of the order of 10^6).
One way of reducing the computational cost is employing a Monte Carlo technique to generate a set of random (Gaussian) atmospheric state vectors with the required covariance. One can then add these random vectors to the result of the retrieval and run the forward model on the perturbed atmospheric states. The magnitude of perturbations required to produce a variation of the forward model output similar to the expected error of the measurements is then an estimate of the retrieval error (Ungermann et al., 2011).
In order to implement this technique, one needs a way to generate a random vector y with the required covariance, represented by a known, symmetric, positive definite covariance matrix S_a. This can be done by generating a vector x of independent standard normal random variables and finding a square matrix L such that L L^T = S_a. Then we can set y = L x, and indeed ⟨y y^T⟩ = L ⟨x x^T⟩ L^T = L L^T = S_a, i.e. the vector y has the required covariance (the angle brackets in the expression above denote the expected value).
A widely used technique for obtaining the matrix L is the Cholesky decomposition. Given the covariance matrix S_a = {s_ij}, it explicitly provides a lower triangular matrix root L = {l_ij} satisfying L L^T = S_a:

l_jj = ( s_jj - Σ_{k<j} l_jk^2 )^{1/2},  l_ij = ( s_ij - Σ_{k<j} l_ik l_jk ) / l_jj for i > j.   (20)
In practice we do not usually assemble the covariance matrix S_a, but rather its inverse, the precision matrix S_a^{-1}, because the latter is sparse. This is not an issue, since S_a^{-1} is then also symmetric positive definite, so one can compute its root L such that L L^T = S_a^{-1} in the same way as above, and then obtain y from the linear system S_a^{-1} y = L x. We will refer to the components of the precision matrix by S_a^{-1} = {a_ij}. The Cholesky decomposition does not preserve the sparse structure of S_a^{-1}, i.e. the L obtained in this way will typically have many more non-zero entries than S_a^{-1}. It follows from (20), however, that if S_a^{-1} has lower half-bandwidth w, i.e.

a_ij = 0 whenever i - j > w,   (21)

then so has L. In practice, this means that if S_a^{-1} is a sparse N × N matrix with half-bandwidth w ≪ N, L will have approximately N(w + 1) non-zero entries. The Cholesky decomposition is hence well suited for computing diagnostics of a 1D retrieval: assuming that the value of the retrieved atmospheric parameter at one point only directly correlates with its w nearest neighbouring points in each direction, one gets a precision matrix with half-bandwidth w and can compute L cheaply with O(N w^2) operations.
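The band-preservation property of the Cholesky factor, and the resulting cheap sampling in the 1D-style tridiagonal case, can be sketched as follows (the matrix entries are illustrative):

```python
import numpy as np

n = 50
rng = np.random.default_rng(1)
# tridiagonal s.p.d. matrix: only nearest-neighbour couplings (1D case)
S = (np.diag(2.0 + rng.uniform(size=n))
     + np.diag(-0.5 * np.ones(n - 1), -1)
     + np.diag(-0.5 * np.ones(n - 1), 1))

L = np.linalg.cholesky(S)              # lower triangular, L @ L.T == S
y = L @ rng.standard_normal(n)         # one random vector with covariance S
```

The factor L keeps the half-bandwidth of S, so both the factorisation and each sample remain cheap in 1D.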
The situation is very different in higher dimensions. We will use some results of graph theory to show that the Cholesky decomposition is not practical in those cases. The n-dimensional lattice graph P_k^n consists of an n-dimensional rectangular grid of size k in each direction, with an edge between any two grid neighbours. Let us label the vertices of P_k^n with integers, P_k^n = {v_i : 1 ≤ i ≤ k^n}, and let q be the maximum difference of the indexes of neighbouring vertices (q = max{|i - j| : v_i, v_j adjacent}).
Then the bandwidth ϕ(P_k^n) of P_k^n is defined as the minimum possible value of q among all possible ways to label the vertices. Now say we have some physical quantity defined on each vertex of this grid. Then any reasonable precision matrix S_a^{-1} for this grid would at least give non-zero correlations between grid neighbours, i.e. (S_a^{-1})_ij ≠ 0 if v_i and v_j are neighbours. Then, by comparing the definitions of ϕ(P_k^n) and the lower half-bandwidth w of the precision matrix from the last paragraph, one can see that 2w + 1 ≥ ϕ(P_k^n). FitzGerald (1974) showed that ϕ(P_k^2) = k and ϕ(P_k^3) = 3k^2/4 + k/2. Therefore, the narrowest possible half-bandwidths of S_a^{-1} are w = O(k) in 2D and w = O(k^2) in 3D. It follows that if the grid contains a total of N points, the computational cost of the Cholesky decomposition would be O(N^2) in 2D and O(N^{7/3}) in 3D, which is unsatisfactory for large retrievals.

For these higher dimensions, we need to use sparse-matrix iterative techniques, such as Krylov subspace methods, to reduce the computational cost and memory storage requirements. These allow one to solve a linear system Mx = b, where M is a linear map, for x iteratively. The cost of each iteration is proportional to the number of non-zero elements in M. In general, it is rather difficult to find a simple iteration scheme that would compute a square root of a matrix and converge reasonably fast.
Here we will follow an algorithm proposed by Allen et al. (2000). Consider the system of linear ordinary differential equations

dx/dt = (1/2) (A - I) [(1 - t) I + t A]^{-1} x,  x(0) = c.   (22)

One can verify that x(t) = [(1 - t) I + t A]^{1/2} c is the solution of (22). Hence we can obtain x(1) = A^{1/2} c by solving (22) numerically. Note that one does not need to explicitly calculate the inverses of any of the matrices involved, since dx/dt can be found by employing iterative methods to solve the linear system

[(1 - t) I + t A] (dx/dt) = (1/2) (A - I) x.   (23)

Furthermore, such a numerical solution can then be formally written as a combination of terms of the form ξ_ij [(1 - t_ij) I + t_ij A]^{-1} for some real constants ξ_ij, t_ij, which implies that A^{1/2} is a sum of products of commuting symmetric matrices; therefore A^{1/2} is symmetric. This means (A^{1/2})^T A^{1/2} = A^{1/2} A^{1/2} = A. Hence, as shown before, if c is a vector of independent random variables, each with a standard normal distribution, x(1) has covariance matrix A.
We chose a classical approach, a Runge-Kutta method, to solve the linear ODE system numerically. Using a constant step size proved to be inefficient, so adaptive step size control was introduced (in particular, a fifth-order Runge-Kutta-Fehlberg method; Fehlberg (1985)). The conjugate gradient method (Saad, 2003) was employed for all the linear equation systems involved. If we make the same assumption about the structure of the inverse of the correlation matrix as before, i.e. that atmospheric quantities are only directly correlated between nearest grid neighbours, the said matrix will have a constant number of non-zero entries in each row for any number of rows (grid points) N. More explicitly, under this assumption this matrix would be tridiagonal for a 1D retrieval, and for a 3D retrieval on a regular rectangular grid it would contain 6 non-zero off-diagonal entries in each row.
Therefore, each iteration of the conjugate gradient algorithm costs O(N) operations. It is difficult, in general, to estimate the dependence of the required number of iterations on N, but in our case large matrices seem to converge similarly to smaller ones, and the whole computational cost of generating random vectors remains close to O(N), which is a huge improvement over the Cholesky decomposition. The algorithm does not, though, explicitly yield the root of the covariance matrix and only gives one random vector per run. Hence, unlike the Cholesky decomposition, it must be executed again for each new random vector required, so it is only feasible if the number of random vectors required is much smaller than N. Fortunately, this is very much the case in practice, since the number of random vectors required is typically only of the order of 100 (Ungermann, 2013).
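The ODE-based matrix root can be sketched with a generic adaptive Runge-Kutta integrator (scipy's solve_ivp here, and a dense linear solve, rather than the Runge-Kutta-Fehlberg/conjugate-gradient combination used in the paper; matrix size and prescaling are illustrative):

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(2)
G = rng.standard_normal((8, 8))
A = G @ G.T + 8.0 * np.eye(8)                # symmetric positive definite
A *= 0.95 / np.linalg.eigvalsh(A).max()      # prescale: largest eigenvalue ~0.95
c = rng.standard_normal(8)

def rhs(t, x):
    # dx/dt = (1/2) (A - I) [(1 - t) I + t A]^{-1} x; the inverse is applied
    # by solving a linear system (conjugate gradients in the paper, a dense
    # solve in this small sketch)
    M = (1.0 - t) * np.eye(8) + t * A
    return 0.5 * (A - np.eye(8)) @ np.linalg.solve(M, x)

sol = solve_ivp(rhs, (0.0, 1.0), c, rtol=1e-10, atol=1e-12)
x1 = sol.y[:, -1]                            # approximates A^{1/2} c
```

The result can be validated against an eigendecomposition-based square root, mirroring the verification strategy described next.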
Although this root-finding algorithm does not provide a way to directly multiply the matrix root with itself, there is an inexpensive way to verify that the matrix root was computed correctly. Let A be an n × n s. p. d. matrix, let e_i be the i'th unit vector (1 ≤ i ≤ n), and compute v_i = A^{1/2} e_i using our algorithm. Then for any i, j we have v_i^T v_j = e_i^T (A^{1/2})^T A^{1/2} e_j = e_i^T A e_j = A_ij, an element of A. Hence the quality of the matrix root can be evaluated by comparing v_i^T v_j with A_ij. The error tolerances of the conjugate gradient and Runge-Kutta methods were adjusted so that these values would differ by at most 10^{-4} when factoring a matrix prescaled so that the largest eigenvalue is approximately 0.95.

4 Testing with synthetic data

The implementations of the algorithms described in sections 2 and 3 were integrated into the Jülich Rapid Spectral Simulation Code Version 2 (JURASSIC2). The forward model of atmospheric radiance used in this code is based on the emissivity growth approximation method (Weinreb and Neuendorffer (1973), Gordley and Russell (1981)) and the Curtis-Godson approximation (Curtis (1952), Godson (1953)).
For more information about JURASSIC2, refer to Hoffmann et al. (2008); Ungermann et al. (2011); Ungermann (2013). The performance of the new algorithms, both in terms of output quality and computational cost, is evaluated in the rest of this section. The test case used in this paper is based on an actual aircraft flight: its flight path, the measurement locations, the spectral lines used for retrieval and the meteorological situation during the flight. Synthetic measurement data was used instead of real GLORIA observations: the forward model of JURASSIC2 was employed to simulate the observed radiances in the atmospheric state given by ECMWF temperature data and WACCM model data for trace gases; simulated instrument noise was subsequently added to those radiances. This setup allows us to use model data as a reference (the "true" atmospheric state) for evaluation of retrieval quality.
The retrieval derives temperature and volume mixing ratios of O3, HNO3, CCl4 and ClONO2. Temperature and the CCl4 concentration are derived in the altitude range from 3 km to 20 km; O3, HNO3 and ClONO2 are retrieved from 3 km to 64 km.
The latter three trace gases have larger volume mixing ratios above flight level and hence significant contributions to measured radiances from these high altitudes. Interpolated WACCM data was used as a priori for all trace gases. A priori data for temperature was obtained by applying polynomial smoothing to ECMWF data. The retrieval is hence not supplied with the full temperature structure that was used to simulate the measurement. Therefore, the agreement between the retrieval result and the "true" atmospheric state, upon which the simulated measurements are based, cannot be achieved by overregularisation.

Correlation length estimation
The standard deviation σ and the correlation lengths L_h and L_v in equation (11) can be obtained from statistical analysis of in situ measurement or model data. A rigorous derivation of these parameters for each retrieved trace gas and temperature would require detailed analysis of in situ measurement databases and is, therefore, out of the scope of this paper. However, the following rough approximation proved sufficient to improve upon retrieval results compared to the old regularisation scheme. 3D tomographic retrievals are most useful for those trace gases that have high vertical gradients and complex spatial structure. The spatial correlation lengths of concentrations of such gases are mostly determined by stirring, mixing and other dynamical processes and are therefore similar. Furthermore, Haynes and Anglade (1997) and Charney (1971) estimate the dynamically determined aspect ratio to be α ≈ 200-250 in the lower stratosphere, and we expect less than that in the upper troposphere.
Hence we will assume L_h/L_v = 200 for all trace gases.

An analysis of the spatial variability of some airborne in situ measurements was performed by Sparling et al. (2006). For the ozone concentration χ(s, z), where s is the horizontal location and z is the altitude, they define fractional difference parameters ∆_r and provide the dependence of the standard deviation σ(∆_r) on the horizontal separation r. If the typical spatial variation in ozone concentration is significantly smaller than the mean ozone concentration χ̄, one can approximate ∆_r ≈ |χ(s + r) − χ(s)| / χ̄. Then, using the covariance relation (7) and various properties of the normal distribution, we obtain the dependence of σ(∆_r) on the correlation length L_h. By comparing this to the experimental results of Sparling et al. (2006), we estimate L_h = 200 km. Using the aspect ratio argument above, we also set L_v = 1 km.
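The idea of deriving a correlation length from in situ data can be sketched on synthetic data. The example below is illustrative, not the paper's actual procedure: it generates a 1-D series with an assumed "true" horizontal correlation length of 200 km and a 10 km sampling interval, then recovers L_h by fitting exp(−r/L_h) to the empirical autocorrelation.

```python
import numpy as np

# Illustrative sketch (not the paper's actual procedure): estimate a horizontal
# correlation length L_h by fitting exp(-r/L_h) to the empirical autocorrelation
# of a 1-D series of in-situ-like measurements along a flight track. The 200 km
# "true" value and the 10 km sampling are assumptions for the demonstration.
rng = np.random.default_rng(1)
L_true, dx, n = 200.0, 10.0, 50000        # km, km, number of samples
phi = np.exp(-dx / L_true)                # AR(1) coefficient -> acf = exp(-r/L)
chi = np.zeros(n)
for k in range(1, n):
    chi[k] = phi * chi[k - 1] + np.sqrt(1.0 - phi**2) * rng.standard_normal()

lags = np.arange(1, 30)
acf = np.array([np.corrcoef(chi[:-k], chi[k:])[0, 1] for k in lags])
slope = np.polyfit(lags * dx, np.log(acf), 1)[0]   # log acf(r) ~ -r / L_h
L_est = -1.0 / slope                               # roughly recovers 200 km
```

In practice the fit would be applied to measured or modelled concentration series rather than an AR(1) surrogate, but the log-linear fit of the autocorrelation is the same.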

The spatial structure of retrieved temperature differs from that of the trace gases. It is determined not only by mixing and stirring, but also by radiative processes and gravity waves. Radiative transfer tends to erase some of the fine vertical structure determined by isentropic transport, hence one should use a higher vertical correlation length than in the case of trace gases.
Gravity waves have a variety of length scales, but due to the finite resolution of the instrument not all of them can be retrieved.
Following the gravity wave observational filter study for the GLORIA instrument in Krisch et al. (2018), we can estimate the lowest observable gravity wave horizontal wavelength to be λ_h ∼ 200 km and the lowest vertical wavelength to be λ_v ∼ 3 km. λ_h coincides with the dynamically determined correlation length L_h = 200 km, so this value should be appropriate for temperature as well. The λ_v limit is not only determined by instrument resolution, but is also related to the fundamental properties of gravity waves. Preusse et al. (2008) gives the following expression for the gravity wave saturation amplitude:

T̂_max = T̄ N² λ_v / (2πg),    (28)

where T̄ is the mean temperature, N is the Brunt-Väisälä frequency and g is the gravitational acceleration. In typical lower stratosphere conditions and with λ_v = 3 km, equation (28) gives T̂_max ≈ 4 K. Since gravity waves are usually not saturated, and considering the typical measurement error (see Figure 6), waves of significantly shorter wavelengths would probably not be observed. We hence set L_v = 3 km. Time series of measurements at a fixed point in the atmosphere are then used to approximate the standard deviations σ for all retrieved quantities (Table 1).
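The ≈ 4 K figure can be reproduced from the saturation relation T̂_max = T̄ N² λ_v / (2πg). The specific values below (T̄ = 220 K, N = 0.02 s⁻¹) are typical lower-stratosphere assumptions for illustration, not numbers taken from the paper:

```python
import math

# Gravity wave saturation amplitude T_max = Tbar * N^2 * lam_v / (2*pi*g).
# The numerical values are typical lower-stratosphere assumptions.
Tbar = 220.0    # K,     mean temperature
N = 0.02        # 1/s,   Brunt-Vaisala frequency
lam_v = 3000.0  # m,     vertical wavelength
g = 9.81        # m/s^2, gravitational acceleration

T_max = Tbar * N**2 * lam_v / (2.0 * math.pi * g)
print(round(T_max, 1))  # -> 4.3, consistent with the ~4 K quoted in the text
```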

Test runs
The test data described in the previous section was processed with several different regularisation setups and retrieval grids.

Here we present the results for four different runs.
Firstly, as a reference, the latest version of JURASSIC2 without any of the new algorithms described in this paper was used (retrieval A). It uses a rectangular grid that covers altitudes from 1 km to 64 km. The vertical spacing of the grid is 1 km between the altitudes of 1 km and 5 km, 250 m between 5 km and 20 km, 1 km between 20 km and 25 km and 4 km between 28 km and 64 km. As required for a rectilinear grid, the number and distribution of points is the same at each altitude (Figure 1a). For this reference run, the first order regularisation (2) with trilinear interpolation (Bai and Wang, 2010) was used. The regularisation weights, as defined in (2), are given in Table 1. These are typically tuned ad hoc, validated against model data and, when using real observations, against in situ measurements.
Then, to evaluate the performance of the second order regularisation, retrieval B was performed. The grid and interpolation methods were identical to retrieval A, but the regularisation was replaced with the second order scheme from equation (11). We compare the quality of different retrievals by inspecting the differences between the retrieved temperature and the temperature used to generate the synthetic measurements (the "true" temperature). These differences are shown on horizontal and vertical slices of the observed atmospheric volume in Figure 4. Figure 5 shows the variation of the retrieved and true temperatures along horizontal lines in order to visualise the response of the retrievals to different spatial structures in the true temperature.
Comparison of Figure 4 rows A and B shows the effect of the new regularisation (11). The new algorithm (retrieval B) demonstrates better agreement with the true temperature. It shows markedly less retrieval noise in the central area, as well as better results (less influence of the a priori) just outside it, at the edges of the vertical cut shown in Figure 4. Considering also that, unlike for A, the regularisation strength of B was not manually tuned for this test case in particular, we can conclude that the new regularisation performed better in this test case.
The third retrieval (C) was performed on the same set of grid points as B and with the same input parameters, but the grid was treated as irregular, i.e. a Delaunay triangulation was constructed and the algorithms described in section 2 were used for regularisation.
A horizontal cut of the Delaunay triangulation for this grid is shown in Figure 1a. By comparing B and C one can evaluate the agreement between the old interpolation, derivative calculation and volume integration techniques for rectilinear grids and their newly developed irregular-grid-capable alternatives. The temperature fields from retrievals B and C are very similar (Figure 4), which confirms that the new Delaunay triangulation based code gives consistent results when run on a rectilinear grid.
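The core of the Delaunay-based approach can be sketched with standard tools; scipy here is only a stand-in for the JURASSIC2 implementation. A value interpolated inside a tetrahedron depends on its 4 vertices (barycentric weights), whereas trilinear interpolation depends on 8 corner points, and barycentric interpolation reproduces any linear field exactly:

```python
import numpy as np
from scipy.spatial import Delaunay
from scipy.interpolate import LinearNDInterpolator

# Sketch of interpolation on an irregular 3-D grid via Delaunay triangulation
# (scipy as a stand-in for the JURASSIC2 implementation). Each interpolated
# value depends on only the 4 vertices of the enclosing tetrahedron, versus
# 8 corner points for trilinear interpolation on a rectilinear grid.
rng = np.random.default_rng(2)
pts = rng.uniform(0.0, 1.0, size=(500, 3))      # irregular grid points
vals = pts[:, 0] + 2.0 * pts[:, 1] - pts[:, 2]  # a linear test field

tri = Delaunay(pts)                             # tetrahedralisation
interp = LinearNDInterpolator(tri, vals)
val = interp([0.5, 0.5, 0.5]).item()            # exact for a linear field: 1.0
```

The sparser vertex dependence is also what reduces the number of non-zero Jacobian entries, as discussed in the computational cost analysis below.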

Finally, retrieval D was performed with the same input parameters and methods as C, but using only a subset (22 %) of the original grid points to reduce computational cost. These points were chosen so that the grid density would be highest in the volumes best resolved by GLORIA and sparser where little measurement data is available. In particular, a set of axially symmetric regions around the centre of the original retrieval (i.e. symmetric w.r.t. the vertical axis at 66° N, 15° W) was defined as shown in Table 2. Points were removed from the original grid to achieve the specified vertical and horizontal resolutions for each region, resulting in the grid shown in Figure 2. The results changed little compared to retrieval C. Most of the minor differences between C and D occur near the edges of the hexagonal flight path (Figure 4). Even though the grid spacing for retrieval D is still the same as for C in some of these areas, larger grid cells nearby have some effect on their neighbours. Note also that retrievals C and D tend to differ in the areas where the agreement between each of them and the true temperature is relatively poor and the diagnostics predict lower accuracy (Figure 6), so D can be considered to be as good as C in well-resolved areas. Unphysical temperature structures far outside the measured area, like the one partially seen at 64° N, 20° W, are not unique to this retrieval; they just occur a little closer to the flight path in this case due to the coarser grid outside the hexagon.

Computational costs
JURASSIC2 finds an atmospheric state minimising (1) iteratively. In each iteration it first computes the Jacobian matrix of the forward model, thereby obtaining the Jacobian of the cost function (1), and then minimises the cost function using the Levenberg-Marquardt algorithm (Marquardt, 1963), employing conjugate gradients (CG) (Saad, 2003; Hanke, 1995) to solve the linear systems involved. The forward model is then run to simulate the measurements for the new iteration of the atmospheric state.
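A single iteration of this scheme can be sketched as follows. The function below is illustrative (names, the damping constant and the linear toy model are assumptions, not the JURASSIC2 API): it linearises the forward model F around x and solves the damped normal equations of the cost function (1) with CG.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def lm_step(x, y, xa, F, J, Sinv, Sa_inv, lam=1e-2):
    """One damped (Levenberg-Marquardt) Gauss-Newton step for a cost of the
    form (y - F(x))^T Sinv (y - F(x)) + (x - xa)^T Sa_inv (x - xa), solving
    the normal equations with conjugate gradients (illustrative sketch)."""
    r = y - F(x)                                  # measurement residual
    rhs = J.T @ (Sinv @ r) - Sa_inv @ (x - xa)    # negative cost gradient / 2
    n = x.size
    # matrix-free damped Hessian: J^T Sinv J + Sa_inv + lam * I
    M = LinearOperator((n, n),
                       matvec=lambda v: J.T @ (Sinv @ (J @ v))
                                        + Sa_inv @ v + lam * v)
    dx, info = cg(M, rhs)                         # CG on the damped system
    assert info == 0
    return x + dx

# Toy usage with a linear forward model F(x) = J x; one step already lands
# near the (weakly regularised) least-squares solution [1, 2]:
J = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
y = J @ np.array([1.0, 2.0])
x0 = np.zeros(2)
x1 = lm_step(x0, y, x0, lambda x: J @ x, J, np.eye(3), 0.01 * np.eye(2))
```

The matrix-free `LinearOperator` mirrors the reason CG is attractive here: only Jacobian-vector products are needed, never the explicit normal-equation matrix.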
All calculations were performed on the Jülich Research on Exascale Cluster Architectures (JURECA) supercomputer operated by the Jülich Supercomputing Centre (Jülich Supercomputing Centre, 2016). Each retrieval was executed in parallel on 4 computation nodes, each with 24 CPU cores (twin Intel Xeon E5-2680 v3 CPUs) and 128 GB of main memory. Table 3 shows the time required for different parts of the calculation, averaged over 3 identical runs (less than 1 % variation was observed). The "Other calculations" field mostly comprises the time required for input and output, but also preparatory steps such as the Delaunay triangulation in the case of retrievals C and D.
The results in Table 3 reveal the impact of the new methods on computation times. The new, more complex regularisation results in significantly slower convergence of the conjugate gradient calculation, making retrieval B about 6 % more expensive than A. The Delaunay triangulation, however, decreases the Jacobian calculation costs by roughly 25 %, even when used on the same grid as the old approach. This is because interpolation based on a Delaunay triangulation only requires 4 values of the interpolated quantity at neighbouring points, whereas linear interpolation in three directions requires 8, resulting in more non-zero terms in the Jacobian. The time required for the initialisation of Delaunay interpolation and derivatives, including the construction of the matrix S_a^{-1}, increases rather fast with grid size in the current implementation, contributing to the "other calculations" of retrieval C. Further optimisation of this step is possible and will be implemented in the future. In total, the new retrieval (C) is 9 % faster than the old approach (A) while using the same grid. Retrieval D gives a further 45 % reduction of computation time by removing 82 % of the grid points. A speedup of this order was expected. Only a minority of grid points contribute to the forward model (and hence Jacobian) calculations, and the JURASSIC2 code is heavily optimised for sparse matrix operations. Most of the grid points contributing to the forward model are in the areas well resolved by the instrument and thus cannot be removed from the grid without compromising the data products.

Atmos. Meas. Tech. Discuss., https://doi.org/10.5194/amt-2018-199. Manuscript under review for journal Atmos. Meas. Tech. Discussion started: 12 July 2018. (c) Author(s) 2018. CC BY 4.0 License.
Therefore, the actual savings in computation time mostly come from removing points above flight level and from choosing a more appropriate non-rectangular shape for the densest part of the grid when the measurement geometry requires it. Some of the points above the flight path must be included, since infrared radiation from there is measured by the instrument, but very little information about the atmosphere far above flight level can be retrieved, so very low grid densities are sufficient (Table 2).

Diagnostics
We follow Ungermann et al. (2011) and use a Monte Carlo approach to obtain diagnostics, but replace the Cholesky decomposition with the method developed in section 3. Results for retrievals B and D are presented in Figure 6. They show the total error estimate for temperature, based on the sensitivity of the temperature value to all other retrieved atmospheric quantities as well as the expected error of non-retrieved quantities (e.g. pressure).

Monte Carlo retrieval results provided here were verified by estimating the temperature error at one grid point using a more direct approach. Rodgers (2000) proves that the retrieved atmospheric state x can be written in terms of the a priori state x_a, the (unknown) true state x_t and the measurement error ε as

x = A x_t + (I − A) x_a + G ε,

where

G = M^{-1} (DF(x))^T S^{-1},    M = (DF(x))^T S^{-1} DF(x) + S_a^{-1},    A = G DF(x).

Here D denotes the Jacobian and all other notation is as in (1). The so-called gain matrix G fully describes the effect of measurement uncertainties on the retrieved state x. Computing G explicitly would be prohibitively expensive due to the required matrix inversion, but note that the i'th element of x only depends on the i'th row of the matrix G, so one can use Krylov subspace methods (e.g. conjugate gradients) to solve M r_i = e_i for the i'th row r_i^T of the inverse matrix M^{-1}. The i'th row of G can then be obtained as r_i^T (DF(x))^T S^{-1}.
This approach allows error estimation for a small number of grid points in reasonable time. We chose one point in the centre of the hexagon (66° N, 15° W, 11.75 km altitude) for retrieval B. The total temperature error was found to be 1.27 K and agrees with the Monte Carlo result (1.25 K) at that point within 2 %, which is an excellent accuracy for an error estimate.
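The single-row computation of the gain matrix can be demonstrated on a toy problem. In the sketch below, K, Sinv and Sa_inv are random stand-ins for DF(x), S⁻¹ and S_a⁻¹ (not real retrieval operators); one CG solve of M r_i = e_i yields the i'th row of G, which is cross-checked against an explicit inverse that would be prohibitive at retrieval scale:

```python
import numpy as np
from scipy.sparse.linalg import cg

# Toy-scale sketch of the single-row gain matrix computation: with
# M = K^T Sinv K + Sa_inv and G = M^-1 K^T Sinv, the i-th row of G
# needs only one CG solve M r_i = e_i.
rng = np.random.default_rng(3)
m, n = 40, 10
K = rng.standard_normal((m, n))    # stand-in for the Jacobian DF(x)
Sinv = np.eye(m)                   # stand-in for S^-1
Sa_inv = 0.1 * np.eye(n)           # stand-in for S_a^-1
M = K.T @ Sinv @ K + Sa_inv

i = 4
r_i, info = cg(M, np.eye(n)[i])    # solve M r_i = e_i
assert info == 0
g_i = r_i @ K.T @ Sinv             # i-th row of the gain matrix G

# cross-check against the explicit (expensive) inverse
G = np.linalg.solve(M, K.T @ Sinv)
assert np.allclose(g_i, G[i], atol=1e-4)
```

In the retrieval, the diagonal element of the corresponding error covariance then follows from this row and the assumed measurement covariance.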
The results of the Monte Carlo runs for retrievals B and D are presented in Figure 6. They clearly confirm that temperature can only be reliably retrieved within the flight hexagon or very close to the actual path. Also, the error estimates are higher in the central area of the hexagon close to flight altitude, in agreement with the test results (compare column 3 of Figures 4 and 6).
Note that the actual deviations of the retrieved temperatures (Figure 4 A-D) from the temperature values used to generate the synthetic measurements (depicted in Figure 3 T) tend to be significantly lower than the Monte Carlo estimate. This happens because the Monte Carlo approach provides a rather cautious approximation intended for work with real data, which includes the uncertainties of unretrieved quantities taken from the models (e.g. pressure), whereas those quantities are treated as exact when generating the simulated atmosphere, and only measurement noise is introduced. The major spatial features of the Monte Carlo estimate, however, do agree well with the errors of the simulated retrieval results.
A new regularisation algorithm for 3D tomographic limb sounder retrievals employing second spatial derivatives (based on equation 11) was developed and implemented. The unphysical ad-hoc parameters (regularisation weights) of the previous first spatial derivative based approach were replaced with physical quantities: the standard deviation of each retrieved quantity and its correlation lengths in the vertical and horizontal directions. Unlike the regularisation weights, these quantities do not require manual tuning for each retrieval and can be estimated from in situ measurements, model data and theoretical considerations. Tests with synthetic data showed slightly superior performance of the new algorithm, even compared to the old approach with tuned parameters. A rather general technique (equation (9)) for adapting the tomographic retrieval to atmospheric regions of high anisotropy and spatial variability was also proposed.
A Delaunay triangulation based approach for retrievals on irregular grids was developed. By better adapting the grid to the retrieval geometry and thinning it out in the regions where the lack of measurement data limits the resolution, the computation time of a tomographic retrieval was reduced by a total of 50 % without significant deterioration of the results. Irregular grids can be further employed for efficient tomographic retrievals in non-standard measurement geometries.
Monte Carlo diagnostics were newly implemented for a 3D tomographic retrieval, allowing for quick and reliable evaluation of measurement quality, identification of reliably resolved volumes and selection of optimal 3D tomography setups. Error