Reduced-Cost Construction of Jacobian Matrices for High-Resolution Inversions of Satellite Observations of Atmospheric Composition

. Global high-resolution observations of atmospheric composition from satellites can greatly improve our understanding of surface emissions through inverse analyses. Variational inverse methods can optimize surface emissions at any resolution but do not readily quantify the error and information content of the posterior solution. In fact, the information content of satellite data may be orders of magnitude lower than its coverage suggests because of failed retrievals, instrument noise, and error correlations that propagate through the inversion. Analytical solution to the inverse problem provides closed- 15 form characterization of posterior error statistics and information content but requires the construction of the Jacobian matrix that relates emissions to atmospheric concentrations. Building the Jacobian matrix is computationally expensive at high resolution because it involves perturbing each emission element, typically individual grid cells, in the atmospheric transport model used as forward model for the inversion. We propose and analyze two methods, reduced-dimension and reduced-rank, to construct the Jacobian matrix at greatly decreased computational cost while retaining information content. Both methods 20 begin from an initial native-resolution estimate of the Jacobian matrix constructed at no computational cost by assuming that atmospheric concentrations are most sensitive to local emissions. The reduced-dimension method uses this estimate to construct a Jacobian matrix on a multiscale grid that maintains high resolution in areas with high information content and aggregates grid cells elsewhere. The reduced-rank method constructs the Jacobian matrix at native resolution by perturbing the leading patterns of information content given by the initial estimate. We demonstrate both methods in an analytical 25 Bayesian inversion of GOSAT methane satellite data with augmented information content North America in July 2009. We show that both methods reproduce the results of the native-resolution inversion while achieving a factor of 4 improvement in computational performance. The reduced-dimension method produces an exact solution at lower spatial resolution while the reduced-rank method solves the inversion at native resolution in areas of high information content and defaults to the prior estimate elsewhere. inversion optimizing emissions on a 1º x 1.25º grid. Both methods successfully approximated the native-resolution results and decreased computational cost by a factor of 4. The reduced-dimension method produced only 40% of the native-resolution information content as measured by the degrees of freedom for signal (DOFS) due to spatial averaging but generated twice the DOFS per 445 state vector element and avoided the correlated errors found in the native-resolution inversion. The reduced-rank method retained 70% of the native-resolution DOFS by solving the inversion accurately in the grid cells with the highest information content and defaulting to the prior emissions estimate elsewhere.


Introduction
Satellite observations of atmospheric composition provide a powerful resource to improve our knowledge of emissions (Streets et al., 2013). However, the inverse analyses used to infer emissions from observed atmospheric concentrations are subject to large errors from the measurements and the inversion itself. Conducting inverse analyses of satellite data to quantify emissions at high resolution is of considerable interest but may be limited by data quality in ways that are difficult 35 to quantify and that may compromise the results. Here we present two methods to conduct high-resolution inversions of satellite observations that optimize the information content of the observations while providing full error statistics and minimizing computational cost.
Inverse analyses infer emissions by fitting the observed atmospheric concentrations to a chemical transport model (CTM) 40 that simulates atmospheric concentrations as a function of emissions (Brasseur and Jacob, 2017). The CTM represents the forward model for the inverse problem. The solution is generally obtained by minimizing a Bayesian cost function regularized by a prior emissions estimate. The optimal (posterior) emissions estimate corresponds to the minimum of the cost function. This minimum is typically found using a numerical (variational) method, often employing the adjoint of the CTM to compute the cost function gradient (Henze et al., 2007). However, the numerical solution provides no explicit 45 characterization of the solution's error or information content. Methods of estimating the error exist (Bousserez and Henze, 2018;Evensen, 2009), but these approaches are computationally expensive, incomplete, and rarely applied in practice.
In the common case where the observed atmospheric concentrations depend linearly on emissions and the error statistics can be assumed to be normally or log-normally distributed, the Bayesian optimization problem has an analytical solution 50 including closed-form expressions for the posterior emissions estimate, its error statistics, and its information content (Rodgers, 2000;Maasakkers et al., 2019). The analytical solution requires explicit construction of the Jacobian matrix of the forward model, = ⁄ ∈ ℝ !×# , which represents the sensitivity of the simulated concentrations ∈ ℝ ! to the optimized emission state vector ∈ ℝ # (Brasseur and Jacob, 2017). The elements of y are individual observations and the elements of x are the emissions optimized by the inversion, often grid cells in a two-dimensional emissions field. The 55 Jacobian can be constructed column-wise by conducting n + 1 CTM simulations to perturb each of the state vector elements $ and obtain the corresponding column % ⁄ . Even on massively parallel computing clusters, the computational cost of conducting these simulations can limit the size of the state vector x and therefore the resolution at which inversions are conducted . However, once the Jacobian matrix is constructed, inversions can be conducted at essentially no additional computational cost, allowing study of the solution's sensitivity to changes in the specification of 60 inversion parameters, including errors, prior assumptions, and the number and type of observations. https://doi.org/10.5194/amt-2020-451 Preprint. Discussion started: 16 November 2020 c Author(s) 2020. CC BY 4.0 License.
An illustrative example is the inversion of satellite observations to infer methane emissions. Methane is an important greenhouse gas but the spatial and temporal distribution of emissions is highly uncertain (Saunois et al., 2019). Satellite observations of atmospheric methane columns can inform emission estimates (Jacob et al., 2016). This was first shown with 65 data from the SCIAMACHY satellite instrument (2003 -2012) with nadir pixel resolution of 30 x 60 km 2 (Bergamaschi et al., 2009(Bergamaschi et al., , 2013Houweling et al., 2014;Wecht et al., 2014). More recent inversions used observations from the TANSO-FTS instrument aboard the GOSAT satellite (2009 -present) with 10-km diameter pixels approximately 250 km apart alongand cross-track (Monteil et al., 2013;Alexe et al., 2015;Maasakkers et al., 2019). The Tropospheric Monitoring Instrument (TROPOMI) aboard the Sentinel-5 precursor satellite, launched in October 2017, now provides daily, 70 global retrievals of atmospheric methane columns at 5.5 x 7 km 2 nadir pixel resolution, increasing coverage by orders of magnitude relative to GOSAT (Veefkind et al.,, 2012). However, TROPOMI's methane retrieval has only a ~3% success rate for daytime scenes limited by dark surfaces (water), clouds, high aerosol loadings, and variable surface albedo and topography, resulting in heterogeneously distributed observations (Hasekamp et al., 2019;Hu et al., 2018). Inversions of TROPOMI data must attempt to capture the high resolution and density of observations where appropriate while recognizing 75 the limitations in information content resulting from data sparsity and observational errors.
Several methods have been proposed to decrease the computational cost of high-resolution analytical inversions by optimally reducing the dimension or rank of the state vector. Reduced-dimension methods (Bocquet et al., 2011; solve inversions on a multiscale emission grid of dimension k < n for which the construction of the Jacobian matrix 80 ∈ ℝ !×& is computationally tractable. Bocquet et al. (2011) defined a method to find the optimal multiscale grid from an array of all allowable grids, but this requires a large computational investment.  used prior emissions information to group together similar grid cells using a Gaussian mixture model, but the criteria used to define similarity were subjective and did not consider the information content of the forward model or the observations. Reduced-rank methods (Bousserez and Henze, 2018;Spantini et al., 2015) generate an approximation of the posterior solution at the 85 original dimension n by solving the inversion in the directions of highest information content. Spantini et al. (2015) assumed knowledge of the Jacobian matrix. Bousserez and Henze (2018) avoided explicit construction of the Jacobian matrix by estimating the directions of highest information content, but their approach is effective only if a small number of directions explain most of the information content.

90
Here we present two methods to construct the Jacobian matrix for a native n-dimensional state vector that maximize the information content of the inverse analysis using k < n forward model simulations. The reduced-dimension method generates a multiscale grid that preserves native resolution where information content is highest and aggregates grid boxes elsewhere.
The resulting reduced-dimension Jacobian matrix '( ∈ ℝ !×& solves the inversion exactly on the multiscale grid. The reduced-rank method constructs a Jacobian matrix ) ∈ ℝ !×# along the dominant patterns of information content in the 95 system, allowing the approximation of the inverse solution at native resolution. In both cases, a low-cost initial estimate of https://doi.org/10.5194/amt-2020-451 Preprint. Discussion started: 16 November 2020 c Author(s) 2020. CC BY 4.0 License.
the Jacobian matrix is updated using k forward model simulations where k is selected by the user based on the information content of the observing system and the available computational resources. We demonstrate both methods in a 1-month inversion of satellite data.

Methods 100
This section presents the reduced-dimension and reduced-rank methods of Jacobian matrix construction. Following a review of the standard analytical inverse framework (Section 2.1), we describe optimal reductions in both dimension and rank for an inverse system with a known native-resolution Jacobian matrix ∈ ℝ !×# (Section 2.2). We then present a two-step approach to approximate an initially unknown Jacobian matrix using reductions in dimension and rank (Sections 2.3 through 2.5). For the purposes of illustration, we take the state vector to be a gridded field of emissions although the methods apply 105 to any state vector.

Analytical solution to the inverse problem
The optimal estimate * of a state vector x given a prior estimate xA, observation vector y, and normal error statistics given by prior and observational error covariance matrices SA and SO, respectively, is obtained by the minimization of the Bayesian scalar cost function J(x) (Brasseur and Jacob, 2017): 110 Here F(x) represents the forward model that simulates the observations y given x. In our application, the forward model is a CTM. The observational error covariance matrix SO includes errors from both the measurement and the forward model, 115 which collectively form the observing system. If the forward model is linear so that F(x) = Kx + c, where = ⁄ is the Jacobian matrix calculated by finite difference (see Introduction) and c is a constant, then an analytical solution to the cost function minimum exists that yields both the posterior estimate * and its error covariance matrix 7 : Comparison of 7 and * defines the information content of the observing system, quantified by the averaging kernel matrix = *⁄ that represents the sensitivity of the posterior emissions estimate * to the true state x. A can be calculated as = − 7 * ,or equivalently as 125 = * Equation (4) expresses the dependence of the averaging kernel matrix on the forward model and both error covariance matrices. The diagonal elements of A are commonly referred to as the averaging kernel sensitivities. The sum of the sensitivities, or the trace of A, measures the number of pieces of information that can be independently quantified by the 130 observing system, known as the degrees of freedom for signal or DOFS (Rodgers, 2000).

Optimal reductions in dimension and rank of inverse systems
We first consider the problem of optimally reducing the dimension and rank of an inverse system as described in Section 2.1 with a known Jacobian matrix ∈ ℝ !×# . Figure 1 illustrates dimension and rank reductions for an emission grid over North America. The top left panel represents the original n-dimensional state space, i.e., the native-resolution grid. A linear 135 transformation ∈ ℝ &×# reduces the dimension of the state space from n to k. This transformation may reduce dimension discretely, as in the case of grid cell aggregation (top right panel), or non-discretely, in which case the k state vector components are themselves n-dimensional vectors (bottom right panel). A second linear transformation * ∈ ℝ #×& restores the dimension of the state space from k back to the original n. The resulting space, depicted in the bottom left, is a low-rank approximation of the original state space. The matrix = * transforms the original state space to the low-rank subspace. 140 The inverse problem can be solved in any of these four spaces, although the eigenvector corrections generated in the nondiscrete reduced-dimension space (bottom right panel) would be difficult to interpret.
We wish to define matrices and * that minimize the information loss associated with reducing the dimension or rank of the state vector. Bousserez and Henze (2018) show that the projection that maximizes the probability of restoring the 145 original full dimension state vector x given the reduced dimension state vector is given by = * For a projection of this form, they show that information loss is minimized by maximizing where ) and A are the reduced-rank and native-resolution averaging kernel matrices, respectively. Defining where the columns of W are the eigenvectors of and is a diagonal matrix of the corresponding eigenvalues ranked in descending order, Bousserez and Henze (2018) show that Tr( ) ) is maximized for a rank k subspace when = & where & is the matrix of the first k columns of W. The corresponding optimal projection is then 155 https://doi.org/10.5194/amt-2020-451 Preprint. Discussion started: 16 November 2020 c Author(s) 2020. CC BY 4.0 License.

Figure 1. Dimension and rank reductions of a gridded emissions field. The linear transformation matrix reduces the dimension
160 of the original state space (upper left) either discretely by aggregating grid cells to generate a multiscale grid (upper right) or nondiscretely by projecting along the patterns given by the rows of (lower right, with positive values in red and negative in blue). The reverse transformation * restores the dimension but not the rank, producing a low-rank subspace of the original state space (lower right). The projection = * reduces rank but not dimension.
This projection applies a dimension-reducing transformation followed by a dimension-restoring transformation * : The columns of * give an eigenvector basis of the averaging kernel matrix while the eigenvalues of Q give its eigenvalues, 170 together defining the dominant patterns of information content. The fraction of information content explained by the first k columns of * is the sum of the k largest eigenvalues divided by the total DOFS (Bousserez and Henze, 2018). We will refer https://doi.org/10.5194/amt-2020-451 Preprint. Discussion started: 16 November 2020 c Author(s) 2020. CC BY 4.0 License. to the rate at which the information content accumulates as the number of eigenvectors increases as the information content spectrum. On the basis of this spectrum, we can select k so that most of the information content is explained by the first k eigenvectors. Alternatively, we can select k so that all eigenvectors have a sufficiently large signal-to-noise ratio. The 175 diagonal matrix gives the singular values of the pre-whitened Jacobian matrix O = / ,-1 ⁄ * -1 ⁄ and represents the signal-to-noise ratio of 180 each eigenvector (Rodgers, 2000).

Approximating the Jacobian matrix
Section 2.2 described optimal reductions in dimension and rank of a state vector assuming knowledge of the nativeresolution Jacobian matrix K. However, the n + 1 forward model simulations needed to construct K may be prohibitively expensive. Here we present a two-step approach to construct a reduced-dimension or reduced-rank Jacobian matrix at much 185 lower computational cost. We start from a low-cost, native-resolution estimate (6) (see below) and calculate the corresponding averaging kernel matrix (6) . In the reduced-dimension method, we use (6) to construct a multiscale grid that maintains resolution in the areas of highest information content (top right panel of Figure 1). We generate the updated, reduced-dimension Jacobian matrix '( (-) ∈ ℝ !×& on the resulting grid using the forward model. In the reduced-rank method, we construct (-) ∈ ℝ !×# on the basis of the k dominant eigenvectors of (6) by perturbing those patterns in the forward 190 model, generating an approximation of the Jacobian matrix in a reduced-rank state space (bottom left panel of Figure 1). In both methods, the updated Jacobian matrix improves the estimate of the averaging kernel matrix and its eigenvectors by incorporating information content from forward model. We use either ' ( (-) or ) (-) to conduct a second update and construct the final Jacobian matrix.

195
In our demonstration case, we generate an initial estimate of the native-resolution Jacobian matrix (6) at no cost by assuming that a perturbation of methane emissions Δ [kg m -2 s -1 ] produce local dry column mixing ratio enhancements Δ [mol mol -1 ] as determined by a simple column mass balance dependent on local wind speed and parameterized turbulent diffusion. We construct (6) row-wise by assuming that observation i responds to emissions in grid cell j as 200 Δ $ = $8 9%: and therefore where $8 ∈ [0, 1] is a dimensionless coefficient providing a crude representation of turbulent diffusion, 9%: and ;< ! are the molecular weights of dry air and methane, respectively, L is a ventilation length scale taken as the square root of the grid cell area, g is gravitational acceleration, U is the local wind speed taken here as 5 km h -1 , and p is the surface pressure. We assume $8 = 0.4 for observations in grid cell j and distribute the remaining mass over the three concentric rings surrounding 210 that cell with $8 = 0.3/8, 0.2/16, and 0.1/24 from the inner to outer ring. This representation of turbulent diffusion increases the spatial coverage of the dominant patterns of information content; the exact parameterization (e.g. the number of rings used or the values of $8 ) is unimportant.
The reduced-dimension and reduced-rank methods rely on characterizing the dominant patterns of information content of the 215 observing system using the initial estimate of the averaging kernel matrix (6) . (6) can provide a good approximation of A even if the initial estimate of the Jacobian matrix (6) is crude because the averaging kernel matrix depends strongly on the specified prior and observational error covariance matrices * and / (equation (4)) and because, by assuming that observed concentrations are most sensitive to local emissions, (6) generates the highest information content where the observations are densest. This structure can then be refined by a two-step update. 220

Constructing the reduced-dimension Jacobian matrix
In an inverse system with a known native-resolution Jacobian matrix K, a reduced-dimension Jacobian matrix '( can be constructed on a multiscale grid that maintains native resolution where information content is highest and clusters grid cells elsewhere (top right panel of Figure 1). An optimal multiscale grid maximizes the total DOFS and the averaging kernel sensitivities of each state vector element, referred to here as the DOFS per cluster. To construct this grid, we first define the 225 state vector as a single element encompassing the inversion domain. We then add the native-resolution grid cells with the highest averaging kernel sensitivities to the state vector one-by-one, removing them from the original state vector element.
For each new element $ , we calculate the corresponding Jacobian matrix column $ ⁄ and the resulting increase in DOFS. When the DOFS stabilize, we add instead clusters of two or more native-resolution grid cells and repeat this procedure. Clusters can be generated by, for example, K-means clustering, which aggregates spatially proximate grid cells. 230 We repeat this process, increasing cluster size, until all native-resolution grid cells are allocated to the multiscale grid and the corresponding reduced-dimension Jacobian matrix '( is constructed. The DOFS convergence criteria and the sequence of cluster sizes can be selected to achieve the desired state vector dimension. We apply this approach beginning with our initial estimate (6) (Section 2.3) in a two-step update that iteratively improves 235 the multiscale grid. The information content for the initial multiscale grid is given by (6) , which identifies the grid cells with the highest sensitivities even given the crude estimate of the Jacobian matrix. We then construct a multiscale grid and compute the corresponding reduced-dimension Jacobian matrix '( (-) , introducing information content from the forward model. We identify the state vector elements where the forward model contributes the most information content by comparing the sensitivities given by the updated reduced-dimension averaging kernel matrix ' ( (-) to the sensitivities given 240 by (6) . We disaggregate the clusters with the largest differences and update the reduced-dimension Jacobian, generating '( (1) . The information content associated with both '( (-) and '( (1) includes contributions from prior emissions estimates, the observations, and the forward model. As a result, convergence is rapid and we find no need for further iteration. The analytical inversion can then be solved exactly on the multiscale grid using '( (1) .

Constructing the reduced-rank Jacobian matrix 245
In an inverse system with a known native-resolution Jacobian matrix K, a reduced-rank approximation of the Jacobian matrix ) can be constructed by calculating the linear relationship between emissions and observations for the most important patterns of information content rather than for individual or aggregate grid cells. A low-rank Jacobian corresponds to the state space shown in the bottom left panel of Figure 1. We showed in Section 2.2 that the leading patterns of information content are given by the columns of the dimension-restoring transformation * (equation (8)). For any selected 250 value of k, the k leading patterns span a rank-k, dimension-n subspace of the original information content space. A Jacobian matrix can be constructed within this space by calculating the model response to perturbations of these patterns. The response of the forward model F to the jth normalized eigenvector 8 * ∈ ℝ # , given by the jth column of * , is where is any scalar sufficiently large to ensure numerical stability. The model responses 8 , ∈ {1, … , } form the columns of the matrix = ∈ ℝ ! × & , which is the Jacobian matrix for an inverse system with a reduced-dimension state space spanned by the first k eigenvectors of the information content, illustrated by the bottom right panel of Figure 1. This reduceddimension Jacobian must be transformed to the original state dimension to enable physical interpretation of the posterior 260 results. Bousserez and Henze (2018) show that the reduced-dimension Jacobian matrix = is given by = = * and the reduced-rank Jacobian matrix ) by ) = = * . Thus, the reduced-rank Jacobian can be calculated from the reduced-dimension Jacobian by ) = = . The resulting Jacobian has dimension × and rank .
In an inverse system without a known Jacobian matrix, the reduced-rank Jacobian matrix approximation can be constructed 265 in a two-step update that iteratively improves the patterns of information content used as perturbations. We use the initial estimate of the Jacobian matrix (6) (Section 2.3) to calculate the corresponding averaging kernel matrix (6) and the matrix of its eigenvectors * (6) . When calculating * (6) , we select the (6) eigenvectors that have a signal-to-noise ratio greater than some threshold. We use the signal-to-noise criterion, which is stricter than the information content criterion, to account for the errors in the initial estimate of the information content. We compute the forward model response to each of the 270 eigenvectors using equation (12) and transform the resulting reduced-dimension Jacobian = (-) to the full-dimension state space with ) (-) = = (-) (6) . We calculate the associated averaging kernel matrix ) (-) and the matrix of its eigenvectors ) * (-) . Because ) (-) is a reduced-rank approximation, its spectrum of information content is discontinuous at (6) . We therefore use the spectrum of information content associated with the initial, full-rank estimate (6) to select the rank (-) of the second update and calculate ) * (-) . We use the (-) eigenvectors that span most of the information content from the initial 275 estimate and construct an updated reduced-rank Jacobian matrix approximation ) (1) as above. The resulting Jacobian matrix ) (1) is a rank ≈ (-) approximation that accurately quantifies the forward model where the observing system has high information content and loses accuracy in areas with lower information content where the observations are least able to constrain emissions. The resulting posterior solution is accurate in areas with high information content and defaults to the prior estimate elsewhere. 280

Results and discussion
We demonstrate the reduced-dimension and reduced-rank Jacobian matrix construction methods in an analytical Bayesian inversion of atmospheric methane columns observed by the GOSAT satellite over North America in July 2009. Although TROPOMI now provides higher density observations, using GOSAT allows us to follow Maasakkers (2019) to construct a "native-resolution" inverse system at 1º x 1.25º grid cell resolution (n = 2098, top left panel of Figure 1) against which we 285 compare our reduced-dimension and reduced-rank methods. To demonstrate the applicability of the methods to higherinformation observing systems such as TROPOMI, we artificially increase the information content of the GOSAT data by introducing an amplification factor > 1 to the cost function that increases the weight of the observational terms: The amplification factor functionally decreases the observational error covariance, increasing the DOFS. We set = 20, which increases the native-resolution DOFS from 40 to 216. Because of noise in the GOSAT data, this results in an overfit solution, which is irrelevant in our demonstration.  CH4 proxy retrieval over land Parker et al., 2011;ESA CCI GHG project team, 2018) for July 2009.
Prior emissions and error covariances are as described in Maasakkers et al. (2019). The demonstration is sufficiently coarseresolution and short that the native-resolution Jacobian matrix K can be explicitly computed with 2099 model runs. After constructing K, we use it as the forward model in lieu of additional GEOS-Chem simulations. 310 Figure 2 (top left panel) shows the native-resolution averaging kernel sensitivities, i.e. the diagonal elements of the nativeresolution averaging kernel matrix A. As discussed in Section 2.3, the structure of the averaging kernel matrix is largely determined by the prior error covariance matrix * and by the observation density as defined both by the observational covariance matrix / and by the Jacobian matrix K. This is apparent in the bottom panels of Figure 2, which show the 315 distribution of the prior error standard deviations (left) and observation density (right). Errors on the prior emissions estimate are largest for wetlands along the southeastern coast of the U.S. (Bloom et al., 2017). The variability in the observation density is driven by sampling frequency and retrieval success .
Figure 2 (top right panel) also shows the initial estimate of averaging kernel sensitivities of (6) derived from the initial 320 estimate of the Jacobian matrix (6) constructed as described in Section 2.3. No forward model simulations were conducted to construct this initial estimate, yet the patterns of information content reproduce those given by the native-resolution averaging kernel matrix A because of the strong dependence on the prior error standard deviation and the observation density. A has a smoother structure than (6) because of the effect of long-range transport in the CTM, which has little effect on the leading patterns of information content. 325 For this demonstration, we aim to reduce the number of forward model runs needed to construct the Jacobian matrix by a factor of 4 relative to the native-resolution inversion, from 2099 to ≈530 simulations. We first apply the reduced-dimension method to construct a Jacobian matrix on a multiscale grid generated with K-means clustering following Section 2.4. The This is reflected in the reduced-dimension averaging kernel sensitivities, which are more uniform than the native-resolution values. The reduced-dimension posterior scaling factors exhibit less variability than the native-resolution solution, which exhibits checkerboard patterns resulting in part from the overfit of the posterior solution to observational noise. The posterior scaling factors agree on regional scales. https://doi.org/10.5194/amt-2020-451 Preprint. Discussion started: 16 November 2020 c Author(s) 2020. CC BY 4.0 License. We next apply the reduced-rank method (Section 2.5) to construct a reduced-rank approximation of the Jacobian matrix. We calculate the dominant eigenvectors of the initial averaging kernel matrix estimate (6) , requiring that the signal-to-noise ratio of all eigenvectors be greater than 2.5. This yields (6) = 74 eigenvectors, which account for 37% of the initial-estimate DOFS. We perturb these eigenvectors in the forward model and construct the reduced-rank Jacobian matrix. We then calculate the averaging kernel matrix ) (-) and its dominant eigenvectors, defining (-) = 462 by requiring that the improved 350 eigenvectors capture 98.5% of the information content defined by ) (6) . The resulting Jacobian matrix ) (1) has rank ≈462 and required 537 forward model simulations. We solve the inversion with ) (1) and find 155 DOFS compared to the 216 DOFS generated in the native-resolution inversion, achieving 72% of the DOFS at a quarter of the computational cost.  The DOFS of the reduced-rank inversion are only moderately sensitive to the first and second update thresholds, with a 370 stronger dependence on the number of model runs conducted in the second update. Figure 4 shows the reduced-rank DOFS as a function of the number of first-and second-update forward model runs. Among all possible partitions of 537 total model runs (dashed line), our update scheme (starred) maximizes the DOFS. Using a signal-to-noise ratio threshold of 1 or 4 https://doi.org/10.5194/amt-2020-451 Preprint. Discussion started: 16 November 2020 c Author(s) 2020. CC BY 4.0 License. instead of 2.5 (dots), decreases the reduced-rank DOFS by only 2-3% (from 155 to 150 and 152, respectively). Lowering the signal-to-noise ratio threshold increases the number of eigenvectors drawn from (6) , which increases the effect of errors in 375 the initial Jacobian matrix estimate (6) . Increasing the threshold fails to exploit the information content of (6) . More generally, applying a signal-to-noise ratio threshold of 2.5 in the first update maximizes the DOFS regardless of the number of model runs conducted in the second update. We show the DOFS generated by these optimal configurations as a function of the total number of forward model runs in the top panel of Figure 4. After only 275 simulations, the optimal reduced-rank inversion generates 108 DOFS, achieving half of the native-resolution DOFS at 13% of the computational cost. 380 We solve the inversion (equations (2) -(4)) using the reduced-rank Jacobian matrix ) (1) and compare the posterior to the native-resolution solution. Figure  in the areas of highest information content and defaults to the prior value (a scaling factor of one) elsewhere. As a result of the exclusion of grid cells with low native-resolution information content, the reduced-rank DOFS (155) are lower than native-resolution DOFS (216). However, in grid cells with large averaging kernel sensitivities, the reduced-rank inversion preserves most information content. 699 grid cells have reduced-rank averaging kernel sensitivities greater than 0.01 and generate 153 DOFS, 87% of the 175 DOFS generated by these grid cells in the native-resolution inversion. 390 Figure 5 shows a statistical comparison of the reduced-rank and native-resolution inversion results for grid cells with a reduced-rank averaging kernel sensitivity above 0.01. None of the reduced-rank quantities exhibit significant bias, as shown by comparison to the 1:1 line. The elements of the reduced-rank Jacobian matrix ) (1) correspond closely with those of the native-resolution Jacobian matrix K (R = 0.96). The strong correlation of the averaging kernel sensitivities (R = 0.93) 395 confirms that the reduced-rank inversion accurately identifies the native-resolution grid cells with highest information content. The posterior error and scaling factors agree well in these grid cells. The posterior error standard deviations correlate very strongly (R = 0.99) due in part to the common contribution of the prior and observational error covariance matrices (equation (3)). The outlier reduced-rank standard deviations tend to be larger than the native-resolution values, reflecting the error introduced by discarding information content. The posterior scaling factors also agree well but the 400 correlation coefficient is smaller (R = 0.89) because of the propagation of errors from the posterior error covariance and Jacobian matrices (equation (2)).
The reduced-dimension and reduced-rank methods reproduce the native-resolution inversion with a factor of 4 reduction in computational cost. The reduced-dimension method generates lower DOFS but higher DOFS per state vector element due to 405 the clustering of grid cells. The resulting posterior solution is exact on the multiscale grid and provides better spatial

Conclusions
We proposed two methods to conduct analytical high-resolution inversions of satellite observations of atmospheric composition to infer emissions while maximizing information content and minimizing computational cost. The computational cost of analytical inversions is driven by the construction of the Jacobian matrix, which expresses the sensitivity of the observed concentrations to emissions. The Jacobian matrix is constructed numerically by conducting 420 perturbation simulations with a chemical transport model (CTM) that serves as forward model for the inversion. Our methods exploit the dominant patterns of information content in the observing system to build the Jacobian matrix. The reduced-dimension method constructs the Jacobian matrix on a multiscale grid that aggregates grid cells where information content is lowest. The reduced-rank method approximates the Jacobian matrix using the dominant patterns of information content, discarding the weaker patterns. Beyond the present application, both methods can be applied more generally to the 425 problem of efficient numerical approximation of high-dimension Jacobian matrices.
Both methods use a two-step update to improve an initial, no-cost estimate of the Jacobian matrix and the corresponding averaging kernel matrix. The initial estimate of the Jacobian matrix is constructed by assuming that observed atmospheric concentrations are most sensitive to local emissions. Because the averaging kernel matrix has a strong dependence on the 430 prior error covariance and observation density, this initial estimate can accurately quantify the fine structure of information content. The reduced-dimension method uses the initial estimate of the averaging kernel matrix to generate a multiscale grid that maintains native resolution where information content is highest and consolidates grid cells elsewhere. The forward model is used to build the Jacobian matrix on this grid, and the resulting reduced-dimension averaging kernel matrix is compared to the initial estimate to identify the state vector elements where the forward model contributed the most 435 information content. These elements are disaggregated to generate a second update. The reduced-rank method uses the initial estimate of the averaging kernel matrix to identify the dominant patterns of information content. The forward model is applied to these patterns, generating a first update of the Jacobian matrix. This update serves as the basis for a second update.
In both methods, rapid convergence occurs after two updates.

440
We applied both methods in a demonstration inversion of GOSAT column methane observations over North America for July 2009 with artificially enhanced information content and compared the results to a native-resolution inversion optimizing emissions on a 1º x 1.25º grid. Both methods successfully approximated the native-resolution results and decreased computational cost by a factor of 4. The reduced-dimension method produced only 40% of the native-resolution information content as measured by the degrees of freedom for signal (DOFS) due to spatial averaging but generated twice the DOFS per 445 state vector element and avoided the correlated errors found in the native-resolution inversion. The reduced-rank method retained 70% of the native-resolution DOFS by solving the inversion accurately in the grid cells with the highest information content and defaulting to the prior emissions estimate elsewhere. https://doi.org/10.5194/amt-2020-451 Preprint. Discussion started: 16 November 2020 c Author(s) 2020. CC BY 4.0 License.