The Adaptable 4A Inversion (5AI): description and first XCO2 retrievals from Orbiting Carbon Observatory-2 (OCO-2) observations

M. Dogniaux et al.: The Adaptable 4A Inversion (5AI): description and first retrievals
Published by Copernicus Publications on behalf of the European Geosciences Union.


Abstract.
A better understanding of greenhouse gas surface sources and sinks is required in order to address the global challenge of climate change. Space-borne remote estimations of greenhouse gas atmospheric concentrations can offer the global coverage that is necessary to improve the constraint on their fluxes, thus enabling better monitoring of anthropogenic emissions. In this work, we introduce the Adaptable 4A Inversion (5AI) inverse scheme that aims to retrieve geophysical parameters from any remote sensing observation. The scheme is based on optimal estimation and relies on the Operational version of the Automatized Atmospheric Absorption Atlas (4A/OP) radiative transfer forward model along with the Gestion et Étude des Informations Spectroscopiques Atmosphériques: Management and Study of Atmospheric Spectroscopic Information (GEISA) spectroscopic database. Here, the 5AI scheme is applied to retrieve the column-averaged dry air mole fraction of carbon dioxide (XCO2) from a sample of measurements performed by the Orbiting Carbon Observatory-2 (OCO-2) mission. These measurements have been selected as a compromise between coverage and the lowest aerosol content possible, so that the impact of scattering particles can be neglected to save computational time. For air masses below 3.0, 5AI XCO2 retrievals successfully capture the latitudinal variations of CO2, its seasonal cycle, and its long-term increasing trend. Comparison with ground-based observations from the Total Carbon Column Observing Network (TCCON) yields a bias of 1.30 ± 1.32 ppm (parts per million), which is comparable to the standard deviation of the Atmospheric CO2 Observations from Space (ACOS) official products over the same set of soundings. These nonscattering 5AI results, however, exhibit an average difference of about 3 ppm compared to ACOS results. We show that neglecting scattering particles to save computational time can explain most of this difference, which can be fully corrected by adding to OCO-2 measurements an average calculated-observed spectral residual correction that encompasses all the inverse setup and forward model differences between 5AI and ACOS. These comparisons show the reliability of 5AI as an optimal estimation implementation that is easily adaptable to any instrument designed to retrieve column-averaged dry air mole fractions of greenhouse gases.

Introduction
The atmospheric concentration of carbon dioxide (CO2) has been rising for decades because of fossil fuel emissions and land use changes. However, large uncertainties still remain in the global carbon budget (e.g. Le Quéré et al., 2009). In order to address the global challenge of climate change, a better understanding of carbon sources and sinks is necessary, and remote space-borne estimations of CO2 columns can help to constrain these carbon fluxes in atmospheric inversion studies, thus reducing the remaining uncertainties (e.g. Rayner and O'Brien, 2001; Chevallier et al., 2007; Basu et al., 2013, 2018). The column-averaged dry air mole fraction of CO2 (XCO2) can be retrieved from thermal infrared (TIR) soundings, mostly sensitive to the midtroposphere (e.g. Chédin et al., 2003; Crevoisier et al., 2004, 2009a), and from near-infrared (NIR) and shortwave infrared (SWIR) measurements, which are sensitive to the whole atmospheric column and especially to levels close to the surface where carbon fluxes take place. Current NIR and SWIR satellite missions observing carbon dioxide include the Japanese Greenhouse gases Observing SATellites (GOSAT and GOSAT-2), NASA's Orbiting Carbon Observatory-2 and 3 (OCO-2 and OCO-3) and the Chinese mission TanSat. Over time, different algorithms have been developed to exploit their measurements; they rely on different inverse methods and use various hypotheses to address the fundamentally ill-posed problem of XCO2 retrieval. These algorithms notably include the Japanese National Institute for Environmental Studies (NIES) algorithm (Yokota et al., 2009; Yoshida et al., 2011, 2013), the Atmospheric CO2 Observations from Space (ACOS) algorithm (Bösch et al., 2006; Connor et al., 2008; O'Dell et al., 2012, 2018), the University of Leicester Full Physics (UoL-FP) retrieval algorithm (Parker et al., 2011), RemoTeC from the Netherlands Institute for Space Research (SRON; Butz et al., 2011; Wu et al., 2018) and the Fast atmOspheric traCe gAs retrievaL (FOCAL) algorithm from the University of Bremen (Reuter et al., 2017a, b).
They also rely on different forward radiative transfer models to compute synthetic measurements and their partial derivatives. It was noted, for the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY) mission (Bovensmann et al., 1999), that NIR and SWIR measurements are also quite sensitive to the presence of scattering particles on the optical path, which can then substantially perturb XCO2 retrievals if unaccounted for (Houweling et al., 2005). As exact multiple scattering calculations are too time consuming for operational XCO2 retrievals, all the previously mentioned retrieval algorithms have radiative transfer models that implement various approximations to speed up forward modelling. Finally, radiative transfer fundamentally depends on spectroscopic databases that contain the parameters needed to compute atmospheric gas absorption. The HITRAN spectroscopic database (latest version from 2016; Gordon et al., 2017) is widely used for greenhouse gas concentration retrievals, as are the absorption coefficient (ABSCO) atmospheric absorption tables for the ACOS algorithm (Oyafuso et al., 2017).
The design of an XCO2 retrieval algorithm, from the forward model and the spectroscopic parameters it uses to the choice of the adjusted quantities in the state vector, has a critical influence on the overall performance of the observing system (Rodgers, 2000). The systematic errors in retrieved XCO2 and their standard deviations (the latter also being called single measurement precision) with regard to the true (but unknown) state of the atmosphere particularly impact the uncertainty reduction and bias in atmospheric CO2 flux inversion studies (e.g. Chevallier et al., 2007). Retrieved XCO2 products are most often validated against columns with similar observation geometry, such as those from ground-based solar absorption spectrometry. The Total Carbon Column Observing Network (TCCON) is a network of ground stations that retrieve the column-averaged dry air mole fraction of CO2 and other species from NIR and SWIR spectra measured with Fourier transform spectrometers (FTSs) directly pointing at the Sun (Wunch et al., 2011b). The network currently consists of 27 stations all around the world, and its products constitute a truth proxy reference for the validation of space-borne retrievals of greenhouse gas atmospheric concentrations. For instance, TCCON data sets were used to validate SCIAMACHY (Reuter et al., 2011), GOSAT XCO2 retrieved by the ACOS (Wunch et al., 2011a) and NIES algorithms (Inoue et al., 2016), and OCO-2 XCO2 produced by ACOS (Wunch et al., 2017), RemoTeC (Wu et al., 2018), and FOCAL (Reuter et al., 2017b). These last three algorithms exhibit different biases compared to TCCON, depending on their respective forward modelling and bias correction strategies, i.e. 0.30 ± 1.04, 0.0 ± 1.36, and 0.67 ± 1.34 ppm (parts per million) for OCO-2 nadir land soundings, respectively.
In this paper, we present the Adaptable 4A Inversion (5AI) that implements the optimal estimation inverse method (Rodgers, 2000). 5AI relies on the OPerational version of the Automatized Atmospheric Absorption Atlas (4A/OP) radiative transfer model (Scott and Chédin, 1981; Cheruy et al., 1995; https://4aop.aeris-data.fr, last access: 25 May 2021) and the GEISA (Gestion et Étude des Informations Spectroscopiques Atmosphériques: Management and Study of Spectroscopic Information) spectroscopic database (Jacquinet-Husson et al., 2016; http://cds-espri.ipsl.fr/etherTypo/?id=950, last access: 25 May 2021). The 5AI scheme is applied to retrieve XCO2 from a sample of OCO-2 measurements that compromises between coverage and the lowest possible values of ACOS retrieved aerosol optical depth in order to avoid possible singular biases due to strong aerosol events. This sample comprises (1) the OCO-2 best-flag target mode soundings between 2014 and 2018 and (2) 2 years of OCO-2 nadir measurements with global land coverage. First, to save computational time, retrievals are performed without taking scattering particles into account. We then discuss how considering them and accounting for differences in the radiative transfer modelling and retrieval setups impact the 5AI results, which are compared to ACOS and TCCON reference data over identical sets of soundings. This paper is organised as follows: Sect. 2 describes the 5AI retrieval scheme and its current features, as well as the 4A/OP radiative transfer model, the GEISA spectroscopic database, and the empirically corrected O2 A-band absorption continuum on which it relies. Section 3 presents the OCO-2 and TCCON data selection. Section 4 presents the a posteriori filters used for this work and shows the 5AI XCO2 target and nadir retrieval results, which are compared to TCCON and ACOS (B8r version). 5AI results are discussed in Sect. 5, which shows how taking scattering particles into account in 5AI retrievals can impact the results and how systematic differences between different XCO2 products can be accounted for by compensating them with a spectral residual adjustment. Finally, Sect. 6 highlights the conclusions of this work.

The 5AI retrieval scheme
As for any other retrieval scheme, 5AI aims at finding the estimate of atmospheric and surface parameters (for example, trace gas concentration, temperature profile, surface albedo, or scattering particle optical depth) that best fits hyperspectral measurements made from space. This inverse problem can be expressed with the following equation:

$$ y = F(x) + \varepsilon \tag{1} $$

where y is the measurement vector containing the radiances measured by the space instrument, x is the state vector containing the geophysical parameters to be retrieved, ε is the measurement noise, and, finally, F is the forward radiative transfer model that describes the physics linking the geophysical parameters to be retrieved to the measured infrared radiances.
2.1 Forward modelling – 4A/OP and GEISA spectroscopic database
The 5AI retrieval scheme uses the OPerational version of the Automatized Atmospheric Absorption Atlas (4A/OP). 4A/OP is an accurate line-by-line radiative transfer model that enables a fast computation of atmospheric transmittances based on atlases containing precomputed monochromatic optical thicknesses for reference atmospheres. These atlases are used to compute atmospheric transmittances for any input atmospheric profile and viewing configuration. The transmittances in turn make it possible to solve the radiative transfer equation and yield radiances and their partial derivatives with respect to the input geophysical parameters, either at a pseudo-infinite spectral resolution (0.0005 cm−1 at best) or convolved with an instrument function. 4A/OP is the reference radiative transfer model for the Centre National d'Études Spatiales (CNES) IASI (Infrared Atmospheric Sounding Interferometer) level 1 calibration/validation and operational processing, and it is used for daily retrieval of midtropospheric columns of CO2 (Crevoisier et al., 2009a) and CH4 (Crevoisier et al., 2009b) from IASI. Moreover, 4A/OP has also been chosen by CNES as the reference radiative transfer model for the development of the New Generation of the IASI instrument (IASI-NG; Crevoisier et al., 2014), the French NIR and SWIR CO2 remote sensing MicroCarb mission (Pascal et al., 2017), and the French-German MEthane Remote sensing LIdar Mission (MERLIN; Ehret et al., 2017). Although originally developed for the thermal infrared spectral region, 4A/OP has been extended to the near- and shortwave infrared regions (NIR and SWIR). (1) The computation of the atlases of optical thickness was extended to the 3000–13 500 cm−1 domain and takes into account line mixing and collision-induced absorption (CIA) in the O2 A band (Tran and Hartmann, 2008), as well as line mixing and H2O broadening of CO2 lines (Lamouroux et al., 2010). The absorption lines of CO2 that we use in this work are, thus, identical to those included in HITRAN 2008. (2) The solar spectrum is a flexible input, and the Doppler shift of its lines is computed. (3) The radiative transfer model is now coupled with the LIDORT model (Spurr, 2002) for scalar multiple-scattering simulation performed with the discrete ordinates method and with VLIDORT (Spurr, 2006) if polarisation or bidirectional reflectance distribution functions (BRDFs) need to be taken into account. This coupling especially enables us to take into account Rayleigh scattering and, if necessary, scattering particles in NIR and SWIR forward modelling.
The 4A/OP radiative transfer model can be used with monochromatic optical thickness atlases computed from any spectroscopic database. For the present work, the atlases are computed using the GEISA 2015 (Gestion et Étude des Informations Spectroscopiques Atmosphériques: Management and Study of Spectroscopic Information) spectroscopic database. Having served as the basis of many projects in the astronomical and astrophysical communities since its beginning, GEISA has also been used since the 2000s for the preparation of several current and future space missions and has been chosen by CNES as the reference spectroscopic database for the definition of IASI-NG, MicroCarb, and MERLIN. Due to imperfections in the Tran and Hartmann (2008) line mixing and CIA models, an empirical correction to the absorption continuum in the O2 A band, fitted from Park Falls TCCON spectra following a previously described method, has been added. Finally, we use Toon (2015) as the input solar spectrum.
2.2 Inverse modelling in the 5AI retrieval scheme
2.2.1 Optimal estimation applied for XCO2 retrieval
The whole formalism of optimal estimation that enables us to find a satisfying solution to Eq. (1) may be found in Rodgers (2000). This subsection only outlines the key steps that are implemented in order to retrieve XCO2.
Equation (1) includes ε, the experimental noise of the measured radiances. Hence, it is more appropriate to use a formalism that takes this measurement uncertainty into account and translates it into retrieval uncertainty; this is done by representing the state of the atmosphere x and the measurement y as random variables. Assuming Gaussian statistics, the inversion problem consists in finding the state vector that compromises between a priori knowledge of the geophysical state parameters (most often brought by climatologies) and the information brought by the measurement, both weighted by their respective uncertainties. It finally boils down to the minimisation of the following χ² cost function:

$$ \chi^2 = (y - F(x))^T S_e^{-1} (y - F(x)) + (x - x_a)^T S_a^{-1} (x - x_a) \tag{2} $$

where x_a is the a priori state vector, which is also, in most cases, chosen as the first guess for iterative retrievals. Assuming again Gaussian statistics, S_a is the a priori state covariance matrix that represents the variability around the a priori state vector, and similarly, S_e is the a priori measurement error covariance matrix that represents the noise model of the instrument. Moreover, as the forward model for this retrieval is highly nonlinear, it is practical to use a local linear approximation, here expressed around the a priori state as follows:

$$ F(x) = F(x_a) + \frac{\partial F}{\partial x} (x - x_a) \tag{3} $$

The partial derivatives of the forward radiative transfer model F (here 4A/OP) are expressed as a matrix, called the Jacobian matrix, which is denoted K.
All these assumptions enable the maximum posterior probability state x̂ that minimises the cost function defined in Eq. (2) to be found. It can be computed by iteration, using the general approach, as follows:

$$ x_{i+1} = x_i + \left[ (1 + \gamma) S_a^{-1} + K_i^T S_e^{-1} K_i \right]^{-1} \left[ K_i^T S_e^{-1} (y - F(x_i)) - S_a^{-1} (x_i - x_a) \right] \tag{4} $$

where γ is a scaling factor that can be set to 0 (Gauss-Newton method) or whose value can be adapted along iterations in order to prevent divergence (Levenberg-Marquardt method, in which successful retrievals use decreasing γ values and, eventually, 0 for the final iteration). K_i here denotes the forward radiative transfer Jacobian matrix, whose values are evaluated for the state vector x_i. A successful retrieval reduces the a priori uncertainty of the state vector described in S_a. The a posteriori covariance matrix of the retrieved state vector, Ŝ, whose diagonal elements give the posterior variance of the retrieved state vector elements, is expressed as follows:

$$ \hat{S} = \left( K^T S_e^{-1} K + S_a^{-1} \right)^{-1} \tag{5} $$

Finally, the sensitivity of the retrieval with respect to the true geophysical state x_true is given by the averaging kernel matrix A, calculated according to the following:

$$ A = \frac{\partial \hat{x}}{\partial x_{\mathrm{true}}} = \left( K^T S_e^{-1} K + S_a^{-1} \right)^{-1} K^T S_e^{-1} K \tag{6} $$

In most cases, the CO2 concentration is included in the state vector as a level or layer profile from which XCO2, the retrieved column-averaged dry air mole fraction of CO2, is computed (e.g. O'Dell et al., 2012). If we denote by x̂_CO2 the part of the retrieved state vector x̂ containing the CO2 profile, and by A_CO2 and Ŝ_CO2 the corresponding square parts of A and Ŝ, we have the following:

$$ X_{\mathrm{CO_2}} = h^T \hat{x}_{\mathrm{CO_2}} \tag{7} $$

$$ \sigma_{X_{\mathrm{CO_2}}} = \sqrt{ h^T \hat{S}_{\mathrm{CO_2}} h } \tag{8} $$

$$ (a_{\mathrm{CO_2}})_j = \frac{ (h^T A_{\mathrm{CO_2}})_j }{ h_j } \tag{9} $$

where h is the pressure weighting function. σ_XCO2 denotes the posterior uncertainty of the retrieved XCO2, and a_CO2 is the CO2 column averaging kernel. This profile vector describes the vertical sensitivity of the retrieved column with regard to the true profile; it is essential to characterise retrieval results and to compare them to other products, as shown in Sect. 4.2.
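The iteration, posterior covariance, averaging kernel, and column quantities outlined above can be illustrated with a minimal NumPy sketch on a toy linear problem. It assumes the standard Rodgers (2000) form of the update, x_{i+1} = x_i + [(1 + γ) S_a⁻¹ + K_iᵀ S_e⁻¹ K_i]⁻¹ [K_iᵀ S_e⁻¹ (y − F(x_i)) − S_a⁻¹ (x_i − x_a)]. The function name `oe_retrieve`, the random Jacobian `K_toy`, the four-level profile, the covariances, and the uniform pressure weighting function are all hypothetical choices made for illustration; this is not the 5AI code, and 4A/OP is not involved.

```python
import numpy as np


def oe_retrieve(y, forward, jacobian, x_a, S_a, S_e, gamma=0.0, n_iter=5):
    """Toy optimal estimation retrieval (Gauss-Newton when gamma == 0)."""
    S_a_inv = np.linalg.inv(S_a)
    S_e_inv = np.linalg.inv(S_e)
    x = x_a.copy()  # the a priori state is used as first guess
    for _ in range(n_iter):
        K = jacobian(x)
        lhs = (1.0 + gamma) * S_a_inv + K.T @ S_e_inv @ K
        rhs = K.T @ S_e_inv @ (y - forward(x)) - S_a_inv @ (x - x_a)
        x = x + np.linalg.solve(lhs, rhs)
    K = jacobian(x)
    S_hat = np.linalg.inv(K.T @ S_e_inv @ K + S_a_inv)  # posterior covariance
    A = S_hat @ K.T @ S_e_inv @ K                       # averaging kernel matrix
    return x, S_hat, A


# Hypothetical toy setup: 4-level CO2 profile (ppm), 20 synthetic channels.
rng = np.random.default_rng(0)
K_toy = rng.normal(size=(20, 4))                 # made-up linear Jacobian
x_true = np.array([410.0, 408.0, 405.0, 402.0])  # made-up "true" profile
x_a = np.full(4, 400.0)                          # made-up a priori profile
S_a = np.eye(4) * 10.0**2                        # assumed prior covariance
S_e = np.eye(20) * 0.1**2                        # assumed noise covariance
y = K_toy @ x_true                               # noise-free synthetic measurement

x_hat, S_hat, A = oe_retrieve(
    y, forward=lambda x: K_toy @ x, jacobian=lambda x: K_toy,
    x_a=x_a, S_a=S_a, S_e=S_e,
)

h = np.full(4, 0.25)                 # assumed pressure weighting function
xco2 = h @ x_hat                     # column-averaged mole fraction
sigma_xco2 = np.sqrt(h @ S_hat @ h)  # posterior uncertainty of the column
a_co2 = (h @ A) / h                  # column averaging kernel
```

Because the toy forward model is linear, the Gauss-Newton iteration converges after a single step; with a nonlinear forward model the Jacobian would normally be re-evaluated at every iteration and γ adjusted to prevent divergence.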
2.2.2 5AI features and retrieval scheme setups for OCO-2
The 5AI retrieval scheme enables the retrieval of multiple geophysical variables from hyperspectral measurements. These currently include trace gas concentration (represented in the state vector as a concentration profile or a profile scaling factor), global temperature profile offset, surface temperature and pressure, band-wise albedo (whose spectral dependence is modelled as a polynomial), and, finally, scattering particle layer-wise optical depth. For this work, the state vector includes the main geophysical parameters necessary to retrieve XCO2 and is described in Table 1. The a priori values and their covariance are identical to those used in the ACOS B8r version in order to ease the comparison of retrieval results. The OCO-2 spectrometer measures Earth-reflected near- and shortwave infrared (NIR and SWIR) sunlight in three distinct bands, namely the O2 A band (0.7 µm), the weak CO2 band (1.6 µm), and the strong CO2 band (2.0 µm). In order to accurately model OCO-2 measurements, polarisation effects have to be taken into account. As the 4A/OP coupling with (V)LIDORT is not optimal yet, forward calculations can reach unmanageable durations without some assumptions that allow faster radiative transfer simulations. Therefore, as explained in Sect. 1, we first restrict this work to the lowest scattering particle content possible (while compromising with coverage) so that only Rayleigh scattering needs to be taken into account in the O2 A band (0.7 µm). This is done by using the 4A/OP coupling with VLIDORT, and the ACOS Stokes coefficients are applied to yield the final scalar radiances. For the CO2 weak and strong bands, scattering and polarisation can be neglected with this low scattering particle content assumption, and only the Stokes coefficient, 0.5, for the I component of the electric field is applied to yield the final scalar radiances. As we neglect, to save computation time, the possible impact of scattering particles in forward calculations and in the state vector, the retrieval problem becomes more linear. Thus, we can also assume a slow variation in the Jacobian matrix along the retrieval iterations and, therefore, choose not to update it in order to save computational time. Hence, the partial derivatives of the radiative transfer model are evaluated once and for all around the a priori state. We performed a sensitivity test and found that this approximation does not significantly change the retrieval results (not shown).

Data description
The OCO-2 satellite has three distinct observation modes. The nadir and glint modes are the nominal science observation modes; they constitute the vast majority of OCO-2 measurements. In addition, the target mode of the OCO-2 mission provides data for the validation of the retrievals. In target mode, the satellite tilts and aims at a validation target (most of them are TCCON stations) and scans its surroundings several times during the overpass. These sessions thus provide OCO-2 data points closely collocated with validation targets (over areas that can be as small as 0.2° longitude × 0.2° latitude) and registered over a few minutes.
OCO-2 high-resolution spectra are analysed by the ACOS team in order to retrieve XCO2 and other geophysical parameters from them. The ACOS team provides two different XCO2 values, i.e. raw and posterior bias-corrected XCO2. Raw XCO2 is the direct output of the ACOS algorithm following the full physics retrieval; the B8 retrospective (B8r) ACOS data release is used here. Posterior bias-corrected XCO2 is an empirically corrected XCO2 that has a reduced average bias with regard to different truth proxies. In this work, 5AI results are compared with raw XCO2 as we do not perform any empirical bias correction.
In addition, we compare XCO2 retrieved from OCO-2 spectra to TCCON data. The TCCON network uses ground-based high-resolution Fourier transform spectrometers to measure NIR and SWIR spectra that enable the retrieval of the column-averaged dry air mole fractions of greenhouse gases. These retrievals are performed by the GGG2014 software, and their results are available on the TCCON Data Archive (https://tccondata.org/, last access: 25 May 2021).

Data selection
We intend to compare 5AI and ACOS results with regard to TCCON over corresponding sets of soundings. First, we select all the OCO-2 target soundings between 2015 and 2018 with the best ACOS cloud, sounding quality, and outcome flag values. As a compromise between scattering particle content and coverage, we set an upper limit of 0.5 for the ACOS retrieved total aerosol optical depth (AOD). This sample set of OCO-2 target soundings includes 16 414 soundings, with a median ACOS retrieved total AOD of 0.05 and a 75th percentile of 0.1.
For this study, we select the TCCON official products measured within ±2 h of the OCO-2 overpass time and only keep the target sessions for which at least five OCO-2 measurements pass the 5AI posterior filters and five TCCON data points are available. This set includes 9449 TCCON individual retrieval results from 19 TCCON stations listed in Table 2. Besides target data, we also select a sample of OCO-2 nadir land soundings with a coverage as global as possible over the years 2016-2017 (all ACOS flags at their best possible value). For every month and every 5° longitude × 5° latitude bin, we select 25 (10 for North America, southern Africa, and Australia) soundings with low ACOS retrieved total AOD. For 2016 and 2017, this selection is done for a maximum ACOS retrieved total AOD of 0.035 and 0.045, respectively, yielding 17 069 soundings for 2016 and 11 002 for 2017. Figure 1 shows the spatial and temporal distribution of these OCO-2 points.
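The monthly, bin-wise nadir selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary keys (`lon`, `lat`, `month`, `aod`), the function name `select_low_aod`, and the choice of keeping the lowest-AOD soundings first within each bin are hypothetical, since the text only states that a fixed number of low-AOD soundings is kept per month and per 5° × 5° bin below a maximum AOD.

```python
import math
from collections import defaultdict


def select_low_aod(soundings, max_aod, n_per_bin=25, bin_deg=5.0):
    """Keep at most n_per_bin low-AOD soundings per (month, lon bin, lat bin)."""
    bins = defaultdict(list)
    for s in soundings:
        if s["aod"] > max_aod:
            continue  # discard soundings above the AOD ceiling
        key = (
            s["month"],
            math.floor(s["lon"] / bin_deg),
            math.floor(s["lat"] / bin_deg),
        )
        bins[key].append(s)
    selected = []
    for key in sorted(bins):
        # Assumption: within a bin, prefer the soundings with the lowest AOD.
        selected.extend(sorted(bins[key], key=lambda s: s["aod"])[:n_per_bin])
    return selected


# Made-up soundings: four in one bin, one in the neighbouring bin.
example = [
    {"lon": 2.0, "lat": 48.0, "month": 1, "aod": a}
    for a in (0.01, 0.02, 0.03, 0.05)
] + [{"lon": 7.0, "lat": 48.0, "month": 1, "aod": 0.02}]

kept = select_low_aod(example, max_aod=0.035, n_per_bin=2)
```

In such a sketch, the regional exception mentioned in the text (10 instead of 25 soundings for some regions) would simply be a different `n_per_bin` value for the corresponding bins.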

Post-filtering of retrieval results
We apply the a posteriori filters described in Table 3 to ensure the quality of the retrieved results. The surface pressure filter removes soundings for which it proved difficult to successfully model the optical path, suggesting scattering-related errors leading to a large difference between the retrieved and prior surface pressure. The reduced χ² filter removes the worst spectral fits. In the end, 95 % of our selected soundings pass these first two filters. In addition, the blended albedo filter removes the 12 % fraction of target data representative of challenging snow- or ice-covered surfaces (Wunch et al., 2011a). With the current retrieval setup, the difference between the 5AI retrieved surface pressure and its prior exhibits an air mass dependence, as shown in Fig. 2. For the present work, we filter out all soundings with an air mass above 3.0. Future studies will refine the 5AI forward and inverse setup in order to process hyperspectral infrared soundings with larger air masses. Results detailed in the following subsections are based on the 9605 target and 21 254 nadir OCO-2 soundings that passed all these filters.
Figure 2. Distribution of target and nadir 5AI retrievals passing surface pressure, blended albedo, and reduced χ² filters according to air mass and difference between retrieved and prior surface pressures. Grey areas denote bins for which no 5AI retrieval is available.
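The air mass and blended albedo filters can be written down directly from the definitions given in Table 3, namely an air mass of 1/cos(SZA) + 1/cos(VZA) and a blended albedo of 2.4 × O2 A-band albedo + 1.13 × CO2 strong band albedo. The short Python sketch below is illustrative only: the function names, the `target_mode` flag, and the choice of inclusive upper limits are assumptions, and the surface pressure and reduced χ² filters are left out because their thresholds are not given in the text.

```python
import math


def air_mass(sza_deg, vza_deg):
    """Air mass from solar (SZA) and viewing (VZA) zenith angles in degrees."""
    return (1.0 / math.cos(math.radians(sza_deg))
            + 1.0 / math.cos(math.radians(vza_deg)))


def blended_albedo(albedo_o2a, albedo_strong_co2):
    """Blended albedo used to flag snow- or ice-covered surfaces."""
    return 2.4 * albedo_o2a + 1.13 * albedo_strong_co2


def passes_filters(sza_deg, vza_deg, albedo_o2a, albedo_strong_co2,
                   target_mode=True, max_air_mass=3.0, max_blended=0.8):
    """Return True if a sounding passes the air mass and blended albedo filters."""
    if air_mass(sza_deg, vza_deg) > max_air_mass:
        return False
    # Table 3 lists the blended albedo filter for target soundings only.
    if target_mode and blended_albedo(albedo_o2a, albedo_strong_co2) > max_blended:
        return False
    return True


# Made-up soundings: a moderate geometry over a dark surface passes,
# a slant geometry and a bright snow-like surface do not.
ok = passes_filters(30.0, 0.0, 0.2, 0.1)
slant = passes_filters(70.0, 30.0, 0.2, 0.1)
snow = passes_filters(30.0, 0.0, 0.9, 0.05)
```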

OCO-2 target retrieval results
For every target session, we consider a unique average of the available retrieval results from OCO-2 measurements and a unique average of the corresponding TCCON official products, as performed in, for example, O'Dell et al. (2018) and Wu et al. (2018). As the OCO-2 and TCCON XCO2 vertical sensitivities described by their averaging kernels are not exactly identical, we take into account the averaging kernel correction of TCCON data, as performed by the ACOS team and described by Eq. (10) (Nguyen et al., 2014) as follows:

$$ X_{\mathrm{OCO\text{-}2,\,TCCON}} = X_{\mathrm{a\,priori}} + \left( \frac{X_{\mathrm{TCCON}}}{X_{\mathrm{a\,priori}}} - 1 \right) \sum_j h_j \, (a_{\mathrm{CO_2}})_j \, x_{\mathrm{a\,priori},j} \tag{10} $$

X_OCO-2,TCCON is the column-averaged dry air mole fraction of CO2 that would have been retrieved from the OCO-2 measurement if the collocated TCCON retrieval was the true state of the atmosphere. X_a priori is the a priori column-averaged dry air mole fraction of CO2, considered to be very similar between 5AI (or ACOS) and GGG2014. X_TCCON is the TCCON retrieved column-averaged dry air mole fraction of CO2. h is the pressure weighting function vector defined previously, a_CO2 is the CO2 column averaging kernel vector defined in Eq. (9), and x_a priori is the a priori CO2 concentration profile vector. This correction yields a positive shift of the bias with regard to TCCON of about 0.2 ppm for the set of target sessions considered in this work.
Following post-filtering, Fig. 3 shows 5AI raw results compared to the TCCON official product over 92 target sessions. The mean systematic XCO2 bias (5AI − TCCON) is 1.30 ppm, and its standard deviation is 1.32 ppm. The ACOS raw XCO2 and TCCON XCO2 comparison for the corresponding set of OCO-2 soundings is also presented in Fig. 3. The bias compared to TCCON is −2.28 ppm, and its standard deviation is 1.23 ppm. This difference in bias compared to TCCON may be greatly influenced by forward modelling and retrieval setup differences between 5AI and ACOS, as detailed later in this work. Bias-corrected FOCAL and RemoTeC XCO2 retrieval results compared to the ACOS official product exhibit similar differences in bias standard deviations (Reuter et al., 2017b; Wu et al., 2018).
We do not allow the surface pressure, P_nlev, to be lower than its preceding pressure level.
Table 3 (excerpt).
Blended albedo: upper limit 0.8; defined as 2.4 × O2 A-band albedo + 1.13 × CO2 strong band albedo; applied to target soundings (Wunch et al., 2011a).
Air mass: upper limit 3.0; defined as 1/cos(SZA) + 1/cos(VZA), with SZA the solar zenith angle and VZA the viewing zenith angle; applied to nadir and target soundings (Wunch et al., 2011a).
Figure 3. The 5AI (a) and raw ACOS B8r (b) OCO-2 target XCO2 retrieval results compared to the TCCON official XCO2 product. Individual sounding results are averaged for every target session. Markers show the session average for OCO-2 and TCCON XCO2, and error bars show the standard deviations.
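The averaging kernel correction of Eq. (10) can be illustrated with a short Python sketch. It assumes the form X_a priori + (X_TCCON / X_a priori − 1) Σ_j h_j (a_CO2)_j x_a priori,j, with X_a priori computed as hᵀ x_a priori; the function name `ak_corrected_tccon` and all numerical values below are made up for illustration and are not taken from 5AI, ACOS, or TCCON products.

```python
def ak_corrected_tccon(x_tccon, h, a_co2, x_prior_profile):
    """XCO2 the satellite retrieval would report if TCCON were the truth."""
    # A priori column from the pressure weighting function h.
    x_prior = sum(hj * xj for hj, xj in zip(h, x_prior_profile))
    # Column sensitivity term: sum_j h_j * a_j * x_prior_j.
    weighted = sum(hj * aj * xj for hj, aj, xj in zip(h, a_co2, x_prior_profile))
    return x_prior + (x_tccon / x_prior - 1.0) * weighted


# Made-up four-level example (ppm).
h = [0.25, 0.25, 0.25, 0.25]             # assumed pressure weighting function
prior = [400.0, 401.0, 402.0, 403.0]     # assumed a priori CO2 profile
kernel = [0.8, 0.95, 1.05, 1.1]          # assumed column averaging kernel
x_corrected = ak_corrected_tccon(405.0, h, kernel, prior)
```

Two limiting cases are useful sanity checks: with a column averaging kernel equal to 1 at every level the corrected value equals the TCCON value, and when the TCCON value equals the a priori column the correction leaves it unchanged.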
Temporal and latitudinal fits of 5AI and ACOS XCO2 biases, compared to TCCON, are displayed in Fig. 4. Temporal biases are fitted with a first-order polynomial added to a cosine and exhibit a quasi-null slope with a ∼ 0.4 ppm amplitude of yearly oscillation in both 5AI and ACOS cases. Latitudinal bias fits performed with all the available target sessions, except those from Eureka, show that 5AI bias compared to TCCON appears to be larger in the Southern Hemisphere than in the Northern Hemisphere, but its behaviour is quite parallel to ACOS, except at higher latitudes where 5AI and ACOS become closer. The Eureka station (latitude 80° N) has been removed from those fits, as satellite retrievals and validation are known to be challenging at these latitudes.
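A fit of a first-order polynomial added to a yearly cosine can be carried out by linear least squares, since a cosine with unknown phase is a linear combination of a cosine and a sine. The NumPy sketch below shows one way to do this; the function name `fit_bias`, the use of time in years since an arbitrary reference, and the synthetic bias series are assumptions for illustration and do not reproduce the fits of Fig. 4.

```python
import numpy as np


def fit_bias(t_years, bias_ppm):
    """Fit bias = offset + slope * t + amplitude * cos(2 pi t + phase)."""
    t = np.asarray(t_years, dtype=float)
    design = np.column_stack([
        np.ones_like(t),          # offset
        t,                        # linear trend
        np.cos(2.0 * np.pi * t),  # yearly cosine component
        np.sin(2.0 * np.pi * t),  # yearly sine component
    ])
    coef, *_ = np.linalg.lstsq(design, np.asarray(bias_ppm, dtype=float), rcond=None)
    offset, slope, c, s = coef
    amplitude = float(np.hypot(c, s))  # amplitude of the yearly oscillation
    return float(offset), float(slope), amplitude


# Synthetic, made-up bias series over three years (not data from this work).
t = np.linspace(0.0, 3.0, 37)
synthetic_bias = 1.0 + 0.1 * t + 0.5 * np.cos(2.0 * np.pi * t + 0.7)
offset, slope, amplitude = fit_bias(t, synthetic_bias)
```

On this noise-free synthetic series the fit recovers the offset, slope, and amplitude used to build it, which is a convenient check before applying the same function to real session-averaged biases.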

OCO-2 nadir retrieval results
In this subsection, raw 5AI retrieved XCO2 is compared to the ACOS raw product on a sample of OCO-2 nadir soundings, as described in Sect. 3 and displayed in Fig. 1. The nadir viewing configuration is the nominal science mode of the OCO-2 mission and allows comparisons at a larger spatial scale than the one offered by the target mode dedicated to validation. Figure 5 shows the average and associated standard deviation of the difference between 5AI- and ACOS-retrieved raw XCO2. The overall 5AI-ACOS difference is about 3 ppm, with a latitudinal dependency; it is lower above midlatitudes in the Northern Hemisphere. 5AI differences to ACOS also exhibit features over India and the Sahara, places that are often associated with strong aerosol events; these features may be due to the neglect of scattering particles in the 5AI retrievals. The standard deviation is mainly correlated with topography; it is higher in the vicinity of mountain chains and lower in flatter areas. As we do not take topography into account in the sampling strategy of the processed OCO-2 nadir soundings, its greater variability in mountainous areas can result in a greater variability in the retrieved surface pressure, which is strongly correlated with retrieved XCO2. As for the highest standard deviations in South America, they may be caused by the South Atlantic Anomaly, to which they are close.
As seen in Fig. 6, latitudinal variations in raw 5AI-retrieved XCO2 are consistent with those of ACOS, with the difference between the two products being almost constant, except above midlatitudes in the Northern Hemisphere where the differences are smaller. In addition, the comparison between 5AI and ACOS in the nadir mode is consistent with the results obtained for target sessions. Indeed, the raw 5AI-ACOS target difference lies within ±1σ of nadir results, with σ being the standard deviation of the 5AI-ACOS difference. Figure 7 details the temporal variations in the retrieved XCO2. The global long-term increase in the atmospheric concentration of CO2 can be observed in both hemispheres, as can the seasonal cycle, which is stronger in the Northern Hemisphere, where most of the vegetation respiration and photosynthesis happen. The temporal variations in the 5AI-ACOS XCO2 retrieval differences in the nadir mode are also consistent with those presented in the target mode.
Figure 5. Spatial distribution of the 5AI difference with the raw ACOS B8r (average and standard deviation) on 5° × 5° square bins for the nadir data selection.
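The gridded statistics of Fig. 5 (average and standard deviation of the 5AI-ACOS difference on 5° × 5° bins) could be computed along the following lines. This is only a sketch: the function name `bin_difference`, the bin edge convention and the use of the population standard deviation are assumptions, not a description of the processing actually used.

```python
import numpy as np

def bin_difference(lat, lon, diff, cell_deg=5.0):
    """Mean, standard deviation and count of `diff` on a regular lat-lon grid."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    diff = np.asarray(diff, dtype=float)
    n_lat = int(round(180.0 / cell_deg))
    n_lon = int(round(360.0 / cell_deg))
    # Cell indices; clipping puts lat = 90 and lon = 180 in the last cell.
    i = np.clip(((lat + 90.0) // cell_deg).astype(int), 0, n_lat - 1)
    j = np.clip(((lon + 180.0) // cell_deg).astype(int), 0, n_lon - 1)
    flat = i * n_lon + j
    size = n_lat * n_lon
    count = np.bincount(flat, minlength=size)
    total = np.bincount(flat, weights=diff, minlength=size)
    total_sq = np.bincount(flat, weights=diff**2, minlength=size)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = total / count                      # NaN where a cell is empty
        var = total_sq / count - mean**2          # population variance
    std = np.sqrt(np.clip(var, 0.0, None))
    shape = (n_lat, n_lon)
    return mean.reshape(shape), std.reshape(shape), count.reshape(shape)

# Three synthetic soundings: two in the same cell, one elsewhere.
mean_map, std_map, count_map = bin_difference(
    [2.0, 3.0, -47.0], [1.0, 4.0, 10.0], [3.0, 5.0, 2.0])
```

Empty cells come out as NaN, which most plotting libraries leave blank on a map.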

Sensitivity of raw retrieval results to scattering particles
Among the main forward and inverse differences between 5AI and ACOS is the treatment of scattering particles along the optical path. Indeed, ACOS considers five Gaussian-shaped vertical profiles of different scattering particle types for which it retrieves three parameters, while, for computational time purposes, none is considered in the previously presented 5AI results (hereafter denoted 5AI-NS for no scattering). In order to assess the sensitivity of 5AI results to this neglect of scattering particles, we perform some 5AI XCO2 retrievals from OCO-2 soundings while taking into account some aerosol parameters in the forward modelling and state vector. Several adaptations of the 5AI setup are required for this sensitivity test (hereafter denoted 5AI-AER for aerosols). First, we consider here two fixed-height, fixed-width aerosol layers. The first one, representative of coarse-mode minerals, is located between about 800 and 900 hPa, and the second, representative of fine-mode soot, is between about 900 and 1013 hPa. Only the two layer-wise optical depths are retrieved (defined at 755 nm, as in ACOS), each with an a priori value of 0.025 and an a priori uncertainty of 0.15. Otherwise, the state vector and its a priori, described in Table 1, remain unchanged. Regarding forward modelling, we still rely on 4A/OP coupling with VLIDORT for the O2 A-band calculations, and we now use 4A/OP coupling with LIDORT for CO2 weak- and strong-band calculations (thus still neglecting polarisation effects in these bands). Finally, as the retrieval problem becomes less linear when considering scattering particle parameters, we update the Jacobian matrix with every iteration.
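To make the last point concrete, the sketch below shows a generic Gauss-Newton form of the optimal estimation iteration in which the Jacobian is re-evaluated at every step. It is not the 5AI code: the function `optimal_estimation`, the two-parameter toy forward model and all numerical values are assumptions chosen for illustration, apart from the a priori optical depth (0.025) and its uncertainty (0.15), which are borrowed from the text.

```python
import numpy as np

def optimal_estimation(y, forward, jacobian, x_a, S_a, S_e, n_iter=10):
    """Gauss-Newton optimal estimation with the Jacobian re-evaluated at each step."""
    x = np.array(x_a, dtype=float)
    S_a_inv = np.linalg.inv(S_a)
    S_e_inv = np.linalg.inv(S_e)
    for _ in range(n_iter):
        K = jacobian(x)                       # Jacobian updated at every iteration
        hessian = K.T @ S_e_inv @ K + S_a_inv
        rhs = K.T @ S_e_inv @ (y - forward(x) + K @ (x - x_a))
        x = x_a + np.linalg.solve(hessian, rhs)
    # Retrieved state and posterior covariance at the last linearisation point.
    return x, np.linalg.inv(hessian)

# Hypothetical toy forward model: albedo * exp(-optical_depth * g).
g = np.linspace(0.0, 2.0, 20)

def toy_forward(x):
    return x[1] * np.exp(-x[0] * g)

def toy_jacobian(x):
    e = np.exp(-x[0] * g)
    return np.column_stack([-g * x[1] * e, e])

x_true = np.array([0.10, 0.30])               # (optical depth, albedo), synthetic truth
x_prior = np.array([0.025, 0.25])
S_prior = np.diag([0.15**2, 0.10**2])
S_noise = 1e-8 * np.eye(g.size)
x_hat, S_hat = optimal_estimation(toy_forward(x_true), toy_forward, toy_jacobian,
                                  x_prior, S_prior, S_noise)
```

With a fixed Jacobian, the same loop would reduce to repeated linear updates around the a priori state, which is only adequate when the forward model is close to linear.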
With these adaptations, 5AI retrievals are about 12 times slower than when not accounting for scattering particles. Considering the increase in computation time, this sensitivity test can only be performed for a small subsample of the data. We choose to focus here on 15 OCO-2 target sessions (out of the 92 presented in Sect. 4) that have available AERONET (AErosol RObotic NETwork, version 3; AOD level 2.0) optical depths acquired within ±2 h of the OCO-2 overpass (Holben et al., 1998; Eck et al., 1999; Giles et al., 2019), thus enabling us to also discuss total retrieved aerosol optical depths. A total of 445 OCO-2 soundings have been processed, and 228 remain after filtering according to the quality of the spectral fit (reduced χ² < 7.0). Figure 8 shows how taking into account scattering particles in the state vector impacts the retrieval of surface pressure. The air mass dependence, exhibited in 5AI-NS results and shown in Fig. 2, appears to be reduced or even removed for the 5AI-AER results. Indeed, neglecting scattering particles amounts to neglecting the backscattered photons, which leads to forward a priori synthetic measurements being less intense than those actually measured.
Figure 8. Retrieved surface pressure air mass dependence (a) for all 5AI-NS target OCO-2 soundings (light grey), 5AI-NS soundings selected in the small subsample that passed all filters (black), and the corresponding 5AI-AER (red) and ACOS (blue). Distributions of surface pressure degrees of freedom are shown (b) for 5AI-NS (black), 5AI-AER (red) and ACOS (blue).
Figure 9. The 5AI-NS (a), 5AI-AER (b), and raw ACOS B8r (c) OCO-2 target XCO2 retrieval results compared to the TCCON official XCO2 product. Individual sounding results are averaged for every target session. Markers show the session average for OCO-2 and TCCON XCO2, and error bars show the standard deviations.
The retrieval scheme interprets this intensity difference as an a priori overestimation of the amount of O2 along the optical path, and thus as an overestimation of surface pressure, which it then reduces. This explains the −5 hPa average surface pressure bias of 5AI-NS results, compared to the a priori surface pressure in Fig. 8, as opposed to the 1 hPa bias obtained with 5AI-AER for this small subsample of OCO-2 target soundings. Besides, the fraction of measured backscattered photons increases with air mass, leading to the air mass dependence of 5AI-NS results, as shown in Fig. 8. Furthermore, adding scattering particle parameters to the retrieval state vector interferes with the surface pressure retrieval, as the scattering particle and surface pressure information carried by the O2 A band is entangled. As can be seen in Fig. 8b, this leads to lower degrees of freedom for surface pressure compared to retrievals performed without scattering particle parameters in the state vector. The 5AI-AER surface pressure degrees of freedom have a distribution that is more similar to that of ACOS than to that of 5AI-NS. When scattering particle parameters are included in the state vector, this consequently leads to a stronger pull-back of the retrieved surface pressure towards the a priori value, also helping to reduce or even remove the air mass dependence for surface pressure. Thus, for reasons related to both radiative transfer and the retrieval methodology, taking into account scattering particles modifies the average difference between retrieved and a priori surface pressure and helps to remove the air mass dependence seen in 5AI-NS results. Figure 9 shows XCO2 retrieved from OCO-2 measurements for these 15 target sessions by the initial 5AI-NS setup (Fig. 9a), this adapted 5AI-AER setup (Fig. 9b) and ACOS in the B8r raw results (Fig. 9c).
The impact of taking into account scattering particles in the retrievals directly translates from surface pressure to XCO2; it appears that the difference of about 3 ppm exhibited in 5AI-NS results compared to ACOS is reduced to close to 1 ppm in 5AI-AER results. This shows that taking scattering particle parameters into account can indeed explain much of the difference between 5AI-NS results and ACOS. Regarding the retrieved optical depths, Fig. 10 shows 5AI-AER and ACOS retrieved total AOD compared to AERONET reference data interpolated at 755 nm. 5AI-AER exhibits a higher average difference to AERONET than ACOS, but both retrieval algorithms exhibit considerable scatter in their results compared to AERONET. Efforts to optimise 4A/OP coupling with (V)LIDORT are underway so that more OCO-2 data can be processed. Once those are completed, a dedicated study will help to better tune the 5AI scattering particle setup (varying aerosol types, impact of cirrus clouds, varying layer altitudes, etc.).
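The loss of surface pressure degrees of freedom discussed in this section (Fig. 8b) can be illustrated with the standard averaging kernel of a linearised optimal estimation problem. The sketch below is a toy example under stated assumptions: the Jacobian columns `k_psurf` and `k_aer`, the unit a priori covariances and the noise level are invented for illustration and do not come from 4A/OP or from the 5AI setup.

```python
import numpy as np

def degrees_of_freedom(K, S_a, S_e):
    """Diagonal of the averaging kernel A = (K^T Se^-1 K + Sa^-1)^-1 K^T Se^-1 K."""
    info = K.T @ np.linalg.inv(S_e) @ K
    A = np.linalg.solve(info + np.linalg.inv(S_a), info)
    return np.diag(A)

n = 50                                                       # number of toy channels
k_psurf = np.linspace(0.5, 1.5, n)                           # toy surface pressure Jacobian
k_aer = k_psurf + 0.1 * np.sin(np.linspace(0.0, 3.0, n))     # nearly parallel aerosol Jacobian
S_e = 0.5**2 * np.eye(n)                                     # toy measurement noise covariance

# Surface pressure alone in the state vector (analogous to 5AI-NS).
dof_ns = degrees_of_freedom(k_psurf[:, None], np.eye(1), S_e)
# Surface pressure plus one aerosol parameter (analogous to 5AI-AER).
dof_aer = degrees_of_freedom(np.column_stack([k_psurf, k_aer]), np.eye(2), S_e)
```

Because the two toy Jacobian columns are nearly parallel, the measurement cannot fully separate the two parameters, so `dof_aer[0]` is lower than `dof_ns[0]` and the retrieved surface pressure is pulled more strongly towards its a priori value.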

Sensitivity of raw retrieval results to inverse and forward modelling
A difference of about 3 ppm is found between 5AI and ACOS raw XCO2 retrieved from OCO-2 for both nadir and target observations. In Sect. 5.1 we show that neglecting scattering particles for computational time purposes can explain most of this difference. However, the 5AI-AER retrieval setup does not exactly reproduce the ACOS setup, as state vector, forward radiative transfer, and spectroscopic parameter differences remain. All those can be encompassed and accounted for by using an average calculated-observed spectral residual analysis (hereafter calc-obs). It consists in calculating a spectrum (convolved with the OCO-2 instrument line shape) based on the ACOS retrieval results (posterior pressure grid, temperature, H2O, and CO2 profiles as well as albedo and albedo slope) and comparing it to the corresponding OCO-2 observation. Possible background differences are also compensated for by scaling the OCO-2 spectrum so that its transparent spectral windows fit those of the calculated 4A/OP spectrum. Such comparisons must be performed and averaged over a spatially and temporally unbiased data set with a homogeneous viewing geometry in order to cancel out possible dependences. Thus, the analysis is performed here for a randomly chosen half of the nadir OCO-2 points with an air mass below 3.0 selected in 2016 (6790 in total). Figure 11 shows the resulting averaged calc-obs spectral residuals and the corresponding average OCO-2 measurement. Differences are principally located in the 0.7 µm O2 absorption band but also in the 1.6 and 2.0 µm CO2 absorption bands. They are due to the differences in inverse setup and in radiative transfer models between ACOS and 5AI (impact of aerosols, parameterisation of continua, spectroscopy, etc.).
In order to compare 5AI retrievals with ACOS products while attenuating the impact of the forward and inverse modelling differences, the obtained averaged calc-obs residual is added to every OCO-2 measurement within the complementary half of the selected nadir soundings from 2016 (6799 in total). We then apply the 5AI inverse scheme to this new data set. Figure 12 compares the distributions of 5AI-ACOS retrieval results obtained with and without the calc-obs adjustment. The systematic differences between 5AI and ACOS results for XH2O, XCO2, surface pressure, and global temperature profile shift are fully removed when adding the spectral residual adjustment to OCO-2 measurements (remaining differences are negligible compared to standard deviations). This shows that 5AI can, on average, reproduce ACOS results when all their respective differences are compensated for with a calc-obs adjustment. However, the adjustment impacts the standard deviations of 5AI-ACOS differences. Indeed, only ACOS raw results that relate to the 5AI state vector parameters have been used to compute the calculated spectrum used in this calc-obs analysis, and other ACOS parameters, such as scattering particles for instance, have not been considered. Their impact may be attenuated by the background difference correction, which, if disabled, leads to a similar standard deviation of 5AI-ACOS differences both with and without the calc-obs adjustment. However, without the background compensation, the average 5AI-ACOS difference is only reduced to 1.9 ppm for XCO2 (not shown). This exemplifies how highly challenging the sounding-to-sounding intercomparison of retrieval results remains and highlights how forward modelling and retrieval setup design impact XCO2 retrieval results.
Figure 11. 5AI-ACOS average calc-obs spectral residuals in the O2 A band (a), CO2 weak band (b), and CO2 strong band (c) appear in thick black lines (left axis). A typical spectrum for the three bands is shown in thin grey lines (right axis).
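The calc-obs procedure of this section (background scaling on transparent windows, averaging the residual over one random half of the soundings, then adding it to the other half) can be summarised by the following sketch. Everything in it is synthetic and hypothetical: the helper names `scale_to_windows` and `mean_calc_obs`, the choice of a ratio of window means for the scaling, and the toy spectra are assumptions rather than the actual implementation.

```python
import numpy as np

def scale_to_windows(obs, calc, window):
    """Scale obs so its mean level in the transparent window channels matches calc."""
    return obs * np.mean(calc[window]) / np.mean(obs[window])

def mean_calc_obs(calc_set, obs_set, window):
    """Average calc-obs residual over soundings (one spectrum per row)."""
    residuals = [c - scale_to_windows(o, c, window)
                 for c, o in zip(calc_set, obs_set)]
    return np.mean(residuals, axis=0)

rng = np.random.default_rng(0)
n_soundings, n_channels = 100, 40
window = np.zeros(n_channels, dtype=bool)
window[:5] = window[-5:] = True                  # hypothetical transparent channels
calc = 1.0 - 0.5 * np.exp(-0.5 * ((np.arange(n_channels) - 20) / 4.0) ** 2)
calc_set = np.tile(calc, (n_soundings, 1))
model_bias = np.where(window, 0.0, 0.01)         # synthetic systematic calc-obs difference
gains = rng.uniform(0.9, 1.1, size=(n_soundings, 1))
obs_set = gains * (calc_set - model_bias)        # synthetic "observations"

order = rng.permutation(n_soundings)             # random split into two halves
half_a, half_b = order[: n_soundings // 2], order[n_soundings // 2:]
residual = mean_calc_obs(calc_set[half_a], obs_set[half_a], window)
adjusted_b = obs_set[half_b] + residual          # adjusted spectra passed to the retrieval
```

In this toy case the averaged residual recovers the imposed `model_bias` exactly, because the synthetic background differences are pure gain factors that the window scaling removes.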

Conclusions
In this work, we have introduced the 5AI inverse scheme; it implements the optimal estimation algorithm and uses the 4A/OP radiative transfer model with the GEISA spectroscopic database and an empirically corrected absorption continuum in the O2 A band. We have applied the 5AI inverse scheme to retrieve XCO2 from a sample of ∼ 44k OCO-2 soundings that compromises between coverage and the lowest ACOS-retrieved total AOD. We neglected the impact of scattering particles for computational time purposes and obtained a global averaged uncorrected bias compared to TCCON of 1.30 ppm, with a standard deviation of 1.32 ppm for air masses below 3.0. These results are comparable in standard deviation with those obtained by ACOS on the corresponding set of OCO-2 soundings. Moreover, we showed that, similarly to ACOS, 5AI XCO2 retrievals satisfactorily capture the global increasing trend of atmospheric CO2, its seasonal cycle, and its latitudinal variations, and that 5AI results are consistent between OCO-2 nadir and target modes. Although 5AI exhibits a difference of about 3 ppm compared to ACOS, we showed that neglecting scattering particles can explain most of it. Indeed, the average 5AI-ACOS difference is reduced to about 1 ppm when accounting for the optical depths of two aerosol layers, one coarse mode and one fine mode, in the 5AI state vector. This is in part due to how taking into account scattering particles impacts the retrieval of surface pressure, which becomes closer to that of ACOS. The air mass dependence of the 5AI-retrieved surface pressure is also reduced. Finally, we showed that 5AI can on average reproduce ACOS results when adding to OCO-2 measurements an average calc-obs spectral residual correction. It encompasses all the inverse and forward differences between 5AI and ACOS and thus underlines the critical sensitivity of retrieval results to the inverse setup design and forward modelling.
For favourable conditions (all the best ACOS flags and lowest ACOS-retrieved total AOD possible), we showed that 5AI is a reliable implementation of the optimal estimation algorithm, whose results can be compared to ACOS raw products. Efforts are underway to optimise and speed up the 4A/OP coupling with (V)LIDORT. Finally, the implementation of the 5AI retrieval scheme is intended to be compatible with the 4A/OP structure, so that the code can be easily adapted to any current or future greenhouse gas monitoring instrument, from TCCON, EM27/SUN (e.g. Gisi et al., 2012; Hase et al., 2016), and OCO-2 to MicroCarb or Copernicus CO2 Monitoring (CO2M; Meijer and Team, 2019), and can even be applied to research concepts such as the one proposed in the European Commission Horizon 2020 project of SCARBO (Space CARBon Observatory; Brooker, 2018).
Data availability. For this work, we use the B8r version of the OCO-2 data that were produced by the OCO-2 project at the Jet Propulsion Laboratory, California Institute of Technology, and obtained from the OCO-2 data archive maintained at the NASA Goddard Earth Science Data and Information Services Center (NASA GES-DISC). All TCCON references are given in Table 2. AERONET data are available from the AERONET website (https://aeronet.gsfc.nasa.gov/new_web/download_all_v3_aod.html, last access: 25 May 2021, Stutz et al., 2021). 5AI retrieval results presented in this work are available, upon emailed request, from Matthieu Dogniaux (matthieu.dogniaux@lmd.ipsl.fr).
Author contributions. MD developed 5AI and tested it on the OCO-2 soundings under the supervision of CC, with input and support from CC, RA, VirC, TD, and VinC. MDM, NMD, DGF, OEG, DWTG, FH, LTI, RK, IM, JN, DFP, CMR, KS, KS, YT, VAV, and TW provided the TCCON data. MD wrote this article, with feedback from all co-authors.