The Adaptable 4A Inversion (5AI): Description and first XCO2 retrievals from OCO-2 observations

A better understanding of greenhouse gas surface sources and sinks is required in order to address the global challenge of climate change. Spaceborne remote estimations of greenhouse gas atmospheric concentrations can offer the global coverage that is necessary to improve the constraint on their fluxes, thus enabling a better monitoring of anthropogenic emissions. In this work, we introduce the Adaptable 4A Inversion (5AI) inverse scheme that aims to retrieve geophysical parameters from any remote sensing observation. The algorithm is based on Bayesian optimal estimation relying on the Operational version of the Automatized Atmospheric Absorption Atlas (4A/OP) radiative transfer forward model along with the Gestion et Étude des Informations Spectroscopiques Atmosphériques: Management and Study of Atmospheric Spectroscopic Information (GEISA) spectroscopic database. Here, the 5AI scheme is applied to retrieve the column-averaged dry-air mole fraction of carbon dioxide (XCO2) from measurements performed by the Orbiting Carbon Observatory-2 (OCO-2) mission, and uses an empirically corrected absorption continuum in the O2 A-band. For airmasses

https://doi.org/10.5194/amt-2020-403 Preprint. Discussion started: 10 November 2020. © Author(s) 2020. CC BY 4.0 License.

and the Fast atmOspheric traCe gAs retrievaL (FOCAL) algorithm from the University of Bremen (Reuter et al., 2017a, 2017b). Besides implementing different inverse methods, these algorithms also rely on different forward radiative transfer models to compute synthetic measurements and their partial derivatives. WFM-DOAS and BESD use SCIATRAN (Rozanov et al., 2002, 2014) with the time-efficient correlated-k approximation (Buchwitz et al., 2000), and take into account multiple scattering. The XCO2 retrievals performed by NIES (Yoshida et al., 2011) use a fast radiative transfer model that works in k-space to speed up multiple-scattering computations (Duan et al., 2005). RemoTeC uses LINTRAN v2.0, a linearized vector radiative transfer forward model (handling the four components of the Stokes vector at the same time) that employs forward-adjoint theory to solve the radiative transfer equation (Hasekamp and Landgraf, 2002; Schepers et al., 2014). The ACOS XCO2 retrieval algorithm and UoL-FP combine, in a piecemeal approach, the LIDORT model, which performs a scalar single-scattering radiative transfer computation with the discrete ordinate method (Spurr, 2002), and a second-order-of-scattering polarization model named 2OS (Natraj et al., 2008). FOCAL uses a scalar radiative transfer model that approximates multiple scattering by assuming the presence of a unique optically thin isotropic scattering layer in the atmosphere, thus enabling fast forward modelling (Reuter et al., 2017a).

These radiative transfer models also fundamentally depend on spectroscopic databases containing the parameters needed to compute atmospheric gas absorption. The previously mentioned retrieval algorithms mainly rely on the HITRAN spectroscopic database, which has evolved over the years: WFM-DOAS uses HITRAN 2008 (Rothman et al., 2009), as does UoL-FP (with some updated CO2, H2O and CH4 spectroscopic lines). RemoTeC and GOSAT XCO2 retrievals use HITRAN 2008 combined with an O2 A-band line absorption spectroscopic model taking into account line-mixing and collision-induced absorption (CIA) (Tran and Hartmann, 2008), as well as another line-mixing model for CO2 lines (Lamouroux et al., 2010). BESD relies on ABSCO v4.0 (computed by ACOS for OCO-2 processing), as does FOCAL for H2O (Thompson et al., 2012; Reuter et al., 2017a, 2017b). Finally, the ACOS XCO2 retrieval algorithm producing the OCO-2 official product uses ABSCO v5.0 (Drouin et al., 2017; O'Dell et al., 2018), as does FOCAL for O2 and CO2 (Reuter et al., 2017a).

The design of an XCO2 retrieval algorithm, from the forward model and the spectroscopic parameters it uses to the choice of the adjusted quantities in the state vector, has a critical influence on the overall performance of the observing system (Rodgers, 2000). The systematic errors in retrieved XCO2 and their standard deviations (the latter also being called single measurement precision) with regard to the true (but unknown) state of the atmosphere particularly impact the uncertainty reduction and bias in atmospheric CO2 flux inversion studies (e.g. Chevallier et al., 2007). However, direct in-situ measurements of CO2 atmospheric concentration profiles are logistically too difficult to scale up for systematic validation of spaceborne measurements, and so retrieved XCO2 products are most often validated against columns with similar observation geometry, such as those obtained by ground-based solar absorption spectrometry. The Total Carbon Column Observing Network (TCCON) is a network of ground stations that retrieve the column-averaged dry-air mole fraction of CO2 and other species from NIR and SWIR spectra measured with Fourier Transform Spectrometers (FTS) directly pointing at the sun (Wunch et al., 2011b). The network currently consists of 27 stations all around the world and its products constitute a "truth-proxy" reference for the validation of spaceborne retrievals of greenhouse gas atmospheric concentrations. For instance, TCCON datasets were used to validate SCIAMACHY (Reuter et al., 2011), GOSAT XCO2 retrieved by the ACOS (Wunch et al., 2011a) and NIES algorithms (Inoue et al., 2016), and OCO-2 XCO2 produced by ACOS (O'Dell et al., 2018), RemoTeC (Wu et al., 2018) and FOCAL (Reuter et al., 2017b). These last three algorithms exhibit different biases with regard to TCCON, depending on their respective forward modelling and bias correction strategies: 0.30 ± 1.04 ppm, 0.0 ± 1.36 ppm and 0.67 ± 1.34 ppm for OCO-2 nadir land soundings, respectively.
In this paper, we present the Adaptable 4A Inversion (5AI) that relies on the OPerational version of the Automatized Atmospheric Absorption Atlas (4A/OP) radiative transfer model (Scott and Chédin, 1981; Tournier, 1995; Cheruy et al., 1995). The scheme is applied to retrieve XCO2 from (1) OCO-2 cloud-free target session soundings between 2014 and 2018 and (2) a sample of two years of OCO-2 nadir clear-sky measurements with a global land coverage. We compare 5AI retrieval results to TCCON, and to ACOS and FOCAL v08 results over identical sets of soundings, in order to assess the reliability of 5AI as a Bayesian optimal estimation implementation.
This paper is organized as follows: Sect. 2 describes the 5AI retrieval scheme and its current features, as well as the 4A/OP radiative transfer model, the GEISA spectroscopic database and the empirically corrected O2 A-band absorption continuum on which it relies. Section 3 presents the OCO-2 and TCCON data selection. Section 4 presents the a posteriori filters used for this work and shows the 5AI XCO2 target and nadir retrieval results, which are compared to TCCON, ACOS and available FOCAL v08 XCO2 products. Section 4 finally underlines the critical importance of forward modelling differences to explain systematic differences between different XCO2 products through an average 'calculated - observed' spectral residual correction. Section 5 highlights the conclusions of this work.

The 5AI retrieval scheme
As for any other retrieval scheme, 5AI aims at finding the estimate of atmospheric and surface parameters (for example trace gas concentration, temperature profile, surface albedo, or scattering particle optical depth) that best fits hyperspectral measurements made from space. This inverse problem can be expressed with the following equation:

$$\mathbf{y} = F(\mathbf{x}) + \boldsymbol{\varepsilon} \qquad (1)$$

where $\mathbf{y}$ is the measurement vector containing the radiances measured by the space instrument, $\mathbf{x}$ is the state vector containing the geophysical parameters to be retrieved, $\boldsymbol{\varepsilon}$ is the measurement noise and finally $F$ is the forward radiative transfer model that describes the physics linking the geophysical parameters to be retrieved to the measured infrared radiances.

Forward modelling: 4A/OP and GEISA spectroscopic database
The 5AI retrieval scheme uses the OPerational version of the Automatized Atmospheric Absorption Atlas (4A/OP). 4A/OP is an accurate line-by-line radiative transfer model that enables a fast computation of atmospheric transmittances based on atlases containing pre-computed monochromatic optical thicknesses for reference atmospheres. These atlases are used to compute atmospheric transmittances for any input atmospheric profile and viewing configuration; the radiative transfer equation is then solved to yield radiances and their partial derivatives with regard to the input geophysical parameters, either at a pseudo-infinite spectral resolution (0.0005 cm⁻¹ at best) or convolved with an instrument function. 4A/OP is the reference radiative transfer model for the Centre National d'Études Spatiales (CNES) / EUMETSAT IASI Level 1 Calibration/Validation and operational processing, and it is used for daily retrieval of mid-tropospheric columns of CO2 (Crevoisier et al., 2009a) and CH4 (Crevoisier et al., 2009b) from the Infrared Atmospheric Sounding Interferometer (IASI). Moreover, 4A/OP has also been chosen by CNES as the reference radiative transfer model for the development of the New Generation of the IASI instrument (IASI-NG) (Crevoisier et al., 2014).
Although originally developed for the thermal infrared spectral region, 4A/OP now also includes the near and shortwave infrared regions (NIR and SWIR). The extension to NIR and SWIR brought important new features to 4A/OP: (1) The computation of the atlases of optical thickness was extended to the 3,000-13,500 cm⁻¹ domain and takes into account line-mixing and CIA in the O2 A-band (Tran and Hartmann, 2008) as well as line-mixing and H2O-broadening of CO2 lines (Lamouroux et al., 2010). The absorption lines of CO2 we use in this work are thus identical to those included in HITRAN 2008; (2) The solar spectrum is a flexible input and the Doppler shift of its lines is computed; (3) The radiative transfer model is now coupled with the LIDORT model (Spurr, 2002) for scalar multiple-scattering simulation performed with the discrete ordinates method, as well as with VLIDORT (Spurr, 2006) if polarization or Bidirectional Reflectance Distribution Functions (BRDF) need to be taken into account. These new features are critical for the preparation of the French NIR and SWIR CO2 remote sensing MicroCarb mission (Pascal et al., 2017) and the French-German MEthane Remote sensing LIdar Mission (MERLIN) (Ehret et al., 2017).

The 4A/OP radiative transfer model can be used with monochromatic optical thickness atlases computed from any spectroscopic database. For the present work, the atlases are computed using the GEISA 2015 (Gestion et Étude des Informations Spectroscopiques Atmosphériques: Management and Study of Atmospheric Spectroscopic Information) spectroscopic database. Widely used in the astronomical and astrophysical communities since its creation, GEISA has also been used since the 2000s for the preparation of several current and future space missions, and has been chosen by CNES as the reference spectroscopic database for the definition of IASI-NG, MicroCarb and MERLIN. Due to imperfections in the Tran and Hartmann (2008) line-mixing and CIA models, an empirical correction to the absorption continuum in the O2 A-band, fitted from Park Falls TCCON spectra following the method described in Drouin et al. (2017), has been added. Finally, we use Toon (2015) as the input solar spectrum.

Bayesian optimal estimation applied for retrieval
The whole formalism of Bayesian optimal estimation, which makes it possible to find a satisfying solution to Eq. (1), may be found in Rodgers (2000). This subsection only outlines the key steps that are implemented in order to retrieve XCO2.
Equation (1) includes $\boldsymbol{\varepsilon}$, the experimental noise of the measured radiances. Hence, it appears more appropriate to use a formalism that takes into account this measurement uncertainty and translates it into retrieval uncertainty. Considering probability density functions instead of vectors can bring such an insight. With Gaussian statistics, the inversion problem boils down to the minimization of the following $\chi^2$ cost function:

$$\chi^2 = \left(\mathbf{y} - F(\mathbf{x})\right)^T \mathbf{S}_\varepsilon^{-1} \left(\mathbf{y} - F(\mathbf{x})\right) + \left(\mathbf{x} - \mathbf{x}_a\right)^T \mathbf{S}_a^{-1} \left(\mathbf{x} - \mathbf{x}_a\right) \qquad (2)$$

where $\mathbf{x}_a$ is the a priori state vector, which is also in most cases chosen as the first guess for iterative retrievals. Assuming again Gaussian statistics, $\mathbf{S}_a$ is the a priori state covariance matrix that represents the variability around the a priori state vector, and similarly $\mathbf{S}_\varepsilon$ is the a priori measurement error covariance matrix that represents the noise model of the instrument. Moreover, as radiative transfer is a highly non-linear forward model, it is practical to use a local linear approximation, here expressed around the a priori state:

$$F(\mathbf{x}) \approx F(\mathbf{x}_a) + \left.\frac{\partial F}{\partial \mathbf{x}}\right|_{\mathbf{x}_a} \left(\mathbf{x} - \mathbf{x}_a\right) \qquad (3)$$

The partial derivatives of the forward radiative transfer model (here 4A/OP) are expressed as a matrix, called the Jacobian matrix, and denoted $\mathbf{K}$.
All these assumptions enable the maximum posterior probability state $\hat{\mathbf{x}}$ that minimizes the cost function defined in Eq. (2) to be found. It can be computed by iteration, using the general approach:

$$\mathbf{x}_{i+1} = \mathbf{x}_i + \left[(1+\gamma)\,\mathbf{S}_a^{-1} + \mathbf{K}_i^T \mathbf{S}_\varepsilon^{-1} \mathbf{K}_i\right]^{-1} \left[\mathbf{K}_i^T \mathbf{S}_\varepsilon^{-1} \left(\mathbf{y} - F(\mathbf{x}_i)\right) + \mathbf{S}_a^{-1} \left(\mathbf{x}_a - \mathbf{x}_i\right)\right] \qquad (4)$$

where $\gamma$ is a scaling factor that can be set to 0 (Gauss-Newton method) or whose value can be adapted along iterations in order to prevent divergence (Levenberg-Marquardt method). $\mathbf{K}_i$ denotes here the forward radiative transfer Jacobian matrix, whose values are evaluated for the state vector $\mathbf{x}_i$. In this work we assume a slow variation of the Jacobian matrix along the iterations and therefore choose not to update it in order to save computational time. Hence, the partial derivatives of the radiative transfer model are evaluated once and for all around the a priori state. We performed a sensitivity test and found that this approximation does not significantly change the retrieval results (not shown).
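A minimal sketch of such an iteration is given below, assuming the Levenberg-Marquardt form of the optimal estimation update from Rodgers (2000) and a Jacobian evaluated once at the a priori state, as described above. The function name `retrieve`, the toy linear forward model and all numerical values are illustrative assumptions; they are not part of 5AI or 4A/OP.

```python
import numpy as np

def retrieve(y, forward, K, x_a, S_a, S_e, gamma=0.0, n_iter=10):
    """Iterate the optimal estimation update with a fixed Jacobian K.

    gamma = 0 gives Gauss-Newton; gamma > 0 damps the step
    (Levenberg-Marquardt). K is never re-evaluated (assumption
    mirroring the fixed-Jacobian choice described in the text).
    """
    S_a_inv = np.linalg.inv(S_a)
    S_e_inv = np.linalg.inv(S_e)
    x = x_a.copy()
    for _ in range(n_iter):
        lhs = (1.0 + gamma) * S_a_inv + K.T @ S_e_inv @ K
        rhs = K.T @ S_e_inv @ (y - forward(x)) + S_a_inv @ (x_a - x)
        x = x + np.linalg.solve(lhs, rhs)
    return x

# Toy linear forward model F(x) = K x (hypothetical, for illustration only).
K = np.array([[1.0, 0.5], [0.2, 1.0], [0.7, 0.3]])
x_true = np.array([1.2, 0.8])
x_a = np.array([1.0, 1.0])
S_a = np.eye(2) * 0.25   # a priori covariance
S_e = np.eye(3) * 1e-4   # measurement noise covariance
y = K @ x_true           # noise-free synthetic measurement

x_hat = retrieve(y, lambda x: K @ x, K, x_a, S_a, S_e, gamma=0.0)
x_hat_lm = retrieve(y, lambda x: K @ x, K, x_a, S_a, S_e, gamma=1.0, n_iter=50)
```

With a well-constrained measurement, both settings converge to nearly the same state; the damping only changes the path taken, not the minimum of the cost function.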
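A successful retrieval reduces the a priori uncertainty of the state vector described in $\mathbf{S}_a$. The a posteriori covariance matrix $\hat{\mathbf{S}}$ of the retrieved state vector $\hat{\mathbf{x}}$, whose diagonal elements give the posterior variance of the retrieved state vector elements, is expressed as

$$\hat{\mathbf{S}} = \left(\mathbf{K}^T \mathbf{S}_\varepsilon^{-1} \mathbf{K} + \mathbf{S}_a^{-1}\right)^{-1} \qquad (5)$$

Finally, the sensitivity of the retrieval with regard to the true geophysical state is given by the averaging kernel matrix $\mathbf{A}$ calculated according to

$$\mathbf{A} = \left(\mathbf{K}^T \mathbf{S}_\varepsilon^{-1} \mathbf{K} + \mathbf{S}_a^{-1}\right)^{-1} \mathbf{K}^T \mathbf{S}_\varepsilon^{-1} \mathbf{K} \qquad (6)$$

In most cases, the CO2 concentration is included in the state vector as a level or layer profile from which XCO2, the retrieved column-averaged dry-air mole fraction of CO2, is computed (e.g. O'Dell et al., 2012). If we note $\hat{\mathbf{x}}_{\mathrm{CO_2}}$ the part of the retrieved state vector containing the CO2 profile, and $\hat{\mathbf{S}}_{\mathrm{CO_2}}$ and $\mathbf{A}_{\mathrm{CO_2}}$ the corresponding square parts of $\hat{\mathbf{S}}$ and $\mathbf{A}$, we have:

$$X_{\mathrm{CO_2}} = \mathbf{h}^T \hat{\mathbf{x}}_{\mathrm{CO_2}} \qquad (7)$$

$$\sigma_{X_{\mathrm{CO_2}}} = \sqrt{\mathbf{h}^T \hat{\mathbf{S}}_{\mathrm{CO_2}} \mathbf{h}} \qquad (8)$$

$$\left(\mathbf{a}_{\mathrm{CO_2}}\right)_j = \frac{\left(\mathbf{h}^T \mathbf{A}_{\mathrm{CO_2}}\right)_j}{h_j} \qquad (9)$$

where $\mathbf{h}$ is the pressure weighting function. $\sigma_{X_{\mathrm{CO_2}}}$ denotes the posterior uncertainty of the retrieved XCO2 and $\mathbf{a}_{\mathrm{CO_2}}$ is the CO2 column averaging kernel. This profile vector describes the vertical sensitivity of the retrieved column with regard to the true profile: it is essential to characterize retrieval results and to compare them to other products, as shown in Sect. 4.2.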
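The column quantities can be sketched in a few lines, assuming the usual definitions (O'Dell et al., 2012): the column is the pressure-weighted sum of the retrieved profile, its uncertainty is the square root of the pressure-weighted posterior covariance, and the column averaging kernel is the pressure-weighted averaging kernel normalized by the pressure weights. The function name `xco2_diagnostics` and the four-level toy profile are hypothetical.

```python
import numpy as np

def xco2_diagnostics(x_co2, S_co2, A_co2, h):
    """Return (XCO2, posterior uncertainty, column averaging kernel)."""
    xco2 = h @ x_co2                 # pressure-weighted column average
    sigma = np.sqrt(h @ S_co2 @ h)   # posterior uncertainty of the column
    a_col = (h @ A_co2) / h          # column averaging kernel, per level
    return xco2, sigma, a_col

# Toy example (illustrative values only): four levels of equal weight.
h = np.full(4, 0.25)
x_co2 = np.array([400.0, 402.0, 404.0, 406.0])  # ppm
S_co2 = np.eye(4) * 4.0                         # ppm^2
A_co2 = np.eye(4)                               # perfectly sensitive retrieval

xco2, sigma, a_col = xco2_diagnostics(x_co2, S_co2, A_co2, h)
```

An identity averaging kernel matrix yields a column averaging kernel equal to one at every level, which is the ideal case where the retrieved column responds fully to changes in the true profile.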

5AI features and retrieval scheme setups for OCO-2
The 5AI retrieval scheme enables the retrieval of multiple geophysical variables from hyperspectral measurements. Those currently include trace gas concentration represented in the state vector as a concentration profile or a profile scaling factor, global temperature profile offset, surface temperature and pressure, band-wise albedo whose spectral dependence is modelled as a polynomial, and finally scattering particle layer-wise optical depth.

For this work, the iterative scheme is set to the Levenberg-Marquardt method. The state vector includes the main geophysical parameters necessary to retrieve XCO2 and is described in Table 1. The a priori values and their covariance are identical to those used in the ACOS B8r version (O'Dell et al., 2018) in order to ease the retrieval result comparison, as we aim to assess 5AI reliability. However, some elements of the ACOS state vector are not included in this work: scattering particle optical depth (AOD), as we only consider clear-sky soundings; Solar Induced Fluorescence, which is not modelled in 4A/OP; surface wind speed (only land retrievals are considered); and Empirical Orthogonal Function (EOF) scaling factors.

Data description
The OCO-2 spectrometer measures Earth-reflected near and shortwave infrared (NIR and SWIR) sunlight in three distinct bands: the O2 A-band (0.7 µm), the weak CO2 band (1.6 µm) and the strong CO2 band (2.0 µm). The satellite has three distinct observation modes. The nadir and glint modes are the nominal science observation modes; they constitute the vast majority of OCO-2 measurements. In addition, the target mode of the OCO-2 mission provides data for the validation of the retrievals. During a target session, the satellite tilts and aims at a validation target (most of them are TCCON stations) and scans its surroundings several times during the overpass. These sessions thus provide OCO-2 data points closely collocated with validation targets (over areas that can be as small as 0.2° longitude × 0.2° latitude) and registered over a few minutes.

OCO-2 high-resolution spectra are analysed by the ACOS team in order to retrieve XCO2 and other geophysical parameters from them. Two different XCO2 values are provided by the ACOS team: raw and posterior bias-corrected XCO2. Raw XCO2 is the direct output of the ACOS algorithm following the full-physics retrieval: its most recent version is distributed within the B8 retrospective (B8r) ACOS data release (O'Dell et al., 2018). Posterior bias-corrected XCO2 is an empirically corrected XCO2 that has reduced averaged bias with regard to different "truth-proxies" (O'Dell et al., 2018). The last available version of this product is distributed within the B9 retrospective (B9r) ACOS data release. It corrects the impacts of footprint geolocation errors and erroneous prior surface pressure temporal sampling directly in the bias correction procedure applied to the B8r raw XCO2 product, without a complete full-physics reprocessing of all OCO-2 data (Kiel et al., 2019). In this work, 5AI results are compared with B8r raw XCO2, and B9r posterior bias-corrected XCO2 values are also shown.

In addition to ACOS products, we also compare our results with OCO-2 FOCAL v08 data produced at the University of Bremen with the FOCAL algorithm (Reuter et al., 2017a), which includes an empirical posterior bias correction applied directly on top of the full-physics retrieval (Reuter et al., 2017b). Only the posterior bias-corrected XCO2 is included in FOCAL v08 data.
In this work, we compare XCO2 retrieved from OCO-2 spectra to TCCON data. The TCCON network uses ground-based high-resolution Fourier Transform Spectrometers to measure NIR and SWIR spectra that enable the retrieval of the column-averaged dry-air mole fractions of greenhouse gases. These retrievals are performed by GGG2014 and their results are available on the TCCON Data Archive (https://tccondata.org/).

Data selection
We intend to compare 5AI results with regard to TCCON against ACOS and FOCAL results for corresponding sets of soundings. First, we select all the OCO-2 target soundings between 2015 and 2018 with low ACOS retrieved total AOD (<0.5) and ACOS cloud, sounding quality and outcome flags at their best possible value. As FOCAL v08 uses prior and posterior filtering techniques that are different from ACOS, only a fraction of this first selection intersects with available FOCAL data. In order to increase this fraction, we add all OCO-2 points with the best ACOS cloud and sounding quality flags intersecting the available FOCAL v08 data points, whatever the ACOS outcome flag and retrieved AOD. This composite sample set includes 48,885 OCO-2 target soundings and the fraction of available intersecting FOCAL data is shown in Fig. 1. For this study, we select the TCCON official products measured within ± 2 hours of the OCO-2 overpass time and only keep the target sessions where at least five OCO-2 measurements passing 5AI posterior filters and five TCCON data points are available. This set includes 11,102 individual TCCON retrieval results from 20 TCCON stations listed in Table 2. Figure 2 shows the spatial and temporal distribution of these OCO-2 points.
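The session-level collocation rule (TCCON data within ± 2 hours of the overpass, at least five points on each side) can be sketched as follows. This is only an illustration: the function name `collocate_session` is hypothetical, and taking the overpass time as the mean OCO-2 sounding time is an assumption, since the text does not say how the overpass time is defined.

```python
from datetime import datetime, timedelta

def collocate_session(oco2_times, tccon_times, window_hours=2.0, min_count=5):
    """Return the TCCON times collocated with one target session.

    Returns None when the session has fewer than `min_count` OCO-2
    soundings or fewer than `min_count` TCCON points in the window.
    """
    if len(oco2_times) < min_count:
        return None
    # Overpass time taken as the mean OCO-2 sounding time (assumption).
    t0 = min(oco2_times)
    offsets = (t - t0 for t in oco2_times)
    overpass = t0 + sum(offsets, timedelta()) / len(oco2_times)
    window = timedelta(hours=window_hours)
    matched = [t for t in tccon_times if abs(t - overpass) <= window]
    return matched if len(matched) >= min_count else None

# Toy session (illustrative times only).
overpass = datetime(2016, 7, 1, 12, 0, 0)
oco2 = [overpass + timedelta(seconds=s) for s in (-2, -1, 0, 1, 2)]
tccon = [overpass + timedelta(minutes=30 * k) for k in range(-6, 7)]

matched = collocate_session(oco2, tccon)
rejected = collocate_session(oco2[:4], tccon)
```

Here `oco2_times` is assumed to contain only soundings that already passed the 5AI posterior filters.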

× 5° square bins. The titles include the number of soundings n for the corresponding panel: the low number of selected soundings in July-August-September 2017 is due to an identified OCO-2 data gap.

Post-filtering of retrieval results
We apply the a posteriori filters described in Table 3 to ensure the quality of the retrieval results. The surface pressure filter removes soundings for which it proved difficult to successfully model the optical path, suggesting scattering-related errors leading to a large difference between the retrieved and prior surface pressure. The reduced χ² filter removes the worst spectral fits. In the end, 88% of our selected soundings pass these first two filters. In addition, the blended albedo filter removes the fraction of target data (29%) representative of challenging snow or ice-covered surfaces (Wunch et al., 2011a). With the current retrieval setup, the difference between the 5AI retrieved surface pressure and its prior exhibits an airmass dependence, as shown in Fig. 3. For the present work, we filter out all soundings with airmasses above 3.0. Future studies will refine the 5AI forward and inverse setup in order to process hyperspectral infrared soundings with larger airmasses. Results detailed in the following subsections are based on the 24,449 target and 21,254 nadir OCO-2 soundings that passed all these filters.

Figure 3. Distribution of target and nadir 5AI retrievals passing surface pressure, blended albedo and reduced χ² filters according to airmass and difference between retrieved and prior surface pressures. Grey areas denote bins for which no 5AI retrieval is available.

OCO-2 target retrieval results
For every target session, we consider a unique average of the available retrieval results from OCO-2 measurements and a unique average of the corresponding TCCON official products, as performed in e.g. O'Dell et al. (2018) and Wu et al. (2018). As OCO-2 and TCCON XCO2 vertical sensitivities described by their averaging kernels are not exactly identical, we take into account the averaging kernel correction of TCCON data as performed by the ACOS team (O'Dell et al., 2018) and described by Eq. (10) (Nguyen et al., 2014):

$$X_{\mathrm{CO_2},\mathrm{TCCON}} = X_{a\,priori} + \left(\frac{X_{\mathrm{TCCON}}}{X_{a\,priori}} - 1\right) \sum_j h_j \left(\mathbf{a}_{\mathrm{CO_2}}\right)_j x_{a,j} \qquad (10)$$

$X_{\mathrm{CO_2},\mathrm{TCCON}}$ is the column-averaged dry-air mole fraction of CO2 that would have been retrieved from the OCO-2 measurement if the collocated TCCON retrieval was the true state of the atmosphere; $X_{a\,priori}$ is the a priori column-averaged dry-air mole fraction of CO2, considered to be very similar between 5AI (or ACOS) and GGG2014; $X_{\mathrm{TCCON}}$ is the TCCON retrieved column-averaged dry-air mole fraction of CO2; $\mathbf{h}$ is the pressure weighting function vector defined previously; $\mathbf{a}_{\mathrm{CO_2}}$ is the CO2 column averaging kernel vector defined in Eq. (9); and $\mathbf{x}_a$ is the a priori CO2 concentration profile vector. This correction yields a positive shift of the bias with regard to TCCON of about 0.2 ppm for the set of target sessions considered in this work.

Following post-filtering, Fig. 4 shows 5AI raw results compared to the TCCON official product over 106 target sessions. The mean systematic XCO2 bias (5AI - TCCON) is 1.33 ppm and its standard deviation is 1.29 ppm. The ACOS raw XCO2 and TCCON XCO2 comparison for the corresponding set of OCO-2 soundings is also presented in Fig. 4: the bias with regard to TCCON is -2.08 ppm and its standard deviation is 1.27 ppm. This difference in bias compared to TCCON may be greatly influenced by forward modelling differences between 5AI and ACOS, as detailed later in this work. Bias-corrected RemoTeC XCO2 retrieval results compared to the ACOS official product exhibit similar differences in bias standard deviations (Wu et al., 2018).
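The averaging kernel correction of TCCON data can be illustrated with a short sketch. It assumes the scaled-prior form used by Nguyen et al. (2014), in which the TCCON column rescales the a priori profile before the satellite column averaging kernel is applied; the function name `tccon_ak_corrected` and the four-level toy values are hypothetical.

```python
def tccon_ak_corrected(x_apriori, x_tccon, h, a_co2, x_a_profile):
    """TCCON column as it would be seen through the satellite averaging kernel.

    x_apriori   : a priori column-averaged CO2 (ppm)
    x_tccon     : TCCON retrieved column-averaged CO2 (ppm)
    h           : pressure weighting function (sums to 1)
    a_co2       : CO2 column averaging kernel, per level
    x_a_profile : a priori CO2 profile (ppm), per level
    """
    scale = x_tccon / x_apriori - 1.0
    weighted = sum(hj * aj * xj for hj, aj, xj in zip(h, a_co2, x_a_profile))
    return x_apriori + scale * weighted

# Toy example (illustrative values only).
h = [0.25, 0.25, 0.25, 0.25]
x_a_profile = [400.0, 400.0, 400.0, 400.0]

# With a unit averaging kernel, the TCCON column is returned unchanged.
full = tccon_ak_corrected(400.0, 404.0, h, [1.0] * 4, x_a_profile)
# With half sensitivity, only half of the TCCON departure from the prior is seen.
half = tccon_ak_corrected(400.0, 404.0, h, [0.5] * 4, x_a_profile)
```

The two limiting cases show why the correction matters: the lower the sensitivity of the satellite retrieval, the closer the corrected TCCON value stays to the shared a priori.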
Temporal and latitudinal fits of 5AI and ACOS XCO2 biases compared to TCCON are displayed in Fig. 5. Temporal biases are fitted with a first-order polynomial added to a cosine and exhibit a quasi-null slope with a ~0.4 ppm amplitude of yearly oscillation in both the 5AI and ACOS cases. Latitudinal bias fits performed with all the available target sessions except those from Eureka (full lines) show that the 5AI bias compared to TCCON appears to be larger in the Southern hemisphere than in the Northern hemisphere, but its behaviour is quite parallel to that of ACOS, except at higher latitudes where 5AI and ACOS get closer.
The Eureka station (latitude 80°N) has been removed from those fits as satellite retrievals and validation are known to be challenging at these latitudes (O'Dell et al., 2018). The same latitudinal bias fits performed on the dataset intersecting available FOCAL v08 data (dashed lines) show improved 5AI bias compared to TCCON. This is mainly due to the airmass distribution difference between the two sets displayed in Fig. 1.

(right panel). Crosses show individual session averages in the left panel and individual station averages in the right panel, full lines show polynomial fits of this bias for all target sessions, and dashed lines represent the polynomial fits of this bias for the target sessions intersecting FOCAL v08 available soundings, used for the simplistic empirical bias correction applied in Fig. 6.
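A temporal bias fit of the kind described above (a first-order polynomial added to a yearly cosine) can be sketched with a linear least-squares fit. The cosine with unknown phase is rewritten as a sum of a cosine and a sine term so that the problem stays linear; this reformulation, the function name `fit_trend_and_cycle` and the synthetic data are assumptions made for illustration, not the authors' fitting code.

```python
import numpy as np

def fit_trend_and_cycle(t_years, bias):
    """Fit bias(t) = offset + slope * t + A * cos(2*pi*t - phase).

    Returns (offset, slope, amplitude); time is expressed in years so
    that the oscillation has a one-year period.
    """
    t = np.asarray(t_years, dtype=float)
    G = np.column_stack([
        np.ones_like(t),
        t,
        np.cos(2.0 * np.pi * t),
        np.sin(2.0 * np.pi * t),
    ])
    coeffs, *_ = np.linalg.lstsq(G, np.asarray(bias, dtype=float), rcond=None)
    offset, slope, c, s = coeffs
    return offset, slope, np.hypot(c, s)

# Synthetic session biases (illustrative only): no trend, 0.4 ppm yearly cycle.
t = np.linspace(0.0, 4.0, 200)
bias = 1.3 + 0.4 * np.cos(2.0 * np.pi * (t - 0.2))

offset, slope, amplitude = fit_trend_and_cycle(t, bias)
```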
Finally, a consistent comparison of 5AI, ACOS and FOCAL v08 on this intersecting set of available soundings is performed in Fig. 6. Its first column shows 5AI and ACOS raw XCO2 results. As previously mentioned, FOCAL v08 only distributes a posterior bias-corrected XCO2 product. Thus, in order to provide a more consistent comparison of the three retrieval schemes, in the second column of Fig. 6, we apply a simplistic empirical correction on 5AI and ACOS results that removes the fitted latitudinal bias with regard to TCCON, presented in dashed lines in the right panel of Fig. 5. Finally, the last column of Fig. 6 shows official posterior bias-corrected ACOS B9r and FOCAL v08 products. The standard deviations of these biases are quite similar between the three retrieval schemes (0.05 ppm difference between 5AI and ACOS, 0.01 ppm difference between 5AI and FOCAL v08). The slight improvement of 5AI bias compared to TCCON between Fig. 4 and Fig. 6 is due to the differences in airmass distribution between the two sounding sets.

FOCAL v08 (bottom row panel) OCO-2 target retrieval results compared to TCCON official product. Depending on data availability, we show raw results (left column panels), simplistically corrected results based on a latitudinal bias fit (central column panels) and official bias-corrected products (right column panels). Individual sounding results are averaged for every target session: markers show session averages for OCO-2 and TCCON XCO2, and error bars show standard deviations. One target session in Darwin on 11 September 2015 stands out from the other sessions, with either an increased bias compared to TCCON or an increased OCO-2 session-wise standard deviation for the three algorithms. It has been manually removed from the statistics but still appears in red with black lining in the figure.

OCO-2 nadir retrieval results
In this subsection, raw 5AI retrieved XCO2 is compared to the ACOS raw product on a sample of OCO-2 nadir clear-sky soundings as described in Sect. 3 and displayed in Fig. 2. The nadir viewing configuration is the nominal science mode of the OCO-2 mission and allows comparisons at a larger spatial scale than the one offered by the target mode dedicated to validation. Figure 7 shows the average and associated standard deviation of the difference between 5AI and ACOS retrieved raw XCO2.
The overall 5AI - ACOS difference is about 3 ppm, with a latitudinal dependency: it is lower above mid-latitudes in the Northern hemisphere. The standard deviation is mainly correlated with topography: it is higher in the vicinity of mountain chains and lower over flatter areas. As we do not take topography into account in the sampling strategy of the processed OCO-2 nadir soundings, its greater variability in mountainous areas can result in a greater variability of the retrieved surface pressure, which is strongly correlated with retrieved XCO2. As for the highest standard deviations in South America, they may be caused by the nearby South Atlantic Anomaly.

Figure 7. Spatial distribution of the 5AI - raw ACOS B8r average difference and its standard deviation on 5° × 5° square bins for the nadir data selection.
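The gridded statistics behind a map such as Fig. 7 can be sketched as a simple binning of sounding-level differences on a regular 5° × 5° grid. The function name `bin_differences` and the sample values are hypothetical, and the use of the population standard deviation is an assumption, as the text does not specify the estimator.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

def bin_differences(lats, lons, diffs, size_deg=5.0):
    """Average and standard deviation of differences per lat/lon bin.

    Returns a dict keyed by the (latitude, longitude) of each bin's
    south-west corner, with (mean, standard deviation) values.
    """
    bins = defaultdict(list)
    for lat, lon, d in zip(lats, lons, diffs):
        key = (math.floor(lat / size_deg), math.floor(lon / size_deg))
        bins[key].append(d)
    return {
        (i * size_deg, j * size_deg): (mean(v), pstdev(v))
        for (i, j), v in bins.items()
    }

# Toy soundings (illustrative values only): 5AI - ACOS differences in ppm.
lats = [1.0, 4.0, 7.0]
lons = [1.0, 2.0, 1.0]
diffs = [3.0, 2.0, 4.0]

stats = bin_differences(lats, lons, diffs)
```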
As seen in Fig. 8, latitudinal variations of raw 5AI retrieved XCO2 are consistent with those of ACOS, with a difference between the two products almost constant except above mid-latitudes in the Northern hemisphere, where the differences are smaller. In addition, the comparison between 5AI and ACOS in nadir mode is consistent with the results obtained for target sessions. Indeed, the raw 5AI - ACOS target difference lies within ± 1σ of nadir results, with σ the standard deviation of the 5AI - ACOS difference. Figure 9 details the temporal variations of the retrieved XCO2. The global long-term increase of the atmospheric concentration of CO2 can be observed in both hemispheres, as well as the seasonal cycle, which is stronger in the Northern hemisphere where most of the vegetation respiration and photosynthesis happen. The temporal variations of the 5AI - ACOS XCO2 retrieval differences in nadir mode are also consistent with those presented in target mode.

Sensitivity of raw retrieval results to forward modelling
A difference of about 3 ppm is found between 5AI and ACOS raw XCO2 retrieved from OCO-2 for both nadir and target observations. As mentioned in Sect. 1 and 2, the 5AI and ACOS retrieval schemes rely on different radiative transfer models and spectroscopic inputs, and their respective retrieval setups are also quite different. In order to quantify the impact of these differences, we perform an average 'calculated - observed' spectral residual analysis (hereafter 'calc - obs'), where the calculated spectrum (convolved with the OCO-2 Instrument Line Shape) is generated by the forward model 4A/OP using the GEISA spectroscopic database and the ACOS retrieval results (posterior pressure grid, temperature, H2O and CO2 profiles as well as albedo and albedo slope), and is compared to the corresponding OCO-2 observation. In addition, possible background differences are compensated by scaling the OCO-2 spectrum so that its transparent spectral windows fit those of the 4A/OP calculated spectrum. This comparison is performed for a randomly chosen half of the nadir OCO-2 points with an airmass below 3.0 selected in 2016 (6,790 in total). Figure 10 shows the resulting averaged calc - obs spectral residuals as well as the typical transmission of the OCO-2 measurements. Differences are principally located in the 0.7 µm O2 absorption band, but also in the 1.6 and 2.0 µm CO2 absorption bands. They are due to the differences between the ACOS and 5AI radiative transfer models (parametrization of continua, spectroscopy, etc.).

In order to compare 5AI retrievals with ACOS products while attenuating the impact of the forward modelling differences, the obtained averaged calc - obs residual is added to every OCO-2 measurement within the complementary half of the 2016 selected nadir soundings (6,799 in total) to compensate for the systematic radiative model differences between 4A/OP and ACOS. We then apply the 5AI inverse scheme to this new dataset. Figure 11 compares the distributions of 5AI - ACOS retrieval results obtained with and without the calc - obs adjustment. The systematic differences between 5AI and ACOS results for XCO2, !!!, surface pressure and global temperature profile shift are fully removed when adding the spectral residual adjustment to OCO-2 measurements. This allows a first quantification of how spectroscopic and radiative transfer differences can impact XCO2 retrievals. The calc - obs adjustment also impacts the standard deviations of the 5AI - ACOS differences. Indeed, several retrieval setup and forward modelling differences, such as scattering particle parameters, remain unaccounted for in this analysis. Their impact may be attenuated by the background difference correction: if it is disabled, the standard deviations of the 5AI - ACOS differences are similar with and without the calc - obs adjustment. However, without the background compensation, the average 5AI - ACOS difference is only reduced to 1.9 ppm for XCO2 (not shown). This exemplifies how challenging the sounding-to-sounding inter-comparison of retrieval results remains, and highlights how forward modelling and retrieval setup design impact XCO2 retrieval results.
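As a rough illustration of the comparison described above, the sketch below adds a mean spectral residual to an observed spectrum and summarises retrieval differences by their mean and standard deviation. The function names and all numbers are hypothetical toy values, not results from this work. Two further assumptions: the observation and the residual are taken to be on the same radiometric scale, and the sample standard deviation is used.

```python
from statistics import fmean, stdev


def add_residual(obs, mean_residual):
    """Add an averaged 'calc - obs' residual to one observed spectrum, so
    that it is shifted towards what the forward model would calculate."""
    return [o + r for o, r in zip(obs, mean_residual)]


def difference_stats(xco2_a, xco2_b):
    """Mean and sample standard deviation of the sounding-by-sounding
    difference between two XCO2 products (a - b), in the input units."""
    diffs = [a - b for a, b in zip(xco2_a, xco2_b)]
    return fmean(diffs), stdev(diffs)


# Toy values only: an observed spectrum and an averaged residual ...
adjusted = add_residual([1.0, 0.4, 1.0], [0.0, 0.15, 0.0])

# ... and two short lists of collocated XCO2 retrievals (ppm).
mean_diff, std_diff = difference_stats(
    [402.0, 404.0, 406.0], [400.0, 401.0, 402.0]
)
```

Computing `difference_stats` once on retrievals from the original spectra and once on retrievals from the adjusted spectra gives the two pairs of numbers that a comparison like Fig. 11 summarises.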

Conclusions
In this work, we have introduced the 5AI inverse scheme: it implements Bayesian optimal estimation and uses the 4A/OP radiative transfer model with the GEISA spectroscopic database and an empirically corrected absorption continuum in the O2 A-band. We have applied the 5AI inverse scheme to retrieve XCO2 from a sample of ~77k OCO-2 clear-sky soundings with low ACOS retrieved total AOD in target and nadir mode. Its globally averaged uncorrected bias with regard to TCCON is 1.33 ppm, with a standard deviation of 1.29 ppm, for airmasses below 3.0. These results are comparable in standard deviation with those obtained by ACOS and FOCAL v08 for corresponding sets of OCO-2 soundings. Moreover, we showed that, similarly to ACOS, 5AI XCO2 retrievals satisfactorily capture the global increasing trend of atmospheric CO2, its seasonal cycle and its latitudinal variations, and that 5AI results are consistent between OCO-2 nadir and target modes. Although 5AI exhibits a difference of ~3 ppm with regard to ACOS, we showed that forward modelling differences between 5AI and ACOS can be removed with an average 'calculated - observed' spectral residual correction added to OCO-2 measurements, thus underlining the critical sensitivity of retrieval results to forward modelling.
For favourable conditions (clear sky, low ACOS total AOD), we showed that 5AI is a reliable implementation of the optimal estimation algorithm whose results can be compared to other available products. Efforts are underway to optimize and speed up the coupling of 4A/OP with LIDORT and VLIDORT, and hence to process more soundings and account for cirrus clouds or aerosols in the retrievals. Additionally, the 5AI retrieval setup will be refined to process soundings with airmasses larger than 3.0 in future work. Finally, the implementation of the 5AI retrieval scheme is intended to be compatible with the 4A/OP structure, so that the code can be easily adapted to any current or future greenhouse gas monitoring instrument, from TCCON or EM27/SUN (e.g. Gisi et al., 2012; Hase et al., 2016) to OCO-2, MicroCarb (Pascal et al., 2017) or CO2 Monitoring (Meijer and Team, 2019), and even applied to research concepts such as the one proposed in the European Commission H2020 SCARBO project (Brooker, 2018).

Data availability
For this work we use the B8r and B9r releases of OCO-2 data that were produced by the OCO-2 project at the Jet Propulsion Laboratory, California Institute of Technology, and obtained from the OCO-2 data archive maintained at the NASA Goddard Earth Science Data and Information Services Center (NASA GES-DISC). TCCON data are available on the TCCON Data Archive (https://tccondata.org/) and FOCAL v08 data can be downloaded from the FOCAL-OCO2 website hosted by the University of Bremen (http://www.iup.uni-bremen.de/~mreuter/focal.php). 5AI retrieval results presented in this work are available upon request from Matthieu Dogniaux by email (matthieu.dogniaux@lmd.ipsl.fr).