XCO2 estimates from the OCO-2 measurements using a neural network approach

The OCO-2 instrument measures high-resolution spectra of the sun radiance reflected at the Earth surface or scattered in the atmosphere. These spectra are used to estimate the column-averaged dry air mole fraction of CO2 (XCO2) and the surface pressure. The official retrieval algorithm (NASA’s Atmospheric CO2 Observations from Space retrievals, ACOS) is a full physics algorithm and has been extensively evaluated. Here we propose an alternative approach based on an artificial neural network (NN) technique. For the training and evaluation, we use as reference estimates (i) the surface pressures from a numerical weather model and (ii) the XCO2 derived from an atmospheric transport simulation constrained by surface air-sample measurements of CO2. The NN is trained here using real measurements acquired in nadir mode on cloud-free scenes during even months and is then evaluated against similar observations of odd months. The evaluation indicates that the NN retrieves the surface pressure with a root-mean-square error better than 3 hPa and XCO2 with a 1-sigma precision of 0.8 ppm. The statistics indicate that the NN, which has been trained with a representative set of data, allows excellent accuracy, slightly better than that of the full physics algorithm. An evaluation against reference spectrophotometer XCO2 retrievals indicates similar accuracy for the NN and ACOS estimates, with a skill that varies among the various stations. The NN-model differences show spatio-temporal structures that indicate a potential for improving our knowledge of CO2 fluxes. We finally discuss the pros and cons of using this NN approach for the processing of the data from OCO-2 or other space missions.


Introduction
During the past decades, natural fluxes have absorbed about half of the anthropogenic emissions of CO2 (Knorr, 2009), but there is large uncertainty on the spatial distribution of this sink over time and therefore on the processes that control it. A growing network of high-precision atmospheric CO2 measurements has been used together with meteorological information to constrain the sources and sinks of CO2 using a technique known as atmospheric inversion (e.g., Peylin et al., 2013), but the lack of data in large regions of the globe like the tropics does not allow monitoring these fluxes with enough space-time resolution. Early attempts to complement this network with satellite retrievals from sensors that were not specifically designed for this purpose were not successful (Chevallier et al., 2005), but a series of dedicated instruments have been put in orbit since the Greenhouse Gases Observing Satellite (GOSAT, Yokota et al., 2009)  European space agencies (CEOS Atmospheric Composition Virtual Constellation Greenhouse Gas Team, 2018). All missions have adopted the same CO2 observation principle that consists in measuring the solar irradiance that has been reflected at the Earth's surface in selected spectral bands. Along the double atmospheric path (down-going and up-going), the sunlight is absorbed by atmospheric molecules at specific wavelengths. The resulting absorption lines on the measured spectra make it possible to estimate the amount of gas between the surface and the top of the atmosphere. CO2 shows many such absorption lines around 1.61 and 2.06 µm that are used to estimate the CO2 column. Similarly, the oxygen lines around 0.76 µm are used to estimate the surface pressure and can also be used to infer the sunlight atmospheric path, leading to the column-averaged dry air mole fraction of CO2, referred to as XCO2 (Rayner, 2002; Crisp et al., 2004).
One main difficulty in the retrieval of XCO2 from the measured spectra results from the presence of atmospheric particles that scatter light and change its atmospheric path. Accounting for aerosols, in particular, is challenging because aerosols are very variable in amount and in vertical distribution. Another major difficulty results from modelling errors. The radiative transfer models that are used for the retrieval leave significant residuals between the measured and modelled spectra, even after the XCO2 and aerosol amount have been inverted for a best fit (Crisp et al., 2012; O'Dell et al., 2018).
As a consequence of the various uncertainties in the retrieval process, raw XCO2 retrievals show significant biases against reference ground-based retrievals (Wunch et al., 2011b, 2017). These biases, together with the comparison against modelling results, led to the development of empirical corrections to the retrieved XCO2. In the case of the OCO-2 v8r retrievals generated by NASA's Atmospheric CO2 Observations from Space (ACOS), these corrections amount to roughly half of the "signal", i.e. of the difference between the prior and the retrieved XCO2 (O'Dell et al. 2018).
The limitations in the full-physics retrieval method, despite considerable effort and progress (e.g., O'Dell et al. 2018, Reuter et al. 2017, Wu et al., 2018 in the case of OCO-2), encourage the development of alternative approaches. Here, we want to re-evaluate the potential of an artificial neural network technique (NN) to estimate XCO2 from the measured spectra.
An NN-based technique was already used by Chédin et al. (2003) for a fast retrieval of mid-tropospheric-mean CO2 concentrations from some meteorological satellite radiometers. These authors trained their NNs on a large ensemble of radiance simulations made by a reference radiation model and assuming diverse atmospheric and surface conditions. NN-based approaches are also commonly used for the retrieval of other species from various high-spectral-resolution satellite radiance measurements because of their computational efficiency (e.g., Hadji-Lazaro et al. 1999).
An NN approach requires a large and representative training dataset. A standard method for problems similar to that discussed here is to use a radiative transfer model and to generate a large ensemble of pseudo observations based on assumed atmospheric and surface parameters. However, as mentioned above, the radiative transfer models have deficiencies that are rather small, but nevertheless significant with respect to the high precision objective of the CO2 measurements. In addition, there may be some wrong assumptions and unknown instrumental defects that are not accounted for in the forward modeling. We thus prefer to avoid using such radiative transfer models and rather base the training on a fully empirical approach (see, e.g., Aires et al., 2005). We use real OCO-2 observations together with collocated estimates of the surface pressure and XCO2. The retrievals from the NN approach are evaluated against model estimates of surface pressure and XCO2, as well as observations from the Total Carbon Column Observing Network (TCCON, Wunch et al., 2011). In the following, section 2 presents the approach while section 3 describes the results. Section 4 discusses the results and the way forward.

Data and method
Our NN estimates XCO2 and the surface pressure from nadir spectra measured by the OCO-2 satellite over land. If successful, the same approach can be applied to all footprints. The focus on nadir measurements here is motivated by the complication introduced by the Doppler effect in glint mode, which is the other pointing mode for OCO-2 routine science operations: the absorption lines affect pixel elements that vary among the spectra. These variations in the position of the absorption lines may make the NN training more difficult. The solar lines in the nadir spectra are also affected by Doppler shifts due to the motion of the Earth and satellite relative to the sun, but this concerns only the limited set of spectral elements that are affected by the solar (Fraunhofer) lines. The development of a glint-mode NN is therefore left for a future study.
We use spectral samples in the three bands of the instrument (around 0.76, 1.61 and 2.06 µm). They have footprints of ~3 km² on the ground. In principle, each band is described by 1016 pixel elements but some are marked as bad either because some of the corresponding detectors died at some stage or because of known temporary or permanent issues. We systematically remove 15 pixel elements that are flagged in about 80% of the spectra and 478 pixels in the band edges.
Conversely, we do not remove the spectra that are affected by the deep solar lines, and we let the NN handle these specific features. Because the information in the spectrum is mostly in the relative depth of the absorption lines, and not in their overall amplitude, we normalize each spectrum by a radiance that is representative of the off-line (continuum) values (i.e. the mean of the 90-95% range for each spectrum). This essentially removes the impact of the variations in the surface albedo and in the sun irradiance linked to the solar zenith angle. Other choices for the input may be attempted in the future.
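The normalization described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the "90-95% range" refers to the radiances lying between the 90th and 95th percentiles of each spectrum, and the function name and the synthetic spectra are hypothetical.

```python
import numpy as np

def normalize_spectrum(radiance, lo=90.0, hi=95.0):
    """Divide a spectrum by the mean of its radiances between two percentiles.

    Assumption: the reference radiance is the mean of the values lying
    between the `lo` and `hi` percentiles of the spectrum itself.
    """
    radiance = np.asarray(radiance, dtype=float)
    p_lo, p_hi = np.percentile(radiance, [lo, hi])
    in_range = (radiance >= p_lo) & (radiance <= p_hi)
    return radiance / radiance[in_range].mean()

# Two synthetic spectra that differ only by an overall brightness factor,
# as would result from a change in surface albedo.
rng = np.random.default_rng(0)
spectrum = 1.0 - 0.8 * rng.random(1000)
brighter = 3.0 * spectrum

# After normalization the two spectra coincide: only relative depths remain.
same = np.allclose(normalize_spectrum(spectrum), normalize_spectrum(brighter))
```

Because the reference radiance scales with the spectrum, any multiplicative brightness factor cancels out, which is the property the text relies on.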
As input to the NN, we add the observation geometry (sun zenith angle and relative azimuth). The sun zenith angle drives the atmospheric pathlength and is therefore required for the interpretation of the absorption line depth in terms of atmospheric optical depth. The azimuth was not included in our first attempts but, when included, it led to a significant improvement in the results. Although the NN technique does not allow for a clear physical interpretation, we assume that the information brought by the relative azimuth is linked to the polarization of the molecular scattering contribution to the measurements that varies with the azimuth.
The NN exploits these 2557 input variables to compute 2 variables only: XCO2 and the surface pressure. It is structured as a Multilayer Perceptron (Rumelhart et al. 1986) with one hidden layer of 500 neurons that use a sigmoid activation function. The number of hidden layers is somewhat arbitrary and based on a limited sample of trials. Lower quality estimates were obtained with 50 neurons whereas the training time increased markedly for 1000 neurons and more. The weights of the input variables to the hidden neurons and the weights of the hidden variables to the output variables are adjusted iteratively with the standard Keras library (Chollet, 2015). Figure A1 in the appendix illustrates the convergence process. The NN cost function (aka loss) becomes fairly constant for a test dataset after about 100 iterations, while it continues to decrease for the training dataset, indicating an over-fitting of the data. The iteration is stopped when there is no decrease of the test loss for 50 iterations. There is a factor of 3 to 4 between the loss of the training dataset and that of the test, which confirms the over-fitting of the former.
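The network layout and the stopping rule can be illustrated with a short NumPy sketch. It is a stand-in for illustration only, since the authors used Keras. The linear output layer, the function names and the random weights are assumptions. The input count is our reading of the text: 3 bands of 1016 pixels, minus the 15 flagged and 478 band-edge pixels, plus the 2 geometry angles.

```python
import numpy as np

# Input count implied by the text (our reading of the numbers given above).
N_INPUT = 3 * 1016 - 15 - 478 + 2
N_HIDDEN, N_OUTPUT = 500, 2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, w1, b1, w2, b2):
    """One sigmoid hidden layer, then a linear output layer (assumed)."""
    hidden = sigmoid(x @ w1 + b1)
    return hidden @ w2 + b2

def stopping_iteration(test_losses, patience=50):
    """Iteration at which training stops: the first one for which the test
    loss has not decreased during `patience` iterations (None if never)."""
    best, best_it = np.inf, -1
    for it, loss in enumerate(test_losses):
        if loss < best:
            best, best_it = loss, it
        elif it - best_it >= patience:
            return it
    return None

# Hypothetical random weights and inputs, only to check the shapes.
rng = np.random.default_rng(0)
w1 = 0.01 * rng.standard_normal((N_INPUT, N_HIDDEN))
w2 = 0.01 * rng.standard_normal((N_HIDDEN, N_OUTPUT))
b1, b2 = np.zeros(N_HIDDEN), np.zeros(N_OUTPUT)
outputs = mlp_forward(rng.random((4, N_INPUT)), w1, b1, w2, b2)  # shape (4, 2)
```

In Keras, a comparable stopping rule is usually obtained with an early-stopping callback that monitors the validation loss with a patience of 50.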
Note that the NN estimate does not use any a priori information on surface pressure or the CO2 profile after the training is done. Also, no explicit information is provided on the altitude, location or time period of the observation. The NN estimates are therefore only driven by the OCO-2 spectrum measurements, together with the observation geometry (sun zenith and relative azimuth). The observation geometry varies with the latitude and the season so that the NN may infer some location information from this input. Conversely, it is the same from one year to the next and, at a given date, for all longitudes. Thus, there is no information on the longitude or the year of observation in the geometry parameters that are provided to the network.
The NN training is based on OCO-2 radiance measurements (v8r) acquired during even months between January 2015 and August 2018. The 4-year period allows varying the global background CO2 dry air mole fraction by ~2%, as much as typical XCO2 seasonal variations in the northern extra-tropics (see, e.g., Fig. 1 of Agustí-Panareda et al., 2019). Our evaluation dataset is based on observations during the odd months of the same period. In both cases, we make use of XCO2 estimates and the quality control filters of the ACOS L2Lite v9r products: only observations with xco2_quality_flag=0 are used. We also consider the warn level, outcome flag and cloud_flag_idp that are provided in the v8r L2lite and L2Std files. For the training of the NN, we only use the best quality observations, i.e. those with a warn level lower than or equal to 2, a cloud_flag of 3 (very clear) and an outcome flag of 1. This choice is based on an evaluation of the surface pressure estimates that is described below (with the description of Figure 3). This selection leads to about 131 000 observations for the training. For the evaluation of the NN estimates, we use less restrictive criteria and accept observations with outcome_flag of either 1 or 2, and cloud_flag of 2 or 3. These choices are justified below. The spatial distribution of the observations that are used for the training is shown in Figure A2 of the appendix. The training dataset covers most regions of the globe with the exception of South America. The underrepresentation of this sub-continent stems from both the high cloudiness and the impact of cosmic rays, which leads to missing pixel elements (see below).
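The selection rules for the two datasets can be summarized in a few lines of NumPy. This is a sketch under assumptions: the argument names are hypothetical (only xco2_quality_flag and cloud_flag_idp are named in the text), and the warn-level limit applied to the evaluation set is the one stated in the Results section.

```python
import numpy as np

def select_soundings(month, quality_flag, warn_level, cloud_flag, outcome_flag):
    """Boolean masks for the training (even months) and evaluation (odd months) sets."""
    month = np.asarray(month)
    quality_flag = np.asarray(quality_flag)
    warn_level = np.asarray(warn_level)
    cloud_flag = np.asarray(cloud_flag)
    outcome_flag = np.asarray(outcome_flag)

    # Criteria shared by both sets: xco2_quality_flag=0 and warn level <= 2.
    common = (quality_flag == 0) & (warn_level <= 2)
    # Training: even months, very clear scenes, outcome flag of 1.
    train = common & (month % 2 == 0) & (cloud_flag == 3) & (outcome_flag == 1)
    # Evaluation: odd months, cloud flag of 2 or 3, outcome flag of 1 or 2.
    evaluate = (common & (month % 2 == 1)
                & np.isin(cloud_flag, [2, 3]) & np.isin(outcome_flag, [1, 2]))
    return train, evaluate

# Five hypothetical soundings.
train, evaluate = select_soundings(
    month=[2, 2, 3, 3, 4],
    quality_flag=[0, 0, 0, 0, 0],
    warn_level=[0, 3, 1, 2, 0],
    cloud_flag=[3, 3, 2, 1, 2],
    outcome_flag=[1, 1, 2, 1, 1],
)
```

Only the first sounding qualifies for training and only the third for evaluation; the others fail on the warn level or on the cloud flag.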
For the reference surface pressure (training and evaluation), an obvious choice is the use of numerical weather analyses corrected for the sounding altitude. Indeed, the typical accuracy for surface pressure data is on the order of 1 hPa (Salstein et al. 2008). For convenience, we use the surface pressure that is provided together with the OCO-2 data and that is derived from the Goddard Earth Observing System, Version 5, Forward Processing for Instrument Teams (GEOS5-FP-IT) created at Goddard Space Flight Center Global Modeling and Assimilation Office (Suarez et al. 2008 and Lucchesi et al. 2013). There is no such obvious choice for XCO2 as there is no global-scale highly-accurate dataset of XCO2 and we thus rely here on best estimates from a modelling approach. We use the CO2 atmospheric inversion of the Copernicus modelling. For each OCO-2 observation, XCO2 is computed from the collocated concentration vertical profile, through a simple integration weighted by the pressure width of the model layers. Note that the model layers use "dry" pressure coordinates so that there is no need for a water vapor correction in the vertical integration. The GEOS5-FP-IT surface pressure and the XCO2 from CAMS are used both for the training and the evaluation, although using independent datasets (odd and even months).
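The vertical integration amounts to a mean of the layer mole fractions weighted by the layer pressure thickness, XCO2 = sum(c_i dp_i) / sum(dp_i). A minimal sketch follows, in which the function name and the three-layer profile are hypothetical and the layers are assumed to be given in dry-pressure coordinates, as stated in the text.

```python
import numpy as np

def column_average(co2_layers, pressure_edges):
    """Pressure-thickness-weighted mean of the layer CO2 mole fractions.

    co2_layers: mole fraction in each model layer (ppm), N values.
    pressure_edges: dry pressure at the layer interfaces (hPa), N + 1 values.
    """
    co2 = np.asarray(co2_layers, dtype=float)
    dp = np.abs(np.diff(np.asarray(pressure_edges, dtype=float)))
    if dp.size != co2.size:
        raise ValueError("need one more pressure edge than layers")
    return float(np.sum(co2 * dp) / np.sum(dp))

# Hypothetical three-layer profile, from the surface to the top of the atmosphere.
xco2 = column_average([410.0, 405.0, 400.0], [1000.0, 700.0, 300.0, 0.0])
# Weights are 300, 400 and 300 hPa, so xco2 is 405.0 ppm.
```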
Many measured spectra lack one or several spectral pixels. This is particularly the case over South America, as a consequence of the South Atlantic cosmic ray flux anomaly that impacts the OCO-2 detector in this region. We therefore devised a method to interpolate the spectra and to fill the missing pixels. Our method first sorts all spectral pixels as a function of the measured radiance in a large number of complete measured spectra. The pixel ranks are averaged to generate a rank representative of the full dataset. Then, when a pixel element is missing in a spectrum, we look for its typical rank and we average the radiances of the two pixel elements that have the ranks just above and below. The procedure is applied even when several pixel elements are missing in a spectrum, except when these are successive in the typical ranking. The procedure described here fills the missing elements, and the NN can then be applied to the corrected spectrum to estimate the surface pressure and XCO2.

Results
Figure 1 shows a density histogram of the GEOS5 FP-IT surface pressure analysis and of the NN estimate for the evaluation dataset (odd months). Clearly, there is an excellent agreement between the two over a very wide range of surface pressures. There is no significant bias and the standard deviation is 2.9 hPa. The equivalent ACOS v8r retrieval shows a bias of 1.5 hPa and a standard deviation of 3.4 hPa, slightly larger than that of the NN approach. Note that the ACOS statistics are those of the ACOS retrieval-minus-prior statistics (see Section 2). Interpreting them in terms of error is counter-intuitive because the Bayesian retrieval is supposed to be better than the prior, but in practice radiation modelling errors lead to a different interpretation (see, e.g., the discussion in   that have an outcome flag of 1 or 2. This choice is based on a prior performance analysis. We have analyzed how the performance of the NN approach varies with the quality indicators. For this objective, we have compared the retrieved surface pressure against the value derived from the numerical weather data, as in Figure 1, and we have evaluated the statistics of their difference as a function of the quality flags. First (figure not shown), there is no significant difference between the cases when the measured spectra are complete and those when one or several missing pixel elements have been interpolated with the method described above. Conversely, the statistics vary with the cloud flag and the warn level, as shown in Figure 3. We only use the spectra for which an ACOS retrieval is available. Among those, and according to the flag cloud_flag_idp, about 53% are labeled as "very clear" while 43% are "probably clear". The statistics are slightly better for the former than they are for the latter. Conversely, the rather rare "definitely cloudy" and "probably cloudy" show deviations that are significantly larger. This result was expected since our NN did not learn how to handle clouds in the spectra, so that all "definitely cloudy" and "probably cloudy" soundings are outside the domain covered by the training dataset. Note also that the observations used here have all been classified as "clear" by the ACOS preprocessing. Thus, most OCO-2 observations are not used here and Figure 3 should not be interpreted as the ability to retrieve the surface pressure in cloudy conditions. Most (78%) of the observations have a warn level of 0. The deviation statistics increase with the warn level, both in terms of bias and standard deviation. In comparison, the differences in the statistics between an outcome flag of 1 and 2 are small. Besides, more than half of the ACOS retrievals have an outcome flag of 2, which encourages us not to reject those for further use. Based on this analysis, we retain all spectra that are probably or very clear (cloud flag of 2 or 3) and that have a warn level of 2 or less.
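The comparison between complete and gap-filled spectra relies on the rank-based interpolation described in the Data and method section, which can be sketched as follows. This is our reading of the procedure, not the authors' code: the function names are hypothetical, missing pixels are assumed to be marked as NaN, and a pixel at either end of the ranking is left unfilled.

```python
import numpy as np

def typical_order(complete_spectra):
    """Pixel indices sorted by their mean radiance rank over complete spectra."""
    ranks = np.argsort(np.argsort(complete_spectra, axis=1), axis=1)
    return np.argsort(ranks.mean(axis=0))

def fill_missing(spectrum, order):
    """Fill NaN pixels with the mean of their two neighbours in the typical ranking."""
    spectrum = np.asarray(spectrum, dtype=float)
    filled = spectrum.copy()
    position = np.empty_like(order)
    position[order] = np.arange(order.size)   # place of each pixel in the ranking
    for pix in np.flatnonzero(np.isnan(spectrum)):
        k = position[pix]
        if k == 0 or k == order.size - 1:
            continue                          # no neighbour on one side (assumption)
        below, above = spectrum[order[k - 1]], spectrum[order[k + 1]]
        if np.isnan(below) or np.isnan(above):
            continue                          # successive gaps in the ranking: not filled
        filled[pix] = 0.5 * (below + above)
    return filled

# Hypothetical 5-pixel example: three complete spectra with the same shape.
shape = np.array([0.5, 0.1, 0.9, 0.3, 0.7])
order = typical_order(np.outer([1.0, 2.0, 0.5], shape))

gappy = shape.copy()
gappy[0] = np.nan                     # pixel 0 is missing
filled = fill_missing(gappy, order)   # pixel 0 gets the mean of pixels 3 and 4
```

In this toy case the neighbours of pixel 0 in the ranking are pixels 3 and 4 (radiances 0.3 and 0.7), so the gap is filled with 0.5.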
We have made a figure similar to Figure 3 but based on the XCO2 estimates (not shown). Although the results are similar in terms of sign (i.e. increase of the deviations with the warn levels), the signal is not as obvious (there is less relative difference between one warn level and another, or between the various cloud flags). Our interpretation is that the relative accuracy of the surface pressure that is used as a reference estimate is much better than that of the NN retrieval, whereas the accuracy of the XCO2 from CAMS is not much better than that of the NN. As a consequence, variations in the accuracy of the NN do not show up as clearly for XCO2 as they do for the surface pressure.
A standard method to evaluate an algorithm that estimates XCO2 from spaceborne observation is the comparison of its products against estimates from TCCON retrievals. These estimates use ground-based solar absorption spectra recorded by Fourier transform infrared spectroscopy and have been tuned with airborne in-situ profiles (Wunch et al. 2010). To take advantage of the full potential of the TCCON retrievals for the bias-correction and validation of the XCO2 estimates, the OCO-2 platform can be oriented so that the instrument field of view is close to the surface station. The ACOS full-physics algorithm can handle these spectra that are acquired in neither nadir nor glint geometries, but the NN was trained solely on nadir spectra and cannot be applied yet to the observations acquired in target mode. We thus have to rely on nadir measurements acquired in the vicinity of TCCON sites. In the following, we use nadir measurements that are within  Table 1. Two stations, Paris and Pasadena, show a large negative bias for both estimates, which may be interpreted as the impact of the city on the atmosphere sampled by the TCCON measurement, while the atmosphere sampled by the distant satellite may be less affected. Conversely, there is no such negative bias for other stations that are located close to large cities, such as Tsukuba that is in the suburb of the Tokyo Metropolitan area. Zugspitze is rather specific due to its high altitude. The comparison against TCCON indicates that the NN approach performs similarly to ACOS, if not better. The dispersion is larger for one versus the other for some stations, while the opposite is true for others. Note also that the CAMS model performs better than both satellite retrievals for most stations. This observation provides further justification for the use of this model for training the NN.
The evaluation of the algorithm performance is limited by the distance between the satellite estimate and its surface validation. This is inherent to the use of nadir-only observations that are seldom located close to the TCCON sites. A reduction of the distance results in fewer coincidences, which leads to a validation dataset of poor representativeness. Note that the CAMS model was sampled at the location of the satellite observations, so that the higher performance of the model versus the satellite products cannot be caused by a higher proximity to the TCCON station.
We now investigate whether the model-minus-NN differences are purely random or contain some spatial or temporal structures. This question is important as, if the differences show a random structure, there is little hope to use these data to improve the surface fluxes used in the CAMS product. Conversely, if the XCO2 differences do show some structures, they can be attributed to surface flux errors in the CAMS product that may then be corrected through inverse atmospheric modelling. There is no certainty, however, as a spatial structure in the NN-minus-CAMS difference can also be interpreted as a bias in the satellite estimate.
We first show (Figure 5) the difference between the NN estimate of the surface pressure and that from the numerical weather analyses. These are monthly maps of the NN-minus-CAMS difference for the 3 years of the period at a 5°×5° resolution. We only present the odd months as the other months have been used for the training, and therefore do not show any significant differences. There are very clear spatial patterns of a few hPa which are not expected and should be interpreted as a bias in the NN approach. The biases over the high mountains and plateaus have already been mentioned. In addition, positive biases tend to occur in the high latitudes, and negative biases toward the tropics. The structures show additional spatial and temporal patterns and are therefore more complex than just a latitude function. The same figure but based on the ACOS retrievals (Figure A3) displays large-scale structures with different spatial patterns: the surface pressure bias is mostly negative over the Northern latitudes and positive over the low latitudes. A histogram (Figure 6) of the monthly differences such as those shown on Figure 5 confirms that the amplitude of the surface pressure biases is larger with ACOS than it is with the NN. The NN (resp. ACOS) surface pressure bias is -0.33 hPa (resp. 1.39 hPa) and the standard deviation is 2.12 hPa (resp. 2.79 hPa).
Figure 7 is similar to Figure 5 but for the XCO2 difference between the NN estimate and the CAMS model. As for the surface pressure, there are clear spatial patterns, with amplitudes of 1 to 2 ppm. The question is whether these are mostly linked to monthly biases in the CAMS model or to the NN. The first hypothesis is of course more favorable as it would indicate that the satellite data can bring new information to constrain the surface fluxes. However, the analysis of the surface pressure that shows biases of several hPa suggests that the NN XCO2 estimate may also show biases with spatially coherent patterns. Interestingly, the patterns vary in time and are not correlated with those of the surface pressure. Further analysis, in particular atmospheric flux inversion, is necessary for a proper interpretation of the NN-CAMS differences.
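Maps such as these can be produced by averaging the sounding-level differences of one month over 5°×5° cells. A possible sketch with NumPy is given below. The function name and the three sample soundings are hypothetical, and the simple unweighted average per cell is an assumption about how the maps were built.

```python
import numpy as np

def grid_mean_difference(lat, lon, diff, cell=5.0):
    """Mean of `diff` in each cell of a regular lat/lon grid (NaN where empty)."""
    lat_edges = np.arange(-90.0, 90.0 + cell, cell)
    lon_edges = np.arange(-180.0, 180.0 + cell, cell)
    bins = [lat_edges, lon_edges]
    total, _, _ = np.histogram2d(lat, lon, bins=bins, weights=diff)
    count, _, _ = np.histogram2d(lat, lon, bins=bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)

# Three hypothetical soundings of one month: two fall in the same cell.
lat = np.array([2.0, 3.0, 47.0])
lon = np.array([1.0, 4.0, 2.0])
diff = np.array([1.0, 3.0, -0.5])    # e.g. NN-minus-CAMS XCO2 in ppm

grid = grid_mean_difference(lat, lon, diff)   # shape (36, 72)
bias = np.nanmean(grid)                       # mean over the filled cells
spread = np.nanstd(grid)                      # their standard deviation
```

The bias and standard deviation computed over the filled cells of all months correspond to the kind of statistics shown in the histograms of Figure 6.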
The differences between the ACOS estimates and the CAMS model also show patterns of similar amplitude to those in Figure 7 (Figure A4). However, there is no clear correspondence between these patterns and those obtained using the NN product.
The differences between the satellite products and the CAMS model are small, but these contain the information that may be used to improve our knowledge of the surface fluxes. The absence of a clear correlation between the spatio-temporal patterns from the NN and ACOS approaches indicates that their use would lead to very different corrections on the surface fluxes, if used as input of an atmospheric inversion approach. Figure 6, top, shows the histogram of these monthly-mean differences. The histograms are very similar for the two satellite products, although the standard deviation of the difference to the CAMS model is slightly larger for ACOS than it is for the NN approach (0.89 vs 0.83 ppm).

Discussion and Conclusion
The use of the same product for the NN training and its evaluation may be seen as a weakness of our analysis. One may argue that the NN has learned from the model and generates an estimate (either the surface pressure or XCO2) that is not based on the spectra but rather on some prior information. Let us recall that the NN input does not contain any information on the location or date of the observation. This is a strong indication that the information is derived from the spectra as the NN does not "know" the CAMS value that corresponds to the observation location. Yet, the NN input also includes the observation geometry (sun angle and azimuth) that is somewhat correlated with the latitude and day-in-the-year. One may then argue that the NN learns from this indirect information on the observation location and then generates an estimate that is based on the corresponding CAMS value. However, since the observation geometry is exactly the same from one year to the next, there is no information, direct or indirect, on the observation year in the NN input. Thus, the XCO2 growth rate, which is accurately retrieved by the NN method (see Fig. 7), is necessarily derived from the spectra. A similar argument can be made on the spatial variation across the longitudes.
To further demonstrate that the NN retrieves XCO2 from the spectra rather than from the prior, we performed an additional experiment. The training is based only on even months. As a consequence, the prior does not include any direct information on the odd months. For the odd months, the best prior estimate here is a linear interpolation between the two adjacent even months. We can then analyze how the NN estimate compares with the CAMS product, which accounts for the true synoptic variability, and with a degraded version of CAMS that is based on a linear interpolation between the two adjacent months. This comparison is shown in Figure 8. The center figure compares the true CAMS value and that derived from the temporal interpolation. As expected, both are highly correlated (the seasonal cycle and the growth rate are kept in the interpolated values) but nevertheless show a difference with a standard deviation of 0.89 ppm. This value can be interpreted as the synoptic variability of XCO2 that is present in CAMS but is not captured in the interpolated product.
The comparison of the NN estimate against CAMS (right) and the interpolated CAMS (left) shows significantly better agreement with the former. Thus, the NN product does reproduce some XCO2 variability that is not contained in the training prior. It provides further demonstration that the NN estimate relies on the spectra rather than on the time/space variations of the training dataset.
The results shown above indicate that the NN approach allows an estimate of surface pressure and XCO2 with a precision that is similar to or better than that of the operational ACOS algorithm. The lack of independent "truth" data does not allow a full quantification of the product precision and accuracy. However, there are indications that the accuracy on the surface pressure is better than 3 hPa RMS, while the precision (standard deviation) of XCO2 is better than 0.9 ppm. The data used for the XCO2 product evaluation has its own error that is difficult to disentangle from that of the estimate based on the satellite observation. It may also contain a bias that is propagated to the NN through its training.
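The degraded reference used in the interpolation experiment can be illustrated with a toy time series. This sketch only shows the principle: the function name, the use of a running month index and the synthetic values are assumptions, and the text does not specify at which spatial scale the interpolation was applied.

```python
import numpy as np

def interpolate_odd_months(month_index, values):
    """Replace the values of odd months by a linear interpolation between
    the adjacent even months (odd months outside the even range are clamped)."""
    month_index = np.asarray(month_index)
    values = np.asarray(values, dtype=float)
    even = month_index % 2 == 0
    degraded = values.copy()
    degraded[~even] = np.interp(month_index[~even], month_index[even], values[even])
    return degraded

# Synthetic XCO2 series (ppm): a linear growth plus month-to-month
# variability that only affects the odd months.
months = np.arange(9)                       # running month index
wiggle = np.array([0.0, 0.5, 0.0, -0.5, 0.0, 0.5, 0.0, -0.5, 0.0])
truth = 400.0 + 0.2 * months + wiggle

degraded = interpolate_odd_months(months, truth)
odd = months % 2 == 1
lost_variability = np.std(truth[odd] - degraded[odd])   # 0.5 ppm in this toy case
```

The interpolation keeps the growth but removes the variability specific to the odd months, which plays the role of the synoptic signal in the experiment above.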
One obvious advantage of the NN approach is the speed of the computation that is several orders of magnitude higher than that of the full physics algorithm. This is significant given the current re-processing time of the OCO-2 dataset despite the considerable computing power that is made available for the mission. It also bears interesting prospects for future XCO2 imaging missions that will bring even higher data volume (e.g., Pinty et al., 2017).
Another advantage is that the NN approach described in this paper does not require the extensive de-bias procedure which is necessary for the ACOS product (O'Dell et al. 2018, Kiel et al. 2019). By construction, there is no bias between the NN estimates and the dataset that is used for the NN training. The NN approach therefore requires less effort and manpower.
There are however a number of drawbacks for the NN approach that is described in this paper.
One obvious drawback is the use of a CO2 model simulation in the training while the main purpose of the satellite observation is to improve our current knowledge on atmospheric CO2 and its surface fluxes. Our argument is that, although the CAMS simulation used here has high skill (as demonstrated in Figure 4), it may have positive or negative XCO2 biases for some months and some areas. These biases are independent from the measured spectra so that the NN training will aim at average values. As a consequence, the NN product could in principle be of higher quality than the CAMS product, even though the same model has been used as the reference estimate for the training (see, e.g., Aires et al., 2005).
Another drawback of the NN approach is that it does not directly provide its averaging kernel. The averaging kernel vector reports the sensitivity of the retrieved total column to changes in the concentration profile (Connor et al., 1994). It is a combination of physical information (about radiative transfer) and of statistical information (about the prior information). It is needed for a proper comparison with 3D atmospheric models (e.g., Chevallier 2015). When comparing with model simulations, for instance for atmospheric inversion, we may wish to neglect the NN implicit prior information: this hypothesis leads to a homogeneous pressure weighting over the vertical, as this is the product that the NN was trained to simulate. Alternatively, we could decide to neglect the difference in prior information between the NN and the full physics algorithm and use typical averaging kernels of the latter. A third, more involved, option would be to perform a detailed sensitivity study of the NN, based on radiative transfer simulations.
Similarly, the current version of our neural network does not provide a posterior uncertainty. A Monte Carlo approach using various training datasets could be used in the future for such an estimate.
Also, the NN that was developed cannot be safely used to process observations that are acquired later than a few weeks after the last data of the training dataset: because of the CO2 growth rate, later observations would fall outside the variability range of the training data. Therefore, the use of the neural network approach for near real time applications would require frequent updates of the training phase.
We acknowledge the fact that the NN product that is evaluated here is not fully independent from the ACOS product. Indeed, we use the cloud flag and the quality diagnostic from ACOS to select the spectra that are of sufficient quality. If we aim at some kind of operational product, there is a need to design a procedure to identify these good quality spectra. One option would be to compare the surface pressure retrieved by the NN to the numerical weather analysis estimate, and to reject cases with significant deviations (e.g. differences larger than 3 hPa).
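Such a screening is straightforward to express. The sketch below is only an illustration of the option mentioned above, with a hypothetical function name and sample values. The 3 hPa threshold is the example value given in the text.

```python
import numpy as np

def pressure_quality_mask(p_nn, p_analysis, threshold_hpa=3.0):
    """True where the NN surface pressure is within `threshold_hpa` of the
    numerical weather analysis, i.e. where the sounding would be kept."""
    deviation = np.abs(np.asarray(p_nn, dtype=float)
                       - np.asarray(p_analysis, dtype=float))
    return deviation <= threshold_hpa

# Three hypothetical soundings (hPa): deviations of 1.5, 4.5 and 2.9 hPa.
keep = pressure_quality_mask([1012.0, 985.5, 900.0], [1010.5, 990.0, 902.9])
# The second sounding is rejected, the two others are kept.
```

The same mask could then be applied to the XCO2 estimates, on the assumption that a poor surface pressure retrieval signals a spectrum that the NN cannot handle.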
Despite these drawbacks, the results presented here do show that a neural network has a large potential for the estimate of XCO2 from satellite observations such as those of OCO-2, of the forthcoming MicroCarb (Pascal et al. 2017) or the CO2M constellation (Sierk et al. 2018) that aims at measuring anthropogenic emissions. It is rather amazing that a first attempt leads to trueness and precision numbers that are similar to or better than those of the full physics algorithm. There are several ways for improvement: one is to provide the NN with some ancillary information such as the surface altitude or a proxy of the atmospheric temperature. Another one is to train the NN with model estimates (such as those of CAMS used here) but that have been better sampled for their assumed precision, for instance through a multi-model evaluation. Also, one could train the NN with observations acquired during a few days of each month, rather than the even months as done here, so that the evaluation dataset would provide a better evaluation of the seasonal cycle.
Our next objective is to attempt a similar NN approach but for the measurements that have been acquired in the glint mode. As explained above, the glint observations may be more difficult to reproduce by the NN than those acquired in the nadir mode. However, we have been very much surprised by the ability of the NN with the nadir data, and cannot rule out being surprised again. Last, we shall analyze the spatial structure of the NN retrievals in regions that are expected to be homogeneous and in regions where structures of anthropogenic origin are expected (e.g., Nassar et al., 2017; Reuter et al., 2019).

Acknowledgments
This work was in part funded by CNES, the French space agency, in the context of the preparation for the MicroCarb mission. OCO-2 L1 and L2 data were produced by the OCO-2 project at the Jet Propulsion Laboratory, California Institute of Technology, and obtained from the ACOS/OCO-2 data archive maintained at the NASA Goddard Earth Science Data and Information Services Center. TCCON data were obtained from the TCCON Data Archive, hosted by the Carbon Dioxide Information Analysis Center (CDIAC) - tccon.ornl.gov. We warmly thank those who made these data available.

Code/Data availability
The codes used in this paper and the CAMS model simulations are available, upon request, from the author. The OCO-2 and TCCON data can be downloaded from the respective websites.

Author contributions
FMB designed the study. LD developed the codes and performed the computations. All authors shared the result analysis.

Figure A3: Same as Figure 5 but for the surface pressure retrieved by the ACOS algorithm. The mean bias over the full period (µ) is removed so that the differences are centered on zero.
Figure A4: Same as Figure 7 but for the XCO2 retrieved by the ACOS algorithm. The mean bias over the full period (µ) is removed so that the differences are centered on zero.