On the potential of a neural network-based approach for estimating XCO2 from OCO-2 measurements


Abstract. In David et al. (2021), we introduced a neural network (NN) approach for estimating the column-averaged dry-air mole fraction of CO2 (XCO2) and the surface pressure from the reflected solar spectra acquired by the OCO-2 instrument. The results indicated great potential for the technique, as the comparison against both model estimates and independent TCCON measurements showed an accuracy and precision similar to or better than that of the operational ACOS (NASA's Atmospheric CO2 Observations from Space) retrieval algorithm. Yet, subsequent analysis showed that the neural network estimate often mimics the training dataset and is unable to retrieve small-scale features such as CO2 plumes from industrial sites. Importantly, we found that, with the same inputs as those used to estimate XCO2 and surface pressure, the NN technique is able to estimate latitude and date with unexpected skill, i.e., with an error whose standard deviation is only 7° and 61 d, respectively. The information about the date mainly comes from the weak CO2 band, which is influenced by the well-mixed and increasing concentrations of CO2 in the stratosphere. The availability of such information in the measured spectrum may therefore allow the NN to exploit it, rather than the direct CO2 imprint in the spectrum, to estimate XCO2. Thus, our first version of the NN performed well mostly because the XCO2 fields used for the training were remarkably accurate, but it did not bring any added value.
Further to this analysis, we designed a second version of the NN, excluding the weak CO2 band from the input. This new version behaves differently, as it does retrieve XCO2 enhancements downwind of emission hotspots, i.e., a feature that is not in the training dataset. The comparison against the reference Total Carbon Column Observing Network (TCCON) and the surface-air-sample-driven inversion of the Copernicus Atmosphere Monitoring Service (CAMS) remains very good, as in the first version of the NN. In addition, the differences from the CAMS model (also called innovations in a data assimilation context) of the ACOS and NN estimates are correlated.
These results confirm the potential of the NN approach for an operational processing of satellite observations aiming at the monitoring of CO 2 concentrations and fluxes. The true information content of the neural network product remains to be properly evaluated, in particular regarding the respective input of the measured spectrum and the training dataset.

Introduction
There is a growing interest in the monitoring of CO2 from space. The aim is not so much the atmospheric concentration, which is already known with high accuracy, but rather the CO2 fluxes. Indeed, there is a need to monitor natural fluxes of CO2 to better understand their driving factors and to improve land and ocean models (Peylin et al., 2013). There is also a strong societal requirement to monitor the anthropogenic CO2 emissions at national and more detailed scales. For these objectives, a series of dedicated instruments have been put in orbit since the Greenhouse Gases Observing Satellite (GOSAT, Yokota et al., 2009) and the second Orbiting Carbon Observatory (OCO-2, Eldering et al., 2017), launched in 2009 and 2014, respectively, and still operated at the time of writing. This new and evolving constellation is directly supported by Japanese, US, Chinese, and European space agencies (CEOS Atmospheric Composition Virtual Constellation Greenhouse Gas Team, 2018). The OCO-3 instrument was launched in 2019 and is flying attached to the International Space Station (ISS) with a focus on the imagery of cities and industrial sites (Taylor et al., 2020). These targets are also the main focus of the CO2M mission under development at ESA.
Published by Copernicus Publications on behalf of the European Geosciences Union. F.-M. Bréon et al.: XCO2 estimated by a neural network approach
These missions all use the same general principle to estimate the CO2 concentration in the atmosphere. They measure the reflected solar light at high spectral resolution, which allows identification of absorption lines whose depth is related to the total amount of gas along the atmospheric path (O'Brien and Rayer, 2002). Atmospheric CO2 shows a number of such lines close to 1.61 and 2.06 µm, so these spectral regions are targeted. Because the absorption is more intense at 2.06 µm, this measurement channel is often referred to as the strong-CO2 (or sCO2) band, whereas the 1.61 µm channel is the weak-CO2 (wCO2) band. The line depth is also affected by the surface pressure and the number of scattering particles in the atmosphere. To identify and account for their contribution, an additional measurement is made around the oxygen absorption band at 0.76 µm (O2 band). The combination of these measurements makes it possible to estimate the column-averaged dry-air mole fraction of CO2, referred to as XCO2 (Crisp et al., 2004). Note that the MicroCarb instrument, to be launched by CNES in 2022, will have a fourth band at 1.27 µm. This band serves the same purpose as the O2 band; it has the advantage of being spectrally closer to the CO2 bands and the disadvantage of being affected by airglow (Bertaux et al., 2020).
The interpretation of measured spectra in terms of XCO2 is achieved through full-physics algorithms that explicitly account for the absorption by CO2, O2, and water vapor; for scattering in the atmosphere; and for non-Lambertian reflection at the Earth's surface. The modeling must also account for the instrument line shape function and Doppler effects. The inversion process is iterative and starts from a prior estimate of all atmospheric parameters. It is computationally very expensive. The processing of OCO-2 data has shown systematic differences between the measured spectra and those modeled after inversion, which led to the development of empirical corrections to the measured spectra (Crisp et al., 2012; O'Dell et al., 2018). In addition, raw XCO2 retrievals show significant biases against reference ground-based retrievals (Wunch et al., 2011b). These biases, together with the comparison against modeling results, led to the development of empirical corrections to the retrieved XCO2. The need for empirical corrections to the full-physics algorithms and the considerable computing load motivated us to develop an alternative approach, described in David et al. (2021). We used an artificial neural network (NN) technique, which is purely empirical and does not use any radiative transfer model. Our hypothesis was that the CAMS (Copernicus Atmosphere Monitoring Service) model constrained by surface air sample measurements provides a fairly accurate estimate of the atmospheric CO2 concentration, including the growth rate over multiple years (Chevallier et al., 2019; see also Fig. 8). Indeed, the seasonal cycle of CO2 together with the growth rate generates a set of XCO2 samples with a well-known variability. The uncertainties in the modeling (≈ 1 ppm) are small with respect to the range of XCO2 samples that is available in the multi-year dataset (20 ppm). As a consequence, although CAMS is not the truth, it may be used for supervised learning. 
Note that other 4D descriptions of the atmospheric composition could have been used for our work. We chose CAMS mostly for practical reasons; the same procedure may be attempted with another modeling dataset.
In practice, we used a series of OCO-2 spectra from a 5-year dataset for the NN training. We then applied the NN to the observations that were not used in the training and compared their estimates to both the same CAMS model used for the training and also the fully independent set of Total Carbon Column Observing Network (TCCON, Wunch et al., 2011a) observations. The results indicated an accuracy and precision that were similar to, if not better than, that of the ACOS algorithm.
More recent results challenged our interpretation of the NN skill. In particular, the XCO 2 estimates of the NN did not show significant enhancement downwind of large power plants, unlike the product of the NASA Atmospheric CO 2 Observations from Space (ACOS) full-physics algorithm. This is shown in the following together with our interpretation. A new version of the NN resulted from this interpretation and retains the high accuracy of the first version, while being much more independent from the training dataset.
In the following, Sect. 2 describes the main characteristics of the NN approach and the training procedure. Section 3 presents the limitation of the first version of the NN, which shows no innovation with respect to the training dataset. Section 4 describes and justifies a new version of the NN approach. Section 5 discusses the results, suggests directions for improvements, and concludes.

Data and method
The NN described in this paper estimates XCO2 from spectra measured by the OCO-2 satellite over land. Most of the analysis is made with the spectra acquired in nadir mode, but we have also developed a version for glint acquisition that is described and commented on at the end of Sect. 4. Unlike in the analysis of David et al. (2021), we now use all cross-track footprints. A single NN is used to process all footprints even though the spectral elements of different footprints correspond to different sampled wavelengths.
We use spectral samples in the three bands of the instrument (around 0.76, 1.61, and 2.06 µm). They have footprints of ∼ 3 km2 on the ground. In principle, each band is described by 1016 samples, but some are marked as bad either because some of the corresponding detectors died at some stage or because of known temporary or permanent issues. We systematically remove 15 spectral samples that are flagged in about 80 % of the spectra and 478 pixels in the band edges. Conversely, we do not remove the samples that are affected by the deep solar lines, and we let the NN handle these specific features. Because the information in the spectrum is mostly in the relative depth of the absorption lines, and not in their overall amplitude, we normalize each spectrum by a radiance that is representative of the off-line (continuum) values (i.e., the mean of the 90 %-95 % range for each spectrum). This essentially removes the impact of the variations in the surface albedo and in the solar irradiance linked to the sun zenith angle.
Figure 1 offers a graphical representation of the NN. As input, we use the three band spectra (or a subset; see below) and the observation geometry (sun and view zenith angles, SZA and VZA, and relative azimuth, AZI). Some versions also use the surface pressure (Psurf) as input. No explicit information is provided to the NN regarding the location or date of the observation. The inputs feed all the neurons of a first "hidden" layer. We use a fully connected neural network, which means that all the neurons of a layer are connected to the neurons of the previous and next layers. We have attempted NN versions with a variable number of hidden layers (a single one was used in David et al., 2021). Each neuron computes a weighted sum of its inputs and derives a single output on the basis of either a sigmoid function or a "rectified linear unit". The loss is derived from the mean absolute error. 
The weights of the input variables to the neurons are adjusted iteratively with the standard Keras library (Keras Team, 2015) for an optimal agreement between the NN output and a reference.
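The normalization and the network structure described above can be illustrated with a minimal NumPy sketch. It is not the code used in this study: the function names, the choice of a rectified linear unit in every hidden layer, and the linear output layer are assumptions made for illustration, and the actual training relies on Keras rather than on this hand-written forward pass.

```python
import numpy as np

def normalize_spectrum(radiance):
    """Divide a band spectrum by a continuum level, taken here as the mean
    of the samples ranked between the 90th and 95th percentiles."""
    radiance = np.asarray(radiance, dtype=float)
    ordered = np.sort(radiance)
    n = ordered.size
    start = (90 * n) // 100
    stop = max((95 * n) // 100, start + 1)  # keep at least one sample
    continuum = ordered[start:stop].mean()
    return radiance / continuum

def relu(x):
    """Rectified linear unit."""
    return np.maximum(x, 0.0)

def forward(x, layers):
    """Fully connected network (illustrative): each hidden layer applies a
    weighted sum followed by a ReLU; the last layer is assumed linear."""
    for weights, bias in layers[:-1]:
        x = relu(x @ weights + bias)
    weights, bias = layers[-1]
    return x @ weights + bias

def mean_absolute_error(predicted, reference):
    """Loss used to compare the network output with the reference."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(diff)))
```

Because the spectrum is divided by its own continuum level, multiplying all radiances by a constant (a brighter surface, a higher sun) leaves the normalized spectrum unchanged, which is the property sought in the text.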
The NN training is based on OCO-2 radiance measurements (v10r) acquired between February 2015 and December 2019. We make use of XCO2 estimates and the quality control filters of the ACOS L2Lite v9r products (Eldering et al., 2015): only observations with xco2_quality_flag = 0 are used. For the validation of the NN estimates, we also use observations with relaxed quality requirements. For versions of the NN that use the surface pressure as input, we use the estimate that is provided together with the OCO-2 data, which is derived from the Goddard Earth Observing System, Version 5, Forward Processing for Instrument Teams (GEOS5-FP-IT) created at the Goddard Space Flight Center Global Modeling and Assimilation Office (Suarez et al., 2008; Lucchesi, 2013). The weather model pressures have been adjusted to the sounding surface height.
Our analysis makes use of the CAMS CO2 atmospheric inversion (Chevallier et al., 2010; version 19r1). This product was released in July 2020 and contributed, e.g., to the Global Carbon Budget 2020 (Friedlingstein et al., 2020). It results from the assimilation of CO2 surface air sample measurements in a global atmospheric transport model run at a spatial resolution of 1.90° in latitude and 3.75° in longitude over the period 1979-2019 and using the adjoint of this transport model. Neither satellite retrievals nor TCCON observations were used for this modeling. For each OCO-2 observation, XCO2 is computed from the collocated concentration vertical profile, through a simple integration weighted by the pressure width of the model layers. Note that the model layers use "dry" pressure coordinates so that there is no need for a water vapor correction in the vertical integration. The XCO2 from CAMS is used for both the training and the evaluation, although using independent datasets: the "training" dataset is a 3 % random sample of the full dataset. The observations that are used for the training are earmarked and not used for further evaluation.
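As an illustration of the two steps described above, the following sketch computes a column average weighted by the pressure width of the layers and draws a random training subset. The function names, the layer-edge convention, and the random seed are hypothetical; the sketch assumes that the layer-mean mole fractions and the dry-pressure layer edges are already collocated with the observation.

```python
import numpy as np

def column_average(co2_layer_ppm, dry_pressure_edges):
    """Column-averaged mole fraction: layer values weighted by the
    (dry) pressure width of each layer."""
    co2 = np.asarray(co2_layer_ppm, dtype=float)
    edges = np.asarray(dry_pressure_edges, dtype=float)
    widths = np.abs(np.diff(edges))  # one width per layer
    return float(np.sum(co2 * widths) / np.sum(widths))

def training_mask(n_obs, fraction=0.03, seed=0):
    """Boolean mask earmarking a random subset for training; the
    complement (~mask) is kept for the evaluation only."""
    rng = np.random.default_rng(seed)
    return rng.random(n_obs) < fraction
```

With this convention, an observation is used either for the training or for the evaluation, never for both, since the evaluation set is defined as the logical complement of the training mask.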

Initial results and interpretation
David et al. (2021) described a first version of the NN approach to estimate XCO2. In this first version, the surface pressure was not used as input, and the training was made on observations acquired during even months, while the validation used observations of the odd months. The results were surprisingly good in that the statistical difference to both the CAMS modeling and the independent TCCON observations indicated an accuracy similar to or better than that of the ACOS product. Further analysis, carried out after the publication, was worrisome, however.
First, we found that well-documented local enhancements of XCO2 in the ACOS product (e.g., Nassar et al., 2017; Reuter et al., 2019), also referred to as plumes, did not show up in the NN product. We analyzed in particular a case over South Africa acquired on 31 August 2016, an illustration of which is provided in Fig. 2. Over a distance of ≈ 100 km, the ACOS product shows several well-identified enhancements of ≈ 5 ppm, whereas the NN product does not show any significant pattern. The presence of large coal power plants upwind of the OCO-2 observations makes the enhancements trustworthy. We found many similar cases where the NN did not display an XCO2 plume where ACOS did. We concluded that the NN did reproduce the seasonal variation in XCO2 together with the growth rate but was unable to identify small-scale features. Since all observations are processed independently, we could not interpret this apparent incoherence.
Second, we made an experiment where the training dataset is biased by 1 ppm for the observations acquired during a single month (within the full period of 50+ months). When applied to the validation dataset, the differences to CAMS show a bias of ≈ 0.5 ppm, but only for the observations that are within a few weeks of the biased period (Fig. A1). This is rather surprising as the observation date is not an input of the NN. Still, these results provide a clear indication that this version of the NN is somehow sensitive to the observation date.
To investigate the issue, we developed and trained a new NN with the same inputs, but aiming at estimating the date, latitude, and longitude. For the training, we used the true values of these parameters, and we analyzed how the NN was able to make an estimate based on the inputs (the spectra and the observation geometry). Figure 3 shows the histograms of the errors when applied to the independent dataset.
The results indicate that the NN approach is able to make a reasonable estimate of the location and date of the observation based on the spectra and the observation geometry. The standard deviation of the latitude error is on the order of 7°, and there is no significant difference with the footprint. One may expect that this information is largely derived from the observation geometry that changes with the latitude (both the SZA and the azimuth do). One argument in favor of this hypothesis is that the precision of the longitude estimate is much worse, with a standard deviation on the order of 58°. Indeed, for a given day, the observation geometry is nearly the same for all successive orbits; thus, there is no information in the observation geometry to estimate the longitude, while there is such information for the latitude. As for the date, the standard deviation is ≈ 61 d, or 2 months. Clearly then, in the input data of the NN, there is indirect information about the observation date and latitude, and this was a surprise to us. Indeed, when describing the NN approach in David et al. (2021), we argued that the NN had no information on the measurement date, as successive observations from the same day of year and location, but different years, were made with the exact same observation geometry.
The various histograms of Fig. 3 were made using a single (O 2 ) band, a combination of the O 2 band with either CO 2 band, and all three bands. The most striking difference between the various histograms is for the date estimate. Indeed, the accuracy strongly degrades when the wCO2 band is not included. The combination of O 2 + wCO2 bands leads to a much better accuracy (a factor of more than 3 on the standard deviation) than that obtained with O 2 + sCO2. The other differences on the histograms are not as large.
How does the NN obtain indirect information on the observation date, and why is this information somehow contained in the wCO2 band? Our best interpretation is that the weak CO2 spectrum is sensitive to the upper-atmosphere CO2 concentration, which is rather well mixed while increasing regularly in time. The absorption lines in the sCO2 band are much stronger so that their centers are saturated in the spectra. As a consequence, the CO2 signal is more in the line wings, which are more sensitive to the higher pressure (lower altitude) levels. The wCO2 lines are not saturated, and the spectrum shape may provide the information for an estimate of the high-altitude CO2 concentration. We investigated another hypothesis, in which the wCO2 detector shows an evolution in time that could be used by the NN to infer the observation date. However, we did not find any indication of such behavior. Thus, at this point, the stratospheric CO2 hypothesis is physically plausible and remains our working hypothesis, as we have no alternative. Note however that we have investigated the correlation between the longitudinal anomalies of stratospheric CO2 in the CAMS model and the error on the date estimate by the NN approach. No such correlation was found. Thus, either our hypothesis is wrong or the description of the longitudinal variations in stratospheric CO2 in CAMS offers a poor representation of reality. Both explanations are plausible.
These results clearly demonstrate that the input data to the NN provide indirect information on the date and latitude. Atmospheric simulations such as those of CAMS indicate that XCO2 variations are mostly a function of time and latitude. Indeed, on average, the deviations of XCO2 along the longitudes are on the order of 0.5 ppm (standard deviation). They are however larger (≈ 1 ppm) over the Northern Hemisphere where most of the observations analyzed here are acquired. We hypothesize that our first version of the NN, as published in David et al. (2021), obtains a proxy of the latitude and date and outputs the corresponding CAMS value. Based on the CAMS simulation, we found that the typical uncertainty on the position and date (σlat = 7°, σlon = 58°, and σdate = 60 d) leads to a 1σ error of 0.91 ppm on XCO2 (difference between the values at the true and perturbed location and date). This value appears consistent with the precision obtained with our first version of the NN. Note however that this statistical difference gets larger when considering locations consistent with the OCO-2 observations that are used here. The important point is that the error increases considerably (a factor of 2) when the precisions on the location and date are degraded to those obtained with a different version of the NN that is discussed below.
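The perturbation experiment described above can be sketched as follows. The XCO2 field used here is a purely hypothetical analytic function (a linear trend, a latitude-dependent seasonal cycle, and a small zonal term); it is not the CAMS product, and the sketch is not expected to reproduce the 0.91 ppm value. All names, amplitudes, and the Gaussian form of the perturbations are assumptions.

```python
import numpy as np

def toy_xco2(lat, lon, day):
    """Hypothetical smooth XCO2 field in ppm (NOT the CAMS product)."""
    trend = 400.0 + 2.4 * day / 365.25
    seasonal = 3.0 * np.clip(lat, 0.0, 90.0) / 90.0 * np.cos(2.0 * np.pi * day / 365.25)
    zonal = 0.5 * np.sin(np.radians(lon))
    return trend + seasonal + zonal

def perturbation_error(field, lat, lon, day, sigma_lat, sigma_lon, sigma_day, seed=0):
    """Standard deviation of the XCO2 difference between perturbed and
    true coordinates, for Gaussian errors on latitude, longitude, and date."""
    rng = np.random.default_rng(seed)
    lat_p = np.clip(lat + rng.normal(0.0, sigma_lat, lat.shape), -90.0, 90.0)
    lon_p = lon + rng.normal(0.0, sigma_lon, lon.shape)
    day_p = day + rng.normal(0.0, sigma_day, day.shape)
    diff = field(lat_p, lon_p, day_p) - field(lat, lon, day)
    return float(np.std(diff))

# Example call with the uncertainties quoted in the text and arbitrary sampling.
lat = np.linspace(-60.0, 70.0, 500)
lon = np.linspace(-180.0, 180.0, 500)
day = np.linspace(0.0, 1500.0, 500)
error_ppm = perturbation_error(toy_xco2, lat, lon, day, 7.0, 58.0, 60.0)
```

Replacing the toy function by an interpolator of a real 4D model field would give the statistic discussed in the text, provided the sampling of locations and dates matches that of the observations.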

A new version of the neural network
As shown above, the NN appears to use the wCO2 band to derive a proxy of the observation date, which makes it possible, together with the proxies of the location, to estimate XCO 2 based on the statistical distribution of the CAMS XCO 2 . To avoid this feature, an option is to not use the information from the wCO2 band. We therefore developed a similar version of the NN but without this band (i.e., only the O 2 and sCO2, together with the observation geometry).  With this version, the behavior of the NN changes markedly. The most important feature is that the NN now reproduces the XCO 2 plumes that are shown by the output of the ACOS algorithm. Two representative examples are shown in Fig. 4. These cases demonstrate that the NN does produce XCO 2 features that are not in the training database, as we expected. The NN is trained on the variations in XCO 2 caused by the atmospheric growth rate and the surface flux seasonal cycle. It identifies signatures in the spectra that relate to the CO 2 atmospheric content. These signatures can then be used for an estimate of XCO 2 , even for situations that are poorly reproduced in the training dataset.
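The band selection amounts to a change in the way the input vector is assembled. The sketch below is only an illustration of that idea, with hypothetical function and argument names; it assumes that each band has already been cleaned of bad samples and normalized, and that the geometry is given as (SZA, VZA, AZI).

```python
import numpy as np

def build_input(o2, sco2, geometry, wco2=None, surface_pressure=None):
    """Concatenate the selected band spectra and the observation geometry.
    The wCO2 band and the surface pressure are optional inputs."""
    parts = [np.asarray(o2, dtype=float)]
    if wco2 is not None:
        parts.append(np.asarray(wco2, dtype=float))
    parts.append(np.asarray(sco2, dtype=float))
    parts.append(np.asarray(geometry, dtype=float))
    if surface_pressure is not None:
        parts.append(np.atleast_1d(float(surface_pressure)))
    return np.concatenate(parts)
```

Leaving `wco2` at its default corresponds to the configuration without the weak CO2 band, while passing it restores a three-band input of the kind used in the first version.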
In addition to the change in the band selection, and posterior to the result shown in Fig. 4, we made several other modifications to the NN algorithm.
1. We decided to use the surface pressure from the weather forecast model as an additional input to the NN. In David et al. (2021), the surface pressure was an output of the NN model. It was used to demonstrate the capability of the NN approach to interpret the spectral shapes in terms of atmospheric parameters. Indeed, the estimate of the surface pressure could be compared to an independent estimate from numerical weather analyses which are known to be precise within ≈ 1 ‰. However, the surface pressure may alternatively provide useful information to the NN for the interpretation of the spectra, as it does in the full-physics algorithms in the form of a prior estimate and also for the derivation of the biascorrected product.
2. We decided to increase the number of NN hidden layers to five (instead of one in David et al., 2021). Our experience indicates that, with a larger number of layers, there is less over-fitting of the training spectra; i.e., there is a better agreement between the loss of the training and that of the test dataset. An increased number of hidden layers also leads to slightly better performance, in particular for the NN that was designed for the land-glint observations (see below).
Glint observations are expected to be more difficult to process than nadir observations because of (i) larger variations in the optical path than for the nadir mode and (ii) the Doppler effect that may affect the absorption line positions on the input spectra. This is why our first attempts focused on the nadir cases, but there is a need to also exploit the many observations acquired in glint mode.
Figure 5 shows the inter-comparison of the XCO2 estimated from CAMS, ACOS, and NN. All three datasets are highly consistent, with a statistical difference around 1 ppm and little bias. Let us recall that there is no satellite data input to the version of CAMS that is used here, so that it is fully independent from ACOS. The 1.06 ppm standard deviation of their differences demonstrates that the precisions of both products are better than this number. CAMS and the NN are not as independent because the latter is trained with the former (but using different space-time locations). Let us stress that any bias in CAMS may be transferred to the NN product. Thus, a high agreement between CAMS and the NN product is not a demonstration of the accuracy of the latter. Still, it has been shown that the NN retrieves features that are not in CAMS, which indicates some independence between the satellite product and the model. The standard deviation of their differences is 0.85 ppm. The quadratic difference between NN and ACOS is a strong function of the sCO2 albedo as shown in Fig. 6: it decreases from ≈ 1.5 to ≈ 0.75 ppm as the sCO2 band albedo increases from 0.10 to 0.45. 
A better accuracy of the satellite product with stronger surface albedo is expected as (i) the measurement signal-to-noise ratio gets higher and (ii) the relative contribution of atmospheric scattering to the signal decreases. The precision estimate is also a function of the O2 band albedo, but this effect is not as strong and the O2 band albedo shows less variability than that of the sCO2 band.
Figure 5 also shows that the slope of the best fit of the satellite products against CAMS is close to 1 but with deviations of opposite sign (0.99 and 1.02). As a consequence, there is a more significant slope deviation from 1 (0.97) between the two satellite products.
Figure 7 provides further information on the differences between the remotely sensed products and the CAMS estimate. The histograms are close to Gaussian and confirm that NN is closer to CAMS than the ACOS counterpart. An interesting feature is that both the NN-CAMS and ACOS-CAMS differences depend on the cloud flag (cloud_flag_idp), which indicates that this flag has some value. The difference between the histograms for the two cloud flag values remains small, however, and does not justify discarding the observations with a cloud flag of 2. Here, we only use the "definitely clear" and "probably clear" cases (flags of 3 and 2). The population of the lower value cases ("definitely cloudy" and "probably cloudy") is much smaller, and the histograms for these cases are not shown, although they show further degradation. It is difficult to elaborate further as the true nature of the cloud contamination in the cases classified as probably clear is unknown.
Figure 8 is based on the satellite product innovation, i.e., the difference to the model estimates. Indeed, one may consider that the model provides current knowledge on the XCO2 distribution, constrained by surface air sample measurements and atmospheric transport. 
The satellite product has the potential to improve this knowledge, but only as much as the difference with the model estimate. Typical values are around 1 ppm. The interesting result shown by Fig. 8 is that the innovations of the two satellite estimates are significantly correlated. This provides further evidence that the NN estimate is not only a reconstruction of the training dataset (CAMS) with some noise. Indeed, when the NN differs from the model, ACOS, the independent satellite product, tends to agree with the NN.
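The reasoning behind this diagnostic can be illustrated with synthetic data. In the sketch below, all numbers are arbitrary and the variable names are hypothetical: a "true" signal that is absent from the model is seen by two products with independent noise, while a third product merely reproduces the model with noise. Only the first two show correlated innovations.

```python
import numpy as np

def innovation_correlation(xco2_a, xco2_b, xco2_model):
    """Correlation coefficient between the innovations (product minus
    model) of two products evaluated at the same soundings."""
    innov_a = np.asarray(xco2_a, dtype=float) - np.asarray(xco2_model, dtype=float)
    innov_b = np.asarray(xco2_b, dtype=float) - np.asarray(xco2_model, dtype=float)
    return float(np.corrcoef(innov_a, innov_b)[0, 1])

# Synthetic illustration (arbitrary amplitudes, in ppm).
rng = np.random.default_rng(0)
n = 20000
model = 405.0 + rng.normal(0.0, 3.0, n)
truth = model + rng.normal(0.0, 0.7, n)    # signal missing from the model
product_a = truth + rng.normal(0.0, 0.7, n)
product_b = truth + rng.normal(0.0, 0.7, n)
mimic = model + rng.normal(0.0, 0.7, n)    # reproduces the model only

r_informative = innovation_correlation(product_a, product_b, model)
r_mimic = innovation_correlation(mimic, product_b, model)
```

Under these assumptions, `r_informative` is clearly positive whereas `r_mimic` stays close to zero, which is why a significant correlation between the NN and ACOS innovations argues against the NN being a noisy copy of the training dataset. Correlated retrieval errors common to both products would also produce a positive correlation, so the diagnostic is indicative rather than conclusive.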
Finally, Fig. 9 shows a comparison of the model and remotely sensed estimates of XCO2 against the reference retrievals of the TCCON network. Although the OCO-2 satellite platform can be oriented so that the instrument field of view is close to the surface station, we only use nadir data here. Indeed, the NN was not trained on the target data and can only be used to process measurements that have been acquired in observation configurations that are similar to those of the training. We thus have to rely on nadir or glint measurements acquired in the vicinity of TCCON sites. In the following, we use nadir measurements that are within 5° in longitude and 1.5° in latitude of the TCCON site. For the reference, we average the TCCON estimates of XCO2 within 30 min of the satellite overpass. No attempt was made to correct for the different weighting functions of the surface and spaceborne remote sensing estimates. Statistics per station are provided in Table 1. The biases vary significantly among stations, although they are generally less than 1 ppm (in amplitude). Two stations, Pasadena and Zugspitze, show a large negative bias for both satellite estimates and the model. For Pasadena, it may be interpreted as the impact of the city on the atmosphere sampled by the TCCON measurement, while the atmosphere at the location of the satellite observation (which may be several hundred kilometers away) is less affected. Zugspitze is a high-altitude site (2960 m), so that the atmospheric column sampled by the sun photometer does not have the same vertical representativeness as that of the satellite observation (in addition to the spatial distance that is common with other sites). A large negative bias is also found at Eureka (80.05° N). The fact that the difference with the CAMS model at this site is much larger than for other sites could hint at an issue in the sun photometer product there. 
Conversely, there are large positive biases at Burgos and Ny-Ålesund (78.9° N, very close to the latitude of Eureka). Since the model and satellite estimates somewhat agree, one may also question the TCCON calibration at these sites. For other stations, which form the large majority, the biases are smaller than 1 ppm, and there is a fair consistency between the satellite products in the sense that the sign of their bias is the same in most cases. The range of the difference with TCCON varies among stations. The best satellite-TCCON agreement is found at the Lamont station, which, interestingly, is also the one with the most coincidences. Excellent agreement is also seen at Darwin, Edwards, Park Falls, and Bremen. The comparison with TCCON does not favor one satellite estimate over the other. Focusing on the stations with a large number of observations (25 overpasses or more), the NN estimates appear slightly better than ACOS at Darwin, Edwards, Garmisch, Orléans, and Białystok, while the opposite is true at Saga, Park Falls, and Sodankylä. The figure (and table) also clearly shows that the CAMS product offers a better agreement with the TCCON data than any of the satellite estimates in most cases. The high quality of the CAMS modeling used in this paper, at least over the TCCON sites, provides further justification of its use as a training dataset.
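The collocation criteria used for the TCCON comparison (5° in longitude, 1.5° in latitude, 30 min in time) can be sketched as follows. The function name, the use of times expressed in minutes, and the definition of the overpass time as the mean time of the selected soundings are assumptions made for this illustration.

```python
import numpy as np

def collocate(sat_lat, sat_lon, sat_time_min, site_lat, site_lon,
              tccon_time_min, tccon_xco2,
              max_dlat=1.5, max_dlon=5.0, max_dt_min=30.0):
    """Select the satellite soundings close to a site and average the
    TCCON values within max_dt_min of the (assumed) overpass time.
    Returns the sounding mask and the TCCON reference (NaN if none)."""
    sat_lat = np.asarray(sat_lat, dtype=float)
    sat_lon = np.asarray(sat_lon, dtype=float)
    # Wrap the longitude difference into [-180, 180) degrees.
    dlon = (sat_lon - site_lon + 180.0) % 360.0 - 180.0
    near = (np.abs(sat_lat - site_lat) <= max_dlat) & (np.abs(dlon) <= max_dlon)
    if not near.any():
        return near, float("nan")
    overpass = np.mean(np.asarray(sat_time_min, dtype=float)[near])
    dt = np.abs(np.asarray(tccon_time_min, dtype=float) - overpass)
    in_window = dt <= max_dt_min
    if not in_window.any():
        return near, float("nan")
    return near, float(np.mean(np.asarray(tccon_xco2, dtype=float)[in_window]))
```

The satellite estimates selected by the returned mask can then be compared with the averaged TCCON value, one pair per overpass, to build per-station statistics such as those of Table 1.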
Table 1. TCCON stations used in this paper (Figs. 8 and A4). The data have been obtained from the https://tccondata.org/ web site on 4 February 2021.

Figure 7. Histogram of the differences between either one of the two satellite datasets and the CAMS model. We distinguish cases when the flag cloud_flag_idp is "certainly clear" and "probably clear". The left figure is for the nadir dataset, whereas the right figure is for the glint.

Figure 8. Density histogram of the innovation, i.e., the difference between the satellite product and the model estimates. The red line shows the result of a linear fit through the data points aiming at a minimization of the distance to the best line. The left figure is for the nadir dataset, whereas the right figure is for the glint.

We have applied a very similar procedure to the OCO-2 observations acquired in glint mode over land. An evaluation of the estimate performance is shown in Figs. 7, 8, A2, and A3. The conclusions are very similar to those obtained for nadir. The agreement with CAMS is slightly degraded with respect to the nadir cases (0.92 vs. 0.85 ppm for the certainly clear observations) but somewhat closer than that of ACOS (Figs. 7 and A2). The deviations from the model of the two satellite estimates are significantly correlated, and the correlation coefficient is even larger than that derived for nadir observations (0.45 vs. 0.39, Fig. 8). The comparison with the TCCON estimates leads to the same conclusions as those described above for the nadir cases.
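The correlation between the innovations of the two satellite products can be computed as in the following minimal sketch. The function name and the plain-list inputs are assumptions made for illustration, and the coefficient computed here is the standard Pearson coefficient; the paper does not state which estimator was used.

```python
from math import sqrt


def innovation_correlation(xco2_nn, xco2_acos, xco2_model):
    """Pearson correlation between (NN - model) and (ACOS - model).

    All three inputs are equal-length sequences of collocated XCO2 values.
    """
    d_nn = [a - m for a, m in zip(xco2_nn, xco2_model)]
    d_acos = [a - m for a, m in zip(xco2_acos, xco2_model)]
    n = len(d_nn)
    mean_nn = sum(d_nn) / n
    mean_acos = sum(d_acos) / n
    cov = sum((x - mean_nn) * (y - mean_acos) for x, y in zip(d_nn, d_acos))
    var_nn = sum((x - mean_nn) ** 2 for x in d_nn)
    var_acos = sum((y - mean_acos) ** 2 for y in d_acos)
    return cov / sqrt(var_nn * var_acos)
```

If one product were only a noisy copy of the model, with noise independent of the other product's errors, its innovations would not be expected to correlate with those of the other product.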

Discussion and conclusion
Figure 9. Statistics of the differences between the NN retrieval (red), the CAMS model (green), or the bias-corrected ACOS retrievals (blue) and the TCCON retrievals. The boxes indicate the 25 %-75 % percentiles, and the median is shown by the horizontal line within the box. The whiskers indicate the 5 %-95 % percentiles. Stations are ordered by increasing latitudes. The numbers below the station name indicate the number of individual observations and coincidence days used for the statistics. The references of the various TCCON observations are provided in Table 1. Figure A4 provides similar results for the glint case.

This paper follows on from David et al. (2021), in which we described a neural-network-based technique to estimate XCO 2 and the surface pressure from the OCO-2 spectral measurements. An important message is that our interpretation of the results in that earlier study was incorrect. The NN developed in that paper reproduced the statistical variations of the training dataset (CAMS) and was unable to generate features such as plumes from emission hot spots. Thus, contrary to our claims, the NN method, as presented in that paper, could not be used to process OCO-2 measurements and generate XCO 2 estimates with any real value. We have shown here that a NN-based procedure is able to estimate the latitude and date of the observation with a reasonable accuracy. This was unexpected, as we wrote in David et al. (2021): "Let us recall that the NN input does not contain any information on the location or date of the observation. This is a strong indication that the information is derived from the spectra as the NN does not 'know' the CAMS value that corresponds to the observation location."
Our interpretation was wrong. In fact, the NN input can somehow be used by the NN for a fairly accurate estimate of the latitude and date. Because most XCO 2 variations are a function of latitude and date, this information could be used by the NN to generate a reasonable estimate, i.e., one that mimics the main variations in the training dataset.
A question remains on the indirect information that is used by the NN to estimate the observation date. The fact that the precision on the date estimate is much better when using a combination of the O 2 + wCO2 bands rather than the O 2 + sCO2 bands suggests that the information lies in the wCO2 band. Our best hypothesis is that the wCO2 spectrum contains some information on the stratospheric CO 2 , whose concentration is well-mixed and increases regularly with time, and therefore implicitly contains information on the observation date. Further testing this hypothesis would require, for instance, the identification of some anomaly in the stratospheric CO 2 (linked to a specific atmospheric circulation) that would show up as a significant error on the date estimate made by the NN. We have not been able to identify such a feature.
Despite this initial setback, we have continued our analysis of the potential of the NN to process the OCO-2 spectra. A strong motivation came from the results obtained for the estimate of the surface pressure. Indeed, David et al. (2021) showed that the NN could estimate the surface pressure with an accuracy on the order of 3 hPa. The spatial and temporal variations in the surface pressure, at the scale of the potential accuracies on the date and location, are much larger than this number, so that the NN estimate cannot rely on this kind of indirect information. This provided a strong indication that the NN method has the potential to extract meaningful information from the spectrum itself.
We have therefore developed a new version of the NN excluding the wCO2 band from the inputs. In this version, the behavior of the NN is much different from the earlier version as it generates features that are not in the training dataset. This clearly shows that the NN uses the signature of XCO 2 contained in the sCO2 spectra to make an XCO 2 estimate. The accuracy of this estimate is similar to the one obtained with the first version of the NN and similar to that of the ACOS products. This is confirmed by the comparison of the XCO 2 estimates against the TCCON retrievals. Another strong argument that the NN XCO 2 estimate contains true information and is not only a noisy copy of the training dataset is that the innovations of the two satellite estimates, i.e., the differences to the model data, are significantly correlated (Fig. 8).
Note that we use here a single neural network for the eight footprints of the OCO-2 instrument. We analyzed whether the performance, assessed as the standard deviation of the differences with CAMS, is a function of the footprint. The statistics are very similar for all footprints, except for footprint 2, which shows a slightly higher deviation for both the ACOS and the NN satellite products (a difference of ≈ 0.1 ppm to the mean of ≈ 1 ppm).
These results confirm that the NN technique has a strong potential to process the OCO-2 observations, as well as those from forthcoming missions aiming at the observation of CO 2 from space, such as MicroCarb (Pascal et al., 2017) or the CO2M constellation (Sierk et al., 2019). As discussed above, the current version does not use the wCO2 band at all, and this may be seen as a loss of useful information. There is therefore a need to select appropriate spectral samples in the wCO2 band rather than discarding them all. This requires an improved understanding of the indirect information that is used by the NN to estimate the observation date and location.
The NN technique has two obvious advantages compared to the physical methods that are used to process the OCO-2 observations as well as other instruments with similar objectives: (i) a much smaller computational burden and (ii) no need for a de-bias procedure (O'Dell et al., 2018; Kiel et al., 2019). Our implementation still faces challenges, which we discussed in David et al. (2021).
The first challenge is the cloud detection. All the analysis described in this paper relies on the ACOS cloud detection, and only the observations identified as clear are processed. Our analysis demonstrates the potential of the NN approach but is currently not independent from ACOS. We are currently evaluating independent approaches for the cloud detection. Although the NN described here aims at an estimate of XCO 2 , we have shown earlier that the same tool can be used for an estimate of the surface pressure with a 1σ precision on the order of 3 hPa for clear-sky cases. Numerical weather analyses are actually better than that (Salstein et al., 2008). Thus, one may use the comparison of the surface pressure estimate from the NN to the numerical weather data for an easy identification of perturbations to the spectra that are linked to cloud or large aerosol contamination. This would allow an easy and rapid quality indicator for the selection of observations that may be used for XCO 2 estimates, either using a physics-based algorithm or a NN approach. This idea remains to be evaluated.
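The screening idea outlined above could take the following form, given here as a minimal sketch only. The function name and the rejection threshold of k = 3 times the 3 hPa precision are hypothetical choices made for illustration; as stated above, the approach itself has not been evaluated.

```python
def pressure_quality_flag(p_nn_hpa, p_nwp_hpa, sigma_hpa=3.0, k=3.0):
    """Return one boolean per sounding: True if the sounding passes the screen.

    p_nn_hpa  : surface pressures estimated by the NN (hPa)
    p_nwp_hpa : collocated surface pressures from numerical weather data (hPa)
    sigma_hpa : assumed 1-sigma clear-sky precision of the NN estimate (hPa)
    k         : hypothetical rejection threshold, in units of sigma_hpa
    """
    return [abs(p_nn - p_nwp) <= k * sigma_hpa
            for p_nn, p_nwp in zip(p_nn_hpa, p_nwp_hpa)]
```

A sounding whose NN surface pressure departs from the weather analysis by much more than the clear-sky precision would be treated as possibly contaminated by cloud or aerosol.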
The second challenge concerns the absence of a quantitative indication of the amount of information that the NN takes from its prior information (contained in the training database) vs. the amount of information that the NN takes from the measured spectra. For Bayesian full-physics retrievals, these weights are represented by the averaging kernel (Rodgers, 1990), which allows a clean comparison of each retrieval with 3D atmospheric models, at least in theory (see the discussion about the practical difficulties in Chevallier, 2015). The NN training targets the CO 2 column with a homogeneous weighting along the vertical, but this can hardly be achieved without some contribution from the prior information. This challenge may be evaluated in the future on the basis of radiative transfer simulations.
The third challenge concerns the absence of a quality indicator with the XCO 2 estimate. With the physical methods, the spectrum residuals provide an efficient means to identify cases when no satisfactory agreement can be found between the measured and modeled spectra. With the NN approach, there is no uncertainty associated with each retrieval. Our analysis has shown that the apparent precision (evaluated against CAMS) is a strong function of the surface albedo. There may be other geophysical variables that drive the uncertainty. To provide precision estimates for each NN-based XCO 2 estimate, ensembles of randomized trainings, where uncertain parameters or input/output variables are varied adequately (e.g., Chau et al., 2022), or analytical estimates (Aires et al., 2004) should be explored.
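As an illustration of the ensemble option mentioned above, the following minimal sketch derives a per-sounding estimate and spread from the outputs of several independently trained networks. The function name and the input layout are assumptions, and the sketch does not reproduce the method of Chau et al. (2022).

```python
from statistics import mean, stdev


def ensemble_estimate(member_predictions):
    """Combine the XCO2 predictions of several trained networks.

    member_predictions : list with one entry per ensemble member, each a list
                         of XCO2 values given in the same sounding order
                         (at least two members are required for a spread).
    Returns a list of (ensemble_mean, ensemble_spread) tuples, one per sounding.
    """
    per_sounding = zip(*member_predictions)  # regroup values by sounding
    return [(mean(values), stdev(values)) for values in per_sounding]
```

Such a spread reflects only the sensitivity of the estimate to whatever was randomized in the trainings, so it should be read as a partial indicator of precision.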
The last challenge concerns the need for a high-quality training dataset, in the context of increasing XCO 2 . The comparison against the TCCON observations (Fig. 8) demonstrates that the CAMS inversion product meets this requirement. In fact, there are strong indications that CAMS remains better than the satellite products, at the very least in terms of global precision. However, because of the atmospheric growth rate of CO 2 , the training must be regularly updated. Indeed, with a frozen training dataset, the true real-time XCO 2 progressively leaves the training range. Because the NN approach requires a training dataset that is representative of the observations, a frozen training would then lead to underestimates. For quasi-near-real-time data assimilation (e.g., Massart et al., 2016), the training dataset must therefore gradually integrate recent high-quality XCO 2 data, but without sacrificing robustness.
As a final remark, we call for caution. We have been tricked by the NN ability to generate a consistent description of the atmospheric XCO 2 in our first analysis. It is difficult to ensure that we are not tricked again. The source of the information that leads to a fairly accurate estimate of the date, when using the weak CO 2 band, remains unclear. As a consequence, although it is demonstrated that the new version of the NN generates structures that are not in the training dataset, there may be biases in the CAMS modeling that have a significant influence on the NN product.

Figure A3. Same as Fig. 9, but for the glint observations.

Figure A4. Examples of anthropogenic CO 2 plumes as seen by the OCO-2 instrument processed with the ACOS algorithm (red) and the neural network described in this paper (blue). The cases have been identified and described in Reuter et al. (2019) and Nassar et al. (2021).
Code and data availability. The codes used in this paper and the CAMS model simulations are available, upon request, from the author. The OCO-2 data can be downloaded from the NASA OCO-2 archive depository (https://disc.gsfc.nasa.gov/datasets?keywords=oco-2, last access: 1 February 2022), and TCCON data can be downloaded from the TCCON Data Archive (https://tccondata.org/, last access: 4 February 2021).
Author contributions. FMB designed the study. PC and LD developed the codes and performed the computations. All authors shared the result analysis.
Review statement. This paper was edited by Folkert Boersma and reviewed by Christopher O'Dell and Sihe Chen.