XCO2 retrieval for GOSAT and GOSAT-2 based on the FOCAL algorithm

Stefan Noël1, Maximilian Reuter1, Michael Buchwitz1, Jakob Borchardt1, Michael Hilker1, Heinrich Bovensmann1, John P. Burrows1, Antonio Di Noia2, Hiroshi Suto3, Yukio Yoshida4, Matthias Buschmann1, Nicholas M. Deutscher5, Dietrich G. Feist6,7,8, David W. T. Griffith5, Frank Hase9, Rigel Kivi10, Isamu Morino4, Justus Notholt1, Hirofumi Ohyama4, Christof Petri1, James R. Podolske11, David F. Pollard12, Mahesh Kumar Sha13, Kei Shiomi3, Ralf Sussmann14, Yao Té15, Voltaire A. Velazco5, and Thorsten Warneke1 1Institute of Environmental Physics, University of Bremen, FB 1, P.O. Box 330440, 28334 Bremen, Germany 2Earth Observation Science, University of Leicester, LE1 7RH, Leicester, UK 3Japan Aerospace Exploration Agency (JAXA), 305-8505, Tsukuba, Japan 4National Institute for Environmental Studies (NIES), 305-8506, Tsukuba, Japan 5Centre for Atmospheric Chemistry, School of Earth, Atmospheric and Life Sciences, University of Wollongong NSW 2522 Australia 6Max Planck Institute for Biogeochemistry, Jena, Germany 7Deutsches Zentrum für Luftund Raumfahrt, Institut für Physik der Atmosphäre, Oberpfaffenhofen, Germany 8Ludwig-Maximilians-Universität München, Lehrstuhl für Physik der Atmosphäre, Munich, Germany 9Karlsruhe Institute of Technology, IMK-ASF, Karlsruhe, Germany 10Finnish Meteorological Institute, Space and Earth Observation Centre, Tähteläntie 62, 99600 Sodankylä, Finland 11NASA Ames Research Center, Atmospheric Science Branch, Moffett Field, CA 94035, USA 12National Institute of Water and Atmospheric Research Ltd (NIWA), Lauder, New Zealand 13Royal Belgian Institute for Space Aeronomy (BIRA-IASB), Brussels, Belgium 14Karlsruhe Institute of Technology, IMK-IFU, Garmisch-Partenkirchen, Germany 15Laboratoire d’Etudes du Rayonnement et de la Matière en Astrophysique et Atmosphères (LERMA-IPSL), Sorbonne Université, CNRS, Observatoire de Paris, PSL Université, 75005 Paris, France

Both GOSAT and GOSAT-2 perform point measurements with a spatial resolution (footprint diameter) of about 10 km. For both instruments, we use a tabulated instrumental line shape (ILS) with a kernel width of 15 cm −1 . For GOSAT this has been generated by a theoretical formula parameterising a "real-world" FTS instrument (see e.g. formula 5.21 in Davis et al., 2001) , which depends on the maximum optical path difference (MOPD, ±2.5 cm for GOSAT) and the size of the instantaneous field 85 of view (IFOV, 15.8 mrad for GOSAT). The same formula has been used by Heymann et al. (2015). This ILS is symmetric and the same for S and P polarisation.
For GOSAT-2, we use a preliminary tabulated ILS provided by JAXA and generated on 16 January 2020, which is different for S and P polarisation and asymmetric, especially in the NIR band. Meanwhile, this ILS has been officially released and is available via the NIES web site.

Reference Spectra and External Databases
For the retrieval several reference spectra and databases are used.
The solar spectrum used in the forward model is based on a high resolution solar transmittance spectrum (O'Dell et al., 2012) in combination with an ISS solar reference spectrum (Meftah et al., 2018). For the SIF retrieval we used a chlorophyll fluorescence spectrum by Rascher et al. (2009), which has been scaled to 1.0 mW/m 2 /sr/nm at 760 nm. 95 We use tabulated cross sections at a 0.001 cm −1 sampling based on HITRAN2016 (Gordon et al., 2017) and the absorption cross section database ABSCO v5.0 (Benner et al., 2016;Devi et al., 2016) from the NASA (National Aeronautics and Space Administration) ACOS/OCO-2 project.
Surface elevation, surface roughness and surface type are derived from the Global Multi-resolution Terrain Elevation Data (GMTED2010; Danielson and Gesch, 2011) of the U.S. Geological Survey (USGS) and the National Geospatial-Intelligence 100 Agency (NGA) at a spatial resolution of 0.025 • . Meteorological information (pressure, temperature, water vapour profiles) is obtained from ECMWF (European Centre for Medium-range Weather Forecasts) ERA 5 model data (Hersbach et al., 2020), which are available every 1 hour on a 0.25 • horizontal grid and on 137 altitude layers.
We use XCO 2 data from the CarbonTracker (CT) model CT2019 and CT-NRT v2020-1 (Jacobson et al., 2020a, b) and data from the Total Carbon Column Observing Network (TCCON, see e.g. Wunch et al., 2011a) in the context of the bias correction 105 reference database (see section 2.3). TCCON data are also used for validation (see section 5). Table 1 lists the TCCON stations which provided data for the present study.
XCO 2 a-priori profiles are derived using the 2018 version of the simple empirical CO 2 model SECM (Reuter et al., 2012).
In the context of validation, we use the 2020 version of SECM. XCH 4 a-priori data are from the simple CH 4 climatological model SC4C2018 developed and used by Schneising et al. (2019) and briefly described by Reuter et al. (2020). 110 For CO 2 we use the "synth" a-priori error covariance matrix described by Reuter et al. (2017b). For H 2 O, we use the same error covariance matrix as Reuter et al. (2017b), but scaled by a factor of 5 to reduce the dependencies of the retrieval results on the a-priori. For CH 4 , for convenience, we scale the CO 2 matrix to result in an XCH 4 uncertainty of 45 ppb, which is considered to be a reasonable estimate. Note that only the matrices are scaled, not the a-priori values. 115 Quality filtering and bias correction usually require the knowledge of a "true" (in this case XCO 2 ) value. For this, we do not simply use model data as truth, as one aim of XCO 2 products is to improve models. Another method is to take ground-based TCCON measurements as basis for a bias correction. However, although TCCON measurements are very accurate (estimated 1-σ precision is 0.4 ppm, see Wunch et al., 2010), they are only available at certain locations and are therefore more suited for validation. 120 Our choice is therefore to use a data base generated from a combination of TCCON measurements and CarbonTracker (CT) model data for a reference year (2018 for GOSAT, 2019 for GOSAT-2), which should on average reproduce large scale features correctly.

The reference database
This database is produced in the following way: As a first step, we determine from the CT data global daily 3D maps close to 13:00 local time (i.e. GOSAT and GOSAT-2 equator crossing time). We reduce the altitude grid to five layers with the same 125 dry-air sub-columns, i.e. the same amount of particles, and interpolate the data from the native CT horizontal resolution of  Rodgers and Connor, 2003;Wunch et al., 2010). We look for contiguous regions where CT 130 XCO 2 data differ by less than 0.75 ppm from the CT value at the TCCON location; these data are then used for the reference database. The reference database therefore does not contain any TCCON data -it only contains CT data which were confirmed by TCCON, but individual values may differ by up to 1.5 ppm. The choice of the 0.75 ppm ranges is based on a trade-off between accuracy (agreement with TCCON) and spatial coverage of the database.
The result are daily maps containing CO 2 data for five vertical sub-layer altitudes. The spatial coverage is usually not global 135 and varies from day to day. There are typically more data in the southern hemisphere during the second half of the year compared to the first half of the year (see Fig. 1). When comparing with GOSAT or GOSAT-2 measurement results, the "true" XCO 2 is then computed from the CO 2 layers of the reference database, considering the retrieval's averaging kernels.
Specifically, we use here for GOSAT CT2019 data in combination with TCCON GGG2014 (see Tab. 1) for 2018. For GOSAT-2 we also use TCCON GGG2014 data, but need to rely on CT-NRT v2020-1 for 2019. Because the CT NRT data are 140 not yet available for the whole year 2019, the GOSAT-2 reference database does not cover the whole year; there are essentially no data after August 2019. This is a limiting factor for GOSAT-2, especially because this also means that data in the southern hemisphere are less present in the 2019 database.

GOSAT Level 2 Products
To assess the quality of the newly created GOSAT-FOCAL XCO 2 products, they have been compared with several other well-145 established GOSAT Level 2 data sets (see section 5). The GOSAT BESD v01.04 product from IUP (Heymann et al., 2015) is a near-real time product generated in the context of the Copernicus Atmospheric Monitoring Service (CAMS, https://atmosphere. copernicus.eu/ (last access: 30-July-2020)) project. It is available from 2014 onward. The GOSAT RemoTeC v2.3.8 product from SRON (Butz et al., 2011) and the full-physics GOSAT product from the University of Leicester v7.3  were generated in the context of the Copernicus Climate Change Service (C3S, https://climate.copernicus.eu/; last access: 30- product (O'Dell et al., 2012(O'Dell et al., , 2018Kiel et al., 2019) is also available for the years 2009 to 2019. The operational GOSAT XCO 2 product v02.95 (bias corrected) from NIES (Yoshida et al., 2013) currently ends in August 2020. The BESD product contains only XCO 2 data over land, all other products are available for water and land surfaces.

155
The retrieval is performed in three main steps: Pre-processing, processing and post-processing. These are described in the following sub-sections.

Pre-Processing
During pre-processing all required input data for the main processing step are collected. Furthermore, a first filtering of data is performed to reduce processing time.

160
The pre-processing procedure is largely based on the pre-processing as present in the BESD GOSAT product (Heymann et al., 2015). The sequence of pre-processing activities is as follows: 1. Extraction of measured spectra, geolocation and information on quality and measurement modes (e.g. gain, scan direction) from the GOSAT L1B product.
the tropical reference atmosphere from Anderson et al. (1986), scaled to column average values of XCO = 0.1 ppm and Because FOCAL is a fast algorithm and the number of GOSAT and GOSAT-2 measurements is much less than for OCO-2, 175 we chose to set the pre-processing filters relatively relaxed and to apply the quality filtering mostly in the post-processing. As can be seen from Fig. 2 about two thirds of the measurements are filtered out during pre-processing.

Noise Estimate
Similar to Heymann et al. (2015) the spectral noise is initially assumed to be independent from wavenumber for each band. It is estimated from the standard deviation of the real part of the "dark" off-band signal (i.e. the first 500 spectral points in each 180 band). In a later step (see Section 3.2.1) this noise will be modified to account for additional forward model errors and overall scaling.

Cloud Filter
The cloud filtering is based on two physical properties of clouds: clouds are (usually) bright and clouds are high (higher than the surface) so that little water vapour is above them. In the pre-processing these properties are described by two quantities: 185 effective albedo and water vapour filter. These are derived for each spectrum as described in Heymann et al. (2015). The effective albedo for each band is estimated from the mean reflectance L within a spectral range outside the absorption. L is determined from the mean radiance I, the mean irradiance I 0 and the solar zenith angle α via: The specific wavenumber ranges and irradiance values used for filtering are given in Tab. 3.

190
The water vapour filter is determined from a spectral region with strong water vapour absorption in the SWIR-3 band (see Tab. 3). This filter is defined as the ratio between the median radiance and the median of the estimated noise in this spectral range.
A ground pixel is assumed to be cloudy if either the effective albedo in one of the bands or the water vapour filter exceeds the thresholds given in Tab. 2.

Processing
The processing is based on the Fast atmOspheric traCe gAs retrievaL (FOCAL) algorithm which is described in detail in Reuter et al. (2017c). A first successful application of this algorithm to OCO-2 data is given in Reuter et al. (2017b). Therefore, we only summarise the main features of the algorithm here and point out the differences to the OCO-2 application.
FOCAL approximates modifications of the direct light path due to scattering in the atmosphere by a single scattering layer, which is characterised by its height (pressure level), its optical thickness and an Ångström parameter which describes the wavenumber dependence of scattering. The layer height is normalised to the surface pressure. Furthermore, Lambertian scattering on the surface is considered. For atmospheric scattering processes an isotropic phase function is assumed. With this approximation, the FOCAL forward model is essentially an analytical formula; it uses pre-calculated and tabulated cross sections such that calculations can be performed considerably fast. The inversion of the forward model is based on optimal 205 estimation (Rodgers, 2000) and uses the Levenberg-Marquardt-Fletcher method (Fletcher, 1971) to minimise the cost function.
The OCO-2 retrieval of Reuter et al. (2017b, c) uses four fit windows in the NIR (near-infrared) and SWIR spectral range to derive the atmospheric parameters XCO 2 , water vapour and SIF. In contrast to OCO-2, GOSAT and GOSAT-2 cover a wider spectral range and provide spectra in two polarisation directions referred to as S and P. Therefore, we treat in our retrieval both polarisation directions as independent spectra opposed to the average of both as usually used in other GOSAT retrievals 210 (see e.g. Butz et al., 2011;O'Dell et al., 2012). However, recently  presented a methane retrieval for GOSAT based on an algorithm from Kikuchi et al. (2016), which also makes use of both polarisation directions.
We use both polarisation corrections mainly for the following reasons: -In principle, information is lost when averaging S and P spectra.
-In general, the sensitivity of the instruments and therefore the calibration of the measured spectra is different for S and 215 P. For example, the measured ILS is given independently for S and P.
-S and P include different information on scattering, which can also be used in filtering and/or bias correction.
Furthermore, the FOCAL fitting windows (see Tab. 4) have been adapted to the specific GOSAT(-2) spectral bands such that in addition also other atmospheric constituents and parameters like HDO and (in the case of GOSAT-2) also CO total columns and possibly XN 2 O can be retrieved. This results in six fitting windows for GOSAT and eight windows for GOSAT-2 for each 220 polarisation. The retrieval is performed on a wavenumber axis.
Because of the large number of target gases and spectral bands the retrieval requires various state vector elements. These are listed together with the fit windows, from which they are determined, and their a-priori values and uncertainty ranges in Tab. 5 for GOSAT and GOSAT-2.
For GOSAT, the retrieval determines CO 2 , CH 4 and H 2 O on 5 layers with same number of air particles, from which then the 225 column average values XCO 2 , XCH 4 and XH 2 O are calculated. Furthermore, solar induced fluorescence (SIF) is determined by scaling of a corresponding reference spectrum.
Instead of the HDO column, we fit a scaling factor for the relative abundance of HDO compared to H 2 O, δD, which is defined as: where R meas is the ratio of the measured HDO and H 2 O columns, R VSMOW ( = 3.1152 × 10 −4 ) is the corresponding value for Vienna Standard Mean Ocean Water (VSMOW). δD is usually given in units of per-mill.
δD = 0‰ corresponds to HDO concentrations as in VSMOW, δD = −1000‰ to no HDO. We assume the same profile shape for HDO as for H 2 O. For GOSAT-2, we also fit scaling factors to (fixed) CO and N 2 O profiles.
As mentioned above, atmospheric scattering is considered in FOCAL by a single scattering layer, which is described by 235 three parameters (height, optical depth and Ångström coefficient). As scattering is different for S and P polarised light, we fit two independent layers for S and P.
In addition, we determine in each fit window (independently for S and P) a polynomial background function describing the surface albedo. For this we use second order polynomials except for the small SIF windows (no. 1) where a linear function is sufficient.

240
The GOSAT data files only contain a fixed spectral axis. As e.g. described in Heymann et al. (2015), the spectral calibration of GOSAT changes especially at the begin of the mission with time. This change can be corrected by a spectral scaling factor.
We determine this overall scaling factor by a spectral fit in the SIF window before the retrieval. So far, this spectral pre-fitting seems to be unnecessary for GOSAT-2. In the retrieval, we then additionally consider for both GOSAT and GOSAT-2 possible additional spectral shifts and squeezes in each fit window by corresponding state vector elements, but the influences of these 245 spectral changes on the results is rather small.

Noise Model
The noise N derived from the off-band signal is only an estimate. It does not consider a possible wavenumber dependence of the noise within one spectral band. Furthermore, a potential error of the forward model needs to be considered. In the optimal estimation method this can be achieved by including the forward model error in the measurement error covariance. For this, 250 we define a scaling factor s for the estimated noise and the quantity δF , which denotes the relative error of the forward model.
The forward model error is proportional to the continuum radiance outside the absorption I, which is estimated from the 0.99 percentile of the measured radiance at the edge of each fit window. The quantities δF and s are determined using the approach described in (Reuter et al., 2017b), i.e. by running the retrieval for a representative subset of data and then fitting the function RSR(N SR) = (s N SR) 2 + δF 2 (3) 255 to binned values of the residual-to-signal ratio (RSR) as function of the noise-to-signal ratio (NSR). RSR is defined as the standard deviation of the retrieved spectral residual in each fit window divided by the continuum signal I; NSR is the standard deviation of the noise divided by I.
With the method described in Reuter et al. (2017c) it is also possible to define a 2σ-outlier limit based on NSR and RSR data, which will be used to filter out too noisy data during post-processing (see section 3.3). This is parameterised by a second 260 order polynomial as a function of the uncorrected NSR which is added to the RSR function of Eq.
(3). The coefficients a i are determined via a fit. To avoid extrapolation, f N is set to the edge values outside the fitting range.
In order to cover the varying signal over the year, we base the noise model fits on data from one day per month for one 265 reference year. For GOSAT, we take from December 2017 to November 2018 (as there are only few GOSAT data available in December 2018). For GOSAT-2 we use data from February 2019 to December 2019. In the case of GOSAT-2 we further restrict the input data for the noise model parameter fit to data over land because some of the data over water show an unexpected behaviour (low RSR in case of large NSR), which needs further investigation. In this sense, the current GOSAT-2 noise model is considered to be preliminary and may need some refinement in the future.
270 Fig. 4 shows as an example the noise model results for GOSAT and GOSAT-2 in the fit windows 2 (O 2 (A) band) and 6 (strong CO 2 absorption) for P polarisation. The orange line gives the fitted RSR function, the red line the outlier limit. The derived values from the noise model for all fitting windows / polarisations are given in Tab. 6 and 7 for GOSAT and GOSAT-2.
The forward model errors δF are on average slightly larger for GOSAT-2 than for GOSAT. In the SWIR, values similar to OCO-2 are obtained, but in the NIR the OCO-2 δF is typically smaller (about 0.003). This indicates that for GOSAT and

275
GOSAT-2 instrumental/calibration effects seem to impact the radiance errors more in the NIR than in the SWIR.

Post-Processing
The purpose of post-processing is to filter out invalid data and to perform a bias correction for the products. The current post-processing focuses on XCO 2 . The post-processing is performed in several steps, namely: 1. Basic filtering based on physical knowledge.

280
2. Filtering out low quality data using parameters / limits determined using a random forest classifier.
3. Application of a bias correction using a random forest regressor.
4. Additional filtering out of data with too large bias correction.
These steps are described in the following subsections.

Basic filter 285
The basic filtering removes measurements where the retrieval does not converge or where the quality of the fit results is too low.
We consider this to be the case if the χ 2 calculated over all fit windows is larger than 2 or if for at least one of the fit windows the RSR outlier limits (see section 3.2.1) are exceeded. Furthermore, we apply some initial filters for nonphysical values on the derived scattering parameters (i.e. layer height outside the atmosphere, Ångström coefficient not within [1,5]). Note that all retrieved scattering parameters such as the Ångström exponent can be considered "effective" parameters as they have to 290 account for not only cloud/aerosol scattering but also Rayleigh scattering (which has an Ångström coefficient of 4). Because Rayleigh scattering is always present and we filter out cloudy scenes, we usually get higher effective Ångström coefficients than those expected from clouds or aerosols only.
We also limit the maximum allowed optical depth of the scattering layer to 0.02 to filter out too thick clouds or aerosol amounts and use a maximum allowed XCO 2 posteriori uncertainty of 2 ppm. As described by Reuter et al. (2017c), FOCAL 295 simulates scattering only for an isotropic phase function. The prominent forward peak, usually existing for Mie scattering phase functions of cloud and aerosol particles does basically not modify the light path. As FOCAL's optical depths of the scattering layer do not include this forward peak, these optical depths are much smaller than optical depths including a strong forward peak while having a similar influence on the light path modification (see discussion in the publication of Reuter et al. (2017c)).
The maximum value of 0.02 for the layer optical depth should therefore not be interpreted as e.g. an aerosol optical depth.

300
The limits for the optical depth of the scattering layer and the XCO 2 error are somewhat arbitrary and actually result from visual inspection of the retrieval results. However, they are only intended as a first rough quality filter to facilitate later filter and bias correction methods, which will partly use the same parameters (see below). The detailed choice of these limits is therefore considered uncritical for the final results.
The above mentioned filter parameters and limits (see Tab. 8) are applied to both land and water surfaces and are the same 305 for GOSAT and GOSAT-2, except for the RSR outlier limit which has been determined individually for each instrument. Figs.
2 and 3 show how many data points are typically filtered out in this step. The different performance of the RSR filters for both instruments indicates that the filtering for GOSAT-2 needs some further optimisation, which is planned for the next version of the products.

310
In the next step, data are filtered out based on their expected XCO 2 bias i.e. the difference to a "true" XCO 2 . Of course, this true XCO 2 value is normally not known. We therefore use the XCO 2 reference database (as described in section 2.3) to train a random forest classifier (Pedregosa et al., 2011) to identify those variables which would remove -in combination with a corresponding random forest database -a pre-described percentage p of data based on their XCO 2 bias. This is done independently for data over land and water. Note that we are only interested here in the XCO 2 bias on top of an overall global 315 bias as the latter will be handled via the bias correction.
This method is similar to the random forest filter used by Schneising et al. (2019) in the context of CH 4 and CO retrieval.
Other approaches for data filtering used e.g. by Mandrake et al. (2013) and Reuter et al. (2017b) identify the best variables by minimisation of local XCO 2 variability in the retrieved products. However, all methods essentially serve the same purpose, i.e.
to derive a reproducible filtering procedure which does not solely rely on expert knowledge. 320 We determine the list of relevant variables and the random forest database for the filtering in the following way: We use the (uncorrected) results of the retrieval for the reference and apply the basic filtering as described in section 3.3.1. Then, the subset of these data is selected which has a corresponding "true" value in the reference database. For these data we determine the XCO 2 bias (measurement -reference XCO 2 ) and subtract the global median for land and water of this bias. We then sort the data according to this bias and flag those p percent of data with the highest absolute bias values as "bad".

325
For GOSAT, this results in a total number of 54317 data points over land and 109414 over water. These numbers are smaller for GOSAT-2 (land: 10625, water: 40459). The random forest classifier is then trained by using randomly 90% of these data as input. The training is done in two iterations: First, with a complete set of possible input variables ("features") and output variables ("estimators"); then, using only a reduced set consisting of the 10 best features/estimators, i.e. those with highest random forest score (relevance) of the first run. This relevance is a quantity which describes the relative importance of each 330 feature for the filtering. Relevances are normalised such that the sum of all relevances is 1. The random forest classifier then decides for each measurement based on these 10 variables if it is filtered out or not.
The initial list of possible features/estimators includes essentially all quantities available after the retrieval, including viewing angles, surface properties and continuum signal for each fit window. Furthermore, the retrieved values of the state vector elements and their errors are included in this list as well as averaging kernels for the profiles. We explicitly exclude the 335 geolocation of the measurement (latitude, longitude) and the retrieved values (but not the errors) for the data products we are interested in, i.e. the gases and SIF. This is to avoid e.g. the filtering out of certain geographical regions or removing all points with high XCO 2 values.
Note that in glint mode over ocean, the inclusion of information about the viewing geometry as possible features bears the risk that the random forest procedure may infer the geolocation from a combination of these.
340 However, we include as possible filter variable the gradient of the retrieved CO 2 profile (i.e. the difference between the two lowermost layers) as this has shown to be a suitable quantity.
The original number of candidate variables presented to the random forest classifier is quite high (193 for GOSAT and 246 for GOSAT-2) as can be seen from Figs. 5 and 6 (top left plots), but there are only few with a high relevance.
The ten best variables selected partly differ for land and water surface (as shown in the middle and left top panels), but they 345 usually comprise scattering parameters, polynomial coefficients, spectral corrections and some XCO 2 related parameters.
The other 10% of the input data are used to test the performance of the classifier. The results from this test and other crossvalidation activities indicate, that the random forest classification is -depending on surface -only accurate in about two thirds of the cases. This accuracy is defined as the fraction of correctly classified samples. This means that the filtering also removes possibly valid data points and does not remove all possibly bad ones. However, we do not expect a perfect classification, 350 because it is not possible to describe all inter-dependencies via the set of input features. Note that the performance of the filter is similar for both the training and the test data sets. This indicates that there are no problems with over-fitting.
To obtain a high quality of the remaining XCO 2 data, we therefore need to filter out quite a large percentage of data (and perform an additional filtering at a later time, see below). For future data products further investigations are planned to improve the performance of the classifier, e.g. by providing additional features from combination of existing ones (like the already used 355 CO 2 gradient). The percentage p of data to be filtered out is usually a trade-off between data quality and remaining amount of data. In the present case a 50% limit has been selected. Actually, as can be seen from Figs. 2 and 3, the relative amount of data filtered out via the random forest classifier is not exactly 50% of the data remaining after the previous filters.

Bias correction and filtering
XCO 2 retrieval methods usually require a bias correction to be applied to the data. This correction is often based on multi-linear 360 regressions using parameters identified from correlation analyses of differences to an assumed "true" XCO 2 data set. Different data products use different methods to define this "truth". In many cases, ground based TCCON measurements are taken as reference for the "true" XCO 2 , like in the GOSAT BESD product (Heymann et al., 2015), the SRON products (Guerlet et al., 2013), the product from the University of Leicester  and also the operational GOSAT product from NIES (Inoue et al., 2016). The bias correction of the NASA ACOS OCO-2 product is quite complex (O'Dell et al., 2018;Kiel et al., 365 2019); it uses as reference a modification of the "Southern Hemisphere Approximation" introduced by Wunch et al. (2011b).
The ACOS products takes multi-model mean data as reference and derives corrections from data in the southern hemisphere below 20 • S where CO 2 is assumed to be quite uniform on small areas. A similar "small area approximation" is also used by Reuter et al. (2017c) to correct FOCAL OCO-2 data. This correction method is not possible for GOSAT and GOSAT-2 data because of their sparse sampling. We therefore follow a different approach here.

370
For the bias correction we use as input the same data set as for the random forest filter, but with this random forest filter applied. 50% of the resulting data set is then used to train a random forest regressor (see also Schneising et al., 2019), which aims to minimise the "true" XCO 2 bias, i.e. the difference to the reference database value without global median subtracted, as function of the specified features. To create the bias correction database and the corresponding list of best features we again run the training twice, first with the full list (the same as for the filter) and then with the top ten features. Again, we use different 375 corrections for land and water. The resulting parameters and their performance are shown in the bottom panels of Figs. 5 and 6. The bias correction selects similar best features as the filter, but not exactly the same quantities in the same sequence.
The actual number of features/variables to be used for both filtering and bias correction is a trade-off between many variables (explaining all relations with a risk for over-fitting and high computational effort) and few variables (no over-fitting, but maybe missing some relations). Considering ten features for the bias correction is more than other algorithms typically use, but this 380 is appropriate in our case because we take into account the different relevance of these parameters, which drops off rapidly within the first ten variables (see top panels of Figs. 5 and 6).
The validity of this choice is confirmed by the application of the bias correction to the training data set and the other 50% of the input data, which shows a comparable reduction of the XCO 2 scatter. This is an indication for a good performance (e.g. no over-fitting) of the regressor.

385
During application of the bias correction, the random forest regressor estimates the XCO 2 bias based on the values of the input variables. This bias is then subtracted from the retrieved value.
Currently, there is only a bias correction for XCO 2 , but in principle this method is applicable also to other quantities depending on the availability of a corresponding reference database.
After the bias correction there are still a few outliers left in the XCO 2 data. These are filtered out by an additional filter on 390 the XCO 2 bias derived via the random forest classifier. The limits for this filter are the global median bias for test data set ±2 ppm. the median bias is different for land and water surfaces and also for GOSAT and GOSAT-2. The actual limits are given in Tab. 9. The value 2 ppm is estimated from visual inspection of the data. Figs. 2 and 3 show that typically less than 1-2% of the remaining data (less than 0.1% of all) are affected by this last filter.
The spatial variability of the finally derived bias correction (see Fig. 7) is typically below 1 ppm, but as mentioned above 395 there is a systematic difference between land and water data of about of 1 -1.5 ppm (see Tab. 9).
The FOCAL retrieval has been applied to all GOSAT and GOSAT-2 measurements until end of 2019. On average, FOCAL needs 22 s with 6 iterations for the processing of one GOSAT ground pixel. For GOSAT-2 numbers are slightly larger (28 s/7 iterations) because of the additional fit windows and state vector elements. All performance values are given for a single core 400 of an Intel Xeon E5-2667v3 CPU (3.2 GHz). These numbers are actually about one magnitude larger than the ones given in Reuter et al. (2017b, c) for the FOCAL application to OCO-2. This is because we use for GOSAT(-2) separate S and P polarisation spectra and more retrieved variables, which requires more and larger fit windows. For each of these fit windows and parameters, weighting functions have to be calculated, which involves a convolution with the ILS. This convolution is the most time consuming part of the FOCAL retrieval. This is even more relevant for GOSAT(-2), because the FTS ILS is in 405 principle sinc-shaped, i.e. it has a sharp peak in the centre but wide wings, which requires a large kernel width (in our case 15 cm −1 for the convolution. The left plots of Figs. 8 and 9 show examples for measured and fitted nadir mode radiance spectra for GOSAT and GOSAT-2 over land in the different fitting windows for P polarisation. Since the difference between measured and modelled spectra is small and thus hard to see, we show in these figures on the right side the corresponding residuals and the estimated noise. The 410 results for S polarisation look similar and are therefore not shown here. The residuals are on the order of magnitude of the noise, which is slightly higher for P polarisation than for S polarisation. Some small spectral structures are visible in the residuals, they appear more clearly in the smoothed residuals (convoluted with a 21 pixel boxcar), e.g. for GOSAT and GOSAT-2 in the O 2 (A) band (window 2), and some broadband oscillations in window 4 and 5 for GOSAT-2. These features are present in both S and P polarisations and occur also in other measurements, so they seem systematic. A reduction of these features could 415 further improve future products.
In Fig. 10 some statistical information about the GOSAT-FOCAL data products is given. A time series for the number of valid data is given in the top plot. In the recent years, about 5-6% of the available measurements could be transferred to valid XCO 2 data. The number of valid data points increases from 2009 to 2019. This is mainly due to an increase in the data over water, which is related to optimisations in GOSAT operations (better use of glint geometry) over water. As expected, the 420 mean global XCO 2 shown in the middle plot increases with time. Global mean values over water are typically slightly higher than over land. The observed XCO 2 variability (standard deviation, bottom plot) is larger over land. Part of this variability is attributed to influences of surface elevation and to different scattering pathways between the land and water measurements. For GOSAT-2, only retrieved data from 2019 are available so far. The total amount of available measurements is about 2.8 million, compared to about 3.5 million GOSAT measurements in 2019. Only about 3% of the GOSAT-2 data remain after all filtering / 425 post-processing, which is roughly half of the corresponding number for GOSAT (but similar to the first year of GOSAT). As can be seen from Fig. 3 more GOSAT-2 data are filtered out due to failed or bad convergence and by the RSR outlier limits than for GOSAT (Fig. 2). Future improvements of the GOSAT-2 calibration or the noise model are considered to help here.
For further analyses, we have generated monthly maps on a 5 • × 5 • grid. Example plots for the months April and August 2019 (begin/end of the growing season) are shown in Fig. 11 for GOSAT and in Fig. 12 for GOSAT-2. The data are not filtered 430 for low amounts of input data in the grid points, which explains some individual outliers in the plots. Overall, the spatial patterns observed by GOSAT and GOSAT-2 look reasonable. The north-south gradient in XCO 2 is visible with different sign in April and August for both instruments. The spatial coverage of the GOSAT-2 data is lower than for GOSAT, because more data are filtered out (see above). This affects especially regions like the northern part of Africa.

435
For the verification and validation of the GOSAT and GOSAT-2 FOCAL products we perform a comparison with various reference data sets (see section 2), namely: -The GOSAT BESD v01.04 product from IUP (referred to as "BESD" later).
-The GOSAT UoL_FP v7.3 product from the University of Leicester (referred to as "UoL" later).
For the comparisons, all data have been adjusted using the same a-priori (SECM2020).
Since all GOSAT products use different retrieval and filter methods, they do not contain the same number of data (see 445 Fig. 13). Currently, the NASA ACOS product has the largest number of valid data points, followed by the new GOSAT-FOCAL product with about 20% less data.

Direct comparisons
There are enough common measurement points included in the different GOSAT products to perform a direct comparison. For these data we performed a linear regression using the Orthogonal Distance Regression (ODR) method (see e.g. Boggs et al., 1987). Unlike common linear regression, ODR considers uncertainties for both axes (data sets) by minimising the orthogonal distances between the model curve and the data points. The ODR results are shown by the red line and its label. Number of collocations and median/mean and standard deviations of the differences are given in the titles.

455
Overall, the data scatter around the 1:1 line in a similar way for all comparisons. ODR slopes vary between the data sets

TCCON comparisons
The TCCON network provides high-quality XCO 2 (and other) data which are currently considered to be the main reference for greenhouse gas data obtained from satellite measurements. Therefore we compared the different GOSAT data sets with collocated TCCON measurements from 2009 to the end of 2018. BESD data are not included, because they do not cover the complete time interval. Collocation criteria are:

465
-Maximum time difference of 2 h.
-Maximum spatial distance of satellite measurement from TCCON station 500 km.
-Maximum surface elevation difference between satellite measurement and TCCON station 250 m.
In addition to these criteria we also consider in the validation only stations / TCCON data sets, which have at least 50 collocations for all algorithms. This improves the comparability of regional and seasonal biases. As a consequence, not all  The seasonal component of the bias has a station to station average standard deviation of 0.37 ppm. Overall, the ACOS product performs best in this comparison.
Note that the biases shown in Fig. 16 correspond essentially to a bias anomaly since a the mean bias over all stations was removed from all products. This is because for most applications this mean bias is not relevant since most information is contained in gradients. Subtracting a mean bias also facilitates a comparison of different bias patterns between the algorithms.

485
The subtracted mean station bias is actually small; for GOSAT it varies between 0.17 ppm (for ACOS) and 0.64 ppm (for NIES). Because of the subtracted mean bias different signs of biases for different products could be coincidental. However, the biases of FOCAL and ACOS are consistent with the biases found by Reuter et al. (2020).
In Figs. 17 and 18 we show results from a preliminary comparison of the new GOSAT-2 FOCAL data with TCCON. These results are considered less reliable, because they are based on a data set which covers less than a year. There are sufficient 490 collocations for only seven TCCON stations, and some of them comprise only a few months of data, which limits the fit of our trend model. The mean scatter of the GOSAT-2 data is 1.86 ppm and therefore similar as the one for GOSAT. The derived station-to-station bias for GOSAT-2 FOCAL is 1.14 ppm. This high value is mainly due to the biases derived for the stations Orleans and Reunion Island (where only few data are available) and the station Pasadena, which also showed larger discrepancies to the GOSAT products (see Fig, 16). We are confident that the station-to-station bias will improve when more 495 and improved GOSAT-2 data will be available.
Via the TCCON comparison it is also possible to validate the reported precision of the FOCAL data products (i.e. the specified XCO 2 error). The basic idea is to estimate the "true" precision from the variability of the XCO 2 bias relative to trend-corrected, collocated TCCON data. For this purpose, we define 20 bins with increasing reported XCO 2 uncertainty and compute the corresponding true precision from the scatter relative to TCCON (i.e., the fit residual mentioned above).

500
The corresponding scatter plot is shown in Fig. 19. We use the fitted linear curve to correct the reported uncertainty of the GOSAT-FOCAL data. After the correction, all data scatter around the 1:1 line (dashed).
A similar correction will be performed for the GOSAT-2 FOCAL product as soon as sufficient data (GOSAT-2 as well as TCCON) are available, which is currently not yet the case.

Time series 505
To investigate the temporal behaviour of the FOCAL XCO 2 data sets, we performed comparisons based on monthly data from 2009 to 2019, which were spatially gridded to 5 • × 5 • (examples are shown in Figs. 11 and 12). Similar data sets have been generated for the SRON, UoL, ACOS and NIES GOSAT products. We also produced a corresponding gridded GOSAT-BESD data set; since these are near-real-time (NRT) data only, there are no GOSAT BESD data before 2014 available (when the NRT processing started). GOSAT-2 data start in February 2019. 510 We then selected for each combination of GOSAT-FOCAL XCO 2 and a correlative data set grid points where the standard error of the mean is less than 1.6 ppm (as a basic quality filter, similar as done by Reuter et al., 2020). These data were then averaged over different latitudinal ranges, namely:  Figure 20 shows the results of these comparisons. The left plots display time series of the different data sets, the right plots the difference between the GOSAT-FOCAL XCO 2 and the reference data. All data products reproduce the overall increase of XCO 2 with time as well as the seasonal variations. On average, FOCAL data are typically about 0.5 ppm higher than the other 520 data sets (except for BESD). This related to the choice of the "true" XCO 2 for the bias correction. There are long-term changes in the order of 1 ppm over the complete time series, which differ for each data set. For example, the GOSAT-FOCAL data show in the tropics relative to SRON a higher value at the start of the time series, but both data sets agree at the end. On the other hand, the average difference to the UoL data in the northern hemisphere is negative during the first years, but increases to an almost constant small positive offset below about 0.5 ppm. There is not much difference in the temporal behaviour between 525 the GOSAT-FOCAL and the ACOS and NIES time series. The seasonal shapes also differ slightly with amplitudes of about 0.5 ppm with somewhat larger differences in the southern hemisphere where seasonal variations are generally smaller.
Overall, the agreement within the GOSAT data sets is broadly consistent with the systematic regional and seasonal biases derived from the TCCON validation, especially considering that all gridded data sets are based on a different spatial and temporal sampling. Also, the FOCAL products for GOSAT and GOSAT-2 seem to agree quite well, but more GOSAT-2 data 530 is needed to confirm this.

Conclusions
Based on the FOCAL retrieval method a new XCO 2 data set for GOSAT and a first XCO 2 data set for GOSAT-2 have been generated, making use of both measured polarisation directions. The GOSAT-FOCAL data set compares well with corresponding data from other currently available GOSAT retrieval algorithms, i.e. the RemoTec product from SRON, the UoL FP product, 535 the NASA ACOS product, the NIES product and the BESD product from IUP. All data sets use different filtering and bias correction schemes and therefore comprise also a different number and sampling of data. The GOSAT-FOCAL product performs well in this context and has almost as many valid data as the ACOS product. Based on gridded data, differences in long-term variations of all data sets in the order of 1 ppm per decade are observed. Also, seasonal variations differ by about 0.5 ppm.
Comparisons with ground-based TCCON data reveal for the GOSAT-FOCAL product an overall station to station bias of 540 0.56 ppm and a mean scatter of 1.89 ppm. These values are comparable to and in some cases even better than those of the already existing GOSAT products of which some have less valid data.
The first GOSAT-2 results using the FOCAL method are also quite promising, but further investigations, longer time series and more correlative data sets are required for a quantitative assessment of the GOSAT-2-FOCAL data quality.
Overall, the FOCAL method has proven to be computationally fast and to produce XCO 2 results with similar accuracy 545 as other, typically more time consuming, retrieval algorithms. This is the case not only when applied to OCO-2 but also for GOSAT and GOSAT-2. FOCAL is therefore considered to be a good candidate algorithm for future satellite sensors producing large amounts of data, like the forthcoming European anthropogenic CO 2 Monitoring (CO2M) mission.