Validation of methane and carbon monoxide from Sentinel-5 Precursor using TCCON and NDACC-IRWG stations

The Sentinel-5 Precursor (S5P) mission with the TROPOspheric Monitoring Instrument (TROPOMI) on board has been measuring solar radiation backscattered by the Earth’s atmosphere and surface since its launch on 13 October 2017. In this paper, we present for the first time the S5P operational methane (CH4) and carbon monoxide (CO) products’ validation results covering a period of about 3 years using global Total Carbon Column Observing Network (TCCON) and Infrared Working Group of the Network for the Detection of Atmospheric Composition Change (NDACC-IRWG) network data, accounting for a priori alignment and smoothing uncertainties in the validation, and testing the sensitivity of validation results towards the application of advanced co-location criteria. We found that the S5P standard and bias-corrected CH4 data over land surface for the recommended quality filtering fulfil the mission requirements. The systematic difference of the bias-corrected total column-averaged dry air mole fraction of methane (XCH4) data with respect to TCCON data is −0.26 ± 0.56 % in comparison to −0.68 ± 0.74 % for the standard XCH4 data, with a correlation above 0.6 for most stations. The bias shows a seasonal dependence. We found that the S5P CO data over all surfaces for the recommended quality filtering generally fulfil the mission requirements, with a few exceptions, which are mostly due to co-location mismatches and limited availability of data. The systematic difference between the S5P total column-averaged dry air mole fraction of carbon monoxide (XCO) and the TCCON data is on average 9.22 ± 3.45 % (standard TCCON XCO) and 2.45 ± 3.38 % (unscaled TCCON XCO). We found that the systematic difference between the S5P CO column and NDACC CO column (excluding two outlier stations) is on average 6.5 ± 3.54 %. We found a correlation of above 0.9 for most TCCON and NDACC stations.
The study shows the high quality of S5P CH4 and CO data by validating the products against reference global TCCON and NDACC stations covering a wide range of latitudinal bands, atmospheric conditions and surface conditions.


Introduction
The Sentinel-5 Precursor (S5P) mission with the TROPOspheric Monitoring Instrument (TROPOMI) on board was launched on 13 October 2017. The S5P is orbiting in a Sun-synchronous polar orbit with an Equator crossing at 13:30 local solar time. The TROPOMI instrument is a nadir-viewing hyperspectral spectrometer measuring solar radiation reflected by the Earth's atmosphere and its surface from the ultraviolet-visible (270-495 nm), near-infrared (675-775 nm) and shortwave-infrared (2305-2385 nm) with daily global coverage for monitoring atmospheric trace gases and aerosol (Veefkind et al., 2012). Methane (CH4) and carbon monoxide (CO) are retrieved from shortwave-infrared (SWIR) and near-infrared (NIR) measurements.
Methane is the second most important anthropogenic greenhouse gas (GHG) after carbon dioxide (CO2). It has a global warming potential that is about 28 times larger than CO2 over a 100-year time period. It is less abundant in the atmosphere and has a significantly shorter lifetime than CO2 (Stocker et al., 2013). Reduction in CH4 will affect the Earth's radiation budget on a short timescale. CH4 is also relevant in atmospheric chemistry, where it reacts with hydroxyl radicals (OH), thereby reducing the oxidation capacity of the atmosphere and producing ozone (Kirschke et al., 2013).
Carbon monoxide is a poisonous reactive gas considered principally an anthropogenic atmospheric pollutant. Volatile organic compounds (VOCs) are emitted to the atmosphere by incomplete combustion (e.g. vehicles, industry and biomass burning) and have an important role in the production of CO. The lifetime of CO is relatively short and ranges from weeks to months (Novelli et al., 1998). CO reacts with atmospheric oxidants, ozone (O3), hydroperoxy (HO2) and hydroxyl radicals (OH). It is the largest direct sink of OH, affecting the self-cleansing capacity of the atmosphere. An increase in CO would imply a higher OH loss through chemical reaction and therefore less availability of OH for the depletion of other atmospheric constituents such as CH4. CO therefore affects the concentrations of primary greenhouse gases and has an indirect but important influence in determining the chemical composition and radiative properties of the atmosphere. It is therefore considered an indirect greenhouse gas (Stocker et al., 2013).
Continuous, precise and accurate global measurements of these gases are very important for long-term monitoring and for use in inverse models, so that the inferred surface fluxes can be better constrained. This paper focuses on the quality assessment of the operational S5P CH4 and CO products by performing validation of the total columns of these two products with the reference data from all stations in the ground-based Total Carbon Column Observing Network (TCCON) and Infrared Working Group of the Network for the Detection of Atmospheric Composition Change (NDACC-IRWG) networks. The systematic and random error requirements of the CH4 and CO products are checked based on 2.8 years of S5P data, and possible reasons are given where large deviations are observed.
The paper is organised as follows: Sect. 2 describes the satellite and ground-based reference data used in this study. Section 3 gives the details of the validation methodology. Section 4 gives the validation results for CH4, and Sect. 5 gives the validation results for CO. Section 6 summarises our results and conclusions.

Data
In this section, we present an overview of the input data from the S5P and the reference ground-based data from the TCCON and NDACC-IRWG, herewith referred to as NDACC, which are used for the validation of the S5P operational CH4 and CO products.

S5P methane and carbon monoxide data sets
TROPOMI is the unique payload of the ESA/Copernicus Sentinel-5 Precursor mission orbiting in a low-Earth Sun-synchronous polar orbit with a wide swath of 2600 km across track resulting in daily global coverage. The TROPOMI radiometric measurements of the Earth's radiance and solar irradiance are processed using an on-ground data processor to retrieve the atmospheric abundances of ozone (O3), nitrogen dioxide (NO2), sulfur dioxide (SO2), formaldehyde (HCHO), methane (CH4), carbon monoxide (CO), as well as cloud and aerosol properties. The spatial resolution of the operational level 2 (L2) CH4 and CO products was originally 7 × 7 km² and was increased to 5.5 × 7 km² on 6 August 2019.
The operational processing to retrieve the total column-averaged dry air mole fraction of methane (XCH4) is performed by the RemoTeC-S5P algorithm. The theoretical baseline of the algorithm, the input and ancillary data needed, the averaging kernel and the output generated are described in detail in Hu et al. (2016) and Hasekamp et al. (2019). The use of satellite measurements for estimating sources and sinks of CH4 strongly depends on the precision and accuracy achieved. Systematic biases or lower precision on regional or seasonal scales can jeopardise the usefulness of the satellite measurements for the estimation of sources and sinks (Bergamaschi et al., 2007). The bias requirement for S5P XCH4 is 1.5 % and the random error requirement is 1 % (as reported in the official ESA document ESA-EOPG-CSCOP-PL, 2017, Table 1, p. 14). The current S5P CH4 data are only processed for cloud-free measurements over land. Along with the standard CH4 product, a bias-corrected CH4 product is also made operationally available. We provide a brief summary of the CH4 bias correction here, and the details of the bias correction can be found in Sect. 5.6 of the algorithm theoretical baseline document (ATBD) for S5P methane retrieval (Hasekamp et al., 2019). The operational S5P CH4 product has been compared to co-located Greenhouse Gases Observing Satellite (GOSAT) proxy measurements. The S5P-GOSAT XCH4 ratio shows a high correlation to the retrieved surface albedo in the SWIR. The highest correlation is for low surface albedo scenes. A posteriori bias correction has been applied to the S5P CH4 product using a second-order polynomial fit. The effect of the bias correction is an increase of the retrieved CH4 for scenes with relatively low albedo conditions (e.g. forest scenes) and a decrease of CH4 for scenes with high albedo conditions (e.g. desert scenes).
In the paper, we will show the validation results of both standard and bias-corrected S5P CH4 products. The latest product versions of S5P CH4 data for the reprocessed (RPRO) and offline (OFFL) data from the start of the mission to 30 September 2020 are used in this work. The version numbers and the respective dates are listed in Table 1, and further details on the relevant improvements are given in the product readme file (PRF; https://sentinel.esa.int/documents/247904/3541451/Sentinel-5P-Methane-Product-Readme-File, last access: 14 July 2020). The quality assurance (QA) value is provided as part of the CH4 data product. QA > 0.5 is used as recommended by the PRF to filter out the S5P CH4 data to be used for the validation studies. This selection filters out measurements performed with surface albedo < 0.02, solar zenith angle (SZA) > 70°, viewing zenith angle > 60° and some other criteria as mentioned in the PRF.
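As an illustration of the albedo-dependent correction summarised above, the following minimal sketch fits a second-order polynomial to a satellite-to-reference XCH4 ratio as a function of SWIR surface albedo and divides the retrieved XCH4 by the fitted ratio. The function names, the synthetic coefficients and the choice to apply the correction by division are assumptions made for this example; the operational functional form and coefficients are those documented in the ATBD (Hasekamp et al., 2019) and are not reproduced here.

```python
import numpy as np


def fit_albedo_correction(albedo, ratio):
    """Least-squares fit of ratio = c2*albedo**2 + c1*albedo + c0.

    Returns the coefficients with the highest power first (numpy convention).
    """
    return np.polyfit(np.asarray(albedo, dtype=float),
                      np.asarray(ratio, dtype=float), deg=2)


def apply_albedo_correction(xch4, albedo, coeffs):
    """Divide XCH4 by the fitted ratio (assumed form of the correction)."""
    fitted_ratio = np.polyval(coeffs, np.asarray(albedo, dtype=float))
    return np.asarray(xch4, dtype=float) / fitted_ratio


# Synthetic, noise-free demonstration data (illustrative values only).
demo_albedo = np.linspace(0.02, 0.6, 30)
demo_true_coeffs = np.array([-0.05, 0.06, 0.985])
demo_ratio = np.polyval(demo_true_coeffs, demo_albedo)
demo_coeffs = fit_albedo_correction(demo_albedo, demo_ratio)
demo_corrected = apply_albedo_correction(1850.0 * demo_ratio, demo_albedo, demo_coeffs)
```

With these synthetic values the fitted ratio is below 1 at low albedo, so the correction raises XCH4 for dark scenes, which matches the direction of the effect described in the text.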
The operational processing to retrieve the total column density of carbon monoxide (CO) simultaneously with interfering trace gases and effective cloud parameters (cloud height and optical thickness) is performed by the shortwave infrared carbon monoxide retrieval (SICOR) algorithm. The theoretical baseline of the algorithm, the input and ancillary data needed, example plots of the averaging kernel and the output generated are described in detail in Landgraf et al. (2018). The bias requirement for total column-averaged dry air mole fraction of carbon monoxide (XCO) is 15 % and the random error requirement is < 10 % (as reported in the official ESA document ESA-EOPG-CSCOP-PL, 2017, Table 1, p. 14). The CO total column L2 data products are available as the OFFL and near-real-time (NRTI) timeliness data products. The version numbers and the respective dates are listed in Table 1 and further details on the relevant improvements are given in the PRF (https://sentinel.esa.int/documents/247904/3541451/Sentinel-5P-Carbon-Monoxide-Level-2-Product-Readme-File, last access: 14 July 2020). The latest product versions of S5P CO data for the RPRO and OFFL data from the start of the mission to 30 September 2020 are used in this work. The NRTI data stream delivers the CO data product within a few days after sensing. Due to the different timeliness, the NRTI products are given in 5 min data granules, whereas the OFFL data products are given per satellite orbit. The consecutive data granules of the NRTI product show an overlap of about 12 scan lines. The NRTI processing chains employ the same algorithm as the OFFL since processor version 01.03.02 starting from orbit no. 8906 on 3 July 2019 (see Sect. 9.4 of Lambert et al., 2020, for validation results showing the equivalence of S5P NRTI and OFFL CO products). More details on the two processing streams of the two data sets are given in the ATBD.
In this paper, we show the detailed validation results of the S5P OFFL CO product. Data with QA values > 0.5 are used as recommended by the PRF. This selection filters out measurements performed with SZA ≥ 80°, sensor zenith angle ≥ 80°, the two most westward pixels due to unresolved calibration issues and some other criteria as mentioned in the PRF. Furthermore, we also separated retrievals performed for measurements under clear-sky (CLSKY; cloud optical thickness < 0.5 and cloud height < 500 m, over land) and cloudy conditions (CLOUD; cloud optical thickness ≥ 0.5 and cloud height < 5000 m, over land and ocean) as suggested by Borsdorff et al. (2018b). The clear-sky observations over the ocean have too-low signal intensities in the SWIR and therefore cannot be used for the data interpretation. Unlike the S5P CH4 a priori profiles which are available in the L2 files, the S5P a priori profiles for CO were downloaded from ftp://ftp.sron.nl/pub/jochen/TROPOMI_apriori/ (last access: 1 December 2020). Among the known data quality issues of the CO product, single overpasses of S5P show stripes of erroneous CO values < 5 % in the flight direction, probably due to calibration issues of S5P. We did not apply any correction for this stripe pattern, as we show the operational validation of the CO product and also because only a small fraction of pixels (< 5 %) is affected by it. The striping effect is analysed in detail by Borsdorff et al. (2019). The effect on the TCCON validation was small. The destriping approach suggested in that work is planned to be implemented in the operational TROPOMI CO processing in the near future. Furthermore, the effect of updating the spectral cross-sections in the TROPOMI CO processing for clear-sky and cloudy conditions was analysed with ground-based Fourier transform infrared (FTIR) measurements from 12 stations of the TCCON network.
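The QA filtering and the clear-sky/cloudy separation described above can be sketched as follows. This is a minimal illustration using the thresholds quoted in the text; the dictionary keys (`qa`, `cot`, `cloud_height_m`, `over_land`) and the function names are hypothetical and do not correspond to variable names in the S5P L2 files.

```python
def classify_scene(cloud_optical_thickness, cloud_height_m, over_land):
    """Return "CLSKY", "CLOUD" or None using the thresholds quoted in the text."""
    if cloud_optical_thickness < 0.5 and cloud_height_m < 500.0 and over_land:
        return "CLSKY"  # clear sky, land only
    if cloud_optical_thickness >= 0.5 and cloud_height_m < 5000.0:
        return "CLOUD"  # cloudy, land and ocean
    return None  # e.g. clear sky over ocean, or cloud top too high


def select_scenes(pixels, qa_min=0.5):
    """Keep pixels with QA > qa_min and attach their scene class.

    `pixels` is a list of dicts with the hypothetical keys
    "qa", "cot", "cloud_height_m" and "over_land".
    """
    selected = []
    for pixel in pixels:
        if pixel["qa"] <= qa_min:
            continue
        scene = classify_scene(pixel["cot"], pixel["cloud_height_m"], pixel["over_land"])
        if scene is not None:
            selected.append({**pixel, "scene": scene})
    return selected


demo_pixels = [
    {"qa": 0.9, "cot": 0.1, "cloud_height_m": 200.0, "over_land": True},
    {"qa": 0.9, "cot": 0.1, "cloud_height_m": 200.0, "over_land": False},
    {"qa": 0.4, "cot": 2.0, "cloud_height_m": 3000.0, "over_land": True},
    {"qa": 0.8, "cot": 2.0, "cloud_height_m": 3000.0, "over_land": False},
]
demo_selected = select_scenes(demo_pixels)
```

In the demonstration data only the first and last pixels survive: the second is clear sky over ocean and the third fails the QA threshold.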
In addition to the operational S5P CH4 and CO products, a scientific version of the products (TROPOMI/WFMD) using the weighting function modified differential optical absorption spectroscopy (WFM-DOAS) has been developed by the University of Bremen (http://www.iup.uni-bremen.de/carbon_ghg/products/tropomi_wfmd/, last access: 1 June 2021). The details of the TROPOMI/WFMD CH4 and CO products, their validation against reference ground-based measurements and operational TROPOMI products, as well as use cases of the products to address important scientific applications can be found in Schneising et al. (2019, 2020a) and Vellalassery et al. (2021).

Ground-based TCCON reference data set
The TCCON represents a network of ground-based Fourier transform spectrometers (FTSs), of the type Bruker IFS 125HR (some long-existing sites also use Bruker 120/5HR), that record direct solar absorption spectra in the NIR spectral range to retrieve accurate and precise column-averaged abundances of atmospheric constituents including CO2, CH4 and CO amongst other species.
It is the current state-of-the-art validation system for total column measurements of important GHGs by remote sensing. TCCON data from several stations have been used in previous studies for the validation of trace gas data products from satellite platforms such as OCO-2 (O'Dell et al., 2018; Wunch et al., 2017), GOSAT (Iwasaki et al., 2017; Kulawik et al., 2016), S5P (Borsdorff et al., 2018a, 2019), MOPITT (Hedelius et al., 2019) and SCIAMACHY (Hochstaffl et al., 2018). Data from all stations (23 in the Northern Hemisphere and 5 in the Southern Hemisphere) are used in this study and are listed in Table 2. The stations cover various atmospheric conditions (humid, dry, polluted, presence of aerosol), various surface conditions (range of albedo, flat terrain, high-altitude locations) and the latitudinal distribution from 80° N to 45° S. The stations at Nicosia and Xianghe are not yet officially part of TCCON but perform observations and data analysis fully compatible with TCCON guidelines. GGG2014 (the current standard TCCON retrieval code) XCH4 systematic errors for TCCON are below 0.5 % for SZAs below 85°. The XCO errors are below 4 % and decrease with SZA. The uncertainty in the scaling slope for XCO is 6 % (2σ) (Hedelius et al., 2019). Previous studies have shown that the scaling factor of ∼ 7 % used in GGG2014 to tie the TCCON XCO measurements to the World Meteorological Organization (WMO) in situ scale is large compared to the current uncertainty in spectroscopy (Sha et al., 2018b; Hedelius et al., 2019; Zhou et al., 2019). A scaling factor of 7 % provided the best scaling to the in situ data available when the scaling for GGG2014 was calculated. There is currently an ongoing effort within the TCCON community to determine whether the scaling factor is appropriate. These results are very important to decide on the choice of spectroscopic cross-sections that should be implemented for the future improved S5P CO product.
In this work, we use the official TCCON XCO product as well as an XCO product without the application of the empirical scaling factor, herewith referred to as unscaled XCO. The unscaled XCO was calculated following Eq. (2) of Wunch et al. (2015), where the TCCON data without the scaling to the WMO scale were obtained from the site PIs. The validation work is done using the standard and rapid delivery of TCCON data from the whole network. The publicly available TCCON data can be accessed via https://tccondata.org/ (last access: 1 June 2021).

Ground-based NDACC-IRWG reference data set
The IRWG of the NDACC represents a network of high-resolution Fourier transform spectrometers that record solar absorption spectra in the mid-infrared (MIR) spectral range. It is a multi-national collection of over 20 stations distributed from pole to pole (Eureka at 80° N to Arrival Heights at 77.8° S). The solar absorption spectra are used to retrieve the atmospheric concentrations of a number of gaseous atmospheric components, including ozone (O3), nitric acid (HNO3), hydrogen chloride (HCl), hydrogen fluoride (HF), carbon monoxide (CO), nitrous oxide (N2O), methane (CH4), hydrogen cyanide (HCN), ethane (C2H6) and chlorine nitrate (ClONO2) (https://www2.acom.ucar.edu/irwg, last access: 1 June 2021). NDACC CH4 and CO data from several stations have been used in previous studies for satellite validation (Hedelius et al., 2019; Hochstaffl et al., 2018; Sha et al., 2018b; Buchholz et al., 2017; Olsen et al., 2017). In this study, data from all stations (19 in the Northern Hemisphere and 5 in the Southern Hemisphere) are used and are listed in Table 3. Several of the stations are located in high-latitude regions and many stations are located at high altitudes to reduce the interference of water vapour in the measurements. Some of these stations (e.g. Karlsruhe, Garmisch, Sodankylä, Porto Velho) are not officially part of NDACC but perform observations and data analysis fully compatible with NDACC guidelines. The co-located NDACC and TCCON stations often share one FTIR instrument, applying the respective detector and filter settings. The spectra are analysed either with the SFIT4 algorithm, an evolution of SFIT2 (Pougatchev et al., 1995), or the PROFFIT9 algorithm to retrieve vertical profiles of CH4 and CO. The retrieval allows the derivation of a tropospheric and a stratospheric column of the target gases (Sepúlveda et al., 2012). The NDACC CO column values can be used directly to validate the S5P CO column values.
However, for the S5P XCH4 validation, the NDACC XCH4 values need to be calculated. Due to the NDACC measurements being performed in the MIR range, the oxygen (O2) total column is not available from the spectrum for calculating the column-averaged dry air mole fractions of the target gas (Xgas), similar to what is done for TCCON (see Eq. A9 of Wunch et al., 2011). Therefore, the total column of dry air is computed as described in Eq. (1) of Deutscher et al. (2010). The surface pressure (P_s) is recorded at the local weather station of the FTS stations and the H2O total column (TC_H2O) is derived from the National Centers for Environmental Prediction (NCEP) reanalysis data set. In the event that there is no surface pressure available, we extrapolate the pressure grid to the surface. The XCH4 calculated values for NDACC measurements are then used for the validation of the S5P XCH4 data. Unlike TCCON data, where a species-specific scaling factor is applied to tie the measurements to the WMO in situ scale, the NDACC data do not apply any scaling of the retrieved results. The typical accuracy and precision of the NDACC CH4 data are about 3 % and 1.5 %, respectively. The typical accuracy and precision of the NDACC CO data are about 3 % and 1 %, respectively. High systematic uncertainty is mainly due to the too-conservative spectroscopic uncertainty component. Both the consolidated data available via http://www.ndaccdemo.org/ (last access: 1 June 2021) and the rapid delivery data supported by the CAMS27 project (https://cams27.aeronomie.be/, last access: 1 June 2021) have been used in this study.

Table 2. List of FTIR stations that are associated with TCCON and contributed to the present work by providing public and rapid delivery data. The stations marked with an asterisk (*) are not yet associated with TCCON but perform observations and data analysis fully compatible with TCCON guidelines.
Active dates correspond to the dates for which the measurements were provided from the satellite launch until the present work.

Validation methodology
S5P provides the total column density of CO, which can be directly validated against the NDACC CO total column density product. However, we need to calculate the corresponding XCO values in order to compare to the TCCON XCO products. The S5P XCO is calculated as the ratio of the total column of CO (TC_CO) to the total column of dry air (TC_dry,air) (following Eq. 1 in Deutscher et al., 2010):

$$\mathrm{XCO} = \frac{\mathrm{TC}_{\mathrm{CO}}}{\mathrm{TC}_{\mathrm{dry,air}}}, \qquad \mathrm{TC}_{\mathrm{dry,air}} = \frac{P_{\mathrm{s}}}{g\, m_{\mathrm{dry,air}}} - \mathrm{TC}_{\mathrm{H_2O}}\, \frac{m_{\mathrm{H_2O}}}{m_{\mathrm{dry,air}}}, \tag{1}$$

where P_s is the surface pressure, TC_H2O is the total column of H2O, g is the column-averaged acceleration due to gravity, and m_dry,air and m_H2O are the molecular masses of dry air and H2O, respectively. P_s and TC_H2O are taken from the S5P files.
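A minimal numerical sketch of this conversion is given below, assuming that the dry-air column is the surface pressure divided by g times the molecular mass of dry air, minus the water column scaled by the ratio of the molecular masses (the form of Eq. 1 in Deutscher et al., 2010). The constants are standard physical values rather than values quoted in this paper, the function names are illustrative, and columns are assumed to be in molecules per square metre (the S5P files may use different units).

```python
AVOGADRO = 6.02214076e23   # molecules per mole
M_DRY_AIR = 28.9644e-3     # kg per mole (standard value, assumption)
M_H2O = 18.01528e-3        # kg per mole (standard value, assumption)
G_STANDARD = 9.80665       # m s-2, placeholder for the column-averaged g


def dry_air_column(p_surf_pa, tc_h2o, g=G_STANDARD):
    """Dry-air total column in molecules m-2.

    p_surf_pa: surface pressure in Pa
    tc_h2o:    water vapour total column in molecules m-2
    g:         column-averaged gravitational acceleration in m s-2
    """
    m_dry = M_DRY_AIR / AVOGADRO  # kg per molecule
    m_h2o = M_H2O / AVOGADRO      # kg per molecule
    return p_surf_pa / (g * m_dry) - tc_h2o * (m_h2o / m_dry)


def xco_ppb(tc_co, p_surf_pa, tc_h2o, g=G_STANDARD):
    """Column-averaged dry-air mole fraction of CO in ppb."""
    return 1.0e9 * tc_co / dry_air_column(p_surf_pa, tc_h2o, g)


# Illustrative values: 2e22 molecules m-2 of CO, sea-level pressure,
# and a water column of 1e27 molecules m-2.
demo_xco = xco_ppb(2.0e22, 101325.0, 1.0e27)
```

For a sea-level surface pressure the dry-air column comes out near 2.1e29 molecules per square metre, so the illustrative CO column corresponds to an XCO of roughly 90 to 95 ppb.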
The validation of the S5P methane and carbon monoxide data is performed based on the reference data sets from the ground-based TCCON and NDACC networks. We present the results for both of the networks with different co-location criteria applied to the data sets. The differences in the validation results are also based on whether or not a common prior has been used for the satellite and ground-based FTIR data sets; details are discussed in Appendix A.
S5P provides daily global coverage with a huge data set, having a wide swath at a high spatial resolution for every overpass. Selecting the co-location criteria is therefore a crucial task: the criteria should be as strict as possible while still leaving sufficient co-located data for a statistically significant validation. We tried several co-location criteria to test the sensitivity of the method in relation to the choice of the parameter (e.g. time, distance, line of sight). The best co-location criteria will be such that the bias is robust and not sensitive to small changes in the co-location criteria. In the next sections, the results of the application of these criteria are shown for the case with the reduction of uncertainty due to smoothing and in relation to direct comparisons.

Table 3. List of FTIR stations that are associated with NDACC-IRWG and contributed to the present work by providing public and rapid delivery data. The stations marked with an asterisk (*) are not yet associated with NDACC but perform observations and data analysis fully compatible with NDACC guidelines. The location of the stations and the teams involved are indicated for the respective stations.

The validation results of the S5P bias-corrected and standard methane products with reference TCCON and NDACC data are discussed in this section. The S5P observations co-located with the ground-based reference measurements are found by selecting all filtered S5P pixels within a radius of 100 km around each site and with a maximal time difference of 1 h for TCCON and 3 h for NDACC observations. The 1 h time difference for TCCON can be justified by noting that TCCON instruments acquire only one type of spectra and methane is retrieved from each good spectrum, while NDACC instruments are required to measure different types of spectra with different optical filter configurations, making the methane observations more sparse.
An effective location of the FTIR measurement on the line of sight (i.e. at a 5 km altitude) is used for the co-location. The co-located pixels can therefore differ from measurement to measurement. For each ground-based measurement that is co-located with S5P measurements, all co-located S5P pixels are averaged. Co-located pairs are created between ground-based and averaged S5P data only if a minimum of five pixels is found when applying the coincidence criteria. In the comparison, the a priori profiles in the TCCON and NDACC retrievals have been substituted with the S5P methane a priori following Eq. (A1). The a priori alignment, i.e. aligning the a priori profile to a common one, is done to compensate for its contribution to the smoothing equation (Rodgers and Connor, 2003). The TCCON results with the S5P prior substituted are then compared directly to the S5P XCH4 data. However, the NDACC CH4 concentration profile with the S5P prior substituted is additionally smoothed with the S5P column-averaging kernel following Eq. (A2). The NDACC XCH4 is derived as discussed in Sect. 2.3 and then compared to the S5P XCH4 data. Furthermore, each validation run also includes the adaptation of the S5P columns to the altitude of the ground-based FTIR instruments for cases where the satellite averaging kernel is not applied or when column boundaries may differ (see Appendix B for details).

Table 4 provides the validation results for the S5P bias-corrected and standard XCH4 data with the a priori aligned TCCON data at each TCCON station. The systematic difference (the mean of all relative differences) between the S5P and TCCON data is on average −0.68 ± 0.74 % (S5P standard XCH4 product) and −0.26 ± 0.56 % (S5P bias-corrected XCH4 product). Only at a few TCCON stations (Sodankylä, East Trout Lake, Park Falls and Wollongong) is the bias slightly higher than 1.5 % for the S5P standard XCH4 product.
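The basic co-location step described in this section (100 km radius, a maximal time difference of 1 h for TCCON, a minimum of five pixels, and averaging of the selected pixels) can be sketched as follows. This is an illustrative simplification, not the processing code used for the paper: it uses a great-circle (haversine) distance on a spherical Earth, takes times as decimal hours, and expects the caller to supply the effective location on the line of sight as the reference coordinates. All function and argument names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = (np.radians(np.asarray(v, dtype=float))
                              for v in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def colocate(gb_time_h, gb_lat, gb_lon, pix_time_h, pix_lat, pix_lon, pix_value,
             radius_km=100.0, max_dt_h=1.0, min_pixels=5):
    """Average the satellite pixels matching one ground-based measurement.

    Returns the mean of the selected pixel values, or None when fewer than
    `min_pixels` pixels satisfy both the distance and the time criterion.
    """
    pix_time_h = np.asarray(pix_time_h, dtype=float)
    pix_value = np.asarray(pix_value, dtype=float)
    near = haversine_km(gb_lat, gb_lon, pix_lat, pix_lon) <= radius_km
    close_in_time = np.abs(pix_time_h - gb_time_h) <= max_dt_h
    selected = near & close_in_time
    if selected.sum() < min_pixels:
        return None
    return float(pix_value[selected].mean())


# Six pixels at the site plus one pixel about 1100 km away (excluded).
demo_mean = colocate(
    12.0, 50.0, 10.0,
    pix_time_h=[12.2] * 7,
    pix_lat=[50.0] * 6 + [60.0],
    pix_lon=[10.0] * 7,
    pix_value=[1800.0, 1801.0, 1802.0, 1803.0, 1804.0, 1805.0, 9999.0],
)
```

For NDACC the same function would be called with `max_dt_h=3.0`. Returning `None` for sparse matches mirrors the minimum-pixel rule, so that single-pixel coincidences do not enter the statistics.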
With the albedo-dependent bias correction, the S5P XCH4 product shows a reduced bias relative to the TCCON data, and the biases are within the 1.5 % requirement. The standard deviation of the relative bias, which is a measure of the random error, is well below 1 % for both standard (0.59 ± 0.17 %) and bias-corrected (0.57 ± 0.18 %) S5P XCH4 products. Figure 1 shows the bar plots for the S5P XCH4 mean relative bias with respect to the TCCON XCH4 data at all stations (left panel) and the standard deviation of the relative bias (right panel). The comparisons relative to the S5P bias-corrected XCH4 product (labelled -bcsm100k1h) are the blue bars and those for the standard XCH4 product (labelled -stdsm100k1h) are the magenta bars. The bias correction of the S5P XCH4 product, being a function of the surface albedo, acts differently at the different TCCON stations. Figure 2 shows the relative difference of the bias for the standard (top panel) and bias-corrected (bottom panel) S5P XCH4 products as a function of the retrieved S5P SWIR surface albedo at the TCCON stations. The bias correction of the S5P XCH4 product brings the high negative relative differences closer to zero for low surface albedo conditions and the high positive relative differences closer to zero for high surface albedo conditions. The low surface albedo conditions also show a high scatter in the relative difference plots. The latter is mainly because the scenes with low surface albedo are challenging for satellite-retrieved products due to large measurement noise. The difference of the mean relative bias between the S5P bias-corrected and the standard XCH4 product for each TCCON station is shown as a magenta bar in the middle panel plot (labelled -diff_bcvsstd) of Fig. 1. It shows that the overall direction of change is positive for most stations (low surface albedo conditions) and negative for a few stations like Edwards, JPL and Pasadena (high surface albedo conditions).
The standard deviations of the relative bias for the S5P standard and bias-corrected XCH4 products are comparable. Scenes with low and high albedos pose specific challenges for S5P CH4 retrieval. Validation of S5P CH4 data at additional sites with different conditions (e.g. high surface albedo, high humidity, regions not covered by TCCON and NDACC) using portable FTIR spectrometers will give further insight into the S5P CH4 product quality.
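The per-station statistics used in this section (the mean of the relative differences as the systematic error, their standard deviation as the random error, and the correlation between satellite and ground-based data) can be computed as in the sketch below, with the relative difference taken as (SAT − GB)/GB in percent. The function names and the use of the sample standard deviation (`ddof=1`) are assumptions made for this example; the default thresholds are the XCH4 requirements quoted in Sect. 2.1 (1.5 % bias, 1 % random error).

```python
import numpy as np


def validation_stats(sat, gb):
    """Statistics of co-located satellite (sat) and ground-based (gb) pairs."""
    sat = np.asarray(sat, dtype=float)
    gb = np.asarray(gb, dtype=float)
    rel_diff_pct = 100.0 * (sat - gb) / gb
    return {
        "n": int(rel_diff_pct.size),
        "bias_pct": float(rel_diff_pct.mean()),      # systematic difference
        "sd_pct": float(rel_diff_pct.std(ddof=1)),   # random error estimate
        "corr": float(np.corrcoef(sat, gb)[0, 1]),   # Pearson correlation
    }


def meets_requirements(stats, bias_req_pct=1.5, random_req_pct=1.0):
    """Check a station against bias and random error requirements (in percent)."""
    return bool(abs(stats["bias_pct"]) <= bias_req_pct
                and stats["sd_pct"] <= random_req_pct)


# Illustrative values only: a satellite series 1 % below the reference.
demo_gb = np.array([1790.0, 1800.0, 1810.0])
demo_stats = validation_stats(0.99 * demo_gb, demo_gb)
```

For the CO product the same check would be called with the XCO requirements, for example `meets_requirements(stats, bias_req_pct=15.0, random_req_pct=10.0)`.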
The relative biases are plotted as mosaic plots and shown in Fig. 3, where the top panel shows the bias for the S5P standard XCH4 product, while the bottom panel shows the bias for the S5P bias-corrected XCH4 product relative to TCCON. Each bar in the mosaic plots represents the weekly average of the relative bias values. The high-latitude stations show a high positive bias during the spring, which is then reduced and even switches sign to show a negative bias during the autumn. Lorente et al. (2021), while analysing the improvements of their scientific S5P XCH4 product, found similar seasonality in the bias at the high-latitude sites of Sodankylä and East Trout Lake and indicated correlations of high bias during spring time with the presence of snow (low surface albedo in the SWIR but high surface albedo in the NIR). In addition, the high-latitude sites are also influenced by the polar vortex, which is difficult to represent in the a priori profile. The difference of the a priori profile from the true atmospheric profile will also add to the bias. This will be discussed further in the next section. Since measurements rely on a direct line of sight to the Sun, data are not available during the winter months for high-latitude stations. The time series of the S5P bias-corrected XCH4 product and TCCON data for each site are shown in Figs. 4 and 5. The ground-based TCCON XCH4 data are represented in grey and the S5P data during that period are shown in light blue. The S5P data co-located with TCCON data are shown in blue and the co-located TCCON data with a priori alignment are shown in black. The amplitude of the CH4 seasonal cycle is different at the different sites. This is related to the variability of the CH4 concentrations in the atmosphere. The CH4 concentration profile decreases rapidly with increasing altitude above the tropopause height. The concentration of CH4 in the stratosphere, along with the troposphere, plays a key role in determining the total column of CH4 at the given location. The CH4 seasonal cycle in the troposphere is driven by the seasonality of both CH4 sources and its sinks (mainly due to the reaction with OH), while the CH4 seasonal cycle in the stratosphere is dominated by the vertical transport (Sepúlveda et al., 2012; Ostler et al., 2014; Bader et al., 2017; Zhou et al., 2018). The time series of the relative bias plots shown in Figs. 6 and 7 indicate a seasonal cycle, which is clearly seen for stations with a high density of reference data with a low scatter, e.g. Park Falls, East Trout Lake, Lamont, Edwards and Pasadena.

Table 4. S5P XCH4 validation results against TCCON XCH4 data at 25 stations for the period between November 2017 and September 2020. Spatial co-location with radius of 100 km or cone with 1° opening angle along the FTIR line of sight and time co-location of ±1 h around the satellite overpass were used. TCCON stations (column 1) are sorted according to decreasing latitude (column 2). The column with title "No." represents the number of co-located measurements, column title "SD" represents the standard deviation of the time series of the ground-based data relative to the standard deviation of the time series of the S5P data, column title "Corr" represents the correlation coefficient between the S5P and the reference ground-based data, column title "Rel diff bias" represents the relative difference ((SAT − GB)/GB) bias in percent, and column title "Rel diff SD" represents the standard deviation of the relative bias in percent.

(c) standard deviation of the relative bias in percent; (b) difference of the mean relative bias for validation cases (stdsm100k1h, bc100k1h, bcsm100k1hcone) in percent against the reference case (bcsm100k1h) in percent. Spatial co-location with radius of 100 km or cone with 1° opening angle along the FTIR line of sight and time co-location of ±1 h around the satellite overpass were used. The stations are sorted with decreasing latitude.
Taylor diagrams for the S5P bias-corrected XCH4 and TCCON XCH4 data with a priori alignment are shown in Fig. 8. The correlation, represented by the angular coordinate, is above 0.6 for most stations (see Table 4 for exact values), and the distance to the origin of the ground-based dot relative to the satellite dot (ratio of the SD of the ground-based data to the SD of the S5P data) is below 1 for most stations, implying that the satellite data are more variable than the ground-based data. The correlation is mostly dominated by the seasonal cycle, and low correlations are seen for high-latitude sites where a bias jump is seen between spring and summer periods. Outliers such as Ny-Ålesund, JPL and Białystok are due to the limited data sets available for the comparison. The Ny-Ålesund station is located on the shore of a bay on the west coast of the island of Spitsbergen in Svalbard, Norway. As a result, only a few valid S5P XCH4 pixels are found around the station, resulting in limited co-located data available for comparison. The TCCON instruments from the JPL and Białystok stations were moved to Edwards and Nicosia, respectively, thus resulting in limited data sets available from these sites. The very low correlation for Darwin and Wollongong is due to the low satellite values for some days (see Fig. 5), and for high-latitude sites it is due to the jump in the bias between the spring and later months (see Fig. 6). The altitude correction of the pixels works well, as can be seen by the relatively good correlation for Zugspitze; however, the scatter in the data is high.

Table 5 provides the validation results for the S5P bias-corrected and standard XCH4 data with the smoothed NDACC data at each NDACC station. The systematic difference (the mean of all relative differences) between the S5P and NDACC data is on average −0.11 ± 1.19 % (S5P standard XCH4 product) and 0.57 ± 0.83 % (S5P bias-corrected XCH4 product).
The mean of all stations is calculated by excluding outliers, which are stations with a low number of co-locations (Ny-Ålesund, Rikubetsu), high scatter in the ground-based data (Toronto) and unexpected high bias (Thule, Arrival Heights). Thule is located on the western coastline of Greenland. The valid S5P XCH4 pixels within the co-location radius around Thule show several pixels with high XCH4 values. These high XCH4 values are in general found along the coastline and regions with altitude variability. Although a filter for the variability of the terrain roughness is applied in the QA filter options, these high values along the coastline of Greenland need detailed investigation and possible optimisation of the filter settings to remove the unexpected high values. We also observe valid pixels with unexpected high XCH4 around the coastline and terrains with altitude variability in Antarctica. This is also the reason for the high bias observed at the Arrival Heights station located along the west side of the Hut Point Peninsula on Ross Island, Antarctica. The bias at Altzomoni is relatively high (2.44 % for the S5P XCH4 bias-corrected product), while the random error is comparable to other sites and within 1 %. Bezanilla et al. (2014) found large variability in CH4 total columns measured at the Mexico City basin, pointing to significant local emissions affecting the natural background levels. A co-location mismatch would contribute partly to the bias seen with respect to S5P (see Sect. 4.3 on how using an advanced co-location criterion reduces the bias at Altzomoni).

Figure 2. Relative biases between co-located S5P (standard XCH4 product: a; bias-corrected XCH4 product: b) and TCCON XCH4 data with a priori alignment are plotted as a function of the surface albedo retrieved by S5P at 25 TCCON stations within the period between November 2017 and September 2020. Spatial co-location with radius of 100 km and time of ±1 h around the satellite overpass were used.
The mean standard deviation of the relative bias, which is a measure of the random error, is about 1 % for both the S5P standard (1.05 ± 0.51 %) and bias-corrected (1.04 ± 0.52 %) XCH4 products. The high-latitude stations in the Northern Hemisphere show values slightly higher than 1 %.
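The per-station quantities reported in the validation tables ("No.", "SD", "Corr", "Rel diff bias" and "Rel diff SD") can be illustrated with a minimal sketch. The function name `validation_stats`, the use of the sample (n − 1) standard deviation and the example values are assumptions made for illustration; they are not taken from the processing chain used in the study.

```python
import math

def validation_stats(sat, gb):
    """Summarise co-located satellite (sat) and ground-based (gb) values.

    Returns the number of pairs, the SD ratio (ground-based / satellite),
    the Pearson correlation, the mean relative bias in percent and the
    standard deviation of the relative bias in percent.
    """
    if len(sat) != len(gb) or len(sat) < 2:
        raise ValueError("need at least two co-located pairs of equal length")
    n = len(sat)

    # Relative difference (SAT - GB) / GB in percent for every pair.
    rel = [100.0 * (s - g) / g for s, g in zip(sat, gb)]
    rel_bias = sum(rel) / n
    # Sample standard deviation (n - 1); this choice is an assumption.
    rel_sd = math.sqrt(sum((r - rel_bias) ** 2 for r in rel) / (n - 1))

    mean_s, mean_g = sum(sat) / n, sum(gb) / n
    var_s = sum((s - mean_s) ** 2 for s in sat) / (n - 1)
    var_g = sum((g - mean_g) ** 2 for g in gb) / (n - 1)
    cov = sum((s - mean_s) * (g - mean_g) for s, g in zip(sat, gb)) / (n - 1)

    return {
        "n": n,
        "sd_ratio": math.sqrt(var_g / var_s),    # "SD" column
        "corr": cov / math.sqrt(var_s * var_g),  # "Corr" column
        "rel_bias": rel_bias,                    # "Rel diff bias" column
        "rel_sd": rel_sd,                        # "Rel diff SD" column
    }

# Hypothetical XCH4 values in ppb, for illustration only.
stats = validation_stats([1800.0, 1810.0, 1820.0], [1805.0, 1810.0, 1815.0])
print(stats)
```

An `sd_ratio` below 1 corresponds to the Taylor-diagram case in which the satellite data are more variable than the ground-based data.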
The S5P XCH4 mean relative bias and the standard deviation of the relative bias with respect to the NDACC stations listed in Table 5 are shown as bar plots in Fig. 9. The comparisons relative to the S5P bias-corrected XCH4 product (labelled -bcsm100k1h) are the blue bars and those for the standard XCH4 product (labelled -stdsm100k3h) are the magenta bars. The standard deviations of the relative bias (right panel) for the S5P standard and bias-corrected XCH4 products are comparable. Figure 10 shows the relative difference of the bias for the S5P standard (top panel) and bias-corrected (bottom panel) XCH4 products as a function of the retrieved surface albedo at the NDACC stations. Similar to the TCCON comparison, we also see here that the bias correction of the S5P XCH4 product brings the high negative relative differences closer to zero for low surface albedo conditions and the high positive relative differences closer to zero for high surface albedo conditions.

Table 5. S5P XCH4 validation results against NDACC XCH4 data at 20 stations for the period between November 2017 and September 2020. Spatial co-location with radius of 100 km or cone with 1° opening angle along the FTIR line of sight and time co-location of ±3 h around the satellite overpass were used. NDACC stations (column 1) are sorted according to decreasing latitude (column 2). The column with title "No." represents the number of co-located measurements, column title "SD" represents the standard deviation of the time series of the ground-based data relative to the standard deviation of the time series of the S5P data, column title "Corr" represents the correlation coefficient between the S5P and the reference ground-based data, column title "Rel diff bias" represents the relative difference ((SAT − GB)/GB) bias in percent, and column title "Rel diff SD" represents the standard deviation of the relative bias in percent.
The data at stations with low surface albedo conditions also show a high scatter in the relative difference plots. The difference of the mean relative bias between the S5P bias-corrected and the standard XCH4 product for each NDACC station is shown as a magenta bar in the middle plot (labelled -diff_bcvsstd) of Fig. 9. It shows that the overall direction of change is positive for most stations (low surface albedo conditions) and negative for a few stations such as Boulder and Altzomoni (high surface albedo conditions).
The relative biases are plotted as mosaic plots and are shown in Fig. 11, where the top panel shows the bias for the S5P standard XCH4 product, while the bottom panel shows the bias for the S5P bias-corrected XCH4 product relative to NDACC. Each bar in the mosaic plots represents the weekly averages of the relative bias values. The high-latitude stations show a high positive bias during the spring, which is then reduced and even switches sign to show a negative bias during the autumn. This is the reason for the high standard deviation of the relative difference seen for the high-latitude stations having measurements during the spring and summer or autumn. Since measurements rely on direct line of sight of the Sun, the data are not available during the winter months for high-latitude stations. The time series of the S5P bias-corrected XCH4 product and the NDACC data for each site are shown in Figs. 12 and 13, and the respective relative biases are shown in Figs. 14 and 15. In the plots, the NDACC data are shown in grey and the S5P data are shown in light cyan. The S5P data co-located with NDACC data are shown in cyan and the co-located NDACC data are shown in black.

XCH4 time series for all TCCON data (grey), S5P bias-corrected data (light blue), S5P data co-located with TCCON data (blue) and co-located TCCON data with a priori alignment (black) at each site ordered with decreasing latitude. Spatial co-location with radius of 100 km and time of ±1 h around the satellite overpass was used.
Taylor diagrams for the S5P bias-corrected XCH4 and NDACC smoothed XCH4 data are shown in Fig. 16. The correlation, represented by the angular coordinate, is above 0.5 for most stations (see Table 5 for exact values). No clear conclusion can be drawn as to whether the satellite data are more variable than the ground-based NDACC data, as the distance to the origin of the ground-based dot relative to the satellite dot is below 1 at quite a few stations and above 1 at others. The correlation is mostly dominated by the seasonal cycle, and low correlations are seen for high-latitude sites where a bias jump is seen between spring and summer periods. Outliers such as Ny-Ålesund, Rikubetsu and Porto Velho are due to the limited data sets available for the comparison. The ground-based data set from Toronto shows a high scatter, while a high unexpected bias for Thule and Arrival Heights indicates some problem with the S5P data set. The ground-based data set from Harestua shows a high scatter for a few co-locations. The low correlation for the high-latitude stations (Sodankylä and Kiruna) is due to the jump in bias between spring and later months (see Figs. 12 and 14).
Eight ground-based stations contributed to the validation study by providing XCH4 data from both TCCON and NDACC measurements performed at the sites. The differences in the relative bias of the S5P bias-corrected XCH4 product with respect to TCCON and NDACC (bias_NDACC − bias_TCCON) for these stations are the following: 0.15 % (∼ 2.9 ppb) for Eureka, 0.99 % (∼ 18.8 ppb) for Sodankylä, 1.59 % (∼ 30.2 ppb) for Bremen, 0.69 % (∼ 13.1 ppb) for Karlsruhe, 0.6 % (∼ 11.4 ppb) for Garmisch, 0.62 % (∼ 11.8 ppb) for Zugspitze, 0.84 % (∼ 16.0 ppb) for Wollongong and 0.26 % (∼ 5.0 ppb) for Lauder. Ostler et al. (2014), in an intercomparison study of column-averaged methane from NDACC and TCCON at five stations, showed that there is no overall bias between MIR (NDACC) and NIR (TCCON) XCH4 retrievals in general. However, dynamical variability can cause NDACC-TCCON differences in the XCH4 values at the sites, with values up to 30 ppb. The high-latitude stations are affected by the stratospheric subsidence induced by the polar vortex, whereas for other locations, a deep stratospheric intrusion event can be the cause of the difference. Our study also shows differences in bias_NDACC − bias_TCCON of the same order (up to ∼ 30 ppb) for the co-located stations. In the next section, we show detailed results of the a priori alignment and smoothing correction at the individual stations.
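The percent-to-ppb pairs quoted above are mutually consistent with a reference XCH4 of roughly 1900 ppb. That reference value is not stated in the text; it is inferred here from the quoted pairs and should be read as an assumption. A minimal sketch of the conversion, with the hypothetical helper `percent_to_ppb`:

```python
# Reference XCH4 in ppb. ASSUMPTION: inferred from the quoted
# (percent, ppb) pairs, not stated explicitly in the text.
REF_XCH4_PPB = 1900.0

def percent_to_ppb(rel_diff_percent, ref_ppb=REF_XCH4_PPB):
    """Convert a relative difference in percent to an approximate ppb value."""
    return rel_diff_percent / 100.0 * ref_ppb

# (bias_NDACC - bias_TCCON) in percent and the ppb value quoted in the text.
quoted = {
    "Eureka": (0.15, 2.9),
    "Sodankylä": (0.99, 18.8),
    "Bremen": (1.59, 30.2),
    "Karlsruhe": (0.69, 13.1),
    "Garmisch": (0.60, 11.4),
    "Zugspitze": (0.62, 11.8),
    "Wollongong": (0.84, 16.0),
    "Lauder": (0.26, 5.0),
}

for site, (pct, ppb) in quoted.items():
    print(f"{site}: {pct:.2f} % -> {percent_to_ppb(pct):.1f} ppb (quoted {ppb} ppb)")
```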

4.2 Smoothing effect in the validation of S5P methane data
The validation of the S5P bias-corrected XCH4 data relative to the TCCON and NDACC XCH4 data with and without (i.e. direct comparison) a priori alignment and smoothing correction is discussed in this section. S5P, TCCON and NDACC all have different vertical sensitivities and use different a priori profiles for their retrievals. In the case of similar vertical sensitivities, we can assume that the smoothing effects from satellite and ground-based retrievals are of nearly equal magnitude. However, the vertical sensitivities and the a priori profiles used are different, which means that the a priori profiles and the averaging kernels should be taken into account. For the case of TCCON, only an a priori alignment is done. The S5P prior is used as the common prior in our validation study. Smoothing effects are most relevant for cases with strong dynamic variability in the atmosphere. TCCON performs a profile scaling retrieval on the measurements performed in the NIR spectral region, whereas NDACC performs a profile retrieval in the MIR spectral region. The altitude of perturbation of the CH4 profile plays a significant role in the smoothing correction and is different for NIR and MIR retrievals. Ostler et al. (2014) showed that TCCON retrievals are more accurate when perturbations are due to stratosphere-troposphere exchange in the upper troposphere/lower stratosphere (UTLS) region, whereas NDACC retrievals are more accurate for cases of stratospheric subsidence. In order to ascertain the effect of a priori alignment and smoothing, the validation results of the direct comparison are compared against the validation results with a priori alignment and smoothing as discussed in the previous section.

Figure 6. Relative difference ((satellite − ground-based)/ground-based) of XCH4 time series for all co-located S5P bias-corrected data and TCCON data with a priori alignment as the reference data at each site ordered with decreasing latitude as in Fig. 4. Spatial co-location with radius of 100 km and time of ±1 h around the satellite overpass was used.
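The two corrections discussed in this section can be sketched using the widely used formulation of Rodgers and Connor (2003). This is an illustrative assumption: the exact equations, vertical grids and interpolation steps applied in the study are not reproduced here. In the sketch, `h` is a hypothetical vector of pressure weights that sums to 1, `avk` is a column-averaging kernel, and all profiles are given on the same layers.

```python
def align_prior(col_retrieved, h, avk, prior_own, prior_common):
    """Adjust a retrieved column average to a common a priori profile.

    Implements c' = c + sum_j h_j * (1 - a_j) * (x_common_j - x_own_j),
    i.e. the a priori substitution for a column retrieval.
    """
    return col_retrieved + sum(
        hj * (1.0 - aj) * (xc - xa)
        for hj, aj, xa, xc in zip(h, avk, prior_own, prior_common)
    )

def smooth_with_satellite_kernel(h, sat_avk, sat_prior, gb_profile):
    """Smooth a ground-based profile with the satellite column-averaging kernel.

    Implements c_s = sum_j h_j * (x_a_j + a_j * (x_gb_j - x_a_j)).
    """
    return sum(
        hj * (xa + aj * (xg - xa))
        for hj, aj, xa, xg in zip(h, sat_avk, sat_prior, gb_profile)
    )

# Hypothetical two-layer example (values in ppb), for illustration only.
h = [0.5, 0.5]
own_prior = [1800.0, 1700.0]
common_prior = [1900.0, 1700.0]

# A kernel of 1 everywhere means the retrieval does not depend on the prior.
unchanged = align_prior(1850.0, h, [1.0, 1.0], own_prior, common_prior)
# A kernel of 0 everywhere means the retrieval follows the prior completely.
shifted = align_prior(1850.0, h, [0.0, 0.0], own_prior, common_prior)
print(unchanged, shifted)
```

The correction grows where the kernel departs from 1 and where the two a priori profiles disagree, so dynamically perturbed conditions, in which the a priori is a poor description of the true profile, tend to produce the largest adjustments.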
The validation results of the S5P bias-corrected XCH4 data relative to the TCCON and NDACC data without a priori alignment and smoothing correction (direct comparison) are shown in columns 12-15 of Tables 4 and 5, respectively. The S5P XCH4 mean relative bias and the standard deviation of the relative bias with respect to TCCON and NDACC are shown as grey bars in the left panel and right panel plots of Figs. 1 and 9, respectively. The standard deviation of the relative bias without smoothing correction is similar to the standard deviation of the relative bias for the case with smoothing correction. The differences between the mean relative bias with and without smoothing correction for the S5P bias-corrected XCH4 data for each TCCON and NDACC station are shown as grey bars in the middle panel plot (labelled -diff_smvsnosm) of Figs. 1 and 9, respectively. The difference plot relative to TCCON shows that the overall direction of change is negative for all stations, with high values for most stations in the Northern Hemisphere corresponding to regions with high dynamic variability. We observe a maximum difference of −0.25 % (∼ −4.8 ppb) and a mean difference of −0.14 ± 0.07 % (∼ −2.7 ± 1.3 ppb) across all TCCON sites for the duration of available measurements used in this study. The a priori alignment correction for the Southern Hemisphere sites is low, with an average difference of about −0.07 % (∼ −1.3 ppb). The difference plot relative to NDACC shows that the overall direction of change is positive for all stations. Ny-Ålesund, which has the lowest number of co-locations, shows the highest difference of 2.2 % (∼ 41.8 ppb). Thule, which has an unexpected high bias, shows the second highest difference of 1.86 % (∼ 35.3 ppb), and Toronto, which has a high scatter in the ground-based data, shows a high difference of 1.05 % (∼ 20 ppb).
The difference at all other stations is below 1 %, with the high values seen for high-latitude sites; the mean difference of the selected NDACC sites shown in Table 5 is 0.38 ± 0.28 % (∼ 7 ± 5.3 ppb).
As pointed out in Sect. 4.1, for the eight co-located stations the difference between smoothing (only a priori alignment for TCCON) and no smoothing is highest at the mid-latitude stations for TCCON and at the high-latitude stations for NDACC. It is therefore important to use a realistic a priori profile for scaling retrievals, especially for cases of stratospheric subsidence or stratosphere-troposphere exchange. For such cases, improved a priori profiles representing the realistic atmospheric state will reduce the difference.

4.3 Comparison of circular vs. cone co-location criterion for validation of S5P methane data
In our standard S5P CH4 validation settings with or without smoothing, we have used a co-location radius of 100 km around each ground-based site. As the operational S5P CH4 pixels are currently provided only over land, the circular co-location criterion may not be optimal for all sites. Ground-based sites located close to a sea/ocean coast will always lack S5P CH4 pixels over water. Furthermore, for sites located close to regions with high emission sources, there are possible scenarios when the ground-based FTIR line of sight is not covering all pixels observed by the satellite using the circular co-location criterion. This is also relevant for high-latitude sites where the ground-based FTIRs, mostly measuring at high solar zenith angles, are always looking south for Northern Hemisphere sites and are looking north for Southern Hemisphere sites. We have implemented a cone selection criterion where we follow the ground-based FTIR line of sight with a 1° opening angle of the cone at the highest altitude. Using the cone co-location criterion, we have done the validation of the S5P bias-corrected CH4 data with smoothing and compared it to the validation results using the circular co-location criterion with the same settings as discussed in Sect. 4.1.

Relative biases between co-located S5P (standard XCH4 product: a; bias-corrected XCH4 product: b) and NDACC XCH4 data smoothed with S5P a priori and additionally smoothed with the S5P column-averaging kernel are plotted as a function of the surface albedo retrieved by S5P at 20 NDACC stations within the period between November 2017 and September 2020. Spatial co-location with radius of 100 km and time of ±3 h around the satellite overpass was used.
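The circular criterion (pixels within a fixed radius of the site and within a fixed time window of the ground-based measurement) can be illustrated with a minimal sketch. The pixel representation as dictionaries, the function names and the spherical-Earth haversine distance are assumptions made for illustration. The cone criterion is not implemented here, because it needs the FTIR viewing geometry, which the text does not specify in enough detail.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def colocate_circular(pixels, site_lat, site_lon, gb_time,
                      radius_km=100.0, max_dt=timedelta(hours=1)):
    """Return the pixels inside the co-location circle and time window.

    Each pixel is a dict with hypothetical keys "lat", "lon" and "time".
    """
    return [
        p for p in pixels
        if abs(p["time"] - gb_time) <= max_dt
        and haversine_km(site_lat, site_lon, p["lat"], p["lon"]) <= radius_km
    ]

# Hypothetical pixels around a site at (0°, 0°), for illustration only.
t0 = datetime(2019, 6, 1, 12, 0)
pixels = [
    {"lat": 0.0, "lon": 0.5, "time": t0 + timedelta(minutes=20)},  # about 56 km: kept
    {"lat": 0.0, "lon": 1.5, "time": t0},                          # about 167 km: too far
    {"lat": 0.0, "lon": 0.5, "time": t0 + timedelta(hours=2)},     # outside time window
]
selected = colocate_circular(pixels, 0.0, 0.0, t0)
print(len(selected))
```

Setting `max_dt=timedelta(hours=3)` would correspond to the ±3 h window quoted for the NDACC comparisons.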
The validation results of the S5P bias-corrected XCH4 data relative to the TCCON and NDACC data applying the cone co-location criterion are shown in columns 16-20 of Tables 4 and 5, respectively. Using the cone co-location criterion reduces the number of S5P co-locations with ground-based FTIRs significantly (see column 16 in relation to column 3).
The S5P XCH4 mean relative bias and the standard deviation of the relative bias with respect to TCCON and NDACC using the cone co-location criterion are shown as orange bars in the left panel and right panel plots of Figs. 1 and 9, respectively. The standard deviation of the relative bias with the cone co-location criterion is smaller than the standard deviation of the relative bias for the circular co-location criterion for sites with significantly reduced co-locations and is similar for other sites with a small reduction in the number of co-locations. The difference between the mean relative bias with circular and cone co-location criterion for the S5P bias-corrected XCH4 data for each TCCON and NDACC station is shown as orange bars in the middle panel plot (labelled -diff_circvscone) of Figs. 1 and 9, respectively. The difference plot relative to TCCON shows the magnitude of change in bias, with values for some stations being negative while being positive for others. We observe a maximum difference of 0.3 % (∼ 5.7 ppb) and a mean difference of −0.02 ± 0.12 % (∼ −0.4 ± 2.3 ppb) across all TCCON sites for the duration of available measurements used in this study. The high-latitude sites in the Northern Hemisphere show a significantly lower number of co-locations for the cone criterion. The relative bias for these sites (Eureka, Ny-Ålesund, Sodankylä and East Trout Lake) shows a slight increase for the cone co-location criterion in comparison to the circular co-location criterion. Sites where the relative bias using the cone criterion as compared to the circular criterion is lower by at least 2 ppb are the following: JPL (−0.2 %), Pasadena (−0.18 %), Lamont (−0.11 %) and Białystok (−0.11 %). Meanwhile, the sites where the cone criterion as compared to the circular criterion is higher by at least 2 ppb are the following: Lauder (0.3 %), Saga (−0.18 %) and Orléans (0.1 %). The difference plot relative to NDACC shows the magnitude of change in bias with values for some stations being negative while being positive for others. We observe a maximum difference of 0.49 % (∼ 9.3 ppb) and a mean difference of 0.01 ± 0.2 % (∼ 0.2 ± 3.8 ppb) across the selected NDACC sites (see Table 5) for the duration of available measurements used in this study. Several sites have few co-locations left upon selecting the cone criterion, with Ny-Ålesund showing no match at all. Amongst the sites where a significant number of co-locations remains, the sites where the relative bias using the cone criterion as compared to the circular criterion is lower by at least 2 ppb are the following: Altzomoni (0.49 %), Sodankylä (0.14 %) and Jungfraujoch (−0.11 %). The sites where the cone criterion as compared to the circular criterion is higher by at least 2 ppb are the following: Lauder (−0.30 %), Kiruna (0.25 %), Bremen (−0.15 %) and St. Petersburg (−0.12 %).

Figure 11. Mosaic plots showing relative biases between co-located S5P (standard XCH4 product: a; bias-corrected XCH4 product: b) and NDACC XCH4 data smoothed with S5P a priori and additionally smoothed with the S5P column-averaging kernel at 20 NDACC stations within the period between November 2017 and September 2020. Spatial co-location with radius of 100 km and time of ±3 h around the satellite overpass was used. The time resolution of the data shown here is weekly. The stations are sorted with decreasing latitude.
We have observed that applying the cone co-location criterion reduces the number of co-locations for all sites and quite significantly for some sites. There are seven TCCON stations and seven NDACC stations where the magnitude of the difference is above 2 ppb. Amongst all the stations, the magnitude of change in the relative bias between the two settings is the highest for Altzomoni station (see Sect. 5.3 for further discussion on the site).

Figure 12. XCH4 time series for all NDACC data (grey), S5P bias-corrected data (light cyan), S5P data co-located with NDACC data (cyan) and co-located NDACC data smoothed with S5P a priori and additionally smoothed with the S5P column-averaging kernel (black) at each site ordered with decreasing latitude. Spatial co-location with radius of 100 km and time of ±3 h around the satellite overpass was used.

Figure 14. Relative difference ((satellite − ground-based)/ground-based) of XCH4 time series for all co-located S5P bias-corrected data and NDACC data smoothed with S5P a priori and additionally smoothed with the S5P column-averaging kernel as the reference data at each site ordered with decreasing latitude as in Fig. 12. Spatial co-location with radius of 100 km and time of ±3 h around the satellite overpass was used.
4.4 Solar zenith angle dependence of the S5P methane bias relative to ground-based reference data

The remote sensing measurements made either from the ground or satellites are known to be affected by the SZA of the measurements. In this section, we show the S5P CH4 bias relative to the ground-based reference data as a function of the measurement SZA. Figure 17 shows the S5P relative bias for the a priori aligned and smoothed cases as a function of the measurement SZA against some of the reference ground-based TCCON stations. As mentioned in Sect. 2.1, the S5P CH4 data are only available for SZA ≤ 70°. The upper limits of the plots therefore show values only until 70°. The S5P relative bias shows a high scatter for high SZAs. Stations like Sodankylä, East Trout Lake and Park Falls show high values in the relative bias for measurements at high SZAs when measurements are performed during winter and spring months. These measurements are influenced by surface conditions with snow cover and polar vortex conditions, whereas the negative bias at high SZA is from the summer and autumn measurements (e.g. see Figs. 6 and 7). At Lamont, we observe a strong increase in bias with decreasing SZA for measurements performed during spring. This is seen particularly in the case where the bias correction due to the SWIR surface albedo change occurred between 0.25 and 0.1 for measurements performed in this period at the site. The bias increase with decreasing SZA is also seen for other months at the different sites. Except for the spring measurements, which show a high bias, we observe a general decrease in relative bias with increasing SZA.

5 Validation of S5P carbon monoxide products
The validation of the S5P carbon monoxide data with the ground-based FTIR data from TCCON and NDACC stations is discussed in this section. The official S5P CO products are available over land as well as over water. As a result, in addition to the stations mentioned in the S5P methane validation results, co-locations with ground-based stations located on islands (e.g. Ascension, Izaña, Réunion and Mauna Loa) are found and discussed here. The NDACC stations at Paramaribo and Porto Velho are the only stations on the South American continent currently contributing to the S5P CO validation study. As NDACC provides the CO column values, they are used directly to validate the S5P CO column values, whereas for the validation using TCCON XCO data, the S5P CO columns are converted to XCO as described in Sect. 3.

5.1 Validation of S5P XCO data using TCCON standard and unscaled XCO data and analysis of smoothing uncertainty

As mentioned in Sect. 2.2, the validation of the S5P XCO offline data is performed with the TCCON standard XCO data as well as the TCCON unscaled XCO data, and the results are discussed in this section. The density of the official S5P valid CO pixels is higher than that of the valid CH4 pixels. As a result, we found that using a co-location radius of 50 km around each ground-based station gave a sufficient number of pixels for robust statistics. We have used a maximal time difference of 1 h for TCCON observations, which is similar to the settings used for CH4 validation. An effective location of the FTIR measurement on the line of sight is used to do the co-location. As a result, the co-located pixels can differ from measurement to measurement. For each of the ground-based measurements that are co-located with the S5P measurements, an average of all S5P pixels is made. Co-located pairs are created between ground-based and averaged S5P pixels only if a minimum of five pixels is found when applying the coincidence criteria. In the comparison, the a priori profile in the TCCON retrievals has been substituted with the S5P CO a priori following Eq. (A1). The TCCON results with the S5P prior substituted are then compared directly to the S5P XCO data. Furthermore, each validation run includes the adaptation of the S5P columns to the altitude of the ground-based FTIR instruments.

Figure 17. Relative biases between co-located S5P bias-corrected XCH4 and TCCON XCH4 data with a priori alignment are plotted as a function of the S5P measurement solar zenith angles retrieved at a few TCCON stations within the period between November 2017 and September 2020. Spatial co-location with radius of 100 km and time of ±1 h around the satellite overpass was used. The colours represent the different months from January (1) until December (12) of a year.
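The pairing rule described in this section (one average of all co-located S5P pixels per ground-based measurement, accepted only when at least five pixels are found) can be sketched as follows. The function name `make_pair` and the example values are hypothetical, and a plain unweighted mean is assumed because the text does not mention any weighting.

```python
def make_pair(gb_value, pixel_values, min_pixels=5):
    """Build one (ground-based, mean satellite) pair, or None if too few pixels.

    pixel_values holds the S5P pixels that passed the spatial and temporal
    coincidence criteria for this ground-based measurement.
    """
    if len(pixel_values) < min_pixels:
        return None  # not enough pixels for a robust average
    # Unweighted mean of the co-located pixels (assumption).
    return gb_value, sum(pixel_values) / len(pixel_values)

# Hypothetical XCO values in ppb, for illustration only.
rejected = make_pair(100.0, [90.0, 95.0, 100.0, 105.0])          # only four pixels
accepted = make_pair(100.0, [90.0, 95.0, 100.0, 105.0, 110.0])   # five pixels
print(rejected, accepted)
```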
Table 6 provides the validation results using the a priori aligned TCCON unscaled and standard XCO data at each TCCON station. The systematic difference (the mean of all relative differences) between the S5P and TCCON data is on average 9.22 ± 3.45 % (TCCON standard XCO data) and 2.45 ± 3.38 % (TCCON unscaled XCO data). The absolute maximum bias value of 8.27 % is observed with respect to TCCON unscaled XCO data. While most stations show a positive relative bias of S5P XCO with respect to the TCCON unscaled XCO, there are a few exceptions that show high negative values (e.g. Xianghe, JPL and Pasadena, all urban sites). This will be discussed in detail later in this section. The standard deviation of the relative bias, which is a measure of the random error, is well below 10 % for the comparison against both TCCON standard and unscaled XCO data at all stations except Wollongong, where the value is 17.93 % (for TCCON unscaled XCO) and 19.37 % (for TCCON standard XCO). The high standard deviation of the relative bias at this station is due to a co-location mismatch during a fire event in the region, which produced an enhanced CO plume passing over or near the ground-based station at Wollongong. On some days we found enhanced CO values in the S5P co-located pixels that were not observed by the FTIR, because the plume was not directly in the line of sight of the FTIR. On other days the enhanced CO values varied during the day as the fire plume passed by the station, whereas the satellite measures only for a short duration around local noon and therefore misses the variability of CO within the co-location time window selected for the validation. We tested a reduced time co-location criterion of 30 min and found that, for the Wollongong station, the standard deviation of the relative bias reduced marginally to 17.89 % and the relative bias reduced to 1.87 % (for TCCON unscaled XCO validation results).
The CO plumes emitted from the Australian fire during the summer of 2019/2020 were also observed at the Lauder station in New Zealand. The CO was well dispersed by the time the fire plumes were measured there, resulting in a better match between the S5P and ground-based FTIR measured XCO (see Figs. 20 and 22). Figure 18 shows the bar plots for the S5P XCO mean relative bias (left panel) and the standard deviation of the relative bias (right panel) with respect to the TCCON XCO data at all stations. The comparisons relative to the TCCON unscaled XCO data (labelled -unscsm50k1h) are the blue bars and those for the TCCON standard XCO data (labelled -stdsm50k1h) are the magenta bars. The mean relative bias of the S5P XCO data with respect to the TCCON unscaled XCO data is systematically lower than the mean relative bias with respect to the TCCON standard XCO data. The difference of the mean relative bias for S5P XCO data using the TCCON unscaled XCO and the standard XCO data for each station is shown as a magenta bar in the middle panel plot (labelled -diff_unscvsstd) of Fig. 18. It shows the overall direction of change is negative with a mean value of −6.77 ± 0.57 % for all stations. The result confirms the previously reported studies (Kiel et al., 2016;Sha et al., 2018b;Zhou et al., 2019) showing that the correction factor to tie the TCCON XCO data to WMO in situ scale is large and that TCCON XCO data are smaller than the uncorrected XCO data by about 7 %. The standard deviations of the relative bias for the S5P XCO data relative to the TCCON unscaled and standard XCO data are comparable.
The time series of the S5P XCO and TCCON unscaled XCO data for each site are shown in Figs. 19 and 20. The ground-based TCCON XCO data are represented in grey and the S5P XCO data during that period are shown in light red. The S5P data co-located with TCCON data are shown in red and the co-located TCCON data with a priori alignment are shown in black. The S5P and TCCON measurements observe the same seasonal cycle of CO. At the Northern Hemisphere sites, the high CO values are observed during winter and low values are observed during summer dominated by the OH variation (Té et al., 2016). At Southern Hemisphere sites, the high CO values are observed during September-November dominated by the influence of biomass burning (Duflot et al., 2010;Zeng et al., 2012). In addition to the seasonal cycle, we also see that at several of the ground-based sites, S5P and TCCON observe sometimes very high values of CO. These enhanced CO concentrations are due to the passing of the plumes with elevated CO concentrations over/nearby the station location (e.g. high CO seen at Wollongong during the Australian forest fires in November 2019-February 2020). Yurganov et al. (2004) also reported enhanced CO buildup measured at several sites with values much larger than the emission estimates. The time series of the relative bias plots shown in Figs. 21 and 22 indicate a seasonal cycle with a high bias seen during the high CO event and low bias seen during the low CO event. Sometimes very low S5P XCO values are observed in the validation plots at some stations, which pass the quality filter and find a match with the reference TCCON XCO data following our selection criterion. In these particular cases, we observe very low values in the relative bias plots. However, there are only a few occurrences of such low S5P XCO values.
The relative biases are plotted as mosaic plots and shown in Fig. 23, where the top panel shows the S5P bias with respect to the TCCON standard XCO data, while the bottom panel shows the S5P bias with respect to the TCCON unscaled XCO data. Each bar in the mosaic plots represents the weekly averages of the relative bias values. We will focus on the comparison of the results using TCCON unscaled XCO data. As mentioned in the previous paragraph, we observe a high positive bias during the high CO event periods, which is then reduced and even switches sign to show a negative bias during the low CO event periods. As TCCON performs solar absorption measurements, data are not available during winter for high-latitude stations.
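The weekly averaging behind the mosaic plots can be illustrated with a short sketch. It assumes that the weeks are ISO calendar weeks, which the text does not specify; the function name and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_mean_bias(dates, rel_bias_percent):
    """Group relative-bias values by ISO (year, week) and average each group."""
    buckets = defaultdict(list)
    for day, value in zip(dates, rel_bias_percent):
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        buckets[(iso[0], iso[1])].append(value)
    return {week: mean(values) for week, values in sorted(buckets.items())}


# Hypothetical daily relative biases (percent) at one station.
sample_dates = [date(2019, 11, 4), date(2019, 11, 6), date(2019, 11, 12)]
sample_bias = [2.0, 4.0, -1.0]
weekly = weekly_mean_bias(sample_dates, sample_bias)
print(weekly)
```

Weeks without any co-located data simply do not appear in the result, which corresponds to the gaps seen for high-latitude stations in winter.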
Taylor diagrams for the S5P XCO and TCCON unscaled XCO data with a priori alignment are shown in Fig. 24. The correlation, represented by the angular coordinate, is above 0.9 for most stations (see Table 6 for exact values), and the distance to the origin of the ground-based dot relative to the satellite dot is below 1 for most stations, implying that the satellite data are more variable than the ground-based data. The good correlation indicates that the short-scale temporal variations in the XCO column captured by the ground-based instruments are moderately reproduced by S5P. Outliers such as Ascension, Zugspitze and JPL are due to the limited data sets available for the comparison. The altitude correction of the pixels works well, as can be seen by the relatively good correlation for Zugspitze; however, the scatter in the data is high.
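The two quantities read off the Taylor diagram can be computed as in the following minimal sketch. It assumes population standard deviations and a Pearson correlation on paired (for example daily mean) values; the function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def taylor_statistics(sat, gb):
    """Return (correlation, SD of ground-based data relative to SD of satellite data)."""
    sat_mean, gb_mean = mean(sat), mean(gb)
    covariance = mean([(s - sat_mean) * (g - gb_mean) for s, g in zip(sat, gb)])
    sat_sd, gb_sd = pstdev(sat), pstdev(gb)
    correlation = covariance / (sat_sd * gb_sd)
    return correlation, gb_sd / sat_sd


# Hypothetical paired values: the satellite series varies twice as much.
corr, sd_ratio = taylor_statistics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 3.0])
print(corr, sd_ratio)
```

A ratio below 1 means that the ground-based series is less variable than the satellite series, which is the interpretation used in the text.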
In this section, we further show the results focusing on the effect of smoothing while doing the S5P XCO validation against TCCON unscaled XCO data. S5P and TCCON have different vertical sensitivities (averaging kernels) and use different a priori profiles for their retrievals. The different a priori and vertical sensitivities should be taken into account in the validation. In the case of TCCON, only an a priori alignment is done. Smoothing corrections are most relevant for cases with strong dynamic variability in the atmosphere. TCCON performs a profile scaling retrieval on the measurements performed in the NIR spectral range and provides XCO. In order to ascertain the effect of smoothing correction, the results of the S5P validation using TCCON unscaled XCO are compared to the S5P validation results using a priori aligned TCCON unscaled XCO data.
Table 6. S5P XCO validation results against TCCON XCO data at 28 stations for the period between November 2017 and September 2020. Spatial co-location with radius of 50 km or cone with 1° opening angle along the FTIR line of sight and time co-location of ±1 h around the satellite overpass were used. TCCON stations (column 1) are sorted according to decreasing latitude (column 2). The column with title "No." represents the number of co-located measurements, column title "SD" represents the standard deviation of the time series of the ground-based data relative to the standard deviation of the time series of the S5P data, column title "Corr" represents the correlation coefficient between the S5P and the reference ground-based data, column title "Rel diff bias" represents the relative difference ((SAT − GB)/GB) bias in percent, and column title "Rel diff SD" represents the standard deviation of the relative bias in percent.
The validation results of the S5P XCO data relative to the TCCON unscaled XCO data without smoothing correction (direct comparison) are shown in columns 12-15 of Table 6. The S5P XCO mean relative bias and the standard deviation of the relative bias with respect to the TCCON unscaled XCO data are shown as grey bars (labelled -unsc50k1h) in the left panel and right panel plots of Fig. 18. It can be seen that there is an apparent interhemispheric difference in the bias for the direct comparison case (grey bars) between the Southern Hemisphere and Northern Hemisphere sites. This difference is greatly reduced when smoothing uncertainties are correctly accounted for (blue bars) in the validation results (see left panel of Fig. 18). The difference between the mean relative bias with and without a priori alignment for the S5P XCO data for each TCCON station is shown as grey bars in the middle panel plot (labelled -diff_smvsnosm) of Fig. 18.
The magnitude of change between the smoothed and direct comparison is larger in the Southern Hemisphere than in the Northern Hemisphere, with the exception of sites located in highly polluted regions. The change at some stations (e.g. the Southern Hemisphere sites and highly polluted sites) is significant as it is larger than the XCO error estimated in Wunch et al. (2015). Zhou et al. (2019) reported similar findings for a comparison between six co-located sites, where both NDACC and TCCON CO measurements were performed. The difference plot shows the largest change of −17.43 % for Xianghe, a station located in a polluted area, due to a very large difference between the a priori and the true atmospheric state. There, the CO volume mixing ratio (VMR) at the surface is relatively high but it is not represented by the TCCON a priori, leading to an underestimation from the smoothing uncertainty. The same is true for other stations like Karlsruhe (change of −5.73 %) and Pasadena (change of −3.62 %). We observe a mean difference of 0.43 ± 4.44 % across all TCCON stations. Figure 18 shows the TCCON stations where the a priori alignment uncertainty plays an important role in the bias and needs to be accounted for in the CO validation studies.

Validation of S5P CO column data using NDACC CO column data and analysis of smoothing uncertainty
In this section, the validation results of the S5P CO columns using NDACC CO columns are discussed. The S5P observations co-located with the NDACC measurements are found by selecting all filtered S5P pixels within a radius of 50 km around each site and with a maximal time difference of 3 h. An effective location of the measurement on the line of sight is used to do the co-location. The co-located pixels can therefore differ from measurement to measurement. For each of the NDACC measurements co-located with the S5P measurements, all co-located S5P pixels are averaged. Co-located pairs are created between NDACC and averaged S5P only if a minimum of five pixels is found in applying the coincidence criteria. In addition to the direct comparison of the S5P and NDACC CO columns (referred to as NDACC CO un-smooth), the NDACC CO column values are additionally aligned with the S5P prior (referred to as NDACC CO ap-smooth) and used for the S5P validation, and in a further step the NDACC CO column values with the S5P prior substituted are additionally smoothed with the S5P column-averaging kernel (referred to as NDACC CO smooth) following Eq. (A2) and used for S5P validation. Each validation run also includes the adaptation of the S5P columns to the altitude of the ground-based FTIR instruments. Table 7 provides the validation results for the S5P CO columns using smooth, un-smooth and ap-smooth NDACC CO column data at each NDACC station. The systematic difference (the mean of all relative differences) between the S5P and NDACC data is on average 6.76 ± 4.65 % (NDACC CO un-smooth), 4.27 ± 5.62 % (NDACC CO ap-smooth) and 7.62 ± 5.04 % (NDACC CO smooth). However, the bias values are quite high at the Altzomoni and Arrival Heights stations. Eliminating the results of these two stations from the overall statistics, we observe that the systematic difference between the S5P and NDACC data is on average 5.69 ± 3.07 % (NDACC CO un-smooth), 3.14 ± 4.19 % (NDACC CO ap-smooth) and 6.5 ± 3.54 % (NDACC CO smooth).
Figure 19. XCO time series for all unscaled TCCON data (grey), all S5P data (light red), S5P data co-located with TCCON data (red) and co-located unscaled TCCON data with a priori alignment (black) at each site ordered with decreasing latitude. Spatial co-location with radius of 50 km and time of ±1 h around the satellite overpass was used.
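The circular co-location and pixel averaging used for the NDACC comparison (50 km radius, maximal time difference of 3 h, minimum of five pixels) can be sketched as follows. This is a minimal illustration, not the operational validation code: the function names, the tuple layout of the pixels, the spherical-Earth haversine distance and the sample values are all assumptions.

```python
from math import asin, cos, radians, sin, sqrt
from statistics import mean

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical Earth)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def colocated_mean(site, pixels, radius_km=50.0, max_dt_h=3.0, min_pixels=5):
    """Average the S5P pixels matching one ground-based measurement.

    site   : (lat, lon, time_h) of the effective measurement location
    pixels : iterable of (lat, lon, time_h, co_column), already quality filtered
    Returns the mean column, or None when fewer than min_pixels match.
    """
    site_lat, site_lon, site_time = site
    selected = [
        column
        for lat, lon, time_h, column in pixels
        if abs(time_h - site_time) <= max_dt_h
        and great_circle_km(site_lat, site_lon, lat, lon) <= radius_km
    ]
    if len(selected) < min_pixels:
        return None
    return mean(selected)


# Hypothetical measurement and pixels (column values in arbitrary units).
site = (45.0, 8.0, 12.0)
pixels = [(45.0 + 0.05 * i, 8.0, 12.5, 2.0 + 0.1 * i) for i in range(5)]
pixels += [(47.0, 8.0, 12.5, 9.0),   # too far away (about 220 km)
           (45.0, 8.0, 16.5, 9.0)]   # outside the 3 h time window
print(colocated_mean(site, pixels))
```

Because the site location passed in is the effective location on the line of sight, the selected pixels can change from one ground-based measurement to the next, as noted in the text.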
The NDACC station at Altzomoni is located at a high altitude to the southwest of Mexico City (Plaza-Medina et al., 2017; Baylon et al., 2017). The station is located less than 60 km from the city centre. As a result, the emission from the world's eighth-largest megacity, with a population of more than 22 million in its metropolitan area, plays a significant role in the satellite footprint (Stremme et al., 2013; Borsdorff et al., 2018a, 2020). In the example plot shown in Fig. 25, we can see that the ground-based FTIR located at Altzomoni, with the line of sight to the south indicated by the yellow line, is not able to observe the high CO values located to the northwest of the station, which are selected for S5P using our co-location criterion. However, using the cone co-location criterion as described in Sect. 4.3, we can eliminate the pixels with high CO values that are not in the line of sight of the FTIR instrument and thereby reduce the co-location mismatch. The bias at Arrival Heights, the high-latitude background station located on the Antarctic continent showing very low values of CO, is slightly worse than the requirement, while the random error is well below 10 %. The mean standard deviation of the relative bias, which is a measure of the random error, is well below 10 % for validation using both smoothed and direct NDACC CO data. However, there are a few exceptions, namely stations like Altzomoni, Wollongong and Boulder. The high values are due to the co-location mismatch during the high CO events (e.g. passage of a plume with a high CO concentration in the vicinity of the site) observed at these sites.
Figure 21. Relative difference ((satellite − ground-based)/ground-based) of XCO time series for all co-located S5P data and unscaled TCCON data with a priori alignment as the reference data at each site ordered with decreasing latitude as in Fig. 19. Spatial co-location with radius of 50 km and time of ±1 h around the satellite overpass was used.
Figure 26 shows the bar plots for the S5P CO mean relative bias (left panel) and the standard deviation of the relative bias (right panel) with respect to the NDACC CO column data at all stations. The comparisons relative to the NDACC smoothed CO data (labelled -ALLsm50k3h) are the blue bars, those for the NDACC un-smooth CO data (labelled -ALL50k3hr) are the magenta bars, and those for the NDACC ap-smooth CO data (labelled -ALLap50k3h) are the grey bars. The high-latitude stations show a high bias, while some stations like Paramaribo, Izaña and Mauna Loa show a low bias. The differences of the mean relative bias for S5P CO data for the NDACC smoothed CO (labelled -diff_smvsnosm) and NDACC ap-smooth (labelled -diff_apvsnosm) relative to the un-smooth CO data for each station are shown as magenta and grey bars in the middle panel plot of Fig. 26. The plot shows the magnitude of change in bias, which is positive for some stations and negative for others. The effect of smoothing appears to be dependent on the station location. We observe a maximum difference of −6.89 % and a mean difference of 0.86 ± 2.79 % for all stations for the diff_smvsnosm case, and a maximum difference of −11.26 % and a mean difference of −2.49 ± 2.96 % for all stations for the diff_apvsnosm case. The changes at some stations are significant as they are larger than the CO column error estimated in NDACC. The standard deviation of the relative bias for the S5P CO data relative to the NDACC CO data with and without smoothing is comparable.
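The smoothing step that distinguishes the "smooth" case from the "un-smooth" case can be illustrated with a minimal sketch. Equation (A2) is not reproduced in this section, so the code assumes the commonly used column-averaging-kernel form, smoothed column = prior column + sum over layers of A_i (x_i − x_a,i), applied to a ground-based profile in which the S5P prior has already been substituted and which is already on the S5P layer grid. The function name, the layer layout and the sample values are hypothetical.

```python
def smooth_with_column_kernel(gb_partial_columns, s5p_prior_partial_columns, s5p_column_ak):
    """Smooth a ground-based profile with a satellite column-averaging kernel.

    All three inputs are per-layer sequences on the same vertical grid:
    ground-based partial columns, S5P a priori partial columns, and the
    S5P column-averaging kernel (dimensionless).
    """
    n = len(gb_partial_columns)
    if len(s5p_prior_partial_columns) != n or len(s5p_column_ak) != n:
        raise ValueError("all inputs must be defined on the same layers")
    prior_total = sum(s5p_prior_partial_columns)
    correction = sum(
        a * (x - xa)
        for a, x, xa in zip(s5p_column_ak, gb_partial_columns, s5p_prior_partial_columns)
    )
    return prior_total + correction


# Hypothetical three-layer example (arbitrary column units).
gb_layers = [1.2, 0.8, 0.5]
prior_layers = [1.0, 0.9, 0.5]
kernel = [0.9, 1.0, 1.1]
print(smooth_with_column_kernel(gb_layers, prior_layers, kernel))
```

Two limiting cases are a useful check: a kernel of 1 in every layer returns the unsmoothed ground-based total column, and a kernel of 0 returns the prior column.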
The time series of the S5P CO column and NDACC smoothed CO column data for each site are shown in Figs. 27 and 28. The ground-based NDACC CO data are represented in grey and the S5P data during that period are shown in light red. The S5P data co-located with NDACC data are shown in red and the co-located NDACC smoothed data are shown in black. The effect of the altitude correction can easily be seen for stations located at high altitude (e.g. Zugspitze, Jungfraujoch, Izaña, Mauna Loa, Altzomoni, Maïdo). The S5P and NDACC measurements observe the same seasonal cycle of CO. Similar to the TCCON results, we also see that at several of the NDACC sites, S5P and NDACC sometimes observe very high values of CO columns due to the passing of plumes with elevated CO concentrations over or near the station location (e.g. Wollongong, Boulder, St. Petersburg, Porto Velho). The time series of the relative bias plots shown in Figs. 29 and 30 indicate a seasonal cycle with a high bias seen during the high CO events and a low bias seen during the low CO events. The high scatter observed at the Toronto site is related to the scatter observed in the ground-based NDACC CO column data at the site.
Table 7. S5P CO column validation results against NDACC CO column data at 23 stations for the period between November 2017 and September 2020. Spatial co-location with radius of 50 km or cone with 1° opening angle along the FTIR line of sight and time co-location of ±3 h around the satellite overpass were used. NDACC stations (column 1) are sorted according to decreasing latitude (column 2). The column with title "No." represents the number of co-located measurements, column title "SD" represents the standard deviation of the time series of the ground-based data relative to the standard deviation of the time series of the S5P data, column title "Corr" represents the correlation coefficient between the S5P and the reference ground-based data, column title "Rel diff bias" represents the relative difference ((SAT − GB)/GB) bias in percent, and column title "Rel diff SD" represents the standard deviation of the relative bias in percent.
Figure 23. Mosaic plots showing relative biases between co-located S5P and TCCON XCO data with a priori alignment (standard: a; unscaled: b) at 28 TCCON stations within the period between November 2017 and September 2020. Spatial co-location with radius of 50 km and time of ±1 h around the satellite overpass was used. The time resolution of the data shown here is weekly. The stations are sorted with decreasing latitude.

The relative biases of the S5P CO column and NDACC smoothed CO column data for each site are shown as a mosaic plot in Fig. 31. Each bar in the mosaic plot represents the weekly average of the relative bias values. The plot shows a high positive bias during the high CO event periods, which is then reduced and even switches sign to show a negative bias during the low CO event periods. The biases at a few stations like Toronto, Altzomoni and Arrival Heights appear as outliers in the plot. As NDACC CO column data are retrieved from solar absorption measurements, the data are not available during a few weeks in winter for high-latitude stations when the Sun is very low on the horizon.
The Taylor diagram for the S5P CO column and NDACC smoothed CO column data is shown in Fig. 32. The correlation, represented by the angular coordinate, is above 0.9 for most stations (see Table 7 for exact values), and the distance to the origin of the ground-based dot relative to the satellite dot is below 1 for most stations (except at Paramaribo and Rikubetsu, which is due to the limited data sets available for the comparison), implying that the satellite data are more variable than the ground-based data. The good correlation indicates that the temporal variations in the CO column captured by the ground-based instruments are reproduced very similarly by S5P. Outliers such as Wollongong, Boulder and Altzomoni are due to the co-location mismatch during the high CO events (e.g. passage of a plume with a high CO concentration in the vicinity of the site) observed at these sites. The altitude correction of the pixels works well, as can be seen by the relatively good correlation at the high-altitude stations.
Figure 24. Taylor diagram for daily mean differences between S5P and TCCON unscaled XCO data with a priori alignment at 28 TCCON stations within the period between November 2017 and September 2020. Spatial co-location with radius of 50 km and time of ±1 h around the satellite overpass was used. The stations are sorted with decreasing latitude.
Figure 25. S5P CO column number density plotted around the NDACC station at Altzomoni for one sample day. Panel (a) shows all available S5P pixels containing CO data in the overpass file. Panel (b) shows the co-located S5P pixels with the 50 km radius selection criterion. Panel (c) shows the co-located S5P pixels with the cone co-location criterion with 1° opening angle of the cone at the highest altitude. The yellow line in the plots represents the line of sight of the ground-based FTIR at the time of the satellite overpass over the site.
(b) difference of the mean relative bias for validation cases (ALL50k3h, ALLap50k3h, ALLsmcone50k3h) in percent against the reference case (ALLsm50k3h) in percent. Spatial co-location with radius of 50 km or cone with 1° opening angle along the FTIR line of sight and time co-location of ±3 h around the satellite overpass was used. The stations are sorted with decreasing latitude.
A total of 11 ground-based stations (Eureka, Ny-Ålesund, Bremen, Karlsruhe, Garmisch, Zugspitze, Rikubetsu, Izaña, Réunion-Maïdo, Wollongong and Lauder) contributed to the validation study by providing CO data from both TCCON and NDACC measurements performed at the sites. The mean difference in the relative bias of the S5P CO data with respect to the smoothed NDACC and TCCON data (bias_S5PvsNDACC − bias_S5PvsTCCON) for these 11 stations is −4.41 ± 3.68 %. This indirectly implies that the NDACC CO data are 4.41 ± 3.68 % larger than the TCCON CO data. The ground-based data available for these 11 stations do not always cover the same period. Therefore, this is only a qualitative estimate indicating the mean difference between NDACC and TCCON CO data at these 11 sites. Zhou et al. (2019) showed that the bias between co-located and smoothed TCCON and NDACC XCO data products for six stations has a mean value of 6.8 % (range 5.6 %-8.6 %). Our indirect comparison, which covers more sites and uses TCCON and NDACC data that are not exactly co-located, shows similar differences.
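The indirect network comparison above amounts to differencing the two per-station S5P biases at the stations common to both networks. A minimal sketch follows; the function name is hypothetical and the station labels and bias values are placeholders, not values from Tables 6 and 7.

```python
from statistics import mean, stdev


def indirect_network_difference(bias_vs_ndacc, bias_vs_tccon):
    """Mean and SD of (bias S5P vs NDACC - bias S5P vs TCCON) over common stations.

    Both arguments map a station name to its mean relative bias in percent.
    """
    common = sorted(set(bias_vs_ndacc) & set(bias_vs_tccon))
    diffs = [bias_vs_ndacc[name] - bias_vs_tccon[name] for name in common]
    return common, mean(diffs), stdev(diffs)


# Placeholder biases in percent (not taken from the paper's tables).
ndacc_bias = {"site_a": 2.0, "site_b": 4.0, "site_c": 1.0}
tccon_bias = {"site_a": 6.0, "site_b": 10.0, "site_d": 3.0}
stations, mean_diff, sd_diff = indirect_network_difference(ndacc_bias, tccon_bias)
print(stations, mean_diff, sd_diff)
```

A negative mean difference means that S5P sits lower relative to NDACC than relative to TCCON, which is why the text reads it as NDACC CO being larger than TCCON CO.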

Comparison of circular vs. cone co-location criterion for validation of S5P carbon monoxide data
In our standard S5P CO validation settings with or without smoothing, we have used a co-location radius of 50 km around each ground-based site. In this section, we will discuss the validation results of the S5P CO column data with the smoothed ground-based data following the cone co-location criterion as described in Sect. 4.3. These results are further compared to the circular co-location criterion using the same settings. The application of the cone co-location criterion is shown in Fig. 25 for one sample day. The top-left panel plot shows all available S5P pixels containing CO column number density data in the overpass file. The Altzomoni station is marked at the centre of the plot. The high CO values to the northwest of the station are the footprint of the CO from Mexico City. Towards the northeast side of the station, some pixels are missing because they are filtered due to clouds. The top-right panel plot shows the co-located S5P pixels with the circular co-location criterion with a radius of 50 km as used for the CO validation study. As seen in the plot, a few pixels with high CO values in the northwest are included in the selected pixels. The yellow line in the plot represents the line of sight of the ground-based FTIR at Altzomoni, which points away from these pixels. Therefore, the high CO values in the northwest will not be observed by the FTIR measurement. This mismatch is a potential cause of bias. The bottom panel plot shows the co-located S5P pixels with the cone co-location criterion with 1° opening angle of the cone at the highest altitude. The selected S5P pixels using the cone co-location criterion are in the line of sight of the ground-based FTIR instrument and will potentially reduce the mismatch and therefore lower the potential bias between the satellite and ground-based data.
The validation results of the S5P CO data relative to the TCCON and NDACC data with smoothing and applying the cone co-location criterion are shown in columns 16-20 of Tables 6 and 7, respectively. Using the cone co-location criterion only marginally reduces the number of S5P co-locations with ground-based FTIRs (see column 16 in relation to column 3). This is due to the high density of valid pixels in the official S5P CO product. The S5P CO mean relative bias and the standard deviation of the relative bias with respect to TCCON and NDACC using the cone co-location criterion are shown as orange bars in the left panel and right panel plots of Figs. 18 and 26, respectively.
Figure 27. CO column time series for all NDACC data (grey), all S5P data (light red), S5P data co-located with NDACC data (red) and co-located NDACC data smoothed with S5P a priori and additionally smoothed with the S5P column-averaging kernel (black) at each site ordered with decreasing latitude. Spatial co-location with radius of 50 km and time of ±3 h around the satellite overpass was used.
Figure 31. Mosaic plots showing relative biases between co-located S5P and NDACC CO column data smoothed with S5P a priori and additionally smoothed with the S5P column-averaging kernel at 23 NDACC stations within the period between November 2017 and September 2020. Spatial co-location with radius of 50 km and time of ±3 h around the satellite overpass was used. The time resolution of the data shown here is weekly. The stations are sorted with decreasing latitude.
Figure 32. Taylor diagram for daily mean differences between S5P and NDACC CO column data smoothed with S5P a priori and additionally smoothed with the S5P column-averaging kernel at 23 NDACC stations within the period between November 2017 and September 2020. Spatial co-location with radius of 50 km and time of ±3 h around the satellite overpass was used. The stations are sorted with decreasing latitude.
The S5P CO mean relative bias is comparable or slightly smaller for the cone co-location criterion as compared to the circular co-location criterion. The standard deviation of the relative bias with the cone co-location criterion is similar to the standard deviation of the relative bias for the circular co-location criterion. The difference between the mean relative bias with circular and cone co-location criterion for the S5P CO data for each TCCON and NDACC station is shown as orange bars in the middle panel plot (labelled -diff_circvscone) of Figs. 18 and 26, respectively. The difference plot relative to TCCON shows the magnitude of change in bias, with values for some stations being negative while being positive for others. We observe a maximum difference of 2.56 % and a mean difference of 0.09 ± 0.55 % across all TCCON sites for the duration of available measurements used in this study. Sites where the relative bias using the cone criterion as compared to the circular criterion is outside the 1σ limit of the mean are Eureka (0.6 %), Garmisch (0.81 %) and Zugspitze (2.56 %). The difference plot relative to NDACC shows the magnitude of change in bias, with values for some stations being negative while being positive for others. We observe a maximum difference of −1.35 % and a mean difference of −0.07 ± 0.47 % across the selected NDACC sites for the duration of available measurements used in this study. The sites where the relative bias using the cone criterion as compared to the circular criterion is outside the 1σ limit of the mean are Eureka (0.78 %), Harestua (−0.53 %), Zugspitze (−0.75 %), Jungfraujoch (−0.7 %), Boulder (0.64 %) and Arrival Heights (−1.35 %). The largest differences are observed mostly for the high-latitude stations, where the cone co-location criterion following the ground-based FTIR line of sight is the best choice.

Validation of S5P CO (CLSKY, CLOUD and ALL) data using TCCON and NDACC data sets
As discussed in Sect. 2.1, we separated S5P retrievals performed for measurements under clear-sky (CLSKY; cloud optical thickness < 0.5 and cloud height < 500 m, over land) and cloudy conditions (CLOUD; cloud optical thickness ≥ 0.5 and cloud height < 5000 m, over land and ocean) in addition to our standard all case (ALL; cloud height < 5000 m over land and ocean). The validation results of S5P CO for ALL settings have been discussed in detail in the preceding sections. In this section, we show the validation results of the S5P CO for CLSKY and CLOUD settings against TCCON unscaled XCO with a priori alignment and NDACC CO column data with smoothing and compare them with the results of the ALL settings. Each validation run includes the adaptation of the S5P columns to the altitude of the ground-based FTIR instruments. Tables 8 and 9 provide the validation results for the S5P CO data for the ALL case, CLSKY case and CLOUD case at each TCCON and NDACC station. The systematic difference (the mean of all relative differences) between the S5P and unscaled TCCON data is on average 2.45 ± 3.38 % (ALL case), 2.83 ± 3.43 % (CLSKY case) and 1.89 ± 3.11 % (CLOUD case). The standard deviation of the relative bias, which is a measure of the random error, is well below 10 % for all sites except at Wollongong (ALL and CLOUD cases) and Pasadena (CLOUD case). Figure 33 shows the bar plots for the S5P XCO mean relative bias (left panel) and the standard deviation of the relative bias (right panel) with respect to the TCCON XCO data at all stations. The comparisons relative to the TCCON unscaled XCO data for the ALL case are the blue bars, those for the CLSKY case (labelled -unscsm50k1hCLSKY) are the red bars, and those for the CLOUD case (labelled -unscsm50k1hCLOUD) are the green bars. The middle panel plot of Fig. 33 shows for each TCCON station the difference of the mean relative bias for S5P XCO data using the TCCON unscaled XCO ALL case and the CLSKY case (labelled -diff_ALLvsCLSKY) as red bars, as well as the CLOUD case (labelled -diff_ALLvsCLOUD) as green bars. The overall direction of change for the CLSKY case is negative with a few exceptions; the maximum value of change is 2.41 % and the mean value is −0.38 ± 1.05 % for all stations. The overall direction of change for the CLOUD case is positive with a few exceptions; the maximum value of change is 3.14 % and the mean value is 0.55 ± 0.79 % for all stations.
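The three selection cases defined at the start of this section can be written as a small classifier. This is a minimal sketch that uses only the thresholds quoted in the text; the function name and argument names are hypothetical, and it assumes that the recommended quality filtering has already been applied to the pixels.

```python
def cloud_cases(cloud_optical_thickness, cloud_height_m, over_land):
    """Return the set of validation cases (ALL, CLSKY, CLOUD) a pixel belongs to."""
    cases = set()
    # ALL: cloud height < 5000 m, over land and ocean.
    if cloud_height_m < 5000:
        cases.add("ALL")
    # CLSKY: optical thickness < 0.5 and cloud height < 500 m, over land only.
    if cloud_optical_thickness < 0.5 and cloud_height_m < 500 and over_land:
        cases.add("CLSKY")
    # CLOUD: optical thickness >= 0.5 and cloud height < 5000 m, land and ocean.
    if cloud_optical_thickness >= 0.5 and cloud_height_m < 5000:
        cases.add("CLOUD")
    return cases


print(cloud_cases(0.2, 300, over_land=True))    # clear-sky pixel over land
print(cloud_cases(2.0, 3000, over_land=False))  # cloudy pixel over ocean
```

Every CLSKY or CLOUD pixel is also an ALL pixel under these definitions, so the CLSKY and CLOUD runs validate subsets of the standard data set.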
The systematic difference (the mean of all relative differences) between the S5P and NDACC data is on average 7.62 ± 5.04 % (ALL case), 7.7 ± 4.96 % (CLSKY case) and 7.74 ± 4.97 % (CLOUD case). The validation results at the Altzomoni and Arrival Heights stations show a quite high bias also for the CLSKY and CLOUD cases, similar to that observed for the ALL case. Eliminating the results of these two stations from the overall statistics, we observe that the systematic difference between the S5P and NDACC data is on average 6.5 ± 3.54 % (ALL case), 6.49 ± 3.11 % (CLSKY case) and 6.68 ± 3.69 % (CLOUD case). The random error at Arrival Heights, a high-latitude station located on the Antarctic continent, is well below 10 %. The mean standard deviation of the relative bias, which is a measure of the random error, is well below 10 % for all three cases of validation results, with a few exceptions for stations like Altzomoni, Wollongong and Boulder. The high values are due to the co-location mismatch during the high CO events (e.g. the passage of a plume with a high CO concentration in the vicinity of the site) observed at these sites. Figure 34 shows the bar plots for the S5P CO mean relative bias (left panel) and the standard deviation of the relative bias (right panel) with respect to the NDACC CO column data at all stations. The comparisons relative to the NDACC CO column data for the ALL case (labelled -ALLsm50k3h) are the blue bars, those for the CLSKY case (labelled -ALLsm50k3hCLSKY) are the red bars, and those for the CLOUD case (labelled -ALLsm50k3hCLOUD) are the green bars. The middle panel plot of Fig. 34 shows for each NDACC station the difference of the mean relative bias for S5P CO column data using the NDACC CO column ALL case and the CLSKY case (labelled -diff_ALLvsCLSKY) as red bars, as well as the CLOUD case (labelled -diff_ALLvsCLOUD) as green bars.
The direction of change for the CLSKY and CLOUD cases is negative for some stations, while for other stations it is positive. The maximum value of change is 2.68 % and the mean value is 0.23 ± 1.11 % for the CLSKY case for all stations.
Table 8. Validation of S5P XCO ALL, CLSKY and CLOUD data with TCCON XCO data at 28 stations for the period between November 2017 and September 2020. Spatial co-location with radius of 50 km and time co-location of ±1 h around the satellite overpass were used. TCCON stations (column 1) are sorted according to decreasing latitude (column 2). The column with title "No." represents the number of co-located measurements, column title "SD" represents the standard deviation of the time series of the ground-based data relative to the standard deviation of the time series of the S5P data, column title "Corr" represents the correlation coefficient between the S5P and the reference ground-based data, column title "Rel diff bias" represents the relative difference ((SAT − GB)/GB) bias in percent, and column title "Rel diff SD" represents the standard deviation of the relative bias in percent.
Table 9. Validation of S5P CO column ALL, CLSKY and CLOUD data with NDACC CO column data at 23 stations for the period between November 2017 and September 2020. Spatial co-location with radius of 50 km and time co-location of ±3 h around the satellite overpass were used. NDACC stations (column 1) are sorted according to decreasing latitude (column 2). The column with title "No." represents the number of co-located measurements, column title "SD" represents the standard deviation of the time series of the ground-based data relative to the standard deviation of the time series of the S5P data, column title "Corr" represents the correlation coefficient between the S5P and the reference ground-based data, column title "Rel diff bias" represents the relative difference ((SAT − GB)/GB) bias in percent, and column title "Rel diff SD" represents the standard deviation of the relative bias in percent.
The CLSKY and CLOUD selection criteria can be useful in the case of specific applications. For example, the CLSKY case helped to reduce the standard deviation of the relative bias for Wollongong's TCCON and NDACC validation results. This is related to the significant filtering of the pixels over the ocean, which are missing in the CLSKY case. The satellite clear-sky observations made over ocean have a signal that is too low in the SWIR spectral region and are therefore filtered out. However, the ALL case results are quite comparable to the CLSKY and CLOUD cases in general and are therefore used as the general S5P CO data set in our validation studies.
5.5 Solar zenith angle dependence of the S5P carbon monoxide bias relative to ground-based reference data
In this section, we show the S5P carbon monoxide bias relative to the ground-based reference data as a function of the measurement SZA. Figure 35 shows the S5P relative bias for the a priori aligned and smoothed cases as a function of the measurement SZA against the reference ground-based TCCON stations at Sodankylä (left panel) and Lauder (right panel). As mentioned in Sect. 2.1, the S5P carbon monoxide data are only available for SZA < 80°. The upper limits of the plots therefore show values only until 80°. As explained in Sect. 5.2, the high values of S5P relative bias are observed during winter (measurements performed mostly at high SZAs) and the low values during summer (measurements performed mostly at low SZAs). We observe that the relative bias increases with increasing SZA of the measurement. This increase is about 10 % over the complete range of measurement SZAs.
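One simple way to quantify such an SZA dependence is an ordinary least-squares line through the relative bias as a function of SZA, restricted to SZA < 80°. The linear model is an assumption made for illustration; the text only reports the overall increase. The function name and the sample values are hypothetical.

```python
def fit_bias_vs_sza(sza_deg, rel_bias_percent, sza_max=80.0):
    """Least-squares slope (percent per degree) and intercept (percent) for SZA < sza_max."""
    pairs = [(z, b) for z, b in zip(sza_deg, rel_bias_percent) if z < sza_max]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two measurements below the SZA limit")
    z_mean = sum(z for z, _ in pairs) / n
    b_mean = sum(b for _, b in pairs) / n
    sxx = sum((z - z_mean) ** 2 for z, _ in pairs)
    if sxx == 0.0:
        raise ValueError("all SZA values are identical")
    slope = sum((z - z_mean) * (b - b_mean) for z, b in pairs) / sxx
    return slope, b_mean - slope * z_mean


# Hypothetical values; the last point lies beyond the 80 degree limit and is ignored.
slope, intercept = fit_bias_vs_sza([20.0, 40.0, 60.0, 85.0], [0.0, 2.5, 5.0, 100.0])
print(slope, intercept)
```

Multiplying the fitted slope by the SZA range covered at a station gives a single number that can be compared with the increase read from Fig. 35.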

Conclusions
In this study, we performed the geophysical validation of the Sentinel-5 Precursor operational methane and carbon monoxide data sets (see Table 1 for version details) using reference ground-based TCCON and NDACC stations. A total of 28 TCCON stations and 24 NDACC stations were used, covering a wide latitudinal range (Eureka at 80° N to Arrival Heights at 77.8° S), various atmospheric conditions (dry, humid, clean and polluted), various surface conditions (a range of surface albedo), and flat, high-altitude and oceanic terrain. Furthermore, the combined use of the near-infrared TCCON data and mid-infrared NDACC data, as a whole network and at co-located stations, each with their own benefits, helped to evaluate the quality of the Sentinel-5 Precursor operational methane and carbon monoxide products in our validation study.
We found that the systematic difference between the S5P standard XCH4 and a priori aligned TCCON data is on average −0.68 ± 0.74 %. The systematic difference changes to −0.26 ± 0.56 % for the S5P bias-corrected XCH4 data. The bias for both the S5P standard and bias-corrected XCH4 data is well within the mission requirement for bias (systematic error) of 1.5 %. We also found that the random error is well below 1 % for both the standard (0.59 ± 0.17 %) and bias-corrected (0.57 ± 0.18 %) S5P XCH4 data. Most stations show a correlation above 0.6; the poor correlations at some sites are mostly dominated by the seasonal cycle or are due to the limited data sets available for the comparison. The systematic differences of the S5P standard and bias-corrected XCH4 against smoothed NDACC data are on average −0.11 ± 1.19 % and 0.57 ± 0.83 %, respectively. As the accuracy and precision of NDACC CH4 data are lower than those of TCCON, conclusions about the S5P systematic and random errors are drawn based on the TCCON validation results. The bias correction of the S5P XCH4 data, being a function of the retrieved surface albedo, acts differently at different locations. We observe high scatter in the relative bias for low surface albedo conditions. A seasonal dependency of the relative bias is seen. We observe a high bias during the springtime measurements at high SZAs for high-latitude sites and a decreasing bias with increasing SZA for the rest of the year at all sites. The SZA dependence of the bias includes the albedo correction and the difference of the a priori from the true atmospheric state. We estimated the contribution of the a priori alignment uncertainty at the ground-based stations and found values up to ∼ 4.8 ppb at a TCCON station with a mean value of ∼ −2.7 ± 1.3 ppb. The mean value of the smoothing uncertainty contribution at the NDACC stations is ∼ 7 ± 5.3 ppb, with some stations showing high values of up to ∼ 41.4 ppb.
At the co-located TCCON and NDACC stations, we observed the highest contribution of the a priori alignment and smoothing uncertainty for midlatitude TCCON stations, whereas for the NDACC stations we observed the highest contribution for the high-latitude stations. The comparison with a priori alignment and taking smoothing effects into account is recommended as the preferred method. However, the direct comparison of the satellite and reference data is useful to see the influence of the averaging kernel and of the a priori difference compared to the true profile. We found that using the cone co-location criterion improves the co-location between the satellite and ground-based station because both then observe a similar air mass. This is crucial for certain stations, namely those located close to emission sources and those at high latitudes. Currently, we found seven TCCON and NDACC stations where the bias changed by more than 2 ppb between the circular and cone co-location settings. The cone criterion also significantly reduces the number of co-locations for some sites, thereby making the statistics less reliable for those sites. The L2 algorithm teams are continuously working on improving the operational products by optimising their code with respect to the biases observed relative to the reference data sets. These improvements will be implemented in future versions of the S5P data.

Figure caption: Relative biases between co-located S5P XCO and a priori aligned TCCON unscaled XCO data, plotted as a function of the S5P measurement solar zenith angle at a few TCCON stations for the period between November 2017 and September 2020. Spatial co-location with a radius of 50 km and a time co-location of ±1 h around the satellite overpass were used. The colours represent the different months from January (1) to December (12) of a year.
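The circular co-location criterion referred to above (pixels within a fixed radius of the station and within a fixed time window of the ground-based measurement) can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names, the spherical-Earth haversine distance and the representation of times as decimal hours are assumptions, and the cone criterion along the FTIR line of sight is not reproduced here.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation (assumption)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def circular_colocation(pix_lat, pix_lon, pix_time_h,
                        stn_lat, stn_lon, gb_time_h,
                        radius_km=50.0, dt_h=1.0):
    """Boolean mask of satellite pixels co-located with one ground-based measurement.

    A pixel is selected if it lies within radius_km of the station and its
    observation time is within +/- dt_h hours of the ground-based measurement.
    """
    dist = haversine_km(pix_lat, pix_lon, stn_lat, stn_lon)
    dtime = np.abs(np.asarray(pix_time_h, dtype=float) - gb_time_h)
    return (dist <= radius_km) & (dtime <= dt_h)
```

The `radius_km` and `dt_h` arguments correspond to the 50 km radius and the ±1 h or ±3 h windows quoted in the table and figure captions.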
We found that the systematic difference between the S5P XCO and a priori aligned TCCON data is on average 9.22 ± 3.45 %. Because of the uncertainty in the scaling slope used to tie the TCCON XCO measurements to the WMO in situ scale, we have also used the unscaled TCCON XCO data (without application of the empirical scaling factor) for the S5P XCO validation. We found that the systematic difference between the S5P XCO and a priori aligned TCCON unscaled XCO data is on average 2.45 ± 3.38 %. Both results are within the mission requirement for bias (systematic error) of 15 %. We found that the difference of the relative bias using the TCCON unscaled XCO and the TCCON standard XCO data is on average −6.77 ± 0.57 %. We estimated the contribution of the a priori alignment uncertainty in the validation and found that the magnitude of change between the a priori aligned and direct comparison is larger in the Southern Hemisphere than in the Northern Hemisphere, except for sites located in polluted regions. The a priori alignment uncertainty contribution is significant at several sites, as it is larger than the estimated TCCON XCO error. We observe a mean difference of 0.43 ± 4.44 % across all TCCON stations, with the largest value of −17.43 % for Xianghe (due to a very high a priori profile difference). We found that the systematic difference between the S5P CO column and the NDACC CO column data (excluding two stations which were obvious outliers) is on average 5.69 ± 3.07 % (NDACC CO direct comparison), 3.14 ± 4.19 % (NDACC CO smoothed by using the S5P a priori as the common prior) and 6.5 ± 3.54 % (NDACC CO profile with the S5P a priori substituted and additionally smoothed with the S5P column-averaging kernel). The effect of the smoothing depends on the station location, with a mean difference of 0.86 ± 2.79 % across all NDACC stations and a maximum value of −6.89 % in relation to the direct comparison.
The effect of smoothing by doing only a priori substitution in relation to the direct comparison gives a mean difference of −2.49 ± 2.96 % across all NDACC stations and a maximum value of −11.26 %. The comparison with a priori alignment and taking smoothing effects into account is recommended as the preferred method. Most TCCON and NDACC stations show a correlation above 0.9, indicating that the temporal variations in the CO column captured by the ground-based instruments are reproduced very similarly by S5P. The few exceptions are due to the limited data sets available for the comparison. We also found that the S5P random error for the TCCON and NDACC validation results is well below 10 %, except for a few stations where a co-location mismatch occurs during periods with high CO values caused by plumes passing over or near the stations. A seasonal dependency of the relative bias is seen. We observe a high bias during high-CO events and a low bias during low-CO events. We observed a mean difference of 0.09 ± 0.55 % with a maximum difference of 2.56 % for TCCON validation results using the cone co-location criterion compared to the circular co-location criterion. The results of the cone selection criterion at the NDACC stations show higher values than for the TCCON stations. We observe a mean difference of −0.07 ± 0.47 % with a maximum difference of −1.35 %. The high difference is observed mostly for high-latitude stations, where the cone co-location criterion following the line of sight of the ground-based FTIR is the best choice for finding co-located satellite pixels for validation. Furthermore, we observed that the validation results of the clear-sky and cloud cases of S5P pixels are in general comparable to the validation results including all pixels passing the filter criteria. The clear-sky or cloud cases are, however, useful for certain applications. We observe that the relative bias increases with increasing SZA of the measurement. We estimated this increase to be about 10 % over the complete range of measurement SZAs.
Based on the validation results of the S5P operational methane and carbon monoxide data sets against the reference ground-based TCCON and NDACC data sets, we conclude that the S5P methane and carbon monoxide data are of high quality and fulfil the requirements for systematic and random uncertainties.
Appendix A: Reducing a priori and averaging kernel contribution in the validation

The S5P and ground-based FTIR instruments have different instrument sensitivities and use different a priori profiles to retrieve the best representation of the true atmospheric state from the recorded spectra. The S5P uses an a priori profile derived from the TM5 model, a global chemistry transport model, whereas the TCCON uses a daily a priori profile generated by a stand-alone programme provided by Toon and Wunch (2015) and NDACC uses a single a priori profile from the climatology of the Whole Atmosphere Community Climate Model version 6 (WACCM V6; ftp://nitrogen.acom.ucar.edu/user/jamesw/IRWG/2013/WACCM/V6/, last access: 1 June 2021). In order to make the comparison quantitative, the influence of the a priori contribution to the smoothing equation needs to be compensated by adjusting the retrieval results to a common a priori profile (Rodgers and Connor, 2003). The S5P prior is used as the common prior. It is regridded to the FTIR grid using a mass conservation algorithm (Langerock et al., 2015). For the case where the satellite pixel elevation is above the ground-based site altitude, the S5P prior profile is extrapolated (i.e. a simple extension, the lowest VMR is taken as the VMR at the lowest ground-based grid) to the altitude of the ground-based instrument. The regridded S5P prior $x_\mathrm{a\_S5P}$ is substituted in the FTIR retrieval:

$$x_\mathrm{FTIR\_mod\_prior} = x_\mathrm{FTIR} + (I - A_\mathrm{FTIR})(x_\mathrm{a\_S5P} - x_\mathrm{a\_FTIR}), \tag{A1}$$

where $x_\mathrm{FTIR}$ is the original VMR profile, $x_\mathrm{a\_FTIR}$ is the a priori profile used for the original FTIR retrieval ($x_\mathrm{FTIR}$), $x_\mathrm{FTIR\_mod\_prior}$ is the corrected FTIR-retrieved profile, $A_\mathrm{FTIR}$ is the FTIR averaging kernel matrix, and $I$ is the identity matrix. This step reduces the total smoothing uncertainty on the column differences by eliminating the uncertainty on the FTIR a priori. Although Eq. (A1) is only valid for NDACC profiles, it can be modified to be applied to TCCON column data as well. In that case, the prior profiles should be transformed to partial column profiles and divided by the total column of FTIR dry air.
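The a priori substitution of Eq. (A1) amounts to a single matrix operation once all profiles are on the FTIR grid. The following NumPy sketch illustrates it; the function name `substitute_prior` and the argument names are hypothetical, and the regridding of the S5P prior to the FTIR grid is assumed to have been done beforehand.

```python
import numpy as np

def substitute_prior(x_ftir, x_a_ftir, x_a_s5p, A_ftir):
    """Apply Eq. (A1): replace the FTIR a priori by the regridded S5P a priori.

    x_ftir   : original FTIR-retrieved VMR profile (n levels)
    x_a_ftir : a priori profile used in the FTIR retrieval (n levels)
    x_a_s5p  : S5P a priori profile, already regridded to the FTIR grid (n levels)
    A_ftir   : FTIR averaging kernel matrix (n x n)
    """
    x_ftir = np.asarray(x_ftir, dtype=float)
    x_a_ftir = np.asarray(x_a_ftir, dtype=float)
    x_a_s5p = np.asarray(x_a_s5p, dtype=float)
    A = np.asarray(A_ftir, dtype=float)
    identity = np.eye(A.shape[0])
    # x_mod = x_FTIR + (I - A_FTIR)(x_a_S5P - x_a_FTIR)
    return x_ftir + (identity - A) @ (x_a_s5p - x_a_ftir)
```

Two limiting cases serve as a sanity check: with an ideal averaging kernel (A equal to the identity) the profile is unchanged, and with a zero averaging kernel a retrieval that merely returned its own prior is mapped onto the S5P prior.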
For NDACC profiles, to further reduce the smoothing uncertainty contribution introduced by the averaging kernel, we smooth the corrected FTIR-retrieved profile ($x_\mathrm{FTIR\_mod\_prior}$) with the S5P column-averaging kernel ($cA_\mathrm{S5P}$). This requires the regridding of the corrected FTIR-retrieved profile to the S5P column-averaging kernel grid before applying the smoothing equation (Eq. A2), in which $c_\mathrm{a\_S5P}$ is the column value derived from the S5P a priori profile and $c_\mathrm{FTIR\_smoothed}$ is the smoothed FTIR column associated with a co-located S5P pixel. The $n_\mathrm{dryair}$ in Eq. (A2) is the partial column profile calculated from the pressure difference ($\Delta P$) between the layer interfaces and the hydrostatic equation:

$$\Delta P = m_\mathrm{wet,air} \times n_\mathrm{wet,air} \times g. \tag{A3}$$
For CH4, the partial column of dry air is available in the S5P level 2 files. For CO, we derive it using the pressure on the layer boundaries as described in Eq. (A3). In Eq. (A3), $n_\mathrm{wet,air}$ is approximated by $n_\mathrm{dry,air}$ and the molar mass of wet air is approximated by the molar mass of dry air, as there is no H2O profile available in the S5P prior. We found that this approximation has only a small influence; e.g. the bias change at Paramaribo, a tropical site, is about 0.2 % when compared to the case of using the NCEP H2O profile. If the satellite pixel elevation is below the FTIR site altitude, the regridding of the corrected FTIR-retrieved profile is done such that the FTIR profile is extended with the S5P a priori profile. This extension with the a priori profile cancels on the right-hand side of the smoothing equation (Eq. A2), and the FTIR smoothed column coincides with the S5P a priori partial column for the region where the grids mismatch.
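The two steps described above, deriving dry-air partial columns from the pressure at the layer interfaces and smoothing the FTIR profile with the S5P column-averaging kernel, can be sketched as follows. This is a hedged illustration: the smoothing is written in the standard column form of Rodgers and Connor (2003), c_smoothed = c_a + Σ cA · (x_mod − x_a) · n_dryair, which is an assumption about the exact form used here. The function names, the constant value of g and the molar mass of dry air are likewise illustrative choices.

```python
import numpy as np

M_DRY_AIR = 0.0289644   # molar mass of dry air in kg/mol (assumed value)
G0 = 9.80665            # gravitational acceleration in m/s^2, taken constant (assumption)

def dry_air_partial_columns(p_interfaces_pa):
    """Dry-air partial columns (mol/m^2) per layer from interface pressures (Pa).

    Uses the hydrostatic relation Delta P = m * n * g with wet air
    approximated by dry air, as described in the text.
    """
    dp = np.abs(np.diff(np.asarray(p_interfaces_pa, dtype=float)))
    return dp / (M_DRY_AIR * G0)

def smooth_with_s5p_kernel(x_mod, x_a_s5p, ca_s5p, n_dryair):
    """Smoothed FTIR column on the S5P layer grid (assumed standard form).

    x_mod    : prior-substituted FTIR VMR profile, regridded to the S5P layers
    x_a_s5p  : S5P a priori VMR profile on the same layers
    ca_s5p   : S5P column-averaging kernel per layer
    n_dryair : dry-air partial columns per layer
    """
    x_mod = np.asarray(x_mod, dtype=float)
    x_a = np.asarray(x_a_s5p, dtype=float)
    ca = np.asarray(ca_s5p, dtype=float)
    n = np.asarray(n_dryair, dtype=float)
    c_a = np.sum(x_a * n)                       # column of the S5P a priori
    return c_a + np.sum(ca * (x_mod - x_a) * n)
```

In this form, layers where the FTIR profile has been extended with the S5P a priori contribute nothing to the second term, which is the cancellation described in the text.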

Appendix B: S5P pixel altitude correction
An altitude correction is applied to each S5P pixel in order to take into account the altitude difference between the S5P pixel and the ground-based station. The correction can be significant for co-location with mountain stations, where the satellite pixels can be picked up from locations around the station that are at lower or higher altitudes than the station.
The scaling factor ($f$) is calculated from the satellite a priori profile using the following equation:

$$f = \frac{\mathrm{PC}_\mathrm{prior}(z_\mathrm{station} \rightarrow \mathrm{toa})}{\mathrm{TC}_\mathrm{prior}(z_\mathrm{pixel} \rightarrow \mathrm{toa})}, \tag{B1}$$

where the numerator is the partial column from the FTIR station altitude to the top of the atmosphere (toa) and the denominator is the total column from the pixel altitude to the top of the atmosphere. The scaling factor is less than 1 for cases where the satellite pixels are located below the altitude of the FTIR station. In certain cases, where the S5P pixels are above the FTIR station, the scaling factor goes above 1. The scaling factor is applied to the satellite data such that the co-located pairs refer to the same FTIR station altitude. Equation (B1) is valid for satellite pixels below the station altitude, for which we use the S5P prior profile. In the other case, where the satellite pixels are above the station altitude, we extrapolate the satellite prior to compensate for the small altitude differences. The S5P products are adapted to the altitude of the station either by cutting off the scaled mixing ratio profiles at the station altitude (for FTIR stations at high-altitude locations) or by extending the profile, assuming a constant mixing ratio down to the station altitude (for the case where the S5P pixel altitude is above the FTIR station). This method of S5P pixel altitude correction is applied when the satellite and ground-based columns are not calculated between the same boundaries, e.g. S5P vs. TCCON, and S5P vs. NDACC without extra satellite smoothing.
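For the case of a satellite pixel below the station altitude, the ratio described above (a priori partial column above the station divided by the a priori total column above the pixel) can be sketched as follows. The function name, the layer-based input and the linear-in-altitude apportioning of the layer that contains the station are assumptions made for illustration; the extrapolation used when the pixel lies above the station is not covered.

```python
import numpy as np

def altitude_scaling_factor(z_interfaces_m, prior_partial_columns, z_station_m):
    """Scaling factor f for a pixel located at or below the station altitude.

    z_interfaces_m        : layer interface altitudes (m), ascending,
                            from the pixel surface to the top of the atmosphere
    prior_partial_columns : satellite a priori partial column per layer
    z_station_m           : altitude of the FTIR station (m)
    """
    z = np.asarray(z_interfaces_m, dtype=float)
    pc = np.asarray(prior_partial_columns, dtype=float)
    if z_station_m < z[0]:
        raise ValueError("station below pixel surface: prior extrapolation needed")
    lower, upper = z[:-1], z[1:]
    # Fraction of each layer lying above the station altitude
    # (linear in altitude within a layer: an illustrative simplification)
    frac_above = np.clip((upper - z_station_m) / (upper - lower), 0.0, 1.0)
    return float(np.sum(frac_above * pc) / np.sum(pc))
```

The satellite column adapted to the station altitude is then the product of `f` and the original S5P column. For a station at the pixel surface the factor is 1, and it decreases as the station altitude increases.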
Data availability. The S5P CH4 and CO data used in this study are made available to the S5P Mission Performance Centre via the ESA expert hub. Since the public release of the CO and CH4 products, the data have been available at the Copernicus Open Access Hub (https://scihub.copernicus.eu, last access: 1 December 2020). The public S5P CO data can be accessed via Copernicus Sentinel-5P (2018) (https://doi.org/10.5270/S5P-1hkp7rp) and the public S5P CH4 data can be accessed via Copernicus Sentinel-5P (2019) (https://doi.org/10.5270/S5P-3p6lnwd). The FTIR TCCON data are available via the TCCON data archive, hosted by CaltechDATA (Total Carbon Column Observing Network Team, 2017, https://doi.org/10.14291/TCCON.GGG2014). The FTIR TCCON data without the scaling to the WMO scale were obtained from the site PIs. The data from individual stations can be downloaded from the ftp server hosted at NOAA (ftp://ftp.cpc.ncep.noaa.gov/ndacc/station/, last access: 1 June 2021) depending on the site PI's decision.
Author contributions. MKS and BL designed the study and produced the validation analysis and results. MKS wrote the first draft of the paper with the support of BL. JL and AL had joint responsibility for the S5P CH4 prototype algorithm and operational processor. JL and TB had joint responsibility for the S5P CO prototype algorithm and operational processor. All authors contributed to the generation of the data used in this study. All authors read the paper and provided comments.
The NDACC data used in this publication were obtained as part of the Network for the Detection of Atmospheric Composition Change (NDACC) and are publicly available (see http://www.ndaccdemo.org/, last access: 1 June 2021). Rapid delivery data on NDACC are supported for selected sites by the CAMS27 project (https://cams27.aeronomie.be/, last access: 1 June 2021). The National Center for Atmospheric Research is sponsored by the National Science Foundation. The NCAR FTS observation programmes at Thule, GR, Boulder, CO, and Mauna Loa, HI, are supported under contract by the National Aeronautics and Space Administration (NASA). The Thule work is also supported by the NSF Office of Polar Programs (OPP). We wish to thank the Danish Meteorological Institute for support at the Thule site and NOAA for support of the Mauna Loa Observatory (MLO) site. The NDACC and TCCON stations (Ascension Island, Bremen, Garmisch, Izaña, Ny-Ålesund, Paramaribo and Karlsruhe) have been supported by the German Bundesministerium für Wirtschaft und Energie (BMWi) via DLR under grants 50EE1711A-E. The FTIR sites (Garmisch, Izaña, Karlsruhe, Kiruna and Zugspitze) have been supported by the Helmholtz Society via the research program ATMO. The NDACC/TCCON Izaña data benefit from the financial support from the Ministerio de Economía y Competitividad from Spain for the project INMENSE (CGL2016-80688-P). We thank the International Foundation High Altitude Research Stations Jungfraujoch and Gornergrat (HFSJG, Bern) for supporting the facilities needed to perform the FTIR observations at Jungfraujoch. The University of Liège contribution has been supported primarily by the Fonds de la Recherche Scientifique - FNRS under grant no. J.0147.18, by the GAW-CH program of MeteoSwiss, as well as by the CAMS project. Emmanuel Mahieu is a senior research associate of the F.R.S.-FNRS.
The RUOA network "Red Universitaria de Observatorios Atmosféricos de la Universidad Nacional Autónoma de México" is acknowledged for the support of the Altzomoni site. Omar Lopez and Delibes Flores Roman are acknowledged for technical support. Alejandro Bezanilla and Noemie Taquet are acknowledged for helping with measurements and data processing. For the Altzomoni site, UNAM-DGAPA (grant nos. IN111418 and IN107417) and CONACYT (grant no. 290589) are acknowledged. The St. Petersburg site of the NDACC is operated by Saint Petersburg State University with the financial support provided by the Russian Foundation for Basic Research (grant no. 18-05-00011). The scientific equipment is maintained by the Geomodel research centre of SPbU. The NDACC site at Rikubetsu is operated as part of the joint research programme of the Institute for Space-Earth Environmental Research (ISEE), Nagoya University, and supported in part by the GOSAT series project. The Wollongong NDACC site is funded by the Australian Research Council (grant no. DP160101598). Measurements at Lauder and Arrival Heights are core funded by NIWA through New Zealand's Ministry of Business, Innovation and Employment Strategic Science Investment Fund. Dan Smale thanks Antarctica New Zealand for providing support for the measurements at Arrival Heights.

Review statement. This paper was edited by Andreas Richter and reviewed by two anonymous referees.