A scientific algorithm to simultaneously retrieve carbon monoxide and methane from TROPOMI onboard Sentinel-5 Precursor

Published by Copernicus Publications on behalf of the European Geosciences Union.
O. Schneising et al.: CO and CH4 retrievals from TROPOMI onboard Sentinel-5P

Carbon monoxide (CO) is an important atmospheric constituent affecting air quality, and methane (CH4) is the second most important greenhouse gas contributing to human-induced climate change. Detailed and continuous observations of these gases are necessary to better assess their impact on climate and atmospheric pollution. While surface and airborne measurements are able to accurately determine atmospheric abundances on local scales, global coverage can only be achieved using satellite instruments. The TROPOspheric Monitoring Instrument (TROPOMI) onboard the Sentinel-5 Precursor satellite, which was successfully launched in October 2017, is a spaceborne nadir-viewing imaging spectrometer measuring solar radiation reflected by the Earth in a push-broom configuration. It has a wide swath on the terrestrial surface and covers wavelength bands between the ultraviolet (UV) and the shortwave infrared (SWIR), combining a high spatial resolution with daily global coverage. These characteristics enable the determination of both gases with an unprecedented level of detail on a global scale, introducing new areas of application. Abundances of the atmospheric column-averaged dry air mole fractions XCO and XCH4 are simultaneously retrieved from TROPOMI's radiance measurements in the 2.3 μm spectral range of the SWIR part of the solar spectrum using the scientific retrieval algorithm Weighting Function Modified Differential Optical Absorption Spectroscopy (WFM-DOAS). This algorithm is intended to be used with the operational algorithms for mutual verification and to provide new geophysical insights.
We introduce the algorithm in detail, including expected error characteristics based on synthetic data, a machine-learning-based quality filter, and a shallow learning calibration procedure applied in the post-processing of the XCH4 data. The quality of the results based on real TROPOMI data is assessed by validation with ground-based Fourier transform spectrometer (FTS) measurements providing realistic error estimates of the satellite data: the XCO data set is characterised by a random error of 5.1 ppb (5.8 %) and a systematic error of 1.9 ppb (2.1 %); the XCH4 data set exhibits a random error of 14.0 ppb (0.8 %) and a systematic error of 4.3 ppb (0.2 %). The natural XCO and XCH4 variations are well-captured by the satellite retrievals, which is demonstrated by a high correlation with the validation data (R = 0.97 for XCO and R = 0.91 for XCH4 based on daily averages). We also present selected results from the mission start until the end of 2018, including a first comparison to the operational products and examples of the detection of emission sources in a single satellite overpass, such as CO emissions from the steel industry and CH4 emissions from the energy sector, which potentially allows for the advance of emission monitoring and air quality assessments to an entirely new level.


Introduction
Carbon monoxide (CO) is an atmospheric pollutant compromising air quality. It is a colourless, odourless, and tasteless gas that can disrupt the transport of oxygen by haemoglobin in the red blood cells after inhalation of high doses, thus having the ability to cause severe health problems (Omaye, 2002). Its lifetime of about 1 to 2 months allows it to be used as a tracer for the long-range transport of pollution. CO plays a central role in tropospheric chemistry by acting as a precursor to tropospheric ozone (The Royal Society, 2008), which is another pollutant considered harmful to public health and a greenhouse gas. Moreover, CO is the largest direct sink of the hydroxyl radical (OH) affecting the self-cleansing capacity of the atmosphere, as the consumed OH cannot deplete other atmospheric constituents such as methane anymore. Hence, CO can be interpreted as an indirect agent of climate change because it is affecting concentrations of direct greenhouse gases.
Methane (CH4) is an important long-lived anthropogenically released greenhouse gas. It is second only to carbon dioxide (CO2), which accounts for the largest share of radiative forcing caused by human activities since 1750. CH4 is less abundant in the atmosphere than CO2, but it has a considerably higher global warming potential per unit mass. An accurate understanding of the sources and sinks of CH4 is indispensable to reliably predict future climate. Due to its relatively long atmospheric perturbation lifetime (budget lifetime multiplied by feedback factor) of about 12 years (Prather et al., 2012; Holmes et al., 2013), CH4 is well-mixed in the atmosphere and the signals in question are typically only small variations on top of large background concentrations. Therefore, requirements for the precision and accuracy of atmospheric CH4 measurements are demanding (Bergamaschi et al., 2009; World Meteorological Organization, 2006, 2011).
Detailed and continuous observations with global coverage of both gases are needed to improve our understanding of the climate system, tropospheric chemistry, and atmospheric transport processes. This objective can only be achieved using satellite instruments. Several spaceborne instruments have been measuring CO and CH4 on a global scale up to now, including the Atmospheric Infrared Sounder (AIRS) (McMillan et al., 2005; Xiong et al., 2008), the Tropospheric Emission Spectrometer (TES) (Luo et al., 2015; Worden et al., 2012), and the Infrared Atmospheric Sounding Interferometer (IASI) (Clerbaux et al., 2009), which observe emissions in the thermal infrared (TIR) and are mainly sensitive to mid- to upper-tropospheric abundances. For CO this category is expanded by the Measurement of Pollution in the Troposphere (MOPITT) instrument, which combines observations of spectral features in the TIR and in the shortwave infrared (SWIR), increasing surface-level sensitivity in some scenes (Worden et al., 2010).
Nearly equal sensitivity to all altitude levels, including the boundary layer, can be achieved from radiance measurements of reflected solar radiation in the SWIR part of the spectrum. This was first demonstrated by retrievals from the SCanning Imaging Absorption spectroMeter for Atmospheric CHartographY (SCIAMACHY) instrument (Burrows et al., 1995; Bovensmann et al., 1999) onboard ENVISAT for CO (Buchwitz et al., 2004; de Laat et al., 2010) and CH4 (Buchwitz et al., 2005; Frankenberg et al., 2006) in the 2.3 or 1.6 µm spectral range. ENVISAT was launched in 2002 and the end of the mission was declared after 10 years in orbit due to unexpected loss of contact with the satellite in 2012. The Thermal And Near infrared Sensor for carbon Observations Fourier-Transform Spectrometer (TANSO-FTS) onboard the Greenhouse gases Observing SATellite (GOSAT) (Kuze et al., 2009), which was launched in 2009, also yields atmospheric CH4 with high near-surface sensitivity but with a fairly sparse spatial sampling interval of about 160 km in five-point across-track mode between its 10 km diameter circular footprints. Its successor GOSAT-2 (launched in 2018) has an extended spectral range and is designed to additionally measure CO.

The launch of the Sentinel-5 Precursor (Sentinel-5P) satellite in October 2017 with the TROPOspheric Monitoring Instrument (TROPOMI) onboard (Veefkind et al., 2012) can be considered a game changer for the determination of atmospheric composition from space. TROPOMI is a spaceborne nadir-viewing imaging spectrometer measuring solar radiation reflected by the Earth in a push-broom configuration. It has a swath width of 2600 km and allows for the analysis of several atmospheric species with an unprecedented level of detail by combining high precision and spatial resolution with daily global coverage. TROPOMI measures radiances between the ultraviolet (UV) and the shortwave infrared (SWIR) in eight bands. The characteristics of the TROPOMI NIR and SWIR bands are summarised in Table 1. CO and CH4 can be retrieved from radiance measurements in TROPOMI's SWIR bands. For these bands, the spatial resolution of the nadir measurements is typically 7 km × 7 km, which is almost 40 times finer than for SCIAMACHY. In contrast to TANSO, the imaging capabilities of TROPOMI provide 3 orders of magnitude more measurements without gaps, thus facilitating real global maps of CO and CH4 in a short time. The unique combination of high precision, spatiotemporal resolution, and coverage enables new fields of application. As large sources are readily detected in a single overpass, emission monitoring and air quality assessments are only two examples of the new prospects TROPOMI offers. The first applications concerning CO and CH4 have already been highlighted and demonstrated in recent publications (Borsdorff et al., 2018, 2019; Schneising et al., 2019).
As in the fields of weather and climate modelling, ensemble approaches have recently acquired an increased importance in the context of satellite observations, aiming at benefitting from a larger range of possible realisations of different physical aspects (Reuter et al., 2013) or to analyse to what extent specific geophysical findings depend on the particular characteristics of an algorithm or instrument (Buchwitz et al., 2017). Along these lines, it is worthwhile to have a set of distinct retrieval algorithms for each analysed atmospheric constituent at hand.
Here we introduce a scientific algorithm to retrieve CO and CH4 simultaneously from TROPOMI that has the objective of complementing the operational algorithms in the sense described above and to provide new geophysical insights, whilst performing within the mission requirements concerning random and systematic errors at the same time. The presented scientific algorithm differs from the operational algorithms in several respects (Hu et al., 2016) (see also Sect. 4.1 for a summary of the differences), and the corresponding products are thus predestined to be used together with the operational products in an ensemble approach. After a thorough description of the algorithm, including error characteristics based on synthetic data and validation with independent reference data, we present the first results of our new algorithm for both trace gases, demonstrating broad consistency with the operational products for example cases and the potential to advance the new application fields, for which TROPOMI's groundbreaking features pave the way.

WFM-DOAS retrieval algorithm
The Weighting Function Modified Differential Optical Absorption Spectroscopy (WFM-DOAS) algorithm (Buchwitz et al., 2006, 2007; Schneising et al., 2011, 2014) is a linear least-squares method based on scaling (or shifting) preselected atmospheric vertical profiles. The vertical columns of the desired gases are determined from the measured sun-normalised radiance by fitting a linearised radiative transfer model to it. A concise mathematical algorithm description and the key settings and adjustments for the simultaneous CO and CH4 retrieval from TROPOMI's radiance measurements are summarised in the following subsections. The data products are based on TROPOMI Level 1b V01.00.00 files comprising spectra from the nominal operational mode, which started at the end of April 2018, and reprocessed spectra from the previous 6-month commissioning phase. The corresponding version is referred to as TROPOMI/WFMD (or WFMD in abbreviated form) v1.2.

Forward model
The forward model is derived from the radiative transfer model SCIATRAN (Rozanov et al., 2002, 2014) in pseudospherical atmosphere mode. To enable a fast retrieval, a lookup table scheme for the radiances and their derivatives has been implemented, containing 17 280 reference spectra for varying solar zenith angle, altitude, albedo, water vapour, and temperature. The reference spectra are computed with high spectral resolution in line-by-line mode and subsequently convolved to the TROPOMI spectral resolution of the SWIR bands using an instrument-specific fixed spectral response function extracted from the TROPOMI ISRF Calibration Key Data v1.0.0 for nadir at 2338 nm. The auxiliary input data include US Standard Atmosphere profiles with methane scaled to 1850 ppb, the SCIATRAN aerosol model using the background scenario described in Schneising et al. (2008, 2009), and HITRAN 2016 spectroscopic parameters (Gordon et al., 2017).
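The convolution of a high-resolution reference spectrum to instrument resolution can be illustrated with a minimal sketch. It assumes a Gaussian response function with a hypothetical full width at half maximum; the retrieval itself uses the tabulated TROPOMI response function from the calibration key data, which is not reproduced here. All function names and numbers are illustrative.

```python
import math

def gaussian_isrf(delta_nm, fwhm_nm):
    """Unnormalised Gaussian response at spectral distance delta_nm (assumed shape)."""
    sigma = fwhm_nm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.exp(-0.5 * (delta_nm / sigma) ** 2)

def convolve_to_instrument(wl_hi, rad_hi, wl_out, fwhm_nm):
    """Convolve a high-resolution spectrum onto a coarser instrument grid.

    wl_hi, rad_hi: wavelengths (nm) and radiances on an equidistant fine grid
    wl_out:        instrument wavelengths (nm)
    """
    result = []
    for centre in wl_out:
        weights = [gaussian_isrf(w - centre, fwhm_nm) for w in wl_hi]
        norm = sum(weights)  # normalise so that a flat spectrum stays flat
        result.append(sum(wt * r for wt, r in zip(weights, rad_hi)) / norm)
    return result

# Hypothetical example: flat continuum with one narrow absorption line at 2338 nm.
wl_hi = [2337.0 + 0.01 * i for i in range(201)]
rad_hi = [1.0 - 0.8 * math.exp(-0.5 * ((w - 2338.0) / 0.02) ** 2) for w in wl_hi]
wl_out = [2337.5, 2338.0, 2338.5]
rad_out = convolve_to_instrument(wl_hi, rad_hi, wl_out, fwhm_nm=0.25)
```

The narrow line becomes broader and shallower after the convolution, which is why reference spectra have to be brought to instrument resolution before they are compared with measurements.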

Inversion procedure
The linearised radiative transfer model (appropriately chosen from the lookup table according to the relevant parameters) plus a low-order polynomial is linear least-squares fitted to the logarithm of the measured sun-normalised radiance. The trace gas vertical profiles (CH4, CO, H2O) are scaled for the fit (i.e. the profile shape is not varied). Additional fit parameters are the shift of a preselected temperature profile, a scaling factor for the pressure profile, and parameters for a 2nd-order polynomial. Let $m \in \mathbb{N}$ be the number of spectral points in the fitting window and $n \in \mathbb{N}$ the number of state vector elements (fit parameters) with $m > n$. The modelled radiance at wavelength $\lambda$ is given by

$$\ln I_\lambda^{\mathrm{mod}}(\mathbf{v}, \mathbf{a}) = \ln I_\lambda^{\mathrm{mod}}(\bar{\mathbf{v}}) + \sum_{j} \left. \frac{\partial \ln I_\lambda^{\mathrm{mod}}}{\partial v_j} \right|_{\bar{\mathbf{v}}} (v_j - \bar{v}_j) + P_\lambda(\mathbf{a})$$

with state vector $\mathbf{v}$, linearisation point $\bar{\mathbf{v}}$, and polynomial coefficients $\mathbf{a}$ of 2nd-order polynomial $P$. A derivative with respect to a vertical column thereby refers to the change in the top-of-atmosphere radiance caused by a scaling of a preselected absorber concentration vertical profile. There are $m$ equations of this type, one for each detector pixel in the fitting window. The objective is to find the optimal state so that the linear model best fits the observed radiance. This problem can be rewritten as

$$\mathbf{y} = \mathbf{A}\mathbf{x} + \boldsymbol{\varepsilon}$$

with (log-)radiance difference $\mathbf{y} \in \mathbb{R}^m$ of the measurement and linearised model due to a deviation $\mathbf{x} \in \mathbb{R}^n$ of the state vector from the multidimensional linearisation point, weighting function (Jacobian) matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ (with derivatives at the linearisation point and polynomial basis functions as columns) and the sum of forward model error and (normally distributed log-transformed) instrument noise $\boldsymbol{\varepsilon} \in \mathbb{R}^m$. The covariance matrix associated with measurement noise is given by $\mathbf{C}_y = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_m^2) \in \mathbb{R}^{m \times m}$.
To give larger weight to spectral points with smaller error variances and to obtain error estimates of the retrieval parameters via error propagation from the uncorrelated measurement errors $\sigma_i$, a weighted least-squares approach is applied with a matrix of weights defined by $\mathbf{W} = \mathbf{C}_y^{-1}$. With the posterior probability $p(\mathbf{x} \mid \mathbf{y})$ of $\mathbf{x}$ given $\mathbf{y}$, the most probable inference of the inversion $\hat{\mathbf{x}} = \arg\max_{\mathbf{x} \in \mathbb{R}^n} p(\mathbf{x} \mid \mathbf{y})$ is obtained by minimising

$$\chi^2 = (\mathbf{y} - \mathbf{A}\mathbf{x})^T \, \mathbf{W} \, (\mathbf{y} - \mathbf{A}\mathbf{x})$$

with respect to $\mathbf{x}$, where $T$ is the matrix transpose. Hence,

$$\mathbf{A}^T \mathbf{W} \mathbf{A} \, \hat{\mathbf{x}} = \mathbf{A}^T \mathbf{W} \mathbf{y}$$

provides the solution $\hat{\mathbf{x}} = \mathbf{C}_x \mathbf{A}^T \mathbf{W} \mathbf{y}$ of the inverse problem, where $\mathbf{C}_x = (\mathbf{A}^T \mathbf{W} \mathbf{A})^{-1}$ is the covariance matrix of solution $\hat{\mathbf{x}}$. The errors of the retrieval parameters are estimated by

$$\hat{\sigma}_{x_j} = \sqrt{(\mathbf{C}_x)_{jj}}$$

Due to the potential non-linear dependencies of the radiances with respect to water vapour and temperature within their natural variability, the algorithm treats both parameters iteratively. The algorithm starts with lookup table elements representing US Standard Atmosphere water vapour amount and temperature. If the retrieved parameter pair after the fit is closer to another lookup table element, the process is repeated with the corresponding reference spectrum. Usually convergence is achieved after one iteration step.
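The weighted least-squares solution and its error estimate can be illustrated with a small, dependency-free sketch. It is not the retrieval code: the weighting function, wavelength scale, noise values, and the function name `weighted_least_squares` are hypothetical, and the toy state vector has only three elements (one absorber scaling plus a linear polynomial).

```python
def weighted_least_squares(A, y, sigma):
    """Minimise (y - A x)^T W (y - A x) with W = diag(1 / sigma_i^2).

    A is a list of m rows with n entries each (m > n).
    Returns the estimate x_hat and the covariance C_x = (A^T W A)^-1.
    """
    m, n = len(A), len(A[0])
    w = [1.0 / s ** 2 for s in sigma]
    # Normal-equation matrix N = A^T W A and right-hand side b = A^T W y.
    N = [[sum(w[i] * A[i][j] * A[i][k] for i in range(m)) for k in range(n)]
         for j in range(n)]
    b = [sum(w[i] * A[i][j] * y[i] for i in range(m)) for j in range(n)]
    # Invert N by Gauss-Jordan elimination with partial pivoting.
    aug = [N[j][:] + [1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    C_x = [row[n:] for row in aug]
    x_hat = [sum(C_x[j][k] * b[k] for k in range(n)) for j in range(n)]
    return x_hat, C_x

# Hypothetical toy problem: one weighting function plus a linear polynomial.
wf = [0.0, -0.5, -1.0, -0.3, 0.0, -0.8]      # d ln I / d (column scaling)
lam = [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]      # scaled wavelength
A = [[wf[i], 1.0, lam[i]] for i in range(6)]
x_true = [0.05, 0.01, -0.02]
y = [sum(a * x for a, x in zip(row, x_true)) for row in A]  # noise-free
sigma = [0.01, 0.01, 0.02, 0.01, 0.01, 0.02]
x_hat, C_x = weighted_least_squares(A, y, sigma)
errors = [C_x[j][j] ** 0.5 for j in range(3)]  # 1-sigma parameter errors
```

For noise-free input the fit returns the true deviation, while the diagonal of the covariance matrix still gives the errors that the assumed measurement noise would propagate into each parameter.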
As the lookup table only covers direct nadir conditions to limit its dimension to a reasonable size, a geometric path length correction has been implemented to remove the path extension and associated enhancement of the retrieved vertical columns for off-nadir conditions with a non-vanishing viewing zenith angle.
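The text does not give the formula of the geometric path length correction. The following sketch assumes a simple plane-parallel two-way light path, in which the retrieved column is multiplied by the ratio of the nadir path to the actual slant path. Both this air mass formulation and the function name are assumptions made for illustration.

```python
import math

def geometric_path_correction(sza_deg, vza_deg):
    """Factor that maps an off-nadir column to its nadir equivalent.

    Assumes a plane-parallel geometric light path (sun to surface to sensor).
    """
    mu0 = math.cos(math.radians(sza_deg))   # cosine of solar zenith angle
    mu = math.cos(math.radians(vza_deg))    # cosine of viewing zenith angle
    nadir_path = 1.0 / mu0 + 1.0            # path assumed in the lookup table
    slant_path = 1.0 / mu0 + 1.0 / mu       # path of the actual measurement
    return nadir_path / slant_path

factor_nadir = geometric_path_correction(50.0, 0.0)
factor_off_nadir = geometric_path_correction(50.0, 30.0)
```

Under this assumption the factor equals one for direct nadir viewing and decreases with increasing viewing zenith angle, which removes the enhancement caused by the longer light path.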
The spectral fitting windows in TROPOMI band 7 were optimised to retrieve CO and CH4 simultaneously as accurately as possible (determined by an error analysis based on simulated measurements). They are shown in Fig. 1 together with the absorption features of the relevant trace gases. Note that CO is a much weaker absorber compared to CH4 and H2O. The apparent albedo is retrieved in the preprocessing by comparison of the measured continuum radiance with precalculated values from a lookup table. Cloud information is obtained from strong H2O absorption lines in band 8 (see Fig. 1) by comparing the measured radiances to reference radiances for cloud-free conditions. As the absorption in these lines is strong, the measured radiance is small in the clear-sky case. In the presence of clouds, most of the atmospheric H2O is shielded and the measured backscattered radiance coherently increases. The corresponding ratio r_cld of measured to reference radiance for the selected strong absorption lines is thus an indicator of cloud contamination.
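As a minimal sketch of the cloud indicator, the ratio can be formed from the radiances in the selected strong water vapour lines. How the individual lines are combined is not specified in the text; summing the radiances over the lines before taking the ratio is an assumption, as are the function name and the numbers.

```python
def cloud_ratio(measured, reference):
    """Ratio of measured to clear-sky reference radiance in strong H2O lines.

    Both inputs are radiances at the same selected spectral points; the
    radiances are summed over these points before the ratio is formed.
    """
    return sum(measured) / sum(reference)

# Hypothetical radiances (arbitrary units) at two strong absorption lines.
clear_reference = [0.010, 0.015]
r_clear = cloud_ratio([0.010, 0.015], clear_reference)
r_cloudy = cloud_ratio([0.020, 0.030], clear_reference)
```

A value close to one is consistent with a cloud-free scene, whereas shielding of the water vapour below a cloud raises the measured radiance and therefore the ratio.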

Sensitivity and error analysis using synthetic data
The sensitivity of the retrievals to different atmospheric layers is demonstrated by the vertical column averaging kernels (Fig. 2). Compared to measurements in the thermal infrared spectral region, which are primarily sensitive to mid- or upper-tropospheric gas abundances in the absence of high thermal contrast, the advantage of the shortwave infrared spectral region is the sensitivity to all altitude levels, including the boundary layer, which is important to analyse emissions originating from the Earth's surface.
As described in the previous subsection, the retrieval noise is determined via error propagation from the measurement noise. To assess the theoretical precision performance, we assume a simple shot noise-limited noise model, which is defined in the following way: the reference signal-to-noise ratio is SN_ref = 100 in the continuum (radiance L_ref = 4.3 × 10^11 phot s^-1 cm^-2 nm^-1 sr^-1) for a dark scene (albedo = 0.05) with low sun (solar zenith angle of 70°) and is scaled according to

$$\mathrm{SN} = \mathrm{SN}_{\mathrm{ref}} \sqrt{L / L_{\mathrm{ref}}}$$

for other radiances L. The resulting absolute precision is largely independent of the current concentrations. For US Standard Atmosphere values, the corresponding relative retrieval noise for different albedos and solar zenith angles is shown in Fig. 3. It is below 1 % for solar zenith angles smaller than 75° and albedos larger than 0.03 in the case of CH4. As the CO absorption is considerably weaker than the CH4 absorption, the CO retrieval exhibits larger relative noise, which is below 8 % for albedos larger than 0.03.
The analysis of systematic errors is performed using simulated measurements. That means that for different scenarios defined by specific atmospheric conditions, radiances and irradiances are calculated with the radiative transfer model, which are subsequently used as measurement input in the retrieval. The errors are then defined as the deviation of the retrieved from the true quantities. The corresponding results for several scenarios are summarised in Table 2. All scenarios already include interpolation between different wavelength grids (for measured and reference spectra) unless otherwise stated.
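The shot noise-limited noise model used for the precision estimate can be written down in a few lines. The sketch assumes the square-root scaling of the signal-to-noise ratio with radiance that characterises a shot noise-limited detector, and it approximates the noise of the log-radiance by the inverse signal-to-noise ratio; the function names are illustrative.

```python
import math

SN_REF = 100.0    # reference signal-to-noise ratio in the continuum
L_REF = 4.3e11    # reference radiance in phot s^-1 cm^-2 nm^-1 sr^-1

def signal_to_noise(radiance):
    """Shot-noise scaling: SN grows with the square root of the radiance."""
    return SN_REF * math.sqrt(radiance / L_REF)

def log_radiance_noise(radiance):
    """Approximate 1-sigma noise of ln(radiance), valid for SN >> 1."""
    return 1.0 / signal_to_noise(radiance)

sn_bright = signal_to_noise(4.0 * L_REF)   # four times the reference radiance
sigma_ref = log_radiance_noise(L_REF)
```

Values of this kind would populate the diagonal of the measurement covariance matrix that enters the error propagation of the fit.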
The analysis includes basic scenarios testing if perturbations of the state vector elements can be retrieved, quantifying lookup table interpolation errors, and analysing errors caused by off-nadir conditions. In order to examine the sensitivity to vertical profile variations, the scenario class of profiles includes several realistic model atmospheres based on measurements and theoretical predictions (Anderson et al., 1986), with all methane profiles scaled to have surface values of 1850 ppb in each case to better represent current atmospheric conditions. The respective atmospheres differ from the US Standard Atmosphere with respect to temperature, pressure, water vapour, carbon monoxide, and methane profiles (see Appendix A of Anderson et al. (1986) for a visualisation of the different vertical profiles). These scenarios are more difficult to deal with than the basic ones because the perturbations are not consistent with the scaling assumption; i.e. they include proper variations of the profile shape.
Also examined is the sensitivity to the spectral albedo of the natural surface types shown in Fig. 4 taken from the Advanced Spaceborne Thermal Emission Reflection Radiometer (ASTER) and United States Geological Survey (USGS) spectral libraries. The analysed aerosol scenarios are largely described in Schneising et al. (2008, 2009), with aerosol type definitions in the different atmospheric layers based on Optical Properties of Aerosols and Clouds (OPAC) (Hess et al., 1998). The retrieval errors due to undetected subvisual clouds are also investigated for different ice and water clouds.
This gives an impression of the magnitude of errors one can expect, assuming that thick clouds can be filtered out by cloud screening in the preprocessing or post-processing: typical systematic retrieval errors are below 1 % for methane and below 2 % for carbon monoxide, even for challenging scenarios. Larger systematic errors in the case of thick clouds are expected because clouds are not explicitly considered in the forward model of the retrieval algorithm to retain the high processing speed. Therefore, the systematic biases due to clouds are further analysed in more detail. The results for water and ice clouds at different heights are summarised in Fig. 5. Thereby, clouds are modelled as a layer of 1 km vertical extent consisting of water droplets with an effective radius of 10 µm or fractal ice crystals with an edge length of 100 µm. The analysis is performed for three different cloud types: two water clouds with cloud-top heights (CTH) of 2 and 4 km and an ice cloud with CTH of 10 km.
As expected, the absolute value of systematic errors typically increases with increasing cloud optical thickness, increasing cloud-top height, increasing solar zenith angle, and decreasing albedo. In most cases, there is a considerable underestimation of the vertical column in the case of thick clouds. However, there are also conditions under which the absolute value of the error is small even at a cloud optical thickness of τ = 1 or occasionally turns to an overestimation for measurements over bright surfaces. Overall the systematic errors due to clouds are qualitatively similar for CO and CH4.
The error analysis based on synthetic data has shown that perturbations of the state vector elements can be clearly retrieved and that the algorithm is theoretically suitable to successfully retrieve carbon monoxide and methane from real TROPOMI data for cloud-free scenes. In the case of thick clouds, systematic errors can become rather large, confirming that an efficient cloud-screening algorithm is necessary, in particular to meet the demanding requirements for the precision and accuracy of atmospheric CH4 measurements. An appropriate quality filter is implemented in the post-processing and described in Sect. 2.5.2. For CO it may be possible to relax the filter due to the less stringent requirements, but for now we employ a joint quality filter for both simultaneously retrieved trace gases.

Table 2. Error analysis for different scenarios. Standard settings are direct nadir, sea level, solar zenith angle 50°, albedo 0.1, and US Standard Atmosphere. Scenarios with ⊕ include scaling of the CH4 and CO profiles by 10 %; for scenarios with the sensor zenith angle is set to 30° (relative azimuth 60°). Standard cirrus clouds are located between 11 and 12 km (cloud optical thickness τ = 0.03), consisting of fractal ice crystals with an edge length of 100 µm. Standard cumulus clouds are located between 3 and 4 km (τ = 0.03), consisting of water droplets with an effective radius R of 10 µm.

High-resolution auxiliary data
As a consequence of the high spatial resolution of the TROPOMI SWIR measurements, a digital elevation model required for the selection and interpolation of suitable precalculated reference spectra and a land cover characterisation data set necessary to provide land fraction and surface type as additional information have to be implemented in high resolution. For this purpose, the Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010) and the Global Land Cover Characterization (GLCC) of the United States Geological Survey (USGS) (United States Geological Survey, 2018a, b) are used with a resampled resolution of 0.05° (about 5 km at the Equator) to compute surface elevation, land fraction, and dominating surface type (Biosphere Atmosphere Transfer Scheme Legend) for every sounding of the satellite. Some incorrect values of zero elevation in the GMTED2010 data set over the Caspian Sea and Lake Superior have been replaced with corresponding Global 30 Arc-Second Elevation (GTOPO30) values (United States Geological Survey, 2018c). Figure 6 demonstrates the resolution of the implemented elevation and surface type data sets using the example of Europe.

Post-processing

Column-averaged dry air mole fractions
In order to convert the retrieved vertical columns into column-averaged dry air mole fractions (denoted XCO and XCH4), the columns are divided by the dry air column obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF) analysis. Thereby, the ECMWF dry columns are corrected for the actual surface elevation of the individual TROPOMI measurements (based on the deviation from the mean altitude of the coarser model grid), inheriting the high spatial resolution of the satellite data.
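The conversion from vertical columns to mole fractions can be sketched as follows. The division by the dry air column follows the text; the elevation correction is shown with a simple barometric scaling and an assumed pressure scale height, which is only one plausible way to implement the adjustment and is not taken from the source. All names and numbers are hypothetical.

```python
import math

SCALE_HEIGHT_M = 8500.0   # assumed pressure scale height (illustrative value)

def elevation_corrected_dry_column(model_dry_column, model_altitude_m,
                                   scene_altitude_m):
    """Scale the model dry air column to the scene altitude (barometric assumption)."""
    delta = scene_altitude_m - model_altitude_m
    return model_dry_column * math.exp(-delta / SCALE_HEIGHT_M)

def mole_fraction_ppb(gas_column, dry_air_column):
    """Column-averaged dry air mole fraction in ppb (columns in the same units)."""
    return gas_column / dry_air_column * 1.0e9

# Hypothetical columns in molecules per cm^2.
model_dry_column = 2.1e25
xch4 = mole_fraction_ppb(3.9e19, model_dry_column)

# A scene 300 m above the mean altitude of the model grid cell has less air above it.
corrected = elevation_corrected_dry_column(model_dry_column, 500.0, 800.0)
```

Because the gas column and the dry air column both refer to the same scene, a correct elevation adjustment prevents topography within a model grid cell from appearing as spurious mole fraction variability.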
An analysis based on simulated measurements has indicated that this approach is superior to a normalisation by simultaneously retrieved oxygen (O2 A band) from TROPOMI band 6 for off-nadir conditions and/or in the presence of strong scatterers in the atmosphere (aerosol, clouds) as a consequence of the spectral distance in combination with the albedo differences of natural surface types between NIR band 6 and SWIR band 7 (see Fig. 4). For these reasons, O2 is a barely sufficient proxy for the light path in the 2.3 µm spectral range in a scattering atmosphere. For example, the O2 errors for the scattering scenarios aerosols (extreme in boundary layer) and clouds (cirrus) from Table 2 are −5.40 % and −7.54 %, respectively. Hence, the O2 underestimations are considerably larger than the corresponding errors for CH4 and CO, which would lead to distinct overestimations of mole fractions obtained from the O2-proxy approach in the presence of strong scatterers.
In addition to the better accuracy of the ECMWF-based mole fraction computation, this approach is also faster because the oxygen fit and the interband coregistration mapping can be omitted. As a consequence, the fitting procedure is about twice as fast without the normalisation by O2. The out-of-spectral-band stray light issue of the TROPOMI band 6 would potentially further hamper the O2-proxy approach.

Quality filter
To enable a fast processing speed to handle the huge amount of TROPOMI data, the lookup table is limited to rather simple physical conditions (e.g. cloud-free scenes). Thus, a quality-screening algorithm excluding measurements not sufficiently characterised by the forward model had to be implemented. First of all, challenging conditions with solar zenith angles larger than 75°, which are increasingly prone to scattering and saturation-related issues due to the weakening signal and lengthening of the light path, are cut off. To be independent of other data sets and their ongoing availability, the filter was designed to rely on parameters directly included in the retrieval output. This was achieved by using a machine-learning approach based on a random forest classifier, which is a meta estimator growing many independent decision trees on different subsamples of the data set and using averaging to improve the predictive accuracy and prevent overfitting. Each tree of the ensemble is grown in the following way (Breiman, 2001):
1. A bootstrap sample is drawn with replacement from the training set and used to grow the tree.
2. At each node, f ≪ F features are randomly chosen out of the F input variables, and the best split according to the minimisation of Gini impurity on these f features is used to split the node (Breiman, 1996a). The value of f is held constant during the forest growing.
3. There is no pruning of the decision trees; i.e. each tree is grown to the largest possible extent.
To classify a new previously unseen measurement after growing the forest with the training data, each decision tree gives a classification according to the input features of the measurement, and the forest chooses the majority vote over all trees in the forest. The combination of the tree results, each based on different bootstrap replicates of the learning set, is called bootstrap aggregating or bagging (Breiman, 1996b). The forest error rate depends on the correlation between the trees in the forest and the strength of the individual trees. The forest error rate decreases with decreasing correlation and increasing strength of the trees. Reducing f reduces both the correlation and the strength, while increasing f increases both. Hence, there is an optimal range of f that minimises the forest error rate.
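The building blocks of the random forest described above (bootstrap replicates, Gini-based splits on a random feature subset, and majority voting) can be sketched with the standard library alone. This is an illustration of the general technique after Breiman, not the classifier used for the quality filter; the function names and the toy data are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, n_candidates, rng):
    """Best (score, feature, threshold) among n_candidates random features.

    The split minimises the size-weighted Gini impurity of the child nodes.
    """
    features = rng.sample(range(len(rows[0])), n_candidates)
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best

def bootstrap_sample(rows, labels, rng):
    """Draw a bootstrap replicate (sampling with replacement) of the training set."""
    idx = [rng.randrange(len(rows)) for _ in rows]
    return [rows[i] for i in idx], [labels[i] for i in idx]

def majority_vote(tree_predictions):
    """Forest decision: the most frequent class among the tree votes."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Toy data: feature 0 separates the classes, feature 1 does not.
rows = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
labels = [0, 0, 1, 1]
rng = random.Random(0)
split = best_split(rows, labels, n_candidates=2, rng=rng)
boot_rows, boot_labels = bootstrap_sample(rows, labels, rng)
vote = majority_vote([1, 0, 1])
```

Choosing fewer candidate features per node makes the individual trees weaker but less correlated, which is the trade-off behind the optimal range of f mentioned in the text.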
The training subset consists of 80 million measurements, which are classified based on cloud information from the Visible Infrared Imaging Radiometer Suite (VIIRS) onboard Suomi NPP (Hutchison and Cracknell, 2005), which flies in loose formation configuration with Sentinel-5 Precursor (S5P trails behind by 3.5 min). This classification is augmented by additionally flagging distinct XCH4 deviations relative to a climatology consisting of averages on a 6° × 4° grid for the years 2003-2005 based on the MACC-II flux inversion system (Bergamaschi et al., 2013) and adjusted by an accumulated increase until the time of the measurement based on globally averaged marine NOAA surface data (Dlugokencky, 2018), identifying scenes obviously not well-characterised by the forward model, in particular conspicuously decreased methane abundances in the presence of clouds due to shielding of the underlying atmosphere or in the case of very low surface reflectances.
To train the forest, a set F of 25 feature variables is selected by feature ranking with recursive feature elimination and cross-validated selection of the best features. As widely used, 20 % of the training data is randomly drawn and retained as test data (e.g. Suthaharan, 2016; Hino et al., 2018). The corresponding predictive accuracy is shown in Fig. 7 as a function of the selected features, confirming that the random forest does not overfit because the accuracy has its global maximum when using all 25 features as listed below. The selected variables in order of importance are (1) H2O column difference to the ECMWF, (2) cloud parameter r_cld, (3) simplified surface type (water, coastal, land, desert, ice), (4) linear polynomial coefficient p_1, (5) pressure difference to the ECMWF, (6) altitude, (7) latitude, (8) CO fit error, (9) temperature, (10) root mean square of fit residual, (11) temperature difference to the ECMWF, (12) H2O fit error, (13) pressure fit error, (14) H2O column, (15) longitude, (16) solar zenith angle, (17) pressure, (18) quadratic polynomial coefficient p_2, (19) radiance ratio of strong H2O absorption to continuum, (20) dry air column from the ECMWF, (21) retrieved apparent albedo, (22) continuum radiance, (23) relative azimuth angle, (24) across-track dimension index, and (25) strong H2O absorption radiance. The predictive accuracy when using all 25 features amounts to 0.983, which means that 98.3 % of all scenes are correctly classified.
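The predictive accuracy quoted above, and the class-wise rates reported with the confusion matrix in Fig. 7, all follow from the four cell counts of a binary confusion matrix. The sketch below uses hypothetical counts, not the values of Fig. 7, and treats the good measurements (class 0) as the class of interest; the function name is illustrative.

```python
def classification_metrics(tp, fn, fp, tn):
    """Metrics for the 'good' class from a binary confusion matrix.

    tp: good scenes predicted good      fn: good scenes predicted bad
    fp: bad scenes predicted good       tn: bad scenes predicted bad
    """
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "recall": recall,
        "false_negative_rate": 1.0 - recall,
        "precision": precision,
        "false_discovery_rate": 1.0 - precision,
    }

# Hypothetical counts for an unbalanced data set with 10 % good scenes.
metrics = classification_metrics(tp=90, fn=10, fp=10, tn=890)
```

For an unbalanced data set the overall accuracy is dominated by the majority class, so the false negative rate and false discovery rate of the good class are the more informative measures of filter quality.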
Figure 7. Cross-validated predictive accuracy of the quality classification random forest obtained by recursive feature elimination. The confusion matrix when using all 25 features is shown on the right-hand side for the test data set, denoting good observations with 0 and measurements to be excluded with 1. The green diagonal cells correspond to correct classifications and the red off-diagonal cells to incorrect classifications. The number of scenes and the percentage of the total number of scenes are given in each cell. Key parameters are summarised in the grey cells along the edge. The right column shows the percentages of all the elements belonging to each class that are correctly (recall) and incorrectly (false negative rate) classified and the bottom row those that are predicted to belong to each class that are correctly (precision) and incorrectly (false discovery rate) classified. The dark grey cell in the bottom right corner displays the overall accuracy.

A more detailed analysis of the predictive power of the random forest can be obtained from the confusion matrix of the test data set (also shown in Fig. 7). As can be seen, the data set is unbalanced, with barely 10 % belonging to class 0, denoting the good measurements. This is primarily due to the large amount of cloudy scenes in combination with the issue that mainly sun glint and glitter scenes are classified as good over the ocean and inland waters as a consequence of the weak signal ascribed to the low reflectances of these dark surfaces (see Fig. 4). For land scenes the fraction of good observations accordingly increases to about 20 %. The percentage of all good measurements that are incorrectly excluded (false negative rate of class 0) amounts to about 13 % (9 % for land scenes). For these cases the filter is too strict, but the quality of the data passing the filter is not compromised. The percentage of all the measurements predicted to be good that are incorrectly classified and should actually be excluded (false discovery rate of class 0) amounts to about 11 %. For these cases the filter appears not stringent enough. However, as the training classification is quite strict, that does not necessarily mean that all these measurements are actually of low quality. The rate can rather be interpreted as an upper bound of potentially remaining challenging retrievals on the verge of sufficient characterisation by the forward model, e.g. observations near cloud edges. The effective diagnostic performance of the quality filter will emerge from the validation.
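The rates discussed above follow directly from the four cells of a binary confusion matrix. The following minimal sketch shows the arithmetic for class 0 (good measurements). The counts are hypothetical, chosen only so that the resulting rates roughly resemble the quoted values; they are not the numbers shown in Fig. 7, and the function name is illustrative.

```python
def class0_metrics(tp, fn, fp, tn):
    """Summary rates for class 0 ("good") from binary confusion-matrix counts.

    tp: good scenes predicted good     fn: good scenes predicted to be excluded
    fp: bad scenes predicted good      tn: bad scenes predicted to be excluded
    """
    total = tp + fn + fp + tn
    recall = tp / (tp + fn)        # share of all good scenes that pass the filter
    precision = tp / (tp + fp)     # share of passing scenes that are actually good
    return {
        "recall": recall,
        "false_negative_rate": 1.0 - recall,
        "precision": precision,
        "false_discovery_rate": 1.0 - precision,
        "accuracy": (tp + tn) / total,
    }


# Hypothetical counts per 10 000 scenes (assumption, not taken from Fig. 7)
metrics = class0_metrics(tp=609, fn=91, fp=75, tn=9225)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that a strongly unbalanced data set can combine a high overall accuracy with noticeably larger class-specific error rates, which is why the false negative and false discovery rates are reported separately.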
Adding further parameters to F does not significantly improve the predictive accuracy. It is important to note that the resulting classification is independent of the absolute abundances of the primary retrieval parameters CH4 and CO. The performance of the classification algorithm is demonstrated in Fig. 8, confirming that cloudy scenes are reliably excluded in general and that the quality filter is usually stricter than the VIIRS classification, in particular over the weakly reflecting ocean. Measurements classified as cloudy by VIIRS but still passing the quality filter are rare and not associated with conspicuous methane abundances.

Shallow learning calibration for methane
The implemented machine-learning-based quality filter described in the previous subsection removes observations not sufficiently characterised by the forward model. Although this procedure typically excludes scenes exhibiting large systematic errors, smaller systematic errors may remain in the residual data set. In particular, there seems to be a systematic albedo dependence of unknown origin of retrieved methane abundances with an underestimation over dark surfaces. As a consequence of the fairly stringent quality requirements for methane, a random forest regressor algorithm was implemented to reduce the remaining systematic methane errors after the retrieval by calibrating against an assumed standard defined below, which is deemed insensitive to surface reflectance variations.
Like the classification algorithm described in the previous subsection, the random forest regressor (Criminisi and Shotton, 2013) grows an ensemble of decision trees, training each tree on a different data sample by applying the bootstrap aggregating technique. At each node, the split that maximises the variance reduction in the child nodes is selected from f randomly chosen parameters. To focus on the most prominent features (shallow learning of systematic errors caused by surface albedo variations), the tree growing is limited to 500 leaf nodes. Again, a forest size of 200 trees and f = √|F| is used, where F consists of five feature variables, which are in the following order of importance: retrieved apparent albedo, solar zenith angle, cloud parameter r_cld, strong H2O absorption radiance, and across-track dimension index.
Figure 8. (a, b) Quality-filtered XCH4 over Europe overlaid on true colour reflectances from the Visible Infrared Imaging Radiometer Suite (VIIRS) taken from the NASA Worldview application for two example days not included in the training data set, demonstrating the performance of the machine-learning classification algorithm. Evidently, cloudy scenes are typically identified and excluded. (c, d) Comparison of the implemented quality filter (QUAL, 1: excluded) with the VIIRS cloud classification (1: cloudy). Matching classifications are shown in white and green. By definition the quality filter is generally stricter than the VIIRS cloud flag and the blue areas are additionally excluded. The rare instances of measurements classified as cloudy by VIIRS but still passing the quality filter are shown in cyan.

To compute the correction for a new measurement after growing the forest with the training data, each decision tree provides a regression according to the input features of the measurement, and the random forest uses the average over all tree regressions as a final calibration value for this observation. In other words, the random forest regressor uses averaging in the bagging procedure to combine the individual tree results (in contrast to voting used in the classification case).
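The bagging-with-averaging idea can be illustrated with a deliberately simplified sketch. It is not the implementation used for the retrieval: each "tree" is reduced to a single-split regression stump on one feature (standing in for the apparent albedo), whereas the forest described above uses five features, random feature subsets, and up to 500 leaf nodes per tree. The training data are synthetic and hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump.

    Minimising the summed squared error of the two child nodes is
    equivalent to maximising the variance reduction of the split.
    """
    best = None
    for t in sorted(set(xs))[1:]:          # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:                       # all feature values identical
        m = mean(ys)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x < threshold else right_value


def fit_forest(xs, ys, n_trees=200, seed=0):
    """Bootstrap aggregating: each stump sees a resampled training set."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Regression forests average the tree outputs (classification would vote)."""
    return mean(predict_stump(s, x) for s in forest)


# Synthetic example (assumption): -20 ppb bias over the four darkest scenes
albedo = [0.02 * i for i in range(1, 26)]
bias_ppb = [-20.0 if i <= 4 else 0.0 for i in range(1, 26)]
forest = fit_forest(albedo, bias_ppb)
dark = predict_forest(forest, 0.05)
bright = predict_forest(forest, 0.40)
print(dark, bright)
```

The averaged prediction for the dark scene is close to, but not exactly, the imposed bias because a small fraction of bootstrap samples contain no dark scenes at all; this smoothing is characteristic of bagging.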
The calibration data set consists of the XCH4 climatology introduced in the previous subsection evaluated for selected regions spanning a wide range of albedos and solar zenith angles. For individual regions, the climatology is roughly corrected for potential systematic overall biases by adding a single region-specific correction value based on a comparison to nearby sites of the Total Carbon Column Observing Network (TCCON) (Wunch et al., 2011a) for the year 2017. In any case, the seasonal and intra-regional spatial variations are solely determined by the climatology. The training regions and corresponding climatology correction values are shown in Fig. 9.
The standard deviation of the resulting XCH4 correction when considering global yearly averages of gridded data (on a 0.1° × 0.1° grid) amounts to 13 ppb, which is well below the natural variability. The XCO data set is not corrected.

Validation
TCCON is a network of ground-based Fourier transform spectrometers recording direct solar spectra in the NIR-SWIR spectral region to retrieve accurate and precise column-averaged abundances of several atmospheric constituents, including XCO and XCH4, thus providing a validation resource for satellite data (Wunch et al., 2011a). To ensure comparability, all TCCON sites use similar instrumentation (Bruker IFS 125HR) and a common retrieval algorithm. The TCCON data are tied to the WMO trace gas scale using airborne in situ measurements by applying individual scaling factors for each species. The estimated accuracy (1σ) is about 2 ppb for XCO and 3.5 ppb for XCH4.

Figure 9. Regions used to train the machine-learning regressor comprising the Arctic (ARC), the western United States (WUS), central Europe (CEU), Japan (JAP), the Sahara (SAH), the South Atlantic (ATL), and Australia (AUS). The corresponding numbers specify the regional corrections applied to the methane climatology before learning in parts per billion, which are also colour-coded in the borders and backgrounds of the regions (blue for negative and red for positive corrections). The yellow circles highlight the TCCON sites used in the validation.
To compare the satellite data with TCCON quantitatively, it has to be taken into account that the sensitivities of the instruments differ from each other and that each retrieval uses its own a priori profile to determine the best estimate of the true atmospheric state. The first step is to correct for the a priori contribution to the smoothing equation by adjusting the measurements for a common a priori profile (Rodgers, 2000; Schneising et al., 2012; Dils et al., 2014). Here we use the TCCON prior as the common a priori profile for all measurements:

$$\hat{c}_{\mathrm{adj}} = \hat{c} + \frac{1}{m_0} \sum_l m_l \left(1 - A_l\right) \left(x_{a,T}^l - x_a^l\right)$$

In this equation, $\hat{c}$ represents the originally retrieved TROPOMI column-averaged dry air mole fraction, $l$ is the index of the vertical layer, $A_l$ the corresponding column averaging kernel of the TROPOMI algorithm, and $x_a$ and $x_{a,T}$ the TROPOMI and TCCON a priori dry air mole fraction profiles; $m_l$ is the mass of dry air determined from the dry air pressure difference $\Delta p_l$ between the upper and lower boundary of layer $l$ via $\Delta p_l / g_l$ with gravitational acceleration $g_l$, and $m_0 = \sum_l m_l$ is the total mass of dry air. To minimise the smoothing error introduced by the averaging kernels we do not compare $\hat{c}_{\mathrm{adj}}$ directly with the retrieved TCCON mole fractions $\hat{c}_T$ but rather with the adjusted expression (Rodgers and Connor, 2003; Wunch et al., 2011b):

$$\hat{c}_{T,\mathrm{adj}} = c_{a,T} + \left(\frac{\hat{c}_T}{c_{a,T}} - 1\right) \frac{1}{m_0} \sum_l m_l A_l x_{a,T}^l$$

Thereby, $c_{a,T}$ represents the TCCON a priori column-averaged dry air mole fraction associated with the a priori profile $x_{a,T}$. However, using $\hat{c}_{T,\mathrm{adj}}$ instead of $\hat{c}_T$ has only a marginal impact on the validation results presented here because the satellite averaging kernels are close to 1 in the lower atmosphere (see Fig. 2), implying $\hat{c}_{T,\mathrm{adj}} \approx \hat{c}_T$.

The validation is performed at the TCCON sites listed in Table 3 (see also Fig. 9). For the comparison a set of collocation criteria has to be specified. Ideally, the representativity is maximised by criteria that are as strict as possible while concurrently ensuring sufficient data for a sound and stable comparison.
This trade-off is resolved by the following selection. The spatial collocation criterion requires the satellite measurements to lie within a radius of 100 km around the TCCON site and the altitude difference to be smaller than 250 m. The temporal collocation criterion is set to ±2 h. As a consequence of the altitude representativity criterion, there are not enough collocations for a robust comparison at the mountain sites Zugspitze (Sussmann and Rettinger, 2018b) and Izaña (Blumenstock et al., 2017).
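The a priori adjustment described above can be sketched numerically. The sketch assumes the standard form of the adjustment to a common prior (Rodgers, 2000) and the averaging-kernel smoothing of a scaled TCCON prior (Wunch et al., 2011b), as written in the code comments; the function names and the three-layer example profile are hypothetical and serve only to illustrate the arithmetic.

```python
def adjust_satellite(c_hat, A, x_a, x_aT, m):
    """Adjust the satellite mole fraction to the TCCON prior (assumed form):

    c_adj = c_hat + (1/m0) * sum_l m_l * (1 - A_l) * (x_aT_l - x_a_l)
    """
    m0 = sum(m)
    correction = sum(
        ml * (1.0 - Al) * (xT - xa) for ml, Al, xa, xT in zip(m, A, x_a, x_aT)
    )
    return c_hat + correction / m0


def adjust_tccon(c_hat_T, A, x_aT, m):
    """Smooth the TCCON mole fraction with the satellite kernels (assumed form):

    c_T_adj = c_aT + (c_hat_T / c_aT - 1) * (1/m0) * sum_l m_l * A_l * x_aT_l
    """
    m0 = sum(m)
    c_aT = sum(ml * x for ml, x in zip(m, x_aT)) / m0          # prior column average
    smoothed_prior = sum(ml * Al * x for ml, Al, x in zip(m, A, x_aT)) / m0
    return c_aT + (c_hat_T / c_aT - 1.0) * smoothed_prior


# Hypothetical three-layer example, mole fractions in ppb
m = [3.0, 4.0, 3.0]                  # dry air mass per layer (arbitrary units)
A = [1.1, 1.0, 0.8]                  # satellite column averaging kernel
x_a = [1850.0, 1800.0, 1500.0]       # satellite a priori profile
x_aT = [1870.0, 1810.0, 1450.0]      # TCCON a priori profile

c_adj = adjust_satellite(1820.0, A, x_a, x_aT, m)
c_T_adj = adjust_tccon(1815.0, A, x_aT, m)
print(c_adj, c_T_adj)
```

For averaging kernels equal to 1 in all layers both functions return their input unchanged, which mirrors the statement that the adjustment has only a marginal impact when the kernels are close to 1.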
The validation results are summarised in Figs. 10 and 11, including the mean bias µ and the scatter σ relative to TCCON for each site. The parameter σ is estimated from Huber's Proposal-2 M-estimator (Huber, 1981), which is a well-established estimator of location and scale that is robust against outliers of a normal distribution. This is an appropriate choice and preferred over the standard deviation because one is interested in the actual single-measurement precision without distortion of the results by a few outliers, which are rather attributed to systematic errors, e.g. due to residual clouds. As a consequence, outliers are fully included in the computation of the systematic error but get lower weight in the robust determination of the random error, which is interpreted as a measure of the repeatability of measurements. It is also checked whether the respective site biases are sensitive to the selection of the spatial collocation radius, which is an indication of sources within the satellite collocation area with only a marginal influence on the TCCON measurements themselves. A considerable sensitivity was found for XCH4 at Edwards. The collocation region intersects oil production areas in California's Central Valley (in contrast to Caltech and JPL; see also the results in Sect. 4 and Fig. 23) and the South Coast Air Basin (SoCAB), which has a well-known methane enhancement. As such nearby sources limit the representativity of affected satellite measurements, the collocation radius is reduced to 50 km for Edwards.
The altitude representativity criterion separates the well-isolated air masses of the SoCAB, where Caltech and JPL are located, from the Mojave desert with the Edwards site to the north. Hence, different air masses are analysed in the validation at Caltech/JPL and Edwards, although the corresponding collocation circles overlap. This also explains the insensitivity to the spatial collocation radius at Caltech/JPL and why no additional constraints on the coincidence criteria are necessary for these sites to ensure representativity. As Caltech and JPL are both exposed to SoCAB air masses, the permissible altitude collocation tolerance of Caltech is equally assumed for JPL despite slightly differing surface elevation.
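The collocation criteria described above (100 km radius, reduced to 50 km at Edwards; altitude difference below 250 m; ±2 h) can be expressed as a simple filter. The following is a minimal sketch: the dictionary layout, the function names, the spherical-Earth haversine distance, and the reading of the temporal criterion as "at least one TCCON measurement within ±2 h of the satellite scene" are assumptions made for illustration, and the coordinates are hypothetical.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_collocated(scene, site, tccon_times, radius_km=100.0,
                  max_alt_diff_m=250.0, max_dt=timedelta(hours=2)):
    """Return True if a satellite scene satisfies all three criteria."""
    close = great_circle_km(scene["lat"], scene["lon"],
                            site["lat"], site["lon"]) <= radius_km
    level = abs(scene["alt_m"] - site["alt_m"]) < max_alt_diff_m
    timely = any(abs(scene["time"] - t) <= max_dt for t in tccon_times)
    return close and level and timely


# Hypothetical site and scene (not an actual TCCON station)
site = {"lat": 50.0, "lon": 10.0, "alt_m": 100.0}
scene = {"lat": 50.5, "lon": 10.0, "alt_m": 180.0,
         "time": datetime(2018, 6, 1, 12, 0)}
tccon_times = [datetime(2018, 6, 1, 11, 0), datetime(2018, 6, 1, 15, 30)]

print(is_collocated(scene, site, tccon_times))                  # default 100 km radius
print(is_collocated(scene, site, tccon_times, radius_km=50.0))  # reduced radius
```

Passing `radius_km=50.0` corresponds to the stricter radius used for Edwards; the other criteria are left unchanged.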
The results for the individual sites are condensed to the following parameters for the overall quality assessment of the satellite data: the global offset is defined as the mean of the local offsets at the individual sites, the random error is the global scatter (analogously estimated to the single-site case) of the differences to TCCON after subtraction of the respective regional biases, and the systematic error is the standard deviation of the local offsets relative to TCCON at the individual sites as a measure of the station-to-station biases. For XCO the global offset amounts to 4.49 ppb, the random error is estimated to be 5.14 ppb (6.12 ppb when using the standard deviation instead of Huber's Proposal-2 M-estimator), and the systematic error is 1.90 ppb, which is on the order of the estimated (station-to-station) accuracy of the TCCON of about 2 ppb. For XCH4 the global offset aggregates to −1.30 ppb, the random error is 14.04 ppb (15.77 ppb when using the standard deviation), and the systematic error is given by 4.31 ppb, which is again similar to the TCCON accuracy of about 3.5 ppb.
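The three summary parameters can be computed as in the following sketch. It assumes the common iterative formulation of Huber's Proposal 2 (alternating winsorised mean and consistency-corrected winsorised variance with tuning constant k = 1.5, started from the median and the scaled median absolute deviation); the tuning constant, the convergence settings, and the synthetic differences are assumptions for illustration and are not taken from the validation itself.

```python
from statistics import NormalDist, mean, median, stdev


def huber_proposal2(values, k=1.5, tol=1e-6, max_iter=100):
    """Robust location and scale by Huber's Proposal 2 (iterated winsorising)."""
    n = len(values)
    nd = NormalDist()
    theta = 2.0 * nd.cdf(k) - 1.0
    # E[min(|Z|, k)^2] for standard normal Z: makes the scale consistent
    beta = theta + k ** 2 * (1.0 - theta) - 2.0 * k * nd.pdf(k)
    mu = median(values)
    s = median(abs(v - mu) for v in values) / 0.6745   # scaled MAD as start value
    if s == 0.0:
        return mu, 0.0
    for _ in range(max_iter):
        lo, hi = mu - k * s, mu + k * s
        w = [min(max(v, lo), hi) for v in values]       # winsorise at mu +/- k*s
        mu_new = mean(w)
        s_new = (sum((x - mu_new) ** 2 for x in w) / ((n - 1) * beta)) ** 0.5
        converged = abs(mu_new - mu) < tol * s and abs(s_new - s) < tol * s
        mu, s = mu_new, s_new
        if converged:
            break
    return mu, s


def validation_summary(differences_by_site):
    """differences_by_site maps a site name to satellite-minus-TCCON differences."""
    site_bias = {site: mean(d) for site, d in differences_by_site.items()}
    global_offset = mean(site_bias.values())
    systematic_error = stdev(site_bias.values())        # station-to-station bias
    debiased = [x - site_bias[site]
                for site, d in differences_by_site.items() for x in d]
    _, random_error = huber_proposal2(debiased)          # robust global scatter
    return global_offset, random_error, systematic_error


# Synthetic differences in ppb (assumption); site A contains one outlier
core = [-2.0, -1.0, -1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0]
differences = {
    "A": [4.0 + d for d in core] + [44.0],
    "B": [-1.0 + d for d in core],
}
global_offset, random_error, systematic_error = validation_summary(differences)
print(global_offset, random_error, systematic_error)
```

In this synthetic case the single outlier fully enters the site bias and hence the systematic error, whereas the robust scatter stays far below the plain standard deviation of the bias-subtracted differences, which is the behaviour motivating the choice of estimator.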
Figure 10. Comparison of the TROPOMI/WFMD v1.2 XCO time series (green) with ground-based measurements from the TCCON (red). For each site, N is the number of collocations, µ corresponds to the mean bias, and σ to the scatter of the satellite data relative to TCCON in parts per billion; σ is estimated from Huber's Proposal-2 M-estimator. The global offset is defined as the mean of the local offsets at the individual sites, the random error is the global scatter of the differences to TCCON after subtraction of the respective regional biases, and the systematic error is the standard deviation of the µ at the individual sites.

To further analyse how well the real temporal and spatial variations are captured by the TROPOMI data, Fig. 12 shows a comparison to TCCON based on daily means for days with more than three collocations. The obvious linear relationship with a high correlation for both gases (R = 0.97 for XCO and R = 0.91 for XCH4) underlines the typically good agreement of the satellite and validation data. The linear regression yields a fit close to the 1 : 1 line for both gases.
In the case of XCH4, there are a few outliers for which the satellite values are considerably lower than the TCCON values. These occasional instances are not site-specific and can probably be ascribed to days with residual or partial cloud cover interfering with the satellite retrievals. Outliers with higher values compared to TCCON are rarer and dominated by a handful of collocations at East Trout Lake. This exceptional lack of XCH4 agreement occurs on four days in the time period 10-21 February as well as on 29 March and may be attributable to Arctic polar vortex air above East Trout Lake potentially causing the following related issues: associated fronts of different air masses may complicate the identification of collocations near the vortex edge, and/or the stratospheric part of the methane profile may be largely affected by the polar vortex, leading to a considerable deviation from the assumed a priori profile shapes (Tukiainen et al., 2016). It is verified that the impact of outliers on the regression is marginal by repeating the fit with the Huber linear regression model (Huber and Ronchetti, 2009), which is robust to outliers and provides similar results to the standard linear regression here. In summary, the natural XCH4 and XCO variations are well-captured by the satellite data. We find a single-measurement precision of the TROPOMI data of about 0.8 % for XCH4 and 5.8 % for XCO, while the station-to-station accuracy of the satellite data is comparable to the TCCON.

Initial results using real TROPOMI data
In this section we present the first results from the mission start until the end of 2018. For temporally averaged data we grid the data on a 0.1° × 0.1° grid instead of showing swath data, which are used for daily data and single satellite overpass detection. Before analysing the data more regionally, we want to provide a global overview. The global distribution of retrieved XCO and XCH4 for the year 2018 is shown in Figs. 13 and 14, respectively. Clearly visible for both data sets is the interhemispheric gradient with larger values in the Northern Hemisphere, where the majority of sources are located, superimposed by enhancements over prominent source regions like anthropogenic emissions in China, India, and Southeast Asia.
Other visible XCO source regions include human-initiated biomass burning in Africa and South America for land clearing and land use change, as well as wildfire emissions in North America, which were exceptionally pronounced in 2018. The anthropogenic emissions of congested urban areas like Mexico City and Tehran are already unambiguously detected on such a global map without zooming in.
In the case of XCH4, additional visible source regions apart from the anthropogenic sources in Asia, like fossil fuels or rice cultivation, include tropical wetlands and anthropogenic emissions from California and the Padan Plain in Italy. There is also a distinct signal from Etosha National Park in the north of Namibia containing significant areas of wetland like the Etosha pan, an endorheic salt pan that exhibits intermittent shallow inundation.

Comparison to operational products
The operational TROPOMI CO product is retrieved using the Shortwave Infrared CO Retrieval (SICOR) algorithm, and the operational CH4 product is based on RemoTeC, which is a physics-based approach originally developed for CO2 and CH4 retrievals from OCO and GOSAT. Although the operational algorithms and the scientific algorithm presented here use similar spectral bands, there are many differences concerning the details of each approach. For example, TROPOMI/WFMD is a weighted least-squares approach, whereas SICOR and RemoTeC are based on a Phillips-Tikhonov regularisation scheme. There are also differences in the radiative transfer model, the quality filter, the spectroscopy used, and the state vector elements, in particular in the treatment of aerosols and clouds.
While WFMD and the operational CH4 algorithm are mainly applicable to cloud-free scenes, the operational CO algorithm is designed to also handle cloudy observations under specific conditions. Both methane algorithms include a post-processing correction to reduce the systematic albedo dependence. However, the details of this correction are again quite different: while the correction for the operational algorithm is based on linear regression relative to GOSAT retrievals, which are in turn bias-corrected against TCCON, the scientific WFMD algorithm uses a random forest regressor relative to a climatology as described in Sect. 2.5.3.
The comparisons are performed on a monthly basis with the latest version (V01.02.02) of the operational products. Figure 15 shows the corresponding CO results for December 2018. The comparison of the global distribution of all quality-filtered data for the respective algorithm illustrates that the spatial CO patterns are very similar for both algorithms. The operational CO algorithm exhibits better coverage as it can handle a larger amount of cloudiness. For common scenes passing the quality filters of both algorithms the data sets are highly correlated, with a correlation coefficient of R = 0.98, and the regression slope is also close to the 1 : 1 line, confirming the good agreement. The mean bias between the two data sets is about 1 %, and the standard deviation of the difference is comparable to the noise level. [...] WFMD, which is also reflected in a regression slope somewhat smaller than 1.

Figure 15. Comparison of TROPOMI/WFMD CO with the operational TROPOMI data for December 2018. Panel (a) depicts the global distribution of all quality-filtered data for the respective algorithm. Panel (b) shows a bivariate histogram of all common scenes passing the quality filters of both algorithms, summarising the linear regression results and the correlation of the data sets, as well as the mean and standard deviation of the difference. The number of points per bin is shown as a decadic logarithm lg(N).

The corresponding comparison of the CH4 results is shown in Figs. 16 and 17 for December and June 2018, respectively. WFMD exhibits somewhat better coverage and also includes some retrievals over the ocean in contrast to the operational algorithm. Although the prominent features, like the interhemispheric gradient and source regions in Asia, are similar and the correlation coefficients are close to 0.9, the differences down to the last detail are more pronounced than for CO. This is reflected in both the global maps and the scatter plots for common scenes. The global offset between the two data sets amounts to a few parts per billion, and the standard deviation of the difference is again comparable to the noise level.
In December 2018 the methane abundances over the Bohai Economic Rim, including the cities of Beijing and Tianjin, are larger than in southern China to the west of the Pearl River Delta for WFMD, whereas it is the other way round for the operational product, but this may be due to the different sampling. The XCH 4 distribution over the Sahara is more uniform for WFMD, and the corresponding patterns of the operational product seem to vaguely resemble some albedo features, with higher values over brighter parts of the Sahara. There is an obvious clustering of the common measurements around the 1 : 1 line, even for the largest values. Nevertheless, the linear regression line is somewhat distorted from the 1 : 1 line due to a slight shift of the two dominating densely populated sub-clusters.
In June 2018 there is a sharper XCH 4 gradient in the operational product when transitioning from the temperate into the low albedo boreal zone, and the values over the boreal ecosystem are lower than for WFMD. In addition, there are enhanced methane abundances in the operational product over the Canadian province Nunavut in contrast to WFMD. Some occasionally high values in the WFMD methane data over South America possibly attributable to surface roughness contribute to the rare outliers in the comparison scatter plot, which exhibits a clear linear relation close to the 1 : 1 line apart from that.
Overall, we find good agreement of our scientific CO product with the operational product based on the presented concise comparison. For CH4 we find good agreement of the prominent features with some interesting differences in detail, including potential indications of residual albedo issues in the operational XCH4 product. Further analysis and understanding of the differences is expected to advance greenhouse gas retrievals from wide-swath imaging satellites like TROPOMI under challenging conditions such as scenes with low surface reflectance or residual cloudiness.

Carbon monoxide
Intense CO emissions from agglomeration areas, cities, and industrial facilities are clearly detected by TROPOMI. This is demonstrated using the example of China, India, and Southeast Asia in Fig. 18. The 2-month average was chosen to get an overview of the complete region. Typically, larger emissions can even be detected in a single satellite overpass. The tracked facilities mainly belong to the Chinese and Indian iron and steel industry.
For comparison, Fig. 18 also shows the operational product in addition to the TROPOMI/WFMD results. As the operational product is available as total CO columns, the corresponding mole fractions XCO were generated in the same way as for the scientific product by division of the total CO columns by the dry air columns obtained from the ECMWF. The comparison demonstrates that the enhancements due to the analysed emission sources can be typically identified in both data sets. However, as a consequence of the different spatiotemporal sampling, the enhancement over some point sources is somewhat more pronounced in the WFMD product. A possible reason for this is the additional utilisation of cloudy observations in the operational SICOR product, which may be associated with reduced surface sensitivity under certain conditions reflected in the averaging kernels of the corresponding measurements.
In steelmaking CO is formed during two processes. Firstly, it is an essential constituent of blast furnace gas, which emerges when iron ore is reduced with coke to metallic pig iron. As the resulting pig iron has a relatively high carbon content, further processing is necessary to harden the metal. Therefore, the carbon-rich molten pig iron is converted to steel by lowering its carbon content via oxidation in the oxygen converter process (Linz-Donawitz steelmaking). The resulting converter gas predominantly consists of CO (≈ 70 %) (Ishioka et al., 1992; CarbonNext, 2017).
The detected factories include the following steel plants: [...], with yearly production capacities of about 3 million tonnes of crude steel each. It can also be seen that the CO product exhibits striping in flight direction for single overpasses, similar to the operational product. There are also examples of detected CO emissions from steel works in Europe. Figure 20 illustrates such a case and shows that CO emissions from the largest steel plant in Poland, operated by ArcelorMittal in the industrial city Dąbrowa Górnicza in the Upper Silesian metropolitan area, are detected in a single overpass. As can be seen, the corresponding pronounced plume coincides with the boundary layer wind direction, and the striping is observable as well.
Another prominent source of CO is fire. In September 2018 a peat bog in the military training area WTD 91 in the Emsland region was accidentally set on fire by the German army and burnt for several weeks. The corresponding CO plume is clearly detected and aligns well with the wind direction (Fig. 21). The scenes right above the origin of the fire are automatically excluded by the quality filter because of the strong formation of smoke potentially shielding the subjacent partial columns, similar to thick clouds.
The difference between smoke and clouds is the particle size distribution. While clouds consist of water droplets with an effective radius of about 10 µm, the mass distribution of smoke plumes shows a prominent peak at about 0.3 µm (Stith et al., 1981) but is nevertheless dominated by a small number of supermicron-sized particles (Radke et al., 1990). The submicron particles reduce visibility and lead to an extended smoke plume over large distances in the true colour reflectances from VIIRS shown in Fig. 21. However, these small particles are not a major issue for the satellite measurements taken at 2.3 µm. The satellite retrievals near the origin of the fire are rather affected by the large supermicron-sized particles, which become more and more negligible when departing from the source of the fire due to their rapid fallout. This is the reason that at a sufficient distance from the fire the corresponding measurements pass the quality filter despite efficient scattering in the visible spectral range manifesting in an extensive plume in the VIIRS image. On the other hand, even very small clouds, which are barely visible in the VIIRS image at this resolution, are rigorously filtered out. This indicates that the algorithm implicitly distinguishes between smoke and clouds according to their particle sizes and that a reliable CO retrieval is possible in smoke plumes in the far field of the fire origin. A thorough discussion of the sensitivity of CO measurements in conjunction with smoke from fires can be found in the revised version of Schneising et al. (2019).
The total column enhancement E relative to background values allows us to roughly estimate the emitted mass flux of CO from the mean boundary layer wind speed v and the plume width x⊥ perpendicular to the wind direction only using measurements passing the quality filter:

$$\text{CO flux} = E \cdot v \cdot x_\perp$$

The plume width is on the same order of magnitude as the instrument's spatial resolution, and the enhancement is thus calculated for the plume scene passing the quality filter which is nearest to the fire origin. As the wind direction is approximately perpendicular to one of the scene diagonals, the corresponding plume width x⊥ is estimated by a/√2, assuming a square scene with a side length a of about 7 km. With E = (7 ± 0.7) × 10¹⁷ molec cm⁻², v = 12 ± 3 m s⁻¹, and x⊥ = 5 ± 1 km, the emission on 18 September amounts to about 1.7 ± 0.6 ktCO. According to Kohlenberg et al. (2018), a CO : CO2 emission factor of 16 ± 3 % for boreal peat fires is assumed, implying an associated CO2 emission of approximately 10.5 ± 4.0 ktCO2 on that day. Compared to the German yearly total budget of about 800 MtCO2 yr⁻¹, the emissions from the Emsland peat fire are small even if one assumes that the fire burnt for several weeks at this strength.
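The numbers quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes that the flux is the product of enhancement, wind speed, and plume width, that it is held constant over 24 h to obtain the daily total, that the emission factor is a mass ratio, and that the uncertainties are independent and combined in quadrature; these assumptions are not stated explicitly in the text but yield values consistent with it. Function and variable names are illustrative.

```python
import math

AVOGADRO = 6.02214076e23     # molecules per mole
MOLAR_MASS_CO_G = 28.01      # g per mole
SECONDS_PER_DAY = 86400.0


def co_emission_kt_per_day(E_molec_cm2, v_m_s, x_perp_km):
    """Daily CO emission in kt from enhancement, wind speed and plume width."""
    E_molec_m2 = E_molec_cm2 * 1.0e4                        # cm^-2 -> m^-2
    flux_molec_s = E_molec_m2 * v_m_s * x_perp_km * 1.0e3   # molecules per second
    grams_per_day = flux_molec_s / AVOGADRO * MOLAR_MASS_CO_G * SECONDS_PER_DAY
    return grams_per_day * 1.0e-9                           # g -> kt


def relative_uncertainty(*value_sigma_pairs):
    """Quadrature sum of relative uncertainties of independent factors."""
    return math.sqrt(sum((s / v) ** 2 for v, s in value_sigma_pairs))


E, dE = 7.0e17, 0.7e17   # molec cm^-2
v, dv = 12.0, 3.0        # m s^-1
x, dx = 5.0, 1.0         # km

emission = co_emission_kt_per_day(E, v, x)
emission_sigma = emission * relative_uncertainty((E, dE), (v, dv), (x, dx))

ratio, dratio = 0.16, 0.03   # CO : CO2 emission factor, assumed to be a mass ratio
co2 = emission / ratio
co2_sigma = co2 * relative_uncertainty((emission, emission_sigma), (ratio, dratio))

print(f"CO:  {emission:.1f} +/- {emission_sigma:.1f} kt")
print(f"CO2: {co2:.1f} +/- {co2_sigma:.1f} kt")
```

The wind speed contributes the largest share of the relative uncertainty (25 %), followed by the plume width (20 %) and the enhancement (10 %).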

Methane
One integral component of anthropogenic methane sources is emissions from the energy sector. As an example of methane leakage from natural gas production, Fig. 22 shows that the emissions of the world's second-largest natural gas field, Galkynysh in Turkmenistan, which is operated by Türkmengaz, can be clearly detected in a single satellite overpass. Also visible are XCH 4 enhancements over the productive South Caspian oil and gas basins, the oil and gas infrastructure at the Turkmen coast of the Caspian Sea, and smaller oil and gas fields south of Galkynysh.
Emissions from oil and gas production are important to monitor because methane leaks offset the climate change benefits of natural gas or oil over coal if the leakage exceeds a certain threshold (Alvarez et al., 2012; Farquharson et al., 2016). There are several studies suggesting that the oil and gas industry leaks more methane than assumed in inventories, at least locally or temporally (Brandt et al., 2014; Schneising et al., 2014; Alvarez et al., 2018), and the potential heterogeneity among the sector complicates the specification of typical emission rates.
Another source region is the Central Valley in California, with combined anthropogenic emissions from oil fields and agriculture (see Fig. 23). While one main area of oil production is located in Kern County around Bakersfield, the dairy and cattle industry extends more or less over the whole valley, with the largest livestock density in the counties of San Joaquin, Stanislaus, Merced, Kings, and Tulare (Mauger et al., 2015). A reliable disentanglement of the emissions from the oil and agriculture sectors requires exact knowledge of the meteorology and unmistaken prior knowledge of the distribution of the different source types or methane isotopologue information, which is not yet available from satellite observations. As already mentioned in Sect. 3, the 100 km collocation radius standardly used in the validation intersects the Kern County source region for the Edwards TCCON site. As a consequence, the collocation radius is reduced to 50 km for Edwards to ensure the representativity of the satellite measurements used in the validation.
The two hitherto presented methane source regions of Turkmenistan and the Central Valley in California, which are both detected in a single TROPOMI overpass, were already identified in yearly averages of SCIAMACHY data (Buchwitz et al., 2017).
Figure 23 (caption, partial): … TCCON site (shown in cyan), the source region is intersected by the 100 km collocation radius standardly used in the validation for the Edwards site (pink). Therefore, the collocation radius is reduced to 50 km for Edwards. Due to the altitude representativity criterion, different air masses are analysed in the validation at Caltech/JPL and Edwards, although the corresponding collocation circles overlap. For a discussion of the collocation criteria at Edwards and Caltech/JPL, see also Sect. 3.

An additional source of methane from the energy sector is emissions from coal mining. The physical process of coal extraction directly releases methane, which was previously trapped within the coal bed in the form of gas particles adsorbed at coal grains. For safety reasons, the coal mine methane is diluted with air below the explosive range and released through ventilation shafts to the surface. Poland's primary energy consumption and electrical power generation rely strongly on coal, which helped the country to achieve one of the lowest energy import dependencies in the European Union (European Statistical Office, 2018) as measured by the share of net imports in gross inland energy consumption (the sum of energy produced and net imports). The energy dependence rate of Poland was about 30 % in 2016 compared to an EU-wide average of 54 %; the latter value means that the majority of the EU's energy needs are met by net imports. Only three EU countries have a lower energy dependence rate than Poland, namely Romania, Denmark, and Estonia.
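The definition of the energy dependence rate given above can be written compactly. The symbols d (dependence rate), N (net imports), and P (energy produced) are introduced here for illustration only and do not appear in the cited statistics:

```latex
d = \frac{N}{P + N}
\quad\Longrightarrow\quad
\frac{N}{P} = \frac{d}{1 - d},
\qquad
d = 0.30 \;\Rightarrow\; \frac{N}{P} \approx 0.43,
\qquad
d = 0.54 \;\Rightarrow\; \frac{N}{P} \approx 1.17 .
```

Under this definition, a rate of about 30 % corresponds to net imports of roughly 0.43 units per unit of energy produced, whereas a rate of 54 % implies that net imports exceed production.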
Poland is the largest coal-mining country in Europe and among the top 10 coal producers in the world (BP, 2018), with huge reserves of hard coal and lignite (Polish Geological Institute, 2018a, b). The major coal basin is the Upper Silesian Coal Basin (USCB), which is larger than 5000 km² and hosts 80 % of the anticipated domestic hard coal resources. All operating hard coal mines in the country are situated in the USCB except the Bogdanka Mine in the Lublin Coal Basin in the east of Poland. The USCB is shown in Fig. 24, highlighting individual mines. The corresponding methane plume is vaguely perceptible and coincides with the wind direction. This is an example of emissions that are obviously close to the detection limit for a single overpass.

Conclusions
We have introduced a scientific algorithm to retrieve XCO and XCH4 simultaneously from shortwave infrared spectra recorded by the TROPOMI instrument onboard the Sentinel-5 Precursor satellite. The error analysis based on synthetic data and the successful validation with independent reference data from the TCCON have demonstrated that the algorithm is suitable to retrieve XCO and XCH4 from real TROPOMI data well within the mission requirements after quality filtering (see Table 4). The corresponding quality filter is based on a machine-learning approach utilising a random forest classifier. As cloud data from VIIRS onboard Suomi NPP were only used in the preceding supervised learning process and are no longer needed in the actual quality prediction of individual previously unseen measurements after the completion of the training, the quality filter is independent of the continuous availability of external cloud information. The performance of the retrieval algorithm is expected to further improve in the future, for example with respect to striping in flight direction for single overpasses, due to a refined calibration of the TROPOMI instrument and/or dedicated algorithm advancements.
The good global agreement of our scientific products with the operational products for the analysed example cases further underlines the quality of the presented algorithm. The differences in detail for XCH4 can be regarded as a stimulus for further analysis. Understanding these differences will likely allow both retrieval algorithms to be advanced symbiotically under challenging conditions, such as scenes with low surface reflectance or residual cloudiness. Moreover, the scientific and operational products are well suited to be used together with other products in an ensemble approach to benefit from the large range of respective realisations of different physical aspects in the individual retrieval algorithms.
Nevertheless, the results of the presented scientific algorithm are also valuable in their own right, as TROPOMI enables the determination of XCO and XCH4 with an unprecedented level of detail on a global scale, introducing new areas of application. It was shown that CO emissions from agglomeration areas, industrial facilities, in particular from the steel industry, and fires are readily detected, often even in a single satellite overpass. The same is true for CH4 emissions from the energy sector, including leakage from oil and gas production and coal bed methane from coal mining. The future quantitative reinforcement of these primarily qualitative findings will potentially enable emission monitoring and air quality assessments, ideally on a daily recurrent basis. Furthermore, improved knowledge of the methane cycle, which is essential for better prediction of future climate, can be derived by combining inverse modelling with a comprehensive monitoring system comprising complementary information from accurate ground-based in situ measurements and satellite observations with a unique combination of high precision, spatiotemporal resolution, and global coverage.
Author contributions. OS designed and operated the TROPOMI/WFMD satellite retrievals, performed the data analysis, interpreted the results, and wrote the paper. MB, MR, HB, and JPB provided significant conceptual input to the design of the TROPOMI/WFMD satellite retrievals, the interpretation, and the improvement of the paper. TB and JL designed and executed the operational TROPOMI CO and CH4 satellite retrievals and supported the interpretation of the results. NMD, DGF, DWTG, FH, CH, LTI, RK, IM, JN, CP, DFP, SR, KeS, KiS, RS, VAV, TW, and DW operated the TCCON retrievals for the various sites and supported the interpretation of the results. All authors discussed the results and commented on the paper.
Competing interests. The authors declare that they have no conflict of interest.
Special issue statement. This article is part of the special issue "TROPOMI on Sentinel-5 Precursor: first year in operation (AMT/ACP inter-journal SI)". It is not associated with a conference.
Acknowledgements. This publication contains modified Copernicus Sentinel data (2017, 2018). Sentinel-5 Precursor is an ESA mission implemented on behalf of the European Commission. The TROPOMI payload is a joint development by the ESA and the Netherlands Space Office (NSO). The Sentinel-5 Precursor ground-segment development has been funded by the ESA and with national contributions from the Netherlands, Germany, and Belgium. The research leading to the presented results has in part been funded by the ESA projects GHG-CCI, GHG-CCI+, and S5L2PP, the Federal Ministry of Education and Research project AIRSPACE, and by the State and the University of Bremen.
TCCON data were obtained from the TCCON Data Archive, hosted by CaltechDATA, California Institute of Technology (https://tccondata.org/, last access: 9 September 2019), extended by updates for Eureka, Ny-Ålesund, Białystok, Burgos, Darwin, and Wollongong provided by the respective PIs of these sites. We thank Coleen Roehl and Paul Wennberg of Caltech as well as Jean-François Blavier and Geoffrey Toon of JPL for their efforts to operate the TCCON sites at Park Falls, Lamont, JPL, and Caltech and for providing support for the entire TCCON, which has evolved to the primary validation resource for our satellite data sets. The East Trout Lake TCCON station is supported by the Canada Foundation for Innovation, the Ontario Research Fund, Environment and Climate Change Canada (ECCC), and the Canadian Space Agency (CSA). The Eureka TCCON measurements were made at the Polar Environment Atmospheric Research Laboratory (PEARL) by the Canadian Network for the Detection of Atmospheric Change (CANDAC), primarily supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), ECCC, and CSA. The US TCCON sites used in this analysis are supported by the NASA OCO-2 Project and the NASA Carbon Cycle Science Program. The TCCON sites at Tsukuba and Burgos are supported in part by the GOSAT series project. Site support for Burgos is provided by the Energy Development Corporation (EDC, Philippines). The ESA Ariane Tracking Station at North East Bay, Ascension Island, is acknowledged for hosting the Ascension Island TCCON station. The TCCON stations Ascension Island, Garmisch, Karlsruhe, and Ny-Ålesund have been supported by the European Space Agency (ESA) under grant 3-14737 and by the German Bundesministerium für Wirtschaft und Energie (BMWi) via the DLR under grants 50EE1711A to 50EE1711E.
KIT/IMK-ASF (Karlsruhe) and KIT/IMK-IFU (Garmisch-Partenkirchen, in cooperation with the University of Augsburg) acknowledge the BMWi for funding data analysis and delivery through DLR projects (contracts 50EE1711A, B, C, D). The Lauder TCCON programme is core-funded by the NIWA through New Zealand's Ministry of Business, Innovation and Employment. JN, CP, and TW acknowledge financial support by the DFG within Transregio (AC)³.
We acknowledge the use of VIIRS imagery from the NASA Worldview application (https://worldview.earthdata.nasa.gov/, last access: 9 September 2019) operated by the NASA/Goddard Space Flight Center Earth Science Data and Information System (ESDIS). We also thank the European Centre for Medium-Range Weather Forecasts (ECMWF) for providing the meteorological analysis and Peter Bergamaschi for providing the MACC-II project flux inversion system methane mole fractions used to generate the utilised XCH4 climatology.
Financial support. The research leading to the presented results has in part been funded by the ESA projects GHG-CCI, GHG-CCI+, and S5L2PP, the Federal Ministry of Education and Research project AIRSPACE, and by the State and the University of Bremen.
The article processing charges for this open-access publication were covered by the University of Bremen.
Review statement. This paper was edited by Helen Worden and reviewed by two anonymous referees.