An Improved Total and Tropospheric NO 2 Column Retrieval for GOME-2

An improved algorithm for the retrieval of total and tropospheric nitrogen dioxide (NO 2 ) columns from the Global Ozone Monitoring Experiment-2 (GOME-2) is presented. The refined retrieval will be implemented in a future version of the GOME Data Processor (GDP) as used by the EUMETSAT Satellite Application Facility on Atmospheric Composition and UV Radiation (AC-SAF). The first main improvement is the application of an extended 425-497 nm wavelength fitting window in the differential optical absorption spectroscopy (DOAS) retrieval of the NO 2 slant column density, based on which initial total NO 2 columns are computed using stratospheric air mass factors (AMFs). Updated absorption cross-sections and a linear offset correction are used for the large fitting window. An improved slit function treatment is applied to compensate for both long-term and in-orbit drift of the GOME-2 slit function. Compared to the current operational (GDP 4.8) dataset, the use of these new features increases the NO 2 columns by ∼1-3 × 10^14 molec/cm^2 and reduces the slant column error by ∼24%. In addition, the bias between GOME-2A and GOME-2B measurements is largely reduced by adopting a new level 1b data version in the DOAS retrieval. The retrieved NO 2 slant columns show good consistency with the Quality Assurance for Essential Climate Variables (QA4ECV) retrieval with a good overall quality. Second, the STRatospheric Estimation Algorithm from Mainz (STREAM),

The validation is illustrated for 6 stations covering urban, suburban, and background situations. Compared to the GDP 4.8 product, the new dataset presents an improved agreement with the MAXDOAS measurements for all the stations.

Introduction
Nitrogen dioxide (NO 2 ) is an important trace gas in the Earth's atmosphere. In the stratosphere, NO 2 is strongly related to halogen compound reactions and ozone destruction (Solomon, 1999). In the troposphere, nitrogen oxides (NO x = NO 2 + NO) serve as a precursor of ozone in the presence of volatile organic compounds (VOC) and of secondary aerosol through gas-to-particle conversion (Seinfeld et al., 1998). As a prominent air pollutant affecting human health and ecosystems, large amounts of NO 2 are produced in the boundary layer by industrial processes, power generation, transportation, and biomass burning over polluted hot spots. For instance, a strong growth of NO 2 during the past two decades has caused severe air pollution problems for China, with the largest NO 2 columns in 2011; since then, cleaner techniques and stricter controls have been applied to reduce the NO 2 pollution (Richter et al., 2005; Mijling et al., 2017; Liu et al., 2017). An increase in NO 2 concentrations due to economic growth is also found over India with a peak in 2012 (Hilboll et al., 2017). Despite the decrease in NO x emissions in Europe, around half of the European Union member states still exceed the air quality standards, mainly because of diesel car emissions (European Commission, 2017).
The GOME-2 total and tropospheric NO 2 products are generated using the GOME Data Processor (GDP) algorithm at the German Aerospace Center (DLR). The retrieval algorithm was first described by Valks et al. (2011) as implemented in the GDP version 4.4 and was later updated to the current operational version 4.8 (Valks et al., 2017). The NO 2 retrieval for GOME-2 follows a classical three-step scheme.
First, the total NO 2 slant columns (namely the concentration integrated along the effective light path from the Sun through the atmosphere to the instrument) are derived using the differential optical absorption spectroscopy (DOAS) method (Platt and Stutz, 2008). The DOAS technique is a least-squares method fitting the molecular absorption cross-sections to the measured GOME-2 sun-normalized radiances provided by EUMETSAT's processing facility. The fit is applied to the data within a fitting window optimized for NO 2 . As analysed by Richter et al. (2011) and in the Quality Assurance for Essential Climate Variables (QA4ECV, www.qa4ecv.eu) project, extension of the fitting window for GOME-2 increases the signal-to-noise ratio and hence reduces the NO 2 slant column error. The total NO 2 slant columns depend on the viewing geometry and also on parameters such as surface albedo and the presence of clouds and aerosol loads. They are therefore converted to initial total NO 2 vertical columns through division by a stratospheric air mass factor.
Second, the stratospheric contribution is estimated and separated from the NO 2 slant columns (referred to as stratosphere-troposphere separation). The GDP 4.8 algorithm applies a modified reference sector method, which uses measurements over clean regions to estimate the stratospheric NO 2 columns based on the assumption of longitudinally invariable stratospheric NO 2 layers and of negligible tropospheric NO 2 abundance over the clean areas. The modified reference sector method defines a global pollution mask to remove potentially polluted regions and applies an interpolation over the unmasked areas to derive the stratospheric NO 2 columns. As a result of using a fixed pollution mask, the modified reference sector method in GDP 4.8 has larger uncertainties over polluted areas, because only a limited amount of information over continents is used. To overcome these shortcomings, the STRatospheric Estimation Algorithm from Mainz (STREAM) method (Beirle et al., 2016) has been developed for the TROPOMI instrument and was also successfully applied to GOME, SCIAMACHY, OMI, and GOME-2 measurements.
Belonging also to the modified reference sector method, STREAM defines not a fixed pollution mask but weighting factors for each observation to determine its contribution to the stratospheric estimation.
Third, the tropospheric NO 2 vertical columns are calculated from the tropospheric slant columns by an air mass factor (AMF) calculation, which contributes the largest uncertainty to the NO 2 retrieval, in particular over polluted regions (Boersma et al., 2004). The AMFs are determined with a radiative transfer model (RTM) and stored in a look-up table (LUT) requiring ancillary information such as surface albedo, vertical shape of the a priori NO 2 profile, clouds and aerosols. Improvements in the RTM and LUT interpolation scheme, the ancillary parameters, and the cloud and aerosol correction approach have been reported for the OMI instrument (e.g., Boersma et al., 2011; Lorente et al., 2017; Vasilkov et al., 2017; Krotkov et al., 2017; Veefkind et al., 2016; Lin et al., 2014; Castellanos et al., 2015; Laughner et al., 2018), which in principle are beneficial for similar satellite instruments like GOME-2.
In this paper, a new algorithm to retrieve the total and tropospheric NO 2 for the GOME-2 instruments is described, which includes improvements in each of the 3 algorithm steps introduced above. The improved algorithm will be implemented in the next version of GDP (referred to as GDP 4.9 hereafter). We briefly introduce the GOME-2 instrument (Sect. 2) and the current operational (GDP 4.8) total and tropospheric NO 2 retrieval algorithm (Sect. 3). We present the improvements to the DOAS slant column retrieval (Sect. 4), the stratosphere-troposphere separation (Sect. 5), and the AMF calculation (Sect. 6). Finally, we show an end-to-end validation of the tropospheric NO 2 dataset using ground-based multiple-axis DOAS (MAXDOAS) datasets with different pollution conditions (Sect. 7).

2 Instrument and measurements
GOME-2 is a nadir-scanning UV-VIS spectrometer aboard the MetOp-A and MetOp-B satellites (referred to as GOME-2A and GOME-2B throughout this study) with a satellite repeating cycle of 29 days and an equator crossing time of 9:30 local time (descending node). The GOME-2 instrument measures the Earth's backscattered radiance and extra-terrestrial solar irradiance in the spectral range between 240 and 790 nm. The morning measurements from GOME-2 provide a better understanding of the diurnal variations of the NO 2 columns in combination with afternoon observations from, for example, the OMI and TROPOMI instruments (13:30 local time). The default swath width of GOME-2 is 1920 km, enabling a global coverage in ∼1.5 days.
The default ground pixel size is 80 × 40 km^2 in the forward scan, which remains almost constant over the full swath width. In a tandem operation of MetOp-A and MetOp-B from July 2013 onwards, a decreased swath of 960 km and an increased spatial resolution of 40 × 40 km^2 are employed by GOME-2A. See Munro et al. (2016) for more details on instrument design and performance.
The operational GOME-2 NO 2 product is provided by DLR in the framework of EUMETSAT's Satellite Application Facility on Atmospheric Composition Monitoring (AC-SAF). The product processing chain starts with the level 0 to 1b processing within the core ground segment at EUMETSAT in Darmstadt (Germany), where the raw instrument (level 0) data is converted into geolocated and calibrated (level 1b) (ir)radiances by the GOME-2 Product Processing Facility (PPF). The level 1b (ir)radiances are disseminated through the EUMETCast system to the AC-SAF processing facility at DLR in Oberpfaffenhofen (Germany), and further processed using the Universal Processor for UV/VIS Atmospheric Spectrometers (UPAS) system. Broadcast via EUMETCast, WMO/GTS, and the Internet, the resulting level 2 near-real-time total column products including NO 2 columns can be received by user communities 2 hours after sensing. Offline and reprocessed GOME-2 level 2 and consolidated products are also provided within 1 day by DLR, which can be ordered via FTP-server and the EUMETSAT Data Centre (https://acsaf.org/).

3 Total and tropospheric NO 2 retrieval for GDP 4.8
The first main step of the retrieval algorithm is the DOAS technique, which is applied to determine the total NO 2 slant columns from the (ir)radiance spectra measured by the instrument. Based on the Beer-Lambert law, the DOAS fit is a least-squares inversion to isolate the trace gas absorption from the background processes, e.g., extinction resulting from scattering on molecules and aerosols, with a background polynomial P(λ) at wavelength λ:

$$\ln\left[\frac{I(\lambda) + \mathrm{offset}(\lambda)}{I_0(\lambda)}\right] = -\sum_g S_g\,\sigma_g(\lambda) - P(\lambda) \qquad (1)$$

The measurement-based term is defined as the natural logarithm of the measured earthshine radiance spectrum I(λ) divided by the daily solar irradiance spectrum I 0 (λ). The intensity offset correction offset(λ), which describes the additional contributions such as stray light in the spectrometer to the measured intensity, is modelled using a zero-order polynomial with the polynomial coefficient as fitting parameter. The spectral effect from the absorption of species g is determined by the fitted slant column density S g and the associated absorption cross-section σ g (λ).
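To illustrate the least-squares idea behind the DOAS fit, the following minimal sketch fits one absorber plus a low-order polynomial to a synthetic optical depth. It is not the GDP or QDOAS implementation: the function names (`solve_linear`, `doas_fit`), the toy cross-section, and the omission of the intensity offset and of any further absorbers are simplifying assumptions made only for this example.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def doas_fit(wavelengths, log_ratio, cross_sections, poly_degree=2):
    """Fit ln(I/I0) = -sum_g S_g*sigma_g - P(lambda); return the S_g.

    Assumed simplified model: no intensity offset, polynomial in a
    wavelength variable rescaled to [-1, 1].
    """
    lo, hi = min(wavelengths), max(wavelengths)
    t = [2.0 * (w - lo) / (hi - lo) - 1.0 for w in wavelengths]
    # Design-matrix columns: -sigma_g(lambda), then -t^p for the polynomial.
    columns = [[-s for s in xs] for xs in cross_sections]
    columns += [[-(ti ** p) for ti in t] for p in range(poly_degree + 1)]
    # Scale every column to unit maximum so the normal equations stay
    # well conditioned despite the tiny cross-section values.
    scales = [max(abs(v) for v in col) for col in columns]
    columns = [[v / sc for v in col] for col, sc in zip(columns, scales)]
    k = len(columns)
    ata = [[sum(u * v for u, v in zip(columns[i], columns[j]))
            for j in range(k)] for i in range(k)]
    aty = [sum(u * y for u, y in zip(columns[i], log_ratio)) for i in range(k)]
    x = solve_linear(ata, aty)
    return [x[g] / scales[g] for g in range(len(cross_sections))]

# Synthetic test spectrum on a 425-497 nm grid (toy cross-section, cm^2).
wl = [425.0 + 0.2 * i for i in range(361)]
sigma = [1e-19 * (1.0 + 0.5 * math.sin(2.0 * math.pi * (w - 425.0) / 3.0))
         for w in wl]
true_s = 1.0e16  # molec/cm^2
y = [-(true_s * s) - (0.3 + 0.1 * (w - 461.0) / 36.0)
     for w, s in zip(wl, sigma)]
fitted_s = doas_fit(wl, y, [sigma])[0]
```

With noise-free input the fitted slant column should reproduce `true_s` to within numerical precision; an operational fit would additionally handle noise, several absorbers, and the offset term.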
Table 1 gives an overview of the DOAS settings for the current operational GDP 4.8 algorithm, the improved version 4.9 algorithm (see Sect. 4), and the algorithm used in the QA4ECV product (see Sect. 4.5).
The second component in the retrieval is the calculation of initial total vertical column densities V init using a stratospheric AMF (M strat ) conversion:

$$V_\mathrm{init} = \frac{S}{M_\mathrm{strat}} \qquad (2)$$

with S the fitted total NO 2 slant column. Given the small optical thickness of NO 2 , M strat can be determined as:

$$M_\mathrm{strat} = \frac{\sum_l m_l(b)\, x_l\, c_l}{\sum_l x_l} \qquad (3)$$

with m l the box-air mass factors (box-AMFs) in layer l, x l the altitude-dependent subcolumns from a stratospheric a priori NO 2 profile climatology (Lambert et al., 1999), and c l a correction coefficient to account for the temperature dependency of the NO 2 cross-section (Boersma et al., 2004; Nüß et al., 2006). The calculation of V init assumes negligible tropospheric NO 2 and hence uses only the stratospheric a priori NO 2 profiles to derive the AMF. The box-AMFs m l are derived using the multi-layered multiple scattering LIDORT RTM (Spurr et al., 2001) and stored in a LUT as a function of various model inputs b, including GOME-2 viewing geometry, surface pressure, and surface albedo. The surface albedo is described by the Lambertian-equivalent reflectivity (LER). The surface LER climatology used in the GDP 4.8 algorithm is derived from combined TOMS/GOME measurements (Boersma et al., 2004) for the years 1979-1993 with a spatial resolution of 1.25° lon × 1.0° lat.
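The profile-weighted AMF and the conversion to an initial vertical column can be sketched in a few lines. The sketch assumes the usual form M strat = Σ m l x l c l / Σ x l for an optically thin absorber; the function names and the three-layer numbers are purely illustrative.

```python
def stratospheric_amf(box_amfs, subcolumns, temp_corrections):
    """Profile-weighted AMF: sum_l(m_l * x_l * c_l) / sum_l(x_l)."""
    if not (len(box_amfs) == len(subcolumns) == len(temp_corrections)):
        raise ValueError("all layer inputs must have the same length")
    numerator = sum(m * x * c for m, x, c
                    in zip(box_amfs, subcolumns, temp_corrections))
    return numerator / sum(subcolumns)

def initial_vertical_column(slant_column, m_strat):
    """Initial total vertical column: slant column divided by the AMF."""
    return slant_column / m_strat

# Hypothetical three-layer example (subcolumns in molec/cm^2).
m_l = [2.0, 2.5, 3.0]
x_l = [1.0e15, 2.0e15, 1.0e15]
c_l = [1.0, 1.0, 1.0]  # no temperature correction in this toy case
m_strat = stratospheric_amf(m_l, x_l, c_l)
v_init = initial_vertical_column(7.5e15, m_strat)
```

Because the weights are the a priori subcolumns, layers holding more NO 2 dominate the resulting AMF.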
In the presence of clouds, the calculation of M strat adopts the independent pixel approximation based on GOME-2 cloud parameters:

$$M_\mathrm{strat} = \omega\, M_\mathrm{strat}^\mathrm{cloud} + (1 - \omega)\, M_\mathrm{strat}^\mathrm{clear} \qquad (4)$$

with ω the cloud radiance fraction, M cloud strat the cloudy-sky stratospheric AMF, and M clear strat the clear-sky stratospheric AMF. M cloud strat and M clear strat are derived with Eq. (3), with M cloud strat mainly relying on the cloud pressure and the cloud albedo. ω is derived from the cloud fraction c f :

$$\omega = \frac{c_f\, I_\mathrm{cloud}}{(1 - c_f)\, I_\mathrm{clear} + c_f\, I_\mathrm{cloud}} \qquad (5)$$

where I cloud is the radiance for a cloudy scene and I clear for a clear scene. I cloud and I clear are calculated using LIDORT, depending mostly on the GOME-2 viewing geometry, surface albedo and cloud albedo. From GOME-2, c f is determined with the Optical Cloud Recognition Algorithm (OCRA) by separating a spectral scene into cloudy contribution and cloud-free background, and the cloud pressure and the cloud albedo are derived using the Retrieval Of Cloud Information using Neural Networks (ROCINN) algorithm by comparing simulated and measured radiance in and near the O 2 A-band (Loyola et al., 2007, 2011). Applied in the NO 2 retrieval in GDP 4.8, the latest version 3.0 of OCRA (Lutz et al., 2016) applies a degradation correction on the GOME-2 level 1 measurements as well as corrections for viewing angle and latitudinal dependencies. A new cloud-free background is constructed from six years of GOME-2A measurements from the years 2008-2013. The updated OCRA also includes an improved detection and removal of sun glint that affects most of the GOME-2 orbits. The version 3.0 of ROCINN (Loyola et al., 2018) applies a forward RTM calculation using updated surface albedo climatology and spectroscopic data as well as a new inversion scheme based on Tikhonov regularization (Tikhonov and Arsenin, 1977; Doicu et al., 2010).
The computation time of ROCINN is optimised with a smart sampling method (Loyola et al., 2016).
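The independent pixel approximation used for cloudy scenes can be illustrated as follows. The sketch assumes the common radiance-weighted form of the cloud radiance fraction; the function names and the example radiances are hypothetical.

```python
def cloud_radiance_fraction(cloud_fraction, i_cloud, i_clear):
    """Fraction of the measured radiance that comes from the cloudy part.

    Assumed form: w = c_f * I_cloud / ((1 - c_f) * I_clear + c_f * I_cloud).
    """
    cloudy = cloud_fraction * i_cloud
    clear = (1.0 - cloud_fraction) * i_clear
    return cloudy / (clear + cloudy)

def ipa_amf(radiance_fraction, m_cloud, m_clear):
    """Radiance-weighted mean of the cloudy-sky and clear-sky AMFs."""
    return radiance_fraction * m_cloud + (1.0 - radiance_fraction) * m_clear

# A 20 % cloud cover that is four times brighter than the clear scene
# contributes half of the measured radiance.
w = cloud_radiance_fraction(0.2, i_cloud=0.8, i_clear=0.2)
m = ipa_amf(w, m_cloud=3.0, m_clear=2.0)
```

The example shows why a small geometric cloud fraction can still dominate the AMF: bright clouds receive a radiance weight well above their area fraction.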
The next retrieval step is the separation of stratospheric and tropospheric components from the initial vertical total columns, namely the "stratosphere-troposphere separation". Since no direct stratospheric measurements are available for GOME-2, a spatial filtering algorithm is applied to estimate the stratospheric NO 2 columns in GDP 4.8. The spatial filtering algorithm belongs to the modified reference sector method, which uses total NO 2 columns over clean regions to approximate the stratospheric NO 2 columns based on the assumption of longitudinally invariable stratospheric NO 2 layers and of negligible tropospheric NO 2 abundance over the clean areas. The spatial filtering algorithm uses a pollution mask to filter the potentially polluted areas (tropospheric NO 2 columns larger than 1 × 10^15 molec/cm^2), followed by a low-pass filtering (with a zonal 30° boxcar filter) on the initial total columns of the unmasked areas, and afterwards a removal of a tropospheric background NO 2 (1 × 10^14 molec/cm^2) from the derived stratospheric columns.
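The three stages of the spatial filtering (masking, zonal boxcar smoothing, background removal) can be sketched for a single latitude band. This is a simplified illustration rather than the GDP 4.8 code: the regular longitude grid, the periodic wrap-around, the handling of fully masked windows, and the function name are assumptions.

```python
def zonal_boxcar_stratosphere(total_columns, polluted, lon_step_deg,
                              width_deg=30.0, background=1.0e14):
    """Estimate stratospheric columns along one latitude band.

    total_columns : initial total columns on a regular longitude grid
    polluted      : booleans, True where the pollution mask applies
    Cells whose centres lie within +/- width_deg/2 form the boxcar window.
    Returns None where every cell inside the window is masked.
    """
    n = len(total_columns)
    half = int(width_deg / (2.0 * lon_step_deg))
    result = []
    for i in range(n):
        window = [total_columns[(i + k) % n]      # periodic in longitude
                  for k in range(-half, half + 1)
                  if not polluted[(i + k) % n]]
        if window:
            result.append(sum(window) / len(window) - background)
        else:
            result.append(None)
    return result

# Hypothetical band on a 5 degree grid with one masked pollution spike.
columns = [3.0e15] * 72
mask = [False] * 72
columns[10] = 8.0e15
mask[10] = True
v_strat_band = zonal_boxcar_stratosphere(columns, mask, lon_step_deg=5.0)
```

In this toy case the masked spike does not leak into the estimate, and every cell returns the smooth field minus the assumed tropospheric background.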
Finally, the tropospheric NO 2 columns V trop can be computed as:

$$V_\mathrm{trop} = \frac{M_\mathrm{strat}}{M_\mathrm{trop}} \times T \qquad (6)$$

where M strat is the stratospheric AMF in Eq. (3), M trop is the tropospheric AMF, and T is the tropospheric residues (T = V init − V strat ). M trop is determined using Eq. (3) and (4) with tropospheric a priori NO 2 profiles. The calculation of M trop relies on the same model parameters as that of M strat , but the dependency on parameters like surface albedo and cloud properties as well as on the a priori NO 2 profiles is much stronger. The GDP 4.8 adopts the tropospheric a priori NO 2 profiles from a run of the global chemistry transport model MOZART version 2 (Horowitz et al., 2003) with anthropogenic emissions from the EDGAR2.0 inventory (Olivier et al., 1996) for the early 1990s. The monthly average vertical profiles are calculated from MOZART-2 data from the year 1997 for the overpass time of GOME-2 (9:30 local time) with a resolution of 1.875° lon × 1.875° lat.
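As a worked example of the final conversion, the sketch below rescales the tropospheric residue from the stratospheric to the tropospheric AMF. It assumes the relation V trop = T × M strat / M trop with T = V init − V strat ; the function name and the numbers are illustrative only.

```python
def tropospheric_column(v_init, v_strat, m_strat, m_trop):
    """Tropospheric vertical column from the tropospheric residue.

    The residue T = V_init - V_strat was derived with the stratospheric
    AMF, so it is rescaled by M_strat / M_trop.
    """
    residue = v_init - v_strat
    return residue * m_strat / m_trop

# Hypothetical polluted pixel: the low tropospheric AMF amplifies the residue.
v_trop = tropospheric_column(v_init=5.0e15, v_strat=3.0e15,
                             m_strat=2.5, m_trop=1.0)
```

Here a residue of 2 × 10^15 molec/cm^2 becomes a tropospheric column of 5 × 10^15 molec/cm^2, which illustrates how strongly errors in M trop propagate into the result.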

Improved DOAS slant column retrieval
A larger 425-497 nm wavelength fitting window for the DOAS method (Richter et al., 2011) is implemented in the GDP 4.9 to retrieve the NO 2 slant columns, which improves the signal-to-noise ratio by including more NO 2 absorption structures.
Compared to the extended 405-465 nm range, as employed by the QA4ECV GOME-2 NO 2 product and used in the NO 2 retrieval for the OMI instrument (Boersma et al., 2002; van Geffen et al., 2015), the 425-497 nm fitting window has stronger sensitivity to NO 2 columns in the boundary layer because the importance of scattering decreases with wavelength (Richter and verification team, 2015). In this study, the slant columns are derived using the QDOAS software developed at the Belgian Institute for Space Aeronomy (BIRA-IASB) (Danckaert et al., 2015). Table 1 summarises the new settings of the GDP 4.9 algorithm.

Absorption cross-sections
In the fitting window optimized for NO 2 retrieval, the DOAS fit includes species with strong and unique absorption structures and describes their spectral effect using absorption cross-sections from literature. In our GDP 4.9 algorithm, the absorption

It is worth noting that our improved DOAS retrieval in the GDP 4.9 adopts a decreased temperature of the NO 2 cross-section (220 K instead of 240 K in GDP 4.8; Valks et al., 2017) for consistency with other NO 2 retrievals from GOME-2, OMI and TROPOMI (Müller et al., 2016; Boersma et al., 2002; van Geffen et al., 2015, 2016), with a minor effect on the fit quality (∼0.02%) from the two temperatures. Changing the temperature of the NO 2 cross-section from 240 K to 220 K reduces the NO 2 slant columns by ∼6-9%, but this temperature dependency is corrected in the AMF and vertical column calculation (see Eq. (3)).
The spectral signature of sand absorption has been investigated by Richter et al. (2011) for GOME-2 data, but it is not applied here because of the potential interference with the broadband liquid water structure (Peters et al., 2014), which might lead to non-physical results over the ocean.

Intensity offset correction
Besides the radiances backscattered by the Earth's atmosphere, a number of both natural (i.e., the Ring effect) and instrumental (e.g., stray light in the spectrometer and change of the detector's dark current) sources contribute to an additional "offset" to the scattering intensity. To correct for this offset, an intensity offset correction with a linear wavelength dependency (i.e., polynomial degree of 1) is applied for the large fitting window in this study. Figure 1 illustrates the effect of using a linear intensity offset correction for the large fitting window on 3 March 2008. The use of a linear offset correction increases the NO 2 columns by up to 3 × 10^14 molec/cm^2 (17%) and decreases the fitting residuals (retrieval root-mean-square, RMS) by up to 30%. Larger differences are found at the eastern scans (eastern part of the GOME-2 swath), possibly suggesting instrumental issues specific to GOME-2. For the retrieval RMS, stronger improvements are mainly located above ocean, arguably from the compensation of inelastic vibrational Raman scattering in water bodies (Vountas et al., 2003).
The intensity offset can also be fitted using only the constant term, as employed by the GDP 4.8 algorithm (with the 425-450 nm wavelength window) and as recommended by the QA4ECV algorithm (with 405-465 nm). Compared to the use of a linear intensity offset correction, the application of a constant term in our retrieval shows a decrease in the NO 2 columns by up to 3.5 × 10^14 molec/cm^2 (17%) and an increase in the retrieval RMS by up to 14%, which implies the necessity of using a linear intensity offset correction for the large 425-497 nm wavelength range.

GOME-2 slit function treatment
An accurate treatment of the instrumental slit function is essential for the wavelength calibration and the convolution of high-resolution laboratory cross-sections. In spite of a generally good spectral stability of GOME-2 in orbit, the width of the GOME-2 slit function has been changing on both long and short timescales (Munro et al., 2016), which needs to be accounted for in the DOAS analysis. In this study, an improved treatment of the GOME-2 slit function in the DOAS fit is achieved by calculating effective slit functions from GOME-2 irradiance measurements to correct for the long-term variations (see Sect. 4.3.1) and by including an additional cross-section in the DOAS fit to correct for the short-term variations (see Sect. 4.3.2).

Long-term variations
To analyse the long-term variations of the GOME-2 instrumental slit function and the impact on our retrieval, effective slit functions are derived by convolving a high-resolution reference solar spectrum (Chance and Kurucz, 2010) with a stretched  long-term variations in the GOME-2 slit function are caused by changing temperatures of the optical bench due to the seasonal variation in solar heating and the lack of thermal stability of the optical bench, respectively (Munro et al., 2016).Although the variations are only a few percent, the effect on the DOAS retrieval is significant.Compared to the application of the preflight slit function, the use of a stretched slit function improves the calibration residuals by ∼40% for both GOME-2A and GOME-2B (not shown).
In previous studies, slit functions have also been fitted using various Gaussian shapes. For instance, De Smedt et al. (2012) have derived effective GOME-2 slit functions for formaldehyde retrieval using an asymmetric Gaussian with its width and shape as fit parameters. For NO 2 retrieval, the use of effective slit functions with an asymmetric Gaussian leads to similar results as using a preflight slit function. In addition, Beirle et al. (2017) have proposed a slit function parameterization using a Super Gaussian, which has been shown to describe the slit function changes quickly and robustly for satellite instruments such as OMI or TROPOMI. In the case of GOME-2, the Super Gaussian obtains nearly identical results as the asymmetric Gaussian and is therefore not applied here.
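For readers unfamiliar with the Super Gaussian shape, a minimal sketch is given below. It assumes the symmetric form S(x) = A exp(−|x/w|^k) with the normalisation A = k / (2 w Γ(1/k)); the parameter values are arbitrary and do not describe the GOME-2 slit function.

```python
import math

def super_gaussian(x, w, k):
    """Area-normalised Super Gaussian with width w and shape exponent k.

    k = 2 gives an ordinary Gaussian; larger k flattens the top and
    steepens the flanks, smaller k produces a more peaked shape.
    """
    amplitude = k / (2.0 * w * math.gamma(1.0 / k))
    return amplitude * math.exp(-(abs(x / w) ** k))

def integrate(func, lower, upper, steps=20000):
    """Trapezoidal rule, used here only to check the normalisation."""
    h = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    total += sum(func(lower + i * h) for i in range(1, steps))
    return total * h

area_k2 = integrate(lambda x: super_gaussian(x, 0.3, 2.0), -3.0, 3.0)
area_k4 = integrate(lambda x: super_gaussian(x, 0.3, 4.0), -3.0, 3.0)
```

Both areas should come out close to one, so that fitting w and k changes the shape of the kernel without changing the total signal after convolution.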

In-orbit variations
To correct for the in-orbit variations of GOME-2 slit function, a "resolution correction function" (Azam et al., 2015) is included as an additional cross-section in the DOAS fit (see Table 1).The cross-section is derived by dividing a high-resolution solar spectrum (Chance and Kurucz, 2010)

GOME-2 level 1b data
As described in Sect. 2, the level 0 to 1b processing by the PPF at EUMETSAT calculates the geolocation and calibration parameters and produces the calibrated level 1b (ir)radiances. Due to the incomplete removal of Xe-line contamination in the GOME-2B calibration key-data (calibration key-data is taken during the on-ground campaign and required as an input to the level 0 to 1b processing), artefacts at wavelengths longer than 460 nm have been reported by Azam et al. (2015) for GOME-2B irradiances. Mainly focusing on the cleaning of contamination in the GOME-2B calibration key-data, a new version 6.1 of the GOME-2 level 0 to 1b processor has been activated from 25 June 2015 onwards (EUMETSAT, 2015). To study the impact of the new level 1b data on our GDP 4.9 algorithm using the 425-497 nm fitting window, the retrieval is analysed using both the new version 6.1 (testing dataset provided by EUMETSAT for March 2015) and the previous version 6.0 data for the same period. Figure 4 presents a comparison of the retrieved NO 2 columns over the Pacific for GOME-2A and GOME-2B. The application of the version 6.1 level 1b data slightly reduces the NO 2 columns by ∼1-1.5 × 10^14 molec/cm^2 (∼6-11%) for GOME-2A. A larger effect is observed for GOME-2B with a decrease of NO 2 columns by ∼3-4 × 10^14 molec/cm^2 (∼15-23%) and a reduction of RMS error by ∼27-33% (not shown). The stronger decrease of GOME-2B NO 2 columns leads to a better consistency between the datasets from GOME-2A and GOME-2B with an overall bias reduced from ∼3 × 10^14 molec/cm^2 to ∼1 × 10^14 molec/cm^2.

Comparison to QA4ECV data
The quality of the GDP 4.9 retrieval is evaluated using the GOME-2 NO 2 dataset from QA4ECV, which is a project aiming at quality-assured satellite products using a retrieval algorithm harmonised for GOME, SCIAMACHY, OMI and GOME-2. The GOME-2A NO 2 columns from QA4ECV (version 1.1) for the years 2007-2015 have shown an improved quality over previous datasets (Zara et al., 2018). Table 1 gives an overview of the DOAS settings used in the QA4ECV project. Figure 5 shows a comparison of the NO 2 columns over the Pacific from the GDP 4.8 algorithm, the GDP 4.9 algorithm, and the QA4ECV data for February 2007. For comparison, only ground pixels with solar zenith angle smaller than 80° are considered. The GDP 4.8 dataset has been adjusted using a 220 K Vandaele et al. (2002) NO 2 cross-section to remove the influence of the temperature dependency of the NO 2 cross-section (see discussion in Sect. 4.1). Compared to the GDP 4.8 dataset, the improved DOAS retrieval in the GDP 4.9 increases the NO 2 columns by ∼1-3 × 10^14 molec/cm^2 (up to 27%). Compared to the QA4ECV product, a good overall consistency is found with the GDP 4.9 dataset at all latitudes considering the different DOAS settings such as fitting window, offset correction, and slit function characterisation.
Figure 6 presents the time series of calculated slant column errors from the three datasets, following a statistical method to analyze the NO 2 slant column uncertainty for GOME-2 (Valks et al., 2011, Sect. 6.1 therein). The slant column errors, calculated as variations of NO 2 measurements within small boxes (2 increase for all the three datasets as a result of instrument degradation (Dikty et al., 2011; Munro et al., 2016) until the major throughput test in September 2009 (see Sect. 4.3.1) and stabilize afterwards. Mainly driven by the use of a wider fitting window with stronger absorptions, the smallest slant column errors are found for the GDP 4.9 algorithm, e.g., 23.8% smaller than for the GDP 4.8 and 13.5% smaller than for the QA4ECV dataset in February 2007, with an increasing difference with time for the QA4ECV dataset (27.9% in December 2015).
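The statistical error estimate, taken here as the scatter of slant columns within small spatial boxes, can be sketched as follows. The box size is left as a required argument because its value is incomplete in the text above; the use of the sample standard deviation, the averaging over boxes, and the function name are assumptions for illustration and not necessarily the procedure of Valks et al. (2011).

```python
import math
import statistics
from collections import defaultdict

def slant_column_scatter(pixels, box_deg):
    """Mean within-box standard deviation of slant columns.

    pixels  : iterable of (latitude, longitude, slant_column)
    box_deg : edge length of the square boxes in degrees (hypothetical input)
    Boxes with fewer than two pixels are skipped; returns None when no
    box qualifies.
    """
    boxes = defaultdict(list)
    for lat, lon, scd in pixels:
        key = (math.floor(lat / box_deg), math.floor(lon / box_deg))
        boxes[key].append(scd)
    spreads = [statistics.stdev(v) for v in boxes.values() if len(v) >= 2]
    return statistics.mean(spreads) if spreads else None

# Two boxes: one with scatter, one perfectly uniform (arbitrary units).
example_pixels = [
    (0.2, 0.2, 1.0), (0.8, 0.8, 3.0),
    (10.2, 10.2, 2.0), (10.4, 10.4, 2.0), (10.6, 10.6, 2.0),
]
scatter = slant_column_scatter(example_pixels, box_deg=1.0)
```

Over a clean and homogeneous region the geophysical variability inside a small box is low, so the remaining scatter is dominated by retrieval noise.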

5 New stratosphere-troposphere separation
The calculation of tropospheric NO 2 requires an estimation and removal of the stratospheric contribution to the initial total NO 2 columns. In our GDP 4.9 retrieval, the stratosphere-troposphere separation algorithm STREAM (Beirle et al., 2016) has been adapted to GOME-2 measurements. Belonging to the modified reference sector method, STREAM uses initial total NO 2 columns with negligible tropospheric contribution, i.e., unpolluted measurements at remote areas and cloudy measurements at medium altitudes, to derive the stratospheric NO 2 columns. Based on a tropospheric NO 2 climatology and the GOME-2 cloud product, STREAM calculates weighting factors for each satellite pixel to define the contribution of initial total columns to the stratospheric estimation: potentially polluted pixels are weighted low instead of being totally masked out as in the GDP 4.8 spatial filtering method; cloudy observations at medium altitudes are given higher weights because they directly provide the stratospheric information; the weights are further adjusted in a second iteration if pixels suffer from large biases in the tropospheric residues. Depending on these weighting factors, stratospheric NO 2 fields are derived by weighted convolution on the daily initial total columns using convolution kernels. The convolution kernels are wider at lower latitudes due to the longitudinal homogeneity assumption of stratospheric NO 2 and narrower at higher latitudes to reflect the stronger natural variations. To remove the biases in the weighted convolution resulting from the large latitudinal gradients, a latitudinal correction is applied to the initial total columns: the latitudinal dependencies of initial total NO 2 are calculated over the clean Pacific, removed from the initial total NO 2 before weighted convolution, and added back to the estimated stratospheric columns afterwards. However, we found that longitudinal variations of NO 2 concentration resulted in biases in the latitudinal correction and hence in the stratospheric estimation. For the adaptation of STREAM to GOME-2 measurements, the performance of STREAM is analysed using synthetic GOME-2 NO 2 observations (see Sect. 5.1) and an improved latitudinal correction is applied (see Sect. 5.2).
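The weighted convolution can be illustrated along a single latitude circle. This is a minimal sketch and not the STREAM code: the Gaussian kernel shape, the one-dimensional geometry, the externally supplied weights, and the function name are assumptions made for the example.

```python
import math

def weighted_convolution(values, weights, lons, kernel_width_deg):
    """Weighted, kernel-smoothed estimate at every input longitude.

    estimate_i = sum_j(w_j * G_ij * v_j) / sum_j(w_j * G_ij),
    with G_ij a Gaussian of the periodic longitude distance (assumed shape).
    Returns None where the weighted kernel sum vanishes.
    """
    result = []
    for lon_i in lons:
        num = 0.0
        den = 0.0
        for v, w, lon_j in zip(values, weights, lons):
            d = abs(lon_i - lon_j) % 360.0
            d = min(d, 360.0 - d)               # shortest way around the globe
            g = math.exp(-0.5 * (d / kernel_width_deg) ** 2)
            num += w * g * v
            den += w * g
        result.append(num / den if den > 0.0 else None)
    return result

# Hypothetical band: one polluted pixel receives zero weight, so it is
# filled in from its neighbours instead of biasing the estimate.
lons = [10.0 * i for i in range(36)]
total = [3.0e15] * 36
weights = [1.0] * 36
total[5] = 8.0e15
weights[5] = 0.0
strat = weighted_convolution(total, weights, lons, kernel_width_deg=20.0)
```

Replacing the zero weight by a small positive value shows the intended behaviour of down-weighting rather than masking: the polluted pixel then pulls the estimate up only slightly.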

Performance of STREAM
To test the performance of STREAM for GOME-2, simulated NO 2 fields from the C-IFS-CB05-BASCOE (referred to as C-IFS throughout this work) experiment (Huijnen et al., 2016) (see Eq. (2)). Modelled NO 2 slant columns S are based on the total vertical columns V total from C-IFS with interpolation to match the GOME-2 centre pixel coordinate and measurement time. Total AMFs M total and stratospheric AMFs M strat are derived using Eq. (3)-(5) with surface properties and cloud information from GOME-2 orbital data and with C-IFS a priori NO 2 profiles for the whole atmosphere and between the tropopause (defined by a latitude-dependent parameterization with the tropopause height ranging from 270 hPa for arctic to 92 hPa for tropics) and the top of the atmosphere, respectively.
The performance of STREAM is evaluated by applying the synthetic initial total NO 2 columns and comparing the estimated stratospheric NO 2 columns with the a priori truth (stratospheric fields from C-IFS integrated between the tropopause and the top of the atmosphere).

Improved latitudinal correction
In Fig. 8 (top), larger differences are noticeable over the subtropical regions in winter for both days, primarily related to the latitudinal correction used in STREAM. As described in Sect. 5, the latitudinal correction is applied by  to reduce the biases over the subtropics. The new latitudinal correction determines the latitudinal dependencies of total NO 2 based on clean measurements in the whole latitude band (the median of lowest NO 2 columns for each 1° latitude band). Figure 8 (bottom) shows the difference for the estimated stratospheric NO 2 using the improved latitudinal correction. For both days, the application of the new latitudinal correction in STREAM largely removes the biases over the subtropics in Fig. 8 (top).
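The new latitudinal dependency, i.e., the median of the lowest columns in each latitude band, can be sketched as below. The text does not state how many of the lowest columns are used, so `lowest_fraction` is a hypothetical parameter without a default; the function name and the toy data are likewise illustrative.

```python
import math
import statistics
from collections import defaultdict

def latitudinal_dependency(pixels, lowest_fraction, band_deg=1.0):
    """Median of the lowest columns in each latitude band.

    pixels          : iterable of (latitude, total_column)
    lowest_fraction : share of the lowest columns kept per band
                      (hypothetical parameter, not given in the text)
    Returns a dict mapping the lower band edge to the clean reference value.
    """
    bands = defaultdict(list)
    for lat, column in pixels:
        bands[math.floor(lat / band_deg)].append(column)
    profile = {}
    for band, columns in bands.items():
        columns.sort()
        n_keep = max(1, int(len(columns) * lowest_fraction))
        profile[band * band_deg] = statistics.median(columns[:n_keep])
    return profile

# Ten pixels between 10 and 11 degrees north (arbitrary units).
cols = [5.0, 1.0, 3.0, 2.0, 9.0, 4.0, 8.0, 7.0, 6.0, 10.0]
pixels = [(10.05 + 0.09 * i, c) for i, c in enumerate(cols)]
profile = latitudinal_dependency(pixels, lowest_fraction=0.5)
```

Using the lowest columns from the whole latitude band, rather than from a fixed reference longitude range, makes the reference less sensitive to longitudinal variations in any single region.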
Applying the improved STREAM to GOME-2 data, Fig. 9 presents the initial total columns from GOME-2 and the stratospheric NO 2 calculated with STREAM and with the spatial filtering method used in the GDP 4.8 algorithm (see Sect. 3) in February and August 2009. For both months, the results calculated with STREAM and with the spatial filtering method show similar global structures. Since the spatial filtering method applies a fixed pollution mask to remove the potentially polluted regions (tropospheric NO 2 larger than 1 × 10^15 molec/cm^2), moderately polluted pixels with tropospheric NO 2 up to 1 × 10^15 molec/cm^2 still contribute to the stratospheric estimation. Therefore, enhanced stratospheric NO 2 by more than 5 × 10^14 molec/cm^2 is found over polluted regions, e.g., Middle East, China, central Africa, southern Africa, and Australia in Fig. 9 (bottom). This overestimation is largely removed by STREAM in Fig. 9 (center).
6 Improvements to NO 2 AMF calculation

RTM
As summarized in Table 2, updated box-AMFs are calculated using the linearised vector code VLIDORT (Spurr, 2006) version 2.7. VLIDORT applies the discrete ordinates method to generate simulated intensity and analytic intensity derivatives with respect to atmospheric and surface parameters (i.e., weighting functions). Box-AMFs m l (see Eq. (3)) are determined as:

$$m_l = -\frac{1}{I\,\tau_{\mathrm{NO_2},l}} \left(\frac{\partial I}{\partial \tau_{\mathrm{NO_2},l}} \cdot \tau_{\mathrm{NO_2},l}\right) \qquad (7)$$

with I the simulated top-of-atmosphere radiance, τ NO2,l the absorption optical thickness of NO 2 at layer l, and the term (∂I/∂τ NO2,l ) · τ NO2,l the NO 2 profile weighting function. Compared to the scalar (intensity-only) LIDORT code, VLIDORT provides more realistic modelling results with a treatment of light polarisation, which affects the tropospheric AMFs by up to 4%.
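The relation between box-AMFs and radiance derivatives can be checked numerically with a toy forward model. The sketch assumes m l = −(1/I) ∂I/∂τ l and replaces the analytic weighting functions of VLIDORT with central finite differences; the exponential toy radiance and all names are hypothetical.

```python
import math

def box_amfs_from_radiance(radiance, taus, rel_step=1e-4):
    """Box-AMFs m_l = -(1/I) * dI/dtau_l via central finite differences.

    radiance : callable mapping a list of layer optical thicknesses to
               a top-of-atmosphere radiance
    taus     : layer optical thicknesses, all strictly positive
    """
    base = radiance(taus)
    amfs = []
    for layer, tau in enumerate(taus):
        h = tau * rel_step
        upper = list(taus)
        lower = list(taus)
        upper[layer] = tau + h
        lower[layer] = tau - h
        derivative = (radiance(upper) - radiance(lower)) / (2.0 * h)
        amfs.append(-derivative / base)
    return amfs

# Toy optically thin forward model whose box-AMFs are known exactly.
TRUE_AMFS = [0.8, 1.5, 2.4]

def toy_radiance(taus):
    return math.exp(-sum(m * t for m, t in zip(TRUE_AMFS, taus)))

estimated = box_amfs_from_radiance(toy_radiance, [1e-3, 2e-3, 1e-3])
```

A linearised model such as VLIDORT returns the derivatives in a single run, which avoids the repeated forward calls and the step-size tuning that finite differences require.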
The box-AMFs m_l for each layer are calculated for the mid-point wavelength of the fitting window, i.e., 461 nm in our NO2 retrieval, which is representative of the window-average box-AMFs. Compared to the tropospheric AMFs at 440 nm (mid-point wavelength in GDP 4.8), the ones calculated at 461 nm are higher by up to 10% for polluted situations, due to the wavelength dependency of the AMF. The uncertainty related to the wavelength dependency of the AMF is much smaller than the uncertainties introduced by surface albedo, a priori NO2 profile, cloud and aerosol (see Sect. 6.4).
The box-AMF m_l is calculated with the RTM and stored in a LUT as a function of GOME-2 viewing geometry, surface pressure, and surface albedo. Compared to the LUT used in GDP 4.8, a new LUT is calculated with an increased number of reference points, e.g., for surface pressure (from 10 to 16) and for surface albedo (from 10 to 14), as well as vertical layers (from 24 to 60), to reduce the interpolation error (Lorente et al., 2017), leading to differences in tropospheric AMFs of up to 2%.
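The LUT lookup can be illustrated with a reduced two-dimensional example in Python (surface pressure and surface albedo only). This is a sketch under the assumption of simple multilinear interpolation between reference points. The grid values, table entries, and function names are hypothetical, and the operational LUT additionally depends on viewing geometry and vertical layer.

```python
from bisect import bisect_right


def _bracket(grid, value):
    """Return index i and weight w so that value ~ (1 - w) * grid[i] + w * grid[i + 1]."""
    value = min(max(value, grid[0]), grid[-1])  # clamp to the grid range
    i = min(bisect_right(grid, value) - 1, len(grid) - 2)
    w = (value - grid[i]) / (grid[i + 1] - grid[i])
    return i, w


def interp_lut(pressures, albedos, table, pressure, albedo):
    """Bilinear interpolation of table[i][j], defined on (pressures[i], albedos[j])."""
    i, wp = _bracket(pressures, pressure)
    j, wa = _bracket(albedos, albedo)
    lower = (1 - wa) * table[i][j] + wa * table[i][j + 1]
    upper = (1 - wa) * table[i + 1][j] + wa * table[i + 1][j + 1]
    return (1 - wp) * lower + wp * upper


# Hypothetical 2 x 2 LUT of box-AMFs for a single layer and viewing geometry.
pressures = [700.0, 1000.0]  # hPa
albedos = [0.0, 0.1]
table = [[0.6, 1.0],
         [0.4, 0.8]]

m_l = interp_lut(pressures, albedos, table, pressure=850.0, albedo=0.05)
```

A denser grid of reference points shortens the intervals over which the table is assumed to vary linearly, which is how the increased number of reference points reduces the interpolation error.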

Surface albedo
Surface albedo is an important parameter for an accurate retrieval of NO2 columns and cloud properties. The sensitivity of backscattered radiance to the boundary layer NO2 is strongly related to the surface albedo, especially over polluted areas. In the GDP 4.9, the surface LER climatology based on TOMS/GOME data (Boersma et al., 2004) is replaced by a surface LER climatology derived from GOME-2 observations (version 2.1), which has an increased spatial resolution.
Figure 10 shows the surface LER data from the GOME-2 and TOMS/GOME observations for 440 nm in February. A good overall consistency is found between the two LER datasets, in particular over the ocean. Larger differences are found over certain snow or ice areas, like Russia and southern Canada, which can be attributed to changes in snow or ice cover during the different measurement periods of the two LER datasets.
Figure 11 illustrates the influence of the updated surface LER at 440 nm on the retrieved tropospheric NO2 columns in February 2008. The difference over the ocean is very small. Larger effects are noticed primarily under polluted conditions with positive differences, e.g., over parts of central Europe, Russia or USA, and negative values, e.g., over parts of South Africa, India or China. The differences in the retrieved tropospheric NO2 columns are consistent with the changes in the surface LER.
For example, the GOME-2 surface LER over central Europe is ∼0.012 smaller than the TOMS/GOME data, and a lower sensitivity to tropospheric NO2 is therefore assumed in the AMF calculation. This results in a decrease in the AMF and hence an increase in the retrieved tropospheric NO2 column by ∼7 × 10^14 molec/cm^2 (∼12%). Vice versa, an increase of the surface LER values by ∼0.018 over the Yangtze River region in eastern China leads to a reduction of tropospheric NO2 columns by ∼4 × 10^15 molec/cm^2 (∼15%).
As described in Sect. 6.1, the AMFs are calculated for 461 nm in the GDP 4.9 (425-497 nm wavelength window) instead of 440 nm in the GDP 4.8 (425-450 nm wavelength window); therefore, the corresponding surface LER values at 463 nm are used. Compared to the values at 440 nm, the surface LER values at 463 nm are higher by up to 0.02 over desert areas and lower by up to 0.02 over the ocean and the snow or ice areas, which results in differences of up to 5% in the calculated AMFs.
The surface LER climatology from Kleipool et al. (2008) derived from OMI measurements for 2004-2007 has been widely used in satellite NO2 retrievals (e.g., Boersma et al., 2011; Barkley et al., 2013; Bucsela et al., 2013). An important advantage of using the GOME-2 LER climatology with respect to the OMI LER dataset in our retrieval is the consistency with the GOME-2 NO2 observations, considering the illumination conditions, observation geometry, and instrumental characteristics. Another advantage of the GOME-2 LER climatology is the use of more recent observations to reduce the errors introduced by ignoring the interannual variability of surface albedo, which are possibly large for varying snow and ice situations. Possible corrections for the surface albedo from a climatology include the use of external information about the actual snow and ice conditions, e.g., from the Near-real-time Ice and Snow Extent (NISE) dataset (Nolin et al., 2005).

A priori vertical profiles
The retrieved tropospheric NO2 columns are sensitive to changes in the relative vertical distribution of the a priori NO2 concentrations (i.e. profile shape). Increasing the spatial and/or temporal resolution of the a priori profiles has been shown to produce a more accurate NO2 retrieval (e.g., Russell et al., 2011; Heckel et al., 2011; McLinden et al., 2014; Nüß et al., 2006; Laughner et al., 2016). To improve the tropospheric AMF calculation, daily a priori NO2 profiles are obtained with a resolution of 1° lon × 1° lat from the chemical transport model TM5-MP (Huijnen et al., 2010; Williams et al., 2017). The TM5-MP profiles have been used in several studies to derive AMFs and tropospheric NO2 columns (e.g., van Geffen et al., 2016; Lorente et al., 2017; Boersma et al., 2018).
Figure 12 shows the TM5-MP and MOZART-2 a priori NO2 profiles for two pollution hot spots located in Brussels (Belgium, 50.9° N, 4. In Fig. 12, the tropospheric NO2 columns retrieved for the individual days using TM5-MP and MOZART-2 a priori NO2 profiles are also reported. Taking Brussels on 11 February 2009 (Fig. 12 top left) as an example, the smaller boundary layer concentration modelled by TM5-MP (less steep profile shape) leads to an increase in the tropospheric AMF and hence a decrease in the retrieved tropospheric NO2 columns by 2.6 × 10^15 molec/cm^2 (19.7%). Figure 13 presents a comparison of the monthly averaged tropospheric NO2 columns retrieved using daily TM5-MP and monthly MOZART-2 a priori NO2 profiles in February and August 2009. The application of the daily TM5-MP a priori NO2 profiles affects the tropospheric NO2 columns by more than 1 × 10^15 molec/cm^2 mostly over polluted regions with enhanced NO2 in the boundary layer, e.g., with an increase of tropospheric NO2 over parts of China, India, and South Africa, and a decrease over parts of the eastern US, Europe, and Japan.
To analyse the effect of using daily vs. monthly profiles, the tropospheric NO2 columns are also retrieved using the monthly average TM5-MP profiles, as shown in Fig. 12. Differences in the profile shape of daily and monthly profiles are mainly related to the variations in the meteorology. In agreement with Nüß et al. (2006) and Laughner et al. (2016), the use of monthly profiles changes the tropospheric NO2 columns by up to 3 × 10^15 molec/cm^2 depending on the wind speed and wind direction, in particular for regions affected by transport (not shown). For the example of Brussels on 11 February 2009 (Fig. 12 top left), the use of monthly profiles increases the tropospheric NO2 columns by 5 × 10^14 molec/cm^2 (4.7%). A comprehensive analysis of the effect of using a priori NO2 profiles from different chemistry transport models on the retrieved tropospheric NO2 will be described in a subsequent paper.

Examples of GOME-2 tropospheric NO2
Figure 14 shows the tropospheric NO2 columns from the improved GDP 4.9 algorithm for February and August, averaged over the years 2007-2016. Figure 15 shows the difference in tropospheric NO2 columns between the GDP 4.9 and GDP 4.8 products.
The tropospheric NO2 columns increase globally by ∼1 × 10^14 molec/cm^2 due to the improved DOAS slant column fitting and increase further by ∼3 × 10^14 molec/cm^2 around moderately polluted regions, which benefit from the use of the new stratosphere-troposphere separation algorithm STREAM. A stronger change of more than 1 × 10^15 molec/cm^2 is found mainly over polluted continents, as a result of the improvements to the AMF calculation, primarily the surface albedo (which also affects the snow or ice areas, e.g., southern Canada and northeastern Europe) and/or the a priori NO2 profiles (which also affect the polluted ocean, e.g., shipping lanes in southeastern Asia).

Over central northern Europe, the tropospheric NO2 columns are reduced by ∼1 × 10^15 molec/cm^2 for GDP 4.9 in winter and ∼3 × 10^14 molec/cm^2 in summer. A larger number of negative values in GDP 4.8, possibly related to the overestimated stratospheric NO2 around polar vortex areas, is largely corrected in GDP 4.9 by improving the stratosphere-troposphere separation algorithm. Over eastern China and eastern US, the seasonal variation is consistent between GDP 4.8 and 4.9, with reduced values in winter (by more than 1 × 10^15 molec/cm^2) and increased values in summer (by more than 1 × 10^15 molec/cm^2 for eastern China and 5 × 10^14 molec/cm^2 for eastern US) for GDP 4.9 due to the combined impact of the algorithm changes, mainly the AMF calculation. Over India and its surrounding areas, a systematic increase in tropospheric NO2 columns by ∼7 × 10^14 molec/cm^2 for GDP 4.9 results from the use of STREAM.

Uncertainty estimates for GOME-2 total and tropospheric NO2
The uncertainty in our GDP 4.9 NO2 slant columns is 4.4 × 10^14 molec/cm^2, calculated from the average slant column error using a statistical method described in Sect. 4.5. The uncertainty in the GOME-2 stratospheric columns is ∼4-5 × 10^14 molec/cm^2 for polluted conditions based on the daily synthetic GOME-2 data and ∼1-2 × 10^14 molec/cm^2 for monthly averages. The uncertainty in the GDP 4.9 AMF calculation is likely reduced, considering the improved surface albedo climatology and a priori NO2 profiles, which are the main causes of AMF structural uncertainty (Lorente et al., 2017). In addition, the AMF uncertainty is substantially driven by the cloud parameters and the aerosol correction approach.
The largest cloud-related uncertainty in the NO2 retrieval is introduced by the surface albedo-cloud fraction error correlation, as analysed by Boersma et al. (2018) for OMI using the OMCLDO2 cloud product, which requires a surface albedo climatology as input in the cloud fraction retrieval. This uncertainty is likely smaller for the OCRA/ROCINN cloud algorithms, since the surface albedo is treated differently in OCRA's cloud fraction calculation. OCRA retrieves the cloud fraction by separating a spectral scene into a cloudy contribution and a cloud-free background. The cloud fraction from OCRA is therefore affected by surface albedo through the cloud-free map construction, with a larger impact over bright surfaces like snow or ice cover, in particular during snowfall (higher background) or melting (lower background); this effect has been corrected by interpolating towards a daily value between two monthly cloud-free maps in OCRA (Lutz et al., 2016).
The uncertainty introduced by aerosol in GDP 4.9 is ∼50% for high aerosol loading, in agreement with Lorente et al. (2017).
With direct impact on the NO2 AMF calculation and indirect impact via the cloud parameter retrieval, the aerosol effect has been considered for OMI implicitly through the cloud correction (Boersma et al., 2004, 2011) or explicitly with additional aerosol information for regional studies (Lin et al., 2014, 2015; Kuhlmann et al., 2015; Castellanos et al., 2015; Chimot et al., 2018), leading to an increase or decrease of the NO2 AMF by up to 40% depending on the NO2 distribution and the aerosol properties and distribution. Since aerosol is highly variable in space and time due to the dependency on emission sources, transport, and atmospheric processes (Holben et al., 1991), explicit aerosol correction will be applied in our AMF calculation when reliable observations or model outputs of aerosol optical properties and vertical distributions are available. To conclude, the uncertainty in the AMF calculation is estimated to be in the 10-45% range for polluted conditions, leading to a total uncertainty in the tropospheric NO2 columns likely in the range of 30-70%.
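How the individual error sources combine into a total tropospheric column uncertainty can be sketched in Python with standard Gaussian error propagation for V = (S - S_strat) / M. This is an assumption for illustration: the terms are treated as uncorrelated, the function name is hypothetical, and the input values for the polluted pixel are invented, so the result is not a reproduction of the uncertainty budget given above.

```python
import math


def tropospheric_column_uncertainty(s_trop, m_trop, sigma_s, sigma_s_strat, rel_sigma_m):
    """Propagate uncorrelated errors into V_trop = s_trop / m_trop.

    s_trop        tropospheric slant column (S - S_strat), molec/cm^2
    m_trop        tropospheric AMF
    sigma_s       slant column uncertainty, molec/cm^2
    sigma_s_strat uncertainty of the stratospheric slant column, molec/cm^2
    rel_sigma_m   relative AMF uncertainty (0.3 means 30%)
    """
    v_trop = s_trop / m_trop
    term_slant = sigma_s / m_trop
    term_strat = sigma_s_strat / m_trop
    term_amf = v_trop * rel_sigma_m
    sigma_v = math.sqrt(term_slant ** 2 + term_strat ** 2 + term_amf ** 2)
    return v_trop, sigma_v


# Hypothetical polluted pixel; only the slant column uncertainty (4.4e14)
# is taken from the text, all other inputs are illustrative.
v_trop, sigma_v = tropospheric_column_uncertainty(
    s_trop=1.0e16, m_trop=1.0, sigma_s=4.4e14, sigma_s_strat=1.2e15, rel_sigma_m=0.30
)
relative_uncertainty = sigma_v / v_trop
```

For strongly polluted pixels the AMF term dominates the sum, which is consistent with the AMF being described as the main error source over polluted areas.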
7 End-to-end GOME-2 NO2 validation
The validation of NO2 data derived from the GOME-2 GDP algorithm is part of the validation activities done at BIRA-IASB in the AC-SAF context (Hassinen et al., 2016). An end-to-end validation approach is usually performed for each main release and summarized in validation reports that can be found on the AC-SAF validation website (http://cdop.aeronomie.be/validation/valid-reports). This includes several steps, such as: (1) the evaluation of the DOAS analysis results, cloud property retrievals, and AMFs by comparing GOME-2 retrievals with other established satellite retrievals and AMF evaluations; (2) the stratospheric reference evaluation by comparison with correlative observations from ground-based zenith-looking DOAS spectrometers and from other nadir-looking satellites; and (3) the tropospheric and total NO2 column data evaluation by comparison with correlative observations from ground-based multiple-axis DOAS (MAXDOAS) and Direct Sun spectrometers (Pinardi et al., 2014).
In this paper, we focus on the last point: the validation of tropospheric data with BIRA-IASB ground-based MAXDOAS data.
The MAXDOAS instruments collect scattered sky light in a series of line-of-sight angular directions extending from the horizon to the zenith. High sensitivity towards absorbers near the surface is obtained for the smallest elevation angles, while measurements at higher elevations provide information on the rest of the column. This technique allows the determination of vertically resolved abundances of atmospheric trace species in the lowermost troposphere (Hönninger et al., 2004; Wagner et al., 2004; Wittrock et al., 2004; Heckel et al., 2005). Here the bePRO retrieval code (Clémer et al., 2010; Hendrick et al., 2014; Vlemmix et al., 2015) is used to retrieve tropospheric columns and low tropospheric profiles (up to 3.5 km with about 2 to 3 degrees of freedom).
As summarised in Table 3, a set of MAXDOAS stations (Beijing, Bujumbura, Observatoire de Haute Provence (OHP), Reunion, Uccle, and Xianghe) provides interesting test cases for the GOME-2 sensitivity to tropospheric NO2. Beijing and Uccle are typical urban stations, Xianghe is a suburban station (∼60 km from Beijing), Bujumbura and Reunion are small cities in remote regions, and OHP is largely rural but occasionally influenced by polluted air masses transported from neighboring cities. A general difficulty in such comparisons is that the satellite pixel size (80×40 km^2 or 40×40 km^2 for GOME-2) is larger than the horizontal sensitivity of the ground-based measurements, which is about a few to tens of km (Irie et al., 2011; Wagner et al., 2011; Ortega et al., 2015). In this context, MAXDOAS data are better suited than in-situ measurements, with an extended horizontal and vertical sensitivity that is more similar to the satellite sensitivity, but differences in sampling and sensitivity still remain and explain part of the biases highlighted by validation exercises. Several validation studies show significant underestimation of tropospheric trace gases, such as NO2, from satellite observations over regions with strong spatial gradients in tropospheric pollution (e.g., Celarier et al., 2008; Kramer et al., 2008; Chen et al., 2009; Irie et al., 2012; Ma et al., 2013; Wu et al., 2013; Kanaya et al., 2014; Wang et al., 2017; Drosoglou et al., 2017, 2018). Other possible explanations include the uncertainties in the applied satellite retrieval assumptions, such as the choices of surface albedo, a priori NO2 profiles, or cloud and aerosol treatment (Boersma et al., 2004, 2011; Leitão et al., 2010; Heckel et al., 2011; Lin et al., 2014, 2015). The best agreement is generally obtained in the case of suburban and remote stations, but difficulties may arise when small local sources are present in a remote location, such as Reunion Island or Bujumbura (Pinardi et al., 2015; Gielen et al., 2017).
The same methodology as in the GDP 4.8 validation report (Pinardi et al., 2015) is used for the validation of this improved GDP 4.9 tropospheric NO2 dataset: the satellite data are filtered for clouds (cloud radiance fraction smaller than 0.5) and the mean value of all the valid pixels within 50 km of the stations is compared to the ground-based value. The ground-based MAXDOAS instruments usually retrieve NO2 columns throughout the day every 20 to 30 minutes, and these values are linearly interpolated to the GOME-2 overpass time (9:30 local time) if original data exist within ±1 hour.
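The colocation criteria above (cloud radiance fraction below 0.5, pixels within 50 km, linear interpolation of MAXDOAS data to 9:30 local time if data exist within ±1 hour) can be sketched in Python as follows. This is an illustration and not the BIRA-IASB validation code: the dictionary keys, function names, and the choice to return None when the overpass time is not bracketed by measurements are assumptions.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere with radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def satellite_mean(pixels, st_lat, st_lon, max_km=50.0, max_crf=0.5):
    """Mean column of cloud-filtered pixels near the station.

    pixels: dicts with hypothetical keys 'lat', 'lon', 'crf' (cloud radiance
    fraction) and 'vcd' (tropospheric column).
    """
    valid = [
        p["vcd"]
        for p in pixels
        if p["crf"] < max_crf
        and haversine_km(p["lat"], p["lon"], st_lat, st_lon) <= max_km
    ]
    return sum(valid) / len(valid) if valid else None


def maxdoas_at_overpass(times_h, columns, overpass_h=9.5, window_h=1.0):
    """Linearly interpolate MAXDOAS columns to the overpass time (local hours)."""
    if not any(abs(t - overpass_h) <= window_h for t in times_h):
        return None  # no original data within the time window
    pairs = sorted(zip(times_h, columns))
    for (t0, c0), (t1, c1) in zip(pairs, pairs[1:]):
        if t0 <= overpass_h <= t1:
            if t1 == t0:
                return c0
            w = (overpass_h - t0) / (t1 - t0)
            return (1 - w) * c0 + w * c1
    return None  # overpass time not bracketed by measurements (assumed behaviour)


# Toy example with a hypothetical station at (0 N, 0 E).
pixels = [
    {"lat": 0.1, "lon": 0.1, "crf": 0.2, "vcd": 1.0e16},   # valid
    {"lat": 0.2, "lon": 0.0, "crf": 0.8, "vcd": 5.0e16},   # too cloudy
    {"lat": 1.0, "lon": 1.0, "crf": 0.1, "vcd": 9.0e16},   # too far away
    {"lat": -0.1, "lon": 0.2, "crf": 0.4, "vcd": 2.0e16},  # valid
]
sat_value = satellite_mean(pixels, 0.0, 0.0)
ground_value = maxdoas_at_overpass([8.9, 9.3, 9.8], [1.0e16, 1.2e16, 1.7e16])
```

The alternative colocation choices mentioned later in the text (closest pixel only, or only pixels containing the station) would replace the distance test in `satellite_mean`.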
Figure 16 shows an example of the time-series and scatter plot of the daily and monthly mean comparison between GDP 4.9 tropospheric NO2 columns and ground-based MAXDOAS measurements in Xianghe, including the statistical information on the number of points, correlation coefficient, and slope and intercept of the orthogonal regression analysis. Figure 17 presents the daily and monthly mean absolute and relative differences of GDP 4.9 and ground-based measurements. As can be seen in Fig. 16 and 17, the seasonal variation in the tropospheric NO2 columns is similarly captured by both observation systems with differences on average within ±3 × 10^15 molec/cm^2 (median difference of −1.2 × 10^15 molec/cm^2). Larger differences are observed on some days and months, in particular in winter when NO2 and aerosol loadings are large. A relatively compact scatter is found, with a correlation coefficient of 0.91 and a slope of 0.72±0.04 for the orthogonal regression fit. These results are qualitatively similar to those obtained in previous validation exercises (Celarier et al., 2008; Kramer et al., 2008; Chen et al., 2009; Irie et al., 2012; Ma et al., 2013; Wu et al., 2013; Kanaya et al., 2014; Wang et al., 2017; Drosoglou et al., 2017, 2018). Similar figures for GDP 4.8 can be found on the AC-SAF validation website (http://cdop.aeronomie.be/validation/valid-results).
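For readers unfamiliar with orthogonal regression, the following Python sketch computes the slope, intercept, and correlation coefficient for paired satellite and ground-based columns. It uses the closed-form solution for orthogonal (total least squares) regression with equal error variance on both axes, which is an assumption here, since the text does not specify the exact variant or the method used for the slope uncertainty.

```python
import math


def orthogonal_regression(x, y):
    """Return (slope, intercept, r) of the orthogonal regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("orthogonal regression slope is undefined for zero covariance")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


# Toy data lying exactly on y = 0.5 * x + 1 (arbitrary units).
ground = [1.0, 2.0, 3.0, 4.0]
satellite = [1.5, 2.0, 2.5, 3.0]
slope, intercept, r = orthogonal_regression(ground, satellite)
```

Unlike ordinary least squares, this fit minimises perpendicular distances and therefore treats errors in both datasets symmetrically. It depends on the units, so both columns should be expressed in the same unit.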
Figure 18 reports the monthly mean absolute and relative differences for both GDP 4.8 and GDP 4.9 for Xianghe station.
The daily differences are also reported through the histogram panel, where the reduction in the spread of the daily comparison points is clearly visible for GDP 4.9. The reduction of the bias, which is smaller and more stable in time, is seen in the absolute and relative monthly mean bias time-series. Three years show a standard deviation of the monthly biases larger for GDP 4.9 than for GDP 4.8 (±12% instead of ±8% in 2010, ±12% instead of ±8% in 2013, and ±41% instead of ±27% in 2014) but with a strongly reduced mean bias (-4% instead of -20%, -8% instead of -34%, and -1% instead of -44%).
Figures similar to Fig. 16 and 18 for all the stations are gathered in Fig. S1 to S4 in the supplement, and all the statistics are summarized in Table 4 and Table 5 for GOME-2A and GOME-2B, respectively. Fig. S1 and S2 present the time-series and scatter plots for GDP 4.9, while Fig. S3 and S4 present the differences for both GDP 4.9 and GDP 4.8 comparisons. As discussed in Pinardi et al. (2015), for background stations (here Bujumbura, Reunion and OHP), the mean bias is considered the best indicator of the validation results, due to the relatively small variability in the measured NO2. In urban (Beijing and Uccle) and suburban (Xianghe) situations, the NO2 variability is large enough that the correlation coefficient is a good indication of the linearity or coherence of the satellite and ground-based datasets, although larger differences in terms of slope (closer to 0.5 than to 1 for urban cases) and of mean bias can be expected because satellite measurements (and especially GOME-2 80×40/40×40 km^2 pixels) smooth out the local NO2 hot spots. This can be seen e.g. in the cases of Beijing and Xianghe for GOME-2A (see Fig. S1a and Fig. 16, respectively), where very high correlations (R=0.94 and 0.91, respectively) are obtained from GDP 4.9, showing the very consistent behavior of both datasets for small and large NO2 columns. Their slopes (S=0.4 and 0.72, respectively), however, differ by almost a factor of 2, with a smaller slope in the Beijing case, where the MAXDOAS instrument is in the city center and thus much more subject to local emissions that are smeared out by the large GOME-2 pixel. This last effect is also seen in the bias values (RD=-47% and -5.8%, respectively), which are strongly reduced when moving the MAXDOAS outside the city to a suburban location like Xianghe. A slope of 0.47 (similar to the 0.4 of Beijing) is also obtained in Uccle, another urban site, where the MAXDOAS is affected by local emissions.
In remote cases such as OHP, Bujumbura or Reunion Island, as discussed above, the variation of the NO2 columns is small and the statistical analysis of the regression is not very representative of the situation, with a cloud of points giving small slopes and low correlation coefficients (see e.g. Fig. S1b to d and Table 4 for GOME-2A). In those cases, GOME-2 is lower than the ground-based data, sometimes with almost no seasonal variation (e.g. Bujumbura and Reunion), while in other cases, like OHP, some of the daily peaks are captured by GOME-2 (such as days in the winters of 2014 and 2015), and the seasonal patterns and the orders of magnitude of both datasets are similar. In these cases, it is best to look at the absolute biases (as relative biases are large due to the division by small ground-based columns), as presented in e.g. Fig. S3b to d and Table 4.
Mean absolute differences for GDP 4.9 are about -3.6 × 10^15 molec/cm^2 for Bujumbura, -8.5 × 10^14 molec/cm^2 for OHP, and -1.5 × 10^15 molec/cm^2 for Reunion, which are all smaller than their respective GDP 4.8 values. The daily differences presented in the histograms of those figures also show a reduced spread of the GDP 4.9 comparisons when superimposed on the GDP 4.8 results.
Similar differences are also found for GOME-2B.
To conclude, although the Xianghe case presented in Fig. 16 to 18 is the best case (due to its suburban location and its long time-series), a better seasonal agreement between GDP 4.9 and MAXDOAS data is found for urban and suburban cases like Beijing, Uccle, and Xianghe, compared to results with GDP 4.8. In remote locations such as OHP, which is occasionally influenced by polluted air masses transported from neighboring cities, the comparison is also meaningful (e.g. with a mean bias reduced from -45% for GDP 4.8 to -25% for GDP 4.9 for GOME-2A), while cases such as Bujumbura and Reunion are quite challenging for satellite validation, with specific local conditions (Bujumbura is in a valley on the side of Lake Tanganyika, while the MAXDOAS at Reunion is in St-Denis, on the coast of the 65 km long and 50 km wide island in the Indian Ocean, containing a mountain massif with summits above 2740 m asl). In both cases the MAXDOAS instrument is located in small cities surrounded by specific orography, difficult for satellite retrievals and challenging for validation. The absolute and relative differences show, however, a clear improvement for all the stations, when comparing to GDP 4.8 results for both daily and monthly mean biases. The daily biases and spreads are all reduced.
To summarize, the improvement of the algorithm (as seen in Table 4 and 5 and in Fig. S3 and S4) leads to a decrease of the relative differences in urban conditions such as in Beijing or Uccle from [-52,-60]% for GDP 4.8 to [-43,-47]% for GDP 4.9 for GOME-2A and from -54% to -40% for GOME-2B. In suburban conditions such as in Xianghe, the differences go from -30% to -6% for GOME-2A and from -26% to -2% for GOME-2B. In remote (difficult) cases such as in Bujumbura or Reunion, the differences go from [-89,-90]% to [-64,-76]% for GOME-2A and from [-86,-87]% to [-47,-74]% for GOME-2B, while in a background case such as OHP, the differences decrease from -45% to -25% for GOME-2A and from -42% to -17% for GOME-2B. The differences in numbers for GOME-2A and GOME-2B are due to the different time-series lengths of both comparisons (e.g. March 2010-November 2016 for GOME-2A and December 2012-November 2016 for GOME-2B in Xianghe), the different sampling of the atmosphere by GOME-2A and GOME-2B (slight time-delay between both overpasses and reduced swath pixels for GOME-2A since July 2013), and the impact of the decreasing quality of the satellite in time, i.e.
These comparison results aim at showing how the final GDP 4.9 product is improved compared to its predecessor, and not at summarizing the improvements of each of the changes discussed in previous sections. In addition, the specific validation method could be improved or at least better characterized (including result uncertainties), e.g. by changing the colocation method (averaging the MAXDOAS within an hour of the satellite overpass, selecting the closest satellite pixel, or only considering the pixels containing the station, etc.), but this is beyond the scope of the present manuscript, which aims to compare against the standard validation results routinely produced for GDP 4.8 (and publicly available at http://cdop.aeronomie.be/validation/valid-results).
For most stations, in addition to the tropospheric columns, MAXDOAS retrieved NO2 profiles can also be exploited with satellite column averaging kernels (AK) to further investigate the impact of the satellite a priori NO2 profiles on the comparison differences (Eskes and Boersma, 2003). The satellite AK describes the vertical sensitivity of measurements to NO2 concentrations and relates the MAXDOAS profiles to satellite column measurements by calculating the "smoothed MAXDOAS columns" as:
\[ V_{MAXDOAS,smoothed} = \sum_l AK_{sat,l} \cdot x_{MAXDOAS,l} \]
The smoothed MAXDOAS NO2 columns $V_{MAXDOAS,smoothed}$ are derived for each day by convolving the layer (l)-dependent daily profile (interpolated to the satellite overpass time) $x_{MAXDOAS}$ expressed in partial columns with the satellite column averaging kernel $AK_{sat}$. The comparisons of satellite and smoothed MAXDOAS columns for the different stations are reported in the supplement (Fig. S5 and S6) and Table 4 and 5. The different impact of MAXDOAS smoothing on the two GDP products results from the different AKs, as parameters like surface albedo or a priori NO2 profiles used in the two satellite retrievals are quite different (see Sect. 6). In general, the use of smoothing reduces the MAXDOAS columns and thus reduces both the daily and monthly differences of satellite and MAXDOAS columns. When the averaging kernels are used to remove the contribution of the a priori NO2 profile shape, as seen in Table 4 and 5 and in Fig. S5 and S6, the relative differences in urban conditions such as in Beijing or Uccle decrease from [-52,-57]% for GDP 4.8 to [-34,-37]% for GDP 4.9 for GOME-2A and from -56% to -29% for GOME-2B.
The differences go from -32% to -13% for GOME-2A and from -27% to -11% for GOME-2B for suburban conditions such as in Xianghe and go from -77% to -31% for GOME-2A and from -64% to -7% for GOME-2B for remote conditions such as in Reunion.
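The smoothing of a MAXDOAS profile with the satellite column averaging kernel, described above, amounts to a weighted sum over layers. A minimal Python sketch follows, assuming that the MAXDOAS partial columns have already been interpolated to the satellite layers and overpass time. The function name and the numbers are illustrative.

```python
def smoothed_column(averaging_kernel, partial_columns):
    """Sum over layers of AK_l * x_l (both given on the same vertical grid)."""
    if len(averaging_kernel) != len(partial_columns):
        raise ValueError("averaging kernel and profile must share the same layers")
    return sum(ak * x for ak, x in zip(averaging_kernel, partial_columns))


# Hypothetical three-layer example (layer 0 is the surface layer).
ak_sat = [0.4, 0.8, 1.2]            # low satellite sensitivity near the surface
x_maxdoas = [5.0e15, 2.0e15, 1.0e15]  # partial columns in molec/cm^2

v_unsmoothed = sum(x_maxdoas)
v_smoothed = smoothed_column(ak_sat, x_maxdoas)
```

Because the kernel is below 1 in the layers that hold most of the NO2, the smoothed column in this toy case is smaller than the original MAXDOAS column, in line with the general reduction reported for the smoothed comparisons.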
The results obtained here are coherent with other validation exercises at different stations and with other satellite products, where the NO2 levels are underestimated by the satellite sensors, e.g., with differences of 5% to 25% over China (Ma et al., 2013; Wu et al., 2013; Wang et al., 2017; Drosoglou et al., 2018), mostly explained by the relatively low sensitivity of spaceborne measurements near the surface, the gradient-smoothing effect, and the aerosol shielding effect. These effects are often inherent to the different measurement types or the specific conditions of the validation sites (as seen in the different results for the Beijing and Xianghe sites in this manuscript), but are also related to the remaining impact of structural uncertainties (Boersma et al., 2016), such as the impact of the choices of the a priori NO2 profiles and/or the albedo database assumed for the satellite AMF calculations (see Sect. 6). Lorente et al. (2017), for example, estimated the AMF structural uncertainty to be on average 42% over polluted regions and 31% over unpolluted regions, mostly driven by substantial differences in the a priori trace gas profiles, surface albedo and cloud parameters used to represent the state of the atmosphere. However, the differences in Bujumbura are still -62%, because of the peculiar conditions with the MAXDOAS being in a valley, close to Lake Tanganyika, which always leads to a higher surface pressure for the satellite pixels due to the information coming from the a priori model. This leads to large representation errors and uncertainties in the comparisons (Boersma et al., 2016) that need to be investigated in more detail.

Conclusions
NO2 columns retrieved from measurements of GOME-2 aboard the MetOp-A and MetOp-B platforms have been successfully applied in many studies. The abundance of NO2 is retrieved from the narrow band absorption structures of NO2 in the backscattered and reflected radiation in the visible spectral region. The current operational retrieval algorithm (GDP 4.8) for total and tropospheric NO2 from GOME-2 was first introduced by Valks et al. (2011), and an improved algorithm (GDP 4.9) is described in this paper.
To calculate the NO2 slant columns, a larger 425-497 nm wavelength fitting window is used in the DOAS fit to increase the signal-to-noise ratio. Absorption cross-sections are updated and a linear intensity offset correction is applied. The long-term and in-orbit variations of the GOME-2 slit function are corrected by deriving effective slit functions with a stretched preflight GOME-2 slit function and by including a resolution correction function as a pseudo absorber cross-section in the DOAS fit, respectively. Compared to the GDP 4.8 algorithm, the NO2 columns from GDP 4.9 are higher by ∼1-3 × 10^14 molec/cm^2 (up to 27%) and the NO2 slant column noise is lower by ∼24%. In addition, the effect of using a new version (6.1) of the GOME-2 level 1b data has been analyzed in our NO2 algorithm. The application of new GOME-2 level 1b data largely reduces the offset between GOME-2A and GOME-2B NO2 columns by removing calibration artefacts in the GOME-2B irradiances (due for OHP, [-64%,-47%; -1.5 × 10^15 molec/cm^2, -0.8 × 10^15 molec/cm^2] for Reunion, and [-43%,-40%; -5 × 10^15 molec/cm^2, -4.2 × 10^15 molec/cm^2] for Uccle. Reunion and Bujumbura are difficult sites for validation, due to their valley/mountain nature, while the urban sites Beijing and Uccle show similar relative results. The smallest absolute bias is found at the rural OHP station.
Compared to the current operational GDP 4.8 product, the GDP 4.9 dataset is a significant improvement. Although GOME-2 measurements still underestimate the tropospheric NO2 columns with respect to the ground data, the absolute and relative differences with the different MAXDOAS stations are smaller, both for the original comparisons and for the comparisons with the smoothed MAXDOAS columns.
In the future, the AMF calculation will be further improved, since the uncertainty in the AMF is a dominant source of error in the tropospheric NO2 retrieval, especially over polluted areas. The surface Bidirectional Reflectance Distribution Function (BRDF) effect will be included using a direction-dependent LER climatology from GOME-2 (Tilstra, L., personal communication) to describe the angular distribution of the surface reflectance. Aerosol properties will be considered explicitly in the RTM calculation using ground-based aerosol observations from e.g. MAXDOAS instruments, Mie scattering Lidars, or sun photometers operated by the AErosol RObotic NETwork (AERONET). A priori NO2 profiles from different global and regional models will help to analyse the effect of spatial resolution, temporal resolution, and emission on the tropospheric NO2 retrieval for GOME-2. Furthermore, the NO2 algorithm will be adapted to measurements from the TROPOMI instrument with a spatial resolution as high as 7×3.5 km^2.

Figure 1. Difference in NO2 columns (slant columns scaled by geometric AMFs) (left) and retrieval RMS (right) estimated with and without a linear intensity offset correction for GOME-2A on 3 March 2008.
Figure 2 displays the long-term evolution of the fitted GOME-2 slit function width (full width at half maximum, FWHM) calculated from the stretch factors. The GOME-2 slit function has narrowed after the launch by ∼5% for GOME-2A and ∼3.5% for GOME-2B at 451 nm, in agreement with Dikty et al. (2011), Azam et al. (2015), and Munro et al. (2016). For GOME-2A, visible discontinuities of the slit function width are related to the in-orbit instrument operations, including an apparent anomaly in September 2009 when a major throughput test was performed (EUMETSAT, 2012). After the throughput test, the narrowing of the slit function has slowed down. For GOME-2B, stronger seasonal fluctuations of the FWHM are found. The seasonal and

Figure 2. Temporal evolution of the fitted slit function FWHM for GOME-2A (left, January 2007-December 2016) and GOME-2B (right, December 2012-December 2016).

convolved with a stretched preflight GOME-2 slit function (see Sect. 4.3.1) by itself but convolved with a slightly modified slit function. Figure 3 shows an example of the fit coefficients and the influence on our DOAS retrieval on 1 February 2013. As shown in the left panel, the slit function width increases along the orbit by ∼2 × 10 −3 nm (∼0.4%) for GOME-2A (see Beirle et al. 2017, Fig. 8 therein) and ∼5.2 × 10 −3 nm (∼1%) for GOME-2B (a fit coefficient of 1 × 10 −2 corresponds to a change in the slit function width of ∼2.8 × 10 −3 nm). This in-orbit broadening of the slit function is caused by the increasing temperature of the instrument along the orbit. Taking into account the in-orbit broadening in the DOAS fit decreases the retrieval RMS by up to 5% for GOME-2A and up to 12% for GOME-2B in Fig. 3 (right).
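The conversion between fit coefficient and slit function width quoted above can be written out as a short sketch. It assumes the relation is linear, with the stated scale of ∼2.8 × 10 −3 nm per 1 × 10 −2 of fit coefficient; the function names are hypothetical, and the FWHM of 0.52 nm used in the example is an assumed value chosen only to illustrate the percentage calculation.

```python
# Scale taken from the text: a fit coefficient of 1e-2 corresponds to a
# width change of about 2.8e-3 nm. Linearity is an assumption of this sketch.
NM_PER_UNIT_COEFFICIENT = 2.8e-3 / 1.0e-2


def width_change_nm(fit_coefficient):
    """Change in slit function width (nm) for a given fit coefficient."""
    return fit_coefficient * NM_PER_UNIT_COEFFICIENT


def fit_coefficient_for(width_change):
    """Inverse conversion: fit coefficient for a given width change (nm)."""
    return width_change / NM_PER_UNIT_COEFFICIENT


def relative_change_percent(width_change, fwhm_nm):
    """Width change expressed in percent of a reference FWHM (nm)."""
    return 100.0 * width_change / fwhm_nm


# Hypothetical example: the GOME-2B broadening of 5.2e-3 nm along the orbit,
# relative to an assumed FWHM of 0.52 nm.
coeff = fit_coefficient_for(5.2e-3)
percent = relative_change_percent(5.2e-3, 0.52)
```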

Figure 3. Changes of GOME-2 slit function width along orbit 32636 on 1 February 2013 (left) and the impact on the retrieval RMS error (right). Red lines provide the boxcar average for GOME-2A (dotted) and GOME-2B (solid). A fit coefficient of 1 × 10 −2 corresponds to a change in the slit function width of ∼2.8 × 10 −3 nm in the left panel.
are applied. The C-IFS model is a combination of the tropospheric chemistry module in the Integrated Forecast System (IFS, with current version based on the Carbon Bond chemistry scheme, CB05) of the European Centre for Medium-Range Weather Forecasts (ECMWF) and stratospheric chemistry from the Belgian Assimilation System for Chemical ObsErvations (BASCOE) system. Based on one year of C-IFS data (2009) at a resolution of 0.75° lon × 0.75° lat, synthetic initial total columns V init are calculated as:

Figure 7 displays the synthetic initial total columns from C-IFS, the modelled stratospheric columns, and the estimated stratospheric columns from STREAM on 5 February and 5 August 2009. The result from STREAM presents an overall smooth stratospheric pattern with a strong latitudinal and seasonal dependency resulting from photochemical changes and dynamical variability. Because the stratospheric values over polluted regions are taken from the clean measurements at the same latitude, the stratospheric and tropospheric contributions over polluted regions are well separated by STREAM, especially in the northern hemisphere. Due to the latitude-dependent definition of convolution kernels, STREAM conserves the longitudinal gradients of stratospheric NO 2 at low latitudes and identifies certain strong stratospheric variations at high latitudes, e.g., in the polar vortex on 5 February. However, smaller structures in the synthetic initial total columns, for instance, resulting from the diurnal variation of NO 2 across an orbital swath, are aliased into the troposphere by STREAM due to the use of convolution kernels.

Figure 8 (top) shows the differences between the estimated (Fig. 7 bottom) and a priori (Fig. 7 center) stratospheric NO 2. Overall, the stratospheric columns estimated from STREAM show a good agreement with the modelled truth with a slight overestimation, e.g., by ∼1-2 × 10 14 molec/cm 2 over low latitudes for both days. Larger differences are found at higher latitudes, especially in winter, e.g., by ∼5 × 10 14 molec/cm 2 over eastern Europe and over the North Pacific (west of Canada) on 5 February. The strong longitudinal variations of NO 2 over these regions in the a priori truth (Fig. 7 center) cannot be completely captured

determining the latitudinal dependencies of total NO 2 over the clean Pacific, removing the latitudinal dependencies before convolution and adding them back to the estimated stratospheric columns. However, longitudinal variations of total NO 2 , for instance, enhanced total NO 2 columns over the Pacific (compared to the Atlantic Ocean) at 15° N-30° N on 5 February 2009 (Fig. 7 top left), introduce biases in the stratospheric NO 2 columns. Therefore, an improved latitudinal correction is introduced

Figure 9. GOME-2 initial total NO2 columns (top) and stratospheric NO2 columns retrieved from the improved STREAM algorithm (center) and from the spatial filtering method used in GDP 4.8 (bottom), measured by GOME-2A in February (left) and August (right) 2009.
has been replaced by one based on GOME-2 observations (Tilstra et al., 2017b). Using the degradation-corrected GOME-2 level 1 measurements, the GOME-2 surface LER is derived by matching the measurements in a pure Rayleigh scattering atmosphere without cloud. Compared to the TOMS/GOME LER climatology, the GOME-2 surface LER (version 2.1) dataset takes advantage of newer observations for 2007-2013, an increased spatial resolution of 1.0° lon × 1.0° lat for standard grid cells and 0.25° lon × 0.25° lat at coastlines (Tilstra et al., 2017a), and an improved treatment of cloud-contaminated cells over the ocean.

Figure 11. Difference in tropospheric NO2 columns for clear-sky conditions (cloud radiance fraction smaller than 0.5) for February 2008 retrieved using the GOME-2 surface LER climatology version 2.1 and the LER climatology based on TOMS/GOME data at 440 nm.
4° E) and Guangzhou (China, 23.1° N, 113.3° E) on one day in February and August 2009 as examples. Monthly profiles are shown for MOZART-2, and profiles for the given days are shown for TM5-MP. Large differences between the a priori NO 2 profile shapes from TM5-MP and MOZART-2 are found for both cities. These differences result from the different chemical mechanisms, transport schemes, and emission inventories employed by the models, the different spatial resolution, and the use of daily vs. monthly profiles. In TM5-MP, the use of updated NO x emissions from the MACCity inventory (Granier et al., 2011) produces more realistic profiles. Improvement in the spatial resolution gives a more accurate description of the NO 2 gradient and transport. The use of daily profiles provides a better description of the temporal NO 2 variation, especially for regions dominated by emission and transport like Brussels and Guangzhou.

Figure 12. Examples of a priori NO2 profiles for Brussels (top) and Guangzhou (bottom) on a given day in February (left) and August (right) 2009. Monthly profiles are shown for MOZART-2 (green), and daily profiles on the given days are shown for TM5-MP (brown) together with the monthly average profiles calculated for TM5-MP (blue). The tropospheric NO2 columns retrieved using each a priori NO2 profile are also given.

Figure 13. Difference in tropospheric NO2 columns for clear-sky conditions (cloud radiance fraction smaller than 0.5) retrieved using daily TM5-MP and monthly MOZART-2 a priori NO2 profiles for February (left) and August (right) 2009. Red circles indicate locations in Fig. 12.

Figure 16. Daily (upper row) and monthly mean (lower row) time series and scatter plot of GOME-2A and MAXDOAS tropospheric NO2 columns (mean value of all the pixels within 50 km around Xianghe).
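The spatial selection described in the caption (mean of all pixels within 50 km of the station) can be sketched as follows. This is a minimal illustration, not the validation code used in this study: it assumes that the distance is measured from the pixel centre along a great circle on a spherical Earth, and the function names and the coordinates in the example are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_column_near_station(pixels, station_lat, station_lon, radius_km=50.0):
    """Average the columns of all pixels whose centre lies within radius_km.

    pixels: iterable of (centre_lat, centre_lon, column) tuples.
    Returns None when no pixel falls inside the radius.
    """
    selected = [column for lat, lon, column in pixels
                if haversine_km(lat, lon, station_lat, station_lon) <= radius_km]
    if not selected:
        return None
    return sum(selected) / len(selected)


# Hypothetical pixels (lat, lon, column in 1e15 molec/cm2) around a
# hypothetical station at 40.0 N, 117.0 E.
pixels = [(40.0, 117.0, 10.0), (40.2, 117.0, 20.0), (41.0, 117.0, 100.0)]
daily_mean = mean_column_near_station(pixels, 40.0, 117.0)
```

In this example the third pixel lies roughly 111 km from the station and is excluded, so only the first two pixels enter the mean.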

Figure 17. Daily (grey dots) and monthly mean (black dots) absolute and relative GOME-2A and MAXDOAS time series differences for the Xianghe station. The histogram of the daily differences is also given, with the mean and median difference, and the total time series absolute and relative monthly differences are given outside the panels.

Figure 18. Absolute and relative differences of GOME-2A and MAXDOAS tropospheric NO2 columns. The time series presents the monthly mean differences for GDP 4.8 (black) and GDP 4.9 (red). The total mean difference values and standard deviations are given, as well as the yearly values. The histogram presents the daily differences over the whole time series for the two products (grey for GDP 4.8 and red for GDP 4.9).

Table 1. Main settings of GOME-2 DOAS retrieval of NO2 slant columns discussed in this study.
g and associated absorption cross-section σ g (λ). An additional term with the Ring scaling factor α R and the Ring reference spectrum R(λ) describes the filling-in effect of Fraunhofer lines by rotational Raman scattering (the so-called Ring effect). The GDP 4.8 algorithm adopts a wavelength range of 425-450 nm to ensure prominent NO 2 absorption structures and controllable interferences from other absorbing species, e.g., water vapor (H 2 O vap ), ozone (O 3 ), and oxygen dimer (O 4

Table 2. Main settings of AMF calculation method and input data discussed in this study.

Table 3. An overview of BIRA-IASB MAXDOAS datasets used in this study.

N, 116.96° E suburban polluted site in China neighboring cities. These different station types are important in the validation context as it is generally expected that urban stations are underestimated by the satellite data, due to the averaging of a local source over a pixel size (80×40/40×40 km 2

Table 4. Averaged Absolute Differences (AD, SAT-GB in 10 15 molec/cm 2 ), Relative Differences (RD, (SAT-GB)/GB in %), standard deviation (STDEV), correlation coefficient R and regression parameters (slope S and intercept I) of the orthogonal regression for the monthly mean GOME-2A tropospheric NO2 product when compared to MAXDOAS data. Values for GDP 4.9 (this study) are given and the values for GDP 4.8 are reported in brackets for comparison. Results for both the original comparisons and the smoothed comparisons (smo.) are

Table 5. Same as Table 4 but for GOME-2B product.
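The comparison statistics listed in Tables 4 and 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the validation code used in this study: RD is taken as the mean of the per-sample ratios (SAT-GB)/GB, STDEV as the sample standard deviation of the differences SAT-GB, and the orthogonal regression as a total least squares fit with equal error variances in both datasets. The function name and the input values are hypothetical.

```python
import math
from statistics import mean, stdev


def comparison_stats(sat, gb):
    """Statistics for paired satellite (SAT) and ground-based (GB) columns.

    Returns a dict with AD, RD (%), STDEV, R, slope S, and intercept I.
    Assumes non-zero GB values and non-zero covariance between the series.
    """
    if len(sat) != len(gb) or len(sat) < 2:
        raise ValueError("need two series of equal length with >= 2 samples")

    diff = [s - g for s, g in zip(sat, gb)]
    ad = mean(diff)                                          # SAT - GB
    rd = 100.0 * mean([d / g for d, g in zip(diff, gb)])     # (SAT - GB) / GB
    sd = stdev(diff)                                         # assumed definition

    n = len(sat)
    gb_mean, sat_mean = mean(gb), mean(sat)
    sxx = sum((g - gb_mean) ** 2 for g in gb) / (n - 1)
    syy = sum((s - sat_mean) ** 2 for s in sat) / (n - 1)
    sxy = sum((g - gb_mean) * (s - sat_mean) for g, s in zip(gb, sat)) / (n - 1)

    r = sxy / math.sqrt(sxx * syy)

    # Orthogonal (total least squares) regression of SAT on GB.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    intercept = sat_mean - slope * gb_mean

    return {"AD": ad, "RD": rd, "STDEV": sd, "R": r, "S": slope, "I": intercept}


# Hypothetical monthly means in 1e15 molec/cm2.
gb_columns = [10.0, 20.0, 30.0, 40.0]
sat_columns = [6.0, 11.0, 16.0, 21.0]
stats = comparison_stats(sat_columns, gb_columns)
```

Unlike an ordinary least squares fit, the orthogonal regression treats both datasets as uncertain, which is a natural choice when neither the satellite nor the ground-based columns can be regarded as error-free.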