Sentinel-5P TROPOMI NO 2 retrieval: impact of version v2.2 improvements and comparisons with OMI and ground-based data

Nitrogen dioxide (NO2) is one of the main data products measured by the Tropospheric Monitoring Instrument (TROPOMI) on the Sentinel-5 Precursor (S5P) satellite, which combines a high signal-to-noise ratio with daily global coverage and high spatial resolution. TROPOMI provides a valuable source of information to monitor emissions from local sources such as power plants, industry, cities, traffic and ships, and the variability of these sources in time. Validation exercises of NO2 version v1.2-v1.3 data, however, have revealed that TROPOMI's tropospheric vertical columns (VCDs) are too low by up to 50 % over highly polluted areas. These findings are mainly attributed to biases in the cloud pressure retrieval, the surface albedo climatology and the low resolution of the a priori profiles derived from global simulations of the TM5-MP chemistry model. This study describes improvements in the TROPOMI NO2 retrieval leading to version v2.2, operational since 1 July 2021. Compared to v1.x, the main

older versions, which are therefore not discussed.
- NO2 v1.3 with level-1b v1.0 is used as of 20 March 2019, with the same NO2 algorithm as v1.2 but with an improvement in the input cloud data from FRESCO that affects the NO2 VCDs of some ground pixels. As of this version the surface or cloud albedo is adjusted to ensure that the retrieved cloud fraction is within the range [0 : 1], leading to more realistic cloud pressures; the same albedo treatment is used for the NO2 cloud fraction as of v2.1 (Sect. 4.3).
- NO2 v1.4 with level-1b v1.0 is used as of 29 Nov. 2020, with the same NO2 algorithm as v1.2-v1.3 but with an improvement in the input cloud data from FRESCO that affects the NO2 VCDs of many ground pixels, as discussed by […]; see also Sect. 4.1.
- NO2 v2.1 with level-1b v2.0 is used for the test data DDS-2 and includes a number of improvements in the NO2 retrieval discussed in this paper.
- NO2 v2.2 with level-1b v2.0 and the same NO2 algorithm as v2.1 is used for the test data DDS-3 discussed in this paper and is operational as of 1 July 2021.
- NO2 v2.3 with level-1b v2.0 contains no changes in the NO2 data (other than some minor bug fixes; cf. Sect. 6.2) and is operational as of 14 Nov. 2021.
- NO2 v2.4 with updated level-1b v2.0 and possibly updates in the NO2 data (cf. Sect. 6) is scheduled for 2022 and will be used for a full mission reprocessing starting from 30 April 2018, thereby replacing all previous versions.
Note that near-real-time (NRT) data are not considered here; validation of both the off-line (OFFL) and NRT data has shown that the results of the two processing chains do not differ significantly.
2 Satellite data sources and data selection

2.1 TROPOMI aboard Sentinel-5 Precursor

TROPOMI instrument
The Tropospheric Monitoring Instrument (TROPOMI; Veefkind et al., 2012), launched in October 2017 aboard ESA's Sentinel-5 Precursor (S5P) spacecraft, provides measurements in four channels (UV, visible, NIR and SWIR) of various trace gas concentrations, as well as cloud and aerosol properties, from an ascending sun-synchronous polar orbit, with an equator crossing at about 13:30 local time. NO2 retrieval is performed in the visible band (400-496 nm), which has a spectral resolution and sampling of 0.54 nm and 0.20 nm, respectively, and a signal-to-noise ratio of around 1500.
Individual ground pixels are 7.2 km (5.6 km as of 6 Aug. 2019) in the along-track and 3.6 km in the across-track direction at the middle of the swath. The full swath width is about 2600 km, with which TROPOMI achieves global coverage each day, except for narrow strips between orbits of about 0.5° wide at the equator. The swath is divided across-track into 450 ground pixels (rows), whose size remains more or less constant towards the edges of the swath (the largest pixels are ∼14 km wide).

Table 1. Overview of the diagnostic data set (DDS) periods processed for evaluation of the updated NO2 data. Columns 3 and 4 give the start of the data that was processed. Columns 5 and 6 give the start of the data that is used for the analysis of the vertical column density (VCD), i.e. after the spin-up period needed by TM5-MP. Columns 7 and 8 give the end of the data that was processed. Note that the orbit at the start of a period may have a sensing start time just before midnight preceding the given date. The last two columns give the version number of the publicly released off-line (OFFL) and the DDS data.

TROPOMI observations used in this study
In order to test the NO2 algorithm updates and their impact on the retrieval results, a diagnostic data set (DDS) was made. For NO2, DDS-2 (generated in Sept. 2020) consists of four periods of 12 days processed with test processor version v2.1, and DDS-3 (generated in April 2021) consists of one period of 14 days processed with final processor version v2.2; see Table 1.
To be able to evaluate the new tropospheric and stratospheric vertical columns (VCDs), the full DDS periods are passed through the TM5-MP data assimilation system, starting from v1.x NO2 fields of the day prior to the first day of the DDS periods, which means that TM5-MP needs a few days to adjust ("spin-up") to the new v2.x data. Hence, for the analysis of the DDS VCDs (Sect. 4) the first 5 days of each period are skipped, whereas for the analysis of the SCDs (Sect. 3) the full periods are used.
DDS-3 also contains three periods of about one day, one of which (04 Apr 2019) overlaps with one of the DDS-2 periods; it is therefore included in Table 1, as it can be used to check the effect of changes in the level-1b (ir)radiance spectra between DDS-2 and DDS-3 on the NO2 SCD retrieval results (Sect. 3.3).
2.1.3 Updates in level-1b (ir)radiance spectra

The NO2 data products of versions v1.x use v1.0 level-1b (ir)radiance spectra as input. As of the switch to v2.2 of the data (cf. Sect. 1), updated v2.0 level-1b (ir)radiance spectra are used. For the DDS processing, the input also consists of v2.0 level-1b spectra.

Table 2. Configuration parameters in the NO2 processing related to saturation in the level-1b radiance spectra and removal of outliers in the NO2 retrieval residual for different versions of the NO2 data, with their respective level-1b spectra version.

| Parameter | v1.2-v1.4 (L1b v1.0) | v2.1-v2.2 (L1b v2.0) | v2.1_test (L1b v1.0) |
|---|---|---|---|
| Maximum fraction of the radiance spectrum that is allowed to be flagged as saturated before the ground pixel is skipped | 0.01 | 0.25 | 0.12 |
| Maximum number of outliers that is allowed to be in a radiance spectrum before the ground pixel is skipped | N/A | 10 | 15 |

The pre-launch calibration results, used for most of the v1.0 level-1b spectra, are described by […], while the updates in the level-1b spectra are detailed by Ludewig et al. (2020); see also the TROPOMI reflectance validation study of Tilstra et al. (2020). The updates most relevant for NO2 are mentioned here, while Sect. 3.3 discusses the impact of the v2.0 level-1b spectra on the NO2 retrieval.
Saturation effects may occur in the detectors of band 4 (visible, used e.g. for NO2 retrieval) and band 6 (NIR, used e.g. for cloud data retrieval) over very bright scenes, such as complexes of high clouds, resulting in lower-than-expected radiances for certain spectral (i.e. wavelength) pixels. In addition, large saturation effects may lead to so-called blooming: excess charge flows from saturated into neighbouring detector (ground) pixels in the row direction, resulting in higher-than-expected radiances for certain spectral pixels (Ludewig et al., 2020). Level-1b v1.0 spectra contain flagging for saturation but not for blooming. Level-1b v2.0 also has flagging for blooming (Ludewig et al., 2020), where one error flag number is used for both saturation and blooming. Also improved in the v2.0 spectra is the flagging for transients, which are caused by charged particles hitting the detector; these occur all over the world, but in particular over the South Atlantic Anomaly (cf. Sect. 3.2).
Further improvements lie in the degradation correction of the irradiance, in corrections for the absolute and relative (ir)radiances, in the noise and error estimate of the irradiance spectra, and in the determination of the measurement quality (Ludewig et al., 2020). A change in the absolute reflectance, i.e. the ratio between the radiance and irradiance, does not affect the retrieved SCDs, but it has an impact on the scene albedo and cloud fraction, and therefore on the AMFs and VCDs.
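Why a change in the absolute reflectance leaves the SCD unchanged can be illustrated with a simplified linear DOAS fit in optical-density space, ln R(λ) = P(λ) − σ(λ)·N_s, with a low-order polynomial P: a wavelength-independent scaling of R only shifts the constant term of P, so the fitted slant column stays the same. This is a minimal sketch under stated assumptions: the cross-section shape, polynomial degree, units and synthetic "measurement" are hypothetical, and the helper names are illustrative only. The operational TROPOMI fit (van Geffen et al., 2020) differs in detail (it is a non-linear fit with additional terms such as a Ring-effect term).

```python
import math

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def doas_slant_column(wavelengths, reflectance, cross_section, degree=2):
    """Hypothetical helper: linear DOAS fit of ln R = P(lambda) - sigma * N_s.

    Returns N_s in arbitrary units (sigma is also in arbitrary units).
    """
    mid = 0.5 * (wavelengths[0] + wavelengths[-1])
    half = 0.5 * (wavelengths[-1] - wavelengths[0])
    # Design matrix: scaled polynomial terms plus the (negated) cross-section
    rows = [[((w - mid) / half) ** k for k in range(degree + 1)] + [-s]
            for w, s in zip(wavelengths, cross_section)]
    y = [math.log(r) for r in reflectance]
    n = len(rows[0])
    # Normal equations (A^T A) p = A^T y; the last parameter is N_s
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(n)] for i in range(n)]
    aty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(n)]
    return solve_linear_system(ata, aty)[-1]

# Synthetic example over the 405-465 nm window (hypothetical cross-section and numbers)
wl = [405.0 + 0.2 * i for i in range(301)]
sigma = [math.sin(2.0 * math.pi * (w - 405.0) / 7.0) for w in wl]
true_ns = 0.003
refl = [0.05 * math.exp(0.02 * (w - 435.0) / 30.0 - s * true_ns) for w, s in zip(wl, sigma)]
ns_original = doas_slant_column(wl, refl, sigma)
ns_scaled = doas_slant_column(wl, [1.02 * r for r in refl], sigma)  # +2 % absolute reflectance
```

Both fits return the same slant column; only the constant polynomial coefficient absorbs ln(1.02). The scene albedo and cloud fraction, which use the absolute reflectance, would however change.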
In the time between the generation of DDS-2 and DDS-3, the calibration key data (CKD) of the level-1b v2.0 spectra, including the irradiance degradation correction, were recalculated using fits over more data (for DDS-2 up to May 2019, for DDS-3 up to Feb. 2021; over the latter period the irradiance degradation was about 2.6 % in band 4 and less than 0.5 % in band 6). This recalculation leads to minor differences for overlapping data periods of DDS-2 and DDS-3: for band 4 both radiance and irradiance differ by less than 0.1 %. The impact on the NO2 SCD retrieval results (Sect. 3.3) is negligible and is therefore not discussed here.
Implementation of a separate radiance degradation correction is at the time of writing under discussion, with the possible intent to include this in the planned mission reprocessing (cf. Sect. 6.3).

OMI instrument
The Ozone Monitoring Instrument (OMI; Levelt et al., 2006), launched in July 2004 aboard NASA's EOS-Aura spacecraft, provides measurements in three channels (two UV and one visible) of various trace gas concentrations, as well as cloud and aerosol properties, from an ascending sun-synchronous polar orbit, with an equator crossing at about 13:40 local time. NO2 retrieval is performed in the visible band (349-504 nm), which has a spectral resolution and sampling of 0.63 nm and 0.21 nm, respectively, and a signal-to-noise ratio of around 500.
Individual ground pixels are 13 km in the along-track and 24 km in the across-track direction at the middle of the swath. The full swath width is about 2600 km, with which OMI achieves global coverage each day. The swath is divided across-track into 60 ground pixels (rows), whose size increases towards the edges of the swath to ∼150 km.

OMI observations used in this study
Comparisons of the magnitude of the TROPOMI and OMI NO 2 column data are done using OMI orbits from the DDS periods (Table 1) processed within the framework of the QA4ECV project (Boersma et al., 2018); validation of that data is discussed by Compernolle et al. (2020) and Pinardi et al. (2020).
Since June 2007 part of the OMI detector has suffered from a so-called row anomaly, which appears as a signal suppression in the level-1b radiance data at all wavelengths (Schenkeveld et al., 2017), leading e.g. to large uncertainties in the NO2 data in the affected rows 22-53 (0-based), so that the data of these rows effectively have to be excluded from the NO2 analysis.
Because of this issue, and because the TROPOMI and OMI orbits do not exactly overlap (the two instruments measure from slightly different altitudes), direct orbit-to-orbit comparisons are not possible. Instead, data comparisons in this paper are performed after conversion to a common longitude-latitude grid.

3 Updates in the SCD retrieval step

Fit window wavelength assignment
The first step in the data processing chain is the selection of the spectral index range [i_b : i_e] that comprises the wavelength window [λ_b : λ_e] needed for the wavelength calibration and DOAS retrieval steps; for the NO2 SCD retrieval λ_b = 405 nm and λ_e = 465 nm. The selection is done on the nominal wavelength grid assigned to the level-1b (ir)radiance spectra. For a given spectral index, i, the radiance wavelength varies across the detector rows, as illustrated in Fig. 1; this is the so-called spectral smile. Consequently, each detector row has its own [i_b : i_e]. With λ_b = 405 nm, for example, the level-1b v2.0 radiance nominal wavelength gives i_b = 36 for the rows along the swath edge and i_b = 24 for the central rows. In v1.2-v1.4, some of the rows around changes in i_b and/or i_e have a slightly higher SCD error estimate than neighbouring rows. This difference, which is less than 1 µmol m−2, is reduced by two small corrections in the spectral index selection, with little to no effect on other rows.
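The per-row selection of [i_b : i_e] can be sketched with a binary search on each row's nominal wavelength grid. This is a minimal sketch, assuming ascending wavelength grids and assuming that the range consists of the first and last spectral pixels lying inside [λ_b : λ_e]; the exact edge convention of the operational processor is not stated here. The two grids below are hypothetical, chosen only so that the resulting i_b values match the 24 (central rows) and 36 (swath edge) quoted above; `fit_window_indices` is an illustrative name.

```python
from bisect import bisect_left, bisect_right

def fit_window_indices(row_wavelengths, lambda_b=405.0, lambda_e=465.0):
    """Hypothetical helper: return (i_b, i_e) per detector row.

    row_wavelengths: list of per-row nominal wavelength grids (ascending, nm).
    Assumed convention: i_b is the first index with wl >= lambda_b and
    i_e the last index with wl <= lambda_e.
    """
    ranges = []
    for wl in row_wavelengths:
        i_b = bisect_left(wl, lambda_b)       # first index with wl[i] >= lambda_b
        i_e = bisect_right(wl, lambda_e) - 1  # last index with wl[i] <= lambda_e
        ranges.append((i_b, i_e))
    return ranges

# Hypothetical grids illustrating the spectral smile: at a given spectral index
# the edge row sees a shorter wavelength than the central row.
central_row = [400.25 + 0.2 * i for i in range(500)]
edge_row = [397.85 + 0.2 * i for i in range(500)]
index_ranges = fit_window_indices([central_row, edge_row])
```

Each row thus gets its own index range, so the same physical window [405 : 465] nm maps onto different detector columns across the swath.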

Outlier removal
Spectral pixels flagged in the level-1b v1.0 data as suffering from saturation or transients (or other errors) are removed from the measurement before the spectra are used in further data processing.
Level-1b v1.0 spectra have no flagging for spectral pixels suffering from blooming (cf. Sect. 2.1.3), hence there may be many problematic spectral pixels around spectral pixels that are flagged as suffering from saturation. These spectral pixels suffering from blooming have radiance levels very different from what is expected, leading to outliers (spikes) in the DOAS fit residual, i.e. the difference between the measured and the DOAS-modelled reflectance. Similarly, level-1b data are flagged for transients caused by charged particles hitting the detector, but not all such events constitute transients, and perhaps not all transient events are captured, which may also lead to outliers in the residual.
Since the NO2 v1.2-v1.4 processor does not have an algorithm that removes spectral pixels showing such an outlier from the DOAS fit ("outlier removal"), the maximum fraction of the spectral pixels within the NO2 fit window (405-465 nm, which covers 304 or 305 spectral pixels) that may be flagged as saturated without skipping the ground pixel was necessarily low (Table 2). With the introduction of an outlier removal routine in NO2 v2.1 (as announced by van Geffen et al., 2020; see also van Geffen et al., 2021, App. F), and the fact that level-1b v2.0 flags spectral pixels suffering from blooming in the same way as saturated pixels (Sect. 2.1.3), a larger fraction of the spectral pixels is allowed to be flagged as saturated (third column in Table 2). When the outliers in the residual of a given ground pixel are caused by charged particles hitting the detector, the number of spectral pixels showing outliers appears to be usually small (less than 5), while in the case of saturation/blooming the number of outliers may be much higher. If the number of outliers is very high, the outlier removal routine may not work well, because it is applied only once (van […]); the maximum number of allowed outliers is therefore set to 10 for the operational processing (Table 2).

[…] the beginning of the NO2 fit window and are thus related to details of the wavelength assignment (Sect. 3.1) in the pre-v2.1 processor.

The NO2 v2.1 processor has been used with level-1b v1.0 spectra to provide dedicated NO2 data files for lightning NOx studies, which look at the production of NO2 above bright storm clouds, where saturation/blooming may be a big issue. To this end special configuration settings, listed in the fourth column of Table 2 as "v2.1_test", are used: more outliers are accepted, but the number of spectral pixels flagged for saturation is limited somewhat because level-1b v1.0 spectra lack flagging for blooming.
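The role of the two Table 2 thresholds can be sketched as a single-pass screening of one ground pixel. The thresholds (0.25 and 10 for the operational v2.x processing; 0.12 and 15 for "v2.1_test") come from Table 2, but the outlier criterion used here (a residual larger than k times the median absolute residual, with a hypothetical k = 5) is only an illustrative stand-in; the actual criterion is described by van Geffen et al. (2021, App. F). The function name is hypothetical, and the sketch only identifies the spectral pixels that would be removed before a refit.

```python
import statistics

def screen_ground_pixel(residual, saturated, max_saturated_fraction=0.25,
                        max_outliers=10, k=5.0):
    """Hypothetical single-pass screening of one ground pixel.

    residual: DOAS fit residual per spectral pixel in the fit window.
    saturated: per spectral pixel, True if flagged as saturated
               (with level-1b v2.0 this flag also covers blooming).
    k: hypothetical outlier threshold, in units of the median absolute residual.
    Returns (process_ground_pixel, outlier_indices).
    """
    n = len(residual)
    # Skip the ground pixel if too large a fraction of the fit window is flagged.
    if sum(saturated) / n > max_saturated_fraction:
        return False, []
    usable = [i for i in range(n) if not saturated[i]]
    # Robust scale, so that a few spikes do not inflate the threshold.
    scale = statistics.median([abs(residual[i]) for i in usable])
    # Outliers are identified once, without iteration.
    outliers = [i for i in usable if abs(residual[i]) > k * scale]
    # Too many outliers: the single-pass removal is not trusted, skip the pixel.
    if len(outliers) > max_outliers:
        return False, outliers
    return True, outliers

# Hypothetical residual with two spikes (e.g. from a charged-particle hit)
residual = [0.001 * (-1) ** i for i in range(300)]
residual[50], residual[51] = 0.02, -0.03
ok, bad = screen_ground_pixel(residual, [False] * 300)
```

The "v2.1_test" configuration corresponds to calling the same function with `max_saturated_fraction=0.12` and `max_outliers=15`, which accepts more outliers but fewer saturated spectral pixels.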
With these special settings, more ground pixels can be used, but with great care, for such lightning NOx studies (Allen et al., 2021; Perez-Invernon et al., 2021; Zhang et al., 2021).

[Figure caption, partial: […] (cf. Table 1), as function of the equator crossing longitude of the orbits. The overall averages are listed in the legend and in Table 3.]

Table 3. Relative differences in the DOAS retrieval results between the DDS and OFFL data averaged over the full DDS-2 and DDS-3 periods, as well as the relative and absolute differences in the stratospheric VCD averaged over the VCD periods (cf. […]).

Table 3 lists the relative changes averaged over all orbits of each of the DDS-2 and DDS-3 periods. These averages are not an exact measure, but they are a good indicator of the combined impact on the SCD retrieval results of the above-mentioned improvements and of the use of level-1b v2.0 spectra. Based on the evaluation of only 12 test orbits with the v1.2 NO2 retrieval, van Geffen et al. (2020) estimated that the update of the level-1b (ir)radiance spectra has a small impact on the NO2 SCD value, SCD error and RMS error of on average +2 %, −1 % and −6 %, respectively.

Impact on SCD retrieval results
What stands out from comparing the two panels in Fig. 4 and the numbers given in Table 3 is that the change in the SCD error is very different for DDS-2 and DDS-3: in DDS-2 there is a small increase, while in DDS-3 there is a stronger decrease of the SCD error. The reason for this difference is an unfortunate bug introduced in the v2.1 processor used for DDS-2 that was fixed in v2.2 used for DDS-3: in v2.1 there is a mistake in the calculation of the noise on the reflectance (from the noise on the (ir)radiance spectra), and this reflectance noise determines in part (i.e. scales) the magnitude of the SCD error, as well as the χ² of the DOAS fit (details of the DOAS fit approach are given by van Geffen et al., 2020).
Averaging the SCD error changes of the 21 overlapping orbits of the 04 Apr 2019 test data shows a clear decrease of about 3.5 % from DDS-2 to DDS-3. Using this to correct the DDS-2 SCD error differences to the DDS-3 level leads to the numbers in the 5th column of Table 3 […], with some variation between the periods likely caused by differences in atmospheric circumstances and remaining seasonal effects, despite the use of a moving TL region for the averaging.
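Bringing a DDS-2 relative difference to the DDS-3 level amounts to composing two relative changes. A minimal sketch, assuming the changes combine multiplicatively (to first order this equals simply adding the percentages; the text does not say which convention was used). Only the −3.5 % DDS-2 to DDS-3 change is taken from the text; the +1.0 % DDS-2 value and the function name are hypothetical.

```python
def compose_relative_changes(*changes):
    """Hypothetical helper: combine successive relative changes (fractions) multiplicatively."""
    factor = 1.0
    for change in changes:
        factor *= 1.0 + change
    return factor - 1.0

dds2_vs_offl = 0.010   # hypothetical DDS-2 minus OFFL SCD-error change (+1.0 %)
dds3_vs_dds2 = -0.035  # DDS-2 -> DDS-3 change on 04 Apr 2019 (about -3.5 %, from the text)
corrected = compose_relative_changes(dds2_vs_offl, dds3_vs_dds2)
additive = dds2_vs_offl + dds3_vs_dds2  # first-order approximation
```

For changes of a few percent the two conventions differ by only a few hundredths of a percent, well below the period-to-period variation discussed above.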
The RMS error is not affected by the reflectance noise, and the numbers given in Table 3 do not show a clear difference between DDS-2 and DDS-3 (averaged over the 04 Apr 2019 period, the RMS error of DDS-3 is 0.2 % lower than that of DDS-2), indicating that the quality of the NO2 SCD fit has not been affected by the bug. All DDS-2 periods have comparable RMS error decreases, with possibly a somewhat larger decrease in the autumn 2020 DDS-3 period, which may be due to a small change in the level-1b irradiance degradation correction, but may also be due to atmospheric circumstances.
The SCD values themselves show an increase of 3-4 % for DDS-2 and about 2.5 % for DDS-3, while averaged over the 04 Apr 2019 test data the DDS-3 SCD values are 1.1 % lower than those of DDS-2. Again, the difference between DDS-2 and DDS-3 may be due to the small change in the irradiance degradation correction (reflectances have changed by less than 0.5 %) and/or to atmospheric circumstances.
In summary, the v2.1-v2.2 DDS data, compared to the v1.2-v1.3 OFFL data, show a much improved DOAS fit quality, a reduced SCD error, and a small increase of the SCD values (Table 3). The SCD increase shows some east-west variation in the top panel of Fig. 4, while in the bottom panel there is hardly any such variation. On the whole, the SCD increase appears to be more or less uniform across the world, with few or no hotspots. Owing to the way the subsequent NO2 data assimilation works, this more or less homogeneous SCD increase leads to an increase of the stratospheric NO2 vertical column (N_v^strat).
The data assimilation is set up in such a way that the total column is made consistent with the TROPOMI observations over regions with low levels of air pollution (oceans, remote land regions), basically by adjusting the stratospheric column, because the troposphere contributes little to the total column at those locations. A uniform increase of the TROPOMI total column will therefore lead to a similar increase of the stratospheric vertical column, while the tropospheric columns will hardly be affected. The two right-most columns in Table 3 list the relative and absolute differences of N_v^strat averaged over the TL region using the orbits of the VCD periods given in Table 1. For the DDS-3 period the increase of N_v^strat is somewhat smaller than for the four DDS-2 periods, as with the SCD values and for the same reasons.

[…] Orthogonal Distance Regression (ODR), i.e. taking into account that both data sets have uncertainties, rather than only the data along the y-axis, while in Sect. 5 a different linear regression approach is used.
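The difference between orthogonal distance regression and ordinary least squares can be sketched with the closed-form, unweighted orthogonal fit (total least squares), which assumes equal error variances in x and y. This is a simplified sketch: an ODR that weights each point by its own uncertainties (as, for example, SciPy's `scipy.odr` module allows) is not reproduced here, and the exact ODR set-up used for the figures is not stated in this section. The function names and data are hypothetical.

```python
import math

def orthogonal_regression(x, y):
    """Hypothetical helper: unweighted orthogonal fit y = a + b*x (equal errors in x and y).

    Returns (intercept, slope).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("slope undefined for uncorrelated data")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    return my - slope * mx, slope

def ols_slope(x, y):
    """Ordinary least-squares slope (errors assumed only in y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Exact line y = 1 + 2x: both methods agree
exact = orthogonal_regression([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
# Scattered data with comparable spread in x and y: OLS gives a shallower slope
xs, ys = [0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 1.0, 3.0]
odr_fit = orthogonal_regression(xs, ys)
ols_fit = ols_slope(xs, ys)
```

In the scattered example OLS yields a slope of 0.8 while the orthogonal fit yields 1.0; treating one data set as error-free biases the slope low, which is why ODR is preferred when both satellite data sets carry uncertainties.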

According to van Geffen et al. (2020) the NO2 SCDs of OMI and TROPOMI agree quite well, with TROPOMI a few percent higher than OMI as a result of small differences in the DOAS retrieval details, and with OMI showing more scatter than TROPOMI due to its lower spatial resolution. The minor changes in the TROPOMI SCD values described above imply that the same conclusion still holds. Linear fits in scatter plots of world-wide average gridded GCD v2.1 and OMI/QA4ECV data (not shown) for the five test VCD periods have slopes ranging from 0.98 to 1.03 and offsets between 1.27 and 2.57 µmol m−2, with high correlation (r > 0.94). A more detailed slant column comparison with OMI, based on regional averages, can be seen in Fig. 14 (Sect. 4.4).
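The geometric column density (GCD) used in such comparisons is assumed here to follow the usual definition, GCD = SCD / M_geo with the geometric air-mass factor M_geo = 1/cos θ0 + 1/cos θ (solar zenith angle θ0, viewing zenith angle θ); this section does not restate the definition. Dividing by M_geo removes most of the viewing-geometry dependence, so TROPOMI and OMI slant columns measured under different geometries can be averaged on a common grid. A minimal sketch with hypothetical function names and input values:

```python
import math

def geometric_amf(sza_deg, vza_deg):
    """Geometric air-mass factor 1/cos(SZA) + 1/cos(VZA), angles in degrees."""
    return 1.0 / math.cos(math.radians(sza_deg)) + 1.0 / math.cos(math.radians(vza_deg))

def geometric_column(scd, sza_deg, vza_deg):
    """Hypothetical helper: geometric column density, in the unit of the SCD (e.g. umol m-2)."""
    return scd / geometric_amf(sza_deg, vza_deg)

# Hypothetical pixel: SCD of 250 umol m-2, solar zenith angle 60 deg, nadir view
gcd = geometric_column(250.0, 60.0, 0.0)
```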
4 Updates in the tropospheric VCD step

4.1 FRESCO cloud pressure and NO2 cloud fraction

A dedicated version of the FRESCO+ cloud algorithm (Wang et al., 2008), named FRESCO-S (with 'S' for 'Sentinel'), was implemented in the NLL2DP processor to provide a support cloud product, and its cloud pressure data are used for the v1.2-v1.3 NO2 data product. Studies showed that the FRESCO-S cloud pressure is too high for some scenes, in particular for scenes with low cloud fractions and/or a considerable aerosol load (which FRESCO sees as an effective cloud), in which case the cloud pressure is close to the surface pressure (cf. Compernolle et al., 2021; […]). The consequence is that the tropospheric NO2 VCD is too low for these scenes, as also shown in validation comparisons (see the Introduction).
As of NO2 v1.4 the so-called FRESCO-wide approach is used, which provides a more realistic estimate of the cloud pressure for scenes with low cloud fractions: the cloud pressure is lower, i.e. the cloud is higher up, as a result of which the tropospheric AMFs decrease, which in turn leads to higher tropospheric NO2 VCDs. To a large extent this closes the gap between the TROPOMI and the validation data, though for certain cases a difference between the two datasets remains, as discussed by […]; see also van Geffen et al. (2021).
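The chain "higher cloud, smaller tropospheric AMF, larger tropospheric VCD" follows from N_v^trop = N_s^trop / M^trop. For partly cloudy scenes the AMF is commonly computed with the independent pixel approximation, M = w_c M_cld + (1 − w_c) M_clr, where w_c is the cloud radiance fraction. The sketch below assumes that form; the AMF values are hypothetical (a cloud high up shields more of the boundary-layer NO2, hence a smaller cloudy-part AMF), whereas the operational AMFs come from radiative transfer look-up tables and TM5-MP a priori profiles. Function names are illustrative.

```python
def tropospheric_amf(w_c, amf_clear, amf_cloudy):
    """Independent pixel approximation: radiance-weighted mix of clear and cloudy AMFs."""
    if not 0.0 <= w_c <= 1.0:
        raise ValueError("cloud radiance fraction must lie in [0, 1]")
    return w_c * amf_cloudy + (1.0 - w_c) * amf_clear

def tropospheric_vcd(scd_trop, amf_trop):
    """Tropospheric vertical column from the tropospheric slant column."""
    return scd_trop / amf_trop

scd_trop = 60.0   # hypothetical tropospheric SCD, umol m-2
amf_clear = 1.2   # hypothetical clear-sky AMF
w_c = 0.3         # hypothetical cloud radiance fraction
# Hypothetical cloudy-part AMFs: cloud close to the surface (high cloud pressure)
# versus a higher cloud (lower cloud pressure, as with FRESCO-wide)
vcd_low_cloud = tropospheric_vcd(scd_trop, tropospheric_amf(w_c, amf_clear, 1.8))
vcd_high_cloud = tropospheric_vcd(scd_trop, tropospheric_amf(w_c, amf_clear, 0.6))
```

With these numbers, lowering the cloud pressure reduces the AMF from 1.38 to 1.02 and raises the tropospheric VCD from about 43 to about 59 µmol m−2.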
The FRESCO-wide approach is also used for the cloud pressure in the v2.1 (DDS-2) and v2.2 (DDS-3 and its public data release) NO2 data, but with the cloud data retrieved from the improved level-1b v2.0 spectra. Fig. 7 shows the frequency distribution of the cloud pressure, c_p, for a single orbit, considering only ground pixels identified as snow/ice-free land and ocean by the

Snow/ice flag
It is important to have information on the presence of snow or ice in a given satellite ground pixel, so that, if necessary, the climatological surface albedo can be adjusted or the AMF calculation can switch from using the cloud fraction and cloud pressure to using the effective scene pressure and effective scene albedo, because the cloud algorithm has difficulty distinguishing clouds above snow/ice (cf. van Geffen et al., 2021), as well as distinguishing clouds from sun glint. To this end the v1.2-v1.4 processing uses the daily snow/ice cover database from NISE (Nolin et al., 2005).
The NISE data, however, appear to suffer from a number of problems: they have a rather coarse spatial resolution, for a given day they are based on an average over a few days, and they have problems determining the snow/ice content around coastlines. The latter is particularly problematic at high latitudes, where snow/ice coverage may be important. As of v2.1 the snow/ice information is taken from the daily ECMWF meteorological data, which solves the issues with NISE, thus improving the reliability of the NO2 data. Fig. 9 shows an example over Canada of the two snow/ice flag datasets, where the NISE flag numbering is used, except that the ice-free ocean has been given the colour orange (value 175) instead of red (its flag value 255), so as to distinguish it from the problematic NISE flags 252-254.
Another issue solved with the switch to the ECMWF snow/ice data is that the NISE data can be wrong over shallow water areas that may run dry during low tide. Over the western part of the Waddenzee in the Netherlands, for example, NISE gives 3 % sea ice on 1 Jan. 2019, whereas this area cannot possibly have any sea ice: the ECMWF data correctly identify these pixels as ocean (flag value zero). Because of this corrected identification, the NO2 surface albedo is adjusted from the value of 0.62 in the climatology to a more realistic 0.04.

Surface and cloud albedo
The surface albedo in the NO2 fit window, used in e.g. the computation of the cloud fraction and the AMF ( […]

The cloud fraction, f_c, is determined in the NO2 fit window at 440 nm following the same approach that the FRESCO cloud retrieval (Wang et al., 2008) uses in the O2 A-band, with an assumed cloud albedo A_c = 0.8. On physical grounds f_c lies within the range [0 : 1]. If the actual surface albedo, A_s, is lower than expected from the climatology, the cloud retrieval leads to f_c < 0. Up to v1.4 this was clipped to zero, whereas as of v2.1 A_s is adjusted (decreased) to match f_c = 0 and ensure radiative closure. Similarly, in case of very bright clouds the cloud retrieval leads to f_c > 1, which is no longer clipped either: the cloud albedo is then adjusted instead. This albedo treatment was implemented in the FRESCO cloud retrieval in processor v1.3, leading to more realistic cloud pressures (cf. Eskes and Eichmann, 2021). With the same implementation in use for the NO2 cloud fraction, the treatment is consistent. Fig. 10 shows, as an example, a map of the difference "v2.1 minus v1.2" in the NO2 surface albedo for part of an orbit. A lower surface albedo leads to a smaller AMF and thus to a higher tropospheric NO2 VCD. Fig. 11 shows for the full orbit the relationship between the NO2 tropospheric VCD of v2.1 (vertical axis) and v1.2 (horizontal axis), considering only ground pixels for which the cloud retrieval gives a cloud fraction f_c < 0.001 (i.e. effectively zero) and thus for which the surface albedo may have been reduced in v2.1: the tropospheric VCD increases by about 15 %, as the linear fit in Fig. 11 shows. The increase is larger for higher VCDs: for tropospheric VCDs < 100 µmol m−2 the increase is about 10 % (linear fit: y = 1.091x − 0.670).

This increase is partly related to the use of level-1b v2.0 (ir)radiance spectra as input in the v2.1 processing; a test processing of the orbit with v2.2 (not shown) reveals that level-1b v2.0 spectra give a tropospheric VCD that is about 5 % higher than level-1b v1.0 spectra (linear fit over all positive VCDs: y = 1.053x + 1.233, correlation coefficient r = 0.999). A similar increase is found for ground pixels with a cloud radiance fraction 0.2 < w_c < 0.5 (y = 1.059x − 1.135, r = 0.982).
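The cloud-fraction treatment described above (clipping up to v1.4, albedo adjustment as of v2.1) can be illustrated with a strongly simplified two-component Lambertian model, R = f_c A_c + (1 − f_c) A_s, which ignores the atmosphere entirely. This is a toy sketch under that assumption only: the actual retrieval at 440 nm uses radiative transfer look-up tables in a FRESCO-like approach. The cloud albedo A_c = 0.8 is taken from the text; the function name and the reflectance values are hypothetical.

```python
def cloud_fraction_with_albedo_adjustment(reflectance, surface_albedo, cloud_albedo=0.8):
    """Hypothetical helper: return (f_c, A_s, A_c) with 0 <= f_c <= 1 enforced.

    Toy Lambertian model R = f_c*A_c + (1 - f_c)*A_s, without atmosphere.
    """
    f_c = (reflectance - surface_albedo) / (cloud_albedo - surface_albedo)
    if f_c < 0.0:
        # Scene darker than the climatological surface: lower A_s so that f_c = 0.
        return 0.0, reflectance, cloud_albedo
    if f_c > 1.0:
        # Scene brighter than the assumed cloud: raise A_c so that f_c = 1.
        return 1.0, surface_albedo, reflectance
    return f_c, surface_albedo, cloud_albedo

dark_scene = cloud_fraction_with_albedo_adjustment(0.03, 0.05)    # A_s lowered to 0.03
bright_scene = cloud_fraction_with_albedo_adjustment(0.9, 0.05)   # A_c raised to 0.9
partly_cloudy = cloud_fraction_with_albedo_adjustment(0.3, 0.05)  # f_c = 1/3, albedos unchanged
```

Unlike clipping, the adjustment keeps the model consistent with the measured reflectance (radiative closure); the lowered surface albedo then feeds into the AMF, which is how the increase in tropospheric VCD discussed above arises.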

The impact of processor changes on the tropospheric VCD data is dominated by the update of the FRESCO cloud retrieval, […] which amounts to cloud fractions of about 0.2 and less. With this filtering and only 7 or 10 days of data for the average gridded data, the results are somewhat noisier than those presented in Sect. 3.3. Table 4 lists the results for the linear fits of the five DDS periods, as in Fig. 12, for all N_v^trop and for N_v^trop ≤ 100 µmol m−2.
From Fig. 12 and Table 4 it is clear that the average N_v^trop increases with the improvements in the algorithm. With this increase, the TROPOMI data lie closer to the OMI/QA4ECV tropospheric VCD, as shown in Fig. 13 and the last three columns of Table 4. To further investigate the changes in the TROPOMI data, Fig. 14 shows comparisons of averages over selected regions (defined in Table A1) of the gridded tropospheric VCD (red solid lines) and the GCD (green dashed lines), where the TROPOMI averages are divided by the OMI/QA4ECV averages. Clearly, TROPOMI v2.x gives higher tropospheric VCDs than v1.x, in particular for the winter periods in polluted areas (upper panels in Fig. 14). In most cases the TROPOMI v2.x tropospheric VCD lies closer to OMI than TROPOMI v1.2 does.
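The gridded, region-averaged ratios TROPOMI / OMI can be sketched as follows: pixels are binned onto a common longitude-latitude grid, OMI rows 22-53 (0-based, affected by the row anomaly) are excluded, and the ratio of the regional means is taken over cells covered by both instruments. This is a simplified sketch under stated assumptions: pixel-centre binning without area weighting, a hypothetical 0.5° cell size, a toy "region" consisting of whatever cells are present, and made-up pixel values; the paper's gridding details and the region definitions of Table A1 are not reproduced. Function names are illustrative.

```python
from collections import defaultdict

OMI_ROW_ANOMALY_ROWS = frozenset(range(22, 54))  # rows 22-53 (0-based), excluded for OMI

def grid_average(pixels, cell_deg=0.5, skip_rows=frozenset()):
    """Hypothetical helper: average pixel values per longitude-latitude cell.

    pixels: iterable of (lat, lon, row, value) tuples (pixel-centre coordinates).
    Returns {(i_lat, i_lon): mean value}.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for lat, lon, row, value in pixels:
        if row in skip_rows:
            continue  # e.g. OMI rows affected by the row anomaly
        key = (int((lat + 90.0) // cell_deg), int((lon + 180.0) // cell_deg))
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def regional_ratio(grid_a, grid_b):
    """Ratio of regional means over cells present in both grids (e.g. TROPOMI / OMI)."""
    common = grid_a.keys() & grid_b.keys()
    if not common:
        raise ValueError("no common grid cells")
    mean_a = sum(grid_a[k] for k in common) / len(common)
    mean_b = sum(grid_b[k] for k in common) / len(common)
    return mean_a / mean_b

# Hypothetical pixels (lat, lon, row, tropospheric VCD in umol m-2)
tropomi = grid_average([(52.1, 4.3, 200, 40.0), (52.3, 4.2, 210, 44.0)])
omi = grid_average([(52.2, 4.4, 10, 30.0), (52.2, 4.4, 30, 99.0), (52.3, 4.2, 12, 34.0)],
                   skip_rows=OMI_ROW_ANOMALY_ROWS)
ratio = regional_ratio(tropomi, omi)
```

The OMI pixel in row 30 is dropped, so the ratio is 42.0 / 32.0; without the row filter the anomalous value would dominate the comparison.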

Ground-based validation
To assess the impact of the processor changes on the NO2 VCD data through ground-based validation, both the operational OFFL and the updated DDS data of the five DDS periods for which VCD data are available (cf. Table 1) are compared with three sets of ground-based measurements provided by monitoring networks:

Stratospheric column
The ground-based validation of the stratospheric NO2 column data reveals an improvement in the bias, from a median difference over all co-located pairs of −0.2 Pmolec cm−2 (identical to the bias reported in […], and amounting to about −6 %) for the operational OFFL data, to −0.1 Pmolec cm−2 (−3 %) for the updated DDS data, which is in line with the slight increase in the TROPOMI stratospheric column mentioned in Sect. 3.3. Typically, stratospheric columns show a seasonal variation between 2 and 3 Pmolec cm−2 for nearly equatorial sites and between 1 and 6 Pmolec cm−2 for sites at very high latitudes. The updated processing does not significantly change the correlation (Pearson R) or the dispersion (half of the 68 % interpercentile range) of the difference between the ground-based and S5P stratospheric column data, and changes the results of a linear regression only slightly (see Fig. 15a), as expected from the reduced bias.
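The two summary statistics used throughout this validation, the bias (median difference over co-located pairs) and the dispersion (half of the 68 % interpercentile range, i.e. half the distance between the 16th and 84th percentiles of the differences), can be sketched as below. The co-location step is assumed to have been done already, the linear-interpolation percentile convention is an assumption (other conventions give slightly different values for small samples), and the function names and data are hypothetical.

```python
import statistics

def percentile(data, p):
    """Percentile (0 <= p <= 100) with linear interpolation between closest ranks."""
    xs = sorted(data)
    pos = (len(xs) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def bias_and_dispersion(satellite, reference):
    """Hypothetical helper: median difference and half the 68 % interpercentile range."""
    diffs = [s - r for s, r in zip(satellite, reference)]
    bias = statistics.median(diffs)
    dispersion = 0.5 * (percentile(diffs, 84) - percentile(diffs, 16))
    return bias, dispersion

# Hypothetical co-located pairs (Pmolec cm-2): differences spread evenly from -0.5 to +0.5
reference = [3.0] * 101
satellite = [3.0 + (i - 50) * 0.01 for i in range(101)]
bias, dispersion = bias_and_dispersion(satellite, reference)
```

Both statistics are robust to a few extreme pairs, unlike the mean and standard deviation, which suits comparisons in which occasional co-location mismatches occur.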
Investigating the results at individual ground stations (Fig. 15b) shows improvements in bias (a reduction of the absolute value of the median difference) at 6 out of 9 stations, almost no change for one, and increases for the remaining two stations. The large bias at the Ny-Ålesund station, at about 79° N, is under investigation; other high-latitude stations, for which there unfortunately were no co-locations in the DDS periods, do not show such a large bias.

Tropospheric column
The comparison of TROPOMI to MAX-DOAS tropospheric NO2 column data reveals an improvement in both the bias and the dispersion. The former improves from a median difference over all co-located pairs of −1.4 Pmolec cm−2 (about −32 %, similar to the bias reported in […]) for the OFFL data to −0.9 Pmolec cm−2 (−23 %) for the DDS data. The dispersion of the difference improves from 3.3 Pmolec cm−2 to 2.4 Pmolec cm−2. Fig. 16a demonstrates that the linear regression also improves somewhat, with a slight increase of the slope, as expected from the improvement in the derived (multiplicative) bias. Looking at the change in bias at each individual station (Fig. 16b), the DDS data show lower biases than the OFFL data at all but two of the 16 stations. However, the results at these two outlying sites cannot be considered meaningful: they represent relatively clean background conditions with small tropospheric column values, with already very small biases in the OFFL data. Interestingly, the improvement does not scale with the tropospheric column value: the most polluted site does not benefit from a larger improvement.

Total column
Similar to the tropospheric column validation, the comparison of TROPOMI to PGN total column NO2 data reveals an improvement in both the bias and the dispersion. The former improves from a median difference over all co-located pairs of −0.8 Pmolec cm−2 (about −12 %) for the OFFL data to −0.3 Pmolec cm−2 (−5 %) for the DDS data. The dispersion of the difference improves slightly from 2.5 Pmolec cm−2 to 2.3 Pmolec cm−2. Fig. 17a shows that the linear regression also improves somewhat, with a clear increase of the slope, as expected from the improvement in the derived (multiplicative) bias. Looking at the bias per station (Fig. 17b), the situation is more complex than for the tropospheric column. At relatively clean sites with small tropospheric column values, for which the OFFL data already presented slight positive biases with respect to the PGN measurements, the increased columns in the DDS data lead to even larger positive biases. The increased DDS total columns improve the bias only at those sites for which the OFFL data underestimated the PGN columns. This is also broadly true for the right-hand half of the graph, i.e. for sites with larger total columns due to a significant tropospheric

Validation summary and discussion
In summary, ground-based validation of the updated DDS NO 2 vertical column data, in comparison to the validation of the corresponding operational OFFL data, confirms the improvement (reduction) in the bias of the stratospheric column expected from the increase in the stratospheric column observed in the DDS data.
For the tropospheric and total NO2 columns, the dispersion is lower with the DDS data, but whether the bias improves depends on the range of tropospheric column values: at sites with large tropospheric columns, affected by strong negative biases in the OFFL data, the increased tropospheric (and total) columns imply a clear improvement. At clean background sites, however, the increased columns of the DDS data actually worsen the already positive bias of the OFFL data, a finding which is somewhat at odds with the ZSL-DOAS comparisons for the stratospheric columns, where an originally negative bias is reduced. This apparent inconsistency between direct-sun and zenith-sky measurements was already observed in Verhoelst et al. (2021), and work is ongoing to elucidate and address it, including a reprocessing of the PGN data (upcoming v1.8) with more appropriate absorption cross-sections for clean sites, where the total column resides mostly in the stratosphere.
Note that the two most polluted measurement sites, which show a tropospheric (Fig. 16b: Vallejo) and total (Fig. 17b: Unam) column bias behaviour very different from that of the other polluted sites, are both located on the Mexican plateau, a setting unlike that of any of the other measurement sites.

The NO2 cloud (radiance) fraction is currently derived from the FRESCO cloud pressure, as mentioned in Sect. 4.1. Using the O2-O2 cloud pressure instead would mean that the cloud pressure is determined a) from almost the same wavelengths as NO2 and b) from measurements by the same detector, thus eliminating the small spatial mismatch between the NO2 (band 4) and FRESCO (band 6) ground pixels. In addition, the O2-O2 cloud pressure appears to be more realistic than the FRESCO cloud pressure under certain atmospheric conditions. Evaluation of the O2-O2 cloud data product quality is ongoing and may lead to selection rules within the NO2 processor to choose between the two cloud pressures. It is as yet uncertain whether this will be included in data processor version 2.4.

6.2 Bug fixes in data version v2.3

Data processor version v2.3, operational as of 14 Nov. 2021, includes fixes for minor bugs related to the output of some detailed data not used by most data users (notably wavelength calibration parameters and NO2 DOAS polynomial coefficients). These bugs were accidentally introduced in v2.2 with the inclusion of the O2-O2 cloud retrieval and do not affect the v2.2 SCD and VCD values or their quality.

The improvements in the level-1b v2 spectra (cf. Sect. 2.1.3; Ludewig et al., 2020) include a correction for the degradation of the irradiance but not of the radiance, because at the time of delivery of the initial level-1b v2.0 calibration key data (CKD) the accumulated radiance degradation was still too small to reliably determine a correction for it.
With a stronger degradation effect and more radiance data now available, it has become possible to determine a radiance degradation correction, and updated CKD have been determined.
Using this update, a new test data set, DDS-4, is being produced in autumn 2021. If the evaluation of the TROPOMI data products in DDS-4 is favourable, the radiance degradation correction will be included in the operational processor, which for NO2 will be v2.4, due for release in the first half of 2022 and to be used for a full mission reprocessing later in 2022. Little effect is expected on the NO2 SCD values, but the SCD error may improve due to the degradation correction.

TROPOMI surface albedo data
As mentioned in Sect. 4.3, the surface albedo in the NO2 fit window is taken from the 5-year version of the OMI Lambertian-equivalent reflectivity (LER) climatology (Kleipool et al., 2008), which is given on a 0.5° × 0.5° grid and was measured at almost the same local overpass time as that of TROPOMI. The OMI LER, however, does not contain NIR wavelengths, so for the FRESCO cloud retrieval the GOME-2 LER (Tilstra et al., 2017) is used, which is given on a 0.25° × 0.25° grid and was measured at mid-morning rather than early afternoon. These climatologies are not optimal for TROPOMI, in particular in view of their spatial resolution. Furthermore, the LER approach assumes isotropic reflection of light, while in reality the reflected light depends on the viewing angle (see e.g. Lorente et al., 2018).
For this reason a dedicated TROPOMI surface albedo climatology is under development, based on TROPOMI measurements and containing both a traditional LER and a directionally dependent LER (DLER), similar to the one developed recently from GOME-2 measurements by Tilstra et al. (2021). This TROPOMI climatology will be available on a 0.125° × 0.125° grid.
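The effect of the grid resolution on how well a climatology matches a TROPOMI ground pixel can be made concrete with a simple look-up and cell-size calculation. The sketch below is hypothetical: it assumes a regular global latitude/longitude grid with cells starting at (−90°, −180°), which need not match the layout of the actual OMI, GOME-2 or TROPOMI climatology files, and it uses about 111.2 km per degree of latitude.

```python
import math

KM_PER_DEG_LAT = 111.2  # approximate length of one degree of latitude


def grid_cell(lat, lon, res_deg):
    """Index (i_lat, i_lon) of the regular lat/lon cell containing a point.

    Assumes a global grid starting at (-90, -180) with square cells of
    res_deg degrees; points on the upper edges go into the last cell.
    """
    n_lat = round(180 / res_deg)
    n_lon = round(360 / res_deg)
    i_lat = min(int((lat + 90.0) // res_deg), n_lat - 1)
    i_lon = min(int((lon + 180.0) // res_deg), n_lon - 1)
    return i_lat, i_lon


def cell_size_km(lat, res_deg):
    """Approximate north-south and east-west extent (km) of a cell at lat."""
    ns = res_deg * KM_PER_DEG_LAT
    ew = res_deg * KM_PER_DEG_LAT * math.cos(math.radians(lat))
    return ns, ew


# A point at 52 N, 4.5 E on the 0.5 deg and 0.125 deg grids
coarse_cell = grid_cell(52.0, 4.5, 0.5)
fine_cell = grid_cell(52.0, 4.5, 0.125)
```

At 52° N a 0.5° cell measures roughly 56 km × 34 km and a 0.125° cell roughly 14 km × 9 km, so only the finer grid approaches the nominal TROPOMI ground pixel size of about 5.5 km × 3.5 km at nadir.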
Initially it will be based on level-1b v1.0 spectra, and as such it is expected to be available for use in both the FRESCO and NO2 v2.4 operational processing and the planned mission reprocessing. At a later stage, after the mission reprocessing, an update of the TROPOMI climatology will be made using level-1b v2.0 spectra. Whether, and if so when, that updated DLER will be implemented in the FRESCO and NO2 processing is as yet undecided.

The TROPOMI NO2 data product is widely used for monitoring air pollution levels worldwide, benefiting from TROPOMI's high spatial sampling and excellent signal-to-noise ratio. Since the first data release in mid-2018 several improvements have been made, with a major update to version 1.4 at the end of November 2020 (van Eskes and Eichmann, 2021). This paper documents the improvements leading to version 2.2 of the TROPOMI NO2 data product, operational as of 1 July 2021. These improvements and their impact on the NO2 SCD and VCD data, studied by comparing so-called Diagnostic Data Set (DDS) test data with operational offline (OFFL) v1.x data, can be summarised as follows.
-Small corrections in the wavelength assignment of the reflectance used in the DOAS slant column fit reduce the SCD error of ground pixels along some detector rows, without affecting other rows or the SCD values significantly.
-The introduction of an outlier removal improves the SCD retrieval quality for ground pixels affected by charged particles hitting the detector (notably over the SAA) and by saturation and blooming effects (notably over bright clouds), without affecting other ground pixels.
-The use of improved level-1b v2.0 (ir)radiance spectra, with among others better handling of blooming and transient effects, improved (ir)radiance calibration and an improved irradiance degradation correction, in combination with the above two improvements, leads to: a) a reduction of the SCD error by about 2 %, b) a reduction of the RMS error of the DOAS fit by about 7 %, and c) an increase of the SCD values by about 3 %.
-The increase of the SCD values is fairly homogeneous and leads to an estimated increase of the stratospheric VCD by 2–4 %, or 0.6–1.5 µmol m−2.
-The use of the improved level-1b v2.0 spectra leads a) to a somewhat lower cloud pressure for ground pixels with small cloud fractions, which in turn makes the tropospheric VCDs of those ground pixels some 5 % higher, and b) to a small increase of the number of fully cloud-free ground pixels.
-Switching the source of the snow/ice flag from NISE to ECMWF improves the quality of the VCD data because of the higher spatial resolution of the ECMWF flag and its better handling of coastlines and shallow water cases.
-The climatological surface albedo reduction for cloud-free ground pixels with reflectances lower than expected, in combination with the use of improved level-1b v2.0 spectra, makes the tropospheric VCDs of cloud-free pixels 10–15 % higher.
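The link between cloud properties and the tropospheric VCD in the list above runs through the air-mass factor (AMF). A common formulation is the independent pixel approximation: the pixel AMF is a weighted mean of a clear-sky and a cloudy-sky AMF, with the cloud radiance fraction as weight, and the VCD is the SCD divided by that AMF. The sketch below uses that textbook formulation with invented reflectances and AMFs; it is not the TROPOMI processor code, and the function names are hypothetical.

```python
def cloud_radiance_fraction(f_eff, r_cloud, r_clear):
    """Fraction of the observed radiance that comes from the cloudy part.

    f_eff: effective cloud fraction; r_cloud, r_clear: reflectances of the
    cloudy and clear parts of the pixel (illustrative values below).
    """
    r_total = f_eff * r_cloud + (1.0 - f_eff) * r_clear
    return f_eff * r_cloud / r_total


def pixel_amf(w, amf_cloudy, amf_clear):
    """Independent pixel approximation: radiance-weighted mean AMF."""
    return w * amf_cloudy + (1.0 - w) * amf_clear


def vertical_column(scd, amf):
    """Vertical column from a (tropospheric) slant column and its AMF."""
    return scd / amf


# A small cloud (10 %) still contributes most of the radiance
w = cloud_radiance_fraction(0.1, 0.8, 0.05)
# A higher cloud (lower cloud pressure) hides more boundary-layer NO2,
# represented here by a smaller cloudy-sky AMF
amf_low_cloud = pixel_amf(w, 0.5, 1.2)
amf_high_cloud = pixel_amf(w, 0.3, 1.2)
```

With these illustrative numbers the cloud radiance fraction is 0.64 and the pixel AMF drops from 0.752 to 0.624, raising the VCD by about 20 %. The direction matches the effect of the lower cloud pressures described above, although the actual magnitude in the data is smaller (some 5 %).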
The combined effect of all improvements on the vertical column data necessarily includes the impact of the update of the FRESCO cloud retrieval as of v1.4, since there is no DDS that covers v1.4 data. On average the v2.x DDS data have tropospheric NO2 columns that are 10 to 40 % larger than the v1.x OFFL data, depending on the level of pollution. This increase has brought these VCDs closer to OMI observations, while the underlying SCDs differ by only a few percent.
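The summary quotes the stratospheric increase in µmol m−2, whereas the validation results are given in Pmolec cm−2 (10^15 molecules cm−2). A small conversion helper makes the two comparable; it relies only on the SI value of the Avogadro constant, and the function names are illustrative.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def umol_m2_to_pmolec_cm2(x_umol_m2):
    """Convert a column from umol m-2 to Pmolec cm-2 (1e15 molecules cm-2)."""
    molec_per_m2 = x_umol_m2 * 1e-6 * AVOGADRO
    molec_per_cm2 = molec_per_m2 / 1e4  # 1 m2 = 1e4 cm2
    return molec_per_cm2 / 1e15


def pmolec_cm2_to_umol_m2(x_pmolec_cm2):
    """Inverse conversion: Pmolec cm-2 to umol m-2."""
    return x_pmolec_cm2 * 1e15 * 1e4 / (1e-6 * AVOGADRO)


# Range of the estimated stratospheric VCD increase
low = umol_m2_to_pmolec_cm2(0.6)
high = umol_m2_to_pmolec_cm2(1.5)
```

Since 1 µmol m−2 is about 0.0602 Pmolec cm−2, the stratospheric increase of 0.6–1.5 µmol m−2 corresponds to roughly 0.036–0.090 Pmolec cm−2.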

Ground-based validation of the updated DDS NO2 vertical column data, in comparison to the validation of the corresponding operational OFFL data, shows on average a reduction of the negative bias of the stratospheric (from −6 % for OFFL to −3 % for DDS), tropospheric (from −32 % to −23 %) and total (from −12 % to −5 %) columns. For individual measurement stations, however, the picture is more complicated, in particular for the tropospheric and total columns. For most polluted sites the negative bias improves, but the improvement is not proportional to the pollution level. At clean background sites, moreover, the positive bias seems to get worse, which in turn seems inconsistent with the improved bias in the stratospheric column. Work is ongoing to clarify these differences.
Part of the negative bias observed in the comparison with ground-based observations is probably due to the relatively coarse (1° × 1°) resolution of the a-priori profiles used in the retrieval. Douros et al. (2021) show that using profiles from the CAMS 0.1° × 0.1° air-quality analyses leads to substantial increases of the retrieved tropospheric columns over emission hotspots, of order 20 % depending on the location.
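Why a coarse a-priori profile tends to lower the retrieved tropospheric column over hotspots can be seen from the usual definition of the tropospheric AMF as the profile-weighted mean of layer (box) AMFs. The sketch below uses that standard formulation with a made-up four-layer atmosphere; it omits the temperature correction and tropopause handling of the operational retrieval, and all numbers are illustrative only.

```python
import numpy as np


def tropospheric_amf(box_amfs, partial_columns):
    """Profile-weighted AMF: sum(m_l * x_l) / sum(x_l) over tropospheric layers.

    box_amfs: layer sensitivities m_l (typically small near the surface);
    partial_columns: a-priori NO2 partial columns x_l in the same layers.
    """
    m = np.asarray(box_amfs, dtype=float)
    x = np.asarray(partial_columns, dtype=float)
    return float(np.sum(m * x) / np.sum(x))


box_amfs = [0.4, 0.8, 1.2, 1.5]       # surface layer first (illustrative)
coarse_profile = [2.0, 2.0, 1.0, 1.0]  # hotspot NO2 smeared over a large cell
fine_profile = [4.0, 2.0, 1.0, 1.0]    # NO2 concentrated near the surface

amf_coarse = tropospheric_amf(box_amfs, coarse_profile)
amf_fine = tropospheric_amf(box_amfs, fine_profile)
```

Here the AMF drops from 0.85 to about 0.74 when more NO2 sits in the least sensitive surface layer, so the same slant column yields a tropospheric VCD about 15 % higher. The numbers are not tuned to reproduce the results of Douros et al. (2021).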
Processor version 2.3, operational since 14 Nov. 2021, contains only fixes of minor bugs not affecting the SCD or VCD data.
Version 2.4, due for release in the first half of 2022 and to be used for a full mission reprocessing later in 2022, may contain a few further improvements, depending on upcoming analyses: a) level-1b spectra with a radiance degradation correction, b) a dedicated TROPOMI surface albedo climatology that accounts for viewing-angle dependencies, used in both the cloud and the NO2 retrieval, and c) criteria for choosing between the FRESCO and the O2-O2 cloud pressure in the determination of the NO2 cloud (radiance) fraction.

Appendix A: Region definitions

Table A1 gives the longitudinal and latitudinal extent of the regions in Fig. 14.

Author contributions. JvG conducted the research described in this paper and is responsible for the text. HE is responsible for the AMF and VCD steps and the final data product. SC, GP, TV, and JCL carried out the global validation analysis. MS and MtL implemented and tested the retrieval code in the TROPOMI processor. AL is leader of the TROPOMI level-1b team. KFB is involved in the final NO2 data product. JPV is involved in retrieval issues and is the PI of TROPOMI.
Competing interests. The authors declare that they have no conflict of interests.