Improving algorithms and uncertainty estimates for satellite NO2 retrievals: results from the quality assurance for the essential climate variables (QA4ECV) project

Global observations of tropospheric nitrogen dioxide (NO2) columns have been shown to be feasible from space, but consistent multi-sensor records do not yet exist, nor are they covered by planned activities at the international level. Harmonised, multi-decadal records of NO2 columns and their associated uncertainties can provide crucial information on how the emissions and concentrations of nitrogen oxides evolve over time. Here we describe the development of a new, community best-practice NO2 retrieval algorithm based on a synthesis of existing approaches. Detailed comparisons of these approaches led us to implement an enhanced spectral fitting method for NO2, a 1° × 1° TM5-MP data assimilation scheme to estimate the stratospheric background, and improved air mass factor calculations. Guided by the needs expressed by data users, producers, and WMO GCOS guidelines, we incorporated detailed per-pixel uncertainty information in the data product, along with easily traceable information on the relevant quality aspects of the retrieval. We applied the improved QA4ECV NO2 algorithm to the most current level-1 data sets to produce a complete 22-year data record that includes GOME (1995–2003), SCIAMACHY (2002–2012), GOME-2(A) (2007 onwards) and OMI (2004 onwards). The QA4ECV NO2 spectral fitting recommendations and TM5-MP stratospheric column and air mass factor approach are currently also applied to S5P-TROPOMI. The uncertainties in the QA4ECV tropospheric NO2 columns typically amount to 40 % over polluted scenes. The first validation results of the QA4ECV OMI NO2 columns and their uncertainties over Tai’an, China, in June 2006 suggest a small bias (−2 %) and better precision than suggested by uncertainty propagation. We conclude that our improved QA4ECV NO2 long-term data record provides valuable information to quantitatively constrain emissions, deposition, and trends in nitrogen oxides on a global scale.


Introduction
Nitrogen oxides (NOx = NO + NO2) in the atmosphere have far-reaching effects on the Earth system. In the lower troposphere, nitrogen oxides promote the photochemical production of ozone (e.g. Liu et al., 1987; Grewe et al., 2012), whereas in the stratosphere, NOx leads to the catalytic destruction of ozone and the formation of reservoir species for halogens (e.g. Crutzen et al., 1970). Nitrogen oxides contribute to aerosol formation, and they are linked to the oxidising efficiency of the troposphere via ozone, which plays an important role in the formation of the hydroxyl radical (OH). NO2 itself is only a weak greenhouse gas (Solomon et al., 1999) but has considerable relevance for radiative forcing because nitrogen oxides are important precursors of tropospheric ozone, aerosols, and OH. The net effect of nitrogen oxides on climate forcing is modelled to be negative or cooling, with NOx-driven aerosol screening dominating over tropospheric ozone warming (Shindell et al., 2009). In 2011, the World Meteorological Organization (WMO) Global Climate Observing System (GCOS) included NO2 (together with SO2, HCHO, and CO) in its Implementation Plan for the Global Observing System for Climate in Support of the UNFCCC (WMO, 2011) "in recognition of the emission-based view on climate forcing of ozone and secondary aerosols, relevant for climate mitigation and important for processes". The formal attribution of NO2 as a precursor to the essential climate variables, or ECVs (Bojinski et al., 2014), of ozone and aerosols implies that the scientific community has committed itself to providing reliable, long-term measurement records of NO2. Apart from their relevance to climate change, atmospheric nitrogen oxides are also important for the health of ecosystems and humans. Deposition of nitrogen to ecosystems may affect the structure and functioning of ecosystems (e.g. Galloway et al., 2003). Recently, the World Health Organization stated that it is reasonable to infer that NO2 has direct short-term health effects, such as airway inflammation and reductions in lung function (WHO, 2013), and a literature review of epidemiological studies over a wide geographic area by Hoek et al. (2013) showed that human mortality was significantly associated with long-term exposure to NO2.
Published by Copernicus Publications on behalf of the European Geosciences Union.
High-quality observations are needed to monitor the concentrations of nitrogen oxides in the atmosphere, both close to the ground, where NO2 is relevant for deposition and health aspects, and aloft, where nitrogen oxides influence atmospheric chemistry and climate. Such measurements are useful for reanalysis studies (e.g. Inness et al., 2013), contribute to documenting changes in NO2 concentrations and NOx emissions (e.g. Zhang et al., 2008; Vinken et al., 2014), and help to attribute any such changes to their underlying causes (e.g. Miyazaki et al., 2012; Verstraeten et al., 2015; Xu et al., 2013). This provides policy makers with options for decisions to counter environmental problems (e.g. Witman et al., 2014). Measurements may also enhance the public's appreciation of the extent and scope of the problem of air pollution. In situ measurements of NOx concentrations taken on the ground are representative of the quality of the air people breathe close to the measurement station. But such stations are relatively scarce in many countries and cannot provide spatio-temporal continuity on a global scale. Satellite observations, on the other hand, provide global coverage, thereby offering the unique opportunity to study spatial patterns and temporal variation in NO2 pollution. Measurements of any type can only be used properly in science, or as an evidence basis for policy decisions, if there is unequivocal confidence in the data sets as well as a proper understanding of their limitations.
The EU Seventh Framework (FP7) project, Quality Assurance for Essential Climate Variables (QA4ECV, 2018, http://www.qa4ecv.eu, last access: 13 December 2018), was designed to demonstrate how reliable climate data sets can be generated, along with detailed and traceable information on the quality of such data. Specifically, for NO2, the goals of this project are as follows:
1. to generate a multi-decadal satellite data record of tropospheric and stratospheric NO2 column densities based on calibrated satellite data and state-of-the-art retrievals, and
2. to provide fully traceable uncertainty metrics for this record, ready for ingestion in models or in other interpretation efforts.
Obtaining global, long-term, and stable satellite observations with validated accuracy and precision is not straightforward. The GOME (1995-2003; Burrows et al., 1999), SCIAMACHY (2002-2012; Bovensmann et al., 1999), OMI (from 2004 onwards; Levelt et al., 2006), and GOME-2A (from 2007 onwards; Munro et al., 2006) instruments have been providing global observations of NO2 over the last 22 years, but there are important differences in overpass time, instrumental artefacts (e.g. calibration and design differences), and signal-to-noise levels that need to be taken into account. To be used properly, the information content of the NO2 products needs to be validated over a variety of regions, and users need guidance provided by well-established quality information to help them judge the fitness for purpose of the NO2 products.
In this work, we demonstrate our approach to improving a retrieval algorithm and apply it to generate a multi-decadal record of NO2 columns with a consortium of European retrieval groups. We follow the guidelines for the generation of ECV data sets from WMO (2010). Our efforts are inspired by the QA4ECV project goals described above, but also by recent studies showing that there is still room for substantial improvement in all sub-steps of the retrieval (e.g. Richter et al., 2011; Lin et al., 2014; van Geffen et al., 2015; Krotkov et al., 2016), by the outcome of validation studies showing that various state-of-science retrievals have biases of the order of tens of percent (e.g. Jin et al., 2016; Drosoglou et al., 2017; Kollonige et al., 2018), and by the considerable structural uncertainty in retrieved tropospheric NO2 columns emerging when different retrieval methodologies are applied to the exact same satellite observations (e.g. van Noije et al., 2006; Lorente et al., 2017). The efforts from five European retrieval groups within the QA4ECV consortium allow us to perform a detailed comparison of current approaches to various retrieval sub-steps. These comparisons have proven to be helpful in reducing and better quantifying the uncertainty of the NO2 retrieval. The improved quality of the QA4ECV NO2 record itself and the improved knowledge of the uncertainties should make the QA4ECV satellite data record better fit the purpose of trend analysis, data assimilation, and inverse modelling studies. The paper is organised as follows: in Sect. 2 we discuss how NO2 data user requirements, the expertise from NO2 data providers, and the quality requirements defined by GCOS are providing direction for this study. In Sect. 3 we assess the quality of the best currently available level-1 data sets for NO2 retrieval from GOME, SCIAMACHY, OMI, and GOME-2(A) and discuss how this guides the selection of spectral fitting approaches. Section 4 focuses on the algorithm design and the traceability of the retrieval approach and external data used. In Sect. 5, we give an overview of the main lessons learnt in the intercomparisons of retrieval sub-steps. Section 6 summarises the uncertainty information provided in the QA4ECV data product and how these uncertainty estimates compare to the intercomparison results from Sect. 5. We conclude with a first validation of our new QA4ECV OMI NO2 tropospheric columns and their uncertainties against independent MAX-DOAS measurements collected during a 1-month campaign over Tai'an, China.
2 User needs and expert recommendations

User survey
At the start of the QA4ECV project, we identified the requirements of data users in terms of uncertainty information and usability of the data product. This included a survey of 22 NO2 data users and interviews with three NO2 "champion users", who provided more detailed written answers to questions. The questionnaire was aimed at establishing what users need in terms of quality flags, traceability information, and product uncertainty description. The main outcome of the survey for NO2 is summarised in the Supplement. Briefly, users need detailed quality flags, specific information on random and systematic contributions to the uncertainties, traceability information on the product, and validation of the product and algorithm. The full survey also includes results for the HCHO and CO data products and can be found in QA4ECV Deliverable 1.1 (Nightingale et al., 2015).

Producer requirements
We also carried out a survey of data producer requirements for quality assurance in satellite data records and discussed retrieval priorities and quality assurance (QA) needs with retrieval experts from different groups within the consortium (BIRA-IASB, IUP Bremen, KNMI, Max Planck Institute for Chemistry, and Wageningen University in alphabetical order). Producers of data products (other than those involved in QA4ECV) that we interviewed recognised the need for processing chain information to be more transparent and more easily accessible for data users.
There was also a strong intrinsic motivation from NO2 data producers to improve the retrieval algorithms and generate a long-term NO2 data set from available satellite reflectance measurements. The NO2 retrieval groups in the QA4ECV consortium discussed priorities for retrieval improvement, based on their collective experience with the retrieval, validation, and use of existing individual NO2 data products for different sensors. The central idea was to arrive at a QA4ECV consortium algorithm based on best practices, derived from lessons learnt from intercomparisons between approaches for all relevant retrieval sub-steps, and to extend the steps initiated within the ESA S5P verification project (DLR, 2015).

QA4ECV consortium activities
Retrieval of tropospheric NO2 columns is based on a three-step approach. First, a set of absorption cross sections, including NO2, is fitted to the measured top-of-atmosphere reflectance spectrum, which provides the slant column densities (SCDs, $N_{\mathrm{s}}$). Then (step 2), the stratospheric contribution to the SCD ($N_{\mathrm{s,strat}}$) is estimated and subtracted from the SCD. In the third step, the tropospheric air mass factor (or AMF, $M_{\mathrm{trop}}$) is calculated based on knowledge of the satellite viewing conditions and assumptions on the state of the atmosphere in order to convert the residual tropospheric SCD into a tropospheric vertical column density, VCD ($N_{\mathrm{v,trop}}$). The retrieval equation is as follows:

$$N_{\mathrm{v,trop}} = \frac{N_{\mathrm{s}} - N_{\mathrm{s,strat}}}{M_{\mathrm{trop}}}$$

The following activities leading to the retrieval improvement were identified and conducted during the QA4ECV project:
1. Institutes compared different approaches to spectral fitting. NO2 SCDs were computed by all groups for the same orbits of level-1 data and results were compared. This resulted in a quantification of the level of agreement on the slant columns and a better understanding of the factors responsible for the remaining differences. This is a relevant exercise in view of the substantial revisions of spectral fitting approaches over the last years (e.g. Richter et al., 2011; van Geffen et al., 2015; Marchenko et al., 2015; Anand et al., 2015) and resulted in the definition of the QA4ECV best-practice spectral fitting algorithm.
3. Stratospheric NO2 fields and associated tropospheric residues from different approaches were compared for consistency and plausibility checks, and for quantification of differences. Recent improvements in the KNMI data assimilation approach (Maasakkers, 2013) and the newly developed STREAM scheme (Beirle et al., 2016) provided more insight into the stratospheric correction and the associated uncertainties.
4. Altitude-dependent or box air mass factors (AMFs) for simplified scenarios were compared. This comparison established the degree of consistency between radiative transfer models, pointed out discrepancies, and provided hints for possible improvements. The resulting spread between the (box) AMFs can be interpreted as the structural uncertainty when using different radiative transfer models, vertical layering, and interpolation schemes (Lorente et al., 2017).
5. Tropospheric AMFs calculated by different groups with an increasing number of differences in algorithm choices were compared: from identical settings (wherein only model, vertical layering, and interpolation differ between groups), via preferred settings (every group using their own preferred information on clouds, albedo, NO2 profile, etc.), to a wider round-robin comparison wherein groups outside of Europe also participated. This last comparison was unguided; i.e. groups could freely decide how to calculate their AMFs, deciding for themselves whether to include aerosol corrections, use look-up tables, correct for residual clouds, etc. The spread between the round-robin AMFs is indicative of the structural uncertainty in the AMF calculation (Lorente et al., 2017).
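The three-step retrieval described at the start of this section can be sketched in a few lines. This is a minimal illustration and not the QA4ECV implementation: the function name and all numerical values are hypothetical, chosen only to show how the total slant column, the stratospheric slant column, and the tropospheric air mass factor combine.

```python
def tropospheric_vcd(scd_total, scd_strat, amf_trop):
    """Convert a total NO2 slant column into a tropospheric vertical column.

    scd_total : total slant column N_s from the spectral fit (molec. cm-2)
    scd_strat : estimated stratospheric slant column N_s,strat (molec. cm-2)
    amf_trop  : tropospheric air mass factor M_trop (dimensionless)
    """
    if amf_trop <= 0:
        raise ValueError("tropospheric AMF must be positive")
    # Step 2: subtract the stratospheric contribution from the slant column.
    scd_trop = scd_total - scd_strat
    # Step 3: divide by the tropospheric AMF to obtain the vertical column.
    return scd_trop / amf_trop


# Hypothetical values for a polluted scene (illustration only).
vcd = tropospheric_vcd(scd_total=1.5e16, scd_strat=0.9e16, amf_trop=1.2)
print(f"tropospheric VCD: {vcd:.2e} molec. cm-2")
```

Because the AMF appears in the denominator, a relative error in the AMF translates almost one-to-one into a relative error in the vertical column, which is one reason the AMF comparisons listed above matter.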
It is impossible at the algorithm development stage to have a full understanding of which settings and approaches lead to the best results. This led the consortium to consider it beneficial to include more than one best-practice approach for the stratospheric correction and AMF calculation sub-steps. Specifically, apart from the proposed default stratospheric correction method, stratospheric NO2 column estimates from the independent STREAM method have also been included in the QA4ECV NO2 data product. For the tropospheric AMF calculation, it was decided to provide both the standard tropospheric AMF (linear combination of a partly cloudy, partly clear-sky AMF) and the clear-sky AMF in the data product. This allows data producers to directly test different retrieval options (correcting for residual clouds vs. cloud clearing) at the validation stage and provides users with the possibility to test the robustness of the data product beyond the quoted retrieval uncertainty alone.
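The linear combination behind the standard tropospheric AMF can be illustrated as follows. This is a sketch under an explicit assumption: the weight is taken to be the cloud radiance fraction, as in the commonly used independent pixel approximation, because this section does not specify the weighting variable. Function and argument names are hypothetical.

```python
def combined_amf(amf_clear, amf_cloudy, cloud_radiance_fraction):
    """Weighted mean of the cloudy and clear-sky tropospheric AMFs.

    The weight w (assumed here to be the cloud radiance fraction) is the
    share of the measured radiance that originates from the cloudy part
    of the pixel.
    """
    w = cloud_radiance_fraction
    if not 0.0 <= w <= 1.0:
        raise ValueError("cloud radiance fraction must lie in [0, 1]")
    return w * amf_cloudy + (1.0 - w) * amf_clear


# Hypothetical AMFs for a partly cloudy pixel (illustration only).
amf_standard = combined_amf(amf_clear=1.0, amf_cloudy=0.5,
                            cloud_radiance_fraction=0.25)
print(amf_standard)
```

Dividing the same tropospheric slant column by `amf_standard` or by the clear-sky AMF gives the two alternative vertical columns that users can compare when testing the robustness of the product.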

GCOS requirements and GCOS guidelines for data set generation
The Global Climate Observing System (GCOS) published a set of requirements that tropospheric NO2 columns should fulfil. The requirements from GCOS report 154 (WMO, 2011) are listed in Table 1 below. The recently published requirements from GCOS report 200 (GCOS, 2016) are not considered here yet.
The adequacy of the GCOS requirements, especially those on resolution, is open to discussion. These are target requirements, towards which one should advance when generating a long-term record of tropospheric NO2 column measurements. The resolution requirements listed above cannot be met by the satellite sensors capable of measuring NO2 that have been operational over the last 20 years, because of limitations in their instrument design, with the exception of the recently launched S5P-TROPOMI sensor, which does meet the requirement. Indeed, the GCOS report states that "products at lower spatial and temporal resolution" than 5-10 km (that is, the NO2 products currently available from GOME, SCIAMACHY, OMI, and GOME-2) "... would be sufficient to provide an independent instrument data record of long-term precursor trends to assist in the attribution of changes in ozone and aerosol".
The target requirements for uncertainty and stability are possibly within reach, judging from validation studies, and these have motivated the QA4ECV consortium to find ways to reduce the retrieval uncertainties and to better estimate the systematic error component of the retrieval uncertainty.
GCOS has also established guidelines for the generation of climate data sets (GCOS, 2010). Those guidelines serve as a checklist against which ECV producers can evaluate their production and documentation process (Nightingale et al., 2018). Section 3 of the Supplement provides a point-by-point overview of how these guidelines have been taken into account for the generation of the QA4ECV NO2 data product. A comprehensive comparison with respect to these and other GCOS requirements (GCOS, 2016; WMO, 2010, 2011) is available in the QA4ECV Deliverable D6.1 (Compernolle, 2018).
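The GCOS uncertainty target for tropospheric NO2 columns, max(20 %; 0.03 DU), combines a relative and an absolute term. The short sketch below evaluates it for a given column. It is an illustration only: the function name is hypothetical, and the Dobson unit conversion (1 DU ≈ 2.6867 × 10^16 molec. cm−2) is a standard constant rather than a value taken from this paper.

```python
DU_IN_MOLEC_CM2 = 2.6867e16  # one Dobson unit expressed in molec. cm-2


def gcos_target_uncertainty(vcd_trop):
    """Return the GCOS-154 target uncertainty, max(20 %; 0.03 DU).

    vcd_trop : tropospheric NO2 vertical column (molec. cm-2)
    """
    absolute_floor = 0.03 * DU_IN_MOLEC_CM2  # about 0.8e15 molec. cm-2
    relative_term = 0.20 * vcd_trop
    return max(relative_term, absolute_floor)


for column in (1.0e15, 4.0e15, 1.0e16):
    target = gcos_target_uncertainty(column)
    print(f"column {column:.1e} -> target uncertainty {target:.2e} molec. cm-2")
```

The absolute floor applies to small columns; the two terms cross near 4 × 10^15 molec. cm−2, above which the 20 % relative term takes over.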

Quality of level-1 data
In the early stages of the QA4ECV project design, it was decided to use GOME, SCIAMACHY, OMI, and GOME-2 (on MetOp-A) to generate a data record for tropospheric and stratospheric NO2 vertical columns spanning the period 1995-2017. [...] instruments. For all instruments, the most recent and corrected level-1 data sets are used.

Table 1. GCOS requirements for tropospheric NO2 columns (WMO, 2011).
Variable                  Horizontal resolution   Vertical resolution   Temporal resolution   Uncertainty (1)        Stability (2)
NO2 tropospheric column   5-10 km                 n/a                   4 h                   max(20 %; 0.03 DU)     2 %
(1) An uncertainty of 0.03 DU (Dobson units) corresponds to 0.8 × 10^15 molec. cm−2. The 0.03 DU holds for tropospheric NO2 columns up to 4.0 × 10^15 molec. cm−2. For larger column values the relative uncertainty of 20 % holds. Note that we replaced the heading "accuracy" (WMO, 2011) with "uncertainty" to be compliant with the ISO standard on metrology (VIM). Indeed, WMO (2011) states that "the (accuracy) requirements are indicative of acceptable overall levels for the uncertainties of product values."
(2) According to GCOS, the user requirement for stability is a requirement on the extent to which the uncertainty of a measurement remains constant over a long period (GCOS-200, 2016).
Prior to algorithm testing, we assessed the quality of the relevant level-1 data. Here we briefly summarise our findings and discuss how the quality of the level-1 data may affect the retrieval of NO2 SCDs and their uncertainties.

GOME
GOME level-1 data with global coverage are available from July 1995 to June 2003. ESA produced a GOME level-1 data set for the mission called version 5.1 (GOME Products and Algorithms, 2018) that is sufficiently well characterised and complete. An important concern with GOME level-1 data is that the solar irradiance signal is detected after reflection from a diffuser plate, whereas the radiance signal is not. The reflection on the diffuser plate created large and seasonally varying artificial spectral structures in the solar irradiance (Richter and Wagner, 2001). This makes it very difficult for GOME to use solar irradiance spectra as a reference in the DOAS spectral fitting. To avoid the issue, Earthshine radiances over remote regions can be used as reference spectra. The implication is that only differential NO2 SCDs are retrieved. To provide the total NO2 SCDs necessary for an ECV data set of stratospheric and tropospheric NO2 columns, a background correction, typically estimated from an external source, is required.
Detector degradation is another relevant issue for NO2 and cloud retrievals in the visible channel. This degradation in the level-1 data has been estimated to amount to approximately 15 % between 1995 and 2003 (Slijkhuis et al., 2015) and is anticipated to result in modest increases in the GOME NO2 SCD uncertainty. The quality of the GOME level-1 data has also been affected by other instrument-related issues, but these may be of less relevance to the quality of the NO2 spectral fits. Different GOME scan angles (east, nadir, west) are affected differently in terms of throughput degradation and dichroic mirror degradation, possibly resulting in systematic differences in NO2 SCDs and cloud products for the different scan angles (or stripes).

SCIAMACHY
SCIAMACHY level-1 data are available from August 2002 to April 2012. SCIAMACHY level-1 version 7.04 data were made available by ESA in 2016. One particular feature of the SCIAMACHY level-1 data is that co-adding of spectra was performed on board SCIAMACHY, prior to downlinking the data from the satellite to receiving stations. The cluster 424-527 nm was read out more frequently than other spectral bands in order to minimise the co-addition of spectra and thereby optimise the spatial resolution for NO2 to 60 × 30 km2 (30 × 30 km2 in some latitude bands). A consequence of this is that only spectral data from the 424-527 nm cluster are available for DOAS NO2 spectral fitting. Similarly to GOME, SCIAMACHY solar irradiances suffer from spectral structures from the diffuser plate. A second diffuser was therefore included in the instrument, mounted on the backside of the azimuthal scan mirror. Using solar irradiances from this azimuthal scan mirror strongly reduces the apparent seasonality in NO2 introduced by the diffuser, although some structures still remain (Richter et al., 2011).
Over its lifetime, the SCIAMACHY instrument suffered from degradation of its optical components. This degradation is the result of a complex mixture of aging of the front optics through UV radiation and photochemical reactions, detector contamination by water vapour deposition, and changes in the thermal equilibrium of the platform. As a result, the throughput of SCIAMACHY decreased over the years, in particular in the UV. In addition, small changes in spectral sensitivity over time, for example from etaloning, are cancelled out when using daily irradiance spectra for DOAS spectral fits, but this prevents the use of a single solar irradiance for the full time series. As degradation of the scan mirror leads to scan-angle-dependent degradation, scan-angle-dependent biases, or stripes, can develop over time in the NO2 SCDs.

OMI
For rows not affected by the row anomaly, the OMI instrument has produced level-1 radiances that are stable to within ∼ 2 % over the period 2004-2017. The OMI level-1 data are from the Collection 3 data. Processing of this Collection 3 data started in February 2010 with version 1.1.3 of the ground data processing system software (Dobber et al., 2008) and has produced a complete level-1 data set for the entire OMI mission. The main issue of the OMI level-1 data is the row anomaly (RA). From June 2007 onwards, several rows of the CCD detector (each corresponding to a specific part of the OMI nadir field of view) received less light from the Earth, and some other rows appear to receive sunlight scattered off a peeling piece of spacecraft insulation. A plausible reason for these effects is a partial obscuration of the entrance port by insulating layer material that may have come loose on the outside of the instrument. For rows affected by the RA, successful spectral fits can still be achieved for NO2, but the cloud retrievals suffer from large errors that cannot be overcome; thus the affected pixels have to be removed from further analysis. Figure 1 shows the rows flagged in the Collection 3 level-1 data over time. By 2017, 38 % of the available data were affected by the row anomaly.
All rows affected are flagged with a specific row anomaly flag in the QA4ECV OMI NO2 data product, addressing the user needs expressed in Sect. 2.1. Spurious across-track variability, or stripes, is apparent in current OMI NO2 data products. The stripes appear as discrete jumps in NO2 SCDs from one viewing angle to the other. The origin of the stripes is probably related to small differences in spectral calibration and detector sensitivity from one viewing angle to the other. There is currently no solution via the level-1 data, but application of a destriping correction (e.g. Boersma et al., 2011) reduces the systematic stripes to within acceptable limits. The magnitude of the NO2 destriping corrections has increased from 0.3 × 10^15 to 0.5 × 10^15 molec. cm−2 between 2004 and 2016, related to the use of an annual mean (2005) irradiance spectrum as reference in the DOAS spectral fits.
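The idea behind a destriping correction can be illustrated with a deliberately simplified sketch. It assumes that, over a suitable reference region, the true NO2 SCD does not depend on the across-track row, so that any row-to-row difference in the mean SCD is an artefact. This is not the scheme of Boersma et al. (2011), whose details are not given here; the function name and the sample values are hypothetical.

```python
from statistics import mean


def destripe(scd_by_row):
    """Estimate and remove row-dependent offsets (stripes) from NO2 SCDs.

    scd_by_row : list with one entry per across-track row; each entry is a
                 list of SCDs (molec. cm-2) sampled over a reference region.
    Returns (corrections, corrected), where corrections[i] is the offset
    subtracted from every SCD in row i.
    """
    row_means = [mean(values) for values in scd_by_row]
    overall_mean = mean(row_means)
    # Stripe amplitude per row: deviation of the row mean from the overall mean.
    corrections = [row_mean - overall_mean for row_mean in row_means]
    corrected = [[value - offset for value in values]
                 for values, offset in zip(scd_by_row, corrections)]
    return corrections, corrected


# Three hypothetical rows; the middle row carries a positive stripe.
rows = [[1.0e15, 1.2e15], [1.6e15, 1.8e15], [1.0e15, 1.2e15]]
corrections, corrected = destripe(rows)
print([f"{c:+.1e}" for c in corrections])
```

By construction the corrections sum to zero, so the sketch removes relative row offsets without changing the mean SCD of the reference region.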
The optical throughput change in OMI's (visible) irradiance channel is of the order of 1-1.5 % over the mission period (for rows not affected by the RA). For a signal-to-noise ratio of approximately 500, a deterioration of 1.5 % leads to only marginal increases in NO2 fitting uncertainties (Zara et al., 2018). Spectral stability, important for the accuracy of DOAS retrievals, has also been very good in the visible channel at 0.002 nm. Such wavelength shifts, if unaccounted for, cause NO2 SCD errors of less than 1 %. For more details, please see Sect. 2.2 of QA4ECV Deliverable 4.2 (Müller et al., 2016) and Schenkeveld et al. (2017).
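A back-of-the-envelope calculation shows why a 1.5 % throughput loss has only a marginal effect. The sketch assumes a shot-noise-limited detector, so that the signal-to-noise ratio scales with the square root of the signal, and further assumes that the fitting uncertainty scales inversely with the signal-to-noise ratio. Both are simplifying assumptions made here for illustration, not statements from Zara et al. (2018).

```python
import math


def degraded_snr(snr_initial, throughput_loss):
    """Signal-to-noise ratio after a fractional throughput loss.

    Assumes shot-noise-limited measurements: SNR is proportional to
    sqrt(signal).
    """
    if not 0.0 <= throughput_loss < 1.0:
        raise ValueError("throughput loss must lie in [0, 1)")
    return snr_initial * math.sqrt(1.0 - throughput_loss)


snr_start = 500.0
snr_end = degraded_snr(snr_start, 0.015)
# Relative increase in noise-driven uncertainty, assuming uncertainty ~ 1/SNR.
relative_increase = snr_start / snr_end - 1.0
print(f"SNR after degradation: {snr_end:.1f}")
print(f"relative uncertainty increase: {100.0 * relative_increase:.2f} %")
```

Under these assumptions the noise-driven uncertainty grows by less than 1 %, consistent with the marginal increase described above.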

GOME-2(A)
GOME-2 on EUMETSAT's MetOp-A satellite is an improved version of the GOME instrument (Munro et al., 2016). Level-1 data, version 6.0, are available from January 2007 onwards. A key concern is the accuracy of the long-term record of GOME-2(A) level-1 data. Like GOME and SCIAMACHY, GOME-2 suffered from degradation of its optical components during its lifetime. The optical parts of GOME-2(A) are thought to be increasingly contaminated by outgassing coating material that was meant to protect the detector electronics (Hassinen et al., 2016). This contamination resulted in a progressive wavelength-dependent loss of the instrument throughput. The discontinuity appearing in September 2009 reflects the second throughput test, during which the temperature of the GOME-2 instrument was changed in a controlled way to observe whether or not there was a recovery in performance at any point during the heating. Although the test did not recover the degradation already suffered, it did succeed in stabilising the throughput from September 2009 onwards. The main impact of this degradation is an increase in the noise due to throughput loss. As a result, uncertainties from random error on the NO2 slant columns are expected to increase with time, especially between January 2007 and September 2009. Compared to GOME and SCIAMACHY, degradation of GOME-2(A) started immediately after launch and proceeded faster, but was stabilised after the throughput test.

Figure 1. OMI rows flagged as affected by the row anomaly in the Collection 3 level-1 data (2007-2017). Prior to 2007, there was no row anomaly. Affected rows (red crosses) suffer from a partial blockage of light entering the instrument, so that the absolute radiance levels become compromised. The upper x axis indicates the percentage of OMI pixels affected in a particular month (defined as 100 % × the ratio of the number of pixels affected by the row anomaly to the total number of pixels).
In-flight analysis of the GOME-2(A) instrument slit function using a non-linear fitting of Gaussian line shapes to the Kurucz solar atlas has revealed significant time variations of the GOME-2 slit function in channel 3 (e.g. Dikty et al., 2011). Specifically, the nominal width of the slit function (0.50 nm) has decreased over time, probably due to thermal fluctuations of the GOME-2(A) optical bench associated with seasonal and long-term changes in the solar irradiance (Munro et al., 2016). In QA4ECV, this issue is addressed by including the GOME-2(A) slit function as a fit parameter in the DOAS spectral fitting procedure. However, it is unlikely that this fully resolves the issue, so that further increases in NO2 SCD uncertainties over time should be anticipated (Zara et al., 2018). Compared to GOME and SCIAMACHY, GOME-2(A) solar irradiances suffer much less from spectral structures caused by the diffuser plate, but some small effects remain (Richter et al., 2011). One minor issue is the sensitivity to polarisation structures in the level-1 spectra. In principle, this is corrected for in the level 0-to-1 algorithm (Munro et al., 2016), but some residual small spectral features remain that may interact with atmospheric absorbers in the DOAS fitting.
The instrument specifics, intrinsic quality, and degradation of the four instruments' level-1 data have guided us in selecting the basic settings for spectral fitting of QA4ECV NO2 SCDs. We used these guiding principles:
- Select the same spectral fitting window for the four different instruments (and, if this is not possible, ensure as much spectral overlap as possible). NO2 SCDs are known to be sensitive to the selection of fitting window, as shown in van Geffen et al. (2015) and in the S5P TROPOMI Verification Report (2015).
- Select a wide fitting window including more NO2 absorption features for an instrument with a relatively low signal-to-noise ratio, i.e. OMI. This is known to reduce the random component of the uncertainty in the NO2 SCDs (e.g. Bucsela et al., 2006; Boersma et al., 2007).
- Select the most practical reference spectrum for the DOAS spectral fitting. Ideally these spectra are daily solar irradiances, as in the case of OMI, but if these are compromised in any way, they may be replaced by an average irradiance spectrum, or by daily Earthshine spectra, as is done for GOME. For the latter, a correction for the amount of NO2 absorption signature in the Earthshine reference spectrum is still required.
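The need for a correction when an Earthshine reference is used follows from a short derivation. The sketch below is simplified: it considers NO2 as the only absorber with cross section $\sigma(\lambda)$ and ignores the closure polynomial, the Ring effect, and other fit terms. The symbols $I_{\mathrm{ref}}$, $N_{\mathrm{s,ref}}$, and $N_{\mathrm{s,diff}}$ are introduced here for illustration only.

```latex
% Measured radiance and Earthshine reference, both relative to the solar irradiance I_0:
I(\lambda) = I_0(\lambda)\, e^{-\sigma(\lambda) N_{\mathrm{s}}}, \qquad
I_{\mathrm{ref}}(\lambda) = I_0(\lambda)\, e^{-\sigma(\lambda) N_{\mathrm{s,ref}}}

% Fitting against the Earthshine reference yields only the differential slant column:
\ln \frac{I_{\mathrm{ref}}(\lambda)}{I(\lambda)}
  = \sigma(\lambda) \left( N_{\mathrm{s}} - N_{\mathrm{s,ref}} \right)
  = \sigma(\lambda)\, N_{\mathrm{s,diff}}

% The total slant column therefore requires the reference-scene column as a background correction:
N_{\mathrm{s}} = N_{\mathrm{s,diff}} + N_{\mathrm{s,ref}}
```

In this simplified picture the background correction is the NO2 slant column of the reference scene itself, which has to come from a source other than the differential fit.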
www.atmos-meas-tech.net/11/6651/2018/ Atmos. Meas. Tech., 11, 6651-6678, 2018
An important ambition of the QA4ECV project is to provide full traceability on retrieval algorithms. Usually, a condensed flow diagram for the retrieval algorithm is included in an Algorithm Theoretical Baseline Document (ATBD). The drawback is that ATBDs are often not easily accessible and that it is not immediately clear which ancillary information has been used in particular algorithm sub-steps. We therefore generated an algorithm traceability chain, a web-hosted interactive flow diagram that shows how the QA4ECV NO2 algorithm is put together, which external pieces of information are embedded in the retrieval process, and where details on those pieces of information can be found. The traceability chain has different layers (Fig. 2). The main entry for users is the overall algorithm flow chart. Users can click on algorithm process elements, which takes them a level deeper into the algorithm. Figure 2 shows how to interact with the NO2 traceability chain at multiple levels. The chain is provided as a clickable option on the QA4ECV website along with the options "Data Access" and "User Forum". Providing these options at the same entrance level allows users to obtain a good understanding of how the algorithm works and where ancillary data are coming from. The "Traceability Chain" button leads to the full chain (first layer). Next, as an example, clicking the "DOAS + wavelength calibration" step leads to details on that sub-process (second layer). The absorption cross sections used in the DOAS step are available under "Laboratory Absorption Cross Sections", which contains the references to the cross-section data and papers describing them (third layer). The references themselves are linked to the digital object identifiers (DOIs) and take users directly to the relevant paper.

Intercomparison of retrieval sub-steps and algorithm selection
Differences between NO2 retrievals from different retrieval groups can be traced back to different settings and to different a priori parameters used in the individual retrievals. We made a systematic step-by-step analysis of all components of the NO2 retrieval by documenting and comparing approaches from the consortium institutes and analysing their contribution to differences and their benefits. These tests, evaluations, and innovations have guided the development of the QA4ECV consortium best-practice algorithm for generating a multi-decadal record for NO2 and helped to characterise the uncertainties of each retrieval sub-step.

Evaluation of spectral fitting approaches
NO2 spectral fitting approaches by BIRA-IASB, IUP Bremen, KNMI, and MPI-C were compared in two rounds, with emphasis on OMI and GOME-2, for
1. common (as much as possible identical) settings for the same level-1 data,
2. preferred retrieval settings defined by each group.
The intercomparison comprised 4 full days in winter and summer and early and late in the mission in order to investigate the agreement of retrieval codes with respect to seasonal and instrumental changes. Table 3 shows the details of the spectral fitting retrieval code from the four participating institutes. The retrieval algorithms are based on the same principles, but have been implemented differently and use different software packages. The KNMI code applies a wavelength shift prior to the DOAS fit and does not include an intensity offset in the intensity-fitting model. The common settings are listed in the caption of Table 3.
The common settings intercomparison of OMI NO2 SCDs for all orbits on 2 February and 16 August 2005 and on 4 February and 4 August 2013 showed very good agreement between the different algorithms. The correlation between SCDs from each pair of retrieval codes is always > 99.8 % for all OMI orbits within the 4 selected days. The correlation is slightly less (but still > 99 %) between the KNMI code and the other three codes, suggesting that algorithms agree in capturing the full dynamical range of NO2 SCDs. The remaining differences appear over background regions and can be attributed to using a non-linear intensity-fitting model instead of a linear optical density fit (resulting in NO2 SCD differences over the oceans up to 1 × 10^15 molec. cm^-2; see Fig. 3) and to including or excluding an intensity-offset term in the set of fit parameters (differences up to 1 × 10^15 molec. cm^-2, reducing contrast between bright and dark scenes). Retrieval on optical densities has the advantage of being a linear fit and has traditionally been used in DOAS applications. Fitting intensities has the advantage of a more transparent treatment of the Ring effect and has been applied in operational OMI data retrieval (van Geffen et al., 2015). While neither of the two approaches is better by definition, the results differ in particular in combination with the offset correction applied. Based on these outcomes, it was recommended to use optical density fitting and include the intensity offset in the QA4ECV fitting model, even though the exact physical meaning of this term is not entirely clear. Including the intensity-offset term appears to account for spectral signatures originating from vibrational Raman scattering in open water (Oldeman, 2018) and associated incomplete Ring corrections, and prevents O3 misfits over water and over land. Excluding the intensity-offset term results in larger NO2 SCD uncertainties and in (spurious) spatial patterns in the O3 SCDs that resemble the spatial patterns in TOA reflectance. In QA4ECV, the spectral fitting is approximated by Eq. (2) in Zara et al. (2018) (QDOAS), and a variation thereof for NLIN. For more details see QA4ECV Deliverable 4.2, Sect. 2.3 (Müller et al., 2016).
Table 3 footnotes:
1 An optical depth fitting model is of the form $\ln\frac{I(\lambda)}{I_0(\lambda)} = -\sum_i \sigma_i(\lambda)\,N_{s,i} + \sum_j a_j \lambda^j$, with $I(\lambda)$ the radiance, $I_0(\lambda)$ the irradiance spectrum, $\sigma_i(\lambda)$ the absorption cross-section spectrum of trace gas $i$, $N_{s,i}$ the fitting coefficient, or slant column density, of trace gas $i$, and $a_j$ the coefficients of a low-order polynomial.
2 An intensity-fitting model is of the form $I(\lambda) = I_0(\lambda)\,\exp\left(-\sum_i \sigma_i(\lambda)\,N_{s,i} + \sum_j a_j \lambda^j\right)$.

In round 2, each institute applied preferred settings to retrieve OMI NO2 SCDs for the same set of days. The KNMI settings are identical to those in round 1 (Table 3). Relative to the common settings, IUP Bremen used the 425-497 nm fitting window and included a signature for sand absorption (see Richter et al., 2011) in the fitting model, BIRA-IASB applied a 425-460 nm window and included both sand and CHO-CHO signatures, and MPI-C applied a 431-460 nm window and excluded liquid water absorption from the fit. The intercomparison of preferred settings for SCDs again showed very good agreement between the algorithms. The correlation between the different pairs is > 98 %, and the average differences between the different sets are < 1 × 10^15 molecules cm^-2. The largest offset (+0.9 × 10^15 molecules cm^-2) appears between KNMI and IUP Bremen (Fig. 3a). The higher KNMI SCDs are explained by the intensity fit used by KNMI and by the relatively large difference in the centre wavelengths of the fitting window between these algorithms (435 nm for KNMI, 461 nm for IUP Bremen). Between 405 and 435 nm, the O3 optical thickness is smaller, and photon paths through the stratosphere are slightly longer than in the 435-500 nm spectral region, located in the flanks of the Chappuis band. DAK simulations indeed show 1.5 % higher air mass factors at 405 nm than at 500 nm (Fig. 4b). For the majority of SCDs retrieved over unpolluted regions, the use of an intensity fit, together with the bluer fitting window, explains the differences between the KNMI and IUP Bremen retrievals. It was not possible to point out a clear winner among the different fitting approaches, but including an intensity offset and liquid water absorption in the fit model reduced fitting residuals and improved NO2 and O3 fit results. NO2 SCDs are most sensitive to the fitting approach, i.e. intensity fit or optical density fit.
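The optical depth fitting model, ln(I/I0) = −Σ σ_i N_s,i + Σ a_j λ^j, is linear in the slant columns and the polynomial coefficients, so it can be solved by ordinary least squares. The following is a minimal sketch under stated assumptions, not the QDOAS or QA4ECV implementation: the function name `fit_optical_density`, the rescaled wavelength axis, and the column normalisation are illustrative choices, and Ring, intensity-offset, and wavelength-calibration terms are left out.

```python
import numpy as np


def fit_optical_density(wavelength, radiance, irradiance, cross_sections, poly_order=3):
    """Least-squares fit of ln(I/I0) = -sum_i sigma_i * N_i + sum_j a_j * x**j.

    Illustrative sketch only. `x` is the wavelength axis shifted and scaled to
    [-0.5, 0.5], so the returned polynomial coefficients refer to x, not to
    wavelength in nm. Returns (slant_columns, polynomial_coefficients).
    """
    wavelength = np.asarray(wavelength, dtype=float)
    x = (wavelength - wavelength.mean()) / (wavelength.max() - wavelength.min())
    y = np.log(np.asarray(radiance, dtype=float) / np.asarray(irradiance, dtype=float))

    sigma = np.atleast_2d(np.asarray(cross_sections, dtype=float))  # (n_gases, n_wavelengths)
    poly = np.vander(x, poly_order + 1, increasing=True)            # columns x**0 .. x**poly_order
    design = np.hstack([-sigma.T, poly])

    # Cross sections are ~1e-19 cm^2 while polynomial columns are ~1, so each
    # column is normalised before solving; otherwise the tiny columns could be
    # discarded by the solver's singular-value cutoff.
    scale = np.linalg.norm(design, axis=0)
    solution, *_ = np.linalg.lstsq(design / scale, y, rcond=None)
    coeffs = solution / scale

    n_gases = sigma.shape[0]
    return coeffs[:n_gases], coeffs[n_gases:]
```

An intensity fit would instead minimise the residual in I(λ) itself, which requires a non-linear solver; that difference in the minimised quantity is one reason why the two approaches can return slightly different slant columns from the same spectra.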
The comparisons of the fitting approaches led to a number of clear recommendations for spectral fitting of NO2 for the QA4ECV record. A complete list can be found in QA4ECV Deliverable 4.2 (Müller et al., 2016). We highlight the most important ones here:
-An intensity-offset correction should be included.
-Given the sensitivity to selecting intensity fit or optical density fit (systematic bias up to 1 × 10^15 molecules cm^-2), it is recommended to use one and the same fit model for all sensors.
-For the 405-465 nm fitting window, the absorption spectrum of liquid water should be included (not necessary for the smaller windows).
Together with the recommendations driven by level-1 data quality considerations shown in Table 3, this led to the definition of spectral fitting of NO2 and data processing from GOME, SCIAMACHY, OMI, and GOME-2 as summarised in Table 4. Here, the rationale was to maintain as much as possible the same fit approach and fit settings for GOME-2, SCIAMACHY, and GOME as for OMI (for details see Table 2 in Müller et al., 2018).
For the morning sensors GOME-2, SCIAMACHY, and GOME, tests were done to evaluate the consistency between results of the spectral fitting approaches, since some settings such as the fitting window had to be different in order to avoid SCIAMACHY and GOME features for wavelengths < 425 nm interfering with the spectral fit. The results of these tests are reported in the QA4ECV Deliverable document 4.5 (Müller et al., 2018), and we summarise them here. Monthly mean normalised NO2 SCDs from GOME-2(A) and SCIAMACHY agreed very well in space and time despite the differences between the instruments in terms of coverage, pixel size, and fitting window (Figs. 17 and 18 in Müller et al., 2018). Because GOME suffers from a diffuser plate artefact emerging in the irradiance files, we used daily radiance spectra obtained over the Pacific Ocean as a substitute for the irradiance spectrum. The region over the Pacific is largely free of tropospheric NO2. We then determined the offset correction as the difference between the normalised SCD values from SCIAMACHY and GOME between August 2002 and June 2003, when both instruments were operational over the reference sector. It amounts to 1.48 × 10^15 molec. cm^-2. A subsequent matching of the corrected GOME stratospheric columns to the SCIAMACHY stratospheric columns over the reference region showed that the robustness of the correction is excellent, with only small deviations (±10^14 molec. cm^-2) between GOME and SCIAMACHY for the period of overlap (Fig. 20 in Müller et al., 2018).

Evaluation of stratosphere-troposphere separation
We compared stratospheric correction approaches by IUP Bremen, KNMI, and MPI-C to establish best practices for this algorithm step. The stratospheric correction approach from IUP Bremen is based on scaling model-simulated (B3dCTM model) stratospheric vertical columns to match satellite observations over the remote Pacific (Hilboll et al., 2013). In the KNMI approach, NO2 SCDs are assimilated in the TM4 model, so that model simulations of stratospheric NO2 columns agree well with the retrieved slant columns over regions away from strong tropospheric pollution (Dirksen et al., 2011). MPI-C uses a modified reference sector approach called STREAM (Beirle et al., 2016). This approach estimates the stratospheric vertical columns from retrievals over regions where tropospheric NO2 is assumed to be negligible and over regions with high clouds, where the tropospheric column is shielded. The derived stratospheric field is then smoothed and interpolated globally based on the assumption that the spatial pattern of stratospheric NO2 does not feature strong gradients. The intercomparison of stratospheric correction approaches focused on 2 individual days (1 January and 19 July 2005) and 2 monthly means (January and July 2005). This comparison should be regarded as a "preferred settings" round, where SCD inputs were identical, but the stratospheric AMFs and methods used to estimate the stratospheric NO2 columns varied between the groups. We evaluated the success of the stratospheric corrections via checks on the smoothness of stratospheric patterns and on the plausibility of the tropospheric residues (defined as $N_v - N_{v,\mathrm{strat}}$) over remote regions where values are expected to be low and not strongly negative. The comparisons (Sect. 2.4 of QA4ECV Deliverable 4.2, Müller et al., 2016) indicated that the different schemes showed similar stratospheric NO2 columns and tropospheric residues and each of the approaches would be appropriate for use in the QA4ECV NO2 algorithm. The quantitative differences between the stratospheric NO2 columns were generally smaller than 0.5 × 10^15 molecules cm^-2, a number that can be regarded as an upper limit for the structural uncertainty in the stratospheric estimate, but the patterns also revealed that IUP Bremen and KNMI stratospheric NO2 columns were biased high at high solar zenith angles in the winter hemisphere. In Lorente et al. (2017), we attributed this bias to the SCIATRAN and DAK radiative transfer models not fully accounting for the sphericity of the atmosphere in describing photon transport after backscattering. The McArtim model does account for the sphericity of the atmosphere for both incoming and backscattered light, resulting in lower stratospheric AMFs, especially for extreme solar zenith angles.
The KNMI data assimilation was selected as the default approach for estimating the stratospheric NO2 column in the QA4ECV algorithm. This ensures consistent knowledge of the state of the atmosphere (NO2 and temperature profiles, stratospheric dynamics) derived from the same model that predicts the a priori tropospheric NO2 profile shape required by the tropospheric AMF calculation. We decided to update the model framework for assimilation from TM4 to TM5-MP (Williams et al., 2017). Moreover, the data assimilation approach has incorporated a correction for sphericity via McArtim, as described in Lorente et al. (2017). Retrieval results indicate that these stratospheric AMFs, together with improvements in the data assimilation scheme, lead to far fewer negative tropospheric columns for retrievals at extreme viewing geometries, also at midlatitudes. As a second option, the consortium selected MPI-C STREAM as a complementary algorithm for stratospheric NO2 estimates to be included in the QA4ECV data product. STREAM is based on the measurements alone without involving models. This allows QA4ECV data users to switch approaches, which may be beneficial under certain circumstances. Especially in situations with strong stratospheric NO2 gradients, such as near the polar vortex, assimilation is the preferred approach. It has been shown that the data assimilation captures the strong spatial gradients occurring near the vortex (Dirksen et al., 2011), whereas the STREAM method by design results in zonally smooth structures in those regions. STREAM could be a useful alternative to data assimilation for studies into weak NOx sources, such as emissions from soil, ships, and small, isolated anthropogenic sources. The strength of STREAM is that it is based on measurements and does not rely on models. Data assimilation is potentially somewhat vulnerable to misinterpreting tropospheric contributions as stratospheric NO2, so that STREAM could be used in areas away from strong stratospheric gradients (where the zonally smooth structure of the stratospheric field is of little consequence). Furthermore, the differences between the two methods are useful as a measure of structural uncertainty in the stratospheric correction, beyond the typical uncertainties of 0.2 × 10^15 molecules cm^-2 derived from the observation-forecast statistics of the assimilation scheme (Dirksen et al., 2011). Regions of enhanced structural uncertainty are relevant, especially over areas with small tropospheric NO2 enhancements, such as from outflow of continental pollution over oceans, shipping lanes, and over areas with soil NOx emissions.
As an example, Figure 5 shows OMI stratospheric NO2 estimates from both the data assimilation and STREAM approach for the QA4ECV v1.1 product on 2 February 2005. The upper panel illustrates that the latitudinal gradients in NO2 between the data assimilation and STREAM agree reasonably well. It is evident that the data assimilation approach captures more variability along a zonal band, resulting on this day in lower stratospheric NO2 over North America and Europe and higher amounts over north-eastern Asia than in the STREAM method. The differences are up to 1 × 10^15 molec. cm^-2, such that they have a substantial impact on the tropospheric column retrievals.

Evaluation of air mass factor calculations
We performed a comparison of approaches to calculate AMFs for NO2 and mapped the uncertainties associated with these approaches. Much of this comparison has been reported in Lorente et al. (2017) and in Sect. 2.5 of QA4ECV Deliverable 4.2 (Müller et al., 2016), so we give only a brief summary here. First, we compared radiative transfer models from the consortium (LIDORT, SCIATRAN, DAK, McArtim) for their top-of-atmosphere reflectances and their capacity to compute vertically resolved or box AMFs. The agreement between reflectances from the four models at 440 nm (and also at 340 nm) was excellent. Mean relative differences between models were generally small (< 1 %), with the exception of high solar zenith angles (> 80°), where systematic differences with the McArtim model amount to up to 10 %. McArtim is the only model that simulates radiative transfer in full sphericity for direct and diffuse light (Deutschmann et al., 2011). Other differences, such as different layering schemes, polarisation description, refractive index, and Rayleigh scattering cross-section spectrum, only lead to small differences (< 1 %) between the models.
To establish the QA4ECV NO2 algorithm settings, we selected the appropriate wavelength for calculating the NO2 box AMFs. We investigated the wavelength dependency of the NO2 AMFs for retrieval scenarios with substantial tropospheric pollution ($N_{v,\mathrm{trop}}$ = 16 × 10^15 molec. cm^-2) and considered that the AMF calculated at a single wavelength should be representative of the fit window average AMF. Tropospheric NO2 AMFs were calculated between 400 and 500 nm with 1 nm steps. Figure 6 shows a distinct increase in AMF with wavelength. This increase reflects the increasing transparency of the lower troposphere towards the green part of the spectrum where Rayleigh scattering is weakening. In general, tropospheric AMFs increase by 0.2 %-0.3 % per nanometre redshift. The purple, blue, and light-blue lines show the AMF averaged over all spectral points in three relevant fitting windows used within QA4ECV and by individual groups.
We saw in Fig. 4 that total NO2 AMFs decrease weakly with wavelength (−0.01 % nm^-1 redshift). Figure 6 shows that tropospheric NO2 AMFs increase with wavelength (+0.2-0.3 % nm^-1). This difference can be understood from Rayleigh scattering, occurring mostly in the lowest kilometres of the troposphere. The bulk scattering increasingly screens NO2 in the boundary layer towards the UV, so that tropospheric AMFs are smallest for shorter wavelengths. For the fitting windows considered for QA4ECV NO2 retrievals (425-465 and 405-465 nm), we recommend calculating the NO2 AMF at 437.5 nm for all sensors. The blue and purple lines in Fig. 6 indicate that 437.5 nm is a representative wavelength for calculating the NO2 AMF: it is reasonably near to the centre wavelength of both windows (445 and 435 nm respectively), and the 437.5 nm AMF is within 2 % of the average AMF for both windows. Uncertainties related to the exact choice of AMF wavelength are much smaller than other AMF uncertainties, such as those from clouds, albedo, trace gas and aerosol profiles, as discussed below and in Lorente et al. (2017).
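The "within 2 %" statement can be checked with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that the tropospheric AMF grows linearly at 0.25 % per nanometre (the middle of the 0.2 %-0.3 % range quoted above) relative to its value at 437.5 nm. The function names are hypothetical, and the real wavelength dependence in Fig. 6 need not be linear.

```python
import numpy as np

REF_WAVELENGTH_NM = 437.5  # wavelength recommended for the AMF calculation


def relative_amf(wavelength_nm, slope_per_nm=0.0025):
    """AMF relative to its 437.5 nm value, assuming a linear trend (illustrative)."""
    wavelength_nm = np.asarray(wavelength_nm, dtype=float)
    return 1.0 + slope_per_nm * (wavelength_nm - REF_WAVELENGTH_NM)


def window_mean_amf(start_nm, end_nm, slope_per_nm=0.0025, step_nm=1.0):
    """Mean relative AMF over a fitting window sampled in 1 nm steps, both ends included."""
    grid = np.arange(start_nm, end_nm + 0.5 * step_nm, step_nm)
    return float(relative_amf(grid, slope_per_nm).mean())


def reference_bias(start_nm, end_nm, slope_per_nm=0.0025):
    """Relative difference between the 437.5 nm AMF and the window-mean AMF."""
    mean = window_mean_amf(start_nm, end_nm, slope_per_nm)
    return (1.0 - mean) / mean


for window in [(425.0, 465.0), (405.0, 465.0)]:
    print(window, f"{100.0 * reference_bias(*window):+.2f} %")
```

Under this linear assumption the window mean equals the AMF at the window centre, so the bias scales with the distance between 437.5 nm and that centre. At the upper end of the quoted slope range the simple linear model would slightly exceed 2 % for the 425-465 nm window, which is a limitation of the sketch rather than a statement about the actual AMFs.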
We compared altitude-resolved AMFs and tropospheric AMFs calculated with the four different radiative transfer models. We found that the agreement is very good (within 3 % and 6 % respectively) if identical ancillary data (surface albedo, terrain height, cloud parameters, and trace gas profile) and cloud and aerosol corrections are being used. This shows that the choice of radiative transfer model (RTM) for calculation of the tropospheric AMF introduces a modest uncertainty of no more than 6 %, which is intrinsic to the calculation method and cannot be avoided.
To assess the full impact of preferred settings and methods for AMF calculations, we organised a round robin comparison. Six groups joined this round robin, each using their preferred setting to calculate the tropospheric AMFs. Besides the QA4ECV partners KNMI/WUR, BIRA-IASB, and IUP Bremen, NASA GSFC, Leicester University, and Peking University also participated. The six groups used widely different calculation methods (RTMs, temperature, cloud and aerosol corrections) and preferred ancillary data on albedo, terrain height, NO2 profile, etc. (Müller et al., 2016). The ensemble mean AMF served as a reference with which to compare the AMFs by the individual groups. The round robin exercise focused on China because it provides challenging retrieval conditions and Peking University only calculates AMFs over that region. The overall spatial pattern of AMF values was well reproduced by all groups. AMFs generally agree to within 10 % over unpolluted areas but show differences of up to 40 % with respect to the ensemble mean over polluted regions in eastern China and Korea. These differences can be traced back to differences in the preferred surface albedo, clouds, and a priori NO2 profiles used in the AMF calculation. It is not possible to identify the single most important forward model dependency for the AMF calculation. The analysis in QA4ECV Deliverable 4.2 (Müller et al., 2016) and in Lorente et al. (2017) suggests that surface albedo, clouds, and a priori NO2 profiles are of similar importance, and that their interplay, in combination with the choices for cloud and aerosol correction methods, drives the structural uncertainty in the NO2 AMFs.
Based on the results from the comparisons discussed above, the following recommendations for calculating QA4ECV NO2 AMFs were made:
-Calculate the NO2 AMFs at 437.5 nm for all instruments.
-Apply the independent pixel approximation for cloud correction, but also include clear-sky AMFs in the product.
-Use cloud information (cloud fraction, cloud pressure) from FRESCO+ for GOME, SCIAMACHY, GOME-2A (Wang et al., 2008) and OMCLDO2 for OMI (Veefkind et al., 2016). These have been derived using the same physical principles as in the AMF calculation.
-Apply implicit aerosol correction (via the cloud correction). This correction is effective in most retrieval scenarios with moderate aerosol pollution. When accurate, observation-based aerosol information becomes available from ECMWF CAMS or NASA GMAO, explicit aerosol corrections will be considered.
-Use surface albedo climatologies (as close as possible to the 437.5 nm AMF wavelength) consistently with the ones used in the cloud retrievals. For GOME, SCIAMACHY, and GOME-2A, this is the albedo climatology from Tilstra et al. (2017), and for OMI it is the updated 5-year climatology (Kleipool et al., 2008).
-Use the DEM_3KM pixel-average terrain height.
-Use spatially interpolated (to pixel centre) NO2 profiles simulated by TM5-MP at 1° × 1°. TM5-MP is the model used for the data assimilation of NO2 SCDs to estimate the stratospheric contribution (Sect. 5.2).
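The independent pixel approximation named in the recommendations is commonly written as a radiance-weighted mix of a cloudy-sky and a clear-sky AMF, M = w M_cloudy + (1 − w) M_clear, with w the fraction of the measured radiance that comes from the cloudy part of the pixel. The sketch below uses that textbook form. It is an assumption that the QA4ECV processor uses exactly this expression, and all function and argument names are hypothetical.

```python
def cloud_radiance_fraction(cloud_fraction, radiance_cloudy, radiance_clear):
    """Fraction of the pixel radiance originating from the cloudy part (illustrative)."""
    if not 0.0 <= cloud_fraction <= 1.0:
        raise ValueError("cloud_fraction must lie in [0, 1]")
    cloudy = cloud_fraction * radiance_cloudy
    clear = (1.0 - cloud_fraction) * radiance_clear
    total = cloudy + clear
    if total <= 0.0:
        raise ValueError("total radiance must be positive")
    return cloudy / total


def ipa_amf(cloud_fraction, amf_cloudy, amf_clear, radiance_cloudy, radiance_clear):
    """Independent-pixel AMF: radiance-weighted mean of cloudy and clear-sky AMFs."""
    w = cloud_radiance_fraction(cloud_fraction, radiance_cloudy, radiance_clear)
    return w * amf_cloudy + (1.0 - w) * amf_clear
```

Because clouds are much brighter than most surfaces, even a small cloud fraction can give a large radiance weight. This is one way to see why the AMF is sensitive to errors in small cloud fractions.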

QA4ECV NO2 uncertainty estimates

Theoretical algorithm uncertainty
The QA4ECV NO2 product contains an algorithm uncertainty estimate associated with each individual pixel's tropospheric NO2 column. This estimate is calculated theoretically via uncertainty propagation based on the principal retrieval equation (Eq. 1). The uncertainty propagation accounts for spectral fitting uncertainties ($\sigma_{N_s}$) and contributions from uncertainties in a priori and ancillary data required for calculating the stratospheric NO2 background ($\sigma_{N_{s,\mathrm{strat}}}$) and the AMF ($\sigma_{M_{tr}}$). The uncertainty in the tropospheric AMF, or AMF covariance, is written as follows:

$$\sigma_{M_{tr}}^2 = \left(\frac{\partial M}{\partial A_s}\sigma_{A_s}\right)^2 + \left(\frac{\partial M}{\partial f_{cl}}\sigma_{f_{cl}}\right)^2 + \left(\frac{\partial M}{\partial p_{cl}}\sigma_{p_{cl}}\right)^2 + \left(0.1\,M_{tr}\right)^2 + 2\,\frac{\partial M}{\partial A_s}\frac{\partial M}{\partial f_{cl}}\,\sigma_{f_{cl}A_s} \qquad (3)$$

where $\partial M/\partial A_s$ represents the local sensitivity of the air mass factor to surface albedo $A_s$, $\sigma_{A_s}$ the best estimate of the uncertainty in the surface albedo, and so on. The fourth term on the right-hand side represents the contribution from the uncertainty in the a priori profile shapes and is tentatively approximated as 10 % of the tropospheric AMF. This term is absent when using the averaging kernel in satellite data applications (Eskes and Boersma, 2003), which removes the dependence on the a priori profile. The last term represents the contribution from the error correlation between cloud fractions and surface albedo, $\sigma_{f_{cl}A_s}$; surface albedo influences the AMF directly, and indirectly because cloud fractions are sensitive to surface reflectance (see Eqs. 20 and A2 in Boersma et al., 2004 and Lorente et al., 2018 for more detail). As $\sigma_{f_{cl}A_s}$ and $\partial M/\partial f_{cl}$ are negative and $\partial M/\partial A_s$ is positive, this last term gives a positive contribution to $\sigma_{M_{tr}}^2$. The uncertainty σ should be interpreted as the best guess of the retrieval uncertainty for one specific measurement. This uncertainty contains random and systematic error components, and the different systematic error components (due to errors in profile shape, surface albedo, etc.) each have their own spatial and temporal scales. Therefore, when averaging over multiple pixels (spatially) or over time, part of the error will cancel out or be smoothed, but (an unknown) part of the systematic error will remain even after averaging; see Boersma et al. (2016).
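As an illustration of first-order uncertainty propagation for a single pixel, the sketch below assumes that the retrieval equation has the form N_v,trop = (N_s − N_s,strat) / M_tr and that the errors in the slant column, the stratospheric slant column, and the tropospheric AMF are mutually uncorrelated. Both the assumed form and the function names are illustrative assumptions, since Eq. (1) is not reproduced in this section.

```python
import math


def tropospheric_column(n_s, n_s_strat, amf_trop):
    """Assumed retrieval equation: tropospheric slant column divided by the AMF."""
    return (n_s - n_s_strat) / amf_trop


def tropospheric_column_uncertainty(n_s, n_s_strat, amf_trop,
                                    sigma_n_s, sigma_n_s_strat, sigma_amf):
    """First-order propagation of three uncorrelated error sources (illustrative)."""
    if amf_trop <= 0.0:
        raise ValueError("amf_trop must be positive")
    term_scd = sigma_n_s / amf_trop                            # spectral fitting
    term_strat = sigma_n_s_strat / amf_trop                    # stratospheric correction
    term_amf = (n_s - n_s_strat) * sigma_amf / amf_trop ** 2   # air mass factor
    return math.sqrt(term_scd ** 2 + term_strat ** 2 + term_amf ** 2)
```

The AMF term scales with the tropospheric slant column itself, whereas the other two terms do not. That is consistent with AMF errors mattering most over polluted scenes and slant-column and stratospheric errors mattering most over clean scenes.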
We recommend using Eq. (4) below to estimate the uncertainty $\sigma_o$ for spatially or temporally averaged data. This method takes the area-weighted (statistical) retrieval uncertainty σ and then accounts for a partial correlation in the errors between pixels as in Eskes et al. (2003):

$$\sigma_o = \sigma\,\sqrt{\frac{1-c}{n} + c} \qquad (4)$$

with c as the error correlation between the n retrievals. In Boersma et al. (2016), c = 0.15 is proposed based on the consideration that errors in surface albedo, clouds, a priori NO2 profile, and aerosols (or lack of description thereof) are typically correlated at the spatio-temporal scales of moderate resolution (global) models, i.e. down to 0.5° × 0.5° and over 1 month (for example the surface albedo is from a monthly climatology). Equation (4) with c = 0.15 implies that the spatially or temporally averaged uncertainty cannot reduce to below 39 % of the level of typical single-pixel uncertainties (σ), even when many observations are available.
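A short numerical sketch shows how a partial error correlation limits the benefit of averaging. It assumes the averaged uncertainty takes the form σ_o = σ·sqrt((1 − c)/n + c), which reproduces the 39 % floor quoted above for c = 0.15, since √0.15 ≈ 0.39. The function name is hypothetical.

```python
import math


def averaged_uncertainty(sigma, n, c=0.15):
    """Uncertainty of an average of n retrievals with error correlation c (illustrative)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1]")
    return sigma * math.sqrt((1.0 - c) / n + c)


# Relative uncertainty of the mean for a growing number of pixels.
for n in (1, 10, 100, 1000):
    print(n, round(averaged_uncertainty(1.0, n), 3))
```

With c = 0 the expression reduces to the familiar σ/√n for fully random errors, and with c = 1 averaging brings no reduction at all.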

Algorithm uncertainties and quality flags
Table 5 gives an overview of the most important uncertainties and the quality flags of QA4ECV NO2 provided in the data product. Note that the uncertainty estimates and quality flags provide clearly different types of information to the user. The uncertainty characterises the dispersion of the NO2 column, given the value of the measured column, and our best understanding of the retrieval process. Quality flags indicate whether the retrieved value and the uncertainty estimate have been obtained under conditions where they are expected to be valid.

Evaluating the sub-step uncertainty estimates
An innovative aspect of the QA4ECV project is the evaluation of the uncertainty estimates of retrieval sub-steps against independent estimates of the same metric and structural uncertainties.

Evaluation of NO2 SCD uncertainties
We compared the DOAS uncertainty estimates ($\sigma_{N_s}$) from the spectral fitting algorithm against independent estimates obtained from the spatial variability of an ensemble of DOAS SCDs over areas with little geophysical variability using a statistical approach (Boersma et al., 2007). Our SCD uncertainty evaluation is described in detail in QA4ECV Deliverable 5.5 (Boersma et al., 2017a) and in Zara et al. (2018) for OMI and GOME-2A, and we summarise the results here. For both instruments, we found that the improved QA4ECV OMI NO2 retrieval shows smaller uncertainties than other OMI algorithms and good agreement between the DOAS and statistical SCD uncertainties. This suggests that the recommendations made in Sect. 5.1 and in QA4ECV D4.2 (Müller et al., 2016) have improved the spectral fitting of NO2 such that the typical mission-average SCD uncertainties for both instruments amount to 0.7-0.8 × 10^15 (was ∼ 1.0 × 10^15) molec. cm^-2. For OMI, this uncertainty is dominated by random contributions from propagation of measurement noise, but we also noticed a 30 % systematic contribution from stripe effects. For OMI, the trend in SCD uncertainties was small (< 2 % yr^-1), in line with the known radiometric stability of the instrument (Schenkeveld et al., 2017), but for GOME-2A, the NO2 SCD uncertainties increased by 8 % yr^-1 until September 2009 and, after heating the instrument, by < 3 % yr^-1 over 2009-2015. The structural (systematic) uncertainty, estimated from the differences between NO2 SCDs calculated with different but equally plausible fitting methods (with or without intensity-offset correction; see Sect. 5.1), is larger than the theoretical and statistical estimates but of a similar magnitude. Table 6 gives an overview of the various estimates of uncertainty for the NO2 SCDs.

Evaluation of uncertainties in the stratospheric correction
The uncertainty of the stratospheric NO2 vertical column in the QA4ECV NO2 product is based on a global statistical analysis of results from the data assimilation procedure and documented as 0.2 × 10^15 molecules cm^-2 (Dirksen et al., 2011). The assimilation predicts stratospheric NO2 columns from an observation-constrained (analysed) start field and TM5-modelled transport and chemistry. The average discrepancies between the 24 h forecast and actual satellite-observed NO2 slant column fields over pristine areas are regarded as a measure of the uncertainty in the stratospheric NO2 field. In QA4ECV Deliverable 5.5 (Boersma et al., 2017a), we verified that the observation minus forecast (O − F) assimilation statistics over the Pacific are indeed consistent with an uncertainty estimate of 0.2 × 10^15 molecules cm^-2 for the stratospheric column.
To further evaluate the estimate of the stratospheric column uncertainty, we compare the QA4ECV data assimilation and STREAM OMI stratospheric NO2 column estimates for 2 February 2005. There are considerable methodological differences between the data assimilation and STREAM techniques. Yet the data assimilation and STREAM stratospheric NO2 distributions agree to a reasonable extent, with data assimilation stratospheric columns generally smaller and their spatial features sharper than in STREAM. The TM5-MP assimilation approach distinguishes stratospheric NO2 from free-tropospheric background contributions, while STREAM does not do this. This may be the main reason for the structurally lower values in the assimilation. This is illustrated in Fig. 7, which shows the variability with longitude in the stratospheric NO2 column from data assimilation and from STREAM along 40° N on 2 February 2005. Between 75 and 125° W over the United States and between 0 and 40° E (Europe), the data assimilation stratospheric column values are 0.2-0.5 × 10^15 molecules cm^-2 lower than the STREAM values. Over eastern Asia (100-140° E), data assimilation and STREAM agree to within ±0.3 × 10^15 molecules cm^-2. These differences reflect the structural uncertainty in stratospheric (vertical) NO2 columns, arising when different retrieval methodologies are applied to the same satellite observations, and both uncertainty estimates are included in Table 6.

Evaluation and breakdown of uncertainties in the air mass factors
The uncertainty in the tropospheric AMF is calculated via the uncertainty propagation from Eq. (3). The contribution of each parameter to the overall AMF uncertainty depends on the specific observation conditions for each pixel. The air mass factor sensitivities (e.g. $\partial M/\partial A_s$) describe the sensitivity of the AMF to changes in the local parameter value, evaluated around the specific value for the parameter at the pixel. The uncertainties in the cloud parameters ($\sigma_{f_{cl}}$, $\sigma_{p_{cl}}$), surface albedo ($\sigma_{A_s}$), and the a priori profile shape have been estimated from the literature or derived from comparisons with independent data. For QA4ECV OMI NO2, we use an uncertainty in the surface albedo of 0.015, based on various comparisons of albedo databases (e.g. Boersma et al., 2011), uncertainties of 0.025 and 50 hPa in the OMI O2-O2 cloud fraction and cloud pressure estimates, respectively, based on recent improvements in the cloud algorithm (Veefkind et al., 2016), and a 10 % contribution from NO2 profile uncertainty. The latter is based on comparing AMFs calculated with simulated a priori profiles to AMFs calculated with measured NO2 profiles from aircraft and lidar (e.g. Hains et al., 2010 and references therein).
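The AMF uncertainty budget described above can be sketched numerically. The default input uncertainties below are the values quoted for QA4ECV OMI NO2 (0.015 in surface albedo, 0.025 in cloud fraction, 50 hPa in cloud pressure, and 10 % of the AMF for the profile shape). The sensitivities and the albedo-cloud-fraction error covariance in the example call are made-up numbers for illustration only, and the function name is hypothetical.

```python
import math


def amf_uncertainty(amf, dm_dalbedo, dm_dcloud_fraction, dm_dcloud_pressure,
                    sigma_albedo=0.015, sigma_cloud_fraction=0.025,
                    sigma_cloud_pressure=50.0, profile_fraction=0.10,
                    cov_cloud_fraction_albedo=0.0):
    """Tropospheric AMF uncertainty from albedo, cloud, profile, and cross terms.

    Illustrative sketch: sensitivities are per unit albedo, per unit cloud
    fraction, and per hPa of cloud pressure; the covariance argument is the
    error covariance between cloud fraction and surface albedo.
    """
    variance = (
        (dm_dalbedo * sigma_albedo) ** 2
        + (dm_dcloud_fraction * sigma_cloud_fraction) ** 2
        + (dm_dcloud_pressure * sigma_cloud_pressure) ** 2
        + (profile_fraction * amf) ** 2
        + 2.0 * dm_dalbedo * dm_dcloud_fraction * cov_cloud_fraction_albedo
    )
    if variance < 0.0:
        raise ValueError("inconsistent inputs: negative variance")
    return math.sqrt(variance)


# Example call with hypothetical sensitivities for a polluted, nearly cloud-free pixel.
example = amf_uncertainty(amf=1.0, dm_dalbedo=4.0, dm_dcloud_fraction=-2.0,
                          dm_dcloud_pressure=0.001,
                          cov_cloud_fraction_albedo=-0.0002)
print(f"relative AMF uncertainty: {100.0 * example:.1f} %")
```

A negative covariance combined with sensitivities of opposite sign makes the cross term positive, so the correlated albedo and cloud-fraction errors add to the total rather than cancel.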
Apart from the overall AMF uncertainty estimate, the QA4ECV NO2 ECV precursor data product also provides the individual contributions from the cloud parameters, surface albedo, and a priori profile shapes. Figure 8 presents the relative monthly average tropospheric AMF uncertainties and their individual contributions from surface albedo and cloud uncertainties (profile contributions are not shown because they have been set at the 10 % level) for OMI throughout 2005 over Europe, the United States, China, and Johannesburg, South Africa, all regions polluted with NO2. The largest contribution to the AMF uncertainty is from the surface-albedo-cloud cross term (10 %-20 %), with a substantial contribution from surface albedo (±10 %). In winter the uncertainty in cloud pressure is a substantial contributor in Europe and China. The strong surface-albedo-cloud-fraction cross term $2\,\frac{\partial M}{\partial A_s}\frac{\partial M}{\partial f_{cl}}\,\sigma_{f_{cl}A_s}$ can be understood from the strong sensitivity of the cloud fraction to the surface albedo, especially when cloud fractions are small (see Appendix in Boersma et al., 2004). The overall tropospheric AMF uncertainties are estimated to be 20 %-25 %, comparable to earlier estimates for GOME tropospheric NO2 presented in Boersma et al. (2004).
We quantified the structural uncertainty in tropospheric AMFs by comparing an ensemble of different AMF calculation methods and parameter assumptions over eastern China, a region with high amounts and a complex mixture of aerosols, clouds, and NO2 pollution (Lorente et al., 2017). Retrieval groups used their preferences for ancillary data and for cloud and aerosol corrections. The outcome of the comparisons suggested systematic AMF differences of up to 15 % in summer and 40 % in winter between the groups. We consider these structural uncertainty estimates to be conservative, as they have been calculated for the particularly challenging retrieval regime of eastern China in 2005. Including the structural uncertainties in the overall budget, as done for the QA4ECV HCHO ECV precursor product (De Smedt et

Here we present estimates of typical single-pixel algorithm uncertainties for the QA4ECV NO2 columns in four regions: Europe, the United States, and China as showcases of typical polluted regions, and the Pacific Ocean as an example of a remote region with low background levels. These uncertainty estimates should be interpreted as representative of the typical single-pixel uncertainties encountered by users interpreting the data. We see from Fig. 9 that, over the polluted regions in wintertime, the single-pixel retrieval uncertainty is dominated by the uncertainty in the tropospheric AMF. In summer, contributions from uncertainties in the SCD are largest, but there are comparable contributions from uncertainties in the stratospheric correction and the tropospheric AMF. On average, the single-pixel uncertainty is 35 %–45 % in the polluted regions. Over the background region (Pacific Ocean), we see that the tropospheric NO2 column uncertainty exceeds 100 % and is dominated year-round by the uncertainties in the SCD and the stratospheric column estimate.
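The contrast between polluted and background scenes can be made concrete with a small sketch. It assumes the usual relation in which the tropospheric column is the slant column minus the stratospheric slant column, divided by the tropospheric AMF, and propagates the three uncertainty sources in quadrature. The function name and all numbers are hypothetical placeholders (in units of 10^15 molec. cm^-2), not values from the data product.

```python
import math


def tropospheric_column_uncertainty(scd, scd_strat, amf,
                                    sigma_scd, sigma_strat, rel_sigma_amf):
    """Return (column, absolute uncertainty) for N_v = (scd - scd_strat) / amf."""
    column = (scd - scd_strat) / amf
    variance = (
        (sigma_scd / amf) ** 2            # slant column (spectral fit) term
        + (sigma_strat / amf) ** 2        # stratospheric correction term
        + (column * rel_sigma_amf) ** 2   # tropospheric AMF term
    )
    return column, math.sqrt(variance)


# Hypothetical polluted winter pixel: the AMF term dominates.
polluted, polluted_sigma = tropospheric_column_uncertainty(
    scd=15.0, scd_strat=7.0, amf=1.0,
    sigma_scd=0.7, sigma_strat=0.2, rel_sigma_amf=0.4)

# Hypothetical remote ocean pixel: SCD and stratospheric terms dominate.
background, background_sigma = tropospheric_column_uncertainty(
    scd=7.6, scd_strat=7.3, amf=1.5,
    sigma_scd=0.7, sigma_strat=0.2, rel_sigma_amf=0.2)

print(f"polluted:   {polluted_sigma / polluted:.0%}")
print(f"background: {background_sigma / background:.0%}")
```

With these placeholder inputs the absolute uncertainty is smaller over the background scene, but the relative uncertainty is far larger because the column itself is close to zero.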

Uncertainties in averaged tropospheric NO2 columns
When averaging tropospheric columns over space, uncertainties may be considerably reduced. For example, over regions such as the Pacific Ocean, where the uncertainty is dominated by a random SCD error, the tropospheric column uncertainty will be greatly reduced when averaging over a month or over a larger region. Over polluted regions dominated by uncertainties in the tropospheric AMF, averaging will also reduce the tropospheric column uncertainties, but an unknown systematic component will remain. For both retrieval situations, we adopt Eq. (4) to account for possible systematic errors arising from imperfect knowledge of surface albedo, a priori NO2 profile, clouds, and correlations between these.
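To illustrate why a systematic component limits the benefit of averaging, the sketch below uses the textbook expression for the uncertainty of the mean of n errors that share a common magnitude and a common pairwise correlation. Eq. (4) is not reproduced in this section, so this is a stand-in under stated assumptions rather than the formula of the product; the function name and the correlation value are hypothetical.

```python
import math


def averaged_uncertainty(sigma_single, n_pixels, correlation):
    """Uncertainty of the mean of n equally correlated errors.

    correlation = 0 gives the purely random limit sigma / sqrt(n);
    correlation = 1 gives no reduction at all.
    """
    if n_pixels < 1:
        raise ValueError("n_pixels must be at least 1")
    if not 0.0 <= correlation <= 1.0:
        raise ValueError("correlation must lie in [0, 1]")
    return sigma_single * math.sqrt((1.0 - correlation) / n_pixels + correlation)


# Hypothetical 40 % single-pixel uncertainty with a partial error correlation.
for n in (1, 10, 100, 1000):
    print(n, round(averaged_uncertainty(0.40, n, correlation=0.15), 3))
```

For large n the result levels off at sigma_single times the square root of the correlation, which is the part that averaging cannot remove.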
In model-column comparisons and in trend analysis studies, it is often important to have knowledge of temporally averaged uncertainties. Because the temporal variability in tropospheric NO2 columns is typically strong (because of the diurnal cycle, day-to-day variability, weekly cycles, etc.), this implies considerable variability in day-to-day uncertainties. To obtain the uncertainty in a monthly mean tropospheric NO2 column over a certain region, we recommend taking whichever is largest: (a) the temporally averaged values for $\sigma_o$ (Eq. 4), or (b) the standard deviation of the mean (standard error) of the daily tropospheric NO2 columns. If there is substantial temporal variability (from changes in photochemistry, transport events), the standard error will be a good representation of the uncertainty in the monthly mean tropospheric NO2 column. Figure 10 shows a comparison of monthly averaged uncertainties $\sigma_o$ and the local standard deviation of the mean NO2 columns for four small regions (0.25° × 0.25°). The figure confirms that the averaged uncertainties provide an optimistic estimate of the uncertainty, at ±10 %, in the monthly mean NO2 columns. For the polluted regions, the standard deviation of the mean is 15 %–30 %, exceeding the average uncertainties. This illustrates that the uncertainty in a monthly mean over a small region such as a city is driven more by sampling limitations than by the intrinsic uncertainty of the retrieval.
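The recommendation above can be sketched in a few lines. The function below takes daily columns and their daily $\sigma_o$ values for one region and returns the monthly mean together with the larger of the mean $\sigma_o$ and the standard error of the daily columns. The function name and the sample numbers are hypothetical, and the daily $\sigma_o$ values are assumed to have been computed beforehand.

```python
import math
import statistics


def monthly_mean_uncertainty(daily_columns, daily_sigma_o):
    """Return (monthly mean column, recommended uncertainty of that mean)."""
    # Keep only days where both the column and its uncertainty are valid.
    pairs = [(c, s) for c, s in zip(daily_columns, daily_sigma_o)
             if math.isfinite(c) and math.isfinite(s)]
    if len(pairs) < 2:
        raise ValueError("need at least two valid days")
    columns = [c for c, _ in pairs]
    sigmas = [s for _, s in pairs]

    mean_sigma_o = statistics.mean(sigmas)                        # option (a)
    standard_error = statistics.stdev(columns) / math.sqrt(len(columns))  # option (b)
    return statistics.mean(columns), max(mean_sigma_o, standard_error)


# Hypothetical daily values in 10^15 molec. cm^-2.
mean_col, unc = monthly_mean_uncertainty(
    [8.0, 12.0, 10.0, 6.0, 14.0], [0.5, 0.5, 0.5, 0.5, 0.5])
print(mean_col, unc)
```

Taking the maximum keeps the estimate from becoming too optimistic when day-to-day variability, rather than retrieval uncertainty, limits how well the monthly mean is known.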

Validation of QA4ECV NO2 columns and uncertainties
As an example of the validation efforts undertaken within QA4ECV, here we compare the QA4ECV OMI tropospheric NO2 columns with independent MAX-DOAS column measurements in the polluted city of Tai'an, China. We compare OMI pixels measured within 20 km and 30 min of a MAX-DOAS measurement in Tai'an. We validate the QA4ECV v1.1 product and, for reference, the well-established DOMINO v2 product.
The MAX-DOAS measurements were conducted by Irie et al. (2008) in the Chinese city of Tai'an in May-June 2006 when pollution levels were substantial. The instrumentation and retrieval technique have been described extensively in Irie et al. (2008, 2012). The slant column retrievals have been tested in a semi-blind intercomparison exercise in Cabauw, the Netherlands, indicating agreement to within 10 % of other groups (Roscoe et al., 2010). Uncertainties in the MAX-DOAS NO2 columns are driven by noise, air mass factor and temperature uncertainties, amounting to approximately 15 % uncertainty. The representative horizontal footprint of the MAX-DOAS measurement is of the order of 10 km. It was suggested by Irie et al. (2012) that the spatial distribution of NO2 tropospheric columns around Tai'an during their observation period was rather homogeneous compared to other sites used for their validation comparisons. A more quantitative characterisation of this aspect is discussed below.
We compare OMI NO2 tropospheric columns measured with a pixel centre within 20 km of the location of the MAX-DOAS instrument in Tai'an (for some days more than 1 pixel can be matched up with a MAX-DOAS measurement). This coincidence criterion limits spatial representativeness mismatches between MAX-DOAS and OMI and is consistent with the spatial dimensions of the MAX-DOAS (±10 km) and OMI (20–30 km) footprints (we excluded pixels from the outer four OMI rows). We furthermore require that the OMI columns were measured within 30 min of the coinciding MAX-DOAS measurement, have a pixel footprint area < 700 km², and that the satellite retrieval was done under mostly clear-sky conditions (cloud radiance fraction < 0.5), which is in line with recommendations on the appropriate use of QA4ECV data as documented in the Product Specification Document (QA4ECV Deliverable D4.6). Earlier studies (e.g. Pinardi et al., 2017; Drosoglou et al., 2017) found the largest discrepancies between MAX-DOAS and satellite NO2 columns over strongly polluted regions. Such discrepancies are at least partly due to spatial inhomogeneity in the NO2 field around the station location. To quantify the spatial representativeness of the Tai'an MAX-DOAS site for the OMI pixels included in the comparison, we calculated the campaign-mean spatial tropospheric NO2 column distribution (Fig. 11). We then use the ratio of this campaign-mean column at Tai'an to the campaign-mean column at the location of the individual OMI pixel to project individual OMI NO2 columns ($N_{V,p}$, i.e. what is usually validated) within our criteria to values more representative of the location of Tai'an ($N_{V,T}$):

$$N_{V,T} = \frac{\overline{N}_{V,T}}{\overline{N}_{V,p}}\, N_{V,p},$$

where $\overline{N}_{V,T}$ and $\overline{N}_{V,p}$ denote the campaign-mean columns at Tai'an and at the location of the pixel, respectively. For example, for a pixel observed directly south-west of Tai'an, where pollution levels are somewhat higher than directly over Tai'an, the scaling factor will be smaller than 1.
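The selection and projection steps described above can be sketched as follows. The thresholds (20 km, 30 min, 700 km², cloud radiance fraction 0.5) are taken from the text; the `OmiPixel` container, its field names, and the sample values are hypothetical and do not correspond to variable names in the data files.

```python
from dataclasses import dataclass


@dataclass
class OmiPixel:
    column: float                   # tropospheric NO2 column at the pixel
    distance_km: float              # pixel centre to MAX-DOAS site
    time_offset_min: float          # OMI minus MAX-DOAS measurement time
    area_km2: float                 # pixel footprint area
    cloud_radiance_fraction: float
    campaign_mean_at_pixel: float   # campaign-mean column at the pixel location


def is_coincident(pixel: OmiPixel) -> bool:
    """Apply the coincidence and quality criteria quoted in the text."""
    return (pixel.distance_km <= 20.0
            and abs(pixel.time_offset_min) <= 30.0
            and pixel.area_km2 < 700.0
            and pixel.cloud_radiance_fraction < 0.5)


def project_to_site(pixel: OmiPixel, campaign_mean_at_site: float) -> float:
    """Scale the pixel column by the ratio of campaign-mean columns."""
    return pixel.column * campaign_mean_at_site / pixel.campaign_mean_at_pixel


# Hypothetical pixel south-west of the site, where the campaign mean is higher.
pixel = OmiPixel(column=10.0, distance_km=12.0, time_offset_min=-15.0,
                 area_km2=400.0, cloud_radiance_fraction=0.2,
                 campaign_mean_at_pixel=8.0)
if is_coincident(pixel):
    print(project_to_site(pixel, campaign_mean_at_site=6.0))
```

In this example the campaign mean at the pixel exceeds that at the site, so the scaling factor is below 1 and the projected column is lower than the retrieved one.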
Figure 12b shows the scatter plot of DOMINO v2 vs. MAX-DOAS NO2 columns for Tai'an. There are now 45 DOMINO v2 pixels matching 17 independent MAX-DOAS measurements. This higher number of matches can be explained by the previous version of the OMI O2-O2 cloud product (Acarreta et al., 2004), used in the DOMINO v2 retrieval, which contains effective cloud pressures that are too low compared to independent information (Boersma et al., 2011; Veefkind et al., 2016), so that more OMI pixels pass the selection criteria. The bias for DOMINO v2 is $+0.85 \times 10^{15}$ molec. cm$^{-2}$ (+11 %, n = 45), with a root mean square deviation of $2.66 \times 10^{15}$ molec. cm$^{-2}$ (35 %).
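For readers who want to reproduce such summary statistics from matched pairs, a minimal sketch is given below. It computes the mean difference (bias) and the root mean square deviation, and expresses both relative to the mean reference column; that normalisation is an assumption, as the text does not state how the percentages were derived. The function name and sample values are hypothetical.

```python
import math


def bias_and_rmsd(satellite, reference):
    """Return (bias, rmsd, relative bias, relative rmsd) for matched pairs."""
    if len(satellite) != len(reference) or not satellite:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(satellite)
    diffs = [s - r for s, r in zip(satellite, reference)]
    bias = sum(diffs) / n
    rmsd = math.sqrt(sum(d * d for d in diffs) / n)
    mean_reference = sum(reference) / n   # normalisation is an assumption
    return bias, rmsd, bias / mean_reference, rmsd / mean_reference


# Hypothetical matched columns in 10^15 molec. cm^-2.
omi = [2.0, 4.0, 6.0]
maxdoas = [1.0, 5.0, 6.0]
print(bias_and_rmsd(omi, maxdoas))
```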
The differences between OMI and MAX-DOAS NO2 columns provide an opportunity to evaluate the uncertainties of the satellite retrievals. This relies on good knowledge of the MAX-DOAS uncertainties and relatively small uncertainties associated with the representativeness of MAX-DOAS for the coincident OMI columns. Assuming that the retrieval errors of OMI and MAX-DOAS are independent and follow a normal distribution, we expect that the distribution of the differences between OMI and MAX-DOAS takes on a Gaussian form characterised by width $\sigma$:

$$\sigma = \sqrt{\sigma_O^2 + \sigma_{MD}^2 + \sigma_R^2},$$

with $\sigma_O$ being the uncertainty reported (in the data files) for QA4ECV OMI NO2 columns, $\sigma_{MD}$ the uncertainty reported for MAX-DOAS NO2 columns, and $\sigma_R$ the uncertainty from spatio-temporal mismatches between the satellite and ground-based measurement (Table 7). The mean reported uncertainties ($\sigma_O$ and $\sigma_{MD}$) are regarded as random errors here (see discussion in Sect. 2.3 in Boersma et al. (2004) and below). In Fig. 13 we compare the distribution of differences predicted from the above Gaussian function, based on the uncertainties reported in the OMI and MAX-DOAS data files and a 10 % representativeness difference error (estimated from deviations from the Tai'an value shown in Fig. S2 in the Supplement), to the actual observed differences (individual pairs of OMI and MAX-DOAS NO2 column values). We see from Fig. 13 that the differences between OMI and MAX-DOAS NO2 columns are more narrowly distributed than expected from algorithm uncertainties and theory, although the sample size is small (n = 31 for QA4ECV, n = 45 for DOMINO v2). This holds for the QA4ECV differences, which are 39 % smaller than expected, but also for the DOMINO v2 differences, which are 16 % smaller than expected over Tai'an. The tighter distribution of the observed differences implies the following:
1. The uncertainties in OMI and MAX-DOAS retrievals possess some degree of correlation (for instance, in situations when OMI has a high bias, MAX-DOAS may also be biased high, limiting the magnitude of the differences).
2. OMI and/or MAX-DOAS algorithm uncertainty estimates are too conservative.
3. OMI or MAX-DOAS uncertainties contain an unknown persistent error component, so that $\sigma_O$ or $\sigma_{MD}$ have been overestimated.
4. The uncertainties are a combination of the above.
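The comparison between expected and observed spread can be sketched as below. The expected width combines the OMI, MAX-DOAS, and representativeness uncertainties in quadrature, as in the Gaussian assumption in the text; the observed width is taken here as the sample standard deviation of the differences, which is a simple stand-in for the Gaussian fit shown in Fig. 13. Function names and all numbers are hypothetical.

```python
import math
import statistics


def expected_width(sigma_omi, sigma_maxdoas, sigma_repr):
    """Quadrature sum of independent, random uncertainty contributions."""
    return math.sqrt(sigma_omi ** 2 + sigma_maxdoas ** 2 + sigma_repr ** 2)


def width_ratio(differences, sigma_omi, sigma_maxdoas, sigma_repr):
    """Observed spread of the differences divided by the expected width.

    A ratio below 1 means the differences are more narrowly distributed
    than the reported uncertainties suggest.
    """
    observed = statistics.stdev(differences)
    return observed / expected_width(sigma_omi, sigma_maxdoas, sigma_repr)


# Hypothetical OMI minus MAX-DOAS differences in 10^15 molec. cm^-2.
diffs = [-1.0, 0.0, 1.0]
print(width_ratio(diffs, sigma_omi=3.0, sigma_maxdoas=4.0, sigma_repr=12.0))
```

A ratio well below 1 is consistent with any of the four explanations listed above; the ratio alone cannot distinguish between them.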
The MAX-DOAS NO2 retrieval technique suffers from some similar error contributions (a priori NO2 profile shape, aerosols), but it also differs from the satellite retrieval by design (no albedo or stratospheric correction dependence, ground-based perspective), so we should expect some, but not a full, error correlation. If there were a substantial systematic and persistent error component to $\sigma_O$ or $\sigma_{MD}$ (and we have no indication for this, nor do we know about its magnitude or sign), we would have needed to reduce our estimates for $\sigma_O$ and $\sigma_{MD}$ in Table 7 and expect a distribution of the differences that is more narrowly Gaussian and peaking at a typical systematic difference (or bias). Figure 13 shows a small bias for QA4ECV. We therefore conclude that the OMI (and MAX-DOAS) retrieval uncertainty estimates could be too conservative, although our findings are based on a small sample. In the case of QA4ECV, a reduction of both the OMI and MAX-DOAS uncertainties by 35 % would be in much better agreement with the observed differences at the Tai'an station. This first validation is based on a limited time range and one site. A more comprehensive validation, based on several MAX-DOAS sites and several years of data, is in preparation (Compernolle, 2018).

Summary
We have developed an improved algorithm and uncertainty assessment for tropospheric NO2 satellite retrievals from UV/VIS satellite sensors. Our effort has resulted in the generation of a 1995-2017 climate data record of tropospheric NO2 columns with fully traceable uncertainty metrics that can be readily used for model evaluation and for estimating NOx emissions and nitrogen deposition. In designing our new algorithm, we followed advice from the user and producer community and from WMO GCOS best practices on generating climate data records. Specifically, we extended the information content on flags and uncertainties in the data files and present a traceability chain along with the data files. This traceability chain is an easily accessible web-hosted interactive flow diagram that shows the components of the QA4ECV NO2 algorithm and how external information is embedded in the retrieval process, providing details on where those pieces of information can be found.
The QA4ECV project involved detailed comparisons of different approaches between groups for the DOAS slant column retrievals and the estimate of the stratospheric subcolumn and air mass factors. Using the latest and best available level-1 data for GOME, SCIAMACHY, GOME-2A, and OMI from the relevant space agencies, the comparisons led us to improve the spectral fitting of NO2 by accounting for liquid water absorption and an intensity-offset correction. This improved the quality of the NO2 fit over clear-sky ocean scenes by up to 30 % (Zara et al., 2018), but did not substantially affect the NO2 fit over polluted scenes. We compared three alternate methods for estimating the stratospheric NO2 background. Data assimilation was considered to be the most viable option for the QA4ECV algorithm because it provides a coherent framework for stratospheric corrections as well as air mass factor (AMF) calculations. We based the data assimilation on the TM5-MP chemistry transport model with 1° × 1° horizontal resolution, a major step forward compared to earlier assimilation schemes based on TM4 (3° × 2°), and include corrections for sphericity effects on atmospheric radiative transfer, as described in Lorente et al. (2017). Our new stratospheric correction leads to fewer negative tropospheric NO2 columns for retrievals at extreme viewing geometries. We then tested various models and approaches to calculate tropospheric AMFs under challenging retrieval scenarios. AMFs calculated with different radiative transfer models agree well, as long as assumptions and ancillary data inputs are consistent. With groups using their own preferred settings, we find differences (or structural uncertainty) in AMFs up to 40 % with respect to the ensemble mean, stressing the importance of adequate traceability. Many of the lessons learnt for QA4ECV algorithm development are currently being applied to NO2 retrievals from S5P-TROPOMI.

Table 7. Expected and observed differences between OMI and MAX-DOAS NO2 columns observed over Tai'an in June 2006 for the QA4ECV (n = 31) and DOMINO v2 (n = 45) ensemble. Summary of uncertainties for all (31 or 45) matching pixels. $\sigma_O$ (reported OMI uncertainty) and $\sigma_{MD}$ (reported MAX-DOAS uncertainty) are the mean of 31 or 45 individual values, and $\sigma_R$ is considered to be a 10 % contribution from mismatches.
Columns: Expected differences (QA4ECV) | Observed differences (QA4ECV) | Expected differences (DOMINO v2) | Observed differences (DOMINO v2)
The QA4ECV NO2 product contains an algorithm uncertainty estimate associated with each individual observation. We obtain this estimate via uncertainty propagation calculations, accounting for pixel-specific sensitivities to state parameters (Jacobians) such as surface reflectance, clouds, and the NO2 vertical profile. The uncertainties are highest in the cold season, when AMFs are particularly uncertain, and typically amount to 40 % over the polluted areas. For averaged QA4ECV NO2 data, associated uncertainties may be reduced, but part of the uncertainty due to systematic error will remain. Our work provides recommendations on how to estimate the uncertainty for spatially or temporally averaged data, taking into account a partial correlation in the errors between pixels. We evaluated the algorithm uncertainties against independent assessments of structural uncertainties for each retrieval step and find that the structural uncertainties are of similar magnitude or exceed the algorithm uncertainties for all retrieval sub-steps. Finally, we used MAX-DOAS NO2 column measurements obtained over the polluted Tai'an (China) region in June 2006 to validate the OMI QA4ECV NO2 columns and their uncertainties. By accounting for spatial differences between the pixel and the location of Tai'an, we found good agreement between the QA4ECV and MAX-DOAS NO2 columns (bias = −2 %, rms difference = 16 %, n = 31), which is much better than the agreement between DOMINO v2 and MAX-DOAS (bias = +11 %, rms difference = 35 %, n = 45). The small differences between coinciding QA4ECV and MAX-DOAS NO2 columns suggest that our QA4ECV algorithm uncertainties are likely on the conservative side, at least over Tai'an.
Data availability. The QA4ECV NO2 essential climate variable precursor product contains vertical NO2 columns for the period 1995-2017. The data set contains (1) the tropospheric vertical column density, (2) the stratospheric vertical column density, and (3) the total vertical column density. The NO2 ECV precursor data provide geophysical information for each and every ground pixel observed by GOME, SCIAMACHY, OMI, and GOME-2(A). The QA4ECV NO2 data product is available online via http://www.qa4ecv.eu (last access: 13 December 2018), under "ECV data". The data product has been processed with the coherent algorithm described in this work.
For GOME, data are available from 1 July 1995 to 30 June 2003 (8 years). For SCIAMACHY, data are available from 1 August 2002 to 30 April 2012 (9 years and 9 months). For GOME-2(A), data are available from 1 January 2007 to 31 December 2017, and for OMI from 1 October 2004 to 31 December 2017, so that the total length of the data set exceeds 22 years at the time of writing. For each of the data sets, digital object identifiers have been registered (Boersma et al., 2017b, c, d, e). Detailed information on how to use the data can be found in the Product Specification Document for the NO2 ECV precursor product (Boersma et al., 2017f).
Author contributions. KFB led the effort to generate the QA4ECV NO2 data sets, prepared the figures and wrote the manuscript. HE and KFB developed the overall retrieval framework and processed the data. AR coordinated the SCD intercomparison and processed the GOME, SCIAMACHY and GOME-2 NO2 SCDs. IDS, JVG and SB provided inputs to the SCD intercomparison; IDS processed the GOME-2 SCDs and JVG processed the OMI SCDs. AL generated Fig. 5, and AL and KFB coordinated the AMF intercomparison and processed the improved AMF look-up table. SB and KFB compared the stratospheric corrections and implemented the STREAM estimates in the data products. MZ and EP improved and analyzed the quality of the SCD approaches. MZ made Fig. 1, and EP contributed to Fig. 3 and made Fig. 4. MVR, TW, AR and KFB evaluated the level-1 data and oversaw the spectral fitting harmonization. JDM and KFB prepared the transition from TM4 to TM5-MP for the QA4ECV retrieval framework. RvdA helped to make data, images, and the DOIs available online, and test-used the new data. JN and ADR collected user feedback and developed the traceability chain concepts. HI and GP helped develop ideas for validation, and JCL and SC contributed to the quality assessment against international standards. All authors read and commented on the manuscript.
Competing interests.The authors declare that they have no conflict of interest.

Figure 1. OMI row anomaly in the UV-2/VIS channel as a function of time throughout the OMI mission (2007–2017). Prior to 2007, there was no row anomaly. Affected rows (red crosses) are suffering from a partial blockage of light entering the instrument, so that the absolute radiance levels become compromised. The upper x axis indicates the percentage of OMI pixels (defined as 100 % × the ratio of the number of pixels affected by the row anomaly to the total number of pixels) being affected in a particular month.

Figure 2. Traceability chain for the QA4ECV NO2 retrieval algorithm. The orange blocks (rectangles) are the building blocks of the retrieval, and in the main chain these are clickable and show more details in deeper layers. The light-blue blocks are also clickable and will provide more information on that process in a pop-up window. The parallelograms provide information on algorithm choices and input data sets. The interactive traceability chain is available at http://www.qa4ecv.eu/ecv/no2-pre (last access: 13 December 2018).

Figure 4. (a) Correlation plots of IUP Bremen (425–497 nm fitting window) and KNMI (405–465 nm) NO2 slant columns retrieved using preferred fit settings for OMI orbit OMIL2_2005m0202t0339, including only pixels with SZA < 88° and intensity < $1.1 \times 10^{14}$. (b) Wavelength dependency of the total air mass factor for a scenario with SZA = VZA = 30° (geometrical AMF = 2.31), as calculated with DAK for a midlatitude standard atmospheric profile with a total NO2 column of $5.9 \times 10^{15}$ molecules cm$^{-2}$ (mostly situated in the stratosphere) (red curve), and for the same midlatitude standard atmospheric profile but now with absorption by both NO2 and O3 (total column of 322 DU, purple curve).

Figure 5. Stratospheric NO2 columns from OMI on 2 February 2005, estimated with the data assimilation method (a) and with the STREAM method (b). Panel (c) shows the differences between the stratospheric NO2 estimates.

Figure 6. NO2 tropospheric air mass factor (black) as a function of wavelength computed with DAK for a polluted boundary layer for a specific viewing geometry ($\theta = 60°$, $\theta_o = 45.6°$). Horizontal lines show the averaged multi-wavelength AMF for different fitting windows (purple: 425–450 nm, blue: 405–465 nm, light blue: 425–497 nm). The grey line shows the NO2 absorption cross section from Vandaele et al. (1998) at 220 K. A midlatitude standard atmosphere was used including O3. The AMF was computed for a polluted boundary layer with $16 \times 10^{15}$ molec. cm$^{-2}$, without aerosols, a boundary layer height of 1 km and a surface albedo of 0.05. The non-smooth behaviour of the black line arises because the spectral resolution of the AMF is not sufficient to resolve the NO2 cross section used in the calculation. If a constant cross-section value is used in the RTM for calculating TOA reflectance, the increasing AMF with wavelength would be spectrally smooth.

Figure 7. Meridional average QA4ECV OMI NO2 column averaged over 39–41° N on 2 February 2005. No cloud radiance, albedo, or AMF filtering has been applied. Data assimilation and STREAM stratospheric columns are shown as the black and green lines; the total slant columns divided by the geometric AMF are shown in light blue. Both data assimilation and STREAM stratospheric column estimates are included in the QA4ECV NO2 product.

Figure 11. Campaign mean (30 May-30 June 2006) of the QA4ECV tropospheric NO2 column distribution over eastern China for clear-sky situations (cloud radiance fraction < 0.5). The black circle indicates the location of Tai'an, where Chiba University operated the MAX-DOAS instrument. One cell corresponds to 0.1° × 0.1°. On average there are 15 satellite pixels per cell used to calculate the campaign mean.

Figure 12. (a) Scatter plot of QA4ECV OMI vs. MAX-DOAS tropospheric NO2 columns for Tai'an (China) in May-June 2006. The solid line shows the result of a reduced major axis regression to the data. Only pixels measured with a cloud radiance fraction < 0.5 and an effective cloud pressure < 875 hPa, within 20 km and 30 min of a MAX-DOAS measurement, have been selected. (b) Same as (a), but now for DOMINO v2 vs. MAX-DOAS tropospheric NO2.

Figure 13. (a) Histogram of differences in QA4ECV OMI vs. MAX-DOAS tropospheric NO2 columns for Tai'an (China) in May-June 2006. The black line shows a Gaussian fit to the observed differences, and the red dashed line shows the Gaussian expected from the uncertainties reported in the QA4ECV and MAX-DOAS data products. (b) Same as (a), but now for differences between DOMINO v2 and MAX-DOAS tropospheric NO2.

Table 1. GCOS requirements for satellite retrievals of tropospheric NO2 columns (WMO, 2011).

Table 2. Satellite instruments and level-1 data contributing to the QA4ECV NO2 ECV data product.

Table 3. Overview of OMI SCD retrieval codes from the QA4ECV consortium's institutes. The common settings used for round 1 were a 405–465 nm fitting window, a polynomial degree of 4, inclusion of O3, NO2, O2-O2, H2O, and Ring cross sections, and use of the mean solar irradiance as reference spectrum. The cross sections have been convolved with the OMI slit function for each row separately.

Table 4. Recommended settings for the QA4ECV NO2 spectral fitting for the retrieval of NO2 slant columns from GOME, SCIAMACHY, OMI, and GOME-2(A) for generating a multi-decadal data record for the period 1995-2017.

Table 5. Overview of the main uncertainty estimates and quality flags included in the QA4ECV NO2 ECV precursor product. The third column indicates whether the estimate is unique for that pixel, derived from a global estimate, or is a blend of individual and global estimates.

Table 6. Comparison of uncertainty estimates for the main QA4ECV OMI NO2 retrieval steps. The SCD and stratospheric SCD uncertainties are representative of all possible retrieval scenarios. AMF uncertainties are representative of situations with high NO2.
Here $\sigma_O$ is the uncertainty reported (in the data files) for QA4ECV OMI NO2 columns, $\sigma_{MD}$ the uncertainty reported for MAX-DOAS NO2 columns, and $\sigma_R$ the uncertainty from spatio-temporal mismatches between the satellite and ground-based measurement (Table 7). The mean reported uncertainties ($\sigma_O$ and $\sigma_{MD}$) are regarded as random errors here (see discussion in Sect. 2.3 in Boersma et al., 2004).