Air mass factor (AMF) calculation is the largest source of uncertainty in
NO

Satellite observations in the UV and visible spectral range are widely used
to monitor trace gases such as nitrogen dioxide (NO

Although trace gas satellite retrievals have improved over the last decades
(e.g.

Theoretical uncertainty (also known as parametric uncertainty) is the
uncertainty arising within one particular retrieval method. Structural
uncertainty is the uncertainty that arises when different retrieval
methodologies are applied to the same data

Flow chart of AMF calculation and comparison process followed in the
study. In the third step forward model parameters (

There are few studies addressing structural uncertainty for trace gas
retrievals.

We start with a comparison of top-of-atmosphere (TOA) reflectances simulated
by radiative transfer models (RTMs), the main tool for any AMF calculation
(Sect. 3.1). The RTMs DAK, McArtim, SCIATRAN and VLIDORT solve the radiative
transfer equation differently, and have different degrees of sophistication
to account for the Earth's sphericity and multiple scattering. Next we compare
altitude-dependent (or box-) AMFs for NO

The concept of a traceability chain (here in the form of a flow diagram) for
the AMF calculation process and uncertainty assessment used in this study is
illustrated in Fig. 1. Structural uncertainty estimated in each step is based
on the standard deviation (

Different models use different methods to solve the radiative transfer
equation and to describe the sphericity of the Earth's atmosphere.
Differences in modelled TOA reflectances between RTMs provide an estimate for
the reflectance structural uncertainty (

Altitude-dependent AMFs (box-AMFs, equivalent to scattering weights)
characterise the vertical sensitivity of the measurement to a trace gas
(e.g.

The air mass factor (

If the trace gas is optically thin, the total air mass factor can be written
as the sum of the box-AMFs of each layer weighted by the partial vertical
column (e.g.
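The weighted sum described above can be sketched in a few lines. This is a minimal illustration, assuming the weights are the partial vertical columns normalised by the total column; the function name `total_amf` and the three-layer values are hypothetical and not taken from the study.

```python
import numpy as np


def total_amf(box_amfs, partial_columns):
    """Column-weighted mean of box-AMFs for an optically thin absorber.

    box_amfs        : box-AMF of each layer (dimensionless)
    partial_columns : a priori partial vertical column of each layer
                      (any unit, only the relative profile shape matters)
    """
    m = np.asarray(box_amfs, dtype=float)
    x = np.asarray(partial_columns, dtype=float)
    if m.shape != x.shape:
        raise ValueError("box_amfs and partial_columns must have the same shape")
    total_column = x.sum()
    if total_column <= 0:
        raise ValueError("total vertical column must be positive")
    return float((m * x).sum() / total_column)


# Hypothetical three-layer profile with most of the gas near the surface,
# where the sensitivity (box-AMF) is lowest.
box = [0.5, 1.0, 2.0]
cols = [6.0, 3.0, 1.0]
amf = total_amf(box, cols)
```

Because the result depends only on the relative profile shape, scaling all partial columns by a constant leaves the AMF unchanged.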

In Eq. (

The NO

Overview of radiative transfer models that participated in the top-of-atmosphere reflectance comparison and their main characteristics.

Satellite retrievals also need to consider the presence of clouds. In the AMF
calculation, residual clouds can be accounted for in several ways. The
independent pixel approximation (IPA) consists of calculating the AMF for a
partly cloudy scene as a linear combination of cloudy (
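As an illustration of the linear combination used in the IPA, the sketch below assumes the commonly used form in which the cloudy and clear-sky AMFs are weighted by the cloud radiance fraction. The function names, the effective cloud fraction `f_eff` and the radiance values are assumptions made for this example, not settings used by any of the groups.

```python
def cloud_radiance_fraction(f_eff, i_cloudy, i_clear):
    """Fraction of the measured radiance that comes from the cloudy part.

    f_eff    : effective cloud fraction (0..1)
    i_cloudy : TOA radiance of a fully cloudy scene
    i_clear  : TOA radiance of a fully clear scene
    """
    cloudy_part = f_eff * i_cloudy
    clear_part = (1.0 - f_eff) * i_clear
    return cloudy_part / (cloudy_part + clear_part)


def ipa_amf(m_cloudy, m_clear, w):
    """AMF of a partly cloudy scene as a radiance-weighted linear combination."""
    return w * m_cloudy + (1.0 - w) * m_clear


# Hypothetical scene: a small cloud fraction still contributes half of the
# radiance because the cloud is much brighter than the surface.
w = cloud_radiance_fraction(f_eff=0.2, i_cloudy=0.8, i_clear=0.2)
m = ipa_amf(m_cloudy=0.4, m_clear=1.2, w=w)
```

In this example a geometric cloud fraction of 0.2 gives a radiance weight of about 0.5, which is why residual clouds can change the AMF substantially.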

AMFs for cloudy scenes are calculated using Eq. (

The atmosphere can also be assumed to be cloud-free for cloud fractions below
a certain threshold (e.g. 0.1 or 0.2, see Table 3). In that case, a clear-sky
AMF is used and Eq. (
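The threshold rule can be written as a simple selection between two precomputed AMFs. This is a sketch only: the name `choose_amf` is hypothetical, and treating a cloud fraction exactly equal to the threshold as cloudy is an assumption, since the text does not specify the boundary case.

```python
def choose_amf(cloud_fraction, m_clear, m_ipa, threshold=0.1):
    """Return the clear-sky AMF below the threshold, the IPA AMF otherwise.

    threshold : cloud fraction below which the scene is treated as cloud-free
                (0.1 and 0.2 are the example values quoted in the text)
    """
    if cloud_fraction < threshold:
        return m_clear
    return m_ipa


# Hypothetical pixel with clear-sky AMF 1.2 and IPA AMF 0.8.
amf_low_cloud = choose_amf(0.05, m_clear=1.2, m_ipa=0.8, threshold=0.1)
amf_high_cloud = choose_amf(0.15, m_clear=1.2, m_ipa=0.8, threshold=0.1)
```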

Different retrieval groups use different sources for the ancillary data, as
well as different methods to account for the temperature dependence and the
presence of clouds and aerosols (e.g.

Next, each of the groups used their preferred settings to calculate
tropospheric AMFs. In this round-robin exercise, a comparison of
state-of-the-art retrieval algorithms, the differences between AMFs arise not
only from differences between the RTMs, vertical discretisation and
interpolation but also from differences in the selection of forward model
parameter values and the different corrections for clouds, aerosols and
surface reflectivity. Thus the differences in the AMFs using the preferred
settings can be interpreted as the overall structural uncertainty of the AMF
calculation

Four RTMs from different research groups participated in the comparison. Some
differences between models are highlighted in Table

DAK (Doubling-Adding KNMI) was developed at the Royal Netherlands
Meteorological Institute

Box-AMFs are calculated with DAK in this study by WUR/KNMI by differencing
the logarithm of reflectances at TOA with and without the trace gas in
atmospheric layer
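The differencing approach can be illustrated with a finite-difference sketch. It assumes that the box-AMF of a layer is the negative change in the logarithm of the TOA reflectance per unit vertical absorption optical thickness added to that layer; the function name and the synthetic reflectances are hypothetical and do not represent DAK output.

```python
import math


def box_amf_from_reflectances(r_with, r_without, delta_tau):
    """Finite-difference box-AMF for one layer.

    r_with    : TOA reflectance with the trace gas present in the layer
    r_without : TOA reflectance without the trace gas in the layer
    delta_tau : vertical absorption optical thickness of the gas in the layer
    """
    if r_with <= 0 or r_without <= 0:
        raise ValueError("reflectances must be positive")
    if delta_tau <= 0:
        raise ValueError("delta_tau must be positive")
    return -(math.log(r_with) - math.log(r_without)) / delta_tau


# Synthetic check: build a reflectance pair from a known box-AMF of 1.5
# and verify that the differencing recovers it.
true_box_amf = 1.5
delta_tau = 1.0e-3
r_without = 0.1
r_with = r_without * math.exp(-true_box_amf * delta_tau)
recovered = box_amf_from_reflectances(r_with, r_without, delta_tau)
```

A small `delta_tau` keeps the calculation in the optically thin regime, where the logarithm of the reflectance responds linearly to the added absorption.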

McArtim (Monte Carlo Atmospheric Radiative Transfer Inversion Model)

Box-AMFs calculated by MPI-C are obtained from Jacobians (derived by

SCIATRAN

SCIATRAN calculates the Jacobians or weighting functions, which are the
derivatives of the simulated radiance with respect to atmospheric and surface
parameters (air number density in this case). These quantities are related to
the box-AMFs calculated by IUP-UB as follows:

VLIDORT (Vector-LInearized Discrete Ordinate Radiative Transfer) was
developed by Rob Spurr at RT SOLUTIONS, Inc. The model is based on the
discrete ordinate approach to solve the radiative transfer equation (RTE) in a
multi-layered atmosphere, reducing the RTE to a set of coupled linear
first-order differential equations. Then, perturbation theory is applied to the
discrete ordinate solution

Box-AMFs are derived from the altitude-dependent weighting functions determined by VLIDORT:

As a first exercise, a base case calculation and comparison of TOA reflectances was made to assess the performance of the four RTMs and to obtain the structural uncertainty in TOA reflectance modelling. The base case comparison allowed us to establish the best possible level of agreement between RTMs by identifying differences in the RTMs' performance that in more complex settings would be difficult to recognise. Furthermore, total and ozone optical thickness were compared to evaluate how the models agreed in their treatment of scattering and absorption processes and whether differences in scattering and absorption can explain possible differences between the TOA reflectances.

Basic model parameters were established as input in all RTMs (details can be
found in Table S1 in the Supplement). The basic atmospheric profile was a
33-layer midlatitude summer atmosphere

In the RT modelling, we considered a clear-sky atmosphere, so clouds and
aerosols were not included. Rayleigh scattering and O

TOA reflectances simulated by four RTMs for

All models calculate the same spectral dependency of TOA reflectance, as
shown in Fig.

Distribution of relative model differences between TOA reflectances
simulated by four RTMs including polarisation (DAK-VLIDORT, DAK-SCIATRAN,
DAK-McArtim, VLIDORT-SCIATRAN, VLIDORT-McArtim, SCIATRAN-McArtim and reversed
combinations) for all geometry combinations (

Figure

The results show strong consistency of TOA reflectance calculations for the
most common moderate viewing geometry retrieval scenarios. Relative
differences are somewhat higher for larger VZA, SZA and shorter wavelengths.
For the more extreme geometries, the light path through the atmosphere is
generally longer and photons have a higher probability of undergoing
interactions (scattering, absorption) with the atmosphere. Furthermore,
differences in the treatment of the Earth's sphericity have a stronger
influence for the extreme geometries than for close-to-nadir viewing
geometries. These differences will still be present in the box-AMF comparison
in Sect. 3.2.
Rayleigh scattering also affects the effective photon path and it is stronger
at 340 nm than at 440 nm. Thus, small differences in the description of
Rayleigh scattering in the RTMs are more likely to lead to differences for
the extreme geometries and shorter wavelengths. The standard deviation of
differences between modelled TOA reflectances of 1.5 % (at 340 nm) and
1.1 % (at 440 nm) in this comparison can be considered to be the reflectance
structural uncertainty. The agreement in this study is better than in
previous RTM comparisons like
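The statistic used as reflectance structural uncertainty can be sketched as follows. The example assumes that the relative difference between two models A and B is defined as (A - B)/B in per cent and that all ordered model pairs are pooled before taking the standard deviation; the reflectance values are hypothetical placeholders, not results from the four RTMs.

```python
import itertools

import numpy as np


def pairwise_relative_differences(reflectances):
    """Relative differences (%) for all ordered pairs of models.

    reflectances : dict mapping model name -> sequence of TOA reflectances,
                   one value per geometry (same order for every model)
    """
    diffs = []
    # permutations() yields both (A, B) and (B, A), i.e. the reversed
    # combinations are included as well.
    for name_a, name_b in itertools.permutations(reflectances, 2):
        r_a = np.asarray(reflectances[name_a], dtype=float)
        r_b = np.asarray(reflectances[name_b], dtype=float)
        diffs.append(100.0 * (r_a - r_b) / r_b)
    return np.concatenate(diffs)


# Hypothetical reflectances for three geometries (illustrative values only).
toa = {
    "DAK": [0.100, 0.150, 0.200],
    "McArtim": [0.101, 0.149, 0.203],
    "SCIATRAN": [0.100, 0.151, 0.199],
    "VLIDORT": [0.099, 0.150, 0.201],
}
diffs = pairwise_relative_differences(toa)
sigma = float(np.std(diffs))  # spread of the pooled relative differences (%)
```

With four models there are 12 ordered pairs, so each geometry contributes 12 values to the pooled distribution.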

Box-AMF dependencies on forward model parameters for NO

To calculate box-AMFs, a common vertical grid was agreed between the groups
in order to reduce the sources that might cause differences between the RTMs.
The common profile resolution was

Figure 4a shows that the four participating groups generally agree well on the
vertical profile shape of NO

Vertical profile of mean relative differences between NO

Figure

Relative differences for 950 hPa box-AMFs are below 1.1 % for NO

This comparison indicates a good agreement between box-AMF LUTs computed
using different RTMs. The structural uncertainty in the AMF calculation due
to the choice of RTM and different interpolation schemes is 2 % for NO

In order to compute tropospheric AMFs via Eq. (

Upper panels: total NO

Four groups used the same settings (forward model parameters, a priori
profiles, temperature and cloud correction) to calculate clear-sky and total
tropospheric NO

All groups calculate similar AMF spatial patterns for the selected orbit.
Figure 6 (upper panels) shows total tropospheric NO

The correlation between AMFs calculated by the different retrieval groups is
excellent (

The largest differences are found at the edges of the OMI orbit, where viewing
zenith angles are large and light paths are long. This can be seen in the
lower right panel of Fig. 6, where the relative differences of tropospheric
NO

Statistical parameters for the comparison of total tropospheric
NO

Mean relative differences between IPA and clear-sky NO

These results demonstrate that, even when similar RTMs, box-AMFs and
identical forward model parameters are used to calculate the AMFs, there is
structural uncertainty that is introduced by the specific implementation of
different groups. First, the choice of an RTM introduces uncertainty in the
box-AMF calculation. Second, there are interpolation errors that are
intrinsic to the calculation method using Eq. (

Overview of AMF calculation methods and ancillary data used in the round-robin experiment by various research groups.

It is important to account for the effect of clouds on the photon path
lengths in the troposphere when calculating tropospheric AMFs. There are
various approaches that are commonly used to calculate AMFs in (partly)
cloudy conditions. The independent pixel approximation (IPA), introduced in
Eq. (4) (e.g.

To quantify the differences between the two approaches, here we compare
tropospheric NO

In unpolluted conditions, IPA and clear-sky AMFs are generally quite similar, with
average relative differences within 5 %. Still, there are important
differences between the two approaches. In unpolluted conditions with clouds
in the free and upper troposphere (cloud pressure

These results indicate that the differences between using IPA or clear-sky
AMFs are especially substantial for polluted conditions and small residual
cloud fractions. Selecting a particular cloud correction approach implies
that AMF values will be systematically different from values obtained
with the other method. In polluted conditions, the mean differences are
20–40 % for cloud fractions between 0.1 and 0.2, with cloud pressure largely
explaining the magnitude and sign of the differences. Note that the a priori
profiles used to calculate the AMFs in this section have been obtained from a
specific CTM. If a different CTM were used, the values for the differences
between IPA and clear-sky AMFs would be different, in line with the structural
uncertainty that is being discussed in this study (see Sect. 3.3.3). A
previous study by

For the round-robin comparison, each group calculated tropospheric NO

Table 3 summarises the AMF algorithms included in this comparison. There are
several differences with the harmonised settings used in the previous
section. IUP-UB and BIRA now apply IPA only when cloud fraction exceeds 0.1
and 0.2, respectively, motivated by the high uncertainty of cloud parameters
for scenes with small cloud fractions (see Sect. 3.3.2). Peking University
accounts for the surface reflectance anisotropy and does pixel-by-pixel
online radiative transfer calculations. They also include an explicit aerosol
correction, motivated by the fact that the implicit aerosol correction breaks
down under conditions of high aerosol optical thickness and strongly absorbing
particles (

Different groups use different LUTs for their AMF calculations, and POMINO uses pixel-by-pixel online radiative transfer calculations. The LUTs are different in several aspects, such as the RTMs used to create them and the number of reference points for each dimension. All these differences affect the AMF structural uncertainty. Based on the discussion in previous sections we consider that the use of different LUTs introduces a structural uncertainty of the order of 6 %.

Most of the surface albedo values used in the retrievals come from the

The agreement of AMFs from this round-robin exercise quantifies the overall AMF structural uncertainty. The comparison with seven groups allowed us to calculate a mean AMF as a reference value. This ensemble mean is not necessarily the true AMF, but it can be considered a state-of-the-art AMF value. For a representative ensemble mean AMF, we required all groups to have a valid (unflagged) AMF value at a pixel location. We selected two different days in winter and summer (2 February and 16 August 2005) to identify possible seasonality effects in the agreement of the AMFs.
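The construction of the reference value can be sketched as below. This is a minimal illustration, assuming flagged pixels are stored as NaN and that the spread between groups is expressed as the population standard deviation relative to the ensemble mean; the function names and AMF values are hypothetical.

```python
import numpy as np


def ensemble_mean_amf(amfs):
    """Ensemble mean AMF per pixel, using only pixels valid for all groups.

    amfs : array of shape (n_groups, n_pixels), NaN marks a flagged value
    Returns (mean, valid), where mean is NaN for pixels that are flagged
    by at least one group.
    """
    amfs = np.asarray(amfs, dtype=float)
    valid = np.all(np.isfinite(amfs), axis=0)
    mean = np.full(amfs.shape[1], np.nan)
    mean[valid] = amfs[:, valid].mean(axis=0)
    return mean, valid


def relative_spread(amfs, mean, valid):
    """Standard deviation between groups relative to the ensemble mean (%)."""
    amfs = np.asarray(amfs, dtype=float)
    return 100.0 * amfs[:, valid].std(axis=0) / mean[valid]


# Hypothetical AMFs from three groups at three pixels; the last pixel is
# flagged by the first group and is therefore excluded from the ensemble.
group_amfs = np.array([
    [1.0, 2.0, np.nan],
    [1.2, 2.0, 1.0],
    [0.8, 2.0, 1.1],
])
mean, valid = ensemble_mean_amf(group_amfs)
ratios = group_amfs[:, valid] / mean[valid]  # each group relative to the mean
spread = relative_spread(group_amfs, mean, valid)
```

Requiring a valid value from every group keeps the ensemble composition identical at every pixel, so that the mean is not biased towards groups with less strict flagging.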

Tropospheric NO

Ratio of tropospheric NO

First we compare the six groups that use the same cloud parameters. In contrast
to what we found in the harmonised settings comparison, the global maps of
tropospheric AMF calculated by each group using their preferred settings
(Fig.

We compared global AMF calculations from all individual groups against the
pixel mean AMF from six groups (Peking University only calculates AMFs over
China). Figure 9 shows the average ratio of the AMF by each group to the
ensemble mean AMF (bars) and the correlation (crosses) for polluted conditions (NO

Over unpolluted regions the agreement is better: AMFs from the different groups agree within 8.5–18 % in both February and August, which implies a smaller structural uncertainty (Table S6 provides a detailed summary of the comparison).

In order to assess which forward model parameters explain most of the AMF
structural uncertainty, we analysed AMF differences from groups that use
identical cloud parameters and implicit aerosol correction (BIRA, University
of Leicester, NASA and WUR). Between these four groups, the only different
forward model parameters are surface albedo, a priori NO

We focus on explaining the differences between BIRA and WUR here, since these
were of the order of 30 % (Fig. 9). We explored the correlations between
BIRA-WUR AMF differences and differences between assumed surface pressures,
albedos and NO

Tropospheric NO

Selecting a specific chemistry transport model thus influences the AMF
structural uncertainty via differences in the profile shape. These
differences in the profile shape depend on the different characteristics of
the models (e.g. spatial and temporal resolution and parameterisation of
different processes in the atmosphere). Previous studies analysed how using
different CTMs influences the NO

All these aspects influence the estimation of retrieval (and AMF) theoretical
uncertainties. In order to quantitatively estimate the effect of one model
characteristic alone (e.g. the spatial resolution) on the AMF structural
uncertainty it would be necessary to compare AMFs calculated with the same
approach but with just that specific characteristic being different in the
profile shapes generated by the CTM. Such a specific sensitivity analysis has
not been done in this study but should be considered in future AMF
comparisons. To test the robustness of our structural uncertainty estimate,
we did some experiments by simulating the effect of high-resolution a priori
profiles on AMF values.

The findings in this subsection indicate that quality assurance efforts for

Ratio of tropospheric NO

In the previous section, we found that differences between a priori NO

All groups calculate similar spatial patterns for the AMFs over China
(Fig. 10). In the polluted north-east (Beijing area) AMFs are lower due to
the reduced sensitivity to NO

Box-AMFs at 25 hPa as a function of cosine of SZA (left panel) and
as a function of cosine of VZA (right panel). In the left panel, VZA is
constant at 37

To estimate the effect of differences in cloud parameters on AMF structural
uncertainty, we analysed differences in AMF calculated by WUR and Peking
University. The Peking University AMF calculations (and the cloud parameters)
were based on a version of the POMINO retrieval using clouds retrieved with
an implicit aerosol treatment (i.e. similar to KNMI/WUR). We explored the
correlations between Peking University and WUR AMF differences and
differences in cloud pressure (

The POMINO retrieval by Peking University explicitly corrects for the
presence of aerosols in the atmosphere by including profiles of aerosol
optical properties simulated by the GEOS-Chem model (and constrained by MODIS
AOD on a monthly basis) in the radiative transfer model and in the cloud
retrieval (

In conditions with substantial aerosol pollution (AOD

We pointed out in Sect. 3.2 that differences in the description of the
atmosphere's sphericity could lead to differences in stratospheric AMFs,
especially for extreme geometries. Here we investigate the differences
between stratospheric NO

A direct validation of stratospheric NO

Averaged OMI total NO

We tested whether possible errors in the diurnal cycle of stratospheric
NO

We have analysed the AMF calculation process for NO

The choice of RTM for TOA reflectance and box-AMF calculation introduces an
average uncertainty of 2–3 %. The detailed comparison showed that
state-of-the-art RTMs are in good agreement. Particularly for DAK, this is
the first time that box-AMF calculations are extensively tested against those
calculated with other RTMs. The McArtim model simulates systematically lower
box-AMFs in the stratosphere, which we attribute to the model's geometrically
more realistic description of photon scattering in a spherical atmosphere.
The four European retrieval groups agree within 6 % in their calculation of
NO

When retrieval groups use their preference for ancillary data along with
their preferred cloud and aerosol correction, we find that the structural
uncertainty of the AMF calculation is 42 % over polluted regions and 31 %
over unpolluted regions. Table

Average relative structural uncertainty for every step of the AMF
calculation following the comparison process shown in Fig. 1. This includes
the modelling of TOA reflectance (

Sensitivity studies for one particular algorithm indicate that the choice of
cloud correction (IPA or clear-sky AMF for small cloud fractions) is a strong
source of structural uncertainty, especially for polluted conditions with
residual cloud fractions of 0.05–0.2 (on average a structural uncertainty
of 20 %). The choice of aerosol correction (explicitly or implicitly via
the cloud correction) introduces an average uncertainty of 50 %, especially
when aerosol loading is substantial. Selecting trace gas a priori profiles
from different chemistry transport models, surface albedo from different
data sets and cloud parameters from different cloud retrievals contributes
substantially to structural uncertainty in the AMFs. These findings point to
the need for detailed validation experiments designed to specifically test
cloud and aerosol correction methods under relevant conditions (strong
pollution, residual cloud fractions of 0.1–0.2). Not only should the retrieved
NO

The magnitude of the structural uncertainty in AMF calculations is
significant, and is caused mainly by methodological differences and
particular preferences for ancillary data between different retrieval groups.
This study provides evidence of the need for improvement of the different
ancillary data sets, including uncertainties of the forward model parameters
used in the retrievals for a better agreement in the AMF calculation. This
will significantly decrease AMF structural uncertainty towards the levels
desired in user requirement studies (

This research has been supported by the FP7 Project Quality Assurance for Essential Climate Variables (QA4ECV), grant No. 607405. Andreas Hilboll and Andreas Richter acknowledge funding by DLR in the scope of the Sentinel-5 Precursor verification project (grant 50EE1247). UoL acknowledges the use of the ALICE and SPECTRE High Performance Computing Facility at the University of Leicester. We would like to thank the two anonymous referees for the useful interactive discussion in the review process.

Edited by: H. Worden
Reviewed by: two anonymous referees