Ozone Monitoring Instrument (OMI) Aura nitrogen dioxide standard product version 4.0 with improved surface and cloud treatments

We present a new and improved version (V4.0) of the NASA standard nitrogen dioxide (NO2) product from the Ozone Monitoring Instrument (OMI) on the Aura satellite. This version incorporates the most salient improvements for OMI NO2 products suggested by expert users and enhances the NO2 data quality in several ways through improvements to the air mass factors (AMFs) used in the retrieval algorithm. The algorithm is based on the geometry-dependent surface Lambertian equivalent reflectivity (GLER) operational product that is available on an OMI pixel basis. GLER is calculated using the vector linearized discrete ordinate radiative transfer (VLIDORT) model, which uses as input high-resolution bidirectional reflectance distribution function (BRDF) information from NASA’s Aqua Moderate Resolution Imaging Spectroradiometer (MODIS) instruments over land and the wind-dependent Cox–Munk wave-facet slope distribution over water, the latter with a contribution from the water-leaving radiance. The GLER combined with consistently retrieved oxygen dimer (O2–O2) absorption-based effective cloud fraction (ECF) and optical centroid pressure (OCP) provide improved information to the new NO2 AMF calculations. The new AMFs increase the retrieved tropospheric NO2 by up to 50 % in highly polluted areas; these differences arise from both cloud and surface BRDF effects as well as biases between the new MODISbased and previously used OMI-based climatological surface reflectance data sets. We quantitatively evaluate the new NO2 product using independent observations from groundbased and airborne instruments. The new V4.0 data and relevant explanatory documentation are publicly available from the NASA Goddard Earth Sciences Data and Information Services Center (https://disc.gsfc.nasa.gov/datasets/ OMNO2_V003/summary/, last access: 8 November 2020), and we encourage their use over previous versions of OMI NO2 products.

Abstract. We present a new and improved version (V4.0) of the NASA standard nitrogen dioxide (NO 2 ) product from the Ozone Monitoring Instrument (OMI) on the Aura satellite. This version incorporates the most salient improvements for OMI NO 2 products suggested by expert users and enhances the NO 2 data quality in several ways through improvements to the air mass factors (AMFs) used in the retrieval algorithm. The algorithm is based on the geometry-dependent surface Lambertian equivalent reflectivity (GLER) operational product that is available on an OMI pixel basis. GLER is calculated using the vector linearized discrete ordinate radiative transfer (VLIDORT) model, which uses as input high-resolution bidirectional reflectance distribution function (BRDF) information from NASA's Aqua Moderate Resolution Imaging Spectroradiometer (MODIS) instruments over land and the wind-dependent Cox-Munk wave-facet slope distribution over water, the latter with a contribution from the water-leaving radiance. The GLER combined with consistently retrieved oxygen dimer (O 2 -O 2 ) absorption-based effective cloud fraction (ECF) and optical centroid pressure (OCP) provide improved information to the new NO 2 AMF calculations. The new AMFs increase the retrieved tropospheric NO 2 by up to 50 % in highly polluted areas; these differences arise from both cloud and surface BRDF effects as well as biases between the new MODISbased and previously used OMI-based climatological sur-face reflectance data sets. We quantitatively evaluate the new NO 2 product using independent observations from groundbased and airborne instruments. The new V4.0 data and relevant explanatory documentation are publicly available from the NASA Goddard Earth Sciences Data and Information Services Center (https://disc.gsfc.nasa.gov/datasets/ OMNO2_V003/summary/, last access: 8 November 2020), and we encourage their use over previous versions of OMI NO 2 products.

Introduction
The Dutch-Finnish-built Ozone Monitoring Instrument (OMI) has been operating onboard the NASA EOS Aura spacecraft since July 2004 (Levelt et al., , 2018. The primary objectives of OMI's mission are to continue the long-term record of total column ozone and to monitor other trace gases relevant to tropospheric pollution worldwide. Observations of sunlight backscattered from the Earth over a wide range of UV and visible wavelengths (∼ 260-500 nm) made by OMI allow for the retrieval of various atmospheric trace gases, including nitrogen dioxide (NO 2 ). NO 2 is a critically important short-lived air pollutant originating from both anthropogenic and natural sources. It is the principal precur-sor to tropospheric ozone and a key agent for the formation of several toxic airborne substances such as nitric acid (HNO 3 ), nitrate aerosols, and peroxyacetyl nitrate. Satellite-based observations yield a global, self-consistent NO 2 data record that can complement field measurements.
Over the last decade, there have been considerable efforts to improve NO 2 data quality from OMI and other satellite instruments (e.g., Boersma et al., 2018). Special emphasis has been placed on improving auxiliary information (e.g., a priori NO 2 vertical profiles, surface reflectivity), particularly with respect to spatial and temporal resolution. For instance, the global OMI NO 2 products are based on a priori NO 2 profiles from relatively coarse-resolution (> 1.0 • × 1.25 • ) global CTM simulations Choi et al., 2020). Many regional studies suggest a general low bias in the global tropospheric NO 2 column products, particularly over polluted areas, that can be partially mitigated by using a priori information from highresolution CTM simulations (Russell et al., 2011;McLinden et al., 2014;Lin et al., 2014Lin et al., , 2015Goldberg et al., 2017;Choi et al., 2020). Current global NO 2 retrievals are based on a low-resolution (0.5 • × 0.5 • ) static climatology of the surface Lambert equivalent reflectivity (OMLER) product , which is likely biased high due to insufficient cloud and aerosol screening. This bias in surface reflectivity can lead to an underestimation of tropospheric NO 2 retrievals (Zhou et al., 2010;Lin et al., 2014;Vasilkov et al., 2017). In addition, the OMLER data do not account for the significant day-to-day (orbital) variability in surface reflectance caused by changes in sun-satellite geometry, a phenomenon often expressed by the bidirectional reflectance distribution function (BRDF). Zhou et al. (2010) demonstrated the impact of both the spatial resolution and the BRDF effect on OMI tropospheric NO 2 retrievals over Europe by using high-resolution surface BRDF and albedo products from the Moderate Resolution Imaging Spectroradiometer (MODIS). Taking advantage of the MODIS highresolution data, albeit neglecting the BRDF and atmospheric effects, Russell et al. (2011) andMcLinden et al. (2014) created improved NO 2 products from the NASA standard product (Bucsela et al., 2013;Lamsal et al., 2014) over the continental US and Canada, respectively. While these and subsequent studies (e.g., Kuhlmann et al., 2015; addressed the limitation of climatological LER data for NO 2 retrievals, they did not account for the surface BRDF effect on the OMI cloud products (cloud pressure and fraction), which are also inputs to the NO 2 algorithm. Applying the MODIS BRDF data consistently to both the NO 2 and cloud retrievals demonstrably improves the quality of OMI NO 2 retrievals over China (Lin et al., 2014(Lin et al., , 2015Liu et al., 2019). However, this approach is computationally expensive and is applicable to land surfaces only. Our previous work (Vasilkov et al., 2018) proposed an approach appropriate for satellite NO 2 data processing on a global scale (a) by using MODIS BRDF information consistently in the cloud and NO 2 retrievals (b) for both land and water and (c) in an efficient way. Here, we apply the approach globally for the first time in the standard NASA OMI NO 2 algorithm.
In this paper we describe various updates made in the version 4.0 (V4.0) NASA OMI NO 2 algorithm, discuss their impact on the retrievals of tropospheric and stratospheric NO 2 column amounts, and provide an initial quantitative assessment of NO 2 data quality. Section 2 describes the OMI NO 2 algorithm and various auxiliary data used by the algorithm. We present validation results in Sect. 3. Section 4 summarizes the conclusions of this study.

OMI and the NO standard product
OMI is an ultraviolet-visible (UV-Vis) spectrometer on the polar-orbiting NASA Aura satellite (Levelt et al., , 2018. Aura, launched on 15 July 2004, follows a sunsynchronous orbit with an Equator crossing time near 13:45 local time. OMI employs two-dimensional charge-coupled device (CCD) detectors and operates in a push-broom mode, registering spectral data over a 2600 km cross-track spatial swath. The broad swath enables global daily coverage within 14-15 orbits. In the OMI visible channel used for NO 2 retrievals, each swath, measured every 2 s, comprises 60 cross-track fields of view (FOVs) varying in size from ∼ 13 km × 24 km near nadir to ∼ 24 km × 160 km for the FOVs at the outermost edges of the swath. Each orbit consists of ∼ 1650 swaths from terminator to terminator. OMI's full daily coverage has been affected by data loss due to an anomaly presumably caused by material on the spacecraft outside the instrument that results in reduced coverage to about half of its original swath, as discussed in Sect. 2.4.
The OMI NO 2 standard product (OMNO2) algorithm provides retrievals of NO 2 column (total, tropospheric, and stratospheric) amounts by exploiting Level-1B calibrated radiance and irradiance data from the visible channel (350-500 nm with 0.63 nm spectral resolution). The algorithm employs a multi-step procedure that consists of (1) a spectral fitting algorithm to calculate NO 2 slant column densities (SCDs) as discussed in Sect. 2.1, (2) determination of air mass factors (AMFs) to convert SCDs to vertical column densities (VCDs) as discussed in detail in Sect. 2.2, (3) a scheme to remove cross-track-dependent artifacts or stripes, and (4) a stratosphere-troposphere separation scheme to derive tropospheric and stratospheric NO 2 VCDs. The AMF depends upon a number of parameters including optical geometry (solar and viewing azimuth and zenith angles), surface reflectivity, cloud pressure and fraction, and the shape of the NO 2 a priori vertical profile.
Since the first release of OMNO2 in 2006 (Bucsela et al., 2006;Celarier et al., 2008), there have been significant conceptual and technical improvements in the retrieval of NO 2 from space-based measurements. Prior versions developed a new scheme for separating stratospheric and tropospheric components in version 2.1 (V2.1) (Bucsela et al., 2013;Lamsal et al., 2014) and a new algorithm for improved NO 2 SCD retrievals in V3.0 Krotkov et al., 2017), including improved cloud products  in V3.1 (Choi et al., 2020). The current version, V4.0, further improves on the retrievals in a number of significant ways for NO 2 AMF and VCD calculations. Figure 1 shows a schematic diagram of the retrieval algorithm, and Table 1 summarizes the differences and similarities between previous (V3.1) and current (V4) versions. Some of the approaches in the V4 algorithm are similar to those used in V3.1, but there are several important changes as discussed in detail in Sect. 2.1 and 2.2. Figure 1. Schematic diagram of the NASA OMI NO 2 algorithm version 4.0, which is coupled with the cloud-and geometrydependent surface Lambertian equivalent reflectivity (GLER) algorithms that ultimately produce stratospheric (strat) and tropospheric (trop) NO 2 vertical column densities (VCDs The spectral fitting algorithm for the operational standard OMI NO 2 product is described in detail in . Briefly, the algorithm retrieves NO 2 slant column densities (SCDs) by using a differential optical absorption spectroscopy (DOAS) approach (e.g., Platt and Stutz, 2006). In the DOAS approach, laboratory-measured spectra of NO 2 (Vandaele et al., 1998) and glyoxal (Volkamer et al., 2005), HITRAN08-based water vapor spectra (Rothman et al., 2009), and rotational Raman (RR; Ring effect) filling-in are sequentially fitted to the OMI-measured reflectance spectrum in the 402-465 nm wavelength range. The slant column   Modified DOAS fit (Vasilkov et al., 2018) AMF Terrain reflectivity Monthly climatology  Daily GLER data (Vasilkov et al., 2017;Qin et al., 2019;Fasnacht et al., 2019) Terrain pressure At pixel center (calculated from terrain height and GMI terrain pressure) Average over pixel (calculated from terrain height and GMI terrain pressure) Cloud pressure and fraction Operational O 2 -O 2 cloud product (OMCLDO2) v2.0  New O 2 -O 2 cloud product (OMCDO2N) derived using the GLER product (Vasilkov et al., 2018) Cloud radiance fraction Calculated at 440 nm from OMCLDO2 v2.0 cloud fraction using VLIDORTbased lookup Stratosphere-troposphere separation Spatial filtering and interpolation (Bucsela et al., 2013), but with minor changes in box sizes Same as in V3.1 represents the integrated abundance of NO 2 along the average photon path from the sun through the atmosphere to the satellite. The Ring spectra are calculated as a linear combination of the atmospheric (Joiner et al., 1995) and liquid water (Vasilkov et al., 2002) RR spectra, convolved with the wavelength-and cross-track-dependent OMI transfer function . The algorithm employs a multistep, iterative retrieval procedure for removal of the Ring and spectral under-sampling (Chance, et al., 2005) patterns as well as a low-order polynomial smoothing prior to estimation of SCDs for all interfering species. This is in contrast to the conventional DOAS approach that treats the Ring effect as a pseudo-absorber and fits all absorbers simultaneously with the polynomial functions. For accurate wavelength shifts (radiances vs. irradiances), the standard product algorithm splits the entire fitting window into seven carefully selected, partially overlapping micro-windows, iteratively evaluates the RR spectrum amplitudes, performs wavelength adjustments for each segment, and then iteratively retrieves the NO 2 , H 2 O, and glyoxal in the windows best suited for a particular trace gas species. The OMI NO 2 SCDs from the standard product were compared with improved SCD retrievals from the Quality Assurance for Essential Climate Variables (QA4ECV; http: //www.qa4ecv.eu/, last access: 18 May 2020), BIRA-IASB's (Royal Belgian Institute for Space Aeronomy) QDOAS software (http://uv-vis.aeronomie.be/software/QDOAS/, last access: 18 May 2020), and the latest KNMI retrievals (van Geffen et al., 2015) and are shown to agree within 2 % . The typical NO 2 SCD uncertainties amount to ∼ 0.8 × 10 15 molec. cm −2 , or 5 %-7 % in high-SCD areas and 15 %-20 % in low-SCD areas .

O 2 -O 2 spectral fitting algorithm
The oxygen dimer (O 2 -O 2 ) slant column fitting algorithm shares many features of the NO 2 fitting algorithm and is described in detail in Vasilkov et al. (2018). It consists of a multi-step, iterative retrieval approach with three carefully selected micro-windows sampling the flanks and the core of the broad O 2 -O 2 feature centered at 477 nm. The algorithm exploits OMI-measured reflectance spectra in the 451-496 nm range to determine the wavelength shifts and RR amplitudes. The Ring patterns are removed from the original OMI reflectances during the iterative adjustments for differences in the wavelength registration of radiances and irradiances. The O 2 -O 2 slant columns are retrieved after removal of the NO 2 and H 2 O absorptions estimated by the algorithm discussed in the previous section and of the ozone absorp-tion using total ozone data from Veefkind et al. (2006). After removal of the interfering signals, the 477 nm O 2 -O 2 absorption profile is carefully normalized to the adjacent O 2 -O 2 absorption-free reflectance levels accounting for very different wavelength dependencies of surface reflectances over various geographical sites (e.g., the open-ocean and desert area), as described in Vasilkov et al. (2018). The normalized O 2 -O 2 absorption profiles are then iteratively fitted with the temperature-dependent cross sections from Thalman and Volkamer (2013) over the 463-488 nm range to derive O 2 -O 2 SCDs. These are used to estimate the cloud properties as discussed below in Sect. 2.2.2.

Improved air mass factor calculations
The AMF, which is defined as the ratio of SCD to VCD, is needed to calculate the retrieved NO 2 VCD. Details of the AMF and its calculation are given in Palmer et al. (2001). The AMF for each FOV is calculated by combining altitudedependent (z-dependent) scattering weights (w) computed with a radiative transfer model and a local a priori vertical NO 2 profile shape (S), taken from a chemistry transport model: (1) For the tropospheric AMF, the integral extends from the surface to the tropopause, whereas the integral from the tropopause to the top of the atmosphere provides the stratospheric AMF. The scattering weight at a given altitude describes the sensitivity of the backscattered radiation to the abundance of the absorber at that altitude. For an optically thin absorber like NO 2 , scattering weights are a function of atmospheric scattering and are considered to be independent of the species' vertical distribution (Palmer et al., 2001). Factors affecting scattering weights include wavelength, optical geometry (solar and viewing azimuth and zenith angles), surface reflectivity, and cloud pressure and fraction. The wavelength dependence of scattering weights is accounted for by creating an average of scattering weights derived from the values at multiple wavelengths within the NO 2 spectral fitting window. To compensate for the effect of the assumed constant NO 2 temperature (220 K) in the NO 2 SCD retrievals, the scattering weights are corrected for the atmospheric temperature effect using local climatological monthly temperature profiles as discussed in Bucsela et al. (2013). These profiles are based on the meteorological field from the Modern-Era Retrospective Analysis for Research and Applications (MERRA-2) (Gelaro et al., 2017). The a priori NO 2 profile shapes are computed from a monthly mean climatology of vertical NO 2 profiles constructed from the Global Modeling Initiative (GMI) CTM simulation (Douglass et al., 2004;Strahan et al., 2007;Strode et al., 2015) driven by MERRA-2 meteorology. The spatial resolution of the model is 1.25 • in longitude and 1.0 • in latitude, and the atmosphere is divided into 72 pressure levels extending from the surface to 0.01 hPa. The model output is sampled between 13:00 and 14:00 local time, consistent with the OMI overpass time. The use of monthly NO 2 profiles helps capture the seasonal variation in the NO 2 vertical distribution (Lamsal et al., 2010). The simulation is based on yearly varying NO x emissions, as discussed in Strode et al. (2015); this is necessary to account for the effect of rapidly changing NO x emissions (e.g., Tong et al., 2015;Duncan et al., 2016;Miyazaki et al., 2017) on local NO 2 profile shapes Krotkov et al., 2017).
For each FOV, AMFs are computed for clear (AMF clr ) and cloudy (AMF cld ) conditions. The AMF of a partially cloudy scene is calculated by assuming the independent pixel approximation: where f r is the cloud radiance fraction (CRF), defined as the fraction of the measured radiation that comes from clouds and scattering aerosols, and it is computed at 440 nm from the retrieved effective cloud fraction (ECF), f c , using Eq. (8) (see below). AMF clr is calculated for the ground reflectivity of R s and at terrain pressure P s , whereas AMF cld is calculated assuming a Lambertian surface of reflectivity 0.8 at the retrieved cloud pressure. Below we provide a detailed discussion of each of these input parameters that are incorporated in the OMNO2 V4.0 algorithm.

New surface reflectivity product for NO 2 and cloud retrievals
Surface reflectivity is an important input parameter for UV-Vis satellite retrievals of trace gases and cloud information. The surface reflectance over both ocean and land depends upon viewing and illumination geometry and can be accurately described by the bidirectional reflectance distribution function (BRDF). This effect is, however, neglected by most currently available trace gas and cloud algorithms, which use a climatological Lambert equivalent reflectivity (LER) for the surface. To account for surface BRDF effects in the NO 2 and cloud retrievals, here we use the geometry-dependent surface LER (GLER) product derived using the Moderate Resolution Imaging Spectroradiometer (MODIS) BRDF data and the vector linearized discrete ordinate radiative transfer (VLIDORT) calculation (Vasilkov et al., 2017;Qin et al., 2019;Fasnacht et al., 2019). The GLER allows for a computationally efficient approach that does not require major changes to the existing trace gas and cloud algorithms. We derive GLER by inverting the top-of-atmosphere (TOA) radiance (I ) of a Rayleigh atmosphere over a non-Lambertian surface for each specific FOV and sun-satellite geometry within the Lambertian framework, i.e., where I 0 is the TOA radiance calculated for a black surface, T is the total (direct + diffuse) solar irradiance reaching the surface converted to the ideal Lambertian-reflected radiance (by dividing by π steradians) and then multiplied by the transmittance of the reflected radiation between the surface and TOA in the direction of a satellite instrument, and S b is the diffuse flux reflectivity of the atmosphere for the case of its isotropic illumination from below (Dave, 1978). The values of I 0 , T , and S b are pre-computed with VLIDORT and stored in a lookup table. The GLER values are calculated at wavelengths relevant for both NO 2 (440 nm) and cloud (466 nm) retrievals. Over land, the BRDF is calculated using the Ross-thick Lisparse kernel model (Lucht et al., 2000) in VLIDORT (Spurr, 2006): where the coefficients a iso , a vol , and a geo come from the Moderate Resolution Imaging Spectroradiometer (MODIS) Collection 5 gap-filled seasonally snow-free BRDF product MCD43GF (Schaaf et al., 2002(Schaaf et al., , 2011 for band 3 (459-479 nm) available at 30 arcsec spatial resolution and 8 d temporal resolution. The term a iso is the isotropic contribution describing the Lambertian part of light reflection from the surface; the volumetric kernel (k vol ) describes light reflection from a dense leaf canopy, and the geometric kernel (k geo ) describes light reflection from a sparse ensemble of surface objects casting shadows on the background assumed to be Lambertian. The kernels are the only angle-dependent functions, the expressions of which are given in Lucht et al. (2000). The band 3 BRDF coefficients spatially averaged over an actual satellite FOV are used to calculate TOA radiance and GLER at 466 nm. To calculate GLER at 440 nm, we apply a scaling method using the ratio of OMI-derived Lambert equivalent reflectivity (LER) data at 440 and 466 nm: The value of f s = LER 440 LER 466 is taken from the gridded monthly LER ratio data at 1 • × 1 • or coarser resolution. The LER is determined from OMI TOA radiance measurements as discussed in Vasilkov et al. (2017Vasilkov et al. ( , 2018. We use clear-sky (effective cloud fraction < 0.02) and aerosol-free (OMI UV aerosol index, < 0.5; Torres et al., 2007) OMI LER data to create the monthly gridded data. The cloud and aerosol screening is necessary because the spectral dependence of surface features differs from that of clouds and aerosols.
Over water, the surface reflectance is calculated at the two wavelengths, 440 and 466 nm, using VLIDORT. To calculate TOA radiance, we include light specularly reflected from a rough water surface and diffuse light backscattered by water bulk. We also account for contributions from oceanic foam that can be significant for high wind speeds. Reflection from the water surface is described by the Cox-Munk slope distribution function, which depends on both the wind speed and the wind direction (Cox and Munk, 1954). Polarization at the ocean surface is accounted for by using a full Fresnel reflection matrix as suggested by Mishchenko and Travis (1997).
We use wind speed data from a pair of satellite microwave imagers that include the Advanced Microwave Scanning Radiometer-Earth Observing System (AMSR-E) instrument onboard the NASA Aqua satellite (Meissner and Wentz, 2004)  Diffuse light from the ocean is described by a Case 1 water model with a single input parameter of chlorophyll concentration (Morel, 1988) taken from the monthly Aqua MODIS data. The common Case 1 water model developed for the visible channel (Morel, 1988) was extended to the UV using data from Vasilkov et al. (2002Vasilkov et al. ( , 2005. To calculate waterleaving radiance, we require the downwelling irradiance at the surface (i.e., atmospheric transmittance). Since the transmittance and the water-leaving contribution are coupled, we develop a simple coupling scheme in VLIDORT to ensure that the value of water-leaving radiance used as an input at the ocean surface will correspond to the correct value of the downwelling flux reaching the surface interface .
For OMI ground pixels covering land and water surfaces, the TOA radiance (I ) is calculated as an average of radiance for land (I L ) and water (I w ) weighted by the pixel land fraction (f ): The value of f is determined by converting various surface categories in the MODIS data (note that these are of much higher spatial resolution than the OMI data) into a binary land-water mask (e.g., treating all shorelines and ephemeral water as the land category and classifying all other water subcategories simply as water). The areal fraction of land (or water) for each OMI pixel is then computed as the statistics of the binary categories. Figure 2 shows an example of changes in surface reflectivity used in the previous (V3.1) and the current (V4.0) version of the OMI NO 2 algorithm. The GLER data computed for OMI observations as discussed above for 20 March 2005 differ considerably from the OMI-derived climatological monthly LER data  for March. As shown in Figs. 2 and 3a, the GLERs are generally lower than climatological LER data except at swath edges with large viewing angles and over areas affected by sun glint that correspond to higher values of GLER. Changes over the sunglint areas are rather large, reaching up to 0.3. The climatological LER data derived by analyzing histograms of 5 years of OMI-based LER data likely overestimate the actual surface reflectivity due to residual cloud and aerosol contamination and underestimate over sun-glint areas as the procedure ignores sun-glint-affected observations. In contrast, the GLER data over land are based on atmospherically corrected radiances from high-resolution MODIS observations, minimizing the impact of both cloud and aerosols.

Improved cloud product retrieval
We develop a new algorithm that provides cloud parameters, namely cloud radiance fraction (CRF) and cloud optical centroid pressure (OCP), and use them in the OMNO2 algorithm. Similar to the standard OMCLDO2 algorithm , our cloud algorithm exploits the O 2 -O 2 absorption to retrieve O 2 -O 2 SCD as discussed in Sect. 2.1.2, but it derives the two cloud parameters using the GLER and other ancillary data that are used in the NO 2 algorithm, maintaining inter-algorithm consistency. The OM-CLDO2 algorithm retrieves these parameters using the climatological LER data from Kleipool et al. (2008). In the following, our new cloud product is referred to as OMCDO2N.
The derivation of CRF and OCP is based on a simple cloud model called the mixed Lambertian equivalent reflectivity (MLER) model (Joiner and Vasilkov, 2006;Veefkind et al., 2016). The MLER model treats cloud and ground as horizontally homogeneous, opaque Lambertian surfaces and mixes them using the independent pixel approximation (IPA). According to the IPA, the measured TOA radiance, I m , is a sum of the clear-sky (I g ) and overcast (I c ) subpixel TOA radiances that are weighted with an effective cloud fraction (ECF), f c (e.g., Stammes et al., 2008): We choose the wavelength of 466 nm that is not substantially affected by rotational Raman scattering (RRS) or atmospheric absorption to derive f c . The parameters I g and I c are a function of the ground and cloud LERs, respectively, and are calculated using VLIDORT (Spurr, 2006) and obtained with an interpolated lookup table. We use GLER discussed above for ground reflectivity and a uniform cloud reflectivity of 0.8 (Koelemeijer et al., 2001;Stammes et al., 2008).
The value of f c is calculated by inverting Eq. (7). Note that aerosols are implicitly accounted for in the determination of f c , as they are treated (like clouds) as particulate scatters. CRF (f r ) defines the fraction of TOA radiance reflected by cloud: We use pre-computed lookup tables of the TOA radiances generated using VLIDORT. Due to its wavelength dependence, we calculate CRF at 466 nm for OCP at 440 nm for NO 2 retrievals. The MLER model compensates for photon transport within a cloud by placing the Lambertian surface somewhere in the middle of the cloud instead of at the top (Vasilkov et al., 2008). The pressure of this surface corresponds to OCP, which can be modeled as a reflectance-averaged pressure level reached by backscattered photons (Joiner et al., 2012). We retrieve cloud OCP from the O 2 -O 2 SCD discussed above (Sect. 2.1.2). The cloud OCP, P c , is estimated by inversion using the MLER method to compute the appropriate O 2 -O 2 AMFs: (9) where VCD (SCD / AMF) is the vertical column density of O 2 -O 2 over ground (VCD g ) and cloud (VCD c ). The clearsky (AMF g ) and overcast or cloudy (AMF c ) subpixel AMFs are calculated at 477 nm with ground (GLER) and cloud (0.8) reflectivity, respectively. Lookup tables for the AMFs were generated using VLIDORT. Temperature profiles needed for estimation of VCD and AMF are taken from the GEOS-5 global data assimilation system (Rienecker et al., 2011).
In addition to OCP, we retrieve the so-called scene pressure. The scene pressure is derived from Eq. (9) assuming that f r = 1 and cloud reflectivity equals the scene LER. The scene LER is determined from the measured TOA radiance using the equation (Eq. 3) that defines TOA radiance in the Rayleigh atmosphere over a Lambertian surface. In the absence of clouds, aerosols, and any major gas absorptions, the scene pressure should be equal to the surface pressure. The scene pressure is therefore an important diagnostic tool for evaluation of the performance of cloud pressure algorithms.  Figure 4 shows an example of cloud products retrieved with our algorithm compared with those retrieved from the standard OMCLDO2 algorithm . The retrieved OCP and CRF from the two algorithms exhibit broadly consistent spatial patterns in both cloud altitude and amount. The values of OCP generally range from 370 to 1001 hPa in OMCDO2N versus 150 to 1011 hPa in OM-CLDO2N. For both products, CRF varies from 0 for clearsky to 1 for overcast conditions. A systematic difference is evident, with generally higher values in OMCDO2N for OCP by 147 hPa and CRF by 0.01 compared to OMCLDO2. For OCP, there is a general pattern in difference, with OM-CDO2N OCP higher for low-altitude clouds (> 700 hPa) and lower values for high-altitude clouds (< 300 hPa) (Fig. 3c). The largest OCP differences occur for cases in which cloud pressures in OMCLDO2 are clipped to 150 hPa. For CRF, larger differences occur for partially cloudy scenes, with higher CRF values in OMCDO2N by 0-0.1 for both land and water surfaces (Fig. 3b). Exceptions are over sun-glint areas where CRF in OMCDO2N is lower by 0-0.3 with a mean difference of 0.13.

Treatment over snow and ice surfaces
Over ice and snow surfaces, identified by near-real-time ice and snow extent (NISE) flags (Nolin et al., 2005) in the OMI Level-1b data, the following treatments are made for surface reflectivity. In the case of permanent ice and snow surfaces, the MCD43GF product provides BRDF parameters, allowing us to calculate GLER. Over seasonal snow area usually with data gaps in MCD43GF, we calculate OMI-derived LER capped by a constant snow albedo of 0.6 following Boersma et al. (2011). In rare cases of pixels not flagged by NISE and gaps in MODIS data, we use OMI LER climatology  regardless of whether the surface is either snow-and ice-covered but missed by NISE or snowand ice-free. The OMI-derived scene reflectivity and scene pressure are used for NO 2 and cloud retrievals over seasonally snowcovered areas. If the NISE flags are set as true, the following assumptions are made in our CRF, OCP, and NO 2 retrievals. Over bright surfaces (scene reflectivity > 0.2), we consider the scenes to be snow-or cloud-covered and assign the scene pressure to OCP. In addition, if the difference between the surface pressure and scene pressure is smaller than 100 hPa, the scene is considered to be either cloud-free or covered by optically thin clouds following the cloud over snow classification by Vasilkov et al. (2010), and CRF for the pixel is set to zero. If the difference between the surface pressure and scene pressure exceeds 100 hPa, the scene is considered to be overcast by optically thick (shielding) clouds (Vasilkov et al., 2010), and CRF for the pixel is set to 1. To avoid a possible NISE misclassification (Cooper et al., 2018) for low-reflectivity scenes (scene reflectivity < 0.2), we consider such scenes to be snow-and ice-free and calculate CRF, OCP, and NO 2 AMF using the standard procedure with GLER for those scenes.

Improved terrain height and pressure calculation
Terrain pressure is a critical parameter for the AMF in NO 2 and cloud algorithms as well as for the total optical depth of the Rayleigh atmosphere in the GLER algorithm. Prior studies have shown that errors in terrain pressure can introduce over 20 % errors in retrieved NO 2 VCD, especially in areas of complex terrain (Zhou et al., 2010;Russell et al., 2011).
Here, we use a 2 arcmin global relief model of global landwater surface data (ETOPOv2; National Geophysical Data Center, 2006) to derive terrain height for each individual OMI ground pixel. We derive the pixel-average terrain height by collocating and averaging the high-resolution data as discussed in Qin et al. (2019). The corresponding terrain pressure for each OMI pixel (P s ) is calculated from the terrain pressure-height relationship established based on MERRA-2 monthly terrain pressure (P s_GMI ) at a spatial resolution of 1 • latitude × 1.25 • longitude used in the GMI model discussed above: where z (= z −z GMI ) represents the difference between the average terrain height for an OMI pixel (z) and the terrain height at GMI resolution (z GMI ). The parameter H = kT Mg represents the scale height, where k is the Boltzmann constant, T is the temperature at the surface, M is the mean molecular weight of air, and g is the acceleration due to gravity.
2.3 Impact of the changes on AMF Figure 5 shows an example of how changes in each individual input parameter affect tropospheric AMFs, which, in turn, translate inversely to tropospheric NO 2 column retrievals. Replacing climatological LER from OMLER with daily GLER data affects scattering weight profiles in the lower troposphere, resulting in lower values of tropospheric AMF almost everywhere, except over sun-glint areas where the use of GLER enhances scattering weights and tropospheric AMF (Fig. 5a). The changes in tropospheric AMF with GLER usually range from −50 % to 25 %, occasionally reaching up to −100 %. The effect is small (−6 % to 1 %) for overcast scenes (CRF > 0.9), increases (−28 % to 17 %) over clear and partially cloudy scenes (CRF < 0.5) for unpolluted regions, and surges (−62 % to 3 %) over polluted areas (> 5 × 10 15 molec. cm −2 ). Figure 6a shows GLER-driven changes in clear-sky (CRF < 0.5) tropospheric AMF for different surface and scene types, separated by tropospheric NO 2 column amounts. For 80 % of cases over land, 97 % over water outside sun-glint areas, and 98 % over sun-glint areas, tropospheric NO 2 columns are < 1.5 × 10 15 molec. cm −2 , and the average GLER-driven differences are small at −6.6±17.3 %, −3.8±7.1 %, and 4.0±12.9 %, respectively. The differences increase gradually with column amount over NO x source regions (e.g., cities and highly polluted coastal areas), with binned (of size 1 × 10 15 molec. cm −2 ) average differences ranging from −10±20.1 % to −30±19.7 %. Over snow and ice surfaces, changes are rather large, reaching up to a factor of 2. The impact of change in the surface reflection data on stratospheric AMFs is negligible (< 2 %). Figures 5b and 6b show how changes in the cloud parameters (CRF and OCP) affect tropospheric AMF. Replacing OMCLDO2-based cloud parameters with those from OMCDO2N changes scattering weight profiles in a complicated way. Higher values of OCP in OMCDO2N will include additional portions of scattering weights between the OMCDO2N-and OMCLDO2-based OCPs, especially in the lower troposphere, thereby reducing the tropospheric AMF. On the other hand, the higher CRF values lead to an increased contribution of the cloudy AMF in the calculation of tropospheric AMF, thereby increasing its value. Their combination causes a wide range of scenarios and large variation in the AMF effect. Overall, the change in cloud parameters causes enhancement of tropospheric AMFs for partially cloudy and overcast scenes and reduction for clear-sky scenes, especially over polluted areas. The AMF differences are generally large for low AMF values that are driven by enhanced differences in either OCP, CRF, or both as dis-cussed in Vasilkov et al. (2017). The changes in tropospheric AMF with the OMCDO2N-based cloud parameters usually range from −17 % to 28 %, with larger variation over land (−34 % to 40 %) compared to water (−12 % to 25 %) and for low (< 1) AMF (−47 % to 41 %) compared to high (> 3) AMF (−4 % to 18 %). The largest changes in AMF (−96 % to 62 %) occur over snow and ice surfaces that result from the difference in the treatment of snow and ice for cloud and NO 2 retrievals as discussed in Sect. 2.2.3. For clear-sky and partially cloudy scenes with CRF < 0.5, the effect of the changes in cloud parameters differs between land and water surfaces as well as sun-glint and non-sun-glint geometries and becomes more pronounced over polluted land and coastal areas (Fig. 6b). As in the case of surface reflectivity, the impact of the change in cloud parameters on stratospheric AMF is < 1 %. Figure 5c presents an example of changes in tropospheric AMF differences between the previous approach of using terrain pressure at OMI pixel centers and the pixel-average terrain pressure implemented in the current version (V4.0). In general, the AMF changes driven by the changes in terrain pressure are within ±1 % over ocean and ±3 % over land, although at times they can reach up to 30 %, especially for observations over complex terrain such as mountainous regions (Fig. 5c inset). Figures 5d and 6c show the AMF differences arising from the combined effect of changes in all parameters discussed above. The effect arising from the replacement of the climatological OMLER with GLER is partially compensated for by the effect arising from the change in cloud parameters in places where the two parameters exhibit opposite trends. Exceptions are over polluted land and coastal areas; the GLER effect on AMF is augmented by the cloud effect. The average AMF changes arising from all parameters (2 %) are lower than the changes arising from either GLER (−2.3 %) or cloud parameters (4.1 %), although the combined effect leads to a wider range of variation in AMF changes (−100 % to 57 %) compared to the effect from individual parameters. The changes arising from all parameters are somewhat smaller (−21 % to 34 %) for overcast scenes (CRF > 0.9) compared to (−47 % to 29 %) clear and partially cloudy scenes (CRF < 0.5) and are substantial (−137 % to 30 %) over highly polluted areas (> 5 × 10 15 molec. cm −2 ) and over snow and ice surfaces (−126 % to 99 %). Differences in the AMF effect are evident among land, water, and sun-glint areas (Fig. 6c). The impact of the changes is below 1 % for the stratospheric AMF.

Row anomaly and removal of stripes
The retrieved NO 2 SCDs have persistent relative biases in the 60 cross-track FOVs and show a pattern of stripes running along each orbital track. This instrumental artifact is corrected using the "de-striping" procedure described in detail in Bucsela et al. (2013). Briefly, the de-striping algorithm estimates the mean cross-track biases using measurements obtained at latitudes between 30 • S and 5 • N and from orbits within two orbits of the target orbit. These correction values, one for each cross-track position, are then subtracted from the retrieved SCDs to derive the de-striped SCD field.
Starting 25 June 2007 and presumably even earlier, OMI experienced a more severe form of anomaly that affects the quality of radiance data in certain rows at all wavelengths Schenkeveld et al., 2017). This effect, called the "row anomaly" (RA), has developed and changed over time. Currently, the RA has affected approximately half of the OMI's FOVs, resulting in OMI's global coverage now being 2 d instead of 1 d before the onset of the RA.
The quality of radiance data for the RA-affected FOVs is sufficiently poor as to prevent reliable NO 2 retrievals. Therefore, we abandon retrieval calculations for all measurements that are flagged by the RA-detection algorithm used in the Level-1 processing. We found that this RA-detection algorithm may not be sufficiently sensitive to the relatively small (but important for our purposes) RA changes. Figure 7 shows an example of anomalous rows not flagged by the RA-detection algorithm but observed in the NO 2 retrievals. Shown are time series of average NO 2 SCDs normalized by geometric AMFs over the Pacific Ocean for the RAunaffected row of 20 (0-based) compared with three rows that show significant degradation in the quality of SCD retrievals. These particular rows are in immediate proximity to the main RA area, thus showing the gradual RA evolution: in the present epoch the RA slowly shifts towards the highnumbered rows -note the sequential timing of the big drops in the retrievals in rows 44-46. While the data from the three rows start deviating from row 20 beginning from summer 2016, the data quality degrades further for rows 44, 45, and 46 from September of 2017, 2018, and 2019, respectively, to the extent that they cannot be sufficiently corrected by the destriping algorithm. In such cases, we implement additional RA flagging for those rows that start showing anomalous behavior and exclude those data from Level-2 and higher-level NO 2 products.

Calculation of stratospheric and tropospheric NO 2 columns
We use an observation-based stratosphere-troposphere separation scheme to estimate the stratospheric NO 2 field, as discussed in detail in Bucsela et al. (2013), and the algorithm remains unchanged in the current version. Briefly, the stratospheric field for an orbit is computed by creating a gridded global field of initial stratospheric NO 2 VCD estimates (V init ) with data assembled from within ±7 orbits of the target orbit: Here, S strat and AMF strat represent stratospheric SCD and AMF, respectively. A priori estimates of the tropospheric  contribution (S trop ap ) are subtracted from the measured destriped SCDs (S), and grid cells wherein this contribution exceeds 0.3 × 10 15 molec. cm −2 are masked. This masking ensures that the model contribution to the retrieval is minimal, especially in polluted areas. The residual field of the initial stratospheric VCDs measured outside the masked regions mainly over unpolluted or cloudy areas is smoothed by a boxcar average and a two-dimensional interpolation, yielding an estimate for stratospheric NO 2 VCD (V strat ) for an individual ground pixel. The estimation of the stratospheric NO 2 VCD allows for the computation of the tropospheric NO 2 VCD (V trop ) from the de-striped NO 2 SCD (S) and the tropospheric AMF (AMF trop ): where stratospheric NO 2 SCD (S strat ) is calculated from stratospheric AMF (AMF strat ) and V strat computed in the previous step.
With the updates in surface and cloud treatments as discussed in Sect. 2.2, the current version has made significant improvements, particularly in tropospheric AMFs and consequently in VCD estimates. Further improvement to the retrievals is possible by enhancing the quality of a priori NO 2 profiles through improvements in model resolution, emissions, and chemistry, which remain unchanged in the current version. If improved a priori NO 2 profiles become available, one can first use Eq. (1) to readily recalculate AMF trop by combining them with scattering weights (w (z)) archived in the data files and then use Eq. (12) together with other supplied parameters to recalculate V trop . The same approach can be applied to remove the effect of a priori profiles used in retrievals altogether, while comparing NO 2 columns from a model simulation with retrievals (Eskes and Boersma, 2003;Lamsal et al., 2014). Figure 8 shows a comparison of tropospheric and stratospheric NO 2 columns retrieved from the V3.1 and V4.0 algorithms for 20 March 2005. As expected, the updates implemented in V4.0 yield higher (∼ 10 %-40 %) tropospheric NO 2 columns in polluted areas, with less-pronounced (±10 %) differences in background and low-column areas. These results are consistent with the observed differences in the tropospheric AMF as discussed above in Sect. 2.2.4 and with other previous regional studies over land surfaces (Zhou et al., 2010;McLinden et al., 2014;Lin et al., 2014Lin et al., , 2015Liu et al., 2019) that implemented one or more of the changes applied in V4.0. In contrast to changes in tropospheric NO 2 retrievals, changes in stratospheric NO 2 estimates range between −3.6 × 10 14 molec. cm −2 and 3.2 × 10 14 molec. cm −2 and are close to the range of expected uncertainties of stratospheric NO 2 estimates (Bucsela et al., 2013). The relative differences in the stratospheric NO 2 column between the two versions are close to 0 % on average, usually ranging between −2.5 % and 2.0 % and occasionally reaching up to ±13 %. This difference in stratospheric NO 2 estimates is much larger than the difference in stratospheric AMFs and is caused by differences in tropospheric AMFs that influence NO 2 observations over unpolluted and cloudy areas used by the stratosphere-troposphere separation scheme. Figure 9 shows the seasonally averaged tropospheric NO 2 columns over the selected domains of North America, Europe, southern Africa, and Asia for the months of June, July, and August in 2005. These domains contain highly polluted areas with significant NO x emissions where the impact of changes in surface reflectivity and cloud parameters on tropospheric NO 2 retrievals becomes increasingly important. The use of more accurate pixelspecific information for surface and cloud parameters in V4.0 results in significantly enhanced tropospheric NO 2 column retrievals almost everywhere. The effect, however, varies with the vertical distribution of NO 2 , with the largest effects in high-column areas. Figure 10 shows the seasonal average tropospheric NO 2 columns for December through February. While seasonal differences in NO 2 columns are evident owing to changes in NO x lifetime and boundary layer depth, the impact of algorithm changes in V4.0 remains similar. There are two notable exceptions specifically related to observations over snow and ice surfaces. First, there are significant data gaps in V3.1 but nearly none in V4.0. In V3.1, retrievals over snow and ice areas were considered to be highly uncertain and therefore discarded, following the recommendation of . As discussed above in Sect. 2.2.3, V4.0 incorporates changes in surface and cloud treatment in the NO 2 algorithm that allows us to retain more observations that we determine to be our acceptable level of cloudiness. Next, these algorithm changes led to profound changes in the calculated tropospheric AMFs and resulting NO 2 column amounts. The reduction in tropospheric NO 2 retrievals in V4.0 over snowand ice-covered surfaces arises from a combined effect of enhanced values of surface reflectivity, their impact on the CRF and OCP retrievals, and an inconsistent number of samples used in the calculation of the seasonal average. Nevertheless, due to inferiority in the quality of BRDF data and complexities in separating snow from clouds, caution is needed when interpreting wintertime data at high latitudes. Figure 11 shows some examples of how changes in the algorithm from V3.1 to V4.0 affect monthly domain average tropospheric NO 2 columns over areas affected by various NO x sources. In contrast to minor changes over the pristine Pacific Ocean, month-to-month changes over source regions vary considerably. The differences in tropospheric NO 2 columns between V4.0 and V3.1 range from −11 % to 15 % over Beijing, China, and from 0 % to 29 % over the Ruhr area in Germany, suggesting variations in relative differences among cities and industrial areas. The changes over a major biomass burning area in the Democratic Republic of Congo, Angola, and Zambia range 13 %-56 % during the biomass burning season of May through August but are < 5 % in other months. Differences between the two versions are small over areas influenced by lightning NO x emissions.
In Fig. 12, we examine the monthly variation of tropospheric NO 2 columns from the two versions over five highly populated and polluted cities that vary in terrain types ranging from coastal (e.g., Shanghai, Tokyo) to mountainous (e.g., Mexico City). NO 2 columns in V4.0 are generally higher than V3.1 by 0 %-30 %, but the difference can occasionally reach up to 50 % in some months. Changes of that order of magnitude in highly polluted areas have implications for the estimation of NO x emissions and trends using these data.

Assessment of OMI NO 2 product
In this section, we compare OMI NO 2 columns with total column retrievals from ground-based Pandora measurements and integrated tropospheric columns from aircraft spirals at several locations of the DISCOVER-AQ (Deriving Information on Surface Conditions from COlumn and VERtically Resolved Observations Relevant to Air Quality) field campaign held between 2011 and 2014.

Comparison between OMI and Pandora total column NO 2
Here, we compare the total column NO 2 retrievals from OMI and the ground-based Pandora spectrometer. Pandora is a compact sun-viewing remote sensing instrument that provides estimates of NO 2 column amounts from the surface to the top of the atmosphere (Herman et al., 2009(Herman et al., , 2018. The NO 2 retrieval approach for Pandora is similar to that of OMI and consists of the DOAS spectral fitting procedure to derive NO 2 SCD and its conversion to VCD using AMFs. However, the details differ due to the lack of top-of-atmosphere radiance measurements for the spectral fitting and simplicity in the AMF calculation for Pandora due to its direct-sun measurements. To compare with the OMI observations, we use Pandora data for sites listed in the Pandonia Global Network (https://www.pandonia-global-network.org/, last access: 10 May 2020). Out of 22 sites, we select 18 sites that we determined to be suitable for comparison. Data from some of the sites (e.g., Rome, Italy) are consistently higher than OMI by over a factor of 2, suggesting that the sites may be in close proximity to local sources that cannot be resolved by OMI. Although some of the selected sites have sporadic and short-term measurements (e.g., Ulsan, South Korea), we consider them for improved sampling and coverage. The collocation criteria include spatial and temporal matching between OMI and Pandora observations by selecting the OMI pixels that encompass the Pandora site and using Pandora 80 s total NO 2 column data averaged over ±10 min of OMI observations. We use high-quality data obtained under clearsky conditions with the root mean square of spectral fitting residuals < 0.05 and NO 2 retrieval uncertainty < 0.05 DU    Fig. 9, but for December, January, and February. The gray areas represent a lack of good observations as determined by data quality flags.
(∼ 1.3 × 10 15 molec. cm −2 ) for Pandora and with CRF < 0.5 for OMI. Figure 13 shows a comparison of OMI total NO 2 columns (sum of tropospheric and stratospheric columns) to coincidently sampled Pandora direct-sun NO 2 column retrievals at a clean site in Izaña on Tenerife, Spain, and a more polluted site in Greenbelt (Maryland, USA). The Izaña Atmospheric Observatory is located on the top of a mountain plateau, with an elevation of 2373 m above sea level. Since the site is free of local anthropogenic influences, Pandora observations likely provide stratospheric and free tropospheric NO 2 amounts. In contrast, the Greenbelt site in a suburban Washington DC area has traffic and air quality typical of polluted US cities. As shown in Fig. 13a and b, OMI NO 2 retrievals from the two versions are highly consistent (r > 0.92), with somewhat higher values in V4.0 compared to V3.1 by 13 % on average in Greenbelt and just 1 % in Izaña. The variations of OMI NO 2 from both versions are also broadly consistent with the Pandora measurements. The OMI and Pandora NO 2 columns are fairly correlated (r = 0.32, N = 232) at Izaña and moderately correlated (r = 0.51, N = 123) at Greenbelt; often the differences between each individual OMI and Pandora observation are significant. Overall, the total column NO 2 data from OMI are higher than Pandora, with an average difference of < 16 %. Occasional large discrepancies between OMI and Pandora reflect a combination of spatial heterogeneity, differences in spatial and temporal sampling, differences in the vertical sensitivity of satellite-and ground-based observations, and errors in OMI and Pandora retrievals. Figure 13c and d show the multiyear monthly mean variation of OMI and Pandora NO 2 columns. The seasonal variation in Pandora and OMI NO 2 columns is highly consistent and exhibits a summer maximum and a fall minimum at Izaña and a winter maximum and summer minimum in Greenbelt. The seasonal variation in the total column reflects that of the stratosphere for Izaña and of the troposphere in Greenbelt. For Izaña, the monthly mean differences between OMI and Pandora range from 8.2 % in June to 38 % in October for V4.0 and from 7.0 % in June to 37 % in October for V3.1. This discrepancy is likely due to the large aerial coverage of OMI pixels including nearby cities, unlike the point measurements made by Pandora at the mountaintop. The average tropospheric NO 2 column observed by OMI is 8.9 × 10 14 molec. cm −2 , suggesting significant NO 2 amounts in the troposphere with 20 %-32 % contributions to total column NO 2 on a monthly scale. For Greenbelt, the monthly mean differences between OMI and Pandora are within ±12 % for the majority of the cases for both versions, with V4.0 improving agreement for February, April, May, and December and worsening somewhat in other months, especially in September and November when the two versions exhibit larger differences in tropospheric NO 2 retrievals. Figure 14 shows average total NO 2 columns measured by Pandora and OMI at the 18 selected sites. Although there is a wide range of differences between individual sites, Pandora and OMI observations exhibit a good spatial correla- Figure 11. Monthly average tropospheric NO 2 columns in 2006 calculated from V3.1 (black) and V4.0 (red) data over selected 5 • latitude × 5 • longitude boxes from locations that are dominated by either anthropogenic (Beijing, China, and the Ruhr area, Germany), biomass burning (Democratic Republic of Congo -DRC, Angola, and Zambia), lightning (DRC), or no significant (Pacific) NO x sources. The vertical bars show the monthly standard deviation. The blue symbols that correspond to the right y axis show the monthly relative difference (in percent) between V4.0 and V3.1. tion, with slightly improved correlation for V4.0 (r = 0.65, N = 1082) compared to V3.1 (r = 0.62). The site-specific average values generally agree within ±35 % for columns < 10 16 molec. cm −2 . For more polluted sites, OMI retrievals tend to be lower than the Pandora data. Although the relationship between Pandora and OMI has not changed appreciably with the updates made in the OMI V4.0 product, the corrections are in the right direction for a majority of the sites. The observed differences should not be interpreted as biases in retrievals but rather as the combined effect of differences in spatial coverage, heterogeneity in the NO 2 field, preferential placement of Pandora instruments, and potentially a lack of site-specific profile shapes assumed in OMI retrievals.

Assessment using DISCOVER-AQ observations
We also use NO 2 observations from the DISCOVER-AQ field program to assess OMI NO 2 retrievals. The DISCOVER-AQ campaign was composed of four field deployments: the Baltimore-Washington area in Maryland (MD) in July 2011; the San Joaquin Valley in California (CA) in January-February 2013; Houston, Texas (TX), in Septem- Figure 12. Same as Fig. 11, but for a 1 • latitude × 1 • longitude box over five highly populated and polluted cities.  An observing strategy of the campaign was to carry out systematic and concurrent in situ and remote sensing observations from a network of ground sites and research aircraft that spiraled over each site two to four times a day. The payload of the P-3B research aircraft included in situ measuring instruments to measure NO 2 profiles in the 0.3-5 km altitude range. Each campaign hosted ground-based networks of surface monitors to provide in situ NO 2 observations and Pandora spectrometers to measure NO 2 column amounts.
We use Pandora NO 2 column observations and in situ NO 2 spiral data spatially and temporally matched to OMI on clear and partially cloudy (cloud radiance fraction < 0.5) days. Airborne measurements were carried out using the fourchannel chemiluminescence instrument from the National Center for Atmospheric Research (Ridley and Grahek, 1990) and the thermal dissociation laser-induced fluorescence instrument from the University of Berkeley (Thornton et al., 2000). Despite differences in the measurement technique and sampling strategy, NO 2 measurements from the two instruments are highly consistent and generally agree within 10 %, with the exception of ∼ 32 % difference for Houston (Choi et al., 2020). Here, we use the 1 s merged data from the chemiluminescence instrument only, taking advantage of its high-frequency measurements. The spiral data are extended to the ground by using coincident in situ surface NO 2 measurements sampled over the duration of spiral (∼ 20 min). To account for NO 2 amounts in the missing portion from the highest aircraft altitude to the tropopause, we use NO 2 from the GMI simulation. Like the surface data, the Pandora total column NO 2 data are averaged over the duration of each aircraft spiral. For OMI, we include data from all cross-track positions that are not subject to the row anomaly. Figure 15 and Table 2 show a summary of the comparison of OMI V4.0 and V3.1 NO 2 columns to vertically integrated tropospheric columns from the P-3B aircraft at 20 spiral locations. Overall, tropospheric NO 2 columns from OMI and aircraft spirals suggest poor agreement but a good correlation (r = 0.74, N = 100), with slightly improved agreement for V4.0 compared to V3.1. The agreement and correlations between OMI and P-3B observations vary by campaign locations (e.g., r = 0.4 for MD to r = 0.81 for CA for V4.0). The level of improvements from V3.1 to V4.0 also varies from 1.2 % in TX to 9.8 % in CA. OMI retrievals are usually lower than the aircraft data, with larger differences for sites with larger NO 2 gradients and columns (e.g., Denver-LaCasa, CO; Fresno, CA). OMI is rarely higher than the aircraft data as this usually happens over relatively cleaner sites (e.g., Fairhill, MD). This alternating nature of the variation in results in polluted versus clean areas suggests that OMI's large footprint size and the narrow spiral radius (∼ 4 km) of the aircraft are likely the primary causes of the observed differences. This was demonstrated in Choi et al. (2020) by using high-resolution Community Multi-scale Air Quality Model (CMAQ) simulations. Additional contributions to the observed differences could come from OMI retrieval errors arising from the use of coarse-resolution GMI-based a priori NO 2 profile shapes in the AMF calculation. Such profilerelated retrieval errors can be partially accounted for by replacing GMI profiles with the aircraft-observed NO 2 profiles (OMI obs ). The use of observed profiles in the OMI retrievals from both versions leads to a slight change in correlation but a 20 %-35 % reduction in the mean difference between OMI and aircraft observations, highlighting the role of a priori profiles in NO 2 retrievals as suggested by previous studies (Russell et al., 2011;Lamsal et al., 2014;Goldberg et al., 2017;Choi et al., 2020). The campaign average difference between OMI and aircraft observations is −38.8 % in V3.1 and −23.1 % in V4.0, resulting in a net improvement of 15.7 % with V4.0. We note here that the aircraft-observed profiles can be very different from the actual profiles over OMI's FOVs (pixels) due to a difference in the sampling domains for the two measurements. Figure 15 and Table 3 also show the comparison between the OMI and Pandora total column retrievals at the 20 DISCOVER-AQ sites. The correlations between collocated OMI V4.0 and Pandora observations for individual campaign locations vary from fair (r = 0.13 for MD) to good (r = 0.70 for CO), with a moderate correlation (r = 0.56, N = 83) for all observations from the four locations. The changes in correlation from V3.1 to V4.0 are generally small, with a minor improvement in CA and deterioration in MD. Compared to the aircraft observations, the OMI data generally show better agreement with the Pandora retrievals, with the smallest difference in MD and the largest difference in CO. Compared to V3.1, the agreement is improved for V4.0 by up to 9 %. The use of aircraft-observed NO 2 profiles in AMF calculations leads to higher OMI column retrievals than those from Pandora for MD and TX and lower columns than Pandora for CA and CO. Overall, total column retrievals from OMI V3.1 and Figure 15. Site average total (circles) and tropospheric (bars) NO 2 column data from the P-3B spiral (white bars), Pandora (green circles), and OMI (orange and red). The OMI tropospheric columns are derived using GMI-simulated (OMI GMI , orange) and P-3B (OMI obs , red) NO 2 profiles. The vertical bars for sites with over two observations represent the standard deviations. Table 2. Comparison of OMI V3.1 and V4.0 NO 2 retrievals based on a priori NO 2 profiles from GMI (OMI) and P-3B aircraft observations (OMI obs ) to P-3B observations during the DISCOVER-AQ field campaign. Shown here are the correlation coefficient (r) and mean difference, which is calculated as OMI minus validation data. V4.0 are respectively 33.5 % and 16.3 % lower than Pandora; this change is consistent with the comparison between OMI and P-3B observations. The observed discrepancy between the OMI, aircraft spiral, and Pandora data points to general difficulties in comparing observations of different spatial resolutions for a short-lived trace gas like NO 2 that has large spatial gradients, especially in the boundary layer.

Conclusions
We have described a series of significant improvements made to the operational OMI NO 2 standard product (OMNO2) algorithm.  (Bucsela et al., 2013;Lamsal et al., 2014); an optimized spectral fitting algorithm used for NO 2 slant column density retrievals ; and the use of annually varying monthly mean inputs derived from the Global Modeling Initiative (GMI) (e.g., NO 2 vertical profile shapes), as implemented in version 3.0 .
The changes in inputs result in substantial changes in tropospheric AMFs (and thus VCDs) in V4.0 relative to the previous version (V3.1). The geometry-dependent GLER data computed for OMI observations used in V4.0 differ considerably from the OMI-derived climatological LER data  used in V3.1. The data from GLER (a unitless value with 0.0-1.0 range) are generally lower, by < 0.05, than the climatological LER data over land and ocean outside sun-glint areas; GLER is much higher over sunglint areas and reaches more than 0.3 due to the geometrydependent Fresnel reflection. The cloud parameters (OCP and CRF) retrieved from the new O 2 -O 2 cloud algorithm described here and those from the operational cloud algorithm  used in V3.1 exhibit significant differences, with generally larger values for both parameters in V4.0 compared to V3.1 but noticeable exceptions over sunglint areas where CRFs in V4.0 are lower than V3.1 by < 0.3. Over snow and ice surfaces, identified by near-real-time ice and snow extent (NISE) flags in the OMI L1b data, various adjustments are made in V4.0 for GLER, OCP, and CRF by using other diagnostic parameters (e.g., scene pressure) retrieved by the new cloud algorithm. The scattering weights and tropospheric AMFs for NO 2 respond to the changes in these input parameters in a complicated way. Typically, tropospheric AMFs decrease with the use of GLER and increase with the use of the new cloud parameters, with exceptions over water surfaces affected by sun glint, for which we observe the opposite effect. Over highly polluted areas, the effect from GLER is augmented by the effect from the new cloud parameters, resulting in a considerable decrease in the tropospheric AMF. Changes in tropospheric AMFs resulting from updates in the treatment of snow-and ice-covered areas are also significant. Changes in the adopted terrain pressure (V4.0 vs. V3.1) can also have a sizable effect on tropospheric AMFs, particularly over areas with a complex terrain. In contrast, for stratospheric AMFs the combined impact of all of these algorithmic updates is negligible.
The changes in tropospheric AMFs translate directly into changes in tropospheric NO 2 retrievals and indirectly into stratospheric NO 2 estimates. Over background and lowcolumn NO 2 areas, tropospheric NO 2 column estimates have not changed appreciably from V3.1 to V4.0. Over more polluted areas, the tropospheric NO 2 retrievals have typically increased by 10 %-40 % from V3.1 to V4.0, mostly in direct proportion to the pollution level. Most of the increase in highly polluted areas is driven by the change in the surface reflectivity data used in the AMF calculation, with additional increases due to changes in the cloud parameters. Changes in the stratospheric NO 2 estimates are usually within ±2.5 %, which is close to the range of estimated uncertainties of stratospheric NO 2 estimates.
A global assessment of V4.0 tropospheric and stratospheric NO 2 products was performed by a thorough evaluation of their consistency with the data from V3.1, which was carefully evaluated in our previous works (e.g., Choi et al., 2020). In addition, we use NO 2 measurements made by independent ground-and aircraft-based instruments to evaluate the V4.0 product. The comparison of OMI total column NO 2 data to collocated Pandora observations at its 18 global network and 20 DISCOVER-AQ locations suggests that OMI and Pandora are generally highly consistent, exhibit similar seasonal variation, and agree within their expected uncertainties of 2.7×10 15 molec. cm −2 for Pandora (Herman et al., 2009) and ∼ 30 % for OMI under clear-sky conditions Bucsela et al., 2013). Individual data points differ considerably, and OMI tends to be lower than Pandora over highly polluted areas with spatially inhomogeneous NO 2 . The comparison of OMI tropospheric NO 2 column retrievals to columns derived from the aircraft spirals and surface data during the DISCOVER-AQ campaign also suggests general agreement in spatial variation, but OMI values are about a factor of 2 lower in polluted environments. This difference is partly due to inaccurate a priori assumptions but primarily to OMI's relatively large pixels. The use of observed NO 2 profiles as a priori information reduces the bias from ∼ 50 % to 23 % on average. A multi-axis differential optical absorption spectrometer (MAX-DOAS) (e.g., Chan et al., 2019) or high-spatialresolution measurements from aircraft (e.g., Nowlan et al., 2016;Lamsal et al., 2017;Judd et al., 2019) would provide a more comprehensive validation by mapping the NO 2 distributions over the complete areas of aircraft spirals and the satellite FOVs.
In this study, we focused on improving the surface and cloud parameters in the NASA standard NO 2 product retrievals. To further improve the retrieval accuracy, it is important to incorporate improved retrieval methods and auxiliary information, such as high-resolution a priori NO 2 profiles. For instance, current cloud algorithms based on the MLER model treat aerosols implicitly by providing effective (cloud + aerosol) CRF and effective cloud OCP, both necessary inputs for AMF calculations. Cloud effects on trace gas retrievals can be compromised by unknown aerosol effects, which lead to errors in AMF calculations. Therefore, the use of the GLER product in the NO 2 algorithm will greatly benefit from an explicit accounting for aerosol effects, particularly over polluted regions. We have recently developed an explicit and consistent aerosol correction method that can be applied consistently in both cloud and NO 2 retrievals (Vasilkov et al., 2020); it uses a model of aerosol optical properties from a global aerosol assimilation system paired with radiative transfer calculations. This approach allows us to account for aerosols within the OMI cloud and NO 2 algorithms with relatively small changes and will be used in the next version of the NO 2 algorithm.
Author contributions. LNL, NAK, JJ, and AV designed the data analysis. WQ, ZF, NAK, DH, and AV developed and evaluated the GLER product. EY, SM, AV, NAK, JJ, and BF developed and evaluated the cloud product. LNL, NAK, SM, WHS, and EB have developed and evaluated the NASA NO 2 standard product. LNL and SC conducted validation of the OMI NO 2 products using Pandora and other independent observations. LNL, AV, SM, and ZF wrote the paper with comments from all coauthors.