The MOPITT Version 9 CO product: sampling enhancements and validation

Abstract. Characteristics of the Version 9 (V9) MOPITT (Measurements of Pollution in the Troposphere) satellite retrieval product for tropospheric carbon monoxide (CO) are described. The new V9 product includes many CO retrievals over land which, in previous MOPITT product versions, would have been discarded by the cloud detection algorithm. Globally, the number of daytime MOPITT retrievals over land has increased by 30 %-40 % relative to the Version 8 product, although the increase in retrieval coverage exhibits significant geographical variability. Areas benefiting from the improved cloud detection performance include (but are not limited to) source regions often characterized by high aerosol concentrations. The V9 MOPITT product also incorporates a modified calibration strategy for the MOPITT near-infrared (NIR) CO channels, resulting in greater temporal consistency for the NIR-only and thermal-infrared-near-infrared (TIR-NIR) retrieval variants. Validation results based on in situ CO profiles acquired from aircraft in a variety of contexts indicate that retrieval biases for V9 are typically within the range of ±5 % and are generally comparable to results for the V8 product.

MOPITT (Measurements of Pollution in the Troposphere) is an instrument on the NASA Terra satellite, which was launched on 18 December 1999. Measurements made by MOPITT's gas correlation radiometers (Drummond, 1989; Drummond et al., 2010) operating in both thermal-infrared (TIR) and near-infrared (NIR) spectral bands enable retrievals of CO mixing ratio vertical profiles and total column values. The MOPITT instrument has produced a unique long-term data record well suited for a variety of applications. MOPITT CO products are used, for example, to forecast air quality (Inness et al., 2015), estimate CO emissions (Pechony et al., 2013; Zheng et al., 2018; Nechita-Banda et al., 2018; Gaubert et al., 2020), and validate other satellite products. Over the last two decades, MOPITT retrieval products have improved continuously as knowledge has improved regarding the instrument, radiative transfer modeling, and geophysical variables (Deeter et al., 2017).
MOPITT retrievals of CO volume mixing ratio (VMR) are generated with an optimal estimation-based retrieval algorithm (Deeter et al., 2003). CO retrievals are based on a log(VMR) state vector (Deeter et al., 2007a) and are performed on a retrieval grid with 10 pressure levels (surface, 900, 800, ..., 100 hPa). Retrieval layers, used internally in the MOPITT retrieval algorithm, are defined by the layers between each level in this grid and the next-highest level in the grid. Thus, for example, the surface-level retrieval product actually represents the mean VMR for the layer between the surface and 900 hPa. (For the topmost MOPITT retrieval level at 100 hPa, the uniform-VMR layer extends from 100 to 50 hPa. Assumed VMR values in the layer from 50 hPa to the top of the atmosphere (TOA) are based on the Community Atmosphere Model with Chemistry (CAM-chem) model climatology and are fixed.) Retrieved CO total column values are calculated directly from the CO profile and are not retrieved independently. A priori CO profiles are derived from a model climatology based on the CAM-chem chemical transport model (Lamarque et al., 2012) and vary seasonally and geographically; the a priori climatology used for V9 products is identical to the climatology used for processing MOPITT Version 6, Version 7, and Version 8 products (Deeter et al., 2019). MOPITT a priori log(VMR) profiles vary by month but do not vary from year to year; this simplifies the interpretation of long-term trends in the data. Model-based climatologies used to generate the a priori are gridded at 1° (lat/long) horizontal resolution and monthly temporal resolution. Spatial and temporal interpolation are used to generate a priori values at each specific observation location and day.
All MOPITT CO retrievals are based on a specific subset of the Average (A) and Difference (D) radiances from MOPITT channels 5, 6, and 7; each channel is associated with a particular TIR or NIR gas correlation radiometer. Radiometers on MOPITT corresponding to channels 1-4 became inoperative in 2001 due to the failure of one of two coolers. TIR-only retrievals are based on the 5A, 5D, and 7D radiances in the 4.7 µm band, whereas NIR-only retrievals are based solely on the ratio of the 6D and 6A radiances in the 2.3 µm band. MOPITT TIR-only retrievals are typically most sensitive to CO in the mid-troposphere and upper troposphere, except in scenes characterized by strong thermal contrast (Deeter et al., 2007b). MOPITT NIR-only retrievals are most useful for retrievals of CO total column (Deeter et al., 2009; Worden et al., 2010). Unique "multispectral" or "joint" TIR-NIR retrievals exploit the 5A, 5D, 7D, 6D, and 6A radiances. This variant offers finer vertical resolution than the TIR-only and NIR-only variants and features the greatest sensitivity to CO in the lower troposphere (Deeter et al., 2013). However, because NIR measurements rely on reflected solar radiation, the benefits of the TIR-NIR variant are limited to daytime MOPITT observations over land.
This paper describes features of the new MOPITT V9 product which will be relevant to a wide spectrum of users. Changes to the processing algorithms used to produce the V9 CO product are discussed in Sect. 2. These include significant changes to (1) the method used to calibrate MOPITT's NIR radiances and (2) the cloud detection algorithm. Revisions to the cloud detection algorithm resulting in significantly enhanced retrieval coverage were described and analyzed previously in Deeter et al. (2021). V9 validation results based on in situ measurements acquired from aircraft are compared with corresponding V8 validation results in Sect. 3. Changes in retrieval sampling characteristics due to the revised cloud detection algorithm and their impacts are analyzed in Sect. 4. Finally, conclusions are presented and discussed in Sect. 5.

Calibration
Calibration of MOPITT's NIR radiances (6A and 6D) relies on a two-point calibration scheme involving both cold-calibration ("cold-cal") and hot-calibration ("hot-cal") events. Cold-cals are performed by pointing the scanning mirrors to space and occur many times per day. In contrast, hot-cals are typically performed annually as they require the execution of special instrument operations during which the internal blackbody is heated to ∼ 460 K. Ideally, NIR-channel radiances are calibrated using hot-cals occurring both before and after the time of observation. While this method is feasible in retrospective processing mode (i.e., processing previous years of data), it is not possible in forward processing mode (i.e., when processing recently acquired observations). Thus, in forward processing mode, only information from the most recent hot-cal is used to calibrate MOPITT's NIR radiances. Comparisons of NIR-only retrieval products generated in retrospective and forward processing modes may exhibit significant differences (10 % to 20 %) in total column results, with the retrospectively processed data being more reliable. Therefore, because of the degraded quality of MOPITT products processed in forward processing mode, V8 and V9 products generated in this manner are labeled as "beta" products to distinguish them from standard archival products. Beta products are eventually replaced by standard archival files following the next hot-cal. Typically, this occurs no more than a year after the time of a particular observation (depending on the date of the most recent hot-cal). Thus, beta products are considered provisional and should not be exploited for quantitative analyses.
For V9, the NIR calibration methodology for retrospective processing has been significantly revised. Hot-cals are typically performed annually, usually in March, in conjunction with a decontamination procedure; the entire series of instrument operations typically requires 12-13 d. In most years, hot-cals are executed both immediately before and after the decontamination procedure. For previous MOPITT products, including V8, NIR calibration for archival (non-beta) products relied on the closest bracketing hot-cals such that, usually, NIR radiances for a given date were calibrated using the most recent previous post-decontamination hot-cal and the next pre-decontamination hot-cal. For example, for V8, NIR radiances observed between 5 March 2016 and 5 March 2017 were calibrated using information from the post-decontamination hot-cal on 4 March 2016 and the pre-decontamination hot-cal on 6 March 2017.
However, it was recently discovered that this NIR calibration strategy often results in a growing retrieval bias in the NIR-only products over the period between the two hot-cals used for calibration. As illustrated in Fig. 1, this time-dependent bias is most obvious when comparing TIR-only and NIR-only CO products immediately before and after a particular hot-cal/decontamination cycle. Time series plots of daily-mean CO total column values are shown in the top panel for the V9 TIR-only (V9T), V8 NIR-only (V8N), and V9 NIR-only (V9N) products for all daytime retrievals over land regions between 60° S and 60° N. Time series are shown in the bottom panel for CO total column differences obtained by subtracting daily-mean V9T CO total column values from corresponding V8N and V9N daily-mean values. Although NIR-only and TIR-only retrievals are characterized by different vertical sensitivities and are therefore not expected to agree precisely, V9T total column values are a useful reference because they are unaffected by NIR calibration issues. Thus, TIR-only and NIR-only CO total column values averaged over large spatial scales should be expected to exhibit a very similar annual cycle.
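The differencing step used for the bottom panel of Fig. 1 can be sketched as follows. This is a minimal illustration, not the operational analysis code: the function names, the dictionary-based data layout, and the numerical values are all hypothetical.

```python
from statistics import mean


def daily_mean(values_by_day):
    """Map each day to the mean of its retrieved CO total columns."""
    return {day: mean(vals) for day, vals in values_by_day.items() if vals}


def column_difference(nir_by_day, tir_by_day):
    """Daily-mean NIR-only minus daily-mean TIR-only total column.

    Only days present in both inputs are returned.
    """
    nir = daily_mean(nir_by_day)
    tir = daily_mean(tir_by_day)
    return {day: nir[day] - tir[day] for day in sorted(nir.keys() & tir.keys())}


# Hypothetical total columns in molecules cm^-2 (illustration only).
v8n = {"2019-03-01": [1.9e18, 2.1e18], "2019-04-01": [2.2e18, 2.2e18]}
v9t = {"2019-03-01": [2.2e18, 2.2e18], "2019-04-01": [2.2e18, 2.2e18]}
delta = column_difference(v8n, v9t)
```

A step change in `delta` between dates on either side of a hot-cal/decontamination period would correspond to the kind of discontinuity discussed here.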
Vertical gray bars shown in the upper and lower panels of Fig. 1 indicate periods during which the annual hot-calibration and decontamination procedures were performed. For each of the years shown from 2016 to 2020, the CO total column difference time series for V8N (plotted in blue) exhibits a physically unrealistic discontinuity when comparing dates just before a pre-decontamination hot-cal with dates just after the post-decontamination hot-cal several weeks later. For example, in 2019, the CO total column difference for V8N increased from about $-2 \times 10^{17}$ molecules cm$^{-2}$ just before the pre-decontamination hot-cal to close to 0 just after the post-decontamination hot-cal. While the physical source of this discontinuity is not yet fully understood, it suggests that the pre- and post-decontamination hot-cals are not consistent with each other and are not equally useful for calibration.
Experiments were performed to develop an improved NIR calibration strategy for V9. It was found that the typical discontinuity in CO total column values before and after the hot-cal and decontamination cycle was greatly reduced when only post-decontamination hot-cals were used for calibration. The CO total column difference time series using this strategy, which was implemented for V9N operational processing, is plotted in purple in the bottom panel of Fig. 1. For each of the years shown, the improved stability of the V9N product compared to V8N is clearly evident. Additional details regarding the specific hot-cals used for NIR calibration in V9 over the entire MOPITT mission will be reported in a forthcoming revision of the L0-L1 Algorithm Theoretical Basis Document (ATBD).

Radiative transfer modeling
The operational MOPITT radiative transfer model, known as MOPFAS, is updated monthly with information describing the mean instrument state for that month, including the pressures and temperatures in the gas correlation cells (Edwards et al., 1999; Deeter et al., 2013). For V9, operational modeling of the MOPITT pressure modulation cell (PMC) radiances (7A and 7D) now also includes monthly updated values for the cell number density. The optical depth is calculated as the product of the cross-section, number density, and cell length. Monthly variations in cell pressure ($P$) and temperature ($T$) affect the number density, which is proportional to $P/T$. This dependency is now explicitly represented in V9. This correction removes a small but slowly growing bias in the 7D PMC radiance (0 % in 2006, 3 % in 2018) which is large enough to introduce a non-negligible long-term trend in CO retrieval bias. The operational radiative transfer model for V9 is based on HITRAN12 (Rothman et al., 2013), which is the same version of HITRAN used for MOPITT V7 and V8 processing.
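The dependence of cell optical depth on $P/T$ can be illustrated with the ideal gas law, $n = P / (k_B T)$. The sketch below is not MOPFAS code; the function names are hypothetical, and any cross-section, pressure, temperature, or cell length passed to it would be illustrative values rather than MOPITT instrument parameters.

```python
BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant (exact SI value)


def number_density(pressure_pa, temperature_k):
    """Ideal-gas number density in molecules m^-3: n = P / (k_B * T)."""
    return pressure_pa / (BOLTZMANN_J_PER_K * temperature_k)


def cell_optical_depth(cross_section_m2, pressure_pa, temperature_k, cell_length_m):
    """Optical depth as cross-section x number density x cell length."""
    n = number_density(pressure_pa, temperature_k)
    return cross_section_m2 * n * cell_length_m
```

Because the optical depth scales linearly with $P/T$, a drift in cell pressure at constant temperature produces a proportional change in optical depth unless the number density is updated along with it.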
The MOPITT retrieval algorithm exploits radiance bias correction factors to compensate for relative biases between simulated radiances calculated by MOPFAS and actual calibrated Level 1 radiances from the instrument. Radiance bias correction factors compensate for a variety of potential bias sources including errors in instrumental specifications, forward model errors related to the development of MOPFAS, errors in assumed spectroscopic data, and geophysical errors. Within the retrieval algorithm, these correction factors are applied by scaling the simulated radiances produced by MOPFAS each time it is executed.
Figure 1. Time series comparisons of daily-mean CO total column (a) and CO total column differences (b) for daytime/land retrievals between 60° S and 60° N (as described in Sect. 2.1) for the V9T, V8N, and V9N variants. The difference time series in panel (b) are obtained by subtracting the V9T total column time series (plotted in red in panel a) from the V8N (blue) and V9N (purple) time series. Vertical gray bars indicate periods during which the annual hot-calibration and decontamination procedures were performed. Discontinuities in the CO total column differences for dates just before and after the hot-cal/decontamination events for the V8N variant (blue) are largely resolved for the V9N variant (purple).

As introduced in V8 processing, the radiance bias correction is based on a parameterization involving both (1) the date of the MOPITT observation and (2) the water vapor total column at the time and geographic location of the MOPITT observation, as derived from the MERRA-2 (https://gmao.gsfc.nasa.gov/reanalysis/MERRA-2/, last access: 11 April 2022) water vapor profiles needed to execute MOPFAS (Deeter et al., 2019). Within the retrieval software, the radiance bias correction factors for V8 and V9 are calculated using the relation

$$R_i = R_0 + R_t N_\mathrm{dys} + R_w \mathrm{WV}, \tag{1}$$

where $R_i$ is the multiplicative radiance correction factor to be applied to the model-simulated value for radiance $i$; $N_\mathrm{dys}$ is the number of elapsed days since 1 January 2000; WV is the water vapor total column (or "precipitable water vapor", expressed in molecules cm$^{-2}$) determined from the MERRA-2 reanalysis (temporally and spatially interpolated to the time and location of the MOPITT observation); and $R_0$, $R_t$, and $R_w$ are the empirically determined parameters which effectively minimize overall retrieval bias, bias drift, and bias water vapor sensitivity. Values of $R_0$, $R_t$, and $R_w$ for the 5A, 5D, 6D, and 7D radiances used for V8 and V9 operational processing are listed in Table 1. (Since the use of MOPITT's NIR radiances in the retrieval algorithm only involves the ratio of the 6D and 6A radiances, values of $R_0$, $R_t$, and $R_w$ for the 6A radiance are not optimized as they are for the other radiances. Thus, for 6A, $R_0$ is set to 1, while $R_t$ and $R_w$ are both set to 0.) V9 values are identical to the corresponding values used for V8 processing, except for the $R_0$ and $R_t$ values for 6D and 7D. V9 values of $R_0$ and $R_t$ were re-optimized for 6D because of the revised calibration scheme described in Sect. 2.1. Values of $R_0$ and $R_t$ were re-optimized for 7D due to the forward model corrections related to PMC modeling. The methods used to optimize the $R_0$ and $R_t$ values for 6D and 7D are described in Deeter et al. (2019). As indicated in Table 1, V9 radiance bias correction factors for 7D are smaller than the corresponding correction factors for V8, suggesting that the PMC model revisions in MOPFAS implemented for V9 resolved a substantial component of the discrepancy between observed and model-calculated radiances for Channel 7.
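As a rough illustration of how such a correction factor might be evaluated, the sketch below assumes that the parameterization is linear in the elapsed days and in the water vapor total column, i.e., $R_0 + R_t N_\mathrm{dys} + R_w \mathrm{WV}$. The function names are hypothetical, and no Table 1 coefficients are reproduced here; only the 6A settings stated in the text ($R_0 = 1$, $R_t = R_w = 0$) are used in the example.

```python
from datetime import date

REFERENCE_DATE = date(2000, 1, 1)  # elapsed days are counted from 1 January 2000


def days_since_2000(obs_date):
    """Number of elapsed days between 1 January 2000 and the observation date."""
    return (obs_date - REFERENCE_DATE).days


def radiance_correction_factor(r0, rt, rw, n_days, water_vapor_column):
    """Multiplicative correction factor, assuming a linear parameterization.

    water_vapor_column is the water vapor total column in molecules cm^-2.
    """
    return r0 + rt * n_days + rw * water_vapor_column


def apply_correction(simulated_radiance, factor):
    """Scale a model-simulated radiance by its correction factor."""
    return simulated_radiance * factor


# 6A radiance: R0 = 1 and Rt = Rw = 0, so the simulated radiance is unchanged.
n_days = days_since_2000(date(2019, 3, 1))
factor_6a = radiance_correction_factor(1.0, 0.0, 0.0, n_days, 5.0e22)
```

With the 6A settings, the factor is exactly 1 regardless of date or water vapor, which is consistent with the statement that the 6A parameters are not optimized.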

Cloud detection
Because the MOPITT radiative transfer model simulates radiances only in clear-sky conditions, MOPITT observations affected by clouds are not used in Level 2 retrieval processing. The clear/cloudy determination is performed by a cloud detection algorithm which involves both MOPITT's thermal-channel radiances and information from the Terra-MODIS (Moderate Resolution Imaging Spectroradiometer) cloud mask product (Warner et al., 2001; Francis et al., 2017). With respect to the MOPITT thermal-channel test, the ratio of the observed MOPITT Channel 7 Average radiance and the corresponding model-calculated value is compared to a predefined global threshold value. If the radiance ratio is less than the threshold value, that MOPITT observation is flagged as cloudy. For V9, the radiance ratio for each MOPITT retrieval is reported in the new diagnostic "MOPCld Rad Ratio".
The overall outcome of the MOPITT cloud detection algorithm for a particular retrieval is described by the "Cloud Description" diagnostic in the Level 2 files, which takes integer values from 1 to 6. The last class (6) was first introduced in the V7 product and was applied only to ocean scenes as a response to declining quality in the MODIS cloud mask. For the V9 product, two significant changes were implemented in the revised cloud detection algorithm. The first change is related to the interpretation of the MODIS cloud mask, whereas the second change concerns the treatment of observations deemed cloudy by the MODIS cloud mask but clear by the MOPITT thermal-channel test. Together, these changes significantly increase MOPITT retrieval coverage over land.
The MODIS cloud mask reports one of four possible outcomes for each MODIS 1 km pixel: Cloudy, Uncertain, Probably Clear, or Clear. An individual MOPITT pixel typically encloses ∼ 500 MODIS 1 km pixels. Prior to V9, the MOPITT cloud detection algorithm interpreted the Probably Clear and Clear outcomes as clear and treated the Cloudy and Uncertain outcomes as cloudy. If at least 95 % of the MODIS cloud mask pixels enclosed within a given MOPITT pixel indicated either Probably Clear or Clear, that MOPITT pixel was considered clear according to MODIS. For V9 processing, the MODIS cloud mask test was relaxed to treat Uncertain MODIS pixels as clear in the same manner as Clear and Probably Clear MODIS pixels. This change was motivated by the observation that such MODIS pixels can often be found in apparently cloudless but heavily polluted scenes.
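The effect of relaxing the MODIS cloud mask test can be sketched as follows. This is a simplified illustration of the clear-fraction logic described above, not the operational cloud detection code; the names and the synthetic pixel counts are hypothetical.

```python
V8_CLEAR_CLASSES = {"Clear", "Probably Clear"}
V9_CLEAR_CLASSES = V8_CLEAR_CLASSES | {"Uncertain"}  # Uncertain counted as clear in V9
MODIS_CLEAR_FRACTION_THRESHOLD = 0.95


def modis_clear_fraction(pixel_classes, clear_classes):
    """Fraction of MODIS 1 km pixels in a MOPITT pixel that count as clear."""
    if not pixel_classes:
        raise ValueError("no MODIS pixels matched to this MOPITT pixel")
    n_clear = sum(1 for c in pixel_classes if c in clear_classes)
    return n_clear / len(pixel_classes)


def modis_says_clear(pixel_classes, clear_classes):
    """True if at least 95 % of the enclosed MODIS pixels count as clear."""
    fraction = modis_clear_fraction(pixel_classes, clear_classes)
    return fraction >= MODIS_CLEAR_FRACTION_THRESHOLD


# Synthetic scene: 500 MODIS pixels, 40 of them flagged Uncertain.
scene = ["Clear"] * 450 + ["Uncertain"] * 40 + ["Cloudy"] * 10
clear_v8 = modis_says_clear(scene, V8_CLEAR_CLASSES)
clear_v9 = modis_says_clear(scene, V9_CLEAR_CLASSES)
```

In this synthetic scene, the clear fraction is 0.90 under the pre-V9 interpretation and 0.98 under the V9 interpretation, so only the latter passes the 0.95 threshold.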
For V8 and earlier MOPITT products, observations over land were typically discarded if the MODIS cloud mask indicated clouds. In V9, however, observations over land are only discarded if both the MODIS cloud mask and MOPITT radiance tests indicate the presence of clouds; this change was introduced earlier for observations over the ocean, beginning with V7 products. It allows MOPITT retrievals in cases where the MODIS cloud mask tests indicate clouds (or are ambiguous) while the MOPITT TIR radiances are consistent with clear-sky conditions. Consequently, this change should allow the retrieval of scenes for which clouds in the MOPITT field of view have a negligible effect on the MOPITT radiances. MOPITT retrievals for which the MODIS cloud mask considers the observation to be cloudy while the MOPITT thermal-channel test passes the observation as clear are assigned the Cloud Description index of 6 and can therefore be analyzed separately from retrievals where MODIS determined the scene to be clear. Prior to V9, this value for the Cloud Description index was only allowed for observations over the ocean.
Finally, a minor change was also made in the revised cloud detection algorithm regarding cloud index 4 (MOPITT clear, MODIS indicating low clouds). In the revised algorithm, this index is only applied to observations over the ocean, where low clouds are more reliably detected. Retrievals over land which would have been assigned a cloud index value of 4 in the V8 algorithm are assigned a cloud index value of 6 in V9. Thresholds for the MODIS cloud mask and MOPITT thermal-channel tests for V9 are unchanged relative to the values used for the MOPITT Version 8 product; i.e., the MODIS clear-sky fraction threshold is set to 0.95, and the MOPITT radiance ratio threshold is set to 1.00.
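The change in how the two cloud tests are combined over land can be summarized in a short sketch. It is deliberately simplified: the pre-V9 branch ignores the low-cloud (index 4) case, other Cloud Description values are not modeled, and all function names are hypothetical rather than taken from the operational software.

```python
MOPITT_RAD_RATIO_THRESHOLD = 1.00


def thermal_test_clear(rad_ratio):
    """MOPITT thermal-channel test: cloudy if the ratio is below the threshold."""
    return rad_ratio >= MOPITT_RAD_RATIO_THRESHOLD


def retain_land_observation(modis_clear, rad_ratio, v9_logic=True):
    """Decide whether a land observation is passed to retrieval processing.

    V9: discard only if both MODIS and the thermal test indicate clouds.
    Pre-V9 (simplified): discard whenever MODIS indicates clouds.
    """
    if v9_logic:
        return modis_clear or thermal_test_clear(rad_ratio)
    return modis_clear


def is_cloud_index_6(modis_clear, rad_ratio):
    """Index 6: MODIS indicates cloudy, thermal-channel test indicates clear."""
    return (not modis_clear) and thermal_test_clear(rad_ratio)
```

For example, a land observation with a MODIS-cloudy outcome and a radiance ratio of 1.02 would be discarded under the simplified pre-V9 branch but retained, with index 6, under the V9 logic.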
In addition to the Cloud Description diagnostic, a separate diagnostic is provided for each retrieval in the Level 2 product file to quantify the results of the various cloud tests applied to the set of MODIS Cloud Mask pixels matched to each MOPITT pixel. This diagnostic, which has been revised for V9, may be of use for analyzing potential retrieval biases associated with particular types of scenes. For V9, it is provided as the 12-element "MODIS Cloud Diagnostics" floating point vector.

Validation
Retrieval validation results for the V9 product are compared with corresponding results for the V8 product below. Validation results are based on quantitative comparisons of MOPITT retrieval products (CO VMR profiles and total columns) with in situ vertical profiles measured from aircraft. In situ measurements are assumed to be exact and representative of a defined region surrounding the sampling location. When making quantitative comparisons of MOPITT retrieved CO profiles and in situ profiles, the in situ data must be transformed to represent the effects of smoothing error and inclusion of a priori information (Deeter et al., 2003). Simulated retrievals based on in situ vertical profiles are calculated using the equation

$$\mathbf{x}_\mathrm{sim} = \mathbf{x}_a + \mathbf{A} (\mathbf{x}_\mathrm{true} - \mathbf{x}_a), \tag{2}$$

where $\mathbf{x}_\mathrm{sim}$ is the simulated retrieval, $\mathbf{A}$ is the retrieval averaging kernel matrix, $\mathbf{x}_a$ is the a priori profile, and $\mathbf{x}_\mathrm{true}$ is the true (in situ) profile. For consistency with the MOPITT retrieval algorithm, the vector quantities $\mathbf{x}_\mathrm{sim}$, $\mathbf{x}_a$, and $\mathbf{x}_\mathrm{true}$ are expressed in terms of log(VMR) rather than VMR. Retrieval error $\Delta\mathbf{x}$ is then calculated as

$$\Delta\mathbf{x} = \mathbf{x}_\mathrm{obs} - \mathbf{x}_\mathrm{sim}, \tag{3}$$

where $\mathbf{x}_\mathrm{obs}$ is the observed (retrieved) MOPITT profile corresponding to $\mathbf{x}_\mathrm{sim}$. Previously reported validation results based on a set of aircraft profiles over the Amazon Basin demonstrated that retrieval biases for the V8 TIR-only product and an experimental product incorporating the cloud detection revisions described in Sect. 2.3 were within about 3 % at all levels. However, since the disparities were similar to the estimated accuracy of the in situ measurements, the difference in biases was not considered significant. Below, we compare V8 and V9 validation results over a much larger set of aircraft profiles drawn from both a long-term measurement program operated by NOAA and several field campaigns.
While the validation results reported below are useful for estimating the magnitude of expected retrieval bias and drift, they should not be used as the basis for applying ad hoc corrections to the MOPITT data.
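The transformation of an in situ profile into a simulated retrieval, and the subsequent error calculation, can be sketched in a few lines. The sketch assumes the standard optimal-estimation smoothing relation, in which the simulated retrieval equals the a priori profile plus the averaging kernel matrix applied to the difference between the true and a priori profiles; all profiles are taken to be already expressed in log(VMR). The function names and the two-level example values are hypothetical.

```python
def simulate_retrieval(avg_kernel, x_apriori, x_true):
    """Return x_a + A (x_true - x_a) for profiles expressed in log(VMR).

    avg_kernel is a list of rows; row i holds the sensitivities of
    retrieval level i to the true profile at each level.
    """
    diff = [t - a for t, a in zip(x_true, x_apriori)]
    return [
        a + sum(a_ij * d for a_ij, d in zip(row, diff))
        for a, row in zip(x_apriori, avg_kernel)
    ]


def retrieval_error(x_obs, x_sim):
    """Retrieved profile minus simulated retrieval, level by level."""
    return [o - s for o, s in zip(x_obs, x_sim)]


# Two-level illustration with made-up log(VMR) values.
A = [[0.6, 0.2], [0.1, 0.5]]
x_a = [2.00, 1.90]
x_true = [2.10, 1.80]
x_sim = simulate_retrieval(A, x_a, x_true)
dx = retrieval_error([2.05, 1.87], x_sim)
```

Two limiting cases are a useful check: an identity averaging kernel returns the in situ profile itself, while an all-zero averaging kernel returns the a priori profile.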

NOAA aircraft profiles
V8 and V9 validation results reported below are based on a large set of CO vertical profiles measured by the NOAA Global Monitoring Laboratory using an airborne flask-sampling system followed by laboratory analysis (Sweeney et al., 2021). Typical in situ profiles are derived from a set of 12 flasks acquired as the aircraft descends. Reproducibility of the laboratory-measured CO dry-air mole fractions, which are measured by either a vacuum UV-resonance fluorescence spectrometer or a reduction gas analyzer, is better than 1 ppb. Total uncertainty values for the flask measurements increase monotonically with CO mole fraction from ∼ 1.2 ppb at 100 ppb to ∼ 3.5 ppb at 500 ppb (https://gml.noaa.gov/ccl/ccl_uncertainties.html, last access: 11 April 2022). All NOAA flask sample profiles were calibrated using the WMO CO X2014A scale (https://gml.noaa.gov/ccl/co_scale.html, last access: 11 April 2022). Results reported below are based on NOAA vertical profiles obtained from flights at 21 fixed sites (mainly over North America) between 2000 and 2020. The consistency, long record, and high accuracy characterizing this set of profiles are the basis for its use in optimizing the radiance bias correction factors and for quantifying long-term changes in MOPITT retrieval biases (Deeter et al., 2003, 2019). For matching MOPITT retrieved profiles with the NOAA in situ profiles, a maximum separation of 50 km was employed (relative to the center of the MOPITT 22 by 22 km footprint) and a maximum of 12 h was allowed between the time of the MOPITT observation and sampling time of the in situ data. In order to obtain a complete validation profile for comparison with MOPITT retrievals, each in situ profile was extended vertically above the highest-altitude in situ measurement using the CAM-chem chemical transport model (Lamarque et al., 2012) and then resampled to the standard pressure grid used for the MOPITT operational radiative transfer model. Validation results for the MOPITT 100 hPa retrieval level are not reported below, since in situ data are generally unavailable from aircraft for the atmospheric layer above this height.
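The matching criteria described above (50 km and 12 h) can be expressed as a simple test. The sketch below uses the haversine formula on a spherical Earth, which is an assumption made here for illustration; the distance calculation actually used in the validation processing is not specified in the text, and the function names are hypothetical.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_collocated(mopitt_lat, mopitt_lon, mopitt_time,
                  insitu_lat, insitu_lon, insitu_time,
                  max_km=50.0, max_hours=12.0):
    """True if the retrieval lies within max_km and max_hours of the profile."""
    close = great_circle_km(mopitt_lat, mopitt_lon, insitu_lat, insitu_lon) <= max_km
    dt = abs(mopitt_time - insitu_time)
    return close and dt <= timedelta(hours=max_hours)


# Hypothetical overpass and profile times and locations.
overpass = datetime(2015, 7, 1, 17, 30)
sampling = datetime(2015, 7, 1, 20, 0)
match = is_collocated(40.1, -105.0, overpass, 40.0, -105.0, sampling)
```

The `max_km` argument makes it straightforward to apply a different collocation radius to other profile sets.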
Validation results derived from the NOAA aircraft flask samples for the V8 and V9 TIR-only (V8T and V9T), NIR-only (V8N and V9N), and joint TIR-NIR (V8J and V9J) variants are compared in Fig. 2. Validation statistics for total column and alternating retrieval levels (surface, 800, 600, 400, and 200 hPa) are also summarized in Table 2. The left panel in Fig. 2 shows the mean retrieval bias versus pressure level and is obtained by calculating the mean log(VMR) retrieval error over all MOPITT retrievals matched to one of the NOAA in situ profiles according to the matching criteria described above. Retrieval error is calculated for each retrieval by subtracting the simulated in situ-based value from the actual retrieved value (Eq. 3). Retrieval bias values are converted from log(VMR) differences to percent as described in Deeter et al. (2017). The panel on the right side of Fig. 2 presents the retrieval bias drift at each pressure level as calculated using a least-squares fit to log(VMR) retrieval error as a function of time.
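The bias and drift statistics described above can be sketched with an ordinary least-squares straight-line fit. The text states only that a least-squares fit is used; the unweighted linear form, the function names, and the synthetic data below are assumptions made for illustration.

```python
def mean_bias(errors):
    """Mean retrieval error over all matched retrievals."""
    return sum(errors) / len(errors)


def linear_drift(times_yr, errors):
    """Ordinary least-squares fit of retrieval error versus time.

    Returns (slope, intercept); the slope is the bias drift per year
    when times_yr is given in (fractional) years.
    """
    n = len(times_yr)
    t_mean = sum(times_yr) / n
    e_mean = sum(errors) / n
    sxx = sum((t - t_mean) ** 2 for t in times_yr)
    sxy = sum((t - t_mean) * (e - e_mean) for t, e in zip(times_yr, errors))
    slope = sxy / sxx
    return slope, e_mean - slope * t_mean


# Synthetic log(VMR) errors with a known drift of 0.002 per year.
years = [2001.0, 2005.5, 2010.0, 2014.5, 2019.0]
errs = [0.01 + 0.002 * (t - 2001.0) for t in years]
drift, offset = linear_drift(years, errs)
```

Applied per pressure level, the slope gives the drift profile and `mean_bias` gives the corresponding bias profile, both still in log(VMR) units before conversion to percent.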
Overall retrieval bias values for the V9 TIR-only variant based on the NOAA profile set are generally in the range of a few percent and are comparable to corresponding V8 TIR-only values. The mean total column bias for V9, listed in Table 2, is slightly smaller than for V8 ($9.69 \times 10^{15}$ molecules cm$^{-2}$ vs. $1.33 \times 10^{16}$ molecules cm$^{-2}$). Retrieval bias drift for the V9 TIR-only variant is less than 0.2 % yr$^{-1}$ at all levels and is similar in magnitude to values for the V8 TIR-only variant. However, total column bias drift is somewhat larger for V9 than for V8 ($1.52 \times 10^{15}$ molecules cm$^{-2}$ yr$^{-1}$ vs. $1.17 \times 10^{15}$ molecules cm$^{-2}$ yr$^{-1}$).
As shown in Fig. 2, NOAA validation results for the V9 NIR-only variant are slightly worse than for the V8 NIR-only variant. Nevertheless, for the V9 NIR-only variant, retrieval bias is still less than 1 % at all levels and retrieval bias drift is generally less than 0.2 % yr$^{-1}$ at all levels. Total column bias and bias drift for the V9 NIR-only variant are $4.60 \times 10^{15}$ molecules cm$^{-2}$ and $3.27 \times 10^{15}$ molecules cm$^{-2}$ yr$^{-1}$, respectively, both of which are improved relative to the V8 NIR-only variant.
Retrieval biases for the V9 TIR-NIR variant are generally larger (in magnitude) than for the V9 TIR-only and NIR-only variants but are similar to values for the V8 TIR-NIR variant. Retrieval bias for the V9 TIR-NIR variant varies from −5.82 % at 500 hPa to 1.90 % at the surface. Total column bias is somewhat smaller for the V9 TIR-NIR variant compared to the V8 TIR-NIR variant ($1.60 \times 10^{16}$ molecules cm$^{-2}$ vs. $1.82 \times 10^{16}$ molecules cm$^{-2}$). Bias drift for the V9 TIR-NIR variant varies from −0.22 % yr$^{-1}$ at 700 hPa to 0.37 % yr$^{-1}$ at 200 hPa. V9 bias drift is smaller (in magnitude) than for the V8 TIR-NIR variant at the surface but is larger than V8 bias drift values in both the lower troposphere (600-900 hPa) and upper troposphere (200-300 hPa). Total column bias drift for V9 is also larger than for V8 ($-3.16 \times 10^{14}$ molecules cm$^{-2}$ yr$^{-1}$ vs. $-2.27 \times 10^{14}$ molecules cm$^{-2}$ yr$^{-1}$) but is smaller than total column bias drift values for both the V9 TIR-only and V9 NIR-only variants.
Standard deviation values are also listed in Table 2. Although this metric is often used to characterize random retrieval error, it is also influenced by limitations of the reference dataset used for validation. For example, the use of a single set of 12 flask measurements at discrete altitudes to fully represent the CO distribution sampled by MOPITT likely exaggerates the actual retrieval error for several reasons including (1) fine-scale CO vertical variability not represented by the relatively coarse set of in situ measurements, (2) horizontal CO variability within the co-location radius, (3) temporal CO variability during the delay between the in situ sampling and MOPITT overpass, and (4) the lack of in situ measurements at high altitudes (e.g., above 10 km). Thus, the standard deviation values listed in Table 2 should be interpreted only as an upper bound for the actual random retrieval error. Alternative methods for analyzing random retrieval error will be the topic of a future study.

Cloud index
As described in Sect. 2.3, a cloud index diagnostic (1-6) is included in the MOPITT Level 2 data files for each retrieved profile and indicates the manner in which that observation passed the cloud detection algorithm. V8 and V9 retrieval biases for each of the six cloud index subsets are analyzed in Appendix A. The analysis is based on the same NOAA profile set for which the aggregate validation statistics are shown in Fig. 2. Except for cloud index 1, which represents only ∼ 1 % of the analyzed data, results presented in Appendix A show that biases for the cloud index subsets are in the range of ±5 % for the V9 TIR-only results, ±2 % for the V9 NIR-only results, and ±10 % for the V9 TIR-NIR results. Compared to the biases for the non-subsetted NOAA validation results (shown in Fig. 2), bias differences associated with the different cloud index values are generally no more than 2 %-3 %. A previously reported analysis demonstrated that retrieval errors for the retrievals added because of changes to the cloud detection algorithm were consistent with the retrieval errors for retrievals resulting from the original cloud detection algorithm.

Field campaigns
The new V9 product was also separately validated using CO in situ profiles measured during the HIPPO (HIA-PER Pole-to-Pole Observations), ATom (Atmospheric Tomography Mission, https://espo.nasa.gov/atom, last access: 11 April 2022), and KORUS-AQ (Korea-United States Air Quality study, https://espo.nasa.gov/korus-aq, last access: 11 April 2022) field campaigns. Both the HIPPO and ATom Table 2. Summarized validation results for V8 and V9 TIR-only (V8T and V9T), NIR-only (V8N and V9N), and TIR-NIR (V8J and V9J) variants based on in situ data from NOAA aircraft validation sites. Total number of MOPITT retrievals used for validation is shown in parentheses in the leftmost column. Bias and standard deviation (SD) statistics for the total column are given in units of molecules cm −2 . Bias and SD for retrieval levels are expressed in percent (%). Total column drift values are provided in units of molecules cm −2 yr −1 . Drift for the retrieval levels is expressed in % yr −1 .  (Wofsy, 2011). ATom took place in four phases in 2016(Thompson et al., 2022. The KORUS-AQ campaign was conducted over the Korean peninsula (and vicinity) from April to June 2016 (Crawford et al., 2021). Since MOPITT retrievals over ocean are based solely on TIR radiances, validation results presented below for the HIPPO and ATom campaigns (which mainly produced over-ocean observations) are limited to the TIR-only variant. CO measurements used for validation for both HIPPO and ATom were performed with the QCLS (Quantum Cascade Laser Spectrometer) instrument (Santoni et al., 2014). CO measurements for KORUS-AQ were performed with the DACOM (Differential Absorption Carbon monOxide Measurement) instrument (Sachse et al., 1987). In-flight calibration for both the QCLS and DACOM instruments involves the use of compressed gas cylinders from NOAA's Global Monitoring Laboratory with known CO concentrations. 
For ATom and KORUS-AQ, the calibration of these reference cylinders from NOAA was based on the WMO CO X2014A scale, whereas for HIPPO the calibration was based on the prior X2004 scale. For the HIPPO, ATom, and KORUS-AQ CO measurements used herein, potential drift in the reference cylinder CO mole fractions (https://gml.noaa.gov/ccl/co_scale.html) was addressed by calibrating the reference cylinders at NOAA's Central Calibration Laboratory both before and after the field campaign and applying linear interpolation. For CO, the estimated precision of the QCLS instrument is 0.2 ppb (Santoni et al., 2014). For DACOM, the estimated precision is 1 ppb + 1 % of the measured CO mole fraction (Sachse et al., 1987). A comparison of CO measurements obtained by QCLS and NOAA flasks during HIPPO indicated a negative bias of 2 ppb for QCLS (Santoni et al., 2014).
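The drift correction described above amounts to a linear interpolation in time between the pre-campaign and post-campaign cylinder calibrations. The following is a minimal sketch of that idea only: the function names, the time units, and the numerical values are hypothetical and are not taken from the campaign processing software. The second helper simply evaluates the DACOM precision estimate of 1 ppb + 1 % quoted above.

```python
def interpolate_cylinder_co(t, t_pre, co_pre, t_post, co_post):
    """Linearly interpolate a reference cylinder CO mole fraction (ppb) to time t.

    t_pre/co_pre and t_post/co_post are the (hypothetical) pre- and
    post-campaign calibration times and values; times share any common unit.
    """
    if t_post == t_pre:
        raise ValueError("pre- and post-campaign calibration times must differ")
    weight = (t - t_pre) / (t_post - t_pre)
    return co_pre + weight * (co_post - co_pre)


def dacom_precision_ppb(co_ppb):
    """Estimated DACOM precision: 1 ppb plus 1 % of the measured mole fraction."""
    return 1.0 + 0.01 * co_ppb


# Hypothetical example: a cylinder drifting from 150.0 to 151.0 ppb over 100 days,
# evaluated at the midpoint of the campaign.
mid_campaign_value = interpolate_cylinder_co(50.0, 0.0, 150.0, 100.0, 151.0)
```

With these illustrative inputs the cylinder value assigned to day 50 lies halfway between the two calibrations, and a 200 ppb DACOM measurement would carry an estimated precision of 3 ppb.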
For matching MOPITT retrieved profiles with in situ profiles, a maximum collocation radius of 50 km was employed for the KORUS-AQ profiles (as for the NOAA profiles), whereas a value of 200 km was used for the HIPPO and ATom profiles. The larger radius for HIPPO and ATom was chosen since expected horizontal CO gradients are generally much smaller over the open ocean than over continental regions. The influence of collocation criteria on MOPITT validation statistics has been studied previously.
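The spatial part of the collocation criterion can be illustrated with a great-circle distance test. This sketch assumes a spherical Earth with a mean radius of 6371 km and uses the haversine formula; both choices, and the function names, are assumptions made for illustration, and any additional criteria applied in the actual validation (for example temporal matching) are not represented.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def collocated(retrievals, profile_lat, profile_lon, radius_km):
    """Return the (lat, lon) retrieval locations within radius_km of the in situ profile."""
    return [
        (lat, lon)
        for lat, lon in retrievals
        if great_circle_km(lat, lon, profile_lat, profile_lon) <= radius_km
    ]


# Hypothetical retrieval locations along the Equator.
retrievals = [(0.0, 0.0), (0.0, 1.0), (0.0, 3.0)]
land_matches = collocated(retrievals, 0.0, 0.0, 50.0)    # continental-style radius
ocean_matches = collocated(retrievals, 0.0, 0.0, 200.0)  # open-ocean-style radius
```

In this example the retrieval 1° of longitude away (about 111 km at the Equator) is rejected with the 50 km radius but accepted with the 200 km radius.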
V8 and V9 TIR-only validation results for HIPPO and ATom are compared in Fig. 3 and Tables 3 and 4. V9 retrieval biases for HIPPO vary over the range of ±6 %, while V9 retrieval biases for ATom vary from about −4 % to 2 %. With respect to total column, biases for the V9 TIR-only product for the NOAA, HIPPO, and ATom datasets (listed in Tables 2, 3, and 4) are 9.69 × 10¹⁵, −2.06 × 10¹⁵, and −1.22 × 10¹⁶ molecules cm⁻², respectively. For both HIPPO and ATom, the range of observed biases (over the vertical profile) is larger than for the NOAA TIR-only profiles. To some degree, the smaller biases for the NOAA profiles are likely a consequence of using those profiles to obtain optimal radiance bias correction factors, as described in Deeter et al. (2019). Differences in biases for the NOAA, HIPPO, ATom, and KORUS-AQ datasets could reflect either some type of geographically variable retrieval bias in the MOPITT retrievals or differences in the characteristics of the in situ measurements acquired during the field campaigns.

V8 and V9 validation results for KORUS-AQ are compared in Fig. 3 and Table 5. Differences between V8 and V9 retrieval biases for KORUS-AQ are generally similar to differences observed for the NOAA profile set. For example, in comparison to V8, V9 TIR-only biases are shifted to slightly greater values in the lower troposphere and to slightly smaller values in the upper troposphere. The range of bias values over the CO profile for V8 and V9 is also similar. Biases for the V8 and V9 TIR-only, NIR-only, and TIR-NIR variants for KORUS-AQ fall in the ranges ±4 %, ±2 %, and ±7 %, respectively. Total column biases for the V9 TIR-only, NIR-only, and TIR-NIR variants listed in Table 5 are somewhat larger than for the corresponding V8 variants (in contrast to the NOAA validation results).

Sampling characteristics
Case studies presented in Deeter et al. (2021) illustrated the increased retrieval yield in selected scenes resulting from the cloud detection revisions described in Sect. 2.3. This previous analysis focused on the performance of the revised cloud detection algorithm in heavily polluted regions. Retrievals added because of the cloud detection revisions were found to be physically consistent with the retrieved CO in the rest of the scene. Below, we analyze the improved retrieval coverage in V9 products at global and regional spatial scales.

Zonal means
Zonal totals of the numbers of daytime retrievals over land obtained for the V8 and V9 TIR-only variants for the month of July 2017 are presented in the left panel of Fig. 4. Each plotted point indicates the total monthly number of daytime retrievals in a latitude band that is 10° wide. The plot illustrates a sharp increase in the number of daytime retrievals over land for V9, especially over the Northern Hemisphere. Globally, the total number of daytime retrievals over land increased by 41 % from 9.84 × 10⁵ for V8 to 1.36 × 10⁶ for V9. Monthly totals of numbers of retrievals for V9 for other months which have been analyzed are typically 30 %-40 % larger than for V8. The panel on the right side of Fig. 4 compares V8 and V9 zonal-mean total column values for the same subsets of daytime retrievals over land analyzed in the left panel. The plot shows that the large relative increase in the number of daytime retrievals over land for V9 has a very weak effect on the monthly-average total column zonal means. V8T and V9T zonal means are within 2 % at most latitude bands. This finding suggests that the retrievals added in V9 by virtue of the cloud detection algorithm changes described in Sect. 2.3 may not strongly affect large-scale features in the MOPITT product.
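The zonal totals described above are counts of retrievals binned into 10° latitude bands. A minimal sketch of such a binning follows; the function name and the convention of labeling each band by its southern edge are assumptions made for illustration, and the input latitudes are hypothetical.

```python
from collections import Counter


def zonal_counts(latitudes, band_width_deg=10):
    """Count retrievals per latitude band; keys are the southern edge of each band."""
    counts = Counter()
    for lat in latitudes:
        if not -90.0 <= lat <= 90.0:
            raise ValueError(f"latitude out of range: {lat}")
        # Floor to the band's southern edge; fold lat == 90 into the top band.
        edge = int(lat // band_width_deg) * band_width_deg
        edge = min(edge, 90 - band_width_deg)
        counts[edge] += 1
    return dict(counts)


# Hypothetical retrieval latitudes (degrees north).
example_counts = zonal_counts([-5.0, 3.2, 9.9, 45.0, 90.0])
```

Applying the same function to the V8 and V9 retrieval latitudes for a given month would give two sets of band totals that can be compared band by band.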

Sampling frequency
The utility of MOPITT data for specific applications often depends on the temporal interval between observations. As illustrated below, a useful metric for this variable is retrieval sampling frequency. We define retrieval sampling frequency as the reciprocal of the mean sampling period, which is itself defined as the average number of days between retrievals acquired within a 1° latitude by 1° longitude grid cell, calculated over a specified period of observations. Thus, for a particular grid cell,

$$\nu_s = \frac{1}{\tau_s} = \frac{N_{\mathrm{obs}}}{L_{\mathrm{obs}}},$$

where $\nu_s$ is the retrieval sampling frequency, $\tau_s$ is the mean sampling period, $L_{\mathrm{obs}}$ is the total length of the observation period (in days), and $N_{\mathrm{obs}}$ is the number of days within that period which contain at least one MOPITT retrieval. In order to sample all longitudes equally, sampling frequency should be calculated over periods of observations equal to integral multiples of Terra's 16 d orbital repeat cycle. Maps of daytime retrieval sampling frequency for V8 and V9 retrievals for South America are compared in Fig. 5. Retrieval sampling frequency was calculated for the period between 1 September and 2 October 2017, spanning two complete Terra orbital repeat cycles. No filtering was applied with respect to cloud index or any other parameter. Sampling frequency over oceanic grid cells, which is not significantly different for the two cloud detection algorithms, is not shown. Grid cells for which the sampling frequency is exactly 0 (meaning that no retrievals were acquired over the entire 32 d observation period) are indicated by a cross covering the cell.
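Under the definitions given above, the sampling frequency for a grid cell is the number of days containing at least one retrieval divided by the length of the observation period, and the mean sampling period is its reciprocal. The sketch below illustrates this for a single grid cell; the function name, the 0-based day indexing, and the example inputs are hypothetical.

```python
def sampling_frequency(observation_days, period_length_days):
    """Return (nu_s, tau_s) for one grid cell.

    observation_days: iterable of 0-based day indices within the period on
    which at least one retrieval fell in the cell; repeated days count once.
    nu_s is in d^-1 and tau_s in d (infinite if the cell was never observed).
    """
    if period_length_days <= 0:
        raise ValueError("period_length_days must be positive")
    days = {d for d in observation_days if 0 <= d < period_length_days}
    n_obs = len(days)
    nu_s = n_obs / period_length_days
    tau_s = float("inf") if n_obs == 0 else period_length_days / n_obs
    return nu_s, tau_s


# Hypothetical cell observed on 8 distinct days of a 32 d period
# (two retrievals fell on day 0, which still counts as one day).
nu_s, tau_s = sampling_frequency([0, 0, 4, 8, 12, 16, 20, 24, 28], 32)
```

A 32 d period is used in the example because it spans two 16 d Terra orbital repeat cycles, consistent with the recommendation above. A cell with no retrievals yields a sampling frequency of exactly zero.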
As shown in Deeter et al. (2021), increased sampling frequency for V9 results from both of the cloud detection algorithm revisions described in Sect. 2.3. For V8 results shown in the left panel, sampling frequency varies widely from zero in much of the extreme northern, easternmost, and southwestern regions of South America to ∼0.3 d⁻¹ in parts of eastern South America and an area of western South America between 30 and 20° S. For the V9 product, shown in the right panel, improved retrieval sampling frequency is indicated over most of the continent but is most obvious in the regions where the V8 sampling frequency is the poorest, e.g., regions north of 5° S. Over this region, the mean sampling frequency increases by 127 %, from 0.088 to 0.20 d⁻¹. Over the entire continent, the number of grid cells for which the retrieval sampling frequency is exactly zero decreases sharply from 62 to 2.

Substantial improvements in sampling frequency for V9 are also observed for North America and Asia. V8 and V9 sampling frequency maps for North America were calculated for the period from 1 January to 1 February 2017 and are shown in Fig. 6. Sharply increased sampling frequency is evident over much of Canada and over much of the eastern United States where V8 sampling frequency is near zero. Improved sampling for V9 over Canada was found independently to be related to added retrievals in scenes with low clouds (Marey et al., 2022). For Asia, V8 and V9 sampling frequency maps were also calculated for the period from 1 January to 1 February 2017 and are shown in Fig. 7. Increased sampling frequency is apparent over much of the continent, particularly western China, northeastern China, and Mongolia.

Table 5. Summarized validation results for V8 and V9 TIR-only (V8T and V9T), NIR-only (V8N and V9N), and TIR-NIR (V8J and V9J) variants based on in situ data from the KORUS-AQ field campaign.

Level 3 products
The beneficial effects of the cloud detection revisions are also readily apparent in the gridded MOPITT Level 3 monthly product, as shown in Fig. 8. The top row in this figure compares V8 and V9 TIR-NIR gridded monthly-mean daytime CO total column values for eastern China for January 2010. Empty grid cell values, indicated in white, are much more common in the V8 product than in the V9 product. The bottom panel in the figure presents a map of the fractional difference derived from the top-row panels. This map demonstrates that over a heavily polluted region such as the North China Plain, monthly-mean total column values in the V9 product may be larger than corresponding V8 values by 20 % or more. This effect is due to the tendency of heavy aerosol loading to lead to the Uncertain outcome for the MODIS cloud mask, resulting in the exclusion of such scenes in the MOPITT V8 product. Thus, CO monthly means in the V9 product should be more accurate than for V8 because retrievals are averaged over a wider and more complete range of pollution levels.
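A fractional-difference map of the kind described above can be sketched as a cell-by-cell comparison of two gridded fields in which empty cells are propagated rather than filled. The exact definition used for the figure is not stated here; this sketch assumes (V9 − V8) / V8, represents empty grid cells as None, and uses hypothetical total column values.

```python
def fractional_difference(v8_grid, v9_grid):
    """Cell-by-cell (V9 - V8) / V8; None where either product has no data."""
    result = []
    for row8, row9 in zip(v8_grid, v9_grid):
        out_row = []
        for a, b in zip(row8, row9):
            if a is None or b is None or a == 0:
                out_row.append(None)  # cannot compare: a cell is empty
            else:
                out_row.append((b - a) / a)
        result.append(out_row)
    return result


# Hypothetical 2 x 2 monthly-mean total columns (molecules cm^-2).
v8 = [[2.0e18, None], [2.5e18, 3.0e18]]
v9 = [[2.4e18, 2.2e18], [2.5e18, 3.6e18]]
diff = fractional_difference(v8, v9)
```

Cells that are empty in V8 but filled in V9 remain undefined in the difference map, so the map reflects only cells present in both products.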

Conclusions
Various aspects of the MOPITT calibration methods and retrieval algorithm have been revised since the instrument became operational in 2000. For the most recently released Version 9 products, significant revisions were made to the NIR calibration scheme and to the cloud detection algorithm. The new NIR calibration method was shown to reduce an apparent discontinuity in NIR-only retrievals for dates just before and just after the annual hot calibration/decontamination procedure. This revision should improve the temporal consistency of both the NIR-only and TIR-NIR products. The revised cloud detection algorithm allows retrievals in ambiguous situations (with respect to cloudiness) resulting in an increase in large-scale retrieval coverage over land of ∼ 30 %-40 % compared to the V8 product. Validation results based on aircraft in situ profiles indicate that V9 product retrieval biases are typically in the range of ±5 % and are generally comparable to results for the V8 product.
The improved retrieval coverage and sampling frequency for V9 should add value to the MOPITT product in a wide variety of applications. For example, more frequent retrievals in CO source regions, such as the fire-prone Amazon Basin and heavily industrialized North China Plain, should lead to more accurate emissions estimates using inverse modeling methods. For visualizing CO distributions using monthly-mean maps, the new product is more statistically robust and has many fewer gaps due to missing data. Moreover, heavily polluted regions should be more accurately represented in such maps since the previous cloud detection algorithm tended to exclude the most heavily polluted scenes. Finally, the increased retrieval coverage should lead to better statistics when validating other satellite products.
Appendix A: Cloud-index-subsetted validation results
NOAA V8 and V9 TIR-only validation results subsetted by cloud index value (1-6) are shown in Fig. A1 and are listed in Table A1. The numbers of V8 and V9 retrievals within each subset are indicated in the figure legend and in the leftmost column of the table. Corresponding results for the NIR-only and TIR-NIR products are presented in Figs. A2 and A3 and Tables A2 and A3. Cloud index values are defined in Sect. 2.3. A comparison of the numbers of V8 and V9 retrievals in Tables A1, A2, and A3 indicates that the large majority of added retrievals in V9 (not present in the V8 product) are either assigned cloud index 2 (MODIS clear, MOPITT clear) or 6 (MODIS cloudy, MOPITT clear).

Figure A1. Cloud-index-subsetted validation results for the V8 and V9 TIR-only variants using the NOAA profile set. Numbers in parentheses in the legend indicate the number of retrievals within the subset for the corresponding cloud index value for the V8 and V9 products.

Figure A2. Cloud-index-subsetted validation results for the V8 and V9 NIR-only variants using the NOAA profile set. Numbers in parentheses in the legend indicate the number of retrievals within the subset for the corresponding cloud index value for the V8 and V9 products.

Figure A3. Cloud-index-subsetted validation results for the V8 and V9 TIR-NIR variants using the NOAA profile set. Numbers in parentheses in the legend indicate the number of retrievals within the subset for the corresponding cloud index value for the V8 and V9 products.
For the V9 TIR-only results, retrieval biases for cloud index subsets 2-6 fall in the range of ±5 %. (The cloud index 1 subset, composed of retrievals for which the MODIS cloud mask was unavailable, represents only about 1 % of the entire set of retrievals analyzed and may not be statistically significant.) Corresponding bias ranges for the V9 NIR-only and TIR-NIR variants are ±2 % and ±10 %, respectively. In relation to the cloud index 2 subset (MODIS clear, MOPITT clear), which represents the retrieval subset most confidently cloud-free, biases for the cloud index 6 subset (MODIS cloudy, MOPITT clear) are within 2 % at all levels. Similarly, differences between the index 2 and index 6 subsets for the NIR-only and TIR-NIR variants are within 1 % and 3 %, respectively. Thus, comparing retrieval biases for cloud index 2 and 6, it appears that the outcome of the MODIS cloud mask test does not significantly affect retrieval bias. However, the importance of the MODIS cloud mask test may be greater in specific contexts not represented in the validation results, such as nighttime retrievals over land. Since bias differences associated with the different cloud index values are generally similar in magnitude to bias variations over the vertical profile, validation results shown in Figs. A1, A2, and A3 do not imply a clear benefit to filtering based on cloud index.

(Sweeney et al., 2021). In situ data from the HIPPO, ATom, and KORUS-AQ campaigns can be obtained through https://doi.org/10.3334/CDIAC/HIPPO_010 (Wofsy et al., 2017), https://doi.org/10.3334/ORNLDAAC/1932 (Commane et al., 2021), and https://www-air.larc.nasa.gov/cgi-bin/ArcView/korusaq#DISKIN.GLENN/ (Diskin, 2017), respectively.
Author contributions. MD led the development and evaluation of the V9 product and wrote the manuscript. DM integrated the algorithm revisions into the prototype and operational processing software and managed the data processing. GF implemented the revisions made to the operational radiative transfer model. SMA managed the acquisition and processing of the in situ datasets used for validation. MD, GF, DM, JG, SMA, HW, DZ, and JD participated in the development and testing of the revised NIR calibration scheme. RC, GD, and KM provided expertise with the in situ datasets used for validation. All authors reviewed the manuscript.
Acknowledgements. The NCAR MOPITT project is supported by the National Aeronautics and Space Administration (NASA) Earth Observing System (EOS) program. The National Center for Atmospheric Research (NCAR) is sponsored by the National Science Foundation. We also acknowledge the Canadian Space Agency, which provided the MOPITT instrument and continues to support instrument operations. In situ datasets used for validation were provided by NOAA's Global Monitoring Laboratory and its partners, as well as participants in the HIPPO, ATom, and KORUS-AQ field campaigns.
Financial support. This research has been supported by the National Aeronautics and Space Administration (grant no. 80GSFC19C0032).
Review statement. This paper was edited by Dietrich G. Feist and reviewed by two anonymous referees.