Validation of MODIS 3 km land aerosol optical depth from NASA ’ s EOS Terra and Aqua missions

In addition to the standard resolution product (10 km), the MODerate resolution Imaging Spectroradiometer (MODIS) Collection 6 (C006) data release included a higher resolution (3 km). Other than accommodations for the two different resolutions, the 10 and 3 km Dark Target (DT) algorithms are basically the same. In this study, we perform global validation of the higher-resolution aerosol optical depth (AOD) over global land by comparing against AErosol RObotic NETwork (AERONET) measurements. The MODIS–AERONET collocated data sets consist of 161 410 high-confidence AOD pairs from 2000 to 2015 for Terra MODIS and 2003 to 2015 for Aqua MODIS. We find that 62.5 and 68.4 % of AODs retrieved from Terra MODIS and Aqua MODIS, respectively, fall within previously published expected error bounds of ±(0.05+ 0.2×AOD), with a high correlation (R = 0.87). The scatter is not random, but exhibits a mean positive bias of ∼ 0.06 for Terra and ∼ 0.03 for Aqua. These biases for the 3 km product are approximately 0.03 larger than the biases found in similar validations of the 10 km product. The validation results for the 3 km product did not have a relationship to aerosol loading (i.e., true AOD), but did exhibit dependence on quality flags, region, viewing geometry, and aerosol spatial variability. Time series of global MODIS–AERONET differences show that validation is not static, but has changed over the course of both sensors’ lifetimes, with Terra MODIS showing more change over time. The likely cause of the change of validation over time is sensor degradation, but changes in the distribution of AERONET stations and differences in the global aerosol system itself could be contributing to the temporal variability of validation.


Introduction
The MODerate resolution Imaging Spectroradiometer (MODIS) sensors, on board the Earth Observing System (EOS) Terra and Aqua satellites, have been providing observations of Earth and the atmosphere for almost two decades (Salomonson et al., 1989).These data have been used to create a long-term set of atmospheric aerosol properties including aerosol optical depth (AOD -a measure of aerosol loading in the total atmospheric column) (Kaufman et al., 1997;Levy et al., 2013).In particular, the Dark Target (DT) algorithms applied to MODIS observations provide aerosol retrievals over both ocean and dark vegetated land surfaces (Kaufman et al., 1997;Remer et al., 2005;Levy et al., 2007aLevy et al., , b, 2013)).The DT products were designed with climate applications in mind and have been used to address a wide variety of geophysical science questions including the role of aerosols in climate-relevant processes (Kaufman et al., 2002;Christopher and Zhang, 2002;Yu et al., 2006), cloud and precipitation modifications (Koren et al., 2009(Koren et al., , 2012;;Yuan et al., 2011;Oreopoulos et al., 2016), and long-range transport of aerosols (Kaufman et al., 2005;Yu et al., 2012).Users have even applied the DT aerosol product to address needs for monitoring, evaluating, and Published by Copernicus Publications on behalf of the European Geosciences Union.P. Gupta et al.: Validation of MODIS 3 km land aerosol optical depth forecasting air quality (al Saadi et al., 2005;Gupta et al., 2009;Van Donkelaar et al., 2015).
The MODIS DT algorithm produces an aerosol product, over land and ocean, at a nominal 10 × 10 km 2 spatial resolution (referred to as "10 km" herein).This spatial resolution permits much selectivity in choosing which MODISmeasured reflectance pixels at 0.5 × 0.5 km 2 resolution to include in the retrieval, and generally produces smooth and accurate fields of AOD and other aerosol parameters (Remer et al., 2012).By allowing the algorithm to discard up to 90 % of the available pixels and still produce a high-quality aerosol product, the algorithm avoids marginal situations unfavorable for an aerosol retrieval such as cloud fringes, fragments, and shadows, as well as land surfaces that do not agree with algorithm assumptions (Remer et al., 2012).The 10 km product has undergone lengthy evaluation and validation, updated after each major algorithm modification (Ichoku et al., 2002;Chu et al., 2002;Remer et al., 2002Remer et al., , 2005;;Russell et al., 2007;Levy et al., 2005Levy et al., , 2010)).Some of this evaluation was global in nature, while some was local to a particular field experiment, but all concerned the 10 km MODIS DT aerosol product.
For climate studies, the initial intention of the algorithm, 10 km spatial resolution was sufficient to characterize global and regional aerosol loading.However, as the community expanded the use of MODIS AOD to a wide variety of purposes, need arose for a finer-resolution product, and a nominal 3 × 3 km 2 resolution (referred to as "3 km" herein) product was introduced as part of MODIS Collection 6 (Levy et al., 2013;Remer et al., 2013).The product is termed MYD04_3K for 3 km resolution aerosol parameters derived from the Aqua MODIS sensor and MOD04_3K for those derived from Terra MODIS.These products are produced operationally, over land and ocean, and the entire data records of Terra and Aqua have been reprocessed, creating a data record of almost two decades.
Before becoming operational, Remer et al. (2013) tested the algorithm by comparing 6 months of global 3 km retrievals from Aqua MODIS against available ground truth, while other independent studies (Munchak et al., 2013;Nichol and Bilal, 2016;He et al., 2017 and others) have done subsequent evaluation of the product regionally and locally.These limited comparisons suggested that the new AOD product would be sufficiently accurate to provide useful information and new perspective to the aerosol community.However, the studies also suggested that the finerresolution product might introduce additional noise and/or bias that the original coarser-resolution product successfully avoided.Now that the multi-decadal 3 km product is operational and available publicly, it is time to perform a comprehensive evaluation of this finer-resolution MODIS DT aerosol product.We present here the results of an analysis of a comparison of the global long-term MODIS 3 km product with collocated AErosol RObotic NETwork (AERONET) (Holben et al., 1998) observations.

The MODIS DT 3 km aerosol retrieval over land
The MODIS DT algorithm and products are described in detail in Levy et al. (2013) and also in the MODIS DT online Algorithm Theoretical Basis Document (https://darktarget. gsfc.nasa.gov/atbd/overview,last access: 1 January 2018).In summary, to retrieve aerosol parameters over land, the algorithm makes use of the reflectances measured in three of MODIS' 36 spectral channels, 0.47, 0.65, and 2.1 µm (Levy et al., 2007a).These are provided in nominal spatial resolution of 0.5 × 0.5 km 2 (at nadir), while other channels (some at 0.5 × 0.5 km 2 , some at 1 × 1 km 2 resolution) are used for identifying appropriate surfaces for retrieval and for masking clouds, snow, and ice.While the 10 km standard product begins with an aggregation of 20 × 20 = 400 native-resolution pixels, the 3 km aerosol retrieval box starts with an aggregation of 6 × 6 = 36 such pixels.Native pixels are removed in order to retain only the ones most appropriate for a darktarget, over-land retrieval.Native pixels tagged as too bright for the DT algorithm, or identified as containing cloud, surface water, snow, or ice, are removed from the aggregation.The remaining native pixels are sorted from darkest to brightest, and the darkest 20 % and brightest 50 % of all remaining native pixels are removed as well.For the 3 km retrieval this means that at most 12 native pixels will remain and likely fewer.For minimum statistical robustness, the 3 km algorithm requires at least 5 native pixels (out of the initial 36).If there are insufficient native pixels (e.g < 5), output parameters are assigned fill values and no retrieval is attempted.Based on the aggregation and filtering, it is likely that there will be native pixels used by the 3 km retrieval that would have been discarded by the product with coarser (10 km) standard resolution (Remer et al., 2013).The remaining native pixels are averaged, leading to a single set of spectral reflectance values that drives the aerosol retrieval.Except for downstream decisions based on number of native pixels used, the 10 and 3 km retrievals proceed identically.
The retrieval uses a look-up table (LUT) procedure in which the LUT is constructed a priori of simulated top-ofatmosphere reflectances.LUT calculations use assumptions of aerosol optical properties based on AERONET inversions (Dubovik and King, 2000) and radiative transfer.The surface reflectance is constrained by assuming an empirically based relationship between reflectance at 2.1 µm and the reflectances at 0.65 and 0.47 µm (Levy et al., 2007a(Levy et al., , b, 2013)).The algorithm finds the AOD that minimizes the differences between the MODIS-observed mean spectral reflectances and the simulated reflectance values of the LUT.The primary output of the retrieval is the AOD at 0.55 µm.Using a series of tests, the algorithm assigns a quality assurance flag (QAF) of either 3, "good quality", or 0, "bad quality", to the retrieval.These values can be interpreted as "confidence" in the aerosol retrieval (whether the retrieval proceeded nominally and whether there are enough native-resolution pixels).For quantitative use of the 10 km product, the MODIS DT Atmos.Meas. Tech., 11, 3145-3159, 2018 www.atmos-meas-tech.net/11/3145/2018/team has recommended limiting use to retrievals designated with QAF = 3.In the 10 km product there must be ≥ 51 native pixels surviving the selection process out of a possible 120 to reach this QAF level.The similar ratio for the 3 km product is ≥ 5 surviving pixels out of a possible 12.Fewer pixels would provide insufficient statistical information for confidence in an aerosol retrieval.Therefore, for the 3 km retrieval, a situation with fewer than 5 native pixels automatically receives the designation of "poor quality" (QAF = 0).For this resolution product there are no intermediate quality levels between 3 and 0 over land retrievals (Remer et al., 2013).
3 Data sets

MODIS 3 km AOD
The primary data set of this study is the Collection 6 MODIS DT retrieved AOD at 3 km spatial resolution, derived from Terra reflectances (MOD04_3K), or Aqua reflectances (MYD04_3K), as described in Sect. 2. These are publicly available and can be downloaded from https: //ladsweb.modaps.eosdis.nasa.gov/(last access: 1 January 2018).Of the products in the data sets, we analyze only the AOD at 0.55 µm.
Applying identical algorithms to two different sensors does not guarantee identical results (Levy et al., 2015).The two MODIS DT data sets, one from Terra MODIS and one from Aqua MODIS, must be addressed separately as individual and independent products, even though they have been created from identical algorithms with no specific tuning of parameters for each sensor.While Terra MODIS and Aqua MODIS began as near-identical sensors, they have evolved over their lifetimes to develop their own instrumental characteristics.For example, some detectors in Aqua's detector array at some wavelengths have died, resulting in fewer available reflectance pixels at those wavelengths.Terra's detector array has not lost any detectors.At the same time, we have seen drift in some of Terra's wavelengths, resulting in measurable artificial trends in the Terra MODIS aerosol products (Levy et al., 2013;Sayer et al., 2015;Lyapustin et al., 2014).The most flagrant of those Terra MODIS trends have been mitigated by aggressive radiometric calibration (Toller et al., 2013), which has been applied in creating the C006 DT products.Note that some projects (e.g., Lyapustin et al., 2014;Sayer et al., 2015) have since introduced additional calibration drift mitigation.However, the DT retrieval has not applied these strategies.In this work, we will analyze the C006 aerosol products from the two MODIS sensors, independently, to provide users with clear information on the strengths and limitations of each one.

AERONET AOD
AERONET is NASA's global ground network of CIMEL sun-sky radiometers that make measurement of directly transmitted solar light and scattered sky light at several wavelengths during daylight hours (Holben et al., 1998).In this work, only the direct sun measurements will be used.The AERONET group processes these spectral measurements to derive AOD at the wavelengths corresponding to the direct sun measurements.The AERONET spectral AOD product is a community standard for satellite-derived AOD validation, given that AERONET's AOD uncertainty of 0.01-0.02(Eck et al., 1999) is sufficiently more accurate and precise than can be expected by any satellite retrieval.The typical temporal frequency of direct sun measurements is every 15 min.The network consists of hundreds of stations, located globally, across all continents and in a wide variety of aerosol, meteorological, and surface type conditions.Only stations that sufficiently represent land areas will be used here, which means we are not comparing with observations taken on small islands, ocean platforms, or mobile ships.
The configuration of the spectral bands varies, but typically is centered at 0.34, 0.38, 0.44, 0.50, 0.67, 0.87, and 1.02 µm.Here we use a quadratic log-log fit (Eck et al., 1999) to interpolate AERONET AOD to 0.55 µm to match the primary MODIS AOD product.AOD data from AERONET are reported for three different quality levels: unscreened (Level 1.0), cloud screened (Level 1.5), and cloud screened and quality assured (Level 2.0).We will only use Version 2.0 Level 2.0 AERONET AODs in this study.

Spatial and temporal collocation
The validation procedure requires calculating the spatiotemporal statistics of a collocated MODIS-retrieved and AERONET-measured AOD pair (Ichoku et al., 2002;Petrenko et al., 2012;Munchak et al., 2013;Remer et al., 2012).We have created a collocated data set (CDS) of AOD at 0.55 µm from both Terra MODIS and Aqua MODIS, matched with AERONET, for nearly the entire mission (2003-2015for Aqua and 2000-2015 for Terra).From here on, we use the term "pixels" to refer to the MODIS retrieval product (e.g., 3 or 10 km resolution); if referring to the native MODIS pixel resolution (e.g., 0.5 km) we will denote it as "native pixel".
In previous validation studies of the standard 10 km product the spatial statistics were based on groupings of either 5 × 5 MODIS product pixels (∼ 50 × 50 km 2 box) centered on the AERONET station (Ichoku et al., 2002;Levy et al., 2010) or all the MODIS product pixels within a 27.5 km radius around the AERONET station (Petrenko et al., 2012).These spatial statistics would be matched with the temporal statistics of ±30 min of AERONET observations centered at satellite overpass time.These large spatial collocation boxes will not properly test the accuracy of finer-resolution satellite products to represent small-scale aerosol gradients.Therefore, Remer et al. (2013) and Munchak et al. (2013) moved to a 7.5 km radius and ±30 min satellite overpass.The 7.5 km radius encompasses roughly 25 AOD pixels at nadir, which is analogous to the number of product pixels used with the coarser-resolution product.In this study spatial statistics are calculated from all MODIS product pixels falling within a box of 0.15 • × 0.15 • (latitude × longitude) centered over an AERONET location.Except for polar regions, this is similar to a 15 × 15 km 2 box or 7.5 km radius at nadir.Temporal statistics are calculated from all AERONET observations of AOD within ±30 min of satellite overpass.
As recommended by the MODIS DT science team (Levy et al., 2010), unless otherwise specified, only AOD pixels with quality assurance flag "very good" (QAF = 3) are included in averaging over the AERONET sites.To be consistent with previous validation exercises (Levy et al., 2010), we have retained the CDSs only when there were at least 5 MODIS product pixels (out of a possible 25) and 2 AERONET measurements (out of a possible 2-4).The CDS consists of 574 AERONET stations with 90 162 collocated pairs for Terra MODIS and 71 248 collocated pairs for Aqua MODIS. Figure 1 shows the locations of these stations and the color-coding represents the number of collocated AERONET-MODIS AOD pairs over the station.Thus, a data set (i.e., CDS) of collocated MODIS-AERONET pairs of AOD at 0.55 µm is created that can be organized and subsampled in any number of configurations.In any subsample, or for the entire data set, these ordered pairs can be plotted, one against the other, to create a scatterplot, and collocation statistics are calculated.We will use the following statistical parameters to quantify how well the MODIS retrievals match their collocated AERONET counterparts (Hyer et al, 2011): slope of the linear regression line, and root mean square error (RMSE).
The percentage of collocations falling within expected error (EE) is calculated as The error ratio (ER) is calculated as The coefficients in the EE equation were determined from evaluation of the 3 km product over the 6 months of Aqua data analyzed by Remer et al. (2013).Those limited results suggested that expected error bounds should be broadened to the values seen in Eq. ( 2) from those derived for the 10 km product (EE = ±[0.05+ 0.15 × AERONET AOD]).
The number of collocations (N) is another parameter used to evaluate the 3 km retrieval in the CDS.

Global statistics
We first compare MODIS 3 km AOD retrievals against collocated AERONET values (Fig. 2), for both the recommended "high-quality" retrievals (QAF = 3) and for all the retrievals, regardless of quality, keeping Terra and Aqua results separate.Note that the 3 km product only tags data as either high quality or low quality.Table 1 presents the statistical parameters corresponding to this analysis while considering various combinations of QAFs.
Globally, there is strong correlation between MODIS 3 km AOD and collocated AERONET equivalents.However, there is scatter and a positive bias to the retrievals, more so for Terra than Aqua even though the correlation is similar between the satellites.The retrieval quality identified by the algorithm corresponds well to the product accuracy as determined by collocation with AERONET observations.Algorithm-identified high-quality retrievals (QAF = 3) Table 1.Global statistics of comparison between MODIS 3 km AOD at 0.55 µm retrievals and collocated AERONET observations for both Terra and Aqua, corresponding to three quality assurance categories (QAF = 0 for poor quality, QAF = 0, 3 for all quality, and QAF = 3 for high quality).The data used for the 10 km validation do not represent the same time period as 3 km.1).However, the high-quality data set contains about 20 % fewer retrievals than does the total data set with retrievals of all quality levels included.Figure 3 shows that the differences between Terra and Aqua in how they match AERONET values are much more apparent than the differences between QAF levels of the same satellite sensors.We note that only the high-quality (QAF = 3) Aqua 3 km retrievals meet expectations in terms of falling within the standard expected error bars (Remer et al., 2012;and Eq. 2).Table 1 also shows the corresponding validation statistics for the 10 km product for QAF = 3, distinguishing between Terra and Aqua.The 10 km product, as expected, more closely matches AERONET values, having higher correlation, lower bias, and RMSE and producing more retrievals that fall within expected error bounds than does the 3 km product.We note that even in the 10 km validation statistics, mean bias for Terra is 0.03 higher than for Aqua, which is the same difference between sensors as found for the 3 km product.The results in Table 1 confirm the conclusion of Remer et al. ( 2013) that the 3 km product is less accurate than the standard 10 km product.The remainder of the paper will be devoted to exclusively analyzing the differences between the 3 km product and AERONET, without further reference to the standard 10 km product.

Regional statistics
The accuracy of the 3 km AOD retrievals will be regionally and locally specific, depending on how well retrieval assumptions of surface and aerosol optical properties match actual conditions.Local cloud conditions also may introduce uncertainty into the retrieval.Furthermore, the spatialtemporal variability of the area may create biases in the collocation methodology that depends on assumptions of aerosol homogeneity.Here we investigate how well the MODIS 3 km product matches AERONET over individual AERONET stations.
For the regional and local analyses, we will use only QAF = 3 retrievals and calculate the same collocation statistics for each station individually.Figure 4 plots the values for correlation coefficient, mean bias, percentage within expected error, and RMSE for each station that reported at least 100 collocations over the entire time series.In general, the MODIS 3 km retrievals show high correlations over much of the northern midlatitudes where there are AERONET stations in abundance.Correlation is weaker at some stations in California and the arid southwest of North America, in the Caribbean, Central America, insular SE Asia, Australia, and especially in southern South America.These are locations where the standard 10 km product also shows poor agreement with AERONET (https://darktarget.gsfc.nasa.gov/validation/maps, last access: 1 January 2018).In most of these regions, like the arid southwest of North America, the surface properties do not agree with the assumptions used in the global retrieval, thereby introducing error in the retrieval.
Not all stations with strong correlations exhibit small mean biases.For example, MODIS 3 km retrievals severely under predict AOD in the stations of West Africa, falling well below expected error there, even though those stations report high correlations with AERONET.Such a validation pattern is symptomatic of incorrect assumptions of aerosol properties.In West Africa, the interplay of heavy dust and heavy smoke, often occurring simultaneously in the atmospheric column at the same time, creates difficult situations to properly model in the aerosol retrieval.Likely the poor agreement between MODIS and AERONET there can be attributed to this difficulty.Stations in Australia show relatively small mean biases and high percentages meeting expectations, despite poor correlations.This apparent contradiction suggests that the poor correlations are the result of small dy- namic range in the scatterplots that occur when AOD is consistently low.
In Fig. 4, we see the local nature of the validation statistics.Stations in close proximity to each other sometimes report very different statistics.For example, the stations clustered across northern India and those in an array across central South America (Brazil) range from strong positive to negative mean biases and RMSE from 0.05 to 0.20, even though these groupings of stations will fall within the same region as defined in Fig. 5.This is apparent in almost any region.Some of this variability may be due to differences in the temporal extent of the AERONET record at each individual station, so that even if stations are in close proximity in space they may actually be making measurements in entirely different years or seasons.Other differences may be related to topography, urban surfaces, or other factors.Still, the variability seen in Fig. 4 shows how local conditions, and possibly the individual characteristics of the time series, affect validation statistics.
The final point to note in Fig. 4 is the difference between Terra and Aqua.For example, in the mean bias plots we see how the mean bias across the North American central plains decreases from approximately positive 0.04-0.05for Terra to slightly negative for Aqua.For many of the stations, positive mean AOD biases are higher for Terra than for Aqua.This is in agreement with the global statistics presented in Table 1.
We next group individual stations into 17 regions, defined following Hyer et al. (2011).These are shown in Fig. 5 with Table 2 presenting the regional validation statistics for each of the defined regions.We know from previous analyses presented above that there are distinctive differences between Terra and Aqua mean biases; however, in calculating the regional statistics of Table 2, we combine Terra and Aqua collocations.
The majority of collocations are found in the northern midlatitude regions, with E. and W. CONUS (east and west continental United States) representing 25 % of the total collocations and Mediterranean Europe and boreal Eurasia representing another 34 % of the total.MODIS 3 km retrievals from E. CONUS, Mediterranean Europe, and boreal Eurasia show very good overall agreement with AERONET, exhibiting R ≥ 0.78, bias ≤ 0.05, and at least two-thirds of retrievals falling within expected error.W. CONUS retrievals agree with AERONET less well, exhibiting some of the highest positive biases of any region on the globe.These four regions drive the global validation statistics, which reflect both the good agreement of E. CONUS and Europe and the high

Error dependencies
We next explore the relationship between MODIS-AERONET 3 km AOD differences and various parameters for the global collocation data set.At each collocation, the AERONET AOD is subtracted from the MODIS AOD, so that a positive difference indicates a positive MODIS bias.The data are then sorted according a particular parameter in the database.Collocations are grouped into 87 bins for Terra and 67 bins for Aqua, each containing 1000 collocations.Thus, there are equal numbers of collocations in each bin, but the bins are not equally spaced along the x axis.The mean, median, and standard deviations (SD) of the MODIS-AERONET differences are calculated for each bin.
Figure 6 shows the results of this analysis as a function of AOD, both the true AOD, as measured by AERONET, and the MODIS-retrieved AOD.The differences between MODIS and AERONET AOD depend very little on the true AOD.There is some suggestion of a positive-negativepositive shift of differences at the very lowest AOD (< 0.1), but overall the differences are flat.Terra exhibits an overall positive mean bias against AERONET of about 0.06, with the bias in Aqua much less noticeable.We plot these differences against the MODIS-retrieved AOD to create a metric of retrieval accuracy that can be used to evaluate individual MODIS AOD retrievals in the absence of AERONET.Here we see a distinctive pattern between MODIS AOD bias and MODIS AOD.The higher the retrieved AOD is, the greater the positive difference between MODIS and AERONET.Significant biases of > 0.10 are seen for MODIS AOD values > 0.40.For retrieved AOD < 0.10, the mean differences between MODIS and AERONET are negative.This indicates that a high value of retrieved AOD has greater probability of being too high than too low, and a low value of retrieved AOD has a greater probability of being too low than too high.These results are expected, as high AOD retrievals are more sensitive to true aerosol properties whereas true surface properties become more important in low AOD retrieval.Panel a of Fig. 7 shows how MODIS-AERONET AOD differences vary as a function of AOD variability.SD of the retrievals in the 5 × 5 collocation box is a measure of the homogeneity of the aerosol across the box.The collocation methodology assumes that MODIS spatial statistics will match AERONET temporal statistics, which holds best if the aerosol field is homogeneous.As variability across the box increases (i.e., SD of the AOD), we expect differences between MODIS AOD and AERONET AOD to grow.We see from Fig. 7a that differences are increasingly positive as variability increases.This is because the SD is not normalized, and the differences increase simply because the AOD is increasing as it does in Fig. 6.
Another test of the collocation methodology assumptions is to look for error dependencies on the number of MODIS retrievals within the 5 × 5 collocation box.Note that the methodology requires at least 5 retrievals to represent the box and may have as many as 25.We see from Fig. 7b that there are dependencies.Fewer numbers of retrievals are associated with positive differences, but having almost all of the 25 retrievals available is associated with negative differences.We understand how collocation statistics might be skewed by having fewer numbers of retrievals available to match AERONET, especially if the aerosols across the collocation box were not spatially homogeneous.Also fewer numbers of retrievals may be a result of marginal retrieval conditions caused by clouds and unfavorable surface conditions.It is less easy to understand the negative differences when the box is especially well represented with sufficient retrievals, and this require further investigation on individual situations.
Panel c of Fig. 7 shows the MODIS-AERONET AOD differences as a function of the average number of native pixels (0.5 km) used by the MODIS 3 km retrieval in producing a value of AOD.The retrieval begins with 36 native pixels, and after masking, sorting, and discarding between 5 and 12 pixels remain.The number of pixels used by the retrieval is an indication of how much masking was required.If 12 pixels remain, then no masking was required, and the situation is cloud-free and taking place over favorable surfaces.If only 5 pixels remain, there are conditions that could raise concerns.In Fig. 7, we see that the fewer the pixels used by the retrieval (i.e., more masking is needed), the higher the positive bias is, especially for Terra.This suggests, in the Terra retrieval, that clouds or unfavorable surface conditions are contributing to the high bias seen in the global data set.Interestingly, MODIS-AERONET differences are negative when masking is at a minimum, similar as to when the collocation box contains almost all possible retrievals.It seems that cloud-free situations with appropriate surface features are associated with MODIS under predicting AERONET AOD.The same functional relationship is apparent in the Aqua data set also, but the biases, both high and low, are less pronounced.
Figure 8 shows the MODIS-AERONET AOD differences as a function of geometry.Panel a plots the differences against scattering angle, where we see positive bias increasing towards the extreme backscattering angles.The functional relationship is similar in both Terra and Aqua, but Terra's positive bias is more pronounced.Panel b plots the differences against sensor view (i.e., zenith) angle, where the Aqua differences show little dependence on view angle, but the Terra differences increase positive biases in near-nadir views.Geometrical dependencies in bias generally point to systematic inaccuracies in retrieval assumptions.These can be either in terms of surface angular functions or in aerosol optical properties.However, the difference between Terra and Aqua sensor zenith angle dependencies suggests an issue with instrument characterization, which could include geometrical functionality due to the need to calibrate across the scan mirror.

Temporal changes
Examining temporal changes of validation statistics across the entire time series of the collocation database further characterizes the accuracy of the 3 km AOD product.and regional emphasis and introduces temporal variation in the global results.Therefore, we have selected 26 AERONET stations (Table 3, Fig. 9) with long-term data records with consistent collocation over the entire time series for this analysis.The analysis over these selected stations allows us to examine the change in bias (and ER) over a longer time period without change in spatial and temporal distribution of AERONET stations.Only QAF = 3 retrievals are used.During the 15 years of the collocation data set many factors have changed.For example, satellite sensor characterization is an ongoing process that employs several different measures to monitor radiometric drift and then continuously adjusts calibration parameters to compensate for that drift.Thus, even though the algorithm remains consistent throughout the data record, the inputs to that algorithm may not be, despite the best efforts of the MODIS Characterization Team.
The time series of the monthly statistics shows strong seasonal variation of mean bias and number of collocations.Strong positive bias occurs in the April-August time period, followed by low or even negative bias in the October-February period.In addition to the seasonal variability, Fig. 9 also exposes long-term temporal trends.There is a steady increase of the number of collocations per month, as the AERONET network expands over time.The number of collocations nearly doubles from the early years, up and through the beginning of 2012.The last few years of the record show a decrease in collocations, in some part attributed to the lag in promoting AERONET records from Level 1.5 to Level 2.0.We only use AERONET Level 2.0 for collocations.
The temporal mean biases for the entire time series are 0.04 and 0.014 for Terra and Aqua, respectively, corresponding to temporal mean ER of 0.55 and 0.2, respectively.The mean biases also exhibit temporal trends with biases beginning to increase around 2008-2009.The bias for Terra increased from 0.038 to 0.048 whereas these numbers for Aqua are 0.014 and 0.016.The corresponding ER increase for Terra in 2008 is from 0.48 to 0.65.The increase in ER for Aqua is negligible.
The systematic higher biases exhibited by Terra as compared with Aqua agree with the global analysis presented above.This offset in bias between the two MODIS sensors appears systematic from the beginning of the Aqua record to the end of the time series, although the magnitude of that offset increases over time as Terra's biases grow.The system-  (2000-2010) and (2011-2015).The map in the inset shows locations of AERONET stations used in this analysis with more details provided in Table 3. atic greater number of collocations in the Terra data set than in Aqua's may result from diurnal cloud patterns that create cloudier conditions in the afternoon during Aqua overpass than during Terra's morning one.More clouds in the afternoon (King et al., 2013) may reduce the number of possible collocations.However, instrumental differences affecting available retrievals are another possibility.

Discussion and conclusions
To validate the MODIS 3 km AOD products (MOD04_3K and MYD04_3K), which became publicly available in the MODIS Collection 6 release, we created a database of collocations of the product with AERONET observations.The collocation data set spanned the extent of the MODIS record from 2000 to 2015.Collocation criteria employed 0.15 × 0.15 degree latitude × longitude MODIS retrievals centered at the AERONET station and all AERONET obser-vations ±30 min of satellite overpass.Thus, the collocation box is approximately 15 km per side for nadir views.Version 2, Level 2.0, cloud-screened and quality-assured AERONET observations are used, and AERONET AOD is interpolated to 0.55 µm to match MODIS values.Overall there are over 90 162 high-quality collocations of Terra retrievals and over 71 248 high-quality collocations for Aqua.
The validation statistics examined include mean bias, regression slope, correlation coefficient, and percentage falling within expected error bounds.In this validation exercise we hold the 3 km AOD product to expected error thresholds of ±(0.05 + 20 %) (Remer et al., 2013).We find that the global 3 km AOD product displays skill in matching AERONET observations with a correlation coefficient of 0.87, but there is a RMSE of 0.15 and 0.13 for Terra and Aqua, respectively.The scatter is not random, but exhibits a mean positive bias of 0.06 for Terra and 0.03 for Aqua.The Remer et al. (2013) error bounds capture two-thirds of the Aqua 3 km AOD re-P.Gupta et al.: Validation of MODIS 3 km land aerosol optical depth trieval, but less than 63 % of the Terra retrievals.There is significant degradation of validation accuracy if MODIS retrievals of poor data quality (QAF = 0) are included in the analysis.Thus, on a global basis we recommend using only QAF = 3 MODIS 3 km retrievals for quantitative analysis.If doing so, then the expected error for the Aqua product is ±(0.05 + 0.20 × AOD), on a global basis, but only ±(0.06 + 0.20 × AOD) for Terra, where AOD is the true AOD.However, a more accurate representation of Terra's expected error is to account for the positive bias with asymmetrical error bounds: −0.03-0.20 and +0.13 + 0.20 AOD.The expected error bounds contain two-thirds of all AOD retrievals.To assess the mean bias of the retrieval based on the retrieved AOD, we find that the mean bias can be modeled as 0.19 + 0.17 × ln(AOD_MODIS + 0.25) for Terra and 0.15 + 0.14 × ln(AOD_MODIS + 0.25) for Aqua.Note that mean bias itself is subject to uncertainty.
We find a wide range of accuracy in the 3 km product locally and regionally, with spatially contiguous stations sometimes exhibiting significantly different validation statistics.The distribution of validation sites is highly skewed towards the northern midlatitudes with over 50 % of all collocations in the database resulting from these areas.Within the northern midlatitudes, eastern North America, Europe, and boreal Eurasia show some of the best agreement with AERONET.However, western North America, also in the northern midlatitudes, exhibits some of the poorest agreement.Regions outside of the northern midlatitudes are less well represented in the database, but we find that north-central South America including the Amazon, equatorial-southern Africa, and Australia show good agreement with AERONET.Mexico, the Caribbean, southern South America, SW Asia, East Asia, and the maritime continent of Southeast Asia generally show poor agreement.No attempt was made to isolate urban regions from rural ones or to otherwise sort the data by surface type.
The difference between MODIS-retrieved 3 km AOD and AERONET-observed values are mostly independent of true AOD.This is unexpected as error bounds are defined as a function of the percentage of AOD ±(0.05 + 0.20 × AOD).However, the mean differences between MODIS 3 km AOD and AERONET are dependent on AOD variability.The more variable the AOD is, the higher the positive offset between MODIS and AERONET.Some of this is due to the conditions of the original MODIS retrieval, and some is due to the difficulties of a spatiotemporal match-up in the collocation methodology.We also find that the greater the need for masking clouds and unfavorable surfaces in the original retrieval, the greater the offset between MODIS and AERONET.Interesting and unexplained is the tendency for the differences between MODIS and AERONET to go negative when conditions appear to be homogeneous and cloud-free.We also find error dependencies on geometry, with greater error in the far backscattering region, and, for Terra only, greater error in near-nadir views.Some of these geometrical errors are in-troduced by uncertainties in the assumptions of surface characteristics and aerosol optical properties in the MODIS retrieval, but the difference between Terra and Aqua suggests differences in the sensors themselves.
We continue to see differences between the sensors in how validation statistics have evolved over time.By limiting our time series analysis to only 26 AERONET stations that span the entire time series, we eliminate changes in validation statistics due to changing AERONET station distribution.We find that both sensors exhibit time series with strong seasonal dependence.Both sensors have higher positive biases against AERONET in the northern spring and summer than in northern fall and winter, with Terra's positive bias always greater than Aqua's.However, during the early years of the time series, both sensors were reporting similar number of retrievals falling within expected error.This changed during 2007-2009, when Terra's accuracy began to fall off and its positive biases increased.Aqua's bias against AERONET also increased during this time frame, but not as rapidly as Terra's.While, these drifts in validation accuracy suggest changes in characterization accuracy of the MODIS sensors themselves, there are other factors.The number of collocations has fallen off towards the end of the time series.We attribute this to a lag for AERONET observations to be elevated to Level 2 status.Because of this lag, there may in fact be a change in the distribution of AERONET stations in the temporal collocation database, despite our best intentions.This may introduce a temporal trend in the validation statistics.Furthermore, the aerosol system itself has undergone significant changes since 2000, with the US and Europe drastically reducing their urban and industrial emissions and substituting other types of aerosol such as wildfire smoke in the western US, biogenic particles during the summer in the southeastern US (Hand et al., 2014;Toon et al., 2016), and transported Saharan dust around the Mediterranean (Karnieli et al., 2009).Likewise emissions and resulting AOD from other regions experience both long-term trends and interannual variability.The combination of variations in AERONET station distribution and the changing aerosol system over the time series examined may be contributing to the trends seen in the validation statistics.However, the differences between Terra and Aqua are difficult to explain without pointing to sensor characterization stability.
The standard 10 km product that meets expected error at 67 and 74 % levels for Terra and Aqua, respectively, on a global basis is measurably more accurate than the 3 km product examined here in detail.Similarly the global standard 10 km AOD product exhibits half of the mean bias with Terra and no bias at all for Aqua.These validation statistics for the 10 km standard product are preliminary.Once a more comprehensive evaluation of the 10 km Collection 6 product is completed, these validation statistics are likely to change.The 10 km product numbers are provided here only to lend perspective to our results with the finer-resolution product.Given this perspective, we confirm the Remer et al. (2013) recommendation that users whose interests are global should use the more robust and accurate 10 km product and leave the 3 km product for specific applications that require the finerresolution representation of the AOD field.
This validation study only addressed the 3 km AOD product over land and did not evaluate the over-water product.The study took a global and regional view, not a local one.Users of the product on a local level are encouraged to consider particular biases that may occur due to local conditions.For example, we know that the MODIS Collection 6 DT AOD retrieval is systematically biased over urban surfaces (Gupta et al., 2016).This is true for both the 10 and 3 km DT products.This problem has been addressed and is substantially mitigated with the release of the Collection 6.1 version of the algorithm (Gupta et al., 2016).In the meantime, the results here show that overall the DT MODIS Collection 6 algorithm is producing an AOD product at 3 km resolution with sufficient accuracy and with well-characterized biases.The product can now be used quantitatively in a wide variety of science and practical applications.

Figure 1 .
Figure 1.Locations of AERONET stations used in the validation study.The color scale shows the number of coincident MODIS-AERONET data points over each station for the entire period.Panel (a) is for Terra MODIS and panel (b) for Aqua MODIS.Most stations operated for only a subset of the 13-to 15-year record.

Figure 2 .
Figure 2. Two-dimensional density scatterplot of MODIS 3 km AOD versus AERONET-observed AOD at 0.55 µm for the global collocation data set.Panel (a) is for Terra MODIS for only the retrievals identified as "high quality" (QAF = 3), and panel (b) is for Aqua MODIS for QAF = 3.The solid line denotes the 1 : 1 line, and the dashed lines denote the envelope of the expected error (EE), defined by Eq. (2).

Figure 3 .
Figure 3. Global distribution of mean bias in MODIS 3 km AOD retrievals with respect to collocated AERONET observations.The circular dots with solid lines are for Terra values and diamond dots with dotted lines are for Aqua values.The colors vary for the three quality levels (QAF = 0, poor quality; QAF = 3, high quality; and QAF = 0 and 3, all quality).

Figure 4 .
Figure 4. Statistics calculated from the collocation database at each AERONET station, individually, for Terra on the left and Aqua on the right.Shown are values for correlation coefficient (R), mean bias, percentage within expected error (EE %), and RMSE.Only stations with at least 100 collocations are plotted, which may differ between the two satellites, and only collocations with MODIS retrievals of QAF = 3 are included.

Figure 6 .
Figure 6.AOD differences between the MODIS 3 km product and AERONET for the global collocation data set, QAF = 3, as a function of AERONET AOD (a, b) MODIS AOD (c, d).The left column shows Terra values and the right column shows Aqua.The global data set was sorted according to AOD, binned into bins with equal number of collocations, and then mean, median, and standard deviation of each bin were calculated.Red dots and lines show the mean.Black dots and lines show the median.The blue cloud indicates 1 standard deviation of each bin.The horizontal black line denotes zero difference, and the dashed lines indicate EE envelopes.Positive values indicate that MODIS AOD is higher than AERONET.
Figure 7. Same as Fig. 6 except for standard deviation of MODIS AOD within the 5 × 5 collocation box (a), number of MODIS retrievals with the 5 × 5 collocation box (b), and number of MODIS reflectance native pixels used by the retrieval, averaged for all retrievals made in the 5 × 5 collocation box (c).

Figure 8 .
Figure 8. Same as Fig. 6 except for scattering angle (a) and sensor zenith angle (b).

Figure 9 .
Figure 9.Time series of monthly mean error ratios (Eq. 3) (a) and number of collocations (b) for the global collocation data set from 26 selected long-term AERONET stations.The Terra record is in red, and Aqua in blue.Note that Aqua's record begins 2 years after Terra, and the total number of collocations is temporally variant.Only MODIS 3 km retrievals with QAF = 3 are included.The horizontal red and blue lines are temporal means of ER.Dotted red and blue horizontal lines indicate long-term temporal mean ER for each satellite.Solid red lines are temporal mean ER calculated for Terra for two periods(2000-2010) and (2011-2015).The map in the inset shows locations of AERONET stations used in this analysis with more details provided in Table3.
Terra MODIS 10 km data period is March 2000 to June 2013 whereas Aqua MODIS 10 km data period is January 2003-June 2013.
have stronger correlation, smaller RMSE and more retrievals falling within expected error than do the low-quality (QAF = 0) retrievals (Table

Table 2 .
Regional statistics of intercomparison between MODIS and AERONET.This is using joint data sets of Terra and Aqua for QAF = 3 only.
Figure 5. Map showing 17 selected parts of the world where regional analysis is performed.

Table 3 .
List of selected AERONET stations for the long-term analysis as presented in Fig.9.