Assessing sub-grid variability within satellite pixels using airborne mapping spectrometer measurements


Sub-grid variability (SGV) of atmospheric trace gases within satellite pixels is a key issue in satellite design and in the interpretation and validation of retrieval products. However, characterizing this variability is challenging due to the lack of independent high-resolution measurements. Here we use tropospheric NO₂ vertical column (VC) measurements from the Geostationary Trace gas and Aerosol Sensor Optimization (GeoTASO) airborne instrument, with a spatial resolution of about 250 m × 250 m, to quantify the normalized SGV (i.e., the standard deviation of the sub-grid GeoTASO values within the sampled satellite pixel divided by their mean) for different satellite pixel sizes. We use the GeoTASO measurements over the Seoul Metropolitan Area (SMA) and Busan region of South Korea during the 2016 KORUS-AQ field campaign, and over the Los Angeles Basin, USA, during the 2017 SARP field campaign. We find that the normalized SGV of NO₂ VC increases with increasing satellite pixel size (from ~10% for a 0.5 km × 0.5 km pixel to ~35% for a 25 km × 25 km pixel), and this relationship holds for the three study regions, which are also within the domains of upcoming geostationary satellite air quality missions. We also quantify the temporal variability of the retrieved NO₂ VC within the same satellite pixels (represented by the difference of retrieved values at two different times of a day). For a given satellite pixel size, the temporal variability within the same satellite pixels increases with the sampling time difference over SMA. For a given small (e.g., ≤4 hours) sampling time difference within the same satellite pixels, the temporal variability of the retrieved NO₂ VC increases with increasing spatial resolution over the SMA, the Busan region, and the Los Angeles Basin.

The results of this study have implications for future satellite design, retrieval interpretation, and validation when comparing pixel data with local observations. In addition, the analyses presented in this study are equally applicable in model evaluation when comparing model grid values to local observations. Results from the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem) indicate that the normalized satellite SGV of tropospheric NO₂ VC calculated in this study could serve as an upper bound to the satellite SGV of other species (e.g., CO and SO₂) that share common source(s) with NO₂ but have relatively longer lifetimes.

Introduction
Characterizing sub-grid variability (SGV) of atmospheric chemical constituent fields is important in both satellite retrievals and atmospheric chemical-transport modeling. The inability to resolve sub-grid details is one of the fundamental limitations of grid-based models (Qian et al., 2010) and has been studied extensively (e.g., Boersma et al., 2016; Ching et al., 2006; Denby et al., 2011; Pillai et al., 2010; Qian et al., 2010). Pillai et al. (2010) found that the SGV of column-averaged carbon dioxide (CO₂) can reach up to 1.2 ppm in global models that have a horizontal resolution of 100 km. This is an order of magnitude larger than sampling errors that include both limitations in instrument precision and uncertainty of unresolved atmospheric CO₂ variability within the mixed layer (Gerbig et al., 2003). Denby et al. (2011) suggested that the average European urban background exposure for nitrogen dioxide (NO₂) using a model of 50-km resolution is underestimated by ~44% due to SGV.

In contrast, much less attention has been paid to the sub-grid variability within satellite pixels (e.g., Broccardo et al., 2018; Judd et al., 2019; Tack et al., 2020). Indeed, some previous studies (e.g., Kim et al., 2016; Song et al., 2018; Zhang et al., 2019; Choi et al., 2020) used satellite retrievals to study SGV in models, and calculated representativeness errors of model results with respect to satellite measurements (e.g., Pillai et al., 2010). Even though satellite retrievals of atmospheric composition often have smaller uncertainties than model results, it has not been until recently that the typical spatial resolution of atmospheric composition satellite products has reached scales comparable to regional atmospheric chemistry models (< ~10 km).

Until recently, accurate in situ measurements with sufficient spatiotemporal coverage have not been available.
As a result, it has been challenging to quantify satellite SGV, even though this is a key issue in designing, understanding, and correctly interpreting satellite observations. This is especially important in the satellite instrument development process, during which the required measurement precision and retrieval resolution need to be defined in order to meet the science goals. In addition, when validating and evaluating relatively coarse-scale satellite retrievals by comparing with in situ observations, SGV introduces large uncertainties. This work is partly motivated by validation requirements and considerations for the upcoming geostationary orbit (GEO) satellite constellation for atmospheric composition that includes the Tropospheric Emissions: Monitoring Pollution (TEMPO) mission over North America (Chance et al., 2013; Zoogman et al., 2017), the Geostationary Environment Monitoring Spectrometer (GEMS) over Asia (Kim et al., 2020), and the Sentinel-4 mission over Europe (Courrèges-Lacoste et al., 2017).

The measurements of the Geostationary Trace gas and Aerosol Sensor Optimization (GeoTASO) airborne instrument provide a unique dataset for quantifying satellite SGV. GeoTASO is an airborne remote sensing instrument capable of high spatial resolution retrieval of UV-VIS absorbing species like NO₂, formaldehyde (HCHO; Nowlan et al., 2018), and sulfur dioxide (SO₂; Chong et al., 2020), with measurement characteristics similar to the GEMS and TEMPO GEO satellite instruments. The GeoTASO data used here were taken in gapless, grid-like patterns (or "rasters") over the regions of interest, providing essentially continuous spatial coverage that was repeated up to four times a day in some cases. As such, the GeoTASO data provide a preview of the type of sampling that is expected from the GEO satellite sensors, making the data particularly suitable for our study. We focus on the GeoTASO measurements made during the Korea United States Air Quality (KORUS-AQ) field experiment in 2016. The measurements from KORUS-AQ have been widely used by researchers for various air quality topics, including quantification of emissions and model and satellite evaluation (e.g., Deeter et al., 2019; Huang et al., 2018; Kim et al., 2018; Miyazaki et al., 2019; Spinei et al., 2018; Tang et al., 2018, 2019; Souri et al., 2020; Gaubert et al., 2020). We further compare our findings from KORUS-AQ with flights conducted during the NASA Student Airborne Research Program (SARP) in 2017 over the Los Angeles (LA) Basin to test the general applicability of our findings. The KORUS-AQ mission took place within the GEMS domain, while the 2017 SARP flights were within the domain of TEMPO. Given the similarity between the TEMPO and GEMS instruments in terms of spectral ranges, spectral and spatial resolution, and retrieval algorithms (Al-Saadi et al., 2014), such a comparison is reasonable and useful in facilitating the generalization of the results from the study.

We use the tropospheric NO₂ vertical column (VC) retrieved by GeoTASO as a tool to assess satellite SGV.
NO₂ is an important air pollutant that is primarily generated from anthropogenic sources such as emissions from the energy, transportation, and industry sectors (Hoesly et al., 2018). NO₂ is a reactive gas with a typical lifetime of a few hours in the planetary boundary layer (PBL), although it can also be transported over long distances in the form of peroxyacetyl nitrate (PAN) and nitric acid. NO₂ is a precursor of tropospheric ozone and secondary aerosols, and has a negative impact on human health and the environment (Finlayson-Pitts et al., 1997). The results from this paper's analysis of NO₂ also have implications for other air pollutants that share common source(s) with NO₂ but have somewhat longer lifetimes, for example, carbon monoxide (CO) and SO₂.

In this study, we apply a satellite pixel random sampling technique and the spatial structure function analysis to GeoTASO data (described in Section 2) to quantify the SGV of satellite pixel NO₂ VC at a variety of spatial resolutions. We analyze the relationship between satellite pixel size and satellite SGV, and the relationship between satellite pixel size and the temporal variability of NO₂ observations (Section 3). We then discuss the implications for satellite design, satellite retrieval interpretation, satellite validation and evaluation, and satellite-in situ data comparisons (Section 4). Implications for general local observations and grid data comparisons are also discussed. Section 5 presents our conclusions.

Data and methods
In this section, we describe the GeoTASO instrument, the campaign flights, and the different analysis techniques used to characterize the satellite pixel SGV. We outline two approaches: satellite pixel random sampling, to investigate spatial variability and temporal variability separately, and the construction of spatial structure functions as an alternative measure of spatial variability.

GeoTASO instrument
In this study, we focus on GeoTASO retrievals of tropospheric NO₂ vertical column (VC). GeoTASO is a hyperspectral instrument (Leitch et al., 2014) that measures nadir backscattered light in the ultraviolet (UV; 290-400 nm) and visible (VIS; 415-695 nm). As one of NASA's airborne UV-VIS mapping instruments, it was designed to support the upcoming GEO satellite missions by acquiring high temporal and spatial resolution measurements with dense sampling for optimizing and experimenting with new retrieval algorithms (Leitch et al., 2014; Nowlan et al., 2016; Lamsal et al., 2017; Judd et al., 2019). GeoTASO has a cross-track field of view of 45° (±22.5° from nadir), and the retrieval pixel size at nadir is approximately 250 m × 250 m from typical flight altitudes of 24,000-28,000 feet (7.3-8.5 km). The dense sampling of the GeoTASO datasets is a unique feature and provides the opportunity to study the expected spatial and temporal variability within the satellite NO₂ retrieval pixels at high resolution. The GeoTASO data used in this study are mostly cloud-free. Validation of GeoTASO NO₂ retrievals during KORUS-AQ against Pandora shows ~10% difference on average. This uncertainty estimate is lower than that reported by Nowlan et al. (2016).

The 2016 KORUS-AQ field campaign
The KORUS-AQ field measurement campaign (Al-Saadi et al., 2014) took place in May-June 2016 to help understand the factors controlling air quality over South Korea. One of the goals of KORUS-AQ was the testing and improvement of remote sensing algorithms in advance of the launches of the GEMS, TEMPO, and Sentinel-4 satellite missions. It is hoped that the high-quality initial data products from the GEO missions will facilitate their rapid uptake in air quality applications after launch (Al-Saadi et al., 2014; Kim et al., 2020). During KORUS-AQ, GeoTASO flew onboard the NASA LaRC B200 aircraft. We focus on the data taken over the Seoul Metropolitan Area (SMA), which is highly urbanized and polluted, and the greater Busan region, which is somewhat less urbanized and less polluted (Figure 1). Figure 2 shows the 12 GeoTASO data rasters (i.e., gapless maps) acquired over SMA. Figure S1 shows the 2 GeoTASO rasters acquired over the Busan region.

[...] (https://airbornescience.nasa.gov/content/Student_Airborne_Research_Program), GeoTASO was flown onboard the NASA LaRC UC-12B aircraft over the LA Basin (Figure S2, which also shows the landcover). A detailed description and analysis of these data can be found in Judd et al. (2018; 2019). In this study, we compare our analyses and findings from KORUS-AQ with those using the GeoTASO data over the LA Basin to test the general applicability of our findings.

Satellite pixel random sampling for spatial variability
GeoTASO provides continuous measurements in a gapless map pattern at high spatial resolution (Figures 2, S1, and S2). This dataset allows us to sample and study the SGV of coarser spatial resolution hypothetical satellite pixels sampling the same domain. To mimic satellite observations and quantify the satellite SGV, we randomly sample the GeoTASO data with hypothetical satellite pixels spanning 27 different pixel sizes (0.5 km × 0.5 km, 0.75 km × 0.75 km, 1 km × 1 km, 2 km × 2 km, and so on up to 25 km × 25 km). Because of the move to smaller pixel sizes in future satellite missions, and the limitation in the maximum hypothetical satellite pixel size that can be sampled using the random sampling method, the analysis of SGV only goes up to 25 km × 25 km. This sampling process is conducted for each hour of each selected flight over the regions of interest during the KORUS-AQ and SARP campaigns. For every sampled satellite pixel, the mean (MEAN_pixel) and standard deviation (SD_pixel) of the GeoTASO tropospheric NO₂ VC data within the pixel are calculated to represent the satellite SGV. The normalized satellite SGV is calculated as the standard deviation of the GeoTASO data within the sampled satellite pixel divided by the mean of the GeoTASO data within the same pixel (SD_pixel/MEAN_pixel).

We use a set of 10,000 hypothetical satellite pixels at each size to include all of the GeoTASO data in the analysis and to cover as many locations as possible. Our sensitivity test indicates that the results do not change when the sample size is halved. Because the data are located closely in space but may be sampled at slightly different times during the same flight, we separate GeoTASO data into hourly bins for each flight before pixel sampling in order to reduce the impact of temporal variability of the GeoTASO data within a single satellite pixel sample.

As an illustration, we describe the procedure below for the May 17 afternoon flight (Figure 3), which was conducted from 13:00 to 17:00 local time: (1) the GeoTASO data during this flight were divided into four hourly groups according to the measurement time, i.e., 13:00-14:00, 14:00-15:00, 15:00-16:00, and 16:00-17:00; (2) for each of the 27 hypothetical satellite pixel sizes, we randomly generate 10,000 satellite pixel locations within each hourly group. Therefore, for each hour, we sample 270,000 satellite pixels (27 different satellite pixel sizes and 10,000 samples for each size), and for this example flight, we have a total of up to 1,080,000 possible satellite pixels across the 4 hourly groups. Note that the actual number of samples used in the analysis is less than 1,080,000 because we discard a sampled satellite pixel if it is not covered by GeoTASO data for at least 75% of its area.

We tested other choices of the coverage threshold in addition to 75% over SMA (not shown here). The results are similar for small pixels (< ~10 km²), as they are more likely to be covered by GeoTASO data regardless of the threshold value. For larger pixels (> ~15 km²), the satellite SGV is slightly lower when using 30% or 50% as the area coverage threshold, because larger pixels act like smaller pixels when only partially covered. The threshold of 75% was chosen as a trade-off between sample size and representation.
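The random sampling procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the GeoTASO data for one hourly bin have already been gridded onto a regular 2-D array with NaN marking cells without data, and the function name `sample_normalized_sgv`, its parameters, and the choice to skip pixels with a non-positive mean are all hypothetical.

```python
import numpy as np


def sample_normalized_sgv(vc, cell_km, pixel_km, n_samples,
                          min_coverage=0.75, seed=None):
    """Return SD_pixel / MEAN_pixel for randomly placed square pixels.

    vc       : 2-D array of column values on a regular grid (NaN = no data)
    cell_km  : grid spacing of `vc` in km (0.25 for a 250 m grid)
    pixel_km : side length of the hypothetical satellite pixel in km
    """
    rng = np.random.default_rng(seed)
    n = int(round(pixel_km / cell_km))  # pixel side length in grid cells
    ny, nx = vc.shape
    if n < 1 or n > ny or n > nx:
        raise ValueError("pixel does not fit on the grid")

    ratios = []
    for _ in range(n_samples):
        # Random corner chosen so that the whole pixel stays on the grid.
        i = rng.integers(0, ny - n + 1)
        j = rng.integers(0, nx - n + 1)
        window = vc[i:i + n, j:j + n]
        valid = window[~np.isnan(window)]
        # Discard pixels with no data or below the area coverage threshold.
        if valid.size == 0 or valid.size < min_coverage * window.size:
            continue
        mean = valid.mean()
        if mean <= 0:  # assumption: skip pixels with a non-positive mean
            continue
        ratios.append(valid.std() / mean)
    return np.asarray(ratios)
```

Calling the function once per pixel size and per hourly bin, with `n_samples=10_000`, would mirror the 27 × 10,000 samples per hour described in the text; the number of returned values is smaller whenever pixels fail the coverage test.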

Satellite pixel random sampling for temporal variability
We also quantify the temporal variability of the retrieved NO₂ VC within the same satellite pixels for different satellite pixel sizes. To calculate temporal variability within a hypothetical satellite pixel, we need GeoTASO data to cover the hypothetical satellite pixel at different times during the day. During the KORUS-AQ and 2017 SARP campaigns, rasters were treated as single units (Judd et al., 2019). Each raster produces a contiguous map of data that we consider as roughly representative of the mid-time of the raster. Unlike the calculation of SGV, which is based on data separated into hourly bins (Section 2.4) to reduce the impact of temporal variability on the calculated spatial variability, the satellite pixel random sampling to assess temporal variability is based on rasters and is only conducted for days with multiple rasters. This is to ensure that the sampled hypothetical satellite pixels have multiple values at different times of the day, and hence to maximize the sample size.

To assess temporal variability within the hypothetical satellite pixels, we randomly select 50,000 pixel locations for each of the 27 hypothetical satellite pixel sizes, and use this same set of pixel locations to sample the GeoTASO data for each raster across all flights for a given day. This process is repeated for all days with multiple rasters, and the 75% area coverage threshold is also applied. When there are two or more raster values of MEAN_pixel for a given pixel location separated by time Δt, the temporal mean difference (TeMD) within the satellite pixel is calculated as:

[...]

This procedure is repeated for each satellite pixel size.
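The TeMD equation itself is not reproduced in this copy of the text, so the following Python sketch rests on an assumption: it reads "temporal mean difference" as the mean absolute difference of MEAN_pixel between two rasters, taken over pixel locations that pass the coverage test in both. The function names `pixel_means` and `temd`, and the gridded NaN-masked input, are hypothetical.

```python
import numpy as np


def pixel_means(vc, rows, cols, n, min_coverage=0.75):
    """MEAN_pixel at fixed pixel corners; NaN where coverage is insufficient.

    vc         : 2-D raster on a regular grid (NaN = no data)
    rows, cols : corner indices of each hypothetical pixel
    n          : pixel side length in grid cells
    """
    means = np.full(len(rows), np.nan)
    for k, (i, j) in enumerate(zip(rows, cols)):
        window = vc[i:i + n, j:j + n]
        valid = window[~np.isnan(window)]
        if valid.size > 0 and valid.size >= min_coverage * window.size:
            means[k] = valid.mean()
    return means


def temd(means_t1, means_t2):
    """Assumed form: mean |MEAN_pixel(t2) - MEAN_pixel(t1)| over shared pixels."""
    both = ~np.isnan(means_t1) & ~np.isnan(means_t2)
    if not both.any():
        return float("nan")
    return float(np.mean(np.abs(means_t2[both] - means_t1[both])))
```

Reusing the same `rows` and `cols` for every raster of a day corresponds to the text's use of one fixed set of pixel locations; grouping raster pairs by their time separation would then give TeMD as a function of Δt.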

Spatial structure function
Structure functions have been applied to in situ measurements and model-generated tropospheric trace gases to analyze their spatial and temporal variability in previous studies (Harris et al., 2001). The Spatial Structure Function (SSF) (Fishman et al., 2011; Follette-Cook et al., 2015) is an alternative measure to the satellite pixel random sampling described above for quantifying spatial variability, and in this work, we apply the SSF to GeoTASO data to assist our analysis of satellite SGV. The main difference between the two measures is that the SSF is based on individual GeoTASO data points, while the results from satellite pixel random sampling are based on sampled satellite pixels. The SSF is defined here, following Follette-Cook et al. (2015), as

[...] as initial and boundary conditions, and the model meteorological fields above the PBL were nudged 6-hourly. KORUS version 3 anthropogenic emissions and FINN version 1.5 fire emissions (Wiedinmyer et al., 2011) were used.

Results
In this section, we discuss the results for SGV over the different regions considered. Results are presented for the hypothetical satellite pixel random sampling for spatial variability and temporal variability, and for the spatial structure function analysis for spatial variability.

[...] 2.3×10¹⁶ molecules cm⁻², 1.1×10¹⁶ molecules cm⁻², and 1.3×10¹⁶ molecules cm⁻², respectively. Over the three regions, the mean values (MEAN_pixel) and absolute values of standard deviation (SD_pixel) of the hypothetical satellite pixels sampled over GeoTASO NO₂ VC data are different (Figure S3). This is consistent with previous studies suggesting absolute values of SGV can vary regionally (Judd et al., 2019; Broccardo et al., 2018). However, we find that the normalized satellite SGV (calculated as the ratio of SD_pixel to MEAN_pixel for a sampled pixel) is similar over each of the areas, regardless of the absolute level of pollution as represented by MEAN_pixel (Figure 4). Over SMA (Figure 4a), the mean normalized satellite SGV of tropospheric NO₂ VC increases smoothly from ~10% for the pixel size of 0.5 km × 0.5 km to ~35% for the pixel size of 25 km × 25 km. The interquartile variation of the satellite SGV also increases with satellite pixel size. The patterns of the sampled satellite pixels over the Busan region (Figure 4b) and LA Basin (Figure 4c) are also found to be similar to those over SMA. Furthermore, Figures S4 and S5 show that even the individual flights over the three domains generally follow the same pattern, except in the case of the June 9 PM flight that is discussed below.

We also compare normalized satellite SGV for different levels of pollution, regardless of region (Figure S6).
The normalized satellite SGV for the less polluted pixels (MEAN_pixel lower than the average value of all pixels, i.e., 2×10¹⁶ molecules cm⁻²) also shows an overall similar pattern to that for the more polluted pixels (MEAN_pixel higher than the average value of all pixels). We notice that at small pixel sizes, less polluted pixels have higher normalized satellite SGV, possibly due to relatively higher retrieval noise at lower pollution levels.

In addition to the comparison between different domains and pollution levels, we also compare this relationship in the morning and afternoon. The variation of normalized SGV with pixel size in the morning and afternoon is generally similar for the three regions (Figure S7), except for the large pixels over SMA, where the normalized SGV is larger in the afternoon than in the morning. This difference is driven by the GeoTASO data from June 9 PM (Figure S4), as the normalized SGV pattern for the afternoon agrees well with the normalized SGV pattern for the morning when the June 9 PM data are excluded. Figure S1 shows that the June 9 PM NO₂ pollution level is higher than on other days under meteorological conditions of light winds and moderate temperatures. The MEAN_pixel values increase ~60% going from 1 km × 1 km to 25 km × 25 km pixel size, while SD_pixel increases dramatically, by ~7 times, from 1 km × 1 km to 25 km × 25 km. This is higher than on any other day, and results in the highest SGV encountered over SMA at the large pixel sizes. We also notice that the normalized SGV does not generally change significantly in the range of 20 km × 20 km to 25 km × 25 km. However, in the case of SMA for June 9 PM, the normalized SGV (as well as SD_pixel) increases significantly and monotonically with pixel size in the range of 20 km × 20 km to 25 km × 25 km.
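As a rough consistency check on the June 9 PM numbers, reading "~60%" as a factor of about 1.6 in MEAN_pixel and "~7 times" as a factor of about 7 in SD_pixel (an interpretation of the wording, not a value stated in the text), the normalized SGV between the 1 km × 1 km and 25 km × 25 km pixel sizes would grow by roughly:

```latex
\frac{\mathrm{SGV}_{25\,\mathrm{km}}}{\mathrm{SGV}_{1\,\mathrm{km}}}
  = \frac{\mathrm{SD}_{25\,\mathrm{km}} / \mathrm{SD}_{1\,\mathrm{km}}}
         {\mathrm{MEAN}_{25\,\mathrm{km}} / \mathrm{MEAN}_{1\,\mathrm{km}}}
  \approx \frac{7}{1.6} \approx 4.4
```

The ratio shows why the normalized SGV for this flight rises so steeply: the standard deviation grows several times faster than the pixel mean.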

We show the normalized SGV for individual rasters over SMA (Figure 5) to indicate the uncertainty range of the normalized SGV shown in Figure 4. The spread of SGV across different individual rasters represents the uncertainties of using the averaged normalized SGV for a specific case. Note that the variation of normalized SGV with pixel size for individual rasters generally follows the same pattern (i.e., increases with satellite pixel size), especially when the pixel size is small (≤10 km × 10 km). The normalized SGV increases from ~10% to ~25%, with the uncertainty range consistently being ±5% when the pixel size is smaller than 10 km × 10 km. When the pixel size is larger than 10 km × 10 km, the uncertainty range broadens with pixel size from ±5% (10 km × 10 km) to ±15% (25 km × 25 km). This means that when the satellite pixel size is large, using the mean normalized SGV in Figure 4 to represent specific cases may lead to larger uncertainties. Therefore, our analysis reveals a threshold for spatial resolution at about 10 km × 10 km. Below this pixel size, SGV can be characterized by the mean value with relatively smaller uncertainty (±5%) and hence high confidence, even with large diurnal or day-to-day variations. The spatial resolutions of TEMPO, GEMS, and TROPOMI (TROPOspheric Monitoring Instrument; Veefkind et al., 2012; Griffin et al., 2019; van Geffen et al., 2019) are within this ≤10 km × 10 km range, while the resolution of OMI (Ozone Monitoring Instrument; Levelt et al., 2006; 2018) is not. This means that applying the results of this study (e.g., Figure 4) to OMI for a specific case study (e.g., a specific day) requires extra caution.

We tested the sensitivity of the results over SMA to sampling GeoTASO data with hypothetical satellite pixels grouped by complete flight, rather than grouping the data by time in hourly bins.
The resulting patterns and relationships are similar, except that the normalized satellite SGV increases ~5% for pixels of small sizes due to the inclusion of temporal variability (Figure S8a). We also tested the results for sampling satellite pixels by raster instead of within hourly bins. The results are again similar to Figure 4, except that the normalized satellite SGV increases ~1% for pixels of small sizes due to the inclusion of temporal variability (Figure S8b).

The three regions investigated in this work have different levels of urbanization and air pollution (Figures 1 and S2). PBL conditions are also different in the morning and afternoon (Figure S9). The similarity of the relationships between the satellite pixel size and the normalized satellite SGV over these different regions (Figure 4) [...] we choose 3 km × 3 km, 5 km × 5 km, 7 km × 8 km, and 18 km × 18 km pixels to represent the expected area of the satellite pixels for TEMPO (2.1 km × 4.4 km), TROPOMI (3.5 km × 7 km), GEMS (7 km × 8 km), and OMI (18 km × 18 km), respectively. The expected normalized satellite SGV for TEMPO, TROPOMI, GEMS, and OMI is 15-20%, ~20%, 20-25%, and ~30%, respectively. Taking the TEMPO example, this implies that the satellite SGV could potentially lead to uncertainties of 15-20% in a validation exercise comparing a satellite retrieval with sub-satellite local ground measurements of NO₂ VC, as might be obtained from a Pandora spectrometer. As a result, we caution that a pixel mean bias calculated when evaluating against local measurements within the pixel may sometimes be optimistic due to the cancellation of sub-grid positive and negative biases.
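The choice of square pixels to stand in for the rectangular instrument footprints can be checked with a short calculation. The sketch below computes the side of a square with the same area as each footprint quoted above; the equal-area rule is an assumption about how the representative sizes were chosen, and the names `footprints_km` and `equal_area_side_km` are hypothetical.

```python
import math

# Nominal footprint dimensions in km, as quoted in the text.
footprints_km = {
    "TEMPO": (2.1, 4.4),
    "TROPOMI": (3.5, 7.0),
    "GEMS": (7.0, 8.0),
    "OMI": (18.0, 18.0),
}


def equal_area_side_km(width_km, length_km):
    """Side length of the square whose area equals width_km * length_km."""
    return math.sqrt(width_km * length_km)


sides_km = {name: equal_area_side_km(*dims)
            for name, dims in footprints_km.items()}

for name, side in sides_km.items():
    print(f"{name}: equal-area square side = {side:.2f} km")
```

Under this assumption, the TEMPO and TROPOMI footprints correspond to squares of about 3.0 km and 4.9 km, close to the 3 km × 3 km and 5 km × 5 km pixels used in the text, while GEMS is sampled at its own 7 km × 8 km footprint.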

Temporal variability (TeMD) within the same satellite pixels
In addition to satellite spatial SGV, we also analyze the temporal variability (i.e., TeMD) within the same hypothetical satellite pixels. Figure 6 shows TeMD of satellite retrieved tropospheric NO₂ VC over SMA as a function of hypothetical satellite pixel size and the separation time Δt between flight rasters as described in section 2. [...] along with improvements in the satellite retrieval spatial resolution with smaller pixels, improving the satellite retrieval temporal resolution with higher frequency measurements is also an effective way to enhance capability in resolving variabilities of NO₂. This is expected because of NO₂'s relatively short lifetime (~ a few hours) and strong diurnal cycle due to emission activities, chemistry, and photolysis rate (Fishman et al., 2011; Follette-Cook et al., 2015). The diurnal cycle of the PBL also plays a large role because horizontal dispersion occurs as the PBL thickens during the day. Early in the morning, the PBL is low (~1400 m during 9:00-11:00 in SMA) and strong sources, such as traffic on major highways, are evident. As the day progresses, the PBL height increases (~1800 m during 15:00-17:00; Figure S9), allowing for greater horizontal mixing to take place. By early afternoon, emissions from all the major sources in the central region have mixed together to form a wide area of high pollution over the urban center. Judd et al. (2018) point out that the topography over SMA also plays a role in the ability to mix horizontally as the PBL grows. Therefore, the TeMD can be large between morning and afternoon (i.e., for Δt larger than 6 hours).

For a small Δt (2 or 4 hours), TeMD increases when increasing the satellite retrieval spatial resolution (i.e., smaller pixel size). This is especially true for short time periods (e.g., 2 hours and 4 hours), which are more important for the GEO satellite measurements.
For example, for Δt of 2 hours, TeMD for satellite pixels of 1 km × 1 km is about 0.80×10¹⁶ molecules cm⁻², while TeMD for satellite pixels of 25 km × 25 km is about 0.73×10¹⁶ molecules cm⁻² (~9% lower); when Δt is 4 hours, TeMD for satellite pixels of 1 km × 1 km is about 1.3×10¹⁶ molecules cm⁻², while TeMD for satellite pixels of 25 km × 25 km is about 1.1×10¹⁶ molecules cm⁻² (~15% lower). This indicates that when increasing the satellite retrieval spatial resolution (decreasing pixel size), the temporal variability of the retrieved values will increase, even though the normalized satellite spatial SGV decreases. Thus, temporal resolution should be increased in conjunction with the increase in spatial resolution in order to enhance the accuracy of the satellite products. The increase in temporal variability is expected because averaging over a larger region smooths out temporal variability, thereby producing smaller hourly differences. Our finding here is consistent with that of Fishman et al. (2011).

GeoTASO data over the Busan region are limited. Given the fewer flights, we are not able to show how TeMD changes with Δt over the Busan region in this study. However, we are able to show the relationship between TeMD and satellite pixel sizes for a limited range of Δt. During KORUS-AQ, there were only two rasters sampled over Busan, with a Δt of 2 hours (Figure S10). For this Δt of 2 hours, TeMD increases slightly when increasing the satellite retrieval spatial resolution (smaller pixel size). More data over the Busan region would help significantly for this analysis. As for sampled hypothetical satellite pixels over the LA Basin, for a given Δt, TeMD increases when increasing the satellite retrieval spatial resolution (smaller pixel size) for Δt equal to 4 and 8 hours (Figure S11). We note that with only 2 days of flight data, the GeoTASO data over LA are also limited.
Despite the limited sample sizes, TeMD increases when increasing the satellite retrieval spatial resolution over both the Busan region and the LA Basin, which is consistent with the relationships over the SMA for a small Δt.
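The dependence of TeMD on pixel size described above can be illustrated with a minimal sketch. It assumes, as a simplification, that TeMD is the mean absolute difference between co-located pixel averages of two gap-free rasters on the same regular grid; the function names (block_mean, temd, percent_lower) are hypothetical and are not taken from the study's code, and real GeoTASO rasters would also need handling of gaps and random pixel sampling.

```python
import numpy as np

def block_mean(field, k):
    """Average a 2-D field over non-overlapping k x k blocks (edges trimmed)."""
    ny = (field.shape[0] // k) * k
    nx = (field.shape[1] // k) * k
    trimmed = field[:ny, :nx]
    return trimmed.reshape(ny // k, k, nx // k, k).mean(axis=(1, 3))

def temd(raster_t1, raster_t2, k):
    """Assumed TeMD: mean absolute difference of co-located pixel averages.

    k is the hypothetical satellite pixel size in units of native grid cells
    (e.g., k=4 turns 250 m cells into 1 km pixels).
    """
    diff = block_mean(raster_t2, k) - block_mean(raster_t1, k)
    return float(np.mean(np.abs(diff)))

def percent_lower(coarse, fine):
    """Percentage by which the coarse-pixel value is lower than the fine-pixel value."""
    return 100.0 * (fine - coarse) / fine

# Values quoted in the text for SMA, in units of 10^16 molecules cm^-2.
print(round(percent_lower(0.73, 0.80), 1))  # delta t = 2 hours
print(round(percent_lower(1.1, 1.3), 1))    # delta t = 4 hours
```

Because the absolute value of an average never exceeds the average of the absolute values, this assumed TeMD cannot increase when pixels are aggregated into larger, nested pixels, which matches the smoothing argument given in the text.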

Results from Spatial Structure Function (SSF)
In this section, we show the analysis of SSF over SMA (Figure 7) as a complement to our analysis in Section 3.1. As mentioned before, SSF and SGV are different measures of spatial variability and are not directly comparable. This is because SSF is calculated based on differences between a single GeoTASO measurement and all the other GeoTASO measurements on the map, while SGV is derived based on variation among all the GeoTASO measurements within a hypothetical satellite pixel unit. SSF measures the averaged spatial difference at a given distance, while SGV directly quantifies the expected spatial variability within a satellite pixel at a given size. As both SSF and SGV are related to spatial variability, we include SSF in this study as an extension to SGV.
Figure 7a shows that the SSF in SMA initially increases with the distance between data points, peaks at around 40-60 km during most flights, and then decreases with distance between 60 and 140 km. The number of paired GeoTASO data points when the distance is larger than 100 km is relatively small (Figure S12); therefore, conclusions beyond this distance are not included in this analysis. The increases in SSF for distances in the range of 1-25 km (Figure 7b) are consistent with the relationship between pixel sizes and the normalized satellite SGV shown in Figure 4. For example, over the 1-25 km range, Figure 4a shows the median increases from around 8% to around 28%, an increase by a factor of 3.5, and the black line in Figure 7 shows an approximately similar factor (from 0.33×10^16 molecules cm^-2 for 1 km to 1.5×10^16 molecules cm^-2 for 25 km). This increase of SSF between 1-25 km is also seen over the Busan region and the LA Basin (Figure S13). We also notice that SSF shows a relatively strong dependence on the particular GeoTASO flight, while SGV is less sensitive, especially for small pixel sizes.
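The distinction between the two measures can be made concrete with a short sketch. It assumes that SSF is the mean absolute VC difference over all pairs of measurements, binned by separation distance, and that the normalized SGV is the standard deviation divided by the mean of the values inside one pixel. The function names and the brute-force pairing are illustrative assumptions, not the study's implementation (the exact SSF definition is given in Section 2.6).

```python
import numpy as np

def normalized_sgv(values):
    """Standard deviation of the sub-grid values divided by their mean."""
    values = np.asarray(values, dtype=float)
    return float(np.std(values) / np.mean(values))

def structure_function(x_km, y_km, vc, bin_edges_km):
    """Mean absolute VC difference of all point pairs, binned by separation.

    Returns (ssf, counts): the per-bin mean difference (NaN for empty bins)
    and the number of pairs in each bin.
    """
    x = np.asarray(x_km, dtype=float)
    y = np.asarray(y_km, dtype=float)
    v = np.asarray(vc, dtype=float)
    i, j = np.triu_indices(len(v), k=1)        # each unordered pair once
    dist = np.hypot(x[i] - x[j], y[i] - y[j])  # separation in km
    diff = np.abs(v[i] - v[j])
    which = np.digitize(dist, bin_edges_km) - 1
    n_bins = len(bin_edges_km) - 1
    ssf = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        mask = which == b
        counts[b] = int(mask.sum())
        if counts[b] > 0:
            ssf[b] = diff[mask].mean()
    return ssf, counts

# Toy example: four points on a line with a linear VC gradient.
ssf, counts = structure_function([0, 1, 2, 3], [0, 0, 0, 0],
                                 [0.0, 1.0, 2.0, 3.0],
                                 [0.5, 1.5, 2.5, 3.5])
print(ssf, counts)
```

Returning the pair counts alongside the SSF makes it easy to discard distance bins with too few pairs, in the spirit of the 100 km cutoff discussed above. The pairwise approach scales quadratically with the number of points, so a real raster would likely need subsampling.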
The shapes of the SSF are generally consistent with previous studies for modeled or in situ observations of NO2 (Fishman et al., 2011; Follette-Cook et al., 2015). Previous studies also suggest that different aircraft campaigns may share the common shape of SSF but different magnitudes, which is strongly related to the fraction of polluted samples versus samples of background air in the campaign (Crawford et al., 2009; Fishman et al., 2011). Differences in the shape and size of particular cities also contribute to the differences in the SSF. For example, at a certain distance SSF may compare polluted areas within the same urban region, while over a different smaller city, the comparison at the same distance reveals the gradient between the polluted city and cleaner surrounding background air, resulting in different peak values. Valin et al. (2011) found that the maximum in OH feedback in a NOx-OH steady-state relationship corresponds to a NO2 e-folding decay length of 54 km in 5 m/s winds. This may partially explain the peak between 40 and 60 km in SSF. As shown in Figures 2 and S7, the overall spatial variability over SMA is higher in the afternoon. Over SMA, the SSF in the morning is generally smaller than in the afternoon, indicating higher spatial variability of tropospheric NO2 VC in the afternoon (see also Judd et al., 2018). As described in Section 2.6, the SSF discussed here (Figure 7) is calculated based on hourly bins. We also include SSF that is calculated within rasters in the supplement (Figure S14). The overall shapes of SSF calculated on a raster basis (Figure S14) are similar to those of SSF calculated on an hourly basis (Figure 7).
Previous studies (Fishman et al., 2011; Follette-Cook et al., 2015) used SSF values at a particular distance to indicate the satellite precision requirement at a corresponding resolution in order to resolve spatial structure over the pixel scale.
For GEMS, the expected spatial differences over the scale of its pixel for the SMA and Busan regions are ~7.5×10^15 molecules cm^-2 and ~3.5×10^15 molecules cm^-2, respectively, taking the SSF values at 5 km to be representative. For TEMPO, the spatial difference is ~2.8×10^15 molecules cm^-2 over LA Basin taking the SSF value at 3 km. Assuming the NO2 measurement precision requirement to be 1×10^15 molecules cm^-2 for both TEMPO and GEMS (Chance et al., 2013; Kim et al., 2020), the expected spatial differences over the three regions are considerably higher than the precision requirement and should be easily characterized by both the GEMS and TEMPO missions.
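As a quick check of the comparison above, the following sketch divides the pixel-scale SSF values quoted in the text by the assumed precision requirement. The dictionary layout and variable names are illustrative only.

```python
# Assumed NO2 precision requirement for both GEMS and TEMPO (molecules cm^-2).
PRECISION = 1.0e15

# Pixel-scale spatial differences quoted in the text (molecules cm^-2).
ssf_at_pixel_scale = {
    ("GEMS", "SMA"): 7.5e15,        # SSF at 5 km
    ("GEMS", "Busan"): 3.5e15,      # SSF at 5 km
    ("TEMPO", "LA Basin"): 2.8e15,  # SSF at 3 km
}

# Ratio of expected spatial difference to precision; values above 1 mean the
# spatial structure exceeds the precision requirement.
ratios = {key: value / PRECISION for key, value in ssf_at_pixel_scale.items()}

for (mission, region), ratio in ratios.items():
    print(f"{mission} over {region}: {ratio:.1f} x precision requirement")
```

All three ratios are well above 1, which is the basis for the statement that the expected differences exceed the precision requirement in each region.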

Discussions and implications
The relationship between satellite pixel sizes and the normalized satellite SGV is fairly robust over the different regions studied here, and Figure 4 points to the possibility of developing a generalized look-up table if more data were available in other regions. A generalized relationship between satellite pixel sizes and the temporal variability (Figure 6) is not as evident as the relationship between satellite pixel sizes and the normalized satellite SGV due to limited data. However, it is still useful for satellite observations over SMA, which is in the GEMS domain, and should be helpful in satellite retrieval interpretation.
This study also has implications for satellite validation and evaluation, and satellite-in situ data comparisons of other trace gas species. Our initial motivation to study satellite SGV arose from our previous work on validation of MOPITT (Measurements of Pollution in the Troposphere) CO retrievals over urban regions (Tang et al., 2020). In that study, we compared the satellite retrievals with aircraft profiles, and realized that satellite SGV and representativeness error of aircraft profiles in the comparisons to MOPITT retrievals introduced uncertainties in the validation results. Previous studies have noticed the same issue for NO2 (e.g., Nowlan et al., 2016, 2018; Judd et al., 2019; Pinardi et al., 2020; Tack et al., 2020), but this issue is difficult to address and quantify due to the limited spatial coverage of most aircraft observations. Even though only a few trace gas species are routinely retrieved, the gapless raster datasets of GeoTASO are a possible way to address this problem.
The normalized SGV of the GeoTASO tropospheric NO2 VC might serve as an upper bound to the SGV of CO, SO2 and other species that share common source(s) with NO2 but have relatively longer lifetimes, even if their spatial distributions may have different patterns (e.g., Chong et al., 2020). For example, at the resolution of 22 km × 22 km (resolution of MOPITT CO retrievals), the expected normalized satellite SGV of tropospheric NO2 VC is ~30%. Therefore, we might expect the normalized satellite SGV for tropospheric CO VC to be lower than this value.
To demonstrate this idea, we use the WRF-Chem regional model as an intermediary step. At the model resolution, if the SGV of the WRF-Chem model and GeoTASO NO2 VC agree reasonably well, then the model can be used to predict the SGV of other species that are chemically constrained with NO2 at the model resolution and at coarser resolutions. This is shown in Figure 8, which illustrates how SGV varies with satellite pixel size for NO2 VC, CO VC, SO2 VC, and formaldehyde (HCHO) VC calculated from a WRF-Chem simulation. The modeled NO2, CO, SO2, and HCHO concentrations are converted to VC, and are filtered to match the rasters of GeoTASO measurements (Figure S15). As expected, SGV of modeled NO2 VC is higher than SGV of modeled CO VC, SO2 VC, and HCHO VC. We also notice that SGV for modeled NO2 VC, CO VC, SO2 VC, and HCHO VC increases with pixel size, which is similar to that for GeoTASO measurements. The SGV for GeoTASO NO2 shown in this figure (black lines) is calculated based on GeoTASO data that are regridded to the WRF-Chem grid (3 km × 3 km), making it slightly different from that in Figure 4. Note that a more comprehensive comparison requires further work and ideally actual dense GeoTASO-type measurements of CO and other species to address differences due to local sources on the background concentrations.
This study is also relevant to model comparison and evaluation with local observations. Whenever local observations are compared to grid data (e.g., comparisons between satellite retrievals and local observations, comparisons between grid-based model and local observations, and data assimilation), SGV will introduce uncertainties that need to be quantified to better interpret and understand the comparison results. For example, we note that at the resolution of 14 km × 14 km (a typical resolution for the forward-looking Multi-Scale Infrastructure for Chemistry and Aerosols Version 0; MUSICA-V0, https://www2.acom.ucar.edu/sections/multi-scale-chemistry-modeling-musica; Pfister et al. [2020]), the expected normalized satellite SGV of tropospheric NO2 VC is ~25-30%. When comparing model simulations at a coarser resolution with local observations for tropospheric NO2 VC, a normalized SGV larger than ~25-30% may be expected. If comparing for a specific vertical layer instead of vertical column, an even larger normalized SGV may occur.

Conclusions
Satellite SGV is a key issue in interpreting satellite retrieval results. Quantifying studies have been lacking due to limited high-resolution observations. In this study, we quantified likely
(1) The normalized satellite SGV increases with hypothetical satellite pixel sizes based on satellite pixel random sampling of hourly GeoTASO data, from ~10% (±5% for specific cases such as an individual day/time of day) for a pixel size of 0.5 km × 0.5 km to ~35% (±10% for specific cases such as an individual day/time of day) for the pixel size of 25 km × 25 km. This conclusion holds for all the three study regions, despite their different levels of urbanization and pollution, and for time of day, morning or afternoon.

(2) The normalized satellite SGV of tropospheric NO2 VC could serve as an upper bound to satellite SGV of CO, SO2 and other species that share common source(s) with NO2 but have relatively longer lifetime, as supported by the high-resolution WRF-Chem simulation.

(3) The temporal variability (TeMD) within the same hypothetical satellite pixels increases with sampling time differences (Δt) over SMA. TeMD ranges from ~0.75×10^16 molecules cm^-2 at Δt of 2 hours to ~2×10^16 molecules cm^-2 (about three times higher) at Δt of 8 hours. TeMD is likely impacted by the short lifetime and diurnal cycle of NO2 due to emission activities and photolysis rate, and the meteorology and PBL evolution during the day. Improving the satellite retrieval temporal resolution is an effective way to enhance the capability of satellite products in resolving variabilities of NO2.
(4) Temporal variability (TeMD) increases when increasing the satellite retrieval spatial resolution (i.e., smaller pixel size) in SMA. For example, when Δt is 2 hours, TeMD for satellite pixels with the size of 25 km × 25 km is about 20% lower compared to TeMD for satellite pixels with the size of 1 km × 1 km. Thus, temporal resolution should be increased along with any increase in spatial resolution in order to enhance the accuracy of satellite products.
(5) The spatial structure function (SSF) first increases with the distance between data points, peaks at around 40-60 km during most flight days, and then decreases with distance. This is generally consistent with previous studies.
(6) SSF analyses suggest that GEMS will encounter NO2 VC pixel scale spatial differences of ~7.5×10^15 and ~3.5×10^15 molecules cm^-2 over the SMA and Busan regions, respectively. TEMPO will encounter NO2 VC spatial differences at its pixel scale of ~2.8×10^15 molecules cm^-2 over the LA Basin. These differences should be easily resolved at the stated measurement precision requirement of 1×10^15 molecules cm^-2.
(7) These findings are relevant to future satellite design and satellite retrieval interpretation, especially now with the deployment of the high-resolution GEO air quality satellite constellation, GEMS, TEMPO, and Sentinel-4. This study also has implications for satellite product validation and evaluation, satellite-in situ data comparisons, and more general point-grid data comparisons. These share similar issues of sub-grid variability and the need for quantification of representativeness error.
We note that this study has some uncertainties and limitations. (1) The variability at a resolution finer than 250 m × 250 m (i.e., GeoTASO's resolution) may introduce uncertainties to the analysis here, although this is beyond the scope of this study.
(2) Even though a large number of GeoTASO retrievals have been analyzed in this study, we would still benefit from more high-resolution measurements with a broader spatiotemporal coverage, particularly over the Busan region. More GeoTASO-type data over the Busan region would help test the consistency of TeMD over different regions. (3) The KORUS-AQ campaign was conducted in spring (May and June), and the 2017 SARP campaign was also conducted in June. More GeoTASO-type measurements over South Korea during different season(s) would be particularly helpful to understand and generalize the findings in this study.
This work demonstrates the value of continued flights of GeoTASO-type instruments obtaining continuous, high spatial resolution data several times a day, particularly for the upcoming validation exercises for the GEO air quality satellite constellation.

Acknowledgement
The authors thank the GeoTASO team for providing the GeoTASO measurements. The authors thank the KORUS-AQ and SARP team for the campaign data. We thank the DIAL-HSRL team for the mixing layer height data (available at https://www-air.larc.nasa.gov/cgi-bin/ArcView/korusaq). Tang was supported by a NCAR Advanced Study Program Postdoctoral Fellowship. Edwards was partially supported by the TEMPO Science Team under Smithsonian Astrophysical Observatory Subcontract SV3-83021. The National Center for Atmospheric Research (NCAR) is sponsored by the National Science Foundation. The authors thank Ivan Ortega and Sara-Eva Martinez-Alonso for helpful comments on the paper.

Data availability
The KORUS-AQ and SARP data are available at https://www-air.larc.nasa.gov/cgi-bin/ArcView/korusaq and https://www-air.larc.nasa.gov/cgi-bin/ArcView/lmos, respectively.