We use 2011–2019 aerosol optical depth (AOD) observations from the Geostationary Ocean Color Imager (GOCI) instrument over East Asia to infer 24 h daily surface fine particulate matter (PM
Exposure to outdoor fine particulate matter (PM
The potential of satellites for high-resolution monitoring of PM
More recently, non-parametric machine learning models have been developed to
predict PM
Geostationary satellites are now dramatically increasing the capability for the mapping of PM
AOD cannot be observed under cloudy conditions, and AOD retrievals from
satellites can also fail for other reasons, including snow surfaces.
Different methods have been used to fill the data gaps and produce
continuous data sets. Some studies use chemical transport model (CTM) AODs when satellite data are missing (Hu et al., 2017; Stafoggia et al., 2019). Kianian et al. (2021) used a statistical interpolation algorithm combining RF with the lattice kriging method to infer missing AOD over the USA, while Di et al. (2019) used an RF trained on gap-free covariates to fill in the
gaps for MODIS AOD. Yet others first estimate PM
Here we apply an RF algorithm to 2011–2019 GOCI AOD data to construct a
continuous dataset of 24 h PM
GOCI is on board the Korean Communication, Ocean, and Meteorological Satellite (COMS) that was launched by KARI in June 2010 (Choi et al., 2012, 2016). The first ocean color imager placed in geostationary orbit, GOCI covers a 2500
Validation of the GOCI YAER V2 AOD with surface measurements from the
AERONET surface network shows a high correlation (
We use hourly PM
We use hourly meteorological data from the ERA5 global reanalysis, with a resolution of 30
Mean aerosol optical depth (AOD) and surface network PM
Figure 1 shows the mean distributions of GOCI AOD and surface
network PM
Figure 2 shows the percentage of days with at least one successful
hourly GOCI AOD retrieval on the 6
Percentage of days in 2011–2019 with at least one successful
hourly retrieval of AOD on the 6
We calculate the weighting factors
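As a rough illustration of the gap-filling idea, the sketch below blends a nearest-neighbor AOD value with a chemical transport model AOD value using a single weight, and carries that weight forward as an extra predictor. The convex-combination form, the function name `blend_aod`, the dictionary keys, and the numerical values are all assumptions made for illustration; they are not taken from Eq. 1 or from the study's data.

```python
def blend_aod(aod_neighbor, aod_ctm, weight):
    """Blend two AOD estimates; `weight` in [0, 1] favors the neighbor value.

    Assumed convex-combination form, not necessarily the paper's Eq. 1.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * aod_neighbor + (1.0 - weight) * aod_ctm


# Illustrative values only (not data from the study).
example_weight = 0.75
aod_filled = blend_aod(aod_neighbor=0.4, aod_ctm=0.2, weight=example_weight)

# Both the filled AOD and the weight enter the predictor row, so a model
# can learn how much to trust a gap-filled value.
predictor_row = {"aod": aod_filled, "gap_fill_weight": example_weight}
```

Passing the weight alongside the blended AOD lets the downstream model distinguish values dominated by nearby retrievals from values dominated by model fields.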
Table 1 lists the predictor variables included in the RF to infer 24 h PM
Random forest predictor variables for 24 h PM
Decision trees are fit recursively to the predictor variables. Suppose we
have a collection of
Due to the recursive training structure, decision trees are sensitive to the
data on which they are trained, because a change in one split point changes
the composition of all its child nodes. Individual decision trees thus have
high error variance but no inherent bias. It follows that averaging many
individual, uncorrelated trees should yield a low-variance, low-bias
prediction. We construct 200 trees in parallel and reduce the correlation
between them through a bagging procedure: for each of the 200 decision trees
in the RF, we sample the input data with replacement to form a new dataset of
the same dimensions and then grow a decision tree from this bootstrapped
data (Breiman, 2001). Because of the high input sensitivity, a wide variety of decorrelated trees is grown. The predictions of the individual trees are
averaged to yield the prediction of the RF. We fit our RF using the
RandomForestRegressor class in the Python module scikit-learn (Pedregosa et al., 2011). We attempted to further decorrelate the trees by following Breiman (2001) and calculating the split points of each individual tree using only a random subset of the
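The bagging procedure described here can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the study's implementation: each "tree" is reduced to a single-split regression stump on one predictor, the helper names (`fit_stump`, `fit_bagged`, `predict_bagged`) are hypothetical, and the toy data are synthetic. In scikit-learn, the corresponding settings would be `n_estimators=200` and `bootstrap=True` on `RandomForestRegressor`.

```python
import random
from statistics import fmean


def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimizing the summed squared error."""
    best = None
    # Candidate thresholds skip the smallest x so both sides are non-empty.
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        mean_l, mean_r = fmean(left), fmean(right)
        sse = (sum((y - mean_l) ** 2 for y in left)
               + sum((y - mean_r) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, mean_l, mean_r)
    if best is None:  # all x identical: fall back to the overall mean
        mean_all = fmean(ys)
        return (float("inf"), mean_all, mean_all)
    return best[1:]


def predict_stump(stump, x):
    threshold, mean_l, mean_r = stump
    return mean_l if x < threshold else mean_r


def fit_bagged(xs, ys, n_trees=200, seed=0):
    """Grow each stump on a bootstrap sample of the same size as the input."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = rng.choices(range(n), k=n)  # sampling with replacement
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_bagged(forest, x):
    """Average the predictions of the individual trees."""
    return fmean([predict_stump(stump, x) for stump in forest])


# Synthetic toy data: a step in y at x = 10 plus small deterministic noise.
xs = list(range(20))
ys = [(10 if x < 10 else 30) + (x % 3) for x in xs]
forest = fit_bagged(xs, ys)
low, high = predict_bagged(forest, 2), predict_bagged(forest, 17)
```

Because each stump sees a different bootstrap sample, the split points vary from tree to tree, and the averaged prediction is smoother than any single stump.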
We evaluate how the RF generalizes to predictions for the full 6
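Testing on withheld network sites requires that all observations from a given site fall on the same side of the train/test split. The sketch below shows one way to build such site-wise folds; the function name `site_folds`, the number of folds, and the site labels are hypothetical and chosen only for illustration.

```python
import random


def site_folds(site_ids, n_folds=5, seed=0):
    """Return (train_indices, test_indices) pairs with whole sites withheld."""
    sites = sorted(set(site_ids))
    random.Random(seed).shuffle(sites)
    # Assign each unique site to exactly one fold.
    fold_of_site = {site: k % n_folds for k, site in enumerate(sites)}
    folds = []
    for k in range(n_folds):
        test = [i for i, s in enumerate(site_ids) if fold_of_site[s] == k]
        train = [i for i, s in enumerate(site_ids) if fold_of_site[s] != k]
        folds.append((train, test))
    return folds


# Hypothetical site labels: three observations at each of six sites.
site_ids = [f"site{j}" for j in range(6) for _ in range(3)]
folds = site_folds(site_ids, n_folds=3)
```

Splitting by site rather than by individual observation means that the test error reflects predictions at locations the model has never seen, which is closer to the mapping task than a random split would be.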
An outcome of interest is the ability of our predictions to capture
exceedances of National Ambient Air Quality Standards (NAAQS). We categorize
each prediction within the test sets into one of the following four classes: true positives (TPs), where both predicted and observed PM
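The four-way classification can be sketched as follows. The threshold value, the use of a strict inequality, and the sample numbers are assumptions made for illustration only; the applicable standard depends on the country and averaging period.

```python
from collections import Counter


def classify_exceedance(predicted, observed, threshold):
    """Label one prediction/observation pair as TP, FP, FN, or TN."""
    pred_exceeds = predicted > threshold
    obs_exceeds = observed > threshold
    if pred_exceeds and obs_exceeds:
        return "TP"
    if pred_exceeds:
        return "FP"
    if obs_exceeds:
        return "FN"
    return "TN"


# Illustrative (predicted, observed) pairs and threshold, not study data.
threshold = 35.0
pairs = [(40.0, 50.0), (40.0, 30.0), (20.0, 45.0), (10.0, 12.0)]
counts = Counter(classify_exceedance(p, o, threshold) for p, o in pairs)
```

Counts of the four classes can then be combined into summary rates, such as the fraction of observed exceedances that the predictions also flag.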
Predictor variable selection is an important task in implementing an RF, as
the addition of non-informative variables can decrease performance. Unlike
in linear regression, which can naturally ignore unhelpful predictors,
irrelevant data in an RF can, by chance, aid in minimizing impurity
Figure 3 shows scatterplots, color-coded by count, comparing surface observations of 24 h and annual mean PM
The ability of the random forest algorithm to predict 24 h
Error statistics for fitting of PM
Our gap-filling strategy does not introduce bias for days without GOCI
observations (and with AOD inferred, instead, from Eq. 1). Figure S1 in the Supplement shows that surface network PM
Ability of the RF algorithm to diagnose exceedances of air quality standards
One potential application of PM
Cumulative probability density functions (pdfs) of 24 h and annual
mean PM
The main difficulty for GOCI PM
We experimented with several modifications to the RF algorithm to improve the
prediction of NAAQS exceedances but with no success. These tests included
training separate RFs for each of the three countries, training annual
PM
Figure 5 shows long-term trends of annual PM
Trends in the annual mean PM
Figure 6 shows the changes in annual mean PM
Annual mean PM
Figure 7 depicts the relative 2015–2019 trends of PM
The 2015–2019 trends per year in PM
AOD and PM
Monthly PM
We examine here the ability of GOCI PM
The 24 h PM
Same as Fig. 9 but for a pollution event in Beijing on 16–21 December 2016.
Figure 10 shows an additional test of the RF algorithm with one of
the most severe pollution events in the record, the 16–21 December 2016
Beijing winter haze episode. The 24 h PM
Regional air quality model predictions of PM
The simulation for South Korea was conducted for 2015–2019 using emissions from the Clean Air Policy Support System (CAPSS) 2016 (Choi et al., 2020) for South Korea and KORUSv5 (Woo et al., 2022) for outside South Korea. The simulation for North Korea was conducted for 2016 using emissions from the Comprehensive Regional Emissions inventory for Atmospheric Transport Experiment (CREATE) 2015 (Woo et al., 2020) and CAPSS 2013. Natural aerosols, including sea salt and mineral dust, are included. To prepare the boundary conditions, a coarse domain at 27 km horizontal grid resolution covering northeastern Asia was used.
Mean PM
Figure 11 illustrates the increased capability for model evaluation in South Korea enabled by the GOCI PM
Mean PM
Figure 12 evaluates the CMAQ simulation with the GOCI PM
We used 2011–2019 geostationary aerosol optical depth (AOD) observations
from the GOCI satellite instrument, in combination with a random forest (RF)
machine learning algorithm trained on air quality network data, to produce a
continuous 24 h PM
We trained the RF algorithm on gap-filled AODs from the GOCI instrument and
a suite of 12 meteorological, geographical, and temporal predictor
variables. Gap-filling of AODs was done by a weighted combination of
nearest-neighbor data and chemical transport model fields, with the weight
serving as an additional predictor variable. The RF algorithm
successfully exploits information encoded in AOD retrieval failure to
produce a continuous product. Testing of the RF algorithm by prediction at withheld network sites shows single-value precisions in each country of
26 %–32 % for 24 h PM
We compared the continuous 24 h GOCI PM
We examined the ability of the RF algorithm to map air quality on urban
scales by an analysis of two multi-day pollution episodes in Seoul and Beijing. The algorithm captures the day-to-day temporal variability observed by the surface networks and the spatial variability on the 6
The continuous spatial coverage of PM
More work could be done to improve our GOCI PM
The 24 h 6
The supplement related to this article is available online at:
DCP and DJJ designed the study. DCP developed the RF and performed analysis. SZ, MB, and SK ran and analyzed the chemical transport model data. SL aided in satellite data processing. JK, HL, and JHK provided the scientific interpretation and discussion. All authors provided input on the paper for revision.
The contact author has declared that neither they nor their co-authors have any competing interests.
Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was funded by the Samsung PM
This research has been supported by the Samsung Advanced Institute of Technology (Samsung PM
This paper was edited by Marloes Penning de Vries and reviewed by two anonymous referees.