the Creative Commons Attribution 4.0 License.
Estimating hourly ground-level aerosols using GEMS aerosol optical depth: A machine learning approach
Abstract. The Geostationary Environment Monitoring Spectrometer (GEMS) is the world's first ultraviolet–visible instrument for air quality monitoring in geostationary orbit. Since its launch in 2020, GEMS has provided hourly daytime air quality information over Asia. However, to date, validation and applications of these data are lacking. Here we evaluate the effectiveness of the first 1.5-year GEMS aerosol optical depth (AOD) data in estimating ground-level particulate matter (PM) concentrations at an hourly scale. To do so, we employ random forest models and use GEMS AOD data and meteorological variables as input features to estimate PM10 and PM2.5 concentrations, respectively, in South Korea. The model-estimated PM concentrations are strongly correlated with ground measurements, but they exhibit negative biases, particularly during high aerosol loading months. Our results indicate that GEMS AOD values represent underestimates compared to ground-measured AOD values, possibly leading to negative biases in the final PM estimates. Further, we demonstrate that more training data could significantly improve random forest model performance, thus indicating the potential of GEMS for high-resolution surface PM prediction when sufficient data are accumulated over the coming years. Our results will serve as a reference to aid the evaluation of future GEMS AOD retrieval algorithm improvements and also provide initial guidance for data users.
Status: final response (author comments only)
RC1: 'Comment on amt-2024-142', Anonymous Referee #1, 16 Sep 2024
This manuscript addresses the estimation of PM concentrations from GEMS AOD observation data.
Because GEMS is a geostationary satellite, the approach in this manuscript has the advantage of monitoring the diurnal variation of PM concentrations from satellite measurements.
However, to understand and verify this characteristic, the manuscript needs some additional analysis. In particular, machine learning is not a perfect approach, and its results can change with the selection of input data.
For this reason, it is essential that the manuscript identify and analyze the input variables used for the machine learning method.
Details are listed below.
1) Introduction: For the readability of the manuscript, the authors should add a brief outline of the sections at the end of the Introduction.
2) Adding references
- Because satellite-retrieved AOD has been extensively evaluated, references and a related paragraph should be added before the paragraph at Line 33 (before the GEMS AOD study).
- L35: References on the definition of AOD and its retrieval method should be added, such as Go et al. (2020).
- L56: What is 'Korea Environment Corporation'? Is it the 'Korean Environmental Institute'?
- L57: The authors should add references for the PM concentration measurements.
- L63: What is ARA? This needs to be clarified.
3) From Section 2 onward, the manuscript mixes methodology and results. I suggest separating this into a method section (e.g., Section 2) and a result section (e.g., Section 3), with sub-sections for the detailed explanation of each part. In this version of the manuscript, the method and result parts are too short to explain the machine learning method and the reasons for the variable selection in detail. As a result, the manuscript does not make clear how this research differs from several previous studies on PM estimation by machine learning. The authors should include a table listing the selected variables and the selection criteria.
4) L59-L64: For the collocation between ground measurements and satellite pixels, the temporal collocation is clarified in the manuscript. However, for the spatial collocation between AirKorea sites and GEMS pixels, the manuscript mentions only the 'nearest pixel'. How is the 'nearest pixel' selected? Because of cloud contamination in the GEMS AOD, some pixels have to be eliminated, so the spatial distance between the two measurements may become large. Is there a criterion for the maximum distance? In addition, ground observation stations are dense in urban regions, in places denser than the GEMS spatial resolution. In that case, the same GEMS pixel is selected repeatedly for different AirKorea observation sites. How does the collocation method handle this?
- In addition, for the collocation between the observations and the reanalysis dataset, what kind of interpolation method is used in this study? Simply selecting the 'nearest grid' may add to the uncertainty.
5) L132: Boundary layer height (BLH) does not have a linear relationship with the conversion from satellite AOD to PM concentration. BLH modulates the PM concentration only roughly, and its sensitivity also changes with the columnar concentration of aerosols. Did the authors check how the sensitivity to BLH changes for the relationship between PM concentration and satellite AOD (including a survey of the references)?
6) Section 2.1: Although the supplement includes the PM2.5 results, the body of the manuscript does not show a detailed analysis of PM2.5. I suggest that the estimation method and SHAP analysis be presented separately for PM10 and PM2.5. The SHAP analysis results should also be presented with detailed analysis and explanation. If the authors compare the PM2.5 and PM10 estimates, it becomes possible to evaluate the contribution of aerosol type or absorptivity. In addition, in the explanation of PM10 concentration, the manuscript is confusing about what is 'observed from AirKorea' and what is a 'satellite retrieved PM concentration'. The authors should use clearly distinct terms for satellite-derived PM concentrations and ground-based observed PM concentrations.
7) Figures 4 and 5: Rearrange the time scale (24 hours -> daytime).
8) L161-L170: The authors state that the main reason for the underestimation of estimated PM is the underestimation of GEMS AOD. However, the machine learning model in this study is built on the GEMS AOD. If so, the uncertainty characteristics of the GEMS AOD are already absorbed into the machine learning model. Another possible cause of the PM underestimation is an incorrect selection of variables, or missing variables, in the machine learning method. According to several previous studies, PM concentration is affected not only by meteorological components but also by chemical processes. The authors should check the variable selection.
9) For the application of machine learning, are there criteria for the minimum observed PM concentration and the minimum value of satellite-retrieved AOD? Cases with low aerosol concentration may affect the overall performance of the estimation.
10) L179-L184 and Figure 6: The statistical scores need a detailed explanation. In addition, 'n0', 'n1', 'n2', 'n4', and 'n8' in Figure 6 are explained neither in the caption of Figure 6 nor in the body of the manuscript. The authors should clarify the explanation of Figure 6 and the meaning of the statistical scores.
Citation: https://doi.org/10.5194/amt-2024-142-RC1
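To make the question in comment 4 concrete, the following is a minimal sketch of a nearest-pixel match with a maximum-distance criterion that skips pixels without a valid AOD (for example, cloud-screened ones). It is not the authors' procedure: the function names, the dictionary layout, and the 5 km default threshold are all hypothetical and chosen only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used for great-circle distance


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def collocate(station, pixels, max_km=5.0):
    """Return (pixel, distance_km) for the nearest valid pixel within max_km.

    `station` and each pixel are dicts with "lat" and "lon"; pixels also carry
    "aod", which is None when the retrieval was screened out. The 5 km default
    is an assumed, illustrative threshold. Returns None if nothing qualifies.
    """
    best = None
    for px in pixels:
        if px["aod"] is None:  # skip cloud-screened or missing retrievals
            continue
        d = haversine_km(station["lat"], station["lon"], px["lat"], px["lon"])
        if d <= max_km and (best is None or d < best[1]):
            best = (px, d)
    return best


station = {"lat": 37.50, "lon": 127.00}
pixels = [
    {"lat": 37.50, "lon": 127.00, "aod": None},  # co-located but invalid
    {"lat": 37.52, "lon": 127.00, "aod": 0.4},   # about 2.2 km away
    {"lat": 38.00, "lon": 127.00, "aod": 0.9},   # about 56 km away
]
match = collocate(station, pixels)
```

With an explicit threshold, a station whose closest valid pixel lies too far away yields no match rather than a distant one, and recording the matched pixel per station makes it possible to see when several stations share the same pixel.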
RC2: 'Comment on amt-2024-142', Anonymous Referee #2, 25 Sep 2024
The manuscript is based on the estimation of PM2.5 and PM10 from GEMS AOD. The main objective is to evaluate the effectiveness of GEMS AOD in estimating ground level PM concentrations. This study attempts to study how GEMS AOD can provide air quality estimates in a global scale, which is of great importance.
However, there are a few concerns regarding the formulation of the study and the structure of the manuscript. My suggestions are given below.
- The overall paper lacks adequate explanations and citations to corroborate the objective of the study and how it differs from existing studies/novelty. (Ex: are there any ML based studies for estimation of PM concentrations? What are the advantages of this method over the existing?)
- The introduction of the manuscript should include a brief description of the sections of the manuscript. Results and discussion should be a separate section from data and methodology. I suggest separating data and methodology into separate sections, as this manuscript lacks a proper description of the methodology (there is too little information on the machine learning method (RF), the selection criteria for input variables, and the ranges of the input variables).
- What is the sample size of the data used in RF?
- RF was selected to estimate PM concentrations out of some other ML methods. How do you evaluate the model effectiveness in this work? Model performance can also affect the conclusions you draw regarding the ability of GEMS AOD to accurately provide PM concentrations.
- The first part of the results should be to validate the GEMS AOD retrievals
- The labeling of PM measurements used in RF, and the PM estimations, is vague. Make it more distinct.
- The use of mean vs error plots would be a better way of understanding the model performance rather than comparing the correlation coefficients. (Refer, Bland-Altman analysis)
- Add a more detailed description of the SHAP analysis.
- L 59-61 Include more details about the GEMS instrument (uncertainties, wavelength channels). Do you perform any pixel averaging?
- You need to add a description of the GEMS AOD retrieval algorithm and explain possible uncertainties in the AOD retrievals. The citation is not enough.
- L 74 – Do you perform any geolocation of data? How do you collocate reanalysis data? What is the maximum possible difference in the collocation of ground-based PM concentrations, AOD and reanalysis data?
- L 115 – Is this something evident across all AOD values? Are there any differences seen for PM estimations under lower AOD and higher AOD values? Is there any detection limit?
- L 154 – AERONET data should be introduced under the data section. How do you collocate AERONET data? What do you mean by closest? You should specify the distance limit. How do you average temporal data? Does the difference between AERONET AOD and GEMS AOD lie within the AERONET AOD uncertainty?
- L 63 – What is ARA?
- L 69 – What is ERA5?
- L 75 – How did you perform the AOD-PM simulations? Or do you mean estimations?
- L 83 – 89 This paragraph should go under the data section
- L 157 – Has it been observed for low AOD or high AOD?
- L 164-165 Does the GEMS AOD algorithm consider dust non-sphericity? You should add a description of the AOD algorithm.
- L 176 – Fig. 6: What does n=1,2,… stand for?
Citation: https://doi.org/10.5194/amt-2024-142-RC2
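As an illustration of the mean-versus-error view that Referee #2 recommends (Bland-Altman analysis), the sketch below computes the quantities usually plotted: the pairwise means, the pairwise differences, the mean bias, and the 95% limits of agreement (bias ± 1.96 standard deviations of the differences). The function name and the sample values are hypothetical and are not data from the manuscript.

```python
import statistics


def bland_altman(estimated, observed):
    """Return Bland-Altman quantities for paired estimated/observed values.

    means: per-pair average (x-axis of the plot)
    diffs: per-pair estimated minus observed (y-axis of the plot)
    bias:  mean of diffs; lower/upper: bias -/+ 1.96 * sample std of diffs
    """
    if len(estimated) != len(observed) or len(estimated) < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")
    means = [(e + o) / 2 for e, o in zip(estimated, observed)]
    diffs = [e - o for e, o in zip(estimated, observed)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
    }


# Made-up PM values (ug/m3), for illustration only.
estimated_pm = [18.0, 27.0, 45.0, 60.0]
observed_pm = [20.0, 30.0, 50.0, 70.0]
result = bland_altman(estimated_pm, observed_pm)
```

In this made-up example the differences grow more negative as the pairwise mean increases, which is the kind of concentration-dependent bias that a single correlation coefficient does not reveal.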
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
165 | 26 | 122 | 313 | 16 | 7 | 7 |