09 May 2022
Status: this preprint is currently under review for the journal AMT.

A New Machine Learning based Analysis for Improving Satellite Retrieved Atmospheric Composition Data: OMI SO2 as an Example

Can Li (1,2), Joanna Joiner (2), Fei Liu (2,3), Nickolay A. Krotkov (2), Vitali Fioletov (4), and Chris McLinden (4)
  • (1) Earth System Science Interdisciplinary Center, University of Maryland, College Park, MD 20740, USA
  • (2) Atmospheric Chemistry and Dynamics Laboratory, NASA Goddard Space Flight Center, Greenbelt, MD 20771, USA
  • (3) Goddard Earth Sciences Technology and Research (GESTAR) II, Morgan State University, Baltimore, MD 21251, USA
  • (4) Environment and Climate Change Canada, Toronto, Ontario, Canada

Abstract. Despite recent progress, satellite retrievals of anthropogenic SO2 still suffer from relatively low signal-to-noise ratios. In this study, we demonstrate a new machine learning data analysis method to improve the quality of satellite SO2 products. In the absence of large ground-truth datasets for SO2, we start from SO2 slant column densities (SCDs) retrieved from the Ozone Monitoring Instrument (OMI) using a data-driven, physically based algorithm, and for each pixel we calculate the ratio between the SCD and the root mean square (RMS) of the fitting residuals. To build the training data, we select presumably clean pixels with small SCD / RMS ratios (SRRs) and set their target SCDs to zero. For polluted pixels with relatively large SRRs, we set the target to the original retrieved SCDs. We then train neural networks (NNs) to reproduce the target SCDs using predictors that include the SRRs of individual pixels, the solar zenith, viewing zenith and phase angles, scene reflectivity, O3 column amounts, and the monthly mean SRRs. For data analysis, we employ two NNs: 1) one, trained daily, that produces analysed SO2 SCDs for polluted pixels on each day, and 2) another, trained once a month, that produces analysed SCDs for less polluted pixels over the entire month. Test results for 2005 show that our method can significantly reduce noise and artifacts over background regions. Over polluted areas, the monthly mean NN-analysed and original SCDs generally agree to within ±15 %, indicating that our method retains the SO2 signals in the original retrievals, except during large volcanic eruptions. This is further confirmed by running both the NN-analysed and the original SCDs through a top-down emission algorithm to estimate the annual SO2 emissions for ~500 anthropogenic sources: the two datasets yield similar results. We also explore two alternative approaches to the NN-based analysis method.
In one, we employ a simple linear interpolation model to analyse the original SCD retrievals. In the other, we develop a PCA-NN algorithm that uses OMI measured radiances, transformed and dimension-reduced with a principal component analysis (PCA) technique, as inputs to NNs for SO2 SCD retrievals. While the linear model and the PCA-NN algorithm can reduce retrieval noise, they both underestimate SO2 over polluted areas. Overall, the results presented here demonstrate that our new data analysis method can significantly improve the quality of existing OMI SO2 retrievals. The method can potentially be adapted for other sensors and/or species and enhance the value of satellite data in air quality research and applications.
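The training-data construction described in the abstract can be illustrated with a minimal, stdlib-only sketch. It is not the authors' code: the thresholds `CLEAN_SRR_MAX` and `POLLUTED_SRR_MIN`, the `Pixel` field names, the use of |SRR| to classify clean pixels (so that noisy negative SCDs also count as clean), and the choice to drop pixels with intermediate SRRs are all hypothetical, since the abstract gives no values or rules for these. The NN itself is not shown; any regressor taking the seven predictors could be trained on the returned pairs.

```python
from dataclasses import dataclass
from typing import List, Tuple

# HYPOTHETICAL thresholds: the abstract describes "small" and "relatively
# large" SRRs without giving values, so these numbers are placeholders.
CLEAN_SRR_MAX = 1.0
POLLUTED_SRR_MIN = 5.0


@dataclass
class Pixel:
    scd: float               # retrieved SO2 slant column density
    rms: float               # RMS of the spectral fitting residuals
    sza: float               # solar zenith angle (deg)
    vza: float               # viewing zenith angle (deg)
    phase: float             # phase angle (deg)
    reflectivity: float      # scene reflectivity
    o3: float                # O3 column amount
    monthly_mean_srr: float  # monthly mean SRR at this location (assumed precomputed)


def srr(p: Pixel) -> float:
    """SCD / RMS ratio (SRR) for one pixel."""
    return p.scd / p.rms


def build_training_set(pixels: List[Pixel]) -> List[Tuple[List[float], float]]:
    """Return (predictors, target SCD) pairs for NN training.

    Presumably clean pixels (|SRR| below CLEAN_SRR_MAX) get a target of 0;
    polluted pixels (SRR above POLLUTED_SRR_MIN) keep their retrieved SCD;
    pixels in between are left out (an assumption of this sketch).
    """
    samples: List[Tuple[List[float], float]] = []
    for p in pixels:
        r = srr(p)
        if abs(r) < CLEAN_SRR_MAX:
            target = 0.0          # clean pixel: the NN should learn to output ~0
        elif r > POLLUTED_SRR_MIN:
            target = p.scd        # polluted pixel: keep the retrieved signal
        else:
            continue              # ambiguous pixel: not used for training
        predictors = [r, p.sza, p.vza, p.phase, p.reflectivity, p.o3,
                      p.monthly_mean_srr]
        samples.append((predictors, target))
    return samples
```

Pairing zero targets for clean pixels with the original SCDs for polluted ones is presumably what lets a trained NN suppress noise over background regions while keeping the SO2 signal near sources.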
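The dimension-reduction step of the PCA-NN alternative can be sketched in plain Python. The abstract does not say how the OMI radiances are transformed or how many components are kept, so this sketch assumes already-transformed spectra given as lists of floats; `k`, `iters`, `seed`, and the use of power iteration with deflation are illustrative choices, not the authors' implementation (a real pipeline would more likely call an SVD routine from a linear-algebra library).

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def _dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def pca_components(spectra: Matrix, k: int, iters: int = 500,
                   seed: int = 0) -> Tuple[List[float], Matrix]:
    """Mean spectrum and top-k principal components of `spectra`.

    Uses power iteration on the sample covariance matrix, removing each
    component (deflation) before searching for the next one.
    """
    n, d = len(spectra), len(spectra[0])
    mu = [sum(row[j] for row in spectra) / n for j in range(d)]
    centred = [[row[j] - mu[j] for j in range(d)] for row in spectra]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    comps: Matrix = []
    for _ in range(k):
        # Random, strictly positive, unit-length start vector.
        v = [rng.random() + 0.1 for _ in range(d)]
        s = math.sqrt(_dot(v, v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [_dot(row, v) for row in cov]
            norm = math.sqrt(_dot(w, w))
            if norm == 0.0:
                break  # no variance left in the remaining directions
            v = [x / norm for x in w]
        lam = _dot(v, [_dot(row, v) for row in cov])  # Rayleigh quotient
        comps.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next PC.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return mu, comps


def project(spectrum: List[float], mu: List[float], comps: Matrix) -> List[float]:
    """Reduced-dimension scores: centred spectrum projected onto each PC."""
    centred = [x - m for x, m in zip(spectrum, mu)]
    return [_dot(centred, v) for v in comps]
```

In a PCA-NN setup of the kind described, the few scores returned by `project`, rather than the full spectra, would serve as the NN inputs.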


Status: open (until 13 Jun 2022)

Short summary
Satellite observations provide information on the sources of SO2, an important pollutant that affects both air quality and climate. However, these observations suffer from relatively poor data quality due to the weak SO2 signal. Here, we use a machine learning technique to analyse satellite SO2 observations in order to reduce the noise and artifacts over relatively clean areas while keeping the signals near pollution sources. This significantly improves the quality of the satellite SO2 data.