https://doi.org/10.5194/amt-2020-473

  22 Dec 2020

Review status: a revised version of this preprint was accepted for the journal AMT and is expected to appear here in due course.

Towards low-cost and high-performance air pollution measurements using machine learning calibration techniques

Peer Nowack1,2,3,4, Lev Konstantinovskiy5, Hannah Gardiner5, and John Cant5
  • 1Grantham Institute - Climate Change and the Environment, Imperial College London, London SW7 2AZ, UK
  • 2Department of Physics, Imperial College London, London SW7 2AZ, UK
  • 3Data Science Institute, Imperial College London, London SW7 2AZ, UK
  • 4School of Environmental Sciences, University of East Anglia, Norwich NR4 7TJ, UK
  • 5Air Public Ltd, London, UK

Abstract. Air pollution is a key public health issue in urban areas worldwide. The development of low-cost air pollution sensors is consequently a major research priority. However, low-cost sensors often fail to attain sufficient measurement performance compared to state-of-the-art measurement stations, and typically require calibration procedures in expensive laboratory settings. As a result, there has been much debate about calibration techniques that could make their performance more reliable, and about calibration procedures that can be carried out without access to advanced laboratories. One repeatedly proposed strategy is low-cost sensor calibration through co-location with public measurement stations. The idea is that, using a regression function, the low-cost sensor signals can be calibrated against the station reference signal, so that the sensors can then be deployed separately with performance similar to that of the original stations. Here we test the idea of using machine learning algorithms for such regression tasks using hourly-averaged co-location data for nitrogen dioxide (NO2) and particulate matter of particle sizes smaller than 10 μm (PM10) at three different locations in the urban area of London, UK. Specifically, we compare the performance of Ridge regression, a linear statistical learning algorithm, to two non-linear algorithms in the form of Random Forest (RF) regression and Gaussian Process regression (GPR). We further benchmark the performance of all three machine learning methods against the more common Multiple Linear Regression (MLR). We obtain very good out-of-sample R2-scores (coefficient of determination) > 0.7, frequently exceeding 0.8, for the machine learning calibrated low-cost sensors. In contrast, the performance of MLR is more dependent on random variations in the sensor hardware and co-located signals, and is also more sensitive to the length of the co-location period.
We find that, subject to certain conditions, GPR is typically the best performing method in our calibration setting, followed by Ridge regression and RF regression. However, we also highlight several key limitations of the machine learning methods, which will be crucial to consider in any co-location calibration. In particular, none of the methods is able to extrapolate to pollution levels well outside those encountered at the training stage. Ultimately, this is one of the key limiting factors when sensors are deployed away from the co-location site itself. Consequently, we find that the linear Ridge method, which best mitigates such extrapolation effects, typically performs as well as, or even better than, GPR after sensor re-location. Overall, our results highlight the potential of co-location methods paired with machine learning calibration techniques to reduce costs of air pollution measurements, subject to careful consideration of the co-location training conditions, the choice of calibration variables, and the features of the calibration algorithm.
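
The co-location idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the sensor response model (raw signal depending on NO2, temperature and humidity) is invented for illustration, and the regularisation strength alpha=10 is arbitrary rather than tuned. Only MLR and Ridge regression are shown, in closed form with NumPy; RF regression and GPR would normally be taken from a machine learning library and are omitted here.

```python
import numpy as np

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression on standardised features.

    alpha = 0 reduces to ordinary multiple linear regression (MLR).
    Returns a function that predicts the reference signal for new inputs.
    """
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sigma
    y_mean = y.mean()
    A = Xs.T @ Xs + alpha * np.eye(Xs.shape[1])
    w = np.linalg.solve(A, Xs.T @ (y - y_mean))
    return lambda X_new: ((X_new - mu) / sigma) @ w + y_mean

def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic "hourly" co-location data (all values are hypothetical).
rng = np.random.default_rng(0)
n_hours = 1000
reference_no2 = rng.gamma(shape=4.0, scale=10.0, size=n_hours)  # station signal
temperature = 10 + 8 * rng.standard_normal(n_hours)
humidity = rng.uniform(30, 95, n_hours)
# Assumed sensor response: NO2 signal plus temperature/humidity interference and noise.
raw_signal = (0.8 * reference_no2 + 0.5 * temperature
              - 0.1 * humidity + rng.normal(0, 3, n_hours))
X = np.column_stack([raw_signal, temperature, humidity])

# Train on the first part of the co-location period, test on the held-out remainder.
split = 700
X_train, X_test = X[:split], X[split:]
y_train, y_test = reference_no2[:split], reference_no2[split:]

scores = {}
for name, alpha in [("MLR", 0.0), ("Ridge", 10.0)]:
    predict = fit_ridge(X_train, y_train, alpha)
    scores[name] = r2_score(y_test, predict(X_test))

print(scores)  # out-of-sample R2 per method
```

Because the synthetic test period covers the same range of NO2 levels as the training period, this sketch does not exercise the extrapolation limitation highlighted in the abstract; testing that would require held-out data with pollution levels outside the training range.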

Latest update: 30 Jul 2021
Short summary
Machine learning (ML) calibration techniques could be an effective way to improve the performance of low-cost air pollution sensors. Here we provide novel insights from case studies within the urban area of London, UK, where we compare the relative performance of three ML techniques to calibrate low-cost sensors of two key air pollutants (NO2 and PM10). We further highlight several advantages and challenges related to each method, which will be useful to inform future measurement campaigns.