This is an improved manuscript – in particular, the additional details about the training and test data sets have substantially clarified the authors' approaches to sensor calibration. Most of my concerns have been addressed; remaining comments are listed below.
If the training and test sets are not the same (p. 6, line 1), it's hard to understand the utility (or even the meaning) of Table 4 – pooling all test and training data into combined ranges isn't terribly meaningful, since there is substantial overlap between the two. Moreover, combining them risks misleading the reader as to the true ranges of the two sets for individual sensors – the ranges for an individual sensor might be quite different from those shown in the table. I would recommend restructuring this table substantially to make clear the differences between the training and test sets for individual sensors, and discussing these differences in the table caption. If there were a way to visualize the differences (maybe some example histograms, along the lines sketched below?), that would also be helpful.
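To make the histogram suggestion concrete, something like the following would suffice (a minimal matplotlib sketch; the per-sensor data structure and variable names here are hypothetical, not taken from the paper):

```python
import matplotlib.pyplot as plt

# Hypothetical structure: {sensor_id: (train_values, test_values)}, where each
# array holds the pollutant concentrations that sensor saw in each data set.
def plot_train_test_histograms(sensor_data, bins=30):
    fig, axes = plt.subplots(1, len(sensor_data),
                             figsize=(4 * len(sensor_data), 3), squeeze=False)
    for ax, (sensor_id, (train, test)) in zip(axes.ravel(), sensor_data.items()):
        # Overlaid, normalized histograms make range differences visible even
        # when the training and test sets differ in size.
        ax.hist(train, bins=bins, alpha=0.5, density=True, label="training")
        ax.hist(test, bins=bins, alpha=0.5, density=True, label="test")
        ax.set_title(f"Sensor {sensor_id}")
        ax.set_xlabel("concentration")
        ax.legend()
    fig.tight_layout()
    return fig
```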
Results: in my original review I suggested that the random forest and hybrid approaches should not differ, since the training and test sets appeared to be identical. But since the ranges given in Table 4 turn out to be combined ranges, and not the ranges covered by each individual sensor, this may be incorrect – there may be sensors for which the ranges of the training and test sets differ substantially. (Whether this is actually the case is hard to evaluate based on the information in the paper and SI.) In such cases the two models may be expected to give different results, and could then be discussed individually.
Regardless, if the hybrid model is to be retained, the authors still need to provide information on the number of "crossings" between the RF and LR models, and on the fraction of time evaluated by RF versus the fraction evaluated by LR.
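These quantities should be easy to extract from the model-selection record. As a sketch (assuming a boolean time series marking which sub-model was used at each time step – the names here are illustrative):

```python
import numpy as np

def hybrid_usage_stats(used_rf):
    """used_rf: boolean array, True where the hybrid model evaluated the
    point with the RF sub-model, False where it used the LR sub-model."""
    used_rf = np.asarray(used_rf, dtype=bool)
    # A "crossing" is any switch between sub-models at consecutive time steps.
    crossings = int(np.sum(used_rf[1:] != used_rf[:-1]))
    frac_rf = used_rf.mean()     # fraction of time evaluated by RF
    frac_lr = 1.0 - frac_rf      # fraction of time evaluated by LR
    return crossings, frac_rf, frac_lr
```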
P. 16, lines 9-11: it should also be mentioned that this step would likely be improved by the use of kNN, rather than the k-means clustering that is currently used.
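For clarity on the distinction (a minimal scikit-learn sketch with placeholder data; this is not meant to reproduce the authors' implementation): k-means is unsupervised and assigns new points to learned centroids, whereas kNN assigns a new point the majority label of its nearest labeled training points, making direct use of the known groupings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))          # placeholder sensor features
labels = (X_train[:, 0] > 0).astype(int)     # placeholder group labels

# k-means (current approach): unsupervised; assigns each point to the nearest
# of k learned centroids, ignoring any known group labels.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_train)

# kNN (suggested alternative): supervised; assigns a new point the majority
# label among its k nearest training neighbors, using the labels directly.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, labels)

X_new = rng.normal(size=(10, 3))
print(km.predict(X_new))   # centroid-based cluster assignments
print(knn.predict(X_new))  # neighbor-based label assignments
```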
P. 16, lines 13-15: if the authors are going to continue to make this suggestion based on the current work (even with the new caveat added), they need to back it up with much more than a citation to another paper. Specifically, they need to show some evidence that the difference in performance results from the RAMP circuitry, and not from differences in the training/test sets used. I'm not sure how one would do this, but as written the sentence is purely speculative and not backed by any substantive evidence.
P. 18, line 14: it is stated that a new model should be developed "each year", but this is probably more specific than is warranted by the work. My takeaway is that models stay reasonably robust over timescales of several months, but should be periodically evaluated and updated when used over longer timescales (on the order of every ~6-18 months). I would recommend changing the wording to reflect this. This recommendation also appears in the abstract, and so should be changed there as well. (As a minor side note, I feel that including it in the abstract risks detracting from the more fundamental results of this work, related to generalized models, so I might recommend removing or shortening the sentence in lines 20-22 of the abstract.)
SI: in the Response to Reviews the authors state that "the randomized nature of the training approach for some models (such as the random forest models) will lead to slightly different results if these models are re-built." I don't understand this: if the algorithm uses a fixed random seed to generate pseudo-random numbers, the same pseudo-random numbers will be generated on every run, so the results should be exactly replicable. (More generally, if the results really do differ when the model is re-run, this represents a potentially major problem, as it calls into question the robustness of the reported results.)
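For example, in scikit-learn (assuming that is the toolkit used – this sketch and its data are placeholders, not the authors' code), fixing random_state makes random forest training fully deterministic:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Placeholder data standing in for the calibration features and targets.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=500)

# With a fixed random_state, two independently trained forests are identical,
# so re-building the model reproduces the same predictions exactly.
rf1 = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
rf2 = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
assert np.array_equal(rf1.predict(X), rf2.predict(X))
```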