Calibrating Networks of Low-Cost Air Quality Sensors
- 1Department of Urban and Regional Planning, University of Colorado Denver, 80202
- 2NASA Goddard Space Flight Center, Greenbelt MD
- 3Denver Department of Public Health and Environment, USA
- 4Department of Civil, Environmental, and Architectural Engineering, University of Colorado Boulder, Boulder, Colorado 80309, United States
- 5Department of Geography and Environmental Sciences, University of Colorado Denver, 80202
- 6Senseable City Lab, Massachusetts Institute of Technology, Cambridge 02139
- 7Division of Biostatistics and Bioinformatics, National Jewish Health, Denver, CO
- 8Department of Geography and the Environment, University of Denver, Denver, CO, USA
- 9Department of Epidemiology, University of Colorado Anschutz Medical Campus, Aurora, CO
- 10Boston University School of Public Health, Boston, MA, USA
Abstract. Ambient fine particulate matter (PM2.5) pollution is a major health risk. Networks of low-cost sensors (LCS) are increasingly used to characterize local variation in air pollution. However, LCS measurements carry uncertainties that can act as a barrier to effective decision-making, so LCS data need to be calibrated to obtain better-quality PM2.5 estimates. To develop correction factors, LCS are typically co-located with reference monitors, and a calibration equation is developed that relates the raw LCS output as closely as possible to measurements from the reference monitor. This calibration is then transferred to measurements from other monitors in the network. Calibration algorithms tend to be evaluated based on their performance at the co-location sites, under the implicit assumption that conditions at the relatively sparse co-location sites are representative of the network overall. Little work has explicitly evaluated how sensitive hotspot detection and the spatial and temporal PM2.5 trends reported by an LCS network are to the correction method applied. This paper provides a first look at the transferability of different calibration methods using the dense Love My Air network of LCS monitors in Denver. It proposes a series of transferability metrics that can be applied to other networks, suggests which calibration method would be most useful for different end goals, and develops a set of best-practice recommendations for calibrating LCS networks.
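The co-location workflow the abstract describes can be sketched minimally as follows: fit a correction against a reference monitor, then transfer it to the rest of the network. The synthetic data and variable names (raw_pm25, ref_pm25, rh) here are illustrative placeholders, not the Love My Air fields or any of the paper's actual models.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic co-location data: raw LCS output plus a meteorological covariate.
n = 500
ref_pm25 = rng.gamma(shape=2.0, scale=6.0, size=n)           # reference PM2.5 (ug/m3)
rh = rng.uniform(20, 90, size=n)                             # relative humidity (%)
raw_pm25 = 1.6 * ref_pm25 + 0.08 * rh + rng.normal(0, 2, n)  # biased LCS signal

# Fit a simple multivariate linear correction: ref ~ raw + RH.
X = np.column_stack([raw_pm25, rh])
model = LinearRegression().fit(X, ref_pm25)

# The fitted correction is then transferred to sensors elsewhere in the network.
corrected_pm25 = model.predict(X)
rmse = np.sqrt(np.mean((corrected_pm25 - ref_pm25) ** 2))
print(f"RMSE after correction: {rmse:.2f} ug/m3")
```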
Priyanka deSouza et al.
Status: final response (author comments only)
RC1: 'Comment on amt-2022-65', Anonymous Referee #1, 14 Apr 2022
This is a very timely paper that provides a systematic and deep analysis of the different ways that low-cost sensors can be calibrated by co-location with regulatory-grade equipment. In particular, it provides useful information on how best to calibrate depending on the co-location period possible. The paper uses a variety of calibration models (n=21), starting with simple linear corrections and ending with complex machine learning algorithms, where it is often difficult to know the mechanism of the correction. The calibration models are tested on four different co-location periods. In particular, the difference between the C1 and C2 co-location strategies is interesting because it shows that more calibration data is not necessarily helpful if it doesn't capture the variability in the parameters. The hot spot analysis is also interesting, highlighting the need for care when interpreting individual sensors within a network.
Low-cost sensors are used in various ways. Sensor networks like the 'Love My Air' network used as the data set in this paper complement existing regulatory activities, whereas in other contexts low-cost sensors are used where regulatory measurements are scant or non-existent. This paper will prove very useful to all users of low-cost sensors.
The paper is very robust in its description and should be published, once the following (mostly minor) points are addressed.
In general, the resolution of the figures should be improved.
Abstract and L49 – no need to say 'gold standard reference monitors'; 'reference monitors' is sufficient.
L42 estimates of the number of premature deaths due to air pollution vary widely; this should be acknowledged, or at least 'approximately' should be added before the 6.7M figure.
L70 ‘leading to mass overestimation…’ should be ‘leading to the (regulatory) dry mass overestimation’ or similar
L74 need to acknowledge that most of the PM mass concentration is at particle diameters greater than 300 nm.
L96 Köhler not kohler
L119 I would state that R^2 is a misleading indicator rather than might be
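The referee's point about R^2 can be made concrete with a toy example: a sensor whose output is perfectly correlated with the reference but biased in gain and offset still scores R^2 = 1 while its error remains large. The numbers below are hypothetical, not data from the paper.

```python
import numpy as np
from scipy.stats import pearsonr

ref = np.linspace(5, 50, 100)     # reference PM2.5 (ug/m3)
lcs = 2.0 * ref + 10.0            # perfectly correlated but strongly biased sensor

r, _ = pearsonr(ref, lcs)
rmse = np.sqrt(np.mean((lcs - ref) ** 2))
print(f"R^2  = {r**2:.3f}")       # 1.000: correlation is perfect
print(f"RMSE = {rmse:.1f} ug/m3") # yet the sensor is badly biased
```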
L215-216 you would expect averaged data to have less variance.
L240 RH, T, and D are not independent parameters. A discussion of the use of non-independent parameters within the calibration algorithms should be provided.
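To illustrate the collinearity concern: dew point (D) is fully determined by temperature and relative humidity, so the three predictors cannot vary independently in a regression. A minimal sketch using the widely used Magnus approximation (constants b = 17.62, c = 243.12 deg C):

```python
import numpy as np

def dew_point_c(temp_c: float, rh_pct: float) -> float:
    """Magnus-formula dew point (deg C) from temperature and relative humidity."""
    b, c = 17.62, 243.12
    gamma = np.log(rh_pct / 100.0) + b * temp_c / (c + temp_c)
    return c * gamma / (b - gamma)

# D is a deterministic function of T and RH, so feeding all three into a
# calibration model introduces collinearity among the predictors.
print(f"{dew_point_c(25.0, 60.0):.1f} deg C")   # ~16.7
```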
L302 how do you choose which site to leave out in the LOSO methodology? What potential bias(es) does this introduce into the analysis?
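One common answer to the site-selection question is to hold out every co-location site in turn rather than choosing a single one, so that no individual choice biases the reported transfer error. A sketch with hypothetical synthetic data, using scikit-learn's LeaveOneGroupOut:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneGroupOut

# Hypothetical arrays: X (LCS + met features), y (reference PM2.5),
# and site IDs marking which co-location site each row came from.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = X @ np.array([1.2, 0.3]) + rng.normal(0, 1, 300)
sites = rng.integers(0, 5, size=300)   # 5 co-location sites

# Leave-one-site-out: every site is held out exactly once.
rmses = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=sites):
    m = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = m.predict(X[test_idx])
    rmses.append(np.sqrt(np.mean((pred - y[test_idx]) ** 2)))
print(f"LOSO RMSE per held-out site: {np.round(rmses, 2)}")
```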
L333 and most other equations. Pet peeve – use proper multiply symbol rather than x in equations.
L351 “as these concentrations account for the greatest differences in health and air pollution avoidance behavior impacts” this statement is unclear. Are you suggesting that 30 ug/m3 is a cut off for more harmful PM health effects? My understanding is the health effect: concentration curve is reasonably linear over these ranges.
L393 note that a p value of 0.05 means that 1 in 20 results can be due to chance. With 21 models and 4 colocation conditions, you might expect some false positives.
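The expected number of chance "significant" results follows directly from the comment's arithmetic (assuming all 21 models are tested under all 4 co-location conditions):

```python
# 21 models x 4 co-location conditions = 84 tests.
# At alpha = 0.05, several nominally significant results arise by chance alone.
n_tests, alpha = 21 * 4, 0.05
print(f"Expected false positives at alpha={alpha}: {n_tests * alpha:.1f}")  # 4.2
print(f"Bonferroni-corrected threshold: {alpha / n_tests:.5f}")             # 0.00060
```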
L457 model 2 has a lower RMSE than model 16, so doesn’t that contradict “more complex models yielded a better performance”
L472 “the nonlinear correction for RH” gave the best performance. Doesn’t this suggest that a physically grounded model (essentially k-Köhler) works best when extensive colocation data is not possible? See for example Crilley et al. (2020) https://doi.org/10.5194/amt-13-1181-2020
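For context, the κ-Köhler-style humidity correction of Crilley et al. (2020) can be sketched as below; the hygroscopicity parameter κ = 0.4 and the dry-particle density of 1.65 g/cm3 are illustrative placeholder values, not those fitted in either paper.

```python
def rh_correct(pm_raw: float, rh_pct: float, kappa: float = 0.4) -> float:
    """kappa-Kohler-style hygroscopic-growth correction (after Crilley et al., 2020).

    kappa = 0.4 and the 1.65 g/cm3 dry-particle density are placeholder
    assumptions; in practice both should be fitted to local co-location data.
    """
    aw = min(rh_pct / 100.0, 0.95)                      # water activity, capped near saturation
    growth = 1.0 + (kappa / 1.65) / (-1.0 + 1.0 / aw)   # wet/dry mass ratio
    return pm_raw / growth                              # divide out the water mass

# A raw reading of 30 ug/m3 at 85% RH deflates substantially once
# hygroscopic water uptake is removed:
print(f"{rh_correct(30.0, 85.0):.1f} ug/m3")            # ~12.6
```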
L528 does the temperature offset on CS19 make sense with respect to the position of the sensor?
RC2: 'Comment on amt-2022-65', Anonymous Referee #2, 28 Apr 2022
General comments
This paper is about calibrating low-cost sensors of particulate matter using many different models. The paper promises to give a set of best practices and to describe the transferability of the calibration to sensors not co-located with a reference measurement; however, there is so much data in the paper that these tangible conclusions are lost to me. Maybe some of my comments below will help bring clarity to the next version of this paper.
Also, in the conclusions, future work #2 is exactly what this paper was supposed to determine (based on what the abstract tells us). Thus, there may be a big problem with the overall scope of this paper and confusion over exactly what the take-home messages should be from this work.
For better readability of the final paper, consider breaking up the big tables into a few smaller ones with more focused information in them. It might be worth using color or shading to indicate the sensors or models that stand out and are talked about more in the text.
There are so many references to Supplemental figures, do some of these perhaps belong in the main paper? Maybe display data for a specific site or model and then have the rest of the sites and models in the Supplemental. But then, the reader gets more out of the main paper as a standalone manuscript.
Lines 333 and 453 - Which is it, 89 models or 21 models?
Is Section 2.3.1 (and really, all of Section 2.3) necessary? The way this section is presented, I’m not sure what value it adds to the paper (except for the equations). There are lots of statements about ‘we report’ and ‘we display’, but the text doesn’t say where to find these.
Line 454 seems to be an important conclusion, but I don’t see any good defense of this statement in the rest of the paper. How is the C2 correction better exactly? In fact, line 683 says that the C2 correction was significantly worse for the complex models (were complex and simple models clearly defined anywhere?). Line 567 says that C1 and C2 corrections have no significant differences between them. These statements seem like contradictions to me and help lead to my confusion about the whole paper.
Line 670 - The statement here is not very certain; it seems to say that differences in meteorology “likely” matter. Can’t this paper quantify the influence of meteorology? You have T and RH data at each sensor, so you should be able to better determine the effects of meteorology as compared to aerosol composition, where you have no measurements to use.
Line 704-705 If this is an important conclusion, then there should be a figure in the main paper that supports this conclusion (I don’t think there is).
Stylistically, much of the Discussion (Section 4) doesn’t seem to add anything new; it’s just repeating the conclusions from each of the figures presented earlier.
Line 730 - Can you really conclude this about Denver? Later, Line 779, you state that the network was over a “fairly small area”. Was all of Denver covered, then?
Line 741 - Did you define or identify different pollution regimes somewhere?
There are a number of sentences throughout the paper which use “it” or “this” as the subject in the sentence, which can add confusion and ambiguity to those sentences. Consider rewording all of these instances.
Specific comments
Line 42 - Can you find a more recent citation and statistic? References says you last accessed the website almost 2 years ago.
Line 125-127 - What about sensor-to-sensor variability? Why is that not considered? Are these sensors all cross-calibrated in a lab prior to deployment?
Line 201 - Is this correlation for minute or hr time resolution?
Line 217 - What additional uncertainties? Be specific.
Line 330-331 - Figure S5 and S6 don’t actually prove that there is a high correlation across sites.
Line 335 - Figure S9 seems pretty important to some conclusions stated later; I wonder if this should be in the main paper? Also, does this include all of the co-located sites or just some of them? (I think it must be all the reference monitor sites but just one of the LCS at those sites even though there may be multiple.). Be specific in the text and figure caption. Also, the colorbar is missing labels.
Lines 503-505 - I am confused about what the 1-minute data are being compared against to evaluate the LCS performance at this time resolution; I don’t think the reference monitors report data this frequently.
Line 511 - Which models, specifically?
Line 529 - “appears to be” is qualitative and not useful; quantify the difference.
Line 549 - Why does Figure 2b appear to have a different shape to the box and whisker plots relative to the other parts of this figure?
Lines 553-554 - confusing sentence
Figure 2 caption - typos: no (d) or (e)
Figure 3 - Need better labels in the y-axis for the models; they are referred to as “Model 1, 2, …” in the caption but differently on the figures themselves.
Line 625 - “It appears” is qualitative language and not helpful. Don’t you quantify the variation later in the sentence? You should prove that these variations are significant and then leave no doubt to the reader what the conclusion should be.
Lines 626-627 - These max numbers look like they are all due to one specific model (there is one row and one column with dark green colors, while all other models are pink). If you take out this one model, does your conclusion hold? Why is the one model so different than the others?
Line 652 - Was “exposure assessment” used/defined earlier? I don’t know if this is something new calculated from the PM concentrations or not.
Lines 705-707 - confusing sentence
Lines 745-746 - redundant wording
Technical corrections
Line 234 - missing space
Lines 283 and 333 - single sentence paragraph
Line 495 - ‘correction’
Section 3.1 appears twice, all succeeding sections need to be renumbered.
Line 542 - this sentence is not needed
Line 546 - “LOBD”
Line 608 - inconsistent ways of referring to months
Line 692 - why the colon?
Misplaced or missing commas - Lines 684, 685, 702, 711, 751