Articles | Volume 16, issue 20
https://doi.org/10.5194/amt-16-4723-2023
© Author(s) 2023. This work is distributed under the Creative Commons Attribution 4.0 License.
Development of low-cost air quality stations for next-generation monitoring networks: calibration and validation of NO2 and O3 sensors
Download
- Final revised paper (published on 20 Oct 2023)
- Supplement to the final revised paper
- Preprint (discussion started on 02 May 2023)
- Supplement to the preprint
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on egusphere-2023-673', Mark Joseph Campmier, 10 Jul 2023
  - AC2: 'Reply on RC1', Alice Cavaliere, 06 Aug 2023
- RC2: 'Comment on egusphere-2023-673', Anonymous Referee #2, 24 Jul 2023
  - AC1: 'Reply on RC2', Alice Cavaliere, 06 Aug 2023
Peer review completion
AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
AR by Alice Cavaliere on behalf of the Authors (12 Aug 2023)
Author's response
Author's tracked changes
Manuscript
ED: Referee Nomination & Report Request started (27 Aug 2023) by Albert Presto
RR by Mark Joseph Campmier (08 Sep 2023)
RR by Anonymous Referee #2 (08 Sep 2023)
ED: Publish as is (18 Sep 2023) by Albert Presto
AR by Alice Cavaliere on behalf of the Authors (19 Sep 2023)
Manuscript
Overall, this manuscript effectively communicates the full pipeline of data collection, calibration, and validation of NO2 and O3 low-cost sensors. The authors have shown a clear commitment to transparent science and have carefully applied data science principles while staying relevant to the domain of atmospheric science. Importantly, rather than just fitting a plethora of models, they investigate the interpretability of features, including how the relevance of features varies across different meteorological and pollutant loading regimes over a relatively long timeline.
However, this paper suffers from several structural weaknesses. It overstates the novelty of applying SHAP and should cite more recent literature throughout. Furthermore, the novelty of exploring feature relevance is somewhat lost among the acronym-dense model performance metrics, most of which are already well characterized in the literature. After restructuring the paper to state and describe its novel findings more precisely, it will be a useful reference for the community.
Introduction:
Overall, the introduction should be better organized. It would be helpful to first motivate applications of low-cost sensors with a short (1-2 sentences) summary of the relevance of fine-scale spatial-temporal NO2 and O3 patterns before diving into their design. Consider citing work like: doi.org/10.1016/j.envint.2018.04.002, doi.org/10.1136/bmj.n534. Don’t use phrases like “in the last few years” or “nowadays” (line 23); be specific about when low-cost sensors emerged and when deployments scaled.
The phrasing in Line 45 is confusing. I suggest removing the claim that there are “no established protocols” and instead merely stating there are two common strategies. There are some existing guidelines from relevant government agencies (cfpub.epa.gov/si/si_public_file_download.cfm?p_download_id=517654, as well as publications.jrc.ec.europa.eu/repository/handle/JRC83791).
Please restructure the last two paragraphs of the introduction so as not to exaggerate claims of novelty. This would not be the first study to use SHAP for environmental low-cost sensor evaluation, as the authors claim at the end of the introduction section: doi.org/10.1016/j.atmosenv.2023.119692, doi.org/10.3390/s20195497, & doi.org/10.1109/SENSORS52175.2022.9967180. Furthermore, although the three gaps identified by the authors in the literature are still relevant areas of investigation, please better contextualize them. For example, (iii) has been an active area of investigation with many recent publications, as referenced earlier regarding SHAP and low-cost sensors.
Materials and Methods:
Clustering analysis does not obviously follow from a correlation analysis, especially if many environmental variables are expected to be collinear. Additionally, this paper emphasizes the importance of trying many supervised methods but offers no justification for the choice of unsupervised method. This is especially tricky since k-means is probably not the best method for identifying robust clusters given the expected collinearity.
This section is very acronym- and initialism-heavy; please consider writing out at least some of the less utilized terms. It may also enhance readability to move many of the details describing the exact model instantiations and hyperparameters employed (2.3.1 & 2.3.2) to a table in the appendix or SI, especially since many of the models are “off-the-shelf” from scikit-learn and not developed by the authors.
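The collinearity concern above can be illustrated with a minimal sketch that flags strongly correlated feature pairs before any distance-based clustering is attempted. The feature names, the values, and the 0.9 threshold are assumptions made up for illustration; none of them come from the manuscript.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def collinear_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for every feature pair with |r| >= threshold."""
    names = list(features)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(features[a], features[b])
            if abs(r) >= threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical values for illustration only, not data from the study.
features = {
    "temperature": [10.0, 12.0, 15.0, 19.0, 24.0, 28.0],
    "rel_humidity": [80.0, 76.0, 70.0, 61.0, 52.0, 43.0],
    "wind_speed": [2.0, 5.0, 1.0, 4.0, 3.0, 2.5],
}
flagged = collinear_pairs(features)
print(flagged)
```

Pairs flagged this way contribute nearly the same information twice to a Euclidean distance, which is one reason k-means clusters built on such features can be hard to interpret.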
SI Table 1 would benefit from also including some historical data about pollution concentrations from NO2 and O3.
Results:
I recommend changing the joint plots in Figure 3 from hex-binned heatmaps to the much more intuitive scatterplots.
The purpose of the k-means analysis is still unclear here. These 6 clusters may be the most robust set k-means could identify, but that does not mean they are a meaningful or interpretable clustering. From Figure 4, there does not seem to be an obvious regime change or large Euclidean distance between clusters. I would recommend removing these results or considering a density-based clustering approach. Similarly, the regression lines in Figure 4 are not obviously interpretable; they seem noisy and less robust than simply relying on the correlation matrix. I would suggest promoting the bottom triangles of the two Pearson’s r matrices in SI Fig S2 and removing Figure 4. If Figure 4 stays in the manuscript, it should also use a different colormap, as the yellow does not appear clearly on my computer screen. I would recommend a categorical colormap such as Hawaii from https://www.fabiocrameri.ch/colourmaps/.
Figures 5 & 6 are very useful for telling the story of your paper. Consider enhancing them by increasing the height of the y-axis or adding jitter to the points to avoid overlap, which adds visual clutter. Furthermore, please change the colormap as suggested for Figure 4.
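One way to quantify whether a set of clusters is well separated in Euclidean distance is the mean silhouette coefficient. The sketch below is a plain-Python illustration of that standard metric, not the authors' analysis; the points and labels are hypothetical, and at least two clusters with non-identical points are assumed.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance (needs >= 2 clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical points: two compact groups far apart.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
well_separated = mean_silhouette(points, [0, 0, 0, 1, 1, 1])
poorly_separated = mean_silhouette(points, [0, 1, 0, 1, 0, 1])
print(round(well_separated, 2), round(poorly_separated, 2))
```

Values near 1 indicate compact, distant clusters, while values near 0 or below indicate overlapping assignments. The same score can be computed for labels from k-means or from a density-based method, which allows a like-for-like comparison.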
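The jitter suggestion can be sketched as follows, assuming each feature occupies one integer row on the y-axis and that points sharing a row overlap. The function name, the jitter width of 0.2, and the fixed seed are illustrative assumptions.

```python
import random

def jitter(values, width=0.2, seed=0):
    """Add uniform noise in [-width, width] to each value, reproducibly."""
    rng = random.Random(seed)
    return [v + rng.uniform(-width, width) for v in values]

# Hypothetical y positions: three points on row 0, two on row 1, one on row 2.
rows = [0, 0, 0, 1, 1, 2]
jittered = jitter(rows)
print(jittered)
```

Keeping the width well below 0.5 ensures that jittered points never cross into a neighbouring row, so the feature each point belongs to stays unambiguous.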
While Taylor plots like those in Figure 8 can be useful, it seems a table would more succinctly get the point across. I’d recommend moving it to the supplement.
Discussion & Conclusions:
The first two paragraphs of the Discussion section can be combined and made more concise. They do not communicate the novelty of the study and reflect an overall structural problem with this manuscript: too much emphasis is placed on the individual “off-the-shelf” models rather than on the much more interesting implications of feature relevance at differing concentration regimes or the role of model complexity in spatial-temporal transferability. The discussion point on seasonal transferability is quite interesting and I recommend expanding on it. The comparison of DA, PFI, and SHAP is out of place and would be much better in the methods section with references to the literature.
The conclusions should include some detail about implications for work outside of Italy, in differing pollution and meteorological regimes.
Consider promoting SI Figure S10 to the main text; it is useful for understanding the discussion points as well as for contextualizing the range of pollutant concentrations in this study.