This work is distributed under the Creative Commons Attribution 4.0 License.
Improving raw readings from ozone low cost sensors using artificial intelligence for air quality monitoring
Abstract. Ground-level ozone (O3) is a highly oxidising and reactive gas, harmful at high concentrations, generated by complex photochemical reactions when "primary" pollutants from fossil fuel combustion react in sunlight. Its concentration therefore serves as an indicator of the activity of other air pollutants and plays a key role in air quality monitoring systems in smart cities. To increase the spatial sampling resolution across the city, ozone low-cost sensors are an interesting alternative, but they lack accuracy. In this context, artificial intelligence techniques, in particular ensemble machine learning methods, can improve the raw readings from these sensors by taking into account additional environmental information. In this paper, we analyse, propose and compare different techniques, reducing the estimation error by around 94 %. The best results are achieved with the Gradient Boosting algorithm, outperforming related work while using a sensor approximately 10 times less expensive.
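A minimal sketch of the kind of ensemble correction the abstract describes, i.e. a gradient boosting model mapping the raw low-cost sensor reading plus environmental covariates to the reference ozone value. The column names, file name and hyperparameters below are illustrative assumptions, not the configuration used in the paper.

```python
# Hedged sketch: correct raw low-cost ozone readings with gradient boosting.
# Column names (o3_raw, temp, rh, o3_ref), the file name and the hyperparameters
# are illustrative assumptions, not the paper's actual setup.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

df = pd.read_csv("colocation.csv")            # co-located LCS and reference data
X = df[["o3_raw", "temp", "rh"]]              # raw sensor reading plus covariates
y = df["o3_ref"]                              # reference-station ozone

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)

model = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, max_depth=3)
model.fit(X_train, y_train)

rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print(f"Test RMSE: {rmse:.2f}")
```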
Status: open (until 27 Dec 2024)
RC1: 'Comment on amt-2024-127', Anonymous Referee #1, 31 Oct 2024
This paper requires major edits to be considered for publication. The introduction and related works sections are extremely weak and do not set up a solid foundation for the work the authors hope to achieve with their gradient boosted calibration of low-cost ozone sensors. The many figures and tables regarding feature selection are not well explained. The ML model generation and final model outputs, especially the gradient boosted model, seem sound, but the authors neglect to include the results of the testing data, which is a better indicator of whether the models are overfitting and better demonstrates how these models would perform in the field as compared to the training statistics, which are the focus of the article. The grammar throughout needs improvement, and there are many instances where subscript is needed (including in figures). Overall, additional literature review and context will lay a stronger foundation for the model building, and careful revision of which figures and tables are really necessary along with added information on the training dataset of the model (which speaks to overfitting and real-world applicability) will greatly improve the paper.
The introduction section does not provide sufficient context. First, the authors list a few vague sentences about air quality in general. For example, line 15: “exceeds the limit values of the recommended safety guidelines”– what guidelines? Limits for what pollutants? A brief description of how ozone is formed is given, but no specifics on the region of interest or on the associated health effects and consequences. The authors mention that low-cost sensors have lower accuracy – why? What are the issues surrounding them?
One other machine learning-enabled calibration effort is mentioned in this section, with no information on HOW machine learning actually improves this. This feels out of place here and the same information is listed again in section 2, so I would suggest removing it here and expanding on it in section 2. More exploration of other machine learning based calibration algorithms beyond the ZPHS01B-specific ones referenced later on would strengthen the paper as ML-based calibration is common practice in the field. The outline in line 36 is unnecessary.
In table 1, rather than listing “low”, “mid-low”, etc. and then defining it in the text, it would be easier for the reader if the cost were just listed in the table. Better distinction is needed between what is an individual sensor vs. what is a complete package. The table is titled systems and/or modules, but the text does not explain what the distinction between a system and a module is. Why were these chosen for this table? Without any explanation as to why these are here, it seems random. There is also no explanation as to why some are more expensive than others – are some better performing? In line 48, “it is necessary to use modules embedding as many AQ LCS as possible.” – why is it necessary?
In line 49, it is stated that one of these “is the best solution at the time of writing”. If this means the best choice for the author’s specific set of needs and wants, this needs to be clearly stated. It reads as an opinion stated as universal fact. Table 2 does not summarize 4 distinct concentration levels as stated in the text.
In line 66, the authors state “The calibration process of these LCS is a challenge, where ML and Deep Learning (DL) models can be used.” The authors have not given any information on why calibrating low-cost sensors is challenging. The introduction should include more background information on what these challenges are. There are numerous other papers using gradient boosting to calibrate low-cost sensors, yet there is not even one cited in this ‘related work’ section.
The final two paragraphs of section 2 are both non sequiturs. The authors do not mention data preprocessing, analysis or interpretability at all up to this point – this paragraph would only make sense if information on how others have handled these aspects of the data were included in the literature review of other ML calibration techniques.
On line 80, the authors write, “In conclusion, we see that to increase the AQ monitoring resolution at a city scale, LCS are required.” This has nothing to do with the related works in this section, where different machine learning algorithms and their previous performances are listed.
Table 3 seems unnecessary since most of the data available at this station was unused, and it seems the relevant ones are already listed in the text?
In section 3.2, the monitoring intervals listed on lines 104-105 are unclear. Is this a 10-minute average or a single reading once every 10 minutes? The comment on line 105, “it is sufficient”, is also unclear – you need to explain to the reader why, without expecting them to read the entire Zhu paper.
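For illustration, the two possible readings of "every 10 minutes" differ as in the hypothetical pandas sketch below; stating which one applies would resolve the ambiguity.

```python
# Hypothetical illustration of the two readings of "10 minutes":
# a 10-minute mean of 1-minute samples vs one instantaneous sample per 10 minutes.
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=120, freq="1min")
o3 = pd.Series(40 + 5 * np.random.randn(120), index=idx)   # synthetic 1-min ozone series

ten_min_mean = o3.resample("10min").mean()     # average over each 10-minute window
ten_min_sample = o3.resample("10min").first()  # a single reading every 10 minutes
```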
For any table or figure, the reader should be able to understand it based on the table or figure and its caption alone. For table 4, the meanings of the abbreviations are not defined anywhere in the figure, caption, or main text. You shouldn’t make the reader guess what MAD, Diff., Stat., etc. stand for. Without any definitions, this information is not helpful to the reader. Even with definitions, it’s a huge jump from this table to what’s written in the text.
Of table 4, the authors write, “From these results, it is worth mentioning that the CH2O, CO, NO2 and TVOC sensors are not very reliable in the ZPHS01B module. Also, the RH sensor has a positive offset as we can see from the maximum value, 118%. The other sensors have a normal behaviour, although with low accuracy.” There is no CH2O, CO, or TVOC data in table 4. For NO2, the only pollutant mentioned in your description of table 4 that even appears in the table, I don’t know what about the random assortment of numbers and yes/no’s in the table is supposed to tell me that it’s ‘not very reliable’. For RH, I don’t see any value of 118% in the table. Are ‘RH’ (as written in the text) and ‘Hum’ (as written in the table) different? The text and the table have almost nothing in common, and neither helps me understand what you’re doing with the data.
On line 114, DFT is not defined. After reading the rest of the section, it is never explained HOW the results of Figure 3 are used in your analysis. What do those peaks and harmonics tell you, or how do they inform the way you built the model? This needs better explanation for the figure to be worth keeping.
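As an example of the kind of explanation that would help, a short sketch of how the DFT of an ozone time series can be inspected for dominant periodicities (e.g. the 24 h diurnal cycle and its harmonics); the input file and its hourly sampling are assumptions for illustration only.

```python
# Hedged sketch: inspect the DFT of an hourly ozone series for dominant periods.
# The file name and the hourly sampling interval are illustrative assumptions.
import numpy as np

o3 = np.loadtxt("o3_hourly.txt")                 # hourly ozone readings
spectrum = np.abs(np.fft.rfft(o3 - o3.mean()))   # magnitude spectrum, mean removed
freqs = np.fft.rfftfreq(o3.size, d=1.0)          # cycles per hour

peak = freqs[np.argmax(spectrum[1:]) + 1]        # skip the zero-frequency bin
print(f"Dominant period: {1.0 / peak:.1f} hours")
```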
In the figure 4 and 5 captions, ‘vs’ is typically reserved for Y vs X. Your reference and sensor ozone are plotted on the same X axis; consider rewording. In my opinion, Figure 4 can be removed as Figure 5 shows the same information but in better detail.
In table 5, some of the model acronyms are not defined in the text until well after their first appearance in the tables – moving these higher in the text or defining them directly in the table will make it easier on the reader. There is again a discrepancy between ‘RH’ in the text and ‘Hum.’ in the figure. Was there a cutoff number to determine which were the most important? Was this across all models, or were the results of one in particular favored? Including this information in the text will help the reader to follow how you selected the three inputs to move forward with. I think the sentence “For clarity it is not included the importance of date and ozone itself from LCS values, that complete the rest.” is meant to explain why ozone isn’t included in this analysis, but the sentence doesn’t make sense. It might make more sense to include ozone in the analysis to demonstrate how important it is rather than ask the reader to just trust that it is.
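For reference, impurity-based importances from a fitted tree ensemble can be extracted and ranked in a few lines, which would make the selection criterion explicit; the sketch below is illustrative only, with assumed column names rather than the paper's actual feature set.

```python
# Hedged sketch: rank features by impurity-based importance from a fitted
# gradient boosting model. Columns and file name are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

df = pd.read_csv("colocation.csv")
features = ["o3_raw", "temp", "rh", "no2", "pm25"]
model = GradientBoostingRegressor().fit(df[features], df["o3_ref"])

importances = pd.Series(model.feature_importances_, index=features)
print(importances.sort_values(ascending=False))  # e.g. keep features above a stated cutoff
```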
In Figure 6, ‘CH2O’ (letter O) seems to be misspelled as ‘CH20’ (number zero). Many variables that were left out of the previous tables/figures are now shown here – CH2O, CO, TVOC. Had you already ruled these out? It seems that these are in the wrong order, at the very least. Tables 4 and 5, and Figures 3-6 all seem to be getting at which data to include in the model, but several of them could likely be moved to the supplement (or removed outright) pending better explanations of how these are actually used. What separate purpose does each of them serve?
In the paragraph beginning on line 136, the authors state, “, two of them showed better results” – which two? List this information here.
The location of tables 6 and 7 in the text doesn’t make sense – you are showing the results of the models before explaining what the models are in section 3.4. I don’t think showing both tables 6 and 7 is necessary. The authors state, “Thus, if we add more features that are not so significant, it makes the dataset poorer.” This is already a well-established principle in the field that does not require explicit demonstration. You’ve already shown in several figures and tables how you did feature selection – does this contradict the feature selection work you did earlier? Either way, there are many other papers establishing ozone sensor + temperature + humidity (and sometimes NOx) as the best model inputs for O3 (see several below). When many others have already demonstrated the same result that it’s taking you 4 tables and 4 figures to describe, you can just cite those who have done it before with a brief explanation.
https://doi.org/10.3390/atmos12050645
https://doi.org/10.5194/amt-11-1937-2018
https://doi.org/10.1016/j.snb.2018.12.049
In tables 8-11, the captions should indicate what the numbers in bold mean. This is stated once in the text in line 167, but the authors don’t state what criterion was used to decide on the ‘best option’. Was it highest R2, lowest RMSE? If needed, all of these except for that of the best performing model can be moved to the supplement.
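For clarity, the two candidate criteria mentioned above are straightforward to report side by side; a minimal sketch (with placeholder arrays) of computing both:

```python
# Hedged sketch: the two candidate selection metrics, R2 and RMSE.
# y_true / y_pred are placeholders for reference and corrected ozone values.
from sklearn.metrics import mean_squared_error, r2_score

def evaluate(y_true, y_pred):
    r2 = r2_score(y_true, y_pred)
    rmse = mean_squared_error(y_true, y_pred) ** 0.5
    return r2, rmse

print(evaluate([40.0, 55.0, 62.0], [42.0, 53.5, 60.0]))
```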
In line 202, it is stated that the 90-10 test-train split worked best for all models. Has any analysis been done to ensure these aren’t overfitting? This could be interesting to explore with Table 14 and/or figure 8, but table 14 without this doesn’t seem overly informative.
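A minimal sketch of the overfitting check being asked for here: fit on the 90 % training portion and report the metric on both portions, where a large train-test gap would indicate overfitting. Data columns and model settings are assumptions for illustration.

```python
# Hedged sketch: compare training vs testing R2 for a 90-10 split.
# A large gap between the two values suggests overfitting.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("colocation.csv")                     # illustrative file name
X, y = df[["o3_raw", "temp", "rh"]], df["o3_ref"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

model = GradientBoostingRegressor().fit(X_tr, y_tr)
print("train R2:", r2_score(y_tr, model.predict(X_tr)))
print("test  R2:", r2_score(y_te, model.predict(X_te)))
```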
Similar with table 15 – in line 224, the authors write, “In the same line as before, once again we can see how the GB adjusts better compared with the other models.” If this entire table exists just to make a point about GB that has already been made, is it a necessary table?
In tables 6, 7, 12, 13, 14, 16, 17, I suggest clarifying in the captions whether this is training or testing data. If it’s all training data, I would be very interested in seeing the testing data added, as the testing data is a better indicator of how this model would actually perform in the field.
Figure 7 is great and the most informative in the paper. If a graphical abstract is requested, I would suggest this one.
Figure 8 has a typo in ‘Percentage’ on the lower right. This plot could be much stronger if the R2 and RMSE were plotted for both the training and testing data instead of just training. There’s no question that the more training data you have, the better the fit to the training set will be – it is the testing data that will indicate whether you’re overfitting. This would be a great place to address overfitting in your discussion.
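Concretely, the suggested figure could sweep the training fraction and plot both curves, along the lines of this illustrative sketch (assumed data and default model settings):

```python
# Hedged sketch: R2 on training and testing data vs training-set fraction.
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("colocation.csv")                     # illustrative file name
X, y = df[["o3_raw", "temp", "rh"]], df["o3_ref"]

fractions = [0.5, 0.6, 0.7, 0.8, 0.9]
train_r2, test_r2 = [], []
for frac in fractions:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=frac, random_state=0)
    m = GradientBoostingRegressor().fit(X_tr, y_tr)
    train_r2.append(r2_score(y_tr, m.predict(X_tr)))
    test_r2.append(r2_score(y_te, m.predict(X_te)))

plt.plot(fractions, train_r2, label="train R2")
plt.plot(fractions, test_r2, label="test R2")
plt.xlabel("Training-set fraction")
plt.legend()
plt.show()
```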
Figure 9 could use a sentence in the accompanying paragraph (starting at line 217) plainly stating what the key takeaway should be. Is it that HOP improves the model greatly regardless of the original model used?
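Assuming HOP refers to hyperparameter optimisation, a brief statement of what was searched would also help; as a hedged illustration, a small grid search over gradient boosting settings might look like this (the grid and data file are purely illustrative):

```python
# Hedged sketch, assuming "HOP" means hyperparameter optimisation:
# a small grid search over gradient boosting settings. Grid values and
# the data file are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

df = pd.read_csv("colocation.csv")
X, y = df[["o3_raw", "temp", "rh"]], df["o3_ref"]

param_grid = {
    "n_estimators": [100, 300, 500],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [2, 3, 4],
}
search = GridSearchCV(GradientBoostingRegressor(), param_grid, cv=5, scoring="r2")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```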
I’m not sure I see the value of table 16 – as you stated in the introduction and in Figures 4-5, the raw sensor readings are completely unreliable on their own.
In table 17, ‘et al (2016)’ seems to be missing a name. I’m not sure I see the value of table 17 – if these were other studies comparing ozone quantification with the ZPHS01B module, that would make more sense to me than seemingly randomly selected projects using different sensors at different price points?
This study builds a strong ML model to fit a single low-cost sensor for ozone. Some of the challenges regarding field deployments of low-cost sensors include ensuring that each individual node is properly calibrated, and that these calibrations perform just as well in the field, where temperatures, humidities, and ozone concentrations not seen during the co-location with a reference instrument appear. For future works, it would be great to see the authors address what their path forward might look like.
Citation: https://doi.org/10.5194/amt-2024-127-RC1