Improved real-time bio-aerosol classification using artificial neural networks

Leśkiewicz, Maciej; Kaliszewski, Miron; Włodarski, Maksymilian; Młyńczak, Jarosław; Mierczyk, Zygmunt; Kopczyński, Krzysztof

doi:https://doi.org/10.5194/amt-11-6259-2018

Articles | Volume 11, issue 11

© Author(s) 2018. This work is distributed under
the Creative Commons Attribution 4.0 License.

Research article | 20 Nov 2018

Final revised paper (published on 20 Nov 2018)
Preprint (discussion started on 20 Apr 2018)

Interactive discussion

Status: closed

AC: Author comment | RC: Referee comment | SC: Short comment | EC: Editor comment

RC1: 'Review of Leśkiewicz et al., Improved real-time bio-aerosol classification using Artificial Neural Networks, for AMT.', Anonymous Referee #1, 05 Jun 2018
- AC1: 'Responses to Reviewer's comments.', Miron Kaliszewski, 13 Jul 2018
RC2: 'Review of Leśkiewicz et al.', Anonymous Referee #3, 10 Jun 2018
- AC2: 'Responses to Reviewer's comments.', Miron Kaliszewski, 13 Jul 2018

Peer-review completion

AR: Author's response | RR: Referee report | ED: Editor decision

AR by Miron Kaliszewski on behalf of the Authors (13 Jul 2018) Author's response Manuscript

ED: Referee Nomination & Report Request started (17 Jul 2018) by Mingjin Tang

RR by Anonymous Referee #1 (31 Jul 2018)

RR by Anonymous Referee #4 (20 Sep 2018)

Suggestions for revision or reasons for rejection

This manuscript discusses the application of artificial neural network (ANN) techniques to the previously developed BARDet fluorescence detection system, as well as the potential application to aerosol characterization. In this study, the authors aerosolized 48 different fluorescent aerosols and arranged a system of artificial neural networks into a decision tree to categorize the measured particles. The resulting system was a set of 22 ANNs that together classify the overall data set over multiple iterations. Real-time and inexpensive bioaerosol classification is an extremely important step toward understanding the overall effects bioaerosols have on the environment and climate, so a paper addressing it could be of great interest to the AMT community. However, there are some issues here that need to be addressed prior to considering acceptance.

Major

1. Leaving out non-fluorescent particles may be a bigger challenge to successfully implementing the ANNs than the authors suggest, considering that the majority of atmospheric aerosols are largely “non-fluorescent.” The manuscript gives no further justification for this other than the statement in line 195 regarding application of a threshold. Separating a host of fluorescent particle types is one thing, but in an atmospheric sample fluorescent particles will be an extreme minority. A recent study (Savage et al., 2018) utilized HAC techniques to attempt to classify similar types of particles, though the high thresholding needed to get rid of the majority of the “non-fluorescent” particles ultimately confounded the clustering algorithm because an appreciable number of “fluorescent” particles were removed as well. If this paper is going to move forward, this needs to be addressed as a limitation of the study. For example, I suggest discussing that the absence of nonfluorescent particles is a limitation for usage in ambient studies, though future work could include them in attempts to mimic ambient conditions.

2. The section describing the ANN generation and decision tree processing needs to clarify that this process isn’t replicable in terms of the exact factors used for ANN generation, and that the ANN decision-tree generation results may be different with subsequent trials. A response to reviewer 3 from the first submission discusses this at length (i.e., how the weights/start factors are randomly chosen). I understand this was a real-time attempt at classification, but the non-replicable specific factors, the absence of any secondary decision-tree development trials, and the use of a new type of instrument available only to these researchers can greatly limit the possible impact of this manuscript. This is only compounded by the first point (no non-fluorescent particles probed) in that what was done here may not be applicable to ambient data sets.

3. The sizes of individual particle types, as well as the asymmetry factor, are listed in Table 2 for each particle type. It’s not clear from the text that these parameters are being used in the ANNs in any way, and in fact the opposite seems to be the case. In terms of the sizes of particles, there are three sub-points:

a. Some of these pollen particles are seen around 85 microns (A. alba), and relatively small particles around 2 microns (Riboflavin) are also being measured. Other commercial UV-LIF instruments have difficulty detecting small and large particles simultaneously without running into limit-of-detection or saturation issues, respectively. The 2016 paper shows some information on size dependence, but only goes up to 8 microns. A statement about the dynamic range of the instrumentation would be helpful here.
b. The average and standard deviation of size are given in this table, though with no units attached. This needs to be addressed in the table. Judging by the FM7 measurements listed, the unit appears to be microns. It is unlikely that the authors would be measuring intact pollen grains with such aggressive sampling methods (intense vortexing/vibration). Aerosolization of pollen has been seen to rupture pollen in previous studies (Hernandez et al., 2016; Savage et al., 2017) as well, let alone aggressive vortexing/vibration. The low uncertainty on the measurements (e.g. 44.8 ± 2.01 for S. cereale pollen) also points to intact pollen being measured.
c. Why was only the normalized spectral shape used in the ANN decision making? In a particularly bad example, buckwheat flour and cellulose were effectively unable to be classified against one another, though these two particle types showed very different average sizes and asymmetry factors.

4. The paragraph beginning on line 53, describing fluorescent particles and their detection/characterization, seems to be missing several key papers and cites a paper (Hernandez et al., 2016) that is irrelevant to the discussion there. Pan et al., 2007; Crawford et al., 2015; Ruske et al., 2017, 2018; Savage et al., 2017 and 2018 are all examples of recent work in the area discussed in the referenced sentence.

5. There needs to be mention of the absence of nonfluorescent particles in the abstract, and that further work would need to probe this.

6. Usage of the word “real-time” in the abstract is misleading, because while the instrument does measure in real time the data was collected separately (per aerosol type). The time-component for this study is irrelevant in this case.
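To make the replicability concern in Major point 2 concrete, the following is a minimal sketch, not the authors' training code, of why randomly chosen starting weights lead to different networks on each run unless the random seed is fixed and reported. The function name `init_weights`, the layer sizes, the weight range and the seed values are all assumptions made for illustration.

```python
import random


def init_weights(n_inputs, n_hidden, seed=None):
    """Return an n_hidden x n_inputs matrix of small random starting weights.

    Hypothetical helper: with seed=None the generator is seeded from the
    operating system, so every call gives a different starting point.
    """
    rng = random.Random(seed)  # private generator, independent of global state
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
            for _ in range(n_hidden)]


# Two runs with the same (assumed) seed start from identical weights ...
seeded_a = init_weights(4, 3, seed=2018)
seeded_b = init_weights(4, 3, seed=2018)

# ... while runs with different seeds start from different weights,
# which is what happens implicitly when no seed is recorded.
other_seed = init_weights(4, 3, seed=2019)

print(seeded_a == seeded_b)    # same starting point
print(seeded_a == other_seed)  # different starting point
```

Recording the seed alongside the network architecture would be one way to let a reader regenerate the same starting weights; whether the toolchain used in the manuscript exposes such a seed is not stated in the review.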

Minor (or Technical) Points

1. The raw number of particles per aerosolized particle type is not listed; instead, only a raw total (114779) for the entire data set is given, along with an average number of spectra per type (~2400). This needs to be addressed, as the stated average of 2400 could be true, though it could also be misleading.

2. Take out the word “impressive” on line 30 (before effectiveness)

3. Line 188: Leaf scraps should be “leaf litter”

4. Naming of things in Table 2 is not consistent (e.g. pollen types - some scientific name, some common name), nor are the abbreviations (e.g. Ambio vs FM7 vs PF), which is distracting to the overall data.

5. Graph styles are not consistent (Figure 7 and 8 ROC graphs have different tick numbering, as well as background line densities) which is distracting to the overall presentation of the data.

6. The text size on certain figures (8 and 9) needs to be increased, and the ROC graphs are low-resolution compared to the confusion matrices listed.

7. Line 123: “The simple statistics” isn’t the correct syntax. Maybe “simpler statistical analysis”

8. Line 192: This line makes no sense currently, and needs to be reworked.

9. Line 194: “The non-fluorescent particles were not a subject of the research since they can be automatically discarded as non-biological applying given fluorescence threshold.” This line needs to be taken out, because it is fundamentally wrong. As addressed above, removal of non-fluorescent particles via higher thresholding isn’t sufficient reasoning to claim they will be removed efficiently, because this gets rid of a large overall number of particles (Savage et al., 2017, Savage et al., 2018) and can confound HAC clustering, at the very least. This needs to be mentioned, but as an overall limitation of the scope of this paper.

10. Fluoromax Microspheres are cited as the material used for the FM7, though it doesn’t cite the particular fluorescent type (usually listed as a color) used.

11. Figure 2’s usage of 50 spectra per type is more confusing than helpful as to how the input data is used for the ANN training. I assume if the reported 2400 spectra had been visualized it would be much busier, but this gives the impression that the training data only used 50 spectra for each aerosol type, which may or may not be the case.

12. Line 16: The term “air contamination” is not usually associated with biological particles, unless there is a specific source of contamination like a waste facility or a mold outbreak.

13. Figure 2 significant figures listed are not uniform for all Size and Asymmetry Factor measurements.
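The arithmetic behind Minor point 1 can be checked directly. The sketch below uses only the two numbers quoted in this review (114779 spectra in total, 48 aerosol types); the per-type split is HYPOTHETICAL and exists only to show that a very uneven distribution yields the same average, which is why per-type counts are requested.

```python
from statistics import mean, median

TOTAL_SPECTRA = 114779  # total quoted in the review
N_TYPES = 48            # number of aerosol types in the study

# Average spectra per type implied by the total: roughly 2391, i.e. "~2400".
average = TOTAL_SPECTRA / N_TYPES

# HYPOTHETICAL split, for illustration only: 47 types with 1000 spectra
# each and a single type holding all remaining spectra.
hypothetical_counts = [1000] * (N_TYPES - 1)
hypothetical_counts.append(TOTAL_SPECTRA - sum(hypothetical_counts))

print(round(average))               # average implied by the quoted total
print(mean(hypothetical_counts))    # same average for the uneven split
print(median(hypothetical_counts))  # typical type has far fewer spectra
print(max(hypothetical_counts))     # one type dominates the data set
```

In this made-up split the mean still matches the reported average while the median type has well under half as many spectra, so the average alone cannot rule out a strongly imbalanced training set.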

Citations:

Pan, Y.L., Pinnick, R.G., Hill, S.C., Rosen, J.M. and Chang, R.K., 2007. Single‐particle laser‐induced‐fluorescence spectra of biological and other organic‐carbon aerosols in the atmosphere: Measurements at New Haven, Connecticut, and Las Cruces, New Mexico. Journal of Geophysical Research: Atmospheres, 112(D24).

Savage, Nicole J., et al. "Systematic characterization and fluorescence threshold strategies for the wideband integrated bioaerosol sensor (WIBS) using size-resolved biological and interfering particles." Atmospheric Measurement Techniques 10.11 (2017): 4279-4302.

Savage, Nicole J., and J. Alex Huffman. "Evaluation of a hierarchical agglomerative clustering method applied to WIBS laboratory data for improved discrimination of biological particles by comparing data preparation techniques." Atmospheric Measurement Techniques 11.8 (2018): 4929-4942.

Crawford, I., et al. "Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol." Atmospheric Measurement Techniques 8.11 (2015): 4979-4991.

Ruske, Simon, et al. "Evaluation of machine learning algorithms for classification of primary biological aerosol using a new UV-LIF spectrometer." Atmospheric Measurement Techniques (2017).

ED: Reconsider after major revisions (22 Sep 2018) by Mingjin Tang

AR by Miron Kaliszewski on behalf of the Authors (31 Oct 2018)

ED: Publish as is (06 Nov 2018) by Mingjin Tang

AR by Miron Kaliszewski on behalf of the Authors (06 Nov 2018) Author's response Manuscript

Short summary

In this study we demonstrate the application of artificial neural networks to the real-time analysis of single-particle fluorescence fingerprints acquired using BARDet (a BioAeRosol Detector). Forty-eight different aerosols, including pollens, bacteria, fungi, spores and nonbiological substances, were characterized. We discuss an entirely new approach to data analysis using a decision tree comprising 22 independent neural networks, which yielded very high accuracy of aerosol classification in real time.
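The general idea of a decision tree whose nodes are classifiers can be sketched as follows. This is an illustrative toy, not the published 22-network tree: simple hand-written rules stand in for trained neural networks, and the names `Node`, `predict`, `normalize`, the branch names and the class labels are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Union

Spectrum = Sequence[float]


@dataclass
class Node:
    # Stand-in for one trained network: maps a spectrum to a branch name.
    classify: Callable[[Spectrum], str]
    # Each branch leads to another Node or to a final class label (str).
    children: Dict[str, Union["Node", str]]


def normalize(spectrum: Spectrum) -> list:
    """Scale a spectrum to unit sum so only its shape matters."""
    total = sum(spectrum)
    return [value / total for value in spectrum]


def predict(node: Union[Node, str], spectrum: Spectrum) -> str:
    """Walk from the root, letting each node's classifier pick a branch."""
    while isinstance(node, Node):
        node = node.children[node.classify(spectrum)]
    return node


# Toy tree with two decision nodes; the rules are hypothetical placeholders
# for networks trained on normalized fluorescence spectra.
root = Node(
    classify=lambda s: "short" if s[0] > s[-1] else "long",
    children={
        "short": Node(
            classify=lambda s: "narrow" if max(s) > 0.5 else "broad",
            children={"narrow": "class A", "broad": "class B"},
        ),
        "long": "class C",
    },
)

print(predict(root, normalize([8, 1, 1])))
print(predict(root, normalize([4, 3, 3])))
print(predict(root, normalize([1, 2, 7])))
```

Each node only has to separate a few groups, so the networks at the individual nodes can stay small; how the published tree assigns aerosol types to its nodes is described in the paper itself, not here.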