Quantification of toxic metals using machine learning techniques and spark emission spectroscopy

Davari, Seyyed Ali; Wexler, Anthony S.

doi:https://doi.org/10.5194/amt-13-5369-2020

Articles | Volume 13, issue 10


© Author(s) 2020. This work is distributed under
the Creative Commons Attribution 4.0 License.


Research article | 09 Oct 2020



Final revised paper (published on 09 Oct 2020)
Preprint (discussion started on 23 Oct 2019)

Interactive discussion

Status: closed

AC: Author comment | RC: Referee comment | SC: Short comment | EC: Editor comment


RC1: 'Quantification of toxic metallic elements using machine learning techniques and spark emission spectroscopy', Anonymous Referee #1, 06 Dec 2019
- AC1: 'Response to Reviewer 1 Comments', Ali Davari, 17 Feb 2020
- AC3: 'Response to Reviewer 1 Comments with Suppl', Ali Davari, 20 Feb 2020
RC2: 'Quantification of toxic metallic elements using machine learning techniques and spark emission spectroscopy', Anonymous Referee #2, 19 Dec 2019
- AC2: 'Response to Reviewer 2 Comments', Ali Davari, 17 Feb 2020
- AC4: 'Response to Reviewer 2 Comments with Suppl', Ali Davari, 20 Feb 2020

Peer-review completion

AR: Author's response | RR: Referee report | ED: Editor decision

AR by Ali Davari on behalf of the Authors (23 Feb 2020) Author's response Manuscript

ED: Referee Nomination & Report Request started (04 Mar 2020) by Francis Pope

RR by Anonymous Referee #1 (15 Mar 2020)

RR by Anonymous Referee #2 (16 Mar 2020)

Suggestions for revision or reasons for rejection

The authors have to some degree addressed my comments and the manuscript has improved. However, I still find it very hard to read, mostly because it has no clear structure and logical flow to it. Sections that describe developments, experiments, results and analysis are ultra short and leave out much detail. Much is left to the reader’s imagination and interpretation, which is not a good scientific reporting style.

The objectives of the work are still not entirely clear to me. Throughout the manuscript, the same phrases are repeated, but details on how these objectives are achieved are missing in the right places. Also, in my report on the original manuscript I pointed out several of these shortcomings. The reply of the authors was to repeat the same phrases, which is not very impressive.

In conclusion, I am disappointed by the manuscript and by the authors’ efforts to improve it. As somebody not immediately working in the field of atmospheric science, I accepted the review of the manuscript, because the title and the premise sounded interesting. I believe the manuscript is publishable, but it takes more than the current revisions to make the manuscript readable and understandable. More details are given below:

- The introduction has now improved and it is easier to read. However, the section on the spark-induced breakdown spectroscopy (SIBS) and laser-induced breakdown spectroscopy (LIBS) is still very difficult to read. This is in part because of the citation style. The citations are not separated from the actual text in any way and sometimes it is impossible to tell where a citation ends and the text starts again.

- I had recommended to add more context and background to the introduction. The authors talk at length about different detection methods, but do not say much about machine learning in the introduction, although half of the manuscript is about machine learning. I had recommended to cite and then discuss several recent machine learning approaches for spectra and spectroscopy. Since then there have been even more in the literature. This request has been blatantly ignored by the authors. Makes me wonder why we reviewers write referee reports in the first place.

- The objectives are now stated more clearly: “We employed two complementary approaches: (1) decreasing the cost of the electronics associated with SIBS and (2) incorporating advanced data analysis techniques to improve quantification and limit of detection.” However, what is still not clear to me is how the cost is decreased (see below). An example of frequent circular arguments in this manuscript is a sentence in the same paragraph in the introduction: “The expensive components such as spark generation and delay generator have been developed to reduce the overall cost.”

To me this sentence sounds like ‘we developed expensive components to reduce the cost’, which would, of course, be a contradiction. I suppose the authors meant to say that they improved the expensive components so that they become cheaper. Maybe the sentence could also be improved.

- Both for the spark generation system and the delay generator, the manuscript states that these are expensive components. But it does not say why. What is expensive about them? I am not an expert on these instruments and most likely most readers will not be either.

- How is the spark generator improved? The manuscript devotes literally only 1 sentence to this “In our setup, a 10Ω resistor maximizing power dissipation in the spark gap, while minimizing oscillations.” And this sentence is not even grammatically correct. It has no verb. Such a short description is simply not good enough and not scientific at all.

- Spectra collection: this section is also very short. How many spectra were collected in the end and at what settings? What is the spectral resolution? How are the spectra discretised, etc.? In other words, what is the data set?

- What I am missing throughout is a section and a schematic figure that illustrates the whole process. What does the instrument look like and how do the different components that the authors improve fit in? How do we go from spark generation to a spectrum? How is the spectrum further processed with machine learning? This schematic would ideally also illustrate the overall objectives of the research.

- The results section has more figures than it has text. Very hard to understand anything.

- In the clustering section on page 6, I could not follow anything anymore after “For each element, 0.1, 1, 10 and 100 ng…” What does “element” refer to here? I think this part might now be describing data collection? Maybe? And therefore answer one of my earlier questions? But then seamlessly without paragraph break the paragraph continues with “Feature scaling is a standard preprocessing step…” How do we get from clustering to spectra collection to feature scaling? I cannot follow the logic anymore.

- “Figure 7 illustrates the Boltzmann plot…” What is a Boltzmann plot?

- Throughout the whole results section, the authors speak of features and feature selection. A “feature” in machine learning is an abstract concept that refers to patterns in the data input. It is ok to use the word features, but then it has to be clear what the data is, what the data input is to the machine learning and what the features are. I have now figured out that the input into the machine learning is x-y-z data in the form of discretised spectra. x are the frequencies and y the intensities of the spectra and z are the detected masses of metals. A feature is then simply one point in the spectrum. This information may seem trivial to the authors, but without it, the manuscript is much harder to read and understand.

- In equation 1, “m” appears to be the number of spectra, i.e. the data set size. It would be good to say that in the text.

- Figure 8 is used to justify the choice of the regularisation constant c. I find this procedure also hard to follow. Instead of plotting the training and test loss as function of the number of features (which is related to c) it would be more illustrative to plot it as a function of c. Or have an alternative x-axis at the top of the graph that shows the c values.

- The whole discussion on the univariate “technique” is very hard to follow. What was actually done? What is plotted in Figures 12 and 13?
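The data layout and the regularisation path raised in the comments on features and on Figure 8 can be illustrated with a minimal sketch. It is not the authors' code and does not reproduce their data. It assumes a synthetic data set of m discretised spectra (one intensity per spectral bin, so each bin is one feature) with a known metal mass as target, and an L1-penalised linear regression solved by coordinate descent. The penalty weight `alpha`, the bin count, the line positions and the intensity scale are hypothetical, and how `alpha` relates to the manuscript's constant c is not stated in the report. The sketch standardises each feature and then reports how many features survive at each penalty value, which is the quantity the report suggests plotting against c.

```python
import random

random.seed(0)

M_SPECTRA, N_BINS = 40, 12           # hypothetical data set size m and number of spectral bins
LINE_BINS = (3, 7)                   # hypothetical emission-line positions
MASSES_NG = (0.1, 1.0, 10.0, 100.0)  # mass levels quoted in the report

def make_dataset():
    """Return (X, z): X[i][j] is the intensity of spectrum i at bin j, z[i] its mass."""
    X, z = [], []
    for i in range(M_SPECTRA):
        mass = MASSES_NG[i % len(MASSES_NG)]
        row = [random.gauss(1.0, 0.05) for _ in range(N_BINS)]  # noisy baseline
        for b in LINE_BINS:
            row[b] += 0.02 * mass  # assumed: line intensity grows with mass
        X.append(row)
        z.append(mass)
    return X, z

def standardize_columns(X):
    """Feature scaling: give every spectral bin zero mean and unit variance."""
    m, n = len(X), len(X[0])
    out = [[0.0] * n for _ in range(m)]
    for j in range(n):
        col = [X[i][j] for i in range(m)]
        mean = sum(col) / m
        std = (sum((v - mean) ** 2 for v in col) / m) ** 0.5
        for i in range(m):
            out[i][j] = (col[i] - mean) / std if std > 0 else 0.0
    return out

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, z, alpha, n_iter=200):
    """Minimise (1/(2m)) * sum_i (z_i - x_i.w)^2 + alpha * sum_j |w_j|."""
    m, n = len(X), len(X[0])
    w = [0.0] * n
    residual = list(z)  # equals z - Xw for w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(m)) / m for j in range(n)]
    for _ in range(n_iter):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue
            # Correlation of bin j with the residual that excludes bin j's own term.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(m)) / m
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(m):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w

def count_selected(w, tol=1e-12):
    """Number of spectral bins with a nonzero weight."""
    return sum(1 for v in w if abs(v) > tol)

X_raw, z_raw = make_dataset()
X = standardize_columns(X_raw)
z_mean = sum(z_raw) / len(z_raw)
z = [v - z_mean for v in z_raw]  # centre the target so no intercept is needed

# Smallest penalty at which every weight is exactly zero.
alpha_max = max(
    abs(sum(X[i][j] * z[i] for i in range(M_SPECTRA)) / M_SPECTRA)
    for j in range(N_BINS)
)

path = []
for frac in (1.1, 0.5, 0.1, 0.01):
    w = lasso_coordinate_descent(X, z, frac * alpha_max)
    path.append((frac * alpha_max, count_selected(w)))

for alpha, k in path:
    print(f"alpha={alpha:.3f}  selected features={k}")
```

Plotting training and test loss against `alpha` directly, rather than against the resulting feature count, would correspond to the alternative x-axis suggested for Figure 8.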


ED: Reconsider after major revisions (09 Apr 2020) by Francis Pope

AR by Ali Davari on behalf of the Authors (21 May 2020) Author's response Manuscript

ED: Publish as is (29 Jul 2020) by Francis Pope

AR by Ali Davari on behalf of the Authors (07 Aug 2020) Manuscript

Short summary

Traditional instruments for detection and quantification of toxic metals in the atmosphere are expensive. In this study, we have designed, fabricated, and tested a low-cost instrument, which employs cheap components to detect and quantify toxic metals. Advanced machine learning (ML) techniques have been used to improve the instrument's performance. This study demonstrates how the combination of low-cost sensors with ML can address problems that traditionally have been too expensive to be solved.