The authors have addressed my comments to some degree, and the manuscript has improved. However, I still find it very hard to read, mostly because it lacks a clear structure and logical flow. The sections describing the developments, experiments, results and analysis are extremely short and omit many details. Much is left to the reader’s imagination and interpretation, which is not good scientific reporting.
The objectives of the work are still not entirely clear to me. The same phrases are repeated throughout the manuscript, but details on how these objectives are achieved are missing where they are needed. In my report on the original manuscript, I pointed out several of these shortcomings. The authors’ reply was simply to repeat the same phrases, which is not very impressive.
In conclusion, I am disappointed by the manuscript and by the authors’ efforts to improve it. As somebody not working directly in the field of atmospheric science, I agreed to review the manuscript because the title and the premise sounded interesting. I believe the manuscript is publishable, but it needs more than the current revisions to become readable and understandable. More details are given below:
- The introduction has improved and is now easier to read. However, the section on spark-induced breakdown spectroscopy (SIBS) and laser-induced breakdown spectroscopy (LIBS) is still very difficult to read. This is partly due to the citation style: the citations are not separated from the actual text in any way, and sometimes it is impossible to tell where a citation ends and the text starts again.
- I had recommended adding more context and background to the introduction. The authors discuss different detection methods at length but say little about machine learning in the introduction, although half of the manuscript is about machine learning. I had recommended citing and discussing several recent machine learning approaches for spectra and spectroscopy, and even more have appeared in the literature since then. This request has been blatantly ignored by the authors. It makes me wonder why we reviewers write referee reports in the first place.
- The objectives are now stated more clearly: “We employed two complementary approaches: (1) decreasing the cost of the electronics associated with SIBS and (2) incorporating advanced data analysis techniques to improve quantification and limit of detection.” However, it is still not clear to me how the cost is decreased (see below). An example of the circular arguments that occur frequently in this manuscript is the following sentence from the same paragraph of the introduction: “The expensive components such as spark generation and delay generator have been developed to reduce the overall cost.”
To me, this sentence reads as ‘we developed expensive components to reduce the cost’, which would, of course, be a contradiction. I suppose the authors meant that they redesigned the expensive components so that they became cheaper. The sentence should be rephrased accordingly.
- For both the spark generation system and the delay generator, the manuscript states that these are expensive components, but it does not say why. What makes them expensive? I am not an expert on these instruments, and most readers will likely not be either.
- How is the spark generator improved? The manuscript devotes only a single sentence to this: “In our setup, a 10Ω resistor maximizing power dissipation in the spark gap, while minimizing oscillations.” This sentence is not even grammatically correct, as it has no verb. Such a short description is simply not good enough for a scientific paper.
- Spectra collection: this section is also very short. How many spectra were collected in total, and with which settings? What is the spectral resolution? How are the spectra discretised? In other words, what is the data set?
- What I am missing throughout is a section and a schematic figure that illustrate the whole process. What does the instrument look like, and how do the components that the authors improved fit in? How do we get from spark generation to a spectrum? How is the spectrum then processed with machine learning? Ideally, this schematic would also illustrate the overall objectives of the research.
- The results section has more figures than text, which makes it very hard to understand.
- In the clustering section on page 6, I could no longer follow the text after “For each element, 0.1, 1, 10 and 100 ng…”. What does “element” refer to here? I think this part may now be describing data collection, which would answer one of my earlier questions, but I am not sure. Then, without a paragraph break, the text continues with “Feature scaling is a standard preprocessing step…”. How do we get from clustering to spectra collection to feature scaling? I cannot follow the logic anymore.
- “Figure 7 illustrates the Boltzmann plot…” What is a Boltzmann plot?
- Throughout the results section, the authors speak of features and feature selection. A “feature” in machine learning is an abstract concept that refers to patterns in the input data. It is fine to use the word, but then it must be clear what the data are, what the input to the machine learning is, and what the features are. I have now figured out that the input to the machine learning is x-y-z data in the form of discretised spectra: x are the frequencies, y the intensities of the spectra, and z the detected masses of metals. A feature is then simply one point in the spectrum. This information may seem trivial to the authors, but without it the manuscript is much harder to read and understand.
- In equation 1, “m” appears to be the number of spectra, i.e. the data set size. It would be good to state this in the text.
- Figure 8 is used to justify the choice of the regularisation constant c. I also find this procedure hard to follow. Instead of plotting the training and test loss as a function of the number of features (which is related to c), it would be more illustrative to plot them as a function of c. Alternatively, a second x-axis at the top of the graph could show the corresponding c values.
- The whole discussion on the univariate “technique” is very hard to follow. What was actually done? What is plotted in Figures 12 and 13?
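To make the comments on features and on Figure 8 concrete, here is a minimal sketch on synthetic data. It assumes, and I cannot confirm this from the manuscript, that the model in equation 1 is an L1-regularised (LASSO-type) linear regression with regularisation constant c, solved here by iterative soft-thresholding (ISTA). All names and sizes (lasso_ista, 60 spectra, 200 wavelength bins) are hypothetical. Each row of X is one spectrum, each column (feature) is the intensity at one wavelength bin, and y is the deposited mass. Sweeping c and recording the number of non-zero coefficients together with the training and test errors gives exactly what is needed to plot both losses against c, with the number of features on a secondary axis.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of the L1 norm: shrinks each entry towards zero by t.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, c, n_iter=2000):
    """Minimise (1/(2m))||y - Xw||^2 + c||w||_1 with ISTA (hypothetical stand-in for eq. 1)."""
    m, n = X.shape  # m = number of spectra, n = number of features (wavelength bins)
    w = np.zeros(n)
    # Step size 1/L, with L the Lipschitz constant of the squared-error gradient.
    L = np.linalg.norm(X, 2) ** 2 / m
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / m
        w = soft_threshold(w - grad / L, c / L)
    return w

def mse(X, y, w):
    return float(np.mean((y - X @ w) ** 2))

# Synthetic data: 60 hypothetical spectra with 200 wavelength bins each.
rng = np.random.default_rng(0)
m, n = 60, 200
X = rng.normal(size=(m, n))  # row i = spectrum i, column j = intensity in bin j
w_true = np.zeros(n)
w_true[[10, 50, 120]] = [3.0, -2.0, 1.5]  # only three "emission lines" carry signal
y = X @ w_true + 0.1 * rng.normal(size=m)  # target: deposited mass (synthetic)

X_train, X_test = X[:40], X[40:]
y_train, y_test = y[:40], y[40:]

# Sweep the regularisation constant c on a logarithmic grid.
cs = np.logspace(-3, 0, 10)
rows = []
for c in cs:
    w = lasso_ista(X_train, y_train, c)
    n_features = int(np.count_nonzero(w))
    rows.append((c, n_features, mse(X_train, y_train, w), mse(X_test, y_test, w)))

for c, k, tr, te in rows:
    print(f"c={c:.4f}  features={k:3d}  train MSE={tr:.4f}  test MSE={te:.4f}")
```

With real data, the printed table would be replaced by a plot of the two error columns against c on a logarithmic axis, with the feature counts shown on the top axis.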