The manuscript has been improved considerably, but some information is still missing and some of the reasoning is incomplete; these issues need to be addressed before the manuscript can be considered for publication. In particular, I am not convinced about the biases and variances of the least-squares fits in Figures 3 and 4. Some key information that I requested was also given only in the reply letter, and the manuscript was not changed accordingly.
>> 1. The manuscript lacks critical references and fails to explain key principles of the AI
>> model. If the idea is to introduce the AI techniques to the ISR community, skipping most
>> of the key information "for brevity" may not be the best choice. It is understandable that
>> Section 3.1, which describes the AI architecture, is full of field-specific jargon, but the
>> terminology should be explained to the reader at such a level that reading the text is
>> possible even for a non-expert in the field without reading all the references, and
>> references to the key concepts should be given for readers who are interested in more
>> details.
>>
> Response: Section 3.1 has been significantly expanded to clarify field-specific
> terminology and to explain the terms of the transformer architecture in a manner
> accessible to non–machine-learning experts. We have also included a number of
> references in several areas.
AND
>> Sections 3.1, 3.2 and 3.3: Please explain the AI terminology so that also readers who are
>> not familiar with it can follow the description at least superficially, and give sufficient
>> references. I will not list every single point separately in these comments.
>>
> Response: Done
Section 3.1 was improved significantly and is now much more readable, also for non-experts in the field. However, Section 3.2, which was mentioned in the detailed comments, should be improved similarly. Please give references for, or explain, terms such as "batch size", "epoch", "Adam optimizer", "learning rate", LLRD, and BERT. Many of these are basic AI terminology, but references would be very useful for non-experts in the field.
>> 3. Least-squares fits contain several tunable parameters that may greatly affect the quality of
>> the results, but these are not considered at all. In particular, stopping criteria for the
>> iterations and initial values of the fitted parameters may affect both standard deviation
>> and bias of the results. These should be carefully evaluated when comparisons between
>> the AI model and the least-squares fits are performed.
>>
> Response: In the present study, the least-squares baseline does not involve a general
> iterative optimization with multiple tunable convergence parameters. Doppler velocity is
> estimated by minimizing the squared error between the measured and modeled spectra
> over a one-dimensional grid of Doppler shifts with a fixed resolution (0.1 m/s). For this
> formulation, the objective function admits a unique global minimum for a given
> resolution, and the resulting solution is therefore deterministic and independent of both
> initialization and stopping criteria. The following text has been added to the end of
> section 3.3.
> ‘Finally, the traditional curve fitting method is included as a reference for comparison against the
> best-performing deep learning model, where Doppler velocity is obtained by a one-dimensional
> least-squares search over a discretized Doppler shift grid with a resolution of 0.1 m/s. For this
> formulation, the cost function has a unique global minimum at the chosen resolution, yielding a
> deterministic solution that is independent of initialization and stopping criteria. The fitting
> algorithm is described in Li & Zhou (2024).’
Thank you for the clarification. The Doppler shift grid resolution of 0.1 m/s seems sufficient, but how wide is the grid? Also, are the biases in Figure 3 (a) calculated as mean values over all fit results at each altitude? If the grid is not wide enough, the posterior distribution may be non-zero at its edges. This biases the mean values of the fitted velocities, and their variances, whenever the true velocity is non-zero, because the distribution is truncated at different distances from its mean on the negative and positive sides. The location of the maximum of the posterior distribution should still provide an unbiased estimate.
I am still surprised to see that the bias and standard deviation of the LSF depend on velocity in Figure 3 (a) and (b). Such behaviour might be expected for the moment method, but I cannot see how zero-mean noise added to the spectra could bias the LSF. The altitude variations are qualitatively consistent with the hypothesis that they are caused by truncation of the posterior distribution due to an insufficient width of the velocity grid. The authors should examine the peaks of the posterior distribution, or repeat the analysis with a wider velocity grid, to see whether this affects the results.
As an example, based on Figure 3, the true velocities reach almost 60 m/s and the standard deviation of the fit reaches 15 m/s at some altitudes. To reach the 4-sigma limit on both sides of the distribution at all altitudes, which is certainly safe, the velocity grid should span the range [-120, 120] m/s (60 + 4 × 15 = 120). If the range is reduced to [-90, 90] m/s (2-sigma), more than 2 percent of the posterior distribution (about 2.3 percent for a Gaussian distribution) may be cut off, and the mean values will be biased accordingly.
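To make the size of this effect concrete, the sketch below approximates the spread of the fitted velocities at one altitude as a Gaussian with the example values above (true velocity 60 m/s, standard deviation 15 m/s) and computes how much of it falls outside a finite search grid, together with the mean and standard deviation of what remains. The Gaussian shape is an assumption made only for illustration, and the helper names (norm_pdf, norm_cdf, truncation_effects) are hypothetical; they are not taken from the manuscript or from Li & Zhou (2024).

```python
import math


def norm_pdf(x: float) -> float:
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def truncation_effects(mu: float, sigma: float, v_min: float, v_max: float):
    """Effect of a finite search grid [v_min, v_max] on a Gaussian N(mu, sigma^2).

    Returns (fraction cut off, mean, standard deviation) of the truncated
    distribution. The peak (mode) stays at mu as long as v_min < mu < v_max.
    """
    a = (v_min - mu) / sigma  # lower grid edge in units of sigma
    b = (v_max - mu) / sigma  # upper grid edge in units of sigma
    kept = norm_cdf(b) - norm_cdf(a)  # probability mass inside the grid
    ratio = (norm_pdf(a) - norm_pdf(b)) / kept
    # Standard moments of a truncated normal distribution
    mean = mu + sigma * ratio
    var = sigma**2 * (1.0 + (a * norm_pdf(a) - b * norm_pdf(b)) / kept - ratio**2)
    return 1.0 - kept, mean, math.sqrt(var)


if __name__ == "__main__":
    # Example values: true velocity 60 m/s, fit standard deviation 15 m/s
    for v_min, v_max in [(-120.0, 120.0), (-90.0, 90.0)]:
        cut, mean, std = truncation_effects(60.0, 15.0, v_min, v_max)
        print(f"grid [{v_min:.0f}, {v_max:.0f}] m/s: cut off {100 * cut:.2f} %, "
              f"mean {mean:.2f} m/s, std {std:.2f} m/s")
```

Under these assumptions the [-90, 90] m/s grid gives a mean of about 59.2 m/s and a standard deviation of about 14.1 m/s, whereas the [-120, 120] m/s grid leaves both essentially at 60 m/s and 15 m/s; in both cases the peak of the distribution stays at the true velocity. The actual distribution of LSF results need not be Gaussian, so this is only an order-of-magnitude illustration of the mechanism.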
>> 5. The Arecibo radar collapsed a few years ago, but there are several other incoherent
>> scatter radars in the world. Re-analysis of the archived Arecibo data is indeed valuable,
>> but the authors could also comment if their technique might be usable for data from
>> other radars that have considerably lower SNR and operate in completely different
>> geophysical environments. In particular, other radars may observe much larger velocities
>> and the users are typically interested also in electron densities and electron and ion
>> temperatures, not just plasma velocities. At the very end of the conclusions the authors
>> claim, without any justification, that the model can be applied more broadly, but the
>> very different noise levels and very much larger line-of-sight velocities observed with
>> many other ISRs are not discussed at all.
>>
> Response: Although the examples in this study are designed specifically for Arecibo ISR,
> the model itself is trained entirely on synthetic ISR spectra, for which radar parameters,
> Doppler velocity range, and SNR are manually controlled. Therefore, the training set can
> be easily adapted to a different SNR and Doppler velocity range.
> The relevant text has been revised to ‘As the training data are generated using physics-based
> ISR simulations, SNR and Doppler velocity range can be explicitly controlled during data
> generation. Therefore, the proposed framework is not inherently limited to the Arecibo ISR and
> can be adapted to other instruments by retraining the model using instrument-specific
> parameters and configurations.’
The training set could be generated for any radar system and any conditions, but this does not guarantee good performance in the data analysis. Much larger velocities, lower SNR, and steep gradients in plasma parameter profiles are observed by high-latitude radars. I agree with the reply above, but I find no justification for the much stronger statements "The AI approach applies to all situations where the spectrum can be parameterized." in the abstract and "the approach applies to any situation where the observed spectra can be parameterized." in the Conclusions. Please give some evidence that the method actually works in the very different conditions where many other ISRs operate, or revise these sentences.
>> Line 64: "...with the traditional curve fitting method."
>> Please explain what is "the traditional curve fitting method", and give a reference.
>>
> Response: “The traditional curving fitting method” is the least-squares fitting method.
> We have changed the sentence to: “The interpolation was originally introduced for
> compatibility with the LSF method used in Li and Zhou (2024) and is retained in this
> work without modification.”
Thank you for clarifying this. The term "curve fitting method" is still used in several places. This should be replaced with "LSF method" everywhere.
>> Equation (1): The shape of this profile seems to affect the final results, because the context-
>> aware AI model learns this profile shape. Is there some physical justification for the
>> selected function?
>>
> Response: Equation (1) constrains the maximum vertical variation of plasma parameters
> over the 1.5 km altitude range, with hyperparameters selected empirically based on
> variability observed in real ISR measurements.
This is a critical point for the analysis and should be explained in detail in the manuscript. Please explain in the text how Equation (1) and its hyperparameters were selected. This selection may also restrict the use of the model. For example, does the model work in the presence of narrow sporadic E layers, which produce very steep gradients in Ne profiles? Also, regarding applicability to other radar systems, how would steep gradients in Vi profiles affect the results?
>> Lines 124-125: "In transformer architectures such as BERT or ViT (Devlin et al. 2019;
>> Dosovitskiy et al. 2020)."
>> This sentence seems to be completely detached from the surrounding text.
>>
> Response: This was a typo in the original manuscript. It was supposed to be a comma
> rather than a period after the sentence.
The manuscript has not been changed accordingly.
>> Line 199: "...context-unaware model is trained on standalone 101-point spectra with
>> artificial noise..."
>> Is this noise somehow different from the noise added to the 5x101 input of the context-
>> aware model?
>>
> Response: No. The same noise variance is applied independently at each height in both
> models. The context-aware model uses all five height-resolved spectra as separate
> tokens, while the context-unaware model averages the five heights.
The text was not modified and is still unclear. It gives the impression that the noise is somehow different for the context-unaware model. Please clarify the text.
>> Lines 208-217 & Figure 2. I do not understand what LSF-ideal and LSF-realistic mean here
>> and how the comparison is done. The contours in Figure 2 are shown as a function of bandwidth
>> and noise std, but then the authors claim that there was no added noise (noise std=0?)
>> in the LSF-ideal case. Please explain what happens in the comparison.
>>
> Response: LSF-ideal and LSF-realistic differ only in how the spectrum used for Doppler
> fitting is obtained. In the LSF-ideal case, we assume the true (noise-free) spectrum shape
> is known and retrieve the Doppler velocity by shifting this fixed template along the
> frequency axis and minimizing the least-squares error. LSF-Ideal represents the limiting
> case for the LSF method. In the LSF-realistic case, the spectrum shape is unknown and
> must first be estimated from noisy data. To make it easier to read, we have also added
> the explanation in the Figure 2 caption.
Thank you for clarifying this. However, it is not sufficient to put the explanation in the figure caption. It should be included in the main text at the point where LSF-ideal is first mentioned. Also, are the values rounded to the nearest grid point in the LSF, or are exact values used? If exact values are used, this might partially explain the differences between LSF-ideal and LSF-realistic.
>> Lines 244-246: "The LSF and moment methods underestimate the true velocity for the
>> same reason that the mean velocity tends to zero in the absence of noise."
>> I do not understand this sentence. What is "the same reason"?
>>
> Response: The original sentence has a typo. “in the absence of noise” should have been
> “in the absence of signal”. The sentence is now changed to: “In the extreme case of all
> noise and no signal, the mean LSF and moment velocities tend to zero because the
> estimated velocities are symmetrically distributed at positive and negative values.
> Similarly, as long as there is noise, LSF and moment techniques tend to underestimate
> the velocity amplitude.”
In the limit of zero signal, the distribution should become flat, and it is centered around zero only because the search space is symmetric with respect to zero velocity. For non-zero signals, the distribution should be centered at the true velocity. It seems possible that the distribution of the LSF results is truncated due to an insufficient width of the velocity grid in this case (see my comment above).
Detailed comments on the revised manuscript:
Lines 107-108: "...where domain knowledge shapes the training data..."
Is "domain knowledge" the set of possible altitude profiles in equation (1) in practice?
Lines 130-131: "...Doppler shift manifests as a subtle, globally coherent displacement in frequency space that is shared across altitudes..."
The Doppler shift changes with altitude, so it cannot be described as "globally coherent".
Line 147: AS -> As
Line 161: "commonly referred to as [CLS] token in AI literature"
Please give a reference.
Title of Section 3.2. "training" -> "Training" (or perhaps "Training the AI model" or something similar?)
Line 255: "...When 𝜂 is above 30 (~10^1.5), it is largely independent of..."
Does 'it' refer to the velocity RMSE?
Line 276: "for a representative condition at Arecibo"
Please give details of these conditions.
Lines 350-351: "Above 120 km, a larger number of heights can be used in..."
Does one need to retrain the model for this?
The manuscript reports results of line-of-sight plasma velocity fits to Arecibo ISR data using artificial intelligence (AI), namely context-aware transformers. The manuscript seems to be a continuation of a series of papers by the same authors [1,2,3], in which they apply different data analysis techniques to archived coded long pulse (CLP) data from the Arecibo radar. The results suggest that AI may produce high-quality results in ISR data analysis.
While the idea of replacing traditional least-squares fitting techniques with AI techniques that are computationally less expensive (once the expensive training has been done) is novel and the results look promising in general, I have several critical comments about the text and the interpretation of the results.
1. The manuscript lacks critical references and fails to explain key principles of the AI model. If the idea is to introduce the AI techniques to the ISR community, skipping most of the key information "for brevity" may not be the best choice. It is understandable that Section 3.1, which describes the AI architecture, is full of field-specific jargon, but the terminology should be explained to the reader at such a level that reading the text is possible even for a non-expert in the field without reading all the references, and references to the key concepts should be given for readers who are interested in more details.
2. The actual scientific target of the measurements considered remains unclear. The authors first give a few very general motivations for measuring the Doppler velocities (without references). The AI model, which solves only for plasma velocities, is then compared with a least-squares fitting technique, which also fits several other parameters. Is there some specific application for which the velocities alone are important? Would it be possible to use the AI model to fit the same parameters that are fitted with the least-squares solver? On line 300 the authors finally claim that the focus of this study is around 110 km altitude. What exactly is the focus, and why is it not mentioned in the abstract and in the introduction?
3. Least-squares fits contain several tunable parameters that may greatly affect the quality of the results, but these are not considered at all. In particular, stopping criteria for the iterations and initial values of the fitted parameters may affect both the standard deviation and the bias of the results. These should be carefully evaluated when comparisons between the AI model and the least-squares fits are performed.
4. Computational requirements are not discussed at all until the Conclusions section, where the authors claim that "Velocity inference is roughly 100 times faster than the fitting method and requires significantly fewer computational resources.". This may be true, but some key figures about the computational resources needed for both training the model and the final velocity inference should be given. The training part is also important for potential users of the technique, because it seems that the model may need to be trained separately for each radar and radar operation mode.
5. The Arecibo radar collapsed a few years ago, but there are several other incoherent scatter radars in the world. Re-analysis of the archived Arecibo data is indeed valuable, but the authors could also comment on whether their technique might be usable for data from other radars that have considerably lower SNR and operate in completely different geophysical environments. In particular, other radars may observe much larger velocities, and the users are typically interested also in electron densities and electron and ion temperatures, not just plasma velocities. At the very end of the conclusions the authors claim, without any justification, that the model can be applied more broadly, but the very different noise levels and much larger line-of-sight velocities observed with many other ISRs are not discussed at all.
Detailed comments:
Lines 22-23: "particularly during disturbed conditions."
Does this mean that the radars are more reliable than other instruments during disturbed conditions?
Lines 25-26: References to studies where these measurements are valuable would be useful.
Line 28: To my understanding, the moment and autocorrelation methods are not commonly used for ISR data analysis, because computers are powerful enough for the least-squares fits and the users are typically interested in many other plasma parameters as well. Please correct me if I am wrong.
Lines 33-34: "Their easy implementations and computational efficiency make them a popular first choice."
Again, is this still true for IS radars nowadays?
Lines 39-40: "Unlike traditional methods,..."
Does this refer to some traditional machine learning methods, or to the traditional radar data analysis methods?
Line 59: Please give a reference to the coded long pulse technique.
Line 64: "...with the traditional curve fitting method."
Please explain what is "the traditional curve fitting method", and give a reference.
Equation (1): The shape of this profile seems to affect the final results, because the context-aware AI model learns this profile shape. Is there some physical justification for the selected function?
Lines 85-86: "context-aware" and "context-unaware" are here used without explaining the terms first.
Lines 93-94: Please give references to the "broader definitions".
Section 2: It would be useful to show some examples of the synthetic IS spectra with different noise levels.
Sections 3.1, 3.2 and 3.3: Please explain the AI terminology so that also readers who are not familiar with it can follow the description at least superficially, and give sufficient references. I will not list every single point separately in these comments.
Lines 124-125: "In transformer architectures such as BERT or ViT (Devlin et al. 2019; Dosovitskiy et al. 2020)."
This sentence seems to be completely detached from the surrounding text.
Line 199: "...context-unaware model is trained on standalone 101-point spectra with artificial noise..."
Is this noise somehow different from the noise added to the 5x101 input of the context-aware model?
Lines 208-217 & Figure 2. I do not understand what LSF-ideal and LSF-realistic mean here and how the comparison is done. The contours in Figure 2 are shown as a function of bandwidth and noise std, but then the authors claim that there was no added noise (noise std=0?) in the LSF-ideal case. Please explain what happens in the comparison.
Lines 234-235: "...frequently used moment method..."
Please provide references that demonstrate the frequent use of the moment method in ISR data analysis.
Lines 243-244: Is the bias in the LSF results possibly affected by the initial parameter values? One might expect this kind of bias profile if the iteration starts from zero velocity and tends to stop a bit too early.
Lines 244-246: "The LSF and moment methods underestimate the true velocity for the same reason that the mean velocity tends to zero in the absence of noise."
I do not understand this sentence. What is "the same reason"?
Lines 248-249: Does the LSF standard deviation depend also on stopping criteria of the iteration? If the criteria are too loose, the iteration might stop at random locations around the true minimum of the cost function, increasing the noise.
Figure 3, panel c: please change the colors; yellow in particular is almost invisible.
Lines 279-280: "For a slowly varying quantity, the standard deviation of the second-order difference of independent samples is √6 times of the random error, as measured by the standard deviation."
Please give a reference.
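The factor itself can be checked from the variance of the second-order difference, assuming independent samples x_i with a common standard deviation σ and an underlying quantity whose own second difference is negligible:

```latex
\Delta^2 x_i = x_{i+1} - 2x_i + x_{i-1}, \qquad
\operatorname{Var}\left(\Delta^2 x_i\right) = \left(1^2 + 2^2 + 1^2\right)\sigma^2 = 6\sigma^2
\;\Rightarrow\; \operatorname{std}\left(\Delta^2 x_i\right) = \sqrt{6}\,\sigma .
```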
Lines 294-296: Is it possible that the fluctuations are true temporal variations in the wind field?
Lines 299-300: "In any event, the AI error is still 30% smaller than the LSF method around 110 km, which is the focus of the current study"
What exactly is the focus of the current study, and why is this mentioned only on line 300?
Caption of Figure 5: (divided by 20) -> (divided by 40)?
Conclusions: The conclusions should summarize the results, and they should preferably be understandable without reading the whole manuscript. Neither of these conditions is fulfilled in this case. The contents of the first paragraph would fit better in the preceding sections, and the discussion about computing resources should be expanded there.
[1] Li, Y., and Zhou, Q.: Measurements of F1-region ionosphere state variables at Arecibo through
quasi height-independent exhaustive fittings of the incoherent scatter ion-line spectra, J.
Geophys. Res. Space Phys., 129(11), e2024JA032620, 2024.
[2] Li, Y., and Zhou, Q.: Accurate spectral fitting in the upper F-region using the randomly coded data
of the Arecibo 430 MHz radar, J. Geophys. Res. Space Phys., 130, e2025JA033877,
https://doi.org/10.1029/2025JA033877, 2025a.
[3] Zhou, Q., Li, Y., and Gong, Y.: Variance estimations in the presence of intermittent interferences
and their applications to incoherent scatter radar signal processing, Atmos. Meas. Tech., 17(14),
4197–4209, https://doi.org/10.5194/amt-17-4197-2024, 2024.