Estimation of Doppler velocity from incoherent scatter spectra using context-aware transformers

Li, Yanlin; Zhou, Qihou

doi:10.5194/amt-19-3865-2026

Articles | Volume 19, issue 11

https://doi.org/10.5194/amt-19-3865-2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.


Research article | 12 Jun 2026


Final revised paper (published on 12 Jun 2026)
Preprint (discussion started on 30 Oct 2025)

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-5022', Anonymous Referee #1, 01 Dec 2025

The manuscript reports results of line-of-sight plasma velocity fits to Arecibo ISR data using artificial intelligence (AI), namely context-aware transformers. The manuscript appears to be a continuation of a series of papers by the same authors [1,2,3], in which they apply different data analysis techniques to archived coded long pulse (CLP) data from the Arecibo radar. The results suggest that AI may produce high-quality results in ISR data analysis.

While the idea of replacing the traditional least-squares fitting techniques with AI techniques that are computationally less expensive (once the expensive training has been done) is novel and the results look promising in general, I have several critical comments about the text and interpretation of the results.
1. The manuscript lacks critical references and fails to explain key principles of the AI model. If the idea is to introduce the AI techniques to the ISR community, skipping most of the key information "for brevity" may not be the best choice. It is understandable that Section 3.1, which describes the AI architecture, is full of field-specific jargon, but the terminology should be explained to the reader in such a level that reading the text is possible also for a non-expert of the field without reading all the references, and references to the key concepts should be given for readers who are interested in more details.
2. The actual scientific target of the measurements considered remains unclear. The authors first give a few very general motivations for measuring the Doppler velocities (without references). The AI model, which solves only for plasma velocities, is then compared with a least-squares fitting technique, which also fits several other parameters. Is there some specific application for which the velocities alone are important? Would it be possible to use the AI model to fit the same parameters that are fitted with the least-squares solver? On line 300 the authors finally state that the focus of this study is the region around 110 km altitude. What exactly is the focus, and why is it not mentioned in the abstract and in the introduction?
3. Least-square fits contain several tunable parameters that may greatly affect quality of the results, but these are not considered at all. In particular, stopping criteria for the iterations and initial values of the fitted parameters may affect both standard deviation and bias of the results. These should be carefully evaluated when comparisons between the AI model and the least-squares fits are performed.
4. Computational requirements are not discussed at all until the Conclusions section, where the authors claim that "Velocity inference is roughly 100 times faster than the fitting method and requires significantly fewer computational resources.". This may be true, but some key figures about computational resources needed for both training the model and the final velocity inference should be given. Also the training part is important for potential users of the technique, because it seems that one may need to train the model for each radar and radar operation mode separately.
5. The Arecibo radar collapsed a few years ago, but there are several other incoherent scatter radars in the world. Re-analysis of the archived Arecibo data is indeed valuable, but the authors could also comment if their technique might be usable for data from other radars that have considerably lower SNR and operate in completely different geophysical environments. In particular, other radars may observe much larger velocities and the users are typically interested also in electron densities and electron and ion temperatures, not just plasma velocities. At very end of the conclusions the authors claim, without any justification, that the model can be applied more broadly, but the very different noise levels and very much larger line-of-sight velocities observed with many other ISRs are not discussed at all.

Detailed comments:
Lines 22-23: "particularly during disturbed conditions."

Does this mean that the radars are more reliable than other instruments during disturbed conditions?
Lines 25-26: References to studies where these measurements are valuable would be useful.
Line 28: To my understanding, the moment and autocorrelation methods are not commonly used for ISR data analysis, because computers are powerful enough for the least-squares fits and the users are typically interested in many other plasma parameters as well. Please correct me if I am wrong.
Lines 33-34: "Their easy implementations and computational efficiency make them a popular first choice."

Again, is this still true for IS radars nowadays?
Lines 39-40: "Unlike traditional methods,..."

Does this refer to some traditional machine learning methods, or to the traditional radar data analysis methods?
Line 59: Please give a reference to the coded long pulse technique.
Line 64: "...with the traditional curve fitting method."

Please explain what is "the traditional curve fitting method", and give a reference.
Equation (1): Shape of this profile seems to affect the final results, because the context-aware AI model learns this profile shape. Is there some physical justification for the selected function?
Lines 85-86: "context-aware" and "context-unaware" are here used without explaining the terms first.
Lines 93-94: Please give references to the "broader definitions".
Section 2: It would be useful to show some examples of the synthetic IS spectra with different noise levels.
Sections 3.1, 3.2 and 3.3: Please explain the AI terminology so that also readers who are not familiar with it can follow the description at least superficially, and give sufficient references. I will not list every single point separately in these comments.
Lines 124-125: "In transformer architectures such as BERT or ViT (Devlin et al. 2019; Dosovitskiy et al. 2020)."

This sentence seems to be completely detached from the surrounding text.
Line 199: "...context-unaware model is trained on standalone 101-point spectra with artificial noise..."

Is this noise somehow different from the noise added to the 5x101 input of the context-aware model?
Lines 208-217 & Figure 2. I do not understand what LSF-ideal and LSF-realistic mean here and how the comparison is done. The contours in Figure 2 are as function of bandwidth and noise std, but then the authors claim that there was no added noise (noise std=0?) in the LSF-ideal case. Please explain what happens in the comparison.
Lines 234-235: "...frequently used moment method..."

Please provide references that demonstrate the frequent use of the moment method in ISR data analysis.
Lines 243-244: Is the bias in the LSF results possibly affected by the initial parameter values? One might expect this kind of bias profile if the iteration starts from zero velocity and tends to stop a bit too early.
Lines 244-246: "The LSF and moment methods underestimate the true velocity for the same reason that the mean velocity tends to zero in the absence of noise."

I do not understand this sentence. What is "the same reason"?
Lines 248-249: Does the LSF standard deviation depend also on stopping criteria of the iteration? If the criteria are too loose, the iteration might stop at random locations around the true minimum of the cost function, increasing the noise.
Figure 3, panel c: please change the colors, especially yellow is almost invisible.
Lines 279-280: "For a slowly varying quantity, the standard deviation of the second-order difference of independent samples is √6 times of the random error, as measured by the standard deviation."

Please give a reference.
Lines 294-296: Is it possible that the fluctuations are true temporal variations in the wind field?
Lines 299-300: "In any event, the AI error is still 30% smaller than the LSF method around 110 km, which is the focus of the current study"

What exactly is the focus of the current study, and why is this mentioned only on line 300?
Caption of Figure 5: (divided by 20) -> (divided by 40)?
Conclusions: The conclusions should summarize the results and should preferably be understandable without reading the whole manuscript. Neither of these conditions is fulfilled in this case. The contents of the first paragraph would fit better in the preceding sections, and the discussion about computing resources should be expanded there.
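For reference, the √6 factor quoted from lines 279-280 of the manuscript follows directly from the independence of the samples: if x_i = s_i + e_i, where s varies slowly enough that its second-order difference is negligible and the e_i are independent with standard deviation σ, then Var(x_{i-1} - 2x_i + x_{i+1}) ≈ (1 + 4 + 1)σ² = 6σ². The short Python sketch below checks this numerically; the linear trend standing in for the slowly varying quantity and the Gaussian errors are assumptions made only for illustration.

```python
import numpy as np


def random_error_from_second_difference(samples):
    """Estimate the random error of independent samples of a slowly varying
    quantity as std(second-order difference) / sqrt(6)."""
    d2 = np.diff(samples, n=2)  # x[i+2] - 2 x[i+1] + x[i]
    return float(np.std(d2) / np.sqrt(6.0))


rng = np.random.default_rng(0)
n, sigma = 200_000, 2.0
# Assumed slowly varying "true" quantity: a linear trend, whose second difference is zero.
slow = 5.0 + 1e-4 * np.arange(n)
# Independent Gaussian random errors with standard deviation sigma.
samples = slow + rng.normal(0.0, sigma, n)
print(random_error_from_second_difference(samples))  # should be close to sigma (2.0)
```

The estimate relies on independence between neighbouring samples; correlated errors (for example, from overlapping range gates or integration windows) would change the factor.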

[1] Li, Y., and Zhou, Q.: Measurements of F1-region ionosphere state variables at Arecibo through quasi height-independent exhaustive fittings of the incoherent scatter ion-line spectra, J. Geophys. Res. Space Phys., 129(11), e2024JA032620, 2024.
[2] Li, Y., and Zhou, Q.: Accurate spectral fitting in the upper F-region using the randomly coded data of the Arecibo 430 MHz radar, J. Geophys. Res. Space Phys., 130, e2025JA033877, https://doi.org/10.1029/2025JA033877, 2025a.
[3] Zhou, Q., Li, Y., and Gong, Y.: Variance estimations in the presence of intermittent interferences and their applications to incoherent scatter radar signal processing, Atmos. Meas. Tech., 17(14), 4197–4209, https://doi.org/10.5194/amt-17-4197-2024, 2024.

Citation: https://doi.org/10.5194/egusphere-2025-5022-RC1
- AC1: 'Reply on RC1', Qihou Zhou, 12 Dec 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-5022/egusphere-2025-5022-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-5022-AC1
RC2:
'Comment on egusphere-2025-5022', Anonymous Referee #2, 05 Dec 2025

The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-5022/egusphere-2025-5022-RC2-supplement.pdf

Citation: https://doi.org/10.5194/egusphere-2025-5022-RC2
- AC2: 'Reply on RC2', Qihou Zhou, 12 Dec 2025
  
  The comment was uploaded in the form of a supplement: https://egusphere.copernicus.org/preprints/2025/egusphere-2025-5022/egusphere-2025-5022-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/egusphere-2025-5022-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

AR by Qihou Zhou on behalf of the Authors (13 Dec 2025): Author's response, Author's tracked changes, Manuscript

ED: Referee Nomination & Report Request started (17 Dec 2025) by Jorge Luis Chau

RR by Anonymous Referee #1 (14 Jan 2026)

Suggestions for revision or reasons for rejection

The manuscript has been improved considerably, but there is still some missing information and incomplete reasoning that needs to be corrected before it can be considered for publication. In particular, I am not convinced about the biases and variances of least-squares fits in Figures 3 and 4. Some key information I asked for was also given only in the reply letter, but the manuscript was not changed accordingly.

>> 1. The manuscript lacks critical references and fails to explain key principles of the AI
>> model. If the idea is to introduce the AI techniques to the ISR community, skipping most
>> of the key information "for brevity" may not be the best choice. It is understandable that
>> Section 3.1, which describes the AI architecture, is full of field-specific jargon, but the
>> terminology should be explained to the reader in such a level that reading the text is
>> possible also for a non-expert of the field without reading all the references, and
>> references to the key concepts should be given for readers who are interested in more
>> details.
>>
> Response: Section 3.1 has been significantly expanded to clarify field-specific
> terminology and to explain the terms of the transformer architecture in a manner
> accessible to non–machine-learning experts. We have also included a number of
> references in several areas.
AND
>> Sections 3.1, 3.2 and 3.3: Please explain the AI terminology so that also readers who are
>> not familiar with it can follow the description at least superficially, and give sufficient
>> references. I will not list every single point separately in these comments.
>>
> Response: Done

Section 3.1 was improved significantly and is now more readable also for non-experts in the field. However, Section 3.2, which was mentioned in the detailed comments, should be improved similarly. Please give references to, or explain, terms like "batch size", "epoch", "Adam optimizer", "learning rate", LLRD, and BERT. Many of these are basic AI terminology, but references would be very useful for non-experts in the field.
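As an illustration of these training terms for non-experts, the hypothetical NumPy sketch below is unrelated to the authors' model. It fits a three-parameter linear regression with mini-batch Adam to show what batch size (samples per gradient step), epoch (one full pass over the training set), learning rate (the step-size multiplier) and the Adam update (bias-corrected running means of the gradient and squared gradient) mean. It also computes a layer-wise learning-rate decay (LLRD) schedule for a stack of 31 layers, with an assumed base rate of 1e-4 and a decay factor of 0.9.

```python
import numpy as np


def llrd_learning_rates(base_lr, n_layers, decay):
    """Layer-wise learning-rate decay: the top layer (index n_layers - 1) gets
    base_lr, and each layer below it gets the rate above it multiplied by decay."""
    return [base_lr * decay ** (n_layers - 1 - i) for i in range(n_layers)]


def train_linear_adam(x, y, lr=0.01, batch_size=32, epochs=100,
                      beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Fit y ~ x @ w by mini-batch Adam on the mean squared error."""
    rng = np.random.default_rng(seed)
    w = np.zeros(x.shape[1])
    m = np.zeros_like(w)  # running mean of gradients (first moment)
    v = np.zeros_like(w)  # running mean of squared gradients (second moment)
    t = 0
    for _ in range(epochs):                  # one epoch = one full pass over the data
        order = rng.permutation(len(x))      # reshuffle the samples every epoch
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]    # one mini-batch
            err = x[idx] @ w - y[idx]
            grad = 2.0 * x[idx].T @ err / len(idx)   # gradient of the batch MSE
            t += 1
            m = beta1 * m + (1 - beta1) * grad
            v = beta2 * v + (1 - beta2) * grad ** 2
            m_hat = m / (1 - beta1 ** t)             # bias-corrected moments
            v_hat = v / (1 - beta2 ** t)
            w -= lr * m_hat / (np.sqrt(v_hat) + eps)  # lr scales every step
    return w


rng = np.random.default_rng(1)
x = rng.normal(size=(512, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = x @ w_true + rng.normal(0.0, 0.1, 512)
w_fit = train_linear_adam(x, y)
rates = llrd_learning_rates(base_lr=1e-4, n_layers=31, decay=0.9)
print(w_fit, rates[0], rates[-1])
```

With LLRD, layers closer to the output receive larger learning rates than layers closer to the input; the base rate and decay factor used here are illustrative values only.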

>> 3. Least-square fits contain several tunable parameters that may greatly affect quality of
>> the results, but these are not considered at all. In particular, stopping criteria for the
>> iterations and initial values of the fitted parameters may affect both standard deviation
>> and bias of the results. These should be carefully evaluated when comparisons between
>> the AI model and the least-squares fits are performed.
>>
> Response: In the present study, the least-squares baseline does not involve a general
> iterative optimization with multiple tunable convergence parameters. Doppler velocity is
> estimated by minimizing the squared error between the measured and modeled spectra
> over a one-dimensional grid of Doppler shifts with a fixed resolution (0.1 m/s). For this
> formulation, the objective function admits a unique global minimum for a given
> resolution, and the resulting solution is therefore deterministic and independent of both
> initialization and stopping criteria. The following text has been added to the end of
> section 3.3.
> ‘Finally, the traditional curve fitting method is included as a reference for comparison against the
> best-performing deep learning model, where Doppler velocity is obtained by a one-dimensional
> least-squares search over a discretized Doppler shift grid with a resolution of 0.1 m/s. For this
> formulation, the cost function has a unique global minimum at the chosen resolution, yielding a
> deterministic solution that is independent of initialization and stopping criteria. The fitting
> algorithm is described in Li & Zhou (2024).’
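For readers unfamiliar with it, a one-dimensional least-squares grid search of the kind described in this response can be sketched as follows. The assumptions are illustrative and not taken from the manuscript: a Gaussian stand-in for the ion-line spectrum whose shape and amplitude are known, a hypothetical 101-point frequency axis spanning ±10 kHz, a ±120 m/s search range with 0.1 m/s spacing, and the monostatic Doppler relation f_D = 2v/λ with λ = c/430 MHz.

```python
import numpy as np

C = 299_792_458.0          # speed of light (m/s)
F_RADAR = 430e6            # Arecibo transmit frequency (Hz)
WAVELENGTH = C / F_RADAR   # radar wavelength, about 0.697 m


def template(freqs, f_shift, width=2000.0):
    """Hypothetical stand-in for an IS ion-line shape: a Gaussian of given width (Hz)."""
    return np.exp(-0.5 * ((freqs - f_shift) / width) ** 2)


def lsf_grid_search(freqs, measured, v_max=120.0, dv=0.1):
    """Return the grid velocity whose shifted template minimizes the squared error."""
    velocities = np.arange(-v_max, v_max + dv / 2, dv)  # symmetric grid, spacing dv
    doppler = 2.0 * velocities / WAVELENGTH             # monostatic Doppler shift (Hz)
    # One shifted template per grid velocity: shape (n_velocities, n_freqs).
    models = template(freqs[None, :], doppler[:, None])
    cost = np.sum((measured[None, :] - models) ** 2, axis=1)
    return float(velocities[np.argmin(cost)])


freqs = np.linspace(-10e3, 10e3, 101)  # assumed 101-point frequency axis
v_true = 37.3
clean = template(freqs, 2.0 * v_true / WAVELENGTH)
rng = np.random.default_rng(0)
noisy = clean + rng.normal(0.0, 0.01, freqs.size)
print(lsf_grid_search(freqs, clean), lsf_grid_search(freqs, noisy))
```

Because the cost is evaluated exhaustively on a fixed grid, the result does not depend on an initial guess or a stopping rule, but it can only return values inside [-v_max, v_max].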

Thank you for the clarification. The Doppler shift grid resolution of 0.1 m/s also seems sufficient, but how wide is this grid? Also, are the biases in Figure 3 (a) calculated as mean values over all fit results at each altitude? If the grid is not wide enough, the posterior distribution may be non-zero at its edges. This will cause bias in mean values of the fitted velocities and their variances when the true velocity is non-zero, because the distribution is truncated at different distances from its true mean value at negative and positive sides. Maximum value of the posterior distribution should still provide an unbiased estimate.

I am still surprised to see that the bias and standard deviation of the LSF depend on velocity in Figure 3 (a) and (b). Such behaviour might perhaps be expected for the moment method, but I cannot see how zero-mean noise added to the spectra could bias the LSF. The altitude variations qualitatively match the idea that they may be caused by truncation of the posterior distribution due to insufficient width of the velocity grid. The authors should investigate the peak values of the posterior distribution or test the analysis with a wider velocity grid to see if this affects the results.

As an example, based on Figure 3, the true velocities reach almost 60 m/s and the standard deviation of the fit reaches 15 m/s at some altitudes. To reach the 4-sigma limit on both sides of the distribution at all altitudes, which is certainly safe, the velocity grid should span the range [-120, 120] m/s. If the range is reduced to [-90, 90] m/s (2-sigma), almost 3 percent of the posterior distribution may be cut off, and the mean values will be biased accordingly.
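The truncation effect described above can be checked with a minimal sketch, assuming for illustration only that the posterior of the fitted velocity is Gaussian with mean 60 m/s and standard deviation 15 m/s and that it is evaluated only on a symmetric grid with 0.1 m/s spacing. The mean over the truncated grid is pulled toward zero, by roughly 0.8 m/s for a [-90, 90] m/s grid and negligibly for [-120, 120] m/s, while the peak stays at the true value.

```python
import numpy as np


def truncated_posterior_stats(v_true, sigma, v_max, dv=0.1):
    """Mean and peak of a Gaussian posterior evaluated only on [-v_max, v_max]."""
    grid = np.arange(-v_max, v_max + dv / 2, dv)
    post = np.exp(-0.5 * ((grid - v_true) / sigma) ** 2)  # unnormalized posterior
    post /= post.sum()                                     # normalize on the truncated grid
    mean = float(np.sum(grid * post))
    peak = float(grid[np.argmax(post)])
    return mean, peak


for v_max in (90.0, 120.0):
    mean, peak = truncated_posterior_stats(v_true=60.0, sigma=15.0, v_max=v_max)
    print(f"grid [-{v_max:.0f}, {v_max:.0f}] m/s: posterior mean {mean:.2f}, peak {peak:.2f}")
```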

>> 5. The Arecibo radar collapsed a few years ago, but there are several other incoherent
>> scatter radars in the world. Re-analysis of the archived Arecibo data is indeed valuable,
>> but the authors could also comment if their technique might be usable for data from
>> other radars that have considerably lower SNR and operate in completely different
>> geophysical environments. In particular, other radars may observe much larger velocities
>> and the users are typically interested also in electron densities and electron and ion
>> temperatures, not just plasma velocities. At very end of the conclusions the authors
>> claim, without any justification, that the model can be applied more broadly, but the
>> very different noise levels and very much larger line-of-sight velocities observed with
>> many other ISRs are not discussed at all.
>>
> Response: Although the examples in this study are designed specifically for Arecibo ISR,
> the model itself is trained entirely on synthetic ISR spectra, for which radar parameters,
> Doppler velocity range, and SNR are manually controlled. Therefore, the training set can
> be easily adapted to a different SNR and Doppler velocity range.
> The relevant text has been revised to ‘As the training data are generated using physics-based
> ISR simulations, SNR and Doppler velocity range can be explicitly controlled during data
> generation. Therefore, the proposed framework is not inherently limited to the Arecibo ISR and
> can be adapted to other instruments by retraining the model using instrument-specific
> parameters and configurations.’

The training set could be generated for any radar system and conditions, but this does not guarantee good performance in data analysis. Very much larger velocities, lower SNR, and steep gradients in plasma parameter profiles are detected by high-latitude radars. I agree with the reply above, but I do not find justification for the much stronger statement "The AI approach applies to all situations where the spectrum can be parameterized." in the abstract and "the approach applies to any situation where the observed spectra can be parameterized." in Conclusions. Please give some evidence that the method actually works in the very different conditions where many other ISRs operate, or change these sentences.

>> Line 64: "...with the traditional curve fitting method."
>> Please explain what is "the traditional curve fitting method", and give a reference.
>>
> Response: “The traditional curve fitting method” is the least-squares fitting method.
> We have changed the sentence to: “The interpolation was originally introduced for
> compatibility with the LSF method used in Li and Zhou (2024) and is retained in this
> work without modification.”

Thank you for clarifying this. The term "curve fitting method" is still used in several places. This should be replaced with "LSF method" everywhere.

>> Equation (1): Shape of this profile seems to affect the final results, because the context-
>> aware AI model learns this profile shape. Is there some physical justification for the
>> selected function?
>>
> Response: Equation (1) constrains the maximum vertical variation of plasma parameters
> over the 1.5 km altitude range, with hyperparameters selected empirically based on
> variability observed in real ISR measurements.

This is a very critical point for the analysis and should be explained in detail in the manuscript. Please add a detailed explanation about how Equation (1) was selected in the text. This selection may also restrict use of the model. For example, does it work in presence of narrow sporadic E layers, which produce very steep gradients in Ne profiles? Also, regarding applicability to other radar systems, how would steep gradients in Vi profiles affect the results?

>> Lines 124-125: "In transformer architectures such as BERT or ViT (Devlin et al. 2019;
>> Dosovitskiy et al. 2020)."
>> This sentence seems to be completely detached from the surrounding text.
>>
> Response: This was a typo in the original manuscript. It was supposed to be a comma
> rather than a period after the sentence.

The manuscript has not been changed accordingly.

>> Line 199: "...context-unaware model is trained on standalone 101-point spectra with
>> artificial noise..."
>> Is this noise somehow different from the noise added to the 5x101 input of the context-
>> aware model?
>>
> Response: No. The same noise variance is applied independently at each height in both
> models. The context-aware model uses all five height-resolved spectra as separate
> tokens, while the context-unaware model averages the five heights.

The text was not modified and is still unclear. It gives the impression that the noise is somehow different for the context-unaware model. Please clarify the text.

>> Lines 208-217 & Figure 2. I do not understand what LSF-ideal and LSF-realistic mean here
>> and how the comparison is done. The contours in Figure 2 are as function of bandwidth
>> and noise std, but then the authors claim that there was no added noise (noise std=0?)
>> in the LSF-ideal case. Please explain what happens in the comparison.
>>
> Response: LSF-ideal and LSF-realistic differ only in how the spectrum used for Doppler
> fitting is obtained. In the LSF-ideal case, we assume the true (noise-free) spectrum shape
> is known and retrieve the Doppler velocity by shifting this fixed template along the
> frequency axis and minimizing the least-squares error. LSF-Ideal represents the limiting
> case for the LSF method. In the LSF-realistic case, the spectrum shape is unknown and
> must first be estimated from noisy data. To make it easier to read, we have also added
> the explanation in the Figure 2 caption.

Thank you for clarifying this. However, it is not sufficient to put the explanation in the figure caption. It should be included in the main text at the point where LSF-ideal is first mentioned. Also, are the values rounded to the nearest grid point in the LSF, or are exact values used? If exact values are used, this might partially explain the differences between LSF-ideal and LSF-realistic.
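The size of the rounding effect raised here can be bounded with a short calculation. Assuming that exact velocities are snapped to the nearest point of a grid with spacing Δv and fall uniformly within a grid cell, the rounding error is uniform on [-Δv/2, Δv/2] with standard deviation Δv/√12, about 0.029 m/s for Δv = 0.1 m/s. The sketch below verifies this numerically; it says nothing about the other possible sources of difference between LSF-ideal and LSF-realistic.

```python
import numpy as np


def rounding_error_std(dv=0.1, n=1_000_000, seed=3):
    """Empirical std of the error from snapping continuous velocities to a grid of spacing dv."""
    rng = np.random.default_rng(seed)
    v_exact = rng.uniform(-60.0, 60.0, n)   # continuous velocities spanning many grid cells
    v_grid = np.round(v_exact / dv) * dv    # nearest grid point
    return float(np.std(v_grid - v_exact))


print(rounding_error_std(), 0.1 / np.sqrt(12.0))  # the two numbers should nearly agree
```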

>> Lines 244-246: "The LSF and moment methods underestimate the true velocity for the
>> same reason that the mean velocity tends to zero in the absence of noise."
>> I do not understand this sentence. What is "the same reason"?
>>
> Response: The original sentence has a typo. “in the absence of noise” should have been
> “in the absence of signal”. The sentence is now changed to: “In the extreme case of all
> noise and no signal, the mean LSF and moment velocities tend to zero because the
> estimated velocities are symmetrically distributed at positive and negative values.
> Similarly, as long as there is noise, LSF and moment techniques tend to underestimate
> the velocity amplitude.”
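One mechanism by which a moment (centroid) estimate is pulled toward zero can be shown with a minimal sketch. Its assumptions are illustrative and not taken from the manuscript: a Gaussian stand-in for the ion-line spectrum, a hypothetical symmetric 101-point frequency axis spanning ±10 kHz, λ = c/430 MHz, and an uncompensated noise floor that is flat in frequency. On a symmetric frequency axis such a floor adds nothing to the first moment but inflates the total power, so the centroid, and hence the velocity, shrinks toward zero. This mechanism requires a nonzero noise floor; it does not by itself explain a bias under zero-mean noise.

```python
import numpy as np

C = 299_792_458.0
WAVELENGTH = C / 430e6  # Arecibo 430 MHz radar wavelength (m)


def moment_velocity(freqs, spectrum):
    """First-moment (centroid) Doppler estimate, v = (lambda / 2) * sum(f S) / sum(S)."""
    f_doppler = np.sum(freqs * spectrum) / np.sum(spectrum)
    return float(WAVELENGTH * f_doppler / 2.0)


freqs = np.linspace(-10e3, 10e3, 101)  # symmetric 101-point axis (span assumed)
v_true = 40.0
signal = np.exp(-0.5 * ((freqs - 2.0 * v_true / WAVELENGTH) / 2000.0) ** 2)
noise_floor = 0.2  # hypothetical uncompensated noise power per frequency bin

print(moment_velocity(freqs, signal))                # close to v_true
print(moment_velocity(freqs, signal + noise_floor))  # pulled toward zero
```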

At the limit of zero signal the distribution should become flat, and it is centered around zero only because the search space is symmetric with respect to zero velocity. For non-zero signals the distribution should be centered at the true velocity. It seems possible that distribution of the LSF results is truncated due to insufficient width of the velocity grid in this case (see my comment above).
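A simple two-component mixture model, which is an assumption made here for illustration and not an analysis from the manuscript, shows how both statements can hold at once. Suppose a fraction p of the grid-search estimates are noise-dominated and spread uniformly over a symmetric grid [-V, V], while the rest scatter around the true velocity. The mean estimate is then approximately (1 - p) times the true velocity: the noise-dominated estimates average to zero only because the grid is symmetric, and the mean is biased toward zero even though the locked-on estimates are unbiased.

```python
import numpy as np


def mean_of_estimates(v_true, sigma, p_outlier, v_max, n=400_000, seed=2):
    """Mean of grid-search velocity estimates under an assumed two-component mixture."""
    rng = np.random.default_rng(seed)
    locked = rng.normal(v_true, sigma, n)   # estimates that lock onto the true spectrum
    lost = rng.uniform(-v_max, v_max, n)    # noise-dominated estimates, flat over the grid
    est = np.where(rng.random(n) < p_outlier, lost, locked)
    est = np.clip(est, -v_max, v_max)       # a grid search cannot return off-grid values
    return float(est.mean())


for p in (0.0, 0.2, 1.0):
    print(p, mean_of_estimates(v_true=40.0, sigma=10.0, p_outlier=p, v_max=120.0))
```

For p = 0, 0.2 and 1 with a true velocity of 40 m/s, the printed means should be close to 40, 32 and 0 m/s. Under this model, an asymmetric grid [-a, b] would instead pull the noise-dominated estimates toward (b - a)/2.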

Detailed comments of the revised manuscript.

Lines 107-108: "...where domain knowledge shapes the training data..."
Is "domain knowledge" the set of possible altitude profiles in equation (1) in practice?

Lines 130-131: "...Doppler shift manifests as a subtle, globally coherent displacement in frequency space that is shared across altitudes..."
The Doppler shift changes with altitude, it cannot be described as "globally coherent".

Line 147: AS -> As

Line 161: "commonly referred to as [CLS] token in AI literature"
Please give a reference.

Title of Section 3.2. "training" -> "Training" (or perhaps "Training the AI model" or something similar?)

Line 255: "...When 𝜂 is above 30 (~10^1.5), it is largely independent of..."
Does 'it' refer to the velocity RMSE?

Line 276: "for a representative condition at Arecibo"
Please give details of these conditions.

Lines 350-351: "Above 120 km, a larger number of heights can be used in..."
Does one need to retrain the model for this?


RR by Anonymous Referee #2 (17 Jan 2026)

Suggestions for revision or reasons for rejection

The manuscript presents an application of transformer-based models to Doppler velocity estimation from incoherent scatter radar spectra, and the revised version shows improvements in clarity and organization. However, despite these improvements, the work still suffers from fundamental scientific and methodological deficiencies that prevent acceptance. The major issues are structural and require substantial revision.

The major concerns are below.

Issue 1: Poor scientific framing and overstated generality

The manuscript claims broad applicability of the proposed method, stating that it can be applied to any radar system or to any situation where the observed spectra can be parameterized. These claims are not supported by the results presented. The study demonstrates performance only for a specific ISR configuration and a single real-data case. Adapting the method to another radar system would require instrument-specific synthetic data generation, retraining of the model, and thorough revalidation. These steps are non-trivial and constitute a substantial methodological effort. This dependency is not adequately acknowledged in the manuscript, and no clear validity domain, assumptions, or potential failure modes are defined.

As a result, the conclusions are overstated relative to the evidence provided. The scope of applicability must be explicitly limited, and speculative statements regarding generalization should either be removed or clearly supported with demonstrations and results.

Issue 2: Lack of physical validation and bias toward smoothness

The manuscript does not provide convincing evidence of a physically validated improvement in Doppler velocity estimation. The evaluation using real data relies primarily on smoothness- and coherence-based metrics, such as second-order differences in altitude and time, which inherently favor smooth solutions. These metrics demonstrate only that the AI method produces smoother profiles than the least-squares fitting (LSF) approach.

In several instances, this smoothing appears to suppress physically meaningful structures. For example, in Figure 4b, around 115 km altitude near 12:00, a clear velocity structure visible in the LSF results is entirely removed (smoothed) by the AI method. This raises concerns that genuine physical variability may be attenuated or lost.

At the same time, the training data are synthetically generated using constrained, smooth vertical profiles. This creates a closed methodological loop in which the model is trained to favor smooth outputs and is subsequently evaluated using metrics that explicitly reward smoothness. Under these conditions, improved performance according to the selected metrics does not necessarily indicate improved physical accuracy, but rather stronger implicit regularization or denoising.

No independent physical validation is presented to demonstrate that the AI-derived velocities are closer to the true plasma drift than those obtained with least-squares fitting. The manuscript does not sufficiently distinguish between noise suppression and physical correctness, yet this distinction is essential for supporting the scientific claims being made.

Issue 3: Unjustified model complexity and unclear practicality

The proposed model is exceptionally large relative to the input size, consisting of 31 transformer blocks and approximately 100 million parameters. The manuscript does not provide a convincing justification for why such a large architecture is required, nor does it include comparisons with simpler models that could plausibly achieve comparable performance.

Furthermore, the discussion of computational efficiency is incomplete. While inference speed is emphasized, the substantial cost associated with model training and the reliance on modern GPU hardware are not properly contextualized. Least-squares fitting does not require a training phase and incurs relatively modest computational cost. As a result, the comparison between the two approaches is not balanced, and the practical advantages of the proposed method remain unclear.


ED: Reconsider after major revisions (17 Jan 2026) by Jorge Luis Chau

AR by Qihou Zhou on behalf of the Authors (30 Jan 2026): Author's response, Author's tracked changes, Manuscript

ED: Referee Nomination & Report Request started (01 Feb 2026) by Jorge Luis Chau

RR by Anonymous Referee #1 (09 Feb 2026)

RR by Anonymous Referee #2 (20 May 2026)

Suggestions for revision or reasons for rejection

The revised manuscript shows improvements in clarity, organization, and technical description. The authors have expanded several sections and clarified aspects of the transformer architecture, training strategy, and simulation framework. However, despite these revisions, the manuscript still suffers from fundamental scientific and methodological deficiencies that prevent acceptance in its current form. The major concerns remain largely unresolved and require substantial revision.

The main issues are summarized below.

Issue 1: Poor scientific framing, overstated generality, and lack of clearly defined scope and limitations

The manuscript continues to make broad claims regarding the applicability and generalization capability of the proposed AI approach, repeatedly stating that the method applies to “all situations where the spectrum can be parameterized” and can be extended broadly to other radar systems and non-ISR applications. These claims remain unsupported by the evidence presented.

The issue is not whether one can mathematically train a neural network on arbitrary spectra. Of course one can. The relevant scientific question is whether:

* the model generalizes robustly outside the training distribution,
* the AI approach consistently outperforms traditional estimators under realistic operating conditions,
* and whether the assumptions embedded in the training data remain valid under substantially different radar configurations, SNR regimes, and geophysical environments.

These points require experimental demonstration and validation, not plausibility arguments alone.

The manuscript demonstrates results only for a specific ISR configuration, a specific spectral parameterization, and a single real-data experiment from Arecibo. No cross-radar validation, no out-of-distribution testing, and no demonstration under substantially different ISR conditions are presented.

Given the limited validation presented, a critical missing component of the manuscript is a rigorous discussion of:

* scope,
* validity domain,
* limitations,
* and potential failure cases.

Furthermore, the authors suggest that different model configurations are preferable under different geophysical conditions, including combinations of height-aware and height-unaware approaches. However, this immediately raises unresolved scientific questions:

* Under what conditions does each model succeed or fail?
* What spatial or vertical scales are preserved or suppressed by each configuration?
* How can the user determine when the assumptions of the height-aware model break down?

At present, the manuscript does not provide a rigorous framework for answering these questions. As a result, the methodology appears heuristic and condition-dependent rather than systematically validated. The scope of applicability must therefore be substantially narrowed, or alternatively, the claims must be supported through additional validation experiments and a rigorous discussion of limitations and failure regimes.

Issue 2: Lack of physical validation and suppression of small-scale variability

This remains the central unresolved scientific issue in the manuscript.

The manuscript demonstrates that the proposed AI model produces smoother and more “coherent” velocity fields than the LSF method. However, it still does not convincingly demonstrate that the resulting Doppler velocities are physically more accurate.

The evaluation of real data relies primarily on smoothness- and coherence-based metrics, particularly second-order differences in altitude and time. These metrics inherently favor smooth solutions. At the same time, the training data are generated using constrained smooth vertical profiles with limited variability. As a result, the methodology creates a closed loop:

* the model is trained to favor smooth outputs,
* and is then evaluated using metrics that explicitly reward smoothness.

Under these conditions, improved performance according to the selected metrics does not necessarily imply improved physical accuracy. It may instead indicate stronger implicit regularization or denoising.
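The closed-loop concern can be illustrated with a toy example. The sketch below is not the authors' model. It assumes a hypothetical 0.5 km altitude grid, a smooth background wind, a single localized 15 m/s layer near 115 km, and 5 m/s Gaussian noise. A plain boxcar smoother stands in for any strongly regularizing estimator. It shows that a smoother can win on a second-order-difference metric and even on overall RMSE while attenuating the localized layer by more than half.

```python
import numpy as np

def second_difference_roughness(v):
    """Mean squared second-order difference of a 1-D profile (one sample per altitude bin)."""
    d2 = v[2:] - 2.0 * v[1:-1] + v[:-2]
    return float(np.mean(d2 ** 2))

def boxcar(v, width=9):
    """Centered moving average; a hypothetical stand-in for a strongly regularizing estimator."""
    return np.convolve(v, np.ones(width) / width, mode="same")

rng = np.random.default_rng(0)
z = np.arange(90.0, 150.0, 0.5)                             # hypothetical altitude grid (km)
background = 20.0 * np.sin(2 * np.pi * (z - 90.0) / 60.0)   # smooth large-scale wind (m/s)
feature = 15.0 * np.exp(-0.5 * ((z - 115.0) / 0.75) ** 2)   # localized layer near 115 km
truth = background + feature
noisy = truth + rng.normal(0.0, 5.0, z.size)                # unbiased but noisy estimate
smoothed = boxcar(noisy)                                    # low-variance but biased estimate

inner = slice(4, -4)   # drop bins where the 9-point window runs off the grid
i115 = int(np.argmin(np.abs(z - 115.0)))

rough_noisy = second_difference_roughness(noisy[inner])
rough_smooth = second_difference_roughness(smoothed[inner])
rmse_noisy = float(np.sqrt(np.mean((noisy - truth)[inner] ** 2)))
rmse_smooth = float(np.sqrt(np.mean((smoothed - truth)[inner] ** 2)))
# The smoother is linear, so its response to the layer alone isolates the layer's attenuation.
feature_peak_after = float(boxcar(feature)[i115])

print(f"roughness: noisy {rough_noisy:.1f}, smoothed {rough_smooth:.1f}")
print(f"RMSE (m/s): noisy {rmse_noisy:.2f}, smoothed {rmse_smooth:.2f}")
print(f"115 km layer amplitude: true 15.0, after smoothing {feature_peak_after:.1f}")
```

Both the smoothness metric and the aggregate error favor the smoothed profile, yet the localized layer loses most of its amplitude. Aggregate or smoothness-based scores alone therefore cannot separate denoising from the loss of real structure.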

The manuscript interprets increased smoothness as evidence of reduced statistical error and improved estimation quality. However, in statistical estimation, lower variance alone does not imply improved physical correctness. A strongly regularized estimator can reduce variance while simultaneously suppressing genuine small-scale variability and localized structures. Distinguishing denoising from physical fidelity therefore requires independent validation beyond smoothness- and coherence-based metrics.

This concern is particularly important because the proposed context-aware model explicitly links neighboring altitude bins and is trained using vertically smooth synthetic profiles. Such a framework naturally favors vertically coherent structures and may reduce sensitivity to localized variability or sharp gradients. Although the authors argue that the height-unaware model also performs well, the manuscript does not quantitatively characterize what spatial or vertical scales are preserved, attenuated, or suppressed by either configuration.

As a result, it remains unclear:

* which classes of geophysical structures are faithfully reproduced,
* which are smoothed or attenuated,
* and under what conditions each model configuration should be preferred.

This issue becomes particularly important because the manuscript itself suggests combining the height-aware and height-unaware models depending on the observational scenario. However, such a strategy implicitly requires prior knowledge of when each model succeeds or fails, yet no quantitative framework or scale-dependent validation is provided to make this determination.
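One standard way to make this determination quantitative is an amplitude-transfer test. Feed sinusoidal vertical profiles of known wavelength through the estimator and report the ratio of output to input amplitude as a function of vertical scale. The sketch below is only an illustration under stated assumptions. The `estimator` argument is hypothetical, and a 9-bin boxcar on a 0.5 km grid is used in its place. For the AI models, one would instead simulate spectra from each profile and pass them through the height-aware and height-unaware networks.

```python
import numpy as np

def boxcar(v, width=9):
    """Hypothetical stand-in estimator: centered moving average over `width` bins."""
    return np.convolve(v, np.ones(width) / width, mode="same")

def amplitude_transfer(estimator, wavelengths_km, dz_km=0.5, n_bins=240, trim=4):
    """Ratio of output to input RMS for sinusoidal vertical profiles of each wavelength.

    A ratio near 1 means that vertical scale is preserved; near 0 means it is suppressed.
    `trim` bins at each end are ignored so edge effects do not contaminate the ratio.
    """
    z = np.arange(n_bins) * dz_km
    inner = slice(trim, n_bins - trim)
    ratios = []
    for lam in wavelengths_km:
        truth = 10.0 * np.sin(2.0 * np.pi * z / lam)
        est = estimator(truth)
        ratios.append(float(np.sqrt(np.mean(est[inner] ** 2) / np.mean(truth[inner] ** 2))))
    return ratios

wavelengths = [60.0, 20.0, 10.0, 5.0, 3.0]
ratios = amplitude_transfer(boxcar, wavelengths)
for lam, r in zip(wavelengths, ratios):
    print(f"vertical wavelength {lam:5.1f} km -> amplitude ratio {r:.2f}")
```

Running the same test on each model configuration would show directly which vertical scales each one preserves or suppresses. That is the information needed to decide when to prefer one configuration over the other.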

This behavior is already visible in Figure 4, where localized structures visible in the LSF solution appear substantially weakened or absent in the AI results. For example, around 115 km near 12:00 LT, a clear velocity feature visible in the LSF solution is almost completely suppressed by the AI model. The manuscript does not demonstrate whether these structures are noise artifacts or genuine physical variability.
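A first-order check of whether such a feature exceeds the LSF noise is to compare the LSF-minus-AI difference with the LSF standard error in each bin. Departures that persist over several adjacent bins can then be flagged. The sketch below is a hedged illustration, not a procedure from the manuscript. It assumes independent Gaussian LSF errors with reliable uncertainties. The threshold of 3 standard errors and the two-bin minimum run are arbitrary choices, and the ten-bin input values are invented for illustration only.

```python
import numpy as np

def flag_significant_departures(v_lsf, sigma_lsf, v_model, threshold=3.0, min_run=2):
    """Mark bins where the LSF estimate departs from the model output by more than
    `threshold` LSF standard errors for at least `min_run` consecutive bins.

    Assumes Gaussian, independent LSF errors with trustworthy sigma_lsf.
    """
    z = np.abs(np.asarray(v_lsf) - np.asarray(v_model)) / np.asarray(sigma_lsf)
    exceed = z > threshold
    mask = np.zeros(exceed.size, dtype=bool)
    start = None
    for i, e in enumerate(np.append(exceed, False)):  # trailing False closes a final run
        if e and start is None:
            start = i
        elif not e and start is not None:
            if i - start >= min_run:
                mask[start:i] = True
            start = None
    return mask

# Invented toy values for ten adjacent altitude bins (not data from the manuscript).
v_model = np.zeros(10)
v_lsf = np.array([0.0, 1.0, 0.0, 12.0, 14.0, 11.0, 0.0, -1.0, 10.0, 0.0])
sigma_lsf = np.full(10, 3.0)
flags = flag_significant_departures(v_lsf, sigma_lsf, v_model)
print(flags.astype(int))
```

In this toy case the three-bin excursion is flagged. The isolated single-bin excursion is not. Because lag-profile processing can correlate errors between neighboring bins, a real analysis would need the full error covariance rather than the independence assumed here.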

From a geophysical perspective, the suppression of localized variability is itself a critical concern, particularly in regions where sharp gradients, intermittent structures, or small-scale dynamical processes are expected. A smoother solution is not necessarily a more physically accurate solution.

For these reasons, I continue to recommend major revisions.


ED: Publish subject to minor revisions (review by editor) (20 May 2026) by Jorge Luis Chau

AR by Qihou Zhou on behalf of the Authors (22 May 2026): Author's response, Author's tracked changes, Manuscript

ED: Publish as is (28 May 2026) by Jorge Luis Chau

AR by Qihou Zhou on behalf of the Authors (30 May 2026) Manuscript

Short summary

We introduce a transformer-based AI model for estimating Doppler velocity from incoherent scatter radar (ISR) spectra. Inspired by Vision Transformers, the model uses a standard transformer encoder adapted for radar data. Trained solely on simulated spectra, it performs well on real data from the Arecibo radar and significantly outperforms the traditional least-squares fitting (LSF) method. This approach is potentially applicable wherever spectral data can be parameterized.