Extraction of spatially confined small-scale waves from high-resolution all-sky airglow images based on machine learning

Wüst, Sabine; Strutz, Jakob; Hannawald, Patrick; Steffen, Jonas; Lienhart, Rainer; Bittner, Michael

doi:10.5194/amt-19-3539-2026

Articles | Volume 19, issue 10

https://doi.org/10.5194/amt-19-3539-2026

© Author(s) 2026. This work is distributed under
the Creative Commons Attribution 4.0 License.


Research article | 29 May 2026


Download

Final revised paper (published on 29 May 2026)
Supplement to the final revised paper
Preprint (discussion started on 13 Oct 2025)

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2025-4611', Anonymous Referee #1, 13 Nov 2025
This manuscript presents a method for detecting small-scale airglow wave structures using a modified YOLOv7. The paper is well written, scientifically sound, and a welcome contribution. A few concerns regarding the methodology need to be addressed, however. Below are my itemized comments.

Line 173. ‘BYOL….’

Pretraining using BYOL may not be particularly beneficial for this application, and the manuscript provides no evidence that BYOL improves performance. Nevertheless, the authors should at least include further details on the implementation of BYOL and YOLOv7. In particular:
Which variant of YOLOv7 was used? YOLOv7 has multiple variants with different backbones and capacities.

Was the BYOL pretraining performed starting from a randomly initialized network, or from an already pretrained YOLOv7 backbone (for example COCO pretrained)?

What data augmentations were used for BYOL during pretraining?

Did the authors compare the performance of the BYOL-pretrained model with a non-pretrained (trained from scratch) or standard pretrained YOLOv7 model?

YOLOv7 is adapted to output wavelength and orientation. This is in fact a substantial change to the net and these tasks deviate from the original design goal of the YOLO structure. YOLO and its variants are highly optimized for object detection but not much for extracting information from the objects. Adding three additional regression features may severely interfere with its main task (object detection). Generally speaking, regression tasks in neural networks require careful design of the architecture, loss functions, and training strategy. Simply adding three additional regression outputs to the YOLOv7 head is unlikely to work well without further justification or validation.

Have the authors tried using YOLOv7 in its original form and compared the numbers?

Line 284. A validation set is absolutely necessary. The testing set is the one that is optional. In recent work, some studies omit a separate testing set and report validation metrics only, provided that the validation set is sufficiently large and representative. Omitting the validation set, however, is not consistent with standard neural network training practice, since it prevents proper monitoring of overfitting and reliable model selection. Without a validation set, it is impossible to detect overfitting during training. Given that the training set is relatively small (only in the thousands), not using a validation set is a fatal mistake, and the performance metrics obtained during training are likely to reflect overfitting rather than true generalization.
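A minimal sketch of the hold-out practice described in this comment, assuming a simple random split and model selection by lowest validation loss. The function names, the 20 % validation fraction, and the loss curves are hypothetical and are not taken from the manuscript.

```python
import random


def split_indices(n_samples, val_fraction=0.2, seed=0):
    """Shuffle sample indices reproducibly and split them into train/validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_val = int(round(n_samples * val_fraction))
    return idx[n_val:], idx[:n_val]


def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss (used for model selection)."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)


train_idx, val_idx = split_indices(1000)

# Hypothetical loss curves: training loss keeps falling while validation
# loss turns upward after epoch 2, which is the signature of overfitting.
train_loss = [1.0, 0.6, 0.4, 0.3, 0.2]
val_loss = [1.1, 0.7, 0.6, 0.65, 0.8]
chosen = best_epoch(val_loss)
```

In this made-up example the training loss alone would favour the last epoch, whereas the validation curve selects an earlier checkpoint.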

Line 288. ‘……78% are correctly identified’.

This is not an appropriate way to report regression performance. Regression tasks should be evaluated using continuous error metrics such as MSE or RMSE, and wavelength and orientation should be reported separately with their respective error distributions. Using a binary threshold to count predictions as “correct” obscures the actual performance and does not provide enough information to assess model accuracy.
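A minimal sketch of the separate, continuous error reporting requested here. It assumes that orientation is axial (an angle and the same angle plus 180° describe the same wave fronts), so differences are wrapped to [-90°, 90°). The sample values and function names are hypothetical.

```python
import math


def orientation_error_deg(pred, label):
    """Signed orientation difference in degrees, wrapped to [-90, 90).

    Assumes orientations are equivalent modulo 180 degrees.
    """
    return (pred - label + 90.0) % 180.0 - 90.0


def rmse(errors):
    """Root-mean-square of a list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical (predicted, labelled) pairs, for illustration only.
wavelength_km = [(10.0, 9.0), (21.0, 20.0), (14.0, 16.0)]
orientation_deg = [(178.0, 2.0), (45.0, 40.0), (90.0, 93.0)]

# Report the two quantities separately, each with its own RMSE.
wavelength_rmse = rmse([p - l for p, l in wavelength_km])
orientation_rmse = rmse([orientation_error_deg(p, l) for p, l in orientation_deg])
```

The wrap matters for the first orientation pair: 178° versus 2° is a 4° error, not 176°.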

Figure 5 and ~ Line 286.

The reported performance is subpar for a task that should not be particularly difficult for a modern neural network. This suggests there might be issues with the data, the net config, and/or training. I suggest that the authors retrain the network without the additional regression features, expand the training data if possible, and include a validation set. If the size of the training dataset is the main constraint, using the testing set as the validation set and reporting the validation metrics is also acceptable.
The orientation and wavelength can be handled much more effectively by a dedicated CNN or ViT that processes the image content within the bounding box. Or even better, a DETR-based model would be more suitable for predicting both the bounding box and the orientation. However, adapting the method to DETR would require substantial additional work and is not strictly necessary here.

Line 320. ‘In summary, the 2D-FFT provides more accurate results as 78% of the FFT predictions have an error of less than or equal to 2.5° for the orientation and 3% (relative to the labelled wavelength) for the wavelength. For the modified YOLOv7 algorithm, 78% of the results are considered correct, if the wavelength error is less than 10% relative to the labelled wavelength and the error of the angle is less than 10°.’

This is not a fair comparison. The 2D-FFT results are evaluated using an error threshold of 2.5° for orientation and 3 percent for wavelength, while the YOLOv7 results are evaluated using a much looser threshold of 10° and 10 percent. Because the criteria differ by a large factor, the “78 percent correct” numbers for the two methods cannot be directly compared.
While I understand the authors are trying to show that 2D-FFT performs better on normal images, the comparison is still pretty weird. It would be better to compare both methods under the same benchmark. MSE and RMSE are the standard metrics for regression tasks like these.
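A minimal sketch of a like-for-like comparison, assuming per-event errors are available for both methods. Both methods are scored under each of the two threshold pairs quoted above (3 % / 2.5° and 10 % / 10°). The error values and names are hypothetical and are not results from the manuscript.

```python
def fraction_within(errors, wavelength_tol_pct, angle_tol_deg):
    """Share of events whose wavelength AND orientation errors are within tolerance.

    `errors` holds (relative wavelength error in %, orientation error in degrees).
    """
    hits = sum(
        1
        for w, a in errors
        if abs(w) <= wavelength_tol_pct and abs(a) <= angle_tol_deg
    )
    return hits / len(errors)


# Hypothetical per-event errors, for illustration only.
fft_errors = [(1.0, 1.0), (2.5, 2.0), (4.0, 3.0), (0.5, 0.5)]
yolo_errors = [(2.0, 2.0), (8.0, 6.0), (12.0, 4.0), (5.0, 9.0)]

# Apply the SAME tolerance pair to both methods, for each pair in turn.
thresholds = [(3.0, 2.5), (10.0, 10.0)]
table = {
    tol: (fraction_within(fft_errors, *tol), fraction_within(yolo_errors, *tol))
    for tol in thresholds
}
```

Each entry of `table` holds the two fractions obtained under one shared criterion, so the numbers in a row can be compared directly.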
Citation: https://doi.org/10.5194/egusphere-2025-4611-RC1
- AC1: 'Reply on RC1', Sabine Wüst, 29 Jan 2026
  
  Thank you very much for your valuable comments. We answered all of them in the attached pdf.
  
  Citation: https://doi.org/10.5194/egusphere-2025-4611-AC1
RC2:
'Comment on egusphere-2025-4611', Anonymous Referee #2, 15 Nov 2025
Comments on the manuscript ‘Extraction of spatially confined small-scale waves from high resolution all-sky airglow images based on machine learning’ by Sabine Wüst et al.

This paper reports high-resolution, wide-area observations of OH airglow images using a scanning camera at DLR Oberpfaffenhofen, and a new method of analyzing ripple structures in the images using an ML technique. The authors also show statistical results for the ripples. The new analysis technique has extracted two orders of magnitude more events than reported in the past literature, and the results compare well with past observations.

The reviewer would like to congratulate the authors on their successful observations and analyses. The new method, applied to images with a wide horizontal range and high resolution, is very capable of studying the statistics of ripple structures (small-scale wave-like structures) in the images, whose relations to instabilities and secondary gravity waves are of great interest.

However, there are some points that need to be improved before the manuscript is published. Thus, I would like to recommend ‘minor revision’.

Main point:
The wording of propagation direction
There are many places where the authors mention ‘propagation direction’. I understand that ‘direction’ can have three meanings here.
(1) Direction of so called ‘phase velocity’, which is the apparent phase velocity of phase front lines.
(2) Direction of motion of the area that the wave-like structure ‘packet’ is moving to.
(3) Direction perpendicular to the phase front line (of a single image)

My understanding from the text is that (1) is the one we normally use in the case of gravity waves, and the 2D-FFT shows it by its peak (with a 180° ambiguity in the case of a single image). (2) and (3) can be derived from the ML technique shown here. I would like to suggest that the authors use clearly different wording for (2) than for the others, to avoid confusing readers. My suggestion is to use wording like ‘direction of the wave structure movement’, ‘direction of wave packet motion’, ‘direction of wave migration’, ‘direction of wave drift’ etc. Alternatively, Li et al. (2017) use ‘advection’. I would prefer not to use the word ‘propagation’ for (2) because it is not related to the wave parameters but to the observed area. I hope such a separation of wording would help readers to understand the paper correctly.

Related question.
L 410-412
‘If all the observed wave-like structures are (secondary) gravity waves, then there is no difference between the propagation directions derived from the two different approaches.’

I do not understand what this sentence means. Even in the case of secondary gravity waves, the direction of wave packet motion may not be the same as the direction of the phase velocity of the wave. Please explain more.

Other points
40-49

The authors describe airglow imaging observations of breaking gravity waves, citing a few papers. To my knowledge, the first clear airglow observation of gravity wave breaking and its analysis were published by Yamada+ (GRL, 2001, DOI: 10.1029/2000GL011945) and Fritts+ (GRL, DOI: 10.1029/2001gl013753). I suggest citing these papers, which were published about 20 years earlier.

L 94-100
The description of the FAIM 4.
It would be useful if the authors could also provide the chip (or pixel) size of the InGaAs camera and the F-number of the lens (or the effective aperture of the lens), so that the reader can understand the sensitivity of the optics (e.g. to know the ‘A x omega’ value (throughput) of the camera).

L 173-179
It would be helpful if the authors briefly introduced how the FOVs of FAIM 4 and FAIM 3 (13 km x 13 km?) differ.

L 214 – 215
‘Firstly, performing a 2D-FFT, especially on high-resolution images, is time-consuming and computationally expensive, leading to longer processing times and significantly affecting efficiency in analysing large data sets.’
(A similar expression is at L 325.)
My feeling is that a 2D-FFT is not so time-consuming nowadays, as long as the number of points is selected properly (the most efficient choice is 2^N). I would like to know how difficult it is to use a 2D-FFT for the images introduced here. I believe zero-padding to make a square image of size (2^N) * (2^N) would make the computation time short enough.
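A minimal sketch of the zero-padded 2D-FFT mentioned in this comment, assuming NumPy is available. It is not the authors' implementation: the function names and the synthetic test wave are hypothetical, and only the 150 m pixel size is taken from the short summary. `numpy.fft.fft2` zero-pads the input when the requested shape `s` is larger than the image.

```python
import numpy as np


def next_pow2(n):
    """Smallest power of two that is >= n."""
    return 1 << (n - 1).bit_length()


def dominant_wave(image, pixel_size_km):
    """Return (wavelength_km, direction_deg) of the strongest spectral peak.

    direction_deg is the wave-vector direction (perpendicular to the phase
    fronts), reduced modulo 180 because a single image cannot resolve the
    180-degree ambiguity.
    """
    img = image - image.mean()                 # remove the DC offset
    n = next_pow2(max(img.shape))              # square 2^N x 2^N target size
    spec = np.abs(np.fft.fft2(img, s=(n, n)))  # s > shape means zero-padding
    spec[0, 0] = 0.0                           # ignore any residual DC term
    freqs = np.fft.fftfreq(n, d=pixel_size_km) # spatial frequency, cycles per km
    iy, ix = np.unravel_index(np.argmax(spec), spec.shape)
    fy, fx = freqs[iy], freqs[ix]
    wavelength = 1.0 / np.hypot(fx, fy)
    direction = np.degrees(np.arctan2(fy, fx)) % 180.0
    return wavelength, direction


# Synthetic 100 x 100 pixel plane wave at 0.15 km per pixel. Its wave vector
# is chosen to fall exactly on a frequency bin of the padded 128 x 128 grid.
y, x = np.mgrid[0:100, 0:100]
test_image = np.cos(2.0 * np.pi * (6 * x + 8 * y) / 128.0)
wavelength_km, direction_deg = dominant_wave(test_image, pixel_size_km=0.15)
```

For waves that do not fall on a frequency bin, the estimate is limited by the bin spacing of the padded grid, so a real analysis would typically add windowing or peak interpolation.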

L 517 – 522
The authors refer to Jacobi et al. (2015) and speculate that the meridional wind is strong in April/May and the zonal wind in August, which can explain why the probability of dynamical instability is large. I do not understand this logic. Why does the largest wind at around the OH altitude indicate a probability of dynamical instability without a measurement of wind shear?

L 576
‘we observe in our data changes from south in spring to east in winter’.

I cannot see from Figure 15 that the direction is south in spring. Please check it.

Figure 15.
Please indicate the location of the center of the plot, as well as the WE and NE lines.
Is the ‘zero’ value shifted from the center, as I guess from the scale axis?
If so, what is the reason?
Citation: https://doi.org/10.5194/egusphere-2025-4611-RC2
- AC2: 'Reply on RC2', Sabine Wüst, 29 Jan 2026
  
  Thank you very much for your valuable comments. We addressed all of them in the attached pdf.
  
  Citation: https://doi.org/10.5194/egusphere-2025-4611-AC2

Peer review completion

AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload

AR by Sabine Wüst on behalf of the Authors (13 Feb 2026) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (22 Feb 2026) by Jorge Luis Chau

RR by Anonymous Referee #1 (11 Mar 2026)

RR by Anonymous Referee #2 (17 Mar 2026)

ED: Publish subject to technical corrections (17 Mar 2026) by Jorge Luis Chau

AR by Sabine Wüst on behalf of the Authors (10 Apr 2026) Author's response Manuscript

Short summary

Since June 2019, an infrared camera has been scanning nearly the entire sky (diameter: 500 km) above DLR Oberpfaffenhofen (48.09° N, 11.28° E), Germany, every night, providing images of the OH* airglow layer (height: 85–87 km) with a high spatial and temporal resolution (150 m, 2 min). We analysed three years of data for spatially confined small-scale wave structures with a machine learning approach. We derived seasonal variations and deduced that wave breaking is mostly observed in summer.
