Determination of time-varying periodicities in unequally spaced time series of OH* temperatures using a moving Lomb–Scargle periodogram and a fast calculation of the false alarm probabilities

Kalicinsky, Christoph; Reisch, Robert; Knieling, Peter; Koppmann, Ralf

doi:https://doi.org/10.5194/amt-13-467-2020

Articles | Volume 13, issue 2

https://doi.org/10.5194/amt-13-467-2020

Articles | Volume 13, issue 2

Research article

05 Feb 2020

Research article |

| 05 Feb 2020

Determination of time-varying periodicities in unequally spaced time series of OH* temperatures using a moving Lomb–Scargle periodogram and a fast calculation of the false alarm probabilities

Christoph Kalicinsky, Robert Reisch, Peter Knieling, and Ralf Koppmann

Abstract

We present an approach to analyse time series with unequal spacing. The approach enables the identification of significant periodic fluctuations and the derivation of time-resolved periods and amplitudes of these fluctuations. It is based on the classical Lomb–Scargle periodogram (LSP), a method that can handle unequally spaced time series. Here, we additionally use the idea of a moving window. The significance of the results is analysed with the typically used false alarm probability (FAP). We derived the dependencies of the FAP levels on different parameters that either can be changed manually (length of the analysed time interval, frequency range) or that change naturally (number of data gaps). By means of these dependencies, we found a fast and easy way to calculate FAP levels for different configurations of these parameters without the need for a large number of simulations. The general performance of the approach is tested with different artificially generated time series and the results are very promising. Finally, we present results for nightly mean OH^* temperatures that have been observed from Wuppertal (51^∘ N, 7^∘ E; Germany).

Download & links

Article (PDF, 4341 KB)

Download & links

How to cite.

Received: 11 Sep 2019 – Discussion started: 30 Sep 2019 – Revised: 18 Dec 2019 – Accepted: 06 Jan 2020 – Published: 05 Feb 2020

1 Introduction

Many time series in atmospheric sciences are characterised by an unequal spacing of the data points, e.g. due to data gaps. OH or other airglow observations often have such data gaps in the measured time series (e.g. Espy et al., 1997; Das and Sinha, 2008; Reid et al., 2014). The OH^* temperatures which have been observed from Wuppertal (51^∘ N, 7^∘ E) since the 1980s also exhibit an unequal spacing. The time series of nightly mean OH^* temperatures repeatedly has data gaps mainly because of bad weather conditions during some nights that prevent useful measurements (see e.g. Bittner et al., 2000). Within a single night such data gaps can also occur when clouds move through the line of sight. The measurements before and after such a cloud contamination are useful. Typical methods such as the fast Fourier transformation (FFT) rely on a discrete sampling with equal spacing. Thus, a time series like that of OH^* temperatures has to be manipulated, e.g. with interpolation techniques before the analysis (e.g. Espy et al., 1997; Bittner et al., 2000; Reid et al., 2014). The Lomb–Scargle periodogram (LSP; Lomb, 1976; Scargle, 1982) is a method that can handle this drawback, as it can be used for time series with unequal spacing. This method has been used in different studies analysing airglow observations (e.g. Espy et al., 1997; Takahashi et al., 2002; Gao et al., 2010; Reid et al., 2014; Egito et al., 2018; Franzen et al., 2018; Nyassor et al., 2018).

A second important point with respect to the analysis of periodicities is the variation of these periodicities with time, i.e. the period is not stable during the complete analysed time interval or the amplitude varies. In such cases many methods as the FFT and the LSP will lead to results of a mean state only. The wavelet transform is a method that is very useful as it delivers time-resolved information on the periodicities of the analysed time series and it is used in several studies analysing the temporal evolution of periodic signals in airglow observations (e.g. Das and Sinha, 2008; Takahashi et al., 2013; Reid et al., 2014; Nyassor et al., 2018). In the case of the Wuppertal OH^* temperatures, Bittner et al. (2000) used the wavelet transform to analyse the variability of the nightly mean OH^* temperatures after assimilation of the data gaps in the time series by use of the maximum entropy method (MEM). Similar to that, other studies also report that the time series have to be interpolated before the use of the wavelet transform (e.g. Das and Sinha, 2008; Reid et al., 2014) or the sampling is at least almost evenly distributed (Nyassor et al., 2018). The goal of the presented study is to avoid such an assimilation of the data gaps and still derive time-resolved information on the periodicities. Thus, we combined the LSP and the idea of a moving window to identify and characterise periodicities in unequally spaced time series even when the periodicities vary with time. Other airglow studies also use some kind of windowed LSP but for independent time windows following each other such as different parts of a night (Reid et al., 2014) or months of a year (Egito et al., 2018). Some studies analysing radar observations of winds report a periodogram analysis with a moving window (Yoshida et al., 1999, but without significance evaluation) or a LSP analysis for at least partly overlapping windows (Luo et al., 2000). However, our study combines the LSP with a moving window (moved with the minimum possible time step); additionally, we derive a fast and easy method to calculate the false alarm probabilities (FAPs) for different situations (length of time series, frequency range, data gaps) to identify significant results. The determination of the FAP levels is typically done with Monte Carlo simulations, which is very time-consuming (e.g. Cumming et al., 1999; Zechmeister and Kürster, 2009). Thus, our new empirically derived relationship to calculate the levels improves the application of the method.

The main intention of the paper is to describe the approach from a user perspective and to illustrate the capabilities of the approach with examples of artificial data sets as well as observations. The paper is structured as follows. In Sect. 2 the classical LSP and the new approach are explained. The evaluation of the significance of obtained results is made in Sect. 3. Finally, the method is applied to artificial data and observations of OH^* temperatures in Sect. 4. A short summary is given in Sect. 5.

2 Methodology

2.1 Classical Lomb–Scargle periodogram

The Lomb–Scargle periodogram (LSP) was developed by Lomb (1976) and Scargle (1982). The periodogram is defined as

\begin{matrix} (1) & \begin{aligned} P_{X} (ω) = \frac{1}{2} \\ \{\frac{{[\sum_{j} X_{j} \cos ω (t_{j} - τ)]}^{2}}{\sum_{j} \cos^{2} ω (t_{j} - τ)} + \frac{{[\sum_{j} X_{j} \sin ω (t_{j} - τ)]}^{2}}{\sum_{j} \sin^{2} ω (t_{j} - τ)}\}, \end{aligned} \end{matrix}

where X_j represents the measurements at the times t_j, ω is the angular frequency (ω=2πf), and the time offset τ is defined as

\begin{matrix} (2) & \tan (2 ω τ) = \frac{(\sum_{j} \sin 2 ω t_{j})}{(\sum_{j} \cos 2 ω t_{j})} . \end{matrix}

An advantage compared to other methods such as the FFT is that the LSP can handle unequally spaced time series. A prerequisite is that the time series has zero mean before the calculation of the periodogram powers. With the given definition, the LSP has two useful properties: (1) it is invariant to a shift of the origin of time, and (2) it is equivalent to the least squares fitting of sinusoids (e.g. Horne and Baliunas, 1986). Scargle (1982) showed that the definition of the periodogram is the same (except for a factor of 1∕2) as the reduction in sum of squares (sum of squares of data minus sum of squares of residual) when using least squares fitting of sinusoids (see Scargle, 1982, Appendix C). Thus, the maximum power in the periodogram occurs at that frequency that leads to a minimum of the sum of squares of the residuals when a sinusoid with this frequency is fitted to the time series.

2.2 Moving LSP

The approach used in the following analyses is based on the classical LSP, but the whole time series is analysed sequentially. The procedure is as follows.

A window size (time interval), which is typically much smaller than the length of the whole time series, is defined. Then the procedure starts at the beginning of the time series:

calculate LSP for the data points within the window (time interval),
move the window by one time step (minimum possible sampling step),
move to step one until the end of the times series is reached.

By executing this procedure, one single LSP is calculated for each possible part of the time series with the length of the window (time interval). By contrast to the LSP for the whole time series at once, this procedure delivers time-resolved information on the periodicities and amplitudes.

2.3 Normalisation of the LSP

There are different ways to normalise the periodogram: sample variance (or sum of squares), known variance of data, and variance of the residuals (see e.g. Cumming et al., 1999; Zechmeister and Kürster, 2009). Here we use the normalisation by the sample variance and sum of squares, respectively. These two only differ by a constant factor that relies on the number of data points N. The periodogram power can vary between 0 and $(N - 1) / 2$ when using the normalisation by the sample variance and between 0 and 1 when using the normalisation by sum of squares (when the factor 1∕2 is also considered; compare Sect. 2.1) (e.g. Cumming et al., 1999; Zechmeister and Kürster, 2009). As the height of a peak in the case of the normalisation by the variance depends on the number of data points N, the peak heights for the same sinusoid differ for different numbers of data points. Since the data gaps in the time series of nightly mean OH^* temperatures are randomly distributed, the number of data points in different possible windows of same size can vary. In order to make the peak heights in these different windows comparable, we prefer the normalisation by the sum of squares. This type of normalisation has another useful property. Because of the equivalence to the reduction in sum of squares when fitting a sinusoid, the normalisation by the sum of squares leads to a normalised power that gives the contribution of the sinusoid to the total sum of squares, and therefore to the total variance. In this way it is a measure of the explained variance. Here non-correlation between different sinusoids and/or a sinusoid and the residual is assumed. This is, at least approximately (increasing with number of data points), the case for sinusoids with different periods and, thus, the variances of the individual parts (sinusoids) of the time series add up.

Alternatively, one can determine the amplitude of the sinusoid at each frequency. This is also based on the equivalence of the periodogram power and the reduction in sum of squares. Furthermore, the variance of a sinusoid is given by A²∕2, where A is the amplitude (e.g. Horne and Baliunas, 1986; Smith, 1997). With these two relationships, the amplitude can be calculated as

\begin{matrix} (3) & A (ω) = \sqrt{\frac{4 P_{X} (ω)}{N - 1}} . \end{matrix}

In total the LSP delivers information on the periodicities together with a measure of the explained variance when a sinusoid is fitted to the data and the corresponding amplitude of the sinusoid. An example periodogram is shown in Fig. 1. The time series that is analysed is a combination of two sinusoids with different periods and amplitudes. The first one has a period of 10 d and an amplitude of 1 K, whereas the second sinusoid has a period of 35 d and an amplitude of 0.5 K. The total length of the time series is 60 d and the time series has equal spacing. Thus, the variance of the second sinusoid is only one-quarter of the variance of the first one. This can be seen in the normalised power (black curve in Fig. 1) where at 10 d a value of about 0.8 is reached and at 35 d a value of about 0.2 is reached. Because of the different amplitudes, the sinusoids contribute 80 % and 20 % to the total variance of the time series. And also the amplitudes themselves are well determined by using Eq. (3) (see red curve in Fig. 1).

https://www.atmos-meas-tech.net/13/467/2020/amt-13-467-2020-f01

Figure 1Example LSP for a time series composed of two sinusoids. The first one has a period of 10 d and an amplitude of 1 K and the second has a period of 35 d and an amplitude of 0.5 K. The normalised power is shown as a black curve and the amplitude as a red curve with a second axis to the right.

Download

3 Significance evaluation

3.1 False alarm probability

An important quantity with respect to the LSP is the so-called false alarm probability (FAP). It gives the probability that a peak with a height above a certain level can occur just by chance, e.g. due to noise. The distribution of the periodogram powers and thus the description of the false alarm probability depends on the type of normalisation (see e.g. Cumming et al., 1999; Zechmeister and Kürster, 2009). In the case of the normalisation by the sample variance, the periodogram powers follow a beta distribution (Schwarzenberg-Czerny, 1998). As the variance and the sum of squares differ by a constant factor only, the type of distribution is the same. Hereafter, we only describe the situation for the normalisation by sum of squares. At a single frequency the probability that a peak height z exceeds a value of z₀ is given by

\begin{matrix} (4) & Prob (z > z_{0}) = (1 - z_{0})^{\frac{N - 3}{2}}, \end{matrix}

where N is the number of data points (Zechmeister and Kürster, 2009). Since periods in a frequency range are analysed, one is interested in the probability that one peak somewhere in the periodogram covering a frequency range Δf exceeds a certain value by chance, which is given by the FAP. The probability that all peaks in this frequency range are below or equal to a certain value is given by $(1 - Prob (z > z_{0}))^{N_{i}}$ , where N_i is the number of independent frequencies (number of frequencies where potentially peaks can occur). Then the FAP is

\begin{matrix} (5) & FAP = 1 - (1 - Prob (z > z_{0}))^{N_{i}}, \end{matrix}

where N_i gives the number of independent frequencies (see e.g. Horne and Baliunas, 1986; Cumming et al., 1999; Zechmeister and Kürster, 2009, for some discussion on FAP). There is no analytical way to describe the number of independent frequencies, but a good way to determine N_i is the use of Monte Carlo simulations (see e.g. Cumming et al., 1999).

The procedure to determine N_i using simulations is as follows. As already pointed out by Scargle (1982) the cumulative distribution function (CDF) can be used to determine the FAP. We use a large number of samples of random values taken from a Gaussian distribution. Then we calculate the LSP for each sample and determine the height of the maximum peak within the analysed frequency range. From these maximum peak heights, we calculate the empirical CDF which gives the probability that the maximum peak and thus all other peaks in a periodogram have a height equal to or below a certain value. The CDF is then given by $(1 - Prob (z > z_{0}))^{N_{i}}$ , and consequently the FAP is then 1−CDF. In the last step we determine N_i by fitting Eq. (5).

An example for the results of this procedure is shown in Fig. 2. The example shows the FAP derived from 10 000 samples of Gaussian noise, where each sample has 60 data points and a sampling of 1 d⁻¹; thus, the complete time interval length is 60 d. The frequency range used for the analysis is $Δ f = 1 / 2$ –1∕60 d⁻¹. The frequency sampling during these simulations (and all other simulations) is fixed with respect to the length of the time interval, thus the duration of observations T and the frequency range. We evaluated the LSP at N_freq equally spaced frequencies in the frequency range Δf, where N_freq=4TΔf, which was shown to be an adequate sampling to observe all possible peaks by Cumming et al. (1999). The blue circles show the results for Prob(z>z₀) at a single frequency. The theoretical curve of Eq. (4) is shown in magenta. The determined probability and the theoretical one match very well. The results for the FAP (1-CDF) are shown as black circles. The red curve is determined by fitting Eq. (5) to these data points. The number of independent frequencies N_i in this case is about 72. From this curve, different FAP levels can be determined. In the following we typically use a FAP level of 5 %, which means that in only 5 % of the noise samples the maximum peak in the complete frequency range exceeded the corresponding peak height value. In Fig. 2 the dashed horizontal line marks a FAP of 5 % and the intersection with the red curve gives the height of about 0.225 that corresponds to this level.

https://www.atmos-meas-tech.net/13/467/2020/amt-13-467-2020-f02

Figure 2False alarm probability (FAP) and Prob(z>z₀) at a single frequency derived from 10 000 noise samples with 60 data points each. The data sampling was 1 d⁻¹ and the analysed frequency range $Δ f = 1 / 2$ –1∕60 d⁻¹. The derived Prob(z>z₀) is shown with blue circles and the theoretical curve (Eq. 4) is depicted in magenta. The determined FAP is shown by the black circles and the fit to these data points using Eq. (5) is displayed as red curve. The dashed horizontal line marks a FAP of 5 %.

Download

3.2 Dependency of N_i and FAP

The number of independent frequencies N_i and the false alarm probability depend on different factors: the length of the analysed time interval T, the data gaps within the time interval, and the analysed frequency range Δf. Since different situations with respect to data gaps can occur during the analysis of the OH^* temperatures and, additionally, the length of the window (time interval) and the frequency range can be chosen, one would have to perform simulations for all situations. As these simulations are much more time-consuming than the calculation of the LSP itself, we want to avoid these numerous simulations. Thus, we examined the different dependencies to find a faster and easier way to determine N_i and thus the FAP levels. The sampling of the time series used for these analyses was chosen to be 1 d⁻¹, which is the same as for the nightly mean OH^* temperatures without data gaps. For the different analyses, we varied only one parameter and kept the other two fixed. In all cases 10 000 noise samples were used to determine one N_i value.

Firstly, we analysed the dependency of N_i on the length of the time interval T. Here the frequency range was kept constant and the time series had no data gaps. As this is the case and the sampling is 1 d⁻¹, the length of the time interval is equal to the number of data points N, i.e. a time interval of 60 d has 60 data points. The frequency range was fixed to $Δ f = 1 / 2$ –1∕60 d⁻¹ for the first analysis. Since the width of a peak is inversely proportional to the length of the analysed time interval (see e.g. Cumming et al., 1999; Zechmeister and Kürster, 2009), the number of independent frequencies N_i for a fixed frequency range should linearly increase with increasing time interval length. Figure 3 shows the results for N_i for different time interval lengths T between 30 and 90 d (typical values used for the analysis of nightly mean OH^* temperatures) as blue full circles. Obviously, the dependency is linear. A linear fit including an additional intercept leads to an intercept of about zero. Thus, we calculated a fit line that has to intersect the point (0,0) and only determined the slope of this line, which is 1.208 (±0.004) d⁻¹. The fit is shown as a blue line in Fig. 3a. Since the number of data points N increases with increasing length of the time interval T, the probability that the power at a single frequency exceeds a certain value by chance decreases (compare Eq. 4). As this effect is larger than the opposite effect of the increase in N_i, the FAP levels also decrease. Figure 3b shows the levels of a FAP of 5 % for the different time interval lengths as blue full circles.

https://www.atmos-meas-tech.net/13/467/2020/amt-13-467-2020-f03

Figure 3(a, b) Dependency of N_i and the FAP level of 5 % on the length of the time interval T and the frequency range. The analysed frequency ranges are 1∕2–1∕5, 1∕2–1∕10, and 1∕2–1∕60 d⁻¹, and the time series of the simulations have no data gaps. (c) Dependency of the slopes (lines from panel a) and additionally for the frequency ranges (1∕2–1∕30 and 1∕2–1∕90 d⁻¹) on the frequency range Δf. The error bars show 2 times the standard error of the slopes.

Download

In a second analysis we varied the frequency range and repeated the analysis that was done before. The frequency ranges lay between 1∕2–1∕5 and 1∕2–1∕90 d⁻¹. A smaller frequency range should include a smaller number of independent frequencies. As the decrease in N_i for a reduction of Δf depends on the width of the peaks, and therefore on the length of the time interval T, the decrease in N_i for the same reduction of Δf has to be larger for larger T. This can be seen in Fig. 3a, where example results for the frequency ranges 1∕2–1∕5 and 1∕2–1∕10 d⁻¹ are shown in black and red, respectively. For the smallest frequency range, the lowest values can be seen and the largest decrease in N_i is observed for the longest time interval T. Because of this dependency of the decrease in N_i on the time interval length, the fit lines are not shifted by a constant value, but the slopes of the fit lines change. Thus, the slopes depend on frequency range Δf. Figure 3c shows the dependency of the slopes on the frequency range Δf. Obviously, for the analysed frequency ranges this dependency can be described by a straight line. A fit to the data leads to the results for the slope of 2.92 (±0.02) d d⁻¹ and for the intercept of −0.203 (±0.008) d⁻¹. The fit line is shown as black line. With the knowledge of these parameters the number of independent frequencies N_i can be determined for each combination within the analysed parameter range by

\begin{matrix} (6) & N_{i} = (2.92 d d^{- 1} \cdot Δ f - 0.203 d^{- 1}) \cdot T . \end{matrix}

In the last analysis we evaluated the dependency of N_i on the number of data gaps in a fixed time interval. The frequency ranges for this analysis were $Δ f = 1 / 2$ –1∕5, 1∕2–1∕10, and 1∕2–1∕60 d⁻¹. We took a time interval of 60 d and introduced 1 to 29 randomly distributed data gaps. We only removed data points inside the complete time interval, i.e. both end points were always there and the time interval length was always 60 d. Since the spectral width of the peaks depends on the length of the time interval, which is fixed, and not on the number of data points, the number of independent frequencies N_i is supposed to be almost the same for different numbers of data gaps. Figure 4a shows N_i as a function of the number of data gaps for different frequency ranges. In all cases only a slight decrease in N_i with increasing number of gaps can be seen. The decrease is slightly larger for those frequency ranges that lead to larger N_i values. But the relative decrease is very similar for all shown situations. The decrease in all cases is only of the order of a few percent for 50 % data gaps. This decrease is caused by an on average very small decrease in the resolution caused by a small increase in the peak width. Although the number of independent frequencies is nearly constant, this does not mean that the FAP levels stay the same. Since the number of data points N decreases with increasing number of data gaps, the probability that the power at a single frequency exceeds a certain value increases (compare Eq. 4). Thus, the FAP for a certain peak height also increases. This increase is shown in Fig. 4b. The effect of the decrease in N_i on the FAP levels of 5 % is typically of the order of a few per mil (‰). Thus, a non-consideration of this decrease in N_i would lead to a very small change in the FAP levels. Furthermore, the change when considering the decrease would be negative, i.e. the FAP for the same height z would get smaller. Consequently, the FAP levels of 5 % also have smaller values. Thus, a non-consideration would not change the judgement if a signal is significant or not in a false way. When a signal exceeds a higher value it will certainly exceed a smaller value, too. Nonetheless, in the FAP levels shown later on, the effect of the data gaps on the N_i values is considered.

https://www.atmos-meas-tech.net/13/467/2020/amt-13-467-2020-f04

Figure 4Dependency of N_i and the FAP level of 5 % on the number of data gaps in a fixed time interval of length T. The analysed frequency ranges are 1∕2–1∕5, 1∕2–1∕10, and 1∕2–1∕60 d⁻¹, and the time interval length is 60 d.

Download

4 Data evaluation

4.1 Artificial data

In order to study the performance of the approach we analysed different time series of artificial data. In this section we present selected examples of these time series. The total length of the time series was always 1 year (365 d) and the sampling was 1 d⁻¹, which is the same as for the nightly mean OH^* temperatures without data gaps.

The analysis of a single sinusoid is a very trivial problem and the approach delivers the expected results (not shown). As the approach shall be used in the case of non-stable periodicities, we focus here on such problems. The first example shows a time series of a periodic signal with a period that increases with time from approximately 8 to 16 d and an amplitude of 1 K. The time series is shown in Fig. 5a as a black curve. (The components signal (blue) and noise (green) are shown additionally in separate panels.) The results of the analysis are shown in Fig. 5b and c for the normalised power and the amplitude, respectively. The y axes of these two figures give the frequency and period, respectively, and the x axes show the centre days of the sequentially analysed time intervals. The normalised power and the amplitude are shown colour coded and the white contour lines mark the FAP level of 5 % (N_i was determined using Eq. 6). The results clearly show the change in the period with time and the normalised power is close to one. The small deviation from a value of one can be explained by the change in the period, which occurs on a smaller timescale than the interval size of 60 d. Thus, a sinusoid with a fixed period is not able to explain the complete variance in each of the analysed time intervals. The results for the amplitude show values close to 1 K, and thus also the expectation. The analysis was repeated for the same periodic signal with additional noise added to the time series and also data gaps that have been incorporated. The standard deviation of the noise was 0.5 times the standard deviation of the signal and thus the variance of the noise is one-quarter of that of the signal. Additionally, about 30 % of the data points have been randomly removed. The signal with gaps (blue curve), the noise (green curve), and the complete time series (sum of both; black curve) are shown in Fig. 5d. The corresponding results are shown in Fig. 5e and f. The displayed FAP level of 5 % was determined for each LSP individually with respect to the varying length of the time interval (when end points are missing) and the number of data points inside these time intervals. Additionally, the small decrease in N_i due to the data gaps was considered (see Sect. 3.2). The change in the period is still captured very well. In the case of noise and data gaps, the normalised power reduces to a value of about 0.8 as a part of the variance can be explained by the contribution of the noise (ratio 4 to 1 for signal to noise). The amplitude shows some fluctuations, but these fluctuations go around a value of 1 K. Additionally, the noisy behaviour at smaller periods is much better visible for the amplitudes compared to the powers, because the square root of the powers enters the calculation of the amplitudes (compare Eq. 3) and therefore differences with respect to the maximum amplitude get smaller. In total, the results clearly capture the main features of the time series with respect to period, amplitude, and explained variance.

https://www.atmos-meas-tech.net/13/467/2020/amt-13-467-2020-f05

Figure 5(a) Time series of a periodic signal with increasing period. The upper panel shows the signal, the middle panel the noise, and the lower panel the sum of both. (b, c) Results for the normalised power and amplitude. The results are displayed at the centre day of the corresponding time window. The length of the time window was 60 d. The white contours mark the significant results. (d–f) Same as for (a–c) with additional noise added to the time series and data gaps.

Download

https://www.atmos-meas-tech.net/13/467/2020/amt-13-467-2020-f06

Figure 6(a) Time series of a periodic signal with varying amplitude and additional noise and data gaps. The upper panel shows the signal, the middle panel the noise and the lower panel the sum of both. (b, c) Results for the normalised power and amplitude. The results are displayed at the centre day of the corresponding time window. The length of the time window was 60 d. The white contours mark the significant results. (d) Time series of a periodic signal with increasing amplitude plus a periodic signal with varying amplitude and additional noise and data gaps. The upper two panels show the two signals, the third panel the noise, and the lower panel the sum of all. (e, f) Same as for (b, c).

Download

We additionally present two further examples. The time series and the results of the analyses are shown in Fig. 6. The first time series is composed of a periodic signal with a period of 25 d, and an amplitude that varies between 0 and 1 K (Fig. 6a blue curve in upper panel), and additional noise (Fig. 6a green curve in middle panel). The standard deviation of the noise was again 0.5 times the standard deviation of the signal and about 30 % of the data points have been removed. The complete time series is shown as a black curve in the lower panel of Fig. 6a. The results for the normalised power and the amplitude are shown in Fig. 6b and c, respectively. The normalised power shows an increasing value to the centre of the complete time interval. This behaviour is caused by the contribution of the noise to the total time series, which is much larger when the amplitude is small and decreases with increasing amplitude of the signal. The result for the amplitude nicely reflects the increase in the amplitude to the centre and the following decrease to the end of the time series. As the variation of the amplitude occurs on a smaller timescale than the chosen time interval for the analysis some kind of averaging occurs. Thus, the theoretical maximum of 1.0 K is not reached and the maximum value that is observed is about 0.9 K. In total, the main features of the signal are captured very well by the analysis and the correct period and the variation of the amplitude with time are detected. The last example shows the sum of the two former ones. Thus, the complete time series (Fig. 6d black curve in lower panel) is composed of a sinusoid with an amplitude of 1 K and an increasing period (Fig. 6d blue curve in upper panel), a periodic signal with a period of 25 d and an amplitude that varies between 0 and 1 K (Fig. 6d red curve in second panel), and noise (Fig. 6d green curve in third panel). The standard deviation of the noise and the number of data gaps are the same as before. The results for the normalised power and amplitude are presented in Fig. 6e and f, respectively. The first signal can significantly be detected during the whole time and the increase in the period from about 8 to 16 d is captured very well. As the amplitude of the second signal increases to the centre of the complete time interval, this signal can only be significantly detected in the middle of the complete time interval. The normalised power reflects the different contributions of the two signals to the complete time series very well. In the middle of the complete time series, each single signal contributes to almost the same amount, as the amplitude is about 1 K in both cases. The remaining part of the total variance can be explained by the noise (variance of noise is 0.25 times variance of sum of signals). At the beginning and the end, only the first signal and additionally the noise contribute to the complete time series. The results for the amplitude also show the main features of the two signals. For the first signal, the amplitude stays at around 1 K during the whole time and the amplitude modulation of the second signal is also captured. Compared to the former example, the result for the amplitude is noisier because of the larger absolute noise in the last example.

In summary, the applied method is able to detect periodic signals that vary with time, i.e. the amplitude or the period changes with time. In cases where changes occur on much smaller timescales than the used time window, the results show some kind of averaging. Then the maximum values of the amplitude or the explained variance cannot be obtained and a mean value in the analysed time window is derived. The method is also very useful when noise is added to the time series and additionally data gaps are introduced. Although about 30 % of the data points have been removed, the results are very good and still reflect the behaviour of the signals. Thus, the presented method is well suited to analyse time-varying periodicities even in the case of unequally spaced time series.

4.2 Measurement data

The OH^* temperatures are derived from measurements by a GRIPS (GRound-based Infrared P-branch Spectrometer) instrument operated in Wuppertal (51^∘ N, 7^∘ E; Germany). This GRIPS instrument measures three emission lines of the OH^*(3,1) band, the P1(2), P1(3), and P1(4) lines. The relative intensities of these lines are used to derive rotational temperatures (Bittner et al., 2000, and references therein). The OH layer from which the emissions originate is located in the mesopause region. The mean altitude is about 87 km and the layer has a full width at half maximum (FWHM) of about 9 km (e.g. Baker and Stair Jr., 1998; Oberheide et al., 2006). Measurements are carried out every night, except for nights with bad weather conditions. The OH^* temperatures have been continuously observed from Wuppertal since mid-1987 and a GRIPS instrument is still in operation to continue the observations. Until mid-2011 the measurements have been carried out by the GRIPS-II instrument (see Bittner et al., 2000, 2002, for an instrument description), and after that the GRIPS-N instrument (follow-up of GRIPS-II) is used to continue the observations (Kalicinsky et al., 2016).

Figure 7 shows the nightly mean OH^* temperatures for the year 1989 as an example. This year was chosen because Bittner et al. (2000) analysed the same year with a different technique (wavelet transform); thus, the results of our approach can be compared to their results. The temperatures show the typical seasonal behaviour with a temperature minimum in summer and a maximum in winter. This behaviour can be described with three main components: an annual, a semi-annual, and a ter-annual cycle (Bittner et al., 2000). The red curve in the figure shows a least squares fit to the data that considers these three components. Such fits are typically used to determine the annual average OH^* temperatures since a simple arithmetic mean is not advisable because of the data gaps (e.g. Bittner et al., 2002; Offermann et al., 2010; Perminov et al., 2014; Kalicinsky et al., 2016). The lower panel of Fig. 7 shows the residual temperatures, i.e. the OH^* temperatures minus the determined fit curve. Bittner et al. (2000) already showed that such residual temperatures include statistically significant periodic fluctuations. We now analyse the residual temperatures with respect to such fluctuations using the moving LSP approach.

https://www.atmos-meas-tech.net/13/467/2020/amt-13-467-2020-f07

Figure 7Nightly mean OH^* temperatures observed from Wuppertal in the year 1989. The red curve shows the fit of the seasonal cycle including an annual, semi-annual, and ter-annual components. The residual temperatures (measurements minus fit) are shown in the lower panel.

Download

The results for the normalised power and the amplitude are shown in Fig. 8. Different events with significant periodic fluctuations can be detected when using the moving LSP approach. The largest event is detected at the beginning of the year. The determined period is about 40 d and the amplitude 6 to 7 K. This behaviour can also be seen in the residual temperatures just by eye (compare Fig. 7). It seems that the fluctuations continue with a slightly larger period and smaller amplitude, but the result cannot be judged as significant after a centre day of the interval of about 70 d. As can be seen in the residual temperatures, the number of observations between day 75 and 125 is very low and a lot of data gaps are present. The FAP levels for time windows including a large number of data gaps increase then and thus the results are not significant, although it is likely that the signal is still there and real. Additionally, the data gaps are responsible for the vertical structure that can be clearly observed in the amplitudes in this time region, because the gaps interrupt the continuity. Around a centre day of 250, a second significant result for a fluctuation with a period of about 50 d is detected, but the amplitude is smaller with 4 to 5 K. At the end of the year additional significant events with smaller periods of about 10 and 16 d can be seen. All of these significant fluctuations agree well with the findings of Bittner et al. (2000), where the authors analysed the same observations using a wavelet transform and assimilation technique based on the maximum entropy method to get rid of the data gaps. Our new method now enables a safe detection of such significant fluctuations without the need for processing the data before the analysis.

https://www.atmos-meas-tech.net/13/467/2020/amt-13-467-2020-f08

Figure 8Results for the normalised power and amplitude for the analysis of the temperature residual of the GRIPS observations in 1989. The results are displayed at the centre day of the corresponding time window. The length of the time window was 60 d. The white contours mark the significant results.

Download

5 Summary and conclusions

We present an approach to analyse time series with unequal spacing with respect to significant period fluctuations. The approach is also able to derive time-resolved information on the periods and amplitudes of the detected fluctuations. It is based on the classical Lomb–Scargle periodogram (LSP), a method that can handle unequally spaced time series. Additionally, it uses the idea of a moving window to enable the determination of time-resolved periods and amplitudes. The significance of the results is analysed with the typically used false alarm probability (FAP). As the determination of the FAP levels needs many simulations, we derived the dependencies of the FAP levels on the length of the analysed time interval T, the frequency range Δf, and the number of data gaps to find a fast and easy way to calculate the FAP levels in the used parameter range. Thus, we can avoid a large number of simulations. In the analysed parameter range, the number of independent frequencies N_i shows a linear dependency on the length of the time interval T, because the peak width is inversely proportional to T. Furthermore, the slope of the line that describes this dependency is different for different frequency ranges, where a smaller frequency range Δf reduces the slope. We used these two relationships to quickly calculate the FAP levels. The number of data gaps has only a very minor effect, because the peak width depends on the length of the time interval and not on the number of data points.

The approach was tested with different artificially generated time series. These time series include variations of the period and amplitude with time, and, additionally, noise is added and data gaps have been introduced. In all cases, the approach shows very good results and thus the approach is a suitable method for the time-resolved detection of periodic fluctuations, even in the case of unequal spacing. Finally, we analysed the nightly mean OH^* temperatures that have been observed from Wuppertal (51^∘ N, 7^∘ E; Germany) in the year 1989. The results show several significant events with fluctuations that have periods in the range between 10 and 50 d and amplitudes between 3 and 7 K. These significant results agree very well with the results of a former study carried out by Bittner et al. (2000) without the need for processing the data before the analysis.

Data availability

The nightly mean OH^* temperatures can be obtained by request to the corresponding author or to Peter Knieling (knieling@uni-wuppertal.de).

Author contributions

CK conceptualised the method. CK and RR performed the simulations and did the analyses under intensive discussion with RK. PK provided the OH^* data. The article was written by CK with contributions from all coauthors.

Competing interests

The authors declare that they have no conflict of interest.

Acknowledgements

We thank the two reviewers for their useful comments that helped to improve the paper.

Financial support

This research has been supported by the Bundesministerium für Bildung und Forschung (ROMIC project MALODY (grant no. 01LG1207A)).

Review statement

This paper was edited by Lars Hoffmann and reviewed by two anonymous referees.

References

Baker, D. J. and Stair Jr., A. T.: Rocket measurements of the altitude distributions of the hydroxyl airglow, Phys. Scripta, 37, 611, https://doi.org/10.1088/0031-8949/37/4/021, 1998. a

Bittner, M., Offermann, D., and Graef, H. H.: Mesopause temperature variability above a midlatitude station in Europe, J. Geophys. Res., 105, 2045–2058, https://doi.org/10.1029/1999JD900307, 2000. a, b, c, d, e, f, g, h, i, j

Bittner, M., Offermann, D., Graef, H. H., Donner, M., and Hamilton, K.: An 18-year time series of OH^* rotational temperatures and middle atmosphere decadal variations, J. Atmos. Sol.-Terr. Phy., 64, 1147–1166, https://doi.org/10.1016/S1364-6826(02)00065-2, 2002. a, b

Cumming, A., Marcy, G. W., and Butler, R. P.: The lick planet search: detectability and mass thresholds, Astrophys. J., 526, 890–915, https://doi.org/10.1086/308020, 1999. a, b, c, d, e, f, g, h

Das, U. and Sinha, H. S. S.: Long‐term variations in oxygen green line emission over Kiso, Japan, from ground photometric observations using continuous wavelet transform, J. Geophys. Res., 113, D19115, https://doi.org/10.1029/2007JD009516, 2008. a, b, c

Egito, F., Buriti, R. A., Fragoso Medeiros, A., and Takahashi, H.: Ultrafast Kelvin waves in the MLT airglow and wind, and their interaction with the atmospheric tides, Ann. Geophys., 36, 231–241, https://doi.org/10.5194/angeo-36-231-2018, 2018. a, b

Espy, P. J., Stegman, J., and Witt, G.: Interannual variations of the quasi‐16‐day oscillation in the polar summer mesospheric temperature, J. Geophys. Res., 102, 1983–1990, https://doi.org/10.1029/96JD02717, 1997. a, b, c

Franzen, C., Espy, P. J., Hibbins, R. E., and Djupvik, A. A.: Observation of quasi‐periodic structures in the hydroxyl airglow on scales below 100 m, J. Geophys. Res.-Atmos., 123, 10935–10942, https://doi.org/10.1029/2018JD028732, 2018. a

Gao, H., Xu, J., and Wu, Q.: Seasonal and QBO variations in the OH nightglow emission observed by TIMED/SABER, J. Geophys. Res., 115, A06313, https://doi.org/10.1029/2009JA014641, 2010. a

Horne, J. H. and Baliunas, S. L.: A prescription for period analysis of unevenly sampled time series, Astrophys. J., 302, 757–763, 1986. a, b, c

Kalicinsky, C., Knieling, P., Koppmann, R., Offermann, D., Steinbrecht, W., and Wintel, J.: Long-term dynamics of OH^* temperatures over central Europe: trends and solar correlations, Atmos. Chem. Phys., 16, 15033–15047, https://doi.org/10.5194/acp-16-15033-2016, 2016. a, b

Lomb, N. R.: Least-squares frequency analysis of unequally spaced data, Astrophys. Space Sci., 39, 447–462, 1976. a, b

Luo, Y., Manson, A. H., Meek, C. E., Meyer, C. K., and Forbes, J. F.: The quasi 16-day oscillations in the mesosphere and lower thermosphere at Saskatoon (52^∘ N, 107^∘ W), 1980–1996, J. Geophys. Res., 105, 2125–2138, https://doi.org/10.1029/1999JD900979, 2000. a

Nyassor, P. K., Buriti, R. A., Paulino, I., Medeiros, A. F., Takahashi, H., Wrasse, C. M., and Gobbi, D.: Determination of gravity wave parameters in the airglow combining photometer and imager data, Ann. Geophys., 36, 705–715, https://doi.org/10.5194/angeo-36-705-2018, 2018. a, b, c

Oberheide, J., Offermann, D., Russell III, J. M., and Mlynczak, M. G.: Intercomparison of kinetic temperature from 15 µm CO₂ limb emissions and OH^*(3,1) rotational temperature in nearly coincident air masses: SABER, GRIPS, Geophys. Res. Lett., 33, L14811, https://doi.org/10.1029/2006GL026439, 2006. a

Offermann, D., Hoffmann, P., Knieling, P., Koppmann, R., Oberheide, J., and Steinbrecht, W.: Long-term trend and solar cycle variations of mesospheric temperature and dynamics, J. Geophys. Res., 115, D18127, https://doi.org/10.1029/2009JD013363, 2010. a

Perminov, V. I., Semenov, A. I., Medvedeva, I. V., and Zheleznov, Yu. A.: Variability of mesopause temperature from the hydroxyl airglow observations over mid-latitudinal sites, Zvenigorod and Tory, Russia, Adv. Space Res., 54, 2511–2517, https://doi.org/10.1016/j.asr.2014.01.027, 2014. a

Reid, I. M., Spargo, A. J., and Woithe, J. M.: Seasonal variations of the nighttime O(¹S) and OH (8‐3) airglow intensity at Adelaide, Australia, J. Geophys. Res.-Atmos., 119, 6991–7013, https://doi.org/10.1002/2013JD020906, 2014. a, b, c, d, e, f

Scargle, J. D.: Studies in astronomical time series analysis. II. Statistical aspects of spectral analysis of unevenly spaced data, Astrophys. J., 263, 835–853, 1982. a, b, c, d, e

Takahashi, H., Buriti, R. A., Gobbi, D., and Batista, P. P.: Equatorial planetary wave signatures observed in mesospheric airglow emissions, J. Atmos. Sol.-Terr. Phy., 64, 1263–1272, https://doi.org/10.1016/S1364-6826(02)00040-8, 2002. a

Takahashi, H., Shiokawa, K., Egito, E., Murayama, Y., Kawamura, S., and Wrasse, C. M.: Planetary wave induced wind and airglow oscillations in the middle latitude MLT region, J. Atmos. Sol.-Terr. Phy., 98, 97–104, https://doi.org/10.1016/j.jastp.2013.03.014, 2013. a

Smith, S. W.: The Scientist and Engineer's Guide to Digital Signal Processing, California Technical Publishing, San Diego, California, USA, 1997. a

Schwarzenberg-Czerny, A.: The distribution of empirical periodograms: Lomb–Scargle and PDM spectra, Mon. Not. R. Astron. Soc., 301, 831–840, https://doi.org/10.1111/j.1365-8711.1998.02086.x, 1998. a

Yoshida, S., Tsuda, T., Shimizu, A., and Nakamura, T.: Seasonal variations of 3.0∼3.8-day ultra-fast Kelvin waves observed with a meteor wind radar and radiosonde in Indonesia, Earth Planets Space, 51, 675–684, https://doi.org/10.1186/BF03353225, 1999. a

Zechmeister, M. and Kürster, M.: The generalised Lomb–Scargle periodogram – A new formalism for the floating-mean and Keplerian periodograms, Astron. Astrophys., 496, 577–584, https://doi.org/10.1051/0004-6361:200811296, 2009. a, b, c, d, e, f, g

Articles

Short summary

This study presents an approach to analyse unequally spaced time series of OH* temperatures with respect to time-varying periodic fluctuations. The approach is based on the classical Lomb–Scargle periodogram and, additionally, the idea of a moving window is used. Furthermore, a fast and easy way to analyse the significance of the results is presented. The general performance of the approach is tested with artificially generated time series and results for real observations are presented.

Determination of time-varying periodicities in unequally spaced time series of OH* temperatures using a moving Lomb–Scargle periodogram and a fast calculation of the false alarm probabilities

2.1 Classical Lomb–Scargle periodogram

2.2 Moving LSP

2.3 Normalisation of the LSP

3.1 False alarm probability

3.2 Dependency of Ni and FAP

4.1 Artificial data

4.2 Measurement data

3.2 Dependency of N_i and FAP