Cubic splines with equidistant spline sampling points are a common method in atmospheric science, used for the approximation of background conditions by means of filtering superimposed fluctuations from a data series. What is defined as background or superimposed fluctuation depends on the specific research question. The latter also determines whether the spline or the residuals – the subtraction of the spline from the original time series – are further analysed.

Based on test data sets, we show that the quality of approximation of the background state does not increase continuously with an increasing number of spline sampling points and/or decreasing distance between two spline sampling points. Splines can generate considerable artificial oscillations in the background and the residuals.

We introduce a repeating spline approach which significantly reduces this phenomenon. We apply it not only to the test data but also to TIMED-SABER temperature data and choose the distance between two spline sampling points such that the approach is sensitive to a large spectrum of gravity waves.

It is essential for the analysis of atmospheric wave signatures like gravity waves that these fluctuations are properly separated from the background. Therefore, particular attention must be paid to this step during data analysis. Splines are a common method in atmospheric science for the approximation of atmospheric background conditions. According to the Nyquist theorem, the shortest wavelength or period which can be resolved by the spline is twice the sampling point distance. Depending on the field of interest, either the smoothed data series or the residuals – the original time series minus the spline – are further analysed (see, for example, the work of Kramer et al., 2016; Baumgarten et al., 2015; Zhang et al., 2012; Wüst and Bittner, 2011, 2008; Young et al., 1997; Eckermann et al., 1995).

Algorithms for the calculation of splines are implemented in many programming languages and in various code packages, making them easy to use. Nevertheless, spline approximations sometimes need to be handled with care when it comes to physical interpretation.

Mean squared temperature residuals for the years 2010 to 2014
(colour-coded). They are derived from TIMED-SABER, data version 2.0 by using
a cubic spline routine with equidistant sampling points for detrending. The
distance between two spline sampling points is 10 km. All vertical SABER
temperature profiles which were retrieved between 44 and
48

Figure 1 explains our motivation for the work presented below. It shows the
squared temperature residuals averaged over 1 year for the years
2010–2014 versus height between 44 and 48

Since we are not aware of any physical reason for this oscillation, we formulate the hypothesis that this is an artefact of the analysis. In order to avoid or at least reduce such problems, here we propose a repeating variation of the cubic spline approach, which we explain in Sect. 2. In Sect. 3, we apply the original and the repeating approach to test data sets. The results are discussed in Sect. 4. A brief summary is given in Sect. 5.

The approach we investigate here relies on cubic splines with equidistant sampling points. Since spline theory is well established, we do not go into much detail here. The algorithm we use is based on Lawson and Hanson (1974).

The first step in fitting a spline function to a data series on an interval [a, b] is choosing the number of spline sampling points (also called knots). These points divide the interval for which the spline is calculated into subintervals of equal length. For each subinterval a third-order polynomial needs to be defined, which means that its four coefficients have to be determined. At the spline sampling points, not only the function values but also the first and second derivatives of the two adjacent polynomials need to be equal. The optimal set of coefficients is calculated according to a least squares approach in which the sum of the squared differences between the data series and the spline is minimized.
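The least squares step described above can be illustrated with a short Python sketch. It is not the routine of Lawson and Hanson (1974); it is a minimal illustration that assumes NumPy is available, that the length of the data series is an integer multiple of the knot spacing, and that a truncated power basis is acceptable (this basis makes the continuity of value, first and second derivative explicit, but it becomes ill-conditioned for many knots). The function name `fit_cubic_spline` is hypothetical.

```python
import numpy as np

def fit_cubic_spline(x, y, knot_spacing):
    """Least-squares cubic spline with equidistant knots (illustrative sketch).

    Basis: 1, u, u^2, u^3 and (u - t_j)_+^3 for every interior knot t_j.
    Each basis function has continuous value, first and second derivative,
    so any linear combination satisfies the matching conditions at the knots.
    ASSUMPTION: x[-1] - x[0] is an integer multiple of knot_spacing.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    span = x[-1] - x[0]
    u = (x - x[0]) / span                      # rescale to [0, 1] for conditioning
    n_sub = int(round(span / knot_spacing))    # number of equal subintervals
    interior = np.arange(1, n_sub) / n_sub     # interior knots in rescaled units
    columns = [u**p for p in range(4)]
    columns += [np.clip(u - t, 0.0, None) ** 3 for t in interior]
    design = np.column_stack(columns)
    # Minimise sum((design @ coef - y)**2) over the coefficients
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef
```

Calling `fit_cubic_spline(z, temperature, 10.0)` on a height grid `z` in km would return the fitted spline values at the data points; subtracting them from `temperature` gives the residuals. For production use, a B-spline basis is numerically preferable.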

As mentioned above, the number of spline sampling points (and the length of
the data series) determines the sensitivity of the spline to specific
wavelengths. Since the length of the data series must be an integer multiple
of the distance between two spline sampling points, only certain distances
between two consecutive spline sampling points can be chosen if the whole
data series is approximated. We would like to operate the spline algorithm
by providing the shortest wavelength which shall be resolved
by the spline. That means that we have to cut the upper part of the profile
in each case. This is only possible for data sets of sufficient length
such as the SABER temperature profiles, which we used for this purpose. In detail, our
spline algorithm works as follows. The scheme covers the repeating as well
as the non-repeating algorithm.

Step 1: Provision of shortest wavelength

We provide the algorithm with the shortest wavelength which shall be resolved by the spline (in the following denoted by lim). It is equal to twice the distance between two spline sampling points; therefore the distance between two spline sampling points is equal to lim/2.

Step 2: Determination of

The minimal

Step 3: Calculation of spline approximation

The spline approximation is calculated based on Lawson and Hanson (1974). If the length of the data series is not equal to an integer multiple of lim/2, the surplus part at the end of the data series is excluded from this step. For the non-repeating approach, the spline algorithm stops here.

Step 4: Iteration of starting point

The first point of the data series is removed and steps 2 and 3 are repeated.
If the starting point is equal to the original minimal

Step 5: Calculation of the final spline

The mean of all splines derived before is calculated. This is the final (repeating) spline.

For the repeating approach, the length of the data series is not the same in
each iteration since data at the beginning and the end of the data series
are not necessarily part of each iteration: at the beginning of the data
series, this holds for all

For the non-repeating approach, data are cut only at the end of the data series if the length of the data series is not equal to an integer multiple of lim/2.
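The scheme above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used for the figures: it assumes NumPy, an equidistant height grid, a B-spline basis built with the Cox-de Boor recursion in place of the Lawson and Hanson (1974) routine, an iteration that stops once the starting point has been shifted by one full knot spacing, and a final mean taken over those iterations that cover a given data point. All function names are hypothetical.

```python
import numpy as np

def bspline_design(x, breaks, degree=3):
    """B-spline design matrix (Cox-de Boor recursion) on clamped breakpoints."""
    t = np.r_[[breaks[0]] * degree, breaks, [breaks[-1]] * degree]
    x = np.clip(x, t[0], t[-1])
    n0 = t.size - 1
    B = np.zeros((x.size, n0))
    for i in range(n0):                        # degree 0: interval indicators
        B[:, i] = (t[i] <= x) & (x < t[i + 1])
    B[x == t[-1], n0 - degree - 1] = 1.0       # right end belongs to last interval
    for d in range(1, degree + 1):
        nxt = np.zeros((x.size, n0 - d))
        for i in range(n0 - d):
            left, right = t[i + d] - t[i], t[i + d + 1] - t[i + 1]
            if left > 0:
                nxt[:, i] += (x - t[i]) / left * B[:, i]
            if right > 0:
                nxt[:, i] += (t[i + d + 1] - x) / right * B[:, i + 1]
        B = nxt
    return B

def spline_fit(z, y, spacing):
    """Non-repeating fit; the surplus part at the upper end is returned as NaN."""
    n_sub = int(np.floor((z[-1] - z[0]) / spacing + 1e-9))
    breaks = z[0] + spacing * np.arange(n_sub + 1)
    used = z <= breaks[-1] + 1e-9
    A = bspline_design(z[used], breaks)
    coef, *_ = np.linalg.lstsq(A, y[used], rcond=None)
    out = np.full(z.shape, np.nan)
    out[used] = A @ coef
    return out

def repeating_spline(z, y, lim):
    """Mean of splines whose starting point is shifted sample by sample."""
    spacing = lim / 2.0                        # knot distance is lim/2
    dz = z[1] - z[0]
    # ASSUMPTION: stop once the start has moved by one full knot spacing
    n_shift = int(np.ceil(spacing / dz - 1e-9))
    fits = np.full((n_shift, z.size), np.nan)
    for k in range(n_shift):
        fits[k, k:] = spline_fit(z[k:], y[k:], spacing)
    # ASSUMPTION: average over the iterations that cover each data point
    count = np.sum(~np.isnan(fits), axis=0)
    total = np.nansum(fits, axis=0)
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)
```

With `z` in km and `lim = 3.0`, `spline_fit(z, y, lim / 2)` corresponds to the non-repeating approach and `repeating_spline(z, y, lim)` to the repeating one. In this sketch, a sine with a wavelength of 3 km and phase 0 yields a single spline close to zero away from the ends of the interval, in line with the description of Fig. 2a/d below.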

The purpose of this section is to help to understand the general behaviour of splines if the data set contains waves with a wavelength of double the sampling point distance, which may happen in the general case of an unknown mixture of waves.

We generate a basic example using an artificial sine with a vertical wavelength of 3 km, a phase of zero and an amplitude of one. The function is sampled every 375 m (that means at its zero crossings, at its extrema and once in between the zero crossing and the next extremum/the extremum and the next zero crossing).
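For readers who wish to reproduce this test series, a minimal Python sketch is given here. The wavelength, amplitude, phase and sampling distance are taken from the text; the height range of 15 to 100 km corresponds to the range shown in Fig. 2a. The function name `make_test_sine` and the convention that the phase is added to the argument 2πz/λ are assumptions made for this illustration.

```python
import math

WAVELENGTH_KM = 3.0   # vertical wavelength from the text
STEP_KM = 0.375       # sampling distance from the text

def make_test_sine(z_min=15.0, z_max=100.0, phase=0.0, amplitude=1.0):
    """Return heights (km) and values of the sampled test sine.

    ASSUMPTION: the phase is added to 2*pi*z/wavelength.
    """
    n = int((z_max - z_min) / STEP_KM + 1e-9) + 1
    heights = [z_min + i * STEP_KM for i in range(n)]
    values = [amplitude * math.sin(2.0 * math.pi * z / WAVELENGTH_KM + phase)
              for z in heights]
    return heights, values
```

With eight samples per wavelength, consecutive values run through a zero crossing, an intermediate point, an extremum, an intermediate point and the next zero crossing, as described above.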

The values for the sampling rate and the vertical wavelength are set arbitrarily. However, the spatial resolution of 375 m is motivated by the spatial resolution of TIMED-SABER, an instrument which is commonly used for the investigation of gravity waves (e.g. Zhang et al., 2012; Ern et al., 2011; Wright et al., 2011; Krebsbach and Preusse, 2007) and which also provided the temperature profiles we used in Fig. 1.

Figure 2a shows the test data series (dotted line) between 15 and 100 km height. This large height range is chosen since it facilitates the demonstration of our results. A non-repeating spline with a distance of 1.5 km between two spline sampling points is fitted (solid line). According to the Nyquist theorem, the chosen distance between two spline sampling points is small enough to resolve the oscillation in our test data. In parts (b) and (c) of Fig. 2, splines with distances of 1.6 and 1.4 km between two spline sampling points, respectively, are calculated. Parts (d) to (f) of Fig. 2 focus on the height range of 15 to 50 km of Fig. 2a to c: here, the height coordinates of the spline sampling points are plotted additionally (dashed-dotted lines). The asterisks mark the sampling points of the original sine. The spline fit in Fig. 2a/d differs significantly from the spline fits in Fig. 2b/e and 2c/f: apart from a slight oscillation at the beginning and end of the height interval, the spline is equal to zero in Fig. 2a/d. The spline approximations plotted in Fig. 2b and c show a beat-like structure across the whole height range.

This figure shows the approximation of a cubic spline using
different numbers of spline sampling points.

In order to give an overview of the quality of the fit beyond the
examples shown in Fig. 2, the test data set is
approximated by cubic splines with varying numbers of spline sampling
points. The squared differences between the spline and the test data are
summed up between 20 and 40 km (this height interval is chosen in order
to be consistent with Fig. 7 later). We call this value the sum of squared
residuals; in this case it is equal to the approximation error. It does not
decrease continuously with an increasing number of spline sampling points and/or decreasing distance between two spline sampling points but is
characterized by a superimposed oscillation which reaches its maximum
for a distance of ca. 1.5 km between two spline sampling points (Fig. 3,
solid line). When changing the phase of the test data set to

The analysis described above is repeated, but the phase of the oscillation
varies between 0 and 2

This figure shows the differences between the spline and the
approximated test data (solid line: phase of 0, dashed line: phase of

This example directly motivates the application of the repeating spline approach to the same test data set (see Fig. 5a–f, which can be directly compared to Fig. 2a–f: the black line represents the final spline approximation and the different colours refer to the spline approximations during the different iteration steps). In this case, the sum of squared residuals depends much less on the distance between two spline sampling points (Fig. 6a) and on the phase of the test data set (Fig. 6b). Only for a distance of 1.6 km between two spline sampling points is a slight phase dependence still visible (Fig. 6b).

Dependence of the sum of squared residuals on the phase of the wave with a wavelength of 3.0 km and a distance of 1.4 km (short dashes), 1.5 km (solid line) and 1.6 km (long dashes) between two spline sampling points.

Until now, we showed only test data which are not superimposed on a
larger-scale variation like the atmospheric temperature background. Now,
three sinusoidals with vertical wavelengths of 3, 5 and 13 km, phase 0,

Here, the results based on the repeating spline approach are
shown. The different colours refer to the different spline approximations
(to keep it as clear as possible, we only show the first four iterations, a
fifth one exists for case

Part

In Sect. 3, we showed that the quality of a spline which approximates the
background, and its ability to filter for a specific part of the wave
spectrum, vary:

with the number of spline sampling points, and

with the exact position (height coordinate) of the spline sampling points.

When the distance between two spline sampling points matches exactly half
the wavelength of the test data, the approximation is worst for a phase of 0
and

Furthermore, we showed that if the distance between two spline sampling points is only slightly larger or smaller than half the wavelength present in the data series and if enough wave trains are present (which might not be the case in reality), the non-repeating spline resembles a beat (see Fig. 2b and c; an explanation is given in Appendix B). The subtraction of such a beat will lead to an artificial oscillation in the residuals with a periodically increasing and decreasing amplitude reaching ca. 70–80 % of the original amplitude at maximum (Fig. 2e and f). This oscillation must not be interpreted as a gravity wave of varying amplitude, for example, and the described effect has to be taken into account when analysing wavelengths similar to the doubled distance between two spline sampling points.

For our case studies, we used a constant and a realistic CIRA-based temperature background profile. For both background profiles, we showed that the sum of squared residuals decreases much more smoothly with an increasing number of spline sampling points for the repeating approach compared to the non-repeating one (compare Fig. 3 to Fig. 6a) and the amplitude of the beat-like structure is reduced.

However, the motivation for this work was – as already mentioned – the
results shown in Fig. 1 which are characterized by a strong superimposed
oscillation with a wavelength of approximately 10 km for which we do not
have a physical explanation. Figure 8a now depicts the mean squared
residuals after the application of the repeating spline to the same data
set; Fig. 8b focuses on the year 2014 (the dashed line is based on the
application of the repeating spline, the solid line refers to the
non-repeating spline). This year is chosen arbitrarily and allows the direct
comparison of the repeating and non-repeating approach. The amplitude
of the superimposed oscillation is reduced significantly but the oscillation
can still be observed. This supports our hypothesis that the strong
superimposed oscillation described in Fig. 1 is an artefact of the
non-repeating spline detrending procedure. Furthermore, it now becomes obvious
that gravity wave activity increases less with altitude between
approximately 45 and 60 km height compared to the height range below and
above. This is in accordance with the literature (e.g. Mzé et al., 2014;
Offermann et al., 2009). For most heights, the mean squared residuals are
smaller for the repeating approach than for the non-repeating one. At 38 km
height, for example, the difference reaches ca. 2.5 K

In order to give a comprehensive comparison of the repeating and non-repeating spline algorithm, we also calculate the mean (non-squared) residuals. In this case, the results look very similar. In both cases, they again show an oscillation with a vertical wavelength of 10–20 km (Fig. 8c for the non-repeating approach, Fig. 8d for the repeating spline approach). We can explain this in the following way: when calculating the mean (non-squared) residuals and the mean squared residuals at a specific height, one refers to two different parameters of the distribution of residuals at that specific height. While the mean (non-squared) residuals estimate the mean of the distribution, the mean squared residuals refer to the variance of the distribution. We conclude that at a defined height, the repeating approach changes the mean of the distribution of the residuals only slightly, but it reduces its spread significantly. For individual profiles, the approximation through the repeating approach is therefore less variable on average and can be recommended. The repeating approach can also be recommended if squared residuals are needed for further analysis (e.g. for the calculation of the wave potential energy). If non-squared residuals are to be analysed, it does not make a difference on average which approach is applied; for the individual profile, however, this does not necessarily hold. In this case, only waves with amplitudes larger than 0.5 K in the stratosphere and 1.0 K in the mesosphere (Fig. 8c and d) should be taken seriously.
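The relation between the two quantities can be made explicit with a small Python sketch. For any sample, the mean of the squared values equals the variance plus the square of the mean, so the mean squared residuals follow the variance closely whenever the mean residual is small. The function name `residual_moments` and the example numbers are hypothetical and serve only as an illustration.

```python
from statistics import fmean, pvariance

def residual_moments(residuals):
    """Mean, mean square and population variance of residuals at one height."""
    mean = fmean(residuals)
    mean_square = fmean(r * r for r in residuals)
    variance = pvariance(residuals, mu=mean)
    return mean, mean_square, variance

# Hypothetical residuals (in K) from five profiles at the same height
mean, mean_square, variance = residual_moments([0.5, -1.5, 2.0, -0.5, 1.0])
# mean_square equals variance + mean**2 up to rounding
```

A change in the mean squared residuals at nearly unchanged mean residuals therefore points to a change in the spread of the distribution.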

It is known that the tropo-, strato- and mesopause, where the temperature gradient becomes zero and changes sign, are challenging for approximation methods. The same holds for the beginning and the end of a data series. This becomes evident when a smooth profile like a CIRA temperature profile is detrended with different numbers of spline sampling points (see Fig. 9a–d). In these cases, the residuals show oscillations for both approaches, the repeating and the non-repeating spline, which become smaller with decreasing distance between two spline sampling points. Compared to Fig. 7d, which shows the difference between the approximated background and the real one in the presence of typical gravity wave signatures, and restricted to the height range above 40 km, the strength and the position of the oscillations (in Fig. 9d) change only slightly (see Fig. 10a). That means that the non-optimal approximation of the smooth temperature background, i.e. without any gravity waves, is the most likely reason for the oscillations observed in the mean SABER residuals for both approaches (see Fig. 8c and d).

However, for the non-repeating approach, the strength and position of the oscillation in the residuals (detrended CIRA background) change when another starting height is chosen, while for the repeating approach the oscillation is only slightly shifted in the vertical (Fig. 10b and c). For Fig. 1, the starting height varied mostly in the range of ca. 2 km. A comparison of Figs. 1 and 8c reveals that the height coordinates of the local extrema of the mean residuals correspond approximately to those of the mean squared residuals. The less pronounced dependence of the repeating spline approach on the starting height (Fig. 10b) is therefore the most likely reason for the lower variance of the residuals (as described above).

There exist many methods to approximate, detrend or filter time series (see e.g. Baumgarten et al., 2015, and references therein), and we do not claim that the presented repeating cubic spline is the best method for every purpose and every data series. It is just one possible algorithm which reduces artefacts of the non-repeating cubic spline routine as proposed by Lawson and Hanson (1974) if the data set contains waves with wavelengths of about double the sampling point distance, which is mostly not known in advance. Furthermore, it reduces the dependence on the starting height. However, it comes with enhanced computational effort, which is of special importance when analysing large data sets.

It is essential for the analysis of atmospheric wave signatures like gravity waves that these fluctuations are properly separated from the background. Therefore, particular attention must be paid to this step.

Cubic splines with equidistant sampling points are a common method in atmospheric science for the approximation of superimposed, large-scale structures in data series. The subtraction of the spline from the original time series allows the investigation of the residuals by means of different spectral analysis techniques. However, splines can generate artificial oscillations in the residuals – especially if the background is described by a coarse spline or if the data set contains waves with wavelengths of about double the sampling point distance – which must not be interpreted in terms of gravity waves. The ability of a spline to approximate the background state (and large-scale wave-induced fluctuations) varies not only with the number of spline sampling points but also with their exact position.

Since knowledge about the wavelengths present in the data set is normally not available in advance, this directly motivates the use of a repeating spline which is based on changing starting points. It comes with enhanced computational effort but can be recommended for the approximation/detrending of individual profiles and if squared residuals are needed for further analysis (e.g. for the calculation of the wave potential energy).

The SABER data are available at the SABER home page

The test data sets are superimposed sinusoidal oscillations. Their parameters are given in the manuscript, so they can easily be reproduced.

We would like to thank the TIMED-SABER team for their great work in providing an excellent data set.

We also thank the Bavarian Ministry for the Environment and Consumer Protection for financially supporting our work: Verena Wendt was paid by the Bavarian project BHEA (Project number TLK01U-49580, 2010–2013). The work of Sabine Wüst was subsidized in part by this project.

Last, we thank Julian Schmoeckel, formerly from the University of Augsburg, for helping us to produce the test data sets and the figures.

The article processing charges for this open-access publication were covered by a Research Centre of the Helmholtz Association.

Edited by: Gerd Baumgarten
Reviewed by: two anonymous referees