09 Mar 2021
An algorithm to detect non-background signals in greenhouse gas time series from European tall tower and mountain stations
- 1Laboratoire des Sciences du Climat et de l’Environnement , LSCE/IPSL, CEA-CNRS-UVSQ, Université Paris-Saclay, 91191 Gif-sur-Yvette, France
- 2Empa, Laboratory for Air Pollution/Environmental Technology, CH-8600 Duebendorf, Switzerland
- 3Institute for Atmospheric and Earth System Research/ Physics, University of Helsinki, Helsinki, Finland
- 4Lund University, 22100 Lund, Sweden
- 5Deutscher Wetterdienst, Meteorological Observatory Hohenpeissenberg, 82383 Hohenpeissenberg, Germany
- 6DRD/OPE, Andra, Bure, 55290, France
- 7European Center for Medium-Range Weather Forecasts, Shinfield Park, Reading, UK
Abstract. We present a statistical framework for near real-time signal processing to identify regional signals in CO2 time series recorded at stations which are normally uninfluenced by local processes. A curve-fitting function is first applied to the detrended time series to derive a harmonic describing the annual CO2 cycle. We then combine a polynomial fit to the data with a short-term residual filter to estimate the smoothed cycle and define a seasonally-adjusted noise component, equal to two standard deviations of the smoothed cycle about the annual cycle. Spikes in the smoothed daily data which rise above this 2σ threshold are classified as anomalies. Examining patterns of anomalous behavior across multiple sites allows us to quantify the impacts of synoptic-scale weather events and better understand the regional carbon cycling implications of extreme seasonal occurrences such as droughts.
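The curve-fitting step in the abstract can be illustrated with a least-squares fit of a polynomial plus annual harmonics. This is a minimal sketch, not the CCGCRV code. The name `fit_function` is hypothetical. Three polynomial terms and four harmonics are assumed because they are commonly used CCGCRV defaults; the settings used in the manuscript are not stated on this page.

```python
import numpy as np

def fit_function(t, y, n_poly=3, n_harm=4):
    """Least-squares fit of a polynomial plus annual harmonics (CCGCRV-style sketch).

    t: time in decimal years; y: mole fractions at those times.
    Returns (coefficients, polynomial part, harmonic part) evaluated at t.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    tc = t - t.mean()  # centre time so the polynomial columns stay well conditioned
    cols = [tc ** p for p in range(n_poly)]  # 1, t, t^2, ...
    for k in range(1, n_harm + 1):  # harmonics at k cycles per year
        cols.append(np.sin(2 * np.pi * k * t))
        cols.append(np.cos(2 * np.pi * k * t))
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    poly = A[:, :n_poly] @ coef[:n_poly]
    harm = A[:, n_poly:] @ coef[n_poly:]
    return coef, poly, harm

# Synthetic, noise-free example: 7 years of daily values with a 6 ppm annual cycle
t = 2013 + np.arange(7 * 365) / 365.0
y = 395 + 2.3 * (t - 2013) + 6 * np.sin(2 * np.pi * t)
coef, poly, harm = fit_function(t, y)
residuals = y - poly - harm  # these residuals would go on to the short-term filter
annual_amplitude = np.hypot(coef[3], coef[4])  # first-harmonic amplitude
```

With real data, the residuals would carry the synoptic and regional variability that the later smoothing and ±2σ steps examine.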
Alex Resovsky et al.
Status: open (until 04 May 2021)
RC1: 'Comment on amt-2021-16', Anonymous Referee #2, 24 Mar 2021
The manuscript presents an algorithm for detecting anomalous signals in European atmospheric CO2 and CH4 data by decomposing the time series into background (long-term and seasonal), non-background (synoptic/regional) elements, and very localised emissions spikes. The method relies on the CCGCRV decomposition algorithm of Thoning et al. 1989 with additional LOESS smoothing applied with spans of 30 and 90 days to isolate anomalies on synoptic and seasonal timescales respectively.
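The LOESS smoothing described here can be sketched as a local linear regression with tricube weights. In this hypothetical `loess_days` helper, the span is treated as a window width in days (±span/2 around each point). The manuscript's exact LOESS parameterization is not given in this discussion. For example, R's `loess` expresses `span` as a fraction of the data.

```python
import numpy as np

def loess_days(x, y, span_days):
    """Local linear LOESS with tricube weights; span given as a window width in days."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    half = span_days / 2.0
    smoothed = np.empty_like(y)
    for i, x0 in enumerate(x):
        d = np.abs(x - x0) / half
        w = np.where(d < 1, (1 - d ** 3) ** 3, 0.0)  # tricube weights, zero outside window
        sw = np.sqrt(w)
        # Weighted least squares for y ~ a + b * (x - x0); the fitted value at x0 is a
        A = np.column_stack([np.ones_like(x), x - x0]) * sw[:, None]
        coef, *_ = np.linalg.lstsq(A, y * sw, rcond=None)
        smoothed[i] = coef[0]
    return smoothed

# Hypothetical residuals: a slow 365-day signal plus alternating day-to-day noise
days = np.arange(365)
signal = 3 * np.sin(2 * np.pi * days / 365)
resid = signal + 0.5 * (-1.0) ** days
dC30 = loess_days(days, resid, 30)  # synoptic-scale smoothing
dC90 = loess_days(days, resid, 90)  # seasonal-scale smoothing
```

A wider span suppresses more short-lived variability but also smooths over brief events. This trade-off underlies the choice between the 30-day and 90-day settings.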
I think the manuscript warrants more clarity in a few places, as mentioned in the specific comments below. In particular, some of the terms (seasonal, synoptic, etc.) need clearer definitions, together with a demonstration of how they are treated by the framework, as these terms are susceptible to subjective interpretation. I would also recommend that, in the results section of the manuscript, the interpretation of the events should either be strengthened with evidence that precludes a dominant role of atmospheric transport influences, or the language should be softened, as I find a few of the interpretations unsupported by the observational evidence in its current form.
Nevertheless, this manuscript is generally of high quality and will be of value to the atmospheric greenhouse gas community; it therefore justifies publication in AMT, subject to some alterations as suggested below.
Specific comments
I think some additional practical information about the algorithm would be very helpful in the methods. For example, is the algorithm designed only for use with daily data? Can the smoothing spans be easily altered by the user to other values if required? What is the minimum amount of data required to make reasonable fits (e.g. 1 year, 3 years, etc)? Is the algorithm sensitive to end effects?
Top of page 4: it seems a bit limiting and perhaps a bit subjective to assume that synoptic events predominate only in the winter and seasonal perturbations occur only during the growing season. I can think of instances where this might not be the case, such as uncommonly warm winters, and periods of summer storminess/heavy rainfall. Would it not be better to apply both smoothing windows to the whole year and allow the data to determine what kind of event occurs and when, instead of this being pre-defined by the user? It is surely also possible in reality for both types of event (seasonal and synoptic) to occur at the same time, superimposed on each other. How would the algorithm treat such cases?
Top of page 5: is the time window user settable? I think so, but this could be made more explicit. Although afternoon periods are favoured in many analyses, there might be some instances where it is more appropriate to use the full daily period. How does this pre-selection of only 5 hours of data per day (at non-mountain sites) influence the detection of events?
Figure 3: I would recommend adding plots of the deseasonalised anomalies, and somewhere I would also like to see plots of the trends, since the way CCGCRV assigns ‘seasonal’ and ‘trend’ variability depends on the settings and these things are not independent of each other (more variability in the seasonal component usually results in smoother trends, and vice versa). In general, I think the manuscript would be stronger for clearer definition and demonstration of the terms “long-term trend”, “seasonal cycle”, “synoptic”, “localized”, etc., perhaps including some discussion (or by utilizing the figures) to explain the limitations/assumptions associated with each term. E.g. Here, we have assigned variability longer than XX days to the trend component of the decomposition procedure, thus excluding it from the seasonal cycle, etc.
Page 9: again, I am a bit concerned about the assumptions made regarding the length of events, when they occur and what they are caused by. It is also not clear to me why the user would only want to identify BLO-type events in the winter and not also NAO events (2nd paragraph), or only seasonal events in summer, not synoptic events. If it is the case that these sorts of events in specific seasons have been chosen simply as illustrative examples to demonstrate the scope of the algorithm, then I think this needs to be made much clearer to the reader. If this is not the case, I am finding it difficult to see the rationale behind this approach, so I think either more justification is required, or a broader/looser demonstration of the algorithm is needed.
Page 10: is the distinction of localized fluctuations vs SSAs part of the algorithm/framework, or was this done manually afterwards?
Pages 16-17: I would urge caution with the interpretation here. Although the signals observed at some sites appear to agree with the expected signals associated with droughts, heatwaves, etc. a causal link has not been established in this manuscript. It is entirely possible that the signals are predominantly caused by variability in atmospheric transport at the sites and that this just happens to show what you are expecting. Without some sort of wind sector/back trajectory analysis demonstrating that this is not the case, it is very difficult to be conclusive about these signals and I do not think the influence of atmospheric transport should be underestimated, nor neglected from the interpretation of the results. I would recommend some additional analysis and an extra section showing how the impact of variability in atmospheric transport might manifest in the frequency and magnitude of anomalies. If this is not feasible, I think the interpretation on fluxes needs to be significantly scaled back and the potential influence of atmospheric transport brought much more to the fore in this section.
Page 17, lines 304-305. It is difficult to see a trend in the spikes from Figure 9. I think that at some stations the record is not long enough to determine a trend, and at others, such as OPE, there seems to be a reduction in all anomalies, not just negative ones. In addition, if I understand the procedure correctly, you have used a mean representation of the seasonal cycle for each station, but at most CO2 sites there is evidence that the seasonal cycle amplitude is slowly increasing, especially at northern sites. Any trend in anomalies, especially in the longer records, might therefore be partially explained by the exclusion of this increasing trend in amplitude from the seasonal cycle used in the algorithm. I would actually not expect you to see a reliable trend in the anomalies unless interannual variability in atmospheric transport processes has been accounted for. In my experience, atmospheric transport easily masks trends in emissions (natural and anthropogenic) at most atmospheric measurement sites, unless these trends are extremely significant, which is not the case here.
Discussion section: In general, I think that time series decomposition and anomaly detection are analysis techniques that are fraught with complication, and I commend the authors for highlighting many of the difficulties inherent in the process. It seems to me that the method has been designed for use in the ICOS network, or an ICOS-type network. I would suggest that this network/multi-station requirement should be mentioned earlier on in the manuscript, perhaps on page 3, ‘We present here…’.
Conclusion: The first sentence needs to be softened considerably unless additional analyses are done to exclude the possibility that the pattern of anomalies predominantly reflects atmospheric transport variability, as per my previous comment.
Technical corrections
Lines 16-17: I would recommend re-wording this sentence as it is not clear whether you mean the stations are uninfluenced or the regional signals are uninfluenced by the local processes.
Abstract and elsewhere: the terms ‘weather’ and ‘atmospheric transport’ are used somewhat synonymously through the manuscript, but these do mean different things. I would recommend the authors pick one and only use that, except in specific instances where a different meaning is necessary. The same might apply to ‘meteorological’ and ‘climatic’. Also ‘swath’/‘span’.
Line 44 and elsewhere: the usual notation I think is ±2σ, ±3σ, etc.
Line 54: ‘A Gaussian was then defined using…’ is this a typo? I am not familiar with the word Gaussian as a noun. Should it be Gaussian curve/fit/function? Same with ‘Gamma’.
Line 78: at multiple [European] sites?
Line 210: no ‘-‘ after NAO?
Figure 5: it looks to me like the positive SSA in CH4 at HPB at the end of Jan 2019 is caused by a gap, followed immediately by a very transitory positive spike in the data, and is therefore perhaps not an SSA at all? I wonder if a similar thing happens also in CO2 in Fig 4 at GAT in late Oct 2018, only this time the spike (negative) comes first and the data gap is directly after. At GAT it looks like there might be a straight turquoise line interpolating this data gap when perhaps there should not be?
Line 308-309: ‘CCGCRV is ill-suited to handle such gaps…’ I think this requires a citation.
AC1: 'Reply on RC1', Alex Resovsky, 14 Apr 2021
The referee's suggestions are valid and quite helpful, especially those concerning the need for additional explanations to clarify the usage of certain terms, the misleading use of certain terms which are employed interchangeably throughout the manuscript, the need for additional practical information about the algorithm's functionality, the need for additional transparency regarding the calculation of the seasonal cycle and trend components by CCGCRV, and the suggestion to scale back the interpretation of the results with regard to seasonal CO2 anomaly detection. Some specific responses follow:
Last paragraph of the introduction changed to note that the algorithm is designed for use with daily datasets. Sentence #2 of the second paragraph of the methodology section changed to read “In our case, we select two different settings for the short-term filter…” This is meant to impart to the reader that the smoothing span and filter length are user-definable. We have no best estimate for the minimum time series length required to produce a “reasonable” fit. By default, the minimum required length in our study is that of the shortest records in our analysis (4 years at HPB and NOR), since we effectively assume that the results we obtain at these sites are generally as valid/useful as those obtained at sites with longer historical records. Admittedly, it would be nice to have a uniform setting for the historical record length that we use in all cases, but the problem is that many ICOS sites have come online only recently, and using only the past 4 years of data just for the sake of making the analysis uniform would have meant ignoring results prior to 2017 at the other 8 sites. This would mislead our readers as to the breadth of the algorithm’s applications, and moreover, 4 years seems objectively too short for a good fit at the sites which have almost 10 years of usable data.
We have now noted in the first paragraph of section 2.3 that the time period used for the calculations is user-settable (there is a line in the code for this purpose). In addition, there is a paragraph in the discussion section noting that the length of the historical record surely makes a difference for the results. We have added to this paragraph a sentence noting that in the future, as our stations accumulate longer and longer records, a “standard” record length (e.g. 10 years) may be adopted for added consistency. As for end effects, we feel that we have adequately circumvented this problem by calculating our ±2σ envelope according to the procedure described in equation (3), where σd for any day, including the very last day in the record, depends on a sufficiently large residual dataset in the vicinity of that same calendar day over several years of data. Any end effects should thus be negligible.
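Equation (3) is not reproduced in this discussion, so the following is only a sketch of the idea described above. σd is computed from residuals within a window around the same calendar day, pooled across all years. Days whose smoothed residual falls outside ±2σd are flagged. The ±15-day window and the names `daily_sigma` and `flag_anomalies` are hypothetical.

```python
import numpy as np

def daily_sigma(doy, resid, half_window=15):
    """Standard deviation of residuals near each calendar day, pooled across years.

    doy: day of year (1-366) for each observation; resid: smoothed residuals.
    Hypothetical form of sigma_d: the window half-width is an assumption.
    """
    doy = np.asarray(doy)
    resid = np.asarray(resid, dtype=float)
    sigma = np.empty_like(resid)
    for i, d in enumerate(doy):
        dist = np.abs(doy - d)
        dist = np.minimum(dist, 365 - dist)  # wrap so late December neighbours early January
        sigma[i] = np.std(resid[dist <= half_window], ddof=1)
    return sigma

def flag_anomalies(resid, sigma, k=2.0):
    """Return +1 above +k*sigma, -1 below -k*sigma, and 0 inside the envelope."""
    resid = np.asarray(resid, dtype=float)
    return np.where(resid > k * sigma, 1, np.where(resid < -k * sigma, -1, 0))

# Six years of synthetic daily residuals with one injected positive spike
rng = np.random.default_rng(0)
doy = np.tile(np.arange(1, 366), 6)
resid = rng.normal(0.0, 1.0, doy.size)
resid[400] += 10.0
flags = flag_anomalies(resid, daily_sigma(doy, resid))
```

Each σd pools data from every year, so the last day of the record draws on residuals from the whole record, just like any other day. This is the basis of the argument above that end effects should be negligible.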
Top of page 4: While it is true that synoptic events do not occur only in winter and seasonal anomalies are not best characterized as occurring only in summer, dividing the analysis in this way seemed a logical choice. Firstly, as we mention in the discussion section, anomaly patterns appearing to be “seasonal” in length may simply reflect the unusual persistence or frequency of synoptic-scale weather patterns. This might conceivably be the case for unusually warm winters over Europe, where we would expect frequent or persistent positive NAO regimes to dominate. However, when we apply the 90-day smoothing span to the whole year and examine winter periods, we find that such winters appear mostly as flat, black lines on our anomaly graphs. This makes them, by our definition, not very anomalous. The more interesting winters are those featuring frequent cold weather patterns from the east, which do show positive CO2 anomalies at the 90-day bandwidth. However, we feel that these winters are better viewed as just that (dominated by more frequent atmospheric transport patterns emerging from the east) and not as winters with an atypical seasonal suppression of photosynthetic activity. Terrestrial carbon cycle anomalies such as GPP anomalies are events to which we ascribe “seasonal” anomalies, and these do not appear to be the driving force of the excursions that we detect in the wintertime. If they were, we would also expect that warmer winters would coincide with increased GPP, enough so that we might observe negative seasonal-length CO2 anomalies during warm winters. We did not observe any such wintertime 90-day anomalies, at least not with our ±2σ envelope definition.
In the summer, a span of 30 days rendered the figures a bit too cluttered to be able to clearly see the signals we were hoping to detect. Indeed, this is likely due to the fact that synoptic and seasonal signals can occur simultaneously and overlap with one another, as you mention. Our intention in using a 90-day smoothing span for the summer was actually to filter out synoptic signals to the extent possible. Synoptic anomalies certainly do occur in the summertime, but they are more difficult to tease out than they are in the winter, due to the contemporaneous effects of terrestrial carbon cycling processes which often occur at longer timescales. In addition, the NAO index is slightly less well-defined in the summertime, which increases uncertainty insofar as interpreting the causes of any synoptic signals observed. We therefore chose not to examine the algorithm’s ability to detect synoptic length events in the summer. We agree that the reasons behind our decision could be explained a bit more thoroughly in the manuscript and have added an explanation for our rationale in the third paragraph of section 2.
Top of page 5: Yes. There is a line in the code which extracts only afternoon hours for non-mountain sites and nighttime hours for mountain sites, prior to the analysis. The user could skip this step and keep all 24 readings for each day (or however many were available), or select a different time range. We have now noted this in section 2.1. In general, studies in our field tend to use afternoon values for non-mountain sites (e.g. Morgan et al., 2015; El Yazidi et al., 2018; Wang et al., 2018), and we decided to keep with this convention. We did run the analysis using all daily readings at all sites, and the anomaly patterns were not drastically different overall. However, using the mean of all 24 daily measurements renders the results less useful for flux inversion estimations, which generally rely on samples taken when vertical CO2 gradients are lowest (Monteil et al., 2019), and would be inconsistent with ICOS flask sampling protocols, which recommend afternoon sampling at non-mountain sites and nighttime sampling at mountain sites.
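The pre-selection step can be sketched as follows. This discussion specifies only "afternoon" hours for non-mountain sites (five hours per day, according to the referee) and "nighttime" hours for mountain sites. The 12:00-16:00 and 00:00-04:00 UTC windows and the `daily_means` helper below are therefore hypothetical placeholders. Passing `hours=range(24)` keeps all daily readings, as in the all-hours test described above.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sampling windows (hour of day, UTC); not taken from the manuscript
AFTERNOON_HOURS = range(12, 17)  # 12:00-16:00, five hourly values
NIGHT_HOURS = range(0, 5)        # 00:00-04:00

def daily_means(records, mountain=False, hours=None):
    """Average hourly (timestamp, value) records within a daily sampling window.

    Non-mountain sites default to afternoon hours, mountain sites to night hours;
    pass hours=range(24) to keep every available reading.
    """
    if hours is None:
        hours = NIGHT_HOURS if mountain else AFTERNOON_HOURS
    by_day = defaultdict(list)
    for ts, value in records:
        if ts.hour in hours and value is not None:  # skip missing readings
            by_day[ts.date()].append(value)
    return {day: mean(values) for day, values in sorted(by_day.items())}

# One synthetic day whose hourly value equals the hour of day
records = [(datetime(2018, 7, 1, h), float(h)) for h in range(24)]
afternoon = daily_means(records)
night = daily_means(records, mountain=True)
all_hours = daily_means(records, hours=range(24))
```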
Figure 3: Plots of the deseasonalized anomaly patterns (δC30 and δC90) for the TRN station have now been added to figure 3. The long-term trend for 2013-2020 at TRN has been added to figure 2. Our usage of the terms “seasonal cycle” and “long-term trend” has now been explicitly defined in the first paragraph of section 2.2. Our usage of the term “synoptic” is already defined in the Introduction (paragraph 4). As to the specific example the reviewer gives (“Here, we have assigned variability longer than XX days to the trend component of the decomposition procedure, thus excluding it from the seasonal cycle”), we feel that this is adequately discussed in the third paragraph of section 2.2, where we state that we use 667 days as the cutoff value for the long-term filter. As we understand it, variations at frequencies lower than 1 cycle per year (i.e. with periods longer than one year) are by default not considered part of the seasonal cycle. It is thus implied that variations with periods longer than 365 days and up to a maximum of 667 days are assigned to the trend component. The term “localized” to describe nearby point sources of pollution has now been clarified in paragraph 6 of the introduction.
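To show how a long-term cutoff separates the trend from shorter-period variability, here is a frequency-domain low-pass filter applied to an evenly spaced daily series. Only the 667-day cutoff comes from the reply above. The helper `fft_lowpass` is hypothetical. The response H(f) = exp(-ln 2 (f/fc)^6), which passes half the amplitude at the cutoff frequency, is an assumption for this sketch. It should be checked against the CCGCRV documentation before being treated as that program's exact filter.

```python
import numpy as np

def fft_lowpass(y, cutoff_days, dt_days=1.0):
    """Low-pass filter an evenly spaced, gap-free series in the frequency domain.

    Assumed response H(f) = exp(-ln 2 * (f / fc)**6), so H = 0.5 at f = fc = 1 / cutoff_days.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    f = np.fft.rfftfreq(n, d=dt_days)  # frequencies in cycles per day
    fc = 1.0 / cutoff_days
    response = np.exp(-np.log(2.0) * (f / fc) ** 6)
    return np.fft.irfft(np.fft.rfft(y) * response, n=n)

# Eight years of daily residuals: a 4-year (1460-day) wave plus a 73-day wave
days = np.arange(8 * 365)
slow = np.sin(2 * np.pi * days / 1460)
fast = 0.8 * np.sin(2 * np.pi * days / 73)
trend_part = fft_lowpass(slow + fast, cutoff_days=667)
gain = np.exp(-np.log(2.0) * (667 / 1460) ** 6)  # expected pass-through of the slow wave
```

Under this assumed response, the 4-year wave passes with about 99% of its amplitude and the 73-day wave is removed. Because the filter works on the discrete Fourier transform, it also requires the evenly spaced, gap-free input discussed under the technical corrections.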
Page 9: It is true that we chose to examine “seasonal” events in summer and “synoptic” events in winter partially to illustrate the scope of the algorithm and some of what we thought were its most useful applications. As mentioned above, we also found after trial and error that this was the most logical way to divide the analysis and interpret the results. We have now explained this rationale in the third paragraph of section 2.
Page 10: Localized fluctuations were counted manually as a sort of ad-hoc way to score the skill of the algorithm. We might consider removing this step from the manuscript altogether, since it is not really a part of the algorithm per se. In the meantime, we have noted the manual nature of this step in section 3.1.
Pages 16-17: It does seem somewhat unfeasible at this point to conduct back-trajectory analyses over several months at ten separate sites. A less labor-intensive approach would be to use monthly average NAO indices to test whether, for example, the CO2 spike observed in July of 2018 corresponded with unusually intense or persistent blocking conditions. Then again, we already know that this was the case, to some extent. Thus, we agree that it is probably better to note that the algorithm merely produces the signals we expect to see, and not extrapolate too much as to why, or at least to mention that atmospheric transport may have also played a significant role in the summer of 2018. We have scaled back our interpretation of the results in section 3.2 and included a new paragraph noting that, although the timing of the seasonal anomaly patterns we observe in 2018 corresponds to the terrestrial biospheric signals observed across Europe in that year, we cannot definitively say that these signals are what we detect, and we have not established a causal link.
Page 17: We came to the same conclusion in analyses conducted separately. It is best not to read too much into any apparent trends in variability or anomalies. The final paragraph of section 3.2 has been removed, as it was largely based on speculation rather than scientific analysis, as noted.
Discussion section: Paragraph 4 of the introduction has been changed to note that the methodology is designed for application to ICOS station data specifically.
Conclusion: The first paragraph has been edited to reiterate that the effects of summertime atmospheric transport have not been quantified in this study and, as such, the algorithm’s capacity to detect exceptional biospheric episodes at the seasonal bandwidth has not been definitively determined.
Technical corrections:
- The opening sentence of the abstract has been re-worded to make it less ambiguous, as suggested.
- The term “weather” has been replaced with “atmospheric transport” in all instances. The lone use of the phrase “climatological occurrences” has been replaced by “meteorological occurrences.” The term “swath” used once in the literature review section to describe a window of time series measurements has been changed to “span.”
- Notation change: σ, 2σ and 3σ have been changed to ±σ, ±2σ and ±3σ where appropriate.
- The terms “Gaussian” and “Gamma” are used as nouns in the literature, although perhaps less frequently than we realized. We have changed these to “Gaussian curve” and “Gamma curve” for added clarity.
- Line 78 wording changed to read “at multiple European sites.”
- Line 210: No changes made. The sentence is intended to read “such as when NAO- or BLO regimes prevail.”
- The positive SSA in CH4 at HPB at the end of Jan 2019 is not caused by a gap, although the figure appears to show a data gap at this time. Actually, the CH4 reading was just too high to be contained within the selected y-axis bounds. The default behavior in ggplot is to leave a blank space in such instances. The y-axis bounds have now been adjusted accordingly. Good catch. The observed CH4/CO2 gap in October/November of 2018 at GAT, on the other hand, is real. No readings exist for either trace gas at GAT from Oct. 23 to Nov. 21, 2018, likely due to an instrument malfunction at this time. Before the CCGCRV fitting procedure is applied to the raw daily measurements, data gaps such as this are filled using a simple linear interpolation, as mentioned in the first paragraph of the discussion section and as recommended by, e.g., Pickers and Manning (2015). This may, in some cases, lead to the selection of “false positives” such as the one you notice at GAT. This is, admittedly, a drawback of the algorithm, albeit one without an easy solution other than manual inspection. However, we note that the issue here is not with our methodology, but rather the data gap itself, i.e. large data gaps may occasionally compromise the integrity of the results. In the first paragraph of the discussion section, we have now alluded specifically to the gap you note at GAT as an illustrative example of the data gaps problem.
- Pickers and Manning (among others) note that the FFT algorithm used by CCGCRV requires time series data to be evenly spaced and without gaps. Their work has now been cited in the first paragraph of the discussion section.
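The gap-filling step described above can be sketched as linear interpolation onto a regular daily grid, which is what an FFT-based filter requires. The helper name `fill_gaps_linear` and its returned mask of interpolated days are illustrative additions, not part of the authors' code. A mask of this kind could support the manual inspection mentioned above, for example for the Oct. 23 to Nov. 21, 2018 gap at GAT.

```python
import numpy as np

def fill_gaps_linear(day_index, values, n_days):
    """Put daily values on a regular 0..n_days-1 grid, filling gaps by linear interpolation.

    Returns (filled series, boolean mask marking interpolated days). Days before the
    first or after the last observation are held at the nearest observed value.
    """
    day_index = np.asarray(day_index, dtype=int)
    values = np.asarray(values, dtype=float)
    order = np.argsort(day_index)  # np.interp needs increasing x values
    grid = np.arange(n_days)
    filled = np.interp(grid, day_index[order], values[order])
    interpolated = ~np.isin(grid, day_index)
    return filled, interpolated

# Days 3-5 are missing (e.g. an instrument outage)
filled, interpolated = fill_gaps_linear([0, 1, 2, 6, 7], [400, 401, 402, 406, 407], n_days=8)
```

Any anomaly that coincides with `interpolated == True` days, or sits right next to them, is a candidate for the kind of false positive discussed above.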