An algorithm to detect non-background signals in greenhouse gas time series from European tall tower and mountain stations

We present a statistical framework to identify regional signals in station-based CO2 time series with minimal local influence. A curve-fitting function is first applied to the detrended time series to derive a harmonic describing the annual CO2 cycle. We then combine a polynomial fit to the data with a short-term residual filter to estimate the smoothed cycle and define a seasonally adjusted noise component, equal to 2 standard deviations of the smoothed cycle about the annual cycle. Spikes in the smoothed daily data which surpass this±2σ threshold are classified as anomalies. Examining patterns of anomalous behavior across multiple sites allows us to quantify the impacts of synoptic-scale atmospheric transport events and better understand the regional carbon cycling implications of extreme seasonal occurrences such as droughts.


Introduction
Continuous measurements of long-lived atmospheric greenhouse gases (GHGs) at ground-based monitoring stations exhibit 25 variations at multiple timescales. These include a well-established diurnal cycle and an annual pattern linked to seasonality which generally exist on top of the long-term trend of the background concentration. Other variations, related to localized surface fluxes or regional-scale atmospheric transport patterns, are observable at synoptic frequencies lasting from 1 -2 days to several weeks, while others reflect longer-term meteorological occurrences such as droughts or ocean circulation anomalies.
Identification of these latter components can reveal much about the intensity and geographic extent of specific atmospheric 30 events while also improving understanding of background signal evolution. Extracting them, however, requires a methodology to decompose the signal into "background" and "non-background" components and to differentiate meteorology-driven regional signals from spikes due to local emissions, biospheric uptake and other forms of signal noise. https://doi.org/10.5194/amt-2021-16 Preprint. Discussion started: 9 March 2021 c Author(s) 2021. CC BY 4.0 License.
We define "background" here as "the concentration of a given species in a pristine air mass in which anthropogenic impurities of a relatively short lifetime are not present" (IUPAC, 1997). Various methods exist to extract background signals in 35 atmospheric time series. These include back trajectory analyses that categorize readings based on air provenance (e.g. Schuepbach et al., 2001;Balzani Loöv et al., 2008;Cui et al., 2011) and the application of chemical filters using markers such as 222 Rn (e.g. Biraud et al., 2000;Pal et al., 2015;Chambers et al., 2016) or NOy/CO (e.g. Parrish et al., 1991;Zellweger et al., 2003). Although such approaches yield reliable estimates, they are often labor-intensive or require sophisticated transport modeling or additional instrumentation and must take into account site-specific measurement conditions and data availability. 40 Statistical algorithms provide high-precision, computationally inexpensive alternatives to these techniques. These commonly involve a two-or three-step process in which data are first smoothed using filters or polynomial curve fitting, then subsequently refined through the identification of outliers, characterized as points which deviate from the curve by more than a specified threshold (e.g. σ, 2σ, or 3σ, where σ is the standard deviation of the residuals about a smooth curve fit to the data).
Already in the late 1980s, Thoning et al. (1989) developed a filtering technique to separate the annual cycle from the long-45 term trend and approximate the background signal of the CO2 record at Mauna Loa (Hawaii). More recently, O'Doherty et al.
(2001) extracted non-background components of atmospheric CHCl3 time series by fitting a polynomial to the daily minima of a moving 121-day swath of measurements. They then subtracted the polynomial fit from the data and estimated σ from the measurements below the median of the residual distribution. Measurements on the middle day of each 121 -day period exceeding 3σ were flagged as "polluted" and removed. In a second iteration, readings between 2σ and 3σ above the median 50 of the newly refined residual set were marked as "possibly polluted" and subsequently removed if immediately adjacent to "polluted" data points. Giostra et al. (2011) applied a similar approach to atmospheric halocarbon records. They calculated a probability density function (PDF) using the deviations of all data points from σ, predefined as the sixteenth percentile of measurements within a 30-day span. A Gaussian was then defined using σ and the median value of the PDF, and a Gamma was fit such that the sum of the two curves yielded a best-fit to the PDF. The background was approximated using all data 55 points below the intersection of the Gamma and the right-hand branch of the Gaussian. Ruckstuhl et al. (2012) estimated background signals in atmospheric CO and HFC-152a series by applying a localized linear regression to a given span of data points and removing points which deviated by more than the σ-value of the negative (left side) residuals within each successive, overlapping span. Individual points were then weighted for robustness according to their distance from the newly defined background curve, with iterative applications further refining the dataset. Apadula et al. (2019) developed an algorithm to 60 subtract outliers from hourly CO2 datasets. They first removed all values which differed by more than a specified threshold ρ from the median value within a sliding 21-day window and subsequently rejected values that differed by more than ρ from the mean value of the remaining points.
Such methods have been widely applied to estimate baseline concentrations of atmospheric trace gases and, in some instances, to identify the occurrence of short-term signal spikes in time series (e.g. El Yazidi et al., 2018). Lacking in the current literature, however, is a comprehensive statistical framework for the extraction of non-background events occurring at synoptic (1-2 days to several weeks) to seasonal timescales. We thus present here a novel approach to identify exceptional non-background events ("anomalies") in atmospheric time series based on statistical curve-fitting, LOESS smoothing and outlier detection with the aim of developing a protocol for the detection of anomalous episodes of synoptic and seasonal duration. In particular, our goal is to investigate whether seasonal-and synoptic-length deviations from background concentrations can be discerned in near 70 real-time (NRT) through statistical filtering and cross-referencing observations from multiple sites, and to present a framework for communicating information about such events to station managers and other end users.
We focus primarily on CO2, although we validate our detection of wintertime CO2 signal peaks by applying our methodology to concurrent CH4 time series. In the winter months, since carbon exchanges related to terrestrial ecosystem exchange are relatively limited, the timing of meteorology-driven anomalies observed in CO2 and CH4 signals should be similar, as they are 75 linked principally to changes in the predominant upwind air source. Validation of summer CO2 anomaly patterns using CH4 is impractical due to the dominant role of the biosphere on CO2 concentrations during the growing season.
We place particular emphasis on the discernibility of anomalies observable at multiple sites, since we reason that these are most likely to represent continent-wide terrestrial biosphere changes or synoptic-scale transport patterns, as opposed to localized contamination effects or other forms of noise. Moreover, the ability to identify these multi-site events is critical in 80 communicating to station managers in near real-time the presence of atypical signals and in mapping the footprint of regional carbon cycle fluctuations.
Finally, we present the methodology in the context of a near real-time anomaly detection algorithm (ADA) developed and employed at the Atmospheric Thematic Centre of the Integrated Carbon Observation System (ICOS ATC). The algorithm is concise, portable, and is intended to be used with multi-year datasets consisting of validated (level 2) data and NRT (level 1) 85 data from sites in the ICOS network. Both R and Python implementations of the algorithm currently exist, but the methodology can theoretically be adapted to any programming language by any user with access to the ICOS Carbon Portal or other standardized GHG data. The methodology described in the following sections refers to the R implementation of the algorithm.

Materials and Methods
We conduct our analysis using daily aggregated CO2 and CH4 data from ten European sites. At each, we approximate the 90 background signal of both trace gases using the curve-fitting method of Thoning et al. (1989). We then define an "envelope" representing the range of normal or expected seasonal variability in the signal. The envelope is calculated from the smoothed cycle, which consists of a polynomial function fit to the data and a short-term residual filter. The upper and lower bounds of the envelope are defined by the second standard deviation (2σ) of the smoothed cycle about the background signal and are adjusted to account for seasonal effects on signal stability. 95 We then smooth the daily data using a LOESS function and evaluate the smoothed daily data in relation to the 2σenvelope. Two settings are used for the short-term filter and the LOESS smoothing span: 30 and 90 days. The 30-day analysis is applied to the extended winter season (November-March), where our goal is to discern anomalies indicative of shifts in weather patterns. These synoptic scale anomalies (SSAs) are identified as peaks where consecutive smoothed daily measurements fall outside the 2σ-envelope. The 90-day analysis is applied to the growing season (April-October), with the 100 aim of identifying seasonal anomalies. At this wider bandwidth, the smoothing function should be minimally affected by shorter (< 1 month) regional signals or SSAs, and thus large spikes detected are taken to reflect seasonal-length perturbations such as droughts, springtime carbon uptake, or mesoscale circulation anomalies.

Observations
We analyze continuous time series data from ten stations, which are part of the Atmosphere network of the European Integrated 105 Carbon Observation System (ICOS) research infrastructure (ICOS RI, 2020a, b). ICOS provides high-precision, long-term and standardized observations of the carbon cycle such as GHG concentrations in the atmosphere and GHG exchanges between the atmosphere, ecosystems and oceans. All ICOS stations are rigorously assessed before being labelled, i.e. before receiving approval to join the network (Yver-Kwok et al., 2021). Daily CO2 and CH4 records for the ten stations are available through the ICOS Carbon Portal (https://www.icos-cp.eu) from varying start dates, depending on the date an individual station joined 110 the ICOS network.  We use the complete records available through the ICOS Carbon Portal for all sites except OPE and PUY. For these two, we 115 use slightly longer records obtained from the ICOS ATC data products database. We first concatenate all available validated L2 daily data (ICOS RI, 2020b) together with daily near-real time (L1) data (ICOS RI, 2018), which are typically available for the past year or so. We then extract and aggregate the afternoon (12:00pm-5:00pm) values for each site except for the two mountain sites (JFJ, PUY), where we extract and aggregate nighttime values only (8:00pm-5:00am). The afternoon period generally represents optimal mixing conditions of boundary layer air at non-mountain sites. At the mountain sites, nighttime 120 values are used to capture the properties of subsiding air from the free troposphere. The concatenated datasets are stored as R dataframes for the ensuing analysis. The sites chosen are distributed throughout Central and Northern Europe and include a mix of rural and mountain siteswhich 125 are fairly remote and minimally affected by nearby pollution sourcesand sites in closer proximity to large urban settlements or other sources of anthropogenic contamination. Figure 1 shows the locations of the ten sites.

CCGCRV curve fitting
CCGCRV (Thoning et al., 1989) is a curve fitting application for long-lived GHG time series maintained at the Carbon Cycle Group of the Climate Monitoring and Diagnostics Laboratory (CCG/CMDL) of the National Oceanic and Atmospheric 130 Administration (NOAA, USA). The version of CCGCRV used here is applied as a standalone function in R and is available from the NOAA CMDL ftp server at ftp://ftp.cmdl.noaa.gov/pub/john/ccgcrv.
The method is succinctly summarized by Pickers and Manning (2015). Basically, a fit to a time series is first obtained using a linear least squares regression following the "LFIT" protocol (Press et al., 1996). The seasonal cycle and the long-term trend of the time series are then approximated through the combination of a polynomial and a harmonic function: 135 where t is the time in years, n is the number of terms in the polynomial (typically three), a0, a1, ..., a(n-1) are constants, h represents the nth harmonic (typically four), and mk and φk define the magnitude and phase of each successive sinusoidal component.
Next, a Fast Fourier Transform (FFT) algorithm is applied to the residuals of the input data to C(t) in order to retain short-term 140 and interannual variations in the fitted curve. The data are transformed from the time domain into the frequency domain and multiplied by a low-pass filter to remove variations with frequencies higher than a specified cutoff threshold. An inverse FFT is then used to transform the filtered data back to the time domain. The low-pass filter function is represented as: where fc is the cutoff frequency in cycles per year. The low-pass filter is applied to the residuals twice, once with a short-term 145 cutoff value (fc = fs) to smooth the data, and once with a long-term cutoff (fc = fl) to capture interannual variations in the data not characterized by the polynomial part of C(t) and to remove any remaining influence of the seasonal cycle. For fl, we use the default value of 0.55 cycles per year (667 days).
Finally, the features of interest (e.g. the long-term trend and the seasonal cycle amplitude) are derived by combining the relevant components of the fitting procedure. The long-term trend is represented by the combination of the polynomial part 150 of C(t) with the fl filter (i.e. long-term trend = C(t)polynomial only + H(fl)). The seasonal cycle is obtained by subtracting the longterm trend from the combination of C(t) and the fs filter (i.e. seasonal cycle = C(t) + H(fs)long-term trend). A more detailed description of the routine can be found in Thoning et al. (1989) and on the NOAA Earth System Research Laboratories (ESRL) website at http://www.esrl.noaa.gov/gmd/ccgg/mbl/crvfit/crvfit.html.

Synoptic and seasonal anomaly detection 155
To develop the synoptic and seasonal anomaly detection algorithm (ADA), we first apply CCGCRV to extract a background signal over the full time period available at each site. The background curve is meant to approximate the mean annual cycle, and is composed of the long-term trend plus the harmonic part of C(t) fitted to the detrended data. Figure 2 shows an example of this procedure applied to the 2013-2020 CO2 data from the TRN station. We then extract the smoothed seasonal cycle, S(t), defined as the function C(t) plus the short-term filter of the residuals. We use two different settings for the short-term filter fs, equivalent to 30 and 90 days. We then calculate the difference between S(t) and the harmonic on each day t for both seasonal cycle curves to derive the vectors ⃗⃗⃗⃗ 30 and ⃗⃗⃗⃗ 90. These are then used 165 to compute variability vectors 30 and 90, which are adjusted to reflect seasonal patterns in CO2 and CH4 variability. specific signal excursions (which we term "localized fluctuations") to the extent possible while retaining the capacity to capture the true magnitudes of atypical regional events. Figure 3 shows the CCGCRV harmonic and the 30-and 90-day smoothed, detrended seasonal cycle of CO2 at TRN for the period 2013-2020. The algorithm next smooths the raw data via a LOESS (locally estimated scatterplot smoothing) function (Cleveland, 1979) implemented via the R stats package (R Core Team, 2019). This is done as a way of filtering the daily data and attenuating the influence of short-duration, high-intensity signal spikes when categorizing deviations from the background as anomalies vs. normal signal instabilities. Short-duration (≤ 1 day) spikes are not uncommon in continuous greenhouse gas measurements 190 and are often related to instrument errors or localized perturbations from contaminated air masses. Smoothing the daily data ensures that these short-duration spikes are less heavily weighted and that spikes will only be considered non-background if part of a cluster of other nearby measurements that fall outside the range of expected variability. The LOESS algorithm is applied using a smoothing span (bandwidth) of 30 and 90 days. Anomalous events are then identified by comparing the smoothed daily values to the respective 2σ-range for each day, i.e. the 30-day LOESS curve is compared to the 2σ30-envelope 195 and the 90-day curve to the 2σ90-envelope.
The goal of the 30-day analyses is to identify synoptic scale anomalies (SSAs). We identify these as peaks in the signal where the smoothed daily value is outside the 2σ-envelope for at least two consecutive days. We focus these analyses on the extended winter season (November-March) when non-meteorology-related effects on the signal, including terrestrial biosphere exchanges, are minimized. We consider 30 days sufficiently wide to mask short (≤ 1 day) spikes, yet precise enough to detect 200 the signals of distinct weather episodes (as opposed to more generalized effects of seasonal trends in circulation patterns). Although we focus primarily on CO2 in the analysis, we also attempt to validate the SSA detection by applying the methodology to concurrent CH4 records from the ten sites. Emissions of both CH4 and CO2 largely occur over the continents (Friedlingstein et al., 2020;Saunois et al., 2020). Thus, if positive CO2 anomalies during the winter months coincide with periods of sustained transport of easterly winds from the continental interior, such as when NAO-conditions or BLO regimes 210 prevail, then they should be more or less synchronized with CH4 spikes. Meanwhile, concentrations of both species should approach background levels when westerly, marine-influenced winds predominate. Although this approach is complicated slightly by the fact that the annual CH4 cycle is less distinct than that of CO2, our envelope is wide enough that measurements must be rather far from the mean annual cycle determined by CCGCRV for several consecutive days in order to register as SSAs. The method should therefore be able to adequately discern anomalous signal components for both species in most 215 winters.
The 90-day analysis is intended for the extraction of longer-term seasonal anomalies. In effect, any period when the 90-day LOESS curve is outside the 2σ90-envelope is considered to be an anomalous event. At such a wide bandwidth, the smooth curve should be minimally affected by regional signals lasting from a few days to a few weeks, leaving only a broader signal representative of seasonal effects (Ruckstuhl et al., 2012). Anomalies may be induced by enhanced s pring carbon uptake, 220 extended droughts or, to give a more germane example, wide-scale emissions reductions due to global pandemics. For the 90day application, we concentrate on the extended summer growing season (April-October) to examine the capacities of the methodology in detecting large-scale terrestrial biosphere anomalies. We focus in particular on the summer of 2018, which saw a spate of intense droughts and heat waves across Central and Northern Europe that altered continent-wide GPP and CO2 storage and flux patterns (Lindroth et al., 2020;Ramonet et al., 2020;Rinne et al., 2020;Wang et al., 2020). 225 Table 2 summarizes the results of the SSA extraction for the ten sites for the period 1 November 2015 to 31 March 2020. This period is selected since the winter of 2015-2016 is the first year in which we have CO2 data available at enough sites to accurately discern the number of localized fluctuations detected at each site, for which we require that CO2 data must be present 230 at no fewer than five sites. Localized fluctuations are defined as SSAs with no analogue at any other site, i.e. spikes at a single site which do not coincide with a similar spike elsewhere. Figure 4 shows the background CO2 signal, 2σ-envelope (shaded in gray), and 30-day LOESS curve at each of the ten sites. The period 1 July 2018 to 1 July 2019 is selected as an example.

SSAs
An analogous paneled figure for CH4 is presented as Fig. 5.

Seasonal anomalies 270
The CO2 patterns observed during the growing season of 2018 indeed reveal distinctive signals of that year's exceptional drought conditions. The ADA finds negative CO2 anomalies in May 2018 at all sites except for OPE and TRN. The largest of these is at HTM, although a fairly large spike is detected at PUY as well. This is likely a product of unusually warm and sunny early spring conditions in 2018, which contributed to early green-up and growth across the region and led to enhanced net biome production (NBP) and plant CO2 uptake. Subsequent extreme heat and dry conditions led to a lapse in summertime 290 productivity and reduced photosynthesis, resulting in July CO2 spikes which are captured at SMR, NOR, HTM, LIN and HPB (Ramonet et al., 2020). The ensuing dip in CO2 observed at SMR, NOR, LIN and HPB in September 2018 can plausibly be explained as a legacy effect of the reduced summertime productivity; drought stress likely led to decreased litter availabili ty in the fall, and hence lower than normal decomposition rates and total ecosystem respiration (Bastos et al., 2020).
Several other multi-site anomalies are captured at the three French sites in the summers of 2012-2015, though the lack of 295 recordings at the other sites makes it unclear whether these represent smaller-scale climatological occurrences or broader regional patterns. We note, however, that the large positive spike seen at OPE and TRN in October 2015 followed an intense summer dry spell that year (Erdman, 2015). This could plausibly have triggered early senescence onset and reduced photosynthetic activity in the fall. Likewise, an unusually wet summer in France in 2014 might have led to increased ecosystem productivity in the fall, resulting in the negative spike seen simultaneously at OPE, TRN and PUY in ;that year. 300 In general at OPE and TRN, anomalous enhanced maximum CO2 uptake (indicated by negative spikes in early-to midsummer) appears to correlate with higher SPEI values (wetter conditions). In addition, we note a general downward trend in summertime SPEI over the 9-year period, indicating a transition to drier conditions overall. We speculate that this might be driving reductions in maximum mid-summer uptake and the visible decrease in the occurrence of anomalous negative CO2 spikes over time. 305

Discussion
Several potential challenges may arise in the technical application of the ADA. These include the presence of large data gaps in time series which could arise from instrument malfunctions or other technical issues. CCGCRV is ill -suited to handle such gaps, meaning missing data points must be artificially imputed using a structural modeling function or linearly interpolated.
We opt for the latter by applying the method of Moritz and Beielstein (2017), which is sufficient for small data gaps but 310 generally impractical for large ones (~1 month). Long data gaps should thus be identified before performing the linear interpolation and applying the anomaly extraction, so that any quantification of the signal during these periods is regarded with caution.
The reliability of the results may also be strongly influenced by the range of available data. Sites with longer historical records will have a larger residual dataset to draw from, allowing for more precision when extracting the mean annual cycle and 315 resultant 2σ-envelope, while sites with shorter records may produce less precise results. The potential drawbacks of this are twofold; 1) very slight anomalies might tend to be obscured at sites with a limited number (~3 -4 years) of relatively capricious measurements, and 2) low-amplitude localized fluctuations might be detected at sites where the full range of expected seasonal variability is underestimated by the 2σ-envelope or the background curve is an imprecise fit to the true seasonal cycle.
Anomaly patterns at sites with shorter records should thus be regarded with caution and cross-validated with patterns from 320 other nearby sites if possible. Furthermore, the method may have limited applicability at more isolated sites that have no clear analogue within the ICOS network. Evaluation of anomalies through cross-validation is difficult if a site has relatively few nearby sister stations which can reasonably be expected to sample from similar air masses most of the time. This drawback is apparent when considering the results at PUY and JFJ, both background sites which sample frequently from well-mixed air above the planetary boundary 325 layer (Asmi et al., 2011;Herrmann et al., 2015). At PUY, for example, planetary boundary layer (PBL) height analyses reveal that the station samples from the free troposphere more than 70% of the time and up to 81% in the winter .
With such infrequent sampling of surface air, air parcels most likely to contain the carbon signatures of bellwether biospheric events (such as droughts and their legacy effects) or shorter-term anomalies linked to passing weather systems or localized contaminant plumes may go undetected. At JFJ, PBL air is sampled only intermittently when conditions favor mountain 330 venting or advection (Zellweger et al., 2003;Griffiths et al., 2014), meaning short-duration anomalies may merely reflect the prevalence of such localized phenomena rather than broader atmospheric transport patterns. In the future, this limitation will become less important as the ICOS atmosphere network continues to grow in size; 26 stations across Europe currently possess the ICOS label and another dozen or so are set to join soon.
The occasional selection of localized fluctuations is an additional concern. In some cases, low-amplitude spikes classified as 335 anomalies at a particular station might not truly represent significant regional-scale excursions from the background signal.
This implies the need for station-specific protocols to classify anomalies based on duration and magnitude, which may require cross-validation using multiple station readings or manual inspection by PIs. The similarity in the patterns at the six northernmost sites, for example, offers a means of validation for the detection of SSAs; since true synoptic scale anomalies should produce a signal over a broad swath of the continent, those anomalies observed only at certain sites can reasonably be 340 assumed to indicate localized events. Note, for example, the very slight positive anomaly in December 2016 seen only at GAT (Fig. 6).
In other cases, signal excursions might register as anomalies at certain sites but not others. In such cases, users may determine that these events are noteworthy enough that they should be classified as anomalies more broadly. The recourse then is a sitespecific refinement of the selection criteria or tuning of the algorithmic parameters. For example, although we use a bandwidth 345 of 30 days for SSA detection, this choice may not be the most appropriate in all cases. Different stations have different ambient signal variability ranges depending on their geographical setting and proximity to emission sources; those with higher overall variability (and hence wider 2σ-ranges) might record too few SSAs if applying an excessively wide smoothing span, as peaks in the smoothed signal would be dampened sufficiently to be contained within the envelope. Users may thus find a bandwidth of, e.g., 15-25 days to produce more informative results at some locations. Likewise, sites with lower overall variability might 350 tend to record too many SSAs when using a bandwidth that is too short. By definition, the envelope width also affects the anomaly selection. Note, for example, that the 2018 drought pattern typified at the six northernmost sites does not appear at OPE or TRN in Figure 8. A closer examination of Fig. 8 reveals that while measurements at these two sites during, e.g. May 2018 were below the mean annual cycle, no anomalous springtime CO2 dip was registered since the smooth curves at OPE and TRN were still contained within the 2σ envelope bounds. As mentioned, the specification of a 2σ threshold stems from 355 our desire to avoid excessive selection of low-amplitude, site-specific signal peaks. However, this width might mask noteworthy seasonal patterns at certain sites with greater year-round variability, making cross-examination all the more critical.
Uncertainties can also arise in the interpretation of the results. For example, the distinction we make between synoptic scale and seasonal anomalies is primarily based on the length of observed signal spikes. Normally, SSAs which persist from 1-2 days to several weeks are presumed to be linked to prevailing wind conditions at a given site and hence changes in the source 360 regions of sampled air parcels, e.g. from relatively clean North Atlantic air to continental-sourced air parcels bearing the signatures of terrestrial emissions. However, in some cases, anomalies deemed to be seasonal in length may simply represent the frequent occurrence or unusual persistence of synoptic-scale weather patterns. For example, the extreme heat waves in Northern and Central Europe throughout much of June and July 2018 were associated with the persistence of a high-pressure blocking system which formed over the region (Rösner et al., 2019). Although synoptic in size, this pressure anomaly -365 combined with high temperatureshad resounding effects on European forests and resulted in subsequent CO2 anomalies in the fall of 2018. It is thus more appropriately considered as part of a broader seasonal anomaly. The implication is that i n some cases, "seasonal" anomalies are rather patterns which consist of a series of shorter, related signal irregular ities. These irregularities will often be visible at shorter bandwidths and could be directly linked to meteorological events. Summertime anomalies should thus be considered in the wider context of terrestrial ecosystem production, indicators of which may lag well 370 behind the occurrence of exceptional weather episodes.

Conclusions
In general, we find that the algorithm performs well in capturing the imprints of regional and continent -wide extremes in NBP and other ecosystem indicators, specifically the distinctive markings of the 2018 European drought, on which we have placed particular emphasis. The ADA also captures well the signature effects of unusually strong or persistent weather regimes in 375 the wintertime. This interpretation is reinforced by the fact that the CO2 and CH4 anomaly patterns during the extended winter season are strikingly similar for the 30-day implementation of the methodology.
The robustness of the results is reliant on cross-validation of anomaly detection across multiple sites. However, we note that anomalies of sufficient sizee.g. ~3 ppm greater than our 2σ envelope boundary when using a 30-day bandwidth or ~0.5 ppm when using a 90-day bandwidthwill usually register simultaneously at multiple sites within the same region and seem 380 generally unlikely to stem from localized contamination. The method also has a low computational cost and the process of including additional sites in an analysis is relatively straightforward. As new NRT (level 1) GHG data are uploaded to the ICOS Carbon Portal, users have only to concatenate these to existing datasets and reinitiate the method beginning with the steps outlined in section 2.3. The current default implementation of the algorithm is to produce time series plots in the style of Figures 6, 7 and 9. 385 The ability to detect in NRT the occurrence of non-background signal events at multiple timescales is central to an improved understanding of GHG variability and regional carbon cycling processes at multiple timescales, and the ADA represents an important step toward this end. For the moment, GHG data must be downloaded by individual users and preprocessed before running the code. Results can then be made available online and used to alert station managers and other end users as to the occurrence of anomalous signal events. Future developments may include a fully-automated online implementation in which 390 data for all ICOS sites are extracted and processed daily. These would then be used to produce data files and generate time series plots to display on the sites' respective panel board pages.

Code availability 395
The ADA is intended to be open-source and the R code is currently accessible via a GitHub repository page (https://github.com/hellonskis/ICOS_ATC_anomaly_detection). Python code is available internally via the ICOS Jupyter hub at https://jupyter3.icos-cp.eu/ and is available upon request.

Author contributions
All code required to run the ADA was written and is maintained by A. Resovsky. A. Resovsky carried out all experiments 405 and produced the figures and text in this paper. M. Ramonet manages the OPE and PUY stations, and also contributed the conceptual idea behind the ADA as well as many hours of advice and feedback, especially on the sections discussing the 2018 European drought. L. Rivier was instrumental in finalizing the layout of the figures and the experimental workflow detailed in the methodology section. J. Tarniewicz assisted in pre-processing of ICOS data products and provided scientific advice regarding the smoothing procedures used to extract anomalies. P. Ciais provided suggestions related to the 410 terminology used for the text and figures, descriptions of the 2018 European drought and its after-effects and additional data analysis considerations included in the discussion section. M. Steinbacher is the manager of the JFJ station data and also provided a great deal of advice and feedback regarding the organization of this paper and the presentation of the ideas herein. We also wish to thank all members of the ICOS Atmosphere Monitoring Station Assembly for their contributions to the 430 discussions surrounding anomalous signal detection techniques.