The ACE-FTS (Atmospheric Chemistry Experiment – Fourier Transform
Spectrometer) instrument on board the Canadian satellite SCISAT has been
observing the Earth's limb in solar occultation since its launch in 2003.
Since February 2004, high resolution (0.02 cm

One of the most common techniques for screening out anomalous data from a
data set is to calculate the set's mean (

This method is much less sensitive to extreme outliers, as the presence of
outliers typically has an insignificant effect on the median value. It can
be used as an efficient tool in detecting outliers for data that are
normally distributed. However, using this value as a method of detecting
outliers can be ineffective if the data being analysed are multimodal and/or
are asymmetrically distributed about the median. In the case of data that
are multimodal or asymmetrically distributed and contain multiple extreme
outliers, it is likely that neither the

In satellite remote sensing, isolating geographical/seasonal/local time regions where global satellite-based data are symmetrically distributed and uni-modal can be difficult and/or tedious. Measurements grouped into a given altitude, latitude, month, and local time bin can be driven away from typical behaviour by any number of factors (e.g. the polar vortex, a solar proton event, a sudden stratospheric warming, biomass burning, presence of polar stratospheric clouds, etc.), thereby altering the “typical” distribution of observed measurements, and hence the probability density function (PDF) of the trace species concentration.

Screening for outliers with the MAD, a commonly used method, does not explicitly make use of a PDF; to be useful, however, it implicitly assumes that the PDF is approximately symmetric about the median value. Other commonly used methods, such as Peirce's criterion (Peirce, 1852; Ross, 2003) and Chauvenet's criterion (Chauvenet, 1871), explicitly make use of a PDF, but they assume that the PDF is a Gaussian distribution. It should also be noted that in atmospheric science, the use of PDFs is not uncommon in tracer and validation studies. Lary and Lait (2006), in their introduction, give excellent examples of different types of tracer studies, and studies such as Migliorini et al. (2004), Lary and Lait (2006), and Wu et al. (2008) have demonstrated that PDFs can be used as a validation tool, in which the PDFs measured by different atmospheric sounders are compared rather than co-located measurements.
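The difference in robustness between a mean-based and a median-based screen can be illustrated with a small sketch. The sample values, the function names, and the threshold of three scale units are all assumptions chosen for illustration; they are not taken from the ACE-FTS processing.

```python
import statistics

def mad(values):
    """Median absolute deviation about the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def flag_mean_std(values, k=3.0):
    """Flag points more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [abs(v - mean) > k * std for v in values]

def flag_median_mad(values, k=3.0):
    """Flag points more than k MADs from the median."""
    med = statistics.median(values)
    scale = mad(values)
    return [abs(v - med) > k * scale for v in values]

# Made-up sample: six similar values and one extreme value.
sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 500.0]

by_mean = flag_mean_std(sample)      # k = 3 is an illustrative threshold
by_median = flag_median_mad(sample)  # same illustrative threshold
```

In this sample the extreme value inflates the mean and the standard deviation enough that it escapes the mean-based test, whereas the median-based test flags it and nothing else. The sample is roughly symmetric apart from the extreme value, which is the situation in which the MAD works well; it does not address the multimodal or asymmetric cases discussed above.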

The ACE-FTS (Atmospheric Chemistry Experiment – Fourier Transform
Spectrometer (Bernath et al., 2005)) instrument, on board the Canadian satellite SCISAT,
is a solar occultation, high spectral-resolution (0.02 cm

This study outlines the repercussions of screening data based on the

amended sets of microwindows for all molecules, and an increase in the number of allowed interferers in the retrievals;

improvement in temperature/pressure retrievals, leading to a reduction in unphysical oscillations in retrieved temperature profiles;

inclusion of COCl

ACE-FTS level 2 v3.5 H

Physically unrealistic outliers can occur in the ACE-FTS level 2 data for a number of different reasons. Many are caught before the data are added to the level 2 database, such as outliers due to exceedingly noisy spectra, ice contamination on the ACE-FTS detector affecting an occultation, and a variety of processing errors. However, pre-screening does not catch all of these, and factors that it does not account for can also produce outliers, for example poor statistical fitting, convergence onto an unrealistic solution in the retrieval, or inaccurate pressure and temperature a priori information.

The outlier detection and subsequent data flagging procedures discussed in this study have only been performed on the ACE-FTS level 2 data products that have been interpolated onto a 1 km altitude grid (between 0.5 and 149.5 km) (Boone et al., 2005). The philosophical approach for identifying data as potential outliers was one of caution, in that it is better to keep some “bad” data (likely to be physically unrealistic) than to reject “good,” or “true,” data (likely to be physically realistic). It was also desired that the approach be consistent for all subsets of data being analysed, i.e. tolerance levels, regional limits, etc. should be the same for all species, for all seasons, at all altitudes. For the remainder of this study, these physically unrealistic data will be referred to as “unnatural” outliers, and the data that are likely to be physically realistic, yet still seemingly outlying, as “natural” outliers. All data that are not unnatural outliers will be referred to as inliers.

All distributions of data discussed in this section represent the February 2004–February 2013 data, and all VMRs are given in parts per volume (ppv).

Global satellite-based measurements of trace gases in the atmosphere
are typically not symmetrically distributed and are often multimodal.
Different regions are governed by different, varying processes, and
therefore analysis of the data is typically carried out by breaking down the
data into different altitude, latitudinal, etc. bins. Figure 1 shows all the
ACE-FTS H

The data can be separated further into bins based on latitudinal regions and
local times. For example, Fig. 2 shows H

2004–2013 ACE-FTS VMR distributions for sunset occultations
(symbols) in the Southern hemisphere and corresponding best fits to normal
distribution (dashed lines):

Sunrise ACE-FTS O

Initially, all data were pre-screened. Any occultation that contained errors
due to previously known issues (e.g. unrealistic N

Sunrise ACE-FTS VMR distribution (blue circles) and fitted EDF
(black dashed lines) for:

Sunrise ACE-FTS data for the same data subsets as Fig. 4. The red circles are data that have been determined to be unnatural outliers as per the EDFs, and the blue dots are the inlying data.

The screening processes started by analysing the data's PDFs. The normalized
PDF of data subset

The total integral of the EDF is equal to

Percentage of ACE-FTS level 2 v3.5 profiles rejected because they contain one or more detected unnatural outliers (by either the running MeAD or the EDF).

This method, however, required determining an analytical solution for the
data's EDF. For each of the 50

Figure 4 shows three examples of ACE-FTS sunset data
distributions – NO

Sunrise ACE-FTS data for the same data subsets as Fig. 4. The red circles are data that have been determined to be unnatural outliers as per the 15 day running median and MeAD, and the blue dots are data that have been determined to be inliers.

All Antarctic ACE-FTS data for H

The final inlying data (blue dots) and unnatural outliers (red
dots) for all ACE-FTS HCN data at 9.5 km (left) and SF

It should be noted that screening using the EDF is a hard-limiting filter.
Therefore, using it in the manner described above does not necessarily reject
data that are non-physically anomalous for a given season. To screen the
data for this type of moderate outlier, the 15 day median and a 15 day
variation scale are calculated for each subset, excluding outliers as
determined from the EDFs. Even on a 15 day timescale, ACE-FTS subset data
can have distributions that are bimodal. In many cases, the primary mode is
sampled much more frequently than the secondary mode, and therefore, without
careful consideration, data within the secondary mode can be erroneously
screened as unnatural outliers. To avoid this, we need a variation scale
that is sensitive to outliers (unlike the MAD), but not overly sensitive to
outliers (like the

Any data point with a value outside the bounds of median

The final inlying data for ACE-FTS:

In order to explore the response to periodic extreme events and to trends,
Fig. 8 shows the final inliers and unnatural outliers in all ACE-FTS HCN
data at 9.5 km, which exhibits periodic increases that could correspond to
biomass burning events (e.g. Crutzen and Andreae, 1990; Pommrich et al., 2010); as well as all SF

In the overwhelming majority of instances where the ACE-FTS VMR data exhibit
a sudden and/or extreme change in the distribution, the unnatural outlier
detection method described above does not screen out these events. Sudden
stratospheric warmings cause strong descent in the northern
high-latitude upper atmosphere. This leads to anomalously large
concentrations of NO in the upper stratosphere-lower mesosphere, near 50 km
(e.g. Manney et al., 2008; Randall et al., 2009). Figure 9a shows the time series of the final
inliers in all ACE-FTS NO at 55.5 km. It can be seen that the detection
method is able to keep the data during these extreme events as inliers.
Anderson et al. (2012), using in situ aircraft measurements, demonstrated that in the
summer there can be H

Table 1 shows what percentage of ACE-FTS level 2 v3.5 profiles contain at least one detected outlier (by either step). For any given species, if all profiles that contained at least one outlier are rejected, less than 6 % of the total number of profiles will be omitted.

A two-step process has been developed in order to screen all ACE-FTS level 2 data for physically unrealistic outliers. The first step fits an EDF, the superposition of three Gaussian distributions, to actual distributions. This fit is done in log-space. Data in the tails of the distributions where the probability of finding a data point is less than the tolerance level are determined to be extreme unnatural outliers. The second step iteratively takes the 15 day running median and MeAD and screens for moderate seasonal unnatural outliers. Data that are further than 10 times the MeAD from the median are determined to be moderate outliers.
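The second step summarised above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the processing code used for the ACE-FTS data: MeAD is taken here to mean the mean absolute deviation about the median, the 15 day window is assumed to be centred on each measurement, and the iteration is assumed to repeat until no further points are flagged. The function names and the synthetic time series are hypothetical.

```python
import statistics

def mead(values):
    """Mean absolute deviation about the median (assumed meaning of MeAD)."""
    med = statistics.median(values)
    return sum(abs(v - med) for v in values) / len(values)

def screen_moderate_outliers(days, values, window=15.0, k=10.0, max_iter=20):
    """Return a list of booleans, True where a point is flagged as an outlier.

    days   : measurement times in days
    values : measured values, same length as days
    window : width of the running window in days (centred, an assumption)
    k      : rejection threshold in units of the MeAD
    """
    flagged = [False] * len(values)
    half = window / 2.0
    for _ in range(max_iter):
        new_flags = list(flagged)
        for i, t in enumerate(days):
            if flagged[i]:
                continue
            # Unflagged points inside the window; always includes point i.
            near = [values[j] for j, tj in enumerate(days)
                    if not flagged[j] and abs(tj - t) <= half]
            med = statistics.median(near)
            if abs(values[i] - med) > k * mead(near):
                new_flags[i] = True
        if new_flags == flagged:
            break  # no new outliers found, so the iteration has converged
        flagged = new_flags
    return flagged

# Synthetic daily series with one injected extreme value on day 12.
days = list(range(30))
values = [1.0 + 0.01 * ((i % 5) - 2) for i in days]
values[12] = 50.0
flags = screen_moderate_outliers(days, values)
```

Because each window in this sketch includes the point being tested, a point can only exceed 10 times the MeAD when the window holds more than 10 points, so sparsely sampled windows never produce a flag. Whether the operational screening shares that property depends on details that the summary above does not state.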

Using these methods to screen the ACE-FTS data for unnatural outliers, a flagging system has been implemented to give ACE-FTS level 2 data users a guide for how best to use the data. Each VMR data point in each profile is flagged with an integer from 0 to 9. Table 2 gives the definition of each flag value. Any data with a flag of 0 are recommended for use. For previous versions, data users were advised to filter out data where the percent error (the retrieval statistical fitting error divided by the retrieved value) is either greater than 100 % or less than 0.01 %; for legacy reasons, these data have been given a flag value of 1. It is recommended that data points with a flag greater than 2 be removed before any analysis is performed. This point-by-point screening alone may be adequate when looking at only one altitude level; however, profiles that contain an outlier at a given altitude level may also be compromised at lower altitude levels. It is therefore recommended that any profile that contains a flag between 4 and 7 (inclusive) be removed before analysis.

Definition of flag values associated with ACE-FTS level 2 data.

At certain altitude levels for a given species, the data can be either noisy, with a significant number of negative values, or have a strong negative bias. In either case, since the ACE-FTS retrieval allows for negative concentrations, it is possible for valid data to have values close to zero, both positive and negative. When values are systematically near zero, the percent error becomes extremely large. Therefore, in these situations, screening the data based on the percent error may introduce a bias in the data. As such, removing data that have a flag value of 1 before analysis is only recommended at altitude levels where the overwhelming majority of data points have a VMR value greater than zero.
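The flag-based recommendations above translate into a short filtering routine. The sketch below assumes that a profile is available as two equal-length lists, one of VMR values and one of integer flags; the function names, the `drop_flag_1` option, and the example values are hypothetical and do not correspond to any ACE-FTS file format or software.

```python
def profile_is_usable(flags):
    """A profile is rejected if any flag lies between 4 and 7 inclusive."""
    return not any(4 <= f <= 7 for f in flags)

def screen_profile(vmr, flags, drop_flag_1=False):
    """Apply the recommended screening to a single profile.

    Returns None if the whole profile should be discarded. Otherwise returns
    the VMR list with removed points replaced by None.
    drop_flag_1 : also remove points with a flag of 1 (percent-error screen);
                  intended only for altitude levels where the overwhelming
                  majority of VMR values are greater than zero.
    """
    if not profile_is_usable(flags):
        return None
    kept = []
    for value, flag in zip(vmr, flags):
        bad = flag > 2 or (drop_flag_1 and flag == 1)
        kept.append(None if bad else value)
    return kept

# Hypothetical three-level profile (VMR in ppv) and its flags.
example_vmr = [1.0e-6, 2.0e-6, 3.0e-6]
kept_default = screen_profile(example_vmr, [0, 1, 3])
kept_strict = screen_profile(example_vmr, [0, 1, 3], drop_flag_1=True)
rejected = screen_profile(example_vmr, [0, 5, 0])
```

Keeping the profile-level test separate from the point-level test mirrors the two recommendations: a single flag between 4 and 7 discards the whole profile, whereas other flags greater than 2 remove only the affected data points.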

Since the outlier detection methodology was approached with a philosophy
that it is better to leave in unnatural outliers than to remove natural
outliers, there are outliers that have gone unflagged – especially in data
sets that are inherently noisy and at low altitudes (below

The flag values for all v3.5 data are now available for download on the ACE-FTS website, and v2.5 flag values are available upon request from the lead author and will soon be made available for download on the ACE-FTS website. It is currently expected that similar flags will be a standard product within the level 2 data of all future products.

The authors wish to thank the anonymous referees for their comments and valuable insight. This work was supported by the Canadian Space Agency (CSA). The Atmospheric Chemistry Experiment is a Canadian-led mission mainly supported by the CSA.

Edited by: C. von Savigny