The errors inherent in the fitting and integration of the pseudo-Gaussian ion peaks in Aerodyne high-resolution aerosol mass spectrometers (HR-AMSs) have not been previously addressed as a source of imprecision for these or similar instruments. This manuscript evaluates the significance of this imprecision and proposes a method for its estimation in routine data analysis.

In the first part of this work, it is shown that peak-integration errors are
expected to scale linearly with peak height for the constrained-peak-shape
fits performed in the HR-AMS. An empirical analysis is undertaken to
investigate the most complex source of peak-integration imprecision: the
imprecision in fitted peak height,

In the second part of this work, the empirical analysis is used to constrain
a Monte Carlo approach for the estimation of

In the third part of this work, the Monte Carlo approach is extended to the
case of an arbitrary number of overlapping peaks. Here, a modification to the
empirically constrained approach was needed, because the ion-specific

The Aerodyne high-resolution aerosol mass spectrometer (HR-AMS;

HR-AMS signals are routinely quantified with the free,
open-source “PIKA” software

PIKA is widely used to prepare HR-AMS data for least-squares statistical models
such as positive matrix factorization (PMF)

The current version of PIKA (1.10H) estimates HR-AMS uncertainties from the
square root of the number of estimated ion counts

The significance and magnitude of peak-integration uncertainties in
PIKA-analysed HR-AMS data have not been previously addressed.

This manuscript addresses peak-integration uncertainties in PIKA by using a test data set to explore and understand their origins, with methods that are intended to be applicable to any HR-AMS or other mass spectrometer. The results of this empirical analysis are then used to construct a Monte Carlo model of the PIKA peak-fitting procedure, which allows the magnitude of peak-integration uncertainties to be estimated for well-resolved (isolated) peaks and for overlapping peaks. This empirically based approach allows several assumptions behind uncertainty estimation to be directly evaluated and may be applied to any new data set.

The manuscript is structured as follows. First,
Sect.

Based on the constraints established in the previous section,
Sect.

A number of mathematical symbols and abbreviations are used throughout;
a list is provided in Table

List of symbols and abbreviations. Symbols used only once in the text are omitted from this list.

Throughout this manuscript, the terms “imprecision” and “bias” are used
when referring respectively to random and independent errors (averaging to
0) and to errors of constant value. The distinction between these two
concepts varies naturally at different stages of the analysis: if
peak fitting is biased by an effect that varies from peak to peak, then an
imprecision in the overall set of fitted peaks will result. As we discuss
below, this is the case when peak positions are constrained by an

Four conceptual categories contributing to HR-AMS uncertainties can be defined: interpretation, counting, instrumental, and analysis uncertainties. These uncertainties may be defined and addressed as follows.

Interpretation uncertainties arise when a given signal may originate from sources
other than the analyte. For example, in the AMS,

Interpretation uncertainties may also occur if an ion is misidentified or omitted in the PIKA software. Such errors are not considered in this work.

Counting uncertainties estimate the degree to which a count of

Instrumental uncertainties may arise due to electronic noise or to
changes in the performance of various instrumental components.
In the latter case, which for example may reflect changes in detector
sensitivity or long-term performance fluctuations, the significance of such
variations can be evaluated by Allan-variance analysis
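
As an illustration of the Allan-variance analysis mentioned here, the following is a minimal Python sketch of the standard non-overlapping estimator. The function name `allan_variance` and the averaging factor `m` (number of consecutive samples per block) are hypothetical names introduced for this example. The series `y` is assumed to be evenly sampled, and nothing here is taken from the PIKA code.

```python
def allan_variance(y, m):
    """Non-overlapping Allan variance of an evenly sampled series.

    y : sequence of measurements (e.g. a signal recorded at fixed intervals)
    m : averaging factor, i.e. number of consecutive samples per block
        (hypothetical parameter name; averaging time = m * sampling interval)
    """
    n_blocks = len(y) // m
    if n_blocks < 2:
        raise ValueError("need at least two complete blocks of length m")
    # Average the series over consecutive, non-overlapping blocks.
    means = [sum(y[k * m:(k + 1) * m]) / m for k in range(n_blocks)]
    # Half the mean squared difference between adjacent block averages.
    squared_diffs = [(b - a) ** 2 for a, b in zip(means, means[1:])]
    return 0.5 * sum(squared_diffs) / len(squared_diffs)
```

Evaluating the function over a range of `m` shows how the variance changes with averaging time. For white noise it is expected to fall as `m` increases, whereas drift makes it rise again at long averaging times.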

For an SP-AMS,

Finally, analysis uncertainties reflect the confidence with which mass-spectral peak areas can be determined. These uncertainties may comprise both biases and imprecisions. The remainder of this manuscript focusses on these uncertainties in PIKA, with an emphasis on the peak-integration imprecision, which is most relevant to PMF (and other least-squares-minimization techniques). The next section describes the PIKA fitting procedure to provide a basis for this discussion.

In PIKA (up to the current version, 1.10H), the signal intensity

To a first approximation, the peak

In practice, the Gaussian model is modified to account for peak broadening,
skewness, tailing, or other instrumental non-idealities

To improve the robustness of the PIKA fitting routine against poorly resolved
peaks and noisy data, some of the parameters in Eq. (

The

The

Thus, the three inputs to Eq. (

With the inputs described above, Eq. (

Example of a PIKA fit. The raw data (

The fitted

After estimating

In the HR-AMS, more than one peak is typically observed at each integer

The uncertainty of fits to Eq. (

The next section therefore focusses on the case of well-resolved peaks,
before Sect.

The integration of fits to well-resolved pseudo-Gaussian peaks via
Eq. (

In Eq. (

The value of

The estimation of

In this subsection we describe the theoretical basis on which the
RMSE, defined below, has been interpreted in Sects.

Since the width, location, and shape of the PIKA pseudo-Gaussian fit function
(Eq.

A fit to Eq. (

The influence of such variations on the imprecision of the fitted

To quantify the error in a given fit of

As the RMSE of a single peak is the main quantity of interest below, the
range

The RMSE is not used to directly infer the error in the fitted

For a sufficiently large sample, the expected value of the squared RMSE is
the sum of the model variance and the squared model bias
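
The decomposition referred to here can be written out explicitly. The notation below is an assumption introduced for illustration: \hat{y} denotes the fitted value at a given point and y the value it is compared against, treated as fixed. The cross term vanishes because \hat{y} - \mathrm{E}[\hat{y}] has zero expectation.

```latex
\mathrm{E}\!\left[\mathrm{RMSE}^2\right]
  = \mathrm{E}\!\left[(\hat{y} - y)^2\right]
  = \underbrace{\mathrm{E}\!\left[\left(\hat{y} - \mathrm{E}[\hat{y}]\right)^2\right]}_{\text{model variance}}
  + \underbrace{\left(\mathrm{E}[\hat{y}] - y\right)^2}_{\text{squared model bias}}
```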

Letting

For small peaks, counting uncertainties dominate the fit residual:

For large peaks, the counting uncertainties

Thus, for signals large enough for noise to be negligible, a plot of

The RMSEs of a number of well-resolved peaks in the test data set are
plotted in Fig.

The RMSE of standard PIKA fits to well-resolved peaks in the test
HR-AMS data set. Numbers in the legend show the integer

The peaks in Fig.

Two distinct trends are evident in Fig.

The constant

The source of the constant relative RMSE in Fig.

The original magnitude of each error was estimated directly from the
data. Based on these estimates, significantly larger uncertainties
were added to the data, as specified in
Table

Effects of manually introducing errors to the PIKA analysis
procedure. Cases highlighted in bold are plotted in
Fig.

Figure

Response of the RMSE of a representative ion (

With this approach, multiple potential sources of the constant
relative RMSE term can be eliminated: noise in the predicted peak
width

Conversely, two potential sources of the constant relative RMSE can be
identified: noise in the predicted peak location

The relative

One additional implication of Fig.

To explore the impact of errors in

Changes in the RMSE of

Allowing

Further relaxing the

Allowing

Summary of

When both

When the predicted peak location

The dark-shaded data in Fig.

Distribution of prediction errors in

Figure

The lack of a

Although it is readily apparent that a

Scatter plot of the estimated errors in peak-location prediction
(same data as in Fig.

A Monte Carlo approach for the estimation of peak-fitting uncertainties was
developed based on the above discussion. With this approach,
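
The general idea of such a simulation can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure implemented in PIKA. It uses a pure Gaussian in place of the PIKA pseudo-Gaussian peak shape, approximates counting noise by Gaussian noise with variance equal to the signal, and draws the peak-location prediction error from a normal distribution of standard deviation `sd_mu`. All function and parameter names are hypothetical.

```python
import math
import random
import statistics


def gaussian(x, mu, sigma):
    """Unit-height Gaussian peak shape (stand-in for the real peak shape)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def fit_height(xs, data, mu, sigma):
    """Least-squares height of a peak whose location and width are fixed."""
    shape = [gaussian(x, mu, sigma) for x in xs]
    return sum(d * f for d, f in zip(data, shape)) / sum(f * f for f in shape)


def simulate_relative_imprecision(height, sd_mu, sigma=1.0,
                                  n_trials=2000, seed=1):
    """Relative standard deviation of the fitted height over many trials.

    height : true peak height in ion counts
    sd_mu  : assumed standard deviation of the peak-location prediction error
    """
    rng = random.Random(seed)
    xs = [-5.0 + 0.25 * i for i in range(41)]
    truth = [height * gaussian(x, 0.0, sigma) for x in xs]
    fitted = []
    for _ in range(n_trials):
        # Gaussian approximation to counting noise (variance = signal).
        noisy = [d + rng.gauss(0.0, math.sqrt(d)) for d in truth]
        # The fit is constrained to an imperfectly predicted location.
        mu_pred = rng.gauss(0.0, sd_mu)
        fitted.append(fit_height(xs, noisy, mu_pred, sigma))
    return statistics.stdev(fitted) / statistics.mean(fitted)
```

Calling `simulate_relative_imprecision` for increasing values of `height` at a fixed `sd_mu` illustrates the two regimes discussed in the text. For small peaks the result is dominated by counting noise and falls with peak height. For large peaks it levels off at a value set by the location error alone.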

The RMSE of

The Monte Carlo approach used to evaluate

The RMSEs of these fits to the simulated data are plotted in
Fig.

It was noted in Sect.

The imprecision in

The blue points in Fig.

Normalized and centred probability distributions of the simulated
imprecision in fits to well-resolved peaks from the test data set, using the

In practice, it is desirable to estimate

For overlapping peaks, the

In Fig.

Response of the peak-fitting imprecision

Each probability distribution in Fig.

The impact of the

A non-negligible

Example of the relationship between the uncertainty in the
integrated area of a single, well-resolved peak and its absolute area.
The example uses real data for

To place Fig.

The second row of plots in Fig.

The third row of plots in Fig.

Thus, the bias need not be known or estimated for the accurate Monte Carlo
estimation of

Consistent estimates of

An approach to estimating

With

For well-resolved peaks, the first term in Eq. (

While

Fig.

The arguments presented above clearly show that

The importance of high signals means that fitting errors may be especially
important for the high aerosol concentrations that may be measured at the
roadside

Given the variable importance of high signals within different data sets, and the importance of instrument-specific parameters to the peak-integration imprecision as discussed above, an absolute statement of the relevance of these uncertainties in PMF is impossible. Nevertheless, as an example we performed PMF on a synthetic data matrix to demonstrate the significance of addressing peak-fitting errors.

The synthetic data matrix was constructed using the PMF solution presented by

When this synthetic matrix was factorized using only a Poisson imprecision
term to weight the data (i.e. with the largest signals overweighted), the
residual matrix showed significant outliers for the highest signals. For
example, the largest residual outliers for

The appearance of the highest

Overlapping-peak uncertainties estimated by the method outlined in
this manuscript. Four model peaks are shown, defined with the same shape as
observed in the test data set. Prior to fitting, counting noise was added to
each data point and representative mass calibration errors (

It was concluded in Sect.

This approach is also applicable, without modification, to the case of
multiple overlapping peaks. Fig.

The figure shows that, for the two larger signals, peak-fitting imprecisions
were on the order of counting imprecision. For the two smaller peaks,
peak-fitting imprecisions dominate, with the leftmost peak having a large
enough uncertainty to be considered unquantifiable. The influence of counting
imprecision on

The largest uncertainty in this analysis was due to the estimation of the
biases and imprecisions in

Biases in

The above conclusion, that the major cause of fitting errors in PIKA is error in peak-location prediction, raises the question of whether the fit procedure itself might be improved.

It would be preferable to allow a priori knowledge of calibration
uncertainties to be incorporated into the fitting procedure, following
Bayesian theory

In addition to an improved fitting procedure, an improved calibration
procedure would be an obvious recommendation for reducing

By definition, the Monte Carlo approach invariably requires more time than a direct approach. In large HR-AMS data sets, thousands of mass spectra may be recorded, each containing hundreds of peaks to be fitted. On modern personal computers, fitting may consequently take hours, so performing 100 Monte Carlo simulations during exploratory data analysis can be impractical. A compromise between rapid and reliable results is therefore desirable.

In their work,

To minimize the computation time required for the Monte Carlo approach, the
following algorithmic approach could be used. First, the approximation

While only peak-integration and ion-counting uncertainties were addressed in the discussion above, a number of other AMS-specific uncertainties can be identified.

Other AMS-specific uncertainties include the fundamental uncertainty involved
in converting electronic signals at the detector to ion counts. This
conversion is performed after estimating the signal intensity of a single
ion, a process complicated by the signal-thresholding applied by the data
acquisition software (version 4.0.9). We performed this single-ion
measurement procedure on each of the measurement days described herein and
obtained results varying by

A second HR-AMS-specific scenario arises when the intensities of
less-abundant isotope peaks
are predicted based on the intensities of more-abundant isotope peaks.
This procedure propagates fitting
errors across integer

Finally, not all ions follow the peak shape established by the PIKA
calibration procedure. In particular, thermally generated ions such as

Peak-integration uncertainties in the analysis of HR-AMS data by PIKA originate from uncertainties in peak-width prediction and in peak-height fitting. The former uncertainty may be easily estimated from the peak-width calibration procedure; the latter by an empirically constrained Monte Carlo approach.

Peak-fitting uncertainties depend most strongly on errors in the

Since peaks are fitted in PIKA by linearly scaling a predefined function,
peak-fitting errors for well-resolved peaks also scale linearly with peak
height. This leads to a constant-relative-imprecision term in the overall
peak-integration uncertainty. Since a constant relative imprecision scales
linearly with the ion count
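
The consequence of this scaling can be made concrete with a short worked example. The notation is an assumption introduced here: N is the ion count, \alpha is the constant relative imprecision, and the counting and peak-fitting terms are taken to be independent and therefore combined in quadrature. Other uncertainty terms are ignored.

```latex
\sigma_N^2 = \underbrace{N}_{\text{counting}} + \underbrace{(\alpha N)^2}_{\text{peak fitting}}
\quad\Longrightarrow\quad
\frac{\sigma_N}{N} = \sqrt{\frac{1}{N} + \alpha^2},
\qquad
N^{*} = \frac{1}{\alpha^{2}}
```

Here N^{*} is the ion count at which the two terms are equal. For a purely illustrative value of \alpha = 0.01, N^{*} = 10^4 counts. Well below N^{*} the relative imprecision is approximately 1/\sqrt{N}, and well above it the relative imprecision approaches \alpha.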

In a synthetic data set, including the constant-relative-imprecision term
during PMF led to a significant improvement in the accuracy of the solution.
Thus, although peak-integration uncertainties are much smaller than
the uncertainties inherent in AMS-measured mass concentrations or elemental
ratios, neglecting peak-integration uncertainties during PMF of data sets
containing high-signal ions may significantly bias mass concentrations or
elemental ratios of the retrieved factors. The dependence of the relative
imprecision on the

Finally, peak-fitting errors may also increase rapidly when peaks overlap
significantly, potentially becoming much larger than the uncertainties of
well-resolved peaks. It was shown that the fitting imprecision for
overlapping peaks may be estimated directly by a minor modification to the
Monte Carlo approach described above, that is, by intentionally
overestimating the input imprecision such that biases may be neglected. This
overestimate results in a moderate overestimate of the peak-fitting
imprecision but avoids the ion-to-ion biases that would otherwise result
from unquantifiable

The data set used for evaluating and testing fitting uncertainties represents
the mass spectra of fresh, aged, and filtered-and-aged aerosols emitted from
a beech-wood combustion stove. Up to six batches of wood were burnt
consecutively on 3 consecutive days in these experiments. A complete
description of the experimental setup and instrument configuration is given
in

The wood-combustion aerosols were vaporized and ionized in an Aerodyne HR-AMS
equipped with a soot-particle (SP) vaporization module

All data were analysed in Igor Pro (Version 6.2, Wavemetrics, OR, USA) using
a modified version of PIKA, derived from PIKA 1.10H, and custom code. The
modifications to PIKA consisted of improvements to the peak-width
calibration procedure, the selective introduction of errors to the analysis,
algorithmic
improvements to the peak-fitting procedure,
and the implementation of peak-integration uncertainties as part of the
overall PIKA error calculation. The first modification improved the
robustness of the peak-width calibration procedure by replacing the mean peak
width with a trimmed mean, followed by a weighted fit to the data, as
detailed in
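
To illustrate why a trimmed mean is more robust than an ordinary mean, a minimal Python sketch follows. The function name and the trimming fraction are assumptions made for this example, since the fraction actually used is not stated here. The subsequent weighted fit is not shown.

```python
def trimmed_mean(values, proportion_to_cut=0.1):
    """Mean after discarding a fraction of the values from each end.

    proportion_to_cut : fraction removed from both the low and the high end
                        (hypothetical default; must be in [0, 0.5))
    """
    if not 0 <= proportion_to_cut < 0.5:
        raise ValueError("proportion_to_cut must be in [0, 0.5)")
    ordered = sorted(values)
    n_cut = int(proportion_to_cut * len(ordered))
    # Keep only the central part of the sorted values.
    kept = ordered[n_cut:len(ordered) - n_cut]
    return sum(kept) / len(kept)
```

A single outlying peak-width estimate can shift an ordinary mean substantially, whereas the trimmed mean ignores it as long as it falls within the discarded fraction.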

In this paper, “diff” HR-AMS data have been presented. Diff data
represent the difference of “open” measurements (comprising signals from
particulate, gaseous, and background species) and “closed” measurements
(comprising background species from gases and slowly evaporating material).
The same trends seen in the diff data were seen in analogous plots for the
open and closed data; however, the noise regime of the RMSE was much less
noticeable in these data. Diff data were used to allow the two regimes of the
RMSE to be clearly highlighted and to remove inconsequential differences due
to different background levels, for example of

The following peaks were used for

The open-source nature of the PIKA software was essential to this work, making it possible to read and understand the details of the existing AMS analysis procedures. Input from P. Lowdon, M. Tanadini, M. R. Canagaratna, and two anonymous reviewers led to significant improvements in this work.

Edited by: J. Schneider