the Creative Commons Attribution 4.0 License.
Correcting bias in log-linear instrument calibrations in the context of chemical ionization mass spectrometry
Chenyang Bi
Jordan E. Krechmer
Manjula R. Canagaratna
Quantitative calibration of analytes using chemical ionization mass spectrometers (CIMSs) has been hindered by the lack of commercially available standards of atmospheric oxidation products. To accurately calibrate analytes without standards, techniques have been recently developed to log-linearly correlate analyte sensitivity with instrument operating conditions. However, there is an inherent bias when applying log-linear calibration relationships that is typically ignored. In this study, we examine the bias in a log-linear-based calibration curve based on prior mathematical work. We quantify the potential bias within the context of a CIMS-relevant relationship between analyte sensitivity and instrument voltage differentials. Uncertainty in three parameters has the potential to contribute to the bias, specifically the inherent extent to which the nominal relationship can capture true sensitivity, the slope of the relationship, and the voltage differential below which maximum sensitivity is achieved. Using a previously published case study, we estimate an average bias of 30 %, rising to 1 order of magnitude for less sensitive compounds in some circumstances. A parameter-explicit solution is proposed in this work for completely removing the inherent bias generated in the log-linear calibration relationships. A simplified correction method is also suggested for cases where a comprehensive bias correction is not possible due to unknown uncertainties of calibration parameters; it is shown to eliminate the bias on average but not for each individual compound.
The time-of-flight chemical ionization mass spectrometer (ToF-CIMS) has been widely used for online characterization of organic compounds in the atmosphere. Gas-phase analytes are reacted with reagent ions to form analyte ions and then detected and classified by mass spectrometry. Many reagent ions have been examined, with some of the most popular being hydronium (Yuan et al., 2016; Lindinger et al., 1998), acetate (Bertram et al., 2011; Brophy and Farmer, 2016), nitrate (Jokinen et al., 2012; Krechmer et al., 2015), CF_{3}O (Crounse et al., 2006; St Clair et al., 2010), and iodide (Lee et al., 2014; Slusher, 2004). Each reagent ion accesses a different region of chemical space (Riva et al., 2019; Isaacman-VanWertz et al., 2017) and differs in its range of sensitivities, from relatively universal to highly variable. For example, proton transfer reaction is commonly used for measurements of less oxidized compounds with a sensitivity that varies only by a factor of up to 3 or 4 for most analytes (Sekimoto et al., 2017). In contrast, iodide is most useful for semivolatile, oxidized compounds, but sensitivity varies by several orders of magnitude (Iyer et al., 2016; Lopez-Hilfiker et al., 2016).
Unfortunately, these wide ranges in sensitivity pose significant issues in the quantitative measurement of ambient atmospheres. For many atmospheric constituents, it is not possible or not feasible to calibrate using authentic standards, due to a lack of commercial availability and/or chemical instability (thermal lability, flammability, etc.) (Brophy, 2016). Several approaches have consequently been developed to estimate the sensitivity of a CIMS instrument to a given analyte based on either its physicochemical properties (e.g., dipole moment and polarizability) (Sekimoto et al., 2017) or its observed response (e.g., induced dissociation) to changing instrument conditions (Zaytsev et al., 2019; Lopez-Hilfiker et al., 2016). Estimating instrument sensitivity based on derived relationships between sensitivity and other properties inherently carries some uncertainty, as the relationship is unlikely to be ideal and typically includes some scatter. Nevertheless, this approach of “derived sensitivity” is often the best (or only) tool available for calibration, so a close look is warranted into the implications of this approach for the error of a single analyte, as well as the combined error of the sum of many analytes.
Previous work, which we discuss in detail in the following section, has examined the uncertainty in estimating a parameter from a derived relationship (i.e., using a regression model to predict a value). Specifically, prediction of a value (e.g., sensitivity) from a linear model introduces no bias and has normally distributed error, but in more complex relationships (involving log transformations, step functions, etc.), bias and other errors may be introduced. Many of the derived sensitivity relationships used for CIMS have more complex forms, so the overarching goal of this work is to evaluate and correct for biases and other errors in the types of relationships used for estimating CIMS sensitivities.
We focus in this work on the calibrations of analytes in an iodide CIMS because (1) this measurement technique is widely used, (2) it has orders of magnitude variance in sensitivities (Iyer et al., 2016), and (3) estimating its sensitivity often relies on a complex (log-linear, piecewise) sensitivity relationship. Iyer et al. (2016) have shown that the sensitivities of analytes in an iodide CIMS are log-linearly correlated with the binding enthalpy of the iodide-analyte adduct, with some maximum sensitivity that is limited by the rate of collisions between the analyte and the reagent ion. Lopez-Hilfiker et al. (2016) further suggested that modulating voltage differences in certain components of the mass spectrometer (i.e., between the skimmer of the small-segmented quadrupole and the entrance of the big-segmented quadrupole) can introduce declustering of the iodide-molecule adduct. The parameter, dV_{50}, which is the voltage difference where signals of a compound are at half-maximum, is reported to be an indicator of the binding enthalpy of the adduct (Lopez-Hilfiker et al., 2016; Iyer et al., 2016). Therefore, the iodide-CIMS sensitivities can be predicted by dV_{50} based on a log-linear relationship, up to a plateau of maximum sensitivity at sufficiently high binding enthalpies.
The objective of this study is to understand the error in the calibrated mass of an analyte or the sum of multiple analytes measured by a CIMS. The work here focuses on sensitivities that are predicted using log-transformed derived relationships as in the case of the iodide-CIMS voltage scan method, but any calibration approach that relies on mathematically transformed relationships should be studied in this manner, and biases should be corrected. We first examine the problem by comparing simple linear and log-linear models used to estimate instrument sensitivity, then expand these ideas to the more complex relationship used in iodide-CIMS voltage scanning, and finally provide and evaluate corrections to reduce or even remove the bias.
2.1 Linear fits
In some cases, the sensitivity of an instrument can be estimated from a direct linear fit to a property or parameter. For example, sensitivity of a flame ionization detector is linearly correlated with oxygen-to-carbon content of an analyte (Hurley et al., 2020). In a linear model such as this, the average residual of the fit (i.e., the difference between the true sensitivity and the predicted sensitivity) will necessarily be equal to zero. In other words, there is no difference between average true and average modeled sensitivity. The sensitivity of any given analyte might be uncertain, but those uncertainties are normally distributed around the model, so the potential overprediction is equal in scale to the potential underprediction. The average sensitivity measured for each analyte is therefore unbiased, and the summed mass of multiple ions is consequently unbiased. Specifically, relative uncertainty, σ_{sum}, in the summed mass or concentration, C_{sum}, of N analytes is the quadrature sum of the absolute uncertainties of the individual analytes (the product of each relative uncertainty, σ_{i}, and concentration, C_{i}), normalized by C_{sum}:
$$\sigma_{\mathrm{sum}} = \frac{\sqrt{\sum_{i=1}^{N} \left(\sigma_{i} C_{i}\right)^{2}}}{C_{\mathrm{sum}}} \qquad (1)$$
In cases where relative uncertainty of each analyte is equal (e.g., “instrument uncertainty is 20 %”), ${\mathit{\sigma}}_{\mathrm{1}}={\mathit{\sigma}}_{\mathrm{2}}=\mathrm{\dots}={\mathit{\sigma}}_{N}$ and Eq. (1) can be rewritten as
$$\sigma_{\mathrm{sum}} = \sigma_{i}\, \frac{\sqrt{\sum_{i=1}^{N} C_{i}^{2}}}{C_{\mathrm{sum}}} \qquad (2)$$
This equation has two extreme conditions. When one compound dominates total mass, C_{sum} is essentially equal to C_{i}, and this equation collapses to
$$\sigma_{\mathrm{sum}} = \sigma_{i} \qquad (3)$$
In this case, relative uncertainty is equal to that of a single analyte. At the other extreme condition, when all N analytes are equal in concentration, this equation collapses to
$$\sigma_{\mathrm{sum}} = \frac{\sigma_{i}}{\sqrt{N}} \qquad (4)$$
In most cases, neither a single analyte will dominate summed mass, nor will all analytes be evenly distributed, so uncertainties in the summed mass of real-world measurements likely fall between the extremes of ${\mathit{\sigma}}_{i}/\sqrt{N}$ and σ_{i}. From Eqs. (3) and (4) it is clear then that, when calibration relies on linear fits, relative uncertainty in summed mass is generally lower than uncertainty in the mass of an individual analyte. In other words, as a set of analytes gets larger, their average predicted sensitivities are increasingly well described by the average model.
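The two limiting cases can be checked numerically. The following is a minimal sketch, assuming that the absolute uncertainties of the individual analytes (relative uncertainty times concentration) add in quadrature; the function name, the 20 % uncertainty, and the concentration values are illustrative choices, not values from this work.

```python
import math

def relative_uncertainty_of_sum(concentrations, relative_sigmas):
    """Relative uncertainty of a summed concentration.

    Absolute uncertainties (sigma_i * C_i) are added in quadrature and
    then normalized by the summed concentration.
    """
    total = sum(concentrations)
    variance = sum((s * c) ** 2 for s, c in zip(relative_sigmas, concentrations))
    return math.sqrt(variance) / total

n = 100
sigma = 0.20  # hypothetical 20 % relative uncertainty for every analyte

# All analytes equal in concentration: expected to approach sigma / sqrt(N).
equal_mix = relative_uncertainty_of_sum([1.0] * n, [sigma] * n)

# One analyte dominates the total: expected to approach sigma itself.
dominated = relative_uncertainty_of_sum([1e6] + [1.0] * (n - 1), [sigma] * n)

print(f"equal concentrations: {equal_mix:.4f}")
print(f"one dominant analyte: {dominated:.4f}")
```

Any realistic mixture should fall between these two printed values.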
2.2 Log-transformed fits
It is tempting to assume that the conclusions drawn from linear fits are generalizable: that the sum of many analytes is less uncertain than any given analyte. However, this conclusion has some truth, as well as some limitations, when a mathematical transformation (e.g., the logarithm) is applied to data to linearize it. The case we address here is specifically when log(sensitivity), not sensitivity, is correlated to some other parameter, as in the case of iodide-CIMS voltage scanning. What we present here is substantively similar to the treatment by Miller (1984) of the case of linear fits to natural-log-transformed data, through the lens of its implications for atmospheric measurements.
A linear fit through log-transformed data can be described as
$$\log(Y) = \alpha + \beta X + \varepsilon \qquad (5)$$
where the log-transformed value of Y is described by two coefficients (α and β) describing a linear relationship with X and an error term, ε, describing deviation in the true value from the fit. In such a fit, the error term is assumed to be normally distributed in logarithmic terms (i.e., ε is normally distributed), meaning the corresponding multiplicative error is lognormally distributed in linear terms.
To understand the effect of this error term on a real instrument, we consider a thought experiment presented in Fig. 1, though the discussion here applies to linear fits through any log-transformed data. A distribution of points is shown with a normal distribution of “scatter” around an average linear fit describing the relationship between log(sensitivity) and the parameter, dV_{50}, that is an empirical description of the binding enthalpy of the analyte with the reagent ion. Sensitivity and dV_{50} of 100 analytes are back-calculated from a predefined log-linear fit (i.e., slope and intercept of the line) with a distribution of scatter described by σ_{scatter}=0.4 log units (i.e., a factor of 2.5, similar to previously estimated uncertainty in an iodide CIMS; Isaacman-VanWertz et al., 2018). Consider two analytes of dV_{50}=5.0 V (i.e., blue circles in Fig. 1), which have an equal probability of occurring at one sigma above or below this fit. Using this log-linear fit, the sensitivity that would be assigned to both analytes is 10^{0}=1 (in units of signal per mass, scaled arbitrarily). However, one analyte has a true sensitivity of 10^{0.4}=2.5 signal/mass while the other has a true sensitivity of 10^{−0.4}=0.4 signal/mass. The average sensitivity of these two components is therefore 1.45 signal/mass, 45 % higher than the predicted value. In other words, uncertainty in log terms is implicitly “factor”-based uncertainty as opposed to “percentage”-based uncertainty, and a factor of 2.5 (i.e., 0.4 log units) larger is a greater absolute difference than a factor of 2.5 smaller. Taking this example one step further, consider an environment in which both analytes are present in equal mass, e.g., one mass unit each for two mass units in total. Signal generated by this instrument from both analytes would equal 0.4 + 2.5 signal units = 2.9 signal units. In turn, the 2.9 signal units would be interpreted using the predicted sensitivities of 1 signal/mass, calculating a total mass of 2.9 mass units, 45 % higher than the true mass of 2 mass units. Summing increasingly large numbers of ions does not remove this bias. Instead, a correction must be made to the log-linear model to account for this difference between true and predicted average sensitivity.
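The arithmetic of this thought experiment can be reproduced in a few lines. This is only an illustrative sketch of the two-analyte example above; the variable names are arbitrary.

```python
sigma_log = 0.4                     # one-sigma scatter, in log10 units
predicted_sensitivity = 10 ** 0.0   # value read off the fit at dV50 = 5.0 V

# One analyte sits one sigma above the fit, the other one sigma below it.
true_sensitivities = [10 ** sigma_log, 10 ** -sigma_log]
true_masses = [1.0, 1.0]            # one mass unit of each analyte

# The instrument responds with the true sensitivities ...
signal = sum(s * m for s, m in zip(true_sensitivities, true_masses))

# ... but the signal is converted back to mass with the fitted sensitivity.
fitted_mass = signal / predicted_sensitivity
bias = fitted_mass / sum(true_masses) - 1.0

print(f"signal = {signal:.2f}, fitted mass = {fitted_mass:.2f}, bias = {bias:+.1%}")
```

The overestimate arises even though both analytes deviate from the fit by exactly the same amount in log terms.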
Correcting this bias requires a proper consideration of the true average of the error term, ε, in linear terms. The true value of Y can be calculated as
$$Y = 10^{\left(\alpha + \beta X\right)} \times 10^{\varepsilon} \qquad (6)$$
The median of a lognormal distribution corresponds to the median of the log-transformed distribution, so the median value of Y is correctly represented by this equation. However, as observed in the example shown in Fig. 1, the mean value of a lognormal distribution is higher than the value corresponding to the mean of the log-transformed distribution. Specifically, the mean value of a lognormal distribution with a width of σ and a median of 0 in log terms, for any logarithm base, B, is
$$\overline{B^{\varepsilon}} = B^{\frac{1}{2}\ln(B)\,\sigma^{2}} = e^{\frac{1}{2}\left(\ln B\right)^{2}\sigma^{2}} \qquad (7)$$
Both forms of this equation given are equivalent, and for a natural-lognormal distribution, it collapses to the expected form of ${e}^{\frac{\mathrm{1}}{\mathrm{2}}{\mathit{\sigma}}_{\mathrm{scatter}}^{\mathrm{2}}}$ (Miller, 1984). Equations (6) and (7) can be combined to yield a full equation for accurately estimating the mean expected value of Y using a linear fit through base-10-log-transformed data:
$$\overline{Y} = 10^{\left(\alpha + \beta X\right)} \times 10^{\frac{1}{2}\ln(10)\,\sigma_{\mathrm{scatter}}^{2}} \qquad (8)$$
Implementation of this error-correction term removes the expected bias. Figure 2 demonstrates the bias in average mass in the simple scenario discussed above as a function of the scatter around a log-linear model. Inherent bias in the sensitivity, and thus measured mass, of ions is modest for small values of σ_{scatter} but quickly becomes substantial with increasing σ_{scatter}. Introducing the bias correction term removes bias entirely. The magnitude of this bias is independent of assumptions about the relationship shown in Fig. 1, such as its slope or the range of dV_{50} across which it is applied.
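The growth of the bias with scatter can be tabulated directly from the mean of a lognormal distribution. The sketch below assumes the standard result that, for ε normally distributed with mean 0 and standard deviation σ, the mean of B^ε is B^(½ ln(B) σ²); the function name and the σ values scanned are illustrative.

```python
import math

def lognormal_mean_factor(sigma, base=10.0):
    """Mean of base**eps for eps ~ Normal(0, sigma**2).

    This is the factor by which the true mean sensitivity exceeds the
    value predicted by the nominal (median) log-linear fit.
    """
    return base ** (0.5 * math.log(base) * sigma ** 2)

for sigma in (0.1, 0.2, 0.4, 0.6):
    factor = lognormal_mean_factor(sigma)
    print(f"sigma_scatter = {sigma:.1f} log units -> uncorrected bias = {factor - 1:+.1%}")
```

Multiplying the nominal sensitivity by this factor before converting signal to mass is what removes the bias on average.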
Equations (4) and (8) suggest two important conclusions: (1) summing multiple analytes reduces the uncertainty in the summed concentration, and (2) analytes and sums of analytes calibrated using log-transformed relationships are inherently biased. To explore the combined effects of these two conclusions, we perform a Monte Carlo analysis that simulates the real-world application of log-transformed sensitivity models. N simulated analytes are generated with a randomly assigned “true sensitivity” defined by the relationship shown in Fig. 1 with a Gaussian distribution of error, σ. Each analyte is assigned a random “true sampled mass” spanning 6 orders of magnitude (i.e., 10^{−3} to 10^{3} arbitrary mass units). A simulated signal produced by each analyte is calculated by multiplying its true sensitivity by its true sampled mass. The nominal log-linear model is used to estimate a fitted sensitivity for each analyte, which is used to convert the signal to the fitted mass of an analyte. The summed fitted mass of all N analytes is compared to the summed true sampled mass to calculate the error in the fitted mass; 100 000 such simulations of N analytes yield a probability distribution of expected error.
The combined effects of the two statistical trends implied by Eqs. (4) and (8) are clear in Fig. 3. As the number of analytes, N, increases, the sum of the mass converges toward a tighter distribution of uncertainty (i.e., the sum becomes less uncertain). However, the mass to which the distribution converges is inherently biased. In other words, the sum of five analytes may span a wide range of potential error, but on average they will be biased high by ∼ 50 %. Increasing the number of analytes just improves the probability that the sum is ∼ 50 % too high. The sum of 500 analytes, each with an uncertainty of a factor of 2.5, has high precision but inherently biased accuracy.
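A reduced version of this Monte Carlo analysis can be sketched as follows. It is not the simulation used in this work: it assumes log-uniformly distributed masses, uses 2000 rather than 100 000 simulations per N, and exploits the fact that, for scatter alone, the nominal sensitivity cancels so that only the factor 10^ε matters. All names are illustrative.

```python
import math
import random
import statistics

def simulate_sum_ratio(n_analytes, sigma_scatter, rng):
    """Ratio of summed fitted mass to summed true mass for one simulated sample."""
    true_total = 0.0
    fitted_total = 0.0
    for _ in range(n_analytes):
        # Assumption: masses are log-uniform over 6 orders of magnitude.
        true_mass = 10 ** rng.uniform(-3.0, 3.0)
        # Deviation of true log10(sensitivity) from the nominal fit.
        eps = rng.gauss(0.0, sigma_scatter)
        # signal = true_sensitivity * mass; fitted mass = signal / nominal_sensitivity,
        # so the nominal sensitivity cancels and only 10**eps remains.
        true_total += true_mass
        fitted_total += true_mass * 10 ** eps
    return fitted_total / true_total

rng = random.Random(0)  # fixed seed so the sketch is reproducible
sigma_scatter = 0.4
correction = 10 ** (0.5 * math.log(10) * sigma_scatter ** 2)

results = {}
for n in (5, 50, 500):
    ratios = [simulate_sum_ratio(n, sigma_scatter, rng) for _ in range(2000)]
    results[n] = ratios
    mean_ratio = statistics.mean(ratios)
    print(f"N = {n:3d}: mean fitted/true = {mean_ratio:.2f}, "
          f"spread (st. dev.) = {statistics.stdev(ratios):.2f}, "
          f"mean after correction = {mean_ratio / correction:.2f}")
```

The spread narrows as N grows while the mean ratio stays near the same biased value, and dividing by the correction factor brings the mean back toward 1.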
This approach assumes that true sensitivities are not perfectly represented by the nominal relationship (i.e., scatter is “real”); this is in contrast to the case in which each analyte is actually truly described by the fit and deviations are due to measurement uncertainties (i.e., scatter is measurement error). If the latter is discovered in subsequent literature to be the case, no bias would truly exist and the work in this paper would be extraneous. However, we believe the more likely case is that the scatter is a real consequence of the calibration approach for two reasons. Firstly, it is unlikely that an empirically derived relationship captures with perfect fidelity the sensitivity of an analyte. Secondly, because an iodide CIMS classifies analytes by elemental formula with no regard to molecular structure, the dV_{50} of each analyte (i.e., ion) is typically some combination of multiple compounds (Bi et al., 2021). It therefore inherently represents some composite of a distribution of analytes and is unlikely to equally represent all analytes in the mixture. Nevertheless, scatter measured by a real instrument provides some insight into true scatter; imperfect measurements of many compounds scattered around the nominal relationship would yield the nominal relationship with some uncertainty that represents the true variability (at least to some greater or lesser degree). For the purposes of real-world instruments, then, we suggest that it is reasonable to use observed uncertainty in model parameters as an estimate of their true variability and will do so throughout this work. However, we note that the sensitivity of some compounds predicted by the log-linear relationship between sensitivity and dV_{50} may have high uncertainty, likely due to the empirical nature of the relationship (Bi et al., 2021).
4.1 Sources of uncertainty
So far, this work has treated a fairly simple case: normally distributed error in log-transformed data. However, in the case of an iodide-CIMS calibration using voltage scans, this may not accurately represent the form of uncertainty. The true relationship between log(sensitivity) and dV_{50} has several parameters, each of which could be uncertain or may represent a central tendency of an inherently imperfect relationship. Figure 4 shows the nominal relationship between sensitivity and dV_{50} of a form that is typically considered for an iodide CIMS using voltage scans, as well as an illustrative potential spread of true sensitivities around this relationship. At some dV_{50,max}, the instrument reaches maximum sensitivity, S_{max}, and it might be reasonable to expect that analytes closer to this value adhere more closely to the general relationship than compounds that diverge significantly from maximum sensitivity. In this case, variability in sensitivity may itself be partly (but perhaps only partly) a function of dV_{50} (i.e., heteroscedastic). Note that while compounds near maximum sensitivity are generally well predicted, the nominal relationship in Fig. 4 assumes that sensitivities of low-sensitivity analytes may diverge by roughly an order of magnitude from the general trend.
The relationship shown in Fig. 4 is defined by four critical parameters that may have some uncertainty or may deviate from the nominal relationship. The distribution of sensitivities can be described by some distribution in each of the four parameters, in the units and forms they have been previously considered (Isaacman-VanWertz et al., 2018).

σ_{scatter} is the scatter in true sensitivity around the nominal relationship (i.e., the extent to which the average relationship inherently describes the data). Units are log units of sensitivity.

σ_{slope} is the variability in the slope of the relationship between log(sensitivity) and dV_{50}. Units are log units of sensitivity per volt.

${\mathit{\sigma}}_{\mathrm{d}{V}_{\mathrm{50}},\mathrm{max}}$ is the variability in the inflection point, the dV_{50} voltage at which sensitivity reaches its maximum. The unit is volts.

${\mathit{\sigma}}_{{S}_{\mathrm{max}}}$ is the extent to which the nominal maximum sensitivity describes the sensitivity of compounds that are expected to be maximally sensitive. The unit is percent.
Each deviation from the nominal relationship will lead to inherent bias as in the simple log-transformed example discussed above. The exception to this issue is the fourth source of variability, variability in maximum sensitivity. Because this parameter is typically known reasonably well, uncertainty is low and best considered as a percentage. Uncertainty in this parameter is therefore not in log terms and does not introduce bias (i.e., 10 % lower and 10 % higher are equally different from the nominal maximum sensitivity).
4.2 Bias correction
To correct for the three potential sources of bias, we introduce Eq. (9) to calculate the expected sensitivity, S, of an analyte of a given dV_{50}.
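One form of Eq. (9) consistent with the term-by-term description in this section can be written out as below. It is a reconstruction rather than a quotation: it assumes that the three sources of variability are independent and normally distributed in log terms, so that each contributes a lognormal mean factor of the same type as in Eq. (8), with variances σ_{scatter}², (σ_{slope} × ΔdV_{50})², and (slope × σ_{dV_{50},max})².

```latex
S = S_{\max}
    \left(10^{\,\mathrm{slope}\times\Delta \mathrm{d}V_{50}}\right)
    \times 10^{\frac{1}{2}\ln(10)\,\sigma_{\mathrm{scatter}}^{2}}
    \times 10^{\frac{1}{2}\ln(10)\left(\sigma_{\mathrm{slope}}\,\Delta \mathrm{d}V_{50}\right)^{2}}
    \times 10^{\frac{1}{2}\ln(10)\left(\mathrm{slope}\;\sigma_{\mathrm{d}V_{50},\max}\right)^{2}}
\qquad (9)
```

In this form the second correction grows with ΔdV_{50} and the third grows with the magnitude of the slope. The third term is only approximate near the inflection point, where the max(·, 0) truncation in ΔdV_{50} breaks the simple Gaussian picture.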
A critical term in this calculation is the extent to which the dV_{50} of an analyte is below the nominal inflection point dV_{50,max}, which is defined as going to 0 in the region of maximum sensitivity, ΔdV_{50} = max(dV_{50,max} − dV_{50}, 0). The use of ΔdV_{50} has changed the sign of the slope compared to when it is plotted against dV_{50} (top x axis in Fig. 4). The slope is defined as change in log(sensitivity) per unit ΔdV_{50} and is therefore necessarily a negative value (i.e., sensitivity decreases with ΔdV_{50}). The first two terms in this equation (i.e., ${S}_{\mathrm{max}}\left({\mathrm{10}}^{\mathrm{slope}\times \mathrm{\Delta}\mathrm{d}{V}_{\mathrm{50}}}\right)$) constitute the nominal relationship, while the last three terms introduce corrections for bias due to σ_{scatter}, σ_{slope}, and ${\mathit{\sigma}}_{\mathrm{d}{V}_{\mathrm{50}},\mathrm{max}}$, respectively. We note that the first two terms are identical in form to the sensitivity equation used in previous work (Isaacman-VanWertz et al., 2018), except Eq. (9) excludes an additional correction term (“S_{0}”) that is outside the scope of the present work but is typically included to account for partial declustering at ΔdV_{50}=0.
Unlike in the simple case of σ_{scatter}, note that bias correction factors for σ_{slope} and ${\mathit{\sigma}}_{\mathrm{d}{V}_{\mathrm{50}},\mathrm{max}}$ are not independent of parameters in the nominal relationship. Bias caused by σ_{slope} increases with the range of dV_{50} across which the relationship is applied. Bias caused by ${\mathit{\sigma}}_{\mathrm{d}{V}_{\mathrm{50}},\mathrm{max}}$ increases with the slope, which makes sense when considered at its extreme: if there were no decrease in sensitivity with dV_{50}, then the inflection point would be irrelevant. Given these dependencies, the scope of bias and the efficacy of the bias correction term must be explored using some approximation of typical CIMS conditions. For this work we use the calibration parameters used by Isaacman-VanWertz et al. (2018): $\mathrm{d}{V}_{\mathrm{50},\mathrm{max}}=\mathrm{6.3}$ V, slope = −0.9 log units sensitivity per volt, up to a maximum ΔdV_{50}=2.3 V (a minimum effective sensitivity was applied by Isaacman-VanWertz et al. (2018), which is irrelevant to this work). Using these bounding conditions, the bias introduced by the model parameters is shown in Fig. 5, in which the other sources of variability are held at 0 to isolate the effect of each parameter. As in Fig. 2, bias quickly increases with σ for all parameters except ${\mathit{\sigma}}_{{S}_{\mathrm{max}}}$ (as expected). The correction factors introduced in Eq. (9) almost fully remove all bias.
To examine the combined effect of variability in all four parameters, we investigate the conditions described for a real-world iodide CIMS by Isaacman-VanWertz et al. (2018). Uncertainty was estimated based on reported values in that work: σ_{scatter}= 0.2 log units, σ_{slope}= 0.125 log units per volt, ${\mathit{\sigma}}_{\mathrm{d}{V}_{\mathrm{50}},\mathrm{max}}=\mathrm{0.125}$ volts, and ${\mathit{\sigma}}_{{S}_{\mathrm{max}}}=\mathrm{85}$ % (calculated as the approximate standard deviation of their two reported possible values for S_{max}). No value for σ_{scatter} was actually reported as no measurements were available to constrain the inherent scatter in sensitivity, so 0.2 log units is assigned here as an estimate that produces approximately the same average uncertainty reported in that work for individual ions (a factor of ∼ 2.5).
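To give a feel for the size of the correction under these case-study parameters, the sketch below evaluates a parameter-explicit correction at a few dV_{50} values. The functional form is an assumption: the three sources of variability are treated as independent Gaussian perturbations in log terms, contributing variances σ_{scatter}², (σ_{slope} × ΔdV_{50})², and (slope × σ_{dV_{50},max})². The function name and the dV_{50} values scanned are hypothetical.

```python
import math

LN10 = math.log(10)

def expected_sensitivity(dv50, s_max=1.0, slope=-0.9, dv50_max=6.3,
                         sigma_scatter=0.2, sigma_slope=0.125,
                         sigma_dv50_max=0.125):
    """Return (nominal, bias-corrected) sensitivity for an analyte of a given dV50.

    Assumed form: nominal relationship times one lognormal mean factor
    built from the summed log-space variance of the three bias sources.
    """
    delta = max(dv50_max - dv50, 0.0)           # volts below the inflection point
    nominal = s_max * 10 ** (slope * delta)     # nominal log-linear relationship
    log_variance = (sigma_scatter ** 2
                    + (sigma_slope * delta) ** 2
                    + (slope * sigma_dv50_max) ** 2)
    corrected = nominal * 10 ** (0.5 * LN10 * log_variance)
    return nominal, corrected

for dv50 in (4.0, 5.0, 6.3, 7.0):
    nominal, corrected = expected_sensitivity(dv50)
    print(f"dV50 = {dv50:.1f} V: nominal = {nominal:.4f}, "
          f"corrected = {corrected:.4f}, ratio = {corrected / nominal:.3f}")
```

Under these assumptions the correction ratio is larger for analytes far below the inflection point than for those on the plateau, which is the heteroscedastic behavior described in Sect. 4.1.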
The example shown in Fig. 6 provides a case study to examine the importance and the limitations of bias correction. Without introducing the correction parameters, the sum of 225 analytes (ions measured by iodide CIMS) is expected to yield a mass roughly 30 % too high, with a range of possible measurements spanning from negative error to nearly a factor of 2. As described in Sect. 2.2, this 30 % bias in the sum is caused by an average 30 % bias in each individual analyte, so the bias exists for one analyte as well as for the sum of analytes. Introducing the correction parameters in Eq. (9) removes this bias and tightens the distribution, but the range of possible sums is still substantial. In the work upon which this case study is based, Isaacman-VanWertz et al. (2018) used a similar Monte Carlo approach to calibrate all ions, explicitly considering a distribution of uncertainty in calibration parameters, and so likely avoided introducing bias. Notably, they estimated that a factor of 3 uncertainty in any given analyte led to an uncertainty of ∼ 60 % in the sum of the 225 ions measured, comparable to the width of the distribution shown in Fig. 6. The approach described here alleviates the need to perform a full Monte Carlo approach in future work seeking to calibrate large numbers of ions, instead using Eq. (9) to remove the bias in average sensitivities.
To remove bias in CIMS calibration, the correction terms in Eq. (9) should be included in the calculation of an analyte's sensitivity. However, in many real-world cases, the number of calibrants to establish the log-linear relationship is limited (e.g., fewer than 10 in Lopez-Hilfiker et al., 2016; Mattila et al., 2020), so it may not be feasible to separately treat uncertainty in all four parameters. A simplification of the detailed, parameter-explicit approach here could instead treat all forms of uncertainty as some average residual between the nominal and true sensitivities with an effective scatter, ${\mathit{\sigma}}_{\mathrm{scatter}}^{\mathrm{eff}}$. Such an approach would apply only the first correction term using this average scatter, ignoring the terms dependent on dV_{50} and slope, and would implicitly assume that uncertainty is homoscedastic. This simplified approach, shown below in Eq. (10), is mathematically equivalent to the basic log-transformed case of Eq. (8) and roughly works for low to moderate ${\mathit{\sigma}}_{\mathrm{scatter}}^{\mathrm{eff}}$ but loses skill as the slope and the range in dV_{50} increase.
$$S = S_{\mathrm{max}} \left(10^{\mathrm{slope}\times \Delta \mathrm{d}V_{50}}\right) \times 10^{\frac{1}{2}\ln(10)\left(\sigma_{\mathrm{scatter}}^{\mathrm{eff}}\right)^{2}} \qquad (10)$$
Application of Eq. (10) represents a more feasible approach to the implementation of bias correction under many real-world scenarios than the full, parameter-explicit form of Eq. (9) but requires a careful consideration of the best approach to estimate ${\mathit{\sigma}}_{\mathrm{scatter}}^{\mathrm{eff}}$. In the specific case that σ_{scatter} is the only source of uncertainty (i.e., ${\mathit{\sigma}}_{\mathrm{slope}}={\mathit{\sigma}}_{\mathrm{d}{V}_{\mathrm{50}},\mathrm{max}}=\mathrm{0}$), ${\mathit{\sigma}}_{\mathrm{scatter}}^{\mathrm{eff}}$ must equal σ_{scatter} and error is homoscedastic. Because σ_{scatter} is by definition a description of the error in the model relationship, it can be estimated as the standard deviation of the residual of the log-linear fit (σ_{residual}), and this value must represent a reasonable estimate of ${\mathit{\sigma}}_{\mathrm{scatter}}^{\mathrm{eff}}$. However, a non-negligible caveat to this approach is that ${\mathit{\sigma}}_{{S}_{\mathrm{max}}}$ quantitatively impacts the residual of the log-linear fit but does not introduce bias and thus should not influence the bias correction term. Consequently, the effect of ${\mathit{\sigma}}_{{S}_{\mathrm{max}}}$ needs to be removed from the residual before using it as an estimate of ${\mathit{\sigma}}_{\mathrm{scatter}}^{\mathrm{eff}}$. Fortunately, uncertainty in S_{max} is often reasonably well constrained based on experimental parameters (e.g., uncertainty in the calibration of a maximum sensitivity analyte), so ${\mathit{\sigma}}_{\mathrm{scatter}}^{\mathrm{eff}}$ can be estimated as
$$\sigma_{\mathrm{scatter}}^{\mathrm{eff}} = \sqrt{\sigma_{\mathrm{residual}}^{2} - \sigma_{S_{\mathrm{max}}(\mathrm{log})}^{2}} \qquad (11)$$
where ${\mathit{\sigma}}_{{S}_{\mathrm{max}}\left(\mathrm{log}\right)}$ is the log-equivalent uncertainty in S_{max}, which is typically considered in linear terms. For example, in the case of ${\mathit{\sigma}}_{{S}_{\mathrm{max}}}=\mathrm{10}$ % (i.e., a factor of 0.9), the log-equivalent uncertainty ${\mathit{\sigma}}_{{S}_{\mathrm{max}}\left(\mathrm{log}\right)}$ is log(0.9) × (−1) ≈ 0.046. This linear-to-log conversion is only meaningful for relatively low uncertainty (⪅ 50 %), for which ${\mathit{\sigma}}_{{S}_{\mathrm{max}}\left(\mathrm{log}\right)}$ can be estimated, with ${\mathit{\sigma}}_{{S}_{\mathrm{max}}}$ expressed as a fraction, as
$$\sigma_{S_{\mathrm{max}}(\mathrm{log})} = -\log\left(1 - \sigma_{S_{\mathrm{max}}}\right) \qquad (12)$$
For ${\mathit{\sigma}}_{{S}_{\mathrm{max}}}$ beyond 50 %, the conversion is provided as Eq. (S1), but uncertainty is probably sufficiently high that it should be considered in log terms in any case. In some cases, ${\mathit{\sigma}}_{{S}_{\mathrm{max}}}$ may not be available, so in the Supplement, we examine alternative statistical parameters as estimates of ${\mathit{\sigma}}_{\mathrm{scatter}}^{\mathrm{eff}}$ but find that Eq. (11) is most effective in eliminating the bias.
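The simplified correction can be sketched as a short calculation. The sketch assumes that the contribution of S_{max} is removed from the fit residual by subtraction in quadrature, and that the log-equivalent uncertainty is −log10(1 − σ_{S_max}), consistent with the 10 % worked example above. The function names and the residual value of 0.30 log units are hypothetical.

```python
import math

def sigma_smax_log(sigma_smax_fraction):
    """Log-equivalent of a linear (fractional) uncertainty in S_max.

    Only intended for uncertainties up to about 50 %.
    """
    if not 0.0 <= sigma_smax_fraction <= 0.5:
        raise ValueError("conversion is only meaningful for uncertainty <= 50 %")
    return -math.log10(1.0 - sigma_smax_fraction)

def effective_scatter(sigma_residual, sigma_smax_fraction):
    """Assumed estimate: remove the S_max contribution from the residual in quadrature."""
    s_log = sigma_smax_log(sigma_smax_fraction)
    # Clamp at zero in case the S_max term exceeds the measured residual.
    return math.sqrt(max(sigma_residual ** 2 - s_log ** 2, 0.0))

def simplified_correction(sigma_eff):
    """Single multiplicative correction applied to every analyte."""
    return 10 ** (0.5 * math.log(10) * sigma_eff ** 2)

sigma_residual = 0.30   # hypothetical st. dev. of log-linear fit residuals (log units)
sigma_smax = 0.10       # 10 % uncertainty in S_max, as in the worked example

sigma_eff = effective_scatter(sigma_residual, sigma_smax)
print(f"sigma_Smax(log) = {sigma_smax_log(sigma_smax):.4f}")
print(f"sigma_eff       = {sigma_eff:.4f}")
print(f"correction      = {simplified_correction(sigma_eff):.3f}")
```

Because the same factor is applied regardless of dV_{50}, this estimate can only remove the bias on average.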
^{∗} Equation (S1) should be used to calculate ${\mathit{\sigma}}_{{S}_{\mathrm{max}}\left(\mathrm{log}\right)}$ when ${\mathit{\sigma}}_{{S}_{\mathrm{max}}}\mathit{>}\mathrm{50}$ %.
As shown in Fig. 6 (green line), the average bias in the calibrated mass of analytes can be fully eliminated using the simplified ${\mathit{\sigma}}_{\mathrm{scatter}}^{\mathrm{eff}}$ bias corrections. However, a significant shortcoming of this simplified approach is its implicit assumption of homoscedastic error. Some single average correction will necessarily underestimate the bias in some analytes and overestimate the bias in others. Because uncertainty is expected to increase with decreasing sensitivity, this simplified correction will lead to a systematic bias toward overcorrecting high-sensitivity analytes and undercorrecting low-sensitivity analytes. This issue is demonstrated in Fig. 7, in which the simplified bias correction (green line) represents some average representation of the true bias correction (blue). The effect of this issue is strongly dependent on the relative importance of each source of error. Limitations of the simplified approach are more severe in cases where heteroscedastic errors (e.g., σ_{slope}) are significant (Fig. S5). Not enough data are yet available in the literature to determine the relative importance of uncertainties in each parameter, so the potential downsides of the simplified approach are not yet well constrained. Therefore, parameter-explicit bias correction should be implemented in cases where all four parameters can be reasonably estimated, but the simplified approach remains reasonable where they cannot.
An additional value of ${\mathit{\sigma}}_{\mathrm{scatter}}^{\mathrm{eff}}$ is that it can be considered an indicator of magnitude of the potential bias in retrospective analyses of past datasets. In two previous studies implementing the voltage scan calibration, we found that the calculated ${\mathit{\sigma}}_{\mathrm{scatter}}^{\mathrm{eff}}$ can be as low as 0.012 (Mattila et al., 2020), or as high as 0.29 (Lopez-Hilfiker et al., 2016; Iyer et al., 2016). Based on the log-linear fit and the calculated ${\mathit{\sigma}}_{\mathrm{scatter}}^{\mathrm{eff}}$ in the two previous studies, the average bias in the summed mass of 100 simulated ions would be approximately 2 % and 28 % in Mattila et al. (2020) and Lopez-Hilfiker et al. (2016), respectively, if sensitivities were determined by voltage scanning. However, these two extremes represent voltage scanning of two different instrument voltage regions and are calculated using a limited number of calibrants. The potential range of the ${\mathit{\sigma}}_{\mathrm{scatter}}^{\mathrm{eff}}$ therefore remains unclear, and future work is needed to examine this approach in real-world applications, and the real potential for bias in voltage scanning approaches. Furthermore, the number of calibrants in a voltage scan calibration is often limited due to the lack of commercially available standards covering the entire sensitivity range, so σ_{residual} (and thus ${\mathit{\sigma}}_{\mathrm{scatter}}^{\mathrm{eff}}$) may not adequately capture the true scatter of residuals.
The uncertainty of instrumental measurements that is frequently reported in the literature is typically a measure of the combined instrument precision and accuracy. In contrast, bias represents a systematic error in the accuracy and is distinct from these reported uncertainties. In principle, it is worth comparing the relative magnitude of the two types of errors, as a small bias would likely be negligible in the case of large uncertainty. However, this comparison is difficult because bias may vary significantly depending on the uncertainties of the log-linear fit, with examples shown in Fig. S5 ranging from 8 % to 300 % bias in summed mass of analytes. Recent work by Bi et al. (2021) found uncertainties of a factor of 3–10 for individual ions and ∼ 30 % for the sum of many ions using the voltage scanning method. This summed uncertainty is comparable in scale to the bias determined for the data from Isaacman-VanWertz et al. (2018), indicating that bias is likely non-negligible. For individual ions, the importance of bias correction depends strongly on the dV_{50} of the compound and the scale of the bias correction, though a parameter-explicit bias correction always increases accuracy.
In this work, we examine uncertainty in the case where instrument sensitivity is itself a function of some parameter, with a focus on uncertainty in the summation of multiple analytes. We show that when sensitivity is a linear function of a parameter, the sum of multiple analytes necessarily has lower relative uncertainty than any given analyte. However, when an iodide CIMS is calibrated by the voltage scanning method, which relies on a linear relationship between log-transformed sensitivity and a parameter, an inherent bias is introduced into the sensitivity of analytes. While summing multiple analytes increases the precision of the sum, the bias can only be eliminated by specifically introducing correction terms to the relationship. Although the discussion in this work mainly focuses on iodide CIMS, we believe that this correction can be applied to other CIMSs or, more broadly, to other atmospheric measurement instruments using log-linear calibration relationships.
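The distinction between precision and bias in a summed quantity can be sketched with a small Monte Carlo simulation. This is a generic illustration of lognormal retransformation bias in the spirit of Miller (1984), not a reproduction of Eq. (9) or (10). It assumes that true log10 sensitivities scatter normally around the nominal fitted values, that every analyte has unit true mass, and that the scatter (0.3 log10 units) and the range of nominal sensitivities are hypothetical. The direction of the bias depends on which quantity is treated as scattered, so only its existence and its removal by a multiplicative correction term are of interest here.

```python
import math
import random

LN10 = math.log(10.0)


def lognormal_bias_factor(sigma_log10):
    """Expected value of 10**eps for eps ~ Normal(0, sigma_log10**2)."""
    return math.exp(0.5 * (LN10 * sigma_log10) ** 2)


def mean_summed_mass_ratio(n_analytes=100, sigma_log10=0.3, n_trials=1000, seed=1):
    """Monte Carlo mean of (estimated summed mass) / (true summed mass)."""
    rng = random.Random(seed)
    total_ratio = 0.0
    for _ in range(n_trials):
        estimated_sum = 0.0
        for _ in range(n_analytes):
            # Hypothetical nominal log10 sensitivity read from the fitted line.
            log_s_nominal = rng.uniform(-2.0, 0.0)
            # True sensitivity scatters symmetrically around it in log space.
            log_s_true = log_s_nominal + rng.gauss(0.0, sigma_log10)
            signal = 1.0 * 10.0 ** log_s_true  # unit true mass per analyte
            # Mass estimated by applying the nominal (uncorrected) sensitivity.
            estimated_sum += signal / 10.0 ** log_s_nominal
        total_ratio += estimated_sum / n_analytes  # true sum equals n_analytes
    return total_ratio / n_trials


raw_ratio = mean_summed_mass_ratio()
corrected_ratio = raw_ratio / lognormal_bias_factor(0.3)
print(f"uncorrected: {raw_ratio:.3f}, corrected: {corrected_ratio:.3f}")
```

In this sketch, increasing `n_analytes` narrows the trial-to-trial spread of the ratio but leaves its mean unchanged; only the explicit correction factor brings the mean back toward 1.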
The correction terms introduced in this work for both the general case of log-transformed relationships and the special case of an iodide CIMS (i.e., Eq. 9) fully remove this bias. We propose that these correction terms should be introduced into any such calibration scheme in future work in order to minimize bias and reduce uncertainty in the literature. Given that real-world calibration scenarios are complex and consequently not all parameters have known uncertainties, we suggest that, at least, a term to correct for the average observed scatter around the nominal relationship, i.e., Eq. (10), should be incorporated in calibrations to remove a major portion of the bias. For the convenience of method users, we summarize the simplified bias correction method as step-by-step guidance in Table 1.
However, we recommend that this simplified approach be used cautiously to avoid overcorrecting the sensitivity of more sensitive analytes and undercorrecting that of less sensitive ones. While data are limited on the uncertainty in each calibration parameter and the relative merits of simplified vs. parameter-explicit correction, bias-corrected results are expected to be more accurate than uncorrected values, and some form of bias correction should be introduced into instrument calibrations relying on log-transformed calibrations.
All raw and processed data collected as part of this project are available upon request.
The supplement related to this article is available online at: https://doi.org/10.5194/amt-14-6551-2021-supplement.
GIVW developed the thought experiment. CB led the consequent data simulation and analysis under the guidance of GIVW. JEK and MRC contributed to the development of the theory of the described approach. CB and GIVW prepared the manuscript with contributions from all other authors.
Jordan E. Krechmer and Manjula R. Canagaratna are employed by Aerodyne Research, Inc., which commercializes CIMS instruments for geoscience research.
Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
We would like to thank the Alfred P. Sloan Foundation Chemistry of the Indoor Environment Program for supporting this work.
This research has been supported by the Alfred P. Sloan Foundation (grant no. P201811129).
This paper was edited by Glenn Wolfe and reviewed by two anonymous referees.
Bertram, T. H., Kimmel, J. R., Crisp, T. A., Ryder, O. S., Yatavelli, R. L. N., Thornton, J. A., Cubison, M. J., Gonin, M., and Worsnop, D. R.: A field-deployable, chemical ionization time-of-flight mass spectrometer, Atmos. Meas. Tech., 4, 1471–1479, https://doi.org/10.5194/amt-4-1471-2011, 2011.
Bi, C., Krechmer, J. E., Frazier, G. O., Xu, W., Lambe, A. T., Claflin, M. S., Lerner, B. M., Jayne, J. T., Worsnop, D. R., Canagaratna, M. R., and Isaacman-VanWertz, G.: Coupling a gas chromatograph simultaneously to a flame ionization detector and chemical ionization mass spectrometer for isomer-resolved measurements of particle-phase organic compounds, Atmos. Meas. Tech., 14, 3895–3907, https://doi.org/10.5194/amt-14-3895-2021, 2021.
Brophy, P.: Development, characterization, and deployment of a high-resolution time-of-flight chemical ionization mass spectrometer (hr-tof-cims) for the detection of carboxylic acids and trace-gas species in the troposphere, Colorado State University, 1–215, 2016.
Brophy, P. and Farmer, D. K.: Clustering, methodology, and mechanistic insights into acetate chemical ionization using high-resolution time-of-flight mass spectrometry, Atmos. Meas. Tech., 9, 3969–3986, https://doi.org/10.5194/amt-9-3969-2016, 2016.
Crounse, J. D., McKinney, K. A., Kwan, A. J., and Wennberg, P. O.: Measurement of gas-phase hydroperoxides by chemical ionization mass spectrometry, Anal. Chem., 78, 6726–6732, https://doi.org/10.1021/ac0604235, 2006.
Hurley, J. F., Kreisberg, N. M., Stump, B., Bi, C., Kumar, P., Hering, S. V., Keady, P., and Isaacman-VanWertz, G.: A new approach for measuring the carbon and oxygen content of atmospherically relevant compounds and mixtures, Atmos. Meas. Tech., 13, 4911–4925, https://doi.org/10.5194/amt-13-4911-2020, 2020.
Isaacman-VanWertz, G., Massoli, P., O'Brien, R. E., Nowak, J. B., Canagaratna, M. R., Jayne, J. T., Worsnop, D. R., Su, L., Knopf, D. A., Misztal, P. K., Arata, C., Goldstein, A. H., and Kroll, J. H.: Using advanced mass spectrometry techniques to fully characterize atmospheric organic carbon: Current capabilities and remaining gaps, Faraday Discuss., 200, 579–598, https://doi.org/10.1039/c7fd00021a, 2017.
Isaacman-VanWertz, G., Massoli, P., O'Brien, R., Lim, C., Franklin, J. P., Moss, J. A., Hunter, J. F., Nowak, J. B., Canagaratna, M. R., Misztal, P. K., Arata, C., Roscioli, J. R., Herndon, S. T., Onasch, T. B., Lambe, A. T., Jayne, J. T., Su, L., Knopf, D. A., Goldstein, A. H., Worsnop, D. R., and Kroll, J. H.: Chemical evolution of atmospheric organic carbon over multiple generations of oxidation, Nat. Chem., 10, 462–468, https://doi.org/10.1038/s41557-018-0002-2, 2018.
Iyer, S., Lopez-Hilfiker, F., Lee, B. H., Thornton, J. A., and Kurtén, T.: Modeling the Detection of Organic and Inorganic Compounds Using Iodide-Based Chemical Ionization, J. Phys. Chem. A, 120, 576–587, https://doi.org/10.1021/acs.jpca.5b09837, 2016.
Jokinen, T., Sipilä, M., Junninen, H., Ehn, M., Lönn, G., Hakala, J., Petäjä, T., Mauldin, R. L., Kulmala, M., and Worsnop, D. R.: Atmospheric sulphuric acid and neutral cluster measurements using CI-APi-TOF, Atmos. Chem. Phys., 12, 4117–4125, https://doi.org/10.5194/acp-12-4117-2012, 2012.
Krechmer, J. E., Coggon, M. M., Massoli, P., Nguyen, T. B., Crounse, J. D., Hu, W., Day, D. A., Tyndall, G. S., Henze, D. K., Rivera-Rios, J. C., Nowak, J. B., Kimmel, J. R., Mauldin, R. L., Stark, H., Jayne, J. T., Sipilä, M., Junninen, H., St. Clair, J. M., Zhang, X., Feiner, P. A., Zhang, L., Miller, D. O., Brune, W. H., Keutsch, F. N., Wennberg, P. O., Seinfeld, J. H., Worsnop, D. R., Jimenez, J. L., and Canagaratna, M. R.: Formation of Low Volatility Organic Compounds and Secondary Organic Aerosol from Isoprene Hydroxyhydroperoxide Low-NO Oxidation, Environ. Sci. Technol., 49, 10330–10339, https://doi.org/10.1021/acs.est.5b02031, 2015.
Lee, B. H., Lopez-Hilfiker, F. D., Mohr, C., Kurtén, T., Worsnop, D. R., and Thornton, J. A.: An iodide-adduct high-resolution time-of-flight chemical-ionization mass spectrometer: Application to atmospheric inorganic and organic compounds, Environ. Sci. Technol., 48, 6309–6317, https://doi.org/10.1021/es500362a, 2014.
Lindinger, W., Hansel, A., and Jordan, A.: Online monitoring of volatile organic compounds at pptv levels by means of Proton-Transfer-Reaction Mass Spectrometry (PTR-MS) Medical applications, food control and environmental research, Int. J. Mass Spectrom., 173, 191–241, 1998.
Lopez-Hilfiker, F. D., Iyer, S., Mohr, C., Lee, B. H., D'Ambro, E. L., Kurtén, T., and Thornton, J. A.: Constraining the sensitivity of iodide adduct chemical ionization mass spectrometry to multifunctional organic molecules using the collision limit and thermodynamic stability of iodide ion adducts, Atmos. Meas. Tech., 9, 1505–1512, https://doi.org/10.5194/amt-9-1505-2016, 2016.
Mattila, J. M., Lakey, P. S. J., Shiraiwa, M., Wang, C., Abbatt, J. P. D., Arata, C., Goldstein, A. H., Ampollini, L., Katz, E. F., Decarlo, P. F., Zhou, S., Kahan, T. F., Cardoso-Saldaña, F. J., Ruiz, L. H., Abeleira, A., Boedicker, E. K., Vance, M. E., and Farmer, D. K.: Multiphase Chemistry Controls Inorganic Chlorinated and Nitrogenated Compounds in Indoor Air during Bleach Cleaning, Environ. Sci. Technol., 54, 1730–1739, https://doi.org/10.1021/acs.est.9b05767, 2020.
Miller, D. M.: Reducing Transformation Bias in Curve Fitting, Am. Statist., 38, 124–126, 1984.
Riva, M., Rantala, P., Krechmer, E. J., Peräkylä, O., Zhang, Y., Heikkinen, L., Garmash, O., Yan, C., Kulmala, M., Worsnop, D., and Ehn, M.: Evaluating the performance of five different chemical ionization techniques for detecting gaseous oxygenated organic species, Atmos. Meas. Tech., 12, 2403–2421, https://doi.org/10.5194/amt-12-2403-2019, 2019.
Sekimoto, K., Li, S.-M., Yuan, B., Koss, A., Coggon, M., Warneke, C., and de Gouw, J.: Calculation of the sensitivity of proton-transfer-reaction mass spectrometry (PTR-MS) for organic trace gases using molecular properties, Int. J. Mass Spectrom., 421, 71–94, https://doi.org/10.1016/j.ijms.2017.04.006, 2017.
Slusher, D. L.: A thermal dissociation–chemical ionization mass spectrometry (TD-CIMS) technique for the simultaneous measurement of peroxyacyl nitrates and dinitrogen pentoxide, J. Geophys. Res., 109, 1–19, https://doi.org/10.1029/2004jd004670, 2004.
St Clair, J. M., McCabe, D. C., Crounse, J. D., Steiner, U., and Wennberg, P. O.: Chemical ionization tandem mass spectrometer for the in situ measurement of methyl hydrogen peroxide, Rev. Sci. Instrum., 81, 094102, https://doi.org/10.1063/1.3480552, 2010.
Yuan, B., Koss, A., Warneke, C., Gilman, J. B., Lerner, B. M., Stark, H., and De Gouw, J. A.: A high-resolution time-of-flight chemical ionization mass spectrometer utilizing hydronium ions (H_{3}O^{+} ToF-CIMS) for measurements of volatile organic compounds in the atmosphere, Atmos. Meas. Tech., 9, 2735–2752, https://doi.org/10.5194/amt-9-2735-2016, 2016.
Zaytsev, A., Breitenlechner, M., Koss, A. R., Lim, C. Y., Rowe, J. C., Kroll, J. H., and Keutsch, F. N.: Using collision-induced dissociation to constrain sensitivity of ammonia chemical ionization mass spectrometry (NH${}_{\mathrm{4}}^{+}$ CIMS) to oxygenated volatile organic compounds, Atmos. Meas. Tech., 12, 1861–1870, https://doi.org/10.5194/amt-12-1861-2019, 2019.