A suite of generally applicable statistical methods based
on empirical bootstrapping is presented for calculating uncertainty and
testing the significance of quantitative differences in temperature and/or
ice active site densities between ice nucleation temperature spectra derived
from droplet freezing experiments. Such experiments are widely used to
determine the heterogeneous ice nucleation properties and ice nucleation
particle concentration spectra of different particle samples, as well as in
studies of homogeneous freezing. Our methods avoid most of the assumptions
and approximations inherent to existing approaches, and when sufficiently
large sample sizes are used (approximately
Ice nucleation (IN) is a complex process with significant implications for cloud properties in atmospheric science (Gettelman et al., 2012; Mülmenstädt et al., 2015; Froyd et al., 2022). Heterogeneous ice nucleation, where a separate phase or substance assists the nucleation of ice above the homogeneous freezing limit, is particularly difficult to study as the length and timescales at play in nucleation are difficult to directly observe (Fletcher, 1969; Wang et al., 2016; Kiselev et al., 2017; Holden et al., 2019). Most researchers resort to macroscopic measurements of this nanoscale process by creating droplets containing suspensions of the ice active material and observing freezing events as time passes or temperature changes (Vali, 2014). The most common technique is a variation on the droplet-on-substrate apparatus, where droplets of known sizes are created by manual pipetting, condensation, or microfluidic means (Stan et al., 2009; Budke and Koop, 2015; Whale et al., 2015; Chen et al., 2018; Polen et al., 2018; Reicher et al., 2018; Brubaker et al., 2020; Gute and Abbatt, 2020; Roy et al., 2021). These droplets are usually exposed to a negative temperature ramp and the freezing temperatures of each droplet are recorded to produce an ice nucleation rate or active site density spectrum as a function of temperature (here we use the term “IN activity” as a general term to refer to any measured or derived variable which quantifies ice nucleation rate with respect to temperature). Other procedures can be used to test the effects of time and other variables on IN activity (Wright and Petters, 2013).
Because these experiments only indirectly measure IN activity, results can have high natural variability, even when measuring the same sample on the same instrument. This variability is inherent to ice nucleation. Using the combined singular–stochastic VS66 model most recently discussed in Vali (2014) and terminology proposed in Vali et al. (2015), ice nucleation activity (or rate) is an accumulation of many ice nucleation sites with variable critical temperatures dispersed randomly throughout a material. In turn, the material is distributed randomly throughout droplets which can have varying sizes, shapes, and environments. Therefore, a measured IN activity can be affected by heterogeneity in the distribution of ice active sites across a material, heterogeneity in the mass or surface area of material suspended in each droplet, differences between droplet sizes and environments, and variations in temperature between droplets (Polen et al., 2018). Even in a perfect experimental setup, the stochastic nature of nucleation causes variation in the measured temperature dependence of a material's IN activity using a singular model (Vali, 2014, 2019). Combined with the large variations in IN activity observed between different ice nucleating substances and particles, this inherent uncertainty creates difficulties in reliably assessing whether differences in observed IN spectra indicate a statistically significant difference in IN activity.
Experimental error is always present and must be accounted for and reported, usually in the form of a standard error or a confidence interval of the mean measurement recorded. In our experience, there is no widely implemented approach to reporting uncertainty in IN temperature spectra derived from freezing experiments. Instead, methods vary between groups, relying on different assumptions about the nature of ice nucleation experiments, the forms of distributions that the random variables involved take, and the quantification of the derived uncertainties. In the simplest case, standard deviations, errors, and/or confidence intervals have been calculated from repeated experiments by assuming that variability follows a normal distribution (Losey et al., 2018; Polen et al., 2018; Jahn et al., 2019; Chong et al., 2021; Roy et al., 2021; Worthy et al., 2021) or a Poisson distribution (Koop et al., 1997; Alpert and Knopf, 2016; Kaufmann et al., 2017; Knopf et al., 2020; Yun et al., 2021), or that droplet freezing follows a binomial distribution (McCluskey et al., 2018; Suski et al., 2018; Gong et al., 2019, 2020; Wex et al., 2019). In other cases, authors have used a model of ice nucleation to simulate their experiments and used that simulated distribution to estimate the uncertainty present in their experiment. In simpler models, droplet freezing is modeled as a Poisson point process (Vali, 2019; Jahl et al., 2021; Fahy et al., 2022b). In more sophisticated models, random variables such as the number of sites, mass of material, and temperature variations are parameterized to run completely new simulated experiments (Wright and Petters, 2013; Harrison et al., 2016). Even in these models, either additional measurements are required or assumptions must be made about the distribution of each variable.
Until the inherent variability behind ice nucleation can be measured to prove or disprove the assumptions being made, all the above methods are only as reliable as the assumptions themselves. In Sect. 4, each method, their required assumptions, and the validity of those assumptions are discussed in further detail.
Empirical bootstrapping is an alternative approach to estimating statistics for a dataset that to our knowledge has not been applied in the context of ice nucleation. In this technique, a series of random samples of the measured dataset is taken to generate estimated statistics that converge on the actual values as the number of samples increases (Efron, 1979; Shalizi, 2022). No assumptions are required about the distributions of random variables underlying ice nucleation and it can be applied to any system where the freezing temperatures or times of droplets are measured. Here we present a set of generalized and statistically rigorous methods based on empirical bootstrapping for quantifying uncertainty in IN spectra. When accompanied by interpolation methods presented in Sect. 3, this approach can be used to calculate continuous confidence bands and statistically test differences between IN spectra as shown in Sect. 5. We also address the effects of interpolation techniques, droplet sample size, and bootstrap sample size to direct the field towards more rigorous and repeatable methods of experimentation and data analysis. An implementation of all presented statistical methods along with documentation and instructions for its use is provided freely for use or reference to assist in future research and improve the statistical treatment of ice nucleation data in the field.
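The resampling idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation provided with this work: the function names `bootstrap_frozen_fraction` and `percentile_band` are hypothetical, the statistic bootstrapped is the raw frozen fraction on a fixed temperature grid (no interpolation or site density normalization), and the band shown is a plain percentile interval, chosen only because it is the simplest to write down.

```python
import numpy as np


def bootstrap_frozen_fraction(freezing_temps, temp_grid, n_sim=1000, seed=0):
    """Simulate frozen fraction curves by resampling droplets with replacement.

    Returns an array of shape (n_sim, len(temp_grid)); each row is one
    simulated experiment with the same number of droplets as the original.
    """
    rng = np.random.default_rng(seed)
    temps = np.asarray(freezing_temps, dtype=float)
    grid = np.asarray(temp_grid, dtype=float)
    # Draw n_sim artificial experiments from the measured freezing temperatures.
    resampled = rng.choice(temps, size=(n_sim, temps.size), replace=True)
    # On a cooling ramp, a droplet is frozen at grid temperature T if its
    # freezing temperature is at or above T.
    return (resampled[:, :, None] >= grid[None, None, :]).mean(axis=1)


def percentile_band(simulated, confidence=0.95):
    """Lower and upper quantiles of the simulated curves at each grid point."""
    alpha = 1.0 - confidence
    lower, upper = np.quantile(simulated, [alpha / 2.0, 1.0 - alpha / 2.0], axis=0)
    return lower, upper
```

Calling `percentile_band(bootstrap_frozen_fraction(temps, grid))` on a list of measured freezing temperatures would give a rough 95 % band for the frozen fraction at each grid temperature.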
To demonstrate the statistical methods described here, we selected an
example IN dataset shown in Fig. 1. The Fuego ground PM37 sample (FUE) from
Jahn et al. (2019) was tested for ice nucleation activity before and after
being exposed to water in a 1 wt % suspension and allowed to dry under a
constant 1 Lpm flow of pre-dried lab air similarly to Fahy et al. (2022b).
In both cases, a 0.1 wt % suspension of unaged or aged ash was created in
water (HPLC grade, Sigma) filtered through a 0.02 micrometer pore size Anotop
syringe filter. These suspensions were then tested for IN activity on the
CMU-CS droplet-on-substrate system described in detail by Polen et al. (2018) and are compared to a background freezing spectrum obtained from the
filtered water used to create the suspensions. Approximately 50 100 nL
droplets (1.5 mm diameter) were tested per array with a cooling rate of 1
Since multiple freezing experiments were performed on nominally identical samples (e.g., the replicate suspensions of the same ash or aging experiment), these spectra were combined by merging the lists of freezing events that occurred in each experiment. The frozen fractions and ice nucleation active site density spectra were then recalculated as if the combined freezing events occurred in a single experiment. This is only valid when the IN spectrum of a given suspension is insignificantly different from the combined spectra of all other suspensions, and the physical and chemical properties (e.g., suspension concentration, sample type, water purity, background freezing) are identical between suspensions. The second condition can easily be tested in the laboratory, while the first condition can be evaluated using statistical tests described in this paper (see Sect. 5.2).
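The merging step described above might look like the following sketch. The function names and the `norm` argument are hypothetical. Because the equations referenced in this section are not reproduced here, the site density uses the standard cumulative form of Vali (1971), the negative logarithm of the unfrozen fraction divided by a normalization constant (droplet volume, suspended mass, or surface area per droplet); treat that as an assumption rather than a restatement of Eqs. (1)–(3).

```python
import numpy as np


def combine_experiments(freezing_temps_per_run):
    """Pool freezing temperatures from replicate runs, warmest first."""
    pooled = np.concatenate(
        [np.asarray(run, dtype=float) for run in freezing_temps_per_run]
    )
    return np.sort(pooled)[::-1]


def frozen_fraction(freezing_temps, temperature):
    """Fraction of droplets frozen at or above `temperature` on a cooling ramp."""
    temps = np.asarray(freezing_temps, dtype=float)
    return float(np.mean(temps >= temperature))


def cumulative_site_density(freezing_temps, temperature, norm):
    """Cumulative site density, -ln(1 - frozen fraction) / norm (assumed form)."""
    ff = frozen_fraction(freezing_temps, temperature)
    if ff >= 1.0:
        # Every droplet has frozen, so the estimate is unbounded.
        return float("inf")
    return -np.log1p(-ff) / norm
```

Pooling is only meaningful when the two conditions stated above hold; the sketch does not check them.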
Raw (not interpolated or binned) and combined raw
The ice active site density spectra were calculated directly based on Eqs. (1)–(3) (Vali et al., 2015; Vali, 1971, 2019), where
The most common style of reporting ice nucleation activity is using the cumulative ice nucleation active site density curves calculated directly from raw data as shown in the previous section, but there is an important limitation to this type of data treatment. While it represents the data exactly as measured, there is no way to quantitatively compare one raw freezing spectrum with another without some type of interpolation. This is because even if a droplet freezes at a particular temperature in one experiment, there is no guarantee that a droplet will freeze at or near that temperature in another experiment. Often the approximate difference between spectra is just compared by eye for lack of a better method. This presents issues when trying to subtract a background spectrum or when quantifying uncertainty and testing statistical difference between spectra, and leads to a need for effective interpolation methods for comparing IN spectra.
One common method for interpolating IN spectra is through temperature
binning, where a temperature interval is represented by a single value of IN
activity that is treated as constant throughout the interval. This approach
is appealing, as it aligns with the discrete nature of IN experiments and
allows straightforward calculation of differential IN spectra by using the
bin width as
To make a discrete variable continuous, some type of functional interpolation is required. Many studies approximate IN spectra as exponential polynomials or similar simple functions (Atkinson et al., 2013; Kanji et al., 2013; Niedermeier et al., 2015; Harrison et al., 2016, 2019; Peckhaus et al., 2016; Vergara-Temprado et al., 2017; Price et al., 2018). Exponential polynomials can capture the overall exponential shape of cumulative IN spectra in most cases; however, they impose explicit assumptions about the shape of the IN spectra through their closed-form expressions. Particularly in samples that contain mixtures of different types of ice nucleation sites (e.g., Beydoun et al., 2017), simple polynomials are likely to be insufficient for accurate interpolation of IN spectra.
Instead, the ideal interpolation method would take a series of measured data points from a droplet freezing experiment and output a continuous IN parameterization that could predict the IN activity of the sample at any temperature. A parameterization such as a contact angle scheme (Chen et al., 2008; Beydoun et al., 2016; Ickes et al., 2017) or the singular–stochastic formulation of ice nucleation (Vali, 2014; Barahona, 2012; Niedermeier et al., 2011) would be preferred; however, these parameterizations require preexisting knowledge or assumptions about the nature of the sample being tested. For data analysis in laboratory or field studies, this information is often not available, and we must look for an interpolation method that can capture an ice nucleation spectrum with any shape.
For a generally applicable interpolation scheme, piecewise fitting algorithms such as spline interpolation fit all requirements. Spline fits provide interpolations of arbitrarily complex data by fitting a series of polynomials to small portions of the available data. The resulting piecewise functions are continuous and differentiable, meaning that only one of the cumulative or differential IN spectra must be fit directly from the data – the other spectrum can be calculated by computing either the negative derivative of the cumulative freezing curve or the negative antiderivative of the differential spectrum. To find a fitting method that performs well, a variety of algorithms available in the Python SciPy library (Virtanen et al., 2020) were modified and tested for their ability to interpolate the combined water aged volcanic ash ice nucleation spectrum. “Splinederiv” uses a cubic spline fit of the cumulative spectrum, “splineint” uses a cubic spline fit of the differential spectrum, “PCHIP” applies the piecewise cubic Hermite interpolating polynomial algorithm to the cumulative spectrum, and “smoothedPCHIP” is the PCHIP curve followed by a cubic spline fit with a smoothing factor.
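As a rough illustration of the PCHIP option, the sketch below fits the cumulative spectrum with SciPy's stock `PchipInterpolator` and obtains the differential spectrum as its negative derivative. It uses the unmodified library routine, not the modified algorithms tested here; the function name `pchip_spectra` is hypothetical, and the input temperatures are assumed to be unique (repeated freezing temperatures would need to be merged first).

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def pchip_spectra(temps, cumulative):
    """Return callables for the cumulative and differential spectra.

    temps:      temperatures of the measured points (must be unique)
    cumulative: cumulative IN activity at those temperatures
    """
    temps = np.asarray(temps, dtype=float)
    cumulative = np.asarray(cumulative, dtype=float)
    # PchipInterpolator requires strictly increasing x values.
    order = np.argsort(temps)
    cum_fit = PchipInterpolator(temps[order], cumulative[order])
    slope = cum_fit.derivative()

    def differential(t):
        # The cumulative spectrum grows as temperature falls, so the
        # differential spectrum is the negative temperature derivative.
        return -slope(t)

    return cum_fit, differential
```

PCHIP preserves monotonicity in the fitted data, so a cumulative spectrum that only increases on cooling yields a non-negative differential spectrum, which a plain cubic spline does not guarantee.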
Comparison of interpolation methods on the water aged ash sample
above in
Figure 2a (cumulative
The question of how to calculate confidence intervals for IN spectra derived
from droplet freezing experiments has been addressed several times in the IN
literature. In some cases, a normal distribution about the frozen fraction
curves is assumed. Where multiple freezing experiments are available and are
interpolated such that means and standard deviations can be calculated for a
collection of freezing spectra, a
Other studies (e.g., McCluskey et al., 2018; Suski et al., 2018; Wex et al., 2019; Gong et al., 2019, 2020) have calculated approximate confidence intervals for frozen fraction values by treating them as binomial ratios and using the adjusted Wald interval suggested by Agresti and Coull (1998). In the latter case, calculating uncertainty for derived ice active site density spectra requires propagation of error through Eqs. (1) and (2), followed by an assumption of normality when the confidence intervals are calculated. However, there is no reason to believe that the spread of freezing events in droplets should even approach a normal distribution, making this assumption unreliable.
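For reference, the adjusted Wald interval of Agresti and Coull (1998) for a single frozen fraction value can be written as the short sketch below. The function name and arguments are hypothetical, and the sketch covers only the binomial interval itself; the propagation of error into site densities and the normality assumption criticized above are not included.

```python
from math import sqrt
from statistics import NormalDist


def agresti_coull_interval(n_frozen, n_droplets, confidence=0.95):
    """Adjusted Wald (Agresti-Coull) interval for a frozen fraction."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # Add z^2 pseudo-observations, half of them counted as frozen.
    n_adj = n_droplets + z ** 2
    p_adj = (n_frozen + z ** 2 / 2.0) / n_adj
    half_width = z * sqrt(p_adj * (1.0 - p_adj) / n_adj)
    # Clip to the physically meaningful range of a fraction.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)
```

With 25 of 50 droplets frozen, the interval is centered on 0.5 and spans roughly 0.37 to 0.63.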
A better approximation for the variability in droplet freezing experiments is the Poisson distribution, in part because the widely used ice active site density spectra are based in Poisson statistics (Vali, 1971), but also because droplet freezing resembles a Poisson point process where freezing events occur approximately continuously and independently at a given rate. Koop et al. (1997) suggested the use of Poisson fiducial limits to calculate uncertainty in a variety of types of freezing experiments, and this approach has been used by several studies since (Alpert and Knopf, 2016; Kaufmann et al., 2017; Knopf et al., 2020; Yun et al., 2021). However, the distributions of IN sites across particles, distributions of these particles among droplets, distributions of freezing abilities of individual IN sites, distributions of freezing events that occur based on the aggregate freezing ability in a droplet, and temperature distribution between the droplets could all serve to skew or otherwise change the distribution of droplet freezing events measured. Using a Poisson distribution corrects for only some of these random factors, and because ice active site spectra are based on the Poisson process, these are the variables that most need to be considered when calculating experimental uncertainty. Thus, while these closed-form confidence limits are convenient, they are not likely to be accurate.
Another class of methods of calculating confidence intervals for freezing spectra relies on a technique known as bootstrapping, where artificial freezing experiments are generated from a measurement using Monte Carlo simulations (Davison and Hinkley, 1997). When the simulations are based on an existing ice nucleation theory (e.g., when simulated experiments are produced using a parameterization of ice nucleation), this technique is known as parametric bootstrapping, and given enough simulations, the artificial experiments represent the full range of possible variability around the measured result that could be observed in the theoretical framework used.
For example, based on Wright and Petters (2013), Harrison et al. (2016) and
subsequent publications simulate a number distribution of ice active sites
in a collection of theoretical droplets based on the ice active site
densities calculated from the original experiment. This model can be used to
simulate freezing spectra by sampling these theoretical droplets and
assuming that freezing events occur when the number of ice active sites in
each droplet is greater than or equal to one. When repeated enough times,
this distribution of freezing spectra can be used to calculate confidence
intervals for the measured data either by assuming that the quantiles of the
distribution of simulated freezing spectra approximate the confidence
intervals or by calculating simple
An alternative method of parametric bootstrapping for confidence intervals of IN spectra models individual droplets freezing as a Poisson point process (again the same assumption used in deriving ice active site density spectra) as shown in Vali (2019) and applied in Jahl et al. (2021) and Fahy et al. (2022b). In this approach, the number of droplets that freeze in each temperature interval (or equivalently, the rate of droplet freezing) is used as the mean value of a discrete Poisson distribution. Then, for each temperature interval, a new number of droplets freezing in the interval is selected from the distribution. When this is done for all temperature intervals, the simulated values are combined into a simulated experiment. Once ice active site density spectra are calculated from these simulations and this process is repeated hundreds to thousands of times, the quantiles of the distribution of simulated ice active site densities for each temperature bin can be used as an approximation of confidence intervals.
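The Poisson procedure described above can be sketched as follows. This is an illustrative reading of the text, not the code used in the cited studies: the function names are hypothetical, the simulated quantity is the number of droplets frozen per temperature bin, and the conversion of each simulated experiment to ice active site densities is left out.

```python
import numpy as np


def poisson_bootstrap_counts(freezing_temps, bin_edges, n_sim=1000, seed=0):
    """Simulate per-bin freezing counts from Poisson distributions.

    Returns the measured counts and an (n_sim, n_bins) array of simulated
    counts, with bins ordered from coldest to warmest.
    """
    rng = np.random.default_rng(seed)
    counts, _ = np.histogram(freezing_temps, bins=bin_edges)
    # The measured count in each bin is used as the mean of that bin's
    # Poisson distribution; each row is one simulated experiment.
    sims = rng.poisson(lam=counts, size=(n_sim, counts.size))
    return counts, sims


def cumulative_count_quantiles(sims, confidence=0.95):
    """Quantiles of the number of droplets frozen at or above each bin."""
    # Accumulate from the warmest bin towards the coldest, as on a cooling ramp.
    cumulative = np.cumsum(sims[:, ::-1], axis=1)[:, ::-1]
    alpha = 1.0 - confidence
    lower, upper = np.quantile(cumulative, [alpha / 2.0, 1.0 - alpha / 2.0], axis=0)
    return lower, upper
```

Note that a simulated experiment can contain more or fewer frozen droplets than were actually measured, one consequence of drawing each bin independently.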
Both parametric bootstrapping approaches described here rely on the parameterization to produce accurate results, meaning that if the parameterizations are approximate or inaccurate, they may produce misleading or incorrect statistics. An in-depth analysis of the accuracy of the assumptions of each of these parameterizations is beyond the scope of this paper, but there are major concerns for each model. The calculations based on particle distributions in droplets (Wright and Petters, 2013; Harrison et al., 2016) assume that ice active sites are distributed evenly across the surface of a material, that the material is suspended evenly throughout the droplet, and possibly (depending on the specific approach) that the material is composed of uniform spheres and that ice nucleation is time-independent or the characteristic temperatures for each given ice nucleation site are normally distributed. The first assumption is known to be false for some materials; minerals often have higher concentrations of and/or more ice active IN sites near or in specific nanoscale defects, cracks, pores, or other specific regions such as the perthitic textures in some feldspar minerals (Whale et al., 2017; Kiselev et al., 2017; Holden et al., 2019; Friddle and Thürmer, 2020). The second assumption may or may not be true, especially at higher suspension concentrations (Beydoun et al., 2016). The third assumption depends on the material in question. The fourth assumption ignores time, one of the most important factors introducing uncertainty and randomness into droplet freezing experiments (Wright and Petters, 2013; Herbert et al., 2014; Vali, 2014; Knopf et al., 2020), and the fifth assumption does not have a theoretical basis and requires additional experimentation to determine the parameters of the normal distribution (Wright and Petters, 2013). 
Regardless of the specific approach used, these techniques either require extensive experimentation to determine the nature of the ice nucleation material being studied or rely on assumptions that produce an incomplete and potentially inaccurate parameterization.
The calculations based on the Poisson distribution (Vali, 2019; Fahy et al., 2022b; Jahl et al., 2021) have very different assumptions. Stochasticity and IN site variability are accounted for in the process of simulation from the measured IN spectrum; however, this method requires coarse binning, as ideally multiple freezing events will occur within each bin. As discussed before, binning continuous data is inefficient. The method also assumes that within each bin the nucleation rate does not change with temperature. For coarse temperature bins especially, this assumption will break down, as ice nucleation spectra are strong exponential functions of temperature (Fletcher, 1969). While the Poisson parametric bootstrapping method makes fewer assumptions and captures more variability than other parametric methods, it relies on risky and/or false assumptions, contributing systematic error to the confidence intervals. Note that it is not the purpose of this study to quantitatively compare methods previously used to calculate uncertainty in IN spectra, and the above discussion is only a qualitative overview of the assumptions and approximations previous methods use.
The other class of bootstrapping method, non-parametric bootstrapping (also
known as empirical bootstrapping), does not rely on any parameterizations.
Instead, the original experimental data are sampled from with replacement
(i.e., the same data point can be sampled more than once) to produce
artificial datasets (Efron, 1979; Efron and Tibshirani, 1994; Davison and
Hinkley, 1997; Shalizi, 2022). This method is remarkably well suited
to the problem of ice nucleation statistics, as droplet freezing experiments
result in a list of freezing temperatures that can be easily sampled from to
create new simulated droplet freezing experiments. The large droplet numbers
coupled with a limited freezing temperature range ensure that the empirical
data cover most of the possible variability within each experiment. If
multiple freezing experiments are performed on identically prepared samples,
this method will even capture the variability in sample preparation and
other aspects of the experiments being performed. Since variations in
droplet size, sample mass suspended, or distributions of surface area among
droplets (the parameters behind the normalization constant
Figure 3a and b show the application of empirical bootstrapping to simulate
cumulative and differential spectra for the combined and interpolated
volcanic ash ice nucleation data previously introduced in Figs. 1 and 2.
Each spectrum is statistically simulated by randomly sampling with
replacement
Interpolated combined data (bold line), interpolated 2.5th
and 97.5th quantiles (dashed lines), and interpolated individual
simulations (faint lines;
While the mathematical theory behind empirical bootstrapping is complex (see
Efron and Tibshirani, 1994 or Davison and Hinkley, 1997 for a thorough
treatment of the mathematics behind bootstrapping and Canty et al., 2006
for a thorough discussion of inconsistencies and errors that can be
encountered when using bootstrapping), Fig. 3 provides some evidence that
this approach has successfully captured the possible variability in the ice
nucleation spectra. Using the interpolated quantiles as a measurement of the
spread of the simulated spectra, the magnitude of the variability in each
spectrum largely follows the trends that would be expected. For example, the
simulated cumulative spectra have much less relative variability than the
simulated differential spectra and both types are less variable at
intermediate temperatures where more droplets froze in the actual
experiments. This reflects the fact that increased sample sizes tend to
reduce uncertainty as cumulative spectra represent a sum of all previous
data points, and most droplets tend to freeze at intermediate temperatures in
a droplet freezing assay. The noisiness of the differential spectra
indicates large uncertainty, meaning the differential spectrum for the
unaged volcanic ash is largely uninterpretable, while the differential
spectrum for the aged volcanic ash and both cumulative spectra are much more
descriptive – for example, it can clearly be seen that the two cumulative
spectra do not overlap significantly below
Using this new method to simulate data that capture the variability inherent
to freezing experiments, bootstrapped summary statistics describing the
experimental measurement can be calculated. Values such as the bootstrapped
standard error of the mean approximate the true standard error of the mean
remarkably accurately when large numbers (
Fortunately, other bootstrap confidence intervals exist. For a simple
interval rooted in statistical theory, we can construct the reverse
percentile interval, also known as the pivotal interval, where the upper and
lower quantiles are subtracted from twice the sample mean for the lower and
upper confidence intervals respectively. However, in skewed distributions
such as uncertainty in ice nucleation spectra, the pivotal interval tends to
be inaccurate. For a more traditional interval, we can construct a
Significant work has gone into correcting these problems with basic
bootstrapped confidence intervals. The tboot interval can be corrected for
skewness to the “tskew” interval by including a second-order skewness term
in the tboot calculation as shown by Johnson (1978). The quantile interval
can be expanded by changing the quantile bounds by a factor related to the
Comparison of methods to calculate confidence bands (shown as
different-colored dashed lines) for
The above methods were used to calculate confidence bands (continuous
confidence intervals) for the cumulative and differential IN spectra of the
unaged and water aged combined volcanic ash sample. As with the summary
statistics, confidence bands were created by calculating confidence intervals
at every 0.1
If accurate confidence bands on both the cumulative and differential spectra
are required from low-resolution data, studentized intervals should always
be used. Ideally, the studentized confidence bands should be used in all
cases, but the computational time required for calculation of these
confidence bands can be excessive. For most use cases then, the tskew bands
are somewhat conservative confidence bands rooted in theory, and we will use
them in the remaining examples below. Quantile or expanded quantile bands
are also an appropriate choice when empirical bootstrapping is used but
should be tested against the studentized bands for each system to check for
potential biases in the data collection process. Quantile bands should be
avoided when using small numbers of droplets (
Although we cannot theoretically determine the sample sizes required for
accurate confidence bands using empirical bootstrapping due to the same
limitations discussed previously, the sample sizes required for accurate
confidence bands can be empirically evaluated by testing how many assays,
droplets, and simulated spectra are required for confidence bands to
converge (therefore reducing the uncertainty of the confidence bands due to
sample size). Figure 5a displays interpolations and resulting confidence bands
for the differential IN spectrum of aged volcanic ash when 50, 100, 150,
200, and 286 (where all droplets are included) droplets are randomly sampled
from the six performed experiments. The width and shape of the confidence
bands change significantly but seem to be converging to a smooth curve
exemplified when
Differential freezing spectra of the water aged FUE ash with tskew
confidence bands
In Fig. 5b, the tskew confidence bands of the combined water aged volcanic
ash IN spectra (all 286 droplets) are compared when the number of
simulations (
Finally, Figs. 3–5 provide evidence that the interpolation technique used is
not overfitting the data, as the quantiles and other confidence bands follow
the general shape of the experimental spectra. Since these statistics are
calculated from an aggregate of 1000 samples in most cases, they would be
expected to smooth out random variation present in a single measured
spectrum that could be causing the complex interpolated curve observed.
Because the aggregated data maintain the same shape, it can be assumed that
it is at least somewhat meaningful, and that the interpolation technique is
using an appropriate smoothing factor; however, this should be tested
regularly to minimize potential overfitting. Note that when droplet numbers
are below 200 (as in some of the Fig. S4a spectra and in Fig. S5) the
interpolated differential spectra have shapes that look unrealistic (e.g., many inflection points within 1 or 2
Confidence bands provide useful information about the variability of a single dataset – in the case of droplet freezing assays, 95 % confidence intervals contain the true population mean ice nucleation activity of the suspension being sampled from in 19 out of 20 analyses (that is, either the true spectrum is within the confidence interval, or an event of probability at most 5 % happened during data collection). All ice nucleation data should be reported with some form of confidence interval or quantification of the distribution of the measurements (e.g., standard error bars). These statistics must be calculated using a method, such as empirical bootstrapping, rooted in statistical theory to minimize assumptions about the ice nucleation experiment and accurately represent the uncertainty inherent to the experiment.
Another key application of statistics that quantify the variability within a dataset is in comparing measurements of different samples to assess the degree of similarity of their ice nucleation activity. In general terms, confidence bands can be used to compare two IN spectra by determining whether they could reasonably have been drawn from the same population. Often confidence intervals or bands are interpreted based on whether they overlap: if confidence intervals of two spectra do not overlap, they are statistically significantly different. However, it is not necessarily true that if the confidence bands overlap the two measurements are statistically the same at a given confidence level. This common misconception is based on the difference between error bars calculated using the standard error of the mean and confidence intervals (Barde and Barde, 2012; Belia et al., 2005).
For a more quantitative (and interpretable) comparison, IN spectra can simply be divided or subtracted. We will use the term “difference spectrum” to refer to this ratio or difference as a function of temperature, as both are calculated using the same procedures and provide similar information. When interpolated IN activity spectra are used, a continuous difference spectrum can easily be generated by calculating the ratio (or difference) between two interpolations at each point in a dense grid of temperatures, then interpolating between those points. A difference spectrum can be plotted as a function of temperature with its own confidence bands and can be used to test, at any chosen confidence level, whether two IN spectra are statistically significantly different at any temperature where the two spectra overlap. Stated precisely, the hypothesis that the two IN spectra are different can be tested against the null hypothesis that the two IN spectra are not quantitatively different. In the case of a ratio-based difference plot with confidence bands, if the confidence bands do not contain one at a given temperature, then the null hypothesis is rejected. If they do contain one, then the null hypothesis cannot be rejected. If a difference between IN spectra is used instead of a ratio, then zero is used for this hypothesis test instead of one. Therefore, if confidence bands can be accurately calculated for a difference spectrum, then continuous, statistically rigorous claims about differences between IN spectra can be tested.
Calculating confidence bands for differences or ratios of continuous variables is not trivial, but for these metrics to be useful, confidence bands are necessary. Subtracting or dividing the confidence bands of the compared spectra is not accurate. Elementary propagation-of-error formulas assume that the variability within both spectra (and of the difference spectrum) is normally distributed, which is a poor assumption as discussed previously. Again, bootstrapping offers a solution. To simulate the variability in the difference spectrum, individual simulations of the two measurements can be subtracted from or divided by each other pairwise until a collection of simulated difference spectra combining the variability inherent to each measurement is produced. From these bootstrapped simulations, confidence bands can be produced using any of the methods in Sect. 4.
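The pairwise procedure can be sketched as follows. This is an illustration under stated assumptions rather than the implementation accompanying this paper. The frozen fraction is used as a simple IN activity metric, the freezing temperatures are synthetic, the function names are hypothetical, and pointwise percentile bands are substituted for the confidence band methods of Sect. 4.

```python
import numpy as np

def bootstrap_frozen_fraction(freeze_temps, grid, n_sim, rng):
    """Empirical bootstrap: resample droplets with replacement and return
    simulated frozen-fraction spectra with shape (n_sim, len(grid))."""
    n = freeze_temps.size
    idx = rng.integers(0, n, size=(n_sim, n))
    resampled = freeze_temps[idx]  # shape (n_sim, n)
    # During cooling, a droplet is frozen at grid temperature T
    # if its freezing temperature is at or above T.
    return (resampled[:, :, None] >= grid).mean(axis=1)

def difference_bands(sims_a, sims_b, kind="difference", level=0.95):
    """Combine simulations pairwise and return pointwise percentile bands."""
    sims = sims_a - sims_b if kind == "difference" else sims_a / sims_b
    alpha = 1.0 - level
    lower, upper = np.quantile(sims, [alpha / 2, 1 - alpha / 2], axis=0)
    return sims, lower, upper

rng = np.random.default_rng(0)
# Synthetic freezing temperatures in degrees Celsius (illustrative only).
temps_a = rng.normal(-20.0, 1.5, 100)
temps_b = rng.normal(-24.0, 1.5, 100)
grid = np.linspace(-28.0, -16.0, 25)

sims_a = bootstrap_frozen_fraction(temps_a, grid, 500, rng)
sims_b = bootstrap_frozen_fraction(temps_b, grid, 500, rng)
sims, lower, upper = difference_bands(sims_a, sims_b, kind="difference")

# Temperatures where the band excludes zero, i.e. the null hypothesis is rejected.
significant = (lower > 0) | (upper < 0)
```

Because simulation i of one measurement is paired with simulation i of the other, each simulated difference spectrum carries the resampling variability of both measurements. A ratio could be used instead wherever the denominator is nonzero.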
Figure 6a and b show the ratio and difference between the IN spectra of
water aged volcanic ash and unaged volcanic ash with confidence intervals.
Suspension of minerals and volcanic ash in water can cause alteration of the
ice-active surface sites due to a variety of geochemical processes as shown
in recent literature (Harrison et al., 2019; Jahn et al., 2019; Kumar et
al., 2019; Maters et al., 2020; Perkins et al., 2020; Fahy et al., 2022b).
Based on the confidence bands of either metric, it can easily be seen that
below approximately
Comparison of the water aged
Difference spectra have a variety of useful applications within the context
of ice nucleation. The first has already been shown, as two spectra can be
easily tested to determine whether there is a statistically significant
difference between them. This is particularly useful in studies of chemical
aging, where the change in IN activity after a given chemical treatment can
be quantitatively measured using the difference or ratio before and after
aging. Another application is in background freezing subtraction for IN
spectra. All droplet-on-substrate methods used to measure heterogeneous IN
activity have some level of background freezing activity either from
background heterogeneous nucleation or from homogeneous ice nucleation that
can change day to day depending on the system (Polen et al., 2018; Vali,
2019). For accurate measurements and to compare between instruments, the
instrumental background (or homogeneous ice nucleation activity) must be
subtracted from any measured heterogeneous IN spectrum. This can be readily
accomplished by calculating the difference between the IN spectrum of
interest and the background freezing spectrum. Where there is no background,
the difference is equal to the sample spectrum. By saving the subtracted
simulations used to calculate the variability in this difference spectrum,
the background-subtracted data can be compared further via another difference spectrum if desired.
This can also be useful in determining whether a sample's IN activity is
distinguishable from the instrumental background for weakly IN-active materials. For
all use cases, accurate confidence bands based on the bootstrapping
procedures presented here are integral to ensuring rigorous and correct
analysis and interpretation of the data, as simply subtracting
A third application of difference spectra in IN activity is in locating outliers. Droplet-on-substrate IN measurements are extremely sensitive to contamination and human error, even when great care is taken during the sample preparation process. When two measurements of the same sample disagree, additional replicate measurements are taken to determine whether a measurement is an outlier, and this judgment is usually made visually. Ideally, a more quantitative measurement of outlier status would be used, such as the Grubbs test (Grubbs, 1969), Tukey's Fences (Tukey, 1977), or the modified Thompson Tau test (Thompson, 1985). However, the applicability of these common techniques to IN spectra, and the validity of the assumptions they require, are questionable. Instead, we propose that for a quantitative measurement of whether a sample is an outlier, the difference spectrum comparing the sample in question with the combined spectrum of the remaining measurements of the same sample can be used. An example of this analysis is shown in Fig. 7, where each of the water aged ash freezing experiments is compared, using a difference plot, to the combination of the remaining measurements. It can be clearly seen that only the spectrum shown in goldenrod is statistically significantly different (in this case lower) at the 99 % confidence level based on the bootstrapped tskew confidence bands. Therefore, this experiment could be treated as an outlier at that confidence level and excluded from future analysis. Even so, great care should be taken when dealing with potential outliers, and the confidence level required to exclude outliers should be carefully considered so as not to remove valid data. Whenever possible, decisions about whether to exclude a potential outlier should combine this statistical method with observations (or the lack thereof) of specific experimental errors in the laboratory.
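A minimal sketch of this leave-one-out comparison is given below, under several assumptions that are ours and not part of the published analysis. The remaining replicates are combined by pooling their droplets, the frozen fraction is used as the IN activity metric, pointwise percentile bands replace the tskew bands, and all names and data are hypothetical. Note that checking a pointwise band at many temperatures raises the chance of a false flag, so such a flag should be read as a prompt for closer inspection.

```python
import numpy as np

def bootstrap_frozen_fraction(freeze_temps, grid, n_sim, rng):
    """Resample droplets with replacement; return (n_sim, len(grid)) spectra."""
    idx = rng.integers(0, freeze_temps.size, size=(n_sim, freeze_temps.size))
    return (freeze_temps[idx][:, :, None] >= grid).mean(axis=1)

def leave_one_out_flags(replicates, grid, level=0.99, n_sim=500, seed=0):
    """Flag each replicate whose difference from the pooled remaining
    replicates has a band excluding zero at any grid temperature."""
    rng = np.random.default_rng(seed)
    alpha = 1.0 - level
    flags = []
    for i, rep in enumerate(replicates):
        # Pool the droplets of every other replicate (an assumed way to combine them).
        rest = np.concatenate([r for j, r in enumerate(replicates) if j != i])
        diff = (bootstrap_frozen_fraction(rep, grid, n_sim, rng)
                - bootstrap_frozen_fraction(rest, grid, n_sim, rng))
        lower, upper = np.quantile(diff, [alpha / 2, 1 - alpha / 2], axis=0)
        flags.append(bool(np.any((lower > 0) | (upper < 0))))
    return flags

# Synthetic replicates: three similar experiments and one that freezes about 4 K colder.
data_rng = np.random.default_rng(1)
replicates = [data_rng.normal(-20.0, 1.5, 60) for _ in range(3)]
replicates.append(data_rng.normal(-24.0, 1.5, 60))
grid = np.linspace(-28.0, -16.0, 25)

flags = leave_one_out_flags(replicates, grid, level=0.99)
```

When one replicate is strongly shifted, it also shifts the pooled spectrum against which the other replicates are compared. Flags on the other replicates should therefore be re-evaluated after the most discrepant experiment has been examined.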
We have presented a rigorous and generalized set of methods for
interpolating raw data, calculating confidence bands and other statistics,
and quantitatively comparing IN spectra derived from droplet freezing
assays. The interpolation methods discussed use ice nucleation data far more
efficiently than previous binning methods, and allow continuous quantitative
comparison of IN spectra without compromising statistical power and detail
present in the original data. Empirical bootstrapping is introduced as an
improvement on the elementary statistical methods and parametric
bootstrapping previously used, capturing the full variability present in
each IN spectrum or collection of IN spectra with no assumptions about the
nature of ice nucleation for the material being tested. Continuous
confidence bands are calculated using rigorous and modern algorithms to
replace the quantile intervals or
Comparison of individual water aged cumulative
These approaches can be used to help answer many important research
questions in the field related to statistically assessing observed changes
or differences in IN activities, and can be applied to any experimental setup
using arrays of droplets freezing over time or at varying temperatures. They
are supported by statistical theory and use widely accepted methodologies
from the statistics literature. The universality, simplicity, and accuracy
of this approach make it an ideal candidate to be a standard statistical
method by which to compare datasets from different instruments and groups.
The bootstrapping approach could be particularly useful for incorporating
uncertainty in IN activity into advanced atmospheric models, as a full
distribution of IN activity at each temperature can be easily estimated from
simulations. To facilitate adoption of these statistics, all code developed
for this project along with documentation and data to recreate the figures
in this paper are available in archived form as was used at the time of
writing at KiltHub (Fahy et al., 2022a) or in a living GitHub repository
where updates or additional information may be added in the future
(
Further refinement of these methods by optimizing code runtime, improving confidence interval coverage, adding simulation methods, and implementing different statistics may be accomplished in the future as necessary. The procedures described here could potentially be extended to describe uncertainty in instruments that measure ice nucleation in the aerosol phase, such as CFDC-type instruments and expansion chambers, and they are not limited to ice nucleation. This may lead to applications describing uncertainty in experiments analyzing a variety of nucleation processes under varying conditions. If these methods are widely adopted, the quality and consistency of statistical treatment of nucleation data will improve, leading to enhanced representation and communication of results and interpretations within those fields.
All code and data used in this project can be accessed in its archived form at
The supplement related to this article is available online at:
WDF and RCS conceptualized the paper. CRS contributed statistical methods. WDF wrote the script; collected, analyzed, and visualized data; and wrote the initial paper draft. All authors provided input into the methods developed and edited the paper.
The contact author has declared that none of the authors has any competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
We are grateful to Leif Jahn for helpful discussions in developing this concept, and to two anonymous reviewers for their feedback, which has greatly improved the clarity of this paper.
This research has been supported by the National Science Foundation of the United States of America, Division of Chemistry (grant no. CHM-1554941).
This paper was edited by Mingjin Tang and reviewed by two anonymous referees.