Analysis of functional groups in atmospheric aerosols by infrared spectroscopy: sparse methods for statistical selection of relevant absorption bands
Abstract. Various vibrational modes present in molecular mixtures of laboratory and atmospheric aerosols give rise to complex Fourier transform infrared (FT-IR) absorption spectra. Such spectra can be chemically informative, but they often require sophisticated algorithms for quantitative characterization of aerosol composition. Naïve statistical calibration models developed for quantification employ the full suite of wavenumbers available from a set of spectra. As a result, they lose the mechanistic link between chemical composition and the changes in absorption patterns that underpin the models' predictive capability. Using sparse representations of the same set of spectra, alternative calibration models can be built in which only a select group of absorption bands is used to make quantitative predictions of various aerosol properties. Such models are desirable because they allow us to relate predicted properties to their underlying molecular structure. In this work, we evaluate four algorithms for achieving sparsity in FT-IR spectroscopy calibration models. Sparse calibration models exclude unnecessary wavenumbers from infrared spectra during model building. This permits identification and evaluation of the vibrational modes, in molecules within complex aerosol mixtures, that are most relevant for quantitatively predicting various measures of aerosol composition. We study two types of models. The first predicts alcohol COH, carboxylic COH, alkane CH, and carbonyl CO functional group (FG) abundances in ambient samples based on laboratory calibration standards. The second predicts thermal optical reflectance (TOR) organic carbon (OC) and elemental carbon (EC) mass in new ambient samples by directly calibrating infrared spectra to a set of ambient samples reserved for calibration. We describe the development and selection of each calibration model and evaluate the effect of sparsity on prediction performance. Finally, we interpret the absorption bands used in the quantitative prediction of FG abundances and TOR OC and EC concentrations.
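The notion of a sparse calibration model, in which only a subset of wavenumbers receives a nonzero weight, can be illustrated with a minimal sketch. The abstract does not name the four algorithms, so this example assumes lasso (L1-penalized least squares) fitted by cyclic coordinate descent as a generic stand-in for a sparsity-inducing method. It is not presented as one of the methods evaluated in this work. The function names (`soft_threshold`, `lasso_coordinate_descent`, `sylvester_hadamard`), the penalty value, and the toy "spectra" are all hypothetical.

```python
"""Toy illustration of a sparse (L1-penalized) calibration model.

Assumption: lasso fitted by cyclic coordinate descent stands in for a generic
sparse calibration method; it is not claimed to be one of the paper's algorithms.
"""


def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward zero by t, returning exactly 0.0 inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, alpha, max_iter=1000, tol=1e-10):
    """Minimize (1/(2n))*||y - X b||^2 + alpha*||b||_1 (no intercept).

    X is a list of n spectra (rows), each a list of p absorbances (one per
    wavenumber channel); y holds the n reference values to calibrate against.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    # (1/n) * ||x_j||^2 for each wavenumber column j.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    # Residual r = y - X b, updated in place whenever a coefficient changes.
    r = list(y)
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column can never be selected
            # Correlation of column j with the partial residual (b_j added back).
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def sylvester_hadamard(k: int):
    """Return a 2**k x 2**k matrix of +/-1 entries with orthogonal columns."""
    H = [[1.0]]
    for _ in range(k):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


# Hypothetical toy data: 8 "spectra" x 8 wavenumber channels. The columns are
# made mutually orthogonal so the result is easy to verify by hand; real FT-IR
# absorbances at neighbouring wavenumbers are strongly correlated.
X = sylvester_hadamard(3)
true_b = [0.0, 3.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0]  # only channels 1 and 5 matter
noise = [0.05, -0.05, 0.0, 0.0, 0.05, 0.0, -0.05, 0.0]
y = [sum(x * c for x, c in zip(row, true_b)) + e for row, e in zip(X, noise)]

b = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, coef in enumerate(b) if coef != 0.0]
print("selected wavenumber channels:", selected)  # prints: selected wavenumber channels: [1, 5]
```

The L1 penalty drives the coefficients of uninformative channels to exactly zero, so the retained channels can be read off directly and, in a real calibration, mapped back to absorption bands. The price is that retained coefficients are shrunk toward zero (here by `alpha`, giving roughly 2.91 and 1.91 instead of 3 and 2).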