<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" dtd-version="3.0">
  <front>
    <journal-meta>
<journal-id journal-id-type="publisher">AMT</journal-id>
<journal-title-group>
<journal-title>Atmospheric Measurement Techniques</journal-title>
<abbrev-journal-title abbrev-type="publisher">AMT</abbrev-journal-title>
<abbrev-journal-title abbrev-type="nlm-ta">Atmos. Meas. Tech.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">1867-8548</issn>
<publisher><publisher-name>Copernicus Publications</publisher-name>
<publisher-loc>Göttingen, Germany</publisher-loc>
</publisher>
</journal-meta>

    <article-meta>
      <article-id pub-id-type="doi">10.5194/amt-9-3429-2016</article-id><title-group><article-title><?xmltex \hack{\vspace{4mm}}?> Analysis of functional groups in atmospheric aerosols by infrared
spectroscopy: sparse methods for statistical selection of <?xmltex \hack{\break}?>relevant absorption bands</article-title>
      </title-group><?xmltex \runningtitle{Sparse methods for statistical selection of relevant absorption bands}?><?xmltex \runningauthor{S.~Takahama et al.}?>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Takahama</surname><given-names>Satoshi</given-names></name>
          <email>satoshi.takahama@epfl.ch</email>
        <ext-link>https://orcid.org/0000-0002-3335-8741</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Ruggeri</surname><given-names>Giulia</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Dillner</surname><given-names>Ann M.</given-names></name>
          
        </contrib>
        <aff id="aff1"><label>1</label><institution>ENAC/IIE Swiss Federal Institute of Technology Lausanne (EPFL), Lausanne, Switzerland</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>University of California – Davis, Davis, California, USA</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Satoshi Takahama (satoshi.takahama@epfl.ch)</corresp></author-notes><pub-date><day>28</day><month>July</month><year>2016</year></pub-date>
      
      <volume>9</volume>
      <issue>7</issue>
      <fpage>3429</fpage><lpage>3454</lpage>
      <history>
        <date date-type="received"><day>6</day><month>January</month><year>2016</year></date>
           <date date-type="rev-request"><day>18</day><month>February</month><year>2016</year></date>
           <date date-type="rev-recd"><day>15</day><month>June</month><year>2016</year></date>
           <date date-type="accepted"><day>1</day><month>July</month><year>2016</year></date>
      </history>
      <permissions>
<license license-type="open-access">
<license-p>This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/3.0/">http://creativecommons.org/licenses/by/3.0/</ext-link></license-p>
</license>
</permissions><self-uri xlink:href="https://amt.copernicus.org/articles/9/3429/2016/amt-9-3429-2016.html">This article is available from https://amt.copernicus.org/articles/9/3429/2016/amt-9-3429-2016.html</self-uri>
<self-uri xlink:href="https://amt.copernicus.org/articles/9/3429/2016/amt-9-3429-2016.pdf">The full text article is available as a PDF file from https://amt.copernicus.org/articles/9/3429/2016/amt-9-3429-2016.pdf</self-uri>


      <abstract>
    <p>Various vibrational modes present in molecular mixtures of laboratory and
atmospheric aerosols give rise to complex Fourier transform infrared (FT-IR)
absorption spectra. Such spectra can be chemically informative, but they often
require sophisticated algorithms for quantitative characterization of aerosol
composition. Naïve statistical calibration models developed for
quantification employ the full suite of wavenumbers available from a set of
spectra, leading to loss of mechanistic interpretation between chemical
composition and the resulting changes in absorption patterns that underpin
their predictive capability. Using sparse representations of the same set of
spectra, alternative calibration models can be built in which only a select
group of absorption bands are used to make quantitative prediction of various
aerosol properties. Such models are desirable as they allow us to relate predicted properties to their underlying molecular
structure. In this work, we present an evaluation of four algorithms for
achieving sparsity in FT-IR spectroscopy calibration models. Sparse
calibration models exclude unnecessary wavenumbers from infrared spectra
during the model building process, permitting identification and evaluation
of the most relevant vibrational modes of molecules in complex aerosol
mixtures required to make quantitative predictions of various measures of
aerosol composition. We study two types of models: one which predicts alcohol
COH, carboxylic COH, alkane CH, and carbonyl CO functional group (FG)
abundances in ambient samples based on laboratory calibration standards and
another which predicts thermal optical reflectance (TOR) organic carbon (OC)
and elemental carbon (EC) mass in new ambient samples by direct calibration
of infrared spectra to a set of ambient samples reserved for calibration. We
describe the development and selection of each calibration model and
evaluate the effect of sparsity on prediction performance. Finally, we
ascribe interpretation to absorption bands used in quantitative prediction of
FGs and TOR OC and EC concentrations.</p>
  </abstract>
    </article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <title>Introduction</title>
      <p>Atmospheric aerosols or particulate matter (PM) can range in size from a few
nanometers to tens of micrometers and exist as complex mixtures of organic
compounds, black carbon, sea salt and other inorganic salts, mineral dust,
trace elements, and water <xref ref-type="bibr" rid="bib1.bibx93" id="paren.1"/>. Inhalation exposure to PM can
lead to increased morbidity and mortality in susceptible populations,
interaction with radiation can lead to visibility reduction and perturbations
in the Earth's energy balance, and PM can serve as seeds for cloud droplets
and ice particles that lead to additional changes in the climate system
<xref ref-type="bibr" rid="bib1.bibx27 bib1.bibx56" id="paren.2"><named-content content-type="pre">e.g.,</named-content></xref>.</p>
      <p>Fourier transform infrared (FT-IR) spectroscopy <xref ref-type="bibr" rid="bib1.bibx40" id="paren.3"/> is a
versatile tool that has been used to detect or measure ammonium, water, ice,
mineral dust, organic functional groups (FGs), inorganic ions, and
carbonaceous material in laboratory and ambient particles
<xref ref-type="bibr" rid="bib1.bibx21 bib1.bibx74 bib1.bibx1 bib1.bibx22 bib1.bibx55 bib1.bibx71 bib1.bibx92 bib1.bibx53 bib1.bibx54 bib1.bibx69 bib1.bibx77 bib1.bibx23 bib1.bibx47 bib1.bibx35 bib1.bibx102 bib1.bibx25" id="paren.4"><named-content content-type="pre">e.g.,</named-content></xref>.
While absorption bands of isolated molecules culminate in a series of narrow
peaks, condensed phase spectra pose challenges for interpretation as these
peaks significantly overlap due to heterogeneous broadening of bands from
similar bonds vibrating in slightly altered chemical environments
<xref ref-type="bibr" rid="bib1.bibx60" id="paren.5"/>. This phenomenon is particularly salient for atmospheric
PM, as it comprises a mixture of many different components, with the organic
fraction alone consisting of thousands of different types of molecules
<xref ref-type="bibr" rid="bib1.bibx43" id="paren.6"><named-content content-type="pre">e.g.,</named-content></xref>. Substrate interferences can additionally
obfuscate interpretation. For instance, a particular advantage of the FT-IR
technique is its capability to directly analyze particles collected on
polytetrafluoroethylene (PTFE, or Teflon) filters, which are routinely used
for analysis of gravimetric mass and elemental composition, among other
properties. FT-IR can extract a spectrum from an IR beam transmitted through
the filter rapidly and non-destructively, without requiring sample
pretreatment. In these cases, the PTFE signal can be the dominant component
of variation in the spectra <xref ref-type="bibr" rid="bib1.bibx74" id="paren.7"/>. In the face of such
complexity, statistical approaches are useful in building quantitative models
for calibration.</p>
      <p>These calibration models can take the form of a multivariate linear equation,
in which suitable coefficients are found to combine the effect of absorbances
at various wavenumbers of the infrared spectra to reproduce the concentration
of a target analyte. Problems of this form are commonly solved by ordinary
least squares (OLS) regression, but OLS performs poorly when the system is
underdetermined (i.e., there are many thousands of wavenumbers and only several
hundred samples), and serial correlation exists among predictor variables
(i.e., absorbances among adjacent wavenumbers are not independent of one
another) <xref ref-type="bibr" rid="bib1.bibx108" id="paren.8"/>. Partial least squares (PLS) regression is a
method that is suitable for obtaining coefficients when such features are
present <xref ref-type="bibr" rid="bib1.bibx111 bib1.bibx37 bib1.bibx72" id="paren.9"/>. PLS can be applied to
spectra with or without accounting for PTFE contribution (background
interferences) a priori. The suitability of PLS has been demonstrated in
building calibration models for ammonium <xref ref-type="bibr" rid="bib1.bibx84" id="paren.10"/>, silica
<xref ref-type="bibr" rid="bib1.bibx107" id="paren.11"/>, organic functional groups
<xref ref-type="bibr" rid="bib1.bibx20 bib1.bibx102 bib1.bibx91" id="paren.12"/>, and, more recently, organic
carbon (OC) and elemental carbon (EC) reported by thermal optical reflectance
(TOR) <xref ref-type="bibr" rid="bib1.bibx25 bib1.bibx26" id="paren.13"/>. We remark that while the latter two
carbonaceous substances are presumably composed of a complex combination
of molecules, statistical calibration models using FT-IR spectra have been
demonstrated to accurately predict TOR-equivalent OC and EC within
measurement precision of collocated samples reported by the TOR method
<xref ref-type="bibr" rid="bib1.bibx25 bib1.bibx26" id="paren.14"/>. However, we have not yet expounded on how
we are able to make accurate predictions from these calibration models,
especially for TOR OC and EC, which presumably require a suite of vibrational
modes for quantification.</p>
      <p>One approach to facilitate interpretation is variable selection, in which
models are reduced to only the relevant wavenumbers required for prediction
<xref ref-type="bibr" rid="bib1.bibx46 bib1.bibx29 bib1.bibx2" id="paren.15"/>. Typically, all wavenumbers are
used for prediction in PLS, with irrelevant wavenumbers having small
coefficient values. Retaining unnecessary wavenumbers can contribute to
overall noise in predictions, degrading model accuracy
<xref ref-type="bibr" rid="bib1.bibx13 bib1.bibx97 bib1.bibx52 bib1.bibx86 bib1.bibx79 bib1.bibx9" id="paren.16"/>,
and their elimination can make relevant variables more salient for
interpretation. Common methods examine various combinations of variables and
their subsets to arrive at the most predictive model <xref ref-type="bibr" rid="bib1.bibx46" id="paren.17"/>.
However, combinatorial analysis is prohibitive when the dimensionality of the
data is large, which is the case for infrared spectra where absorbances for
thousands of wavenumbers are available. More efficient methods for variable
elimination are accomplished through statistical sampling
<xref ref-type="bibr" rid="bib1.bibx9 bib1.bibx107" id="paren.18"/>, but we explore a class of algorithms falling
under the domain of sparse methods <xref ref-type="bibr" rid="bib1.bibx29 bib1.bibx2" id="paren.19"/>. In the
context of linear regression, sparsity involves arriving at a set of
regression coefficients in which some or many of the values are exactly 0,
allowing identification of the important set of variables that remain.
Sparsity constraints can be imposed in one of several ways in the context of
PLS regression, and in this work we explore their merits for analysis of
atmospheric PM.</p>
      <p>We revisit calibration models for four FGs developed using laboratory
standards <xref ref-type="bibr" rid="bib1.bibx91 bib1.bibx101" id="paren.20"/> and TOR OC and EC calibration
models developed with ambient PM<inline-formula><mml:math display="inline"><mml:msub><mml:mi/><mml:mn>2.5</mml:mn></mml:msub></mml:math></inline-formula> samples collected in 2011 at seven
sites within the Interagency Monitoring of PROtected Visual Environment
(IMPROVE; <xref ref-type="bibr" rid="bib1.bibx70 bib1.bibx44" id="altparen.21"/>) monitoring network
<xref ref-type="bibr" rid="bib1.bibx25 bib1.bibx26" id="paren.22"/>. These past studies evaluate various
performance metrics achieved by statistical calibration models using the full
set of wavenumbers, and we evaluate the effect of variable selection on model
performance and interpretation. We explore four alternative multivariate
calibration models that can be built according to different sparsity
constraints and the resulting sensitivity of predictions to sparse
formulations. We build two sets of calibration models using two levels of
spectral processing (with and without removal of the PTFE interference) for
each analyte and algorithm, and we further report on the most influential
absorption bands in the infrared spectra identified for prediction of FGs and
TOR OC and EC.</p>
</sec>
<sec id="Ch1.S2">
  <title>Methods</title>
      <p>In this section, we first summarize the experimental protocol detailed by
<xref ref-type="bibr" rid="bib1.bibx91" id="text.23"/>, in which infrared spectra and reference measurements
are acquired (Sect. <xref ref-type="sec" rid="Ch1.S2.SS1"/>). We then describe the PLS
formulation, which provides the general framework for solving the calibration
problem by projection onto latent variables (LVs), and methods for generating
a range of sparse solutions (Sect. <xref ref-type="sec" rid="Ch1.S2.SS2"/>).
Section <xref ref-type="sec" rid="Ch1.S2.SS3"/> describes how models are selected
and evaluated, and Sect. <xref ref-type="sec" rid="Ch1.S2.SS4"/> describes our approach to
inferring influential absorption bands from the sparse solutions.</p>
<sec id="Ch1.S2.SS1">
  <title>Experimental methods and spectra processing</title>
<sec id="Ch1.S2.SS1.SSS1">
  <title>Laboratory and ambient samples</title>
      <p>For this work, we use 794 pairs of ambient PM<inline-formula><mml:math display="inline"><mml:msub><mml:mi/><mml:mn>2.5</mml:mn></mml:msub></mml:math></inline-formula> samples collected in
the IMPROVE monitoring network, 250 laboratory standards, and 54 blank
samples used previously by <xref ref-type="bibr" rid="bib1.bibx91" id="text.24"/>,
<xref ref-type="bibr" rid="bib1.bibx25 bib1.bibx26" id="text.25"/>, and <xref ref-type="bibr" rid="bib1.bibx101" id="text.26"/> for building FG
and TOR OC and EC calibration models with canonical PLS regression. A pair of
ambient samples consists of particles collected on 25 mm quartz fiber
filters and 25 mm PTFE filters. The quartz filters are analyzed by the TOR IMPROVE_A protocol for OC and EC mass
<xref ref-type="bibr" rid="bib1.bibx16" id="paren.27"/>, and the PTFE filters are used for acquisition of infrared
spectra, among other properties. Monthly median values of OC loadings in
blank samples are subtracted from ambient TOR OC loadings to account for the
gas phase adsorption artifact by quartz fiber filters <xref ref-type="bibr" rid="bib1.bibx25" id="paren.28"/>.
The ratio of organic mass (OM) to organic carbon estimated in ambient samples spans a range of 1.46 to 2.01
between the 10th and 90th percentiles, with a median ratio of 1.69
<xref ref-type="bibr" rid="bib1.bibx91" id="paren.29"/>. The 250 laboratory standards consist of 9 compound
types in single, binary, and ternary mixtures, and reference concentrations
are obtained by gravimetric analysis of the filters <xref ref-type="bibr" rid="bib1.bibx91" id="paren.30"/>.
Four FG calibration models are built for alcohol hydroxyl (aCOH), carboxylic
hydroxyl (cCOH), alkane hydrocarbon (aCH), and carbonyl (CO), which comprise
these compounds. The blank samples are analytical blanks of PTFE filters
analyzed in the laboratory.</p>
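      <p>The blank correction described above can be sketched as follows. This is a minimal illustration only, not the processing code used for the IMPROVE data: the function name <monospace>blank_correct</monospace>, the tuple layout, and all numerical loadings are hypothetical and chosen for demonstration.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
from collections import defaultdict
from statistics import median


def blank_correct(ambient, blanks):
    """Subtract the monthly median blank OC loading from each ambient loading.

    ambient, blanks: lists of (month, oc_loading) tuples. This layout is an
    assumption made for illustration.
    """
    by_month = defaultdict(list)
    for month, loading in blanks:
        by_month[month].append(loading)
    monthly_median = {m: median(v) for m, v in by_month.items()}
    return [(m, oc - monthly_median[m]) for m, oc in ambient]


# Hypothetical loadings in arbitrary mass units, not values from the study.
blanks = [(1, 2.0), (1, 4.0), (1, 3.0), (2, 1.0), (2, 5.0)]
ambient = [(1, 10.0), (2, 8.5)]
corrected = blank_correct(ambient, blanks)
```
</preformat>
      <p>Using the median rather than the mean makes the monthly correction insensitive to an occasional contaminated blank.</p>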
</sec>
<sec id="Ch1.S2.SS1.SSS2">
  <title>Infrared spectra</title>
      <p>The PTFE filters are scanned (without pretreatment) using a Tensor 27 FT-IR
spectrometer (Bruker Optics) with liquid-nitrogen-cooled mercury cadmium
telluride detector in transmission mode. Each spectrum is acquired over
mid-infrared wavenumbers of 4000 to 420 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, and absorbance
spectra are calculated with respect to an empty sample chamber as reference.
The chamber is purged with air free of water vapor and carbon dioxide using a
purge-gas generator (Puregas) for all scans.<?xmltex \hack{\newpage}?></p>
      <p>Two different versions of the spectra described above are used in building
calibration models with each method described in Sect. <xref ref-type="sec" rid="Ch1.S2.SS2"/>.
Unprocessed (“raw”) spectra are unmodified except that zero-filled
(interpolated) points introduced by the acquisition software are removed such
that absorbance values at 2784 wavenumbers at a resolution of
1.3 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> remain. Baseline corrected spectra are modified according
to the procedure described by <xref ref-type="bibr" rid="bib1.bibx102" id="text.31"/>. Absorbances below
1500 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> are removed, and the interferences from PTFE are removed
by polynomial and linear interpolation between background regions such that
the analyte absorption is isolated for analysis. In this process, spectra are
interpolated along a wavenumber grid, which is also spaced at a resolution of
1.3 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, such that this spectra type contains 1563 wavenumbers.
Both types of spectra have previously shown comparable results for TOR OC and
EC prediction with PLS calibration <xref ref-type="bibr" rid="bib1.bibx25 bib1.bibx26" id="paren.32"/>. Example
spectra are shown in Supplement Sect. S1.</p>
</sec>
</sec>
<sec id="Ch1.S2.SS2">
  <title>Development of calibration models</title>
      <p>In multivariate calibration, we seek to solve the linear equation for
coefficients <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">b</mml:mi></mml:math></inline-formula>:
            <disp-formula id="Ch1.E1" content-type="numbered"><mml:math display="block"><mml:mrow><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="bold">X</mml:mi><mml:mi mathvariant="bold-italic">b</mml:mi><mml:mo>+</mml:mo><mml:mi mathvariant="bold-italic">e</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula> is the spectra matrix (composed by rows of spectra),
<inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">y</mml:mi></mml:math></inline-formula> is a vector (column matrix) of response values, <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">b</mml:mi></mml:math></inline-formula> are the
regression coefficients (also referred to as the regression vector), and
<inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">e</mml:mi></mml:math></inline-formula> is a vector of residuals. We continue our discussion under the
assumption that <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">y</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula> are centered by their column means
such that an intercept is not included as an additional coefficient.
<inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">y</mml:mi></mml:math></inline-formula> can alternatively be a multivariate response matrix <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula>,
but the univariate case is studied in this work to improve
interpretability <xref ref-type="bibr" rid="bib1.bibx42 bib1.bibx14" id="paren.33"><named-content content-type="pre">e.g.,</named-content></xref>.
Equation (<xref ref-type="disp-formula" rid="Ch1.E1"/>) is commonly solved by OLS, where <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">b</mml:mi></mml:math></inline-formula> is
found by minimizing the residual sum of squares (RSS). However, in
spectroscopic applications, the problem is often complicated by
underdeterminacy (many more variables than samples) and collinearity (serial
correlation among absorbance values). Therefore, projection onto LVs
<xref ref-type="bibr" rid="bib1.bibx112" id="paren.34"><named-content content-type="pre">e.g.,</named-content></xref> or numerical regularization
<xref ref-type="bibr" rid="bib1.bibx57" id="paren.35"><named-content content-type="pre">e.g.,</named-content></xref> is commonly used to obtain a suitable solution
<xref ref-type="bibr" rid="bib1.bibx46" id="paren.36"/>. Since four of the five methods used in this work are
based on PLS
regression, we first introduce PLS and the underlying structures (loading
weights and direction vectors) by which the regression vector is constructed.</p>
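      <p>The effect of underdeterminacy on the OLS problem can be illustrated with a small numerical sketch. The example below assumes synthetic random data with hypothetical dimensions (20 samples, 200 wavenumbers) rather than real spectra. After centering, the spectra matrix has rank of at most the number of samples minus one, so many different regression vectors reproduce the response equally well and minimizing the RSS alone does not single out an interpretable solution.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_wavenumbers = 20, 200  # hypothetical sizes: more variables than samples
X = rng.normal(size=(n_samples, n_wavenumbers))
b_true = np.zeros(n_wavenumbers)
b_true[[10, 50]] = [1.5, -2.0]      # only two variables carry signal
y = X @ b_true + 0.01 * rng.normal(size=n_samples)

# Center X and y by their column means so that no intercept is needed.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Centering removes one degree of freedom: rank is n_samples - 1 here.
rank = np.linalg.matrix_rank(Xc)

# One exact fit: the minimum-norm least-squares solution.
b_min_norm, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
residual = np.linalg.norm(yc - Xc @ b_min_norm)

# Another exact fit: add any vector from the null space of Xc.
_, _, Vt = np.linalg.svd(Xc, full_matrices=True)
null_vec = Vt[-1]                   # unit vector with Xc @ null_vec close to 0
b_other = b_min_norm + 5.0 * null_vec
residual_other = np.linalg.norm(yc - Xc @ b_other)
```
</preformat>
      <p>Both coefficient vectors give a residual that is zero to numerical precision although they differ substantially, which is why projection onto LVs or regularization is needed to obtain a stable solution.</p>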
<sec id="Ch1.S2.SS2.SSS1">
  <title>PLS description</title>
      <p>Notation for matrices and vectors is provided in
Appendix <xref ref-type="sec" rid="App1.Ch1.S1"/>. PLS performs a bilinear decomposition
and projection of both <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">y</mml:mi></mml:math></inline-formula> onto orthogonal bases
<inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">P</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">q</mml:mi></mml:math></inline-formula>, respectively
<xref ref-type="bibr" rid="bib1.bibx113 bib1.bibx114 bib1.bibx37 bib1.bibx76" id="paren.37"/>:

                  <disp-formula specific-use="align"><mml:math display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:mrow><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">TP</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="bold">E</mml:mi><mml:mi>X</mml:mi></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mi mathvariant="bold-italic">y</mml:mi></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mi mathvariant="bold">T</mml:mi><mml:msup><mml:mi mathvariant="bold-italic">q</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">e</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>

              The score matrix <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">T</mml:mi></mml:math></inline-formula> relates the two sets of variables and is
defined by column matrices of loading weight vectors, <inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="bold">W</mml:mi><mml:mo>=</mml:mo><mml:mo>[</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi>K</mml:mi></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>, factor loadings <inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="bold">P</mml:mi><mml:mo>=</mml:mo><mml:mo>[</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>K</mml:mi></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>, and direction vectors <inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="bold">R</mml:mi><mml:mo>=</mml:mo><mml:mo>[</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi>K</mml:mi></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx103" id="paren.38"/>. The
direction vectors can be defined by loading weights and factor loadings:
              <disp-formula id="Ch1.Ex3"><mml:math display="block"><mml:mrow><mml:mover accent="true"><mml:mi mathvariant="bold">R</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mover accent="true"><mml:mi mathvariant="bold">W</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:msup><mml:mfenced open="(" close=")"><mml:msup><mml:mover accent="true"><mml:mi mathvariant="bold">P</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>T</mml:mi></mml:msup><mml:mover accent="true"><mml:mi mathvariant="bold">W</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover></mml:mfenced><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
            A hat above a symbol denotes the estimator of a variable. From the matrix of
direction vectors, we can construct scores and regression coefficients:

                  <disp-formula specific-use="align"><mml:math display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:mrow><mml:mover accent="true"><mml:mi mathvariant="bold">T</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mi mathvariant="bold">X</mml:mi><mml:mover accent="true"><mml:mi mathvariant="bold">R</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mover accent="true"><mml:mi mathvariant="bold-italic">b</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mover accent="true"><mml:mi mathvariant="bold">R</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:msup><mml:mover accent="true"><mml:mi mathvariant="bold-italic">q</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>T</mml:mi></mml:msup><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>

              <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">y</mml:mi></mml:math></inline-formula> is therefore estimated as
              <disp-formula id="Ch1.Ex6"><mml:math display="block"><mml:mrow><mml:mover accent="true"><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mi mathvariant="bold">X</mml:mi><mml:mover accent="true"><mml:mi mathvariant="bold">R</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:msup><mml:mover accent="true"><mml:mi mathvariant="bold-italic">q</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>T</mml:mi></mml:msup><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
            The objective functions satisfied by solutions for <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">W</mml:mi></mml:math></inline-formula> and
<inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">R</mml:mi></mml:math></inline-formula> are described in Appendix <xref ref-type="sec" rid="App1.Ch1.S2"/>.</p>
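      <p>The relations above can be sketched numerically for a univariate response. The following is a minimal illustration using the standard NIPALS iteration for PLS with a single response, applied to synthetic data. It is not necessarily the implementation used in this work, and the function name <monospace>pls1</monospace> and the data dimensions are hypothetical. Loading weights, factor loadings, and response loadings are accumulated one LV at a time, after which the direction vectors, scores, and regression vector follow from the expressions given above.</p>
<preformat preformat-type="code" xml:space="preserve">
```python
import numpy as np


def pls1(X, y, n_lv):
    """PLS for a single response by NIPALS; X (n x p) and y (n,) are centered."""
    Xk, yk = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)    # loading weight vector w_k
        t = Xk @ w                   # score vector t_k
        tt = t @ t
        p = Xk.T @ t / tt            # factor loading p_k
        qk = yk @ t / tt             # response loading q_k
        Xk = Xk - np.outer(t, p)     # deflate X
        yk = yk - qk * t             # deflate y
        W.append(w)
        P.append(p)
        q.append(qk)
    W = np.column_stack(W)
    P = np.column_stack(P)
    q = np.array(q)
    R = W @ np.linalg.inv(P.T @ W)   # direction vectors, R = W (P^T W)^-1
    b = R @ q                        # regression vector, b = R q^T
    return W, P, q, R, b


# Synthetic example with hypothetical dimensions (30 samples, 100 variables).
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 100))
y = X[:, :3] @ np.array([1.0, -0.5, 2.0]) + 0.05 * rng.normal(size=30)
Xc, yc = X - X.mean(axis=0), y - y.mean()

W, P, q, R, b = pls1(Xc, yc, n_lv=4)
T = Xc @ R                           # scores, T = X R
y_hat = Xc @ b                       # predictions, y_hat = X R q^T
gram = T.T @ T                       # diagonal because the scores are orthogonal
```
</preformat>
      <p>Because the direction vectors act on the original (undeflated) spectra, predictions for new samples require only the regression vector and the column means used for centering.</p>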
      <p>There are two types of variables in our application of PLS regression: the
physical variables, or spectral features, corresponding to wavenumbers at
which absorbances are measured (columns of <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula>), and LVs which
represent the underlying components of the model (columns of <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">T</mml:mi></mml:math></inline-formula>).
To avoid ambiguity, we will always refer to the latter as LVs. Solutions
obtained using PLS with full wavenumber resolution will be referred to as the
“full” (wavenumber) solutions.</p>
</sec>
<sec id="Ch1.S2.SS2.SSS2">
  <title>Sparse methods</title>
      <p>We consider four methods for obtaining models that require fewer wavenumbers,
which are summarized below and described in more detail in
Appendix <xref ref-type="sec" rid="App1.Ch1.S2"/>, and by their respective authors in the
cited literature. The underlying principle for wavenumber selection used in
this paper is covariance maximization (for PLS regression) or residual
minimization (for OLS regression) scaled by a 1-norm penalty (<inline-formula><mml:math display="inline"><mml:mrow><mml:mo>‖</mml:mo><mml:mo>⋅</mml:mo><mml:msub><mml:mo>‖</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>)
placed on the regression vector (<inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">b</mml:mi></mml:math></inline-formula>), weight, or direction vector
(<inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">w</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">r</mml:mi></mml:math></inline-formula>, respectively). These penalties lead to
(1) shrinkage (vector elements tend toward 0) and (2) selection (some
elements become exactly 0). One sparse PLS formulation, which we refer to
as SPLSa, effectively imposes the 1-norm penalty on the weight vectors
<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>s of PLS <xref ref-type="bibr" rid="bib1.bibx66" id="paren.39"/>. The second sparse PLS formulation,
which we refer to as SPLSb, imposes the penalty on surrogate vectors kept in
close alignment with the direction vectors <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>s (which are closely
related to the weight vectors) to target a higher degree of sparsity
<xref ref-type="bibr" rid="bib1.bibx17" id="paren.40"/>. Elastic net (EN) regularization is not a variant of PLS but
belongs to a separate class of regression algorithms that solve
Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>). EN generalizes the least absolute shrinkage and
selection operator (LASSO; <xref ref-type="bibr" rid="bib1.bibx104" id="altparen.41"/>); the model infidelity
with respect to the response variable is penalized not only by the 1-norm but
also by the 2-norm applied directly to the regression vector <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">b</mml:mi></mml:math></inline-formula>. The
primary advantage of including the 2-norm penalty is that it imparts a
grouping effect: variables that co-vary are often selected together, rather
than a single member of the group being selected at random. For spectroscopic
applications where absorption bands span several wavenumbers, selection
by groups associated with the same absorption band rather than single
wavenumbers from each band is desirable for interpretation. The last method
of estimation, EN–PLS, is a hyphenated method that combines EN for variable
selection and PLS for finding an alternate solution to Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>)
using the same subset of wavenumbers as selected by EN. In addition to the
number of LVs, the sparse methods described above introduce a further free
parameter that controls the magnitude of the penalty against lack of
sparseness; models with varying degrees of sparsity are obtained by changing
this parameter (Table <xref ref-type="table" rid="App1.Ch1.T3"/>).</p>
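      <p>The shrinkage and selection effects of the 1-norm penalty, and the additional scaling introduced by the 2-norm penalty in EN, can be illustrated for a single standardized predictor. The following is a minimal sketch rather than the implementation used in this work: it assumes the parameterization common to coordinate-descent solvers for EN, and the names <monospace>lam</monospace> and <monospace>alpha</monospace>, as well as the input values, are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
def soft_threshold(z, lam):
    """Shrink z toward 0 by lam; return exactly 0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def elastic_net_update(z, lam, alpha):
    """Univariate elastic net estimate for one standardized predictor.

    z     : unpenalized (least-squares) coefficient for that predictor
    lam   : overall penalty strength (hypothetical name)
    alpha : mixing between 1-norm (alpha = 1) and 2-norm (alpha = 0) penalties
    """
    return soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))


# Illustrative (made-up) unpenalized coefficients
unpenalized = [0.9, -0.4, 0.03, -0.02]

# 1-norm penalty only: large elements shrink, small ones become exactly 0
lasso = [soft_threshold(z, 0.1) for z in unpenalized]

# 1-norm and 2-norm penalties combined
enet = [elastic_net_update(z, 0.1, 0.5) for z in unpenalized]

# Number of non-zero (selected) variables under the 1-norm penalty
n_selected = sum(1 for b in lasso if b != 0.0)
```
]]></preformat>
      <p>In this sketch, the two largest elements are shrunk toward 0 while the two smallest become exactly 0, which corresponds to the shrinkage and selection described above.</p>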
</sec>
</sec>
<sec id="Ch1.S2.SS3">
  <title>Model selection</title>
<sec id="Ch1.S2.SS3.SSS1">
  <title>Designation of calibration and test sets</title>
      <p>We distinguish between samples used for training and validation of
calibration models (the “calibration set”) and samples reserved for
evaluation (the “test set”, which has no influence on model development or
validation). The calibration and test sets are constructed identically to
those of the most accurate class of full wavenumber models described previously
<xref ref-type="bibr" rid="bib1.bibx91 bib1.bibx101 bib1.bibx25 bib1.bibx26" id="paren.42"/>. For FG
calibration, 158 laboratory standards are used for the calibration set while
80 similar laboratory samples and all 794 ambient samples are reserved for
the test set <xref ref-type="bibr" rid="bib1.bibx101" id="paren.43"/>. No blank samples are included in the FG
calibration, though samples in which the concentration of a particular FG is 0
(e.g., compounds that contain only FGs other than those for which the
calibration model is being built) are included <xref ref-type="bibr" rid="bib1.bibx91" id="paren.44"/>. For TOR OC and
EC calibration, the 794 ambient samples are arranged in order of TOR
reference concentration; every third sample is selected for the test set,
and the remaining two-thirds of samples are used for the calibration set
<xref ref-type="bibr" rid="bib1.bibx25 bib1.bibx26" id="paren.45"/>. Similarly, one-third of blank samples are
reserved for the test set while the remaining two-thirds are included in the
calibration set. While the division between sets is arbitrary in the TOR OC
case, for TOR EC the blanks are first arranged according to their predicted
concentrations using a calibration model developed without blank samples
<xref ref-type="bibr" rid="bib1.bibx26" id="paren.46"/> and every third selected for the test set as for ambient
samples. <xref ref-type="bibr" rid="bib1.bibx26" id="text.47"/> additionally divided the EC samples into high
and low concentration samples to build a hybrid (piecewise) calibration model
for improving predictions for the latter group of samples. For the purpose of
this work, we only consider a single calibration model that spans the entire
range of concentrations for each algorithm and spectra preparation. Full
wavenumber calibration models for TOR OC and EC developed using this protocol
have demonstrated the ability to capture overall variations in concentrations
which span a wide range of PM composition and environmental conditions
<xref ref-type="bibr" rid="bib1.bibx25 bib1.bibx26" id="paren.48"/>.</p>
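      <p>The ordered, every-third-sample division described above for the TOR calibrations can be sketched as follows. This is an illustration only: the function name is hypothetical, and which position within each group of three is assigned to the test set is an assumption not specified in the text.</p>
<preformat preformat-type="code"><![CDATA[
```python
def every_third_split(reference_conc):
    """Split sample indices into calibration and test sets.

    Samples are ordered by reference concentration; every third sample in
    that order goes to the test set (assumption: the third of each triple),
    and the remaining samples form the calibration set.
    """
    order = sorted(range(len(reference_conc)), key=lambda i: reference_conc[i])
    test_idx = order[2::3]
    test_set = set(test_idx)
    calib_idx = [i for i in order if i not in test_set]
    return calib_idx, test_idx


# Small made-up example with six samples
calib_idx, test_idx = every_third_split([5.0, 1.0, 3.0, 2.0, 4.0, 6.0])

# Set sizes obtained for 794 samples (placeholder concentrations)
calib_794, test_794 = every_third_split([float(i) for i in range(794)])
```
]]></preformat>
      <p>Because the split depends only on the ordering of reference concentrations, it is deterministic and both sets span the full concentration range.</p>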
</sec>
<sec id="Ch1.S2.SS3.SSS2">
  <title>Metrics for model evaluation</title>
      <p>The root mean squared error (RMSE) between the observed (<inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">y</mml:mi></mml:math></inline-formula>) and
estimated values (<inline-formula><mml:math display="inline"><mml:mover accent="true"><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover></mml:math></inline-formula>) determined by cross validation (CV) is
conventionally used for model selection
<xref ref-type="bibr" rid="bib1.bibx5 bib1.bibx46 bib1.bibx3" id="paren.49"/>. The RMSE given <inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">θ</mml:mi></mml:math></inline-formula> (a model
parameter or a set of parameters) is defined as
              <disp-formula id="Ch1.Ex7"><mml:math display="block"><mml:mrow><mml:msub><mml:mi mathvariant="normal">RMSE</mml:mi><mml:mi mathvariant="italic">θ</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:mo>‖</mml:mo><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mi mathvariant="italic">θ</mml:mi></mml:msub></mml:mrow><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:msubsup><mml:mo>‖</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:msqrt><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
            This estimate is calculated for <inline-formula><mml:math display="inline"><mml:mi>V</mml:mi></mml:math></inline-formula> model predictions evaluated against <inline-formula><mml:math display="inline"><mml:mi>V</mml:mi></mml:math></inline-formula>
validation sets to arrive at the RMSE of <inline-formula><mml:math display="inline"><mml:mi>V</mml:mi></mml:math></inline-formula>-fold CV, or RMSECV. To obtain
this estimate, the calibration set is divided into <inline-formula><mml:math display="inline"><mml:mi>V</mml:mi></mml:math></inline-formula> “folds” or subsets;
model parameters are trained on <inline-formula><mml:math display="inline"><mml:mrow><mml:mi>V</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> subsets combined together and validated
on the remaining subset, with the training and validation sets determined by
successive permutation over <inline-formula><mml:math display="inline"><mml:mi>V</mml:mi></mml:math></inline-formula> repetitions. For this work, samples in the
calibration set are arranged in order of increasing concentration for 10-fold
venetian blind CV in all methods such that the results are deterministic,
and the validation sets are likely to be representative of the training sets
for each permutation <xref ref-type="bibr" rid="bib1.bibx25 bib1.bibx101" id="paren.50"/>. However, as RMSECV
generally captures the decreasing model bias but underestimates the growing
variance, various strategies for reducing the risk of overfitting have been proposed.
One convention is to select a solution within a specified tolerance of the
minimum RMSECV – e.g., a fixed value of 10 % <xref ref-type="bibr" rid="bib1.bibx17" id="paren.51"/>, or within
1 standard deviation of the mean as estimated from the CV folds
<xref ref-type="bibr" rid="bib1.bibx46" id="paren.52"/>. For PLS, <xref ref-type="bibr" rid="bib1.bibx39" id="text.53"/> and <xref ref-type="bibr" rid="bib1.bibx101" id="text.54"/>
present new metrics which weigh decreasing RMSECV against increasing 2-norm
magnitude <inline-formula><mml:math display="inline"><mml:mrow><mml:mo>‖</mml:mo><mml:mi mathvariant="bold-italic">b</mml:mi><mml:mo>‖</mml:mo></mml:mrow></mml:math></inline-formula> of the regression vector, which is directly
correlated with calibration model variance <xref ref-type="bibr" rid="bib1.bibx28" id="paren.55"/>. We consider an
additional solution defined by the minimum of a penalized form of the RMSECV
<xref ref-type="bibr" rid="bib1.bibx39" id="paren.56"/>:

                  <disp-formula specific-use="align"><mml:math display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi mathvariant="normal">pRMSECV</mml:mi><mml:mi mathvariant="italic">θ</mml:mi></mml:msub><mml:mo>=</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi mathvariant="normal">RMSECV</mml:mi><mml:mi mathvariant="italic">θ</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mo>min⁡</mml:mo><mml:mi mathvariant="italic">θ</mml:mi></mml:msub><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi mathvariant="normal">RMSECV</mml:mi><mml:mi mathvariant="italic">θ</mml:mi></mml:msub><mml:mo mathvariant="italic">}</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mo>max⁡</mml:mo><mml:mi mathvariant="italic">θ</mml:mi></mml:msub><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi mathvariant="normal">RMSECV</mml:mi><mml:mi mathvariant="italic">θ</mml:mi></mml:msub><mml:mo mathvariant="italic">}</mml:mo><mml:mo>-</mml:mo><mml:msub><mml:mo>min⁡</mml:mo><mml:mi mathvariant="italic">θ</mml:mi></mml:msub><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi mathvariant="normal">RMSECV</mml:mi><mml:mi mathvariant="italic">θ</mml:mi></mml:msub><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mo>+</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mo>‖</mml:mo><mml:msub><mml:mover accent="true"><mml:mi mathvariant="bold-italic">b</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi mathvariant="italic">θ</mml:mi></mml:msub><mml:mo>‖</mml:mo><mml:mo>-</mml:mo><mml:msub><mml:mo>min⁡</mml:mo><mml:mi mathvariant="italic">θ</mml:mi></mml:msub><mml:mo mathvariant="italic">{</mml:mo><mml:mo>‖</mml:mo><mml:msub><mml:mover accent="true"><mml:mi mathvariant="bold-italic">b</mml:mi><mml:mo 
stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi mathvariant="italic">θ</mml:mi></mml:msub><mml:mo>‖</mml:mo><mml:mo mathvariant="italic">}</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mo>max⁡</mml:mo><mml:mi mathvariant="italic">θ</mml:mi></mml:msub><mml:mo mathvariant="italic">{</mml:mo><mml:mo>‖</mml:mo><mml:msub><mml:mover accent="true"><mml:mi mathvariant="bold-italic">b</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi mathvariant="italic">θ</mml:mi></mml:msub><mml:mo>‖</mml:mo><mml:mo mathvariant="italic">}</mml:mo><mml:mo>-</mml:mo><mml:msub><mml:mo>min⁡</mml:mo><mml:mi mathvariant="italic">θ</mml:mi></mml:msub><mml:mo mathvariant="italic">{</mml:mo><mml:mo>‖</mml:mo><mml:msub><mml:mover accent="true"><mml:mi mathvariant="bold-italic">b</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi mathvariant="italic">θ</mml:mi></mml:msub><mml:mo>‖</mml:mo><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>

              As <inline-formula><mml:math display="inline"><mml:mrow><mml:mo>‖</mml:mo><mml:mi mathvariant="bold-italic">b</mml:mi><mml:mo>‖</mml:mo></mml:mrow></mml:math></inline-formula> generally increases with the number of LVs, the model
selected with this metric is sensitive to the maximum number of LVs over
which the metric is evaluated <xref ref-type="bibr" rid="bib1.bibx101" id="paren.57"/>. We evaluate pRMSECVs over
the interval of LVs between one and the minimum RMSECV solution in increments
of one. The selected models generally have RMSECVs that exceed the minimum
RMSECV value by more than 10 % but provide a higher degree of parsimony with
respect to the number of LVs. Limited justification for this choice is provided in
Appendix <xref ref-type="sec" rid="App1.Ch1.S3"/>, and a more dedicated discussion
on these metrics is provided by <xref ref-type="bibr" rid="bib1.bibx101" id="text.58"/>.</p>
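      <p>The fold assignment and the penalized metric described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this work: the function names are hypothetical, the RMSECV and regression vector norms are made-up values indexed by the number of LVs, and the pRMSECV is evaluated only from one LV up to the minimum RMSECV solution.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def venetian_blind_folds(y, n_folds=10):
    """Assign each sample a fold label by cycling through the folds
    in order of increasing concentration (deterministic)."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    folds = [0] * len(y)
    for rank, i in enumerate(order):
        folds[i] = rank % n_folds
    return folds


def rmse(y, y_hat):
    """Root mean squared error between observed and estimated values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def min_max_scale(values):
    """Rescale values to [0, 1]; requires max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def select_lv_by_prmsecv(rmsecv, b_norm):
    """Return the number of LVs minimizing the sum of scaled RMSECV and
    scaled regression vector norm, over 1..(minimum RMSECV solution).
    Assumes the minimum RMSECV occurs at more than one LV."""
    n_max = rmsecv.index(min(rmsecv)) + 1
    scaled_r = min_max_scale(rmsecv[:n_max])
    scaled_b = min_max_scale(b_norm[:n_max])
    scores = [r + b for r, b in zip(scaled_r, scaled_b)]
    return scores.index(min(scores)) + 1


# Made-up values for models with 1 to 6 LVs
rmsecv = [1.0, 0.6, 0.45, 0.42, 0.40, 0.41]
b_norm = [1.0, 1.5, 2.5, 4.0, 6.0, 9.0]
n_lv = select_lv_by_prmsecv(rmsecv, b_norm)
```
]]></preformat>
      <p>With these illustrative values, the minimum RMSECV occurs at five LVs, whereas the penalized metric selects three LVs, for which the RMSECV exceeds the minimum by 12.5 %.</p>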
      <p>Selection of the full wavenumber model follows the
procedure described by <xref ref-type="bibr" rid="bib1.bibx101" id="text.59"/>, whereby an alternate formulation of the
pRMSECV metric is evaluated with an ensemble of penalties and selected by
consensus scoring <xref ref-type="bibr" rid="bib1.bibx49" id="paren.60"/>. For EN, in addition to the minimum
RMSECV solution, we consider a more parsimonious solution within 1 standard
error of the minimum RMSECV. For evaluation and selection of sparse PLS
methods, we reduce the complete set of models generated by varying the
sparsity parameter (except for EN–PLS, which is fixed by the EN solution) and
LVs by considering (1) global minimum RMSECV solutions, (2) solutions within
10 % tolerance of the minimum RMSECV model (standard errors of RMSECVs
were not readily obtainable), and (3) solutions meeting the minimum pRMSECV
criterion. These solutions are described further in Sect. S2. The final
solution for each algorithm and spectra type is selected from among candidate
models according to the extent of wavenumber reduction and capability to
produce estimates which are evaluated favorably against external reference
measurements of TOR OC and EC in the test set (when significant differences
in predictions exist). We consider our approach sufficient for this
exploratory work, but model selection weighing a metric of fidelity (e.g.,
RMSE) against two dimensions of parsimony (number of wavenumbers and LVs) is
a potential area of further research.</p>
</sec>
<sec id="Ch1.S2.SS3.SSS3">
  <title>Methods of evaluation</title>
      <p>Models are evaluated on sparseness and comparison to reference measurements.
Reference measurements include FG abundances in laboratory standards and TOR
OC and EC concentrations in ambient samples in the test set. For evaluation
of FGs in ambient samples, we sum the carbon mass estimated from FG
abundances (which we designate as “FG-OC”) and compare against TOR OC.
FG-OC is estimated from moles <inline-formula><mml:math display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> of FGs as
12.01 <inline-formula><mml:math display="inline"><mml:mi mathvariant="normal">µ</mml:mi></mml:math></inline-formula>g mole <inline-formula><mml:math display="inline"><mml:mrow><mml:mo>×</mml:mo><mml:mo>(</mml:mo><mml:mn>0.5</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msub><mml:mi>n</mml:mi><mml:mrow class="chem"><mml:mi mathvariant="normal">aCH</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mrow class="chem"><mml:mi mathvariant="normal">cCOH</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>
<xref ref-type="bibr" rid="bib1.bibx91" id="paren.61"/>. We characterize sparseness by the number of selected
wavenumbers, or non-zero variables (NZVs). We use Pearson's correlation
coefficient (<inline-formula><mml:math display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula>) to determine how closely related the predictions are (as
characterized by linearity) and the regression slope determined by
orthogonal regression (also referred to as major axis regression) to
characterize overall bias between two sets of values
<xref ref-type="bibr" rid="bib1.bibx87 bib1.bibx109" id="paren.62"/>. The intercept from the regression is not
reported to simplify the discussion (it is generally near 0); for detailed
evaluation that also considers detection limits and values close to 0, a
further study and exposition is recommended. Orthogonal regression is
appropriate for our comparisons as it considers that both concentrations
being compared are subject to error. However, we do not weight each
observation by the magnitude of its error, as measurement errors are
currently not well characterized for each sample and erroneous weighting can
lead to mischaracterization of bias.</p>
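      <p>The evaluation quantities described above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, FG amounts are assumed to be expressed in units for which the factor 12.01 yields carbon mass in the desired units, and the slope uses the standard unweighted major axis formula, which requires a non-zero covariance between the two sets of values.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math

CARBON_MOLAR_MASS = 12.01  # factor used in the FG-OC expression


def fg_oc(n_ach, n_ccoh):
    """Carbon mass from FG amounts: 12.01 x (0.5 n_aCH + n_cCOH)."""
    return CARBON_MOLAR_MASS * (0.5 * n_ach + n_ccoh)


def pearson_r_and_major_axis_slope(x, y):
    """Return Pearson's r and the unweighted orthogonal (major axis)
    regression slope of y on x; assumes non-zero covariance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    return r, slope


# Made-up values: predictions exactly twice the reference
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]
r, slope = pearson_r_and_major_axis_slope(reference, predicted)
```
]]></preformat>
      <p>Unlike ordinary least squares, the major axis slope treats both variables symmetrically, so exchanging the two sets of values yields the reciprocal slope.</p>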
</sec>
</sec>
<sec id="Ch1.S2.SS4">
  <title>Interpretation of influential absorption bands</title>
      <p>Examining sparse regression coefficients is informative for identifying
important absorption bands, but interpretation can still be complicated by
the compensation of interfering bands <xref ref-type="bibr" rid="bib1.bibx42 bib1.bibx64" id="paren.63"/>.
Rather than viewing these coefficients directly, we examine the loading
weights of PLS to aid in this interpretation. The first weight component
<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> of PLS can indicate a first approximation to the “pure
component” representation of the response variable but may be
obfuscated when large contributions to the signal come from components that
are not the target analyte <xref ref-type="bibr" rid="bib1.bibx42" id="paren.64"/>. This can be especially true
for our models where the spectra are not baseline corrected to remove the
PTFE signal prior to calibration. Therefore, we also examine the variable
importance in projection (VIP) metric <xref ref-type="bibr" rid="bib1.bibx112 bib1.bibx14" id="paren.65"><named-content content-type="pre">e.g.,</named-content></xref>,
which considers normalized loading weights with the fraction of captured
response and summarizes the importance of each wavenumber <inline-formula><mml:math display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula> for <inline-formula><mml:math display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> LVs:

                <disp-formula specific-use="align"><mml:math display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mtext>VIP</mml:mtext><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:msqrt><mml:mrow><mml:mi>M</mml:mi><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>h</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>k</mml:mi></mml:munderover><mml:mfenced open="(" close=")"><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mtext>SS</mml:mtext><mml:mi>h</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>h</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>k</mml:mi></mml:msubsup><mml:msub><mml:mtext>SS</mml:mtext><mml:mi>h</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mfenced><mml:msup><mml:mfenced open="(" close=")"><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>‖</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:msub><mml:mo>‖</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:msqrt><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mtext>where</mml:mtext><mml:mspace width="0.25em" linebreak="nobreak"/><mml:msub><mml:mtext>SS</mml:mtext><mml:mi>h</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>q</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:msub><mml:mi mathvariant="bold-italic">t</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:msubsup><mml:mi 
mathvariant="bold-italic">t</mml:mi><mml:mi>h</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>

            VIP is an expression of the normalized loading weight <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">w</mml:mi></mml:math></inline-formula> of the <inline-formula><mml:math display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>th
LV weighted by the corresponding fraction of captured response
(sum of squares, or SS). The average of squared VIP scores across
wavenumbers equals one (<inline-formula><mml:math display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mi>M</mml:mi><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>M</mml:mi></mml:msubsup><mml:msubsup><mml:mtext>VIP</mml:mtext><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>), so a
value of VIP greater than unity is often taken to be an indicator of an
important variable. However, this criterion is not a strict one, and
determination of useful thresholds is dependent on the proportion of
unimportant variables, correlation among important variables, and variation
in coefficient strengths present in the data set <xref ref-type="bibr" rid="bib1.bibx14" id="paren.66"/>. We
discuss our interpretation and selection of threshold in Sect. <xref ref-type="sec" rid="Ch1.S3.SS3"/>.</p>
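      <p>The VIP calculation can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the function name is hypothetical, the loading weights and the SS values for each LV are made-up inputs, and the SS values are supplied directly rather than computed from scores and response loadings.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def vip_scores(weights, ss):
    """VIP score of each variable for a model with k LVs.

    weights : M rows (variables), each holding k loading weights w_jh
    ss      : k values, the captured response (sum of squares) per LV
    """
    n_vars, n_lv = len(weights), len(ss)
    ss_total = sum(ss)
    # Squared 2-norm of each weight vector w_h
    norm_sq = [sum(weights[j][h] ** 2 for j in range(n_vars)) for h in range(n_lv)]
    scores = []
    for j in range(n_vars):
        weighted = sum(
            (ss[h] / ss_total) * weights[j][h] ** 2 / norm_sq[h]
            for h in range(n_lv)
        )
        scores.append(math.sqrt(n_vars * weighted))
    return scores


# Made-up example: four variables, two LVs
weights = [[0.8, 0.1], [0.6, 0.2], [0.0, 0.9], [0.0, 0.4]]
ss = [3.0, 1.0]
vip = vip_scores(weights, ss)

# The average of squared VIP scores across variables equals one
mean_sq_vip = sum(v ** 2 for v in vip) / len(vip)
```
]]></preformat>
      <p>In this example, the first variable has a VIP score above unity and the last has a score below unity, while the mean of the squared scores is one by construction.</p>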
      <p>For TOR OC and EC, we provide an additional level of qualitative
interpretation by associating absorption bands of vibrational modes to FGs
that contribute to our capability for predicting TOR OC and EC. For this
purpose, we examine regression coefficients alongside VIP scores and consider
that negative coefficients (a) compensate for positive artifacts in other
absorption regions or (b) are themselves artifacts of oscillations that occur
in regression coefficients when the number of LVs in the model is large
<?xmltex \hack{\mbox\bgroup}?><xref ref-type="bibr" rid="bib1.bibx39" id="paren.67"/><?xmltex \hack{\egroup}?>. In the latter case, we consider its influence alongside
the positive contributions of adjacent wavenumbers. Wavenumbers and
vibrational modes of organic bonds tabulated by <xref ref-type="bibr" rid="bib1.bibx95" id="text.68"/> and
<xref ref-type="bibr" rid="bib1.bibx80" id="text.69"/> have been used in this analysis. We note that carboxylates
and aminium FGs have different vibrational frequencies compared to their
neutral forms. As our PM samples are analyzed under dry conditions
<xref ref-type="bibr" rid="bib1.bibx91" id="paren.70"/>, ionic forms of carboxyl and amine groups are not
expected in significant quantities. These FGs can have similar vibrational
frequencies in crystalline structures at ambient conditions (e.g., L-alanine,
from <xref ref-type="bibr" rid="bib1.bibx12" id="altparen.71"/>) but have not been considered in the
interpretation of the different model solutions discussed in this work. As
organic PM mass estimated by the FT-IR technique without consideration for
these structures often agrees with OM reported by other analytical techniques
<xref ref-type="bibr" rid="bib1.bibx89 bib1.bibx38 bib1.bibx19" id="paren.72"><named-content content-type="pre">e.g.,</named-content></xref>, we expect that
their contribution to OC quantification may also be small.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F1" specific-use="star"><caption><p>Estimated RMSECVs for a range of models using different subsets of
wavenumbers are shown by shaded regions. Results are shown for models using
raw spectra. For panels in the first two columns (SPLSa and SPLSb), the
shaded regions extend from the minimum RMSECV to pRMSECV solutions for each
sparsity penalization parameter. For the last column of panels (EN/EN–PLS),
the shaded region extends from the minimum RMSECV to 1 standard error above
for each value of the penalization parameter in EN estimates. Circles
correspond to models selected for this work. For EN/EN–PLS panels, red
circles correspond to the EN solution, and blue circles correspond to the
selected solutions for EN–PLS. The RMSECVs for EN–PLS are underestimated
(Appendix <xref ref-type="sec" rid="App1.Ch1.S3"/>) and may not be amenable to
direct comparison with other methods.</p></caption>
          <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://amt.copernicus.org/articles/9/3429/2016/amt-9-3429-2016-f01.pdf"/>

        </fig>

      <?xmltex \floatpos{t}?><fig id="Ch1.F2" specific-use="star"><caption><p>Same information as Fig. <xref ref-type="fig" rid="Ch1.F1"/> is shown for
models using baseline corrected spectra.</p></caption>
          <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://amt.copernicus.org/articles/9/3429/2016/amt-9-3429-2016-f02.pdf"/>

        </fig>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T1" specific-use="star"><caption><p>Number of wavenumbers and LVs selected for final models.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="6">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:colspec colnum="5" colname="col5" align="left"/>
     <oasis:colspec colnum="6" colname="col6" align="left"/>
     <oasis:thead>
       <oasis:row>  
         <oasis:entry colname="col1"/>  
         <oasis:entry colname="col2"/>  
         <oasis:entry rowsep="1" namest="col3" nameend="col6" align="center">Number of NZVs (percent of full), number of LVs </oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Spectra type</oasis:entry>  
         <oasis:entry colname="col2">Method</oasis:entry>  
         <oasis:entry colname="col3">aCOH</oasis:entry>  
         <oasis:entry colname="col4">cCOH</oasis:entry>  
         <oasis:entry colname="col5">aCH</oasis:entry>  
         <oasis:entry colname="col6">CO</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>  
         <oasis:entry colname="col1">Raw</oasis:entry>  
         <oasis:entry colname="col2">Full</oasis:entry>  
         <oasis:entry colname="col3">2784 (100 %), 13</oasis:entry>  
         <oasis:entry colname="col4">2784 (100 %), 17</oasis:entry>  
         <oasis:entry colname="col5">2784 (100 %), 10</oasis:entry>  
         <oasis:entry colname="col6">2784 (100 %), 18</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Raw</oasis:entry>  
         <oasis:entry colname="col2">SPLSa</oasis:entry>  
         <oasis:entry colname="col3">549 (20 %), 13</oasis:entry>  
         <oasis:entry colname="col4">1022 (37 %), 4</oasis:entry>  
         <oasis:entry colname="col5">1837 (66 %), 12</oasis:entry>  
         <oasis:entry colname="col6">178 (6 %), 6</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Raw</oasis:entry>  
         <oasis:entry colname="col2">SPLSb</oasis:entry>  
         <oasis:entry colname="col3">107 (4 %), 10</oasis:entry>  
         <oasis:entry colname="col4">237 (9 %), 7</oasis:entry>  
         <oasis:entry colname="col5">2029 (73 %), 15</oasis:entry>  
         <oasis:entry colname="col6">1464 (53 %), 4</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Raw</oasis:entry>  
         <oasis:entry colname="col2">EN</oasis:entry>  
         <oasis:entry colname="col3">94 (3 %), none</oasis:entry>  
         <oasis:entry colname="col4">102 (4 %), none</oasis:entry>  
         <oasis:entry colname="col5">40 (1 %), none</oasis:entry>  
         <oasis:entry colname="col6">54 (2 %), none</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Raw</oasis:entry>  
         <oasis:entry colname="col2">EN–PLS</oasis:entry>  
         <oasis:entry colname="col3">94 (3 %), 12</oasis:entry>  
         <oasis:entry colname="col4">102 (4 %), 7</oasis:entry>  
         <oasis:entry colname="col5">40 (1 %), 7</oasis:entry>  
         <oasis:entry colname="col6">54 (2 %), 7</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Baseline corrected</oasis:entry>  
         <oasis:entry colname="col2">Full</oasis:entry>  
         <oasis:entry colname="col3">1563 (100 %), 22</oasis:entry>  
         <oasis:entry colname="col4">1563 (100 %), 9</oasis:entry>  
         <oasis:entry colname="col5">1563 (100 %), 18</oasis:entry>  
         <oasis:entry colname="col6">1563 (100 %), 9</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Baseline corrected</oasis:entry>  
         <oasis:entry colname="col2">SPLSa</oasis:entry>  
         <oasis:entry colname="col3">172 (11 %), 6</oasis:entry>  
         <oasis:entry colname="col4">100 (6 %), 1</oasis:entry>  
         <oasis:entry colname="col5">1451 (93 %), 9</oasis:entry>  
         <oasis:entry colname="col6">40 (3 %), 4</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Baseline corrected</oasis:entry>  
         <oasis:entry colname="col2">SPLSb</oasis:entry>  
         <oasis:entry colname="col3">483 (31 %), 31</oasis:entry>  
         <oasis:entry colname="col4">236 (15 %), 9</oasis:entry>  
         <oasis:entry colname="col5">147 (9 %), 9</oasis:entry>  
         <oasis:entry colname="col6">248 (16 %), 9</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Baseline corrected</oasis:entry>  
         <oasis:entry colname="col2">EN</oasis:entry>  
         <oasis:entry colname="col3">99 (6 %), none</oasis:entry>  
         <oasis:entry colname="col4">47 (3 %), none</oasis:entry>  
         <oasis:entry colname="col5">91 (6 %), none</oasis:entry>  
         <oasis:entry colname="col6">79 (5 %), none</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Baseline corrected</oasis:entry>  
         <oasis:entry colname="col2">EN–PLS</oasis:entry>  
         <oasis:entry colname="col3">99 (6 %), 19</oasis:entry>  
         <oasis:entry colname="col4">47 (3 %), 15</oasis:entry>  
         <oasis:entry colname="col5">91 (6 %), 13</oasis:entry>  
         <oasis:entry colname="col6">79 (5 %), 16</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup>

  <oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:thead>
       <oasis:row>  
         <oasis:entry colname="col1"/>  
         <oasis:entry colname="col2"/>  
         <oasis:entry rowsep="1" namest="col3" nameend="col4" align="center">Number of NZVs (percent of full), number of LVs </oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Spectra type</oasis:entry>  
         <oasis:entry colname="col2">Method</oasis:entry>  
         <oasis:entry colname="col3">OC</oasis:entry>  
         <oasis:entry colname="col4">EC</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>  
         <oasis:entry colname="col1">Raw</oasis:entry>  
         <oasis:entry colname="col2">Full</oasis:entry>  
         <oasis:entry colname="col3">2784 (100 %), 48</oasis:entry>  
         <oasis:entry colname="col4">2784 (100 %), 28</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Raw</oasis:entry>  
         <oasis:entry colname="col2">SPLSa</oasis:entry>  
         <oasis:entry colname="col3">2784 (100 %), 8</oasis:entry>  
         <oasis:entry colname="col4">2565 (92 %), 15</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Raw</oasis:entry>  
         <oasis:entry colname="col2">SPLSb</oasis:entry>  
         <oasis:entry colname="col3">629 (23 %), 8</oasis:entry>  
         <oasis:entry colname="col4">160 (6 %), 14</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Raw</oasis:entry>  
         <oasis:entry colname="col2">EN</oasis:entry>  
         <oasis:entry colname="col3">194 (7 %), none</oasis:entry>  
         <oasis:entry colname="col4">113 (4 %), none</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Raw</oasis:entry>  
         <oasis:entry colname="col2">EN–PLS</oasis:entry>  
         <oasis:entry colname="col3">194 (7 %), 7</oasis:entry>  
         <oasis:entry colname="col4">113 (4 %), 8</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Baseline corrected</oasis:entry>  
         <oasis:entry colname="col2">Full</oasis:entry>  
         <oasis:entry colname="col3">1563 (100 %), 15</oasis:entry>  
         <oasis:entry colname="col4">1563 (100 %), 33</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Baseline corrected</oasis:entry>  
         <oasis:entry colname="col2">SPLSa</oasis:entry>  
         <oasis:entry colname="col3">895 (57 %), 12</oasis:entry>  
         <oasis:entry colname="col4">828 (53 %), 15</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Baseline corrected</oasis:entry>  
         <oasis:entry colname="col2">SPLSb</oasis:entry>  
         <oasis:entry colname="col3">331 (21 %), 7</oasis:entry>  
         <oasis:entry colname="col4">1544 (99 %), 12</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Baseline corrected</oasis:entry>  
         <oasis:entry colname="col2">EN</oasis:entry>  
         <oasis:entry colname="col3">124 (8 %), none</oasis:entry>  
         <oasis:entry colname="col4">143 (9 %), none</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Baseline corrected</oasis:entry>  
         <oasis:entry colname="col2">EN–PLS</oasis:entry>  
         <oasis:entry colname="col3">124 (8 %), 8</oasis:entry>  
         <oasis:entry colname="col4">143 (9 %), 9</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
</sec>
<sec id="Ch1.S3">
  <title>Results and discussion</title>
      <p>In this section, we first describe the range of sparse models that are
generated by different algorithms and tuning parameters and present the
models selected based on validation within the calibration set
(Sect. <xref ref-type="sec" rid="Ch1.S3.SS1"/>). We then discuss our evaluation of these
models on the test set samples, which are samples excluded from the model
building and selection stage (Sect. <xref ref-type="sec" rid="Ch1.S3.SS2"/>). We
conclude with a discussion of our interpretation of absorption bands used by
these calibration models (Sect. <xref ref-type="sec" rid="Ch1.S3.SS3"/>).</p>
<sec id="Ch1.S3.SS1">
  <title>Sensitivity of wavenumber reduction to sparse formulation</title>
      <p>The sensitivity of RMSECV to models formulated with different NZVs is shown
in Figs. <xref ref-type="fig" rid="Ch1.F1"/> and <xref ref-type="fig" rid="Ch1.F2"/> for raw and
baseline corrected spectra, respectively (FGs are shown in fixed order from
highest to lowest wavenumber of absorption bands – aCOH, cCOH, aCH, and CO
– in all figures). In general, we note that decreasing the number of NZVs
does not necessarily reduce prediction quality as assessed through the
RMSECV. The solution selected is also indicated in each panel and summarized
in Table <xref ref-type="table" rid="Ch1.T1"/>. For TOR OC and EC, we select the most parsimonious solution
with respect to NZVs and LVs within 10 % of the minimum RMSECV for SPLSa and
SPLSb, the solution within 1 standard error above the minimum RMSECV for EN,
and the solution with the minimum pRMSECV for EN–PLS. The
selection criterion for FGs varies by method and spectra type, and a more
detailed evaluation is described in Sect. S2. Additional consideration
is required as RMSECV indicated for laboratory standard spectra may not
necessarily reflect the prediction error when extrapolated to ambient sample
spectra <?xmltex \hack{\mbox\bgroup}?><xref ref-type="bibr" rid="bib1.bibx101" id="paren.73"/><?xmltex \hack{\egroup}?>, and predictions of FGs cannot be evaluated
individually as no reference measurements exist for these samples. Therefore,
while candidate solutions are formulated with respect to the minimum RMSECV
and pRMSECV, we select one from among them after considering agreement
(correlation and regression slope) of the combined FG-OC with TOR OC and
overall reduction in NZVs.</p>
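      <p>As a minimal sketch of the two RMSECV-based selection rules described above
(not the implementation used in this work), the candidate solutions can be
screened as follows. The function names and the candidate values are
hypothetical, and for brevity the sketch ranks parsimony by NZVs only, whereas
the selection described above also considers LVs.</p>
<preformat><![CDATA[
```python
import numpy as np


def most_parsimonious_within_tolerance(nzv, rmsecv, tol=0.10):
    """Index of the candidate with the fewest NZVs whose RMSECV is
    within (1 + tol) times the minimum RMSECV."""
    nzv = np.asarray(nzv)
    rmsecv = np.asarray(rmsecv, dtype=float)
    eligible = np.flatnonzero(rmsecv <= (1.0 + tol) * rmsecv.min())
    return int(eligible[np.argmin(nzv[eligible])])


def one_standard_error_rule(nzv, rmsecv, se):
    """Index of the candidate with the fewest NZVs whose RMSECV does not
    exceed the minimum RMSECV plus the standard error at that minimum."""
    nzv = np.asarray(nzv)
    rmsecv = np.asarray(rmsecv, dtype=float)
    se = np.asarray(se, dtype=float)
    best = int(np.argmin(rmsecv))
    eligible = np.flatnonzero(rmsecv <= rmsecv[best] + se[best])
    return int(eligible[np.argmin(nzv[eligible])])


# Hypothetical candidates (illustrative values, not results from this study)
nzv = [2784, 1200, 400, 120, 40]
rmsecv = [1.00, 0.95, 1.02, 1.04, 1.60]
se = [0.05, 0.08, 0.05, 0.05, 0.10]

idx_tol = most_parsimonious_within_tolerance(nzv, rmsecv)  # 10 % rule
idx_1se = one_standard_error_rule(nzv, rmsecv, se)         # 1 SE rule
```
]]></preformat>
      <p>In this hypothetical example the two rules select different candidates
because the tolerance band of the 10 % rule is wider than the standard error
at the minimum.</p>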
      <p>We observe from a comparison of methods that the range of RMSECVs estimated
by each model algorithm depends on the response variable (FGs or TOR OC and
EC) and spectra type, and none consistently outperforms the rest. EN–PLS is
able to achieve lower apparent RMSECVs than EN in many cases even while using
the same wavenumbers, though this difference may partially be due to
underestimation of the final RMSECV resulting from awareness of validation
samples in the PLS stage of model evaluation (Appendix <xref ref-type="sec" rid="App1.Ch1.S3"/>).
However, <xref ref-type="bibr" rid="bib1.bibx67" id="text.74"/> also suggests
that the ability of EN–PLS to access lower dimensional spaces leads to more
accurate calibration models than EN (i.e., further transformations are
applied, and unnecessary information is discarded by appropriate LV
selection). Imposing sparsity on individual PLS weight or direction vectors
as targeted by SPLSa and SPLSb eliminates different wavenumbers for each LV;
when combined over the large number of LVs in our models (Table <xref ref-type="table" rid="Ch1.T1"/>),
this approach does not guarantee overall sparsity in the
final regression coefficients. The tuning parameter for EN controls the
sparsity of the NZVs directly through the regression vector rather than the
PLS direction vectors, thereby permitting more control over the range of NZVs
than SPLSa or SPLSb. This control can allow construction of more sparse
solutions, though for extreme reductions in NZVs we observe consistently high
RMSECVs. SPLSb was formulated to achieve higher degrees of sparsity (fewer
NZVs) by penalizing the surrogate of the direction vector, rather than the
direction vector directly. In the case of TOR OC and EC, where an identical
model selection criterion is used, we find that our selected SPLSb solutions
are sparser than the SPLSa solutions in three out of four scenarios, the exception
being the TOR EC calibration model using baseline corrected spectra.</p>
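      <p>The reason that sparsity in individual direction vectors does not guarantee
sparsity in the final regression coefficients can be illustrated with a short
sketch. It assumes that the regression vector is a linear combination of the
direction vectors, so that its nonzero entries are contained in the union of
their nonzero entries. The soft-thresholding function, the random direction
vectors, and the threshold value are hypothetical stand-ins and do not
reproduce the SPLSa or SPLSb algorithms.</p>
<preformat><![CDATA[
```python
import numpy as np


def soft_threshold(w, eta):
    """Soft-threshold w at a fraction eta of its largest absolute entry."""
    cutoff = eta * np.max(np.abs(w))
    return np.sign(w) * np.maximum(np.abs(w) - cutoff, 0.0)


rng = np.random.default_rng(0)
n_wavenumbers, n_lvs = 500, 15

# Hypothetical dense direction vectors, one column per LV
W = rng.standard_normal((n_wavenumbers, n_lvs))

# Impose sparsity on each direction vector separately
W_sparse = np.column_stack(
    [soft_threshold(W[:, a], 0.6) for a in range(n_lvs)]
)

# Nonzero entries per LV, and in the union over all LVs
per_lv_nonzero = np.count_nonzero(W_sparse, axis=0)
union_nonzero = np.count_nonzero(np.any(W_sparse != 0, axis=1))
```
]]></preformat>
      <p>Because each LV can retain a different set of wavenumbers, the union grows
with the number of LVs and is never smaller than the largest per-LV count.</p>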

      <?xmltex \floatpos{t}?><fig id="Ch1.F3" specific-use="star"><caption><p>Predictions vs. reference for ambient samples (OC from sum of FG).
“Anomalous” samples are those identified by <xref ref-type="bibr" rid="bib1.bibx91" id="text.75"/> (38
samples or 5 % of the total set) that share similar spectral profiles and
large disagreement with TOR in estimated OC. The cause of the disagreement
is at present unknown.</p></caption>
          <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://amt.copernicus.org/articles/9/3429/2016/amt-9-3429-2016-f03.pdf"/>

        </fig>

      <p>The reduced wavenumber models using raw spectra resulted in fewer NZVs than
using baseline corrected spectra for 14 out of the 24 cases examined,
indicating that it is possible for sparse methods to effectively remove the
PTFE interference and achieve suitable performance. One potential explanation
may be that isolated regions of PTFE interference can be used efficiently to
correct for the remaining interferences from the analyte regions. On average,
NZVs for both spectra types are reduced by approximately 20 % for the
solutions chosen. However, the retained NZVs can be as few as 1–9 % of the full
set for any substance, mostly achieved by EN (with the exception of CO for the baseline
corrected case, where the fewest NZVs are achieved by the SPLSa algorithm).
The highest percent reduction is achieved for aCH (99 %; corresponding to 40
NZVs) using raw spectra with EN, which is a surprising result given the
richness of features in the aCH region <xref ref-type="bibr" rid="bib1.bibx41" id="paren.76"/>.</p>
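      <p>The percentages quoted here and in Table <xref ref-type="table" rid="Ch1.T1"/> follow
from simple ratios of NZVs to the number of wavenumbers in the full model. The
sketch below reproduces a few of the tabulated TOR OC and EC values. The
function names are hypothetical, and the final line assumes that the raw
spectra FG models start from the same 2784 wavenumbers as the raw spectra TOR
models.</p>
<preformat><![CDATA[
```python
def percent_of_full(nzv, n_full):
    """Retained NZVs as a percentage of the wavenumbers in the full model."""
    return 100.0 * nzv / n_full


def percent_reduction(nzv, n_full):
    """Percentage of wavenumbers eliminated relative to the full model."""
    return 100.0 - percent_of_full(nzv, n_full)


# (NZVs, wavenumbers in full model) taken from the TOR OC and EC rows of Table 1
table_values = {
    "raw, SPLSb, OC": (629, 2784),
    "raw, EN, EC": (113, 2784),
    "baseline corrected, EN, OC": (124, 1563),
    "baseline corrected, SPLSb, EC": (1544, 1563),
}
retained = {key: round(percent_of_full(*pair)) for key, pair in table_values.items()}

# Assumption: raw FG models also start from 2784 wavenumbers
ach_reduction = percent_reduction(40, 2784)
```
]]></preformat>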
      <p>Forty NZVs correspond to the fewest in the set evaluated, tied with CO of
baseline corrected spectra. <xref ref-type="bibr" rid="bib1.bibx102" id="text.77"/> previously found that molar
absorption coefficients for carboxylic and ketonic CO were approximately
similar for the compounds used in their study. If a similar conclusion holds
for carboxylic, ketonic, and ester CO absorption in the compounds used in
this study, it is plausible for such a reduced PLS model to result if a
subset of wavenumbers common to these three types of CO bonds is selected.
Comparing the minimum NZVs obtained in this study, the TOR OC and EC models retain
more than 110 each, whereas NZVs for individual FGs are all below this value.
This is consistent with our understanding that OC and EC comprise complex
mixtures beyond a single FG. However, the NZVs for TOR OC or EC are less than
the sum of individual functional groups; not all of the NZVs from these
individual FGs are necessary for prediction of OC and EC.</p>
</sec>
<sec id="Ch1.S3.SS2">
  <title>Sensitivity of predictions to model sparsity and evaluation against reference measurements</title>
      <p>Prediction of FG concentrations in laboratory samples shows good agreement
with the reference values for these samples, with <inline-formula><mml:math display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>&gt;</mml:mo><mml:mn>0.9</mml:mn></mml:mrow></mml:math></inline-formula> and slopes within 5 % of
unity (Sect. S2.2.2). The FT-IR FG-OC estimated with the full models show high
correlation with TOR OC (<inline-formula><mml:math display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>&gt;</mml:mo><mml:mn>0.9</mml:mn></mml:mrow></mml:math></inline-formula>) (Fig. <xref ref-type="fig" rid="Ch1.F3"/>).
Estimates of FG-OC from the raw spectra calibration model exhibit lower bias
on average with a regression slope of 0.97, while the regression slope is
0.75 with the baseline corrected model. For estimates from SPLSa and SPLSb
models, correlation with TOR OC is moderate to strong (<inline-formula><mml:math display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>&gt;</mml:mo><mml:mn>0.7</mml:mn></mml:mrow></mml:math></inline-formula>), with
regression slope varying between 0.82 and 1.69. However, these two metrics do
not fully capture the bifurcating relationships between predicted and
reference concentrations as shown by the scatter plots (e.g., for SPLSa, raw
spectra model and SPLSb, baseline corrected model), which speaks to the way in
which predictions from the reduced models deviate from those of the full
models for different types of samples. For instance, the moles of aCH
predicted by SPLSb for rural samples are more than a factor of 2 higher than
those predicted by the full model, while the agreement is within 10 % for
urban samples (Fig. <xref ref-type="fig" rid="Ch1.F4"/>). Using the same
wavenumbers as EN, EN–PLS predictions are more consistent with those of the
full model and exhibit generally higher correlation with the reference TOR OC
concentrations (<inline-formula><mml:math display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mn>0.92</mml:mn></mml:mrow></mml:math></inline-formula> and 0.97 compared with <inline-formula><mml:math display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mn>0.87</mml:mn></mml:mrow></mml:math></inline-formula> and 0.9 for the raw
and baseline corrected models, respectively). Examining the correlation for
each FG reveals that the major difference is in the estimated aCH by EN and
EN–PLS; the aCH predictions especially for EN in urban areas are 1.5 times
higher than the full solution, largely contributing to an increase in the
slope from 0.97 to 1.24. In contrast, the EN–PLS solutions predict aCH within
10 % of the full model.</p>
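      <p>The following sketch illustrates why the correlation coefficient and the
regression slope alone may not reveal a bifurcating relationship. It assumes
an ordinary least squares fit with an intercept, which may differ from the
regression used for the figures, and the two sample groups and their slopes
are hypothetical.</p>
<preformat><![CDATA[
```python
import numpy as np


def agreement_metrics(reference, predicted):
    """Pearson correlation, and slope and intercept of an ordinary least
    squares fit of predicted on reference."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    r = np.corrcoef(reference, predicted)[0, 1]
    slope, intercept = np.polyfit(reference, predicted, 1)
    return r, slope, intercept


# Two hypothetical sample groups sharing the same reference concentrations
# but following different prediction slopes (a bifurcating relationship)
reference_group = np.linspace(1.0, 10.0, 10)
predicted_a = 0.8 * reference_group
predicted_b = 1.6 * reference_group

reference = np.concatenate([reference_group, reference_group])
predicted = np.concatenate([predicted_a, predicted_b])

r_pooled, slope_pooled, intercept_pooled = agreement_metrics(reference, predicted)
r_a, slope_a, _ = agreement_metrics(reference_group, predicted_a)
r_b, slope_b, _ = agreement_metrics(reference_group, predicted_b)
```
]]></preformat>
      <p>In this construction the pooled slope is the average of the two group
slopes, so a single summary value hides the fact that each group follows its
own line, which is why the scatter plots are examined alongside the
metrics.</p>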

      <?xmltex \floatpos{t}?><fig id="Ch1.F4"><caption><p>Comparison statistics of full and reduced wavenumber solutions show
sensitivity of model predictions to sparsity. OC and EC correspond to
predictions from direct calibration to TOR measurements rather than summing
FGs.</p></caption>
          <?xmltex \igopts{width=236.157874pt}?><graphic xlink:href="https://amt.copernicus.org/articles/9/3429/2016/amt-9-3429-2016-f04.pdf"/>

        </fig>

      <p>It is worth noting that all sparse models predict higher concentrations of
aCOH in both urban and rural samples than full models on average by a factor
of 3–8, when raw spectra are used for calibration. For this spectra type,
urban cCOH samples are also on average over-predicted by a factor of 4
(except by SPLSb). Predictions for CO generally exhibit less variation.
However, as the aCOH and cCOH are not as large contributors to the OM as aCH
(60–70 % of OM mass according to the full model; <xref ref-type="bibr" rid="bib1.bibx91" id="altparen.78"/>),
this is not prominently reflected in the comparison against TOR OC. There is
less variability in aCOH and cCOH according to sparse formulations using
baseline corrected spectra, suggesting that removal of PTFE interferences in
this region may be relevant. We provide a limited illustration of how
predictions vary in the baseline corrected models according to selected
wavenumbers in Sect. <xref ref-type="sec" rid="Ch1.S3.SS3.SSS1"/>.</p>
      <p>In comparison to the FG-OC predictions which are extrapolated from laboratory
standards to the composition domain of atmospheric OM, predictions for TOR OC
and EC made by direct calibration to ambient samples show remarkable
consistency with the full model solution (Fig. <xref ref-type="fig" rid="Ch1.F4"/>),
and capability for accurate prediction
with respect to evaluation TOR OC and EC measurements (Fig. <xref ref-type="fig" rid="Ch1.F5"/>).
The difference with respect to the full model
solution is generally within 30 % with highest differences observed in rural
samples, but this is likely due to the lower concentrations in these areas.
Compared against TOR OC test set samples, predictions have <inline-formula><mml:math display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>≥</mml:mo><mml:mn>0.98</mml:mn></mml:mrow></mml:math></inline-formula> and slopes within 8 %
of unity except for EN; for TOR EC, <inline-formula><mml:math display="inline"><mml:mrow><mml:mi>r</mml:mi><mml:mo>≥</mml:mo><mml:mn>0.93</mml:mn></mml:mrow></mml:math></inline-formula> and slopes are within 5 % of
unity, again except for EN. EN predictions retain high correlations but
slopes with respect to reference are between 0.89 and 0.95 for both TOR OC
and EC.</p>
      <p>We have only compared one possible solution from each method, but other
solutions can be generated for a given algorithm by changing the model
parameters; there may be possible solutions which are better suited. However,
as concluded previously (somewhat obviously) in PLS applications to aerosol
FT-IR spectra, predictions are most robust when samples in the evaluation set
are similar to those in the calibration set <xref ref-type="bibr" rid="bib1.bibx25 bib1.bibx26 bib1.bibx101" id="paren.79"/>,
and this also applies to sparse calibration models.
Calibration models developed with laboratory standards and ambient samples
predict concentrations in laboratory standards and ambient samples,
respectively, with only mild sensitivity to model formulation. Largest
variations in predictions occur when extrapolating from laboratory standards
to ambient samples.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F5" specific-use="star"><caption><p>Predictions vs. reference for ambient samples (direct calibration).</p></caption>
          <?xmltex \igopts{width=398.338583pt}?><graphic xlink:href="https://amt.copernicus.org/articles/9/3429/2016/amt-9-3429-2016-f05.pdf"/>

        </fig>

<?xmltex \hack{\newpage}?>
</sec>
<sec id="Ch1.S3.SS3">
  <title>Influential absorption bands</title>
      <p>VIP scores and the sign of regression coefficients at each wavenumber are
shown in Figs. <xref ref-type="fig" rid="Ch1.F6"/> and <xref ref-type="fig" rid="Ch1.F7"/> for raw and baseline
corrected PLS model solutions, respectively. As EN–PLS uses the same
wavenumbers as EN with the same or better performance metrics (Sect. <xref ref-type="sec" rid="Ch1.S3.SS2"/>),
we omit discussion of EN in this section. We
first confirm that FG calibration models use wavenumbers which are consistent
with our physical understanding of the vibrational modes belonging to
FGs (Sect. <xref ref-type="sec" rid="Ch1.S3.SS3.SSS1"/>) but also include PTFE interferences
and spurious correlations with other absorption bands. For describing our
interpretation of bonds and FGs that give rise to our capability to predict
TOR OC and EC, we begin with the EN–PLS solution (Sect. <xref ref-type="sec" rid="Ch1.S3.SS3.SSS2"/>)
as it is the most parsimonious subset of each of the
full and other sparse solutions, and then we extend our interpretation to the
remaining solutions (Sect. <xref ref-type="sec" rid="Ch1.S3.SS3.SSS3"/>).</p>
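      <p>For readers unfamiliar with VIP scores, the sketch below computes them for a
single-response PLS model using one commonly used definition, together with
the sign of the regression coefficients. It is a minimal illustration with a
hand-written NIPALS fit on synthetic data and is not the implementation used
in this work; all function names, array sizes, and the synthetic response are
assumptions.</p>
<preformat><![CDATA[
```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Minimal PLS1 (NIPALS) fit on mean-centered copies of X and y."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n, p = X.shape
    W = np.zeros((p, n_components))  # weight (direction) vectors
    P = np.zeros((p, n_components))  # X loadings
    T = np.zeros((n, n_components))  # scores
    q = np.zeros(n_components)       # y loadings
    for a in range(n_components):
        w = X.T @ y
        w = w / np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        P[:, a] = X.T @ t / tt
        q[a] = (y @ t) / tt
        X = X - np.outer(t, P[:, a])  # deflate X
        y = y - q[a] * t              # deflate y
        W[:, a], T[:, a] = w, t
    return W, P, T, q


def vip_scores(W, T, q):
    """VIP_j = sqrt(p * sum_a SS_a (w_ja / |w_a|)^2 / sum_a SS_a),
    with SS_a = q_a^2 t_a' t_a."""
    p = W.shape[0]
    ss = q**2 * np.sum(T**2, axis=0)
    w_norm_sq = (W / np.linalg.norm(W, axis=0)) ** 2
    return np.sqrt(p * (w_norm_sq @ ss) / ss.sum())


# Synthetic data: the response depends on a band of 10 "wavenumbers"
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 200))
y = X[:, 50:60].sum(axis=1) + 0.1 * rng.standard_normal(60)

W, P, T, q = pls1_nipals(X, y, n_components=3)
vip = vip_scores(W, T, q)
coef = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients
coef_sign = np.sign(coef)
above_threshold = np.flatnonzero(vip > 0.5)  # threshold used for shading in the figures
```
]]></preformat>
      <p>Under this definition the mean of the squared VIP scores equals 1, which
provides a natural scale against which a fixed threshold can be judged.</p>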

      <?xmltex \floatpos{t}?><fig id="Ch1.F6" specific-use="star"><caption><p>VIP for solutions using raw spectra. Dark gray lines in the “full”
solution panels correspond to the first loading weights, and the vertical
bars in every panel extend from 0 to VIP scores. Red points accompanying
vertical bars indicate wavenumbers for which regression coefficients are
positive and blue points indicate wavenumbers for which coefficients are
negative. Regions up to VIP scores of 0.5 are shaded to indicate VIP scores
not considered for our interpretation
(Sect. <xref ref-type="sec" rid="Ch1.S3.SS3"/>). </p></caption>
          <?xmltex \igopts{width=398.338583pt}?><graphic xlink:href="https://amt.copernicus.org/articles/9/3429/2016/amt-9-3429-2016-f06.pdf"/>

        </fig>

      <?xmltex \floatpos{t}?><fig id="Ch1.F7" specific-use="star"><caption><p>VIP for solutions using baseline corrected spectra. Lines and colors
are as indicated for Fig. <xref ref-type="fig" rid="Ch1.F6"/>.</p></caption>
          <?xmltex \igopts{width=398.338583pt}?><graphic xlink:href="https://amt.copernicus.org/articles/9/3429/2016/amt-9-3429-2016-f07.pdf"/>

        </fig>

<sec id="Ch1.S3.SS3.SSS1">
  <title>Interpretation of FG solutions</title>
      <p>We first describe our interpretation of the most parsimonious EN–PLS solution
for each FG and extend our interpretation to the SPLSa, SPLSb, and full
spectrum solutions. For aCH, CO, and cCOH FGs, the same vibrational modes are
used for both baseline corrected and raw spectrum solutions. In the case of
aCH, the C–H stretching mode (near 2900 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) and what appears to
be spurious correlation with C=O stretching (near 1700 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) in
carbonyl compounds are found. In the calibration experiments, carbonyl
compounds were included in the set of compounds used as laboratory standards
for the calibration of the aCH FG (i.e., malonic acid, adipic acid, suberic
acid, arachidyl dodecanoate, and 12-tricosanone; <xref ref-type="bibr" rid="bib1.bibx91" id="altparen.80"/>).
For CO, the C=O stretching is used. The C=O stretching vibration is used also
in the cCOH calibration, in both the baseline corrected and raw spectrum
solutions. Different sections of the spectra in the region of C=O stretching
absorption are used for CO and cCOH solutions, though they do not necessarily
coincide with wavenumbers expected for their specific chemical environments.
Carbonyl absorption in carboxylic acids is nominally centered around a lower
vibrational frequency (<inline-formula><mml:math display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 1710 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) than in esters
(<inline-formula><mml:math display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 1735 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>), but in our solution the wavenumbers used by the cCOH model
in the carbonyl region are higher than those for the CO model, which also
included esters in the calibration set. While carbonyls have a relatively
narrow absorption band compared to O–H stretching used by alcohol
(3500–3200 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) or carboxylic hydroxyl groups (3400–2400 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>), it
is broad enough such that EN–PLS selects different regions of the band that
may not correspond to the location of peak absorbance. The high VIP scores
and negative coefficients found for the wavenumbers around 2360 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>
in both CO and cCOH raw spectrum solutions can be associated
with the background (PTFE) correction which can interfere with carbonyl
quantification, which lies at the shoulder of the C–F stretching mode of
PTFE. Additional wavenumbers associated with high VIP scores found in the
cCOH in the raw solution are also attributed to background correction. In the
case of aCOH, different vibration modes are used by the baseline corrected
and the raw spectrum solutions. While the baseline corrected solution uses the
alcohol O–H stretching mode (near 3300 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>), the raw spectrum
solution uses C–O–H bending (680–620 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) and C–O stretching
vibrational modes (1200–1015 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>). In the baseline corrected
solution for aCOH, the high VIP score at 1707 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> is interpreted
as a spurious correlation with CO, present in compounds in the calibration
set. The high VIP scores in the aCOH raw spectrum solution are attributed to
background correction.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F8"><caption><p>Example group of baseline corrected spectra shown with VIP scores
above 0.5 overlaid. The VIP is derived from all calibration spectra
consisting of laboratory standards (same as those shown in
Fig. <xref ref-type="fig" rid="Ch1.F7"/>). The column of panels on the right shows the sensitivity of
predictions for each FG for this subset of samples.</p></caption>
            <?xmltex \igopts{width=236.157874pt}?><graphic xlink:href="https://amt.copernicus.org/articles/9/3429/2016/amt-9-3429-2016-f08.pdf"/>

          </fig>

      <p>SPLSa, SPLSb, and full spectrum solutions use the same vibrational modes as the
EN–PLS solution for the quantification of cCOH, aCH, and CO. The additional
peaks in the raw spectrum solutions are interpreted as being associated with
background correction. For aCOH, the less parsimonious methods use the O–H
stretching in both the baseline corrected and the raw spectrum solutions in
contrast to the EN–PLS solution, which uses two different vibrational modes
for different spectra types.</p>
      <p>Similarity in predicted abundances is not necessarily anticipated by the
number of wavenumbers used. An illustration is provided in Fig. <xref ref-type="fig" rid="Ch1.F8"/>,
where a group of similar baseline corrected ambient
spectra is overlaid on VIP scores for the full model, SPLSa, and EN–PLS.
While the EN–PLS solutions have a higher degree of sparsity than the SPLSa
(2–6 % of original wavenumbers for EN–PLS compared to 6–93 % for SPLSa) for
every FG except baseline corrected CO (Table <xref ref-type="table" rid="Ch1.T1"/>), EN–PLS
estimates are more closely aligned with the full wavenumber predictions. The
aCOH abundances of sparse solutions are an anomaly in that they are correlated
with each other and overpredict the full wavenumber estimates by a
significant amount (which is true for all sparse solutions;
Sect. <xref ref-type="sec" rid="Ch1.S3.SS2"/>). This pattern may be the result of selecting
narrow bands of wavenumbers for the estimation of this FG, when O–H
stretching from hydroxyl groups in alcohol compounds exhibits a broad
absorption band <xref ref-type="bibr" rid="bib1.bibx80" id="paren.81"/>. Even for this similar group of spectra
which presumably share similar chemical composition, the varied patterns in
predicted concentrations across sparse models indicate that the features
selected for quantification by laboratory standard calibrations may not be
consistent with additional features present in ambient samples. We anticipate
that this analysis of such sensitivity can further guide the selection of
laboratory standard mixtures for calibration, in addition to measurement
intercomparisons.</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T2" specific-use="star"><caption><p>Summary of FGs and associated vibrational modes with positive
regression coefficients used in the prediction of TOR OC and TOR EC by the EN–PLS
method. The FGs that are common to every solution (both raw and baseline
corrected solution for TOR OC and EC) are reported first, followed by
non-oxidized FGs, oxygenated FGs, and nitrogenated FGs; the order is not
indicative of abundance.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="5">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="justify" colwidth="91.048819pt"/>
     <oasis:colspec colnum="3" colname="col3" align="justify" colwidth="91.048819pt" colsep="1"/>
     <oasis:colspec colnum="4" colname="col4" align="justify" colwidth="91.048819pt"/>
     <oasis:colspec colnum="5" colname="col5" align="justify" colwidth="91.048819pt"/>
     <oasis:thead>
       <oasis:row>  
         <oasis:entry colname="col1"/>  
         <oasis:entry rowsep="1" namest="col2" nameend="col3" align="center" colsep="1">OC </oasis:entry>  
         <oasis:entry rowsep="1" namest="col4" nameend="col5" align="center">EC </oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Functional groups</oasis:entry>  
         <oasis:entry colname="col2">Raw spectra</oasis:entry>  
         <oasis:entry colname="col3">Baseline corrected <?xmltex \hack{\hfill\break}?>spectra</oasis:entry>  
         <oasis:entry colname="col4">Raw spectra</oasis:entry>  
         <oasis:entry colname="col5">Baseline corrected <?xmltex \hack{\hfill\break}?>spectra</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Aromatic</oasis:entry>  
         <oasis:entry colname="col2">Ring stretch in aromatic compounds <?xmltex \hack{\hfill\break}?>Substituted benzene ring overtones<inline-formula><mml:math display="inline"><mml:msup><mml:mi/><mml:mo>*</mml:mo></mml:msup></mml:math></inline-formula><?xmltex \hack{\hfill\break}?>Conjugation with C=O <?xmltex \hack{\hfill\break}?>Naphthalene in plane ring deformation</oasis:entry>  
         <oasis:entry colname="col3">Conjugation with C=O <?xmltex \hack{\hfill\break}?>C=C stretch<inline-formula><mml:math display="inline"><mml:msup><mml:mi/><mml:mo>*</mml:mo></mml:msup></mml:math></inline-formula> <?xmltex \hack{\hfill\break}?>Substituted benzene ring overtones<inline-formula><mml:math display="inline"><mml:msup><mml:mi/><mml:mo>*</mml:mo></mml:msup></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col4">Benzene ring stretch</oasis:entry>  
         <oasis:entry colname="col5">Benzene ring stretch <?xmltex \hack{\hfill\break}?>Conjugation with C=O <?xmltex \hack{\hfill\break}?>Substituted benzene ring overtones</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Amides</oasis:entry>  
         <oasis:entry colname="col2">N–C=O bend <?xmltex \hack{\hfill\break}?>C=O out of plane bend <?xmltex \hack{\hfill\break}?>N–H bend</oasis:entry>  
         <oasis:entry colname="col3">N–H bend</oasis:entry>  
         <oasis:entry colname="col4">N–H bend</oasis:entry>  
         <oasis:entry colname="col5">N–H bend</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Esters</oasis:entry>  
         <oasis:entry colname="col2">C=O stretch</oasis:entry>  
         <oasis:entry colname="col3">C=O stretch</oasis:entry>  
         <oasis:entry colname="col4">C–O–C antisymmetric stretch</oasis:entry>  
         <oasis:entry colname="col5">C=O stretch</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Alkanes</oasis:entry>  
         <oasis:entry colname="col2">CH<inline-formula><mml:math display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> bend</oasis:entry>  
         <oasis:entry colname="col3">C–H stretch</oasis:entry>  
         <oasis:entry colname="col4"/>  
         <oasis:entry colname="col5">C–H stretch</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Alkenes</oasis:entry>  
         <oasis:entry colname="col2">Conjugation with C=O</oasis:entry>  
         <oasis:entry colname="col3">C=C stretch<inline-formula><mml:math display="inline"><mml:msup><mml:mi/><mml:mo>*</mml:mo></mml:msup></mml:math></inline-formula> <?xmltex \hack{\hfill\break}?>Conjugation with C=O</oasis:entry>  
         <oasis:entry colname="col4"/>  
         <oasis:entry colname="col5">Conjugation with C=O<?xmltex \hack{\hfill\break}?>Alkene C=C<inline-formula><mml:math display="inline"><mml:msup><mml:mi/><mml:mo>*</mml:mo></mml:msup></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Carboxyl</oasis:entry>  
         <oasis:entry colname="col2">C=O stretch<?xmltex \hack{\hfill\break}?>O–C=O bend</oasis:entry>  
         <oasis:entry colname="col3">C=O stretch<?xmltex \hack{\hfill\break}?>OH stretch</oasis:entry>  
         <oasis:entry colname="col4"/>  
         <oasis:entry colname="col5">C=O stretch</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Ketones</oasis:entry>  
         <oasis:entry colname="col2">C=O stretch</oasis:entry>  
         <oasis:entry colname="col3">C=O stretch</oasis:entry>  
         <oasis:entry colname="col4"/>  
         <oasis:entry colname="col5">C=O stretch</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Aldehydes</oasis:entry>  
         <oasis:entry colname="col2">C=O stretch<?xmltex \hack{\hfill\break}?>C–C–CHO bend</oasis:entry>  
         <oasis:entry colname="col3">C=O stretch</oasis:entry>  
         <oasis:entry colname="col4"/>  
         <oasis:entry colname="col5">C=O stretch</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Ethers</oasis:entry>  
         <oasis:entry colname="col2"/>  
         <oasis:entry colname="col3"/>  
         <oasis:entry colname="col4">C–O stretch in aryl ethers</oasis:entry>  
         <oasis:entry colname="col5"/>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Alcohol</oasis:entry>  
         <oasis:entry colname="col2">C–O–H bend<?xmltex \hack{\hfill\break}?>Ar–OH out of plane deformation</oasis:entry>  
         <oasis:entry colname="col3">OH stretch</oasis:entry>  
         <oasis:entry colname="col4"/>  
         <oasis:entry colname="col5"/>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Amines</oasis:entry>  
         <oasis:entry colname="col2"/>  
         <oasis:entry colname="col3">N–H bend</oasis:entry>  
         <oasis:entry colname="col4">C–N stretch in <?xmltex \hack{\hfill\break}?>aromatic amines</oasis:entry>  
         <oasis:entry colname="col5">N–H bend</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Nitro compounds</oasis:entry>  
         <oasis:entry colname="col2">NO<inline-formula><mml:math display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> deformation<?xmltex \hack{\hfill\break}?>NO<inline-formula><mml:math display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> antisymmetric stretch</oasis:entry>  
         <oasis:entry colname="col3"/>  
         <oasis:entry colname="col4">NO<inline-formula><mml:math display="inline"><mml:msub><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:math></inline-formula> antisymmetric stretch</oasis:entry>  
         <oasis:entry colname="col5"/>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p><inline-formula><mml:math display="inline"><mml:msup><mml:mi/><mml:mo>*</mml:mo></mml:msup></mml:math></inline-formula> Weak bands</p></table-wrap-foot></table-wrap>

</sec>
<sec id="Ch1.S3.SS3.SSS2">
  <title>Interpretation of TOR OC and EC solutions from EN–PLS</title>
      <p>For the prediction of both TOR OC and EC, different sets of wavenumbers
(i.e., absorption bands) are used between the raw (Fig. <xref ref-type="fig" rid="Ch1.F6"/>)
and baseline corrected (Fig. <xref ref-type="fig" rid="Ch1.F7"/>) spectrum solutions. In the
raw solution for TOR OC and TOR EC prediction, large VIP scores with negative
coefficients near 2000 and 1200 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> constitute a
means for the correction of the PTFE absorption and scattering. For instance,
this correction compensates for the positive artifact near
4000 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>,
where the PTFE scattering is the only contributor to the
infrared signal, but PTFE contributions are also present in regions of
analyte absorption. As the largest VIP scores are associated with PTFE
correction and smaller VIP scores are associated with wavenumbers in absorption
bands of target analytes, we primarily consider absorption bands above a low
VIP threshold of approximately 0.5 in this work.</p>
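      <p>The thresholding step described above can be sketched as follows. This is a minimal illustration only, not the analysis code used in this work: the function names <monospace>select_bands</monospace> and <monospace>split_by_sign</monospace> are hypothetical, and the wavenumbers, VIP scores, and coefficients are made-up values chosen to show how bands at or above a VIP threshold of about 0.5 might be retained and then separated by the sign of their regression coefficients.</p>
<preformat preformat-type="code">
```python
# Hypothetical illustration: all numerical values are made up, not data from this study.
VIP_THRESHOLD = 0.5  # approximate threshold quoted in the text


def select_bands(wavenumbers, vip, coef, threshold=VIP_THRESHOLD):
    """Return (wavenumber, vip, coefficient) triples whose VIP score meets the threshold."""
    return [
        (w, v, b)
        for w, v, b in zip(wavenumbers, vip, coef)
        if v >= threshold
    ]


def split_by_sign(bands):
    """Separate retained wavenumbers by the sign of their regression coefficient."""
    positive = [w for w, _, b in bands if b > 0]
    negative = [w for w, _, b in bands if 0 > b]
    return positive, negative


wavenumbers = [3400, 2920, 2000, 1725, 1200, 600]  # cm-1, illustrative only
vip = [0.3, 0.8, 2.5, 1.1, 3.0, 0.6]               # made-up VIP scores
coef = [0.01, 0.20, -0.90, 0.40, -1.20, 0.10]      # made-up regression coefficients

bands = select_bands(wavenumbers, vip, coef)
positive, negative = split_by_sign(bands)
print("positive coefficients:", positive)
print("negative coefficients:", negative)
```
</preformat>
      <p>In this toy example, the negative-coefficient bands with large VIP scores would play the role of the PTFE correction, while the positive-coefficient bands would be candidates for analyte absorption; the actual assignment still requires the spectroscopic reasoning given in the text.</p>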
      <p>Even after excluding wavenumbers with extremely small VIP scores and PTFE
contributions, many interpretations for contributing FGs still exist on
account of the large number of overlapping absorption bands at each
wavenumber used by the two spectra types. In this first analysis of relevant
wavenumbers for TOR OC and EC calibration, we present our interpretation
through a “common FG hypothesis” in which we assume it most likely that
predictions by the raw and baseline corrected spectra models are primarily
enabled by a common set of FGs. This framework leads to the possibility that
the same FG may be used by the two solutions by means of different
vibrational modes (at different wavenumbers), and the inference that these
comprise essential FGs necessary for prediction. We cannot exclude the
possibility that there exists a suite of FGs with approximately similar
capability to provide, in some combination, quantitative prediction of TOR OC
and EC, leading to two models that require less than maximal overlap in FGs.
If we consider an extreme case, which we denote as the “divergent FG
hypothesis,” a pair of models may use a minimally redundant set. This
approach to band assignment can lead to an intractable number of
possibilities and is also considered less plausible given the similar level
of accuracy and robustness attained by the two models. Table <xref ref-type="table" rid="Ch1.T2"/>
summarizes the bonds identified in each model;
additional possibilities for each wavenumber are documented in Sect. S3. We
limit our discussion below to main FGs found by both models through the
perspective of the common FG hypothesis.</p>
      <p>We preface our interpretations by stating that, at this time, we make no claim regarding
the relative contributions of each FG to TOR OC or EC mass, as our VIP
analysis considers the importance of spectral absorbances (and not FG
abundance) to the mass concentrations. Knowledge regarding molar absorption
coefficients and the relationship of FG to carbon abundance
<xref ref-type="bibr" rid="bib1.bibx102" id="paren.82"/> is additionally necessary to relate absorbance to the
mass concentrations to which the models are calibrated; this information is
unavailable and cannot be estimated unambiguously from the set of
regression coefficients for these complex mixtures.</p>
      <p>The wavenumbers used by the TOR OC calibration models correspond to
vibrational modes associated with major FGs of organic PM. Carbonyls
associated with carboxylic acids, ketones, aldehydes, and esters are used by
both models through the C=O stretch (1700–1750 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>). Carboxylic
acid groups are inferred through O–H and C=O stretch by the baseline
corrected model and O–C=O bending in the raw spectra model. The alcohol FG is
used by both baseline corrected and raw solutions but by means of different
vibrational modes (i.e., O–H stretch in baseline corrected spectrum solution
and C–O–H bending in the raw spectrum solution), as in the case of the EN–PLS
solution for the aCOH FG. N–H bending associated with amides and amines
(1640–1550 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) is used by both models, with a strong N–C=O
(630–570 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) and C=O out-of-plane bending absorption in amides
(615–535 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) additionally used by the raw spectrum solution.
Given the additional support for assignment of the N–H bending mode to amide
in the raw spectrum solution, our common FG hypothesis favors the
interpretation that this mode in the baseline corrected spectra model may also
be more strongly linked with amides. While not currently reported in
measurements of organic aerosol by FT-IR <xref ref-type="bibr" rid="bib1.bibx90 bib1.bibx91" id="paren.83"><named-content content-type="pre">e.g.,</named-content></xref>,
amide-containing compounds have been suggested to partition
to the aerosol phase as well as to form through condensed phase reactions
<xref ref-type="bibr" rid="bib1.bibx82 bib1.bibx4 bib1.bibx78" id="paren.84"/>. Various vibrational modes
associated with aromatic and alkene species have medium or weak absorption in
regions used by both raw and baseline corrected spectrum solutions. The modes
associated with conjugation of C=O with phenyl and alkenes absorb in regions
used by both baseline corrected and raw solutions. Alkane chains are used by
the baseline corrected solution by means of the C–H stretching mode in the
wavenumber range 2913–2921 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>. In the raw solution, the
assignment is less clear, but the region of 1505–1517 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> used by the
model (with high VIP scores and positive coefficients) overlaps with the
shoulder of <inline-formula><mml:math display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CH</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> bending vibrations of alkane chains near 1475 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>.
Given the large contribution of alkane C–H groups to the
overall organic aerosol mass as estimated by FG calibrations
(60–70 %; <xref ref-type="bibr" rid="bib1.bibx91" id="altparen.85"/>), the lack of a more obvious
correspondence between regression coefficients and vibrational modes of
saturated CH groups is unexpected.</p>
      <p>The baseline corrected TOR EC calibration model appears to rely on similar
absorption bands and FGs as TOR OC; this can be partly explained by the
restricted range of wavenumbers used. However, as the regression coefficients
are different, we note that the bands are weighted differently in arriving at
their respective predictions, possibly indicating their use to account for OC artifacts
in quantification of TOR EC. Many similarities in the structure of VIP scores
with the raw solutions of TOR OC can be attributed to the PTFE corrections
previously described, though the main spectral features used by the raw
spectra TOR EC model appear to be vibrational modes in the molecular
fingerprint region that overlaps with the absorbance from the C–F stretching
of PTFE (<inline-formula><mml:math display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 1200 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) and the adjacent region between
1497 and 1531 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, which is at the lower boundary of the baseline
corrected solution.</p>
      <p>Both TOR EC models may use four FGs used by the TOR OC solutions: aromatic
and ring structures, amines and amides, and esters. The C–C ring stretch is
present at <inline-formula><mml:math display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 1600 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> in the baseline corrected solution and at
<inline-formula><mml:math display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 1500 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> in the raw spectrum solution. In the former
case, there may also be indication for the conjugation of phenyl rings with
carbonyl compounds in ketones, aldehydes, and esters and weak absorption due
to overtones in substituted benzene rings. Amines and amides may be
incorporated through the N–H bending absorption by the baseline corrected
solution (1587–1601 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) and the C–N stretching in aromatic
amines in the raw solution. Additionally, given the positive regression
coefficients, it is possible that the TOR EC raw spectra model uses the
fingerprint region (1100–1300 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) associated with vibrational
modes of amines and ethers, rather than using this region only to compensate for
possible artifacts due to PTFE absorption, as for TOR OC. The C=O stretching
vibration (1722–1739 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) in the baseline corrected solution can
be attributed to ketones, aldehydes, or esters, but when harmonizing with the
raw spectrum solution according to the common FG hypothesis, the ester
assignment through the C–O–C antisymmetric stretch
(1275–1279 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) is considered possible. Ester formation has been
associated with aqueous phase processing of biogenic organic compounds in
atmospheric PM <xref ref-type="bibr" rid="bib1.bibx99 bib1.bibx62" id="paren.86"/>, so its association with EC is
unexpected and tentative.</p>
      <p>The band assignments for TOR OC are perhaps not surprising given that many of
the FGs have been used previously for quantification of organic PM, but it is
worth considering our interpretation for TOR EC in context as FT-IR is not
commonly employed for the study of elemental carbon or similar substances.
“Elemental carbon” strictly refers to sp<inline-formula><mml:math display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:math></inline-formula> carbon not bound to other
elements and has the property of thermal stability with respect to
vaporization up to 4000 K in an inert atmosphere, or 340 <inline-formula><mml:math display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C in an
oxidizing environment <xref ref-type="bibr" rid="bib1.bibx81 bib1.bibx65" id="paren.87"/>. EC by TOR is therefore
quantified by heating PM samples at these high temperatures in the presence
of oxygen, and its quantity is separated from the preceding OC vaporized
anoxically by an operationally defined protocol based on evolving filter
optical properties <xref ref-type="bibr" rid="bib1.bibx16" id="paren.88"/>. Atmospheric elemental carbon as reported
by this method is likely to represent a set of strongly light-absorbing,
low-volatility compounds that characterize carbonaceous material formed or
emitted from combustion processes rather than in one of the chemically pure
allotropic forms (e.g., graphite and diamond) <xref ref-type="bibr" rid="bib1.bibx15 bib1.bibx81" id="paren.89"/>.</p>
      <p>Direct measurements of elemental carbon and associated substances by infrared
spectroscopy are not numerous, but <xref ref-type="bibr" rid="bib1.bibx33 bib1.bibx32" id="text.90"/> recorded
infrared spectra of ground graphite and coal, reporting strong, broad
absorption bands centered near 1600 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> superposed on a wider band
between 1800 and 900 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> in both substances. While 1600 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>
is near reported lattice frequencies for crystalline graphite
<xref ref-type="bibr" rid="bib1.bibx106" id="paren.91"/>, the band becomes visible only with extensive grinding
of this material in which crystalline structure is lost <xref ref-type="bibr" rid="bib1.bibx32" id="paren.92"/>.
Therefore, <xref ref-type="bibr" rid="bib1.bibx32" id="text.93"/> attribute these bands to non-crystalline
graphite structure, presumably carbon–carbon bonds which occur in exposed
edges of the ground graphite <xref ref-type="bibr" rid="bib1.bibx100" id="paren.94"/>. The 1600 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> band
has in the past been attributed to aromatic or carbonyl structures as their
resonances overlap in this region, but the absorption line shapes of these two
structures are not accompanied by the broader band spanning a range of
900 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx32 bib1.bibx100 bib1.bibx98 bib1.bibx96" id="paren.95"/>. The
absorption band around 1600 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> can also be found in spectra of
polycyclic aromatic hydrocarbons such as anthanthrene and benzo[ghi]perylene
and has been attributed to the stretching of the aromatic C=C bonds
<xref ref-type="bibr" rid="bib1.bibx59" id="paren.96"/>.</p>
      <p>The absorption band around 1600 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> is required by our baseline
corrected solution, and the raw solution appears to use a shoulder of the same
band at lower wavenumbers (range 1497–1531 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>). The relevance of
this band to both solutions for the prediction of TOR EC is consistent with
the presence of sp<inline-formula><mml:math display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:math></inline-formula> bonds in ring-structured substances (as discussed
above) known to be emitted from combustion sources
<xref ref-type="bibr" rid="bib1.bibx30 bib1.bibx6" id="paren.97"/>. We attribute the absorption around 1700 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> in the
baseline corrected solution to (the shoulder of the) ester C=O stretch for
harmony in interpretation with the ester C–O–C antisymmetric stretch at
1275 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> in the raw solution, as described previously. Because of the
complexity of the PM mixture captured by the infrared spectra, we are unable
to identify the broader feature of 1800–900 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> associated with
graphitic structure reported by <xref ref-type="bibr" rid="bib1.bibx32" id="text.98"/>. However, as the baseline
corrected spectrum solution does not use information below 1500 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>,
it appears that accurate calibration models of TOR EC can be
constructed even when a large portion of this broad band is omitted. Amines
have been associated with anthropogenic emissions and strong partitioning
behavior to the aerosol phase <xref ref-type="bibr" rid="bib1.bibx115" id="paren.99"><named-content content-type="pre">e.g.,</named-content></xref>, so identification of
this FG is plausible given our understanding of EC sources.</p>
      <p>We explicitly remark that while the absorption bands discussed in relation to
VIP scores are all observed in the overall infrared spectra used for building
calibration models, they are associated with the PM mixture and do not
correspond to direct observations of bands in physically or chemically
isolated specimens of TOR EC. The bands are selected mathematically, based on
the strength of their covariance (in combination with other bands) with TOR EC, for
use in quantitative prediction. Similar approaches based on covariance
analysis methods have reported acetyl, aromatic, and phenol structures
<xref ref-type="bibr" rid="bib1.bibx7" id="paren.100"/> or alkane C–H <xref ref-type="bibr" rid="bib1.bibx24" id="paren.101"/> to predict the
abundance of recalcitrant or black carbon in soils. For predicting TOR EC in
atmospheric samples, it is possible that our calibration models not only use
graphitic structure of EC but also rely on fragments of co-emitted OC, or
artifacts from the partitioning of total carbon between OC and EC by TOR.
Based on our analysis of FGs in Sect. <xref ref-type="sec" rid="Ch1.S3.SS3.SSS1"/>, we cannot rule
out at the present time that some of these organic FG assignments may arise from
spurious correlations. While surface FGs of soot particles sampled at source
have been detected by FT-IR <xref ref-type="bibr" rid="bib1.bibx10" id="paren.102"/>, we consider that their
potential mass contribution is insignificant with respect to the overall mass
of accompanying organic PM, such that extracting the contribution of these surface FGs
to the overall organic signal is unlikely. We anticipate that
the plausibility of these hypotheses can be further constrained in future
studies.</p>
      <p>There are additional bands which appear to be relevant for one spectra type
but not the other for both TOR OC and EC (and also other analytes discussed
in Sect. <xref ref-type="sec" rid="Ch1.S3.SS3.SSS1"/>) and, as described in the following section
(Sect. <xref ref-type="sec" rid="Ch1.S3.SS3.SSS3"/>), more models can be constructed that include
a larger number of wavenumbers. It is unclear whether these additional FGs
are superfluous, complementary, or spurious in their relation to groups
discussed above, but the value of their absorbances was not ruled out in the
model selection process. The most reasonable interpretation of relevant FGs
lies somewhere between the extremes of the common FG and divergent FG
hypotheses, and further investigation on this topic is also left for future
work.</p>
</sec>
<sec id="Ch1.S3.SS3.SSS3">
  <title>Interpretation of TOR OC and EC solutions from SPLSa, SPLSb, and full models</title>
      <p>SPLSa and SPLSb solutions for TOR OC and EC calibration models developed with
baseline corrected spectra use wavenumbers similar to EN–PLS, which are
described above. In both TOR OC and EC models, additional wavenumbers in the
range 2200–2500 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> and above 3300 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> are used by
SPLSa and SPLSb solutions. The former set of wavenumbers may correspond to
vibrational modes from isocyanates, nitriles, and phosphines, which usually
have very low absorbance in ambient samples. This may explain why the most
parsimonious solutions from EN–PLS do not use this range of wavenumbers. The
wavenumbers above 3300 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> may correspond to the absorption of
alcohol and amine FGs, whose contribution to TOR OC and EC prediction is
taken into account in other parts of the spectrum by the EN–PLS solution.
SPLSa, SPLSb, and EN–PLS raw spectrum solutions for TOR OC and EC use the
range near 1700 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> attributed to C=O absorption, and the PTFE C–F
stretching region to remove the PTFE contribution to the
overall signal.</p>
      <p>The full spectrum solutions include features that have been previously
described in the EN–PLS solutions, though specific interpretation is
more difficult for lack of sparsity. In the baseline corrected solution we
can see that for both TOR OC and EC high VIP scores correspond to the regions
around 1700, 3000, and 3400 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> associated with C=O, C–H, and O–H
stretching, respectively. For TOR EC, the high VIP scores in the region of
C–H stretching are associated with negative coefficients, as in the EN–PLS
solution. For the raw spectra models of TOR OC and EC, the PTFE C–F
stretching region is also used for background subtraction. The most
distinguishing features with highest VIP scores associated with analytes are
the C=O stretching (near 1700 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) for the TOR OC and benzene
ring stretch (near 1500 <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="normal">cm</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) for the TOR EC.</p>
</sec>
</sec>
</sec>
<sec id="Ch1.S4" sec-type="conclusions">
  <title>Conclusions</title>
      <p>We evaluated four sparse methods in the construction of calibration models
for four organic FGs and TOR OC and EC. Since the full wavenumber models
already performed well in prediction, the best of the sparse models generally
did not improve model performance but provided interpretation regarding the
most relevant absorption bands required for prediction. In formulating sparse
models, the direct 1-norm penalty on regression coefficients by EN
permitted better control of sparsity – i.e., stronger correlations between
penalty and sparsity were observed, and more sparse solutions were ultimately
obtained – than imposing penalties on individual weight or direction
vectors as formulated by SPLSa and SPLSb.</p>
      <p>SPLS methods were less robust than EN and EN–PLS in extrapolating calibration
models developed with laboratory standards for use in estimating FG
abundances in ambient samples. For example, FG-OC estimated by the full
wavenumber PLS model using raw spectra had a slope and correlation of 0.97
and 0.93 in comparison to TOR OC, while the performance dropped as low as
1.69 (slope) and 0.77 (correlation) with the SPLS methods. The additional
dimensionality reduction applied by EN–PLS led to better performance than EN
using the same wavenumbers, and similar performance to the full wavenumber
models was achieved while using only 1–6 % of the original wavenumbers for each
FG. As some samples are more sensitive to model formulation and sparsity,
such methods can possibly be used to identify cases in which laboratory
standards do not reflect the types of bonds in ambient samples. When sparse
methods were used to build calibration models using ambient samples, TOR OC
and EC prediction metrics were insensitive to sparsity. All PLS-based models
predicted reference values (not included during calibration) with less than
10 % bias and correlation coefficients higher than 0.9.
<?xmltex \hack{\newpage}?>
In examining sparse calibration models for aCOH, cCOH, aCH, and CO, selected
wavenumbers for FGs are consistent with known absorption bands of their
constituent bonds. The FGs that contribute to prediction of TOR OC are those
commonly associated with organic PM, while for TOR EC,
the main feature common to the raw and baseline corrected spectra
is the C–C stretch in ring-structured compounds. Wavenumbers used by raw and
baseline corrected spectra appear to vary significantly, but they can be (and have
been) interpreted through different vibrational modes associated with a
common set of FGs.</p>
      <p>This first evaluation of sparse calibration methods using FT-IR spectra shows
promise in conferring interpretation of associated molecular bonds to TOR OC
and EC measurements. Sparse calibration models “localized” (in the
statistical sense) by spectral features or external variables can be used
to identify key FGs used for prediction of TOR measurements at various sites
<xref ref-type="bibr" rid="bib1.bibx85" id="paren.103"><named-content content-type="pre">e.g.,</named-content></xref> and can aid construction of calibration sets
suitable for prediction of individual or groups of samples (e.g., stratified
by environmental conditions or chemical composition). This work provides a
demonstration of how molecular structure can be associated with other
quantifiable metrics of complex PM to which spectral features from FT-IR can
be correlated.</p>
</sec>
<sec id="Ch1.S5">
  <title>Data availability</title>
      <p>The IMPROVE network data will be made publicly available.</p><?xmltex \hack{\clearpage}?>
</sec>

      
      </body>
    <back><app-group>

<app id="App1.Ch1.S1">
  <title>Notation</title>
      <p>Tables <xref ref-type="table" rid="App1.Ch1.T1"/> and <xref ref-type="table" rid="App1.Ch1.T2"/> summarize notation used
for matrices and vectors with their corresponding dimensions. Matrices are
written in uppercase italic bold and vectors in lowercase italic bold.
Vectors are column vectors by convention; row vectors are written as
transposed vectors.</p>

<?xmltex \floatpos{t}?><table-wrap id="App1.Ch1.T1"><?xmltex \hack{\hsize\textwidth}?><caption><p>Dimensions and indexing variables.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="center"/>
     <oasis:thead>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Scalar variable</oasis:entry>  
         <oasis:entry colname="col2">Description</oasis:entry>  
         <oasis:entry colname="col3">Dummy index</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>  
         <oasis:entry colname="col1"><inline-formula><mml:math display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2">number of samples</oasis:entry>  
         <oasis:entry colname="col3"><inline-formula><mml:math display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1"><inline-formula><mml:math display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2">number of independent variables (wavenumbers)</oasis:entry>  
         <oasis:entry colname="col3"><inline-formula><mml:math display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1"><inline-formula><mml:math display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2">number of latent variables used</oasis:entry>  
         <oasis:entry colname="col3"><inline-formula><mml:math display="inline"><mml:mrow><mml:mi>h</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

<?xmltex \floatpos{t}?><table-wrap id="App1.Ch1.T2"><?xmltex \hack{\hsize\textwidth}?><caption><p>Arrays and dimensions.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:thead>
       <oasis:row>  
         <oasis:entry colname="col1">Array variable</oasis:entry>  
         <oasis:entry colname="col2">Vector/scalar</oasis:entry>  
         <oasis:entry colname="col3">Description</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1"/>  
         <oasis:entry colname="col2">notation</oasis:entry>  
         <oasis:entry colname="col3"/>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>  
         <oasis:entry colname="col1"><inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2"><inline-formula><mml:math display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mi>i</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col3">matrix of spectra (<inline-formula><mml:math display="inline"><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:math></inline-formula>)</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1"><inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2"><inline-formula><mml:math display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col3">matrix of dependent variables (<inline-formula><mml:math display="inline"><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>)</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1"><inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">B</mml:mi></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2"><inline-formula><mml:math display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">b</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col3">matrix of PLS coefficients (<inline-formula><mml:math display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>)</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1"><inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">T</mml:mi></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2"><inline-formula><mml:math display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">t</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col3">matrix of <inline-formula><mml:math display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> scores (<inline-formula><mml:math display="inline"><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mi>K</mml:mi></mml:mrow></mml:math></inline-formula>)</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1"><inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">P</mml:mi></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2"><inline-formula><mml:math display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col3">matrix of <inline-formula><mml:math display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> loadings (<inline-formula><mml:math display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mo>×</mml:mo><mml:mi>K</mml:mi></mml:mrow></mml:math></inline-formula>)</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1"><inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">E</mml:mi><mml:mi>X</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2"><inline-formula><mml:math display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">e</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mi>T</mml:mi></mml:msubsup><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col3">matrix of <inline-formula><mml:math display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> residuals (<inline-formula><mml:math display="inline"><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:math></inline-formula>)</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1"><inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">Q</mml:mi></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2"><inline-formula><mml:math display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">q</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col3">matrix of <inline-formula><mml:math display="inline"><mml:mi>Y</mml:mi></mml:math></inline-formula> loadings (<inline-formula><mml:math display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:mi>K</mml:mi></mml:mrow></mml:math></inline-formula>)</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1"><inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">E</mml:mi><mml:mi>Y</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2"><inline-formula><mml:math display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">e</mml:mi><mml:mrow><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mi>T</mml:mi></mml:msubsup><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col3">matrix of <inline-formula><mml:math display="inline"><mml:mi>Y</mml:mi></mml:math></inline-formula> residuals (<inline-formula><mml:math display="inline"><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>)</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1"><inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">R</mml:mi></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2"><inline-formula><mml:math display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col3">matrix of <inline-formula><mml:math display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> direction vectors (<inline-formula><mml:math display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mo>×</mml:mo><mml:mi>K</mml:mi></mml:mrow></mml:math></inline-formula>)</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1"><inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">W</mml:mi></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2"><inline-formula><mml:math display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi>h</mml:mi></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col3">matrix of <inline-formula><mml:math display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> weights (<inline-formula><mml:math display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mo>×</mml:mo><mml:mi>K</mml:mi></mml:mrow></mml:math></inline-formula>)</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</app>

<app id="App1.Ch1.S2">
  <title>Model specification</title>
      <p>The derivation, properties, and implementation of sparse methods used in this
paper are described in detail by their respective authors: SPLSa
<xref ref-type="bibr" rid="bib1.bibx66" id="paren.104"/>, SPLSb <xref ref-type="bibr" rid="bib1.bibx17" id="paren.105"/>, EN <xref ref-type="bibr" rid="bib1.bibx116 bib1.bibx34" id="paren.106"/>,
and EN–PLS <xref ref-type="bibr" rid="bib1.bibx36" id="paren.107"/>. In this section, we briefly summarize the methods
using consistent notation so that (1) their problem statements can be
compared through their objective functions and constraints (formulated as
penalties) and (2) the way in which their respective tuning parameters
control sparsity is apparent. An overview of methods and the parameters over which
models are explored is provided in Table <xref ref-type="table" rid="App1.Ch1.T3"/>. For PLS methods,
the more general case for multivariate <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula> is introduced, and
specific simplifications for univariate <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">y</mml:mi></mml:math></inline-formula> are described where notable.
For solving the PLS problem, we use the NIPALS algorithm in each case as the
weight vectors derived from this algorithm can be used for calculating VIP
scores.<?xmltex \hack{\newpage}?><?xmltex \hack{\vspace*{9.5cm}}?></p>
<sec id="App1.Ch1.S2.SS1">
  <title>Partial least squares</title>
      <p>A search for LVs can be framed as an optimization problem to maximize
covariance between response and explanatory variables under a set of
transformations <xref ref-type="bibr" rid="bib1.bibx8 bib1.bibx17 bib1.bibx67 bib1.bibx29 bib1.bibx68" id="paren.108"/>.
Writing the matrix product of the spectra and response variables as
<inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="bold">Z</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold">X</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi mathvariant="bold">Y</mml:mi></mml:mrow></mml:math></inline-formula>, the transformations are introduced
through the weight vector <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">w</mml:mi></mml:math></inline-formula> for each <inline-formula><mml:math display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>th LV:</p>

<?xmltex \floatpos{t}?><table-wrap id="App1.Ch1.T3" specific-use="star"><caption><p>Summary of sparse methods and parameters. </p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Method</oasis:entry>  
         <oasis:entry colname="col2">Parameter</oasis:entry>  
         <oasis:entry colname="col3">Values</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>  
         <oasis:entry colname="col1">EN</oasis:entry>  
         <oasis:entry colname="col2">sparsity</oasis:entry>  
         <oasis:entry colname="col3"><inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="italic">λ</mml:mi><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mo>max⁡</mml:mo></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mo>max⁡</mml:mo></mml:msub><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">SPLSa</oasis:entry>  
         <oasis:entry colname="col2">sparsity</oasis:entry>  
         <oasis:entry colname="col3"><inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn>0.1</mml:mn><mml:mo>,</mml:mo><mml:mn>0.2</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mn>0.9</mml:mn><mml:mo>,</mml:mo><mml:mn>0.92</mml:mn><mml:mo>,</mml:mo><mml:mn>0.95</mml:mn><mml:mo>,</mml:mo><mml:mn>0.98</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1"/>  
         <oasis:entry colname="col2">LVs</oasis:entry>  
         <oasis:entry colname="col3"><inline-formula><mml:math display="inline"><mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mn>35</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">SPLSb</oasis:entry>  
         <oasis:entry colname="col2">sparsity</oasis:entry>  
         <oasis:entry colname="col3"><inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi>X</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn>10</mml:mn><mml:mo>,</mml:mo><mml:mn>20</mml:mn><mml:mo>,</mml:mo><mml:mn>30</mml:mn><mml:mo>,</mml:mo><mml:mn>50</mml:mn><mml:mo>,</mml:mo><mml:mn>70</mml:mn><mml:mo>,</mml:mo><mml:mn>100</mml:mn><mml:mo>,</mml:mo><mml:mn>200</mml:mn><mml:mo>,</mml:mo><mml:mn>300</mml:mn><mml:mo>,</mml:mo><mml:mn>500</mml:mn><mml:mo>,</mml:mo><mml:mn>1000</mml:mn><mml:mo>,</mml:mo><mml:mn>1500</mml:mn><mml:mo>,</mml:mo><mml:mo>(</mml:mo><mml:mn>2000</mml:mn><mml:mo>,</mml:mo><mml:mn>2500</mml:mn><mml:msup><mml:mo>)</mml:mo><mml:mo>*</mml:mo></mml:msup><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1"/>  
         <oasis:entry colname="col2">LVs</oasis:entry>  
         <oasis:entry colname="col3"><inline-formula><mml:math display="inline"><mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mn>35</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">EN–PLS</oasis:entry>  
         <oasis:entry colname="col2">sparsity</oasis:entry>  
         <oasis:entry colname="col3">fixed by EN</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1"/>  
         <oasis:entry colname="col2">LVs</oasis:entry>  
         <oasis:entry colname="col3"><inline-formula><mml:math display="inline"><mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mn>35</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p><inline-formula><mml:math display="inline"><mml:msup><mml:mi/><mml:mo>*</mml:mo></mml:msup></mml:math></inline-formula> Values in parentheses correspond to values explored only in
raw spectra models as baseline corrected models cannot exceed the value of
1563 (the total number of wavenumbers). </p></table-wrap-foot></table-wrap>

      <p><disp-formula id="App1.Ch1.E1" content-type="numbered"><mml:math display="block"><mml:mtable class="array" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="normal">arg</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="normal">max</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:msubsup><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi>k</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi mathvariant="bold">Z</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:msubsup><mml:mi mathvariant="bold">Z</mml:mi><mml:mi>k</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtext>s.t.</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mo>‖</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:msup><mml:mo>‖</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
          with a constraint to ensure that the weight vectors are normalized. In the
nonlinear iterative partial least squares (NIPALS) algorithm
<xref ref-type="bibr" rid="bib1.bibx110 bib1.bibx72" id="paren.109"/>, the weight vectors are calculated from the
deflated (residual) matrix <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">Z</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mi mathvariant="bold">X</mml:mi><mml:mi>k</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi mathvariant="bold">Y</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>
obtained from the <inline-formula><mml:math display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>th iteration <xref ref-type="bibr" rid="bib1.bibx8 bib1.bibx17 bib1.bibx67" id="paren.110"/> in
which <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">I</mml:mi><mml:mi>N</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="bold">T</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:msubsup><mml:mi mathvariant="bold">T</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mo>+</mml:mo></mml:msubsup><mml:mo>)</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> and
<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">Y</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">I</mml:mi><mml:mi>N</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="bold">T</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:msubsup><mml:mi mathvariant="bold">T</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mo>+</mml:mo></mml:msubsup><mml:mo>)</mml:mo><mml:msub><mml:mi mathvariant="bold">Y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>.
<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">I</mml:mi><mml:mi>N</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the identity matrix of dimension <inline-formula><mml:math display="inline"><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula>,
<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">T</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>[</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold">T</mml:mi><mml:mo>+</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> is the Moore–Penrose
inverse of <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">T</mml:mi></mml:math></inline-formula>. <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">T</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>≡</mml:mo><mml:msub><mml:mn mathvariant="bold">0</mml:mn><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mi>K</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> such
that <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="bold">X</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">Y</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="bold">Y</mml:mi></mml:mrow></mml:math></inline-formula>
<xref ref-type="bibr" rid="bib1.bibx8 bib1.bibx103 bib1.bibx67" id="paren.111"/>. The weight vectors correspond to
eigenvectors of <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">Z</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:msubsup><mml:mi mathvariant="bold">Z</mml:mi><mml:mi>k</mml:mi><mml:mi>T</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula>
<xref ref-type="bibr" rid="bib1.bibx51 bib1.bibx88" id="paren.112"/>.</p>
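      <p>As an illustration only (this is not the implementation used in this work, which relies on the R pls library), the weight calculation and deflation described above can be sketched in plain Python for the univariate case. The function and variable names and the toy data are hypothetical, and both the spectra and the response are assumed to be centered. The sketch deflates by the most recent score vector alone, which is equivalent to the projection written above when the scores are mutually orthogonal.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def transpose_times(A, v):
    """Return A^T v for a matrix A stored as a list of rows."""
    return [dot([row[j] for row in A], v) for j in range(len(A[0]))]


def nipals_pls1(X, y, n_lv):
    """Weights W, scores T and X loadings P for a univariate, centered y."""
    Xk = [row[:] for row in X]  # X_1 = X (columns assumed centered)
    yk = list(y)                # y_1 = y
    W, T, P = [], [], []
    for _ in range(n_lv):
        # Z_k = X_k^T y_k is an M-vector here, so the leading eigenvector of
        # Z_k Z_k^T is Z_k scaled to unit length (the unit-norm constraint).
        z = transpose_times(Xk, yk)
        z_norm = math.sqrt(dot(z, z))
        w = [v / z_norm for v in z]
        t = [dot(row, w) for row in Xk]               # scores t_k = X_k w_k
        tt = dot(t, t)
        p = [v / tt for v in transpose_times(Xk, t)]  # X loadings p_k
        q = dot(yk, t) / tt                           # y loading q_k
        # Deflation: project t_k out of X_k and y_k.
        Xk = [[x - ti * pj for x, pj in zip(row, p)] for row, ti in zip(Xk, t)]
        yk = [yi - ti * q for yi, ti in zip(yk, t)]
        W.append(w)
        T.append(t)
        P.append(p)
    return W, T, P


# Hypothetical toy data: 5 "spectra" with 3 "wavenumbers", columns centered.
# n_lv is kept at 2 because the second column is orthogonal to y and to the
# other columns, so it carries no information for further LVs.
X = [[2, -1, 0], [1, 1, -1], [0, 0, 2], [-1, 1, 0], [-2, -1, -1]]
y = [1.5, 0.5, 0.0, -0.5, -1.5]

W, T, P = nipals_pls1(X, y, n_lv=2)
print("squared norms of weights:", [round(dot(w, w), 12) for w in W])
print("t_1 . t_2 (close to zero):", dot(T[0], T[1]))
```
]]></preformat>
      <p>For univariate y, no iteration is needed within each LV because the eigenvector is available in closed form; for multivariate Y, an inner power iteration (or an eigendecomposition) would take its place.</p>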
      <p><inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">w</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">r</mml:mi></mml:math></inline-formula> introduced as column elements of <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">W</mml:mi></mml:math></inline-formula> and
<inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">R</mml:mi></mml:math></inline-formula>, respectively, in Sect. <xref ref-type="sec" rid="Ch1.S2.SS2"/> are related in
concept and are often referred to interchangeably as loading weights, loadings, weights, or
direction vectors
<xref ref-type="bibr" rid="bib1.bibx42 bib1.bibx63 bib1.bibx76 bib1.bibx66 bib1.bibx17 bib1.bibx67 bib1.bibx29" id="paren.113"><named-content content-type="pre">e.g.,</named-content></xref>.
In this paper, we adopt the convention of referring to <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">w</mml:mi></mml:math></inline-formula> as
(loading) weights and <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">r</mml:mi></mml:math></inline-formula> as direction vectors, respectively. Using the
definition of deflated matrices, we can also write the relationship between
the loading weights and direction vectors as <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="bold">X</mml:mi><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">t</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and
<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">I</mml:mi><mml:mi>M</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msup><mml:mi mathvariant="bold">RP</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mo>)</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, where
<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">I</mml:mi><mml:mi>M</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the identity matrix of dimensions <inline-formula><mml:math display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:mo>×</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:math></inline-formula>
<xref ref-type="bibr" rid="bib1.bibx103" id="paren.114"/>. The reader will note that <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">Z</mml:mi></mml:math></inline-formula> is proportional
to the cross-covariance matrix between <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula>, and the
objective function (Eq. <xref ref-type="disp-formula" rid="App1.Ch1.E1"/>) is in fact proportional to the
inner product of the cross covariances between <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">Y</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and the
transformed variables <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">t</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> for each LV <inline-formula><mml:math display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>.</p>
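      <p>The relationship between loading weights and direction vectors can be checked numerically. The following Python sketch is a hypothetical illustration under the same assumptions as a univariate NIPALS calculation with centered data (the names and toy data are not from this work). It builds each direction vector from the loading weight by subtracting the contributions of the earlier LVs, r_k = (I_M - R P^T) w_k, and confirms that the scores obtained from the deflated matrix and from the original matrix coincide.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(A, v):
    """Return A v for a matrix A stored as a list of rows."""
    return [dot(row, v) for row in A]


def transpose_times(A, v):
    """Return A^T v."""
    return [dot([row[j] for row in A], v) for j in range(len(A[0]))]


def weights_and_directions(X, y, n_lv):
    """Loading weights W, direction vectors R, loadings P and scores T."""
    Xk, yk = [row[:] for row in X], list(y)
    W, R, P, T = [], [], [], []
    for _ in range(n_lv):
        z = transpose_times(Xk, yk)
        z_norm = math.sqrt(dot(z, z))
        w = [v / z_norm for v in z]
        # r_k = (I_M - R P^T) w_k, where R and P hold only the earlier LVs.
        r = w[:]
        for r_h, p_h in zip(R, P):
            c = dot(p_h, w)
            r = [ri - c * rhi for ri, rhi in zip(r, r_h)]
        t = matvec(Xk, w)  # score from the deflated matrix X_k
        tt = dot(t, t)
        p = [v / tt for v in transpose_times(Xk, t)]
        q = dot(yk, t) / tt
        Xk = [[x - ti * pj for x, pj in zip(row, p)] for row, ti in zip(Xk, t)]
        yk = [yi - ti * q for yi, ti in zip(yk, t)]
        W.append(w)
        R.append(r)
        P.append(p)
        T.append(t)
    return W, R, P, T


# Hypothetical toy data with centered columns and centered response.
X = [[2, -1, 0], [1, 1, -1], [0, 0, 2], [-1, 1, 0], [-2, -1, -1]]
y = [1.5, 0.5, 0.0, -0.5, -1.5]

W, R, P, T = weights_and_directions(X, y, n_lv=2)

# Largest difference between X r_k (original matrix) and t_k = X_k w_k.
max_gap = max(
    abs(a - b) for k in range(2) for a, b in zip(matvec(X, R[k]), T[k])
)
print("max |X r_k - X_k w_k| =", max_gap)
```
]]></preformat>
      <p>The first direction vector equals the first loading weight because no deflation has yet taken place; the two differ from the second LV onward.</p>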
      <p>To solve for the underlying weights and direction vectors which satisfy these
equations, we use the NIPALS algorithm implemented in the <preformat preformat-type="code"><![CDATA[pls]]></preformat> library
<xref ref-type="bibr" rid="bib1.bibx76" id="paren.115"/> for the R programming language <xref ref-type="bibr" rid="bib1.bibx83" id="paren.116"/> in
this work. Candidate models are generated by varying <inline-formula><mml:math display="inline"><mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mn>120</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>,
from which one is selected by penalizing model variance over an ensemble of
scaling factors and combining them through consensus scoring
<xref ref-type="bibr" rid="bib1.bibx101" id="paren.117"/>.</p>
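      <p>For a single response vector, the sequence of weights, scores, loadings, and direction vectors can be sketched compactly. The following is a minimal illustration in Python and not the code of the <preformat preformat-type="code"><![CDATA[pls]]></preformat> library: the function name <preformat preformat-type="code"><![CDATA[pls1_nipals]]></preformat> and the list-based inputs are assumptions made for this example, and the spectra and response are assumed to be mean-centered beforehand. Direction vectors are accumulated using the relation between direction and weight vectors stated above, so the regression vector is obtained without a matrix inversion.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def pls1_nipals(X, y, K):
    """Regression vector from K latent variables for univariate y (sketch).

    X is a list of N rows with M columns and y a list of N values; both are
    assumed to be mean-centered already, so no intercept is fitted.
    """
    N, M = len(X), len(X[0])
    Xk = [row[:] for row in X]   # deflated copy of X
    R, P, q = [], [], []         # direction vectors, X loadings, y loadings
    for _ in range(K):
        # Weight vector: normalized covariance between deflated X and y.
        w = [sum(Xk[i][j] * y[i] for i in range(N)) for j in range(M)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        # Scores and loadings for this latent variable.
        t = [sum(Xk[i][j] * w[j] for j in range(M)) for i in range(N)]
        tt = sum(v * v for v in t)
        p = [sum(Xk[i][j] * t[i] for i in range(N)) / tt for j in range(M)]
        # Direction vector r = (I - R P^T) w, so that t equals X r.
        r = w[:]
        for r_prev, p_prev in zip(R, P):
            proj = sum(a * b for a, b in zip(p_prev, w))
            r = [rv - rp * proj for rv, rp in zip(r, r_prev)]
        q.append(sum(ti * yi for ti, yi in zip(t, y)) / tt)
        R.append(r)
        P.append(p)
        # Deflate X by the rank-one contribution of this latent variable.
        for i in range(N):
            for j in range(M):
                Xk[i][j] -= t[i] * p[j]
    # Regression vector b = R q.
    return [sum(R[k][j] * q[k] for k in range(K)) for j in range(M)]
```
]]></preformat>
      <p>When the number of LVs equals the rank of the spectra matrix, the result coincides with the ordinary least-squares solution; smaller values give the reduced-rank solutions from which candidate models are drawn.</p>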
</sec>
<sec id="App1.Ch1.S2.SS2">
  <title>Elastic net regularization</title>
      <p>EN regularization is not a variant of PLS but solves for regression
coefficients in Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>) without using LVs. The objective
function consists of a log-likelihood <inline-formula><mml:math display="inline"><mml:mi mathvariant="script">L</mml:mi></mml:math></inline-formula> term combined with additional constraints imposed
on the regression vector <xref ref-type="bibr" rid="bib1.bibx116" id="paren.118"/>:
            <disp-formula id="App1.Ch1.E2" content-type="numbered"><mml:math display="block"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="normal">arg</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="normal">min</mml:mi></mml:mrow><mml:mi mathvariant="bold-italic">b</mml:mi></mml:msub><mml:mspace width="0.33em" linebreak="nobreak"/><mml:mo>-</mml:mo><mml:mi mathvariant="script">L</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo>;</mml:mo><mml:mi mathvariant="bold">X</mml:mi><mml:mi mathvariant="bold-italic">b</mml:mi><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>‖</mml:mo><mml:mi mathvariant="bold-italic">b</mml:mi><mml:msub><mml:mo>‖</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>‖</mml:mo><mml:mi mathvariant="bold-italic">b</mml:mi><mml:msubsup><mml:mo>‖</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
      <p>The first penalty corresponds to that used by LASSO regression
<xref ref-type="bibr" rid="bib1.bibx104" id="paren.119"/> and imposes sparseness constraints, while the second
penalty corresponds to that used by ridge regression <xref ref-type="bibr" rid="bib1.bibx50" id="paren.120"/> or
standard Tikhonov regularization <xref ref-type="bibr" rid="bib1.bibx105" id="paren.121"/> and imposes
restrictions on the overall size of the regression vector. Combining the two
penalties imposes sparsity constraints but retains a grouping effect which
enables selection of co-varying variables together <xref ref-type="bibr" rid="bib1.bibx116" id="paren.122"/>. Without
the second penalty, the LASSO penalty alone often selects only one of the
covariates at random (leading to potential loss of relevant variables) and
permits at most <inline-formula><mml:math display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> variables to be retained <xref ref-type="bibr" rid="bib1.bibx116" id="paren.123"/>, which is not
always desirable when the inverse problem is underdetermined. In practice,
the parameter space for EN is not formulated in terms of <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and
<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> but by the overall penalty <inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>
and the fractional contribution of the first penalty <inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>/</mml:mo><mml:mi mathvariant="italic">λ</mml:mi></mml:mrow></mml:math></inline-formula>, such that the overall penalty to the log-likelihood term is written as
<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="italic">α</mml:mi></mml:mrow></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">b</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="italic">λ</mml:mi><mml:mfenced open="[" close="]"><mml:mi mathvariant="italic">α</mml:mi><mml:mo>‖</mml:mo><mml:mi mathvariant="bold-italic">b</mml:mi><mml:msub><mml:mo>‖</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>)</mml:mo><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mi mathvariant="italic">α</mml:mi><mml:mo>)</mml:mo><mml:mo>‖</mml:mo><mml:mi mathvariant="bold-italic">b</mml:mi><mml:msubsup><mml:mo>‖</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mfenced></mml:mrow></mml:math></inline-formula>. <inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> corresponds to LASSO
regression and <inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> corresponds to ridge regression. Though <inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>
can be treated as a free parameter spanning values between 0 and 1, we fix
the value at <inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:math></inline-formula> to mix the penalties for our EN regression
solution <xref ref-type="bibr" rid="bib1.bibx34 bib1.bibx45" id="paren.124"/>. As part of the algorithm, a scaling
correction is applied a posteriori to the “naïve” regression coefficients
specified by Eq. (<xref ref-type="disp-formula" rid="App1.Ch1.E2"/>) to correct for biases introduced
through application of both the ridge and LASSO regularization procedures in
obtaining the regression vector <xref ref-type="bibr" rid="bib1.bibx116" id="paren.125"/>.</p>
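      <p>The correspondence between the two parameterizations of the penalty can be checked numerically. The sketch below, written in Python for illustration only, evaluates the penalty in its (λ, α) form and returns the equivalent pair (λ1, λ2) found by matching the coefficients of the two norms, i.e., λ1 = αλ and λ2 = λ(1 − α)/2. The function names <preformat preformat-type="code"><![CDATA[en_penalty]]></preformat> and <preformat preformat-type="code"><![CDATA[to_lambda12]]></preformat> are hypothetical and are not part of any library used in this work.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def en_penalty(b, lam, alpha):
    """Penalty lam * [alpha * ||b||_1 + 0.5 * (1 - alpha) * ||b||_2^2]."""
    l1 = sum(abs(v) for v in b)
    l2_squared = sum(v * v for v in b)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


def to_lambda12(lam, alpha):
    """Equivalent (lambda_1, lambda_2) of the two-penalty objective."""
    return alpha * lam, 0.5 * (1.0 - alpha) * lam
```
]]></preformat>
      <p>With α = 1 only the L1 term contributes, and with α = 0 only the squared L2 term contributes; the value α = 0.5 weights the two norms by λ/2 and λ/4, respectively.</p>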
      <p>We perform EN regression using the <preformat preformat-type="code"><![CDATA[glmnet]]></preformat> library <xref ref-type="bibr" rid="bib1.bibx34" id="paren.126"/>
in R. We generate 100 models (regression coefficients) for various values of
<inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">λ</mml:mi></mml:math></inline-formula> spanning a specified range for each variable. As an upper bound,
the smallest <inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">λ</mml:mi></mml:math></inline-formula> for which all regression coefficients become 0 is selected, and
10<inline-formula><mml:math display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> times this upper value is selected as a lower bound.</p>
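      <p>The resulting sequence of penalties can be sketched as follows in Python. Logarithmic spacing between the two bounds is an assumption of this illustration rather than something stated above, and the function name <preformat preformat-type="code"><![CDATA[lambda_grid]]></preformat> and its arguments are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def lambda_grid(lam_max, n_models=100, ratio=1e-3):
    """Decreasing, log-spaced penalties from lam_max to ratio * lam_max.

    Requires n_models >= 2. lam_max is the upper bound at which all
    regression coefficients are zero.
    """
    step = ratio ** (1.0 / (n_models - 1))
    return [lam_max * step ** i for i in range(n_models)]
```
]]></preformat>
      <p>Each value in the sequence corresponds to one candidate model, so the default arguments yield the 100 sets of regression coefficients described above.</p>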
</sec>
<sec id="App1.Ch1.S2.SS3">
  <title>Sparse PLS by penalty on the weight vector </title>
      <p>A sparse PLS formulation by <xref ref-type="bibr" rid="bib1.bibx66" id="text.127"/> is inspired by the sparse
principal component analysis (PCA) of <xref ref-type="bibr" rid="bib1.bibx94" id="text.128"/> and based on a
singular value decomposition (SVD) estimate of the direction vector with
sparsity imposed by a LASSO penalty function. For the <inline-formula><mml:math display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>th LV,
<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">Z</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is decomposed by its rank-one approximation from SVD, and
modified singular vectors which additionally satisfy the sparsity constraints
are sought by iterative calculation:

                <disp-formula id="App1.Ch1.Ex1"><mml:math display="block"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="normal">arg</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="normal">min</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">v</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mspace linebreak="nobreak" width="0.33em"/><mml:mo>‖</mml:mo><mml:msub><mml:mi mathvariant="bold">Z</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:msubsup><mml:mi mathvariant="bold-italic">v</mml:mi><mml:mi>k</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:msubsup><mml:mo>‖</mml:mo><mml:mtext>F</mml:mtext><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">v</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>

          <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">v</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are the left and right singular vectors of
<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">Z</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, respectively; as eigenvectors of <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">Z</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:msubsup><mml:mi mathvariant="bold">Z</mml:mi><mml:mi>k</mml:mi><mml:mi>T</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula>
and <inline-formula><mml:math display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold">Z</mml:mi><mml:mi>k</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi mathvariant="bold">Z</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> they are also equivalent to loading weights
for <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">X</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx66" id="paren.129"/>. <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the
corresponding singular value from the SVD, <inline-formula><mml:math display="inline"><mml:mrow><mml:mo>‖</mml:mo><mml:mo>⋅</mml:mo><mml:msub><mml:mo>‖</mml:mo><mml:mtext>F</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> is the
Frobenius norm, and <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">Z</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the deflated matrix after subtraction of
the first <inline-formula><mml:math display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> components (<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">Z</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="bold">Z</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="bold">W</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi mathvariant="bold-italic">D</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:msubsup><mml:mi mathvariant="bold-italic">V</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>T</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula>). While the expression above is written
generally to accommodate a multivariate <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold">Y</mml:mi></mml:math></inline-formula> scenario, <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">y</mml:mi></mml:math></inline-formula> is an
<inline-formula><mml:math display="inline"><mml:mrow><mml:mi>N</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> column vector in our specification. Therefore, <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the
<inline-formula><mml:math display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>th weight vector in PLS formulation (hence the labeling of this vector as
<inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">w</mml:mi></mml:math></inline-formula>); <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">v</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>≡</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> for all <inline-formula><mml:math display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> and the second penalty term on
this vector is not necessary. Updated weight vectors are obtained from a soft
thresholding operator <xref ref-type="bibr" rid="bib1.bibx94 bib1.bibx31 bib1.bibx73 bib1.bibx75" id="paren.130"/>
applied at each iteration <xref ref-type="bibr" rid="bib1.bibx66" id="paren.131"/>: <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mo mathvariant="normal">̃</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:msub><mml:mo>(</mml:mo><mml:mover accent="true"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="normal">sign</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mover accent="true"><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>⋅</mml:mo><mml:mo>max⁡</mml:mo><mml:mfenced close="}" open="{"><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mover accent="true"><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub><mml:mo>|</mml:mo><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mfenced></mml:mrow></mml:math></inline-formula>
which corresponds to the LASSO penalty function <xref ref-type="bibr" rid="bib1.bibx46" id="paren.132"/>.</p>
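      <p>The soft thresholding operator acts on each component of the weight vector independently. A minimal sketch in Python is given below; the function name <preformat preformat-type="code"><![CDATA[soft_threshold]]></preformat> is an assumption for this example and does not refer to a routine in any of the packages used here.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def soft_threshold(w, lam):
    """Apply sign(w_j) * max(0, |w_j| - lam) to every component of w."""
    out = []
    for v in w:
        shrunk = max(0.0, abs(v) - lam)
        out.append(shrunk if v >= 0 else -shrunk)
    return out
```
]]></preformat>
      <p>Components with magnitude at or below the threshold are set to 0, and the remaining components are shrunk toward 0 by the same amount, which is the source of sparsity in the weight vector.</p>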
      <p>This algorithm is implemented in the <preformat preformat-type="code"><![CDATA[mixOmics]]></preformat> package
<xref ref-type="bibr" rid="bib1.bibx11" id="paren.133"/> in R. The soft threshold (<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>) is selected
according to the magnitude of the <inline-formula><mml:math display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula>th highest loading weight when sorted by decreasing
magnitude, where we write the rank <inline-formula><mml:math display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula> as <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi>X</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. Therefore, the threshold
or tuning parameter is specified by the number of variables to retain in each
LV according to user input. We select values of <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi>X</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn>10</mml:mn><mml:mo>,</mml:mo><mml:mn>20</mml:mn><mml:mo>,</mml:mo><mml:mn>30</mml:mn><mml:mo>,</mml:mo><mml:mn>50</mml:mn><mml:mo>,</mml:mo><mml:mn>70</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:math></inline-formula>
100, 200, 300, 500, 1000, <inline-formula><mml:math display="inline"><mml:mrow><mml:mn>1500</mml:mn><mml:mo>,</mml:mo><mml:mn>2000</mml:mn><mml:mo>,</mml:mo><mml:mn>2500</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>, as permitted by the maximum
number of wavenumbers <inline-formula><mml:math display="inline"><mml:mi>M</mml:mi></mml:math></inline-formula> in the input spectra. We choose
<inline-formula><mml:math display="inline"><mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mn>35</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>. Because the wavenumbers for loading weights falling
below the <inline-formula><mml:math display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula>th-ranked value are different for each LV, the specification by
number of variables does not necessarily lead to the same number of non-zero
coefficients in the regression vector. The number of non-zero coefficients is
calculated from the solutions presented in Sect. <xref ref-type="sec" rid="Ch1.S2"/>.</p>
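      <p>The effect of specifying the threshold by a number of retained variables can be illustrated with a short Python sketch. It assumes that the threshold is placed at the next-largest magnitude so that exactly the requested number of loading weights remains non-zero (ignoring ties); the internal convention of the package may differ, and all function names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def threshold_for_count(w, n_x):
    """Soft threshold that leaves n_x non-zero entries (assuming no ties)."""
    magnitudes = sorted((abs(v) for v in w), reverse=True)
    return magnitudes[n_x] if n_x < len(magnitudes) else 0.0


def sparse_weights(w, n_x):
    """Soft-threshold w so that its n_x largest-magnitude entries survive."""
    lam = threshold_for_count(w, n_x)
    return [(1 if v > 0 else -1) * max(0.0, abs(v) - lam) for v in w]


def support(w):
    """Indices of the non-zero entries of w."""
    return {j for j, v in enumerate(w) if v != 0.0}


# Two latent variables, each retaining two variables, select different indices.
w1 = sparse_weights([0.9, 0.5, 0.1, 0.05], 2)
w2 = sparse_weights([0.1, 0.6, 0.05, 0.8], 2)
selected_overall = support(w1) | support(w2)
```
]]></preformat>
      <p>In this example each LV retains two variables, but three distinct variables are selected overall, which is why the number of non-zero regression coefficients is not fixed by the specified number of variables alone.</p>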
</sec>
<sec id="App1.Ch1.S2.SS4">
  <title>Sparse PLS by penalty on a surrogate vector</title>
      <p>Another sparse PLS algorithm, introduced by <xref ref-type="bibr" rid="bib1.bibx17" id="text.134"/>, is based on the
sparse PCA by <xref ref-type="bibr" rid="bib1.bibx117" id="text.135"/> combined with
the EN penalty of <xref ref-type="bibr" rid="bib1.bibx116" id="text.136"/>. Rather than imposing a sparsity constraint
on the weight or direction vector, a higher degree of sparsity is targeted by
placing a constraint on a surrogate <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">c</mml:mi></mml:math></inline-formula> of the direction vector:

                <disp-formula specific-use="align"><mml:math display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="normal">arg</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="normal">min</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>-</mml:mo><mml:mi mathvariant="italic">κ</mml:mi><mml:msubsup><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi>k</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:msup><mml:mi mathvariant="bold">ZZ</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mi mathvariant="italic">κ</mml:mi><mml:mo>)</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:msup><mml:mo>)</mml:mo><mml:mi>T</mml:mi></mml:msup><mml:msup><mml:mi mathvariant="bold">ZZ</mml:mi><mml:mi>T</mml:mi></mml:msup></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>‖</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:msub><mml:mo>‖</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi 
mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>‖</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:msubsup><mml:mo>‖</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mtext>s.t.</mml:mtext></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>‖</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:msub><mml:mo>‖</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mn>1.</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>

            The first term maximizes covariance (by minimizing its negative value) as in
the original PLS problem (Eq. <xref ref-type="disp-formula" rid="App1.Ch1.E1"/>); the second term keeps
the surrogate vector <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">c</mml:mi></mml:math></inline-formula> in close alignment with the direction vector
<inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">r</mml:mi></mml:math></inline-formula>, with <inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula> controlling the tradeoff between the two terms. The
penalty, similar to that of EN, balances sparsity with preventing a potential
singularity in the inversion of <inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">Z</mml:mi><mml:msup><mml:mi mathvariant="bold-italic">Z</mml:mi><mml:mi>T</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>. The solution in the
multivariate case is sought by fixing <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">r</mml:mi></mml:math></inline-formula> or <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">c</mml:mi></mml:math></inline-formula> and solving for
the other vector in an alternating fashion <xref ref-type="bibr" rid="bib1.bibx17" id="paren.137"/>. In practice,
<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> is chosen to be large for the estimator to be obtained by soft
thresholding; for univariate <inline-formula><mml:math display="inline"><mml:mi mathvariant="bold-italic">y</mml:mi></mml:math></inline-formula> the solution does not depend on
<inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">κ</mml:mi></mml:math></inline-formula> so it is fixed to a value of <inline-formula><mml:math display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx17 bib1.bibx29" id="paren.138"/>.
Therefore, the key dependence can be reduced from four to two parameters, <inline-formula><mml:math display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula>
and <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>. For the univariate case, the solution for <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is
related to the direction vector by
<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mtext>sign</mml:mtext><mml:mo>(</mml:mo><mml:msub><mml:mover accent="true"><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>⋅</mml:mo><mml:mo>max⁡</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mover accent="true"><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub><mml:mo>|</mml:mo><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>/</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>
<xref ref-type="bibr" rid="bib1.bibx17" id="paren.139"/>. As with SPLSa, the penalty with respect to <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> is
reformulated as a soft thresholding operator containing the bounded parameter
<inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">η</mml:mi></mml:math></inline-formula> (<inline-formula><mml:math display="inline"><mml:mrow><mml:mn mathvariant="normal">0</mml:mn><mml:mo>≤</mml:mo><mml:mi mathvariant="italic">η</mml:mi><mml:mo>≤</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>) to be applied at each iteration
<xref ref-type="bibr" rid="bib1.bibx17 bib1.bibx29" id="paren.140"/>: <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mo mathvariant="normal">̃</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mtext>sign</mml:mtext><mml:mo>(</mml:mo><mml:msub><mml:mover accent="true"><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>⋅</mml:mo><mml:mo>max⁡</mml:mo><mml:mfenced open="{" close="}"><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mover accent="true"><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>k</mml:mi></mml:msub><mml:mo>|</mml:mo><mml:mo>-</mml:mo><mml:mi mathvariant="italic">η</mml:mi><mml:msub><mml:mo>max⁡</mml:mo><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>≤</mml:mo><mml:mi>j</mml:mi><mml:mo>≤</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>r</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo></mml:mfenced></mml:mrow></mml:math></inline-formula>. Rather than thresholding on
the <inline-formula><mml:math display="inline"><mml:mi>j</mml:mi></mml:math></inline-formula>th value of a vector as described for SPLSa, components of the
direction vectors below a fraction <inline-formula><mml:math display="inline"><mml:mi mathvariant="italic">η</mml:mi></mml:math></inline-formula> of the magnitude of the largest
component are set to 0. The solutions obtained for <inline-formula><mml:math display="inline"><mml:mover accent="true"><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover></mml:math></inline-formula> are
only used for variable selection, and ordinary PLS is used on the selected
wavenumbers to obtain the final sparse solution <xref ref-type="bibr" rid="bib1.bibx17" id="paren.141"/>.</p>
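      <p>The thresholding by a fraction of the largest component can be sketched in Python as follows. This is an illustration of the operator written above and not the code of the package; the function names <preformat preformat-type="code"><![CDATA[eta_threshold]]></preformat> and <preformat preformat-type="code"><![CDATA[selected]]></preformat> are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def eta_threshold(r, eta):
    """Apply sign(r_j) * max(0, |r_j| - eta * max_j |r_j|) to each component."""
    cutoff = eta * max(abs(v) for v in r)
    return [(1 if v > 0 else -1) * max(0.0, abs(v) - cutoff) for v in r]


def selected(c):
    """Indices of the variables retained for the final PLS fit."""
    return [j for j, v in enumerate(c) if v != 0.0]
```
]]></preformat>
      <p>Only the indices returned by the selection step are carried forward; the thresholded values themselves are discarded, and the regression coefficients are re-estimated by ordinary PLS on the retained wavenumbers.</p>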
      <p>We use the implementation in the <preformat preformat-type="code"><![CDATA[spls]]></preformat> library <xref ref-type="bibr" rid="bib1.bibx18" id="paren.142"/> in R.
We select values of the thresholding parameter as
<inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn>0.1</mml:mn><mml:mo>,</mml:mo><mml:mn>0.2</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mn>0.9</mml:mn><mml:mo>,</mml:mo><mml:mn>0.92</mml:mn><mml:mo>,</mml:mo><mml:mn>0.95</mml:mn><mml:mo>,</mml:mo><mml:mn>0.98</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula> and the number of components
over the same domain as for SPLSa: <inline-formula><mml:math display="inline"><mml:mrow><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mn>35</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>. Internally, the
final fitting is performed via the <preformat preformat-type="code"><![CDATA[pls]]></preformat> library, and for this we also
specify use of the NIPALS algorithm.</p>
</sec>
<sec id="App1.Ch1.S2.SS5">
  <title>EN combined with PLS </title>
      <p><xref ref-type="bibr" rid="bib1.bibx36" id="text.143"/> introduced a two-step strategy by which EN regression is used
for preliminary variable selection, and a final calibration model is
developed by projection onto LVs with PLS regression. In the proposal by
<xref ref-type="bibr" rid="bib1.bibx36" id="text.144"/>, additional backward variable selection on groups of
contiguous wavenumbers is incorporated into the second stage of the
procedure; models are iteratively constructed with diminishing subsets of
wavenumbers until the apparent performance with respect to a defined
criterion (e.g., minimum RMSECV) no longer improves. As the definition of
this criterion may not be consistent with the criteria by which we evaluate
and select the final model for this work (e.g., parsimony, comparison with
ambient reference measurements), we forgo this additional wavenumber
selection step and use the wavenumbers selected by EN regression directly.</p>
      <p>We have implemented this method in R in combination with the <preformat preformat-type="code"><![CDATA[pls]]></preformat>
library and NIPALS algorithm as described above. Variable selection with PLS
is applied to calibration models developed with TOR measurements. PLS
regression using wavenumbers with non-zero EN coefficients is also performed
with laboratory standards for FG estimation, but the additional variable
selection is not used, since the best model for extrapolation to ambient
samples may not be the one with the lowest RMSE<inline-formula><mml:math display="inline"><mml:msub><mml:mi/><mml:mi>d</mml:mi></mml:msub></mml:math></inline-formula>
<xref ref-type="bibr" rid="bib1.bibx101" id="paren.145"/>.</p>
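      <p>The two-step idea (EN for selection, then PLS on the retained wavenumbers) can be sketched in a few lines. The following is a toy illustration in plain Python, not the R implementation used in this work. It assumes centred data, a glmnet-style EN objective solved by coordinate descent, and a single-response NIPALS PLS; all function names and the synthetic "spectra" are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math

def soft(z, g):
    """Soft-thresholding operator: sign(z) * max(|z| - g, 0)."""
    return math.copysign(max(abs(z) - g, 0.0), z)

def elastic_net(cols, y, lam, alpha, sweeps=200):
    """Elastic net by cyclic coordinate descent; cols and y must be centred."""
    n, b, r = len(y), [0.0] * len(cols), list(y)  # r holds the residual y - Xb
    for _ in range(sweeps):
        for j, x in enumerate(cols):
            xx = sum(v * v for v in x) / n
            rho = sum(v * e for v, e in zip(x, r)) / n + xx * b[j]
            new = soft(rho, lam * alpha) / (xx + lam * (1.0 - alpha))
            r = [e - (new - b[j]) * v for e, v in zip(r, x)]
            b[j] = new
    return b

def pls1_fit(cols, y, k):
    """PLS1 by NIPALS on centred data; returns (weights, loadings, y-loading) per LV."""
    X, f, model = [list(c) for c in cols], list(y), []
    for _ in range(k):
        w = [sum(v * e for v, e in zip(x, f)) for x in X]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        t = [sum(wj * x[i] for wj, x in zip(w, X)) for i in range(len(f))]
        tt = sum(s * s for s in t)
        p = [sum(v * s for v, s in zip(x, t)) / tt for x in X]
        q = sum(e * s for e, s in zip(f, t)) / tt
        X = [[v - s * pj for v, s in zip(x, t)] for x, pj in zip(X, p)]
        f = [e - q * s for e, s in zip(f, t)]
        model.append((w, p, q))
    return model

def pls1_predict(model, x):
    """Predict the centred response for one centred sample by sequential deflation."""
    x, yhat = list(x), 0.0
    for w, p, q in model:
        t = sum(a * c for a, c in zip(w, x))
        yhat += q * t
        x = [a - t * c for a, c in zip(x, p)]
    return yhat

def en_pls(cols, y, lam, alpha, k):
    """Step 1: keep variables with non-zero EN coefficients. Step 2: PLS on them."""
    keep = [j for j, bj in enumerate(elastic_net(cols, y, lam, alpha)) if bj != 0.0]
    return keep, pls1_fit([cols[j] for j in keep], y, k)

def wave(j, n):
    """Zero-mean cosine column used as a synthetic, noise-free 'wavenumber'."""
    return [math.cos(2.0 * math.pi * j * i / n) for i in range(n)]

# Toy data: 16 samples, 6 variables; variable 1 duplicates variable 0 to mimic
# two perfectly collinear neighbouring wavenumbers. Only variables 0 and 2 carry signal.
n = 16
cols = [wave(1, n), wave(1, n), wave(2, n), wave(3, n), wave(4, n), wave(5, n)]
y = [3.0 * a + 2.0 * c for a, c in zip(cols[0], cols[2])]

keep, model = en_pls(cols, y, lam=0.2, alpha=0.5, k=2)
pred = [pls1_predict(model, [cols[j][i] for j in keep]) for i in range(n)]
rmse = math.sqrt(sum((a - c) ** 2 for a, c in zip(y, pred)) / n)
print(keep, rmse)
```
]]></preformat>
      <p>In this noise-free toy case the EN step retains both collinear copies together with the second informative variable (indices 0, 1, and 2), and two LVs reproduce the calibration response to within rounding error. Real spectra are noisy and far more collinear, so the penalty values and number of LVs chosen here carry no significance.</p>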
</sec>
</app>

<app id="App1.Ch1.S3">
  <title>Additional remarks on model selection</title>
<sec id="App1.Ch1.S3.SS1">
  <title>Metrics for model selection</title>
      <p>The weighting of the bias and variance can be formulated in various
functional forms with tuning parameters, and the final model can be chosen by
consensus scoring <xref ref-type="bibr" rid="bib1.bibx48 bib1.bibx49 bib1.bibx61" id="paren.146"><named-content content-type="pre">e.g., sum of ranking
differences;</named-content></xref> when the most
suitable value for the parameter is not known a priori
<xref ref-type="bibr" rid="bib1.bibx58 bib1.bibx101" id="paren.147"/>. The full-wavenumber PLS
solutions are obtained using this method. Given the additional information
required (RMSE values for each CV fold at every definition of sparsity and
LVs), we forgo applying this formal procedure for the sparse PLS methods.
Model selection according to the pRMSECV criterion yields the same number of
LVs as using the metric <inline-formula><mml:math display="inline"><mml:mrow><mml:mi>M</mml:mi><mml:msub><mml:mn mathvariant="normal">2</mml:mn><mml:mi>k</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="italic">λ</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx101" id="paren.148"/> and setting the
penalty parameter to its characteristic value, <inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="italic">λ</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="italic">λ</mml:mi><mml:mo>*</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula>, which
corresponds to the extreme limit of penalties considered in the evaluation of
this metric. <?xmltex \hack{\newpage}?></p>
</sec>
<sec id="App1.Ch1.S3.SS2">
  <title>Biases in RMSECV</title>
      <p>Variable selection and model parameter estimation are performed entirely
within the calibration set for all methods. The two objectives are combined
in EN and SPLSa by their respective soft-thresholding penalties, but are
separated into two steps by SPLSb and EN–PLS. SPLSb correctly applies both
variable selection and parameter estimation prior to estimation of error
against the validation (sub)set, while our current implementation of the EN–PLS
method performs variable selection (via EN) prior to the CV and RMSECV
estimation procedure of the second step. In the latter scenario, the reported
RMSECV can be underestimated <xref ref-type="bibr" rid="bib1.bibx46 bib1.bibx67" id="paren.149"/> as the validation samples have already informed construction of the
model (for wavenumber reduction), albeit according to a different set of selection
criteria. We note that the implication of this bias is the lack of
comparability with RMSECV values estimated with other algorithms. While this
is a potential area for further improvement, our model selection criteria use
relative RMSECV estimates; we expect that this bias will not have a major
influence on model selection and no effect on the reported evaluation against
test set samples.</p><?xmltex \hack{\clearpage}?><supplementary-material position="anchor"><p><bold>The Supplement related to this article is available online at <inline-supplementary-material xlink:href="http://dx.doi.org/10.5194/amt-9-3429-2016-supplement" xlink:title="pdf">doi:10.5194/amt-9-3429-2016-supplement</inline-supplementary-material>.</bold></p></supplementary-material>
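      <p>The optimistic bias discussed in this section can be illustrated with a small simulation. The sketch below is a deliberately simplified stand-in and not the EN–PLS implementation: predictors and response are pure noise, "variable selection" picks the single predictor most correlated with the response, and the model is a one-variable least-squares line. All names and settings (30 replicates, 30 samples, 100 predictors, 5 folds) are assumptions chosen for illustration.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
import random

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select(cols, y, rows):
    """Index of the predictor most correlated with y, using only the given rows."""
    ysub = [y[i] for i in rows]
    return max(range(len(cols)),
               key=lambda j: abs(corr([cols[j][i] for i in rows], ysub)))

def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def rmsecv(cols, y, folds, select_inside):
    """RMSECV with variable selection inside (True) or outside (False) the CV loop."""
    n = len(y)
    fixed = None if select_inside else select(cols, y, range(n))
    sq = 0.0
    for f in range(folds):
        test = [i for i in range(n) if i % folds == f]
        train = [i for i in range(n) if i % folds != f]
        j = select(cols, y, train) if select_inside else fixed
        a, b = fit_line([cols[j][i] for i in train], [y[i] for i in train])
        sq += sum((y[i] - a - b * cols[j][i]) ** 2 for i in test)
    return math.sqrt(sq / n)

def simulate(reps=30, n=30, p=100, folds=5, seed=1):
    """Average RMSECV over replicated pure-noise data sets for both schemes."""
    rng = random.Random(seed)
    outside = inside = 0.0
    for _ in range(reps):
        cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        outside += rmsecv(cols, y, folds, select_inside=False) / reps
        inside += rmsecv(cols, y, folds, select_inside=True) / reps
    return outside, inside

outside, inside = simulate()
print("selection outside CV:", round(outside, 3))
print("selection inside CV: ", round(inside, 3))
```
]]></preformat>
      <p>Because the response is unrelated to every predictor, any apparent skill is spurious. Selecting the variable once on all samples lets the held-out folds influence the choice, so the average RMSECV is expected to be lower than when selection is repeated within each training fold.</p>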
</sec>
</app>
  </app-group><ack><title>Acknowledgements</title><p>The authors acknowledge funding from the Swiss National Science Foundation
(200021_143298) and the IMPROVE program (National Park Service cooperative
agreement P11AC91045). We also thank A. Weakley for helpful
discussions.<?xmltex \hack{\newline}?><?xmltex \hack{\newline}?> Edited by:
A. Sayer<?xmltex \hack{\newline}?> Reviewed by: G. Lebron and one anonymous referee</p></ack><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Allen et al.(1994)</label><mixed-citation>Allen, D. T., Palen, E. J., Haimov, M. I., Hering, S. V., and Young, J. R.:
Fourier-transform Infrared-spectroscopy of Aerosol Collected In A
Low-pressure Impactor (LPI/FTIR) – Method Development and Field Calibration,
Aerosol Sci. Tech., 21, 325–342, <ext-link xlink:href="http://dx.doi.org/10.1080/02786829408959719" ext-link-type="DOI">10.1080/02786829408959719</ext-link>, 1994.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Andries and Martin(2013)</label><mixed-citation>Andries, E. and Martin, S.: Sparse Methods in Spectroscopy: An Introduction,
Overview, and Perspective, Appl. Spectrosc., 67, 579–593,
<ext-link xlink:href="http://dx.doi.org/10.1366/13-07021" ext-link-type="DOI">10.1366/13-07021</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Arlot and Celisse(2010)</label><mixed-citation>Arlot, S. and Celisse, A.: A survey of cross-validation procedures for model
selection, Statistics Surveys, 4, 40–79, <ext-link xlink:href="http://dx.doi.org/10.1214/09-SS054" ext-link-type="DOI">10.1214/09-SS054</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Barsanti and Pankow(2006)</label><mixed-citation>Barsanti, K. C. and Pankow, J. F.: Thermodynamics of the formation of
atmospheric organic particulate matter by accretion reactions – Part 3:
Carboxylic and dicarboxylic acids, Atmos. Environ., 40, 6676–6686,
<ext-link xlink:href="http://dx.doi.org/10.1016/j.atmosenv.2006.03.013" ext-link-type="DOI">10.1016/j.atmosenv.2006.03.013</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Bishop(2009)</label><mixed-citation>
Bishop, C. M.: Pattern recognition and machine learning, Springer, New York,
NY, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>Bond et al.(2004)</label><mixed-citation>Bond, T. C., Streets, D. G., Yarber, K. R., Nelson, S. M., Woo, J.-H., and
Klimont, Z.: A technology-based global inventory of black and organic carbon
emissions from combustion, J. Geophys. Res., 109, D14203, <ext-link xlink:href="http://dx.doi.org/10.1029/2003JD003697" ext-link-type="DOI">10.1029/2003JD003697</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Bornemann et al.(2008)</label><mixed-citation>Bornemann, L., Welp, G., Brodowski, S., Rodionov, A., and Amelung, W.: Rapid
assessment of black carbon in soil organic matter using mid-infrared
spectroscopy, Org. Geochem., 39, 1537–1544,
<ext-link xlink:href="http://dx.doi.org/10.1016/j.orggeochem.2008.07.012" ext-link-type="DOI">10.1016/j.orggeochem.2008.07.012</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Burnham et al.(1996)</label><mixed-citation>Burnham, A. J., Viveros, R., and MacGregor, J. F.: Frameworks for latent
variable multivariate regression, J. Chemometr., 10, 31–45,
<ext-link xlink:href="http://dx.doi.org/10.1002/(SICI)1099-128X(199601)10:1&lt;31::AID-CEM398&gt;3.0.CO;2-1" ext-link-type="DOI">10.1002/(SICI)1099-128X(199601)10:1&lt;31::AID-CEM398&gt;3.0.CO;2-1</ext-link>, 1996.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Cai et al.(2008)</label><mixed-citation>Cai, W., Li, Y., and Shao, X.: A variable selection method based on
uninformative variable elimination for multivariate calibration of
near-infrared spectra, Chemometr. Intell. Lab., 90,
188–194, <ext-link xlink:href="http://dx.doi.org/10.1016/j.chemolab.2007.10.001" ext-link-type="DOI">10.1016/j.chemolab.2007.10.001</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Cain et al.(2010)</label><mixed-citation>Cain, J. P., Gassman, P. L., Wang, H., and Laskin, A.: Micro-FTIR study of soot
chemical composition-evidence of aliphatic hydrocarbons on nascent soot
surfaces, Phys. Chem. Chem. Phys., 12, 5206–5218,
<ext-link xlink:href="http://dx.doi.org/10.1039/b924344e" ext-link-type="DOI">10.1039/b924344e</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Cao et al.(2014)</label><mixed-citation>Cao, K.-A. L., Rohart, F., Gonzalez, I., and Dejean, S.: mixOmics: Omics Data
Integration Project, <uri>http://CRAN.R-project.org/package=mixOmics</uri>, R package version
5.0-3, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Caroline et al.(2009)</label><mixed-citation>Caroline, M. L., Sankar, R., Indirani, R., and Vasudevan, S.: Growth, optical,
thermal and dielectric studies of an amino acid organic nonlinear optical
material: l-Alanine, Mater. Chem. Phys., 114, 490–494,
<ext-link xlink:href="http://dx.doi.org/10.1016/j.matchemphys.2008.09.070" ext-link-type="DOI">10.1016/j.matchemphys.2008.09.070</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>Centner et al.(1996)</label><mixed-citation>Centner, V., Massart, D.-L., de Noord, O. E., de Jong, S., Vandeginste,
B. M., and Sterna, C.: Elimination of Uninformative Variables for
Multivariate Calibration, Anal. Chem., 68, 3851–3858,
<ext-link xlink:href="http://dx.doi.org/10.1021/ac960321m" ext-link-type="DOI">10.1021/ac960321m</ext-link>, 1996.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Chong and Jun(2005)</label><mixed-citation>Chong, I. G. and Jun, C. H.: Performance of some variable selection methods
when multicollinearity is present, Chemometr. Intell. Lab.,
78, 103–112, <ext-link xlink:href="http://dx.doi.org/10.1016/j.chemolab.2004.12.011" ext-link-type="DOI">10.1016/j.chemolab.2004.12.011</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Chow et al.(2004)</label><mixed-citation>Chow, J. C., Watson, J. G., Chen, L.-W. A., Arnott, W. P., Moosmüller, H.,
and Fung, K.: Equivalence of Elemental Carbon by Thermal/Optical Reflectance
and Transmittance with Different Temperature Protocols, Environ. Sci. Tech.,
38, 4414–4422, <ext-link xlink:href="http://dx.doi.org/10.1021/es034936u" ext-link-type="DOI">10.1021/es034936u</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Chow et al.(2007)</label><mixed-citation>Chow, J. C., Watson, J. G., Chen, L.-W. A., Chang, M. O., Robinson, N. F.,
Trimble, D., and Kohl, S.: The IMPROVE_A Temperature Protocol for
Thermal/Optical Carbon Analysis: Maintaining Consistency with a Long-Term
Database, J. Air Waste Manage., 57, 1014–1023,
<ext-link xlink:href="http://dx.doi.org/10.3155/1047-3289.57.9.1014" ext-link-type="DOI">10.3155/1047-3289.57.9.1014</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Chun and Keles(2010)</label><mixed-citation>Chun, H. and Keles, S.: Sparse partial least squares regression for
simultaneous dimension reduction and variable selection,
J. Roy. Stat. Soc. B Met., 72, 3–25,
<ext-link xlink:href="http://dx.doi.org/10.1111/j.1467-9868.2009.00723.x" ext-link-type="DOI">10.1111/j.1467-9868.2009.00723.x</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Chung et al.(2013)</label><mixed-citation>Chung, D., Chun, H., and Keles, S.: spls: Sparse Partial Least Squares (SPLS)
Regression and Classification, <uri>http://CRAN.R-project.org/package=spls</uri>, R package version
2.2-1, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Corrigan et al.(2013)</label><mixed-citation>Corrigan, A. L., Russell, L. M., Takahama, S., Äijälä, M., Ehn, M.,
Junninen, H., Rinne, J., Petäjä, T., Kulmala, M., Vogel, A. L.,
Hoffmann, T., Ebben, C. J., Geiger, F. M., Chhabra, P., Seinfeld, J. H.,
Worsnop, D. R., Song, W., Auld, J., and Williams, J.: Biogenic and biomass burning
organic aerosol in a boreal forest at Hyytiälä, Finland, during HUMPPA-COPEC 2010,
Atmos. Chem. Phys., 13, 12233–12256, <ext-link xlink:href="http://dx.doi.org/10.5194/acp-13-12233-2013" ext-link-type="DOI">10.5194/acp-13-12233-2013</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Coury and Dillner(2008)</label><mixed-citation>Coury, C. and Dillner, A. M.: A method to quantify organic functional groups
and inorganic compounds in ambient aerosols using attenuated total
reflectance FTIR spectroscopy and multivariate chemometric techniques,
Atmos. Environ., 42, 5923–5932, <ext-link xlink:href="http://dx.doi.org/10.1016/j.atmosenv.2008.03.026" ext-link-type="DOI">10.1016/j.atmosenv.2008.03.026</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Cunningham et al.(1974)</label><mixed-citation>Cunningham, P. T., Johnson, S. A., and Yang, R. T.: Variations in chemistry of
airborne particulate material with particle size and time, Environ.
Sci. Tech., 8, 131–135, <ext-link xlink:href="http://dx.doi.org/10.1021/es60087a002" ext-link-type="DOI">10.1021/es60087a002</ext-link>, 1974.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Cziczo et al.(1997)</label><mixed-citation>Cziczo, D. J., Nowak, J. B., Hu, J. H., and Abbatt, J. P. D.: Infrared
spectroscopy of model tropospheric aerosols as a function of relative
humidity: Observation of deliquescence and crystallization, J.
Geophys. Res.-Atmos., 102, 18843–18850, <ext-link xlink:href="http://dx.doi.org/10.1029/97JD01361" ext-link-type="DOI">10.1029/97JD01361</ext-link>, 1997.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Day et al.(2010)</label><mixed-citation>Day, D. A., Liu, S., Russell, L. M., and Ziemann, P. J.: Organonitrate group
concentrations in submicron particles with high nitrate and organic fractions
in coastal southern California, Atmos. Environ., 44, 1970–1979,
<ext-link xlink:href="http://dx.doi.org/10.1016/j.atmosenv.2010.02.045" ext-link-type="DOI">10.1016/j.atmosenv.2010.02.045</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>de la Rosa Arranz et al.(2013)</label><mixed-citation>de la Rosa Arranz, J. M., González-Vila, F. J., González-Pérez, J. A.,
Almendros Martín, G., Hernández, Z., López Martín, M., and Knicker, H.:
How useful is the mid-infrared spectroscopy in the assessment of black carbon
in soils, Flamma, 4, 147–151, <uri>http://digital.csic.es/handle/10261/82100</uri>,
2013.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Dillner and Takahama(2015a)</label><mixed-citation>Dillner, A. M. and Takahama, S.: Predicting ambient aerosol thermal-optical
reflectance (TOR) measurements from infrared spectra: organic carbon,
Atmos. Meas. Tech., 8, 1097–1109, <ext-link xlink:href="http://dx.doi.org/10.5194/amt-8-1097-2015" ext-link-type="DOI">10.5194/amt-8-1097-2015</ext-link>, 2015a.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Dillner and Takahama(2015b)</label><mixed-citation>Dillner, A. M. and Takahama, S.: Predicting ambient aerosol thermal-optical
reflectance measurements from infrared spectra: elemental carbon, Atmos. Meas. Tech.,
8, 4013–4023, <ext-link xlink:href="http://dx.doi.org/10.5194/amt-8-4013-2015" ext-link-type="DOI">10.5194/amt-8-4013-2015</ext-link>, 2015b.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Dockery et al.(1993)</label><mixed-citation>Dockery, D. W., Pope, C. A., Xu, X. P., Spengler, J. D., Ware, J. H., Fay,
M. E., Ferris, B. G., and Speizer, F. E.: An Association Between
Air-pollution and Mortality In 6 United-states Cities, New Engl. J. Med.,
329, 1753–1759, <ext-link xlink:href="http://dx.doi.org/10.1056/NEJM199312093292401" ext-link-type="DOI">10.1056/NEJM199312093292401</ext-link>, 1993.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>Faber and Kowalski(1997)</label><mixed-citation>Faber, K. and Kowalski, B. R.: Propagation of measurement errors for the
validation of predictions obtained by principal component regression and
partial least squares, J. Chemometr., 11, 181–238,
<ext-link xlink:href="http://dx.doi.org/10.1002/(SICI)1099-128X(199705)11:3&lt;181::AID-CEM459&gt;3.0.CO;2-7" ext-link-type="DOI">10.1002/(SICI)1099-128X(199705)11:3&lt;181::AID-CEM459&gt;3.0.CO;2-7</ext-link>, 1997.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>Filzmoser et al.(2012)</label><mixed-citation>Filzmoser, P., Gschwandtner, M., and Todorov, V.: Review of sparse methods in
regression and classification with application to chemometrics, J.
Chemometr., 26, 42–51, <ext-link xlink:href="http://dx.doi.org/10.1002/cem.1418" ext-link-type="DOI">10.1002/cem.1418</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Flagan and Seinfeld(1988)</label><mixed-citation>
Flagan, R. C. and Seinfeld, J. H.: Fundamentals of air pollution engineering,
Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1988.</mixed-citation></ref>
      <ref id="bib1.bibx31"><label>Bach et al.(2011)</label><mixed-citation>Bach, F., Jenatton, R., Mairal, J., and Obozinski, G.: Optimization with
Sparsity-Inducing Penalties, Foundations and Trends<sup>®</sup> in Machine Learning,
4, 1–106, <ext-link xlink:href="http://dx.doi.org/10.1561/2200000015" ext-link-type="DOI">10.1561/2200000015</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>Friedel and Carlson(1972)</label><mixed-citation>Friedel, R. and Carlson, G.: Difficult carbonaceous materials and their
infrared and Raman spectra. Reassignments for coal spectra, Fuel, 51, 194–198,
<ext-link xlink:href="http://dx.doi.org/10.1016/0016-2361(72)90079-8" ext-link-type="DOI">10.1016/0016-2361(72)90079-8</ext-link>, 1972.</mixed-citation></ref>
      <ref id="bib1.bibx33"><label>Friedel and Carlson(1971)</label><mixed-citation>Friedel, R. A. and Carlson, G. L.: Infrared spectra of ground graphite, J.
Phys. Chem., 75, 1149–1151, <ext-link xlink:href="http://dx.doi.org/10.1021/j100678a021" ext-link-type="DOI">10.1021/j100678a021</ext-link>, 1971.</mixed-citation></ref>
      <ref id="bib1.bibx34"><label>Friedman et al.(2010)</label><mixed-citation>Friedman, J. H., Hastie, T., and Tibshirani, R.: Regularization Paths for
Generalized Linear Models via Coordinate Descent, J. Stat.
Softw., 33, 1–22, <ext-link xlink:href="http://dx.doi.org/10.18637/jss.v033.i01" ext-link-type="DOI">10.18637/jss.v033.i01</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx35"><label>Fu et al.(2013)</label><mixed-citation>Fu, D., Leng, C., Kelley, J., Zeng, G., Zhang, Y., and Liu, Y.: ATR-IR Study of
Ozone Initiated Heterogeneous Oxidation of Squalene in an Indoor Environment,
Environ. Sci. Tech., 47, 10611–10618, <ext-link xlink:href="http://dx.doi.org/10.1021/es4019018" ext-link-type="DOI">10.1021/es4019018</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx36"><label>Fu et al.(2011)</label><mixed-citation>Fu, G.-H., Xu, Q.-S., Li, H.-D., Cao, D.-S., and Liang, Y.-Z.: Elastic Net
Grouping Variable Selection Combined with Partial Least Squares Regression
(EN-PLSR) for the Analysis of Strongly Multi-collinear Spectroscopic
Data, Appl. Spectrosc., 65, 402–408, <ext-link xlink:href="http://dx.doi.org/10.1366/10-06069" ext-link-type="DOI">10.1366/10-06069</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx37"><label>Geladi and Kowalski(1986)</label><mixed-citation>Geladi, P. and Kowalski, B. R.: Partial least-squares regression: a tutorial,
Anal. Chim. Acta, 185, 1–17, <ext-link xlink:href="http://dx.doi.org/10.1016/0003-2670(86)80028-9" ext-link-type="DOI">10.1016/0003-2670(86)80028-9</ext-link>, 1986.</mixed-citation></ref>
      <ref id="bib1.bibx38"><label>Gilardoni et al.(2009)</label><mixed-citation>Gilardoni, S., Liu, S., Takahama, S., Russell, L. M., Allan, J. D., Steinbrecher, R.,
Jimenez, J. L., DeCarlo, P. F., Dunlea, E. J., and Baumgardner, D.: Characterization
of organic ambient aerosol during MIRAGE 2006 on three platforms,
Atmos. Chem. Phys., 9, 5417–5432, <ext-link xlink:href="http://dx.doi.org/10.5194/acp-9-5417-2009" ext-link-type="DOI">10.5194/acp-9-5417-2009</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx39"><label>Gowen et al.(2011)</label><mixed-citation>Gowen, A. A., Downey, G., Esquerre, C., and O'Donnell, C. P.: Preventing
over-fitting in PLS calibration models of near-infrared (NIR)
spectroscopy data using regression coefficients, J. Chemometr., 25,
375–381, <ext-link xlink:href="http://dx.doi.org/10.1002/cem.1349" ext-link-type="DOI">10.1002/cem.1349</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx40"><label>Griffiths and de Haseth(2007)</label><mixed-citation>
Griffiths, P. R. and de Haseth, J. A.: Fourier Transform Infrared Spectrometry,
John Wiley &amp; Sons, 2nd Edn., 2007.</mixed-citation></ref>
      <ref id="bib1.bibx41"><label>Guzman-Morales et al.(2014)</label><mixed-citation>Guzman-Morales, J., Frossard, A., Corrigan, A., Russell, L., Liu, S., Takahama,
S., Taylor, J., Allan, J., Coe, H., Zhao, Y., and Goldstein, A.: Estimated
contributions of primary and secondary organic aerosol from fossil fuel
combustion during the CalNex and Cal-Mex campaigns, Atmos. Environ.,
88, 330–340, <ext-link xlink:href="http://dx.doi.org/10.1016/j.atmosenv.2013.08.047" ext-link-type="DOI">10.1016/j.atmosenv.2013.08.047</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx42"><label>Haaland and Thomas(1988)</label><mixed-citation>Haaland, D. M. and Thomas, E. V.: Partial least-squares methods for spectral
analyses – 1. Relation to other quantitative calibration methods and the
extraction of qualitative information, Anal. Chem., 60, 1193–1202,
<ext-link xlink:href="http://dx.doi.org/10.1021/ac00162a020" ext-link-type="DOI">10.1021/ac00162a020</ext-link>, 1988.</mixed-citation></ref>
      <ref id="bib1.bibx43"><label>Hamilton et al.(2004)</label><mixed-citation>Hamilton, J. F., Webb, P. J., Lewis, A. C., Hopkins, J. R., Smith, S., and Davy, P.:
Partially oxidised organic components in urban aerosol using GCXGC-TOF/MS,
Atmos. Chem. Phys., 4, 1279–1290, <ext-link xlink:href="http://dx.doi.org/10.5194/acp-4-1279-2004" ext-link-type="DOI">10.5194/acp-4-1279-2004</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bibx44"><label>Hand et al.(2012)</label><mixed-citation>Hand, J. L., Schichtel, B. A., Pitchford, M., Malm, W. C., and Frank, N. H.:
Seasonal composition of remote and urban fine particulate matter in the
United States, J. Geophys. Res.-Atmos., 117, D05209,
<ext-link xlink:href="http://dx.doi.org/10.1029/2011JD017122" ext-link-type="DOI">10.1029/2011JD017122</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx45"><label>Hastie and Qian(2014)</label><mixed-citation>Hastie, T. and Qian, J.: Glmnet Vignette, <uri>http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html</uri> (last access: 6 January 2016), 2014.</mixed-citation></ref>
      <ref id="bib1.bibx46"><label>Hastie et al.(2009)</label><mixed-citation>
Hastie, T., Tibshirani, R., and Friedman, J.: The elements of statistical
learning: data mining, inference, and prediction, Springer Verlag, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx47"><label>Hawkins and Russell(2010)</label><mixed-citation>Hawkins, L. N. and Russell, L. M.: Oxidation of ketone groups in transported
biomass burning aerosol from the 2008 Northern California Lightning Series
fires, Atmos. Environ., 44, 4142–4154,
<ext-link xlink:href="http://dx.doi.org/10.1016/j.atmosenv.2010.07.036" ext-link-type="DOI">10.1016/j.atmosenv.2010.07.036</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx48"><label>Héberger(2010)</label><mixed-citation>Héberger, K.: Sum of ranking differences compares methods or models fairly,
TRAC-Trend. Anal. Chem., 29, 101–109,
<ext-link xlink:href="http://dx.doi.org/10.1016/j.trac.2009.09.009" ext-link-type="DOI">10.1016/j.trac.2009.09.009</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx49"><label>Héberger and Kollár-Hunek(2011)</label><mixed-citation>Héberger, K. and Kollár-Hunek, K.: Sum of ranking differences for method
discrimination and its validation: comparison of ranks with random numbers,
J. Chemometr., 25, 151–158, <ext-link xlink:href="http://dx.doi.org/10.1002/cem.1320" ext-link-type="DOI">10.1002/cem.1320</ext-link>,
2011.</mixed-citation></ref>
      <ref id="bib1.bibx50"><label>Hoerl and Kennard(1970)</label><mixed-citation>Hoerl, A. E. and Kennard, R. W.: Ridge Regression – Biased Estimation For
Nonorthogonal Problems, Technometrics, 12, 55–67,
<ext-link xlink:href="http://dx.doi.org/10.1080/00401706.1970.10488634" ext-link-type="DOI">10.1080/00401706.1970.10488634</ext-link>, 1970.</mixed-citation></ref>
      <ref id="bib1.bibx51"><label>Höskuldsson(1988)</label><mixed-citation>Höskuldsson, A.: PLS regression methods, J. Chemometr., 2,
211–228, <ext-link xlink:href="http://dx.doi.org/10.1002/cem.1180020306" ext-link-type="DOI">10.1002/cem.1180020306</ext-link>, 1988.</mixed-citation></ref>
      <ref id="bib1.bibx52"><label>Höskuldsson(2001)</label><mixed-citation>Höskuldsson, A.: Variable and subset selection in PLS regression,
Chemometr. Intell. Lab., 55, 23–38,
<ext-link xlink:href="http://dx.doi.org/10.1016/S0169-7439(00)00113-1" ext-link-type="DOI">10.1016/S0169-7439(00)00113-1</ext-link>, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx53"><label>Hudson et al.(2007)</label><mixed-citation>Hudson, P. K., Schwarz, J., Baltrusaitis, J., Gibson, E. R., and Grassian,
V. H.: A spectroscopic study of atmospherically relevant concentrated aqueous
nitrate solutions, J. Phys. Chem. A, 111, 544–548,
<ext-link xlink:href="http://dx.doi.org/10.1021/jp0664216" ext-link-type="DOI">10.1021/jp0664216</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx54"><label>Hudson et al.(2008)</label><mixed-citation>Hudson, P. K., Young, M. A., Kleiber, P. D., and Grassian, V. H.: Coupled
infrared extinction spectra and size distribution measurements for several
non-clay components of mineral dust aerosol (quartz, calcite, and dolomite),
Atmos. Environ., 42, 5991–5999, <ext-link xlink:href="http://dx.doi.org/10.1016/j.atmosenv.2008.03.046" ext-link-type="DOI">10.1016/j.atmosenv.2008.03.046</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx55"><label>Hung et al.(2002)</label><mixed-citation>Hung, H. M., Malinowski, A., and Martin, S. T.: Ice nucleation kinetics of
aerosols containing aqueous and solid ammonium sulfate particles, J.
Phys. Chem. A, 106, 293–306, <ext-link xlink:href="http://dx.doi.org/10.1021/jp012064h" ext-link-type="DOI">10.1021/jp012064h</ext-link>, 2002.</mixed-citation></ref>
      <ref id="bib1.bibx56"><label>IPCC(2013)</label><mixed-citation>IPCC: Climate Change 2013: The Physical Science Basis, Contribution of Working
Group I to the Fifth Assessment Report of the Intergovernmental Panel on
Climate Change, Tech. rep., <ext-link xlink:href="http://dx.doi.org/10.1017/CBO9781107415324" ext-link-type="DOI">10.1017/CBO9781107415324</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx57"><label>Kalivas(2012)</label><mixed-citation>Kalivas, J. H.: Overview of two-norm (L2) and one-norm (L1) Tikhonov
regularization variants for full wavelength or sparse spectral multivariate
calibration models or maintenance, J. Chemometr., 26, 218–230,
<ext-link xlink:href="http://dx.doi.org/10.1002/cem.2429" ext-link-type="DOI">10.1002/cem.2429</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx58"><label>Kalivas et al.(2015)</label><mixed-citation>Kalivas, J. H., Héberger, K., and Andries, E.: Sum of ranking differences
(SRD) to ensemble multivariate calibration model merits for tuning parameter
selection and comparing calibration methods, Anal. Chim. Acta, 869,
21–33, <ext-link xlink:href="http://dx.doi.org/10.1016/j.aca.2014.12.056" ext-link-type="DOI">10.1016/j.aca.2014.12.056</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx59"><label>Karcher et al.(1985)</label><mixed-citation>
Karcher, W., Fordham, R. J., Dubois, J. J., Glaude, P. G. J. M., and Ligthart,
J. A. M.: Spectral Atlas of Polycyclic Aromatic Compounds: including Data on
Occurrence and Biological Activity, Spectral Atlas of Polycyclic Aromatic
Compounds, D. Reidel Publishing Company, Dordrecht, the Netherlands, 1985.</mixed-citation></ref>
      <ref id="bib1.bibx60"><label>Kelley(2012)</label><mixed-citation>
Kelley, A. M.: Condensed-Phase Molecular Spectroscopy and Photophysics,
John Wiley &amp; Sons, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx61"><label>Kollár-Hunek and Héberger(2013)</label><mixed-citation>Kollár-Hunek, K. and Héberger, K.: Method and model comparison by sum of
ranking differences in cases of repeated observations (ties),
Chemometr. Intell. Lab., 127, 139–146, <ext-link xlink:href="http://dx.doi.org/10.1016/j.chemolab.2013.06.007" ext-link-type="DOI">10.1016/j.chemolab.2013.06.007</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx62"><label>Kroll and Seinfeld(2008)</label><mixed-citation>Kroll, J. H. and Seinfeld, J. H.: Chemistry of secondary organic aerosol:
Formation and evolution of low-volatility organics in the atmosphere,
Atmos. Environ., 42, 3593–3624,
<ext-link xlink:href="http://dx.doi.org/10.1016/j.atmosenv.2008.01.003" ext-link-type="DOI">10.1016/j.atmosenv.2008.01.003</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx63"><label>Kvalheim and Karstang(1989)</label><mixed-citation>Kvalheim, O. M. and Karstang, T. V.: Interpretation of latent-variable
regression models, Chemometr. Intell. Lab., 7,
39–51, <ext-link xlink:href="http://dx.doi.org/10.1016/0169-7439(89)80110-8" ext-link-type="DOI">10.1016/0169-7439(89)80110-8</ext-link>, 1989.</mixed-citation></ref>
      <ref id="bib1.bibx64"><label>Kvalheim et al.(2014)</label><mixed-citation>Kvalheim, O. M., Arneberg, R., Bleie, O., Rajalahti, T., Smilde, A. K., and
Westerhuis, J. A.: Variable importance in latent variable regression models,
J. Chemometr., 28, 615–622, <ext-link xlink:href="http://dx.doi.org/10.1002/cem.2626" ext-link-type="DOI">10.1002/cem.2626</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx65"><label>Lack et al.(2014)</label><mixed-citation>Lack, D. A., Moosmüller, H., McMeeking, G. R., Chakrabarty, R. K., and
Baumgardner, D.: Characterizing elemental, equivalent black, and refractory
black carbon aerosol particles: a review of techniques, their limitations and
uncertainties, Anal. Bioanal. Chem., 406, 99–122,
<ext-link xlink:href="http://dx.doi.org/10.1007/s00216-013-7402-3" ext-link-type="DOI">10.1007/s00216-013-7402-3</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx66"><label>Lê Cao et al.(2008)</label><mixed-citation>Lê Cao, K.-A., Rossouw, D., Robert-Granié, C., and Besse, P.: A Sparse
PLS for Variable Selection when Integrating Omics Data, Stat. Appl. Genet. Mo. B., 7, 35,
<ext-link xlink:href="http://dx.doi.org/10.2202/1544-6115.1390" ext-link-type="DOI">10.2202/1544-6115.1390</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx67"><label>Lee et al.(2011)</label><mixed-citation>Lee, D., Lee, W., Lee, Y., and Pawitan, Y.: Sparse partial least-squares
regression and its applications to high-throughput data analysis,
Chemometr. Intell. Lab., 109, 1–8,
<ext-link xlink:href="http://dx.doi.org/10.1016/j.chemolab.2011.07.002" ext-link-type="DOI">10.1016/j.chemolab.2011.07.002</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx68"><label>Liu(2014)</label><mixed-citation>Liu, J.: Developing a soft sensor based on sparse partial least squares with
variable selection, J. Process Contr., 24, 1046–1056,
<ext-link xlink:href="http://dx.doi.org/10.1016/j.jprocont.2014.05.014" ext-link-type="DOI">10.1016/j.jprocont.2014.05.014</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx69"><label>Liu et al.(2009)</label><mixed-citation>Liu, S., Takahama, S., Russell, L. M., Gilardoni, S., and Baumgardner, D.:
Oxygenated organic functional groups and their sources in single and submicron
organic particles in MILAGRO 2006 campaign, Atmos. Chem. Phys., 9, 6849–6863, <ext-link xlink:href="http://dx.doi.org/10.5194/acp-9-6849-2009" ext-link-type="DOI">10.5194/acp-9-6849-2009</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx70"><label>Malm et al.(1994)</label><mixed-citation>Malm, W. C., Sisler, J. F., Huffman, D., Eldred, R. A., and Cahill, T. A.:
Spatial and seasonal trends in particle concentration and optical extinction
in the United States, J. Geophys. Res.-Atmos., 99,
1347–1370, <ext-link xlink:href="http://dx.doi.org/10.1029/93JD02916" ext-link-type="DOI">10.1029/93JD02916</ext-link>, 1994.</mixed-citation></ref>
      <ref id="bib1.bibx71"><label>Maria et al.(2003)</label><mixed-citation>Maria, S. F., Russell, L. M., Turpin, B. J., Porcja, R. J., Campos, T. L.,
Weber, R. J., and Huebert, B. J.: Source signatures of carbon monoxide and
organic functional groups in Asian Pacific Regional Aerosol Characterization
Experiment (ACE-Asia) submicron aerosol types, J. Geophys.
Res.-Atmos., 108, 8637, <ext-link xlink:href="http://dx.doi.org/10.1029/2003JD003703" ext-link-type="DOI">10.1029/2003JD003703</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bibx72"><label>Martens(1991)</label><mixed-citation>
Martens, H.: Multivariate Calibration, John Wiley &amp; Sons, New York, 1991.</mixed-citation></ref>
      <ref id="bib1.bibx73"><label>Mazumder et al.(2011)</label><mixed-citation>Mazumder, R., Friedman, J. H., and Hastie, T.: SparseNet: Coordinate Descent
With Nonconvex Penalties, J. Am. Stat. Assoc.,
106, 1125–1138, <ext-link xlink:href="http://dx.doi.org/10.1198/jasa.2011.tm09738" ext-link-type="DOI">10.1198/jasa.2011.tm09738</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx74"><label>McClenny et al.(1985)</label><mixed-citation>McClenny, W. A., Childers, J. W., Röhl, R., and Palmer, R. A.: FTIR
transmission spectrometry for the nondestructive determination of ammonium
and sulfate in ambient aerosols collected on teflon filters, Atmos.
Environ., 19, 1891–1898, <ext-link xlink:href="http://dx.doi.org/10.1016/0004-6981(85)90014-9" ext-link-type="DOI">10.1016/0004-6981(85)90014-9</ext-link>, 1985.</mixed-citation></ref>
      <ref id="bib1.bibx75"><label>Mehmood et al.(2012)</label><mixed-citation>Mehmood, T., Liland, K. H., Snipen, L., and Saebo, S.: A review of variable
selection methods in Partial Least Squares Regression, Chemometr. Intell. Lab., 118, 62–69,
<ext-link xlink:href="http://dx.doi.org/10.1016/j.chemolab.2012.07.010" ext-link-type="DOI">10.1016/j.chemolab.2012.07.010</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx76"><label>Mevik and Wehrens(2007)</label><mixed-citation>Mevik, B. and Wehrens, R.: The pls package: Principal component and partial
least squares regression in R, J. Stat. Softw., 18, 1–24,
<ext-link xlink:href="http://dx.doi.org/10.18637/jss.v018.i02" ext-link-type="DOI">10.18637/jss.v018.i02</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx77"><label>Moussa et al.(2009)</label><mixed-citation>Moussa, S. G., McIntire, T. M., Szöri, M., Roeselová, M., Tobias, D. J.,
Grimm, R. L., Hemminger, J. C., and Finlayson-Pitts, B. J.: Experimental and
Theoretical Characterization of Adsorbed Water on Self-Assembled Monolayers:
Understanding the Interaction of Water with Atmospherically Relevant
Surfaces, J. Phys. Chem. A, 113, 2060–2069,
<ext-link xlink:href="http://dx.doi.org/10.1021/jp808710n" ext-link-type="DOI">10.1021/jp808710n</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx78"><label>Murphy et al.(2007)</label><mixed-citation>Murphy, S. M., Sorooshian, A., Kroll, J. H., Ng, N. L., Chhabra, P., Tong, C.,
Surratt, J. D., Knipping, E., Flagan, R. C., and Seinfeld, J. H.: Secondary aerosol
formation from atmospheric reactions of aliphatic amines, Atmos. Chem. Phys.,
7, 2313–2337, <ext-link xlink:href="http://dx.doi.org/10.5194/acp-7-2313-2007" ext-link-type="DOI">10.5194/acp-7-2313-2007</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx79"><label>Nadler and Coifman(2005)</label><mixed-citation>Nadler, B. and Coifman, R. R.: The prediction error in CLS and PLS: the
importance of feature selection prior to multivariate calibration, J.
Chemometr., 19, 107–118, <ext-link xlink:href="http://dx.doi.org/10.1002/cem.915" ext-link-type="DOI">10.1002/cem.915</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx80"><label>Pavia et al.(2008)</label><mixed-citation>
Pavia, D., Lampman, G., and Kriz, G.: Introduction to Spectroscopy, Brooks/Cole
Pub Co., Belmont, CA, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx81"><label>Petzold et al.(2013)</label><mixed-citation>Petzold, A., Ogren, J. A., Fiebig, M., Laj, P., Li, S.-M., Baltensperger, U.,
Holzer-Popp, T., Kinne, S., Pappalardo, G., Sugimoto, N., Wehrli, C., Wiedensohler, A.,
and Zhang, X.-Y.: Recommendations for reporting “black carbon” measurements,
Atmos. Chem. Phys., 13, 8365–8379, <ext-link xlink:href="http://dx.doi.org/10.5194/acp-13-8365-2013" ext-link-type="DOI">10.5194/acp-13-8365-2013</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx82"><label>Pitts Jr. et al.(1978)</label><mixed-citation>Pitts Jr., J. N., Grosjean, D., Cauwenberghe, K. V., Schmid, J. P., and
Fitz, D. R.: Photooxidation of aliphatic amines under simulated atmospheric
conditions: formation of nitrosamines, nitramines, amides, and photochemical
oxidant, Environ. Sci. Tech., 12, 946–953,
<ext-link xlink:href="http://dx.doi.org/10.1021/es60144a009" ext-link-type="DOI">10.1021/es60144a009</ext-link>, 1978.</mixed-citation></ref>
      <ref id="bib1.bibx83"><label>R Core Team(2014)</label><mixed-citation>R Core Team: R: A Language and Environment for Statistical Computing, R
Foundation for Statistical Computing, Vienna, Austria,
<uri>http://www.R-project.org/</uri>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx84"><label>Reff et al.(2007)</label><mixed-citation>Reff, A., Turpin, B. J., Offenberg, J. H., Weisel, C. P., Zhang, J., Morandi,
M., Stock, T., Colome, S., and Winer, A.: A functional group characterization
of organic PM<inline-formula><mml:math display="inline"><mml:msub><mml:mi/><mml:mn>2.5</mml:mn></mml:msub></mml:math></inline-formula> exposure: Results from the RIOPA study,
Atmos. Environ., 41, 4585–4598,
<ext-link xlink:href="http://dx.doi.org/10.1016/j.atmosenv.2007.03.054" ext-link-type="DOI">10.1016/j.atmosenv.2007.03.054</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx85"><label>Reggente et al.(2016)</label><mixed-citation>Reggente, M., Dillner, A. M., and Takahama, S.: Predicting ambient aerosol
thermal-optical reflectance (TOR) measurements from infrared spectra: extending
the predictions to different years and different sites, Atmos. Meas. Tech.,
9, 441–454, <ext-link xlink:href="http://dx.doi.org/10.5194/amt-9-441-2016" ext-link-type="DOI">10.5194/amt-9-441-2016</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx86"><label>Reinikainen and Höskuldsson(2003)</label><mixed-citation>Reinikainen, S. P. and Höskuldsson, A.: COVPROC method: strategy in
modeling dynamic systems, J. Chemometr., 17, 130–139,
<ext-link xlink:href="http://dx.doi.org/10.1002/cem.770" ext-link-type="DOI">10.1002/cem.770</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bibx87"><label>Ripley and Thompson(1987)</label><mixed-citation>Ripley, B. D. and Thompson, M.: Regression techniques for the detection of
analytical bias, Analyst, 112, 377–383, <ext-link xlink:href="http://dx.doi.org/10.1039/AN9871200377" ext-link-type="DOI">10.1039/AN9871200377</ext-link>, 1987.</mixed-citation></ref>
      <ref id="bib1.bibx88"><label>Rosipal and Krämer(2006)</label><mixed-citation>Rosipal, R. and Krämer, N.: Overview and Recent Advances in Partial Least
Squares, in: Subspace, Latent Structure and Feature Selection, edited by:
Saunders, C., Grobelnik, M., Gunn, S., and Shawe-Taylor, J.,
Lect. Notes Comput. Sc., 3940, 34–51, Springer Berlin
Heidelberg, <ext-link xlink:href="http://dx.doi.org/10.1007/11752790_2" ext-link-type="DOI">10.1007/11752790_2</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx89"><label>Russell et al.(2009)</label><mixed-citation>Russell, L. M., Bahadur, R., Hawkins, L. N., Allan, J., Baumgardner, D., Quinn,
P. K., and Bates, T. S.: Organic aerosol characterization by complementary
measurements of chemical bonds and molecular fragments, Atmos.
Environ., 43, 6100–6105, <ext-link xlink:href="http://dx.doi.org/10.1016/j.atmosenv.2009.09.036" ext-link-type="DOI">10.1016/j.atmosenv.2009.09.036</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx90"><label>Russell et al.(2011)</label><mixed-citation>Russell, L. M., Bahadur, R., and Ziemann, P. J.: Identifying organic aerosol
sources by comparing functional group composition in chamber and atmospheric
particles, P. Natl. Acad. Sci. USA, 108, 3516–3521, <ext-link xlink:href="http://dx.doi.org/10.1073/pnas.1006461108" ext-link-type="DOI">10.1073/pnas.1006461108</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx91"><label>Ruthenburg et al.(2014)</label><mixed-citation>Ruthenburg, T. C., Perlin, P. C., Liu, V., McDade, C. E., and Dillner, A. M.:
Determination of organic matter and organic matter to organic carbon ratios
by infrared spectroscopy with application to selected sites in the IMPROVE
network, Atmos. Environ., 86, 47–57,
<ext-link xlink:href="http://dx.doi.org/10.1016/j.atmosenv.2013.12.034" ext-link-type="DOI">10.1016/j.atmosenv.2013.12.034</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx92"><label>Sax et al.(2005)</label><mixed-citation>Sax, M., Zenobi, R., Baltensperger, U., and Kalberer, M.: Time resolved
infrared spectroscopic analysis of aerosol formed by photo-oxidation of
1,3,5-trimethylbenzene and alpha-pinene, Aerosol Sci. Tech., 39,
822–830, <ext-link xlink:href="http://dx.doi.org/10.1080/02786820500257859" ext-link-type="DOI">10.1080/02786820500257859</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx93"><label>Seinfeld and Pandis(2006)</label><mixed-citation>
Seinfeld, J. H. and Pandis, S. N.: Atmospheric Chemistry and Physics: From Air
Pollution to Climate Change, John Wiley &amp; Sons, New York, 2nd Edn.,
2006.</mixed-citation></ref>
      <ref id="bib1.bibx94"><label>Shen and Huang(2008)</label><mixed-citation>Shen, H. and Huang, J. Z.: Sparse principal component analysis via regularized
low rank matrix approximation, J. Multivariate Anal., 99,
1015–1034, <ext-link xlink:href="http://dx.doi.org/10.1016/j.jmva.2007.06.007" ext-link-type="DOI">10.1016/j.jmva.2007.06.007</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx95"><label>Shurvell(2006)</label><mixed-citation>Shurvell, H.: Spectra–Structure Correlations in the Mid- and Far-Infrared,
John Wiley &amp; Sons Ltd., <ext-link xlink:href="http://dx.doi.org/10.1002/0470027320.s4101" ext-link-type="DOI">10.1002/0470027320.s4101</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx96"><label>Si and Samulski(2008)</label><mixed-citation>Si, Y. and Samulski, E. T.: Synthesis of Water Soluble Graphene, Nano Lett.,
8, 1679–1682, <ext-link xlink:href="http://dx.doi.org/10.1021/nl080604h" ext-link-type="DOI">10.1021/nl080604h</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx97"><label>Spiegelman et al.(1998)</label><mixed-citation>Spiegelman, C. H., McShane, M. J., Goetz, M. J., Motamedi, M., Yue, Q. L., and
Cote, G. L.: Theoretical justification of wavelength selection in PLS
calibration: development of a new algorithm, Anal. Chem., 70, 35–44,
<ext-link xlink:href="http://dx.doi.org/10.1021/ac9705733" ext-link-type="DOI">10.1021/ac9705733</ext-link>, 1998.</mixed-citation></ref>
      <ref id="bib1.bibx98"><label>Stankovich et al.(2006)</label><mixed-citation>Stankovich, S., Piner, R. D., Nguyen, S. T., and Ruoff, R. S.: Synthesis and
exfoliation of isocyanate-treated graphene oxide nanoplatelets, Carbon, 44,
3342–3347, <ext-link xlink:href="http://dx.doi.org/10.1016/j.carbon.2006.06.004" ext-link-type="DOI">10.1016/j.carbon.2006.06.004</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx99"><label>Surratt et al.(2007)</label><mixed-citation>Surratt, J. D., Kroll, J. H., Kleindienst, T. E., Edney, E. O., Claeys, M.,
Sorooshian, A., Ng, N. L., Offenberg, J. H., Lewandowski, M., Jaoui, M.,
Flagan, R. C., and Seinfeld, J. H.: Evidence for organosulfates in secondary
organic aerosol, Environ. Sci. Tech., 41, 517–527,
<ext-link xlink:href="http://dx.doi.org/10.1021/es062081q" ext-link-type="DOI">10.1021/es062081q</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx100"><label>Szabó et al.(2006)</label><mixed-citation>Szabó, T., Berkesi, O., Forgó, P., Josepovits, K., Sanakis, Y., Petridis, D.,
and Dékány, I.: Evolution of Surface Functional Groups in a Series of
Progressively Oxidized Graphite Oxides, Chem. Mater., 18,
2740–2749, <ext-link xlink:href="http://dx.doi.org/10.1021/cm060258+" ext-link-type="DOI">10.1021/cm060258+</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx101"><label>Takahama and Dillner(2015)</label><mixed-citation>Takahama, S. and Dillner, A. M.: Model selection for partial least squares
calibration and implications for analysis of atmospheric organic aerosol
samples with mid-infrared spectroscopy, J. Chemometr., 29,
659–668, <ext-link xlink:href="http://dx.doi.org/10.1002/cem.2761" ext-link-type="DOI">10.1002/cem.2761</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx102"><label>Takahama et al.(2013)</label><mixed-citation>Takahama, S., Johnson, A., and Russell, L. M.: Quantification of Carboxylic and
Carbonyl Functional Groups in Organic Aerosol Infrared Absorbance Spectra,
Aerosol Sci. Tech., 47, 310–325,
<ext-link xlink:href="http://dx.doi.org/10.1080/02786826.2012.752065" ext-link-type="DOI">10.1080/02786826.2012.752065</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx103"><label>ter Braak and de Jong(1998)</label><mixed-citation>ter Braak, C. J. F. and de Jong, S.: The objective function of partial least
squares regression, J. Chemometr., 12, 41–54,
<ext-link xlink:href="http://dx.doi.org/10.1002/(SICI)1099-128X(199801/02)12:1&lt;41::AID-CEM500&gt;3.0.CO;2-F" ext-link-type="DOI">10.1002/(SICI)1099-128X(199801/02)12:1&lt;41::AID-CEM500&gt;3.0.CO;2-F</ext-link>, 1998.</mixed-citation></ref>
      <ref id="bib1.bibx104"><label>Tibshirani(1996)</label><mixed-citation>
Tibshirani, R.: Regression shrinkage and selection via the Lasso,
J. Roy. Stat. Soc. B Met., 58, 267–288, 1996.</mixed-citation></ref>
      <ref id="bib1.bibx105"><label>Tikhonov and Arsenin(1977)</label><mixed-citation>
Tikhonov, A. N. and Arsenin, V. I.: Solutions of ill-posed problems, Halsted
Press, New York, 1977.</mixed-citation></ref>
      <ref id="bib1.bibx106"><label>Tuinstra and Koenig(1970)</label><mixed-citation>Tuinstra, F. and Koenig, J. L.: Raman Spectrum of Graphite, J.
Chem. Phys., 53, 1126–1130, <ext-link xlink:href="http://dx.doi.org/10.1063/1.1674108" ext-link-type="DOI">10.1063/1.1674108</ext-link>, 1970.</mixed-citation></ref>
      <ref id="bib1.bibx107"><label>Weakley et al.(2014)</label><mixed-citation>Weakley, A., Miller, A., Griffiths, P., and Bayman, S.: Quantifying silica in
filter-deposited mine dusts using infrared spectra and partial least squares
regression, Anal. Bioanal. Chem., 406, 4715–4724,
<ext-link xlink:href="http://dx.doi.org/10.1007/s00216-014-7856-y" ext-link-type="DOI">10.1007/s00216-014-7856-y</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx108"><label>Weisberg(2013)</label><mixed-citation>
Weisberg, S.: Applied Linear Regression, Wiley Series in Probability and
Statistics, Wiley, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx109"><label>Wittig et al.(2004)</label><mixed-citation>Wittig, A. E., Anderson, N., Khlystov, A. Y., Pandis, S. N., Davidson, C., and
Robinson, A. L.: Pittsburgh air quality study overview, Atmos.
Environ., 38, 3107–3125, <ext-link xlink:href="http://dx.doi.org/10.1016/j.atmosenv.2004.03.003" ext-link-type="DOI">10.1016/j.atmosenv.2004.03.003</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bibx110"><label>Wold(1966)</label><mixed-citation>
Wold, H.: Estimation of Principal Components and Related Models by Iterative
Least squares, in: Multivariate Analysis, Academic Press, 391–420, 1966.</mixed-citation></ref>
      <ref id="bib1.bibx111"><label>Wold(1975)</label><mixed-citation>
Wold, H.: Soft modeling by latent variables: the nonlinear iterative partial
least squares approach, Perspectives in probability and statistics, Papers in
honour of M. S. Bartlett, 520–540, 1975.</mixed-citation></ref>
      <ref id="bib1.bibx112"><label>Wold(1993)</label><mixed-citation>Wold, S.: Discussion: PLS in Chemical Practice, Technometrics, 35, 136–139,
<ext-link xlink:href="http://dx.doi.org/10.2307/1269657" ext-link-type="DOI">10.2307/1269657</ext-link>, 1993.</mixed-citation></ref>
      <ref id="bib1.bibx113"><label>Wold et al.(1983)</label><mixed-citation>
Wold, S., Martens, H., and Wold, H.: The Multivariate Calibration-problem In
Chemistry Solved By the PLS Method, Lect. Notes Math., 973,
286–293, 1983.</mixed-citation></ref>
      <ref id="bib1.bibx114"><label>Wold et al.(1984)</label><mixed-citation>Wold, S., Ruhe, A., Wold, H., and Dunn, III, W. J.: The Collinearity Problem
in Linear Regression. The Partial Least Squares (PLS) Approach to
Generalized Inverses, SIAM J. Sci. Stat. Comp.,
5, 735–743, <ext-link xlink:href="http://dx.doi.org/10.1137/0905052" ext-link-type="DOI">10.1137/0905052</ext-link>, 1984.</mixed-citation></ref>
      <ref id="bib1.bibx115"><label>You et al.(2014)</label><mixed-citation>You, Y., Kanawade, V. P., de Gouw, J. A., Guenther, A. B., Madronich, S.,
Sierra-Hernández, M. R., Lawler, M., Smith, J. N., Takahama, S.,
Ruggeri, G., Koss, A., Olson, K., Baumann, K., Weber, R. J., Nenes, A., Guo, H.,
Edgerton, E. S., Porcelli, L., Brune, W. H., Goldstein, A. H., and Lee, S.-H.:
Atmospheric amines and ammonia measured with a chemical ionization mass spectrometer
(CIMS), Atmos. Chem. Phys., 14, 12181–12194, <ext-link xlink:href="http://dx.doi.org/10.5194/acp-14-12181-2014" ext-link-type="DOI">10.5194/acp-14-12181-2014</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx116"><label>Zou and Hastie(2005)</label><mixed-citation>Zou, H. and Hastie, T.: Regularization and variable selection via the elastic
net, J. Roy. Stat. Soc. B Met., 67, 301–320, <ext-link xlink:href="http://dx.doi.org/10.1111/j.1467-9868.2005.00503.x" ext-link-type="DOI">10.1111/j.1467-9868.2005.00503.x</ext-link>, 2005.
</mixed-citation></ref><?xmltex \hack{\newpage}?>
      <ref id="bib1.bibx117"><label>Zou et al.(2006)</label><mixed-citation>Zou, H., Hastie, T., and Tibshirani, R.: Sparse principal component analysis,
J. Comput. Graph. Stat., 15, 265–286,
<ext-link xlink:href="http://dx.doi.org/10.1198/106186006X113430" ext-link-type="DOI">10.1198/106186006X113430</ext-link>, 2006.</mixed-citation></ref>

  </ref-list><app-group content-type="float"><app><title/>

    </app></app-group></back>
    <!--<article-title-html> Analysis of functional groups in atmospheric aerosols by infrared
spectroscopy: sparse methods for statistical selection of relevant absorption bands</article-title-html>
<abstract-html><p class="p">Various vibrational modes present in molecular mixtures of laboratory and
atmospheric aerosols give rise to complex Fourier transform infrared (FT-IR)
absorption spectra. Such spectra can be chemically informative, but they often
require sophisticated algorithms for quantitative characterization of aerosol
composition. Naïve statistical calibration models developed for
quantification employ the full suite of wavenumbers available from a set of
spectra, leading to loss of mechanistic interpretation between chemical
composition and the resulting changes in absorption patterns that underpin
their predictive capability. Using sparse representations of the same set of
spectra, alternative calibration models can be built in which only a select
group of absorption bands are used to make quantitative prediction of various
aerosol properties. Such models are desirable as they allow us to relate predicted properties to their underlying molecular
structure. In this work, we present an evaluation of four algorithms for
achieving sparsity in FT-IR spectroscopy calibration models. Sparse
calibration models exclude unnecessary wavenumbers from infrared spectra
during the model building process, permitting identification and evaluation
of the most relevant vibrational modes of molecules in complex aerosol
mixtures required to make quantitative predictions of various measures of
aerosol composition. We study two types of models: one which predicts alcohol
COH, carboxylic COH, alkane CH, and carbonyl CO functional group (FG)
abundances in ambient samples based on laboratory calibration standards and
another which predicts thermal optical reflectance (TOR) organic carbon (OC)
and elemental carbon (EC) mass in new ambient samples by direct calibration
of infrared spectra to a set of ambient samples reserved for calibration. We
describe the development and selection of each calibration model and
evaluate the effect of sparsity on prediction performance. Finally, we
ascribe interpretation to absorption bands used in quantitative prediction of
FGs and TOR OC and EC concentrations.</p></abstract-html>
<ref-html id="bib1.bib1"><label>Allen et al.(1994)</label><mixed-citation>
Allen, D. T., Palen, E. J., Haimov, M. I., Hering, S. V., and Young, J. R.:
Fourier-transform Infrared-spectroscopy of Aerosol Collected In A
Low-pressure Impactor (LPI/FTIR) – Method Development and Field Calibration,
Aerosol Sci. Tech., 21, 325–342, <a href="http://dx.doi.org/10.1080/02786829408959719" target="_blank">doi:10.1080/02786829408959719</a>, 1994.
</mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>Andries and Martin(2013)</label><mixed-citation>
Andries, E. and Martin, S.: Sparse Methods in Spectroscopy: An Introduction,
Overview, and Perspective, Appl. Spectrosc., 67, 579–593,
<a href="http://dx.doi.org/10.1366/13-07021" target="_blank">doi:10.1366/13-07021</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>Arlot and Celisse(2010)</label><mixed-citation>
Arlot, S. and Celisse, A.: A survey of cross-validation procedures for model
selection, Statistics Surveys, 4, 40–79, <a href="http://dx.doi.org/10.1214/09-SS054" target="_blank">doi:10.1214/09-SS054</a>, 2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>Barsanti and Pankow(2006)</label><mixed-citation>
Barsanti, K. C. and Pankow, J. F.: Thermodynamics of the formation of
atmospheric organic particulate matter by accretion reactions – Part 3:
Carboxylic and dicarboxylic acids, Atmos. Environ., 40, 6676–6686,
<a href="http://dx.doi.org/10.1016/j.atmosenv.2006.03.013" target="_blank">doi:10.1016/j.atmosenv.2006.03.013</a>, 2006.
</mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>Bishop(2009)</label><mixed-citation>
Bishop, C. M.: Pattern recognition and machine learning, Springer, New York,
NY, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>Bond et al.(2004)</label><mixed-citation>
Bond, T. C., Streets, D. G., Yarber, K. R., Nelson, S. M., Woo, J.-H., and
Klimont, Z.: A technology-based global inventory of black and organic carbon
emissions from combustion, J. Geophys. Res., 109, D14203, <a href="http://dx.doi.org/10.1029/2003JD003697" target="_blank">doi:10.1029/2003JD003697</a>, 2004.
</mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>Bornemann et al.(2008)</label><mixed-citation>
Bornemann, L., Welp, G., Brodowski, S., Rodionov, A., and Amelung, W.: Rapid
assessment of black carbon in soil organic matter using mid-infrared
spectroscopy, Org. Geochem., 39, 1537–1544,
<a href="http://dx.doi.org/10.1016/j.orggeochem.2008.07.012" target="_blank">doi:10.1016/j.orggeochem.2008.07.012</a>, 2008.
</mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>Burnham et al.(1996)</label><mixed-citation>
Burnham, A. J., Viveros, R., and MacGregor, J. F.: Frameworks for latent
variable multivariate regression, J. Chemometr., 10, 31–45,
<a href="http://dx.doi.org/10.1002/(SICI)1099-128X(199601)10:1&lt;31::AID-CEM398&gt;3.0.CO;2-1" target="_blank">doi:10.1002/(SICI)1099-128X(199601)10:1&lt;31::AID-CEM398&gt;3.0.CO;2-1</a>, 1996.
</mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>Cai et al.(2008)</label><mixed-citation>
Cai, W., Li, Y., and Shao, X.: A variable selection method based on
uninformative variable elimination for multivariate calibration of
near-infrared spectra, Chemometr. Intell. Lab., 90,
188–194, <a href="http://dx.doi.org/10.1016/j.chemolab.2007.10.001" target="_blank">doi:10.1016/j.chemolab.2007.10.001</a>, 2008.
</mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>Cain et al.(2010)</label><mixed-citation>
Cain, J. P., Gassman, P. L., Wang, H., and Laskin, A.: Micro-FTIR study of soot
chemical composition-evidence of aliphatic hydrocarbons on nascent soot
surfaces, Phys. Chem. Chem. Phys., 12, 5206–5218,
<a href="http://dx.doi.org/10.1039/b924344e" target="_blank">doi:10.1039/b924344e</a>, 2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>Cao et al.(2014)</label><mixed-citation>
Cao, K.-A. L., Rohart, F., Gonzalez, I., and Dejean, S.: mixOmics: Omics Data
Integration Project, <a href="http://CRAN.R-project.org/package=mixOmics" target="_blank">http://CRAN.R-project.org/package=mixOmics</a>, R package version
5.0-3, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>Caroline et al.(2009)</label><mixed-citation>
Caroline, M. L., Sankar, R., Indirani, R., and Vasudevan, S.: Growth, optical,
thermal and dielectric studies of an amino acid organic nonlinear optical
material: l-Alanine, Mater. Chem. Phys., 114, 490–494,
<a href="http://dx.doi.org/10.1016/j.matchemphys.2008.09.070" target="_blank">doi:10.1016/j.matchemphys.2008.09.070</a>, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>Centner et al.(1996)</label><mixed-citation>
Centner, V., Massart, D.-L., de Noord, O. E., de Jong, S., Vandeginste,
B. M., and Sterna, C.: Elimination of Uninformative Variables for
Multivariate Calibration, Anal. Chem., 68, 3851–3858,
<a href="http://dx.doi.org/10.1021/ac960321m" target="_blank">doi:10.1021/ac960321m</a>, 1996.
</mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>Chong and Jun(2005)</label><mixed-citation>
Chong, I. G. and Jun, C. H.: Performance of some variable selection methods
when multicollinearity is present, Chemometr. Intell. Lab.,
78, 103–112, <a href="http://dx.doi.org/10.1016/j.chemolab.2004.12.011" target="_blank">doi:10.1016/j.chemolab.2004.12.011</a>, 2005.
</mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>Chow et al.(2004)</label><mixed-citation>
Chow, J. C., Watson, J. G., Chen, L.-W. A., Arnott, W. P., Moosmüller, H.,
and Fung, K.: Equivalence of Elemental Carbon by Thermal/Optical Reflectance
and Transmittance with Different Temperature Protocols, Environ. Sci. Tech.,
38, 4414–4422, <a href="http://dx.doi.org/10.1021/es034936u" target="_blank">doi:10.1021/es034936u</a>, 2004.
</mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>Chow et al.(2007)</label><mixed-citation>
Chow, J. C., Watson, J. G., Chen, L.-W. A., Chang, M. O., Robinson, N. F.,
Trimble, D., and Kohl, S.: The IMPROVE_A Temperature Protocol for
Thermal/Optical Carbon Analysis: Maintaining Consistency with a Long-Term
Database, J. Air Waste Manage., 57, 1014–1023,
<a href="http://dx.doi.org/10.3155/1047-3289.57.9.1014" target="_blank">doi:10.3155/1047-3289.57.9.1014</a>, 2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>Chun and Keles(2010)</label><mixed-citation>
Chun, H. and Keles, S.: Sparse partial least squares regression for
simultaneous dimension reduction and variable selection,
J. Roy. Stat. Soc. B Met., 72, 3–25,
<a href="http://dx.doi.org/10.1111/j.1467-9868.2009.00723.x" target="_blank">doi:10.1111/j.1467-9868.2009.00723.x</a>, 2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>Chung et al.(2013)</label><mixed-citation>
Chung, D., Chun, H., and Keles, S.: spls: Sparse Partial Least Squares (SPLS)
Regression and Classification, <a href="http://CRAN.R-project.org/package=spls" target="_blank">http://CRAN.R-project.org/package=spls</a>, R package version
2.2-1, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>Corrigan et al.(2013)</label><mixed-citation>
Corrigan, A. L., Russell, L. M., Takahama, S., Äijälä, M., Ehn, M.,
Junninen, H., Rinne, J., Petäjä, T., Kulmala, M., Vogel, A. L.,
Hoffmann, T., Ebben, C. J., Geiger, F. M., Chhabra, P., Seinfeld, J. H.,
Worsnop, D. R., Song, W., Auld, J., and Williams, J.: Biogenic and biomass burning
organic aerosol in a boreal forest at Hyytiälä, Finland, during HUMPPA-COPEC 2010,
Atmos. Chem. Phys., 13, 12233–12256, <a href="http://dx.doi.org/10.5194/acp-13-12233-2013" target="_blank">doi:10.5194/acp-13-12233-2013</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>Coury and Dillner(2008)</label><mixed-citation>
Coury, C. and Dillner, A. M.: A method to quantify organic functional groups
and inorganic compounds in ambient aerosols using attenuated total
reflectance FTIR spectroscopy and multivariate chemometric techniques,
Atmos. Environ., 42, 5923–5932, <a href="http://dx.doi.org/10.1016/j.atmosenv.2008.03.026" target="_blank">doi:10.1016/j.atmosenv.2008.03.026</a>, 2008.
</mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>Cunningham et al.(1974)</label><mixed-citation>
Cunningham, P. T., Johnson, S. A., and Yang, R. T.: Variations in chemistry of
airborne particulate material with particle size and time, Environ.
Sci. Tech., 8, 131–135, <a href="http://dx.doi.org/10.1021/es60087a002" target="_blank">doi:10.1021/es60087a002</a>, 1974.
</mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>Cziczo et al.(1997)</label><mixed-citation>
Cziczo, D. J., Nowak, J. B., Hu, J. H., and Abbatt, J. P. D.: Infrared
spectroscopy of model tropospheric aerosols as a function of relative
humidity: Observation of deliquescence and crystallization, J.
Geophys. Res.-Atmos., 102, 18843–18850, <a href="http://dx.doi.org/10.1029/97JD01361" target="_blank">doi:10.1029/97JD01361</a>, 1997.
</mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>Day et al.(2010)</label><mixed-citation>
Day, D. A., Liu, S., Russell, L. M., and Ziemann, P. J.: Organonitrate group
concentrations in submicron particles with high nitrate and organic fractions
in coastal southern California, Atmos. Environ., 44, 1970–1979,
<a href="http://dx.doi.org/10.1016/j.atmosenv.2010.02.045" target="_blank">doi:10.1016/j.atmosenv.2010.02.045</a>, 2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>de la Rosa Arranz et al.(2013)</label><mixed-citation>
de la Rosa Arranz, J. M., González-Vila, F. J., González-Pérez, J. A.,
Almendros Martín, G., Hernández, Z., López Martín, M., and Knicker, H.:
How useful is the mid-infrared spectroscopy in the assessment of black carbon
in soils, Flamma, 4, 147–151, <a href="http://digital.csic.es/handle/10261/82100" target="_blank">http://digital.csic.es/handle/10261/82100</a>,
2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>Dillner and Takahama(2015a)</label><mixed-citation>
Dillner, A. M. and Takahama, S.: Predicting ambient aerosol thermal-optical
reflectance (TOR) measurements from infrared spectra: organic carbon,
Atmos. Meas. Tech., 8, 1097–1109, <a href="http://dx.doi.org/10.5194/amt-8-1097-2015" target="_blank">doi:10.5194/amt-8-1097-2015</a>, 2015a.
</mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>Dillner and Takahama(2015b)</label><mixed-citation>
Dillner, A. M. and Takahama, S.: Predicting ambient aerosol thermal-optical
reflectance measurements from infrared spectra: elemental carbon, Atmos. Meas. Tech.,
8, 4013–4023, <a href="http://dx.doi.org/10.5194/amt-8-4013-2015" target="_blank">doi:10.5194/amt-8-4013-2015</a>, 2015b.
</mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>Dockery et al.(1993)</label><mixed-citation>
Dockery, D. W., Pope, C. A., Xu, X. P., Spengler, J. D., Ware, J. H., Fay,
M. E., Ferris, B. G., and Speizer, F. E.: An association between
air pollution and mortality in six U.S. cities, New Engl. J. Med.,
329, 1753–1759, <a href="http://dx.doi.org/10.1056/NEJM199312093292401" target="_blank">doi:10.1056/NEJM199312093292401</a>, 1993.
</mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>Faber and Kowalski(1997)</label><mixed-citation>
Faber, K. and Kowalski, B. R.: Propagation of measurement errors for the
validation of predictions obtained by principal component regression and
partial least squares, J. Chemometr., 11, 181–238,
<a href="http://dx.doi.org/10.1002/(SICI)1099-128X(199705)11:3&lt;181::AID-CEM459&gt;3.0.CO;2-7" target="_blank">doi:10.1002/(SICI)1099-128X(199705)11:3&lt;181::AID-CEM459&gt;3.0.CO;2-7</a>, 1997.
</mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>Filzmoser et al.(2012)</label><mixed-citation>
Filzmoser, P., Gschwandtner, M., and Todorov, V.: Review of sparse methods in
regression and classification with application to chemometrics, J.
Chemometr., 26, 42–51, <a href="http://dx.doi.org/10.1002/cem.1418" target="_blank">doi:10.1002/cem.1418</a>, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>Flagan and Seinfeld(1988)</label><mixed-citation>
Flagan, R. C. and Seinfeld, J. H.: Fundamentals of air pollution engineering,
Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1988.
</mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>Francis Bach and Obozinski(2011)</label><mixed-citation>
Bach, F., Jenatton, R., Mairal, J., and Obozinski, G.: Optimization with
Sparsity-Inducing Penalties, Foundations and Trends<span style="position:relative; bottom:0.5em; " class="text">®</span> in Machine Learning,
4, 1–106, <a href="http://dx.doi.org/10.1561/2200000015" target="_blank">doi:10.1561/2200000015</a>, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>Friedel and Carlson(1972)</label><mixed-citation>
Friedel, R. and Carlson, G.: Difficult carbonaceous materials and their
infrared and Raman spectra. Reassignments for coal spectra, Fuel, 51, 194–198,
<a href="http://dx.doi.org/10.1016/0016-2361(72)90079-8" target="_blank">doi:10.1016/0016-2361(72)90079-8</a>, 1972.
</mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>Friedel and Carlson(1971)</label><mixed-citation>
Friedel, R. A. and Carlson, G. L.: Infrared spectra of ground graphite, J.
Phys. Chem., 75, 1149–1151, <a href="http://dx.doi.org/10.1021/j100678a021" target="_blank">doi:10.1021/j100678a021</a>, 1971.
</mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>Friedman et al.(2010)</label><mixed-citation>
Friedman, J. H., Hastie, T., and Tibshirani, R.: Regularization Paths for
Generalized Linear Models via Coordinate Descent, J. Stat.
Softw., 33, 1–22, <a href="http://dx.doi.org/10.18637/jss.v033.i01" target="_blank">doi:10.18637/jss.v033.i01</a>, 2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>Fu et al.(2013)</label><mixed-citation>
Fu, D., Leng, C., Kelley, J., Zeng, G., Zhang, Y., and Liu, Y.: ATR-IR Study of
Ozone Initiated Heterogeneous Oxidation of Squalene in an Indoor Environment,
Environ. Sci. Tech., 47, 10611–10618, <a href="http://dx.doi.org/10.1021/es4019018" target="_blank">doi:10.1021/es4019018</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>Fu et al.(2011)</label><mixed-citation>
Fu, G.-H., Xu, Q.-S., Li, H.-D., Cao, D.-S., and Liang, Y.-Z.: Elastic Net
Grouping Variable Selection Combined with Partial Least Squares Regression
(EN-PLSR) for the Analysis of Strongly Multi-collinear Spectroscopic
Data, Appl. Spectrosc., 65, 402–408, <a href="http://dx.doi.org/10.1366/10-06069" target="_blank">doi:10.1366/10-06069</a>, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>Geladi and Kowalski(1986)</label><mixed-citation>
Geladi, P. and Kowalski, B. R.: Partial least-squares regression: a tutorial,
Anal. Chim. Acta, 185, 1–17, <a href="http://dx.doi.org/10.1016/0003-2670(86)80028-9" target="_blank">doi:10.1016/0003-2670(86)80028-9</a>, 1986.
</mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>Gilardoni et al.(2009)</label><mixed-citation>
Gilardoni, S., Liu, S., Takahama, S., Russell, L. M., Allan, J. D., Steinbrecher, R.,
Jimenez, J. L., DeCarlo, P. F., Dunlea, E. J., and Baumgardner, D.: Characterization
of organic ambient aerosol during MIRAGE 2006 on three platforms,
Atmos. Chem. Phys., 9, 5417–5432, <a href="http://dx.doi.org/10.5194/acp-9-5417-2009" target="_blank">doi:10.5194/acp-9-5417-2009</a>, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>Gowen et al.(2011)</label><mixed-citation>
Gowen, A. A., Downey, G., Esquerre, C., and O'Donnell, C. P.: Preventing
over-fitting in PLS calibration models of near-infrared (NIR)
spectroscopy data using regression coefficients, J. Chemometr., 25,
375–381, <a href="http://dx.doi.org/10.1002/cem.1349" target="_blank">doi:10.1002/cem.1349</a>, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>Griffiths and Haseth(2007)</label><mixed-citation>
Griffiths, P. R. and de Haseth, J. A.: Fourier Transform Infrared Spectrometry,
John Wiley &amp; Sons, 2nd Edn., 2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>Guzman-Morales et al.(2014)</label><mixed-citation>
Guzman-Morales, J., Frossard, A., Corrigan, A., Russell, L., Liu, S., Takahama,
S., Taylor, J., Allan, J., Coe, H., Zhao, Y., and Goldstein, A.: Estimated
contributions of primary and secondary organic aerosol from fossil fuel
combustion during the CalNex and Cal-Mex campaigns, Atmos. Environ.,
88, 330–340, <a href="http://dx.doi.org/10.1016/j.atmosenv.2013.08.047" target="_blank">doi:10.1016/j.atmosenv.2013.08.047</a>, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>Haaland and Thomas(1988)</label><mixed-citation>
Haaland, D. M. and Thomas, E. V.: Partial least-squares methods for spectral
analyses – 1. Relation to other quantitative calibration methods and the
extraction of qualitative information, Anal. Chem., 60, 1193–1202,
<a href="http://dx.doi.org/10.1021/ac00162a020" target="_blank">doi:10.1021/ac00162a020</a>, 1988.
</mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>Hamilton et al.(2004)</label><mixed-citation>
Hamilton, J. F., Webb, P. J., Lewis, A. C., Hopkins, J. R., Smith, S., and Davy, P.:
Partially oxidised organic components in urban aerosol using GCXGC-TOF/MS,
Atmos. Chem. Phys., 4, 1279–1290, <a href="http://dx.doi.org/10.5194/acp-4-1279-2004" target="_blank">doi:10.5194/acp-4-1279-2004</a>, 2004.
</mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>Hand et al.(2012)</label><mixed-citation>
Hand, J. L., Schichtel, B. A., Pitchford, M., Malm, W. C., and Frank, N. H.:
Seasonal composition of remote and urban fine particulate matter in the
United States, J. Geophys. Res.-Atmos., 117, D05209,
<a href="http://dx.doi.org/10.1029/2011JD017122" target="_blank">doi:10.1029/2011JD017122</a>, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>Hastie and Qian(2014)</label><mixed-citation>
Hastie, T. and Qian, J.: Glmnet Vignette, <a href="http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html" target="_blank">http://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html</a> (last access: 6 January 2016), 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>Hastie et al.(2009)</label><mixed-citation>
Hastie, T., Tibshirani, R., and Friedman, J.: The elements of statistical
learning: data mining, inference, and prediction, Springer Verlag, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>Hawkins and Russell(2010)</label><mixed-citation>
Hawkins, L. N. and Russell, L. M.: Oxidation of ketone groups in transported
biomass burning aerosol from the 2008 Northern California Lightning Series
fires, Atmos. Environ., 44, 4142–4154,
<a href="http://dx.doi.org/10.1016/j.atmosenv.2010.07.036" target="_blank">doi:10.1016/j.atmosenv.2010.07.036</a>, 2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>Héberger(2010)</label><mixed-citation>
Héberger, K.: Sum of ranking differences compares methods or models fairly,
TRAC-Trend. Anal. Chem., 29, 101–109,
<a href="http://dx.doi.org/10.1016/j.trac.2009.09.009" target="_blank">doi:10.1016/j.trac.2009.09.009</a>, 2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>Héberger and Kollár-Hunek(2011)</label><mixed-citation>
Héberger, K. and Kollár-Hunek, K.: Sum of ranking differences for method
discrimination and its validation: comparison of ranks with random numbers,
J. Chemometr., 25, 151–158, <a href="http://dx.doi.org/10.1002/cem.1320" target="_blank">doi:10.1002/cem.1320</a>,
2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib50"><label>Hoerl and Kennard(1970)</label><mixed-citation>
Hoerl, A. E. and Kennard, R. W.: Ridge Regression: Biased Estimation for
Nonorthogonal Problems, Technometrics, 12, 55–67,
<a href="http://dx.doi.org/10.1080/00401706.1970.10488634" target="_blank">doi:10.1080/00401706.1970.10488634</a>, 1970.
</mixed-citation></ref-html>
<ref-html id="bib1.bib51"><label>Höskuldsson(1988)</label><mixed-citation>
Höskuldsson, A.: PLS regression methods, J. Chemometr., 2,
211–228, <a href="http://dx.doi.org/10.1002/cem.1180020306" target="_blank">doi:10.1002/cem.1180020306</a>, 1988.
</mixed-citation></ref-html>
<ref-html id="bib1.bib52"><label>Höskuldsson(2001)</label><mixed-citation>
Höskuldsson, A.: Variable and subset selection in PLS regression,
Chemometr. Intell. Lab., 55, 23–38,
<a href="http://dx.doi.org/10.1016/S0169-7439(00)00113-1" target="_blank">doi:10.1016/S0169-7439(00)00113-1</a>, 2001.
</mixed-citation></ref-html>
<ref-html id="bib1.bib53"><label>Hudson et al.(2007)</label><mixed-citation>
Hudson, P. K., Schwarz, J., Baltrusaitis, J., Gibson, E. R., and Grassian,
V. H.: A spectroscopic study of atmospherically relevant concentrated aqueous
nitrate solutions, J. Phys. Chem. A, 111, 544–548,
<a href="http://dx.doi.org/10.1021/jp0664216" target="_blank">doi:10.1021/jp0664216</a>, 2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib54"><label>Hudson et al.(2008)</label><mixed-citation>
Hudson, P. K., Young, M. A., Kleiber, P. D., and Grassian, V. H.: Coupled
infrared extinction spectra and size distribution measurements for several
non-clay components of mineral dust aerosol (quartz, calcite, and dolomite),
Atmos. Environ., 42, 5991–5999, <a href="http://dx.doi.org/10.1016/j.atmosenv.2008.03.046" target="_blank">doi:10.1016/j.atmosenv.2008.03.046</a>, 2008.
</mixed-citation></ref-html>
<ref-html id="bib1.bib55"><label>Hung et al.(2002)</label><mixed-citation>
Hung, H. M., Malinowski, A., and Martin, S. T.: Ice nucleation kinetics of
aerosols containing aqueous and solid ammonium sulfate particles, J.
Phys. Chem. A, 106, 293–306, <a href="http://dx.doi.org/10.1021/jp012064h" target="_blank">doi:10.1021/jp012064h</a>, 2002.
</mixed-citation></ref-html>
<ref-html id="bib1.bib56"><label>IPCC(2013)</label><mixed-citation>
IPCC: Climate Change 2013: The Physical Science Basis, Contribution of Working
Group I to the Fifth Assessment Report of the Intergovernmental Panel on
Climate Change, Tech. rep., <a href="http://dx.doi.org/10.1017/CBO9781107415324" target="_blank">doi:10.1017/CBO9781107415324</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib57"><label>Kalivas(2012)</label><mixed-citation>
Kalivas, J. H.: Overview of two-norm (L2) and one-norm (L1) Tikhonov
regularization variants for full wavelength or sparse spectral multivariate
calibration models or maintenance, J. Chemometr., 26, 218–230,
<a href="http://dx.doi.org/10.1002/cem.2429" target="_blank">doi:10.1002/cem.2429</a>, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib58"><label>Kalivas et al.(2015)</label><mixed-citation>
Kalivas, J. H., Héberger, K., and Andries, E.: Sum of ranking differences
(SRD) to ensemble multivariate calibration model merits for tuning parameter
selection and comparing calibration methods, Anal. Chim. Acta, 869,
21–33, <a href="http://dx.doi.org/10.1016/j.aca.2014.12.056" target="_blank">doi:10.1016/j.aca.2014.12.056</a>, 2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib59"><label>Karcher et al.(1985)</label><mixed-citation>
Karcher, W., Fordham, R. J., Dubois, J. J., Glaude, P. G. J. M., and Ligthart,
J. A. M.: Spectral Atlas of Polycyclic Aromatic Compounds: including Data on
Occurrence and Biological Activity, Spectral Atlas of Polycyclic Aromatic
Compounds, D. Reidel Publishing Company, Dordrecht, the Netherlands, 1985.
</mixed-citation></ref-html>
<ref-html id="bib1.bib60"><label>Kelley(2012)</label><mixed-citation>
Kelley, A. M.: Condensed-Phase Molecular Spectroscopy and Photophysics,
John Wiley &amp; Sons, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib61"><label>Kollár-Hunek and Héberger(2013)</label><mixed-citation>
Kollár-Hunek, K. and Héberger, K.: Method and model comparison by sum of
ranking differences in cases of repeated observations (ties),
Chemometr. Intell. Lab., 127, 139–146, <a href="http://dx.doi.org/10.1016/j.chemolab.2013.06.007" target="_blank">doi:10.1016/j.chemolab.2013.06.007</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib62"><label>Kroll and Seinfeld(2008)</label><mixed-citation>
Kroll, J. H. and Seinfeld, J. H.: Chemistry of secondary organic aerosol:
Formation and evolution of low-volatility organics in the atmosphere,
Atmos. Environ., 42, 3593–3624,
<a href="http://dx.doi.org/10.1016/j.atmosenv.2008.01.003" target="_blank">doi:10.1016/j.atmosenv.2008.01.003</a>, 2008.
</mixed-citation></ref-html>
<ref-html id="bib1.bib63"><label>Kvalheim and Karstang(1989)</label><mixed-citation>
Kvalheim, O. M. and Karstang, T. V.: Interpretation of latent-variable
regression models, Chemometr. Intell. Lab., 7,
39–51, <a href="http://dx.doi.org/10.1016/0169-7439(89)80110-8" target="_blank">doi:10.1016/0169-7439(89)80110-8</a>, 1989.
</mixed-citation></ref-html>
<ref-html id="bib1.bib64"><label>Kvalheim et al.(2014)</label><mixed-citation>
Kvalheim, O. M., Arneberg, R., Bleie, O., Rajalahti, T., Smilde, A. K., and
Westerhuis, J. A.: Variable importance in latent variable regression models,
J. Chemometr., 28, 615–622, <a href="http://dx.doi.org/10.1002/cem.2626" target="_blank">doi:10.1002/cem.2626</a>, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib65"><label>Lack et al.(2014)</label><mixed-citation>
Lack, D. A., Moosmueller, H., McMeeking, G. R., Chakrabarty, R. K., and
Baumgardner, D.: Characterizing elemental, equivalent black, and refractory
black carbon aerosol particles: a review of techniques, their limitations and
uncertainties, Anal. Bioanal. Chem., 406, 99–122,
<a href="http://dx.doi.org/10.1007/s00216-013-7402-3" target="_blank">doi:10.1007/s00216-013-7402-3</a>, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib66"><label>Lê Cao et al.(2008)</label><mixed-citation>
Lê Cao, K.-A., Rossouw, D., Robert-Granié, C., and Besse, P.: A Sparse
PLS for Variable Selection when Integrating Omics Data, Stat. Appl. Genet. Mo. B., 7, 35,
<a href="http://dx.doi.org/10.2202/1544-6115.1390" target="_blank">doi:10.2202/1544-6115.1390</a>, 2008.
</mixed-citation></ref-html>
<ref-html id="bib1.bib67"><label>Lee et al.(2011)</label><mixed-citation>
Lee, D., Lee, W., Lee, Y., and Pawitan, Y.: Sparse partial least-squares
regression and its applications to high-throughput data analysis,
Chemometr. Intell. Lab., 109, 1–8,
<a href="http://dx.doi.org/10.1016/j.chemolab.2011.07.002" target="_blank">doi:10.1016/j.chemolab.2011.07.002</a>, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib68"><label>Liu(2014)</label><mixed-citation>
Liu, J.: Developing a soft sensor based on sparse partial least squares with
variable selection, J. Process Contr., 24, 1046–1056,
<a href="http://dx.doi.org/10.1016/j.jprocont.2014.05.014" target="_blank">doi:10.1016/j.jprocont.2014.05.014</a>, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib69"><label>Liu et al.(2009)</label><mixed-citation>
Liu, S., Takahama, S., Russell, L. M., Gilardoni, S., and Baumgardner, D.:
Oxygenated organic functional groups and their sources in single and submicron
organic particles in MILAGRO 2006 campaign, Atmos. Chem. Phys., 9, 6849–6863, <a href="http://dx.doi.org/10.5194/acp-9-6849-2009" target="_blank">doi:10.5194/acp-9-6849-2009</a>, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib70"><label>Malm et al.(1994)</label><mixed-citation>
Malm, W. C., Sisler, J. F., Huffman, D., Eldred, R. A., and Cahill, T. A.:
Spatial and seasonal trends in particle concentration and optical extinction
in the United States, J. Geophys. Res.-Atmos., 99,
1347–1370, <a href="http://dx.doi.org/10.1029/93JD02916" target="_blank">doi:10.1029/93JD02916</a>, 1994.
</mixed-citation></ref-html>
<ref-html id="bib1.bib71"><label>Maria et al.(2003)</label><mixed-citation>
Maria, S. F., Russell, L. M., Turpin, B. J., Porcja, R. J., Campos, T. L.,
Weber, R. J., and Huebert, B. J.: Source signatures of carbon monoxide and
organic functional groups in Asian Pacific Regional Aerosol Characterization
Experiment (ACE-Asia) submicron aerosol types, J. Geophys.
Res.-Atmos., 108, 8637, <a href="http://dx.doi.org/10.1029/2003JD003703" target="_blank">doi:10.1029/2003JD003703</a>, 2003.
</mixed-citation></ref-html>
<ref-html id="bib1.bib72"><label>Martens(1991)</label><mixed-citation>
Martens, H.: Multivariate Calibration, John Wiley &amp; Sons, New York, 1991.
</mixed-citation></ref-html>
<ref-html id="bib1.bib73"><label>Mazumder et al.(2011)</label><mixed-citation>
Mazumder, R., Friedman, J. H., and Hastie, T.: SparseNet: Coordinate Descent
With Nonconvex Penalties, J. Am. Stat. Assoc.,
106, 1125–1138, <a href="http://dx.doi.org/10.1198/jasa.2011.tm09738" target="_blank">doi:10.1198/jasa.2011.tm09738</a>, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib74"><label>McClenny et al.(1985)</label><mixed-citation>
McClenny, W. A., Childers, J. W., Röhl, R., and Palmer, R. A.: FTIR
transmission spectrometry for the nondestructive determination of ammonium
and sulfate in ambient aerosols collected on teflon filters, Atmos.
Environ., 19, 1891–1898, <a href="http://dx.doi.org/10.1016/0004-6981(85)90014-9" target="_blank">doi:10.1016/0004-6981(85)90014-9</a>, 1985.
</mixed-citation></ref-html>
<ref-html id="bib1.bib75"><label>Mehmood et al.(2012)</label><mixed-citation>
Mehmood, T., Liland, K. H., Snipen, L., and Saebo, S.: A review of variable
selection methods in Partial Least Squares Regression, Chemometr. Intell. Lab., 118, 62–69,
<a href="http://dx.doi.org/10.1016/j.chemolab.2012.07.010" target="_blank">doi:10.1016/j.chemolab.2012.07.010</a>, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib76"><label>Mevik and Wehrens(2007)</label><mixed-citation>
Mevik, B. and Wehrens, R.: The pls package: Principal component and partial
least squares regression in R, J. Stat. Softw., 18, 1–24,
<a href="http://dx.doi.org/10.18637/jss.v018.i02" target="_blank">doi:10.18637/jss.v018.i02</a>, 2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib77"><label>Moussa et al.(2009)</label><mixed-citation>
Moussa, S. G., McIntire, T. M., Szöri, M., Roeselová, M., Tobias, D. J.,
Grimm, R. L., Hemminger, J. C., and Finlayson-Pitts, B. J.: Experimental and
Theoretical Characterization of Adsorbed Water on Self-Assembled Monolayers:
Understanding the Interaction of Water with Atmospherically Relevant
Surfaces, J. Phys. Chem. A, 113, 2060–2069,
<a href="http://dx.doi.org/10.1021/jp808710n" target="_blank">doi:10.1021/jp808710n</a>, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib78"><label>Murphy et al.(2007)</label><mixed-citation>
Murphy, S. M., Sorooshian, A., Kroll, J. H., Ng, N. L., Chhabra, P., Tong, C.,
Surratt, J. D., Knipping, E., Flagan, R. C., and Seinfeld, J. H.: Secondary aerosol
formation from atmospheric reactions of aliphatic amines, Atmos. Chem. Phys.,
7, 2313–2337, <a href="http://dx.doi.org/10.5194/acp-7-2313-2007" target="_blank">doi:10.5194/acp-7-2313-2007</a>, 2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib79"><label>Nadler and Coifman(2005)</label><mixed-citation>
Nadler, B. and Coifman, R. R.: The prediction error in CLS and PLS: the
importance of feature selection prior to multivariate calibration, J.
Chemometr., 19, 107–118, <a href="http://dx.doi.org/10.1002/cem.915" target="_blank">doi:10.1002/cem.915</a>, 2005.
</mixed-citation></ref-html>
<ref-html id="bib1.bib80"><label>Pavia et al.(2008)</label><mixed-citation>
Pavia, D., Lampman, G., and Kriz, G.: Introduction to Spectroscopy, Brooks/Cole
Pub Co., Belmont, CA, 2008.
</mixed-citation></ref-html>
<ref-html id="bib1.bib81"><label>Petzold et al.(2013)</label><mixed-citation>
Petzold, A., Ogren, J. A., Fiebig, M., Laj, P., Li, S.-M., Baltensperger, U.,
Holzer-Popp, T., Kinne, S., Pappalardo, G., Sugimoto, N., Wehrli, C., Wiedensohler, A.,
and Zhang, X.-Y.: Recommendations for reporting “black carbon” measurements,
Atmos. Chem. Phys., 13, 8365–8379, <a href="http://dx.doi.org/10.5194/acp-13-8365-2013" target="_blank">doi:10.5194/acp-13-8365-2013</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib82"><label>Pitts Jr. et al.(1978)</label><mixed-citation>
Pitts Jr., J. N., Grosjean, D., Cauwenberghe, K. V., Schmid, J. P., and
Fitz, D. R.: Photooxidation of aliphatic amines under simulated atmospheric
conditions: formation of nitrosamines, nitramines, amides, and photochemical
oxidant, Environ. Sci. Tech., 12, 946–953,
<a href="http://dx.doi.org/10.1021/es60144a009" target="_blank">doi:10.1021/es60144a009</a>, 1978.
</mixed-citation></ref-html>
<ref-html id="bib1.bib83"><label>R Core Team(2014)</label><mixed-citation>
R Core Team: R: A Language and Environment for Statistical Computing, R
Foundation for Statistical Computing, Vienna, Austria,
<a href="http://www.R-project.org/" target="_blank">http://www.R-project.org/</a>, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib84"><label>Reff et al.(2007)</label><mixed-citation>
Reff, A., Turpin, B. J., Offenberg, J. H., Weisel, C. P., Zhang, J., Morandi,
M., Stock, T., Colome, S., and Winer, A.: A functional group characterization
of organic PM<sub>2.5</sub> exposure: Results from the RIOPA study,
Atmos. Environ., 41, 4585–4598,
<a href="http://dx.doi.org/10.1016/j.atmosenv.2007.03.054" target="_blank">doi:10.1016/j.atmosenv.2007.03.054</a>, 2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib85"><label>Reggente et al.(2016)</label><mixed-citation>
Reggente, M., Dillner, A. M., and Takahama, S.: Predicting ambient aerosol
thermal-optical reflectance (TOR) measurements from infrared spectra: extending
the predictions to different years and different sites, Atmos. Meas. Tech.,
9, 441–454, <a href="http://dx.doi.org/10.5194/amt-9-441-2016" target="_blank">doi:10.5194/amt-9-441-2016</a>, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib86"><label>Reinikainen and Höskuldsson(2003)</label><mixed-citation>
Reinikainen, S. P. and Höskuldsson, A.: COVPROC method: strategy in
modeling dynamic systems, J. Chemometr., 17, 130–139,
<a href="http://dx.doi.org/10.1002/cem.770" target="_blank">doi:10.1002/cem.770</a>, 2003.
</mixed-citation></ref-html>
<ref-html id="bib1.bib87"><label>Ripley and Thompson(1987)</label><mixed-citation>
Ripley, B. D. and Thompson, M.: Regression techniques for the detection of
analytical bias, Analyst, 112, 377–383, <a href="http://dx.doi.org/10.1039/AN9871200377" target="_blank">doi:10.1039/AN9871200377</a>, 1987.
</mixed-citation></ref-html>
<ref-html id="bib1.bib88"><label>Rosipal and Krämer(2006)</label><mixed-citation>
Rosipal, R. and Krämer, N.: Overview and Recent Advances in Partial Least
Squares, in: Subspace, Latent Structure and Feature Selection, edited by:
Saunders, C., Grobelnik, M., Gunn, S., and Shawe-Taylor, J.,
Lect. Notes Comput. Sc., 3940, 34–51, Springer Berlin
Heidelberg, <a href="http://dx.doi.org/10.1007/11752790_2" target="_blank">doi:10.1007/11752790_2</a>, 2006.
</mixed-citation></ref-html>
<ref-html id="bib1.bib89"><label>Russell et al.(2009)</label><mixed-citation>
Russell, L. M., Bahadur, R., Hawkins, L. N., Allan, J., Baumgardner, D., Quinn,
P. K., and Bates, T. S.: Organic aerosol characterization by complementary
measurements of chemical bonds and molecular fragments, Atmos.
Environ., 43, 6100–6105, <a href="http://dx.doi.org/10.1016/j.atmosenv.2009.09.036" target="_blank">doi:10.1016/j.atmosenv.2009.09.036</a>, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib90"><label>Russell et al.(2011)</label><mixed-citation>
Russell, L. M., Bahadur, R., and Ziemann, P. J.: Identifying organic aerosol
sources by comparing functional group composition in chamber and atmospheric
particles, P. Natl. Acad. Sci. USA, 108, 3516–3521, <a href="http://dx.doi.org/10.1073/pnas.1006461108" target="_blank">doi:10.1073/pnas.1006461108</a>, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib91"><label>Ruthenburg et al.(2014)</label><mixed-citation>
Ruthenburg, T. C., Perlin, P. C., Liu, V., McDade, C. E., and Dillner, A. M.:
Determination of organic matter and organic matter to organic carbon ratios
by infrared spectroscopy with application to selected sites in the IMPROVE
network, Atmos. Environ., 86, 47–57,
<a href="http://dx.doi.org/10.1016/j.atmosenv.2013.12.034" target="_blank">doi:10.1016/j.atmosenv.2013.12.034</a>, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib92"><label>Sax et al.(2005)</label><mixed-citation>
Sax, M., Zenobi, R., Baltensperger, U., and Kalberer, M.: Time resolved
infrared spectroscopic analysis of aerosol formed by photo-oxidation of
1,3,5-trimethylbenzene and alpha-pinene, Aerosol Sci. Tech., 39,
822–830, <a href="http://dx.doi.org/10.1080/02786820500257859" target="_blank">doi:10.1080/02786820500257859</a>, 2005.
</mixed-citation></ref-html>
<ref-html id="bib1.bib93"><label>Seinfeld and Pandis(2006)</label><mixed-citation>
Seinfeld, J. H. and Pandis, S. N.: Atmospheric Chemistry and Physics: From Air
Pollution to Climate Change, John Wiley &amp; Sons, New York, 2nd Edn.,
2006.
</mixed-citation></ref-html>
<ref-html id="bib1.bib94"><label>Shen and Huang(2008)</label><mixed-citation>
Shen, H. and Huang, J. Z.: Sparse principal component analysis via regularized
low rank matrix approximation, J. Multivariate Anal., 99,
1015–1034, <a href="http://dx.doi.org/10.1016/j.jmva.2007.06.007" target="_blank">doi:10.1016/j.jmva.2007.06.007</a>, 2008.
</mixed-citation></ref-html>
<ref-html id="bib1.bib95"><label>Shurvell(2006)</label><mixed-citation>
Shurvell, H.: Spectra–Structure Correlations in the Mid- and Far-Infrared,
John Wiley &amp; Sons Ltd., <a href="http://dx.doi.org/10.1002/0470027320.s4101" target="_blank">doi:10.1002/0470027320.s4101</a>, 2006.
</mixed-citation></ref-html>
<ref-html id="bib1.bib96"><label>Si and Samulski(2008)</label><mixed-citation>
Si, Y. and Samulski, E. T.: Synthesis of Water Soluble Graphene, Nano Lett.,
8, 1679–1682, <a href="http://dx.doi.org/10.1021/nl080604h" target="_blank">doi:10.1021/nl080604h</a>, 2008.
</mixed-citation></ref-html>
<ref-html id="bib1.bib97"><label>Spiegelman et al.(1998)</label><mixed-citation>
Spiegelman, C. H., McShane, M. J., Goetz, M. J., Motamedi, M., Yue, Q. L., and
Cote, G. L.: Theoretical justification of wavelength selection in PLS
calibration: development of a new algorithm, Anal. Chem., 70, 35–44,
<a href="http://dx.doi.org/10.1021/ac9705733" target="_blank">doi:10.1021/ac9705733</a>, 1998.
</mixed-citation></ref-html>
<ref-html id="bib1.bib98"><label>Stankovich et al.(2006)</label><mixed-citation>
Stankovich, S., Piner, R. D., Nguyen, S. T., and Ruoff, R. S.: Synthesis and
exfoliation of isocyanate-treated graphene oxide nanoplatelets, Carbon, 44,
3342–3347, <a href="http://dx.doi.org/10.1016/j.carbon.2006.06.004" target="_blank">doi:10.1016/j.carbon.2006.06.004</a>, 2006.
</mixed-citation></ref-html>
<ref-html id="bib1.bib99"><label>Surratt et al.(2007)</label><mixed-citation>
Surratt, J. D., Kroll, J. H., Kleindienst, T. E., Edney, E. O., Claeys, M.,
Sorooshian, A., Ng, N. L., Offenberg, J. H., Lewandowski, M., Jaoui, M.,
Flagan, R. C., and Seinfeld, J. H.: Evidence for organosulfates in secondary
organic aerosol, Environ. Sci. Tech., 41, 517–527,
<a href="http://dx.doi.org/10.1021/es062081q" target="_blank">doi:10.1021/es062081q</a>, 2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib100"><label>Szabó et al.(2006)</label><mixed-citation>
Szabó, T., Berkesi, O., Forgó, P., Josepovits, K., Sanakis, Y., Petridis, D.,
and Dékány, I.: Evolution of Surface Functional Groups in a Series of
Progressively Oxidized Graphite Oxides, Chem. Mater., 18,
2740–2749, <a href="http://dx.doi.org/10.1021/cm060258+" target="_blank">doi:10.1021/cm060258+</a>, 2006.
</mixed-citation></ref-html>
<ref-html id="bib1.bib101"><label>Takahama and Dillner(2015)</label><mixed-citation>
Takahama, S. and Dillner, A. M.: Model selection for partial least squares
calibration and implications for analysis of atmospheric organic aerosol
samples with mid-infrared spectroscopy, J. Chemometr., 29,
659–668, <a href="http://dx.doi.org/10.1002/cem.2761" target="_blank">doi:10.1002/cem.2761</a>, 2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib102"><label>Takahama et al.(2013)</label><mixed-citation>
Takahama, S., Johnson, A., and Russell, L. M.: Quantification of Carboxylic and
Carbonyl Functional Groups in Organic Aerosol Infrared Absorbance Spectra,
Aerosol Sci. Tech., 47, 310–325,
<a href="http://dx.doi.org/10.1080/02786826.2012.752065" target="_blank">doi:10.1080/02786826.2012.752065</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib103"><label>ter Braak and de Jong(1998)</label><mixed-citation>
ter Braak, C. J. F. and de Jong, S.: The objective function of partial least
squares regression, J. Chemometr., 12, 41–54,
<a href="http://dx.doi.org/10.1002/(SICI)1099-128X(199801/02)12:1&lt;41::AID-CEM500&gt;3.0.CO;2-F" target="_blank">doi:10.1002/(SICI)1099-128X(199801/02)12:1&lt;41::AID-CEM500&gt;3.0.CO;2-F</a>, 1998.
</mixed-citation></ref-html>
<ref-html id="bib1.bib104"><label>Tibshirani(1996)</label><mixed-citation>
Tibshirani, R.: Regression shrinkage and selection via the Lasso,
J. Roy. Stat. Soc. B Met., 58, 267–288, 1996.
</mixed-citation></ref-html>
<ref-html id="bib1.bib105"><label>Tikhonov and Arsenin(1977)</label><mixed-citation>
Tikhonov, A. N. and Arsenin, V. I.: Solutions of ill-posed problems, Halsted
Press, New York, 1977.
</mixed-citation></ref-html>
<ref-html id="bib1.bib106"><label>Tuinstra and Koenig(1970)</label><mixed-citation>
Tuinstra, F. and Koenig, J. L.: Raman Spectrum of Graphite, J.
Chem. Phys., 53, 1126–1130, <a href="http://dx.doi.org/10.1063/1.1674108" target="_blank">doi:10.1063/1.1674108</a>, 1970.
</mixed-citation></ref-html>
<ref-html id="bib1.bib107"><label>Weakley et al.(2014)</label><mixed-citation>
Weakley, A., Miller, A., Griffiths, P., and Bayman, S.: Quantifying silica in
filter-deposited mine dusts using infrared spectra and partial least squares
regression, Anal. Bioanal. Chem., 406, 4715–4724,
<a href="http://dx.doi.org/10.1007/s00216-014-7856-y" target="_blank">doi:10.1007/s00216-014-7856-y</a>, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib108"><label>Weisberg(2013)</label><mixed-citation>
Weisberg, S.: Applied Linear Regression, Wiley Series in Probability and
Statistics, Wiley, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib109"><label>Wittig et al.(2004)</label><mixed-citation>
Wittig, A. E., Anderson, N., Khlystov, A. Y., Pandis, S. N., Davidson, C., and
Robinson, A. L.: Pittsburgh air quality study overview, Atmos.
Environ., 38, 3107–3125, <a href="http://dx.doi.org/10.1016/j.atmosenv.2004.03.003" target="_blank">doi:10.1016/j.atmosenv.2004.03.003</a>, 2004.
</mixed-citation></ref-html>
<ref-html id="bib1.bib110"><label>Wold(1966)</label><mixed-citation>
Wold, H.: Estimation of Principal Components and Related Models by Iterative
Least Squares, in: Multivariate Analysis, Academic Press, 391–420, 1966.
</mixed-citation></ref-html>
<ref-html id="bib1.bib111"><label>Wold(1975)</label><mixed-citation>
Wold, H.: Soft modeling by latent variables: the nonlinear iterative partial
least squares approach, in: Perspectives in Probability and Statistics, Papers in
Honour of M. S. Bartlett, 520–540, 1975.
</mixed-citation></ref-html>
<ref-html id="bib1.bib112"><label>Wold(1993)</label><mixed-citation>
Wold, S.: Discussion: PLS in Chemical Practice, Technometrics, 35, 136–139,
<a href="http://dx.doi.org/10.2307/1269657" target="_blank">doi:10.2307/1269657</a>, 1993.
</mixed-citation></ref-html>
<ref-html id="bib1.bib113"><label>Wold et al.(1983)</label><mixed-citation>
Wold, S., Martens, H., and Wold, H.: The Multivariate Calibration Problem in
Chemistry Solved by the PLS Method, Lect. Notes Math., 973,
286–293, 1983.
</mixed-citation></ref-html>
<ref-html id="bib1.bib114"><label>Wold et al.(1984)</label><mixed-citation>
Wold, S., Ruhe, A., Wold, H., and Dunn, III, W. J.: The Collinearity Problem
in Linear Regression. The Partial Least Squares (PLS) Approach to
Generalized Inverses, SIAM J. Sci. Stat. Comp.,
5, 735–743, <a href="http://dx.doi.org/10.1137/0905052" target="_blank">doi:10.1137/0905052</a>, 1984.
</mixed-citation></ref-html>
<ref-html id="bib1.bib115"><label>You et al.(2014)</label><mixed-citation>
You, Y., Kanawade, V. P., de Gouw, J. A., Guenther, A. B., Madronich, S.,
Sierra-Hernández, M. R., Lawler, M., Smith, J. N., Takahama, S.,
Ruggeri, G., Koss, A., Olson, K., Baumann, K., Weber, R. J., Nenes, A., Guo, H.,
Edgerton, E. S., Porcelli, L., Brune, W. H., Goldstein, A. H., and Lee, S.-H.:
Atmospheric amines and ammonia measured with a chemical ionization mass spectrometer
(CIMS), Atmos. Chem. Phys., 14, 12181–12194, <a href="http://dx.doi.org/10.5194/acp-14-12181-2014" target="_blank">doi:10.5194/acp-14-12181-2014</a>, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib116"><label>Zou and Hastie(2005)</label><mixed-citation>
Zou, H. and Hastie, T.: Regularization and variable selection via the elastic
net, J. Roy. Stat. Soc. B Met., 67, 301–320, <a href="http://dx.doi.org/10.1111/j.1467-9868.2005.00503.x" target="_blank">doi:10.1111/j.1467-9868.2005.00503.x</a>, 2005.
</mixed-citation></ref-html>
<ref-html id="bib1.bib117"><label>Zou et al.(2006)</label><mixed-citation>
Zou, H., Hastie, T., and Tibshirani, R.: Sparse principal component analysis,
J. Comput. Graph. Stat., 15, 265–286,
<a href="http://dx.doi.org/10.1198/106186006X113430" target="_blank">doi:10.1198/106186006X113430</a>, 2006.
</mixed-citation></ref-html>--></article>
