Constraining the response factors of an extractive electrospray ionization mass spectrometer for near-molecular aerosol speciation



Section S1. Condensation sink estimation
The condensation sink, CS (Lehtinen et al., 2003; Dal Maso et al., 2002), in s⁻¹ is calculated as

$$\mathrm{CS} = 2\pi D \int_0^{\infty} d_P\,\beta_M(d_P)\,N(d_P)\,\mathrm{d}d_P$$ Eq. (S1)

where D is the vapor diffusivity, dP is the particle diameter, N(dP) is the number concentration of particles of diameter dP, and βM(dP) is the Fuchs-Sutugin correction factor for gas-phase diffusion over particles in the transition regime. Using a discrete particle size distribution as measured by the SMPS, we calculate CS using an approximation of the integral, namely

$$\mathrm{CS} = 2\pi D \sum_i \beta_M(d_{P,i})\,d_{P,i}\,N(d_{P,i})$$ Eq. (S2)

The lifetime for gaseous condensation in the presence of a CS is (Kulmala and Wagner, 2001)

$$\tau_{\mathrm{CS}} = \frac{1}{\mathrm{CS}}$$ Eq. (S3)

As an approximation, D can be assumed to be 6 to 7 × 10⁻⁶ m² s⁻¹ for condensable organic vapors (Palm et al., 2016; Krechmer et al., 2017). A more nuanced estimation is described below. The Fuchs-Sutugin correction factor, β, is calculated as

$$\beta = \frac{1 + Kn}{1 + \left(\frac{4}{3\alpha} + 0.377\right) Kn + \frac{4}{3\alpha} Kn^2}$$ Eq. (S4)

where α is the mass accommodation coefficient. In lieu of empirical values, unity is assumed for α (Markku et al., 2001). Recent experimental results support this unity assumption (Krechmer et al., 2017; Liu et al., 2019). Kn is the Knudsen number,

$$Kn = \frac{2\lambda}{d_P}$$ Eq. (S5)

where the particle radius (dP / 2) is used as the characteristic length and λ is the effective mean free path of vapor molecules. The mean free path in dry air varies slightly in the literature, e.g. 6.53 to 6.673 × 10⁻⁸ m (Jennings, 1988). The mean free path of any organic compound can be calculated if its gas-phase diffusion coefficient (at the bath gas pressure), DPr, and average molecular speed, c, are known,

$$\lambda = \frac{3 D_{Pr}}{c}$$ Eq. (S6)

$$c = \sqrt{\frac{8RT}{\pi\,MW}}$$ Eq. (S7)

where R is the ideal gas constant (8.314 J mol⁻¹ K⁻¹), T is the temperature in K, and MW is the molar mass (kg mol⁻¹). Note that DPr is a function of the bath gas pressure, P (Torr), and the gas diffusivity, D (Torr cm² s⁻¹),

$$D_{Pr} = \frac{D}{P}$$ Eq. (S8)

For a trace gas A in a bath gas B, the gas diffusivity can be estimated using Fuller's method (Fuller et al., 1966; Tang et al., 2015),

$$D(A,B) = \frac{1.0868\,T^{1.75}}{\sqrt{m(A,B)}\left(\sqrt[3]{V_A} + \sqrt[3]{V_B}\right)^2}$$ Eq. (S9)

where VA and VB are the dimensionless diffusion volumes of A and B, and m(A,B) is the reduced mass of the A-B pair, which can be calculated from the molecular masses (g mol⁻¹) of A and B, mA and mB, respectively,

$$m(A,B) = \frac{2}{1/m_A + 1/m_B}$$ Eq. (S10)

VA may be estimated from the molecular formula of the trace gas,

$$V_A = \sum_i n_i V_i$$ Eq. (S11)

where ni is the number of atoms with diffusion volume Vi, which is 15.9 for C, 2.31 for H, 6.11 for O, and 4.54 for N (Reid et al., 1987). Subtracting 18.3 from the total diffusion volume accounts for the effect of an aromatic ring. For compounds containing multiple aromatic rings, it may be best to correct only for independent aromatic rings, based on limited experimental data (Tang et al., 2015). Alicyclic rings are not expected to have an effect on the diffusion volume (Tang et al., 2015). Diffusion volumes of common bath gases are known rather than estimated: N2 (18.5), O2 (19.7), H2O (13.1).

For inorganic and slightly oxygenated organic compounds, the mean free path of condensable vapors may be quite uniform (within 20%), in which case the Knudsen number can be estimated from pressure and particle diameter alone (Tang et al., 2015),

$$Kn = \frac{2\lambda_P}{P\,d_P}$$ Eq. (S12)

where P is the pressure of air in atm, and λP is the pressure-normalized mean free path, equal to 100 nm atm. The deviation of Kn estimated using Eq. S12 for a 100 nm particle (i.e. Kn = 2) with respect to that estimated using Eq. S5 for selected compounds is shown in Table S1 with the corresponding gas diffusivity D, estimated using Eq. S9. All compounds are assumed to be non-aromatic unless indicated otherwise. For C5 to C10 VOCs (e.g. isoprene, monoterpenes) and their oxidation products (e.g. C5 to C10 monomers and C20 dimers), the estimated diffusivities differ by less than a factor of 2 from 6.5·10⁻⁶ m² s⁻¹. Diffusion volume correction for (single) aromatic rings results in minor differences (< 5%) in the estimated D values. The estimated Knudsen numbers agree within 15%, as do the estimated Fuchs-Sutugin correction factors, β, between the simplified and the more rigorous estimation methods, assuming a mass accommodation coefficient of either 1 (3.08·10⁻¹ for all compounds) or 0.1 (3.67·10⁻² for all compounds), estimated using Eq. S4 and Eq. S12.
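The chain of estimates in this section (Fuller diffusivity, mean free path, Knudsen number, Fuchs-Sutugin factor, discrete condensation sink) can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in this work. The function names, the default temperature of 298.15 K and pressure of 760 Torr, and the bath gas are assumptions: air is taken with a diffusion volume of 19.7 and a molar mass of 28.97 g mol⁻¹. The Fuchs-Sutugin expression is written in its commonly used form, which reproduces the β values quoted above for Kn = 2.

```python
import math

R = 8.314  # ideal gas constant, J mol^-1 K^-1

# Atomic diffusion volumes quoted in the text (Reid et al., 1987)
ATOMIC_V = {"C": 15.9, "H": 2.31, "O": 6.11, "N": 4.54}
# Standard atomic masses in g mol^-1
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "N": 14.007}


def fuller_diffusivity(atoms, T=298.15, P_torr=760.0,
                       V_bath=19.7, m_bath=28.97, aromatic_rings=0):
    """Gas diffusivity in m^2 s^-1 from Fuller's method.

    atoms is a dict such as {"C": 10, "H": 16, "O": 4}. The bath gas
    defaults (air, V = 19.7, 28.97 g mol^-1) are assumptions of this sketch.
    """
    V_A = sum(ATOMIC_V[el] * n for el, n in atoms.items())
    V_A -= 18.3 * aromatic_rings  # aromatic ring correction
    m_A = sum(ATOMIC_MASS[el] * n for el, n in atoms.items())
    m_AB = 2.0 / (1.0 / m_A + 1.0 / m_bath)
    D_torr = 1.0868 * T ** 1.75 / (
        math.sqrt(m_AB) * (V_A ** (1 / 3) + V_bath ** (1 / 3)) ** 2
    )  # Torr cm^2 s^-1
    return D_torr / P_torr * 1e-4  # cm^2 s^-1 converted to m^2 s^-1


def mean_free_path(D, molar_mass_kg, T=298.15):
    """Mean free path in m from diffusivity (m^2 s^-1) and molar mass (kg mol^-1)."""
    c = math.sqrt(8 * R * T / (math.pi * molar_mass_kg))  # mean speed, m s^-1
    return 3 * D / c


def knudsen(mfp, d_p):
    """Knudsen number with the particle radius as characteristic length."""
    return 2 * mfp / d_p


def fuchs_sutugin(Kn, alpha=1.0):
    """Fuchs-Sutugin transition-regime correction factor."""
    a = 4.0 / (3.0 * alpha)
    return (1 + Kn) / (1 + (a + 0.377) * Kn + a * Kn ** 2)


def condensation_sink(D, diameters, numbers, mfp, alpha=1.0):
    """Discrete condensation sink in s^-1.

    diameters are bin diameters in m, numbers are bin number
    concentrations in m^-3.
    """
    return 2 * math.pi * D * sum(
        fuchs_sutugin(knudsen(mfp, d), alpha) * d * n
        for d, n in zip(diameters, numbers)
    )


# Hypothetical example: a C10H16O4 vapor and a made-up three-bin distribution
D_vapor = fuller_diffusivity({"C": 10, "H": 16, "O": 4})
mfp_vapor = mean_free_path(D_vapor, 0.200)
cs_example = condensation_sink(
    D_vapor, [50e-9, 100e-9, 200e-9], [1e11, 5e10, 1e10], mfp_vapor
)
beta_unity = fuchs_sutugin(2.0, alpha=1.0)   # Kn = 2, alpha = 1
beta_tenth = fuchs_sutugin(2.0, alpha=0.1)   # Kn = 2, alpha = 0.1
```

With Kn = 2, `fuchs_sutugin` returns approximately 0.308 for α = 1 and 0.0367 for α = 0.1, matching the values quoted in the text.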

Section S2. Oxidation flow reactor schematic
A schematic of the experimental setup is shown in Figure S1 along with the physical dimensions of the oxidation flow reactor (OFR). VOC precursor and seed particles are injected near the entrance region of the OFR, whereas O3 is injected coaxially in the direction of the flow through 6 mm outer diameter stainless-steel tubing about 61 cm downstream of the entrance region. Instruments sampled from near the exit region of the OFR. The cross-sectional area of the OFR is approximately 4.3·10⁻³ m². At 12 L min⁻¹, the plug flow velocity is roughly 4.65·10⁻² m s⁻¹. The residence time within the oxidation region (i.e. 39 cm) is roughly 8.38 s, corresponding to an effective dilution rate of 0.12 s⁻¹. Eq. (S13)
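The flow numbers quoted above follow from simple plug-flow arithmetic, sketched below. The function name is hypothetical, and treating the effective dilution rate as the inverse of the residence time is an assumption that is consistent with the quoted values.

```python
def plug_flow(flow_l_min, area_m2, length_m):
    """Return (velocity in m s^-1, residence time in s, dilution rate in s^-1)."""
    q = flow_l_min * 1e-3 / 60.0      # L min^-1 converted to m^3 s^-1
    velocity = q / area_m2            # plug flow velocity
    residence_time = length_m / velocity
    dilution_rate = 1.0 / residence_time  # assumed definition
    return velocity, residence_time, dilution_rate


# Values from the text: 12 L min^-1, 4.3e-3 m^2 cross section, 39 cm region
velocity, tau_res, k_dil = plug_flow(12.0, 4.3e-3, 0.39)
print(f"v = {velocity:.3e} m/s, tau = {tau_res:.2f} s, k_dil = {k_dil:.2f} 1/s")
```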

Section S4. AMS Vaporizer artifact correction
The high-resolution aerosol mass spectrometer (AMS) determines the aerosol composition in terms of NO3, NH4, SO4, Chl, and organics (OA). All experiments were conducted under low-NOx conditions using NH4NO3 seed particles. Therefore, all NH4+ and NO3− observed are attributed to NH4NO3. Due to the high inorganic concentrations used (up to 11.6 mg m⁻³), care must be taken to account for vaporizer artifacts, where NOx+ ions generated from nitrate particles during the electron impact ionization process could oxidize organic residues on the vaporizer surface, producing CO2+ ions that are falsely attributed to organic aerosol (Pieber et al., 2016). The extent of this artifact is determined by injecting NH4NO3 seed particles into the OFR in the absence of any organic oxidation products. As shown in Figure S3a, the organic vaporizer artifact, Orgartifact, can be described by an exponential function of the NH4NO3 concentration (i.e. the combined mass concentrations of NO3− and NH4+). This correlation is used to correct for Orgartifact in all runs, as shown in Figure S3b to Figure S3d. Note that this correlation could change with the vaporizer history (Pieber et al., 2016). Here, the vaporizer artifact was characterized in the midst of the campaign.

…-cresol, and in (d) for the OH oxidation of 1,2,4-trimethylbenzene. The correlation between condensed organics and NH4NO3 seed concentrations can be roughly described by a double exponential function.

Section S5. Oxidation flow reactor model
The organic vapor wall loss may be estimated from the OFR dimensions and the gas diffusivities as proposed by McMurry and Grosjean (1995), when the vapor wall accommodation coefficient is greater than 10⁻⁵, i.e. when eddy diffusion dominates. This is the case for oxidation flow reactors (OFR) of similar dimensions to the one used in this study (Brune, 2019; George et al., 2007).

$$k_{wall} = \frac{A}{V}\,\frac{2}{\pi}\,\sqrt{k_e D}$$ Eq. (S14)

A and V are the surface area (1.02 × 10⁻¹ m²) and volume (1.72 × 10⁻³ m³) of the OFR, respectively. ke is the coefficient of eddy diffusion, which may be estimated as a function of the enclosure volume (Krechmer et al., 2016),

$$k_e = 0.004 + 10^{-2.25}\,V^{0.74}$$ Eq. (S15)

which is 4.05 × 10⁻³ s⁻¹ here. Due to their relatively small enclosure volume (relative to that of a typical smog chamber), ke would be close to 4·10⁻³ s⁻¹ for most OFR designs. For estimated gas diffusivities, D, ranging from 3.69·10⁻⁶ (C20H32O16) to 1.18·10⁻⁵ (C3H6) m² s⁻¹, the corresponding kwall ranges from 4.60·10⁻³ s⁻¹ to 8.22·10⁻³ s⁻¹, resulting in a wall loss timescale, τwall, between 122 and 218 s. Two different vapor wall loss experiments conducted using a PTR-TOF and an acetate atmospheric pressure interface chemical ionization TOF-MS indicate a 50% vapor wall loss rate at 10 L min⁻¹ flow rate, which suggests a τwall similar to the dilution lifetime, i.e. 27 seconds, meaning that the actual kwall is close to 3.7·10⁻² s⁻¹, roughly 4 to 8 times higher than Eq. S14 and Eq. S15 would suggest. For simplicity, a kw value of 0.04 s⁻¹ is used as the base case scenario. The effects of higher kw (i.e. 0.4 s⁻¹) and lower kw (i.e. 0.04 s⁻¹) values on the gas- and particle-phase concentrations are simulated and shown in Figure 3a-c for generic oxidation products of differing saturation vapor concentrations ranging from 10⁻² to 10⁶ µg m⁻³. The OFR wall is also assumed to be a perfect sink for organic vapors, i.e. no back-partitioning of organic vapor from the wall to the gas phase is considered.
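The quoted ke and kwall values can be reproduced with the short sketch below. The two expressions used here are assumptions of the sketch: a diffusion-limited wall loss rate of the form (A/V)(2/π)√(ke·D), and an eddy diffusion coefficient of the form 0.004 + 10^(-2.25)·V^0.74 with V in m³. They were chosen because they reproduce the numbers stated in the text. The function names are hypothetical.

```python
import math


def eddy_diffusion_coefficient(volume_m3):
    """Eddy diffusion coefficient in s^-1 as a function of enclosure volume (assumed form)."""
    return 0.004 + 10 ** -2.25 * volume_m3 ** 0.74


def wall_loss_rate(area_m2, volume_m3, D, k_e):
    """Diffusion-limited vapor wall loss rate in s^-1 (assumed form)."""
    return (area_m2 / volume_m3) * (2.0 / math.pi) * math.sqrt(k_e * D)


AREA = 1.02e-1    # OFR surface area, m^2 (from the text)
VOLUME = 1.72e-3  # OFR volume, m^3 (from the text)

k_e = eddy_diffusion_coefficient(VOLUME)
k_wall_low = wall_loss_rate(AREA, VOLUME, 3.69e-6, k_e)   # C20H32O16
k_wall_high = wall_loss_rate(AREA, VOLUME, 1.18e-5, k_e)  # C3H6

for label, k in (("low D", k_wall_low), ("high D", k_wall_high)):
    print(f"{label}: k_wall = {k:.2e} 1/s, tau_wall = {1.0 / k:.0f} s")
```

The resulting timescales fall at roughly 120 s and 220 s, which brackets the range given in the text and makes the gap to the experimentally inferred 27 s easy to see.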
The remaining gas-phase concentration, Gremain, and the condensed particle-phase concentration, Pcond, during seed injection are expressed relative to the steady-state gas-phase concentration prior to the seed injection, Gss (i.e. Gremain/Gss and Pcond/Gss), so that they depend neither on the absolute value of Gss nor on the actual production rate, provided that the production rate is not affected by the seed injection. The modeled gas-particle partitioning is shown in Figure S4. A sensitivity analysis was performed by varying the organic aerosol concentration (OA), the condensation sink (CS), or the wall loss rate (kw) from the base condition (20 µg m⁻³ OA, 1 s⁻¹ CS, and 0.04 s⁻¹ kw) in Figure S4a-c. The observed OA and CS values were used to simulate the partitioning behaviors as shown in Figure S4d-i. For each VOC system, the observed OA concentration and CS roughly followed a linear correlation. Figure S4d shows Pcond normalized to its maximum value as a function of CS, and suggests that it may be possible to infer the saturation vapor concentration, C*, of semi-volatile compounds based on the uptake trend without knowledge of the near-molecular particle-phase sensitivity or gas-phase concentration (as long as Gss remains constant in this case). However, compounds of different C* may exhibit similar trends, i.e. high inter-correlations, which cannot be numerically resolved due to noise. Visually, this is obvious for compounds with log(C*) > 2 or < -1 as shown in Figure S4d.
To determine the range of log(C*) that could in theory be numerically resolved from the Pcond behaviors alone, we modeled the normalized Pcond for compounds with log(C*) ranging from -2 to 6 using OA and CS values observed for each system. The lower C* threshold is set at the point beyond which all compounds with lower C* would exhibit normalized Pcond trends with inter-correlation (R² value from linear regression between the normalized Pcond values corresponding to any pair of C* values, i.e. any two "vertical slices" from Figure S4e and S4f) above 0.99. The decision to set the cutoff at R² = 0.99 is arbitrary. The upper C* threshold is similarly defined in Figure S4g-i. The experimentally constrainable log(C*) ranges based on the uptake behavior alone are narrow: 1.25 to 2.02 for the cresol system, 1.18 to 2.09 for the TMB system, and 0.57 to 1.85 for the limonene system. The span of the constrainable C* range is wider for the limonene system due to the higher maximum CS range explored experimentally (>2 s⁻¹ as compared to <1 s⁻¹ for the anthropogenic systems). The upper bound of the constrainable C* range for the limonene system (i.e. log(C*) = 1.85) is lower than that for either the cresol (i.e. log(C*) = 2.02) or the TMB system (i.e. log(C*) = 2.09) due to the lower maximum OA uptake as a function of CS for the limonene system as compared to the anthropogenic systems. All else being equal, the constrainable range of log(C*) increases with the experimental CS range, which is limited by the maximum particle concentrations the instruments could accommodate before clogging or signal depletion becomes too severe.

Section S7. Parameterization and Model Validation
The EESI-TOF response factor in ions s⁻¹ ppb⁻¹, RFx*, can be estimated by performing a linear regression of Ix (in ions s⁻¹) as a function of Pcond,x (in ppb) as described in the main text, and taking the slope. Because ordinary least squares (OLS) regression minimizes only the vertical distance (i.e. in Ix on the y-axis), it cannot account for uncertainties in the explanatory variable (i.e. Pcond,x on the x-axis) during error propagation. Uncertainties in both the explanatory and dependent variables can be propagated by performing an orthogonal distance regression (ODR). The slope values obtained using either method agree within a factor of 2, as shown in Figure S7.

where, if DBE ≤ 0, Xc is set to 0. Note that for CHO compounds, Eq. S18 simplifies to Eq. (S19).

In addition, the carbon-oxygen non-ideality (NICO) from Eq. (7) is itself an interaction term between the product of the numbers of carbon and oxygen atoms (PCO) and the inverse of the sum of the numbers of carbon and oxygen atoms (ICO),

$$NICO = PCO \times ICO = \frac{n_C\,n_O}{n_C + n_O}$$ Eq. (S20)

In addition to the aforementioned features, the log of the effective saturation vapor concentration, log(C*), is included as a feature.
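As an illustration of how the composition-based features can be derived from an elemental formula, the sketch below computes a subset of them for CHO compounds. It assumes the usual double bond equivalent expression for CHO species, DBE = 1 + (2nC − nH)/2, and takes DBEpC as DBE divided by nC. The function names and the example formula are hypothetical, and the aromaticity term Xc is omitted.

```python
import re


def parse_formula(formula):
    """Count atoms in a simple formula string such as 'C9H12O5' (no brackets)."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def cho_features(formula):
    """Composition-based features for a CHO compound."""
    atoms = parse_formula(formula)
    n_c, n_h, n_o = atoms.get("C", 0), atoms.get("H", 0), atoms.get("O", 0)
    dbe = 1 + 0.5 * (2 * n_c - n_h)  # assumed CHO form of the DBE
    pco = n_c * n_o                  # product of carbon and oxygen numbers
    ico = 1.0 / (n_c + n_o)          # inverse of their sum
    return {
        "nC": n_c, "nH": n_h, "nO": n_o,
        "H:C": n_h / n_c,
        "O:C": n_o / n_c,
        "DBE": dbe,
        "DBEpC": dbe / n_c,          # assumed definition: DBE per carbon
        "PCO": pco,
        "ICO": ico,
        "NICO": pco * ico,           # interaction term nC*nO/(nC+nO)
    }


features = cho_features("C9H12O5")  # hypothetical TMB oxidation product
print(features)
```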
Preliminary ordinary least squares (OLS) regressions of the near-molecular EESI-TOF response factor, RFx* (which was obtained with ODR), as a function of nC, nO, MW, NICO, PCO, or ICO are shown in Figure S8a-f for each of the three VOC systems studied. The RFx* values estimated for cresol and TMB oxidation products appear to increase as the molecules increase in size (i.e. positive correlation with MW and nC) and/or become more functionalized (i.e. positive correlation with nO). The correlations also appear to be steeper for the TMB system than for the cresol system. In contrast, the RFx* values estimated for limonene oxidation products do not appear to be well correlated with nC, nO, MW, PCO, ICO, or NICO. The discrepancies observed between the aromatic systems and the biogenic system are likely due to differences in the structure of the oxidation products, as discussed in the main text.

Figure S8. Preliminary regression analysis
OLS regression analysis of the log of RFx* with respect to (a) the number of carbon atoms, nC, (b) the number of oxygen atoms, nO, (c) the molecular weight, MW, (d) the carbon-oxygen non-ideality, NICO, (e) the product of nC and nO, PCO, and (f) the inverse of the sum of nC and nO, ICO. The red, blue, and green dashed lines correspond to the linear fits of the log(RFx*) values of TMB, cresol, and LMN oxidation products, respectively. The coefficient of determination, R², of the ordinary linear regression of log(RFx*) as a function of the feature is shown in brackets after the corresponding VOC label.
First, an exhaustive search over the feature space was performed to determine the optimal set of features for each regressor using their respective default hyperparameter values. Leave-one-out (LOO) cross-validation was used to evaluate the model performance in terms of the coefficient of determination, R²,

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$ Eq. (S21)

where yi and ŷi are the true and the predicted values for the i-th sample among a total of n samples, and ȳ is the mean of the n samples. If the model always predicts ȳ, the R² will be 0, e.g. for a naive model where all values are predicted to equal the sample mean regardless of input. The R² can be negative if the model performs worse than this naive model, i.e. if assuming the mean value regardless of model input produces better results on average. For a dataset of size n, LOO involves setting aside each data point (yi) in turn as the test sample while the remaining (n-1) data points are used to train the model and make a prediction, ŷi. yi and ŷi are then used to estimate the R² using Eq. (S21). LOO can be considered a K-fold cross-validation where K is equal to the number of data points. Compared to the K-fold cross-validation method, LOO is more computationally intensive, but is nonetheless appropriate given the small size of the dataset used here (nsample = 28 for case 1 and 70 for 2a and 2b). During cross-validation, a portion of the dataset is used to train the model (i.e. the "train" set), while the remaining data are withheld to validate the model predictions (i.e. the "test" set). For each train-test set, the training feature values (n = nsample - 1) were standardized by subtracting their mean and dividing by their standard deviation. The same transformation was then applied to the feature values from the test set (n = 1), which was not included in deriving the transformation, to prevent information leaking between the training and test sets.
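The leave-one-out procedure with fold-wise standardization can be sketched with a single-feature least-squares model in plain Python. This is a minimal illustration, not the pipeline used in this work: the regressors described here take several features, and the function names and example data below are hypothetical.

```python
from statistics import mean, pstdev


def r_squared(y_true, y_pred):
    """Coefficient of determination; 0 for a mean-only model, negative if worse."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def loo_predictions(x, y):
    """Predict each point from a model trained on all other points."""
    predictions = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        # Standardization parameters come from the training fold only,
        # so the held-out point cannot leak into the transformation.
        mu, sigma = mean(x_train), pstdev(x_train)
        z_train = [(v - mu) / sigma for v in x_train]
        z_test = (x[i] - mu) / sigma
        slope, intercept = fit_line(z_train, y_train)
        predictions.append(slope * z_test + intercept)
    return predictions


# Hypothetical data: one feature and a noisy linear response
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
r2_loo = r_squared(y_demo, loo_predictions(x_demo, y_demo))
print(f"LOO R^2 = {r2_loo:.3f}")
```

Because every prediction comes from a model that never saw the corresponding point, the resulting R² is lower than (or at best equal to) the value obtained by training and scoring on the full dataset.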
The results of the feature optimization are shown in Figure S9 in terms of the best R² vs. the number of features used. The optimal feature sets are shown in Table S2. In addition to OLS, linear ridge regression ("Ridge") and Bayesian ridge regression (BayRR) are included. Both Ridge and BayRR implement L2 regularization, making them more resilient against overfitting and feature collinearity. Support vector regression (SVR) with a linear kernel is also included as a linear regression model for comparison. Exploratory analysis using SVR with radial basis functions (rbf) yielded better R², but the relative feature importance was not easily interpretable when rbf was used, hence the choice of a linear kernel. Lastly, nonparametric regressors such as the random forest regressor (RFR) and the gradient boosting regressor (GBR) were included, as RFx* is likely not a linear function of the features already included. While it is possible that RFx* could be well described by a linear combination of engineered features, it is not feasible to explore all nonlinear (e.g. nC²) or interaction (e.g. nC·nH) feature terms, hence the necessity of nonparametric regressors. For the purpose of feature selection and later hyperparameter tuning, the random state (which controls the permutation of features at each split within the decision tree) for RFR and GBR is fixed (i.e. given a seed value of 0) so that the models generate reproducible outputs. Note that in some cases (e.g. SVR in case 1 and 2a), the optimal feature set selected does not correspond to the set with the highest R², but rather to one with a slightly lower R² score and a (sometimes substantially) lower total number of features. The feature abbreviations used are as follows: carbon-oxygen non-ideality (NICO), product of the numbers of oxygen and carbon atoms (PCO), inverse of the sum of the numbers of oxygen and carbon atoms (ICO), logarithm of saturation vapor concentration (log(C*)), aromaticity (XC), double bond equivalent per carbon (DBEpC), number of oxygen atoms (nO), number of hydrogen atoms (nH), molecular weight (MW), mass defect (∆m), hydrogen-to-carbon ratio (H:C), oxygen-to-carbon ratio (O:C), one-hot encoded precursor VOC label (VOC).
The R² determined from the leave-one-out (LOO) cross-validation test is shown. For ordinary least squares (OLS) regression, linear ridge regression (LRR), Bayesian ridge regression (BRR), and support vector regression (SVR), the weight for each feature is shown. For random forest regression (RFR) and gradient boosting regression (GBR), the importance is shown, which is a measure of the usefulness of a feature in constructing the decision tree. VOCLMN, VOCTMB, and VOCCresol are the one-hot encoded representation of the VOC identity.
Note that if we were to use the entire dataset to train and validate the model, the resulting R² would be overly optimistic, as shown in Figure S11, especially for those obtained using the nonparametric regressors.

Figure S11. Regression using the entire dataset
Comparison of the predicted log(RFx*) using the entire dataset with the VOC label included as one of the features using (a) linear regression models and (b) nonparametric regression models. The optimal feature sets and hyperparameters used for each model are identical to those used for Figure S10 and Table S5, except that each model was now trained with the entire dataset to predict the entire dataset, instead of following the LOO procedure. The 1-to-1 line is shown in solid black. The darker shaded region represents a factor of 2 deviation from the 1-to-1 line. The lighter shaded region represents a factor of 5 deviation from the 1-to-1 line.
For typical ambient measurements or chamber experiments with complex precursor mixtures, the VOC precursor identity is often not known without additional constraints (e.g. ion mobility or gas chromatography measurements supported with chemical reaction box models). The prediction capability of the regression model for an unknown VOC is examined in Figures S12a and S12b, using the TMB dataset as the "known" VOC system to predict the log(RFx*) for the "unknown" cresol and limonene (LMN) systems. As shown in Figure S12a, while the regression models trained with the TMB dataset tend to overestimate the log(RFx*) for the cresol system, the predictions and observations are qualitatively consistent in terms of the relative log(RFx*), likely due to the structural similarity of cresol and TMB, which would be reflected to varying degrees in their respective oxidation products. In contrast, regression models trained with the TMB dataset are unfit to predict the log(RFx*) for the limonene oxidation products, as shown in Figure S12b.
The effect of the VOC precursor on the predicted log(RFx*) values, using the model trained in Case 2b (all data with digitized VOC label), for all CHO molecular formulae used for EESI-TOF spectral fitting is shown in Figures S12c and S12d. In general, the predicted log(RFx*) values trend in the same direction for all VOCs. The predicted effect of the VOC precursor is distinct when a linear regressor is used, as shown in Figure S12d, where log(RFx*) is treated as a linear combination of features, one of which is the digitized VOC precursor identity. When a decision-tree type regressor is used, the VOC precursor identity effect is not as simple, as shown in Figure S12c. Lastly, the combination of datasets from multiple VOC systems also affects the predicted log(RFx*), as shown in Figure S12e and S12f for the TMB system. Linear models trained with the combined dataset (i.e. Case 2b) appear to (severely) underestimate the log(RFx*) as compared to the models trained with a single VOC dataset (i.e. Case 1). Furthermore, regressors that performed reasonably well (e.g. LRR for Case 1) for the training dataset with a limited number of features (e.g. NICO) may be ill-equipped when predicting for a more diverse set of compounds, whose variabilities are only reflected in other features (e.g. optimal features for LRR in Case 2b, see Table S5).

… log(RFx*) for all molecular formulae used for EESI-TOF MS fitting predicted using the GBR model from Case 1 and Case 2b for the TMB system only. (f) Same as (e), but with the LRR model from Case 1 and Case 2b. The optimal feature sets and hyperparameters used for each model are listed in Table S5. The 1-to-1 line is shown in solid black. The darker shaded region represents a factor of 2 deviation from the 1-to-1 line. The lighter shaded region represents a factor of 5 deviation from the 1-to-1 line. The case numbers indicated on the axis legend and in annotations indicate how the model was trained, as described throughout Tables S2-5.

Figure S13. Comparison of estimated and observed OA concentration
Comparison of the observed organic aerosol (OA) as measured by the AMS with the OA concentration estimated using EESI-TOF measurements converted from ions s⁻¹ to μg m⁻³ using the RFx* (ions s⁻¹ ppb⁻¹) predicted using the gradient boosting regression (GBR) model. Conversion of ppb to molecules cm⁻³ is performed under standard conditions, i.e. 2.46·10¹⁰ molecules cm⁻³ per ppb. The 1-to-1, 2-to-1, and 3-to-1 lines are shown in solid black. Two versions of the regression models are used to predict the RFx* for TMB, one trained with a single VOC dataset (Case 1) and one trained with combined VOC datasets where the VOC precursor identity is used as a training feature (Case 2b).
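The unit conversion described in this caption can be sketched as follows, assuming each ion's signal is divided by its predicted response factor to obtain a mixing ratio, which is then converted to a mass concentration with the stated factor of 2.46·10¹⁰ molecules cm⁻³ per ppb. The function names and the example ion list are hypothetical.

```python
AVOGADRO = 6.02214076e23          # molecules mol^-1
MOLEC_PER_CM3_PER_PPB = 2.46e10   # conversion factor stated in the text


def ions_to_ug_m3(intensity, response_factor, molar_mass):
    """Convert an ion signal to a mass concentration in ug m^-3.

    intensity is in ions s^-1, response_factor in ions s^-1 ppb^-1,
    molar_mass in g mol^-1.
    """
    ppb = intensity / response_factor
    molecules_per_cm3 = ppb * MOLEC_PER_CM3_PER_PPB
    mol_per_m3 = molecules_per_cm3 * 1e6 / AVOGADRO  # cm^-3 to m^-3, then to mol
    return mol_per_m3 * molar_mass * 1e6             # g m^-3 to ug m^-3


def total_oa(ions):
    """Sum the mass concentrations of (intensity, response_factor, molar_mass) tuples."""
    return sum(ions_to_ug_m3(i, rf, mw) for i, rf, mw in ions)


# Hypothetical ions: (signal in ions/s, RF in ions/s/ppb, molar mass in g/mol)
example_ions = [(500.0, 250.0, 200.0), (120.0, 400.0, 186.0)]
print(f"Estimated OA = {total_oa(example_ions):.2f} ug/m3")
```

For reference, 1 ppb of a compound with a molar mass of 200 g mol⁻¹ corresponds to roughly 8.2 μg m⁻³ under this conversion.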

Figure S1. Flow tube dimension.

Figure S3. Inorganic salt-induced vaporizer artifact
(a) Artifact organics concentration observed by the AMS when sampling nebulized NH4NO3 in the absence of any organic oxidation products. An exponential function of NH4NO3 concentration is used to estimate the organic signal attributable to the vaporizer artifact. The

Figure S4. Modeled partitioning
(a-c) Expected distribution of organic oxidation products of differing volatilities between the gas and particle phases during the seed injection period for a hypothetical base case scenario of 20 µg m⁻³ organic aerosol concentration (OA), 1 s⁻¹ condensation sink (CS), and 0.04 s⁻¹ vapor wall loss rate (kw). Alternative scenarios assume higher or lower OA, CS, and kw. (d-i) Modeled ratio of Pcond to Gss for compounds of varying log(C*) under observed OA and CS conditions. (a) Ratio of condensed organic material during seed injection, Pcond, to the steady-state gas-phase concentration prior to seed injection, Gss. The ratio can exceed 1 under high CS conditions. (b) Ratio of Pcond to the sum of Pcond and the gas-phase concentration during the seed injection period, Gremain. Partitioning between Pcond and Gremain is invariant with respect

Figure S6. Comparison of major particle- and gas-phase oxidation products
Intensities of selected [M+Na]+ adducts observed by the EESI-TOF for the particle phase are shown for (a) C7 OH + cresol oxidation products, (c) C9 OH + TMB oxidation products, and (e) C10 limonene + O3 oxidation products. Intensities of selected [M+H]+ ions observed by the Vocus-PTR in the gas phase are shown for (b) C7 OH + cresol oxidation products, (d) C9 OH + TMB oxidation products, and (f) C10 limonene + O3 oxidation products. Average particle-phase signals over all uptake events are shown in (a), (c), and (e). Average steady-state gas-phase concentrations prior to each uptake event are shown in (b), (d), and (f). Note that the color scales are only consistent within each of the (a-b), (c-d), and (e-f) pairs. Ion

Figure S7. Comparison of the response factor (RFx*) values determined using ordinary least squares (OLS) and orthogonal distance regression (ODR). Uncertainties in the explanatory and response variables are taken into consideration by ODR during fitting. Vertical and horizontal error bars shown represent the standard deviation of the fitted slope of EESI-TOF vs. Vocus-PTR measurements, i.e. RFx*.

Based on the elemental formulae measured by the EESI-TOF and the Vocus-PTR, several additional features could be derived from the numbers of carbon (nC), hydrogen (nH), and oxygen (nO) atoms, including the exact molecular mass (MW), the mass defect (∆m), the hydrogen-to-carbon ratio (H:C), the oxygen-to-carbon ratio (O:C), the double bond equivalent (DBE), and the double bond equivalent per carbon (DBEpC),

$$DBE = 1 + \frac{1}{2}\left(2 n_C - n_H + n_N + n_P\right)$$ Eq. (S16)

Figure S9. Feature selection
The best R² from the LOO cross-validation test for each regressor using different permutations of features as a function of the number of features included for (a) Case 1, where only the TMB dataset is used, (b) Case 2a, where data from all the VOC systems were used without providing the digitized VOC identity as one of the input features, or (c) Case 2b, where data from all the VOC systems were used with the one-hot encoded VOC identity provided as one of the input features, hence the one extra feature over cases 1 and 2a.

Figure S12. Prediction of all RFx*
(a) Comparison of the observed log(RFx*) for cresol oxidation products with that predicted using the gradient boosting regression (GBR) and the linear ridge regression (LRR) models trained with the TMB dataset. (b) Same as (a) but for the limonene (LMN) system. (c) Comparison of the log(RFx*) for all molecular formulae used for EESI-TOF MS fitting predicted using the GBR model from Case 2b for different VOC systems, i.e. all feature values used during