Mass spectrometry-based Aerosolomics: a new approach to resolve sources, composition, and partitioning of secondary organic aerosol

Abstract. Particulate matter (PM) largely consists of secondary organic aerosol (SOA) that is formed via oxidation of biogenic and anthropogenic volatile organic compounds (VOCs). Unambiguous identification of SOA molecules and their assignment to their precursor vapors are challenges that have so far only succeeded for a few SOA marker compounds, which are now well characterized and (partly) available as authentic standards. In this work, we resolve the complex composition of SOA by means of a top-down approach based on the newly created Aerosolomics database, which is fed by non-target analysis results of filter samples from oxidation flow reactor experiments. We investigated the oxidation products from the five biogenic VOCs α-pinene, β-pinene, limonene, 3-carene, and trans-caryophyllene and from the four anthropogenic VOCs toluene, o-xylene, 1,2,4-trimethylbenzene, and naphthalene. Using ultrahigh-performance liquid chromatography coupled to a high-resolution (Orbitrap) mass spectrometer, we determine the molecular formula of 596 chromatographically separated compounds based on exact mass and isotopic pattern. We utilize retention time and fragmentation mass spectra as a basis for unambiguous attribution of the oxidation products to their parent VOCs. Based on the molecular-resolved application of the database, we are able to assign roughly half of the total signal of oxygenated hydrocarbons in ambient suburban PM2.5 to one of the nine studied VOCs. The application of the database enabled us to interpret the appearance of diurnal compound clusters that are formed by different oxidation processes. Furthermore, by performing a hierarchical cluster analysis (HCA) on the same set of filter samples, we identified compound clusters that depend on sulfur dioxide mixing ratio and temperature.
This study demonstrates how Aerosolomics tools (database and HCA) applied to PM filter samples can improve our understanding of SOA sources, their formation pathways, and temperature-driven partitioning of SOA compounds.

Introduction
Secondary organic aerosol (SOA) is a complex mixture formed through the oxidation of biogenic (BVOCs) and anthropogenic volatile organic compounds (AVOCs) in the atmosphere. Aerosol particles influence Earth's climate as well as human health (Hallquist et al., 2009; Shrivastava et al., 2017). Earlier work has shown that SOA makes up a large fraction of fine particulate matter (PM2.5, particles with an aerodynamic diameter less than 2.5 µm) (Huang et al., 2014; McDonald et al., 2018). Globally, the emissions of BVOCs are considerably higher than those of AVOCs, with 760-1150 compared to 140 TgC yr−1, respectively (Kari et al., 2019; Shrivastava et al., 2017; Sindelarova et al., 2014). BVOC emissions are mainly distributed among isoprene (70 %), monoterpenes (11 %), methanol (6 %), and others (13 %) (Sindelarova et al., 2014). AVOCs and BVOCs are not only emitted by different sources but also have different SOA yields and result in different products with distinct properties. Furthermore, it is known that both organic and inorganic anthropogenic emissions can affect SOA formation from BVOCs (Kari et al., 2019; Xu et al., 2021). Still, major knowledge gaps exist regarding the sources and formation pathways of SOA, its transformation and lifetime in the atmosphere, and its underlying effect on Earth's climate and human health. Furthermore, emissions of anthropogenic and biogenic precursors as well as their atmospheric fate are uncertain, resulting in a discrepancy between measured and modeled SOA (Fuzzi et al., 2015). Improved chemical characterization of ambient SOA can improve the understanding of sources, formation pathways, and the effects on both climate and human health.
Many controlled laboratory studies have increased our mechanistic understanding of the oxidation of volatile organic compounds (VOCs) (Burkholder et al., 2017). However, the ambient atmosphere is usually more complex than chamber experiments, and unaccounted chemical interactions can therefore alter SOA yields that are derived from simple laboratory systems (McFiggans et al., 2019). Furthermore, it is likely that many VOCs that are relevant for SOA formation are understudied. Therefore, a comprehensive top-down investigation of SOA can enable the identification of missing important precursor gases or relevant formation pathways. Numerous previous investigations (Glasius et al., 2000; Kristensen et al., 2016; Nozière et al., 2015; Surratt et al., 2007) clearly highlight the advantages of offline measurement techniques, which apply separation techniques like gas chromatography or (ultrahigh-performance) liquid chromatography (UHPLC) coupled to (high-resolution) mass spectrometry (HRMS), because with these techniques, the unambiguous identification of different compounds becomes possible. In recent years, non-target analysis (NTA) of UHPLC-HRMS measurements has become a powerful tool that builds peak lists of all detected compounds in complex samples and determines the molecular formula based on the exact mass and isotopic pattern. Furthermore, MS2 spectra can be compared to fragmentation libraries and enable database-assisted identification of compounds (Ditto et al., 2018; Ma et al., 2022; Pereira et al., 2021; Pleil et al., 2018). However, there are currently no established databases of atmospheric SOA tracers which can be applied to measurements of ambient PM2.5 filter samples.
Inspired by metabolomics, a tool widely used in the life sciences to identify metabolites, metabolic pathways, and biomarkers (Fiehn, 2002; Johnson et al., 2016), we created an Aerosolomics database for database-assisted identification of marker compounds (without the need for authentic standards), hence enabling the investigation of atmospheric transformation pathways of VOCs under different oxidation conditions. The database enables compound matching based on filters from potential aerosol mass (PAM) oxidation flow reactor (OFR) experiments on nine biogenic and anthropogenic VOCs. We applied the database to ambient air filter samples collected in summer 2018 near Vienna, Austria. Figure 1 shows the principal steps of the new Aerosolomics approach that is based on the comparison between filter samples from OFR experiments and from ambient air. Additionally, a hierarchical cluster analysis (HCA) was performed in order to reduce the complexity of the ambient dataset and to assign compounds to certain formation processes or emission sources. Both strategies combined allow the identification of oxidation products from either biogenic or anthropogenic VOCs and enable a better understanding of the oxidation conditions and of temperature-driven gas-to-particle partitioning.

Figure 1. Establishment and application of the Aerosolomics database. The database is filled with the results of several PAM-OFR experiments with different precursors and reactants and is compared to the molecular fingerprints of ambient PM2.5 filter samples. Matching compounds can be assigned to the corresponding VOC precursor. Knowledge about non-matching compounds, due to missing primary sources or SOA from unaccounted precursors or formation mechanisms, can guide further experiments.
We evaporated the VOCs in a heated glass flask purged continuously with nitrogen (N2, 6.0 purity, Nippon Gases). The aerosol mass concentration was measured with a scanning mobility particle sizer spectrometer (SMPS, consisting of an electrostatic classifier 3082, a differential mobility analyzer 3081A, and an ultrafine condensation particle counter 3776, TSI Inc.). Changing the precursor concentration by varying the temperature inside the flask resulted in aerosol mass concentrations between 20 and 184 µg m−3. The individual settings and the resulting mass concentrations are given in Table S1.
In all experiments, the nitrogen flow into the reactor was 4.8 L min−1, and the oxygen (O2, 5.0 purity, Nippon Gases) flow was 1.2 L min−1, resulting in a mean residence time of 2.4 min. The relative humidity was 55 %. The O3 concentration was ∼ 1 ppm, decreasing to 0.8 ppm under OH conditions. Based on these experimental conditions, we calculated the external OH reactivity (Eq. S1) and the OH exposure using the OFR exposure estimator (Peng et al., 2015, 2016). The resulting OH exposures (Table S1) correspond to approximately 0.1-6 d of equivalent atmospheric OH exposure, based on the assumption of an averaged tropospheric OH concentration of 1.09 × 10^6 molecules cm−3.
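The conversion from an OH exposure to equivalent atmospheric days can be sketched as follows. This is a minimal illustration, not the OFR exposure estimator itself: the function names are hypothetical, the OH exposure is assumed to be given in molecules cm−3 s, and the reactor volume is not stated in the text but only implied by the total flow and the residence time.

```python
SECONDS_PER_DAY = 86_400
OH_TROPOSPHERE = 1.09e6  # molecules cm^-3, averaged value assumed in the text


def implied_volume_l(residence_min=2.4, flow_n2_l_min=4.8, flow_o2_l_min=1.2):
    """Reactor volume implied by total flow and mean residence time (plug-flow assumption)."""
    return (flow_n2_l_min + flow_o2_l_min) * residence_min


def equivalent_days(oh_exposure):
    """Convert an OH exposure (molecules cm^-3 s) to days of ambient OH exposure."""
    return oh_exposure / OH_TROPOSPHERE / SECONDS_PER_DAY
```

Under these assumptions, the stated range of 0.1-6 d corresponds to OH exposures of roughly 9 × 10^9 to 6 × 10^11 molecules cm−3 s.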
The aged air leaving the OFR passed through two 50 cm denuders packed with charcoal (IAC-402, Infiltec GmbH) and potassium permanganate (KMnO4, IAC-630, Infiltec GmbH) in order to remove (reactive) gas-phase compounds. Glass fiber filters (47 mm, Pallflex Emfab Filters, Pall) sampled the formed SOA particles with a flow of 3 L min−1 for a duration of 90 min. Until sample preparation and analysis, filter samples were packed in aluminum foil and stored at −18 °C.
2.2 Ambient air filter sampling campaign
PM2.5 filter samples were collected in August 2018 during a field campaign (48.127° N, 16.534° E) at a suburban background station between the Vienna International Airport in the east and the Schwechat Industrial Park and city of Vienna in the northwest. A total of 52 glass fiber filters (150 mm, Ahlstrom-Munksjö) were sampled for 12 h each, starting at 05:00 or 17:00 UTC, using a high-volume sampler (DHA-80, Digitel Elektronik AG) at a flow rate of 30 m3 h−1. The meteorological parameters (i.e., wind direction, wind speed, and temperature), the trace gas concentrations (i.e., nitrogen oxide, NO; nitrogen dioxide, NO2; sulfur dioxide, SO2; and carbon monoxide, CO), and the PM2.5 mass concentration were monitored continuously.

Sample preparation
From each ambient filter sample, one punch (25 mm diameter) was cut into small pieces and extracted in a glass vial using 200 µL of ultrapure water (Milli-Q Reference A+, Merck KGaA) and methanol (Optima LC/MS Grade, Thermo Fisher Scientific Inc.) (90/10, v/v) for 20 min on an orbital shaker at 300 rpm. Afterwards, the solvent was drawn up with a syringe (Injekt-F, Braun Melsungen AG) and filtered through a 0.2 µm syringe filter (non-sterile PTFE Syringe Filter, Thermo Fisher Scientific Inc.). In a second step, 100 µL of the solvent mixture was added, and the procedure was repeated; 50 µL of the extracted sample was mixed with 5 µL of an internal standard containing isotopically labeled benzoic acid (C6H5^13CO2H, 99 atom % 13C, Sigma-Aldrich, c = 0.1 mg mL−1).
Half of each filter from the OFR experiments was cut into small pieces and extracted analogously to the ambient air filter samples with an adjustment in the eluent volume: 180 µL was used in the first and 80 µL was used in the second step. Finally, 100 µL of the extracted sample was mixed with 10 µL of the internal standard.

Standard mixture for non-target analysis validation
A solution of 13 analytical standards was used to validate UHPLC-HRMS measurements and the NTA workflow, with primary attention toward automated compound identification but also toward fragmentation and adduct formation, which can result in false positives. To cover a variety of atmospherically relevant compounds, the mixture consists of carboxylic acids, organosulfates, and organophosphates, as well as nitrogen-containing compounds. The injection volume for the analysis was 1 µL. A detailed overview of the substances used and their concentrations in the mixture is given in Table S2.

UHPLC-HESI-HRMS measurements
The extracts of the ambient PM samples were separated by ultrahigh-performance liquid chromatography (Vanquish Flex, Thermo Fisher Scientific Inc.) on a reversed-phase column (Accucore C18, 2.6 µm, 150 × 2.1 mm, Thermo Fisher Scientific Inc.), ionized in the negative mode using a heated electrospray ionization source (HESI-II Probe, Thermo Fisher Scientific Inc.), and detected with a high-resolution hybrid quadrupole-Orbitrap mass spectrometer (Q Exactive Focus, Thermo Fisher Scientific Inc.). Eluents were ultrapure water (eluent A) and methanol (eluent B), both mixed with 0.1 % formic acid (98 %, Merck KGaA). The injection volume was 5 µL, the flow rate was 400 µL min−1, and the temperature was 40 °C. The gradient started with 1 % eluent B (0-0.5 min), increased linearly to 99 % B (0.5-14 min), stayed at 99 % B (14-16 min), and was backflushed in 1 min and equilibrated in 3 min, resulting in a total run time of 20 min. The ion source settings were −3.5 kV spray voltage, 40 psi sheath gas, 8 arbitrary units auxiliary gas, and 262.5 °C capillary temperature. The spectra were recorded in full-scan MS with data-dependent tandem mass spectrometry (ddMS2) using a higher-energy collisional dissociation (HCD) cell with stepped collision energies of 15, 30, and 45 eV. The scan range in full MS was m/z 50-750, with a resolution of 70 000 at m/z 200. For ddMS2, the resolution was 17 500.

M. Thoma et al.: Aerosolomics: database and ambient OA
A representative selection of 10 ambient PM samples was measured a second time. The selection was based on external influences like wind direction, temperature, time of day, and trace gas concentrations. In one sequence, together with filter samples from laboratory OFR experiments, we applied and improved the gradient on another reversed-phase column (Cortecs Solid Core T3, 2.7 µm, 150 × 3 mm, with the corresponding VanGuard Cartridge, Waters Corp.). The gradient also started with 1 % B for 30 s, increased linearly to 99 % B in 15 min, and was held there for 2 min. Afterwards, the column was backflushed in 90 s and equilibrated in 2.5 min, resulting in a total run time of 21.5 min. This dataset is the basis for the application of the database to ambient samples.
As a quality control routine, we extracted one filter three times to estimate the reproducibility of the extraction procedure. In addition, we determined the instrument performance by a triplicate measurement of one filter extract. We calculated the relative standard deviation (RSD) for seven compounds (m/z 115-357 and signal intensities of 3 × 10^5 to 5 × 10^8 counts). Averaged over all seven compounds, we determined a mean RSD of 6.7 % for the reproducibility of the extraction procedure and 2.1 % for the instrumental performance.
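The RSD calculation described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def rsd_percent(peak_areas):
    """Relative standard deviation (%) of replicate peak areas (sample std, n - 1)."""
    return stdev(peak_areas) / mean(peak_areas) * 100.0


def mean_rsd_percent(replicates_per_compound):
    """Average the per-compound RSDs, e.g. over the seven quality-control compounds."""
    return mean(rsd_percent(areas) for areas in replicates_per_compound)
```

Each inner list passed to `mean_rsd_percent` would hold the three replicate peak areas of one compound.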

Non-target analysis, MS2 libraries, hierarchical cluster analysis, and volatility estimation
We used Compound Discoverer 3.2 (Thermo Fisher Scientific Inc.) for the NTA of the UHPLC-HRMS raw files. Chromatographic peaks of interest were aligned with a maximum shift of 0.1 min in retention time and a mass tolerance of ±1 ppm. Ions were detected if the peak intensity was at least 5 × 10^5 counts for one of the following ions: In addition to the mass-to-charge ratio of the detected ion, at least one corresponding isotopologue has to be measured. The tolerance between the measured and calculated intensity of the isotopologue has to be less than 30 %. Unknown compounds were then grouped with a retention time tolerance of 0.1 min, and those of them with a sample-to-blank ratio smaller than 5 were marked as background. A predicted composition was calculated within ±1 ppm, allowing the elements carbon (C), hydrogen (H), bromine (Br), chlorine (Cl), nitrogen (N), oxygen (O), and sulfur (S). Compounds were grouped together as CHO, CHNO, CHOS, CHNOS, and "other" if the elemental composition contained other heteroatoms. For unidentified compounds, the software does not predict a composition under the given conditions. The detailed workflow is given in Table S3.
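The filtering and grouping rules above can be sketched in a few lines. This is not the Compound Discoverer workflow, only an illustration of its criteria: all function names are hypothetical, and the treatment of compositions outside the four named classes (for example pure hydrocarbons) as "other" is an assumption.

```python
def ppm_error(measured_mass, calculated_mass):
    """Signed mass deviation in ppm."""
    return (measured_mass - calculated_mass) / calculated_mass * 1e6


def is_background(sample_area, blank_area, min_ratio=5.0):
    """Mark a compound as background if its sample-to-blank ratio is below 5."""
    if blank_area <= 0:
        return False  # not detected in the blank
    return sample_area / blank_area < min_ratio


CLASSES = {
    frozenset("CHO"): "CHO",
    frozenset("CHNO"): "CHNO",
    frozenset("CHOS"): "CHOS",
    frozenset("CHNOS"): "CHNOS",
}


def compound_class(composition):
    """Group a predicted composition, given as {element: count}, into a compound class."""
    present = frozenset(el for el, n in composition.items() if n > 0)
    if not present:
        return "unidentified"  # no composition predicted
    return CLASSES.get(present, "other")
```

For example, `compound_class({"C": 10, "H": 17, "N": 1, "O": 7, "S": 1})` falls into the CHNOS group, while any composition containing Br or Cl is grouped as "other".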
To be clear on the degree of certainty regarding compound identification, we used the confidence levels from Schymanski et al. (2014). Probable and tentatively labeled compounds correspond to level 2 and level 3, respectively. We used the mzCloud database (HighChem LLC, 2013-2021) for comparing MS2 spectra of commercial chemicals. Unambiguously identified compounds (reference standard, MS2 spectrum) correspond to confidence level 1.
Based on the Compound Discoverer results from the OFR experiments, we created a library for every examined chemical system (e.g., limonene and ozone) using mzVault 2.3 (Thermo Fisher Scientific Inc.), resulting in a total of 14 libraries of the Aerosolomics database. Every entry in a library contains the exact mass-to-charge ratio, the retention time, the MS2 spectrum, and the relative abundance to the major product of the respective system if the relative abundance is higher than 1 %. These libraries were implemented in Compound Discoverer and aligned with the identified compounds from the representative selection of the field campaign. An entry in the library was assigned to a compound in the ambient air filter samples if the difference in the retention times was smaller than 0.2 min, the measured mass-to-charge ratios of the ddMS2 scans were within a window of 10 ppm, and the match factor indicating the similarity of the MS2 spectra was greater than 50 %. Detailed settings of this node are given in Table S3. If a compound appeared in several libraries, the match factor was decisive for the assignment. If match factors were equal, the system in which the compound had the highest relative abundance was chosen.
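The assignment rule described above can be sketched as a small selection function. This is an illustration under stated assumptions, not the Compound Discoverer node: the `Candidate` class and `assign` function are hypothetical, the MS2 match factor is taken as an input computed elsewhere, and the 10 ppm window is interpreted as an absolute deviation of at most 10 ppm.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    system: str           # chemical system of the library, e.g. "limonene + O3"
    mz: float             # library mass-to-charge ratio
    rt_min: float         # library retention time in minutes
    rel_abundance: float  # % relative to the major product of the system
    match_factor: float   # % MS2 similarity, supplied by the search software


def assign(mz, rt_min, candidates, rt_tol=0.2, ppm_tol=10.0, min_match=50.0):
    """Return the best library entry for an ambient compound, or None."""
    passing = [
        c for c in candidates
        if abs(c.rt_min - rt_min) < rt_tol
        and abs(c.mz - mz) / c.mz * 1e6 <= ppm_tol
        and c.match_factor > min_match
    ]
    if not passing:
        return None
    # Highest match factor wins; ties are broken by relative abundance.
    return max(passing, key=lambda c: (c.match_factor, c.rel_abundance))
```

Sorting on the tuple `(match_factor, rel_abundance)` encodes both rules in one step: the relative abundance only matters when the match factors are equal.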
We calculated the effective saturation mass concentration (log10 C*) for each compound with a predicted composition including at least the elements C, H, and O as well as N and S, based on Li et al. (2016). However, we would like to point out that this parameterization comprises a large molecular corridor and thus leads to a wide range of log10 C*. A bias has been reported for nitrogen-containing compounds (Isaacman-VanWertz and Aumont, 2021); CHO compounds also appear to be biased. For example, the parameterization yields a log10 C* of 1.97 (C* in µg m−3) for the atmospheric tracer 3-methyl-1,2,3-butanetricarboxylic acid (C8H12O6, MBTCA, Szmigielski et al., 2007), while with SIMPOL.1 (Pankow and Asher, 2008), we find log10 C* at 298 K to be −2.2. However, this difference of 4 orders of magnitude is certainly an extreme case, as all oxygen atoms of MBTCA are a part of a carboxylic acid functional group.
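The MBTCA example can be reproduced with a short sketch of the parameterization for the CHO class. The functional form and the coefficient values below are quoted from memory of Li et al. (2016) and should be checked against that paper before any use; the function name is hypothetical, and the N- and S-containing classes (which use different coefficients) are deliberately left out.

```python
def log10_c0_cho(n_c, n_o, n0_c=22.66, b_c=0.4481, b_o=1.656, b_co=-0.7790):
    """Estimate log10 of the saturation mass concentration (µg m^-3) of a CHO compound.

    Form assumed: (n0_C - n_C) * b_C - n_O * b_O - 2 * n_C * n_O / (n_C + n_O) * b_CO
    Coefficients are the CHO-class values as recalled from Li et al. (2016).
    """
    return (n0_c - n_c) * b_c - n_o * b_o - 2.0 * n_c * n_o / (n_c + n_o) * b_co
```

For MBTCA (C8H12O6), `log10_c0_cho(8, 6)` evaluates to about 1.97, consistent with the value given in the text. Because only the numbers of carbon and oxygen atoms enter, all isomers of a formula receive the same estimate, which is one reason for the bias discussed above.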
We performed an HCA with MATLAB R2020a (The MathWorks) based on the complete dataset from the Vienna field campaign, including the blank-corrected integrated sample peak areas. After z-transformation, we used a Euclidean distance metric and the Ward algorithm for computing the distance between the clusters. For the compound clusters of the HCA, an intensity-weighted mean of log10 C* was calculated. The borders of the different volatility classes are given according to Schervish and Donahue (2020). Based on this, the organic compounds can be classified as volatile (VOC), intermediate volatile (IVOC), semi-volatile (SVOC), low volatile (LVOC), extremely low volatile (ELVOC), and ultra-low volatile (ULVOC).
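The volatility classification and the intensity-weighted mean can be sketched as follows. The clustering itself (z-transformation, Euclidean distance, Ward linkage in MATLAB) is not reproduced here. The class borders are an assumption: they are the half-decade values commonly attributed to Schervish and Donahue (2020) and should be verified against that reference; the function names are hypothetical.

```python
# Upper borders in log10 C* (µg m^-3); assumed values, to be verified.
CLASS_BORDERS = [
    (-8.5, "ULVOC"),
    (-4.5, "ELVOC"),
    (-0.5, "LVOC"),
    (2.5, "SVOC"),
    (6.5, "IVOC"),
]


def volatility_class(log10_c):
    """Map a log10 C* value to its volatility class."""
    for upper, name in CLASS_BORDERS:
        if log10_c < upper:
            return name
    return "VOC"


def weighted_mean_log10_c(log10_c_values, intensities):
    """Intensity-weighted mean of log10 C* for the compounds of one cluster."""
    total = sum(intensities)
    return sum(v * w for v, w in zip(log10_c_values, intensities)) / total
```

With these borders, the two MBTCA estimates discussed in the text (1.97 and −2.2) would fall into different classes, SVOC and LVOC, respectively.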
3 Results and discussion
3.1 Oxidation flow reactor
OFR experiments provided SOA from several individually studied VOCs under different oxidation conditions. NTA of UHPLC-HRMS measurements of the SOA extracts enabled us to populate our Aerosolomics database with individual oxidation products of the studied VOCs in a qualitative way. We investigated five BVOCs and four AVOCs, and we identified 481 and 115 oxidation products, respectively. Each of these 596 oxidation products is listed in the database with the information on precursor, oxidation condition, exact mass-to-charge ratio, retention time, MS2 spectrum, and relative abundance in the respective system. Although we are not able to determine the individual chemical structure of the different SOA compounds, their individual structures lead to compound-specific retention times. Using all these parameters in the presented database allows unambiguous attribution of SOA compounds in ambient samples to their major parent VOC.

SOA compounds from oxidation of biogenic VOCs
We investigated the composition of SOA from the atmospherically most abundant biogenic monoterpenes (C10H16) α-pinene, β-pinene, limonene, and 3-carene. The results are shown as mirror spectra in Fig. 2. The upper half of each subplot shows the ozonolysis products under dark conditions, while the lower spectra show the products from OH oxidation (254 nm UV). The most abundant compounds are labeled with the predicted formula and their retention time; however, the database contains these entries of all compounds down to 1 % relative peak intensity.
Panel (a) shows the results of the α-pinene oxidation experiments. Monomers produced during ozonolysis are mainly in the mass range between 140 and 210 Da, while dimers are in the range between 300 and 400 Da. The major products during ozonolysis are pinic acid (C9H14O4 at 8.79 min, level 1; Yu et al., 1999), terpenylic acid (C8H12O4 at 6.67 min, level 2; Claeys et al., 2009), pinyl-diaterpenyl ester (C17H26O8 at 11.28 min, level 2; Kahnt et al., 2018; Yasmeen et al., 2010), C8H14O5 (at 5.84 min), and C8H14O6 (at 6.56 min). Oxidation by OH reduces the absolute signal intensity of most oxidation products (see Fig. S1). Furthermore, this oxidation environment prevents the production of dimers and certain monomers and changes the relative abundance of several monomers. For example, the relative abundance of pinic acid, which is the major compound of the ozonolysis, decreases by 30 % under OH conditions. In contrast, the relative abundance of several other compounds increases, which indicates a higher relevance in the OH system, like terpenylic acid increasing to 100 % relative abundance, C10H16O5 (at 9.28 min) increasing to 88 % relative abundance, C8H14O5 (at 5.84 min) increasing from 28 % to 73 % relative abundance, and MBTCA increasing from 4 % to 46 % relative abundance.
Panel (b) shows the results of the β-pinene oxidation experiments. Here, many compounds are similar to the α-pinene oxidation products, with the exception of dimer formation under OH conditions. For both ozonolysis and OH oxidation, C9H14O4 is the compound with the highest relative abundance, although the chromatography resolves different isomers: in the upper spectrum (O3), pinic acid has a relative abundance of 100 %, whereas homoterpenylic acid (C9H14O4 at 7.58 min, level 2; Yasmeen et al., 2010) has a relative abundance of 17 %. In the lower spectrum (OH), the relative abundances are reversed, with 100 % at 7.57 min and 65 % at 8.78 min. This indicates that different oxidation conditions of the same precursor result in different isomers of C9H14O4, which can only be resolved with chromatographic separation. Furthermore, homoterpenylic acid does not appear in any other experiment in higher amounts, for which reason it can be used as a specific β-pinene tracer. While the most prominent dimer (pinyl-diaterpenyl ester) appears analogously to the α-pinene system during ozonolysis, here, under OH conditions, β-pinene oxidation results in dimer oxidation products like C19H32O9 (at 13.23 min), in contrast to no dimers in the α-pinene system.
Panel (c) shows the results of the limonene oxidation experiments. In contrast to the other three monoterpenes, no dimers were formed, which is in general agreement with Hammes et al. (2019). The ozonolysis shows three major products: hydroxyterpenylic acid (C8H12O5 at 5.57 min, level 2; Kahnt et al., 2014), ketolimononic acid (C9H14O4 at 6.44 min, level 2; Yasmeen et al., 2011), and C10H16O5 (at 6.85 min). In the OH system, ketolimononic acid becomes the major compound, whereas the intensity of hydroxyterpenylic acid increases clearly. Analogous to the β-pinene oxidation, the C9H14O4 isomer ketolimononic acid can be used as a specific limonene tracer due to the missing appearance of this isomer in other experiments.
Panel ( (at 11.40 min) appear during ozonolysis, three of which are also reported by Thomsen et al. (2021), tentatively identified

In addition to the four monoterpenes, we investigated the composition of sesquiterpene SOA from trans-caryophyllene (C15H24). During ozonolysis, we find one major and four minor products in the mass range between 198 and 302 Da (Fig. 2e). The major compound is tentatively identified as β-norcaryophyllonic acid (C14H22O4 at 11.82 min, level 3; van Eijck et al., 2013; Jaoui et al., 2003). In contrast, the reaction with OH leads to one major and seven minor products in the range of 118-382 Da. The major compound C17H26O4 (at 14.03 min) also appears during ozonolysis but only with a relative abundance of 8 %.
Considering BVOC oxidation in general, it is worth mentioning that different isomers of C9H14O4 are clearly separated by the chromatographic system, and we can use them as specific tracers for different BVOCs in the Aerosolomics database. These and even more isomers are present in ambient filter samples (Fig. S2), which demonstrates the necessity of chromatographic separation if an unambiguous assignment is desired. Furthermore, ion source dimerization is a known phenomenon that hinders the unambiguous identification of atmospheric dimers or leads to misinterpretation of results from direct-injection HESI. Based on the knowledge of the exact m/z and the mass dependence of the retention time, we can assign ion source-related dimers to the associated atmospheric monomer. This allows us an unambiguous distinction between monomers and covalently bonded "real" dimers (Fig. S3).

SOA compounds from oxidation of anthropogenic VOCs
We investigated the composition of SOA from the anthropogenic VOCs 1,2,4-trimethylbenzene, toluene, o-xylene, and naphthalene. We carried out only OH oxidation of AVOCs because oxidation of aromatic compounds by O3 is negligible. The filter criteria were similar to those of the experiments with BVOCs, and the resulting spectra are shown in Fig. 3. All experiments show a noticeably lower number of oxidation products compared to biogenic precursors. We observe dimers only in the 1,2,4-trimethylbenzene and the o-xylene systems. Panel (a) shows the results of the 1,2,4-trimethylbenzene oxidation. The two most prominent compounds are C9H8O4 (at 9.04 min) and C5H6O4 (at 4.29 min). The remaining compounds play a minor role due to their relative abundances being less than 10 %. Panel (b) shows the results from the oxidation of toluene. The five most prominent compounds show a relative abundance higher than 40 %, and no compound has a relative abundance lower than 15 %. Most of these products are highly oxygenated with more than four oxygen atoms, with the exception of the highest signal corresponding to C5H6O3. All the small (C ≤ 5) highly oxygenated molecules also exhibit a high polarity (tR < 2 min) compared to the oxidation products of other anthropogenic precursors.
Panel (c) shows the results of the o-xylene oxidation, the anthropogenic precursor with the largest number of detected oxidation products (n = 52) of the four investigated AVOCs. While no composition could be assigned by the NTA software for the highest signal due to an invalid isotopic pattern, the most abundant product in panel (d) appears on the same mass trace and at the same retention time. This peak was identified as phthalic acid (C8H6O4, level 1), which is described as a naphthalene SOA tracer by Al-Naiema et al. (2020).
Panel (d) shows the oxidation of naphthalene, resulting in the main oxidation product, phthalic acid. All other compounds have a relative abundance smaller than 20 %. It is worth mentioning that two isomers of C8H6O3 appear with a similar relative abundance but with two distinguishable retention times.

Fingerprint
The NTA of the representative selection of the Vienna field campaign extracts results in 1312 compounds, shown in panel (a) of Fig. 4 as retention time as a function of molecular mass. The scatter size represents the mean signal intensity of the 10 measurements on a linear scale. The majority are CHO compounds (54.7 %), followed by CHOS (17.3 %), CHNOS (13.1 %), and unidentified (12.3 %) compounds (Fig. 5) (Claeys and Maenhaut, 2021), and monoterpene oxidation products like MBTCA, terpenylic acid, and pinic acid. Besides the two series of isoprene-derived CHNOS isomers highlighted in Sect. 3.3.1, three additional series of isomers can be found in the molecular fingerprint: C10H17NO7S (295 Da), C10H17NO9S (327 Da), and C10H17NO10S (343 Da) are all described as monoterpene-derived SOA (Surratt et al., 2008). In particular, C10H17NO7S plays a prominent role due to the overall maximum intensity, and it illustrates the anthropogenic influence on the oxidation products from BVOCs in the presence of NOx and SO2. The appearance of the ions NO3− and HSO4− in the MS2 spectra proves that these compounds are nitrooxy organosulfates. Overall, 86 % of the CHNOS compounds show both ions in their MS2 spectra and can also be attributed to this group.
The average carbon oxidation state (OSC) according to Kroll et al. (2011) is in the range between −1.5 and 3, with the exception of two small CHOS compounds with an OSC of up to 4 (Fig. 4b). The majority of the CHO compounds consist of 10 carbon atoms or fewer, while the majority of the CHOS compounds consist of 5 carbon atoms or fewer. This pattern indicates the importance of monoterpenes and isoprene as SOA precursors.
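For CHO compounds, the average carbon oxidation state is commonly approximated as 2 O/C − H/C, following Kroll et al. (2011). The sketch below covers only this simple case; the function name is hypothetical, and compounds containing nitrogen or sulfur need additional correction terms that are not included here and whose exact treatment in this study is not stated in the text.

```python
def carbon_oxidation_state(n_c, n_h, n_o):
    """Approximate average carbon oxidation state of a CHO compound: 2 O/C - H/C."""
    if n_c <= 0:
        raise ValueError("compound must contain carbon")
    return 2.0 * n_o / n_c - n_h / n_c
```

For pinic acid (C9H14O4) this gives about −0.67, and for MBTCA (C8H12O6) it gives 0.0, illustrating how further oxidation shifts compounds towards higher values.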
In order to evaluate optimal settings for chemical composition prediction during NTA and to avoid false predictions, the mass difference between calculated and measured molecular mass (Δmass) has to be considered. Figure 4d shows that, over the entire mass range, only a slight shift to negative values (median = −0.29 ppm) can be observed within the space between −1 and 1 ppm deviation. Within the set of allowed elemental compositions, an enlargement of the range (±1 ppm) would increase false predictions due to the increased mathematical possibility of other elemental combinations. The Kendrick mass defect (KMD) plot (Fig. 4e) allows the identification of homologous series. Members of such series have the same Kendrick mass defect (Kendrick, 1963), such as CnH2n−2O3 with n = 5-15 (KMD = 0.082), CnH2n−4O4 with n = 4-16 (KMD = 0.119), or CnH2n−2O5 with n = 3-14 (KMD = 0.128). Also, CHOS compounds form homologous series like CnH2nO5S with n = 2-8 (KMD = 0.178) or CnH2nO6S with n = 2-10 (KMD = 0.201). The appearance of these homologous series can be interpreted as oxygenated aliphatic hydrocarbons of fossil origin, as homologous series of (sulfur-containing) aliphatics can be detected in crude oil.
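The KMD values of the homologous series can be checked with a short calculation. The sketch below assumes the CH2-based Kendrick scale (14.00000/14.01565) and defines the defect as nominal mass minus Kendrick mass, which reproduces the values quoted above; the function names and the rounded monoisotopic masses in the table are illustrative choices.

```python
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462, "S": 31.97207117}
NOMINAL = {"C": 12, "H": 1, "N": 14, "O": 16, "S": 32}
CH2_SCALE = 14.0 / 14.01565  # rescales the mass of CH2 to exactly 14


def exact_mass(composition):
    """Monoisotopic mass of a neutral composition given as {element: count}."""
    return sum(MONOISOTOPIC[el] * n for el, n in composition.items())


def kendrick_mass_defect(composition):
    """KMD = nominal mass - Kendrick mass (CH2 base)."""
    nominal = sum(NOMINAL[el] * n for el, n in composition.items())
    return nominal - exact_mass(composition) * CH2_SCALE
```

Adding CH2 units leaves the result unchanged, so C5H8O3 and C15H28O3 (both CnH2n−2O3) share a KMD of about 0.082, while C2H4O5S (CnH2nO5S) gives about 0.178.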

Aerosolomics database assignment
The results from the NTA of the ambient samples were compared with the outcome from the PAM-OFR experiments in order to estimate the contribution of oxidation products formed by several precursors to ambient PM2.5. Out of a total of 580 detected CHO compounds, we can assign 108 CHO compounds to biogenic precursors and 24 CHO compounds to anthropogenic VOCs using our Aerosolomics database. The bar plot in Fig. 5 divides the CHO fraction into the different contributions examined; 40.8 % of the mean signal intensity can be attributed to SOA that originated from biogenic precursors (gradations of green and yellow). Out of this, 26.5 % accounts for experiments with OH oxidation, and 14.3 % accounts for ozonolysis experiments. The two major biogenic precursors are α- and β-pinene, with a respective share of 17.6 % and 13.2 %. The remaining shares are distributed among trans-caryophyllene products (4.2 %), limonene products (3.6 %), and 3-carene products (2.2 %). Anthropogenic precursors (gradations of magenta) play a minor role, with a total contribution of 5.7 %. From that, 2 % can be assigned to o-xylene, 1.9 % to naphthalene, 1.2 % to 1,2,4-trimethylbenzene, and 0.6 % to toluene. Apart from this, 53.5 % of the mean signal intensity could not be assigned (diagonally hatched).
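The attribution step described above can be sketched as follows. This is a deliberately simplified illustration that matches ambient features to library entries by m/z and retention time only, whereas the actual procedure also uses the isotopic pattern and the MS2 spectrum. The class names, the tolerances (1 ppm, 0.1 min), and the example values in the usage comment are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LibraryEntry:
    formula: str
    mz: float          # m/z of the library compound
    rt_min: float      # retention time in minutes
    precursor: str     # e.g. "alpha-pinene + O3"


@dataclass(frozen=True)
class Feature:
    mz: float
    rt_min: float
    intensity: float   # mean signal intensity of the ambient feature


def match_feature(feature: Feature, library: list,
                  ppm_tol: float = 1.0, rt_tol_min: float = 0.1) -> Optional[LibraryEntry]:
    """Return the first library entry within the m/z and retention-time windows."""
    for entry in library:
        ppm = (feature.mz - entry.mz) / entry.mz * 1e6
        if abs(ppm) <= ppm_tol and abs(feature.rt_min - entry.rt_min) <= rt_tol_min:
            return entry
    return None


def signal_shares(features: list, library: list,
                  ppm_tol: float = 1.0, rt_tol_min: float = 0.1) -> dict:
    """Percentage of the summed intensity per precursor, plus 'unassigned'."""
    totals = {}
    for feature in features:
        entry = match_feature(feature, library, ppm_tol, rt_tol_min)
        key = entry.precursor if entry else "unassigned"
        totals[key] = totals.get(key, 0.0) + feature.intensity
    grand_total = sum(totals.values())
    return {key: 100.0 * value / grand_total for key, value in totals.items()}
```

A call such as `signal_shares([Feature(185.0819, 7.5, 60.0), Feature(150.0, 5.0, 40.0)], [LibraryEntry("C9H14O4", 185.0819, 7.5, "alpha-pinene + O3")])` would then split the summed intensity between the matched precursor and the unassigned fraction, analogous to the bar in Fig. 5.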
The majority of the assigned compounds from the representative selection of the Vienna field campaign have molecular masses smaller than 250 Da and a retention time lower than 10 min (Fig. S4a). The mean number of carbon atoms is 9, and the mean OSC is −0.4 (Fig. S4b). SOA that originated from biogenic precursors is located in the H/C area between 1.2 and 1.8 (Fig. S4c), while compounds with H/C < 1.2 can be of aromatic character. The observation that 19 % of the CHO compounds (number-wise) in the database are responsible for nearly 50 % of the mean signal intensity demonstrates the high relevance of the investigated VOCs in SOA formation. Nevertheless, a few compounds with high signal intensities remain unassigned, such as C8H14O5, C7H6O2, C9H16O4, or the tentatively isoprene-derived C4H6O5 and C4H8O4 (Claeys and Maenhaut, 2021; Krechmer et al., 2015). Considering the retention behavior and the molecular mass of these unassigned compounds (Fig. S4a), we expect isoprene as well as other monoterpenes to be promising candidates for closing this gap. In addition, the precursors already used should be investigated under varying chemical conditions, such as with further oxidants or more complex mixtures of VOCs. A comprehensive study of isoprene oxidation is planned, and the outcome will be uploaded to the Aerosolomics database in the near future.
Besides CHO compounds, CHOS and CHNOS compounds play an important role in the overall composition of suburban SOA. Based on these outcomes, further experiments with various VOCs and complex mixtures including inorganic trace gases need to be performed, and the results have to be added to the Aerosolomics database.

Hierarchical cluster analysis
In Fig. 6, we show the results of the HCA (as a heatmap with dendrograms), with the ambient PM2.5 filter samples from the Vienna field campaign on the horizontal axis and the detected compounds of the NTA on the vertical axis. The color code of the heatmap represents the standardized values of the integrated peak intensities after background correction. We find that the driving factors for the clustering of the filter samples are the wind direction overlaid by the diurnal cycle and the influence of decreasing temperatures, as explained in the following section. Clusters that contain mostly nighttime samples are shaded by a gray background. The ones that contain mainly daytime samples are shaded by a yellow background, while samples during a low-temperature period are shaded by a blue background. Over the entire period, the mean PM2.5 mass concentration was 8.7 ± 4.4 µg m−3, the mean NOx concentration was 15.4 ± 16.7 µg m−3, the mean SO2 concentration was 1.5 ± 1.2 µg m−3, and the mean CO concentration was 0.17 ± 0.03 mg m−3. A detailed time series of meteorological data, PM2.5, and trace gas concentrations is shown in Fig. S5. Additionally, Fig. S6 shows the distribution of the wind direction of sample clusters a to i. With the exception of sample clusters b and f, all clusters show a predominant wind direction. The overall mean PM2.5 mass concentration was higher during collection in sample cluster I (10.3 ± 4.1 µg m−3) compared to sample cluster II (5.0 ± 2.4 µg m−3).
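To make the idea behind the heatmap concrete, the sketch below standardizes each compound's intensity time series and then groups compounds by agglomerative clustering. It is an illustration under stated assumptions, not the workflow used for Fig. 6: standardization is taken to be a z-score per compound, and Euclidean distance with average linkage is assumed, since neither choice is specified in this section. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def standardize(row: list) -> list:
    """z-score of one compound's intensities across all filter samples."""
    mu, sd = mean(row), pstdev(row)
    if sd == 0:
        return [0.0 for _ in row]
    return [(x - mu) / sd for x in row]


def euclidean(a: list, b: list) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage(rows: list, n_clusters: int) -> list:
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average pairwise distance between the members of both clusters.
            d = mean(euclidean(rows[i], rows[j])
                     for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return [sorted(c) for c in clusters]


# Toy intensities for four compounds over four alternating day/night samples.
intensities = [
    [10, 1, 12, 2],      # high during the day
    [100, 10, 110, 20],  # same pattern, different absolute level
    [1, 9, 2, 10],       # high at night
    [5, 50, 4, 60],      # same night pattern, different level
]
z = [standardize(row) for row in intensities]
print(average_linkage(z, n_clusters=2))
```

Standardizing first means that compounds are grouped by the shape of their time series rather than by absolute signal intensity, which matters because signal intensities of different compounds are not directly comparable.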

Compound cluster I - daytime chemistry
Figure 7a-c illustrate the molecular fingerprint of the 373 compounds occurring in compound cluster I; 79 % of the mean signal intensity is caused by CHO compounds, 16 % is unidentified, and 4 % is caused by CHOS compounds. The molecular masses of the compounds are in the range between 100 and 350 Da. CHO compounds mainly have masses smaller than 250 Da and a mean bulk composition of C8.5H12.5O5, which is in good agreement with the appearance of monomers from monoterpene oxidation during daytime. The number of carbon atoms ranges mainly between 4 and 10 (Fig. 7b), indicating biogenic VOCs, like monoterpenes, as potential precursors. The compounds cover a large range in volatility, with log10 C* values between −4.0 and 6.5 µg m−3 (Fig. 8a), corresponding mainly to LVOC-IVOC, with an intensity-weighted log10 C* mean value of 2.3 ± 1.6 µg m−3. Only a small fraction of the detected compounds have an H/C smaller than 1.2, indicating an aromatic character. The five most abundant compounds with a predicted composition are MBTCA (Aerosolomics database library: α-pinene + OH), hydroxyterpenylic acid, C5H6O7, C4H6O5, and C7H10O5. All five compounds are characterized as biogenic SOA compounds derived from isoprene or monoterpenes (Ehn et al., 2012; Müller et al., 2012; Qi et al., 2020).

Figure 5. Contribution of different molecular formula groupings to the mean signal intensity of the molecular fingerprint from the representative selection shown in Fig. 4. Compounds of the CHO group were compared to the libraries of the individual OFR experiments presented in Sect. 3.1 and assigned to a plausible SOA precursor and oxidation condition shown in the bar.

Figure 6. Standardized matrix of the detected compounds from the NTA of the filter samples from the Vienna field campaign. Parts of the horizontal dendrogram of the filter samples are shaded in gray for night cluster, yellow for day cluster, and blue for clustered samples at decreasing temperatures. The vertical compound clusters are separated into four subclusters according to day- and nighttime chemistry, organosulfates, and temperature dependency. For a better differentiation, the dendrograms are colored randomly.
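The volatility statistics reported for the compound clusters can be illustrated with a small sketch. It assumes the commonly used volatility basis set boundaries (ELVOC below log10 C* = −3.5, LVOC up to −0.5, SVOC up to 2.5, IVOC up to 6.5, VOC above) and a signal-intensity-weighted mean and standard deviation; whether exactly these boundaries and this weighting were applied in the evaluation is an assumption, and the function names are hypothetical.

```python
import math


def volatility_class(log10_c_star: float) -> str:
    """Assign a volatility class from log10 C* (C* in µg m-3); boundaries assumed."""
    if log10_c_star < -3.5:
        return "ELVOC"
    if log10_c_star < -0.5:
        return "LVOC"
    if log10_c_star < 2.5:
        return "SVOC"
    if log10_c_star < 6.5:
        return "IVOC"
    return "VOC"


def weighted_mean_std(values: list, weights: list) -> tuple:
    """Weighted mean and weighted (population) standard deviation."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mu = sum(v * w for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mu) ** 2 for v, w in zip(values, weights)) / total
    return mu, math.sqrt(variance)


# Hypothetical log10 C* values weighted by hypothetical signal intensities.
log_c_star = [-1.0, 2.0, 4.0]
intensity = [1.0, 3.0, 2.0]
mu, sd = weighted_mean_std(log_c_star, intensity)
print(f"intensity-weighted log10 C* = {mu:.2f} ± {sd:.2f} ({volatility_class(mu)})")
```

Weighting by intensity lets a few abundant compounds dominate the mean, so the weighted value describes the bulk of the detected signal rather than the typical compound.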
CHOS compounds appear less important in this compound cluster based on a 5 % contribution to the mean signal intensity. Nevertheless, about 45 % of the mean CHOS signal intensity can be attributed to monoterpene-and isoprenederived SOA (Brüggemann et al., 2020).
It is also remarkable that compounds appearing in this cluster show low standardized values in sample cluster II. Low standardized values indicate lower signal intensities of the MS measurements. However, it needs to be considered that equal signal intensities of different compounds do not necessarily mean equal concentrations. It is well known that the ionization efficiency of HESI varies greatly between compounds and compound classes (Kenseth et al., 2020; Ma et al., 2022). Nevertheless, variation in the intensity of a single compound or class of compounds can be qualitatively interpreted as a variation of its concentration. Compared to a mean temperature of 25.5 ± 4.8 °C during sample cluster I, the mean temperature of sample cluster II is noticeably lower at 19.3 ± 4.3 °C. In particular, the sub-cluster including the last six samples of the entire field campaign shows the lowest standardized values and the lowest mean temperature of 16.1 ± 3.3 °C. The correlation of lower temperatures and low standardized values can be explained by the temperature dependency of terpene emissions from plants (Holzke et al., 2006), which results in a lower biogenic SOA burden in the atmosphere.

Compound cluster II - nighttime chemistry
The mean signal intensity of the 134 compounds appearing in compound cluster II (Fig. 7d-f) is mainly caused by CHO (78 %) and CHNO (14 %) compounds. The molecular masses reach up to 440 Da, and the mean bulk composition is C11.9H18.2O5.1. The OSC of these CHO compounds is in the range of −1 to 0.5 (Fig. 7e). Analogously to compound cluster I, the H/C is in the range of 1.2-1.8 (Fig. 7f), indicating biogenic SOA. Log10 C* values range from −7.1 to 6.7 µg m−3 (Fig. 8b), corresponding mainly to LVOC-IVOC with an intensity-weighted mean of 2.3 ± 2.7 µg m−3. The most prominent compound is pinic acid (Aerosolomics database library: α-pinene/β-pinene + O3) with a contribution of about 24 % to the mean signal intensity. It is well described as an α- and β-pinene ozonolysis product (Christoffersen et al., 1998; Glasius et al., 2000).
In contrast to sample cluster I, compounds from nighttime chemistry form a higher ratio of dimers, with oxidation products up to 20 carbon atoms (Fig. 7e). A total of 14 of 23 CHO dimers occurring in compound cluster II are also reported from OFR experiments performed by Kristensen et al. (2016) and agree with our findings from the OFR experiments described in Sect. 3.1. These 14 dimers are responsible for 72 % of the mean dimer signal intensity. Among them are pinyl-diaterpenyl ester C17H26O8 (Aerosolomics database library: α-pinene/β-pinene + O3) and C19H28O7 (Aerosolomics database library: α-pinene/3-carene + O3), which are described as esters between pinic acid and terpenylic acid (Gao et al., 2010) or pinic acid and hydroxypinonic acid (Müller et al., 2008), respectively. Furthermore, Yasmeen et al. (2010) reported pinyl-diaterpenyl ester in nighttime ambient aerosol.
While the main oxidant of VOCs during the day is OH, O3 and NO3 dominate the oxidation process at night, resulting in the formation of organonitrates (Kiendler-Scharr et al., 2016). Accordingly, we observed several nitrogen-containing monomers and dimers that show NO3− in the corresponding MS2 spectra. Hence, about 87 % of the CHNO compounds at night are tentatively identified as organonitrates (R−ONO2).

Compound cluster III - organosulfates
The mean signal intensity of compound cluster III is mainly caused by CHOS compounds (60 %), followed by unidentified (29 %) and CHNOS compounds (7 %). The molecular masses of the compounds range from 120 to 350 Da; those with higher mean signal intensities show only a very limited retention (< 0.9 min) due to high polarity and represent about 72 % of the mean signal intensity. These compounds have ≤ 5 carbon atoms and an OSC between 0 and 4 (Fig. 7h). The generally higher H/C compared to other compound clusters, shown in panel (i), indicates a predominantly aliphatic character of the detected compounds. The saturation vapor pressure (log10 C*) ranges between −8.8 and 5.0 µg m−3 (Fig. 8c), which is noticeably lower compared to compound clusters I and II. The intensity-weighted mean of log10 C* of this cluster is −0.9 ± 2.0 µg m−3. Several of the chemical formulas can be described as isoprene-derived SOA (Brüggemann et al., 2020; Chen et al., 2018; Nestorowicz et al., 2018; Riva et al., 2016; Surratt et al., 2007). Of the 174 CHOS compounds in compound cluster III, 163 MS2 spectra were recorded. Of those, the vast majority (98 %) show the fragment at m/z 96.9601 (HSO4−), indicating an organosulfate functional group (R−OSO3). The remaining 2 % only show m/z 79.9573 (SO3•−). While this sulfur trioxide radical anion can indeed occur in organosulfate fragment spectra (Wang et al., 2019), it could also originate from organosulfonates (R−SO3) (Liang et al., 2020; Liu et al., 2015). Besides one CHNOS compound with high signal intensity at 295 Da and 9.7 min (C10H17NO7S), three further series of isomers appear in the CHOS cluster. We identified six isomers of C5H10N2O11S, four isomers of C5H11NO9S, and three isomers of C5H9NO8S. All three chemical formulas can be described as isoprene-derived SOA (Nestorowicz et al., 2018; Surratt et al., 2007, 2008).
Furthermore, three isomers of C5H9NO7S also appear in this compound cluster but have not yet been described as isoprene-derived SOA. The distinctly increased standardized values of several cluster III compounds in the heatmap between 8 August 2018 at 17:00 (UTC, AS-17) and 10 August 2018 at 05:00 (UTC, AS-19) can be explained by high SO2 concentrations of up to 18 µg m−3 around noon on 8 August 2018 (Fig. S5). The nearby airport can be excluded as a main source because similarly high SO2 concentrations were measured at the Stixneusiedl monitoring station (Umweltbundesamt GmbH, 2021), located 12 km southeast of the airport and thus in the upwind direction of the airport. It is also noticeable that the standardized values of these cluster III compounds are very low during northwest trajectories, analogously to compound cluster I (Sect. 3.3.1). Therefore, it is likely that this cluster is mainly linked to long-range transport of pollution from the southeast.
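The screening of MS2 spectra for diagnostic fragment ions, as used in this study for organosulfates and organonitrates, can be sketched as follows. The m/z values of HSO4− (96.9601) and SO3•− (79.9573) are taken from the text; the value for NO3− (61.9884) is calculated from monoisotopic masses and is not stated in the text. The 5 ppm tolerance, the decision order, and the function names are assumptions for illustration only.

```python
# Diagnostic fragment ions in negative mode (m/z).
DIAGNOSTIC_IONS = {
    "NO3-": 61.9884,    # calculated here, not quoted in the text
    "SO3.-": 79.9573,
    "HSO4-": 96.9601,
}


def has_ion(peaks_mz: list, target_mz: float, ppm_tol: float = 5.0) -> bool:
    """True if any fragment peak lies within ppm_tol of the target m/z."""
    return any(abs(mz - target_mz) / target_mz * 1e6 <= ppm_tol for mz in peaks_mz)


def classify_ms2(peaks_mz: list, ppm_tol: float = 5.0) -> str:
    """Tentative functional-group label from diagnostic ions in one MS2 spectrum."""
    found = {name for name, mz in DIAGNOSTIC_IONS.items()
             if has_ion(peaks_mz, mz, ppm_tol)}
    if {"NO3-", "HSO4-"} <= found:
        return "nitrooxy organosulfate"
    if "HSO4-" in found:
        return "organosulfate"
    if "SO3.-" in found:
        # SO3.- alone does not distinguish R-OSO3 from R-SO3.
        return "organosulfate or organosulfonate"
    if "NO3-" in found:
        return "organonitrate"
    return "unclassified"


print(classify_ms2([61.9884, 96.9601, 294.0]))
```

Such a label remains tentative, since it rests on the presence of fragments alone and not on comparison with authentic standards.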

Compound cluster IV - decreasing temperature
Compounds occurring in compound cluster IV (Fig. 7j-l) show clearly increased standardized values during the last six filter samples of the field campaign and are mainly CHO (73 %) and unidentified (17 %) compounds. The majority of the cluster IV compounds have molecular masses < 200 Da. The OSC of those compounds ranges between −1.5 and 1, while the number of carbon atoms is less than 10 (Fig. 7k). The majority of the 174 compounds have a non-aromatic character, as illustrated in panel (l). Dicarboxylic acids like phthalic acid (C8H6O4, level 1), succinic acid (C4H6O4, level 3), or maleic and fumaric acid (C4H4O4, level 3) are reported as tracers for emissions from biomass burning, vehicular exhaust, and fossil fuel combustion (Zhao et al., 2018). Furthermore, this compound cluster contains the homologous series C4−9H6−16O4, C5−10H8−18O3, and C5−9H10−18O3, which can be interpreted as oxygenated aliphatics of fossil origin.
Compound cluster IV includes log10 C* values between −8.6 and 6.5 µg m−3 (Fig. 8d), corresponding predominantly to SVOC-IVOC with an intensity-weighted mean of 3.9 ± 1.9 µg m−3. The high standardized values in the dendrogram of these compounds in sample sub-cluster i can be attributed to their intermediate volatility: such compounds occur in the condensed particle phase only at low ambient temperatures. During sample cluster I, which is characterized by a higher mean temperature, those compounds would likely partition predominantly into the gas phase.
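The temperature effect described above can be illustrated with standard absorptive partitioning theory, which is not part of the evaluation in this study. In the sketch, the particle-phase fraction is Fp = 1 / (1 + C*/C_OA), and C* is shifted with temperature by a Clausius-Clapeyron expression. The vaporization enthalpy (80 kJ mol−1), the organic aerosol mass concentration (5 µg m−3), the reference temperature (298.15 K), and the function names are hypothetical values chosen only for illustration.

```python
import math

R_GAS = 8.314462618  # J mol-1 K-1


def c_star_at(c_star_ref: float, t_k: float, t_ref_k: float = 298.15,
              dh_vap_kj_mol: float = 80.0) -> float:
    """Shift C* (µg m-3) from t_ref_k to t_k with a Clausius-Clapeyron expression."""
    exponent = (dh_vap_kj_mol * 1e3 / R_GAS) * (1.0 / t_ref_k - 1.0 / t_k)
    return c_star_ref * (t_ref_k / t_k) * math.exp(exponent)


def particle_fraction(c_star: float, c_oa: float) -> float:
    """Fraction of a compound in the particle phase, Fp = 1 / (1 + C*/C_OA)."""
    return 1.0 / (1.0 + c_star / c_oa)


# Compound with log10 C* = 3.9 at the reference temperature (assumed),
# evaluated at the mean temperatures of sample cluster I and the last six samples.
c_star_ref = 10 ** 3.9
c_oa = 5.0  # hypothetical organic aerosol mass concentration in µg m-3
for label, t_celsius in (("25.5 °C", 25.5), ("16.1 °C", 16.1)):
    c_star = c_star_at(c_star_ref, t_celsius + 273.15)
    print(f"{label}: Fp = {particle_fraction(c_star, c_oa):.4f}")
```

Under these assumptions the particle-phase fraction of an IVOC increases as the temperature drops, which is the qualitative behavior invoked for compound cluster IV; the absolute numbers depend strongly on the assumed enthalpy and aerosol loading.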

Challenges with measurements in HESI positive ionization mode
All results in this paper are based on negative ionization mode measurements. For a comprehensive assessment of the chemical composition of the investigated PM2.5, measurements in positive ionization mode ((+)HESI) are essential with respect to relevant compound classes that are considerably better ionized in (+)HESI (e.g., organophosphates, phthalates, pesticides, and others). Issues due to strong fragmentation or ion source adduct formation, which are explained in detail in the Supplement, highlight the need for further optimization of the NTA workflow to prevent false identifications and thus misinterpretations of the results.

Conclusions
A large fraction of ambient PM2.5 consists of anthropogenic and biogenic SOA. The chemical composition of this mixture is highly complex, which hinders the identification and attribution of single molecules to their precursors, potential sources, and formation pathways. In this study, we present two complementary approaches that enable complexity reduction and the identification of the precursors, formation pathways, and partitioning effects of various IVOCs. The introduced Aerosolomics database for compound matching and precursor identification is based on PAM-OFR experiments with five biogenic and four anthropogenic precursors, as well as different oxidizing regimes (OH, O3). In order to validate the method on ambient samples, we applied the database to a set of PM2.5 filter samples. Based on the average composition of these samples, we find that CHO compounds account for the largest proportion, with about 55 % of the mean signal intensity. Approximately 45 % of the CHO signal intensity can be attributed to one of the investigated VOC precursors, which we confirmed by the compound-matching procedure. In this procedure, the compounds are identified based on retention time, exact mass-to-charge ratio, isotopic pattern, and the MS2 fragmentation spectrum.
On a 1-month set of 12 h filter samples, we performed an HCA to reduce the complexity due to the large number of compounds detected. The compounds were clustered when their intensities showed similar behavior over time, which in turn indicates similar sources or (trans)formation pathways. The clustering of the various PM 2.5 filter samples was primarily driven by wind direction, as well as by the diurnal cycle (day or night) and temperature-driven partitioning changes. Known proxies for monoterpene ozonolysis, like pinic acid or α-pinene-derived dimers, were identified in the nighttime compound cluster. The SOA-aging tracer MBTCA appeared in the daytime compound cluster. A large number of sulfur-containing compounds were clustered together, and this cluster was clearly elevated during a southeasterly wind direction. Small IVOCs were clustered and showed high intensities during a cold period. These observations are a proof of principle for the following: (1) the presented Aerosolomics database enables the identification of tracers from the oxidation of different VOC precursors; (2) in combination with HCA, we can attribute different oxidation products to either nighttime or daytime chemistry, identify periods of multiphase-chemistry processes resulting in organosulfate formation, and observe temperature-driven partitioning of IVOCs.
We would like to encourage the community to apply the database to their own samples. Therefore, we provide the database as db-files, msp-files, and csv-files, which allows the application of the database with Compound Discoverer, MZmine 3, or any self-built solution. As a community effort, further input to the database is desirable to improve our understanding of the sources and formation of secondary organic aerosol.
Author contributions. MT wrote the paper, and ALV and MS advised on the writing of the paper. MT and ALV conceptualized the study. MT, FB, FLG, and MS were responsible for the laboratory experiments, sample preparation, and measurements. Data evaluation was done by MT. ALV directed the project administration. All authors commented on the paper and contributed to the scientific discussion.
Competing interests. The contact author has declared that none of the authors has any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.