A new methodology for performing long-term source apportionment (SA) using positive matrix factorization (PMF) is presented. The method is implemented within the SoFi Pro software package and uses the multilinear engine (ME-2) as a PMF solver. The technique is applied to a 1-year aerosol chemical speciation monitor (ACSM) dataset from downtown Zurich, Switzerland.
The measured organic aerosol mass spectra were analyzed by PMF using a small (14
In this study four to five factors were tested for every PMF window. Factor profiles for primary organic aerosol from traffic (HOA), cooking (COA) and biomass burning (BBOA) were
constrained. Secondary organic aerosol was represented by either the combination of semi-volatile
and low-volatility organic aerosol (SV-OOA and LV-OOA, respectively) or by a single OOA when this separation was not robust. This scheme led to roughly 40 000 PMF runs. Full visual inspection of
all these PMF runs is impractical and is replaced by predefined, user-selected criteria, which allow
factor sorting and PMF run acceptance/rejection. The selected criteria for traffic (HOA) and BBOA were the correlation with equivalent black carbon from traffic (eBC
HOA and COA contribute between 0.4–0.7
Atmospheric aerosols are at the center of scientific and political air quality discussions due to their highly uncertain direct and indirect climate effects (IPCC, 2013) and negative impact on human
health (e.g., Peng et al., 2005). Regulatory policies addressing these effects require characterization and understanding of aerosol physicochemical properties, sources and formation processes. During the past years, the study of submicron particulate matter (
Source apportionment of organic aerosol (OA) measured with an AMS and/or ACSM is typically performed using the positive matrix factorization algorithm (PMF, Paatero and Tapper, 1994). PMF solutions describe the complex, time-dependent organic aerosol composition as a linear combination of static factor profiles (for AMS/ACSM data, mass spectra) and their time-dependent contributions. Factors can represent a primary organic aerosol emission (POA) or secondary organic aerosol (SOA).
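To make the bilinear model concrete, the sketch below factorizes a small data matrix X (time points × variables) into non-negative factor contributions G and factor profiles F. It also evaluates the PMF objective Q, which is the sum of squared residuals scaled by the measurement uncertainties σ. This is a minimal plain-Python illustration under stated assumptions. The solver is a weighted multiplicative-update scheme taken from the non-negative matrix factorization literature, not ME-2, and all function names are hypothetical.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def hadamard(a, b):
    """Element-wise product of two equally shaped matrices."""
    return [[u * v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def q_value(x, g, f, sigma):
    """PMF objective: sum over all entries of (residual / uncertainty) ** 2."""
    model = matmul(g, f)
    return sum(((x[i][j] - model[i][j]) / sigma[i][j]) ** 2
               for i in range(len(x)) for j in range(len(x[0])))

def weighted_nmf(x, sigma, p, n_iter=500, seed=0, eps=1e-12):
    """Illustrative solver (NOT ME-2): weighted multiplicative updates with
    weights 1 / sigma**2. G (time x p) and F (p x variables) stay non-negative
    because each update only rescales non-negative entries."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[1.0 / sigma[i][j] ** 2 for j in range(m)] for i in range(n)]
    wx = hadamard(w, x)
    g = [[rng.random() + 0.1 for _ in range(p)] for _ in range(n)]  # random positive start
    f = [[rng.random() + 0.1 for _ in range(m)] for _ in range(p)]
    for _ in range(n_iter):
        ft = transpose(f)
        num, den = matmul(wx, ft), matmul(hadamard(w, matmul(g, f)), ft)
        g = [[g[i][k] * num[i][k] / (den[i][k] + eps) for k in range(p)] for i in range(n)]
        gt = transpose(g)
        num, den = matmul(gt, wx), matmul(gt, hadamard(w, matmul(g, f)))
        f = [[f[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)] for k in range(p)]
    return g, f
```

Unlike ME-2, this scheme offers no control over rotations and no way to constrain profiles. It only shows how time-dependent contributions and static profiles combine to reproduce X, and how Q weights each residual by its uncertainty.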
Many organic source apportionment studies with AMS (see review by Zhang et al., 2011) and ACSM data
(e.g., Aurela et al., 2015; Budisulistiorini et al., 2013; Canonaco et al., 2013; Fröhlich et al., 2015; Li et al., 2017; Minguillón et al., 2015; Reyes-Villegas et al., 2016; Ripoll et al., 2015; Schlag et al., 2016; Sun et al., 2013, 2018; Tiitta et al., 2014; Wang et al., 2017; Zhang et
al., 2019; Zhu et al., 2018) have successfully employed the PMF algorithm. PMF results suffer from
rotational ambiguity (Paatero et al., 2002); i.e., several PMF results exist with a similar goodness of fit. An approximate method for the quantification of the rotational uncertainty, i.e., the amount of rotational ambiguity (Paatero et al., 2014), can be obtained using the global
A crucial limitation of the traditional PMF approach is that the time-dependent variability in the composition of organic aerosol sources cannot be properly modeled with static profiles in a year-long PMF model. Both POA and SOA may have a time-dependent composition. For example, vehicles use different fuel blends in winter and summer (Agrola, 2017), while biomass burning may be dominated by different burning types and/or materials in different seasons, e.g., domestic heating in winter, agricultural waste/residue burning in spring/fall, and wildfires in summer. SOA sources may likewise be affected by seasonal changes in either precursor emissions (e.g., monoterpene emissions increase exponentially with temperature) or physicochemical processes (e.g., gas–particle partitioning, oxidant concentrations) (Hallquist et al., 2009). Among others, Canonaco et al. (2015), Daellenbach et al. (2017) and Sun et al. (2018) showed that ACSM SOA mass spectra exhibit distinct seasonal trends, which need to be considered in the PMF analysis. For Zurich, Stefenelli et al. (2019) and Qi et al. (2019) demonstrated this seasonal variability of SOA characteristics by molecular analysis, with terpene-related SOA dominant in summer and aged wood burning organic aerosol dominant in winter.
Technically, seasonally dependent mass spectra from a given source family, e.g., traffic, biomass burning, or SOA, can be modeled in two ways. First, PMF can be applied to a multi-season dataset, with the time-dependent source composition modeled by multiple factors per source or source class. This is similar to the typical representation of SOA in short-term field campaigns by two factors with different degrees of oxygenation (Zhang et al., 2011). However, such multi-factor representations of seasonal changes are likely to significantly increase the complexity of the PMF solution. The main reason is the rapid increase in the number of factors, which makes the solution difficult to interpret. Second, PMF can be performed over a small, moving time window such that the factor profiles evolve with time while a single factor per source family is maintained. This is likely the better choice for long-term data, owing both to the relative simplicity of the solution and to substantial savings in computational and evaluation time. The moving-window approach is also more compatible with a continuously growing dataset, e.g., for online source apportionment studies. In contrast to classical batch analyses, the entire dataset does not have to be reanalyzed when new data are added. Parworth et al. (2015) have already shown the effectiveness of such an approach, i.e., employing a small, moving PMF window to analyze remote long-term ACSM data with only a few unconstrained aerosol sources/components. However, they did not explore the rotational and statistical uncertainties.
This study presents the analysis of ACSM data measured in Zurich between February 2011 and February
2012. The dataset includes several sources that were difficult to separate using unconstrained PMF; these are constrained in ME-2 using known POA source profiles within a small, rolling time window. This strategy allows us to adequately account for time-dependent variations of the POA and SOA factor profiles. The applied constraining technique allows for a more comprehensive and quantitative
assessment of the rotational uncertainty than the global
An ACSM (Aerodyne Research, Inc., Billerica, MA, USA) was deployed at the Kaserne station, an urban
background station in the city center of Zurich (Switzerland), between February 2011 and February 2012 (Lanz et al., 2007, 2008; Canonaco et al., 2013). The ACSM is an instrument based on Aerodyne aerosol mass spectrometer (AMS) technology but optimized for long-term measurements with minimal maintenance requirements. The ACSM measures the real-time composition of non-refractory
submicron particulate matter, customarily referred to as NR-
The ACSM in Zurich was operated with a scan rate of 1
The meteorological data (temperature, relative humidity, solar radiation, precipitation, wind speed
and wind direction) and trace gases (
Seasonal PMF runs performed on the ACSM data in earlier studies (Canonaco et al., 2013, 2015) showed
three primary OA factors and one to two secondary OA factors contributing throughout the measurement
year. Among the primary OA factors a traffic-related hydrocarbon-like organic aerosol (HOA) factor
was found, which correlated with
ME-2 (Paatero, 1999) is a powerful engine for solving the positive matrix factorization problem
(PMF, Paatero and Tapper, 1994). Model configuration and post-analysis are performed with Source
Finder (SoFi Pro 6.8, Datalystica Ltd., Villigen, Switzerland) within the Igor Pro software environment (Wavemetrics, Inc., Portland, OR, USA) as described in Canonaco et al. (2013). In its bilinear mode,
PMF describes the measured data matrix
To compare
The organic data and error matrices (Allan et al., 2003) are computed using the ACSM local tool
version 1.5.3.2 (Aerodyne Research, Inc., Billerica, MA, USA) in Igor Pro. Weak (signal-to-noise ratio between 2 and 0.2) and bad (signal-to-noise ratio below 0.2) variables were downweighted according
to the recommendations in Paatero and Hopke (2003). The
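The weak/bad classification described above can be sketched as follows. Several choices in this sketch are assumptions made for illustration and are not taken from this study: the per-variable SNR definition (the square root of summed squared signals over summed squared uncertainties), the downweighting factors (2 for weak variables, 10 for bad variables) and the function names. Downweighting is implemented by inflating the uncertainties of a variable, which lowers its weight in Q.

```python
import math

# Assumed downweighting factors (illustrative; not values reported in this study).
WEAK_FACTOR = 2.0
BAD_FACTOR = 10.0

def snr(col_x, col_s):
    """Assumed SNR definition for one variable: sqrt(sum x^2 / sum s^2)."""
    return math.sqrt(sum(v * v for v in col_x) / sum(u * u for u in col_s))

def downweight(x, s, bad_below=0.2, weak_below=2.0):
    """Return (inflated error matrix, labels). Rows are time points and columns are
    variables. Bad variables (SNR < 0.2) and weak variables (0.2 <= SNR < 2) have
    their uncertainties multiplied by BAD_FACTOR and WEAK_FACTOR, respectively."""
    n_rows, n_cols = len(x), len(x[0])
    s_new = [row[:] for row in s]  # leave the input error matrix untouched
    labels = []
    for j in range(n_cols):
        ratio = snr([x[i][j] for i in range(n_rows)], [s[i][j] for i in range(n_rows)])
        if ratio < bad_below:
            label, factor = "bad", BAD_FACTOR
        elif ratio < weak_below:
            label, factor = "weak", WEAK_FACTOR
        else:
            label, factor = "strong", 1.0
        labels.append(label)
        for i in range(n_rows):
            s_new[i][j] *= factor
    return s_new, labels
```

This sketch also assumes how values exactly at the SNR limits of 0.2 and 2 are assigned.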
The new method consists of performing PMF runs on a small, moving window that is shifted across the entire dataset. At each window position, many individual PMF runs are performed, and the resulting runs are accepted or rejected according to predefined criteria. The window is then moved to the next position, and the shift between consecutive window positions is much smaller than the window size itself. The set of all accepted PMF runs determines the final source apportionment solution and is also used to assess model uncertainties.
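The window bookkeeping described above can be sketched as follows. Because the shift is smaller than the window length, consecutive windows overlap, and each time point is covered by several windows whose accepted runs later feed the final solution at that point. The class and function names are hypothetical. The window length and step in the usage are illustrative values in arbitrary time units.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Window:
    start: float  # inclusive
    end: float    # exclusive

def rolling_windows(t_start, t_end, window_len, step):
    """Enumerate window positions that fit entirely in [t_start, t_end].
    Each window spans window_len and is shifted by step (< window_len)."""
    if not 0 < step < window_len:
        raise ValueError("step must be positive and smaller than the window length")
    windows = []
    start = t_start
    while start + window_len <= t_end:
        windows.append(Window(start, start + window_len))
        start += step
    return windows

def windows_covering(t, windows):
    """Indices of all windows that contain time t."""
    return [k for k, w in enumerate(windows) if w.start <= t < w.end]
```

For example, `rolling_windows(0, 30, window_len=14, step=1)` yields 17 overlapping windows, and any interior time point is covered by up to 14 of them.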
The novelty of this method compared to Parworth et al. (2015) lies in the application of ME-2 for enhanced control of the matrix rotations and in the automated application of user-defined criteria to determine the set of accepted PMF runs. Moving properties of the window (window runs) are discussed in Sect. 2.3.1, whereas the main settings of PMF within a window (PMF runs) are described in Sect. 2.3.2.
PMF analysis is conducted on a subset of data defined by a small window that is moved in 1
The model performance in response to
The mathematical metric
A 14
Overview of the rolling mechanism and the repeats of the PMF analysis.
The rolling strategy described above defines a new window after every window shift. Within this new
window, a PMF run, referred to as repeat in the text, is generated via ME-2, which initializes new
seeds,
In the current study, constraints are applied only to profiles of the POA factors, namely traffic (HOA), cooking (COA) and biomass burning (BBOA). The HOA and COA profiles are taken from Crippa et al. (2013), while BBOA is the averaged mass spectrum reported by Ng et al. (2011a). These anchor profiles were also successfully used for the seasonal analysis of the Zurich–Kaserne data (Canonaco et al., 2013, 2015).
Every constrained factor profile applied in a PMF run requires a sensitivity analysis of the
The second strategy, which is used here, exploits a priori information on the sources. If some factors are known to be present and their source profiles are known to some extent, there is no need to explore regions of the solution space in which these factor profiles depart drastically from their realistic anchors.
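One way to express "not departing drastically from the anchor" is the a-value formulation of SoFi (Canonaco et al., 2013). In this formulation, each element of a constrained profile may vary within ±a times its anchor value. The sketch below assumes that formulation. It only checks a profile against the allowed band, whereas ME-2 enforces the constraint during the fit. It also draws one a-value per constrained factor for a repeat. The function names and the a-value grid are hypothetical.

```python
import random

def a_value_bounds(anchor, a):
    """Allowed range per profile element under the assumed a-value formulation:
    f_j in [anchor_j - a * anchor_j, anchor_j + a * anchor_j]."""
    if a < 0:
        raise ValueError("a must be non-negative")
    return [((1 - a) * v, (1 + a) * v) for v in anchor]

def within_bounds(profile, bounds, tol=1e-12):
    """True if every element of the profile lies inside its allowed band."""
    return all(lo - tol <= v <= hi + tol for v, (lo, hi) in zip(profile, bounds))

def draw_a_values(factors, grid, rng):
    """Hypothetical helper: pick one a-value per constrained factor for one repeat."""
    return {name: rng.choice(grid) for name in factors}
```

Restricting a to a small grid keeps each constrained profile near its anchor while still allowing the rotational space around it to be sampled.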
Therefore,
Figure 1b shows an almost flat
The random resampling of the PMF input uses the bootstrap approach for every repeat. A window
comprising 14
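Resampling the PMF input for each repeat can be sketched as a row-wise bootstrap. This sketch assumes that time points (rows) are drawn with replacement and that each data row keeps its matching uncertainty row. It is a generic bootstrap, not necessarily the exact SoFi Pro implementation, and the function name is hypothetical.

```python
import random

def bootstrap_rows(x, s, rng):
    """Draw len(x) row indices with replacement and return the resampled data
    matrix, the matching uncertainty matrix, and the drawn indices."""
    n = len(x)
    idx = [rng.randrange(n) for _ in range(n)]
    x_boot = [x[i][:] for i in idx]  # copies so the original matrices are untouched
    s_boot = [s[i][:] for i in idx]
    return x_boot, s_boot, idx
```

Because rows are drawn with replacement, some time points appear several times in a repeat while others are left out. This is what makes the spread between repeats a measure of statistical uncertainty.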
Manual inspection of all generated PMF runs is impractical and is replaced by an automated procedure based on pre-defined user criteria that (1) identifies and sorts unconstrained factors and
(2) determines whether each PMF run should be accepted or discarded. Examples of user-defined
criteria could include the factor correlation with an external tracer in terms of either the overall time series or diurnal pattern or characteristic temporal features, e.g., a prominent lunch peak for a cooking factor. Modeled PMF factors for which no factor criteria are satisfied, i.e., very poor score values due to factor mixing/swapping or sampling of transient sources not accounted for, typically yield
In addition to determining whether an individual PMF run should be accepted or rejected, the
criteria are used to determine the identity of unconstrained factors. While the positions of
constrained factors within the
Given the large number of PMF runs generated by the rolling window algorithm, the main advantage of this criteria-based inspection is that the complexity of a factor profile and time series is reduced to single values (“scores”). Based on the score plots, potentially promising PMF runs can be further investigated and validated. This significantly improves the efficiency of the PMF analysis, since PMF runs for which the score of any criterion falls below the user-defined threshold (“bad PMF runs”) are discarded. Conventional analyses often select a single PMF run as the optimal description of the dataset. Here, by contrast, the entire set of PMF runs classified as environmentally reasonable is used for the analysis and presentation. This provides a more comprehensive and robust representation of the dataset and supports uncertainty assessment.
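Reducing a factor to a single score can be sketched with two of the criterion types mentioned above. The first is the Pearson correlation of a factor time series with an external tracer. The second is the correlation of their mean diurnal cycles. The function names are hypothetical, and the criteria actually used in this study are those listed in Table 2.

```python
import math
from statistics import mean

def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def diurnal_profile(values, hours):
    """Mean value per hour of day; hours without data are simply absent."""
    buckets = {}
    for v, h in zip(values, hours):
        buckets.setdefault(h, []).append(v)
    return {h: mean(vs) for h, vs in sorted(buckets.items())}

def diurnal_correlation(factor, tracer, hours):
    """Score: correlation of the factor's and the tracer's mean diurnal cycles."""
    pf, pt = diurnal_profile(factor, hours), diurnal_profile(tracer, hours)
    common = sorted(set(pf) & set(pt))
    return pearson_r([pf[h] for h in common], [pt[h] for h in common])
```

Either function turns a whole time series into one number per factor and run. That number is what the score plots display and what the acceptance thresholds are compared against.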
To determine whether an individual PMF run is accepted or rejected, acceptance thresholds are defined for each of the selected criteria. These thresholds are free parameters and must be defined for each criterion separately. A threshold is inferred either from previous studies or from significance tests or similar statistical analyses (see the discussion for the HOA and COA thresholds in Sect. 2.3.4 for such an example).
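The accept/reject rule can be sketched as follows. A run is accepted only if every criterion that applies to it reaches its threshold, and a missing score counts as a failure. The criterion names, threshold values and the per-factor-number threshold tables are hypothetical illustrations. For example, an SV-OOA criterion only makes sense for five-factor runs.

```python
from dataclasses import dataclass, field

@dataclass
class PmfRun:
    run_id: int
    n_factors: int
    scores: dict = field(default_factory=dict)  # criterion name -> score

def passes(run, thresholds_by_n):
    """Accept the run only if all criteria defined for its number of factors
    reach their thresholds; a missing score counts as a failure."""
    thresholds = thresholds_by_n[run.n_factors]
    return all(run.scores.get(name, float("-inf")) >= thr
               for name, thr in thresholds.items())

def accept_runs(runs, thresholds_by_n):
    return [r for r in runs if passes(r, thresholds_by_n)]
```

Keeping the thresholds outside the runs makes it cheap to re-apply a different criteria set to the same stored PMF runs.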
The computational time required for applying the criteria and for the subsequent averaging is typically on the order of minutes to hours on a modern multicore PC, depending on the number of accepted PMF runs. Thereafter, the results can be inspected in real time, allowing the user to efficiently investigate the set of PMF runs and, if needed, test different criteria.
In this study one criterion per factor was defined, although it is possible to apply multiple criteria to the same factor, as each criterion is assessed individually on an accept/reject basis.
Figure 2 shows the criterion scores calculated for each PMF run, with each panel representing an individual factor. The grey points show the scores of all PMF runs, the blue points denote PMF runs that satisfy the threshold of the respective criterion, and the green points represent PMF runs that simultaneously fulfill the thresholds of all criteria. Only these green points are used to compute the final PMF solution. The criteria and their thresholds (blue points in Fig. 2) are also reported in Table 2 (first value).
PMF runs sorted based on the scores (grey points), PMF runs fulfilling the criterion thresholds (blue points) and PMF runs fulfilling criterion thresholds in all criteria (green
points). The five criteria are
In the current study, the thresholds for the criteria of HOA and COA were determined based on
statistical analyses with the help of the results from conventional (no rolling technique) seasonal
PMF from previous studies (Canonaco et al., 2013, 2015). The contributions of HOA and its tracer eBC
Criteria scheme employed in this study. The first value represents the applied threshold
for the final PMF solution and the values in brackets for HOA and COA stand for the threshold
value coming from the seasonal resampling analysis.
NA: not available.
As is frequently the case, no chemical tracers for COA were available in this study. Previous measurements in Zurich (Canonaco et al., 2013, 2015) have demonstrated a strong diurnal pattern for COA, with an increased concentration during lunchtime. As a proxy for COA, the lunchtime COA enhancement is monitored (Table 2).
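The exact definition of the lunchtime enhancement listed in Table 2 is not spelled out in this section. As a hypothetical stand-in, the sketch below scores a COA time series by the ratio of its mean during an assumed lunch window (11:00–13:59 local time) to its mean during all other hours. Both the hour range and the ratio form are assumptions.

```python
from statistics import mean

# Hypothetical lunch window (local hours 11, 12, 13); not taken from this study.
LUNCH_HOURS = range(11, 14)

def lunch_enhancement(values, hours, lunch_hours=LUNCH_HOURS):
    """Ratio of the mean during lunch hours to the mean during all other hours.
    Values > 1 indicate a lunchtime peak."""
    lunch = [v for v, h in zip(values, hours) if h in lunch_hours]
    other = [v for v, h in zip(values, hours) if h not in lunch_hours]
    return mean(lunch) / mean(other)
```

A factor mixed with, or swapped for, a source without a lunch peak would give a ratio near 1 and fall below a suitably chosen threshold.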
The wood burning contribution to black carbon (eBC
Ng et al. (2010) described higher
The SV-OOA criterion is further used to choose between four- and five-factor solutions for each window run. For PMF windows where no five-factor solution with SV-OOA is selected, the set of four-factor solutions in the corresponding window is used instead (green points at zero in Fig. 2e). Finally, the averaging procedure prevents four- and five-factor solutions from being averaged together at the same time point by privileging five-factor solutions. In other words, for any time point covered by accepted PMF runs with both four- and five-factor solutions, only the five-factor solutions are retained.
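The averaging rule for a single time point can be sketched as follows. The inputs are the factor contributions of all accepted runs covering that point. If any five-factor run is present, only five-factor runs are averaged. Otherwise the four-factor runs are used. The function name and input layout are hypothetical.

```python
from statistics import mean

def average_time_point(contributions):
    """contributions: list of (n_factors, {factor_name: value}) pairs from the
    accepted runs covering one time point. Runs with the largest number of
    factors (five rather than four) are privileged; only they are averaged."""
    if not contributions:
        return None
    max_n = max(n for n, _ in contributions)
    selected = [c for n, c in contributions if n == max_n]
    factors = sorted(set().union(*selected))
    return {f: mean(c[f] for c in selected if f in c) for f in factors}
```

This guarantees that a single OOA factor from four-factor runs is never averaged together with SV-OOA and LV-OOA from five-factor runs at the same time point.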
The amount of
Figure 3a shows the time series of each factor for the entire dataset as a mean, averaged over all accepted PMF runs. The data from Fig. 3a are re-averaged to monthly and seasonal means and shown in Fig. 3b and c, respectively. For Fig. 3c, seasons are defined as follows: winter is December–February, spring is March–May, summer is June–August, and fall is September–November.
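Re-averaging the factor time series to monthly and seasonal means with the season definition given above can be sketched as follows. This sketch assumes that means are taken over all time points in each group. With a February 2011 to February 2012 dataset, the winter group therefore pools both winters. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Season definition from the text: DJF, MAM, JJA, SON.
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "fall", 10: "fall", 11: "fall"}

def group_mean(times, values, key):
    """Mean of values grouped by key(time)."""
    groups = defaultdict(list)
    for t, v in zip(times, values):
        groups[key(t)].append(v)
    return {k: mean(vs) for k, vs in groups.items()}

def monthly_means(times, values):
    return group_mean(times, values, lambda t: (t.year, t.month))

def seasonal_means(times, values):
    return group_mean(times, values, lambda t: SEASONS[t.month])
```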
In winter, spring and fall the primary organic aerosols (HOA, COA and BBOA) account for approximately 40 % of the OA, compared to 60 % for the (secondary) oxygenated organic aerosols (SV-OOA, LV-OOA or OOA). In summer the primary fraction decreases to a minimum of
30 %, compared to 70 % for OOA. The relative fractions of HOA and COA are rather constant,
contributing on average between 0.4–0.7
For the remaining seasons the seasonal concentrations of SV-OOA, LV-OOA and OOA comprise
0.3–1.1
The time series of the primary OA factors HOA, COA and to some extent BBOA are rather spiky
(Fig. 3a), underlining a strong influence of local sources. The COA spikes that are present from May
2011 through the end of September 2011 are likely due to local barbecuing events during the evening,
as also observed in an earlier study at this site (Lanz et al., 2007). The highest COA
concentrations are observed in early July 2011, where the NR-
Figure 4 summarizes the weekday (left) and weekend (right) daily cycles for the modeled factors. The daily cycle of HOA follows the averaged daily cycles of the estimated traffic of eBC
(eBC
The weekday
The analysis and further validation of the PMF runs using the criteria-based selection are performed
on the PMF results of the rolling windows, and therefore correlations are performed over 14
Correlation coefficients (
BBOA shows substantial correlation with eBC
High correlations between LV-OOA and
Importantly, the rolling results generally show higher correlations with the external tracers than the conventional seasonal PMF runs do (values in brackets in Table 3). This indicates that the rolling approach generally outperforms the conventional seasonal PMF analysis.
The mean factor profiles of the six modeled sources/components over the entire year are presented in Fig. 5. Error bars show 1 standard deviation of profile variability across the entire measurement year. Note that this variability comprises both the time-dependent variation of the factor profiles and the PMF error (see Sect. 3.5 for more details on the discussion of the errors in this study).
The mass spectra of the six factors. The spectra have been truncated at
A better understanding of the temporal variation of the factor profiles is gained when inspecting
them over time. Figure 6 shows the fractional contributions of
Daily averaged fractions of important AMS/ACSM
This is different for the oxygenated factors. LV-OOA, SV-OOA and OOA for example contain high
The time-dependent mass spectral matrix of the factors is provided in Sect. C of the Supplement, although a detailed analysis is beyond the scope of the current study. Future studies employing this type of analysis should investigate in more detail the changes in individual variables of the factor profiles. This information might provide new insights into seasonal or source-specific markers, which are essential for source apportionment analyses.
Figure 7a and b show the scaled residuals as functions of
This results in a change in the factor profiles of COA, BBOA and SV-OOA (as already stressed in Sect. 3.3). However, the COA, BBOA and SV-OOA profiles roughly 8–10
Within this study, each PMF run combines a random selection of
Probability density functions for the PMF
The data reported in Fig. 8 were first log-transformed, because the untransformed distribution was right-skewed. This skew was mostly due to time points with a low signal-to-noise ratio, which would have had a disproportionately strong impact on the final error calculation in an untransformed (i.e., linear) representation.
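Summarizing a right-skewed, strictly positive sample in log space can be sketched as follows. The sketch computes the geometric mean and the geometric standard deviation, so that a few large values do not dominate the summary. The inputs are generic positive numbers. How the PMF error itself is defined is not reproduced in this sketch, and the function name is hypothetical.

```python
import math
from statistics import mean, stdev

def log_summary(values):
    """Geometric mean and geometric standard deviation of strictly positive
    values, computed from the mean and sample standard deviation of their logs."""
    if any(v <= 0 for v in values):
        raise ValueError("log-transform requires strictly positive values")
    logs = [math.log(v) for v in values]
    return math.exp(mean(logs)), math.exp(stdev(logs))
```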
The techniques described in this study are relevant for long-term source apportionment (SA) studies,
in particular for ACSM data. The stability of the primary profiles (HOA, COA and BBOA) suggests that
they are rather independent of the season and that employing primary OA factors coming from other SA studies (here profiles from an AMS SA in Paris conducted years earlier) using, e.g., the
In general, selection of the rolling window size (
Defining the proper number of factors is particularly important when analyzing transient events, e.g., the Caliente episode. This becomes even more important for automated source apportionment schemes, where the ability of factors to dynamically change and adapt
to the current window run is limited, as is the case for the rolling mechanism presented in this study. During Caliente the variability of
Crippa et al. (2014) already demonstrated for 25 AMS datasets that an
The other remaining free parameters (
Unlike batch-style PMF (i.e., a single PMF run encompassing the entire dataset), here corrections or scaling factors affecting entire rows or columns of the input data matrix should be applied prior to SA analysis. For example, the collection efficiency (CE) parameter applied for ACSM data analysis is
applied to all measured
The PMF errors reported above can likely be reduced by further refinements to the rolling window algorithm. One major limitation is the application of season-specific criteria thresholds; criteria thresholds with a higher temporal resolution are desirable in the future. Another major limitation is that the primary OA factors are present throughout the entire analysis. Similarly to the (de)activation of SV-OOA in this study, future implementations should (de)activate one or more factors as the rolling window evolves. This would allow the model to better cope with complex and dynamic real atmospheric conditions.
A rolling-window PMF algorithm was applied to NR-
PMF runs were conducted where the
The separation between the primary OA factors (HOA, COA and BBOA) and the oxygenated organic aerosol factors (SV-OOA, LV-OOA and OOA) was rather robust throughout the year. HOA and COA were rather constant, whereas BBOA showed a very strong seasonality, with the highest contribution in winter and the lowest in summer. The model separated OOA into SV-OOA and LV-OOA mainly during the warm season (spring and summer) and during a warm episode in the first winter. The strongest changes in the factor profiles were observed for the oxygenated factors SV-OOA and LV-OOA, whereas the primary factors HOA, COA and BBOA showed smaller variations. Hence, the rolling mechanism is essential for properly apportioning the oxygenated organic aerosol fraction.
The model was still able to separate a semi-volatile fraction for the colder seasons based on the
variation in
The rotational and statistical uncertainties were assessed via random
Finally, the free parameters tested and validated in this study, i.e., the 14
Data related to this article are available at
The supplement related to this article is available online at:
All the authors made substantial contributions to the conception, design, analysis and interpretation of the data. All the authors participated in drafting the article or revised it critically for important intellectual content and all the authors gave final approval of the version to be submitted and of any revised version.
Francesco Canonaco, Carlo Bozzetti, Anna Tobler and Yulia Sosedova have also been/are still employed by Datalystica Ltd. during the final development of the main SoFi Pro packages, and Datalystica Ltd. is the official distributor of the SoFi Pro licenses.
This research has been supported by the COST action CA16109 Chemical On-Line cOmpoSition and Source Apportionment of fine aerosoLs (COLOSSAL), the SNF COST project SAMSAM IZCOZO_177063, the SNF project IZLCZ2_169986 Haze pollution in China: Sources and atmospheric evolution of particulate matter (HAZECHINA), the EU Horizon 2020 Framework Programme via the ERA-PLANET and transnational project SMURBS (grant agreement no. 689443), and the Swiss State Secretariat for Education, Research and Innovation (SERI; contract nos. 15.0159-1 and 15.0329-1).
This paper was edited by Mingjin Tang and reviewed by two anonymous referees.