Application of factor and cluster analyses to determine source–receptor relationships of industrial volatile organic odor species in a dual-optical sensing system

Most odor nuisance investigations rely on either human olfactory examination or on-site sampling and analytical techniques, but these methods are often subject to spatial and temporal limitations and are thus impractical for locating odor emission sources. This study developed an alternative approach that combines a dual-optical sensing system, a meteorological station, and factor and cluster analyses to identify and characterize the emission sources of multiple air contaminants. Factor and cluster analyses were employed to establish the emission profile of multiple odorous substances from each emission source, using both receptor and source monitoring data. An open-path Fourier transform infrared (OP-FTIR) receptor path detected concurrent trends of several organic solvents at concentrations higher than the reference odor threshold values, indicating that these compounds were potential causes of odor nuisance. Qualitative source apportionment by factor and cluster analyses suggested that these odorous substances were used as organic solvents in surface coating or painting processes. Closed-cell Fourier transform infrared (CC-FTIR) measurements at two nearby surface-coating companies showed that only one company's stack exhibited the same odorous substance profile as the OP-FTIR receptor path, thereby identifying the major odor emission source. This study demonstrated that the alternative investigative framework can successfully identify emission sources at an industrial odor nuisance site. With the major emission sources identified, future enforcement plans can be implemented to enhance odor investigation efficiency and improve overall air quality.


Introduction
The rapid growth of the economy and industrialization have led to environmental pollution problems, and consequently, an increase in environmental nuisance complaints has been evident in recent years. With more than 93,265 complaints, representing 33.7% of all reported environmental nuisances (Fig. 1), odor nuisances have been ranked as the leading cause of environmental nuisance complaints in Taiwan (Taiwan Environmental Protection Agency, 2017). Volatile organic compounds (VOCs) are among the factors that contribute to odors and trigger various health problems, such as asthma, pneumonia, and bronchitis (Pride et al., 2015). VOCs are also precursors of fine particulate matter in the atmosphere and aggravate photochemical smog in urban areas (Hu et al., 2017; Jathar et al., 2014). As residential areas have gradually expanded into industrial districts, odor nuisance has become another critical problem related to industrial VOC emissions, with a great impact on quality of life.
Identifying the emission sources responsible for VOCs and odors remains a great challenge. Most odor nuisance investigations rely on either human olfactory examination or on-site sampling and analytical techniques (Merlen et al., 2017). However, these methods are hampered by spatial and temporal limitations. The "triangle odor bag method", originally developed by the Tokyo Metropolitan Government in 1972, was adopted by the Taiwanese government as a regulatory enforcement method for odor nuisance investigation. This method quantifies odor nuisance using the olfactory sense of a panel of trained personnel (Higuchi, 2009; Higuchi and Masuda, 2004; Ueno et al., 2009). However, it can only determine the odor intensity of collected air samples and cannot identify the responsible emission sources. Sampling tools such as Summa canisters, Tedlar bags, and charcoal tubes can be combined with conventional fixed-point sampling and analytical methods to measure various VOC odor species (van Harreveld, 2003; Rumsey et al., 2012). However, these methods depend heavily on when and where samples are taken, which makes periodic or occasional odor episodes difficult to capture. This shortcoming of conventional fixed-point sampling and analysis poses a great challenge to regulatory inspectors when odor nuisance occurs intermittently, during nonworking hours, or from multiple sources. As a result, many repeated air pollution complaints remain unresolved because the root pollution sources have not been found.
Fourier transform infrared (FTIR) spectrometry is an optical sensing technology that can detect multiple gaseous pollutants continuously and is therefore suitable for investigating VOC or odor emission sources (Russwurm et al., 1991; Sung et al., 2014). It allows real-time monitoring and simultaneous analysis of several compounds. IR "fingerprints" of over 300 compounds have been established on the basis of information from the US Environmental Protection Agency (USEPA) and FTIR software developers. FTIR systems are of two types, namely open-path and closed-cell systems (USEPA, 2011). The open-path system, also called open-path Fourier transform infrared (OP-FTIR) spectroscopy, is an optical remote sensing technique used to measure VOCs and inorganic compounds such as ammonia and hydrogen chloride in the ambient environment (e.g., fenceline monitoring). The closed-cell system, also called closed-cell Fourier transform infrared (CC-FTIR) spectroscopy, uses the same basic FTIR module as the OP-FTIR system but employs gas pumps and sampling tubes to draw waste gas (e.g., from stack outlets) into a multipass cell attached to the FTIR spectrometer. In this study, the OP-FTIR and CC-FTIR systems were combined into a "dual-optical sensing system" that provides long-range open-path measurement, continuous monitoring, and multispecies measurement of stack exhaust, offering a powerful alternative for investigating VOC or odor emission sources. Because OP-FTIR and CC-FTIR systems generate large speciation datasets in a short period, statistical methods play an essential role in extracting the underlying meaning of the time-series patterns. Multivariate statistical modeling is suitable for processing FTIR data because it analyzes correlations between the time-series trends of different species at different locations. By identifying common contaminants and concurrent trends among the various species measured using both systems, data from the receptors (OP-FTIR) and sources (CC-FTIR) can be compared and analyzed.
The aim of this study was to develop an alternative investigative framework for detecting air pollution sources using a dual-optical sensing system, a meteorological station, and factor and cluster analyses, so that the investigation results can support future emission reductions.

Site description
The Taiwan Environmental Protection Agency (TEPA) frequently receives complaints of odor nuisance at an intersection near an industrial park in southern Taiwan. The odor, described as solvent- or chemical-like, is mostly reported by commuters traveling through or waiting at the traffic signal at this intersection. A sunglass factory (hereafter CY) is located at the northwest corner, a light metal casing factory (hereafter KS) at the southeast corner, and a solar cell manufacturer (hereafter NS) a short distance east of the intersection. Stacks (approximately 15-30 m high) on the rooftop of each factory continuously emit various process gases during operating hours. The chemicals used at both CY and KS are mainly paint-related materials containing organic solvents, such as toluene, xylene, acetone, and ethyl acetate, for surface coating (CRC, 2006). NS mainly uses inorganic materials such as ammonia, silane, and nitric acid for silicon glass processing, generating both primary and secondary air pollutants (e.g., nitrogen dioxide) from high-temperature glass sintering (US Patent 4883521A, 1989).

Sampling techniques
To investigate odor emission sources at this location, an OP-FTIR beam path was deployed at the intersection to mimic the olfactory sense of people traveling through it. The OP-FTIR spectrometer (AirSentry-FTIR, CEREX, USA) used in this study was a monostatic unit equipped with ZnSe beam splitters and liquid nitrogen-cooled mercury cadmium telluride (HgCdTe) detectors; a corner-cube retroreflector (PLX, Inc., USA) was placed at the other end of the beam path. An infrared (IR) beam emitted from the telescope travels to the retroreflector some distance away and is reflected back to the detector inside the instrument, so that pollutants transported across the beam path are measured.
Monitoring was conducted from March 9 to 19, 2015, collecting a total of 2,911 consecutive spectra. The OP-FTIR beam path was 143 m long (one way), with the light emitter at ground level on one side and the retroreflector at a height of 10 m on the other. A meteorological station at a height of 12 m (fourth floor) monitored wind speed and direction alongside the OP-FTIR beam path (see Fig. 2a).
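OP-FTIR retrievals are often expressed as path-integrated column densities (ppm·m), which are divided by the optical path length to obtain the path-averaged concentrations reported in this study. The sketch below assumes that convention and a monostatic geometry, in which the beam travels to the retroreflector and back, so the optical path is twice the 143 m one-way distance. The function name and the 9.5 ppm·m input are hypothetical, and the instrument software may already report path averages directly.

```python
def path_average_ppbv(column_ppm_m, one_way_m=143.0, monostatic=True):
    """Convert a path-integrated concentration (ppm*m) into a path-averaged
    mixing ratio (ppbv). For a monostatic instrument the beam goes out to the
    retroreflector and back, so the optical path is twice the one-way distance.
    """
    optical_path_m = 2.0 * one_way_m if monostatic else one_way_m
    return column_ppm_m / optical_path_m * 1000.0  # ppm -> ppbv


# Hypothetical retrieved column over the 143 m (one-way) beam path.
print(round(path_average_ppbv(9.5), 2))  # 33.22
```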
Wind and OP-FTIR data were measured simultaneously and collected continuously in a synchronized system, enabling identification of the incoming direction of gaseous contaminants and providing spatiotemporal measurements of VOCs or odor pollutants.
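Pairing each 5-min OP-FTIR record with the concurrent wind direction is what allows a concentration to be attributed to an incoming direction, as in the windrose-style results discussed later. A minimal sketch, assuming directions in degrees using the meteorological convention (the direction the wind blows from) and a 16-point compass; the helper names and sample records are hypothetical:

```python
from collections import defaultdict

# 16-point compass, clockwise from north; each sector spans 22.5 degrees.
SECTORS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def sector(direction_deg):
    """Map a wind direction (degrees the wind blows FROM) to a 16-point sector."""
    return SECTORS[int(((direction_deg % 360) + 11.25) // 22.5) % 16]


def mean_by_sector(directions_deg, concentrations):
    """Average concentration per incoming-wind sector for time-aligned records."""
    groups = defaultdict(list)
    for d, c in zip(directions_deg, concentrations):
        groups[sector(d)].append(c)
    return {name: sum(vals) / len(vals) for name, vals in groups.items()}


# Hypothetical 5-min records: wind direction (deg) and one species (ppbv).
print(mean_by_sector([350, 10, 5, 180], [12.0, 20.0, 16.0, 4.0]))
# {'N': 16.0, 'S': 4.0}
```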
Official documents were reviewed to ascertain the raw materials used by each nearby factory. Three potential sources (factories), namely CY, KS, and NS, were targeted for further stack monitoring using CC-FTIR. A gas cell with a 10-m path length, an inner pressure of 720 mm Hg, and an estimated gas flow rate of 0.37 L/s was used for the CC-FTIR multi-reflection gas measurements. Most water vapor was removed with an impinger connected to the gas cell inlet to reduce H2O absorption interference in the FTIR spectra.
The stack exhaust of these three factories was measured for 24 to 72 hours, with data generated at 5-min intervals. This continuous monitoring produced sufficient time-series data for the factor and cluster analyses in the next phase. Two CC-FTIR systems were deployed at each selected emission source to measure the chemical species in the exhaust gas from each stack (see Fig. 2b). Sampling tubes were split into several manifolds at the stack end and joined together before entering the CC-FTIR gas cell. This sampling arrangement allowed waste gas from different stacks to be collected and transferred to the gas cell simultaneously, avoiding the time lags incurred when switching the sampling line from one stack to another. A total of 4,378 spectra were collected from the stack outlets of the three potential odor emission sources: 288 from CY, 2,907 from KS, and 1,183 from NS.

Chemical analysis methods
Any gaseous compound that absorbs in the IR region (approximately 2.5-25 µm) is a potential candidate for FTIR monitoring. The OP-FTIR and CC-FTIR interferograms were recorded at a resolution of 1 cm−1, with 64 co-added scans producing one infrared spectrum at each 5-min interval.
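Two quick conversions help relate these acquisition settings to the spectra: the 2.5-25 µm region corresponds to roughly 4000-400 cm−1 (wavenumber in cm−1 equals 10,000 divided by wavelength in µm), and co-adding scans improves the signal-to-noise ratio by about the square root of the number of scans when the noise is random and uncorrelated, an idealizing assumption. The function names are hypothetical.

```python
import math


def microns_to_wavenumber(wavelength_um):
    """Wavenumber (cm^-1) for a wavelength in micrometres: 1 cm = 10,000 um."""
    return 1e4 / wavelength_um


def coadd_snr_gain(n_scans):
    """Ideal SNR improvement from co-adding n scans, assuming uncorrelated random noise."""
    return math.sqrt(n_scans)


print(microns_to_wavenumber(2.5), microns_to_wavenumber(25))  # 4000.0 400.0
print(coadd_snr_gain(64))  # 8.0
```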
Contaminants of interest were identified and quantified using spectral search software featuring compound-specific analysis and comparison with the system's internal reference spectral library. Because each compound has a unique spectral fingerprint, gaseous pollutants can be identified by comparing the shape, position, and relative peak heights of each measured spectrum with reference spectra. Multicomponent classical least-squares (CLS) techniques were used for quantitative FTIR spectral analysis. Rolling backgrounds were used in OP-FTIR spectral analysis to eliminate baseline shifts caused by changing weather conditions (Hunt, 1995), whereas a "fixed" reference method was used in CC-FTIR spectral analysis.
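Multicomponent CLS treats a measured absorbance spectrum as a weighted sum of reference spectra, one per compound, and solves for the weights (the concentrations) by least squares. The sketch below solves the normal equations for a toy two-compound case. The five-point "spectra" and all names are hypothetical; real FTIR analyses usually also account for interferents such as water vapor and for baseline terms over selected spectral windows, which is omitted here.

```python
def solve(m, b):
    """Solve the square linear system m @ x = b by Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [row[:] + [bi] for row, bi in zip(m, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def cls_concentrations(refs, sample):
    """Least-squares weights c minimizing ||sample - sum_k c[k] * refs[k]||^2.

    refs holds one reference absorbance spectrum per unit concentration for each
    compound, all sampled on the same wavenumber grid as the measured spectrum.
    """
    k = len(refs)
    gram = [[sum(p * q for p, q in zip(refs[i], refs[j])) for j in range(k)] for i in range(k)]
    rhs = [sum(p * q for p, q in zip(refs[i], sample)) for i in range(k)]
    return solve(gram, rhs)  # normal equations (K^T K) c = K^T a


# Hypothetical 5-point reference spectra (absorbance per unit concentration).
toluene_ref = [0.0, 0.2, 1.0, 0.3, 0.0]
xylene_ref = [0.5, 1.0, 0.1, 0.0, 0.0]
sample = [2 * t + 3 * x for t, x in zip(toluene_ref, xylene_ref)]  # 2 units toluene, 3 xylene
print([round(c, 6) for c in cls_concentrations([toluene_ref, xylene_ref], sample)])  # [2.0, 3.0]
```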

Qualitative receptor modeling
Factor analysis was performed with the SAS statistical software package (SAS Institute, Inc., USA) for qualitative receptor modeling, using eigenvector-based factor extraction followed by orthogonal varimax rotation to interpret the large datasets (Johnson, 1998). The factor analysis model expresses each variable as a linear combination of underlying common factors f1, f2, …, fm plus an error term that captures the part of the variable uncorrelated with any of the common factors. For the variables X1, X2, …, Xp in an observation vector X, the m-factor model is given by Eqs. (1)-(4) (Rencher, 2002):

$$X_i = a_{i1} f_1 + a_{i2} f_2 + \cdots + a_{im} f_m + e_i, \qquad i = 1, \ldots, p \qquad (1)$$

$$\mathbf{X} = \mathbf{A}\mathbf{f} + \mathbf{e} \qquad (2)$$

$$\mathbf{X} = (X_1, \ldots, X_p)', \quad \mathbf{f} = (f_1, \ldots, f_m)', \quad \text{and} \quad \mathbf{e} = (e_1, \ldots, e_p)' \qquad (3)$$

$$\mathbf{A} = \begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & \ddots & \vdots \\ a_{p1} & \cdots & a_{pm} \end{pmatrix} \qquad (4)$$

where Xi is the ith chemical species, standardized to mean 0 and unit variance (i = 1, …, p); ai1 to aim are the factor loadings for the ith chemical species, collected in the p × m loading matrix A; f1 to fm are m uncorrelated common factors, each with mean 0 and unit variance; and e contains the error terms, i.e., the residual part of each Xi that is not shared with the other variables.
Because FTIR data contain many intercorrelated variables, all variables must be considered simultaneously to understand the underlying meaning of the measured data.
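Because the factor model works on standardized variables (mean 0, unit variance), the analysis effectively starts from the correlation matrix of the species time series. A minimal sketch with hypothetical, time-aligned concentration records; a statistical package typically performs this step internally, and the function names here are hypothetical:

```python
import statistics


def standardize(series):
    """Rescale a concentration time series to mean 0 and unit (population) variance."""
    mu = statistics.fmean(series)
    sd = statistics.pstdev(series)
    return [(x - mu) / sd for x in series]


def correlation_matrix(data):
    """data maps species name -> time-aligned concentration list (same length for all).
    For standardized variables, the Pearson correlation is the mean of the products."""
    names = list(data)
    z = {name: standardize(values) for name, values in data.items()}
    n = len(z[names[0]])
    matrix = [[sum(a * b for a, b in zip(z[p], z[q])) / n for q in names] for p in names]
    return names, matrix


# Hypothetical 5-min concentration records (ppbv) for three species.
names, r = correlation_matrix({
    "m-xylene": [10.0, 12.0, 30.0, 28.0, 11.0],
    "toluene": [8.0, 9.5, 22.0, 21.0, 8.5],
    "acetone": [15.0, 14.0, 15.5, 14.5, 30.0],
})
for name, row in zip(names, r):
    print(name, [round(v, 2) for v in row])
```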
Variables (VOC or odor substances) with concurrent patterns were grouped into a factor to gain insight into the characteristics of the underlying emission sources. Factors with an eigenvalue greater than one were retained for varimax rotation and factor loading calculation. Variables with absolute factor loadings greater than 0.4 were considered influential (Rencher, 2002); the higher the factor loading above 0.4, the stronger the correlation between the variable (odor substance) and the factor (emission source). The combination of variables in each factor roughly represents the type or characteristics of that factor or source. This method is especially useful when the association patterns at the receptor (measured by ambient OP-FTIR) and at the sources (measured by stack CC-FTIR) are compared with each other, so that mutually corresponding emission sources can be identified.
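The retention and interpretation rules above (eigenvalue greater than one, varimax rotation, absolute loading greater than 0.4) can be sketched in plain Python. This is a minimal sketch and not the SAS procedure used in the study: it assumes principal-component extraction from a correlation matrix, raw (non-normalized) varimax using Kaiser's pairwise-angle update, and a synthetic correlation matrix. All function names are hypothetical.

```python
import math


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    return [list(col) for col in zip(*x)]


def jacobi_eigen(a, sweeps=100, tol=1e-12):
    """Eigenpairs of a symmetric matrix by cyclic Jacobi rotations, largest eigenvalue first."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                rot = [[float(i == k) for k in range(n)] for i in range(n)]
                rot[p][p] = rot[q][q] = c
                rot[p][q], rot[q][p] = s, -s
                a = matmul(transpose(rot), matmul(a, rot))  # zeroes a[p][q]
                v = matmul(v, rot)  # columns of v become eigenvectors
    pairs = [(a[k][k], [row[k] for row in v]) for k in range(n)]
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)


def varimax(loadings, iters=100, eps=1e-10):
    """Raw varimax rotation using Kaiser's pairwise-angle update (no row normalization)."""
    lam = [row[:] for row in loadings]
    n, m = len(lam), len(lam[0])
    for _ in range(iters):
        largest = 0.0
        for j in range(m - 1):
            for k in range(j + 1, m):
                u = [lam[i][j] ** 2 - lam[i][k] ** 2 for i in range(n)]
                w = [2 * lam[i][j] * lam[i][k] for i in range(n)]
                a, b = sum(u), sum(w)
                c = sum(ui ** 2 - wi ** 2 for ui, wi in zip(u, w))
                d = 2 * sum(ui * wi for ui, wi in zip(u, w))
                phi = 0.25 * math.atan2(d - 2 * a * b / n, c - (a ** 2 - b ** 2) / n)
                cos_p, sin_p = math.cos(phi), math.sin(phi)
                for i in range(n):
                    x, y = lam[i][j], lam[i][k]
                    lam[i][j], lam[i][k] = x * cos_p + y * sin_p, -x * sin_p + y * cos_p
                largest = max(largest, abs(phi))
        if largest < eps:
            break
    return lam


def factor_analysis(corr, names, cutoff=0.4):
    """Keep factors with eigenvalue > 1, rotate with varimax, list species with |loading| > cutoff."""
    kept = [(ev, vec) for ev, vec in jacobi_eigen(corr) if ev > 1.0]
    loadings = [[vec[i] * math.sqrt(ev) for ev, vec in kept] for i in range(len(names))]
    rotated = varimax(loadings) if len(kept) > 1 else loadings
    return [[names[i] for i in range(len(names)) if abs(rotated[i][f]) > cutoff]
            for f in range(len(kept))]


# Synthetic correlation matrix: two blocks of co-varying species.
names = ["o-xylene", "toluene", "ethyl acetate", "acetone", "n-butyl acetate"]
corr = [
    [1.0, 0.8, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.8, 0.1, 0.1],
    [0.8, 0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.1, 0.7, 1.0],
]
print(factor_analysis(corr, names))
# [['o-xylene', 'toluene', 'ethyl acetate'], ['acetone', 'n-butyl acetate']]
```

Reading the output the same way as the text: each inner list is one factor's set of influential species, and the combination of species hints at the kind of source behind it.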

Meteorological data
The meteorological data from March 9 to 19, 2015 are shown in Fig. 3. The prevailing wind from March 9 to 14 came from the NNW-N-NNE sector, whereas from March 15 to 18 it came from the SSW-S-SSE-SE-ESE sector. The incoming wind direction changed abruptly from north to south between March 14 and March 15. The integrated wind direction shown in Fig. 3a indicates that the overall wind direction during the 10 days of field monitoring was N-NNE.
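The text does not state how the integrated wind direction in Fig. 3a was computed. One common choice, assumed here, is a vector average: each observation is resolved into east and north components (optionally weighted by wind speed), the components are summed, and the direction of the resultant is reported. Unlike an arithmetic mean of angles, this handles the wrap-around at north correctly (350° and 10° average to 0°, not 180°). The function name is hypothetical.

```python
import math


def vector_mean_direction(directions_deg, speeds=None):
    """Resultant wind direction in degrees (0 = N, clockwise), optionally speed-weighted."""
    if speeds is None:
        speeds = [1.0] * len(directions_deg)
    east = sum(s * math.sin(math.radians(d)) for d, s in zip(directions_deg, speeds))
    north = sum(s * math.cos(math.radians(d)) for d, s in zip(directions_deg, speeds))
    return math.degrees(math.atan2(east, north)) % 360.0


print(round(vector_mean_direction([0.0, 90.0]), 6))  # 45.0
```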

Ambient data from receptor path
Table 1 shows the ambient concentrations of air contaminants measured with the OP-FTIR system at the intersection. The first column lists the 16 species measured along the receptor path (OP-FTIR): acetone, ethyl acetate, ammonia, gasoline, m-xylene, nitrogen dioxide, o-xylene, n-butyl acetate, toluene, propylene glycol monomethyl ether acetate (PGMEA), p-xylene, acetylene, ethylene, butyl cellosolve, carbon monoxide, and nitrous oxide. Fig. 4 compares measured and reference spectra. The concentrations of most species were quantified, except for background species such as carbon monoxide and nitrous oxide. The absolute concentrations of background species cannot be quantified with a rolling background because their background levels are unknown; however, their incremental concentrations can still be calculated to generate concentration trends suitable for factor analysis. A total of 2,911 consecutive spectra were collected during the 10 days of field monitoring, and each compound has its own intrinsic detection limit. The second column shows that ammonia, ethyl acetate, methanol, sulfur hexafluoride, acetone, butyl cellosolve, n-butyl acetate, o-xylene, PGMEA, and ethylene were detected more frequently than the other species. The maximum value for each contaminant is the highest concentration measured within a 5-min period, and all OP-FTIR concentrations are path averages. Among the 16 detected species, the major compounds were gasoline, m-xylene, and nitrogen dioxide, with mean concentrations of 33.21 ± 5.00, 27.96 ± 6.05, and 25.13 ± 3.28 ppbv, respectively. Toluene, isopropanol, o-xylene, dichloromethane, and acetone had mean concentrations ranging from 11.61 to 20.57 ppbv. The concentrations of gasoline, m-xylene, nitrogen dioxide, n-butyl acetate, toluene, and PGMEA were higher than the reference odor threshold values, indicating that these compounds were potential causes of odor nuisance in the intersection zone. These odorous substances are mainly used as organic solvents in surface coating or painting processes.

Evidence of correlation between the substances (concentrations) detected at the receptor site and reported odor nuisance events was obtained using phi coefficients and point-biserial correlations (Gallagher, 2011; Demirtas et al., 2012). The phi coefficient (rphi) measures the correlation between two dichotomous variables: detection of a compound (detected vs. not detected) and perception of odor (odor vs. no odor, as recorded by the local environmental protection agency). The point-biserial correlation (rpb) measures the correlation between one continuous and one dichotomous variable, here the concentration of a compound and the perception of odor (Capelli et al., 2013). An rphi or rpb value close to 1 indicates a strong association between "odor" and "compound". The correlations between "odor" and acetone, ethyl acetate, toluene, PGMEA, and butyl cellosolve were mostly moderate (rphi = 0.50 to 0.67; rpb = 0.30 to 0.45) and statistically significant (p < 0.001). The correlations between "odor" and m-xylene, p-xylene, and n-butyl acetate were relatively weak, although also statistically significant (p < 0.001). These results suggest that acetone, ethyl acetate, toluene, PGMEA, and butyl cellosolve were the odorous substances most likely associated with the recorded odor nuisance events, defined as any solvent smell arising from the intersection zone. The complete time series of the chemical species found at the receptor site, which served as the basis for calculating rphi and rpb, is shown in Fig. 5, with the periods when odor was reported highlighted.

Table 2 summarizes the results of factor analysis for the OP-FTIR receptor path. The first factor (F1_OP) comprised several organic solvents, including m-xylene, p-xylene, o-xylene, ethyl acetate, PGMEA, toluene, and butyl cellosolve, all of which are commonly used as chemical solvents in surface coatings and paints (USEPA, 2009) and could be considered possible causes of odor nuisance because their concentrations were higher than the reference values. The daytime pattern of factor scores for this first group (Fig. 9a) showed higher concentrations and frequencies of occurrence from 14:00 to 22:00, particularly on weekdays, which could explain the higher incidence of odor nuisance complaints during weekday afternoons and evenings. Moreover, these seven species (as represented by factor scores) arrived mostly from the N-NE or NW, with a few from the ESE (Fig. 7a), indicating the upwind location of the emission source(s).
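The two correlation measures used earlier in this section to relate compound detections to recorded odor events can be computed directly. The phi coefficient comes from the 2 × 2 table of detection versus odor report, and the point-biserial coefficient equals the Pearson correlation between a concentration series and a 0/1 odor indicator. A minimal sketch with hypothetical records; the function names are hypothetical, and the significance tests behind the reported p-values are not included:

```python
import math
import statistics


def phi_coefficient(x, y):
    """Phi coefficient between two 0/1 sequences (e.g. compound detected, odor reported)."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom


def point_biserial(values, flags):
    """Point-biserial r between a continuous series (concentration) and a 0/1 series (odor)."""
    ones = [v for v, f in zip(values, flags) if f]
    zeros = [v for v, f in zip(values, flags) if not f]
    n = len(values)
    s = statistics.pstdev(values)  # population SD pairs with the sqrt(n1 * n0 / n^2) term
    gap = statistics.fmean(ones) - statistics.fmean(zeros)
    return gap / s * math.sqrt(len(ones) * len(zeros) / n ** 2)


# Hypothetical 5-min records: odor reported (1/0), compound detected (1/0), concentration.
odor = [1, 1, 0, 0, 1, 0]
detected = [1, 1, 0, 0, 0, 0]
conc = [30.0, 28.0, 10.0, 12.0, 25.0, 11.0]  # ppbv
print(round(phi_coefficient(detected, odor), 3))  # 0.707
print(round(point_biserial(conc, odor), 3))
```

Both functions divide by zero if either variable is constant (for example, a compound that was never detected), so such species would need to be excluded first.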
The compounds included in the second factor (F2_OP) were acetylene, ethylene, gasoline, and carbon monoxide.
Fig. 9b shows the daytime pattern of these five species, with higher concentrations during the weekday peak traffic hours of 06:00 to 09:00 and 17:00 to 20:00. This pattern indicates that the second group of compounds derived from incomplete combustion in vehicles waiting or idling at the intersection, which generates chemical byproducts such as acetylene, ethylene, and carbon monoxide (USEPA, 2000; Liu et al., 2014).
The third-factor (F3_OP) compounds, which are mainly inorganic, came from the NE direction (Fig. 9c) and showed higher concentrations from 06:00 to 09:00 on weekends. The solar cell manufacturer located to the NE, which uses inorganic materials such as ammonia, silane, and nitric acid to produce silicon glass, could generate nitrogen dioxide and nitrous oxide from high-temperature glass sintering (US Patent 4883521A, 1989) and was therefore deemed the potential emission source.
The fourth-factor (F4_OP) compounds, namely acetone and n-butyl acetate (Fig. 9d), also exhibited higher concentrations and greater frequency of occurrence from 6:00 to 10:00 and from 17:00 to 22:00 on weekdays.The incoming direction of these two compounds was mainly from the N-ENE direction (Fig. 7d), which is slightly different from that of the first-factor (F1_OP) compounds.
The four factors were identified and characterized through the combination of species, hours of emission, and incoming direction of each.Four groups of emission sources were identified and categorized using factor analysis, namely surface coating (paint), incomplete engine combustion, solar cell production, and solvent use.
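The daytime patterns described above (Fig. 9a-d) summarize factor scores by hour of day, separately for weekdays and weekends. A minimal sketch, assuming a factor score is already available for each 5-min record (for example, exported from the statistical package); the function name and scores are hypothetical:

```python
from collections import defaultdict
from datetime import datetime


def diurnal_profile(timestamps, scores):
    """Mean factor score keyed by (day type, hour of day); Saturday and Sunday count as weekend."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, s in zip(timestamps, scores):
        key = ("weekend" if t.weekday() >= 5 else "weekday", t.hour)
        sums[key] += s
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical factor scores: March 9, 2015 was a Monday, March 14 a Saturday.
times = [datetime(2015, 3, 9, 14, 0), datetime(2015, 3, 9, 14, 5), datetime(2015, 3, 14, 7, 0)]
print(diurnal_profile(times, [1.0, 3.0, 0.5]))
# {('weekday', 14): 2.0, ('weekend', 7): 0.5}
```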

Comparison of ambient data from the receptor path and source profiles from multiple stacks
The ambient data from the receptor path indicated several factor or source groups at the intersection, including organic solvents from surface coating, traffic emissions from incomplete vehicle engine combustion, and inorganic emissions from solar cell production. Official documents showed that the chemicals used at both CY and KS were paint-related materials containing organic solvents, which matched the first-factor (F1_OP) compounds. However, the windrose diagrams for the first factor (Fig. 4a) showed multiple source directions (N-NE, NW, and ESE), indicating that the first factor (F1_OP) might not be limited to one source; further analysis was therefore required to clarify the sources. To interpret the observations at the receptor path, the emission profiles of the potential sources were compared with the receptor data.
Figure 5 and Table 3 compare the air pollutants detected, and their concentrations, at the receptor path and the source stacks in the intersection zone. Vehicle exhaust profiles from the USEPA's SPECIATE database are also provided in the last column of Table 3 to represent traffic emissions from incomplete vehicle engine combustion at the receptor path. Almost every compound detected along the receptor path corresponded to one or more chemicals from the source stacks, except for traffic-related chemicals (e.g., gasoline, ethylene, and acetylene). The panel plot shows the patterns of association between the receptor (ambient data from OP-FTIR) and the sources (stack source profiles from CC-FTIR). Concentration boxplots for the chemical species (except carbon monoxide) measured with OP-FTIR at the intersection are shown in Fig. 8e: eight species coincide with those found in the CY stacks (Fig. 8a), seven with those in the KS stacks (Fig. 8b), three with those in the NS stacks (Fig. 8c), and six with vehicle emissions (Fig. 8d). Furthermore, six species were found in both the CY and KS stacks, namely ethyl acetate, toluene, o-xylene, m-xylene, p-xylene, and acetone, indicating that these compounds were emitted at both locations. By contrast, butyl cellosolve and PGMEA were found only in the CY stacks.
Ammonia was found at both the KS and NS stacks.
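The species counts above follow from simple set operations on the receptor and stack species lists. The sketch below rebuilds those lists from the species named in this paper (the 16 receptor species in Table 1, the influential species from Tables 4a, 4c, and 4e, and ammonia at KS); it assumes those lists are complete, which the stack tables may not guarantee. It reproduces the reported counts of eight (CY), seven (KS), and three (NS) shared species, the six solvents common to CY and KS, and the two CY-only species.

```python
# Receptor species (Table 1) and stack species as named in the text.
receptor = {"acetone", "ethyl acetate", "ammonia", "gasoline", "m-xylene", "nitrogen dioxide",
            "o-xylene", "n-butyl acetate", "toluene", "PGMEA", "p-xylene", "acetylene",
            "ethylene", "butyl cellosolve", "carbon monoxide", "nitrous oxide"}
stacks = {
    "CY": {"o-xylene", "m-xylene", "p-xylene", "toluene", "PGMEA", "ethyl acetate",
           "butyl cellosolve", "acetone"},
    "KS": {"p-xylene", "toluene", "m-xylene", "o-xylene", "acetone", "ethyl acetate", "ammonia"},
    "NS": {"nitrous oxide", "silane", "ammonia", "nitrous acid", "nitrogen dioxide"},
}

for name, species in stacks.items():
    print(name, len(receptor & species))    # CY 8, KS 7, NS 3
print(sorted(stacks["CY"] & stacks["KS"]))   # the six solvents shared by CY and KS
print(sorted(stacks["CY"] - stacks["KS"]))   # ['PGMEA', 'butyl cellosolve']
```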

Factor and cluster analyses of sources
Because the chemicals used at CY and KS were mainly similar organic solvents, factor analysis was performed separately for each source (CY and KS) to identify the main contributor to the odor nuisance at this location and to examine the relationships between the ambient data and the profiles of these two sources.
Two multivariate statistical methods, factor analysis and cluster analysis, were used together to analyze concurrent trends in the CC-FTIR data measured at the CY, KS, and NS stacks (Table 4). Factor analysis for CY (Table 4a) yielded two factors with eigenvalues greater than one. The influential species (factor loading > 0.4) for the first factor (F1_CY) were o-xylene, m-xylene, p-xylene, toluene, PGMEA, ethyl acetate, and butyl cellosolve, whereas acetone was the only influential species for the second factor (F2_CY). The first factor (F1_CY) contained a combination of solvents used as paint thinners (for plastic coating), whereas the second-factor species (acetone) was used as a solvent to remove residual paint from spray nozzles. Two factors were also identified from the CC-FTIR results for the KS stack (Table 4c). The first factor (F1_KS) comprised p-xylene, toluene, m-xylene, and o-xylene, and the second factor (F2_KS) comprised acetone and ethyl acetate. The first factor (F1_KS) thus contained solvents used as paint thinners (for metal coating), whereas the second factor (F2_KS) contained substances used for cleaning or other purposes in manufacturing light metal casings.
The chemicals from the NS stacks were mainly inorganic materials (nitrous oxide, silane, ammonia, nitrous acid, and nitrogen dioxide) commonly used in solar cell production (Table 4e); none of them corresponded to the organic odorous solvents identified at the receptor site.
Cluster dendrograms link compounds to show their relationships with one another and the interrelationships between groups, providing another way to display correlations between variables. In the cluster analysis results (Tables 4b and 4d), acetone separated from the other chemicals at the first branch, indicating that it originated from a different source. Likewise, the linkage paths between groups of chemicals differed between the two companies, indicating that they may use different types of paint thinner for different purposes. Comparing the factor analyses of the ambient data and the source profiles showed that the grouping of seven odorous compounds (o-xylene, m-xylene, p-xylene, toluene, PGMEA, ethyl acetate, and butyl cellosolve) was identical for the receptor path (OP-FTIR) and the CY stack (CC-FTIR). Thus, the CY stack measurements (CC-FTIR) contained the same odorous compounds as the receptor path (OP-FTIR), all of which arrived from the direction of CY. By contrast, the grouping pattern for KS differed from that of the receptor path, with three key first-factor species (PGMEA, ethyl acetate, and butyl cellosolve) missing from the first factor of the KS stacks (F1_KS).
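The source does not specify the distance measure or linkage rule behind the dendrograms in Tables 4b and 4d. The sketch below assumes a correlation-based distance (1 − r) and average linkage, which are common choices for grouping co-varying species; the correlation values and function names are hypothetical. The last merge in the history is the top branch of the dendrogram, so a species that joins only at that step, such as acetone here, is the one that separates first.

```python
from itertools import combinations


def average_linkage(names, dist):
    """Agglomerative clustering with average linkage.

    dist maps a pair of names (either order) to a distance such as 1 - Pearson r.
    Returns the merge history as (cluster_a, cluster_b, distance), first merge first;
    the last entry is the top branch of the dendrogram.
    """
    def pair_dist(a, b):
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    def cluster_dist(x, y):
        return sum(pair_dist(a, b) for a in x for b in y) / (len(x) * len(y))

    clusters = [frozenset([n]) for n in names]
    history = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        history.append((set(a), set(b), cluster_dist(a, b)))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [a | b]
    return history


# Hypothetical correlations among four stack species, converted to distances 1 - r.
r = {("m-xylene", "toluene"): 0.90, ("m-xylene", "PGMEA"): 0.80, ("toluene", "PGMEA"): 0.85,
     ("acetone", "m-xylene"): 0.10, ("acetone", "toluene"): 0.10, ("acetone", "PGMEA"): 0.10}
names = ["m-xylene", "toluene", "PGMEA", "acetone"]
dist = {pair: 1.0 - value for pair, value in r.items()}
for a, b, height in average_linkage(names, dist):
    print(sorted(a), "+", sorted(b), "at", round(height, 3))
```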
Figure 6 uses scatter plots to display concentration variations of the detected contaminants over time and to compare the interrelationships between odorous compounds at the CY stack (CC-FTIR) and the receptor path (OP-FTIR). For each compound pair, concentrations were linearly correlated, with correlation coefficients mostly greater than 0.7. In contrast, the correlation coefficients for the KS stack were mostly below 0.1, indicating that the relationships between the ambient data and the KS source profiles were much weaker than those for CY.

Conclusion
This study developed an alternative investigative framework for detecting air pollution sources of odor nuisance by simultaneously measuring 16 gas species with FTIR spectroscopy and applying factor analysis to identify and characterize the emission sources of multiple air contaminants. Meteorological data and cluster analysis were used to confirm the identification of the major odor emission source. Different industrial processes were associated with specific combinations of pollutants, which were obtained using two statistical methods, factor analysis and cluster analysis; these analyses also improved the quality and completeness of the source profiles. A field study that used FTIR spectroscopic measurements to locate the emission source of volatile organic odor species near an industrial park in southern Taiwan demonstrated the feasibility of the proposed method. The major odor emission source was identified through qualitative source apportionment with factor and cluster analyses. With more efficient odor investigation, future emission reduction plans can be developed and overall air quality can be improved.

Author contribution
Jen-Chih Yang and Pao-Erh Chang designed the experiments.Jen-Chih Yang performed the spectra analysis of both OP-FTIR and CC-FTIR systems.Jen-Chih Yang and Chi-Chang Ho performed statistical modeling for factor and cluster analyses.Jen-Chih Yang prepared the manuscript with contributions from all co-authors.Chang-Fu Wu supervised the project.The authors declare that they have no conflict of interest.
