Comparison of methods for resolving the contributions of local emissions to measured concentrations

Edwards, Taylor D.; Wong, Yee Ka; Jeong, Cheol-Heon; Wang, Jonathan M.; Su, Yushan; Evans, Greg J.

doi:10.5194/amt-18-2201-2025

Articles | Volume 18, issue 9

https://doi.org/10.5194/amt-18-2201-2025

Research article

16 May 2025

Abstract

To accurately study the characteristics of an air pollution emitter, it is necessary to isolate the contribution of that emitter to total measured pollution concentrations. A variety of published methods exist to complete this task, such as placing measurements upwind of the emitter, employing a distant background measurement station, or applying algorithmic methods that extract a background from the time series of measured concentrations (e.g. wavelet decomposition). In this study, we measured nitrogen oxides (NO_x), carbon monoxide (CO), carbon dioxide (CO₂), and fine particulate matter (PM_2.5) at four sites spanning Toronto, Ontario, Canada. We first characterized the spatial variability of background concentrations across the city and then tested the accuracy of seven different algorithmic methods of estimating true measured upwind-of-emitter backgrounds near Toronto's Highway 401 by using the data collected at a downwind site. These comprised time-series and regression methods, the latter including machine learning (XGBoost). We observed that background concentrations had notable spatial variability, except for PM_2.5. When predicting backgrounds upwind of the highway, we found that a distant measurement station provided an accurate background only during some times of day and was least accurate during rush hours. When testing algorithmic predictions of upwind-of-highway backgrounds, we found that regression models surpassed the performance of time-series methods, with best predictions having R² exceeding 0.8 for all four pollutants. Despite the better performance of regression models, time-series methods still provided reasonable estimates. We also found that emitter-specific covariates (e.g. traffic counts, on-site dispersion modelling) did not play an important role in regressions, suggesting backgrounds can be well characterized by time of day, meteorology, and distant measurement stations.
Based on our results, we provide ranked recommendations for choosing background estimation methods. We suggest that future air pollution research characterizing individual emitters include careful consideration of how background concentrations are estimated.

Please read the corrigendum first before continuing.

Received: 06 Aug 2024 – Discussion started: 17 Oct 2024 – Revised: 13 Feb 2025 – Accepted: 20 Feb 2025 – Published: 16 May 2025

1 Introduction

Across air pollution literature, there is a common distinction between stationary field measurement sites located well away from any known sources that record background pollution concentrations and those that record local concentrations, such as near-road sites, influenced by emissions from nearby “local” sources. Generally, background concentrations are considered to arise from a mix of more distant upwind anthropogenic and natural sources and processes, while local concentrations are impacted by one or more nearby sources of interest. The difference between the concentration of an air pollutant measured at near-source and background sites can be attributed to local emissions. Within this process of apportioning the measured total concentration, the contribution of emissions from nearby sources is referred to as the local or emitted concentration, while that within air masses arriving from upwind of a measurement site is referred to as the background concentration.

Good measures of background concentrations are important for isolating local sources of pollution. Ideal outdoor field measurements would include instruments both up- and downwind of the source of interest such that the source's contribution is the difference between the two. However, this is not always possible: requiring two simultaneous measurements increases instrumentation and operation cost, there may not be an appropriate upwind location to place instruments, and widely varying wind directions might necessitate more than just one upwind–downwind measurement pair. For these reasons, tools for estimating background concentrations (C_bkg) without a second measurement site are valuable. With reliable C_bkg estimates, researchers can isolate continuous measurements of their sources of interest, which is vital for source attribution and measuring emission rates and emission factors.

If measurements immediately upwind of a source of interest are not available, researchers might utilize either an urban background station or tracer species to isolate contributions from sources of interest. Urban background stations are typically within a few kilometres of the study location but are removed from any major nearby sources. These sites might be located in a park or a nearby rural area. Tracer species are those that are specific to the source of interest – if a researcher knows a measured emissions source is the only major nearby source of a particular species, they can be confident their measured source is the only contributor to measured concentrations of that species.

Unfortunately, both approaches, despite their prevalence in the literature, have limitations. Urban background stations might not be completely isolated from all sources, or background concentrations might vary spatially between the urban background station and the study site (particularly in the context of the strict definition of “background concentration” we provide below). For tracer species, in many cases the source of interest cannot be guaranteed to be the only measured contributor. For example, nitrogen oxides (NO_x) are often considered a tracer for traffic emissions, but in a dense urban area measured NO_x concentrations will contain emissions from many different roads, so no single road can be isolated.

Beyond these common approaches, there exist some other methods for estimating background concentrations, particularly for application to continuous time-series measurements of atmospheric pollution. Notable methods include the following:

measuring pollutant concentrations immediately upwind of the source of interest, as mentioned above, such as in highway studies by Zhu et al. (2002), Kohler et al. (2005), and Frey et al. (2022);
designating a geographically distinct measurement station as an urban or regional background, with that station typically having few nearby emissions sources (Hicks et al., 2021; Hilker et al., 2019);
comparing times when a measurement site is up- and downwind of a target source (Hilker et al., 2019);
identifying a background or apportioning sources via wavelet decomposition (Klems et al., 2010; Sabaliauskas et al., 2014; Wei et al., 2019);
an iterative algorithm employed and tested by Wang (2018) and Hilker et al. (2019) that heuristically estimates a background signal similar to that produced from wavelet decomposition, which is termed a pseudo-wavelet (In brief, this method takes a smoothed interpolation of minima in the measured near-source concentration within a moving time window.);
inverse dispersion modelling, where multiple downwind measurements are paired with a dispersion model estimating downwind concentrations given an emission rate (Inverse dispersion modelling approaches are usually applied to measure emission rates from the source of interest, though concentration upwind of the emitter should be produced as a by-product of this calculation (Fushimi et al., 1997; Olaguer, 2022).);
clustering algorithms, where clustering can identify sources by grouping correlated pollutants and may not necessarily delineate between local and background sources (However, Rodríguez et al. (2024) demonstrated a separation of local and non-local sources using a fuzzy clustering algorithm.);
geospatial interpolation from urban background stations, which can estimate the spatial variability of background concentrations, such as in Arunachalam et al. (2014);
localized iterative regression within a time series of concentrations to extract a baseline signal, as described by Ruckstuhl et al. (2012) (However, this study presented a method to further decompose measurements from a background site, implying a definition of background concentration that is geographically broader than what we consider in this study.).

1.1 Defining “background concentration”

To address the limitations of the methods identified above, we propose a definition for background that is useful for isolating emissions sources of interest: background concentrations, C_bkg, are the portions of the total measured concentrations that were not emitted from the local emission source of interest. This definition is similar to the one provided by Arunachalam et al. (2014). With this definition, the total measured concentration, C_meas, is strictly a sum of the local concentration, C_local, and background concentration, C_bkg:

$$C_{meas} = C_{local} + C_{bkg} \qquad (1)$$

As a corollary to this definition, C_local is only the portion of C_meas that was emitted from the source of interest, and thus the local concentration becomes useful for estimating emissions, source characteristics, etc. This definition recognizes that the background concentration may vary across regions such as a city because of the many sources present. At the same time, the background concentration across a city can be relatively homogeneous if much of the background originates from sources or processes well upwind of a city, as is often the case for pollutants such as PM_2.5 and CO₂. Ideally, this background concentration should be measured directly upwind of the source of interest, with no interstitial sources. The up- and downwind measurements should also be near enough to each other and the emissions source that dilution of background concentrations while they travel between the up- and downwind instruments is not of concern. This is the configuration at the highway field site studied here, which had instruments placed up- and downwind of a major urban highway in Toronto, Canada. While it is desirable for the background site to be as close as possible to the emissions source of interest, the nearer the background site is to the emission source, the greater the potential for emissions from that source to contribute at times to the concentrations measured at the background site. We posit that this definition of background concentration lends itself readily to useful measurements of C_local. Accordingly, it is desirable that researchers measuring rates and/or characteristics of emissions sources can estimate C_bkg when direct measurement is not possible, as previously discussed.

We note that this definition differs from existing interpretations of background in air pollution research, where background might be interpreted as either a minimum or baseline concentration or as pollution arising from long-range transport from multiple distant sources (Gómez-Losada et al., 2016, 2018). These existing definitions would imply homogeneous and temporally constant concentrations spread across an entire neighbourhood, city, or region. Measuring such a background concentration might require a rural measurement or an urban measurement isolated from any single source. In our case, we are interested in measuring C_bkg for the purpose of extracting C_local, so emissions from sources other than the targeted emitter are only a problem if they are so nearby as to render the measurement of C_bkg obviously unusable.

1.2 Study outline and objectives

In this study we tested the accuracy of a variety of methods for estimating background concentration at a field site adjacent to a large roadway emissions source. We first qualitatively examined how background concentrations varied across an urban area (Sect. 3.1). We then tested the accuracy of seven algorithms for predicting background concentrations at the near-road site (Sect. 3.2). The algorithmic methods were differentiated into two classes. Frequency methods used the time-series nature of C_meas to predict C_bkg, on the theoretical basis that background concentrations vary on a longer temporal scale than a nearby source and that C_bkg=C_meas at least occasionally. Regression methods were those that incorporated additional covariates measured or estimated at the study site and were regressed to the measured upwind background concentrations. We evaluated the accuracy of each algorithmic estimate of background concentration by temporarily deploying a low-cost air pollution sensor platform to the upwind side of the tested highway site. Finally, we evaluated the relative importance of regression model covariates in estimating background concentrations (Sect. 3.3 and 3.4) and considered limitations (Sect. 3.5).

This study was completed as part of the larger Study of Winter Air Pollution in Toronto (SWAPIT) campaign, a collaborative effort between the academic, government, and private institutions in the Toronto, Ontario, region.

2 Methodology

2.1 Field measurements

We gathered field measurements at four sites throughout Toronto, Ontario, Canada, from 23 November 2023 to 12 April 2024, totalling just over 141 d of measurements. All measurements occurred during winter and early spring conditions in Toronto, when photosynthetic uptake of CO₂ is minimal. The next two sections describe the sampling sites and instruments.

2.1.1 Site descriptions

The primary highway field site was located adjacent to a stretch of Toronto's Highway 401 at UTM 617300 m E 4840900 m N 17N (see A in Fig. 1, top; Fig. 1 bottom). This stretch of highway is one of the busiest in North America, with over 400 000 annual average daily traffic (AADT) counts as reported by the Ontario Ministry of Transportation (2021). It is 17 lanes and 113 m wide, is adjacent to the measurement sites, and runs in a primarily west–east direction, offset 18° towards a southwest–northeast direction. This site included two instrument locations. The first was a permanent roadside station on the south side of the highway that was frequently downwind of the road. The second location was a background sensor placed north of the highway, which was frequently upwind of the road. The north site was designated as the background site based on predominant wind directions and the fact that this site featured a temporarily deployed low-cost sensor platform, while the south site features a permanent air quality station operated by the Ontario Ministry of the Environment, Conservation and Parks. Figure 1 maps this and the remaining study sites.

https://amt.copernicus.org/articles/18/2201/2025/amt-18-2201-2025-f01

Figure 1. Top: locations of measurement sites throughout Toronto region. Bottom: detailed map of the Highway 401 field study site. Bottom inset: wind rose measured at Highway 401 roadside (downwind) station during the study period. Throughout this document, the Highway 401 downwind roadside station is referred to as “highway roadside downwind” or “highway downwind”, and the Highway 401 upwind background site is referred to as “highway upwind background” or “highway upwind”.

In addition to the primary highway site, we recorded pollution concentrations at three additional sites throughout the Toronto area. The first site was the Wallberg urban near-road site, located at the University of Toronto's Wallberg Memorial Building at UTM 629381 m E 4835252 m N 17N (Site B in Fig. 1). This site features a similar set of air pollution instruments to the permanent Highway 401 downwind site and was located 15 m from a major urban road and 40 m from an intersection. The remaining two sites were designated as distant urban background sites, not near any emissions sources of comparable magnitude to Highway 401. The first urban background site was Downsview, located at UTM 623330 m E 4848631 m N 17N (Site C in Fig. 1). This site is in a green space near an office building and is about 175 m from the nearest road. The final site was the Hanlan's Point urban background station, located at UTM 630025 m E 4830061 m N 17N (Site D in Fig. 1). This site is located on an island in Lake Ontario, south of Toronto's downtown core. The Hanlan's Point site is isolated from any nearby sources, with the only notable emissions source being a regional airport over a kilometre to the north. Measurements were collected during winter to early spring, so we expect green space near background sites to have a minimal CO₂-sink effect.

All sites listed here except the highway upwind background site were equipped with a similar set of air contaminant instruments, detailed in the next section.

2.1.2 Airborne pollutants, traffic, and meteorology

We employed a variety of instruments to measure air pollutant concentrations, meteorology, and traffic counts. The instruments deployed at each site except the highway upwind background are listed in Table 1. We selected NO_x (NO + NO₂), CO, PM_2.5, and CO₂ to cover a range of dominant sources: we expect PM_2.5 and CO₂ to have large regional background concentrations, while CO and NO_x are more sensitive to proximity to sources. For PM_2.5, given the dominance of regional transport and secondary formation, and the consequential homogeneity of this pollutant's concentration across urban areas, we expect that differentiating between local and background pollution might be difficult. However, we retained PM_2.5 to serve as a counterexample to the other pollutants, which have greater differences between local and background concentrations.

Table 1. Air pollution, meteorology, and traffic count instruments deployed at each measurement site except the highway upwind background site.

^a Traffic counts were only recorded at the Highway 401 downwind site and only for the nearest eight lanes. LDV – light-duty vehicles, MHDV – medium- and heavy-duty vehicles. ^b PM_2.5 at the Hanlan's Point background station was measured with a Teledyne API T640, while other sites used the Thermo Fisher 5030 or 5030i SHARP.

We acquired additional micrometeorological measurements for dispersion models from various sources, which we detail in Appendix A; we used dispersion model outputs as exogenous variables for regression methods. At the Highway 401 north background site, we deployed a low-cost AirSENCE air pollution measurement system (AUG Signals, Toronto, Canada). This system hosts a variety of low-cost sensor systems to simultaneously measure a variety of pollutants, including the pollutants tested here. Morris et al. (2020) have previously explored the performance of the AirSENCE system.

For PM_2.5 at the Hanlan's Point site, we collected concentrations measured with the Teledyne API T640 rather than the Thermo Fisher SHARP instrument deployed at every other site (again excepting the low-cost instrument upwind of the highway). Zheng et al. (2018) directly compared two T640s to the same model SHARP used here and reported variations of up to 3 to 5 µg m⁻³ in concentration ranges similar to those typically measured here, with the T640 more often reporting slightly higher concentrations than the SHARP. The possibility that PM_2.5 measured at Hanlan's Point may be slightly inflated should be kept in mind when reading results that directly compare concentrations across sites. Presumably, the low-cost sensor-based PM_2.5 we measured north of the highway also deviated from reference instruments by similar or larger amounts; however, as explained below, we produced a corrective calibration for the low-cost sensor platform prior to deployment. We also found that when directly comparing hourly PM_2.5 concentrations between SHARP and T640 instruments across sites used in this study, variation between instruments was similar to variation between sites, suggesting no systematic bias due to instrument differences (Appendix E). Should any disagreement between instruments nevertheless exist, it should only affect our results in cases where measured concentrations are compared directly. Where data were included in regression models, any offset in measured concentration should have a limited impact on regression results, as regression models can account for systematic biases.

We averaged sub-minute measurements to the nearest minute to allow time-matched comparison across the instruments. To ensure the low-cost AirSENCE instruments reported concentrations comparable with reference instruments, we applied multiple quality control and calibration steps prior to analysis. In particular, we addressed calibration and drift in some of the low-cost sensors through comparison with other sites and corrected the low-cost PM_2.5 measurements for hygroscopicity with the correction procedure devised by Crilley et al. (2018). We also placed the AirSENCE device atop the downwind highway station for nearly 18 d at the start of our measurement campaign and used this co-location period to calibrate the AirSENCE's sensors against the station's reference instruments, controlling for interference from humidity, pressure, and temperature. Finally, in some cases for CO and CO₂, to avoid concentration biases between sites due to different instrument calibration schedules, we calculated a 0.1 % rolling percentile concentration at each site and set each site's rolling quantile equal. We describe these preprocessing steps in greater detail in Appendix B.
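The rolling-percentile alignment described above can be illustrated with a minimal sketch. It is not the procedure from Appendix B: the trailing window, the window length, the additive offset, and the function names `rolling_quantile` and `align_baseline` are all assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_quantile(x, window, q):
    """Trailing rolling quantile; the first window-1 entries are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    if len(x) >= window:
        windows = sliding_window_view(x, window)
        out[window - 1:] = np.quantile(windows, q, axis=1)
    return out


def align_baseline(site, reference, window, q=0.001):
    """Shift `site` additively so its rolling low quantile matches `reference`.

    q=0.001 corresponds to the 0.1 % percentile named in the text; the
    additive form of the correction is an assumption of this sketch.
    """
    offset = rolling_quantile(reference, window, q) - rolling_quantile(site, window, q)
    return np.asarray(site, dtype=float) + offset


# Synthetic example: a site that reads a constant 5 units above the reference.
rng = np.random.default_rng(0)
reference = 400.0 + rng.gamma(2.0, 5.0, size=500)
site = reference + 5.0
aligned = align_baseline(site, reference, window=60)
```

In this synthetic case the constant calibration offset is removed wherever a full window is available, while the first `window - 1` samples remain undefined.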

Additional information on some of these same sampling sites and instruments can be found in publications by Wang et al. (2018), Hilker et al. (2019), and Jeong et al. (2020); this list is not exhaustive, and these sites have been employed in a variety of prior air pollution studies.

2.2 Separating measured local and background concentrations at the highway site

To choose when we could consider the difference between near-road and upwind measurements as local concentrations, C_local, we considered the relationship between measured concentrations and wind at the highway site. From Fig. F1 we identified which wind directions to subsample from our measurements to isolate local and background signals: we selected periods where wind direction relative to the road was between 80° to the northwest and 40° to the northeast. The asymmetry in downwind directions relative to the road could be explained by traffic-induced turbulence, which can influence bulk air flow above the road (Hashad et al., 2022). Since the station south of the highway is nearest to an eastbound lane, those lanes might add a westerly component to the observed wind direction. From Fig. F2 we also observe that some downwind roadside (C_meas) and traffic-related ($C_{local} = C_{meas} - C_{bkg}$) concentrations diverged below wind speeds of about 1.0 m s⁻¹. At low wind speeds, measurement of wind direction becomes unreliable, so identifying up- and downwind periods is not possible with stagnant winds. Further, at low wind speeds the likelihood of vehicle-induced turbulence affecting the background measurements increases. To avoid analysing the lowest wind speed periods where these issues might be prevalent, we also restricted highway measurements to non-stagnant winds (i.e. ≥ 1 m s⁻¹).
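As a hedged illustration of this subsampling, the sketch below builds a boolean mask from wind direction and wind speed. It assumes that "relative to the road" means the signed angle from the road's northward normal, and that this normal has a bearing of 342° (derived here from the stated 18° offset of the highway, not given in the text). The constant `ROAD_NORMAL_DEG` and the function names are hypothetical.

```python
import numpy as np

# Assumed bearing of the road's northward normal (degrees clockwise from north).
ROAD_NORMAL_DEG = 342.0


def relative_direction(wind_dir_deg, normal_deg=ROAD_NORMAL_DEG):
    """Signed angle of the wind direction from the road normal, in [-180, 180).

    Negative values lie to the west of the normal, positive to the east.
    """
    wd = np.asarray(wind_dir_deg, dtype=float)
    return (wd - normal_deg + 180.0) % 360.0 - 180.0


def valid_period_mask(wind_dir_deg, wind_speed, normal_deg=ROAD_NORMAL_DEG,
                      west_limit=80.0, east_limit=40.0, min_speed=1.0):
    """True where the wind lies in the selected sector and is non-stagnant."""
    rel = relative_direction(wind_dir_deg, normal_deg)
    in_sector = (rel >= -west_limit) & (rel <= east_limit)
    return in_sector & (np.asarray(wind_speed, dtype=float) >= min_speed)


# Example: wind directions in degrees and wind speeds in m/s.
wd = [342, 270, 250, 20, 30, 342, 180]
ws = [2.0, 2.0, 2.0, 2.0, 2.0, 0.5, 3.0]
mask = valid_period_mask(wd, ws)
```

Under these assumptions, the difference between downwind and upwind measurements would be treated as C_local only where the mask is true.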

2.3 Predicting background concentrations at the highway site

2.3.1 On-site background concentration (C_bkg) prediction methods

We tested nine methods of estimating background concentration measured upwind of the highway: two urban background stations, three frequency methods, three regression methods, and a final ensemble method.

The urban background stations we tested were the same two urban background stations mentioned previously:

The Downsview station is located in an urban area but 175 m from the nearest road (Site C in Fig. 1).
The Hanlan's Point station is located on an island in Lake Ontario, isolated from nearby emissions (Site D in Fig. 1).

We tested three frequency methods:

A naïve rolling minimum, with the rolling window length optimized to minimize prediction error, was included as a minimally simple baseline.
The pseudo-wavelet method was devised by Wang et al. (2018).
Rolling ball background subtraction is common in image processing, where it is used to correct unevenly intense image backgrounds. To our knowledge, this is the first application of a rolling ball algorithm in air pollution research.
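The simplest of these frequency methods, the rolling minimum, can be sketched as follows. This is an illustrative version only: the centred window, the edge padding, and the plain grid search over candidate window lengths are assumptions, and the function names `rolling_min` and `best_window` are hypothetical. The study itself evaluated prediction error with cross-validation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_min(x, window):
    """Centred rolling minimum, edge-padded so output length equals input length."""
    x = np.asarray(x, dtype=float)
    left = window // 2
    right = window - 1 - left
    padded = np.pad(x, (left, right), mode="edge")
    return sliding_window_view(padded, window).min(axis=1)


def best_window(c_meas, c_bkg_true, candidates):
    """Pick the window (in samples) with the lowest RMSE against measured background."""
    c_bkg_true = np.asarray(c_bkg_true, dtype=float)
    rmse = {
        w: float(np.sqrt(np.mean((rolling_min(c_meas, w) - c_bkg_true) ** 2)))
        for w in candidates
    }
    return min(rmse, key=rmse.get), rmse


# Small example: downwind concentrations and a background built from window=3.
c_meas = [5.0, 3.0, 4.0, 1.0, 6.0, 2.0, 7.0]
c_bkg_est = rolling_min(c_meas, 3)
window, errors = best_window(c_meas, c_bkg_est, candidates=[1, 3, 5])
```

A window of one sample returns the measured series unchanged, while longer windows suppress short-lived peaks from the nearby source at the cost of lagging real changes in the background.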

We included three regression methods:

Traditional ordinary least squares (OLS) multiple linear regression was used.
Regularized (elastic net) regression is a linear model with regularization terms to control for overfitting.
Machine learning regression with XGBoost can produce accurate non-linear predictions and has many hyperparameters that can be tuned to control overfitting, degree of variable interaction, model complexity, etc. The XGBoost model has been successfully deployed previously in air quality studies, demonstrating its potential usefulness (Xu et al., 2020b, a). See Appendix C for details on how we specified XGBoost models.

For each regression method we included a variety of predictive covariates in addition to concentration measured downwind of the road, including concentrations measured at the distant urban background stations, traffic count, predictions of pollutant dilution from the RLINE dispersion model, meteorology measured at the Highway 401 site, and more (Snyder et al., 2013). In some cases, we transformed covariates prior to fitting regression models to increase the linearity of the relationship between covariate and measured C_bkg, and for regression models we scaled predictors. Finally, we included one additional ensemble model: this final method was a regularized (ridge) regression using the predictions from each of the prior listed methods as inputs. Extended descriptions of each of the algorithmic methods are provided in Appendix C.
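To make the ensemble step concrete, the following sketch fits a ridge regression in closed form, with each column of the input matrix standing in for the background prediction of one method. It is a minimal illustration, not the specification from Appendix C: the standardization, the unpenalized intercept, the value of `alpha`, and the names `fit_ridge` and `predict_ridge` are assumptions.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression on standardized predictors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns
    Z = (X - mu) / sd
    y_mean = y.mean()
    # Solve (Z'Z + alpha I) b = Z'(y - mean(y)); the intercept is not penalized.
    coef = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ (y - y_mean))
    return {"mu": mu, "sd": sd, "coef": coef, "intercept": y_mean}


def predict_ridge(model, X):
    """Apply a fitted ridge model to new predictor rows."""
    Z = (np.asarray(X, dtype=float) - model["mu"]) / model["sd"]
    return Z @ model["coef"] + model["intercept"]


# Synthetic stand-in: three methods' predictions and a "true" background.
rng = np.random.default_rng(1)
preds = rng.normal(size=(200, 3))
c_bkg = 0.5 * preds[:, 0] + 0.3 * preds[:, 1] + 0.2 * preds[:, 2] + 1.0
unpenalized = fit_ridge(preds, c_bkg, alpha=0.0)
penalized = fit_ridge(preds, c_bkg, alpha=100.0)
```

With `alpha=0` the fit reduces to ordinary least squares, and increasing `alpha` shrinks the weights given to the individual methods, which helps when their predictions are strongly correlated.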

2.3.2 Optimizing prediction methods and evaluating accuracy

Many of the above methods for predicting C_bkg require user-specified parameters. To select these parameters, we applied a similar process across each method. For each algorithmic method, we optimized for parameters that produced the lowest prediction error by either iterating over parameters or via Bayesian hyperparameter optimization (Akiba et al., 2019). In each case we evaluated prediction error with 5-fold cross-validation to control for overfitting. The only exception was OLS, which has no hyperparameters to tune; however, we still evaluated its accuracy with the same cross-validation scheme. Additional details on C_bkg prediction method optimization and evaluation, including details on optimized hyperparameters, cross-validation, and metrics, are included in the Appendix.
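A minimal sketch of 5-fold cross-validated error for an OLS model is given below. The fold construction is an assumption: the folds are randomly shuffled here, whereas the scheme actually used is described in the Appendix, and contiguous blocks may be preferable for autocorrelated time series. The function names and the RMSE metric are likewise chosen for illustration.

```python
import numpy as np


def kfold_indices(n, k=5, seed=0):
    """Split indices 0..n-1 into k shuffled, roughly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)


def fit_ols(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict_ols(beta, X):
    """Predict from OLS coefficients (intercept first)."""
    return np.column_stack([np.ones(len(X)), X]) @ beta


def cv_rmse(X, y, fit, predict, k=5, seed=0):
    """Mean out-of-fold RMSE over k folds."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    folds = kfold_indices(len(y), k, seed)
    errors = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train], y[train])
        resid = predict(model, X[test]) - y[test]
        errors.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(errors))


# Synthetic covariates and a noiseless linear background for demonstration.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = 2.0 + 1.5 * X[:, 0] - 0.5 * X[:, 1]
score = cv_rmse(X, y, fit_ols, predict_ols, k=5)
```

Because `cv_rmse` takes the fit and predict functions as arguments, the same loop can score any of the tunable methods for a candidate parameter setting, and the setting with the lowest score is kept.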

3 Results and discussion

3.1 Geographic variability of urban background concentrations

After defining when a measurement is considered background at the highway site, we first compared average background concentrations at the three sites in the Greater Toronto Area. Figure 2 summarizes average concentrations, while Fig. 3 depicts their diurnal patterns. From these figures, we can directly compare typical levels and daily patterns in background concentrations across a city. Table 2 quantifies geographic and temporal variability in local and background concentrations at the same sites.

https://amt.copernicus.org/articles/18/2201/2025/amt-18-2201-2025-f02

Figure 2. Box-and-whisker plots of minutely concentrations measured at the various sites throughout Toronto. Darker hatched boxes indicate sites near and/or downwind of a road (i.e. non-background sites). Boxes extend to 25th and 75th quantiles; whiskers extend an additional 1.5 interquartile ranges. Middle bars are medians. Note that highway sites were limited to periods with appropriate wind directions and speeds, as described in the methodology.
