Strategies of Method Selection for Fine Scale PM2.5 Mapping in Intra-Urban Area Under Crowdsourcing Monitoring

Fine particulate matter (PM2.5) is of great public concern because of its significant risk to human health. Because fixed monitoring stations are sparse, numerous methods have been developed to estimate PM2.5 concentrations at unobserved locations. Meanwhile, with the rise of low-cost sensing for air pollution monitoring, crowdsourcing activities have gradually been introduced into fine-scale exposure control in cities. However, the optimal mapping method for conventional sparse fixed measurements may not suit this new high-density form of monitoring. This study therefore presents, for the first time, a crowdsourcing sampling campaign and strategies of method selection for hundred-metre-scale PM2.5 mapping in an intra-urban area of China. The crowdsourcing sampling campaign was carried out by a group of volunteers using smartphone applications; the best-performing mapping approach was chosen by comparing three widely used modelling methods (ordinary kriging (OK), land use regression (LUR), and universal kriging combining OK and LUR (UK)) with an increasing number of training sites. Results show that crowdsourcing-based PM2.5 measurements varied significantly by site (i.e. urban microenvironment) (Period 1: 28–136 µg m⁻³; Period 2: 115–266 µg m⁻³) and clearly differed from those at national monitoring sites (Period 1: 20–58 µg m⁻³; Period 2: 146–219 µg m⁻³). Although the performance of all three models in estimating PM2.5 concentrations improved as the number of training sites increased, OK interpolation performed best under non-peak traffic conditions (9:00–11:00) in Period 1 (i.e. the light-polluted period), with the hold-out validation R² ranging from 0.47 to 0.82. Meanwhile, the accuracy of UK was the highest for 8:00 and 12:00 with less than 70% training sites (0.40–0.69) and for all five hours of Period 2 (i.e. the heavy-polluted period) (0.32–0.68). Comparatively, LUR demonstrated limited ability in simulating PM2.5 concentrations (0.04–0.55). Moreover, spatial distributions of PM2.5 concentrations based on the selected model with crowdsourcing data clearly illustrated hourly intra-urban variations that are generally concealed by the results from national air quality monitoring sites. This method selection strategy provides solid experimental evidence for method selection of PM2.5 mapping under crowdsourcing monitoring and a promising route to the prevention of exposure risks for individuals in their daily life.
Atmos. Meas. Tech. Discuss., https://doi.org/10.5194/amt-2018-402. Manuscript under review for journal Atmos. Meas. Tech. Discussion started: 7 January 2019. © Author(s) 2019. CC BY 4.0 License.


Introduction
Fine particulate matter (PM2.5) has been associated with increased risk of morbidity and mortality under both long-term and short-term exposure (Beverland et al., 2012; Cohen et al., 2017; Di et al., 2017; Lelieveld et al., 2017). Nevertheless, the risk ultimately comes down to the persistent cumulative effects of exposure during daily activities, especially daily travel (Kingham et al., 2013; Hankey et al., 2017). It would be very helpful for health protection if individuals could consciously choose the location and time of outdoor activities based on detailed knowledge of the spatiotemporal variation of PM2.5 concentration, and thus exposure.
In situ measurement is the most reliable way to capture the PM2.5 concentrations across every corner of a city in real time.
However, fixed monitoring stations under conventional air quality monitoring networks are sparse. As a result, it has been difficult for site-based observations to capture spatiotemporal variations of air pollutants, especially in intra-urban areas with unevenly distributed emission sources and dispersion conditions (Kumar et al., 2015; Zou et al., 2016; Apte et al., 2017). For this reason, spatial mapping methods including air dispersion modelling, spatial interpolation, satellite remote sensing (RS), and empirical models have been increasingly employed over the past two decades to estimate concentrations of PM2.5 at unobserved locations (Jerrett et al., 2005; Henderson et al., 2007; El-Harbawi, 2013; Kim et al., 2014; Rice et al., 2015; Fang et al., 2016; Zou et al., 2017; Zhai et al., 2018; Xu et al., 2018). Among them, the outputs of dispersion models largely depend on detailed emission inventories and meteorological information, which are not available for many cities. The coarse spatial resolution (≥1-10 km) of satellite instruments and the missing data caused by cloud cover prohibit the widespread use of RS for PM2.5 concentration mapping in urban environments (Zou et al., 2015; Apte et al., 2017).
In contrast, geostatistical and empirical models can estimate concentrations at high spatial resolution with rather low data requirements. The most commonly used ones are ordinary kriging (OK) interpolation and land use regression (LUR) modelling. Moreover, some studies have improved the estimation accuracy through universal kriging (UK) interpolation, which combines these two techniques (Beelen et al., 2009; Mercer et al., 2011; De Hoogh et al., 2018). While these methods have been successfully applied to map the spatial variability of PM2.5 concentrations for various geographic areas, their accuracy varies as concentration levels and sample sizes change (Wang et al., 2012; Mercer et al., 2011; Lee et al., 2014; Zou et al., 2015; Gillespie et al., 2016; Choi et al., 2017; De Hoogh et al., 2018).
With the rise of low-cost sensing for air pollution monitoring, real-time strategies for fine-scale exposure control in cities have been further developed (Kumar et al., 2015). Crowdsourcing activities based on informal social networks and Web 2.0 technologies allow citizens themselves to produce geospatial data, among others (Heipke, 2010). Unlike traditional fixed monitoring stations, which are usually mounted on roofs (i.e. 3 to 20 meters above the ground) to protect the instruments, crowdsourcing activities provide real-time PM2.5 monitoring that reflects the real exposure of individuals living and working on the ground. Although crowdsourcing activities tend to produce observations of questionable quality, they make it possible to obtain measurements of ambient air pollution in dense networks at relatively low cost. Some studies have employed these data to display air pollution concentrations and investigate exposure risks (Thompson, 2016; Miskell et al., 2017; Jerrett et al., 2017). Nevertheless, these observations are still point measurements, which are representative only of the limited area around the site and cannot meet the demand for air pollution concentrations at any time and place.
One way to address this challenge is to combine high-density crowdsourcing observations with spatial mapping methods. One important investigation was carried out by Schneider et al. (2017) in Oslo, Norway. They presented a universal kriging technique for urban NO2 concentration mapping that combines near-real-time crowdsourcing observations of urban air quality with output from an air pollution dispersion model. However, high-density crowdsourcing measurements may vary among urban microenvironments with different daily human activities and may differ from the sparsely distributed conventional in situ measurements. Using the mapping methods selected in previous studies to depict the variation of air pollution at very fine spatial and temporal scales under new forms of monitoring may lead to misclassification of exposure and underestimation of risk. Moreover, as the number of valid crowdsourcing observations may change significantly because of instrument faults, human error, and other quality issues, the applicability of mapping methods to different sample sizes still needs sound scientific evidence.
In this study, we therefore present strategies of method selection for PM2.5 concentration mapping based on crowdsourcing datasets of varying sizes. The intra-urban crowdsourcing sampling campaign was conducted in the city of Changsha, China, over two periods under different pollution scenarios. The performance of OK, LUR and UK in estimating PM2.5 pollution was evaluated and compared with an increasing number of training sites. The best-performing method was then employed to plot the variation of hourly PM2.5 concentration and identify pollution hotspots in the intra-urban area. Results from this study provide evidence for method selection of PM2.5 mapping under crowdsourcing monitoring and contribute to efficient and effective air pollution mapping and exposure assessment in intra-urban areas.

Measurement instrument
Measurements were performed with 86 portable laser air quality monitors. They are convenient to carry, with an overall size of 25×34×14 cm (Fig. 1a). The monitor has the advantages of fast response, rather high resolution (0.1 µg m⁻³) and data consistency. The concentration of particulate matter was measured using the light-scattering method. The monitor contains a dedicated laser module; signals are recorded by the photoelectric receptor when particles pass through the laser light. The count and size of the particles are then analysed by the microcomputer after the signals have been amplified and converted. Finally, concentrations are calculated based on the conversion factor (K-Factor) for airborne dust concentration (Fig. 1b).
To ensure the quality of observations from these monitors, we randomly selected 30 monitors and operated them continuously for two days next to three national monitoring instruments. The scatter plots of hourly measurements for the two types of instrument are presented in Fig. 1c. The rather high coefficient of determination (R² = 0.89) and the low root-mean-square error (RMSE, 5.63) and mean relative error (MRE, 4.48%) support the reliability of the sampling data; more details can be found in the Supporting Information (Table S1).
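The three agreement statistics named above can be computed from paired hourly readings as in the following minimal sketch. It is an illustration only: the function name `agreement_metrics` is hypothetical, R² is assumed to be the squared Pearson correlation, and MRE is assumed to be the mean of |candidate − reference| / reference, since the text does not state the exact definitions used.

```python
import numpy as np

def agreement_metrics(reference, candidate):
    """Return (R2, RMSE, MRE in percent) for paired hourly readings.

    Assumptions (not stated in the text): R2 is the squared Pearson
    correlation and MRE is the mean absolute error relative to the
    reference instrument.
    """
    ref = np.asarray(reference, dtype=float)
    cand = np.asarray(candidate, dtype=float)
    r = np.corrcoef(ref, cand)[0, 1]                  # Pearson correlation
    rmse = np.sqrt(np.mean((cand - ref) ** 2))        # same unit as the data
    mre = np.mean(np.abs(cand - ref) / ref) * 100.0   # percent
    return r ** 2, rmse, mre

# Made-up readings, for illustration only
r2, rmse, mre = agreement_metrics([10, 20, 30, 40], [11, 19, 33, 38])
```

The reference series must be strictly positive for the relative error to be defined.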

Sampling design
The sampling area is located in the Changsha metropolitan area (112°49′-113°14′E, 27°58′-28°24′N), which covers approximately 920 km² and consists of seven districts (see Fig. 2). Changsha is the capital of Hunan Province, with a population of more than 7 million people. The area experiences high exposure to air pollutants because of growing anthropogenic activity and intensive energy consumption.
To ensure that the sampling sites were relatively evenly distributed and representative of different urban microenvironments (i.e. residential community, building site, school, park, etc.), a series of rules was designed to set up the PM2.5 sampling network based on local geographical and meteorological conditions related to air quality, as well as the distribution of potential emission sources (see Table 1). The data supporting the sampling design consist of important points of interest (POI), dust surfaces, and the main road network. All these data were collected from the Information Center of Department of Land and Resources of Hunan Province.
A total of 208 PM2.5 sampling sites were selected.

Sampling and data processing
Sampling was carried out in two time periods in the winter of 2015 to examine the effect of air quality grade on the mapping results. The first period was between 8:00 and 12:00, representing a light-polluted period (Period 1). The second period was between 14:00 and 18:00, when an orange haze warning (i.e. a high pollution level) was issued by the Changsha Meteorology Bureau, representing a heavy-polluted period (Period 2). At each monitoring site, samples were uploaded two to three times per hour through a smartphone application (App) by a group of volunteers. The geographic coordinates of the sampling sites were uploaded as well. To represent the hourly pollution level of the sampling area, we averaged the valid PM2.5 concentrations measured at all sites. In total, 179-208 samples were successfully collected in each hour of Period 1 and 105-118 samples in each hour of Period 2. The official observations at 10 national monitoring sites in the study area were also obtained (China Environmental Monitoring Center, CEMC: http://106.37.208.233:20035/) and averaged for comparison purposes.

Ordinary kriging
Ordinary kriging (OK) estimates the target variable at an unsampled location as a linear combination of neighboring observations. It relies on a weighting scheme in which closer observations have a greater impact on the final prediction. The weighting scheme is dictated by the variogram (Pang et al., 2010; Zou et al., 2015). It can be described as follows:

$$Z^{*}(X_0) = \sum_{i=1}^{n} \omega_i Z(X_i)$$

where $Z^{*}(X_0)$ is the estimate at the unknown sample point $X_0$, $Z(X_i)$ and $\omega_i$ are the value of the $i$th known sample point surrounding the unknown sample point and its corresponding weight, respectively, and $n$ is the number of known sample points.
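The weighted-sum form of OK can be illustrated with a small NumPy sketch. This is not the ArcGIS implementation used in the study: the exponential variogram and its parameters (`sill`, `rng`) are placeholder assumptions, the function names are hypothetical, and coordinates are assumed to be projected (metres). The weights are obtained by solving the standard OK system, in which a Lagrange multiplier forces them to sum to one.

```python
import numpy as np

def exp_variogram(h, sill=1.0, rng=1000.0):
    """Exponential semivariogram without nugget (placeholder parameters)."""
    return sill * (1.0 - np.exp(-3.0 * np.asarray(h, dtype=float) / rng))

def ordinary_kriging(xy, z, xy0, variogram=exp_variogram):
    """Estimate Z*(X0) = sum_i w_i Z(X_i); returns (estimate, weights)."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    xy0 = np.asarray(xy0, dtype=float)
    n = len(z)
    # Pairwise distances between the known sample points
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    # Kriging system: semivariances bordered by the unbiasedness constraint
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = variogram(np.linalg.norm(xy - xy0, axis=1))
    w = np.linalg.solve(A, b)[:n]   # last entry is the Lagrange multiplier
    return float(w @ z), w

# Four made-up sites on a 100 m square; estimate at the centre
sites = [[0, 0], [100, 0], [0, 100], [100, 100]]
values = [10.0, 20.0, 30.0, 40.0]
estimate, weights = ordinary_kriging(sites, values, [50, 50])
```

In practice the variogram model and its parameters would be fitted to the empirical semivariances of each hour's observations rather than fixed in advance.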

Land use regression
Land use regression (LUR) modeling predicts air pollution concentrations by linking measurements at monitoring sites to the geographic elements around them using the least squares method. It comprises predictor variable extraction and selection as well as regression modelling and validation.
Geographic factors including pollution sources (dust surfaces and polluting industries), road networks, and land use/cover were employed to indirectly characterize PM2.5 emissions in this study. These data were generated using multiple ring buffers with different radii (50-1000 m) at each monitoring site. Meteorological data (wind speed, atmospheric pressure, relative humidity, temperature) that might affect the dispersion of PM2.5, with a spatial density of roughly 0.4 sites per 100 km², were obtained as well. Geographic factors were made available by the Information Centre of Department of Land and Resources of Hunan Province. Meteorological data were released by the Hunan Meteorology Bureau. All variables (Table 2) were extracted using ArcGIS (version 10.0). The optimal buffer radii for the percentage of dust surfaces and land use, the density of polluting industries, and road density were defined based on the maximum Pearson correlation coefficients.
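The buffer-radius screening step can be sketched as follows. The sketch assumes that each candidate predictor has already been extracted at every site for every radius, and that radii are ranked by the absolute value of the Pearson correlation with the measured PM2.5; the text only says "maximum", so the use of the absolute value is an assumption, as is the function name `best_buffer_radius`.

```python
import numpy as np

def best_buffer_radius(pm25, values_by_radius):
    """Pick the buffer radius whose predictor correlates best with PM2.5.

    values_by_radius maps a radius in metres to the per-site predictor
    values extracted with that buffer (same site order as pm25).
    """
    pm25 = np.asarray(pm25, dtype=float)
    scores = {}
    for radius, values in values_by_radius.items():
        scores[radius] = np.corrcoef(pm25, np.asarray(values, dtype=float))[0, 1]
    # Assumption: rank by |r| so that strong negative predictors are kept
    best = max(scores, key=lambda radius: abs(scores[radius]))
    return best, scores

# Made-up road densities at five sites for two candidate radii
radius, correlations = best_buffer_radius(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    {50: [5.0, 1.0, 4.0, 2.0, 3.0], 500: [2.0, 4.0, 6.0, 8.0, 10.0]},
)
```

The same screening would be repeated independently for each predictor family (dust surfaces, land use, industry density, road density).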
An automatic forward-backward stepwise regression procedure was used to select the best-fitting LUR models based on the screened predictors. The final LUR models in this study were determined based on the criteria of the lowest Akaike information criterion (AIC) value and the highest fitting R². The model structure can be expressed as follows:

$$PM_{2.5,s} = \beta_0 + \sum_{i=1}^{n} \beta_i X_{s,i} + \mu$$

where $PM_{2.5,s}$ is the estimate of the hourly averaged PM2.5 concentration at site $s$, $X_{s,i}$ ($i = 1, 2, \cdots, n$) are the independent variables, $\beta_0$ is a constant, $\beta_i$ ($i = 1, 2, \cdots, n$) are the regression coefficients, and $\mu$ is the random error, with the model fitted using the least squares method.
This process was conducted in R statistical software (version 3.3.2) (Fox and Weisberg, 2011; R Core Team, 2016).
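AIC-driven predictor selection can be illustrated with the simplified sketch below. It is written in Python rather than R and performs forward selection only, so it is a reduced stand-in for the forward-backward procedure described above, not a reproduction of it. The helper names `ols_aic` and `forward_select` are hypothetical, and the AIC is computed for a Gaussian least-squares fit up to an additive constant.

```python
import numpy as np

def ols_aic(X, y, cols):
    """AIC (up to a constant) of an OLS fit on the chosen columns plus intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]

def forward_select(X, y):
    """Add one predictor at a time while the AIC keeps decreasing."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    selected, best = [], ols_aic(X, y, [])
    while True:
        trials = [(ols_aic(X, y, selected + [j]), j)
                  for j in range(X.shape[1]) if j not in selected]
        if not trials or min(trials)[0] >= best:
            return selected, best
        best, j = min(trials)
        selected.append(j)

# Synthetic demonstration: only column 1 truly drives the response
gen = np.random.default_rng(0)
X_demo = gen.normal(size=(60, 4))
y_demo = 3.0 + 2.0 * X_demo[:, 1] + gen.normal(scale=0.1, size=60)
chosen, aic = forward_select(X_demo, y_demo)
```

A full forward-backward variant would also try dropping each already selected predictor at every step and keep whichever move lowers the AIC most.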

Universal Kriging
Universal Kriging (UK) in this study is a two-stage statistical procedure. First, separate standard LUR models were developed for each hour based on the crowdsourcing observations in the training dataset. Second, the residuals of the LUR models were calculated and interpolated for each hour using OK. Finally, the estimated residuals at the validation sites were extracted and added to the LUR estimates.
In this study, OK was performed using the Geostatistical Analyst Tool of ArcGIS (version 10.0), and interpolated residuals were obtained using the Extract Values to Point Tool. The whole process was implemented with Python scripts.
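The two stages described above (a regression trend plus kriged residuals) can be sketched in plain NumPy as follows. This is an illustrative stand-in for the ArcGIS-based workflow, not the scripts used in the study: the function names are hypothetical, the exponential variogram parameters are placeholders rather than fitted values, and coordinates are assumed to be in metres.

```python
import numpy as np

def _gamma(h, sill=1.0, rng=1000.0):
    """Exponential semivariogram without nugget (placeholder parameters)."""
    return sill * (1.0 - np.exp(-3.0 * h / rng))

def _krige(xy, values, xy0):
    """Ordinary kriging estimate of `values` at a single location xy0."""
    n = len(values)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = _gamma(np.linalg.norm(xy[:, None] - xy[None, :], axis=2))
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = _gamma(np.linalg.norm(xy - xy0, axis=1))
    return float(np.linalg.solve(A, b)[:n] @ values)

def lur_plus_kriged_residuals(xy_tr, X_tr, y_tr, xy_va, X_va):
    """Stage 1: least-squares LUR trend. Stage 2: OK of the LUR residuals."""
    xy_tr, X_tr, y_tr = (np.asarray(a, dtype=float) for a in (xy_tr, X_tr, y_tr))
    xy_va, X_va = np.asarray(xy_va, dtype=float), np.asarray(X_va, dtype=float)
    A_tr = np.column_stack([np.ones(len(y_tr)), X_tr])
    beta, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    residuals = y_tr - A_tr @ beta
    trend_va = np.column_stack([np.ones(len(X_va)), X_va]) @ beta
    resid_va = np.array([_krige(xy_tr, residuals, p) for p in xy_va])
    return trend_va + resid_va   # LUR estimate plus interpolated residual
```

Because the sketch uses no nugget, predicting at a training location returns the training observation exactly, which is a convenient sanity check before running a hold-out validation.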

PM2.5 concentration mapping
Based on the best modelling method with 90% training sites, maps of the spatial distribution of PM2.5 concentration were estimated for each hour. In this study, nearest-neighbor distances between sampling sites range from 15 to 60 meters for Period 1 and from 54 to 98 meters for Period 2. Considering the resolutions of the potential predictors, 100 meters was therefore used as the mapping grid size.
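A 100 m prediction grid of the kind described above can be generated as in this short sketch. It assumes a projected coordinate system in metres and places prediction points at cell centres; the function name `make_grid` and the example extent are hypothetical, not the actual study-area bounds.

```python
import numpy as np

def make_grid(xmin, ymin, xmax, ymax, cell=100.0):
    """Return an (N, 2) array of cell-centre coordinates covering the extent."""
    xs = np.arange(xmin + cell / 2.0, xmax, cell)
    ys = np.arange(ymin + cell / 2.0, ymax, cell)
    gx, gy = np.meshgrid(xs, ys)
    return np.column_stack([gx.ravel(), gy.ravel()])

# Hypothetical 1 km x 0.5 km extent, for illustration only
grid = make_grid(0.0, 0.0, 1000.0, 500.0)
```

Each grid point would then be passed to the selected model (OK or UK) to obtain the hourly concentration surface.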

Model performance for OK, LUR and UK
The box plots in Fig. 4 show the variation of the hold-out validation R² for the three mapping approaches in relation to the number of training sites. Overall, the variability ranges and average values of R² for OK, LUR and UK were positively associated with the number of training sites. UK outperformed OK significantly for all five sampling hours of Period 2, but only for 8:00 and 12:00 of Period 1 when the number of training sites was smaller than a certain threshold. LUR demonstrated the poorest performance in both periods.
The average and standard deviation of RMSE and MRE between the observed and predicted PM2.5 concentrations in hold-out validation, presented in the Supporting Information (Tables S4-S5), further demonstrate the better performance, yet larger variation, of the three methods with larger training data sets. The average RMSE and MRE of OK and UK were close to each other and significantly smaller than those of LUR. Meanwhile, those of OK were generally smaller than those of UK in Period 1, while the opposite held in Period 2.
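A hold-out evaluation with an increasing share of training sites can be organised as in the sketch below. The training fractions, the number of random repeats, and the callback interface are assumptions made for illustration; the text does not specify them. `predict` stands for any of the three mapping methods, wrapped so that it fits on the training indices and returns predictions for the validation indices.

```python
import numpy as np

def holdout_r2(y, predict, fractions=(0.5, 0.7, 0.9), repeats=20, seed=0):
    """Mean and standard deviation of hold-out R2 per training fraction.

    predict(train_idx, val_idx) must return predictions for val_idx.
    R2 is taken as the squared Pearson correlation (an assumption).
    """
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    results = {}
    for fraction in fractions:
        n_train = int(round(fraction * len(y)))
        scores = []
        for _ in range(repeats):
            order = rng.permutation(len(y))          # random site split
            train, val = order[:n_train], order[n_train:]
            pred = np.asarray(predict(train, val), dtype=float)
            scores.append(np.corrcoef(y[val], pred)[0, 1] ** 2)
        results[fraction] = (float(np.mean(scores)), float(np.std(scores)))
    return results
```

Plotting the per-fraction score distributions side by side for each method would give box plots of the kind the text describes for Fig. 4.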

Spatial patterns of crowdsourcing PM2.5 concentration
Figure 6 shows the spatial distributions of PM2.5 concentrations in the study area, estimated by the best-performing mapping method with 90% training sites for each of the two periods (i.e. OK for Period 1 and UK for Period 2), from both crowdsourcing sampling sites and national monitoring sites. Significant differences can be found between the two sources, and the crowdsourcing-based hourly PM2.5 concentration maps clearly demonstrate more detailed intra-urban variations than the national monitoring-based ones, especially during Period 1.
For Period 1, the OK-interpolated PM2.5 concentration from national monitoring sites (Fig. 6b) was less than 35 µg m⁻³ for almost all of the study area, whereas that from crowdsourcing sampling sites (Fig. 6a) generally shows a three-step increase from the south-east to the north-west and multiple hot spots of PM2.5 concentration. Within each zone of this increase, areas with more factories and major roads experienced relatively higher PM2.5 concentrations than other areas. For Period 2, with the exception of 14:00, the national monitoring-based PM2.5 concentration maps estimated by UK (Fig. 6d) showed a high-east and low-west pattern, with lower PM2.5 concentrations (<175 µg m⁻³) centered on the center of Yuelu district. However, the crowdsourcing-based PM2.5 concentrations estimated by UK (Fig. 6c) demonstrate extensive cold spots not only in southern Changsha County but also in southern Kaifu district, while southern Yuelu and western Tianxin, with their high density of factories, were hot spots of PM2.5 concentration.

Discussion
Aiming at efficient and effective fine-scale mapping of PM2.5 concentration in intra-urban areas under crowdsourcing monitoring, a high-density crowdsourcing sampling campaign and strategies for selecting among popular mapping methods with an increasing number of training sites were presented in China for the first time.
The numbers of sampling sites were 18 and 10 per 100 km² for Period 1 and Period 2, respectively. This is a tremendous improvement in comparison with the density of about 0.015 sites per 100 km² in the national air quality monitoring network in China. As expected, crowdsourcing-based PM2.5 measurements varied significantly by urban microenvironment.
Meanwhile, the average and variability range of PM2.5 observations for each sampling hour in this study were found to be significantly different from those obtained at the 10 national PM2.5 monitoring sites. These findings suggest that the national air quality monitoring sites are relatively inadequate and inaccurate for ground-level exposure risk assessment of PM2.5 in urban environments. Crowdsourcing-based sampling, on the other hand, can reduce the monetary cost of sampling while ensuring that data are collected at ground level. It could thus be an effective alternative for high-density PM2.5 monitoring and strongly support short-term air pollution exposure assessment for epidemiologic studies.
However, the difference in PM2.5 observations between crowdsourcing sampling sites and national monitoring sites varies with air quality grade. The crowdsourcing observations were significantly larger than the national monitoring ones during Period 1 (light-polluted), while the opposite was clearly the case during Period 2 (heavy-polluted). This suggests the inconvenient truth that the exposure risk remains relatively high for the public when official air pollution levels are "Good" (i.e. no health implications) and "Moderate" (i.e. members of sensitive groups should reduce outdoor activities), and that this risk differs significantly across the urban area. Meanwhile, it further confirms the necessity of developing ground-level high-density PM2.5 monitoring networks. The variation of the PM2.5 difference between the two periods may be due to a change in the major pollution sources in the study area: the major contribution of local sources, especially vehicle emissions, during the light-polluted period may lead to the accumulation of PM2.5 near the ground, whereas long-range transport of regional pollution during the heavy-polluted period could increase the concentration of PM2.5 in the upper layer.
Unlike previous studies that compared the performance and corresponding exposure assessments of OK, LUR and UK in estimating air pollution concentrations at annual and seasonal scales based on measurements from static and sparse fixed stations (Beelen et al., 2009; Mercer et al., 2011; Lee et al., 2014; Zou et al., 2015; Choi et al., 2017; De Hoogh et al., 2018), this is the first study to evaluate and compare their performance with an increasing number of training sites at the hourly scale for different air quality grades under crowdsourcing monitoring.
As expected, the performance of all three methods improved as the number of training sites increased. In line with earlier studies, mostly conducted in other fields (e.g. spatial variability analysis of soil components in the environmental sciences) (Li and Heap, 2014), this study further confirmed the better performance of OK interpolation with larger training data sets in air pollution estimation. Meanwhile, using ground-level PM2.5 measurements, we substantiated the finding of Johnson et al. (2010) that LUR models developed with fewer sampling sites may perform poorly. However, the average hold-out validation R² (0.04-0.55) between observed and predicted PM2.5 concentrations in this study was smaller than that in Johnson et al. (2010) (0.29-0.67) and than the results of similar studies of NO2 presented by Wang et al. (2012) and Gillespie et al. (2016) (0.44-0.85). The main reason is probably that the variations of hourly average PM2.5 concentration between monitoring sites were generally sharper than those of the annual average. Moreover, meteorological conditions play a more sensitive role in the short-term transport and diffusion of PM2.5 than in long-term processes. These findings suggest that the most effective way to improve the accuracy of a mapping method is still to increase the number of sampling sites. They confirm, again, the necessity of developing high-density crowdsourcing sampling for PM2.5 monitoring. However, the increased variability ranges of R² and standard deviations of RMSE and MRE with increasing training sites also suggest that the performance of these methods was affected by more than sample size. The spatial distribution of samples, for example, may influence their estimation accuracy too (Li and Heap, 2014). [...] 0.59). Moreover, it was the best-performing method for 8:00 and 12:00 of Period 1 with less than 70% training sites and for all five hours of Period 2.
These results suggest that, in this study, OK interpolation based on crowdsourcing sampling is the best strategy for PM2.5 mapping in intra-urban areas under non-peak traffic conditions when official air pollution levels are "Good" and "Moderate", while UK is the best one when pollution levels are "Heavy-polluted". These findings challenge the traditional view of the LUR model's good performance in air pollution mapping and verify that the applicability of mapping methods varies as the monitoring technology and sampling density change. Meanwhile, considering the peak traffic conditions at 8:00 and 12:00, they further imply a major contribution of vehicle emissions to PM2.5 concentrations during the light-polluted period. In addition, the accuracy of OK and LUR was clearly higher for Period 1 (0.24-0.82; 0.13-0.55) than for Period 2 (0.18-0.59; 0.04-0.42), while that of UK was rather stable (0.40-0.71 vs. 0.32-0.68). This indicates the robustness and generalization capability of UK in estimating PM2.5 concentration.
Using the selected mapping method, the spatial distributions of hourly PM2.5 concentration based on crowdsourcing sampling data and on national air quality observations were successfully plotted and compared. Clearly, the former provides more information on intra-urban PM2.5 variations than the latter. The nearest-neighbor distances of 15 to 60 m between crowdsourcing sampling sites make it feasible for PM2.5 concentration mapping to reach the hundred-metre scale. Meanwhile, this phenomenon was more pronounced in the light-polluted period, when the crowdsourcing estimates were clearly higher than the national monitoring ones. These findings not only suggest the superiority of crowdsourcing activities in fine-scale PM2.5 monitoring, but also prompt us to pay more attention to scenarios with low-level air pollution. This is critical to the long-term future of air pollution prevention and control and public health protection in China, given that the main emphasis has gradually shifted from the control of heavy pollution to the prevention of exposure risks.
However, although a technical training course and a set of standard quality parameters were provided before crowdsourcing sampling, and data cleaning was performed before data processing, the possible measurement bias caused by the lack of professional services and the relatively low sampling frequency (two to three times per hour) remains unavoidable. Future developments in the workflow of crowdsourcing systems, in automatic processing techniques for crowdsourced data, and in rational uniform quality standards could improve the efficiency of PM2.5 concentration sampling and lower the measurement bias (Heipke, 2010). Additionally, the urban air quality grade has an important effect on the spatial distribution of samples (spatial autocorrelation, heterogeneity, etc.), which may also be affected by sample size; the mechanism of this influence is somewhat equivocal and needs further research.

Conclusions
This study presented strategies of method selection for efficient and effective PM2.5 concentration mapping with an increasing number of training sites, based on a crowdsourcing sampling campaign. According to the results, PM2.5 concentrations in microenvironments varied significantly across the intra-urban area of a Chinese city, and these variations were disclosed clearly by crowdsourcing-based PM2.5 sampling rather than by the national air quality monitoring sites.
Meanwhile, the selection of models for fine-scale PM2.5 concentration mapping should be adjusted to changing sampling and pollution circumstances. Generally, OK interpolation performs best under non-peak traffic conditions in the light-polluted period, while UK modeling can perform better under peak traffic conditions with relatively few sampling sites and in the heavy-polluted period. Additionally, it should be noted that the LUR model demonstrated limited ability in estimating PM2.5 concentrations at very fine scale in this study. In short, this method selection strategy provides solid experimental evidence for method selection of PM2.5 mapping under crowdsourcing monitoring and a promising route to the prevention of exposure risks for individuals in their daily life.
Author contribution. SX performed the experiments and wrote the manuscript text. BZ supervised and designed the research work, as well as helped with the manuscript. YL and XZ helped with discussion and revisions. SL and CH participated in data processing.
Competing interests. The authors declare that they have no conflict of interest.

Figure 3
Figure 3 demonstrates the spatial variation of PM2.5 measurements over the two periods in the study area, and strong spatial variations can be found between different sampling sites and between the two periods. For Period 1, the PM2.5 concentrations decreased gradually from north to south and from west to east. Higher concentrations of PM2.5 (> 75 µg m⁻³) were observed at sampling sites in the northwest corner of the study area. The sampling sites in Changsha County with high levels of green vegetation

Figure 1: Principle and accuracy of measurement instrument.

Figure 4: Box plots of hold-out validation R² between the observed and predicted concentrations of PM2.5 for OK, LUR and UK with 5