Analysis of mobile monitoring data from the microAeth® MA200 for measuring changes in black carbon on the roadside in Augsburg

The portable microAeth® MA200 (MA200) is widely applied for measuring black carbon in human exposure profiling and mobile air quality monitoring. Because the instrument is relatively new to the market, the field lacks a refined assessment of its performance under various settings and data postprocessing approaches. This study assessed the mobile real-time performance of the MA200 in an urban area, Augsburg, Germany. Noise reduction and negative value mitigation were explored via different data postprocessing methods (i.e., local polynomial regression (LPR), optimized noise reduction averaging (ONA), and centered moving average (CMA)) at common interval times (i.e., 5, 10, and 30 s). After noise reduction, the treated data were evaluated and compared by (1) the amount of useful information attributed to microenvironmental characteristics retained; (2) the relative number of negative values left; (3) the reduction and retention of peak-samples; and (4) the amount of useful signal retained after correction for local background conditions. Our results identify CMA as a useful tool for isolating the central trends of raw black carbon concentration data in real time while reducing nonsensical negative values and the occurrence and magnitude of peak-samples that affect visual assessment of the data, without substantially affecting bias. Correction for local background concentrations improved the CMA treatment by making nuanced microenvironmental changes more visible. This analysis employs a number of different postprocessing methods for black carbon data, providing comparative insights for researchers looking for black carbon data smoothing approaches, specifically in a mobile monitoring framework and for data collected using the microAeth® series of aethalometers.

In addition, air pollution concentrations at a specific time and place may consist of two primary components: contributions from local source emissions and a background concentration (Tan et al., 2014).
Background concentrations, especially the high background concentrations of typical pollution events (such as haze), can obscure the contribution of local sources of pollution (Van et al., 2013; Van den Bossche et al., 2015). Moreover, real-time changes in local sources, meteorology, and regional transport cause changes in the pollution background (Brantley et al., 2014), which affect the comparability of measurements over different time periods, even at different times on the same day (Li et al., 2019). Therefore, we employed background concentration values to evaluate the noise reduction in mobile data after postprocessing and to better assess the contribution of local sources to measured concentrations.

In this study, we assessed the application of several common methods for postprocessing black carbon data, including ONA (Hagler et al., 2011), LPR, and CMA, to improve the reliability of high-frequency mobile measurements in an urban setting. The postprocessing assessments focused on data from the microAeth® MA200. The quality of each noise reduction approach was assessed by analyzing postprocessed data under the following criteria: (1) retention of detailed information attributed to microenvironmental characteristics; (2) relative number of negative values remaining; (3) reduction and retention of peak-samples; and (4) retention of detailed information on microenvironmental characteristics after background correction.

In this study, seven MA200 portable black carbon monitors (MA200-0051, MA200-0053, MA200-0059, MA200-0060, MA200-0155, MA200-0153, MA200-159) (microAeth® MA200; AethLabs, San Francisco, CA, USA) were used simultaneously to measure black carbon levels in the city center at different interval times (5 s, 10 s, and 30 s). The MA200 measures the optical attenuation (ATN) caused by black carbon collected on a filter at 5 optical wavelengths: infrared, red, green, blue, and ultraviolet (880, 625, 528, 470, and 375 nm, respectively). The ATN measured at 880 nm characterizes the eBC concentration. The detection limit of the MA200 is reported as 30 ng eBC/m³, and it reports concentrations at a resolution of 1 ng/m³ (AethLabs, 2018). In mobile monitoring, the MA200 can be used to estimate personal exposure and quantify eBC mass concentrations in different microenvironments. It can be used to identify hot spots and to quantify black carbon levels on roads and highways as well as in various other mobile environments (Apte et al., 2011; Dons et al., 2012; Madueño et al., 2019), including bicycles (Wójcik et al., 2014; Samad and Vogt, 2020), trains (Andersen et al., 2019), and airplanes (Kim et al., 2019). The device can also be applied in long-term stationary monitoring, vertical profiling, and atmospheric measurements with unmanned aerial vehicles (Cao et al., 2020; Chiliński et al., 2018; Pikridas et al., 2019) and balloons (Ferrero et al., 2011, 2014, 2016; Markowicz et al., 2017; Samad and Vogt, 2020), as well as in community monitoring, indoor air quality monitoring, and the assessment of personal exposure and related health effects (Isley et al., 2017). To reduce noise in data obtained at high time resolution, smoothing algorithms can be used.
AethLabs offers tools for applying several noise reduction algorithms to MA-series device data on its website (https://aethlabs.com [note: a free account is required]). To evaluate the relative performance of the MA200, this study analyzed black carbon data collected from multiple MA200 devices, identified individually by serial number. Comparative measurements of the MA200 and a stationary Aethalometer (AE33, Magee Scientific, Berkeley, USA), taken for approximately 30 to 60 min between walks, showed good agreement (Pearson's r = 0.933) (Liu et al., 2021). It is also worth noting that when the AE33 was used to monitor black carbon at the same time as the MA200, the AE33 was placed in a container, while the MA200 was used outdoors (in the stroller) during the individual walks, so the two instruments may have experienced different relative humidity and temperature. This difference did not influence the consistency of the eBC concentrations measured with both instruments. Information about the date, duration, and time resolution (time base) of each MA200 device is summarized in Table 1. To demonstrate unit-to-unit comparability between the MA200 units, we performed intercomparisons at fixed monitoring stations (Table S1) and during collocated mobile measurements (Fig. S2). No wavelength dependence was observed between different instruments for fixed and mobile monitoring measurements.

Study design and routes
The MA200 instrument can measure black carbon at 1 s, 5 s, 10 s, 30 s, 60 s, and 300 s interval times. The 1 s time base is the most challenging to interpret because of its poor signal-to-noise ratio, especially at low concentrations, which is similar to other optical black carbon monitors (Hagler et al., 2011). Therefore, 1 s measurement resolution may be most useful when sampling in high-concentration environments, when performing direct emissions testing, or when an application requires high time resolution. However, the average eBC concentration is low in the city center of Augsburg, Germany (2.62 μg/m³ in winter, as measured by Gu (2012)); thus, we did not use the 1 s time base. Moreover, the 60 s and 300 s time bases cover too long a distance per sample for mobile monitoring, which may affect the accuracy of the spatial variation of pollutants; hence, neither was selected in this study. To better understand which sampling interval is most useful in this context (mobile measurements at low eBC concentrations), three MA200 devices were used in parallel to measure eBC concentrations at interval times of 5 s, 10 s, and 30 s (Measurement numbers 5-7 in Table 1).
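To make the distance argument concrete, the short Python sketch below converts each selectable MA200 time base into the distance walked while one sample is integrated. It assumes a constant walking speed of 1.3 m/s, the value used to size the LPR and CMA smoothing windows; the constant and function names are illustrative only.

```python
# Distance walked during one MA200 sample at each selectable time base.
# Assumption: constant walking speed of 1.3 m/s (the value used for the
# LPR/CMA window sizes); real walking speeds vary along the route.
WALKING_SPEED_M_S = 1.3
TIME_BASES_S = (1, 5, 10, 30, 60, 300)


def distance_per_sample(time_base_s: float, speed_m_s: float = WALKING_SPEED_M_S) -> float:
    """Distance (m) covered while one sample is being integrated."""
    return speed_m_s * time_base_s


for tb in TIME_BASES_S:
    print(f"{tb:>3} s time base -> {distance_per_sample(tb):6.1f} m per sample")
```

At this speed, a 10 s sample spans about 13 m, whereas a single 60 s sample spans about 78 m and a 300 s sample about 390 m, which illustrates why the two longest time bases blur street-level spatial variation.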
To account for the different land use types of the microenvironments, a fixed walking route within the city center was determined. Wherever possible, the mobile measurements were carried out on the right side of the road, simulating common habits (driving and walking on the right side in Germany). All walks along the route were conducted on weekdays, with clear skies and calm winds, to avoid misrepresentation of typical urban exposure conditions. The route started from Augsburg […]
Briefly, the study consisted of the following phases: (1) collecting raw black carbon data using the sampling instruments (MA200); (2) smoothing the acquired raw black carbon data using different postprocessing methods (i.e., noise reduction); (3) comparing the noise-reduced data based on the detailed changes in values and the number of negative values; (4) identifying peak-samples using a coefficient of variation (COV) approach; (5) estimating and correcting for background concentrations using a thin plate regression spline (TPRS) approach; and (6) selecting the best noise reduction approach.

Instrumentation preparation
The instruments were prepared and adjusted in our laboratory before each walk, including "zero" calibration checks and examination of the MA200 filter cassette, battery, GPS, and memory.
Flow rates were calibrated with a factory-calibrated flow meter (Alicat Scientific, Inc., Tucson, AZ, USA).

Postprocessing methods
The relative utility of the different postprocessing methods (ONA, LPR, and CMA) was determined by (1) the ability to perceive nuanced differences between microenvironmental pollution characteristics after noise reduction; (2) the relative number of negative eBC values remaining; (3) the reduction and retention of peak-samples; and (4) the ability to perceive nuanced differences between microenvironmental pollution characteristics in the noise-reduced data after background correction.

ONA (optimized noise reduction averaging)
The ONA algorithm operates on three time series from the original observation data, namely the observation time, the original eBC concentration, and the optical ATN, as described in detail by Hagler et al. (2011). Briefly, a ΔATN threshold is manually set to prevent the algorithm from recalculating eBC until a certain amount of ATN has been detected (i.e., until enough black carbon has deposited on the filter to "confidently" calculate an eBC concentration). The algorithm aims to reduce erroneous and spurious estimates by dynamically extending the effective sample time base so that there is sufficient ATN to substantially reduce the effect of instrument noise. This effective time base will be longer at low concentrations than at high concentrations and, hence, no negative values and less eBC noise will be reported. Hagler et al. (2011) implemented a ΔATN threshold of 0.05 to postprocess data from a fixed monitoring site. However, when applied to MA200 data, a ΔATN threshold of 0.05 results in a very smooth curve and may obscure more information than is necessary to provide a usefully smoothed curve. For this reason, a lower ΔATN threshold of 0.01 was selected for the mobile measurement data of our study (Figure S3).
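The windowing logic of ONA can be illustrated with the simplified Python sketch below. It follows only the description given here (extend the averaging window forward until ATN has risen by the ΔATN threshold, then report one averaged eBC value for the whole window); it is not the AethLabs or Hagler et al. (2011) implementation, and the function name `ona` and its end-of-series handling are our own assumptions.

```python
import numpy as np


def ona(ebc, atn, delta_atn_min=0.01):
    """Simplified ONA sketch (hypothetical helper, not the reference code).

    Starting from the first sample of each window, the window is extended
    forward until ATN has increased by at least delta_atn_min (or the data
    run out); every sample in the window receives the window's mean eBC.
    """
    ebc = np.asarray(ebc, dtype=float)
    atn = np.asarray(atn, dtype=float)
    n = ebc.size
    out = np.empty_like(ebc)
    start = 0
    while start < n:
        end = start
        # Grow the window while the ATN increase is still below the threshold.
        while end < n - 1 and atn[end] - atn[start] < delta_atn_min:
            end += 1
        out[start:end + 1] = ebc[start:end + 1].mean()
        start = end + 1
    return out


# ATN rises by 0.02 per sample, so each window closes after two samples.
print(ona([1, 3, 5, 7], [0.0, 0.02, 0.04, 0.06], delta_atn_min=0.01))
```

If ATN never rises by the threshold, for example because it drifts downward after an initial negative step, the first window absorbs the whole series and a single value is returned, which mirrors the behavior reported for the 5 s time base in the Results.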

LPR (local polynomial regression)
The LPR algorithm is a non-parametric tool similar to a moving average, but it fits local polynomial regressions rather than taking simple averages (Masry, 1996; Breidt and Opsomer, 2000; Kai et al., 2010). In LPR, the number of points across which to smooth must be chosen manually. This value should balance effective smoothing of the measured values against the sensitivity required to provide spatial resolution in mobile measurements (i.e., the distance over which the average is taken). The distance resolution was set at approximately 100 m. Assuming a sampling (walking) speed of 1.3 m/s, the number of smoothing points is 15, 7, and 3 for interval times of 5 s, 10 s, and 30 s, respectively.
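The window sizes and the LPR smoothing itself can be sketched in Python as follows. Two details are assumptions rather than statements from the text: the point count is rounded to the nearest odd integer (this reproduces the 15, 7, and 3 points quoted above and allows a centered window), and the local fit is an unweighted quadratic, since the polynomial degree and kernel weights are not stated. The function names are illustrative.

```python
import numpy as np


def smoothing_points(time_base_s, resolution_m=100.0, speed_m_s=1.3):
    """Samples spanning ~resolution_m at the given walking speed, rounded to
    the nearest odd integer (assumed rule) so the window can be centered."""
    n = resolution_m / (speed_m_s * time_base_s)
    return 2 * round((n - 1) / 2) + 1


def lpr(y, window, degree=2):
    """Local polynomial regression sketch (unweighted, assumed degree 2).

    For each sample, a polynomial is fitted by least squares to the `window`
    samples centered on it and evaluated at that sample; windows are
    truncated at the series edges.
    """
    y = np.asarray(y, dtype=float)
    half = window // 2
    x = np.arange(y.size)
    out = np.empty_like(y)
    for i in range(y.size):
        lo, hi = max(0, i - half), min(y.size, i + half + 1)
        deg = min(degree, hi - lo - 1)  # never fit more terms than points
        # Center x on sample i so the fitted value is the constant term.
        coeffs = np.polyfit(x[lo:hi] - i, y[lo:hi], deg)
        out[i] = coeffs[-1]
    return out


for tb in (5, 10, 30):
    print(tb, "s ->", smoothing_points(tb), "points")
```

Because a quadratic local fit reproduces any quadratic trend exactly, LPR follows curved concentration ramps more closely than a plain average of the same window, at the cost of passing through more of the noise.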

CMA (centered moving average)
The CMA algorithm is a smoothing technique used to make the long-term trends of a time series clearer (Easton and McColl, 1997). Unlike a simple (trailing) moving average, CMA introduces no shift or group delay in the data, as it incorporates data from both before and after the data point being smoothed. The number of smoothing points was determined as described for the LPR algorithm, assuming a sampling speed of 1.3 m/s.
After postprocessing, the change in the character of the treated data was used as a criterion to select the best method. When the treated data retain more detailed microenvironmental characteristics, they better reflect the actual behavior of air pollutants and facilitate the identification of pollution sources. Conversely, if the microenvironmental characteristics are less detailed, identifying pollution sources may be hindered. Therefore, more detailed microenvironmental features contribute more accurate information.
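A minimal Python sketch of the centered moving average described above is given below, together with the negative-value proportion used as the next evaluation criterion. It assumes an odd window (e.g., the 15, 7, or 3 points derived for LPR) and truncates the window at the series edges; both choices, and the function names, are ours rather than taken from the text.

```python
import numpy as np


def centered_moving_average(x, window):
    """Mean of each sample and (window - 1) / 2 neighbors on each side.

    Using both past and future samples means no time shift is introduced.
    Near the edges the window is truncated (assumption) instead of padded.
    """
    if window % 2 == 0:
        raise ValueError("a centered moving average needs an odd window")
    x = np.asarray(x, dtype=float)
    if x.size < window:
        raise ValueError("series is shorter than the window")
    kernel = np.ones(window)
    sums = np.convolve(x, kernel, mode="same")
    counts = np.convolve(np.ones_like(x), kernel, mode="same")
    return sums / counts


def negative_fraction(x):
    """Number of negative values divided by the total sample size."""
    x = np.asarray(x, dtype=float)
    return np.count_nonzero(x < 0) / x.size


print(centered_moving_average([1, 2, 3, 4, 5], 3))
```

Dividing by the per-position count rather than the full window length keeps the first and last samples from being pulled toward zero, which would otherwise create artificial negative or low values at the ends of each walk.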
In addition, the number of remaining negative values was used as another criterion for proposing the best method: the method with the smallest proportion of negative values was selected as the best. The proportion of remaining negative values was calculated as the number of negative values divided by the total sample size.

Peak-sample identification
[…] and 3 times the standard deviation (Wang et al., 2015). The formula for the running COV used in this analysis was previously described by Hagler et al. (2012) and is applied here with minor modification (Eq. 1):
where COV_t is the 70 s sliding COV of the t-th eBC sample under a 10 s time base (representing 30 s before the sample, the sample, and 30 s after the sample), x_i is the i-th eBC sample, x̄ is the average of the t-th eBC sample and the three samples before and after it, and x̄_all is the average of all eBC data in one experiment. The 99th quantile of the 70 s sliding COV of all eBC data is used as the threshold for determining "peak-samples". The eBC samples greater than this threshold are flagged as peak-samples, along with the eBC samples 3 data points before and after. However, under different time bases (e.g., 5 s and 30 s), the sliding COV of the t-th black carbon sample differs.
Accordingly, the COV equation must be modified for each time base.
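The sliding COV and peak-flagging rule defined above can be sketched in Python as follows. Where the definitions leave details open, the sketch makes explicit assumptions: the population (ddof = 0) standard deviation is used, samples without a full window get no COV, and `half_window` = 3 corresponds to the 10 s time base (the exact modification for 5 s and 30 s is not specified). The function names are hypothetical.

```python
import numpy as np


def sliding_cov(x, half_window=3):
    """COV_t: standard deviation of the 2 * half_window + 1 samples centered
    on t, divided by the mean of the whole experiment (x_all). Edge samples
    without a full window are left as NaN (assumption)."""
    x = np.asarray(x, dtype=float)
    x_all = x.mean()
    cov = np.full(x.size, np.nan)
    for t in range(half_window, x.size - half_window):
        window = x[t - half_window:t + half_window + 1]
        cov[t] = window.std() / x_all  # population SD (ddof=0) assumed
    return cov


def flag_peak_samples(x, half_window=3, quantile=0.99):
    """Flag samples whose COV exceeds the given quantile of all COVs,
    together with half_window neighbors on each side."""
    cov = sliding_cov(x, half_window)
    threshold = np.nanquantile(cov, quantile)
    flags = np.zeros(cov.size, dtype=bool)
    for t in np.flatnonzero(cov > threshold):
        flags[max(0, t - half_window):t + half_window + 1] = True
    return flags


rng = np.random.default_rng(0)
series = 1.0 + 0.05 * rng.standard_normal(100)
series[50] = 10.0  # a single synthetic spike
print(np.flatnonzero(flag_peak_samples(series)))
```

Normalizing by the experiment-wide mean rather than the local mean keeps COV_t from exploding in windows whose local mean is close to zero, which is common for low, noisy eBC data.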
To calculate the peak-sample reduction, the number of peak-samples was counted before and after postprocessing, and the difference was divided by the number of peak-samples before postprocessing to obtain the proportional reduction. After noise reduction, we compared the reduction values and the number of retained peak-samples to further evaluate the postprocessing methods. In short, a high peak-sample reduction value indicates that the treated data achieve strong peak-noise reduction without eliminating the peak-samples themselves. Therefore, the method with a high peak-sample reduction value that still retains the number of peak-samples after postprocessing is considered the better method.

Background estimation and correction
The ability of each processing method to adequately remove the estimated background concentration was used to evaluate which method provides the most useful information related to microenvironmental effects. The noise-reduced data from each method were processed with the background estimation and correction approach described below, and the method that better facilitated background estimation and correction was considered the better postprocessing method.
Background correction methods include the single sample standardization method, the sliding minimum method, the linear regression postprocessing method, and the spline (of minimum) regression postprocessing method. Brantley et al. (2014) suggest that a thin plate regression spline (TPRS) method can reliably estimate the background value of mobile measurements; it was used here to examine the "useful" information in the noise-reduced data (i.e., non-spurious, non-background pollution trends).
Briefly, the TPRS approach includes three steps: first, the noise-reduced pollutant data are processed with a 30 s moving average; second, the 30 s moving-average results are divided sequentially into windows of a specified length (i.e., 5 and 10 min), and the position of the minimum pollutant concentration is identified in each window; and finally, thin plate spline regression is used to fit the minimum-concentration samples obtained in the previous step, yielding the background concentration at each time point.
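The three background steps can be sketched in Python as follows. The sketch substitutes SciPy's `RBFInterpolator` with a thin-plate-spline kernel for the TPRS fit; this is a closely related thin-plate smoothing spline, not the penalized regression spline used by Brantley et al. (2014). The function name `estimate_background`, the truncated-edge moving average, and the `smoothing` default are assumptions.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator


def estimate_background(t_s, ebc, time_base_s=10, window_min=5, smoothing=0.0):
    """Background eBC estimate in three steps (hypothetical helper).

    Assumptions: a ~30 s moving average with truncated edges, consecutive
    non-overlapping minimum windows, and SciPy's thin-plate-spline RBF in
    place of a thin plate regression spline. Needs at least two windows.
    """
    t_s = np.asarray(t_s, dtype=float)
    ebc = np.asarray(ebc, dtype=float)

    # Step 1: ~30 s moving average (exactly centered only when k is odd).
    k = max(1, round(30 / time_base_s))
    kernel = np.ones(k)
    smoothed = (np.convolve(ebc, kernel, mode="same")
                / np.convolve(np.ones_like(ebc), kernel, mode="same"))

    # Step 2: index of the minimum in each consecutive window_min window.
    w = max(1, round(window_min * 60 / time_base_s))
    idx = np.array([start + np.argmin(smoothed[start:start + w])
                    for start in range(0, smoothed.size, w)])

    # Step 3: thin-plate spline through the minima, evaluated at every sample.
    # Time is expressed in minutes to keep the kernel matrix well scaled.
    spline = RBFInterpolator((t_s[idx] / 60.0)[:, None], smoothed[idx],
                             kernel="thin_plate_spline", smoothing=smoothing)
    return spline((t_s / 60.0)[:, None])
```

Under this sketch, the local contribution would be the noise-reduced series minus the returned background; a positive `smoothing` value relaxes the fit so it no longer passes exactly through every window minimum.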

Results and discussion
The average eBC concentrations of raw, ONA-processed, LPR-processed, and CMA-processed data (Measurements 1-10) from all instruments were compared (Table S2). The results show that all three postprocessing methods stayed within ±1 % of the average raw concentration. This indicates that postprocessing did not substantially alter the average concentration relative to the raw, unprocessed data.
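The ±1 % comparison amounts to a relative difference between means. A minimal sketch, assuming bias is expressed relative to the raw mean (the exact definition behind Table S2 is not given here); the function name is illustrative:

```python
def percent_bias(raw, processed):
    """Percent difference between the mean of processed and raw eBC data.

    Assumption: bias is normalized by the raw mean, which becomes unstable
    if the raw mean is close to zero.
    """
    raw_mean = sum(raw) / len(raw)
    proc_mean = sum(processed) / len(processed)
    return 100.0 * (proc_mean - raw_mean) / raw_mean


print(percent_bias([1.0, 2.0, 3.0], [1.01, 2.02, 3.03]))
```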

As shown in Figure 1, three MA200s were used at time bases of 5 s, 10 s, and 30 s. The proportion of negative values in the raw data was 42.1 %, 37.6 %, and 30.5 % for the 5 s, 10 s, and 30 s time bases, respectively (Fig. 1a, Table 2, Fig. S4a). The raw data were then processed using ONA, LPR, and CMA (Fig. 1b, 1c, and 1d).
At the 5 s time base, the eBC values changed very rapidly (Fig. 1a), and ONA processing of the data resulted in only one value (which was negative) (Fig. 1b). Thus, the microenvironmental characteristics of the eBC concentration were not reproduced. We found that all ΔATN (ATN(t0 + Δt′) − ATN(t0)) values were negative in the raw data collected at 5 s, which, according to the ONA method described above, resulted in only a single value. In short, after the first measurement, the ΔATN threshold (which is positive) for calculating the next value was never reached. The first value was likely negative due to a combination of instrument noise, coincidence, and a low background concentration (i.e., a low baseline instrument signal), which is consistent with both the raw data measurements and the typically low eBC concentrations in the city center of Augsburg, Germany (Gu, 2012). It is unclear why ΔATN remained negative, but, given the long series of low-concentration values at the beginning of the sample and the initial negative measurement, it is possible that the summed ΔATN became increasingly negative as a result of the initial negative ΔATN measurement. The subsequent low-concentration measurements did not exceed the magnitude of the initial negative ΔATN value. Under these conditions, a cumulative negative sum of ΔATN would prevent the positive ΔATN threshold from ever being reached. If true, this phenomenon highlights one potential weakness of the ONA algorithm, namely difficulty registering a signal at low concentrations, and warrants further investigation of the conditions under which ONA is truly unbiased. In any case, the observed phenomenon prevented the use of ONA at the 5 s time base (Fig. 1b). Previous studies in which ONA was successfully applied […]
At the 10 s time base, no negative values were found after ONA processing, suggesting that a reasonable smoothing effect is obtained at low black carbon concentrations.
However, the microenvironmental characteristics of the ONA-processed data changed strongly relative to the raw data, retaining less detailed information on air pollution. After postprocessing with LPR and CMA, the microenvironmental characteristics revealed more detailed information on air pollution, with 30.2 % negative values for LPR and 25.3 % for CMA. At the 30 s time base, negative values comprised 0 % of the postprocessed data for ONA, 25.5 % for LPR, and 22.4 % for CMA. The 30 s dataset had the lowest proportion of negative values before and after postprocessing, owing to the longer sampling interval. However, because the sampling device is mobile, each 30 s measurement covers a longer distance. Thus, 30 s black carbon measurements may be too long to detect local concentration peaks in urban contexts, consistent with another study (Kerckhoffs et al., 2016).
The ONA algorithm showed a strong ability to eliminate negative values. As a result, the ONA-treated data may present biases that obscure nuanced microenvironmental trends (Fig. 1b). Interestingly, LPR and CMA postprocessing are capable of decreasing negative values while retaining microenvironmental trends. Both methods are promising for the analysis of spatiotemporal changes in pollutant concentrations with sensitivity to local sources. Previous studies have shown that the spatiotemporal variability of black carbon is highly heterogeneous (Liu et al., 2019; Liu et al., 2021); the ability to capture the spatiotemporal variability of microenvironments is critical for assessing differential exposures among populations.

Reduction and number of peak-samples after postprocessing methods
The treatment of peak-samples is a pivotal evaluation criterion for time-averaged measurements of roadside air quality. Passing vehicles, for example, may bias estimates of typical local concentrations because they contribute peak concentrations to the dataset that may substantially affect arithmetic averages. Therefore, after noise reduction, we compared the reduction values and the retained number of peak-samples to further evaluate the postprocessing methods.
At the 5 s interval time, the average reduction of peak-samples for the LPR and CMA algorithms was 72.0 % and 87.4 %, respectively (as discussed above, the ONA method could not be used). At this interval time, the reduction of peak-samples was relatively high, indicating that when black carbon is monitored at low concentrations and high sampling frequencies, strong noise may occur in the raw data, and the correspondingly higher noise reduction may affect the actual values. Therefore, a suitable interval time should be chosen when monitoring low eBC concentrations. At the 10 s interval time, the average reduction of peak-samples for CMA (47.7 %) was higher than for ONA (5.54 %) and LPR (22.7 %). At the 30 s interval time, CMA presented the greatest average reduction of peak-samples (39.1 %) compared to ONA (6.24 %) and LPR (0.62 %) (Table 2, Fig. S4b). The retention of peak-samples after postprocessing was also assessed using the COV method (Measurements 1-10). The results showed that all three algorithms retained all peak-samples after postprocessing. In this regard, CMA retained all peak-samples despite the largest reduction in their magnitude. Therefore, CMA highlights microenvironmental trends while preserving the identity of peak-samples, facilitating the identification of local pollution sources, and may thus be a better postprocessing method than ONA or LPR (Table 2, Fig. S4b).
To further characterize the distribution of peak-sample concentrations under CMA, we performed an intensive graphical analysis of a single data stream (Measurement 4; Fig. 2). As shown in Figure 2, eBC values along the main roads and at intersections were higher than at other locations, presumably due in large part to stop-and-go traffic and cars in close proximity to the mobile monitor. It can be seen from Figure […] (2013) found evidence of a substantial improvement in data quality with respect to vibration-related spikes after an equipment upgrade by AethLabs, which reflected the aforementioned improvements to opto-electronics. In addition, there were no major mechanical shocks to, or unique vibrational effects on, the stroller and no major differences in the accelerometer data in the raw data, precluding these as potential confounders for all 3 instruments.

Comparison of background estimation and correction after noise reduction
Local air pollution can be strongly affected by long-range and regional transport. The timing and magnitude of such transport vary in space and time and depend heavily on stochastic meteorology. As a result, local background concentrations may vary, affecting the comparability of measurements made at the same location at different times (Brantley et al., 2014). Therefore, based on the comparison of background corrections, CMA was better suited for estimating the background concentration and local source contributions.

Generalizability
To verify the generalizability of our assessment, we performed another three measurement runs in Munich (Measurements 8, 9, and 10). Raw data were postprocessed for noise reduction using CMA (Fig. S7). The results showed that this method is equally applicable in Munich as at our study site in Augsburg, two cities that differ in location and environmental characteristics (e.g., population, economy, and traffic density). After CMA treatment, peak-samples could be identified at different interval times (Fig. S8), and the estimated background concentrations showed few negative values (Fig. S9). Further research into the transferability of our results to a more diverse set of contexts is still needed.

Practical implication
The MA200 is widely used to measure human exposure to black carbon and for mobile air quality monitoring. In this study, MA200 units were applied in mobile measurements in an urban area (Augsburg), and the sensitivity of the final analysis to various data postprocessing methods was investigated. In contrast to our findings, Hagler et al. (2011) suggested the use of the ONA algorithm to postprocess data from the microAeth AE51, portable AE42, and rackmount AE21 aethalometers (Magee Scientific, Berkeley, CA, USA). In their analysis, ONA demonstrated strong noise reduction in all datasets and retained spatiotemporal variation. ONA also reduced the occurrence of negative data values in low-concentration sampling environments. However, for the microAeth® series of black carbon monitoring instruments, our study showed that ONA leads to a considerable dampening of spatiotemporal resolution in local black carbon signals at street level, an effect that is smaller under CMA postprocessing.
In addition, our analysis highlights that the selection of an appropriate data postprocessing method is crucial to the proper assessment and interpretation of exposure-relevant microenvironmental contributors to pollution concentrations in urban areas. This is particularly important when estimating exposures that occur during transit, where spatiotemporal variability in pollution concentrations is large, as in commuter traffic (Snyder et al., 2013). Due to the typically low-but-heterogeneous nature of eBC concentrations in many areas like Augsburg, noisy MA200 measurements under high-frequency sampling may obscure actual trends in the measured values. This study demonstrated that postprocessing MA200 data using CMA can reliably extract the actual signals from such noise, and that postprocessing via ONA and LPR may be less reliable. Future researchers and agencies may find the distillation of our results in the flow diagram in Scheme 1 useful in determining how to reliably assess the spatiotemporal variability of MA200 black carbon measurements in different microenvironments.

Scheme 1
The proposed decision tree for mobile monitoring data from the microAeth® MA200.

Conclusion
A mobile monitoring campaign was conducted in the city center of Augsburg, Germany, to determine a suitable noise reduction algorithm for the MA200 aethalometer. Our results showed that, at interval times of 5 s, 10 s, and 30 s, CMA postprocessing effectively removed spurious negative concentrations without major bias and reliably highlighted effects from local sources, effectively increasing spatiotemporal resolution in mobile measurements. Evaluation of the effects of each method on peak-sample reduction and on the estimation of background concentrations further supports the reliability of the CMA algorithm. Further analysis is needed to understand how well these findings apply in different seasons, across different diurnal patterns, and in more rural, more urban, and non-German locations.

Data availability
The data are available upon request from the first author of the paper.
Liu, X., Schnelle-Kreis, J., Zhang, X., Bendl, J., Khedr, M., Jakobi, G., Schloter-Hai, B., Hovorka, J., and Zimmermann, R.: Integration of air pollution data collected by mobile measurement to derive a preliminary spatiotemporal air pollution profile from two neighboring German-Czech border villages.