Practical field calibration of electrochemical NO2 sensors for urban air quality applications

In many urban areas the population is exposed to elevated levels of air pollution. However, air quality is usually measured at only a few locations. These measurements provide a general picture of the state of the air, but they cannot capture local differences. For a few years now, new low-cost sensor technology has been available that has the potential to extend the official monitoring network significantly. These sensors, however, are still in an experimental stage and suffer from various technical issues that limit their applicability. This study explores the added value of alternative air quality measurements, focusing on nitrogen dioxide (NO2) in Amsterdam, the Netherlands. Sixteen low-cost air quality sensor devices were built and distributed among volunteers living close to roads with high traffic volumes for a two-month measurement campaign. Careful calibration of individual sensors is essential to obtain meaningful measurements of ambient NO2 concentrations. Field calibration was done next to an air monitoring station during an 8-day period, resulting in R² ranging from 0.3 to 0.7. The NO2 accuracy can be improved by including temperature and humidity measurements from an additional low-cost sensor, giving R² ranging from 0.6 to 0.9. Recalibration is crucial, as all sensors show significant signal drift after the two-month measurement campaign. The measurement series between the calibration periods can be corrected in hindsight by taking a weighted average of the calibration coefficients. Validation against an independent air monitoring station shows good agreement. Using our approach, the standard deviation of a typical sensor device for NO2 measurements was found to be 7 μg m⁻³. This shows that, if properly treated, low-cost sensors based on the current generation of electrochemical NO2 sensors may provide useful complementary data on local air quality in an urban setting.

Atmos. Meas. Tech. Discuss., doi:10.5194/amt-2017-43, 2017. Manuscript under review for journal Atmos. Meas. Tech. Discussion started: 4 April 2017. © Author(s) 2017. CC-BY 3.0 License.


Introduction
Because air pollution is difficult to measure, the instrumental and operational costs of official measurement stations are usually high. Air quality networks in cities, if present at all, are therefore usually sparse. Emerging low-cost sensor technology has the potential to extend the official monitoring network significantly and to improve our understanding of local urban air pollution. Miniaturized and affordable sensors enable citizens to measure their environment in more detail in space and time (Kumar et al., 2015). However, most sensors are still in an experimental stage and suffer from various technical issues that limit their applicability. Their poor data quality is a concern to health authorities, scientists, and citizens themselves. Before conclusions can be drawn from the measurements, comprehensive calibration and validation are essential (e.g. Lewis and Edwards, 2016; Lewis et al., 2016).
Several studies have explored the performance of low-cost air quality sensors (e.g. Jiao et al., 2016; Duvall et al., 2016; Mead et al., 2013; Moltchanov et al., 2015). For NO2 monitoring, mostly metal oxide and electrochemical sensors are used (Borrego et al., 2016; Spinelle et al., 2015b; Thompson, 2016). Typical ambient concentrations of NO2 are at the parts-per-billion (ppb) level. The main problems encountered in evaluations of NO2 sensors in real-world environments are low sensitivity, poor selectivity, low precision and accuracy, and drift. Metal oxide sensors in particular are not very stable (Spinelle et al., 2015b; Thompson, 2016) and have lower selectivity. Therefore, in this study we opted for electrochemical sensors to measure NO2. Mead et al. (2013) already noted the strong interference of ozone and other ambient factors in electrochemical NO2 sensors.
Performance can be increased significantly by adding measurements of, e.g., temperature and humidity to a regression model or neural network, as shown by Piedrahita et al. (2014), Spinelle et al. (2015b), and Masson et al. (2015).
Coping with sensor degradation remains a serious issue. Some studies, such as Jiao et al. (2016), include an additional temporal term in their linear regression, which slightly improves the predicted NO2.
The following sections will further explore the applicability of electrochemical NO 2 sensors for measurements of urban air quality, using a practical method for in-field calibration and regression modelling for assessment of accuracy and sensor degradation.

The Urban AirQ project
The Urban AirQ project explores the added value of alternative air quality measurements in the city. It focuses on a 2 × 1 km² area around Valkenburgerstraat, a primary road in the east-central part of Amsterdam (Figure 1). Its dense traffic causes regular exceedances of the European annual limit value for nitrogen dioxide (40 μg m⁻³).
Two town hall meetings were organized in which residents of this area were invited to raise their concerns about air pollution in their neighborhood and to formulate related research questions. Topics included the relation between traffic density and air pollution, the difference between a main road and a side street, the front side of an apartment compared to its back side, the influence of apartment height, and the influence of cut-through traffic at night. The residents were invited to help answer their questions by measuring their outdoor air quality with 16 experimental low-cost sensor devices, built for this purpose by Waag Society.
Measurements were done from June to August 2016. Beforehand, the sensor devices were calibrated using side-by-side measurements next to an official air quality measurement station. With a second calibration period after the campaign, individual sensor drift was assessed and compensated in hindsight.

NO2 sensor devices
The concept of the Urban AirQ sensor is a device built from low-cost electronic components that is easy to operate, so that citizens can do their own air quality measurements. It builds on the basic design described by Jiang et al. (2016), with an improved power supply, a weather-resistant housing, WiFi connectivity, and additional sensors for temperature, relative humidity, and particulate matter. The sensor development is part of an open hardware project; detailed technical information can be found at https://github.com/waagsociety/making-sensor.
At the core of the device is a microcontroller board (Arduino UNO), which reads the sensors and sends the data to the WiFi module (Figure 2).
For NO2 measurements, an amperometric electrochemical cell from Alphasense Ltd (Essex, United Kingdom) is used. The cell contains four electrodes. The target gas, NO2, diffuses through a membrane and is chemically reduced at the Working Electrode, generating a current signal. This current is balanced by an opposite current from the Counter Electrode. The Reference Electrode sets the operating potential of the Working Electrode. The sensor also includes an Auxiliary Electrode, which is used to compensate for baseline changes in the sensor. Full sensor performance requires low-noise interface electronics. A dedicated sensor board, also provided by Alphasense, guarantees a low-noise environment and optimizes the sensor resolution at low ppb levels. The sensor signal is read by a 16-bit analog-to-digital (A/D) converter (ADS1115). Fourteen sensor devices contain model NO2-B43F for NO2 measurements; the other two use model NO2-B42F.
All devices are equipped with a DHT22 sensor from Aosong Electronics measuring temperature and relative humidity (RH).
Twelve of the 16 sensor boxes are also equipped with a Shinyei PPD42NS sensor to measure particulate matter optically.
The present paper, however, will focus only on the assessment of the NO 2 measurements.

Averaging and filtering
Raw sensor measurements are stored in a central database at one-minute resolution. However, the calibration analysis is based on hourly averages, to enable direct comparison with the ground truth (which is also provided as hourly values) and to improve the signal-to-noise ratio.
The NO2 sensor provides measurements at the Working Electrode (S_WE) and the Auxiliary Electrode (S_AE), both given as A/D converter counts. Temperature and RH readings are converted to degrees Celsius and percent, respectively, following the manufacturer's specifications. Raw, hourly averaged sensor data are shown in Figure 3.
Careful filtering is needed before the data can be processed further. We applied the following rules:
• Raw, minute-based S_WE and S_AE measurements outside a ±10% range around their mean value over the entire measuring period are considered outliers. This affects 0.33% of all measurements.
• All readings at sensor temperatures above 30 °C are discarded to avoid the non-linear temperature dependence of the electrochemical NO2 sensor (see Sect. 4.4). This affects 4.53% of the measurements during the entire period.
• At least 20 valid minute-based measurements are required to calculate a representative hourly mean.
During the first calibration period, the sensors recorded data 79% of the time on average. After applying the criteria above, this resulted in 70% valid hourly measurements. During the measurement campaign, the sensors produced 79% valid hourly measurements on average, with the uptime dropping to 50% at locations where sensors experienced connectivity problems due to the limited range of the participant's WiFi network.
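The three filtering rules and the hourly averaging can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the project's processing code: the record layout (minute index, S_WE, S_AE, sensor temperature) and the function name `hourly_means` are hypothetical choices made for the example.

```python
from collections import defaultdict
from statistics import mean

# Thresholds taken from the filtering rules in the text.
OUTLIER_FRACTION = 0.10   # +/-10 % window around the period mean
MAX_TEMPERATURE_C = 30.0  # readings above this temperature are discarded
MIN_VALID_MINUTES = 20    # minimum valid minutes for an hourly mean


def hourly_means(records):
    """Filter minute records and average them per hour.

    records: list of (minute_index, s_we, s_ae, temp_c) tuples, where
    minute_index counts minutes from the start of the period
    (a hypothetical layout). Returns {hour_index: (s_we, s_ae, temp_c)}.
    """
    # Rule 1 uses the mean over the entire measuring period.
    mean_we = mean(r[1] for r in records)
    mean_ae = mean(r[2] for r in records)

    def in_window(value, ref):
        return abs(value - ref) <= OUTLIER_FRACTION * abs(ref)

    buckets = defaultdict(list)
    for minute, s_we, s_ae, temp in records:
        if not (in_window(s_we, mean_we) and in_window(s_ae, mean_ae)):
            continue  # rule 1: outlier
        if temp > MAX_TEMPERATURE_C:
            continue  # rule 2: non-linear temperature regime
        buckets[minute // 60].append((s_we, s_ae, temp))

    result = {}
    for hour, rows in sorted(buckets.items()):
        if len(rows) < MIN_VALID_MINUTES:
            continue  # rule 3: too few valid minutes in this hour
        result[hour] = tuple(mean(column) for column in zip(*rows))
    return result
```

Note that, as the rule is phrased, the ±10% window is centred on the mean of the raw series, so that mean still includes the outliers it is used to detect.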

Calibration periods
The sensor devices were calibrated by placing all 16 sensors side by side on the rooftop of the air quality station at Vondelpark, operated by the Public Health Service of Amsterdam (GGD). This station is classified as a city background station. It measures nitrogen dioxide, nitrogen monoxide (NO), ozone (O3), particulate matter (PM10, PM2.5, particle number and size distribution), black carbon, and carbon monoxide (CO). For NO and NO2 measurements, GGD alternates Teledyne API 200E and Thermo Electron 42I NO/NOx analysers, both based on chemiluminescence. The validated measurements used in this study are considered to be the ground truth. The calibration period spanned several days in order to test the sensors under a wide range of ambient conditions. To assess the stability of the calibration, the sensors were brought back to the calibration facility after the two-month measurement campaign for a second calibration period. The Urban AirQ campaign therefore consisted of three phases.
The first field calibration period at GGD Vondelpark station started on 2 June 2016, 00h LT (local time), and ended on 10 June 2016, 10h (8.5 days; 204 hours). Due to connectivity problems, sensor data were missing between 4 June 19h and 6 June 9h.
During the subsequent citizen campaign, 15 sensors were distributed among the participants. One sensor (55303) was kept at the Vondelpark station as a reference. The first sensor was installed and connected on 13 June 2016, 18h, and the last on 17 June 2016, 17h. The first sensor was disconnected on 15 August 2016, 9h, and the last on 16 August 2016, 18h.
The second field calibration period at GGD Vondelpark station started on 18 August 2016, 15h, and ended on 29 August 2016, 00h (10.4 days; 249 hours). Due to connectivity problems, sensor data were missing between 26 August 12h and 27 August 11h. The range of hourly NO2 concentrations at the Vondelpark station is larger in the calibration periods than in the campaign, with high NO2 values occurring more frequently. During the campaign, the sensors were located closer to the GGD station at Oude Schans, where NO2 values are generally a few μg m⁻³ higher than at Vondelpark. The Oude Schans site does not measure ozone.

NO2 calibration
Electrochemical sensors, such as the Alphasense NO2-B series, are known to be sensitive to interfering species and ambient factors. Ozone, temperature, and relative humidity in particular influence the sensor reading (see e.g. Spinelle et al., 2015a).

Explaining the NO2 sensor signal
To better understand the behavior of the NO2 sensor, we study its sensitivity to different ambient factors. We use the first calibration period to test the correlation of the measured S_WE and S_AE signals with NO2, ozone, temperature, and humidity by fitting a linear relation through the hourly time series (for example, S_WE as a linear function of NO2).
Temperature and RH were not available in the GGD station data we obtained. Instead of ambient air measurements, we therefore use the average temperature and RH readings of the DHT22 sensors, which better reflect the internal conditions of the sensor devices.
Figure 5 shows scatter plots for an average-performing sensor, together with the coefficient of determination R². The measured S_WE signal can be explained by ambient NO2 (R² = 0.20), but better by its anti-correlation with ozone (R² = 0.49). Temperature alone is an even better predictor of the sensor signal (R² = 0.73), probably because of both a direct temperature dependence of the sensor and an indirect dependence (temperature being a reasonable proxy for both NO2 and O3 concentrations). The correlation with humidity is equally strong (R² = 0.73). The measured S_WE signal is best explained as a linear combination of NO2, O3, T, and RH together, resulting in a correlation of 0.98 (R² = 0.96).
The S_AE signal is practically insensitive to NO2. This suggests that a combination of S_WE and S_AE is more sensitive to NO2 and less sensitive to the other interfering factors, as intended by the manufacturer.
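The comparison of single-variable and multivariable fits in Figure 5 amounts to computing R² for ordinary least-squares fits with different sets of predictors. The sketch below illustrates this with NumPy on synthetic data; the response coefficients and value ranges are invented for illustration and, unlike the real data, the synthetic predictors are independent of each other.

```python
import numpy as np


def r_squared(y, *predictors):
    """R^2 of an ordinary least-squares fit of y on the given predictor
    arrays; an intercept column is added automatically."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ coef) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic hourly series (hypothetical values, not campaign data).
rng = np.random.default_rng(42)
n_hours = 200
no2 = rng.uniform(5, 60, n_hours)     # ug m-3
o3 = rng.uniform(20, 120, n_hours)    # ug m-3
temp = rng.uniform(12, 30, n_hours)   # deg C
rh = rng.uniform(30, 90, n_hours)     # %
# Assumed linear Working Electrode response in A/D counts, plus noise.
s_we = (9000 + 4.0 * no2 - 2.5 * o3 - 30.0 * temp + 3.0 * rh
        + rng.normal(0.0, 20.0, n_hours))

for name, x in [("NO2", no2), ("O3", o3), ("T", temp), ("RH", rh)]:
    print(f"S_WE ~ {name}: R2 = {r_squared(s_we, x):.2f}")
print(f"S_WE ~ all:  R2 = {r_squared(s_we, no2, o3, temp, rh):.2f}")
```

In real data the predictors are correlated (temperature tracks ozone, for example), which is why a single variable can reach a high R² even when the sensor does not respond to it directly.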

NO2 calibration models
The sensor manufacturer suggests correcting both the Working Electrode and the Auxiliary Electrode signals for a zero offset, S_WE,0 and S_AE,0 respectively, and then applying a sensitivity constant s to convert from mV to ppb NO2:

NO2 [ppb] = [(S_WE − S_WE,0) − (S_AE − S_AE,0)] / s

In practice, the factory-supplied constants S_WE,0, S_AE,0, and s do not result in realistic NO2 values. As an alternative, we propose a linear combination of the signals S_WE and S_AE (Eq. 3), with coefficients determined from the calibration period data using ordinary least squares (OLS). Table 1 shows the fit results and the corresponding correlation with the true NO2 signal. Within the batch of sensors, the direct sensitivity to ambient NO2 varies widely.
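As a rough sketch of the manufacturer-style conversion, the snippet below first converts ADS1115 counts to millivolts and then applies the zero-offset and sensitivity correction. The gain setting (±4.096 V full scale, giving 0.125 mV per count) is an assumption about how the A/D converter is configured, and the offsets and sensitivity in the usage example are hypothetical; real values would come from the individual sensor's factory calibration data.

```python
# Assumption: the ADS1115 programmable gain is set to the +/-4.096 V range,
# so one count of the signed 16-bit result equals 4096 mV / 2**15.
FULL_SCALE_MV = 4096.0
MV_PER_COUNT = FULL_SCALE_MV / 2**15  # 0.125 mV per count


def counts_to_mv(counts):
    """Convert signed ADS1115 counts to millivolts."""
    return counts * MV_PER_COUNT


def no2_manufacturer_ppb(s_we_mv, s_ae_mv, s_we0_mv, s_ae0_mv, s_mv_per_ppb):
    """Subtract each electrode's zero offset, take the difference
    WE - AE, and divide by the sensitivity s (mV per ppb)."""
    return ((s_we_mv - s_we0_mv) - (s_ae_mv - s_ae0_mv)) / s_mv_per_ppb


if __name__ == "__main__":
    # Hypothetical readings and constants, for illustration only.
    we = counts_to_mv(2400)
    ae = counts_to_mv(2240)
    print(no2_manufacturer_ppb(we, ae, 295.0, 290.0, 0.25))
```

The regression approach of Eq. (3) sidesteps the unit conversion altogether: the fitted coefficients absorb the scale factor, so the counts can be used directly as arbitrary units.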
During the calibration period, hourly ozone values (also taken from the Vondelpark station) happened to be a good proxy for the ambient NO2 concentration: NO2(t) = 44.6 − 0.40·O3(t), in μg m⁻³, with R² = 0.49 (see Figure 6).
Comparison with Table 1 shows that the direct readings of a considerable fraction of the sensors cannot outperform this simple ozone proxy. To improve the results, we use additional measurements and their statistical relation to NO2. We fit different calibration models with multiple linear regression (using OLS). The calibration models tested are listed in Table 2.
Temperature and RH are taken from the DHT22 sensor.There is no need to calibrate the individual T and RH sensor values beforehand; the calibration coefficients for NO 2 are determined for the specific set of all sensors in the box.However, this means that if an individual sensor is replaced, new calibration parameters for the sensor box have to be derived.
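The calibration models of Table 2 differ only in their predictor columns, so they can all be fitted with one OLS routine. The following NumPy sketch assumes the hourly data are stored in a dictionary of equally long arrays with the hypothetical keys `s_we`, `s_ae`, `t`, `rh` and `o3`; it illustrates multiple linear regression in general and is not the authors' code.

```python
import numpy as np

# Predictor columns of the calibration models in Table 2. The intercept c0
# is always included; the returned coefficients follow this column order,
# so e.g. the third coefficient of model C corresponds to c4 (RH).
MODELS = {
    "A": ("s_we", "s_ae"),
    "B": ("s_we", "s_ae", "t"),
    "C": ("s_we", "s_ae", "rh"),
    "D": ("s_we", "s_ae", "t", "rh"),
    "E": ("s_we", "s_ae", "t", "rh", "o3"),
}


def design_matrix(data, model):
    """Stack an intercept column and the model's predictor columns."""
    columns = [np.asarray(data[name], dtype=float) for name in MODELS[model]]
    return np.column_stack([np.ones(len(columns[0])), *columns])


def calibrate(data, no2_ref, model):
    """OLS fit of reference NO2 on the model predictors.

    Returns the coefficient vector and the R^2 of the fit."""
    X = design_matrix(data, model)
    y = np.asarray(no2_ref, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot


def apply_calibration(data, coef, model):
    """Convert raw signals to NO2 estimates with fitted coefficients."""
    return design_matrix(data, model) @ coef
```

Because each model adds predictors to a simpler one (A to B or C, then D, then E), the in-sample R² can only stay equal or increase along that chain; a fairer comparison uses data not used for fitting, such as the second calibration period.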

Calibration results
A complete overview of the fit results for all models can be found in the supplement. The signs of the calibration parameters are easily understood. As the electrochemical sensor loses sensitivity at higher temperatures, the coefficients c3 are positive to compensate for this effect. The additional sensor response due to cross-sensitivity with ozone is compensated by negative values of c5.
Even within the same batch of sensors, there is a significant spread in performance, around a median R² of 0.83. Figure 7 shows the results of the different calibration models for an average-performing sensor. The time series in Figure 7(b) clearly show how the performance of a typical sensor device improves when temperature and humidity are included in the calibration analysis.

Dependency on temperature
Calibrated but uncorrected data occasionally show strong negative values (Figure 8(a)). These negative peaks coincide with internal sensor temperatures exceeding 30 °C. This behavior can be explained by the temperature dependence of the electrochemical sensor becoming non-linear (Figure 8(b)). In this regime, the sensor response cannot be described well by our multilinear regression approach. As temperatures during the measurement period rose above 30 °C only occasionally, we decided to filter out these measurements.

Startup time
When the sensors are switched on after an unused period, they need time to stabilize. Figure 9 gives examples for four sensors that were switched on at their campaign sites after being offline for a couple of days. The calibration model translates this startup effect into a strong positive NO2 peak. After 4 hours, most sensors are sufficiently stabilized. Note that this startup effect should not be confused with the response time, which Mead et al. (2013) determined to be less than 2 minutes.
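Discarding the stabilization phase could be implemented as a simple filter on hourly records. In this sketch the 4-hour window comes from the text, whereas treating any gap longer than `min_gap_hours` (24 h by default) as a restart is a hypothetical criterion: a short WiFi outage does not necessarily mean that the sensor was powered off.

```python
STARTUP_HOURS = 4  # stabilization time estimated in the text


def drop_startup(hours, values, min_gap_hours=24, startup_hours=STARTUP_HOURS):
    """Remove hourly values recorded within `startup_hours` after the
    sensor (re)starts.

    `hours` are sorted integer hour stamps. A gap of more than
    `min_gap_hours` between consecutive records is treated as a restart
    (hypothetical criterion). Returns a list of (hour, value) pairs."""
    kept = []
    run_start = None
    previous = None
    for hour, value in zip(hours, values):
        if previous is None or hour - previous > min_gap_hours:
            run_start = hour  # first record after switch-on
        if hour - run_start >= startup_hours:
            kept.append((hour, value))
        previous = hour
    return kept
```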

Sensor drift, aging, and uncertainty estimation
Almost all electrochemical sensors show some degree of drift because of aging and poisoning (Di Carlo et al., 2011; Hierlemann and Gutierrez-Osuna, 2008). This becomes a serious complication when the drift is of the same order as the signal of interest. Sensor 55303 was kept next to the reference station during the whole campaign to study sensor degradation in more detail. Unfortunately, the sensor was temporarily removed from 10 to 14 July for service, which introduced a sudden and unexplained offset in its measurements. The second calibration period after the measurement campaign offers another way to assess the stability of the sensors and to calibrate the measurements in hindsight. All sensors were brought back to the GGD station at Vondelpark. In Figure 10, the sensor signals (calibrated with coefficients from the first calibration period) are compared to the official station measurements. As can be seen in Table 4, the bias is mostly positive. Note that sensors 54911 and 1184206 had limited uptime in the second period, which makes their bias and RMSE values not very representative.
The strongest bias after two months is found for sensors 14560051 and 1184206, both of model NO2-B42F and both used in other experiments for more than one year. These sensors also have the largest RMSE in the first calibration period (see also Table 3), another indication of their poor performance. The RMSE of the remaining sensors ranges from 4.5 to 7.2 μg m⁻³ for the first period. For the second period, the bias-corrected RMSE increases to a range of 5.3 to 9.3 μg m⁻³. The latter is a more conservative but more realistic estimate of the precision of the NO2 estimates, as it is based on measurements that were not used for calibration. Based on the results listed in the last column of Table 4, we take 7 μg m⁻³ as a typical uncertainty of the estimated NO2 values.
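The bias, RMSE and bias-corrected RMSE can be computed from paired hourly estimates and reference values. This sketch assumes that the bias-corrected RMSE is the RMSE after subtracting the mean bias, i.e. the population standard deviation of the residuals, so that RMSE² = b² + (bias-corrected RMSE)²; the function name is hypothetical.

```python
import math


def error_stats(estimated, reference):
    """Return (bias, rmse, bias_corrected_rmse) for paired hourly values.

    bias = mean(estimated) - mean(reference), i.e. the mean residual."""
    n = len(estimated)
    residuals = [e - r for e, r in zip(estimated, reference)]
    bias = sum(residuals) / n
    rmse = math.sqrt(sum(d * d for d in residuals) / n)
    # Remove the constant offset before computing the spread.
    bc_rmse = math.sqrt(sum((d - bias) ** 2 for d in residuals) / n)
    return bias, rmse, bc_rmse
```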
The increase of SDR is also due to a loss of sensitivity over time.The aging of the sensors can be further investigated by recalibrating the devices, i.e. determining the coefficients of regression model D, using the data of the second calibration period (see the Supplemental Material).
The panels in Figure 11 compare the model D coefficients of the two calibration periods. In terms of R², the NO2-B42F sensors show both the worst performance and the largest performance loss.

Weighted calibration
Taking 18 μg m⁻³ as a typical NO2 concentration in an urban environment (Figure 4), the sensor drift listed in Table 4 is a significant error component, even after a two-month period. It is impossible to predict how the bias of an individual sensor will develop. However, using the second calibration period, we can compensate for signal drift in hindsight. If ŷ₁(t) is the NO2 value estimated at time t with the coefficients from the first calibration period (starting at t₁), and ŷ₂(t) the value estimated with the coefficients from the second calibration period (ending at t₂), then for intermediate times t₁ ≤ t ≤ t₂ we take a weighted average of both calibrations:

ŷ(t) = [1 − f(t)]·ŷ₁(t) + f(t)·ŷ₂(t)

Assuming that the sensor degradation is linear in time, we select f(t) = (t − t₁)/(t₂ − t₁), such that f(t₁) = 0 and f(t₂) = 1.
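A minimal sketch of the weighted calibration, assuming times are plain numbers such as hours since t₁ (datetime objects would also work, since dividing two timedeltas gives a float). Because all calibration models are linear in their coefficients, weighting the two NO2 estimates is equivalent to weighting the two coefficient vectors, which is how the abstract phrases the method; the third function illustrates that. All function names are hypothetical.

```python
def weight(t, t1, t2):
    """Linear weight f(t) with f(t1) = 0 and f(t2) = 1."""
    if not t1 <= t <= t2:
        raise ValueError("t must lie between t1 and t2")
    return (t - t1) / (t2 - t1)


def weighted_estimate(t, t1, t2, est_first, est_second):
    """Blend NO2 estimates based on the first and second calibration."""
    f = weight(t, t1, t2)
    return (1.0 - f) * est_first + f * est_second


def weighted_coefficients(t, t1, t2, coef_first, coef_second):
    """Equivalent blend of the regression coefficients themselves."""
    f = weight(t, t1, t2)
    return [(1.0 - f) * a + f * b for a, b in zip(coef_first, coef_second)]
```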

Validation against Oude Schans station
From 14 June to 16 August, sensor 54200 was placed at Korte Koningsstraat (ground floor, street side), 120 m from another GGD station at Oude Schans, which is also classified as an urban background station. Korte Koningsstraat is a side street, away from traffic arteries. The proximity of a reference station enables an independent validation of the sensor measurements, as the sensor calibration is based on side-by-side measurements at Vondelpark station, at 3 km distance. As can be seen from Table 5, the sensor readings agree very well with the official measurements. Figures 12(a) and 12(b) show the time series and the scatter plot.
Using the weighted calibration explained in the previous section, the measurement bias largely disappears.The RMSE is comparable to the RMSE found during the calibration period (see Table 4).The results give confidence that our calibration method is independent of location, and that our assumption of sensor degradation being linear in time is acceptable.

Discussion
Like all electrochemical NO2 sensors, the Alphasense NO2-B4 sensor is not very selective to the target gas. The sensor response is best explained as a linear combination of NO2, O3, temperature, and humidity signals (R² ≈ 0.9).
As a consequence, a linear combination of the Working Electrode and Auxiliary Electrode signals alone gives a poor indication of ambient NO2 concentrations. The accuracy varies greatly between sensors (R² between 0.3 and 0.7). For the Urban AirQ campaign, temperature and relative humidity were included in a multilinear regression approach. The results improve significantly, with R² values typically around 0.8. This corresponds well with the findings of Jiao et al. (2016), who report an adjusted R² = 0.82 for the best-performing electrochemical NO2 sensor in their evaluation when including T and RH.
The best results are obtained by also including ozone measurements in the calibration model: R² increases to 0.9. Spinelle et al. (2015b) used a similar regression and found R² ranging from 0.35 to 0.77 for four electrochemical NO2 sensors during a two-week calibration period, dropping to 0.03 to 0.08 when applied to a subsequent 5-month validation period. The low NO2 values at their semi-rural site partly explain this poor performance, but unaccounted-for effects such as changing sensor sensitivity and signal drift most likely also contribute.
The uncertainty of our NO2 estimates (typically 7 μg m⁻³) consists of several components. The reference measurements by the NO/NOx analysers have an estimated hourly error of 3.65% (certified validation at an NO2 concentration of 200 μg m⁻³), which would contribute about 0.5 μg m⁻³ under typical conditions. The low-cost DHT22 sensor has a reported error of 0.5 °C for temperature and 2 to 5% for RH. For a single measurement, this would contribute an error of approximately 1 μg m⁻³ and 0.5 μg m⁻³, respectively (Figures 11(d) and 11(e)). It should be noted, however, that binning minute-based measurements into hourly averages removes a large part of the variability, while determining the best-fitting regression model for each sensor device removes a large part of the remaining systematic biases. The largest part of the error is therefore introduced by the linear regression model itself, which does not include all interfering species or meteorological quantities and cannot describe non-linear dependencies of its variables. One should therefore be careful when extrapolating the calibration model to conditions different from those of the calibration period.
The sensor accuracy achieved after two calibrations and corrections is good enough to complement official measurements by providing additional information on local air quality between reference stations, and to detect unexpected hot spots (or low values) of urban NO2. However, it remains to be investigated whether the regression method used here would provide realistic estimates of peak values (such as the EU hourly limit value of 200 μg m⁻³).
The necessity of recalibration complicates practical applications in operational urban networks. Sensors must be brought back to a calibration facility on a regular basis, or be recalibrated on the spot with a travelling reference instrument. New data-driven techniques, such as Bayesian networks (e.g. Xiang et al., 2016), might offer a solution to this problem.

Conclusions and outlook
The current generation of low-cost NO2 sensors has some serious issues that complicate straightforward application. To obtain accurate electrochemical NO2 sensor measurements, careful filtering of the raw data is necessary. There is a strong spread in sensor performance, even among sensors from the same batch, which makes individual calibration essential. The accuracy of the measurements can be improved by including temperature and humidity measurements from other low-cost sensors in a multilinear regression approach. A practical calibration method is to measure side by side with an air monitoring station. This measurement period should be as long as possible (but at least a few days), to capture the sensors' behavior under a wide range of pollution levels and meteorological conditions.
The startup time of the sensors is estimated at 4 hours. To avoid the non-linear response of the electrochemical sensor at elevated temperatures, we filter out measurements above 30 °C. This is not a serious restriction for applicability in moderate climates such as in the Netherlands, provided that the sensor is protected from direct sunlight. Individual sensor drift can be compensated in hindsight by taking a weighted average of the calibration coefficients determined before and after the campaign, assuming that the sensor degradation is linear in time. Because of sensor degradation, smart recalibration programs are needed when electrochemical sensors are to be used operationally in low-cost urban networks. More research is needed to gain better insight into how sensors age in field applications, which will lead to better calibration strategies and improved data quality.
To further improve the accuracy of electrochemical NO2 measurements in low-cost sensor devices, we recommend including an additional ozone sensor to better resolve cross-sensitivity issues. Even imperfect ozone measurements will improve the NO2 estimation, as a large part of the sensor's cross-dependency issues is solved by the linear regression approach. The RH sensor signal should be used more effectively to detect sudden changes in humidity and to filter the affected data. Adding a local data logger is also recommended, to be able to recover data for periods when the WiFi connection to the central database is lost.

Figure 4
Figure 4 shows the distribution of temperature, relative humidity, NO2, and O3 during the different periods. The calibration periods are characterized by higher temperatures and ozone levels than the campaign period.

NO2 [μg m⁻³] = c0 + c1·S_WE + c2·S_AE   (3)

Figure 10(b) shows that most sensors drifted during the intermediate two-month period. Part of the drift could also be related to aging of the DHT22 temperature and RH sensor. We describe this degradation effect as a bias b between the mean of the hourly estimated NO2 values ŷᵢ and the mean of the hourly true NO2 values yᵢ during the calibration period: b = (1/N) Σᵢ ŷᵢ − (1/N) Σᵢ yᵢ, where N is the number of valid hours.

Figure 11 shows how the calibration coefficients change after two months of deployment. Keeping in mind that the S_WE signal is the only component with direct sensitivity to NO2, Figure 11(b) (all dots below the y = x line) shows that all sensors suffer from a loss of sensitivity to NO2. This results in lower R² values in Figure 11(f), although the performance loss is partly compensated by the other components in the multivariate linear regression. The older Alphasense NO2-B42F models (red dots in Figure 11(b)) are the least sensitive to NO2 and have the largest sensitivity loss, which the regression tries to compensate with an increased temperature dependence (Figure 11(d)).
The sensor devices were tested in an Amsterdam urban background setting in summertime, with NO2 values ranging from 3 μg m⁻³ to 78 μg m⁻³ and median values around 15 μg m⁻³. During the 3-month period, most sensors show a loss of sensitivity and significant drift, ranging from −9 to 21 μg m⁻³. After bias correction, we found a typical value for the accuracy of the NO2 measurements of 7 μg m⁻³.
For warmer regions or during heat waves, however, the 30 °C filter may reduce the data stream considerably, unless the temperature dependencies are better captured by more advanced regression models. The calibration coefficients seem to be location independent, as suggested by the independent validation in the proximity of a second monitoring station. However, the calibration coefficients are not constant in time. During the 3-month period, most sensors suffer from significant sensitivity loss and drift. The standard deviation of the random error is estimated at 7 μg m⁻³ for a typical sensor. The strongest drift and largest uncertainty are found for the older NO2-B42F sensors. It remains unclear whether their poorer performance is related to the sensor model or to their longer usage in field experiments.

Figure 1
Figure 1 Locations of the sensor devices during the citizen measurement campaign. The red marker indicates the GGD station at Oude Schans. Not shown is the GGD Vondelpark station, 2.5 km to the south-west.

Figure 2
Figure 2 Hardware components of a sensor device (left), and sensors in their housing (right).

Figure 3
Figure 3 Raw sensor data, unfiltered but hourly averaged, from the 16 sensors during the first calibration period, 2 to 10 June 2016. The data gap is due to a connectivity problem to the central database.

Figure 4
Figure 4 Box-and-whisker diagrams of hourly ambient parameters during the calibration period and the measurement campaign.

Figure 5
Figure 5 Reading of a typical-performing NO2-B43F sensor (ID 1185325) explained as a linear regression on NO2, O3, T, RH, and all variables, respectively. Top two rows: Working Electrode; bottom two rows: Auxiliary Electrode. The axes show A/D converter counts, which can be considered arbitrary units.

Figure 6
Figure 6 Ozone as a proxy of ambient NO 2 .

Figure 7
Figure 7(a) Calibration model results for an average-performing sensor (ID 1184838). Bottom row shows the recommended calibration by Model D (left), and the results when ozone would be included (right).

Figure 7
Figure 7(b) Time series compared to ground truth with calibration parameters of Models A and D.

Figure 8
Figure 8(a) Examples of negative spikes in the calibrated NO 2 measurements due to sensor temperatures exceeding 30 °C.

Figure 8 Figure 9
Figure 8(b) Variation of zero output of the Working Electrode caused by changes in temperature for a typical batch of electrochemical sensors. Image taken from Alphasense Data Sheet for NO2-B43F (ADS, 2016).

Figure 10
Figure 10(a) Time series of a batch of sensors, calibrated with model D, compared with the reference measurements (grey line).

Figure 10
Figure 10(b) Comparison of the time series of the same batch of sensors with the reference measurements (grey line), after two

Figure 11
Figure 11 Change in calibration coefficients of model D from the first calibration period (horizontal axis) when recalibrating after two months of deployment (vertical axis). The red dots correspond to sensor devices containing the Alphasense NO2-B42F.

Figure 12
Figure 12(a) Comparison of sensor 54200 NO2 time series with the nearby Oude Schans station (8-day snapshot), and the effect of bias correction. For comparison, measurements of Vondelpark station are also shown.

Figure 12
Figure 12(b) Scatterplot of sensor 54200 against Oude Schans station NO2 measurements during the campaign period.

Table 2
Model A: NO2 = c0 + c1·S_WE + c2·S_AE (linear combination of Working Electrode and Auxiliary Electrode)
Model B: NO2 = c0 + c1·S_WE + c2·S_AE + c3·T (temperature correction)
Model C: NO2 = c0 + c1·S_WE + c2·S_AE + c4·RH (relative humidity correction)
Model D: NO2 = c0 + c1·S_WE + c2·S_AE + c3·T + c4·RH (temperature and RH correction)
Model E: NO2 = c0 + c1·S_WE + c2·S_AE + c3·T + c4·RH + c5·O3 (additional correction for ozone cross-sensitivity)

We see that Model C (including RH) performs better than Model A, but Model B (including T) outperforms Model C. Model D (including both RH and T) only marginally improves on the results of Model B. This can be understood from the strong direct dependence of the sensor on temperature and its indirect dependence through temperature acting as a proxy for ozone. The better performance of Model C with respect to Model A can be explained by RH being a reasonable proxy for temperature. Note that measuring RH is essential for guarding the data quality of electrochemical sensors, as these sensors are very sensitive to sudden changes in RH (see e.g. AAN, 2013; Pang et al., 2016). The best calibration results (i.e. R² values closer to 1) are obtained by including ozone (Model E). The ozone values were obtained from the GGD station, as the sensor devices do not measure ozone. As local ozone measurements were only available during the calibration periods, we used Model D for the Urban AirQ campaign, i.e. generating an NO2 value based on a linear combination of S_WE, S_AE, T, and RH. The regression analysis of Model D and the correlation with the NO2 ground truth can be found in Table 3. The two worst-performing sensor boxes (14560051 and 1184206) contain the older NO2-B42F sensor. It is not clear whether their poor performance can be attributed to the different sensor model or to their longer operating time (both sensors have been used in previous experiments for more than a year).