High-resolution mapping of urban air quality with heterogeneous observations: a new methodology and its application to Amsterdam

In many cities around the world people are exposed to elevated levels of air pollution. Often local air quality is not well known due to the sparseness of official monitoring networks, or unrealistic assumptions being made in urban air quality models. Low-cost sensor technology, which has become available in recent years, has the potential to provide complementary information. Unfortunately, an integrated interpretation of urban air pollution based on different sources is not straightforward because of the localized nature of air pollution, and the large uncertainties associated with measurements of low-cost sensors. This study presents a practical approach to producing high spatio-temporal resolution maps of urban air pollution capable of assimilating air quality data from heterogeneous data streams. It offers a two-step solution: (1) building a versatile air quality model, driven by an open source atmospheric dispersion model and emission proxies from open data sources, and (2) a practical spatial interpolation scheme, capable of assimilating observations with different accuracies. The methodology, called Retina, has been applied and evaluated for nitrogen dioxide (NO2) in Amsterdam, the Netherlands, during the summer of 2016. The assimilation of reference measurements results in hourly maps with a typical accuracy (defined as the ratio between the root mean square error and the mean of the observations) of 39% within 2 km of an observation location, and 53% at larger distances. When low-cost measurements of the Urban AirQ campaign are included, the maps reveal more detailed concentration patterns in areas which are undersampled by the official network. It is shown that during the summer holiday period, NO2 concentrations drop about 10%. The reduction is less in the historic city centre, while the strongest reductions are found around the access ways to the tunnel connecting the northern and the southern part of the city, which was closed for maintenance.
The changing concentration patterns indicate how traffic flow is redirected to other main roads. Overall, it is shown that Retina can be applied for an enhanced understanding of reference measurements, and as a framework to integrate low-cost measurements next to reference measurements in order to get better localized information in urban areas.

Introduction

Detailed air quality information is needed to assess exposure to urban air pollution, to identify hot spots, and to improve the ability to anticipate events. This is especially relevant for nitrogen dioxide (NO2) concentrations, which can vary considerably from street to street. NO2 is, apart from being a toxic gas on its own, an important precursor of particulate matter, ozone, and other regional air pollutants. Observations from a single location are not necessarily representative of a larger area. Unfortunately, urban air quality reference networks are usually sparse or even absent due to their high installation and maintenance costs. New low-cost sensor technology, available for several years now, has the potential to extend an official monitoring network significantly, even though the current generation of sensors has significantly lower accuracy (WMO, 2018).
However, exploiting these measurements (either official or unofficial), apart from publishing the data as dots on a map, is not straightforward. Here, the aim is to make better use of the existing measurement networks to get the best description of hourly urban air quality, and to create value from low-cost measurements towards a Level 4 product, according to the classification proposed by Schneider et al. (2019). To obtain high-resolution information on air pollutants with sharp concentration gradients, a very sparse observation network needs to be accompanied by a valid high-resolution air quality model, whereas a very dense network can do with simple spatial interpolation. The situation in most large cities is somewhere in between. There is often a reasonably large reference network present (10+ stations), sometimes complemented with an experimental network of low-cost AQ sensors. Assumptions about underlying unresolved structures in the concentration field are still needed, but this can be done with a simplified air quality model, using the available measurements to correct simulation biases where needed.
A popular approach in detailed mapping of air quality is land use regression modelling (LURM), see e.g. Beelen et al. (2013).
LURM uses multiple linear regression to couple a broad variety of predictor variables (geospatial information such as traffic, population, altitude, land use classes) to the observed concentrations. It is typically used in exposure studies, which target long integration intervals by definition. Problems of over-fitting might arise when too many predictor variables are used.
Alternatively, Denby (2015) advocates the use of less proxy data, and a model based on more physical principles. In his approach, the emission proxies are first (quasi) dispersed with a parameterized inverse distance function, before being coupled to observed concentrations in a regression analysis.
Mapping of air pollution for short time scales is challenging. Only a few scientific studies have been published aiming at assimilation of near-real-time observations in hourly urban concentration maps. Tilloy et al. (2013) use the 3-hourly output of a well-developed implementation of the ADMS-Urban dispersion model in Clermont-Ferrand, France, to assimilate in-situ NO2 measurements at 9 reference sites in an optimal interpolation scheme. With a leave-one-out validation they show a strong reduction in root mean square error of the time series after assimilation. Schneider et al. (2017) use Universal Kriging to combine hourly NO2 observations of 24 low-cost sensors in Oslo, Norway, with a time-invariant basemap. The basemap is created from a yearly average concentration field calculated with an Eulerian/Lagrangian dispersion model on a 1 km grid, downscaled to 100 m resolution. Averaged over reference locations, their study shows that hourly values compare well with official values, showing the potential of low-cost sensor data for complementary air quality information at these time scales.
This paper presents a more advanced yet practical approach to map hourly air pollutant concentrations, named Retina. Retina uses a two-stage approach. It runs an urban air quality model to account for hourly variability in meteorological conditions (described in Section 3), which is dynamically calibrated with recent measurements (Section 4). In the second stage it assimilates current measurements using statistical interpolation (Section 5). Section 6 presents the validation of the system, while Section 7 shows the added value when assimilating additional low-cost sensor measurements. The last section is reserved for discussion, conclusions and outlook.

Air quality measurements
The Public Health Service of Amsterdam (GGD) is the responsible authority for air quality measurements in the Amsterdam area. Within the domain used in this study their NO2 network consists of 15 reference stations: 5 stations classify as road station, 5 as urban background station, 2 as industry, 2 as rural, and 1 undecided. Depending on the station, GGD operates either a Teledyne API 200E or a Thermo Electron 42I NO/NOx analyser, both based on chemiluminescence. A catalytic-reactive converter converts NO2 in the sample gas to NO, which, along with the NO present in the sample, is reported as NOx. NO2 is calculated as the difference between NOx and NO. The accuracy of both types of reference instruments is estimated at 3.7% (GGD, 2014), following the EN 14211 standard, which includes all aspects of the measurement method: uncertainties in calibration gas and zero gas, interfering gases, repeatability of the measurement, derivation of NO2 from NOx and NO, and averaging effects.
Low-cost NO2 measurements are taken from the 2016 Urban AirQ campaign (Mijling et al., 2018). Sixteen low-cost air quality sensor devices were built and distributed among volunteers living close to roads with high traffic volume for a 2-month measurement period, from 13 June to 16 August. The devices are built around the NO2-B43F electrochemical cell by Alphasense Ltd (Alphasense, 2018). The sensor generates an electrical current when the target gas diffuses through a membrane where it is chemically reduced at the working electrode. Better sensor performance at low ppb levels is obtained by using low-noise interface electronics. The sensor devices were carefully calibrated using side-by-side measurements next to a reference station, solving issues related to sensor drift and temperature dependence (Mijling et al., 2018). After calibration, they are found to have a typical accuracy of 30%.

Setting up a versatile urban air quality model
One of the largest unknowns when modelling urban air quality is a detailed, up-to-date emission inventory capable of describing the local contribution. For cities such as Amsterdam the local emissions are dominated by the transport and residential sectors. This is confirmed by the EDGAR HTAP v2 emission inventory (Janssens-Maenhout et al., 2013), which estimates the contribution of NOx emissions in a 20 × 33 km2 (0.3 degree) area around the centre at 62%, 20%, 12%, and 6% for the sectors transport, residential, energy, and industry, respectively. Especially the contribution of road transport is relevant, as its emissions are close to the ground in densely populated areas. Traffic information and population density will be used as proxies for urban emissions (see Sections 3.2.1 and 3.2.2).
In contrast to the regional atmosphere, the urban atmosphere is dominated more by dispersion processes, while many chemical reactions are less important due to a relatively short residence time (Harrison, 2018). For the dispersion of the emission sources, the open source steady-state plume model AERMOD (Cimorelli et al., 2004) is used, developed by the American Meteorological Society (AMS) and the United States Environmental Protection Agency (EPA). Based on the emission inventory and meteorology (see Section 3.2.4), AERMOD calculates hourly concentrations of air pollutants. The concentration distribution of an emission source is assumed to be Gaussian both horizontally and vertically when boundary layer conditions are stable. In a convective boundary layer, the vertical distribution is described by a bi-Gaussian probability density function. Note that any other dispersion model can be used in the Retina methodology, as long as it is capable of simulating concentrations from individual emission sectors on an arbitrary receptor mesh.
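To make the stable-case assumption concrete, the following is a minimal textbook Gaussian plume sketch. It is not AERMOD's actual formulation: AERMOD derives its dispersion parameters from boundary-layer similarity profiles, whereas here the plume spreads are passed in directly, and the function name and interface are illustrative.

```python
import numpy as np

def gaussian_plume(q, u, y, z, sigma_y, sigma_z, h=0.0):
    """Concentration from a steady point source of strength q (ug/s)
    at effective height h (m), for wind speed u (m/s), evaluated at
    crosswind distance y and height z (m). sigma_y and sigma_z are the
    plume spreads at the receptor's downwind distance; AERMOD
    parameterizes these from boundary-layer similarity theory.
    Ground reflection is included via the image-source term."""
    lateral = np.exp(-0.5 * (y / sigma_y) ** 2)
    vertical = (np.exp(-0.5 * ((z - h) / sigma_z) ** 2)
                + np.exp(-0.5 * ((z + h) / sigma_z) ** 2))
    return q / (2.0 * np.pi * u * sigma_y * sigma_z) * lateral * vertical
```

On the plume centreline (y = 0, z = h = 0) the reflection term doubles the concentration, and the concentration decays rapidly with crosswind distance.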

AERMOD simulation settings
AERMOD version 16216r is used with simulation settings summarized in Table 1, operating on a rectangular domain of 18 × 22 km2 covering the municipality of Amsterdam for the most part. All coordinates are reprojected in a custom oblique stereographic projection (EPSG:9809) around the city centre coordinate, such that the coordinate system can be considered equidistant at the urban scale. Instead of using a regular grid, a road-following receptor mesh with variable resolution is used. Roads are modelled as line sources, while residential emissions are described as area sources. The dispersion is calculated for NOx to avoid a detailed analysis of the rapid cycling between its constituents NO and NO2. Afterwards, an NO2/NOx ratio is applied, depending on the available ozone (O3), see Section 3.1.1.
Memory usage of AERMOD for the Amsterdam domain is proportional to the total number of emission source elements (here 17,069 road fragments and 12,182 residential squares) and the number of receptor points in the road-following mesh (here 42,128). The calculation time for a single concentration field is around 10 minutes, but can be reduced to a fraction of this by parallelizing the code.

Ozone chemistry and lifetime
Primary emissions of NO2 (e.g. directly from the tailpipe) are only 5-10% of the total emitted NOx (Sokhi et al., 2007). At short time scales, secondary NO2 is formed by oxidation of NO with O3, while this reaction is counterbalanced by photolysis converting NO2 back to NO. The reaction rate of the first reaction is temperature dependent, while the latter depends on the available sunlight. The NO2/NOx ratio therefore has an intricate dependence on temperature, radiation, and the proximity to the source (i.e. the travel time of the air mass since emission).
A practical approach to estimate this ratio is the Ozone Limiting Method (OLM), as described in EPA (2015). The method uses ambient O3 to determine how much NO is converted to NO2. The dispersed (locally produced) NOx concentration is divided into two components: the primary emitted NO2 (here assumed to be 10%), and the remaining NOx, which is assumed to be all NO available for reaction with ambient O3: NO + O3 → NO2 + O2. If the mixing ratio of O3 is larger than 90% of the NOx mixing ratio, then all NO is converted to NO2. Otherwise, the amount of NO converted is equal to the available O3, i.e. (NO2) = 0.1(NOx) + (O3). The reaction is assumed to be instantaneous and irreversible. The resulting NO2 concentration is added to the NO2 background concentration. Removal processes of NOx are modelled with an exponential decay. The chemical lifetime is in the order of a few hours. Liu et al. (2016) find NOx lifetimes in a range from 1.8 to 7.5 h using satellite observations over cities in China and the USA.
Given the size of the domain and average wind speeds, its exact value is not of great importance here. Based on regression results a practical value of 2 hours is chosen.
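The OLM split described above can be sketched in a few lines. This is a minimal illustration: the function name and the ppb units are assumptions, while the 10% primary NO2 fraction follows the text.

```python
import numpy as np

def olm_no2(nox, o3, primary_frac=0.10):
    """Ozone Limiting Method: estimate NO2 from locally dispersed NOx.

    nox : locally produced NOx mixing ratio (ppb)
    o3  : ambient ozone mixing ratio (ppb)
    primary_frac : fraction of NOx emitted directly as NO2 (10% here)
    """
    nox = np.asarray(nox, dtype=float)
    o3 = np.asarray(o3, dtype=float)
    no_avail = (1.0 - primary_frac) * nox   # NO available for titration by O3
    converted = np.minimum(no_avail, o3)    # conversion limited by ambient O3
    return primary_frac * nox + converted
```

When ambient O3 exceeds 90% of the NOx mixing ratio, all available NO is converted and the function returns the full NOx value as NO2; otherwise the conversion is limited to the available O3.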

Simulation input data
The dispersion simulation is driven by input data regarding emissions, background concentrations, and meteorology, listed in Table 2. All data, except for the traffic counts of inner city traffic, are taken from open data portals. The emission proxies are mapped in Fig. 2.

Traffic emissions
A recurrent problem when building urban air quality models is finding sufficiently detailed traffic emission information. Traffic emissions depend roughly on traffic flow and fleet composition, including engine technology. For many cities, unfortunately, this information is not available. Here a distinction is made between highways and primary roads, as both have a distinct traffic volume and weekly cycle. Differences in driving conditions and fleet composition are captured by assigning two different emission factors later on.
Road locations are taken from OpenStreetMap (OSM), a collaborative project to create a free editable map of the world. A distinction is made between urban roads (labelled in OSM as "primary", "secondary", and "tertiary") and highways (labelled as "motorway" and "trunk"), as they have a distinct traffic pulse, fleet composition, and driving conditions. Road segments labelled as "tunnel" are not taken into account.
When the traffic flow q (in vehicles per hour) is known, the emission rate E for a road segment of length l can be written as

E = α_veh q l (1)

with emission factor α_veh representing the (unknown) NOx emission per unit length per vehicle. Hourly traffic flow data is taken from 29 representative highway locations from the National Data Warehouse for Traffic Information (NDW, 2019), which contains both real-time and historic traffic data. For the urban traffic flow, data from 24 inductive loop counters provided by the traffic research department of Amsterdam municipality is used. Due to its large numbers, traffic flow is relatively well predictable, especially when lower volumes during holiday periods and occasional road closures are neglected. For each counting site a traffic "climatology" is constructed, parametrized by hour and weekday, based on hourly data of 2016, see Fig. 3.
Traffic counts correlate strongly between different highway locations, all showing a strong commuting and weekend effect.
Urban traffic typically shows, apart from lower volumes, less reduction between morning and evening rush hours, a less pronounced weekend effect, and higher traffic intensities on Friday and Saturday night. For locations x between the counting locations x_i the traffic flow q(x) is spatially interpolated by inverse distance weighting (IDW):

q(x) = Σ_i w_i q(x_i) / Σ_i w_i (2)

in which the weighting factors w_i depend on the distance d between x and the counting location x_i:

w_i = 1 / d(x, x_i)^p (3)

Validation in the Supplementary Material shows that for this counting network IDW predicts the traffic volume within a 50% error margin at most locations. Better results are obtained when more counting locations are available, or when they are selected strategically around crossings and access roads. Model simulations show that using inferior traffic data is partly compensated by the calibration (Section 4), at the expense of less pronounced concentration gradients.
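The IDW scheme for the counter data can be sketched as follows. The weighting exponent and the interface are assumptions for illustration.

```python
import numpy as np

def idw(x, xi, qi, power=2.0, eps=1e-12):
    """Inverse distance weighting of traffic flows qi observed at counter
    locations xi (an (m, 2) array of metric coordinates) to an arbitrary
    location x. The exponent `power` is an assumption; the paper does not
    state it in this section."""
    d = np.linalg.norm(xi - x, axis=1)
    if np.any(d < eps):              # exactly on a counter: return its value
        return qi[np.argmin(d)]
    w = d ** -power
    return np.sum(w * qi) / np.sum(w)
```

Halfway between two counters the interpolated flow is simply their average, and the estimate reproduces the observed flow at each counter location.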

Population data
Population density is considered to be a good proxy for residential emissions, e.g. from cooking and heating. Here, data is taken from the gridded population database of 2014, compiled by the national Central Bureau for Statistics (CBS, 2019) at a 100 m resolution. Each grid cell is offered to the dispersion model as a separate area source. To reflect the observation that residential emissions per capita are lower when people are living closer to each other (Makido et al., 2012), the emission fluxes are taken proportional to the square root of the population density p:

E ∝ √p (4)
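The square-root scaling of the residential proxy is trivial to express. In this sketch `a_pop` stands in for the residential emission factor fitted later in the calibration; its default value is a placeholder.

```python
import numpy as np

def residential_flux(pop_density, a_pop=1.0):
    """Residential NOx emission flux taken proportional to the square root
    of the population density p of a 100 m grid cell. `a_pop` names the
    fitted emission factor; 1.0 is purely illustrative."""
    return a_pop * np.sqrt(pop_density)
```

The sub-linear scaling means a quadrupling of population density only doubles the assumed residential flux.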

Background concentrations
As AERMOD only describes the local contribution to air pollution, background concentrations are added which are taken from the Copernicus Atmosphere Monitoring Service (CAMS) European air quality ensemble (Marécal et al., 2015). The CAMS ensemble consists of 7 regional models producing hourly air quality and atmospheric composition forecasts on a 0.1 × 0.1 degree resolution. The analysis of the ensemble is based on the assimilation of up-to-date (UTD) air quality observations provided by the European Environment Agency (EEA). Each model has its own data assimilation system.
In the CAMS product the local contributions are already present. To get a better estimate for regional background concentrations avoiding double counts, the lowest concentration found in a 0.3 × 0.3 degree area around the city for NO2 is taken, together with the mean concentration found in this area for O3.
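The background extraction from the CAMS fields could look like the sketch below; the function name and the 2-D array interface are assumptions.

```python
import numpy as np

def regional_background(no2_field, o3_field):
    """Estimate regional background from a CAMS sub-grid around the city:
    the minimum NO2 in the area (to avoid double-counting local
    contributions already present in CAMS) and the mean O3.
    Both inputs are 2-D arrays covering the 0.3 x 0.3 degree area."""
    return float(np.min(no2_field)), float(np.mean(o3_field))
```

Taking the minimum for NO2 deliberately discards grid cells dominated by the city plume itself, which would otherwise be added twice on top of the local AERMOD contribution.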

Meteorological data
The dispersion of air pollution is strongly governed by local meteorological parameters, especially the winds driving the horizontal advection and the characterization of the boundary layer which defines the vertical mixing. Meteorology also affects the chemical lifetime of pollutants.
AERMET (EPA, 2019) is used as a meteorological pre-processor for organizing available data into a format suitable for use by the AERMOD model. AERMET requires both surface and upper air meteorological data, but is designed to run with a minimum of observed meteorological parameters. Vertical profiles of wind speed, wind direction, turbulence, temperature, and temperature gradient are estimated using all available meteorological observations, and extrapolated using similarity (scaling) relationships where needed (EPA, 2018).
Hourly surface data from the nearby Schiphol airport weather station can be obtained from the Integrated Surface Database (ISD, see Smith et al. (2011)). Observations of temperature, winds, cloud cover, relative humidity, pressure, and precipitation are retrofit to match the SAMSON data format (WebMet, 2019a) which is supported by AERMET. Upper air observations are taken from daily radiosonde observations in De Bilt (at 35 km from Amsterdam), archived in the Integrated Global Radiosonde Archive (IGRA) (Durre et al., 2006). Pressure, geopotential height, temperature, relative humidity, dew point temperature, wind speed and direction are converted to the TD6201 data format (WebMet, 2019b) for each reported level up to 300 hPa.

Calibration of emission factors

The emission rate E_ik of source element i in emission sector k is written as

E_ik = a_k P_ik (5)

in which P_ik represents the corresponding emission proxy. The contribution of this source to the concentration at a receptor location j is

c_ijk = a_k P_ik D_ij (6)

with D_ij describing the dispersion of a unit emission from i to j, including the conversion from NOx to NO2 from the OLM. Eq. (6) is assumed to describe a linear relation between emission and concentration, although strictly speaking the variable NO2/NOx ratio introduces a weak nonlinearity. A regression analysis is applied for a certain period, assuming that for each t the total NO2 concentration c_j at station j can be described as a background field b and a local contribution consisting of a linear combination of the dispersed fields of K emission sectors:

c_j(t) = b(t) + Σ_{k=1..K} a_k Σ_{i=1..S_k} P_ik D_ij(t) (7)

S_k represents the number of source elements for an emission sector k. The second sum in this equation is calculated for every hour with the Gaussian dispersion model taking the meteorological conditions during t into account. Note that both background concentrations b(t) and local concentrations c_j(t) are taken from external data, see Section 3.2.3 and 2. Considering a period of T hours, Eq. (7) can be interpreted as a matrix equation from which the emission factors a_k can be solved using ordinary least squares.
Given the physical meaning of a_k, only positive regression results are allowed. In this setup, the emissions are approximated by three sectors: highway traffic, urban traffic, and population density (K=3). The resulting a_k do not necessarily represent real emission factors. Their values partly compensate for unaccounted emission sectors and unrealistic modelling (e.g. based on wrong traffic data or an incorrect chemical lifetime). In Retina, the a_k are re-evaluated every 24 hours, based on observations of the preceding week (T=168). Doing so, the periodic calibration adjusts itself to seasonal cycles and episodes not captured by the climatologies (e.g. cold spells or holiday periods). To avoid reducing the predictability of the regression model too much (a_k dropping to zero), not all reference stations are used for calibration, but only stations classified as roadside or urban background. For the Amsterdam network, N=11. The residential emissions are represented by the population density, which is a time-invariant proxy. To allow for a diurnal cycle, the residential emission factor is evaluated for two-hour bins. This brings the total number of fitted emission factors to 14: one for highway traffic, one for urban traffic, and 12 describing the daily residential emission cycle. Figure 4 shows an example of the air quality simulation after the emission factors have been determined. The stacked colours in the time series of Fig. 4b show that the contribution from different emission sectors to local air pollution can vary strongly from site to site.
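The positivity constraint on the emission factors turns the ordinary least squares fit into a non-negative least squares (NNLS) problem. A sketch, with a hypothetical interface: the design matrix is assumed to stack the hourly dispersed unit-emission fields for all calibration stations and hours.

```python
import numpy as np
from scipy.optimize import nnls

def calibrate_emission_factors(D, c_obs, b):
    """Fit non-negative emission factors a_k by least squares.

    D     : (T*N, K) design matrix; column k holds the dispersed
            unit-emission contribution of sector k at each
            station/hour pair
    c_obs : (T*N,) observed NO2 concentrations
    b     : (T*N,) background concentrations
    Returns a_k >= 0, as required by their physical meaning."""
    a, _residual = nnls(D, c_obs - b)
    return a
```

For noise-free synthetic data with truly non-negative factors, the fit recovers them exactly; with real observations the factors absorb biases of the unaccounted sectors, as discussed above.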

Diurnal and seasonal analysis of calibration results
It is important to realize that the numerous modelling assumptions prevent the calibration from realistically solving the underdetermined inverse problem of finding the underlying NOx emissions based on the observed NO2 concentrations. Instead, it evaluates how much NOx must be injected into the model to explain the observed spatial NO2 patterns (unbiased with respect to the calibration locations). To study the results of the regression analysis, a comparison was made between a summer month (July 2016, mean temperature 18.4 °C) and a winter month (January 2017, mean temperature 1.6 °C). Figure 5 shows the diurnal emissions for a 0.2 × 0.1 degree area, corresponding to the two grid cells of the EDGAR inventory covering the city centre.
Ideally, the emissions would be around the values found in the EDGAR inventory (6.23 × 10^-10 and 7.18 × 10^-10 kg NOx/m^2/s for summer and winter respectively), with a corresponding ratio between residential and transport emissions (8% and 48% for summer and winter respectively). Unlike traffic, however, the diurnal cycle for the residential contribution is not prescribed, but is shaped in the regression analysis. The seasonal analysis shows that its fitted diurnal cycle not only describes changing residential emissions, but also compensates for changing NO2/NOx ratios over the day (not included in the OLM) due to changing photochemistry and temperature. In daylight, the destruction of NO2 by photolysis (NO2 + hν → NO + O3) is strong, reducing the NO2/NOx ratio. At low temperatures, the formation of NO2 from NO (NO + O3 → NO2 + O2) is slow, also reducing the NO2/NOx ratio. Also, due to collinearity, part of the traffic emissions will be explained by population density. Therefore, the emission factors found (and the corresponding sectoral emissions) should be considered "effective" rather than real, i.e. factors which best describe the observations under the given model assumptions.

Assimilation of observations
As the air quality network is spatially undersampling the urban area, the observations need to be combined with additional model information to preserve the fine local structures in air pollutant concentrations. The interpolation technique of choice here is Optimal Interpolation (OI) (Daley, 1991), having the desired property that the Bayesian approach allows for assimilation of heterogeneous measurements with different error bars. At an observation location the model value is corrected towards the observation, the innovation depending on the balance between the observation error and the simulation error. The error covariances determine how the simulation in the surroundings of this location is adjusted. Note that OI is essentially the same assimilation scheme as kriging-based approaches. The main advantage here is that one has detailed manual control over the error covariance matrix, which allows for a more comprehensive specification of the area of influence for each contributing observation. Outside the representativity range (i.e. the correlation length) of the observations, the analysis relaxes to the model values.
Consider a state vector x representing air pollutant concentrations on the (road-following) receptor mesh (n ≈ 40,000). Define x_b as the background, i.e. the model simulation. Observation vector z contains m measurements, typically 10-100. Following the convention by Ide et al. (1997), the OI algorithm can now be written as:

x_a = x_b + K (z − H x_b) (8)

K = P_b H^T (H P_b H^T + R)^{-1} (9)

P_a = (I − K H) P_b (10)

R is the m×m observation error covariance matrix. Assuming the measurement errors are mutually uncorrelated, R is a diagonal matrix with the measurement variances on its diagonal. P_b is the n×n model error covariance matrix, describing how model errors are spatially correlated. The calculation of P_b is not straightforward; in Section 5.1 an approximation is derived.
Operator H is the forward model, which maps the model state to the observed variables and locations. The matrix calculations can be simplified by reserving the first m elements of the state vector for the observation locations, and the other n − m elements for the road-following mesh. The Gaussian dispersion model is evaluated "in-situ" at the observation locations. Avoiding reprojection or interpolation means that there are no representation errors associated with H. The simulations at the observation locations z_b can then be written as a matrix multiplication z_b = H x_b, in which H is an m×n matrix whose first m columns form the m×m identity matrix, while its remaining elements are 0.
Eq. (8) describes the analysis x_a, i.e. how the observations z are combined (assimilated) with the model x_b. It is a balance between the model covariance and the observation covariances, described by the gain matrix K in Eq. (9). K determines how strongly the analysis must incline towards the observations or remain at the simulated values, to obtain the lowest analysis error variance, P_a in Eq. (10).
Note that Eq. (8)-(10) are analogous to the first step in Kalman filtering. The second step of the filter, propagating the analysis to the next time step, cannot be made here as the plume model solves a stationary state which is independent of the initial air pollutant concentration field. Also note that since an approximated model error covariance matrix will be used, generally these equations do not lead to an optimal analysis, hence this approach is more correctly referred to as Statistical Interpolation.
Let vector c represent the observed NO2 mass concentrations, as described in Section 2. The distribution of the air pollutant concentrations resembles the lognormal distribution better than the Gaussian distribution, as can be seen from the Q-Q plots in Fig. 6. The analysis is therefore done in log-space (z_j = ln c_j), stabilizing the results by reducing the impact of less frequent measurements of high concentrations. Once returning from the log domain, Eq. (8) can be rewritten as:

c_a = c_b exp(Δz), with Δz = K (z − H ln c_b) (11)

By doing the analysis in the log domain the assimilation updates correspond to multiplication instead of addition: exp(Δz) represents the local multiplication factor with which the simulated concentration c_b is corrected. This means that the shape of the model field (e.g. strong gradients found close to busy roads) is locally preserved. Note that the error in z_j corresponds to the relative error in c_j: dz_j = d(ln c_j) = dc_j / c_j. The observation error covariance matrix is therefore R = diag(σ_1², σ_2², …, σ_m²), with σ_j the relative error corresponding to observation j.
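The log-space analysis step can be sketched as follows. This is a simplified illustration with an explicit H and a dense P_b; the function name and interface are assumptions, and in practice only the first m columns of P_b are needed.

```python
import numpy as np

def assimilate_log(c_b, c_obs, obs_idx, P_b, sigma_rel):
    """Statistical interpolation in log-space.

    c_b       : (n,) simulated concentrations on the mesh
    c_obs     : (m,) observed concentrations
    obs_idx   : (m,) indices of the observation locations in the state
    P_b       : (n, n) model error covariance (in log space)
    sigma_rel : (m,) relative observation errors
    Returns the analysis c_a; the update is multiplicative, so the
    shape (local gradients) of the model field is preserved."""
    x_b = np.log(c_b)
    z = np.log(c_obs)
    m, n = len(z), len(x_b)
    H = np.zeros((m, n))
    H[np.arange(m), obs_idx] = 1.0          # simple selection operator
    R = np.diag(np.asarray(sigma_rel) ** 2) # uncorrelated relative errors
    PHt = P_b @ H.T
    K = PHt @ np.linalg.inv(H @ PHt + R)
    dz = K @ (z - H @ x_b)
    return c_b * np.exp(dz)
```

With a near-perfect observation the analysis reproduces the observed value at its location, while a mesh point with zero error covariance to the observation is left untouched.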

Modelling the model error covariance matrix
For an optimal result in the data assimilation a realistic representation of the model covariance matrix P_b is essential. The model covariances influence the spatial representativity of the observations: when model errors correlate over larger distances, the assimilated observation will change the analysis over a longer range. Tilloy et al. (2013) choose to model the covariances depending on the road network. Error correlations are assumed to be high on the same road or on connected roads. For background locations, the correlation decreases fast in the vicinity of a road, while the error correlation between two background locations remains significant across a larger distance. The error covariances are kept constant in time, and taken independent of traffic conditions. However, P_b changes from hour to hour, mainly because varying meteorology changes the atmospheric dispersion properties.
Here, the model error covariance is estimated for each hour based on the spatial coherence of the simulated concentration field.
The covariance between two grid locations x_i and x_j can be expressed in terms of their correlation ρ_ij and their standard deviations σ_i and σ_j:

P_b(i, j) = ρ_ij σ_i σ_j (13)

The model error can only be evaluated at locations of the reference network using time series analysis. These model errors are spatially interpolated to other grid locations using IDW, analogous to Eq. (2)-(3). The correlation of model errors between different locations is parametrized with a downwind correlation length L_dw and a crosswind correlation length L_cw. The extent of the correlation lengths reflects the turbulent diffusion and transport of the Gaussian dispersed plumes for a specific hour.
From spatial analysis of the simulation data a heuristic model ρ(d) is derived which describes the dependence of the correlation on distance (Eq. 14), with d the scaled distance between x_i and x_j:

d = √( (x_dw / L_dw)² + (x_cw / L_cw)² ) (15)

where x_dw and x_cw are the separations along the downwind and crosswind axes, such that all points on an ellipse with semi-major axis L_dw and semi-minor axis L_cw have the same correlation.
To fit the parameters L_dw and L_cw for a certain hour, 1000 sample locations are selected from the road-following mesh. To represent both polluted and less polluted areas, the locations are selected such that their concentrations are homogeneously distributed over the value range, excluding the first and last 5 percentiles. For this sample, correlation lengths L_dw and L_cw are fitted using Eq. (14) and (15). Figure 7 shows the results of this analysis for two different hours. For fields with low gradients (e.g. when traffic contribution is low at night), large values of L can occur. To prevent assimilation instabilities, the fitted values of L are limited to a maximum of 10 km. During the 2016 summer months, the longest correlation lengths are found for fields with low gradients. Average midnight values, when traffic contribution is low, are about 8 km. Correlation lengths are shortest during the morning rush hour (~1 km), increasing to 3 km during the late morning and afternoon. There is a wind dependency, as stronger winds stretch the pollution plumes, increasing correlation lengths. From the fit results the average ratio between L_cw and L_dw is found to be 68%. Once the covariance parameters are known, the covariance matrix elements are calculated with Eq. (13). Note that for the calculation of the gain matrix K there is no need to calculate the full P_b matrix. Instead, P_b H^T is calculated, which, due to the structure of H, corresponds to the first m columns of the n×n matrix P_b.
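A possible implementation of such an anisotropic correlation model is sketched below. The exponential decay is an assumption standing in for the paper's fitted heuristic; the scaled elliptical distance follows the description of the downwind/crosswind correlation lengths.

```python
import numpy as np

def error_correlation(dx, dy, wind_dir_rad, L_dw, L_cw):
    """Correlation between two mesh points separated by (dx, dy) metres,
    given the wind direction and downwind/crosswind correlation lengths
    L_dw, L_cw. The exp(-d) decay is illustrative, not the paper's
    fitted heuristic."""
    # rotate the separation into downwind (u) / crosswind (v) components
    u = dx * np.cos(wind_dir_rad) + dy * np.sin(wind_dir_rad)
    v = -dx * np.sin(wind_dir_rad) + dy * np.cos(wind_dir_rad)
    # scaled distance: all points on the ellipse with semi-axes
    # L_dw and L_cw have d = 1 and hence the same correlation
    d = np.hypot(u / L_dw, v / L_cw)
    return np.exp(-d)
```

By construction the correlation is 1 at zero separation and identical for any separation landing on the same correlation ellipse.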
To assess the data quality across the domain, a leave-one-out analysis is performed at all locations of the reference network for the period June 1 to August 31, 2016. The results are summarized in Table 3 and Figure 8. From Table 3 it can be seen that the relative error in the model forecast (defined as the ratio between the RMSE and the mean of the observations) is around 58% on average. When assimilating, the error becomes dependent on the distance to the nearest observation locations. For sites having the nearest assimilated observation within 2 km distance, the average RMSE drops from 16.8 to 11.9 µg/m3, corresponding to an average relative error of 39%. For sites where the nearest assimilated observation is further away than 2 km, the average RMSE drops from 10.8 to 9.1 µg/m3, corresponding to an average relative error of 53%.
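A minimal sketch of the leave-one-out error statistics (RMSE and the relative error as defined above); the station ids and the data layout are assumptions for illustration.

```python
import numpy as np

def leave_one_out_stats(observed, analysed):
    """RMSE and relative error (RMSE / mean of observations) per station.

    observed, analysed : dicts mapping station id to 1-D arrays of hourly
    values, where `analysed` holds the assimilation result obtained while
    withholding that station's own observations (leave-one-out).
    """
    stats = {}
    for sid in observed:
        o = np.asarray(observed[sid], dtype=float)
        a = np.asarray(analysed[sid], dtype=float)
        rmse = np.sqrt(np.mean((a - o) ** 2))
        stats[sid] = {"rmse": rmse, "rel_error": rmse / o.mean()}
    return stats
```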

Added value of low-cost sensors
The previous analysis is purely based on high-quality reference measurements. This section explores whether the statistical interpolation scheme can be used to derive useful information from low-cost measurements, despite their lower accuracy. This is done by testing different assimilation configurations during the Urban AirQ campaign, from 15 June to 15 August 2016 (see Section 2). The campaign targeted a central area with 4 reference stations and 14 low-cost sensors (see Figure 10a).

Validation is done for 5 different assimilation scenarios (AS):
• AS1: Assimilation of all reference measurements (leave-one-out);
• AS2: Assimilation of measurements from 3 central reference sites (leave-one-out);
• AS3: Assimilation of low-cost data only;
• AS4: Assimilation of measurements from 3 central reference sites (leave-one-out) and all low-cost data;
• AS5: Assimilation of all reference measurements (leave-one-out) and all low-cost data.

When the number of reference sites included in the assimilation is reduced from 14 (AS1) to 3 (AS2), the correlation decreases at all 4 validation locations. At NL49012, the RMSE increases significantly due to a positive jump in the bias. The analysis being lower than the observations is due to the absence of assimilated high values at nearby street location NL49002, which enlarges the influence of the lower observations found at urban background location NL49019. At NL49019, located in the middle of the 3 assimilation locations, the RMSE does not change significantly. Apparently, the effect of assimilating observations farther away than the surrounding locations is small.
When only the observations of the 14 low-cost sensors are assimilated (AS3), instead of the observations at 3 reference sites (AS2), a notable improvement in bias and RMSE is visible at location NL49019. Here, the low-cost sensors are relatively nearby, the closest being sensor SD04 at 120 m distance. At the other validation locations, the low-cost sensor assimilation results in a similar RMSE (i.e. within 1 µg/m3), a comparable bias, but a slightly lower correlation.

The results can be further improved if both reference and low-cost sensor data are included (AS4 and AS5). At NL49019, the RMSE drops to 5.1 µg/m3 (compared to 7.8 µg/m3 when no low-cost data are included) while the correlation increases to 0.89.
Again, there is no significant difference between including the three surrounding reference locations and including all reference locations. Also at street location NL49017 and urban background location NL49003, the inclusion of low-cost sensor data improves RMSE and correlation compared to assimilations with reference data only (AS4 vs. AS2, and AS5 vs. AS1). At location NL49012, the bias reduces considerably only when all reference data are included in the assimilation (AS1 and AS5).
The different assimilation scenarios show that low-cost sensor data assimilation improves the results locally, even in the absence of reference data. Generally, the best results are obtained when both reference data and low-cost data are included. Assimilation can reduce local model biases. However, unrealistically modelled covariances can locally lead to the introduction of an additional bias.

Next, a monthly averaged concentration map of Amsterdam is constructed with all reference data and all low-cost sensor data from the first half of the Urban AirQ campaign, see Fig. 10b. The addition of the low-cost data lowers the assimilation results by several µg/m3 in the undersampled area west of Oude Schans (NL49019), while the NO2 increases by several µg/m3 around the traffic arteries found south and east of this location (Fig. 10c). A large fraction of the traffic on these roads uses the IJ-tunnel to cross the river. On a monthly basis, this tunnel is used by approximately one million vehicles.

The second half of the Urban AirQ campaign coincides with the start of the summer holiday period and the closure of the IJ-tunnel for maintenance. Comparison of the NO2 concentration maps of both periods reveals interesting features (Fig. 11). Based on averaged NO2 measurements at rural stations NL49565 and NL49703, the NO2 reduction due to meteorological variability is estimated to be 7%. The overall drop in NO2 concentrations in the central area, however, is around 10%, due to reduced traffic during the summer break. A notable exception is the historic city centre, where the NO2 reduction is only a few percent, probably related to the steady economic activity driven by tourism. The strongest NO2 reductions, around 15%, are found around the access ways of the IJ-tunnel. A few main roads (e.g. De Ruijterkade/Piet Heinkade and Ceintuurbaan) show less NO2 reduction than average, apparently due to redirected traffic avoiding the tunnel.
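The separation of meteorological and traffic-driven NO2 changes sketched above can be illustrated as follows, under the assumption that the two effects combine multiplicatively; the function names are hypothetical, and this is not the paper's exact estimation procedure.

```python
def no2_change(before_mean, after_mean):
    """Relative change between two period-mean concentrations
    (negative values indicate a reduction)."""
    return (after_mean - before_mean) / before_mean

def traffic_driven_change(urban_change, rural_change):
    """Approximate the traffic-driven part of an urban NO2 change by
    removing the meteorological signal estimated from rural stations,
    assuming the two effects combine multiplicatively:
        (1 + total) = (1 + meteo) * (1 + traffic)
    """
    return (1 + urban_change) / (1 + rural_change) - 1
```

For example, a 10% urban drop combined with a 7% meteorologically driven drop at the rural stations would leave roughly a 3% traffic-driven reduction under this multiplicative assumption.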
Conclusions

As air pollution gradients can be strong in the urban environment, it is essential to combine (sparse) measurements with an air quality model when aiming at street-level resolution. Retina is a practical approach to interpolating hourly urban air quality measurements. The first step consists of a simulation by a dispersion model which is driven by meteorological data and proxies for traffic and residential emissions. The model is calibrated daily with historic measurements. In the second step, observations of different accuracy are assimilated using a statistical interpolation scheme.
Validation analysis confirms that the European CAMS ensemble is a good predictor for hourly NO2 concentrations found in the urban background. However, the CAMS data for NO2 can be misleading when interpreted at the local scale, as the predicted diurnal cycle often deviates substantially from that observed at urban air quality stations. Local effects can be better resolved when CAMS data is used for background concentrations in a dispersion model which is driven by proxies for traffic and residential emissions. The mapping results improve considerably in the second Retina step, when available observations are assimilated by the statistical interpolation scheme. When assimilating measurements of the reference network, the relative error in NO2 concentrations drops to 44% on average. The local error depends on the distance to the nearest observations: approximately 39% within 2 km of an observation site, increasing to 53% for larger distances. The typical correlation increases from 0.6 to 0.8.
The Bayesian assimilation scheme also allows us to improve the results by including low-cost sensor data, in order to obtain improved localized information. However, biases must be removed beforehand with careful calibration, as most low-cost air quality sensors suffer from issues like cross-sensitivity or signal drift, see e.g. Mijling et al. (2018). The assimilation of low-cost sensor data from the Urban AirQ campaign reveals more detailed structure in concentration patterns in an area which is undersampled by the official network. The additional measurements correct for wrong assumptions in the traffic emissions used in the a priori interpolation, and give better insight into how traffic rerouting (for instance due to closure of an arterial road) affects local air quality.

Retina has been built on open data to facilitate flexible application to other cities. The meteorology needed for AERMOD is taken from the global data sets of ISD and IGRA. Road network information can also be obtained globally from OpenStreetMap.
Traffic data tends to be harder to obtain. When no local data on diurnal and weekly traffic flow is available, its patterns must be estimated. In the absence of local census data, population density data can be taken from the Global Human Settlement database (Schiavina et al., 2019), which has global coverage at 250 m resolution. For applications within Europe, the necessary background pollutant concentrations can be obtained from CAMS. For applications outside Europe, other data sets have to be found.
In general, degraded input data and imperfections in the dispersion modelling will deteriorate the system's capability to resolve local structures; they will lower the effective spatial resolution of the simulations. In the extreme case, the system will only describe a blurred urban background contribution added to the rural background. Conversely, with improved input data and atmospheric modelling, the effective resolution will improve, reducing local biases. This is the focus of future research.
Significant inaccuracies due to local emissions which are not described adequately by the proxies (e.g. from industry, port, and airport activity) can be reduced by including these sources explicitly in the dispersion modelling. Small-scale structures provoked by the local built-up area will be better described by introducing the street canyon effect. The model will also benefit from a more detailed traffic emission model, based on more counting locations and aggregated over shorter time intervals. Ideally, such an emission model also takes local differences in fleet composition into account. Finally, simulations will gain accuracy with a more realistic NOx chemistry, concerning the NOx chemical lifetime (influencing the plume length) and the NO2/NOx ratio.
Overall, the error of the assimilation results depends on the accuracy of the air quality model, the number of assimilated observations, the quality of the observations, and the distance to the observation locations. A reasonable approximation of the model covariance matrix is found by assuming the model covariance to be isotropic and by fitting correlation lengths along the downwind and crosswind axes for every hour. Finding a more realistic description of the model covariance matrix will better suppress the introduction of bias by the assimilation, and will be subject to future research. Apart from the assessment of historic data such as in this study, Retina has been applied successfully for near-real-time monitoring and forecasting of NO2 in the cities of Amsterdam, Barcelona, and Madrid. Future work includes the extension to other cities inside and outside of Europe, and the application of Retina to other pollutants such as particulate matter.

Author contribution
All work in this study was carried out by the author.