Evaluation of the coupled high-resolution atmospheric chemistry model system MECO(n) using in situ and MAX-DOAS NO 2 measurements

. We present high spatial resolution (up to 2.2 × 2.2 km 2 ) simulations focussed over south-west Germany using the online coupled regional atmospheric chemistry model system MECO(n). Numerical simulation of nitrogen dioxide ( NO 2 ) surface volume mixing ratios (VMR) are compared to in situ measurements from a network with 193 locations including background, trafﬁc-adjacent and industrial stations to investigate the model’s performance in simulating the spatial and temporal variability of short-lived chemical species. We show that the use of a high-resolution and up-to-date emission inventory is 5 crucial for reproducing the spatial variability, and resulted in good agreement with the measured VMRs at the background and industrial locations with an overall bias of less than 10%. We introduce a computationally efﬁcient approach that simulates diurnal and daily variability in monthly resolved anthropogenic emissions to resolve the temporal variability of NO 2 . MAX-DOAS measurements performed at Mainz (49.99 ◦ N, 8.23 ◦ E) were used to evaluate the simulated tropospheric vertical column densities (VCD) of NO 2 . We propose a consistent and robust approach to evaluate the vertical distribution of 10 NO 2 in the boundary layer by comparing the individual differential slant column densities

The subscript 'di' in Table 1 indicates the use of diurnal and day-of-the-week variability in NO x and CO emissions from the road transport and residential and non-industrial combustion sectors (see appendix A for further details). For the TNO f l and UBA f l set ups, constant anthropogenic emissions are used for the complete month. The sector-wise anthropogenic emissions are imported via the IMPORT submodel (Kerkweg and Jöckel, 2015). For specifying the temporal profiles (diurnal and day-135 of-the-week) in the anthropogenic emissions, we first created hourly resolved time series of scaling factors to be applied to  Figure A1. Please note that the factors have a weekly cycle, and these are normalized such that the total emission over a week is conserved for a given sector. Individual hourly time series of emission scaling factors are imported via IMPORT_TS (Kerkweg and Jöckel, 2015). The scaling factor for a specific model time step is calculated by interpolating the time series and applying to the monthly emissions and subsequently, the emissions flux and the telescopes (T1-T4) point towards the same elevation angle (EA). One complete measurement sequence for each telescope involved measurements at eight off-axis elevation angles (1, 2, 3, 5, 8, 10, 15 and 30 • ) and in the direction of the zenith. 165 The field of view at 1 • elevation angle was blocked partially for the different telescopes and hence was discarded from the subsequent analyses. We applied the DOAS principle (Platt and Stutz, 2008) to the measured spectra to retrieve the elevation angle dependent differential slant column densities (dSCD) of NO 2 and the oxygen dimer (O 2 -O 2 or O 4 ) adapting to the fit setting described in Table C1. The dSCDs can be regarded as the difference between the concentration integrated along the light path at a chosen elevation angle and the concentration integrated along the direction of the zenith. This approach is used 170 to eliminate the stratospheric information and retrieve the tropospheric contribution.
In order to retain only the highest quality DOAS fit results, we discarded all retrievals with fit rms (root mean square) values greater than 1.0×10 −3 . NO 2 VCDs are retrieved using the geometric approximation on the measured dSCDs at 30 • elevation angle. Since MECO(3) was configured to write the output for the CM02 domain at an hourly frequency as mean values, we also average the MAX-DOAS retrieved quantities (see section 2.2.2) at a similar frequency while discarding retrieval with high 175 spectral analysis rms values.

Cloud classification
A cloud classification was performed using MAX-DOAS measurements of the colour index (CI; the ratio of measured signal at 330 nm and 390 nm) and the O 4 dSCDs according to the method described by Wagner et al. (2016). In order to generate robust thresholds for the cloud classification, one month of data is not sufficient, and hence we used a longer time series of MAX-180 DOAS measurements from 27.03.2018 until 14.09.2018. We performed cloud classification separately using measurements performed by the four telescopes. Figure C1 summarizes the cloud conditions for all the days of May 2018 for telescope T2.
Briefly, clear sky conditions were observed from 05-05-2018 until the afternoon of 09-05-2018. On other days, cloudy conditions were observed for several hours with sky conditions alternating between broken clouds, continuous clouds and optically thick clouds. The cloud classification results for the other telescopes were similar to that of T2. 185 2.2.2 Retrieval of differential box air mass factors using 3D aerosol profile inversion As mentioned above, the dSCDs retrieved from MAX-DOAS measurements depend on the differential light path between the off-axis (EA = α) and zenith measurements. dSCDs are related to the VCDs via the differential air mass factors (dAM F s) according to the following equation: The O 4 mixing ratio is almost constant throughout the troposphere and its VCD only depends on the atmospheric temperature and pressure profile. Hence, using measured O 4 dSCDs and the knowledge of O 4 VCDs, the corresponding dAM F s can be calculated. If we visualize the atmosphere in several discrete layers, the partial dSCD in a specific layer (k) would be related to the partial VCD (V k ) and the differential box air mass factor (dbAM F α,k ) would be specific for the layer k in a similar way as in equation 1. dAM F can be reconstructed from the dbAM F α,k according to equation 2: The presence of aerosols can change the light path, and hence the dSCDs (and consequently the dAM F s). Profile inversion algorithms can find the optimal aerosol extinction profiles corresponding to the measured O 4 dSCDs for a sequence of elevation angles (Wagner et al., 2004;Clémer et al., 2010;Wagner et al., 2011). This can be subsequently used to calculate the dbAM F α,k in the discrete atmospheric layer indexed by k. In addition to the O 4 VCDs and measured dSCDs, the profile 200 inversion algorithms require an offline look-up table of O 4 dAM F s corresponding to various combinations of measurement geometry and aerosol extinction profiles calculated using radiative transfer models (e.g., McArtim; Deutschmann et al. (2011)).
We used the profile inversion algorithm π-MAX (Parameterized profile Inversion for MAX-DOAS measurements) (Remmers et al., in preparation), for the retrieval of the dbAM F s. In comparison to the traditional parametrized profile inversion algorithms (e.g., MAPA; (Beirle et al., 2019)), which only parametrizes the aerosol optical depth (AOD) and vertical profiles 205 of aerosol extinction (e.g. shape (s) and height (h) of the profile) for a 1D retrieval (along altitude), π-MAX includes additional parameters related to the horizontal gradients in the viewing direction. Figure 2 shows the schematic of a traditional profile inversion for an example case of AOD = 1.0, h = 1 and various parametrizations of s representing the respective profile shapes as well as additional parameters g and l for π-MAX. These additional parameters describe the linear aerosol extinction change (g) from the telescope location to a specific distance (l). Hence, they allow retrievals of 2D dbAM F s (and aerosol extinction profiles) as a function of distance from the telescope and altitude from the instrument, if measurement in only one azimuth direction is considered. If measurements in several azimuth directions are combined, 3D retrievals can be performed. In the current π-MAX set up, l is fixed to 10 km. The dSCD measurements in all four directions are used simultaneously with the constraint that the profile at the origin (location of the instrument) is the same for all the telescopes.
The quality of the profile retrieval from π-MAX can be qualified using the rms of the dSCD fit corresponding to each 215 complete elevation sequence. In order to retain only the highest quality profile inversion results, we have retained retrievals corresponding to rms values less than 0.04 times the O 4 VCDs.

In situ chemical and meteorological measurement data
We used the surface temperature, relative humidity and wind measurement data from the climate data centre of the German online using the chemiluminescence method, in which NO 2 is reduced to NO using a heated molybdenum converter prior to its detection (Eickelpasch and Eickelpasch, 2004). Only at Schmücke (DEUB029), a photolytic converter is used in place of the molybdenum converter, whereas at Pfälzerwald-Hortenkopf (DERP017), a CAPS (Cavity Attenuated Phase Shift Spec-230 troscopy) instrument is used for measurement of NO 2 (https://www.env-it.de/stationen/public/downloadRequest.do). O 3 is measured online using UV absorption technique at all the stations. The measured in situ data are available at 1-hour resolution.

Results and Discussion
3.1 Meteorological evaluation: surface temperature, relative humidity and wind speed evaluate the performance of MECO(n) in the CM02 set up with respect to the measured surface temperature, relative humidity and wind speed close to the surface.
The ability of the model to reproduce the temporal variability at multiple measurement stations can be evaluated using Taylor diagrams (Taylor, 2001), where we show the Pearson correlation coefficient (R), relative root mean square difference (RMSD) and relative standard deviation (RSD) with respect to the hourly resolution measured data. The Taylor diagrams for 245 these parameters are shown in Figure 3. For surface temperature, the trends in the hourly resolved time series agree quite well with Pearson correlation coefficients generally between 0.8 and 0.9. The spatial patterns of the surface temperatures are also represented very well as inferred from small and precise RMSD values of circa 0.5. RSD values of less than 1 indicate that observed temporal variability in the model has a smaller amplitude than that of the measurements. There is a cold bias of ∼3 • C across the domain, which is similar to 250 that observed by Mertens et al. (2016) for Germany in summer. Previous long term evaluation of the COSMO-CLM model has shown a cold bias of 2 -2.5 • C compared to observation of the annual mean surface temperature over Germany, which increases in the summertime (Böhm et al., 2006). This bias is most probably due to inaccurate representation of root depth and soil temperature damping in the soil model. For relative humidity, the trends in the hourly resolved time series agree reasonably well with Pearson correlation coefficient generally between 0.5 and 0.7. Both positive and negative mean biases are observed 255 for the different stations. For wind speeds, the Pearson correlation coefficients are generally between 0.2 and 0.5, but the bias was generally small and in the range of ±1 m s −1 .
3.2 Evaluation of surface mixing ratios of NO 2 In this section, we present the model results for simulated NO 2 surface volume mixing ratios (VMR) and compare with the in situ observations for May 2018. Figure 4 shows the spatial distribution of monthly mean NO 2 VMRs in the lowest vertical 260 layer (0-20 m) for the CM02 domain for the three model set ups listed in Table 1. The monthly mean VMRs from the in situ measurement stations are depicted as square, circle and pentagon markers for background, traffic-adjacent and industrial sites, respectively, overlaid on the maps using the same colour scale as that for simulated VMRs. Overall, the spatial distribution of NO 2 VMRs is as expected, such that the high values are observed in densely populated areas, e.g., the Ruhr area, Luxembourg, around Frankfurt, Mannheim, Karlsruhe and Stuttgart. For the simulation with the 265 high-resolution UBA emissions, we observe many details in the NO 2 surface concentration with higher values coinciding with the major motorways of Germany which were not so obvious with TNO f l (e.g., A61 motorway between Köln and Bingen, A3 between Frankfurt and Bingen, A48 and A1 connecting Koblenz and Luxembourg and A4 and A9 between Gießen and Leipzig). The performance of the model in reproducing the spatial variability can be quantitatively described using the root mean square deviation (RMSD) between the monthly mean measured and simulated NO 2 VMRs for all the measurement 270 stations combined. We note that using the high-resolution UBA emissions improves the RMSD for background locations from 3.3 ppb (∼ 45% of the measured mean) for TNO f l to 2.7 ppb (∼ 37%). Since UBA emissions are up to date and are available for the same year as that of simulation, the mean bias for the background locations also improves from -2.0 ppb (-27%) for TNO f l to -0.5 ppb (-7%) for UBA f l . At locations near heavy traffic, the bias improved from -12.5 ppb (-63%) for TNO f l to -10.4 ppb (-52%) for UBA f l . over urban centres in Germany have also pointed towards an underestimation of as much as a factor of 2 from the transport sector and 1.5 overall (Kuik et al., 2018) by the TNO MACC III emissions, even by considering a conservative approach. The underestimation of the a priori NO x emissions is the most important factor for the large negative bias in the TNO f l set up.
Getting up to date emission inventories is difficult, and we could only get this data for Germany, but for future studies with simulation involving larger domains which include more countries, it is recommended to use more current emission inventories.

285
Unlike for the background locations, we do not observe a major improvement for the traffic-adjacent locations using the UBA emissions. Even at such high spatial resolution (2.2 × 2.2 km 2 ), the spatial smoothing leads to insufficient reproduction of peaks for locations close to strong emission sources as also documented by Shaiganfar et al. (2015). Another factor which could contribute to the differences between measured and simulated NO 2 is related to the chemiluminescence principle used for measurements: NO 2 is first reduced to NO before subsequently reacting with O 3 generated within the analyser. This is 290 known to overestimate the actual NO 2 , because the molybdenum converter within the analyser also reduces the NO x reservoir species (e.g., HNO 3 , PAN) to NO prior to detection (Dunlea et al., 2007). PAN and HNO 3 are more abundant at the traffic and urban locations with a combined monthly mean mixing ratio of between 0.9 and 1.1 ppb in the UBA di set up. This could account for 3 -10% of the measured NO 2 at the traffic-adjacent locations. In Germany, transport emissions account for > 45% of the total NO x emissions, which show a large diurnal variability (greater than 200 % peak to peak; see Figure A1). These variabilities are generally not taken into account for regional model simulations and have shown to cause larger bias during peak traffic hours on weekdays (Kuik et al., 2018). In the UBA di set up, accounting for diurnal and day-300 of-the-week variability in the anthropogenic emissions shows significant improvement with R values of between 0.3 and 0.6, smaller RMSD values and more consistent agreement for different stations. However, we also note that overall negative bias is increased for the UBA di set up compared to UBA f l . The diurnal profiles of emissions in the transport sector increase the NO x amount to more than twice as much during the daytime when the atmospheric lifetime is lower and decreases to less than one quarter during the night when lifetime is high. Hence, overall, the monthly mean surface NO 2 VMR decreases when a diurnal 305 profile is applied to NO x emissions. The normalized standard deviation (relative to the standard deviation of measured VMRs) improves by inclusion of diurnal profiles, as we observe more stations with values in the 0.5 -1.5 range.
Using the different COSMO/MESSy instances of the MECO(3) system, we are able to investigate the importance of model resolution if the same emission inventory is used for different model resolutions. Figure C2 shows the spatial distribution of simulated NO 2 surface VMRs and the agreement with measurements for the CM07 set up in a similar way as Figure 4 for 310 CM02. We note that for TNO f l , there is only very little further detail in the spatial distribution for CM02 as compared to CM07. Meteorology (e.g. wind patterns) might be better resolved using a fine model resolution on individual days. Still, when averaged over several days, these will be smoothed, and the spatial patterns would be limited by the resolution of the input emissions inventory. Consequently, we also did not observe any significant improvement in the RMSD (45%) between the monthly mean measured and simulated NO 2 VMRs as compared to the CM07 simulation (RMSD = 48%) using the TNO 315 MACC III emission inventory. In contrast to this, using the high-resolution UBA emissions, the spatial details as depicted in CM02 smear out in the CM07 instance. For both, UBA f l and UBA di set ups, the RMSD improves from ∼ 45% for the CM07 to ∼ 37% for CM02, showing the added value of the higher resolution simulation. Hence, for studies where small scale variability is crucial, it is important to use a high-resolution model together with an input emission inventory of similar resolution.
Further reasons which could account for the lower bias of the NO 2 VMR in the model could be related to stronger advection 320 and vertical mixing. The vertical distribution of NO 2 is evaluated in section 3.3.2. Regarding advection, the wind speeds at 10 m altitude in the CM02 domain have been compared with measurements at 95 stations located within the CM02 domain (see section 3.1). A small positive bias of ca. 0.5 m s −1 was found. A cold bias of ∼ 3 • C (a general feature of COSMO in summer over western Europe (Böhm et al., 2006)) was observed across the CM02 domain, but that should not cause a lower bias in the simulated NO 2 VMRs. domain of all three set ups, we learn that the partial column in the lowest 1 km accounts for ∼ 80% of the tropospheric NO 2 column.
However, a generalized vertical integration on a regular model grid can introduce artefacts, because MAX-DOAS measurements are rather sensitive to air mass in the viewing direction for distances of up to a few kilometres. The artefacts increase 340 with increasing spatial heterogeneity. In order to consider NO 2 only in the viewing direction of the four telescopes, we first linearly interpolated the simulated concentrations along the respective viewing directions of the telescopes (see, e.g., Figure 1).
The tropospheric VCDs are calculated in the following two steps: (1) summing up the partial VCDs vertically up to a height of 4 km and (2) taking the mean of VCDs up to a fixed distance (3 km as a first estimate) only in the line of sight of the telescope. An example time series of the simulated VCDs for the UBA di set up is shown in Figure 6 along with the measured 345 MAX-DOAS VCDs. Figure 7 shows the agreement between the MAX-DOAS geometric VCDs and the simulated VCDs as scatter plots for the three different set ups in different panels. The frequency distribution of the measured and the simulated VCDs are also shown next to the corresponding panels.
We observe a large scatter between the MAX-DOAS and simulated VCDs in all three model set ups, but the best agreement 350 was observed for the UBA di set up with 78% of the simulated VCDs no less than half and no greater than twice the magnitude of the measured VCDs and the best Pearson correlation coefficients (R = 0.33) among the three. For TNO f l , large underestimation of VCDs was observed, as also seen for surface VMRs in section 3.2. The markedly lower bias (see Table C2 and C3) can be attributed to significantly lower input NO x emissions (e.g., ∼ 40% lower compared to UBA emissions over Germany for 2018).
Using UBA emissions reduces the bias from 37 -47 % to 9 -21 % for the different telescopes. Adding diurnal variability 355 to emissions reduces the bias further (-1 -13 %), as it increases the emissions during the daytime (see Figure A1 and Table   C2). While the model was able to capture the general trend in day-to-day variability, the intra-day variability could not be reproduced on most of the days. The agreement was much better on the days with clear sky conditions (4-9 May) and periods if the analysis is restricted to cloud-free conditions. Between 5 and 9 May, the simulated VCDs matched almost exactly to the MAX-DOAS VCDs for the telescopes T3 and T4, but for T1 and T2 the agreement was not as good. Several factors can contribute to the observed differences, but there are at least two shortcomings related to VCD comparison, which hinder a conclusive assessment.
-The MAX-DOAS VCDs are calculated using the geometric approximation which assumes a single scattering event of the 365 incoming photons above the trace gas layer. This yields reasonable VCDs for clear sky conditions with a low aerosol load scenario (Shaiganfar et al., 2011;Kumar et al., 2020). More accurate VCDs can be retrieved using the profile inversion approach, which also accounts for aerosol extinction profiles and the relative sun geometry. For the three clear sky days with low aerosol load, we also performed trace gas profile inversion using the sophisticated π-MAX approach, which also considers a linear change in NO 2 concentration along the line of sight. The VCDs retrieved using π-MAX (shown 370 in Figure C1) agree quite well with the geometric VCDs for these days.  inversion approach, which also requires an a priori estimate of the NO 2 vertical profile and can bias the model evaluation 375 if the assumed a priori profile is similar to that simulated by the model. The averaging kernels (A k ) can be applied on the model partial column (V k ) to calculate the modified VCD (V CD corr ), using the a priori profile V a k according to equation 3, which can be directly compared to the MAX-DOAS VCDs.
For high A k (i.e., close to 1 ), V k is limited by the simulated V k , whereas for low A k (i.e., close to 0; where MAX-DOAS 380 sensitivity is limited), V k is limited by the a priori profile. Hence, the choice of the a priori profiles can impact the comparison in the latter scenario.
-While calculating the simulated VCDs, model NO 2 fields are given equal weights up to a distance of 3 km in the viewing direction and 4 km altitude. As mentioned previously, distances in both dimensions were only a first estimate and the actual MAX-DOAS sensitivity distances in both dimensions might vary according to the aerosol load, trace 385 gas distribution, viewing geometry with respect to the sun, and presence of clouds. Moreover, even within the actual sensitivity volume, the sensitivity might vary as a function of distance from the telescope. Blechschmidt et al. (2020) have previously demonstrated that accounting for vertical sensitivity of MAX-DOAS (via averaging kernels) does not noticeably affect the simulated VCDs, because most of the NO 2 is located within the boundary layer and the averaging kernel profile has a similar shape as the model NO 2 vertical profiles. Hence, vertical sensitivity was not an issue where 390 most of the NO 2 is located. However, if the profile is elevated, then this may no longer hold true. Nevertheless, sensitivity in the horizontal direction still needs to be accounted for, as large heterogeneity is expected close to the emission sources for short-lived species like NO 2 . The studies by Blechschmidt et al. (2020) and Vlemmix et al. (2015) proposed the relatively coarser model resolution of up to 7 × 7 km 2 as one possible reason for this discrepancy. For comparison with ground-based measurements (e.g., MAX-DOAS), it is crucial to have model simulations with a grid resolution finer than 395 or the same as the typical sensitivity ranges of the measurements. If that is not the case, spatial heterogeneity within the model grid box would result in underestimation of the enhancement and overestimation of the background values due to spatial smoothing. For MAX-DOAS measurements at 30 • elevation angle, the horizontal sensitivity distance (HSD) can be approximated using the boundary layer height (BLH) (HSD = BLH/sinα), which would be in the range of 1-3 km in the daytime. However, the exact HSD also depends on the aerosol conditions, which can vary significantly over time 400 and should be retrieved from measurements.
In the next section, we will address these shortcomings by calculating the differential slant column densities (dSCDs) using the simulated NO 2 . Simulated dSCDs can be directly compared to the corresponding quantities derived from the DOAS analyses while also avoiding several assumptions and approximations discussed earlier. The dbAM F s used for dSCD calculation inherently account for the aerosol conditions and hence also address issues related to spatial sensitivity (see section 2.2.2). This

Calculation of simulated dSCDs
For calculation of dSCDs from the model simulated NO 2 fields, we mimic the viewing geometry and sensitivity volume corresponding to MAX-DOAS measurements using the differential box air mass factors as described in section 2.2.2. Using 410 MAX-DOAS, we probe the vertical and horizontal variation of NO 2 concentrations by measuring at various elevation angles.
The sensitivity of the MAX-DOAS measurements at a given elevation angle (EA) is described by the differential box air mass factors (dbAM F s). An example of the dbAM F s along the viewing direction of telescope T1 for 09-05-2018 14:00 UTC for EAs ranging from 3 to 30 • is shown in Figure 8. In each viewing azimuth (corresponding to T1-T4) and for each EA (α; between 2 and 30 • ) we perform a two-dimensional summation of partial VCDs (V i,k ) weighted by the differential box air 415 mass factors (dbAM F α,i,k )(unit-less), along the distance from MAX-DOAS (indexed as i) and altitude above the instrument (indexed as k): with

420
Where, c i,k and dh i,k represent the concentration (molecules cm −3 ) of trace gas in the grid with a thickness of dh i,k (cm).
Using equation 4, we can estimate the EA dependent horizontal sensitivity distances (HSD) in the viewing direction of the MAX-DOAS as the distance from the instrument which accounts for 90% of the simulated dSCDs. Fig C3 shows the HSD for all the off-axis elevation angles for the four telescopes. The mean HSD increases from 3-4 Km for 30 • EA to 8-9 Km for 3 • EA. In contrast to the comparison of the geometric VCDs in the previous section, which is limited to only one EA, we can 425 evaluate the dSCDs at various EAs with varying sensitivity volume from the location of the instrument. Additionally, while calculating the dSCDs, we also account for the horizontal heterogeneity and varying sensitivity within the sensitivity volume in the viewing direction. Figure 9 shows an example time series of measured and simulated dSCDs for UBA di for 30 • EA. Comparing these values to the VCDs shown in Figure 6, we observe that the simulated dSCDs are higher than the calculated VCDs as shown in the 430 previous section, implying dAM F s at 30 • EA are larger than 1. This is also observed under the cloud-free conditions, which corroborates the drawback in the geometric approximation and assumption of spatial homogeneity in the viewing direction. Figure 10 shows the distribution of measured and simulated dSCDs (different colours for the different model set ups described in Table 1) for the various elevation angles as box and whiskers plots for the four telescopes in separate panels. Measurements 435 performed at low EAs have large light paths and are more sensitive to air mass close to the surface (higher dbAM F s; see     Table C2 for all measurements and in Table C3 for cloud-free cases only. For 30 • EA we did not observe any significant change in the agreement of temporal variability (i.e. R) between model and MAX-DOAS, whether 445 we compare dSCDs or VCDs. However, significantly more information is gleaned with respect to the available dSCDs corresponding to all the off-axis EAs at which measurements were performed. Instead of comparing only one VCD value for a complete elevation sequence, we include dSCDs corresponding to each of the elevation angles. For example, in Figure 11, we observe a much better agreement between the measured and simulated dSCDs at 3 • EA, especially on the cloud-free days.

Evaluation of simulated dSCDs
In Figure 12  to cloud-free cases only. Similar to the VMR and VCD comparisons, the best accountability was observed for UBA di , with ∼ 455 82% for the simulated dSCDs no less than half and no greater than twice the magnitude of the measurements for cloud-free cases. The frequency distribution of the measured dSCDs follows a right-skewed normal distribution for all the EAs, which is also represented best by the UBA di set up. The width of the peak of the frequency distribution broadens from high to low EAs, representing a larger scatter in the measurements at low EAs. For UBA f l , the extreme values (less than half or more than double of measurements) were mostly observed for cloudy sky conditions. Since clouds are not considered in the radiative 460 transfer simulations and therefore also not in dAM F retrieval, these only affect the measured dSCDs. The improved agreement between simulated and measured dSCDs for the cloud-free conditions are more obvious for the individual EAs (see Tables C2 and C3) with significantly higher Pearson correlation coefficients and smaller RMSD values.
From Tables C2 and C3, it can be inferred that the agreement between measured and simulated dSCDs improves at lower EAs, which also indicates a better performance of the model in the layers close to the surface. For example, both for the UBA f l 465 and UBA di set ups, a large positive bias in the range 27 -42% was observed at 30 • EA, but for low EAs (e.g. ≤ 8 • ) small biases in the range of +7 and -15% were observed for all the azimuth directions in the cloud-free cases. However, more pronounced negative biases were observed for TNO f l simulation even at low elevation angles. A positive bias at 30 • EA is most likely due to the stronger vertical mixing in COSMO/MESSY as also indicated by Mertens et al. (2016). Due to the stronger mixing, a large fraction of NO 2 reaches at higher altitudes in the model, which has a strong weight for the dSCD calculation at high Previous studies in which models were compared with MAX-DOAS indicated that the major disagreements arise due to weekday and weekend differences and inappropriate representation of the diurnal cycle of emissions (Blechschmidt et al., 2020;Vlemmix et al., 2015). We use the measurements at a low elevation angle (e.g. 3 • ) to investigate if the model can reproduce the measured diurnal profiles and weekday-weekend differences of NO 2 dSCDs. A low elevation angle was chosen  The left panel in Figure 13 shows the distribution of measured and simulated NO 2 dSCDs at 3 • EA, for all telescopes dSCDs are observed on the weekends. However, such a distinct weekday-weekend difference was not observed for TNO f l and UBA f l simulations, supporting the fact that the observed differences were primarily because of change in emissions and not due to varying meteorological conditions. Smaller weekend dSCDs were only for the UBA di set up which also accounts for weekday and weekend differences in NO x emission from the transport and residential combustion sectors. Concerning 485 the diurnal variation, both UBA f l and UBA di show similar dSCDs and surface VMRs during the daytime which also agrees well with the measurements. However, stronger discrepancies are observed during early morning and late evening hours.
Comparison of surface VMRs provide further information about the night-time, where even a larger difference is observed between the two model set ups. NO 2 has a short lifetime of a few hours in the daytime, which together with a strong mixing in the daytime boundary layer compensates even for ∼ 50% higher emissions (an increase in emissions from the transport sector using "flat" emission profiles (see Table C3). For model studies concerning satellite measurement having afternoon overpasses (e.g. OMI, TROPOMI), consideration of these factors for the anthropogenic emissions will have a relatively weak effect in our study domain or similar urban environments. For the biomass burning regions (e.g. tropical forests, southern Africa), strong NO x emissions are typically observed during mid-day, which results in a diurnal profile of NO 2 columns with a broad daytime peak (Boersma et al., 2008). Consideration of diurnal profiles of emissions in model simulations is crucial for comparisons 500 with satellite observations in these regions (Miyazaki et al., 2012;Boersma et al., 2008).

Conclusions
We performed high spatial resolution (up to 2.2 × 2.2 km 2 ) regional model simulations focused on south-west Germany to evaluate the short-lived pollutant NO 2 using MAX-DOAS and a network of in situ measurements. Three different MECO (3) simulations were performed in order to investigate the importance of model spatial resolution and the influence of spatial and 505 temporal resolution of input emission inventories. Anthropogenic emission inventories generally used for model simulations (e.g. TNO MACC for Europe) do not cover the most recent periods in most of the cases and are available at spatial resolutions coarser than that of the model. We show that the spatial patterns of NO 2 are best reproduced in the UBA f l set up, in which an up to date and high-resolution (1 × 1 km 2 ) input emission inventory (UBA) is used. In the UBA di set up, use of accurate temporal profiles (e.g. diurnal and day of the week) of the road transport and residential combustion emissions improves the 510 agreement of the temporal profiles at individual measurement stations. An improved agreement was observed at the background and industrial locations with an overall bias of less than 10% for the UBA di set up. However, the model largely underestimates the NO 2 VMRs (by up to 50%) at the traffic-adjacent locations. For the background locations, the mean diurnal profiles were accurately simulated in the UBA di set up. Biases were stronger if the fine resolution emissions were used for a coarser resolution model simulation (e.g., 7 × 7 km 2 ). In contrast, a finer resolution model employing a coarse emission inventory did 515 not result in the addition of large spatial details.
We employed the measurements of a 4-azimuth MAX-DOAS instrument from Mainz to first compare the tropospheric NO 2 VCDs. The day-to-day variability was reasonably well reproduced by the model in the UBA di set up for all four viewing directions with biases of between -10% and 2%. To further augment the surface VMR and tropospheric VCD evaluation, we apply a consistent approach of comparison of the so-called differential slant column densities (dSCDs), which, we suggest, comes 520 with several advantages. Firstly, dSCDs are available for a number of elevation angles, each of which has distinct sensitivities to the different vertical level of the troposphere. Hence, this approach enables an evaluation of the vertical distribution of NO 2 in the model. Additionally, the horizontal sensitivity distance of the MAX-DOAS instrument also changes for different elevation angles, as described by the corresponding differential box air mass factors. Hence, when using the measurement at one single location, model evaluation can be performed for various sensitivity volumes. Secondly, as compared to the VCD 525 comparison, which gives one comparable quantity per complete elevation sequence of MAX-DOAS measurements, dSCDs provide a way to evaluate the simulation against the measured values for each elevation angle. Finally, for the dSCD compar-ison, we succeed in overcoming the uncertainties introduced by the assumption of a homogeneous spatial distribution of NO 2 and in some cases a priori estimates of its vertical distribution for the retrieval of VCDs from the MAX-DOAS measurements.
We evaluated the simulated dSCDs in the four azimuth direction for seven elevation angles (EA) ranging from 2 • to 30 • .

530
We observe a similar variation of measured and simulated dSCDs for various EAs indicating a reasonable vertical distribution of NO 2 . The agreement between model and simulation improved for lower elevation angles indicating better accountability at near-surface layers. The agreement further improved, if only the measurements in cloud-free conditions were considered for the comparison. We did not observe large differences in the measured dSCDs in the four azimuth direction because of the prevailing wind direction from urban areas. We also show that the consideration of diurnal profiles of the anthropogenic 535 emissions is crucial for comparison with NO 2 dSCDs and VMR measurements for early morning and night time hours. For the afternoon hours, however, even up to 50% higher anthropogenic NO x emissions only have a minor effect on ambient VMRs and dSCDs due to its short atmospheric lifetime.
Over the last two decades, several MAX-DOAS measurements have been reported from locations across the world, which have substantially contributed to the evaluation of satellite observations. We think that the complexity and uncertainties in-540 volved in VCD retrieval from the MAX-DOAS measurements have so far hindered a similar scale application in model evaluation. The consistent dSCD comparison approach proposed in our study validates such usage of these valuable datasets, which can be used to evaluate the vertical distribution of trace gases within the boundary layer.  (2005) also provide recommendations for sector-specific fine temporal profiles (day of the week and diurnal) of emissions (see Figure A1). The fine temporal profiles are applied only in the UBA di set up using the MESSy ONEMIS submodel as described in section 2.1. The recommendation of Builtjes et al. (2002) and Schaap et al. (2005), however, do not differentiate between the diurnal profiles of emission between weekdays and weekends, which is crucial for the road transport emissions.
Hence, for the road transport sector, we used the actual hourly vehicle count on the A60 motorway for 2018, and derived the 565 temporal emission profiles assuming a direct scaling between the number of vehicles and emissions. The actual vehicle counts are provided by the automatic vehicle counter and are available at www.bast.de/BASt_2017/DE/Statistik/statistik-node.html.
From Figure A1, we note that for weekdays, the derived profiles look similar to those recommended by Schaap et al. (2005).
However, for the weekend, the shape of the diurnal profiles are markedly different from the factors derived using actual vehicle counts, which show a single broad afternoon peak.
570 Figure A1. Temporal profiles of NOx emissions from various sectors. "TRA": Road transport, "ENE": Energy generation, "RCO": Residential and non-industrial combustion, "IND": Industries. For "TRA", lines and markers correspond to the profiles derived using actual vehicle count, while for others, profiles are derived according to the recommendation of Schaap et al. (2005). For "TRA", the recommendation of (Schaap et al., 2005) are shown as the dashed line in panel (a).
Please note that the current implementation of incorporation of diurnal emission factors is limited to surface emissions.
Since for "ENE" and "IND" sectors, a significant fraction is emitted at high altitudes, the day of the week and diurnal profiles could not be applied to these sectors. However, from Figure A1, we note that these temporal variations are not as strong as for "TRA"(> 200% peak to peak) and "RCO" (> 120% peak to peak) sectors. Moreover, for Germany, "ENE" and "IND" account for only ca. 14% and ca. 18% of the total NO x emissions as compared to ca. 45% from "TRA" only. Hence, we only expect 575 a minor effect of including the day of the week and hourly temporal factors of emissions of "ENE" and "IND" sectors on the total NO x emissions. with high values relatively far from the NO 2 hotspots. We observe an overestimation for smaller O 3 VMRs for all three model set ups. Since ozone is formed photochemically in the troposphere, we have further investigated the agreement separately for daytime (07:00-18:00 UTC) and night time for the UBA di set up in Figure B2.
During night time, we observe a general overestimation by ∼ 37% for all the measurement stations combined. However, during daytime, we observed much better agreement with a relative bias of ∼ -5%. Mertens et al. (2016)  The Taylor diagrams in Figure B3 show significantly better model performance (larger R and smaller RMSD values) as 595 compared to that for NO 2 in section 3.2. This is due to the relatively larger lifetime of O 3 , owing to which strong spatial gradients are not observed. The temporal variability is, however, underestimated as evident by the relative standard deviations of less than 1. Using an updated and high resolved anthropogenic emission with high NO x did not elicit further improvement in   to inappropriate NMVOC speciation, biogenic emissions and a relatively simpler chemical mechanism used in the model, due to which we are not confident about this finding. Further investigation in this direction is beyond the scope of this study and should be pursued in future works with complex chemistry.   The HSD for an elevation angle α is estimated as the distance along the viewing direction which accounts for 90% of simulated dSCDα.