The Airborne ROmanian Measurements of Aerosols and Trace gases (AROMAT) campaigns

The Airborne ROmanian Measurements of Aerosols and Trace gases (AROMAT) campaigns took place in Romania in September 2014 and August 2015. They focused on two sites: the Bucharest urban area and the power plants in the Jiu Valley. Their main objectives were to test recently developed airborne observation systems dedicated to air quality studies and to verify the concept of such campaigns in support of the validation of spaceborne atmospheric missions such as the TROPOspheric Monitoring Instrument (TROPOMI)/Sentinel-5 Precursor (S5P). We present the AROMAT campaigns, focusing on the findings related to the validation of tropospheric NO2, SO2, and H2CO. We also quantify the emissions of NOx and SO2 at the two sites.

We show that tropospheric NO2 vertical column density (VCD) measurements using airborne mapping instruments are in principle valuable for satellite validation. The signal-to-noise ratio of the airborne NO2 measurements is one order of magnitude higher than that of its spaceborne counterpart when the airborne measurements are averaged at the TROPOMI pixel scale. A significant source of comparison error appears to be the time variation of the NO2 VCDs during a flight, which we estimated at about 4x10^15 molec cm^-2 in the AROMAT conditions. Considering the random error of the TROPOMI tropospheric NO2 VCD (σ), the dynamic range of the NO2 VCD field extends from the detection limit up to 37σ (2.6x10^16 molec cm^-2) or 29σ (2x10^16 molec cm^-2) for Bucharest and the Jiu Valley, respectively. For the two areas, we simulate validation exercises of the TROPOMI tropospheric NO2 product using airborne measurements. These simulations indicate that we can closely approach the TROPOMI optimal target accuracy of 25% by adding NO2 and aerosol profile information to the airborne mapping observations.

Introduction

Since the launch of the Global Ozone Monitoring Experiment (GOME, Burrows et al., 1999) in 1995, spaceborne observations of reactive gases in the UV-visible range have tremendously improved our understanding of tropospheric chemistry. GOME mapped the large urban sources of NO2 in North America and Europe, the SO2 emissions from volcanoes and coal-fired power plants (Eisinger and Burrows, 1998), and the global distribution of H2CO with its maxima above East Asia and the tropical forests (De Smedt et al., 2008). Subsequent air-quality satellite missions expanded on the observation capabilities of GOME.
Table 1 lists the past, present, and near-future nadir-looking satellite instruments dedicated to ozone and air quality monitoring, with their sampling characteristics in space and time. The pixel size at nadir has shrunk from 320x40 km^2 (GOME) to 3.5x5.5 km^2 (TROPOMI, Veefkind et al., 2012). This high horizontal resolution makes it possible, for instance, to disentangle contradictory trends in ship and continental emissions of NO2 in Europe or to distinguish the different NO2 sources in oil sand mines in Canada (Griffin et al., 2019). The satellite-derived air quality products are now reliable enough to improve bottom-up emission inventories (e.g. Kim et al., 2009; Fioletov et al., 2017; Bauwens et al., 2016) and to be used in operational services, for instance to assist air traffic control with the near-real-time detection of volcanic eruptions (Brenot et al., 2014). The bottom lines of Table 1 present the near-future perspective in spaceborne observation of the troposphere: a constellation of geostationary satellites will provide hourly observations of the troposphere above East Asia (GEMS, Kim, 2012), North America (TEMPO, Chance et al., 2013), and Europe (Sentinel-4, Ingmann et al., 2012). These new developments will open up new perspectives for atmospheric research and air quality policies.
Validation is a key aspect of any spaceborne Earth observation mission. This aspect becomes even more important as the science matures and leads to more operational and quantitative applications. Validation involves a statistical analysis of the differences between the measurements to be validated and reference measurements (von Clarmann, 2006). The aim of validation is to verify that the satellite data products meet their requirements in terms of accuracy and precision. Table 2 presents such requirements for the TROPOMI-derived tropospheric vertical column densities (VCDs) of NO2, SO2, and H2CO (ESA, 2014). Richter et al. (2014) have discussed the challenges associated with the validation of tropospheric reactive gases. These challenges arise from the large variability in space and time of short-lived reactive gases, the dependency of the satellite products on different geophysical parameters (surface albedo, profiles of trace gases and aerosols), the differences in vertical sensitivity between satellite and reference (ground-based or airborne) measurements, and the small signals. An ideal validation study would involve a reference dataset of VCDs whose well-characterized uncertainties would be small compared to those required for the investigated products. This reference dataset would cover a large number of satellite pixels with adequate spatial and temporal representativeness at different seasons, places, and pollution levels. Besides the VCDs, the ideal validation exercise would also quantify the geophysical parameters that impact the retrieval of the investigated satellite products. In the real world, however, Richter et al. point out that "the typical validation measurement falls short in one or even many of these aspects".

The first validations of the tropospheric NO2 and H2CO VCD products of GOME involved in-situ sampling from aircraft (Heland et al., 2002; Martin et al., 2004). Such measurements may cover a good fraction of the satellite pixels, but they miss the lower part of the boundary layer, where trace gas concentrations often peak. Schaub et al. (2006) and Boersma et al. (2011) summarize other early validation studies for the tropospheric NO2 VCDs retrieved from GOME, SCIAMACHY, and OMI.
Several of these studies make use of NO2 surface concentration datasets from air quality monitoring networks. Compared to campaign-based data acquisition, operational in-situ networks provide long-term measurements, but their comparison with satellite products relies upon assumptions on the NO2 profile. Other validation studies use remote sensing from the ground and from aircraft, in particular based on the Differential Optical Absorption Spectroscopy (DOAS) technique (Platt and Stutz, 2008), which is also the basis for the retrieval algorithms of the satellite-derived products. In comparison with in-situ measurements, DOAS has the benefit of being directly sensitive to the column density of a trace gas, i.e. the same geophysical quantity that is retrieved from space. Heue et al. (2005) conducted the first comparison between a satellite-derived product (SCIAMACHY tropospheric NO2) and airborne DOAS data. Many validation studies also use ground-based DOAS measurements, in particular since the development of the Multi-AXis DOAS (MAX-DOAS) technique (Hönninger et al., 2004). MAX-DOAS measurements are valuable for validation due to their ability to measure integrated columns at spatial scales comparable to the satellite ground pixel size. Moreover, they broaden the scope of validation activities, since they also provide limited profile information on both trace gases and aerosols (Irie et al., 2008; Brinksma et al., 2008; Ma et al., 2013; Kanaya et al., 2014; Wang et al., 2017; Drosoglou et al., 2018). The limitations of MAX-DOAS measurements for validation arise from their still imperfect spatial representativeness compared to typical satellite footprints and, to some extent, from their limited sensitivity in the free troposphere. Spatial representativeness has often been invoked to explain the apparent low bias of the OMI tropospheric NO2 VCDs in urban conditions (Boersma et al., 2018).

The unprecedented horizontal resolution enabled by the latest generation of space-based air-quality instruments motivated preparatory field studies around polluted areas in North America (DISCOVER-AQ, https://discover-aq.larc.nasa.gov), Europe (AROMAT and AROMAPEX, Tack et al., 2019), and Korea (KORUS-AQ, https://www-air.larc.nasa.gov/missions/korus-aq/). These campaigns quantified key pollutants (NO2, SO2, O3, H2CO, and aerosols) and assessed the practical observation capabilities of future satellite instruments while preparing for their validation. They combined ground-based and airborne measurements. DISCOVER-AQ involved the deployment of the Geostationary Trace gas and Aerosol Sensor Optimization instrument (GEOTASO, Leitch et al., 2014; Nowlan et al., 2016) and of the Geostationary Coastal and Air Pollution Events (GEO-CAPE) Airborne Simulator (GCAS, Kowalewski and Janz, 2014; Nowlan et al., 2018). In Europe, the two AROMAT campaigns, which took place in Romania in September 2014 and August 2015, demonstrated a suite of new instruments such as the Airborne imaging DOAS instrument for Measurements of Atmospheric Pollution (AirMAP, Schönhardt et al., 2015; Meier et al., 2017), the NO2 sonde (Sluis et al., 2010), and the Small Whiskbroom Imager for atmospheric compositioN monitorinG (SWING, Merlaud et al., 2018). Different airborne imagers were intercompared and further characterized during the AROMAPEX campaign in April 2016 (Tack et al., 2019).
In this work, we introduce the measurements performed during the AROMAT campaigns and analyze their relevance for the validation of air quality satellite products. The datasets collected during AROMAT fulfill several requirements of the ideal validation study described above. We further investigate the strengths and limitations of the acquired datasets.
The paper is structured as follows: Section 2 describes the two target areas and the deployment strategy. Section 3 characterizes the geophysical fields of the sampled areas, and Section 4 presents a critical analysis of the strengths and limitations of the campaign results while elaborating on recommendations for future validation campaigns in Romania. The Supplement lists the main instruments operated during the campaigns, gives practical details about their deployment, and presents additional information and measurements.

Target areas and deployment strategy
This section presents the two target areas of the AROMAT campaigns, Bucharest and the Jiu Valley. It also lists available studies on air quality at these two sites as well as relevant logistical aspects. Figure 1 presents a map of the tropospheric NO2 vertical column densities (VCDs) above Romania, derived from OMI measurements (Levelt et al., 2006) and averaged between 2012 and 2016. The map also indicates the positions of the eight largest cities of the country. Compared to highly polluted areas in western Europe such as northern Belgium or the Netherlands, Romania appears relatively clean at the spatial resolution of the satellite data. There are, however, two major NO2 sources clearly visible from space, which appear to be of similar magnitude, with NO2 columns around 2.5x10^15 molec cm^-2: the Bucharest area and the Jiu Valley, northwest of Craiova. For the latter, the NO2 enhancement is due to a series of large coal-fired thermal power plants.

Bucharest
Bucharest (44.4° N, 26.1° E) is the capital and largest city of Romania (1.9 million inhabitants according to the 2011 census).
Within its administrative borders, the city covers an area of 228 km^2. Including the surrounding Ilfov county, the Bucharest metropolitan area numbers 2.3 million inhabitants in 1583 km^2. The built-up areas are mainly located within a ring road whose diameter is around 20 km. The NO2 VCDs seen from space above Bucharest appear lower than over western European sites at the resolution of OMI (see Fig. S7 in the Supplement). However, this is partly due to the dilution effect for this relatively small and isolated source.
Local studies based on the eight air quality stations inside the city point out that, regarding local PM and NOx levels, Bucharest is amongst the most polluted cities in Europe (Alpopi and Colesca, 2010; Iorga et al., 2015). The city center is the most heavily polluted area, with pollutant concentrations well above the European thresholds. For instance, the annual mean NO2 concentration at the traffic stations was about 57 µg m^-3 in 2017 (EEA, 2019). Stefan et al. (2013) showed the importance of local conditions and anthropogenic factors in air quality analysis in areas close to Bucharest, during two weeks of measurements in 2012. Iorga et al. (2015) and Grigoraş et al. (2016) showed that the main NOx contributions came from traffic and from electricity production, the latter being spread over about 10 medium-size thermal power plants within the city.

The Jiu Valley between Targu Jiu and Craiova
The second NO2 plume in Fig. 1

Besides NO2, the SO2 emissions from these plants are also visible from space, as first reported by Eisinger and Burrows (1998) using GOME data. Since 2011, the OMI-derived trends above the area indicate that SO2 emissions have been decreasing, while those of NO2 are stable (Krotkov et al., 2016). This is related to the installation of flue gas desulfurization (FGD) systems, which was part of the environmental regulations imposed on Romania following its entry into the European Union in 2007.

Groups, instruments, and platforms
The AROMAT consortium consisted of research teams from Belgium (BIRA-IASB), Germany (IUP-Bremen, FUB, MPIC), the Netherlands (KNMI), Romania (University "Dunarea de Jos" of Galati, hereafter UGAL; National Institute of R&D for Optoelectronics, hereafter INOE; and National Institute for Aerospace Research "Elie Carafoli", hereafter INCAS), and Norway (NILU). The consortium shared a common focus on measuring tropospheric composition using various techniques.
The Supplement presents the main atmospheric instruments operated during the two campaigns, classified into airborne, ground-based, remote-sensing, and in-situ sensors. The primary target species during AROMAT-1 were NO2 and aerosols, while the observation capabilities expanded in AROMAT-2 through improvements of the AirMAP and SWING sensors for SO2 measurements and the deployment of other instruments such as SO2 cameras, DOAS instruments targeting H2CO, and a PICARRO instrument to measure water vapor, methane, CO, and CO2.
We used two small tropospheric aircraft: the Cessna-207 from FUB, and the Britten-Norman Islander (BN-2) from INCAS.
The Cessna was dedicated to remote sensing. It mainly performed mapping flights at 3 km a.s.l. for the airborne imagers, while parts of the ascents and descents were used to measure aerosol extinction profiles with the FUBISS-ASA2 instrument. The BN-2, which was only used during AROMAT-2, was dedicated to in-situ measurements around Bucharest between the surface and 3000 m a.s.l. In AROMAT-2, an ultralight aircraft was also used by UGAL for nadir-DOAS observations in the Jiu Valley. The ultralight aircraft typically flew between 600 and 1800 m a.s.l. Two UAVs, operated by INCAS and UGAL, flew during AROMAT-1. These measurements were not repeated during AROMAT-2 since the coverage of the UAVs was too limited, in both the horizontal and vertical directions. Finally, we also launched balloons carrying NO2 sondes from Turceni and performed Mobile-DOAS measurements from several cars during both campaigns. The Supplement provides more details about the practical deployments during the campaigns.

Geophysical results
This section presents selected findings related to the atmospheric structure and the geophysical fields of trace gases and aerosols in the two target areas. The Supplement gives details of the instruments involved in these observations. Tables S6 and S7 of the Supplement summarize the observed ranges of the main geophysical quantities for the two campaigns.

During AROMAT-2, the AOD at 532 nm was around 0.2 on most days, except on 30 August 2015 (0.35) and 1 September 2015 (0.05). Interestingly, the maximum AOD was observed on a Sunday, when the anthropogenic NOx emissions from Bucharest were minimal, as indicated by Fig. 7, which compares the NO2 above the city on Sunday 30 August and Monday 31 August 2015.

Boundary layer height
We estimated the boundary layer height (BLH) using a gradient method applied to the lidar range-corrected signal (Belegante et al., 2014; Timofte et al., 2015).

Except for CH4 on 31 August 2015, the vmrs were relatively stable in the boundary layer and decreased rapidly above.
Both soundings exhibit more polluted layers aloft, probably linked to long-range transport. These vertical distributions give confidence in the lidar-derived BLH, below which convection and turbulence homogenize the distribution of aerosols and trace gases.
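The gradient method is only named here, not detailed. As a rough illustration, under our own simplifying assumption that the BLH is the altitude of the most negative vertical gradient of the range-corrected signal (RCS), the idea can be sketched as follows. The helper `blh_gradient` and the synthetic sigmoid profile are hypothetical and do not reproduce the processing of Belegante et al. (2014); real lidar processing typically adds smoothing and quality screening that are omitted here.

```python
import math


def blh_gradient(altitudes_m, rcs):
    """Return the altitude (m) of the most negative vertical gradient of the
    lidar range-corrected signal (RCS), used here as a simple BLH proxy.

    Hypothetical helper for illustration: no smoothing or quality screening.
    """
    if len(altitudes_m) != len(rcs) or len(rcs) < 3:
        raise ValueError("need two matching profiles with at least 3 points")
    best_alt, best_grad = None, math.inf
    # Central differences on the interior points of the profile
    for i in range(1, len(rcs) - 1):
        grad = (rcs[i + 1] - rcs[i - 1]) / (altitudes_m[i + 1] - altitudes_m[i - 1])
        if grad < best_grad:
            best_alt, best_grad = altitudes_m[i], grad
    return best_alt


# Synthetic RCS profile: aerosol-rich layer below ~1500 m, cleaner air above
alts = list(range(100, 3001, 50))
profile = [1.0 / (1.0 + math.exp((z - 1500) / 80.0)) + 0.05 for z in alts]
print(blh_gradient(alts, profile))  # 1500
```

With a sharp aerosol gradient at the top of a well-mixed layer, as suggested by the soundings above, the strongest RCS decrease marks the BLH; in the presence of elevated aerosol layers, a pure gradient search can lock onto the wrong transition.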

Unfortunately, the BN-2 did not vertically sample the exhaust plume of Bucharest due to flight restrictions. We thus have little access to the vertical distributions of the aerosols and NO2 in the Bucharest plume.

while AirMAP onboard the Cessna was mapping the city. We extracted the AirMAP NO2 VCDs at the positions of the CAPS observations. The figure confirms that the two instruments detected the plume at the same place. This suggests that along this portion of the flight, which was inside the plume but outside the city, the NO2 vmr measured at 300 m a.s.l. may be used as a proxy for the NO2 VCD. The BLH was about 1500 m (Fig. 5) during these observations. Assuming a constant NO2 vmr of 3.5 ppb throughout the boundary layer leads to a NO2 VCD of 1.4x10^16 molec cm^-2. This estimate is close to the AirMAP NO2 VCD observed in the plume (Fig. 8). Future campaigns should include vertical soundings inside the plume to further investigate its NO2 vertical distribution.

Figure 9 shows the H2CO and NO2 VCD measurements from the Avantes spectrometer operated onboard the Cessna on 31 August 2015 (morning flight), together with the time-coincident MPIC Mobile-DOAS measurements. The H2CO VCDs range between (1±0.25)x10^16 molec cm^-2 and (7.5±2)x10^16 molec cm^-2, the maximum being observed inside the city. We estimated the H2CO reference column for the airborne data using the Mobile-DOAS measurements. For both NO2 and H2CO, the distributions seen from the airborne and ground-based instruments are in good agreement. However, although the highest H2CO VCDs are found above the Bucharest city center, they do not coincide with the NO2 maximum, as can be seen by comparing the upper and lower panels of Fig. 9, for instance on the second Cessna flight line from the north.
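The well-mixed plume estimate given earlier in this section (a constant NO2 vmr of 3.5 ppb over a boundary layer of about 1500 m) follows from multiplying the vmr by the air number density and the layer depth. The sketch below assumes a constant air number density from the ideal gas law; the default pressure and temperature (1013.25 hPa, 273.15 K) are our assumption, not values stated in the text, and `well_mixed_vcd` is a hypothetical helper.

```python
BOLTZMANN = 1.380649e-23  # J K^-1


def well_mixed_vcd(vmr_ppb, layer_depth_m, pressure_hpa=1013.25, temperature_k=273.15):
    """Column (molec cm^-2) of a gas with a constant vmr throughout a layer.

    Assumes a constant air number density n = p / (k T) over the whole layer,
    a simplification that ignores the decrease of pressure with altitude.
    """
    n_air_cm3 = pressure_hpa * 100.0 / (BOLTZMANN * temperature_k) * 1e-6  # molec cm^-3
    return vmr_ppb * 1e-9 * n_air_cm3 * layer_depth_m * 100.0             # depth in cm


print(f"{well_mixed_vcd(3.5, 1500):.2e}")  # 1.41e+16
```

With warmer, lower-pressure air, e.g. `well_mixed_vcd(3.5, 1500, 980.0, 293.0)`, the result drops to about 1.3x10^16 molec cm^-2, so the assumed air density alone shifts this kind of estimate by roughly 10%.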

The H2CO hotspot observed above Bucharest is mainly anthropogenic. Indeed, biogenic sources typically account for H2CO columns of 1 to 2x10^16 molec cm^-2 (J.-F. Müller, personal communication), in agreement with the background VCDs measured by the Mobile-DOAS along the Bucharest ring road. During the measurements, the wind was blowing from the south and west. The difference between the NO2 and H2CO spatial patterns may be explained by the different sources of NOx compared to H2CO, or by the formation time of H2CO through the oxidation of VOCs.

Anthropogenic hotspots of H2CO have already been observed, e.g. above Houston (Texas), an urban area with significant emissions from transport and the petrochemical industry (Parrish et al., 2012; Nowlan et al., 2018). Nowlan et al. (2018) also deployed an airborne nadir DOAS instrument and reported H2CO VCDs of up to 5x10^16 molec cm^-2 in September 2013.

Figure S4 in the Supplement (upper panel) extracts the AirMAP and SWING NO2 VCDs along the path of the simultaneous ground-based Mobile-DOAS measurements and compares the three datasets. This comparison confirms that there is good agreement between the airborne instruments, but indicates that the comparison with the Mobile-DOAS instruments is less straightforward. When observed with the Mobile-DOAS, the plume shows higher NO2 VCDs and appears narrower than with the airborne instruments. This is partly related to air mass factor uncertainties, but probably also to 3-D effects, as the plume is very thin and heterogeneous close to the power plants, as discussed in Merlaud et al. (2018).

Figure 11 shows the AROMAT-1 NO2 sonde measurements above Turceni which detected the plume. The NO2 is not well mixed in the boundary layer, with maxima aloft and lower vmrs close to the surface. This is understandable so close to the source, as high-temperature NOx is emitted from the 280 m high stack. In these balloon-borne datasets, the maximum observed NO2 vmr is about 60 ppb inside the plume, and the NO2 vmr vanishes above 1200 m a.s.l. These results suggest that airborne measurements with the ULM-DOAS, which can fly safely at 1500 m a.s.l., can provide reliable measurements of the integrated column amount inside the plume. Note that we measured NO2 vmrs of up to 95 ppb on the ground at the soccer field (see Fig. S11 in the Supplement) when the wind was blowing from the power plant.

observations and compares them with SWING results.
We find that the AirMAP-derived SO2 columns inside the plume reach 6x10^17 molec cm^-2 and that the AirMAP and SWING SO2 VCDs agree within 10%. Moreover, for these airborne data, the SO2 horizontal distribution broadly follows that of NO2. The discrepancies can be explained by the different lifetimes of the two species.

Horizontal distribution of SO2
We measured higher SO2 levels during AROMAT-2 than during AROMAT-1, both in terms of VCDs and vmrs, as shown in Fig. S10 of the Supplement. This corresponds to a shutdown of the desulfurization unit of the power plant, which local workers reported during the campaign. On 26 August 2015 in particular, the SO2 vmr went up to 250 ppb, above the WHO guideline of 191 ppb for the 10-minute mean, while it never exceeded 50 ppb in 2014.
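Air quality guidelines for SO2 are often expressed as mass concentrations rather than vmrs. The conversion below assumes an ideal gas at 25 °C and 1013.25 hPa, with the conventional molar volume of 24.45 L mol^-1 (our assumption, not stated in the text). Under these assumptions, the 191 ppb threshold quoted above corresponds to about 500 µg m^-3, and the 250 ppb peak to about 655 µg m^-3. The helper names are hypothetical.

```python
# Conventional molar volume of an ideal gas at 25 degC and 1013.25 hPa (L mol^-1)
MOLAR_VOLUME_L = 24.45
MOLAR_MASS_SO2 = 64.066  # g mol^-1


def ppb_to_ugm3(ppb, molar_mass=MOLAR_MASS_SO2, molar_volume=MOLAR_VOLUME_L):
    """Convert a vmr in ppb to a mass concentration in ug m^-3."""
    return ppb * molar_mass / molar_volume


def ugm3_to_ppb(ugm3, molar_mass=MOLAR_MASS_SO2, molar_volume=MOLAR_VOLUME_L):
    """Inverse conversion, from ug m^-3 back to ppb."""
    return ugm3 * molar_volume / molar_mass


print(round(ppb_to_ugm3(191)))  # 500
print(round(ppb_to_ugm3(250)))  # 655
```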
As for NO2, it appears difficult to quantitatively relate the airborne and Mobile-DOAS SO2 VCD observations in the close vicinity of the power plant. As shown in Fig. S4 of the Supplement (lower panel), the maximum SO2 VCD measured from the ground on the road close to the plant amounts to 1.3x10^18 molec cm^-2, while from the aircraft, the SO2 VCD reached 8x10^17 molec cm^-2. Part of this difference can be explained by 3-D effects on the radiative transfer, as for NO2. As discussed below, it seems easier to compare the SO2 flux.

Regarding Bucharest, the mapped area of Fig. 7 virtually covers 43 TROPOMI near-nadir pixels. Averaging the high-spatial-resolution AirMAP NO2 VCDs within these 43 hypothetical TROPOMI pixels reduces the dynamic range of the observed NO2 field, whose maximum decreases from 3.5x10^16 to 2.6x10^16 molec cm^-2 (37σ, where σ is the required precision on the tropospheric NO2 VCD). Nevertheless, 33 of the 43 hypothetical TROPOMI pixels exhibit a NO2 VCD above the required 2σ random error for TROPOMI (1.4x10^15 molec cm^-2).
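The pixel averaging described above can be sketched as a block average of a regular grid of airborne VCDs, followed by a count of the averaged pixels above the 2σ threshold of 1.4x10^15 molec cm^-2. This assumes, hypothetically, that the airborne VCDs have already been regridded onto a regular grid whose blocks align with the TROPOMI pixels; real TROPOMI footprints are not aligned with such a grid, and the function names and toy values are illustrative only.

```python
from statistics import fmean


def block_average(grid, ny, nx):
    """Average a 2-D grid of VCDs over non-overlapping ny x nx blocks.

    Incomplete blocks at the grid edges are dropped. Cells set to None (no data)
    are ignored; a block without any valid cell becomes None.
    """
    averaged = []
    for r0 in range(0, len(grid) - ny + 1, ny):
        row = []
        for c0 in range(0, len(grid[0]) - nx + 1, nx):
            vals = [grid[r][c]
                    for r in range(r0, r0 + ny)
                    for c in range(c0, c0 + nx)
                    if grid[r][c] is not None]
            row.append(fmean(vals) if vals else None)
        averaged.append(row)
    return averaged


def count_above(pixels, threshold):
    """Number of averaged pixels whose VCD exceeds the threshold."""
    return sum(1 for row in pixels for v in row if v is not None and v > threshold)


# Toy 4 x 6 grid of airborne VCDs (molec cm^-2), averaged into 2 x 3 blocks
grid = [
    [1e15, 1e15, 1e15, 5e15, 5e15, 5e15],
    [1e15, 1e15, 1e15, 5e15, 5e15, 5e15],
    [3e16, 2e16, 1e16, 1e15, None, 1e15],
    [3e16, 2e16, 1e16, 1e15, 1e15, 1e15],
]
pixels = block_average(grid, 2, 3)
print(count_above(pixels, 1.4e15))  # 2
print(max(v for row in pixels for v in row if v is not None))
```

The averaging lowers the peak value (here the 3x10^16 cells end up in a 2x10^16 pixel), which is the reduction of dynamic range discussed above.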
Regarding the Jiu Valley, a similar exercise based on our measurements on 28 August 2012 (Fig. S3 in the Supplement) leads to 48 near-nadir TROPOMI pixels, of which 35 would have a NO2 VCD above the 2σ TROPOMI error. The largest tropospheric NO2 VCD seen by TROPOMI would be around 2x10^16 molec cm^-2 (29σ for TROPOMI).

Table 4 summarizes the NO2 observations during the AROMAT campaigns. For each instrument, the table indicates the measured range of NO2 VCDs (or vmrs), the ground sampling distance, and a typical detection limit and bias. For the DOAS instruments, we estimated the detection limits on the NO2 VCDs from typical 1σ DOAS fit uncertainties divided by typical air mass factors (AMFs). Table S1 in the Supplement presents these typical AMFs and detection limits. The 1σ DOAS fit uncertainty is instrument specific and an output of the DOAS fitting algorithms. The AMF depends on the observation geometry and on the atmospheric and surface optical properties. Uncertainties on the AMF usually dominate the systematic part of the error for DOAS measurements. Therefore, for these instruments, the bias given in Table 4 corresponds to the uncertainty in their associated AMF.
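The detection-limit and bias estimates described above reduce to two one-line formulas: the VCD detection limit is the 1σ slant-column fit uncertainty divided by the AMF and, to first order, the AMF-related bias is the VCD times the relative AMF uncertainty. The sketch below uses hypothetical input values chosen only for illustration; the actual typical AMFs and detection limits are those listed in Table S1 of the Supplement.

```python
def vcd_detection_limit(scd_fit_error, amf):
    """1-sigma VCD detection limit (molec cm^-2): slant-column fit error / AMF."""
    if amf <= 0:
        raise ValueError("AMF must be positive")
    return scd_fit_error / amf


def amf_bias(vcd, amf_relative_uncertainty):
    """First-order systematic VCD error implied by a relative AMF uncertainty."""
    return vcd * amf_relative_uncertainty


# Hypothetical inputs, for illustration only
print(f"{vcd_detection_limit(2.0e15, 2.5):.1e}")  # 8.0e+14
print(f"{amf_bias(2.0e16, 0.26):.1e}")            # 5.2e+15
```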

Characterization of the reference measurements
Combined with the ground sampling distance, the detection limit enables one to quantify the random uncertainty of a reference observation at the satellite horizontal resolution. Indeed, when reference measurements are averaged within a satellite pixel, the random error of the average decreases with the square root of the number of measurements, assuming the individual errors are independent. For instance, a continuous mapping performed with SWING at a spatial resolution of 300x300 m^2 inside a TROPOMI pixel of 3.5x5.5 km^2 would yield 214 SWING pixels. Averaging the NO2 VCDs of these SWING pixels would divide the original SWING uncertainty (1.2x10^15 molec cm^-2) by √214, leading to 8.2x10^13 molec cm^-2, about one tenth of the TROPOMI random error (7x10^14 molec cm^-2) given in Table 2.
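The SWING example can be reproduced in a few lines. The sketch assumes, as the 1/√N scaling requires, that the errors of the individual airborne pixels are independent; spatially correlated errors would not average down this quickly. The helper name is hypothetical.

```python
import math


def averaged_random_error(single_error, pixel_area_km2, sample_area_km2):
    """Number of airborne samples in a satellite pixel and the random error of
    their mean, assuming independent errors (1/sqrt(N) scaling)."""
    n_samples = round(pixel_area_km2 / sample_area_km2)
    return n_samples, single_error / math.sqrt(n_samples)


# SWING: 300 x 300 m^2 samples inside a 3.5 x 5.5 km^2 TROPOMI pixel
n, err = averaged_random_error(1.2e15, 3.5 * 5.5, 0.3 * 0.3)
print(n, f"{err:.1e}")  # 214 8.2e+13
```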

However, the temporal variation of the NO2 VCDs adds further uncertainty to the reference measurements when comparing them with satellite data. The validation areas typically extend over a few tens of kilometers. At this scale, satellite observations are a snapshot in time of the atmospheric state, while an airborne mapping typically takes one or two hours.
We investigated the temporal variation of the NO2 VCDs by comparing consecutive AirMAP overpasses above Bucharest from the morning flight of 31 August 2015. During this flight, the Cessna covered the same area three times in a row between 07:06 and 08:52 UTC. For each AirMAP overpass, we averaged the NO2 VCDs at the horizontal resolution of TROPOMI (see previous section). The standard deviation of the differences between two averaged overpasses then indicates the random part of the NO2 VCD temporal variation during an aircraft overpass. This standard deviation is 3.7x10^15 and 4.2x10^15 molec cm^-2 between the first and second and between the second and third overpasses, respectively. Hereafter, we use 4x10^15 molec cm^-2 as the random error due to the temporal variation. Figure S10 in the Supplement illustrates these investigations of the temporal variation of the NO2 VCDs at the TROPOMI resolution.
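The overpass comparison amounts to computing the standard deviation of the pixel-wise differences between two averaged overpasses. The sketch below uses made-up VCDs for a handful of pixels, purely for illustration; `overpass_difference_std` is a hypothetical helper, and pixels missing in either overpass are skipped.

```python
from statistics import stdev


def overpass_difference_std(first, second):
    """Standard deviation of the pixel-wise differences between two overpasses.

    Both inputs are sequences of pixel-averaged VCDs in the same pixel order;
    pixels that are None in either overpass are skipped.
    """
    if len(first) != len(second):
        raise ValueError("overpasses must cover the same pixels")
    diffs = [a - b for a, b in zip(first, second) if a is not None and b is not None]
    return stdev(diffs)  # needs at least two valid pixels


# Made-up pixel-averaged VCDs (molec cm^-2) for two consecutive overpasses
overpass_1 = [1.0e16, 2.0e16, 1.5e16, 0.8e16, None]
overpass_2 = [1.4e16, 1.7e16, 1.9e16, 0.5e16, 1.2e16]
print(f"{overpass_difference_std(overpass_1, overpass_2):.2e}")  # 4.04e+15
```

Using the standard deviation of the differences, rather than their mean, isolates the random part of the temporal change; a uniform drift of the whole field between overpasses would show up in the mean difference instead.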
Clearly, the temporal variation of the NO2 VCDs depends on the characteristics of a given validation experiment, such as the source locations and the wind conditions during the measurements. In the studied case, however, this error source is larger for the reference measurements than the TROPOMI precision (7x10^14 molec cm^-2). This is quite different from using static MAX-DOAS instruments as a reference, since their measurements are usually averaged within one hour around the satellite overpass. Compernolle et al. (paper in preparation) quantify the temporal error of MAX-DOAS NO2 VCDs, which typically ranges between 1 and 5x10^14 molec cm^-2. Whether the reference is based on airborne or ground-based measurements, underestimating its temporal random error leads to an underestimated slope between reference and satellite observations. For MAX-DOAS, this also happens when averaging the NO2 VCDs within larger time windows around the satellite overpass (Wang et al., 2017).
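The effect of an underestimated reference error on the slope is the classical regression dilution (attenuation) of errors-in-variables problems. Its extreme case is an ordinary least-squares fit, which ignores the reference error entirely. The synthetic experiment below uses hypothetical, uniformly distributed true VCDs, a 4x10^15 molec cm^-2 reference error, and a 7x10^14 molec cm^-2 satellite error: although neither dataset is biased, the fitted slope drops to roughly var(truth)/(var(truth) + var(reference noise)), about 0.78 here.

```python
import random
from statistics import fmean


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (treats x as error-free)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(42)
SIGMA_REF, SIGMA_SAT = 4e15, 7e14  # reference (temporal) and satellite errors
truth = [rng.uniform(0.0, 2.6e16) for _ in range(20000)]  # hypothetical true VCDs
reference = [t + rng.gauss(0.0, SIGMA_REF) for t in truth]
satellite = [t + rng.gauss(0.0, SIGMA_SAT) for t in truth]

slope = ols_slope(reference, satellite)
mean_truth = fmean(truth)
var_truth = fmean([(t - mean_truth) ** 2 for t in truth])
expected = var_truth / (var_truth + SIGMA_REF ** 2)  # theoretical attenuation factor
print(f"OLS slope {slope:.2f}, attenuation factor {expected:.2f}")
```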

We simulated TROPOMI Cal/Val exercises with the spatially averaged AirMAP observations. We considered these averaged AirMAP NO2 VCDs as the ground truth in simulated TROPOMI pixels, to which we added Gaussian noise to build synthetic satellite and reference NO2 VCD datasets. For the synthetic satellite observations, the noise standard deviation corresponded to the TROPOMI random error (the precision in Table 2). For the synthetic airborne observations, we added in quadrature the aforementioned averaged airborne shot noise (e.g. 7x10^13 molec cm^-2 for SWING) and temporal error (4x10^15 molec cm^-2, which we assumed to be also realistic around Turceni). We then applied orthogonal distance regressions to a series of such simulations to estimate the uncertainty on the regression slope. This led to slope uncertainties of about 6% and 10% in Bucharest and Turceni, respectively.
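The Monte Carlo approach can be sketched as follows. Instead of a general orthogonal distance regression package, the sketch uses the closed-form Deming regression, the errors-in-variables straight-line fit for a known ratio of error variances, which gives the same slope as a weighted orthogonal regression when both error levels are constant. The evenly spaced "truth" of 43 pixels up to 2.6x10^16 molec cm^-2 is hypothetical, so the resulting slope spread is not expected to reproduce the 6% obtained with the real AirMAP field; the function names are also ours.

```python
import math
import random
from statistics import fmean, stdev


def deming_slope(x, y, delta):
    """Closed-form Deming regression slope.

    delta is the ratio var(errors in y) / var(errors in x).
    """
    mx, my = fmean(x), fmean(y)
    sxx = fmean([(a - mx) ** 2 for a in x])
    syy = fmean([(b - my) ** 2 for b in y])
    sxy = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    d = syy - delta * sxx
    return (d + math.sqrt(d * d + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)


def simulate_slope_spread(truth, sigma_sat, sigma_ref, n_runs=500, seed=0):
    """Mean and standard deviation of the fitted slope over synthetic validations."""
    rng = random.Random(seed)
    delta = sigma_sat ** 2 / sigma_ref ** 2
    slopes = []
    for _ in range(n_runs):
        reference = [t + rng.gauss(0.0, sigma_ref) for t in truth]
        satellite = [t + rng.gauss(0.0, sigma_sat) for t in truth]
        slopes.append(deming_slope(reference, satellite, delta))
    return fmean(slopes), stdev(slopes)


SIGMA_SAT = 7e14                    # TROPOMI precision (molec cm^-2)
SIGMA_REF = math.hypot(7e13, 4e15)  # shot noise and temporal error in quadrature
# Hypothetical ground truth: 43 pixel-averaged VCDs evenly spread up to 2.6e16
truth = [2.6e16 * i / 42 for i in range(43)]

mean_slope, slope_sd = simulate_slope_spread(truth, SIGMA_SAT, SIGMA_REF)
print(f"slope = {mean_slope:.3f} +/- {slope_sd:.3f}")
```

Because the temporal term dominates, `math.hypot(7e13, 4e15)` differs from 4x10^15 by less than 0.1%, so the shot noise of the airborne imager barely affects the simulated slope uncertainty.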
In a real-world validation experiment, this regression slope would quantify the combined biases of the two NO2 VCD datasets (satellite and reference). These biases mainly originate from errors in the AMFs, resulting in particular from uncertainties on the NO2 and aerosol profiles and on the surface albedo. To some extent, these quantities can be measured from an aircraft with the type of instrumentation deployed during AROMAT. The ground albedo can be retrieved with the DOAS instruments by normalizing uncalibrated airborne radiances to a reference area with known albedo (Meier et al., 2017) or by using a radiometrically calibrated DOAS sensor (Tack et al., 2019). The NO2 and aerosol profiles can be measured with in-situ instruments such as a CAPS NO2 monitor and a nephelometer. For legal reasons, vertical soundings are difficult above cities. One can instead measure the NO2 and aerosol profiles further downwind in the exhaust plume, once the latter is above rural areas. The conditions inside the city can differ, which motivates the deployment of ground-based instruments, e.g. sunphotometers and MAX-DOAS, inside the city.
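The albedo normalization is only named here. A crude first-order version, which is our own simplifying assumption and not necessarily the procedure of Meier et al. (2017), scales the albedo of the reference area by the ratio of the measured radiances. It neglects path radiance and assumes identical illumination and viewing geometry, so it should be read as an illustration of the idea only; `relative_albedo` is a hypothetical helper.

```python
def relative_albedo(radiance, reference_radiance, reference_albedo):
    """First-order surface albedo from radiances normalized to a reference area.

    Assumes radiance proportional to surface albedo, i.e. no path radiance and
    identical illumination and viewing geometry: a crude illustration only.
    """
    if reference_radiance <= 0:
        raise ValueError("reference radiance must be positive")
    return reference_albedo * radiance / reference_radiance


# Hypothetical uncalibrated signals (arbitrary units) and reference albedo
print(f"{relative_albedo(820.0, 1000.0, 0.10):.3f}")  # 0.082
```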
Regarding uncertainties on the reference AMFs, the added value of knowing the aerosol and NO2 profiles appears when comparing the AMF error budgets for airborne measurements above Bucharest (26%, Meier et al., 2017) and above the Turceni power plant (10%, Merlaud et al., 2018). In the latter case, there was accurate information on the local aerosol and NO2 profiles thanks to the lidar and the balloon-borne NO2 sonde, respectively. We used these two AMF uncertainties to estimate a total possible bias between reference and satellite observations. Table 7 summarizes this section and presents total error budgets for different scenarios of validation exercises using reference airborne mapping to validate spaceborne tropospheric NO2 VCDs. We estimated the random and systematic uncertainties between satellite and reference measurements with SWING and AirMAP, with or without profile information on the aerosols and NO2 vmr, and for measurements over Bucharest or Turceni. Note that we considered 25% for the satellite accuracy. The temporal error of the airborne measurements clearly dominates the total random error, making the differences in detection limit between AirMAP and SWING irrelevant for this application. Adding the profile information, on the other hand, reduces the total multiplicative bias from 37% to 28% or 29% in Bucharest and Turceni. This quantifies the potential of such airborne measurements for validating the imaging capabilities of TROPOMI regarding the NO2 VCDs above Bucharest and the Jiu Valley.

Table 5 is similar to Table 4 but for H2CO, which we only measured in significant amounts in and around Bucharest.

Lessons learned for the validation of space-borne H2CO VCDs
The background level of the H2CO VCD around the city is about 1 × 10¹⁶ molec cm⁻², and the anthropogenic increase in the city center reaches up to 7 × 10¹⁶ molec cm⁻² (Fig. 9). The background falls within the TROPOMI H2CO spread (1.2 × 10¹⁶ molec cm⁻²), and Fig. 9 indicates that the urban hotspot only extends over a few TROPOMI pixels, with a maximum at 6σ. This limits the relevance of individual mapping flights for the validation of H2CO. The information on the horizontal variability of H2CO is nevertheless useful, as it justifies installing a second MAX-DOAS in the city center, in addition to background measurements outside the city. Indeed, long-term ground-based measurements at two sites would be useful to investigate seasonal variations of H2CO. Averaging H2CO over a season would reduce the random errors of the satellite measurements and could reveal the horizontal variability of H2CO from space. The H2CO hotspot around Bucharest seems to be visible in the TROPOMI data of summer 2018 (I. De Smedt, personal communication).
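The benefit of seasonal averaging can be quantified with the usual 1/√N scaling of the random error for N independent observations. This sketch uses the single-pixel TROPOMI H2CO random error of 1.2 × 10¹⁶ molec cm⁻² and the hotspot value of 7 × 10¹⁶ molec cm⁻² quoted above. The assumption that daily errors are uncorrelated is an idealization, systematic errors do not average out, and the numbers of valid overpasses below are hypothetical.

```python
import math

SIGMA_SINGLE = 1.2e16  # TROPOMI H2CO random error per pixel, molec cm-2
HOTSPOT = 7e16         # maximum H2CO VCD in the city center, molec cm-2

def averaged_error(sigma, n):
    """Random error of the mean of n independent observations (assumes uncorrelated errors)."""
    return sigma / math.sqrt(n)

# Hypothetical numbers of valid (e.g. cloud-free) overpasses
for n_days in (1, 30, 90):
    err = averaged_error(SIGMA_SINGLE, n_days)
    print(f"N = {n_days:3d}: error = {err:.1e} molec cm-2, hotspot / error = {HOTSPOT / err:.0f}")
```

For N = 1, the ratio rounds to 6, consistent with the 6σ maximum mentioned above; averaging a few tens of overpasses would raise it well above the single-pixel noise, under the stated independence assumption.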
Getting information on the H2CO profile during an airborne campaign may also help to understand the differences between ground-based and space-borne observations. This could be done by adding to the BN-2 instrumental set-up an in-situ [...]

Table 6 is similar to Table 4 but for SO2, which we only measured in significant amounts in the Jiu Valley. The higher bias of the airborne measurements for SO2 compared to NO2 is due to the surface albedo. The albedo is lower in the UV, where we retrieve SO2, which leads, for the same albedo error, to a larger AMF uncertainty (e.g. Merlaud et al., 2018, Fig. 10).

Lessons learned for the validation of space-borne SO2 VCDs
Averaging the SO2 VCDs from the airborne mapping of Fig. 12 at the TROPOMI resolution yields 30 near-nadir TROPOMI pixels above a 2σ error of 5.4 × 10¹⁶ molec cm⁻². The maximum tropospheric SO2 VCD seen by TROPOMI would be 2.4 × 10¹⁷ molec cm⁻² (7σ). This suggests that airborne mapping of SO2 VCDs above large power plants could help to validate the horizontal variability of the SO2 VCDs measured from space, although only to a limited extent in the AROMAT conditions due to the small dynamic range (7σ).
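Averaging a high-resolution airborne map at the satellite resolution and counting the pixels above a detection threshold can be sketched as a block average over a regular grid. This is a simplification: real TROPOMI footprints are not aligned with the airborne grid, and a careful comparison would weight each airborne measurement by its overlap with each satellite pixel. The synthetic Gaussian plume, the block size, and the grid dimensions are hypothetical; only the 5.4 × 10¹⁶ molec cm⁻² threshold comes from the text.

```python
import numpy as np

def block_average(field, block):
    """Average a 2-D field over non-overlapping block x block cells.

    Rows/columns that do not fill a complete block are discarded.
    """
    ny, nx = field.shape
    ny_b, nx_b = ny // block, nx // block
    trimmed = field[: ny_b * block, : nx_b * block]
    return trimmed.reshape(ny_b, block, nx_b, block).mean(axis=(1, 3))

# Hypothetical 120 x 120 airborne SO2 VCD map (molec cm-2) containing a Gaussian plume
y, x = np.mgrid[0:120, 0:120]
plume = 4e17 * np.exp(-((x - 40) ** 2 / (2 * 25.0**2) + (y - 60) ** 2 / (2 * 8.0**2)))
airborne = plume + 1e15  # small uniform background

coarse = block_average(airborne, block=10)  # hypothetical: 10 airborne cells per satellite pixel side
threshold = 5.4e16  # 2-sigma level quoted for TROPOMI SO2, molec cm-2
n_above = int((coarse > threshold).sum())
print(f"{n_above} coarse pixels above threshold; max = {coarse.max():.2e} molec cm-2")
```

Because averaging smooths the plume, the maximum of the coarse field is lower than the airborne maximum, which is one reason why the dynamic range seen at satellite resolution is limited.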
However, it would be difficult to quantify the bias of the satellite SO2 VCDs with AROMAT-type airborne measurements.
Adding in quadrature the biases of the SO2 VCDs for airborne measurements (40%, Table 6) and for TROPOMI (30%, Table 2) already leads to a combined uncertainty of 50%, without considering any temporal variation or regression error. This best-case scenario is at the upper limit of the TROPOMI requirements for tropospheric SO2 VCDs (Table 2).
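The combined bias quoted above follows from adding the two relative biases in quadrature, which assumes that the airborne and satellite biases are independent. A minimal check of the arithmetic:

```python
import math

def combined_relative_bias(*biases):
    """Combine independent relative biases in quadrature."""
    return math.sqrt(sum(b * b for b in biases))

airborne_so2_bias = 0.40  # airborne SO2 VCD bias (Table 6)
tropomi_so2_bias = 0.30   # TROPOMI SO2 VCD bias (Table 2)

total = combined_relative_bias(airborne_so2_bias, tropomi_so2_bias)
print(f"combined bias: {total:.0%}")  # prints "combined bias: 50%"
```

Any additional independent term, such as a temporal or regression error, can only increase this value, which is why 50% is a best case.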

As for H2CO, the validation of satellite-based SO2 measurements should thus rely on ground-based measurements, which make it possible to improve the signal-to-noise ratio of both the satellite and the reference measurements by averaging their time series. An additional difficulty for validating SO2 VCDs above a power plant arises from the spatial heterogeneity of the SO2 field around the point source, which complicates ground-based VCD measurements. On the other hand, Fioletov et al. (2017) presented a method to derive SO2 emissions from OMI data and validated it against reported emissions. SO2 fluxes can be measured locally in several ways, and we tested some of them during AROMAT-2 (see Sect. 4.4.2 below). To validate satellite-derived SO2 products in Europe, it thus seems possible to compare satellite-derived and ground-based reference SO2 fluxes. Theys et al. (2019) have already validated TROPOMI-derived volcanic SO2 fluxes against ground-based measurements. In this context, an SO2 camera pointing at the plant stack would be a valuable tool, since it could be permanently installed and automated. One advantage of such a camera compared to the other remote-sensing instruments we tested, besides its low operating cost, is that it derives the plume velocity directly from the measurements, avoiding dependence on low-resolution wind information.
Note that since the desulfurization unit of the Turceni power plant was not fully operational during AROMAT-2, the SO2 VCDs detected above Turceni on 28 August 2015 are expected to be higher than in standard conditions [...] all the more so as the latter ranges between 10 and 40% of the total NOx according to Trombetti et al. (2018). This tends to confirm that the EMEP inventory underestimates the NOx emissions for Bucharest.

It is difficult to interpret the discrepancies between those measured fluxes and the yearly reported emissions, since we observed large variations in the instantaneous emissions with the SO2 camera (see below and Fig. 13). However, the ratio of the two fluxes is of interest, since we can assume it to be relatively stable. For a given power plant, this ratio depends on whether or not a desulfurization unit is operational at the plant. In Fig. 14, Turceni appears to have both the largest measured ratio [...] the NOx flux from the aircraft but confirms that the nearby road is too close to the plant to estimate a meaningful NOx flux from Mobile-DOAS NO2 observations. Note that the conversion of NO into NO2 is also visible right above the Turceni stack in the NO2 imager data of 24 August 2015, as shown in Fig. 6 of Dekemper et al. (2016).

Figure 13 presents a time series of the SO2 emissions from the Turceni power plant between 09:00 and 10:50 UTC on 28 August 2015. We derived SO2 fluxes at different altitudes above the stack using a UV SO2 camera, an updated version of the Envicam2 system used during the SO2 camera intercomparison described by Kern et al. (2015). We converted the measured optical densities to SO2 column densities using simultaneous measurements with an integrated USB spectrometer (Lübcke et al., 2013). We estimated the stack exit velocity from the SO2 images, recorded with a time resolution of about 15 s, by tracking spatial features of the plume. Dekemper et al. (2016) used a similar approach to derive the NO2 flux from NO2 camera imagery.
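Tracking plume features between successive camera images is commonly implemented by cross-correlating along-plume profiles of the SO2 column density. The following sketch is a generic illustration of that idea, not the processing actually applied to the AROMAT-2 images: it finds the integer-pixel lag that maximizes the cross-correlation between two profiles taken 15 s apart and converts it to a velocity using a hypothetical pixel size at the plume distance. The function names and the synthetic profiles are hypothetical.

```python
import numpy as np

def plume_lag(profile_t0, profile_t1):
    """Integer-pixel displacement of profile_t1 relative to profile_t0.

    A positive lag means that features moved towards higher pixel indices.
    """
    a = np.asarray(profile_t0, dtype=float)
    b = np.asarray(profile_t1, dtype=float)
    corr = np.correlate(b - b.mean(), a - a.mean(), mode="full")
    # In "full" mode, index len(a) - 1 corresponds to zero lag
    return int(np.argmax(corr)) - (len(a) - 1)

def plume_speed(profile_t0, profile_t1, pixel_size_m, dt_s):
    """Plume speed (m/s) from the lag, the pixel size at the plume, and the time step."""
    return plume_lag(profile_t0, profile_t1) * pixel_size_m / dt_s

# Synthetic along-plume SO2 profiles: a puff that moves 7 pixels in 15 s
pix = np.arange(200)
puff_t0 = np.exp(-0.5 * ((pix - 60) / 5.0) ** 2)
puff_t1 = np.roll(puff_t0, 7)

PIXEL_SIZE_M = 2.0  # hypothetical pixel footprint at the plume distance
print(plume_speed(puff_t0, puff_t1, PIXEL_SIZE_M, dt_s=15.0))
```

Real images would require sub-pixel interpolation of the correlation peak and a choice of profile location along the plume axis; these refinements are omitted here.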

The SO2 fluxes retrieved for transects at vertical distances of 400 to 700 m above the stack agree with each other within 20% on average. Emissions estimated 100 m above the stack are underestimated due to saturation (SO2 column densities above 2 × 10¹⁸ molec cm⁻²) and to the high aerosol concentration close to the exhaust.
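The flux through a transect follows from integrating the SO2 column densities across the plume and multiplying by the plume speed normal to the transect, F = v Σᵢ Sᵢ Δx. The sketch below applies this generic relation, with the necessary unit conversions, to a synthetic Gaussian cross-section. The column amplitude, plume width, pixel size, and speed are hypothetical, and the plume is assumed to move perpendicular to the transect.

```python
import numpy as np

AVOGADRO = 6.02214076e23  # molecules per mole

def transect_flux(columns_molec_cm2, pixel_size_m, speed_m_s):
    """Flux through a transect (mol/s) from column densities across the plume.

    Assumes the plume moves perpendicular to the transect at speed_m_s.
    """
    columns = np.asarray(columns_molec_cm2, dtype=float)
    integral_molec_per_cm = columns.sum() * pixel_size_m * 100.0  # molec cm-1 (m -> cm)
    flux_molec_s = integral_molec_per_cm * speed_m_s * 100.0      # molec s-1 (m/s -> cm/s)
    return flux_molec_s / AVOGADRO

# Hypothetical cross-section: Gaussian plume sampled with 1 m pixels
x = np.arange(-100.0, 100.0, 1.0)
columns = 5e17 * np.exp(-0.5 * (x / 20.0) ** 2)
flux_mol = transect_flux(columns, pixel_size_m=1.0, speed_m_s=5.0)
print(f"SO2 flux: {flux_mol:.2f} mol/s")
```

Repeating this calculation for transects at several distances above the stack, as done with the camera images, provides a consistency check between the retrieved fluxes.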
The SO2 emissions show large fluctuations. During our observations, they increased from 1 kg s⁻¹ (15.6 mol s⁻¹) to around 4 ± 1 kg s⁻¹ (62.4 mol s⁻¹). The images (Fig. S13 in the Supplement) also show a second, weaker SO2 source. This is probably the desulfurization unit, which was reported to be turned on again on this day after the temporary shutdown. Indeed, as shown in Table 9, the SO2/NO2 ratio measured from AirMAP is lower than those measured from the ULM-DOAS during the previous days, and the same holds true for the Mobile-DOAS measurements.
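The molar fluxes given in parentheses follow from the molar mass of SO2 (about 64.066 g mol⁻¹). A minimal check of the conversion:

```python
M_SO2_KG_PER_MOL = 0.064066  # molar mass of SO2, kg/mol

def kg_s_to_mol_s(rate_kg_s):
    """Convert an SO2 mass flux (kg/s) to a molar flux (mol/s)."""
    return rate_kg_s / M_SO2_KG_PER_MOL

for rate in (1.0, 4.0):
    print(f"{rate} kg/s -> {kg_s_to_mol_s(rate):.1f} mol/s")
# prints "1.0 kg/s -> 15.6 mol/s" and "4.0 kg/s -> 62.4 mol/s"
```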

Conclusions
We have shown that the airborne mapping of tropospheric NO2 VCDs above Bucharest is potentially valuable for the validation of current and future nadir-looking satellite instruments. These measurements agree with ground-based measurements and cover a significant part of the dynamic range of tropospheric NO2 VCDs at an appropriate signal-to-noise ratio, making it possible to constrain the accuracy of the satellite NO2 VCDs to within 37% or 28%, without and with information on the aerosol and NO2 profiles, respectively. This highlights the importance of profile information for approaching the TROPOMI optimal target accuracy for tropospheric NO2 VCDs (25%).

A unique advantage of airborne mapping is its ability to validate the imaging capabilities of nadir-looking satellites. This feature becomes more important as the horizontal resolution of satellites reaches the suburban scale. Judd et al. (2019) pointed out that static ground-based measurements have difficulty representing the NO2 VCDs measured from space in polluted areas, due to the horizontal representativeness error. This error cancels out when the full extent of the satellite pixels is mapped. The caveat is the temporal error, which can be larger than with static ground-based measurements. We estimated the temporal error at about 4 × 10¹⁵ molec cm⁻² in our observations above Bucharest, but it varies with local conditions for a given experiment.
This points to the value of simultaneous ground-based measurements, which may also help to estimate the reference NO2 VCDs in the airborne observations.
We also detected a clear H2CO signal in and around Bucharest, with an anthropogenic hotspot in the city center. Due to the lower signal-to-noise ratio of spaceborne H2CO observations, this structure is not visible in daily satellite measurements. We therefore propose considering long-term ground-based MAX-DOAS measurements in the city for the validation of H2CO.
In the Jiu Valley, NO2 is clearly visible from both satellite and aircraft, and the VCDs are comparable in magnitude to the signal detected above Bucharest. However, it appears more complicated to quantitatively compare the NO2 VCD datasets in the thick exhaust plumes of the power plants. These plants also emit SO2 but, as for H2CO, the low signal-to-noise ratio of the satellite measurements reduces the validation relevance of individual airborne measurements. As SO2 emissions in the Jiu Valley have been drastically reduced by the installation of flue-gas desulfurization units, we propose targeting the SO2 emissions of other coal-fired power plants with higher emissions, e.g. in Serbia.
Considering the optimal validation study mentioned in the introduction, the validation relevance of an international airborne campaign is usually limited by its duration, typically a couple of weeks, imposed in practice by cost considerations. To overcome this limitation, one might consider routine airborne mapping of NO2 VCDs by local aircraft operators close to a well-equipped ground-based observatory. Such a set-up would reduce the fixed costs of the observations, which could instead be allocated to flight hours in different seasons. This would combine the advantages of long-term ground-based and airborne measurements. In the longer term, high-altitude pseudo-satellites (HAPS) could help to achieve such routine measurements above selected supersites, which would be particularly valuable for validating observations from geostationary satellites.