Long-term airborne measurements of pollutants over the UK, including during the COVID-19 pandemic, to support air quality model development and evaluation Why: Skilful representation of pollutant distributions throughout the atmospheric column is important to enable skilful prediction at the surface.

The ability of regional air quality models to skilfully represent pollutant distributions throughout the atmospheric column is important to enabling their skilful prediction at the surface. This creates a requirement for model evaluation at elevated altitudes, though observation datasets available for this purpose are limited, particularly those offering sampling over extended time periods. To address this requirement and support evaluation of regional air quality models such as the UK Met Office's Air Quality in the Unified Model (AQUM), a long-term, quality-assured dataset of the three-dimensional distribution of key pollutants has been collected over the southern United Kingdom from June 2019 to April 2022. This sampling period encompasses operations during the global COVID-19 pandemic, and as such the dataset serves an additional application in providing a unique resource with which to explore changes in atmospheric composition associated with reduced emissions during this period. Measurements were collected using the Met Office Atmospheric Survey Aircraft (MOASA), a Cessna-421 instrumented for this project to measure gaseous nitrogen dioxide, ozone, sulphur dioxide and fine mode (PM2.5) aerosol. This paper provides a technical introduction to the MOASA measurement platform, flight strategies and instrumentation. The MOASA air quality dataset includes 63 flight sorties (totalling over 150 hours of sampling), the data from which are openly available for use. Example case studies using data from these sorties are presented, which include an analysis of the spatial scales of measured pollutant variability, initial work to evaluate performance of the AQUM regional air quality model, and an introduction to the vertical structure of pollutants observed during repeated flight patterns over Greater London, including during the COVID-19 impacted period.

In a comparison of AQUM to AURN observations, Savage et al. (2013) found that AQUM generally performed well, in particular for large air quality events, but had a number of systematic biases. These include a positive bias in ozone at urban sites, a positive NOx bias at rural sites, a negative NOx bias at urban sites, and general negative biases in both PM2.5 and PM10. Ground-based observations are used to bias-correct the model data and minimise some of these systematic biases at the surface (Neal et al., 2014). We note that these biases may not solely be due to model performance and could also be partially attributable to difficulties in evaluating a 12 km resolution model with point observations that have limited spatial coverage, both in the horizontal (raising questions of representativeness) and in the vertical (limiting model evaluation away from the surface-atmosphere boundary). These limitations in observational data currently available for model evaluation provide motivation for the current work, with a particular focus on the need for observations away from the surface. Given that vertical mixing serves to transport pollutants both away from and towards the surface, and pollutant chemical, physical and removal processes occur throughout the atmospheric column, model skill in this domain is critical to achieving successful prediction at the surface (Solazzo et al., 2013).

Observations of pollutants throughout the atmospheric column are increasingly available from satellite instruments (e.g. TROPOMI on ESA's Sentinel-5P (Veefkind et al., 2012, Air Quality Expert Group, 2020, Wyche et al., 2021) and GOME on ESA's ERS-2 (Molina and Molina, 2004)). While these observations can provide global coverage extending over timescales of years, they generally contain limited information on the vertical distribution of pollutants within the column (Fleming, 1996, Peers et al., 2019). Instrumented aircraft provide one way of addressing this gap. Over several decades, there have been a number of related large-scale initiatives to instrument in-service commercial aircraft to provide such measurements, for example Measurements of OZone, water vapour, carbon monoxide and nitrogen oxides by Airbus In-service airCraft (MOZAIC, Solazzo et al., 2013) and In-service Aircraft for a Global Observing System (IAGOS, Petzold et al., 2015). Over forty-four thousand flights have been conducted under IAGOS since 1994 and, though temporally and spatially restricted by commercial flight patterns and timings, these projects serve as a prime example of the use of instrumented aircraft to provide long-term observations for atmospheric model evaluation. An alternative approach is the use of atmospheric research aircraft (ARA), which are aircraft instrumented and deployed specifically for the pursuit of atmospheric science and monitoring. ARA deployments tend to focus on specific locations or events, and instrument payloads can vary greatly depending on the phenomenon under study. As such, while ARA are particularly well suited to the detailed study of chemical and physical processes (a key requirement for model development), the often-sporadic nature of their deployment limits the generation of consistent, long-term datasets.
It is this gap that this work seeks to fill with a specific focus on air quality observations over the UK to allow for the evaluation of regional models such as AQUM. The UK Clean Air: Analysis and Solutions research programme is led by the Met Office and Natural Environment Research Council (NERC) and has invested in modelling, data and analytical tools to assess current and future air quality and the impact of policies designed to improve it (DEFRA, 2019). Under this umbrella, a long-term, quality-assured dataset of the three-dimensional distribution of key pollutants (NO2, O3, SO2 and PM2.5) has been collected using the instrumented Met Office Atmospheric Survey Aircraft (MOASA).
Observations have primarily covered the southern UK, including Greater London, with 63 flights throughout the period 2019-2022. This paper introduces the strategy and quality assurance basis for these observations, with the intention of serving as a comprehensive technical reference for all future users of these data. In particular it includes descriptions of: i) the measurement platform and instrumentation, ii) flight strategies, iii) analysis of the spatial scales of measured pollutant variability, iv) initial use of these data to evaluate performance of the AQUM regional air quality model, and v) an introduction to the vertical structure of pollutants observed during repeated flight patterns conducted over Greater London during the COVID-19 impacted period.

Impact of COVID-19
In January 2020, the first case of severe acute respiratory syndrome coronavirus (SARS-CoV-2), referred to as COVID-19, was identified in the UK (Jephcote et al., 2020). Since 24 March 2020, to curtail person-to-person transmission of the virus, the United Kingdom has been subject to various levels of lawful regulation limiting all non-essential travel and contact. A consequence of the restrictions has been a reduction in mobility (50-75% across major cities during the Spring 2020 lockdown, Air Quality Expert Group, 2020) as businesses switched to homeworking, and industry and commercial sectors reduced operations. This resulted in a significant drop in emissions of primary air pollutants, most markedly from the transport sector (road, rail, and aviation) and in urban environments. Similar impacts have been seen across Europe (Lee et al., 2020) and have collectively resulted in significant changes to UK air quality compared to the climatological norm (Air Quality Expert Group, 2020).
Flight operations with the MOASA aircraft encompass periods in 2020 and 2021 impacted by these COVID-related changes to air quality over the UK. The implications of this are two-fold. Firstly, users of these data for model evaluation should be mindful that emissions throughout the measurement period were not always at climatological levels. In addition to bulk concentration changes, pollution properties such as particulate size and composition may also have been different during these periods. While this does not negate the use of these data for some aspects of model evaluation, it certainly cautions against their use to assess quantitative performance of models driven using standard climatological emissions. Secondly, and more positively from a scientific perspective, as the database includes observations covering pre-, during and post-lockdown periods, it presents a unique and valuable resource with which to further explore changes in atmospheric composition over the UK associated with reduced emissions during the COVID-19-impacted period.

The MOASA is a Cessna-421 aircraft based at Bournemouth airport, operated by Alto Aerospace Ltd for the Met Office (Fig 1). The MOASA is instrumented to allow airborne measurement of key air quality-relevant aerosol and gas phase pollutants; namely gaseous nitrogen dioxide (NO2), ozone (O3), sulphur dioxide ([...] Measurement Technology (DMT)), though it does not form part of the air quality measurement suite and therefore is not discussed further here. Nitrogen dioxide, ozone and sulphur dioxide instruments are rack mounted in the cabin and sample at 0.85, 1.8 and 0.5 litres per minute, respectively. All instruments have a 1 Hz sampling resolution, except for the O3 monitor which samples at 0.5 Hz. Ambient gaseous samples are drawn from a stainless-steel air sample pipe that takes air from outside of the fuselage boundary layer through an on-rack PTFE-headed sample pump (KNF N834.3FTE). Also within the cabin is a backscatter aerosol lidar (Leosphere), which is used operationally but does not form part of the core air quality measurement suite.
The starboard side nose bay compartment contains a custom-built 'Air Quality Box' (AQ Box) and a nephelometer (Ecotech, Aurora 3000) (Fig 2). The sample to each of the instruments in the front hold is controlled with actuated valves and volume flow controllers inside the AQ Box (see Appendix A for AQ Box flow schematic).
The AQ Box contains a Portable Optical Particle Spectrometer (POPS, Handix) and a Tricolour Absorption Photometer (TAP, Brechtel, model 2901) and has the capability to sub-select only PM2.5 sample aerosol for analysis. The sample into the AQ Box is from a Brechtel Iso-Kinetic inlet which samples at 6.35 litres per minute and has >95% sampling efficiency for particle diameters from 0.1 to 6 µm (Brechtel Manufacturing Inc, 2011). The PM2.5 sample flow is dried via two Perma Pure MD-700 driers, connected in series via a 180-degree bend. The sample then passes through an impactor with an aerodynamic cut point size of 2.5 µm, before being split between the POPS (0.5 LPM (sample + sheath)), TAP (1 LPM) and the nephelometer (5 LPM), which is situated alongside the AQ Box. Measurements at the nephelometer and TAP inlet indicate the PM2.5 sample relative humidity is typically below 20% and therefore the sample is a good representation of the dry PM2.5 size distribution. Within the AQ Box the sample line temperature and pressure are also recorded.
Particle losses through the PM2.5 sampling lines have been estimated using open access particle loss calculation software (Von Der Weiden et al., 2009) based on the tubing dimensions, flow characteristics and a representative particle density of 1.64 g cm⁻³. This analysis suggested losses downstream of the inlet of <17% for particle diameters in the range 0.1-3 µm.
In addition to particle losses due to flow deposition, we have considered the extent to which loss of particle mass may occur due to evaporation of ammonium nitrate, NH4NO3, a semi-volatile aerosol component that readily repartitions between condensed and gas phases upon changes in temperature and humidity (Nowak et al., 2010, Langridge et al., 2012, Morgan et al., 2010 [...] (1971), as implemented by Dassios and Pandis, 1999) was used to calculate the rate of change in diameter of polydisperse NH4NO3 particles through the MOASA flow system. The model unsurprisingly showed that the loss of particulate nitrate had a strong temperature dependence and varied dynamically as a function of time.

Total mass losses during the MOASA sampling residence time of 2 seconds and at a representative sampling temperature of 30 °C were approximately 7%. The NH4NO3 losses showed a weak dependence on pressure and relative humidity, with absolute losses increasing by only 2% at 500 mb compared to 100 mb and by approximately 2% over the relative humidity (RH) range 10-50% (where in-flight PM2.5 sample RH was typically below 20%). Although evaporative loss of NH4NO3 during MOASA sampling will vary on a case-by-case basis, for representative conditions this work indicates that the loss is small and likely less than 7%.
The AQ Box also allows for measurement of the aerosol population without particle size selection or drying; however, this mode of operation has not been utilised in this work and is therefore not described further.

Nitrogen dioxide
A Cavity Attenuated Phase Shift Spectrometer Nitrogen Dioxide detector (Aerodyne Research Inc, referred to here as NO2CAPS to avoid confusion with the Cloud and Aerosol Precipitation Spectrometer, CAPS) was repackaged in-house, from a 5U, 12 kg to a 3U, 9.7 kg 19" rack-mounted unit to optimize volume and weight for airborne use. The analyser monitors ambient atmospheric NO2 concentrations up to 3000 ppbv (parts per billion by volume) using a 450 nm LED based absorption spectrometer utilizing cavity attenuated phase shift spectroscopy (Kebabian et al., 2005). A comprehensive review of the theory of operation is given in Kebabian et al., 2005. The NO2CAPS analyser has been shown to be insensitive to other nitro-containing species and variability in ambient aerosol, humidity and other trace atmospheric species (Kebabian et al., 2005, Aerodyne Research, n.d.).
While some cavity-based absorption techniques are often referred to as calibration free (Langridge et al., 2008), this feature relies on knowledge of the variation in absorption cross-section across the spectral range of the light source being used. Given the broadband nature of the NO2CAPS light source, which is difficult to characterise accurately and may be subject to change over time, we chose to undertake routine direct calibration of the instrument. As such, full multi-point calibrations are carried out annually at the National Centre for Atmospheric Science (NCAS) Atmospheric Measurement and Observation Facility (AMOF) COZI-lab at the University of York. Here, a multi-gas calibrator is used to dilute a high concentration NO standard into zero air (grade Pure Air Generator (PAG) 001) at varying levels. Ozone is added in excess to ensure full conversion of NO to NO2. Seven concentration levels are used, and zero checks are also carried out. Calibration coefficients are determined from linear fits and applied to the NO2CAPS data during post-processing.
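As an illustration of how a multi-point calibration of this kind can be turned into post-processing coefficients, the following minimal sketch fits a straight line to instrument response against reference concentration and inverts it. The concentration levels, gain and offset are synthetic values chosen for illustration, and the function names are hypothetical; they are not taken from the MOASA processing code.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic calibration run (assumed values): a zero check plus seven NO2 levels, in ppbv.
reference_ppbv = [0.0, 5.0, 10.0, 20.0, 40.0, 60.0, 80.0, 100.0]
# Synthetic instrument response with an assumed gain of 0.95 and offset of 0.4 ppbv.
measured_ppbv = [0.4 + 0.95 * r for r in reference_ppbv]

gain, offset = fit_line(reference_ppbv, measured_ppbv)

def apply_calibration(raw_ppbv):
    """Invert the fitted response to recover a calibrated concentration."""
    return (raw_ppbv - offset) / gain

calibrated_ppbv = [apply_calibration(m) for m in measured_ppbv]
```

With real calibration data the fit residuals would also be inspected to confirm that the response is linear over the range of interest.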

NO2 analyser baseline pressure dependency correction
During normal operation, the NO2CAPS analyser periodically establishes a baseline to account for the optical losses associated with light transmission by the cavity mirrors (which depend both on mirror cleanliness and alignment) and Rayleigh scattering of light by air (Kebabian et al., 2005). This is achieved by passing NO2 free air through the analyser every 15 minutes (automated). The standard NO2CAPS software then applies a constant baseline correction based on these periodic measurements for the sampling segment that follows. For variable-[...] pressure changes lead to shifts in the instrument baseline between filter periods.
To account for these changes, a new correction scheme has been developed. During post-processing, the pressure dependence of the baseline is determined by applying a linear fit to the pressure variation in Rayleigh-corrected filtered-air measurements recorded across the full flight. This dependence is used to calculate a new time-varying baseline based on sample pressure measurements alone. This baseline is then used to recalculate the NO2 concentration across the flight. Spikes due to valve switches are also removed from the data series at this stage. Figure 3 shows raw (red) and processed (blue) NO2 concentration during flight M304 in November 2021, where the NO2CAPS sample inlet was fitted with a zero-air filter such that measurements were sensitive only to baseline changes. Following take-off at 11:52:00 the aircraft climbed to an altitude of 5.5 km, resulting in an ambient pressure change of 509 mb and a NO2CAPS measurement-cell pressure change of 250 mb. The profile shows that the corrected data are markedly more stable than the raw data and suggests a mean error in NO2 concentration due to pressure-dependent baseline corrections of ±0.09 ppbv (data averaged over 10 s intervals).
The oscillations seen in the data during the filter test are an artefact of the filter, which impacted performance of the instrument pump. During a separate zero-air test experiment, the sensitivity of the NO2CAPS was derived to be 0.17 ± 0.14σ ppbv (data also averaged over 10 s intervals). As such, following correction, NO2CAPS pressure sensitivity is not considered a significant source of uncertainty for aircraft NO2CAPS observations.
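The idea behind the pressure-dependent baseline correction can be sketched as follows. This is a deliberately simplified illustration that works directly in concentration units on synthetic data: the baseline is fitted against cell pressure using only the filtered-air periods and then subtracted from every sample. The variable names, the linear coefficients and the data are assumptions made for the example, and the Rayleigh correction and spike removal steps described above are omitted.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic flight record (assumed values): cell pressure in mb, filter flag, true NO2 in ppbv.
pressure_mb = [1000.0, 1000.0, 900.0, 800.0, 700.0, 700.0, 600.0, 500.0, 500.0, 600.0]
is_filtered_air = [True, False, False, True, False, True, False, True, False, False]
true_no2 = [0.0, 4.0, 3.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.5, 1.0]

# Apparent signal = true NO2 plus an assumed linear pressure-dependent baseline.
signal = [c + 0.002 * p - 1.5 for c, p in zip(true_no2, pressure_mb)]

# Step 1: fit the baseline against pressure using filtered-air samples only.
zero_pressure = [p for p, z in zip(pressure_mb, is_filtered_air) if z]
zero_signal = [s for s, z in zip(signal, is_filtered_air) if z]
slope, intercept = fit_line(zero_pressure, zero_signal)

# Step 2: build a time-varying baseline from pressure alone and subtract it everywhere.
corrected_no2 = [s - (slope * p + intercept) for s, p in zip(signal, pressure_mb)]
```

Because the baseline is reconstructed from pressure at every time step, the correction follows ascents and descents that occur between the 15-minute filter periods.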

Ozone
A dual beam ozone monitor (2B Tech, model 205) enables measurements of atmospheric ozone up to 100 ppmv (parts per million by volume). Measurements are based on the absorption of ultraviolet (UV) light at 254 nm in two absorption cells, one with ozone-scrubbed (zero) air and one with un-scrubbed (sample) air, from which the Beer-Lambert law can be used to determine ozone concentration. Instrument sensitivity, empirically derived by sampling filtered air at 0.5 Hz during a test flight, is 2.9 ± 0.4 σ. The monitor is calibrated annually at the NCAS AMOF COZI-lab, where the instrument is compared with a NIST-traceable standard ozone spectrometer over a wide range of ozone mixing ratios. These results are used to calibrate the ozone monitor with respect to gain and sensitivity, which are applied to the instrument directly.
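For readers unfamiliar with the retrieval, the Beer-Lambert calculation underlying a UV absorption ozone measurement can be sketched as below. The absorption cross-section is an approximate literature value for ozone near 254 nm, the 15 cm path length is an assumed figure used only for the example, and the function names are hypothetical; the instrument firmware applies its own constants and corrections.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
# Approximate O3 absorption cross-section near 254 nm (cm^2 per molecule); assumed value.
SIGMA_O3_CM2 = 1.15e-17

def air_number_density_cm3(pressure_hpa, temperature_k):
    """Ideal-gas number density of air in molecules per cm^3."""
    per_m3 = pressure_hpa * 100.0 / (BOLTZMANN_J_PER_K * temperature_k)
    return per_m3 * 1e-6

def ozone_ppbv(i_sample, i_zero, path_cm, pressure_hpa, temperature_k):
    """Ozone mixing ratio from light intensities through sample and zero air.

    Beer-Lambert law: i_sample = i_zero * exp(-sigma * n_o3 * path).
    """
    n_o3 = math.log(i_zero / i_sample) / (SIGMA_O3_CM2 * path_cm)
    return n_o3 / air_number_density_cm3(pressure_hpa, temperature_k) * 1e9

# Example with an assumed 15 cm cell: a 0.02% drop in transmitted intensity.
example_ppbv = ozone_ppbv(0.9998, 1.0, 15.0, 1013.25, 293.15)
```

The dependence on cell pressure and temperature in the final step is the reason both quantities must be measured alongside the optical signal on a platform that changes altitude.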
A known but not widely recognized issue with UV absorption ozone monitors is that rapid changes in humidity (as may occur during airborne ascents and descents) can cause a large zero shift. This is due to modulation of humidity of the sample stream by the ozone scrubber, which can cause the humidity in the sampling and zero cells to go out of equilibrium. To equilibrate the humidity, Nafion tubes known as DewLines are used in the 2B Tech monitor (Dewline, n.d., Wilson and Birks, 2006). Biases may become apparent should the DewLines stop working effectively and thus, following some initial issues with negative calculated ozone values during MOASA measurements (impacting the first 7 flights), the DewLines were regularly replaced.

A pulsed fluorescence SO2 analyser (Thermo Scientific, 43i Trace Level-Enhanced) detects sulphur dioxide up to 1000 ppbv. It operates on the principle that SO2 molecules fluoresce following absorption of ultraviolet (UV) light, with the fluorescence intensity proportional to the number of SO2 molecules in the air sample (Beecken et al., 2014). Instrument sensitivity was empirically determined using zero-air checks to be 0.90 ± 0.26 σ ppb (averaged over 10 s intervals). The SO2 instrument is calibrated (zero and span) monthly in the field using an 863 ppb BOC Alpha Standard.

Aerosol scattering
[...] total scattering, and 7% (450 nm), 3% (525 nm) and 11% (635 nm) for total backscatter, which are adopted here. The signal-to-noise ratio for backscattering is worse than for total scattering, since the backscattering signal is about one order of magnitude smaller than the total scattering signal for ambient air (Müller et al., 2011).

Aerosol absorption
Aerosol absorption is measured using a Tricolor Absorption Photometer (TAP, Brechtel, model 2901). The TAP is a 3-wavelength (467, 528, 652 nm) filter-based absorption photometer which derives real-time aerosol light absorption from the difference in light transmission measured between two 47 mm diameter Pallflex (E70-2075W) glass-fibre filter spots, one of which receives particle-laden air and the second of which receives aerosol-filtered air (Davies et al., 2019, Bond et al., 1999, Perim De Faria et al., 2021 and Ogren et al., 2017).
The TAP employs empirical corrections to account for scattering effects that complicate the derivation of

The errors in absorption measurements from filter-based photometry are dominated by uncertainties in the empirical scattering corrections, but also have contributions from uncertainties in the spectral response of the light source (±1-2 nm (Ogren et al., 2017)), sample flow rate (<1% (Ogren et al., 2017)), filter spot size and the [...] (Bond et al., 1999, Davies et al., 2019, Müller et al., 2014, Virkkula, 2010, Ogren et al., 2017). Internal particle losses within the instrument flow system due to diffusion, impaction and sedimentation are estimated to be <1% for particles with diameters in the range 0.03-2.5 µm (Davies et al., 2019, Ogren et al., 2017). To minimise the effects of instrument noise observed in-flight, a low-pass filter is applied to raw data with a cut-off frequency of 0.08 Hz, although this had minimal impact on optical properties derived from these data.
We apply scattering corrections to the low-pass-filtered TAP data using the Virkkula, 2010 correction scheme, which relies on simultaneous measurements of the light scattering coefficient, in this case provided by the nephelometer. The correction scheme is implemented as described by Davies et al., 2019. Ogren et al., 2017 provided an estimate of the accuracy of TAP absorption measurements of 30% and this value is adopted here.
However, as summarised by Davies et al., 2019, given the empirical nature of filter-based correction schemes and strong source and wavelength dependencies, these correction schemes are unlikely to fully bound uncertainties associated with filter-based absorption measurements.

Aerosol size distributions
A Portable Optical Particle Spectrometer (POPS, Handix) measures the size of dried particles predominantly in the accumulation mode (approximately 0.1 µm < d < 1 µm) using a light scattering technique. The POPS uses a spherical mirror to collect a fraction of light scattered sideways (38-142 degrees) by individual particles traversing a 405 nm laser beam. The scattered light is directed to a photomultiplier tube, the signal from which is digitised and placed into one of 32 bins that are spaced logarithmically in scattering amplitude space. For a given laser power, the measured scattering amplitude is determined by the particle size, shape, and index of refraction (IOR), thus allowing the bin boundaries to be converted to effective particle size subject to assumptions about shape and optical properties. In addition to particle size, given the POPS is a single particle instrument, it also provides a measure of the total particle number within its detection size range. A comprehensive review of POPS theory of operation is provided by Gao et al. (2016).

Calibration
Particle sizing by the POPS is calibrated by measuring the scattering amplitude of atomised NIST-traceable polystyrene latex (PSL) spheres of known size, spherical shape and IOR (Rosenberg et al., 2012, Peers et al., 2019, Gao et al., 2016). Calibrations use 10 discrete sizes of PSL between 0.15 and 3 µm. The PSL are atomised and dried prior to entering the POPS sample inlet. PSL sizes between 0.15 and 0.70 µm are, where possible, also passed through a differential mobility analyser (DMA, TSI 3082 Electrostatic Classifier) in order to help minimise the impacts of contaminants from the PSL generation process.
For each PSL diameter, Mie theory is used to calculate the particle scattering cross section (Fig 4), using a PSL IOR at 405 nm of 1.615+0.001j (Gao et al., 2016). Linear regression is then used to fit the relationship between the POPS-measured scattering amplitude and the theoretical PSL scattering amplitude (see Appendix B) (Rosenberg et al., 2012). The error in response is determined from the standard error in the mean for each 15 second period of sampling, averaged over the duration of the PSL run. The error in PSL diameter is the NIST-[...]
To size ambient particles, it is necessary to convert the bin boundaries to equivalent diameters for particles with different optical properties. The impact of particle index of refraction on the POPS response is shown in Fig 4, which shows the relationship between particle diameter and theoretical POPS response for both PSLs and particles representative of urban sampling. To account for the significant differences seen, we again apply Mie theory. The calibrated POPS bin boundaries in scattering cross section space are converted to diameter space based on Mie calculations. These calculations integrate scattering over the angular range of collection angles of the POPS and use an estimate of the ambient particle IOR (further details below) (Rosenberg et al., 2012, Gao et al., 2016). To overcome inherent Mie resonance oscillations in calculated scattering signals (where Dp > 600 nm in Fig 4), which result in non-monotonic behaviour with increasing particle diameter (van de Hulst, 1981, Gao et al., 2016, Rosenberg et al., 2012), each Mie response curve is smoothed using spline interpolation (Hagan and Kroll, 2020). As particle morphology and inter- and intra-particle homogeneity of the ambient sample are unknown, an assumption of spherical, homogeneous particles is implicit to the application of this Mie theory-based approach.

The IOR of the aerosol sample used for determination of POPS bin boundaries for ambient sampling is estimated using the method described in Liu and Daum, 2000 and Peers et al., 2019. This is an iterative approach whereby the single scattering albedo (the wavelength-dependent ratio of aerosol scattering to total extinction, ω0) is calculated from the dry POPS particle size distribution (ω0psd, λ = 405 nm) using an initial guess IOR and then compared to the measured single scattering albedo at 405 nm derived from independent observations from the MOASA nephelometer and TAP (ω0nt). The IOR is then adjusted iteratively until acceptable closure is reached between calculated and measured ω0, noting that the POPS bin boundaries are adjusted upon each iteration.
This process is summarised in Fig 5; more detail, including a case study, is given in Appendix C.
A strength of the MOASA data set is that the POPS, TAP and nephelometer all share a common sample inlet, which reduces a potential source of sampling bias that may impact this analysis. Further, to minimise differences in sampling volumes and response times, all ω0 calculations are performed using 30 second averaged data and only data from straight and level runs (SLR, flight transects at approximately constant altitude and velocity) of at least 3 minutes duration are included. The iterative IOR analysis step is performed on the flight-mean of these SLR data. While this approach does not allow in-flight variability to be accounted for, it minimises potential for erroneous impacts on the POPS size distribution arising from noise and uncertainty in the ω0 measurements, which can be large at low aerosol loading levels. The flight-average approach adopted here has been shown to lead to modest errors in particle diameter of <10% compared to analysis at finer temporal scales (see Appendix C, case study). We also note that while the IOR derived here provides closure between MOASA optical and size distribution instruments, it is subject to potential uncertainties that caution against its use as an accurate measure of the true ambient particle IOR (Frie and Bahreini, 2021).

Size distribution uncertainties
A review of uncertainties for the POPS instrument is given in Gao et al. (2016). [...] the sample flow over all flights ranged from 2.7 to 5.9 cm³ s⁻¹ (data averaged over 10 s intervals). The higher values arose due to flow system cross-interference issues that generated flow noise impacting the first 11 MOASA flights; after the source of noise was removed, a more representative range for normal operation is 2.9 cm³ s⁻¹ ± 3.2%.

Coincidence errors, whereby two or more particles traverse the laser beam at the same time leading to sizing errors, are a common feature of all optical particle counters when used in high aerosol loading environments.
The impact of coincidence errors on the MOASA POPS observations is addressed during data processing by flagging all data where particle concentrations exceed 7000 cm 3 /s (McMeeking, 2020, personal communication).

Particle sizing uncertainties arise from a number of sources, including scattering amplitude measurement uncertainty (leading to an estimated 3% 1σ sizing error for 500 nm particles) and laser intensity instability (±3% diameter sizing error for temperatures from 43 to 46 °C). In addition, for reasons already discussed above, uncertainty in the IOR of particles being measured also impacts uncertainty in particle sizing. Gao et al. (2016) used a theoretical ambient aerosol population to investigate the potential magnitude of this error. They assessed the accuracy in the location and width of lognormal fits to both a theoretical population fine mode (10% and 10% respectively) and coarse mode (1.4% and 19% respectively). These uncertainties were propagated to derive an estimated uncertainty in the total particle volume of 19%. Though based on a single theoretical ambient size distribution, this analysis provides an indication of the magnitude of error arising from IOR variation. For MOASA POPS-derived size distributions, it is likely to provide an upper indication of the error, given that efforts to correct the POPS bin boundaries based on the iterative IOR method described above should serve to improve sizing accuracy.
Based on the information above, an upper estimate for the error in total particle volume from POPS measurements (required for subsequent calculation of particle mass) is derived by combining in quadrature contributions from IOR/scattering (19%), sample flow (3.2%) and laser amplitude (6%) to yield an uncertainty of 20%.

Determination of mass concentration (PM2.5)
To calculate particulate mass, we convert the calibrated, IOR-corrected POPS particle size distributions to volume distributions, and subsequently mass distributions by assuming a fixed particle density. The total mass is then calculated by integrating across the distribution within the PM2.5 size range. Calculations are performed on [...] The selection of an appropriate particle density for converting volume to mass is an important part of the above [...] apply a fixed density to all data of 1.64 ± 0.07 (1σ) g cm⁻³. This value is derived by weight-averaging the densities of PM2.5 aerosol components measured during a range of UK field experiments, as detailed in Appendix D.
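The conversion from a binned number size distribution to mass concentration can be sketched as follows, assuming spherical particles and the fixed density of 1.64 g cm⁻³ quoted above. The bin edges and number concentrations are invented for the example, the use of the geometric-mean bin diameter and the exclusion of any bin extending above 2.5 µm are simplifying assumptions, and the function name is hypothetical. A convenient unit identity is that a particle volume in µm³ per cm³ of air multiplied by a density in g cm⁻³ gives mass directly in µg m⁻³.

```python
import math

def pm25_mass_ug_m3(bin_edges_um, number_conc_cm3, density_g_cm3=1.64, cut_um=2.5):
    """Integrate a binned number distribution to PM2.5 mass (µg m^-3).

    bin_edges_um: bin boundaries in µm (length = number of bins + 1).
    number_conc_cm3: particle number concentration per bin (cm^-3).
    """
    total_volume = 0.0  # µm^3 of particle per cm^3 of air
    for lower, upper, n in zip(bin_edges_um[:-1], bin_edges_um[1:], number_conc_cm3):
        if upper > cut_um:
            continue  # assumption: drop bins extending above the PM2.5 cut
        d_mid = math.sqrt(lower * upper)  # geometric-mean diameter of the bin
        total_volume += n * math.pi / 6.0 * d_mid ** 3  # sphere volume times count
    # µm^3 cm^-3 multiplied by g cm^-3 is numerically equal to µg m^-3.
    return total_volume * density_g_cm3

# Invented three-bin distribution for illustration only.
edges_um = [0.1, 0.4, 1.6, 6.4]
counts_cm3 = [500.0, 20.0, 0.5]
mass_ug_m3 = pm25_mass_ug_m3(edges_um, counts_cm3)
```

In this sketch the optical diameter from the POPS is treated as if it were the aerodynamic diameter that defines PM2.5, which is a further simplification.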

The total uncertainty in the determined PM2.5 mass concentration, estimated by combining uncertainties in the measured particle volume (20%) and the assumed particle density (4.2%), is 20.4% and thus dominated by the volume error.
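The quadrature sums quoted for the particle volume and PM2.5 mass uncertainties can be reproduced with a few lines of code. This is a simple check that assumes the individual contributions are independent; the function name is hypothetical.

```python
import math

def combine_in_quadrature(*percent_errors):
    """Root-sum-square of independent relative uncertainties (in percent)."""
    return math.sqrt(sum(e ** 2 for e in percent_errors))

# Volume: IOR/scattering (19%), sample flow (3.2%) and laser amplitude (6%).
volume_uncertainty = combine_in_quadrature(19.0, 3.2, 6.0)

# Mass: rounded volume uncertainty (20%) and particle density (4.2%).
mass_uncertainty = combine_in_quadrature(20.0, 4.2)
```

The first sum evaluates to approximately 20.2%, which rounds to the 20% used for the volume, and the second to approximately 20.4%, which shows how little the density term adds.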

Flight Planning
The MOASA air quality flight strategy was based on flying a series of repeated sorties, each designed to provide data suitable for different aspects of model evaluation work. On a week-to-week basis, sorties were selected based on the prevailing weather conditions and any required modifications to flight plans were made at that time.
This section describes the rationale behind each of the sortie types, together with a summary of flight activities.
Given the MOASA home base is at Bournemouth on the south coast of the UK, operations have predominantly focused on sampling over the south of the UK. This includes work over the English Channel (e.g., sampling conducive to the production and build-up of pollutants such as ozone and as such, high pollution events tend to be more frequent and severe in the summer (Savage et al., 2013).

Ground Network Survey
Ground Network Survey sorties describe two flight patterns that sample both rural and urban background regional pollution at various altitudes. One flight pattern is focused on the southwestern UK (Fig 7.A1) and the other on the eastern UK (Fig 7.A2). A particular feature of these sorties is that they overfly a number of AURN ground sites, allowing pollutant concentrations at the surface to be compared to those aloft.
Characterisation of pollution at regional scales is important for air quality model evaluation, particularly for cannot accurately represent them in terms of location and concentration.

High-Density Plume Mapping
High-Density Plume Mapping flights (Fig 7.B) use intensive model grid-box scale sampling to allow for assessment of the (often sub-grid in models) scale of pollutant variability in a high pollution region. Repeated runs upwind, downwind and within the plume are performed at a range of altitudes. This sortie has primarily been flown over Port Talbot in South Wales, a heavily industrialised area and AQUM pollution hotspot, but has also been flown once north of Cambridge (east UK). In that case, horizontal transects sampling the plume at multiple altitudes downwind of the city were conducted.

South Coast Survey
South Coast Surveys were flown onshore and offshore along the south coast of the UK, typically from Dartmoor

Coastal Transition Survey
The coastal transition sortie (Fig 7.D) also operates along the south coast of the UK. The primary distinction from the south coast survey is a zigzag manoeuvre whereby the land-to-sea transition is repeatedly sampled. The objective for this sortie is to obtain data for benchmarking model performance across the land-sea interface, where strong gradients in humidity and temperature can impact forecast pollution fields.

In later flights, these surveys have also been extended eastwards to encompass the Dover Strait to allow sampling of pollutants transported from industrial activities around the Dunkirk region of northern France, which is another emissions hotspot that can lead to strong pollutant transport over the UK when meteorological conditions permit.

The spatial scales of pollutant variability
The evaluation of limited-resolution regional air quality models (such as AQUM with a 12 km grid length) using high resolution in-situ surface or airborne data is complicated by the differences in spatial scale between the two. While instrumentation may be capable of measurements at high precision and accuracy, these uncertainty metrics may, or may not, provide criteria suitable for determining the degree to which models and observations should agree. In many cases the magnitude of natural pollutant variability at scales that are sub-grid for models provides an important additional consideration. With this in mind, in this section we use the MOASA Clean Air database to assess how observed pollutant variability changes, on average, as a function of length scale, and how this variability compares to fundamental instrument measurement precision.
We take a statistical approach that uses data from all MOASA straight and level runs (SLRs), over 44 flights between July 2019 and July 2021. The number of SLRs per flight varies depending on the type of sortie flown, with a minimum of 2 and a maximum of 11 (see Table 1). The minimum permissible SLR length was set at 3 minutes to ensure adequate counting statistics. In total this yielded 240 SLRs representing 1,389 minutes of sampling, and we focus here on measurements of relative humidity, NO2, SO2 and total particle number concentration.
High temporal resolution datasets corresponding to each straight and level run (e.g., SO2 in Fig 9) indicate the range of variability observed at a number of fixed sampling scales. Of particular note, it is clear that measured variability in SO2 was generally close to or below the noise limit of the MOASA instrumentation.
Hence, aside from cases of elevated emissions (such as flight M284, Fig 10), instrument performance dominates observed SO2 variability in the MOASA database. For RH, NO2 and particle counts, the natural variability is generally well sampled by the MOASA instrumentation. It is interesting to note how the peak position and width of the distributions change upon moving to progressively longer sampling scales. These changes tell us how, in an average sense, we might expect model sub-grid variability to change as a function of grid box length.
Changes are particularly marked for relative humidity and somewhat less so for NO2 and particulate counts.
Focussing on the 12 km length scale relevant to AQUM, the upper ends of the distributions bound the (average) sub-grid variability that we might expect model output to represent. For NO2 this absolute variability is below 7.35 ppbv and for particulate counts below 2412.830 counts/second for 90% of data points.
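One way to quantify variability at a fixed sampling scale is sketched below. It is a minimal illustration and not the analysis code used for the figures: it assumes that variability is measured as the sample standard deviation within consecutive, non-overlapping along-track windows, and that the window length is set from a constant true airspeed. The function name, the 100 m/s airspeed and the 1 Hz sampling rate in the example are all hypothetical.

```python
import numpy as np


def variability_by_length_scale(values, true_airspeed_ms, sample_hz, length_scale_m):
    """Standard deviation within consecutive windows of fixed along-track length.

    Returns one value per complete window; a trailing partial window is dropped.
    """
    values = np.asarray(values, dtype=float)

    # Number of samples collected while the aircraft covers one length scale.
    samples_per_window = int(round(length_scale_m / true_airspeed_ms * sample_hz))
    if samples_per_window < 2:
        raise ValueError("length scale too short for the given airspeed and sampling rate")

    n_windows = values.size // samples_per_window
    windows = values[: n_windows * samples_per_window].reshape(n_windows, samples_per_window)

    # ddof=1 gives the sample standard deviation; NaNs (missing data) are ignored.
    return np.nanstd(windows, axis=1, ddof=1)


# Hypothetical 1 Hz series sampled at 100 m/s: a 12 km window holds 120 samples.
series = np.tile(np.arange(120.0), 2)
sigma_12km = variability_by_length_scale(series, 100.0, 1.0, 12_000.0)
```

Repeating the calculation for several values of `length_scale_m` and pooling the results over all SLRs would give distributions of variability per length scale, which could then be compared against the instrument noise limit.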

Preliminary model evaluation
In

Ozone
Large ozone biases are seen for both flights (Fig 13). The model data show large overprediction when compared against the aircraft data at corresponding locations (mean model bias of 18.49 ppb and 48.93 ppb for M270 and M296, respectively). It is of note that this model bias would be expected to have been larger had the AQUM data been produced using emissions modified for the COVID-19 pandemic (Grange et al., 2021). The bias appears to be relatively consistent across the latitude and longitude ranges of the flights and does not show any particular correlation with location. For M270, the bias is lowest near to the surface and increases with altitude up to approximately 700-800 m, above which the bias decreases. This can be attributed to differences in modelled and observed boundary layer height, which is discussed further in the following section. Savage et al. (2013) also reported biases during a ground-site AQUM comparison. A statistical post-processing routine using ground based observations is applied to the forecast model data in order to generate the operational forecast and this is known to significantly improve predictions (Neal et al., 2014). It may be possible to use the aircraft observations to help identify sources of model bias, in a similar process to the above, or to determine an ozone bias correction factor that can be applied to the model data.
respectively). This model bias would be expected to have been larger had the AQUM data been produced using emissions modified for the COVID-19 pandemic (Grange et al., 2021). In consonance with the observations, the model also shows light north-westerly winds at all altitudes. Modelled NO2 concentration is comparable to surface level NO2 at the lowest altitude circuit and decreases imperceptibly with altitude.
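For reference, a mean model bias of the kind quoted above can be computed from co-located model and aircraft values as in the sketch below. It assumes the model output has already been sampled at the aircraft positions and times, and that bias is defined as model minus observation; the function name and the example values are hypothetical and are not flight data.

```python
import numpy as np


def mean_bias(model, observed):
    """Mean of (model - observed) over points where both values are present."""
    model = np.asarray(model, dtype=float)
    observed = np.asarray(observed, dtype=float)

    # Skip points where either the model or the observation is missing.
    valid = ~(np.isnan(model) | np.isnan(observed))
    return float(np.mean(model[valid] - observed[valid]))


# Hypothetical ozone values in ppb; the third pair is skipped (missing model value).
bias_ppb = mean_bias([50.0, 60.0, np.nan], [30.0, 45.0, 10.0])
print(bias_ppb)  # 17.5
```

A positive value indicates model overprediction relative to the aircraft observations.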

Long term observations over London
In this section we look at long term surface level and airborne NO2 and O3 data to illustrate how the two datasets can be combined to help characterise persistent trends in the temporal and vertical distribution of pollutants.

Higher concentrations of O3 are observed aloft by the aircraft, where, further away from the surface sources of nitrogen oxides (NOx=NO+NO2), O3 can reform through the oxidation of NO to NO2 with peroxy radicals and subsequent photolysis of NO2 to form O3 (Lee et al., 2020). As such, the increase in O3 is coincident with a reduction of the observed NO2 aloft, which, in addition to being reduced by chemical reaction, is also further away from sources (fossil fuel burning, traffic; Jones et al., 2021; Lee et al., 2020). Here, the impacts of external factors (meteorology, boundary layer height, seasonal changes, complex chemistry) are not discussed, as they are beyond the scope of this paper. However, the persistent difference between the surface-based observations and airborne observations aloft demonstrates the importance of quantifying the vertical structure of pollutants, so their transport to/from the surface and the associated complex chemistry can be better evaluated in models,

This can be contrasted with an average increase of 3.35 µg m⁻³ (8.12%) in the hourly surface-level O3 across all sites in Greater London following lockdown (calculated as per above, with individual site data shown in Appendix E). The mean increase in O3 is consistent with the reduction in NO emissions following lockdown acting to decrease the extent of chemical loss of O3 through reaction with NO (Air Quality Expert Group, 2020).
However, the increase in surface level O3 is not observed at all sites; of the 6 sites analysed, an increase is seen in the 5 urban background sites and a decrease is seen in the single suburban site. This suggests more complex changes in the production/distribution of O3 in Greater London during the pandemic, consistent with literature on UK-wide surface-level O3 during the pandemic (Jephcote et al., 2020; Lee et al., 2020; Air Quality Expert Group, 2020; Wyche et al., 2021), and further work is recommended on the effect of observing site location on ozone production.

Throughout the pandemic, the start and end of lockdown periods were not clearly defined (restrictions were incrementally decreased in different locations on different timescales). As such it presents a complex timeline and, given individual flight observations were made over discrete time periods, perturbations in long-term trends in airborne NO2 and O3 due to COVID impacted emissions are not immediately evident. However, the availability of airborne observations concurrent with this complex timeline presents a unique opportunity to examine, in depth, case studies of the three-dimensional distribution of emissions below climatological levels during the COVID-19 pandemic, as well as the subsequent recovery to 'normal' (pre-pandemic) emissions, although such analysis is beyond the scope of this paper.

Conclusions and future plans
A long-term, quality assured, dataset on the three-dimensional distribution of NO2, O3, SO2, and fine mode PM- Analysis of relative humidity, total particle counts, NO2 and SO2 over the campaign shows that instrument  bounds are 3 and 0.7, respectively, AAE is removed when raw red absorption < 1 Mm⁻¹ and the AAE is set to 1.5 if the difference between absorption channels is < 1 Mm⁻¹. For the SAE, upper and lower bounds are 2.5 and 0.5, respectively, SAE is removed when raw red absorption < 10 Mm⁻¹ and the SAE is set to 0.5 if the difference between scattering channels is < 1 Mm⁻¹. The data are then further averaged over 30 seconds to minimise variability from instrument noise/precision and any mismatch of data. To minimise uncertainties in wavelength correction using the Ångström exponents, ω0nt is derived from the blue wavelengths only, using equation C3.

: Error propagation for ω0nt, where σsc is independent scattering and σa is independent absorption coefficients.
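For orientation, a standard first-order error propagation for a single scattering albedo can be written as below. This is a generic sketch that assumes the usual definition of ω0 as the ratio of scattering to extinction and uncorrelated uncertainties δσsc and δσa; it is not necessarily the exact form used in this appendix.

```latex
\omega_0 = \frac{\sigma_{sc}}{\sigma_{sc} + \sigma_{a}},
\qquad
\delta\omega_0 =
\frac{1}{\left(\sigma_{sc} + \sigma_{a}\right)^{2}}
\sqrt{\left(\sigma_{a}\,\delta\sigma_{sc}\right)^{2}
    + \left(\sigma_{sc}\,\delta\sigma_{a}\right)^{2}}
```

The two terms under the square root follow from the partial derivatives ∂ω0/∂σsc = σa/(σsc+σa)² and ∂ω0/∂σa = −σsc/(σsc+σa)².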
ω0 is not very sensitive to the real part of the index of refraction, and as such the real part of the estimated index of refraction is not very well constrained (Peers et al., 2019). Figure C1 shows ω0psd derived using Where insufficient data are available to enable calculation of the ω0 and thus IOR, an IOR for flights in a similar location and meteorological conditions is adopted. The uncertainties associated with applying a flight-mean IOR are investigated in more depth in the following case study.

Case study
Section 2.7 describes the processing applied to particle sizing measurements to account for sizing errors caused by differences in the IOR between the calibrant and ambient particles. The method applies corrections based on the assumption of a single ambient IOR per flight, which was derived via an iterative process based on achieving closure with independent observations of particle single scattering albedo. In this section we observations over London (typically from 0.85 in urban plumes to 0.95 in regional pollution and background aerosol).
A flight mean ω0psd=0.917±0.10 σ (Fig C1, blue line) was calculated using a particle size distribution (PSD) corrected with an optimally derived IOR=1.59+0.12j (herein referred to as IORDER). To examine sensitivity in particle sizing due to variability in observed ω0 throughout the column, we also undertook PSD corrections based on achieving closure between ω0psd and the maximum observed ω0nt (IORMAX, 1.59+0.008j), minimum

In summary, we conclude that use of a flight-mean IOR approach in correcting size distribution data introduces modest uncertainty of <10% compared to applying a variable IOR approach.