Intercomparison of in situ measurements of ambient NH3: instrument performance and application under field conditions

Ammonia (NH3) in the atmosphere affects both the environment and human health. It is therefore increasingly recognised by policy makers as an important air pollutant that needs to be mitigated, though it still remains unregulated in many countries. In order to understand the effectiveness of abatement strategies, routine NH3 monitoring is required. Current reference protocols, first developed in the 1990s, use daily samplers with offline analysis; however, a number of technologies have been developed since, which may be applicable for high time resolution routine monitoring of NH3 at ambient concentrations. The following study is a comprehensive field intercomparison held over an intensively managed grassland in southeastern Scotland using currently available methods that are reported to be suitable for routine monitoring of ambient NH3. In total, 13 instruments


Introduction
Excess reactive nitrogen in the environment has been demonstrated to have environmental impacts, as highlighted by the European Nitrogen Assessment (ENA) (Sutton et al., 2011). The ENA identified five key threats of excess reactive nitrogen to Europe: water quality, air quality (AQ), greenhouse gas (GHG) balance, ecosystems and biodiversity, and soil quality (WAGES). Atmospheric ammonia (NH3) plays a direct role in four of the five WAGES threats and is indirectly implicated in the GHG balance, as it influences the radiative balance through secondary aerosol formation. Ammonia is the most abundant basic gas in the atmosphere and, following reaction with acidic gases, forms secondary inorganic particulate matter of 2.5 µm or less in aerodynamic diameter (PM2.5). PM2.5 has AQ impacts on human health, visibility, and climate (Sutton et al., 2020). Vieno et al. (2016) have shown that reductions in NH3 emissions in the United Kingdom (UK) would result in a reduction in PM2.5, findings that were mirrored in the global studies of Gu et al. (2021) and Pozzer et al. (2017). Globally and across Europe, agriculture is the primary source of NH3 emissions (> 80 %) (Backes et al., 2016). It is predicted that current NH3 emissions will increase under most future scenarios due to (1) a rise in global temperatures and (2) predicted growth in global consumption of animal products. Fowler et al. (2015) estimate that global annual emissions of NH3 will increase from 65 Tg N yr−1 (2008) to 135 Tg N yr−1 in 2100 based on an assumed 5 °C of global warming by 2100 and the continued increase in the global consumption of animal products. There are, however, large uncertainties in NH3 emission inventories, of up to 1 order of magnitude in some sectors (Kuenen et al., 2014). It is therefore essential to accurately measure ambient NH3 concentrations to better quantify concentrations and their changes and hence to evaluate the impacts of NH3.
To understand the complexities of NH3 in the atmosphere and to provide evidence of the effectiveness of mitigation strategies, accurate, traceable routine NH3 monitoring is required. One of the major challenges is to achieve accurate and precise NH3 measurements at the source (typically > 1 ppm) and close to emission sources (typically > 100 ppb) as well as at ambient background concentrations (< 0.1 to 10 ppb). Concentrations of NH3 vary greatly across spatial and temporal scales, as this molecule is deposited rapidly and is also reactive in the atmosphere. Until recently, achieving quantitative, artefact-free long-term monitoring at high temporal resolution (LTMHTR) required close attention to detail in the operation of instrumentation. This tended to be economically feasible only in the research domain; hence, monitoring strategies for ambient NH3 vary between countries. The current reference method of the United States Environmental Protection Agency (US EPA, 1999) and the European Monitoring and Evaluation Programme (EMEP, 2001) is to sample a known volume of air through acid-coated (typically citric acid) denuders for 2-12 h, with offline analysis. The disadvantages of the US EPA and EMEP denuder methods are that they are labour intensive, are susceptible to handling and storage artefacts, and do not provide the high temporal resolution information that state-of-the-art methods can provide. Individual European countries have taken different approaches, sometimes combining a few LTMHTR sites with passive monitoring networks that sample at a lower frequency (weekly to monthly). The passive sampler networks tend to follow the recently published European standard diffusion sampler methodology, EN 17346: Ambient air - Standard method for the determination of the concentration of ammonia using diffusive samplers (CEN, 2020).
In the Netherlands, LTMHTR has been carried out since 1992, initially using continuous flow annular wet rotating denuders (WRD) with selective ion membrane and conductivity analysis in an instrument called the ammonia monitor (AMOR, ECN, NL) until 2015, and then using differential optical absorption spectroscopy (DOAS, RIVM, NL) (Volten et al., 2012; Berkhout et al., 2017). In the UK, there are two LTMHTR (hourly) NH3 measurements at rural background sites using WRDs with online ion chromatography analysis, as implemented in the commercial Monitor for AeRosols and Gases in Ambient air (MARGA, Metrohm, NL). Wet chemistry LTMHTR instruments (AMOR and MARGA) require specialist operators and are labour intensive; however, calibration and quality assurance are accurate and simple, as they use liquid calibrations. A disadvantage of the wet chemistry approach is that, at elevated concentrations, the WRD may not capture all NH3; in addition, the selective ion membrane and conductivity analysis method is not ion specific, so interference from other gas-phase compounds is possible.
There have been major advances in spectroscopic approaches to NH3 measurement over the last 20 years. Previously, mid-infrared (MIR) lead salt diodes required cryogenic cooling and were frequently multimodal, but these have been replaced by stable, more powerful, monochromatic, thermoelectrically cooled lasers. The development of reliable IR light sources, initially near-infrared (NIR) diode lasers and later mid-infrared quantum cascade lasers, resulted in an increasing number of spectroscopic instruments on the market. These include cavity ring-down systems (CRDS; Martin et al., 2016; Kamp et al., 2019), optical-feedback cavity-enhanced absorption spectrometers (OF-CEAS; Leen et al., 2013; Leifer et al., 2017), quantum cascade laser absorption spectrometers (QCLAS; Whitehead et al., 2008; Ellis et al., 2010; Zöll et al., 2016), open-path Fourier transform infrared systems (FTIR; Bjorneberg et al., 2009; Suarez-Bertoa et al., 2017), and photoacoustic methods (Pogány et al., 2009; von Bobrutzki et al., 2010; Liu et al., 2020). Recently, CRDS instruments have been introduced for routine ambient NH3 monitoring in France; in addition, the French national metrology institute has been involved in the calibration of the instruments (Macé et al., 2022). There are also other types of instruments, e.g. those utilising the ultraviolet (UV) spectrum for spectroscopy, such as the aforementioned DOAS systems in the Dutch network. Chemical ionisation mass spectrometers (CIMS), including the proton-transfer-reaction mass spectrometer (PTRMS, Ionicon), have been shown to be applicable for the measurement of NH3 (Norman et al., 2009; von Bobrutzki et al., 2010; Pfeifer et al., 2020). However, there is no record in the literature of CIMS being used for routine NH3 monitoring, presumably due to their high acquisition cost.
Since the most recent NH3 intercomparison studies (Schwab et al., 2007; Norman et al., 2009; von Bobrutzki et al., 2010), more LTMHTR instruments have come onto the market, advertised as applicable for routine NH3 measurements. The instruments have become more affordable and, in theory, no longer require specialist operators, resulting in reduced labour costs; some claim to provide quantitative measurements down into the parts-per-trillion (ppt) range. However, their capabilities under field conditions have still to be evaluated against established methods, as no standard protocols for setup, operations in the field, and routine calibrations of these instruments exist. Traceable NH3 gas standards are now available, but they have not been tested in field systems for undertaking routine in-field quality assurance and quality control.
This study reports a field intercomparison within a European Joint Research Project (EMRP), Metrology for NH3 in ambient air (MetNH3; Pogány et al., 2016). MetNH3 aimed to improve the comparability and reliability of ambient air NH3 measurements by achieving metrological traceability for NH3 measurements in the range of 0.5-500 ppb from primary certified reference material (CRM) and instrumental standards at the field level. In this study, 13 instruments (including commercially available technologies, research prototype instruments, and legacy instruments) were deployed and exposed to concentrations from background (< 10 ppb) to elevated (> 200 ppb). The instruments included an online ion chromatography system (MARGA, Metrohm-Applikon, NL), two wet chemistry continuous flow analysis systems (AiRRmonia, Mechatronics, NL), a photoacoustic spectrometer (NH3 monitor, LSE, NL), two mini differential optical absorption spectrometers (miniDOAS; NTB Interstate University of Applied Sciences Buchs, now part of Eastern Switzerland University of Applied Sciences, CH, and RIVM, NL), and seven spectrometers using cavity-enhanced techniques: a quantum cascade laser absorption spectrometer (QCLAS, Aerodyne, Inc., US), two Picarro G2103 analysers (Picarro, US), two economical NH3 analysers (Los Gatos Research, US), a Tiger-i 2000 (Tiger Optics, US), and a ProCeas ® gas analyser (AP2E, FR). In this study, we evaluate the precision of these instruments by comparing their data to the ensemble median and studying the between-instrument variability, including those operated on a common manifold, as recommended by von Bobrutzki et al. (2010). The importance of setup and time response is also considered through the use of duplicate instruments with different inlet designs.
Metrological methods developed under MetNH3 are also evaluated under field conditions as standards for determining the accuracy of the instrumentation deployed, as previous studies of the metrological applications have focused on laboratory settings (Pogány et al., 2016, 2021). We discuss recommendations for future LTMHTR ambient NH3 measurements, considering instrument capabilities and sampling setups to achieve high precision for use in routine monitoring of NH3 and also where further developments are still required in determining the accuracy of ambient NH3 measurements.
Figure 1. (a) Layout of field site, (b) wind rose of wind direction and wind speed for the period 23 to 29 August (generated using OpenAir package in R; Carslaw and Ropkins, 2012), and (c) photo of set-up instruments (photo credit: Mhairi Coyle, UKCEH).

Field site description
Instruments were deployed at an intensively managed grassland in southeastern Scotland, approximately 12 km south of Edinburgh, from 22 August to 2 September 2016. The grass is dominated by Lolium perenne (perennial ryegrass) over an area of approximately 5 ha, which is split into two fields. The instrumentation was positioned along the boundary between the two fields (Fig. 1), which are typically used for intensive grazing. For the campaign, the southern field, which lies in the dominant wind direction, was being grown for silage so that a uniform surface was available for the study. On 23 August, both fields were fertilised with approximately 35 kg N ha−1 of urea (pellets) to generate larger concentrations. This field site was previously used in 2008 in an NH3 intercomparison (von Bobrutzki et al., 2010), where an application of 35 kg N ha−1 urea resulted in NH3 concentrations of up to 120 ppb at the site.

Instrumentation
During the campaign, instrumentation was housed in either the tow van or the mobile laboratory; the exceptions were the open-path miniDOAS instruments, which were positioned on the scaffolding, and AiRRmonia #1, which is designed to be operated outside to minimise the inlet required. All participants were given the opportunity to sample from a common high-flow inlet, where applicable. The instruments housed in the mobile laboratory shared a high-flow inlet with a Pyrex manifold, with the exception of the MARGA. This manifold setup used 1/2 in. (ID) polyethylene (PE) tubing with a length of 3.5 m (sampling point to manifold) and an airflow of 50.08 L min−1 when all instruments were operational. The residence time from the sampling point to the manifold exit was calculated to be ∼ 1.62 s. All instruments were configured to sample at a height of approximately 1.7 m. Table 1 presents a summary of all instrumentation employed, including the sampling position, reporting temporal resolution, and manufacturer- and user-reported limit of detection. Table 2 summarises, where applicable, instrument inlet characteristics including length, flow rate, residence time, air velocity, and Reynolds number. The table also states whether the instrument has an inline filter for sampling.
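The inlet quantities listed for Table 2 (residence time, air velocity, and Reynolds number) follow from the tube geometry and flow rate. The sketch below is illustrative only and is not the calculation used by the authors: it assumes a straight cylindrical tube, a 1/2 in. (0.0127 m) inner diameter, and a kinematic viscosity of air of about 1.5e-5 m2 s−1. It covers only the 3.5 m tube, so it gives a shorter time (roughly 0.5 s) than the ∼ 1.62 s quoted above, which also includes the manifold.

```python
import math

def inlet_characteristics(inner_diameter_m, length_m, flow_l_min,
                          kinematic_viscosity_m2_s=1.5e-5):
    """Return (residence time [s], mean velocity [m/s], Reynolds number)
    for a straight cylindrical inlet tube.

    The default kinematic viscosity is an assumed value for air near 20 degC.
    """
    area_m2 = math.pi * (inner_diameter_m / 2.0) ** 2   # tube cross-section
    flow_m3_s = flow_l_min / 1000.0 / 60.0              # L/min -> m3/s
    velocity = flow_m3_s / area_m2                      # mean air velocity
    residence_time = length_m / velocity                # plug-flow transit time
    reynolds = velocity * inner_diameter_m / kinematic_viscosity_m2_s
    return residence_time, velocity, reynolds

# Shared inlet line from the text: 3.5 m long, 50.08 L/min.
# The 0.0127 m inner diameter assumes the "1/2 (ID)" in the text is in inches.
t_res, velocity, reynolds = inlet_characteristics(0.0127, 3.5, 50.08)
print(f"tube residence time: {t_res:.2f} s")
print(f"mean velocity: {velocity:.2f} m/s, Reynolds number: {reynolds:.0f}")
```

Under these assumptions the Reynolds number is above 4000, i.e. the flow in the shared line would be turbulent.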

Wet chemistry methods
During this campaign, three wet chemistry instruments, which convert gas-phase NH3 to aqueous ammonium (NH4+) for online analysis, were deployed: a Monitor for AeRosols and Gases in Ambient air (MARGA, Metrohm, NL) and two AiRRmonia (Mechatronics B.V., NL) instruments.

MARGA
The MARGA (Metrohm, NL) is an instrument used to measure several water-soluble gas-phase species (NH3, HCl, HNO3, HONO, and SO2) as well as their aerosol counterparts (NH4+, Cl−, NO3−, and SO42−) and base cations (Na+, K+, Ca2+, Mg2+) by means of online ion chromatography. The gas-phase species that are water soluble, including NH3, are sampled using a wet rotating annular denuder (WRD), through which air is drawn and the gas diffuses into a continuously exchanging liquid film on the surface. Water-soluble aerosols do not have sufficient time within the denuder to diffuse into the liquid film; instead, they are drawn into a steam jet aerosol collector (SJAC; Khlystov et al., 1995), where they undergo rapid growth in a steam chamber and are then mechanically separated out by a cyclone. The liquids from both the WRD and the SJAC are continuously drawn by syringes and sequentially analysed by ion chromatography. The cation chromatography was set up with a 500 µL loop; with this configuration, the detection limit for NH3 has previously been reported to be 0.05 µg m−3 (0.072 ppb at 25 °C and standard pressure). An instrument blank was undertaken, but it was not subtracted from the reported concentrations. A more detailed description of the instrument can be found in Makkonen et al. (2012). During the campaign, the instrument's inlet had a PM2.5 cyclone (URG Inc., USA). The inlet sampled at a rate of 16.7 L min−1. Due to limited space within the mobile laboratory, the positioning of the MARGA resulted in a longer inlet, with a length of 8.46 m, which is atypical compared to other studies (Makkonen et al., 2012; Twigg et al., 2015; Stieger et al., 2018).

AiRRmonia
The AiRRmonia (Mechatronics B.V., NL) is a wet chemistry instrument based on NH4+ analysis using a selective diffusion membrane and conductivity method (Erisman, 2001). Sampling is carried out by drawing air over a Teflon diffusion membrane, where gas-phase NH3 diffuses into ultra-pure water, which is in counterflow to the air sample. The sample is then mixed with a sodium hydroxide solution, which forces the aqueous NH4+ back to the gas phase so that diffusion can occur across a second Teflon membrane into ultra-pure water. The conductivity of the water and sample are measured to derive a temperature-corrected concentration of NH4+, from which the NH3 gas concentration can be derived. The sample is continuously drawn using syringe pumps, providing a constant liquid flow rate. The two AiRRmonia instruments were calibrated together at the start and end of the trial using aqueous NH4+ standards ranging from 0 to 500 ppb. The limit of detection has been reported as 0.08-0.1 µg m−3 (equal to 0.114-0.142 ppb at 25 °C and standard pressure), and the operational accuracy has been reported as 3 %-10 % (Erisman, 2001; Norman et al., 2009). In this study, there were differences in the reporting resolution and inlet setup between the two AiRRmonia instruments (refer to Tables 1 and 2 for further details).
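Detection limits in this section are quoted both as mass concentrations (µg m−3) and as mixing ratios (ppb). A minimal sketch of the ideal-gas conversion is given below. The function names are illustrative, and the default reference conditions (25 °C, 101 325 Pa) are an assumption; a slightly different convention shifts the result by a few percent, which may explain small differences from the values quoted in the text.

```python
R = 8.314462618      # molar gas constant, J mol-1 K-1
M_NH3 = 17.031       # molar mass of NH3, g mol-1

def molar_volume_l(temp_c=25.0, pressure_pa=101325.0):
    """Ideal-gas molar volume in L mol-1 at the given conditions."""
    return R * (temp_c + 273.15) / pressure_pa * 1000.0

def ug_m3_to_ppb(conc_ug_m3, temp_c=25.0, pressure_pa=101325.0,
                 molar_mass=M_NH3):
    """Convert a mass concentration (ug m-3) to a mixing ratio (ppb)."""
    return conc_ug_m3 * molar_volume_l(temp_c, pressure_pa) / molar_mass

def ppb_to_ug_m3(mixing_ratio_ppb, temp_c=25.0, pressure_pa=101325.0,
                 molar_mass=M_NH3):
    """Convert a mixing ratio (ppb) to a mass concentration (ug m-3)."""
    return mixing_ratio_ppb * molar_mass / molar_volume_l(temp_c, pressure_pa)

# Detection limits quoted in the text, converted under the assumed conditions
for limit in (0.05, 0.08, 0.1):
    print(f"{limit} ug m-3 -> {ug_m3_to_ppb(limit):.3f} ppb")
```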

ALPHA ® samplers
During the campaign, passive samplers (Adapted Low-cost Passive High Absorption diffusive samplers; ALPHA ®, UK) were placed along a transect in triplicate (1.7 m height) at three positions (at 3.5, 10.5, and 17.5 m measured from the scaffolding) to investigate the homogeneity between the miniDOAS instruments and the reflectors (refer to Fig. 1). The ALPHA sampler is a diffusion badge-type device with a citric acid-coated filter. The ALPHAs were exposed in triplicate, with a rain shelter, at each position for two periods: Period 1, from 22 August 2016, 16:35 GMT, to 29 August 2016, 16:29 GMT, and Period 2, from 29 August 2016, 16:29 GMT, to 5 September 2016 GMT. Chemical analysis was performed using an ammonia flow injection analyser (AMFIA; ECN, NL), which uses the same analytical principle as the AiRRmonia. An uptake rate of 0.00324 m3 h−1 was established by comparison with a local active sampler (UKCEH DELTA ®, UK). The preparation, deployment, and analysis followed the EN 17346 standard methodology (CEN, 2020). Further details of the theory of the passive sampler can be found in Tang et al. (2001); in addition, a recent exposure chamber study (Martin et al., 2019) showed an expanded uncertainty of < 11.6 % for concentrations ranging from 1 to 23 µg m−3.
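For a diffusive sampler, the time-averaged air concentration is commonly obtained by dividing the blank-corrected collected mass by the effective air volume sampled (uptake rate multiplied by exposure time). The sketch below applies this to the Period 1 exposure using the uptake rate quoted above. The collected and blank masses are hypothetical values chosen for illustration and are not campaign data.

```python
from datetime import datetime

UPTAKE_RATE_M3_H = 0.00324  # uptake rate quoted in the text, m3 h-1

def sampled_volume_m3(start, end, uptake_rate_m3_h=UPTAKE_RATE_M3_H):
    """Effective air volume sampled over the exposure period."""
    hours = (end - start).total_seconds() / 3600.0
    return uptake_rate_m3_h * hours

def passive_concentration(mass_exposed_ug, mass_blank_ug, start, end,
                          uptake_rate_m3_h=UPTAKE_RATE_M3_H):
    """Time-averaged NH3 concentration (ug m-3) from a diffusive sampler."""
    volume = sampled_volume_m3(start, end, uptake_rate_m3_h)
    return (mass_exposed_ug - mass_blank_ug) / volume

# Period 1 exposure times from the text (GMT)
period1_start = datetime(2016, 8, 22, 16, 35)
period1_end = datetime(2016, 8, 29, 16, 29)

# Hypothetical masses of NH3 on the exposed sampler and on the lab blank
volume = sampled_volume_m3(period1_start, period1_end)
conc = passive_concentration(3.0, 0.2, period1_start, period1_end)
print(f"effective volume: {volume:.3f} m3, concentration: {conc:.2f} ug m-3")
```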

Cavity ring-down spectroscopy (CRDS)
Cavity ring-down (CRD) instruments operate in the near-infrared region and use an optical cavity to increase the pathlength and thereby improve sensitivity in measuring the absorption. The beam is reflected multiple times off the mirrors within the cavity, resulting in a long pathlength. The laser is periodically turned off, and the light intensity then decays as it leaks out of the cavity through the mirrors. The "ring-down time" is the time it takes for the light to decay to 1/e of its original intensity. When an absorbing gas is added to the cavity, the ring-down time decreases, and the absorption coefficient, and hence the concentration, can be obtained from the measured ring-down times. During the campaign, there were three instruments that used this analytical technique.
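The relation between ring-down time and absorption can be illustrated with the standard textbook expression, alpha = (1/c)(1/tau − 1/tau0), where tau0 is the ring-down time of the empty cavity and tau the ring-down time with the absorber present. The sketch below is not the manufacturers' retrieval, which involves spectral fitting, and the ring-down times used are hypothetical.

```python
C_CM_S = 2.99792458e10  # speed of light, cm s-1

def absorption_coefficient(tau_s, tau0_s):
    """Absorption coefficient (cm-1) from the ring-down times with (tau_s)
    and without (tau0_s) the absorbing gas in the cavity."""
    return (1.0 / tau_s - 1.0 / tau0_s) / C_CM_S

def effective_path_km(tau0_s):
    """Distance travelled by light during one empty-cavity ring-down time."""
    return C_CM_S * tau0_s / 1.0e5

# Hypothetical ring-down times: 40.0 us empty cavity, 39.9 us with absorber
alpha = absorption_coefficient(39.9e-6, 40.0e-6)
path = effective_path_km(40.0e-6)
print(f"alpha = {alpha:.3e} cm-1, effective path = {path:.1f} km")
```

A shorter ring-down time relative to the empty cavity gives a positive absorption coefficient, and equal ring-down times give zero absorption.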

Picarro G2103 analyser (Picarro)
The Picarro G2103 analyser (Picarro, US) uses the CRDS technique. The gas temperature and pressure are kept constant in the cavity, at 45 °C and 140 Torr (corresponding to ∼ 187 hPa), respectively. The analyser uses a tuneable NIR diode laser as a light source, which is scanned over multiple, isolated data points inside the spectral window from 6548.50 to 6549.25 cm−1, which includes several NH3, H2O, and CO2 absorption lines. A cross-sensitivity to H2O and CO2, originating from the overlapping absorption lines of the three molecules, can be effectively eliminated by using empirical correction functions, as outlined in Martin et al. (2016). During the campaign, two of these instruments were operated (Picarro #1 and #2); however, this correction had not yet been released by the manufacturer at the time of the field study and thus was not implemented in the participating instruments. The reported detection limit from the manufacturer for this instrument is 0.09 ppb. In this study, Picarro #1 relied on an external pump with a sampling rate of 0.8 L min−1, whereas Picarro #2 utilised an external pump with a sampling rate of 1.35 L min−1 (refer to Tables 1 and 2). The Picarro #2 instrument was also used as an optical gas standard (OGS), as described in Sect. 2.3.1.

Tiger-i 2000 (Tiger Optics)
The Tiger-i 2000 (Tiger Optics, US) analyser also uses the CRDS technique. Like the Picarro G2103, it utilises a tuneable continuous-wave (CW) NIR diode laser. The instrument is configured to deliver concentration measurements of NH3 in the ppb regime, and with regular maintenance prescribed by the manufacturer, the system should not, in theory, require calibration. The manufacturer states that the Tiger-i is able to measure trace NH3 in ambient air without effects from varying humidity levels or from potentially interfering molecules due to the high specificity of the CRDS technology. During the campaign, the instrument was configured to have a detection limit of 10 ppb.

Quantum cascade laser absorption spectroscopy (QCLAS)
Mini-TILDAS ammonia monitor
The mini-TILDAS ammonia monitor is a quantum cascade laser absorption spectrometer (QCLAS) produced by Aerodyne Research Inc. (Billerica, USA) and is provided with an inertial inlet. As the instrument has already been reported in the literature (Whitehead et al., 2008; von Bobrutzki et al., 2010), it is referred to as the QCLAS in this study in order to limit confusion. Air was sampled at 13 L min−1 through a quartz, siloxyl-coated inertial inlet (removing particles > 300 nm from the air stream) followed by a 3 m perfluoroalkoxy (PFA) tube, both of which were heated to a temperature of 40 °C, based on the design of Roscioli et al. (2016), though no passivation was used. The QCLAS uses an astigmatic multi-pass absorption cell (AMAC) with a pathlength of 76 m (volume 0.5 L and 30 Torr), a continuous-wave mid-infrared quantum cascade laser operated at 966.814 cm−1 during this campaign (Roscioli et al., 2016), and a thermoelectrically cooled detector. Subtraction of the background spectrum was performed with dry research-grade nitrogen (BOC, Product 293679-L, 99.9995 % N2 min) for 30 s every 30 min. The manufacturer-reported detection limit for this instrument is 0.05 ppb. Although the instrument can be operated at 10 Hz for eddy covariance flux measurements, here it sampled at 1 Hz to increase sensitivity and to reduce data volume.

Off-axis integrated cavity output spectroscopy (OA-ICOS)
GLA331-EAA enhanced-performance economical NH3 analyser (LGR)
The GLA331-EAA enhanced-performance economical NH3 analyser (ABB-Los Gatos Research, US) uses the off-axis integrated cavity output spectroscopy (OA-ICOS) technique. The LGR instrument uses either an internal two-head or external three-head diaphragm pump (Table 1) to continuously draw air through a 1/4 in. PTFE inlet tube into the cavity for measurement; the cavity is pressure controlled to maintain a pressure of 100 Torr. The OA-ICOS cavity is a cylindrical two-mirror design with the gas inlet and outlet at either end; sensors for gas temperature and pressure are inserted via ports in the middle of the cavity. A fibre-coupled, continuously scanned ∼ 1.7 µm diode laser is directed into the gas inlet side of the cavity, and a wideband IR detector with collimating lens covers the mirror on the gas outlet side of the cavity. Although the cavity mirrors are highly reflective (> 99.99 %), a fraction of the light directed into the cavity will "leak" on each pass, allowing the collection of a resolved, continuously scanned absorption spectrum, which forms the basis of the measurement. The laser is pulsed to produce wavelength scans at several hundred Hz, which are then integrated to provide 1 s real-time data. The instrument is able to achieve a detection limit of 0.3 ppb at 100 s. During the campaign, two LGR instruments were used: LGR #1 used its internal pump (0.25 L min−1), and LGR #2 used an external pump (2.3 L min−1); in addition, the inlet for LGR #2 was heated to stabilise between 40 and 70 °C (refer to Tables 1 and 2 for further details).

Optical-feedback cavity-enhanced absorption spectroscopy (OF-CEAS)
ProCeas gas analyser (AP2E)
Optical-feedback cavity-enhanced absorption spectroscopy (OF-CEAS) uses the principle of absorption spectroscopy. In OF-CEAS, the concentration is based on a scanned-wavelength direct measurement of absorption as a function of integrated transmitted laser intensity. For a detailed description of OF-CEAS, refer to Morville et al. (2005). The ProCeas ® gas analyser produced by AP2E, Aix-en-Provence, France, utilises the OF-CEAS technique with a high-finesse, V-shaped optical cavity made with three highly reflective mirrors and a fibred, distributed-feedback diode laser operating in the near-infrared at a wavelength of ∼ 1.53 µm. It had a reported detection limit of 45 ppt (3σ, 300 s). During the campaign, the instrument was operated with an external pump only (refer to Tables 1 and 2 for further details).

Photoacoustic spectroscopy
NH3-1700 analyser (LSE)
During this campaign, only one instrument used photoacoustic spectroscopy, which takes advantage of the development of stable quantum cascade lasers in the IR; however, instead of measuring the absorption of light, it measures an acoustic signal. The signal is generated as target molecules absorb IR light and become excited, resulting in a pressure change. The NH3-1700 analyser (LSE) by LSE Monitors, the Netherlands, uses this method by modulating the laser at an acoustic frequency of 1600 Hz; the resultant pressure modulation is detected by a microphone. By scanning the laser over a specific spectral range, the gas of interest can be determined from the recorded microphone signal. It has a detection limit of 1 ppb.

Mini differential optical absorption spectrometer (miniDOAS)
Differential optical absorption spectroscopy, or DOAS, retrieves the concentration of a trace gas from its characteristic fingerprint in an optical spectrum in the ultraviolet spectral range (refer to Platt and Stutz, 2008, for a thorough discussion of this method). The two systems taking part in this campaign were miniDOAS #1, developed by the Bern University of Applied Sciences, Switzerland, in collaboration with Neftel Research Expertise, and miniDOAS #2, developed by the Dutch National Institute for Public Health and the Environment (RIVM), the Netherlands. The systems were of a similar setup. Each system used a UV lamp to generate a light beam. The beam is reflected back to the instrument by a retroreflector placed at a distance of 22 m, creating an optical path of 44 m. The light is collected by a telescope and measured with a low-cost compact spectrograph. An adjustable mirror corrects for small changes in the alignment of the setup. Measured spectra are averaged over a period of typically 1 min. Whilst the closed-path instruments described above work at low pressure and reduce line broadening so that they can distinguish different absorption lines for different compounds, the open-path nature of the DOAS necessitates that the NH3 concentration be retrieved from an averaged spectrum along with concentrations of SO2 and NO, which also have optical absorptions in the wavelength range used (205-230 nm), using the DOAS inversion algorithm.
Both systems were designed and built at their respective institutes. They are described in detail elsewhere: miniDOAS #1 in Sintermann et al. (2016), and miniDOAS #2 in Volten et al. (2012) and Berkhout et al. (2017). The most important differences between the systems were as follows:
- A deuterium lamp is used in miniDOAS #1, and miniDOAS #2 uses a xenon arc lamp. Because a xenon lamp emits much visible light, miniDOAS #2 uses an interference filter to block this part of the spectrum; miniDOAS #1 does not require a filter.
- The spectrograph in miniDOAS #1 is Peltier cooled; the one in miniDOAS #2 is not.
- Although both instruments are housed in temperature-controlled boxes, the temperature of miniDOAS #1 is better stabilised than that of miniDOAS #2.
Calibration of the systems took place in the laboratory (Sintermann et al., 2016; Berkhout et al., 2017) before deployment at the field site. The lamp reference spectra used were obtained from the 61 spectra with the lowest NH3 concentrations measured during the campaign. The reference spectra are the baseline; the DOAS concentrations are calculated relative to this baseline, so they can also be negative. During this campaign, the instruments were placed side by side on a scaffolding (see Fig. 1). Their optical paths ran at 1.78 m above the ground. Because the optical paths are in the free atmosphere, no delay or interference from inlets, filters, or surfaces can occur. This means the measurement is not affected by temporal averaging beyond the integration time, but note that the concentration retrieved by a DOAS is an average over the entire optical path. This is to be taken into account when comparing results to instruments that sample air from a single inlet point. Since this campaign, significant improvements have been made to miniDOAS #2, especially in the handling of the spectrograph dark current and in stabilising the optical alignment (Swart et al., 2022).

Metrologically developed components
As part of the study, metrological methods developed under MetNH3 were evaluated under field conditions. They were used to estimate the accuracy of LTMHTR instruments but not to calibrate any instruments.

Optical gas standard (OGS)
An optical gas standard (OGS) is an instrumental transfer standard concept that does not require initial or repetitive calibration using calibration gases. Instead, an OGS determines absolute concentrations based on first principles, i.e. a full physical model of the absorption process. Here, the measured absorption in a sufficiently spectrally isolated rovibrational transition of a small molecule like NH3 is described by Beer-Lambert's law, an analytical absorption line shape model, and by molecular spectral parameters like the absorption line strength. The OGS concept is explained in Nwaboh et al. (2017, 2021) and Qu et al. (2021). Buchholz et al. (2014) rigorously validated the calibration-free property of an OGS for the case of H2O by cross-comparing the H2O-OGS named SEALDH with PTB's primary gas humidity standard. An OGS can thus serve as a field transfer standard and be used to calibrate and validate other instruments. In this study, the Picarro #2 CRDS instrument operated by PTB was converted into an OGS by extracting and refitting the raw CRDS absorption spectra. The OGS essentially extracts and re-evaluates the Picarro raw spectra; hence, it uses the same hardware but a completely different evaluation and different spectral reference. To this end, it was fully metrologically characterised at the German national metrology institute, Physikalisch-Technische Bundesanstalt (PTB), i.e. the accuracy of the temperature sensors, the pressure sensors, and the spectral scale (wavenumber) was verified by comparison to SI standards. Furthermore, a custom spectral fitting algorithm using accurately measured spectral line parameters (Pogány et al., 2021) was developed and employed by PTB. An expanded uncertainty of 1 % could be achieved for the line intensity of the two strongest NH3 lines, which allowed the total uncertainty of the retrieved NH3 concentration to be decreased to 3 % (k = 2, 95 % confidence interval).
Further important contributors to this uncertainty are spectral line broadening coefficients or the choice of the fitted spectral model. Due to this full physical model, the need for empirical, calibration-based instrument corrections (e.g. to compensate for spectral interferences) was eliminated. As a result, traceable and absolute NH3 concentrations were obtained.
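The first-principles idea behind an OGS can be sketched with the integrated form of the Beer-Lambert law for an isolated line: the amount fraction is x = A / (S L n), where A is the integrated absorbance (line area), S the line strength, L the optical path, and n = p / (kB T) the total number density. The code below is a simplified illustration only and is not the PTB fitting algorithm. The line strength and effective path are hypothetical values, and only the cavity temperature and pressure are taken from the Picarro description above.

```python
K_B = 1.380649e-23  # Boltzmann constant, J K-1

def number_density_cm3(pressure_pa, temp_k):
    """Total gas number density (molecules cm-3) from the ideal gas law."""
    return pressure_pa / (K_B * temp_k) * 1.0e-6

def amount_fraction(line_area_cm1, line_strength_cm, path_cm,
                    pressure_pa, temp_k):
    """Amount fraction (mol/mol) from the integrated absorbance of one line.

    line_area_cm1:    integrated absorbance over the line, cm-1
    line_strength_cm: line strength, cm-1 / (molecule cm-2)
    """
    n_total = number_density_cm3(pressure_pa, temp_k)
    return line_area_cm1 / (line_strength_cm * path_cm * n_total)

# Cavity conditions from the text: 45 degC and 140 Torr
temp_k = 45.0 + 273.15
pressure_pa = 140.0 * 133.322

# Hypothetical values for illustration only (not measured line data)
line_strength = 2.0e-21   # cm-1 / (molecule cm-2)
path_cm = 1.0e6           # assumed 10 km effective optical path

# Forward-model the line area for 10 ppb, then retrieve it again
true_x = 10.0e-9
line_area = (true_x * line_strength * path_cm
             * number_density_cm3(pressure_pa, temp_k))
retrieved_x = amount_fraction(line_area, line_strength, path_cm,
                              pressure_pa, temp_k)
print(f"retrieved amount fraction: {retrieved_x * 1e9:.2f} ppb")
```

In this form the retrieved amount fraction scales inversely with the assumed line strength, which is why the uncertainty of the line intensity feeds directly into the uncertainty of the concentration.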

Permeation calibration system
Ammonia calibration in the field is difficult due to the adsorptive nature of NH 3 , which results in losses to the inlet and surfaces of the calibrator, tubing, and instruments. These losses are associated with long stabilisation times to achieve equilibrium and with uncertainty in the absolute concentrations (Vaittinen et al., 2014, 2018). A metrologically traceable source was developed under laboratory conditions in the framework of the EMRP MetNH3 project. The campaign was a means to determine the applicability of the system in the field, to determine the accuracy of measurement instrumentation under field conditions, and thus to allow for comparability of the results. The traceable source was a dynamic calibration system known as ReGaS (reactive gas standard; Pascale et al., 2017), developed and constructed by the Federal Institute of Metrology (METAS), Switzerland. For this campaign, only the ReGaS1 was applied in the field. The ReGaS1 reference gas generator was developed to dynamically generate SI-traceable reference gas mixtures with very low levels of uncertainty (< 3 %) in the 0.5-500 nmol mol −1 range (0.5-500 ppb). As the NH 3 source, it employs a permeation device in a temperature-controlled oven, followed by two dynamic dilution steps with mass flow controllers to obtain the required amount fractions using zero-grade synthetic air (SA) (158283-L-C, BOC). Additionally, a commercially available gas purification cartridge (Microtorr, model MC 400-203V SAES Getters, Pure Gas Inc.) was used for further synthetic air purification. According to the product specifications, the outflow of purified SA should contain less than 100 pmol mol −1 H 2 O and less than 100 pmol mol −1 CO 2 . The content of acids, bases, organics, and refractory compounds in the outflow should not exceed 10 pmol mol −1 . The Microtorr purification system is based on inorganic sorbent materials and operates at normal ambient temperature (no heating or cooling required). 
The connectors of the cartridge are made of stainless steel. ReGaS1 is transportable to allow for in situ calibration of NH 3 instrumentation. A SilcoNert2000 coating has been applied to all interior surfaces of ReGaS1 in contact with NH 3 in order to reduce adsorption effects and thus stabilisation times. During the calibration, the ReGaS1 was connected to a Teflon six-port manifold using 1/4 in. PFA tubing, which was connected to a three-way valve and T-piece that had been coated in SilcoNert2000.
The instruments that were evaluated against the ReGaS (i.e. LSE, Picarro #2, LGR #1, LGR #2, and Tiger Optics) were transferred from the Pyrex manifold to the Teflon manifold for this purpose. Due to the maximum flow rate of the ReGaS1 (5 L min −1 ), the LGR #2 did not use its external pump but relied on the internal pump of the instrument and so had a flow rate of 0.25 L min −1 , which equates to a residence time of 6.83 s for the inlet. The system was set to the following concentrations in sequence for 31 min each: 0, 9.98, 24.39, 39.71, 2.95, and 1.02 ppb. Unfortunately, the data from two instruments had to be excluded from the analysis: the LGR #1 concentrations remained low, even at elevated concentrations, indicative of a fault, and the Tiger Optics reported 0 ppb, as it could not detect concentrations below the 10 ppb detection limit. As a result, in this study, only the data from the OGS, LSE, and LGR #2 are evaluated against the ReGaS.

Gas cylinders
Stable traceable primary standard gas mixtures (PSMs) of NH 3 were developed by the UK's national metrology institute, the National Physical Laboratory (NPL), in order to improve the current state-of-the-art metrological traceability and validation of NH 3 instrumentation. The PSM employed in this work was prepared gravimetrically using the method outlined in the guide (ISO, 2001) using pure ammonia (Air Products, VLSI, 99.999 % purity) and nitrogen (Air Products, BIP+, 99.99995 % purity). Full details of the preparation of the cylinders can be found in Martin et al. (2016). During this study, PSM cylinder number 1825R2, which contained 99.78 ppm NH 3 in N 2 , was used to calibrate the miniDOAS #2 instrument.

Data analysis
The data quality assurance (QA) procedures, where applicable, are outlined in the method section for each instrument. For the AiRRmonia, MARGA, QCLAS, and the miniDOAS instruments, the zeros and/or calibration standards used are described in Sect. 2.2.1, 2.2.3, and 2.2.7, respectively. They were applied to the datasets prior to undertaking the data analysis presented in Sect. 3. No other LTMHTR instrument had any zero or calibration applied, as the instrument manufacturers described the methods as "calibration free" at the time of the study, so they were only operated with the manufacturer factory calibrations. Data which did not meet the QA criteria were not included in the analysis; further details are found in Sect. 3.2. Measurements provided in units of µg m −3 were converted to parts per billion (ppb) using the temperature and pressure measured at the Easter Bush site. To facilitate direct comparisons, data were averaged to 1 h, unless stated otherwise, to match the reporting time of the slowest instrument. The data analysis assumed that instruments "received" or "saw" the same concentrations in the field. Efforts were made to remove likely periods of inhomogeneity during the data analysis (refer to Sect. 3.5); however, instruments which did not share a common inlet will not have received exactly the same concentrations at all times (Table 1). This is an additional consideration specifically for the miniDOAS instruments, which measure a line-averaged concentration (22 m) rather than sampling at a point. Though instruments were deployed for a longer period, only the period of 23 to 29 August is studied here, unless otherwise stated, as not all instruments were operational at the start and end of the campaign.
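The unit conversion mentioned above can be illustrated with a minimal sketch based on the ideal gas law. The function name, the argument units (°C and hPa), and the constants are assumptions chosen for illustration; the conversion code actually used in the study is not given in the text.

```python
# Minimal sketch (assumed helper, not the study's code): convert an NH3 mass
# concentration in µg m-3 to a mixing ratio in ppb (nmol mol-1) using the
# ideal gas law with the locally measured temperature and pressure.

R = 8.314462618   # molar gas constant, J mol-1 K-1
M_NH3 = 17.031    # molar mass of NH3, g mol-1


def ugm3_to_ppb(conc_ug_m3, temp_c, pressure_hpa):
    """Return the NH3 mixing ratio (ppb) for a mass concentration (µg m-3)."""
    temp_k = temp_c + 273.15
    pressure_pa = pressure_hpa * 100.0
    molar_volume = R * temp_k / pressure_pa   # m3 of air per mol
    umol_per_m3 = conc_ug_m3 / M_NH3          # µmol NH3 per m3 of air
    ppm = umol_per_m3 * molar_volume          # µmol mol-1
    return ppm * 1.0e3                        # nmol mol-1 = ppb


if __name__ == "__main__":
    # Illustrative input only: 1 µg m-3 at the campaign mean temperature
    # (14.3 °C) and an assumed pressure of 990 hPa.
    print(round(ugm3_to_ppb(1.0, 14.3, 990.0), 3))
```

Because the molar volume depends on both temperature and pressure, the same mass concentration corresponds to a slightly different mixing ratio from hour to hour, which is why site measurements rather than standard conditions are used.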

Meteorology and background aerosol composition during the campaign
Figure 2 summarises the meteorology (wind speed, wind direction, temperature, and relative humidity) for the period studied. The cumulative rainfall during the campaign was atypical for the site: 2.8 mm compared to an average of 98 mm for the month of August (2005-2014). Though the site was unusually dry, the average temperature of 14.3 °C was typical (climatological average 14.03 °C, 2005-2014) for August in southeastern Scotland, with temperatures ranging from 7.8 to 20.6 °C. As expected for this time of year, the predominant wind direction was from the southwest. As well as reporting gaseous NH 3 , the MARGA also reported the PM 2.5 water-soluble inorganic species. Prior to fertilisation on 23 August, PM 2.5 was dominated by sea salt (NaCl), but during the interim, it was dominated by secondary inorganic aerosols, which coincided with a drop in wind speed and a reduction in the relative humidity on 24 August.

Overview of the NH 3 measurements during the campaign
The time series of the measurements by the instruments at their reporting temporal resolution (1 s to 1 h), unfiltered, are summarised in Fig. 3 (Table S1 in the Supplement). Instruments display similar temporal features for NH 3 concentrations over the duration of the study, though there are differences in their structures due to differences in the reporting and measurement resolution (refer to brackets in legend of Fig. 3). The maximum NH 3 concentration was observed on the evening of 24 August following fertilisation on 23 August (Fig. 3). It is likely that the emission of NH 3 was suppressed following fertilisation due to intermittent precipitation during 23 August (Fig. 2) and that the peak in NH 3 concentrations observed the following evening was due to stable (Fig. S2) and dry conditions (Fig. 2). The LSE instrument reported the highest concentration, with a maximum of 282 ppb (1 min average). The concentrations reported by all the instruments following fertilisation were large compared to the von Bobrutzki et al. (2010) study, which reported maximum concentrations of 120 ppb, though the same amount of urea was applied to the same field. The difference in meteorological conditions during the von Bobrutzki et al. (2010) study is likely to have impacted NH 3 emissions. Specifically, the site received a high volume of rain during that study, resulting in the formation of a pond in the north field, whereas the present study was relatively dry (Fig. 2). Many temporal features can be picked out as the concentrations change throughout the field campaign (Fig. 3); to briefly illustrate instrument response, however, the response on 25 August at 04:00 is discussed for each panel in Fig. 3. Figure 3a presents the time series for each of those instruments that used their own inlets during the campaign. 
It was observed that the QCLAS had a faster decrease in concentration at 04:00 GMT on 25 August compared with the other instruments that used their own inlets, as the wind direction changed to a northeasterly direction (Fig. 2). The delay in the time response of the other instruments is likely to be due to the instrument setup, with long inlets and low airflow rates (Table 2). The delay in the MARGA is also likely to be due to both the reporting interval (1 h average) and the atypical inlet length. Instruments on the manifold (Fig. 3b) did not show the delayed response following the change in wind direction. The exception to this is the LGR #1 (but not LGR #2). LGR #1 was reliant on its internal pump to subsample from the manifold; as a result, it had a lower sample flow rate, resulting in a slower response time (Table 1). The Tiger Optics instrument was set up in a configuration with a 10 ppb limit of detection, and following post-campaign data analysis, only the period of 23 August, 20:00, to 26 August, 11:00, was valid and is presented. The LGR #1 reported 0 ppb NH 3 initially, and a laser fault was identified by the operators. The fault was corrected remotely by the manufacturer on 24 August at 10:00 GMT (refer to the arrow on Fig. 3). Following this, there is an apparent improvement in the agreement of LGR #1 with the other instrumentation on the manifold. Therefore, for the remainder of this study, only data after 24 August, 10:00, are used for the LGR #1. Instruments in the campaign situated on scaffolding (Fig. 3c) were either open path or had a very short inlet. The AiRRmonia #1, though reporting at the same temporal resolution as the miniDOAS instruments (1 min averages), did not capture the same temporal features, demonstrating a slower instrument response time. This is not surprising, since von Bobrutzki et al. (2010) previously reported that the AiRRmonia had a time response of 14 ± 4 min.
As described in Sect. 2.2.1, the ALPHA samplers were deployed along the miniDOAS path length to evaluate the homogeneity of the NH 3 during the campaign. Both miniDOAS #1 and miniDOAS #2 compared well to the ALPHA samplers (Fig. 4), reporting 11.3 and 10.9 ppb, respectively, compared to 10.9 ppb from the ALPHA samplers during Period 1. This is within the method error. A summary of the averages from each instrument can be found in Table S2. It is worth noting that, during Period 1, which is the main focus of this study, though there were large temporal variations in concentrations (Fig. 3), the transect of ALPHA samplers reported similar concentrations, suggesting that the NH 3 concentrations were relatively homogeneous spatially. In Period 2, however, the miniDOAS #1 appeared to report a higher average concentration over the whole period due to a lower data capture of 89 % for this period.
Across all instruments, though the temporal pattern was comparable, there are large variations in the reported magnitude of concentrations measured, even when the data are averaged to an hour (Fig. 5). For example, on the morning of 25 August, 02:00 GMT, when NH 3 was elevated, the AiRRmonia #2 reported the lowest concentration of 57.2 ppb, whereas AiRRmonia #1 reported a concentration of 66.8 ppb. The highest concentration reported at this point was by the LSE with 88.5 ppb. These extreme values are a function of the averaging time and the response time of the instrument. A faster instrument naturally shows larger extreme values, and the NH 3 adsorbed to the inlet walls has the potential to desorb during subsequent hours. On the longer-term average, the instruments that covered the period 23 to 29 August with high data coverage (≥ 98 %) agreed within ±15 % of the overall mean (Table S1).

Precision across the suite of instrumentation
To assess the precision across the suite of instruments during the campaign, the coefficient of variation (CV, Eq. 1) was studied. As guidance, the US EPA accepts a CV of up to 10 % for PM sensors and up to 15 % for NO 2 monitors (EPA, 2012; Sousan et al., 2016; Crilley et al., 2019); an increase beyond this range suggests a worsening of the reported precision. There is currently no equivalent guide for NH 3 . For the purposes of this study, a CV limit of 20 % was set. The CV (%) is calculated using the following equation:

$$ \mathrm{CV} = \frac{\sigma}{\mu} \times 100 \qquad (1) $$

where σ is the standard deviation and µ is the mean of the hourly averages reported by the reporting instruments in each period. Figure 6 compares the CV with the hourly ensemble median concentration during the campaign. It is observed that the ensemble CV varied between 10 % and 50 %. On 23 August, the CV was high (> 20 %), which matched a period of low NH 3 concentrations (< 10 ppb). The CV then suddenly dropped (< 20 %) at around midnight on 24 August, coinciding with an increase in the average concentration (> 10 ppb). At 20:00 on 24 August, there is a spike in the CV, as there is a rapid increase in the NH 3 concentration, but then it drops again at 22:00, as the concentration remains elevated. It is postulated that the loss in agreement between instruments during this period is due to the different response times to a rapid change in concentration. The same reduced precision is also observed when the NH 3 concentrations decrease on 25 August, which again is likely due to the differing response times of the instrumentation.
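The ensemble CV described above can be sketched for a single hour as follows. This is an illustration only: the function name is hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the text does not state which estimator was used. The input values are the three hourly concentrations quoted above for 25 August, 02:00 GMT.

```python
# Minimal sketch (assumed helper): coefficient of variation across the
# instruments reporting in one hour, in percent.
from statistics import mean, stdev


def coefficient_of_variation(hourly_values):
    """Return CV (%) = 100 * sigma / mu, or None if it cannot be computed."""
    values = [v for v in hourly_values if v is not None]  # drop missing data
    if len(values) < 2:
        return None          # a standard deviation needs at least two values
    mu = mean(values)
    if mu == 0:
        return None          # CV is undefined for a zero mean
    sigma = stdev(values)    # assumption: sample standard deviation
    return 100.0 * sigma / mu


if __name__ == "__main__":
    # AiRRmonia #2, AiRRmonia #1, and LSE on 25 August, 02:00 GMT (ppb);
    # a subset of the ensemble, used here purely for illustration.
    cv = coefficient_of_variation([57.2, 66.8, 88.5])
    print(f"CV = {cv:.1f} % (study limit: 20 %)")
```

Applying the same function to every hour of the campaign would give a time series comparable in form to the one summarised in Fig. 6.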

Effect of inlet setup on response time
There were a number of inlets used during the campaign, and this is hypothesised to have affected the concentrations received by the instruments, which is especially apparent at lower concentrations in the CV of the suite of instruments.
To study the impact of the inlet design on time response, the pairs of collocated instruments of the same model (the Picarros and the AiRRmonias) were studied, as the operational differences within each pair were the instrument and inlet setup. The LGR comparison was excluded due to the poor performance of the LGR #1 (refer to Sect. 3.2 for further details). The time response of each instrument was calculated relative to the response of the miniDOAS #1, as it does not have an inlet and is therefore assumed to respond immediately to changes in concentration. It is assumed that any differences in time response are due to adsorption or desorption effects. To determine the response of the instruments, the miniDOAS #1 data were smoothed using a running mean of the measured concentrations (c(t)), adjusting the smoothing factor (f) until the delayed smoothed concentration (c′(t)) matched the data from the slower instrument in each case, following the method of von Bobrutzki et al. (2010; Eq. 5).
The e-folding time (τ 1/e ) was then calculated by τ 1/e = 1/f . Figure 7a compares the results of the AiRRmonia #1 and #2 and Picarro #1 and #2 to those of the miniDOAS #1 under elevated concentrations. It is clear that the AiRRmonia #2 has a slower response than the AiRRmonia #1, with a 95 % response time of 18.4 min for the AiRRmonia #1 compared to 372 min for the AiRRmonia #2, demonstrating that the presence of an inlet with a low flow rate (1 L min −1 ) leads to a loss of the NH 3 temporal features. This is not, however, the only controlling factor for the response of an instrument, as the Picarro #1 inlet is calculated to have a residence time of 1.3 s for air compared to Picarro #2, which has a residence time of 3.6 s (including the manifold inlet and manifold); nonetheless, it still appears that the Picarro #2 performs better. The surface area / volume ratio for the Picarro #1 is 2 times that of Picarro #2 (Table 2), and it is postulated that this results in more molecules interacting with the inlet walls, leading to the observed smoothed feature. It was discounted that turbulent flow was a controlling factor in the response time, as it would be expected that wall interactions would increase under a turbulent regime, leading to greater losses (Table 2). In contrast, under ambient conditions, the response of the instruments is slower (Fig. 7b). The AiRRmonia #1 e-folding time increased from 6.15 to 32.8 min; similarly, that of the Picarro #2 increased from 4.48 to 49.5 min. This is to be expected, as the losses of NH 3 due to the adsorption and desorption effects of both the inlet and instrument are more apparent at low concentrations, where any losses make a greater contribution to the absolute concentrations.
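The response-time procedure above can be sketched as follows. The exact smoothing equation is not reproduced in the text, so the recursion used here, c′(t) = f c(t) + (1 − f) c′(t − 1), is an assumed exponential running mean; the function names and the grid search over f are likewise illustrative. The conversion from e-folding time to 95 % response time assumes a first-order (exponential) response, which is consistent with the values quoted above for the AiRRmonia #1 (6.15 min and 18.4 min).

```python
# Minimal sketch (assumptions labelled above): estimate the smoothing factor
# that makes a fast reference series match a slower instrument, then derive
# the e-folding time and the 95 % response time.
import math


def smooth(reference, f):
    """Exponential running mean of the reference series (assumed form)."""
    smoothed = []
    previous = reference[0]
    for value in reference:
        previous = f * value + (1.0 - f) * previous
        smoothed.append(previous)
    return smoothed


def fit_smoothing_factor(reference, slow, candidates):
    """Return the candidate f with the smallest sum of squared differences."""
    def sse(f):
        return sum((a - b) ** 2 for a, b in zip(smooth(reference, f), slow))
    return min(candidates, key=sse)


def e_folding_time(f, time_step_min=1.0):
    """tau_1/e = 1 / f, expressed in minutes for the given time step."""
    return time_step_min / f


def response_time_95(tau_min):
    """Time to reach 95 % of a step change for a first-order response."""
    return -math.log(0.05) * tau_min


if __name__ == "__main__":
    # Synthetic step change (ppb), not campaign data.
    reference = [5.0] * 10 + [50.0] * 60
    slow = smooth(reference, 0.2)                 # pretend slow instrument
    grid = [i / 100 for i in range(1, 101)]
    f_best = fit_smoothing_factor(reference, slow, grid)
    tau = e_folding_time(f_best)
    print(f_best, tau, round(response_time_95(tau), 1))
```

A grid search is used only to keep the sketch dependency-free; any one-dimensional optimiser would serve the same purpose.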

Performance of instrumentation at ambient conditions (< 10 ppb)
Though there is evidence of agreement across the suite of instruments at high concentrations, in order to understand the varying performance across the instruments, the hourly ensemble median (excluding the Tiger Optics and LGR #1 due to operational issues; refer to Sect. 3.2 for details) was split into NH 3 < 10 ppb and NH 3 ≥ 10 ppb, so that a direct comparison could be made to the von Bobrutzki et al. (2010) study and so that performance at lower concentrations could be assessed without the results being skewed by individual large concentration points. The intercomparison is presented for the full dataset in Fig. S3 in the Supplement. In the absence of a "perfect" reference instrument, Fig. 8 presents the summary of the instrument comparison to the ensemble median for NH 3 < 10 ppb. It is acknowledged that the ensemble median could be biased if the majority of instruments are biased. The Tiger Optics data were excluded from this analysis, as the instrument used during the comparison had a limit of detection of ∼ 10 ppb. For all the data (both the open and closed circles), the majority of instruments have a spread of points around the one-to-one line. Instruments which reported an R 2 < 0.6 compared to the ensemble median were the miniDOAS #2, LGR #1, Picarro #1, AP2E, and AiRRmonia #2, whereas the instruments that reported R 2 > 0.9 were the LSE, AiRRmonia #1, Picarro #2, and LGR #2, three of which sampled from the common manifold. To investigate if the differences were due to periods of inhomogeneity in NH 3 concentrations at different sampling locations, caused by low wind speed and atmospheric stability conditions, the data were filtered to exclude periods when the wind speed was < 0.8 m s −1 and periods of stable or unstable conditions, i.e. when the stability parameter (z − d)/L fell outside the range −0.1 to 0.1 (Fig. 8). 
There was an improvement in the performance of most instruments, with reported R 2 ranging from 0.71 to 0.98, with the exception of the LGR #1 and the miniDOAS #2, reporting the lowest R 2 values with 0.27 and 0.55, respectively, suggesting that these instruments randomly deviated from the ensemble median. It is assumed that the difference between the miniDOAS instruments is due to the stability of instrumentation in regulating temperature; however, it is beyond the scope here to interrogate each instrument's temperature dependence. It is noted that the slopes and intercepts changed when applying a meteorological filter. In general, instruments with faster response found their slopes reduced, whereas the reverse was observed for the instruments with the slower time response. With the exception of the AP2E, slopes after filtering were closer to unity, and with the additional exception of Picarro #2, the intercepts decreased. For the remainder of this discussion, however, we will only discuss the filtered data.
Most instruments (Fig. 8) had a slope of less than 1, with the exceptions of the AP2E, Picarro #1, and the LSE. The largest slope reported was from the AP2E (1.47), and it had the largest negative offset of −1.39 ppb. The y axis offset is a result of uncertainties in the linear fit and contamination or losses of NH 3 in the inlet or the instrument. Interpretation of the intercept here is limited to hypothesising about the relationship between the predicted NH 3 (from the ensemble median) and the concentration response of the instruments. Contamination, inlet losses, limits of detection, and non-linear instrument response are the major issues which will lead to linear slopes with significant offsets. Negative intercepts are often indicative of losses of NH 3 , either to the inlet or the instrument; however, the large slope and high scatter (R 2 = 0.76) would also contribute to the offset value. The instrument with the smallest offset is the QCLAS, which had an offset of 0.05 ppb but had a slope of 0.82 compared to the ensemble median. The largest positive offsets are seen in the Picarro #1 (with an offset of 1.05 ppb), miniDOAS #1 (0.74 ppb), LGR #1 (2.11 ppb), LGR #2 (0.65 ppb), and the AiRRmonia #2 (0.75 ppb). Working with the assumption that, within the uncertainty of the regression, the positive offsets are real, the positive offsets in this case could be attributed to contamination in the inlet or, in the case of the CRDS instruments, in the inline filters. For the LGR #2, another possible explanation is that heating the sample line may have resulted in a positive offset due to the volatilisation of NH 4 NO 3 . The (large) positive offset found for the miniDOAS #1 cannot be due to contamination, since it is an open-path instrument. The two miniDOAS systems reported different offsets below 10 ppb, as the systems use different approaches to derive the concentrations. 
The differences between the two instruments can include variations in the spectral fits, leading to biases for NH 3 or another interfering gas (e.g. SO 2 , NO), uncertainties in the spectral lines used, or technical issues including alignment, dark current, or imperfections in the spectral response of the spectrograph. Identifying the source of the differences between the miniDOAS systems is challenging. A similar positive offset was observed by Berkhout et al. (2017), who compared the miniDOAS instrument to an AMOR wet chemistry analyser. It is suggested that the miniDOAS #2 was sensitive to ambient temperature, as its spectrometer, unlike that of the miniDOAS #1, was not temperature controlled.
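The comparison against the ensemble median described in this section can be sketched in three steps: compute the hourly ensemble median, apply the meteorological filter, and regress each instrument against the median. The sketch below is an illustration under stated assumptions: the function names and data layout are hypothetical, the stability filter is read as retaining near-neutral conditions (|(z − d)/L| ≤ 0.1), and ordinary least squares is assumed for the fit, as the fitting method for Figs. 8 and 9 is not stated in the text.

```python
# Minimal sketch (assumed helpers and data layout, not the study's code).
from statistics import mean, median


def ensemble_median(hourly):
    """hourly: dict of instrument name -> list of hourly values (None = gap)."""
    n_hours = len(next(iter(hourly.values())))
    result = []
    for i in range(n_hours):
        values = [series[i] for series in hourly.values() if series[i] is not None]
        result.append(median(values) if values else None)
    return result


def keep_mask(wind_speed, stability, min_wind=0.8, stability_limit=0.1):
    """True where the hour passes the wind speed and stability filter."""
    return [ws >= min_wind and abs(zeta) <= stability_limit
            for ws, zeta in zip(wind_speed, stability)]


def linear_fit(x, y):
    """Ordinary least squares: return (slope, intercept, R squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    # Synthetic values for illustration only (ppb, m s-1, dimensionless).
    hourly = {"A": [2.0, 4.0, 6.0, 8.0], "B": [2.5, 4.5, 6.5, 8.5],
              "C": [1.8, 3.9, None, 7.0]}
    med = ensemble_median(hourly)
    mask = keep_mask([1.2, 0.5, 2.0, 3.1], [0.02, 0.01, -0.05, 0.08])
    x = [m for m, k in zip(med, mask) if k]
    y = [v for v, k in zip(hourly["B"], mask) if k]
    print(linear_fit(x, y))
```

The split into < 10 ppb and ≥ 10 ppb subsets would be one additional mask on the ensemble median before fitting.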

Performance of instrumentation at elevated ambient NH 3 concentrations (≥ 10 ppb)
Under elevated concentrations of NH 3 ≥ 10 ppb, filtered for wind speed and atmospheric stability, all instruments demonstrated improved agreement with the ensemble median (Fig. 9). The AP2E, Picarro #1, AiRRmonia #2, and the LGR #1 all reported an R 2 ≤ 0.81, whereas all other instruments reported a correlation of R 2 > 0.95. The instruments reporting a lower R 2 , with the exception of LGR #1, sampled from the same location but used their own inlets. The same instruments also reported large positive offsets of 4.3, 2.67, and 2.4 ppb for AiRRmonia #2, AP2E, and Picarro #1, respectively. For concentrations ≥ 10 ppb, the instruments with a slope greater than 1 are the miniDOAS #2, LGR #2, Tiger Optics, Picarro #2, AiRRmonia #1, and the LSE, all of which have an R 2 > 0.96. The only exceptions to this are the miniDOAS #1, QCLAS, and the MARGA, which consistently reported a slope less than 1 but also reported an R 2 of 0.97, 0.99, and 0.98, respectively. It is likely that the MARGA would have losses due to the length of the inlet used (Table 2). In addition, the capture efficiency of the WRD of the MARGA was limited at high concentrations of NH 3 . When the solution becomes more alkaline, "breakthrough" can occur, where NH 3 is not captured by the WRD but continues through to the SJAC, where it is reported as NH + 4 aerosol. To confirm the breakthrough, the ion balance of the PM 2.5 reported was investigated. It was apparent that, at elevated NH 3 concentrations, there was an excess of NH + 4 aerosol over neutralising anions, which can be attributed to the breakthrough of NH 3 gas from the WRD to the SJAC (Fig. S4). This therefore highlights that, in the configuration presented, the MARGA is limited in its range of concentration measurements. The work here comes to a similar conclusion with regard to the slope for the QCLAS as that by von Bobrutzki et al. (2010), who also reported a slope less than 1 when compared to the ensemble median of the partaking instruments. However, the two studies differ in that there is no clear split in the performance of the wet chemistry instruments in this study. In von Bobrutzki et al. (2010), it was found that all the wet chemistry instruments had a slope > 1, whereas in this study, at > 10 ppb, the AiRRmonia #1 had a slope > 1, while the reverse is observed for MARGA and AiRRmonia #2. This potentially highlights how performance varies with setup but could also reflect further progress in the development of the spectroscopic methods since the 2010 study.

Variability between individual instruments
To investigate the relationship between individual instruments, least-squares regressions were carried out for (i) the whole range and (ii) when values were < 10 ppb of the ensemble median (Tiger Optics was excluded for the < 10 ppb comparison). The instruments were then clustered according to Euclidean distances based on their correlation coefficients. It is immediately clear (Fig. 10) that, when using this approach, all instruments compared well when the whole period was studied. However, if the analysis was limited to below 10 ppb, a different relationship emerged. The LGR #1 was the worst performing instrument, with an average R 2 = 0.44 when studying concentrations below 10 ppb, whereas the LGR #2, which was the same make and model, compared well with other instruments. Even though remote troubleshooting by the manufacturer was performed on LGR #1 (see Sect. 3.2), this may be linked to a remaining misconfiguration of the instrument, preventing low NH 3 concentrations from being quantified with acceptable performance. The miniDOAS instruments compared well when studying the whole time series (R 2 = 0.99) and were even clustered together; however, their relationship changed when examining concentrations below 10 ppb with an R 2 = 0.88. At concentrations below 10 ppb, the instruments that operated with their own inlets, with the exceptions of the QCLAS and AiRRmonia #1, correlated well with each other but not with the instruments on the manifold or the miniDOAS instrumentation. AiRRmonia #1 was instead grouped with the LSE and Picarro #2 on the manifold, even though their locations were different. The QCLAS was grouped with the LGR #2 and the miniDOAS #1, even though its sampling point was the same as the instruments with their own inlets, suggesting that the sampling point was not a factor.
The second approach used to assess the variability between instrumentation was to look at the normalised difference (ND) calculated between instruments using the equation (Pinto et al., 2014)

$$ \mathrm{ND} = \frac{X_i - X_j}{X_i + X_j} $$

where X i is the concentration measured by one instrument, and X j is the concentration measured by another instrument. The ND is then used to calculate coefficients of divergence (CD) to investigate the similarity between instruments (Wongphatarakul et al., 1998):

$$ \mathrm{CD} = \sqrt{\frac{1}{P} \sum_{p=1}^{P} \left( \frac{X_{i,p} - X_{j,p}}{X_{i,p} + X_{j,p}} \right)^{2}} $$

where P is the number of points. For CD = 0, the two instruments are identical, and a CD of 1 indicates that the instruments are completely different. This additional technique was chosen to compare instruments because it gives greater weighting to low concentrations, where the main deviations between instruments occur, as observed when comparing instruments to the ensemble median at concentrations below 10 ppb (Sect. 3.5), and because it also describes systematic differences, whereas even a correlation coefficient of 1 still allows for an offset and slopes other than unity. Table 3 summarises the CD values between instruments, with the comparison of the LGR #2 and the Picarro #2 having the smallest CD (0.04). It is clear that there is not much difference between the LGR #2 and Picarro #2 when looking at the ND (Fig. 11a), though there may be a positive bias of the LGR #2 relative to the Picarro #2 at lower concentrations. The two instruments, which operated on the same manifold, agreed well. There are a number of possible explanations for the positive bias at lower concentrations. It is known that both spectrometers have a potential for water interferences, as previously reported by Martin et al. (2016) for the Picarro. In this study, the Martin et al. (2016) correction had not been applied to the Picarro #2. 
An alternative explanation is that the air sampled by Picarro #2 had a longer residence time between the manifold and the instrument (Table 2), resulting in greater losses of NH 3 to the inlet, which is more evident at lower concentrations. Another hypothesis could be that the use of a heated inlet by the LGR #2 could have led to the potential volatilisation of ammonium nitrate (NH 4 NO 3 ↔ NH 3 + HNO 3 ), generating an NH 3 interference. Compared to the miniDOAS #1, which does not have an inlet, there was no obvious difference in the ND for the LGR #2 and the Picarro #2 to provide a further explanation of the above hypotheses (Fig. 11b and c). The comparison of the miniDOAS #2 and the AP2E resulted in the largest reported CDs. When the ND is displayed (Fig. 11d), it is apparent that the data are scattered, especially at the lower concentrations. It is especially noticeable that there was a divergence of the miniDOAS #2 when the instruments in this study were grouped based on their CD using a hierarchical clustering approach, where the Euclidean distance was calculated based on CD and presented in a dendrogram (Fig. 12). Even though the miniDOAS #1 and miniDOAS #2 use the same analytical method, they were separated into the two distinct groups. This is hypothesised to be the result of the different approaches in the spectral algorithm and the calibration procedures between the two miniDOAS instruments (see the previous discussion in Sect. 3.5). Instead, the miniDOAS #1 clustered with the QCLAS. 

Figure 9. Intercomparison of instruments' (hourly) averages from 22 to 29 August 2016 to the ensemble median (excluding LGR #1 and Tiger Optics) when the median is equal to or greater than 10 ppb NH 3 . Data were filtered for low wind speed and stable and unstable conditions that could have led to inhomogeneity at the site. The solid black line is the 1 : 1 line, and the red dashed line is the fit.

Even though the CD between the miniDOAS #1 and the QCLAS was low (Table 3), there appears to have been an obvious positive bias at lower concentrations in the QCLAS measurements when looking at the ND between the two instruments (Fig. 11e). This positive bias was not observed for the Picarro #2 or LGR #2 in the ND when compared to the miniDOAS #1 but was observed when both instruments were compared to the QCLAS, suggesting that the bias lay with the QCLAS. The positive bias was investigated to see if it was related to drift in the instrument with time, background NH + 4 aerosols, or the influence of relative humidity; however, none of the parameters assessed could explain this bias at lower concentrations. One additional potential factor is the fit of the absorption spectrum at lower concentrations, where the influence of optical fringes becomes greater. Even when the QCLAS is compared to the ensemble median, either at < 10 ppb or > 10 ppb, it also had a slope less than 1. This is not the first time the QCLAS has been reported to underestimate compared to other instruments. Whitehead et al. (2008) reported in an earlier version of the instrument (using a pulsed rather than a continuous quantum cascade laser) that the QCLAS reports lower concentrations but has a good R 2 compared to a wet chemistry method that sampled with WRD and analysed with selective ion membrane and conductivity analysis.
The large CDs of LGR #1 are likely due to drift of the instrument, which has been reported previously. In Misselbrook et al. (2016), data from two LGR instruments measuring NH 3 were rejected after there was significant drift in the reported values when doing periodic calibration checks.
It cannot, however, be stated that this issue is only evident in the off-axis approach, as the AP2E and the Picarro #1 also performed poorly; instead, this highlights that all measurement techniques should be compared to either a calibration standard or another instrument at regular intervals. Overall, there is no clear message on the clustering of instrumentation based on their CD or using the correlation coefficient, as the LGR, Picarro, and AiRRmonia instruments separate into the two distinct groups. It was suggested by von Bobrutzki et al. (2010) that there should only be one sampling point for future intercomparisons, but it is clear that, although most instruments that sampled from the manifold clustered together, the sampling point is not the controlling factor of the CD clustering. The AiRRmonia #1, which was on the scaffolding at another location in the field, is also grouped with the manifold instruments. It is most likely that the clustering also reflects the time responses that result from the instrument and inlet setup (refer to Table S3, Sect. 3.4). For example, the AiRRmonia #1, LGR #2, and Picarro #2 have similar time responses and are clustered together, whereas the AiRRmonia #2 and Picarro #1 have much slower responses and are clustered together.
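The grouping described above can be illustrated with a minimal sketch. It assumes the standard definition of the coefficient of divergence, CD = sqrt(mean(((x − y)/(x + y))^2)), and uses complete-linkage agglomeration on the Euclidean distances between each instrument's row of CDs. The linkage choice, the instrument names, and the data values are all hypothetical and are not taken from the study.

```python
import math
from itertools import combinations


def coefficient_of_divergence(x, y):
    """CD of two paired series: 0 means identical, values near 1 mean divergent."""
    pairs = [(a, b) for a, b in zip(x, y) if (a + b) != 0]
    return math.sqrt(sum(((a - b) / (a + b)) ** 2 for a, b in pairs) / len(pairs))


def cd_profiles(series):
    """Return instrument names and, for each, its vector of CDs against all instruments."""
    names = list(series)
    rows = [[coefficient_of_divergence(series[a], series[b]) for b in names]
            for a in names]
    return names, rows


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def agglomerate(names, profiles, n_clusters=2):
    """Merge the two closest clusters (complete linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(names))]
    while len(clusters) > n_clusters:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            d = max(euclidean(profiles[i], profiles[j])
                    for i in clusters[p] for j in clusters[q])
            if best is None or d < best[0]:
                best = (d, p, q)
        _, p, q = best  # p < q, so deleting q keeps index p valid
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(names[i] for i in c) for c in clusters]


# Hypothetical hourly averages in ppb for four made-up instruments.
hourly = {
    "inst_A": [2.0, 4.0, 8.0, 16.0],
    "inst_B": [2.1, 4.1, 8.2, 15.8],
    "inst_C": [3.0, 6.0, 11.0, 20.0],
    "inst_D": [3.2, 6.1, 11.5, 19.5],
}
names, profiles = cd_profiles(hourly)
groups = agglomerate(names, profiles, n_clusters=2)
```

In this toy case the two pairs that track each other closely end up in separate groups; with real data, the full merge order would normally be drawn as a dendrogram rather than cut at a fixed number of clusters.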

Bias compared to the optical gas standard
An estimate of the bias of each instrument was calculated compared to the OGS (i.e. the alternative, first principles, offline evaluation of the Picarro #2 concentration using raw spectra) as the reference (refer to Fig. 5 for hourly time series), where m is the slope of the orthogonal regression when the intercept is forced through zero, as it is assumed that there is no artefact in the reference measurement (von Bobrutzki et al., 2010):

Bias (%) = (m − 1) × 100

Table 4 summarises the bias compared to the OGS, which ranged from −20 % to +23 % for the whole period (figures can be found in Figs. S2 and S3). The worst performing instruments, based on this metric, with a positive bias are the AP2E, with +23 %, and the Picarro #1, with +21 %, while those with a negative bias were the miniDOAS #2 and the QCLAS, with −20 % and −15 %, respectively. In contrast, unsurprisingly, the manufacturer-based evaluation of the Picarro #2 has a relatively small bias of 5 %, since the OGS uses this instrument's spectra. However, the smallest reported biases of ±1 % for the whole period are for the LGR instruments, followed by the Tiger Optics and AiRRmonia #1 with +2 %. It is noted for the LGR #1 that the correlation coefficient was weaker, with an R 2 = 0.79, compared to the LGR #2, which had an R 2 = 1.00 (Fig. S5). The data were then filtered to only include periods where the ensemble median < 10 ppb. The bias previously reported for instruments compared to the OGS increased or remained the same, with the exception of the miniDOAS instruments and the AP2E, where there was an apparent improvement in the bias (Table 3). It was apparent that all the instruments sampling from the manifold had quite a low bias. LGR #2 and LSE as well as miniDOAS #1 and AiRRmonia #1 had the lowest bias compared to the OGS. Though the Picarro #2 had a larger bias of 7 %, likely due to different spectral data and different data evaluation, it had a high correlation of R 2 = 1.00 (Fig. S6), which again was to be expected, as the same instrument was used to derive the OGS values. Below 10 ppb, the largest positive biases are with the AP2E, Picarro #1, and AiRRmonia #2, whereas there are large negative biases for the miniDOAS #2, QCLAS, and the MARGA. The bias of the miniDOAS and the QCLAS is most likely due to the OGS using spectral data from the Picarro #2, which has already been shown to be greatly influenced by the inlet setup at below 10 ppb, resulting in a smoothed temporal pattern (refer to Sect. 3.4), whereas the miniDOAS and QCLAS retained the temporal features of NH 3 , even at lower concentrations.

Figure 10. Least-squares regression correlation coefficients between instruments clustered into a matrix based on their Euclidean distances (black lines on RHS of the figure) for the (a) whole range and (b) when NH 3 < 10 ppb of the ensemble median for the period of 23 August 2016, 00:00, and 29 August 2016, 01:00, based on their hourly averages. Graph generated using OpenAir package (Carslaw and Ropkins, 2012). Note the Tiger Optics is excluded from panel (b). The colour scale relates to the magnitude of the correlation coefficient.
To investigate the accuracy of the OGS in the field, it was checked alongside the LSE and LGR #2 using standards produced by the ReGaS1 calibration system.
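As an illustration of the bias metric used in this section, the following sketch computes the slope of an orthogonal (total least squares) regression constrained to pass through the origin and converts it to a percentage bias. It assumes that the bias is (m − 1) × 100 % and that both variables are given equal weight; the function names and the example data are hypothetical.

```python
import math


def orthogonal_slope_through_origin(x, y):
    """Slope of the line y = m*x that minimises perpendicular distances.

    Uses uncentred second moments because the intercept is forced to zero.
    """
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("slope is undefined when sum(x*y) is zero")
    return ((syy - sxx) + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)


def bias_percent(reference, instrument):
    """Percentage bias of an instrument relative to the reference series."""
    m = orthogonal_slope_through_origin(reference, instrument)
    return (m - 1.0) * 100.0


# Hypothetical hourly values in ppb: the instrument reads 20 % high throughout.
reference_ppb = [1.0, 2.0, 5.0, 10.0, 20.0]
instrument_ppb = [1.2 * c for c in reference_ppb]
example_bias = bias_percent(reference_ppb, instrument_ppb)
```

Because the intercept is fixed at zero, any real offset between two instruments is folded into the slope, which is one reason why the bias can change when the data are restricted to concentrations below 10 ppb.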

Ammonia calibration system
As previously stated, the LGR #2, LSE, and OGS were compared to the ReGaS1 calibration system. For 0 ppb, it was found that the instruments reported the following average concentrations: LSE: −0.77 ppb, LGR #2: 0.16 ppb, and the OGS: 0.14 ppb (refer to Fig. 13). The LGR #2 performed poorly compared to the other instruments. However, it is noted that the instrument was operated on a lower flow rate compared to that used during the field campaign (Table 1), resulting in a slower time response. It is evident in Fig. 13 that the LGR #2 was still stabilising and had not reached equilibrium.
LGR #1 was part of this calibration; however, it developed a fault; therefore, no results are reported here.
The OGS agrees closely with the expected concentrations, except at the two lowest concentrations, which were measured at the end of the experiment after a reduction from a higher concentration value and hence might be affected by the longer response time (hysteresis) of the instruments (Fig. 13b, Table S4). The OGS and ReGaS1 values, however, are metrologically compatible (refer to Table S4 and Fig. S7 for further details).

Calibration of miniDOAS with the gas standard
On 22 August 2016, the miniDOAS #2 was compared to a PSM. A flow cell of 75 mm in length was installed in the optical path. From 13:30 to 14:52 UTC, NH 3 was flushed through this cell from the PSM cylinder (#1825R2), which contained 99.78 ppm NH 3 in N 2 . Taking into account the pressure and temperature in the flow cell as well as the ratio between the open air path and the cell length, the extra concentration that the miniDOAS was expected to measure was 163 ppb. The results of the experiment are shown in Fig. 14. The NH 3 concentrations measured in the open air on this day were low: 1.37 ppb averaged over the hour before the experiment and 3.31 ppb averaged over the hour after the experiment. After the start of the gas flow from the PSM cylinder, which had an expanded uncertainty of < 2 %, the concentration as measured by the miniDOAS rose sharply at first and then rose much more slowly as an equilibrium was established. Even after the gas flow was stopped, the measured concentration still rose somewhat, indicating that a steady state had not yet been reached. After 15:00, the NH 3 diffused out of the cell and the open tubing, and the measured concentration decreased. For the comparison, we take the average of the concentrations measured by the miniDOAS over the last 30 min during which the gas flowed. This was 181 ppb, 11.1 % more than the nominal concentration in the flow cell. The experiment showed that, in principle, the gas cell approach can be used for span checking the miniDOAS; however, further research into making this type of span checking affordable, routine, and at concentrations relevant to ambient concentrations is needed before this approach can be routinely applied in the field.
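The arithmetic behind such a span check can be sketched as follows. The scaling of the cell mixing ratio by the cell-to-path length ratio and by the ratio of air number densities (p/T) is an assumption about how the pressure and temperature correction might be applied, not the authors' documented procedure. The open-path length of 50 m used in the example is hypothetical, as the actual path length is not stated in this section.

```python
def equivalent_path_ppb(cell_ppm, cell_length_m, path_length_m,
                        p_cell_hpa=1013.25, t_cell_k=293.15,
                        p_air_hpa=1013.25, t_air_k=293.15):
    """Extra path-averaged mixing ratio (ppb) expected from a gas cell in the beam.

    The cell contributes (mixing ratio x length x air number density); dividing by
    the open-path length and the ambient number density gives an equivalent ppb.
    """
    density_ratio = (p_cell_hpa / t_cell_k) / (p_air_hpa / t_air_k)
    return cell_ppm * 1000.0 * (cell_length_m / path_length_m) * density_ratio


def span_deviation_percent(measured_ppb, expected_ppb):
    """Relative deviation of the measured span value from the expected one."""
    return (measured_ppb - expected_ppb) / expected_ppb * 100.0


# Cylinder value and cell length from the text; the 50 m path is hypothetical.
expected_example = equivalent_path_ppb(99.78, 0.075, 50.0)

# Rounded values quoted in the text: 181 ppb measured, 163 ppb expected.
deviation_example = span_deviation_percent(181.0, 163.0)
```

With the rounded values quoted above, the deviation evaluates to roughly 11 %, in line with the figure reported in the text.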

Accuracy and precision of the measurements
In this study, we assessed the precision by comparing the inter-variability between instruments and the variability against the ensemble median (R 2 ). In a previous study by von Bobrutzki et al. (2010), the main factors identified for affecting the precision of the measurements were (a) inlet design, (b) the condition of inline filters (where applicable), and (c) the quality of gas-phase calibration standards. In this study, it has further been shown that the precision across the suite of instruments is also dependent on the ambient concentration measured and instrument response time to rapidly changing concentrations. The majority of the instruments, with the exception of the QCLAS (specifically designed for fast response) and miniDOAS (open path), have a fairly slow response to variations in ambient concentration (Table S3), some because of their internal measurement principle, others because their inlet and filter systems dampen concentration peaks (von Bobrutzki et al., 2010). The fast response instruments (QCLAS and miniDOAS) therefore had more structure in their temporal patterns compared to the ensemble median (Fig. 3). As a result, more scatter is observed in the correlation plots for these instruments (Figs. 8 and 9), resulting in a misconception that these instruments had a poor precision; however, when these fast response instruments were compared to each other (QCLAS vs miniDOAS #1, Fig. 15a) at differing averaging times of 1 min (Fig. 15b), 10 min (Fig. 15c), and 720 min (Fig. 15d), the precision improved correspondingly: (a) r 2 = 0.85, slope: 0.87, intercept: 2.69 ppb; (b) r 2 = 0.94, slope: 0.97, intercept: 0.72 ppb; and (c) r 2 = 1.0, slope: 1.09, intercept: 1.17 ppb. Therefore, the larger scatter in comparing the QCLAS and miniDOAS instruments to the ensemble median (Figs. 8 and 9) should not be taken as a sign that the faster response instruments have reduced precision but rather as evidence of the difference of precision due to differences in instrument time response (refer to Sect. 3.4). It is likely that the observed finer-scale structure reflects the heterogeneity of the air concentrations of NH 3 across the field that both instruments would detect. Although it is beyond the scope of this study to carry out a full site emissions modelling exercise (e.g. with Lagrangian modelling), the data from this study could, in future, be used to explore concentration heterogeneity at these fine scales. Also, evidence of the precision of the MARGA has to be treated with caution, as the inlet setup in the study was atypical, with a long length (8.46 m) compared to more typical 1.29-4 m inlet setups (Makkonen et al., 2012; Rumsey et al., 2014; Twigg et al., 2015). It is therefore likely that the time response reported here (Table S3) is not a true reflection of the time response of the MARGA instrument; instead, a setup without an inlet would have to be undertaken to quantify the time response. More generally, differences between the performances of near-identical instruments and spectroscopic methods (e.g. Picarro #1 and #2; LGR #1 and #2, miniDOAS #1 and #2) show that performance is not purely linked to the measurement approach or instrument but is greatly influenced by, e.g. inlet setup, operation (e.g. flow rate), and the status of the instruments, which likely includes the status of the filters, where applicable.
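The effect of averaging time on the apparent agreement between two fast-response instruments can be illustrated with a small synthetic example. The two series below are made up: they share a slowly rising background and carry fine-scale fluctuations that are uncorrelated between them. They are not data from this study, and the block-averaging and r 2 helpers are a sketch rather than the processing used by the authors.

```python
def block_average(values, n):
    """Average consecutive, non-overlapping blocks of n samples."""
    return [sum(values[i:i + n]) / n for i in range(0, len(values) - n + 1, n)]


def r_squared(x, y):
    """Squared Pearson correlation coefficient of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Synthetic 1 min series (ppb): common background plus instrument-specific
# fluctuations of +/- 1 ppb with different periods (2 min and 4 min).
background = [5.0 + 0.05 * i for i in range(120)]
inst_a = [c + (1.0 if i % 2 == 0 else -1.0) for i, c in enumerate(background)]
inst_b = [c + (1.0 if (i // 2) % 2 == 0 else -1.0) for i, c in enumerate(background)]

r2_1min = r_squared(inst_a, inst_b)
r2_4min = r_squared(block_average(inst_a, 4), block_average(inst_b, 4))
```

In this constructed case the fluctuations cancel within each 4 min block, so the correlation rises sharply once the data are averaged, which mirrors the qualitative behaviour described above for longer averaging times.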
An assessment of the accuracy in this study was determined by the comparison to a CRDS-based OGS (Picarro #2 with modified algorithm) and, for some checks, through the in-field gas calibration standards. For the comparison to the OGS (Table 4, Figs. S4 and S5) some instruments show very little bias (LGR #1, LGR #2, Picarro #2, Tiger Optics, AiRRmonia #1, and LSE), all of which, with the exception of AiRRmonia #1, were attached to the manifold also used by the OGS. The remaining instruments were not attached to the same manifold and had either the slowest (Picarro #1, AiRRmonia #2, AP2E, and MARGA) or fastest time responses (QCLAS, miniDOAS #1, miniDOAS #2), as set out in Table S3. Therefore, no conclusion can be made regarding the accuracy of the reported concentrations of these instruments. The OGS comparison is likely to be limited by (i) instruments not sampling at the same point, with miniDOAS measuring a line average, and (ii) the OGS concentrations being limited by the setup of the Picarro #2 instrument. The OGS, however, is a promising methodology, as the OGS and the ReGaS1 values were comparable, but further research is required, especially regarding gas sampling issues, prior to the system being used as a reference methodology for routine monitoring.
Both the spectroscopic methods and wet chemistry methods have some cross-sensitivities that would affect the accuracy of the reported concentrations. The reported concentrations are likely to have been impacted by ammonium aerosol deposition to surfaces (inlets or filters), which has the potential to generate an artefact through volatilisation into NH 3 gas. In the von Bobrutzki et al. (2010) study, it was found that one (photoacoustic) instrument overestimated NH 3 concentrations compared to other instrumentation prior to the filter being replaced. Stieger et al. (2018) also observed when comparing the MARGA to a Picarro instrument that the Picarro reported up to 3 µg m −3 more NH 3 compared to the MARGA when it was reporting low concentrations (< 5 µg m −3 ). This was attributed by Stieger et al. (2018) to artefacts of the volatilisation of ammonium nitrate from the filter, whereas the reverse was observed under higher NH 3 concentrations. It was hypothesised by the authors that negative artefacts could potentially occur at higher concentrations due to the formation of aerosols on the filter. Unfortunately, during this study, filters used by the instruments were not replaced. Therefore, the reported positive intercepts discussed above (Figs. 8 and 9) cannot be conclusively attributed to contaminated filters. It is, however, noted that, when averaging the concentration over a long period (Table S1, ALPHA exposure time), instruments with filters tend to report higher concentrations compared to filter-free methods (Table S1), supporting the suggestion that filters introduce an artefact. There is recent evidence that frequent filter changes are starting to be considered by network operators to limit artefacts in measurements. For example, He et al. (2020) reported changing filters at a frequency of every two weeks to monthly, depending on atmospheric conditions for a CRD instrument.
The Picarro instruments operated during the campaign are known to have suffered a spectral interference by H 2 O. As Martin et al. (2016) found, this could be corrected for by a water correction algorithm for the Picarro. However, this interference is known to be rather minor (< 4 %). In the case of the OGS, the data evaluation algorithm has included a spectral water suppression approach and thus has no need for additional empirical water corrections. The Picarro instruments in this study did not have the water correction applied, and therefore, the results are likely to change with humidity; as a result, it is likely to have affected the accuracy of the reported concentrations. During the campaign, the LGR #1 also displayed issues in precision and accuracy (refer to Sect. 3.2). Misselbrook et al. (2016) have previously reported issues in the accuracy of an LGR instrument, finding significant drift in the recorded values during calibration checks. This issue is not only limited to the LGR but has also previously been observed for the Picarro in the laboratory (Marsailidh M. Twigg, personal communication, 2022). Unfortunately, an assessment of the drift of instrumentation studied using the ReGaS1 was not possible during this study. It is recommended that such assessment be undertaken in future studies. However, the observed drift provides evidence that regular calibration span checks are required to determine the accuracy and precision of instrumentation, especially instrumentation considered to be plug-and-play instruments, which are thought to be stable in time. Had the LGR #1 been operating in isolation, without the comparison with other instruments, its poor performance may have taken longer to identify.
Manufacturers of some of the instruments used state that the instruments are stable and do not require recalibration, although they do recommend routine span checks. However, no frequency is provided by the manufacturers (LGR, Picarro, AP2E, and Tiger Optics). The exception is the LSE instrument, for which it is recommended that a calibration be undertaken twice a year and that, at the same time, the filter be replaced. At the time of this study, there was no routine maintenance protocol from the other manufacturers regarding the frequency of filter changes. Tiger Optics recommend that their inline filter be replaced when it begins to show discolouration, and Picarro only when the filter becomes blocked. As filters are a known source of uncertainty in terms of the absolute NH 3 concentration, it is of concern that manufacturers do not provide a recommended maintenance schedule for both filters and span checks.

Inlet design
Consistent with previous NH 3 measurement studies, our results have demonstrated that inlet design is important. Whitehead et al. (2008) demonstrated that polyethylene (PE) or Teflon (PTFE) had the best response time compared to stainless steel or silcosteel, whereas PE was found to be best by Dias (1988). Vaittinen et al. (2014) studied the absorption of NH 3 under a range of humidities for stainless steel, stainless steel with Dursan, SilcoNert 2000, and halocarbon wax coatings, as well as Teflon (PTFE) and polyvinylidene difluoride (PVDF). It was found that PVDF and PTFE were the least-absorbing materials. In this study, all operators used either PFA, PTFE, or PE for their inlet. It has become evident that, although inlet material is important (where applicable), consideration of the surface to air volume ratio and residence time are also important controlling factors. For example, the air sampled by Picarro #1 had a shorter residence time in the inlet compared to the Picarro #2 but had a larger ratio of surface to volume (Table 2), which is likely to have led to greater interaction of NH 3 molecules with the surface wall. It would therefore be the recommendation that, where an inlet is required, the wall interactions be minimised by minimising the length of inlet used, the residence time, and the surface to volume ratio of the inlet. A previous study by Norman et al. (2009) demonstrated the importance of condensation on inlet lines and that care should be taken to ensure that condensation does not occur in the inlet. They recommended that an optimal design might therefore include thermal insulation and would, if possible, keep inlets heated a few degrees above the ambient temperature, particularly any sections that run within air-conditioned measurement cabins. Ellis et al. (2010) also evaluated the use of a heated inlet and found that heating the inlet line led to an improvement in the time response of a QCLAS. 
During this study, only the LGR #2 and the QCLAS used heated inlets. However, caution is required when heating an inlet, as if the temperature is too high, this will lead to the dissociation of NH 4 NO 3 , leading to an artefact. There is no evidence to suggest that for the QCLAS this led to an NH 3 artefact, as the QCLAS had a very small positive intercept (0.05 ppb) for concentrations < 10 ppb when compared to the ensemble median (Fig. 8). However, the inertial inlet of the QCLAS is designed to remove much of the ammonium nitrate from the air stream. The LGR #2, however, had a positive intercept of 0.65 ppb (Fig. 8), though this cannot be concluded to be the result of heating, as the instrument used filters too. Not considered in this study is the contamination of the inlet, which is likely to occur over time and has been discussed previously in the literature, though there are still no recommendations regarding the frequency of either cleaning or replacing inlets. Moravek et al. (2019), for example, demonstrated that, for the QCLAS, time response degrades with age (based on a study spanning 5 months) due to contamination of the inlet, but even after cleaning, the response time did not always return. As a result, some network operators have already started to replace their inlets frequently. Twigg et al. (2015) replace their inlet at quarterly intervals for the MARGA currently operated in the UK in an attempt to minimise contamination. Therefore, though not studied here, it would also be recommended that the frequency with which an inlet is either cleaned or replaced be considered in order to account for a potential loss of precision.
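The two geometric quantities highlighted in this section, residence time and surface to volume ratio, follow directly from the tube dimensions and the sample flow. The sketch below assumes a straight cylindrical tube with plug flow and ignores fittings, filters, and pressure drop. The tube dimensions and flow rates in the example are hypothetical and are not the values from Table 2.

```python
import math


def inlet_metrics(length_m, inner_diameter_mm, flow_l_min):
    """Residence time (s) and surface-to-volume ratio (1/m) of a cylindrical inlet."""
    diameter_m = inner_diameter_mm / 1000.0
    volume_m3 = math.pi * (diameter_m / 2.0) ** 2 * length_m
    surface_m2 = math.pi * diameter_m * length_m
    flow_m3_s = flow_l_min / 1000.0 / 60.0
    return {
        "residence_time_s": volume_m3 / flow_m3_s,
        # For a cylinder this ratio reduces to 4 / diameter, independent of length.
        "surface_to_volume_per_m": surface_m2 / volume_m3,
    }


# Hypothetical inlets: a narrow, short line and a wide, long line at different flows.
narrow_line = inlet_metrics(length_m=2.0, inner_diameter_mm=4.0, flow_l_min=1.0)
wide_line = inlet_metrics(length_m=6.0, inner_diameter_mm=10.0, flow_l_min=10.0)
```

The comparison shows how a line can have the shorter residence time and still expose the sampled air to more wall area per unit volume, which is the situation described above for the two Picarro inlets.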

Progress towards standard operating procedure for routine NH 3 monitoring
This study highlights that, currently, there is no standard operating procedure for NH 3 instrumentation in monitoring networks, and it is at the discretion of the user to determine the monitoring network design. There is evidence to suggest that this approach will lead to variations in reported concentrations, as seen in both the comparison to the ensemble median (Fig. 8) and the between-instrument variability (Fig. 10 and Table 3). The interferences from artefacts and alterations in instrument performance are an ongoing area of concern for NH 3 instruments used in long-term monitoring. It highlights the need for the further development of protocols to ensure the precision and accuracy of instrumentation. This is likely to be achieved through regular zero and span checks as well as a regular servicing programme, which, to the authors' knowledge, is not yet available for any of the instrumentation presented. Work is required to determine if span checks and calibrations should be undertaken using humidified air, as the evidence from Martin et al. (2016) would suggest that reported NH 3 concentrations from spectrometry methods are likely to suffer interferences from water. However, preparing humid gas samples with accurately characterised NH 3 concentrations in the ambient concentration range is challenging, and work is required to develop standard methodologies to produce a humidified gas standard, such as using a scrubber or heated catalysts. Pollack et al. (2019) provide a valuable study in evaluating these approaches. It would be advisable that a standard is also used on a frequent basis to determine the contamination of the setup, as previously demonstrated by Ellis et al. (2010) and Pollack et al. (2019), who observed that inlet contamination can be identified via an increase in the calculated time response. In addition, a standard inlet design needs to be agreed upon (where applicable). Evidence from the Picarro and AiRRmonia setups in this study (Fig. 7) would suggest that inlet design can lead to losses of information regarding the temporal pattern of NH 3 . Consideration is also required to determine if passivation of the inlet is valuable to routine air quality monitoring, as there is evidence that it can effectively reduce the interactions of NH 3 with the inlet walls (Roscioli et al., 2016). Open-path techniques, such as DOAS, will benefit from the availability of zero-air facilities, where instruments can check their zero level on ammonia-free air. Work on such a facility is ongoing. This study did not include the methodologies that are the current NH 3 reference methods used by the US EPA and EMEP. However, there is literature evidence of the MARGA being compared to reference methodologies. Makkonen et al. (2014) compared the MARGA to the EMEP filter pack method at a background station, Hyytiälä, in Finland. It was found that MARGA compared well to the filter pack method at low concentrations (< 0.8 µg m −3 ). Stieger et al. (2018) also found that the MARGA compared well to acid-coated denuders (NH 3 mini-denuder, Midefex and Radiello ® ), with r 2 from 0.82 to 0.98. However, the MARGA reported higher concentrations compared to these denuder methodologies, with slopes ranging from 1.30 to 1.53. This is in contradiction to Rumsey et al. (2014), who found that, although precision was within acceptable limits, the accuracy of the MARGA was variable, with concentrations being consistently underestimated compared to the US EPA reference denuder methodology, which has a sampling frequency of 12 h. This loss was attributed by Rumsey et al. (2014) to the consumption of NH 3 by bacteria. The studies for the MARGA give mixed conclusions that are likely to be due to variations in the setups between studies and the reference methodology used. To the authors' knowledge, there are no further comparisons of the US EPA and EMEP reference methodologies for the other instrumentation presented in this study.
It would therefore be advisable for any future study to compare the instrumentation presented here to the US EPA and EMEP reference methodologies, using an approach similar to that outlined in the European guide to demonstrate equivalence for ambient air monitoring (GDE, 2010) in order to quantify the uncertainty in the different measurement techniques. This study did not include all instruments currently used in routine monitoring of NH 3 across the globe. In India, for example, the Central Pollution Control Board (CPCB) monitors NH 3 concentrations indirectly, through conversion by a molybdenum convertor coupled to an NO chemiluminescence analyser (Pawar et al., 2021). In future, any other instruments identified as being used in routine monitoring of NH 3 should be added to the suite of instruments taking part in any follow-up study looking at the uncertainties compared to reference methodologies.
With the available instruments showing significant variability within the < 10 ppb range, it is clear that the accurate assessment of the exceedance of critical levels (CL) of NH 3 concentrations for sensitive ecosystems with these automated measurement methods remains a challenge. Critical levels are there for the protection of vegetation from damage by NH 3 ; currently, these are set by the International Cooperative Programme (ICP) Vegetation of the United Nations Economic Commission for Europe (UNECE, 2007) at annual averages of 1 µg m −3 for lichens and bryophytes, and 3 µg m −3 for higher plants. Therefore, to achieve quantitative annual measurements with high temporal resolution instruments, great care in setup and operational quality assurance as well as data quality control would be required to achieve the CV < 20 % set in this study. The UNECE (2007) retained a monthly critical level of 23 µg m −3 as a provisional value for the prevention of ecological damage during intermittent periods such as fertiliser and manure spreading seasons. The instrumentation in this study has been shown to be capable of achieving an acceptable CV at these more elevated concentrations; however, care would need to be taken to minimise baseline drift and instrument contamination. As with the annual averages, similar care in setup, operational quality assurance, and data quality control would be required to ensure a traceable and acceptable level of data quality for policy evidence purposes.
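Since the critical levels are expressed in µg m −3 while the instrument comparisons above are in ppb, a conversion helps to relate the two. The sketch below uses the ideal gas law with an assumed reference state of 20 °C and 1013.25 hPa, which may differ from the convention used in the study, and computes the coefficient of variation as the sample standard deviation divided by the mean, which is also an assumption about how the CV is defined. The instrument readings are hypothetical.

```python
import statistics

R_J_MOL_K = 8.314462618   # molar gas constant
M_NH3_G_MOL = 17.031      # molar mass of NH3


def ppb_to_ug_m3(ppb, temp_k=293.15, pressure_pa=101325.0):
    """Convert an NH3 mixing ratio (ppb) to mass concentration (ug per m3)."""
    mol_air_per_m3 = pressure_pa / (R_J_MOL_K * temp_k)
    return ppb * 1e-3 * M_NH3_G_MOL * mol_air_per_m3


def ug_m3_to_ppb(ug_m3, temp_k=293.15, pressure_pa=101325.0):
    """Inverse of ppb_to_ug_m3 at the same reference state."""
    return ug_m3 / ppb_to_ug_m3(1.0, temp_k, pressure_pa)


def coefficient_of_variation_percent(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


# Annual critical levels from the text, expressed in ppb at the assumed state.
cl_lichens_ppb = ug_m3_to_ppb(1.0)
cl_higher_plants_ppb = ug_m3_to_ppb(3.0)

# Hypothetical simultaneous readings (ppb) from four instruments.
cv_example = coefficient_of_variation_percent([2.0, 2.2, 1.8, 2.0])
```

At the assumed reference state, both annual critical levels fall well below the 10 ppb threshold used in this study to separate low from elevated concentrations, which is the range where the instruments diverged most.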

Conclusions
To date, this study is the most comprehensive comparison of NH 3 instruments which are, or have the potential to be, used in routine monitoring of NH 3 from background concentrations (< 1 ppb) to agricultural emission sources (> 100 ppb). Because NH 3 interacts with inlets and other surfaces, a comparison of the instruments is complicated by some instrument response times reaching or exceeding the 1 h averaging time and by the difficulty of sampling at the same location (due to the size of the instruments and the need for longer sampling lines). Overall, the instruments studied performed well at elevated NH 3 concentrations, though there is evidence that MARGA has a limited range in the configuration presented. However, at concentrations below 10 ppb, performance in precision differed, with instrumentation splitting into two distinct groups based on the instrument setup. At low concentrations, even seemingly identical instruments performed differently, highlighting the impact of the setup, inlet design, and operation (external pump, inlet length, maintenance, filter ageing); here, instruments without an inlet or filter have an intrinsic advantage. It should be noted that real-time instruments are currently evolving, and some of the instruments included in this study have been further developed since, partly in response to the study.
In general, the level of agreement between instruments participating in this intercomparison was encouraging. However, given the variation in performances at low and high concentrations, there is still a lot of work to do to achieve equivalence of measurements. Ideally, we should be aiming for a coefficient of variation of 10 % if there is to be confidence in the measurements from different places being compared, for example, measurements being used to evaluate critical levels or long-term trends or for integrated concentrations for exposure. Therefore, if different instruments are to operate in an air quality network, equivalence work is required to determine the uncertainties across the techniques to ensure comparability against a reference method.
Overall, the simple requirement for both science and policy is that ambient NH 3 concentrations are measured to a known accuracy and precision, particularly for long-term measurements (weeks, months, years). Therefore, long-term NH 3 measurements need to be fit for purpose, taking into account the time response required and the range of concentrations to be observed. Networks with multiple measurement sites need to be comparable, and this will be only achieved by prescribed setups and traceable quality assurance and quality control protocols which are developed to achieve data quality with operational economy. Without the support of the other instrumentation present in this study, it would have been very difficult for a data user to verify the accuracy and precision of the reported NH 3 concentrations. Therefore, to understand instrument performance, it is strongly recommended that, for any short- or long-term deployment as part of routine QA/QC, regular calibrations as well as zero and span checks be undertaken at a frequency determined by the operational need in the location of the measurements (i.e. high NH 3 and high PM concentrations will likely necessitate more frequent maintenance and checks). This will enable routine reporting of monitoring stations and ensure the precision and accuracy of concentrations reported from the inlet and instrument setup. Further long-term monitoring research is required to develop and test standard operating protocols for instrument setup, in situ calibrations, and maintenance routines, such that an international set of standards can be agreed.
Data availability. The 1 h averaged data will be available from all instruments at the UKCEH Environmental Information Data Centre, and raw files are available from the authors on request.
Author contributions. MMT, CFB, and EN designed the experiment. All co-authors were involved in various aspects of data collection. MMT conducted the data analysis and the interpretation of data, with input from all co-authors. MMT prepared the manuscript with contributions from all co-authors.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Acknowledgements. The measurements during the intercomparison study were funded through the following projects: Metrology for NH 3 in ambient air (MetNH3) project as part of the European Metrology Research Programme (EMRP) of the European Union; the Academy of Finland as part of the Centre of Excellence programme (project no. 1118615); the Swiss Federal Office for the Environment (Contract 06.9115.P2I P264-1000). For IMT Nord Europe, the participation in this work has been supported by the French Ministry of Environment (grant no. 2200995403) as a part of the activities of the National Reference Laboratory for air quality monitoring (LCSQA). The field site infrastructure at Easter Bush is supported by the NERC UK Status, Change and Projections of the Environment (UKSCAPE) National Capability Programme (NE/R016429/1). The data analysis and preparation of this manuscript were supported by the NERC Integrated Research Observations System for clean air (OSCA, NE/T001798/1). We dedicate this paper to Rens Zijlmans, who sadly passed away before the publication of this paper.
Financial support. This research has been supported by the European Metrology Programme for Innovation and Research (grant no. JRP: ENV55 MetNH3), the Natural Environment Research Council (grant nos. NE/R016429/1 and NE/T001798/1), the Academy of Finland (grant no. 1118615), and the Bundesamt für Umwelt (grant no. 06.9115.P2I P264-1000). For IMT Nord Europe, the participation in this work has been supported by the French Ministry of Environment (grant no. 2200995403) as a part of the activities of the National Reference Laboratory for air quality monitoring (LCSQA).
Review statement. This paper was edited by Glenn Wolfe and reviewed by two anonymous referees.