Validation of ACE and OSIRIS ozone and NO2 measurements using ground-based instruments at 80° N

The Optical Spectrograph and Infra-Red Imager System (OSIRIS) and the Atmospheric Chemistry Experiment (ACE) have been taking measurements from space since 2001 and 2003, respectively. This paper presents intercomparisons between ozone and NO2 measured by the ACE and OSIRIS satellite instruments and by ground-based instruments at the Polar Environment Atmospheric Research Laboratory (PEARL), which is located at Eureka, Canada (80° N, 86° W) and is operated by the Canadian Network for the Detection of Atmospheric Change (CANDAC). The ground-based instruments included in this study are four zenith-sky differential optical absorption spectroscopy (DOAS) instruments, one Bruker Fourier transform infrared spectrometer (FTIR) and four Brewer spectrophotometers. Ozone total columns measured by the DOAS instruments were retrieved using new Network for the Detection of Atmospheric Composition Change (NDACC) guidelines and agree to within 3.2 %. The DOAS ozone columns agree with the Brewer spectrophotometers with mean relative differences that are smaller than 1.5 %. This suggests that for these instruments the new NDACC data guidelines were successful in producing a homogeneous and accurate ozone dataset at 80° N. Satellite 14–52 km ozone and 17–40 km NO2 partial columns within 500 km of PEARL were calculated for ACE-FTS Version 2.2 (v2.2) plus updates, ACE-FTS v3.0, ACE-MAESTRO (Measurements of Aerosol Extinction in the Stratosphere and Troposphere Retrieved by Occultation) v1.2 and OSIRIS SaskMART v5.0x ozone and Optimal Estimation v3.0 NO2 data products. The new ACE-FTS v3.0 and the validated ACE-FTS v2.2 partial columns are nearly identical, with mean relative differences of 0.0 ± 0.2 % and −0.2 ± 0.1 % for v2.2 minus v3.0 ozone and NO2, respectively.
Ozone columns were constructed from 14–52 km satellite and 0–14 km ozonesonde partial columns and compared with the ground-based total column measurements. The satellite-plus-sonde measurements agree with the ground-based ozone total columns with mean relative differences of 0.1–7.3 %. For NO2, partial columns from 17 km upward were scaled to noon using a photochemical model. Mean relative differences between OSIRIS, ACE-FTS and ground-based NO2 measurements do not exceed 20 %. ACE-MAESTRO measures more NO2 than the other instruments, with mean relative differences of 25–52 %. Seasonal variation in the differences between NO2 partial columns is observed, suggesting that there are systematic errors in the measurements and/or the photochemical model corrections. For ozone spring-time measurements, additional coincidence criteria based on stratospheric temperature and the location of the polar vortex were found to improve agreement between some of the instruments. For ACE-FTS v2.2 minus Bruker FTIR, the 2007–2009 spring-time mean relative difference improved from −5.0 ± 0.4 % to −3.1 ± 0.8 % with the dynamical selection criteria. This was the largest improvement, likely because both instruments measure direct sunlight and therefore have well-characterized lines-of-sight compared with scattered sunlight measurements. For NO2, the addition of a ±1° latitude coincidence criterion improved spring-time intercomparison results, likely due to the sharp latitudinal gradient of NO2 during polar sunrise. The differences between satellite and ground-based measurements do not show any obvious trends over the missions, indicating that both the ACE and OSIRIS instruments continue to perform well.
C. Adams et al.: Validation of ACE and OSIRIS. Published by Copernicus Publications on behalf of the European Geosciences Union.


Introduction
Consistent long-term measurements of ozone and NO2 are essential for the characterization of ozone depletion and recovery. Therefore, long-term evaluation of satellite measurements is necessary. The Optical Spectrograph and Infra-Red Imager System (OSIRIS) and the Atmospheric Chemistry Experiment (ACE) satellite instruments have been taking measurements since 2001 and 2003, respectively. While ozone and NO2 data products from both satellites have been validated (e.g. Brohede et al., 2008; Degenstein et al., 2009; Dupuy et al., 2009; Kerzenmacher et al., 2008), continued assessment assures long-term consistency within the datasets. Furthermore, the new ACE Fourier Transform Spectrometer (FTS) Version 3.0 (v3.0) ozone and NO2 data have not yet been validated.
Measurements and validation in the High Arctic present a unique set of challenges. There is reduced spatial coverage by ground-based measurements due to the logistical challenges of operating in a cold, remote, and largely unpopulated environment. Intercomparisons between measurements in the Arctic are complicated by the polar vortex, which isolates an air mass over the pole during the winter and spring. When the polar vortex is present, two instruments can sample air masses which are near each other spatially, but are isolated from one another. Therefore, coincident measurement pairs can include one measurement inside the vortex, with, e.g., low ozone and NO2, and one measurement outside the vortex. This reduces the apparent agreement between two datasets. In some validation studies, additional coincidence criteria based on dynamical parameters have been adopted in order to match similar air masses (e.g. Batchelor et al., 2010; Fu et al., 2011; Manney et al., 2007).
The Polar Environment Atmospheric Research Laboratory (PEARL) in Eureka, Canada (80° N, 86° W) is an excellent location for Arctic satellite validation. Measurements taken at PEARL have been included in numerous validation studies (e.g. Batchelor et al., 2010; Dupuy et al., 2009; Fraser et al., 2008; Fu et al., 2007, 2011; Kerzenmacher et al., 2005; Sica et al., 2008; Sung et al., 2007). PEARL (known as the Arctic Stratospheric Ozone Observatory, AStrO, prior to 2005) comprises three sites and has been operated by the Canadian Network for the Detection of Atmospheric Change (CANDAC) since 2005. Measurements included in this study were taken at the PEARL Ridge Lab (80.05° N, 86.42° W) and the Eureka Weather Station (79.98° N, 85.93° W), which is located 15 km from the Ridge Lab. Since August 2006, CANDAC instruments have recorded measurements of ozone and NO2, using ground-based zenith-sky differential optical absorption spectroscopy (DOAS) instruments and a Bruker Fourier transform infrared spectrometer (FTIR), when sunlight and weather permitted. Additional spring-time measurements were taken on a campaign basis as a part of the 2003 Stratospheric Indicators of Climate Change Campaign and the 2004-2011 Canadian Arctic ACE Validation Campaigns (e.g. Kerzenmacher et al., 2005). Brewer spectrophotometer measurements were also taken year-round for 2004-2011 by Environment Canada, with support from the Canadian Arctic ACE Validation Campaigns and CANDAC. This yields a multi-year dataset that can be used for long-term validation of satellite measurements.
The DOAS and Bruker FTIR instruments at PEARL are part of the Network for the Detection of Atmospheric Composition Change (NDACC). NDACC (formerly the Network for the Detection of Stratospheric Change, NDSC) was formed in 1991 and currently includes over 70 research stations world-wide, which monitor the stratosphere and the troposphere. Intercomparisons between the measurements can be used to assess consistency between NDACC datasets. Furthermore, the new NDACC guidelines and air ...
Atmos. Meas. Tech., 5, 927-953, 2012, www.atmos-meas-tech.net/5/927/2012/
Furthermore, the UT-GBS took summer and fall measurements at the PEARL Ridge Lab in 2008 and 2010. For 1999-2001, the UT-GBS was installed outside in a temperature-controlled aluminum case, while for 2003-2011, it was installed inside a viewing hatch at the PEARL Ridge Lab. The PEARL-GBS is an NDACC-certified instrument. It was assembled and permanently installed inside a viewing hatch in the PEARL Ridge Lab in August 2006 and has been taking measurements during the sunlit part of the year since then (Adams et al., 2010; Fraser, 2008; Fraser et al., 2009). The GBSs have similar input optics with a field-of-view of 2°. They both have three gratings, which are attached to a motorized turret. Resolution varies across the CCD chip from 0.5-2.5 nm for ozone; 0.5-1.0 nm for NO2 retrieved in the visible region (NO2-vis); and 0.2-1.0 nm for NO2 retrieved in the UV region (NO2-UV). Spectra from the GBSs are recorded using thermoelectrically cooled back-illuminated CCD detectors manufactured by ISA. The original UT-GBS CCD, used from 1999-2004, had 2000 × 800 pixels and reached temperatures of 230-250 K (Bassford et al., 2000). From 2005-2011, a 2048 × 512 pixel CCD, which operated at a temperature of 201 K, was used for the UT-GBS. The PEARL-GBS CCD is identical to the UT-GBS CCD, except it is coated with an enhanced broadband coating and its operating temperature oscillates slightly from 203-205 K on timescales of approximately 5 min.
The UT-GBS and PEARL-GBS measurements were analyzed using the settings described in Sect. 3. Since the UT-GBS and PEARL-GBS are very similar instruments and data were analyzed with the same settings, their columns agree within an average of 1 % for ozone, NO2-vis, and NO2-UV. Therefore, the measurements for the UT-GBS and PEARL-GBS were combined to form a single GBS dataset. For twilight periods when both instruments took the same measurement, data were averaged.

SAOZ DOAS instruments
The System D'Analyse par Observations Zenithales (SAOZ) (Pommereau and Goutail, 1988) instruments are deployed in a global network for measurements of stratospheric trace gases and are also NDACC-certified instruments. A SAOZ instrument was deployed at the PEARL Ridge Lab during each spring for the 2005-2011 Canadian Arctic ACE Validation Campaigns. SAOZ-15 and SAOZ-7 were deployed from 2005-2009 and 2010-2011, respectively. For 2008-2009 and 2011, the SAOZ instrument took measurements outside on the roof of the PEARL Ridge Lab, while in other years the SAOZ instrument was installed inside the PEARL Ridge Lab and took measurements through a UV-visible transparent window.
SAOZ-15 and SAOZ-7 are UV-visible grating spectrometers which measure in the 270-620 nm region with 1.0 nm resolution and a 10° field-of-view. They record spectra on uncooled 1024-pixel linear diode array detectors every fifteen minutes during the day and continuously between SZA 80-95°. SAOZ ozone and NO2 total columns were retrieved with the settings discussed in Sect. 3.

CANDAC Bruker FTIR
The CANDAC Bruker IFS 125HR Fourier transform infrared spectrometer is an NDACC-certified instrument that was installed inside the PEARL Ridge Lab in 2006 and is described in depth by Batchelor et al. (2009). The Bruker FTIR records spectra on either an InSb or HgCdTe detector. A KBr beamsplitter is used and eight narrow-band interference filters cover a range of 600-4300 cm−1. Solar absorption measurements consist of two to four co-added spectra recorded in both the forward and backward direction. Each measurement takes about 6 min and has a resolution of 0.0035 cm−1. No apodization is applied to the measurements.
The Bruker FTIR ozone and NO2 measurements are described by Batchelor et al. (2009) and Lindenmaier et al. (2010, 2011). The SFIT2 Version 3.92c (v3.92c) algorithm (Pougatchev et al., 1995) and HITRAN 2004 with updates were used in order to produce volume mixing ratio (VMR) profiles of the species using the optimal estimation technique. Ozone 14-52 km partial columns and total columns were retrieved in the 1000.0-1005 cm−1 microwindow. Ozone has uncertainties of 4.3 % for total columns and 3.8 % for partial columns. NO2 17-40 km partial columns were retrieved in five microwindows between 2914.590 and 2924.925 cm−1 with a mean uncertainty of 15.0 %. Only NO2 partial columns for SZA smaller than 80° were included in this study, due to oscillations in the NO2 profiles for larger SZA.

Brewer spectrophotometers
Brewer spectrophotometers measure total ozone columns using direct and scattered sunlight at UV wavelengths (e.g. Savastiouk and McElroy, 2005). Four Brewer spectrophotometers took measurements from 2004-2011 at both the PEARL Ridge Lab and the Eureka Weather Station. Brewers #021 and #192 are both MKIII double monochromators, which took measurements from 2004-2011 and 2010-2011, respectively. Brewer #069, a MKV single monochromator, took measurements from 2004-2011 and Brewer #007, a MKIV single monochromator, took measurements from 2005-2011. Data were analyzed using the standard Brewer algorithm (Lam et al., 2007), with small changes to the analysis parameters due to the high latitude of the measurements. The AMF was limited to be smaller than 5 instead of 3.5, which is acceptable under low ozone conditions and allows for more days with good data in the winter months. Furthermore, the ozone layer was set at 18 km instead of 22 km to better reflect Arctic conditions. For each day, ozone data from all available instruments were averaged to create one Brewer dataset. The random error in Brewer measurements is typically less than 1 % (Savastiouk and McElroy, 2005).

Ozonesondes
Ozonesondes are launched on a weekly basis from the Eureka Weather Station (Tarasick et al., 2005).

OSIRIS
OSIRIS was launched aboard the Odin spacecraft in February 2001 (Llewellyn et al., 2004; Murtagh et al., 2002). It observes limb-radiance profiles with a 1-km vertical field-of-view over altitudes ranging from approximately 10-100 km, with coverage of 82.2° N to 82.2° S. The grating optical spectrograph measures scattered sunlight from 280-800 nm, with 1-nm spectral resolution. OSIRIS measures within 500 km of Eureka several times per day and measures ozone and NO2 during the sunlit part of the year. The SaskMART v5.0x ozone dataset was used in this study. The SaskMART Multiplicative Algebraic Reconstruction Technique (Degenstein et al., 2009) combines ozone absorption information in both the UV and visible parts of the spectrum to retrieve number density profiles from the cloud tops to 60 km (down to a minimum of 10 km in the absence of clouds). SaskMART v5.0x ozone agrees with SAGE II (Stratospheric Aerosol and Gas Experiment) ozone profiles to within 2 % from 18-53 km (Degenstein et al., 2009). Random errors due to instrument noise in 14-52 km partial column measurements within 500 km of Eureka, calculated for a subset of the measurements, were on average 3.7 %. Systematic and other errors are expected to be on the same order as the instrument noise.
For NO2, the v3.0 Optimal Estimation data product was used. NO2 slant column densities (SCDs) are retrieved using the DOAS technique in the 435-451 nm range. These SCDs are converted to number density profiles from 10-46 km using an optimal estimation inversion, with high response for 15-42 km (Brohede et al., 2008). The precision of these measurements is 16 % between 15-25 km and 6 % between 25-35 km based on comparisons with other instruments (OSIRIS, 2011). The average random error in 17-40 km NO2 partial columns within 500 km of Eureka was 6.8 %.
The ACE-FTS is a high-resolution (0.02 cm−1) infrared FTS instrument, operating from 750-4400 cm−1, which measures more than 30 different atmospheric species. Based on a detailed CO2 analysis, pressure and temperature profiles are calculated from the spectra using a global nonlinear least squares fitting algorithm. Then VMR profiles are retrieved, also using a nonlinear least squares fitting algorithm. ACE-FTS v2.2 data with the ozone update (Boone et al., 2005) as well as preliminary v3.0 data were included in this study.
ACE-MAESTRO is a UV-visible-near-IR double spectrograph, with a resolution of 1.5-2.5 nm, and a wavelength range of 270-1040 nm (McElroy et al., 2007). ACE-MAESTRO v1.2 visible ozone update and UV NO2 were used for this study. ACE-MAESTRO VMRs were converted to number densities using pressure, temperature, and density information from the ACE-FTS v2.2 data.

DOAS measurements
The PEARL-GBS and SAOZ are both NDACC-certified instruments and, therefore, data retrieved from these instruments and submitted to the NDACC database are expected to agree well. Furthermore, the UT-GBS and SAOZ both met NDACC standards during the 2009 Cabauw Intercomparison of Nitrogen Dioxide measuring Instruments (Roscoe et al., 2010). GBS and SAOZ ozone and NO2 measurements have been compared during several Arctic and mid-latitude campaigns using the same analysis settings and the same software (Fraser et al., 2007, 2008, 2009).
In this study, GBS and SAOZ ozone total columns were retrieved independently, following the new NDACC guidelines (Hendrick et al., 2011), with different analysis software and small differences in retrieval settings. Therefore, this is a good example of the practical implementation of the new settings and the resulting homogeneity of the NDACC dataset. The NDACC UV-Visible Working Group is currently developing similar guidelines for NO2 and they will be made available in the near future. Therefore, for the present study, the GBS and SAOZ datasets were analyzed with their own preferred settings.

Differential slant column densities
The DOAS technique (e.g. Platt and Stutz, 2008) was used to retrieve the SAOZ, UT-GBS and PEARL-GBS differential SCDs (DSCDs). SAOZ DSCDs were retrieved using in-house software, while the GBS DSCDs were retrieved with the QDOAS software (Fayt et al., 2011). For SAOZ, a single reference spectrum was used each year, while for the GBS, daily reference spectra were selected. For both instruments, wavelengths were calibrated against the solar spectrum based on the reference solar atlas (Kurucz et al., 1984). Ozone DSCDs were retrieved using the settings recommended by the NDACC UV-visible working group (Hendrick et al., 2011). For SAOZ, ozone was retrieved in the 450-550 nm window. Windows of 450-545 nm and 450-540 nm were used for the UT-GBS and PEARL-GBS, respectively, because data quality decreased for larger wavelengths taken at the detector edge. The following cross-sections were all fitted during the DOAS procedure: ozone measured at 223 K (Bogumil et al., 2003), NO2 measured at 220 K (Vandaele et al., 1998), H2O (converted from line parameters given in Rothman et al., 2003), O4, and Ring (Chance and Spurr, 1997). The GBS DSCDs were retrieved using the wavelength-corrected Greenblatt et al. (1990) O4 cross-section, which was recommended by NDACC in 2009, while the SAOZ DSCDs were retrieved with the Hermans (2004) cross-section, which was included in the Hendrick et al. (2011) NDACC recommendations. Based on sensitivity tests performed using the GBS datasets, this is expected to have less than a 1 % impact on the DSCDs. An additional cross-section was also included in the GBS analysis to correct for systematic polarization errors.
GBS and SAOZ NO2 DSCDs were retrieved in three different wavelength regions. SAOZ NO2 was retrieved using the same methods and cross-sections as for ozone, in the 410-530 nm range, with a gap from 427-433 nm. The GBS DSCDs were retrieved in the 425-450 nm visible range (NO2-vis), when the 600 gr mm−1 and 400 gr mm−1 gratings were used, and the 350-380 nm UV range (NO2-UV), when measurements were taken with the 1200 gr mm−1 and 1800 gr mm−1 gratings. The NO2-vis measurements were retrieved with the same parameters and cross-sections as for ozone, except a first order offset was applied to correct for dark-current and stray-light. The GBS NO2-UV DSCDs were retrieved with the same retrieval settings as NO2-vis, with the addition of a BrO cross-section measured at 223 K (Fleischmann et al., 2004) and an OClO cross-section measured at 204 K (Wahner et al., 1987). Polarization correction cross-sections were not included in the GBS NO2-vis and NO2-UV retrievals because there was no evidence of polarization errors in the residuals, likely due to the small wavelength intervals of the analyses.

Vertical columns
Ozone and NO2 columns were retrieved using the Langley method with the settings described in Hendrick et al. (2011). For each twilight, DSCDs in the SZA 86-91° window were selected, when those SZAs were available. Otherwise, the nearest available 5° SZA window was used. For the GBS instruments, a daily average reference column density was calculated from the morning and evening twilights because a daily reference spectrum was used in the DSCD retrievals. For SAOZ, a single average of monthly average reference column densities was calculated for each spring. For both the GBS and SAOZ, total columns throughout the twilight were calculated using the reference column density and AMFs. A single column value was produced for each twilight from the weighted mean of the columns in the selected SZA range, weighted by the DOAS fitting error divided by the AMF.
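The Langley approach described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the GBS or SAOZ retrieval code: it assumes the standard DOAS relation DSCD = VCD × AMF − RCD, an unweighted least-squares line for the Langley plot, and inverse-variance weights built from the column error (fitting error divided by AMF). The function names `langley_fit` and `twilight_column` are hypothetical.

```python
def langley_fit(amfs, dscds):
    """Fit DSCD = VCD * AMF - RCD by ordinary least squares.

    Returns (vcd, rcd): the slope is the vertical column and the
    negative intercept is the reference column density (RCD).
    """
    n = len(amfs)
    mean_amf = sum(amfs) / n
    mean_dscd = sum(dscds) / n
    sxx = sum((a - mean_amf) ** 2 for a in amfs)
    sxy = sum((a - mean_amf) * (d - mean_dscd) for a, d in zip(amfs, dscds))
    vcd = sxy / sxx
    rcd = vcd * mean_amf - mean_dscd  # intercept of the line is -RCD
    return vcd, rcd


def twilight_column(amfs, dscds, fit_errors, rcd):
    """Weighted-mean vertical column for one twilight.

    Each column is (DSCD + RCD) / AMF with error fit_error / AMF.
    Inverse-variance weighting is an assumption of this sketch.
    """
    columns = [(d + rcd) / a for a, d in zip(amfs, dscds)]
    sigmas = [e / a for a, e in zip(amfs, fit_errors)]
    weights = [1.0 / s ** 2 for s in sigmas]
    return sum(w * c for w, c in zip(weights, columns)) / sum(weights)
```

In practice the RCD passed to `twilight_column` would be the daily (GBS) or seasonal (SAOZ) average described in the text rather than the value from a single Langley plot.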
For DOAS ozone retrievals, the inclusion of daily ozone data in the AMF calculations improves results, especially under vortex conditions (Bassford et al., 2001). Ozone total columns for both instruments were retrieved using the NDACC-recommended AMF LUTs (Hendrick et al., 2011). Daily AMFs are extracted from these LUTs based on the latitude and elevation of the PEARL Ridge Lab, day of year, sunrise or sunset conditions, wavelength, SZA, surface albedo, and ozone column. For the GBS, daily ozone total columns interpolated from ozonesonde data were input to the AMF LUTs, while for SAOZ, measured ozone SCDs for each twilight were input.
For NO2, the ozone profile has a small impact on DOAS AMFs (Bassford et al., 2001) and therefore daily ozone data is not necessary for the interpolation of AMFs. For the GBS measurements, daily AMFs were extracted from a new set of LUTs, developed by the Belgian Institute for Space Aeronomy (BIRA-IASB) (see Appendix A for details). The NO2 VMR below 17 km was set to zero, so these AMFs produce partial columns from 17 km upward. SAOZ data were analyzed with a single set of AMFs constructed from an average of summer evening composite profiles derived from POAM III (Polar Ozone and Aerosol Measurement) and SAOZ balloon measurements in the Arctic. 30 % of the NO2 in this profile is below 17 km. These SAOZ Arctic AMFs are then used to convert the measured slant column densities into total columns of NO2.
A detailed error analysis was performed on the GBS measurements, including random error as well as systematic errors from cross-sections, residual structure in DOAS fits, and AMFs. For NO2, the temperature dependence of the cross-section and the impact of the diurnal variation were also considered. A mean total ozone error of 6.2 % was calculated. This is slightly larger than the 5.9 % total error reported for NDACC ozone column measurements (Hendrick et al., 2011), but is consistent with the challenges of taking DOAS measurements at high latitudes (see Sect. 3.3), particularly during seasons when the 86-91° SZA window is not available. For SAOZ, the estimated total error in ozone is 5.9 % (Hendrick et al., 2011). For NO2, the precision and accuracy are estimated at 1.5 × 10^14 mol cm−2 and 10 %, respectively. When applied to the 2005-2011 Eureka measurements and added in quadrature, this yields an average 13.2 % total error in NO2.
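As a rough illustration of how an absolute precision and a fractional accuracy combine in quadrature for a single column, consider the sketch below. The function name and the example column value are hypothetical and are not taken from the Eureka dataset; the 13.2 % figure in the text is an average over many measurements and is not reproduced here.

```python
import math


def total_relative_error(column, precision, accuracy_fraction):
    """Combine an absolute precision and a fractional accuracy in quadrature.

    column and precision share the same units (e.g. molecules cm^-2);
    accuracy_fraction is dimensionless (0.10 for 10 %).
    Returns the total error as a fraction of the column.
    """
    if column <= 0:
        raise ValueError("column must be positive")
    random_fraction = precision / column
    return math.sqrt(random_fraction ** 2 + accuracy_fraction ** 2)


# Hypothetical NO2 column of 3.0e15 molecules cm^-2 with the SAOZ
# precision (1.5e14) and accuracy (10 %) quoted in the text.
example = total_relative_error(3.0e15, 1.5e14, 0.10)
```

Because the precision is absolute, the fractional total error grows as the column shrinks, which is why low-NO2 periods carry larger relative uncertainties.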

Effect of 24-h sunlight on DOAS analysis
The evolution of available SZA ranges above Eureka is shown in Fig. 6 of Fraser et al. (2009). At the summer solstice, the maximum SZA above Eureka is 76°. This yields AMFs of ∼4 for both ozone and NO2, which is approximately four times smaller than the typical AMF at SZA 90°. Furthermore, the range in AMFs for SZAs 86-91° is greater than 10, while for SZAs 71-76°, the range in AMFs is smaller than 1. This leads to larger uncertainties in the summertime reference column density calculations from the Langley plots. For Arctic ozone, these small AMFs coincide with low ozone total columns, leading to small differential optical depths in the DOAS fitting process.
Furthermore, the altitude sensitivity of DOAS measurements changes significantly between the spring and summer. The approximate averaging kernels for DOAS ozone and NO2 measurements were calculated using the method of Eskes and Boersma (2003) for SZA 90° in March and SZA 76° in June at 75° N and are shown in Fig. 1. For ozone, the averaging kernels were produced with the Total Ozone Mapping Spectrometer (TOMS) v8 climatology for 375 DU of ozone (Hendrick et al., 2011). For NO2, the sunrise NO2 profiles from the Lambert et al. (1999, 2000) climatology, described in Appendix A, were used. The averaging kernels indicate that for the large SZA, corresponding to spring and fall measurements at Eureka, sensitivity peaks in the stratosphere, with very little sensitivity to the troposphere. This is expected as strong scattering occurs in the stratosphere for these SZAs. In the summer, photons are scattered throughout the atmosphere, leading to enhanced sensitivity to the troposphere and clouds. This reduces the quality of the DOAS fits, particularly in the ozone retrieval window, as O4 and water vapour interfere with the measurements. This enhanced sensitivity to clouds also yields additional uncertainties in the AMFs (e.g. Bassford et al., 2001). Due to these factors, summertime DOAS measurements at 80° N are very challenging and therefore it is especially important to validate these measurements against other instruments.
Fig. 1 (caption, partial). For ozone, the Hendrick et al. (2011) AMF LUTs were used and for NO2, the AMF LUTs described in Appendix A were used. Note that for NO2, measurements do not extend below 17 km, as AMFs are set to zero below 17 km.

Methodology
Coincident measurements for this validation study were selected using the criteria described in Sect. 4.1. Satellite ozone partial columns were calculated and combined with ozonesonde data to create total columns as described in Sect. 4.2. Using the method described in Sect. 4.3, all NO2 measurements were scaled to local solar noon prior to comparison.
Agreement between these datasets was evaluated using several methods. The mean absolute difference $\Delta_{abs}$ between sets of coincident measurements ($M_1$ and $M_2$) is defined as
$$\Delta_{abs} = \frac{1}{N} \sum_{i=1}^{N} \left( M_{1i} - M_{2i} \right),$$
where $N$ is the number of measurements. The mean relative difference $\Delta_{rel}$ between $M_1$ and $M_2$ is defined as
$$\Delta_{rel} = 100\,\% \times \frac{1}{N} \sum_{i=1}^{N} \frac{M_{1i} - M_{2i}}{\left( M_{1i} + M_{2i} \right)/2}.$$
The standard deviation ($\sigma$) and the standard error ($\sigma / \sqrt{N}$) of the mean absolute and relative differences were also calculated. The standard error is the reported error throughout this paper. To assess correlation between the datasets, correlation plots were also produced. Measurement errors were not included in the linear regressions.
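The difference statistics described above can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: it assumes that each relative difference is normalized by the mean of the measurement pair and that the sample standard deviation (N − 1 denominator) is used. The function name `difference_stats` is hypothetical.

```python
import math


def difference_stats(m1, m2):
    """Mean absolute and relative differences (M1 minus M2) with standard errors.

    Returns a dict with keys "abs" and "rel"; each value is a
    (mean, standard_error) tuple. Relative differences are in percent and
    normalized by the pair mean (an assumption of this sketch).
    """
    if len(m1) != len(m2) or len(m1) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")

    abs_diffs = [a - b for a, b in zip(m1, m2)]
    rel_diffs = [100.0 * (a - b) / ((a + b) / 2.0) for a, b in zip(m1, m2)]

    def mean_and_stderr(values):
        n = len(values)
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / (n - 1)
        return mean, math.sqrt(variance) / math.sqrt(n)

    return {"abs": mean_and_stderr(abs_diffs), "rel": mean_and_stderr(rel_diffs)}
```

With this sign convention a positive mean means the first instrument measures more than the second, matching phrases such as "v2.2 minus v3.0" in the text.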
The OSIRIS, ACE-FTS, and ACE-MAESTRO satellite instruments have better vertical resolution than the ground-based ozone and NO2 instruments included in this study. Some studies (e.g. Batchelor et al., 2010; Dupuy et al., 2009; Kerzenmacher et al., 2008) account for this by smoothing the higher-resolution measurements by the averaging kernel of the lower-resolution measurements (Rodgers and Connor, 2003). Data were not smoothed in the present study because averaging kernel matrices were not available for some of the ground-based instruments and we preferred to treat all datasets in a consistent manner. In previous studies, ACE-FTS data have been smoothed to the resolution of the Bruker FTIR (Batchelor et al., 2010; Lindenmaier et al., 2011).
Smoothing is expected to have a small impact on ozone intercomparisons, since the Bruker FTIR has good sensitivity for most of the ozone column (Batchelor et al., 2009). A subset of ACE-FTS NO2 measurements was smoothed to the resolution of the Bruker FTIR by Lindenmaier et al. (2011). ACE partial columns for 17-40 km changed on average by 1 %, with a 4 % standard deviation, when smoothing was performed. This is small compared with the agreement between NO2 measurements in this study. The impact of smoothing on OSIRIS measurements is expected to be comparable.

Coincidence criteria
Temporal coincidence criteria were selected to maximize the number of coincident data points while minimizing the reliance on the photochemical model corrections for the diurnal variation of NO2, described in Sect. 4.3. For comparisons between the ACE-FTS v3.0, ACE-FTS v2.2, and ACE-MAESTRO measurements, coincidences were restricted to the same occultation. For the twilight-measuring instruments (ACE and the DOAS instruments), measurements were compared from the same twilight. This prevents morning twilight measurements from being scaled to the evening by the photochemical model and vice versa. For intercomparisons between all remaining instruments, a ±12 h coincidence criterion was used. Satellite ozone and NO2 measurements taken within a 500-km radius of the PEARL Ridge Lab were selected for intercomparisons with the ground-based measurements. Note that the satellite geolocations are given at the geometric tangent heights of 25 km for OSIRIS ozone, 35 km for OSIRIS NO2, and 30 km for ACE ozone and NO2. ACE solar occultations typically have ground tracks of 300-600 km (Dupuy et al., 2009), while OSIRIS limb measurements have ground tracks of ∼500 km.
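The spatial and temporal criteria above (within 500 km of the PEARL Ridge Lab and within ±12 h) could be applied along the lines of the following sketch. It assumes a spherical Earth with a mean radius of 6371 km and the haversine great-circle distance, and it ignores the tangent-height altitude; none of these choices is specified in the text. The names `great_circle_km` and `is_coincident` are hypothetical.

```python
import math
from datetime import datetime, timedelta

PEARL_LAT, PEARL_LON = 80.05, -86.42  # PEARL Ridge Lab, degrees N and E
EARTH_RADIUS_KM = 6371.0              # mean Earth radius (assumption)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_coincident(sat_lat, sat_lon, sat_time, ground_time,
                  max_km=500.0, max_dt=timedelta(hours=12)):
    """True if a satellite measurement meets both coincidence criteria."""
    close_enough = great_circle_km(PEARL_LAT, PEARL_LON, sat_lat, sat_lon) <= max_km
    recent_enough = abs(sat_time - ground_time) <= max_dt
    return close_enough and recent_enough
```

The same-twilight and same-occultation criteria would need an additional label per measurement (for example, morning or evening twilight) and are not covered by this sketch.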
None of the instruments included in this study measures air masses directly above PEARL. Instead they sample air masses along their lines-of-sight. Figure 2 shows the longitude and latitude of the sampled air masses in the stratosphere at 25 km for OSIRIS and 30 km for ACE, the Bruker FTIR, and the GBS. The OSIRIS measurements (panel a) do not reach latitudes above 82.2° N. The ACE measurements (panel b) are distributed approximately evenly within 500 km of PEARL. The Bruker FTIR spring-time measurements (panel c) follow the location of the sun during typical operational hours (e.g. ∼09:00-16:00 local time), with larger SZA measurements sampling air masses further from PEARL. The Brewer instruments, which also measure direct sunlight, would have similar sampling to the Bruker FTIR in the spring. The DOAS instruments' approximate sampling (panel d) depends on the location of the sun, as described in Appendix B. Like the Bruker FTIR, the DOAS measurements get closer to PEARL as the sun gets higher. Furthermore, as sunrises and sunsets shift northward in azimuth, the DOAS measurements shift north of PEARL.

Ozone
For comparison against ground-based total column ozone measurements, an altitude range of 14-52 km was chosen for satellite partial columns. This was the maximum altitude range for which the majority of OSIRIS, ACE-FTS, and ACE-MAESTRO profiles within 500 km of PEARL had available data. Ozonesonde data from the nearest day were added to the satellite profiles from 0-14 km, in a similar approach to Fraser et al. (2008). The resulting satellite-plus-sonde profile was smoothed from 12-16 km using a moving average filter in order to avoid discontinuities where the two profiles joined. No correction was applied above 52 km, since according to the United States 1976 Standard Atmosphere (Krueger and Minzner, 1976), there is less than 1 DU of ozone above 52 km. This is much smaller than the measurement errors of the various instruments (see Table 2).

NO 2 partial columns for satellite and Bruker FTIR measurements were calculated for 17-40 km. The lower value of this range was determined by GBS partial columns, which range from 17 km to the top of the atmosphere. The upper value of this altitude range was determined by the availability of OSIRIS, ACE-FTS, and ACE-MAESTRO data. For comparison between the satellite and GBS partial columns, no correction was applied above 40 km, because less than 1 % of the NO 2 column resides at these altitudes, which is much smaller than the measurement error (see Table 2). For comparison against the partial columns, the SAOZ total column measurements were scaled down by 30 %, corresponding to the fraction of NO 2 below 17 km in the profiles used to construct the SAOZ AMFs. These scaled SAOZ measurements are indicated by a * in the figures and tables throughout this text.

NO 2 has a strong diurnal variation and therefore corrections must be applied when comparing measurements taken at different times (e.g. Brohede et al., 2008; Kerzenmacher et al., 2008; Lindenmaier et al., 2011). A photochemical box model (Brohede et al., 2007b; McLinden et al., 2000) was used to simulate the evolution of NO 2 at Eureka (80° N) for each measurement day. Ozone profiles and temperatures from the ozonesonde launched closest to the measurement day were used to constrain the model.
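The satellite-plus-sonde construction described earlier in this section for ozone can be illustrated with a minimal sketch. It assumes that both profiles are number densities (molec cm^-3) already interpolated onto a common altitude grid, and that the moving average is a simple centred window; the function names, the window width, and the grid are hypothetical choices made for illustration and are not taken from the analysis code used in this study.

```python
DU_IN_MOLEC_PER_CM2 = 2.687e16  # one Dobson unit expressed as a column density


def merge_profiles(alt_km, sonde, satellite, join_km=14.0):
    """Use sonde values below join_km and satellite values at or above it."""
    return [s if z < join_km else t
            for z, s, t in zip(alt_km, sonde, satellite)]


def smooth_join(alt_km, values, lo_km=12.0, hi_km=16.0, half_width=1):
    """Apply a centred moving average only to levels inside [lo_km, hi_km].

    half_width is an assumed window parameter (in grid levels).
    """
    smoothed = list(values)
    for i, z in enumerate(alt_km):
        if lo_km <= z <= hi_km:
            start = max(0, i - half_width)
            stop = min(len(values), i + half_width + 1)
            smoothed[i] = sum(values[start:stop]) / (stop - start)
    return smoothed


def column_du(alt_km, density_cm3):
    """Trapezoidal integral of number density over altitude, returned in DU."""
    total = 0.0
    for i in range(len(alt_km) - 1):
        dz_cm = (alt_km[i + 1] - alt_km[i]) * 1.0e5  # km -> cm
        total += 0.5 * (density_cm3[i] + density_cm3[i + 1]) * dz_cm
    return total / DU_IN_MOLEC_PER_CM2
```

A 0-52 km column would then be obtained by calling `column_du` on the output of `smooth_join(alt_km, merge_profiles(alt_km, sonde, satellite))`.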
The seasonal variation of NO 2 17-40 km partial columns, calculated by the photochemical model using ozonesondes launched in 2009, is shown in Fig. 3a. NO 2 at solar noon (black) increases throughout the spring as PEARL exits polar night. It reaches a maximum during the summer period of 24-h sunlight and then decreases again in the fall. Throughout the year, the diurnal variation of NO 2 also changes, as can be seen by the morning (blue) and evening (red) twilight partial columns, where twilight is defined as SZA = 90° or the closest available SZA. In the spring and fall, NO 2 increases from morning to evening as NO x (NO x = NO 2 + NO) is released from its night-time reservoirs. In the summer 24-h sunlight, NO 2 decreases at noon as it is photolyzed to NO.
The ratios of NO 2 in the evening and morning twilights retrieved by the GBS instruments and calculated with the photochemical model are shown in Fig. 4 for 2007-2010. The measurements and model show good agreement. In spring 2007, when the vortex passed back and forth over PEARL, there is more scatter in the values than in the less dynamically active years of 2008-2010. This may be because the GBSs are sampling different air masses between the morning and evening twilights. Systematic discrepancies appear in the late fall (days 280-300), as PEARL enters polar night. This may be caused by measurement error since NO 2 concentrations become very low as NO 2 is converted to its night-time reservoirs.
The instruments compared in this study sample NO 2 at different times of day, or different parts of the diurnal cycle, as shown in Fig. 3b. Instruments that measure columns at larger SZAs (GBS, SAOZ, and ACE) tend to measure more NO 2 than instruments that measure columns at smaller SZAs (OSIRIS and Bruker FTIR), as can be seen in Fig. 3c. In order to correct for this, ratios of NO 2 partial columns at noon and the measurement time were calculated using the photochemical model. These ratios were multiplied by the measurements to produce an NO 2 partial column at noon. The resulting noon-time measurements are shown in Fig. 3d and were used in all NO 2 intercomparisons. The model profiles were not degraded to the resolution of the ground-based instruments prior to scale-factor calculations. The modeled ratio of twilight to noon NO 2 does not vary greatly with altitude for 15-35 km, where the bulk of the NO 2 column resides. Therefore, the error that this introduces is expected to be small. Lindenmaier et al. (2011) estimated the error in NO 2 scale factors from the same photochemical model at 7.7-16.4 % above PEARL, with the maximum values around days 90 and 240, when the ratio of twilight-to-noon NO 2 is largest.
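The noon scaling described above amounts to multiplying each measurement by a modelled ratio. A minimal sketch follows, assuming the modelled partial columns at the measurement time and at solar noon are already available for the measurement day; the function and argument names are hypothetical, and the numbers in the usage note are illustrative only.

```python
def scale_to_noon(measured_column, model_at_measurement, model_at_noon):
    """Scale a measured NO2 partial column to solar noon.

    The scale factor is the ratio of the modelled column at noon to the
    modelled column at the measurement time (same day, same altitude range).
    """
    if model_at_measurement <= 0.0:
        raise ValueError("modelled column at measurement time must be positive")
    scale_factor = model_at_noon / model_at_measurement
    return measured_column * scale_factor
```

For example, a twilight measurement of 3.0e15 molec cm^-2 with modelled columns of 4.0e15 molec cm^-2 at the measurement time and 2.0e15 molec cm^-2 at noon would be scaled by a factor of 0.5.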
In addition to affecting measurements taken at different times, the diurnal variation of NO 2 can introduce errors in individual measurements through the "diurnal effect", which is also referred to as "chemical enhancement" (e.g. Fish et al., 1995; Hendrick et al., 2006; McLinden et al., 2006; Newchurch et al., 1996). The diurnal effect is a result of sunlight passing through a range of SZAs, and hence sampling NO 2 at different points in its diurnal cycle, on its way through the atmosphere to the instrument. An error is introduced when this variation is not accounted for in the analysis, and the SZA assigned to a retrieved profile or column corresponds to the location of the instrument (for a ground-based observation) or to the location of the tangent height (for a limb observation). This effect is largest when the range of SZAs encountered includes twilight, when NO 2 varies rapidly.
For OSIRIS, these errors are relevant to measurements taken at SZAs greater than 85° (during the spring and fall, for measurements near PEARL) and can introduce errors up to 40 % below 25 km (Brohede et al., 2007a; McLinden et al., 2006). The 60° N error profiles shown in Fig. 9 of Brohede et al. (2007a) were applied to the OSIRIS profiles used in this study and yielded less than 10 % error in the 17-40 km NO 2 partial columns. For ACE-FTS and ACE-MAESTRO, measurements of NO 2 can be biased high below 25 km by up to 50 % (Kerzenmacher et al., 2008). When the ACE profiles included in this study were increased by 50 % below 25 km, the 17-40 km NO 2 partial columns increased by up to ∼20 %. Based on the viewing geometry described in Appendix B, DOAS instruments sample a 30-km layer of the atmosphere with an SZA that is up to 3° smaller than the SZA at the instrument. This causes an underestimation of NO 2 concentrations, particularly for measurements taken at large SZAs in the spring and fall. The Bruker FTIR NO 2 measurements were restricted to SZA less than 80°. Since NO 2 varies slowly for those SZAs, the diurnal effect for the Bruker FTIR is small.
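The sensitivity test applied to the ACE profiles (scaling the profile below 25 km and recomputing the 17-40 km partial column) can be sketched as follows. This is an illustration under stated assumptions: the profile is given on discrete altitude levels, the integration is trapezoidal, and the function names are hypothetical. Because the result is a ratio, the units of the profile cancel.

```python
def partial_column(alt_km, density, lo_km=17.0, hi_km=40.0):
    """Trapezoidal integral over the levels that fall within [lo_km, hi_km]."""
    levels = [(z, n) for z, n in zip(alt_km, density) if lo_km <= z <= hi_km]
    total = 0.0
    for (z0, n0), (z1, n1) in zip(levels, levels[1:]):
        total += 0.5 * (n0 + n1) * (z1 - z0)
    return total


def percent_change_after_scaling(alt_km, density, factor=1.5, below_km=25.0):
    """Percent change in the partial column when levels below below_km
    are multiplied by factor (1.5 corresponds to a 50 % increase)."""
    perturbed = [n * factor if z < below_km else n
                 for z, n in zip(alt_km, density)]
    base = partial_column(alt_km, density)
    return 100.0 * (partial_column(alt_km, perturbed) - base) / base
```

The size of the change depends on how much of the column lies below 25 km, so it varies from profile to profile.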
The instruments also sample the NO 2 maximum at different latitudes, as shown in Fig. 2. This is of particular concern at high latitudes during polar sunrise and sunset, as NO x is released from and returns to its night-time reservoirs, leading to a strong gradient in NO 2 , with lower concentrations at higher latitudes (Noxon et al., 1979). Using the photochemical model initialized with climatological ozone and temperature profiles (McPeters et al., 2007), 17-40 km NO 2 partial columns were calculated at various latitudes for the evening twilight (SZA = 90° or nearest available SZA). Ratios of NO 2 partial columns calculated at 78° N over 82° N are shown in grey in Fig. 5. This represents a typical latitude difference between measurements. On days 55 and 290, which are near the first and last measurement days of the season, NO 2 partial columns at 78° N are ∼7 times larger than at 82° N. The difference between the columns decreases throughout the spring until approximately days 80-85. Throughout the summer, no strong latitudinal gradient is observed in NO 2 until approximately days 265-270, as polar night begins. Ratios of NO 2 partial columns calculated at 76° N over 84° N, representing the maximum latitude difference between coincident measurements, are also shown in Fig. 5 in red. ACE measures above PEARL during the spring and fall periods for which this effect is significant. The impact of the latitudinal gradient in NO 2 on the spring-time intercomparisons is assessed in Sect. 7.

Ozone intercomparisons
Ozone partial and total column measurements made by the ground-based and satellite instruments were compared using the methods described in Sect. 4. The resulting mean absolute and relative differences are summarized in Table 3 and are discussed below. Available coincident measurements from all time periods are included in the intercomparisons.

Satellite versus satellite partial columns
The 14-52 km ozone partial columns measured by the satellite instruments were compared and are shown in the first section of Table 3. Partial columns from all four satellite instruments agree very well, with mean relative differences of 3 % or lower. Correlation plots between the satellite measurements are shown in Fig. 6.

The mean relative difference between ACE-FTS v3.0 and v2.2 ozone partial columns is 0.0 ± 0.2 %. Furthermore, the two datasets are extremely well correlated, with an R 2 value of 0.973. Note that the ACE-FTS v2.2 and v3.0 datasets have slightly different results when compared with the other instruments in this study because data were compared for different time periods, based on data availability. Therefore, fall 2010 and spring 2011 are included for v3.0, but not for v2.2.

ACE-FTS v2.2, ACE-MAESTRO v1.2, and OSIRIS SaskMART v5.0x ozone measurements have been compared in previous studies. Fraser et al. (2008) found a mean relative difference of +5.5 % to +22.5 % between ACE-FTS v2.2 and ACE-MAESTRO v1.2 ozone 16-44 km partial columns from 2004-2006, which is larger than the +2.7 % mean relative difference found in this study. Dupuy et al. (2009) compared profiles from ACE-FTS v2.2, ACE-MAESTRO v1.2, and OSIRIS SaskMART v2.1 data. They found ACE-MAESTRO profiles agreed with OSIRIS to ±7 % for 18-53 km. ACE-FTS profiles were typically +4 % to +11 % larger than OSIRIS profiles above 12 km. This is opposite to the findings of this study, in which ACE-FTS partial columns are lower than OSIRIS partial columns. Since OSIRIS SaskMART v2.1 and v5.0x are very similar for the 14-52 km altitude range, this difference is likely because the present study included only measurements taken in the Arctic, while Dupuy et al. (2009) considered measurements at all latitudes.

Figure caption (fragment): During the spring and fall, when the sun rises and sets, the evening twilight is defined as SZA = 90°. During polar night, the evening twilight is defined as the minimum available SZA. During summer, when the sun is above the horizon 24 h per day, the evening twilight is defined as the maximum available SZA. 76° N to 84° N is the maximum range over which coincident measurements were selected (see Fig. 2). The thin black line indicates a ratio of one.
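The summary statistics quoted in these intercomparisons (mean relative difference with its standard error, and R 2) can be computed as in the sketch below. The exact definitions are given in Sect. 4 and are not reproduced here, so the formulas in this sketch are assumptions: the relative difference is taken with respect to the mean of each coincident pair, the uncertainty is the standard error of the mean, and R 2 is the squared Pearson correlation coefficient. All function names are hypothetical.

```python
import math
import statistics


def relative_differences(a, b):
    """Percent difference (a minus b) relative to the mean of each pair."""
    return [100.0 * (x - y) / (0.5 * (x + y)) for x, y in zip(a, b)]


def mean_and_standard_error(values):
    """Mean and standard error (sample standard deviation over sqrt(N))."""
    if len(values) < 2:
        raise ValueError("need at least two coincident pairs")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def r_squared(a, b):
    """Squared Pearson correlation coefficient between two series."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab * s_ab / (s_aa * s_bb)
```

With this sign convention, a positive mean relative difference for "a minus b" means that instrument a measures larger columns than instrument b on average.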

Satellite versus ground-based columns
Mean absolute and relative differences between ground-based total columns and satellite-plus-sonde 0-52 km columns are included in the second section of Table 3. The satellite-plus-sonde measurements are consistently larger than the DOAS measurements and smaller than the Bruker FTIR measurements. The Brewer columns fall between the satellite-plus-sonde and other ground-based measurements. All ground-based measurements are within 7.3 % of the satellite-plus-sonde columns. Comparisons are not shown between ACE and the Brewer instruments because there are few coincident measurements, as ACE measures above PEARL in the early spring and late fall during periods when the SZA is too large for Brewer direct-sun measurements.
The timeseries of absolute differences between the four satellite-plus-sonde data products and the ground-based measurements are plotted in Figs. 7 and 8. The largest discrepancies occur in the spring-time for all measurements, with the Bruker FTIR measuring more ozone and the DOAS and Brewer instruments measuring less ozone than the satellite-plus-sonde columns. Although there is some year-to-year variability in the absolute differences, there is no apparent systematic change between the satellite and ground-based measurements in time. The year-to-year variability has no obvious relation to vortex activity above Eureka, such as sudden stratospheric warmings. This suggests that the performance of OSIRIS, ACE-FTS, and ACE-MAESTRO has not changed and their measurements of ozone within 500 km of PEARL are suitable for multi-year analyses.
Figure 9 shows correlation plots between the satellite-plus-sonde and ground-based total ozone columns. R 2 coefficients range from 0.518 to 0.910. Note that ACE-FTS v3.0 data were retrieved for spring 2011, which had abnormally low ozone values (Manney et al., 2011), and v3.0 therefore has higher correlation coefficients than v2.2.

Comparisons with NDACC DOAS measurements
Intercomparison results between SAOZ and GBS ozone total columns retrieved from 2005-2011 using the NDACC settings (described in Sect. 3) are shown in Fig. 10. The absolute difference between the SAOZ and GBS ozone total columns (panel a) shows good agreement for most years. SAOZ measures more ozone than the GBS in 2005 and 2007, two years in which the polar vortex passed over Eureka. This may be due in part to the different fields-of-view of the two instruments, leading to sampling of different air masses. The correlation plot between SAOZ and GBS ozone (panel b) shows a strong correlation between measurements, with an R 2 value of 0.898. For large ozone total columns, the GBS measures systematically lower than SAOZ. The mean relative difference for GBS minus SAOZ ozone is −3.2 ± 0.3 % (see third section of Table 3). This is well within the combined error of the two instruments and is comparable to the values of −6.9 % to −2.3 % found by Fraser et al. (2008, 2009) for 2005-2007, when SAOZ and GBS data were retrieved by the same analysis group, using the same analysis software. This demonstrates that, even when implemented independently with slight differences in the analysis settings and software, the new NDACC data standards are sufficient to produce a homogeneous ozone dataset.
Absolute differences (panel c) and correlations (panel d) between the DOAS (GBS and SAOZ) and Brewer data are also shown in Fig. 10. Good agreement between the instruments is evident throughout the year. The mean relative difference between the GBS [SAOZ] and Brewer total ozone column measurements is −1.4 % [+0.4 %]. This is better than the high-latitude agreement reported by Hendrick et al. (2011), who found that SAOZ ozone total columns were systematically lower than Brewer measurements at Sodankyla (67° N, 27° E) by 3-4 %, with the largest discrepancies in the spring and fall. Hendrick et al. (2011) accounted for this bias with the temperature dependence and uncertainty in the UV ozone cross-section used in Brewer measurements. The agreement between the GBS, SAOZ, and Brewer in the present study is remarkable given the challenges of taking DOAS measurements at 80° N, particularly in the summer (see Sect. 3.3).

Figure caption (fragment): Black lines indicate 1-1. Instrument abbreviations are given in Table 1.
The DOAS measurements are systematically lower than the Bruker FTIR total column and satellite-plus-sonde measurements by 1.6-9.2 % (see Table 3). Discrepancies between the satellite-plus-sonde and DOAS measurements, shown in Fig. 7, are particularly large in the spring. Correlation plots, shown in Fig. 9, indicate that the satellite-plus-sonde measurements are systematically higher than the DOAS measurements for high ozone columns. R 2 correlation coefficients for the satellite-plus-sonde versus GBS [SAOZ] ozone columns are greater than 0.84 [0.51].

Fraser et al. (2008) compared 15-40 km ACE partial columns with ozonesonde measurements added to the columns below 15 km against GBS and SAOZ. The GBS and SAOZ measurements had been retrieved by the same analysis group with identical settings, including the Burrows et al. (1999) ozone cross-section and the AMFs described in McLinden et al. (2002). They found mean relative differences between ACE-FTS v2.2 satellite-plus-sonde and GBS [SAOZ] measurements of +3.2 to +6.3 % [+0.1 to +4.3 %]. This is similar to the values of +6.5 % [+3.2 %] found in the present study. Fraser et al. (2008) found that the mean relative difference between ACE-MAESTRO v1.2 plus updates and the GBS [SAOZ] was −19.4 % to −1.2 % [−12.9 % to −1.9 %]. In this study, the mean relative difference for ACE-MAESTRO minus GBS [SAOZ] was +5.0 % [+1.6 %].
The differences between satellite and DOAS measurements in this study are larger than the values reported for comparisons between SAOZ and satellite ozone total column measurements in Table 10 of Hendrick et al. (2011). The satellite data products compared by Hendrick et al. (2011) were TOMS v8, GOME-GDP4 (Global Ozone Monitoring Atmospheric CartograpHY) products, SCI-TOSOMI (SCIAMACHY with TOSOMI algorithm developed at the Royal Netherlands Meteorological Institute, KNMI) and SCIA-OL3 (SCIAMACHY offline v3). For various stations, Hendrick et al. (2011) found that agreement between SAOZ total ozone columns and satellite total ozone columns ranged from −4.1 % to +3.1 %.

The agreement in Hendrick et al. (2011) is better than in the present study for several possible reasons. Hendrick et al. (2011) corrected satellite columns for temperature and SZA dependence using comparisons with the SAOZ measurements. Furthermore, DOAS retrievals are particularly challenging for higher latitudes (see Sect. 3.3). The present study compares different satellite instruments at 80° N, which is higher than the maximum latitude of 71° N considered by Hendrick et al. (2011). Finally, the satellite instruments compared by Hendrick et al. (2011) are all nadir sounders, which take dedicated ozone column measurements, while in the present study, satellite and ozonesonde profiles are combined to calculate a total column.

Comparisons with Bruker FTIR measurements
On average, the Bruker FTIR measures more ozone than most other instruments (see Table 3), with the largest differences observed in the spring (see Fig. 8). A mean relative difference of +0.1 % is calculated for OSIRIS-plus-sonde minus Bruker FTIR total columns, reflecting particularly good agreement in the summer and fall (see Fig. 8). The mean relative difference for the ACE-FTS v2.2 [v3.0] minus the Bruker FTIR is −6.7 % [−4.7 %]. A similar mean relative difference of −6.1 % is observed between ACE-MAESTRO and the Bruker FTIR. The comparisons worsen for 14-52 km partial columns, to −3.3 % for OSIRIS minus Bruker FTIR, −12.2 % [−9.6 %] for ACE-FTS v2.2 [v3.0] minus Bruker FTIR, and −11.2 % for ACE-MAESTRO minus Bruker FTIR. This may be due in part to the altitude resolution of the Bruker FTIR, which is lower than the satellite instruments (see Sect. 4). Bruker FTIR 10-50 km partial columns of ozone have on average 4.4 degrees of freedom for signal. Therefore, there is sufficient information to calculate partial columns in the 14-52 km altitude range.

Batchelor et al. (2010) found mean relative differences between ACE-FTS v2.2 and Bruker FTIR ozone partial columns of −7.45 % in spring 2007 and −4.26 % in spring 2008, for an average partial column altitude range of 6-43 km. This is similar to the results for total column intercomparisons in the present study. Batchelor et al. (2010) found that agreement improved with the addition of dynamical coincidence criteria. This is discussed further in Sect. 7. Dupuy et al. (2009) compared ACE-FTS v2.2 with ground-based FTS measurements at four locations north of 60° N latitude from 2004-2006. They applied the same smoothing and altitude selection scheme as Batchelor et al. (2010). No vortex filtering was performed. This yielded various partial column altitude ranges with minimum values of 10 km and maximum values of 46.9 km. Mean relative differences for satellite minus ground-based FTS of −9.1 % to +3.2 % for the ACE-FTS and −8.7 % to −0.5 % for ACE-MAESTRO were obtained. This is similar to the level of agreement found in the present study.

NO 2 intercomparisons
NO 2 partial column measurements made by the ground-based and satellite instruments were compared using the methods described in Sect. 4. ACE, OSIRIS, and Bruker FTIR partial columns were calculated for 17-40 km; GBS-UV and GBS-vis partial columns were retrieved for 17 km to the top of the atmosphere; and SAOZ total column measurements were scaled to partial column amounts (see Sect. 4.3). The resulting mean absolute and relative differences are summarized in Table 4 and are discussed below. Available coincident measurements from all time periods are included in the intercomparisons.

Satellite versus satellite partial columns
Mean absolute and relative differences between 17-40 km NO 2 partial columns measured by the satellite instruments are included in the first section of Table 4. R 2 values between all satellite measurements are greater than 0.61, except for ACE-MAESTRO versus OSIRIS, which has an R 2 value of 0.352. ACE-FTS v2.2 and v3.0 partial columns are nearly identical, with a mean relative difference of −0.2 ± 0.1 % and a correlation coefficient of 0.999. Note that the ACE-FTS v2.2 and v3.0 datasets have slightly different results when compared with the other instruments in this study because data were compared for different time periods, based on data availability. Therefore, fall 2010 and spring 2011 are included for v3.0, but not for v2.2.
The ACE-FTS v2.2 [v3.0] data are systematically higher than the OSIRIS dataset with mean relative differences of 6.4 % [7.4 %]. These values are outside the combined random errors of the instruments (see Table 2), suggesting that the discrepancies originate from systematic errors in the measurements, the photochemical model scale factors, or the diurnal effect. See Sect. 4.3 for a discussion of errors associated with scale factors and the diurnal effect. This is opposite to the results for globally coincident measurements in Kerzenmacher et al. (2008), who found that on average OSIRIS measurements were 17 % larger than ACE v2.2 measurements at the NO 2 maximum, with better agreement below the NO 2 maximum. This may be because Kerzenmacher et al. (2008) compared coincident measurements at all latitudes. Furthermore, they corrected for the diurnal effect in the ACE and OSIRIS measurements prior to comparison, eliminating a high bias in the ACE measurements below 25 km (see Sect. 4.3).
The OSIRIS, ACE-FTS v2.2, and ACE-FTS v3.0 datasets are 24.5-34.2 % lower than the ACE-MAESTRO measurements. Since the ACE-FTS and ACE-MAESTRO instruments take measurements at the same time and location, this bias cannot be attributed to coincidence criteria, photochemical model scaling, or the diurnal effect. The mean relative difference for ACE-FTS v2.2 minus ACE-MAESTRO of −24.5 % is comparable to the range of −5.[…] found by Fraser et al. (2008) for 22-40 km partial columns from 2004-2006 within 500 km of PEARL. This offset may be due in part to an error of up to a few kilometers in the ACE-MAESTRO tangent heights, which can lead to a high bias in ACE-MAESTRO NO 2 data at high latitudes (Kerzenmacher et al., 2008).

Satellite versus ground-based partial columns
The comparisons between the ground-based and satellite measurements are summarized in the second section of Table 4. The GBS measures partial columns from 17 km to the top of the atmosphere. SAOZ measures total columns, which were scaled down to partial columns from 17 km to the top of the atmosphere, as described in Sect. 4.3. Partial columns for 17-40 km were calculated from satellite and Bruker FTIR profiles. The amount of NO 2 above 40 km is negligible compared with the error in the NO 2 partial columns. Therefore, no correction was applied above 40 km. No coincidences were available between the Bruker FTIR and ACE instruments because only Bruker FTIR data for SZAs smaller than 80° were included in this study. On average, OSIRIS NO 2 measurements fall in the middle of the ground-based measurements, with mean relative differences of −7.8 % to +12.2 %. ACE-FTS measures larger values of NO 2 than the DOAS instruments, with mean relative differences of +10.3 % to +18.4 %, while ACE-MAESTRO has mean relative differences of +39.1 % to +52.1 %, compared with the DOAS instruments.
The timeseries of the absolute differences between the various satellite and ground-based measurements is shown in Fig. 12. Good agreement is observed between the DOAS and OSIRIS measurements in the spring and fall (panel a). In the summer, the GBS-vis and GBS-UV measure significantly larger NO 2 columns than OSIRIS. A similar seasonal variation is observed between OSIRIS and Bruker FTIR measurements (panel b). There is also a slight seasonality observed in differences between the Bruker FTIR and DOAS measurements (not shown here). This suggests that there are seasonal systematic errors in one or more of the datasets or in the diurnal correction scale factors. The differences between the ACE-FTS and the GBS partial columns (panels c, d) are more scattered in the spring than the fall, likely due to increased spatial variability of NO 2 when the polar vortex is changing structure and position rapidly in spring. The ACE-MAESTRO measurements are systematically higher than the DOAS measurements except in fall 2009 (panel e). Differences between the satellite and ground-based measurements do not change year-to-year, indicating that the satellite measurements have not changed systematically over time.
Correlations between the satellite and ground-based measurements are shown in Fig. 13. The Bruker FTIR and GBS measure more NO 2 than OSIRIS for larger NO 2 columns. This corresponds with seasonal variation in the discrepancies discussed above. ACE-MAESTRO measurements are not as well correlated with the ground-based measurements as OSIRIS and ACE-FTS.

Comparisons with DOAS measurements
The mean relative difference for GBS-vis minus GBS-UV NO 2 is +6.1 ± 0.4 %. This demonstrates good agreement, despite the shorter paths through the stratosphere taken by zenith-scattered light at UV wavelengths. This indicates that the new AMFs used for the GBS retrievals (see Appendix A) produce similar NO 2 columns for both UV and visible wavelengths.
The GBS-UV, GBS-vis, and SAOZ partial columns for 17 km to the top of the atmosphere all agree to within 6.4 %. Fraser et al. (2008, 2009) found a comparable agreement of 2.2-12.3 % for 2005-2007, when SAOZ and GBS NO 2 total columns were retrieved using the same analysis settings. In Fig. 14, the absolute difference (panel a) and correlation plot (panel b) between SAOZ and GBS-vis (grey) and GBS-UV (red) NO 2 are shown. The offset between the GBS minus SAOZ measurements appears to vary year-to-year, with positive offsets in 2005, 2006, 2010 and 2011 and negative offsets in 2007, 2008 and 2009. Similar year-to-year variation is observed in the differences between the satellite and SAOZ measurements (see Fig. 12). This may be because the SAOZ instrument measures total columns of NO 2 , which have been scaled down to partial columns from 17 km to the top of the atmosphere by a fixed value. Therefore, year-to-year differences in lower stratospheric NO 2 may be a factor. Furthermore, this may reflect year-to-year differences in the SAOZ reference column density, which is averaged on a campaign basis. These reasons may also explain why the GBSs are more strongly correlated with the satellites than SAOZ for NO 2 (see Fig. 13). The DOAS NO 2 measurements are systematically lower than the satellite NO 2 measurements in the spring. This may be due to the diurnal effect, which causes the GBS partial columns to be lower and the ACE partial columns to be higher (see Sect. 4.3).

Comparisons with Bruker FTIR measurements
The Bruker FTIR measures less NO 2 than the other instruments by 12.2-19.2 %. This is similar to the results of Lindenmaier et al. (2011), who found that Bruker FTIR NO 2 15-40 km partial columns were systematically lower than GEM-BACH (Global Environmental Multiscale stratospheric model with the online Belgium Atmospheric CHemistry package), CMAM-DAS (Canadian Middle Atmosphere Model Data Assimilation System), and SLIMCAT (Single-Layer Isentropic Model of Chemistry and Transport) models for the entire measurement season. Aside from these model comparisons, the PEARL Bruker FTIR NO 2 has not previously been validated. Kerzenmacher et al. (2008) compared ACE-FTS v2.2 and ACE-MAESTRO v1.2 partial columns with ground-based FTS measurements from other stations. For that study, the ACE data were smoothed to the resolution of the FTSs and partial columns were calculated in ranges determined by the instrument sensitivities. Mean relative differences for satellite minus ground-based FTS 14.8-32.9 km partial columns measured at Ny Alesund, Svalbard (78.9° N, 11.9° E) were +20.9 % for the ACE-FTS and +25.6 % for ACE-MAESTRO. This is consistent with the results of the present study.

Spring-time coincidence criteria
In this study, it was found that agreement between the various instruments was worse for both ozone and NO 2 in the spring. This may be attributed to the different lines-of-sight described in Sect. 4.1, which can result in instruments sampling very different air masses. This is especially relevant during spring when air masses inside and outside the vortex can be close spatially but isolated from one another. Ozone and NO 2 columns tend to be lower when the lower stratosphere (∼18-20 km) is inside the polar vortex. Furthermore, the latitudinal distribution of NO 2 has a strong gradient in the spring (see Sect. 4.3). Therefore, additional coincidence criteria were tested for the 2004-2009 GBS, SAOZ, Bruker FTIR, OSIRIS and ACE-FTS v2.2 datasets. Spring-time data for days 50-78 (19 February to 18/19 March) were selected as this is the approximate period of spring-time ACE measurements within 500 km of PEARL.
In order to identify similar air masses, derived meteorological products (DMPs) (Manney et al., 2007) from the GEOS v5.0.1 (Reinecker et al., 2008) analysis were calculated along the line-of-sight of the ACE-FTS, the Bruker FTIR, and the DOAS (GBS and SAOZ) measurements, and at the longitude and latitude of the OSIRIS 25-km tangent height. To determine whether measurements were sampling similar air masses, scaled potential vorticity (sPV), a dynamical parameter used to estimate the location of the vortex edge, and temperature profiles were considered. Lindenmaier et al. (2012) present the evolution of sPV in the lower stratosphere above Eureka for springs 1997-2011.
The selection criteria of Batchelor et al. (2010) could not be applied directly to the DOAS (GBS and SAOZ) and OSIRIS datasets because only pressure levels were available for these DMPs. Furthermore, the imposition of dynamical coincidence criteria at altitudes up to 46 km reduced the comparison statistics. Therefore, a new set of dynamical coincidence criteria was developed. The best results were obtained when dynamical coincidence criteria were imposed only in the lower stratosphere, where the bulk of the ozone column resides, at 131 hPa (∼14 km), 72.5 hPa (∼18 km), and 53.9 hPa (∼20 km). The difference in temperature between measurements at each of these layers was restricted to <10 K. Furthermore, coincident measurements were selected only if they were both inside (sPV > 1.6 × 10⁻⁴ s⁻¹) or both outside (sPV < 1.2 × 10⁻⁴ s⁻¹) the polar vortex at the selected pressure levels. All measurements on the vortex edge (sPV between 1.2 × 10⁻⁴ s⁻¹ and 1.6 × 10⁻⁴ s⁻¹) at 131 hPa, 72.5 hPa, and 53.9 hPa were rejected.

With these additional selection criteria, four out of nine instrument intercomparisons improved within standard error. For the remaining five intercomparisons, changes were not significant. The largest improvement was observed for ACE-FTS minus Bruker FTIR, which had a mean relative difference of −5.0 ± 0.4 % without the dynamical selection criteria and −3.1 ± 0.8 % with the dynamical selection criteria. These modest improvements may be limited by the narrow 500-km coincidence criterion already in place and the approximate line-of-sight calculations used in this study. The DOAS and OSIRIS instruments measure scattered sunlight, for which photons travel various paths through the atmosphere to the instrument. Therefore, precise line-of-sight calculations cannot be performed. The DOAS DMPs were calculated along the approximate line-of-sight (see Appendix B) and the OSIRIS DMPs were calculated at the fixed latitude and longitude of the 25-km tangent height. This weakens the dynamical selection criteria.

www.atmos-meas-tech.net/5/927/2012/ Atmos. Meas. Tech., 5, 927-953, 2012
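The dynamical coincidence criteria described above can be expressed as a simple pairwise test. The sketch below assumes that each measurement carries a DMP record mapping the three pressure levels to an (sPV, temperature) pair, and that the inside/outside test is applied level by level; the data layout and function names are hypothetical and chosen only for illustration.

```python
LEVELS_HPA = (131.0, 72.5, 53.9)   # lower-stratospheric pressure levels
SPV_INSIDE = 1.6e-4                # s^-1, above this: inside the vortex
SPV_OUTSIDE = 1.2e-4               # s^-1, below this: outside the vortex
MAX_DELTA_T_K = 10.0               # temperature difference must be < 10 K


def vortex_class(spv):
    """Classify an sPV value as 'inside', 'outside', or 'edge'."""
    if spv > SPV_INSIDE:
        return "inside"
    if spv < SPV_OUTSIDE:
        return "outside"
    return "edge"


def dynamically_coincident(dmp_a, dmp_b):
    """Return True if two measurements sample similar air masses.

    dmp_a and dmp_b map pressure level (hPa) -> (sPV in s^-1, temperature in K).
    """
    for level in LEVELS_HPA:
        spv_a, temp_a = dmp_a[level]
        spv_b, temp_b = dmp_b[level]
        class_a, class_b = vortex_class(spv_a), vortex_class(spv_b)
        if class_a == "edge" or class_b == "edge":
            return False               # vortex-edge measurements are rejected
        if class_a != class_b:
            return False               # one inside, one outside
        if abs(temp_a - temp_b) >= MAX_DELTA_T_K:
            return False
    return True
```

Pairs that pass this test would then be fed into the same difference statistics as the unfiltered coincidences.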
For NO 2 , dynamical coincidence criteria did not improve comparison results. The uncertainties in the measurements, the diurnal scale factors, and the diurnal effect likely overwhelm the impact of the polar vortex on these intercomparisons (see Sect. 4.3). This is consistent with the results of Kerzenmacher et al. (2008), who found that scatter in the differences between ACE and high-latitude ground-based FTIR measurements could not be attributed to the polar vortex.
Due to the latitudinal gradient of NO 2 in the early spring (see Sect. 4.3), a narrower ±1° latitude coincidence criterion was applied to the 30-km tangent height of the ACE measurements, the 35-km tangent height of OSIRIS measurements, and the location of the 30-km layer along the calculated line-of-sight of the DOAS measurements. Dynamical coincidence criteria were not included. The Bruker FTIR was not included in this comparison because NO 2 measurements for this time period were removed by the SZA < 80° filter.
The resulting mean relative differences with (grey circles) and without (red stars) the additional latitude coincidence criterion are shown in Fig. 15 for comparisons with twenty or more measurement points. The impact of the additional criterion suggests that the latitudinal gradient of NO 2 plays a role in the intercomparisons. During the time period considered, the average latitude was 77.9° N for GBS-vis, 80.2° N for GBS-UV, 78.1° N for SAOZ, 80.2° N for OSIRIS, and 79.6° N for ACE-FTS. Therefore, with the new criterion, the GBS-vis, SAOZ, and ACE-FTS measurements (taken on average at lower latitudes) decrease relative to the OSIRIS measurements (taken on average at higher latitudes). Mean relative differences between OSIRIS and SAOZ improve from −5.1 ± 1.6 % to +2.3 ± 1.7 % with the latitude filter. Furthermore, mean relative differences between OSIRIS and ACE-FTS improve from −6.7 ± 2.4 % to −1.1 ± 2.1 %. Agreement between OSIRIS and GBS-vis also improves, but the improvement is not significant within standard error. The mean relative difference between the GBS-UV and OSIRIS NO 2 (which measure at approximately the same average latitude) changes only by a small amount. While some of these improvements may be due in part to the isolation of similar dynamical air masses, this suggests that the latitudinal distribution of NO 2 plays a significant role in validation exercises at high latitudes.
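The ±1° latitude criterion can be sketched as a filter on already-coincident pairs. The sketch assumes each pair stores the latitude assigned to each measurement (tangent height or 30-km line-of-sight point, as appropriate for the instrument) together with its noon-scaled column; the tuple layout and function names are hypothetical.

```python
def within_latitude(lat_a_deg, lat_b_deg, max_diff_deg=1.0):
    """True if the two sampling latitudes differ by no more than max_diff_deg."""
    return abs(lat_a_deg - lat_b_deg) <= max_diff_deg


def filter_pairs_by_latitude(pairs, max_diff_deg=1.0):
    """Keep only pairs that satisfy the latitude criterion.

    Each pair is (lat_a_deg, column_a, lat_b_deg, column_b).
    """
    return [p for p in pairs if within_latitude(p[0], p[2], max_diff_deg)]
```

Applying the filter before computing mean relative differences gives the "with latitude criterion" comparison; skipping it gives the baseline.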

Conclusions
Ground-based and satellite ozone and NO2 columns were compared for satellite measurements within 500 km of the PEARL Ridge Lab. Satellite ozone and NO2 partial columns were calculated from 14-52 km and 17-40 km, respectively. For comparison with ground-based measurements, satellite-plus-sonde 0-52 km columns were calculated by adding 0-14 km ozonesonde data to the satellite partial columns. For NO2 intercomparisons, the satellite data were compared directly to the ground-based data, as all ground-based instruments except SAOZ measured partial columns above 17 km. For SAOZ, the total column measurements were scaled down to partial columns from 17 km to the top of the atmosphere. All satellite and ground-based NO2 measurements were scaled to solar noon with the same photochemical model prior to comparison to account for the diurnal variation of NO2.
DOAS ozone total columns were retrieved for the GBS and SAOZ by independent analysis groups using the new NDACC guidelines (Hendrick et al., 2011). For DOAS NO2, NDACC-recommended settings do not yet exist. GBS NO2 partial columns for 17 km to the top of the atmosphere were calculated in the 425-450 nm window (GBS-vis) and the 350-380 nm window (GBS-UV), using the new AMF LUTs described in Appendix A. The mean relative difference for GBS-vis minus GBS-UV measurements was +6.1 %, indicating that, despite the challenges of retrieving stratospheric columns at UV wavelengths, the GBS-UV measurements perform well. The GBS NO2 agreed to within 6.5 % of the SAOZ measurements, which had been calculated using SAOZ Arctic AMFs and then scaled down to partial columns by a fixed scale factor.
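The scaling of NO2 measurements to solar noon can be illustrated with a simple ratio approach. The sketch below assumes that the photochemical model supplies a partial column at the measurement time and at local solar noon on the same day, and that the measured column is multiplied by their ratio; the function and argument names are hypothetical and the exact implementation used in this study may differ.

```python
def scale_to_solar_noon(measured_column, model_at_measurement, model_at_noon):
    """Scale a measured NO2 partial column to local solar noon.

    Assumes ratio scaling: measured * model(noon) / model(measurement time).
    Both model columns must be in the same units as each other.
    """
    if model_at_measurement <= 0:
        raise ValueError("model column at the measurement time must be positive")
    return measured_column * model_at_noon / model_at_measurement
```

Because the same model is applied to every instrument, an error in the modelled diurnal cycle would affect instruments measuring at twilight and those measuring near noon differently.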
Partial columns measured by the various satellite instruments showed good agreement for measurements within 500 km of PEARL. For ozone, all satellite instruments agreed with each other within 3 %. For NO2, all satellite instruments except for ACE-MAESTRO agreed within 7.4 %. ACE-MAESTRO NO2 measurements were systematically higher than the others by 24.5-34.2 %, perhaps due to a problem with tangent height gridding (Kerzenmacher et al., 2008; Manney et al., 2007). ACE-FTS NO2 is systematically larger than OSIRIS by 6.4-7.4 %, perhaps due to the diurnal effect on the ACE-FTS measurements (Kerzenmacher et al., 2008). ACE-FTS v2.2 and v3.0 ozone and NO2 partial columns were found to be strongly correlated, with mean relative differences of 0.0 ± 0.1 % and −0.2 ± 0.1 %, respectively. This indicates that ACE-FTS v2.2 and v3.0 partial columns of ozone and NO2 within 500 km of PEARL are nearly identical.
Satellite measurements were validated against four ground-based ozone and four ground-based NO2 datasets from PEARL. Satellite-plus-sonde measurements agree with ground-based total ozone columns with a maximum mean relative difference of 7.8 %. The Bruker FTIR and satellite instruments measure larger ozone total columns than the DOAS and Brewer instruments, with the largest discrepancies in the spring.
For NO2, OSIRIS, ACE-FTS v2.2 and ACE-FTS v3.0 data agreed with all ground-based measurements to within 20 %. ACE-MAESTRO measured systematically larger NO2 than the ground-based instruments, with mean relative differences of 39.1-52.1 %. The Bruker FTIR measured systematically lower NO2 than the other instruments, which is similar to the comparison results of Kerzenmacher et al. (2008) for other Arctic ground-based FTIR instruments. In the spring, the GBS and SAOZ also measured lower NO2 partial columns than the satellite instruments, perhaps in part due to the diurnal effect (see Sect. 4.3). Large seasonal variation in the differences between satellite and ground-based NO2 measurements was observed, with more scatter in the differences in the spring than in the fall. The differences between OSIRIS and the ground-based measurements varied systematically throughout the year, reaching minima in the summertime. This could point to seasonal systematic errors in the measurements or in the diurnal scaling applied prior to intercomparison.
Since intercomparison results for both ozone and NO2 columns were worse in the spring, several filtering tests were applied to the datasets. The addition of dynamical coincidence criteria in the lower stratosphere improved the agreement between some of the datasets by 1-3 %. This improvement is likely limited because the 500-km distance coincidence criterion was already narrow and the line-of-sight calculations for the DOAS and OSIRIS instruments are approximate. Furthermore, an additional latitude-filtering criterion was tested on the NO2 measurements in order to account for the strong latitudinal gradient in NO2 at high latitudes in the spring and fall. The addition of latitudinal filtering improved agreement between some spring measurements.
For both ozone and NO2, the OSIRIS, ACE-FTS and ACE-MAESTRO satellite measurements do not change systematically relative to ground-based measurements taken from 2003 to 2011. This indicates that these satellite instruments continue to perform well and demonstrates the usefulness of acquiring long-term datasets at PEARL.
10, 30, 50, 70, 80, 82.5, 85, 86, 87, 88, 89, 90, 91, and 92°
intercomparison exercises (Hendrick et al., 2006; Wagner et al., 2007). Parameter values used to initialize UVSPEC/DISORT for the calculation of the AMF LUTs are summarized in Table A1. Ozone, temperature, and pressure profiles are taken from the TOMS v8 climatology, which is similar to the climatology of McPeters et al. (2007). Since this climatology is limited to the 0-60 km altitude range, the ozone, temperature, and pressure profiles are complemented above 60 km by the Air Force Geophysical Laboratory (AFGL) Standard Atmosphere to match the 0-90 km altitude grid chosen in UVSPEC/DISORT. The NO2 profiles are also complemented above 60 km by the AFGL Standard Atmosphere and set to zero below 17 km altitude. Therefore the calculated AMFs are purely stratospheric. The surface albedo and altitude output values (varying from 0 to 1 and 0 to 4 km, respectively) allow coverage of all stations with UV-visible instruments. For the aerosol settings, an extinction profile corresponding to a background aerosol loading has been selected from the aerosol model of Shettle (1989) included in UVSPEC/DISORT. Therefore, as mentioned above, the NO2 AMF LUTs are not applicable to times when there are large volcanic eruptions such as Mount Pinatubo in 1991.
The calculated LUTs depend on the following set of parameters: latitude, day of year, sunrise or sunset conditions, wavelength, SZA, surface albedo, and altitude. As for the ozone AMF LUTs described in Hendrick et al. (2011), an interpolation routine has been developed for extracting appropriately parameterized NO2 AMFs for various stations with UV-visible instruments. A global monthly mean climatology of the surface albedo derived from satellite data at 380 and 494 nm (Koelemeijer et al., 2003) was coupled to the interpolation routine, so the latter can be initialized with realistic albedo values in a transparent way.
Appendix B

Line-of-sight of zenith-scattered measurements
Zenith-sky instruments sample the atmosphere along a line-of-sight which varies in latitude and longitude with altitude, SZA, solar azimuth angle, and wavelength. This appendix outlines a method for calculating the approximate line-of-sight of zenith-scattered measurements, which is used in the calculation of DMPs and the application of additional coincidence criteria.
For measurements above SZA 85°, most light is scattered at an altitude called the scattering height, as shown in Fig. 1 of Solomon et al. (1987). The radiance of sunlight in the zenith as a function of scattering height was calculated with a radiative transfer model (McLinden et al., 2002), using the methods described by Solomon et al. (1987). The zenith-scattered radiance at the surface versus the scattering altitude is shown in panel a of Fig. B1 for various SZA and wavelengths. The wavelengths correspond to the GBS DSCD retrieval windows described in Sect. 3.1 (500 nm for ozone, 425 nm for NO2-vis, and 365 nm for NO2-UV). Approximate scattering heights for the various SZA and wavelengths were calculated by taking the weighted means of the scattered radiances and are shown in Table B1. The scattering height is lower for longer wavelengths and for smaller SZA, as expected.
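The weighted mean used to obtain an approximate scattering height can be written as a short sketch. It assumes that the radiative transfer model output is available as a list of scattering altitudes and the corresponding zenith-scattered radiances at the surface; the function name and input layout are assumptions for illustration.

```python
def scattering_height(altitudes_km, radiances):
    """Radiance-weighted mean scattering altitude in km.

    altitudes_km: scattering altitudes on the model grid
    radiances:    zenith-scattered radiance at the surface for each altitude
                  (any consistent units; only relative weights matter)
    """
    if len(altitudes_km) != len(radiances):
        raise ValueError("altitude and radiance lists must have equal length")
    total = sum(radiances)
    if total <= 0:
        raise ValueError("total radiance must be positive")
    return sum(z * r for z, r in zip(altitudes_km, radiances)) / total
```

A single height per SZA and wavelength is a simplification, since in practice light is scattered over a range of altitudes.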
Using the scattering height and SZA, the distance between the PEARL Ridge Lab and the ground location directly below the sampled air mass can be calculated, using the geometry in Solomon et al. (1987). The latitude and longitude of the sampled air mass can then be calculated from the distance and the solar azimuth angle. The distances from the PEARL Ridge Lab are shown in panel b of Fig. B1 and vary considerably depending on the SZA, altitude, and wavelength. At an altitude of 18 km, in the lower stratosphere where springtime ozone depletion can occur, measurements can range from directly above the PEARL Ridge Lab for NO2-UV to 175 km away. At 30 km, DOAS instruments sample an air mass up to 400 km away from the PEARL Ridge Lab.
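A simplified version of this geometry is sketched below. It assumes a spherical Earth, a single scattering point directly above the station, a straight (unrefracted) solar ray, and SZA of at most 90°; below the scattering height the light is taken to travel vertically, so the horizontal distance is zero. These simplifications and all function names are assumptions for illustration and do not necessarily reproduce the exact formulation of Solomon et al. (1987).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical Earth assumed)


def horizontal_distance_km(z_km, z_scatter_km, sza_deg):
    """Ground distance from the station to the point below the air mass
    sampled at altitude z_km, for a given scattering height and SZA."""
    if not 0.0 <= sza_deg <= 90.0:
        raise ValueError("this sketch only handles 0 <= SZA <= 90 degrees")
    if z_km <= z_scatter_km:
        return 0.0  # light travels vertically below the scattering height
    theta = math.radians(sza_deg)
    r_s = EARTH_RADIUS_KM + z_scatter_km
    r = EARTH_RADIUS_KM + z_km
    # Path length along the solar ray from the scattering point up to radius r
    s = -r_s * math.cos(theta) + math.sqrt(r * r - (r_s * math.sin(theta)) ** 2)
    # Earth-central angle between the station and the point on the ray
    gamma = math.asin(s * math.sin(theta) / r)
    return EARTH_RADIUS_KM * gamma


def sampled_location(lat_deg, lon_deg, distance_km, azimuth_deg):
    """Latitude and longitude reached from (lat, lon) after travelling
    distance_km along a great circle with initial bearing azimuth_deg."""
    phi1, lam1 = math.radians(lat_deg), math.radians(lon_deg)
    delta = distance_km / EARTH_RADIUS_KM
    az = math.radians(azimuth_deg)
    sin_phi2 = (math.sin(phi1) * math.cos(delta)
                + math.cos(phi1) * math.sin(delta) * math.cos(az))
    phi2 = math.asin(sin_phi2)
    lam2 = lam1 + math.atan2(math.sin(az) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * sin_phi2)
    lon2 = (math.degrees(lam2) + 540.0) % 360.0 - 180.0  # wrap to [-180, 180)
    return math.degrees(phi2), lon2
```

In this sketch the solar azimuth angle is passed as the bearing from the station toward the sun, so that the sampled air mass lies on the sunward side of the station.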
Approximate column averaging kernels for DOAS measurements of ozone and NO2 in March at SZA 90° and June at SZA 76°. For ozone, the Hendrick et al. (

Fig. 2. Location of ozone air mass sampled by (a) all OSIRIS scans and (b) ACE-FTS v2.2 occultations used in this study; and (c) Bruker FTIR and (d) GBS spring-time measurements. PEARL is indicated by the red star. Locations of the OSIRIS scans are shown for the 25-km air mass, while all other measurements are shown for the 30-km air mass.

Fig. 3. Seasonal evolution of NO2 in 2009. (a) 17-40 km partial columns calculated by the photochemical model initialized with ozonesondes for morning twilight (thick cyan line), evening twilight (red dashed line), and solar noon (thin black line). (b) Measurement SZAs. Note that ACE, the GBS, and SAOZ all measure at approximately the same SZA and, therefore, have overlapping data points. Similarly, OSIRIS and the FTIR measure at approximately the same SZA. (c) NO2 partial columns measured by ground-based and satellite instruments. (d) Same as (c) except all measurements scaled to solar noon using the photochemical model. Instrument abbreviations are given in Table 1.

Fig. 4. Ratio of evening twilight to morning twilight NO2 as measured by the GBS-vis (black squares) and GBS-UV (blue circles) and as calculated using a photochemical model (red dots). Ratios are plotted against day of year for (a) 2007, (b) 2008, (c) 2009, and (d) 2010.
have R² values of 0.821 or greater.

Fig. 5. Ratios of 17-40 km NO2 partial columns at various latitudes during the evening twilight, calculated with the photochemical model initialized with climatological ozone and temperatures. During the spring and fall, when the sun rises and sets, the evening twilight is defined as SZA = 90°. During polar night, the evening twilight is defined as the minimum available SZA. During summer, when the sun is above the horizon 24 h per day, the evening twilight is defined as the maximum available SZA. 76° N to 84° N is the maximum range over which coincident measurements were selected (see Fig. 2). The thin black line indicates a ratio of one.
Fig. 10. (a) Absolute difference (circles) between GBS and SAOZ ozone total columns. (b) Correlation between GBS and SAOZ ozone total columns. (c) Absolute difference (circles) between GBS (grey) and SAOZ (red) minus Brewer ozone total column measurements. (d) Correlation for GBS (grey) and SAOZ (red) versus Brewer ozone measurements. In (a) and (c), the solid black lines indicate the zero line and the dashed lines indicate mean absolute differences. In (b) and (d), the solid black lines indicate the 1-1 line and the dashed lines indicate linear fit (m = fitted slope, y = fitted y-intercept).

Fig. 13. As for Fig. 6, satellite versus ground-based NO2 partial columns. Note that comparisons are not shown for ACE versus the Bruker FTIR because most ACE measurements above PEARL were for early spring and late fall when SZA > 80° and Bruker measurements were excluded for these SZA.
Fig. 14. (a) Absolute difference (circles) and mean absolute difference (dashed line) between GBS-vis (grey) and GBS-UV (red) minus SAOZ NO2 partial column measurements. The solid black line indicates zero. (b) Correlation for GBS-vis (grey) and GBS-UV (red) versus SAOZ NO2 partial column measurements. The solid black line indicates the 1-1 line and the dashed line indicates linear fit (m = fitted slope, y = fitted y-intercept).
Fig. B1. (a) Zenith-scattered radiance at the surface and (b) horizontal distance of sampled air mass from the PEARL Ridge Lab as a function of the altitude of the sampled air mass for various SZA and wavelengths. Note that the horizontal distance of the sampled air mass was calculated with fixed scattering heights, given in Table B1.

Table 2. Mean percent error of various measurements. Square brackets indicate errors in partial columns. Error sources are described in Sect. 2. Instrument abbreviations are summarized in Table 1. Note that for some instruments, error estimates include systematic and random errors, while for others, only random errors are calculated.

Table 4. As for Table 3, NO2 partial columns. All measurements were scaled to local solar noon using the photochemical model. Instrument abbreviations are summarized in Table 1.
* Indicates scaling of primary total column measurements to partial columns.

Atmos. Meas. Tech., 5, 927-953, 2012 www.atmos-meas-tech.net/5/927/2012/
instruments, and at the longitude relative difference between the GBS and SAOZ ozone total columns was −3.2 %. The DOAS instruments agreed with the Brewer to 0.4-1.4 %, indicating that the NDACC settings perform well, even in the summer months at PEARL, when the maximum SZA of 76° makes DOAS measurements challenging. Therefore, the NDACC settings and AMFs for DOAS ozone are successful in producing a homogeneous and accurate dataset at 80° N.

Table A1. Parameters used to initialize the UVSPEC/DISORT radiative transfer model for the calculation of the NO2 AMF LUTs.

Table B1. Mean scattering height (z) for zenith-sky measurements at various SZA and wavelengths.