Relative drifts and stability of satellite and ground-based stratospheric ozone profiles at NDACC lidar stations

The long-term evolution of stratospheric ozone at different stations in the low and mid-latitudes is investigated. The analysis is performed by comparing the collocated profiles of ozone lidars, at the northern mid-latitudes (Meteorological Observatory Hohenpeißenberg, Haute-Provence Observatory, Tsukuba and Table Mountain Facility), tropics (Mauna Loa Observatory) and southern mid-latitudes (Lauder), with ozonesondes and space-borne sensors (SBUV(/2), SAGE II, HALOE, UARS MLS and Aura MLS), extracted around the stations. Relative differences are calculated to find biases and temporal drifts in the measurements. All measurement techniques show their best agreement with respect to the lidar at 20–40 km, where the differences and drifts are generally within ±5 % and ±0.5 % yr−1, respectively, at most stations. In addition, the stability of the long-term ozone observations (lidar, SBUV(/2), SAGE II and HALOE) is evaluated by the cross-comparison of each data set. In general, all lidars and SBUV(/2) exhibit near-zero drifts and the comparison between SAGE II and HALOE shows larger, but insignificant drifts. The RMS of the drifts of lidar and SBUV(/2) is 0.22 and 0.27 % yr−1, respectively, at 20–40 km. The average drifts of the long-term data sets, derived from various comparisons, are less than ±0.3 % yr−1 in the 20–40 km altitude range at all stations. A combined time series of the relative differences between SAGE II, HALOE and Aura MLS with respect to lidar data at six sites is constructed, to obtain long-term data sets lasting up to 27 years. The relative drifts derived from these combined data are very small, within ±0.2 % yr−1.


Introduction
The discovery of the Antarctic ozone hole (Farman et al., 1985) and the understanding of the negative impacts of ozone depleting substances (ODS) on the evolution of the ozone layer led to the creation of international treaties (Vienna Convention in 1985, Montreal Protocol in 1987), which, to a large extent, have phased out production and emission of harmful chlorofluorocarbons. The analysis of stratospheric ozone trends in the wake of declines in the abundances of ODSs in the stratosphere is currently the focus of stratospheric ozone research. Statistical studies of ozone content in the upper stratosphere have revealed a strong decreasing trend until the mid-1990s and a levelling off after 1996, consistent with the decreasing trend in upper stratospheric HCl (Reinsel et al., 2002; Newchurch et al., 2003; WMO, 2007; Jones et al., 2009; Steinbrecht et al., 2009a). A study by Steinbrecht et al. (2006) found upper stratospheric ozone trends of about −6, −4.5 and −8 % decade−1 at northern mid-latitude, subtropical and southern mid-latitude stations, respectively, before 1997. After 1997, changes in the trends of about 7, 7 and 11 % decade−1 were evaluated at the respective stations.
In the lower stratosphere too, studies have shown a negative trend until the mid-1990s and a positive trend afterwards at selected low and mid-latitude regions (Yang et al., 2006; Zanis et al., 2006). These studies suggest that the decrease in ozone depletion between 18 and 25 km is consistent with the reduction in stratospheric chlorine and bromine amounts, whereas below 18 km the increase in ozone is most likely driven by changes in atmospheric transport. Dhomse et al. (2006) found that the rapid increase of Northern Hemispheric total ozone is due to the effect of enhanced residual circulation in recent years, which is also confirmed in a study by Harris et al. (2008). Several studies (e.g. Weatherhead and Anderson, 2006) reported that an understanding of ozone recovery to the pre-1980 levels is possible only after differentiating the effects of transport, temperature, and solar cycle on observed ozone changes. Hence, an accurate evaluation of ozone trends and an understanding of the factors playing important roles in the increase or decrease of ozone are necessary to evaluate the efficiency of the Montreal Protocol for the preservation of the ozone layer. This evaluation depends largely on the quality and continuity of the measurements used for the studies. Because instrument stability is essential to derive statistically significant ozone trends, a consistent evaluation of ozone observations is crucial for the estimation of trends and the prediction of ozone evolution in the future.
The Network for the Detection of Atmospheric Composition Change (NDACC) is an international network set up in 1991. NDACC relies on worldwide measurement stations with various instruments designed initially for the simultaneous monitoring of atmospheric parameters involved in the ozone depletion issue. Recently, NDACC has broadened its scope with the monitoring of atmospheric composition in the free and upper troposphere and the mesosphere. One of the main goals of NDACC is the validation of space-based observations. For that purpose, a careful evaluation of the stability of NDACC ground-based measurements is necessary. In this context, a thorough analysis of 6 satellite [Solar Backscatter UltraViolet (SBUV(/2)), Stratospheric Aerosol and Gas Experiment (SAGE) II, Halogen Occultation Experiment (HALOE), Microwave Limb Sounder (MLS) on board the Upper Atmosphere Research Satellite (UARS) and Aura, and Global Ozone Monitoring by Occultation of Stars (GOMOS)] and 3 ground-based (lidar, ozonesondes and Umkehr) ozone data sets was performed at one of the NDACC lidar stations, located at Haute-Provence Observatory (OHP) (Nair et al., 2011). The study showed that the considered data sets agree well with the lidar observations, showing an average bias of less than ±0.5 % in the 20-40 km altitude range. All measurements are stable, and their relative drifts are estimated to be within ±0.5 % yr−1 in this altitude range.
The present work extends this study to other NDACC lidar stations located in the tropical and mid-latitude regions. We focus on NDACC lidar stations providing long-term and continuous ozone measurements, namely the northern mid-latitude stations of Meteorological Observatory Hohenpeißenberg (MOHp: 47.80
This article is organised as follows: the introduction is followed by the description of the lidar, ozonesonde and satellite observations in Sect. 2. The methodology used for the analyses is presented in Sect. 3. Section 4 discusses the average biases, the stability evaluation of ozone measurements using relative drifts, the temporal evolution of the combination of older and newer satellite data sets and the drifts derived from the combined data. The final section concludes with the findings from the study.

Lidar
The lidar is an active remote sensing instrument based on the interaction between laser radiation and the atmosphere. Depending on the atmospheric parameter to be measured, lidar systems use various light-matter interactions, such as Rayleigh, Mie and Raman scattering, absorption or fluorescence. The lidar stations considered in our study use the Differential Absorption Lidar (DIAL) technique for measuring stratospheric ozone. It provides range-resolved measurements with high vertical resolution (Schotland, 1974). The technique requires the simultaneous emission of laser radiation at two wavelengths characterised by different ozone absorption cross-sections. For all stations, the ozone-absorbed wavelength used is 308 nm, emitted from a xenon chloride excimer laser. The reference wavelength is either 353 or 355 nm, depending on how it is generated at each station. A Raman cell filled with hydrogen is used for obtaining 353 nm, while the third harmonic of a Nd:YAG laser provides light at 355 nm. The ozone number density is computed from the difference in the slope of the logarithm of the range-corrected returned signals. Measurements are performed during nighttime under clear sky conditions. In the presence of strong aerosol loading, additional backscattering contaminates the Rayleigh signals. In such conditions, measurements use lidar signals originating from the vibrational Raman scattering of the laser radiation by atmospheric nitrogen (McGee et al., 1993). The vibrational Raman signals are backscattered at the wavelengths 332 and 385/387 nm, corresponding to the Rayleigh wavelengths 308 and 353/355 nm, respectively.
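The DIAL retrieval principle described above can be illustrated with a minimal sketch. It assumes an idealised case in which only ozone absorption differs between the two wavelengths, so the corrections for differential Rayleigh and aerosol extinction and backscatter that an operational retrieval applies are omitted. The function name and arguments are hypothetical and do not correspond to any station's processing software.

```python
import math

def dial_ozone_density(z_m, p_on, p_off, delta_sigma_m2):
    """Idealised DIAL retrieval (illustrative sketch only).

    z_m            : altitudes in metres, ascending
    p_on, p_off    : range-corrected signals at the absorbed (e.g. 308 nm)
                     and reference (e.g. 353/355 nm) wavelengths
    delta_sigma_m2 : differential ozone absorption cross-section in m^2
    Returns layer mid-point altitudes and ozone number density (m^-3).
    """
    mid_z, density = [], []
    for k in range(len(z_m) - 1):
        dz = z_m[k + 1] - z_m[k]
        # Slope of ln(P_off / P_on) with altitude across the layer.
        slope = (math.log(p_off[k + 1] / p_on[k + 1])
                 - math.log(p_off[k] / p_on[k])) / dz
        # The factor 2 accounts for the two-way path of the laser light.
        density.append(slope / (2.0 * delta_sigma_m2))
        mid_z.append(0.5 * (z_m[k] + z_m[k + 1]))
    return mid_z, density
```

Because the result depends only on a ratio of signals and on a range difference, any altitude-independent instrumental constant cancels, which is the sense in which the technique is self-calibrating.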
Ozone DIAL systems have been in routine operation at MOHp, OHP, Tsukuba, TMF, MLO and Lauder since 1987, 1986, 1988, 1988 and 1994. These lidar systems and their ozone retrieval methods are similar. The main difference is in the choice of the reference wavelength. Most lidar stations use 355 nm as the reference wavelength, except the MOHp and Lauder lidars, which have used 353 nm over the whole period, and the TMF and MLO lidars, which used 353 nm until 2000 and then changed to 355 nm (Leblanc and McDermid, 2000). Other differences among the lidars are in the receiving data acquisition system and the number of channels used to cover the dynamic range of the lidar signals. To this end, the Rayleigh signals are split into high and low energy channels to retrieve ozone profiles in the upper and mid-lower stratosphere, respectively. For instance, at OHP, the receiving system had 2 acquisition channels until 1993. It was then modified to accommodate 6 channels (4 at 308, 355 nm; 2 at 332, 387 nm) in 1994, which improved the observational capacity of the lidar system (Godin-Beekmann et al., 2003). A similar 6-channel configuration is used to measure ozone at Tsukuba (Tatarov et al., 2009) and Lauder (Brinksma et al., 2000). However, only 2 receiving channels (2 at 308, 353 nm) are used at MOHp (Steinbrecht et al., 2009b) and 8 channels at TMF (4 at 308, 332 nm; 4 at 355, 387 nm) and MLO (3 at 308, 332 nm; 5 at 355, 387 nm). The precision of ozone lidar measurements degrades with height, with values of 1 % up to 30 km, 2-5 % at 40 km and 5-25 % at 50 km.
The altitude range of most ozone lidar measurements is between the tropopause and 45-50 km, except at Tsukuba, where the highest altitude was 40 km at the beginning of the observation period and decreased to ∼35 km in 2002 and ∼30 km in 2010. Data from the starting year of observations until 2010 for OHP and Tsukuba and until 2011 for the other stations are considered for the analysis. As in Nair et al. (2011), we use the OHP ozone lidar profiles re-analysed using National Centers for Environmental Prediction (NCEP) temperature data and Bass and Paur (BP) ozone cross-sections (Godin-Beekmann and Nair, 2012). Because the ozone cross-section is sensitive to temperature, a temperature trend of 1 K decade−1 can induce an ozone trend of about 0.2 % decade−1 (Godin-Beekmann et al., 2003). Note that WMO (2011) has reported a temperature trend of about 1.5 K decade−1 in the middle and upper stratosphere.

Ozonesondes
Ozonesonde measurements are characterised by a higher vertical resolution (∼0.2 km) compared to other measurements. The main ozonesonde types are Brewer-Mast (BM) (Brewer and Milford, 1960), electrochemical concentration cell (ECC) (Komhyr, 1969) and Japanese ozonesonde (KC) (Kobayashi and Toyama, 1969). The measurement principle of sondes is that ambient air is pumped into a chamber containing a potassium iodide (KI) solution, where the iodide is oxidised by ozone and a current is produced. In the Japanese KC sondes, the concentration of potassium bromide (KBr) is higher than that of KI and it plays an auxiliary role in the above reaction. The amount of ozone in the air sample can be derived from the measurement of the electron flow together with the air volume flow rate delivered by the sonde pump. There are different types of ECC sondes depending on the manufacturing company, i.e. Science Pump Corporation (SPC) and Environmental Science Corporation (ENSCI). Several studies (e.g. Johnson et al., 2002; Smit et al., 2007) revealed that the ENSCI sondes overestimate ozone by ∼5 % below 20 km and 5-10 % above 20 km as compared to SPC-6A sondes, when both sondes operate with 1 % KI full buffer cathode solution. Also, the BM sondes underestimate ozone by 10 %, while the ECC sondes with 1 % KI cathode solution overestimate ozone by 5 % compared to those with 0.5 % KI (Stübi et al., 2008). Similarly, the KC sondes underestimate ozone by 10 % above 50 hPa (Deshler et al., 2008).
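The derivation of ozone from the cell current and the pump flow rate can be sketched as follows. The sketch assumes the idealised stoichiometry of two electrons per ozone molecule and the ideal gas law; the background current and the conversion efficiency are treated as simple inputs, whereas operational processing applies further corrections (for example for pump efficiency at low pressure). The function and parameter names are hypothetical.

```python
R_GAS = 8.314462618     # molar gas constant, J mol^-1 K^-1
FARADAY = 96485.33212   # Faraday constant, C mol^-1

def ozone_partial_pressure_mpa(current_ua, background_ua, pump_temp_k,
                               flow_cm3_s, efficiency=1.0):
    """Ozone partial pressure (mPa) from an electrochemical sonde current.

    current_ua, background_ua : measured and background cell current (microampere)
    pump_temp_k               : pump temperature (K)
    flow_cm3_s                : volumetric pump flow rate (cm^3 s^-1)
    efficiency                : assumed conversion efficiency (1.0 = ideal)
    """
    current_a = (current_ua - background_ua) * 1.0e-6
    flow_m3_s = flow_cm3_s * 1.0e-6
    # Two electrons flow per ozone molecule, hence the factor 2 * FARADAY.
    p_pa = (R_GAS * pump_temp_k * current_a
            / (2.0 * FARADAY * efficiency * flow_m3_s))
    return p_pa * 1.0e3  # Pa -> mPa
```

With these units the prefactor R/(2F) evaluates to about 0.0431, so a net current of 5 µA at a pump temperature of 300 K and a flow of 3.5 cm³ s⁻¹ corresponds to roughly 18.5 mPa.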
Generally, correction factors (CFs) are used to screen the sonde profiles (Tiao et al., 1986). The CF is the ratio of the total ozone provided by a nearby column measuring instrument to the sum of the ozone column integrated up to the burst level of the sonde measurements and a residual ozone column evaluated above that level (Logan et al., 1999). The profiles having a CF of 0.8-1.2 for ECC and KC sondes and 0.9-1.2 for BM sondes are considered to be of good quality (SPARC, 1998) and are selected in this study. The ECC sonde measurements have an uncertainty of about ±(5-10) % and provide accurate measurements up to ∼32 km (Smit et al., 2007). Ozone soundings performed at MOHp, OHP, Tateno, Hilo and Lauder are considered here.
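The CF definition and the screening thresholds quoted above can be written as a short sketch. The function names are hypothetical, and treating the limits as inclusive is an assumption, since the text does not say how profiles exactly at a limit are handled.

```python
# Accepted CF ranges per sonde type, as quoted in the text (SPARC, 1998).
CF_LIMITS = {"ECC": (0.8, 1.2), "KC": (0.8, 1.2), "BM": (0.9, 1.2)}

def correction_factor(total_column_du, sonde_column_du, residual_du):
    """Ratio of the independently measured total ozone column to the
    sonde-integrated column plus the residual above the burst level."""
    return total_column_du / (sonde_column_du + residual_du)

def is_good_profile(sonde_type, cf):
    """True if the CF lies within the accepted range (limits assumed inclusive)."""
    low, high = CF_LIMITS[sonde_type]
    return low <= cf <= high
```

For example, a sonde that integrates to 250 DU with a 40 DU residual against a 300 DU column measurement has a CF of about 1.03 and is retained for any of the three sonde types.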
The BM sondes manufactured by the Mast Keystone Corporation have been used at MOHp since 1967 (Steinbrecht et al., 1998). They employ a bubbler consisting of an electrochemical cell filled with 0.1 % buffered KI solution, in which cathode and anode wires are immersed. The uncertainty of BM sondes is better than 5 % in the stratosphere. The radiosonde type was changed from VIZ to Vaisala RS80 in 1996. The BM ozonesonde profiles are normalised by total column data. We used the BM ozonesonde profiles in 1987-2011 for this study. At OHP, the ECC ozonesondes with 1 % buffered KI cathode sensor solution were used for measuring ozone from 1991 onwards. Type 5A sondes manufactured by SPC were flown from 1991 to 1997 and 1Z series sondes by ENSCI afterwards. The ozonesondes were coupled to Vaisala RS80 radiosondes through a TMAX interface until 2007 and then to Modem M2K2DC radiosondes through an OZAMP interface. We follow the approach described in Nair et al. (2011) for analysing the OHP ozonesonde data, except that the ozone partial pressure from the ECC Modem sondes (from June 2007 to the present) is now reprocessed from the current and the pump temperature. The ECC ozonesonde profiles in 1991-2010 are utilised for the analysis.
The KC type ozonesondes, manufactured by Meisei Electric Company, were used at Tateno (hereafter termed Tsukuba ozonesondes) from January 1968 to November 2009 and ECC sondes thereafter. The KC68, KC79 and KC96 were used in 1968-1979, 1979-1997 and from mid-1997, respectively. They are based on a carbon-iodine ozone sensor, an electrochemical cell containing platinum gauze as cathode and carbon as anode immersed in an aqueous neutral KI/KBr solution (Fujimoto et al., 1996). In 1979, the double-chambered electrochemical cell was modified to a single cell. The KC79 and KC96 sondes, normalised to ozone total column data, are used here for the period 1988-2009. ECC sondes of the SPC-4A, 5A and 6A, and ENSCI 1Z and 2Z models have been used for measuring ozone at Hilo in 1991-2010. These are connected to Vaisala RS-80-15 type radiosondes using the interface boards En-Sci V2C for all 2Z sondes, TMAX for all 5A, 6A and 1Z sondes and an analog data system for 4A sondes. The data acquisition is made using the "Strato" version (V) 7.2 program. The cathode sensor solution was switched from 1 % KI buffered to 2 % KI unbuffered in 1998 and was again changed to 1 % KI buffered in 2005. The integrated ozone column is compared to measurements with a Dobson spectrophotometer, but normalisation is not performed (McPeters et al., 1999). In our analysis the CF is calculated from the ratio of the Dobson ozone column to the sonde ozone column provided in the data files. The ECC ozonesonde measurements in 1993-2010 are used for this study. Hereafter, Hilo ozonesondes are referred to as the ozonesondes at MLO.
At Lauder, ECC ozonesondes were flown with a 1 % KI cathode solution from 1986 to 1996 and with a 0.5 % KI solution from 1996 to the present. SPC-4A, 5A and 6A series of sondes were used in 1986-1989, 1990-1994 and 1995-1996, respectively, followed by ENSCI-1Z. The VIZ radiosonde was used until 1989 and then Vaisala RS80, coupled with a TMAX interface. Here, ozonesonde data are not normalised with total column ozone data, but the data from the sondes containing 1 % solution are multiplied by 0.9743 to put them on the BP scale for Dobson column measurements, because the BP cross sections affect the Dobson data, on which ozonesonde calibrations are based (Bodeker et al., 1998). Corrections are applied to the ozonesonde values above 200 hPa to account for pump efficiency degradation. The integrated ozone profile is compared to the total column of ozone measured by the Dobson spectrophotometer at Lauder, and the uncertainty is typically less than 5 %. ECC ozonesonde measurements from SPC-5A, 6A and ENSCI in 1994-2009 are analysed here.

Space-based observations
The SBUV(/2) instruments include the original SBUV launched on the NASA (National Aeronautics and Space Administration) NIMBUS-7 satellite in 1978 and the SBUV/2 instruments deployed on the NOAA (National Oceanic and Atmospheric Administration) -9, 11, 14, 16, 17, 18 and 19 series of satellites from 1984 onwards. The nadir measurement technique is employed to measure ozone profiles from the backscattered UV radiation (250-340 nm). The latitudinal coverage of the measurements is 80° S-80° N, and the vertical range is 18-51 km (Bhartia et al., 1996). The long-term measurement uncertainty is ∼3 % (DeLand et al., 2004). The vertical resolution of V8 data is 6-8 km, and the horizontal resolution is 200 km. We use V8 ozone column measurements from NIMBUS-7, NOAA-9, 11, 16 and 17 in 1985-2007 for this study (Flynn et al., 2009). SAGE II on the Earth Radiation Budget Satellite (ERBS) provided long-term ozone observations from October 1984 to August 2005. Ozone profiles are derived using the solar occultation technique by measuring limb transmittances in seven channels between 385 and 1020 nm that are inverted using the onion-peeling approach. SAGE II measured about 800 profiles per month, with less sampling in summer months at tropical and mid-latitudes. The spatial coverage ranges from 80° S to 80° N from month to month. The vertical range of the ozone profiles is 10-50 km with a vertical resolution of ∼1 km and a horizontal resolution of 200 km. The ozone measurements have an uncertainty of ∼5 % at 20-45 km and 5-10 % at 15-20 km. The ozone number density profiles retrieved as a function of geometric altitudes processed by the V6.2 algorithm (Wang et al., 2006) for the period 1984-2005 are used here.
HALOE on UARS was put into orbit in September 1991, and operated for 14 years, until 2005. It also measured limb transmittances, in the 9.6 µm ozone band, utilising the solar occultation technique and the onion-peeling procedure for the inversion. The latitudinal coverage of the measurements is 80° S-80° N over the course of one year. The vertical range of the ozone profiles is 15-60 km with a vertical resolution of ∼2.5 km and a horizontal resolution of 500 km (Russell et al., 1993). Uncertainty of the ozone measurements is about 10 % at 30-64 km and ∼30 % at 15 km (Brühl et al., 1996). The ozone volume mixing ratio (VMR) profiles V19 for 1991-2005 are used for the analysis.
MLS was launched on UARS in 1991 and its successor aboard Aura in 2004. Both instruments measure thermal emissions from rotational lines of the measured species through the limb of the atmosphere. The 57° inclination of the UARS orbit allowed MLS to observe from 34° on one side of the Equator to 80° on the other. The profiles retrieved from 205 GHz have a vertical range of 15-60 km with a resolution of ∼3-4 km and a horizontal resolution of 300 km. The estimated uncertainty of a single profile is 6 % at 21-60 km and 15 % at 16-20 km (Livesey et al., 2003). Aura MLS has better spatial coverage (vertically and horizontally) than UARS MLS, as well as improved resolution. The latitudinal coverage of the measurements is 82° S-82° N. Ozone measurements retrieved from 240 GHz have a vertical range of about 10-73 km and a vertical resolution of 2.5-3 km in the stratosphere. The along-track resolution is ∼300-450 km, and the estimated uncertainty is about 5-10 % at 13-60 km. Data characterisation and validation of Aura MLS V2.2 data can be found in works including Froidevaux et al. (2008) and Jiang et al. (2007). The ozone VMRs from UARS MLS V5 in 1991-1999 and Aura MLS V3.3 in 2004-2011, screened as suggested in the V3.3 validation report, are used here.

Stability issues of long-term data sets
Long-term stability is one of the key issues addressed in this paper. All instruments have different characteristics in this respect. For the ozonesondes, changes in sonde types, manufacturing and sonde preparation are unavoidable in practice, and may affect the long-term stability on the time scale of years to decades. The long-term stability of SBUV(/2) data critically depends on maintaining accurate spectral calibrations over the lifetime of one or more instruments. Solar occultation instruments like SAGE II and HALOE are less prone to drifts, because their measurements directly compare reference data taken outside the atmosphere with data at various slant paths through the atmosphere. However, accurate pointing and accounting for Rayleigh scattering can be crucial, as is the long-term stability of filter wavelengths and bandpasses. Lidars should have very good long-term stability, because their differential absorption measurement is self-calibrating in principle. It is differential in wavelength, which is determined very accurately by the lasers, and differential in range, which is measured extremely accurately by electronic clocks.

Data analysis
The average bias and relative drift of different long- and short-term data sets are analysed with respect to the ozone lidar measurements in order to evaluate their consistency and stability. The lidar stations, the respective locations and other observations considered for the analysis are listed in Table 1. The satellite data are extracted around the stations using spatial criteria of ±2.5° latitude and ±5° longitude of each station for SBUV(/2), UARS MLS and Aura MLS, and ±5° latitude and ±10° longitude for the solar occultation measurements (SAGE II and HALOE) due to their relatively lower sampling. The total number of measurements of all observational techniques at the lidar stations and the number of coincidences obtained by all data sets from different comparisons are displayed in Fig. 1. The top panel shows the total  (1100), MLO (860) and Lauder (1500) during the analysis period. Among the satellites, SBUV(/2) and Aura MLS provide the maximum number of measurements (∼8000) during their analysis periods of 23 and 8 years, respectively. They measure nearly the same number of profiles in all regions irrespective of latitude. On the other hand, UARS MLS, SAGE II and HALOE show a clear latitudinal dependence, with fewer observations by SAGE II and HALOE at all stations. The solar occultation measurements (SAGE II and HALOE) provide more observations above 40° latitude in both hemispheres (e.g. MOHp, OHP and Lauder) and fewer measurements at other stations. In contrast, UARS MLS yields more profiles at stations situated below 37° latitude (e.g. Tsukuba, TMF and MLO) and fewer profiles at other stations. Generally, UARS MLS provides more measurements between 34° S and 34° N because of the UARS yaw manoeuvres as stated in Sect. 2.3. Normally, satellite measurements yield more than 1 measurement a day. So in order to be coherent with the ground-based measurements, only one observation per day is considered, as illustrated in panel (b) of Fig. 1.
The analysis is performed using the coincident ozone profiles of various data sets. Coincidences are determined using spatial grids similar to those applied for the data extraction mentioned previously, with a maximum time difference of ±12 h. In order to get a clear idea about the bias and drift of various time series, different types of comparisons are performed at each station. First, various data sets are compared to the lidar measurements. Figure 1c shows the total number of coincidences of all measurement techniques with respect to the ozone lidar. Among the lidars, the Tsukuba lidar provides the fewest coincidences due to its comparatively lower measurement frequency. Compared to the other stations above 40° N/S, the Lauder lidar provides fewer collocations since it started operation in 1994, about 8 years after the MOHp and OHP lidars. Then, the analysis is performed by the cross-comparison of long-term data sets such as lidar, SBUV(/2), SAGE II and HALOE with respect to SBUV(/2), SAGE II and HALOE as references. Figure 1d, e, and f display the number of collocated profiles of these long-term measurements with SBUV(/2), SAGE II and HALOE as references, respectively. As expected, SBUV(/2) and HALOE provide the highest and the lowest number of collocated profiles, respectively, with respect to all other measurement techniques.
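The coincidence criteria (a latitude/longitude box around the station and a ±12 h time window) can be sketched as below. The text does not specify how the single satellite observation per day is chosen, so selecting the profile closest in time to the lidar measurement is an assumption of this sketch, and all function names are hypothetical.

```python
from datetime import datetime, timedelta

def within_box(station_lat, station_lon, lat, lon, dlat, dlon):
    """True if (lat, lon) lies within +/-dlat, +/-dlon of the station."""
    # Wrap the longitude difference into [-180, 180) degrees.
    dl = (lon - station_lon + 180.0) % 360.0 - 180.0
    return abs(lat - station_lat) <= dlat and abs(dl) <= dlon

def find_coincidences(station_lat, station_lon, lidar_times, sat_obs,
                      dlat, dlon, max_hours=12.0):
    """Pair each lidar time with at most one satellite observation.

    sat_obs is a list of (time, lat, lon) tuples. Among the observations
    inside the box and the time window, the one closest in time is kept
    (an assumed selection rule).
    """
    window = timedelta(hours=max_hours)
    pairs = []
    for t_lidar in lidar_times:
        candidates = [
            (abs(t_sat - t_lidar), t_sat)
            for (t_sat, lat, lon) in sat_obs
            if within_box(station_lat, station_lon, lat, lon, dlat, dlon)
            and abs(t_sat - t_lidar) <= window
        ]
        if candidates:
            pairs.append((t_lidar, min(candidates)[1]))
    return pairs
```

Under the criteria quoted in the text, `dlat=2.5, dlon=5.0` would apply to SBUV(/2) and MLS, and `dlat=5.0, dlon=10.0` to SAGE II and HALOE.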

Relative differences and mean biases
In order to quantify the bias of various data records with respect to lidar, the difference time series is computed. As the observing period of lidars is different for various stations, the period of comparisons also differs. The comparison periods of ozonesondes depend on the availability of both lidar and sonde data at the station. In the case of comparison with lidar, the relative difference between collocated measurements is computed as

$$\Delta O_{3L}(i,j) = \frac{\mathrm{Meas}(i,j) - \mathrm{Lidar}(i,j)}{\mathrm{Lidar}(i,j)} \times 100\,\%,$$

where i = coincident day, and j = altitude or pressure. "Meas" denotes SBUV(/2), SAGE II, HALOE, UARS MLS, Aura MLS and ozonesondes. The mean bias of each measurement technique is then calculated by averaging the relative differences over the respective coincident periods with each lidar,

$$\overline{\Delta O_{3L}}(j) = \frac{1}{N(j)} \sum_{i=1}^{N(j)} \Delta O_{3L}(i,j),$$

where $\overline{\Delta O_{3L}}(j)$ is the average ozone difference and N(j) is the number of collocated profiles at altitude j. The standard error of the bias is determined as

$$\mathrm{SE}(j) = \frac{\sigma(j)}{\sqrt{N(j)}},$$

where σ(j) is the standard deviation of the relative differences at altitude j. The estimation of drifts of satellite data requires an evaluation of the stability of the reference measurements, the lidars in this study. The stability of lidar data is analysed by comparing lidar ozone with SBUV(/2), SAGE II and HALOE as references and by estimating the relative drifts. To compare the drift of lidar measurements with that of other long-term observations, the relative drifts of SBUV(/2), SAGE II and HALOE ozone data are estimated by mutual comparison (taking each of them as the reference) in a similar way. For instance, the comparison with SBUV(/2) as the reference is performed as

$$\Delta O_{3\,\mathrm{SBUV}}(i,j) = \frac{\mathrm{Meas}(i,j) - \mathrm{SBUV}(i,j)}{\mathrm{SBUV}(i,j)} \times 100\,\%,$$

with "Meas" as lidar, SAGE II and HALOE. The same procedure is repeated for the comparisons with respect to SAGE II and HALOE:

$$\Delta O_{3\,\mathrm{SAGE}}(i,j) = \frac{\mathrm{Meas}(i,j) - \mathrm{SAGE\,II}(i,j)}{\mathrm{SAGE\,II}(i,j)} \times 100\,\%,$$

where "Meas" is lidar, SBUV(/2) and HALOE, and

$$\Delta O_{3\,\mathrm{HALOE}}(i,j) = \frac{\mathrm{Meas}(i,j) - \mathrm{HALOE}(i,j)}{\mathrm{HALOE}(i,j)} \times 100\,\%,$$

where "Meas" is lidar, SBUV(/2) and SAGE II.
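The bias statistics described in this section can be sketched for a single altitude level as follows. The sketch assumes that the relative difference is the measurement minus the reference, divided by the reference and expressed in percent, and that σ is the sample standard deviation (N − 1 in the denominator); the function names are hypothetical.

```python
import math

def relative_difference(meas, ref):
    """Relative difference of a measurement to the reference, in percent."""
    return 100.0 * (meas - ref) / ref

def bias_and_standard_error(rel_diffs):
    """Mean bias and its standard error from relative differences (%)
    at one altitude level; needs at least two coincidences."""
    n = len(rel_diffs)
    mean = sum(rel_diffs) / n
    # Sample variance (assumed convention).
    variance = sum((d - mean) ** 2 for d in rel_diffs) / (n - 1)
    return mean, math.sqrt(variance) / math.sqrt(n)
```

The same two functions apply unchanged when SBUV(/2), SAGE II or HALOE is taken as the reference instead of the lidar; only the `ref` argument changes.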

Slope and standard deviation
The drift between the measurements is computed from the estimation of the slope of the monthly averaged difference time series, using a simple linear regression. The standard deviation (σs) of the slope is computed using the same equation (taken from Press et al., 1989) as used in Nair et al. (2011). In addition, autocorrelation is calculated for all data sets with a one month lag and is found to be within ±0.3 in the 20-40 km altitude range. Then, the standard deviation is calculated using the equation given by Frederick (1984) that makes use of the autocorrelation term. The standard deviations estimated from both equations are found to be very similar. Hence, the ones estimated from Press et al. (1989) are discussed in this study. The derived drift is considered to be significant if the absolute value of the slope is greater than twice the standard deviation of the slope. Generally, a longer time series with a continuous and sufficient number of profiles is needed to determine accurate drifts and to reduce the standard deviation. The presence of outliers will also result in incorrect drifts, and hence they are removed from the analysis. For example, our analysis excludes ozonesonde profiles with values of about 1 × 10^10 molecules cm−3 at OHP. In addition, for SAGE II and HALOE, the relative differences exceed 200 % at altitudes below 17 km and at 45 km for some profiles. Those altitudes are also removed from the analysis. However, these outliers are very few in number, less than 5 in total for a station during the entire analysis period.
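The drift estimation can be sketched as an ordinary least-squares fit to the monthly mean relative differences. The slope uncertainty below is the standard least-squares expression and is assumed to correspond to the one taken from Press et al. (1989); the autocorrelation-based formula of Frederick (1984) is not reproduced here, only the lag-one autocorrelation itself is computed. Function names are hypothetical.

```python
import math

def linear_drift(t_years, diffs):
    """Least-squares slope (% per year), its standard deviation, and the
    residuals of a monthly mean relative-difference time series."""
    n = len(t_years)
    t_mean = sum(t_years) / n
    d_mean = sum(diffs) / n
    sxx = sum((t - t_mean) ** 2 for t in t_years)
    sxy = sum((t - t_mean) * (d - d_mean) for t, d in zip(t_years, diffs))
    slope = sxy / sxx
    intercept = d_mean - slope * t_mean
    residuals = [d - (intercept + slope * t) for t, d in zip(t_years, diffs)]
    # Residual variance with n - 2 degrees of freedom, scaled by Sxx.
    sigma_slope = math.sqrt(sum(r * r for r in residuals) / (n - 2) / sxx)
    return slope, sigma_slope, residuals

def lag1_autocorrelation(series):
    """Autocorrelation of a series at a lag of one time step (one month)."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[k] - mean) * (series[k + 1] - mean) for k in range(n - 1))
    den = sum((v - mean) ** 2 for v in series)
    return num / den

def is_significant(slope, sigma_slope):
    """Drift is significant if |slope| exceeds twice its standard deviation."""
    return abs(slope) > 2.0 * sigma_slope
```

In this sketch the time axis is expressed in decimal years so that the slope is directly in % yr⁻¹; months without coincidences are simply absent from the input lists.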

Data conversion
The comparison is performed by converting all data to ozone number density as a function of geometric altitude, except for SBUV(/2). Lidar and SAGE II data are given in these units, and ozone partial pressures from sondes and VMRs from HALOE and MLS are converted to number density using the pressure-temperature (p/T) data provided in the respective data files. The sondes use the PTU (pressure-temperature-humidity) data measured using the radiosondes coupled to the ozonesondes. SAGE II and HALOE provide the interpolated NCEP p/T data, whereas MLS retrieves p/T data independently. In order to account for the vertical resolution of MLS ozone, the higher resolution lidar profiles are integrated within a ±1.5 km altitude band around each MLS altitude level, and then both lidar and MLS data are interpolated to the mean MLS altitude calculated for the comparison period, up to 30 km. Above 30 km both lidar and MLS have similar vertical resolution, and thus the comparison is done by interpolating lidar data to MLS altitudes. Comparison between SAGE II and HALOE is also done in the same way, using number density profiles on geometric altitudes by converting HALOE ozone VMRs to number density.
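The unit conversions described above follow from the ideal gas law and can be sketched as below. The units chosen for the inputs (hPa, mPa, ppmv) and the function names are assumptions of this sketch. The last function approximates the ±1.5 km integration of the lidar profile by a simple unweighted mean over the lidar levels inside the band, which is also an assumption.

```python
BOLTZMANN = 1.380649e-23  # Boltzmann constant, J K^-1

def air_number_density_cm3(pressure_hpa, temperature_k):
    """Air number density (molecules cm^-3) from pressure and temperature."""
    return pressure_hpa * 100.0 / (BOLTZMANN * temperature_k) * 1.0e-6

def vmr_to_number_density(vmr_ppmv, pressure_hpa, temperature_k):
    """Ozone number density (cm^-3) from a volume mixing ratio in ppmv."""
    return vmr_ppmv * 1.0e-6 * air_number_density_cm3(pressure_hpa, temperature_k)

def partial_pressure_to_number_density(p_o3_mpa, temperature_k):
    """Ozone number density (cm^-3) from an ozone partial pressure in mPa."""
    return p_o3_mpa * 1.0e-3 / (BOLTZMANN * temperature_k) * 1.0e-6

def band_average(z_km, values, z_center_km, half_width_km=1.5):
    """Unweighted mean of the values whose altitude lies within
    +/- half_width_km of z_center_km (simplified smoothing)."""
    inside = [v for z, v in zip(z_km, values)
              if abs(z - z_center_km) <= half_width_km]
    return sum(inside) / len(inside)
```

As a consistency check, a mixing ratio of 8 ppmv at 10 hPa corresponds to an ozone partial pressure of 8 mPa, so both conversion routes give the same number density of about 2.5 × 10^12 molecules cm⁻³ at 230 K.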
SBUV(/2) provides ozone information as both VMRs and partial columns in Dobson Units (DU), of which the partial ozone columns are used here. Contrary to other comparisons, the partial ozone columns of SBUV(/2) on pressure levels are retained and ozone data from the compared instrument are converted to ozone column in DU. The resulting ozone values are then added above the respective pressure levels and are interpolated logarithmically to the SBUV(/2) pressure levels. Then, ozone in the adjacent layers is subtracted to determine the partial ozone column in each SBUV(/2) layer, which is used for finding the relative differences. Even though the comparisons are performed on pressure levels, the results are presented on geometric altitudes to allow comparison with the other measurement techniques. For that, the approximate altitudes corresponding to the SBUV(/2) mid-pressure levels are calculated. As altitude-pressure conversion always induces some bias between the measurements, special care is needed for its use.
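The conversion of a number density profile to partial columns in pressure layers can be sketched as follows. It assumes trapezoidal integration in altitude, linear interpolation of the cumulative column in the logarithm of pressure, and pressure that decreases strictly with altitude; the layer edges passed in are hypothetical inputs, not the actual SBUV(/2) layer definitions.

```python
import math

DU_MOLEC_CM2 = 2.6867e16  # molecules cm^-2 in one Dobson Unit

def column_above_du(z_km, n_cm3):
    """Cumulative ozone column (DU) from each level to the profile top."""
    cum = [0.0] * len(z_km)
    for k in range(len(z_km) - 2, -1, -1):
        dz_cm = (z_km[k + 1] - z_km[k]) * 1.0e5
        layer = 0.5 * (n_cm3[k] + n_cm3[k + 1]) * dz_cm / DU_MOLEC_CM2
        cum[k] = cum[k + 1] + layer
    return cum

def interp_log_pressure(p_target, p_levels, values):
    """Linear interpolation in ln(p); p_levels decrease with index."""
    x = math.log(p_target)
    for k in range(len(p_levels) - 1):
        x0, x1 = math.log(p_levels[k]), math.log(p_levels[k + 1])
        if x1 <= x <= x0:
            w = (x - x0) / (x1 - x0)
            return values[k] + w * (values[k + 1] - values[k])
    raise ValueError("target pressure outside the profile range")

def layer_columns_du(z_km, n_cm3, p_hpa, layer_edges_hpa):
    """Partial columns (DU) between consecutive pressure edges, given
    from the highest pressure (bottom) to the lowest (top)."""
    cum = column_above_du(z_km, n_cm3)
    at_edges = [interp_log_pressure(p, p_hpa, cum) for p in layer_edges_hpa]
    # Subtract adjacent cumulative columns to get each layer's column.
    return [at_edges[k] - at_edges[k + 1] for k in range(len(at_edges) - 1)]
```

The pressure profile `p_hpa` on the altitude grid would come from the auxiliary p/T data used for the conversion.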
In a previous work (Nair et al., 2011), we used NCEP data for converting ozone lidar number densities to ozone partial columns for comparing with SBUV(/2) at OHP. It showed drifts of about 0.5 % yr−1, which is larger than that estimated in this study, for the comparison between SBUV(/2) and lidar above 30 km. In a similar study, McLinden et al. (2009) also referred to an anomalous temperature trend above 30 km for the comparison between SBUV(/2) and SAGE II. Therefore, in this study we took p/T data from Arletty (Hauchecorne, 1998) to convert ozone number density from lidars and SAGE II or VMR from HALOE to ozone partial column to compare with SBUV(/2) data. Arletty is an atmospheric model that makes use of the European Centre for Medium-Range Weather Forecasts (ECMWF) meteorological analysis and the Mass Spectrometer-Incoherent Scatter-1990 (MSIS-90) atmosphere model (Hedin, 1991) for deriving atmospheric profiles. The MSIS-90 model data are based on the Middle Atmospheric Program (MAP) Handbook (Labitzke et al., 1985) tabulation of zonal average p/T data below 72.5 km and the NCEP p/T data below 20 km. Arletty used the ECMWF data up to 30 km and the MSIS-90 above 30 km until 1998 and the ECMWF data for all altitudes thereafter. In order to demonstrate which temperature data are useful for the analysis, trends in the NCEP and Arletty temperature data at MOHp are calculated using a simple linear regression. The NCEP temperature shows insignificant trends of less than −1 K decade−1 below 30 km and about −1 to
The comparison between various lidars and the nearby ozonesondes is performed using the normalised sonde profiles (ozone profiles multiplied by the CF). It should be noted that the BM sondes at MOHp and KC sondes at Tsukuba are already provided after normalisation, whereas the ECC sondes at OHP and MLO are not normalised. So in our analysis, we have multiplied the OHP and MLO sonde profiles by the CF to find the relative difference and drift.
In short, though we follow similar comparison statistics as Nair et al. (2011), there are some major changes in this study. While Nair et al. (2011) performed only one type of comparison, with respect to lidar observations, this study uses four different types of comparison statistics (lidar, SBUV(/2), SAGE II and HALOE as references) to find the drift in the measurements and thus the instrument stability. The average drift is computed to present the global picture of the estimated instrumental drift. Further, the Aura MLS data are compared to lidar in a different way to compensate for the lower vertical resolution of the lidar above 30 km. Also, Arletty p/T data are used instead of NCEP p/T data for the unit conversions. Therefore, there are significant improvements in the analysis presented in this study to find the relative difference, bias and drift.
Generally, the differences are larger in the upper stratosphere (above 40 km) compared to those in the middle stratosphere (20-40 km), but are smaller than those observed in the lower stratosphere (below 20 km). Yet, they do not exceed ±7 % in most cases. These large biases above 40 km are likely due to the relatively lower precision of the ozone lidar above 40 km. However, smaller biases are observed with respect to TMF lidar measurements, which implies that these measurements are very powerful and are less noisy even in the upper stratosphere.
Comparatively larger differences observed below 18 km are mostly due to the large ozone variability in the lower stratosphere. It is noted that the tropopause varies from ∼10 to ∼15 km depending on the season at MOHp, OHP and Lauder, and from ∼12 km in winter to ∼18 km in summer at Tsukuba and TMF, whereas it is located between 16 and 20 km at MLO. Because of the elevated tropopause in all seasons, the analysis excludes the measurements below 21 km at MLO. Near the tropopause the ozone variability is largest, which can be the reason for the observed large differences for all measurements below 18 km at Tsukuba and TMF. Besides, as in our analysis, Jiang et al. (2007) also showed some high bias for Aura MLS with the OHP, TMF and MLO lidars in the lower stratosphere, which could be due to the upper troposphere/lower stratosphere (UT/LS) oscillations. In addition, it is a more difficult region to retrieve for satellite measurements.
Large deviations are found at Tsukuba, particularly at 15-17 and 40-42 km, as seen in Tatarov et al. (2009). These are possibly due to the fewer coincidences with Tsukuba ozone lidar measurements. The large positive deviations found for UARS MLS below 20 km at all stations can be due to the poorer retrieval of UARS MLS. This positive bias near 100 hPa was also found in the comparison between SAGE II and UARS MLS at all latitudes (Livesey et al., 2003). Aura MLS shows very small deviations above 20 km even though a slight negative bias of ∼5 % is found at OHP and MLO above 38 km. At MLO, it is mainly generated from the MLS temperature data used for the conversion of MLS ozone VMR to number density. This negative difference above 38 km (3-1.46 hPa) was already shown in Jiang et al. (2007) when compared to lidar and in Boyd et al. (2007) for the comparison with microwave radiometer (MWR) at MLO. Similarly, the differences of SAGE II and Aura MLS with the MWR show positive deviations in the upper stratosphere at Lauder, which is the same as that obtained in our comparison of SAGE II and Aura MLS with the Lauder lidar. Lower negative deviations of Aura MLS at OHP above 40 km, in contrast to the higher bias shown in Nair et al. (2011), imply that differences in vertical resolution can play a significant role in the determination of ozone biases of different instruments.

Application of correction factor
As mentioned in Sect. 2.2, the CF is used to screen the sonde profiles at MOHp, OHP, Tsukuba and MLO. We therefore investigate how the estimated biases depend on the application of the CF. For this purpose, the normalised BM and KC sonde profiles are divided by the CF to remove the scaling. Figure 3 shows the average biases obtained for the comparison between lidar and non-normalised (left panel) and normalised (right panel) sondes. The non-normalised BM (at MOHp), KC (at Tsukuba) and ECC (at OHP) sondes provide larger bias compared to the respective normalised sondes. However, the non-normalised ECC sondes at MLO yield smaller bias than that of the normalised sondes. The non-normalised sondes consistently underestimate ozone at all altitudes at MOHp and OHP. Nevertheless, the non-normalised KC sondes at Tsukuba overestimate ozone above 21 km and underestimate below 21 km. Hence, the normalised KC sondes show comparatively larger negative bias below 21 km. In general, multiplication by the CF reduces the bias except at MLO. Besides, the differences between these comparisons, in terms of CF, are not as large for the ECC sondes as for the BM and KC sondes. In addition, the ozonesondes at MOHp show slightly larger bias above 29 km in both cases, which is largely due to the inadequate correction of decreasing pump efficiency in the low pressure regions (Steinbrecht et al., 1998, 2009b).

Relative drifts
Monthly mean difference time series of the compared data sets are used to evaluate drifts in the ozone measurements, because they are less noisy compared to the daily differences. Also, there are possibilities of non-linear drifts for the satellite measurements due to the degradation, particularly for SBUV(/2). But in our analysis, for consistency, a simple linear regression is applied to these time series and the drift is derived from the slope value of the regression.
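As an illustration of the drift estimate, a minimal sketch of an ordinary least-squares fit to a monthly mean difference series is given below. The function name `drift_per_year` and the use of decimal years as the time axis are assumptions made for the example; the standard error is returned so that a significance criterion can be applied, and the 2σ rule mentioned in the comment is only an assumed convention, not one stated in this section.

```python
import numpy as np

def drift_per_year(t_years, diff_pct):
    """Slope (% per yr) of a linear fit to relative differences, with its standard error.

    t_years  : time of each monthly mean in decimal years
    diff_pct : monthly mean relative difference in percent (NaN where missing)
    """
    t = np.asarray(t_years, float)
    y = np.asarray(diff_pct, float)
    ok = np.isfinite(y)              # skip months without coincidences
    t, y = t[ok], y[ok]
    tc = t - t.mean()                # centre time for numerical stability
    sxx = np.sum(tc ** 2)
    slope = np.sum(tc * (y - y.mean())) / sxx
    resid = y - y.mean() - slope * tc
    # Standard error of the slope from the residual variance (n - 2 degrees of freedom).
    se = np.sqrt(np.sum(resid ** 2) / (t.size - 2) / sxx)
    # Assumed convention: call the drift significant if abs(slope) > 2 * se.
    return slope, se
```

This simple fit ignores autocorrelation in the residuals, so the standard error it returns is likely to be somewhat optimistic for real monthly series.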

Comparison with ozone lidar as reference
Lidars are used as the reference for Fig. 4, where drifts are estimated for the data set samples from SBUV(/2), SAGE II, HALOE, Aura MLS and ozonesondes. UARS MLS is excluded from the drift estimation since it is not considered suitable for ozone trend studies because of the change of instrument set-up in 1997 due to the failure of one radiometer for the independent p/T retrievals. Generally, the relative drifts are less than ±0.5 % yr −1 at 20-40 km and most of them are insignificant too. However, some significant drifts are observed at some altitudes for SAGE II at OHP and MLO, for HALOE at OHP, TMF and MLO, for SBUV(/2) at TMF and MLO and for Aura MLS at MOHp and TMF. As we have seen for the biases, drifts are larger below 20 and above 40 km. Among the long-term measurements, SBUV(/2) and ozonesondes provide the smallest drift with respect to all lidars. Aura MLS exhibits comparable drifts to those of SAGE II and HALOE even though it has only eight years of measurements and the drifts are significant at
Aura MLS shows relatively larger negative drifts at MOHp and TMF above 30 km. In order to understand these negative drifts, we analysed the raw ozone time series (i.e. by considering all observations irrespective of the coincident profiles) from various observations (SBUV(/2), SAGE II, HALOE, UARS MLS and ozonesondes), including Aura MLS and lidar, at MOHp and TMF. From the ozone anomaly time series, it is found that the ozone anomaly computed from the MOHp ozone lidar measurements agrees well with those estimated from the above mentioned data sets until 2007. Conversely, an increase in ozone anomaly is found above 30 km since 2007 compared to that evaluated prior to 2007 for lidar data at MOHp. A similar result is also found for TMF ozone lidar measurements, i.e. an increase in ozone anomaly in 2008 and 2009 compared to the ozone anomalies computed from TMF lidar in other years above 30 km. Nevertheless, the ozone anomalies of Aura MLS do not show any discontinuity and exhibit a similar pattern over the analysis period. Therefore, the significant negative drift of the Aura MLS data above 30 km at MOHp and TMF can be due to the high ozone values of lidar measurements at these stations in the specific years. However, more comparisons with additional data sets are necessary to find an exact reason for these differences.
Note that the drift in the measurement differences may not entirely be due to the measurement uncertainties of the comparison data sets, as the reference data can also contribute to it. Therefore, accurate diagnosis of the stability of the reference data is a prerequisite in drift studies; hence, the stability of lidar time series is evaluated in the following section.

Comparison of lidar with SBUV(/2), SAGE II and HALOE as references
The stability of ozone lidar measurements at various stations is checked by finding their drifts in comparison with other long-term data sets such as SBUV(/2), SAGE II and HALOE. The derived drifts of all lidars considering SBUV(/2), SAGE II and HALOE as references are shown in Fig. 5a, b and c, respectively. Generally, all lidars exhibit very small drifts (within ±0.2 % yr −1) with SBUV(/2), but some of these are significant at MOHp (at 30, 32, 42 and 45 km), Tsukuba (at 26 and 32 km), TMF (at 32, 42 and 45 km) and MLO (at 30, 32 and 42 km). The drifts with respect to SAGE II and HALOE are slightly larger, but most of them are not significant except the ones at 20-22, 25, 38 and 39 km with SAGE II at MLO. The RMS of the drifts of lidar in the 20-40 km altitude region, averaged over the stations excluding Tsukuba, is about 0.16, 0.34 and 0.42 % yr −1 with respect to SBUV(/2), SAGE II and HALOE, respectively. To corroborate these results, the drifts of other long-term measurements SBUV(/2), SAGE II and HALOE are estimated in a similar manner and are described in the following section.
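The RMS summary quoted above can be illustrated with a short sketch. It assumes one particular reading of the text, namely that an RMS over the 20-40 km levels is computed per station and then averaged over the stations; the function names and the dictionary layout are hypothetical.

```python
import numpy as np

def rms_drift(alt_km, drift, lo=20.0, hi=40.0):
    """RMS of the drifts (% per yr) at the levels between lo and hi km."""
    alt = np.asarray(alt_km, float)
    d = np.asarray(drift, float)
    sel = (alt >= lo) & (alt <= hi) & np.isfinite(d)
    return float(np.sqrt(np.mean(d[sel] ** 2)))

def station_mean_rms(alt_km, drifts_by_station, exclude=()):
    """Average the per-station RMS values, skipping any excluded stations."""
    vals = [rms_drift(alt_km, d)
            for name, d in drifts_by_station.items()
            if name not in exclude]
    return float(np.mean(vals))
```

If the RMS were instead taken over all stations and levels pooled together, the result would differ slightly, so the choice should be stated alongside any quoted value.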

Comparison of SBUV(/2), SAGE II and HALOE
As mentioned earlier, the relative drifts of SBUV(/2), SAGE II and HALOE are evaluated by comparing them to each other. Figure 6a shows the relative drifts of HALOE at various stations with SAGE II as reference. The drifts are about ±0.5 % yr −1 at MOHp, OHP, Tsukuba and Lauder above 20 km. At TMF, they are more scattered and are less than ±0.5 % yr −1 except at 21-22 and 29-34 km. At MLO, the drifts are larger as the coincidences are available for 4 years (mid-1999 to mid-2003) only. Even though these drifts are larger compared to those of lidar and SBUV(/2), they are insignificant and are compatible with the no-drift hypothesis, but the uncertainty is too large to detect small drifts. Figure 6b and c represent the relative drifts of SBUV(/2) with SAGE II and HALOE as references, respectively. The relative drifts of SBUV(/2) with SAGE II are very small, and most of them are close to zero irrespective of the stations. Similarly, the comparison of SBUV(/2) with HALOE exhibits drifts of less than ±0.5 % yr −1. The SBUV(/2)-SAGE II comparison yields smaller drifts than those between SBUV(/2) and HALOE. The former comparison yields around ±0.1 % yr −1 in 20-44 km, while the latter leads to about ±0.2 % yr −1 at 21-25 and 30-42 km and ∼0.5 % yr −1 at 45 km at all stations. Importantly, even though the drifts are very small, some of them are significant, particularly in the upper and middle stratosphere. These results are very similar to those mentioned in Nazaryan et al. (2005) and Nazaryan et al. (2007), who compared SBUV/2 (NOAA-11, 16) with SAGE II and HALOE, respectively, in the latitude bands of 50-40° S, 10-20° N, 30-40° N and 40-50° N. In the same manner, Cunnold et al. (2000) calculated drifts between SBUV and SAGE and found drifts of ±0.5 % yr −1 in the tropical and mid-latitude regions.
From Figs. 5 and 6, it is evident that the comparison between SBUV(/2) and all other long-term measurements provides near-zero drifts (or no drifts) at all stations and at all altitudes. Here, the comparison is performed using partial ozone columns on SBUV(/2) pressure levels, which reduces the ozone variability. Moreover, the coincidences between SBUV(/2) and other measurements provide a continuous time series (i.e. the coincidences are available in all months considered over the time period). These reasons contribute to the smaller drifts.
In short, the comparison between SAGE II and HALOE produces larger drifts, but their comparison with SBUV(/2) and lidar yields comparatively small drifts. Therefore, the large drift obtained for the comparison between SAGE II and HALOE does not imply that these measurements are unstable for long-term studies. It indicates that a comparison of similar techniques having a low measurement frequency does not provide a clear picture of the stability of the data.

Average of the drifts of long-term measurements
In order to summarise or to compare globally the magnitude of the drifts of different measurement techniques obtained from various comparisons, the means of the drifts are computed for each data set at each station and are presented in Fig. 7. For example, the drift of the lidar shown at each station is the average of its drifts (shown in Fig. 5) obtained from the comparisons with SBUV(/2) (Eq. 4), SAGE II (Eq. 5) and HALOE (Eq. 6) as references. Similarly, the mean drift of SBUV(/2) is the average of the drifts obtained from the comparisons with lidar (Eq. 1), SAGE II (Eq. 5) and HALOE (Eq. 6) as references and similarly for SAGE II and HALOE. In a similar way, the standard deviation corresponding to the mean drift of each measurement technique is computed by averaging the standard deviations of each drift obtained from different comparisons. It is just a way to represent the standard deviation and does not show the significance of the drift.
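The averaging described above amounts to a mean over the comparisons at each altitude, for both the drifts and their standard deviations. A minimal sketch follows, assuming the drifts from the different reference comparisons are stacked as rows of an array with one column per altitude; the function name `mean_drift` is hypothetical.

```python
import numpy as np

def mean_drift(drifts, sigmas):
    """Average drift and average standard deviation over several comparisons.

    drifts, sigmas : arrays of shape (n_comparisons, n_altitudes); NaN marks
                     altitudes where a given comparison is not available.
    """
    d = np.asarray(drifts, float)
    s = np.asarray(sigmas, float)
    # Plain averages over the comparisons; as noted in the text, the averaged
    # standard deviation is only a representation and not a significance test.
    return np.nanmean(d, axis=0), np.nanmean(s, axis=0)
```

For example, for the lidar at one station the three rows would hold its drifts with SBUV(/2), SAGE II and HALOE as references.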
Generally, as found in the previous comparisons, all data sets show small drifts of around ±0.2 % yr −1 in the 18-45 km altitude range and the measurements are stable too. SAGE II and HALOE ozone at MLO show slightly larger drifts because of the lack of coincidences in most of the years. Below 18 km, the large ozone variability near the tropopause plays a pivotal role in deciding the magnitude of the differences.

Combined data: SAGE II, HALOE and Aura MLS
In general, the 8-year data record of Aura MLS yields comparable drifts to the long-term measurements with respect to most of the ozone lidar measurements except with MOHp and TMF ozone lidars above 30 km. So Aura MLS can be considered as a strong candidate for extending the observations of SAGE II and HALOE. Here, we assess the possibility of using Aura MLS as a successor of SAGE II and HALOE for ozone trend studies in the low and mid-latitude regions. The combined data sets are computed from the relative differences between the lidar data and SAGE II or HALOE measurements until August 2004, and Aura MLS observations from September 2004 until the end of the respective coincident periods. Before combining data sets of entirely different observational techniques, a correction of bias with respect to lidar measurements needs to be applied. For this, the average biases over the coincident periods of SAGE II, HALOE and Aura MLS, with respect to lidar data, are removed from the corresponding time series of relative differences at each station. Because of the differences in vertical resolutions of SAGE II, HALOE and Aura MLS, the combined data sets are made available at specific reference altitudes (18, 21, 25, 30, 35 and 40 km). The relative differences at these altitudes are calculated by averaging ozone number density within ±2 km of the altitudes (e.g. 18 ± 2 km). The features of the combined time series are described in Sect. 4.3.1, and the drifts derived from these combined data are discussed in Sect. 4.3.2.
Figure 8 shows the bias-corrected combined time series at MOHp (left panel), OHP (middle panel) and Tsukuba (right panel). At MOHp and OHP, small differences of ±(5-7) % are observed for SAGE II and HALOE in 19-23, 23-27, 28-32 and 33-37 km. Aura MLS shows very small deviations of less than ±5 % in these altitudes at both stations. At 16-20 and 38-42 km, differences are relatively larger (±10 %) for SAGE II and HALOE and are less than ±7 % for Aura MLS. Even if the Tsukuba time series is characterised by relatively fewer data and large discontinuities, smaller differences are observed. At MOHp, a decreasing tendency is observed in the relative differences of Aura MLS from 28-32 km upwards, which can be due to the increase in lidar ozone values after 2007, as discussed in Sect. 4.2.1. In addition, at MOHp, a clear seasonal difference is also seen for the comparison with Aura MLS at 38-42 km showing positive deviation in the winter, indicating that the Aura MLS ozone is slightly higher than that of MOHp lidar in that season.
Figure 9 displays the bias-corrected combined time series at TMF (left panel), MLO (middle panel) and Lauder (right panel). At MLO, the relative differences are less than ±5 %. In the tropics, the ozone variability is very small compared to that of high latitudes, which explains the smaller differences at MLO. At TMF and Lauder, Aura MLS shows differences of ±5 % at all altitudes except at 16-20 km, and SAGE II and HALOE exhibit about ±10 % deviation except at 16-20 km, where the differences exceed ±20 %. At TMF, Aura MLS exhibits negative differences in 2008 and 2009 from 28-32 km upwards, which can be due to higher lidar ozone during the period as compared to other years, as mentioned in Sect. 4.2.1.
Figure 10 presents the relative drifts estimated from the combined time series (as shown in Figs. 8 and 9) of SAGE II and Aura MLS (left panel), and HALOE and Aura MLS (right panel) at various stations. The drifts are generally within ±0.2 % yr −1. However, the SAGE II/Aura MLS drift at Lauder is around ±0.2 % yr −1 at 21, 25, 30 and 35 km and around ±0.3 and ±0.48 % yr −1 at 18 and 40 km, respectively. These large values arise because the first two measurements at the beginning of the period show a slightly larger difference for SAGE II versus lidar (as shown in Fig. 9). The removal of those two measurements results in a very small drift of less than ±0.2 % yr −1 over the whole range (shown as dashed lines in the left panel of Fig. 10). At Tsukuba, drifts are relatively larger at some altitudes compared to those at other stations. Generally, the combined data show insignificantly small drifts. This indicates that the combination of these satellite observations can be a potential long-term data set for the evaluation of long-term ozone trends in the stratosphere, even though Aura MLS shows significant drifts with lidars at some stations above 30 km.
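The construction of the bias-corrected combined series can be sketched as below. This is a simplified illustration under stated assumptions: times are decimal years, the switch date is taken as the start of September 2004, and each instrument's bias is taken as the mean of its own difference series over its full coincident period. The function name and arguments are hypothetical.

```python
import numpy as np

SWITCH = 2004.0 + 8.0 / 12.0  # assumed decimal-year stamp for 1 September 2004

def combine_bias_corrected(t_old, d_old, t_new, d_new, switch=SWITCH):
    """Splice two relative-difference series (percent, versus lidar) into one.

    t_old, d_old : times and differences of the older sensor (SAGE II or HALOE)
    t_new, d_new : times and differences of Aura MLS
    """
    t_old, d_old = np.asarray(t_old, float), np.asarray(d_old, float)
    t_new, d_new = np.asarray(t_new, float), np.asarray(d_new, float)
    # Remove each instrument's average bias over its whole coincident period.
    old = d_old - np.nanmean(d_old)
    new = d_new - np.nanmean(d_new)
    # Older sensor before the switch date, Aura MLS from the switch date on.
    keep_old = t_old < switch
    keep_new = t_new >= switch
    t = np.concatenate([t_old[keep_old], t_new[keep_new]])
    d = np.concatenate([old[keep_old], new[keep_new]])
    return t, d
```

A linear fit to the returned series then gives the drift of the combined record at one reference altitude.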

Conclusions
An extensive analysis of stratospheric ozone measurements at different NDACC lidar stations (MOHp, OHP, Tsukuba, TMF, MLO and Lauder) is performed in this study. The diagnosis is done by comparing various long- and short-term satellite observations of SBUV(/2), SAGE II, HALOE, UARS MLS and Aura MLS as well as ozonesonde measurements at the respective stations.
The relative difference (or bias) of all measurement techniques is found by comparing them with lidar measurements in their respective coincident periods. All measurement techniques (satellites and sondes) agree well with all lidars, with average biases of less than ±5 % in the 20-40 km range. In order to detect ozone trends on the order of a few % decade −1, stability of long-term measurements is essential. This is particularly important for long-term ground-based and satellite sensors, which may be subject to some degradation during their lifetime. Therefore, in this study we examine the stability of each measuring system by investigating the magnitude of the drifts. This is done first by comparing all measurements with respect to lidars, which yields drifts of less than ±0.5 % yr −1 at 20-40 km for most observations. Aura MLS with 8 years of observation also shows drifts that are comparable to those of the long-term data sets at all stations except at MOHp and TMF above 30 km. Below 20 and above 40 km, relative differences and drifts are larger, mostly due to discontinuity in the time series, smaller ozone values or higher uncertainty of ozone observations in these altitude regions. In addition, in the lower stratosphere larger atmospheric variability at the mid-latitude stations and a higher tropopause at the tropical station also contribute to the observed large biases and drifts.
A successful evaluation of biases and drifts depends on the stability of the reference data, and hence the drifts of ozone lidar measurements with respect to the longer data sets SBUV(/2), SAGE II and HALOE are estimated. The relative drifts of lidar are nearly zero at most altitudes. Similarly, the drifts of SBUV(/2), SAGE II and HALOE are estimated by comparing them with each other. Comparison between SAGE II and HALOE shows drifts with maximum of ±0.5 % yr −1 in 20-45 km, whereas the comparison of SBUV(/2) with lidar and SAGE II produces near-zero drifts. Because of successive instruments, SBUV(/2) provides daily global measurements over the whole period with a large number of collocated profiles, and thus a very accurate evaluation of drift of the data is performed. So a sufficient number of continuous profiles is an important factor for deducing accurate drifts with meaningful statistics. The average of the drifts of long-term measurements obtained from various comparisons is within ±0.2 % yr −1 in 20-45 km. Therefore, the long-term measurements considered here are stable at the respective latitude bands.
As the various ozone measurement techniques yield consistent results, it is useful to combine different ozone measurements to establish a long-term data set for further analyses and trend studies. Hence, a bias-corrected combined time series is constructed by combining the relative differences of SAGE II and HALOE, with respect to lidar data, with those of Aura MLS, and the relative drifts are estimated from it. It shows drifts of less than ±0.2 % yr −1 at most altitudes for all the considered latitude bands. So the combination of the older data sets, SAGE II and HALOE, with Aura MLS can be used for the estimation of long-term ozone trends.
Therefore, this work satisfies one of the main goals of NDACC: the validation of ozone measurements from satellites over several decades at different latitude bands. This study is unique, as it establishes for the first time the bias and drift of short- and long-term data for a number of ground-based stations using at least four different comparison methods and evaluates drifts of the combined data sets. It demonstrates that the long-term NDACC ozone lidar measurements are suitable for the evaluation of the stability of satellite observations and the estimation of ozone trends.