SAGE version 7.0 algorithm: Application to SAGE II

This paper details the SAGE version 7.0 algorithm and how it is applied to SAGE II. Changes made between the previous (v6.2) and current (v7.0) versions are described and their impacts on the data products explained for both coincident event comparisons and time-series analysis. Users of the data will notice a general improvement in all of the SAGE II data products, which are now in better agreement with more modern data sets (e.g. SAGE III) and more robust for use with trend studies.

This discussion paper is/has been under review for the journal Atmospheric Measurement Techniques (AMT). Please refer to the corresponding final paper in AMT if available.

Introduction
The Stratospheric Aerosol and Gas Experiments (SAGE I, II, III/METEOR-3M, and III/ISS) are an ongoing series of satellite-based solar occultation instruments spanning over 26 yr. Measurements from the SAGE series have been a cornerstone in studies of stratospheric change, including having played a key role in numerous international assessments (e.g. WMO, 2011). Given the importance of the data, it is imperative that the data sets, and the processing codes that produce them, be maintained and, when necessary, updated and improved to reflect the evolving "best practices" for processing occultation data to science products. To facilitate using data from multiple instruments to investigate long-term variability in atmospheric components, it is important to maintain consistency in methodology (when applicable) and fundamental assumptions made in processing data from each instrument. This paper describes the first standard algorithm to process SAGE data, SAGE version 7 (v7.0). The basis of the [...]

The optical properties of most channels were defined by the position of exit slits along a Rowland spectrometer where photodiodes measured the impinging light. The seven channels, in channel number order, were nominally located at 1020, 935, 600, 525, 452, 448, and 386 nm. Due to limitations on the size of the diodes (i.e. placing them next to each other), channels 2 and 5 were placed at the zero order location with filters providing the desired band-pass. Channel 6 required a narrow band-pass and also employed a filter to relax the requirements on high tolerance mechanical positioning of the channel 6 exit slit.

The SAGE II instrument was oriented towards nadir on the spacecraft such that the optics could observe the Earth's limb. Prior to the expected start of an occultation (event), the scan head/telescope/spectrometer assembly rotated towards the predicted azimuth location of the Sun and then locked onto the solar centroid brightness.
An elevation scan-mirror then began moving the field-of-view (0.5 by 2.5 arc-minutes) across the solar disk normal to the Earth's surface. As the field-of-view went off the edge of the Sun, the scan-mirror would reverse direction. In this way, the field-of-view scanned vertically across the Sun while each channel recorded solar irradiance data (count values) at a rate of 64 Hz (packets per second) to construct a series of solar limb-darkening curves (counts observed as a function of time). The instrument continued this process until the Sun disappeared below the Earth's limb (for sunsets) or a preset amount of time had elapsed (for sunrises). The benefit of scanning back and forth across the Sun, as opposed to simply staring at the Sun, is that during the course of a sunrise or sunset, the instrument was able to observe the same altitude multiple times, albeit through slightly different viewing geometries. Stated differently, the instrument was able [...]

[...] (NO2), water vapor (H2O), and aerosol extinction (at 1020, 525, 452, and 386 nm) using a simple onion-peeling technique. In order to do this, the inversion algorithm must make the assumption that the layer of atmosphere at each altitude is homogeneous, or at least has a constant gradient, through the whole swath that the instrument observes. This assumption has obvious limitations in the troposphere and well understood biases at higher altitudes for certain species due to rapid photochemistry, giving rise to a nonlinear variation across the terminator (Chu and Cunnold, 1994), but works well through most of the stratosphere (Cunnold et al., 1989). A general outline of the algorithm can be seen in Fig. 1. The sections that follow provide greater detail on the various steps just outlined and note when and how these differ from the approach of SAGE II version [...]

[...] meteorological information.
In addition, some calculations related to the spectroscopy of the instrument are performed to facilitate later calculations.
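
The onion-peeling inversion mentioned in the overview can be illustrated with a short sketch. This is not the SAGE code: it assumes spherical, homogeneous shells on a uniform altitude grid, a spherical Earth of radius 6371 km, no refraction, and hypothetical function names (`path_lengths`, `onion_peel`). Starting from the highest tangent altitude, each slant-path optical depth is stripped of the contribution from the shells above it, leaving the extinction of the tangent shell.

```python
import numpy as np

def path_lengths(z_km, dz_km, r_earth_km=6371.0):
    """Path length (km) of each tangent ray through each spherical shell.

    Row i is the ray whose tangent altitude is z_km[i]; column j is the shell
    spanning z_km[j] to z_km[j] + dz_km. Rays only cross shells at or above
    their tangent altitude, so the matrix is upper triangular.
    """
    r = r_earth_km + np.asarray(z_km, dtype=float)   # lower shell boundaries
    n = r.size
    A = np.zeros((n, n))
    for i in range(n):
        outer = np.sqrt((r[i:] + dz_km) ** 2 - r[i] ** 2)
        inner = np.sqrt(np.maximum(r[i:] ** 2 - r[i] ** 2, 0.0))
        A[i, i:] = 2.0 * (outer - inner)             # factor 2: both sides of the tangent point
    return A

def onion_peel(slant_optical_depth, A):
    """Recover shell extinction (1/km) by back-substitution from the top down."""
    tau = np.asarray(slant_optical_depth, dtype=float)
    ext = np.zeros(tau.size)
    for i in range(tau.size - 1, -1, -1):
        above = A[i, i + 1:] @ ext[i + 1:]           # contribution of already-solved shells
        ext[i] = (tau[i] - above) / A[i, i]
    return ext
```

Because the shells are treated as homogeneous along the whole ray, the sketch inherits the same limitation noted in the text for the troposphere and for species with rapid photochemistry.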

Ephemeris
NASA's Tracking and Data Relay Satellite System measured the state vectors (position and velocity) of the ERBS on a regular basis. The Operation Support Computing Division at the Goddard Space Flight Center assimilated this data with an accurate model to determine the spacecraft position and velocity at 60 s intervals, which was provided as level zero ephemeris data to the SAGE processing team. From this original ephemeris data, events with beta angles greater than 61° or events that occur during spacecraft-viewed solar eclipses are excluded from further processing. High beta angles (defined as declination of the Sun as measured from the orbital plane) are excluded because the duration of an event (and subsequent ground track) becomes long and the assumption of atmospheric spherical symmetry, used for the inversion process, breaks down. For each remaining event, the original spacecraft state vectors just prior to the start of the event are used as input to the processing algorithm. Using a model for the Earth's gravitational potential, the equations of motion are calculated and the state vectors are propagated throughout the event. In addition to the state vectors, the position of the Sun is calculated for each time. Then, for each time, coordinate and geometrical data required by the algorithm are calculated. This includes information related to the spacecraft (sub-spacecraft latitude, longitude, and altitude), the Sun (angular size, right-ascension, declination, and azimuth), and the tangent point (altitude, latitude, longitude) looking at the center, top, and bottom of the Sun. The methodologies for these calculations originate from Buglia (1988). For the most part, these methods are straightforward geometry but, in pre-version 7.0 algorithms, some aspects (e.g. sidereal time, precession, nutation, and Sun position) were approximated using numerical constants derived from pre-1984 data.
It is important to note that in 1984, national and international almanacs adopted a revised set of physical constants put forth by the International Astronomical Union in 1976, which necessitated a change to various numerical constants used in these approximations. However, these changes were not uniformly implemented and data quality in versions prior to version 7.0 was adversely impacted by an inconsistency in ephemeris epoch usage. In version 7.0, all physical constants (e.g. Earth's equatorial and polar radii and gravitational constant) have been updated to those used in the World Geodetic System 1984 (WGS84, updated in 2004), which is the current standard. The constants used for the multi-pole expansion of the Earth's gravitational potential have been updated to those from the Earth Gravity Model 1996 (EGM96) (Lemoine, 1998). In addition, coded routines that relied upon approximations based upon pre-1984 data have been modified to utilize SDP Toolkit routines (Noerdlinger, 1995). Most of these changes result in inconsequential changes to the results, with one notable exception. The original routine designed to calculate the position of the Sun suffered from inconsistent ephemeris epoch usage and outdated numerical approximations. The adoption of a Toolkit routine has corrected what was a previously unknown quasi-random error in the altitude registration in the SAGE II data products. For any given event, this correction manifests itself as an altitude offset between SAGE II versions 6.2 and 7.0. This altitude offset can be positive or negative, can have a magnitude up to a few hundred meters, and varies from event to event. While there is no simple dependence upon beta angle or latitude, there is some correlation with time of year, as expected, as shown in Fig. 2.
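
The beta-angle screening described in this section can be sketched as follows. This is an illustration only: it assumes inertial-frame state vectors, applies the 61° limit to the magnitude of the beta angle, and takes the eclipse flag as a given input; the function names are hypothetical. The beta angle is computed as the elevation of the Sun direction above the orbital plane, whose normal is the cross product of position and velocity.

```python
import numpy as np

BETA_LIMIT_DEG = 61.0  # exclusion threshold quoted in the text

def beta_angle_deg(position, velocity, sun_direction):
    """Angle (deg) between the Sun direction and the orbital plane."""
    normal = np.cross(np.asarray(position, float), np.asarray(velocity, float))
    normal = normal / np.linalg.norm(normal)          # unit orbit normal
    sun = np.asarray(sun_direction, float)
    sun = sun / np.linalg.norm(sun)
    # Elevation above the plane is the arcsine of the projection onto the normal.
    return np.degrees(np.arcsin(np.clip(normal @ sun, -1.0, 1.0)))

def keep_event(position, velocity, sun_direction, in_solar_eclipse=False):
    """True if the event passes the beta-angle and eclipse screening."""
    beta = beta_angle_deg(position, velocity, sun_direction)
    return abs(beta) <= BETA_LIMIT_DEG and not in_solar_eclipse
```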

SAGE processing algorithms require ancillary meteorological data that relate density and temperature to altitude. SAGE II version 6.2 used multiple sources of data to yield density and temperature data (aka P/T data) from the surface up to 100 km. NCEP Reanalysis data (Kalnay et al., 1996) were used first, yielding P/T data from 1000 mbar up to 10 mbar (∼ 30 km). Above that, operational model data provided by NCEP were used up to 0.4 mbar (∼ 50 km). Lastly, the Global Reference Atmospheric Model-1995 (GRAM-95) (Johnson et al., 1995) was used to extend the profiles up to 100 km. Since each atmospheric layer is assumed to be uniform, a single location was chosen to retrieve P/T data from, namely the 20 km sub-tangent latitude and longitude. The temperature and pressure data sets were then combined and interpolated to a standard 0.5 km spaced altitude grid. Analysis of the version 6.2 P/T data revealed a few anomalies where NCEP operational model data are absent, resulting in the use of GRAM-95 data deep into the stratosphere. In addition, an analysis comparing SAGE II and III coincident events revealed that SAGE II NCEP operational model temperature data were typically warmer than SAGE III, even though both were using NCEP data sources (Fig. 3). The cause of this bias is unexplained. To maintain as much consistency as possible, a long-term self-consistent meteorological data set was required that spans the lifetimes of all of the SAGE instruments and provides data throughout the stratosphere. The Modern-Era Retrospective Analysis for Research and Applications (MERRA) (Rienecker et al., 2011), based on the Goddard Earth Observing System Model (GEOS-5.2) (Rienecker et al., 2008), provides temperature data at 42 pressure levels from the surface up to 0.1 mbar (∼ 65 km) from 1978 to the present.
While this has not yet been implemented in SAGE III, there are plans to use MERRA data for the reprocessing of SAGE III/M3M data to the version 7.0 standard and to use operational GEOS data for the upcoming SAGE III/ISS mission. SAGE II version 7.0 uses MERRA data from the surface up to 0.1 mbar and GRAM-95 above that up to 100 km. Instead of using the actual GRAM-95 temperature values, the GRAM-95 lapse rate is used to extrapolate above the MERRA data. This is done because the MERRA lower mesospheric temperature values are often smaller than those from GRAM-95. Any attempt to merge the two data sets can introduce an artificial inversion layer in the lower mesosphere.
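
The lapse-rate extrapolation can be sketched as below. This is a minimal illustration under stated assumptions, not the operational code: both profiles are given on altitude levels, the reference (climatological) profile starts exactly at the top level of the lower (reanalysis) profile, and the function name is hypothetical. Only the layer-to-layer temperature differences of the reference profile are used, so the merged profile stays continuous at the joint and no artificial step or inversion is introduced by an offset between the two sources.

```python
import numpy as np

def extend_with_lapse_rate(z_low, t_low, z_ref, t_ref):
    """Extend a lower temperature profile upward using a reference lapse rate.

    z_low, t_low : altitude (km) and temperature (K) of the lower profile.
    z_ref, t_ref : reference profile whose first level equals z_low[-1].
    """
    z_low, t_low = np.asarray(z_low, float), np.asarray(t_low, float)
    z_ref, t_ref = np.asarray(z_ref, float), np.asarray(t_ref, float)
    if z_ref[0] != z_low[-1]:
        raise ValueError("reference profile must start at the top of the lower profile")
    # Summing the reference layer differences integrates its lapse rate upward
    # from the top temperature of the lower profile.
    t_above = t_low[-1] + np.cumsum(np.diff(t_ref))
    return np.concatenate([z_low, z_ref[1:]]), np.concatenate([t_low, t_above])
```

In this toy setup a reference profile that is 5 K warmer than the lower profile at the joint contributes only its gradient, not its absolute values.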
In both versions, after P/T profiles are determined, number density profiles are calculated. In order to have uncertainty estimates in derived quantities in the inversion [...]

[...] the light ray. These quantities are calculated for each tangent altitude (in 0.5 km increments) from the surface up to 300 km. The methodology remains largely unchanged from version 6.2 and comes from Chu (1983) and Auer and Standish (2000). After refraction, tangent point altitudes, latitudes, and longitudes are updated, taking an oblate Earth model into account (Fig. 4).

There is one important error in previous versions to note, which is corrected in version 7.0. As the algorithm determines the refraction angle, it begins at the surface and moves to higher tangent altitudes. At a small threshold value for the refraction angle, this process ceases and remaining refraction angle values are assumed to be zero. However, these values were not explicitly set to zero. The result of this was that if one event hit this threshold at some altitude and the following event hit this threshold value at some lower altitude, then the refraction angle values between these two altitudes would be carried over from the previous event, potentially through multiple events. This generally occurred in the 50-60 km range, and would be interpreted as non-physical characteristics in the atmosphere, which could propagate downward in the inversion [...] a single event, to zero. As such, correcting this error in version 7.0 did not uncover or correct any biases in prior versions.
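
The carry-over problem described above is a generic pitfall of reusing a work array between events. The sketch below is a hypothetical reconstruction of the pattern, not the SAGE routine: refraction angles are filled from the surface upward until the first value below a threshold, and the remainder of the (possibly reused) buffer is explicitly reset to zero.

```python
import numpy as np

def fill_refraction_angles(raw_angles, threshold, out=None):
    """Copy refraction angles bottom-up; zero everything above the cutoff.

    raw_angles : angles ordered from the surface upward.
    out        : optional work buffer reused from a previous event.
    """
    raw = np.asarray(raw_angles, dtype=float)
    if out is None:
        out = np.empty_like(raw)
    below = np.nonzero(raw < threshold)[0]
    stop = below[0] if below.size else raw.size       # first level under the threshold
    out[:stop] = raw[:stop]
    out[stop:] = 0.0                                  # explicit reset: no stale values survive
    return out
```

Without the explicit reset line, a buffer shared between events would retain the previous event's values between the two cutoff altitudes, which is the behavior the text describes for earlier versions.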

Spectroscopy
The retrieval process requires absorption cross-sections for O3, NO2, the oxygen dimer (O2-O2), and molecular (Rayleigh) scattering at all wavelengths. This is done by first combining each channel's measured spectral response function with the solar spectrum (Kurucz et al., 1984). The spectral responses of channels 2, 5, and 6 also incorporate the long-term evolution in the spectral characteristics of their respective band-pass filters. These spectral responses are then combined with cross-section data for each species (i.e. O3, NO2, O2-O2, and Rayleigh) and integrated across each channel's spectral range. Rayleigh cross-sections (cm² molecule⁻¹) are wavelength-dependent and are calculated in the same fashion as in Bucholtz (1995). O2-O2 cross-sections (cm⁵ molecule⁻²) are taken to be wavelength-dependent, are assumed to scale with density, and are calculated in the same fashion as Mlawer et al. (1998) (for wavelengths encompassing channels 1 and 2) and Newnham et al. (1998) (for wavelengths encompassing channels 3 through 7). O3 and NO2 cross-sections (cm² molecule⁻¹) are taken to be both wavelength-dependent and temperature-dependent. In version 6.2, they were derived from the Shettle and Anderson cross-section compilation (Shettle and Anderson, 1995), which is the same cross-section compilation that was used in SAGE III version 3.0. In SAGE III version 4.0 (Thomason et al., 2010), however, the O3 cross-sections were updated to the Bogumil (SCIAMACHY V3) cross-sections (Bogumil, 2003), which had a positive impact on several data products and produced better agreement with in-atmosphere measurements (Pitts et al., 2006 [...]

[...] vertical "profiles" of effective cross-sections for these species in each channel, facilitating later computations. Effective cross-sections are constructed in this fashion because the interactions between species absorption within the band-pass of any channel are negligible.
The O3 retrieval is primarily dependent upon observations in channel 3 (600 nm), where the NO2 cross-section has decreased two orders of magnitude from its peak and contributes a negligible amount to the extinction. The NO2 retrieval is primarily dependent upon correlative measurements in channels 5 (452 nm) and 6 (448 nm), where the O3 cross-section has decreased nearly two orders of magnitude from the Chappuis peak and displays little to no structure within these narrow band-passes. Aerosol retrievals are primarily dependent upon observations in channel 1, where the cross-sections of both O3 and NO2 have decreased by several orders of magnitude and contribute a negligible amount to the extinction.
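
The construction of an effective cross-section can be sketched as a weighted spectral average. The normalization by the integrated weight, the function names, and the shared wavelength grid are assumptions made for illustration; the text only states that the response, solar spectrum, and cross-section data are combined and integrated across the channel.

```python
import numpy as np

def _trapezoid(y, x):
    """Trapezoidal integral of y(x) on a possibly non-uniform grid."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def effective_cross_section(wavelength_nm, response, solar, sigma):
    """Response- and solar-weighted mean cross-section for one channel.

    All inputs are sampled on the same wavelength grid. `sigma` is the
    monochromatic cross-section (e.g. at one temperature).
    """
    wl = np.asarray(wavelength_nm, float)
    weight = np.asarray(response, float) * np.asarray(solar, float)
    return _trapezoid(weight * np.asarray(sigma, float), wl) / _trapezoid(weight, wl)
```

Repeating the calculation with the cross-section evaluated at the temperature of each altitude level would give the vertical "profiles" of effective cross-sections the text refers to.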
Due to the nature of the water vapor feature near 940 nm, the use of effective cross-sections is not possible and the absorption must be modeled. Water vapor line data for version 6.2 were provided by Linda Brown (personal communication, 2002), which were later incorporated into the water vapor line data for the 2004 version of HITRAN. The version 7.0 water vapor line data come from the 2008 version of HITRAN (including the 2009 update to water vapor). These line data are used to pre-compute derivatives of absorption as a function of temperature, pressure, and line-of-sight molecular number density for later use with an emissivity-curve-of-growth approximation (EGA) as the forward model for water vapor absorption (Gordley and Russel, 1980). After SAGE II had been operating for some time, a long-term analysis of the water vapor product showed it to be in poor agreement with other satellites and ground measurements. While many possibilities were considered, and eventually ruled out, it was determined that the poor quality of water vapor data was a result of a shift in the spectral response of the water vapor channel (channel 2) prior to 1986 (Fig. 6). The primary reason behind this thinking was an incident associated with the SAGE II instrument during one of its final thermal vacuum tests just prior to being shipped off for integration with the ERBS. An incomplete shutdown of the cooling system caused condensation within the instrument. It is believed that the channel filters absorbed water and then subsequently dried out on orbit, affecting their spectral characteristics. Unfortunately, it was impossible to reproduce this in the lab as no original filter material remained and the company that created it was no longer in business. Version 6.2 was the first version to attempt to account for an apparent shift in the location of the water vapor channel filter.
In order to attempt to model the filter characteristics, it was decided to adjust the spectral response of the water vapor channel to make the mean SAGE II water vapor data at one latitude and season agree with a HALOE climatological profile for the same latitude and season (northern mid-latitudes in March). The center wavelength and full-width at half-max (FWHM) were adjusted and the two data sets were again compared. While it was impossible to completely match these data sets, it was found that the best match came from a center wavelength shift of +10 nm (945 nm) and an increase in the FWHM of 10 % (22 nm). This was then applied to all SAGE II data from 1986 onward. The before and after comparisons of that work can be seen in Fig. 7. (Thomason et al., [...]

With the adoption of the SCIAMACHY V3 ozone cross-section database came a large shift in the Wulf ozone bands that span the SAGE II water vapor channel spectral response (Fig. 8). Given the sensitivity of the water vapor retrieval to the details of ozone (Chu et al., 1993), the evolution of the water vapor channel spectral response was reevaluated. A comparison of SAGE II version 6.2 with newer data sets (Fig. 9) shows that, while version 6.2 matches well to HALOE as expected, there is an offset of about 10 % with both SAGE III and Aura MLS, perhaps suggesting the choice of HALOE as a standard for inferring channel drift was not optimal. The channel 2 drift assessment was repeated, except SAGE III water vapor was used to infer the location of the water vapor channel rather than HALOE. The smallest difference between data sets came from an additional center wavelength shift of +2.7 nm (947.7 nm) and an additional increase in the FWHM of 5 % (23 nm). An updated set of instrument to instrument comparisons can be seen in Fig. 10.

SAGE II is able to measure NO2 by observing the difference in absorption between the two channels located at 452 nm (channel 5) and 448 nm (channel 6).
Another look at exoatmospheric data (Fig. 6) shows that, like the water vapor channel, channel 6 may have had a change in spectral response prior to 1990 as a result of the aforementioned thermal vacuum testing incident. We chose an approach similar to, but slightly different from, that used to correct the water vapor channel. Because NO2 measurements are derived primarily from the differential extinction between the two channels, and because the long-term variation in solar I-zero showed channel 5 to be well behaved, we chose to use one channel to calibrate the other. The differential effective cross-sections between the two channels were compared to the differences in their exoatmospheric counts. It was assumed that any difference in the exoatmospheric counts of channel 6 relative to that seen in channel 5 over time was a result of a change in the band-pass of its filter. In this way, a time-dependent fit could be made that allowed the application of a differential cross-section correction to the effective cross-sections in channel 5 to obtain the effective cross-sections in channel 6 for any time during the mission. The time-dependent model for the channel 6 spectral band-pass required the determination of the initial properties and the rate of change relative to the observed long-term anomalous I-zero drift. To compute these, the SAGE II NO2 stratospheric column abundances were compared to twilight NO2 measurements at Lauder, New Zealand (Johnston and McKenzie, 1984). This procedure was applied in version 6.2, though only at one temperature (240 K). The dependence of SAGE II NO2 retrieval on O3 necessitated a repeat of this procedure after both the O3 and NO2 cross-section databases were changed. The difference in version 7.0 is the implementation of a temperature-dependent differential cross-section correction and the use of SAGE III NO2 data to determine the starting spectral location of channel 6.
The overall quality of SAGE II NO2 measurements relative to the SAGE III NO2 measurements remains mostly unchanged between versions (Fig. 11).
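
As a quick consistency check on the channel 2 (water vapor) band-pass adjustments quoted earlier in this section, the arithmetic can be written out. The 935 nm nominal center comes from the channel list; the 20 nm nominal FWHM is an inference from the statement that a 10 % increase gives 22 nm, and the reading that each percentage applies to the previous value is also an assumption. The helper name is hypothetical.

```python
NOMINAL_CENTER_NM = 935.0   # nominal channel 2 location from the channel list
NOMINAL_FWHM_NM = 20.0      # assumption: implied by "an increase of 10 % (22 nm)"

def adjust_bandpass(center_nm, fwhm_nm, shift_nm, fwhm_increase):
    """Shift the band center and widen the FWHM by a fractional amount."""
    return center_nm + shift_nm, fwhm_nm * (1.0 + fwhm_increase)

# Version 6.2 adjustment (tuned against HALOE): +10 nm, +10 %.
v62_center, v62_fwhm = adjust_bandpass(NOMINAL_CENTER_NM, NOMINAL_FWHM_NM, 10.0, 0.10)

# Version 7.0 adjustment (tuned against SAGE III): additional +2.7 nm, +5 %.
v70_center, v70_fwhm = adjust_bandpass(v62_center, v62_fwhm, 2.7, 0.05)
```

Under these assumptions the version 7.0 center comes out at 947.7 nm and the FWHM at about 23.1 nm, consistent with the rounded 23 nm quoted in the text.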

Transmission
The transmission algorithm combines measured limb-darkening curves with timing and pointing data to produce slant-path optical depths for each tangent altitude between 0.5 km and 100 km in 0.5 km increments. In order to do this, a number of physical and instrumental sources of variation must be compensated for.

The series of calculations and corrections were done, in version 6.2, in a mostly linear fashion with some iterative calculation of transmission. It is now recognized that many of these corrections are dependent upon transmission and are thus, in version 7.0, updated throughout these iterations. This section will focus more on the various calculations and corrections that are done and less on the order in which they are computed or how these steps may be iterated.

Edges and pointing
The first step in transmission processing requires that the limb-darkening curves (Fig. 12) be combined with pointing data to place every data packet accurately on the face of the Sun. First the limb-darkening curves in the 1020 nm channel are used to determine where the physical top and bottom edges of the Sun are by looking for the inflection points and the times associated with them. The timing of science packet data and ephemeris pointing data can then be accurately mapped to each other. This is done by assuming that the rate of motion of the scan-mirror is constant during a scan so that each packet of data within a scan can be interpolated to a location on the face of the Sun. This becomes problematic when the bottom of the Sun is obscured by cloud or is below the limb of the Earth, as the calculated inflection point of the limb-darkening curve no longer correlates to the physical edge of the Sun and the calculated scan rate becomes biased high.
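
The edge detection and packet placement just described can be sketched as follows. This is an illustration under simplifying assumptions: a single noise-free scan, edges taken as the extrema of the first derivative (the inflection points of the curve), and a position coordinate running from -1 at the first edge crossed to +1 at the last. The function names are hypothetical.

```python
import numpy as np

def find_edge_times(t, counts):
    """Times of the two solar edges in one scan.

    The inflection points of the limb-darkening curve are located as the
    steepest rise (field of view moving onto the disk) and the steepest
    fall (moving off the disk).
    """
    slope = np.gradient(np.asarray(counts, float), np.asarray(t, float))
    return t[np.argmax(slope)], t[np.argmin(slope)]

def packet_positions(t, t_first_edge, t_last_edge):
    """Map packet times linearly onto the solar disk (-1 to +1).

    Linear mapping encodes the assumption of a constant scan rate
    within a single scan.
    """
    return -1.0 + 2.0 * (np.asarray(t, float) - t_first_edge) / (t_last_edge - t_first_edge)
```

If the lower edge is hidden by cloud or the Earth's limb, the steepest fall no longer marks the physical edge, the inferred crossing time is too short, and the implied scan rate is biased high, as noted in the text.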
The apparent rate of motion of the scan-mirror is a combination of the motion of the scan-mirror itself and the orbital and attitudinal motions of the spacecraft. A look at the average scan rates and the difference between up and down scan rates for a typical event shows that they are generally well behaved and slowly varying with time (Fig. 13a). However, at any time during the event, the attitude actuators on the spacecraft (these keep the spacecraft in a desired orientation) can turn on or off and cause an abrupt shift in these quantities. The derivatives of these quantities show linearity with respect to time excluding attitude control maneuvers (Fig. 13b). This relationship allows for a linear fit of these quantities to data just above where the rate data begins to go bad, which is characterized by large values in the first or second derivatives (Fig. 13c). These fits are then used to reconstruct low altitude rate data (Fig. 13d).
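
A simplified version of the rate reconstruction can be sketched as below. The details are assumptions for illustration: a sunset-like ordering (later scans are lower in the atmosphere), a single first-difference threshold to mark where the rate data go bad, a fixed number of good scans used in the fit, and a hypothetical function name. Attitude maneuvers, which the text says must be excluded, are not handled.

```python
import numpy as np

def reconstruct_low_altitude_rates(t, rate, jump_threshold, n_fit=5):
    """Replace unreliable low-altitude scan rates with a linear extrapolation.

    t, rate        : per-scan time and scan rate, ordered in time.
    jump_threshold : first-difference magnitude that marks bad data.
    n_fit          : number of good scans just before the break used in the fit.
    """
    t = np.asarray(t, float)
    rate = np.asarray(rate, float)
    jumps = np.nonzero(np.abs(np.diff(rate)) > jump_threshold)[0]
    if jumps.size == 0:
        return rate.copy()                      # nothing flagged as bad
    first_bad = jumps[0] + 1
    lo = max(first_bad - n_fit, 0)
    # Straight-line fit to the last good scans, then extrapolate downward.
    slope, intercept = np.polyfit(t[lo:first_bad], rate[lo:first_bad], 1)
    fixed = rate.copy()
    fixed[first_bad:] = slope * t[first_bad:] + intercept
    return fixed
```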
Once the algorithm has good rate data for each scan, the proper tangent altitude and position on the face of the Sun are calculated for each packet. Given the non-linearity of the problem of combining refraction effects and an oblate Earth model in determining tangent point altitudes, the algorithm uses an iterative scheme optimized for rapid convergence. During the development of both the SAGE III version 4.0 and SAGE version 7.0 algorithms, the algorithmic uncertainty in determining the tangent point altitude was determined to be better than 20 m. Lower in the atmosphere, where refraction effects can become large, the uncertainty in the tangent point altitude is dominated by uncertainties in the meteorological data. Each packet of data, in each channel, is subsequently assigned a time, tangent altitude, position on the face of the Sun, scan-mirror elevation position, and photodiode count value.

In order to create transmission profiles, the algorithm ratios measurements made by the instrument looking at a particular point on the Sun through the atmosphere to the same point seen above the atmosphere. Thus one of the first things the transmission algorithm does is create a standard exoatmospheric limb-darkening curve (I-zero curve) to ratio other scans to. The instrument generally collects between 10 and 20 exoatmospheric scans, so these are combined into a pair of I-zero curves, up-scans and down-scans, which are then interpolated to a fine grid in position on the face of the Sun (∼ 1000 points in version 7.0 versus ∼ 100 points in version 6.2). As the I-zero curve is mapped onto each scan (including the exoatmospheric or I-zero scans), an edge time refinement (see Sect. 3.5) is made for the I-zero scans. A new addition in version 7.0 is the introduction of a time-dependent I-zero correction, which was first introduced in SAGE III version 4.0.
A time-dependent I-zero correction benefits high altitude scans by helping to correct for apparent rotation of the scan track across the face of the Sun due to orbital motions (Burton et al., 2010).
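
The ratioing of attenuated packets to an I-zero curve can be sketched as follows. This is a simplified illustration: it builds a single curve from scans in one direction (the text keeps separate up-scan and down-scan curves), uses a plain mean of the exoatmospheric scans, omits the time-dependent correction, and assumes each scan's positions are increasing and span the full disk. The function names are hypothetical; the 1000-point grid follows the version 7.0 figure quoted above.

```python
import numpy as np

def build_izero(scan_positions, scan_counts, n_grid=1000):
    """Average exoatmospheric scans on a common grid of solar-disk position."""
    grid = np.linspace(-1.0, 1.0, n_grid)       # -1 and +1 are the two solar edges
    curves = [np.interp(grid, p, c) for p, c in zip(scan_positions, scan_counts)]
    return grid, np.mean(curves, axis=0)

def packet_transmission(position, counts, grid, izero):
    """Ratio each packet to the I-zero value at the same position on the Sun."""
    return np.asarray(counts, float) / np.interp(position, grid, izero)
```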

I-zero
Occasionally the instrument would fail to acquire more than one or two exoatmospheric scans. This happened sporadically through the lifetime of the instrument, but most notably during the so-called "short event" period (from mid-1993 to mid-1994), when a battery problem on the spacecraft reduced the operational scan time of the instrument, causing sunset events to begin later than normal and sunrise events to end earlier than normal. Many of the corrections applied to the data require a minimum number of I-zero scans (i.e. data over a large range of exoatmospheric altitudes). In prior versions, the absence of this data would often result in anomalous events that were not screened (i.e. dropped) by the algorithm and required the user to manually screen them out (Wang et al., 2002). In version 7.0, all events without a minimum number of I-zero scans are dropped from processing and flagged accordingly.

Other corrections
Once an I-zero curve is established (and mapped to each scan), a preliminary transmission can be computed for each scan. This then allows the calculation of a few corrections that need to be made. An initial correction for Rayleigh attenuation is done (for the benefit of other corrections) and a polar mesospheric cloud (PMC) detection routine is run (Burton and Thomason, 2000). While no overall correction is made in the presence of PMCs, the use of data within PMCs will be avoided when determining the mirror calibration (see Sect. 3.4). A sunspot detection routine is also run and measure[...] filtering (i.e. it would often omit all data from the start of a sunspot to the edge of the Sun) and a more robust algorithm has been introduced for version 7.0.

As the SAGE II instrument began taking data during each event, a small transient was observed in channels 5 and 6. This so-called thermal shock needed to be accounted for in the data processing. For sunset events, this is relatively easy as the data are fairly homogeneous (exoatmospheric scans) and any time dependency is easily recognized and removed. For sunrise events, however, the transient occurs while the instrument is looking through the attenuated atmosphere and the correction is necessarily made at low altitudes. In this case, scan-to-scan variations in the channel 5 and 6 differential extinction at a given altitude, which are correlated in time across multiple scans, are examined. By looking at the ratio of channel 6 to channel 5, which is used for the NO2 retrieval, the algorithm determines the rate of change of differential extinction, which is then integrated to produce a correction that is applied to these two channels. The impact on the transmission in these channels is on the order of 0.25 %, whereas the impact on other channels (were it to be applied) would be no more than 0.05 % (SPARC/IOC/GAW, 1998).

Mirror calibration
SAGE operates by using a scan-mirror to move the instrument field-of-view up and down across the solar disk (normal to the Earth's limb). The reflectivity of the scan-mirror varies slightly with the angle of incidence and requires calibration. This is not an absolute reflectance calibration, as only the relative change in reflectivity with angle is required. In version 6.2, this calibration was determined through a quadratic fit to high altitude transmission data as a function of altitude and performed only once at the end of transmission processing. In version 7.0, this quadratic fit is to high altitude transmission data as a function of the elevation angle of the scan-mirror and is performed iteratively. In each case, the mirror calibration is applied as a multiplicative correction to remove curvature in the high altitude transmission data, which should theoretically be a constant value of 1 with some instrument noise. Since smaller angles (higher altitudes) are used for the fit, an extrapolation of the correction term to larger angles (lower altitudes) is necessary to correct attenuated data. An example of the mirror correction can be seen in Fig. 14, demonstrating that the angular dependence of the reflectivity of the scan-mirror is on the order of 0.5 %.
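
One pass of the version 7.0 style mirror calibration can be sketched as below. This is an illustration only: the selection of high altitude packets is passed in as a mask, the iteration with the other corrections is omitted, and the function name is hypothetical. The quadratic is fitted against scan-mirror elevation angle where the true transmission should be 1, then evaluated at all angles so that the correction extrapolates to the attenuated data.

```python
import numpy as np

def mirror_correction(elevation_angle, transmission, high_altitude):
    """Remove the angular dependence of scan-mirror reflectivity.

    elevation_angle : scan-mirror elevation angle for each packet.
    transmission    : preliminary transmission for each packet.
    high_altitude   : boolean mask of packets where the true transmission is 1.
    """
    angle = np.asarray(elevation_angle, float)
    trans = np.asarray(transmission, float)
    mask = np.asarray(high_altitude, bool)
    coeffs = np.polyfit(angle[mask], trans[mask], 2)   # relative reflectivity vs angle
    relative_reflectivity = np.polyval(coeffs, angle)  # extrapolated to larger angles
    return trans / relative_reflectivity               # multiplicative correction
```

Because only the relative change with angle matters, the fit absorbs any constant offset in the high altitude data rather than requiring an absolute reflectance.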

The edge-time refinement algorithm is designed to minimize any biases in the limb-darkening curves created by slight errors in the initial calculation of the edge times of each scan. The measured solar intensity of a scan ($I$) in a given channel ($\lambda$) is a function of both position on the face of the Sun ($p$) and altitude ($z$), which are themselves both functions of time ($t$), and can be written as

$$I_\lambda(t) = I_{0,\lambda}\big(p(t)\big)\,T_\lambda\big(z(t)\big) + \varepsilon \tag{1}$$

where $I_0$ is the I-zero limb-darkening curve, $T$ is the slant-path transmission, and $\varepsilon$ is the error in the measurements or estimates. Since there is some inherent uncertainty in the calculated edge times, Eq. (1) can realistically be rewritten as

$$I_\lambda(t) = I_{0,\lambda}\big(p(t + \varepsilon_t)\big)\,T_\lambda\big(z(t + \varepsilon_t)\big) + \varepsilon \tag{2}$$

where $\varepsilon_t$ is related to uncertainties in the edge times. There are two cases that can be considered separately. The first case is for high altitude measurements where $T(z) = 1$, and Eq. (2) can be expanded in terms of $\hat{I}_0$, the estimate for the I-zero curve, its derivative, and $c_1$ and $c_2$, the linear shift and stretch coefficients respectively. The correction is a linear function because position on the face of the Sun is linearly mapped to time. The derivative term is calculated using finite differences and a multiple linear regression technique is used to obtain the shift and stretch terms, which yield the correction to the edge times. The second case involves measurements in the atmosphere where $T(z)$ can no longer be ignored. In this case, Eq. (2) can be expanded in the same manner with the transmission retained, where $c_1$ and $c_2$ are again the linear shift and stretch coefficients. Following the same procedure, the correction to the edge times is obtained. Since the pair of edge times apply equally to all spectral channels, the derivatives are performed at each wavelength and used together in the regression calculation. The shift and stretch algorithm has the effect of removing spectrally and vertically correlated noise in the transmission profile, typically at low altitudes (Fig. 15).
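
The shift and stretch regression for the high altitude case can be sketched as follows. The specific first-order form used here, $I(p) \approx \hat{I}_0(p) + (c_1 + c_2\,p)\,\partial\hat{I}_0/\partial p$, is an assumption consistent with the description (a linear correction in position multiplied by a finite-difference derivative of the I-zero estimate), not a transcription of the paper's equations. The function name is hypothetical, and only one channel is used, whereas the text combines all channels in a single regression.

```python
import numpy as np

def fit_shift_and_stretch(position, observed, izero_estimate):
    """Estimate shift (c1) and stretch (c2) for one exoatmospheric scan.

    position       : packet positions on the solar disk.
    observed       : measured counts at those positions (transmission = 1).
    izero_estimate : I-zero estimate evaluated at the same positions.
    """
    p = np.asarray(position, float)
    i0 = np.asarray(izero_estimate, float)
    d_i0 = np.gradient(i0, p)                        # finite-difference derivative
    design = np.column_stack([d_i0, p * d_i0])       # regressors for c1 and c2
    residual = np.asarray(observed, float) - i0
    (c1, c2), *_ = np.linalg.lstsq(design, residual, rcond=None)
    return c1, c2
```

For the in-atmosphere case the same regression could be applied with the transmission carried through the model, but that form is not reconstructed here.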

Normalization, gridding, and uncertainties
In order to calculate the various corrections already outlined, the algorithm iteratively computes transmission values for each packet. In order to mitigate effects remaining from residual edge time uncertainties, the outer 10 % of the solar disk is omitted. After the algorithm has undergone several iterations of calculating transmission and applying corrections, it computes the final transmission profile. A running median filter is applied to the altitude-sorted transmission packets to minimize the impact of strong outliers, and the result is then smoothed with a boxcar average. The filtering process is performed in altitude and the filtering/smoothing parameters correspond to 1.0 km so that all transmission data meet the Nyquist sampling criterion for a 0.5 km gridded profile. An example of an intermediate transmission profile and a final transmission profile is shown in Fig. 16. The transmission data are then interpolated to the 0.5 km grid, and the variance of the fit with respect to the raw data in each bin is used to compute the uncertainty estimate. In version 6.2, the statistics of this fit were used for the uncertainty in the transmission value in each bin. In version 7.0, we have incorporated an additional uncertainty term, namely a calculated uncertainty in the original I-zero curve, meant to account for variations between the exoatmospheric scans used to create the I-zero curve and the resulting I-zero curve itself. These minor variations between each exoatmospheric scan are highly correlated with position on the solar disk, but have no discernible time dependency (i.e. they are not detected and filtered by the time-dependent I-zero correction). It is believed that these variations represent physical features in the Sun's photosphere (e.g. granulation) combined with the apparent rotation of the solar disk during the event, as opposed to instrumental noise.
These variations manifest themselves as low amplitude oscillatory patterns in high altitude transmission (on the order of 0.1 %) that are periodic in altitude (due to their high correlation with surface features on the Sun) and correlated between channels (Fig. 17). An extension of the time-dependent I-zero algorithm to compensate for this effect is in development. Lastly, all slant-path transmission profiles are converted to slant-path optical depth profiles.
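The filtering chain described in this section (running median, boxcar average, conversion to optical depth) can be sketched as below. The sketch assumes uniformly spaced samples and expresses the window as an odd number of samples, whereas the operational filter is defined in altitude (1.0 km); the function names and the synthetic profile are illustrative assumptions. The conversion uses the standard relation between transmission and optical depth, OD = −ln(T).

```python
import numpy as np

def running_median(values, window):
    """Running median with an odd window; edges are padded by repetition."""
    half = window // 2
    padded = np.pad(values, half, mode="edge")
    return np.array([np.median(padded[i:i + window])
                     for i in range(len(values))])

def boxcar(values, window):
    """Boxcar (moving) average with an odd window and edge padding."""
    half = window // 2
    padded = np.pad(values, half, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

# Synthetic transmission profile with one strong outlier (hypothetical values).
transmission = np.full(21, 0.8)
transmission[10] = 0.1

filtered = running_median(transmission, 5)   # removes the outlier
smoothed = boxcar(filtered, 5)               # smooths what remains
optical_depth = -np.log(smoothed)            # slant-path optical depth
```

Applying the median before the average keeps a single bad packet from being spread across neighbouring altitudes by the boxcar.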

Vertical profiles of individual species
The inversion algorithm takes slant-path optical depth profiles and, along with other data such as P/T data, separates them into species-specific slant-path optical depth profiles before finally inverting them into vertical profiles of O3 and NO2 number densities, H2O volume mixing ratio, and aerosol extinction. While much of this follows the same basic procedure described in Chu et al. (1989), there are some important differences between versions 6.2 and 7.0, some of which include more subtle aspects of the algorithm. As such, this section will generally review the entire inversion process.

Basic procedures
The viewing geometry of SAGE is such that, at each tangent height, the instrument looks through a slant-path column of air that incorporates all of the tangent heights above it. Accounting for refraction, a triangular path-length matrix is computed. The slant-path total column at each tangent height, derived from a matrix multiplication of the path-length matrix and density profile, thus comprises the sum of partial slant-path columns from all overlying 0.5 km thick layers. This simple matrix operation allows for an "onion peeling" process to be performed later for inverting a species' slant-path optical depth profile to the species density profile.

The first step in the retrieval of vertical profiles of individual species is to account for and remove the contributions of molecular (Rayleigh) scattering in all channels and O2-O2 absorption in a subset of channels. O2-O2 cross-sections (cm^5 molecule^-2) are scaled with density and both O2-O2 and Rayleigh effective cross-sections (cm^2 molecule^-1) are combined with the slant-path total column to convert to slant-path optical depths, which are then subtracted from each channel.
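To make the triangular structure concrete, the sketch below builds a path-length matrix for spherical 0.5 km shells and inverts a slant-path profile by onion peeling. It is a simplification under stated assumptions: refraction is ignored (the operational matrix accounts for it), tangent heights sit on the lower shell boundaries, and the Earth radius, layer count, and density profile are hypothetical.

```python
import numpy as np

def path_length_matrix(n_layers, dz=0.5, earth_radius=6371.0):
    """Geometric (non-refracted) path length in km through each shell."""
    r = earth_radius + dz * np.arange(n_layers + 1)   # shell boundaries
    path = np.zeros((n_layers, n_layers))
    for i in range(n_layers):                          # tangent at r[i]
        outer = np.sqrt(r[i + 1:] ** 2 - r[i] ** 2)
        inner = np.sqrt(r[i:-1] ** 2 - r[i] ** 2)
        path[i, i:] = 2.0 * (outer - inner)            # both sides of tangent
    return path

def onion_peel(path, slant):
    """Invert slant-path values layer by layer, starting from the top."""
    n = len(slant)
    profile = np.zeros(n)
    for i in range(n - 1, -1, -1):
        above = path[i, i + 1:] @ profile[i + 1:]      # overlying layers
        profile[i] = (slant[i] - above) / path[i, i]
    return profile

n_layers = 60
path = path_length_matrix(n_layers)
density = np.exp(-0.5 * np.arange(n_layers) / 7.0)    # hypothetical profile
slant = path @ density                                 # forward calculation
retrieved = onion_peel(path, slant)
```

Because nothing constrains the sign of each layer's value, a noisy slant-path profile can yield negative values in this scheme, which avoids the positive bias of a positivity-constrained inversion.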

Species separation
The species separation in SAGE II is performed using five of the seven channels simultaneously (1020, 600, 525, 452, and 448 nm) and is separated into three altitude regions: altitudes where some of the five channels are no longer available, altitudes in which all five channels are available, and altitudes above which it is believed there is no aerosol extinction. These five channels are used because the poor quality of the 386 nm channel prevents its use in the broad retrieval. Significant absorption by water vapor occurs only in the 940 nm channel and is inverted in a separate process.
The first step is to find the highest extent to which one of the five channels is not available (i.e. no valid data) and begin just above that. The remaining optical depth in each channel is a combination of contributions from O3, NO2, and aerosol, though at longer wavelengths (particularly 1020 nm) the contribution from gas species absorption becomes vanishingly small. At this point we have five measurements with seven unknowns: O3, NO2, and aerosol at each wavelength. We solve this set of equations using a least-squares solution where we approximate the aerosol contribution at 600 and 448 nm as a linear combination of aerosol at 1020, 525, and 452 nm. In the full set of species separation equations to be solved, i = {1, 3, 4, 5, 6} is the channel number, σ(λ_i) is the effective cross-section of the stated species at the given channel, OD(λ_i) is the slant-path optical depth of the stated channel, and the channel-specific aerosol, O3, and NO2 ODs are the unknowns. The coefficients for the linear combination (c_1, c_2, and c_3) are determined using an ensemble of single mode log-normal size distributions of sulfate aerosol at stratospheric temperatures, though, in practice, composition is of secondary importance. The ensemble of log-normal size distributions spans the observed wavelength dependence of the aerosol spectra.
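One way to picture how the linear-combination approximation closes the system is sketched below. The unknowns are taken to be the O3 OD at 600 nm, the NO2 OD at 448 nm, and the aerosol OD at 1020, 525, and 452 nm; the aerosol ODs at 600 and 448 nm are expressed through three weights each. This layout, the cross-section ratios, and the weights are all placeholder assumptions chosen for illustration and are not the operational values or necessarily the operational formulation.

```python
import numpy as np

channels = [1020, 600, 525, 452, 448]                 # nm

# Hypothetical cross-section ratios relative to the reference channel.
o3_ratio = np.array([0.0, 1.0, 0.4, 0.04, 0.03])     # relative to 600 nm
no2_ratio = np.array([0.0, 0.05, 0.4, 0.8, 1.0])     # relative to 448 nm

# Hypothetical weights on aerosol at (1020, 525, 452) nm.
w600 = np.array([0.2, 0.8, 0.0])
w448 = np.array([0.0, -0.05, 1.05])

# Columns: O3(600), NO2(448), aerosol(1020), aerosol(525), aerosol(452).
aerosol_rows = np.array([
    [1.0, 0.0, 0.0],    # 1020 nm: direct unknown
    w600,               # 600 nm: linear combination
    [0.0, 1.0, 0.0],    # 525 nm: direct unknown
    [0.0, 0.0, 1.0],    # 452 nm: direct unknown
    w448,               # 448 nm: linear combination
])
system = np.column_stack([o3_ratio, no2_ratio, aerosol_rows])

truth = np.array([0.05, 0.004, 0.001, 0.003, 0.004])  # assumed true ODs
measured_od = system @ truth                           # five channel ODs

solution, *_ = np.linalg.lstsq(system, measured_od, rcond=None)
```

With the two aerosol terms tied to the other three, the seven unknowns reduce to five and the five channel measurements determine them.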
Once O3 and NO2 ODs are determined, their relative contributions in the 940 and 386 nm channels can be removed and aerosol OD in the 386 nm channel is retrieved as a residual. Aerosol OD in the water vapor channel is calculated from the 525 and 1020 nm aerosol values, using separate weighting coefficients determined from the 525 to 1020 nm aerosol OD ratio. Once the aerosol contribution in the water vapor channel is determined, the actual water vapor OD is calculated as a residual. This process separates the various species from lower altitudes (typically in the middle to upper troposphere) up to some maximum altitude (typically set to 75 km).
While this retrieval works very well, it typically suffers from attempting to retrieve species at higher altitudes where extinction values are near detection limits and noise becomes the dominant signal. This noise is a carryover of the channel-correlated solar structure noise first discussed in Sect. 3.6, which does not dampen at higher altitudes. Much of this noise ends up being interpreted by the algorithm as aerosol and detrimentally affects the simultaneous retrieval of all species. To compensate for this in version 6.2, a separate retrieval scheme was used that uses only four channels (600, 525, 452, and 448 nm) and assumes that no aerosol is present to calculate O3 OD in the 600 nm channel and NO2 OD in the 448 nm channel. Water vapor was again treated as a residual in the 940 nm channel. In lieu of more adaptive methods, this process began at 40 km and ran up to some maximum altitude (typically 75 km). Thereafter, the 5-channel retrieval was transitioned into the "no aerosol" retrieval between 40 and 45 km. The version 7.0 algorithm utilizes the data to determine how to transition into regions where the inclusion of aerosol in the retrieval is no longer necessary, the methods of which are outlined later in this section.

For altitudes below the 5-channel retrieval, there is no longer valid data in the 448 nm channel and thus NO2 cannot be retrieved. Instead, the NO2 OD profile from the 5-channel retrieval is inverted to get extinction values and the algorithm reconstructs ODs at lower altitudes by assuming the NO2 mixing ratio is constant. The OD contribution from NO2 at lower altitudes is then removed from all channels. With NO2 removed, the algorithm begins working from the bottom of the 5-channel retrieval and moves down. It first uses a 4-channel retrieval (1020, 600, 525, and 452 nm) so long as there is valid data and then transitions to a 3-channel retrieval (1020, 600, and 525 nm) when necessary.
The main retrieval algorithm stops if there is no longer any valid data in any of these three channels. Aerosol extinction in the 1020 nm channel is thereafter retrieved as low as data exist. These data are then used to estimate the contribution of aerosol in the 940 nm channel so that water vapor can again be calculated as a residual.

This entire process goes through two iterations. The first iteration uses a default set of weighting coefficients to determine the contribution of aerosol at 600 and 448 nm. In the second iteration, we use the measured 525 to 1020 nm aerosol extinction ratio to select sets of coefficients determined using the fits to the ensemble of aerosol spectra for values around the observed value. This accounts for a small degree of non-linearity observed in the fits to 600 and 448 nm aerosol extinction. In reality, this is a distinctly second order correction but seems to reduce the sensitivity of the quality of the ozone data product to aerosol, particularly when aerosol levels are high. However, for the same reasons that version 6.2 has a "no aerosol" retrieval, at altitudes near and above 40 km, the 525 to 1020 nm aerosol OD ratio can begin to vary wildly through non-physical numbers. This has a detrimental effect on the second iteration retrievals, producing extremely "noisy" data above 35 km. In version 7.0, once the retrieved aerosol OD in each channel drops below a pre-determined threshold, a non-linear least squares fit is made to the data in the form of an exponential decay curve. Above the altitude where these fits drop below the amplitude of the noise, the fits are used for the 525 to 1020 nm ratio instead of the actual data. The second iteration retrievals then have far more realistic weighting coefficients to work from.
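The exponential-decay fit can be illustrated with a small Gauss-Newton non-linear least squares routine. The choice of Gauss-Newton, the log-linear starting guess, the model form a·exp(−b·z), and the synthetic profile are assumptions made for this sketch; the text above does not specify which non-linear solver or parameterization the operational code uses.

```python
import numpy as np

def fit_exponential(z, od, n_iter=20):
    """Fit od ~ a * exp(-b * z) by Gauss-Newton non-linear least squares."""
    positive = od > 0
    # Log-linear fit on the positive samples provides the starting guess.
    slope, intercept = np.polyfit(z[positive], np.log(od[positive]), 1)
    a, b = np.exp(intercept), -slope
    for _ in range(n_iter):
        decay = np.exp(-b * z)
        model = a * decay
        jacobian = np.column_stack([decay, -z * model])  # d/da, d/db
        step, *_ = np.linalg.lstsq(jacobian, od - model, rcond=None)
        a, b = a + step[0], b + step[1]
    return a, b

# Synthetic aerosol OD versus height above a reference level (hypothetical).
rng = np.random.default_rng(0)
z = np.linspace(0.0, 12.0, 25)                   # km above reference
od = 1e-3 * np.exp(-0.3 * z) + rng.normal(0.0, 1e-5, z.size)

a_fit, b_fit = fit_exponential(z, od)
od_fit = a_fit * np.exp(-b_fit * z)              # smooth stand-in for noisy OD
```

Taking the ratio of two such fitted curves, rather than of the noisy ODs themselves, keeps the ratio positive and slowly varying where the data are dominated by noise.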
While the use of fits to determine the 525 to 1020 nm ratio does improve the quality of the 5-channel retrievals, particularly above 35 km, it can still suffer from the same limitations that necessitated the use of a "no aerosol" retrieval above 40 km, namely the fact that the aerosol ODs at higher altitudes can still become non-physical and detrimentally affect the simultaneous retrieval of all species. Typically this is a byproduct of retrieved non-physical aerosol OD values in the shorter wavelengths at higher altitudes detrimentally impacting the retrieved values in longer wavelengths. This had the tendency (in version 6.2) to result in O3 and NO2 values that were biased high at altitudes above ∼ 40 km. To compensate for this, in version 7.0, the algorithm looks only at the 600 nm channel to retrieve ozone in these altitude regimes. When the fit to the 600 nm aerosol OD drops below a certain threshold, the algorithm subtracts out the fit (which is generally below the "noise level") and inverts only the remaining 600 nm OD to retrieve ozone. A more adaptive algorithm is being developed for retrieving NO2 at higher altitudes though, for now, the high bias persists from version 6.2.
The water vapor retrieval has benefited greatly from the use of fits to the aerosol optical depths to determine the 525 to 1020 nm ratio. In version 6.2, once either the 525 or 1020 nm aerosol contribution became negative, the algorithm assumed there was no longer any aerosol contribution above that altitude in the water vapor channel (mainly because the weighting coefficients could not be determined from a negative ratio). However, it is possible, due to noise, for the aerosol contribution to become negative but then become some non-negligible positive number again. This should have manifested itself as noise in the data, but was instead being artificially removed. By using the fits to the aerosols to determine the 525 to 1020 nm ratio, weighting coefficients could be applied to the real data (be it positive or negative) through the entire retrieval range. This has benefited the retrieval greatly as, after the removal of O3 and NO2, the aerosol contribution to the water vapor channel can still be a significant fraction of the remaining signal in some cases.

Inversion
After species separation, water vapor is the first retrieved species. The slant-path optical depth data are first converted to extinction and then smoothed to help mitigate noise in the weak signal. The algorithm to retrieve water vapor mixing ratio remains mostly unchanged from Chu et al. (1993) with one small exception. Originally the process began at 50 km and worked down. The algorithm requires a small range of altitude at the top of the retrieval process to establish the abundance and scale height of the H2O profile as a boundary condition. Improvements in the version 7.0 transmission allow the starting altitude to be moved up to 60 km. This, combined with the aforementioned aerosol fit method, has greatly improved the water vapor product above 40 km. In version 6.2, water vapor mixing ratio profiles would suffer from a characteristic "hook" towards unrealistically large values nearing 50 km. This can also be seen in SAGE III version 4.0 water vapor mixing ratio profiles. With these modifications to the retrieval, this "hook" has been removed (Fig. 18).

Prior to version 7.0, once water vapor was retrieved at the end of the first iteration, its contribution to the 600 nm channel was removed and the second iteration began. This had an inconsequential impact on ozone above the hygropause. However, below the hygropause, the correction contributed to the low bias in tropospheric ozone (Wang et al., 2002). This feature was turned off in SAGE III version 4.0, due to uncertainty in the quality of the relative spectroscopy used between the water vapor channel and the 600 nm channel, with beneficial results (Wang et al., 2006). For the same reason, this feature has also been turned off in SAGE II version 7.0. It is important to note, however, that this has not completely corrected the low bias in SAGE II tropospheric ozone data.
Once the algorithm has run through both iterations, it has produced vertical water vapor volume mixing ratio and NO2 number density profiles. All that remains is to invert the O3 and various aerosol slant-path column optical depth profiles into vertical extinction profiles. As with NO2, the O3 vertical extinction profile is then converted to a vertical number density profile by simply dividing by the effective cross-section. The inversion technique used in version 7.0 is different from that used in version 6.2, which utilized Twomey's modification of Chahine's algorithm (Chahine, 1972; Twomey, 1975) to retrieve extinction values and a simple onion-peeling technique to retrieve uncertainty estimates. Twomey-Chahine was found to have several undesirable behaviors. It would not allow negative values and therefore introduced a positive bias in regions of the density (extinction) profile where the signal to noise ratio was small, typically at the higher altitude end of the retrieved profile. It also systematically approached the solution from one direction and stopped once the tolerance criterion was met, producing another form of bias as a result. Lastly, its use introduced discontinuities in the profile when the slant-path extinction fell below a preset value and vertical smoothing was activated. Given the high quality of the SAGE II version 7.0 transmission profiles and the algorithmic limitations of the Twomey-Chahine inversion method, it was replaced entirely with onion-peeling in version 7.0.

Lastly, with all primary data products computed, the aerosol extinctions in the 525 and 1020 nm channels are used to compute some physical parameters to characterize aerosol at each altitude. The methods outlined in Thomason et al. (2008) are used to determine the effective radius and surface area density of aerosol particles.

This paper describes the SAGE II version 6.2 and version 7.0 algorithms.
Prior versions of SAGE II data products have been well validated (e.g. Wang et al., 2002) and included in numerous international assessments (e.g. WMO, 2011). Several of the version 7.0 changes affect the quality of these data products and a brief assessment of the differences seen in the version 7.0 data products follows.

Figure 19 shows several comparison plots of ozone. The changes in ozone from version 6.2 to version 7.0 come almost entirely from the change in spectroscopy, resulting in a nearly uniform decrease in concentration of ∼ 1.5 % in the stratosphere. The magnitude of the offset increases to about 2 % above 40 km due to the removal of the "no aerosol" method of retrieval. The large increase in ozone below 10 km is a result of the removal of the water vapor correction to ozone. The difference between the mean and median below 20 km is a result of using onion-peeling for inversion instead of Twomey-Chahine, as the outliers in version 6.2 were biased positive whereas in version 7.0 they can take on negative values, though overall the same number of outliers exists. While the changes move SAGE II ozone values further from SAGE III concentrations by about 1.5 % despite using the same spectroscopy, the changes in the retrieval method make the altitude dependency of the offset more consistent.

Given the diurnal nature of NO2, sunrise and sunset events are compared separately. Figure 20a and b show comparison plots for sunset and sunrise NO2, respectively. Due to changes in spectroscopy, the overall concentration of sunset NO2 has dropped by 5-10 % in the mid-stratosphere. The large decrease below 20 km is again a result of using onion-peeling instead of Twomey-Chahine for inversion. As previously mentioned, sunset NO2 values are biased high above 40 km and an algorithm to better retrieve NO2 at these higher altitudes is in development.
Net changes in sunrise NO2 are an amalgam of changes made in transmission, spectroscopy, species separation, and inversion. Sunrise NO2 remains somewhat of a research product, as quantifying the impact on sunrise NO2 data quality is hampered by the fact that insufficient high quality sunrise NO2 measurements exist for comparison during the time period where SAGE II measured sunrise NO2 (all comparisons with SAGE III are sunset events due to power problems late in the mission forcing operation at half duty cycle). Looking at the sunset/sunrise NO2 ratio, however, reveals that the data are more consistent in the mid-stratosphere in version 7.0 than in version 6.2 (Fig. 20c).

While no changes have been made to the retrieval of aerosol extinction in version 7.0 specifically, the data in various channels are impacted by the changes made to the spectroscopy and the technique used for inversion. As a result of using onion-peeling instead of Twomey-Chahine for inversion, aerosol extinction tends to decrease more quickly at higher altitudes, as opposed to approaching some positive non-zero value asymptotically as in previous versions. The overall data quality through the mid-stratosphere has remained mostly unchanged for aerosol extinction in the 1020 and 386 nm channels. The change in spectroscopy has had a large impact on the comparisons between SAGE II and SAGE III aerosol extinction in the 525 and 452 nm channels. SAGE II version 7.0 aerosol extinction at 525 nm is in much better agreement with SAGE III version 4.0 as compared to SAGE II version 6.2, whereas the opposite is true for aerosol extinction at 452 nm. We have transitioned the aerosol derived products, including surface area density (SAD) and effective radius (R_eff), from the technique outlined in Thomason et al. (1997) to a more robust method developed in Thomason et al. (2008).
Since there is some concern about the newer technique's performance at low 525 to 1020 nm aerosol extinction coefficient ratios, the new operational method transitions from the 2008 method for ratios above 2.0 to the old method for ratios below 1.5 with a linear mix in between. As a result, aerosol products do not change significantly during the post-Pinatubo period but change substantially during the clean period, particularly after 1998. The change in this period can be seen in Fig. 22, where the SAD has increased by 50 % throughout the lower stratosphere and the R_eff has decreased by about 10 %.

The process of retrieving water vapor and the impact on data quality has been discussed throughout this paper. As mentioned in Sect. 2.4, the water vapor channel filter spectral location was shifted a total of +12.7 nm and the FWHM was increased by 15 % from the original location. This filter location was determined by comparing SAGE II and SAGE III water vapor data, albeit prior to the inclusion of MERRA meteorological data in the retrieval. We have since revisited this problem and come to the conclusion that, in addition to the drift of the water vapor channel filter spectral response, the relative ozone spectroscopy used to remove ozone from the water vapor channel may be incorrect. The reasoning behind this is illustrated in several plots shown in Fig. 23. During the course of determining the best location to move the water vapor channel spectral response, it was noted that very good agreement could be reached in altitude regions of low ozone without adjusting the FWHM of the channel. However, removing the remaining "ozone-like" signal from water vapor required nearly doubling the FWHM of the channel. It is well understood that some level of uncertainty exists in laboratory experiments to retrieve temperature-dependent ozone absorption cross-sections, particularly in the Wulf bands (Bogumil, 2003).
Since the release of the SCIAMACHY V3 ozone cross-section database, several other ozone cross-section databases have been released that show significant changes in the Wulf bands relative to the Chappuis bands (e.g. Chehade, 2013; Serdyuchenko, 2011). While we are hesitant to adopt a new cross-section database until it has been validated, SAGE II data suggest that the relative ozone spectroscopy in the SCIAMACHY V3 database could be off by on the order of 10 %. We have identified many possible combinations of changing both the filter location and the relative ozone spectroscopy in order to minimize differences with SAGE III water vapor. However, any change to the ozone spectroscopy creates a coupled problem when comparisons are made with SAGE III water vapor, as the same spectroscopy would have to be adopted by SAGE III as well. As it currently stands, the water vapor product in SAGE II version 7.0 is in much better agreement with SAGE III version 4.0 than was SAGE II version 6.2. However, we are still not satisfied with the result and this issue remains a topic of further study.

Time series analysis
Herein we present a new way of fitting SAGE II data for use with time series analysis; namely, we fit the entirety of the data at a single altitude simultaneously using the dates and latitudes of the measurements as they were made (i.e. no latitude gridding or monthly means). The purpose of this fit was the creation of climatologies, but it has revealed some interesting data quality impacts between versions 6.2 and 7.0. Since the content of this paper will focus on the residuals of the fit rather than the fits themselves, we will only briefly outline the fitting process.

The fitting process begins by applying a modification of the Wang et al. (2002) filtering criteria to each event and then taking daily (zonal) means of collocated events. A functional form is then regressed to all of the data, in which η is the concentration of the given species (i.e. O3, NO2, and H2O), Θ(θ) is the functional form of the latitudinal dependence, and T(t) is the functional form of the time dependence. Θ(θ) is simply a Fourier series with the constraint of zero derivative at the poles. T(t) contains semi-annual (3, 4, and 6 month terms), annual, QBO (Singapore wind proxy), solar cycle (11 yr period terms), and EESC (for fits to O3) terms as well as an additional piecewise term to account for any potential diurnal variation in a species. Both Θ(θ) and T(t) also contain a constant term, which collectively provide the constant for the fit. Residual analysis is performed to omit any outliers with large influence on the data and a correction is made for lag-1 autocorrelation. This fit is performed for each altitude. Some examples of the fits themselves are illustrated in Figs. 24 and 25. Two sets of plots are shown at low and mid-latitudes to illustrate the robustness of the fit.
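A heavily reduced version of such a regression is sketched below to show the mechanics: a design matrix of a constant plus harmonic terms is fit by least squares to irregularly sampled data, and the lag-1 autocorrelation of the residuals is estimated. The sketch is time-only (no latitude dependence) and omits the QBO, solar cycle, EESC, and diurnal terms; the function names, the synthetic series, and the noise level are assumptions for illustration, and the form of the autocorrelation correction used in the actual fit is not specified here.

```python
import numpy as np

def harmonic_design(t_years, periods_months=(12, 6, 4, 3)):
    """Constant term plus a cosine/sine pair for each period."""
    columns = [np.ones_like(t_years)]
    for period in periods_months:
        omega = 2.0 * np.pi * 12.0 / period      # radians per year
        columns += [np.cos(omega * t_years), np.sin(omega * t_years)]
    return np.column_stack(columns)

def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation coefficient of a residual series."""
    r = residuals - residuals.mean()
    return (r[:-1] @ r[1:]) / (r @ r)

# Synthetic, irregularly sampled series (hypothetical values).
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 10.0, 500))          # years
series = 5.0 + np.cos(2.0 * np.pi * t) + rng.normal(0.0, 0.1, t.size)

design = harmonic_design(t)
coeffs, *_ = np.linalg.lstsq(design, series, rcond=None)
residuals = series - design @ coeffs
rho = lag1_autocorrelation(residuals)
```

Fitting the measurements at their actual sampling times, as here, avoids the need to bin the data into monthly means before the regression.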
The absolute values of the residuals, hereafter simply stated as residuals, are examined throughout the fitting process. Figure 26 illustrates the mean residuals of the fit to O3 prior to the lag-1 autocorrelation correction for versions 6.2 and 7.0 (total residuals). It also shows the same data averaged between 60° S and 60° N. There is a clear improvement in total residuals between 30 and 50 km from version 6.2 to version 7.0. The artifact in residuals in version 6.2 at 50 km is a result of smoothing applied above that altitude. Recall that version 6.2 employed a 2.5 km boxcar smoothing of O3 data above ∼ 50 km, while version 7.0 employs no smoothing of O3 data. The effect of the QBO can be seen around 35 km in both versions. The correlated residuals (Fig. 27) exhibit similar behavior to the total residuals. The uncorrelated residuals (Fig. 28) are, not surprisingly, similar between versions 6.2 and 7.0 (with the exception of the aforementioned smoothing) and reveal the extent of combined instrumental and geophysical noise in the data. The uncertainties in the measurements (dashed lines) mirror the shape of this noise and are slightly (∼ 1 %) lower than the observed residuals, indicating that the uncertainty estimates are reasonable and that the zonally averaged, unresolved geophysical variability is on the order of 1 %. There is a slight signature from the QBO, suggesting perhaps that this element of the fit needs further attention (and/or that an ENSO term is required). All told, version 7.0 data are more consistent, with less correlated noise and smaller total residuals than version 6.2, making the data more robust to work with for time series analysis.

Figure 29 shows the same (non-latitude dependent) plots for sunset NO2 as were shown for O3. There is again a clear overall decrease in the total residuals in the mid-stratosphere.
The artifact in version 6.2 at 40 km is a result of the transition to a "no aerosol" retrieval method above that altitude together with additional smoothing. While the correlated residuals have decreased from version 6.2 to version 7.0 as expected, they exhibit different behavior above 40 km in version 7.0, which is a result of the previously discussed high bias in sunset NO2. Analysis of the uncorrelated residuals for sunsets shows behavior consistent with that of O3 (i.e. version 6.2 and version 7.0 are comparable with mean uncertainties that mimic their shapes at a smaller level and an implied geophysical variability of ∼ 1 %) above 25 km and below the high bias in the version 7.0 data. At altitudes below 25 km, version 7.0 has more noise than version 6.2 as well as having large uncertainties. The cause of this behavior is a matter of further research. Sunrise NO2 (Fig. 30) shows consistent measurements only near the peak, but it is still a research product.

Figure 31 shows the fitting results for water vapor. While the total residuals have decreased overall between version 6.2 and version 7.0, especially at higher altitudes thanks to changes in the retrieval, the behavior of the correlated residuals is not consistent with other species. Several possible causes exist to explain this behavior, though we believe the most likely explanation is a lingering interference from other species in the water vapor product. The uncertainties in the water vapor measurements are much too high when compared to the uncorrelated residuals at lower altitudes (there is a transition to additional smoothing above 25 km), which may be the result of the inclusion of an aerosol clearing uncertainty into the water vapor uncertainty. The aerosol uncertainty manifests itself as a bias rather than increased random noise. As previously mentioned, we are still not satisfied with the result and plan on addressing these issues in a future revision.
With the exception of some unusual behavior in water vapor, the total residuals of retrieved species have decreased from version 6.2 to version 7.0 partly as a result of improvements to the algorithm and partly as a result of decreased correlated residuals.
Since the correlated residuals are the result of the lag-1 autocorrelation, and the data have been consolidated into daily means, the correlated residuals represent a combination of daily geophysical variability that is not captured in the model and noise in the data that is correlated from day to day. Since the algorithm itself cannot produce correlated noise (without serious errors), any correlated noise in the data must be a result of correlated noise in the input data, namely the ephemeris and meteorological data. While the possibility of correlated noise in the ephemeris data itself exists (Buglia, 1989), the manner in which the algorithm processes ephemeris data and the fact that ephemeris data are highly correlated from day to day can affect the correlated noise in the processed ephemeris data. As such, version 6.2 did have a source of correlated ephemeris noise. The altitude registration offset caused by improper ephemeris calculations (outlined in Sect. 2.1 and shown in Fig. 2) introduced correlated noise in the processed ephemeris data. Since the solar ephemeris calculations in version 7.0 are more accurate, this correlated noise in the processed ephemeris data has been reduced.

The meteorological input data also have some amount of correlated noise. Figure 32 illustrates the same residual analysis for the neutral density that comes from the meteorological data. While the overall level of correlated noise is small, there is a clear reduction in correlated noise above ∼ 35 km in version 7.0. Recall from Sect. 2.2 that, in version 6.2, operational model data provided by NCEP were used between 10 mbar and 0.4 mbar (∼ 30-50 km) and have been shown to be potentially inconsistent and problematic. This explains the separation of the correlated residuals between versions 6.2 and 7.0 and the subsequent convergence as both algorithms eventually return to the use of GRAM-95 data at higher altitudes.
The reduction of both of these sources of correlated input noise results in the subsequent reduction of correlated noise in the retrieved species.

Conclusions
Version 7.0 of the SAGE II processing algorithm has been detailed and discussed and the changes made between version 6.2 and version 7.0 have been explained. Some of the changes from version 6.2 to version 7.0 that have had the greatest impacts on the data include: corrections to solar ephemeris processing to remove a quasi-random altitude registration offset, migration from NCEP ancillary meteorological data to MERRA, adoption of updated O3 and NO2 temperature-dependent absorption cross-sections, incorporation of transmission processing algorithms closer to those used in SAGE III version 4.0, redetermination of the water vapor channel filter spectral location due to spectral drift, and removal of Twomey-Chahine in favor of a simple onion peeling technique for inversion.

Ozone has always been and continues to be a high quality data product. As a result of the changes in version 7.0, ozone is now more consistent as a function of altitude when compared with other instruments. Ozone is also now more robust for use in time-series analysis, with smaller total residuals and decreased correlated noise in the data. The continued use of SAGE II ozone data is encouraged, though care should be taken when using the data either below 20 km or above 50 km due to increased noise in the data.

NO2 continues to be a good product for study, though use of only sunset events is recommended as sunrise NO2, while better in version 7.0 than version 6.2, is still somewhat of a research product. Version 7.0 sunset NO2 concentrations have decreased by 5-10 % and are in better agreement with SAGE III than was version 6.2. Much like O3, sunset NO2 is now easier to work with for time-series analysis, with smaller total residuals and decreased correlated noise in the data. Care should be taken when using the data below 25 km or above 40 km due to noise. In addition, sunset NO2 values above 40 km are biased high and an algorithm to better retrieve NO2 at these altitudes is in development.
Aerosol has also been a cornerstone SAGE II data product and continues to be so in version 7.0. Changes to spectroscopy in version 7.0 have resulted in large changes in aerosol extinction in the 525 and 452 nm channels. While 1020 nm aerosol has remained consistent, 525 nm aerosol is now in better agreement with SAGE III, though 452 nm aerosol is now much smaller than before. Use of aerosol in the 385 nm channel continues to be discouraged due to unknown issues with that channel. The aerosol derived products have been updated, with surface area density increasing by 50 % during clean periods and effective radius decreasing by 10 % compared to version 6.2.
Water vapor continues to be a work in progress. Many changes have been made to the version 7.0 algorithm to improve the quality of the water vapor data product, though several sources of uncertainty remain that require additional work. Most notable is the presence of lingering signatures from other interfering species. While potential solutions are currently being investigated, we did not want to hold back the release of the significant improvements made to the O3 and aerosol products. That having been said, the SAGE II version 7.0 water vapor data product is now in much better agreement
and Aerosol Measurement Sensor during SOLVE II, Atmos. Chem. Phys., 6, 2695-2709, doi:10.5194/acp-6-2695-2006, 2006