Optimized Umkehr profile algorithm for ozone trend analyses.

The long-term record of Umkehr measurements from four NOAA Dobson spectrophotometers was reprocessed after updates to the instrument calibration procedures. In addition, a new data quality-control tool was developed for the Dobson automation software (WinDobson). This paper presents a comparison of Dobson Umkehr ozone profiles from NOAA ozone network stations (Boulder, OHP, MLO, Lauder) against several satellite records, including Aura Microwave Limb Sounder (MLS; ver. 4.2), and combined SBUV and OMPS records (NASA AGG and NOAA COH). A subset of satellite data is selected to match Dobson Umkehr observations at each station spatially (distance less than 200 km) and temporally (within 24 hours). Umkehr Averaging Kernels (AKs) are applied to vertically smooth all overpass satellite profiles prior to


Introduction.
The success of 30 years of international collaboration since the implementation of the Montreal Protocol and its amendments was celebrated at the Symposium for the 30th Anniversary of the Montreal Protocol (http://www.montreal30.io3c.org/), which brought together leading scientists, policymakers, and the public at the French Academy of Sciences in Paris, France, on September 22-23, 2017. The emphasis was on future scientific and public policy challenges for efficiently guiding ozone recovery processes. Confirmation of stratospheric ozone recovery was reported in recently published literature (Steinbrecht et al., 2017; Ball et al., 2019; SPARC/IO3C/GAW, 2019). The current state of stratospheric ozone recovery was summarized in the 2018 WMO/UNEP ozone assessment (WMO, 2018), where trend uncertainties for combined observational records were used to describe confidence in detected trends. However, the uncertainty of trend detection did not include full information about ozone measurement uncertainty. The difference in trends derived from satellite combined observational records suggests that further work is needed to assure good practices for the homogenization of long-term ozone records. Ground-based records are often used to verify the stability of satellite records (Fioletov et al., 2006; Krzycin and Rajewska-Wich, 2009; Nair et al., 2012; Hubert et al., 2016; Bernet et al., 2019; Wang et al., 2020). To serve as a reference, ground-based observations require careful and continuing examination of past calibration records, changes in instrumentation and assessment of measurement uncertainties. Changes in the frequency of measurements can complicate the interpretation of the relative stability of records and the resulting impact on the derived ozone trends (Sofieva et al., 2014; Damadeo et al., 2018).
Multiple studies show statistically significant positive trends in ozone at upper stratospheric levels in the tropics and Northern mid-latitudes, and nearly significant positive trends in the Southern Hemisphere. The statistical and analytical approaches to quantify ozone recovery are complicated by the natural year-to-year variability detected in the observed ozone records. Moreover, stratospheric ozone recovery rates are expected to be slower than the decline of stratospheric ozone during the 1980s due to the long lifetime of the ozone-depleting substances. While ozone recovery in the upper stratosphere is mostly determined by halogen levels, temperature also plays an important role, including in the so-called "super recovery", where ozone abundances exceed 1980 levels due to greenhouse gas-induced stratospheric cooling. At the same time, in the lower stratosphere, atmospheric composition and ozone levels are driven by climate-impacted changes in the Brewer-Dobson circulation and by seasonal to decadal variability in stratosphere-troposphere exchange. These processes are difficult to discern and predict based solely on ozone or other atmospheric composition observations (Ball et al., 2019a; Ball et al., 2019b; Orbe et al., 2020; Strahan et al., 2020; Dietmüller et al., 2021). Analyses of the processes that are responsible for ozone changes through atmospheric chemistry and dynamical transport rely on the development of Climate Chemistry Models (CCMs; Morgenstein et al., 2018). However, the long-standing differences between the model reconstruction of past ozone variability and observations suggest the need for improved simulations of seasonal to sub-seasonal processes. Continuous verification of modelling results with the ongoing long-term measurements will help with understanding the processes that determine ozone recovery. Dobson Umkehr time series beginning in the 1950s are one https://doi.org/10.5194/amt-2021-203 Preprint.
Discussion started: 23 September 2021 © Author(s) 2021. CC BY 4.0 License. two stations in Europe where sampling is done three times a week). The issue of relatively short time records also applies to lidar (Jiang et al., 2007) and microwave observations.
The Umkehr retrieval algorithm relies on the "self-calibration" technique, which normalizes a set of morning or afternoon measurements to a single measurement taken at the smallest solar zenith angle (SZA). This process removes the majority of the instrumental artifacts and homogenizes the time series. The vertical distribution of ozone is retrieved in 10 ozone layers between the surface and ~45 km. However, routine (operational) data processing is still not optimized to account for out-of-band (i.e., stray) light that affects measurements at high SZAs (Petropavlovskikh et al., 2005b).
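As a minimal sketch of the normalization described above, the snippet below subtracts the N-value measured at the smallest SZA from every N-value in a single morning or afternoon set. The function name and the choice of subtraction (N-values being logarithmic quantities) are illustrative assumptions, not the operational implementation.

```python
import numpy as np

def normalize_umkehr_curve(sza_deg, n_values):
    """Normalize one morning or afternoon set of N-values (hypothetical helper).

    Assumption: normalization is a subtraction of the N-value observed at the
    smallest SZA, so any additive, SZA-independent offset cancels out.
    """
    sza = np.asarray(sza_deg, dtype=float)
    n = np.asarray(n_values, dtype=float)
    reference = np.argmin(sza)       # index of the smallest-SZA measurement
    return n - n[reference]

# Illustrative (made-up) numbers: a constant instrumental offset of 5 units
# has no effect on the normalized curve.
sza = [60.0, 70.0, 80.0, 86.5, 90.0]
n_obs = np.array([40.0, 55.0, 90.0, 120.0, 110.0])
curve = normalize_umkehr_curve(sza, n_obs)
curve_offset = normalize_umkehr_curve(sza, n_obs + 5.0)
```

In this toy case `curve` and `curve_offset` are identical, which illustrates why the technique removes additive instrumental artifacts but not SZA-dependent ones such as stray light.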
Optimization of the stray light correction is unique to each Dobson instrument, as it depends on the instrument's band-pass and optical alignment, which are not always known from the historical calibration records. Recent attempts to measure the band-passes of several Dobson instruments in the optical lab with lasers (Kohler et al., 2018) led to an investigation of instrumental uncertainties in Dobson total ozone retrieval. The band-pass adjustment for some instruments leads to a change of several percent in derived total column ozone. However, not many instruments have been optically characterised so far. The Dobson Umkehr algorithm thus requires an extensive verification of stray light levels in the multiple instruments used to create long-term records. A change of instrument can introduce step changes in the vertical distribution of retrieved ozone profiles and thus affect the stability of the long-term record.
NOAA Dobson ozone observations are positioned to continue monitoring stratospheric ozone recovery for the next 30 years.
In addition to the six NOAA Dobson stations (Table 1) and four NOAA Brewer stations, Umkehr observations are regularly performed by several Dobson (3) and Brewer (6) spectrometers that are distributed globally. Stratospheric ozone recovery rates will differ between the tropics, middle latitudes and high latitudes (WMO/UNEP Ozone Assessment, 2018). Umkehr stations are located at multiple sites around the world and will hence provide important information for tracking ozone recovery.
The current operational Umkehr profile algorithm produces data that have relatively large uncertainty (~5 % in the stratosphere), which limits our ability to detect small changes in stratospheric ozone. Refinement of the processing software is required to resolve the instrument-related offsets in ozone profile retrievals. It is also important to remove offsets between satellite and ground-based ozone profiles to further improve the satellite ozone profile validation process. The main objective is to add value to the validation activities: the continuous improvement of satellite retrieval algorithms requires ground-based observations of higher accuracy, and reducing the noise in the data improves their usefulness for trend analysis.
In this paper we discuss an optimization approach to homogenize long-term Umkehr ozone profile records. In Section 2 we describe several long-term ozone observing records and model simulations of stratospheric ozone variability selected for this study. We also discuss the matching criteria for comparisons of these records with ground-based observations. In Section 3 we present methods developed for identification of vertical and temporal offsets between operational Umkehr and other ozone observing systems. Then, we describe the approach for removing offsets to homogenize the Umkehr record. Finally, in Section 4, we demonstrate the consistency between optimized Umkehr and other ozone records.
Dobson total column ozone records are regularly used in satellite record validation (Bai, 2015; Koukoulil, 2016; Boynard, 2018) and in the development of global combined ozone data records (Fioletov, 2008; Hassler, 2018). In 2017, NOAA long-term Dobson total column ozone records at 15 stations were homogenized to account for inconsistencies in the past calibration records, data processing methods and selection of representative data. The updated total ozone records are used in Umkehr ozone profile retrievals. Descriptions of the three Dobson stations used in this paper, their instrumentation, and total ozone data changes can be found in Evans et al. (2017).
The Umkehr data collection is automated by the NOAA WinDobson operational software (Evans et al., 2017), which schedules zenith sky observations at C-pair spectral channels during the morning and afternoon hours. The software uses the near-IR cloud detector to screen the Umkehr data for clear sky conditions, interpolates screened observations to 12 nominal SZAs, adds total column ozone information, processes the data, and checks retrieved ozone profiles for quality flags and against station climatological variability (+/-2 standard deviations).
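The climatological check mentioned above can be sketched as follows. This is a minimal illustration that assumes the check is applied layer by layer against a station mean and standard deviation; the function name and the numbers are hypothetical and do not reproduce the WinDobson code.

```python
import numpy as np

def flag_outside_climatology(profile_du, clim_mean_du, clim_std_du, n_sigma=2.0):
    """Return a boolean mask of layers outside mean +/- n_sigma * std.

    Assumption: the comparison is made independently for each Umkehr layer.
    """
    profile = np.asarray(profile_du, dtype=float)
    mean = np.asarray(clim_mean_du, dtype=float)
    std = np.asarray(clim_std_du, dtype=float)
    return np.abs(profile - mean) > n_sigma * std

# Made-up three-layer example (values in DU): only the last layer is flagged.
retrieved = [30.0, 62.0, 95.0]
clim_mean = [31.0, 60.0, 80.0]
clim_std = [2.0, 3.0, 5.0]
flags = flag_outside_climatology(retrieved, clim_mean, clim_std)
```

A profile with any flagged layer could then be set aside for manual inspection rather than discarded automatically; that policy is likewise an assumption of this sketch.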
NOAA Dobson Umkehr operational ozone profile data are posted on the GML archive https://gml.noaa.gov/aftp/data/ozwv/Dobson/AC4/Umkehr/. The Umkehr observations are archived at the WMO Ozone and UV Data Centre (www.woudc.org), operated by Environment and Climate Change Canada, where the centralized data processing is done by a Python-based version of the UMK04 processing software (https://github.com/woudc/woudc-umkehr). The content of the files at the NOAA and WOUDC archives is the same for the operational Umkehr ozone profile record, but the format differs.

Ozonesonde data.
Ozonesonde instruments have been launched on meteorological balloons since the 1980s at ten NOAA stations. Evolving instrumentation has created discontinuities and gaps leading to inhomogeneous data records. NOAA and the international community developed homogenization methods for the NOAA and SHADOZ networks (Sterling et al., 2018; Witte, 2018). The error budget for each profile is calculated and included in the archived files (Sterling, 2018). Modern ozonesonde instruments sample ozone at high vertical resolution, on the order of 100-200 m. The sondes constitute an essential component of satellite calibration and cross-calibration (Hubert, 2016), and are used for verification and improvement of climate chemistry, chemistry-transport models and reanalyses (Stone et al., 2016; Miyazaki and Bowman, 2017; Wargan, 2018; Stauffer, 2018).
The ozonesonde profile records provide key measurements for the middle and lower stratospheric, and tropospheric ozone trend calculations, and are a benchmark network for stratospheric ozone profile observations (Steinbrecht, 2017; SPARC/IOC/GAW, 2018; WMO, 2018). Data for ozonesonde records are publicly available from the NOAA Global Monitoring Lab (GML) at https://gml.noaa.gov/aftp/ozwv/Ozonesonde/, from the World Ozone and Ultraviolet Radiation Data Centre (WOUDC) at www.woudc.org, from the Network for the Detection of Atmospheric Composition Change (NDACC) at www.ndacc.org, and from the NOAA National Centers for Environmental Information (NCEI) archive at https://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.ncdc:C01562. In this paper we use ozonesonde data from Boulder, USA; Hilo, USA; and Lauder, New Zealand. The data for the first two stations are taken from the NOAA GML archive and are the homogenized version (Sterling et al., 2018). The Lauder ozonesonde data prior to 2018 were provided by Richard Querel of NIWA, New Zealand, for use in the LOTUS Report (2019). This dataset is not homogenized, and the data are the same as archived at NDACC (http://www.ndaccdemo.org/). We extended the Lauder ozonesonde data with the un-homogenized 2018-2020 data downloaded from the NDACC archive (last accessed in April 2021). The OHP ozonesonde data were homogenized in 2020. The data are available from the NDACC archive (Gaudel et al., 2015). However, the NDACC version at the time of data analyses contained some small errors associated with telemetry noise in the recent measurement period. Therefore, we used the latest version provided by G. Ancellet and S. Godin-Beekmann of LATMOS, France (private communications, June 15, 2021), which is also now archived at NDACC.

Satellite ozone profile data
Several satellite records are used for monitoring ozone globally and vertically. In this paper we are using daily NOAA and NASA long-term records that are sampled for the Umkehr station overpass conditions and also matched in time with Umkehr profiles.

SBUV and OMPS ozone profile records
NASA and NOAA have produced satellite measurements of ozone profiles through the Solar Backscatter Ultraviolet (SBUV) and related instruments (Nimbus 4 and 7), providing nearly 40 years of continuous data (1978-present). The use of a common-design, single-instrument-type dataset eliminates many homogeneity issues, including varying vertical resolution or instrumentation differences. Version 8.6 SBUV data incorporate additional calibration adjustments beyond the Version 8 release (Bhartia et al., 2012). Small but evident biases remain.
The Suomi National Polar-orbiting Partnership (S-NPP) satellite of the Joint Polar Satellite System (JPSS) was launched in October 2011 (Flynn et al., 2006). It carries the Ozone Mapping and Profiler Suite Nadir Profiler (further referred to as OMPS) sensor that collects solar backscattered radiance at high spectral resolution in the sunlit part of the globe (Seftor et al., 2014).
OMPS makes measurements from 250 to 310 nm with a 1.1 nm resolution. It has a 16.6° cross-track FOV and a 0.26° along-track slit width, but several spectra are combined to cover a footprint of 250x250 km. The ozone profile retrieval is very similar to Rodgers' optimal statistical method deployed in the SBUV and Umkehr retrieval techniques. Validation of the NOAA operational OMPS ozone profile products is described in Flynn et al. (2014). Evaluation of the OMPS NASA V8.6 algorithm products for trend analyses is described in McPeters et al. (2019).
In this paper we use two satellite combined records. The first record is the NASA aggregated dataset (further referred to as AGG), which comprises SBUV, SBUV/2 and OMPS profiles from all overlapping satellites (Nimbus 4 through NOAA 19) and uses the NASA version 8.6 processing. The AGG station overpass data are selected from all daily records that are found within the +/-2/20 degrees latitude/longitude box centred on the station location and averaged using 1/distance weighting to the station location.
The data set for the Boulder station is available at https://acdext.gsfc.nasa.gov/anonftp/toms/sbuv/AGGREGATED/sbuv_aggregated_boulder.co_067.txt. The AGG overpass records for other Umkehr stations can be found in the same directory. Sometimes, two or three satellite overpass records are found for a single day. For the purpose of comparisons with Umkehr data, all daily records are averaged.
The second record is the COH data set, which combines data from the SBUV/2 and OMPS (NOAA processing, further referred to as OMPS_NOAA) instruments on multiple satellites using correlation-based adjustments that provide an overall bias adjustment plus an ozone-dependent factor (SPARC/IO3C/GAW, 2019). The resulting profile product is a set of daily or monthly zonal means, has been used in climate reviews (Weber, 2018; Steinbrecht, 2017) and is publicly available at https://ftp.cpc.ncep.noaa.gov/SBUV_CDR.
In order to create the station overpass data, each SBUV/2 and OMPS satellite record is sampled separately to find all daily records within the +/-2/20-degree latitude/longitude box centred on the station. The collected profiles are 1/distance weighted to the station location and averaged. This is a similar process to the AGG overpass record but does not combine daily data from different satellites. The overpass data from each satellite are adjusted using the SBUV COH technique developed for zonal average data. The SBUV/2 and OMPS COH station overpass data (further referred to as COH) are available from the NOAA website at https://ftp.cpc.ncep.noaa.gov/SBUV_CDR/overpass.
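The overpass selection and 1/distance weighting described for the AGG and COH records can be sketched as below. This is an illustrative reading of the text, not the NASA or NOAA processing code: the function names, the great-circle distance formula, and the 1 km floor that avoids division by zero are all assumptions.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used for the haversine distance

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km (inputs in degrees)."""
    lat1, lon1, lat2, lon2 = (np.radians(np.asarray(v, dtype=float))
                              for v in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def overpass_average(station_lat, station_lon, lats, lons, profiles,
                     dlat=2.0, dlon=20.0, min_dist_km=1.0):
    """Average all profiles inside the lat/lon box with 1/distance weights.

    Returns None when no profile falls inside the box. min_dist_km is a
    hypothetical floor on the distance so the weight stays finite.
    """
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    profiles = np.asarray(profiles, dtype=float)
    # Wrap longitude differences into [-180, 180) before testing the box.
    lon_diff = (lons - station_lon + 180.0) % 360.0 - 180.0
    in_box = (np.abs(lats - station_lat) <= dlat) & (np.abs(lon_diff) <= dlon)
    if not in_box.any():
        return None
    dist = great_circle_km(station_lat, station_lon, lats[in_box], lons[in_box])
    weights = 1.0 / np.maximum(dist, min_dist_km)
    return np.average(profiles[in_box], axis=0, weights=weights)

# Made-up example near Boulder: two profiles equidistant from the station and
# one far outside the latitude window, which is ignored.
avg = overpass_average(40.0, -105.0,
                       lats=[41.0, 39.0, 50.0],
                       lons=[-105.0, -105.0, -105.0],
                       profiles=[[10.0, 20.0], [30.0, 40.0], [1000.0, 1000.0]])
```

Because the two accepted profiles are equally distant from the station, the weighted result here reduces to their plain mean.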

Aura MLS profiles
The Microwave Limb Sounder (MLS) measured ozone profiles from the UARS and Aura satellite platforms (Waters et al., 1999). We use Aura MLS Version 4.2 data (Livesey et al., 2020) for comparisons with Umkehr observations during the 2005-2020 period. MLS Version 5.1 was not available at the time of analysis, but the ozone product is not expected to differ significantly between the two versions (Livesey et al., 2020). Ozone profiles are provided on 12 pressure levels per decade; the vertical resolution of the MLS AK is about 2.6 km in the middle stratosphere and increases to ~3.5 km at the 1 hPa pressure level.
The MLS mixing ratio profiles are converted to layers in DU using pressure and temperature profiles provided in the files as 2 reanalyses (Wargan et al, 2017 and references therein). Section 2.4 discusses MERRA2 data use in the global NASA chemistry transport models used for Umkehr homogenization.

SAGE II ozone record
SAGE is an ongoing series of solar occultation instruments spanning several decades and providing high-precision vertical profiles of ozone from the troposphere to the mesosphere with ~1 km vertical resolution. Providing the longest single-instrument record of stratospheric ozone, SAGE II (Mauldin et al., 1985) was operational onboard the Earth Radiation Budget Satellite between October 1984 and August 2005. In this paper we use the 1985-2000 period to avoid the reduced sampling after 2000. In a mid-inclination orbit (57°), the instrument observed upwards of 31 solar occultation measurements per day (~15 sunrises and ~15 sunsets as viewed from orbit). The sampling is such that, for each event type, successive observations are evenly spaced in longitude (i.e., ~24° between each) and slowly moving in latitude, collectively providing uniform sampling over two separate latitude bands of different meridional extents (i.e., larger near the tropics and narrower at mid-latitudes) on any given day, with the bands shifting slowly from day to day. Because of the infrequent sampling, the matching criteria for the SAGE II ozone satellite data are relaxed to +/-20 degrees in longitude and +/-2 degrees in latitude. The SAGE II ozone V7 data are available as number density profiles at pressure levels from https://doi.org/10.5067/ERBS/SAGEII/SOLAR_BINARY_L2-V7.0. The number density profile is converted to ozone partial pressure and to DU (1 DU is 2.69×10^20 molecules per square metre) using pressure and temperature profiles provided in the files, which are based on MERRA. The high-resolution SAGE II profile is smoothed with the AK from the respective Umkehr profile found by temporal and spatial matching as described above.
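The unit conversions mentioned above can be illustrated with a short sketch. It assumes number density in molecules per cubic metre on a geometric altitude grid, uses the ideal gas law for partial pressure, and integrates with the trapezoidal rule; the function names are hypothetical, and the actual processing works on pressure levels with the temperature profiles supplied in the files.

```python
import numpy as np

K_BOLTZMANN = 1.380649e-23          # J/K (exact SI value)
MOLECULES_PER_M2_PER_DU = 2.69e20   # conversion factor quoted in the text

def partial_pressure_pa(number_density_m3, temperature_k):
    """Ozone partial pressure from the ideal gas law, p = n * k * T."""
    n = np.asarray(number_density_m3, dtype=float)
    t = np.asarray(temperature_k, dtype=float)
    return n * K_BOLTZMANN * t

def column_du(number_density_m3, altitude_m):
    """Integrate number density over altitude (trapezoids) and convert to DU."""
    n = np.asarray(number_density_m3, dtype=float)
    z = np.asarray(altitude_m, dtype=float)
    column_m2 = np.sum(0.5 * (n[1:] + n[:-1]) * np.diff(z))
    return column_m2 / MOLECULES_PER_M2_PER_DU

# Made-up check: a constant density of 2.69e18 molecules/m^3 over a 1 km
# thick layer corresponds to a column of 10 DU.
layer_du = column_du([2.69e18, 2.69e18, 2.69e18], [20000.0, 20500.0, 21000.0])
p_o3 = partial_pressure_pa(1.0e18, 250.0)
```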

GMI CTM and M2GMI simulated ozone profiles
The NASA Global Modeling Initiative chemistry transport model (GMI CTM), an off-line model driven by MERRA2 meteorological reanalysis (Gelaro et al., 2017), is used to assess the impact of various natural and anthropogenic perturbations of atmospheric composition and chemistry (Strahan, 2013). Strahan et al. (2016) use the excellent agreement between simulated and observed seasonal evolution of Arctic N2O to demonstrate the simulation's value in quantitatively separating chemical from dynamical changes in polar ozone depletion during the Aura period (2004-2015). Douglass et al. (2017) compared a GMI CTM simulation with mid-latitude NDACC column measurements of the long-lived reservoir species HNO3 and HCl to verify the realism of MERRA2 transport in both hemispheres from 2004 to the present and to demonstrate the value of GMI CTM simulations for explaining how sparse sampling impacts interpretation of trends in the observations. Strahan et al. (2015) analysed MLS N2O data to show that the QBO had a profound and far-reaching impact on Cly variability in the Southern Hemisphere. The QBO modulates the extratropical mean age (and hence N2O and Cly) each winter, and the impacts are then
The CTM is integrated at 1-degree horizontal resolution on 72 vertical levels from the surface to 0.01 hPa and uses MERRA2 meteorological fields as input. The output from the GMI CTM simulation is available for 1985-present (https://portal.nccs.nasa.gov/datashare/dirac/gmidata2/users/mrdamon/Hindcast-Family/HindcastMR2V2/). The CTM's tropospheric physical processes include convection, boundary layer turbulent transport, wet scavenging in convective updrafts, wet and dry deposition, lightning NOx production, and anthropogenic, natural and biogenic emissions.
The chemical mechanism uses JPL-2015 rates and currently has 119 species and more than 400 kinetic and photolytic reactions; it is an updated version of the mechanism described in Duncan (2007).
Customized GMI CTM simulation outputs were created for the three NOAA Dobson Umkehr stations for 1979-2017 to assist in the assessment of the instrumental offsets and to develop instrument-specific corrections to homogenize the Umkehr record.
GMI CTM data at the NDACC sites (including six NOAA Umkehr sites) are available at www.ndacc.org. The files contain vertical profiles of O3, NO2, H2O, temperature, pressure, potential temperature, and potential vorticity on a geometric altitude grid with hourly time resolution. Model output is generated on geometric altitude, geopotential height, or pressure level grids as needed for comparisons with Umkehr data, which are derived as layers on a pressure grid. Daily global ozone, trace gas, and meteorological fields are also available as needed for synoptic-scale interpretation of Dobson and ozonesonde data.
We also use a second simulation, MERRA-2 GMI ("M2GMI"; Orbe et al., 2017; Wargan et al., 2018), for the Umkehr step-change analyses.
M2GMI is the full GEOS general circulation model (GCM) with the GMI chemical mechanism and is driven by the MERRA-2 horizontal winds, temperature, and surface pressure using the 'replay' methodology (Orbe et al., 2017). The MERRA-2 assimilated meteorological fields are used by the model to simulate meteorology that is continuously adjusted to the MERRA-2 winds, temperature and surface pressure. Comparisons of M2GMI against MERRA2, GMI CTM, and ozonesonde profiles were recently described in Stauffer et al. (2019).
The step-change in the GMI CTM ozone record in 1998 was documented in Stauffer et al. (2019) and references therein. The TOVS/ATOVS satellite data transition in 1998 and the assimilation of MLS temperatures in 2004 impacted the MERRA-2 meteorological fields (Gelaro et al., 2017). The MERRA2 analysis increments alter the wind fields that come from its general circulation model (GCM), pushing them toward the meteorological observations. Where the GCM has biases, the increments are large, driving unrealistic circulations that impact the GMI CTM stratospheric ozone distributions in the tropics and subtropics.
There are differences between the GMI CTM and M2GMI ozone simulations. Even though they both use the same full GMI chemical mechanism, the meteorology used in the two models is not identical. In the GMI CTM the MERRA-2 meteorological product is used. M2GMI output is driven by a specified dynamics (SD) simulation. Instead of using MERRA-2 meteorology directly, this SD uses a different method: "replay" (see further description in Orbe et al., 2017). Because the 1998 and 2005 discontinuities are smoothed in the M2GMI ozone record (Stauffer et al., 2019), we decided to use its ozone data as a reference for the Umkehr optimization. In addition, we use the GMI CTM output for assessment of changes in the optimized Umkehr record and for evaluation of ozone variability represented by the two modelling records. The M2GMI ozone profile output is sub-sampled for the Boulder, OHP, MLO (or Hilo) and Lauder Dobson station geolocations (selected from the grid cell closest to the station location) and is matched within 30 minutes to the Umkehr observation (local time for the averaged sun elevation between 70 and 90 degrees SZA). The ozone profiles are provided on constant pressure levels that are converted to DU and smoothed with the Umkehr AK to create Umkehr-like layers. This is the version of the data that is used as a reference dataset for Umkehr optimization. The M2GMI ozone and temperature profiles are available for the 1980-2019 time period (https://www.esrl.noaa.gov/gmd/aftp/data/ozwv/Dobson/AC4/). In addition, the temperatures are used to adjust ozone absorption cross-sections in the radiative transfer modelling of Umkehr curves to account for the diurnal, daily and seasonal ozone variability in the stratosphere (see Appendix D).

FG11 and QBO a priori
FG 11 (further referred to as fg11ap) is a climatological ozone dataset (McPeters and Labow, 2011) that describes typical ozone variability with latitude (5 degree zonal averages) and season (12 months). It is based on the Aura MLS and ozonesonde records measured between 2005 and 2010. Note that the ozone profile on any day of the year is the same in each year of the record. Thus, ozone in each Umkehr layer only changes seasonally.
The QBO a priori (further referred to as QBOap) is an ozone climatology developed for analyses of the SBUV records to improve soft calibrations for the MOD ozone record (Ziemke et al., 2021). In addition to the seasonally and latitudinally dependent climatology, the method empirically modifies ozone profiles based on the phase of the QBO cycle. The QBOap is a zonally (36 5-degree latitude bins) and monthly averaged dataset available from 1970 to 2019.
Both climatologies are matched with the dates and latitude of the Umkehr observations at the Boulder station (40.05 N) and are also AK-smoothed.

Combined MLS and ozonesonde record
The Aura MLS record (described in section 2.3.2 above) is matched with ozonesonde profiles by date (+/-12 hours) and location (+/-5 degrees in longitude and +/-5 degrees in latitude). The approach to combining the MLS and ozonesonde records is described in McPeters and Labow (2011). We use this method to extend the MLS station overpass ozone profile below 100 hPa with the ozonesonde profiles. The time series of MLS-ozonesonde combined profiles between 2005 and 2020 is created for the Boulder station. The extended dataset is indicated by SND_MLS in the figures and is used in the homogenization process.
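A minimal sketch of the profile extension is given below. It assumes both profiles have already been matched in time and space and interpolated to a common pressure grid, and that the switch is a hard cut with MLS retained at exactly 100 hPa; the function name and those choices are assumptions, and the method of McPeters and Labow (2011) may blend the two records differently.

```python
import numpy as np

def merge_mls_sonde(pressure_hpa, mls_profile, sonde_profile, switch_hpa=100.0):
    """Use ozonesonde values at pressures greater than switch_hpa
    (altitudes below the 100 hPa level) and MLS values elsewhere.

    Assumption: both profiles share the same pressure grid.
    """
    p = np.asarray(pressure_hpa, dtype=float)
    mls = np.asarray(mls_profile, dtype=float)
    sonde = np.asarray(sonde_profile, dtype=float)
    return np.where(p > switch_hpa, sonde, mls)

# Made-up values on a five-level grid, ordered from the lower to the upper
# atmosphere.
pressure = [300.0, 150.0, 100.0, 50.0, 10.0]
merged = merge_mls_sonde(pressure,
                         mls_profile=[1.0, 2.0, 3.0, 4.0, 5.0],
                         sonde_profile=[10.0, 20.0, 30.0, 40.0, 50.0])
```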

Description of the Dobson measurement uncertainties.
The Dobson consists of two monochromators and a slit plate for selecting two bands of the UV solar spectrum approximately 20 nm apart. An optical wedge and photomultiplier tube are used to determine the relative intensity of the pair (see Komhyr and Evans, 2006) unique optical system. Some of the optical wedges are made from fused silica and others from quartz glass. Fused silica has higher UV transmission, which is relatively even across the spectra used by the Dobson. The transmission of quartz glass is several percent lower, and it passes longer wavelengths more efficiently. The optical wedges are also designed to have a logarithmic density curve, but wedge calibrations show that the density is not uniform across the entire wedge, and some wedges are inherently darker overall. An error in a poorly mapped wedge tends to increase toward the darker portion of the wedge, which would have a greater effect on measurements made at large SZAs. The thickness of the cobalt filters can make observations at longer wavelengths more susceptible to stray light. With time, the optical alignment in the instrument may shift or the optical prisms may degrade. These changes are identified during the calibration procedures (every 4-6 years) and post-corrected to homogenize the ozone record at the station. The total ozone changes are typically corrected with a linear adjustment (a step change or time-dependent increments based on comparison with the Dobson standard), but for Umkehr measurements the changes are identified through the characterization of the optical wedge, which is then mapped into R-N tables that produce Umkehr N-values. The relation between R and N is not linear and thus can modify the shape of the Umkehr curve after the calibrations. This is a small change in N-value but can result in a significant (above uncertainty) step change in the Umkehr ozone profile. The optical characterization of instruments is not a simple task.
It often requires mapping of the Dobson response to a laser beam that shines on the Dobson optics (Christodoulakis et al., 2015). The original method was developed by Dobson (1957), but it is now done with two standard lamps (Komhyr and Evans, 2006). The calibration N tables are changed when the difference between the station and reference instrument is greater than the equivalent of 1 % in TOC.
Recent investigations of the difference in the band-passes of three reference Dobsons (regional standard Dobsons No. 064, Germany, and No. 074, Czech Republic, and the world standard No. 083, USA) were performed in a laboratory setting with support from the EMRP ENV 059 project "Traceability for atmospheric total column ozone" (Kohler et al., 2018). Although some small deviations in the band-passes were found, the effective absorption cross sections derived using each Dobson slit function did not differ significantly and thus affected the derived total column ozone by less than 2 % (depending on the ozone cross section and wavelength pair). Unfortunately, the laboratory setting did not allow assessment of the stray light contribution for the three Dobson instruments.
Non-laboratory-based methods can be used to discern the level of stray light when referenced against another instrument with a similar (Christodoulakis et al., 2015) or higher level of stray light rejection (Moieni et al., 2019). However, even with knowledge of the instrument-specific band-pass (shape and spectral alignment) and with the expected level of stray light (between 10^-4 and 10^-5), a small but significant SZA-dependent bias remains unexplained in Umkehr observations. Moreover, this bias propagates into the retrieved Umkehr profiles and creates a 5-10 % bias relative to other ozone observing techniques. The next section demonstrates the standardised stray light corrections and changes in the Umkehr biases.

Standardised stray light corrections.
The impact of a stray-light-induced error in the Umkehr retrieval is described in Petropavlovskikh et al. (2009), where Umkehr profiles in Boulder were compared against NOAA-11 and NOAA-16 SBUV/2 V8 satellite and ozonesonde coincident profiles. It is further demonstrated in this paper by comparing multi-year biases between operational Umkehr retrievals at three additional stations (Haute Provence, France; Mauna Loa, Hawaii; and Lauder, New Zealand; see Table 1 for details) and several satellite records (Aura MLS, AGG and COH; see details in Table 3). Prior to comparisons, all records with vertical resolution less than 2 km (satellites and ozonesondes) are converted to DU, interpolated to 61 pressure levels (a quarter of a standard Umkehr pressure layer) and smoothed with the Umkehr AKs. Subsequently, the high-resolution profiles are integrated to the ten standard Umkehr layers (see Table C1). in other layers with the largest positive bias (up to 15 %) in layer 8. The biases in layers 3, 6, 7 and 8 are larger than 5 %, which is the Umkehr retrieval uncertainty for these layers. The layer 1 bias is also larger than 5 %, but the Umkehr retrieval uncertainty in this layer is ~10-15 %. Ozonesonde and COH biases are similar for the two periods. The Aura MLS bias is also similar to the COH bias. The M2GMI model comparisons show a larger bias. The GMI CTM shows the smallest bias in comparison to Umkehr profiles in layers 5-8, whereas the bias increases in the second period but remains the smallest in the upper stratosphere. In layers 2 and 3, GMI CTM has the largest bias, whereas M2GMI shows the lowest bias. Ozonesondes have the lowest bias in layers 3 and 6, a high bias in layers 4 and 5 and a large negative bias in layer 2. The models have lower bias in layers 6-8 as between standard deviations (SD) in two time periods, and they are larger than 5 % in layers 2, 3 and 4. The largest SDs are found in comparisons between ozonesonde and Umkehr.
This could be related to the large vertical variability captured by ozonesondes and to the limitations of the Umkehr AK smoothing. However, the SD in layer 2 is still below 15 %, which is the estimated Umkehr retrieval uncertainty in the bottom layers. In summary, we demonstrated that the standardized stray light corrections do not fully remove the bias between Umkehr and other ozone observing methods. Since the optical characterization of each Dobson instrument is not yet possible, the optimization approach is discussed next. In this paper we discuss an empirical approach to minimize the differences between simulated and observed Umkehr measurements at large SZAs.
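The pre-processing of high-resolution reference profiles described in this section (smoothing with the Umkehr AKs, then integration to the standard layers) can be sketched as follows. This is a minimal illustration rather than the operational code. It assumes the conventional Rodgers form of AK smoothing, x_s = x_a + A (x - x_a). The function names, the toy 8-level grid and the grouping of fine layers are hypothetical; the actual fine-to-standard layer mapping is the one listed in Table C1.

```python
import numpy as np

def smooth_with_ak(x_ref, x_apriori, ak):
    """Smooth a reference profile (DU per fine layer) with an averaging kernel.

    Assumed form (Rodgers): x_s = x_a + A (x_ref - x_a).
    """
    x_ref = np.asarray(x_ref, dtype=float)
    x_apriori = np.asarray(x_apriori, dtype=float)
    ak = np.asarray(ak, dtype=float)
    return x_apriori + ak @ (x_ref - x_apriori)

def integrate_to_layers(x_fine, group_sizes):
    """Sum fine-grid partial columns (DU) into coarser layers.

    group_sizes[i] is the number of consecutive fine layers in coarse layer i
    (hypothetical grouping; see Table C1 for the real layer definitions).
    """
    x_fine = np.asarray(x_fine, dtype=float)
    edges = np.concatenate(([0], np.cumsum(group_sizes)))
    return np.array([x_fine[a:b].sum() for a, b in zip(edges[:-1], edges[1:])])

# Toy example: 8 fine layers, a diagonal AK of 0.5, two coarse layers.
x_ref = np.array([4.0, 6.0, 10.0, 14.0, 12.0, 8.0, 4.0, 2.0])
x_apriori = np.full(8, 7.0)
ak = 0.5 * np.eye(8)

x_smooth = smooth_with_ak(x_ref, x_apriori, ak)
layers = integrate_to_layers(x_smooth, [4, 4])
```

Because the smoothed profile is expressed in DU per fine layer, the integration step is a plain sum, which conserves the total column of the smoothed profile.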

Empirical correction methodology
This section describes the new method developed for the optimization of Dobson ozone profile retrievals to account for the instrument-specific out-of-band stray light and other optical artifacts. This approach is used for homogenization of the long- [...] with high (0.1 nm) spectral resolution. The convolution of the spectrally resolved zenith sky radiances with the standardized band-pass functions (Komhyr, 1993) is performed to create N-values at ten nominal SZAs. In the next step, the multiple scattering and refraction corrections are selected from look-up tables (LUT) that are prepared from radiative transfer simulations of the Umkehr observations (Petropavlovskikh et al, 2005; Petropavlovskikh et al, 2009) using a set of climatological ozone profiles (McPeters et al, 1998). Corrections are selected based on the station location (i.e. in low, middle or high latitude regions) and adjusted to the total ozone observed for the day. In the following step, the standardized out-of-band stray light corrections are selected from a LUT developed with a scheme similar to the one described above. This means that up to this point the Umkehr N-values are simulated for an idealized Dobson instrument. The out-of-band rejection (or SLC) of UV light in a typical Dobson instrument is assumed to be on the order of 2 × 10^-5, but it can vary between instruments (Moeini et al, 2019) and therefore between the Dobson instruments operated sequentially to create the long-term station record.
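The convolution step above can be illustrated with a short sketch. It is not the forward model used in the retrieval. It assumes a triangular band-pass shape, nominal C-pair centre wavelengths of 311.45 and 332.4 nm, hypothetical slit widths, and the commonly used convention N = 100 log10(I_long / I_short) without any instrument constant. All function and variable names are illustrative.

```python
import numpy as np

def band_mean_radiance(wavelength_nm, radiance, center_nm, fwhm_nm):
    """Radiance averaged over a triangular band-pass (assumed shape).

    The triangle has its apex at center_nm and a full base of 2 * fwhm_nm,
    so its full width at half maximum equals fwhm_nm.
    """
    weights = np.clip(1.0 - np.abs(wavelength_nm - center_nm) / fwhm_nm, 0.0, None)
    return np.sum(weights * radiance) / np.sum(weights)

def n_value(wavelength_nm, radiance,
            short_band=(311.45, 1.0), long_band=(332.4, 3.0)):
    """N-value of a wavelength pair from spectrally resolved radiances.

    short_band and long_band are (centre, FWHM) in nm; the widths are
    placeholders, not the standardized Dobson slit functions.
    """
    i_short = band_mean_radiance(wavelength_nm, radiance, *short_band)
    i_long = band_mean_radiance(wavelength_nm, radiance, *long_band)
    return 100.0 * np.log10(i_long / i_short)

# Toy spectrum on a 0.1 nm grid: the short channel is ten times dimmer.
wavelength_nm = np.arange(300.0, 345.0, 0.1)
radiance = np.where(wavelength_nm < 320.0, 0.1, 1.0)
n_demo = n_value(wavelength_nm, radiance)
```

In this toy case the factor of ten between the two channels corresponds to an N-value of 100. A real calculation would repeat the convolution at each of the ten nominal SZAs.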
In order to test the representativeness of the M2GMI vertical ozone distribution over Boulder, the process described above is repeated using several reference ozone records, including Boulder overpass output from the GMI CTM and M2GMI [...] Table 2 contains the dates and time periods selected to apply empirically derived adjustments to Dobson observations in Boulder, CO. The decision to adjust Umkehr data is tested every time the station instrument is replaced with a new instrument.
The change in observations can occur due to different levels of out-of-band rejection, which are unique to the optical system of each Dobson instrument. The data are also re-processed after an optical repair, whether prompted by sudden physical damage (e.g. a fall of the instrument) or by long-term wear and tear from exposure to the weather elements (e.g. sea salt erosion). The instrument repair can include replacement of the optical wedge, replacement of the photo-counter, or a change of the centre of the band-pass due to a new temperature setup for the Q-levers.
The optimization method accounts for undetermined deviations in the optical system that were not captured at the time of the exchange or repair of the Dobson instruments. The changes may not be significant for the accuracy of the total column ozone observations, but they may be large enough to change the Umkehr curve and create a step change in the ozone record. To verify the empirical adjustments and the consistency of the re-processed Umkehr time series, in section 5 we present comparisons with independent ozone observing systems (satellites, ozonesondes) that are co-incident with Dobson observations. In order to select the most effective empirical adjustment for the Boulder Umkehr data processing in 2008-2015, Umkehr ozone profiles retrieved with multiple empirical corrections are compared to the MLS station-overpass ozone profiles (Fig. 2c).
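As an illustration of how a period-specific empirical adjustment could be derived, the sketch below averages the observed-minus-simulated N-value residuals at each SZA within each calibration period and removes that mean from the observations. This is an assumption-laden simplification: the sign convention, the use of a plain mean, and all names and numbers are hypothetical, and the actual calibration dates are those listed in Table 2.

```python
import numpy as np

def period_index(times, period_starts):
    """Index of the calibration period each observation falls into."""
    return np.searchsorted(np.asarray(period_starts), np.asarray(times), side="right") - 1

def derive_corrections(times, n_obs, n_sim, period_starts):
    """Mean observed-minus-simulated N-value per SZA for each period."""
    idx = period_index(times, period_starts)
    resid = np.asarray(n_obs, dtype=float) - np.asarray(n_sim, dtype=float)
    return {int(p): resid[idx == p].mean(axis=0) for p in np.unique(idx)}

def apply_corrections(times, n_obs, corrections, period_starts):
    """Subtract the period-specific correction from the observed N-values."""
    idx = period_index(times, period_starts)
    out = np.array(n_obs, dtype=float)
    for p, corr in corrections.items():
        out[idx == p] -= corr
    return out

# Toy record: 4 days, 3 SZAs, two calibration periods with different offsets.
times = np.array([2000.5, 2001.5, 2006.5, 2007.5])
period_starts = np.array([2000.0, 2006.0])   # hypothetical dates
n_sim = np.full((4, 3), 50.0)
offsets = np.array([[0.0, 1.0, 2.0]] * 2 + [[0.0, 2.0, 4.0]] * 2)
n_obs = n_sim + offsets

corrections = derive_corrections(times, n_obs, n_sim, period_starts)
n_adjusted = apply_corrections(times, n_obs, corrections, period_starts)
```

Keeping one correction vector per calibration period is what allows a step change between instruments or repairs to be removed without altering the variability within a period.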

Discussion of optimization results
The goal is to have a zero bias in the difference profile comparisons with MLS. Empirical optimizations minimize the ozone bias in comparisons to the MLS profiles; however, optimized Umkehr profiles still show a +/-5 % bias, even when the MLS profiles are used as a reference (see results for the MLS and sonde combined profile, SND_MLS). There are some differences between optimized datasets, but they all agree within the uncertainty of each empirical correction. The results show a wave-like distribution of biases that changes from a negative bias in the upper stratosphere to a positive bias in the middle stratosphere, then again to a negative bias in the lower stratosphere and to a positive bias in the troposphere. Some of these biases are due to the Rodgers optimal estimation technique, which relies on vertical smoothing of the ozone profile and on an a priori covariance that assumes cross-correlations between adjoining layers (Rodgers, 1990). There is also a limitation in Umkehr observations that makes it difficult to clearly separate ozone information between tropospheric and stratospheric layers.
Since no Aura MLS data are available prior to 2004, we select the M2GMI dataset to develop optimized corrections for the entire Umkehr record. The M2GMI correction is derived separately for each calibration time period of the Dobson record in Boulder (Table 1). We note that the M2GMI-based optimized correction produces a small (+/-5 %) but significant bias in retrieved Umkehr ozone profiles relative to the MLS profiles averaged over the 2005-2018 period (see Figure 2c, green line). This means that there is an additional difference between the atmospheric state and the Umkehr observation that is not adequately simulated in the forward model of the Umkehr retrieval. Therefore, the iterative modification of the M2GMI N-value correction (see dark line in Fig. 2a) [...] Table 2 for the dates). The optimized corrections indicate several distinct time periods that change the mean ozone levels in the time series and therefore impact trends calculated after 2000.

Comparisons of optimized Umkehr time series against reference records
This section discusses the vertical and temporal changes in the optimized Umkehr ozone record. Comparisons between operational OPR (red), standardized SLC (green) and optimized OPT (blue) ozone in layer 8 (4-2 hPa) at Boulder are plotted in Fig. 4 [...] Figure B3 showing 1979-1994 period). The optimization method identified the need for an adjustment on the order of 10 %. This change in stratospheric ozone levels at the beginning of the Boulder Umkehr record can significantly reduce trends derived from the homogenised record (optimized series) prior to 1997 and bring them into closer agreement with the satellite combined zonally averaged trends (LOTUS report, Figure 5.9a in Chapter 5). A discussion of trends is beyond the focus of this paper and will be addressed in a follow-up publication. The step changes in the differences are clearly seen in this plot and vary between 0 and -15 % during non-volcanic periods, while during the volcanic periods (see Fig. B3) corrections can be as large as -30 %. Optimizing Umkehr ozone profile retrievals during a volcanic eruption follows a procedure similar to the one described above. When large volcanic eruptions inject aerosols into the stratosphere, the operational Umkehr retrieval is not set up to account for the change in atmospheric scattering. Therefore, the errors in operationally retrieved Umkehr profiles can be as large as 70 %. For trend analyses, the volcanic time periods in the Umkehr time series (e.g. 1991-1993) are typically removed prior to fitting the statistical model to the data. The optimization method reduces the introduction of gaps in the Umkehr time series so that the entire record can be used for trend analysis. An example of volcanic period corrections and a discussion of the results are given in Appendix B.

Changes in mean and seasonal biases
After reprocessing of the Umkehr data with optimization corrections, the changes to vertical profiles are verified through comparisons against independent ozone observations that are matched with the Umkehr record. For verification of changes in Umkehr data at Boulder, satellite overpass data (Table 3) and co-incident ozonesonde profiles are used for comparisons. [...] Table 2 for the dates of calibrations and the WinDobson automation events). The standardized stray light correction is a long-term mean adjustment that depends on the ozone climatology and on total ozone. It creates a seasonally dependent adjustment (less than 1 % in the upper stratosphere) but does not add a significant long-term trend. However, different optimized corrections are applied to the individual periods between instrument calibrations, which results in different increases in the retrieved ozone.
To highlight changes in optimized Umkehr, the COH overpass ozone record is plotted as a reference (red line). The vertical dotted lines indicate the periods of the satellite records (also see abbreviations at the top of the plot) that are combined in the COH ozone dataset. The difference between COH and operational Umkehr data is shown as a dark green line, with a mean negative bias of ~15 % in layer 8 (0 % and 5 % in layers 6 and 4, respectively) that varies seasonally and temporally. The percent difference between optimized Umkehr and COH is shown as a light green line. The average bias is close to zero, and the seasonal changes are also reduced in layers 8 and 4. The main change in the optimized Umkehr ozone dataset is the increase/decrease in ozone amount vertically (three panels in Fig. 7). Also noticeable are changes in the relative shifts between calibration periods. For example, a change in the offset between COH and operational Umkehr biases (dark green line) is seen between the 2001-2006 and 2006-2011 periods. This step change is largely reduced in COH comparisons against the optimized Umkehr version (light green). In addition, we do not find any evidence of a step change in the optimized data in 1998 or 2004/2005 that could be related to the step changes found in M2GMI (Stauffer et al, 2019). Similarly, in the optimized records of the three other Umkehr stations (see Appendix A) we do not find any impact from the M2GMI step changes. For example, at the MLO station, instrumental artifacts in Dobson operations resulted in a significant step change in the Umkehr operational data, which was completely eliminated by the optimization method (see Fig. A6a). The importance of these changes for trend analyses is alluded to in the LOTUS 2019 Report and will be further discussed in a future paper.
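The kind of diagnostic discussed above, a percent difference against a reference record summarized per calibration period, can be sketched as follows. The sign convention (Umkehr minus reference, relative to the reference), the period boundaries and the toy values are assumptions made for illustration only.

```python
import numpy as np

def percent_difference(umkehr, reference):
    """Percent difference, assumed here as 100 * (Umkehr - reference) / reference."""
    umkehr = np.asarray(umkehr, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return 100.0 * (umkehr - reference) / reference

def period_bias(times, umkehr, reference, period_starts):
    """Mean and standard deviation of the percent difference in each period.

    Every period is assumed to contain at least one matched observation.
    """
    diff = percent_difference(umkehr, reference)
    idx = np.searchsorted(np.asarray(period_starts), np.asarray(times), side="right") - 1
    means = np.array([diff[idx == p].mean() for p in range(len(period_starts))])
    sds = np.array([diff[idx == p].std() for p in range(len(period_starts))])
    return means, sds

# Toy layer-8 record with an offset that changes between two periods.
times = np.array([2002.0, 2004.0, 2007.0, 2009.0])
period_starts = np.array([2001.0, 2006.0])      # hypothetical dates
reference = np.array([10.0, 10.0, 10.0, 10.0])  # DU, reference overpass
umkehr = np.array([8.5, 8.5, 9.5, 9.5])         # DU, operational retrieval

means, sds = period_bias(times, umkehr, reference, period_starts)
steps = np.diff(means)  # change in mean bias between adjacent periods
```

A non-zero entry in `steps` flags a shift in the offset between calibration periods. In a homogenized record these shifts should be small compared with the within-period standard deviation.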

Conclusions
In this paper we discussed a method for the Umkehr profile optimization and its impact on the homogenization of the long- [...] careful and robust approach to instrument exchanges, repairs and calibrations against the WMO standard Dobson 083 allowed for the collection of high-quality long-term records of stratospheric ozone changes. The optimization provides a tool for fine-tuning the Umkehr retrievals, removing the instrumental biases, and empirically evaluating the impacts of stray light contributions to the observations over different time periods. However, the optimization is not meant to reduce the bias between the reference model and the Umkehr ozone profile. The models are used only as a guide to assure the continuity of optimized ozone after evaluating and removing step changes caused by Dobson instrumental artifacts and by changes to the data collection protocols and data processing. This careful approach aims at homogenizing Umkehr time series for trend analyses, reducing noise in the data and supporting NOAA and WMO efforts to detect ozone recovery under the Montreal Protocol guidance.

Appendix A. Umkehr biases and time series for OHP, MLO and Lauder
This appendix shows comparisons of the operational (Fig. A1) [...] (Fig. 5a). This reduction in upper layer biases is accompanied by changes in the biases in the lower layers, especially in layer 2 (decrease), layer 3 (increase for OHP and Lauder) and layer 4 (increase for MLO).
After the optimization (Fig. A3), a significant reduction in biases is found in layers 4-9, with the exception of comparisons at Lauder, where the M2GMI bias has become more negative but is still less than 5 %, which is within the uncertainty of the Umkehr retrieval. At the same time, the M2GMI bias in layer 2 increased to 5 % at OHP and to less than 5 % at MLO, which is similar to the bias between ozonesonde and optimized Umkehr. The Lauder station features a negative bias between the optimized Umkehr and ozonesonde data in layers 2 and 3, while a positive bias of similar magnitude is found at OHP. No significant biases are found in comparisons of MLO optimized Umkehr with Hilo ozonesonde data. The Hilo ozonesonde profiles are limited to the pressure levels above 680 hPa, which is the surface pressure at MLO.
The Umkehr and COH comparisons in layers 2 and 3 at all three stations show small positive biases (<7 %). The AGG biases with respect to optimized Umkehr are similar to the COH-Umkehr biases, which suggests that the AGG and COH records compare well during the 2005-2019 period. The largest bias is found between GMI CTM and optimized Umkehr at the Lauder and OHP locations. The MLS record agrees with other records at OHP and MLO, but is biased high in layer 4 and agrees with the COH and AGG data in layer 3, while in layer 2 the MLS bias is similar to that of the ozonesonde and M2GMI records.
The impact of the standardized SLC and optimized corrections derived for OHP (Fig. A4), MLO (Fig. A5) and Lauder (Fig. A6) is comparable to the results shown in Figure 3, where the largest changes to N-values are found at larger SZAs. [...] comparisons, we notice that the biases between M2GMI and Umkehr are reduced by 2 % in the upper stratosphere in the 2005-2020 period, whereas relative biases between M2GMI and GMI CTM are contained. Similar changes in the upper stratospheric biases between M2GMI and optimized Umkehr for the two analysed periods are found for Boulder (Fig. 5a), although the absolute biases are different. The 2005-2020 period shows an improved agreement between the Umkehr and M2GMI records, and the bias between the Umkehr and COH records is reduced below 20 hPa; however, there is no reduction in the bias with respect to GMI CTM. The differences in stratospheric ozone offsets between ozonesonde/M2GMI (smaller) and ozonesonde/GMI (larger) over Lauder were reported in Stauffer et al. (2019) and had a range of biases similar to those derived in this study in the lower and middle stratosphere.
In conclusion, the optimization reduces biases between Umkehr ozone profiles and most of the alternative coincident records to less than +/-5 % at all four stations. The large biases between GMI CTM and other records at Lauder need further investigation.

Appendix B. Umkehr optimization for volcanic time periods.
During the volcanic eruptions (like El Chichon in 1982 or Pinatubo in 1991), sulphate aerosols were ejected into the stratosphere and were transported globally or semi-globally. The large aerosol load, as large as 0.1 in optical depth at UV wavelengths (Stevermer et al, 2000), significantly contributed to the scattered light in the atmosphere. Figure [...] operational Umkehr ozone retrieval algorithm does not account for aerosol-produced scatter in its forward model, and therefore interprets changes in observed N-values as changes in the ozone profile. This creates a period of erroneous ozone values. Figure B1, panel a, shows the reduced ozone in stratospheric layer 8 of the MLO operational Umkehr record (black line). The most depleted ozone is found soon after the eruption, coincident with the largest aerosol load, and then the error in retrieved ozone is slowly reduced following the decay in aerosol particles over a ~2-year time period.
The optimization to remove the low bias in the Umkehr ozone profile during volcanic periods follows an approach similar to that discussed in this paper, except that the corrections are developed for several 6-month-long incremental periods. The result is shown in panel b with a blue line. For reference, the M2GMI data are also shown (red line) and compare well with the optimized Umkehr results. As an independent reference, the SAGE II data are also shown for comparison (green line).

Appendix C. Umkehr Averaging Kernel
Table C1 summarizes the Umkehr layer system. Each Umkehr layer is defined by the pressure at the bottom of the layer. The highest layer extends to the top of the atmosphere. The standard Umkehr output is constructed in a 10-layer system (two left-most columns in Table C1). The 16-layer system (two middle columns in Table C1) is used for the AK output. The 61-layer system (two right-most columns in Table C1) is the working grid of the Umkehr retrieval algorithm, used to avoid interpolation errors (Petropavlovskikh et al., 2005).
Figure C1 shows the Umkehr Averaging Kernels (AK) as a function of pressure and the respective Umkehr layers (as defined in Table C1). The 61-, 16- and 10-layer AKs are shown in panels a, b, and c, respectively. The AK concept is described by Rodgers (2000). For plotting purposes, we follow the Bhartia et al. (2013) formulation of the "smoothing" kernels that act as low-pass filters to smooth fractional anomalies in each layer. Fig. C1 shows the rows of the fractional AK, which indicate the sensitivity of the retrieved ozone in a given layer to changes in ozone in all layers. The red/green colors (see legend in panel a) highlight the high/low informational content of the AK. The maximum AK values between ~100 and 2 hPa are aligned with the nominal altitude of the layer, indicating that the highest informational content is obtained from that layer, while the contribution from adjacent layers is reduced at an exponential rate. Below the 100 hPa (above the 2 hPa) level the maximum of the AK is shifted higher (lower) in altitude and the AK becomes broader. This means that the retrieval is most sensitive to the ozone variability in the layers above (below) and therefore relies more heavily on the a priori information in order to separate the informational content into individual layers. The Umkehr ozone profile is reported in 10 layers selected such that the informational content is provided at about two data points per width of the smoothing kernel. The vertical resolution at the bottom of the profile is poor and therefore several layers are combined into a thicker layer that represents tropospheric column ozone information (see Table C1).

https://doi.org/10.5194/amt-2021-203 Preprint. Discussion started: 23 September 2021 © Author(s) 2021. CC BY 4.0 License.

Appendix D. Temperature sensitivity in Umkehr retrievals
The UMK08 operational algorithm (Petropavlovskikh et al. 2005) is based on the Bass and Paur (BP) ozone cross-section (Bass and Paur, 1985), convolved over the Dobson C-pair standardized band-pass (Komhyr et al, 1993; Petropavlovskikh et al, 2011). The impact of the ozone cross-section on the uncertainty of Brewer- and Dobson-observed total column ozone retrieval was outlined in detail by Redondas et al. (2014). They found that the use of Serdyuchenko et al. (2013) [...] (Orphal et al., 2016). For operational Umkehr retrievals, the effects of the ozone cross-section were found to be minimal (less than 2 %) when NRL (Summer et al, 1993) climatological temperatures were used in the retrieval. However, the report suggests the use of the Serdyuchenko et al. (2013) ozone cross-sections in Umkehr retrievals whenever total ozone is processed with that cross-section. It is also suggested that the day-to-day and diurnal variability in stratospheric temperatures is larger than represented by climatology and thus may add an additional 1-2 % change in daily stratospheric ozone that is not currently captured by operational Umkehr retrievals. The nominal absorption cross-sections for the WMO GAW Dobson total ozone network are derived based on the selection of a standardized temperature of −46.3 °C (Komhyr et al, 1993; Redondas et al, 2014). The previously published approach to representing the temperature sensitivity of the ozone absorption cross-section (i.e. Redondas et al, 2014 and references within) is to use a second-degree polynomial in temperature. We use spectrally resolved datasets (Bass and Paur, 1984; Serdyuchenko et al, 2013) to calculate the effective ozone absorption cross-sections and their temperature dependence for the C-pair spectral channels.
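A second-degree polynomial representation of the temperature dependence can be evaluated as in the sketch below. The polynomial form sigma(T) = c0 + c1 T + c2 T^2, with T in degrees Celsius, is assumed here for illustration, and the coefficients are placeholders rather than the fitted values; the only number taken from the text is the standardized temperature of −46.3 °C.

```python
T_REF_C = -46.3  # standardized Dobson temperature quoted in the text (deg C)

def cross_section(temp_c, coeffs):
    """Effective cross-section from a second-degree polynomial in temperature.

    coeffs = (c0, c1, c2) with sigma(T) = c0 + c1*T + c2*T**2, T in deg C.
    The polynomial form and units are assumptions for this sketch.
    """
    c0, c1, c2 = coeffs
    return c0 + c1 * temp_c + c2 * temp_c ** 2

def percent_change_from_reference(temp_c, coeffs, t_ref_c=T_REF_C):
    """Percent change of the cross-section relative to its value at t_ref_c."""
    ref = cross_section(t_ref_c, coeffs)
    return 100.0 * (cross_section(temp_c, coeffs) - ref) / ref

# Placeholder coefficients in relative units (NOT fitted values).
demo_coeffs = (1.0, 1.0e-3, 1.0e-5)

# Effect of a layer that is 20 degrees warmer than the standardized temperature.
change = percent_change_from_reference(-26.3, demo_coeffs)
```

With real coefficients for the short and long C-pair channels, the same function would be evaluated for both channels, since the retrieval depends on the differential absorption between them.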
The next section shows examples of the sensitivity of Umkehr retrieved ozone to the variability in the ozone absorption cross-section. Table D1 provides coefficients for second-degree polynomials fitted to the Bass and Paur (1984) absorption cross-sections over the spectral bands of the Dobson Umkehr C-pair (short and long) (Redondas et al., 2014). We further assess the representativeness of the NRL temperature climatology used in the operational Umkehr algorithm. We compare the temperature seasonal cycle derived from the M2GMI dataset over the Boulder station to the NRL climatology at 40 degrees N over the 2010-2017 time period (Figure 4a). Monthly averaged M2GMI temperatures at 2.8 hPa are shown for the morning (red symbols and lines) and afternoon (grey symbols and lines) Umkehr measurements (i.e. at 15 UTC or 08 LT, and 03 UTC or 20 LT). The NRL temperature climatology is shown for comparison (blue crosses). The day-to-day variability in M2GMI temperatures is shown as whiskers (the minimum and maximum of all data, and one standard deviation above and below the mean of the data). Figure 4a shows that the NRL climatological mean temperature is biased high relative to M2GMI, more so in the morning than in the afternoon (by ~8.9 and 6.3 K, respectively), which is an indication of the diurnal cycle at 2.8 hPa that is not captured by the NRL climatology. Also, day-to-day variability in the stratospheric temperature (see Figure 4b; results are based on the M2GMI temperature dataset) can create an offset that varies seasonally (i.e. box and whiskers values). The summer months show less variability in daily temperatures than the winter months, where the maximum offset from the NRL climatology can vary between -5 and 20 degrees C.
The impact of daily temperature variability on retrieved Umkehr ozone profiles is further tested for the 2015-2017 Boulder Umkehr record. Figure 5a shows M2GMI daily and NRL monthly temperatures at 2.8 hPa. The difference in temperatures during days when Umkehr observations were made is also plotted and highlights large deviations in the winter months. This day-to-day temperature variability, as high as 20 degrees, results in relatively low ozone variability. The changes in the retrieved ozone are based on the temperature sensitivity of the spectrally resolved ozone cross-section within the Dobson spectral channels and on profile smoothing. Panel b of Figure 5 shows comparisons between ozone retrieved using the NRL climatology (y-axis) and M2GMI daily temperature profiles (x-axis). Each point is monthly averaged ozone for the 2015-2017 time period. The solid line is the 1:1 reference. The mean bias is -0.26 % with a 0.3 % standard deviation. This test provides an averaged uncertainty of the daily retrieved Umkehr ozone in layer 8. It may be a small error; however, this additional uncertainty varies from year to year and thus can have an impact on the long-term ozone trend results in the upper stratosphere.

Table C1. Umkehr and COH pressure layer grids. Each layer is defined by the pressure at the bottom of the layer. The top layer extends from its pressure level to the top of the atmosphere. The standard Umkehr output is in the 10-layer system (two leftmost columns). The 16-layer system (two middle columns) is used for the AK output. The 61-layer system (two rightmost columns) is utilized in the forward model.

Colors in Panel c correspond to nominal Umkehr layers, which are shown in the legend (See Table 1