Evaluation of the New NDACC Ozone and Temperature Lidar at Hohenpeißenberg and Comparison of Results with Previous NDACC Campaigns

A newly upgraded German Weather Service (DWD) ozone and temperature lidar (HOH) located at the Hohenpeißenberg Meteorological Observatory (47.8° N, 11.0° E) has been evaluated through comparison with the travelling standard lidar operated by NASA's Goddard Space Flight Center (NASA STROZ), satellite overpasses from the Microwave Limb Sounder (MLS), the Sounding of the Atmosphere using Broadband Emission Radiometry (SABER), the Ozone Mapping and Profiler Suite (OMPS), meteorological radiosondes launched from München (65 km north-east), and locally launched ozonesondes. The "blind" evaluation was conducted under the framework of the Network for the Detection of Atmospheric Composition Change (NDACC) using 10 clear nights of measurements in 2018 and 2019. This campaign was conducted within the larger context of NDACC validation activities for European lidar stations. The previous 2017-2018 validation campaign took place at the French Observatoire de Haute Provence and showed a high degree of fidelity between participating instruments. The results are reported in the companion article (Wing et al., 2020). There was good agreement between all ozone lidar measurements in the range of 15 to 41 km, with relative differences between co-located ozone profiles of less than ±10%. Differences in the measured ozone number densities between the lidars and the locally launched ozonesondes were also generally less than 5% below 30 km. The satellite ozone profiles demonstrated some differences with respect to the ground based lidars which are due availability.
The data that support the findings of this study are openly available [1]. The data used in this study were obtained from Hohenpeißenberg Meteorological Observatory as part of the Network for the Detection of Atmospheric Composition Change and are publicly available OMPS
TJM, JTS, GS, and WS conducted the measurement campaign at Hohenpeißenberg. SGB conducted the blind comparison of all HOPS data. RW drafted the article. TJM, JTS, and WS provided access to the data and instruments. SK processed the OMPS data. discussed the results and to the final paper.


The Network for the Detection of Atmospheric Composition Change (NDACC, http://www.ndacc.org) is an international collaboration of more than 70 research stations (Kurylo et al., 2016; De Mazière et al., 2018) which provides a common framework for the early detection of long-term changes in the atmosphere and for the validation of atmospheric measurements. To facilitate these instrument validation exercises, a mobile reference lidar operated by NASA's Goddard Space Flight Center (NASA STROZ) is shipped around the world to conduct intensive comparison campaigns with other NDACC lidars. Most recently, NASA STROZ participated in the LAVANDE campaign at the Observatoire de Haute Provence in southern France

LAVANDE.
The general background on ozone lidars, analysis techniques, data collection procedures, and NDACC comparison parameters for this study mirrors the work recently done during LAVANDE. For the purposes of this article we will endeavour to provide brief but comprehensive introductions on each of these topics without engaging in onerous repetition. We invite readers seeking more details of NDACC lidar validation activities to consult the companion paper Wing et al. (LAVANDE, 2020) and other NDACC studies: (STOIC, Margitan et al., 1995); (OPAL, McDermid et al., 1998); (OTIC, Braathen et al., 2004); (NAOMI, Steinbrecht et al., 1999); (HOPE, Steinbrecht et al., 2009); (MOHAVE, Leblanc et al., 2011); NDACC algorithm intercomparisons for ozone lidars (Godin et al., 1999); as well as a review paper summarising NDACC validation exercises (Keckhut et al., 2004).
When providing context for this most recent validation exercise we will refer back to the 2020 LAVANDE study and to the last validation study at Hohenpeißenberg, which was conducted during the HOPE campaign and published in 2009.
In general, NDACC lidars measure stratospheric ozone with an accuracy better than 3% between 12 and 35 km altitude and better than 10% between 35 and 40 km. NDACC lidar temperature measurements similarly have an accuracy better than 1 K from 30 to 40 km altitude when compared with co-located measurements. Lidar precision for ozone is highest near the peak concentration of ozone in the stratosphere and decreases above and below the layer as the signal to noise ratio drops at low ozone concentrations. The precision for temperature typically decreases above 70 km depending on the laser power, telescope area, and integration time for a given lidar measurement. Further details on the theoretical uncertainty budgets for NDACC temperature and ozone lidars can be found in Leblanc et al. (2016a, b, c).

Key Results from HOPE
The previous NDACC validation campaign (HOPE, Steinbrecht et al., 2009) found a low bias of up to 10% in the ozone profiles produced by the Hohenpeißenberg Original (HOHO) lidar between 33 and 43 km and a high bias of approximately 50% above 50 km when compared with the travelling standard lidar operated by NASA-STROZ. These differences were attributed to the choice of numerical filters used by the NASA and DWD algorithms. An investigation of the precision for ozone data from both lidars concluded that the agreement between profiles from each system was better than 5% between 20 and 40 km.
The 2009 HOPE campaign study also found that the HOHO lidar measures temperatures 1 to 2 K colder than the NASA lidar between 30 and 65 km and up to 15 K warmer than NASA above 65 km. These differences were only significant from 25 to 50 km. Additionally, a small altitude offset of 290 m was discovered and corrected in the HOHO system.
the 95% confidence level) with both lidars between 15 and 30 km. Above 30 km the uncertainties associated with the pump correction at low pressures contributed to larger measurement differences.
The temperature measurements of the NASA-STROZ reference lidar and the OHP lidar LiO3S were statistically equal from 22 to 60 km. Temperature is a secondary scientific product for LiO3S which is currently not archived with NDACC or reported above 60 km. A comparison was also conducted between NASA and the OHP temperature lidar LTA. The validation exercise determined that the photomultiplier in the low gain channel of LTA was defective and the component was subsequently replaced. NASA exhibited an apparent cold bias of approximately 3 K below 25 km with respect to all other instruments. Temperature agreement between the lidars and the satellites MLS and SABER was generally very good throughout the stratosphere, only exceeding ±5 K above 55 km. MLS exhibited a vertical oscillation in the temperature profiles with an amplitude of ±5 K with respect to all other measurements. The characteristics of this MLS-lidar difference have been previously reported in Wing
Total uncertainty estimates for ozone and temperature were calculated for each instrument involved in the campaign. This was done in an effort to characterise the uncertainty budgets of each of the participating instruments with respect to the observed standard deviation between each set of measurements. This comparison allowed us to evaluate the uncertainty estimates for the lidars and determine whether we are realistically estimating the measurement uncertainty in our instruments and the total uncertainty in our profiles of ozone and temperature. We found two outstanding issues during this exercise: 1) the temperature uncertainty budget for the LiO3S lidar overestimates the uncertainty above 35 km; 2) there was a previously undetected discrepancy between the temperature uncertainty budget for the French lidar LTA and NASA of up to 2 K below 50 km.
In response to the LAVANDE campaign findings the PMTs for the low gain channels (< 50 km) in LTA were replaced and plans were made to modify filtering codes for LiO3S temperatures for eventual submission to the NDACC database.
Tab. 2) with the dual purpose of providing an updated validation of the existing DWD ozone lidar, hereafter referred to as Hohenpeißenberg Original (HOHO), which has been in continuous operation since September 1987 (see key instrument publications: Geh, 1987; Claude et al., 1994; Steinbrecht et al., 1997, 2009) and a first validation study for the new and improved DWD ozone lidar, hereafter referred to as Hohenpeißenberg lidar (HOH). A technical comparison of both instruments is given in Sect. 2.2.
The work presented in this article follows the NDACC standards for 'blind' instrument intercomparisons. The measurements were made onsite and ozone and temperature profiles were calculated by the respective NASA and DWD lidar teams; the nightly averaged lidar profiles were collected by an impartial NDACC referee (S. Godin-Beekmann) who was not involved in conducting the measurement campaign; and the intercomparison of the results was conducted by the referee's team.
https://doi.org/10.5194/amt-2020-396 Preprint. Discussion started: 10 November 2020. © Author(s) 2020. CC BY 4.0 License.

The paper is structured according to the following outline: Sect. 2 introduces the instruments involved in the HOPS campaign and sets the co-location criteria for coincident measurements; Sect. 3 provides technical details for the new DWD temperature and ozone lidar and shows some examples of co-located ozone and temperature profiles; Sect. 4 conducts a statistical intercomparison between all instruments for ozone; Sect. 5 conducts a statistical intercomparison between all instruments for temperature; Sect. 6 examines and assesses the estimated uncertainty budgets for all instruments participating in the HOPS campaign; Sect. 7 conducts a cross-intercomparison of both the LAVANDE and HOPS NDACC campaigns to assess the performance of the travelling standard lidar NASA-STROZ; and Sect. 8 summarises the major findings of the HOPS NDACC intercomparison campaign as well as the results of the LAVANDE-HOPS cross-comparison and evaluation of NDACC lidar validation activities in Europe.

Instruments used for HOPS
Table 3 summarises the systems participating in the HOPS intercomparison. Key aspects of each instrument are noted in its subsection. References to original or most recent instrument descriptions are given for those seeking further details and can also be found in Wing et al. (LAVANDE, 2020).

Original DWD Lidar (HOHO)
The original DWD ozone lidar (HOHO) located at the Hohenpeißenberg Meteorological Observatory (47.8° N, 11.0° E) has been in continuous operation since 1987 and has one of the longest and most complete data records in NDACC. The lidar uses a differential absorption (DIAL) technique which exploits the difference in the ozone absorption cross-sections at two wavelengths.
The first wavelength is the 308 nm primary emission of a Xenon Chloride excimer laser. The light passes through a hydrogen (H2) gas cell where the primary emission is used to stimulate a Raman emission at 353 nm. Both wavelengths are transmitted through a 10X beam expander to reduce the divergence of the laser beam before transmission to the sky. The receiver telescope is a 0.6 m Newtonian mirror. The 353 nm line is weakly absorbed by ozone (also referred to as the non-absorbed line or off-line) and can be used to infer the neutral density of the atmosphere above the aerosol layers present in the lower stratosphere. The shorter 308 nm line is more strongly absorbed by ozone (also referred to as the absorbed line or the on-line) and is used to detect the number of ozone scattering targets in a profile above the lidar. The DIAL technique infers the ozone number density by taking the derivative of the ratio between these two measured profiles (Pelon and Megie, 1982). Lidar temperature profiles are generated using the Rayleigh lidar returns from the 353 nm channel. Relative density profiles can be inferred from the range corrected lidar photon count profile. Using an assumed a priori pressure at the top of the lidar profile, an absolute temperature profile can be calculated based on the relative density gradient. Full details for this technique are found in Hauchecorne and Chanin (1980). Below approximately 27 km both DWD lidars incorporate information from the local meteorological radiosonde in an effort to identify and correct for the possible contamination by stratospheric aerosol layers.
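As an illustration of the DIAL principle described above, the following minimal sketch derives an ozone number density from the vertical derivative of the logarithm of the off-line to on-line signal ratio. It is not the DWD or NASA processing code: the function name, the use of a simple centred difference in place of the operational differential filter, and the neglect of differential Rayleigh and aerosol extinction terms are all simplifying assumptions, and the cross-section value in the usage note is only a placeholder.

```python
import numpy as np


def dial_ozone_number_density(z_m, counts_on, counts_off, delta_sigma_m2):
    """Ozone number density (m^-3) from two background-subtracted lidar signals.

    z_m            : altitude grid in metres (monotonically increasing)
    counts_on      : signal at the strongly absorbed (on-line) wavelength
    counts_off     : signal at the weakly absorbed (off-line) wavelength
    delta_sigma_m2 : on-line minus off-line ozone absorption cross-section (m^2)

    Simplifying assumptions: no differential Rayleigh or aerosol extinction
    correction, and a plain centred finite difference instead of a
    dedicated differential filter.
    """
    z = np.asarray(z_m, dtype=float)
    log_ratio = np.log(np.asarray(counts_off, dtype=float)
                       / np.asarray(counts_on, dtype=float))
    # The factor 2 accounts for the two-way path of the laser light.
    return np.gradient(log_ratio, z) / (2.0 * delta_sigma_m2)


def synthetic_signals(z_m, n_o3, sigma_on, sigma_off, scale=1.0e12):
    """Idealised signals for a constant ozone density (illustration only)."""
    z = np.asarray(z_m, dtype=float)
    on = scale / z**2 * np.exp(-2.0 * sigma_on * n_o3 * z)
    off = scale / z**2 * np.exp(-2.0 * sigma_off * n_o3 * z)
    return on, off
```

For example, with `z = np.arange(15e3, 40e3, 300.0)`, a placeholder density of `3e18` m^-3 and placeholder cross-sections `sigma_on = 1.2e-23`, `sigma_off = 0.0`, calling `dial_ozone_number_density(z, *synthetic_signals(z, 3e18, 1.2e-23, 0.0), 1.2e-23)` recovers the input density at every level. Real signals are noisy, which is why operational algorithms smooth the derivative over an altitude-dependent window.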
Full technical specifications can be found in Steinbrecht et al. (2009) and a comparison of the technical specifications of the original and new DWD ozone lidar can be found in Tab. 4.
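The Rayleigh temperature retrieval outlined for the 353 nm channel can be sketched as a downward integration of hydrostatic equilibrium combined with the ideal gas law. The code below is a simplified illustration in the spirit of Hauchecorne and Chanin (1980), not the operational algorithm: it assumes a constant gravitational acceleration and mean molar mass, seeds the integration with an a priori temperature at the top level (equivalent to an a priori pressure for a given relative density), and uses hypothetical function and variable names.

```python
import numpy as np

R_GAS = 8.314462618   # molar gas constant, J mol^-1 K^-1
M_AIR = 0.0289644     # mean molar mass of dry air, kg mol^-1
G0 = 9.80665          # gravitational acceleration, m s^-2 (held constant here)


def rayleigh_temperature(z_m, relative_density, t_top_k, g=G0):
    """Temperature profile (K) from a relative density profile.

    z_m              : altitude grid in metres, increasing
    relative_density : quantity proportional to air density (for example the
                       range-corrected, background-subtracted photon counts)
    t_top_k          : assumed a priori temperature at the highest level
    """
    z = np.asarray(z_m, dtype=float)
    rho = np.asarray(relative_density, dtype=float)

    # Seed pressure at the top, in the same relative units as the density.
    p_top = rho[-1] * R_GAS * t_top_k / M_AIR

    # Weight of each layer between adjacent levels (trapezoidal rule).
    layer_weight = 0.5 * (rho[:-1] + rho[1:]) * g * np.diff(z)

    # Pressure at each level = seed pressure + weight of all layers above it.
    pressure = np.empty_like(rho)
    pressure[-1] = p_top
    pressure[:-1] = p_top + np.cumsum(layer_weight[::-1])[::-1]

    # Ideal gas law; the unknown density normalisation cancels in the ratio.
    return M_AIR * pressure / (R_GAS * rho)
```

Because the seed pressure becomes a progressively smaller fraction of the total pressure as the integration moves downward, an error in the a priori value decays rapidly with decreasing altitude. This is one reason why the useful temperature range of a lidar sits somewhat below the top of its signal.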
The data processing for the HOHO lidar is as described in Steinbrecht et al. (2009). Lidar return signals are corrected for photon counter dead-time effects, the background is subtracted, and the signals are averaged over the night. After correction, the high gain and attenuated low gain signals are merged. Typically, the high gain signal is useful down to about 20 km and the low gain signal continues down to about 10 km. From the combined signals, temperature and ozone profiles are derived. Ozone profiles typically extend down to 10 or 15 km, depending on the night, while pure lidar temperature profiles are calculated down to 28 km (where aerosol becomes important and biases the retrieved temperature). The HOHO ozone algorithm uses a very wide differential filter in the ozone calculation (Godin et al., 1999; Steinbrecht et al., 2009). There is a resulting bias from the differential filter, which is substantial near 35 km and is corrected (see Steinbrecht et al. (2009)). Corrections for signal-induced noise and timing delay are also applied in the ozone processing code as required.
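The first two processing steps named above, dead-time correction and background subtraction, can be illustrated with a short sketch. The text does not state which dead-time model the DWD code uses, so the non-paralyzable counter model below is an assumption, as are the function names and the choice of estimating the background as the mean signal above a user-chosen altitude.

```python
import numpy as np


def correct_dead_time(observed_rate_hz, dead_time_s):
    """Recover the true count rate assuming a non-paralyzable counter.

    For such a counter, observed = true / (1 + true * tau), which inverts to
    true = observed / (1 - observed * tau).
    """
    rate = np.asarray(observed_rate_hz, dtype=float)
    return rate / (1.0 - rate * dead_time_s)


def subtract_background(rate_hz, z_m, background_floor_m):
    """Subtract the mean signal measured above `background_floor_m`.

    The background altitude range is an illustrative choice: it should lie
    high enough that no atmospheric return remains in the signal.
    """
    rate = np.asarray(rate_hz, dtype=float)
    z = np.asarray(z_m, dtype=float)
    background = rate[z >= background_floor_m].mean()
    return rate - background
```

As a worked example under these assumptions, a true rate of 10 MHz seen by a counter with a 10 ns dead time is recorded as about 9.09 MHz, and `correct_dead_time(1e7 / 1.1, 10e-9)` returns the original 10 MHz.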

New DWD Lidar (HOH)
The newly upgraded DWD lidar also exploits the DIAL technique for measuring ozone. The key difference in the new system is the use of two lasers to generate the weakly and strongly absorbed lines in place of a Raman gas cell. The weakly absorbed line is generated at 355 nm from the frequency tripled output of an Nd:YAG laser and the second wavelength at 308 nm is produced using an excimer gas laser. In addition to using two dedicated high powered lasers to produce the lidar emissions, the new HOH lidar employs a 1 m receiver telescope, dedicated high and low gain channels at both 355 nm and 308 nm to improve the dynamic range of the lidar measurements, Raman channels at 332 nm and 387 nm, as well as new fast response PMTs. A full list and comparison of the technical specifications of the HOH system can be found in Tab. 4. A secondary objective for this paper is to characterise the measurement bias and uncertainty budget of the new HOH with respect to the HOHO to ensure continuity and consistency in the Hohenpeißenberg NDACC data record.
Data processing for the new HOH lidar is essentially the same as for the HOHO lidar. The vertical resolutions of the derived ozone and temperature profiles (and the differential filter for ozone) are the same for HOH and HOHO. The different instrumental parameters (faster counters, better timing, etc.) are accounted for in the processing. Due to the much better return signals, merging between low and high gain returns occurs at higher altitude, around 25 to 30 km. Precision of the measured ozone and temperature profiles is also better than for the HOHO lidar. Ozone profiles from the new HOH lidar usually cover the altitude range from 15 to 50 km (10 to 45 km for the old HOHO). Temperature profiles cover 28 to 80 km (28 to 65 km for the old lidar).

NASA Stratospheric Ozone Lidar (NASA STROZ)
NASA's Goddard Space Flight Center Stratospheric Ozone Lidar (NASA STROZ) is the mobile NDACC validation lidar for temperature and ozone measurements. This mobile lidar system is shipped across the world and used to run intercomparison and validation campaigns for lidar stations within the NDACC network. The NASA STROZ is a DIAL system similar to the HOH, relying on an on-line wavelength of 308 nm and an off-line wavelength of 355 nm generated by two separate lasers. The system also has two Raman channels at 332 nm and 407 nm for tropospheric measurements. The system was constructed in

Radiosondes and Brewer-Mast ozonesondes (BM)
Brewer-Mast ozonesondes (BM) manufactured by Mast Keystone Co. consist of a single electrochemical cell with a silver anode and platinum cathode which are immersed in a potassium iodide (KI) solution (Solar and SIMS, 2014). The ozonesondes are attached to Vaisala RS92-SGP radiosondes and were launched approximately every two nights during the campaign. A total of five in-situ ozone measurements were made to compare with ten nightly average lidar profiles. Brewer-Mast ozonesonde uncertainty estimates are given as ±(3-5)% by Stübi et al. (2008); however, we found this estimate to be too conservative and have adopted the uncertainty estimates for ECCs of ±(2.5-10)% given by Tarasick et al. (2016).
In addition to the BMs, we have also used the Vaisala RS41-SGP meteorological radiosondes launched from the nearby station at München.

Microwave Limb Sounder (MLS)
The Microwave Limb Sounder (MLS) uses a spectrometer to make limb measurements of thermal microwave radiation of the atmosphere. The instrument, aboard the Aura satellite, allows for the retrieval of stratospheric ozone profiles with a vertical resolution of about 3 km. Measurements of stratospheric temperature profiles are also made with a typical vertical resolution of 8 km at 30 km altitude, 9 km at 45 km altitude, and 14 km at 80 km (full width at half maximum (FWHM) of the averaging kernels, Schwartz et al., 2008). MLS profiles of temperature, geopotential height and ozone were extracted from the Version 4.0 MLS dataset. A more complete description of the instrument is given in Waters et al. (2006). For the HOPS campaign, the geopotential altitude is converted to a geometric altitude and re-gridded to allow for a direct comparison with the lidars and sondes.
Ozone and temperature measurements from the Sounding of the Atmosphere using Broadband Emission Radiometry (SABER) instrument were downloaded from 15 to 100 km. The vertical resolution for SABER temperature profiles is approximately 2 km and the estimated accuracy is 1 to 2 K between 15 and 60 km, which decreases to 5 K near 85 km and to 10 K near 100 km (Rezac et al., 2015a, b). Precision estimates for SABER ozone profiles are reported as 1% between 40 and 50 km altitude, decreasing to 2% between 30 and 55 km and 10% near 80 km (Rong et al., 2009). A more complete description of the instrument is given in Mertens et al. (2001). SABER profiles of temperature, geopotential height and ozone were extracted from the Version 2.0 SABER dataset.
(Flynn et al., 2006, 2014). The given vertical resolution for ozone profiles in the stratosphere is 1 km.
The estimated uncertainty on the visible OMPS-LP ozone profile is given as a function of altitude and ranges from approximately 40% near 10 km, to 15% at 20 km, to roughly 3 to 5% in the rest of the stratosphere. The estimated uncertainty of the UV channel is approximately 4% at 25 km, drops to 2.5% at 35 km, and is less than 2% up to 60 km (Loughman et al., 2005; Zawada et al., 2018). In this study we use version 2.5 OMPS-LP ozone profiles described in Kramarova et al. (2018).
OMPS ozone profiles were not included in the LAVANDE study as at the time the authors considered the temporal offset too large. In HOPS we are making a first attempt at using a solar limb scanning satellite to validate night time lidar measurements.
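The conversion from geopotential to geometric altitude and the re-gridding step mentioned for the satellite profiles can be sketched as follows. The exact formula and interpolation scheme used in the study are not given in the text, so the snippet assumes the simple spherical-Earth relation with a fixed effective radius (the value used in standard atmosphere definitions) and linear interpolation; both are illustrative choices, and the function names are hypothetical.

```python
import numpy as np

# Effective Earth radius used in standard atmosphere definitions (metres).
# A latitude-dependent radius and gravity would be more accurate.
EARTH_RADIUS_M = 6_356_766.0


def geopotential_to_geometric(h_m, radius_m=EARTH_RADIUS_M):
    """Convert geopotential height h to geometric altitude z = R h / (R - h)."""
    h = np.asarray(h_m, dtype=float)
    return radius_m * h / (radius_m - h)


def regrid_profile(z_source_m, values, z_target_m):
    """Linearly interpolate a profile onto a target altitude grid.

    Levels of the target grid outside the source range are set to NaN so
    that no extrapolated values enter a later comparison.
    """
    return np.interp(z_target_m, z_source_m, values, left=np.nan, right=np.nan)
```

With these assumptions a geopotential height of 50 km corresponds to a geometric altitude of roughly 50.4 km, so the correction is small in the stratosphere but not negligible when comparing profiles with steep vertical gradients.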
2.2.6 Co-locating satellite profiles and ground-based profiles.

For HOPS, we considered all satellite profiles with a tangent point within ±5° latitude and ±15° longitude of the Hohenpeißenberg Meteorological Observatory (47.8° N, 11.0° E), and within ±6 hours of 00 UTC (1 hour after local midnight for the lidar measurement nights) for SABER, ±99 minutes of 01:40 UTC for MLS, and ±101 minutes of 11:50 UTC the following day for OMPS. This fairly large coincidence box is depicted in Fig. 1. It covers most of central Europe, from Wales in the northwest to Bulgaria in the southeast. The box size chosen here is similar to the compromise chosen in Wing et al. (2018b) and relates to the trade off between a small number of close overpasses and a larger number of overpasses which may be further away from the ground station. For HOPS there are typically between 10 and 20 coincident profiles for each of the satellites, which are generally divided between one or two satellite overpasses, for a given night (the following morning for OMPS).
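The coincidence criteria above translate directly into a small selection routine. The sketch below is an illustration rather than the code used in the study: the function and dictionary names are hypothetical, and it interprets the time windows as offsets from the 00 UTC instant that falls within the lidar night, so that the OMPS window centred on 11:50 UTC lies on the morning after the measurement.

```python
from datetime import datetime, timedelta, timezone

STATION_LAT_DEG = 47.8
STATION_LON_DEG = 11.0
MAX_DLAT_DEG = 5.0
MAX_DLON_DEG = 15.0

# Window centre (offset from 00 UTC of the lidar night) and half-width.
TIME_WINDOWS = {
    "SABER": (timedelta(0), timedelta(hours=6)),
    "MLS": (timedelta(hours=1, minutes=40), timedelta(minutes=99)),
    "OMPS": (timedelta(hours=11, minutes=50), timedelta(minutes=101)),
}


def is_coincident(instrument, lat_deg, lon_deg, time_utc, night_midnight_utc):
    """Return True if a satellite profile satisfies the co-location criteria.

    instrument         : "SABER", "MLS" or "OMPS"
    lat_deg, lon_deg   : tangent point of the satellite profile
    time_utc           : timezone-aware time of the satellite profile
    night_midnight_utc : timezone-aware 00 UTC instant within the lidar night
    """
    in_box = (abs(lat_deg - STATION_LAT_DEG) <= MAX_DLAT_DEG
              and abs(lon_deg - STATION_LON_DEG) <= MAX_DLON_DEG)
    centre_offset, half_width = TIME_WINDOWS[instrument]
    centre = night_midnight_utc + centre_offset
    in_window = abs(time_utc - centre) <= half_width
    return in_box and in_window
```

For the night of 6 to 7 April 2019, `night_midnight_utc` would be `datetime(2019, 4, 7, tzinfo=timezone.utc)`; an MLS profile at 50° N, 20° E taken at 01:30 UTC is then accepted, whereas the same profile at 27° E falls outside the longitude limit.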

DWD NDACC Lidar Upgrades and Example Data
The HOPS campaign took place in two parts: the first period covered the nights of 21 and 22 October 2018, and the second period covered nights in 2019, from 21 March to 6 April. Table 2 shows which systems provided ozone and/or temperature profiles on each of the different nights of the campaign. Table 3 shows the details of the altitude range and important wavelengths for each instrument when making measurements of temperature and ozone.

Evaluation of the New HOH Lidar
The HOPS campaign provided a perfect opportunity to conduct an evaluation of the newly installed HOH lidar. The HOH lidar ran concurrently with the NASA-STROZ mobile validation lidar as well as the original HOHO lidar. This crucial overlap period allows us to conduct a formal NDACC evaluation of the Hohenpeißenberg lidars and ensure that there are no unexplained biases or problems which could go on to cause discontinuities in one of the longest running NDACC datasets. Table 4 shows an in-depth comparison of the technical specifications for the HOH, HOHO, and NASA lidars.
Figure 2 shows a comparison of the nightly average deadtime corrected photon count rates for the high gain channels of the HOH and HOHO lidars as well as the ratio of the high and low gain channels for both systems. The left hand panel shows the photon count rates (PCR) for the high gain channels in both lidars at 308 nm and 355 nm. The signal in the 308 nm channel for the HOH lidar (red) is 74x larger than the signal in the 308 nm channel of HOHO (blue). Similarly, the high gain 355 nm channel of HOH (green) has 224x more signal than the 353 nm high gain channel in HOHO (magenta). The signal improvements in the low gain channels at both wavelengths are not indicative of the general increase in system performance as there are neutral density filters placed in front of the photomultipliers to attenuate the signals. The increased SNR of the HOH system with respect to the HOHO system results in ozone profiles with less statistical uncertainty (discussed later in Sect. 6.1), and the large improvement (a factor of 224) in the high gain 355 nm channel will allow for Rayleigh temperature profiles to routinely reach the Upper Mesosphere and Lower Thermosphere (UMLT).
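The link between the quoted signal gains and the statistical uncertainty can be made explicit with a short calculation. This is only a back-of-the-envelope sketch under assumptions that are not stated in the text: the counts are treated as shot-noise limited (Poisson statistics), the background is neglected, and the integration time and vertical resolution are taken to be identical for both systems. The function name is hypothetical.

```python
import math


def shot_noise_improvement(signal_gain):
    """Factor by which the relative statistical uncertainty shrinks.

    For Poisson-distributed counts N, the relative uncertainty scales as
    1 / sqrt(N), so a gain g in signal reduces it by sqrt(g).
    """
    if signal_gain <= 0:
        raise ValueError("signal_gain must be positive")
    return math.sqrt(signal_gain)


if __name__ == "__main__":
    for channel, gain in (("308 nm high gain", 74), ("355 nm high gain", 224)):
        factor = shot_noise_improvement(gain)
        print(f"{channel}: {gain}x signal, about {factor:.1f}x smaller relative uncertainty")
```

Under these assumptions the 74x and 224x gains correspond to statistical uncertainties that are smaller by factors of roughly 8.6 and 15, respectively. The realised improvement depends on background levels and on the altitude-dependent filtering applied in the retrieval.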
The right hand panel of Fig. 2 shows that there is significant improvement in the SNR of the high gain 308 nm channel (dark red) above 55 km. The high gain channel at 355 nm (dark blue) is linear over the entire altitude range. The ratios between the low gain channels for both 308 nm (dark green) and 355 nm (dark purple) have small slopes, which indicates very slight offsets in the slopes of the PCR profiles. It is recommended that either the attenuation of the low gain channels be reduced or that the high gain channels be truncated at a lower altitude to provide a greater overlap region between the high and low gain channels where both have high SNR.

In the crucial range between 30 and 50 km for ozone and 30 and 70 km for temperature the HOH channels are linear with respect to their HOHO counterparts and do not appear to exhibit altitude dependent biases or photomultiplier saturation effects. This is an important result to document with regard to the long-term stability of the NDACC lidar temperature and ozone dataset at the Hohenpeißenberg Meteorological Observatory.
The temperatures shown in the left hand panel of Fig. 3 were measured on 21 October 2018 and all follow the expected profile for the middle atmosphere. There is very close agreement between 30 and 50 km in the stratosphere with slightly more variation below 20 km and in the mesosphere. A closer examination of the temperature differences of each instrument with respect to the HOH lidar is shown in the right hand panel of Fig. 3. To calculate the differences, all measurements were adapted to a standard 1 km grid. Below 20 km it is expected that geophysical differences in the sampled air masses will have a larger impact on the differences with the stationary lidars than in the middle atmosphere. Additionally, the uncertainty of the satellite measurements at low altitudes, advection of the balloon sondes, and possible contamination of the lidar signal by aerosols may all contribute to the observed differences. Above the stratopause, located near 50 km, there is again a greater chance that geophysical variability is contributing to the observed lidar-satellite differences. Above 60 km the signal to noise ratio of the HOHO lidar (cyan) becomes the largest contributor to the observed differences.
Similarly, an example of nightly average ozone profiles for the night of 6 April 2019 (7 April 2019 for OMPS) is given in the left hand panel of Fig. 4. We can see that all instruments accurately reproduce the shape of the stratospheric ozone layer and also identify an ozone lamina near 13 km. The HOH and HOHO lidars report ozone profiles for altitudes greater than 15 km. In the right hand panel of Fig. 4 all profiles were adapted to a common 300 m grid and compared with the ozone profile measured by the HOH lidar. Below 25 km there is very good agreement between all instruments with differences of generally less than 10%. The lowest couple of data points for the HOH lidar near 15 km may underestimate the ozone number density on this particular night. SABER ozone was cut at 20 km as below this point the profile number densities became unrealistically large. Above 28 km there is increased variability (expressed as percent difference) between the lidars and the satellites which is likely a function of low ozone number densities and geophysical variability. Above 40 km, the percent difference between the different measurements is not a useful metric as small absolute differences in the ozone number density can translate to very large percent differences. We will provide various other metrics later in the article when we discuss the systematic bias of ozone measurements in this region.
The ozone profiles of each instrument were integrated to 2 km resolution before being plotted. The top panel, which shows the ozone number densities at 40 km, indicates that in 2019 (last 8 nights) there was tight clustering of all the measurements while in 2018 (first two nights) there was slightly more variation. In the second panel, which shows ozone densities at 30 km, we see that there is again tight clustering for all instruments except OMPS. This is to be expected for OMPS given that there is a large temporal offset (data taken the morning after the lidar measurements) and that the visible channel at 602 nm which we used for this study only extends to 35 km. The ozonesonde on 31 March appears to be an outlier, and it is likely that well-known pump problems at low pressures are the cause. The third panel at 20 km also shows very tight clustering between all instruments with a slight high bias beginning to be seen in SABER data. The bottom panel at 15 km shows a higher level of inter-measurement variability between the lidars and satellites as the geophysical variability and sampling uncertainty become evident. SABER clearly shows a high bias with respect to other instruments at this altitude.
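The profile comparison described for Fig. 4, and the caveat about percent differences at low ozone densities, can be illustrated with a minimal sketch. It assumes linear interpolation onto the common grid and the sign convention (other minus HOH) divided by HOH; neither choice is specified in the text, and the function name is hypothetical.

```python
import numpy as np


def percent_difference_on_grid(z_ref_m, ref, z_other_m, other, grid_m):
    """Percent difference of `other` relative to `ref` on a common grid.

    Both profiles are linearly interpolated onto `grid_m`. Levels that fall
    outside either profile are returned as NaN rather than extrapolated.
    """
    ref_g = np.interp(grid_m, z_ref_m, ref, left=np.nan, right=np.nan)
    other_g = np.interp(grid_m, z_other_m, other, left=np.nan, right=np.nan)
    return 100.0 * (other_g - ref_g) / ref_g
```

With illustrative numbers, an absolute difference of 2e11 cm^-3 on a reference of 4e12 cm^-3 is a 5% difference, whereas a much smaller absolute difference of 5e9 cm^-3 on a reference of 1e10 cm^-3 is already 50%. This is why the percent difference becomes a poor metric where ozone number densities are low.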
A more systematic look at the ensemble ozone number density differences between the HOH lidar measurements and the measurements made by each of the other instruments is shown in Fig. 6. The darkened line represents the mean difference for each pair of measurements and the shaded region is the 2σ (95% confidence level) limit. The best agreement between the different ozone systems is found between 20 and 40 km altitude where differences are generally less than ±10%. The larger deviation in the OMPS VIS profile (mustard) above 28 km is likely an indication that we should rely on the OMPS UV channel (burnt orange) above this height, while the sharp decrease in the Brewer-Mast profile (green) above 30 km arises from errors in the BM pump corrections at low pressure. Below 20 km there are larger differences between the satellites and the lidars (and sondes). For MLS (violet) and OMPS VIS (mustard) this is likely due to geophysical differences in the sampled air masses while for SABER (magenta) there is a definite bias in the data. In general, the results shown in Fig. 8 are similar to the results shown in Figure 7 of the LAVANDE study and to previous NDACC intercomparisons. Above 40 km there is an unexplained low bias of approximately 25% in the OMPS UV channel (burnt orange) and 35% in the NASA-STROZ ozone densities with respect to all other measurements. Some of the bias at this altitude may be the same as the documented low bias of 8-25% in OMPS UV ozone with respect to co-located profiles from OSIRIS, MLS, and ACE (Kramarova et al., 2018).
Examining the correlation between each of the instruments in the HOPS campaign and the HOH lidar adds another facet to our understanding of the intercomparison. By examining the 'goodness' of the match as a function of altitude (shown in Fig. 8) we can examine the difference between measurements while taking into account the statistical scatter as well as any co-variances. Unsurprisingly, the best correlation with the HOH lidar is the HOHO lidar (cyan) with correlation coefficients greater than 0.95 below 35 km. Above this altitude, the drop in the signal to noise ratio (SNR) of the HOHO lidar contributes to
There are two possible explanations for why the OMPS correlation is smaller than that of MLS. First, there is a significant time offset between the nightly lidar measurements and the OMPS overpass, which happens on the morning after. Second, as will be discussed with Fig. 14, the visible channel of OMPS has a very large estimated uncertainty below 20 km.
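The ensemble statistics discussed in this section (mean difference, 2σ envelope, and correlation as a function of altitude) can be sketched as below. The text does not specify the estimators, so the snippet assumes that the 2σ limit is twice the sample standard deviation of the nightly percent differences and that the correlation is the Pearson coefficient computed across nights at each altitude. The function names and the minimum of three paired nights are illustrative choices.

```python
import numpy as np


def ensemble_difference(ref, other):
    """Mean percent difference and 2-sigma spread at each altitude.

    ref, other : arrays of shape (n_nights, n_levels) on a common grid,
                 with NaN where an instrument has no data.
    """
    diff = 100.0 * (other - ref) / ref
    mean = np.nanmean(diff, axis=0)
    two_sigma = 2.0 * np.nanstd(diff, axis=0, ddof=1)
    return mean, two_sigma


def correlation_profile(ref, other, min_nights=3):
    """Pearson correlation across nights, computed level by level."""
    n_levels = ref.shape[1]
    r = np.full(n_levels, np.nan)
    for k in range(n_levels):
        # Use only nights where both instruments report a value.
        ok = np.isfinite(ref[:, k]) & np.isfinite(other[:, k])
        if ok.sum() >= min_nights:
            r[k] = np.corrcoef(ref[ok, k], other[ok, k])[0, 1]
    return r
```

With only ten nights in the campaign, both the spread and the correlation coefficient carry substantial sampling uncertainty, so small differences between instruments in either quantity should be interpreted with care.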

Intercomparison Results for Temperature
We have conducted a similar analysis for HOPS temperature measurements as was done in the previous section for ozone. Figure 9 shows the temperature time series at four altitudes for each of the different systems during HOPS.

The top panel of Fig. 9 traces temperatures in the mesosphere at 70 km. At these altitudes we can see the contribution that larger temperature uncertainties in the lidars (particularly HOHO (cyan)) introduce to the time series. In general, all three lidars (NASA, HOH, and HOHO) report higher temperatures than the satellites MLS and SABER. This result is consistent with LAVANDE as well as other European lidar-satellite comparisons (Wing et al., 2018, b).
The second panel of Fig. 9 traces temperatures at 50 km, near the altitude associated with the stratopause. With the exception of the 21st of March 2019, where the lidars and satellites produce very different measured temperatures, the lidars and satellites generally produce similar temperatures, with the satellites being 5 K cooler than the lidars.
The third panel of Fig. 9 traces temperatures in the lower stratosphere and shows the best agreement between all measurements. We expect that the temperature at these altitudes would show very little variability due to the high SNR in all instruments as well as the low geophysical variability in lower stratospheric temperatures on hourly timescales. SABER (magenta) appears

The bottom panel of Fig. 9 traces temperatures at 10 km in the UTLS. At this altitude there are fewer measurements, more geophysical variability associated with passing weather fronts, variability associated with the advection of balloon measurements, and possible bias introduced into the lidar temperatures from aerosol contamination. Despite the increased variability, we can see that the NASA lidar (red), SABER (magenta) and MLS (violet) generally measure colder temperatures than the meteorological radiosonde from the München station (black) and the locally launched Brewer-Mast sonde (green). It is interesting to note that the two balloon sonde measurements agree very well despite the 65 km separation between Hohenpeißenberg and München.
The average temperature difference between all instruments participating in the HOPS campaign and the NASA lidar is given in Fig. 10. The differences between the temperatures produced by the three lidars are less than ±5 K from 15 to 80 km.
The temperature differences between HOH and NASA (red) are only significant below 18 km and above 78 km. The NASA temperatures appear to have a slight cold bias below 30 km, which is consistent with the results from LAVANDE described in the introduction (Sect. 1.2). MLS (violet) becomes significantly different from the other measurements above 55 km and exhibits a vertically oscillating temperature bias described in Wing et al. (2018b). SABER (magenta) exhibits a significant warm bias between 15 and 25 km with respect to lidar measurements, which has previously been identified in the SABER temperature assessment paper (Remsberg et al., 2008).

Figure 10. Average absolute differences with respect to the NASA temperature profile measured during the HOPS campaign. The shaded range gives ±2 standard deviations of the mean, and indicates statistical uncertainty at the 95% confidence level.

Figure 11 shows the scatter between nightly temperature comparisons during HOPS in three panels. The left hand panel shows the differences between temperatures from each instrument and the HOH lidar temperatures in the UTLS from 10 to 35 km. The scatter shows fairly close agreement to the black 1:1 reference line, particularly for the in-situ temperatures from the sondes, with NASA (red) exhibiting slightly colder temperatures and the satellites MLS (violet) and SABER (magenta) having slightly warmer temperatures. The centre panel shows the scatter from 35 to 60 km in the upper stratosphere and stratopause region. The temperatures from NASA fall very closely along the reference line; however, the temperatures from HOHO (cyan) exhibit more variation associated with the drop in SNR in that system. The satellites have a very high level of temperature variation but appear to be centred about the reference line. In the right hand panel the temperature scatter from 60 to 90 km is shown. The temperature variance is largest at these altitudes; however, despite the increased scatter we can see the systematic cool bias of the satellites and warm bias of NASA with respect to HOH.

be due to the combination of HOH lidar data with the radiosonde mentioned in Sect. 2.2, differences in the overlap correction between the two lidar systems, the use of Raman temperature channels in the NASA-STROZ lidar, or geophysical sampling problems arising from a few nights where the DWD lidars measured longer than the NASA system. Disentangling the source of the disagreement is beyond the scope of a 'blind intercomparison' and would require each team to reprocess their data. In Sect. 6 we will discuss the disagreement between the observed differences between NASA and HOH and the differences that we should expect given the reported uncertainty budgets of each system. From 35 to 58 km the correlation between NASA and HOH temperatures is very high at nearly 0.99. Above 60 km the statistical variation and the differences in filtering could

When comparing the published uncertainty estimates for lidars, sondes, and satellites during an intercomparison it is not sufficient to rely simply on the reported instrument precision.
Some instruments report full uncertainty budgets, others average accuracy, and others single profile precision. To make the comparison fair we have taken the average of the total nightly uncertainty for each instrument and normalised it with respect to the nightly average measurement to arrive at a plot estimating the average relative uncertainty as a function of altitude during HOPS. This follows the same method used in LAVANDE.
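The normalisation described above can be sketched as follows. This is a minimal illustration under one reading of the text: the nightly total uncertainties are averaged over the campaign, the nightly measurements are averaged separately, and the ratio is expressed in percent at each altitude. Averaging the per-night ratios instead would be an equally plausible reading. The function name and array layout are assumptions.

```python
import numpy as np


def average_relative_uncertainty(values, uncertainties):
    """Campaign-average relative uncertainty (%) as a function of altitude.

    values, uncertainties: arrays of shape (n_nights, n_altitudes) holding
    the nightly mean profiles and their reported total uncertainties on a
    common altitude grid (assumption); NaN marks missing values.
    """
    mean_uncertainty = np.nanmean(uncertainties, axis=0)
    mean_value = np.nanmean(values, axis=0)
    return 100.0 * mean_uncertainty / np.abs(mean_value)
```

For example, a mean temperature of 210 K with a mean reported uncertainty of 2.1 K gives a relative uncertainty of 1%.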

For the lidars the largest terms in the uncertainty budgets are the statistical uncertainties arising from the Poisson counting statistics for photon detection, which become large at higher altitudes. Several other smaller corrections with respect to atmospheric scattering and transmission, instrument corrections, and algorithm initialization (temperature only) are also included in the formal 'NDACC' uncertainty budget described in detail in Leblanc et al. (2016a, b, c). In this blind intercomparison we take the total uncertainty or 'NDACC' uncertainty reported by each group. In Fig. 13 we can see that the averages of the nightly relative uncertainty for temperature in the NASA (red), HOH (blue), and HOHO (cyan) lidars are typically less than 1% over most of the measurement range. The HOHO lidar, which has a less powerful laser output at 353 nm (refer to the introduction, Sect. 3.1, and the discussion of Fig. 2), reaches 1% relative uncertainty at 45 km, a much lower altitude than the HOH and NASA lidars. The sudden drop in the relative uncertainty in all three lidar profiles near 25 km is associated with the transition from the low gain lidar channels to the high gain lidar channels. The nightly uncertainty profiles for MLS (violet) and SABER (magenta) were downloaded directly with the temperature profiles. Temperature uncertainties for the Vaisala RS41-SGP radiosonde used at the München radiosonde station and for the Vaisala RS92-SGP attached to the Brewer-Mast launched at Hohenpeißenberg are given as 0.15 K below 100 hPa and 0.3 K above 100 hPa. We have not included these values in Fig. 13 as they are too small to be clearly distinguishable.
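The growth of the Poisson counting term with altitude can be illustrated with a short sketch. It applies only to raw photon counts in a single altitude bin, not to the retrieved temperature, and it assumes that the background count is known exactly, which is a simplification relative to the full NDACC budget of Leblanc et al. (2016a, b, c). The function name is hypothetical.

```python
import numpy as np


def poisson_relative_uncertainty(total_counts, background_counts=0.0):
    """Relative 1-sigma uncertainty (%) of background-subtracted counts.

    Assumes Poisson statistics for the total counts in each altitude bin
    and a background level that is known exactly (simplification).
    """
    total = np.asarray(total_counts, dtype=float)
    signal = total - background_counts
    return 100.0 * np.sqrt(total) / signal
```

Because the returned signal falls off rapidly with range, a bin that collects 10 000 photons has a relative uncertainty of 1%, while a higher bin that collects only 100 photons has 10%.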
The major term in the uncertainty budget for the lidar ozone measurements comes from the Poisson photon counting uncertainty. A full and detailed propagation of uncertainty through the lidar equation is given in Godin et al. (1999). In Fig. 14 we see different behaviours in the relative uncertainty of the NASA (red) lidar and the uncertainties of the HOH (blue) and HOHO (cyan) lidars. The peak in relative uncertainty between 25 and 30 km in both DWD lidars is due to the transition between the low gain and high gain lidar channels. It is recommended that the DWD lidars merge their high and low gain channels at a lower altitude to suppress the uncertainty peak in this range. The ozone uncertainty in the Brewer-Mast is given simply as ±3-5%. This flat uncertainty profile does not capture the observed variance between the Brewer-Mast measurement and the lidars, which is discussed in the next section. We have chosen to include an uncertainty profile estimated for the ECC (green) by Tarasick et al. (2016), which presents a more realistic uncertainty profile for a similar instrument. The relative uncertainty profiles for MLS (violet) and SABER (magenta) were calculated using the uncertainty information included in the downloaded data files. MLS has low relative uncertainty throughout most of the stratosphere, averaging 2 to 3%. The uncertainty rapidly increases at low ozone densities below 20 km and above 45 km. SABER ozone uncertainty appears unrealistic above 35 km and increases rapidly below 30 km.

We have endeavoured to estimate accurate profiles of OMPS relative uncertainty for both the visible channel and the UV channel for the HOPS campaign. Using the 1σ measurement uncertainty estimates found in Loughman et al. (2005), Zawada et al. (2018), and Kramarova et al. (2018) we have calculated the nightly relative uncertainty profiles for the OMPS visible (mustard) and UV (burnt orange) channels. We have doubled the 1σ values and then averaged the nightly relative uncertainty profile for HOPS to generate an uncertainty profile which is consistent with the other participating measurements at 2σ. This value is approximately 3% for the OMPS visible channel between 20 and 30 km. The relative uncertainty rises drastically below 18 km and increases slightly above 32 km. Likewise, the relative uncertainty profile for the UV channel of OMPS uses the reported 1σ precision and accuracy estimates by Loughman et al. (2005) of between 1 and 3% to calculate the relative uncertainty profile in Fig. 14 (burnt orange). The rapid increase in relative uncertainty seen in the OMPS UV channel above 35 km results from the rapid decrease in ozone number density and the possible low bias of OMPS UV data seen in Fig. 6.

Assessment of the Uncertainties Reported by the Instruments
Here we conduct an intercomparison of the reported uncertainty budgets for all HOPS lidars for both temperature and ozone.
This exercise is important for establishing that the total uncertainty budgets for NDACC lidars are realistic and in keeping with NDACC guidelines and standards. For lidar-lidar comparisons there is nearly perfect spatio-temporal coincidence and we can neglect geophysical variations in our uncertainty comparison. Here we will use the NASA-STROZ (red) average relative uncertainty profile as the reference. Following the same statistical comparison technique used in the companion Wing et al. (2020) article, we will assume that there is no correlation between the average measurement noise for the lidars. In Fig. 15 the measurement uncertainties of the NASA-STROZ lidar, σ_N (red), the HOH lidar, σ_H (blue), and the HOHO lidar, σ_Ho (cyan), are plotted alongside the combined uncertainty, σ_combined (black), given in Eq. 2, and the relative standard deviation of the measurement differences, σ_RSD (grey), given in Eq. 1. In these equations, N_i describes the NASA measurement, N̄ describes the average NASA measurement, σ_N describes the measurement uncertainty for NASA, X_i, X̄, and σ_X describe the same properties for the HOPS instrument under consideration, and n is the total number of measurements.
If the combined uncertainty estimates expressed in Eq. 2 (black) are correct, they should be similar to the observed standard deviation of all the nightly mean ozone profile differences, σ_RSD (grey), expressed in Eq. 1.
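A minimal sketch of the comparison described above, using the symbols defined in the text. The exact forms of Eqs. 1 and 2 are not reproduced here, so two assumptions are made and should be checked against the companion article: the combined uncertainty is taken as the quadrature sum that follows from uncorrelated measurement noise, and the relative standard deviation is taken as the sample standard deviation of the nightly differences about their mean, normalised by the mean reference (NASA) profile. The function names are hypothetical.

```python
import numpy as np


def combined_relative_uncertainty(sigma_n, sigma_x):
    """Quadrature sum of two relative uncertainties (%).

    Valid only if the noise in the two instruments is uncorrelated,
    which is the assumption stated in the text.
    """
    return np.sqrt(np.asarray(sigma_n) ** 2 + np.asarray(sigma_x) ** 2)


def relative_std_of_differences(n_meas, x_meas):
    """Relative standard deviation (%) of nightly differences per altitude.

    n_meas, x_meas: arrays of shape (n_nights, n_altitudes) holding the
    reference (N_i) and comparison (X_i) profiles on a common grid.
    The normalisation by the mean reference profile is an assumption.
    """
    diff = n_meas - x_meas
    n = diff.shape[0]
    mean_diff = diff.mean(axis=0)  # equals N-bar minus X-bar
    variance = ((diff - mean_diff) ** 2).sum(axis=0) / (n - 1)
    return 100.0 * np.sqrt(variance) / np.abs(n_meas.mean(axis=0))
```

Where `relative_std_of_differences` exceeds `combined_relative_uncertainty` at a given altitude, the reported budgets do not account for all of the observed variance between the two instruments.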
Figure 15 compares the average relative uncertainties for the three lidars participating in the HOPS campaign for both temperature and ozone. In panel a) we see the comparison of the relative temperature uncertainty for the NASA (red) and HOH (blue) lidars. Above 35 km the combined uncertainty budget (black) is dominated by NASA, which has a smaller receiver telescope than the HOH lidar (see Tab. 4), resulting in a reduced photon count rate at higher altitudes. Below 35 km, the HOH lidar has the larger contribution to the combined relative uncertainty budget, arising from increased measurement uncertainty in the low gain 355 nm channel. When comparing the combined estimated relative uncertainty (black) with the observed standard deviation (grey) we see that below 55 km there is variance between the lidar temperature measurements which cannot be explained by the combined uncertainty budget. A nearly identical result was found in the LAVANDE study, with unexplained variance below 55 km between NASA and the OHP temperature lidar, LTA, and between NASA and the OHP stratospheric ozone lidar, LiO3S.
In panel b), the uncertainty of the HOHO lidar (cyan) is the largest contributor to the combined estimated relative uncertainty budget (black). The combined uncertainty accounts for most of the observed variance in the comparison with NASA (grey) except for a discrepancy between 25 and 35 km. This region appears to be directly above the transition from the low gain to high gain channels in the HOHO lidar, and the estimation of the HOHO uncertainty in this region may not be complete. Taken together with our interpretations of the LAVANDE results and the results shown in Fig. 15a, we begin to see a pattern of increased variability between lidar measurements in the region surrounding the transition between high and low gain channels which is not fully accounted for by the NDACC uncertainty budget.

Panel c) shows the relative uncertainty in ozone for NASA (red), HOH (blue), combined uncertainty (black), and observed variation between measurements (grey). As was previously stated, the differences between the estimated relative uncertainty profiles for the NASA and HOH lidars arise from the transition from the low gain to high gain channels in the DWD lidars.
The observed variance is well represented by the combined uncertainty above 23 km. From 15 to 23 km there is more variation in the data than can be accounted for in the uncertainty estimates of either lidar. One possible explanation for the increased variability below 25 km is sampling time. On a few nights the NASA-STROZ lidar measured for a set number of hours while the DWD lidars measured for the entire night. Given that this is a 'blind intercomparison' we cannot reprocess the data; however, in future NDACC validation exercises we strongly encourage participating PIs to end measurements at the same time or submit partial files to the NDACC referee. Below 25 km there is sufficient geophysical variation that a few hours of extra measurements can change the nightly mean profile.

Uncertainty Evaluation of the satellites
In the LAVANDE companion paper we attempted to separate the measurement uncertainty associated with each profile taken during a satellite overpass, the sampling uncertainty associated with the variation between individual profiles included in the average satellite overpass, and the geophysical variability. It was correctly pointed out that characterisations of sampling uncertainty are not completely independent of geophysical variability. For the HOPS intercomparison of lidar-satellite relative uncertainty estimates we have not attempted to address sampling uncertainty. In all cases where the observed standard deviation of the differences between observations (grey) is larger than the combined NASA-satellite estimated uncertainty budget (black) we will interpret the difference as 'geophysical variability' with the understanding that there is some unknown contribution associated with the accuracy of the satellite measurement.

Uncertainty Evaluation of the balloon sondes
The temperature measurement uncertainties for the NASA lidar (red) and the balloon borne in-situ measurements (green), shown in Fig. 17a, are very similar in magnitude. However, the combined uncertainty (black) is consistently less than the observed standard deviation between the lidar and sonde temperature measurements (grey). We expect that the majority of the observed variation below 20 km is due to the geophysical variability inherent in sampling different air masses, while variability in ozone above 27 km likely arises from problems with the ozonesonde pump or poor pump corrections. Figure 17b shows that the combined ozone uncertainty (black) overestimates the observed standard deviation between 22 and 26 km and severely underestimates the variation at lower altitudes. Recall that we have used the ECC uncertainty reported by Tarasick et al. (2016)

In the introduction we gave the NDACC standard for ozone lidars as having an accuracy better than ±3% between 12 and 35 km and an accuracy of better than ±10% between 35 and 40 km. The accuracy for NDACC temperature lidars was given as agreement better than ±1 K.

b LiO3T is a tropospheric system and has minimal overlap with the stratospheric lidars.

Conclusions
The HOPS intercomparison campaign of the DWD lidars at the Hohenpeißenberg Meteorological Observatory with the travelling standard NDACC reference lidar NASA-STROZ has demonstrated the consistency of the HOH lidar measurements with respect to measurements made using the HOHO lidar. We have confidence in the continued high quality of the Hohenpeißenberg dataset for both temperature and ozone after the installation of a new lidar system.

The intercomparison exercise has confirmed that the original DWD lidar, HOHO, continues to meet NDACC standards for ozone profiles at the 3% level between 16.5 and 43 km and at the 10% level between 10 and 44 km. The HOHO lidar meets the NDACC temperature standards for accuracy at the ±1 K level between 18 and 70 km. The new DWD lidar, HOH, meets the 3% ozone standard between 17 and 41 km, the 10% ozone standard between 15 and 41 km, and the ±1 K temperature standard between 17 and 78 km.
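The text does not state how the altitude ranges above were derived from the difference profiles. One simple criterion, shown here purely as an assumption, is to report the longest contiguous altitude range over which the magnitude of the mean difference with respect to the reference stays within the threshold (3% or 10% for ozone, 1 K for temperature). The function name is hypothetical.

```python
import numpy as np


def widest_range_within(altitude_km, difference, threshold):
    """Longest contiguous altitude range where |difference| <= threshold.

    altitude_km and difference are 1-D arrays on the same grid; NaN values
    in difference are treated as failing the criterion. Returns a tuple
    (bottom_km, top_km), or None if no level meets the criterion.
    """
    ok = np.abs(np.asarray(difference, dtype=float)) <= threshold
    best = None
    start = None
    for i, flag in enumerate(ok):
        if flag and start is None:
            start = i  # a new run of passing levels begins here
        if start is not None and (not flag or i == len(ok) - 1):
            end = i if flag else i - 1  # last passing level of this run
            if best is None or (end - start) > (best[1] - best[0]):
                best = (start, end)
            start = None
    if best is None:
        return None
    return float(altitude_km[best[0]]), float(altitude_km[best[1]])
```

A stricter variant could also require the 2σ envelope, rather than only the mean difference, to lie within the threshold; that choice would shorten the reported ranges.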

The cross-comparison of the NDACC campaigns at Hohenpeißenberg Meteorological Observatory (HOPS) and at Observatoire de Haute Provence (LAVANDE) has provided a unique opportunity to assess potential biases in the NASA-STROZ reference lidar. When cross-compared against the LiO3S, LTA, and HOH lidar temperature profiles, and MLS and SABER satellite temperature profiles, the NASA-STROZ lidar appears to have a warm bias above 60 km. The NASA temperatures have an apparent cold bias below 30 km when cross-compared to all other instruments. These possible biases may arise from algorithm initialisation choices and serve as strong motivation for another NDACC temperature algorithm paper.
When the ozone density profiles are cross-compared for both HOPS and LAVANDE instruments there is a high degree of variability in all of the stratospheric lidars below 20 km. The NASA lidar measures higher ozone densities than the DWD lidars but lower densities than the OHP lidar. At altitudes above 40 km, the NASA lidar and OMPS-LP UV measure lower ozone density than LiO3S, HOH, HOHO, MLS, and SABER.