Quality assessment of Dobson spectrophotometers for ozone column measurements before and after automation at Arosa and Davos

… instrument. One of them is the Arosa station, where both instrument types are run in parallel. Here, an automated version of the Dobson instrument has recently been developed and implemented. In the present paper, we present the results of the analysis of simultaneous measurements from pairs of Dobson instruments that were either collocated at Arosa or Davos, or operated one at each location, for four distinct time periods:

… started, with a view to continuing the world's longest total column ozone series based on Dobson observations in Davos.
The present study focuses on the analysis of Dobson instrument data and follows up two previous analyses of the LKO Brewer triad measurements (Stübi et al., 2017a, b).
The paper is organized as follows: section 2 presents the measurement principles, section 3 describes the data sets and the data quality control procedures applied, section 4 presents the results of the analysis, and section 5 discusses these results.

Dobson spectrophotometer measurements
The principle of the Dobson instrument is described in many publications (Dobson, 1968; Komhyr, 1980; Evans, 2008; Scarnato et al., 2009, 2010; Moeini et al., 2019). The intensity of the sun's UV radiation at ground level is modulated by the amount of ozone in the atmosphere. Sun spectrophotometers of the Dobson and Brewer types measure this intensity at a few specific wavelengths in the range of about 305-340 nm. In the Dobson instrument, sunlight is dispersed by a prism, and two narrow slits select the wavelength pairs commonly referred to as A (305.5 nm / 325.4 nm), C (311.45 nm / 332.4 nm) and D (317.6 nm / 339.8 nm). These pairs are combined to form the double pairs AD and CD, which are used to calculate the ozone column while eliminating atmospheric interferences (Evans, 2008; Basher, 1982). Following the notation of Evans (2008), the ozone column X is retrieved for a single wavelength pair λ as

$$X = \frac{N_\lambda - \left(\beta_\lambda^{s} - \beta_\lambda^{l}\right) m \, \frac{p}{p_0} - \left(\delta_\lambda^{s} - \delta_\lambda^{l}\right) \sec(\mathrm{SZA})}{\left(\alpha_\lambda^{s} - \alpha_\lambda^{l}\right) \mu} \qquad (1)$$

where the superscripts s (l) refer to the short (long) wavelength within each pair, α_λ is the ozone absorption coefficient, β_λ and δ_λ are the Rayleigh and Mie scattering coefficients, respectively, and m and µ are the air masses for Rayleigh scattering and for ozone, respectively. The ratio p/p0 is a correction for the mean station pressure and SZA is the solar zenith angle. The measured N values are the differences between the logarithms of the solar radiation intensity ratios I_0^s/I_0^l at the top of the atmosphere and I^s/I^l at the surface:

$$N_\lambda = \log_{10}\left(\frac{I_0^{s}}{I_0^{l}}\right) - \log_{10}\left(\frac{I^{s}}{I^{l}}\right) \qquad (2)$$

The wavelength dependence of Mie scattering is much smaller than that of ozone absorption and Rayleigh scattering. Therefore, the last term of equation 1 is negligible for the double pairs. In the Brewer instruments, a diffraction grating selects four wavelengths (310.1 nm, 313.5 nm, 316.8 nm, 320.0 nm), which are then combined in a similar way as for the Dobson instrument to retrieve the ozone column (Kerr et al., 1981; Kerr and McElroy, 1995). The Dobson ozone column retrieval algorithm is fairly simple. It assumes similar characteristics for all instruments, based on the optical properties of the primary reference Dobson instrument D 083 (Komhyr et al., 1989). In the past 10 years, the EMRP-ATMOZ project has contributed to an improved understanding of the measurement principle of sun spectrophotometers (ATMOZ, 2018).
https://doi.org/10.5194/amt-2020-441 Preprint. Discussion started: 4 December 2020 © Author(s) 2020. CC BY 4.0 License.
Measurements of the following quantities are now available:
the Dobson slit functions (Köhler et al., 2018);
the ozone cross-sections and their temperature dependence (Bass and Paur, 1985; Serdyuchenko et al., 2014; Malicet et al., 1995; Janssen et al., 2018);
the stray-light effect (Christodoulakis et al., 2015; Karppinen et al., 2015; Moeini et al., 2019);
and the implications of these quantities for the ozone column retrieval of different instruments (Redondas et al., 2014).
Adapting the processing algorithm to these recent findings would certainly improve the absolute accuracy of the ozone observations. However, these findings cannot be applied consistently to the historical records of Dobson measurements. Some essential instrument characteristics (slit functions, wavelengths in use, etc.) are not available for older instruments and data sets.
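The Python sketch below illustrates the retrieval described above. It evaluates the single-pair formula (equation 1) and the corresponding double-pair combination, in which the Mie term is dropped. It is a minimal sketch, not the operational processing. The function names are hypothetical. All coefficient differences (short minus long wavelength) are supplied by the caller, and no real instrument constants are included. The units of X follow from those of α, for example atm-cm if α is given per atm-cm.

```python
import math


def n_value(i0_s: float, i0_l: float, i_s: float, i_l: float) -> float:
    """N value: log10 of the top-of-atmosphere ratio minus log10 of the surface ratio."""
    return math.log10(i0_s / i0_l) - math.log10(i_s / i_l)


def ozone_single_pair(n, d_alpha, d_beta, d_delta, m, mu, p, p0, sza_deg):
    """Ozone column from one wavelength pair.

    d_alpha, d_beta and d_delta are short-minus-long differences of the ozone
    absorption, Rayleigh and Mie coefficients (hypothetical inputs here).
    """
    rayleigh = d_beta * m * p / p0                    # Rayleigh term, pressure-corrected
    mie = d_delta / math.cos(math.radians(sza_deg))   # Mie term, scaled by sec(SZA)
    return (n - rayleigh - mie) / (d_alpha * mu)


def ozone_double_pair(n1, n2, d_alpha1, d_alpha2, d_beta1, d_beta2, m, mu, p, p0):
    """Double pair (e.g. AD = pair A minus pair D); the Mie term is neglected."""
    rayleigh = (d_beta1 - d_beta2) * m * p / p0
    return (n1 - n2 - rayleigh) / ((d_alpha1 - d_alpha2) * mu)


# Round trip with made-up coefficients: build N values for a known column, then retrieve it.
x_true, m, mu, p_ratio = 0.32, 1.9, 1.8, 0.82
n_a = 1.5 * x_true * mu + 0.12 * m * p_ratio + 0.05  # pair "A" (illustrative numbers)
n_d = 0.4 * x_true * mu + 0.05 * m * p_ratio + 0.05  # pair "D", same Mie contribution, so it cancels
x_ad = ozone_double_pair(n_a, n_d, 1.5, 0.4, 0.12, 0.05, m, mu, p_ratio, 1.0)
```

The round trip shows why the double pair is insensitive to Mie scattering: when both pairs carry the same aerosol contribution, it drops out of N_A − N_D.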
The calibration of the Dobson network is organised by the World Meteorological Organization's (WMO) Global Atmosphere Watch program. It is based on the absolute calibration of a primary reference instrument and on six regional secondary traveling standards. These standards transfer the primary reference scale to each individual station (Komhyr et al., 1989). Such calibrations were carried out regularly at LKO, as indicated by the arrows in Figure 3.
MeteoSwiss considered automating the Dobson instruments and relocating them from LKO to the Physikalisch-Meteorologisches Observatorium Davos / World Radiation Center (PMOD/WRC). The aim was to perpetuate the measurements in the long term under optimal conditions. Factors considered in the analysis included the availability of operators for a year-round 24/7 monitoring program, data quality improvements (repeatability, reproducibility, increased frequency of measurements) and a reduction of operational costs through institutional synergies. Great care was taken to avoid a fundamental change of the Dobson measurements and hence to preserve the continuity of the LKO total ozone time series. The technical details of the automated system are described in a separate publication (Stübi et al., 2020). Table 1 lists the dates of the main changes that could potentially affect the measurements of the three LKO Dobson instruments. By the end of 2015, all three Dobson instruments were automated and had reached the same configuration.

Data sets of coincident measurements
The automation of the Dobson spectrophotometers D 062 and D 051 was performed between the intercomparisons of summer … For the present analysis, measurements from a pair of Dobson instruments were defined as coincident if the following criteria were met: time difference δt < 300 s, air mass difference δµ < 0.05 and air mass µ ≤ 4. At LKO, manual operation was facilitated by having the two instruments side by side on a turntable. This resulted in a systematic time difference δt between 45 and 75 s. For the automated operation, the mean δt is close to zero.

… has been treated separately for the single wavelength pairs C, D and A and for the double pairs AD and CD. In a first step, the sunshine duration over 10-min periods is used as additional information: measurements in periods with less than 4 minutes of sunshine are flagged. Then the standard deviation of the 20-second R-dial records (δR) is used as a quality criterion. In the next step, an algorithm based on the successive elimination of bad or doubtful measurements is applied. A 4th-order polynomial function of time is calculated as a proxy for the daily variation. Outliers are flagged one by one, and the polynomial is recalculated after each elimination. This continues until all measurements of a day fulfill empirically determined criteria that depend on the wavelength pair and the instrument (e.g. for D 062, |poly − O3| < 0.8%, < 2.0% and < 1.0% for the C, D and A pairs, respectively). The adopted two-minute measurement cycle helps to identify these outliers, based on the assumption that the total ozone column changes slowly over time. Two consecutive measurements must therefore also agree within a given limit. Once these limits and convergence criteria are established, the flagging is done automatically without human intervention.
However, the measurements of the different instruments are still compared visually in order to detect malfunctions or drifts in an individual Dobson instrument, which are then flagged manually.
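The coincidence criteria and the iterative polynomial flagging described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the operational MeteoSwiss code:
- the function names are hypothetical;
- each measurement of instrument 1 is matched to the nearest-in-time measurement of instrument 2;
- the air mass limit µ ≤ 4 is applied to both instruments;
- the deviation |poly − O3| is taken relative to the polynomial value;
- the sunshine-duration and δR pre-screening steps are omitted.

```python
import numpy as np
from numpy.polynomial import Polynomial


def coincident_pairs(t1, mu1, t2, mu2, max_dt=300.0, max_dmu=0.05, max_mu=4.0):
    """Index pairs (i, j) of coincident measurements of two instruments (times in s)."""
    t2 = np.asarray(t2, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    if t2.size == 0:
        return []
    pairs = []
    for i, (t, mu) in enumerate(zip(t1, mu1)):
        if mu > max_mu:
            continue
        j = int(np.argmin(np.abs(t2 - t)))  # nearest instrument-2 measurement in time
        if abs(t2[j] - t) < max_dt and abs(mu2[j] - mu) < max_dmu and mu2[j] <= max_mu:
            pairs.append((i, j))
    return pairs


def flag_outliers(t, ozone, max_rel_dev, deg=4):
    """Boolean mask of kept points after iteratively removing the worst outlier.

    A degree-`deg` polynomial in time is refitted after each removal until every
    remaining point lies within `max_rel_dev` (relative) of the fit.
    """
    t = np.asarray(t, dtype=float)
    ozone = np.asarray(ozone, dtype=float)
    keep = np.ones(t.size, dtype=bool)
    while keep.sum() > deg + 1:  # need more points than coefficients
        fit = Polynomial.fit(t[keep], ozone[keep], deg)  # domain scaling handled internally
        rel_dev = np.abs(fit(t) - ozone) / fit(t)
        rel_dev[~keep] = -np.inf  # ignore points that are already flagged
        worst = int(np.argmax(rel_dev))
        if rel_dev[worst] < max_rel_dev:
            break
        keep[worst] = False
    return keep


# Synthetic day: smooth ozone curve sampled every 2 min, with one spike at index 50.
t = np.arange(0.0, 8 * 3600.0, 120.0)
o3 = 300.0 + 10.0 * (t / 28800.0) - 5.0 * (t / 28800.0) ** 2
o3[50] += 15.0
mask = flag_outliers(t, o3, max_rel_dev=0.008)  # 0.8%, the C-pair limit quoted for D 062
```

`Polynomial.fit` is used instead of a raw power-basis fit because it rescales time to [-1, 1]. This keeps a 4th-order fit over seconds-of-day numerically well conditioned.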

Results
Table 2 shows the statistics of the observed differences for the different periods of operation of the Dobson instruments.
Since there are 3 instruments and 2 locations, several instrument pairs are possible for a given period, and they are listed separately in Table 2. In the 20-year MMC period, only D 101 and D 062 were used for total ozone measurements. The median difference is 0.14%, with a 2.5%-97.5% interpercentile range (IPR 2.5%−97.5%) slightly below 4%. The two instruments were in very good agreement, with no significant difference. Assuming an average of 250 sunny days per year in Arosa, the 31'129 data points correspond to 6 to 7 coincident observations per day. During the relatively short MAC transition period, the automated D 062 and D 051 were compared to the manually operated D 101.
For the pair D 101/D 062, the results are very similar to the MMC case, but for the pair D 101/D 051 a non-significant bias of ∼0.6% is observed. In the AAC comparison period, the sample sizes are larger by a factor of ∼10, the IPR 2.5%−97.5% is reduced, and no significant differences are found. Finally, for the AAD period, an intermediate IPR 2.5%−97.5% and a non-significant bias of ∼0.4% were found.
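The statistics reported in Table 2 are a median and a 2.5%-97.5% interpercentile range of relative differences. The Python sketch below shows how such numbers can be computed. Two points are assumptions of this sketch: the differences are taken relative to instrument 2, and numpy's default linear percentile interpolation is used. The paper does not specify either choice. The last line reproduces the sample-size estimate quoted above.

```python
import numpy as np


def difference_stats(o3_1, o3_2):
    """Median relative difference (%) and its 2.5%-97.5% interpercentile range (%).

    Differences are taken relative to instrument 2 (an assumption of this sketch).
    """
    o3_1 = np.asarray(o3_1, dtype=float)
    o3_2 = np.asarray(o3_2, dtype=float)
    d = 100.0 * (o3_1 - o3_2) / o3_2
    p_lo, p_med, p_hi = np.percentile(d, [2.5, 50.0, 97.5])
    return float(p_med), float(p_hi - p_lo)


# Sample-size check for the MMC period: 31'129 pairs over 20 years of ~250 sunny days.
obs_per_day = 31129 / (20 * 250)  # about 6.2 coincident observations per sunny day
```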
Stübi et al. (2017a) introduced an analysis of the daily Brewer data that separates the mid- to long-term variations of the differences from the short-term random fluctuations of coincident measurements. This was an alternative to the method introduced by …

Figure caption (fragment): The difference ∆ 062−051 (red) is the bias between the D 062 and D 051 instruments evaluated from the coincident measurements of that day.
… if values from instrument 1 are larger than those of instrument 2 (see Figure 4). σ i is a measure of the random fluctuations of each instrument, i.e., its repeatability. This approach works best with the numerous daily data available from the automated system, but it can also be applied to manual operation. The results of the daily analysis for the periods defined above are presented in the following subsections.

… data. In the early years of parallel measurements, Dobson D 062 was between 0.5% and 1% higher than D 101. This bias gradually decreased, and the two data sets have agreed within ±0.5% since about 2000. A shallow seasonal cycle in the difference is noticeable since 2005. The regular maintenance/calibration campaigns (black lines) did not introduce noticeable breaks in the time series of differences. The differences in the periods following calibrations are not always zero, as might be expected. This is because each instrument was calibrated independently against the traveling standard. Differences of ±0.5% are within the uncertainty of the calibration procedure itself and were therefore not compensated. The repeatability σ i is shown separately in the lower panel of Figure 5. Values between 0.3% and 0.6% were observed for both instruments. The MMC sections of Table 3 and Table 4 summarize the statistics of the parameters ∆ and σ, respectively, resulting from the daily analysis. The mean of the monthly median differences ∆ is not significantly different from zero, and the IPR 2.5%−97.5% is 1.7%. The repeatability of ∼0.4% (0.3%-0.7%) is similar for the two manually operated Dobson instruments and probably varied with the operator's experience and skill. These numbers serve as our reference metrics for comparing manual and automated Dobson observations in the next sections.
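The exact formulation of the daily analysis is given in Stübi et al. (2017a) and is only summarized here. As a hedged illustration of how a daily bias can be separated from per-instrument random scatter, the Python sketch below does two things.

First, it computes a daily ∆ as the median relative difference of the day's coincident pairs, positive when instrument 1 is higher, as in the text.

Second, it estimates each instrument's scatter with the classical three-cornered-hat relation σ1² = (s12² + s13² − s23²)/2, where s_ij is the standard deviation of the differences between instruments i and j. The three-cornered hat is a swapped-in, well-known technique and not necessarily the method of Stübi et al. (2017a). It requires three simultaneously operated instruments, as in the AAC period, so it does not apply to the two-instrument MMC period.

The function names are hypothetical.

```python
import numpy as np


def daily_bias(o3_1, o3_2):
    """Daily bias in % (median over the day's coincident pairs); > 0 if instrument 1 reads higher."""
    o3_1 = np.asarray(o3_1, dtype=float)
    o3_2 = np.asarray(o3_2, dtype=float)
    return float(np.median(100.0 * (o3_1 - o3_2) / o3_2))


def three_cornered_hat(d12, d13, d23):
    """Random scatter of instruments 1, 2, 3 from pairwise differences d_ij = x_i - x_j.

    Assumes the random errors of the three instruments are mutually independent.
    """
    v12, v13, v23 = (np.var(np.asarray(d, dtype=float), ddof=1) for d in (d12, d13, d23))
    variances = (
        0.5 * (v12 + v13 - v23),  # instrument 1
        0.5 * (v12 + v23 - v13),  # instrument 2
        0.5 * (v13 + v23 - v12),  # instrument 3
    )
    # Small samples can give slightly negative variance estimates; clip them at zero.
    return tuple(float(np.sqrt(max(v, 0.0))) for v in variances)
```

The common atmospheric signal cancels in every pairwise difference. As a result, only the instrument noise enters the three variances, which is what makes the decomposition possible.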

Period of manual vs. automated Dobson operation (MAC period)
Over this one-and-a-half-year period, the data acquisition and the measurement program for automated operation were being developed, and various interventions repeatedly interrupted and perturbed the measurements. Changes in the automated operating procedures and their timing, as well as improvements of hardware components, make the comparison between the systems challenging.
It was also demanding for the operators to measure continuously in order to obtain a sufficiently large data set of coincident measurements … of the azimuth control system was introduced, but interference generated by this new system affected the measurements negatively. This problem was identified and solved in July 2013. In the first half of the MAC period, D 101 was ∼0.5% higher than D 062, and by mid-2013 the three instruments agreed. The lower panel shows the improvement of the data quality, with a significant decrease of the random fluctuations: the automated instruments (red and orange) yield values around ∼0.3%, while the manually operated instrument (blue) is closer to ∼0.6%. Figure 7 shows the monthly medians of ∆ 101−062, ∆ 101−051 and σ i. Except for the period April-June 2013, the mean bias between the manual and automated instruments is within ±0.6%. The σ i of the automated Dobson instruments is significantly smaller than that of the manually operated instrument. Table 3 shows that D 101 data are on average 0.19% larger than D 062 data. However, the distribution is bimodal because of the April-June 2013 period and the uneven distribution of measurements over the relatively short …

… and D 051 in Arosa for calibration and maintenance campaigns. These transfers could have altered the instrument response, but this is difficult to assess from these relatively short comparison periods. The monthly averages for these periods are also less representative, since the sample is limited to only a few days in some cases. Notwithstanding, most data points lie within a ±1% interval, with periods of lesser agreement. Overall, the period 2016-2018 shows a convergence of the differences to within ±0.5%, associated with the improvement and tuning of the Dobson instruments' control system. The time series of ∆ 101−062 (red strip in Figure 8) is mostly within the ±0.5% range, except at the end of 2015, when D 062 appears slightly lower.
The ∆ 051−062 (blue strip) shows the same deviation at the beginning of 2016 but converges to the ±0.5% range afterwards.

The 2013-2014 period of the ∆ 051−062 time series indicates that the automated systems were not yet fully stable and that the bias could change by ±0.5% within a year. As shown in the AAC section of Table 3, the ∆ i of the pairwise comparisons are not significantly different from zero, except for the pair D 101-D 051 at LKO. As evidenced in Figure …

Period of automated vs. automated distant Dobson operation 2016-2019 (AAD period)
In January 2016, the D 101 instrument was relocated to Davos with a set-up similar to the one at Arosa. In September 2018, D 051 was also relocated to Davos. The line-of-sight distance between Arosa and Davos is 11 km. The sites are close enough to suggest a similar large-scale stratospheric ozone regime. However, the altitude difference of 250 m between the two observatories could translate into a slightly different total ozone column. Since Davos is the lower site, total column ozone values at Davos are expected to be comparable to or slightly larger than at Arosa. Since 2016, the data acquisition and computer-controlled operation have undergone only minor changes compared with the previous development period. As in the previous figures, Figure 9 compares the Dobson pairs in terms of ∆ and σ for the distant instruments. The ∆ 101−062 time series (red strip) is now mostly within 0.5% ± 0.5%, which could indicate an average offset of about 0.5% between the two stations. The most recent data of 2019 tend to show a smaller offset, as also indicated by the ∆ 051−062 time series (blue strip). The ∆ 101−051 time series (black strip) has a very similar pattern, which corroborates the agreement seen in Figure 8 between D 062 and D 051. … to ∼0.4%, which corresponds to 1-1.5 DU for the ozone columns observed in the area. In the lower panel of Figure 9, the variations of σ i appear substantially larger than for the collocated cases. This is not surprising, since the two stations can experience different atmospheric conditions, which influence the daily variations of the ozone column measured by the two distant instruments. In some cases, a time delay can be observed between the ozone variations at the two sites, for example when a front passes over the area (not shown). Attempts to systematically correct these time shifts did not improve the results significantly, so they were not implemented.
The 97.5% percentiles of σ for the Dobson instruments at different locations reached 0.6%-0.8%, mostly in winter. Such values were less frequent for the collocated instruments (Figure 8). However, these larger σ i variations do not significantly affect the monthly averages in Table 4 for the AAD cases, which were in the range of 0.1%-0.4%.
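The conversion of a ∼0.4% offset into 1-1.5 DU quoted above can be checked with a short worked example. It assumes typical total ozone columns of roughly 250-375 DU over the Arosa/Davos area. This range is an assumption for illustration, not a value given in this paper.

```latex
\Delta X = \frac{0.4}{100}\, X, \qquad
X = 250~\mathrm{DU} \;\Rightarrow\; \Delta X = 1.0~\mathrm{DU}, \qquad
X = 375~\mathrm{DU} \;\Rightarrow\; \Delta X = 1.5~\mathrm{DU}
```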

Seasonal cycle
In almost all optical measuring systems, stray light is present and influences the measured values to a varying degree. The Dobson and Brewer sun spectrophotometers are no exception. The double-monochromator Brewer instruments are known to be free of a major stray light bias, but the single-monochromator Brewer instruments and the Dobson instruments are affected (Moeini et al., 2019; Karppinen et al., 2015). The larger the ozone slant path (OSP = ozone amount × air mass), the larger the stray light effect. This is because the signal at the shorter wavelengths decreases more rapidly and approaches the noise level, so the relative contribution of stray light increases.
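The Python sketch below makes the ozone slant path concrete. It computes the ozone air mass µ with the usual spherical-shell approximation, µ = (R + h)/sqrt((R + h)² − (R + r)² sin²(SZA)), and then OSP = X·µ. The constants are assumptions of this sketch: Earth radius R = 6371 km, effective ozone layer height h = 22 km, and station altitude r = 1.8 km, chosen only for illustration. The operational processing may use different values.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius
OZONE_LAYER_KM = 22.0     # assumed effective height of the ozone layer


def ozone_air_mass(sza_deg, station_alt_km=1.8):
    """Ozone air mass mu for a thin ozone shell OZONE_LAYER_KM above sea level."""
    outer = EARTH_RADIUS_KM + OZONE_LAYER_KM
    inner = EARTH_RADIUS_KM + station_alt_km
    s = math.sin(math.radians(sza_deg))
    return outer / math.sqrt(outer ** 2 - (inner * s) ** 2)


def ozone_slant_path(total_ozone_du, sza_deg, station_alt_km=1.8):
    """OSP = total ozone column (DU) times ozone air mass."""
    return total_ozone_du * ozone_air_mass(sza_deg, station_alt_km)


# A winter-like case (high SZA, high ozone) vs. a summer-like case (illustrative inputs).
osp_winter = ozone_slant_path(350.0, 72.0)
osp_summer = ozone_slant_path(300.0, 30.0)
```

High winter ozone combined with large solar zenith angles gives a much larger OSP than summer conditions. This is the mechanism by which an instrument-dependent stray light effect can turn into a seasonal bias.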
Because the OSP varies seasonally and the stray light effect is instrument dependent, it is of interest to analyse a possible bias related to the OSP. As noted in section 4.1, a seasonal cycle was observed in the 2005-2010 period, and the right part of the upper panel of Figure 9 shows a similar tendency. The result of the seasonal analysis of the ∆ i differences is presented in Figure 10. The colored strips denoting the IPR 2.5%−97.5% largely cover the zero line, but the medians show a trend …

… operation of the instruments. Stübi et al. (2017a) first presented the method used to separate the mid- to long-term systematic biases between instruments from the short-term random variations of each instrument. This method allowed us to halve the overall bias range, from typically IPR 2.5%−97.5% ∼3% (Table 2) down to IPR 2.5%−97.5% ∼1.5% (Table 3).
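The seasonal analysis of the ∆ i differences (Figure 10) groups the daily differences by calendar month. The minimal Python sketch below performs that grouping. It assumes the daily ∆ values and their calendar months are already available as arrays and returns, for each month, the median and the 2.5% and 97.5% percentiles. The function name is hypothetical, and months without data are simply omitted.

```python
import numpy as np


def monthly_climatology(months, delta):
    """Map calendar month (1-12) to (median, 2.5th, 97.5th percentile) of daily differences."""
    months = np.asarray(months)
    delta = np.asarray(delta, dtype=float)
    stats = {}
    for month in range(1, 13):
        values = delta[months == month]
        if values.size == 0:
            continue  # no data for this calendar month
        p_lo, p_med, p_hi = np.percentile(values, [2.5, 50.0, 97.5])
        stats[month] = (float(p_med), float(p_lo), float(p_hi))
    return stats
```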
The 20-year MMC period was long enough to bring Dobson instruments D 101 and D 062 into agreement after multiple calibration campaigns. No significant biases were observed within the uncertainty associated with manual operation and the rather limited number of daily observations. Dobson instrument D 051 was primarily dedicated to automated Umkehr measurements … Table 2, the mean differences for the MAC cases are not significant, given the large IPR 2.5%−97.5% of ∼3.8% for the coincident D 101-D 062 and D 101-D 051 values.

From the refined daily analysis, the IPR 2.5%−97.5% have been reduced to ∼ 1.6% (Table 3). Even though the values of the differences for the two pairs appear to be quite different (∆ 101−062 = 0.19% vs. ∆ 101−051 = -0.56%), they still remain close to ±0.5%. Moreover, they represent averages of different time periods and sample lengths and should be compared with caution.
Beginning in 2014, all three Dobson instruments were ready for automated and collocated (AAC) operation. For a while, as shown in Table 1, the operating environment still changed from time to time, and the system was subject to occasional technical glitches. Table 2 shows that the direct comparison differences for the AAC case are not significant, with sample sizes ten times larger than in the MAC case. The daily analysis results from Table 3 …

… Considering the homogeneity and continuity of the Arosa/Davos ozone column time series, the comparison of coincident data obtained independently at the two stations is an essential part of this study. Stübi et al. (2017b) carried out a similar analysis of the long-term stability and random uncertainties of the Brewer instruments and found no significant differences between the Arosa and Davos sites. The analysis of the AAD period presented in section 4.4 arrives at the same conclusion. Notwithstanding, the last three lines in Table 2 may indicate a 0.4% systematic high bias, within an IPR 2.5%−97.5% of 2.5%-2.9%, for the instruments located at Davos. The daily analysis results in Table 3 confirm these numbers, with ∆ i values of 0.5% but a reduced IPR 2.5%−97.5% of ∼1.3%. In Stübi et al. (2017b), the authors estimated that the Arosa-Davos altitude difference of 260 m could contribute 0.25% ± 0.15% to the ozone column. Therefore, about half of the observed difference could be attributed to the longer ozone column measured from Davos. The σ i values reported in Table 4 …

… instruments are now available from the ATMOZ project (ATMOZ, 2018). However, such improvements were beyond the scope of the present analysis. Future research should characterise the first few kilometers of the ozone profile and its seasonal cycle in the Arosa and Davos valleys. This would allow a more accurate assessment of the differences in the free-troposphere ozone column above the two sites.

The present results based on Dobson data confirm the conclusion that Stübi et al. (2017b) reported based on Brewer data. The biases found are not statistically significant at the IPR 2.5%−97.5% level and therefore could not be systematically compensated. The Dobson and Brewer data sets could be re-processed with an improved algorithm, based on recent ozone cross-section values and on an improved stray-light correction derived from better-characterised slit functions. Such re-processing could perhaps reduce the uncertainties of the biases found but would most certainly not change our conclusions. The results presented in this study are unique, since no other station of the Dobson network has operated fully automated collocated Dobson instruments over a multi-year period. Considering the importance of the Arosa time series, research will continue with a focus on trend analyses and break detection. This work will use both the Arosa data (continued until mid-2021) and the combined Arosa-Davos time series.