Long-term trends of total column ozone (TCO), assessments of stratospheric ozone recovery, and satellite validation all rely on daily “best representative values” from Brewer spectrophotometers and other ground-based ozone instruments. In turn, reporting of these daily total column ozone values to the World Ozone and Ultraviolet Radiation Data Centre (WOUDC) has traditionally been predicated upon a simple choice between direct sun (DS) and zenith sky (ZS) observations. For mid- and high-latitude monitoring sites impacted by cloud cover, we discuss the potential deficiencies of this approach in terms of its rejection of otherwise valid observations and its limited capability to sample evenly throughout the day. A new methodology is proposed that makes full use of all valid direct sun and zenith sky observations, accounting for unevenly spaced observations and their relative uncertainty, to calculate an improved estimate of the daily mean total column ozone. It is demonstrated that this method can increase the number of contributing observations by a factor of 2.5, increase the sampled time span, and reduce the spread of the representative time by half. The largest improvements in the daily mean estimate are seen on days with the smallest number of contributing direct sun observations. No effect on longer-term trends is detected, though for the sample data analysed we observe a mean increase of 2.8 DU (0.82 %) with respect to the traditional direct sun vs. zenith sky average choice. To complement the new calculation of a best representative value of total column ozone and separate its uncertainty from the spread of observations, we also propose reporting its standard error rather than the standard deviation, together with measures of the full range of values observed.

Global ground-based monitoring of total column ozone (TCO) has relied on the international network of Brewer spectrophotometers since they were first developed in the 1980s (Kerr et al., 1981). This network has expanded the number of sites and measurement possibilities relative to the still-operating predecessor instrument, the Dobson spectrophotometer. Together these networks provide validation of satellite-retrieved total column ozone, instantaneous point measurements that have value for near-real-time low-ozone alerts (particularly when sited near population centres), and inputs to radiative transfer models at ultraviolet wavelengths. Critically, they also underpin the monitoring requirement of the Vienna Convention for the Protection of the Ozone Layer, 1985.

There are recent indications that, for the first time since the treaty was enacted and chlorofluorocarbons and other ozone-depleting substances were banned, the ozone layer is showing signs of recovery (WMO, 2014, and references therein). These and related trend analyses, however, use daily mean TCO values as their starting point. In the parlance of the World Ozone and Ultraviolet Radiation Data Centre (WOUDC), the WMO data centre for ground-based ozone data, the daily values submitted by data originators should be the “best representative values” (WOUDC, 2016). For Brewer spectrophotometers a cascade of choices is recommended as follows. If available, the mean of valid direct sun (DS) measurements is preferred. If no valid DS observations are available for a given day, then the mean of all valid zenith sky (ZS) measurements is used. If no valid DS or ZS observations are available, the last choice is to rely on the mean of valid focused moon observations, a measurement mode predominantly used at high-latitude stations. Here we only consider the choice between DS and ZS observations.

The majority of the effort spent on calibrating Brewer spectrophotometers is directed towards ensuring high-quality DS calibrations are distributed globally through the Brewer reference triad (Fioletov et al., 2005), through the intercomparisons at the Regional European Calibration Centre, and through initiatives such as COST Action ES1207 and EUBrewNet. ZS observations are then linked to the DS calibration through a polynomial fit of quasi-synchronous DS and ZS observations (Kipp & Zonen, 2005). This additional calibration step explains the default preference for DS over ZS measurements, as it incurs a small additional uncertainty. However, at mid- and high-latitude stations in particular, the annual mean cloud fraction can exceed 50 % (Wilson and Jetz, 2016), which limits opportunities for recording viable DS observations. As a consequence, for a high fraction of days the best representative daily value (BRDV) is based upon zenith sky measurements. More crucially, during partly cloudy days the BRDV can be reliant on a small number of individual DS observations (< 5), which may be biased towards either the start or end of the day, whilst a greater number of valid ZS observations from throughout the observation period are rejected.

This gives rise to the question: could a more representative daily value be obtained from an increased number of ZS measurements than from a small number of DS measurements? Answering this question still forces a choice between valid DS and ZS observations as the number of DS measurements falls; whichever set of observations is chosen, a set of otherwise valid data is not incorporated into the calculation of the best representative value. Therefore, we propose an alternative methodology to calculate a best daily representative value that retains both direct sun and zenith sky measurements, taking into account their relative uncertainties and periods of time when valid measurements are more frequent.

Brewer spectrophotometers, their operation, and standard data processing routes have been described previously in the literature (Brewer, 1973; Fioletov et al., 2005; Smedley et al., 2012; Savastiouk and McElroy, 2005). For context we outline the key points here. The core of each instrument is a single or double monochromator unit whose output is detected by a photomultiplier tube. For the DS measurement mode the input is from a rotating prism assembly pointed towards the sun's disc. Column ozone observations are achieved by rapidly repeated measurements at five operational wavelengths over a period of approximately 3 min, and a final value is calculated by implementing the Lambert–Beer law and knowledge of the absorption cross section of ozone molecules. ZS observations are made in the same way but the rotating prism is instead directed to collect scattered light from the zenith, and then an empirical polynomial adjustment is applied. This polynomial adjustment assumes that the apparent ozone column from the zenith sky measurement is quadratic in both the air mass factor and the actual column ozone. The nine constants necessary are determined from a large number of quasi-simultaneous DS and ZS measurements (> 500) and are instrument and site specific. This relationship is usually determined at the instrument's home site, rather than during an intercomparison or calibration exercise, though Fioletov et al. (2011) described an improved radiative-transfer-modelling-based methodology that reduced the instrument-specific unknowns to two parameters (though nine constants are still necessary). However the polynomial constants are determined, the ZS observation is then found by solving the relevant quadratic equation.
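As a rough illustration of the inversion step described above, the following sketch assumes that the nine constants are arranged as a 3 × 3 array `coeffs[i][j]` multiplying μ^i X^j, where μ is the air mass factor and X is the actual column ozone. This layout, the function names, the numerical constants, and the rule used to choose between the two roots are assumptions made for illustration only; they are not the operational Brewer implementation.

```python
import math

def apparent_zs(x, mu, coeffs):
    """Forward model: apparent ZS ozone, quadratic in both mu and x.

    coeffs[i][j] multiplies mu**i * x**j (hypothetical layout of the
    nine instrument- and site-specific constants).
    """
    return sum(coeffs[i][j] * mu**i * x**j
               for i in range(3) for j in range(3))

def zs_to_tco(apparent, mu, coeffs):
    """Solve the forward model for the column ozone x (DU)."""
    # Collapse the air mass dependence: a0 + a1*x + a2*x**2 = apparent
    a0, a1, a2 = (sum(coeffs[i][j] * mu**i for i in range(3))
                  for j in range(3))
    c = a0 - apparent
    if abs(a2) < 1e-15:
        # Degenerate case: the relation is linear in x
        return -c / a1
    disc = a1 * a1 - 4.0 * a2 * c
    if disc < 0.0:
        raise ValueError("no real solution for this observation")
    root = math.sqrt(disc)
    candidates = [(-a1 + root) / (2.0 * a2), (-a1 - root) / (2.0 * a2)]
    # Assumption: the physical root is the one closest to the apparent value
    return min(candidates, key=lambda x: abs(x - apparent))

# Illustrative constants only; real values come from the DS-ZS fit
coeffs = [[5.0, 0.95, 1e-5],
          [-2.0, 0.02, 0.0],
          [0.3, 0.0, 0.0]]
observed = apparent_zs(320.0, 2.0, coeffs)   # simulate an apparent ZS value
recovered = zs_to_tco(observed, 2.0, coeffs) # invert back to column ozone
```

Running the forward model and then the inversion should return the starting column ozone, which is a convenient self-check whenever a new set of constants is fitted.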

For a site where direct sun can be guaranteed for the majority of the day, an instrument could be scheduled to only attempt DS observations at regular intervals (together with the necessary diagnostic routines), and the mean of these observations would be a reliable estimate of the actual daily mean TCO overhead at the station. For other sites where cloud is more variable and unpredictable, the observational schedule must contain a combination of both ZS and DS measurements. However, local cloud cover conditions may only permit a small number of DS measurements to be successfully recorded. Figure 1 shows an example day where only four DS observations were recorded between 13:48 and 15:49 UTC and where their arithmetic mean (298.2 DU) differs substantially from the daily mean TCO at the station as indicated by ZS observations. In order to avoid the potential binary choice between a small number of DS observations and a greater number of ZS observations, we propose a weighted daily mean that utilises all valid DS and ZS values.

Example day showing valid DS and ZS observations and their
standard deviations. Also shown are the daily representative value based on
the traditional (arithmetic mean, DS > ZS preference) methodology
(red outlined square,

Our aim is to construct a daily mean that has the following properties. In the absence of either valid DS or valid ZS observations, it should produce the same result as the standard method (once clustering of observations is accounted for). With the addition or subtraction of a single DS data point, there should be a graceful change in the BRDV and the overall time period it represents. It should represent as fully as possible the day's TCO observations. It should give equal weight to equal periods of time and hence account for time clustering of valid observations and for their relative uncertainties. Finally, it should be applicable to historic data and not necessitate any changes to the instrument's future schedule or data collection routines.

The proposed methodology is as follows. The first prerequisite is for ZS
polynomials to be assessed regularly (here these have been recalculated
using the full available dataset for each inter-calibration period) to
ensure the individual DS and ZS observations are comparable and there is
minimal bias in the ZS observations (for example in our dataset overall
DS–ZS bias

All DS and ZS measurements are then filtered to remove those that do not
meet the validity criteria. Observations that have a standard deviation of
> 2.5 DU for DS and > 4.0 DU for ZS are rejected. We
note that the standard choice of standard deviation threshold is 2.5 DU for
ZS observations, but increasing the limit to 4 DU does not introduce any
bias and increases the total number of valid observations (Fioletov et al.,
2005, 2011). Observations at air mass factors > 4 are also rejected for single monochromator instruments, but this limit is
raised to 6 for double monochromator instruments due to their improved stray
light rejection (Karppinen et al., 2015). To ensure that any residual bias
is not present at high air mass factors, an additional tail removal step is
applied. For this the day's data are smoothed with a 30 min running
average filter, and end periods of time where the smoothed TCO exhibits
apparent rates of change > 20 DU h

At this stage the remaining DS and ZS values meet the specified validity
criteria and have passed the additional tail removal check. To form a BRDV
from these individual observations, we calculate a weighted mean of the full
set of data points but where the weighting has two components: the time for
which the observation is representative and the uncertainty of each
observation, as in Eq. (1):

We also note that this methodology could be applied to all data acquired without applying a threshold standard deviation validity filter, as data points with large errors will contribute proportionally less to the BRDV. However, more care needs to be taken as regards relaxing the air mass threshold requirement, as small biases may be introduced, inflated by the effect of observing at high air mass, whilst the uncertainty would not have been captured by the intrinsic standard deviation of the observation. Further, the ZS uncertainties could be expanded appropriately to account for any day-to-day bias between ZS and DS observations under differing sky conditions or, alternatively, to incorporate the DS–ZS polynomial fit mean residual, for example.
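The filtering and weighting steps described above can be sketched as follows. The standard deviation and air mass thresholds are those quoted in the text; the tail removal step is omitted. The specific weight used here, the time each observation represents divided by its variance, is only one plausible reading of the two weighting components, and the half-gap rule for the representative time, the floor on the standard deviation, and all names and sample values are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Obs:
    t: float     # observation time, hours UTC
    tco: float   # total column ozone, DU
    sd: float    # standard deviation of the observation, DU
    mu: float    # air mass factor
    mode: str    # "DS" or "ZS"

SD_LIMIT = {"DS": 2.5, "ZS": 4.0}  # DU, thresholds quoted in the text
SD_FLOOR = 0.1                     # DU, assumed floor to avoid huge weights

def is_valid(o, double_mono=False):
    """Standard deviation and air mass validity checks."""
    mu_limit = 6.0 if double_mono else 4.0
    return o.sd <= SD_LIMIT[o.mode] and o.mu <= mu_limit

def representative_spans(times):
    """Assumed rule: each observation represents half the gap to each neighbour."""
    n = len(times)
    if n == 1:
        return [1.0]
    spans = []
    for i, t in enumerate(times):
        left = (t - times[i - 1]) / 2.0 if i > 0 else 0.0
        right = (times[i + 1] - t) / 2.0 if i < n - 1 else 0.0
        spans.append(left + right)
    return spans

def weighted_daily_mean(observations):
    """Daily mean weighted by representative time and inverse variance."""
    if not observations:
        raise ValueError("no valid observations for this day")
    obs = sorted(observations, key=lambda o: o.t)
    spans = representative_spans([o.t for o in obs])
    weights = [dt / max(o.sd, SD_FLOOR) ** 2 for dt, o in zip(spans, obs)]
    return sum(w * o.tco for w, o in zip(weights, obs)) / sum(weights)

# Invented sample day, not taken from the Manchester record
day = [Obs(10.0, 300.0, 2.0, 1.5, "ZS"),
       Obs(11.0, 310.0, 2.0, 1.3, "ZS"),
       Obs(12.0, 320.0, 2.0, 1.2, "DS"),
       Obs(12.5, 330.0, 3.0, 1.2, "DS")]  # fails the DS limit of 2.5 DU
valid = [o for o in day if is_valid(o)]
brdv = weighted_daily_mean(valid)
```

With this choice of weight, tightly clustered observations share the time interval they jointly cover instead of each counting fully, and a noisier observation is down-weighted rather than discarded.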

For higher-latitude sites where other measurement modes are relied upon, such as focussed moon, focussed sun, or TCO derived from global spectral irradiance, these observations could also be incorporated into the BRDV calculation in a similar way (see seasonal variation of observation types in Karppinen et al., 2016). The prerequisites would be that each individual observation should have an associated uncertainty and that the observations from different measurement modes should be homogenised beforehand. In terms of practical implementation, if the method were adopted by the community, a new observation type would have to be registered at WOUDC (a mechanism that is already available), with relevant details added to the scientific support statement as necessary. For stations that submit raw data, or processed individual observations, the weighted mean BRDV calculation could be applied across all sites as a daily summary value.

Summary statistics for data shown in Fig. 2: number of contributing observations, representative observation time, and time span of contributing observations. For each case, values shown are the arithmetic means, and in brackets the lower 10th percentile and upper 10th percentile values are shown.

It is also worthwhile to consider the strengths of this methodology under specific theoretical conditions. If, for example, on a given day there is a strong linear east–west ozone gradient present, then the most appropriate daily measure should return a value similar to the TCO above the site. The traditional method risks producing a daily value that could be substantially different if, due to cloud cover, only a few valid DS measurements could be recorded during early morning or late in the day when the TCO is being sampled to the west or east of the site. In partly cloudy or cloudy conditions a minimum air mass TCO measurement may not be obtained. In contrast, the proposed method guards against these issues as ZS observations could be sampled more fully through the day, whilst the contribution from DS measurements would represent the effective TCO along the slant path when the direct solar beam is visible. As a result the proposed BRDV TCO calculation is more appropriate for UV exposure studies than the traditional calculation, more representative of the conditions throughout the day, and more resilient than relying on a single value at minimum air mass, for example. For non-linear spatial gradients in TCO, the limited DS measurements could result in a value more different still from the mean TCO overhead, while the bias from the selection of the TCO near minimum air mass would depend on the spatial distribution of ozone.

To demonstrate the impact of this method on real world data, we apply it to
the 2000–2016 data record from Brewer spectrophotometer #172, located in
Manchester, UK (53.47

Overall we see an increase in the mean number of contributing observations
(

As expected, we see a concomitant tightening of the distribution of representative times (defined as the mean of valid observational times, weighted by their TCO values) around solar noon (close to midday). There is also a skewing of the observational time span distribution to longer periods. The representative time is symmetrical about solar noon in both the traditional and new methods, but the width of the annual distribution (defined as the interdecile range) is halved from 4.05 to 2.01 h, showing that the new method results in BRDVs that are more representative of the conditions at solar noon. Again, due to the longer day length, the improvement is accentuated during summer (6.5 h reduced to 2.7 h) but is still present during winter months (1.56 h reduced to 1.20 h). Time spans for the whole year are generally skewed to the right-hand side of the distribution, though the upper and lower bounds do not change (being limited by the number of daylight hours and instances of single contributing observations respectively). Much of the skewness in the annual distribution is attributable to that occurring during the summer subset (Fig. 2, third row, second column), where the lower 10th percentile increases from 0.5 to 10.0 h.

Histograms of the number of contributing observations (

Taken together these results demonstrate that the method enables a more representative daily mean to be calculated, predominantly by sampling more fully through each day and over a wider range of weather conditions. However, it is prudent to investigate the impact on the overall time series and trends.

Focussing on the 2006–2016 subset for clarity, in Fig. 3, we see only a
small impact on the monthly mean TCO values from the application of the methodology
described, with no clearly discernible trend or annual cycle in the difference (Fig. 3, upper and middle panels). Regression analysis shows the trend in the
difference between traditional and proposed methodologies to be

In Fig. 3 (lower right panel) we explore the ranges of the differences between methodologies further. It is anticipated that the greatest differences between traditional and proposed methods will be seen on days with high ozone variability and a low number of contributing DS measurements (or a high fraction of ZS observations). Plotting the TCO difference against the number of valid DS measurements, the greatest variability is seen for a single valid DS measurement with the distribution rapidly narrowing as the number of valid DS observations increases. We distinguish days according to their seasons, with those from summer and autumn (JJASON) marked in dark grey and those from winter and spring (DJFMAM) marked in red. Typically TCO exhibits a much larger variability during winter and spring at this location, though the effect is not overly strong and the number of DS observations is a better indicator for a large potential improvement in the BRDV.

Together these results suggest that there should be no impact on long-term trends at a site where the data record is derived from a single instrument type. However, there could be implications where there has been a change in the data sampling method. Moving from a semi-manual Dobson spectrophotometer that makes a limited set of observations on a predefined schedule to a Brewer spectrophotometer that operates quasi-continuously and selects a daily value based on the traditional DS vs. ZS choice could introduce a small step change due to this effect, which may contribute to a perceived trend. Likewise, applying the proposed method to only part of a data record (because individual historical measurements have been lost, for example) could also introduce a small step in the overall record.

Testing the influence of the new methodology in terms of the agreement
between ground-based and satellite retrievals (Fig. 4), we find a marginal
improvement in the ground vs. satellite TCO for daily mean data in terms of
their

While the focus of this study is on the determination of a more representative daily TCO value, there are a number of related issues concerning reporting of the daily spread that will be discussed in this section.

At present, WOUDC recommendations are to report, in addition to the best representative value, a standard deviation for the day's observations, which implicitly assumes a normal distribution. Whether a day's observations of the underlying ozone column necessarily fall within a normal distribution is not obvious, nor are the authors aware of any evidence in the literature. To that end we have applied the Kolmogorov–Smirnov test (Massey, 1951) to each day's observations for the same 2006–2016 subset of data. The null hypothesis for this test is that the individual daily observations come from a standard normal distribution, and on application we find that this null hypothesis is not rejected for any of the days in our test sample. That is, all days are consistent with the observations having been drawn from a normal distribution.
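A per-day normality check of this kind might look like the sketch below. It assumes that each day's observations are standardised with their own sample mean and standard deviation before comparison with the standard normal distribution, and it uses the large-sample 5 % critical value of approximately 1.358/√n; both choices, and the function names, are illustrative assumptions and not a description of the exact procedure applied to the dataset. Standardising with parameters estimated from the same sample is known to make the test conservative, so a non-rejection should be read with that in mind.

```python
import math
import statistics

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ks_statistic(values):
    """One-sample KS distance between standardised values and N(0, 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0.0:
        raise ValueError("observations are all identical")
    z = sorted((v - mean) / sd for v in values)
    d = 0.0
    for i, zi in enumerate(z, start=1):
        f = normal_cdf(zi)
        # Largest gap between the empirical and theoretical CDFs
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def rejects_normality(values, c_alpha=1.358):
    """Compare with the asymptotic critical value (about 5 % level)."""
    return ks_statistic(values) > c_alpha / math.sqrt(len(values))
```

For days with only a handful of observations the critical value is large, so the test has little power to detect departures from normality.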

Whilst this result does not undermine the use of the standard deviation as a measure of the spread of the day's data, we propose that other metrics may be more useful. Specifically, to separate out the uncertainty in the best representative daily value from the range exhibited by individual observations, a more useful measure would be to use the standard error of the weighted mean to indicate the uncertainty of the best representative value plus additional metrics relating to the maximum and minimum TCO observed. The latter could be the strict maximum and minimum, or, to guard against the influence of short-duration spikes, the upper and lower 10th or 25th percentiles could be used, for example.
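To make the distinction between the two kinds of metric concrete, the sketch below computes the standard error of a weighted mean by propagating the individual observation uncertainties, assuming that the observations are independent, alongside range-based measures of the spread. The function names, the choice of the inclusive quantile method, and the sample values used in any call are illustrative assumptions.

```python
import math
import statistics

def weighted_mean_and_se(values, sds, weights):
    """Weighted mean and its standard error for independent observations.

    Var(mean) = sum((w_i * sd_i)**2) / (sum(w_i))**2
    """
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    var = sum((w * s) ** 2 for w, s in zip(weights, sds)) / total ** 2
    return mean, math.sqrt(var)

def spread_summary(values):
    """Range-based description of the day's individual observations."""
    if len(values) < 2:
        p10 = p90 = float(values[0])
    else:
        # Nine cut points; the first and last bound the interdecile range
        deciles = statistics.quantiles(values, n=10, method="inclusive")
        p10, p90 = deciles[0], deciles[-1]
    return {"min": min(values), "p10": p10, "p90": p90, "max": max(values)}
```

The standard error shrinks as more observations contribute, whereas the minimum, maximum, and percentile bounds continue to describe how much the ozone column actually varied through the day.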

Whilst developing the methodology described in Sect. 3, the geostatistical analysis technique known as “kriging”, or Gaussian process regression, was also tested (Bailey and Gatrell, 1995; Lophaven et al., 2002). This analysis produces the best linear unbiased estimator of the actual underlying TCO at times intermediate to the observations and also produces an associated uncertainty. In brief, it performed well for days with a larger number of contributing observations but showed poorer performance during winter or on other days with few observations. This latter issue is in part due to the complex nature of applying the method, where for few observations there is a risk of overfitting. However, for studies where short-term prediction of the TCO and its near-term uncertainty is of interest, such as real-time estimates or nowcasting, kriging may find applications. More generally, its applications could include spatial analysis and interpolation of TCO and surface irradiance, which are two fields where global datasets are reliant on a limited number of measurement sites.

In this study, we propose, describe, and assess a new methodology for determining a more representative best daily value of total column ozone from Brewer spectrophotometer observations. This method overcomes the limitations of making the traditional choice between a possibly small number of direct sun measurements and zenith sky measurements. It requires a homogenised set of DS and ZS data as a prerequisite but then, by taking a weighted mean and accounting for both the uncertainty associated with each individual observation and the time period the observation represents, produces a more representative value based on the full set of daily observations.

Applying the new method to the 2000–2016 dataset from Brewer 172 stationed
at Manchester (53.47

To complement our proposed BRDV calculation, we also recommend reporting the standard error of the daily mean value and replacing the standard deviation by a more complete measure of the daily spread such as the upper and lower limits of the interdecile range, or simply the maximum and minimum observed values.

The underlying data used in this study can be accessed at the World Ozone and Ultraviolet Radiation Data Centre (Smedley et al., 2017).

ARDS was primarily responsible for the data collection, processing, and monitoring of Brewer spectrophotometer #172 and led the manuscript preparation. JSR assisted with data collection, contributed to the manuscript, and secured funding. ARW contributed to the manuscript, secured funding, and was the Principal Investigator on the overall grants.

The authors declare that they have no conflict of interest.

Stratospheric ozone and spectral UV baseline monitoring in the United Kingdom has been supported by DEFRA, the Department for Environment, Food, and Rural Affairs, since 2003. The authors would also like to thank Vladimir Savastiouk and two anonymous referees for their constructive and valuable comments.

Edited by: Andrew Sayer
Reviewed by: Vladimir Savastiouk and two anonymous referees