Reducing cloud contamination in AOD measurements

We propose a new cloud screening method for sun photometry that is designed to effectively filter thin clouds. Our method is based on a k-nearest neighbours algorithm instead of scanning timeseries of aerosol optical depth. Using ten years of data from a precision filter radiometer in Innsbruck, we compare our new method with the currently employed screening technique. We exemplify the performance of the two routines in different cloud conditions. While both algorithms agree on the classification of a datapoint as clear or cloudy in the majority of cases, the new routine is found to be more effective in flagging thin clouds. We conclude that this simple method can serve as a valid alternative for cloud detection, and discuss its generalizability to other observation sites.

Our method is based on a k-nearest neighbours algorithm. This principle is well established in the fields of machine learning and data mining as an efficient way to identify outliers in data (Ramaswamy et al., 2000). In the context of AOD measurements, clear sky will lead to regions of high density/little distance between points, whereas clouds will result in less dense regions/outliers.

Instrument and raw data
We use a precision filter radiometer (PFR) developed by the PMOD/WRC Davos for the GAW network (Wehrli, 2005), with four channels (368 nm, 412 nm, 501 nm, 864 nm) and a field of view of 1.2°. The instrument is set up on top of a ten-storey university building in Innsbruck, Austria (47°15' N, 11°24' E). Our operational guidelines are based on those of GAW and the instrument is calibrated by PMOD, but our site runs independently of the network. In addition to the one-minute PFR readings, we measure the air temperature and pressure at the site and monitor the overall cloud conditions with an all-sky camera taking pictures every 10 minutes.

Processing and filtering
First steps in quality control include the removal of data points where any of the four voltages is negative. Furthermore, flags are introduced if the sun tracker records a deviation higher than 15", or if the ambient temperature exceeds 310 K. After the initial filtering, aerosol optical depth (AOD) is calculated from the voltage measurements. Starting from the Lambert-Beer law, the aerosol optical depth is derived as

τ_a(λ) = (1/m) ln[ V_0(λ) / (R² V(λ)) ] − τ_o(λ),

where V(λ) are the measurements, V_0(λ) the calibration factors, R the Sun-Earth distance, and m the airmass in the path between instrument and sun.

Several atmospheric constituents contribute to the optical depth, one of which is aerosols (indicated by subscript a). Other factors (subscript o) which are taken into account in our calculation are Rayleigh scattering (Kasten and Young, 1989; Bodhaine et al., 1999), ozone (Komhyr et al., 1989), and NO2 (Valks et al., 2011). We use climatological values for O3 and NO2, and temperature and pressure measured on site.
Resulting unphysical values (negative or infinite) of the aerosol optical depth are discarded.
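As a minimal sketch of this retrieval and the unphysical-value filter (function and variable names are our own, not those of the operational code):

```python
import numpy as np

def aerosol_od(V, V0, R, m, tau_other):
    """Aerosol optical depth from the Lambert-Beer law.

    V         : measured voltage at one wavelength
    V0        : extraterrestrial calibration factor (at 1 AU)
    R         : Sun-Earth distance in AU
    m         : airmass along the instrument-sun path
    tau_other : summed optical depth of Rayleigh, O3 and NO2
    """
    tau_total = np.log(V0 / (R**2 * V)) / m
    tau_a = tau_total - tau_other
    # discard unphysical (negative or non-finite) values
    return np.where(np.isfinite(tau_a) & (tau_a >= 0), tau_a, np.nan)
```

For example, a voltage corresponding to a total optical depth of 0.4 at airmass 2 with 0.3 contributed by trace gases and Rayleigh scattering yields τ_a = 0.1.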

At each timestep, we perform a linear and a quadratic fit (indicated with subscripts l and q, respectively) to τ_a(λ) to derive the Angstrom parameters (Ångström, 1929, 1964; King and Byrne, 1976).
The spectral slope α_l of the linear fit and the spectral curvature γ_q of the quadratic fit are used in further analysis and are referred to without the subscript hereafter.
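The two fits can be sketched as follows, with the wavelengths in micrometres; the sign conventions chosen here for α and γ are one common choice and may differ from the operational definitions:

```python
import numpy as np

# the four PFR channels in micrometres
wavelengths = np.array([0.368, 0.412, 0.501, 0.864])

def angstrom_params(tau_a):
    """Linear and quadratic fits of ln(tau) vs ln(lambda).

    Returns (alpha, gamma): alpha is minus the slope of the linear
    fit, gamma the leading coefficient of the quadratic fit.
    """
    x = np.log(wavelengths)
    y = np.log(tau_a)
    slope, _ = np.polyfit(x, y, 1)       # linear fit
    quad, _, _ = np.polyfit(x, y, 2)     # quadratic fit
    return -slope, quad
```

For a pure power-law spectrum τ(λ) = β λ^(−α), the routine recovers α exactly and γ vanishes.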

Cloud flagging
The next step in quality control of the data is the flagging of potentially cloud-contaminated datapoints. Figure 1 shows the basic principle of the presently employed scheme and the proposed new method.

Currently, our operational routine is based on the criteria laid out in Smirnov et al. (2000), with some minor adaptations according to Wuttke et al. (2012). The main criterion for filtering datapoints is the difference between the maximum and minimum AOD value within a multiplet of consecutive datapoints (Smirnov et al. (2000) uses a triplet, whereas we look at a quintuplet), which cannot exceed a set value. This threshold is balanced to filter clouds while retaining real AOD variations. As this criterion is the most relevant for flagging, we will refer to it as "Multiplet routine" hereafter.
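The core of the Multiplet criterion can be sketched as a sliding max-min check; the threshold value below is purely illustrative, the operational limit follows Smirnov et al. (2000) and Wuttke et al. (2012):

```python
import numpy as np

def multiplet_flags(tau, window=5, threshold=0.02):
    """Flag points where the max-min spread of AOD within a sliding
    quintuplet of consecutive measurements exceeds a threshold.
    The threshold here is illustrative, not the operational value.
    """
    n = len(tau)
    cloudy = np.zeros(n, dtype=bool)
    for i in range(n - window + 1):
        chunk = tau[i:i + window]
        if chunk.max() - chunk.min() > threshold:
            cloudy[i:i + window] = True  # flag the whole multiplet
    return cloudy
```

Note that a single outlier flags every multiplet it belongs to, which is why isolated spikes remove up to nine surrounding points.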

Instead of step-wise scanning timeseries, our new routine performs one calculation for all currently available datapoints. We use a k-nearest neighbours algorithm to establish the 20 closest points {P_1, P_2, ..., P_20} for each of our measurements P_0 in a four-dimensional space. Then the mean Euclidean distance between P_0 and its neighbours is calculated (referred to as d_20), and P_0 is identified as cloudy if this distance exceeds a threshold. This method is usually used to identify clusters of data points, hence we will call it "Clustering routine".

The dimensions used are the aerosol optical depth at 501 nm, its first derivative with respect to time, and the two Angstrom parameters α and γ. The first two cover temporal variations of one wavelength, the second two changes in the spectrum. To ensure that these parameters are comparable in order of magnitude, the Angstrom parameters derived from equations 2 and 3 are divided by a factor of 10. Furthermore, the finite-difference time derivative of the AOD is used in units of 1 per 5 minutes, analogous to checking the AOD variation within a quintuplet of measurements.
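Under these conventions, the d_20 computation can be sketched with plain NumPy (function and variable names are ours; an operational version might use a k-d tree instead of the full pairwise distance matrix):

```python
import numpy as np

def d20_distances(tau501, alpha, gamma, minutes, k=20):
    """Mean Euclidean distance to the k nearest neighbours in the
    four-dimensional space (tau_501, dtau/dt, alpha/10, gamma/10).
    """
    # finite-difference time derivative, expressed per 5 minutes
    dtau = np.gradient(tau501, minutes) * 5.0
    X = np.column_stack([tau501, dtau, alpha / 10.0, gamma / 10.0])
    # full pairwise distance matrix (fine for one day of data)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)        # exclude the point itself
    knn = np.sort(dist, axis=1)[:, :k]    # k closest neighbours
    return knn.mean(axis=1)
```

A point far from the bulk of the day's measurements, e.g. a sudden AOD jump, receives a markedly larger mean neighbour distance than points inside a dense cluster.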

Points affected by a track error will not be considered in the set of possible nearest neighbours. If fewer than 20 measurements are valid, the number of nearest neighbours k is reduced accordingly, down to a minimum value of 5 points in real-time analysis. To account for the lower number of nearest neighbours, the calculated distance is then multiplied by 20/k to make it comparable to the d_20 measure. Similarly, if the number of datapoints identified as clear on one particular day is lower than 30, the Clustering routine is re-run with 10 nearest neighbours during post-processing to ensure high data retention.
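The dynamic adaptation of k can be sketched as follows (the helper name and argument layout are ours):

```python
import numpy as np

def scaled_knn_distance(sorted_dists, n_valid, k_max=20, k_min=5):
    """Mean distance over a reduced number of neighbours, rescaled
    by k_max/k so it remains comparable to the d_20 measure.

    sorted_dists : ascending distances from one point to all other
                   valid points
    n_valid      : number of valid measurements currently available
    """
    k = min(k_max, max(k_min, n_valid - 1))
    return np.mean(sorted_dists[:k]) * (k_max / k)
```

With only 20 valid points, for instance, k drops to 19 and the mean distance is inflated by 20/19 before the threshold comparison.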

To establish the threshold for possible cloud contamination, we calculate the distribution of d_20 on clear days. We estimate a limit, which is further fine-tuned using days on which the Multiplet routine fails to identify thin clouds. An example (12 March 2020) is given in figure 2. We show the four dimensions of our space, as well as the resulting d_20 of our datapoints.
From these days, we set the d_20 threshold to 0.012.
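A simple starting point for such a limit can be sketched as a high percentile of the clear-day distance distribution; both the function and the percentile are illustrative, since the operational value of 0.012 was fine-tuned manually against days with thin clouds:

```python
import numpy as np

def estimate_threshold(d20_clear_days, percentile=99):
    """Illustrative first guess for the d_20 limit: a high
    percentile of the pooled distance distribution on days known
    to be clear (list of per-day d_20 arrays)."""
    pooled = np.concatenate(d20_clear_days)
    return float(np.percentile(pooled, percentile))
```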

To assess the performance of the Clustering routine, we compare it to the Multiplet routine using the last 10 years of measurements (2010-2019), 3330 days of measurements in total. Of these days, 1906 are found to have clear datapoints.

On a clear day, the routines unsurprisingly agree very well. Clustering retains more points at the beginning and end of the day, which are removed by the airmass limit in the Multiplet routine. On the other hand, some slight outliers in α and γ get flagged by Clustering. The difference in daily mean is smaller than the measurement error.

When thick clouds pass with only short intervals of clear sky in between, Multiplet hardly identifies these intervals as clear. As Clustering takes all available data into account, it can assign a point as clear even if the immediately preceding and following points are deemed cloudy. Despite flagging fewer points, Clustering lowers the daily mean τ_501 by 0.008 in this case, which is of similar magnitude as our measurement error.
On a day with many thin clouds (mainly contrails), the differences between the two routines are pronounced: a few relatively high AOD points in the morning (around 8 am) pass Clustering, as do points during midday (between 10 am and 12 pm). These points, which are spectrally very similar, are indeed cloud free, as confirmed by pictures of the sky camera. Multiplet, however, flags fewer points as cloudy; the retained points show cloud contamination as a decrease in fine-mode fraction in the α-γ plane. For this day, the Clustering-Multiplet difference of daily mean τ_501 is -0.027, which is of the order of the possible bias of Multiplet reported by Chew et al. (2011).

Another example of Clustering being more rigorous in cloud flagging can be seen on the day labelled "Various Clouds". Several optically thick clouds passed, which are identified correctly, but neither their thin edges nor the optically thin clouds on that day are picked up by the Multiplet routine. This day is correctly eliminated by Clustering, despite Multiplet marking 89 datapoints as clear.
Occasionally, Saharan dust is transported to Austria. Despite unusually high AOD, both routines correctly identify most of the data as cloud free. Daily mean τ_501 is slightly lower (-0.005) when using Clustering, but this is still of the order of the calibration error.
One very unusual event is depicted last: after the eruption of Eyjafjallajökull in Iceland in April 2010, its ash plume was dispersed over Europe. It exhibits high AOD, and similar particle radii and fine-mode fraction as Saharan dust. Clustering flags more data due to the high variation of AOD with time, but still retains data in the afternoon after about 1 pm. Unfortunately, we do not have pictures available to estimate whether the data in the morning was cloud free and should therefore have been retained.
Clustering lowers the daily mean τ_501 significantly: -0.057 absolute, or -12% relative. However, such an event is rare enough to be cloud screened manually, if necessary.
Overall, the Clustering routine flags more data than the Multiplet routine, albeit not necessarily the same data points. A more detailed comparison can be seen in figure 4. The Multiplet routine identifies about 47.6% of datapoints as cloudy, Clustering about 50.5%, which is a realistic value considering the amount of sunshine hours Innsbruck receives on average.
As the main objective of the new algorithm was to filter thin clouds which previously passed the quality criteria, a higher number of flagged datapoints overall is expected. On the other hand, Clustering can flag isolated outliers without flagging the preceding and succeeding points of the multiplet, which lowers the number of flagged points. In 88% of all cases, the two methods agree in the (non-)assignment of a cloud flag. Nonetheless, about 10% of the data deemed cloudy by Multiplet is not flagged by Clustering; there are only 10 days where the opposite is the case. Moreover, there are more than 1000 days without clear data in the 10-year record. The number of datapoints on the days which are disregarded by Clustering ranges between 1 and 89. Most of these days would therefore not be considered in further analysis in other measurement networks (Kazadzis et al., 2018; Giles et al., 2018) either. Furthermore, as shown in Figure 3, some of these days should be eliminated, as they are indeed cloud contaminated.

Clustering leads to a lower daily mean AOD on about 63% of the days (Figure 5). The mean difference is -0.0029 for τ_501, which is of the order of the calibration error (0.001 to 0.01, depending on wavelength and airmass). However, on particular days this difference can range from -0.08 to 0.04 in absolute numbers, or -62% to +27% relative to the values based on Multiplet screening. Similarly, Clustering leads to a higher mean α on 67% of the days. Averaged over ten years of data, this leads to an increment of ᾱ by 0.02. In extreme cases, the difference can be as high as +0.54. Both distributions are indicative of Clustering flagging thin clouds more effectively.

We presented a new approach for flagging cloud-contaminated data points from sun photometer measurements by treating them as outliers/regions of low density in a four-dimensional space. Our method successfully tackles the problem of the Multiplet routine by Smirnov et al. (2000) of not detecting optically thin clouds, such as cirrus and contrails.

In contrast to the new routine employed by AERONET (Giles et al., 2018), which introduces more quality parameters (ten in total) and requires an aureole scan, our routine only needs one semi-empirically derived threshold and direct sun measurements for assigning a cloud flag.
While fewer datapoints are retained overall, which is expected from being able to filter thin clouds, the Clustering routine does not just flag more, but different, datapoints (figure 4). As there is an ambiguity in the transition between humidified aerosols and clouds (Koren et al., 2007), an exact discrimination between false positives and negatives is not possible for either routine.
Nonetheless, the new routine leads to lower AOD and higher α in the long term mean, which indicates a reduction of cloud contamination bias.
Detailed comparison with the previously employed cloud screening routine showed that both methods agree in their classification for the vast majority of cases (figure 4). Still, Clustering reduces the mean AOD for most of the days in our testing period (figure 5). The daily mean AOD at 501 nm averaged over the last 10 years is lowered by 0.0029, which is comparable to the instrument precision (Wuttke et al., 2012). However, on single days Clustering reduces the daily mean by more than 0.02 (up to 0.08), which is of the same magnitude as the bias of the Multiplet routine reported by Chew et al. (2011), and exceeds the error of the instrument and trace gas optical depth. Together with the specific example days (figure 3), this supports the notion that Clustering corrects some cloudy points of the Multiplet routine to clear, while flagging some of its erroneously clear points as cloudy. The small difference in the long-term mean is partly due to the specific cloud conditions in Innsbruck, and could therefore be much larger in high-latitude regions with low mean AOD and a higher prevalence of thin clouds.
Due to the nature of the Clustering routine, it needs a minimum number of measurements, though dynamic adaptations can be made if these are not available. As its accuracy increases with the number of datapoints, it is ideal for post-processing.
This does not diminish its usefulness for real-time analysis, as erroneously cloudy points can be cleared later when more data is available, but not the other way round.
While the four dimensions considered in the Clustering routine account for variations of one specific wavelength and in the spectrum, the question arises whether these can be reduced even further. Especially γ, which has the highest error of the variables (Gobbi et al., 2007), might be a candidate. Initial independence tests using conditional mutual information as a measure (Runge et al., 2019) show a strong association of α and γ. However, outliers in γ can appear independently of α, which is why we kept γ as a dimension despite the data constraint it imposes.
So far we have tested the routine only for our specific measurement site. It performs well in different cloud conditions, as shown in figure 3, and was shown to alleviate the AOD bias in the presence of thin clouds. Adaptations regarding the number of nearest neighbours, the relative weight of the different dimensions, or the d_20 threshold can easily be made to optimize cloud detection with other instruments as well.