A simple and versatile cloud-screening method for MAX-DOAS retrievals

C. Gielen, M. Van Roozendael, F. Hendrick, G. Pinardi, T. Vlemmix, V. De Bock, H. De Backer, C. Fayt, C. Hermans, D. Gillotay, and P. Wang Belgian Institute for Space Aeronomy (BIRA-IASB), Brussels, Belgium Department of Geosciences and Remote Sensing, Delft University of Technology, the Netherlands Royal Meteorological Institute of Belgium, Brussels, Belgium Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, China


Introduction
In recent years, ground-based multi-axis differential absorption spectroscopy (MAX-DOAS) has been demonstrated to be ideally suited for the retrieval of tropospheric trace gases and deriving information on aerosol properties (e.g.Hönninger et al., 2004;Wagner et al., 2004;Frieß et al., 2006;Clémer et al., 2010;Hendrick et al., 2014).These measurements are invaluable to our understanding of the physics and chemistry of the atmospheric system, and the impact on the Earth's climate.
MAX-DOAS retrievals of trace-gas columns and aerosol optical depths typically assume clear-sky conditions in the forward model.However, MAX-DOAS measurements are often strongly affected by clouds, leading to significant data quality degradation and larger uncertainties on the retrievals.This, in turn, strongly impairs the use of ground-based retrievals in the context of satellite validation.
In this paper we present a cloud-screening method, based on (MAX-)DOAS measurements, which aims at providing a general qualification of the sky and cloud conditions during the measurements.The data set consists of multi-year observations made at three sites with very different typical meteorological conditions, Xianghe (suburban Beijing, 39.75 • N, 116.96 • E), Brussels (50.78 • N, 4.35 • E) and the alpine station of Jungfraujoch (46.55 • N, 7.98 • E).We focus on 90 • elevation observations for the colour index as our simulations show these are the most sensitive to the sky conditions (see Sect. 3).Moreover, they are independent of the azimuth angle, and are very sensitive to the temporal variability of C. Gielen et al.: A simple and versatile cloud-screening method for MAX-DOAS retrievals.
clouds above the instrument site.The use of the zenith measurements means that the cloud-screening method is not only limited to MAX-DOAS but can also be applied to similar instruments working in the zenith mode only.For the O 4 measurements we also use the 30-90 • elevation measurements, but the method can also be applied if only zenith measurements are available (see Sect. 4.3).
The recent paper of Wagner et al. (2013) described in detail the effect of clouds on the different quantities derived from MAX-DOAS observations, such as the radiance, colour index, O 4 absorption and the Ring effect (the filling-in of Fraunhofer lines due to inelastic scattering on atmospheric molecules).They developed a cloud-screening method based on these effects and on the comparison with clear-sky reference simulations.The method was applied to observations made during the CINDI campaign (Piters et al., 2012), where a good agreement with sky images taken from the ground was found.However, the total data set used contained data of only a limited time span (12 June 2009-15 July 2009).
Our cloud-screening method is similar to the method described in Wagner et al. (2013) but uses a simpler approach.Both methods use colour-index (CI) simulations and the temporal variability of the CI and O 4 absorption, but our method is not based on radiance or O 4 simulations and does not use information from the full MAX-DOAS elevation scan but focuses on the zenith elevation (and in lesser degree the 30 • elevation).Our approach is furthermore based on a general simulation model of the colour index, which is used for all different measurement sites, thereby strongly reducing the computational cost and enhancing the general applicability of the method.
This paper shows that a simple cloud-screening method can be successfully applied to large data sets measured under a wide variety of meteorological conditions, from the extreme polluted atmosphere above Xianghe, the clouddominated Brussels data set, to the pristine alpine skies in Jungfraujoch.
In Sect. 2 the different MAX-DOAS instruments and the DOAS data analysis are described.In Sect. 3 the concept of the colour index and its relationship with sky and cloud conditions are presented.A description of our cloud-screening method and the definition of the cloud-screening flags can be found in Sect. 4. In Sect. 5 the results from the cloud screening at Brussels with co-located thermal infrared cloud-cover measurements are compared.Next, we apply our cloudscreening to aerosol model retrievals.A description of the radiative transfer model and co-located aerosol measurements and the resulting effect of the cloud screening on the agreement between model and measurements can be found in Sect.6.We end with the conclusions in Sect.7.

MAX-DOAS measurements
The MAX-DOAS instrument is a passive DOAS instrument that performs quasi-simultaneous measurements of scattered sunlight for a range of different elevations, from the horizon to the zenith (Hönninger et al., 2004;Platt and Stutz, 2008).This results in an enhanced sensitivity to absorbing species in the lower troposphere compared to zenith observing techniques.

Instrument and site description
This study focuses on MAX-DOAS measurements at three different sites with very different typical meteorological conditions, namely Brussels, Jungfraujoch and Xianghe (suburban Beijing).Xianghe is characterized by a polluted atmosphere, with episodes of extreme aerosol conditions which lead to a very low visibility.The sky over Brussels on the other hand only suffers from mild pollution, but is strongly affected by the presence of clouds.The alpine station of Jungfraujoch experiences almost no aerosol pollution but can suffer from cloudy and snowy conditions.
The instrument in Xianghe (39.75 • N, 116.96 • E) is located about 60 km east of Beijing and points towards the north azimuthal direction, with a 0.8 • field of view.For Xianghe a full MAX-DOAS scan is comprised of nine different elevations angles (2, 4, 6, 8, 10, 12, 15, 30, and 90 • ) and takes about 15 minutes of measurement time.This instrument has been discussed in detail in Clémer et al. (2010); Hendrick et al. (2014).It is a dual-channel instrument composed of two grating spectrometers, covering the UV (300-390 nm) and visible (400-720 nm) wavelength regions.The Xianghe MAX-DOAS instrument has been designed and assembled at the Belgian Institute for Space Aeronomy (BIRA-IASB) in Brussels, and has been continuously running since 2010.
The mini-MAX-DOAS instrument in Brussels (50.78 • N, 4.35 • E), has a shorter wavelength range, limited to 290-435 nm, again pointing north, with a 0.6 • field of view.A full scan goes over 11 elevation angles (2, 3, 4, 5, 6, 8, 10, 12,  15, 30, and 90 • ) and requires approximately 15 minutes.It is a commercial system from Hoffmann Messtechnik GmbH and has been continuously running since 2011.A more detailed description of the instrument can be found in Ma et al. (2013).
The alpine station of Jungfraujoch (46.55 • N, 7.98 • E) is located in the Swiss Alps, with a pointing azimuth of 145 • , at an altitude of 3570 m.A dual-channel UV (300-390 nm) and VIS (400-560 nm) MAX-DOAS instrument has been installed by BIRA-IASB and operational there since 2010.The configuration of the instrument is similar to the one in Xianghe, but it can also reach negative elevation angles pointing down in the valley (−10, −8, −6, −4, −2, and 0 • ).However, we do not use these negative elevation angles in this work.

DOAS data analysis
The first step of the retrieval consists of analysing the MAX-DOAS spectra by making use of the DOAS method (Platt and Stutz, 2008).This method is developed to separate narrow-band differential absorption patterns (which can be related to specific molecules in the atmosphere) from broadband extinction caused by Rayleigh and Mie scattering due to scattering on molecules and particles.The direct products of this technique are differential slant column densities (DSCDs), i.e. the integrated concentration of absorbing molecular species along the effective light path relative to the integrated concentration along the average light path of a reference spectrum.To analyse the MAX-DOAS spectra the spectral-fitting software package QDOAS is used (http://uv-vis.aeronomie.be/software/QDOAS/). Information on aerosol characteristics, i.e.AOD and extinction profile, is obtained using O 4 DSCDs (see Sect. 6.1).This is possible since the vertical distribution of O 4 is well known and nearly constant, as it varies with the square of the O 2 monomer.Deviations of the O 4 DSCD from values representative for a clear sky are often caused by aerosols or clouds.Measurements of the O 4 DSCD can therefore be used for the retrieval of aerosols (Hönninger et al., 2004;Wagner et al., 2004;Frieß et al., 2006).These DSCDs are retrieved in the UV (338-370 nm) for Brussels, Jungfraujoch and Xianghe, and in the VIS for Xianghe and Jungfraujoch (425-490 nm), using the O 4 cross-sections from Hermans et al. (2003).These wavelength ranges are the most sensitive to O 4 absorption (Roscoe et al., 2010), and have minimal interference from other absorbing species.Other trace gases used for the fitting include NO 2 , O 3 , H 2 O, HCHO and BrO, along with a Ring spectrum.For the observed broad-band extinction a fifth-order polynomial is used.A detailed description of the QDOAS setting for aerosol retrievals can be found in Clémer et al. (2010).

The colour index
To characterize the sky conditions at the different measurement sites we develop a cloud-screening method based on two different measured quantities: the colour index (CI) of the sky and the O 4 DSCDs.The CI is defined as the ratio of the intensity of a measured spectrum at two wavelengths, and gives information on the observed colour of the sky.Since, during the daytime, the sky colour changes from blue during clear skies to white/gray when clouds or aerosols are present, we can use the CI to qualify the sky condition.This becomes increasingly difficult for high SZA values, as the sky colour varies, even for clear skies.
The CI for Xianghe, Brussels, and Jungfraujoch are defined as I 405 /I 670 , I 347 /I 420 , and I 405 /I 550 respectively, with I x the median intensity over the [x − 5 nm, x + 5 nm] wavelength range, to reduce the effect of spectral noise on the derived intensity values.The wavelength regions were chosen to obtain the largest spectral contrast, i.e. they span the largest wavelength range possible for the respective instrument, and avoid the influence of strong atmospheric spectral features.
As can be seen for Xianghe in Fig. 1, the CI shows a clear pattern depending on the observed meteorological conditions.For clear skies the CI values are high, due to the wavelength dependence of Rayleigh scattering, and they decrease with increasing aerosol load (Fig. 1:day 35) since scattering on aerosol and cloud particles is less wavelength dependent.We also see a clear separation between the different elevation angles of the observations.The highest CI values can be found for spectra with the highest elevation angles, whereas low elevation angles show lower values and spread.In the case of an extreme aerosol load or full cloud cover, the CI values are all clustered around a constant value (Fig. 1: days 114-115).In the case of broken or scattered clouds, the CI shows a very variable temporal behaviour (Fig. 1: day 256).
Simulations of the CI corroborate the observed decrease of the CI in the presence of clouds and aerosols, as can be seen in Fig. 2.These simulations were made with the DAK (doubling-adding KNMI code) radiative transfer model (Stammes et al., 1989;Stammes, 2001, http://www.knmi.nl/~stammes/DAK/Manual_DAKver312.pdf)under varying aerosol and cloud optical depths, and varying parameters such as wavelength, elevation, SZA and azimuth angle.For the aerosols a homogeneous layer up to 1 km with a single scattering albedo of 0.9 and asymmetry parameter of 0.7 was used, for the clouds these values are respectively 1.0 and 0.85.The cloud base height was set at 1 km, with a total thickness of 1 km, a surface albedo of 0.05 was used, and atmospheric Rayleigh scattering and ozone absorption were included.We also tested the effect of varying the cloud base height, ranging from 1 km to 8 km, but found very little influence on the derived CI values, especially for higher elevation angles.
These simulations show that it is very difficult to distinguish between aerosols and clouds using only CI information.For this reason also information from the observed O 4 DSCDs will be used, which will be discussed in a later section (Sect.4.3).
Fig. 3, which presents simulations of the CI for the three different wavelength ratios used for the different measurement sites, shows that the CI derived from spectra with low elevation angles have a much narrower spread regarding different aerosol settings, making it difficult to distinguish between the different parameters.These simulations furthermore show that the same problem of overlapping simulations occurs for observations taken at SZA> 85 • .For this reason we exclude these data from our study.The simulations at 90 • elevation show a narrower spread for lower SZA values ( 40 • ), compared to the 15 • and 30 • elevation angle.However, at larger SZA ( 55 • ) the situation is reversed.The same result is found for simulations made under different cloud optical depth settings.
As we only have little observations made at low SZA (< 40 • ), we therefore choose the 90 • as the best elevation for our further study.In principle, the method can be extended in a similar way to include CI (and O 4 DSCD) information from multiple elevation angles, with the realization that the higher elevation angles will give the best constraints.This is discussed briefly in Sect.6.1.We restrict ourselves to only one elevation angle for the sake of simplicity and to show that the method already works well with this restriction.The zenith elevation further has the advantage of not depending on the viewing azimuth of the instrument, which simplifies the computational effort for the CI simulations if data sets from instrument with different pointing direction are used.
The resulting calculated zenith CI values for the full Xianghe, Brussels and Jungfraujoch data sets can be found in Fig. 4. All sites show a frequency distribution with a clear peak at the lowest CI values, corresponding to observations taken under non-clear-sky conditions.For Jungfraujoch we see a more bimodal distribution, where the small peak at higher CI values corresponds to a larger frequency of clear days, compared to the other sites.The differences in the observed CI values between the different sites are due to the different wavelength ranges used for the CI calculation and the differences in instrumental response.
It is important to investigate the behaviour of the CI over time to spot variations in the CI which are due to instrumental issues, such as a shift in instrumental response after technical difficulties, changes in set-up, or instrument degradation.If clear CI variations are spotted that can be linked to instrumental issues, it is important to correct for this.
An example of this can be seen in Fig. 5: an instrumental failure at Brussels on the 20th of May resulted in a strong downward shift of the CI values.We corrected for this by shifting the CI values after the failure in such a way that the peak values of the histograms of CI values before and after the incident coincide.It is clear that sufficient data need to be present to make an accurate correction.

The cloud-screening method
To characterize the sky conditions we define three different flags: the sky flag, the broken-cloud flag, and the multiplescattering flag.The sky flag defines the general sky conditions in terms of visibility -i.e.clear, mediocre, and bad.This flag does not distinguish between a visibility reduction due to clouds or aerosols.The broken-cloud flag denotes the presence of broken or scattered clouds.The third flag, which is based on the O 4 DSCDs and not the CI, marks the presence of enhanced multiple scattering in the line-of-sight, which we attribute to the presence of thick clouds.

The sky flag
For the sky flag we define three regions of CI values which are linked to general sky conditions in terms of visibilityi.e.clear, mediocre, and bad.To do this the calculated CI values are normalized between 0 and 1, to remove as much as possible of the wavelength and instrumental effects on the CI between the different measurement sites.This can only be done if enough data are available to have observations during both very clear and low-visibility sky conditions, which allows the determination of the minimum and maximum CI values.
To constrain the three regions, the full set of normalized CI values is compared with a grid of pre-calculated CI simulations, which we scale to give the best match with the observed spread in CI.For this the plane-parallel DAK simulations described in Sect. 3 are used, which are calculated for a range of different wavelengths, corresponding to the ones used for the CI calculation at the different sites.We do not fine-tune other model parameters such as surface albedo to the different site characteristics to minimize the computational effort.The simulations are scaled in such a way that the peak of the normalized CI frequency distribution corresponds to the clustering of simulations with high aerosol and/or cloud optical depth (AOD/COD), and so that the simulation with the lowest simulated aerosol optical depth (AOD = 0.05) follows the top of the normalized measured CI values.Additional AOD information from co-located instruments, such as a Cimel sun photometer (Holben et al., 2001), Brewer spectrophotometer (Cheymol and de Backer, 2003;De Bock et al., 2010) or solar irradiance instruments (Nyeki et al., 2012), is used to validate the procedure and make small adjustments in the scaling.As can be seen in Fig. 6, the distribution of scaled CI simulations corresponds well to the observed CI values and measured AOD values.
We then take the scaled simulation made with AOD = 0.15 and COD = 0.0 (green-diamond line in Fig. 6) as the limit to separate the "good" and "mediocre" region, as the simulation predicts that data above this curve were taken under cloud-free conditions with an extremely low aerosol load.This is further corroborated by comparison with co-located AOD measurements, as explained above.CI values above this curve are therefore flagged as made under "good" visibility conditions.
To separate between the "mediocre" and "bad" regions we define a horizontal line in such a way that the peak of the frequency distribution falls in the "bad" region.More specifically, we place the line at a distance of FWHM (full width at half maximum) from the peak position of the histogram.If x and y respectively denote the CI values and the frequency distribution, then the x position of the limit is x bad = x(y max ) + FWHM(y).Note that this is of course only valid if the peak of measured CI values is associated with cloudy conditions.For sites with very clear skies and only little cloudy measurements a reverse approach could be taken.In this case a similar definition using the peak distribution could be used to define the "good" regime and the "bad" regime by comparing with simulations.
The resulting "good", "mediocre", and "bad" regions can be seen as, respectively, the green, orange and red regions in Fig. 7.
These regions correspond to different visibility conditions.Data flagged as "good" are taken under relatively clear conditions, i.e. very low aerosol and cloud optical depth."Mediocre" data represents data under sky conditions with slightly decreased visibility, i.e. thin clouds and/or moderate aerosol pollution.Data with a "bad" flag points to the presence of thick clouds and/or extreme aerosol conditions.

The broken-cloud flag
To determine the presence of broken (semi-continuous cloud cover) or scattered clouds (predominantly clear sky) in the line-of-sight of measurement, the temporal variability of the CI is studied.As could already be seen in Fig. 1, the CI remains very stable for clear skies, skies with aerosol pollution, and skies with a full cloud cover, but in the presence of scattered clouds, the CI shows large drops in value when a cloud passes over.rapid temporal variability in the CI.For these days it was found that the observed jumps in CI predominantly fall above this cut-off value.These outliers are flagged as observations made under scattered/broken-cloud conditions.Examples of this modelling and outlier determination can be found in Fig. 8.
Note that this broken-cloud (BC) flag does not give any information about the presence of a full cloud cover, since this will not give rise to strong temporal variation in the CI values.The influence of aerosol variability on the temporal variation of the CI typically gives rise to a smoother increase or decrease, and will not give rise to the strong temporal jumps seen for cloud contamination.

The multiple-scattering flag
As discussed in the previous sections, the CI alone is not enough to distinguish between the presence of visibility reduction due to clouds or to aerosols.To partly resolve this problem we also define an additional constraint based on the measured O 4 DSCDs, which provide information on the effective light path of scattered photons.Clouds can have an increasing or decreasing effect on the O 4 DSCD value, with respect to clear-sky conditions.The first typically occurs for optically thick clouds, due to enhanced multiple scattering in the cloud layer.Optically thin clouds at high altitudes can also lead to an increase, but only for measurements under low elevation angles.Thin clouds at low altitudes tend to decrease the O 4 DSCDs at all elevation angles (Wagner et al., 2011).An increase in aerosol load will also affect the O 4 DSCDs, and will lead to a decrease in observed spread for the different elevation angles, as can clearly be seen in Fig. 1a.
Since both clouds and aerosols can thus have a very complex effect on the O 4 absorption, which can only be investigated in detail by comparing with radiative transfer models (as done in Wagner et al., 2013), we opt to only study the temporal variation of the measured O 4 DSCDs.Strong temporal variability due to enhanced multiple scattering commonly occurs in optically thick clouds, and is seen less for high aerosol optical depth, as illustrated by Fig. 1.
To study the temporal variability a similar procedure as for the detection of broken clouds is applied.Since we are not interested in slow and smooth changes in O 4 absorption, such as the observed diurnal trend, the DSCD measured at zenith is subtracted from the DSCDs at lower elevation angles α.This technique is commonly used in MAX-DOAS retrieval studies (e.g.Clémer et al., 2010;Hendrick et al., 2014), as it effectively removes the (negligible) stratospheric contribution to the O 4 absorption (Hönninger et al., 2004).Here, it has the advantage of removing the very strong diurnal trend, which hinders our modelling and outlier detection.
We then again model the resulting O 4 (α −90 • ) DSCDs with a double-sine function f (t) (see Fig. 9), and define an outlier as points with |(O 4 (t) − f (t))/f (t)| > 0.2.We then make the assumption that these outliers are affected by multiple scattering due to clouds.This multiple-scattering (MS) flag can be defined for measurements at each elevation angle, but here we will focus further only on the 30-90 • elevation scan, as the 30 • elevation is closest to zenith and thus will encounter the lowest temporal cloud variation.In the case of zenith-pointing DOAS instruments, one could use only the zenith O 4 data, but then a model curve suited to fit the strong diurnal variation needs to be chosen.

Flag comparison
A good agreement is found between the O 4 -based flag and information derived from the CI-based flags.Both types of flags can be used to mark cloudy data, i.e. data with enhanced multiple scattering on the one hand and data with a "bad" sky flag or broken-cloud flag on the other.Ideally, the same data points should be flagged as cloudy by both flag types.However, the multiple-scattering flag will be most sensitive to clouds with a high optical depth, whereas the colour index is also sensitive to clouds with a lower cloud optical depth, as even such clouds quickly change the observed sky colour.
For Brussels this indeed seems to be the case and a good agreement of 85 % is found; for Jungfraujoch, however, only about 60 % of points get marked as cloudy by both flag types.
The Xianghe data set shows the limitations of cloud screening using only the CI flags: when using the CI flags 20 % more data are marked as cloudy in comparison to the multiple-scattering O 4 flag.This is due to the fact that events of very high aerosol optical depth (AOD> 2) are wrongly flagged as cloudy.For sites where such events occur regularly, the multiple-scattering flag seems to be a better choice to detect thick clouds.However, since the multiple-scattering flag requires a much larger computational effort compared to the CI flags and it adds little additional information for low or mild aerosol pollution, we opt not to use it for sites with low or moderate aerosol pollution.A comparison between the effect of the different flags on the measurements can be found in Sect.6.
As MAX-DOAS measurements have the benefit of multiple viewing angles, one could extend the zenith-viewing method proposed here to other elevation angles.From our simulations it is clear that the highest elevation angles are best suited for this.Different viewing angles will mainly be sensitive to broken or scattered clouds, as the flagging in the case of clear or overcast days will give the same results for each elevation angle.Sites which experience a lot of broken clouds including CI flags from for example the 30 • elevation angle will therefore be more likely to correctly identify the presence of clouds.We briefly discuss the effect on this on our retrievals in Sect.6.

Comparison with infrared cloud-cover measurements
In order to validate the previously defined flags, the cloudscreening results for Brussels are compared with thermal infrared cloud-cover measurements.The Brussels site has access to an infrared pyrometer, which determines the total cloud-cover fraction based on temperature data over a field of view of 6 • (Gillotay et al., 2001).The method works well to describe most cloudy conditions, with the exception of cirrus clouds with variable emissivity.The total cloud-cover fraction is defined as the ratio between the observed cloudy solid angle elements and clear-sky elements.
In Fig. 10 the total cloud-cover fraction values for an example day can be seen, where we colour-marked our different CI flagging results.High cloud-cover fractions are systematically flagged with a "bad" sky flag, whereas low cloudcover data correspond to "good/mediocre" sky flags.These results are summarized in Fig. 11 where the distribution of cloud-cover values for the full Brussels data set (∼ 2.5 years) is given, together with the distribution of points with corresponding sky flags.In the bottom plot the fraction of our sky flag results over the cloud-cover values are shown.Data with high cloud-cover fractions (> 60 %) are generally flagged as "bad" by our cloud screening, whereas 80 % points with a cloud-cover fraction < 20 % are flagged as "good/mediocre with no broken clouds".
The same exercise was performed for the multiplescattering flag, and we again find that data with high cloudcover fractions are typically flagged as having a multiple scattering.This can again be seen in Fig. 11, where the blue line denotes the distribution of the MS flag over the cloudcover percentages.We find that compared to the flags derived from the CI, more data with low cloud-cover values are flagged as having multiple scattering, i.e. being cloudy.
This shows that there is a good agreement between our cloud screening and the cloud-cover determination.One has to take into account that the field-of-view of the thermal infrared instrument is significantly larger than that of the MAX-DOAS, and thus that different areas of the sky are measured.Also, the value of our sky flag does not only depend on the presence of clouds, but also on the aerosol content.This means that cloud-free measurements can still be flagged as "mediocre" (or even "bad") if aerosols are present.

Application to aerosol model retrievals
With the flags defined in the previous section we proceed to define a cloud-screening procedure for our MAX-DOAS retrievals.For this particular study we are interested in the aerosol load of the atmosphere, which means data too strongly affected by clouds need to be removed, but measurements made under polluted conditions should be retained.
For this reason we investigate data with a sky flag of either "good" or "mediocre", since a sky flag of "bad" is most commonly found for fully cloudy conditions.To remove data which are affected by scattered or broken clouds, we only keep data which are not flagged in the broken-cloud screening.This effectively removes all the cloud-contaminated data from our data sets.However, it also removes data under extreme aerosol conditions, as often found in Xianghe, as those measurements will also be flagged as "bad" in the sky-flag determination.To resolve this issue, we use the multiplescattering flag to differentiate between measurements made under conditions with either high AOD or high COD.

AOD model retrievals and measurements
To study the effect of our cloud-screening method, AOD values retrieved by MAX-DOAS are compared to co-located AOD measurements.For Xianghe and Brussels we use AERONET Level 1.0 (unscreened) (and 1.5, cloud-screened) data, and for the Brussels site we extend the comparison with co-located Brewer spectrophotometer measurements at 320 nm (Brewer instruments #16 and #178).A detailed description of the co-located instruments and measurements can be found in Cheymol and de Backer (2003), De Bock In red we plot data with a "bad" sky flag, in orange data with a "good/med" sky flag and a broken-cloud flag, and green denotes points with a "good/med" sky flag but no brokencloud flag.In blue we mark data with a multiple-scattering flag.Bottom: fraction of total cloud-cover values as distributed over our different flag values.et al. ( 2010) and Holben et al. (2001).At Jungfraujoch no AERONET instrument is available, so we used AOD values as measured by the Precision Filter Radiometer Network (Nyeki et al., 2012) in the context of the Global Atmosphere Watch -World Data Center for Aerosol (GAW-WDCA) programme of the World Meteorological Organization (WMO) (ebas.nilu.no).
For the MAX-DOAS aerosol retrievals the bePRO radiative transfer code (Clémer et al., 2010) is used, which is an inversion algorithm based on the optimal estimation method (Rodgers, 2000, OEM).The model uses the observed MAX-DOAS O 4 DSCDs to derive the vertical profiles of the aerosol extinction at different wavelengths.For Xianghe and Jungfraujoch, we focus on the 360 and 477 nm wavelengths, whereas for Brussels we only have access to the 360 nm wavelength due to the shorter instrumental range.A detailed description of the bePRO algorithm and parameters can be found in Clémer et al. (2010).It is found that the model is most sensitive to aerosols close to the surface, below 1 km, and typically contains about 1-2 pieces of independent information (DFS, degrees of freedom for signal).As discussed in Clémer et al. (2010) a correction factor of 0.8 was applied to the measured O 4 DSCDs for the Xianghe data set.This correction factor is needed to account for the observed offset between the simulated O 4 DSCDs and the measured values.For both the Brussels and Jungfraujoch data sets we did not find an improvement using the 0.8 correction factor, so we did not apply it here.At this point it is unclear what the origin is of this observed discrepancy between the measured and modelled O 4 DSCDs.
Only those results where the retrieved O 4 DSCDs have a percentage root mean square difference (RMSD) < 50 % from the measured DSCDs are kept for the further study, since high RMSD values typically point to a failure of the model during the retrieval process.We define the RMSD as . For Jungfraujoch we use a more strict cut-off value of RMSD < 30 % since we found that our modelling was less successful and stable for this data set.

Impact on aerosol retrievals
We perform the cloud screening by defining data as cloudy if: the sky flag is "bad" or it has a broken-cloud flag when using the CI-based flags, or it has a multiple-scattering flag when using the O 4 -based flag.These constraints generally correctly flag the presence of thick clouds, but are less reliable when thin clouds are present, as the discrimination between thin clouds and low/mild aerosol pollution is not straightforward.
The results of our AOD retrievals and cloud screening can be found in Fig. 12 and the online supplementary material.In these figures the cloud-screened and co-located AOD measurements (black) and the cloud-screened bePRO retrievals are shown, colour-coded with the different flag values.A removal of data with evidence for the presence of clouds, be it either based on the sky and broken-cloud flag or the multiplescattering flag, results in a much better agreement with the AOD measurements and retrievals.From Fig. 12 it is clear that "bad" data on average have higher AOD values.This is due to the fact that our bePRO model tries to model the observed optical depth increase caused by the clouds with aerosol optical depth, as clouds are not present in the model.
For the Xianghe data set it is also clear that a cloud screening based on the colour index alone, i.e. the sky and brokencloud flag, often removes data under extreme aerosol conditions.When using the CI flags, about 60 % of data are removed, whereas the O 4 -based flag only removes about 35 % of the data.Additional information from the O 4 DSCDs, i.e. the multiple-scattering flag, is needed to make sure we can differentiate between high AOD and high COD.This problem does not arise for Brussels and Jungfraujoch where a similar amount of data is flagged as cloudy by both methods, since here the AOD typically does not reach values above 1.5.It is also clear from Fig. 12 that a correct broken-cloud identification is much more important at the Brussels site, where clouds contaminate almost 70 % of the data.The Jungfraujoch data set clearly shows the low aerosol levels of the alpine site (AOD < 0.2), with significantly less cloudy conditions (∼ 45 %) compared to Brussels.An overview of the statistics of data removal by the cloud-screening method can be found in Table 1.
We also compared the distribution of AOD measurements and retrievals, as presented in Fig. 13.It can be seen that the cloud screening does not drastically change the observed shape and peak in the frequency distribution.For Xianghe we find a significant difference in shape between the 360 and 477 nm retrievals, with the former giving bePRO retrievals peaking at AOD = 0.05, whereas the latter peak around 0.2.The difference between our bePRO retrievals and co-located AERONET measurements is about 0.1, with a shift of the retrievals to the left for 360 nm, but to the right for the 477 nm data.This difference in AOD for different wavelengths was already found for a similar study of MAX-DOAS aerosol be-PRO retrievals (Clémer et al., 2010) and could possibly be due to fitting difficulties during the DOAS retrievals, or uncertainties on the aerosol phase function used in the bePRO model.For Jungfraujoch we find very low AOD values, with a more double-peaked shape in the bePRO retrievals.For both wavelengths the retrievals peak at AOD = 0.01-0.02,but with a much narrower distribution at 477 nm.The colocated AOD measurements peak at AOD = 0.005 and 0.03 respectively for 360 and 477 nm.For the Brussels data set we again find that the cloud screening mostly removes data at higher AOD values, changing the observed right shoulder of the distribution.The bePRO retrievals and co-located measurements show a similar peak position, but the latter show a slightly broader distribution.A more detailed description of the bePRO modelling and retrieved aerosol properties for the different sites will be presented in an upcoming paper.

Correlation with co-located measurements
Correlation plots between our retrievals and co-located AOD measurements (AERONET/Brewer/solar irradiance spectra) can be found in Fig. 14.For the correlation study we averaged our retrievals in time steps of 0.2 h for Xianghe and Brussels.For Jungfraujoch the co-located measurements are hourly averages so we used the same 1 h time step.Averages were given a "bad" sky flag if ≥ 20 % of individual points had a "bad" sky flag, and a broken-cloud flag if ≥ 10 % of points had a broken-cloud flag.For the multiple-scattering flag we flagged the averages if more than one data point was flagged.
It should be noted that the AOD measurements (AERONET, Brewer, Cimel) themselves typically only operate when direct-sun observations can be made, which effectively removes a large part of data to use in the correlation study.For our correlation study we use non-cloud-screened co-located AOD measurements, a description of which can be found in Smirnov et al. (2000), Cheymol and de Backer (2003), De Bock et al. (2010) and Nyeki et al. (2012).
For Xianghe, about 46 % of points with coincident colocated measurements for the correlation study remain.For Brussels and Jungfraujoch this is around 20 %.This large removal of data is not only due to direct-sun restrictions but also long-time inoperativeness of the AERONET/Brewer instruments.Another note of caution is that the MAX-DOAS and other AOD-measuring instruments have different Example results of our bePRO AOD retrievals (crosses) compared to co-located AOD measurements (black diamonds/asterisks for non-screened/screened respectively).The different colours used for the retrievals denote the different cloud-screening results.Data with a "bad" sky flag are in red, data with a "good" or "mediocre" sky flag are in orange, data with a "good" or "mediocre" sky flag plus no broken-cloud flag are in green, and data with no multiple-scattering flag are in blue.More results can be found in the online supplementary material.
viewing directions, and might thus trace regions with slightly different cloud and aerosol characteristics.
We find a good agreement between our cloud flagging and the absence of AERONET/Brewer data.For Brussels ∼ 75 % of data without coincident measurements are flagged as cloudy, for Xianghe this number goes up to 80 % and for Jungfraujoch around 65 % of data with no co-located measurements are flagged as cloudy.A large percentage of the remaining data without co-located measurement but no cloud flag from our method can be attributed to instrumental inoperability.
The correlation plots in Fig. 14 show the linear regression results using only the CI information (green/orange crosses) and the results using only the O 4 DSCD multiple-scattering information (blue diamonds).As already mentioned above, without the multiple-scattering flag from O 4 absorption we remove non-cloudy data under extreme aerosol conditions, which is especially important for the Xianghe data set.
For the Xianghe data set we find high correlation coefficients R for the non-cloud-screened data.This is due to the fact that this site has only little influence from clouds, especially in comparison to Brussels, as can be seen in Figure 12.For both 360 and 477 nm we have a correlation value of ∼ R = 0.86, and also the linear regression slopes S are very close to S = 1.For both wavelengths the cloud screening based on the CI (green crosses) slightly increases the correlation, with correlation values changing from R = 0.86 to R = 0.89.We see a difference between the two wavelengths: at 360 nm our model seems to overestimate the AOD in comparison to AERONET, whereas the opposite occurs at 477 nm.Applying the cloud screening improves the slope at 477 nm (from S = 1.21 tot S = 0.91), but worsens the slope at 360 nm (from 0.95 to S = 0.78).For both wavelengths it can be seen that a cloud screening using the multiple-scattering flag improves the observed correlation slightly with ∼ 0.01.It is clear that this method includes an area of high AOD values which were not covered using only the CI-based cloud screening, and that for sites that experience high AOD levels the CI (alone) is not enough for a correct cloud detection.Overall, the effect of our cloud screening on the Xianghe data is only minimal, which is mainly due to the fact that the site is not highly affected by clouds.
For Brussels we find much lower correlation values between our results and AERONET data.Without cloud screening, the correlation coefficient has a value of R = 0.37.When we remove data with a "bad" sky flag, the correlation coefficient increases to R = 0.56, and it increases even more when we also remove data contaminated by broken clouds, reaching a value of R = 0.62.Applying the cloud screening also changes the observed slope from S = 0.60 to S = 0.83 for the 0.2 h averaged time step.However, we find that even though the O 4 -based cloud screening improves the correlation com-pared to the non-cloud-screened data, it does not give as good R values as the CI-based cloud screening.
When comparing the retrievals with the co-located Brewer measurements we do not expect to find an exact one-toone correlation as the AOD values are determined at different wavelengths, respectively 320 nm and 360 nm for the Brewer and MAX-DOAS measurements.However, as we expect the AOD to have similar temporal behaviour at both wavelengths we can still study the observed correlation.We find a strong improvement in the observed correlation after applying our cloud screening, resulting in an increase from R = 0.53 to 0.7.
The extremely low AOD values for Jungfraujoch make it difficult to comment on the agreement between the modelled AOD values and co-located GAW-WDCA measurements, as most of the points are clustered around AOD = 0.02.Our bePRO retrievals in some cases strongly overestimate the AOD, which skews the observed correlation and slope to extreme values of R = 0.0-0.08 and S = 0. Applying the CIbased cloud screening only minimally improves the retrieved Table 1.Overview of the number of retrieval points removed by the cloud-screening procedure.We give both the statistics for CI cloud screening based only on the zenith elevation angle (top) and cloud screening based on both the 30 and 90 • elevation angles (bottom).The first column shows the respective site and wavelength, the second column gives the total number of points before cloud screening, the third column provides the percentage of data remaining after removing data with a "bad" CI flag, the fourth after removing data with a "bad" CI flag and a broken-cloud flag.The last column gives the percentage of data remaining after removal of data with a multiple-scattering flag.correlation to R = 0.15.We find that the O 4 -based cloud screening removes a similar amount of data points but does not drastically improves the correlation or slope.However, it is clear that the intrinsic uncertainties of the bePRO retrievals are too big in the case of very low AOD values to make strong conclusions on the observed correlations.
In the supplementary material we also show the correlation between our AOD retrievals and co-located measurements,

Figure 1 .
Figure 1.Comparison of 4 days at Xianghe with distinct meteorological conditions.The top box shows the measured AERONET AOD at 477 nm, both the non-screened (level 10) and cloud-screened (level 15) data, the middle box the measured MAX-DOAS O 4 DSCDS (in units of 10 40 molec 2 cm −5 ) and the bottom box the calculated colour index.Different colours represent the different MAX-DOAS elevation angles.Fractional day is always given in UT time.

Figure 2 .
Figure 2. Simulations of the colour index (I 405 /I 670 ) under varying aerosol optical depth (AOD) and cloud optical depth (COD).The simulations were performed with the DAK plan-parallel radiative transfer model(Stammes, 2001), using a cloud-layer height of 1 km.

Figure 3 .
Figure3.Simulations of the colour index under varying AOD and using the wavelengths corresponding to the three different measurement sites, respectively Xianghe, Brussels and Jungfraujoch from top to bottom.The simulations were performed with the DAK planparallel radiative transfer model(Stammes, 2001), using a cloudlayer height of 1 km and a cloud optical depth of 0.

Figure 4 .
Figure 4.The calculated zenith CI values and frequency distribution for the full Xianghe, Brussels, and Jungfraujoch data sets.

Figure 5 .
Figure 5. Illustration of the CI correction for Brussels due to an instrumental failure on May 20th 2012.The top panel shows the CI values and histograms before and after the incident in black and blue respectively.The bottom panel shows the corrected CI values.This was done by shifting the blue points in such a way that both histograms have the same peak value.

Figure 6 .
Figure 6.The normalized CI values (points) versus solar zenith angle, together with the scaled CI simulations (coloured lines) made under different aerosol and cloud optical depth values (left/middle legend).We colour-marked the observed CI values with additional AOD data if available (right legend).

Figure 7 .
Figure 7.The normalized CI values (points) versus solar zenith angle.The green, orange and red regions correspond to the "good", "mediocre" and "bad" regions as defined by the sky flag.

Figure 8 .
Figure 8. Results of the CI modelling (red line) to the measured CI values (black diamonds) and outlier detection (blue crosses) for the broken-cloud flagging, for example days in Xianghe and Brussels.

Figure 9 .
Figure 9. Top panel: results of the O 4 DSCD modelling (green line) to the measured O 4 values (black crosses) and outlier detection (blue diamonds) for the multiple-scattering flagging, for an example day in Brussels.For comparison we also show the results of the CI flagging (bottom panel).The CI values (asterisks) are coloured according to their CI flag.Outliers from the broken-cloud flagging are marked with a blue diamond.

Figure 10 .
Figure 10.Total cloud-cover values from thermal infrared measurements are given in black points.Top: overplotted in coloured crosses are the respective CI-flag values as derived from our cloudscreening method (green/orange/red = good/med/bad).Data with a broken-cloud flag are marked with a magenta diamond.Bottom: in blue asterisks we plot data points with a multiple-scattering flag.

Figure 11 .
Figure11.Top: distribution of the cloud-cover percentages for the full Brussels data set in black, with the division of data points with different sky flags.In red we plot data with a "bad" sky flag, in orange data with a "good/med" sky flag and a broken-cloud flag, and green denotes points with a "good/med" sky flag but no brokencloud flag.In blue we mark data with a multiple-scattering flag.Bottom: fraction of total cloud-cover values as distributed over our different flag values.

Figure
Figure 12.Example results of our bePRO AOD retrievals (crosses) compared to co-located AOD measurements (black diamonds/asterisks for non-screened/screened respectively).The different colours used for the retrievals denote the different cloud-screening results.Data with a "bad" sky flag are in red, data with a "good" or "mediocre" sky flag are in orange, data with a "good" or "mediocre" sky flag plus no broken-cloud flag are in green, and data with no multiple-scattering flag are in blue.More results can be found in the online supplementary material.

Figure 13 .
Figure13.Histograms of our bePRO AOD retrievals and co-located AOD measurements for the three data sets.The bePRO retrievals are plotted in full lines, with black, red, green and blue respectively the unscreened data, data with "bad" CI flag, "good" CI flag and no brokencloud flag, and no multiple-scattering flag.The co-located measurements are plotted in dashed lines, with magenta being the unscreened data and dark green the cloud-screened data.

Figure 14 .
Figure 14.Correlation plots of our bePRO MAX-DOAS AOD retrievals and measured AOD values for the Xianghe and Jungfraujoch data set at 360 and 477 nm and for the Brussels at 360 nm, in time steps of 0.2 h. for Xianghe and Brussels and 1 h for Xianghe.The full noncloud-screening data is given by black crosses.Cloud-screened data (based on the CI) with a "good/mediocre" sky flag are marked in orange, data with "good/mediocre" sky flag and no broken-cloud flag are marked in green crosses.Data with no multiple-scattering flag (based on the O 4 DSCDs) are marked with blue diamonds.For each sample set we also give the linear regression lines and correlation information.