A broad range of Bayesian cloud detection schemes is applied to measurements from the Medium Resolution Imaging Spectrometer (MERIS), the Advanced Along-Track Scanning Radiometer (AATSR), and their combination. The cloud detection schemes were designed to be numerically efficient and suited for processing large amounts of data. Results from the classical and naive approaches to Bayesian cloud masking are discussed for MERIS and AATSR as well as for their combination. A sensitivity study on the resolution of multidimensional histograms, which were post-processed by Gaussian smoothing, shows how theoretically insufficient amounts of truth data can be used to set up accurate classical Bayesian cloud masks. Sets of exploited features from single and derived channels are numerically optimized, and results for naive and classical Bayesian cloud masks are presented. The application of the Bayesian approach is discussed in terms of reproducing existing algorithms, enhancing existing algorithms, increasing the robustness of existing algorithms, and setting up new classification schemes based on manually classified scenes.

Cloud masking of Earth observation measurements is an important and often crucial part of various remote sensing retrievals. This includes, but is not limited to, the retrieval of cloud and aerosol microphysical parameters, the estimation of cloud cover, ocean color retrievals, and, in general, algorithms which include atmospheric correction schemes. Cloud masking algorithms differ widely in their complexity, computational requirements, and assumptions about what a cloud is and which physical process is exploited for its detection. Implementations of particular algorithms are often application specific, which makes the resulting cloud masks application specific as well and generally complicates the inter-comparison of results from different cloud masks.

This paper emphasizes the application of Bayesian methods for the cloud
masking of the complete 9.5-year time series of the Medium Resolution Imaging
Spectrometer (MERIS)

Major challenges of cloud detection are validation, the correct
classification of scenes with clouds for mountainous regions and over snow-
and ice-covered areas, and the distinction between clouds and optically thick
aerosol plumes such as dust storms. These points are discussed in more detail
in Sect.

Common approaches to cloud masking are hierarchies of thresholds

The schemes presented here are computationally highly efficient and well
suited for processing large amounts of data, which also makes them
suitable for future application to the Ocean Land Colour
Instrument (OLCI)

Bayes' theorem can be used to reverse joint probabilities. It is appealing to
apply it to cloud masking since its theory is widely adopted, its
implementation on a computer system is straightforward, and its results are
probabilities which can be directly interpreted. The theorem allows the
computation of the probability

With

Evaluating Bayes' theorem involves only a few arithmetic operations so that
a specific implementation can be very fast and efficient, which is of
importance when large numbers of data are to be processed. Additional
computations involve the feature

With an appropriate set of thresholds, one can convert the probability

Estimating the value of the background probability

For the special case of

Several papers on Bayesian approaches to cloud masking have been published in the past and fundamental differences between the various algorithms are often buried in the technical details of the particular paper. A nomenclature which aims to clearly separate different approaches to Bayesian cloud masking is discussed in the following.

Let the feature

Let us call the feature set

This paper focuses on Bayesian cloud masks based on strongly independent features. Only MERIS and AATSR measurements and trivial functions operating on them are used to construct the feature set. This class of features permits a numerically highly efficient algorithm that is easy to parallelize and vectorize. With no dependency on external data, the algorithm can be used in non-operational environments where the acquisition of NWP data can require significant effort. In general, there is no obvious reason why the techniques discussed in the following sections should be limited to the independent case.
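As a rough illustration of why such features vectorize well, the sketch below computes a small feature set from per-pixel channel arrays with NumPy. The channel values, the function names, and the reading of an index as a normalized difference are assumptions made for this example and are not taken from the implementation used in this paper.

```python
import numpy as np


def identity(a):
    """Use a single channel directly as a feature."""
    return a


def ratio(a, b):
    """Element-wise ratio of two channels; pixels with b == 0 become NaN."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(b != 0, a / b, np.nan)


def normalized_index(a, b):
    """Assumed form of an index: (a - b) / (a + b), NaN where a + b == 0."""
    total = a + b
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(total != 0, (a - b) / total, np.nan)


# Hypothetical measurements for three pixels in two channels.
channel_a = np.array([0.2, 0.4, 0.6])
channel_b = np.array([0.1, 0.4, 0.0])

# Each feature is one array operation over all pixels; no external data is needed.
features = np.stack(
    [
        identity(channel_a),
        ratio(channel_a, channel_b),
        normalized_index(channel_a, channel_b),
    ],
    axis=-1,
)
print(features.shape)  # (3, 3): three pixels, three features
```

Because every feature depends only on the measurements of the pixel itself, blocks of pixels can be processed independently, which is what makes parallel processing simple in this setting.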

The second major branch in Bayesian cloud masking schemes involves the
computation of the joint probabilities

Computing the joint probabilities in the classical approach can be greatly
simplified by assuming an analytic form and estimating its parameters.
Depending on the assumed form, for instance multivariate Gaussian

The classical and naive approaches can be mixed when one or more subsets of the

This paper is mainly concerned with the classical and naive approaches, with an emphasis on classical Bayesian cloud masks based on strongly independent features. As will be shown later in the paper, the classical approach gives better results for cloud masking in our scenario, and the strongly independent feature set was chosen to allow the implementation of a very fast algorithm.
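The difference between the two approaches can be sketched on synthetic data. In this minimal example, the classical mask looks up the posterior in one joint histogram over all features, while the naive mask multiplies per-feature one-dimensional likelihoods. The data, the bin count, the 0.5 threshold, and all function names are assumptions for illustration; the sketch is not the implementation used for the results in this paper.

```python
import numpy as np


def bin_indices(features, n_bins):
    """Map features in [0, 1] to integer bin indices per feature."""
    return np.clip((features * n_bins).astype(int), 0, n_bins - 1)


def fit_classical(features, is_cloud, n_bins):
    """Posterior P(cloud | F) from one joint histogram over all features."""
    idx = bin_indices(features, n_bins)
    shape = (n_bins,) * features.shape[1]
    hist_all = np.zeros(shape)
    hist_cloud = np.zeros(shape)
    np.add.at(hist_all, tuple(idx.T), 1.0)
    np.add.at(hist_cloud, tuple(idx[is_cloud].T), 1.0)
    p_f = hist_all / hist_all.sum()                        # P(F)
    p_f_given_c = hist_cloud / max(hist_cloud.sum(), 1.0)  # P(F | cloud)
    p_c = is_cloud.mean()                                  # P(cloud)
    with np.errstate(divide="ignore", invalid="ignore"):
        posterior = np.where(p_f > 0, p_f_given_c * p_c / p_f, np.nan)
    return np.clip(posterior, 0.0, 1.0)  # guard against rounding above 1


def predict_classical(posterior, features, n_bins):
    """Look up the posterior of each sample in the joint histogram."""
    return posterior[tuple(bin_indices(features, n_bins).T)]


def fit_naive(features, is_cloud, n_bins):
    """Per-feature 1-D likelihoods for clear (index 0) and cloudy (index 1)."""
    idx = bin_indices(features, n_bins)
    n_feat = features.shape[1]
    like = np.zeros((2, n_feat, n_bins))
    for c, mask in enumerate((~is_cloud, is_cloud)):
        for j in range(n_feat):
            counts = np.bincount(idx[mask, j], minlength=n_bins)
            like[c, j] = counts / max(mask.sum(), 1)
    return like, is_cloud.mean()


def predict_naive(like, prior, features, n_bins):
    """Combine the 1-D likelihoods as if the features were independent."""
    idx = bin_indices(features, n_bins)
    cols = np.arange(features.shape[1])
    p_cloud = prior * np.prod(like[1][cols, idx], axis=1)
    p_clear = (1.0 - prior) * np.prod(like[0][cols, idx], axis=1)
    total = p_cloud + p_clear
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(total > 0, p_cloud / total, np.nan)


# Synthetic two-feature data: cloudy samples are brighter in both features.
rng = np.random.default_rng(0)
n, n_bins = 5000, 10
is_cloud = rng.random(n) < 0.5
features = np.clip(
    0.3 + 0.4 * is_cloud[:, None] + 0.15 * rng.standard_normal((n, 2)), 0.0, 1.0
)

p_classical = predict_classical(fit_classical(features, is_cloud, n_bins), features, n_bins)
like, prior = fit_naive(features, is_cloud, n_bins)
p_naive = predict_naive(like, prior, features, n_bins)

# In-sample agreement with the truth after thresholding at 0.5.
acc_classical = np.mean((p_classical > 0.5) == is_cloud)
acc_naive = np.mean((p_naive > 0.5) == is_cloud)
print(acc_classical, acc_naive)
```

The joint histogram grows exponentially with the number of features, whereas the naive tables grow only linearly; in this synthetic case the features are conditionally independent by construction, so the example does not reproduce the skill difference reported for real measurements.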

Cloud detection methods based on Bayesian probabilities have been used for
cloud masking in the past. A short overview is given here, without attempting
to describe them exhaustively.

Channels of the MERIS and AATSR instruments cover the spectral range from
412 nm to

Strongly independent features are constructed using a single channel or any
combination of channels in a trivial function. Such combinations have been
called derived channels in the literature

Several views of a scene over Greenland from
17 July 2007 with the image centered at

Similar to Fig.

In contrast to approaches based on expert knowledge, an objective measure for
any given set of feature functions is exploited to numerically search for the
best possible set of feature functions. Maximizing the Hanssen–Kuipers skill
score

Validation of cloud masks for MERIS and AATSR on board ENVISAT is a difficult
task since no generally accepted and available set of truth data exists. A
commonly used approach is to generate truth data by means of manual
classification of images by human experts or the use of data from ground-based stations. Converting a ground truth to a pixel-by-pixel truth can be
complicated, and possibly insufficient spatial coverage can limit the
applicability of that approach. Consequently, most approaches for generating
truth data for MERIS and AATSR are based on the manual
classification of sample data by human experts

To demonstrate the feasibility of the Bayesian approach, results from the
Synergy cloud mask (see

Optimizing the choice for a particular set of feature functions is not
straightforward, since this problem is noncontinuous with a varying number of
free parameters. First, the number of feature functions has to be set. Then,
for each feature, a feature function from the pool of considered functions
has to be selected. The identity function, all four basic arithmetic
operations, and the index function are considered as feature functions. As a
last step, the input channels for each feature function must be set.
Depending on the chosen functions and channels, a maximum of

Then, for a particular feature set, the prerequisites for computing the joint
probabilities must be carried out, which is described in detail in Sect.

The only numeric optimization procedure that we are aware of, which is
generally applicable to this situation, is a random search in the huge search
space spanned by this outlined procedure. This is quite a different approach
to that of a human expert, who would likely start an educated search but
might not attempt to cover the whole search space. The number of possible
combinations depends on the number of chosen features and the number of
available channels (22 in the case of the MERIS and AATSR Synergy) and can be
estimated using the binomial coefficient. In the simplest case, where merely
the identity function is used, no channel is used more than once, and four
features are to be selected, the search space spans

The proposed random search might not be able to cover the complete search
space, but with a sufficiently long runtime one will be able to find solutions
with a sufficiently high skill score. In addition, unusual combinations of
channels might be found which would not be considered in an educated search by
a human expert. The features shown in Figs.

The physical meaning of a certain feature set and why it might be better or worse than a different one is not discussed here and is also not within the scope of this paper. This knowledge is very useful for educated searches but is not necessarily needed in this setup. However, for the experienced expert it might be only slightly surprising which channels are found to be successful by the optimization scheme. There is also no apparent reason why human experts should not compete with the optimization scheme in order to find an optimum set of features. This is especially important for applications where only a small fraction of the search space can be tested using the optimization approach.

Implementing such a search strategy is straightforward. A generator of random feature functions must be implemented, and each generated instance can be tested for its skill score with respect to the artificial truth. This procedure is easily parallelizable, and one could store only those results whose skill score exceeds some predefined value. At any given time during an ongoing search, one can sort these results and evaluate the top results.
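A minimal sketch of such a random search is given below. It assumes synthetic channel data, a normalized-difference form for the index function, 20 bins per feature, a probability threshold of 0.5, and scoring on the same samples used to build the histograms; all names are hypothetical and none of these choices is taken from the implementation used in this paper. The Hanssen–Kuipers skill score is computed as hit rate minus false alarm rate.

```python
import math
import numpy as np

# Pool of trivial feature functions: (number of input channels, function).
# The small constant avoids division by zero; "index" is an assumed form.
FUNCS = {
    "identity": (1, lambda a: a),
    "add": (2, lambda a, b: a + b),
    "sub": (2, lambda a, b: a - b),
    "mul": (2, lambda a, b: a * b),
    "div": (2, lambda a, b: a / (b + 1e-12)),
    "index": (2, lambda a, b: (a - b) / (a + b + 1e-12)),
}


def hanssen_kuipers(truth, pred):
    """Hit rate minus false alarm rate; needs both classes present in truth."""
    hit_rate = np.sum(truth & pred) / np.sum(truth)
    false_alarm_rate = np.sum(~truth & pred) / np.sum(~truth)
    return hit_rate - false_alarm_rate


def random_feature_set(rng, n_features, n_channels):
    """Draw a function and distinct input channels for every feature."""
    names = list(FUNCS)
    feature_set = []
    for _ in range(n_features):
        name = names[rng.integers(len(names))]
        chans = rng.choice(n_channels, size=FUNCS[name][0], replace=False)
        feature_set.append((name, tuple(int(c) for c in chans)))
    return feature_set


def score(feature_set, channels, truth, n_bins=20):
    """In-sample skill of a classical (joint histogram) mask for one feature set."""
    cols = [FUNCS[name][1](*(channels[:, c] for c in chans)) for name, chans in feature_set]
    feats = np.stack(cols, axis=1)
    lo, hi = feats.min(axis=0), feats.max(axis=0)
    idx = ((feats - lo) / (hi - lo + 1e-12) * n_bins).astype(int)
    idx = np.minimum(idx, n_bins - 1)
    dims = (n_bins,) * feats.shape[1]
    flat = np.ravel_multi_index(tuple(idx.T), dims)
    n_all = np.bincount(flat, minlength=n_bins ** feats.shape[1])
    n_cloud = np.bincount(flat[truth], minlength=n_bins ** feats.shape[1])
    posterior = n_cloud[flat] / n_all[flat]  # every visited bin holds >= 1 sample
    return hanssen_kuipers(truth, posterior > 0.5)


def random_search(channels, truth, n_features, n_trials, min_score, seed=0):
    """Keep only feature sets above min_score, sorted by descending skill."""
    rng = np.random.default_rng(seed)
    kept = []
    for _ in range(n_trials):
        feature_set = random_feature_set(rng, n_features, channels.shape[1])
        skill = score(feature_set, channels, truth)
        if skill > min_score:
            kept.append((skill, feature_set))
    return sorted(kept, key=lambda item: item[0], reverse=True)


# Synthetic data: six channels, of which channel 2 separates the classes.
data_rng = np.random.default_rng(1)
n_pixels = 4000
truth = data_rng.random(n_pixels) < 0.4
channels = data_rng.random((n_pixels, 6))
channels[truth, 2] = 0.6 + 0.4 * data_rng.random(truth.sum())
channels[~truth, 2] = 0.5 * data_rng.random((~truth).sum())

results = random_search(channels, truth, n_features=2, n_trials=50, min_score=0.0)
print(results[:3])

# Size of the simplest search space: 4 distinct channels out of 22, identity only.
n_simple = math.comb(22, 4)
print(n_simple)  # 7315
```

Since each trial is independent of all others, the loop body can be distributed over worker processes without any coordination beyond collecting the retained results.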

The background joint probabilities

However, the main argument of

Both left panels of Fig.

Both right panels of Figs.

It should be noted that this is an extreme case and we do not propose to use so few data points to construct cloud masks for real-world applications. These two examples merely show how well this approach operates and that a surprisingly small number of data might be sufficient to explore the application of classical Bayesian cloud masks.

The Gaussian smoothing approach works reasonably well and is so far only justified by its actual success for a particular problem, where in fact sufficient numbers of artificial truth data are available. Its general application to situations with limited numbers of such data is therefore not very well justified. However, numerical experiments with the available data have shown that this approach yields remarkably good results. Other functional kernels have not been tested, but the Gaussian approach seems sufficient since the convolved histograms yield nearly the same skill score as the original histograms. Success of this approach is likely based on the fact that the smoothing procedure distributes data to neighboring bins but does not strongly change the defining spectral features of the measurements. That is, it implicitly creates data which could represent different viewing geometries or situations with slightly varying optical parameters. Hence, this approach is not justified by first principles but rather by working examples, which strengthen our expectation that it will work reasonably well for any other set of features.
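The effect of the smoothing step can be sketched with `scipy.ndimage.gaussian_filter` applied to sparse two-dimensional histograms. The synthetic data, the sample size, the bin count, and the smoothing width (given in units of bins) are assumptions for illustration only, and the use of this particular SciPy routine is likewise an assumption rather than a description of the implementation used here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Deliberately small synthetic sample: 300 points spread over 30 x 30 bins.
rng = np.random.default_rng(0)
n, n_bins, sigma = 300, 30, 1.5
is_cloud = rng.random(n) < 0.5
features = np.clip(
    0.35 + 0.3 * is_cloud[:, None] + 0.1 * rng.standard_normal((n, 2)), 0.0, 1.0
)

edges = [np.linspace(0.0, 1.0, n_bins + 1)] * 2
hist_all, _ = np.histogramdd(features, bins=edges)
hist_cloud, _ = np.histogramdd(features[is_cloud], bins=edges)

# Smooth both histograms with the same kernel; zeros are assumed outside the grid.
smooth_all = gaussian_filter(hist_all, sigma=sigma, mode="constant")
smooth_cloud = gaussian_filter(hist_cloud, sigma=sigma, mode="constant")


def posterior(cloud, total):
    """Cloud probability per bin; NaN where no information is available."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(total > 0, cloud / total, np.nan)


raw = posterior(hist_cloud, hist_all)
smoothed = posterior(smooth_cloud, smooth_all)

# Fraction of bins for which a probability can be stated at all.
coverage_raw = np.mean(~np.isnan(raw))
coverage_smoothed = np.mean(~np.isnan(smoothed))
print(coverage_raw, coverage_smoothed)
```

Smoothing the cloudy and the total histogram with the same kernel keeps their ratio a valid probability while filling bins that no truth sample happened to hit.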

The Synergy cloud mask is discussed in detail by

Difference of the two-dimensional histograms
for

Similar to Fig.

When the computation of

A Bayesian cloud mask can be used to approximate independent algorithms but
with the advantage of possibly drastically decreased computation times.
However, it is not obvious that a particular algorithm is reproducible to a
sufficient extent with this technique. Artificial truth data from the Synergy
cloud mask, which was briefly discussed in Sect.

The presented results do not have to represent a global optimum since only a small fraction of the search space was covered in the finite search time. Depending on the number of features and the classical or naive Bayesian approach, a certain upper bound of skill scores for any test case was not exceeded, but many feature sets with similar skill score to that soft limit were found.

Figures

Global distribution of skill scores for a
classical Bayesian cloud mask using only two strongly independent features.
Data are shown for the year

Interpreting spatial patterns of skill score or reproducibility is not straightforward. It is difficult to differentiate between poor reproducibility caused by inherent limitations of the selected feature set and that caused by inconsistencies or errors in the truth data. In general, when one decides to trust the truth data, one can only explore the state of methodological parameters such as the selected features or bin size of the histograms in order to optimize the reproducibility. It is then up to the potential user whether a certain skill score meets the requirements for the desired application.

The data used to produce Figs.

Similar to Fig.

Best found results for
feature sets of classical Bayesian cloud masks with two strongly independent
features which best recreate Synergy cloud mask results. The results are
separated for the Synergy of MERIS and AATSR, MERIS, and AATSR. Channels are
referenced by their central wavelength. MERIS channels use the unit nm,
while AATSR channels use

Similar to
Table

Similar to
Table

Similar results can be achieved by using different combinations of feature
functions and channels. An overview of results for the Synergy data,
MERIS, and AATSR alone is given in Tables

Table

Similar studies were also performed for higher numbers of features, but no results with significantly higher skill scores were found. The skill scores for three features lie between the two discussed results, such that four features seem to be the best choice to reproduce the Synergy cloud mask with a classical Bayesian cloud mask based on strongly independent features.

Cloud probability from the two classical
Bayesian cloud masks from Fig.

Similar searches for naive Bayesian cloud masks with strongly independent
features were performed for 5 and

Concluding this aspect, it is possible to find feature sets that reproduce the Synergy cloud mask reasonably well even without covering the complete search space. For a soft upper limit of the skill score, different feature sets with similar skill score can be found. This is actually not surprising and represents the fact that the same classification results in terms of skill score can be achieved with many different feature sets. From a technical point, it is then sufficient to choose one of those results with best skill scores, even if this might not be the absolute global maximum.

Some commonly used features, such as the brightness temperature difference of

For both classical and naive Bayesian cloud masks, a specific set of features should be evaluated as a whole. The effect of a certain feature on the skill score for the total feature set can be estimated by evaluating results for a particular set with and without the feature in question. The effect on the skill score when adding a feature to a given set might strongly depend on the original feature set. In addition, features which show only poor reproduction skill when used alone might significantly improve the skill score for a certain set of features.

Next, the impact of the number of bins

Skill score of a classical Bayesian
cloud mask with four strongly independent features with respect to number of
bins for each dimension of the underlying histograms and the applied Gaussian
smoothing. Artificial truth data are taken from

With no Gaussian smoothing applied, the skill score clearly decreases with increasing number of bins since the sample size is much too small for this resolution. Also, the impact of the sample is largest when the standard deviation is highest. The skill score increases with increasing number of bins and Gaussian smoothing until a maximum is reached. With further increases in bin number and smoothing, the skill score decreases only slightly. In this case, an optimal combination of bin size and smoothing can be found. When smaller values are used, the skill scores are drastically reduced, but when larger values are used, the skill score decreases only slightly.

A similar sensitivity study is shown in Fig.

In both cases, for small and very large sample sizes of artificial truth data, the skill score decreases with increasing Gaussian smoothing for small numbers of bins. This clearly shows that excessive Gaussian smoothing can destroy information in an accurately estimated histogram, whereas in incomplete histograms it distributes information such that the result better represents the true probability density.
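A sensitivity study of this kind can be sketched as a small grid over the number of bins and the smoothing width, evaluated on held-out samples. The synthetic two-feature data, the sample sizes, the grid values, and the rule that bins without training information are classified as clear are all assumptions for illustration; the resulting numbers have no relation to the skill scores reported in this paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def make_data(rng, n):
    """Synthetic two-feature samples; cloudy samples are brighter."""
    is_cloud = rng.random(n) < 0.5
    feats = 0.35 + 0.3 * is_cloud[:, None] + 0.12 * rng.standard_normal((n, 2))
    return np.clip(feats, 0.0, 1.0), is_cloud


def skill(truth, pred):
    """Hanssen-Kuipers skill score: hit rate minus false alarm rate."""
    return np.mean(pred[truth]) - np.mean(pred[~truth])


def evaluate(train, test, n_bins, sigma):
    """Skill on test data of a classical mask built from train data."""
    (f_tr, c_tr), (f_te, c_te) = train, test
    edges = [np.linspace(0.0, 1.0, n_bins + 1)] * f_tr.shape[1]
    h_all, _ = np.histogramdd(f_tr, bins=edges)
    h_cloud, _ = np.histogramdd(f_tr[c_tr], bins=edges)
    if sigma > 0:  # sigma is given in units of bins
        h_all = gaussian_filter(h_all, sigma=sigma, mode="constant")
        h_cloud = gaussian_filter(h_cloud, sigma=sigma, mode="constant")
    idx = np.minimum((f_te * n_bins).astype(int), n_bins - 1)
    total = h_all[tuple(idx.T)]
    cloud = h_cloud[tuple(idx.T)]
    # Assumption: bins without any training information are treated as clear.
    pred = np.zeros(len(c_te), dtype=bool)
    known = total > 0
    pred[known] = cloud[known] / total[known] > 0.5
    return skill(c_te, pred)


rng = np.random.default_rng(0)
train = make_data(rng, 400)    # deliberately small truth sample
test = make_data(rng, 5000)    # independent samples for scoring

bins_list = [5, 20, 80]
sigmas = [0.0, 1.0, 3.0]
table = np.array([[evaluate(train, test, b, s) for s in sigmas] for b in bins_list])

# Rows: number of bins per dimension; columns: smoothing width in bins.
print(np.round(table, 3))
```

Scoring on samples that were not used to build the histograms matters here, because an unsmoothed high-resolution histogram fits its own training samples almost perfectly while leaving most other bins empty.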

In general, one cannot perform such studies to assess the optimal number of
bins and value of Gaussian smoothing parameter, because only an insufficient
number of artificial truth data might be available. The presented results
from numerical experiments indicate that for four features and a sufficiently
large sample of artificial truth data, a bin size of

It was shown so far that Bayesian cloud masks can be used to reproduce at
least one existing cloud mask up to a certain extent. It is unclear, however,
what the limiting factors are in global skill score with respect to this
particular cloud mask. A major contributor to this upper limit can be
inconsistencies in the artificial truth data set. Examples are shown in panels
a and b of Figs.

Similar to
Fig.

The appearance of such errors does not mean that the algorithm should be
abandoned and with it all the work that has been invested into developing it.
Panels c and d in Figs.

Similar to
Fig.

Manual classifications of the scenes shown in
Figs.

This result is merely shown as a proof of concept for the enhancement of existing algorithms. The shown case was limited to only two scenes which were manually corrected and used as artificial truth for the Bayesian cloud mask, which is therefore only strictly applicable to these two scenes. In a realistic approach, one would need some knowledge of where the existing algorithm performs below the requirements. This poses no real limitation and will always be the case; otherwise one would have no incentive to improve the existing algorithm. These cases, e.g., limited to certain areas, known weather conditions, or certain periods of time, could be excluded from the artificial truth data set while other correctly classified results are still included. The resulting data gaps, or more precisely representativity gaps, can then be filled with artificial truth data from manual classification. Such an approach can be used to focus the attention of the human experts on areas where their expertise is most strongly needed and to use their available labor in the most efficient way.

As discussed in Sect.

Human experts can produce artificial truth data of high quality by careful manual classification of MERIS, AATSR, or Synergy images. It is of great advantage that the spatial resolution of MERIS and AATSR images is high enough that spatial and spectral patterns together can be used to classify data points. Cloud shadows, for instance, can be used to clearly distinguish clouds from snow and ice surfaces. In that respect, the algorithm itself is not based on spatial information, but spatial information was certainly used to create the artificial truth data. It is beyond the scope of this paper to produce a cloud mask with global applicability; the aim is rather to demonstrate how straightforward such a procedure would be. The results presented here are then clearly applicable to OLCI and SLSTR on board the upcoming Sentinel-3 satellite.

The same two orbits which were discussed in Sects.

Results of this procedure are shown in Fig.

The Bayesian cloud mask is clearly able to separate the clouds from the snow
and ice surface, does not misclassify the land area (see Sect.

This approach is most straightforward when the spatial resolution of the instrument in question is high enough that the human expert can use the spatial pattern information to correctly distinguish cloudy from non-cloudy areas. For global applicability, a higher number of orbits with representative spatial and seasonal sampling should be included in the set of considered artificial truth data. In particular, complex cases such as scenes with ice, snow, sun glint, mountains, or dust storms should be included in the classification effort.

The application of the classical and naive Bayesian cloud masking techniques to MERIS, AATSR, and their Synergy was discussed in detail. Bayesian cloud masks based on independent features are numerically highly efficient and are very well suited for the fast processing of large amounts of data. This technique will be applied to a reprocessing of the 9.5-year time series of MERIS and AATSR measurements within ESA's Cloud CCI project.

Details of the actual implementation of the Bayesian cloud mask for Cloud CCI
are not part of this paper. The algorithm is implemented in Python and is
based on the multiprocessing, SciPy, and NumPy libraries

Given sufficient numbers of artificial truth data, the frequentist approach can be used to estimate the multidimensional histograms from which the background joint probabilities are derived. Gaussian smoothing of appropriate width can be used to drastically reduce the actual numbers of truth data needed to compute histograms for the classical Bayesian approach. This post-processing step greatly simplifies further exploration of the classical Bayesian approach.

Due to restrictions of modern computer hardware, the practical limit for the classical Bayesian approach is reached with six to seven features. This does not actually restrict its applicability, since trivial feature functions can be used which combine any number of measurements into a single feature.
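The origin of this limit can be made concrete with a short calculation of the memory needed for one dense joint histogram. The 50 bins per dimension and the 8 bytes per bin used below are assumptions for illustration, not the settings used for the results in this paper.

```python
def histogram_bytes(n_bins, n_features, bytes_per_bin=8):
    """Memory of one dense joint histogram with n_bins bins per feature."""
    return bytes_per_bin * n_bins ** n_features


# Assumed setup: 50 bins per dimension, one 8-byte value per bin.
for n_features in range(2, 8):
    size_gib = histogram_bytes(50, n_features) / 2 ** 30
    print(f"{n_features} features: {size_gib:.6g} GiB")
```

Each additional feature multiplies the table size by the number of bins, so the step from five to six or seven features quickly exceeds the main memory of a typical machine, whereas combining several measurements into one derived feature leaves the table size unchanged.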

It was found that classical Bayesian cloud masks with four strongly independent features are the best choice for the cloud masking of MERIS, AATSR, and their Synergy measurements when the Synergy cloud mask is used as a benchmark. The classical approach gave significantly better results than the naive approach. MERIS and the MERIS–AATSR Synergy give very similar results in terms of cloud classification, while AATSR alone shows significantly smaller skill scores. The MERIS Oxygen-A absorption channel was found to be present in the best results when the set of selected feature functions and channels was numerically optimized.

The broad spectral range and the number of available channels within the Synergy data set can be used to set up Bayesian cloud masks with very similar classification skill but based on different combinations of channels. This can be used to design cloud masking schemes which are robust against partially missing data.
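One way to exploit this redundancy is to keep several precomputed masks, ordered by preference, and to select the first one whose input channels are all available for a given measurement. The following sketch shows only this selection logic; the mask names and channel labels are hypothetical placeholders and do not correspond to feature sets found in this study.

```python
def select_mask(ranked_masks, available_channels):
    """Return the name of the first mask whose required channels are all available."""
    for name, required in ranked_masks:
        if required <= available_channels:  # subset test on sets
            return name
    return None


# Hypothetical masks in order of preference, each with its required channels.
ranked_masks = [
    ("mask_a", {"ch1", "ch2", "ch3", "ch4"}),
    ("mask_b", {"ch1", "ch2", "ch5", "ch6"}),
    ("mask_c", {"ch5", "ch6"}),
]

print(select_mask(ranked_masks, {"ch1", "ch2", "ch3", "ch4", "ch5", "ch6"}))  # mask_a
print(select_mask(ranked_masks, {"ch1", "ch2", "ch5", "ch6"}))                # mask_b
print(select_mask(ranked_masks, {"ch3"}))                                     # None
```

Because evaluating a Bayesian mask is only a table lookup, holding several alternative masks in memory adds little cost compared with the gain in robustness.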

It was shown how Bayesian cloud masks can be used to reproduce the results of existing algorithms, to improve existing algorithms, and to set up new classification schemes based on manual classification by human experts. Reproducing existing algorithms offers the perspective of increased numerical efficiency and processing robustness. The approach based on manual image classification is straightforward for the human expert. Classified scenes can be stored and revisited if the produced cloud masks show misclassifications in certain areas or weather conditions. When errors are not traceable to errors in the manual classification, additional scenes can be added to the set of artificial truth data to increase the chance of correct classification.

The presented results for MERIS and AATSR can be used to implement an accurate and highly efficient cloud masking scheme for OLCI and SLSTR on board the upcoming Sentinel-3 satellite. In particular, the additional oxygen absorption channels of the OLCI instrument might be used within an improved and numerically efficient cloud classification algorithm.

Although this paper is focused on strongly independent Bayesian cloud masks, there is no apparent reason which prevents the application of the introduced techniques to the case of dependent Bayesian cloud masks. It is straightforward to include external information such as clear sky radiance estimators or NWP fields in the proposed optimization strategy for the construction of features. The application of Gaussian smoothing to derived histogram fields is independent of external information and can be used to reduce the numbers of needed truth data. To actually assess the added value of the external data, one must ensure that the quality of the truth data is sufficient. In the case of MERIS and AATSR, one likely needs a reasonably large set of manually classified data.

This work has been funded by the European Space Agency in the framework
of the Climate Change Initiative project and by the German Federal
Ministry of Education and Research (BMBF) in the framework of the