On the optimal method for evaluating cloud products from passive satellite imagery using CALIPSO-CALIOP data: example investigating the CM SAF CLARA-A1 dataset


Introduction
The introduction of the A-Train (i.e. Aqua Train, sometimes referred to as the Afternoon Train) series of satellites (Stephens et al., 2002) has been a major milestone for cloud research and for satellite meteorology in general. For the first time in history, a series of satellites and sensors is able to provide not only the global coverage of cloud fields and aerosols but also the vertical structure of clouds and aerosols and their respective properties. The vertical probing capability was in particular associated with the launch of the CloudSat (Stephens et al., 2002) and CALIPSO (Winker et al., 2009) missions in 2006, since both satellites carry active sensors: a cloud profiling radar (CPR) on CloudSat and a cloud and aerosol lidar (CALIOP) on CALIPSO. These satellites have now produced more than five years of data. This means that, despite the relatively limited spatial sampling capability from the polar orbit (a consequence of both active sensors operating exclusively in nadir view), the long time series now offers enough measurements to be useful also for studying mean conditions (approaching the estimation of climatologies), in parallel to the more obvious use in case-to-case process-oriented studies. Good examples of this, and of methods and tools aiming at climatological studies, are given by Stubenrauch et al. (2012), Cesana et al. (2012), Liu et al. (2012), Devasthale and Thomas (2011), Chepfer et al. (2010), and Delanoë et al. (2011). In addition, the improved statistical significance of the datasets now makes them very useful as "ground truth" datasets for training of cloud algorithms (Heidinger et al., 2012). Similarly, an important application is the more thorough evaluation of cloud products from various algorithms based on data from other satellite platforms. This concerns especially those based on data from wider-swath scanning sensors measuring in visible, infrared, and microwave spectral regions (i.e. data from passive imagers), as demonstrated by Holz et al. (2008), Minnis et al. (2008), Reuter et al. (2009), Heidinger and Pavolonis (2009), and Karlsson and Dybbroe (2010).
The information from the CALIPSO-CALIOP lidar is particularly interesting since this sensor is undoubtedly much more sensitive to the presence of clouds in the atmosphere than any other spaceborne sensor at hand. Because of this, it has the potential of being used for establishing a firm knowledge of the cloud detection limit for other cloud retrieval methods based on data from other satellite sensors. This aspect is of fundamental importance for securing an optimal and unambiguous use of the information from various cloud retrieval algorithms. An important application in this respect is the comparison of satellite-derived cloud parameters with information simulated by climate models and numerical weather prediction (NWP) models.
To ensure an appropriate inter-comparison, specific tools have been developed. The most well-established tool is the Cloud Feedback Model Inter-comparison Project (CFMIP) Observation Simulator Package (COSP), which is described by Bodas-Salcedo et al. (2011). For all the cloud datasets being simulated by COSP from model data, the information on which clouds are detected or not by the various satellite sensors is essential for assuring an appropriate inter-comparison of model and observation datasets. This paper focuses on the use of CALIPSO-CALIOP data for evaluating the cloud detection limitations of the methods used to derive one particular satellite-derived climate data record: the CLARA-A1 dataset. The acronym stands for the Climate Monitoring Satellite Application Facility (CM SAF; see www.cmsaf.eu and Schulz et al., 2009) Cloud, Albedo, and Radiation dataset from AVHRR data (Karlsson et al., 2013). It is based on global historic Advanced Very High Resolution Radiometer (AVHRR) data from the polar orbiting NOAA satellites covering the period 1982 until 2009.
While working with the evaluation, several issues arose related to how to interpret the CALIPSO-CALIOP cloud datasets. This was mainly triggered by the observation of some inconsistencies between CALIOP cloud datasets created at different spatial resolutions. We claim that these differences have so far not been accounted for specifically in previous similar studies. Also, the philosophical question of how to define the upper boundary (cloud top) of a cloud needs specific attention. These issues may all be critical to the final results, and we want to highlight these aspects in this paper.

Section 2 introduces the two datasets to be inter-compared and Sect. 3 elaborates further on the problems associated with this comparison and suggests a method for dealing with them. This is followed in Sect. 4 by the presentation of results on the performance of cloud detection, its regional dependency and the apparent cloud detection limit in terms of the thinnest (in the cloud optical thickness sense) clouds being detected. Section 5 presents results for the cloud top height determination, taking into account the deduced cloud detection limitations. Finally, Sect. 6 concludes and gives some further discussion on the optimal method to be used for cloud parameter validation.

The investigated dataset: CLARA-A1
The CLARA-A1 dataset of global cloud products retrieved by CM SAF cloud retrieval methods is based on reduced resolution (approximately 4 km) global area coverage (GAC) AVHRR data spanning the time period 1982-2009. The total set of cloud products includes Cloud Fractional Cover, Cloud Top Level, Cloud Optical Thickness, Cloud Phase, Liquid Water Path, Ice Water Path, and Joint Cloud property histograms. Here, we will concentrate on the evaluation of the first two products. For a full description of the dataset the reader is referred to Karlsson et al. (2013).
The Cloud Fractional Cover (CFC) product is derived directly from results of a cloud screening or cloud masking method. CFC is defined as the fraction of cloudy pixels per grid square compared to the total number of analysed pixels in the grid square. Fractional cloud cover is expressed in percent. The product is calculated using the Nowcasting Satellite Application Facility (NWC SAF) PPS (Polar Platform System) cloud mask algorithm (see http://www.nwcsaf.org/ for details on the NWC SAF project). The algorithm (detailed by Dybbroe et al., 2005) is based on a multi-spectral thresholding technique applied to every pixel of the satellite scene. Several threshold tests may be applied (and must be passed) before a pixel is assigned to be cloudy or cloud-free. Thresholds are determined from present viewing and illumination conditions and from the current atmospheric state (prescribed by data assimilation products from numerical weather prediction models; here, the ERA-Interim dataset, see Dee et al. (2011) and http://www.ecmwf.int/research/era/do/get/era-interim). Ancillary information about surface status (e.g. land use categories and surface emissivities) is also taken into account. Thus, thresholds are dynamically defined and therefore unique for each individual pixel.
The Cloud Top Level (CTO) product is also derived using NWC SAF PPS algorithms. The product is abbreviated CTO because it can be expressed in three alternative forms: cloud top height (in meters), cloud top pressure (in hPa), and cloud top temperature (in Kelvin). In this paper we concentrate on the cloud top height version since this is what is available from the CALIPSO-CALIOP cloud products. Consequently, we will refer to the product as either CTO or cloud top height.
Cloud top processing is sub-divided into two separate algorithms, one for opaque clouds and one for fractional and semi-transparent clouds, and it is applied to all cloudy pixels as identified by the PPS cloud mask product. The opaque algorithm uses simulated cloud-free and cloudy top of atmosphere (TOA) 11 µm radiances which are compared and matched to measured radiances. Cloudy radiances are simulated assuming "black-body" clouds at various levels. The semi-transparent algorithm is applied to all pixels classified as semi-transparent cirrus or fractional water cloud. This classification is based on the analysis of brightness temperature differences of the 11 µm and 12 µm (split window) channels, noting that this difference is generally small or negligible for opaque clouds. Brightness temperatures at 3.7 µm are also involved in this process. A histogram technique is applied based on the construction of two-dimensional histograms using AVHRR channel 4 and 5 brightness temperatures composed over larger segments. By an iterative procedure a polynomial curve (simulating the arc shape) is fitted to the histogram-plotted values, from which the cloud top temperature and pressure (taken from ERA-Interim profiles) are retrieved.

Obviously, only a small fraction of the CLARA-A1 dataset may be evaluated using CALIPSO-CALIOP data (limited to years 2006-2009). However, it is believed that results should largely be valid also for the years before this period, provided that a reasonably large number of collocations can be found and considering that the AVHRR instrument has not undergone drastic changes throughout the years. The basis for the comparison is the use of original PPS cloud mask and cloud top height products for full orbit swaths (about 13 000 scan lines) which are collocated with CALIPSO-CALIOP orbits using specific matching criteria (further described in Sect. 3).
In the remainder of the text we will use the notation CLARA-A1/PPS to emphasize that we examine the performance of the PPS cloud mask and PPS cloud top height products for the PPS version used when defining the CLARA-A1 dataset.

The reference validation dataset: CALIPSO-CALIOP cloud products
The Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) satellite was launched in April 2006 together with CloudSat. The satellite carries the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) and the first data became available in August 2006. CALIOP provides detailed profile information about cloud and aerosol particles and corresponding physical parameters. CALIOP measures the backscatter intensity at 1064 nm while two other channels measure the orthogonally polarized components of the backscattered signal at 532 nm. The horizontal resolution of each single field of view (FOV) is 333 m and the vertical resolution is 30-60 m.
The CALIOP cloud product we have used reports observed cloud layers, i.e. all layers observed until the signal becomes too attenuated. In practice the instrument can only probe the full geometrical depth of a cloud if the total optical thickness is not larger than a certain threshold (somewhere in the range 6-10). For optically thicker clouds only the upper portion of the cloud is sensed.

[…] associated information from the Lidar Level 2 Vertical Feature Mask product. Regarding the latter, it is important to note the use here of the categorisation of low-level, medium-level and high-level clouds introduced by the International Satellite Cloud Climatology Project (ISCCP). This categorisation uses pressure levels of 680 hPa and 440 hPa to separate the three categories. We will use this classification later for separating results of cloud top height determinations between the three vertical groups of clouds.
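The ISCCP vertical categorisation mentioned above can be illustrated with a minimal sketch. The function name is hypothetical, and the handling of values lying exactly on the 680 hPa and 440 hPa levels is an assumption, since the text only states the two separating pressure levels.

```python
def isccp_category(cloud_top_pressure_hpa: float) -> str:
    """Return an ISCCP-style vertical category for a cloud top pressure in hPa.

    Assumption: tops exactly at 680 hPa count as medium-level and tops
    exactly at 440 hPa count as high-level.
    """
    if cloud_top_pressure_hpa > 680.0:
        return "low"      # higher pressure means lower altitude
    if cloud_top_pressure_hpa > 440.0:
        return "medium"
    return "high"
```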
The CALIOP products are defined in five different versions with respect to the along-track resolution: 333 m (individual footprint resolution), 1 km, 5 km, 20 km, and 80 km. The four latter resolutions are consequently constructed from several original footprints/FOVs. This allows a higher confidence in the correct detection and identification of cloud and aerosol layers compared to when using the original high resolution profiles. For example, the identification of very thin Cirrus clouds is more reliable in the 5 km dataset than in the 1 km dataset since signal-to-noise levels can be raised by using a combined dataset of several original profiles.
The natural choice of product resolution for the validation of 4 km AVHRR GAC products is to use the CALIOP 5 km dataset. The CALIOP 5 km dataset also offers estimates of the cloud optical thickness of individual layers (not available for finer resolution FOVs), which is a very attractive feature since it offers a possibility to analyse cloud detection limits quantitatively.

[…] signal-to-noise levels by averaging results from high resolution fields of view into coarse resolution fields of view. By doing this it is possible to identify cloud layers that are too thin to be detected in the original fine FOV resolution of 333 m because of high noise levels. This means that in theory the optically thinnest cloud layers will be found in the 80 km FOV resolution CALIOP dataset. Since a lot of concern in climate research for many years has been given to the potential impact of thin and sub-visible Cirrus clouds (Stephens et al., 1990) […] (1)

In other words, when reducing resolution we should be able to add thin cloud layers to previously detected cloud layers, leading to an overall increase of global cloud fraction.
However, in practice Eq. (1) seems not to be fulfilled in all cases. For example, in the collocation dataset which was used here (further detailed in Sect. 3.5) almost 20 % of the cases had higher or equal cloud fractions in the 1 km CALIOP cloud dataset than in the 5 km dataset. Similar experiences have also been reported by scientists working with the evaluation of the MODIS cloud algorithms (R. Kuehn, University of […]

[…] this dataset is not ideally suited for this task, which has also been expressed by the CALIPSO Science team (D. Winker, personal communication, 2012). What appears to happen is that when aggregating results from several fine resolution FOVs, strong signals from boundary layer clouds have been suppressed (this is done deliberately to increase sensitivity further), which unfortunately has led to the loss of some cloud layers that existed in the previous fine resolution dataset. Thus, thin layers have been added to coarse resolution datasets but unfortunately some (relatively) thick cloud layers have been lost at the same time. This results in unchanged or only relatively modestly increased total cloud fractions and in some cases even slightly reduced total cloud fractions.
What complicates things further is that also finer resolution (1 km) CALIOP datasets have been shown to miss some specific cloud layers which cannot be considered as exceptionally thin (as reported by Chan and Comiso, 2011).These cloud layers were reported to be boundary-layer clouds at high latitudes with geometrical thicknesses of less than 1 km and with cloud optical thicknesses lower than 14.
Because of these ambiguities in the delineation of global cloudiness we suggest that, while waiting for further upgrades of the CALIOP cloud retrieval algorithms (potentially addressing these limitations), one should try to use the existing information in the 1 km and 5 km CALIOP datasets in a combined way to estimate the true cloud situation as far as possible.In the next sub-section we introduce a method which we believe takes the best from both CALIOP datasets, thereby reducing inherent deficiencies.

Proposed evaluation of cloud amounts using combined 1 km and 5 km CALIOP datasets
The principle for constructing an optimal cloud dataset from CALIOP 1 km and 5 km datasets is based on the assumption that thick (opaque) clouds are well described by the CALIOP 1 km dataset while thin clouds are best described by the 5 km dataset.
In other words, we think that the potentially "lost" thick clouds in the 5 km dataset (as described in the previous section) are most likely included in the 1 km dataset. Similarly, the thinnest clouds (including clouds of the type reported by Chan and Comiso, 2011) are reasonably well depicted in the 5 km dataset, as are thin Cirrus clouds that were not detected at all at 1 km FOV resolution. Thus, we can construct a new merged cloud dataset by going through the following rather simple post-processing steps:

- Step 1: Compute a preliminary cloud fraction (CFC) at 5 km FOV resolution from the 1 km FOVs.
(If a cloud layer was missing in the 5 km dataset but covered more than 50 % of the involved 1 km FOVs, a new layer will now be added.)

[…]

By these simple steps we believe that we have reduced ambiguities significantly, even if steps 2 and 3 still mean that there are undetermined retrieved values of both CFC and cloud optical thickness. For example, in step 2 we might add (or restore) a new 5 km layer but we have no way of giving this new layer a retrieved value of cloud optical thickness (since this quantity is only retrieved for the 5 km FOV dataset and not for the 1 km FOV dataset). We have "solved" this by prescribing an optical thickness of 1.0 for the new layer. This is just to show that we believe that this cloud layer should not belong to the category of very thin cloud layers, thus ensuring that it will not be included in subsequent cloud detection limit studies focussing on clouds with low optical thickness values. Furthermore, step 2 means that there could be cases when at 1 km FOV there are only one or two cloudy columns while at 5 km FOV we have a cloud layer. This cloudy 5 km FOV will now be removed, which could perhaps be questioned. On the other hand, a 5 km FOV that is less than 50 % cloud-covered should in this context not be considered as a fully cloudy FOV. The ambiguity comes from the consideration that it could theoretically be a cloud layer that is only partly detected by the 1 km FOVs within the 5 km FOV while in reality it covers the entire 5 km FOV. We simply do not have enough information here to judge what the truth is, so we have to stay with the simple interpretation resulting from step 2. We actually think this uncertainty is marginal in comparison with the general uncertainty about the true CFC within the 5 km FOV. This concerns in particular the entirely "new" thin cloud layers appearing in the 5 km dataset, detected after the averaging procedure for improving the signal-to-noise levels. These new cloud layers are assumed to cover the entire 5 km FOV but there is actually no way of estimating the true CFC within the 5 km FOV. It is possible that these clouds only cover a fraction of the 5 km FOV. In some sense, it is even possible that these interpreted thin cloud layers might be just broken cloud layers which are optically relatively thick but cover only a small fraction of the 5 km FOV (as suggested by A. Devasthale, personal communication, 2012). Despite these remaining ambiguities, we believe that the proposed approach yields reasonable results that are more consistent and robust than results based exclusively on either 1 km or 5 km FOVs.
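The merging logic discussed above could be sketched roughly as follows. This is only one possible reading of the description, not the authors' implementation: the full wording of steps 2 and 3 is not available here, so the rule for keeping thin layers seen only at 5 km, the data structure `MergedFov` and the function `merge_fov` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

PRESCRIBED_COT = 1.0  # optical thickness the text prescribes for restored layers


@dataclass
class MergedFov:
    cloudy: bool
    optical_thickness: float  # 0.0 for a cloud-free FOV


def merge_fov(cloudy_1km: List[bool], cloudy_5km: bool, cot_5km: float) -> MergedFov:
    """Combine the 1 km FOVs inside one 5 km FOV with the 5 km result."""
    # Step 1: preliminary cloud fraction at 5 km resolution from the 1 km FOVs.
    cfc_1km = sum(cloudy_1km) / len(cloudy_1km)

    if cfc_1km > 0.5:
        if cloudy_5km:
            return MergedFov(True, cot_5km)
        # Layer missing at 5 km but present in most 1 km FOVs: restore it.
        return MergedFov(True, PRESCRIBED_COT)

    if cloudy_5km and cfc_1km == 0.0:
        # Assumption: a layer found only after 5 km averaging is a thin
        # cloud and is kept with its retrieved optical thickness.
        return MergedFov(True, cot_5km)

    # Partly cloudy at 1 km (50 % or less): treated as cloud-free at 5 km.
    return MergedFov(False, 0.0)
```

With five 1 km FOVs per 5 km FOV a fraction of exactly 50 % cannot occur, so the treatment of that boundary in the sketch has no practical effect.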

Method for evaluating cloud detection efficiency and the cloud detection limit
The merged new 5 km CALIOP dataset (compiled according to the method described earlier) now includes information about cloud layers at each 5 km FOV, and for each cloud layer an estimated cloud optical thickness is given. It is important to remember, though, that for the lowermost layer it might be only a minimum value since the entire cloud layer might not be penetrated by the lidar signal. We may now evaluate the cloud detection efficiency of the methods used to derive the CLARA-A1 dataset either in a direct inter-comparison (i.e. using all CALIOP-detected cloud layers) or by applying a filtering mode where cloudy columns having an integrated cloud optical thickness below a certain value are treated as being cloud-free. In this way we should be able to quantify the cloud detection limit of the CLARA-A1 dataset. For this purpose we have filtered the CALIOP dataset in cloud optical thickness steps of 0.05 in the range 0.0-0.5 and in steps of 0.1 in the range 0.5-1.0.
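As an illustration of the filtering mode, the sketch below builds the sequence of thresholds stated above and applies one threshold to a single CALIOP column. The function names and the use of `None` for a cloud-free column are assumptions made for this example.

```python
from typing import List, Optional


def filter_thresholds() -> List[float]:
    """Thresholds in steps of 0.05 over 0.0-0.5 and 0.1 over 0.5-1.0."""
    fine = [round(0.05 * i, 2) for i in range(11)]            # 0.00 ... 0.50
    coarse = [round(0.5 + 0.1 * i, 1) for i in range(1, 6)]   # 0.6 ... 1.0
    return fine + coarse


def caliop_cloudy_after_filter(column_cot: Optional[float], threshold: float) -> bool:
    """Return True if the column still counts as cloudy after filtering.

    column_cot is the integrated cloud optical thickness of the column,
    or None if CALIOP reported no cloud layer at all.
    """
    return column_cot is not None and column_cot >= threshold
```

A threshold of 0.0 corresponds to the direct, unfiltered inter-comparison, since every cloudy column is then retained.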
For quantifying results we have used the following statistical scores:

1. Mean error (Bias)
2. Root Mean Square Error (RMS)
3. Probability of Detection (POD) for both cloudy and cloud-free conditions
4. False Alarm Rate (FAR) for both cloudy and cloud-free conditions
5. Hit Rate (HR)
6. Kuiper's skill score (KSS)

For the estimation of cloud occurrence or cloud fractional cover (CFC), we have used a binary representation of the results (i.e. cloud cover = 1 for cloudy conditions and cloud cover = 0 for cloud-free conditions) for each individual pixel or FOV. Consequently, results are accumulated over the full matchup track to get a mean CFC (according to Eq. 2 below) and the associated Bias and RMS values. As a final step, all matchup results for all matched orbits are accumulated and averaged.

$$\mathrm{CFC} = \frac{\text{cloudy pixels}}{\text{all pixels}} \qquad (2)$$
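A minimal sketch of how Eq. (2) and the associated Bias and RMS values could be computed from binary cloud masks along one matchup track is given below. Taking the differences per FOV as CLARA-A1/PPS minus CALIOP, and expressing all values in percent, are assumptions made for this illustration, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def cfc_bias_rms(pps_cloudy: Sequence[int],
                 caliop_cloudy: Sequence[int]) -> Tuple[float, float, float, float]:
    """Return (CFC_PPS, CFC_CALIOP, bias, RMS), all in percent.

    Both inputs are binary masks (1 = cloudy, 0 = cloud-free) of equal length.
    """
    n = len(pps_cloudy)
    cfc_pps = 100.0 * sum(pps_cloudy) / n        # Eq. (2) for CLARA-A1/PPS
    cfc_caliop = 100.0 * sum(caliop_cloudy) / n  # Eq. (2) for CALIOP
    diffs = [100.0 * (p - c) for p, c in zip(pps_cloudy, caliop_cloudy)]
    bias = sum(diffs) / n                        # mean error
    rms = math.sqrt(sum(d * d for d in diffs) / n)
    return cfc_pps, cfc_caliop, bias, rms
```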
For the remaining four quantities we have used the following definitions (referring to notations in the contingency matrix in Table 1):

[…] (5) where […]

The POD and FAR quantities estimate how efficient CLARA-A1/PPS is in determining either cloudy or cloud-free conditions. Naturally, we want POD values to be as high as possible and FAR values to be minimized. The hit rate HR is a condensed measure of the overall efficiency of cloud detection. Finally, the KSS quantity is a complementary measure, since the HR can sometimes be misleading because it is heavily influenced by the results for the most common category. For example, if a case is almost totally cloud-free but all the few cloudy portions are misclassified as cloud-free by CLARA-A1/PPS, the HR score would still be high. A more reasonable measure in such a condition is the KSS score, which at least to some extent punishes misclassifications even if they are in a small minority of all the studied cases. The KSS score tries to answer the question of how well the estimation separated the cloudy events from the cloud-free events. A value of 1.0 describes a perfect discrimination, while the value −1.0 describes a complete discrimination failure.
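The behaviour of these scores can be illustrated with the commonly used contingency-table definitions. The sketch below is an assumption in that sense: it does not reproduce the notation of Table 1, and the function and argument names are hypothetical. FAR is computed relative to the number of CLARA-A1/PPS results in each category, which matches the way the FAR values are interpreted later in the text.

```python
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Safe division that returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def detection_scores(both_cloudy: int, pps_cloudy_caliop_clear: int,
                     pps_clear_caliop_cloudy: int, both_clear: int) -> Dict[str, float]:
    """Compute POD, FAR, HR and KSS from the four contingency counts."""
    a, b = both_cloudy, pps_cloudy_caliop_clear
    c, d = pps_clear_caliop_cloudy, both_clear
    pod_cloudy = _ratio(a, a + c)   # cloudy CALIOP FOVs found cloudy
    pod_clear = _ratio(d, b + d)    # clear CALIOP FOVs found clear
    return {
        "POD_cloudy": pod_cloudy,
        "POD_clear": pod_clear,
        "FAR_cloudy": _ratio(b, a + b),  # PPS cloudy results that are wrong
        "FAR_clear": _ratio(c, c + d),   # PPS clear results that are wrong
        "HR": _ratio(a + d, a + b + c + d),
        "KSS": pod_cloudy + pod_clear - 1.0,
    }
```

For the example given in the text, a track with 990 correctly identified cloud-free FOVs and 10 cloudy FOVs that are all missed, `detection_scores(0, 0, 10, 990)` gives HR = 0.99 but KSS = 0.0.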
In addition, we have also separately studied the cloud detection efficiency over various regions of the Earth and the performance as a function of time of day. Here, we have used a twilight category defined as valid for solar zenith angles between 80 and 95 degrees, with the day and night categories having lower and higher solar zenith angles, respectively. Concerning the study of the geographical variation, we have separated results according to the geographical regions defined in Table 2. These results have also been further separated for land and ocean conditions using a land mask.
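The illumination categories can be expressed as a small helper. The function name is hypothetical, and treating the limits of 80 and 95 degrees as belonging to the twilight category is an assumption.

```python
def illumination_category(solar_zenith_angle_deg: float) -> str:
    """Classify a matchup as day, twilight or night from the solar zenith angle."""
    if solar_zenith_angle_deg < 80.0:
        return "day"
    if solar_zenith_angle_deg <= 95.0:
        return "twilight"  # assumed to include both limits
    return "night"
```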

Method for evaluating accuracy of cloud top height products
If we accept that there is a cloud detection limit (expressed in terms of a minimum cloud optical thickness, τ_min) for a dataset such as CLARA-A1, the evaluation of the corresponding cloud top height products must take this into account.
It is of course trivial that clouds which are not detected cannot be given a valid cloud top height. But also for clouds that are detected, the effect of cloud detection limitations must be taken into account in some way. For example, if a very thin cloud layer (not detectable by CLARA-A1/PPS but present in the CALIOP dataset) overlies a thicker cloud layer (detected by CLARA-A1/PPS), one should actually neglect this uppermost layer when doing cloud top height validation. Even in the case when we have detected just one single cloud layer, the uppermost part of that layer (with an integrated optical thickness equal to the minimum detection value) should theoretically be discarded. One could actually claim that a representative cloud top height would be even lower, since the measured radiance for the AVHRR instrument is a mix of contributions from several altitudes below the cloud top unless the cloud is optically very thick. In other words, the AVHRR-representative cloud top is the radiatively efficient cloud top rather than the physical or geometrical cloud top. Thus, for an AVHRR-detected cloud layer a representative cloud top height should be the mid-layer altitude of the CALIOP-detected layer rather than the uppermost cloud layer boundary. This can also be motivated for the clouds that are not fully penetrated by the CALIOP lidar signal. When the cloud is optically too thick, the CALIOP cloud layer will describe only the uppermost part of the cloud, and the mid-layer value would then, in our opinion, still be representative for the AVHRR-detected (radiatively efficient) cloud top.
Taking these aspects into account, we have applied the following criteria for evaluating the cloud top height:

- Cloud top height is interpreted as the mid-level of the uppermost CALIOP cloud layer assumed to be detected in CLARA-A1, i.e. the mean of the cloud base and the cloud top altitude for that layer.
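The mid-level criterion could be sketched as follows. How the uppermost layer "assumed to be detected" is selected is not spelled out here, so the rule used in this example (accumulate optical thickness from the top and take the first layer at which the sum reaches τ_min) is an assumption, as are the function name and the layer tuple layout.

```python
from typing import List, Optional, Tuple

# One CALIOP layer: (top altitude in m, base altitude in m, optical thickness)
Layer = Tuple[float, float, float]


def reference_cloud_top(layers: List[Layer], tau_min: float) -> Optional[float]:
    """Return the reference cloud top height in metres, or None if no layer qualifies.

    Layers must be ordered from the highest to the lowest.
    """
    cumulative_cot = 0.0
    for top_m, base_m, cot in layers:
        cumulative_cot += cot
        if cumulative_cot >= tau_min:
            # Mid-level of the layer: mean of cloud base and cloud top altitude.
            return 0.5 * (top_m + base_m)
    return None
```

For a thin cirrus layer (optical thickness 0.1) between 11 and 12 km above a thick layer between 2 and 3 km, a detection limit of 0.3 gives a reference height of 2500 m, whereas a limit of 0.05 gives 11 500 m.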

The collocated NOAA-AVHRR and CALIPSO-CALIOP dataset
We have adopted the following strategy for collecting the collocated NOAA-AVHRR and CALIPSO-CALIOP cloud observations to be inter-compared:

- Select the best complete collocations or matches, i.e. entire global orbits with minimum observation time differences between NOAA-18 and A-Train/CALIPSO for every month where we have CALIPSO data available (in practice from October 2006 until December 2009).
Observe that the choice of NOAA-18 is explained by the fact that this satellite is placed in almost the same orbital plane as the Aqua-Train satellites with approximately the same equator crossing time. Thus, if choosing matches where the orbital tracks cross simultaneously (denoted Simultaneous Nadir Observations, SNOs), in this case limited to within only 12 s, we can get measurements matched in near-nadir observation conditions for an entire global orbit and with a maximum time difference between observations of less than approximately 2 min for positions farthest away from the SNO point. Using this criterion we may theoretically get close to 3 such optimal matches each month. However, due to some losses of data (i.e. cases where we could not find both 1 km and 5 km CALIOP data) we ended up with a total of 99 global orbits evenly distributed over the period (see total coverage in Fig. 1). The geographical coverage is good but we can see that for some regions (e.g. over South America, North Atlantic Ocean, Africa and parts of the Pacific Ocean) the orbit coverage is less frequent than over other regions due to some loss of data. An example of one of the resulting orbits is shown in Fig. 2. The corresponding plot of CALIOP-observed cloud layers (green) and CLARA-A1/PPS cloud top height results (blue) is given in Fig. 3. Only small deviations (less than 10 degrees) from the nadir view occur for the matched AVHRR observations during such an orbit.
The 99 collocated orbits resulted in a total of 725 900 matched FOVs within 2 min observation time difference for the calculation of statistics and scores.

Overall results based on all collocations
A way to estimate the cloud detection efficiency is to plot and analyse various statistical scores as a function of the CALIOP-filtered cloud optical thickness. For clarity, we repeat that the filtering process means that whenever the CALIOP-derived total cloud optical thickness in the column/FOV falls below a specific cloud optical thickness threshold we will treat the observation as if being cloud-free when calculating statistics. Figures 4-6 show the results for all statistical parameters described in Sect. 3.3 based on all collocated orbits.
The basic mean error and RMS error quantities are shown together with the resulting total cloud fraction (i.e. percentage of cloudy FOVs among all FOVs) in the CALIPSO-CALIOP dataset in Fig. 4. We notice that after filtering clouds having optical thicknesses up to 1.0, the total cloud fraction for CALIOP reduces from approximately 73 % to 50 %. At the same time the mean error changes from −14 % to +8 % and the RMS changes from 47 % to 50 %. Based on mean error results alone one might conclude that the optimal agreement is reached after filtering all cloudy columns with optical thickness values below 0.35. The fact that mean errors become positive for higher filtered optical thickness thresholds only means that some cloudy CALIOP columns are now treated as being cloud-free even if they were detected successfully by CLARA-A1/PPS, thus giving a positive mean error. If comparing with Fig. 6, showing hit rates and Kuiper's skill scores, we see that the skill peaks at slightly lower values of the filtered cloud optical thickness threshold, namely at about 0.2 for the hit rate and 0.1 for Kuiper's skill score. This shows that from these different statistical measures it is not easy to draw a very clear conclusion about cloud detection limits. However, results of Probabilities of Detection and False Alarm Ratios (hereafter denoted POD and FAR) in Fig. 5 also reveal some further features of CLARA-A1/PPS results. These features are not evident in Fig. 4 or 6 and, in particular, they are not directly related to how thin or thick clouds are. We first note that the FAR quantity for clear FOVs initially reduces rapidly with increasing value of the filtered cloud optical thickness. This is what we should expect if very thin cloud layers are not detected by CLARA-A1/PPS, i.e. scores would improve if these CALIOP observations are also treated as being cloud-free. Similarly, POD for cloudy conditions improves with increasing values of filtered optical thickness. However, more serious is the observation that the FAR quantity for cloudy conditions amounts to 8 % initially for unfiltered CLARA-A1/PPS results. Thus, we seem to have a significant misclassification of clear FOVs labelled as cloudy, which also explains why POD results for clear conditions are relatively far away from 100 % in the unfiltered mode. This shows that the cloud detection efficiency cannot be judged solely from studies of how thin or thick clouds are. It is clear that there are also Earth surfaces that have appearances that resemble those of clouds regardless of whether clouds are thin or thick. The most obvious example is the case when a cold ground surface at night is interpreted as being a cloud because of an inappropriate value of the assumed ground surface temperature (i.e. being too warm). Another case is when a bright land surface (e.g. desert) is mistaken for a cloud because of inappropriate (i.e. too dark) surface reflectance thresholds. We conclude that some measures must be taken to try to remove the influence of this latter type of misclassifications, which could be interpreted as a constant bias in our results not related to the thickness of the clouds.

Results after excluding misclassified cloud-free surfaces
The most obvious way of trying to isolate the results depending mainly on the cloud optical thickness value of clouds would be to remove or ignore all cases being misclassified as cloudy in the completely unfiltered mode. In other words, let us reset the cloudy CLARA-A1/PPS pixels in evidently cloud-free CALIOP FOVs to clear. Thus, these 8 % of the cases in the FAR category for cloudy pixels in the unfiltered mode are now being treated as correctly classified as clear. Ideally, we should also try to exclude or ignore the oppositely misclassified cases, i.e. when clouds are misclassified as clear regardless of their optical thickness (i.e. for non-separability reasons). However, these cases are not as easily identified as the cases of misclassified clear pixels. More precisely, they can occur at any cloud optical thickness, meaning that these cases are inherently mixed with all the cases we actually aim at, namely those cases when cloud detection will clearly depend on the cloud optical thickness value. In that sense these misclassifications exist as an almost constant bias in our results. They are best identified in Fig. 5 as explaining why the FAR clear value is still high (20 %) even at the maximum filtered cloud optical thickness of 1.0. It means that in 20 % of all cases when CLARA-A1/PPS gives a cloud-free result there are actually clouds in reality and they have cloud optical thickness values higher than 1.0. Further details on when these misclassifications occur will be revealed in the forthcoming Sects. 4.1.3 and 4.1.4. The revised results for the statistical scores (after ignoring misclassified clear cases labelled as cloudy) are now shown in Figs. 7-9. We notice in Fig. 7 that the mean error quantity now does not reach the zero level until a filtered cloud optical thickness of 0.7. This is a high value and indicates that the CLARA-A1/PPS cloud screening method is generally rather cloud conservative. But it does not necessarily mean that the cloud detection limit is best described by this value of optical thickness. Rather, we should use a quantity which is more uniquely decided by and dependent on the filtered cloud optical thickness. The two quantities that best fit this description seem to be POD for cloudy conditions and FAR for clear conditions. The first quantity improves with increasing filtered optical thicknesses until "all" clouds are detected. The fact that the POD cloudy saturation level does not reach 100 % means that the difference with respect to the 100 % level represents all those cases where clouds remain undetected regardless of their cloud optical thickness. Similarly, the FAR for clear conditions behaves in the same way, where the apparent convergence level defines the same misclassified cases (i.e. that portion of the CLARA-A1/PPS clear cases that actually are undetected clouds even for higher optical depths).
The significant increase of the POD quantity for cloudy conditions at optical thicknesses lower than 0.7, and the corresponding decrease of the FAR quantity for clear conditions, in Fig. 8 show that much thinner clouds than what the mean error quantity indicates are indeed detected. A better value of the minimum optical thickness detected could then be derived from the rate of change of the mentioned POD and FAR quantities for cloudy and clear conditions, respectively. The minimum optical thickness to be determined would then be the value found when the improvement of these two quantities has slowed down or "saturated" (i.e. approaching constant or almost constant values). The interpretation of this value would be that at this cloud optical thickness all clouds are detected, unless other problems not related to how thick clouds are exist. For lower cloud optical thicknesses some clouds will be detected, but for very low optical thicknesses no clouds at all will be detected. Here, we will apply the inequality criterion of Eq. 9 for finding this cloud optical thickness limit. This means that we will interpret the cloud detection limit as the first (i.e. lowest) cloud optical thickness value where this inequality is fulfilled while checking for higher and higher filtered cloud optical thickness values. The value 1 % is perhaps rather arbitrarily chosen but it was considered reasonable for representing the case when the two quantities had reached almost constant values. Consequently, applying this definition we get an overall cloud detection limit of optical thickness 0.35. Comparing with Fig. 9, we see that this value is also relatively close to where the maximum of the Hitrate score occurs (although it peaks at slightly lower optical thickness values). The Kuiper's score does not really help us here. Remembering that this score is a measure of how well cloudy and cloud-free situations are separated, it is clear that the best separation will now occur in the unfiltered case (after having removed all obviously misclassified cloudy cases).
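The saturation idea described above can be sketched as follows. The exact form of the inequality is not reproduced in this text, so the criterion below is an assumption: the limit is taken as the lowest filtered optical thickness at which both the step-to-step gain in POD cloudy and the step-to-step drop in FAR clear fall below 1 percentage point. The function name, the argument names and the choice of returning the upper end of the step are all hypothetical.

```python
from typing import Optional, Sequence


def detection_limit(
    taus: Sequence[float],
    pod_cloudy: Sequence[float],
    far_clear: Sequence[float],
    tol: float = 1.0,
) -> Optional[float]:
    """Lowest tau at which both score curves have flattened, else None.

    taus       : filtered optical thickness thresholds, in increasing order
    pod_cloudy : POD for cloudy conditions (%) at each threshold
    far_clear  : FAR for clear conditions (%) at each threshold
    tol        : maximum allowed change per step (percentage points);
                 the 1 % value follows the text, the rest is assumed
    """
    for i in range(1, len(taus)):
        pod_gain = pod_cloudy[i] - pod_cloudy[i - 1]
        far_drop = far_clear[i - 1] - far_clear[i]
        if pod_gain < tol and far_drop < tol:
            return taus[i]
    return None


# Synthetic curves for illustration only (not values from the paper).
example_taus = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
example_pod = [60.0, 70.0, 78.0, 83.0, 83.5, 83.8]
example_far = [40.0, 32.0, 26.0, 22.0, 21.5, 21.2]
example_limit = detection_limit(example_taus, example_pod, example_far)
```

Note that the result depends on the spacing of the tested thresholds, so the step size used when filtering should be stated alongside any limit derived in this way.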

Results subdivided into day and night portions
Since the overall results include results from both illuminated and dark conditions, an interesting aspect is to study what happens if we look at the two conditions separately. Basically, it means that we look at the impact of having access to visible spectral channels (i.e. information on reflected sunlight) or not. Figures 10-12 show corresponding results for all statistical scores at day and at night (as defined in Sect. 3.3). All figures show convincingly how cloud detection efficiency degrades for night-time conditions. For example, in Fig. 10 we see that while the mean error reaches the zero level already at cloud optical thickness 0.2 during day, it never reaches this level at night (i.e. it remains negative). It is clear that a large fraction of all clouds is not detected at night, even at large cloud optical thicknesses. This is also well illustrated in Fig. 11 with decreasing POD cloudy and increasing FAR clear at night (i.e. FAR clear at filtered cloud optical thickness of 1.0 increases from about 10 % during day to about 25 % during night). Skill scores in Fig. 12 also show significantly better results during day than during night. Thus, the availability of information in the visible and short-wave infrared AVHRR channels appears to be quite important for the success of cloud detection.
Somewhat surprisingly, the derived value of the minimum cloud detection limit (according to Eq. 9) is found at cloud optical thickness 0.3 for both day and night conditions. Thus, the sensitivity to the filtered cloud optical thickness is relatively unchanged even if far fewer clouds are detected at night. We conclude that this must be explained by the increase in frequency of cases when clouds are completely missed at night (as indicated by the high FAR clear value at night). Thus, we are facing more general non-separability conditions of clouds and Earth surfaces at night, and this has nothing to do with how thick clouds are. The fact that the overall cloud detection limit was estimated to be at cloud optical thickness 0.35 in Sect. 4.1.2 (i.e. higher than the derived value for either day or night) indicates that conditions must be especially problematic at twilight. Indeed, the cloud detection limit is found to lie at a cloud optical thickness of 0.45 for twilight conditions. From Fig. 13, showing the POD and FAR quantities at twilight conditions, we conclude that this is explained by the rather slow increase in POD cloudy and the corresponding slow decrease of FAR clear for increasing filtered cloud optical thicknesses. Thus, at twilight we still miss a substantial fraction of optically thick clouds (more or less the same fraction as during night) but now we also face increasing difficulties in detecting thinner clouds.

Global results subdivided into regions
Let us now look at the geographical variations of the validation results. This might shed some further light on where we actually encounter the static problems of misclassified clear or misclassified cloudy conditions, i.e. those misclassifications that do not depend on the optical thickness of existing clouds. Considering first the unfiltered CALIPSO results, we will investigate if there are specific regions where misclassifications of cloud-free areas occur (i.e. explaining the 8 % of CLARA-A1/PPS misclassified clear cases mentioned in Sect. 4.1.2). These results are summarised for the mean error quantity in Table 3 for latitudinal bands defined in Table 2, for day, twilight and night categories and for land and ocean surfaces. We restrict the description to the mean error quantity since, with this detailed sub-division of results, the number of samples per category is sometimes too small to enable a proper estimation of all the statistical scores. For example, no samples for the twilight category could be found for tropical and sub-tropical regions. As expected, for most categories in Table 3 we have a substantial underestimation of cloudiness explained by the inability to detect very thin cloud layers. However, one of the categories actually showing some overestimation (+6.2 %) is the category sub-tropical land. Near-zero results are also found for the tropical land category. Further visual inspection of results revealed that misclassifications of clear conditions mainly occur over semi-arid land areas, i.e. in the zone where desert regions change from being pure desert to being partly vegetation-covered. Thus, misclassifications do not occur over pure desert areas but where we have a seasonal transition from near-desert conditions to tropical vegetated conditions. Table 4 shows results where we treat all CALIOP-detected clouds with cloud optical thicknesses lower than 0.35 as non-existent (i.e. as cloud-free cases). We notice that for the day category we now get dominantly positive values, i.e. we normally detect some clouds that are thinner than optical thickness 0.35. However, the overestimation is now quite excessive for the categories sub-tropical land and tropical land, which further emphasizes the misclassification problems encountered here. Some positive values are also seen at night over sub-tropical and tropical categories but otherwise we have dominantly negative results for night and twilight categories, which is in line with the results discussed previously in Sect. 4.1.3. For these categories, we obviously fail to detect a substantial fraction of all clouds regardless of their optical thickness. This occurs mainly in the Polar regions but also during dark and snow-covered periods over high latitude regions.
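The difference between the unfiltered results (Table 3) and the filtered results (Table 4) comes from how the CALIOP reference is binarised before matching. A minimal sketch of that step is given below, assuming one CALIOP optical thickness value per matched FOV (0.0 meaning that CALIOP found no cloud) and one boolean imager cloud flag. The function names and the input layout are hypothetical, and the a, b, c, d convention (a = both clear, d = both cloudy) is an assumption inferred from the score definitions.

```python
from typing import Dict, Sequence


def filtered_contingency(
    caliop_tau: Sequence[float],
    imager_cloudy: Sequence[bool],
    tau_min: float = 0.0,
) -> Dict[str, int]:
    """Count matched FOVs after treating CALIOP clouds thinner than
    tau_min as cloud-free. tau_min = 0.0 gives the unfiltered case."""
    counts = {"a": 0, "b": 0, "c": 0, "d": 0}
    for tau, cloudy in zip(caliop_tau, imager_cloudy):
        ref_cloudy = tau > 0.0 and tau >= tau_min
        if ref_cloudy:
            counts["d" if cloudy else "c"] += 1
        else:
            counts["b" if cloudy else "a"] += 1
    return counts


def mean_error_percent(counts: Dict[str, int]) -> float:
    """Imager cloud fraction minus reference cloud fraction, in %."""
    n = sum(counts.values())
    return 100.0 * (counts["b"] - counts["c"]) / n


# Synthetic example: two thin clouds missed, one clear FOV flagged cloudy.
example_tau = [0.0, 0.1, 0.2, 0.5, 2.0, 0.0]
example_flags = [False, False, False, True, True, True]
unfiltered = filtered_contingency(example_tau, example_flags)
filtered = filtered_contingency(example_tau, example_flags, tau_min=0.35)
```

In this synthetic case the mean error changes sign from negative to positive once the thin clouds are removed from the reference, which mirrors the shift towards positive daytime values described for Table 4: the false detections over clear surfaces (count b) are unaffected by the filtering and then dominate.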

Cloud top height results
Results from the evaluation of cloud top height retrievals (following the method described in Sect. 3.4) are presented in Table 5. Results are here also compared with cases where we did not apply any filtering of very thin cloud layers and where we always compared with the cloud top boundary for the uppermost CALIOP cloud layer (instead of the mid-layer value).
It is obvious from Table 5 that the chosen validation methodology has a tremendous impact on the achieved results. If including all thin cloud layers and comparing with the uppermost cloud boundary, a substantial underestimation of cloud top heights is found (on average more than 2.5 km). If instead taking into account the cloud detection limit and trying to represent clouds with a more radiatively relevant height, results improve drastically (overall bias of −274 m). If filtering with the optical thickness value of 0.5 the bias almost disappears. However, if we take advantage of the fact that the CALIOP cloud layer description in the Vertical Feature Mask product also includes a sub-division into the categories low-level, medium-level and high-level clouds following the ISCCP definition, we notice that the small bias is largely the net result of a large underestimation of high-level cloud tops (−1769 m) and a large overestimation of low-level cloud tops (+1137 m). Thus, high-level clouds and low-level clouds seem to behave differently. The low-level boundary layer cloud problem is the same as reported previously for MODIS cloud top products (Menzel et al., 2008). For CLARA-A1/PPS it can be explained as a problem with the reference atmospheric temperature profile taken from NWP analyses (here, ERA-Interim). For boundary layer clouds trapped in a temperature inversion, the reference profile is not detailed enough (i.e. the inversion is too weak, which is partly due to the mismatch between pixel and NWP grid resolution), leading to an overestimation of the cloud top height. A typical example of when this occurs can be seen in Fig. 3 between track positions 7000 and 8000.
The underestimation of high-level clouds reflects the problem of how to define the radiatively efficient cloud top height for thin and multiple cloud layers for an infrared channel of a passive imager. CALIOP measurements have also revealed the frequent existence of surprisingly thick (geometrically) single cloud layers which are optically very thin. Good examples of this are found in Fig. 3 at track positions 2400, 4000 and 6500. The use of a mid-layer representation of such a cloud layer is apparently still inadequate. Currently, CLARA-A1/PPS retrievals underestimate the height for all high-level clouds substantially, even if the method of filtering out the thinnest clouds has reduced the difference. It is clear that there are remaining ambiguities in the determination of an appropriate radiatively efficient height. A possible further improvement of the validation methodology could be to better estimate how deep (in the optical sense) into cloud layers we need to go for finding this efficient height. The corresponding integrated cloud optical thickness should obviously be larger than the estimated detection limit of 0.35. But even if we cannot determine this value exactly, the systematic use of a stipulated value (e.g. optical thickness 1.0) could be valuable in the evaluation of different and upgraded cloud height retrieval methods in the future.
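The choice of reference height can be sketched in a few lines. The sketch below follows one reading of the filtering rule used in the paper: uppermost CALIOP layers are disregarded as long as their summed optical thickness does not exceed a minimum value, and the first layer that brings the sum above that value supplies the reference height, taken either as its top or as the mean of its top and base. That reading, the `Layer` structure, the function names and the units (metres) are assumptions made for illustration.

```python
import math
from typing import NamedTuple, Optional, Sequence, Tuple


class Layer(NamedTuple):
    top_m: float   # layer top height (m)
    base_m: float  # layer base height (m)
    tau: float     # layer optical thickness


def reference_height(
    layers: Sequence[Layer],
    tau_min: float = 0.35,
    mid_layer: bool = True,
) -> Optional[float]:
    """Reference cloud height for one CALIOP profile, or None if the
    summed optical thickness never exceeds tau_min."""
    cumulative = 0.0
    for layer in sorted(layers, key=lambda l: l.top_m, reverse=True):
        cumulative += layer.tau
        if cumulative > tau_min:
            if mid_layer:
                return 0.5 * (layer.top_m + layer.base_m)
            return layer.top_m
    return None


def bias_and_rms(
    retrieved: Sequence[float], reference: Sequence[float]
) -> Tuple[float, float]:
    """Mean error and RMS error of retrieved minus reference heights."""
    diffs = [r - ref for r, ref in zip(retrieved, reference)]
    bias = sum(diffs) / len(diffs)
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rms


# Thin cirrus above a thick low cloud (synthetic values).
profile = [Layer(12000.0, 10000.0, 0.2), Layer(3000.0, 2000.0, 5.0)]
```

For the synthetic profile, `reference_height(profile, tau_min=0.0, mid_layer=False)` returns the cirrus top, whereas the default settings skip the thin cirrus and return the mid-level of the low cloud. Raising `tau_min` towards a stipulated value such as 1.0 moves the reference deeper into the cloud in the same way.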

Conclusions
This study investigated the optimal validation methodology to be used when evaluating cloud retrievals from passive imagers in order to take full advantage of the measurements provided by the active cloud lidar CALIOP carried by the CALIPSO satellite. Some inconsistencies in the current CALIOP datasets were identified and a method for mitigating their influence was proposed. The method was applied for evaluating a sub-set (covering the years 2006-2009) of the CM SAF CLARA-A1 dataset derived from historical global AVHRR data. It was demonstrated how the CALIOP-provided information on cloud presence and cloud optical thickness can be used to delineate the current cloud detection limitations of the methods used to compile the CLARA-A1 dataset. Although the cloud detection capability does vary with time of day and with the geographical environment, an overall cloud detection limit was estimated at a cloud optical thickness of 0.35. It means that at this cloud optical thickness most cloud layers are being detected. Thinner clouds are detected but at decreasing efficiency with smaller cloud optical thickness. The diurnal variation showed that the detection limit is close to 0.3 both day and night while conditions deteriorate considerably at twilight, when the cloud detection limit is estimated to be 0.45.
The study also revealed that there is a substantial fraction of cases where cloud detection results do not depend at all on the thickness of existing clouds. In other words, there are cases where clouds are either completely missed or falsely identified. This explains why the probability of detecting clouds is limited to about 90 % during day but is as low as 75-80 % during night and twilight conditions. Daytime misclassifications of semi-arid sub-tropical and tropical land surfaces as clouds were identified, as well as a substantial amount of missed clouds in the Polar regions during the Polar winter. Both deficiencies are well understood and reflect major challenges for most cloud retrieval schemes using data from passive imagery. The daytime problem is linked to the fundamental difference in the cloud-free spectral appearance of desert surfaces and tropical forested surfaces. While cloud screening seems to work well over both mentioned surfaces, problems arise in the transition zone between them where the appearance also changes seasonally. The current methodology has an inappropriate description of this transition zone and the associated temporal changes of its surface appearance (i.e. a static climatology is used). Thus, an improved methodology must address this limitation in the future.
The underestimation of cloudiness at high latitudes and especially during winter conditions is linked to another well-known problem for all cloud screening methods applied to passive imagery. It occurs when there is no distinct temperature difference between clouds and the underlying surface. The situation becomes even worse if the temperature difference is also reversed (i.e. if clouds are warmer than the surface), which is a frequent feature in the polar winter. Also, when ground temperatures become extremely cold (like over the Antarctic plateau in the polar winter) the radiometric accuracy of the AVHRR measurement is no longer sufficient for estimating the brightness temperature difference between infrared channels, a quantity that is heavily used by many cloud screening methods. Altogether, this leads for CLARA-A1 to a substantial underestimation of cloudiness over Polar regions in the polar winter and also during night and twilight conditions at high latitudes. Notice that even if this problem is common to most cloud screening methods applied in the Polar region, the achieved results may differ depending on the actual method. For some methods like CLARA-A1/PPS the problems are manifested as missed clouds while for other methods it could as well lead to overestimated cloudiness (misclassified cold cloud-free surfaces).
Complementary to the study on cloud detection efficiency, the quality of CLARA-A1/PPS cloud top height assignments was also evaluated. Here too, the information on cloud detection limitations was taken into account, either by discarding too thin single-layer clouds or too thin uppermost cloud layers. Results were shown to differ substantially depending on whether the cloud top boundary was defined as the uppermost CALIOP-derived cloud layer boundary or as the mid-level (i.e. the mean of cloud base and cloud top) of the corresponding CALIOP-observed cloud layer. The latter definition gives a height that is closer to the radiatively efficient level of the cloud, which better resembles the level that is normally retrieved from passive imagery. When using the latter approach a relatively small total cloud top height bias of −274 m was found. This can be compared to the cloud top height bias of −2762 m for the default method based on the uppermost cloud boundary and including all thin clouds. However, even after using the more realistic radiatively efficient level approximation it is clear that large underestimations of high-level cloud top heights and overestimations of low-level cloud top heights exist, which have to be addressed in a future reprocessing of the dataset.
In conclusion, we have demonstrated how CALIPSO-CALIOP results can be used to carry out a very detailed examination of cloud retrieval results from passive imagers. Results presented here are not entirely surprising or unexpected but they are given with unprecedented detail. Although the current CALIOP datasets show some internal inconsistencies depending on the FOV resolution, we have shown how these can be mitigated to construct a more reasonable validation reference. As such, we believe that it can be used as an invaluable reference for the evaluation of any cloud retrieval scheme based on data from passive imagers. In particular, we believe that even beyond the lifetime of the CALIPSO satellite, the extracted subset of collocated NOAA-18 and CALIPSO-CALIOP observations might serve as a benchmarking dataset for the testing of various AVHRR-based cloud retrieval methods. For the planned future upgrades of the CLARA-A1 dataset, the idea is to use the currently collected CALIPSO dataset in exactly this way. There is a limitation in that it is based exclusively on afternoon-orbit NOAA-18 data but we believe that it can be complemented with a limited set of morning-orbit data from satellites carrying the modified AVHRR instrument with the additional 1.6 µm channel. For the latter, however, the matched datasets have to be limited to latitudes near ±70 degrees due to orbital considerations (i.e. this is the only latitude where simultaneous overpasses with CALIPSO occur for morning orbit satellites). The next CLARA release (CLARA-A2) is scheduled for 2016 and we will utilise the current validation tool heavily in the work of upgrading and evaluating the methodology. But even concerning the current CLARA-A1 results, our findings should be very important for potential users. One particularly good example is the provision of essential background information for the construction of a CLARA-A1 simulator tool to be used for evaluation of cloud properties simulated by climate models.
Regarding the prospect of applying this methodology to data from sensors other than AVHRR, it is clear (or even trivial) that the method is directly applicable to data from the MODIS sensor (being already available on the A-Train platform). The method is also directly applicable to data from the new Visible Infrared Imaging Radiometer Suite (VIIRS) sensor on the Suomi-NPP satellite, which is also placed in an afternoon orbit very similar to the orbit of NOAA-18. As for the aforementioned morning orbit NOAA and Metop satellites, the method should also be applicable at high latitudes for sensors like the Advanced Along-Track Scanning Radiometer (AATSR) and the Medium Resolution Imaging Spectrometer (MERIS) onboard the ENVISAT satellite.

Step 3: If a cloud layer exists at 5 km FOV but NOT at any 1 km FOV ⇒ new thin layer detected! ⇒ Set 5 km FOV to CLOUDY (or, rather, keep the 5 km dataset unchanged).

$$\mathrm{POD}_{\text{cloud-free}} = \frac{a}{a+b} \qquad (4)$$

$$\mathrm{FAR}_{\text{cloudy}} = \frac{b}{b+d}$$

- The uppermost cloud layer (or layers) in the CALIPSO dataset is disregarded if the cloud optical thickness (summed if more than one layer) does not exceed the minimum cloud optical thickness (τ_min).

Fig. 12. Same visualisation as in Fig. 9 but for categories day (left) and night (right).

Table 1. Contingency matrix for the two different satellite observations.

Table 4. Same as Table 3 but now after filtering results with cloud optical thickness threshold 0.35 (i.e. all CALIOP-detected clouds with smaller optical thickness are neglected and treated as cloud-free observations).

Table 5. Cloud Top Height (CTO) results from CLARA-A1/PPS evaluated using unfiltered and filtered CALIOP results. Mean errors (Bias) and RMS errors are given for unfiltered (column 1) conditions and for two filtered conditions (columns 2 and 3) with two different cloud optical thickness thresholds. Mean errors are also given for the three cloud layer groups of low-level, medium-level and high-level clouds (explained in text).