Calibration of global MODIS cloud amount using CALIOP cloud profiles

The Moderate Resolution Imaging Spectroradiometer (MODIS) cloud detection procedure classifies instantaneous fields of view (IFOVs) as either “confident clear”, “probably clear”, “probably cloudy”, or “confident cloudy”. The cloud amount calculation requires quantitative cloud fractions to be assigned to these classes. The operational procedure used by the MODIS Science Team assumes that confident clear and probably clear IFOVs are cloud-free (cloud fraction 0 %), while the remaining categories are completely filled with clouds (cloud fraction 100 %). This study demonstrates that this “best-guess” approach is unreliable, especially on a regional/local scale. We use data from the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) instrument flown on the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) mission, collocated with Aqua MODIS IFOVs. Based on 33 793 648 paired observations acquired in January and July 2015, we conclude that actual cloud fractions to be associated with MODIS cloud mask categories are 21.5 %, 27.7 %, 66.6 %, and 94.7 %. Spatial variability is significant, even within a single MODIS algorithm path, and the operational approach introduces uncertainties of up to 30 % of cloud amount, notably in polar regions at night, and in selected locations over the Northern Hemisphere (e.g. China, the north-west coast of Africa, and eastern parts of the United States). Consequently, applications of MODIS data on a regional/local scale should first assess the extent of the uncertainty. We suggest using CALIPSO-based cloud fractions to improve MODIS cloud amount estimates. This approach can also be used for Terra MODIS data, and other passive cloud imagers, where the footprint is collocated with CALIPSO.


Introduction
Cloud plays a key role in distributing solar energy in the Earth's atmosphere (Trenberth et al., 2009). Consequently, research into the present and future state of the climate system requires accurate information about cloud amount. Depending on its frequency and physical properties, cloud can both heat (greenhouse effect: +30 W m⁻²) and cool (albedo effect: −48 W m⁻²) the atmosphere. Its net effect on the planetary radiation budget is negative, meaning the Earth would be warmer if all cloud disappeared (Ramanathan and Kiehl, 2006).
The Global Climate Observing System identifies 13 essential climate variables. This set of critical environmental parameters characterizes the Earth's climate (Hollmann et al., 2013). The variables include cloud properties, and they highlight that our knowledge of cloud relies largely on satellite remote sensing. Satellite cloud climatology starts with a cloud mask. The aim is to decide whether cloud is present in a sensor's instantaneous field of view (IFOV) or whether it is cloud-free. Input data include at-sensor registered radiances, along with auxiliary information that helps to maximize cloud detection.
Efficient cloud detection algorithms have to consider the technical limitations of sensors, available computing power, and environmental factors such as the background (e.g. water, land, snow) and solar illumination (day and night). The resulting cloud mask takes the form of a map that divides IFOVs into at least two categories: "cloud-free" and "cloud-contaminated" (or "cloud-filled"). Many masking algorithms introduce additional categories in order to reflect the level of uncertainty in cloud detection (Derrien and Le Gléau, 2005; Dybbroe et al., 2005; Kopp et al., 2014).
Published by Copernicus Publications on behalf of the European Geosciences Union.
The Moderate Resolution Imaging Spectroradiometer (MODIS) is a cloud-imaging instrument that is flown on board NASA's polar orbiting satellites: Terra and Aqua. Circling the Earth in the morning orbit (10:30 local solar time, Terra), and afternoon orbit (13:30 local solar time, Aqua), these twin sensors provide a global picture of cloud four times each day, at 1 km per pixel resolution (Guenther et al., 2002; Platnick et al., 2003). With 36 spectral channels, continuous correction for orbital drift, and precisely calibrated detectors, MODIS has set a new standard in cloud remote sensing and is still considered to be a state-of-the-art cloud imager, despite being launched in 1999 (Terra) and 2002 (Aqua).
MODIS's cloud detection scheme results in four cloud mask categories: "confident cloudy", "probably cloudy", "probably clear", and "confident clear". The fact that these classes are presented as qualitative, text-based labels rather than a numeric probability raises the technical problem of how to interpret these labels quantitatively. A numeric interpretation is mandatory when instantaneous observations (Level 2 products) are aggregated spatially and/or temporally to provide climatological information such as mean monthly cloud amount (Level 3 products).
The procedure implemented by NASA's MODIS Science Team (hereinafter the "standard" or "operational" procedure) is to assume that IFOVs declared confident cloudy and probably cloudy are, in fact, 100 % cloud-filled, while confident clear and probably clear are completely cloud-free (cloud fraction of 0 %) (Hubanks et al., 2008). The approach is widely used whenever there is a need to make a binary distinction between cloudy and cloud-free pixels, e.g. Gao et al. (2008), Remer et al. (2012), Wilson and Oreopoulos (2013), Wilson et al. (2014), Kraatz et al. (2017), and Gomis-Cebolla et al. (2020).
However, since the MODIS Science Team (ST) approach is only a "best guess", alternative assumptions are also used. For instance, it can be assumed that only confident cloudy pixels are "cloudy", while all remaining classes are 100 % cloud-free. Similarly, only confident clear detections can be considered truly cloud-free, while all other classes are assumed to be 100 % cloud-filled (Li et al., 2005). Krijger et al. (2007) argue that the latter approach leads to the false detection of small clouds, while cloud is frequently overlooked if the first method is applied. Another approach is simply to exclude probably clear and probably cloudy detections from the analysis. This strategy was adopted by Chan and Comiso (2013), whose work was based on only confident clear and confident cloudy categories of MODIS data.
Quantitative studies have shown that only considering the confident cloudy class as cloudy may be more consistent with other cloud data such as Landsat observations (Melchiorre et al., 2020), or visual observations at meteorological stations (Kotarba, 2015). On the other hand, Fontana et al. (2013) compared MODIS data with ground-based observations in Switzerland (four stations, 12 years of data) and found that results varied from station to station.
The theoretical range of uncertainty related to various interpretations of the MODIS cloud mask was investigated by Kotarba (2015). The latter study found that the global cloud amount estimates may differ by up to 14 %, depending on whether only confident cloudy detections are considered to be cloudy or whether the definition is extended to include intermediate classes. The discrepancy was found to increase by up to 40 %-60 % regionally, suggesting that MODIS cloud estimates are very uncertain in these areas. Such a wide range of uncertainty makes it difficult to run reliable studies on the climate system.
Neither the MODIS ST standard procedure nor any other best-guess variants have been validated on a global scale. Most importantly, no research-based, objective alternatives to the 0/0/100/100 interpretation have been suggested. This study addresses this problem. Specifically, it provides global cloud fractions based on quantitative analysis of CALIOP lidar observations. CALIOP (the Cloud-Aerosol Lidar with Orthogonal Polarization) is a cloud-profiling instrument flown on board the CALIPSO (Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation) spacecraft. Launched in 2006, CALIPSO flies in close formation with the Aqua satellite; therefore both instruments, MODIS and CALIOP, sample the same fragment of the atmosphere tens of seconds apart (Stephens et al., 2018). In this study, CALIPSO data are considered to be ground truth. This is because CALIOP is an active remote sensing instrument, which means that it can sample the atmosphere during the day and at night with comparable sensitivity. Imaging radiometers (such as MODIS) perform less effectively at night, when solar channels are missing. Furthermore, the use of short wavelengths makes CALIOP very sensitive to cloud of low optical thickness (e.g. sub-visual cirrus) that is often missed by imagers.
In the following sections we seek to answer the following questions. (1) What quantitative cloud fractions (based on CALIOP observations) should be applied to MODIS thematic cloud mask classes? (2) What uncertainties in global cloud amount are introduced by the MODIS ST standard procedure? Finally, we evaluate whether the standard procedure for calculating the MODIS Level 3 cloud amount is reliable.

MODIS data
The MODIS cloud detection scheme is based on thresholds that are applied to brightness temperature (thermal bands) and reflectance (solar channels), derived from observations in 22 spectral bands ranging from 0.66 to 13.9 µm. Ackerman et al. (1998, and Baum et al. (2012) provide very detailed descriptions of the cloud masking procedure. The general concept is as follows.
The algorithm executes a series of tests, each of which results in a confidence level (ranging from 0 to 1) that a particular IFOV is cloud-free. Tests to detect similar cloud types are grouped. The lowest confidence level for a test within a group is set as the confidence level for the whole group. Confidence levels for groups are then multiplied to determine the final confidence level (Q). Following this procedure, the IFOV is assigned to one of four cloud mask classes: confident clear (Q > 0.99), probably clear (Q > 0.95), probably cloudy (Q > 0.66), or confident cloudy (Q ≤ 0.66). Numeric values of the confidence level for an individual test, a particular group of tests, and the final confidence level (Q) are not provided within the MODIS Cloud Mask product, despite being potentially useful for a quantitative interpretation of Cloud Mask thematic categories. Only these four thematic classes are reported and used for further processing of MODIS cloud data.
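The grouping and thresholding steps described above can be illustrated with a minimal Python sketch. It assumes the per-test confidences are already available as plain lists, which is not the case in the MODIS Cloud Mask product; the function names and the input layout are hypothetical, and only the thresholds and the minimum-then-multiply rule are taken from the text.

```python
from math import prod


def final_confidence(groups):
    """Combine per-test clear-sky confidences into the final confidence Q.

    groups: hypothetical list of lists, one inner list of confidences
    (each in [0, 1]) per group of spectral tests.
    """
    # Each group takes the lowest confidence among its tests.
    group_minima = [min(tests) for tests in groups]
    # Group values are then multiplied, as described in the text.
    return prod(group_minima)


def cloud_mask_class(q):
    """Map the final confidence Q to one of the four thematic classes."""
    if q > 0.99:
        return "confident clear"
    if q > 0.95:
        return "probably clear"
    if q > 0.66:
        return "probably cloudy"
    return "confident cloudy"
```

For example, two groups with confidences `[1.0, 0.98]` and `[0.99]` give Q of about 0.97, which falls into the probably clear class.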
The exact number of spectral tests executed for an IFOV varies from a few to over a dozen, depending on the path through the algorithm. Paths reflect different environmental conditions and are introduced to maximize success. Dedicated sets of spectral tests are executed for land, ocean, desert, and coastal areas, for both day and night conditions. The presence of snow and/or ice is taken into account, as is sunglint over oceans. Separate thresholds have been introduced for polar regions, which are defined as land and ocean within 30° of each pole.
Cloud detection results are stored in the 48-bit "Cloud Mask" product, code-named MYD35 (Aqua) and MOD35 (Terra) following the MODIS nomenclature. In this study, we evaluated the latest version of MYD35 (Collection 061) data, available in the form of 5 min granules, at 1 km per pixel spatial resolution (at nadir), with native satellite projection. Each MYD35 file is accompanied by a MYD03 "Geolocation file" product that stores longitude and latitude information for individual cloud mask IFOV.

CALIOP data
CALIOP operates at 532 and 1064 nm. The instrument's pencil-like beam only scans locations along the satellite's ground track, as a trade-off for information on the vertical structure of cloud/aerosols. Its spatial resolution varies with the altitude of the sampled atmospheric layer. Resolution is finest in the troposphere (altitude up to 8.2 km): 0.333 km horizontally and 30 m vertically. Between 8.2 and 20.2 km, vertical resolution falls to 60 m and horizontal sampling to 1 km. Between 20.2 and 30.1 km, data are even coarser: 1.667 km horizontal and 180 m vertical resolution. Higher in the atmosphere (30.1 to 40.0 km) horizontal resolution decreases to 5 km, while vertical resolution is 300 m (Winker et al., 2006). CALIOP detects cloud by applying thresholds to 532 nm attenuated scattering ratios. The aim is to separate the cloud signal from the clear air background (molecular scattering), aerosols, and instrument noise. The algorithm calculates cloud base height, cloud-top height, and, as a consequence, the number of cloud layers within a profile. Up to 10 layers can be reported. The procedure is fully automatic. The output is stored in the Level 2 Cloud Layer Data product, available at 333 m, 1 km, and 5 km along-track sampling intervals. Here, we use the 1 km interval (version 4.20; CAL_LID_L2_01kmCLay-Standard-V4-20), as its resolution matches the spatial resolution of the MODIS cloud mask. Furthermore, 1 km is the highest available level of detail for CALIOP data within the troposphere.
In order to use the CALIOP product to evaluate MODIS data, three-dimensional cloud layer data were reduced to column-integrated, binary cloud or no cloud information. Specifically, we focused on the "Number_Layers_Found" parameter provided in the CAL_LID_L2 product. "No cloud" was recorded when the latter variable was set to 0 (i.e. zero layers found), and "cloud" otherwise (i.e. at least one layer was reported). Geolocation was based on longitude and latitude arrays included in the product at 1 km spatial resolution.
In some cases, cloud and aerosol can appear similar to CALIOP. The cloud-aerosol discrimination (CAD) score, which is a numerical index stored in the CAL_LID_L2 product, provides information about the algorithm's uncertainty in separating cloud and aerosol. In the case of cloud, CAD values vary between 0 % (it is unclear whether aerosol or cloud was observed) and 100 % (cloud detected with the highest confidence). The index is calculated for each cloud layer found in the CALIOP atmospheric profile. Since our study focuses on column-integrated information of cloud presence, we selected the highest CAD value within a profile. Statistics for January and July 2015 showed that 95.6 % of considered CALIOP observations were characterized by a CAD score of at least 70 %, while it was below 20 % for only 1.5 % of data. Therefore, the selected CALIOP data can be considered a reliable reference for MODIS. See the Supplement, Fig. S1 for more detailed statistics about CAD scores.
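The column reduction and the CAD selection described above can be sketched in a few lines of Python. This is only an illustration: the input layout (one list of per-layer CAD scores per profile, with an empty list standing for zero layers found) and the function names are assumptions, and the sketch does not read the actual CAL_LID_L2 files. It also assumes that CAD statistics are computed over cloudy profiles only.

```python
def summarize_profile(layer_cad_scores):
    """Reduce one CALIOP profile to (is_cloudy, highest_cad).

    layer_cad_scores: hypothetical list holding one CAD score (0-100 %)
    per cloud layer found; an empty list means zero layers were found.
    """
    is_cloudy = len(layer_cad_scores) > 0
    # Column-integrated view: keep only the highest CAD score in the profile.
    highest_cad = max(layer_cad_scores) if is_cloudy else None
    return is_cloudy, highest_cad


def share_with_cad_at_least(profiles, threshold):
    """Percentage of cloudy profiles whose highest CAD score >= threshold."""
    best = [summarize_profile(p)[1] for p in profiles if p]
    return 100.0 * sum(score >= threshold for score in best) / len(best)
```

Calling `share_with_cad_at_least(profiles, 70)` on a list of profiles mirrors the kind of statistic quoted above for the 70 % threshold; the function raises `ZeroDivisionError` if no profile contains cloud.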

Matching CALIOP and MODIS data
Matching CALIOP data is a well-established method for the calibration/validation of atmospheric products from various space missions. It has already been widely used for Aqua MODIS (Baum et al., 2012; Holz et al., 2008; Sun-Mack et al., 2014; Wang et al., 2016; Xie et al., 2010) and sensors flown on board Suomi NPP, NOAA, and MetOp polar orbiting spacecraft, which occasionally synchronize their orbital configuration with CALIPSO (Heidinger et al., 2012; Karlsson and Johansson, 2013; Karlsson and Dybbroe, 2010). CALIPSO also passes within the field of view of geostationary satellites, and CALIOP data are used to assess their atmospheric products (Sèze et al., 2015; Shang et al., 2018).
In this study, Aqua MODIS data for January and July 2015 were paired with corresponding CALIPSO-CALIOP observations. The matching procedure selected a MODIS IFOV and compared it with the corresponding CALIOP profile (where the geometric centre was within the selected MODIS IFOV). The orbital configuration of the two missions only allows CALIOP to sample MODIS IFOVs that are close to the MODIS nadir (due to low sensor viewing angles). Consequently, matching observations across the whole width of the MODIS swath is not possible.
Although very straightforward, the procedure was time-consuming since a single MODIS granule contains ∼ 2030 IFOVs, and a full day of Aqua observations produces 288 granules. The final database consisted of 33 793 648 paired MODIS-CALIOP observations. Average spatial separation between the centres of MODIS and CALIOP IFOVs was 418 m, and 19 % had a separation of less than 250 m. Temporal differences between lidar and imager observations were determined using the spacecraft's on-orbit separation, and ranged from 60 to 97 s (81 s on average). Our dataset excluded one MODIS cloud mask processing path: sunglint. This was because CALIPSO's orbit has been intentionally designed to avoid sunglint areas, in order to avoid the lidar being "blinded" by solar reflection from the ocean.
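A minimal Python sketch of the pairing step is given below. It is not the procedure used in this study: instead of testing whether the CALIOP profile centre lies within the MODIS IFOV, it picks the nearest MODIS IFOV centre using the haversine great-circle distance and rejects matches beyond a cut-off. The function names, the 1 km cut-off, and the spherical Earth radius are assumptions made for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean spherical radius (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def match_profile(profile, modis_pixels, max_km=1.0):
    """Return (index, distance_km) of the nearest MODIS IFOV centre.

    profile: (lat, lon) of the CALIOP profile centre.
    modis_pixels: list of (lat, lon) MODIS IFOV centres.
    Returns None if there are no pixels or the nearest one is beyond max_km.
    """
    best = min(
        ((i, haversine_km(profile[0], profile[1], lat, lon))
         for i, (lat, lon) in enumerate(modis_pixels)),
        key=lambda item: item[1],
        default=None,
    )
    if best is None or best[1] > max_km:
        return None
    return best
```

The returned distance can be stored with each pair, which makes it possible to report separation statistics such as those given above. A brute-force search like this is slow for full granules; a spatial index would be the usual choice in practice.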
Our empirical calculation of cloud fraction in each MODIS cloud mask class was based on the ratio of CALIOP cloudy detections to all detections within a class. A perfect MODIS cloud detection algorithm would categorize a 0 % cloud fraction as confident clear, while a 100 % cloud fraction would be categorized as confident cloudy.
The resulting MODIS-CALIOP statistics have been spatially aggregated and are shown as global maps with 2.5° × 2.5° longitude and latitude resolution.
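The per-class ratio and the 2.5° aggregation described above can be sketched as follows. The data layout (pairs of a MODIS class label and a CALIOP cloud flag) and the function names are assumptions for illustration, not the code used in this study.

```python
from collections import defaultdict


def grid_cell(lat, lon, size=2.5):
    """Return (row, col) of the size-degree grid box containing a point."""
    # Clamp so that lat = 90 and lon = 180 fall into the last row/column.
    row = min(int((lat + 90.0) // size), int(180 / size) - 1)
    col = min(int((lon + 180.0) // size), int(360 / size) - 1)
    return row, col


def cloud_fraction_by_class(pairs):
    """Cloud fraction (%) per MODIS class from matched observations.

    pairs: iterable of (modis_class, caliop_cloudy) tuples, where
    caliop_cloudy is True when CALIOP reported at least one cloud layer.
    """
    counts = defaultdict(lambda: [0, 0])  # class -> [cloudy, total]
    for modis_class, caliop_cloudy in pairs:
        counts[modis_class][0] += int(caliop_cloudy)
        counts[modis_class][1] += 1
    return {c: 100.0 * cloudy / total for c, (cloudy, total) in counts.items()}
```

Grouping the pairs by `grid_cell` before calling `cloud_fraction_by_class` yields one set of fractions per grid box, which is the form needed for a global map.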

Misdetection of cloud and clear sky by MODIS
The first key point that emerged from the matched MODIS-lidar observations was the accuracy of MODIS cloud detections. Overall accuracy in January and July 2015, compared to reference CALIOP data, was 89.4 % during the day and 84.2 % at night (Table 1). This statistic assumes that probably clear detections are merged with confident clear, and probably cloudy detections are combined with confident cloudy. If a less tolerant approach is applied, i.e. only confident clear and confident cloudy detections are considered (probably clear and probably cloudy classes are interpreted as misdetections), overall accuracy falls to 81.9 % during the day and 73.3 % at night.
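The two ways of scoring accuracy described above (tolerant and less tolerant) can be written as a short Python sketch. The input layout and the function name are hypothetical; the sketch only illustrates how the intermediate classes are handled in each case.

```python
CLOUDY = {"confident cloudy", "probably cloudy"}


def overall_accuracy(pairs, strict=False):
    """Overall accuracy (%) of MODIS classes against CALIOP cloud flags.

    pairs: iterable of (modis_class, caliop_cloudy) tuples.
    strict=False merges "probably" classes with their "confident" neighbours.
    strict=True counts every "probably" detection as a misdetection.
    """
    hits = total = 0
    for modis_class, caliop_cloudy in pairs:
        total += 1
        if strict and modis_class.startswith("probably"):
            continue  # intermediate classes never count as agreement
        modis_cloudy = modis_class in CLOUDY
        hits += int(modis_cloudy == caliop_cloudy)
    return 100.0 * hits / total
```

Because the strict variant keeps intermediate detections in the denominator, its result can never exceed the tolerant one for the same set of pairs.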
Clouds missed by MODIS but detected by CALIOP were most frequent during the polar night, regardless of the hemisphere (Fig. 1c, d). Up to 40 % of MODIS confident clear and probably clear detections were found to be incorrect around Antarctica in July and the Arctic in January. Globally, daytime (Fig. 1a, b) misdetections were around half of those at night. They only exceeded 30 % locally, and polar regions were less affected. A notable observation was July in the Northern Hemisphere, where only a few small regions of misdetection were observed. The analysis highlighted a belt of relatively higher-frequency misdetections (15 %-25 %) in the equatorial zone; here the magnitude of the effect was similar for both day and night.
Only a few occasions were identified when over 10 %-15 % of MODIS confident cloudy and probably cloudy observations were identified as cloud-free by CALIOP (Fig. 2). Further analysis showed that although false detection was rare in polar regions, it was significant in specific regions of the Northern Hemisphere. North-east China emerged as the most problematic area (Fig. 2a). Here, 50 %-70 % of MODIS confident cloudy and probably cloudy detections were cloud-free according to CALIOP. However, this high rate of false detection was only observed in January, and only during the day.

Cloud fraction for cloud mask classes
On average, one-fifth of MODIS confident clear detections were found to be cloudy by CALIOP. Consequently, the average cloud fraction for this class was 21.5 %, instead of the theoretically expected 0 % (Table 2). At night, the fraction was over twice the daytime value (29.5 % compared to 12.7 %). On the other hand, pixels flagged by the MODIS algorithm as confident cloudy were, almost always, contaminated with some cloud and were sometimes cloud-filled. Regardless of the time of day, the actual CALIOP-based cloud fraction for confident cloudy detections was close to 100 %, reaching 94.7 %.
MODIS intermediate classes constituted 13.3 % of all detections. CALIOP cloud fractions were 27.7 % and 66.6 % for probably clear and probably cloudy classes respectively. The statistics revealed a difference of up to 17 % between day and night conditions; it was especially small for the probably clear class (1.3 %) and the confident cloudy class (time of day had no impact at all).
The parameters reported in Table 2 are global averages (means), and spatial diversity was observed. Differences were smallest for the confident cloudy class, both during the day (Fig. 3a) and at night (Fig. 3b), and the CALIOP-based cloud fraction exceeded 90 % at almost every location. North-east China, the southern Arabian Peninsula, and eastern Antarctica were the only significant exceptions; here cloud fraction decreased to 50 %-70 %. For the confident clear class, the cloud fraction distribution was homogeneous, but only during the day (Fig. 3g). At night (Fig. 3h) it increased substantially in polar regions, especially over the oceans of the Southern Hemisphere, along the coast of Antarctica. Unlike polar regions, no noticeable day-night difference was observed for mid-latitudes and low latitudes (< 10 %); here, the cloud fraction was very low (< 20 %), and very few MODIS confident clear detections were identified as cloudy by CALIOP.
Among the MODIS intermediate classes, probably clear differed most from CALIOP-based data. First, a high cloud fraction (> 70 %) was observed at night along the Equator and in polar regions (both oceanic and continental, Fig. 3f). At mid-latitudes the cloud fraction for MODIS probably clear observations was relatively low (< 10 %-20 %). This pattern was inverted during the day (Fig. 3e). At this time a higher (50 %-75 %) cloud fraction was noted for mid-latitudes and over parts (typically land) of the polar regions.

A. Z. Kotarba: MODIS cloud amount calibrated with CALIOP

Cloud fraction as a function of the algorithm path
The MODIS cloud detection algorithm distinguishes between day and night (Tables 1, 2) and four types of background (land, desert, coast, ocean), each of which can be either snow-covered or snow-free. CALIOP-based cloud fractions for all algorithm paths are reported in Table 3. These values give a detailed understanding of MODIS cloud detection results. Data are given for each class of the MODIS cloud mask separately. In our study, we structured the paths through the algorithm in more detail. Snow-covered conditions were considered for land, desert, ocean, and coast separately, while in the MODIS algorithm they are grouped as snow/ice. This greater level of detail allowed us to observe how the presence of snow impacted the cloud mask over different backgrounds.
Per-class estimates of cloud fraction were very consistent for all algorithm paths for the confident cloudy category (Table 3). Final values ranged between 97.7 % (night, snow-free, land) and 86.4 % (night, snow-covered, desert) and were close to the standard Level 3 assumption of 100 %. This finding contrasted with cloud fractions found for the confident clear category. While MODIS recorded cloud-free conditions, CALIOP data revealed that the actual cloud fraction ranged from 8.0 % (night, snow-free, land) to 49.7 % (night, snow-covered, ocean).
The combination of night, an oceanic background, and snow cover (or sea ice) constituted the most cloudy scenario (Fig. 4). Here, a very high cloud fraction was found for not only the confident cloudy category (96.8 %, Fig. 4a), but also all remaining classes: 82.5 % (probably cloudy; Fig. 4b), 73.3 % (probably clear; Fig. 4c), and, surprisingly, up to 49.7 % for confident clear (Fig. 4d).
CALIOP-based cloud fractions were most consistent with the standard Level 3 interpretation for snow-free land at night (Fig. 5). Here, the cloud fraction for confident clear was low (8.0 %; Fig. 5d) and very high for confident cloudy (97.7 %; Fig. 5a). At the same time, intermediate classes were well separated: 68.5 % for probably cloudy (Fig. 5b) and 25.6 % for probably clear (Fig. 5c). Globally, no significant difference was found for cloud fraction values for the night, snow-free, land algorithm path. A small exception was noted for the probably clear type, where the cloud fraction was 10 %-30 % higher in the tropics compared to the rest of the world.
A similar spatial distribution of CALIOP-based cloud fractions was observed for snow-free land during the day, the scenario of particular interest for land-vegetation remote sensing with MODIS. The two notable differences were related to probably clear (Fig. 6c) and confident clear categories (Fig. 6d). For the latter, the cloud fraction during the day (15.6 %) was about twice that at night (8.0 %). Similarly, cloud was more frequent in the probably clear class. However, this was only found in the tropics and at high latitudes, which mirrored a zonal pattern that was only weakly seen at night.
As ice-free oceans represent the majority of Earth's surface, cloud detection over ocean is the most frequent algorithm path. Daytime conditions make detection easier (due to the availability of solar channels). Under such circumstances, CALIOP detected cloud in 10.5 % of MODIS's confident clear observations (Fig. 7d) and confirmed 95.2 % of confident cloudy detections (Fig. 7a). Cloud fractions for intermediate classes (daytime over ice-free ocean) were 54.5 % and 28.4 % for probably cloudy (Fig. 7b) and probably clear (Fig. 7c) categories, respectively. Probably clear was the only class where there was a clear latitude-dependent cloud fraction: values increased by 30 %-60 % along a path at ∼ 30-40° N/S.

Discussion
Our investigation of spatially and temporally collocated MODIS (cloud imager) and CALIOP (cloud-profiling lidar) observations for January and July 2015 revealed that MODIS Collection 061 global cloud amount estimates are imperfect. During the generation of the Level 2 product, the masking algorithm fails to accurately report cloud over polar regions, and over selected locations at lower latitudes. Consequently (as discussed in this section) the Level 3 product generation underestimates cloud fractions for cloud mask classes in numerous regions. The reliability of these results depends on several factors, most notably the spatial and temporal accuracy of Aqua MODIS and CALIPSO-CALIOP collocation.
Temporal differences between Aqua and CALIPSO observations varied from 60 to 97 s. In this time, cloud can develop and move, introducing the risk that CALIOP observes a different state of the atmosphere compared to MODIS. Várnai and Marshak (2012) evaluated the problem by comparing MODIS reflectance with that collected by the Wide Field Camera. The latter is an imaging instrument flown on board CALIPSO, along with CALIOP. They found that for low cloud, radiance differed only slightly over 72 s, and it was reasonable to ignore any discrepancies when focusing on aerosol properties (they gave no particular conclusions for cloud). In order to test how sensitive our results were to the time shift, we calculated the overall accuracy of the cloud detection algorithm as a function of the time between Aqua and CALIPSO passes. The results were very consistent: despite the shift, accuracy remained at 86.7 ± 0.1 %. This finding confirmed that the temporal separation between Aqua and CALIPSO had no significant impact on the results of our study.
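The sensitivity test described above amounts to binning matched observations by their separation and computing accuracy per bin. A minimal Python sketch follows; the record layout, the function name, and the bin width in the example are assumptions, and the same function would work for any separation measure expressed as a number.

```python
from collections import defaultdict


def accuracy_by_bin(records, bin_width):
    """Overall accuracy (%) per separation bin.

    records: iterable of (separation, modis_cloudy, caliop_cloudy) tuples,
    with separation as a non-negative number (e.g. seconds).
    Returns a dict mapping the lower edge of each bin to accuracy in %.
    """
    stats = defaultdict(lambda: [0, 0])  # bin start -> [agreements, total]
    for separation, modis_cloudy, caliop_cloudy in records:
        start = (separation // bin_width) * bin_width
        stats[start][0] += int(modis_cloudy == caliop_cloudy)
        stats[start][1] += 1
    return {start: 100.0 * ok / n for start, (ok, n) in sorted(stats.items())}
```

If accuracy is nearly flat across bins, as reported above, the separation can be treated as having no significant effect on the comparison.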
Table 3. CALIOP-based cloud fractions for MODIS cloud mask classes, calculated individually for each MODIS algorithm path. Note that more paths are reported here than in the MODIS project. Snow-covered ocean, land, desert, and coast constitute a single path in the operational algorithm, while here they are reported individually to highlight how snow impacts the results. The sunglint path is missing as CALIOP does not sample over sunglint areas. Numbers in brackets refer to how frequently (% of n) a given algorithm path was executed, n = 33 793 648.
Another potential source of uncertainty is the geometric mismatch between MODIS and CALIOP IFOVs. They are not aligned perfectly: 66 % of collocated IFOVs were separated by less than 0.5 km and 82 % by less than 0.6 km. Similar statistics (75 % and 93 % respectively) were found by Wang et al. (2016) in their investigation of cloud based on MODIS and CALIOP observations. To investigate whether geometric conditions did have an impact on our results, we calculated the overall accuracy of the MODIS cloud mask as a function of the distance between MODIS and CALIOP IFOVs. For ranges up to 1 km with a 100 m step, the change in accuracy was insignificant: 87.0 ± 0.3 % on average. (See Fig. S2 for more detailed statistics about the spatial and temporal separation between MODIS and CALIOP.)
It is possible that agreement between MODIS and CALIOP data is affected by cloud optical thickness (τ) or, more precisely, by the higher sensitivity of CALIOP in detecting optically thin cloud. Ackerman et al. (2008) estimated the MODIS limit for τ to be approximately 0.4. A similar improvement in agreement with CALIOP as a consequence of increasing τ was observed by Karlsson and Håkansson (2018) for the Advanced Very High Resolution Radiometer (AVHRR) instrument. The latter study demonstrated that the imager's probability of detection changed in the range 0.0 < τ < 1.0.
We calculated the same statistic, and we found that the probability distribution for MODIS had the same shape as that for AVHRR, although MODIS values were higher. This finding strongly suggests that cloud thickness has the same impact on our results as that found in previous studies.
Collection 006 of MODIS data was investigated by Wang et al. (2016), who used lidar-radar (CALIPSO-CloudSat) profiles to focus on daytime multi-layered clouds. Our findings are consistent with those reported by Wang et al. (2016), despite the fact that the latter authors used a dataset of 267 million IFOVs, while our study relied on around 33 million profiles. Their validation of Collection 006 reliability found overall agreement of 77.8 %, compared to 81.9 % in our study. The difference may be due to the different sample sizes. Our result for cloud-free sky detection was slightly higher than in Wang et al. (2016): 25.5 % compared to 20.9 %. On the other hand, results for cloudy-sky detection were very similar: 56.9 % compared to 56.4 % in our study.
Our study revealed that even for Collection 061, i.e. the most recent (July 2020) version of the MODIS cloud mask, up to 40 % of cloud-free skies detected during the polar night were actually cloudy. Daytime accuracy was lowest over China (in January), the USA and Canada (in January), and over tropical ocean along the west (January) and east (July) coasts of Africa. In these cases, MODIS detected cloud that did not exist according to CALIOP. False detections may be due to snow cover (the USA and Canada), high aerosol content over China (Zhang et al., 2019), and ocean bordering desert regions in North Africa (Weinzierl et al., 2017; Zuluaga et al., 2012).
As reported by Wang et al. (2016), and previously by Baum et al. (2012) and Ackerman et al. (2008), cloud detection in polar regions remains an unresolved issue for MODIS and similar passive imaging radiometers. Polar night is especially challenging. Successful discrimination between cloud and the underlying surface requires radiance measurements in ice absorption bands (e.g. 1.6 or 2.1 µm). But as these are only available in daytime, night-time detection has to rely on thermal infrared data. As thermal inversion in the polar troposphere decreases the thermal contrast between cloud and the background, the thermal signatures of cloud and the land-ocean surface become indistinguishable, leading to cloud masking errors (Liu et al., 2004). CALIOP, however, does not require solar illumination to operate. As it uses light emitted by the instrument itself, its performance is far less affected by day-night conditions. CALIOP's night-time data are of even higher quality, because daytime solar illumination introduces an additional background signal and, thus, decreases the signal-to-noise ratio.
Furthermore, MODIS tends to miss up to ∼20 % of cloud along the Intertropical Convergence Zone (ITCZ), regardless of the time of year (January or July) and the time of day. This can be partially explained by the fact that MODIS is less sensitive to optically thin cloud than CALIPSO, and the ITCZ is the region where cirrus is most frequently observed (Sassen et al., 2009). The higher sensitivity of CALIOP to optically thin cirrus, and the higher sensitivity of the lidar during night-time, also explains why CALIOP-based cloud fractions for MODIS confident clear and probably clear classes are higher along the ITCZ at night (Fig. 3f, h) than during the day (Fig. 3e, g).
Figure 6. CALIOP-based cloud fraction for MODIS cloud mask classes for the "daytime, snow-free land" algorithm path and corresponding histograms (red vertical line indicates the mean value).
The main goal of our study was to investigate the validity of the standard (operational) approach to the quantitative interpretation of MODIS cloud mask classes. The most important consequence of calculating empirical cloud fractions for MODIS cloud mask categories is the ability to recalculate global cloud amount with new weights. Therefore, instead of using global fractions (reported in Table 2), we derived a set of dedicated fractions for each algorithm path and each 2.5° grid box (i.e. a local equivalent to the data given in Table 3). This considers MODIS IFOVs within the full swath (excluding sunglint), and not only those collocated with CALIOP. Full-swath data were used because the MODIS L3 cloud amount product applies the same cloud mask interpretation to all IFOVs, regardless of their off-nadir angle. On the other hand, the use of nadir-only MODIS observations would result in CALIOP-like spatial coverage of the data, creating significant gaps due to CALIOP's pencil-like viewing geometry. Figure 8 illustrates the results of the calculation and reports differences in cloud amount between the MODIS ST operational product and the product generated using the fractions presented in this study.
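The recalculation with new weights can be illustrated with a short sketch. It computes cloud amount for one grid box as a weighted mean over cloud mask classes, once with the operational weights (0 %, 0 %, 100 %, 100 %) and once with the global CALIOP-based fractions reported in this study (21.5 %, 27.7 %, 66.6 %, 94.7 %). This is a simplification: the study itself used fractions specific to each algorithm path and grid box rather than the global values. The function name, dictionary keys, and IFOV counts are hypothetical and serve only as an illustration.

```python
# Cloud fraction (0-1) assigned to each MODIS cloud mask class.
OPERATIONAL = {"confident_clear": 0.0, "probably_clear": 0.0,
               "probably_cloudy": 1.0, "confident_cloudy": 1.0}
# Global CALIOP-based fractions reported in this study (percent / 100).
CALIOP_GLOBAL = {"confident_clear": 0.215, "probably_clear": 0.277,
                 "probably_cloudy": 0.666, "confident_cloudy": 0.947}

def cloud_amount(class_counts, weights):
    """Return cloud amount (%) for one grid box as a weighted mean over IFOVs."""
    total = sum(class_counts.values())
    if total == 0:
        raise ValueError("grid box contains no IFOVs")
    cloudy = sum(n * weights[cls] for cls, n in class_counts.items())
    return 100.0 * cloudy / total

# Hypothetical IFOV counts for a single grid box (illustrative only).
counts = {"confident_clear": 400, "probably_clear": 100,
          "probably_cloudy": 100, "confident_cloudy": 400}

ca_operational = cloud_amount(counts, OPERATIONAL)
ca_caliop = cloud_amount(counts, CALIOP_GLOBAL)
print(f"operational: {ca_operational:.2f} %, CALIOP-based: {ca_caliop:.2f} %")
```

In a grid-box-specific version, the weights dictionary would be looked up per algorithm path and per 2.5° grid box before the weighted mean is taken.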
The outcome of the simulation shows that the use of current operational cloud fractions introduces significant errors. In some locations, MODIS underestimates cloud amount by 20 %-40 %, most notably in polar regions at night. An overestimation of similar magnitude is observed mostly over the Northern Hemisphere: the USA and Canada and China in January (both day and night) and the tropical coasts of Africa during the day (in both January and July). Consequently, MODIS Level 3 estimates of cloud amount should be used with great caution in those regions. This is especially important for the Arctic, which is undergoing a rapid change in climatic conditions (Serreze and Barry, 2011), and where cloud has been found to be an essential element in feedback (Kay et al., 2008;Vavrus, 2004;Shupe and Intrieri, 2004;Tan and Storelvmo, 2019).
Figure 7. CALIOP-based cloud fraction for MODIS cloud mask classes for the "daytime, snow-free ocean" algorithm path and corresponding histograms (red vertical line indicates the mean value).
The availability of collocated MODIS and CALIOP observations also allowed us to examine which of the three best-guess interpretations of cloud mask categories is most accurate: the one in which only confident cloudy IFOVs are cloudy, the one that only considers confident clear to be clear, or the operational approach. We therefore calculated merged global cloud amount for January and July 2015. Our results show that, on the global scale, the standard approach is closest to CALIOP reference data, although only during the day (Table 4). At night, it is more accurate to assume that only confident clear is actually cloud-free. The global result is biased by the polar night. In these conditions, all three best-guess interpretations noticeably underestimate cloud amount. At low latitudes and mid-latitudes the standard (operational) approach differs from CALIOP data by ±2 %. However, it should be noted that these statistics relate to large areas. As our study shows, regional differences can be an order of magnitude larger (Fig. 8).
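The three best-guess interpretations differ only in which classes are counted as fully cloudy. A minimal sketch of the comparison is given below, assuming hypothetical IFOV counts per class; the interpretation labels and the function name are illustrative and do not come from the MODIS processing software.

```python
# Class order: confident clear, probably clear, probably cloudy, confident cloudy.
# 1 = class treated as completely cloudy, 0 = class treated as cloud-free.
BEST_GUESS = {
    "only_confident_cloudy_is_cloudy": (0, 0, 0, 1),
    "operational":                     (0, 0, 1, 1),
    "only_confident_clear_is_clear":   (0, 1, 1, 1),
}

def best_guess_cloud_amount(counts, flags):
    """Cloud amount (%) when each class is either fully cloudy or cloud-free."""
    total = sum(counts)
    return 100.0 * sum(n * f for n, f in zip(counts, flags)) / total

# Hypothetical IFOV counts per class (illustrative only).
counts = (400, 100, 100, 400)

results = {name: best_guess_cloud_amount(counts, flags)
           for name, flags in BEST_GUESS.items()}
for name, value in results.items():
    print(f"{name}: {value:.1f} %")
```

By construction, the operational estimate always lies between the other two, so the spread between them bounds what any binary interpretation of the four classes can yield for a given sample.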
Our study assumed that CALIOP's cloudy IFOV was always completely cloud-filled. This assumption is common when interpreting cloud masks based on data from the majority of imaging radiometers flown on board meteorological and land-imaging satellites. However, studies by Zhao and Di Girolamo (2006), and Kotarba (2010) suggest that this postulate may not be true. Both of the latter studies took advantage of a rare collocation between a meteorological imager (MODIS) and the high-resolution land imager (ASTER) flown on board the Terra satellite. Nearly 3000 ASTER IFOVs were located within each MODIS pixel. Kotarba (2010) showed that for sunglint-free oceanic scenes in the tropics, actual cloud coverage for the confident cloudy MODIS category was 79.2 % (mean) or 99.8 % (median), instead of the assumed 100 %. Comparable statistics for CALIOP are not available, as the CALIPSO spacecraft does not carry a high-resolution imager. Given the lack of alternatives, we must accept the hypothesis that cloudy means 100 % cloud-filled.

Summary and conclusion
This study investigated 33 793 648 collocated MODIS (cloud imager) and CALIOP (cloud-profiling lidar) observations, acquired in January and July 2015. Our evaluation of the dataset allowed us to answer three essential questions, related to global estimates of cloud amount resulting from the Aqua MODIS mission. These questions are as follows.
1. What are the actual cloud fractions corresponding to MODIS cloud mask classes? We found that these fractions are 21.5 %, 27.7 %, 66.6 %, and 94.7 %, rather than the MODIS Science Team-assumed values of 0 %, 0 %, 100 %, and 100 % for the confident clear, probably clear, probably cloudy, and confident cloudy categories respectively (Table 2). Importantly, we found that the percentage of cloud cover to be assigned to MODIS cloud mask classes varied spatially (Fig. 3), and we recommend that global fractions should be avoided, in favour of local alternatives.
2. How significant are uncertainties in global cloud amount estimates calculated using the MODIS ST operational approach? We found that uncertainties were up to −30 % of cloud amount in the polar regions at night and up to +30 % of cloud amount in selected locations over the Northern Hemisphere, more frequently during the day (Fig. 8).
3. Is the MODIS Level 3 standard approach reliable? Our results showed that when a global cloud amount value is required (day and night, for all latitudes), the standard approach can be considered reliable (Table 4). We found that, in this case, it was more accurate than other best-guess approaches, namely only confident clear is clear (other classes are cloudy) and confident cloudy is cloudy (other classes are clear). However, on a regional scale the standard approach fails (Fig. 8). Whenever MODIS cloud amount is estimated regionally or locally it is necessary to assess whether a particular location might be affected by an error of up to ±30 %.
Errors and uncertainties related to the generation of the MODIS Level 3 cloud amount product originate in the Level 2 product: the cloud mask (Figs. 1 and 2 vs. Fig. 8).
The cloud detection algorithm is more or less accurate depending on environmental conditions, which are approximated as algorithm paths (Table 3). However, conditions within paths are not constant (Figs. 4-7): for instance, the same radiance/reflectance thresholds are applied to Europe, the USA, and China, while environmental conditions in these locations are not the same (e.g. different aerosol loads, different aerosol optical properties). The MODIS Science Team have attempted to discriminate between these conditions. For instance, since Collection 006 the 0.86 µm reflectance test over land considers thresholds that are a function of the normalized difference vegetation index (NDVI) and scattering angle. Although cloud misclassification is less frequent than in previous Collections, it still occurs and impacts the degree of uncertainty of L3 cloud amount estimates, as shown in this study.
CALIOP-based estimates of cloud fraction are a robust way to adjust (and correct) MODIS estimates. The method described in this paper can be used globally, with the exception of sunglint regions (which are not sampled by CALIOP). In these areas best-guess findings can, potentially, be applied. CALIOP also does not sample MODIS IFOVs other than those close to nadir, while increasing the sensor's zenith angle impacts cloud amount estimates (Maddux et al., 2010). However, the MODIS ST standard procedure assumes identical cloud fractions for all cloud mask classes, regardless of the viewing angle. The analogous application of CALIPSO-based cloud fractions to off-nadir IFOVs may still be an improvement over the MODIS ST best-guess procedure, as it is for nadir IFOVs. The potential availability of lidar data for all MODIS zenith angles could further improve the method.
The polar regions benefit most from the new method. Cloud fractions derived for Aqua MODIS may also be adopted for Terra MODIS, since the two sensors are expected to produce comparable and homogeneous records. Moreover, the occasional collocation of the CALIPSO satellite with the AVHRR and Visible Infrared Imaging Radiometer Suite (VIIRS) instruments makes it possible to calculate similar cloud fractions for these missions and produce more reliable cloud climatologies.
Supplement. The supplement related to this article is available online at: https://doi.org/10.5194/amt-13-4995-2020-supplement.
Competing interests. The author declares that there is no conflict of interest.
Financial support. This research has been supported by the National Science Centre of Poland (grant no. UMO-2017/25/B/ST10/01787) and the Space Research Centre, Polish Academy of Sciences (Statutory research theme "Satellite monitoring of geophysical processes in the atmosphere and Earth's surface, related to climate and light pollution").
Review statement. This paper was edited by Andrew Sayer and reviewed by two anonymous referees.