Retrieval of liquid water cloud properties from POLDER-3 measurements using a neural network ensemble approach

. This paper describes a neural network algorithm for the estimation of liquid water cloud optical properties from the Polarization and Directionality of Earth’s Reﬂectances-3 (POLDER-3) instrument aboard the Polarization & Anisotropy of Reﬂectances for Atmospheric Sciences coupled with Observations from a Lidar (PARASOL) satellite. The algorithm has been trained on synthetic multi-angle, multi-wavelength measurements of reﬂectance and polarization and has been applied to the processing of 1 year of POLDER-3 data. Comparisons of the retrieved cloud properties with Moderate Resolution Imaging Spectroradiometer (MODIS) products show that the neural network algorithm has a low bias of around 2 in cloud optical thickness (COT) and between 1 and 2 µm in the cloud effective radius. Comparisons with existing POLDER-3 datasets suggest that the proposed scheme may have enhanced capabilities

Abstract. This paper describes a neural network algorithm for the estimation of liquid water cloud optical properties from the Polarization and Directionality of Earth's Reflectances-3 (POLDER-3) instrument aboard the Polarization & Anisotropy of Reflectances for Atmospheric Sciences coupled with Observations from a Lidar (PARASOL) satellite. The algorithm has been trained on synthetic multi-angle, multi-wavelength measurements of reflectance and polarization and has been applied to the processing of 1 year of POLDER-3 data. Comparisons of the retrieved cloud properties with Moderate Resolution Imaging Spectroradiometer (MODIS) products show that the neural network algorithm has a low bias of around 2 in cloud optical thickness (COT) and between 1 and 2 µm in the cloud effective radius. Comparisons with existing POLDER-3 datasets suggest that the proposed scheme may have enhanced capabilities for cloud effective radius retrieval, at least over land. An additional feature of the presented algorithm is that it provides COT and effective radius retrievals at the native POLDER-3 Level 1B pixel level.

Introduction
Clouds are undoubtedly one of the most important components of the Earth system. Cloud formation and transport processes are among the most imposing mechanisms through which water is daily redistributed across our planet, with obvious implications for meteorology and climate. In this re-gard, it is worthwhile to mention that every day approximately between 55 % and 75 % of the Earth's surface is covered by clouds (Stubenrauch et al., 2013). In addition to their role in the water cycle, clouds also impact the Earth's climate by affecting the planetary energy balance in multiple ways. They exert a cooling effect by reflecting incoming solar radiation at visible wavelengths and a warming effect by absorbing and re-emitting infrared radiation (Rossow and Lacis, 1990;Rossow and Zhang, 1995). The impact of clouds on climate is further complicated by the existence of a number of feedback mechanisms involving clouds and temperature (Stephens, 2004) and by cloud-aerosol interactions (Rosenfeld et al., 2014;Fan et al., 2016). According to the latest reports of the Intergovernmental Panel on Climate Change (IPCC), the net effect of clouds on our climate is still highly uncertain (Boucher et al., 2013).
In order to reduce our uncertainty about the effect of clouds on the climate system, it is crucial to establish a global observational basis for a number of cloud properties. These include cloud cover, thermodynamic phase, optical thickness, droplet concentration, height and size. The estimation of cloud properties by means of satellite remote sensing has received significant attention in the last 3 decades. The most widespread estimation methods infer cloud properties from measurements of reflected sunlight in the visible, near and shortwave infrared spectral domain or from measurements of emitted radiation in the longwave infrared domain. Reflectance measurements in visible (VIS) and near-infrared (NIR) channels with no substantial water absorption are 1698 A. Di Noia et al.: Neural network cloud retrievals from POLDER sensitive to the cloud optical thickness (COT), whereas reflectance measurements in shortwave infrared (SWIR) channels with significant absorption by liquid water or ice are sensitive to the cloud effective radius (Nakajima and King, 1990). Brightness temperature measurements in the thermal infrared (TIR) allow the estimation of cloud top pressure and temperature (Menzel et al., 2008). Furthermore, measurements in SWIR and TIR channels at which liquid water and ice exhibit different absorption behaviour are suitable for the estimation of the cloud thermodynamic phase (Baum et al., 2000). The measurement principles mentioned so far form the basis for the estimation of cloud properties from multispectral instruments such as the Moderate Resolution Imaging Spectroradiometer (MODIS), as described in King et al. (2003) and Platnick et al. (2003Platnick et al. ( , 2017, or the Spinning Enhanced Visible and Infrared Imager (SEVIRI; Roebeling et al., 2006). Complementary information about the cloud liquid and ice water path can be obtained from passive microwave measurements (Vivekanandan et al., 1991;Liu and Curry, 1993), and information on the vertical structure of clouds is provided by active instruments, which include spaceborne lidars (Wu et al., 2011) and cloud profiling radars (Stephens et al., 2008).
An additional source of information for the retrieval of cloud properties from space is represented by multi-angle polarimetric observations, such as those provided by the Polarization and Directionality of Earth's Reflectances (POLDER) instruments (Deschamps et al., 1994), which will be followed by the by the Multi-Viewing Multi-Channel Multi-Polarization Imaging (3MI) instrument (Fougnie et al., 2018) in 2021. While the COT retrieval in POLDER Level 2 products follows a principle similar to that used in MODIS retrievals (Buriez et al., 1997), the retrieval of the cloud effective radius is based on the analysis of the polarized reflectance at scattering angles ranging from approximately 140 to 165 • . More specifically, in the presence of a liquid water cloud, the polarized reflectance has a maximum at a scattering angle close to 142 • (the primary cloudbow), and -if the droplet size distribution is narrow enough (effective variance smaller than ∼ 0.05) -a number of secondary maxima in the angular range from 145 to 165 • . The scattering angles at which such secondary maxima occur depend on the cloud effective radius (Bréon and Goloub, 1998;Bréon and Doutriaux-Boucher, 2005;Alexandrov et al., 2012a). The amplitude of the angular oscillations of the polarized reflectance instead depends on the effective variance, with smaller effective variances leading to wider oscillations (Shang et al., 2015).
Cloud effective radius retrievals from multi-angle polarimetry are often based on tabulated synthetic measurements and require an assumption about the cloud droplet size distribution, which can bias the retrievals toward certain cloud types (Bréon and Doutriaux-Boucher, 2005). When polarization measurements are available at a large number of viewing angles, as is the case with the airborne Research Scanning Polarimeter (RSP) instrument (Cairns et al., 1999), techniques exist for estimating the complete cloud droplet size distribution that rely on fewer and less-stringent assumptions (Alexandrov et al., 2012b). In this paper we investigate an alternative method for the estimation of the COT and cloud effective radius from satellite multi-angle polarimetric measurements, based on artificial neural networks (NNs). NNs have been widely applied to cloud remote sensing problems, including cloud detection and classification (Miller and Emery, 1997;Aires et al., 2011;Taravat et al., 2015), retrieval of cloud properties based on traditional, plane-parallel radiative transfer assumptions (Cerdeña et al., 2007;Loyola et al., 2007Loyola et al., , 2010Loyola et al., , 2018Håkansson et al., 2018), and threedimensional retrievals (Cornet et al., 2004;Cornet et al., 2005;Evans et al., 2008;Okamura et al., 2017). Among the attractive features of NN-based retrieval schemes are their high speed and their modest memory demand (at least after the training phase is completed), which make them suitable for processing large amounts of measurements in very little time. Furthermore, NN retrievals have sometimes been shown to be more accurate than lookup-table (LUT) retrievals with reasonably sized LUTs (Di Noia et al., 2015Noia et al., , 2017Whitburn et al., 2016). Besides the aforementioned advantages, NN-based schemes also have some disadvantages with respect to other methods. One disadvantage is that they do not directly provide a measure of how well the retrieved parameters fit the measurements. Another disadvantage is that NN retrievals are designed by optimizing a cost function that is not defined on a single retrieval but is defined on a statistical sample. This means, for example, that a NN retrieval performed on a single measurement is not necessarily the one that provides the best fit with that measurement when fed back to a radiative transfer model.
While most of the NN-based cloud retrieval schemes have been developed for single-viewing, multispectral reflectance measurements, the application of neural networks to multiangle observations has been limited so far, with few examples in the closely related task of aerosol retrieval (Han et al., 2006;Di Noia et al., 2015, one in the context of three-dimensional cloud retrievals (Evans et al., 2008) and a recently developed method for cloud retrievals from RSP (Segal-Rozenhaimer et al., 2018). The main challenge posed by multi-angle measurements in the design of neural network retrieval schemes lies in the variability in the combinations of viewing angles at which the measurements are performed. In fact, in such a situation the values taken by a measurement vector are not only dependent on the atmospheric state to be retrieved but are also strongly dependent on the observation geometry, which makes the inverse model to be learned by the neural network particularly complex. For instance, a vector of multi-angle polarized radiances over a cloudy scene characterized by a certain combination of cloud parameters (COT, effective radius, effective variance, etc.) can take dramatically different values depending on whether the sampled angular interval contains the cloudbow scattering angle range (approximately 135-165 • ). This problem is especially noticeable for large-swath instruments, in which the pixel-topixel variability in the available set of observation geometries is largest. In order to cope with this problem, it is important to use the variables that describe the observation geometry as inputs for the neural network and to generate a training set that reflects the real statistical distribution and correlation structure of such variables. Given the complexity of the learning problem, in order to enhance the accuracy of our neural network retrievals, in this work we adopt an approach based on neural network ensembles (Hansen and Salamon, 1990), which simply consists of training multiple neural networks and averaging their outputs. This approach is known to reduce random retrieval errors.
In addition to being based on neural networks, the retrieval scheme presented in this paper differs from the existing cloud retrieval techniques for POLDER-3 in the used input quantities and in the spatial resolution of the output product. While the operational COT product does not use polarization information, it performs one COT retrieval for each pixel and each available viewing angle and then combines the COT retrievals on 20 km 2 ×20 km 2 "super pixels" (Buriez et al., 1997); in our scheme the COT is retrieved by combining radiance and degree of linear polarization (DoLP) measurements at multiple viewing angles, and the retrieval is provided at the native POLDER-3 pixel size of 6 km 2 ×7 km 2 . Also the effective radius is retrieved at the native POLDER-3 pixel size in an attempt to avoid relying on the strong assumptions about the spatial homogeneity of cloud properties that are necessary in algorithms using the super-pixel approach (the super pixels defined in the existing effective radius product for POLDER-3 have a size of 150 km 2 ×150 km 2 ). The paper is structured as follows. In Sect. 2 a brief description of the POLDER-3 instrument is given. In Sect. 3 the design of the neural network and its verification of synthetic data are discussed. Section 4 discusses the application of the neural network retrieval to a year of POLDER-3 data and the comparison of the retrieval results to MODIS Level 2 and Level 3 data and to existing POLDER-3 retrievals. Finally, in Sect. 5 conclusions are drawn.

The POLDER-3 instrument
The POLDER-3 instrument was a multi-angle, multiwavelength polarimeter flying aboard the Polarization & Anisotropy of Reflectances for Atmospheric Sciences coupled with Observations from a Lidar (PARASOL) satellite (Lier and Bach, 2008). It was launched in 2004 and was a part of the satellite constellation A-Train (L'Ecuyer and Jiang, 2010) until 2009. Initially designed to be operated for 2 years, POLDER-3 performed its measurements until late 2013, when it was decommissioned. POLDER-3 performed multi-angle radiance measurements in nine VIS-NIR wavelength bands (443 to 1020 nm) and polarization measure-ments in three of these bands (490, 670 and 865 nm), observing the Earth at up to 16 viewing angles. Its spatial resolution at the nadir was about 6 km, and its swath width was 2400 km.
The successor of POLDER-3 will be the 3MI instrument (Fougnie et al., 2018), which is expected to be launched aboard the EUMETSAT Metop Second Generation A (Metop-SG A) satellite in 2021. 3MI will be based on a similar measurement principle to that of POLDER-3 but will carry out its measurements in 12 spectral bands, nine of which will have polarimetric capability. Furthermore, 3MI will extend its measurement spectral range to the SWIR region in which it will also perform polarization measurements, which are expected to allow for more accurate cloud retrievals.

Training set generation
Our neural network approach to the retrieval of cloud properties from POLDER-3 data decomposes the retrieval task into four subtasks, with a different neural network dedicated to each. A first level of separation was made between retrievals over land and over ocean. Second, separate neural networks were trained for the estimation of the COT and for the estimation of cloud microphysical properties (effective radius and effective variance). Having separate NNs specialized on cloud retrievals above ocean and land is expected to simplify the training of each NN. The separation between COT and microphysical properties retrievals was made because, given the physical nature of the retrieval problem, a different measurement vector appears more suitable for each of these two retrievals. The information on COT is especially present in radiance, with polarization bringing additional information for small optical thicknesses. Since POLDER did not have spectral channels with significant absorption by liquid water, information about cloud droplet size in POLDER measurements is only contained in the angular and spectral dependence of polarized radiance in the cloudbow region. For this reason, our NN scheme for COT retrieval uses multi-angle and multi-wavelength radiance and degree of linear polarization (DoLP) measurements, whereas the NN for the effective radius and effective variance retrievals only relies on multi-angle and multi-wavelength polarized radiance measurements. Furthermore, effective radius and effective variance retrievals are only performed if the scattering angle range at which measurements are performed includes the cloudbow range of 135 to 165 • , whereas this restriction does not apply to COT retrievals. An additional differentiation between the neural network for COT retrieval over ocean and that for COT retrieval over land concerns the measurement vector used in the retrieval. The ocean neural network uses reflectances at 490, 675, 860 and 1020 nm and DoLPs at 490, 675 and 860 nm as inputs, whereas the land network does not make use of the wavelengths 860 and 1020 nm in order to avoid dealing with the large variability in the reflective properties of land surfaces at these wavelengths. The reflectance at 565 nm is used as an additional input in the NN for COT retrieval over land.
The NNs presented in this study have been trained using synthetic data. While the prior distributions of the cloud properties used to generate the training sets are the same for all the NNs, a difference lies in the treatment of the surface properties between the ocean networks and the land networks. In the radiative transfer simulations produced in order to train the ocean network, the surface properties are parameterized with respect to wind speed as in Cox and Munk (1954), to chlorophyll concentration as in Morel and Maritorena (2001) and Chowdhary et al. (2006), and to whitecap fraction and foam albedo as in Koepke (1984). For the land network, instead, the surface bidirectional reflectance distribution function (BRDF) has been parameterized using the Ross-Li model (Maignan et al., 2004, and references therein), and the bidirectional polarization distribution function (BPDF) has been parameterized as in Maignan et al. (2009). For both land and ocean simulations, the radiative transfer model described in Landgraf (2002, 2005) has been used. Clouds have been parameterized as spherical particles with a Gamma size distribution, with a refractive index taken from the Optical Properties of Aerosols and Clouds (OPAC) database (Hess et al., 1998). In order to reduce the computational complexity of the radiative transfer simulations, the forward scattering peak in the phase function of cloud droplets has been accounted for by means of the "MS correction" proposed by Nakajima and Tanaka (1988).
An important aspect to be pointed out, which may represent a limitation in our approach, is that the often highly peaked phase functions of cloud droplets make it important to use an adequate number of streams in the forward model calculations. For example, in Fig. 1 a plot is shown of the simulated top-of-atmosphere (TOA) reflectance at 443.9 nm versus COT for 8, 16, 32 and 64 streams, assuming spherical cloud particles and a Gamma size distribution with an effective radius of 25 µm and an effective variance of 0.03. The simulations were performed for a solar zenith angle of 30 • and assuming nadir observation. It can be seen that, for COT larger than approximately 20, the simulated TOA reflectances critically depend on the chosen number of streams. A higher number of streams generally results in more accurate radiative transfer simulations but also leads to an in- creased computation time. Since the generation of our neural network training dataset involves a large number of radiative transfer simulations (of the order of 10 5 to 10 6 ), we were forced to limit the number of streams to 16 even if this does not appear to be the most accurate choice, as choosing a higher number of streams would have led to overly timeconsuming simulations.
An additional limitation -not specific to our approach but caused by the 1-D homogeneous cloud assumption (Zhang and Platnick, 2011) -is that reflectance measurements in the optical range saturate for COTs larger than ∼ 40. This means that not enough information is available to distinguish COTs around this value from larger COTs. For this reason, we decided to use a maximum COT of 40 in the generation of the training datasets for our four neural networks.
As pointed out in Sect. 1, a very important element in the design of a neural network algorithm dealing with multiangle large-swath measurements is how to deal with the variability in the set of viewing angles at which the measurements are performed. Our approach is similar to that proposed by Han et al. (2006) in the context of aerosol retrievals from the Multi-angle Imaging SpectroRadiometer (MISR) and basically consists of using the set of viewing angles as additional input quantities for the neural network. However, a difference between the two approaches lies in the way the neural networks are trained. While Han et al. (2006) trained their neural network scheme using real MISR measurements co-located with MODIS Level 2 aerosol properties, our neural networks are trained on synthetic data. In order to perform our simulations at viewing angle combinations representative of actual POLDER-3 measurements, we sampled 1 year of POLDER-3 measurements, randomly selecting 25 orbits per month and 5000 combinations of viewing zenith angles, solar zenith angles and relative azimuth angles per orbit. We thereby generated a database of 1.5×10 6 angle combinations which we used as inputs for our radiative transfer calcula-tions. Since the number of angles at which POLDER-3 performed its measurements is also variable, we decided to only consider measurements performed at a minimum of 14 angles to train our neural networks. This number was chosen in order to have a good trade-off between the benefit of having a good number of available viewing angles and that of not discarding too many measurements, as would have been the case if we only considered measurements performed at 16 angles (the maximum available number for POLDER-3). Also for POLDER-3 measurements performed at more than 14 angles, only 14 angles are used in the input vector for the neural networks. This choice was made in order to avoid dealing with the difficulty of having a input vector of variable size or, as an alternative, of passing input vectors with missing data to the neural networks.
As discussed at the beginning of this section, the quantities used in the input vector differ between the COT retrieval neural networks and the effective radius retrieval neural networks. The complete list of input quantities for each network is given in Table 1. Effective variance is retrieved as an additional parameter in the effective radius network, whereas cloud top height is retrieved as an additional parameter in both networks. The ocean and land surface parameters are not included in the output vector but are randomized in the training set. From a theoretical standpoint, this corresponds to designing a neural network that approximates the conditional expectation of the set of cloud properties x given the measurement vector y, marginalized over the possible values of the surface properties. Mathematically, if we introduce a vector b describing the surface properties, the neural network will return an approximation of the following conditional expectation: where f b is the probability density function of b and D is the domain of f b . The same logic was followed in the treatment of cloud microphysical parameters in the COT retrieval NN and of COT in the microphysical parameter retrieval NN, i.e. rather than making an assumption about the non-retrieved parameters and keeping them constant in the training set, they are randomized. This approach enables the training procedure to adjust the weights of the NN based on the input features that are of direct relevance for the retrieval of the parameter at hand, preventing biases due to incorrect assumptions. A "classical" equivalent of this approach would consist of averaging multiple retrievals carried out using different assumptions.
The details of statistical distributions of the cloud, ocean and land surface parameters used in the radiative transfer simulations are summarized in Table 2.
The measurement noise was modelled as an additive Gaussian random variable with a zero mean and a standard devia-tion of 1 % for reflectance, 2 % for polarized reflectance and 0.7 % for DoLP. As done in previous work (Di Noia et al., 2015Noia et al., , 2017, such noise was added to the synthetic measurements during the training phase as a form of regularization, as explained in Bishop (1995). The neural network architecture was chosen based on results inherited from previous work and, for all the four networks, consists of three hidden layers with 40 neurons each.

The ensemble approach
As anticipated in Sect. 1, training a neural network to invert multi-angular measurements from a large-swath instrument is a difficult task. In an attempt to reduce the retrieval uncertainty caused by imperfect training, we decided to train multiple neural networks and use the average of their outputs as our final retrieval. This approach -called the "neural network ensemble" approach (Hansen and Salamon, 1990) -is known to reduce the random component of the neural network output error and has been already applied with success in remote sensing contexts (Krasnopolsky, 2007). In principle, several methods exist to combine the results of multiple neural networks in order to build an enhanced estimator, and an overview of such methods is given by Sharkey (1996). In this work, we have chosen to train multiple neural networks characterized by the same architecture and on the same training set, varying the random initial value of the neural network weights. In practice, in this way, each ensemble member (i.e. each individual net) is trained so that its weight vector approaches a different local minimum of the error surface (Tresp, 2002). It is possible to prove that this method ensures that the variance of the error of the neural network ensemble is smaller than the average variance of the single ensemble members (Perrone, 1993). Contrary to the error variance, the bias is not reduced by the ensemble approach.
The maximum reduction in variance is achieved when the errors of the ensemble members are uncorrelated, in which case the reduction factor is equal to the number of ensemble members. This is not the case in our situation, as we observed a significant correlation in the errors of the individual neural networks. For this reason, the gain obtained by using the ensemble approach in our case is smaller than the maximum possible gain.

Results on synthetic data
The results of applying the neural network retrieval scheme on approximately 30 000 synthetic test data not used during the training process are summarized in Table 3. The average bias, root-mean-square error (RMSE) and correlation coefficient of the individual ensemble members are shown together with the corresponding statistics for the neural network ensemble. In this test, we used ensembles of five neural networks. As expected, the ensemble approach does not reduce the retrieval bias but does reduce the RMSE, although not  in a dramatic way, as the errors of the individual networks are highly correlated. The results of the synthetic retrievals over ocean and those over land are very similar. The COT, effective radius and effective variance are generally well retrieved, whereas the quality of the cloud height retrievals is not as high.
While the results reported in Table 3 serve as an internal consistency check for the neural network performance, they cannot be used to predict the performance of our neural networks when applied to real measurements. In fact, our simulation set-up entails a number of simplifications which are typically not satisfied in real scenarios. Some of the most important simplifications are the following: -Training and test data have both been generated using a one-dimensional radiative transfer model. Therefore, a degradation of the retrieval accuracy can be expected when three-dimensional effects are significant.
-The impact of an incorrect assumption about the cloud particle size distribution cannot be assessed, as all the data are generated with an assumed Gamma size distribution.
-The vertical distribution of cloud effective radius and effective variance is assumed constant. This is often not the case in reality (Martins et al., 2011;Zhang and Platnick, 2011;Zhang et al., 2012).
All the aforementioned assumptions may lead to biases in the retrieved cloud properties (Zhang and Platnick, 2011;Zhang et al., 2012;Miller et al., 2018). The assessment of the performance of our neural network scheme when applied to real measurements is be the object of the next section.

Application to POLDER-3 data
As pointed out in Sect. 3.3, testing the retrieval algorithm against synthetic data only gives an indication of its performance under idealized conditions. In order to assess the performance of our neural network scheme, we used it to process the entire POLDER-3 2006 Level 1B dataset and compared the retrieved COTs and effective radii to MODIS-Aqua Collection 6 Level 2 and Level 3 retrievals and to the available POLDER-3 Level 2 data. The results of this comparison are discussed in the following subsections.

Comparison to MODIS Level 2 data: selected examples
In order to evaluate the performances of our neural network retrieval, we have compared our retrievals to the  MODIS Level 2 product remapped on the POLDER-3 official sinusoidal grid (Zeng et al., 2011) Fig. 2, snow-covered areas were present in several regions in the Northern Hemisphere, such as Central Asia, northeastern Europe, Scandinavia and North America. Furthermore, a significant dust outbreak took place over the Eastern Mediterranean Sea, as shown in Fig. 3.
In Fig. 4, a global map of the COT retrieved by the neural network algorithm for POLDER-3 is shown together with an analogous maps produced by regridding the MODIS Level 2 product on the POLDER-3 resolution. The neural network product covers a narrower swath with respect to the MODIS product, as only pixels observed at a minimum of 14 viewing angles are considered in the NN retrieval.
In general, the spatial patterns of COTs seem to be well captured by the NN retrieval, but a number other aspects can be noticed in the maps. A first important remark is that at this stage we have not yet developed a cloud detection algorithm for our neural network product. As a consequence, the neural network will process all the available ground pixels regardless of whether they are actually covered by clouds. Nevertheless, it is interesting to see that over areas such as the Sahara -where a large number of cloud-free pixels are present -the NN correctly reports COTs very close to zero. As explained in Sect. 3.1, the NN has been trained on a variety of synthetic scenes, including situations of small COTs on bright, desert-like surfaces. However, the NN has not been trained on snow-covered synthetic scenes, and in such situations our neural network scheme seems to detect large COTs. This is clearly visible in Fig. 4, where an extended area of large retrieved COT can be seen over Central Asia. This area corresponds to the snow-covered region shown in Fig. 2. Similar features can be observed over North America. In general, distinguishing snow from clouds using POLDER-3 channels is extremely challenging if the observation geometry does not sample the cloudbow region (Bréon and Colzy, 1999). However, incorporating snow-covered scenes in the training set would be an interesting subject for future research in view of the application of our method to 3MI. In fact, 3MI will provide measurements at 1.6 µm, where snow has high absorption and can therefore be distinguished from clouds (Warren, 1982;Krijger et al., 2005Krijger et al., , 2011) . An analogous behaviour of the NN COT retrievals can be observed over cloud-free regions covered by sea ice, such as the Hudson Bay, which appears to be covered by sea ice in Fig. 2 and exhibits large retrieved COTs in Fig. 4.
Other areas in which the NN retrievals differ significantly from MODIS data need a more careful inspection. Figure 5 shows retrieved COT maps over Europe for the NN algorithm and for MODIS. Both MODIS and the POLDER-3 NN generally see large COTs over the Balkans and in an area approximately centred around the Russian enclave of Kaliningrad. However, additional areas of large COTs can be seen in the NN map and not in the MODIS map. One approximately covers northeastern Italy, and a smaller one is centred on northwestern Spain, down the Bay of Biscay. A plausible explanation for this behaviour can be obtained by looking at the cloud phase index (CPI) product, described in Riedi et al. (2010), which can be used to estimate the cloud thermodynamic phase from combined POLDER-3 and MODIS data. Values of the CPI lower than 50 indicate liquid water clouds and values larger than 150 indicate ice clouds, whereas intermediate values indicate mixed-phase clouds. A map of the CPI over Europe for the selected date is shown in Fig. 6. The regions in which the NN retrieves much larger COTs than MODIS generally coincide with regions marked as ice clouds in the CPI map (CPI systematically larger than 170). These are not represented in the NN training set, and therefore the quality of the NN retrieval cannot be guaranteed over this type of clouds.
In order to explain the origin of the bias in the retrieved COT over ice clouds, we can reason as follows. The reflectance of a cloud layer at nonabsorbing wavelengths can be approximated as follows (van Diedenhoven et al., 2014a): where µ 0 is the cosine of the solar zenith angle, τ is the total optical thickness and g is the asymmetry parameter of the scattering phase function. If we take Eq.
(2) as the reflectance of an ice cloud layer of optical thickness τ ice , with asymmetry parameter g ice , and assume the asymmetry parameter g liq of a liquid water cloud in our retrieval, a liquid water cloud of optical thickness τ liq with the same reflectance as that of the true ice cloud satisfies the relationship Therefore, we can expect the estimated liquid water cloud optical thickness to be related to the true ice cloud optical thickness as approximately Given that in the MODIS Collection 6 algorithm an ice cloud optical model with an asymmetry parameter of 0.75 is assumed (Holz et al., 2016), and that liquid water cloud droplets typically have an asymmetry parameter of around 0.85, based on Eq. (4) it is expected that our retrievals over ice clouds are biased by a factor of around 1.66 (van Diedenhoven et al., 2014b). Histograms of the ratio between the COT retrieved by the NN scheme and that retrieved by MODIS over liquid water clouds (CPI < 50), ice clouds (CPI > 150) and mixed-phase clouds (50 ≤ CPI ≤ 150) are shown in Fig. 7 for the selected date. It can be seen that the peak of the histogram for ice clouds does not lie too far from the value predicted based on our simple approximation. The histogram for liquid water clouds, instead, is much more peaked around a value of approximately 0.8.
Since our NN algorithm for COT retrieval does not just rely on single reflectance values but also uses the angular variability of reflectance and DoLP, it is expected that incorporating ice clouds in the training dataset may help to improve the COT retrieval accuracy over this type of clouds, at least reducing the spread of the histogram in Fig. 7.
In order to carry out a global quantitative comparison between the COTs retrieved by the neural network and those retrieved by MODIS over liquid water clouds, the following filtering criteria have been applied: -The MODIS cloud fraction had to be larger than 0.95.
-The CPI had to be smaller than 50.
-Ocean regions affected by sun glint were not considered.
A pixel was flagged as affected by glint if observed from at least one viewing direction satisfying the following conditions: -The relative azimuth angle was between −90 and 90 • .
-The absolute difference between viewing zenith angle and solar zenith angle was smaller than 10 • .
The results of the comparison are shown in Fig. 8, where it can be seen that the POLDER-3 neural network algorithm  tends to retrieve systematically lower COTs with respect to MODIS, with a bias of around −2. Overall, the two datasets look fairly well correlated, with a correlation coefficient of 0.79.
A similar comparison between the NN retrieval and MODIS has been performed for the cloud effective radius. Since the information on the cloud effective radius in POLDER-3 measurements mostly resides in the polarized radiance in the cloudbow scattering angle range, only pixels containing observations within this range were used in the comparison. More precisely, a POLDER-3 pixel was considered eligible for effective radius retrieval if it was observed in a scattering angle range satisfying the following two conditions: -The minimum scattering angle was not larger than 135 • ; -The maximum scattering angle was not smaller than 165 • .
Such conditions were applied on top of the criteria mentioned above for the comparison of the retrieved COTs. The results of the comparison are presented separately for ocean and land in Fig. 9. Over both surface types, a negative bias in POLDER retrieved effective radii with respect to MODIS effective radii using the 2.13 µm band is observed. The bias is around −2 µm over ocean and −1 µm over land. The correlation coefficients are worse than for the COT, which is to be expected given the differences in sensitivity to vertical and horizontal inhomogeneities between the polarimetric and bispectral methods (Miller et al., 2016. Nevertheless, it appears that the NN retrievals are still capable of capturing some of the global patterns in the spatial variability in the cloud effective radius. Global histograms of the effective radii retrieved by the POLDER-3 NN over ocean and land are shown in Fig. 10. These seem to reflect the general knowledge of the fact that cloud droplets over land tend to be smaller than over ocean (Han et al., 1994;Bréon and Colzy, 2000;  Doutriaux -Boucher, 2005), thereby providing us with an additional sanity check for our retrievals.

Global comparison with MODIS Level 2 data
The comparison presented in Sect. 4.1, which refers to a particular day, highlights a number of features in the behaviour of our NN retrieval algorithm. We carried out a more systematic comparison by looking at the entire Level 2 dataset for the year 2006. In order to make the size of our datasets more manageable, we chose to rebin both the neural network retrievals and MODIS retrievals on a 1 • × 1 • grid. The data used in the regridding were filtered according to the same criteria explained earlier in this section for the example date 24 February. The fact that the same criteria were applied to both datasets, and the fact that the MODIS retrievals had already been remapped to the POLDER spatial resolution prior to the regridding, should minimize the risk of introducing greater biases at this processing stage.
The results of this global comparison for COT are shown separately for ocean and land scenes in Fig. 11. As for the effective radius, we observed a distinct behaviour in our retrievals between cases of "low clouds" (cloud top pressures larger than 600 hPa) and cases of "high clouds" (cloud top pressures lower than 600 hPa). As shown in Fig. 12, effective radius retrievals appear less biased with respect to MODIS, both over ocean and land over low clouds. It is important to note that both the POLDER-3 and the MODIS retrievals used in the rebinning were preliminarily filtered for ice using the POLDER-MODIS CPI, but it is still possible that cases exist in which the cloud phase identification is imperfect. This result suggested us to filter out clouds higher than 600 hPa in our global effective radius comparison. This filter has been used in all the comparisons that are shown hereinafter.
Scatter plots of the NN vs. MODIS effective radii over ocean and land are shown in Fig. 13. The NN-retrieved effective radii show a negative bias with respect to MODIS. Such bias appears fairly consistent between ocean and land. Looking at the scatter plot, there seems to be a "cluster" of points over land for which the NN retrieves much larger effective radii than MODIS. A preliminary investigation shows that these situations preferentially occur over southeastern China, around the Río de la Plata estuary in South America and along the southern part of the Bight of Biafra in Africa. This is shown in Fig. 14, which depicts a map of the data points over land where the MODIS retrieved effective radii smaller than 10 µm while the neural network retrieved effective radii larger than 18 µm. It is not clear, at this stage, if particular cloud types are associated with these areas or if that effect is caused by the presence of aerosols above clouds, which is at least possible over western central Africa (Waquet et al., 2009). Aerosols above clouds were not modelled in the training set of our neural network scheme and may have an impact on our retrievals when scattering angles between 100 and 120 • are present in the measurement vector, as Waquet et al. (2009) show that TOA polarized reflectances in this scattering angle range are significantly higher over a cloud with an aerosol layer above than they would be in absence of an aerosol layer. We have tried to identify cases of aerosol above clouds by means of an empirical filter on polarized reflectances between 100 and 120 • , but so far we have found no conclusive indications regarding their impact on our cloud property retrievals. An additional element that can be noticed in the scatter plots in Fig. 13 (and in all the subsequent plots) is that the number of available data points over land is much smaller than over ocean. This is a result of filtering for high clouds, which reduced the ratio between the number of land and ocean data points from ∼ 55 % to roughly 39 %.
The bias observed in the retrieved effective radii seems consistent with results obtained with other algorithms (Bréon and Doutriaux-Boucher, 2005). The negative bias present in our COT retrievals is possibly explained by the presence of residual cases of broken clouds in our dataset, which are known to cause negative POLDER-MODIS differences (Zeng et al., 2012). On the contrary, the differences in the assumed cloud model should not be an important cause for the observed bias, as the uncertainty induced by changing the assumed cloud model in MODIS retrievals is estimated to be small (Platnick et al., 2017), with the exception of retrieval at special scattering angles such as cloudbow and glory (Cho et al., 2015). However, these situations only account for a very small fraction of MODIS retrievals.

Effective variance retrievals: preliminary results
As pointed out in Sect. 3.1, our NN scheme for cloud property retrievals does not assume a fixed effective variance but retrieves the effective variance together with the effective radius. This design choice is different from what is done  in most of the existing cloud retrieval algorithms. In general, most cloud retrieval schemes assume a cloud model with a fixed effective variance ranging from 0.1 to 0.15, depending on the algorithm (Arduini et al., 2005, and references therein). In particular, a value of 0.1 is assumed in the MODIS Collection 6 products (Platnick et al., 2017). In the training set of our NN retrieval algorithms, effective variance was assumed to be uniformly distributed between 0.03 and 0.35. In the interpretation of the NN retrievals, this can be regarded as a sort of "prior distribution" for cloud effective variance. As shown in Table 3, our NN scheme seems capable of retrieving cloud effective variance to a good accuracy when applied to synthetic measurements. This impression is corroborated by the upper panel of Fig. 15, which is a scatter plot of retrieved versus true effective variances on a set of synthetic measurements over ocean. This test gives us some confidence in the possibility of retrieving effective variance from POLDER-3 measurements. In the lower panel of Fig. 15 hand, experiments on synthetic data seem to suggest that the NN should be able to retrieve effective variance within the typically assumed range of this parameter. On the other hand, when applied to real POLDER-3 measurements, the algorithm has a strong preference for effective variances around 0.05 over ocean and 0.06 over land, both smaller than the value assumed in the MODIS product. It is important to emphasize that, given the broad prior distribution used to train the NN, the behaviour of the NN when applied to real data appears to be entirely driven by the measurements. It is interesting to note that the result we observed over ocean has been recently found also by another study based on SEVIRI measurements over marine stratocumulus clouds, currently under review (Benas et al., 2019), using a method that is completely independent of the one presented in this article. However, it must be noted that the effective variances we retrieved over land are smaller than those suggested by other studies focussed on continental clouds (Miles et al., 2000;Khaiyer et al., 2005). Therefore, additional research will be needed on the interpretation of our effective variance retrievals.

Comparison to existing POLDER-3 data and MODIS Level 3 data
In order to gain further insight on the behaviour of our retrieval algorithm, we carried out a global intercomparison between our retrieved COTs and effective radii to existing POLDER-3 retrievals. COT data at a spatial resolution of 20 km 2 ×20 km 2 are available from the existing POLDER-3 Level 2 product, whereas effective radius data are not included in the POLDER-3 Level 2 product, but are available from the Cloud Droplet Radius (CDR) product. Both products are available from the ICARE data archive mentioned in Sect. 4.1.
Also, for the comparison between the neural network retrievals and the existing POLDER-3 products, we have chosen a regular 1 • × 1 • grid. In order to better understand the difference between our retrievals and existing POLDER-3 products, we compared retrievals from both products to the MODIS MYD08_D3 daily product . This also allowed us to compare effective radius retrievals to MODIS retrievals using the 3.7 µm channel, which are not available in the combined POLDER-MODIS Level 2 prod- uct. These retrievals are known to be more sensitive to the upper cloud layers compared to the standard MODIS product using the 2.13 µm channel (Platnick, 2000) and are therefore expected to be more comparable to polarimetric retrievals (Miller et al., 2016).
The results of comparing neural network COT retrievals to POLDER-3 Level 2 retrievals are shown separately for ocean and land in Fig. 16. The correlation between the two retrieval schemes is high, although the NN retrievals are slightly lower than the Level 2 retrievals, with a low bias of around −1 over ocean and of around −1.5 over land. A possible reason for these systematic differences may lie in the different treatment of the cloud microphysical properties in the two algorithms. POLDER-3 Level 2 COT retrievals assume water droplets with a constant effective radius of 9 µm over land and 11 µm over ocean and an effective variance of 0.15 (Zeng et al., 2012), whereas in our retrieval these parameters are not fixed but are varied randomly in the training set.
A similar comparison is shown in Fig. 17 for cloud effective radius retrievals. In this case, the agreement between the neural network and existing POLDER-3 data is better over ocean than over land. A moderate negative bias of the neural network retrievals versus the CDR product exists over ocean, whereas over land the agreement is generally poor. We decided, then, to compare both datasets to gridded MODIS retrievals using the 3.7 µm channel. The results are shown in Fig. 18 for ocean and Fig. 19 for land. As expected, the negative bias of neural network retrievals versus MODIS 3.7 µm effective radii is smaller than that versus the 2.6 µm product. The existing CDR product seems more accurate and more precise than the neural network product over ocean, whereas the opposite seems to occur over land, where the neural network retrievals exhibit a significantly smaller bias versus MODIS compared to the CDR retrievals. Both over ocean and land, the bias of our neural network retrievals appears -in absolute value -smaller than 2 µm. An additional difference between our cloud effective radius retrievals and the CDR product is the fact that the latter seems to present a large number of cases in which the retrieved effective radius "saturates" to a value of 20 µm, whereas this does not seem to be the case for the NN retrieval. This difference probably results from the different choices made in generating the training set for the NN and the tabulated values for the CDR retrieval.

Conclusions
In this paper we have presented a neural network algorithm to retrieve the microphysical properties of liquid water cloud from satellite multi-angle, multi-wavelength polarimetric measurements. The proposed approach is based on four ensembles of neural networks trained for synthetic measurements. Two NN ensembles are dedicated to the retrieval of cloud optical thickness separately over ocean and land, whereas the other two are used in order to retrieve the cloud effective radius and effective variance, also separately over ocean and land. A distinctive feature of our algorithm is that it produces estimates of the cloud properties at the native POLDER-3 Level 1B spatial resolution of 6 km 2 ×7 km 2 . Cloud property retrievals at this resolution are performed by exploiting the relative wavelength shift of the observed maxima and minima in polarized radiance. It must be considered, though, that the scatter in cloud effective radius retrievals at the inherent pixel size is significantly larger than that of gridded retrievals. This, in our opinion, means that the fact that the polarization maximum of cloudbow is sometimes missed may still be a limitation for the retrieval, but it gets compensated by spatial averaging, as it leads to random errors rather than systematic ones.
We tested the proposed algorithm with measurements from the POLDER-3 instrument, which operated aboard the PARASOL satellite between 2004 and 2013, but the approach can be readily adapted to any other imaging multiangle polarimeter (e.g. 3MI) once instrument-specific factors -such as available spectral channels and the number of available viewing angles -are modified in the neural network design process.
The neural network algorithm has been applied to process the entire POLDER-3 dataset for the year 2006, and the re- trieval results have been compared to MODIS Collection 6 Level 2 and Level 3 data and to the existing POLDER-3 product. Comparisons between the NN retrievals and MODIS Level 2 data at the native POLDER-3 spatial resolution have been carried out on a number of selected cases. They showed a negative bias of around −2 in the NN COT retrievals and negative biases around −2 µm in the effective radius retrievals. More systematic, global comparisons against MODIS Level 2 and 3 cloud products, encompassing the entire dataset, have been performed at a reduced spatial resolution of 1 • × 1 • . Such comparisons confirmed the high spatial resolution results with respect to the biases in NN retrievals.
A comparison between the NN results and MODIS data also highlight a number of interesting aspects. One is that the NN algorithm seems to correctly process cloud-free scenes (or nearly cloud-free scenes) by returning low COT values in response to such scenes. While this behaviour needs further investigation, it may be a suggestion that a properly trained NN scheme for COT retrieval may not need a preliminary cloud detection step. Another interesting feature is the fact that the NN retrievals -which have been trained for liquid water clouds -seem to exhibit a specific pattern in the presence of ice clouds. Specifically, the presence of ice clouds leads to significant COT overestimations. While on the one hand, the inability of our NN scheme to handle ice cloud situations could be expected, given the way the training set was designed, on the other hand it appears that properly incorporating such scenes in the training dataset would be an important algorithmic improvement for the retrieval of the cloud optical thickness. It must be emphasized, though, that including ice clouds in the training dataset is not expected to improve effective radius retrievals instead. In fact, ice crystals are nonspherical and often much larger than visible wavelengths, and therefore their scattering phase function does not present angular features from which particle size can be inferred. As a result, over ice clouds, only optical thickness and crystal shape information can be retrieved using multi-angle polarimetry, whereas no information is available on particle size (van Diedenhoven et al., 2012).
A three-way comparison between our NN, MODIS Level 3 data and existing POLDER-3 products suggests that our retrieval scheme seems to have good capability in the retrieval of the cloud effective radius over land. The agreement between the NN retrievals and MODIS Level 3 products looks better than that exhibited by the existing POLDER-3 product. Over ocean, instead, the performances are similar, with the existing product showing a slightly better correlation with MODIS.
Unlike many of the existing retrieval schemes, our algorithm also attempts a retrieval of the effective variance of the cloud droplet size distribution in addition to that of the effective radius. While no global correlative datasets exist for validating retrievals of this parameter, a preliminary inspection of the statistical distribution of the retrieved effective variances reveals a good agreement with literature values over ocean, while the retrieved values look smaller than expected over land. Further investigations are needed in order to further understand this behaviour.
An interesting future application of the proposed algorithm may be to measurements from new multi-angle polarimeters such as 3MI (Fougnie et al., 2018), due for launch in 2021 on EUMETSAT Metop SG-A, as well as SPEXone (Hasekamp et al., 2019) and the Hyper-Angular Rainbow Polarimeter (HARP-2), which are both to be launched in 2022 in the framework of the NASA Plankton Aerosol, Cloud, ocean Ecosystem (PACE) mission (Omar et al., 2018). The availability of polarization measurements in a higher number of spectral channels (for HARP-2) and, at a larger number of viewing angles, has the potential to further improve effective radius retrievals for liquid water clouds, thanks to an improved sampling of the angular positions of the secondary cloudbows. In contrast, in the case of SPEXone the limited number of available viewing angles (five) may be compensated by the availability of high spectral resolution measurements.
Data availability. The gridded data used in the comparisons described in Sect. 4.2 and 4.4 can be downloaded via anony-mous FTP from ftp://ftp.sron.nl/open-access-data/antonion/10. 5194-amt-2018-345 (last access: 11 March 2019). Contacting the corresponding author for information on the use and interpretation of the data is advised . The full neural network Level 2 dataset for the year 2006 and the code are available from the corresponding author upon request.
Author contributions. ADN and OH designed the retrieval experiment. ADN designed the neural network algorithm with inputs from OH and BvD and carried out the analysis of the results. BvD and ZZ ensured the correct use and interpretation of MODIS data products and contributed to the physical interpretation of the validation results. ADN prepared the paper, with contributions from all coauthors.
Competing interests. The authors declare that they have no conflict of interest.
Acknowledgements. POLDER-PARASOL Level 1 data was originally provided by CNES. POLDER-3 Level 2 and Level 3 products, the POLDER-3 CDR product, and the MODIS-POLDER Level 2 product were produced by ICARE-LOA-LSCE.
MODIS MYD_D3 data were obtained from the NASA Level-1 and Atmosphere Distribution System (LAADS) Distributed Active Archive Center (DAAC).
We acknowledge the use of imagery from the NASA Worldview application (https://worldview.earthdata.nasa.gov/, last access: 11 March 2019) operated by the NASA-Goddard Space Flight Center Earth Science Data and Information System (ESDIS) project.
We gratefully acknowledge the two anonymous reviewers for their comments and suggestions.
ADN gratefully acknowledges Piet Stammes, Nikos Benas and Jan Fokke Meirink (KNMI) for the valuable discussions about the interpretation of our effective variance retrievals.
Edited by: Alexander Kokhanovsky Reviewed by: two anonymous referees