Cloud retrievals from satellite data using optimal estimation : evaluation and application to ATSR

Introduction Conclusions References


Introduction
Clouds have long been recognised as one of the key moderators of the Earth's atmosphere: low clouds such as stratus effectively reflect incoming solar radiation, giving an overall cooling effect, while high clouds may partially transmit solar radiation, but effectively trap the outgoing thermal radiation, resulting in an overall warming effect.The balance between these effects and, in particular, how they might change over time, involving processes such as water vapour-feedback and cloud-aerosol interaction, significantly complicates prediction of future climate, as has been recognised by the Intergovernmental Panel on Climate Change (IPCC).
In order to test climate models, we require accurate, consistent, long-term, well characterised, global measurements of clouds and their properties.Ground-based observations are important, but these observations are often biased towards land and populated centres.Only satellites provide truly global coverage, which is essential for comparison with climate models.Various active and passive satellite cloud climatologies exist; for example, the active Cloud Profiling Radar (CPR) (Stephens et al., 2008) and Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP) sensors are able to provide height-resolved information on cloud properties (Winker et al., 2007), however, coverage is limited to the subsatellite track and the time-series are short.Of the passive satellite instruments the most widely known are the High resolution Infrared Sounder (HIRS, Wylie and Menzel, 1999), Moderate Resolution Imaging Spectroradiometer (MODIS, Platnick et al., 2003), Advanced Very High Resolution Radiometer (AVHRR, Jacobowitz et al., 2003;Heidinger and Pavolonis, 2009) and Multi-angle imaging SpectroRadiometer (MISR, Moroney et al., 2002) datasets.The passive sensors cannot represent the complex vertical structure, but have much better global coverage and longer time series than active instruments.Typically information on optical depth is derived from the visible and near infrared channels (Nakajima et al., 1990) while information on the cloud top pressure is derived separately from infrared measurements using brightness temperatures, the split window technique or through CO 2 slicing algorithms.MISR lacks thermal infrared channels and instead uses stereoscopic observations from its multiple viewing directions to derive cloud top height.This technique is also used to derive cloud top heights from Along Track Scanning Radiometer (ATSR) measurements (Muller et al., 2007).The International Satellite Cloud Climatology Project (ISCCP, Rossow and Schiffer, 1991) comprises a merging of polar and geostationary satellite data which has been a fundamental, reference dataset of global cloud properties for many years, though shortcomings of this dataset have been identified (Evan et al., 2007).
Progress in understanding the global distribution of cloud and its evolution with time are expected to come from systematic inter-comparison of results from different sensors or from different retrieval approaches (e.g., via activities of Global Energy and Water cycle EXperiment, GEWEX), as well as model/measurement intercomparisons e.g., CFMIP, Bodas et al. (2008).Here, we describe an optimal estimation method (OEM) (Rodgers, 2000) to generate a dataset from the Along-Track Scanning Radiometers (ATSR-2 and AATSR) which will provide a valuable contribution in this area.Advantages of the dataset stem from both the retrieval method and from the characteristics of the ATSR observations: -The ATSRs provide a long time-series (from 1995present) of consistent, well-calibrated observations in 6 channels sensitive to cloud properties, spanning the visible to infrared spectral range, obtained in two viewing directions.This time-series will continue into the foreseeable future via the Sea and Land Surface Temperature Radiometer (SLSTR) on Sentinel-3.
-The OEM application to similar remote-sensing cloud retrievals has been described in Heidinger (2003), Miller et al. (2000), Heidinger and Pavolonis (2005) and Copper et al. (2003).In these instances the OEM has been applied to the visible channels/near infrared to retrieve cloud effective radius and optical depth or separately to the mid-infrared channels to derive cloud top pressure.The OEM scheme described here differs to the above applications in that it is based on fitting a physically consistent model of cloud to observations spanning ALL the channels, the visible to mid-infrared, extracting information on the height, optical depth, particle size simultaneously, while rigorously treating model and observation errors.This in turn provides detailed estimation of the errors in the retrieved quantities, and quantification of the "goodness of fit" of the observations to the cloud forward model (FM).This enables the appropriateness of underlying assumptions in the retrieval to be tested and data interpreted accordingly.Furthermore, where the retrieval obtains a good fit to observed radiances, one can be assured that the resulting cloud properties provide simultaneously a good representation of the short-wave and long-wave radiative effects of the observed cloud.Ham et al. (2009) and Siddans et al. (2010) show large discrepancies between observed MODIS radiances and those predicted based on MODIS cloud retrievals.Such discrepancies are inherently avoided by the retrieval method adopted here.
In this paper, we describe the optimal estimation algorithm, we assess the sensitivity to the retrieved parameters for a range of cloud conditions including multi-layer cloud, and finally show some examples of the retrieval algorithm as it has been applied to ATSR-2 data.
The algorithm has been applied to ATSR data to produce the GRAPE dataset of cloud parameters from July 1995 to June 2003 (from ATSR-2), and has been processed from August 2002 to December 2009 (from AATSR).There was a 6 month data outage from January to June 1996 caused by a temporary scan mirror failure.The cloud parameters provided are outlined in Table 3.Further details, data, documentation, quality statements and imagery of products can be found at the British Atmospheric Data Centre (BADC) website, www.badc.ac.uk.
The retrievals have already been used to study ship tracks (Campmany et al., 2009;Sayer and Grainger, 2010) and study cloud-aerosol interactions (Bulgin et al., 2008).It is noted that, while the authors have applied the algorithm Table 1.Definition of reflectance and transmittance terms.Additional transmittance terms can be created by the inclusion of the direct transmittance (the unattenuated beam) to give total transmittance terms for a layer.

Definition
Names R(λ, 2π , ω r ) = 1 π 2π 0 R(λ, ω i , ω r ) cos θ i dω i hemispherical-directional reflectance factor for isotropic illumination ) cos θ r dω r directional-hemispherical reflectance factor, reflection, local albedo, planetary albedo, black sky albedo ) cos θ i cos θ r dω i dω r bi-hemispherical reflectance factor for isotropic illumination, white sky albedo to the ATSR instrument, the method could in fact be applied to many different passive visible and infrared remotesensing instruments.Indeed, the theoretical basis for the algorithm was established through a EUMETSAT study to derive cloud properties for the Meteosat Second Generation SEVIRI, (Spinning Enhanced Visible and InfraRed Imager), instrument (Watts et al., 1998) and used to investigate the optical properties of ice cloud in Baran and Havemann (2004).
A version of the algorithm for SEVIRI is under development at EUMETSAT (Watts et al., 2011) and ORAC has been applied to AVHRR and MODIS in the context of the European Space Agency, ESA, Climate Change Initiative.An analogous technique for retrieving aerosol properties has been described in Thomas et al. (2009b).

The Along Track Scanning Radiometer (ATSR)
The ATSRs are dual-viewing imaging instruments measuring visible and infrared radiances (within narrow band centred on 0.55, 0.67, 0.  The ATSRs are designed to have exceptional long-term sensitivity and stability of calibration.Thermal channels are calibrated using two on board black bodies at known temperatures which are observed during each across-track scan of the instrument.This makes it possible to determine single channel equivalent brightness temperatures correct to ±0.05 K (Smith et al., 2001).The instrument also has an on board visible/near-infrared calibration system enabling the visible channels to be calibrated to an accuracy of better than 4 % (Smith et al., 2008), which is subsequently improved via vicarious calibration using scenes of known stable surface BRDF (certain deserts and ice caps).The specifications of the different ATSR instruments can be found in Table 2.The excellent calibration and long time series of the ATSR instruments, as well as substantial overlap periods to enable inter-instrument calibration, make ATSR measurements well suited to generate records suitable for climate science.

C. A. Poulsen et al.: Cloud retrieval algorithm 3 Cloud retrieval scheme
The ATSR instruments' primary scientific mission is to perform high-accuracy sea surface retrievals (Mutlow et al., 1994), which requires accurate detection of cloud-affected scenes.The instruments are also suitable for the retrieval of cloud-properties, since the 7 channels are sensitive in different ways to the macro-and microphysical properties of cloud.For example the infrared channels provide useful extra information to the visible channels in the case of optically thin clouds.However, the observations are certainly not sensitive to every aspect of the three-dimensional distribution of all relevant cloud properties and no single channel sensitive solely to a specific cloud property.In the ORAC (Oxford-RAL Aerosol and Cloud) algorithm we approach the problem of extracting useful information on cloud as an inverse problem: a forward model is defined which applies a radiative transfer model (RTM) to simulate satellite radiances based on a parametrised cloud/atmosphere/surface model and the defined observing conditions.An inverse or retrieval model is then used to obtain the cloud parameters which give the best fit between the model predicted and observed radiances, taking into account measurement errors and relevant prior knowledge.This inverse problem is solved using the optimal estimation method (Rodgers, 2000) (OEM): the basic principle of the OEM is to maximise the probability of the retrieved state, conditional on the value of the measurements and any a priori knowledge.This is achieved by maximising the probability P = P (x|y m , x a ) with respect to the values of the state vector x.The a priori estimate of the state is defined by x a , i.e., the most likely state prior to taking the measurements into account.The state mapped into measurement space is defined by y(x).The measurement vector is described by y m .The assumption is made that errors in the measurements and a priori parameters (and forward model) can be described by a Gaussian distribution normally distributed with zero mean and covariances given by S y and S a , respectively.The solution state is found by maximising P or equivalently minimising the sum of the Guassian exponents the "cost function", J : The cost function can be minimised by estimating its gradient (by linearising the forward model) for an initial estimate of the state.Using this gradient, an estimate of the state is made which is predicted to have lower cost.The procedure is iterated until convergence (or the attempt to reach convergence is abandoned).To find the minimum we start at a first guess state x o which in the absence of other information is set to be the value of the a priori x a and proceed to make steps, assuming the value of J (x) decreases at each step then the updated x vector moves towards the cost function minimum.In this retrieval, we use the Levenberg-Marquardt (Marquardt, 1963;Levenberg, 1944) scheme to perform the minimisation.The rationale of the Levenberg-Marquardt is to use the weighted combination of the steepest descent method and Newtonian descent according to the characteristics of the cost function, i.e., when the cost is far from the solution the steepest descent algorithm is preferred while when the cost function is close to the solution the Newtonian scheme is used.Convergence is judged to occur when the cost function changes by less than 1 between iterations.Retrievals which do not converge after 25 iterations are considered invalid.
If the a priori and measurement errors are well represented by their respective covariances, then the cost function value at solution is expected to follow a χ 2 distribution with degrees of freedom equal to the total number of elements in the measurement and state vectors.The value of the cost function, therefore, provides a measure of the likelihood of the solution-state being consistent with observations and prior knowledge.A typical value of a cost function with 5 measurements and 5 state variables would be 10, if none of the measurements deviated by more then their expected noise and no state variables deviated from their a priori value by more than the a priori error, otherwise the cost value would be reduced if any of the state variables are bounded and do not have any significant a priori.The reasons for high/low values of J can be difficult to estimate, values too low implies an overestimation of error such as the measurement noise, values too large imply underestimation of noise or convergence criteria that is too loose.
For retrievals which satisfactorily converge, i.e., converge to a minimum cost which is consistent with measurement and prior errors, then the errors on the estimated state parameters are described by S x .The diagonals of S x provide the expected variance of each element in the state vector, assuming that the retrieval is linear within the range of its errors and the measurement and prior errors are well described by their respective assumed covariances: where K is known as the Jacobian or weighting function matrix, and contains the derivatives of the forward model with respect to the state parameters at the solution: (3)

Measurement vector and covariance
The implementation of the ORAC retrieval scheme described here uses nadir-view observations in the 0.55, 0.67, 1.6, 11 and 12 µm channels of the ATSRs and, hence, can only be applied to day time observations.The 3.7 µm channel is not included because it has been found to be difficult to consistently represent both the 1.6 and 3.7 µm channels with the assumed cloud model as they are sensitive to the vertical gradient of effective radius in the cloud (Baran et al., 2005).
Similarly forward view radiances are not included as threedimensional structure of cloud will often cause differences between the views which cannot be accommodated by the current model and there are parallax effects.Note that the 0.55 µm channel is often not present due to the operating modes of the instrument.The measurement covariance, S y , is the sum of two terms: The random noise on the observations is represented by S noise .The matrix is diagonal, i.e., we assume no correlation in the noise between different channels.In the setup used in this paper, these values equal to the square of the measurement noise, namely 0.058, 0.009, 0.018 (all units of sunnormalised radiance as defined in Sect.7.1.2),0.21 K and 0.23 K respectively, taken from (Smith, 2005).
The forward model errors are represented by S fm , including its representation of the atmosphere and the instrument response.These errors are typically larger than the instrumental noise, but are more difficult to quantify and are not necessarily normally distributed or with zero mean, as formally assumed by the OEM method.Potential contributions include errors in co-registration between channels, error in absolute radiometric calibration, errors in the assumed atmospheric and surface temperature profiles, errors in the planeparallel assumption (e.g., cloud vertical structure, 3-D effects), errors in the ice scattering model, impact of aerosol, error in modelling surface reflectance/emissivity, etc.The impact of these terms has been investigated in studies such as Watts et al. (1998) and Siddans et al. (2009).In general, the errors depend on the context of a particular retrieval (including cloud-type, clear-sky atmospheric state etc).
In practice a simplified (but somewhat ad-hoc) approach is adopted in this scheme applied in this paper and in GRAPE (Global Retrieval and cloud Product Evaluation): S fm is defined to be the sum of two terms: (5) For the visible and near-IR channels, S p describes errors which are assumed proportional to the signal.These terms are assumed uncorrelated between the channels and have values of equal to the square of 2.75 % of the measured radiance for the 0.55, 0.67 and 0.87 µm channels and 2.5 % of the measured radiance in the 1.6 µm channel, respectively.These values are chosen (following work in the studies cited above) to broadly represent the potential impact of errors in interchannel co-location, sub-pixel scene homogeneity, radiometric gain uncertainty, clear-sky atmospheric transmission, and the error in fast forward model compared to DISORT (DIScrete Ordinate Radiative Transfer).For the 11 and 12 µm channels, errors of 0.1 K are assumed to cover errors limiting the accuracy of the forward model for these channels (however these values are small compared to the assumed noise).
S s is defined only for the visible and near-IR channels and is intended to represent errors in the modelled surface reflectance which limits the FM accuracy when cloud opacity is low.This depends on both the error in the surface reflectance and the sensitivity of the observations to the surface reflectance, which depends upon the cloud properties being retrieved.For a given cloud state, S s is given by: where S sfc is the error covariance of the assumed Lambertian surface reflectance R SFC appropriate to each channel and K sfc contains the derivatives of the modelled radiance in each channel with respect to the surface reflectance (calculated by the fast FM).Diagonal elements of S sfc are set to (the square of) the albedo for the corresponding channel multiplied by 0.2.Offdiagonals are set to give a correlation between the channels of 0.4, surface albedo values are taken from MODIS data (Schaaf et al., 2002).Errors arise from the MODIS retrievals themselves and the inference from these of values appropriate to the ATSR scene and spectral response.The assumed 0.2 fractional error and 0.4 correlation values are broadly consistent with the accuracy reported in (Liu et al., 2009).Further consideration of the use of MODIS albedo data in the ORAC scheme is given in (Sayer et al., 2011b).
Note that the inclusion of the state-dependent term S s means that S y varies during the iteration of the retrieval.Similar results can be obtained by including surface reflectance in the retrieval state vector, with prior covariance given by S s (but this is not the approach adopted in GRAPE).
It is recognised that values placed in S p and, to a lesser extent S s , do not perfectly represent the true errors applicable to a given scene (though they are considered to be reasonable).This has the consequence that the cost function value resulting from retrievals cannot be expected to follow the chisquared distribution for the expected number of degrees of freedom and the estimated state-vector errors will not perfectly represent all contributions to the actual error covariance.In this paper, we state the assumptions that have been made in the application of the scheme in the GRAPE project.The performance of the scheme which makes these assumptions is assessed in (Sayer et al., 2011a).This analyses the relationship between retrieval quality and the value of the solution cost function.
It should also be noted that errors related to the deviation of the true vertical structure of cloud from the planeparallel model assumption made by the retrieval are not expressed in S y .Clouds which cannot be well represented by the plane-parallel assumption are, therefore, expected to give rise to high solution cost.This issue is considered further in Sect.10.A priori and first guess values depend on the assumed phase (phase determination is addressed in Sect.8).For liquid cloud, a priori values are 15, 8 µm, 800 hPa for optical depth, effective radius, cloud top pressure, respectively, for ice the equivalent a priori values are 15, 30 µm, 400 hPa.These values were chosen as they are typical of ice and liquid cloud, the values could be optimised in the future by using climatological values from recent instruments such as CALIPSO/CALIOP.The surface temperature a priori value is taken from European Centre for Medium Range Weather Forcasting, ECMWF, model fields.
For computational efficiency the observations were processed for an average of 3 pixels across track and 4 pixels along track, this corresponds to approximately 3 × 3 km.The a priori value for cloud-fraction (within the 3 × 4 pixel analysed scene) is the fraction of full-spatial resolution pixels flagged as cloudy.When running at reduced resolution it is necessary to apply a cloud mask before the retrieval is applied.
The distinction between the a priori estimate and first guess is important.The retrieval is constrained against any a priori information, the degree to which the retrieval is constrained depends on the associated a priori error.The a priori must be independent information, while this is not necessary for the first guess.In the retrievals performed here all state variables except surface temperature are effectively unconstrained as the a priori error is set to 10 8 .This value gives a completely flat probability distribution over the whole range of possible parameter values.The first guess does not constrain the retrieval, an appropriate first guess will mean that the retrieval will start closer to the solution and converge faster, but will not affect the result.This retrieval assumes no useful information is available as a priori information except on cloud fraction and surface temperature.The cloud fraction a priori error is set at a relatively small value of 0.1.This is because the cloud fraction was found in initial experiments to change erroneously to compensate for other inadequacies in the retrieval.We do not expect the retrieval to provide highly accurate information on surface temperature (except in cloud-free conditions) or fraction.These quantities are included in the state so that errors on their assumed values can be considered properly in fitting the other parameters and their errors propagated into the expected error on the other parameters.The surface temperature a priori error is 1 K over sea and 3 K over land.Comparisons we have performed between buoy and satellite data have shown that the error on SST (Sea Surface Temperature) is typically very much less than 1 K.However, in rare cases, such as upwellings close to land, this could be up to 5 K.The land surface temperature over most dark vegetated surfaces will be reasonably accurate.The land surface temperature over deserts and other surfaces with a strong diurnal cycle could indeed have an error greater than 3 K.This number will be reviewed for future implementations.The algorithm framework means that useful independent a priori information could be incorporated in the future.The a priori covariance is assumed to be diagonal i.e., the cloud properties are uncorrelated as the only terms which are significantly constrained by the prior covariance are the surface temperature and the cloud fraction and there exists no justification for assuming these to be correlated.Other terms are effectively unconstrained by the high assumed prior error.The initial (first-guess) values for statevector parameters are set equal to the a priori values.

Cloud/atmosphere/surface model
The forward model simulates the measured ATSR radiances for a given scene under the assumption that each scene is composed of a clear-sky fraction and a cloudy fraction.For the thermal channels the clear-sky atmosphere is defined by temperature and humidity profiles taken from ECMWF analyses (ECMWF, 2008) and RTTOV (Saunders et al., 1999).Fixed profiles for 3 latitude bands, tropics mid-latitudes and poles, have been used to model atmospheric transmission in the visible channels.The validity of this approach is discussed in Sect.7.1.3.
The surface is assumed to be a Lambertian reflector.Over sea the surface algorithm uses the model of Cox and Munk (1954a) and Cox and Munk (1954b) (using winds from ECMWF).Over land the MODIS albedo product for the year 2002 has been used in the processing of ATSR-2 and for AATSR.Error on the modelled surface albedo (Schaaf et al., 2002) is addressed via the assumed measurement covariances.For the infrared channels the surface is assumed to have an emissivity of 1.This is a reasonable assumption as the error on the emissivity for the 11 and 12 µm channels of ATSR will be small over sea ≈< 1 %.The error will be largest over bare soil especially deserts where errors ≈ 2 % will be encountered.Like surface albedo uncertainties the impact will be largest in fractional cloud and thin cloud scenarios affecting the accuracy of the retrieval in these scenarios.For example, the retrieved cloud top temperature might be expected to be erroneously low as the surface radiance contributions will be overestimated.Emissivity error was not modelled in the version of ORAC applied to the GRAPE (Global Retrieval and cloud Product Evaluation) ATSR climatology.The error will be considered in future applications.The temperature of the surface is a retrieved parameter (see Sect. 5).
In the current implementation of ORAC to the GRAPE dataset a cloud mask is required to identify regions of cloudy sky to process.Over sea the cloud flag used is that of Zavody et al. (2000).Over land the cloud flag used is that described in Birks (2004) which uses a NDVI (Normalised Difference Vegetation Index) technique, no infrared channels are implemented in the cloud mask over land so sensitivity to thin cloud will be reduced.Retrievals are only performed for scenes with a non-zero cloud fraction.Subsequent experiments have shown that by processing all pixels (clear and cloudy) a good cloud mask can be derived using retrieval information, but it is not used here.The selection technique used to determine cloud phase is described in Sect.8.
The forward model simulates radiances for the whole scheme by linearly weighting simulated radiances for the clear-sky and cloudy parts of the scene by cloud fraction, f (also a retrieved parameter).Cloud is assumed to be a single, plane-parallel, layer of either liquid or ice particles.The layer is assumed to be geometrically infinitely thin and is placed within the clear-sky atmospheric model.As the limited number of IR channels used have similar gaseous and cloud absorption properties this is probably justified from a radiance consistency point of view, however, the assumption leads to retrieved cloud top pressures that are effective values lying within the upper part of the cloud (this is clearly seen, e.g., in comparisons to CALIOP or CloudSat observations (Watts et al., 2011).It would be more precise to model a geometrically thick layer, but this relatively complex parameterisation is deferred.
The cloud layer is parametrised in terms of the following retrieved quantities: -the cloud phase, i.e., ice or liquid.
-The effective radius, r eff of the cloud particle size distribution.
-The optical depth, τ of the cloud at a fixed wavelength of 0.55 µm.
-The cloud top pressure, p c .
Particle size distributions for ice and liquid cloud are defined as a function of only r eff and τ .The shape of the modelled size distribution is used to determine r eff and τ defines implicitly the total number of particles.For ice clouds, single scattering properties (extinction coefficient, single scattering albedo and phase function) are taken from Baran and Havemann (2004).These are based on a mixture of ray tracing and T-Matrix methods.Size distributions themselves are those of warm uncinus cirrus cloud (Takano and Liou, 1989) with scaled distributions to give a range of effective radii.The definition used is that by Fu (1996), where the generalised effective size is given by where A c is the total cross-sectional area of cloud particles per unit volume, ρ i the density of ice and IWC the ice water content.
Single scattering properties of liquid cloud are derived by Mie theory assuming a modified gamma size distribution of particle radius r, (Hansen and Travis, 1974): where r m is the mode radius of the distribution.The radiatively significant effective radius, r eff , is given by: This approach reduces the complexity of cloud to a simple model with parameters which can be distinguished using the channels available on the ATSRs: the visible channel radiances are predominantly determined by the cloud optical depth.Near-IR channels are also sensitive to particle size and phase due to the dependence on size of the singlescattering albedo in that spectral range, and the associated differences between ice and liquid phase particles.Thermal channels predominantly provide information on cloud-top pressure (via the dependence of the cloud thermal emission on the atmospheric temperature profile).It is important to note that a cloud parameter is determined from all available measurements and not from a single channel or set of channels.All channels are sensitive to a greater or lesser extent dependant on the scene.We recognised that this simple model cannot represent all aspects of cloud three-dimensional structure.In the ideal case, the retrieved parameters will correspond to cloud top (p c and r eff ) or column total (τ ) values that are horizontal (over the scene) averages of the "true" cloudy properties.However, there may be classes of clouds, particularly those with strong vertical variations in particle size and phase, for which the model may or may not be able to produce radiances consistent with observations in all ATSR channels.When it cannot, the condition can be recognised because the retrieval will not converge with satisfactory cost and the retrieved products considered invalid.When it can, the retrieval will successfully converge and there is no way to know that vertical variations existed; the retrieved parameters will then be radiatively consistent effective values and not necessarily the physical averages desired.
There will be cases where multiple low cost minima are present in the cost function.In this case the search algorithm, Levenberg-Marquadt, will  One of the advantages of the ORAC OEM approach to use all data simultaneously is that the chances that the system can accommodate more than one solution is less than when subsets of information are used; i.e., the system is more likely to be over-constrained.
In Sect. 10 we specifically test the performance of the scheme under the more extreme case of varied multi-layer cloud conditions, diagnosing under what conditions the retrieval provides a good solution (within estimated errors of the "true" state) and whether the solution cost can effectively be used to distinguish conditions in which the model assumptions are inappropriate.

Radiative transfer model
Distinct RTMs are used for the solar (0.55, 0.67, 0.87 and 1.6 µm) and thermal (11 and 12 µm) channels.Fast radiative transfer is necessary to enable retrievals to be performed within practical computational constraints.To achieve the necessary speed both models have the following common aspects: -the effects of multiple-scattering inside the cloud are accounted for using pre-computed look-up-tables, LUTs.The values stored in the LUTs are described and listed in Table 1.The values are stored at discrete values of τ , r eff , solar zenith angle, satellite zenith angle and azimuth angle.LUTs are pre-computed using DISORT (Stamnes et al., 1988).During retrievals they are linearly interpolated to obtain the required values.
-Radiative transfer is performed quasimonochromatically, i.e., a single radiative transfer calculation is performed for each channel, taking as input channel spectral-response function convolved optical properties (e.g., clear-sky transmission to cloud layer, cloud single-scattering properties).Errors from this approximation are known to be negligibly low for the AATSR channels, but may become significant for channels of other instruments with strong variations in optical properties across the spectral response.Tests have been carried out for the extreme case of the Meteosat Second Generation, MSG, SEVIRI 3.9 µm channel (which is much wider than the AATSR 3.79 µm channel and encompasses a strong CO 2 absorption features as well as a strong gradient in the Planck function).This indicates errors generally less than 1 K (Siddans et al., 2010).
-Derivatives with respect to state-vector elements (i.e., the elements of K) are analytically computed.The derivatives are required to calculate J .They are calculated by (i) differentiating the equations in Sect.7.1.3to give the derivative of the simulated radiance with respect to the LUT parameters (ii) calculating the derivatives of the interpolated LUT parameters with respect to the state variables (iii) applying this chain to infer the radiance derivatives with respect to the state variables.
Details of the solar and thermal RTMs are provided below.

Radiometric terminology
Consider a spherical coordinate system whose origin is centred on a small area dA.The spherical coordinates are orientated so that θ is the angle from the normal of dA and φ is the angle in the plane of dA.The movement of electromagnetic energy can be discussed in terms of radiance L, which is the rate of energy propagation in a given direction per unit solid angle per unit area perpendicular to the axis of the solid angle (ISO, 1992).The distribution of radiance with wavelength is expressed by the spectral radiance L λ (λ) such that dL(λ) = L λ (λ)dλ represents the radiance in the interval To describe the reflection of radiation by dA we consider incident radiance dL i from direction (θ i , φ i ) giving rise to a reflected radiance dL r travelling in direction (θ r , φ r ).For convenience these direction pairs will be represented as solid angles ω i and ω r , respectively.By using these definitions θ i and θ r are always in the range [0, π/2] and this avoids their cosine ever being negative.The incident ray subtends a solid angle dω i = sin θ i dθ i dφ i at dA while the reflected ray subtends a solid angle dω r = sin θ r dθ r dφ r .The bidirectional reflectance distribution function (BRDF)f r (λ, ω i , ω r ) is defined as the radiant reflectance per reflected solid angle in (Schaepman-Strub et al., 2006): A Lambertian reflector reflects incident energy isotropically.Its BRDF is, therefore, f r = R/π which is independent of incident or reflection angle and where R is a constant in the range [0, 1].An ideal Lambertian reflector redirects all the energy that is incident on it (i.e., R = 1) so f r = 1/π.It is convenient to use a bidirectional reflectance factor or reflection function (Takano and Liou, 1989) which is defined as the BRDF relative to that from an ideal Lambertian surface.
The bidirectional reflectance factor R(λ, ω i , ω r ) is then: Using this definition, the reflected radiance for diffuse illumination (incident radiation not confined to a beam, but spread over the hemisphere) is: Where the notation for an integral over the hemisphere has been abbreviated as: If the incident field is isotropic then the integral in Eq. ( 12) can be performed with only knowledge of the bidirectional reflectance factor.This gives the hemispherical-directional reflectance factor for isotropic illumination R(λ, 2π , ω r ) where the argument 2π is used to indicate the use of this term is limited to the cases where the input radiance is isotropic.Different integrations give further reflection terms which are shown in Table 1 along with equivalent terms for the diffusely transmitted radiation which are derived from the transmission function T (λ, ω i , ω t ) defined by: where the transmitted ray L t is travelling from direction ω t (= θ t , φ t ).There is no consistent naming or notation of reflectance and transmittance terms in the literature so we have listed names we have encountered and have followed (Schaepman-Strub et al., 2006) in adopting a notation where a diffuse (but not isotropic) energy flow incident, reflected or transmitted from a layer is indicated in the argument of a term by 2π.In this way the redirection of energy between directional beams and diffuse fields in the expression for reflection or transmission can be easily interpreted.

An AATSR short wave measurement
The shortwave AATSR signal is a measurement of energy; a weighted sum of radiance over wavelength and over the instrument field-of-view (FOV) for some instrument measurement period.However, in common with most shortwave imagers the reported value for a scene is a "Sun-normalised radiance" which is defined as the ratio of the measured radiance to the radiance that would be observed from a perfect Lambertian reflector illuminated by the Sun.The forward model simulation of the measured Sun-normalised radiance starts by establishing a spherical coordinate system whose origin is the centre of the scene of interest.In this system the solar direction (θ 0 , φ 0 ) is abbreviate as the direction vector ω 0 .The energy per unit area per unit time illuminating the scene is cos θ 0 E 0 λ (λ)dλ where E 0 λ (λ) is the top of atmosphere solar spectral irradiance.The spectral radiance reflected by an ideal Lambertian scene would then be cos θ 0 E 0 λ (λ)/π.The variation in reflectance with wavelength and geometry is expressed as: where L r λ (λ, ω r ) denotes the reflected spectral radiance propagating in direction ω r = (θ r , φ r ).
For each short wave channel i the ATSR instruments report a Sun-normalised radiance, R i that is formed by calibrating the observed scene signal with the signal from a near-ideal diffuse reflector illuminated by the Sun (Smith, 2005).If each channel is defined by a response function, (λ), whose limits are [λ 1 , λ 2 ] then the Sun-normalised radiance for channel i can be expressed as: where ς (ω) denotes the geometric response function of the instrument.Note that the coordinate system used in this expression is centred on the instrument (but can be related to scene centred coordinates through appropriate geometrical transforms).If ς (ω) is constant across the field-of-view then the outer integral can be completed and the expression becomes: In the limit of a very narrow band, the measured Sunnormalised radiance is a good approximation to the bidirectional reflectance factor R(λ, ω i , ω r ) evaluated at the response weighted mean wavelength of the channel.

Visible and near-infrared radiative transfer model
The visible and near-infrared radiative transfer model assumes the observed scene is composed of a homogeneous cloud layer, with a fraction cover f , and clear sky.The bidirectional reflectance factor is the weighted sum of the cloudy, R i• , and clear, R i• , bidirectional reflectance factors: The gaseous absorption optical depth of the atmosphere is calculated by MODTRAN (MODerate resolution atmospheric TRANsmission) (Berk et al., 1989) using fixed US standard atmospheric profiles for different latitude bands.This is fine for AATSR which has little sensitivity to water vapour.However, for instruments that have greater sensitivity to water vapour and high satellite zenith angle the scheme has been modified for future application to use ECMWF and RTTOV for both the thermal and solar channels (Siddans, 2011).The gaseous optical depths are weighted by the instrument spectral response function to account for the rapid variation of transmission across a channel.This total absorption optical depth is then partitioned into the above cloud optical depth τ ac and the below cloud optical depth τ bc based on the cloud top pressure relative to the surface pressure.The LUTs includes the Rayleigh scattering produced by the whole atmosphere assuming a fixed surface pressure, i.e., typical ocean surface pressure of 1013 hPa.Variations in surface pressure are not modelled for the AATSR channels, which leads to errors of up to around 0.002 in sun normalised radiance, except in locations of high surface elevation.The scheme could be easily extended to take this into account by adding a surface pressure dimensions to the LUTs, but would require additional processing time.
The spectral bidirectional reflectance factor for the noncloudy portion of the instruments view is given by the surface bidirectional reflectance factor, R SFC i (ω 0 , ω r ) attenuated by the gaseous absorption of the atmospheric column, i.e.: R i• = e −(τ ac +τ bc )/ cos θ 0 R SFC i (ω 0 , ω r )e −(τ ac +τ bc )/ cos θ r .(19) For the cloudy fraction of a scene the atmosphere is modelled as having three layers: a below-cloud layer, a cloud layer and an above-cloud layer.The above and below cloud layers consist of gaseous absorbers that attenuate radiation without scattering.
The surface is assumed Lambertian with reflectance R SFC i (2π, 2π ).This means that the directionality of the radiance onto the surface can be ignored.The advantage of this formulation is that the multiple scatters between the cloud and the surface are contained in diffuse terms.Ignoring the below cloud absorption the bidirectional reflectance factor for channel i at the top of atmosphere is given by: where T CLD i (ω 0 , 2π ) is the cloud directional-hemispherical total transmittance factor and T CLD i (2π , ω 0 ) is the cloud hemispherical-directional total transmittance factor.The cloud bi-hemispherical reflectance is given by The multiple reflections between cloud and surface, shown stylistically in Fig. 1, give rise to a geometric series which can be evaluated analytically.To complete this model we parameterise the transmittance of the layer below the cloud as: where τ bc is the optical thickness of the layer.This assumes the mean angle of below cloud transmittance is 66 • , chosen to give a reasonable approximation to the transmission appropriate to the diffuse reflection (Watts et al., 1998).Including the below cloud absorption within the forward model gives:

Thermal-infrared radiative transfer model
The thermal RTM makes extensive use of the RTTOV model (Saunders et al., 1999).RTTOV directly provides the modelled radiance from the clear-sky fraction of the scene.As in the solar case the TOA radiances are a linear combination of overcast and clear atmospheric radiation as per Eq. ( 18).The observed radiance for the cloudy part of the scene is modelled in terms of contributions from four terms: transmission of the radiance upwelling from below cloud level, emission from the cloud, reflection of radiance downwelling from above cloud level and emission of radiance from the atmosphere above the cloud: RTTOV directly computes clear sky terms L ↑ bc , the upward radiance at the cloud base, L ↓ ac , the downward radiance at the cloud top from the atmosphere, and L ↑ ac , the TOA radiance from the atmosphere above the cloud.The cloud effective emissivity, defined by ε CLD , is the efficiency compared to a black body at which the cloud emits radiation (it is a function of view angle only).The emissivity is computed by DISORT and tabulated in LUTs.The cloud transmission and reflection properties are calculated and stored in LUTs in exactly the same way as for the shortwave channels see Sect.7.1.2.The Planck function for the temperature of the cloud (obtained by interpolation of the model temperature profile to p c ) is defined by B(T (p c )).The RTTOV calculations are performed 6 hourly and use ECMWF analyses for input.

Accuracy of radiative transfer model
The accuracy of the radiative transfer model has been evaluated by comparing the fast forward model equations with calculations performed by DISORT for an ensemble of conditions totalling 6048 cases and comprising: -3 solar zenith angles 9, 45 and 72 • -2 satellite zenith angles 0 and 50 • -2 azimuth angles 36 and 144 • -7 cloud optical thicknesses 0.01, 0.10, 0.32, 1.0, 3.2, 10. and 100.
-3 cloud top pressures 270, 490 and 750 hPa -3 model atmospheres tropical, mid-latitude and polar The results of the comparison for the different channels are summarised in Tables 4 and 5. Table 4 summarises the ensemble of radiances considered in the comparison, while Table 5 summarises the differences between the Fast Forward Model and DISORT (Stamnes et al., 1988), DISORT itself of course is not perfect, but it serves as a good reference.In general the differences are small, but in a few cases, not explicitly shown in this paper, the difference may be large, for example, where the optical depth is 0.32 and the cloud is at a high altitude.

Phase determination
The only retrieved parameter not directly included in the state-vector is phase, principally because it is treated as a binary and not a continuous variable.Phase is retrieved as follows: -at the beginning of a retrieval, the phase of the cloud is assumed to be ice or liquid based on the value of the calculated overcast brightness temperature of the 11 µm channel.The threshold between ice and liquid is assumed to be 260 K.
-The phase may be switched during the retrieval iteration according to the following criteria: -The phase change from liquid to ice is identified when the current estimate of r eff exceeds 23 µm (provided the scheme has not converged to this value as its final solution).When this threshold is reached, the retrieval is restarted assuming the cloud to be ice.
-Similarly if r eff for ice cloud becomes lower than 20 µm the retrieval is re-started assuming liquid.
-Only one change of phase is allowed in the retrieval.
www.atmos-meas-tech.net/5/1889/2012/Atmos.Meas.Tech., 5, 1889-1910, 2012 It is recognised that ice clouds do exist with r eff < 20 µm, and the retrieval will not provide reliable results in such situations.An alternative approach to the selection of cloud phase which would partly avoid this problem would be to simply run the retrieval twice, once for each phase, and select the most probable phase based on solution cost.Mixed phase clouds are not considered.Mixed phase clouds will either be retrieved as ice or liquid and with "average" liquid/ice values or with a high cost.In practice the boundaries on effective radius make little difference to the result.

Cloud ice and liquid path
In addition to the retrieved state parameters a number of cloud variables are derived and stored in the global level cloud products.Cloud water path, CWP, is derived using the method of (Han et al., 1994): where Q ext , the extinction coefficient, is assumed to be 2 for liquid and 2.1 for ice, for wavelengths much less than r eff .
The density is 1 ρ (g m −3 ) for liquid and 0.91267 (g m −3 ) for ice.Depending on phase, CWP is also known as liquid water path (LWP) or ice water path (IWP).

Retrieval scheme performance
In this section, we examine the theoretical performance of the cloud retrieval algorithm in the configuration used when processing ATSR data.Three questions are addressed in terms of the retrievals sensitivity and our ability to identify situations when the retrieval does not perform well: 1. at what optical depths and effective radii does the retrieval algorithm perform well for single-layer clouds?
2. How does the retrieval perform in the presence of multilayer cloud, given the cloud model used is a single layer, and can we identify when a single cloud layer model is inappropriate?
3. How sensitive is the retrieval to assuming an incorrect cloud, i.e., how well does the cloud retrieval perform when ice LUTs are applied to a liquid cloud and vice versa?
The questions are addressed by performing linear error simulations in the first case, and nonlinear retrieval simulations for the last two questions.In linear simulations, the sensitivity of observations to cloud parameters and error sources is computed for a specific set of atmospheric profiles and observing conditions.Observation sensitivities are then transformed into retrieval sensitivity assuming that the cloud forward model is linear within some suitable range about the atmospheric/observing state.In nonlinear simulations the solution is found iteratively, in this case Levenberg-Marquardt technique is used.

Sensitivity study of single-layer cloud retrievals
In this section, linear simulations are used to evaluate the sensitivity of observations to cloud parameters and error sources.

Simulation setup
In addition to the retrieval setup outlined in Sect.4, the linear simulations were performed for the following scenarios: -Solar zenith angle 30 • , relative azimuth 0 • and satellite zenith angle 0 • .
-The retrieval uses a standard temperature, humidity, and trace gas profile for northern mid-latitudes.
-The surface is assumed to be sea, with a Lambertian reflectance of 0.01 for all visible channels.
-Cloud fraction is fixed to 1.
Figure 2 shows the simulated retrieval errors, i.e., the square root of the error covariance, S x as a percentage of the expected retrieved parameter for a single layer of ice or liquid cloud with varying optical depth and effective radius.
(Note that the radius ranges are different for the two different phases to reflect realistic scenarios).The primary findings are: -the percentage uncertainty on the optical depth increases as clouds become optically thick and is high for optically thin clouds (i.e., clouds with an optical depth less than 1).
-The percentage uncertainty for the effective radius is highest for optically thin clouds and very small radii (i.e., clouds with a radius less than 5 µm).
-The percentage uncertainty on the cloud top pressure is largest for optically thin clouds.
-The percentage uncertainty for cloud liquid/ice path is greater for ice clouds and increases with optical depth and effective radius and is high for optically thin clouds.From this simulation it is clear that with the current number of measurements used with this model, optically thin clouds, clouds with a small effective radii are difficult to retrieve with low retrieval error.When the cloud becomes optically thick then the cloud optical depth may be significantly underestimated.For thin clouds the contribution from the surface becomes increasing important to model correctly, this will be more significant over bright surfaces such as deserts.For thick clouds as the optical depth increases the rate of change in the visible channels decreases until a "saturation" point is reach.The retrieved error reflects this scenario.It is interesting to note that the saturation point is reached at a lower optical depth for ice clouds than for water clouds.The uncertainty on the ice crystal effective radii increases at higher values of radii as the rate of change of the 1.6 µm) channel flattens.This simulation assumes that the optical models of the liquid and ice cloud are correct, and uncertainty in the models would add uncertainty to the state.Ice clouds are typically more difficult to model than liquid clouds due to the variation in shape, i.e., hexagonal aggregates, rosettes and the choice of optical model could have a significant effect on the accuracy of the retrieval (Zhang et al., 2009).

Simulations of multi-layer cloud
The frequent occurrence of multi-layer cloud is one of the most difficult problems facing passive satellite remote sensors of cloud (Chang et al., 2005).Incorrectly retrieving multi-layer cloud with a single layer cloud model could potentially result in incorrect and biased retrievals of the cloud properties.A typical occurrence of multi-layer cloud is thin cirrus over stratus cloud and it is this situation that is investigated here.To evaluate the effect of this category of multilayer clouds, we apply the cloud retrieval scheme to a set of simulated radiances.The effective radius and cloud top pressure are fixed.The radiances are generated by varying the optical thickness for each layer.The retrievals are performed separately assuming liquid and ice optical properties.The effect of using ice cloud optical properties when retrieving a liquid cloud, and vice-versa, is also discussed.

Simulation setup
The simulation was setup as in Sect.10.1 with the following extra conditions.The values of the simulated multi-layer cloud are: -320 hPa and 780 hPa cloud top pressure for the ice and liquid cloud layer, respectively.
-50 µm and 12 µm effective radius for the ice and liquid cloud, respectively.
-The optical depth of each layer analysed are combinations of optical depths studied in Sect.10.1.
It is necessary for the purposes of evaluation to define "true" values against which to evaluate the retrieval.The "true" value for effective radius, cloud top pressure and cloud water path has been defined by the phase selected by the retrieval, liquid or ice, which is the phase with the lowest cost.In the scenarios considered here the phase is nearly always ice except where the upper cloud layer is optically thin (less than 1 optical depths) and the lower cloud layer is more opaque.
The "true" optical depth is the sum of the optical depths of each layer.The multi layer scenario of 0.01 optical depth upper layer cloud is representative of a single layer liquid cloud while a 0.01 optical depth lower layer is effectively a single layer high cloud.
Figure 3 shows a false colour image of the simulated multi-layer cloud for different layer thickness.The false colour image is created using the 0.67, 0.87 and 1.6 µm channels.Ice crystals are absorbing in the 1.6 µm channel, resulting in the pale blue colour which increases in intensity as the overlapping cloud increases in thickness.Thick liquid clouds also absorb strongly in the 1.6 µm channel.
Figure 4 shows the results of performing nonlinear retrievals of single and multi-layer cloud assuming a single layer cloud model.The plots show the retrieved values of the cloud parameters and the corresponding cost, or goodness of fit. Figure 5 shows the percentage differences between the "true" and retrieved cloud parameters.These retrieved values are then compared with the retrieval error for the cloud parameter, the following points are noted: -all retrievals converged regardless of if the retrieval was accurate or not.
-Multi-layered clouds where the upper ice layer had an optical depth ≤1 generally fit the liquid LUTs best.In these cases the retrieved properties were closer to the "true" liquid cloud and the cloud phase selected on the basis of cost is liquid.
-The most poorly retrieved cases were thin ice clouds over thick liquid cloud.In these cases the effective radius of the ice cloud was underestimated, the clouds placed too low in the atmosphere and the retrieved optical depth underestimated.These scenarios did not always exhibit the highest retrieval error, however, the cost is nearly always greater than 1 in these cases.
-The ice cloud parameters are retrieved with higher accuracy and low cost when the lower layer of liquid cloud is thin compared to the upper layer.
-The optical depth is underestimated when the underlying liquid cloud is very thick.Optical depth is rarely overestimated except for thin clouds.
-The cloud top pressure retrieved when the upper layer is thin is generally the intermediate or the "effective" cloud top pressure of both layers.A high cost is usually observed in these cases.
-When the cost is low, the retrieval error on the multilayer clouds approaches that of a single-layer cloud retrieval.
In summary, the retrieval is performing well for single-layer clouds.For multi-layer clouds the optical depth and effective radius are generally retrieved with reasonable accuracy when the upper ice layer is thick and the lower liquid layer has an optical depth smaller than the ice layer.When the retrieval with the lowest cost is selected the phase is that of upper layer of the cloud when the scene is multi-layered.Optical depth will be underestimated for very thick liquid clouds.Interestingly, using the ice LUTs results in generally lower values of optical depth which arise because of the different scattering phase function of ice, which tend to be more forward scattering.Cloud top pressure will be overestimated (i.e., clouds will have a lower altitude) in most multi-layered cloud scenarios.In this sense the retrieved pressure represents some radiative average of the cloud layers.The cloud retrieval here uses only the 1.6 µm channel which is sensitive to cloud layers deeper into the cloud than the 3.7 µm channel (Platnick, 2000).Hence, the upper cloud layer needs to be thick to dominate the signal, i.e., greater than approximately five optical depths, or the effective radius retrieved will be a mixture of liquid and ice cloud.The cost can be seen to be a useful parameter for identifying multi-layer clouds, i.e., high cost is an indicator of poorly retrieved multi-layered cloud parameters.If the cost is low, as in the case of thin cloud scenarios, the retrieval errors will nearly always reflect the error on the retrieval.However, the authors realise that these results are simulations and in real cases other sources of high cost may somewhat obscure the multi-layer detection.
Liquid and ice water path are derived from the retrieved optical depth and effective radius.For lack of other information the cloud is assumed to be a single phase, ice or liquid in the calculation of retrieved IWP/LWP.For reference the "true" LWP/IWP is calculated for each layer using the correct phase by applying Eq. ( 24).In general the retrieval of IWP and LWP has a smaller percentage error than the optical depth and effective radius individually as the optical depth and effective radius tend to compensate for each other to achieve a more accurate IWP or LWP.When the cloud is very thick, i.e., optical depth > 50, then the IWP/LWP are underestimated.For scenes with thick ice cloud over liquid cloud the IWP/LWP are often overestimated.In difficult (i.e., retrievals with high cost) multi-layer scenarios the retrieved cloud ice/liquid water path is underestimated.The global bias caused by this will depend on the type and frequency of multi-layered clouds.

Example retrieval
In this section, we show an example retrieval of cloud using ATSR-2 data from the 10 November 1999 over Europe.Figure 6 shows the false colour image and the cost of the retrieval.The pale blue colouring denotes the likely presence of  ice cloud due to absorption in the 1.6 µm channel.The scene comprises a variety of cloud types over land and sea.Directly over Switzerland is some thick ice cloud, towards the Italian coast the cloud is thin liquid cloud.The value of the cost is high where the cloud is thin or the scene is clear (i.e., the scene has been falsely masked cloud).Elevated costs are also apparent over thicker cloud banks, perhaps indicating the presence of multi-layered cloud.The white spaces within the ATSR swath images show where no cloud has been de-tected using the cloud mask (see Sect. 5 for details on cloud mask used).
Figures 6-8 show the retrieved cloud parameters, the associated error in the retrieval and the selected cloud phase.From these figures the following observations are made: -cloud top pressure appears to be realistic.The error on the cloud top pressure retrieval is highest where the cloud is thin and the surface contribution to the TOA radiances becomes significant.Plots showing the percentage differences between the "true" and retrieved cloud parameter, left column, compared to the retrieval error for the cloud parameter, right column.From top to bottom the results are shown for optical depth, effective radius, cloud top pressure and cloud liquid/ice path.The 'truth' is defined by the phase selected.The "X" in plot (g) denotes when a liquid phase was selected as the retrieval with the lowest cost as per Fig. 4. -The uncertainty in the optical depth retrievals is proportional to the thickness of the cloud, i.e., thick clouds have the highest error in optical depth.
-Ice clouds are retrieved with a higher effective radius than liquid clouds.The uncertainty is highest when the clouds are thin.
-Low cloud fractions are typically retrieved at the edges of larger cloud fields.
-The cost is highest when the cloud is thin or where there is no cloud visible to the eye in the false colour image.Enhanced cost values are visible around the edge of identified cloud fields, possibly due to 3-D radiative transfer effects such as shadowing or horizontal photon transport and error in the surface albedo as the edge clouds are likely to be thin.
-Ice cloud phase is collocated with clouds identified as being high or with large effective radii, as would be expected.
-There is a clear discontinuity in the cost as it goes from sea to land.The cost should be higher over land than over the sea due to the difficulty of modelling the surface over land.However, due to the excessively large albedo-related error assignment described in Sect.10.1 the cost values over land are too small compared with those over sea.More well characterised assignment of surface forward model error would produce more realistic cost estimates.
-It is clear that the cloud mask used in the retrieval is not always accurate and uncertainty in the cloud mask propagates into the retrieval.The cloud mask used here, which is typical of many cloud masks generally underestimate the amount of cloud.In the future it maybe more appropriate to process individual pixels assuming each pixel is fully cloudy and identifying likely clear  The retrievals from the ATSR-2 scene over Europe appear consistent with the simulated retrievals described previously.A more complete validation of the cloud properties can be found in (Sayer et al., 2011a).

Conclusions
A method of optimally retrieving cloud parameters from passive visible through infrared satellite data has been described.The algorithm is capable of using the instrument noise characteristics and a priori information with associated retrieval errors to provide cloud macro and microphysical properties (cloud optical depth, effective radius, cloud top pressure and cloud fraction).The algorithm is based around a forward model which uses look up tables for computational speed.
The LUTs for liquid cloud droplets are based on Mie scattering while the ice LUTs are calculated using optical properties from ice crystals.A key advantage of this technique is that it uses all channels, and derives all parameters, simultaneously, meaning the resulting cloud parameters are radiatively consistent with the measurements.It also means that the observing system is likely to be more constrained then if subsets of channels were used.In addition the retrieval provides two quality control measures, the cost and the expected errors.The cost assesses the quality of fit to the model used and where the fit is good then the accuracy of the retrieval is based on the expected error.By basing the selection of phase on the lowest cost retrieval correct cloud phase could be deduced in most cloud scenarios.The cost and error information provided will enable effective use of the products for comparison with climate models or for exploitation via data assimilation.Using linear simulations the retrieval is found to be accurate for single layer clouds except when the cloud is very thin < 1 optical depths or approaching very high optical depths, i.e., typically > 50.Nonlinear retrieval simulations have been performed to assess the sensitivity of the retrievals to assuming a single layer cloud and to the choice of incorrect phase used to retrieve cloud properties.For many multilayer cloud scenarios, e.g., when the layers are of comparable thickness or the lower layer is thin and the upper layer is greater than five optical depths, the retrieval is relatively robust.However, retrievals of multi-layer cloud when the upper ice cloud layer is less than five optical depths will generally retrieve cloud top pressure and cloud effective radius which are an average of the two-layers.The retrievals of optical depth are relatively robust, but are underestimated when the two-layer cloud is very thick.
In addition to the state parameters retrieved directly by the algorithm, cloud liquid or ice water path are derived.These are underestimated in multi-layer situations, except when the lower layer is very thick compared to the upper layer.As thin ice cloud over liquid cloud scenario is relatively frequent, users of the data should consider the cost and retrieval error when using the data.The cost and retrieval errors have been identified as useful quality indicators to assess when the model used is appropriate and retrieval is accurate.The algorithm is globally applicable, however, the performance will have a dependence on the uncertainty associated with the location and the type of cloud.In the case of regions with high surface reflectance such as deserts and poles the retrieval will have a higher uncertainty, particularly for thin clouds.The technique has been demonstrated using satellite data from ATSR-2 for the compilation of the GRAPE dataset.The ATSR-2/AATSR dataset can be downloaded at www.badc.co.uk.The algorithm is applicable to most passive visible, near infrared and infrared sensors such as ATSR, MODIS, AVHRR and SEVIRI.Many of these sensors have additional channels.The addition of more measurements and, hence, more information is likely to increase the sensitivity to thin and cloud and the vertical profile enabling more accurate retrievals.
The state vector x in Eq. (1) used in the retrieval comprises the following:

Fig. 1 .
Fig. 1. Figure illustrating the multiple reflections between cloud and surface.

Fig. 2 .
Fig. 2. Simulated retrieval error as a function of varying effective radii and cloud optical depth for cloud optical depth (a) and (b), effective radius (c) and (d), cloud top pressure (e) and (f) and cloud liquid/ice path (g) and (h).Results for simulated liquid (left) and simulated ice (right) single layer cloud over sea.The numbers printed in the white boxes indicate values exceeding the scale of the colour bar.

Fig. 4 .
Fig. 4. Nonlinear retrievals of single and multi-layer cloud performed using a single-layer cloud model using the LUT (cloud or ice) that produced the lowest cost.The scenario where the upper cloud layer is 0.01 optical depths is equivalent to retrieving a single layer cloud.Plots(a), (b), (c), (d)show the retrieved optical depth, effective radius and cloud top pressure and cloud liquid/ice path, respectively.The phase of the best retrieval is indicated on plot (a) the "X" indicates the best retrieval was a liquid cloud.Plot (e) shows the cost.
Fig. 5.Plots showing the percentage differences between the "true" and retrieved cloud parameter, left column, compared to the retrieval error for the cloud parameter, right column.From top to bottom the results are shown for optical depth, effective radius, cloud top pressure and cloud liquid/ice path.The 'truth' is defined by the phase selected.The "X" in plot (g) denotes when a liquid phase was selected as the retrieval with the lowest cost as per Fig.4.

Fig. 6 .
Fig. 6.False colour image of clouds over Europe from ATSR-2 on the 10 November 1999.The pale blue colour results form absorption in the 1.6 µm channel and is an indicator of ice phase or very thick clouds (top left), cost of the retrieval (top right).Cloud top pressure (bottom left) and error on cloud top pressure (bottom right), The white area within the swath shows where the region has been designated cloud free using the cloud mask.

Fig. 7 .
Fig. 7. Cloud optical depth and error on cloud optical depth Cloud effective radius and error on cloud effective radius for the image of Fig. 6, 10 November 1999.

Table 2 .
ATSR instrument specifications.Note that for SLSTR the numbers specified are provisional as the instrument is still in the design phase.The swath width for SLSTR is the width of the dual view swath, the single view swath will be wider.

Poulsen et al.: Cloud retrieval algorithm the
ensemble behaviour of retrieved parameters.One such example manifest is boundary layer cloud where cloud top pressure solutions can appear below and above the inversion at altitudes of similar temperature.This is a consequence of an under-determined system and no algorithm based on the same information could reliably resolve the issue.The solution is to report all the multiple low cost minima, but this is mostly impractical.Practically the solutions to the multiple low cost minima problem must involve additional information (e.g., in the case of the boundary layer cloud top pressure, to identify inversion conditions in the Numerical Weather Prediction (NWP) temperature profile and constrain the cloud top pressure solution into the boundary layer).

Table 3 .
ATSR retrieved cloud properties units and range for GRAPE.

Table 4 .
This table summarises the ensemble of radiances considered in the comparison between the fast forward model and DIS-ORT, the visible channel units are reflectance and the infrared channel units are in Kelvin.

Table 5 .
This table summarises the differences between the Fast Forward Model and DISORT, the visible channel units are reflectance and the infrared channel units are in Kelvin.