The FORUM End-to-End Simulator project: architecture and results

FORUM (Far-infrared Outgoing Radiation Understanding and Monitoring) will fly as the 9th ESA’s Earth Explorer mission, and an End-to-End Simulator (E2ES) has been developed as a support tool for the mission selection process and the subsequent development phases. The current status of the FORUM E2ES project is presented, together with the characterization of the capabilities of a full physics retrieval code applied to FORUM data. We show how the instrument characteristics and the 5 observed scene conditions impact on the spectrum measured by the instrument, accounting for the main sources of error related to the entire acquisition process, and the consequences on the retrieval algorithm. Both homogeneous and heterogeneous case studies are simulated in clear and cloudy conditions, validating the E2ES against two independent codes: KLIMA (clear sky) and SACR (cloudy sky). The performed tests show that the performance of the retrieval algorithm is compliant with the project requirements both in clear and cloudy conditions. The far infrared (FIR) part of the FORUM spectrum is shown to be sensitive 10 to surface emissivity, in dry atmospheric conditions, and to cirrus clouds, resulting in improved performance of the retrieval algorithm in these conditions. The retrieval errors increase with increasing the scene heterogeneity, both in terms of surface characteristics and in terms of fractional cloud cover of the scene.

All-sky broadband spectral flux FORUM Level 1 FIR outgoing longwave radiation (OLR) extended to broadband with the Level 1 from IASI-NG, consistent with independent broadband flux observations to within the radiance to flux uncertainty, with minimal bias averaged over all scenes Water vapor profile Vertical profiles of water vapor concentration with 15% uncertainty at 2 km vertical resolution Surface emissivity 0.01 in the 300-600 cm −1 spectral range for polar region on 50 cm −1 spectral grid Ice water path (IWP) 20 g/m 2 Cloud Top Height (CTH) 1 km Particle size diameter 20% FEI measurement frequency ≥5/dwell time of the sounder FEI and FSI line of sight co-alignment Co-aligned within 0.7 mrad FEI spectral channel 11.5 µm with 2.0 µm width (G), 10.5 µm with 1.5 µm width (T) FEI NEdT smaller than 0.3 K (G), 0.8 K (T) at 210 K FEI ARA better than 1 K (G) and 2 K (T) at 210 K The area covered by the FEI field of view (FoV) is designed to include the FSI pixel (a circle of 7.5 km radius) and IASI-NG co-located measurements. A spatial sampling of 0.6 km is needed to detect clouds to which the FSI is sensitive (ESA, 2020; Dinelli et al., 2020).

The role of the E2ES
120 The key feature of the E2ES is the capability to introduce and single out possible causes for the discrepancies between the true and retrieved parameters. This applies both to the L1 products, i.e. the difference between the radiation reaching the instrument and the output spectrum, but also to the L2 products, i.e. the difference between the true atmospheric state and the atmospheric components retrieved by the inversion module.
The usual way of performing L2 simulations consists in using the same code for simulating the instrumental spectrum, 125 adding some random error to the exact values, and then retrieving the atmospheric state. In other words, errors aside, all the simulated cases could be exactly retrieved by the inversion code. In the case of the E2ES, it is possible to simulate external factors which must be definitively considered in the operative life of the instrument, and that can have an impact on the accuracy of the measurements. The main factors that can be simulated by the E2ES are: -The instrument pointing errors;

130
-The lack of homogeneity in the instrument field of view; -The errors due to the instrument hardware that are not fully represented by the Instrument Spectral Response Function (ISRF), which acts as a convolution kernel on the simulated high resolution radiances.
Unfortunately, there is often no sensitivity in the measurements to address such fine details. Nevertheless, we are able to quantify the degradation of the results due to these external factors.

135
The operative L2 products are affected also by the model error. The radiative transfer is performed via a code that emulates the main mechanisms governing the diffusion of the radiance in the atmosphere. Our earth is however a much more complicated framework, the most evident difference is that the atmosphere is a continuous system, while a discretization technique has anyway to be applied in order to be fed into a computer. However, this lack of knowledge cannot be attributed to the instrument, so that adding a further uncertainty only interferes with the assessment of the instrument concept. Thus, the same forward model 140 has been used in the generation of the scene and in the retrieval. In this way, we are sure that the radiative transfer does not introduce any bias with respect to the simulated observations.

E2ES architecture
The E2ES is structured as a chain of modules, each performing a particular task. Each module relies on the data produced by the previous modules. The modules of the FORUM E2ES are:

145
-The Geometry Module (GM) calculates the true and error affected geographical coordinates of the field of view based on satellite location and pointing.
-The Scene Generator Module (SGM) calculates the high resolution radiances reaching the instrument, using the exact geographical coordinates and the prescribed atmospheric parameters. This is considered the true state.
-The Observing System Simulator (OSS) simulates the acquisition process of the instrument. Since in the preliminary 150 phase, two different instrument concepts were available, two different configurations reproduce the two different hardwire instrument specifications.
-The Level 2 Module (L2M) uses the spectrum generated by the OSS and the noisy geographical coordinates to retrieve atmospheric parameters.
-The Performance Assessment Module (PAM) compares the true versus the retrieved L1 and L2 products and produces a 155 report on the discrepancies.
The execution chain can be driven and checked from an external environment, the OpenSF structure, which is tailored for this kind of simulators. Figure 1 represents the E2ES structure with the data flow.
In the following subsections we briefly describe the purpose of the various modules and sketch the algorithms. A complete documentation may be found on the website devoted to FORUM at: https://www.forum-ee9.eu/.  The GM produces two types of instrument geolocation grids annotated with the sampling time and the observation angles: 165 error-free grids (on top of which the SGM simulates the TOA spectrum) and estimated grids (intended to feed the OSS with a geolocation that is affected by telemetry and calibration uncertainties). The GM output files are written in netCDF format.
The GM configuration file, which is based on XML syntax, allows the user defining several simulation parameters as the latitude and longitude of the region of interest, the acquisition start and stop times, the satellite orbit and attitude (optionally also with knowledge errors), the IERS Bulletin providing Earth Rotation Parameters, the focal length, the mirror scanning law, 170 the focal plane arrangement of FEI and FSI instruments and the spatial oversampling factor. The line of sight modelled with these parameters for every sample is then sent to the EO-CFI (Sánchez-Nogales, 2020) for computing the geodetic coordinates of its intersection with WGS84 geoid, or the selected digital elevation model.

Scene Generator Module
The Scene Generator Module (SGM) exploits the information on geolocation and observational geometry provided by the GM 175 and performs a simulation of gridded spectral radiances to feed the FORUM observing system simulator. The SGM computes pixel by pixel radiances, with three different approaches: a fully automatic set up, a user driven scene definition, or a mixture of the two methods.
In the automatic approach, multiple databases are loaded to configure the surface, atmosphere, and cloud properties in accordance with geolocation, season, and time of the day without any intervention from the user. For storage limit, all the 180 databases are currently available only for observations in the range 0-85°N and 0-30°E, for two seasons (winter and summer), at four times of the day (00, 06, 12, 18 UTC). The surface properties are described by the spectral emissivity, obtained from the geolocated emissivity database by Huang et al. (2016), and the skin surface temperature, obtained from the ERA 5 reanalysis data (Hersbach et al., 2020). The atmospheric profiles of temperature, pressure, water vapor, and other 11 gases mixing ratios, interpolated at fixed altitude levels, are derived from the ERA 5 reanalysis and the IG2 climatological data (Remedios et al.,185 2007). The gases included in the simulation are H 2 O, CO 2 , O 3 , N 2 O, CO, CH 4 , O 2 , NO, SO 2 , NO 2 , NH 3 , and HNO 3 . Cross section data using predefined values for the CCl 4 , CFC 11 and CFC 12 molecules were also added. The definition of the cloud geometrical and optical properties requires multiple steps. The ERA 5 reanalysis provides the total cloud cover (cloud fraction within the pixel) and, for each vertical layer, the cloud liquid water and/or ice content (LWC and IWC). It is assumed that liquid water and ice clouds can coexist in the same layer even if they are computationally treated as two separate components.

190
The LWC and IWC values are used to establish if the scene is clear or cloudy (total cloud cover) and for the definition of the cloud vertical structure. For each layer of the model, the liquid water particles effective radii (Re) are computed from the LWC by using the parameterization described in Martin et al. (1994), whereas the ice particle radii are derived from the IWC and the application of the parameterization by Sun and Rikus (1999) in its revised version by Sun (2001). The cloud optical depth (OD) at 900 cm −1 is derived, layer by layer, from information about the IWC (in case of ice clouds) or LWC (in case of liquid 195 water clouds) and the effective radius of the particle size distributions (PSD). The equation defining the OD at 900 cm −1 for vertically homogeneous clouds of geometrical thickness ∆z is where β(R ef f , 900) is the extinction coefficient at 900 cm −1 of the PSD corresponding to a specific effective radius R ef f , normalized with respect to a unit IWC (or LWC for liquid water clouds).

200
Water clouds are simulated as PSDs of liquid water spheres while ice clouds are assumed as composed of 8 columns aggregates crystals. Single scattering properties for single particles (SSSP) are combined to compute the optical properties (extinction, scattering and absorption coefficients and the phase function) for a set of PSDs over the whole FORUM spectral range. The assumed PSDs are modified gamma distributions (Hansen, 1971) representatives of effective radii from 2 to 200 micrometers in case of ice clouds and from 1 to 30 micrometers in case of liquid water clouds. The SSSP are derived from cases as a compromise between computational speed and accuracy. In fact, for a resolution of 10 −2 cm −1 the computational times in cloudy conditions decrease by a factor ranging from about 10 to about 20 with respect to using a 10 −3 cm −1 resolution.
Nevertheless, the simulation accuracy is maintained since OSS-FSI radiances in cloudy conditions computed using 10 −2 or 10 −3 cm −1 differ less than FSI goal NESR (Figure not shown).
In the user driven approach, the scene is manually configured and the user is able to set a large set of input parameters to customize the specific simulation. Basically, all the parameters defining the surface and cloud properties are user configurable.
Nevertheless, the vertical atmospheric profiles are automatically selected from the ERA 5/IG2 databases for the required geolocation and data, as in the automatic approach. However, the SGM configuration parameters include the list of the atmospheric gases used for the simulation and thus, for each of them, their absorption properties can be turned off on request. The SGM 220 allows the user to define surface temperature and emissivity, or to select specific emissivities from a list of 11 pre-defined surface types derived from Huang et al. 2016, e.g., water, desert, snow, deciduous vegetation. Two different kinds of surface can be assumed to exist in the FORUM field of view. The shape of the areas covered by different surfaces (with different emissivity and temperature) are defined by sectors or circles whose sizes are set by the user through proper geometric parameters in the SGM configuration list. The scene can be assumed as clear, overcast, or partially cloudy adding a third/fourth level of 225 possible heterogeneity to the observed scene. In this last case, the cloud covers only a fractional area of the full FORUM field of view, corresponding to a sector or a circle with configurable geometric features such as in the case of the inhomogeneous surface scene. Liquid water, ice, or mixed phase clouds can be placed at different top altitudes and be of any thickness, even overlapping each other when two clouds are assumed in the same scene (e.g. liquid-ice or ice-ice). In case of ice clouds at

FORUM Embedded Imager
The FORUM Embedded Imager Observing System Simulator (FEI-OSS) is a compiled MATLAB executable in charge of 245 simulating and calibrating the instrument acquisition of the FEI thermal infrared imager. The FEI is modelled as an uncooled bolometer with one spectral channel centred at 10.5 µm and a 2-D sensor layout of typically 60 x 60 pixels (configurable).
The FEI OSS is composed of the instrument simulator (FEI-IS) and the L1b processor (FEI-L1).
The FEI-IS ingests the SGM output scene (possibly oversampled in the spatial domain), convolves the TOA spectra with the instrument Point Spread Function (PSF) in the spatial domain and integrates the resulting radiance with an Istrumental Spectral

250
Response Function (ISRF) provided externally. Among its PSF modelling capabilities, the FEI-OSS is able to ingest an external PSF or to compute it by taking into account the configured parameters for the optics layout, the focal plane arrangement, the satellite motion blur and other high-frequency contributions as the AOCS jitter and the instrument micro-vibrations. Also, the channel centre is configurable.
The FEI-L1 translates the instrument output radiance into calibrated spectral radiance by taking into account noise and other 255 instrumental errors. In particular, the noise is simulated in the brightness temperature domain with a Gaussian distribution of standard deviation equal to the configured NEdT. The calibrated radiance is finally written into the L1b product (netCDF file) along with the estimated geolocation provided by the GM module and some flags related to the FEI/FSI co-registration.

FORUM Sounding instrument
The FORUM Sounding Instrument Observing System Simulator (FSI-OSS) operates the software simulation of the perfor-260 mances of the FSI Fourier transform spectrometer from the starting point given by a set of input TOA spectral radiances provided by the Scene Generation Module (SGM). The simulator generates the interferograms corresponding to the outputs of the FSI instrument observing the selected scene, and then processes them with the level 1 data analysis code in order to recreate the observed TOA radiance.
In order to account for the effects of the finite FoV of the FSI instrument and telescope, the input radiances are provided on a 265 sub-pixel matrix which covers the FSI footprint plus a suitable safety margin, needed to cover pointing errors in the simulations.
Each of the sub-pixel is treated according to its specific observation geometry, specifically, its off-axis angle which produces a corresponding frequency shift due to the different pathlengths inside the FTS, and contributes accordingly to the output interferogram. This is the so-called self-apodisation effect.
Also, to correctly model the acquisition during the instrument dwell time, several inputs are provided to the OSS, corre-270 sponding to different times in the duration of the acquisition. Each input will have, in general, different line of sight parameters, and the result of the processing of each of the inputs is interpolated at interferogram level in order to produce the complete interferogram acquired in the duration of the dwell time.
The reference optical design used in the FSI-OSS consists in a Mach Zehnder interferometer, a design that provides the most general approach to a Fourier transform spectrometer. This design in fact allows us to place different sources on each of the and recombiner beamsplitters, which gives greater flexibility in the instrument design. Other optical schemes can be simulated just as particular cases of this more general design.
The main configuration parameters that allow to define the interferometer performances inside the FSI-OSS are the beam splitter reflection, transmission, and absorption coefficients, which are provided as complex, spectrally-dependent quantities to 280 obtain the most accurate and general instrument response modeling (Bianchini et al., 2009. One of the inputs of the interferometer is permanently set on a black-body source (reference black-body, RBB), while the other can be switched between the scene and two other reference sources (cold and hot black-bodies, CBB and HBB). This allows to operate the FSI-OSS in two different modes, scene and calibration, in order to simulate the full measurement process of the FSI instrument.

285
To obtain a meaningful representation of the interferogram that is observed on the outputs of the FSI the OSS needs to define a further set of transfer functions that describe the behaviour of the optical and electronic components of the system.
An overall, frequency dependent absorption coefficient can be defined in order to model possible effects due to optical components. Moreover, transfer functions describing the electronics response, expressed in terms of phase and amplitude are applied in order to correctly represent the output interferograms. The electronic response function can be further subdivided 290 into detector response and preamplifier response, if needed. A last transfer function is used to model the effects of sampling and digitization, with the possibility of introducing arbitrary random or periodic sampling errors, or a calibration error in the interferogram time scale.
All the above described functions are coded in the instrument module, which outputs the simulated interferograms. The level 1 module has the function of transforming and calibrating the interferograms in order to reconstruct the observed scene 295 radiance . The level 1 module is subdivided in three parts: -Level 1a, which performs the Fourier transform of the interferogram, including the possible application of compensations for the different instrumental transfer functions, the zero path difference detection and the phase correction of the interferogram, producing as output the uncalibrated spectrum.
-Level 1b, which performs the radiometric calibration of the uncalibrated spectrum and the estimation of the a-priori 300 errors deriving from the calibration procedure (calibration error) and the detector noise (NESR error), providing as an output the measured atmospheric spectral radiance corresponding to the input scene as observed on each of the two instrument outputs.
-Level 1c, which performs the averaging of the two instrument outputs and the resampling of the average spectrum on a configurable frequency scale, in order to provide the level 2 module with a single measured spectrum with the required 305 spectral sampling. The resampling is done through Fast Fourier Transform (FFT) interpolation and zero padding in case of oversampling.
The frequency scale is chosen to obtain statistically independent adjacent measures. This leads to a diagonal VCM matrix with respect to the NESR component, which is the dominant one. The resolutions obtained are 0.35714 and 0.37037 cm −1 for the two instrument concepts.

310
Last, a separate FSI-OSS module is used to calculate the ISRF corresponding to the configured FSI optical setup. This module performs the calculation of the ISRF taking into account the effect of the finite maximum optical path difference and the line broadening and shift due to the finite divergence of the radiation propagating inside of the interferometer (the self-apodization effect). The ISRF module generates an output file in a format which is directly compatible with the level 2 module.

Level 2 module
The L2M module is composed by the L2M_CIC (Cloud Identification and Classification) and L2M_I (Inversion) submodules.
A python wrapper gathers the input data and runs the two submodules, providing the correct configuration files.

Cloud Identification and Classification
The main goal of the L2M_CIC submodule is to provide a classification of the observed spectrum. A machine learning algo-

330
The CIC algorithm classification is based only on FORUM sounder spectral radiance and thus regards the radiance from an extended (about 15 km diameter) field of view. The classification is provided independently of the presence or not of sub-pixel heterogeneities. Therefore, a scene classified as cloudy could be the results of a field of view that is only partially covered by clouds plus a clear sky fractional area, and vice versa. The L2M_CIC exploits the FORUM imager data to pair the spectrum classification with a scene homogeneity information. The imager pixels radiances are first converted into brightness 335 temperatures (BT), then, a BT distribution (histogram) is analysed with a custom made fitting function. A simple algorithm identifies the BT distribution modes and splits the imager pixels into homogeneous groups of pixels characterized by limited BT variation. Thus, the L2M_CIC submodule provides auxiliary information concerning the number of homogeneous areas identified at FORUM sounder sub-pixel level, their average BT and standard deviation. This is used as a quality flag for the inversion results, since non-homogeneous scenes are in any case managed as homogeneous in the retrieval process. A land 340 mask for the imager field of view is also defined, based on the geo-referred database GSHHG (Wessel and Smith, 1996, and updates) , so that BT inhomogeneities caused by different sea-land surfaces are easily identified, especially in clear sky.

Inversion
The purpose of the inversion module is to solve the inverse radiative transfer problem, retrieving any combination of surface temperature, surface emissivity and vertical profiles of temperature and water vapour in clear sky conditions, and cloud param-345 eters in cloudy sky conditions. The retrieval algorithm is based on the classical OE (Optimal Estimation) method (Rodgers, 2000). The inversion is performed via the Gauss-Newton sequence, with the Marquardt modification to ease Hessian inversion and to compensate for the non-linearity of the forward model. The a-posteriori IVS (Iterative Variable Strength) regularization technique (Ridolfi and Sgheri, 2011;Eremenko et al., 2019;Sgheri et al., 2020) is then applied to smooth out the retrieved profiles. In accordance with the SGM, the code makes use of the LBLRTM forward model in clear sky conditions, and the 350 LBLDIS frontend to the DISORT multiple scattering code in cloudy sky conditions. In cloudy sky condition, when a cirrus cloud is detected, the inversion module includes a preprocessor that, using the parameter decoupling technique, finds a better estimate for the cloud parameters than the CIC guess. This feature is important to avoid the retrieval stopping on local minima, see the discussion of the cloudy tests for more explanations. We used the fDISORT (fast DISORT), an accelerated version of DISORT (Sgheri and Castelli, 2018), able to reduce the multiple scattering computation time. The instrumental effects are 355 factored via the convolution with the ISRF function. In this early version of the FORUM E2ES the decision was made to maintain a frequency dependent ISRF. Consequently, the convolution cannot be computed in the Fourier domain. We obtained the actual ISRF on a rather fine grid with sampling step of 3 cm −1 . However, using the apodized sampled ISRF can increase the chi-square by up to 30% with respect to the unapodized spectrum, due to the fact that we exchange the interpolation and apodization operators. The correct sequence of first convolving with the sampled ISRF and then convolving with the apodiza-360 tion kernel increases the computation time and does not remove the error introduced by cutting the tails of the ISRF, which is one of the purposes of apodization. Thus, at least in this stage, we preferred to use the unapodized spectrum.

Performance Assessment Module
The Performance Assessment Module (PAM) is a MATLAB executable aimed to compute and plot the retrieval accuracy of L2M. In particular this module also allows inspecting the SGM output (TOA spectral radiance) and plotting the L2M vs.

365
SGM profiles including error bars and residual statistics for clear-sky retrievals (emissivity, skin temperature, atmospheric temperature, atmospheric water vapor and precipitable water vapor) and cloudy-sky retrievals (cloud top height, cloud optical depth, cloud particle size, cloud thickness and ice/liquid water path). The PAM is also able to plot the Averaging Kernel (AK) profiles calculated by the L2M and the vertical resolution, computed at each level as the Full Width at Half Height (Rodgers, 2000) of the corresponding AK row. Averaging kernels may be optionally convolved with the SGM reference for comparison 370 to the L2M retrievals.

Test scenarios
The FORUM E2ES is meant to study the potentialities and criticalities of the FORUM mission for realistic characteristics of the instrument and for different measurement scenarios.
The different measurement scenarios, used both for the validation of the different modules and the assessment tests, were 375 defined in the homogeneous case to be representative of seasons and latitudes in clear and cloudy conditions.
The FEES is able to simulate FORUM measurements all over the globe, provided the climatological data is included in the SGM dataset (see the relevant section).
Within the area selected for the FORUM E2ES study, seven scenarios were identified: three extreme cases (two in polar regions in winter with two different surface characteristics, ice and fine snow, one in the tropics on the desert) and four cases 380 at Middle Latitude (two on the sea, in summer, two on vegetation, one in summer and one in winter). These scenes can be simulated either in clear-sky or in cloudy-sky. Table 4 provides, for each scene, geolocation, type of surface, and time, as well as the type of cloud assumed for each scene in the case of cloudy sky. The cloudy sky scenarios are computed adding to the clear sky atmosphere different homogeneous types of clouds filling all the FoV (see Table5). 385 We depart from real data using this approximation, however it does not have an impact on the assessment of the performances of the E2ES.
In the assessment tests, we also used the capability of SGM to simulate heterogeneous scenes, assuming a FoV covered by portions of different homogeneous atmospheres. Assessment tests were performed to evaluate the quality of the L2 products, assuming the FORUM instruments in a configuration satisfying the L1 goal requirements, for the different scenarios identified to cover a variety of atmospheric characteristics.
These tests were performed for both homogeneous and inhomogeneous scenes, in both clear sky and cloudy sky.
The results obtained in the homogeneous case represent the best that can be obtained from FORUM for the different scenes, and hence the quality of the products obtained in these conditions constitutes the reference for the other tests, performed with 395 heterogeneous scenes.
The L2 clear sky products of the E2ES were also validated by comparing them with the outputs obtained, starting from the same observations generated by the FSI module, by an already validated correlative code. The reference code used in clear sky is KLIMA (Kyoto protocoL Informed Management of the Adaptation).
Conversely the L2 cloudy sky products of the E2ES were compared with the outputs of the reference SACR (Simultaneous 400 Atmospheric and Cloud Retrieval) code. However, due to some differences in the model and approach, we cannot speak rigorously of a validation.

Homogeneouos Clear sky cases
The seven clear sky test case listed in Table 4 were analyzed. The simulated observed spectra were generated by the E2ES chain, including the GM, the SGM, the FEI and the FSI. The reference atmospheric state was constructed using ECMWF 405 information. The vertical grid used to represent the atmospheric profiles in both the SGM and the L2M is composed of levels distant about 0.5 km up to 15 km, 1 km from 15 to 25 km, 5 km from 25 to 40 km, 10 km up to 80 km. In the retrieval we also tested an optimized grid similar to that of . However, in the E2ES retrieval setting we obtained worse results because a coarser grid introduces a smoothing error, particularly noticeable in the tropospheric water vapour profiles. The used noise error and spectral resolution are aligned with information provided by the industrial consortia, compliant with the goal 410 parameters specified in Tables 1 and 3. Temperature, Water vapour, surface temperature and surface emissivity are retrieved simultaneously by L2M using the Optimal Estimation approach. The Covariance Matrices of the a-priori vertical profiles of temperature and water vapour are built assuming the errors defined by the UK MetOffice for routine assimilation of IASI products into their operational Numerical Weather Prediction (NWP) system (see Figure 6). Indeed, IASI-NG measurements are planned to be used in synergy with the FORUM measurements. For the emissivity, which is represented on a 5 cm −1 spaced 415 grid, we used a constant error of 0.05, with a correlation length in wavenumber of 50. For surface temperature an a-priori error of 2 K was used. The a-priori profiles used for these tests for all the retrieved variables are given by the truth perturbed with one standard deviation of the a-priori error. This is the most sensible choice, especially for emissivity. In fact, the sensitivity to the emissivity in the spectrum is different in different regions of the spectrum. If a stochastic perturbation were used, the results would depend on how large is the actual perturbation in the frequency ranges where there is sensitivity. For the initial condition 420 of the retrieval we used for temperature, water vapour and surface temperature the values taken from the climatology, obtained as the 10-years local average in the month of ECMWF analysis profiles, while for emissivity the a-priori values were used.
The reason for this difference is that the Gauss-Newton (GN) method only looks for local minima. Starting from the a-priori, the chance to obtain a solution close to the a-priori itself is not negligible. On the other hand, the dependence of the spectrum from emissivity is linear, so that the possibility of finding a local minimum for the emissivity itself is minimal. However, the 425 combined retrieval of emissivity and surface temperature raises an issue that is discussed in this paper.

Validation with KLIMA code
The KLIMA code, used to validate the L2M module in clear-sky conditions, performs the retrieval of the atmospheric profiles and surface parameters from the spectral radiance measurements. The code consists of two distinct modules, the Forward Model (FM) and the Retrieval Model (RM). The RM has been developed in the context of the KLIMA ESA study (Cortesi 430 et al., 2014) by upgrading the algorithm employed for the analysis of REFIR-PAD measurements  adapted in turn from the MARC inversion code for the MARSCHALS ESA study (Carli et al., 2007 Moreover, the correction of the Planck function (Clough et al., 1992) is included to take into account the optical depth of the atmospheric layer at the different frequencies. The only difference between the KLIMA and LBLRTM forward models is in the line-shape: KLIMA uses the Voigt function, while LBLRTM uses a linear combination of faster functions, performing a grouping of nearby lines. The validation of KLIMA FM was conducted in the context of IASI data analysis (Cortesi et al., 2014) by comparing synthetic IASI measurements generated by the KLIMA FM code with those of the FM of the LBLRTM. The priori information (optimal estimation method) and the Marquardt parameter. The code implements the multi-target retrieval: more than one species is simultaneously retrieved along with many other atmospheric, surface, and instrumental parameters.
A complete Covariance Matrix (CM) can be used, including both the measurement errors and the errors in the calibration procedure and/or in the estimation of the FM parameters.

450
The purpose of the validation was to compare the retrieved quantities and the corresponding errors (given by the mapping of measurement error on retrieved quantities) provided by the L2M_I of the E2ES and by KLIMA when starting from the same conditions. The validation may be considered as reached if the differences in the estimated retrieval error and the differences in the retrieved profiles are small enough. More precisely, we set a goal at 10% of the retrieval error for the retrieved profiles.
Moreover, we ask for the difference in the retrieval error to be less than 10%. With the actual results, statistically, these goals 455 are achieved.
In Table 6 we report the comparison between the initial and final reduced χ 2 obtained using KLIMA end L2M_I. The number of Gauss iterations is also reported. Each row of the table refers to a different scenario. The Reduced χ 2 differences (both initial and final) are smaller than 0.02. Although the convergence criteria are the same (i.e. reduction of the χ 2 less than 1%), the number of Gauss iterations in some cases is different. This is due to differences on reduced χ 2 of the order of 0.001.

460
In Figure 2 we report the comparison between the retrieved H 2 O value profile and error profile obtained using KLIMA and L2M_I. The differences are divided by the mean retrieval error. The differences between retrieval errors are smaller than 5% and differences between profiles retrieved by the two codes are smaller than 0.4 times the retrieval error, with the maximal difference concentrated in the lower tropospheric region.    Results for temperature validation are reported in Figure 3. The differences between retrieval errors are smaller than 25% 465 and the differences between profiles retrieved by the two codes are mostly smaller than 0.5 times the retrieval error.
In Table 7 we report the comparison between the retrieved surface temperature and error obtained using KLIMA and L2M_I.
Each row of the table refers to a different scenario. The differences between the retrieval errors are smaller than 15% and the differences between the profiles retrieved by the two codes are mostly smaller than 0.3 times the retrieval error.
In Figure 4 we report the comparison between the spectrum of the retrieved surface emissivity and its error spectrum obtained 470 using KLIMA and L2M_I. The differences are divided by the mean retrieval error. The differences between retrieval errors are smaller than 30% and the differences between profiles retrieved by the two codes are mostly smaller than 0.5 times the retrieval error. The larger differences in the MIR range that show up in some cases are mainly due to the high negative correlation between surface temperature and emissivity. For the retrieval procedure, the effect of a lower surface temperature with a higher emissivity is similar to that of a higher surface temperature with a lower emissivity.
475 Figure 5. Simulated spectra and residuals after the fit for the seven clear cases.

Discussion of the results
We describe here a selection of the results.
In Figure 5 we report the simulated spectra and the residuals (given by the difference between simulated observations generated by the OSS and the best fit L2 simulated radiances) for the seven clear cases. The different cases are characterized by large differences both in the FIR and in the MIR spectrum. However, in all cases the residuals are compatible with the 480 measurement noise and show no bias. In Figure 6 we report the retrieved profiles, the retrieval errors (estimated from the diagonal elements of the CM) and the differences between the retrieved and the true profiles of temperature and water vapour for all the seven cases.
For both temperature and water vapour the difference between the retrieved profile and the truth is well within the retrieval error for most of the points and all cases. Despite the variability of the temperature and water vapour profiles in the various 485 cases, the sensitivity of the temperature and water vapour retrieval, given by the retrieval error, does not change significantly and the requirements on precision are met in all cases. The Information Gain Quantifier (IGQ, Dinelli et al. (2009)) is defined as two times the base 2 logarithm of the ratio between the a-priori error and the retrieval error, and measures the contribution of the measurements in the retrieved values. The IGQ is larger than 2 for temperature, and larger than 4 for water vapour in troposphere. In Figure 7 we report the typical behaviour of the AKs for temperature and water vapour. Values of the diagonal 490 elements significantly smaller than 1 in the troposphere does not depend only on the fact that the used a-priori errors are small and hence the information gain is limited, but also on the very fine used retrieval grid in the troposphere which determines  Table 8 reports the results of surface temperature retrieval. All cases show a good accuracy, with an IGQ larger than 4. The exception is case 1.1 (desert at noon in summer), where there is a negative bias of 0.8 degrees, which is larger than the retrieval error. This is due to a large negative correlation between surface temperature and emissivity in the MIR. This is further discussed when dealing 500 with the results for surface emissivity.
For emissivity retrieval, we can divide our sample into three groups, with similar atmospheric conditions:  -The polar cases (5.1 and 7.1), high latitude and dry atmosphere (snow and ice).
-The desert case (1.1), which combines a dry atmosphere and hot surface temperature. Also, the emissivity pattern has 505 substantial features in the MIR region, the quartz Reststrahlen bands (Salisbury and D'Aria, 1992) also called the "devil's horns".
We selected one model case from each group. The number of DOF is reported in Table 9 for the FIR and MIR regions. These numbers are directly comparable because the emissivity grid is the same for all cases.
For the polar group we show case 5.1 in Figure 8. We note that, as expected due to the dry atmosphere which makes it 510 transparent to the surface also in the FIR, there is sensitivity to the measurements both in the FIR and in the MIR regions.  This emerges also from the analysis of the AKs of the frequency dependent emissivity profile (see Figure 9), characterized by values of the diagonal matrices very close to 1 in the 500-600 cm −1 and in the 700-1000 cm −1 regions. In this regions the apriori information contributes only very marginally to the inversion and the grid is sufficiently coarse to allow each component of the used frequency grid to capture significant information.

515
For the middle latitude group we show case 2.1 in Figure 10. In these cases, the rich water content in the troposphere masks the emissivity signal in the FIR, so there is only sensitivity in the MIR atmospheric window, where the transparency of the atmosphere allows to get information on the surface. Finally in case 1.1 the retrieval of the emissivity shows some sensitivity in the FIR region, but also a positive bias of about 0.01 in the MIR region, as shown in Figure 11.
With sensitivity tests we discovered that the bias does not depend on the particular choice of the spectrum noise. Also, the 520 bias shows up also when retrieving only emissivity and skin temperature. The sign of the bias depends on the sign of the    Left panel: a-priori profile with error bars (green), retrieved profile with error bars (red), regularized retrieved profile (black), truth (blue).
Right panel: difference between retrieved and true (red), retrieval error (blue), difference between regularized and model (black), difference between a-priori and model (green), a-priori error (cyan). Lines are drawn to separate surface temperature and emissivity sector (lower indeces), from temperature (middle indeces) and water vapour (higher indeces). Points with a positive or negative correlation less than 0.01 are not drawn to enhance the readability of the figure. In the right panel the correlation between the surface temperature and the emissivity is explicitly shown for the two cases.
perturbation of the a-priori emissivity, while the initial guess of the emissivity and the initial guess and a-priori of the surface temperature have no effect. The effect is due to a strong anticorrelation between retrieved emissivity and surface temperature, which reaches −0.8 in the MIR region, see also right panel of Figure 13. There are different ways of solving this problem and ad-hoc studies to optimize the retrieval settings are under way. In particular, the use of a coarser emissivity grid has to 525 be considered. One of the solutions is to use a larger a-priori emissivity error, i.e. 0.2. A larger error of the a-priori however produces oscillations in the retrieved emissivity due to the weaker constrain, which can be reduced by extending the IVS regularization to emissivity. The results are shown in Figure 12.
In Figure 13 we show the correlation matrices for the retrieved quantities in the case 1.1 (desert) and 2.1 (water). The right panel contains the correlation of the surface temperature wrt the surface emissivity in the retrieval of both cases, i.e. the rows of 530 the leftmost columns of the two other panels corresponding to emissivity. Again we remark the strong anticorrelation between the two quantities in the retrieval, which reaches its maximum in the desert case.

Homogeneous cloudy sky cases
The second group of tests still deals with the homogeneous case, but for the cloudy-sky. The retrieval module used for the E2ES project only retrieves cloud properties in cloudy conditions. The aim of these tests is to show that, even using a perturbed 535 atmosphere, we are still able to retrieve the cloud properties with a good precision.
The quantities retrieved in the cloudy-sky case are: the top of the cloud, the equivalent radius of the particle, the total optical depth of the cloud. We realized that there is no sensitivity to the geometrical thickness of the cloud, so we assumed this parameter to be constant. The tests were performed analyzing the simulated E2ES FORUM observations related to the first six homogeneous scenarios described in Table 4 in cloudy sky condition.

Comparison with SACR code
The SACR code is able to perform the simultaneous retrieval of the atmospheric state and ice cloud parameters, and it was applied to the analysis of the spectral measurements acquired by the REFIR-PAD spectro-radiometer, which has been measuring at Concordia Station on the Antarctic Plateau since 2012 (Palchetti et al., 2016;Di Natale et al., 2017). The SACR code (Di Natale et al., 2020) performs the retrieval of water vapour and temperature profiles, the surface temperature, the cloud position 545 and the cloud optical and microphysical properties, such as the generalised ice and water effective diameter, the ice fraction and the optical depth or the IWP. To simulate the atmospheric radiative transfer, the LBLRTM is integrated with a specifically developed subroutine based on the δ-Eddington two-stream approximation, whereas the single-scattering properties of cirrus clouds are derived from a database for hexagonal column habits. To perform the retrieval procedure, SACR code uses the optimal estimation method with the Levenberg-Marquardt approach.

550
The L2 products obtained from the L2M_I and SACR are here compared. To perform a realistic validation, the atmospheric state used in the retrieval procedure is perturbed with respect to the true state according to the CM of the a-priori. In the E2ES project, the atmospheric parameters are not retrieved. Due mainly to the differences in the cloud representation, we cannot truly speak of validation in the cloudy case. In order to obtain similar results, the initial guess, a-priori and a-priori errors for the SACR code were taken from the output of the L2M_I preprocessor, not from the CIC estimate. The aim of the comparison is to 555 show that the retrieval module of the E2ES and the comparative code have similar capabilities at identifying cloud properties, thus confirming that the results are not code dependent.

Discussion of the results
Results for five cloudy cases (the coastal marine case was wrongly attributed by the CIC to a clear sky retrieval due to the small contrast between the cloud and the surface, and case 7 was only studied in clear sky) are summarised in Table 10. 560 We see that, qualitatively, the results obtained by the two codes are similar. With some exceptions, the retrieved cloud parameters are very close to the true parameters, even if, being the retrieved parameters characterized by very small errors, in most cases, the difference between the retrieved and the true value is larger than the error.
In the E2ES the cloud composition and scattering properties are the same in the simulation and retrieval of data, so the model error, which is significant with real data and must be accounted for in the CM, does not impact on the retrieval error.

565
On the other hand, there is a combination of factors that explains the small error bars: -The error on the assumed atmosphere, which is perturbed with respect to the true state, is not taken into account in the error budget.
-The linear estimate may not be verified, especially when parameters with different behavior are mixed. -In presence of non negligible correlations in the Covariance Matrix differences between the retrieved value and the 570 true state can be larger than the square root of the diagonal matrix. The typical absolute values of the cloud parameter correlations are in the 0.1 − 0.7 range.
The retrieval of cloud parameters may be critical even when cloud composition and scattering properties are perfectly known.
The following examples list the critical aspects of the retrieval.
-A water cloud at very low altitudes (case 3.2), that is not identified because the cloud and surface effects are similar.

575
-A cirrus cloud combines scattered radiation from below the cloud and direct radiation from above the cloud in the TOA spectrum. This combination leads to the existence of local minima in the cost function which is minimised by the retrieval procedure, because different parts of the spectrum are well fitted by different cloud parameters. If the initial guess is far from the true values, the retrieval may converge to a local minimum. This problem is tackled by the L2M_I by using a cloud preprocessor, as already mentioned. If the preprocessor is not used, the results of both the L2M_I and SACR for 580 cases 1.2 and 2.2 worsen.
-A cloud characterized by a very large OD (case 6.2), where the sensitivity to the OD is lost. In this case the cloud becomes fully opaque, and the only radiation seen by the instrument comes from above the cloud. The error on the Figure 14. Very thick cloud case (case 6.2, OD=300). Quantification of the error in the radiance compared with the noise requirements (black curve) using SL approximation (purple curve), maximal OD of 10 (green curve) or both (blue curve).
spectrum using an OD=10 instead of the true value OD=300 is smaller than the requirement for the NESR, and can be seen in Figure 14.

585
For thick clouds, two L2M_I approximations were tested in order to reduce the computation time: maximum OD set to 10 and representation of the cloud as a single layer (SL) instead of multiple layers. The results are presented in Figure 14.
The OD limit of 10, if used alone, introduces an error which is larger than the NESR, because some radiation manages to penetrate the first layers of the cloud. The error is comparable with the NESR if also the SL approximation is applied, because the compactness of the cloud impedes the penetration of the radiation. Indeed, the error becomes negligible if only the SL 590 approximation is used, and the speedup is stunning. The fDISORT computation time with the SL approximation is about one tenth of the computation time when no approximation is applied. On the other hand, the OD limit only subtracts some seconds.
Thus, the retrievals were performed using the SL approximation. The drawback is that the sensitivity to large values of the OD is even smaller, since the cloud is more compact.

Sensitivity to thin cirrus cloud 595
The far-infrared part of the spectrum is particularly sensitive to cirrus clouds and recently, simulations were used to show that FIR can contribute to improve the detection of thin cirrus cloud (Maestri et al., 2019b, a;Magurno et al., 2020). The analysis with the E2ES allows to evaluate the capability of retrieving cloud information from these measurements also in presence of very thin cirrus clouds and to compare the lowest detectable optical depth for a cirrus cloud by FORUM (range 100-1300   Table 5. The surface properties are assumed homogeneous for the entire scene.
For both cases, we started from the optical depth value of the considered case and then we progressively halved the optical depth whilst keeping all other parameters unchanged until the Cloud Identification and Classification tool continued to classify 605 this case as cloudy. Then L2M_I retrieval was performed to be sure that the retrieval was able to retrieve the cloud parameters.
Contrary to previous tests where perturbed values of the truth were used for water vapour, temperature and surface temperature, for these tests they were set to the true values to avoid that a different error in temperature and water vapour had a different impact in the retrievals performed in the two different spectral regions. The results for the two considered cases are reported in Table 11.

610
For case 1.2 the minimal optical depth was 0.1 for both MIR and FIR+MIR, while for case 2.2 the minimal optical depth was 0.05 when only MIR measurements are used and 0.03 when both FIR and MIR bands are combined. The addition of the FIR band reduces the retrieval error and enhances the accuracy of the retrieval. Figure 15 reports the differences between the cloudy spectra relative to various optical depths and the clear-sky spectrum in both cases, compared with the random measurement error. Clouds are detectable and cloud parameters retrievable until the 615 contribution of the cloud to the spectrum is larger than the noise. In real life, where the knowledge of the atmospheric profiles is affected by an error and in presence of other systematic errors, the retrieval of the cloudy quantities may be more difficult, but the information provided by this test represents the goal we can aim to reach.

Tests on heterogeneouos cases
Given the extension of the FSI pixel, the probability that the scene might be heterogeneous is very high. Heterogeneity of the 620 pixel poses different problems in the retrieval of L2 products (either atmospheric profiles, surface temperature and emissivity or cloud information) from TOA radiance. The first issue is that the ISRF is modelled assuming a homogeneous FoV. If the FSI FoV is strongly heterogeneous, the ISRF may not well represent the instrumental response, thus introducing an error in the retrieval. In the case in which some predominant homogeneous characteristics can be identified in the heterogeneous scene, it is useful to quantify the error in the retrieval of the predominant homogeneous contribution due to the contamination of the scene 625 with clouds or heterogeneities in the surface. On the contrary, if the scene is so heterogeneous that predominant homogeneous characteristics cannot be identified, the measured spectrum has a value in itself, but the information extracted by the L2 analysis is just a combination of contributions coming from the different parts of the sounded atmosphere and surface. The different types of heterogeneities that can occur in the pixel is manifold and the impact of heterogeneities in different scenarios can be very different. A large effort in the SGM was devoted to the handling of heterogeneous scenes, primarily to study the impact of 630 these heterogeneities on the spectrum and then on the retrieved quantities. In the standard approach used by the L2M module the pixel is assumed to be homogeneous, and the retrieval is performed in either clear sky or cloudy sky condition on the basis of the CIC estimates. Thus, at the moment, we regard any heterogeneity of the FSI FoV as a contamination of either a clear or a cloudy homogeneous scene.

635
The objective of this test is to check whether the ISRF, which is modelled assuming a homogeneous FSI FoV, allows a good representation of the FSI spectrum also in presence of heterogeneities in the FSI FoV. The OSS is able to accurately reconstruct the radiance seen by the instrument taking care of the different beams coming from different parts of the FoV. L2M simulates the spectrum observed by the instrument by convolving the high resolution spectrum generated by the forward model with the ISRF computed assuming a homogeneous FoV, and hence the deformations in the ISRF introduced by the heterogeneities are 640 neglected. This may have a large impact on the quality of the retrieval, since the information on the profiles at the different altitudes comes from the knowledge of the line shapes of the spectral lines. The magnitude of the impact depends on the heterogeneities which are considered. Given the cylindrical symmetry of the FSI, heterogeneities in the FoV that are expected to mostly affect the ISRF are the ones that have a strong dependence on the radius.
To this purpose, we performed the following test. In the homogeneous case 1.1 we considered the spectrum calculated by  the difference between these two spectra with the error noise. We then repeated the same comparison, this time using a cirrus covering only a part of the pixel in the center of the FoV. Figures 16 and 17, show, respectively for the homogeneous case and the heterogeneous one, the FSI FoV and the difference between the output radiance of the FSI, as calculated by the OSS, with the radiance obtained by convolving the high resolution 650 spectrum computed by SGM with the ISRF. This difference is plotted as a function of the wave number, together with the measurement noise.
In case of a homogeneous scene the convolution with the ISRF is a good representation of the instrumental response.
On the contrary, when strong heterogeneities are present, the convolution with the ISRF introduces an error larger than the measurement noise. In principle, the error coming from the representation of the ISRF cannot be neglected in presence of large 655 discontinuities as the ones we have considered. However, the heterogeneity of the scene poses anyway problems to the retrieval module, which considers the scene to be homogeneous. Also, the meaning of the retrieved quantities is uncertain since the measured spectrum represents a kind of average of the different scene characteristics.
In the future, the L2M approach can be developed to use information coming from FEI image and chi-square to identify and define dedicated strategies for heterogeneouos scenes.

Cloud contamination of clear sky FoV
This test aims at estimating the impact that an unidentified cloud contamination of the FSI FoV has on the retrieval, performed with the assumption of clear sky conditions, of temperature and water vapor profiles, and of the surface temperature and frequency dependent emissivity. The test is performed by simulating with the SGM a clear sky scene, which is progressively contaminated by a cloud entering in the FSI FoV. This is the most simple and idealized case, with only two distinct FoV 665 fractions internally homogeneous, that is investigated as a reference case. In real conditions, larger heterogeneities can be observed within the FoV. For this specific test, the study case 1.1/1.2 (tropical profile on desert surface) is used, with the contaminating cirrus cloud as in test 1.2. The surface properties, as well as the atmosphere, are assumed homogeneous for the entire scene. The FoV fraction occupied by the cloudy sector is increased at steps of 2.5% from 0 to 10%, then at steps of about 7% up to 25% cloud coverage fractions.
670 Figure 18 reports the simulated spectra and residuals of the fit in the various cases. We note that by increasing the cloud contamination, the residuals depart from purely random ones since the retrieval is not able to compensate for the cloud contamination by changing the retrieved quantities.
The retrieval is performed for each simulation by assuming clear sky conditions, and the retrieved quantities are compared with the values used by the SGM to simulate the part of the scene in clear-sky. These are indicated as the true values. Figure 19 675 shows the differences between retrieved and true values for the temperature profiles (left panel) and the water vapor profiles (right panel), compared with the corresponding retrieval error for different cloud contamination percentages. In general, increasing the cloud contamination increases the difference of the retrieved temperature profile with respect to the true expected values. In the worst case, at about 700 hPa, the differences increase from within the error for the completely clear sky case (red line) up to about six times larger than the error for the case 97.5% clear -2.5% cloudy (black line). Similar results, but with a 680 reduced magnitude, are observed for the water vapor profile retrieval. The surface skin temperature retrieval is also affected by the cloud contamination. Figure 20 shows the difference between the true and the retrieved values of the surface emissivity as a function of the clear sky FoV fraction (red line) and the corresponding χ 2 values (green line). The differences in the retrieved temperature rapidly increase up to 3 K for a 5% cloud contamination in the tested scenario. As described in Subsection 5.1.2 a   is an optically thin cirrus cloud (OD=1), we expect that for thicker clouds, such as in case of cumuli or cumulonimbi, the effects on the observed radiance and hence on the retrieved skin temperature are larger even for lower fractional areas. For the tested scenario, the quality of the retrieval is compliant with the E2ES requirements for a cloud contamination smaller than 10%.
Nevertheless, the CIC algorithm applied to the simulations identified the scene as cloudy as soon as the cloud fraction grew above 8%. This means that the automatic inversion procedure would switch to the cloudy case and would not try to retrieve the 690 surface properties in the worst scenarios, where the errors are too large, preserving the performance of the L2M_I.

Clear sky, heterogeneities in the surface
This test aims at evaluating the impact of the heterogeneity of the surface on the retrieval of temperature and water vapour profiles. It is performed for a polar scene, since the largest information on the surface is obtained mainly in the polar regions where the atmosphere is drier and hence more transparent. The scene at the geolocation corresponding to case 5.1 (polar case) 695 is simulated assuming that the surface has a part covered with fine snow and a part covered with sea. The surface temperature used to generate observations changes accordingly, being 273.27 K on water and 254.45 K on snow. Temperature and water vapour profiles are assumed homogeneous in the sounded atmosphere over the pixel. We used 1 for emissivity a-priori and 253.423 K for surface temperature, a perturbation of the snow value. Results are shown for scenes covered with different percentages of water and snow. 700 Figure 21 reports the simulated spectra and residuals of the fit for different percentages of water/snow FoV coverage (snow 10%, 55%, 75%, 90%). The residuals in all cases are compatible with the measurement noise and show no bias. Figure 22 reports the retrieved emissivity in the various cases, while the retrieved surface temperature is reported in Table 12. water (blue) soils. The used a priori profile and relative error is shown in grey. For readibility reason the retrieval error is shown only for one case (snow 90%-water 10%), but the other cases are characterised by similar errors. In all cases we find that retrieval of surface emissivity, as well as surface temperature, adapts in an almost linear way to handle the different percentage of snow/water pixel coverage, in particular in the spectral region between 400 and 600 cm −1 , 705 where water and snow emissivity mostly differ, the retrieved emissivity is closer to the water or the snow surface emissivity according to the predominance percentage of the surface type in the pixel coverage. Differences between retrieved products and the weighted mean of the emissivities characterizing the different pixels of the FoV have to be attributed to the negative correlation between surface temperature and emissivity. On the contrary, the retrieved water and temperature profiles are little affected by the surface heterogeneities, since the differences between the retrieved value and the true one are always within the 710 retrieval error (see Figure 23).

Tests based on MODIS L2 products
All the previously investigated scenarios rely on idealized conditions, where the surface and, eventually, the cloud layer can only have homogeneous properties within the FSI FoV or a simple binomial characterization. This simplified scene description is convenient to evaluate study case scenarios and to guarantee a good computational speed of the E2ES chain. Unfortunately, 715   Table 13.
The surface, atmospheric and cloud parameters used to build the scene are derived from the following databases:

730
-Atmosphere: the MODIS Atmospheric Profile product (07_L2, https://modis-images.gsfc.nasa.gov/products.html) defines the temperature and water vapor profiles, and the surface height at 5 km resolution; -Cloud properties and surface temperature: the MODIS Cloud Product (06_L2) are used to define the cloud parameters (particles phase and effective radius, cloud top height and optical thickness) and the surface temperature, with spatial resolution of 1 km.

735
The above products are characterized by different grids and spatial resolutions; thus, they are remapped into the FEI grid. Once the cloud, atmospheric and surface information are available at the same grid, a dedicated subroutine, mimicking the SGM, is applied to compute the spectral radiance for each pixel. Due to the complexity of the process the subroutine is kept external to the E2ES chain. Nevertheless, following the above procedure, multiple scenarios can be built from satellite products for complex atmospheric conditions. The radiances are stored in the E2ES as ancillary data and used for testing the FORUM E2ES 740 performances on realistic conditions.  The MODIS radiance at specific bands are also reported in red dots. The vertical red bars indicate the MODIS radiance variability within FSI FoV.
radiance based on MODIS products and the mean MODIS radiance at specific bands are consistent, but a perfect match is not expected due to the non-linearity between L2 products and the radiance.
Finally, the L2M is used to retrieve the characterizing parameters of the complex scenes, in particular: temperature and water vapour profiles, and surface skin temperature and emissivity for the case M.1, correctly classified by CIC as a clear sky scene; 755 cloud properties for the cases M.2 and M.3, classified by CIC as a cloudy scene." In the analysed realistic cases, it is not possible to define a single truth to be compared with the retrieval results. The average parameters within the FSI FoV do not properly describe the scene as a whole. To evaluate the L2M_I performance, the assessment test verifies that the retrieved scalar quantities lie within one standard deviation from the average value, and that spectral quantities and profiles are consistent, within the error bars, with the bulk of the input values.    the task needs to be performed in the radiance domain and the FFT cannot be applied.
Of course the computing time heavily depends on the machine used and on the compiler. The figures presented are referred to a Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz machine with six physical cores, and to the Intel ifort compiler.
The scene preparation, consisting in reading the user input parameters and setting the input files for the forward computations, is very fast and typically lasts about 5 seconds.

790
The typical running speed for the LBLRTM, with an optimized spectroscopic database, is 35-40 seconds, with no major differences between scenes. The running time of fDISORT is the bottleneck of the cloudy retrieval, and it is heavily dependent on the cloud composition. A larger number of layers of the cloud, larger particles and larger OD tend to increase the computation time of the multiple scattering effect. The running time of the fDISORT, if executed serially, varies from 20 minutes (cirrus cloud of case 1.2) to 30 hours (thick opaque cloud of case 6.2). The fDISORT is itself between 1.2 (for liquid clouds) and 3.7

795
(for cirrus clouds) times faster than the original DISORT (Sgheri and Castelli, 2018). Applying optimizations of Section 5.2.2, the thick cloud fDISORT is ten times faster.
As for the convolution, a single operation takes about 20 seconds, but the tasks needs to be repeated at each iteration for each point of the vertical profiles of temperature and water vapour, plus two times for skin temperature and emissivity. For clear sky cases in the standard settings we need 106 convolutions at each iteration.

800
The running time of the SGM in a homogeneous situation is approximately equal to that of the LBLRTM for the clear sky cases, and that of the fLBLDIS for the cloudy sky cases. It is evident that in the case of the MODIS scenes, where each sub-pixel of the FSI FoV is treated separately, the SGM radiances had to be pre-computed, because the calculations last several days, even with parallelization of the tasks.
The L2M_CIC module only takes less than one minute to perform the land mask, the cloud identification and classification, 805 the characterization of the level of inhomogeneities in the FSI pixel, and the preparation of the first guess inputs for the cloud retrieval.
For the L2M_I module we take advantage of the multi-core feature of the machine. The fDISORT is computed in parallel over different wave number intervals running multiple copies of the external binary, with results close to the theoretical maximum speedup. The convolution is computed using the multi-thread support of the OpenMP (OpenMP Architecture Review Board, 810 2018) libraries. With these settings, the effective running time is still affordable to run tests, even in cloudy sky situations.
For the clear sky cases the running time of each iteration of the L2M_I module is about 20-25 minutes. The full L2M_I clear retrieval, including ancillary tasks such as the additional forward model after the regularization to calculate the chi-square of the regularized solution, takes about 90 minutes for case 1.1 that uses four iterations.
For the cloudy sky cases the running time of each iteration of the L2M_I module varies from the 10-12 minutes of case 1.2 to 815 the 90-120 minutes of case 6.2. In the cloudy cases the longest execution time of the forward model is compensated by the fact that we only retrieve the cloud parameters, using a constant perturbed atmosphere. The pre-processor, which is only enabled for cirrus clouds, takes about 30 minutes. The full L2M_I cloudy retrieval takes about 120 minutes for case 1.2 that uses five iterations, and 440 minutes for case 6.2 (the longest) that uses three iteration, but also needs three Marquardt micro-iterations.

820
The findings of the FORUM E2ES project are presented. In the E2ES the geometry of the orbit, the atmosphere and the FSI and FEI instruments are fully modeled with realistic configurations and retrieval is performed assuming either full clear sky or cloudy sky scenes.
We first successfully validate the performances of the E2ES chain with correlative codes, a necessary step to ensure that there are no flaws. Then, we select some homogeneous test cases, both in clear and in cloudy sky, chosen to represent all different 825 atmospheric conditions, soil characteristics and cloud types. In all cases, the retrieved quantities satisfy FORUM requirements.
In cloudy sky, we also show that the optical depth threshold for a cirrus to be detected varies between 0.03 and 0.1, with generally better sensitivity if the FIR is included. The threshold depends on the different characteristics of the surface and the atmosphere.
The influence on the retrieval of inhomogeneities in the field of view is also investigated. The convolution of the high resolu-830 tion spectrum with the ISRF function (computed assuming a model homogeneous FoV) well represents the instrumental effects for a homogeneous FoV, with the error being below the NESR threshold. This is not the case when a strong inhomogeneity is considered.
Heterogeneous soil in the FoV does not impact the retrieval of atmospheric profiles, with the retrieved surface temperature and emissivity approaching the weighted averages of the surface properties.

835
Even small contamination of a cloud in the FoV induces errors in the retrieved atmospheric and surface quantities. The error increases with the increase of the percentage of the FoV affected by the contamination and of the optical depth of the cloud. However, it is important to underline that, by exploiting the information content of the FIR channels, the CIC algorithm is highly sensitive to the presence of a cloud in the FSI FoV, even with a low percentage of cloud contamination. Thus, in most cases, the cloud identification is performed and the performances of the L2M_I are preserved. Note that CIC is the first 840 algorithm which exploits the far infrared spectral interval for cloud identification and classification from satellite observations, including the determination of cloud phase. In particular, it is demonstrated that the combined use of FIR and MIR spectral radiances enhances the cloud detection performances achievable by using the MIR part of the spectrum only.
Finally we perform the retrieval on some realistic, heterogeneous scenes, which, however, present a predominance of either clear sky or cloudy sky. We show that the retrieval converges to values that are included in the range of variability of the 845 quantities used to model the radiances.
Some critical problems have been highlighted. In particular, we believe the following main issues should be investigated, also considering the best way to use the IASI-NG sinergy.
-Emissivity retrieval. The retrieval method and grid should be optimized, as it is clear from the tests of this paper. Some work is being done (Ben-Yami et al., 2021) in clear sky. The preliminary tests in cloudy sky conditions, with an evolution 850 of the code that retrieves simultaneously the atmospheric and cloud parameters, show that this issue will be even more important, given the reduced sensitivity to the atmospheric surface due to the cirrus cloud.
-Atmospheric retrieval in presence of thick clouds. In this case there is no sensitivity to atmospheric parameters below the cloud top. A clear sky retrieval can be attempted with the cloud top representing the bottom limit for the radiative transfer, similarly to Feng et al. (2021).

855
-Heterogeneous FoV. In this case we might employ the FEI results to determine a homogeneity map for the various parts of the FSI FoV. This can be seen as an a-priori to attempt a composite (partly clear, partly cloud) retrieval.