Evaluation of MUSICA IASI tropospheric water vapour profiles using theoretical error assessments and comparisons to GRUAN Vaisala RS92 measurements

Volume mixing ratio water vapour profiles have been retrieved from IASI (Infrared Atmospheric Sounding Interferometer) spectra using the MUSICA (MUlti-platform remote Sensing of Isotopologues for investigating the Cycle of Atmospheric water) processor. The retrievals are done for IASI observations that coincide with Vaisala RS92 radiosonde measurements performed in the framework of the GCOS (Global Climate Observing System) Reference Upper-Air Network (GRUAN) in three different climate zones: the tropics (Manus Island, 2° S), mid-latitudes (Lindenberg, 52° N), and polar regions (Sodankylä, 67° N). The retrievals show good sensitivity with respect to the vertical H2O distribution between 1 km above ground and the upper troposphere. Typical DOFS (degrees of freedom for signal) values are about 5.6 for the tropics, 5.1 for summertime mid-latitudes, 3.8 for wintertime mid-latitudes, and 4.4 for summertime polar regions. The errors of the MUSICA IASI water vapour profiles have been theoretically estimated considering the contribution of many different uncertainty sources. For all three climate regions, unrecognized cirrus clouds and uncertainties in atmospheric temperature have been identified as the most important error sources and they can reach about 25 %. The MUSICA IASI water vapour profiles have been compared to 100 individual coincident GRUAN water vapour profiles. The systematic difference between the data is within 11 % below 12 km altitude; however, at higher altitudes the MUSICA IASI data show a dry bias with respect to the GRUAN data of up to 21 %. The scatter is largest close to the surface (30 %), but never exceeds 21 % above 1 km altitude. The comparison study documents that the MUSICA IASI retrieval processor provides H2O profiles that capture the large variations in H2O volume mixing ratio profiles well from 1 km above ground up to altitudes close to the tropopause. Above 5 km the observed scatter with respect to GRUAN data is in reasonable agreement with the combined MUSICA IASI and GRUAN random errors. The increased scatter at lower altitudes might be explained by surface emissivity uncertainties at the summertime continental sites of Lindenberg and Sodankylä, and the upper tropospheric dry bias might suggest deficits in correctly modelling the spectroscopic line shapes of water vapour.

Published by Copernicus Publications on behalf of the European Geosciences Union.
C. Borger et al.: Comparison of MUSICA IASI and GRUAN water vapour profiles


Introduction
Atmospheric water plays a key role in the atmospheric energy balance and temperature distribution via radiative effects (clouds and vapour) and latent heat transport. Hence the distribution and transport of atmospheric moisture are closely linked to atmospheric dynamics on all scales, and understanding their spatial and temporal variations is essential for weather and climate modelling. Also, understanding the coupling between moisture transport, clouds, and atmospheric dynamics is seen as a major challenge for improving atmospheric models (Stevens and Bony, 2013). In this context the global monitoring of the water vapour distribution is important, whereby the large inhomogeneity in time and space (horizontally and vertically) is particularly challenging.
In the meantime, several in situ and remote sensing measurement techniques for the observation of water vapour have been established using platforms such as surface stations, balloons, aircraft, and satellites. The radiative properties of water vapour enable satellite remote sensing measurements in a large range of wavelength regimes from the visible, e.g. GOME (Grossi et al., 2015), near-infrared, e.g. MODIS (Gao and Kaufman, 2003), thermal infrared, e.g. AIRS (Susskind et al., 2003), TES, and IASI (Herbin et al., 2009; Schneider and Hase, 2011), to the microwave, e.g. AMSU (Rosenkranz, 2000). The instrument IASI (Infrared Atmospheric Sounding Interferometer; Clerbaux et al., 2009) aboard EUMETSAT's MetOp satellites is particularly promising: it has been providing global observations with high resolution and accuracy twice a day on a long-term mission for more than 10 years. Furthermore, IASI follow-up missions have already been approved, guaranteeing observations until the 2030s, which will offer great opportunities for studying the atmospheric composition over long time periods.
When using satellite data in research, it is important to understand their characteristics (sensitivity/representativeness and errors). Theoretical error assessments can be used to reveal the leading error sources. Ideally these error assessments should be accompanied by empirical data validation studies, in which the remote sensing data are compared to independent high-quality reference data. Radiosonde measurements are a good candidate for providing references for validating the remote sensing profiles; however, great care is needed for constraining the uncertainties in the radiosonde data (McMillin et al., 2007). Particularly promising in this context are the temperature and humidity profiles produced from Vaisala RS92 radiosonde measurements in the framework of the GCOS Reference Upper-Air Network (GRUAN, http://www.gruan.org, last access: 29 August 2018), a subnetwork of the Global Climate Observing System (GCOS, https://www.wmo.int/pages/prog/gcos/index.php, last access: 29 August 2018). Currently GRUAN consists of about 30 reference sites and provides humidity and temperature profiles of a high and well-documented quality (Dirksen et al., 2014).
In this paper we perform a detailed theoretical error assessment and an empirical validation of the water vapour profiles as generated by the MUSICA (MUlti-platform remote Sensing of Isotopologues for investigating the Cycle of Atmospheric water; Schneider et al., 2016) IASI retrieval processor. The retrievals are done for three different climate regions (tropics, mid-latitudes, polar regions) and for coincidences with GRUAN in situ radiosonde measurements, which we use as the reference for the empirical validation study. Our investigations will give an overview of the retrieval's capability of profiling atmospheric water vapour. The paper is organized as follows: Sect. 2 will give a brief overview of the MUSICA IASI processor by describing general retrieval and error estimation principles, by presenting the particularities of the MUSICA retrieval set-up, and by discussing the MUSICA retrieval output. Section 3 presents the sites and time periods for which the data evaluation is performed. Section 4 shows the theoretical IASI data characterization, and Sect. 5 presents and discusses the results of the comparison between the remote sensing data and the GRUAN in situ reference data. In Sect. 6 we summarize the outcomes of the study.

Atmospheric remote sensing retrieval principles
In this subsection we give a very brief introduction to the principles of the optimal estimation retrieval method. It is a standard retrieval method in atmospheric remote sensing. For more details, please refer to Rodgers (2000), and for a general introduction on vector and matrix algebra, dedicated textbooks are recommended.
Atmospheric remote sensing means that the atmospheric state is retrieved from the radiation measured after it has interacted with the atmosphere. This interaction of radiation with the atmosphere is modelled by a radiative transfer model (also called the forward model, $F$), which enables the measurement vector and the atmospheric state vector to be related by

$$y = F(x, b). \tag{1}$$

We measure $y$ (the measurement vector, e.g. a thermal nadir spectrum in the case of IASI) and are interested in $x$ (the atmospheric state vector). Vector $b$ represents auxiliary parameters (like surface emissivity) or instrumental characteristics (like the instrumental line shape) which are not part of the retrieval state vector. However, a direct inversion of Eq. (1) is generally not possible because there are many atmospheric states $x$ that can explain one and the same measurement $y$.
For solving this ill-posed problem, a cost function $J$ is set up that combines the information provided by the measurement with a priori known characteristics of the atmospheric state:

$$J = [y - F(x, b)]^T S_{y,\mathrm{noise}}^{-1} [y - F(x, b)] + [x - x_a]^T S_a^{-1} [x - x_a]. \tag{2}$$

Here, the first term is a measure of the difference between the measured spectrum (represented by $y$) and the spectrum simulated for a given atmospheric state (represented by $x$), while taking into account the actual measurement noise ($S_{y,\mathrm{noise}}$ is the measurement noise covariance matrix). The second term of the cost function (Eq. 2) constrains the atmospheric solution state ($x$) towards an a priori most likely state ($x_a$), whereby the kind and strength of the constraint are defined by the a priori covariance matrix $S_a$. The constrained solution is reached at the minimum of the cost function (Eq. 2). Due to the non-linear behaviour of $F(x, b)$, the minimization is generally achieved iteratively. For the $(i + 1)$th iteration it is

$$x_{i+1} = x_a + G_i [y - F(x_i, b) + K_i (x_i - x_a)]. \tag{3}$$

$K$ is the Jacobian matrix (derivatives that capture how the measurement vector will change for changes in the atmospheric state $x$). $G$ is the gain matrix (derivatives that capture how the retrieved state vector will change for changes in the measurement vector $y$). $G$ can be calculated from $K$, $S_{y,\mathrm{noise}}$ and $S_a$ as

$$G = (K^T S_{y,\mathrm{noise}}^{-1} K + S_a^{-1})^{-1} K^T S_{y,\mathrm{noise}}^{-1}. \tag{4}$$

The averaging kernel is an important component of a remote sensing retrieval and it is calculated as

$$A = G K. \tag{5}$$

The averaging kernel $A$ reveals how a small change of the real atmospheric state vector $x$ affects the retrieved atmospheric state vector $\hat{x}$:

$$\hat{x} - x_a = A (x - x_a). \tag{6}$$

The propagation of errors due to parameter uncertainties $\Delta b$ can be estimated analytically with the help of the parameter Jacobian matrix $K_b$ (derivatives that capture how the measurement vector will change for changes in the parameter $b$). According to Eq. (3), using the parameter $b + \Delta b$ (instead of the correct parameter $b$) for the forward model calculations will result in an error in the atmospheric state vector of

$$\Delta\hat{x} = -G K_b \Delta b. \tag{7}$$

The respective error covariance matrix $S_{\hat{x},b}$ is

$$S_{\hat{x},b} = G K_b S_b K_b^T G^T, \tag{8}$$

where $S_b$ is the covariance matrix of the uncertainties $\Delta b$.
Noise on the measured radiances also affects the retrievals. The error covariance matrix for noise can be analytically calculated as

$$S_{\hat{x},\mathrm{noise}} = G S_{y,\mathrm{noise}} G^T, \tag{9}$$

where $S_{y,\mathrm{noise}}$ is the covariance matrix for noise on the measured radiances $y$.
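The gain matrix, averaging kernel, and noise error covariance of Eqs. (4), (5), and (9) can be illustrated with a small numerical sketch. It assumes a purely linear toy forward model; the matrix sizes, the random Jacobian, and all numerical values are hypothetical and are not taken from the MUSICA set-up.

```python
import numpy as np

def gain_matrix(K, S_y_noise, S_a):
    """Gain matrix G = (K^T S_y^-1 K + S_a^-1)^-1 K^T S_y^-1 (Eq. 4)."""
    Kt_Sy_inv = K.T @ np.linalg.inv(S_y_noise)
    return np.linalg.solve(Kt_Sy_inv @ K + np.linalg.inv(S_a), Kt_Sy_inv)

# Hypothetical toy problem: 5 spectral channels, 3 state-vector elements.
rng = np.random.default_rng(0)
K = rng.normal(size=(5, 3))        # Jacobian dF/dx (illustrative values only)
S_y_noise = 0.01 * np.eye(5)       # uncorrelated measurement noise
S_a = np.eye(3)                    # a priori covariance

G = gain_matrix(K, S_y_noise, S_a)
A = G @ K                          # averaging kernel (Eq. 5)
S_x_noise = G @ S_y_noise @ G.T    # noise error covariance (Eq. 9)

# For a linear forward model and a noise-free synthetic measurement
# y = K x_true, a single retrieval step reproduces Eq. (6).
x_a = np.zeros(3)
x_true = np.array([0.2, -0.1, 0.3])
x_hat = x_a + G @ (K @ x_true - K @ x_a)
```

In this linear case the retrieved deviation from the a priori, `x_hat - x_a`, equals `A @ (x_true - x_a)`, which is the smoothing behaviour that the averaging kernel describes.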

The MUSICA retrieval set-up
The MUSICA IASI retrieval is based on a nadir version of the retrieval code PROFFIT (PROFile FIT; Hase et al., 2004) and on the corresponding radiative transfer model PRFFWD (PRoFit ForWarD model; Hase et al., 2004; Schneider and Hase, 2009) [...] ²H¹⁶O (from now on called HDO) as a separate species. Furthermore, the retrieval's spectral window contains spectroscopic features of CH4 and N2O as well as weak spectroscopic features of HNO3 and very weak spectroscopic features of CO2. All these trace gases are simultaneously fitted during the retrieval process, whereby the spectroscopic parameters are taken from the HITRAN 2016 database (Gordon et al., 2017) with small modifications for HDO parameters (similar to Schneider et al., 2016, the line intensity parameters of HDO have been increased by 10 %).
For the water isotopologues, CH4, N2O, and HNO3, profile retrievals are performed on a logarithmic scale. For CO2 the a priori profiles are scaled. A single a priori profile is used for all the retrievals for each of the different trace gases; i.e. the a priori profiles used are the same for all locations and time periods (Schneider et al., 2016; García et al., 2018). For CH4, N2O, HNO3, and CO2 the a priori profiles are averaged low-latitude profiles from WACCM (Whole Atmosphere Community Climate Model, version 6) and are provided by NCAR (National Center for Atmospheric Research; James W. Hannigan, private communication, 2009). The water vapour isotopologue a priori data are averages obtained from the isotopologue incorporated global general circulation model LMDZ (Risi et al., 2012).
The retrieval also fits the surface temperature and the atmospheric temperature profile, whereby the a priori temperatures are taken from the EUMETSAT IASI level 2 (L2) products. There is no constraint on the surface temperature. The atmospheric temperature variations allowed are 1 K at the ground, 0.5 K in the free troposphere, and 0.75 K above the tropopause. This altitude dependency roughly follows the altitude dependency of uncertainties in the EUMETSAT IASI L2 atmospheric temperature profiles (August et al., 2012).
The MUSICA IASI water vapour retrieval only works for pixels that are not contaminated by clouds, whereby we rely on the IASI L2 cloud flag (we require zero for the flag "cldfrm"). Ground elevations are from GTOPO30 developed by the US Geological Survey and provided by the Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC). GTOPO30 is a global digital elevation model with a horizontal grid spacing of 30 arcsec (approximately 1 km). The land surface emissivities are from the global database of infrared land surface emissivity (IREMIS; http://cimss.ssec.wisc.edu/iremis/, last access: 29 August 2018; Seemann et al., 2008), and the sea surface emissivities are calculated according to the model of Masuda et al. (1988) and for an assumed surface wind speed of 5 m s⁻¹.
Figure 1 depicts an example of a typical radiance spectrum in the retrieval's spectral range as measured by IASI (upper graph) and the corresponding differences compared to the simulated spectra (the residuals, lower graph). The residuals are mostly within the order of the instrument's 1σ measurement noise (Pequignot et al., 2008). However, there are also distinctive spectral signatures that are not well understood, specifically at 1250 and at 1280 cm⁻¹.
For further information on the retrieval set-up and its evolution, more detailed descriptions are available in Schneider and Hase (2011)

The MUSICA retrieval output
The output of the retrieval refers to the {ln[H2O], ln[HDO]} basis system. In this basis system the state vector $x$ consists of the vector for the H2O profile extended by the vector for the HDO profile:

$$x = \begin{pmatrix} x_{\mathrm{H_2O}} \\ x_{\mathrm{HDO}} \end{pmatrix}.$$

Correspondingly, the averaging kernel matrix $A$ has 2 × 2 blocks (Barthlott et al., 2017). Similarly, retrieval error covariance matrices consist of 2 × 2 blocks, whereby the blocks in the diagonal represent the H2O and HDO covariances. For this study only the H2O covariance block is of interest (i.e. we are only interested in the H2O error covariances). The off-diagonal blocks represent the error covariances between H2O and HDO.

Reference data and sites
The theoretical and empirical assessment studies are done for cloud-free IASI measurements that coincide with GRUAN-processed Vaisala RS92 radiosonde measurements. Useful coincidences are defined in accordance with Pougatchev et al. (2009) and Calbet et al. (2017).
We identified three different sites with coincidence between IASI and GRUAN measurements: Manus Island (Papua New Guinea; 2°5′ S, 146°58′ E) for the tropics, Lindenberg (Germany; 52°12′ N, 14°7′ E) for the mid-latitudes, and Sodankylä (Finland; 67°25′ N, 26°35′ E) for the polar region. In total there are 100 individual GRUAN radiosonde measurements that coincide with IASI cloud-free measurements. These four ensembles of GRUAN profiles are well representative of the highly varying tropospheric H2O distributions. In the free middle/upper troposphere the data show variations of up to 2 orders of magnitude. At the tropical site of Manus Island we observe up to 10000 ppmv (at 5 km a.s.l.) and up to 1000 ppmv (at 10 km a.s.l.), whereas at the mid-latitude and polar sites of Lindenberg and Sodankylä, the H2O concentrations can be as small as 100 ppmv and 10 ppmv, respectively. In this context, using the four ensembles of GRUAN data enables us to conduct an evaluation of the retrieval performance that has a good global validity.
The coincidences at the three sites are for different time periods, and there is not a strictly uniform data set for creating the retrieval input files: EUMETSAT L2 data are not available for all the time periods or are generated by a different EUMETSAT L2 product processing facility (PPF) software version (for more details see Sect. 3.2-3.4 and the summary of Table 1).

GRUAN-processed Vaisala RS92 in situ profiles
The Vaisala RS92 radiosonde is equipped with a wire-like capacitive temperature sensor ("Thermocap"), two polymer capacitive moisture sensors ("Humicap"), a silicon-based pressure sensor, and a GPS receiver to measure position, altitude, and winds. Each second the RS92 transmits sensor data, which are received, processed, and stored by the ground station equipment.
The Humicap consists of a hydro-active polymer thin film as the dielectric between two electrodes applied on a glass substrate. The humidity sensors are not covered by protective caps, but they are alternately heated to prevent icing. To prevent overheating, the heating of the humidity sensors is switched off below −60 °C, or above 100 hPa, whichever is reached first. Humicaps show good performance over a wide range of temperatures but suffer from systematic errors. The known main error sources affecting the humidity profile are daytime solar radiative heating of the Humicaps (which introduces a dry bias), sensor time lag at temperatures below about −40 °C, and the temperature-dependent calibration correction.
We work with Vaisala RS92 data that have been processed by the GRUAN lead centre (http://www.gruan.org, last access: 29 August 2018). The GRUAN data processing assures that the humidity, pressure, and temperature profiles obtained are well calibrated and highly accurate (Dirksen et al., 2014; Sommer et al., 2016).

Manus Island (MI)
At Manus Island we have coincidences in 2011, 2012, and 2013 with 25 individual GRUAN radiosonde profiles. The collocation of IASI and GRUAN measurements has been performed by EUMETSAT in the framework of a planned IASI retrieval comparison study (Calbet et al., 2017), allowing a spatial and temporal window of 25 km and 30 min, respectively.
For our retrieval we use the a priori temperatures (atmosphere and surface skin) as well as surface emissivities from the EUMETSAT IASI L2 product generated with the IASI L2 PPF software version 5. Since most of the ground scenes are over the ocean surface, the emissivity values are mainly according to the model of Masuda et al. (1988). The satellite pixels have been carefully examined for clouds by EUMETSAT according to the cloud flags as provided in the IASI L2 data and in addition by visual inspection (Calbet et al., 2017).

Lindenberg 2008 (LI08)
For Lindenberg there are coincidences with 32 individual GRUAN profiles in 2008 (representative of all seasons). We performed the collocation and required that the satellite pixel has to be within a distance of 25 km with respect to the starting position of the radiosonde and that the satellite's pixel sensing time has to be within the sensing time period of the radiosonde.
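The two collocation criteria (pixel within 25 km of the radiosonde starting position, pixel sensing time within the sounding period) can be sketched as a simple filter. This is only an illustration: the haversine great-circle distance on a spherical Earth, the function names, and the example coordinates and times are assumptions and not the collocation code used for this study.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical Earth is an assumption

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def is_coincident(pixel_lat, pixel_lon, pixel_time,
                  launch_lat, launch_lon, sonde_start, sonde_end,
                  max_distance_km=25.0):
    """True if the pixel is close enough to the launch site and was
    sensed while the radiosonde was measuring."""
    close_enough = great_circle_km(pixel_lat, pixel_lon, launch_lat, launch_lon) <= max_distance_km
    during_flight = sonde_start <= pixel_time <= sonde_end
    return close_enough and during_flight

# Hypothetical example values (approximate Lindenberg position, made-up times).
launch = (52.21, 14.12)
start = datetime(2008, 6, 1, 10, 45)
end = datetime(2008, 6, 1, 12, 30)
near_pixel = is_coincident(52.30, 14.20, datetime(2008, 6, 1, 11, 0), *launch, start, end)
far_pixel = is_coincident(53.00, 14.12, datetime(2008, 6, 1, 11, 0), *launch, start, end)
late_pixel = is_coincident(52.30, 14.20, datetime(2008, 6, 1, 13, 0), *launch, start, end)
```

Here only `near_pixel` satisfies both criteria; `far_pixel` fails the distance test and `late_pixel` fails the time test.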
As for Manus Island we rely on the IASI L2 data for our retrieval input data (surface and atmospheric temperatures, surface emissivity, cloud filter, etc.). However, while for the 2011–2013 time period (Manus Island) the IASI L2 data are generated with the IASI L2 PPF software version 5, for the 2008 retrievals we work with L2 data generated by the IASI L2 PPF software version 4. As shown in García et al. (2018), there can be inconsistencies between the MUSICA IASI products that are generated using different IASI L2 PPF software versions.

Lindenberg 2007 (LI07) and Sodankylä 2007 (SK07)
In 2007 we have 26 individual GRUAN profiles for Lindenberg and 17 individual GRUAN profiles for Sodankylä (details on the Sodankylä campaign are available in Calbet et al., 2011) that coincide with IASI observations. This data set is limited to the summer observations. We performed the collocation using the same criteria as for the Lindenberg 2008 coincidences; i.e. we required that the satellite pixel has to be within a distance of 25 km with respect to the starting position of the radiosonde and that the satellite's pixel sensing time has to be within the sensing time period of the radiosonde.
For summer 2007 IASI L2 data were not available for our retrieval input. Thus data from the radiosonde measurements were used as the a priori temperatures. Above the top altitude of the radiosonde we use zonally and monthly averaged temperature climatologies (COSPAR International Reference Atmosphere; Rees et al., 1990). Using the GRUAN temperatures (and climatologies at higher altitudes) instead of the IASI L2 temperatures as the a priori values for the atmospheric temperatures might cause some inconsistency between the LI07 and SK07, on the one hand, and the MI and LI08 retrievals, on the other hand.
Surface emissivities are taken from the global database of infrared land surface emissivity (IREMIS; Seemann et al., 2008).

Figure 3 shows the row kernels of the averaging kernel matrix for the three reference sites. Grey lines show all row kernels, and the thick coloured lines highlight the kernels for some selected altitudes. The peaks of the row kernels for the altitudes of 1.8, 3.6, 6.4, and 9.8 km are close to their nominal altitudes, meaning that the values retrieved for these altitudes represent the situation at the nominal altitude well. In contrast, the row kernel for the ground altitude peaks typically 1 km above ground altitude, meaning that data retrieved at the ground level mainly represent the altitudes about 1 km above the ground level. The row kernel for the 13.6 km altitude only peaks close to its nominal altitude for the tropical site of Manus Island. At Lindenberg and Sodankylä, the respective kernels peak at 9–10 km altitude; i.e. at these sites variations in the 13.6 km retrieval data are mainly driven by the actual atmospheric variations at 9–10 km.
The thick black dashed line represents the sum along the row of the averaging kernel matrix and is a measure of the remote sensing system's sensitivity. This value is typically between 0.9 and 1.1 from 1 up to 11 km at Lindenberg and Sodankylä, and from 1 up to 13 km at Manus Island.
The seasonal dependency of the averaging kernels is indicated in Fig. 4, which depicts the seasonal variations in the degrees of freedom for signal (DOFS) values. The DOFS values are calculated as the trace of the averaging kernel matrix, and the higher the DOFS values are, the more profile information is in the retrieved atmospheric state. In the tropics we observe no seasonal dependency. In the mid-latitudes the DOFS values are distinctively higher in summer than in winter. More details on this seasonal dependency are provided in Fig. 5, which depicts typical wintertime and typical summertime row kernels for Lindenberg. It seems that the seasonal variation is mostly a variation of the sensitivity at higher altitudes. In summer the remote sensing system can detect the actual atmospheric variations up to 11–12 km, whereas in winter the sensitivity is limited to altitudes below 8–9 km. This is connected to the variation of the tropopause altitude. As shown in Schneider et al. (2017), the averaging kernels depend strongly on the atmospheric temperature and humidity profiles (as well as on the surface temperature and emissivity).
In summary, at all three different sites, the MUSICA IASI retrieval provides H2O profile information from about 1 km above ground up to about the altitude of the tropopause.
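The diagnostics used in this section (DOFS as the trace of the averaging kernel matrix, the row sum as a sensitivity measure, and the altitude at which each row kernel peaks) can be sketched as follows. The 4 × 4 kernel and its values are invented for illustration and do not represent a MUSICA IASI averaging kernel; only the altitude labels are borrowed from the text.

```python
import numpy as np

# Hypothetical averaging kernel; each row is one row kernel.
A = np.array([
    [0.55, 0.30, 0.10, 0.00],
    [0.20, 0.60, 0.20, 0.00],
    [0.00, 0.30, 0.50, 0.20],
    [0.00, 0.10, 0.30, 0.20],
])
altitudes_km = np.array([1.8, 3.6, 6.4, 9.8])  # nominal altitudes of the rows

dofs = np.trace(A)                      # degrees of freedom for signal
row_sums = A.sum(axis=1)                # sensitivity measure per altitude
sensitive = (row_sums >= 0.9) & (row_sums <= 1.1)
peak_altitudes_km = altitudes_km[np.argmax(A, axis=1)]
```

In this toy kernel the uppermost row has a row sum well below 0.9 and peaks at 6.4 km rather than at its nominal 9.8 km, which mimics the behaviour described above for levels where the retrieval has little sensitivity.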

Calculation of error Jacobians
Figure 5. Same as Fig. 3, but for a typical winter and summer observation above Lindenberg.

The error Jacobians ($K_b$ from Eqs. 7 and 8) are calculated by the forward model PRFFWD as follows: PRFFWD is run on a vertical grid of 28 levels from the surface altitude to approximately 55 km above mean sea level. For every reference site, forward calculations are performed for all cloud-free situations. The input (i.e. temperature, trace gas concentrations, etc.) for the reference forward model runs is the same as the input used in the forward calculation of the last iteration step of the MUSICA IASI retrievals; i.e. the reference radiances are given by $F(x, b)$. Then for each reference scenario we make additional forward calculations with slightly modified parameters; i.e. we calculate $F(x, b + \Delta b)$. For a measurement vector $y$ with $m$ elements and a parameter vector $b$ with $n$ elements, the Jacobian matrix $K_b$ will have the dimension $m \times n$. The individual matrix elements are calculated as

$$K_b(k, l) = \frac{F_k(x, b + \Delta b_l) - F_k(x, b)}{\Delta b_l},$$

where $k$ is the index for the $k$th element of the measurement vector $y$ (simulated by vector function $F$), $l$ is the index for the $l$th element of the parameter vector $b$, and $\Delta b_l$ is the perturbation applied to the $l$th parameter only. Table 2 gives an overview of the uncertainty assumptions $\Delta b$ used for calculating the Jacobians and for performing the error estimation. The calculations of the error Jacobians for water vapour continuum and clouds require specific treatment, which is detailed in the following subsections.
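The finite-difference construction of the parameter Jacobian can be sketched in a few lines. The function `parameter_jacobian`, the linear `toy_forward` model, and all numbers are hypothetical stand-ins; PRFFWD itself is not called here.

```python
import numpy as np

def parameter_jacobian(forward, x, b, delta_b):
    """Finite-difference Jacobian: column l is
    [F(x, b + delta_b[l] on element l) - F(x, b)] / delta_b[l]."""
    b = np.asarray(b, dtype=float)
    y_ref = np.asarray(forward(x, b), dtype=float)   # reference radiances F(x, b)
    K_b = np.empty((y_ref.size, b.size))             # dimension m x n
    for l in range(b.size):
        b_pert = b.copy()
        b_pert[l] += delta_b[l]                      # perturb one parameter at a time
        K_b[:, l] = (np.asarray(forward(x, b_pert)) - y_ref) / delta_b[l]
    return K_b

# Toy linear forward model (assumption): y = M x + N b, so K_b should equal N.
M = np.array([[1.0, 0.5], [0.0, 2.0], [1.5, -1.0]])
N = np.array([[0.3, 0.0], [0.1, 0.2], [0.0, -0.4]])

def toy_forward(x, b):
    return M @ x + N @ b

x = np.array([1.0, 2.0])
b = np.array([0.98, 290.0])          # e.g. an emissivity and a temperature (illustrative)
delta_b = np.array([0.01, 1.0])      # perturbation sizes (illustrative)
K_b = parameter_jacobian(toy_forward, x, b, delta_b)
```

For a linear forward model the finite difference is exact up to rounding, which makes the toy case a convenient check; for a real radiative transfer model the result depends on the chosen perturbation size.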

Water vapour continuum
We assume that calculations based on the model MT_CKD v2.5.2 (Mlawer et al., 2012; Delamere et al., 2010; Payne et al., 2011) only partly capture the full water vapour continuum effect. For the respective Jacobian calculation, we perform forward calculations without considering the water vapour continuum ($F_{\mathrm{noWVC}}(x, b)$). Then we calculate the Jacobian matrix as

$$K_{\mathrm{noWVC}} = F_{\mathrm{noWVC}}(x, b) - F(x, b).$$

The spectral response for an underestimation of 10 % of the water vapour continuum effect is then $K_{\mathrm{noWVC}} \Delta b_{\mathrm{noWVC}}$, with $\Delta b_{\mathrm{noWVC}} = 0.1$.

Opaque clouds (cumulus)
We estimate the influence of fractional coverage by opaque liquid cumulus clouds with different cloud top altitudes (1.3, 3.0, and 4.9 km). The radiance at the top of the cloudy atmosphere $F_{\mathrm{cum}}(x, b)$ is calculated by starting PRFFWD at the cloud's top height, assuring that no radiation from below the cloud contributes to $F_{\mathrm{cum}}(x, b)$. Additionally it is assumed that the surface emissivity of the cloud is 1.0 and that the skin temperature of the cloud's upward-looking surface is in thermal equilibrium with the surrounding air temperature. The Jacobian matrix for opaque cumulus clouds is then

$$K_{\mathrm{cum}} = F_{\mathrm{cum}}(x, b) - F(x, b),$$

and the spectral response of a 10 % fractional cloud cover is $K_{\mathrm{cum}} \Delta b_{\mathrm{cum}}$, with $\Delta b_{\mathrm{cum}} = 0.1$.

Transmitting clouds (mineral dust and cirrus)
Some clouds are not opaque and we have to consider partial attenuation by the cloud particles. This is the case for cirrus clouds and mineral dust clouds. We consider these clouds by introducing them as an additional species in the forward model calculations. The extinction of these clouds is the sum of absorption and scattering. Since PRFFWD does not include the simulation of scattering clouds, we calculate the attenuated radiances using forward model calculations from KOPRA (Karlsruhe Optimized and Precise Radiative transfer Algorithm; Stiller, 2000) and consider single scattering. The frequency dependency of the extinction cross sections, the single scattering albedo, and the scattering phase functions of the clouds are calculated from OPAC v4.0b (Optical Properties of Aerosol and Clouds; Hess et al., 1998; Koepke et al., 2015). For cirrus clouds we assume the particle composition as given by OPAC's "Cirrus 3" ice cloud example (see Table 1b in Hess et al., 1998) and for mineral dust clouds a particle composition according to OPAC's "Desert" aerosol composition example (see Table 4 in Hess et al., 1998).
We conduct cirrus cloud forward calculations $F_{\mathrm{cir}}$ considering cirrus clouds with a vertical cloud layer thickness of 1 km and the cloud top at different altitudes ranging from 6 to 14 km. The Jacobians are calculated as $K_{\mathrm{cir}} = F_{\mathrm{cir}}(x, b) - F(x, b)$, and for a cloud coverage of 50 %, the spectral response is $K_{\mathrm{cir}} \Delta b_{\mathrm{cir}}$, with $\Delta b_{\mathrm{cir}} = 0.5$.
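The cloud Jacobians are plain radiance differences, so the spectral response for a given cloud fraction is a scaled difference spectrum. The short sketch below uses invented radiance values on four spectral channels. Reading the response as the change caused by a linear mix of clear and cloudy radiances is an interpretation added here, not a statement from the forward model documentation.

```python
import numpy as np

def cloud_jacobian(radiance_clear, radiance_cloudy):
    """K_cloud = F_cloud(x, b) - F(x, b), evaluated channel by channel."""
    return np.asarray(radiance_cloudy, dtype=float) - np.asarray(radiance_clear, dtype=float)

# Hypothetical radiances (arbitrary units) for clear and fully cirrus-covered scenes.
radiance_clear = np.array([0.90, 0.85, 0.80, 0.70])
radiance_cirrus = np.array([0.80, 0.77, 0.74, 0.66])
cloud_fraction = 0.5                       # delta_b_cir from the text

K_cir = cloud_jacobian(radiance_clear, radiance_cirrus)
response = K_cir * cloud_fraction          # spectral response K_cir * delta_b_cir

# Same quantity written as a linear mix of clear and cloudy radiances.
mixed = (1 - cloud_fraction) * radiance_clear + cloud_fraction * radiance_cirrus
```

With these made-up numbers the response is negative in every channel, i.e. the unrecognized cloud lowers the simulated radiance.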
For the dust clouds we conduct forward calculations $F_{\mathrm{dust}}$ for homogeneous 2 km thick layers between the ground and 6 km altitude. The Jacobians are then given as

$$K_{\mathrm{dust}} = F_{\mathrm{dust}}(x, b) - F(x, b).$$

Figure 6 depicts the spectral responses (i.e. $K_b \Delta b$) for an example of different uncertainty sources for a typical situation at the tropical reference site. The left panel shows that lower tropospheric temperature uncertainties mainly affect the spectra between 1190 and 1250 cm⁻¹ (which is also the spectral region of an "atmospheric window"), but are negligible for higher wavenumbers. This is in contrast to upper tropospheric temperature uncertainties, which have the highest spectral responses for wavenumbers larger than 1250 cm⁻¹.

Table 2. List of uncertainty assumptions used for the error estimation of the MUSICA IASI water vapour product (for emissivity and atmospheric temperature, we assume random and systematic uncertainties). The abbreviation "pdf" refers to the probability distribution function.
Atmospheric temperature. Type: random (Gaussian pdf) + systematic. Value: random, 2 K from the ground to 2 km and 1 K above 2 km altitude, with correlation length increasing from 2 km at the ground to 10 km in the stratosphere; systematic, 2 K from the ground to 2 km, and 1 K for 2–5 km, 5–10 km, and 10 km–TOA.

The right panel of Fig. 6 illustrates that uncertainties in dust layers and uncertainties due to cirrus clouds have the highest impact at the lower end of wavenumbers and that a cirrus cloud has a different dependency on wavenumber than a dust layer. Furthermore, unrecognized clouds affect the spectrum in the opposite direction to an increase in the atmospheric temperatures, although the magnitude of the effect is of the same order.

Figure 4 shows a certain seasonal variability in the DOFS values (in particular at the mid-latitude site), indicating varying sensitivities of the remote sensing system. This variation is also present in the sensitivity with respect to uncertainty sources. For this reason we present the estimated errors for all the Manus Island and Sodankylä retrievals (MI and SK07) and for all the Lindenberg 2008 retrievals (LI08). The Lindenberg 2008 error estimations are representative of all seasons; hence they cover the full sensitivity variation with respect to uncertainty sources well. Table 2 gives an overview of the different uncertainty sources we consider for our error estimation. We distinguish between random uncertainty sources (the uncertainty affecting an observation is uncorrelated with the uncertainty affecting another observation), systematic uncertainty sources (the uncertainty is the same for all observations), and uncertainty sources that are always positive but with a random amplitude (clouds: the sky is either cloud-free or covered by a random amount of cloud).

Errors caused by random uncertainty sources
From top to bottom, Fig. 7 depicts the H2O error profiles due to the random uncertainty sources of instrumental noise, emissivity, and atmospheric temperature (from left to right for Manus Island, Lindenberg, and Sodankylä). The error profiles shown are the square roots of the diagonal elements of the error covariance matrix S_x,noise, calculated for the instrumental noise according to Eq. (9), and of the error covariance matrix S_x,b, calculated for emissivity and atmospheric temperature according to Eq. (8).
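The step from an error covariance matrix to an error profile can be illustrated with a minimal sketch. Eqs. (8) and (9) are not reproduced in this section, so the sketch assumes the common linear form S_x = G S_y Gᵀ, with G the gain matrix; the matrices below are toy values, not IASI data, and the helper names are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]

def error_profile(gain, s_y):
    """Return sqrt(diag(G S_y G^T)): one 1-sigma error per retrieval level.

    Assumed form of the noise error covariance; not taken from the paper.
    """
    s_x = matmul(matmul(gain, s_y), transpose(gain))
    return [math.sqrt(s_x[i][i]) for i in range(len(s_x))]

# Toy example: 2 retrieval levels, 3 spectral channels, uncorrelated noise.
G = [[0.5, 0.2, 0.0],
     [0.0, 0.3, 0.4]]
S_y = [[0.01, 0.0, 0.0],
       [0.0, 0.01, 0.0],
       [0.0, 0.0, 0.01]]
profile = error_profile(G, S_y)
```

Only the diagonal of the covariance enters the plotted profile; the off-diagonal elements, which describe how errors at different altitudes are correlated, are not visible in such a plot.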
For the calculations of S_x,noise we assume a noise covariance S_y,noise of the IASI radiances according to Pequignot et al. (2008). The measurement noise errors vary around 2 %-10 % near the ground, but decrease to approximately 2 %-3 % above the boundary layer and remain at this level throughout the free troposphere. Close to the tropopause, errors increase again to values of around 10 %. For Manus Island we observe similar errors for all the different observations. For Sodankylä and in particular for Lindenberg, the errors vary. For instance, in the lower troposphere at Lindenberg the error is 10 % for some days, but only 1 %-3 % for other days. The varying sensitivity with respect to the uncertainty sources is due to the varying atmospheric conditions and is in agreement with the varying DOFS values as documented by Fig. 4 (the Lindenberg data cover all mid-latitude seasons).
For calculating the error covariances S_x,b due to surface emissivity uncertainties, we assume a 1 % emissivity uncertainty and a spectral correlation length of this uncertainty of 100 cm⁻¹. The resulting errors are highest close to the ground and for the continental sites of Lindenberg and Sodankylä, where they can reach 30 %. Above 5 km altitude these errors are generally below 2 %.
For calculating the error covariances S_x,b due to atmospheric temperature uncertainties, we assume 2 K uncertainty from the ground to 2 km and 1 K uncertainty above 2 km altitude, with correlation lengths increasing from 2 km at the ground to 10 km in the stratosphere. The errors are typically 10 %-15 %, but can occasionally reach 25 %.
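As an illustration of how such a temperature uncertainty assumption translates into a covariance matrix, the following sketch builds one on a coarse altitude grid. The text only specifies the 1σ values and the range of the correlation length; the exponential correlation shape, the linear growth of the correlation length up to a level of 20 km, and the averaging of the two lengths are assumptions made here for illustration.

```python
import math

def sigma_t(z_km):
    """1-sigma temperature uncertainty (K): 2 K up to 2 km, 1 K above."""
    return 2.0 if z_km <= 2.0 else 1.0

def corr_length(z_km, z_strat_km=20.0):
    """Correlation length (km): 2 km at the ground, 10 km at z_strat_km.

    The linear increase and the 20 km level are hypothetical choices.
    """
    frac = min(max(z_km / z_strat_km, 0.0), 1.0)
    return 2.0 + 8.0 * frac

def temperature_covariance(levels_km):
    """Covariance with an (assumed) exponential inter-level correlation."""
    n = len(levels_km)
    cov = [[0.0] * n for _ in range(n)]
    for i, zi in enumerate(levels_km):
        for j, zj in enumerate(levels_km):
            length = 0.5 * (corr_length(zi) + corr_length(zj))
            corr = math.exp(-abs(zi - zj) / length)
            cov[i][j] = sigma_t(zi) * sigma_t(zj) * corr
    return cov

levels = [0.0, 1.0, 2.0, 5.0, 10.0, 20.0]
S_T = temperature_covariance(levels)
```

The diagonal carries the variances (4 K² in the lowest 2 km, 1 K² above), and the off-diagonal elements decay with the vertical distance between levels.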

Errors caused by systematic uncertainty sources
From top to bottom, Fig. 8 shows the H2O error profiles due to systematic uncertainties in surface emissivity, atmospheric temperature, and spectroscopic parameters. The error profiles, i.e. the errors in the retrieved state vector x, are calculated according to Eq. (7).
We assume two patterns of surface emissivity uncertainty. The first pattern has a −1 % uncertainty at the spectral grid points 1185 and 1240 cm⁻¹ and 0 % uncertainty at the grid points 1295, 1350, and 1405 cm⁻¹ (this means that between 1240 and 1295 cm⁻¹ the uncertainty changes linearly from −1 % to 0 %). The second pattern has a −1 % uncertainty at the spectral grid points 1405 and 1350 cm⁻¹ and 0 % for the rest (with a linear change from −1 % to 0 % between 1350 and 1295 cm⁻¹). The top row of panels in Fig. 8 shows that surface emissivity uncertainties are mainly important for the wavenumber region below 1300 cm⁻¹ (the first uncertainty pattern). An emissivity uncertainty of −1 % has a rather uniform response at Manus Island: a positive error of up to 5 % close to the ground and a weak negative error around 3 km altitude. At the continental sites of Lindenberg and Sodankylä the response to a systematic −1 % emissivity uncertainty can be positive or negative close to the ground (it varies between about −25 % and +20 %). Around 3 km the error response is generally negative and between −2 % and −20 %.
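The two emissivity uncertainty patterns are fully defined by their values at the five spectral grid points and linear interpolation in between. A minimal sketch of this interpolation is given below; the function and variable names are hypothetical, and holding the end values constant outside the grid is an assumption.

```python
GRID = [1185.0, 1240.0, 1295.0, 1350.0, 1405.0]  # spectral grid points, cm^-1
PATTERN_1 = [-1.0, -1.0, 0.0, 0.0, 0.0]          # emissivity uncertainty, %
PATTERN_2 = [0.0, 0.0, 0.0, -1.0, -1.0]          # emissivity uncertainty, %

def emissivity_uncertainty(wavenumber, values, grid=GRID):
    """Linearly interpolate an uncertainty pattern to a given wavenumber."""
    if wavenumber <= grid[0]:
        return values[0]   # assumption: constant below the first grid point
    if wavenumber >= grid[-1]:
        return values[-1]  # assumption: constant above the last grid point
    for k in range(len(grid) - 1):
        lo, hi = grid[k], grid[k + 1]
        if lo <= wavenumber <= hi:
            w = (wavenumber - lo) / (hi - lo)
            return (1.0 - w) * values[k] + w * values[k + 1]

mid_1 = emissivity_uncertainty(1267.5, PATTERN_1)  # halfway 1240 -> 1295
mid_2 = emissivity_uncertainty(1322.5, PATTERN_2)  # halfway 1295 -> 1350
```

Halfway through each transition interval both patterns give −0.5 %, consistent with the linear change described above.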
Positive atmospheric temperature uncertainties cause large positive errors in the retrieved tropospheric H2O profiles (we assume a systematic uncertainty of +2 K up to 2 km altitude and +1 K at higher altitudes). The errors can reach +30 % and are largest in the atmospheric layers where the temperature uncertainty is assumed. For instance, uncertainties in lower tropospheric temperature (ground-2 km, black lines) cause maximal errors from the ground up to 3 km, which decrease rapidly above, whereas the errors due to uncertainties in upper tropospheric temperature (5-10 km, green lines) are negligible from the ground up to 6 km, but then increase to values of around +20 % at 8 km.
Concerning spectroscopic parameters, we consider systematic uncertainties in the H2O line intensity and pressure-broadening parameters and an uncertainty in the applied water continuum model. The uncertainty in the water vapour continuum model causes error profiles with small oscillations. For a water continuum model that underestimates the water continuum effect by 10 % (see Sect. 4.2.1), the error is positive near the ground (about +2 %), negative at around 3 km altitude (about −4 %), and negligible for alti-

The impact of +5 % uncertainties in the pressure-broadening parameter depends on the reference site: at Manus Island the resulting errors are negligible above 3 km, but at Lindenberg and Sodankylä the error profiles contain strong oscillations with a maximal error of about +10 % above 10 km altitude.
This behaviour of the errors due to uncertainties in the line shape modelling might be explained as follows: most of the thermal nadir spectra's information about the vertical H2O distribution is a consequence of the vertical atmospheric gradients of temperature and humidity. Without these gradients the spectral emission from a lower atmospheric layer is largely cancelled out by the absorption at a higher layer. The gradients are generally strong up to the tropopause; i.e. up to the tropopause the remote sensing system's sensitivity is largely determined by these gradients. At Manus Island the tropopause is generally above 15 km, whereas at Lindenberg and Sodankylä it can be at much lower altitudes. This can be observed in Fig. 2, which indicates a decrease of H2O concentration up to 16 km above Manus Island, but only up to about 13 and 12 km above Lindenberg and Sodankylä, respectively. Due to the weaker gradients above Lindenberg and Sodankylä and the relatively good spectral resolution of the IASI spectra, the line shapes also provide information on the vertical distribution of H2O. This is due to the pressure-broadening effect; i.e. the line width decreases with decreasing pressure. As a consequence, the H2O profiles retrieved at Lindenberg and Sodankylä are much more affected by uncertainties in the line shape modelling than the profiles retrieved at Manus Island.

Figure 8. H2O error profiles derived from the systematic uncertainty sources: emissivity (−1 % in two different wavenumber regions), atmospheric temperatures (2 K between surface and 2 km a.s.l. and 1 K in the other layers), and spectroscopy (line strength, +5 %, and pressure-broadening, +5 %) and the water vapour continuum (assuming a 10 % underestimation of the MT_CKD model). The errors in the atmospheric state vector x are shown according to Eq. (7). The data are depicted for all members of the MI, LI08, and SK07 ensembles.

Errors due to unrecognized clouds

Figure 9 shows the influence of different cloud types on the errors. Uncertainties due to unrecognized cirrus clouds (top row in Fig. 9) lead to errors of −20 % from 3 to 6 km at all sites, which then decrease with altitude. However, their impact on the water vapour volume mixing ratio (WVMR) profiles in the boundary layer shows large variation, especially at Lindenberg and Sodankylä, which is a result of the more variable atmospheric conditions at these sites (compared to the tropical site of Manus Island). The influence of a 10 % fractional cloud cover of opaque clouds depends on the height at which the clouds are assumed (middle row in Fig. 9): clouds at 1.3 km only show a small impact on the humidity profiles in the boundary layer, with error magnitudes of 5 %-10 %, but clouds at 3.0 km account for errors of more than 10 % up to 5 km above mean sea level. Yet similarly to cirrus clouds, their effect in the boundary layer shows large variation at Lindenberg and Sodankylä.

Figure 9. Same as Fig. 8, but for errors due to unrecognized clouds: cirrus (50 % fractional coverage; for the location of cloud layers, see legend), cumulus (10 % fractional coverage with cloud top altitudes as given in the legend), and mineral dust (homogeneous dust cloud layers as given in the legend and with the composition according to OPAC "Desert").

The error profiles due to mineral dust layers (bottom row in Fig. 9) show that such layers have almost no impact if they are situated in the boundary layer; however if they are situated in the middle troposphere the errors are more than 10 %. The effect of dust clouds is particularly large for the mid-latitude site of Lindenberg, where we also observe the largest variability in the calculated error profiles.

Comparison of GRUAN and IASI data
We use GRUAN-processed Vaisala RS92 radiosonde measurements as a reference for empirically validating the retrieved MUSICA IASI H2O profiles. The radiosonde ascents are collocated temporally and spatially with MetOp overpasses (for details see Sect. 3), which is essential for a meaningful comparison.

Regridding and smoothing of the high-resolution GRUAN in situ profiles
The in situ profiles have a high vertical resolution. This differs from the remote sensing profiles, which can only detect the major characteristics of the vertical H2O distribution. Before comparing the data we have to account for these different characteristics by regridding and smoothing the in situ profiles. While the remote sensing retrieval provides atmospheric states and averaging kernels on a coarse atmospheric grid (between ground level and about 55 km a.s.l., 28 grid points are defined), the radiosonde reports data about every 5 m. Therefore, we have to regrid the radiosonde data to the coarse vertical grid used by the remote sensing retrieval. In order to guarantee that the regridding does not significantly affect the H2O partial columns, the regridding is performed in two steps.
First, the radiosonde data points between the 28 MUSICA retrieval grid points are averaged by using a triangle inverse-distance-weighted function, resulting in a first estimate of the regridded radiosonde data. In the second step this first estimate is corrected by requiring that the partial columns between adjacent grid levels remain almost the same in the original high-resolution data and in the regridded data. In the correction process a constraint is put on the smoothness of the profile, thereby preventing the correction from producing strongly oscillating profiles. The results are regridded data consisting of reasonably smooth profiles with practically the same partial columns as the original highly resolved radiosonde profiles. For the high altitudes that are not reached by the GRUAN radiosonde, we use the retrieval's a priori data (x_a).
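The first regridding step can be sketched as follows. The exact weighting function is not specified in the text; the sketch assumes a triangular weight that equals 1 at a retrieval grid point and falls linearly to 0 at the neighbouring grid points. The second step (the partial column correction with a smoothness constraint) is not covered, and the function name is hypothetical.

```python
def triangle_regrid(z_fine, v_fine, z_grid):
    """Average fine-resolution data onto a coarse grid with triangular weights.

    Assumed weighting: 1 at the grid point, linearly decreasing to 0 at the
    adjacent grid points. Returns None where no fine data contribute.
    """
    out = []
    for k, zk in enumerate(z_grid):
        below = z_grid[k - 1] if k > 0 else None
        above = z_grid[k + 1] if k < len(z_grid) - 1 else None
        wsum = 0.0
        vsum = 0.0
        for z, v in zip(z_fine, v_fine):
            if z <= zk and below is not None:
                w = max(0.0, 1.0 - (zk - z) / (zk - below))
            elif z >= zk and above is not None:
                w = max(0.0, 1.0 - (z - zk) / (above - zk))
            else:
                w = 1.0 if z == zk else 0.0
            wsum += w
            vsum += w * v
        out.append(vsum / wsum if wsum > 0.0 else None)
    return out

# Toy sonde profile sampled every 0.1 km, regridded to 0, 1, and 2 km.
z_sonde = [0.1 * i for i in range(21)]
coarse = triangle_regrid(z_sonde, [2.0 * z for z in z_sonde], [0.0, 1.0, 2.0])
```

For a profile that is linear in altitude and sampled symmetrically around an interior grid point, the weighted average reproduces the value at the grid point itself; at the grid boundaries the one-sided weighting introduces an offset, which is one reason why a subsequent correction step is useful.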
The regridded GRUAN in situ profiles g may be smoothed according to the averaging kernels of the remote sensing retrieval. The regridded and smoothed GRUAN in situ profile ĝ is then comparable to the remote sensing profile, whereby

$$\hat{g} = (A_{11} + A_{12})(g - x_a) + x_a. \qquad (13)$$

Here A_11 is the H2O block of the averaging kernel matrix, A_12 the block that describes the response of the retrieved H2O to atmospheric HDO (see Sect. 2.3), and the vector x_a is the a priori state vector. An example illustrating the effects of the regridding and the smoothing is given in Fig. 10. The use of Eq. (13) introduces an uncertainty that depends on S_a,δD, which describes the actual atmospheric δD covariances. Because A_12 and S_a,δD only have small entries, this uncertainty is below 1 % and can be neglected for our comparison.
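A minimal sketch of the smoothing step is given below. It assumes that Eq. (13) has the usual averaging-kernel form ĝ = (A_11 + A_12)(g − x_a) + x_a, which is the form suggested by Eq. (A4); the matrices are toy values and the function names are hypothetical.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) with a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def smooth_profile(a11, a12, g, x_a):
    """Apply (A11 + A12) to the deviation of g from the a priori x_a.

    Assumed form of the smoothing operation; see the lead-in above.
    """
    kernel = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(a11, a12)]
    diff = [gi - xi for gi, xi in zip(g, x_a)]
    return [s + xi for s, xi in zip(matvec(kernel, diff), x_a)]

# Toy example with two levels and no cross response (A12 = 0).
A11 = [[0.6, 0.2],
       [0.2, 0.6]]
A12 = [[0.0, 0.0],
       [0.0, 0.0]]
g = [2.0, 1.0]     # regridded in situ profile
x_a = [1.0, 1.0]   # a priori profile
g_hat = smooth_profile(A11, A12, g, x_a)
```

In this example the anomaly at the lower level is damped to 60 % of its size and partly spread to the upper level, which is how the limited vertical resolution of the remote sensing system is imprinted on the in situ profile.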

Metric for quantifying data agreement
For a better statistical quantification of the deviations of the remote sensing data from the GRUAN reference data, we introduce a skill score, DL, describing the difference of the logarithmic values of the respective water vapour concentrations. Because Δln(x) ≈ Δx/x, we interpret the logarithmic-scale difference between IASI and GRUAN as the relative difference (and use the GRUAN data in the denominator). DL then becomes

$$\mathrm{DL} = \ln[\mathrm{H_2O}]_{\mathrm{retrieval}} - \ln[\mathrm{H_2O}]_{\mathrm{GRUAN}},$$

where [H2O]_GRUAN is the regridded and smoothed radiosonde H2O data (i.e. ĝ from Eq. 13), and [H2O]_retrieval is the retrieved IASI H2O data. The skill score DL defined in this way is a good measure for the relative difference between the GRUAN and IASI data. As a measure for the mean relative difference between GRUAN and IASI, we use the mean difference of logarithmic values (MDL):

$$\mathrm{MDL} = \frac{1}{N}\sum_{i=1}^{N}\mathrm{DL}_i.$$

Here the index i stands for an individual observation and N is the number of all observations. Similarly, we use the standard deviation of the logarithmic differences DL_i around MDL as a measure for the relative scatter between GRUAN and IASI and denote it σ_MDL. For illustrating the variation of the atmospheric state, we introduce σ_ĝ, the 1σ variability of the smoothed GRUAN data.

We want to document to what extent the differences between GRUAN and MUSICA IASI data can be explained by the estimated errors. In Sect. 4 we estimate the error in the MUSICA IASI H2O profiles in detail for three different climate zones. Uncertainties in the GRUAN H2O profiles also have to be considered. In general the uncertainty of the GRUAN data increases with altitude. For the regridded and smoothed GRUAN profiles ĝ, the uncertainty is about 3 %-5 % near the surface and 5 %-20 % at around 10 km altitude. For further details on the radiosonde uncertainty, we refer the reader to Appendix A. If we assume that the MUSICA IASI and the GRUAN errors are uncorrelated random errors, we can calculate the 1σ scatter of DL around MDL as expected from the MUSICA IASI and GRUAN errors according to Eq. (19).

Figure 11. Vertical profiles of retrieval skill scores calculated according to Eqs. (16)-(19) for the MI and LI08 ensembles. The black line and error bars represent the mean difference and the 1σ scatter between IASI and smoothed GRUAN data (MDL and ±σ_MDL). The red shaded area around MDL is the 1σ scatter expected due to MUSICA IASI and GRUAN errors (Eq. 19). The grey shaded area represents the area beyond the 1σ variability of smoothed GRUAN data (area beyond ±σ_ĝ).
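The skill scores described above can be computed for one altitude level as in the following sketch. The definition of DL as the difference of logarithms and MDL as its mean follows the text; the use of the sample standard deviation (N − 1 in the denominator) for σ_MDL and the root-mean-square form of the expected scatter are assumptions, since the corresponding equations are not reproduced here. The function names and numbers are hypothetical.

```python
import math

def skill_scores(retrieved, gruan):
    """Return (DL list, MDL, sigma_MDL) for paired concentrations (ppmv)."""
    dl = [math.log(r) - math.log(g) for r, g in zip(retrieved, gruan)]
    n = len(dl)
    mdl = sum(dl) / n
    # Assumption: sample standard deviation with N - 1 in the denominator.
    sigma_mdl = math.sqrt(sum((d - mdl) ** 2 for d in dl) / (n - 1))
    return dl, mdl, sigma_mdl

def expected_scatter(err_iasi, err_gruan):
    """RMS of the combined relative errors of uncorrelated error sources.

    Assumed form: sqrt(mean(err_iasi_i^2 + err_gruan_i^2)).
    """
    n = len(err_iasi)
    return math.sqrt(sum(a * a + b * b
                         for a, b in zip(err_iasi, err_gruan)) / n)

# Toy example: three coincidences at one altitude level.
dl, mdl, sigma_mdl = skill_scores([900.0, 1100.0, 1000.0],
                                  [1000.0, 1000.0, 1000.0])
expected = expected_scatter([0.10, 0.10], [0.05, 0.05])
```

A negative MDL corresponds to a dry bias of the retrieval with respect to the reference, and comparing sigma_mdl with the expected scatter indicates whether the estimated errors explain the observed differences.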

Data agreement for individual ensembles
In this section we present the comparison between the regridded and smoothed GRUAN H2O profiles and the IASI H2O profiles using the metric as described in the previous subsection. The statistical quantifications are done individually for the four different ensembles as given in Table 1. The aim is to illustrate the remote sensing data quality for the three different climate zones.

MUSICA IASI standard retrieval

Figure 11 depicts the vertical distribution of the data agreement for the MI and LI08 ensembles. These ensembles correspond to IASI measurements with available EUMETSAT L2 data, and we can execute the standard MUSICA IASI retrieval, which uses the EUMETSAT L2 temperature data as the a priori atmospheric temperatures. For the MI ensemble the MDL value (thick black line) oscillates between −23 % and −7 % below 10 km altitude and is close to zero at higher altitudes. The scatter σ_MDL is indicated by the black error bars and it is generally within 20 %, except for the altitudes around 12 km where it is slightly higher. For the LI08 ensemble the MDL value oscillates between −20 % and +16 % and the scatter σ_MDL is up to 42 % below 5 km and about 15 % at higher altitudes. For both ensembles (MI and LI08) the σ_MDL values are significantly smaller than the 1σ variation in the smoothed radiosonde data (σ_ĝ). The red shaded area around the MDL value represents the expected scatter, i.e. the scatter in the MDL value we expect due to the errors in the MUSICA IASI and GRUAN H2O data. The expected scatter is calculated according to Eq. (19) by considering MUSICA IASI random errors due to measurement noise, emissivity, and atmospheric temperature (we work with the error estimations as depicted in Fig. 7) and the GRUAN random errors as discussed in Appendix A and presented in Fig. A2. For the MI ensemble, the expected scatter and σ_MDL show similar amplitudes and vertical behaviour, meaning that the expected and the observed scatter agree reasonably well. For the LI08 ensemble the expected scatter and σ_MDL only agree well above 5 km altitude. At lower altitudes the actually observed scatter is significantly larger than the scatter expected from the estimated MUSICA IASI and GRUAN errors, which might indicate an underestimation of the MUSICA IASI random errors at Lindenberg below 5 km altitude.

The comparison suggests a weak dry bias in the MUSICA IASI data between 2 and 10 km at Manus Island and above 10 km at Lindenberg. The former could be explained by errors in the simulated line intensities and the latter by errors in the simulated line shapes (see discussion in the context of Fig. 8). However, given the small number of ensemble members, we should be careful and avoid premature conclusions.

Retrieval using external temperature data
During summer 2007 EUMETSAT L2 data were not available, and the retrievals for the LI07 and SK07 ensembles were executed using the atmospheric temperatures measured by the GRUAN radiosondes as the a priori atmospheric temperatures (see discussion in Sect. 3.4). In order to avoid inconsistencies when comparing the different ensembles, we simulate retrievals of the MI and LI08 ensembles that also use the GRUAN radiosonde temperatures instead of the EUMETSAT L2 temperatures as the a priori atmospheric temperatures. The simulated retrieval products are obtained by adding G K_T (T_L2 − T_GRUAN) to the standard MUSICA IASI retrieval products, where G is the gain matrix, K_T is the Jacobian matrix for atmospheric temperature, and T_L2 and T_GRUAN are the atmospheric temperature state vectors of the EUMETSAT L2 and GRUAN data, respectively. For the altitudes above the radiosonde we extend the T_GRUAN vector with a zonally and monthly mean temperature climatology (Rees et al., 1990). For calculating the combined MUSICA IASI and GRUAN random error (i.e. the expected scatter) we have to consider the uncertainties in the GRUAN temperatures instead of the uncertainties in the EUMETSAT L2 temperatures. Appendix B gives a brief overview of the GRUAN temperature uncertainties.

Figure 12 depicts the data agreement for all four ensembles when GRUAN radiosonde temperatures are used as the a priori atmospheric temperatures. For Manus Island and Lindenberg (ensembles MI and LI08), the scatter σ_MDL is significantly reduced when compared to Fig. 11 (which shows the data agreement for MUSICA IASI retrievals that use EUMETSAT L2 temperatures as the a priori atmospheric temperatures). A similar reduction is also observed in the theoretically predicted scatter because the GRUAN temperatures have a much smaller uncertainty (typically 0.1-0.3 K) than the EUMETSAT L2 temperatures (we assume 1-2 K).
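The simulation of a retrieval with a different a priori temperature is a linear correction of the standard product, as described above. The following sketch applies the term G K_T (T_L2 − T_GRUAN) to a toy state vector; the matrix dimensions and values are illustrative only and the function names are hypothetical.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) with a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def simulate_retrieval(x_std, gain, k_t, t_l2, t_gruan):
    """Return x_std + G K_T (T_L2 - T_GRUAN), as described in the text."""
    d_t = [a - b for a, b in zip(t_l2, t_gruan)]
    d_y = matvec(k_t, d_t)       # spectral response to the temperature change
    d_x = matvec(gain, d_y)      # response of the retrieved state
    return [x + dx for x, dx in zip(x_std, d_x)]

# Toy example: 2 state elements, 2 spectral channels, 2 temperature levels.
G = [[0.1, 0.0],
     [0.0, 0.2]]
K_T = [[1.0, 0.5],
       [0.0, 2.0]]
x_sim = simulate_retrieval([1.0, 2.0], G, K_T, [251.0, 219.0], [250.0, 220.0])
```

This avoids rerunning the full retrieval, under the assumption that the temperature difference is small enough for the linearization to hold.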
At Lindenberg (ensemble LI08), σ_MDL and the expected scatter agree much better for the simulated retrieval products (obtained by using GRUAN temperatures as a priori values) than for the MUSICA IASI standard retrieval products (obtained by using EUMETSAT L2 temperatures as a priori values). This suggests that for Lindenberg and the year 2008, our uncertainty assumptions for the EUMETSAT L2 atmospheric temperatures (see Table 2) are probably too optimistic.
The bottom panels in Fig. 12 show the data agreement for the LI07 and SK07 ensembles. These ensembles are exclusively representative of summer observations. We observe that the σ_MDL values are generally larger than the expected scatter, meaning that we probably underestimate the MUSICA IASI random errors. In addition we find a wet bias of up to 30 % below 2 km altitude and a dry bias of about 20 % at around 14 km.
An upper tropospheric dry bias is consistently observed in the analysis of the LI08, LI07, and SK07 ensembles, but not seen in the analysis of the MI ensemble. A systematic uncertainty source that affects upper tropospheric H2O at Lindenberg and Sodankylä but not at Manus Island is the shape of the water vapour lines (see discussion in the context of Fig. 8). Therefore, deficits in simulating the line shapes might explain this upper tropospheric dry bias. In the near-surface atmosphere we observe a wet bias at the two continental sites, Lindenberg and Sodankylä, but only for the ensembles that are limited to the summer season (LI07 and SK07). Our error estimation study suggests that small uncertainties in the emissivity can cause large errors at these continental sites. Therefore, an uncertainty in the IREMIS emissivity used is a candidate for explaining the near-surface wet bias; however, the H2O retrieval response for a −1 % uncertainty in the emissivity differs between observations and can be positive or negative (see Fig. 8). This means that emissivity uncertainties can only explain the bias if the sign of the emissivity uncertainty is correlated with the atmospheric state (e.g. the uncertainty in the used monthly IREMIS surface emissivity is typically positive for dry atmospheric conditions and typically negative for humid atmospheric conditions) or surface conditions (e.g. the uncertainty in the IREMIS data is typically positive/negative for a surface with high/low emissivity or high/low skin temperatures).

Global overview of data agreement

Figure 13 depicts the vertical profiles of the data agreement skill score parameters for all coinciding observations without separating the different ensembles. This analysis is based on 100 individual comparisons. Below 12 km altitude the MDL value oscillates between −10 % and +11 % and at around 14 km altitude it reaches −21 %. The scatter σ_MDL is almost 29 % close to the surface but generally smaller than 20 % above 1 km altitude. Above 5 km altitude this observed scatter is only slightly larger than the scatter predicted from the estimated errors. At lower altitudes the predicted scatter is clearly smaller than the observed scatter. An explanation of the observed upper tropospheric bias and the increased scatter at low altitudes is given in the previous section: the dry bias in the upper troposphere might have its origin in incorrect modelling of the spectroscopic line shapes, and the increased scatter near the surface might be due to uncertainties in the IREMIS emissivities.

Figure 14. Correlation between GRUAN (along x axes) and MUSICA IASI data (along y axes) for the six different atmospheric altitudes that are highlighted in Figs. 3 and 5. All the presented data are for MUSICA IASI retrievals that use the GRUAN temperature profiles as the a priori atmospheric temperatures. The retrieval altitudes are given in the individual scatter plots, together with correlation coefficient (R²), bias (b) and scatter (s). Data belonging to the different ensembles can be identified by the symbols and colours as described in the legend (bottom right). The yellow stars represent the a priori value (the retrieval uses the same a priori H2O values globally) and the blue error bars indicate the typical GRUAN and IASI errors. The dotted line represents the 1:1 diagonal.
The observed scatter between GRUAN and IASI (σ_MDL) is significantly smaller than the 1σ variation of the smoothed radiosonde data (σ_ĝ), which reaches about 50 % near the surface and more than 100 % in the middle and upper troposphere. This reflects the large variation in the atmospheric water vapour concentration data we use for our evaluation study (see also Fig. 2). The MUSICA IASI data product does capture most of these variations well. For demonstrating this capability, Fig. 14 illustrates the correlation between the MUSICA IASI retrieval products and the smoothed GRUAN data for selected altitudes. The respective altitudes are highlighted in Figs. 3 and 5, which document that at all sites the MUSICA IASI product for 1.8, 3.6, 6.4, and 9.8 km is independent and very sensitive to real atmospheric variations. Near the surface, the sensitivity is generally limited, and at 13.6 km, only the Manus Island data are reasonably sensitive to the actual atmospheric variations.
At the altitudes at which the MUSICA IASI product shows very good sensitivity (1.8, 3.6, 6.4, and 9.8 km), we observe a very good correlation and can demonstrate that the MUSICA IASI product can correctly capture the large variations that are present in atmospheric water vapour. For instance, at 3.6 km the mixing ratios range from below 200 to almost 20000 ppmv and at 9.8 km from 10 to more than 1000 ppmv.
Please note that these large variations are reliably reproduced by the MUSICA IASI processor, although the retrieval works with a single humidity a priori value that is used at all sites and during all seasons (indicated by the yellow stars in Fig. 14). Near the surface the correlation is a bit weaker than at higher altitudes, mainly due to some outliers belonging to the LI07 and SK07 ensembles (the ensembles representing the summer season over land). At 13.6 km altitude we observe a good correlation, which demonstrates the possibility of detecting H2O variations at this altitude at Manus Island. However, at Lindenberg and Sodankylä, these variations are strongly driven by actual atmospheric variations that take place at lower altitudes (see magenta lines in Figs. 3 and 5).
For our theoretical error analyses in Sect. 4, we assume that the relative errors have a component that is mostly random (Fig. 7) and another component that is mostly systematic (Fig. 8). For the comparison study we proceed similarly and examine bias and scatter, which means that we describe the variance in the MUSICA IASI data by the variance in the GRUAN data (σ²_ĝ) and the variance in the difference between MUSICA IASI and GRUAN (σ²_MDL). Using this description, we can calculate the R² value that represents the portion of the MUSICA IASI variance that is in full agreement (fully correlated) with the GRUAN variance (Eq. 20). Each panel of Fig. 14 contains the R² value calculated for the respective altitude. The blue error bars on the diagonal of the plots of Fig. 14 indicate the typical GRUAN errors (as detailed in Appendix A) and the root-sum-square of the typical leading MUSICA IASI random errors, whereby we have considered measurement noise, uncertainties in surface emissivity, and uncertainties in the GRUAN temperatures. The MDL and σ_MDL values are also written in each panel as bias (b) and scatter (s) values, respectively. They are the same as shown in Fig. 13. Figure 15 summarizes the capability of the MUSICA IASI retrieval product for capturing real atmospheric H2O variations at different altitudes by showing vertical profiles of the R² values calculated according to Eq. (20) for the different ensembles individually and when considering all 100 individual comparisons together. Between 1 and 12.5 km altitude (and when considering all comparisons of the MUSICA IASI products together), the retrieval detects more than 90 % of the atmospheric variations in agreement with GRUAN.

Figure 15. Profiles of correlation coefficients (R²) for the comparison between GRUAN and MUSICA IASI (for retrievals that use the GRUAN temperature profiles as the a priori atmospheric temperatures). Different line colours and symbols show the different ensembles, and the thick black solid line shows the result when considering all four ensembles as a single data set (the R² values for the latter are also written in the panels of Fig. 14).
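To illustrate the variance description above, the following sketch computes an R² value from σ_ĝ and σ_MDL. Eq. (20) is not reproduced in this section; the sketch assumes that the MUSICA IASI variance is the sum of the two variance terms and that R² is the GRUAN share of this sum. The function name is hypothetical and the input values are round numbers of the magnitude quoted in the text, not results of the study.

```python
def r_squared(sigma_g, sigma_mdl):
    """Assumed form: R^2 = sigma_g^2 / (sigma_g^2 + sigma_mdl^2)."""
    var_g = sigma_g ** 2
    var_diff = sigma_mdl ** 2
    return var_g / (var_g + var_diff)

# Middle/upper troposphere: about 100 % variability, about 20 % scatter.
r2_free_troposphere = r_squared(1.0, 0.2)
# Near the surface: about 50 % variability, about 29 % scatter.
r2_surface = r_squared(0.5, 0.29)
```

Under this assumption a scatter of 20 % against a natural variability of 100 % yields R² of about 0.96, whereas the smaller variability and larger scatter near the surface yield a clearly lower value, in line with the qualitative behaviour described for Fig. 15.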

Summary and outlook
In this paper, we compare water vapour profiles retrieved from IASI spectra by the MUSICA IASI retrieval with in situ measurements from GRUAN radiosondes at three different reference sites representative of three different climate zones (tropics, mid-latitudes, and polar regions). In addition, we provide an extensive theoretical error estimation of the retrieval's water vapour product for the respective reference sites considering many different uncertainty sources.
The error estimations of the MUSICA IASI water vapour profiles at the different reference sites reveal that for the lowermost 3 km, the errors can be as large as 30 %. The most important uncertainty sources are unrecognized clouds, and uncertainties in lower tropospheric temperature and in surface emissivity. Between 3 and 6 km the error can be as large as 20 %, mainly due to middle atmospheric temperature uncertainties and unrecognized high cirrus clouds. Above 6 km the errors are typically smaller than 20 % and mainly caused by uncertainties in upper tropospheric temperatures and uncertainties in spectroscopic pressure-broadening parameters.
For the empirical validation study the remote sensing MUSICA IASI H2O profiles have been compared to 100 different Vaisala RS92 radiosonde measurements that have been processed by the GRUAN lead centre. The scatter found for the difference between GRUAN and IASI is smaller than 21 % above 1.8 km altitude. It is slightly higher near the ground. This is in good agreement with errors as given for the GRUAN data and the errors as estimated for the MUSICA IASI product. It is important to note that the coincidences correspond to 5 different years and represent three different climate zones, giving the study presented here a good global representativeness. We demonstrate that the MUSICA IASI retrieval is able to correctly capture variations in H2O profiles between 1 km above ground and the upper troposphere.
The comparison indicates a dry bias of the remote sensing data of 20 % in the upper troposphere of the middle- and high-latitude sites, but not at the tropical site. We find that deficits in spectroscopic line shape modelling could explain such behaviour. For the current MUSICA IASI retrieval, a Voigt line shape model is assumed and HITRAN 2016 pressure-broadening parameters are used. It would be interesting to investigate if the usage of more sophisticated line shape models (e.g. a speed-dependent Voigt line shape model) could reduce the upper tropospheric bias and improve the agreement between the MUSICA IASI remote sensing and GRUAN in situ data. For the continental sites (Lindenberg and Sodankylä) during summer, we observe a wet bias in the MUSICA IASI data with respect to GRUAN. Uncertainties in land surface emissivity being correlated to atmospheric or surface conditions (e.g. negative/positive emissivity uncertainties occurring in line with very dry/humid atmospheric conditions or hot/cold skin temperatures) could explain this behaviour. It would be interesting to test if the usage of a daily surface emissivity product instead of the monthly mean IREMIS data (which have been used for the retrievals presented here) improves the agreement between MUSICA IASI and GRUAN.

Data availability. The MUSICA IASI data presented here are available on the MUSICA website at http://www.imk-asf.kit.edu/english/musica.php (last access: July 2018). Please contact Matthias Schneider for more details.

Appendix A: Uncertainties of GRUAN water vapour volume mixing ratios
In order to perform a valid comparison between remote sensing data and in situ measurements, the uncertainties of the in situ data have to be considered. GRUAN provides uncertainties for the relative humidity, the temperature T, and the pressure p. The water vapour volume mixing ratio (WVMR) is defined by Eq. (A1), where E is the water vapour saturation pressure. The GRUAN WVMR error for each individual radiosonde is calculated according to Eq. (A2). Uncertainties in atmospheric pressure p can be neglected when compared to the uncertainties of E(T) and of the relative humidity. For the calculation of the water vapour saturation pressure, we use the same formula as GRUAN, from Hyland and Wexler (1983). Since E(T) is a highly non-linear function, we estimate the uncertainty of E by Eq. (A3).

For a reasonable comparison, the vertically highly resolved GRUAN profiles have to be adjusted to the vertical resolution of the remote sensing profiles (see Sect. 5.1). This means a significant reduction of the vertical resolution, so that the uncorrelated errors cancel out. The regridding and smoothing of the correlated errors is accomplished as follows. First, the WVMR errors are added to the measured WVMR data. Second, we regrid this perturbed profile as described in Sect. 5.1; i.e. we calculate the regridded version of the erroneous GRUAN WVMR profile. The difference between the regridded erroneous profile and the regridded original profile gives the regridded GRUAN WVMR uncertainty profile. Above the radiosonde (where we set the regridded GRUAN profile g equal to the retrieval's a priori) we set the uncertainty to 100 %. Then we calculate an uncertainty covariance matrix $\mathbf{S}_g$ using the values from this uncertainty profile and a large correlation length of 30 km, individually for the two blocks representing the data measured by the radiosonde and the data above the radiosonde. Third, in analogy to Eq. (13), we apply the averaging kernels to $\mathbf{S}_g$ and obtain the error covariance for the regridded and smoothed GRUAN profiles as

$$\mathbf{S}_{\hat{g}} = (\mathbf{A}_{11} + \mathbf{A}_{12})\,\mathbf{S}_g\,(\mathbf{A}_{11} + \mathbf{A}_{12})^T. \qquad \text{(A4)}$$

Figure A2 depicts the square root values of the diagonal of $\mathbf{S}_{\hat{g}}$ for the different ensembles. The uncertainties typically increase from 5 % near the ground to 5 %–20 % at around 10 km altitude. For higher altitudes they decrease again due to the decaying sensitivity (see averaging kernel plots of Fig. 3).
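The covariance construction and the kernel smoothing of Eq. (A4) can be illustrated with a minimal NumPy sketch. It rests on several assumptions that are not stated in the text: the correlation is taken to decay exponentially with vertical distance, the uncertainty profile is given as a relative value per grid level, and the function names, the grid, and the toy averaging kernels are hypothetical.

```python
import numpy as np

def block_covariance(z_km, sigma, top_index, corr_length_km=30.0):
    """Build S_g from an uncertainty profile, correlated only within each block.

    z_km:      grid altitudes in km
    sigma:     uncertainty per grid level (relative units, e.g. 0.1 for 10 %)
    top_index: index of the first grid level above the radiosonde top
    """
    z = np.asarray(z_km, dtype=float)
    s = np.asarray(sigma, dtype=float)
    # Assumed correlation shape: exponential decay with vertical distance.
    corr = np.exp(-np.abs(z[:, None] - z[None, :]) / corr_length_km)
    # Levels measured by the sonde and levels above it form two separate
    # blocks; cross-block covariances are set to zero.
    in_sonde = np.arange(z.size) < top_index
    same_block = in_sonde[:, None] == in_sonde[None, :]
    return np.outer(s, s) * corr * same_block

def smooth_covariance(S_g, A11, A12):
    """Apply Eq. (A4): (A11 + A12) S_g (A11 + A12)^T."""
    A = np.asarray(A11, dtype=float) + np.asarray(A12, dtype=float)
    return A @ S_g @ A.T

# Toy example with hypothetical values (not taken from the paper).
z = np.array([0.0, 2.0, 5.0, 10.0, 15.0, 25.0])
sigma = np.array([0.05, 0.05, 0.10, 0.20, 1.00, 1.00])  # 100 % above the sonde
S_g = block_covariance(z, sigma, top_index=4)
A11 = 0.8 * np.eye(z.size)  # placeholder averaging kernel blocks
A12 = 0.1 * np.eye(z.size)
S_hat = smooth_covariance(S_g, A11, A12)
smoothed_error = np.sqrt(np.diag(S_hat))  # quantity of the kind shown in Fig. A2
```

With real averaging kernels the rows of `A11 + A12` are broad and decay at high altitudes, which is what reduces the smoothed uncertainty above about 10 km.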
Figure A1. Profiles of the WVMR errors of the GRUAN radiosondes: panel (a) represents the correlated errors and panel (b) the uncorrelated errors. The colours distinguish between the different ensembles of the retrieval set-up: black for MI and LI08, and red for LI07 and SK07.

Figure A2. Same as top panels of Fig. A1, but for the correlated errors in the regridded and smoothed GRUAN radiosonde data.

Figure A3. Profiles of the correlated temperature errors of the GRUAN radiosondes. Above the radiosonde's top height, we use a zonally and monthly mean temperature climatology and assume an uncertainty of 5 K.
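The per-radiosonde WVMR error estimate described in this appendix can be sketched as follows. The saturation pressure uses the Hyland and Wexler (1983) expression over liquid water. Everything else is an assumption made for illustration only: the WVMR is taken as e / (p − e) with e = RH · E(T), the uncertainty of E is taken as a finite difference E(T + ΔT) − E(T), and the two contributions are combined in quadrature. The function names and input values are hypothetical.

```python
import math

def saturation_pressure_pa(t_k):
    """Saturation vapour pressure over liquid water in Pa (Hyland and Wexler, 1983)."""
    return math.exp(
        -0.58002206e4 / t_k
        + 0.13914993e1
        - 0.48640239e-1 * t_k
        + 0.41764768e-4 * t_k ** 2
        - 0.14452093e-7 * t_k ** 3
        + 0.65459673e1 * math.log(t_k)
    )

def wvmr(rh, t_k, p_pa):
    """Assumed definition: e = rh * E(T), WVMR = e / (p - e); rh is a fraction."""
    e = rh * saturation_pressure_pa(t_k)
    return e / (p_pa - e)

def wvmr_error(rh, t_k, p_pa, d_rh, d_t):
    """WVMR uncertainty from the uncertainties of rh and T (pressure neglected)."""
    E = saturation_pressure_pa(t_k)
    # Finite difference rather than a derivative, because E(T) is strongly
    # non-linear (assumed form of the estimate).
    d_E = abs(saturation_pressure_pa(t_k + d_t) - E)
    e = rh * E
    dW_de = p_pa / (p_pa - e) ** 2  # derivative of e / (p - e) with respect to e
    # Quadrature sum of the rh and E contributions (assumed combination rule).
    return math.hypot(dW_de * E * d_rh, dW_de * rh * d_E)

# Hypothetical lower-tropospheric values, not taken from the paper.
rh, t_k, p_pa = 0.5, 280.0, 90000.0
d_rh, d_t = 0.04, 0.3
relative_error = wvmr_error(rh, t_k, p_pa, d_rh, d_t) / wvmr(rh, t_k, p_pa)
```

In this example the relative humidity term dominates; the temperature term gains weight only when the temperature uncertainty is large or the humidity uncertainty is small.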

Appendix B: Uncertainties of GRUAN temperatures
The MUSICA IASI retrieval uses the EUMETSAT L2 temperatures as the a priori atmospheric temperatures. However, in summer 2007 EUMETSAT L2 data are not available, and instead we use the GRUAN temperatures for the LI07 and SK07 retrievals. In addition, for the MI and LI08 ensembles we simulate retrievals that use the GRUAN temperatures instead of the EUMETSAT L2 temperatures as the a priori atmospheric temperatures. For all these retrievals the uncertainty in the GRUAN temperatures, and not the uncertainty in the EUMETSAT L2 temperatures, has to be considered for the error estimation. Figure A3 depicts the correlated GRUAN temperature uncertainty profiles after regridding the data to the MUSICA retrieval grid points by using a triangle inverse-distance-weighted averaging function (as for the first H2O regridding step, see Sect. 5.1). We assume that the uncorrelated uncertainties are cancelled out by this averaging. Below 20 km altitude the GRUAN temperature uncertainties are well within 0.3 K; i.e. they are much smaller than the uncertainties of 1–2 K we assume for the EUMETSAT L2 temperatures. The GRUAN nighttime temperature data have an uncertainty that is rather constant with altitude, whereas for the daytime data the uncertainty monotonically increases with altitude. Above the top altitude of the radiosonde, we use a monthly and zonally averaged temperature climatology (COSPAR International Reference Atmosphere; Rees et al., 1990) and assume a temperature uncertainty of 5 K (this explains the abrupt uncertainty increase that can be observed for some Manus Island and Lindenberg radiosondes).
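The regridding of the temperature data and the fill-in above the radiosonde top can be sketched as below. The exact weighting function is defined in Sect. 5.1; here it is assumed to be a triangle that equals 1 at a retrieval grid level and falls linearly to 0 at the two neighbouring grid levels. The function names, the grid, and all profile values are hypothetical.

```python
import numpy as np

def triangle_regrid(z_fine, values, z_grid):
    """Average fine-resolution values onto z_grid with triangular weights."""
    z_fine = np.asarray(z_fine, dtype=float)
    values = np.asarray(values, dtype=float)
    z_grid = np.asarray(z_grid, dtype=float)
    out = np.full(z_grid.size, np.nan)
    for k, zk in enumerate(z_grid):
        # Triangle nodes: weight 1 at the grid level, 0 at its neighbours.
        xp, fp = [zk], [1.0]
        if k > 0:
            xp.insert(0, z_grid[k - 1])
            fp.insert(0, 0.0)
        if k < z_grid.size - 1:
            xp.append(z_grid[k + 1])
            fp.append(0.0)
        w = np.interp(z_fine, xp, fp, left=0.0, right=0.0)
        if w.sum() > 0.0:
            out[k] = np.sum(w * values) / w.sum()
    return out

def fill_above_top(z_grid, profile, sigma, z_top, clim, sigma_clim=5.0):
    """Use the climatology and its uncertainty above the radiosonde top."""
    above = np.asarray(z_grid, dtype=float) > z_top
    return np.where(above, clim, profile), np.where(above, sigma_clim, sigma)

# Hypothetical sonde reaching 20 km with a constant 0.15 K correlated uncertainty.
z_sonde = np.linspace(0.0, 20.0, 401)
t_sonde = 288.0 - 3.0 * z_sonde
sig_sonde = np.full(z_sonde.size, 0.15)
z_grid = np.array([0.0, 1.0, 2.0, 4.0, 8.0, 16.0, 24.0, 32.0])
t_clim = np.full(z_grid.size, 220.0)  # placeholder for a climatological profile

t_grid = triangle_regrid(z_sonde, t_sonde, z_grid)
# Fully correlated uncertainties are averaged with the same weights.
sig_grid = triangle_regrid(z_sonde, sig_sonde, z_grid)
t_filled, sig_filled = fill_above_top(z_grid, t_grid, sig_grid, 20.0, t_clim)
```

The step from the regridded sonde uncertainty to 5 K at the first grid level above the sonde top is the kind of abrupt increase visible in Fig. A3.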
Author contributions. CB performed most calculations for this work during his master's thesis at KIT IMK-ASF and prepared the manuscript together with MS and in collaboration with all coauthors. MS developed the IASI retrievals in the framework of the MUSICA project and BE supported these developments by making the processing chain more efficient. FH wrote the PROFFIT and PRFFWD codes. OG helped in reading and formatting the EUMETSAT IASI L2 data. MS provided the GRUAN radiosonde measurements in a very useful data format for the sites of Lindenberg and Sodankylä. MH helped us with the KOPRA calculations used for estimating the effect of the scattering by cirrus and mineral dust particles. ST provided all necessary data for the site of Manus Island in a very useful data format in the framework of a planned exercise called "Intercomparison of Hyperspectral Retrieval Codes". XC collected the IASI/GRUAN coincidences over Manus Island and helped us in the interpretation of the radiosonde's measurement uncertainties.
Competing interests. The authors declare that they have no conflict of interest.
Special issue statement. This article is part of the special issue "Towards Unified Error Reporting (TUNER)". It is not associated with a conference.