Evaluation of tropospheric water vapour and temperature profiles retrieved from Metop-A by the Infrared and Microwave Sounding scheme

Since 2007, the Meteorological Operational satellite (Metop) series of platforms operated by the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT) have provided valuable observations of the Earth’s surface and atmosphere for meteorological and climate applications. With 15 years of data already collected, the next generation of Metop satellites will see this measurement record extend to and beyond 2045. Although a primary role is in operational meteorology, tropospheric temperature and water vapour profiles will be key data products produced using infrared and microwave-sounding instruments onboard. Considering the Metop data record that will span 40 years, these profiles will form an essential climate data record (CDR) for studying long-term atmospheric changes. Therefore, the performance of these products must be characterised to support the robustness of any current or future analysis. In this study, we validate 9.5 years of profile data produced using the Infrared and Microwave Sounding (IMS) scheme with the European Space Agency (ESA) Water Vapour Climate Change Initiative (WV_cci) against radiosondes from two different archives. The Global Climate Observing System (GCOS) Reference Upper-Air Network (GRUAN) and Analysed RadioSoundings Archive (ARSA) data records were chosen for the validation exercise to contrast global observations (ARSA) with sparser, characterised climate measurements (GRUAN). Results from this study show that IMS temperature and water vapour profile biases are within 0.5 K and 10 % of the reference for ‘global’ scales. We further demonstrate the effect of diurnal sampling and cloud amount in the match-ups on observed biases and discuss the role that sampling plays in attributing these effects. Finally, we present the first look at the profile bias stability from the IMS product, where we observe global stabilities ranging from -0.32 ± 0.18 to 0.1 ± 0.27 K/decade and from -1.76 ± 0.19 to 0.79 ± 0.83 % ppmv/decade for temperature and water vapour profiles, respectively. We further break down the profile stability into diurnal and latitudinal values and relate all observed results to required climate performance. Overall, we find the results from this study demonstrate the real potential for tropospheric water vapour and temperature profile CDRs from the Metop series of platforms.


Introduction
The water cycle, the largest movement of any substance between the surface and atmosphere, is a critical component of the Earth climate system (Chahine, 1992). Most water resides in the ocean and land reservoirs (ice, snow, surface and underground water, and biota); however, the small fraction (< 1 % by mass) found in the atmosphere acts as a greenhouse gas warming the lower atmosphere. As a greenhouse gas, water vapour has a predominant capacity for a positive feedback of approximately 2 W m⁻² K⁻¹ (Dessler et al., 2008), acting as a powerful amplification mechanism for anthropogenic climate change compared to radiative forcing from other greenhouse gases (Chung et al., 2014). Water vapour also influences (directly and indirectly) the radiative balance of the Earth, as well as surface and soil moisture fluxes. However, it is also sufficiently abundant and short-lived that it is considered to be under natural control (Sherwood et al., 2010). In the troposphere (the lowest 8-12 km), water vapour concentrations vary by 4 orders of magnitude between (i) the surface and the tropopause and (ii) wet tropical and dry polar latitudes. This global distribution, along with high temporal variability, results in tropospheric water vapour playing a significant role in global climate to micrometeorology-scale processes (Bevis et al., 1992). Therefore, accurately capturing distributions and changes in atmospheric water vapour is critical for climate studies (Held and Soden, 2000; Trenberth et al., 2005).
The capability for observing tropospheric water vapour has been around since 1966 with the Medium Resolution Infrared Radiometer (MRIR) flown on the Nimbus-2 platform (NASA, 2021a). This instrument consisted of five channels, with one sensitive to upper-tropospheric humidity (UTH) operating in the 6 µm region (NASA, 2021b). Subsequent advances in MRIR have seen the instrument evolve into the High-resolution InfraRed Sounder (HIRS), in which the fourth (and final) generation (HIRS/4) is a 19-channel instrument operating between 3.76 and 14.95 µm in the mid-infrared band. The first combination of HIRS with companion microwave (MW) instruments sensitive to temperature and humidity was in 1979 on board the Television and Infrared Observation Satellite (TIROS)-N mission (NOAA fourth generation series satellite prototype). This combination of instruments became known as the TIROS Operational Vertical Sounder (TOVS) configuration (Smith et al., 1979). The TOVS set-up was operated until May 2007 on board the NOAA-6 to NOAA-14 missions. In 1998 the NOAA-15 satellite was launched with the Advanced Television Infrared Observation Satellite Operational Vertical Sounder (ATOVS), consisting of the Advanced Microwave Sounding Units (AMSU-A and AMSU-B) and HIRS/3 (Li et al., 2000), which provided significant improvements over TOVS, especially for numerical weather prediction (NWP) (English et al., 2000). This technological change also allowed for coarse profiles of tropospheric humidity and temperature to be inferred operationally (Courcoux and Schröder, 2015). Finally, the launch of the Atmospheric Infrared Sounder (AIRS) in 2002 (Chahine et al., 2006) and the Infrared Atmospheric Sounding Interferometer (IASI) in 2006 (Hilton et al., 2012) allowed water vapour and temperature profiles to be retrieved with an increased vertical resolution. This capability will be maintained into the 2040s through the Joint Polar Satellite System (JPSS) and the Meteorological Operational Satellite Second Generation (MetOp-SG) programmes. With nearly 15 years in space, the current IASI series of instruments represent a climate data record (CDR) in their own right.
This study evaluates a 9.5-year record of temperature and humidity profiles from IASI and its companion MW instruments on board MetOp-A, retrieved using the RAL Infrared and Microwave Sounding (IMS) scheme, developed through both UK and European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT) funding and produced as part of the European Space Agency (ESA) Water Vapour Climate Change Initiative (WV_cci) project. While modern NWP systems assimilate some spectral information from IASI and other satellites, the IMS product is designed to be independent of reanalysis. Therefore, in addition to climate model evaluation, tropospheric profile information from IMS can be used for comparative studies of reanalysis for both meteorological and climate applications. An example of this application is shown in Fig. 1, where ERA5 has been collocated with IMS water vapour and temperature profiles. Here we see the daily differences between the data from satellites and reanalysis, with the most significant differences observed over polar regions. The assertion here is that IMS will look to maximize information content from each set of measurements in a way that is too computationally expensive for reanalysis. However, for users to be confident about using IMS in such a manner, profiles need to be validated so that their performance is characterized.
Here, the validation of the IMS data archive is done using two radiosonde archives: (i) the Global Climate Observing System (GCOS) Reference Upper-Air Network (GRUAN; Immler et al., 2010) and (ii) the Analyzed RadioSoundings Archive (ARSA; Scott, 2015). Both records cover the whole study period, with GRUAN supplying characterized soundings with higher vertical resolution at selected climate sites and ARSA providing global coverage with coarser vertical resolution. This approach allows for localized performance to be compared to broader global results, enabling a thorough test of the applicability of the IMS data for use as a CDR for temperature and humidity. This paper is structured as follows: Sect. 2.1 provides a detailed description of the IMS algorithm used to generate the IASI temperature and water vapour profiles, with the radiosonde records discussed in Sect. 2.2. Methods used for collocation and analysis by this study are provided in Sect. 3, with results presented in Sects. 4 and 5.

Data
This section describes the algorithm used to retrieve water vapour and temperature profiles from IASI with companion MW sounder data and details of the radiosonde datasets used for their assessment.

Figure 1. Example of the global mean differences between IMS temperature and water vapour profiles and ERA5 reanalysis for 15 June 2012. Also included are the standard deviations (SD) of the differences. Reanalysis has been interpolated to the observation time and the centre of the IASI instantaneous field of view. Before differences were calculated, the IMS averaging kernels were applied to the reanalysis profiles. Both IMS and ERA5 use IASI data. Therefore, differences are partly due to the differing backgrounds (a priori) and the different information extracted from the satellite radiances. For further discussion on averaging kernels refer to Sect. 3 (Methodology).

IMS
The RAL Infrared Microwave Sounding (IMS) core scheme employs the optimal estimation method (OEM) to jointly retrieve profiles of water vapour, temperature, and stratospheric ozone, along with surface spectral emissivity and cloud parameters from the Infrared Atmospheric Sounding Interferometer (IASI), Microwave Humidity Sounder (MHS), and the Advanced Microwave Sounding Unit (AMSU) on the MetOp satellites. The addition of spectral emissivity and cloud parameters to the state vector improved on the agreement with the European Centre for Medium-Range Weather Forecasts (ECMWF) analyses of lower-tropospheric water vapour. These developments also reduced the sensitivity to cloud contamination, significantly improving global coverage. Employing very weak constraints based on zonal mean climatologies of water vapour, temperature, and ozone, IMS is independent (in practice) of profile information. Therefore, this makes the IMS profile data record ideal for climate studies.

Algorithm description
The IMS algorithm is described in detail in Siddans (2019). It uses optimal estimation (Rodgers, 2000) to fit a set of measurements (in measurement vector, $y$) with known error covariance, $S_y$, by optimizing a set of retrieved parameters (in state vector, $x$) using a "forward model" (FM), $F(x)$, capable of predicting the measurements from an estimate of the state. This inverse problem is solved using an a priori estimate of the state, $a$, and its assumed error covariance, $S_a$. The solution state is found by minimizing the following cost function (in this case, using the Levenberg-Marquardt method):

$$\chi^2 = (y - F(x))^T S_y^{-1} (y - F(x)) + (x - a)^T S_a^{-1} (x - a). \quad (1)$$

The IMS scheme uses RTTOV 10 (Saunders et al., 2012) as the primary radiative transfer model (inside the FM). The measurement vector contains a sub-set of IASI, AMSU, and MHS spectral channels: for IASI, IMS follows the same approach as the v6 operational OEM scheme (EUMETSAT, 2014, 2017) to pre-process the measurements and describe their errors. In particular, IMS uses IASI L1C (level 1C) spectra, which have been compressed and re-constructed using the operational principal components (Atkinson et al., 2010), which tend to filter noise. A further filter is applied to remove other instrumental artefacts (Hultberg and August, 2017). The 139 IASI channels (between 662.5 and 1900 cm⁻¹) selected by EUMETSAT via information content analysis (see Fig. 2) are used by the retrieval algorithm. The v6 scheme used a scan-dependent spectral bias correction for IASI, determined by comparing observed spectra to RTTOV simulations, based on atmospheric profiles from the version 6 piecewise linear regression (PWLR) scheme. The correction was parameterized as a function of the view zenith angle using two spectra, $b_0$ and $b_1$, to represent the mean bias spectrum and its (assumed) linear dependence on the secant of the view zenith angle. The measurement error covariance matrix was calculated from the differences between bias-corrected IASI measurements and the RTTOV calculations. The IMS scheme uses the same spectral selection, measurement covariance, and bias-correction spectra. However, instead of assuming a fixed-view zenith angle dependence for the bias correction, IMS jointly retrieves two parameters, $x_{b0}$ and $x_{b1}$, which scale the spectra $b_0$ and $b_1$. These scaled spectra are added to the RTTOV simulation, $R(x)$, in the FM:

$$F(x) = R(x) + x_{b0} b_0 + x_{b1} b_1. \quad (2)$$

The bias correction is needed to account for systematic differences between RTTOV and IASI observations, including errors in RTTOV. Allowing the retrieval to fit scale factors $x_{b0}$ and $x_{b1}$ instead of assuming a fixed scan-angle dependence improves the fit (gives lower cost) over a wide range of observing conditions. Examples of these corrections to systematically biased spectra are given in Siddans and Gerber (2015). The recent study by Calbet et al. (2018) supports this approach, as the authors demonstrated that inhomogeneities in water vapour within a satellite instantaneous field of view (IFOV) cause a significant modification in the results from radiative transfer modelling. Observational inhomogeneities across the IASI IFOV are predominantly due to clouds within the scene. These effects are accounted for at the L1 data stage by EUMETSAT through the collocation of Advanced Very High Resolution Radiometer (AVHRR) images within the IASI IFOV (EUMETSAT, 2019).
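To make the optimal estimation step concrete, the following is a minimal Python sketch of the cost function described above and a Levenberg-Marquardt update in the form given by Rodgers (2000). It is not the IMS implementation: the linear forward model, the dimensions, the covariances, and the damping value `gamma` are all hypothetical stand-ins chosen for illustration, whereas IMS uses RTTOV inside the forward model.

```python
import numpy as np

def oem_cost(y, fx, x, a, Sy_inv, Sa_inv):
    """OEM cost: measurement misfit plus a priori penalty."""
    dy = y - fx
    dx = x - a
    return float(dy @ Sy_inv @ dy + dx @ Sa_inv @ dx)

def lm_step(y, fx, x, a, K, Sy_inv, Sa_inv, gamma):
    """One Levenberg-Marquardt update of the state vector x."""
    lhs = (1.0 + gamma) * Sa_inv + K.T @ Sy_inv @ K
    rhs = K.T @ Sy_inv @ (y - fx) - Sa_inv @ (x - a)
    return x + np.linalg.solve(lhs, rhs)

# Toy linear forward model F(x) = K x (an assumption for illustration only).
rng = np.random.default_rng(0)
K = rng.normal(size=(6, 3))          # Jacobian: 6 measurements, 3 state elements
x_true = np.array([1.0, -0.5, 2.0])  # synthetic "true" state
a = np.zeros(3)                      # a priori state
Sy_inv = np.eye(6) / 0.1**2          # inverse measurement error covariance
Sa_inv = np.eye(3) / 10.0**2         # inverse (weak, diagonal) a priori covariance
y = K @ x_true                       # noise-free synthetic measurements

x = a.copy()
cost_start = oem_cost(y, K @ x, x, a, Sy_inv, Sa_inv)
for _ in range(20):
    x = lm_step(y, K @ x, x, a, K, Sy_inv, Sa_inv, gamma=0.1)
cost_end = oem_cost(y, K @ x, x, a, Sy_inv, Sa_inv)
```

In a real retrieval the damping parameter would typically be adapted between iterations depending on whether the cost decreased; it is held fixed here for brevity.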
For MetOp-A, IMS uses all AMSU-A and MHS channels except for channels 7 and 8 due to instrumental problems. An across-track-dependent bias correction is applied to the AMSU and MHS measurements (fixed as a function of view zenith), based on an analysis of measurements and simulations for a set of cloud-free scenes over the sea equatorward of 60° (Siddans and Gerber, 2015). The complete measurement vector comprises the selected IASI and microwave sounder measurements in a single vector. Errors in IASI channels are assumed to have no correlation with errors in microwave sounder channels.
The state vector, $x$, contains parameters representing surface temperature and emissivity, the temperature, water vapour, and ozone profiles, cloud fraction, cloud height, and the bias-correction scale factors, $x_{b0}$ and $x_{b1}$. The state vector elements (along with their a priori values and variances) are described in more detail below (for further information, see Siddans, 2019). The a priori covariance is diagonal. Temperature, water vapour, and ozone profiles are represented using basis functions, $Mx$, which are the leading eigenvectors of a covariance matrix representing the prior variability in the profiles on the 101 RTTOV pressure levels. For temperature, 28 vectors are fitted, with 18 for water vapour and 10 for ozone.
The covariance matrices were determined by computing the differences from the zonal mean of ECMWF analysis profiles for 3 days (17 April, 17 July, and 17 October 2013). The zonal mean and covariance matrix were computed in kelvin for temperature and ln(VMR) for water vapour and ozone. The state vector comprises the coefficients of the leading eigenvectors of the covariance matrix. Temperature profiles (in K) on the 101 RTTOV pressure levels are defined (in the FM before calling RTTOV) from the corresponding 28 elements of the state vector as follows:

$$T = m_T + M_T x_T, \quad (3)$$

where $m_T$ is the zonal mean (interpolated to the latitude of observation), $M_T$ is the matrix of eigenvectors, and $x_T$ is the temperature sub-set of the state vector. Water vapour and ozone profiles (in ppmv: parts per million by volume) are defined similarly with an exponent:

$$q = \exp(m_q + M_q x_q). \quad (4)$$

The a priori state vector elements for temperature, water vapour, and ozone are all zero (the zonal mean of each profile is added in the FM). The eigenvalues of the covariance matrix are used as the a priori variances. A similar approach in the spectral domain is adopted to represent surface emissivity. The state vector includes weights for the 20 leading eigenvectors of an assumed global spectral emissivity covariance. The covariance is constructed using the RTTOV emissivity atlases to simulate emissivity in all the channels of IASI, AMSU, and MHS from the same set of scenes used to define the profile covariances. However, this approach alone is insufficient because only a limited amount of spectral information is represented in the RTTOV atlases. To accurately simulate spectra in all used IASI channels, it is necessary to introduce further spectral patterns from the University of Wisconsin emissivity database (Borbas and Ruston, 2010). The a priori values for the emissivity weights are set based on the RTTOV atlas for the specific location. Eigenvalues of the global covariance are used to define the a priori variances.
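The eigenvector representation described above can be sketched in a few lines of Python. This is an illustrative outline only: the covariance is built from synthetic random anomalies rather than ECMWF analyses, the zonal mean profiles are made up, and the function names are hypothetical. It shows how the leading eigenvectors are extracted, how a temperature profile is rebuilt from the zonal mean plus weighted eigenvectors, and how the exponent keeps mixing ratios positive.

```python
import numpy as np

def leading_eigenvectors(cov, n_keep):
    """Return the n_keep largest eigenvalues and their eigenvectors (as columns)."""
    vals, vecs = np.linalg.eigh(cov)          # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1][:n_keep]   # indices of the largest first
    return vals[order], vecs[:, order]

def temperature_profile(m_T, M_T, x_T):
    """Zonal mean plus eigenvector expansion, in kelvin."""
    return m_T + M_T @ x_T

def vmr_profile(m_q, M_q, x_q):
    """Exponent of the log-space expansion; the result is always positive."""
    return np.exp(m_q + M_q @ x_q)

n_levels = 101                                     # RTTOV pressure levels
rng = np.random.default_rng(1)
anomalies = rng.normal(size=(500, n_levels))       # synthetic departures from a zonal mean
cov = np.cov(anomalies, rowvar=False)              # (101, 101) prior covariance

eigvals_T, M_T = leading_eigenvectors(cov, 28)     # 28 vectors for temperature
eigvals_q, M_q = leading_eigenvectors(cov, 18)     # 18 vectors for water vapour

m_T = np.linspace(210.0, 288.0, n_levels)          # made-up zonal mean temperature (K)
m_q = np.log(np.linspace(5.0, 15000.0, n_levels))  # made-up zonal mean ln(ppmv)

T_prior = temperature_profile(m_T, M_T, np.zeros(28))   # a priori weights are zero
q_perturbed = vmr_profile(m_q, M_q, rng.normal(size=18))
```

With all weights set to zero the profile collapses to the zonal mean, which is why the a priori state vector elements can be zero while the eigenvalues (`eigvals_T`, `eigvals_q`) serve as the a priori variances.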
Cloud is modelled (via RTTOV) as a black body, for which the area fraction and top pressure are both retrieved. For the cloud fraction, the state vector element is the natural logarithm of cloud fraction, with a priori and first guess value ln(0.01) and an a priori error of 10. The log representation is adopted to prevent negative values of cloud fraction from arising. For cloud top height, the state vector element is defined in terms of the cloud pressure, p, through a quantity z* that corresponds approximately to altitude; the a priori and first guess values are assumed to be 5 km, with an a priori error also of 5 km.
Although not retrieved, variations in CO₂, CH₄, and N₂O are represented by a monthly latitude-dependent climatology derived from the Monitoring Atmospheric Composition and Climate (MACC) greenhouse gas (GHG) flux inversion reanalysis (Bergamaschi et al., 2013). Surface pressure is defined from ECMWF analysis (ERA-Interim; Dee et al., 2011), adjusted to the mean altitude within the IASI footprint, assuming the logarithm of the surface pressure varies linearly with the difference between the IASI altitude and that of the ECMWF model.
A simple brightness temperature difference (BTD) test is applied for each scene to detect optically thick and high-altitude clouds using the IASI observation at 950 cm⁻¹ and a simulation with ECMWF analysis. The scene is not processed if the BTD (observation − simulation) is outside the range of −5 to 15 K. Residual cloud will remain in a significant fraction of scenes; the joint retrieved cloud fraction and height allow this to be accommodated to some extent and can be used to more stringently clear the cloud in the retrievals.
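The BTD screen described above amounts to a simple range check. A minimal Python sketch follows; the function name is hypothetical, the brightness temperatures are invented, and treating the −5 K and 15 K limits as inclusive is an assumption, since the text does not say how the boundary values are handled.

```python
import numpy as np

def btd_scene_mask(bt_obs_k, bt_sim_k, lower_k=-5.0, upper_k=15.0):
    """Return True where a scene passes the BTD screen and would be processed.

    bt_obs_k: observed brightness temperature at 950 cm^-1 (K)
    bt_sim_k: simulated brightness temperature from analysis fields (K)
    """
    btd = np.asarray(bt_obs_k, dtype=float) - np.asarray(bt_sim_k, dtype=float)
    return (btd >= lower_k) & (btd <= upper_k)   # inclusive limits (assumption)

# Invented example scenes: BTD = -2, -32, 10, 18 K
obs = [280.0, 250.0, 290.0, 300.0]
sim = [282.0, 282.0, 280.0, 282.0]
mask = btd_scene_mask(obs, sim)
```

A strongly negative BTD, as in the second scene, is the signature of optically thick or high cloud making the observation much colder than the clear-sky simulation.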
IMS provides several diagnostics from the OEM which can be used to characterize the retrieved quantities (Rodgers, 2000). The error covariance ($S_x$) for a given solution using an optimal estimation retrieval framework is given by

$$S_x = (K^T S_y^{-1} K + S_a^{-1})^{-1}, \quad (6)$$

where $K$ is the Jacobian of the forward model with respect to the state vector. With the transformation from the state vector to vertical profiles within IMS being expressed as a matrix operation (Eqs. 3 and 4), the corresponding error covariances for layer averages are obtained (e.g. for temperature) by

$$S_T = M_T S_{x:T} M_T^T, \quad (7)$$

where $S_{x:T}$ is the sub-matrix of the error covariance for the temperature elements only. Water vapour and ozone profiles require an additional conversion from log units to obtain the covariance of the mixing ratio profile (in ppmv):

$$S_q = (q M_q) S_{x:q} (q M_q)^T, \quad (8)$$

where $q$ is the retrieved water vapour profile (in ppmv). The averaging kernel ($A$) can account for the vertical sensitivity of the retrieved state vector and the influence of the a priori. This is because the averaging kernel characterizes the sensitivity of the retrieved state to the actual state (e.g. for water vapour):

$$A_f = G K_f, \quad (9)$$

where $G$ is the gain matrix, and the subscript $f$ associated with the Jacobian matrix ($K$) and averaging kernel ($A$) denotes that derivatives are computed with respect to perturbations on the fine atmospheric grid, $p_{atm}$, as opposed to the state vector. The 101 RTTOV pressure levels define this fine grid for the IMS algorithm. Therefore, the averaging kernel matrix is not square; rather, the two dimensions are the number of eigenvector weights in the state vector and the 101 levels in the "true" profile. However, the (square) averaging kernel derived using the state vector weighting function (e.g. $A = GK$) would give the derivative of the retrieved eigenvector weights with respect to true profile perturbations with shape given by the eigenvectors. The practical uses of the square averaging kernel matrix ($A$) are (i) to smooth atmospheric profiles from models, reanalysis, and in situ measurements (discussed further in Sect. 3.2) and (ii) to obtain the degrees of freedom for signal (DOFS) values for specific retrieval products. The DOFS values are given by the trace (sum of the diagonal elements) of the sub-matrix of $A$ corresponding to a specific product and represent the total number of independent pieces of information in the profile. The averaging kernel of retrieved water vapour profiles (defined on the RTTOV levels) with respect to perturbations on the fine atmospheric grid is given by

$$A_q = M_q A_{f:q}, \quad (10)$$

where $A_{f:q}$ is the averaging kernel for the water vapour state vector elements with respect to perturbations (in ln(ppmv)) on the fine atmospheric levels. The averaging kernel for temperature is derived similarly using the corresponding matrices. An understanding of a profile's vertical resolution can be inferred from the DOFS value, as it describes the number of independent pieces of information resolved (Rodgers, 2000).
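The diagnostics discussed above follow directly from the Jacobian and the two covariances. The Python sketch below computes the solution error covariance, the gain matrix, the square averaging kernel, and the DOFS for sub-sets of the state vector, using the standard expressions from Rodgers (2000). All inputs are synthetic, and the split of the eight state elements into a "temperature" block and a "water vapour" block is purely hypothetical.

```python
import numpy as np

def retrieval_diagnostics(K, Sy, Sa):
    """Return solution error covariance Sx, gain matrix G and averaging kernel A = G K."""
    Sy_inv = np.linalg.inv(Sy)
    Sa_inv = np.linalg.inv(Sa)
    Sx = np.linalg.inv(K.T @ Sy_inv @ K + Sa_inv)
    G = Sx @ K.T @ Sy_inv
    A = G @ K
    return Sx, G, A

def dofs(A, idx):
    """Degrees of freedom for signal: trace of the sub-matrix of A for elements idx."""
    idx = np.asarray(list(idx))
    return float(np.trace(A[np.ix_(idx, idx)]))

rng = np.random.default_rng(2)
K = rng.normal(size=(40, 8))        # synthetic Jacobian: 40 channels, 8 state elements
Sy = 0.5**2 * np.eye(40)            # synthetic measurement error covariance
Sa = np.diag(np.full(8, 4.0))       # diagonal a priori covariance

Sx, G, A = retrieval_diagnostics(K, Sy, Sa)
dofs_T = dofs(A, range(0, 5))       # hypothetical "temperature" elements
dofs_q = dofs(A, range(5, 8))       # hypothetical "water vapour" elements
dofs_total = dofs(A, range(8))
```

Because the trace is additive over diagonal blocks, the per-product DOFS values sum to the total, and each can never exceed the number of state elements assigned to that product.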
Figure 3 shows the range of DOFS values for IMS temperature and water vapour profiles as a function of latitude, with the tropopause height (TPH) overlaid. The two-dimensional (2D) histograms show the distribution of profile DOFS values, from which we can see that in the tropics, most profiles sit between DOFS values of 6-7 and 11-12 for water vapour and temperature, respectively. Moving outwards through the mid-latitudes to the high latitudes, the DOFS values reduce, with the distribution becoming more variable. Comparing the water vapour distribution to the cold-point tropopause height (dashed black line), we can observe that they hold similar shapes, while for temperature, this is less so. This result is expected as nadir infrared (IR) plus MW sounders are predominantly sensitive to the emissions from the troposphere, especially for water vapour. The next conceptual step is how the DOFS values relate to the vertical resolution of IMS profiles. Examples of averaging kernels for water vapour and temperature profiles from the IMS level 2 (L2) product are given in Fig. 4, where we see that most of the information for water vapour is situated in the lowest 10 km of the atmosphere, while for temperature it is more continuous into the lower stratosphere. Therefore, an examination of the cumulative degrees of freedom for signal (CDOFS) values from these averaging kernels can be used to describe the vertical resolution of the retrieved profiles. The gradients of the CDOFS values as a function of altitude can then be interpreted as the profile resolution at given heights. The desired performance for vertical resolutions from IASI is 1 and 2 km for temperature and water vapour profiles, respectively (Hilton et al., 2012). What can be seen from Fig. 4 is that vertical resolution is not necessarily constant throughout the troposphere. Indeed, examining a different sounding over the same radiosonde site would show subtle differences. A key observation here is that the information from the water vapour profile terminates (the vertical gradient of CDOFS falls to zero) at the tropopause. Therefore, using the IMS water vapour profile above this height is meaningless.
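The link between CDOFS and vertical resolution can be illustrated with a short Python sketch. It assumes a square averaging kernel on an altitude grid whose diagonal is accumulated from the surface upward, and it interprets the inverse of the CDOFS gradient as kilometres per degree of freedom. The grid and the diagonal values are invented for illustration and are not taken from the IMS product.

```python
import numpy as np

def cdofs_and_resolution(ak_diag, z_km):
    """Cumulative DOFS and vertical resolution (km per degree of freedom).

    ak_diag: diagonal of the averaging kernel, ordered from the surface upward
    z_km:    altitude of each level (km)
    """
    cdofs = np.cumsum(ak_diag)
    grad = np.gradient(cdofs, z_km)               # DOFS gained per km
    with np.errstate(divide="ignore"):
        res_km = np.where(grad > 0, 1.0 / grad, np.inf)
    return cdofs, res_km

z_km = np.arange(0.0, 20.0, 1.0)                  # 20 levels, 1 km apart
ak_diag = np.where(z_km < 10.0, 0.5, 0.0)         # information only below 10 km
cdofs, res_km = cdofs_and_resolution(ak_diag, z_km)
```

In this synthetic case half a degree of freedom is gained per kilometre in the lowest levels, giving a 2 km resolution, and the resolution becomes infinite once the CDOFS curve flattens, mirroring the behaviour described for water vapour above the tropopause.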

Radiosonde reference measurements
This section outlines the two radiosonde records used as reference measurements in this study. The first source of radiosonde measurements used has been taken from the GCOS Reference Upper-Air Network (GRUAN) (Immler et al., 2010; Dirksen et al., 2014) archive, and the locations of the sites can be seen in Fig. 5a. The scope of GRUAN is to provide long-term fiducial measurements, i.e. measurements that include uncertainty estimates, that can be used for calibration/validation exercises, studying atmospheric processes, and determining trends. These high-resolution soundings are reported on time intervals of 2 s during the flight from the surface into the upper troposphere and lower stratosphere (UTLS) rather than the set pressure grid used by operational radiosonde archives. An advantage of the higher resolution of GRUAN measurements is that it captures changes in humidity gradients and temperature inversions which can be missed or underrepresented by standard and significant pressure levels. It should be noted that the soundings from GRUAN feature only the Vaisala RS92 radiosonde measurements and not the more recent (and accurate) RS41.
The second source of radiosonde data is taken from the Analyzed RadioSoundings Archive (ARSA). Produced at the Laboratoire de Météorologie Dynamique (LMD) since the late 1990s, ARSA is designed for the processing and validation of level 1 (L1) and level 2 (L2) satellite data and applications. This includes forward and inverse radiative transfer simulations and intercomparison of retrieved satellite geophysical parameters. The ARSA database is a global archive with observations from approximately 1450 stations. In the first instance, raw radiosonde observations with measurements between the surface and 300 hPa for water vapour and 30 hPa for temperature profiles are extracted from the ECMWF archive. These radiosonde observations are then extended above their highest measured point to 0.1 hPa with collocated data from ERA-Interim. Finally, profile data from the SciSat Atmospheric Chemistry Experiment Fourier transform spectrometer (ACE-FTS) are used to complete the profile between 0.1 and 0.0026 hPa. The vertical resolution of ARSA varies within the profile: the lowest part of the troposphere, from the surface to 800 hPa, has a resolution of 0.5 km. Between 800 and 200 hPa, the resolution is 0.8 km, coarsening to 1.5 km from 200 to 100 hPa. Above 100 hPa to the top of the atmosphere, the resolution further reduces to 2.5 km. Unlike GRUAN, which applies several corrections to the raw measurement (e.g. a correction to water vapour due to incident solar radiation on the radiosonde casing), the validation of every ARSA profile relies upon analysing the bias and standard deviation between observed satellite and simulated radiances (Scott, 2015). The ARSA measurement record started in January 1979 and is updated monthly. Locations of 587 sites present in the archive during the study period can be seen in Fig. 5b. For this study, we use the current version 2.7 archive, which has been in use since 2005.

Figure 4. Example of averaging kernels (AKs) for IMS water vapour (q) and temperature (T) profiles extracted over the GRUAN Lindenberg (LIN) site. Panel (c) shows the cumulative degrees of freedom for signal (CDOFS) values, which illustrate how the vertical distribution of information content can be related to profile vertical resolution. In the lower-middle troposphere, the water vapour CDOFS aligns with a vertical resolution of 2 km. Above 8 km, the gradient swiftly becomes zero around the tropopause height (TPH). The temperature profile starts at a 1 km vertical resolution and reduces to 2 km at 6 km. Above this height, the gradient shows the vertical resolution changes to ≈ 2.5 km, which remains consistent into the upper troposphere and lower stratosphere (UTLS) before further degrading to 3 km per degree of freedom (DOF).
Finally, it is worth noting that, while radiosondes provide a source of reference data for profile validation, they are not without their own limitations and caveats of use.
- Model type. Corrections made to radiosondes are highly dependent on the make and model type, especially with older radiosondes (e.g. Miloshevich et al., 2001, 2006). Both archives used in this study have different approaches to correct radiosondes, with GRUAN applying empirical corrections (Dirksen et al., 2014) and ARSA using radiative transfer modelling to test for consistency between stable satellite radiances (Scott, 2015; Calbet et al., 2017).
- Time series consistency. Radiosonde archives are subject to semi-regular observation system changes, some of which are recorded by the World Meteorological Organization (WMO). For GRUAN, their certified sites undergo periodic auditing of their measurement programmes and annual reviews to ensure all sites continue to meet practice standards. It is unclear how well this approach scales from ≈ 30 sites to more than 500 found in a global network. ARSA uses the long-term statistics from the radiance intercomparisons to ensure quality consistency across the archive. This approach allows for a common method to be applied to a global network of up to 1450 sites; however, this relies on the radiometric stability of the reference satellite instrument.
- Sources of uncertainty. Radiosondes are subject to a number of sources of uncertainty which can be difficult to characterize fully. GRUAN provides a comprehensive error budget for its products, as its correction process allows estimates for each step. However, ARSA, like other global datasets, does not give an uncertainty for the profiles it provides due to the complexity of such an exercise. In Trent et al. (2019), it was demonstrated that the uncertainty of operational records reduces to a few percent of parts per million by volume (% ppmv) with large collocation numbers.
- Distribution of sites. One of the strengths of operational radiosonde records is the large number of global sites available for match-ups. While ARSA does quality filter these, it still has over 500 sites within the study period. For GRUAN, there are only a small number of sites. A key weakness for any radiosonde archive is the lack of sites in the Southern Hemisphere, especially for GRUAN (Fig. 5a).

Collocation of IMS profiles at radiosonde sites
The framework for creating the satellite match-ups to ground truth used in this study has been developed within the ESA WV_cci project and builds from previous validation studies (Trent et al., 2018, 2019). Referred to from here on as the match-up processor (MUP), this framework is designed to handle swath or gridded satellite data, as well as several predefined in situ references. The match-up database (MUDB) is generated by supplying the MUP a driver file containing information on (i) the dataset being validated, (ii) the validation data record used as a reference, (iii) what variable is being validated, (iv) the date range to process, and (v) which set of collocation criteria to use. This approach allows for a flexible system that is capable of rapidly processing whole missions.
This study used broad criteria to maximize collocations for both radiosonde datasets. An IASI profile was initially considered collocated to a GRUAN or ARSA station if the satellite measurement fell within ±3 h of the radiosonde launch time and within 100 km of the launch site. The IMS profile also required a consistent averaging kernel and uncertainty information to be propagated by the MUP. As the IMS scheme retrieves in cloudy and clear-sky conditions, we accept all scenes with up to 80 % cloud cover (Susskind et al., 2006). Finally, an additional quality filter was applied to all matched cases that fell within these criteria to reduce uncertainty. Levels within all profiles were excluded if the IMS water vapour profile uncertainty was above 50 % ppmv. This scenario was found to occur predominantly for IMS profiles at high altitudes in the troposphere, resulting in lower-density sampling, which will introduce some noise to the analysis. However, this is minimized by calculating global or per latitude band statistics which use large numbers of matched pairs, unlike for site comparisons with a low number of matched cases. When using broad collocation criteria, any mismatch introduced during the match-up will affect the performance of individual comparisons (Sun et al., 2010, 2017). Therefore, a robust statistics approach was adopted to minimize this effect as demonstrated in Trent et al. (2019).
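The collocation criteria above can be expressed as a small filter. The Python sketch below is not the MUP itself: the `Obs` class, function names, and example coordinates are hypothetical, the great-circle distance uses the haversine formula with a mean Earth radius of 6371 km, and cloud cover is passed as a fraction between 0 and 1.

```python
import math
from dataclasses import dataclass
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)

@dataclass
class Obs:
    time: datetime
    lat: float   # degrees north
    lon: float   # degrees east

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def is_collocated(sat, sonde, cloud_fraction,
                  max_hours=3.0, max_km=100.0, max_cloud=0.8):
    """Apply the time, distance and cloud-cover criteria to one candidate pair."""
    dt_hours = abs((sat.time - sonde.time).total_seconds()) / 3600.0
    dist_km = great_circle_km(sat.lat, sat.lon, sonde.lat, sonde.lon)
    return dt_hours <= max_hours and dist_km <= max_km and cloud_fraction <= max_cloud

# Invented example: a sonde launch and three candidate satellite scenes.
sonde = Obs(datetime(2012, 6, 15, 12, 0), 52.21, 14.12)
sat_ok = Obs(datetime(2012, 6, 15, 10, 0), 52.50, 14.50)    # 2 h earlier, about 40 km away
sat_late = Obs(datetime(2012, 6, 15, 16, 30), 52.50, 14.50) # 4.5 h later
sat_far = Obs(datetime(2012, 6, 15, 10, 0), 54.00, 14.12)   # about 200 km away
```

The per-level quality filter on water vapour uncertainty (50 % ppmv) would be applied afterwards to each matched profile and is not shown here.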

Comparison of IMS profiles with radiosonde
Retrieved temperature and humidity profiles from IASI, MHS, and AMSU-A represent the best estimates of the atmospheric state, to which a smoothing function has been applied (Rodgers and Connor, 2003). Therefore, averaging kernels from the IMS L2 product are used to smooth (or convolve) the radiosonde profile to the vertical resolution of IASI. This allows for like-for-like comparisons between the retrieved and reference profiles. For radiosonde temperature profiles, the averaging kernel is applied thus:

$$x_{\mathrm{est}} = x_o + \tilde{A}\,(x_t - x_o), \tag{11}$$

where $x_o$ is the IMS a priori profile, $\tilde{A}$ is the averaging kernel that has been reconstructed onto the 101-level retrieval grid, $x_t$ is the radiosonde reference profile on the 101-level grid, and $x_{\mathrm{est}}$ is the convolved reference profile. In the thermal infrared (TIR) band, changes in column density of water vapour have greater linearity in log space relative to any absolute change. Therefore, for humidity profile comparisons, Eq. (11) is rewritten as follows (Maddy and Barnet, 2008):

$$\ln(x_{\mathrm{est}}) = \ln(x_o) + \tilde{A}\,\bigl(\ln(x_t) - \ln(x_o)\bigr). \tag{12}$$

Next, values are calculated for weighted layers ($x_{(z)}$) within each profile, where the layer boundaries are defined by standard pressure levels at 1000, 925, 850, 700, 500, 400, and 300 hPa:

$$x_{(z)} = \frac{\sum_{l=1}^{n} x_{(l)}\,p_{(l)}}{\sum_{l=1}^{n} p_{(l)}}, \tag{13}$$

where $x_{(l)}$ is the convolved radiosonde or IMS profile value at level $l$, $p_{(l)}$ is the pressure profile value at level $l$, and $n$ is the number of levels in the layer. Weighted layer mean profiles are not calculated for altitudes higher than 300 hPa because ARSA profile values are taken from ERA-Interim in the upper troposphere and stratosphere. All statistics used in this study are calculated from the layer mean profiles.

Firstly, for each layer, we calculate the systematic difference or bias ($b_{(z)}$). As with Trent et al. (2019), we use the median difference:

$$b_{(z)} = \mathrm{median}\bigl(x_{(z)} - x_{\mathrm{est}(z)}\bigr), \tag{14}$$

where $x_{(z)}$ and $x_{\mathrm{est}(z)}$ are the profile values for layer $z$ for IMS and the radiosonde, respectively. Water vapour profile values will vary by up to 4 orders of magnitude between the surface and upper troposphere. Therefore, the layer bias is normalized by the median radiosonde layer value:

$$\hat{b}_{(z)} = \frac{b_{(z)}}{\mathrm{median}\bigl(x_{\mathrm{est}(z)}\bigr)}. \tag{15}$$

Profiles from GRUAN, unlike ARSA, are provided with estimates of the uncertainty for each measurement. These can then be propagated to provide corresponding uncertainties for profile measurements. However, when averaging over large numbers of collocations, the uncertainty of the bias reduces below 1 % ppmv. In the Trent et al. (2019) study, bias uncertainties for AIRS were shown to reduce to between 0.15 % ppmv and 0.43 % ppmv for global matches to GRUAN. What is difficult to calculate accurately, and is not accounted for in tropospheric profile validation studies, is the collocation uncertainty. While the collocation uncertainty will also reduce with averaging large numbers of matches, broad collocation criteria and atmospheric variability mean this uncertainty will still dominate the total error budget. Therefore, we can think of the variability in the median as an estimate of the precision of the bias. To quantify the spread about the median, we calculate the median absolute deviation ($\sigma_{(z)}$), a robust measure of the data variability:

$$\sigma_{(z)} = \mathrm{median}\Bigl(\bigl|\,(x_{(z)} - x_{\mathrm{est}(z)}) - b_{(z)}\bigr|\Bigr). \tag{16}$$

As we use robust statistics, the median absolute deviation (MAD) values cannot be treated in the same way as a standard deviation and used to calculate the standard error by dividing by $\sqrt{N}$. For water vapour, MAD values are also normalized by the median radiosonde layer value:

$$\hat{\sigma}_{(z)} = \frac{\sigma_{(z)}}{\mathrm{median}\bigl(x_{\mathrm{est}(z)}\bigr)}, \tag{17}$$

where $\hat{\sigma}_{(z)}$ is the normalized layer MAD. Scaling the normalized values by $10^2$ presents the units for both the bias and median absolute deviation in % ppmv. This makes biases at different layers directly comparable.

For future studies, the approach for handling collocation uncertainty can be made more sophisticated than is outlined here. A new study from Laeng et al. (2022) provides a framework to account for the natural variability in atmospheric mixing ratios, allowing for such estimates. At the time the work of our study was undertaken, this tool was not available and as such is not discussed further.
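The comparison steps in this section (kernel smoothing of the radiosonde profile, pressure-weighted layer means, and robust statistics) can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions rather than the code used in the study: the function names are hypothetical, the layer mean is taken to be a pressure-weighted average, and the choice of which layer boundary is inclusive is an assumption.

```python
import numpy as np


def smooth_temperature(x_t, x_o, A):
    """Convolve a reference temperature profile: x_o + A (x_t - x_o)."""
    return x_o + A @ (x_t - x_o)


def smooth_humidity(x_t, x_o, A):
    """Same smoothing applied in log space for water vapour."""
    return np.exp(np.log(x_o) + A @ (np.log(x_t) - np.log(x_o)))


def layer_mean(x, p, p_bottom, p_top):
    """Pressure-weighted mean of x over levels with p_top < p <= p_bottom.

    Including the lower boundary and excluding the upper one is an assumption.
    """
    in_layer = (p <= p_bottom) & (p > p_top)
    return np.sum(x[in_layer] * p[in_layer]) / np.sum(p[in_layer])


def robust_stats(x_ims, x_ref, relative=False):
    """Median bias and median absolute deviation of IMS minus reference.

    With relative=True, both are normalized by the median reference value
    and scaled by 100 to give percentages.
    """
    diff = np.asarray(x_ims, float) - np.asarray(x_ref, float)
    bias = np.median(diff)
    mad = np.median(np.abs(diff - bias))
    if relative:
        scale = 100.0 / np.median(x_ref)
        return bias * scale, mad * scale
    return bias, mad
```

With an identity averaging kernel the smoothed profile equals the radiosonde profile, and with a zero kernel it collapses to the a priori, which is a quick sanity check on any implementation of the smoothing step.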
Finally, to examine the stability of the observed biases, a level-shift regression model is used to calculate the trend in the monthly IMS layer bias (Weatherhead et al., 1998; Mieruch et al., 2014):

$$Y_t = \mu + \omega X_t + \delta U_t + \eta_t, \tag{18}$$

where $Y_t$ is the bias at time $t$, $\mu$ is the intercept, $\omega$ is the trend in the bias, $X_t$ is the time index, $\delta$ is the magnitude of any shift, $U_t$ is the step function, and $\eta_t$ is the fit residual. For this study, the step function is assumed to be negligible (i.e. $\delta U_t = 0$) because the IASI instrument is considered to be a stable reference. This is evidenced by the use of IASI brightness temperatures for calibration by the Global Space-based Inter-Calibration System (GSICS) for other IR satellite sensors (Goldberg et al., 2011). For the residuals, the same approach is used as in Schröder et al. (2019), in which four frequencies (asymmetric fitting of the annual cycle) and El Niño-Southern Oscillation (ENSO) strength are fitted simultaneously.

Validation over GRUAN sites

Water vapour biases
Results were computed for matches made in cloudy conditions (up to 80 % cloud cover) over 17 of the 18 GRUAN sites (see Table A1 for the complete list), with Darwin being the only site with no cases found. Matches were further subdivided into day and night scenes using the solar zenith angle from the IMS L2 product. Figure 6 presents the results for IMS water vapour profile biases, median absolute deviations for all scenes, and the split between day and night cases. IMS biases show a generally low wet bias relative to GRUAN, which increases with altitude. The lowest bias is found in the mid-tropospheric layer between 850-700 hPa, where a slight dry bias of −0.29 ± 17.51 % ppmv is observed. This layer coincides with the overlap in peak vertical sensitivities of the IASI 6 µm region and MHS 183 GHz channels. In the lower troposphere (1000-850 hPa), daytime biases dominate, with a high of 8.93 ± 12.72 % ppmv seen in the surface layer. The inverse holds for the mid-to-upper-tropospheric layers (700-400 hPa), where night-time biases are larger than the equivalent daytime biases by about 2 % ppmv in both layers. The upper-tropospheric layer (400-300 hPa) displays a consistent wet bias across the day and night scenes (10.45 ± 14.46 % ppmv and 10.39 ± 16.29 % ppmv, respectively).
each GRUAN site between June 2007 and December 2016.
The first point is that sampling differences can be up to 3 orders of magnitude because GRUAN does not have regular launch data for each site. The Lindenberg (LIN) lead site provides approximately 72 % of all matches made to GRUAN radiosondes. Therefore, "global" biases are weighted towards the Lindenberg site result. GRUAN sites situated at high latitudes in the Northern Hemisphere tend to show wetter biases for the lower-to-mid-tropospheric layers. In contrast, GRUAN sites in the Tropical Warm Pool (TWP) see persistent dry biases below 500 hPa. Northern Hemisphere high-latitude sites also have a weaker performance in the upper-tropospheric layer (400-300 hPa), with Barrow (BAR) seeing a wet bias of 27.63 ± 18.58 % ppmv.

Temperature biases
The same exercise is repeated for IMS temperature profiles, with biases reported in kelvin (K). Figure 7 shows the results for biases calculated for all matches over all sites and the diurnal split between day and night cases. IMS temperature biases are within ±0.2 K for the first scenario. The bottom and top layers both see cold biases of −0.18 ± 0.49 and −0.17 ± 1.16 K, respectively, while the rest of the troposphere shows warm biases between 0.06 ± 0.62 and 0.21 ± 0.55 K. As with water vapour, temperature biases in the lowest layer (925-1000 hPa) are dominated by the daytime bias (−0.44 ± 0.99 K). This negative daytime bias continues up to 500 hPa, with the magnitude reducing with altitude. Night-time biases below 400 hPa show warm biases between 0.11 ± 1.31 and 0.32 ± 1.07 K, with the largest seen in the mid-tropospheric layers between 700 and 925 hPa. The night-time bias dominates the mid-tropospheric temperature bias for all sites and all matches. The median absolute deviation ranges from 0.43 to 1.31 K across the scenarios, with a consistent decrease in magnitude with altitude. Higher variability is observed for night-time temperature biases relative to the daytime, mirroring the behaviour seen for water vapour.
The breakdown of IMS temperature profile biases by GRUAN site is given in Table 2. As expected, temperature biases for all sites and matches are weighted towards Lindenberg (LIN) results. The surface layer negative bias is a common feature for stations situated in the mid-latitudes and tropics, with the coldest bias value of −1.57 ± 0.7 K seen over Manus (MAN). However, the site at Potenza, Italy (POT), displays a different behaviour, with a warm bias of 2.44 ± 1.7 K in the surface layer. With only 126 matches over the whole period, this will have little impact on the collectively observed bias. A majority of sites see a cold bias in the mid-troposphere (400-925 hPa), whereas the result for "all sites/all matches" (Fig. 7) shows a small warm bias for these layers. However, the sites which exhibit the warm bias also tend to have a higher number of matches (> 1000) and are mainly found at higher latitudes, e.g. Barrow (BAR), LIN, Ny-Ålesund (NYA), and Sodankylä (SOD).

Bias dependence on cloud fraction
A key benefit of using the combination of IR and MW instruments for NWP is the ability to produce water vapour and temperature profiles in clear and cloudy scenes. However, it has been shown for the Atmospheric Infrared Sounder (AIRS) that cloud amount and type can impact profile biases (Hearty et al., 2014; Wong et al., 2015; Trent et al., 2019). Therefore, understanding the impact of cloud fraction within the IASI IFOV on IMS profile biases is also of interest to this study. IMS profile biases were binned according to cloud fraction at intervals of 0.1 for all sites for all matches (day and night cases) and for the separate day and night cases. Water vapour and temperature bias results as a function of cloud fraction are presented in Fig. 8, along with the difference between day and night cases and the "all cases" result. It should be noted that a BTD flag is used to remove cloudy scenes that significantly impact the retrieval. While IMS can produce profiles for cloudy IFOVs, the BTD flag will disproportionately remove some of the profiles across increasing cloud fractions. This explains the distribution we observe in Fig. 8.
Water vapour profile biases shown in Fig. 8 indicate that the wet bias seen in the lowest tropospheric layer (925-1000 hPa) is weighted towards clear skies or scenes with a cloud fraction below 0.1 (or 10 %). The stronger wet bias observed in the upper-tropospheric layers (300-500 hPa) is more sensitive to cloud amounts > 10 %. At higher cloud fractions, the wet bias roughly doubles relative to scenes with < 10 % cloud cover, reaching 18.87 % ppmv. The slight warm biases observed in the mid-to-upper troposphere (925-400 hPa) relative to GRUAN can also be attributed to cloud amount. The most affected layer is found between 925-850 hPa, with the maximum warm bias of 0.73 K above 50 % cloud fraction. For the 500-400 hPa layer, the biases seen above 60 % cloud fraction dominate the result seen in Fig. 7, whereas for the 925-850 hPa layer, where the strongest biases are found, biases seen in the global all-site result are being significantly weighted by cold biases seen in cloud fractions < 10 %. The split of matches into daytime and night-time cases also reveals a diurnal dependence of the cloud fraction and observed biases. From visual inspection, an apparent 1 : 1 gradient (running bottom left to top right) is observed in both scenarios, splitting the behaviour seen in day and night cases relative to the cumulative result. Daytime biases are up to 3.88 % drier relative to those seen for the global all-site result above the 1 : 1 split, while below it, wetter biases are seen for daytime matches, with larger differences of 5.08 % to 6.91 % observed. The inverse of this relationship is seen for night-time results. The region above the 1 : 1 gradient is wet-biased by up to 2.29 % relative to all matches and dry-biased below the 1 : 1 division. Biases in this region again show larger differences from all matches for night-time cases, with a maximum difference of −4.45 % seen in the lowest tropospheric layer (1000-925 hPa).
The diurnal pattern for temperature biases displays a more monotonic behaviour than water vapour. Daytime biases are almost exclusively colder than the result for all matches, while night-time biases are more warm-biased. In both scenarios, the lower troposphere (below 850 hPa) with cases up to 20 % cloud cover shows the greatest differences of −0.28 and 0.3 K for day and night matches, respectively.
Figure 9 illustrates sampling across the cloud fraction bins in absolute and relative terms. Over 60 % of all matches are found for scenes with 0 %-10 % cloud cover, with 40 % of those cases found when the cloud cover within the IMS IFOV is 0 %-1 % (i.e. clear skies). The number of matched pairs drops off significantly with increasing cloud fraction. For the highest cloud cover category (70 %-80 %), only 0.7 % of cases remain. While sampling of cloud cover reduces in frequency as cloud fraction increases, relative sampling between day and night scenes for each bin is reasonably consistent, with an average 43 % and 57 % split, respectively.
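The cloud-fraction binning used in this section can be illustrated with a short sketch. It assumes per-match differences (IMS minus reference) for one layer and the matching cloud fractions are available as arrays; the function name and the treatment of the uppermost bin edge are assumptions for the example.

```python
import numpy as np


def bias_by_cloud_fraction(diff, cloud_fraction, width=0.1, n_bins=8):
    """Median bias and sample count per cloud-fraction bin (0-0.8 in 0.1 steps).

    Bins are closed on the left; a cloud fraction exactly equal to the top
    edge (0.8) is placed in the last bin, which is an assumption.
    """
    diff = np.asarray(diff, float)
    cf = np.asarray(cloud_fraction, float)
    # Round the edges so that values such as 0.3 are represented cleanly.
    edges = np.round(np.arange(n_bins + 1) * width, 10)
    keep = (cf >= edges[0]) & (cf <= edges[-1])
    idx = np.minimum(np.digitize(cf, edges) - 1, n_bins - 1)
    bias = np.full(n_bins, np.nan)
    count = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        sel = keep & (idx == b)
        count[b] = sel.sum()
        if count[b] > 0:
            bias[b] = np.median(diff[sel])
    return edges, bias, count
```

Returning the counts alongside the biases makes the uneven sampling explicit, so that sparsely populated bins at high cloud fraction can be interpreted with appropriate caution.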

Validation over ARSA sites
While radiosonde archives such as ARSA do not contain the same level of fiducial information as GRUAN, a key advantage is greater global sampling. A multi-year time series can yield match-up numbers 1 to 2 orders of magnitude greater than with a smaller network like GRUAN. Therefore, analysis against ARSA expands on the global results by splitting match-ups into five latitudinal bands. Finally, global and latitudinal bias trends are examined to assess the stability relative to Global Climate Observing System (GCOS) requirements.

Global profile biases
The collocation of IASI soundings against ARSA radiosonde measurements between 1 June 2007 and 31 December 2016 yields over 1.2 × 10⁶ matched pairs for analysis, with a 59 % and 41 % split between day and night overpasses, respectively. Water vapour profiles are wet-biased between 0.38 % ppmv and 6.54 % ppmv relative to ARSA, with the larger biases seen in the 1000-925 and 500-400 hPa layers (Fig. 10a). As in the comparisons to GRUAN, the smallest bias is seen in the mid-troposphere. The spread of biases measured by the median absolute deviation shows the same behaviour as the GRUAN results, with values ranging from 11.85 % ppmv to 18.35 % ppmv and the larger values occurring between 850-400 hPa. Daytime cases dominate the observed wet bias at each layer seen in the result for all matches, with a maximum of 9.39 % ppmv seen in the lowest tropospheric layer. In contrast, night-time biases drop below 4 % ppmv, with all layers showing lower biases than the results for all matches and daytime. In the mid-troposphere, the night-time bias is again the smallest in magnitude, though it switches from a wet to a dry bias (Fig. 10b).
Temperature profile biases are found to be within −0.39 to 0.06 K relative to ARSA, with a predominant cold bias (Fig. 10c). The variation in the observed biases is highest in the surface layer, with a median absolute deviation of 1.13 K. The magnitude of the layer median absolute deviations reduces with altitude, with a value of 0.46 K seen in the upper-tropospheric layer (400-300 hPa). Figure 10d highlights that daytime matches dominate the IMS profile cold bias seen for all matches, while night-time matches exhibit a small warm bias (0.07 to 0.22 K) between 925-400 hPa. With more than 2 × 10⁵ matched pairs, daytime median absolute deviation values are consistently lower than those calculated for night-time collocations. However, these differences are less than 0.12 K on average.
Figure 11a-c show the impact of cloud fraction on the IMS water vapour profile biases on all, day, and night matches, respectively. In general, increasing cloud amount slightly reduces the wet bias relative to clear skies below 850 hPa while increasing the wet bias between 850-400 hPa. This pattern is also observed for daytime collocations, though biases are wetter by up to 4.5 % ppmv than those seen for all matches. Similarly, night-time collocations show the same behaviour, except biases tend to be drier relative to all matches by as much as −3.35 % ppmv.
Figure 11d-f, for all, day, and night cases, respectively, show the same results for temperature biases. The cloud fraction impact on all matches shows an average warm bias of 0.2 K, with a maximum of 0.44 K for cloud fractions above 10 %. In the upper-tropospheric layer (400-300 hPa), the cold bias increases in magnitude by ≈ 0.1 K with increasing cloud fraction within the field of view (FOV). The separation of the diurnal effects of cloud fraction on temperature profile biases shows the same behaviour seen over GRUAN sites. Daytime biases are colder relative to the all-site result by up to −0.37 K, while night-time biases are warmer by as much as 0.4 K. The larger differences in both cases are seen below 700 hPa, nearer to the surface, and for cloud fractions greater than 30 %.

Latitudinal dependence of biases
To investigate how biases change with latitude, collocations are binned into five broad bands that span 90-60° S, 60-30° S, 30° S-30° N, 30-60° N, and 60-90° N. Due to the disproportionate distribution of global radiosonde sites (see Fig. 5b), the match-ups are split 0.8 %, 4.7 %, 21.7 %, 62.9 %, and 9.9 % between the bands, respectively. Once separated, matches were processed in the same manner as global results to produce biases for all, day, and night cases and cloud fraction dependence.
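As an illustration of the latitudinal binning, the sketch below assigns each collocation to one of the five bands and reports the percentage of match-ups per band. The function names and band labels are hypothetical, and the handling of latitudes that fall exactly on a band edge (assigned to the more northerly band) is an assumption.

```python
import numpy as np

# Interior band edges in degrees north; the outer limits are -90 and +90.
INTERIOR_EDGES = np.array([-60.0, -30.0, 30.0, 60.0])
BAND_LABELS = ["90-60S", "60-30S", "30S-30N", "30-60N", "60-90N"]


def latitude_band(lat):
    """Return the band index (0 = 90-60 S ... 4 = 60-90 N) for each latitude."""
    return np.digitize(np.asarray(lat, float), INTERIOR_EDGES)


def band_percentages(lat):
    """Percentage of match-ups falling in each of the five bands."""
    counts = np.bincount(latitude_band(lat), minlength=len(BAND_LABELS))
    return 100.0 * counts / counts.sum()
```

Once each match-up carries a band index, the global processing (median bias, MAD, cloud-fraction binning) can be repeated on each subset without further changes.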
Figure 10e shows IMS water vapour profile biases per latitude band for all matches. The largest biases are observed between 90-60° S, where values are > 20 % ppmv in most layers. However, there are only 10 sites along the Antarctic coastline at this latitude. The mid-latitude bands show similar performance, with wet biases between 1 % ppmv and 12 % ppmv and the largest biases seen in the 1000-925 hPa surface layer. The northern mid-latitude band performs slightly better, with wet biases 3 % ppmv lower on average. Biases in the tropical band are the lowest, ranging between −2.6 % ppmv and 4 % ppmv below 700 hPa and −0.4 % ppmv and 0.9 % ppmv above 700 hPa. Finally, the Arctic band (60-90° N) sees predominantly wet biases below 10 % ppmv, with a small 1 % ppmv dry bias seen between 700-500 hPa. All bands show the same median absolute deviation distributions with varying magnitude except for the Antarctic band, where the highest values are seen at the surface (25 % ppmv), reducing with altitude. The daytime wet bias dominance observed in the global all result (Fig. 10b) is also seen in the northern mid-latitude band. This is not coincidental, as 60 % of all daytime cases are found between these latitudes. The other latitude bands show differing patterns, though most biases are wet relative to ARSA. The main exception to this behaviour is in the tropics, where night-time values above 850 hPa (free troposphere) are dry-biased by up to 4.3 % ppmv. An examination of the impact of cloud fraction shows that Northern Hemisphere mid-latitudes strongly influence what is observed for all global results (Fig. 11g). At other latitudes, comparisons to ARSA generally present wetter biases except for the tropics. Here we observe the (overall) lowest wet biases and a persistent dry bias between 850-700 hPa. The strong wet biases seen south of 60° S in Fig. 10e continue across all cloud amounts. It is worth noting that strong wet biases can be correlated with low sampling, especially in lower- and upper-tropospheric layers. Diurnal behaviour seen for changes in cloud fraction in the global comparisons to ARSA prevails when split into latitude bands, with one key difference. At high latitudes above 500 hPa, biases show opposing results. Where daytime biases are normally wet, they become dry-biased relative to global results. For night-time, the inverse is observed: biases that are expected to be drier relative to the global result are now wetter. This effect could arise from upper-tropospheric-layer sensitivity to the tropopause and lower stratosphere, similar to what has been observed for AIRS (Trent et al., 2019).
https://doi.org/10.5194/amt-16-1503-2023 Atmos. Meas. Tech., 16, 1503-1526, 2023
Results for IMS temperature profile comparisons to ARSA radiosondes for different latitude bands and the split between day and night match-ups are shown in Fig. 10g and h, respectively. IMS shows a dominant cold bias at tropical and mid-latitudes relative to ARSA. Here values range from −1.01 K in the surface layer in the tropics to 0.13 K in the mid-to-upper troposphere between 30-60° N, with an average bias of −0.2 K, whereas at high latitudes, a warm bias is observed with maximum values between 0.74-0.87 K. However, unlike at lower latitudes, polar biases peak in the free troposphere rather than in the surface layers. An analysis of diurnal biases shows that outside tropical latitudes, night-time match-ups are consistently warm-biased, with a strong dominance above 60° N. Mid-latitude daytime biases are cold relative to ARSA temperature profiles, where the distributions and magnitudes negate the warm biases in the "all" match-up results. In the tropics, daytime and night-time biases are similar, with the magnitude of the night-time results only 0.03 K colder on average relative to daytime biases. The spread of biases exhibits a high level of consistency across the latitudinal bands relative to both one another and the global results. Median absolute deviation values range from 0.96-1.3 K near the surface, reducing to 0.4-0.57 K in the upper troposphere. As in the global results, night-time variability is higher than for daytime matches, with the difference reaching a maximum of 0.26 K in the lower troposphere.
Temperature biases as a function of cloud fraction are shown in Fig. 11j-l for all, day, and night differences, respectively. Like water vapour, temperature biases for the 30-60° N band show a strong resemblance to the global result for all matches (Fig. 11d). A significant influence of cloud fraction on temperature profile biases is observed at high latitudes. Polar temperature biases can become increasingly warm-biased as the cloud amount within the IFOV increases. There is also a vertical dependence on this behaviour, as the sensitivity to cloud fraction reduces with altitude, i.e. the surface layer biases respond continuously to cloud fraction, while other altitudes peak at lower cloud amounts. Temperature biases for the surface layers compared to ARSA soundings reach 1.5 and 3.3 K for northern and southern polar sites, respectively. Diurnal patterns generally follow the global results, with warmer night-time biases compared to daytime collocations. However, there are a few occasions where this behaviour is flipped. Most notable are the matches between 90 and 60° S, where daytime near-surface layers show peak bias differences 1.36-1.77 K warmer relative to night-time biases for cloud fractions above 40 %. The second region is the northern polar region, where the magnitudes for the surface tropospheric layer are similar though reversed in sign. This suggests that cloud dependence observed in the global results has no diurnal dependence, whereas the Southern Hemisphere does exhibit a diurnal dependence for bias changes with cloud fraction. The final region to note is found above 500 hPa in the tropics, which shows the inverted general behaviour in a consistent manner across all cloud fraction amounts. Differences in diurnal biases at these altitudes are within 0.25 K of one another. Finally, the strongest diurnal bias differences from "all" matches are found nearer the surface at all latitudes except for collocations above 60° N.
The largest differences are observed between 850-400 hPa for cloud fractions below 40 %. Relative to water vapour, temperature profile biases exhibit greater complexity in the presence of varying cloud amounts. As with GRUAN, collocations made over ARSA sites show disproportionate sampling under different cloud fraction amounts, with the highest frequency always seen for cloud fractions below 10 % (Fig. 12). Furthermore, daytime match-ups at ARSA sites dominate across all bins, whereas for GRUAN, it was night-time collocations. With the separation of the ARSA results into the five latitudinal bands, any sampling bias will have a higher impact due to the lower collocation numbers outside of 30-60° N.

Bias stability
The final analysis performed on IMS profiles looks at the stability of the biases over the study period (June 2007-December 2016). The Global Climate Observing System (GCOS) sets performance requirements for essential climate variables (ECVs), which for water vapour and temperature profiles are 0.3 % per decade and 0.05 K per decade, respectively (GCOS, 2016). It should be noted that the unit for water vapour is in absolute units (e.g. ppmv) rather than relative humidity (% RH). Monthly median biases were first calculated for global and latitudinal banded results, and the trends were calculated using Eq. (18). For the linear trend model, we follow the same approach used within the Global Energy and Water Exchanges (GEWEX) Water Vapor Assessment (G-VAP), where four frequencies and the strength of the ENSO are fitted simultaneously as part of the regression. A correction is also applied to the trend uncertainty for autocorrelation (for details, please see Schröder et al., 2019). While the record analysed is only 9.5 years, results are scaled to give stability in units per decade (e.g. K per decade) for comparison to GCOS requirements.
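A minimal sketch of this kind of trend fit is given below. It is not the G-VAP tool: it assumes that the "four frequencies" are the first four harmonics of the annual cycle, that the ENSO strength is supplied as a monthly index aligned with the bias series, and that an ordinary least-squares fit is adequate for illustration. The step term is omitted, consistent with the assumption that it is negligible, and the autocorrelation correction to the trend uncertainty is not covered.

```python
import numpy as np


def bias_trend_per_decade(monthly_bias, enso_index, n_harmonics=4):
    """Least-squares linear trend of a monthly bias series, scaled to per decade.

    Fits an intercept, a linear trend, sine/cosine pairs for the first
    `n_harmonics` harmonics of the annual cycle, and an ENSO term, all
    simultaneously. Months with missing values (NaN) are skipped.
    """
    y = np.asarray(monthly_bias, float)
    enso = np.asarray(enso_index, float)
    t = np.arange(y.size, dtype=float)  # time index in months
    columns = [np.ones_like(t), t]
    for k in range(1, n_harmonics + 1):
        columns.append(np.sin(2.0 * np.pi * k * t / 12.0))
        columns.append(np.cos(2.0 * np.pi * k * t / 12.0))
    columns.append(enso)
    design = np.column_stack(columns)
    ok = np.isfinite(y) & np.isfinite(enso)
    coef, *_ = np.linalg.lstsq(design[ok], y[ok], rcond=None)
    return coef[1] * 120.0  # slope per month -> per decade
```

For a 9.5-year record this corresponds to 114 monthly values, and the factor of 120 converts the fitted monthly slope into the per-decade units used by the GCOS requirements.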
Results for water vapour profile bias stability are given in Table 3. Global comparisons show that biases between 850 hPa and the surface are within 0.3 % ppmv per decade for all cases (global average). However, when the split between day and night matches is examined, we observe positive trends for daytime cases and negative trends for night-time cases, all outside GCOS requirements. Above this altitude, the bias trend increases and switches sign, before reducing in the upper-tropospheric layer and becoming positive again. This behaviour is driven more by the daytime collocations, whilst the night-time cases generally show smaller positive bias trends (i.e. stability) relative to all match-ups.
When broken down into the five latitude bands, stability performance becomes more complex, with broad variability for the calculated trends ranging from −10.71 ± 4.45 % ppmv to 3.38 ± 1.84 % ppmv per decade. When split diurnally, this range increases to between −17.96 ± 8.26 % ppmv and 11.22 ± 12.29 % ppmv per decade, with both the maximum and the minimum found between 90-60° S from night-time cases. Negative trends between 850-1000 hPa observed in the tropics influence global results and are driven by daytime cases. A key point to note is that though several signals are fitted to the bias time series, a significant amount of noise remains. Only 53 % of results are outside the uncertainty, and these tend to be where poorer performance is observed, e.g. 90-60° S. Night-time trends have a slightly better performance, with 60 % of the trends outside of the uncertainty.
Table 3. Stability of IMS water vapour profile biases at ARSA sites, given as the trend in the bias (with units of % ppmv per decade). Trends are reported globally as well as for five broad latitude bands for (i) all matches, (ii) daytime-only matches, and (iii) night-time-only matches. It should be noted that the separation between day and night cases is more representative of seasonal differences (e.g. polar winter/summer) than diurnal ones for high latitudes. Values denoted with * are outside the 95 % confidence interval.

While exhibiting strong negative and positive gradients, the bias trends for the Antarctic latitudinal band have little impact on the global result. This is reassuring as they are all outside the 95 % confidence interval. For results within the 95 % confidence interval, the uncertainties of water vapour bias trends range from 0.66 % ppmv to 1.84 % ppmv per decade for all cases, 0.46 % ppmv to 2.45 % ppmv per decade for daytime-only cases, and 0.9 % ppmv to 2.3 % ppmv per decade for night-time match-ups.
Table 4 gives the bias trends for global and latitudinal-band match-ups for IMS temperature profiles. Trends range between −0.41 ± 0.36 and 0.31 ± 0.24 K per decade, with only two layers in the mid-to-upper troposphere between 90-60° S falling within GCOS requirements. An examination of the daytime and night-time trends shows that these polar values are dominated by daytime cases. The two surface layers in the global results are close to the 0.05 K per decade requirement for all match-ups, which is driven by night-time surface trends between ±60°. However, where we find small stability trends (better performing), they are inside of the trend uncertainty. Similar to the water vapour results, we find that only 47 % of trends are outside the noise for all daytime cases, with a slight improvement in night-time match-ups, where 50 % of the trends are outside the uncertainty. The vertical pattern in global trends is matched in latitudinal results between 60° S and 60° N, with positive trends between 1000-850 hPa and negative trends above 850 hPa. At polar latitudes, tropospheric temperature bias trends are predominantly positive, with the stronger gradients observed in the Northern Hemisphere. The uncertainty range for all trends is between 0.06 and 0.98 K per decade, with polar night-time match-ups dominating the higher end of this range. While the temperature profile stability is not quite meeting GCOS requirements, it is worth noting that daytime polar (Southern Hemisphere) and night-time northern mid-latitudes both have four out of six layers with bias trends of 0.06 K per decade or less.

Discussion
The EUMETSAT Polar System (EPS) programme, which began in 2006 and will run until 2027, consists of the Meteorological Operational Satellite (MetOp) series of platforms: MetOp-A, MetOp-B, and MetOp-C. Continued through the EPS Second Generation programme, this satellite series will provide a continuous data record to 2045. Therefore, MetOp data products are invaluable for climate data records (CDRs). IASI water vapour and temperature profile data analysis has been performed from four addi[…]der et al., 2016). With our study, we have adopted a similar methodology to that outlined in Trent et al. (2019) so that the results are comparable within a common framework. This is especially important for future validation exercises when considering combining different platforms to create a CDR. The capability to be incorporated with other international collaborative efforts has extended the scope of this validation framework. Tools developed within G-VAP (Schröder et al., 2019) have been employed to investigate the stability of observed biases, which is vital for the long-term characterization of CDR performance.
Early assessment of the EUMETSAT operational processor L2 during the Joint Airborne IASI Validation Experiment (JAIVEx) revealed relative humidity (RH) profiles to be dry-biased near the surface (−10 % RH) and at altitudes above 12 km (−10 % RH to −5 % RH), with wet biases in the free troposphere reaching 5 % RH. Temperature profiles were shown to be cold-biased and within 1 K of radiosonde measurements below 12 km (Zhou et al., 2009). A second study, presented in Pougatchev et al. (2009), looked at 650 collocations over Lindenberg, Germany. IASI RH profiles showed a predominant wet bias within 10 % RH between 800-300 hPa, with a maximum of 20 % RH observed between 300-200 hPa. Temperature profile biases were shown to exceed 2 K near the surface. In contrast, between 950-100 hPa, biases oscillated within ±0.5 K. Further analysis of EUMETSAT IASI water vapour over the Tibetan Plateau revealed profile biases within 25 % g kg⁻¹ of radiosondes, with RMS differences between 20 % g kg⁻¹ and 50 % g kg⁻¹ (Ting et al., 2013). Differences in observed bias strongly depended on seasonality, with warmer months generally wet-biased at altitudes lower than 600 hPa, while colder months exhibited a dry bias. While August et al. (2012) present results of temperature and water vapour profile comparisons to reanalysis, they are not included as part of this discussion, as we consider these to be intercomparisons rather than validation. The most recent validation results for the operational IASI product can be found in the L2 validation report (EUMETSAT, 2018). Here, comparisons to global radiosondes show temperature profile biases within ±0.5 K (predominantly cold-biased), and water vapour profiles are within 0.1-0.2 g kg⁻¹, dry-biased in the lower troposphere, and wet-biased at altitudes above 800 hPa.
Divakarla et al. (2011) compared NOAA-processed IASI-MHS and AMSU-A water vapour and temperature profiles to radiosonde soundings using the same collocation criteria as our study. Match-ups were split into four groups: (i) clear-sky ocean, (ii) clear-sky all surfaces, (iii) cloud-cleared ocean, and (iv) cloud-cleared all surfaces. The term cloud cleared refers to the method outlined in Susskind et al. (2003) for treating IR measurements in cloudy scenes when used in conjunction with MW radiances. Results showed RMS differences of between ≈ 15 % RH-40 % RH and 1-1.5 K between the surface and 300 hPa for water vapour and temperature, respectively. Temperature profiles were shown to have slightly better performance in clear-sky scenes, while water vapour had better accuracy in cloud-cleared cases.
Further analysis of the NOAA IASI product over East Asia (Kwon et al., 2012) showed relative humidity biases with respect to radiosondes are within ±5 % RH over all surfaces except land, where biases rise to ≈ 8 % RH. Temperature profile biases were ±0.5 K between the surface and 800-200 hPa. Nearer the surface, temperature profile biases can exceed 2 K. Wet and warmer biases were observed in drier atmospheres, and similar performance was seen across differing cloud fraction amounts.
The study from Bouillon et al. (2022) presents the most comprehensive analysis of IASI temperature profiles by comparing 13 years of retrievals using an artificial neural network (ANN) with ARSA, European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis (ERA5; Hersbach et al., 2020), and the climate data record (CDR) of all-sky IASI temperature profiles from EUMETSAT (released 2020; https://doi.org/10.15770/EUM_SEC_CLM_0027, Doutriaux-Boucher and August, 2020). Results from this study found (i) good agreement between all four datasets, especially between 7-750 hPa; (ii) that differences between ARSA and ERA5 were less than 0.5 K over the majority of latitudes and pressure levels; and (iii) a warming trend in tropospheric temperature of 0.5 K per decade at mid-latitudes and 1 K per decade at the poles due to Arctic amplification. Finally, validation of MUSICA IASI (IR only) water vapour profiles at selected GRUAN sites showed biases of 11 % below 12 km; a stronger dry bias was observed at higher altitudes of up to 21 % ppmv (Borger et al., 2018). With the MUSICA IASI scheme only using IR information, these biases represent clear skies only.
Looking forward, it is important that common units and metrics are adopted for tropospheric profile analysis. This is especially true for water vapour profiles due to the range of units that can be used, both absolute and relative. For atmospheric temperature, we see similar results to EUMETSAT (2018) and Bouillon et al. (2022), with profiles predominantly cold-biased and biases below 0.5 K. For water vapour, we can only compare our results directly with those from Borger et al. (2018), where we see a similar performance below 12 km (within 11 % ppmv). With our study, we have gone further to look at the impacts of clouds and diurnal sampling on the reported biases. Furthermore, we present the first results on the stability of these biases, values that are needed when considering using the data for climate applications.
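To illustrate why a common unit matters, the following minimal sketch converts specific humidity to a volume mixing ratio in ppmv and expresses a bias relative to a reference value. It is an illustration only and not code from this study. The function names are hypothetical, and it assumes that ppmv is defined relative to dry air and uses standard molar masses for dry air and water.

```python
# Molar masses in g/mol (standard values; assumed here for illustration).
M_DRY_AIR = 28.9644
M_WATER = 18.01528


def specific_humidity_to_ppmv(q):
    """Convert specific humidity q (kg water per kg moist air) to a
    volume mixing ratio in ppmv, assumed to be relative to dry air."""
    if not 0.0 <= q < 1.0:
        raise ValueError("specific humidity must lie in [0, 1)")
    mass_mixing_ratio = q / (1.0 - q)  # kg water per kg dry air
    return mass_mixing_ratio * (M_DRY_AIR / M_WATER) * 1.0e6


def ppmv_to_specific_humidity(ppmv):
    """Inverse of specific_humidity_to_ppmv."""
    mass_mixing_ratio = ppmv * 1.0e-6 * (M_WATER / M_DRY_AIR)
    return mass_mixing_ratio / (1.0 + mass_mixing_ratio)


def relative_bias_percent(retrieved, reference):
    """Bias of a retrieved value relative to a reference value, in percent."""
    return 100.0 * (retrieved - reference) / reference


# Hypothetical example: a lower-tropospheric value of 10 g/kg.
q_example = 0.010
ppmv_example = specific_humidity_to_ppmv(q_example)
```

A relative bias computed in % ppmv and one computed in % g kg⁻¹ differ only slightly because the two quantities are nearly proportional at tropospheric humidities, whereas a bias in % RH also depends on temperature and cannot be converted without it.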

Conclusions
This study has assessed 9.5 years of IMS water vapour and temperature profiles co-retrieved from IASI, MHS, and AMSU-A on board the MetOp-A platform. This dataset was produced as part of the ESA WV_cci project. A database of match-ups was collected over GRUAN and ARSA radiosonde sites for IASI footprints within 100 km and ±3 h of launch. These broad collocation criteria allow multiple IMS profiles to be averaged over each site to reduce the collocation uncertainty. IMS averaging kernels were applied to all matched radiosonde profiles, smoothing the higher-resolution (vertical) in situ measurements to the satellite vertical atmospheric sensitivity, thus allowing for like-for-like comparisons. An evaluation of IMS performance was conducted for "global" matches, day and night differences, the impact of cloud fraction, and site-to-site differences. From the results gathered, the main conclusions from this study are as follows.
- Global biases calculated from 9.5 years of matches made at both GRUAN and ARSA sites are within performance goals of 10 % ppmv and 1 K for water vapour and temperature profiles, respectively (Hilton et al., 2012). The strongest wet biases for collocations made at GRUAN sites can be found in the upper troposphere, while the lower-tropospheric layers exhibit the wetter biases for ARSA stations. Temperature profiles are predominantly cold-biased relative to ARSA. In contrast, a small warm bias is found for GRUAN comparisons in the mid-to-upper troposphere.
- Although global results are within desired performance limits, site-to-site differences are observed for GRUAN sites. Likewise, biases reported as a function of latitude relative to ARSA profiles increase in magnitude, moving towards higher latitudes. Biases are larger at latitudes > 60° than at 30-60° though within respective 1 % and 1 K limits, except for water vapour at latitudes > 60° N. Some of this variability can be explained by sampling differences, especially for GRUAN, where a difference of 3 orders of magnitude between the number of match-ups can be found. However, the ARSA results indicate a latitudinal dependence of biases related to total column concentrations (e.g. Roman et al., 2016).
- Water vapour profile biases for global GRUAN matches between the surface and 850 hPa are substantially larger in the daytime, with corresponding night-time biases within ±0.3 % ppmv.
- Globally, biases from ARSA match-ups have consistent diurnal patterns with those made at GRUAN sites. However, daytime data dominate (cold) temperature biases rather than night-time match-ups. On latitude bands, this daytime dominance in temperature profile biases is driven by tropical and mid-latitude collocations.
- Further to the diurnal (polar summer/winter) effects on reported biases, cloud fraction has an additional impact. For water vapour profile match-ups, daytime water vapour biases relative to ARSA tend to be wetter than global (daytime) results under all cloud fraction amounts. Similarly, cloudy night-time matches are drier than those shown in Fig. 11d. This pattern is consistent for matches between ±60°, although it breaks down for polar sites. For GRUAN collocations, a slightly different pattern is observed. Profiles become drier relative to global results with increasing cloud fraction in the mid-to-lower troposphere, with the upper-tropospheric biases consistently wetter. This pattern is inverted for night-time match-ups. Cloud fraction impacts on temperature biases have better global consistency, with cloudy scenes colder for daytime and warmer for night-time for both GRUAN and ARSA match-ups. However, as is shown by Figs. 9 and 12, these results will include a sampling bias due to the relatively low numbers of matches over the differing cloud fraction amounts relative to cloud fraction below 10 %. This feature in the data can partially be attributed to the fact that not every retrieval in the IMS product contains an averaging kernel, so we are limited in selecting the profiles we can use in this study before quality filtering.
- An initial look into the height-resolved stability of the IMS biases (with respect to ARSA) over the 9.5-year time series shows that most are outside of GCOS requirements. However, to the authors' knowledge, this is the first look at profile bias stability from combined MW and IR nadir sounders. Therefore, it is understood that, while a longer time series is required to conduct this analysis, a greater understanding of the impact of collocation uncertainties on the trend is also needed. It is anticipated that this can explain the effects seen in mid-tropospheric values (Tables 3 and 4).
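As an illustration of what a bias stability value expresses, the sketch below fits an ordinary least-squares line to a time series of layer biases and reports the slope and its standard error per decade. This is a simplified stand-in and not the G-VAP tooling used in this study: it assumes independent, equally weighted samples and ignores seasonality and autocorrelation. The function name and the synthetic series are hypothetical.

```python
import math


def linear_trend_per_decade(times_years, biases):
    """Ordinary least-squares slope of bias against time, returned per
    decade together with the standard error of the slope."""
    n = len(times_years)
    if n != len(biases) or n < 3:
        raise ValueError("need at least three paired samples")
    t_mean = sum(times_years) / n
    b_mean = sum(biases) / n
    s_tt = sum((t - t_mean) ** 2 for t in times_years)
    if s_tt == 0.0:
        raise ValueError("time values must not all be identical")
    s_tb = sum((t - t_mean) * (b - b_mean)
               for t, b in zip(times_years, biases))
    slope = s_tb / s_tt
    intercept = b_mean - slope * t_mean
    residual_ss = sum((b - (intercept + slope * t)) ** 2
                      for t, b in zip(times_years, biases))
    # Standard error of the slope with n - 2 degrees of freedom.
    slope_se = math.sqrt(residual_ss / (n - 2) / s_tt)
    return 10.0 * slope, 10.0 * slope_se


# Synthetic monthly series spanning 9.5 years with a drift of -0.03 K per year.
times = [2007.5 + i / 12.0 for i in range(114)]
biases = [0.2 - 0.03 * (t - 2007.5) for t in times]
trend, trend_se = linear_trend_per_decade(times, biases)
```

With real match-up data the residual scatter, and therefore the standard error, would be far larger than in this noise-free example, which is one reason a longer time series helps constrain the trend.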
Finally, results from this study show the promise of satellite water vapour and temperature profile records for long-term climate studies, especially for scenes with cloud fractions below 0.1. Additionally, knowledge of temperature and water vapour profile accuracy is vital for calculating uncertainty budgets for all trace gases retrieved from IASI (similar) data. Therefore, combined IR plus MW profile records need to be extended forward in time through MetOp-B and MetOp-C, MetOp-SG, and the US series S-NPP/JPSS (Suomi National Polar-orbiting Partnership; NOAA-20 onwards), and also backwards with ATOVS. A dataset of this type would take the time series back to 1999 and out to 2045. Adding new missions into the time series will also capture greater diurnal variability for profiles and total column water vapour (TCWV). This final point would provide a new complementary record for the existing SSM/I (SSMIS; Special Sensor Microwave Imager/Sounder) record (ice-free oceans only), which could be used to extend coverage over land and possibly polar regions.
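The collocation and smoothing steps summarised in these conclusions can be sketched as follows. This is a minimal illustration under stated assumptions rather than the processing code used in the study: the 100 km and ±3 h thresholds come from the text, while the dictionary layout, the function names, and the linear form of the smoothing, x_s = x_a + A (x_sonde − x_a), are assumptions made for the example. The state space in which the IMS kernels are actually applied is described in the methodology and is not reproduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def is_match(site, footprint, max_km=100.0, max_hours=3.0):
    """True if a satellite footprint lies within the distance and time
    window of a radiosonde launch. Times are hours on a common axis."""
    dist = great_circle_km(site["lat"], site["lon"],
                           footprint["lat"], footprint["lon"])
    dt_hours = abs(footprint["time_h"] - site["launch_time_h"])
    return dist <= max_km and dt_hours <= max_hours


def smooth_with_averaging_kernel(sonde, prior, kernel):
    """Apply x_s = x_a + A (x_sonde - x_a), where `kernel` is a list of
    rows of A and all profiles are on the same vertical grid."""
    diff = [s - p for s, p in zip(sonde, prior)]
    return [p + sum(a * d for a, d in zip(row, diff))
            for p, row in zip(prior, kernel)]


# Hypothetical two-level example.
site = {"lat": 0.0, "lon": 0.0, "launch_time_h": 12.0}
near = {"lat": 0.0, "lon": 0.5, "time_h": 14.0}
smoothed = smooth_with_averaging_kernel(
    [2.0, 4.0], [1.0, 1.0], [[0.5, 0.5], [0.0, 0.5]])
```

An identity kernel returns the radiosonde profile unchanged and a zero kernel returns the prior, which brackets the behaviour expected where the retrieval has full or no sensitivity.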
T. Trent et al.: Validation of IMS q and T profiles
Special issue statement. This article is part of the special issue "Analysis of atmospheric water vapour observations and their uncertainties for climate applications (ACP/AMT/ESSD/HESS interjournal SI)". It is not associated with a conference.
Acknowledgements. This study was partly funded by ESA (contract no. 4000123554) via the Water Vapour Climate Change Initiative (WV_cci) project of ESA's Climate Change Initiative (CCI). Tim Trent, Brian Kerridge, Richard Siddens, and John Remedios would also like to acknowledge the funding from the Natural Environment Research Council through the National Centre for Earth Observation, contract PR140015. This research used the ALICE High-Performance Computing Facility at the University of Leicester. Marc Schröder acknowledges the financial support of the EUMETSAT member states through CM SAF. The authors would like to thank the ARA/LMD group for the production, validation, and availability of the ARSA (Analyzed RadioSoundings Archive) database. The ARA/LMD group gratefully acknowledges the ECMWF data server for making available the ERA-Interim outputs, the radiosonde archive, and the surface station archive that comprises the ARSA database. The authors would also like to thank the two reviewers who provided very useful and constructive comments that helped refine the final manuscript.
Financial support. This research has been supported by the Natural Environment Research Council (grant no. PR140015) and the European Space Agency (grant no. 4000123554).
Review statement. This paper was edited by Chunlüe Zhou and reviewed by Xavier Calbet and one anonymous referee.

Figure 1. Example of the global mean differences between IMS temperature and water vapour profiles and ERA5 reanalysis for 15 June 2012. Also included are the standard deviations (SD) for the differences. Reanalysis has been interpolated to the observation time and the centre of the IASI instantaneous field of view. Before differences were calculated, the IMS averaging kernels were applied to the reanalysis profiles. Both IMS and ERA5 use IASI data. Therefore, differences are partly due to the differing backgrounds (a priori) and the different information extracted from the satellite radiances. For further discussion on averaging kernels refer to Sect. 3 (Methodology).

Figure 2. IASI channels used by the IMS scheme (indicated by vertical black lines) and nadir optical depths of major absorbers.

Figure 3. Visualization of IMS water vapour and temperature profile degrees of freedom for signal (DOFS) values. This figure illustrates the latitudinal distribution of DOFS variability for both IMS water vapour and temperature profiles. DOFS values were collected from the IMS L2 files between 2007-2016 and binned as a function of latitude. Values were then normalized using the total number of profiles in their respective latitude bin. The DOFS values vary between 2-7 for water vapour and 8-13 for temperature, with strong peaks in the tropics. The spread in the data resembles the cold-point tropopause height (TPH), especially for water vapour. The dashed black line represents the cold-point TPH calculated from ERA5 temperature profiles (Hersbach et al., 2020).

Figure 4. Example of averaging kernels (AKs) for IMS water vapour (q) and temperature (T) profiles extracted over the GRUAN Lindenberg (LIN) site. Panel (c) shows the cumulative degrees of freedom for signal (CDOFS) values, which illustrate how the vertical distribution of information content can be related to profile vertical resolution. In the lower-middle troposphere, the water vapour CDOFS aligns with a vertical resolution of 2 km. Above 8 km, the gradient swiftly becomes zero around the tropopause height (TPH). The temperature profile starts at a 1 km vertical resolution and reduces to 2 km at 6 km. Above this height, the gradient shows the vertical resolution changes to

Figure 5. Locations of sites within the two radiosonde archives: (a) the GCOS Reference Upper-Air Network (GRUAN) and (b) the Analyzed RadioSoundings Archive (ARSA). Locations are specific to upper-air soundings made between 1 June 2016 and 31 December 2016. Further details regarding the individual GRUAN sites are given in Table A1.

Figure 6. Comparisons of water vapour profiles from the IMS L2 product at GRUAN sites, matched at sites between 1 June 2007 and 31 December 2016 and with up to 80 % cloud cover. Median biases for atmospheric layers are shown with blue bars representing a median wet bias relative to GRUAN, while brown bars depict a median dry bias with respect to GRUAN. Panel (a) shows results for all data that pass quality control, with panel (b) showing the breakdown between day and night results. All biases have been normalized by the median GRUAN layer value for the site and multiplied by 100 (i.e. to scale to % ppmv). Dashed lines represent each layer's normalized median absolute deviation (MAD).
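The normalization described in this caption can be sketched for a single atmospheric layer as below. This is an illustrative reading of the caption and not the study's code: it assumes the bias is the median of the IMS minus radiosonde differences, that the MAD is taken about that median without any Gaussian scaling factor, and that both are divided by the median reference value. The function name and sample values are hypothetical.

```python
from statistics import median


def normalised_median_bias(retrieved, reference):
    """Return (median bias, MAD) for one layer, each expressed as a
    percentage of the median reference value."""
    if len(retrieved) != len(reference) or not reference:
        raise ValueError("need equally sized, non-empty samples")
    diffs = [r - s for r, s in zip(retrieved, reference)]
    ref_median = median(reference)
    bias = median(diffs)
    # Median absolute deviation of the differences about their median.
    mad = median([abs(d - bias) for d in diffs])
    return 100.0 * bias / ref_median, 100.0 * mad / ref_median


# Hypothetical layer values in ppmv for five match-ups at one site.
ims_layer = [110.0, 105.0, 95.0, 120.0, 100.0]
sonde_layer = [100.0, 100.0, 100.0, 100.0, 100.0]
bias_pct, mad_pct = normalised_median_bias(ims_layer, sonde_layer)
```

Using medians rather than means keeps a small number of poorly matched profiles from dominating the layer statistics, which matters at sites with few match-ups.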

Figure 7. Like Fig. 6 but for comparisons of IMS IASI temperature profiles at GRUAN sites.Median biases for atmospheric layers are shown with red bars representing a median warm bias relative to GRUAN.In contrast, dark blue bars depict a median cold bias with respect to GRUAN.The dashed lines represent the layer median absolute deviation (MAD) in kelvin.

Figure 8. IMS water vapour (a, c, e) and temperature (b, d, f) profile biases as a function of cloud fraction for all sites, calculated for 10 % cloud fraction bins for each tropospheric pressure layer. Differences between daytime and night-time biases and the bias for all sites are shown for water vapour and temperature, respectively.

Figure 9. Stacked histogram of IMS collocations to GRUAN sites split into 10 % cloud cover intervals.Results are separated into day and night cases, with the total number of matches shown at the top of each bar.A further subdivision of the first bin (0-10 %) can be seen in Fig. B1 of Appendix B.

Figure 10. IMS profile biases relative to ARSA radiosonde measurements. Profile biases are shown for all global water vapour (a) and temperature (c) results, as well as the split between day and night biases (b and d, respectively) in the same manner as Figs. 6 and 7. Biases for five broad latitude bands are also shown for water vapour and temperature comparisons (e, g). Day and night biases for each latitude band are also given in (f) and (g) for water vapour and temperature profiles, respectively. Median absolute deviations (MADs) are shown as dashed lines for all matches and the day and night split.

Figure 11. Cloud fraction effects on IMS biases relative to ARSA radiosonde water vapour and temperature profiles. Results here are broken down into global water vapour and temperature biases (a, d), daytime differences from global biases (b, e), night-time water vapour and temperature bias differences (c, f), and biases calculated for five broad latitudinal bands (g, j), as well as the daytime and night-time differences (h, i and k, l, respectively).

Figure 12. Like Fig. 9, binned sampling of IMS collocations over ARSA sites for global matches split into the five latitudinal bands used in this study. The total number of collocations is shown in the legend for each panel.

Table 1. Breakdown of GRUAN site-specific IMS water vapour profile biases and standard errors (given in % ppmv) for the same layers shown in Fig. 6. The site short names correspond to the details given in Table A1 and are accompanied by the total number of matches made with IMS. Sites appear in latitudinal order (north to south). After quality filtering, ≈ 90 % of all matches remain at all levels.

Table 2. As Table 1, breakdown of GRUAN site-specific IMS temperature profile biases and standard errors (given in K) for the same layers shown in Fig. 7.

Table 4. As Table 3 but for temperature profile biases. Trends are given in kelvin per decade.
However, night-time biases exceed daytime biases in layers above 700 hPa. Temperature biases in layers up to 500 hPa are warm-biased at night and cold-biased for daytime matches. Overall, night-time temperature profile biases dominate the global results.