On the Use of Routine Airborne Observations for Evaluation and Monitoring of Satellite Observations of Thermodynamic Profiles

Satellite-based observations require independent sources of data to monitor and evaluate their precision and accuracy. For the temperature and water vapor profiles produced by satellite-based sounders, this typically results in comparisons to operational radiosonde observations. However, polar-orbiting satellite overpasses are frequently misaligned with the global synoptic launch times. The routine airborne in situ observations of temperature and water vapor from the Aircraft Meteorological Data Relay (AMDAR) program and Water Vapor Sensing System-II (WVSS-II) instrument greatly enhance opportunities for making precise matchups due to the far greater temporal frequency and spatial density of aircraft flights. The potential for the use of aircraft-based observations as a source for evaluation of tropospheric satellite sounder profiles is explored through a year-long intercomparison with the IASI Level 2 profiles produced from both the Metop-A and Metop-B satellites. Results using 1 h and 50 km match criteria indicate good agreement between the satellites and the aircraft-based observations, with temperature, specific humidity, and relative humidity biases generally less than 0.5 K, 0.8 g kg−1, and 5 %, respectively; both IASI instruments perform nearly identically. While the intercomparisons are generally limited to the troposphere, as aircraft typically reach their maximum height at the tropopause, the substantially larger number of intercomparison points enables characterization as a function of season, scan angle, and other characteristics heretofore unexplored due to a lack of sufficient validation data.


Introduction
The advantages of low Earth-orbiting (LEO) satellites, also known as polar-orbiting satellites, are well known and include global coverage with coordinated sets of high-spatial-resolution instruments and frequent observations of polar regions. The impact of LEO observations on both atmospheric research and operational meteorology has been felt for over 60 years, as products obtained from these systems have progressed from simple black-and-white visible wavelength snapshots of the location and extent of daytime cloud cover to well-calibrated quantitative measurements of atmospheric and surface properties. One particularly useful application of satellite remote sensing of the atmosphere is the retrieval of thermodynamic profiles. Through passive remote sensing of upward atmospheric emission in the infrared and/or microwave bands and judicious use of statistical or physical retrievals, it is possible to obtain accurate vertical profiles of temperature and water vapor in the atmosphere. Information about atmospheric structure and stability is now available in places where such observations are otherwise sparse, such as over oceans, polar regions, and less-developed land areas. Thermodynamic soundings from LEO satellites have a diverse set of applications, including weather forecasting and nowcasting, climate monitoring (Schröder et al., 2018), and atmospheric composition (Clerbaux et al., 2009), where they are inputs to subsequent geophysical retrievals. In particular, improvements made in recent years in the precision and latency of the retrievals have made them important assets in support of the work of forecasters (Smith et al., 2021; Bloch et al., 2019; Herold and Hungershofer, 2019; Kocsis et al., 2018). Knowledge developed in support of current polar missions is also preparing the groundwork for nowcasting applications of the upcoming geostationary InfraRed Sounder (IRS) on board the Meteosat Third Generation (MTG; Holmlund et al., 2022). MTG-IRS will have the decisive advantage of an unprecedented three-dimensional look into the atmosphere with a substantial improvement in temporal resolution: every 30 min over Europe compared to the twice-daily revisits offered by an individual satellite.
For successful scientific and operational applications, it is essential to monitor the performance of the satellite products throughout the mission lifetime. The standard for evaluating thermodynamic profiles remains the balloon-borne radiosonde due to its ubiquity and well-characterized precision. However, the utility of the operational radiosonde network in evaluating LEO products is limited due to the large spatial and temporal gaps (up to several hours) between radiosonde launches and satellite overpasses. While radiosondes are primarily launched at standard synoptic observation times (such as 00:00 and 12:00 UTC), LEO overpasses are often synchronized with the Sun so that insolation and solar zenith angles can be relatively constant for all daytime observations over the course of a day. This does mean, however, that performance can only be monitored at certain locations at which radiosonde launches and LEO overpasses coincide within an acceptable time frame. For example, at the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT), the operational comparison window is as large as 3 h between radiosonde and satellite observations.
One possibility for augmenting the radiosonde validation matchups for the LEO thermodynamic products is the in situ observations from the commercial aviation network. Modern jetliners need to constantly monitor the atmospheric state (including winds, pressure, and temperature) to safely operate and navigate, and these observations have significant value to both the research community and national weather services. The World Meteorological Organization (WMO) developed the Aircraft Meteorological Data Relay (AMDAR; Moninger et al., 2003) program to collect, quality-control, and disseminate these observations in near real time. With every take-off and landing, aircraft from AMDAR profile the depth of the troposphere, and while cruising between airports these aircraft report substantial information about the near-tropopause environment and lower stratosphere. In addition, approximately 100 aircraft operating in the United States have been equipped with the Water Vapor Sensing System-II (WVSS-II; Petersen et al., 2016) to measure specific humidity (SH).
In the present work, we studied the use of AMDAR and WVSS-II observations for the validation of atmospheric temperature and humidity profiles retrieved from satellite sounders. The methodology was developed to evaluate and characterize the performance of the Infrared Atmospheric Sounding Interferometer (IASI; Cayla, 1993) level-2 (L2) temperature and water vapor products from the EUMETSAT Polar System (EPS; Klaes et al., 2007, 2021) over the continental United States (CONUS). However, this same technique can be applied to other similar satellite sounder products, including those originating from geostationary orbits. We have chosen to conduct this study over the CONUS as it offers a diverse set of meteorological conditions and surface types coupled with a high density of airborne observations (especially water vapor observations) relative to the rest of the planet. The large number of AMDAR observations fosters many more intercomparison points than is possible with radiosondes alone, enabling new categories of intercomparisons that would otherwise be difficult to perform with other datasets. The remainder of this paper explores the performance of IASI relative to AMDAR observations in a variety of different categorizations and offers insight into how the airborne observations can serve as part of an operational evaluation system for any satellite profiling system.

IASI observations and level-2 products
The EPS (Klaes et al., 2007, 2021) consists of three satellites, Metop-A, Metop-B, and Metop-C, launched in 2006, 2012, and 2018, respectively. These satellites have a Sun-synchronous orbit at a mean altitude of 817 km and a period of 101 min with a 29 d ground track repeat cycle. One of the primary instruments is the IASI (Blumstein et al., 2004; Hilton et al., 2012), a hyperspectral Fourier transform interferometer that observes over the spectral range of 645 to 2760 cm−1 (3.6 to 15.5 µm) with a spectral sampling of 0.25 cm−1 (0.5 cm−1 resolution), a horizontal resolution at nadir of 12 km, and a swath width of approximately 2000 km, which enables global coverage twice a day. The retrieval of geophysical parameters also exploits observations from the microwave companion instruments: the Advanced Microwave Sounding Unit (AMSU) and the Microwave Humidity Sounder (MHS). This infrared-microwave synergy allows for complete vertical profiling, including in cloudy environments. The combined retrieval is referred to as an IASI level-2 product for convenience. IASI-only retrievals are the fall-back mode in case the microwave sensors are unavailable, but evaluating the performance of this mode is outside the scope of this study.
The retrieval methodology (Hultberg and August, 2014; August et al., 2012) is based on machine learning techniques and constitutes the operational baseline of all EUMETSAT hyperspectral missions, both current (IASI) and future (MTG-IRS, IASI-Next Generation). It implements a piecewise linear regression (PWLR). In this approach, a training base is constructed with over 100 million real IASI and AMSU-MHS observations collocated with model reanalysis data from the ECMWF (ERA-5; Hersbach et al., 2020). The observed spectra (predictors) and the atmospheric profiles (predictands, originally on 137 surface-dependent pressure levels) are represented in principal component scores (PCSs). The satellite observations, once in PCSs, are partitioned into observation classes by application of k-means clustering (MacQueen, 1967). A linear regression is then performed in each individual observation class between the IASI and AMSU-MHS observations and the geophysical parameters from ERA-5. Once the regression coefficients are derived, a second linear regression is applied between the observations (in PCSs) and the absolute training error in the lower troposphere (i.e., the difference between the training and retrieved quantities at the bottom of the troposphere). This forms an uncertainty estimate which users can use as a quality indicator to perform data selection adapted to their applications.
In the retrieval stage, the satellite measurements are mapped onto the observation classes and the regression coefficients are applied to retrieve the geophysical information as well as the uncertainty estimates. Further details are provided in the MTG-IRS L2 Algorithm Theoretical Baseline Document (EUMETSAT, 2021). Longer-term records of temperature and humidity products like those evaluated here have been subject to numerous validation studies, essentially using radiosondes (EUMETSAT, 2016, 2018, 2022; Feltz et al., 2017; Roman et al., 2016; Boylan et al., 2016). Regional products are available with a latency of 15 to 30 min, and global products are available within 1.5 h.
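The retrieval stage described above (assign an observation to a class, then apply that class's linear regression) can be sketched in a few lines. This is a toy illustration only, not the operational EUMETSAT code: the function names, the Euclidean nearest-centroid assignment, and all numerical values are assumptions made for the example.

```python
import math

def nearest_class(pcs, centroids):
    """Index of the centroid closest (Euclidean distance) to the PCS vector."""
    return min(range(len(centroids)), key=lambda k: math.dist(pcs, centroids[k]))

def retrieve(pcs, centroids, coeffs, offsets):
    """Apply the linear regression y = A_k x + b_k of the matched class k."""
    k = nearest_class(pcs, centroids)
    matrix, offset = coeffs[k], offsets[k]
    y = [sum(a * x for a, x in zip(row, pcs)) + b for row, b in zip(matrix, offset)]
    return k, y

# Hypothetical toy setup: two classes, two predictors, one predictand.
centroids = [[0.0, 0.0], [10.0, 10.0]]
coeffs = [[[1.0, 0.0]], [[0.0, 2.0]]]   # one regression matrix per class
offsets = [[0.5], [-1.0]]               # one offset vector per class

k, y = retrieve([9.0, 11.0], centroids, coeffs, offsets)
```

In this toy case the observation falls in the second class, so only that class's coefficients are used. The same structure could carry a second set of coefficients for the uncertainty estimate.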
As Sun-synchronous polar-orbiting satellites, Metop-A and Metop-B have an ascending node covering the CONUS at night (roughly 21:00 to 22:00 LT depending on the location and time zone) and a descending node during the day (roughly 09:00 to 10:00 LT). These are both periods in which substantial aircraft reports are available over much of the CONUS. With radiosonde launches timed to observe at 00:00 and 12:00 UTC, this means that CONUS-based radiosondes are valid for local times between 04:00 and 08:00 as well as between 16:00 and 20:00. While eastern radiosonde launches tend to align with Metop overpasses, the western launches are well before Metop arrives. This limits the potential satellite-radiosonde intercomparisons to specific regions and illustrates why additional sources of evaluation data may be desired.

AMDAR observations
With so many high-quality in situ observations being made by commercial aircraft every day, it is natural that the aviation industry and the meteorological community have partnered together to exploit their benefits. Through the AMDAR program, participating airlines share their meteorological data with each other and with various national weather agencies. Costs for the system, which mostly consist of data transfer and quality control, are generally borne by the weather agencies and are substantially less than the operational costs of even a modest network of consumable radiosondes (WMO, 2014).
The observations are transmitted to the surface via the different air-to-ground communication protocols in place throughout the world. In North America, this is accomplished via the Aircraft Communications Addressing and Reporting System (ACARS), which is another name by which the airborne observations are sometimes known. Participation in the AMDAR program is voluntary, but most major carriers in the United States and western Europe contribute observations. Data coverage is densest over those regions, which means that the locations of the observations tend to be biased toward more populated regions of the Northern Hemisphere. Several studies evaluating the accuracy of AMDAR observations have been carried out; Zhang et al. (2018) summarize many of them. More recently, Wagner and Petersen (2021) performed a year-long CONUS-wide evaluation of AMDAR-observed temperatures against operational radiosondes and found excellent agreement between the two systems, with a small cool bias of 0.2 K and a standard deviation of 0.8 K. It is unsurprising, therefore, that with so many well-characterized measurements, assimilating AMDAR observations into numerical weather prediction (NWP) models has been found by numerous studies to have a significant positive impact on forecasts; a review of many of these studies can be found in Petersen (2016).
There are some noteworthy differences between AMDAR observations and radiosondes that largely arise out of the airlines' primary purpose of the safe transfer of people and goods instead of meteorological data collection. Regular diurnal, hebdomadal, and annual cycles in the number of observations correspond to typical fluctuations in air traffic (e.g., more during the day than at night, more on weekdays than weekends, and more during winter holidays and summer than other months). Few observations are made within or adjacent to severe storms or tropical systems, profiles rarely extend above 160 hPa as most planes do not cruise that high, and large-scale disruptions to air travel like the COVID-19 pandemic can greatly reduce the number of observations. Despite these limitations, the AMDAR dataset provides far greater spatial and temporal density throughout the troposphere than is possible with the radiosonde network.

WVSS-II observations
Water vapor observations do not have the same immediate utility to aircraft in flight that observations of temperature, pressure, and winds do. Therefore, humidity sensors are not included as standard equipment by the major aircraft manufacturers. However, the atmospheric science community has recognized that significant value would be added to the already beneficial AMDAR observations if they could be augmented by in situ measurements of water vapor. This has culminated in the creation of the WVSS-II sensor. Williams et al. (2021) and Wagner and Petersen (2021) conducted CONUS-wide comparisons between WVSS-II and operational national weather service radiosondes and found good agreement between the two systems, with a bias of approximately 0.3 g kg−1 and a standard deviation of approximately 1 g kg−1 near the surface that decreases with height as the absolute water vapor content decreases. The WVSS-II observations have also been shown to have a significant positive impact on NWP (Hoover et al., 2017; Petersen et al., 2016). A map showing the distribution of WVSS-II observations across the CONUS is shown in Fig. 1c. For the sake of convenience, this paper uses the term AMDAR to include all the aircraft-based observations, whether or not water vapor observations are included.

Methodology
The present work uses the entirety of IASI and AMDAR observations for the calendar year of 2017. This represents a period in which two separate IASI-supporting satellites, Metop-A and Metop-B, were operational and the significant flight disruptions due to the COVID-19 pandemic had yet to be realized. Observations were considered to be matched if they occurred within 50 km and ±1 h of each other. Since this matching radius is larger than the footprint of an IASI pixel, the same AMDAR observation could be matched to more than one IASI pixel simultaneously, while multiple AMDAR observations could be matched to the same IASI observation. IASI profiles were interpolated onto a vertical grid with three bins per 100 hPa, and the observational differences relative to AMDAR were calculated on that grid. AMDAR observations are reported in pressure altitude, which is easily converted to pressure using the standard atmosphere. AMDAR profiles were also interpolated to that vertical grid to facilitate the intercomparisons. The IASI level-2 algorithm retrieves SH, which is also directly observed by WVSS-II.
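The conversion from pressure altitude to pressure mentioned above can be sketched as follows. The constants are those of the ICAO/ISA standard atmosphere; the function name is hypothetical, and the sketch assumes altitudes below 20 km (a lapse-rate layer up to 11 km and an isothermal layer above), which covers the range of aircraft reports.

```python
import math

P0_HPA = 1013.25     # sea-level pressure of the standard atmosphere (hPa)
T0_K = 288.15        # sea-level temperature (K)
LAPSE_K_M = 0.0065   # tropospheric lapse rate (K per m)
G = 9.80665          # gravitational acceleration (m s-2)
RD = 287.053         # gas constant for dry air (J kg-1 K-1)
H_TROP_M = 11000.0   # top of the constant-lapse-rate layer (m)

def pressure_from_pressure_altitude(h_m):
    """Standard-atmosphere pressure (hPa) for a pressure altitude in meters."""
    exponent = G / (RD * LAPSE_K_M)
    if h_m <= H_TROP_M:
        return P0_HPA * (1.0 - LAPSE_K_M * h_m / T0_K) ** exponent
    # Isothermal layer above 11 km: pressure decays exponentially.
    t_trop = T0_K - LAPSE_K_M * H_TROP_M
    p_trop = P0_HPA * (t_trop / T0_K) ** exponent
    return p_trop * math.exp(-G * (h_m - H_TROP_M) / (RD * t_trop))
```

For instance, a pressure altitude of 5500 m maps to roughly 505 hPa, and 11 000 m to roughly 226 hPa.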
The relative humidity (RH) values investigated here were independently computed for the IASI and aircraft observations at each reported pressure level using the Bolton (1980) formula:

$$e_s(T) = 611.2 \exp\left(\frac{17.67\,T_c}{T_c + 243.5}\right), \qquad (1)$$

where $e_s$ is the saturation vapor pressure (Pa) and $T_c$ is the temperature in degrees Celsius. The quality control applied here retained Metop retrievals where the uncertainty estimates of temperature and dew point profiles are better than 1.5 K and 2.5 K, respectively.
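As a minimal sketch of how RH might be derived from SH, temperature, and pressure with Eq. (1): the conversion of SH to vapor pressure below uses the standard relation e = q p / (ε + (1 − ε) q) with ε = 0.622, which is an assumption here since the text does not spell out that step. Function names are hypothetical.

```python
import math

EPSILON = 0.622  # ratio of the gas constants of dry air and water vapor

def saturation_vapor_pressure(t_c):
    """Bolton (1980) saturation vapor pressure (Pa) for temperature in deg C."""
    return 611.2 * math.exp(17.67 * t_c / (t_c + 243.5))

def relative_humidity(q_kg_kg, t_k, p_hpa):
    """RH (%) from specific humidity (kg/kg), temperature (K), pressure (hPa)."""
    p_pa = p_hpa * 100.0
    # Vapor pressure implied by the specific humidity (assumed relation).
    e = q_kg_kg * p_pa / (EPSILON + (1.0 - EPSILON) * q_kg_kg)
    return 100.0 * e / saturation_vapor_pressure(t_k - 273.15)
```

Because the exponential in Eq. (1) is nonlinear in temperature, a symmetric temperature error does not produce a symmetric RH error, which is why RH is evaluated separately from temperature and SH.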
It is important to note that this study accounts for the spatial drift of the observations when doing the matching; since an airplane usually undergoes significant horizontal displacement during ascent and descent, a given airplane may be matched to one IASI profile near the surface and a different profile after having ascended or descended for a period. Many previous IASI validations, including the operational comparisons carried out at EUMETSAT, have used a no-drift assumption with respect to the radiosonde, as many operational radiosonde data feeds do not retain geographical coordinates beyond the launch site. Each AMDAR observation includes the latitude and longitude of the observation, which makes these direct geographic comparisons possible.
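The per-observation matching described above can be sketched with a great-circle distance and a time window. This is a brute-force illustration under stated assumptions: the dictionary keys, the function names, and the sample coordinates are hypothetical, and a real implementation over a full year of data would need a spatial index rather than a double loop.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def match(aircraft_obs, iasi_pixels, max_km=50.0, max_dt=timedelta(hours=1)):
    """Return (aircraft index, pixel index) pairs within the space-time window."""
    pairs = []
    for i, obs in enumerate(aircraft_obs):
        for j, pix in enumerate(iasi_pixels):
            close_in_time = abs(obs["time"] - pix["time"]) <= max_dt
            if close_in_time and haversine_km(obs["lat"], obs["lon"], pix["lat"], pix["lon"]) <= max_km:
                pairs.append((i, j))
    return pairs

# Hypothetical ascent: the aircraft drifts east and changes the pixel it matches.
overpass = datetime(2017, 6, 1, 15, 30)
pixels = [{"lat": 43.0, "lon": -89.0, "time": overpass},
          {"lat": 43.0, "lon": -88.0, "time": overpass}]
ascent = [{"lat": 43.0, "lon": -89.1, "time": datetime(2017, 6, 1, 15, 10)},
          {"lat": 43.0, "lon": -88.2, "time": datetime(2017, 6, 1, 15, 20)},
          {"lat": 43.0, "lon": -88.2, "time": datetime(2017, 6, 1, 17, 45)}]
pairs = match(ascent, pixels)
```

Because each aircraft report carries its own position, the first two reports of the same ascent pair with different pixels, while the third falls outside the time window and is not matched.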
Throughout this paper, the bias is calculated as the mean of IASI-minus-AMDAR differences at a specific height, meaning that the aircraft-based observations are the reference state for the intercomparison. The Williams et al. (2021) and Wagner and Petersen (2021) studies showed that the airborne observations compare very favorably with operational National Weather Service (NWS) radiosondes over similar spatial and temporal domains, which makes this assumption appropriate. The standard deviations of the differences were calculated as well, and these results are shown throughout Sect. 4.
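The bias and standard deviation defined above can be computed per vertical level as in the following sketch. The input layout (tuples of level, IASI value, aircraft value) and the function name are assumptions for illustration, and the sample standard deviation is used here without knowing whether the study used the sample or population form.

```python
import statistics
from collections import defaultdict

def bias_and_std(matchups):
    """Per-level mean and standard deviation of IASI-minus-AMDAR differences.

    matchups: iterable of (pressure_level_hpa, iasi_value, aircraft_value).
    """
    diffs = defaultdict(list)
    for level, iasi, aircraft in matchups:
        diffs[level].append(iasi - aircraft)
    stats = {}
    for level, values in diffs.items():
        spread = statistics.stdev(values) if len(values) > 1 else float("nan")
        stats[level] = (statistics.fmean(values), spread)
    return stats

# Hypothetical temperature matchups (K) at a single level.
sample = [(500, 250.0, 250.5), (500, 251.0, 251.1), (500, 249.0, 249.6)]
stats = bias_and_std(sample)
```

With this sign convention a negative bias means that IASI is cooler (or drier) than the aircraft reference.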

Global statistics for Metop-A and Metop-B separately
Since the dataset consists of two different IASI instruments, the first step in the analysis is to determine whether the separate instruments exhibit similar behaviors. The results from this investigation are shown in Fig. 2. Overall, the two instruments agree remarkably well, with effectively identical biases and standard deviations relative to AMDAR at all the analyzed heights. The overall pattern of IASI performance relative to AMDAR can also be discerned from Fig. 2. Except at the surface, where IASI is effectively unbiased, the IASI temperature retrievals (Fig. 2a) have slight cool biases of 0.2 to 0.4 K up to 260 hPa. Above that height, the magnitude of the cool bias increases to approximately 0.6 K at 200 hPa. The standard deviation of the differences in the temperatures, which is a measure of uncertainty in the retrieval, is at its greatest at the surface, with a value of approximately 2.4 K. This decreases with height to only about 1 K in the middle to upper troposphere (400 to 500 hPa), above which it increases again to 2 K at 200 hPa. This largely aligns with the profile of the bias and standard deviation of differences between IASI and radiosondes, although the magnitudes of the standard deviations are larger for the AMDAR comparisons than they are for the radiosondes (EUMETSAT, 2022).
The SH bias (Fig. 2b) is dry, is greatest at the surface at approximately 0.8 g kg−1, and decreases with increasing altitude to near zero at 200 hPa. Since SH is an absolute measure of the water vapor content, which decreases with height due to decreasing temperature and increasing distance from evaporative sources, this decrease is expected. Likewise, the standard deviations in water vapor differences also decrease with height, from 2.2 g kg−1 at the surface to about 0.4 g kg−1 at 200 hPa. Dry biases in humidity have not necessarily been observed to that extent when evaluated with respect to radiosondes (EUMETSAT, 2022); however, it is important to note that the geographical and diurnal samplings are different. The eastern and southern United States contain the most water vapor intercomparisons (Fig. 1), and these regions have a moisture content that might not be consistent with the global regions sampled by the operational radiosonde intercomparisons. Previous work has shown a small moist bias in AMDAR observations when compared to collocated radiosondes (Wagner and Petersen, 2021), which could account for part of the bias reported here. The random component of the differences in SH (a measure of the precision of the observations) is consistent with the results from radiosonde intercomparisons.
Because the end-user requirements for IASI were also specified in terms of derived RH, it is useful to examine the comparisons for that measure (Fig. 2c). The differences will reflect the combined effects of biases already quantified in temperature and water vapor; note that, due to the nonlinear relationship between temperature and SH in RH, even unbiased inputs can create a biased RH. In this case, the RH bias is consistently between 2 % and 4 % from the surface to 250 hPa, above which it steadily increases to 8 % at 200 hPa. The general trends in temperature and water vapor biases tend to offset each other with increasing height away from the surface in the lower troposphere, resulting in an RH bias profile that is nearly constant with height; however, since the airborne water vapor observations are not geographically distributed in the same way as the temperature observations, care must be taken when quantitatively comparing RH to SH and temperature. Near the tropopause, the SH levels are so low that the temperature bias dominates the RH, resulting in nearly identical curve shapes in both Fig. 2a and c. [...] the user requirements for IASI of 10 % precision in RH in the 1 to 2 km layers.
It is important to note that the differences in Fig. 2, while very small, are statistically significant: at certain heights the bias between Metop-A and Metop-B differs by less than 0.01 K, yet according to a two-sample t test that difference is still statistically significant at the 95 % confidence level due to the thousands of observations present at that height. In a practical sense, however, the two datasets are functionally identical, with changes in bias that are well within the uncertainty of the instruments themselves. Because of this, subsequent analyses will focus on a single large combined dataset comprising both Metop-A and Metop-B. The significant spatial and temporal density of AMDAR observations provides an opportunity to evaluate satellite performance using previously unassessed matchup characteristics. Several examples of questions that cannot be addressed using traditional twice-daily synoptic radiosonde observations as a ground truth for validating LEO retrievals follow. The additional stratifications shown below may have a smaller number of observations in each subset. However, they also have larger differences.

Sensitivity to the viewing angle
One test of the utility of AMDAR reports as a validation standard regards the recurring question of the degree to which satellite retrieval performance varies as a function of scan angle. IASI scans up to 48° on either side of the nadir, with scans farther away from the nadir having a longer geometric path through the atmosphere and a larger spatial footprint. Figure 3 explores the impact that these factors might have on the accuracy of the profile retrievals by evaluating IASI performance as a function of the scan angle. All the observations were sorted into bins with 10° increments, except for the highest bin, which includes the maximum zenith angle of approximately 58.5°. The discrepancy between the maximum scan and zenith angles is a result of the curvature of the Earth. The darker blue colors in Fig. 3 represent the profiles from viewing angles closer to the nadir, while the lighter green colors indicate more oblique views. It is evident that the cool temperature bias near the surface and tropopause (Fig. 3a) generally becomes colder with an increasing scan angle, increasing from about 0.1 to 0.4 K. Mean differences are smaller in the middle troposphere, with very little angle dependence between 60 and 700 hPa. The uncertainties also have a small but discernible dependence on the scan angle, with the more vertically pointing views tending to have a smaller random error than the more slanted views. The spread in the uncertainties is largest above 400 hPa, where they differ by approximately 0.13 K.
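The effect of Earth curvature on the viewing geometry can be illustrated with a spherical-Earth sketch. The relation sin(zenith) = ((R + h) / R) sin(scan) follows from the law of sines for a satellite at altitude h; the mean Earth radius, the function names, and the six-bin layout are assumptions for illustration. For a 48° scan angle this simple model gives a zenith angle of roughly 57°, somewhat below the maximum of about 58.5° quoted above, so it should be read as showing the mechanism rather than reproducing the instrument geometry.

```python
import math

EARTH_RADIUS_KM = 6371.0    # mean Earth radius (assumed spherical Earth)
METOP_ALTITUDE_KM = 817.0   # mean orbit altitude given in the text

def zenith_from_scan(scan_deg, altitude_km=METOP_ALTITUDE_KM):
    """Local zenith angle (deg) at the surface for a given scan angle (deg)."""
    ratio = (EARTH_RADIUS_KM + altitude_km) / EARTH_RADIUS_KM
    return math.degrees(math.asin(ratio * math.sin(math.radians(scan_deg))))

def angle_bin(angle_deg, width_deg=10.0, n_bins=6):
    """Index of the 10-degree bin; the last bin absorbs everything above 50 deg."""
    return min(int(angle_deg // width_deg), n_bins - 1)
```

The zenith angle always exceeds the scan angle away from the nadir, and the gap widens toward the edge of the swath.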
Similar behaviors can be seen with respect to SH differences, which remain underestimated throughout the depth of the troposphere. In the lower half of the troposphere, where the largest amount of moisture is concentrated (Fig. 3b), results show slightly worsened biases and uncertainty at the higher zenith angles. Overall, the differences caused by viewing angle changes in IASI humidity profiles remain consistent within 0.1 to 0.2 g kg−1, which is small compared to the uncertainty budget. The profiles of derived RH bias and uncertainty (Fig. 3c) do not show as clear a dependency on the scan angle as those of temperature and SH. The largest dispersion is observed for the widest angles but stays well within 2 % RH on average. Again, this is likely due to the offsetting biases of the temperature and water vapor observations.

Differences between day and night retrievals
With a clear demarcation between daytime and nighttime nodes, the Metop satellites view the CONUS under two very different solar conditions. While CONUS radiosonde launches tend to be near dawn and sunset, the AMDAR observations have a much broader distribution, which enables direct evaluation of both daytime and nighttime observations. Figure 4 explores the differences in IASI performance as a function of day versus night. Overall, the day and night statistics are individually quite close to the bulk statistics presented previously, with both nodes showing persistent dry biases throughout the analyzed depth, but slight differences exist. Near 1000 hPa, the nighttime observations show a relatively small warm bias of approximately 0.2 K, while the daytime observations have a slight cool bias of less than 0.1 K; this would only be noticeable for coastal locations as most other airports are located at lower pressures. The differences between the temperature biases lessen with increasing height up to 850 hPa, above which they are effectively the same throughout the rest of the analyzed depth. The differences in the observation uncertainty between day and night are very small.
The differences in SH are also small but are slightly more pronounced. Differences in biases are most prevalent below 750 hPa, with daytime retrievals being as much as 0.15 g kg−1 too dry, while the nighttime retrievals are slightly less biased than the daytime ones throughout the lowest 250 hPa of the atmosphere. Subtle differences in the derived RH bias are most prevalent between 750 and 950 hPa, with day-night variations of 1 % to 2 % RH at the maximum. As seen before, there is little difference in the temperature bias but more noticeable differences in the SH bias that can lead to a stronger dry bias during the day than at night at those levels. Outside that range, differences in the RH biases between day and night are small as the temperature and water vapor biases either offset or are too small to have an impact. More significant day-night differences are observed in the variability of the SH measurements. As measured by the standard deviation, daytime retrievals are consistently more precise than night retrievals by 0.1 to 0.3 g kg−1 throughout the troposphere, possibly due to more vertically constant moisture profiles in the boundary layer during the day. This is also reflected in the RH statistics, where the derived daytime retrievals show up to 3 % less variability from AMDAR reports throughout the troposphere than at night. The improved water vapor sensing performance during daytime could be explained by the fact that surfaces are then warmer than at night, resulting in stronger thermal contrasts, which are more favorable for atmospheric sounding.
https://doi.org/10.5194/amt-17-1-2024 Atmos. Meas. Tech., 17, 1-14, 2024

Differences as a function of season
The availability of AMDAR reports across the full CONUS provides a unique opportunity to validate LEO products derived from multiple overpasses throughout the year. The year-long evaluation dataset also affords the opportunity to evaluate whether the aircraft reports can be used to determine how IASI system performance varies as a function of season. For these tests, the matchup data from 2017 were binned into the standard seasons according to the month in which they were taken: winter (January, February, December), spring (March, April, May), summer (June, July, August), and fall (September, October, November). Since the data presented here are limited to the calendar year 2017, the winter season is discontinuous. Results are shown in Fig. 5. Compared to the day-night differences, the seasonal analysis indicates greater contrasts in performance, with the differences between seasonal temperature biases being larger than 0.5 K at most levels of the atmosphere; above 300 hPa the bias can change by upwards of 1 K between seasons. However, there appears to be little relationship between overall environmental temperature and the magnitude of the bias; for example, the winter and summer bias profiles cross repeatedly throughout the analyzed depth. The spread in the profiles of temperature standard deviation is also larger seasonally than diurnally, with higher precision being reached in summertime.
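The seasonal binning by calendar month described above amounts to a simple lookup. The sketch below groups matchup differences by season; the function name, the input layout, and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Month-to-season mapping as defined in the text (winter = Jan, Feb, Dec).
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}

def group_by_season(matchups):
    """Group differences by season; matchups is an iterable of (datetime, diff)."""
    groups = defaultdict(list)
    for when, diff in matchups:
        groups[SEASON_BY_MONTH[when.month]].append(diff)
    return dict(groups)

# Hypothetical differences: January and December 2017 share the winter bin.
sample = [(datetime(2017, 1, 15), -0.3),
          (datetime(2017, 12, 20), -0.1),
          (datetime(2017, 7, 4), -0.5)]
by_season = group_by_season(sample)
```

Because the grouping is by month within a single calendar year, the winter bin combines the start and the end of 2017, which is the discontinuity noted above.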
While there is a large spread in SH uncertainties, the uncertainties themselves largely correspond to the absolute amount of water vapor typically present in the atmosphere at each time and pressure level. Larger uncertainties are present nearer the surface than at higher altitudes, while substantially larger uncertainties are found during the summer than during the winter, when the absolute water vapor content is significantly lower. It is worth noting that the seasonal spread in the SH uncertainty only shows a small change with height, even as the magnitudes of the uncertainties themselves trend downward significantly with height. The corresponding statistics in RH profiles show little variability in bias. The most noticeable differences in agreement between the AMDAR and IASI observations occur between the surface and 700 hPa, where they can be as much as 5 %. The greatest agreement again occurs during the summer, when warmer surfaces could provide more favorable thermal contrasts for sounding.

Differences as a function of environmental characteristics
While it is natural to focus on retrievals as a function of pressure (and, by extension, altitude), this is not the only way to assess the performance of the IASI level-2 products. An unanswered question remains: how well can AMDAR observations be used to determine how retrievals perform as a function of the value of the quantity being retrieved? For example, the vertical profile of the standard deviation of temperature differences largely follows the vertical profile of temperature itself: highest at the surface and above the tropopause and lowest in the middle. Are the observed differences therefore a function of pressure and altitude, or are they actually a function of the underlying temperature? It is therefore appropriate to investigate the performance of the IASI level-2 retrievals at various temperature and humidity thresholds. As with the other comparisons presented here, this increased level of investigation requires a more expansive dataset than can be provided by using standard radiosondes. Such relationships are examined in greater detail in Fig. 6. The IASI-minus-AMDAR differences are plotted as a function of the AMDAR value used in those calculations. Each panel illustrates the median difference (dots) as well as the interquartile range (lines) for the differences.
As can be seen in Fig. 6a, there is a clearly identifiable trend toward increasing warm bias as temperatures drop below 220 K. Such temperatures are typically only found at near-tropopause levels, which is consistent with the pressure-based plots discussed earlier, and only 12.1 % of the observations in this dataset are that cold. The majority of the observations reflect the slight cool bias previously observed. Between 220 and 280 K (60.7 % of the dataset), the magnitude of the bias is consistent. It is in this range, especially between 230 and 260 K, that the interquartile range is at its smallest. Between 280 and 300 K (25.7 % of the dataset), the cool bias strengthens with increasing temperature, reaching approximately −1.0 K at 300 K. At temperatures warmer than 300 K (the remaining 1.5 % of the dataset), the cool bias weakens slightly with increasing temperature.
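The per-bin statistics described above (median and interquartile range of the differences, grouped by the AMDAR-observed value) can be sketched as follows. This is a minimal illustration and not the processing code used for this study: the function name `binned_quartiles`, the left-edge binning convention, and the synthetic temperatures are all assumptions made for the example.

```python
import numpy as np

def binned_quartiles(reference, difference, bin_width):
    """Median and 25th/75th percentiles of `difference` in fixed-width bins of `reference`.

    Hypothetical helper: bins are [k*width, (k+1)*width) and are keyed by their left edge.
    """
    reference = np.asarray(reference, dtype=float)
    difference = np.asarray(difference, dtype=float)
    # Integer bin index for every matchup.
    index = np.floor(reference / bin_width).astype(int)
    stats = {}
    for k in np.unique(index):
        in_bin = difference[index == k]
        q25, q50, q75 = np.percentile(in_bin, [25, 50, 75])
        stats[float(k * bin_width)] = {
            "n": int(in_bin.size),
            "q25": float(q25),
            "median": float(q50),
            "q75": float(q75),
        }
    return stats

# Synthetic values for illustration only; these are not data from the study.
amdar_t = np.array([221.0, 222.0, 224.0, 251.0, 252.0, 253.0, 254.0])
iasi_t = amdar_t + np.array([-0.2, -0.4, -0.6, 0.1, -0.1, -0.3, -0.5])

# Differences follow the IASI-minus-AMDAR convention; bins are 5 K wide.
result = binned_quartiles(amdar_t, iasi_t - amdar_t, bin_width=5.0)
for left_edge, s in sorted(result.items()):
    print(left_edge, s["n"], round(s["median"], 2), round(s["q75"] - s["q25"], 2))
```

Reporting the sample count alongside each bin matters here because, as the text notes, the bins at the cold and warm extremes hold only a small fraction of the matchups.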
The specific humidity is captured in bins of 1 g kg −1 width centered on integer values (Fig. 6b). There is a clear trend of the retrievals underestimating moisture content as SH increases. Where SH values are small, the absolute difference between the IASI and WVSS-II observations is also small, hence the lack of a discernible interquartile range associated with the driest environments even though the number of observation points is largest. The relative error may be large, however. By contrast, for environments with an SH of 20 g kg −1, the bias is approximately −3.3 g kg −1. Most of the intercomparisons are at low values of SH, as the median SH in the intercomparison dataset is only 0.78 g kg −1. While the width of the interquartile range is mostly constant over a large range of specific humidities, varying by less than 0.5 g kg −1 between 6 and 12 g kg −1, that range encompasses only 15 % of all the observations. At high moisture levels, the range of differences is small, implying that IASI agrees well with AMDAR in high-moisture environments where accurate assessment of moisture content could have a high impact on operational forecasting and NWP.
RH (Fig. 6c) is similar to SH in that the dry bias grows increasingly large as RH approaches 100 %, but unlike SH, this cannot solely be ascribed to small differences in water vapor content. Since it is possible for a given RH value to occur at effectively any pressure or temperature, this implies that the trend towards IASI retrievals underestimating higher water vapor content is not an artifact of altitude or absolute vapor content but instead is a legitimate issue that may need further addressing.
https://doi.org/10.5194/amt-17-1-2024 Atmos. Meas. Tech., 17, 1-14, 2024
Figure 6. Plots of the distribution of observed IASI-minus-AMDAR differences for temperature (a, K), SH (b, g kg −1), and RH (c, %) as a function of the AMDAR-observed value for that quantity. Bins are 5 K, 1 g kg −1, and 5 %, respectively. Lines extend from the 25th to 75th percentiles in each bin, while the dots indicate the median for that bin. The number of observations in each bin is also displayed; larger numbers are rounded to the nearest thousand or 0.1 million as appropriate.

Differences as a function of geographical location
As mentioned before, one of the key advantages to using the airborne dataset to evaluate satellite observations is that it is not as spatially limited as radiosonde datasets are. While AMDAR observations at lower altitudes tend to be clustered near population centres due to the presence of major airports, at higher altitudes the entire CONUS is blanketed by airborne observations as commercial aircraft cruise between airports. This allows for the evaluation of what, if any, spatial dependency the IASI profiles may have. Figure 7 depicts such an analysis. In this case, all intercomparisons from the 300 hPa pressure level and higher were binned according to their geographic location into 1° latitude by 1° longitude bins; this height was chosen because the spatial extent of airborne observations is much more continuous at these cruising levels than it is closer to the surface. The mean and standard deviation of the differences in each bin were calculated and then plotted onto maps. The overarching message is that, regardless of the quantity being measured, there is little systematic spatial variation, either geographically or between continental and maritime observations. The bias and standard deviation of the differences are mostly uniform across the analyzed region, with the same cool and dry biases observed above 300 hPa in the vertical profile analysis. Much of the variability between adjacent bins found in these maps can be attributed to the relative differences in the number of intercomparisons in each bin. As Fig. 1 shows, there are more temperature observations than moisture ones, more observations over land than over the ocean, and more observations in the middle of the CONUS than along the Canadian or Mexican borders.
Regions and parameters with fewer observations exhibit a greater degree of variability from one bin to the next, visible as checker-boarding in the figures, than those with more observations.
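The gridding step described above, in which matchups are grouped into 1° latitude by 1° longitude bins and the mean and standard deviation of the differences are taken per bin, might look like the sketch below. It is illustrative only: the function name `grid_bias_and_std`, the keying of cells by their south-west corner, the use of the population standard deviation, and the synthetic matchups are assumptions and are not taken from the study.

```python
import numpy as np

def grid_bias_and_std(lat, lon, difference, cell_deg=1.0):
    """Mean (bias) and standard deviation of `difference` per lat/lon cell.

    Hypothetical helper: cells are keyed by the (lat, lon) of their south-west corner.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    difference = np.asarray(difference, dtype=float)
    rows = np.floor(lat / cell_deg).astype(int)
    cols = np.floor(lon / cell_deg).astype(int)

    # Collect the differences that fall in each cell.
    cells = {}
    for r, c, d in zip(rows, cols, difference):
        cells.setdefault((int(r), int(c)), []).append(float(d))

    # Population standard deviation (ddof=0) is an assumption made for this sketch.
    return {
        (r * cell_deg, c * cell_deg): {
            "n": len(values),
            "bias": float(np.mean(values)),
            "std": float(np.std(values)),
        }
        for (r, c), values in cells.items()
    }

# Synthetic matchups for illustration only; these are not data from the study.
lat = [40.2, 40.7, 40.9, 35.1]
lon = [-100.5, -100.1, -100.9, -90.3]
diff = [-0.5, -0.3, -0.4, 0.2]  # IASI-minus-AMDAR temperature differences, K

grid = grid_bias_and_std(lat, lon, diff)
for corner, s in sorted(grid.items()):
    print(corner, s["n"], round(s["bias"], 2), round(s["std"], 3))
```

The cell holding a single matchup returns a standard deviation of zero and a bias equal to that one difference, which is a small-scale version of the checker-boarding seen where intercomparisons are sparse. Carrying the count `n` with each cell makes it straightforward to mask or flag such bins.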

Summary and conclusions
In the present work, the value of the routine observations of atmospheric conditions observed by commercial aircraft during the course of their regularly scheduled flights for the evaluation of satellite-based observations was demonstrated. While this work used these observations to analyze the performance of IASI level-2 temperature and moisture profiles as processed by EUMETSAT, a similar analysis could be conducted on any thermodynamic profiling satellite or retrieval system, including the retrievals produced for various satellites by the NOAA Unique Combined Atmospheric Processing System (NUCAPS) or for existing or forthcoming hyperspectral geostationary sounders. Furthermore, separate analyses could be performed for microwave-only and infrared-only retrievals. When compared to radiosondes, the use of combined AMDAR and WVSS-II reports throughout the troposphere and lower stratosphere greatly increased the spatial and temporal coverage and density over the CONUS. Furthermore, as radiosondes are limited to standard global synoptic observing times while low Earth-orbiting profiling satellites are typically Sun-synchronous, many regions have significant mismatches between satellite overpasses and radiosonde launches. Airborne observations have no such limitation and therefore constitute a very valuable reference for the monitoring of polar satellite products in ways not previously possible. As a result, evaluation of IASI performance over regions like the western CONUS is now possible.
Results showed strong consistency between the IASI sounders on Metop-A and Metop-B, and aircraft reports agreed well with the more limited, traditional radiosonde intercomparisons available elsewhere. The more robust AMDAR intercomparisons also revealed good consistency in retrieval bias and uncertainty across different regions of the CONUS. The least amount of uncertainty for IASI temperature retrievals was generally found in the middle troposphere, during summertime days, and for near-nadir-pointing scans. SH retrievals showed the least absolute uncertainty where the least amount of water vapor was present, while RH uncertainties generally behaved as temperature uncertainties did, though there was less dependence on the satellite zenith angle.
It is worth noting that the AMDAR and WVSS-II observing systems are designed for operational meteorology and thus operate continuously in an automated manner. In this era of rapidly cycling forecast models, the airborne data need to be downlinked to the surface, quality controlled, and distributed to partnering weather services for assimilation within minutes of the reporting time. This always-ready state of the airborne observing network means that these observations are well-suited to continuous near-real-time monitoring of satellite performance and can be integrated into the operational workflow of satellite data processing centers. In fact, work is underway to implement an operational procedure at EUMETSAT to use AMDAR and WVSS-II observations to continuously evaluate the performance of the IASI level-2 retrievals.
While this initial work focused on the CONUS due to the high density of airborne temperature and moisture observations, it could be expanded to any region where a significant number of observations from AMDAR-participating airlines exist. This includes Europe, eastern Asia, the Caribbean, and (at cruise level) the North Atlantic. Further studies of the performance of only temperature data provided by LEO retrievals could be expanded into these areas without added instrumentation. Future work will be to expand the scope of the analysis to these regions, to introduce new classifications like performance over different surface types and meteorological situations, and to evaluate how different matching and quality control criteria impact the magnitude of the biases and uncertainties. The paucity of radiosonde observations worldwide means that such investigations would be difficult using traditional evaluation methods, but the greater spatial and temporal coverage of airborne observations would facilitate these novel evaluations.

Figure 1. Maps of the spatial distributions of the locations of IASI-AMDAR temperature (a) and moisture intercomparisons (c) over the CONUS for all of 2017 plotted as the base-10 logarithm of the number of observations per 0.1° latitude by 0.1° longitude box per day. Color scales are identical between the two maps to facilitate intercomparison. The vertical distribution of the number of intercomparisons as a function of pressure is also shown (b) for both temperature (red) and water vapor (blue). Data were binned into three bins per 100 hPa, and the base-10 logarithm of the number of intercomparisons per bin per day is plotted.

Figure 2. Vertical profiles of the bias (solid) and standard deviation (dashed) of the IASI-minus-AMDAR level-2 profile retrieval differences for temperature (a, K), SH (b, g kg −1), and RH (c, %). Retrievals from Metop-A are shown in blue, while Metop-B retrievals are shown in red. The relative number of observations in each vertical bin is consistent with the vertical profile shown in Fig. 1.

Figure 3. As in Fig. 2 but for different satellite zenith angles. Darker blue lines represent more nadir-pointing views, while lighter green lines represent more oblique angles.

Figure 5. As in Fig. 2 but for the four seasons of winter (blue), spring (green), summer (yellow), and fall (red) of 2017.

Figure 7. Spatial variability of bias (a, c, e) and standard deviations (b, d, f) for temperature (a, b, K), SH (c, d, g kg −1), and RH (e, f, %) for all IASI − AMDAR − WVSS-II differences for observations at or above the 300 hPa pressure level. Intercomparisons were spatially grouped into 1° latitude by 1° longitude bins. Dark gray areas represent regions with no observations.
laser diode absorption spectrometer that directly counts individual water vapor molecules. From that, the system is able to calculate SH with a high degree of accuracy. In the United States, certain Boeing 737 airplanes from Southwest Airlines and Boeing 757 airplanes operated by UPS Airlines have been equipped with WVSS-II. These two carriers complement each other well, since the passenger carrier Southwest Airlines tends to operate during daytime and early evening, while UPS, as a freight carrier, usually operates during overnight hours to facilitate next-day shipping. Additional WVSS-II observations are available in Europe from Airbus A320 planes serving short-haul destinations out of Lufthansa's Frankfurt, Germany, hub. Williams et al. (