Validation of Aeolus winds using radiosonde observations and numerical weather prediction model equivalents

In August 2018, the first Doppler wind lidar, developed by the European Space Agency (ESA), was launched on board the Aeolus satellite into space. Providing atmospheric wind profiles on a global basis, the Earth Explorer mission is expected to demonstrate improvements in the quality of numerical weather prediction (NWP). For the use of Aeolus observations in NWP data assimilation, a detailed characterization of the quality and the minimization of systematic errors is crucial. This study performs a statistical validation of Aeolus observations, using collocated radiosonde measurements and NWP forecast equivalents from two different global models, the ICOsahedral Nonhydrostatic model (ICON) of Deutscher Wetterdienst (DWD) and the European Centre for Medium-Range Weather Forecasts (ECMWF) Integrated Forecast System (IFS) model, as reference data. For the time period from the satellite’s launch to the end of December 2019, comparisons for the Northern Hemisphere (23.5–65° N) show strong variations of the Aeolus wind bias and differences between the ascending and descending orbit phase. The mean absolute bias for the selected validation area is found to be in the range of 1.8–2.3 m s−1 (Rayleigh) and 1.3–1.9 m s−1 (Mie), showing good agreement between the three independent reference data sets. Due to the greater representativeness errors associated with the comparisons using radiosonde observations, the random differences are larger for the validation with radiosondes compared to the model equivalent statistics. To estimate the Aeolus instrumental error, the representativeness errors for the comparisons are determined, together with estimates of the model and radiosonde observational errors. The resulting Aeolus error estimates are in the range of 4.1–4.4 m s−1 (Rayleigh) and 1.9–3.0 m s−1 (Mie).
Investigations of the Rayleigh wind bias on a global scale show that in addition to the satellite flight direction and seasonal differences, the systematic differences vary with latitude. A latitude-based bias correction approach is able to reduce the bias, but a residual bias of 0.4–0.6 m s−1 with a temporal trend remains. Taking additional longitudinal differences into account, the bias can be reduced further by almost 50 %. Longitudinal variations are suggested to be linked to land–sea distribution and tropical convection that influences the thermal emission of the Earth. Since 20 April 2020 a telescope temperature-based bias correction scheme has been applied operationally in the L2B processor, developed by the Aeolus Data Innovation and Science Cluster (DISC).

wind lidar called ALADIN (Atmospheric LAser Doppler INstrument), which is the first European lidar and the first ever Doppler wind lidar (DWL) in space (Stoffelen et al., 2005; Reitebuch, 2012; ESA, 2008). ALADIN provides profiles of the line-of-sight (LOS) wind component perpendicular to the satellite velocity at an angle of 35° off-nadir from the ground up to 30 km.
The Aeolus mission primarily aims to demonstrate improvements in atmospheric wind analyses for the benefit of numerical weather prediction (NWP) and climate studies (Stoffelen et al., 2005; Rennie and Isaksen, 2019). Despite the advancement of the Global Observing System (GOS), there are still major deficiencies, the lack of accuracy being one of the significant limitations of currently used wind observation methods (Källén, 2018). Accurate vertical profiles of the wind field from radiosondes, wind profilers, and commercial aircraft ascents and descents are mainly concentrated over continents in the Northern Hemisphere, whereas only a few profiles are available over the oceans and on most parts of the Southern Hemisphere. Atmospheric motion vectors derived from tracking cloud and water vapor structures in consecutive satellite images provide single-level winds with nearly global coverage but exhibit significant systematic and correlated errors due to uncertainties of their height assignment (e.g., Folger and Weissmann, 2014; Bormann et al., 2003). The vast majority of global observations consist of satellite radiances, mainly providing information on the atmospheric mass field (temperature, humidity, other trace gases, and hydrometeors). Wind information can only be retrieved indirectly from these observations, which is a particularly strong restriction in the tropics in the absence of geostrophic balance (Stoffelen et al., 2005). The actively sensed globally distributed lidar LOS winds are therefore filling a major gap of the GOS, especially in the upper troposphere and the lower stratosphere, in the tropics, and over the oceans (Baker et al., 2014; ESA, 2008).
It has been shown that improvements are to be expected for short-range forecasts of severe weather situations, the analysis of tropical dynamics, and for a better definition of smaller-scale circulation systems in midlatitudes (e.g., Marseille et al., 2008; Tan and Andersson, 2005; Weissmann and Cardinali, 2007; Weissmann et al., 2012; Žagar, 2004). A crucial prerequisite for the use of meteorological observations in NWP data assimilation systems is a good knowledge of their statistical errors and the minimization of systematic observation errors. For this purpose, uncertainty assessment and validation through extensive comparisons with reference data are essential requirements for assimilating these novel observations in NWP models and fully exploiting the provided wind information.
The Aeolus direct detection wind lidar (ALADIN) operates in the ultraviolet spectral region (354.8 nm). The laser emits pulses of about 60 mJ at a frequency of 50.5 Hz. A Cassegrain telescope with a diameter of 1.5 m collects the backscattered signal, and its Doppler shift is analyzed by a dual-channel receiver to measure backscattered signals from both molecules (Rayleigh channel) and particles (Mie channel) (ESA, 2008; Reitebuch, 2012). This complementarity of the two channels allows for broad vertical and horizontal data coverage in the troposphere. In preparation for the Aeolus mission, a prototype of the satellite instrument, the ALADIN Airborne Demonstrator (A2D), was deployed to test the wind measurement principles under real atmospheric conditions in several measurement campaigns, and to provide information on quality control algorithms (Lux et al., 2018). Two airborne validation campaigns with an operation base at DLR (Deutsches Zentrum für Luft- und Raumfahrt e.V.) Oberpfaffenhofen were already performed within the first 10 months after the satellite's launch. Deploying the A2D and a 2 µm DWL as reference, wind data for the first experimental comparisons with the Aeolus wind product and model wind data from the European Centre for Medium-Range Weather Forecasts (ECMWF) were provided. Detailed information and results have been published in Lux et al. (2020) and Witschas et al. (2020). Further, Aeolus wind observations have been compared to the direct-detection Doppler lidar LIOvent at the Observatoire de Haute-Provence for a time period at the beginning of 2019 (Khaykin et al., 2020) and to wind profiles obtained from radiosonde launches on board the German RV Polarstern in autumn 2018 across the Atlantic Ocean (Baars et al., 2020).
Airborne Doppler lidars have been used in several case studies of mesoscale phenomena, such as the French mistral (Drobinski et al., 2005), Alpine foehn (Reitebuch et al., 2003), the sea breeze in southern France (Bastin et al., 2005), or the Alpine mountain-plain circulation (Weissmann et al., 2005). As part of the German initiative EVAA (Experimental Validation and Assimilation of Aeolus observations), this paper presents the evaluation of Aeolus winds using operational collocated radiosonde data from the GOS as reference. Besides, observation monitoring statistics from the global ICOsahedral Nonhydrostatic model (ICON) of Deutscher Wetterdienst (DWD) and the ECMWF Integrated Forecast System (IFS) model are analyzed to corroborate the results and investigate dependencies and possible causes of systematic deviations.
The text is structured as follows. First, an overview and description of the data sets used for the evaluation of the Aeolus winds are provided. Collocation criteria are specified and the statistical methods for the comparison are described. Section 3 presents a time series of the validation, focusing on the temporal evolution of systematic and random differences between the Aeolus observations and the reference data sets. The derivation of error estimates for the Aeolus instrumental error includes the determination of representativeness errors which is based on analysis data from the regional model COSMO (Consortium for Small-scale MOdeling) of DWD and high-resolution ICON large eddy model (LEM) simulations. In Sect. 4, the Aeolus Rayleigh channel bias is investigated in more detail and two bias correction approaches are evaluated. Finally, the results are discussed and summarized.

Data and method
The Aeolus Level 2B (L2B) product is evaluated using collocated radiosonde observations of the GOS and short-term model forecast equivalents (first-guess departure statistics) of the global model ICON of DWD and the ECMWF model as reference.

Aeolus L2B wind product
The Aeolus L2B product contains the horizontal LOS (HLOS) wind component observations suitable for NWP data assimilation. The majority of wind data are provided by the Rayleigh channel. In clear conditions, the Rayleigh wind coverage is from the surface up to 30 km. The Mie signals are strong within optically thin clouds and on top of optically thick clouds and cover the atmospheric boundary layer as well as aerosol layers for clear-sky conditions. Each Aeolus measurement is an accumulation of 20 laser pulses, which corresponds to a horizontal resolution of about 2.9 km. To achieve a sufficient signal-to-noise ratio to comply with the stringent wind accuracy requirements, observations are processed by averaging up to 30 individual measurements. The resulting HLOS wind observation therefore represents a horizontal average over 86.4 km. For the Mie channel, the horizontal integration length of the wind measurements was decreased to approximately 10 km after 5 March 2019, taking advantage of the higher signal-to-noise ratio of cloud returns (Šavli et al., 2019). In addition to HLOS observations, the Aeolus L2B processor developed by ECMWF and the Royal Netherlands Meteorological Institute (KNMI) provides an observation instrument noise estimate. Furthermore, to reduce systematic errors, corrections for the temperature and pressure dependence of the Rayleigh winds are performed using a priori information from the ECMWF model interpolated along the Aeolus track (Dabas et al., 2008). The measurements within an observation are classified into an observation type, clear or cloudy, using measurement-scale (2.9 km) optical properties, such as scattering ratio. Wind retrievals are performed for both channels, resulting in four observation products (Rayleigh clear, Rayleigh cloudy, Mie clear, and Mie cloudy). The vertical resolution varies from 0.25 km near the surface to 2 km in the highest bins, with a total of 24 bins.
The processing at ECMWF is performed in near-real time; thus measurements are delivered within 3 h. More detailed information about the L2B processor retrieval algorithm can be found in Rennie et al. (2020). As Aeolus is a novel mission, the processing algorithms have been evolving since launch. Different processor baselines (in this study 2B02–2B07) and various updates led to different observation quality in different time periods. A consistent reprocessed data set with unique processor settings is not available yet. Furthermore, the instrument performance varied over the time period assessed in this study, which includes the mission's Commissioning Phase (CP) from launch until the end of January 2019, the late Flight Model A (FM-A) laser period until mid-June 2019, and the FM-B laser period until the end of December 2019. Information about the actual performance of the Aeolus wind lidar and a discussion of the systematic and random error sources can be found in  and Rennie and Isaksen (2020). For the validation, the following quality control criteria are applied. Only valid Rayleigh clear and Mie cloudy winds (from now on referred to as Rayleigh and Mie) between 800 and 80 hPa are used. A distinction is made between the ascending orbital pass, when the satellite moves north, and the descending orbital pass, when the satellite moves south. Based on a compromise between the quality of the data set and the number of observations that pass the quality control, Rayleigh winds with an estimated error greater than 6 m s−1 and Mie winds with an estimated error greater than 4 m s−1 are excluded. Thus, on average over the validation period about 70 % of the Rayleigh and 76 % of the Mie winds are available for the analysis. On 14 June 2019 a correction scheme for dark current signal anomalies of single pixels (hot pixels) on the accumulation-charge-coupled devices (ACCDs) of the Aeolus detectors was implemented in the Aeolus operational processor chain.
All measurements before 14 June 2019 affected by hot pixels are excluded from the validation statistics.
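The quality control criteria listed above can be summarized as a simple filter. The following is a minimal sketch only: the record layout and all names (`WindObs`, `passes_qc`, the `kind` strings) are illustrative assumptions and do not correspond to identifiers in the L2B product. The pressure limits are assumed to be inclusive, and the hot-pixel exclusion is not modeled.

```python
from dataclasses import dataclass

# Thresholds quoted in the text (m/s). The keys are hypothetical labels.
MAX_ESTIMATED_ERROR = {"rayleigh_clear": 6.0, "mie_cloudy": 4.0}
P_BOTTOM_HPA = 800.0  # lower boundary of the validated layer
P_TOP_HPA = 80.0      # upper boundary of the validated layer


@dataclass
class WindObs:
    kind: str            # hypothetical label, e.g. "rayleigh_clear"
    valid: bool          # validity flag of the observation
    pressure_hpa: float  # pressure of the range bin
    est_error: float     # estimated observation error, m/s
    hlos: float          # HLOS wind, m/s


def passes_qc(obs: WindObs) -> bool:
    """Return True if the observation would enter the validation statistics."""
    # Only Rayleigh clear and Mie cloudy winds are used.
    if not obs.valid or obs.kind not in MAX_ESTIMATED_ERROR:
        return False
    # Keep observations between 800 and 80 hPa (inclusive limits assumed).
    if not (P_TOP_HPA <= obs.pressure_hpa <= P_BOTTOM_HPA):
        return False
    # Winds with an estimated error greater than the threshold are excluded.
    return obs.est_error <= MAX_ESTIMATED_ERROR[obs.kind]
```

Applying such a filter per channel and counting the surviving observations would give the kind of availability figure quoted above (about 70 % Rayleigh, 76 % Mie).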

Radiosonde data and collocation criteria
Radiosonde observations generally provide very accurate information on the true wind conditions. Given that radiosonde wind data are direct in situ measurements, the inherent errors (e.g., instrument errors) are small compared to errors of satellite-based instruments. That makes them well suited to serve as a reference data set for the true atmospheric state for the validation of Aeolus HLOS winds. Furthermore, the observation errors can be assumed to be uncorrelated between different radiosondes. At ECMWF, radiosonde feedback files are created from the Observational DataBase (ODB) at the end of the IFS analysis and archived in the Meteorological Archival and Retrieval System (MARS). For stations where ECMWF is assimilating BUFR (Binary Universal Format for the Representation of meteorological data) data, the balloon drift is taken into account by splitting data into groups of 15 min. Radiosonde feedback files from alphanumeric reports only contain the time and position of the radiosonde's launch, but not the time and position of the individual wind observations. Due to the radiosonde drift during the sounding and the ascent time, additional errors arise. Seidel et al. (2011) evaluated characteristic values of average drift distances to be 5 km in the mid-troposphere, 20 km in the upper troposphere, and 50 km in the lower stratosphere, tending to be larger in midlatitudes than in the tropics. A few individual radiosondes are found to drift up to 200 km. Estimates of the ascent time of the balloon range from 5 min, when it reaches 850 hPa, up to 1.7 h at 10 hPa. These values should be taken into account when defining collocation criteria for comparisons with radiosondes. In this study, all radiosonde observations that are within 120 km horizontal, 90 min temporal, and 500 m vertical distance from the Aeolus measurements are used for the validation statistics. 
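As an illustration of the collocation criteria (120 km, 90 min, 500 m), the check can be sketched as follows. This is not the code used in the study: the dictionary keys, the function names, and the use of a haversine distance on a spherical Earth are assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical Earth assumed


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_collocated(aeolus, sonde, max_km=120.0, max_min=90.0, max_m=500.0):
    """Check the horizontal, temporal, and vertical criteria.

    Both arguments are dicts with the hypothetical keys
    lat, lon (degrees), time_s (seconds), and alt_m (metres).
    """
    dist_ok = great_circle_km(aeolus["lat"], aeolus["lon"],
                              sonde["lat"], sonde["lon"]) <= max_km
    time_ok = abs(aeolus["time_s"] - sonde["time_s"]) <= max_min * 60.0
    height_ok = abs(aeolus["alt_m"] - sonde["alt_m"]) <= max_m
    return dist_ok and time_ok and height_ok
```

For reports that include drift information, the position and time of the individual radiosonde level would be passed in; for alphanumeric reports only the launch position and time are available, which is one source of the additional representativeness error discussed later.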
For each location, the radiosonde HLOS wind component is computed as a linear function of the zonal wind component u and the meridional wind component v as
$$ \mathrm{HLOS}_{\mathrm{RS}} = -u \sin\varphi - v \cos\varphi , $$
where φ is the L2B azimuth angle, which is defined clockwise from north of the horizontal projection of the target to the satellite pointing vector. Since radiosonde observations are rare in the Southern Hemisphere and polar regions, the analysis concentrates on the midlatitudes of the Northern Hemisphere (23.5–65° N). To achieve a sufficiently large data set, statistics for 1 d are based on a running mean over 7 d.
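The projection of a radiosonde wind onto the Aeolus viewing direction can be sketched in a few lines. The sign convention used here, HLOS = −u sin φ − v cos φ, is the one commonly used for Aeolus and is stated as an assumption; the function name is hypothetical.

```python
import math


def hlos_from_uv(u, v, azimuth_deg):
    """Project zonal (u) and meridional (v) wind onto the HLOS direction.

    azimuth_deg is the azimuth angle, clockwise from north, in degrees.
    Assumed convention: HLOS = -u*sin(phi) - v*cos(phi).
    """
    phi = math.radians(azimuth_deg)
    return -u * math.sin(phi) - v * math.cos(phi)


# Example: a pure westerly wind of 10 m/s seen at two opposite azimuths.
print(hlos_from_uv(10.0, 0.0, 90.0))
print(hlos_from_uv(10.0, 0.0, 270.0))
```

The two calls return values of equal magnitude and opposite sign, which illustrates why the same wind yields HLOS values of opposite sign when the azimuth changes by 180°.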

Model data for the validation
For a more comprehensive global assessment, the validation results of Aeolus winds with radiosondes are supplemented by a comparison to model equivalents from the global model ICON of DWD and the ECMWF IFS model. Due to the inhomogeneous spatial and temporal distribution of radiosondes, the model data continue to serve as a basis for further investigations of longitudinal and latitudinal bias dependencies. The global NWP system of DWD combines a three-dimensional variational technique (3D-Var) with a local ensemble transform Kalman filter (LETKF) to produce consistent initial states for an ensemble forecasting system using the ICON model. The first-guess forecast of the deterministic ICON with approximately 13 km horizontal grid spacing is used to calculate the observation first-guess departures (O-B). In contrast to the ECMWF data assimilation system (4D-Var) with a grid spacing of approximately 9 km, the observations are not used at their actual time, but all observations within an observation window (±1.5 h around the analysis time) are assumed to be valid at the analysis time. The Aeolus observational feedback files of the ECMWF IFS model, as well as the monitoring files of the ICON model used for this study, include all observations that were screened by the data assimilation system but did not influence the analysis. At ECMWF, the Aeolus HLOS winds have been assimilated operationally since 9 January 2020; at DWD the operational assimilation started on 19 May 2020.
To ensure comparable data sets for the radiosonde and the ECMWF and DWD model validation of the Aeolus winds, only the nearest O-B value per radiosonde collocation is used for the model validation statistics. To put the regional validation results in a global context, a global statistic with the ECMWF O-B values is additionally calculated. For this, a similar approach of limited regions and limited time periods is chosen. O-B statistics are calculated for regions of 10° latitude × 10° longitude and over periods of 7 d before they are averaged for the whole globe, to reduce the influence of horizontal and temporal fluctuations of systematic errors on the random errors.
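The box-wise averaging described above can be illustrated with a short sketch. It groups O-B departures into 10° × 10° boxes and 7 d periods, computes the mean departure per box, and then averages the absolute box means. The unweighted average over boxes, the record layout, and the function names are assumptions for this example and may differ from the procedure actually used.

```python
from collections import defaultdict
from statistics import mean


def box_key(lat, lon, day, box_deg=10.0, period_days=7):
    """Index of the lat/lon box and time period a record falls into.

    day is the (hypothetical) number of days since the start of the record.
    """
    return (int(lat // box_deg),
            int((lon % 360.0) // box_deg),
            int(day // period_days))


def global_mean_abs_bias(records):
    """records: iterable of (lat, lon, day, o_minus_b) tuples."""
    boxes = defaultdict(list)
    for lat, lon, day, departure in records:
        boxes[box_key(lat, lon, day)].append(departure)
    # Mean departure (bias estimate) per box and period.
    box_bias = [mean(values) for values in boxes.values()]
    # Unweighted average of the absolute box biases (an assumption).
    return mean(abs(b) for b in box_bias)
```

Computing the statistics per box before averaging keeps regional biases of opposite sign from cancelling each other in the global mean absolute bias.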

Representativeness errors for the Aeolus wind validation
The knowledge of representativeness errors is a key to determine the Aeolus wind instrumental error. Firstly, representativeness errors arise due to different measurement geometries of the compared data sets. While the Aeolus HLOS wind observations correspond to line measurements, the NWP models are treating the Aeolus HLOS winds as point measurements. Also, the radiosonde observations can be regarded as point measurements. For the estimation of the representativeness error for the comparison of radiosonde and Aeolus data, three further error sources need to be taken into account: the spatial and temporal difference resulting from the collocation criteria, the spatial and temporal difference resulting from the displacement during the radiosonde ascents when radiosonde data from alphanumeric reports are assimilated (13 % of the radiosonde data), and the temporal offset value for the grouping time interval when accounting for balloon drift in BUFR data (87 % of the radiosonde data). The different error components are evaluated using analysis data of the regional COSMO-DE model of five 7 d periods (February, April, June, October, and December 2016). The COSMO-DE model covers Germany, Switzerland, Austria, and parts of other neighboring states and has a horizontal grid spacing of 2.8 km and 50 levels in the vertical. The data are only used up to 12 km to avoid influences of large model errors and uncertainties of the simulation in the stratosphere.
To determine the effect of unresolved scales in the COSMO-DE analyses, the results are compared to a 3 d (3 to 6 June 2016) large-eddy simulation with the ICON model centered over Germany with 150 m horizontal resolution and 150 levels in the vertical. This way, an offset value is calculated, which is added to the representativeness errors. A more detailed description of the estimation of the Aeolus instrumental error is provided in Sect. 3.2.

Statistical metrics
The following outlines the applied statistical metrics. Using the forecast of NWP models as reference, the bias estimate is described by the mean first-guess departure:
$$ \mathrm{bias}_{\mathrm{model}} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - H(x_{b,i}) \right) , $$
where i represents the time step and N is the number of compared data points. y is the Aeolus HLOS wind observation, $x_b$ is the state vector of the short-term model forecast (background), and H(·) is the observation operator. Given that the model bias for long validation periods and large scales is usually small in comparison to that of Aeolus observations, the mean difference between the Aeolus observations and the reference data can be referred to as bias. In certain conditions, such as in jet stream regions, the tropical upper troposphere, and the stratosphere, however, Aeolus HLOS bias estimates based on NWP monitoring statistics should be treated with caution (Rennie, 2016). The bias using the radiosonde measurements as reference is calculated according to
$$ \mathrm{bias}_{\mathrm{RS}} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \mathrm{HLOS}_{\mathrm{RS},i} \right) , $$
where $\mathrm{HLOS}_{\mathrm{RS}}$ is the radiosonde HLOS wind component. For quantifying random deviations, the standard deviation as well as the scaled median absolute deviation (MAD) is determined for the three reference data sets. The MAD is a very robust measure for the variability of the Aeolus HLOS winds, being more resilient to single outliers compared to the standard deviation. In case of a normally distributed data set, the MAD value multiplied by 1.4826 (scaled MAD) is identical to the standard deviation (Ruppert and Matteson, 2015).
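A minimal sketch of these metrics is given below, using only the Python standard library. The function names are hypothetical, and the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not state whether the sample or population form is used.

```python
from statistics import mean, median, stdev

MAD_SCALE = 1.4826  # scaling factor quoted in the text


def departures(aeolus_hlos, reference_hlos):
    """Differences between Aeolus winds and the reference winds."""
    return [y - ref for y, ref in zip(aeolus_hlos, reference_hlos)]


def bias(diffs):
    """Mean difference, used as the bias estimate."""
    return mean(diffs)


def scaled_mad(diffs):
    """Scaled median absolute deviation of the differences."""
    med = median(diffs)
    return MAD_SCALE * median([abs(d - med) for d in diffs])


# Synthetic example with one gross outlier in the Aeolus winds.
diffs = departures([2.0, 3.0, 4.0, 5.0, 101.0], [1.0, 1.0, 1.0, 1.0, 1.0])
print(bias(diffs), stdev(diffs), scaled_mad(diffs))
```

In this synthetic example the single outlier inflates both the mean and the standard deviation, whereas the scaled MAD is determined by the bulk of the differences. This is the robustness property that motivates its use alongside the standard deviation.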
3 Validation results – time series characteristics and error estimation of Aeolus HLOS wind comparisons

Systematic and random differences
For the time period from the first available L2B data after the satellite's launch up to January 2020, systematic and random differences between the Aeolus HLOS winds, radiosondes, and model fields are calculated. Figure 1  The mean scaled MAD consistently shows larger values for the validation using the radiosonde data as reference compared to the model O-B statistics. This can be explained by the larger representativeness errors associated with radiosondes, which can be regarded as in situ point measurements.
Besides the higher spatial resolution of a radiosonde observation compared to the resolution of a global NWP model, representativeness errors arise from the chosen collocation criteria and the spatial and temporal displacement during the radiosonde ascents. These error sources are considered in the Aeolus HLOS wind error estimation in the following Sect. 3.2. Comparing the two NWP models, the mean scaled MAD calculated with the ECMWF model is on average about 0.14 m s−1 smaller than when using O-B statistics of the DWD global model. This is likely to be mainly the result of neglecting the temporal evolution within the assimilation window in the DWD system. The globally derived Rayleigh wind mean absolute bias estimates, which are based on ECMWF first-guess departures of limited areas (10° latitude × 10° longitude) and periods (7 d), are slightly smaller compared to the model validation results of the restricted areas in the Northern Hemisphere. For the Mie winds, the global statistic shows values in the range of the three local validation statistics around the radiosonde collocations.
Assessing the temporal development of the Aeolus wind bias, it is apparent that the quality of the observations varies over time. To some extent, this is caused by six different processor baselines, and several updates of the calibration files during the selected time period, which makes the data partly inconsistent and incompatible. Right after the Aeolus launch, the Rayleigh wind ascending phase exhibits a negative bias, whereas the descending phase is positively biased. With time, the Rayleigh bias increases for both orbits. In January 2019, there was a reboot anomaly on the GPS unit on the satellite which led to the ALADIN instrument being in standby mode for around 1 month (grey shaded area). Right after the standby period, the Rayleigh ascending bias reaches its maximum. For the descending orbit, the maximum occurs later in April 2019. The Mie winds' mean differences also show a positive trend within the first 8 months, but the values are smaller compared to the Rayleigh bias. The higher fluctuations in bias compared to the Rayleigh winds might be linked to the sparser coverage of the Mie winds and the higher variability and larger model error when clouds are present. Related to an update of the processor setting file at the end of May (Rennie and Isaksen, 2020), the estimated bias shows   when the Rayleigh bias is negative for both orbit phases, a sharp increase occurs in mid-December. This is caused by a manual L2B processor bias correction of +4 m s−1 in the Rayleigh wind product to compensate for a global average bias drift. The Mie wind mean differences are only slightly increasing. All three independent reference data sets show very good agreement for the bias estimation, raising confidence that the results are not determined by model biases.
Besides the temporal changes in Aeolus Rayleigh and Mie wind quality, the discrepancies between the ascending and descending orbit, mainly for the Rayleigh channel, are a challenging issue for using these data in NWP models. Significant differences occur especially in late summer and autumn. Assessing the mean absolute values, the bias is larger for the descending than for the ascending orbit for both channels. For a more detailed analysis of the Rayleigh bias, see Sect. 4. The Rayleigh wind random differences calculated based on model O-B statistics vary between 3 and 6 m s−1 within the considered validation period. For the comparison with radiosonde observations, the mean random difference ranges from 4 up to 7 m s−1. The Mie wind random differences show smaller values in total, but stronger fluctuations. Overall, a slight increase in standard deviation and scaled MAD until summer 2019 is visible. This is likely associated with the energy decrease in the FM-A laser over time. The laser switch led to reduced random differences for the Rayleigh channel. The Mie wind random differences do not exhibit such clear changes, because the Mie return signal does not only depend on the laser energy but also on the presence of aerosols or hydrometeors. Since mid-October 2019, the Rayleigh wind random differences again show a small increase. Comparing the standard deviation and scaled MAD, no striking differences appear. On average, the standard deviation is about 0.20 m s−1 larger than the scaled MAD, implying a few outliers in the statistics. To derive error estimates of the Aeolus HLOS winds, the representativeness errors of the comparisons and errors resulting from the radiosonde measurements and the NWP models also have to be taken into account (see Sect. 3.2).

Estimation of the Aeolus HLOS wind error
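The total variance of the difference between radiosonde observations and Aeolus HLOS winds $\sigma^2_{\mathrm{val}}$ (squared scaled MAD) is the sum of the variance resulting from the Aeolus wind instrumental error $\sigma^2_{\mathrm{Aeolus}}$, the variance resulting from the radiosonde wind observational error $\sigma^2_{\mathrm{RS}}$, and the variance caused by the representativeness error $\sigma^2_{r\_\mathrm{RS}}$ (Weissmann et al., 2005) (see Eq. 6a). For the comparison with model equivalents, the model representativeness error $\sigma^2_{r\_\mathrm{model}}$ is used, and $\sigma^2_{\mathrm{RS}}$ is replaced by the model error $\sigma^2_b$ (see Eq. 6b).
$$ \sigma^2_{\mathrm{val}} = \sigma^2_{\mathrm{Aeolus}} + \sigma^2_{\mathrm{RS}} + \sigma^2_{r\_\mathrm{RS}} \qquad \text{(6a)} $$
$$ \sigma^2_{\mathrm{val}} = \sigma^2_{\mathrm{Aeolus}} + \sigma^2_b + \sigma^2_{r\_\mathrm{model}} \qquad \text{(6b)} $$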
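Because the variances are assumed to add, the Aeolus instrumental error follows by subtracting the reference and representativeness variances from the validation variance. The sketch below illustrates this step; the function name and the input values in the usage line are purely illustrative and are not results from this study.

```python
import math


def aeolus_error(sigma_val, sigma_ref, sigma_rep):
    """Solve the variance budget for the Aeolus instrumental error.

    sigma_val: scaled MAD of the differences (m/s)
    sigma_ref: error of the reference, i.e. radiosonde or model (m/s)
    sigma_rep: representativeness error of the comparison (m/s)
    Assumes uncorrelated error components, so that variances add.
    """
    variance = sigma_val ** 2 - sigma_ref ** 2 - sigma_rep ** 2
    if variance < 0.0:
        raise ValueError("reference and representativeness errors exceed "
                         "the total validation variance")
    return math.sqrt(variance)


# Illustrative numbers only: 5 m/s total, 3 m/s representativeness error.
print(aeolus_error(5.0, 0.0, 3.0))
```

A negative residual variance would indicate that the assumed reference or representativeness errors are too large for the observed differences, which is why the sketch raises an error instead of returning a value.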
As no model error estimate is available in the monitoring files of the ICON model, the Aeolus HLOS wind error is only assessed for the validation with the ECMWF model and the radiosonde observations. For the estimation of the radiosonde representativeness error σ r_RS , error sources caused by spatial and temporal displacements need to be considered, additionally to the different measurement geometries of the radiosonde and the Aeolus observations. Therefore, it is necessary to make a distinction between radiosondes for which the drift is assimilated (87 %) and those reports which only contain the launch position and time (13 %). For both cases, the temporal and the spatial part of the representativeness error, resulting from the collocation criteria, has to be considered. The error due to the spatial displacement is assessed by determining the differences between a point and a line measurement as weighted mean over distances up to 120 km in east-west and northsouth directions, and calculating the weighted average over altitude. To account for the temporal displacement, a timeoffset value is estimated by assessing the representativeness error of the appropriate spatial displacement. The mean wind velocity over the validation period (15.27 m s −1 ) and the temporal collocation criteria of 90 min results in a spatial displacement of 82 km, which corresponds to a representativeness error of 1.26 m s −1 for both channels with 90 km horizontal resolution and 1.40 m s −1 for the Mie winds with 10 km horizontal resolution. For the 13 % of the radiosonde data without the drift information, additionally an error component due to the spatial displacement up to 50 km and an error component due to the temporal displacement during the radiosonde ascents up to 90 min has to be considered. For the 87 % of the radiosondes with the drift information, a temporal offset value for the 15 min time interval, into which the data are grouped, has to be taken into account. 
Those parts of the representativeness error are calculated analogously to the parts resulting from the collocation criteria, using the COSMO-DE analyses. To determine the overall contribution, the variances of the three different error components are summed up. As a last step, the effect of unresolved scales in the COSMO-DE analyses has to be assessed by using the high-resolution ICON-LEM simulation. Figure 2a shows the differences between a point and a line measurement, averaged and weighted over distances up to 200 km, as a function of altitude for the ICON-LEM and COSMO-DE data of the same date. The COSMO-DE model underestimates the representativeness error compared to the ICON-LEM simulation. On average, the offset value between the two models is 0.20 m s −1. This offset value is added to the sum of the variances of the different error components, resulting in a representativeness error of 2.48 m s −1 for the Rayleigh winds, 2.49 m s −1 for the Mie winds with 90 km horizontal resolution, and 2.66 m s −1 for the Mie winds with 10 km horizontal resolution.

Model error σ_b and radiosonde wind observational error σ_RS
The ECMWF model error is derived from the ensemble data assimilation first-guess error, stored in the ODB. It provides a good measure for spatial and temporal variation of the background error. Table 2 displays the values of σ_b as the mean over the validation period for the Rayleigh winds, and as the mean over the time periods before and after the change of the horizontal resolution for the Mie winds. They are determined for the latitudinal band between 23.5 and 65° N, and globally. The values taken for the model error are only valid at the start of the 4D-Var window; they increase during the 12 h window. As NWP models in general tend to exhibit higher uncertainty in cloudy than in clear-sky areas, σ_b is larger for the Mie winds with 90 km horizontal resolution. After the decrease in the horizontal integration length of the Mie wind measurements to approximately 10 km in the L2B product, the number of Mie wind observations increased, leading to a reduction in model error.
The radiosonde observational error σ_RS is assumed to be 0.7 m s −1, according to the estimated GCOS (Global Climate Observing System) Reference Upper-Air Network (GRUAN) measurement uncertainty (Dirksen et al., 2014).

Table 2. Overview of the estimated Aeolus wind instrumental error σ_Aeolus (bold font) and the single components of the calculation (representativeness errors σ_r_RS and σ_r_model, radiosonde observational error σ_RS, ECMWF model errors σ_b, and random differences from the validation σ_val_RS and σ_val_model) for the Rayleigh and Mie winds for the ascending and descending orbital pass for the Northern Hemisphere (23.5-65° N), restricted to the radiosonde collocations, and for a global statistic using the ECMWF model.

Aeolus wind instrumental error σ_Aeolus
The Aeolus wind instrumental error is calculated using Eqs. (6a) and (6b). Table 2 shows the values of σ_Aeolus for the validation with radiosonde observations and ECMWF model equivalents for the latitudinal band between 23.5 and 65° N for the Rayleigh and Mie winds, separated for the ascending and descending orbit phase. Additionally, σ_Aeolus is derived for the global statistic using the ECMWF O-B values. The Rayleigh wind error estimate is 4.37 m s −1 (4.23 m s −1) for the ascending (descending) orbit using radiosonde observations as reference data, and 4.07 m s −1 for both the ascending and the descending orbit for the comparison with model equivalents of the ECMWF model. The estimated error of the Mie winds with 90 km (10 km) horizontal resolution is around 2.68 m s −1 (3.00 m s −1) for the radiosonde validation and around 2.00 m s −1 (2.75 m s −1) for the model validation. For both channels, σ_Aeolus shows good agreement between the ascending and descending orbit phase. The differences between the model and radiosonde validation are at most 0.31 m s −1, except for the Mie winds with 90 km resolution. Because the estimation of the representativeness error is based on averaged values of analyses that cover only the area around Germany at certain time periods, the values are affected by small uncertainty factors. As the model error estimates are also associated with uncertainty, it is assumed that the discrepancies between the radiosonde and model validation are due to uncertainties in the calculation of the different error sources. Comparing the globally derived Aeolus wind instrumental errors with the results of the validation statistics of the Northern Hemisphere, smaller values occur for the Mie winds, whereas the Rayleigh wind instrumental errors show good agreement. It has to be taken into account that the representativeness error considered for the global statistics is based on a domain covering only central Europe.
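The inversion of the radiosonde variance budget (total variance = Aeolus variance + reference error variance + representativeness variance) can be sketched as follows. The function name is illustrative, and the value used for the scaled MAD is a hypothetical input chosen so that the example lands near the reported Rayleigh estimate; it is not taken from Table 2.

```python
import math


def aeolus_instrumental_error(sigma_val, sigma_ref, sigma_repr):
    """Solve sigma_val^2 = sigma_Aeolus^2 + sigma_ref^2 + sigma_repr^2 for sigma_Aeolus."""
    residual_var = sigma_val**2 - sigma_ref**2 - sigma_repr**2
    if residual_var < 0.0:
        raise ValueError("reference and representativeness errors exceed the total variance")
    return math.sqrt(residual_var)


SIGMA_RS = 0.7       # radiosonde observational error quoted in the text (m/s)
SIGMA_R_RS = 2.48    # Rayleigh radiosonde representativeness error quoted in the text (m/s)
SIGMA_VAL_RS = 5.07  # HYPOTHETICAL scaled MAD (m/s), for illustration only

sigma_aeolus = aeolus_instrumental_error(SIGMA_VAL_RS, SIGMA_RS, SIGMA_R_RS)
print(f"{sigma_aeolus:.2f} m/s")  # 4.37 m/s
```

For the model comparison the same function applies with the model error σ_b in place of the radiosonde error and the model representativeness error in place of σ_r_RS.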
The results for the radiosonde and the model validation are found to correspond well to the results of Witschas et al. (2020) for comparisons with a 2 µm DWL during the validation campaigns WindVal III and AVATARE (Aeolus Validation Through Airborne Lidars in Europe) over Europe in late autumn 2018 and early summer 2019. By excluding the 2 µm DWL measurement error, an Aeolus instrumental error of 3.9-4.3 m s −1 (2.0 m s −1) for the Rayleigh (Mie) winds is determined. Rennie and Isaksen (2020) estimate the Aeolus instrumental error using the ECMWF model on a global basis by subtracting a background u wind error of 1.6 m s −1, resulting in a σ_Aeolus of 4-5 m s −1 (3 m s −1) for the Rayleigh (Mie) winds. The slight discrepancies are probably related to the small selected regions around radiosonde collocation points, from which the validation results in Table 2 are derived. The global statistic in this study is based on a similar approach using restricted regions and short time periods. These limited areas are used in particular to avoid the estimate of the random error being influenced by horizontal and temporal fluctuations of the bias.

Investigation of the Aeolus L2B HLOS Rayleigh wind bias
The following part concentrates on the Aeolus L2B HLOS Rayleigh wind bias. On a global scale, bias dependencies are investigated for different time periods, and accordingly, correction schemes are tested. Figure 3 displays the Rayleigh wind bias as a function of latitude for the ascending and descending orbit phase. The values are binned into 10° latitude bins. Results are shown for March 2019 (Fig. 3a) and August 2019 (Fig. 3b). As in the validation statistics for the Northern Hemisphere, shown in Sect. 3, the two NWP models correspond very well along the climate zones. The largest differences appear in the tropics and subtropics. The comparison of Aeolus winds with inhomogeneously distributed radiosonde measurements overall shows good agreement as well. Outliers, such as those around 20° S or 80° N, are mainly related to small sample sizes.
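The binning behind such a latitude profile can be sketched as below. This is a minimal illustration, assuming each sample is a tuple of latitude, orbit phase label, and O-B departure; the function name, the tuple layout, and the demo values are hypothetical and not taken from the study's processing chain.

```python
import math
from collections import defaultdict


def mean_bias_by_latitude(samples, bin_width=10.0):
    """Average departures per (orbit phase, latitude bin).

    samples: iterable of (latitude_deg, orbit_phase, departure_ms) tuples.
    Returns a dict {(orbit_phase, bin_lower_edge_deg): mean departure}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, phase, departure in samples:
        lower = math.floor(lat / bin_width) * bin_width
        # Put a sample at exactly +90 degrees into the last regular bin.
        lower = min(lower, 90.0 - bin_width)
        sums[(phase, lower)] += departure
        counts[(phase, lower)] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical demo departures (m/s), only to show the grouping.
demo = [
    (45.0, "asc", 1.0),
    (48.0, "asc", 3.0),
    (45.0, "desc", -2.0),
    (-25.0, "asc", 0.5),
    (90.0, "desc", 4.0),
]
bias = mean_bias_by_latitude(demo)
```

Keeping the sample count per bin alongside the mean would make it easy to flag bins with few collocations, which the text identifies as the main cause of outliers.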

Rayleigh wind bias dependence on latitude and orbit phase
Representative for winter and spring, Fig. 3a shows that the bias is quite constant with latitude in that period. Small differences between the orbital passes occur in the Southern Hemisphere and in the subtropical region of the Northern Hemisphere. From 40° N up to the North Pole, almost no deviation between ascending and descending orbit is visible.
In August 2019 (Fig. 3b), the bias varies with latitude with an amplitude of 4-5 m s −1 . As seen in Sect. 3 for the summer and autumn season, large differences between the orbit phases exist, in particular outside of the tropics. Around the Equator, the sign of the bias is positive for the ascending and descending orbit. Between the subtropical region and the poles, the descending orbit bias is still positive, whereas the bias of the ascending orbit has a negative sign.
The results suggest that the satellite's orbit phase and latitude position as well as the season influence the Aeolus Rayleigh wind bias. As the formulation of most data assimilation schemes assumes unbiased incoming observations, the correction of systematic differences is crucial. Thus, a first test examines whether a bias correction as a function of latitude, based on first-guess departures of the preceding week and applied separately for the ascending and descending orbit, can remove the systematic differences for the validation period.

Rayleigh wind bias correction approach as function of latitude
Based on the previous results, a bias correction approach is evaluated and tested with the ECMWF IFS and the ICON model monitoring data sets. For latitude bins of 10°, the first-guess departures from the previous 7 d are averaged using weights indexed by day i (Eq. 7), with i = 0 being the current day. The resulting correction values are subtracted from the first-guess departure of the considered day and the residuals are averaged for each month of the validation period (Fig. 4). Considering the effect of the orbit phase differences, this is done separately for the ascending and descending satellite pass. To estimate whether the model bias matters, three different configurations are tested, which differ regarding the correction values: the bias correction values are based on the same model (dark filled markers); the bias correction value is calculated with the other NWP model (unfilled markers); the bias correction value is an average value of the two NWP models (light filled markers). After applying the bias correction, a temporal variation as seen in Sect. 3 for the systematic differences is still apparent in the residuals. At the beginning of the Aeolus mission, the correction is quite efficient. In spring 2019, when the latitude dependence is comparably weak and the bias comparably high, a residual of more than 1 m s −1 remains. After the processor update in May 2019, when the Rayleigh ascending wind bias tends to be negative, the residual bias also exhibits a negative sign. Differences between the two models regarding the sign of the remaining bias are visible in September 2018 for the ascending orbit and in December 2019. In total, the correction is able to clearly decrease the systematic differences, but there is a remaining bias, in particular in phases with large temporal changes of the bias.
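The correction step for a single latitude bin and orbit phase can be sketched as a weighted mean of past daily mean departures that is subtracted from the current departure. The weights used in the study are not reproduced here; the linearly decaying weights below are purely an assumption for illustration, as are the function names and the demo numbers.

```python
def weighted_bias_correction(past_departures, weights):
    """Weighted mean of past daily mean departures for one bin and orbit phase.

    past_departures: daily mean O-B values, most recent day first;
                     None marks a day without data.
    weights: one non-negative weight per day, same ordering.
    """
    num = 0.0
    den = 0.0
    for departure, weight in zip(past_departures, weights):
        if departure is None:
            continue  # skip days without observations in this bin
        num += weight * departure
        den += weight
    if den == 0.0:
        return 0.0  # no usable history: apply no correction
    return num / den


def residual_departure(current_departure, past_departures, weights):
    """Departure of the considered day after subtracting the correction value."""
    return current_departure - weighted_bias_correction(past_departures, weights)


# ASSUMED weights (not from the study): most recent day counts most.
ASSUMED_WEIGHTS = [7, 6, 5, 4, 3, 2, 1]

history = [2.0, 2.0, None, 2.0, 2.0, 2.0, 2.0]  # hypothetical daily mean O-B (m/s)
residual = residual_departure(2.5, history, ASSUMED_WEIGHTS)  # 0.5
```

Normalizing by the sum of the weights actually used keeps the correction unbiased when some days are missing, which is one possible way to handle data gaps.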
The seasonal variation of the bias and the influence of the latitudinal position of the satellite suggest a link to temporal and spatial variations in long-wave and solar radiation. Including the longitudinal component, the spatial bias dependence for different time periods is examined in more detail in Sect. 4.2. Table 3 presents the mean absolute residual bias averaged over the validated time period for the three applied latitude-dependent correction values. In total, the bias is reduced by almost 1 m s −1 for the DWD global model and by more than 1 m s −1 for the ECMWF model. A correction based on the previous 7 d of the same model yields a comparable mean absolute residual bias for the ECMWF IFS and the ICON model. Correcting the ECMWF IFS model with the correction values calculated with the ICON model gives overall the smallest remaining bias and largest reduction. The ICON model O-B statistic, in contrast, shows worse results when applying information of the ECMWF IFS model to correct for the latitude-dependent error. Overall, the bias correction approaches show a statistically significant reduction in bias. However, no significant differences between the individual methods were found (following a Student's t distribution), which again indicates that model biases do not have a dominant effect on the bias assessment.
Altogether, these results show that a temporally varying latitude-dependent bias is present for the L2B Rayleigh wind product. Results from the evaluation with the two independent NWP models and in situ observations are overall in good agreement. A latitude-dependent bias correction successfully reduces the bias, but on average, a bias of 0.37-0.59 m s −1 remains. The remaining bias is related on one hand to phases with temporal changes of the bias and on the other hand to longitudinal differences that are investigated further in the subsequent section.

Rayleigh wind bias dependence on longitude, latitude, and orbit phase
Figure 5 shows two-dimensional plots of the Aeolus Rayleigh HLOS wind bias for January, May, and September 2019 for the ascending and descending satellite orbit. In January, when the orbit phase dependence is less pronounced, small fluctuations with longitude and latitude are visible in the tropical and subtropical regions. Large positive bias values occur between 30 and 90° N, mainly for the ascending orbital pass, and in the tropics, more present for the descending orbital pass. The band of larger systematic differences found in the tropics seems to match the Intertropical Convergence Zone (ITCZ), which moves further south from the Equator during the southern summer. In May, the orbit phase dependence of the systematic differences is more distinct. For the ascending orbit, longitude fluctuations of large negative bias values over land appear in the temperate and polar areas of the Northern Hemisphere. Variability is also still present in the equatorial region. When the satellite moves from north to south, these tropical fluctuations are less conspicuous. Except for the polar region of the Northern Hemisphere, the bias is mostly positive with the highest values between 30 and 90° S. The three gaps in the Southern Hemisphere around 60° S are due to a technical issue at ECMWF. In autumn, when latitude and the satellite's orbit phase influence the systematic error most, a significant longitude dependence is also apparent. The land-sea fluctuations for the ascending orbital pass in the Northern Hemisphere and in the tropical region are more pronounced. For the descending orbit, variability is mainly present in the Southern Hemisphere and it is not clear whether this is linked to the land-sea distribution. The positive bias band in the ITCZ region is still present for both orbits.
Furthermore, the results of the ECMWF IFS model are again compared to the ICON model O-B statistics (Fig. 6), showing overall no statistically significant differences. Larger differences only emerge in the tropics, the area where NWP models in general differ the most, and in the midlatitudes of the summer hemisphere. Figure 5 highlights that in addition to the satellite's flight direction, latitude, and seasonal variations, longitudinal fluctuations also affect the Aeolus measurements systematically, supporting the assumption that radiative effects play an important role. To examine the extent of the influence of the longitude component, the bias correction approach outlined in Sect. 4.1 is repeated taking both geographical dimensions into account.

Rayleigh wind bias correction approach as function of latitude and longitude
For the ECMWF model, a two-dimensional bias correction approach is tested using the previous 7 d of Aeolus HLOS O-B statistics as a function of latitude and longitude averaged and weighted (Eq. 7). Bin sizes are chosen to be 10° for both latitude and longitude. To also consider the seasonal variation, Fig. 7 displays the residuals (rose cross markers) averaged for each month for the whole validation period for the ascending and descending orbit. To get an impression of how strong the longitudinal bias variation is, the results are compared to the one-dimensional latitude-dependent correction approach from Sect. 4.1. The mean absolute remaining bias for both correction formulations is provided in Table 4. Altogether, the residual has been decreased by almost 50 % when considering the longitude dependence for both satellite orbit passes. The main improvements occur for the bias correction in late winter and early spring 2019, where a one-dimensional correction approach is not that effective. Right after the mission's start, in May 2019, and at the end of the year the remaining bias is increased when taking the longitudinal dimension into account. In these months, the one-dimensional latitude-dependent correction approach almost removes the systematic differences already. A discussion on possible reasons for the systematic bias variations and a summary of the findings in this study are presented in Sect. 5.
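Extending the correction to two dimensions mainly changes the lookup key from a latitude bin to a latitude-longitude bin. A minimal sketch, assuming 10° bins, longitudes wrapped to [0°, 360°), and a per-sample weight supplied by the caller; all names and demo values are hypothetical, and bins without history receive no correction by assumption.

```python
import math
from collections import defaultdict

BIN_DEG = 10.0


def bin_index(lat, lon, bin_deg=BIN_DEG):
    """Return integer (lat_bin, lon_bin) indices for a position in degrees."""
    n_lat = int(180.0 / bin_deg)
    n_lon = int(360.0 / bin_deg)
    lat_bin = min(int(math.floor((lat + 90.0) / bin_deg)), n_lat - 1)
    lon_bin = min(int(math.floor((lon % 360.0) / bin_deg)), n_lon - 1)
    return lat_bin, lon_bin


def build_correction_table(samples):
    """samples: iterable of (lat, lon, weight, departure) from the preceding days."""
    num = defaultdict(float)
    den = defaultdict(float)
    for lat, lon, weight, departure in samples:
        key = bin_index(lat, lon)
        num[key] += weight * departure
        den[key] += weight
    return {key: num[key] / den[key] for key in num if den[key] > 0.0}


def corrected_departure(lat, lon, departure, table):
    """Subtract the bin's correction value; bins without history stay uncorrected."""
    return departure - table.get(bin_index(lat, lon), 0.0)


# Hypothetical history for one orbit phase (departures in m/s).
history = [(45.0, 10.0, 1.0, 2.0), (47.0, 12.0, 1.0, 4.0), (45.0, -170.0, 1.0, -1.0)]
table = build_correction_table(history)
residual = corrected_departure(46.0, 15.0, 3.5, table)  # 0.5
```

In practice one table would be built per orbit phase, mirroring the separate treatment of ascending and descending passes described in the text.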

Summary and discussion
This study provides an overview of validation activities to determine the Aeolus HLOS wind errors and to understand the biases by investigating possible dependencies. To ensure meaningful validation statistics, collocated radiosondes and two different global NWP models, the ECMWF IFS and the ICON model of DWD, are used as reference data.
Overall, the determined mean wind differences of the comparisons with all three reference data sets show good concordance. This confirms that the detected bias is due to Aeolus L2B systematic wind errors and not the reference data set. A time series demonstrates that the Aeolus wind systematic differences vary considerably during the time period from the satellite's launch until the end of December 2019 (Sect. 3.1). Further, there are differences in bias between the ascending and descending orbit phase, which mainly occur for the Rayleigh channel in late summer and autumn. Whereas the Rayleigh descending phase winds are positively biased in these months, the ascending phase shows negative bias values. The Mie winds are less biased in total, but more fluctuating. The mean absolute bias is found to be approximately 1.8-2.3 m s −1 for the Rayleigh winds and 1.3-1.9 m s −1 for the Mie winds. These values are beyond the mission requirements of Aeolus, which state that the bias should be smaller than 0.7 m s −1 (ESA, 2016). However, it is demonstrated that the bias can be reduced to values lower than the mission requirement through calibration with observations and model fields of the preceding week.
The random differences of the Rayleigh winds show temporal changes that are mainly related to changes in the laser output energy. The Mie wind random differences are less influenced by the laser energy and quite constant with time. The mean scaled MAD of the comparisons shows the highest values when using the radiosonde observations as reference, which is caused by representativeness errors. The NWP model scaled MAD is larger for the ICON model O-B statistics than for the ECMWF first-guess departures, likely due to the neglect of temporal changes within the assimilation window in the DWD assimilation system. The Aeolus instrumental wind error σ_Aeolus is estimated by determining the representativeness error for the ECMWF model validation and the radiosonde comparison, and by taking the ECMWF model error and the radiosonde measurement error into account. For the Rayleigh winds, σ_Aeolus is in the range of 4.1-4.4 m s −1, for the Mie winds with 90 km horizontal resolution in the range of 1.9-2.8 m s −1, and for the Mie winds with 10 km horizontal resolution in the range of 2.7-3.0 m s −1. Given that the representativeness and the model error estimates exhibit large uncertainties and the subtracted bias varies considerably with latitude and longitude, these differences are probably within the range of the uncertainty of the estimates. A global statistic using the ECMWF O-B values of limited areas (10° latitude × 10° longitude) shows only slightly smaller values for the Mie wind instrumental errors, whereas the global Rayleigh wind instrumental errors are in good agreement with the validation results based on the Northern Hemisphere.
The second part (Sect. 4) of the results of this study further investigates the Rayleigh wind bias and its dependencies. Besides the satellite's flight direction and seasonal differences, latitude and longitude also influence the systematic differences. Again, the good agreement between the different validation data sets raises confidence that the results are not influenced by issues of the reference data sets. The latitude bias dependence and differences between the orbit phases mainly occur in late summer and autumn in the subtropics and temperate climate zone. A one-dimensional latitude-dependent correction approach, based on the previous 7 d, is able to reduce the bias, but still, a temporal trend of remaining bias values of 0.37-0.59 m s −1 occurs. It turned out that additionally, a longitude-dependent bias component is present that should be taken into account. When the satellite moves north, longitudinal variations are especially found in the tropics and between 20 and 60° N, while for the descending orbit phase systematic differences mainly occur between 20 and 60° S. These variations suggest correlations with land-sea distribution and tropical convection. A latitude-longitude correction approach using the ECMWF model equivalents is able to reduce the systematic error to 0.23-0.25 m s −1. As the bias correction approach is essentially a temporal and spatial smoothing, it is suggested that fast changes in systematic errors are one source of the bias residuals.

Figure 7. Residual after a latitude-dependent bias correction (magenta diamond marker) and a two-dimensional latitude-longitude-dependent bias correction (rose cross marker) averaged over 1 month, using the ECMWF model equivalents. Rayleigh (a) ascending and (b) descending orbit phase.
At ECMWF, as part of the Aeolus Data Innovation and Science Cluster (DISC), the dominant source of the Rayleigh wind bias issues has been explained. It was found that the bias is correlated with the temperature gradients across the ALADIN primary mirror M1 of the telescope (Rennie and Isaksen, 2020). The M1 mirror temperature variation in turn is related to varying short- and long-wave radiation at the top of the atmosphere and the mirror's on-board thermal control in response to this, which explains the seasonal differences and the connection to features like convection and variations between land and sea. Since 20 April 2020 an M1 bias correction scheme has been applied operationally in the L2B processor, using a multiple linear regression method of all M1 telescope thermistors developed by the Aeolus DISC (Rennie and Isaksen, 2020). A re-processed data set including an M1 bias correction will be available in the near future. This data set should decrease the Aeolus instrumental error estimate and differences between the model and radiosonde comparisons.
Author contributions. AM performed the data analysis and prepared the main part of the publication. MW supervised the work. MW, OR, MR, and AG contributed to the development of methods and analysis of the data. OR and AG communicated important information on the Aeolus data quality and processing. MR provided knowledge about the ECMWF feedback files and the Aeolus wind bias. AC provided ideas for the bias correction approach and information about the DWD monitoring data. All co-authors engaged in discussions and contributed to the interpretation of the results.
Competing interests. The authors declare that they have no conflict of interest.
Disclaimer. The presented work includes preliminary data (not fully calibrated/validated and not yet publicly released) of the Aeolus mission that is part of the European Space Agency (ESA) Earth Explorer Programme. This includes wind products from before the public data release in May 2020 and/or aerosol and cloud products which have not yet been publicly released. The preliminary Aeolus wind products will be reprocessed during 2020 and 2021, which will include in particular a significant L2B product wind bias reduction and improved L2A radiometric calibration. Aerosol and cloud products will become publicly available by spring 2021. The processor development, improvement, and product reprocessing preparation are performed by the Aeolus DISC (Data, Innovation and Science Cluster), which involves DLR, DoRIT, ECMWF, KNMI, CNRS, S&T, ABB, and Serco, in close cooperation with the Aeolus PDGS (Payload Data Ground Segment). The analysis has been performed in the framework of the Aeolus Scientific Calibration and Validation Team (ACVT).
Special issue statement. This article is part of the special issue "Aeolus data and their application". It is not associated with a conference.