Validation of Aeolus wind profiles using ground-based lidar and radiosonde observations at La Réunion Island and the Observatoire de Haute Provence

The European Space Agency's (ESA) Aeolus satellite mission is the first Doppler wind lidar in space, operating in orbit for more than four years since August 2018 and providing global wind profiling throughout the entire troposphere and the lower stratosphere. The Observatoire de Haute Provence (OHP) in southern France and the Observatoire de Physique de l'Atmosphère à La Réunion (OPAR) are equipped with ground-based Doppler Rayleigh-Mie lidars, which operate on principles similar to those of the Aeolus lidar and are among the essential instruments within the ESA Aeolus Cal/Val program. This study presents the validation results of the L2B Rayleigh-clear HLOS winds from September 2018 to January 2022. The point-by-point validation exercise relies on a series of validation campaigns at both observatories, named AboVE (Aeolus Validation Experiment), which were held in September 2019 and June 2021 at OPAR, and in January 2019 and December 2021 at OHP. The campaigns involved time-coordinated lidar acquisitions and radiosonde ascents collocated with the nearest Aeolus overpasses. During AboVE-2, Aeolus was operated in a campaign mode with an extended range bin setting allowing inter-comparisons up to 28.7 km. We show that this setting suffers from a larger random error in the uppermost bins, exceeding the estimated error, due to the lack of backscatter at high altitudes. To evaluate the long-term evolution in Aeolus wind product quality, twice-daily routine Météo-France radiosondes and regular lidar observations were used at both sites. This study evaluates the long-term evolution of the satellite performance along with punctual collocation analyses. On average, we find a systematic error (bias) of -0.92 m s-1 and -0.79 m s-1 and a random error (scaled MAD) of 6.49 m s-1 and 5.37 m s-1 for lidar and radiosondes, respectively.


Introduction
Wind velocity is one of the fundamental meteorological variables describing the atmospheric state. Assimilating atmospheric wind observations into numerical weather prediction (NWP) models is crucial for understanding the evolution and structure of weather dynamics, and for air quality monitoring, forecasting, and climate and meteorological studies. Accurate NWP forecasts are essential for commercial activities such as agriculture, fisheries, construction, transportation, and energy development, as well as for daily life. Therefore, continuous global wind profiling is essential for enhancing our understanding of atmospheric dynamics and improving the accuracy of numerical weather predictions (Houchi et al., 2010; Albertema et al., 2019; Stoffelen et al., 2005). Wind measurements are conducted with a large variety of techniques: Radiosondes (Houchi et al., 2010), Wind Profiler Radars (WPRs) (Rogers et al., 1993), Ground-based Doppler Wind Lidars (DWLs) (Chanin et al., 1989; Baumgarten et al., 2010; Xia et al., 2012), Sodars (Anderson et al., 2005), Metal Resonance Lidars (She et al., 2004), Microwave Radiometers (Rüfenacht et al., 2012), Infrasound (Le Pichon et al., 2005), Aircraft and Airborne Lidars (Prudden et al., 2018; Yan et al., 2015; Lux et al., 2020a), Satellites (Eyre, 2019) and Atmospheric Motion Vectors (Forsythe, 2007). The DWL technique offers a broad altitude range from the lower troposphere up to the lower mesosphere and a vertical resolution of around 100 meters. In that sense, DWLs outperform most of the other instruments in terms of both vertical resolution and altitude range, making them well suited as reference instruments for satellite validation. Unfortunately, only a handful of them are in operation today.
On August 22, 2018, the European Space Agency (ESA) launched the Aeolus satellite as part of the Living Planet Program. With an initial estimated lifetime of three years, this mission is expected to pave the way for future operational meteorological satellites dedicated to observing the atmospheric wind field in order to advance the understanding of climate processes and atmospheric dynamics (ESA, 2019; Straume et al., 2020). Aeolus is a polar-orbiting satellite in a sun-synchronous dawn-to-dusk orbit at about 320 km altitude. The satellite's payload consists of only one large instrument, a Doppler wind lidar called ALADIN (Atmospheric LAser Doppler INstrument), which is the first-ever Doppler Rayleigh-Mie Wind Lidar (DWL) in space (Stoffelen et al., 2005; Reitebuch, 2012; Kanitz et al., 2019a). ALADIN is a direct-detection high spectral resolution wind lidar providing vertical profiles of the Horizontal Line-of-Sight (HLOS) wind velocity at an angle of 35° off-nadir from the ground up to 30 km. While 30 km is the initially stated nominal range, it is very rarely used, and smaller observation ranges are favored. However, as with all satellite sensors, the remote sensing technology and retrieval algorithm require a careful assessment of the quality and validity of the generated data products. Although corrections to several substantial bias sources in the Aeolus L2B winds have been implemented in the data processing (Rennie et al., 2021; Reitebuch et al., 2020; Weiler et al., 2021b), a direct validation against high-resolution measurements is required to identify any residual biases in Aeolus' L2B winds. Total bias is a measure of the overall accuracy of a measurement, while residual bias measures the remaining error after accounting for known sources of error. Both are important in understanding the reliability and accuracy of a measurement. However, in this paper, we are discussing residual bias, as it takes into account the many corrections that backscatter ratio (Souprayen et al., 1999).
The Doppler shift corresponds to the projection of the horizontal wind components onto the line-of-sight of the laser, inclined off-zenith.
The lidar makes use of a Quanta-Ray Pro290 Q-switched, injection-seeded Nd:YAG laser emitting at 532 nm with a repetition rate of 30 Hz and a pulse energy of 800 mJ. The laser beam is cycled successively between three lines of sight with a cadence of 1-2-2 minutes for measuring the zonal and meridional wind components, whereas the vertical pointing is used for calibration. The OHP system includes three fixed telescope subassemblies, each comprising a mosaic of four 50 cm mirrors: one is pointed toward the zenith, while the others are tilted at 40° off the zenith to the North and East directions. The laser beam is steered by a galvanometric scanner mirror with three predefined positions. Measurements are limited to nighttime or twilight conditions and to the absence of optically thick clouds. After a series of technical upgrades that started in 2013, the LIOvent lidar has become an operational instrument with a capacity of wind profiling up to 75 km (Khaykin et al., 2020) and was approved as a climate monitoring instrument by NDACC in 2021.
In 2012, another Doppler lidar (LiWind) was deployed at the new tropical high-altitude Maido observatory of OPAR (21°S, 55°E, 2200 m a.s.l.) on the island of La Réunion. The LiWind system at OPAR uses the same laser, detection and acquisition systems as the OHP wind lidar but features a more compact design for the receiver assembly. The telescope is made up of a single rotating 60 cm mirror, which serves for both emission and reception (Khaykin et al., 2018). Compared to the OHP lidar, the smaller collecting area of the LiWind instrument is compensated by the station's higher elevation and the cleaner atmosphere above the Indian Ocean. Both the OHP and OPAR lidars provide wind measurements with an accuracy of better than 1 m s-1 within the entire range of Aeolus altitude coverage. Both instruments will be referred to as ground-based lidars hereinafter.

Radiosondes
The radiosonde (RS) wind measurements, based on simple GPS tracking of the balloon position, offer high accuracy and vertical resolution, and their inherent errors (e.g., instrument errors) are minor compared to satellite instrument errors. They are well suited to serve as a baseline dataset for the actual atmospheric state to validate the Aeolus HLOS winds. Radiosonde measurements are known to provide a solid reference against which other measurements can be validated (Sun et al., 2010; Krisch et al., 2017). Furthermore, radiosondes also provide guidance for observational strategies and requirements when collecting feedback from past collocation campaigns with similar instrumentation (Iwai et al., 2021; Baars et al., 2020; Martin et al., 2021). It can be assumed that the observation errors are not correlated between the different radiosonde launches. It should be noted that radiosondes drift in time and space while measuring the vertical wind profile (Baars et al., 2020; Martin et al., 2021). The speed and direction of the horizontal wind are calculated from GPS position changes. According to the Global Climate Observing System (GCOS) Reference Upper-Air Network (GRUAN), the uncertainties of the horizontal wind measurement are assumed to be between 0.4 and 1 m s-1 for the wind velocity and 1° for the wind direction (Dirksen et al., 2014). While lidars sample air advected into the fixed lines of sight, radiosonde measurements are made within the local flow. Although these instruments might not sample the exact same volume of air, their measurements are proven to be highly correlated above 500 m (Kumer et al., 2014; Khaykin et al., 2020), which makes the radiosondes fully suitable for validation of ground- and space-based lidars.
The RS at the OHP and Maido sites were flown under Totex 1200 g balloons, drifting on average 160 km for OHP and 120 km for Maido. These values are considered when computing the spatial offset (also referred to as distance to collocation) between the Aeolus and RS measurements and when defining collocation criteria for their comparison. In this study, no specific spatial offset criterion was applied to the RS measurements, in order to assess the impact of the distance to collocation on the bias and random error. The closest distance to collocation was 23 km, and the farthest was 241 km.
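The distance to collocation discussed above can be illustrated with a short sketch. The text does not state how the distance is computed, so the haversine great-circle formula, the mean Earth radius and the example Aeolus profile position below are all assumptions made for illustration; only the approximate Maido coordinates (21°S, 55°E) come from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate Maido position from the text; the Aeolus profile position is hypothetical.
maido = (-21.0, 55.0)
aeolus_profile = (-21.5, 55.8)
offset_km = great_circle_km(*maido, *aeolus_profile)
print(f"distance to collocation: {offset_km:.1f} km")
```

For a drifting radiosonde, the same function could be evaluated against the balloon position at each altitude rather than the launch site.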

Aeolus ALADIN instrument
The payload of the Aeolus satellite consists of a single instrument, the Atmospheric LAser Doppler INstrument (ALADIN). The instrument samples the atmosphere with laser pulses and measures the Doppler shift of the returned signal, which is backscattered by the different layers of the atmosphere. The frequency shift is caused by the relative motion of the scattering elements along the sensor's line of sight. This motion is correlated with the mean wind in the observed volume. The measurement volume is determined by the vertical resolution, the width of the laser footprint and the ground integration length. The measurements are repeated every 80 kilometers. Each profile comprises several measurements clustered through grouping identifiers (de Kloe et al., 2016). The measurements are approximately 2.85 km (horizontal scale) apart from each other, and each of them is separately analyzed for atmospheric scene classification (Rennie and Isaksen, 2020). The along-orbit interval between individual profiles, obtained by aggregating 30 measurements to improve the signal-to-noise ratio, is ~87 km. Until October 8, 2020, the measurements were classified using particle backscatter coefficients or a feature lookup algorithm as criteria (Rennie et al., 2020). From that date, baseline 2B11 introduced a different classification method for the Rayleigh channel, based on a signal-to-noise ratio (SNR) threshold (the change was already in place for the Mie channel in 2B07). The Level 2B product consists of four distinct wind observation types, selected using the atmospheric classification performed in the processor chain (Rennie et al., 2020). The method currently applied by ESA is to use the scattering ratio, which is determined as part of the Level 1B (L1B) processing (Reitebuch et al., 2014) and used as input for the L2B processing (de Kloe et al., 2016; Rennie et al., 2020). For this purpose, a predefined scattering ratio threshold as a function of height is used. If the scattering ratio is greater than the threshold, particle scattering is dominant. Below the threshold, only molecular scattering is assumed. Range bins allocated to the same classification type are accumulated in the corresponding observations. The four wind types consist of Rayleigh- and Mie-derived winds, each of which can be categorized as either clear or cloudy. The Rayleigh and Mie wind retrieval algorithms are applied to their respective two classes of observations.
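The threshold rule described above can be sketched in a few lines. This is only an illustration of the comparison itself, not the L2B processor: the function name, the labels and all numerical values are hypothetical, and the real height-dependent thresholds are not given in this text.

```python
def classify_range_bins(scattering_ratios, thresholds):
    """Label each range bin 'cloudy' (particle-dominated) or 'clear' (molecular only).

    A bin is 'cloudy' only if its scattering ratio is strictly greater than
    the height-dependent threshold for that bin.
    """
    if len(scattering_ratios) != len(thresholds):
        raise ValueError("one threshold per range bin is required")
    return ["cloudy" if sr > th else "clear"
            for sr, th in zip(scattering_ratios, thresholds)]


# Hypothetical values for four range bins, lowest bin first.
ratios = [1.80, 1.05, 1.30, 1.02]
thresholds = [1.25, 1.25, 1.40, 1.40]
labels = classify_range_bins(ratios, thresholds)
print(labels)
```

Bins sharing the same label would then be accumulated into the corresponding clear or cloudy observation.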
Since the optimal performance of the ground-based Doppler lidars is achieved in clear-sky conditions, this paper focuses only on the ALADIN Rayleigh clear data. Rayleigh clear refers to clear skies: under the Rayleigh approach, the winds are measured in regions showing an absence of strong Mie backscatter. A high backscatter ratio would instead qualify the data as Rayleigh cloudy, corresponding to cloudy or particle-loaded skies. Rayleigh cloudy products can also provide usable wind measurements. However, the contamination by Mie scattering must first be corrected, a step that is still experimental and not within the scope of this study, which is therefore limited to the Rayleigh clear product.
The Level 2B product provides an HLOS error estimate for each range bin in the observation profiles. The validity flag (de Kloe et al., 2016) ensures the validity of the products. We also apply the L2B quality control guidance from the Aeolus NWP Impact Experiments (Rennie and Isaksen, 2020), except that we do not apply any HLOS error threshold. The reason for this choice is to allow more data to be collocated and to observe the satellite's behavior at the higher altitudes, where it is shown to have a higher error. In this study, we present data from baselines 2B02 and 2B11 to 2B13, covering the period from September 2018 to January 2022.
During the Aeolus commissioning phase, it was noted that pixels with an increased dark current were present in the memory zone of both accumulation charge-coupled devices (ACCDs) in the detector unit of ALADIN (Reitebuch et al., 2020; Kanitz et al., 2019b). These pixels are called hot pixels, and their increased dark current can have a time-variable magnitude. The results presented by Weiler et al. (2019a) revealed that by May 2020, 6% of the ACCD pixels could be classified as hot pixels. Approximately 13% of the pixels will be affected by this issue at the end of the mission's extended life in spring 2023, assuming the hot pixel generation rate does not change (Weiler et al., 2019a). Meanwhile, a hot pixel correction has been in place for Aeolus data since June 14, 2019. Keeping the hot pixel appearance rate to a minimum is of the utmost importance: hot pixels can greatly affect the collocations between Aeolus and the reference instruments, making it harder to estimate the nominal behavior of the satellite (see Sect. 4.1).
The wind is observed orthogonal to the satellite ground track, pointing 35° off-nadir, away from the Sun (Lux et al., 2020a, their Fig. 5). (H)LOS stands for (horizontal) line of sight. A single wind component, called vLOS, is measured along the satellite's line of sight (LOS). It is then converted into the HLOS wind speed by assuming that the vertical wind speed w is negligibly small. Equations 1 and 2 give vLOS and HLOS in terms of the three Cartesian wind components u (zonal wind), v (meridional wind), and w (vertical wind). If w is assumed to be negligible, vLOS and HLOS differ only by a factor of cos(Ψ). The angle Ψ represents the elevation of the target-to-satellite pointing vector (55°) and the angle θ is the topocentric azimuth of the target-to-satellite pointing vector, measured clockwise from north.
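Equations 1 and 2 are not reproduced in this text. As a hedged sketch, the form commonly used for Aeolus L2B winds is given below; the sign conventions, in particular that of the vertical wind term, vary between publications and are an assumption here rather than the authors' exact expressions.

```latex
% Sketch of the LOS and HLOS projections (assumed sign convention).
% u, v, w : zonal, meridional and vertical wind components
% \Psi    : elevation of the target-to-satellite pointing vector
% \theta  : topocentric azimuth of that vector, clockwise from north
\begin{align}
  v_{\mathrm{LOS}}  &= \left(-u \sin\theta - v \cos\theta\right)\cos\Psi + w \sin\Psi \\
  v_{\mathrm{HLOS}} &= -u \sin\theta - v \cos\theta
\end{align}
% With w \approx 0 the two quantities are related by
% v_{\mathrm{HLOS}} = v_{\mathrm{LOS}} / \cos\Psi .
```

With Ψ = 55°, the factor cos(Ψ) is about 0.57, so a given HLOS wind appears reduced by roughly 43% when measured along the line of sight.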

Adaptation of the measurement grid
Two significant aspects must be considered for an adequate comparison of the radiosonde wind profiles with the Aeolus wind data. First, the two instruments' different horizontal and vertical resolutions necessitate an adaptation of the radiosondes' measurement grid to that of Aeolus. Aeolus' data format consists of 24 vertical range bins that divide the atmosphere, resulting in wind profiles that can be obtained between 0 and 30 km, with a vertical resolution between 250 m and 2 km (Reitebuch et al., 2014). The distribution of these 24 range bins is defined through a dedicated Range Bin Setting (RBS). The main reason for adding or changing an RBS is to address a specific need, such as better sampling at specific heights. The RBS can therefore vary with latitude and time, and is adjusted operationally. In this study, we work with bins of varying sizes (from 500 m to 1500 m) and a vertical range from the ground up to a maximum altitude varying between 17.8 km and 28.7 km. The radiosonde data have a vertical range of up to 35 km, depending on the balloon burst altitude, and a vertical resolution of approximately 5 m at the typical rate of climb of 5 m s-1. The ground-based lidars have a vertical range of up to 75 km and a resolution of 120 m.
Since a reference profile contains from around 5000 measurement points for the radiosondes down to around 200 for the ground-based lidars, compared with at most 24 for a single Aeolus profile (if all the measurements have passed the quality checks), a downsampling of these two reference datasets is required. Each Aeolus profile is used as the reference grid for downsampling the collocated profiles, meaning that the averaging grid is specific to each satellite observation. In order to match the resolution of the Aeolus measurements, we average the reference measurements between the bounds of each Aeolus bin. This avoids the need for interpolation and ensures that the reference measurements are at the same resolution as the Aeolus measurements. The result is a reference profile downsampled to the number of Aeolus bins present in the profile (~21 on average, when considering the validity flags).
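The bin-averaging step described above can be sketched as follows. The function and variable names are hypothetical, the sample values are invented, and treating each bin as a half-open interval [bottom, top) is an assumption about how the bin bounds are handled.

```python
def downsample_to_aeolus_bins(ref_alt_km, ref_wind, bin_bottoms_km, bin_tops_km):
    """Average reference winds between the bounds of each Aeolus range bin.

    Returns one value per Aeolus bin, or None when no reference sample
    falls inside the bin (assumed half-open interval [bottom, top)).
    """
    averaged = []
    for bottom, top in zip(bin_bottoms_km, bin_tops_km):
        inside = [w for z, w in zip(ref_alt_km, ref_wind) if bottom <= z < top]
        averaged.append(sum(inside) / len(inside) if inside else None)
    return averaged


# Invented high-resolution reference profile (altitude in km, HLOS wind in m/s).
ref_alt = [0.1, 0.3, 0.6, 0.9, 1.2]
ref_wind = [1.0, 3.0, 5.0, 7.0, 9.0]
# Invented Aeolus bin bounds for one profile.
bottoms = [0.0, 0.5, 1.0, 2.0]
tops = [0.5, 1.0, 2.0, 3.0]
profile = downsample_to_aeolus_bins(ref_alt, ref_wind, bottoms, tops)
print(profile)
```

Because the bin bounds are passed in per call, the averaging grid can differ for every Aeolus profile, as the text requires.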

Consideration of the different viewing geometries
The second significant aspect to consider when comparing different instruments is their different viewing geometries. Because Aeolus only measures in one direction, the measurements of the other two instruments, the ground-based lidars and the radiosondes, must be projected onto the same line of sight. The HLOS wind component is computed as a linear function of the zonal wind component u and the meridional wind component v using Eq. 2, where θ (259.9°/100° for OHP and 259.0°/101° for Maido, for ascending/descending orbits) is the topocentric azimuth angle, defined clockwise from north, of the horizontal projection of the target-to-satellite pointing vector. Each observation site therefore has its own azimuth angle values.
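A minimal sketch of this projection is given below. It assumes the sign convention commonly used for Aeolus HLOS winds (HLOS = -u sin θ - v cos θ), which is not spelled out in this text; the azimuth values are those quoted above, while the function name and the example wind components are hypothetical.

```python
import math

# Azimuth angles (degrees, clockwise from north) quoted in the text.
AZIMUTH_DEG = {
    ("OHP", "ascending"): 259.9,
    ("OHP", "descending"): 100.0,
    ("Maido", "ascending"): 259.0,
    ("Maido", "descending"): 101.0,
}


def project_to_hlos(u, v, azimuth_deg):
    """Project zonal (u) and meridional (v) wind onto the Aeolus HLOS direction.

    Assumed convention: HLOS = -u*sin(theta) - v*cos(theta).
    """
    theta = math.radians(azimuth_deg)
    return -u * math.sin(theta) - v * math.cos(theta)


# Hypothetical reference wind: 10 m/s westerly (u > 0), 2 m/s southerly (v > 0).
hlos_asc = project_to_hlos(10.0, 2.0, AZIMUTH_DEG[("OHP", "ascending")])
hlos_desc = project_to_hlos(10.0, 2.0, AZIMUTH_DEG[("OHP", "descending")])
print(f"ascending: {hlos_asc:.2f} m/s, descending: {hlos_desc:.2f} m/s")
```

Under this convention a westerly wind yields a positive HLOS value on ascending orbits and a negative one on descending orbits, so the two orbit phases must be projected with their own azimuth.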

Statistical terms and methods
The offset between Aeolus and the reference data, also referred to as bias and representing the systematic error of the Aeolus wind measurements, is studied alongside the scaled Median Absolute Deviation (MAD), representing the random error. The MAD is preferable to the standard deviation because it is less sensitive to outliers (Ruppert, 2011). We refer to the scaled MAD as random error. The standard deviation is also used in specific cases. With $v_{\mathrm{diff},i} = v_{\mathrm{Aeolus},i} - v_{\mathrm{ref},i}$ the HLOS wind difference for the $i$-th of $n$ collocated data points, the bias, standard deviation, and scaled MAD are calculated as:

$$\mathrm{bias} = \frac{1}{n}\sum_{i=1}^{n} v_{\mathrm{diff},i}$$

$$\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(v_{\mathrm{diff},i} - \mathrm{bias}\right)^{2}}$$

$$\mathrm{scaled\ MAD} = 1.4826 \times \mathrm{median}\left(\left|v_{\mathrm{diff}} - \mathrm{median}\left(v_{\mathrm{diff}}\right)\right|\right)$$

The scaled MAD is identical to the standard deviation (Eq. 5) if the analyzed data follow a normal distribution. In addition to the metrics presented above, a least-squares line fit to the respective datasets is performed to provide the slope and the approximated bias, which we refer to as the intercept.

AboVE campaigns overview
The AboVE-Maido campaigns took place from September 25 to October 10, 2019 (AboVE-Maido1) and from May 31 to June 24, 2021 (AboVE-Maido2). Eleven additional measurements were conducted outside the campaign dates. The campaigns took place at the high-altitude Maido Observatory on the French island of La Réunion. Cal/Val activities at the observatory included Doppler-Rayleigh lidar operation, ranging from early-night measurements to dusk-till-dawn observations (depending on the overpass time), to cover both ascending and descending orbits, as well as time-coordinated radiosonde ascents during and in between the overpasses. The ANX, or ascending node crossing, is the point where the orbit of Aeolus intersects the x-y plane in the Earth-fixed coordinate system. During the campaign, the orbit parameter for the ANX was changed from ANX 4.5 to ANX 2.0 (as shown in Fig. 1) to support the Aeolus tropical campaign activities in Cape Verde. This change resulted in a shift in the orbit's location relative to the observatory. Previously, the ANX 4.5 ascending orbit was located within 10 km of the lidar's eastward line of sight in the lower stratosphere on Wednesdays. After this change, the ascending orbit moved further away from the lidar's eastward line of sight, but remained within a distance of 200 km.
Thanks to coordination with ESA, the AboVE-Maido2 campaign took advantage of specific settings planned in advance. A particularity of this campaign is its unique range bin setting, the Réunion RBS. This specific setting raises Aeolus' top altitude to 28.7 km, which permits intercomparisons higher into the stratosphere. This increase in vertical range comes at the expense of the resolution (see Sect. 2.3). The setting is only activated when the satellite overpasses the vicinity of the island. After the AboVE-Maido2 campaign, the cal/val measurement sessions at Maido were conducted once per month until the end of 2021. The OPAR lidar lines of sight facing east, south, and zenith are shown in Fig. 1. The eastward line of sight is such that the lidar acquisition area crosses that of Aeolus in the stratosphere (around 40 km altitude), reducing the influence of the spatial offset. The minimum spatial offset between the Aeolus ascending phase and the reference measurements was 22.1 km and 22.6 km for the lidar and radiosondes, respectively. At the same time, the descending Aeolus orbits were much further away, with the spatial offset ranging from 54 km to 241.4 km. During both campaigns, 19 Aeolus-collocated RS ascents were carried out, and 15 were time-coordinated with ground-based lidar acquisitions. The baseline, date, distance to collocation, bias, standard deviation, and scaled MAD for both reference instruments, Aeolus overpass time, and orbit type are provided in Table 1.
The AboVE-OHP campaigns took place from January 6 to January 14, 2019 (AboVE-OHP1) and from November 29 to December 14, 2021 (AboVE-OHP2). Additional measurements were conducted on December 20 and 21, 2021, and January 10, 17, and 24, 2022. The ANX 4.5 orbit, active during AboVE-OHP1, allowed for collocations with both ascending and descending orbits within 100 to 150 km. The AboVE-OHP2 campaign benefited from the satellite orbit modification: the ANX 2.0 orbit enables collocations within 50 km twice per week for ascending and descending orbits. Similar to the AboVE-Maido2 campaign, the measurements included dusk-till-dawn coverage, shorter measurements, and radiosonde ascents. The satellite's AboVE-OHP2 range bin setting is the same as the AboVE-Maido2 RBS, since the goal is to compare the datasets of the two regions and to distinguish the biases inherent to the setting from those due to the geographical location.

Statistical comparison of Aeolus with Collocated Data
In this subsection, we statistically analyze the comparisons before discussing Aeolus' capabilities and performance at different altitudes. The dataset presented consists of the combined measurements from both cal/val sites, including all time periods. For this analysis, we study the mean bias as a function of altitude, along with the number of collocated data points, in Fig. 3. Additionally, Fig. 4 presents the Aeolus Rayleigh wind values plotted against the corresponding values of the reference instruments, downsampled to match the Aeolus height resolution (as discussed in Sect. 2.3). We provide an overview of all the validation cases in Tables 1 and 2. Concerning the two reference instruments, a point-by-point comparison shows a mean bias of 0.1 m s-1 between the wind profile of the lidar and that of the radiosonde, with a standard deviation of 2.3 m s-1.
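The bias, standard deviation and scaled MAD used throughout this comparison can be sketched with the Python standard library. The function name and the sample values are hypothetical; the factor 1.4826 is the usual scaling that makes the MAD match the standard deviation for normally distributed data, and the use of the sample (n-1) standard deviation is an assumption.

```python
import statistics


def validation_stats(aeolus_hlos, reference_hlos):
    """Return (bias, standard deviation, scaled MAD) of Aeolus minus reference."""
    diff = [a - r for a, r in zip(aeolus_hlos, reference_hlos)]
    bias = statistics.fmean(diff)            # systematic error
    std = statistics.stdev(diff)             # sample standard deviation (n - 1)
    med = statistics.median(diff)
    scaled_mad = 1.4826 * statistics.median([abs(d - med) for d in diff])
    return bias, std, scaled_mad


# Invented collocated winds (m/s); the last pair acts as an outlier.
aeolus = [1.0, 2.0, 3.0, 4.0, 15.0]
reference = [0.0, 0.0, 0.0, 0.0, 0.0]
bias, std, scaled_mad = validation_stats(aeolus, reference)
print(f"bias={bias:.2f}  std={std:.2f}  scaled MAD={scaled_mad:.2f}")
```

In this invented example the single outlier inflates the standard deviation far more than the scaled MAD, which is the robustness property that motivates using the scaled MAD as the random error.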
The Aeolus instrument settings and ground processing have changed significantly during the mission. These changes affect statistical properties such as the bias and the standard deviation/MAD. In addition to the combined statistics, Table 3 splits the results by baseline. Near-real-time and reprocessed results are also separated, i.e., baseline 11 in near-real-time processing (introduced on 8 October 2020) and the baseline 11 results before that date (based on reprocessed Aeolus data). The split is needed since the reprocessing used different calibration data than the near-real-time processing.
Within the dataset, distances to collocation vary between 22.1 km and 207 km for ascending orbits and between 54 km and 241.4 km for descending orbits. The spatial offsets are therefore highly variable, depending on the orbit phase and the ascending node crossing (ANX) setting: for ascending orbits, the average distance to collocation is 67.4 km for radiosondes and 79.1 km for lidars, whereas for descending orbits, the averages are 161.4 km and 147.5 km, respectively. Such a difference does not allow the whole dataset to be compared as a unified set of collocations. The reference measurements were therefore separated into ascending and descending orbital phases to account for that disparity.

Figure 3. a) The Aeolus minus the radiosonde HLOS wind difference made during all the campaigns over each altitude bin. b) The number of data samples over each altitude bin for the radiosonde comparison. c) The Aeolus minus the lidar HLOS wind difference made during all the campaigns over each altitude bin. d) The number of data samples over each altitude bin for the lidar comparison. Red represents measurements of an ascending orbit, while black represents measurements of a descending orbit. The lines represent the average bias in each altitude bin, and the red (black) shading is the standard deviation of the bias in each range bin for ascending (descending) orbits.

Figures 3a and 3b show the mean bias as a function of altitude, with the shading representing the standard deviation. From 5 km to approximately 22 km, the bias lies within the +/-5 m s-1 range, but it increases once the uppermost bins are reached.
The data below 5 km have a larger standard deviation, which is consistent with what Guo et al. (2021) report on the increased wind speed differences in the 2-3 km range for descending orbits. The same calculations were then repeated within the 5-22 km window only (we refer to this as the "Altitude Range Method"). The results show that removing the higher bins decreases the random error from 5.58 m s-1 to 5.38 m s-1 for ascending profiles and from 4.99 m s-1 to 4.77 m s-1 for descending profiles. Another method considered was to average every Aeolus profile within a window of 200 km around the observatory for each collocation, aggregating 2 or 3 profiles on average (we refer to this as the "Average Method"). With this averaging method, the random error decreases from 5.58 m s-1 to 3.67 m s-1 for ascending profiles and from 4.99 m s-1 to 3.38 m s-1 for descending profiles. The larger standard deviation in the lower troposphere might be due to several reasons. First, the satellite's lidar performance is largely limited by the received power; the strong aerosol scattering in the boundary layer therefore lowers the apparent molecular scattering signal, reducing the inversion accuracy of the HLOS wind from Aeolus (Tan et al., 2017). Second, the sample studied in that altitude region is smaller, which leads to an undersampling bias. The AboVE-OHP2 lidar measurements were the only ones that had extended coverage below 5 km, which significantly reduced the number of data points in the lower troposphere.
The same observations hold for Fig. 3c and 3d, which depict the same increase in variance in the uppermost bins.
Similarly, from 5 to around 22 km, the bias fluctuates within the +/-5 m s-1 range, and the uppermost bins display features similar in magnitude to their radiosonde counterparts. The results differ slightly from the radiosonde observations, particularly when applying the Altitude Range or Average methods to the descending orbit collocations. For ascending profiles, the random error decreases from 7.17 m s-1 to 6.49 m s-1 when removing the higher and lower bins, and to 4.9 m s-1 when averaging several profiles together. This observation also holds for descending profiles, where the random error varies from 7.17 m s-1 to 6.49 m s-1 using the altitude range method and decreases to 3.96 m s-1 with the averaging method. The results of these different methods for the various parameters are reported in Table 5. Ground-based lidar comparisons suffer from a higher random error than their radiosonde counterparts. The ground-based lidars show a higher random error in the uppermost bins because their precision decreases with air density (Khaykin et al., 2020). The lack of data points in the 26-27 km range is a specificity of the range bin setting, resulting from a compromise between height coverage and sample spacing.

Figure 4. a) The L2B Rayleigh clear winds versus the radiosonde measurements made during all the campaigns. b) Frequency distribution of the difference between Aeolus and radiosonde wind speeds for the same data set. Red represents measurements of an ascending orbit, while black represents measurements of a descending orbit. c) The L2B Rayleigh clear winds versus the lidar measurements made during all the campaigns. d) Frequency distribution of the difference between Aeolus and lidar wind speeds for the same data set. Red represents measurements of an ascending orbit, while black represents measurements of a descending orbit.
The colors indicate whether Aeolus was on an ascending (red) or descending (black) orbit, i.e., whether the overpass took place in the evening or in the morning (local time), respectively. This separation between orbital phases is made because of the aforementioned disparities in distance to collocation. In addition, several long-term cal/val activities showed significant phase-dependent differences in the determined biases of Aeolus wind measurements (Wu et al., 2021; Iwai et al., 2021; Lux et al., 2021; Baars et al., 2020; Rennie and Isaksen, 2020; Geiß et al., 2019; Krisch and the Aeolus DISC, 2020), further supporting the need to study both cases independently. The correlation plot of the Aeolus wind is shown in Fig. 4a, alongside the retrieved linear regression. A linear trend is clearly seen between the Aeolus and the reference radiosonde observations. The following coefficients are presented with confidence bounds of one sigma. In Fig. 4a, the ascending trend line has a slope of 0.96 +/-0.03 with an intercept (i.e., an approximated bias) of -1.32 +/-0.38 m s-1, whereas the descending trend line has a slope of 0.94 +/-0.03 and an intercept of -0.42 +/-0.56 m s-1. The slopes are statistically similar, but the intercepts are not. Comparing this result with radiosondes from other cal/val campaigns, Iwai et al. (2021) found a slope of 1.01 and an intercept of 0.38 m s-1 in Okinawa. Baars et al. (2020) present a slope of 0.97, which is consistent with our observations, and an intercept of 1.57 m s-1 that could be explained by the lower maximum height of comparison, where we see the biggest shift towards negative biases (Fig. 3).
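The slope and intercept with one-sigma bounds quoted above can be obtained from an ordinary least-squares line fit. The sketch below assumes that the one-sigma bounds correspond to the standard errors of the fitted coefficients, which is not stated explicitly in the text; the function name and sample winds are hypothetical.

```python
import math


def line_fit_with_sigma(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, slope_sigma, intercept_sigma), where the sigmas
    are the one-sigma standard errors of the coefficients. Requires n > 2.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    resid_var = sum(r * r for r in residuals) / (n - 2)
    slope_sigma = math.sqrt(resid_var / sxx)
    intercept_sigma = math.sqrt(resid_var * (1.0 / n + mean_x ** 2 / sxx))
    return slope, intercept, slope_sigma, intercept_sigma


# Invented reference (x) and Aeolus (y) HLOS winds in m/s.
reference = [0.0, 1.0, 2.0, 3.0]
aeolus = [0.0, 1.0, 1.0, 2.0]
slope, intercept, s_sigma, i_sigma = line_fit_with_sigma(reference, aeolus)
print(f"slope = {slope:.2f} +/- {s_sigma:.2f}, intercept = {intercept:.2f} +/- {i_sigma:.2f}")
```

Two fitted coefficients can then be called statistically similar when their difference is small compared with the combined one-sigma bounds.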
Because of their similar location and acquisition window, the lidar data lead to many similar conclusions. In Fig. 4c, the ascending trend line has a slope of 0.97 ± 0.03 with an intercept of -1.41 ± 0.45 m s⁻¹, whereas the descending trend line has a slope of 0.94 ± 0.04 and an intercept of -0.83 ± 0.78 m s⁻¹. Both the slopes and the intercepts are statistically similar.
Comparing these results with other cal/val studies using lidar data, Wu et al. (2021) found overall negative biases with a slope of 1 and an intercept of -0.12 m s⁻¹, confirming the negative intercept we observe for all scenarios (Table 7).
Figure 4b shows the normalized frequency distribution of the deviation between the Rayleigh clear and radiosonde wind observations. The distribution follows a Gaussian pattern, meaning that, according to the normal distribution law, almost 70% of the samples are within the 10 m s⁻¹ absolute error margin. Aeolus also appears equally likely to overestimate or underestimate the wind. The mean of this distribution gives a bias of -0.79 m s⁻¹ for the Rayleigh wind observations. Using the median instead gives a bias of -0.94 m s⁻¹, slightly larger in magnitude than the mean-based value. All these results are reported in Table 4. The slope is close to one for both cases, which means that Aeolus resolves wind speed variations well, even when a special, sparser RBS is used.
For the lidar comparison (Fig. 4d), the mean of the distribution for ascending (descending) orbit collocations gives a bias of -1.41 m s⁻¹ (0.11 m s⁻¹) for the Rayleigh clear wind observations, and the median gives -1.05 m s⁻¹ (0.16 m s⁻¹). Combining both orbit types, the mean becomes -0.92 m s⁻¹ and the median -0.48 m s⁻¹. While close, the biases clearly differ depending on whether an ascending or a descending orbit is used, further supporting the need to distinguish both cases.
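The statistics quoted above (mean and median bias, standard deviation, scaled MAD) can be sketched as follows. This is a minimal illustration assuming the common definition of the scaled MAD as 1.4826 times the median absolute deviation from the median; the function name `difference_stats` and the sample winds are hypothetical.

```python
import statistics

# Factor that makes the MAD a consistent estimator of the standard
# deviation for normally distributed data.
MAD_SCALE = 1.4826


def difference_stats(aeolus, reference):
    """Bias and random-error metrics of the Aeolus-minus-reference differences."""
    diffs = [a - r for a, r in zip(aeolus, reference)]
    median = statistics.median(diffs)
    mad = statistics.median([abs(d - median) for d in diffs])
    return {
        "bias_mean": statistics.mean(diffs),
        "bias_median": median,
        "sd": statistics.stdev(diffs),  # sample standard deviation
        "scaled_mad": MAD_SCALE * mad,
    }


# Hypothetical collocated HLOS winds in m/s, not real data.
reference = [5.0, -1.0, 9.0, 3.0, -6.0]
aeolus = [2.0, -2.0, 9.0, 4.0, -8.0]
stats = difference_stats(aeolus, reference)
print(stats)
```

Because the scaled MAD relies on medians, a few strongly deviating bins affect it far less than they affect the standard deviation, which is one reason both metrics are reported side by side.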

Overall, neither the slope nor the bias is better for ascending orbits, which agrees with the absence of any representative statistical difference reported by Guo et al. (2021). The data even suggest that the bias is lower for descending orbits, which is unexpected given that they have a larger spatial offset from the reference measurement locations. We also observe a negative bias for all comparisons using the standard all-data method. This means that the satellite is prone to underestimating the HLOS wind speeds. A previous study indicates differences in the biases between the ascending and descending orbit phases, mainly occurring for the Rayleigh channel in late summer and autumn (Martin et al., 2021). One reason raised by Sun et al. (2014) may be that meteorological conditions such as wind speed, boundary-layer height, air temperature, and aerosol distribution differ from one orbital phase to the other. However, none of these orbit-dependent events were observed during our measurements. In addition, Weiler et al. (2021b) showed that the telescope temperature effect is an important contributor to orbital phase biases. Other studies indicate lidar comparison biases of 1. [...] with what we observe, the radiosonde biases show a greater variety in the outcomes, which might be due to the pendulum-like motion of the suspended radiosonde during the early stage of ascent (Kumer et al., 2014).
Table 5. Overview of the different comparison methods and their outcome on the statistical metrics, depending on the instrument and the orbit phase. The data used include all the collocations shown in Tables 1 and 2.
Table 5 shows the previously mentioned data and the different comparison methods. The all-data method means that the instrument is collocated with the closest overall Aeolus overpass.
The altitude range method discards the data outside the 5 to 22 km range to account for possible RBS issues and non-optimal satellite coverage area. The last method aggregates every eligible profile (within the last two hours) into a single averaged collocation. The goal was to see whether the uppermost-bin issue becomes insignificant after a certain amount of averaging, thereby making the data above 22 km usable.
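The altitude range and averaging methods described above can be sketched as follows. This is only an illustration under simplifying assumptions: profiles are represented as lists of (altitude, wind) pairs, bins are matched by identical altitude, and the function names and sample values are hypothetical rather than taken from the actual processing chain.

```python
def restrict_altitude(profile, z_min_km=5.0, z_max_km=22.0):
    """Keep only the (altitude_km, value) pairs inside [z_min_km, z_max_km]."""
    return [(z, v) for z, v in profile if z_min_km <= z <= z_max_km]


def average_profiles(profiles):
    """Average several profiles bin by bin; bins are matched by exact altitude."""
    values_by_altitude = {}
    for profile in profiles:
        for z, v in profile:
            values_by_altitude.setdefault(z, []).append(v)
    return [(z, sum(vs) / len(vs)) for z, vs in sorted(values_by_altitude.items())]


# Hypothetical Aeolus HLOS profiles as (altitude in km, wind in m/s) pairs.
profile_1 = [(3.0, 4.0), (10.0, 12.0), (20.0, -6.0), (26.0, 30.0)]
profile_2 = [(3.0, 6.0), (10.0, 10.0), (20.0, -8.0), (26.0, 10.0)]

all_data = profile_1                                 # all-data method
altitude_range = restrict_altitude(profile_1)        # altitude range method
averaged = average_profiles([profile_1, profile_2])  # averaging method
print(altitude_range)
print(averaged)
```

In the sketch, the noisy 26 km bin is dropped by the altitude range method, whereas the averaging method keeps it and reduces its scatter by combining adjacent profiles.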
So far, both methods show encouraging results, reducing the scaled MAD for both instruments and both orbit types. To put these numbers into perspective, for the radiosondes, a scaled MAD of 4.84 m s⁻¹ is found by Baars et al. (2020), 3.97 m s⁻¹ by Iwai et al. (2021), and 5.01 m s⁻¹ by Martin et al. (2021). A standard deviation of 4.43 m s⁻¹ (Iwai et al., 2021) is also reported. We report a scaled MAD of 5.37 m s⁻¹ and an SD of 6.18 m s⁻¹, which fall in the same range of results if we account for the broader altitude range considered in our comparison (see Table 5).
For the lidars, other cal/val campaigns report scaled MADs of 3.91 m s⁻¹ (Wu et al., 2022), 5.21 m s⁻¹, and 5.58 m s⁻¹ for Iwai et al. (2021). They also report SDs of 5.98 m s⁻¹ and 4.78 m s⁻¹ (Chen et al., 2022), 4.76 m s⁻¹ (Wu et al., 2021), and 5.69 m s⁻¹ and 6.53 m s⁻¹ (Iwai et al., 2021). Our results point to a scaled MAD of 6.49 m s⁻¹ and an SD of 7.25 m s⁻¹, in line with other studies, although our values are larger because of the wider altitude range of the comparison. It should also be noted that we do not meet the mission requirements (Ingmann and Straume, 2016), as also seen in previous studies (Baars et al., 2020; Iwai et al., 2021; Martin et al., 2021; Rennie et al., 2021; Wu et al., 2022) with similar magnitudes. The offset between the observed SD and the SD specified in the mission requirements does not seem to change between the free troposphere and the first kilometers of the stratosphere. The difference between the observed and the required SD is largest in the uppermost region of observation, where it exceeds 4 m s⁻¹.

Case-based analysis
In this subsection, we further discuss Aeolus' capacities and performance with the help of four specific case studies. The goal is to provide an overview of the measurements by choosing representative collocations, each helping to show specificities present in the dataset. It shows that in the same conditions (i.e., spatial/temporal offsets) Aeolus can behave very differently.
These measurements are part of the more extensive statistical analysis presented in the above subsection. Figure 5 shows the HLOS wind velocity profiles measured by the radiosonde (black), the ground-based lidar (red), and the space lidar (blue). The shadings represent the measurement error for each data point. The radiosonde uncertainty is between 0.4 and 1 m s⁻¹ (Dirksen et al., 2014), which is smaller than the uncertainties of the ground-based lidars (2.2 m s⁻¹) (Khaykin et al., 2020) and of Aeolus (4.1-4.4 m s⁻¹) (Martin et al., 2021). For clarity, radiosonde wind speed uncertainties are not plotted in Fig. 5. Since Aeolus takes measurements 35° off-nadir, the horizontal distance from the Aeolus observations to the ground-based lidar is different for each height bin in the Aeolus wind profile. At the same time, the radiosonde drifts along the direction of the local wind, so the distance between the Aeolus measurements and the radiosonde varies with height during the balloon sounding. Furthermore, not only the distance but also the time difference between Aeolus and the radiosonde changes with time and, therefore, with altitude.
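The height dependence of the horizontal offset follows from simple geometry. The sketch below assumes a flat Earth and a constant 35° slant angle, so it only gives the order of magnitude of the horizontal shift of a height bin relative to the point where the line of sight meets the ground; the function name is hypothetical and Earth curvature is ignored.

```python
import math


def horizontal_shift_km(altitude_km, off_nadir_deg=35.0):
    """Horizontal displacement of a bin at altitude_km along a slanted line
    of sight, relative to its ground intersection (flat-Earth assumption)."""
    return altitude_km * math.tan(math.radians(off_nadir_deg))


# Shift for a few bin altitudes (km), rounded to 0.1 km.
shifts = {z: round(horizontal_shift_km(z), 1) for z in (5.0, 15.0, 25.0)}
print(shifts)
```

Under these assumptions the sampled air volume moves horizontally by roughly 0.7 km per kilometer of altitude, which is small compared with the collocation distances of several tens of kilometers discussed here but is not constant along the profile.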

Case Study a): February 24, 2021 (AboVE-Maido 2) - 2B11
This collocation was taken at 14h32 UTC. Both instruments were started 30 minutes prior, at around 14h UTC. The mean measuring distance is 67 km for the radiosonde and 40 km for the lidar. The mean bias is 0.46 m s⁻¹ with respect to the radiosonde and -0.68 m s⁻¹ with respect to the lidar. The standard deviations are 6.8 m s⁻¹ and 8.8 m s⁻¹, the scaled MADs are 6.36 m s⁻¹ and 11.38 m s⁻¹, and the correlation coefficients are 0.55 and 0.46, respectively. The figure depicts a very specific oscillating pattern. Although ALADIN follows a trend similar to that of the other two instruments (the absolute mean bias is below 1 m s⁻¹ for both reference instruments), it shows a very particular signature between 12 and 19 km. This observation could hint at the existence of oscillating perturbations (OPs). We could not find any literature reporting similar phenomena. What remains to be explained is the nature of the oscillation, which does not correspond to any known phenomenon. While hot pixels could also play a part, explaining the increase in magnitude, they cannot account for the oscillating nature.
Case Study b): June 9, 2021 (AboVE-Maido 2) - 2B12
This collocation was taken at 14h33 UTC. Both instruments were started 30 minutes prior, at around 14h UTC. The mean spatial offset is 33 km for the radiosonde and 40.5 km for the lidar. The mean bias between Aeolus and the radiosonde is 2.52 m s⁻¹, and the mean bias between Aeolus and the lidar is 2.22 m s⁻¹. The standard deviations are 11.21 m s⁻¹ and 11.54 m s⁻¹, the scaled MADs are 6.75 m s⁻¹ and 9.05 m s⁻¹, and the correlation coefficients are 0.83 and 0.84.
The Aeolus wind profile in the atmospheric boundary layer and the lower troposphere is in good agreement with the reference measurements, except for the bins at 22 km and above, which show a significant bias compared with the radiosonde and lidar-retrieved HLOS winds. These exceptionally strong deviations are observed for most AboVE-Maido 2 campaign collocations and occur specifically within the uppermost bins. The low molecular density could explain the higher values observed at high altitude levels. We conclude that, because of the specific RBS used during this campaign, the satellite does not resolve the higher altitude ranges with enough precision, since it does not receive enough backscatter.

Case Study c): December 14, 2021 (AboVE-OHP 2) - 2B13
The third case study is from the second and most recent cal/val campaign conducted at the Observatoire de Haute Provence. The distance to collocation is under 70 km for both instruments, and the time interval is less than 2 hours. The mean bias between Aeolus and the radiosonde is 1.5 m s⁻¹, and the mean bias between Aeolus and the lidar is 0.5 m s⁻¹. The standard deviations are 2.84 m s⁻¹ and 4.02 m s⁻¹, the scaled MADs are 2.85 m s⁻¹ and 3.22 m s⁻¹, and the correlation coefficients are 0.92 for the radiosonde and 0.86 for the lidar. The scaled MAD is exceptionally low compared to the other cases, showing that ALADIN is still able to perform very well and deliver excellent results beyond its nominal lifetime of three years. Here, the Rayleigh clear profile obtained by Aeolus stays close to the ground-based data, except for a sudden spike at around 23 km. This spike, which is not observed by the reference lidar, could be linked to Aeolus' hot pixel issue (Weiler et al., 2021a), since 24 hot pixels were present in the Rayleigh channel (ESA, 2021).

Case Study d): December 20, 2021 (AboVE-OHP 2) - 2B13
This collocation is a good example of the poorly behaving measurements Aeolus can produce: the satellite seems to miss what the lidar and the radiosonde see, even within close collocation criteria (40 minutes time difference and both instruments under 80 km). An oscillating perturbation is also present in the Aeolus profile but not in the reference data. Given the very close collocation, one would expect a negligibly small geophysical bias and a common general trend across the three observations. Instead, we see an oscillating profile with significant offsets from the reference measurements. This could be interpreted as a combination of oscillations and hot pixels, meaning that these phenomena can co-occur. The mean bias between Aeolus and the radiosonde is -6.56 m s⁻¹, and the mean bias between Aeolus and the lidar is -6.53 m s⁻¹. The standard deviations are 6.84 m s⁻¹ and 7.12 m s⁻¹, the scaled MADs are 6.42 m s⁻¹ and 6.59 m s⁻¹, and the correlation coefficients are 0.76 and 0.75.

Long-term validation
This section aims to convey an idea of the evolution of Aeolus' performance over its mission cycle, starting in September 2018. For the comparisons, the Météo-France sites in La Réunion (21° S, 55° E) and Nîmes (43° N, 4° E) were used. These sites perform twice-daily radiosonde launches at midnight and noon, opening broader possibilities for potential collocations. Both sites have an average distance to collocation of 120 km. The La Réunion site has a time difference of 3 hours on average, whereas the Nîmes site has a time difference of 5 hours 30 minutes. Figure 6 shows the time series of the standard deviation of the difference between Aeolus Rayleigh clear HLOS winds and Météo-France radiosonde launches at both sites. Outlier detection was conducted on both time series using the scaled MAD technique (outliers are defined as elements more than three scaled MADs from the median). Also shown in Fig. 6 [...]
As previously mentioned, both sites have a similar average distance to collocation; however, the time offset is larger for the Nîmes site by 2h30. As demonstrated in Sect. 4.1, the distance to collocation does not significantly impact the bias of the collocations. This explains why both time series share a very similar trend and show the same variations specific to mission events (FM-A laser power loss, FM-B switch, etc.). We note that the standard deviation at Nîmes is higher by 1.5 m s⁻¹ on average compared to that at La Réunion. Because the distance to collocation is the same at both sites, this higher random error can be explained by the larger time offset. One can thus conclude that, in terms of collocation criteria, the temporal offset is more critical than the spatial offset when collocating satellite and ground-based wind measurements. While it might not be trivial to generalize these results to other stations with different weather and wind regimes, the findings of this study may be relevant to locations with similar characteristics.
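The outlier criterion stated above (elements more than three scaled MADs from the median) can be sketched as follows. The scale factor 1.4826 is the usual normal-consistency constant and is an assumption here, as are the function name and the sample series.

```python
import statistics

MAD_SCALE = 1.4826  # assumed normal-consistency factor for the MAD


def mad_outlier_mask(values, n_mads=3.0):
    """Return True for elements farther than n_mads scaled MADs from the median."""
    median = statistics.median(values)
    scaled_mad = MAD_SCALE * statistics.median([abs(v - median) for v in values])
    return [abs(v - median) > n_mads * scaled_mad for v in values]


# Hypothetical time series of collocation standard deviations (m/s).
series = [5.0, 5.5, 6.0, 5.2, 5.8, 15.0]
mask = mad_outlier_mask(series)
cleaned = [v for v, is_outlier in zip(series, mask) if not is_outlier]
print(mask)
print(cleaned)
```

A median-based threshold is used rather than a mean and standard deviation because a single extreme collocation would otherwise inflate the threshold that is supposed to reject it.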
For the La Réunion time series (Fig. 6a), the October 2019 and June 2021 AboVE-Maido campaign data correspond well with the Météo-France observations, whereas the lower values could be explained by the reduced geophysical bias of the campaign collocations. Additional lidar observations at Maido conducted outside the dedicated campaigns show increased random errors, supposedly due to the larger time offsets for some of these collocations (cf. Table 1). The results of the AboVE-OHP-2 campaign, displayed in Fig. 6b, show random errors similar to or lower than those inferred from the Météo-France radiosonde launches. As for the La Réunion series, the lower random errors can be explained by the reduced spatial and temporal offsets of the campaign collocations.
Overall, a few conclusions can be shared across both observation sites. Over time, we observe a slight increase in the standard deviation (+1 m s⁻¹ on average between January 2020 and 2022). It is possible to split the time series into different periods. The early FM-B period (July 2019 to January 2020) shows the lowest random error and is consistent across both observation sites. We refer to this period as the "golden era" of Aeolus. Around mid-April 2020, the M1 bias correction introduced into the retrieval lowered the bias significantly (Weiler et al., 2021b); however, it does not appear to have an impact on the random error. From this point on, the spread of the data and its average value stabilized. The last months of the time series show a stable average value and standard deviation, indicating that the satellite is maintaining a good precision over the last few months of 2022.
High standard deviation values could be due to hot pixels present in the ACCDs and to the OPs mentioned above. Previous research revealed that 6% of the ACCD pixels are hot pixels and that around 13% of the pixels will be affected at the end of the extended mission lifetime in spring 2023, assuming that the hot pixel generation rate does not change (Weiler et al., 2021a).
Corrections to several substantial bias sources in the Aeolus L2B winds have been implemented, including corrections to the dark current signal anomalies of single pixels (so-called hot pixels) on the Accumulation-Charge-Coupled Devices (ACCDs), the linear drift in the illumination of the Rayleigh/Mie spectrometers, and the telescope M1 mirror temperature variations (Reitebuch et al., 2020; Weiler et al., 2021b). Indeed, Figure 6 supports other studies that claim that hot pixels, laser energy, and receiving path degradation effects in ALADIN have been mitigated (Feofilov et al., 2022; Baars et al., 2020; Weiler et al., 2020). If this mitigation were not present, we would observe an increase in the standard deviation at a much higher rate (Weiler et al., 2021a). Unfortunately, due to potential calibration issues, uncorrected biases might remain in the Aeolus L2B winds and may contribute to potential biases between Aeolus and the Météo-France observed winds. In addition, the Aeolus L2B winds might be biased towards the ECMWF model, as the M1 bias correction makes use of ECMWF 6-hour forecasts (Rennie et al., 2021), which might also lead to suboptimal assimilation of Aeolus winds on ground comparisons.

Discussion
In this paper, to evaluate the accuracy and precision of the Aeolus retrieved wind results, double collocation techniques using both radiosondes and ground-based lidars were conducted. Due to the proximity of collocation and a similar functioning principle, a statistical analysis of random error, biases, and their evolution throughout the mission lifetime was performed, [...]
Table 7. [...] (Guo et al., 2021) 0.81 6.82 - -0.64 0.99 -0.64
[...] Wu et al.'s (2021) initiative, we summarized the recent comparison campaigns from the Cal/Val teams across the globe and Aeolus' lifespan. We decided to use the same statistical parameters and presentation that Wu et al. (2021) proposed, in order to keep a consistent comparison method over several studies. Therefore, the statistical parameters include the correlation coefficient, SD, MAD, bias, slope, and intercept. We can compare our results to those of other instruments that were not mentioned in previous sections, such as the ALADIN Airborne Demonstrator (A2D) (Witschas et al., 2020; Lux et al., 2020a), airborne Doppler Wind Lidars (DWL) (Witschas et al., 2020; Witschas et al., 2022), and Radar Wind Profilers (WPR) (Zuo et al., 2022; Guo et al., 2021; Belova et al., 2021; Iwai et al., 2021). As in Wu et al.'s (2021) observations, we find the same consistency and similarities with the more recent studies. A non-negligible proportion of the disparities between the measurements is caused by the variety of comparison ranges, which extend from the boundary layer to the mid-stratosphere.
Over time, we find a very modest decrease in the precision of ALADIN measurements (Fig. 6) at both cal/val sites.
While studies claim that the effects have been mitigated thanks to various corrections (Rennie et al., 2021; Reitebuch et al., 2020; Weiler et al., 2021b), these corrections might not compensate for the error that increases over time due to hot pixels, the temperature of the mirror, and laser-induced contamination. Uncorrected biases might remain in the Aeolus L2B winds and may contribute to potential biases between Aeolus and the Météo-France observed winds.
The argument made in Sect. 4.3 is that the time offset has a larger impact on the collocation quality than the spatial offset. It follows that the time criterion should be favored over the distance criterion when collocating different measurements. To support this, we showed that the distance to collocation has a small effect on the bias (Sect. 4.2) and that the time difference has a significant impact on the standard deviation (Sect. 4.3). Some error sources were pointed out by previous research, e.g., hot pixels and dark current anomalies (Weiler et al., 2021a), Rayleigh wind errors introduced by angular variations (Lux et al., 2018; Lux et al., 2020a), vibrations introduced by the satellite platform, which affect the Q-switched master oscillator cavity length (Lux et al., 2020b), photon shot noise (Liu et al., 2006), micro-vibrations due to critical rotation speeds of the satellite's reaction wheels (Lux et al., 2021), mechanical disturbances generated by reaction wheels of the class of those embarked on Aeolus (Le., 2017), linear drift in the illumination of the Rayleigh/Mie spectrometers, and the telescope M1 mirror temperature variations (Reitebuch et al., 2020; Weiler et al., 2021b). In Sect. 4.1, we observe a steep increase in the random error above 25 km, which can also be seen in Fig. 5b.
We believe that this issue, specific to the high RBS profiles, is caused by the lack of molecular backscatter. The lack of signal could result in a higher random error, but the estimated error is shown to remain within a narrow interval (±5 m s⁻¹) compared to the observed deviation from the reference measurements. Although larger than for the lower altitude bins, the estimated measurement error for the uppermost bins is therefore strongly underestimated, which adds to the physical fact that there are fewer molecules at higher altitude levels.
In this study, we also refer to OPs, which might be a new, unreferenced phenomenon. Over the 65 collocations, the phenomenon was observed 5 times. It is impossible to tell how often this perturbation occurs, because detecting it requires a profile-to-profile reference measurement, which is not available in Aeolus-to-model comparisons. We note that applying a high-pass filter with the cutoff at 5 km vertical wavelength smooths out all oscillations and provides an estimate of the profile much closer to that of the reference instruments. While this is a good patch, it is not possible to apply such a filter to every wind profile, as it would probably also remove traces of physical events. One way to further improve this method would be to examine how the HLOS value fluctuates from one bin to another. Since OPs tend to have a fairly constant peak-to-peak amplitude and period, it should be fairly easy to detect such patterns in the data. One way to further track down this issue would be to retrieve the L1B useful signal, as well as the SNR, to see whether it could be linked to bad behavior of the optical system itself.
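The bin-to-bin check suggested above could look like the following sketch. It is a crude, hypothetical heuristic and not a method used in this study: it counts how many consecutive times the bin-to-bin HLOS step reverses sign with an amplitude above a threshold, so it only captures oscillations with a period of about two bins. The function name, the threshold values, and the sample profiles are assumptions.

```python
def longest_alternation(hlos, min_step=2.0):
    """Length of the longest run of consecutive sign reversals between
    bin-to-bin steps, counting only steps of at least min_step (m/s)."""
    steps = [upper - lower for lower, upper in zip(hlos, hlos[1:])]
    longest = current = 0
    for previous, following in zip(steps, steps[1:]):
        reversal = previous * following < 0
        if reversal and min(abs(previous), abs(following)) >= min_step:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


# Hypothetical HLOS profiles (m/s), ordered from the lowest to the highest bin.
smooth_profile = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
oscillating_profile = [0.0, 6.0, -5.0, 7.0, -6.0, 5.0, 0.0]

MIN_REVERSALS = 3  # hypothetical flagging threshold
flag_smooth = longest_alternation(smooth_profile) >= MIN_REVERSALS
flag_oscillating = longest_alternation(oscillating_profile) >= MIN_REVERSALS
print(flag_smooth, flag_oscillating)
```

Such a flag would only mark candidate profiles for inspection; real wave activity can also produce alternating steps, so a flagged profile would still need to be checked against a reference measurement.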
We observe a negative bias in most scenarios, except for the altitude range method, where the descending-orbit lidar collocations show a positive bias. On average, the satellite tends to underestimate the wind speed by around 1 m s⁻¹. Several papers cited above (Sect. 4.1 and Table 7) report similar observations.
With this study, we have addressed the performance of the ALADIN Rayleigh channel over a broad range of altitudes, from the lower troposphere to the maximum altitude of 30 km enabled by the AboVE-2 range bin setting. The performance of the ALADIN Mie channel in the lower stratosphere remains to be assessed using the lidar and radiosonde measurements at La Réunion. This site provided the most extensive lidar observations of the 2022 Hunga Tonga volcanic eruption plumes in the stratosphere (Baron et al., 2022), which were sampled by the ALADIN Mie channel (Legras et al., 2022; Khaykin et al., 2022).

Summary
The Aeolus wind products from the first wind lidar in space, ALADIN, were validated against radiosondes and ground-based Doppler Rayleigh-Mie lidars from the observation sites of OPAR in La Réunion and OHP in Haute Provence (France). All 65 collocations were collected during periods from baseline 2B02 to 2B13, spanning January 2019 to January 2022. In summary, we find a standard deviation of 6.18 m s⁻¹ and 7.25 m s⁻¹ and a scaled MAD of 5.37 m s⁻¹ and 6.49 m s⁻¹ for the radiosonde and lidar intercomparisons, respectively. We also find correlation coefficients of 0.82 and 0.77, slopes of 0.95 and 0.96, and y-intercepts of -0.87 m s⁻¹ and -1.12 m s⁻¹ for the radiosonde and lidar intercomparisons, respectively. The biases and random errors observed are higher than those outlined in the mission requirements (Ingmann and Straume, 2016), as seen in Table 6: the bias is higher by an average of 0.15 m s⁻¹, and the standard deviation requirement is exceeded at every altitude level, by 50% on average.
When collocating ground-based and satellite measurements, the time criterion was found to matter more than the distance criterion, as shown in Sect. 4.3 of this study. However, it should be noted that these results may not be applicable to stations with different weather and wind regimes and are only relevant to locations with similar characteristics. This study also revealed previously undocumented phenomena (oscillating perturbations, OPs, Sect. 4.2; excessive random error in the uppermost bins above 25 km, Sect. 4.1). While the latter phenomenon can be mitigated by averaging several adjacent profiles, the OP issue could be addressed by applying frequency filters, which would require further investigation. Within this study, we have noticed range-bin and temporal wind dependencies. For the uppermost bins (above 22 km on average) enabled by the AboVE RBS, the random error is enhanced by 2-3 m s⁻¹ for the ascending and descending phases. This can be explained by the lower air density there, which reduces the molecular backscatter intensity. We note that aggregating two or more adjacent Aeolus profiles improves the comparison by 70%. For the larger spatial offsets, the method yields poorer results compared to the altitude range method. Both methods lowered the scaled MAD in every comparison category and for both instruments. Similarly to the results of Guo et al. (2021) and Zuo et al. (2022), we do not observe any significant difference between ascending and descending phases, which goes against previous observations about orbit-dependent characteristics (Rennie et al., 2021; Martin et al., 2021). Guo et al. (2021) showed slopes of 0.91 and 0.96 and intercepts of 0.47 m s⁻¹ and -1.4 m s⁻¹, respectively. This is close to our observations.
In detail, the radiosonde collocations in the ascending phase have a mean correlation coefficient of 0.77 and a scaled MAD of 5.58 m s⁻¹. In contrast, the descending phase has a mean correlation coefficient of 0.91 and a scaled MAD of 4.99 m s⁻¹. The lidar collocations in the ascending phase have a mean correlation coefficient of 0.73 and a scaled MAD of 7.17 m s⁻¹, whereas the descending phase has a mean correlation coefficient of 0.85 and a scaled MAD of 5.06 m s⁻¹.
Overall, we recognize changes in the L2B data quality throughout the satellite's lifetime. Thanks to the regular measurement campaigns and the use of twice-daily Météo-France radiosondes, we were able to observe the long-term evolution of the precision of the satellite, based on the standard deviation of daily collocations between the satellite and the closest Météo-France station. As observed during the AboVE-Maido validation campaigns, the mean random error increases from 4.6 m s⁻¹ to 7.6 m s⁻¹ between AboVE-1 (October 2019) and the AboVE-2 campaign (June 2021). For the AboVE-OHP campaigns, the mean random error increases from 5.6 m s⁻¹ to 6.4 m s⁻¹ between AboVE-1 (January 2019) and the AboVE-2 campaign (December 2021). This is consistent with the routine observations using Météo-France's radiosondes. The early FM-B period, "the golden era" of Aeolus, shows the lowest random error at both sites, and the random error has been increasing ever since.
Data availability. Aeolus data are publicly available through the Aeolus online dissemination system (https://aeolusds.eo.esa.int/oads/access/) [...]
Author contributions. [...] and PK offered scientific insight. JPC and YH conducted AboVE-Maido radiosonde measurements. The paper is written by MR with contributions from all co-authors.
Acknowledgements. The upgrade of OHP and OPAR wind lidars was financially supported by CNES (Centre National d'Etudes Spatiales) as well as through EU FP7 ARISE and H2020 ARISE2 projects. We gratefully thank the personnel of OPAR station (Eric Gloubic, Patrick Hernandez and Louis Mottet) and the personnel of Station Gerard Megie at OHP (Frederic Gomez, Francois Dolon, Pierre Da Conceicao, Francois Huppert and others) for conducting the radiosonde launches and lidar operation. The work related to Aeolus validation has been performed in the frame of Aeolus Scientific Calibration & Validation Team (ACVT) activities