Calibration of a Water Vapour Lidar using a Radiosonde Trajectory Method

Lidars are well-suited for trend measurements in the upper troposphere and lower stratosphere, particularly for species such as water vapour. Trend determinations require frequent, accurate and well-characterized measurements. However, water vapour Raman lidars produce a relative measurement and require calibration in order to transform the measurement into physical units. Typically, the calibration is done using a reference instrument such as a radiosonde. We present an improved trajectory technique to calibrate water vapour Raman lidars based on the previous work of Whiteman et al. (2006), Leblanc and Mcdermid (2008), and Adam et al. (2010), who used radiosondes as an external calibration source and matched the lidar measurements to the corresponding radiosonde measurement. However, they did not consider the movement of the radiosonde. As calibrations can be affected by a lack of co-location with the reference instrument, we have attempted to improve their technique by tracking the air parcels measured by the radiosonde relative to the field-of-view of the lidar. This study uses GCOS Reference Upper Air Network (GRUAN) Vaisala RS92 radiosonde measurements and lidar measurements from the MeteoSwiss RAman Lidar for Meteorological Observation (RALMO), located in Payerne, Switzerland, to demonstrate this improved calibration technique. We compare this technique to traditional radiosonde-lidar calibration techniques which do not involve tracking the radiosonde. Both the traditional and our trajectory methods produce similar profiles when the water vapour field is homogeneous over the 30 min calibration period. We show that the trajectory method more accurately reproduces the radiosonde profile when the water vapour field is not homogeneous over a 30 min calibration period. We also calculate a calibration uncertainty budget that can be performed on a nightly basis. We include the contribution of the radiosonde measurement uncertainties to the total calibration uncertainty, and show that on average the uncertainty contribution from the radiosonde is 4%. We also calculate the uncertainty in the calibration due to the uncertainty in the lidar's counting system, caused by phototube paralyzation, and find it to be an average of 0.3% for our system. This trajectory method allows a more accurate calibration of a lidar, even when non-co-located radiosondes are the only available calibration source, and also allows additional nights to be used for calibration that would otherwise be discarded due to variability in the water vapour profile.

Atmos. Meas. Tech. Discuss., https://doi.org/10.5194/amt-2018-246. Manuscript under review for journal Atmos. Meas. Tech. Discussion started: 12 October 2018. © Author(s) 2018. CC BY 4.0 License.


Introduction
Water vapour is the primary contributor to the greenhouse effect due to its ability to absorb infrared radiation efficiently. Water vapour also has high temporal and spatial variability, making it difficult to characterize its influence on the atmosphere. Instruments with high spatial-temporal resolutions, such as lidars, are uniquely suited to long-term stratospheric and tropospheric water vapour studies. When conducting climatological studies, ground-based lidars have an advantage over satellite-borne instruments in that they have the ability to provide more frequent observations of the same location. Lidar measurements are particularly useful for creating statistically significant water vapour trends of the Upper Tropospheric and Lower Stratospheric (UTLS) region, as they are able to take long-term and frequent measurements (Weatherhead et al., 1998; Whiteman et al., 2011b). Minimizing the uncertainty in the measurements is also critical in order to establish a valid trend. A large component of a lidar measurement's uncertainty budget is its calibration constant. Water vapour lidars measure relative profiles, and therefore require a calibration to convert the measurements into the correct units. Refining the calibration process is critical to detect the small changes anticipated in the trend analysis. Several Raman lidar calibration techniques have been developed over the years, including internal, external, and a hybrid of internal and external methods.
Internal calibration techniques require no external reference instrument. They can account for the entire optical path in the lidar system to find the water vapour calibration constant. In essence, all optical transmittances, quantum efficiencies of the detectors, Raman cross-sections, the geometric overlap, and their associated uncertainties must be quantified and accounted for. Some of these can be derived simultaneously using the white light calibration discussed in Venable et al. (2011). The white light technique is advantageous in that it can accurately track changes in the calibration constant. However, the calibration is incapable of detecting shifts in spectral separation units, and is not able to accurately detect the cause of calibration changes unless multiple lamps in different locations are used (Whiteman et al., 2011a). Venable et al. (2011) further expanded on the white light technique by using a scanning lamp instead of a stationary lamp. When the scanning method was compared to the external radiosonde method, it was found that both methods agreed with each other within their respective uncertainties.
However, the internal white lamp method is dependent on the degree to which we know the filter bands, the lamp intensity function, and the molecular cross-sections. The last factor is the most limiting due to its current uncertainties on the order of 10% (Penney and Lapp, 1976). While internal calibration offers many advantages, it is impractical for many systems, such as lidars that use multiple mirrors (Dinoev et al., 2013; Godin-Beekmann et al., 2003) or large-aperture mirrors such as the rotating liquid mercury mirror of The University of Western Ontario's Purple Crow Lidar (Sica et al., 1995).
The standard external method involves comparing the lidar and a reference instrument; typically the reference instrument is a radiosonde (Melfi, 1972; Whiteman et al., 1992; Ferrare et al., 1995), but microwave radiometers may also be used (Han et al., 1994; Hogg et al., 1983; Foth et al., 2015). External calibrations are often preferable because there is no need to characterize every system component and the uncertainties in the Raman cross-sections do not contribute. However, the accuracy of the external calibration is dependent on the accuracy of the reference instrument. Radiosondes are widely used calibration instruments but can have relative humidity uncertainties between 5 and 15% depending on the time of day (Miloshevich et al., 2009; Dirksen et al., 2014). To minimize the calibration uncertainties induced by biases in the radiosonde reference, the GCOS (Global Climate Observing System) Reference Upper-Air Network (GRUAN) has established a robust correction algorithm for the Vaisala RS92 radiosondes, as RS92 radiosondes are the most frequently used calibration radiosondes (Immler et al., 2010; Dirksen et al., 2014). GRUAN RS92 relative humidity profiles have been shown to be 5% more moist than Vaisala relative humidity profiles, while reducing the relative humidity errors by up to 2% (Dirksen et al., 2014). A portion of the calibration uncertainty when using radiosondes can arise from the radiosonde's lack of co-location with the lidar, hereafter the "representation" uncertainty.
Hybrid internal-external methods have also been implemented by Leblanc and Mcdermid (2008) and Whiteman et al. (2011a). In these hybrid techniques, the white light calibration lamp is used to monitor the efficiency of the lidar optical paths, but is supplemented with radiosondes for the absolute calibration value. The hybrid technique will monitor relative changes in the calibration constant, but must be supplemented periodically with an external calibration (Leblanc and Mcdermid, 2008).
For any external calibration where the lidar and the calibration instrument do not share a common field-of-view, variations in water vapour cause an additional uncertainty in the calibration that is often not included in the uncertainty budget. This paper attempts to resolve the co-location problem and minimize the representation uncertainty by using a tracking technique that expands upon those discussed in Whiteman et al. (2006), Leblanc et al. (2012), and Adam et al. (2010). The co-location problem can be particularly acute for calibration via a radiosonde, as the radiosonde takes approximately 30 min to reach the tropopause, during which time the radiosonde can travel 4 km or more from the lidar's field-of-view (assumed here to be the zenith, which is typically how water vapour lidars are operated). The distance traveled by the radiosonde would normally have little bearing on a calibration measurement, assuming the air mass being sampled is horizontally homogeneous. However, if we calibrate while on the edge of an air mass, or the air mass simply is not horizontally uniform, then the water vapour field may change dramatically over the distances the radiosonde travels. Lidar stations which have the resources to use daily radiosondes may not see this as much of a hindrance; however, if the station relies on infrequent calibration campaigns, then the campaign calibration results are entirely dependent on the weather at the time.
In order to ensure that the lidar and the radiosonde are measuring the same air, we have developed an improved lidar-radiosonde calibration technique that utilizes the position of the radiosonde and the wind speed and direction measured by the radiosonde. The wind speed and direction measurements allow us to track the air parcels as measured by the radiosonde with respect to the position of the lidar. If the air is within a 3 km radius around the lidar, we use the corresponding times and lidar scans for calibration. We have implemented the technique using 76 nighttime GRUAN RS92 radiosonde flights from 2011 to 2016. Daytime calibrations were not tested due to the significantly reduced signal-to-noise ratio (SNR) in daylight measurements and the inability to reach above 5 km effectively with the lidar. We will illustrate the method using measurements from the MeteoSwiss RAman Lidar for Meteorological Observation (RALMO) (Dinoev et al., 2013; Brocard et al., 2013) on July 22, 2015, corresponding to the 00:00 UTC GRUAN RS92 radiosonde launch. Section 2 will outline the measurements used in the study. Section 3 will discuss the methodology. Sections 4 and 5 will compare the new trajectory method with the traditional calibration technique and their respective uncertainties. Sections 6 and 7 will summarize the results and discuss their implications and the next steps forward.

Radiosonde Measurements
The MeteoSwiss Payerne research station launches Vaisala GRUAN RS92 radiosondes within 100 m of RALMO bi-weekly.
A subset of these radiosondes is processed by GRUAN. This study uses the official GRUAN RS92 radiosonde product to minimize and accurately calculate the calibration uncertainty and the contribution from the radiosonde. GRUAN requires that radiosondes undergo several pre-flight checks and calibrations, which are detailed in Dirksen et al. (2014). These calibrations are needed to correct radiation and systematic relative humidity biases in the radiosonde temperature, pressure, and relative humidity profiles. All radiosonde measurements from 2011 to 2016 taken by the RS92 Vaisala sondes were processed by the GRUAN correction software (Immler et al., 2010). Radiosondes prior to October 2011 were RS92 radiosondes but were not processed by GRUAN because they were not compatible with the GRUAN requirements listed in Dirksen et al. (2014).
The radiosonde water vapour mixing ratios are calculated using the GRUAN-corrected relative humidity profiles and the Hyland and Wexler formulae (Hyland and Wexler, 1983). By convention, the relative humidities are assumed to be over water for all altitudes. A total of 76 GRUAN RS92 nighttime flights were used to conduct this analysis.
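The conversion from relative humidity to mixing ratio can be illustrated with a minimal Python sketch. It uses the Hyland and Wexler (1983) saturation vapour pressure over liquid water and the usual relation w = εe/(p − e). The function names, the value ε = 0.622, and the example inputs are assumptions made for illustration and are not taken from the GRUAN processing.

```python
import math

# Hyland and Wexler (1983) coefficients for the saturation vapour pressure
# over liquid water; temperature in K, pressure in Pa.
HW_COEFFS = (-0.58002206e4, 0.13914993e1, -0.48640239e-1,
             0.41764768e-4, -0.14452093e-7, 0.65459673e1)

EPSILON = 0.622  # assumed ratio of the molar masses of water and dry air


def saturation_vapour_pressure(temp_k):
    """Saturation vapour pressure over liquid water in Pa."""
    c1, c2, c3, c4, c5, c6 = HW_COEFFS
    ln_es = (c1 / temp_k + c2 + c3 * temp_k + c4 * temp_k**2
             + c5 * temp_k**3 + c6 * math.log(temp_k))
    return math.exp(ln_es)


def mass_mixing_ratio(rh_percent, temp_k, pressure_pa):
    """Water vapour mass mixing ratio in g/kg, with RH taken over water."""
    e = rh_percent / 100.0 * saturation_vapour_pressure(temp_k)
    return 1000.0 * EPSILON * e / (pressure_pa - e)


# Hypothetical near-surface values: 50% RH, 15 degrees C, 1013.25 hPa.
print(mass_mixing_ratio(50.0, 288.15, 101325.0))
```

Relative humidity is taken over water at all temperatures here, following the convention stated above.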

Lidar Measurements
Lidar measurements in this study were made using RALMO. RALMO was built at the École Polytechnique Fédérale de Lausanne (EPFL) for operational meteorology, model validation, and climatological studies and is operated at the MeteoSwiss Station in Payerne, Switzerland (46.81° N, 6.94° E, 491 m a.s.l.). RALMO is designed to be an operational lidar, and as such, needs to have high accuracy, temporal measurement stability, and minimal altitude-based corrections (Dinoev et al., 2013; Brocard et al., 2013).
RALMO operates at 355 nm with a nominal pulse energy of 300 mJ and a repetition frequency of 30 Hz. Measurements are recorded for one minute (1800 laser shots) with a 3.75 m height resolution from both the nitrogen (387 nm) and water vapour (407 nm) Raman scattering channels. The lidar measurements are processed for calibration in several steps. First, we select ±2 hours of 1 min lidar profiles around the launch time of the radiosonde. While two hours was chosen as an arbitrary time range to allow for scan selection, in practice the method rarely selects scans more than 30 min before or after the launch. The 1 min scans are filtered to remove scans with abnormally high background and to ensure that we have a sufficiently high SNR in each scan. We assume clouds are present if the nitrogen SNR is less than 1 at 13 km. If a cloud is present, the scan is masked and removed from the calibration. After the filtering process, the radiosonde measurements are linearly interpolated onto the lidar altitude grid of 3.75 m. The calibration is conducted at the lidar's native altitude resolution in order to provide as many data points as possible and to avoid smoothing out small features.
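The scan filtering and regridding steps can be sketched as follows. This is a minimal illustration assuming NumPy arrays with one value per 1 min scan. The function names and the `max_background` threshold are hypothetical, since no numerical criterion for an "abnormally high" background is given above.

```python
import numpy as np


def select_clear_scans(n2_snr_at_13km, background, max_background):
    """Boolean mask of the 1 min scans kept for calibration.

    A scan is rejected when the nitrogen SNR at 13 km is below 1 (cloud
    assumed present) or when its background exceeds the hypothetical
    threshold max_background.
    """
    snr = np.asarray(n2_snr_at_13km, dtype=float)
    bkg = np.asarray(background, dtype=float)
    return (snr >= 1.0) & (bkg <= max_background)


def sonde_on_lidar_grid(sonde_alt_m, sonde_values, lidar_alt_m):
    """Linearly interpolate a radiosonde profile onto the lidar altitudes.

    sonde_alt_m must be strictly increasing (ascent data only).
    """
    return np.interp(lidar_alt_m, sonde_alt_m, sonde_values)


# Lidar grid with the native 3.75 m spacing, starting at the first bin.
lidar_alt = 490.0 + 3.75 * np.arange(5)
profile = sonde_on_lidar_grid([480.0, 500.0, 520.0], [4.8, 5.0, 5.2], lidar_alt)
print(profile)
```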
The Radiosonde Trajectory Method

Tracking Air Parcels
The flow chart of the calibration process for both methods is shown in Fig. 1. This section will discuss each step in the calibration process in three pieces. First, we will explain the air parcel tracking and how we choose the lidar measurements which match the radiosonde's. Second, we will discuss the conversion of the lidar measurements into water vapour mass mixing ratio units. Lastly, we discuss how we choose the relevant regions for calibration using cross-correlation and the least-squares fit.
After the measurements have been cloud-filtered and put on the same altitude grid, we use the radiosonde wind speed and direction measurements to track the air parcels measured by the radiosonde. We use the latitude and longitude of the radiosonde, as calculated by the on-board GPS system, as the initial position for air parcel tracking. All GRUAN radiosondes report their geophysical coordinates; however, radiosondes before 2011 were not GRUAN-processed and did not report their latitude or longitude coordinates. If the geophysical coordinates of the radiosonde are reported, we transform the coordinates onto a Euclidean grid with the lidar located at the origin. Otherwise, when the geophysical coordinates are not available, we infer the radiosonde position from the wind measurements and the coordinates of the launch site. When inferring the radiosonde position, we assume linear geometry due to the relatively short distances traveled by the radiosonde between each measurement. The radiosonde positions are required to calculate the 2D air parcel trajectories. We do not explicitly consider the vertical movement of the air parcel in this method.
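A minimal sketch of the two ways of obtaining the radiosonde position is given below. It assumes a local flat-Earth (equirectangular) projection for the coordinate transform and a simple forward integration of the wind for the case without GPS coordinates. Both choices, along with the function names and the Earth radius value, are illustrative assumptions rather than the exact procedure used in this study.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6  # mean Earth radius, assumed for the projection


def to_local_xy(lat_deg, lon_deg, lidar_lat_deg, lidar_lon_deg):
    """East (x) and north (y) offsets in metres, with the lidar at the origin."""
    lat0 = np.radians(lidar_lat_deg)
    x = EARTH_RADIUS_M * np.radians(np.asarray(lon_deg) - lidar_lon_deg) * np.cos(lat0)
    y = EARTH_RADIUS_M * np.radians(np.asarray(lat_deg) - lidar_lat_deg)
    return x, y


def wind_components(speed_ms, direction_deg):
    """Eastward (u) and northward (v) wind from speed and meteorological
    direction (the direction the wind blows from, clockwise from north)."""
    d = np.radians(np.asarray(direction_deg, dtype=float))
    speed = np.asarray(speed_ms, dtype=float)
    return -speed * np.sin(d), -speed * np.cos(d)


def dead_reckon(time_s, speed_ms, direction_deg, x0=0.0, y0=0.0):
    """Infer the sonde position from the wind when no GPS position exists.

    The sonde is assumed to drift with the measured wind along straight
    segments between consecutive measurements.
    """
    u, v = wind_components(speed_ms, direction_deg)
    dt = np.diff(np.asarray(time_s, dtype=float))
    x = x0 + np.concatenate(([0.0], np.cumsum(u[:-1] * dt)))
    y = y0 + np.concatenate(([0.0], np.cumsum(v[:-1] * dt)))
    return x, y


# A steady 10 m/s westerly wind carries the sonde due east.
print(dead_reckon([0.0, 10.0, 20.0], [10.0, 10.0, 10.0], [270.0, 270.0, 270.0]))
```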
RALMO's field-of-view projects to a circle of approximately 1 m diameter at 5 km altitude, an area too small for most trajectories to pass directly through. Therefore, it was necessary to construct a region of assumed horizontal homogeneity in which the water vapour mixing ratio is constant. In order to maintain significant lidar SNR, we defined the homogeneous region, hereafter called the "lidar region", to be a circle with a radius of 3 km around the lidar. The size of the homogeneous region was chosen by varying the radius from 1 to 25 km and finally increasing it to infinity. Radii below 3 km had very low SNRs and did not have enough altitude coverage to do an effective calibration. Radii above 3 km had large enough SNRs; however, they started to exhibit biases due to over-integrating at some altitudes and lost small features that had previously been visible. The 3 km radius provided the highest SNR while keeping the profiles closest to the radiosonde measurements.
Figure 2. Trajectory calculation and scan selection example. The purple circle around the lidar has a 3 km radius and represents the region in which we assume the humidity field is horizontally homogeneous. The green dot is the radiosonde position, the purple dot is the lidar position, and the red arrow is the air parcel trajectory. The variable z refers to altitude, t1 is the entry time, and t2 is the exit time from the 3 km radius. The integration time, t(z), is the total time that the air parcel spends inside the homogeneous region. When the air parcel trajectory does not intersect with the circle, no data are used for the calibration.
Fig. 2 shows how air parcels will always be "seen" by the lidar if the radiosonde remains inside the 3 km radius, whereas any air measured outside the radius may not intersect with the lidar region. If the trajectories do not enter the region, we do not use these altitudes for calibration.
The entry and exit times from the homogeneous region mark the first and final scans used to calculate the lidar water vapour mixing ratios, with a maximum of 30 min of integration in order to accurately compare with the traditional technique, which uses a standard 30 min summation across all altitudes (Dinoev et al., 2013; Leblanc et al., 2012; Whiteman et al., 1992; Melfi, 1972). The standard 30 min integration is the average time it takes a radiosonde to reach the tropopause, and therefore generally covers the primary calibration altitudes. Integrating for longer than 30 min would average over the water vapour field variability viewed by the radiosonde. If the total time spent inside the homogeneous region exceeds 30 min, we take ±15 min around the time of closest approach to the lidar. The variation of the integration length with altitude is shown in Fig. 3. The integration time will decrease with altitude for two reasons: wind speeds are higher, and the air parcel trajectories may intersect with the outer edges of the homogeneous region and are therefore inside for shorter time spans.
Figure 3. The lidar water vapour integration period is determined by the length of time the air parcels spend inside the homogeneous region. The integration time will decrease with altitude due to higher wind speeds. The maximum integration time is 30 min, in order to properly compare with the traditional analysis.
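The scan selection for a single altitude can be sketched as a line-circle intersection. The example below assumes that each air parcel moves in a straight line at the constant horizontal wind velocity measured by the radiosonde, which is a simplification made for illustration. The function name and argument layout are hypothetical.

```python
import math


def lidar_region_window(x0, y0, u, v, t0, radius_m=3000.0, max_window_s=1800.0):
    """Entry and exit times of an air parcel in the lidar region.

    The parcel is measured by the radiosonde at time t0 (s) at position
    (x0, y0) in metres relative to the lidar and is assumed to move with the
    constant wind (u, v) in m/s. Returns (t_in, t_out), or None if the
    trajectory never enters the circle of radius radius_m.
    """
    a = u * u + v * v
    c = x0 * x0 + y0 * y0 - radius_m**2
    half = max_window_s / 2.0
    if a == 0.0:
        # Calm air: the parcel stays where it was measured.
        return (t0 - half, t0 + half) if c <= 0.0 else None
    b = 2.0 * (x0 * u + y0 * v)
    disc = b * b - 4.0 * a * c
    if disc <= 0.0:
        return None  # trajectory misses (or only grazes) the lidar region
    root = math.sqrt(disc)
    t_in = t0 + (-b - root) / (2.0 * a)
    t_out = t0 + (-b + root) / (2.0 * a)
    if t_out - t_in > max_window_s:
        # Cap the integration at 30 min centred on the closest approach.
        t_closest = t0 - b / (2.0 * a)
        t_in, t_out = t_closest - half, t_closest + half
    return t_in, t_out


# Parcel measured 5 km west of the lidar, moving east at 10 m/s.
print(lidar_region_window(-5000.0, 0.0, 10.0, 0.0, t0=0.0))
```

Entry times earlier than t0 are allowed on purpose: a parcel sampled downwind of the lidar passed over the lidar region before the radiosonde measured it, so the matching lidar scans precede the radiosonde measurement.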

Calculation of the Water Vapour Mixing Ratio for RALMO Measurements
The water vapour mixing ratio (w) for RALMO is calculated from the background- and saturation-corrected lidar signals using the water vapour Raman lidar equation (Melfi, 1972; Whiteman et al., 1992; Whiteman, 2003):

$$ w(z) = C_w \, \frac{N_{H_2O}(z)}{N_{N_2}(z)} \, \frac{\Gamma_{N_2}(z)}{\Gamma_{H_2O}(z)} \qquad (1) $$

where N_{H2O,N2}(z) are the background- and saturation-corrected water vapour and nitrogen photon signals as a function of altitude (z), Γ_{H2O,N2}(z) are the downward transmissions for the water vapour and nitrogen channels, and C_w is the calibration constant. The transmission values are calculated using the GRUAN-corrected temperature and pressure profiles from the corresponding radiosonde, and the Rayleigh cross-sections are determined using the Nicolet formulae (Nicolet, 1984). We do not correct for aerosols as they are considered to have a very small contribution to the overall mixing ratio (Whiteman et al., 1992). RALMO uses a polychromator with a bandpass of 0.3 nm (Simeonov et al., 2014). RALMO was designed to minimize temperature dependence and the central wavelengths of the water vapour and nitrogen channels were chosen accordingly. Dinoev et al. (2013) showed that the nitrogen channel had a relative change in transmitted intensity of 0.4% per 100 K and the water vapour channel intensity changed by roughly 1% when the temperature was varied between −60 °C and +40 °C.
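As a hedged illustration of how the lidar signals enter the mixing ratio, the sketch below assumes the standard form of the water vapour Raman lidar equation, w = C_w (N_H2O / N_N2)(Γ_N2 / Γ_H2O). The relative statistical uncertainty is propagated assuming pure Poisson counting with no background contribution, which is a simplification. All function names and numerical values are hypothetical.

```python
import numpy as np


def uncalibrated_ratio(n_h2o, n_n2, gamma_h2o, gamma_n2):
    """Transmission-corrected signal ratio (mixing ratio divided by C_w)."""
    n_h2o = np.asarray(n_h2o, dtype=float)
    n_n2 = np.asarray(n_n2, dtype=float)
    return (n_h2o / n_n2) * (np.asarray(gamma_n2) / np.asarray(gamma_h2o))


def relative_stat_uncertainty(n_h2o, n_n2):
    """Relative uncertainty of the ratio for Poisson-distributed counts.

    Assumes var(N) = N for both channels and neglects the background.
    """
    n_h2o = np.asarray(n_h2o, dtype=float)
    n_n2 = np.asarray(n_n2, dtype=float)
    return np.sqrt(1.0 / n_h2o + 1.0 / n_n2)


# Hypothetical photon counts at two altitudes and a hypothetical C_w.
counts_h2o = [100.0, 50.0]
counts_n2 = [1000.0, 1000.0]
c_w = 45.0
w = c_w * uncalibrated_ratio(counts_h2o, counts_n2, 1.0, 1.0)
print(w, relative_stat_uncertainty(counts_h2o, counts_n2))
```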
The calibration constant C_w is defined as:

$$ C_w = 0.781 \, \frac{M_{H_2O}}{M_{Air}} \, \frac{\eta_{N_2}}{\eta_{H_2O}} \, \frac{O_{N_2}(z)}{O_{H_2O}(z)} \, \frac{\sigma_{N_2}(T(z)) \, F_{N_2}(T(z))}{\sigma_{H_2O}(T(z)) \, F_{H_2O}(T(z))} \qquad (2) $$

The calibration constant collects the following factors: the fraction of nitrogen molecules in air (0.781), the molecular weights of water and dry air (M_{H2O,Air}), the system efficiency of the nitrogen and water vapour channels (η_{N2,H2O}), the overlap function for both channels (O_{N2,H2O}(z)), the Raman cross-section for each molecular species (σ_{N2,H2O}(T(z))), and the temperature dependency of the Raman cross-section (F_{N2,H2O}(T(z))). In RALMO's case, the ratio of the overlap between the two channels is designed to be unity (Dinoev et al., 2013; Simeonov et al., 2014). After calculating the ratio of the corrected lidar signals, we use a correlated and weighted least-squares fit to normalize the lidar profile to the radiosonde and find the calibration constant (Dionisi et al., 2010; Whiteman et al., 2012). The radiosonde relative humidity profile is transformed into water vapour volume mixing ratio using the standard WMO conversion (World Meteorological Organization (WMO), 2014). The calibration range extends from 500 m above sea level (ASL) to roughly 7 km ASL depending on the profile. The bottom limit is the first lidar altitude bin at 490 m above sea level, and the final calibration altitude is determined by the SNR and integration limits we impose. We remove scans at all altitudes where the trajectory spent less than 5 min in the lidar region due to their low SNRs, which typically results in the calibration region ending around 7 or 8 km. To ensure that the calibration constant is not biased by a vertical displacement of the air parcel between the lidar and the radiosonde volume, we require the resulting uncalibrated lidar and the radiosonde mixing ratio profiles to be correlated to greater than 90%.
A moving window of 300 m is run over both the radiosonde and lidar profiles, and the cross-correlation between the two profiles inside each window is determined. To reduce the effect of noise on the cross-correlation, both profiles are smoothed beforehand with a boxcar filter of 101.5 m width. In less than one-third of the cases, when the radiosonde leaves the lidar region early, or the wind is such that the air spends less than 5 min in the lidar region, a large portion of the profile may be cut off. In that case, if fewer than 3 windows (900 m) are available for calibration, the radiosonde and the lidar profiles are smoothed to 22.5 m and windows of 100 m are used instead.
If the correlation between the radiosonde and lidar mixing ratios within a window is higher than 90%, then that window's altitude range is accepted for calibration (Dionisi et al., 2010; Whiteman et al., 2012). If there is less than 900 m of data available for calibration at the end of the correlation process, then we do not use that night for calibration, as it does not have enough data with which to accurately calibrate. While the correlation is calculated on the smoothed profiles, the fit is done using the native resolution of the lidar inside the accepted calibration windows, with a requirement of at least 243 points. The least-squares fit is conducted over all of the points selected in the cross-correlation procedure. Each fitting point is weighted by the sum of the variances of the water vapour mixing ratio percent uncertainty and the average radiosonde mixing ratio uncertainty. The lidar water vapour mixing ratio statistical uncertainty (σ_{w,stat}) is propagated from the water vapour and nitrogen channel statistical uncertainties (Melfi, 1972) using Eq. 1. The radiosonde mixing ratio percent uncertainties (σ_{MR,Radiosonde}) are calculated using the total relative humidity, temperature, and pressure uncertainties reported for each GRUAN flight and propagating them through the Hyland and Wexler (1983) mixing ratio conversion while assuming the relative humidity to be over water (Hyland and Wexler, 1983; Dirksen et al., 2014; Immler et al., 2010). However, the GRUAN processing occasionally does not report pressure uncertainties below 15 km. Therefore, it was necessary to create a nightly average pressure uncertainty profile, which was used on nights where pressure uncertainties were not reported. The variation in the pressure uncertainties was on the order of 0.01%; therefore, this assumption is justified. The calibration constant is then determined by using a one-parameter weighted least-squares fit of the form shown in Eq. 1.
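The window selection can be sketched as follows. The example assumes that the "correlation" is the Pearson correlation coefficient at zero lag and that the windows are stepped without overlap. Both are assumptions made for illustration, as are the function names. The default window of 80 bins corresponds to 300 m at 3.75 m resolution, and 27 bins is the closest whole number of bins to the boxcar width quoted above.

```python
import numpy as np


def boxcar(profile, n_bins):
    """Running mean over n_bins, renormalised near the profile edges."""
    profile = np.asarray(profile, dtype=float)
    kernel = np.ones(n_bins)
    total = np.convolve(profile, kernel, mode="same")
    count = np.convolve(np.ones_like(profile), kernel, mode="same")
    return total / count


def accepted_windows(lidar, sonde, bins_per_window=80, smooth_bins=27,
                     threshold=0.9):
    """Boolean mask of altitude bins lying in windows with r > threshold."""
    lidar_s = boxcar(lidar, smooth_bins)
    sonde_s = boxcar(sonde, smooth_bins)
    keep = np.zeros(lidar_s.size, dtype=bool)
    last_start = lidar_s.size - bins_per_window
    for start in range(0, last_start + 1, bins_per_window):
        window = slice(start, start + bins_per_window)
        r = np.corrcoef(lidar_s[window], sonde_s[window])[0, 1]
        if np.isfinite(r) and r > threshold:
            keep[window] = True
    return keep


# Synthetic 900 m profile; the uncalibrated lidar is a scaled copy of the sonde.
z = 3.75 * np.arange(240)
sonde = 10.0 * np.exp(-z / 2000.0)
mask = accepted_windows(0.5 * sonde, sonde)
print(mask.sum() * 3.75, "m accepted for calibration")
```

The fit itself would then be run on the unsmoothed profiles at the bins where the mask is true.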
The final calibrated water vapour profile for July 22, 2015 is shown in Fig. 4. The correlation algorithm selected 84% of the profile above 1.5 km to use for the calibration while regions with high variability were excluded from the calibration. The calibrated profile closely follows the radiosonde profile, with differences fluctuating between 5% and 20% over all altitudes.
The standard error of the slope from the weighted fit is the uncertainty in the calibration constant due to measurement noise. The accuracy to which we know the calibration constant will be discussed further in Section 5.
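A one-parameter weighted least-squares fit and the standard error of its slope can be sketched as below. This minimal example treats the fitting points as independent, so it does not reproduce the correlated fit cited above. It also interprets the weighting as the inverse of the summed lidar and radiosonde variances, with both uncertainties expressed in the units of the radiosonde mixing ratio. These choices and the function name are assumptions.

```python
import numpy as np


def fit_calibration_constant(lidar_ratio, sonde_w, sigma_lidar, sigma_sonde):
    """Weighted fit of sonde_w = C * lidar_ratio through the origin.

    Returns the slope C and its standard error, assuming independent points
    weighted by 1 / (sigma_lidar**2 + sigma_sonde**2).
    """
    x = np.asarray(lidar_ratio, dtype=float)
    y = np.asarray(sonde_w, dtype=float)
    variance = np.asarray(sigma_lidar, dtype=float)**2 \
        + np.asarray(sigma_sonde, dtype=float)**2
    weights = 1.0 / variance
    sxx = np.sum(weights * x * x)
    c = np.sum(weights * x * y) / sxx
    c_err = np.sqrt(1.0 / sxx)
    return c, c_err


# Noise-free synthetic example with a hypothetical constant of 45.
x = np.array([1.0, 2.0, 3.0])
c, c_err = fit_calibration_constant(x, 45.0 * x, 0.6 * np.ones(3), 0.8 * np.ones(3))
print(c, c_err)
```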

Comparison of the Traditional and Trajectory Methods
We applied the trajectory technique to 76 nights between January 2011 and December 2016, of which 31 were removed due to a lack of lidar measurements during the radiosonde launch window, primarily due to precipitation or repairs. From the 45 remaining nights, the trajectory calibration and traditional method automatically removed 8 nights due to abnormally high background. An additional 13 nights were removed from both the trajectory and traditional calibrations due to low signal-to-noise and clouds. The filtering process removed all of the nighttime flights from 2008 to 2011 due to significant cloud cover coincident with the radiosonde launch. A final list of the nights with their calibration constants is shown in Table 1. We found that 12 out of the 24 remaining calibration nights exhibited significant disagreement between the traditional and the trajectory calibrations, the reasons for which are discussed below.
Table 1. A comparison of the calibration constants of all nights used in this study. The table is broken into two sections: homogeneous and heterogeneous calibration nights. Column 1 is the date on which the radiosonde was launched. Column 2, or C_trad, is the traditional calibration constant. Column 3, or C_traj, is the trajectory method calibration constant. Column 4 is the difference between the two constants. Column 5 is the percent difference of the two constants with respect to the traditional calibration constant. Column 6 is for comments regarding the differences. Two nights in the homogeneous section presented larger differences from the rest of the nights due to using different calibration regions. Three nights in the heterogeneous group had very small differences in their calibration constant due to using similar regions for calibration, despite the variability in the water vapour.
We compared the trajectory method result to the traditional technique discussed in the previous section, in which the radiosonde movement is not taken into account and all altitudes are integrated for 30 min after the launch. If the water vapour field is stable for long periods of time and experiences very little change over the distance traveled by the radiosonde, then the radiosonde and the lidar should measure roughly similar water vapour content. Therefore, we should see good agreement between the traditional and trajectory methods and small changes in the calibration constant. Of the 24 calibrations that passed cloud filtering and had enough calibration regions, 12 dates showed good agreement in their profiles when compared to the radiosonde due to stable or homogeneous water vapour conditions. A subset of these nights is shown in Fig. 5 and we have labeled these nights as "homogeneous" or "stable" nights in Table 1.
Both methods produce profiles that agree well with the radiosonde and have an average bias around 0%, with the exception of the night of August 8, 2012, which has an offset of 5% with altitude when using the traditional method. The bias on that night is reduced when using the trajectory method. Both methods have difficulty matching the radiosonde at the altitudes where there are sharp changes in water vapour density, as shown by the large spikes in Fig. 5.
Figure 5. A subset of the dates with largely homogeneous conditions showing the differences between the traditional and trajectory calibration techniques. The first column is the water vapour mixing ratio time series averaged to 15 m altitude bins, and the first red line is the time when the radiosonde was launched. The second red line is 30 min after radiosonde launch and indicates the last profile used for the traditional method. White vertical regions are where scans have been filtered. The second column is the percent difference between the radiosonde and the profile produced using the traditional method. The third column is the percent difference between the radiosonde and the profile produced by the trajectory method. Pink regions are regions where the correlation between the radiosonde and the lidar is above 90%. During homogeneous conditions, the trajectory and traditional methods show good agreement, with similar percent differences with respect to the radiosonde. Large spikes are regions where the lidar and the radiosonde disagree on layer heights.
When the water vapour field is horizontally heterogeneous, meaning water vapour at a given pressure surface fluctuates by 50% or more over the course of the 30 min traditional calibration period, the trajectory method should better represent the air sampled by the radiosonde than the traditional technique (Fig. 7). We define a "heterogeneous" field by movement of water vapour layers over 100 m in altitude over the course of 30 min.
Layers on the order of several hundred meters thickness can change in altitude over this period, resulting in water vapour mixing ratios changing over 30% at a given height. In general, the percent difference between the radiosonde and the trajectory-calibrated profile on heterogeneous nights is much smaller than the difference between the radiosonde and the traditional method. The trajectory method profile has a smaller standard deviation with altitude and has an average bias of 0%. The traditional method cannot compensate for the rapid changes during the half-hour calibration time, and this results in larger differences between the lidar and radiosonde, on the order of 10 to 20%.
The trajectory method does have these large differences above 4 km altitude, as during periods where water vapour is rapidly changing it uses shorter integration periods.
The differences between the calibration constants on the heterogeneous nights are larger than on the homogeneous nights due to the difference in calibration regions (Table 1). The average difference in the calibration constants on heterogeneous nights is 2.0 ± 1.4% from the traditional method calibration constant. Three nights out of the 11 in the heterogeneous nights showed very small differences in the calibration constant despite structural changes throughout the calibration period. These three nights used similar calibration regions that were also stable over the course of the calibration in both methods.
Figure 7. A subset of the dates with largely heterogeneous conditions showing the differences between the traditional and trajectory calibration techniques. The first column is the water vapour mixing ratio time series and the first red line is the time when the radiosonde was launched. The second red line indicates the last scan used in the traditional method. White vertical regions are where scans have been filtered. The second column is the percent difference between the radiosonde and the profile produced using the traditional method. The third column is the percent difference between the radiosonde and the profile produced by the trajectory method. This figure shows that when the water vapour field changes over the 30 min traditional calibration period, the traditional water vapour profile can look significantly different from the radiosonde. The trajectory method produces a profile with a smaller percent difference with respect to the radiosonde.
The average and the standard deviation of all percent difference profiles with the radiosonde from the trajectory and traditional method profiles are shown in Fig. 8. The average trajectory bias oscillates around 1%, but the variability increases above 5 km. This is due to the shorter integration times and smaller SNRs at higher altitudes (Fig. 8). The average traditional bias also oscillates, around -0.7%; however, the average profile deviates farther from the center than that of the trajectory method (Fig. 8). The standard deviation of all of the percent difference profiles shows that the trajectory method more accurately fits the radiosonde profile above 2 km on a profile-by-profile basis and will more consistently provide better fits. Below 2 km the traditional and trajectory methods produce similar profiles on average, with similar consistency.
While both methods will produce similar profiles on stable nights, the two may not share the same calibration constants because they use different lidar scans (Fig. 6). The traditional method uses all profiles from the radiosonde launch to 30 min after launch. The trajectory technique will choose the appropriate calibration scans based on each air parcel's trajectory and its position of closest approach. The trajectory method will remove measurements from altitudes where the air parcel trajectories do not intersect with the homogeneous region. Table 1 is divided into homogeneous and heterogeneous nights. The majority of the homogeneous nights have a percent difference from the traditional method of less than 0.5%. However, two nights show large differences, and this is due to using different calibration regions in the trajectory method. The average percent difference in the calibration constants is 0.4 ± 0.3% when not considering the two anomalous nights, but increases to 1.2 ± 1.95% when they are included.

Lidar Calibration Uncertainties for Trajectory and Traditional Methods
We investigated three major sources of uncertainty in the determination of the calibration constant: the lidar statistical uncertainty, the GRUAN radiosonde mixing ratio uncertainty, and the dead time uncertainty. Most of the calibration uncertainty is due to that of the reference instrument (Leblanc and Mcdermid, 2008). The uncertainty in the calibration constant, the lidar statistical uncertainties, and dead time were identified as the major sources of uncertainty in RALMO water vapour measurements by Sica and Haefele (2016). The GRUAN radiosonde water vapour mixing ratio uncertainties were calculated from the reported GRUAN total uncertainties for pressure, temperature, and relative humidity, using the Hyland and Wexler (1983) formula for saturation vapour pressure over water (Hyland and Wexler, 1983; Dirksen et al., 2014). The pressure uncertainty is not reported on fewer than one-third of the nights; for those nights we use an average pressure uncertainty profile calculated from all the nights. The radiosonde mixing ratio uncertainties are linearly interpolated onto the lidar's 3.75 m resolution grid for the uncertainty determination.
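The conversion from radiosonde relative humidity, temperature, and pressure to a mixing ratio and its uncertainty can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the units (K, Pa, %RH, g/kg), the value 0.622 for the molar mass ratio, and the assumption that the three input uncertainties are uncorrelated are all choices made here for the example. The saturation vapour pressure coefficients are those of the Hyland and Wexler (1983) formula over liquid water.

```python
import math

EPSILON = 0.622  # approximate ratio of molar masses (water vapour / dry air); assumed here


def saturation_vapour_pressure_hw83(temp_k):
    """Saturation vapour pressure over liquid water in Pa (Hyland and Wexler, 1983)."""
    return math.exp(
        -5800.2206 / temp_k
        + 1.3914993
        - 0.048640239 * temp_k
        + 0.41764768e-4 * temp_k**2
        - 0.14452093e-7 * temp_k**3
        + 6.5459673 * math.log(temp_k)
    )


def mixing_ratio(rh_percent, temp_k, pressure_pa):
    """Water vapour mass mixing ratio in g/kg from RH (%), T (K) and p (Pa)."""
    e = rh_percent / 100.0 * saturation_vapour_pressure_hw83(temp_k)
    return 1000.0 * EPSILON * e / (pressure_pa - e)


def mixing_ratio_uncertainty(rh_percent, temp_k, pressure_pa, u_rh, u_t, u_p):
    """First-order propagation, ASSUMING uncorrelated RH, T and p uncertainties.

    Partial derivatives are estimated with small forward differences.
    """
    w0 = mixing_ratio(rh_percent, temp_k, pressure_pa)
    inputs = [rh_percent, temp_k, pressure_pa]
    variance = 0.0
    for idx, (u, step) in enumerate([(u_rh, 1e-3), (u_t, 1e-3), (u_p, 1e-1)]):
        shifted = list(inputs)
        shifted[idx] += step
        derivative = (mixing_ratio(*shifted) - w0) / step
        variance += (derivative * u) ** 2
    return math.sqrt(variance)
```

For instance, `mixing_ratio_uncertainty(50.0, 293.15, 101325.0, 5.0, 0.0, 0.0)` propagates a 5 %RH uncertainty alone; the result is close to 10% of the mixing ratio, as expected when relative humidity is the only uncertain input.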
The lidar mixing ratio statistical uncertainties are propagated through Eq. 1 using the random uncertainties from both the water vapour and nitrogen signals. The lidar statistical uncertainties from the trajectory method are smaller than the radiosonde uncertainties below 3 km but larger than the radiosonde uncertainties at and above 4 km, where they vary from 10% to 20% from profile to profile.
Both the lidar statistical and radiosonde uncertainties were used as the weights for the least squares fit performed in Sect. 3.3, defined by Eq. 3 (Bevington and Robinson, 2003).
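A standard closed form for a one-parameter weighted least-squares fit through the origin (Bevington and Robinson, 2003) is sketched below. It assumes the model $R_i = C_w L_i$ and that $\sigma_i$ is the standard uncertainty assigned to point $i$; the authors' exact expression may differ.

```latex
C_w = \frac{\displaystyle\sum_{i=1}^{K} \frac{R_i L_i}{\sigma_i^{2}}}
           {\displaystyle\sum_{i=1}^{K} \frac{L_i^{2}}{\sigma_i^{2}}}
```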
where $C_w$ is the calibration constant, $K$ is the number of points used in the fit, $R_i$ are the radiosonde mixing ratio points used in the calibration, $L_i$ are the saturation- and transmission-corrected ratios of the water vapour and nitrogen signals, and $\sigma_i$ are the weights. Each calibration value has some standard uncertainty associated with the weighted fit, hereafter called the fitting uncertainty. The fitting uncertainty arises from the photon counting statistics of the lidar's digital water vapour channels. The average trajectory method fitting uncertainty is 0.4% of the average calibration constant. The average fitting uncertainty for the traditional method is 0.3% of the average calibration constant. The traditional method has smaller fitting (statistical) uncertainties than the trajectory method because, on average, it uses a larger number of scans per altitude. The calibration of a lidar using a radiosonde is limited by the accuracy of the radiosonde measurement. The uncertainty of the water vapour calibration constant due to lidar random and radiosonde systematic and random uncertainties was determined using the uncertainty propagation in Eq. 4 (JCGM, 2008).
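The general law of propagation of uncertainty given in JCGM (2008) takes the form sketched below. It is written with the symbols defined in the text, and it assumes that $U_{i,j}$ denotes the covariance between elements $X_i$ and $X_j$, so that $U_{i,i}$ is a variance.

```latex
U_{C_w}^{2} = \sum_{i=1}^{N} \sum_{j=1}^{N}
              \frac{\partial C_w}{\partial X_i}\,
              \frac{\partial C_w}{\partial X_j}\, U_{i,j}
```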
where $X$ is the measurement vector including both the radiosonde and lidar measurements used to calculate the calibration constant from Eq. 3, with length $N = 2K$. We make several assumptions in Eq. 4. First, by definition, the covariance of a radiosonde or lidar measurement uncertainty with itself is simply the variance. Second, we assume that the lidar photon counting uncertainties are uncorrelated with each other. Third, we assume that the radiosonde measurement uncertainties are uncorrelated with lidar measurement uncertainties. Lastly, we assume that the radiosonde measurement uncertainties are correlated with each other with a correlation coefficient of $r = 1$. Choosing $r$ equal to unity implies that we are assuming complete correlation and therefore the maximum possible uncertainty. With these assumptions we can simplify Eq. 4 as follows:
where $U_{L,R}$ are the corresponding lidar and radiosonde mixing ratio uncertainties, and $U_{i,j}$ are the uncertainties corresponding to the measurement vector $X$. The derivatives are calculated from Eq. 3. Note that the second term in Eq. 5 is the uncertainty due to the lidar's photon counting uncertainty. This term is the same as the fitting uncertainty discussed in the previous paragraph, [...] effects is 4% when considering the lidar statistical uncertainty, the radiosonde measurement uncertainties, and the dead time uncertainty, with most of the uncertainty due to the uncertainty in the radiosonde measurement.
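The effect of the four assumptions can be illustrated numerically. The sketch below is not the authors' implementation: it assumes the calibration constant is the weighted least-squares slope through the origin, $C_w = \sum (R_i L_i/\sigma_i^2) / \sum (L_i^2/\sigma_i^2)$, treats the weights as fixed, and uses hypothetical function and argument names. Under the stated assumptions the double sum collapses to a fully correlated radiosonde term, $(\sum_i \partial C_w/\partial R_i \, U_{R,i})^2$, plus an uncorrelated lidar term, $\sum_i (\partial C_w/\partial L_i \, U_{L,i})^2$.

```python
import numpy as np


def calibration_with_uncertainty(R, L, u_R, u_L, sigma):
    """Return (C_w, u_radiosonde, u_lidar, u_total) for a fit of R = C_w * L.

    ASSUMPTIONS (illustrative): weighted least squares through the origin with
    fixed weights 1/sigma**2; radiosonde uncertainties fully correlated (r = 1);
    lidar uncertainties mutually uncorrelated and uncorrelated with the radiosonde.
    """
    R, L, u_R, u_L, sigma = (
        np.asarray(a, dtype=float) for a in (R, L, u_R, u_L, sigma)
    )
    w = 1.0 / sigma**2
    S = np.sum(w * L**2)
    C = np.sum(w * R * L) / S

    # Analytic partial derivatives of C with respect to each measurement.
    dC_dR = w * L / S
    dC_dL = w * (R - 2.0 * C * L) / S

    # r = 1: the double sum over radiosonde terms becomes a squared linear sum.
    u_sonde = abs(np.sum(dC_dR * u_R))
    # Uncorrelated lidar terms add in quadrature.
    u_lidar = np.sqrt(np.sum((dC_dL * u_L) ** 2))
    return C, u_sonde, u_lidar, float(np.hypot(u_sonde, u_lidar))
```

With a uniform 5% radiosonde uncertainty and no lidar noise, the fully correlated term returns a 5% uncertainty in the calibration constant regardless of the number of points, which is why complete correlation gives the maximum possible contribution.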
In many situations the uncertainty of the calibration constant is estimated using the standard deviation of the calibration constant over a time period of days (Leblanc and Mcdermid, 2008; Dionisi et al., 2010). However, using GRUAN radiosonde measurement uncertainties allows us to calculate uncertainties on a nightly basis and does not require a time series. We also compared the average nightly calibration results to the RALMO calibration time series. The RALMO system is known to have differential aging of its photomultipliers, which causes the calibration to drift (Simeonov et al., 2014). The calibration time series was de-trended to calculate the standard deviation of the calibration over 10 years. The standard deviation for both techniques was 4%, thereby agreeing with the average nightly uncertainty.
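The time-series comparison can be sketched as follows. The text does not state how the series was de-trended, so this example assumes a low-order polynomial (linear by default) fitted with NumPy; the function name and arguments are hypothetical.

```python
import numpy as np


def detrended_relative_std(times, constants, degree=1):
    """Relative standard deviation of a calibration time series about a fitted trend.

    ASSUMPTION: the drift is modelled as a polynomial of the given degree.
    """
    t = np.asarray(times, dtype=float)
    c = np.asarray(constants, dtype=float)
    trend = np.polyval(np.polyfit(t, c, degree), t)
    residuals = c - trend
    # ddof accounts for the fitted polynomial coefficients.
    return np.std(residuals, ddof=degree + 1) / np.mean(c)
```

Multiplying the returned value by 100 gives a percentage that can be compared directly with the average nightly uncertainty.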
Two other possible contributors to the total calibration constant uncertainty are the overlap function and aerosol layers.

RALMO is designed to have no differential overlap in the water vapour and nitrogen channels and an overlap ratio between the nitrogen and water vapour signals of unity. However, a small differential overlap could result from chromatic aberration from the protective windows and edge filters (Dinoev et al., 2013). We also do not consider aerosols in our transmission calculation, but we expect this to make a small contribution to the uncertainty for the nights with typical aerosol loading used in this study. Whiteman et al. (1992), Whiteman (2003), Dinoev et al. (2013) and others have shown aerosol effects on lidar signals to be less than 5% during mostly clear conditions, which would translate to a negligible effect on the calibration constant.

Summary
We have presented a new method to calibrate Raman-scattering water vapour lidar systems that incorporates geophysical variability into the determination of the calibration constant. The trajectory method tracks the air parcels measured by the radiosonde and matches them with the appropriate lidar measurement time; thus, the integration time varies with height. We compared this method to the traditional lidar calibration technique where we sum 30 min of lidar measurements and fit them to a radiosonde profile.
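The idea of matching each air parcel to a lidar measurement window can be illustrated with a simple geometric sketch. It is not the authors' algorithm: it assumes a circular region of fixed radius centred on the lidar (3 km by default), a constant horizontal wind at each altitude, and no vertical motion, and the function name and coordinate convention are hypothetical. Positions are in metres relative to the lidar, wind components are in m/s, and times are in seconds relative to the moment the radiosonde sampled the parcel (negative values mean the parcel was over the lidar before it was sampled).

```python
import math


def time_in_lidar_region(x0, y0, u, v, radius=3000.0):
    """Return (t_enter, t_exit) for a parcel crossing the lidar region, or None.

    ASSUMPTIONS (illustrative): straight-line advection at constant wind (u, v)
    from the sampled position (x0, y0); the lidar sits at the origin.
    """
    a = u * u + v * v
    c = x0 * x0 + y0 * y0 - radius * radius
    if a == 0.0:
        # Calm conditions: the parcel never moves.
        return (-math.inf, math.inf) if c <= 0.0 else None
    b = 2.0 * (x0 * u + y0 * v)
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # the trajectory never intersects the region
    root = math.sqrt(disc)
    return ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a))
```

Applied at every altitude with the wind measured by the radiosonde, the returned interval gives a height-dependent integration window; altitudes that return `None` would be excluded from the calibration.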
The difference between the traditional and trajectory method calibration coefficients is due to the different lidar profiles used by the methods, as well as the difference in correlation regions used to determine the calibration coefficient from these profiles.
This difference means that when the water vapour field is homogeneous, the traditional and trajectory methods will produce similar profiles, with slight differences due to the correlation regions included. We found that the homogeneous nights had an average difference of 0.4% from the traditional calibration constant value. In contrast, the heterogeneous nights, [...] The average fitting uncertainty in the calibration constants produced by the trajectory technique is 0.4%, compared with 0.3% for the traditional method. In both cases the fitting uncertainty is negligible relative to the total uncertainty of the calibration constant, which is dominated by the radiosonde measurement uncertainties; these average 8% of the mixing ratio from the ground to 10 km. We have also shown that using trajectories to track the air sampled by the radiosonde more accurately reproduces the radiosonde profile when the water vapour field is variable and decreases the percent difference between the lidar and radiosonde measurements by 5-10%. In summary, we found the following:
1. The traditional and trajectory methods agree when the water vapour field is homogeneous during the radiosonde flight. The average difference between their calibration constants was 0.4%.
2. The trajectory method provides a better fit with the radiosonde when the water vapour field changes appreciably over the time of the radiosonde flight. For these cases the calibration constants calculated by the two methods differ by 2% on average, with the trajectory method value typically larger than the traditional one.
3. The trajectory method produces a smaller average bias between the radiosonde and the lidar than the traditional method below 4 km. Adding points above 4 km does not change the calibration constant because the weights at those altitudes are so small.
4. The combined lidar statistical and radiosonde mixing ratio uncertainties contribute an average of 4% uncertainty in the calibration constant determination for both calibration methods where the radiosonde mixing ratio uncertainty is the dominating factor.
5. The uncertainty in the dead time contributes an average of 0.3% to the uncertainty in the calibration constant for a 4% uncertainty in the knowledge of the dead time.
6. The trajectory method has an RMS uncertainty of 5% while the traditional method has an RMS of 4% on average over all calibration nights which agrees with the total uncertainties calculated using Eq. 4.
A summary of the uncertainty components in the calibration constant is shown below in Table 2.

Discussion and Conclusions
The trajectory calibration technique attempts to more realistically represent the physical processes taking place during a radiosonde-lidar calibration by ensuring the radiosonde and lidar sample the same air mass. This tracking method was built upon the methods suggested in Whiteman et al. (2006), Leblanc and Mcdermid (2008), and Adam et al. (2010). Similarly to Whiteman's "Track" technique and Leblanc's "radiosonde-tracking" technique, we match the measurements at each altitude with the radiosonde. However, Whiteman et al. (2006) assumed a horizontally homogeneous and uniformly translating atmosphere and did not consider varying wind speed and direction. In Whiteman et al. (2006) the integration time was varied with altitude in order to keep the random uncertainty below 10%; however, the position of the air parcels was not considered. Our method does not assume a uniformly translating atmosphere; however, we do consider a homogeneous region around the lidar, and the integration time is varied as a function of the time the air parcels spend inside the homogeneous region. Leblanc and Mcdermid (2008) used four methods to match the radiosonde and the lidar measurements: 1) no matching, summing 2 hours of lidar profiles; 2) using all lidar scans before the radiosonde reaches 10 km, about 30 min of scans, similar to our "traditional method"; 3) using only altitudes with minimum water vapour variability over 2 hours to calibrate; and 4) using only scans which were coincident with the radiosonde altitude, similar to the Whiteman et al. (2006) "Track" technique and our trajectory method. However, method 4 did not track the air parcels as we did. Leblanc et al. (2012) found that the second method provided the smallest variation in their calibration constant, but did mention that the other methods produced very close results and could be used as well.
Using our new trajectory method has several advantages over the traditional technique. The first advantage is that the method presents a new, automatic scheme to calibrate with non-co-located radiosondes. The trajectory method does not rely on the radiosonde's location, but instead relies on the direction of the air measured by the radiosonde. The trajectory method will automatically find the appropriate calibration times as a function of altitude for the lidar. Lidar stations may then be able to use airport radiosondes more effectively, thus allowing more frequent calibrations over the year and reducing the need for expensive calibration campaigns. Lidar stations that use this technique with radiosondes located several kilometers away may find it necessary to expand their "lidar region" to greater than 3 km. Secondly, this method allows for calibration if the water vapour field changes rapidly in space and time, allowing more nights to be used for calibration when they would otherwise be discarded due to large differences between the traditional lidar profile and the radiosonde. Lidars with drifts or fluctuations in their calibration constant that may require many calibrations might also find this technique useful. Additionally, frequent and accurate lidar calibrations are critical for detecting water vapour trends and small changes in water vapour in the UTLS region. The tracking method also removes the representation uncertainty, which is a large component of the uncertainty budget in the traditional calibration constant calculation. We consider the representation uncertainty to be small in the tracking method because we are now considering the location of the radiosonde relative to the lidar. Lastly, this technique provides an automatic, objective and quantitative method of determining acceptable calibration nights. This method could conceivably be expanded to work with ozonesondes or to track other conserved quantities such as aerosols. We have not attempted to expand this technique, but leave it up to others who may find it useful. The method could also be further expanded to work with wind field measurements that include vertical wind speeds.
Half of the calibration nights in this study showed structural variations in water vapour over the 30 min traditional calibration period. These nights had an average of 2% difference in the calibration constant, which is less than the average calibration uncertainty of 4%. Therefore, the trajectory and traditional methods do not produce statistically different calibration values. However, the trajectory method does more accurately reproduce the radiosonde profile than the traditional method below 4 km, and above 4 km the methods do equally well. The water vapour content below 4 km for the nights in this study was an average of 87% of the total content measured by the radiosonde. Therefore, we believe that calibration should be limited to below 4 km where the signal is highest and the trajectory method performs best. Additionally, the points above 4 km do not make a significant difference in the calibration due to their smaller weights and therefore are not necessary to include.
RALMO is one of few lidars with enough measurements to detect water vapour trends, with an average of 50% uptime over 10 years. In addition to frequent measurements, trend analyses also require minimal uncertainty and well-characterized retrievals. The aim of this work was to develop a calibration method that characterizes the uncertainty of the calibration constant and ensures that it is physically consistent with the reference instrument. The trajectory calibration technique will be used on a larger scale to produce a 10 year water vapour climatology and UTLS trend analysis using RALMO measurements combined with the Optimal Estimation Method discussed in Sica and Haefele (2016).

Data availability. All GRUAN data is accessible on www.gruan.org and access may be requested through them. MeteoSwiss lidar data may be requested by contacting Dr. Alexander Haefele (Alexander.Haefele@meteoswiss.ch).