Quality assessment of integrated water vapour measurements at St. Petersburg site, Russia: FTIR vs. MW and GPS techniques

The cross-comparison of different techniques for atmospheric integrated water vapour (IWV) measurements is an essential part of their quality assessment protocol. We inter-compare the synchronised data sets of IWV values measured by a Bruker 125 HR Fourier-transform infrared spectrometer (FTIR), an RPG-HATPRO microwave radiometer (MW) and a Novatel ProPak-V3 global navigation satellite system receiver (GPS) at the St. Petersburg site between August 2014 and October 2016. Generally, all three techniques agree well with each other and are therefore suitable for monitoring IWV values at the St. Petersburg site. We show that GPS and MW data quality depends on the atmospheric conditions; in a dry atmosphere (IWV smaller than 6 mm), these techniques are less reliable at the St. Petersburg site than the FTIR method. We evaluate the upper bound of statistical measurement errors for clear-sky conditions as 0.33 ± 0.03 mm (2.0 ± 0.3 %), 0.54 ± 0.03 mm (4.5 ± 0.3 %), and 0.76 ± 0.04 mm (6.3 ± 0.7 %) for the FTIR, GPS and MW methods, respectively. We conclude that accurate spatial and temporal matching of different IWV measurements is necessary for achieving better agreement between the various methods for IWV monitoring.

In recent years, the IWV content has been regularly measured at the St. Petersburg site (Semenov et al., 2015; Berezin et al., 2016, 2017; Virolainen et al., 2016; Ionov et al., 2017). Table 1 describes the different campaigns for IWV inter-comparisons at Peterhof, summarises their main results, and gives the references to the original studies. Note that the series cover different and rather short time intervals and report different numerical estimates, which makes them difficult to compare. Therefore, it is worth inter-comparing all available simultaneous IWV measurements at the St. Petersburg site throughout the longest available period to assess the quality of the individual methods.
Following Semenov et al. (2014) and Berezin et al. (2016, 2017), who demonstrated that the 50 km distance between the locations of IWV measurements might be responsible for significant disagreement due to the spatial inhomogeneity of the water vapour fields, we exclude from the current study the radiosonde data of the nearby WMO site #26063 (Voejkovo). Moreover, we do not include the sun photometer (Cimel) measurements in the comparison, since they require an additional calibration procedure (Berezin et al., 2017). Therefore, the current research is devoted to the simultaneous IWV measurements by three ground-based methods that use the Bruker 125 HR spectrometer (FTIR), the RPG-HATPRO radiometer (MW), and the Novatel ProPak-V3 global navigation satellite system receiver (GPS).

FTIR method
Since the beginning of 2009, the St. Petersburg site FTIR system, which consists of a Bruker 125 HR spectrometer and a self-designed solar tracker (Poberovsky, 2010), has been recording solar spectra. Atmospheric FTIR measurements using the Sun as a light source are performed under cloudless conditions or when breaks in cloud cover allow measurements of solar spectra. The alignment of the FTIR instrument is controlled by HBr cell spectra generated using both an internal light source and the Sun (Hase, 2012; Makarova et al., 2016).
2610-3020 cm⁻¹ (type M). Type A is a standard PROFFIT retrieval scheme; type M refers to the MUSICA 2015 retrieval scheme (Barthlott et al., 2017) and has a special focus on {H₂¹⁶O, HD¹⁶O/H₂¹⁶O} data pairs. Figure 2 presents typical ground-based IFS Bruker 125 HR measurements of solar absorption spectra in the fitted spectral microwindows containing water vapour lines. FTIR measurements are performed with a spectral resolution of about 0.005 cm⁻¹ (with an optical path difference of 180 cm). The A-type spectral region is characterised by saturated water vapour lines (with the H₂¹⁶O isotopologue having stronger signatures than the HD¹⁶O isotopologue) and their interference with O₃ absorption lines, whereas the water vapour lines in the M-type region have very similar line strengths for H₂¹⁶O and HD¹⁶O, are not saturated, and are well isolated from other absorption lines. The spectral scheme for the M-type retrieval also includes three microwindows with CO₂ lines (in the 2610-2627 cm⁻¹ spectral range), which are used for temperature retrieval.
To improve the IWV measurement accuracy we use the approaches proposed in Schneider et al. (2010b): (a) a logarithmic scale inversion, (b) a speed-dependent Voigt line-shape model, (c) the consideration of atmospheric emission for the retrievals, (d) a simultaneous retrieval of interfering species, and finally, for the M-type retrievals, (e) a simultaneous temperature retrieval as well as the use of water vapour isotopologue inter-species constraints (Barthlott et al., 2017). The H₂¹⁶O and HD¹⁶O retrievals in the A-type setup are made independently. For the spectroscopic parameters of the absorption lines, we use the HITRAN2008 database with 2009 updates (Rothman et al., 2009) with slight modifications of pressure broadening and line intensities: for the spectral range used by the A-type retrieval, according to Schneider et al. (2011), and for the spectral range used by the M-type retrieval, according to the Appendix of Barthlott et al. (2017).
The corresponding pressure and temperature profiles used for the analysis are the daily National Center for Environmental Prediction (NCEP) re-analysis data (Lait et al., 2005) for the Peterhof location. The a priori profiles of interfering atmospheric constituents are adopted from the Whole Atmosphere Community Climate Model (WACCM) data, using a single set of climatological a priori profiles during all seasons for the Peterhof location (Park et al., 2013). The selected mode of the PROFFIT retrieval code is based on the Tikhonov-Phillips approach for the A-type (Tikhonov, 1963; Phillips, 1962) and on the optimal estimation method for the M-type (Rodgers, 2000). After the analysis of spectra, we filtered the IWV retrievals according to the ratio of the remaining measurement noise (in percent) to the number of degrees of freedom for signal (dofs). We consider only those retrievals for which this ratio is less than 1.0 for the A-type and 0.5 for the M-type retrievals. We chose this criterion in accordance with the signal-to-noise ratio in the corresponding spectral regions. Thus, for the period between March 2009 and December 2016 we selected 3265 and 3548 IWV retrievals of the A-type and M-type, respectively.
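The quality filter described above can be sketched in a few lines of Python. This is only an illustration of the stated criterion (noise-to-dofs ratio below 1.0 for the A-type and below 0.5 for the M-type); the `Retrieval` record and its field names are hypothetical and do not reflect the PROFFIT output format.

```python
from dataclasses import dataclass


@dataclass
class Retrieval:
    """Hypothetical record for one FTIR IWV retrieval."""
    kind: str             # "A" or "M" retrieval type
    iwv_mm: float         # retrieved IWV in mm
    noise_percent: float  # remaining measurement noise, in percent
    dofs: float           # degrees of freedom for signal


# Upper limits on noise_percent / dofs, as quoted in the text.
MAX_NOISE_TO_DOFS = {"A": 1.0, "M": 0.5}


def passes_filter(r: Retrieval) -> bool:
    """Return True if the retrieval satisfies the noise-to-dofs criterion."""
    if r.dofs <= 0:
        return False  # a retrieval without signal information is rejected
    return r.noise_percent / r.dofs < MAX_NOISE_TO_DOFS[r.kind]


def filter_retrievals(retrievals):
    """Keep only the retrievals that pass the criterion."""
    return [r for r in retrievals if passes_filter(r)]


sample = [
    Retrieval("A", 12.3, noise_percent=1.5, dofs=2.0),  # ratio 0.75, kept
    Retrieval("M", 12.9, noise_percent=1.5, dofs=2.0),  # ratio 0.75, rejected
    Retrieval("M", 8.1, noise_percent=0.8, dofs=2.5),   # ratio 0.32, kept
]
kept = filter_retrievals(sample)
```

The same ratio passes for an A-type retrieval and fails for an M-type one, which mirrors the stricter threshold applied to the spectral region with the higher signal-to-noise ratio.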
For the error budget calculations, we assume the same uncertainty sources and values for both types of retrieval. We calculate "gain matrices" that show the sensitivity of the retrieval to a given error source, and the associated covariance matrices for statistical and systematic errors (Rodgers, 2000). For calculating the error matrices, the uncertainties of the error sources are taken into account as listed in Table 2. The relative weight of the statistical and systematic contributions to the total IWV error varies depending on the error source (Hase et al., 2004). The uncertainty of the spectroscopic line parameters (see Table 2) is the major source of systematic errors. In the A-type spectral channels (see Fig. 2) the intensity of solar radiation is smaller than in the M-type, resulting in a decrease of the signal-to-noise ratio and, consequently, in an increase of the measurement noise and its influence on the statistical errors of the A-type retrieval. Statistical errors of the A-type retrieval are controlled mainly by the temperature profile uncertainty, whereas in the M-type scheme the temperature profile is retrieved simultaneously with the target gases. Thus, our estimates indicate that the IWV retrieval of the M-type is slightly more precise than that of the A-type due to (a) the higher signal-to-noise ratio and (b) the simultaneous temperature retrieval.
Atmos. Meas. Tech. Discuss., doi:10.5194/amt-2017-135, 2017. Manuscript under review for journal Atmos. Meas. Tech. Discussion started: 29 May 2017. © Author(s) 2017. CC-BY 3.0 License.

MW method
The 14-channel microwave radiometer RPG-HATPRO (generation 3) is one of the instruments used for IWV measurements at the St. Petersburg site. It has been operating continuously since June 2012 with a sampling interval of about 1-2 s and an integration time of 1 s. A complete description of radiometers of the RPG-HATPRO type is available on the website of the manufacturer (http://www.radiometer-physics.de). All information relevant to the experimental setup can be found in the paper by Kostsov et al. (2016). It should be noted that IWV measurements are performed from zenith observations only.
We analyse the brightness temperature spectra of atmospheric MW radiation with two separate and independent retrieval algorithms. The first algorithm is the built-in regression algorithm (REGR) provided by the manufacturer and tuned for the SPbU measurement site. The algorithm uses a quadratic regression scheme applied to the brightness temperature observations in zenith mode plus surface pressure sensor data. Tuning of this algorithm is based on radiative transfer calculations for atmospheric models that have been compiled using 10 years of radio-sounding data at the Voejkovo station near St. Petersburg. The absolute accuracy of IWV retrievals by the REGR algorithm declared by the manufacturer is 0.3 mm, and the random noise is less than 0.05 mm. Our estimates of IWV variations in stable atmospheric conditions for different days of measurements (Virolainen et al., 2016) showed that the standard deviation (SD) of means for the RPG-HATPRO amounts to 0.05-0.09 mm, which is rather close to the noise level reported by other researchers (Steinke et al., 2015).
The second algorithm is based on the inversion of the radiative transfer equation and is therefore referred to below as the "physical algorithm" (PHYS). This algorithm uses the well-known and widely applied approach of simultaneously retrieving the profiles of several atmospheric parameters that influence the radiative transfer at the frequencies corresponding to the spectral channels of the MW radiometer. Since the problem is ill-posed, we use the optimal estimation method for its regularisation. The description of the specific features of the physical algorithm applied to RPG-HATPRO measurements, the assessment of the retrieval accuracy for different parameters, and examples of retrievals can be found in the paper by Kostsov (2015a). Besides the brightness temperature measurements, the PHYS algorithm uses the surface pressure, temperature and humidity readings, the temperature and relative humidity profile statistics, and the hydrostatic equilibrium constraint, applying the general approach to the solution of multi-parameter inverse problems (Kostsov, 2015b). To calculate the IWV values, we integrate the vertical profile of absolute humidity. We obtain the IWV retrieval error from the error matrix corresponding to the absolute humidity profile, which we calculate for every single set of brightness temperature measurements. Therefore, the IWV retrieval error estimate is a variable quantity. In practice, the statistical retrieval error estimates for the PHYS algorithm lie within the interval 0.08-0.10 mm.
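The final step of the PHYS retrieval, integrating the absolute humidity profile over height, can be illustrated with a trapezoidal rule. This is a minimal sketch, not the authors' code: the height grid, the exponential test profile and the function name are assumptions chosen for illustration. It relies on the fact that 1 kg m⁻² of water vapour corresponds to a 1 mm column of liquid water.

```python
import numpy as np


def iwv_from_profile(height_m, abs_humidity_kg_m3):
    """Integrate absolute humidity (kg/m^3) over height (m).

    Returns IWV in kg/m^2, which is numerically equal to mm of
    precipitable water. Uses the trapezoidal rule on the given grid.
    """
    z = np.asarray(height_m, dtype=float)
    rho = np.asarray(abs_humidity_kg_m3, dtype=float)
    layer_means = 0.5 * (rho[1:] + rho[:-1])  # mean humidity in each layer
    return float(np.sum(layer_means * np.diff(z)))


# Illustrative profile (assumed, not measured): 5 g/m^3 at the surface,
# decaying exponentially with a 2 km scale height, on a 100 m grid up to 10 km.
z = np.arange(0.0, 10000.0 + 1.0, 100.0)
rho = 5.0e-3 * np.exp(-z / 2000.0)
iwv_mm = iwv_from_profile(z, rho)
```

For this test profile the analytic integral is 5e-3 × 2000 × (1 − e⁻⁵) ≈ 9.93 mm, so the numerical result can be checked directly against it.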
The MW IWV retrievals at the St. Petersburg site considered in earlier studies (Berezin et al., 2016, 2017; Virolainen et al., 2016; Ionov et al., 2017) correspond to the REGR algorithm. In this study, we use the PHYS algorithm to analyse the measured MW spectra for the following reasons: (a) the PHYS algorithm provides error estimates for every single retrieval together with a quality control flag, which is very useful for the detection and removal of spurious data; (b) the output of the PHYS algorithm is a self-consistent set of several atmospheric parameters (water vapour and temperature profiles, pressure profile, and cloud liquid water content); and (c) the PHYS algorithm is a flexible tool that can work with different amounts of input data and is more convenient to modify if necessary.
We compared the results of the REGR and PHYS retrieval algorithms over the whole period of measurements (2012-2016) to assess the differences in the retrieved IWV values. The relative mean difference of the two data sets does not depend on the absolute IWV values: REGR is biased high compared to PHYS by approximately 5 %, which means that there is a factor of 1.05 between the two retrieval techniques. Almost half of the absolute differences (PHYS vs. REGR) are between -0.6 and -0.2 mm.
The mean difference between the two data sets amounts to -0.52 mm with an SD of 0.44 mm. The two retrievals are less consistent in a dry atmosphere. The RPG-HATPRO radiometer operates at its limits below 5 mm of IWV because of the intrinsic relative weakness of the 22 GHz water vapour line. Therefore, the errors of both methods increase with decreasing IWV values in dry conditions. Differences in the IWV results of the two algorithms might have many causes, in particular different a priori information, different radiative transfer models, etc.

GPS method
The GPS method is an active remote sensing technique that uses GPS satellites, which transmit radio signals in the microwave range. Before these signals reach the Earth-based receiver, they are delayed and refracted in the atmosphere. Owing to the permanent dipole moment of the water vapour molecule, atmospheric refractivity is very sensitive to the presence of this gaseous constituent in the atmosphere (Businger et al., 1996). Since the hydrogen bond between water molecules in liquid water and ice significantly reduces the contribution of the dipole moment to the radio signal delay, the impact of cloud water and ice on atmospheric refractivity is limited. This allows ground-based GPS receivers to provide data on IWV above the receiver site even in cloudy weather. If the position of the receiver is accurately known, the target atmospheric delay is derived by comparing the observed signal path length (pseudorange) with the geometric distance between satellite and receiver (true range).
For IWV retrieval, we use a ground-based GPS sensor: a Novatel ProPak-V3 dual-frequency receiver with a GPS-702-GG antenna mounted on a roof. The instrument has been operating continuously in all weather conditions since August 2014. The carrier phase and binary code pseudorange measurements from GPS satellites on two GPS carrier frequencies (1227 and 1575 MHz) are processed with the TropoGNSS software developed at Kazan Federal University (Kalinnikov and Khutorova, 2017). The retrieval algorithm follows the Precise Point Positioning strategy for zenith tropospheric delay (ZTD) estimation (Kouba, 2009). Phase measurements play a key role in the algorithm: they are compared with the geometric distances between the receiver and the corresponding satellites. Code measurements serve only for the calculation of receiver clock corrections. Geometric distances and satellite clock corrections are determined using the precise ephemerides/clock products of the International GNSS Service (http://www.igs.org). The algorithm takes into account changes of the receiver antenna position due to the ocean loading effect and solid and pole tides (Petit and Luzum, 2010). The influence of the ionosphere is excluded by forming the iono-free combination of phase measurements at two frequencies (Schaer, 1999). Phase ambiguities are removed by forming differences between phase measurements from two consecutive epochs. Slant tropospheric delays are expressed as the product of the zenith tropospheric delay and the Niell mapping function, which is determined by the zenith angle of each satellite, the day of the year, and the latitude and altitude of the station (Niell, 1996). The zenith cut-off angle in the TropoGNSS processing is set to 83°. Time series of iono-free combinations of phase measurements are processed consistently from epoch to epoch by a Kalman filter with ZTD as the unknown parameter. The output ZTD time series have a 5-minute step. We assume that ZTD is the sum of the hydrostatic (ZHD) and the wet (ZWD) components (Bevis et al., 1994). The hydrostatic component is determined with an accuracy of 1 mm using the Saastamoinen model (Saastamoinen, 1972). The wet component is defined as the difference between ZTD and ZHD and is then converted to IWV values following the approach proposed by Askne and Nordius (1987) and Mendes (1999).
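The chain ZTD → ZHD → ZWD → IWV described above can be sketched as follows. This is not the TropoGNSS implementation. The Saastamoinen hydrostatic delay is written in its commonly used form, while the refractivity constants and the linear relation for the weighted mean temperature are the widely quoted Bevis et al. values, used here as an assumption; the paper's own conversion follows Askne and Nordius (1987) and Mendes (1999) and may differ in detail. All function names and input values are hypothetical.

```python
import math


def zhd_saastamoinen(pressure_hpa, lat_deg, height_km):
    """Zenith hydrostatic delay in metres (Saastamoinen model)."""
    f = (1.0 - 0.00266 * math.cos(2.0 * math.radians(lat_deg))
         - 0.00028 * height_km)
    return 0.0022768 * pressure_hpa / f


def iwv_from_zwd(zwd_m, surface_temp_k):
    """Convert zenith wet delay (m) to IWV (kg/m^2, numerically mm).

    Assumed constants: k2' = 22.1 K/hPa, k3 = 3.739e5 K^2/hPa,
    and Tm = 70.2 + 0.72 * Ts for the weighted mean temperature.
    """
    tm = 70.2 + 0.72 * surface_temp_k  # weighted mean temperature, K
    k2_prime = 0.221                   # K/Pa  (22.1 K/hPa)
    k3 = 3739.0                        # K^2/Pa (3.739e5 K^2/hPa)
    r_v = 461.5                        # gas constant of water vapour, J/(kg K)
    return 1.0e6 * zwd_m / (r_v * (k3 / tm + k2_prime))


def iwv_from_ztd(ztd_m, pressure_hpa, lat_deg, height_km, surface_temp_k):
    """ZWD is the difference between ZTD and ZHD, then converted to IWV."""
    zwd = ztd_m - zhd_saastamoinen(pressure_hpa, lat_deg, height_km)
    return iwv_from_zwd(zwd, surface_temp_k)


# Illustrative inputs (assumed): 60 deg N, 55 m a.s.l., standard pressure.
zhd = zhd_saastamoinen(1013.25, 60.0, 0.055)
iwv = iwv_from_ztd(zhd + 0.10, 1013.25, 60.0, 0.055, 288.15)
```

With these inputs the hydrostatic delay is about 2.30 m, and a 10 cm wet delay maps to roughly 16 mm of IWV, i.e. a conversion factor of about 0.15-0.16 between ZWD and the equivalent liquid water column.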
The uncertainty of the ZHD and ZWD determination results in an IWV retrieval uncertainty of 1.5-2 mm. This estimate is close to the uncertainty of GPS IWV measurements obtained by other authors. Ning et al. (2016)
The variability of the presented errors of IWV measurements can be explained by the dependence of retrieval errors on the atmospheric state, in particular the IWV values, on the measurement conditions (solar zenith angles, number of satellites used, etc.), on the stability of the instruments, and on the consistency of the measurements themselves. Taking into account the location of the St. Petersburg site between the Gulf of Finland and rural areas, the local horizontal gradient of the water vapour distribution might also be a reason for differences.

IWV measurements at St. Petersburg site
All instruments for IWV monitoring are installed in the buildings of the SPbU Peterhof campus: the RPG-HATPRO radiometer and the Novatel ProPak-V3 receiver on the roof of the same building (2 m apart), at 55 m a.s.l.; the Bruker 125 HR spectrometer on the ground floor of a nearby building 330 m to the west, at 21 m a.s.l. Figure 3 shows a schematic diagram of the mutual location of all three instruments. It is worth mentioning that the solar tracking system of the Bruker 125 HR is located on a roof, so the beam path partly passes through a pipe from the top of the building to the ground floor. In accordance with the beam pattern of the MW radiometer, the input signal comes from about 20 m above the instrument. We evaluated possible differences in the measured IWV values due to the differences in elevation of the instruments. We used the ECMWF monthly averaged humidity profiles and obtained the following estimates. Depending on season, FTIR technique might give values
Although the MW radiometer and the GPS receiver are located close to each other, their observations do not coincide spatially: the MW radiometer is operated only in zenith observation mode for IWV measurements, whereas the GPS receiver gets its information from various satellites, with a horizontal averaging over several dozen kilometres. We should also take into account the difference in observed air masses when comparing MW and FTIR measurements. Virolainen et al. (2016) have demonstrated that, depending on the season and time of day (different solar azimuth and zenith angles), the measured IWV values may belong to different air masses located at a distance of up to 20-25 km. At worst, the spatial inhomogeneity of the water vapour fields might cause a discrepancy between the two types of measurements, especially considering the surroundings of the St. Petersburg site: the Gulf of Finland on one side and the rural suburbs of Saint Petersburg on the other.
The MW radiometer measures spectra every 2 s, the GPS receiver delivers a single measurement every 5 minutes, and the FTIR spectrometer records spectra only in clear-sky conditions, with one record usually lasting about 12 minutes. Table 3 lists the main features of the instruments considered for IWV measurements. It is worth mentioning that the period of observations differs from device to device: FTIR has been working since January 2009, the MW radiometer since June 2012, and the GPS receiver since August 2014. There are also some gaps in the measurement series due to technical problems with one or more instruments. In order to synchronise all three types of IWV measurements, we average the MW and GPS measurements over the 12-minute interval of each individual FTIR measurement. In this study, we consider the period of IWV measurements from September 2014 to December 2016, when such triples are available.
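The synchronisation step, averaging the higher-rate MW and GPS series over the 12-minute interval of each FTIR record, can be sketched as below. The text does not state how the averaging window is aligned with the FTIR record, so this sketch assumes the window starts at the FTIR record start time; times are plain minutes, and all names and sample values are hypothetical.

```python
import numpy as np


def average_over_windows(ref_start_min, window_min, t_min, values):
    """For each reference start time, average the samples whose times
    fall in [start, start + window_min). Returns NaN where no sample
    falls inside the window."""
    t = np.asarray(t_min, dtype=float)
    v = np.asarray(values, dtype=float)
    out = np.full(len(ref_start_min), np.nan)
    for i, t0 in enumerate(ref_start_min):
        mask = (t >= t0) & (t < t0 + window_min)
        if mask.any():
            out[i] = v[mask].mean()
    return out


# Illustrative data (assumed): GPS IWV every 5 minutes, two FTIR records.
gps_t = [0.0, 5.0, 10.0, 15.0, 20.0]
gps_iwv = [10.0, 11.0, 12.0, 13.0, 14.0]
ftir_start = [0.0, 10.0]
gps_on_ftir = average_over_windows(ftir_start, 12.0, gps_t, gps_iwv)
```

The same function can be applied to the 2 s MW series; records for which either averaged series is NaN would be dropped before forming FTIR-MW-GPS triples.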

FTIR measurements
In earlier papers (Semenov et al., 2014; Virolainen et al., 2016; Berezin et al., 2017; Ionov et al., 2017), we presented the results of FTIR IWV retrievals that used a spectral scheme close to the A-type. In this study, we add the M-type retrieval developed for the MUSICA project (Schneider et al., 2016; Barthlott et al., 2017). Therefore, we compare the FTIR IWV retrievals of these two setups to harmonise our previous and present results.
Since we record solar spectra within limited spectral bands using a set of broadband filters, the spectra underlying the A-type and M-type retrievals are not observed simultaneously. The acquisition time for individual interferograms obtained by co-adding ten scans equals approximately 12 minutes. We usually make a series of three individual measurements for each spectral band. Thus, there is a time lag of at least 12-15 minutes between the two types of FTIR IWV measurements.
In order to compare the data sets of the A-type and M-type IWV measurements, we consider a pair to be nearly synchronised if the time mismatch between the nearest measurements does not exceed 30 minutes. Table 4 lists the statistical characteristics of the M-type and A-type comparison, depending on IWV values. Absolute differences increase with growing IWV values, whereas relative differences fall slightly, within the 8.6-9.9 % range. To harmonise the IWV measurements of the A-type and M-type, we multiply the IWV values of the A-type by a factor of 1.09 (and obtain the so-called Acorr-type retrieval).
The observed mean difference between the M-type and the Acorr-type and its standard deviation reduce to 0.14 mm (1 %) and 0.42 mm (3 %), respectively. The remaining offset is statistically insignificant, and the standard deviation is within the error margins of both types of retrievals (see Table 2), so we may conclude that both setups agree well. Therefore, for the following analysis and comparison with independent IWV measurements we combine the data sets of the M-type and Acorr-type to cover a more extended period.
Figure 6 shows an example of using the Acorr-type retrieval in the harmonisation and analysis of the IWV diurnal cycle by the FTIR method. The number of separate measurements of one type may be insufficient to detect a strong variation of IWV, in contrast to a combination of both types of FTIR data. Figure 6 (right) demonstrates that, considering only the M-type retrieval, we miss decreasing IWV values up to 9-10 mm (8-9 am, 13.09.2016). At the same time, the correction of the A-type measurements helps to avoid "artificial" IWV variations caused by systematic differences between the retrieval types (Fig. 6, left).
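The harmonisation described above amounts to scaling the A-type values by the empirical factor 1.09 and merging them with the M-type values into one time-ordered series. A minimal sketch, assuming each measurement is a hypothetical `(time, iwv_mm)` tuple:

```python
A_TO_M_FACTOR = 1.09  # empirical scaling factor quoted in the text


def harmonise_a_type(a_records):
    """Scale A-type IWV values to the M-type level (Acorr-type)."""
    return [(t, A_TO_M_FACTOR * iwv) for t, iwv in a_records]


def combine_ftir(m_records, a_records):
    """Merge M-type and Acorr-type records into one time-ordered series.

    Each output item is (time, iwv_mm, label), where the label records
    which retrieval type the value came from.
    """
    merged = [(t, iwv, "M") for t, iwv in m_records]
    merged += [(t, iwv, "Acorr") for t, iwv in harmonise_a_type(a_records)]
    return sorted(merged, key=lambda rec: rec[0])


# Illustrative values (assumed): times in hours, IWV in mm.
m_type = [(9.5, 10.8), (11.0, 9.7)]
a_type = [(9.0, 10.0), (10.2, 9.4)]
series = combine_ftir(m_type, a_type)
```

Keeping the label in the merged series makes it possible to repeat any later comparison on the M-type records alone.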

Simultaneous FTIR, MW and GPS measurements
Finally, we create three data sets of synchronised IWV measurements: FTIR (M-type + Acorr-type), MW (PHYS) and GPS for the period between August 2014 and October 2016. Figure 7 depicts the results of the comparison as scatter plots showing the correlation between the data pairs on a logarithmic scale. Generally, the different data sets correlate well; the correlation coefficient is close to or larger than 0.99 for the considered data pairs. However, the scatter of IWV values obtained from different techniques depends on the IWV values themselves: the smaller the values, the greater the scatter. The smallest IWV values (less than 1 mm) are obtained by the GPS technique. The accuracy of GPS measurements is 0.5-1.5 mm or worse (see Sect. 2.3); thus, for a dry atmosphere, the errors of the GPS technique might be larger than the measured IWV values (more than 100 %). This accounts for the significant curvature in the log-scale presentation of the scatter for small IWV values (see the MW vs. GPS panel). At the same time, measurement errors of the MW technique are also larger for small IWV values due to the weakness of the 22 GHz water vapour line. The best agreement between data pairs is observed for IWV values larger than 5-6 mm. Even for these IWV values, FTIR measurements agree better with GPS and MW data than GPS and MW data agree with each other.
Table 5 presents the results of the same comparison (mean differences and their standard deviations) for all IWV data pairs, i.e. it shows the biases and scatters between the different techniques. Since the GPS and MW techniques are less accurate for small IWV values, we single out two subsets depending on the IWV quantity: less than 6 mm ("dry" subset) and larger than 6 mm ("wet" subset).
FTIR and GPS measurements are in better agreement than the other considered data pairs (the smallest scatter, the strongest correlation), whereas GPS and MW show the largest scatter of differences with a minimal bias. For all pairs, the smallest scatter in absolute and relative units is observed for the subset with IWV values greater than 6 mm. The percentage scatter for the "dry" subset varies from 14.6 to 27.5 %, whereas for the "wet" subset it ranges from 4.7 to 7 %. The worst agreement is found for the GPS-MW pairs. These values of scatter and correlation coefficient confirm that in a dry atmosphere the GPS and MW techniques are less reliable for IWV measurements at the St. Petersburg site than the FTIR method.
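The statistics reported for each data pair (mean difference and its standard deviation for the whole set and for the "dry" and "wet" subsets) can be computed as in the sketch below. The text does not specify which series defines the subset membership, so this sketch assumes the split is made on the first (reference) series and that values of exactly 6 mm go to the "wet" subset; the function names and sample values are hypothetical.

```python
import numpy as np

DRY_WET_THRESHOLD_MM = 6.0  # threshold quoted in the text


def pair_stats(x, y):
    """Bias (mean of x - y) and sample standard deviation of x - y."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    if d.size == 0:
        return {"n": 0, "bias_mm": float("nan"), "sd_mm": float("nan")}
    sd = float(d.std(ddof=1)) if d.size > 1 else float("nan")
    return {"n": int(d.size), "bias_mm": float(d.mean()), "sd_mm": sd}


def dry_wet_stats(x, y, threshold=DRY_WET_THRESHOLD_MM):
    """Statistics for the whole set and for the dry and wet subsets,
    with membership decided by the reference series x (an assumption)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dry = x < threshold
    return {
        "all": pair_stats(x, y),
        "dry": pair_stats(x[dry], y[dry]),
        "wet": pair_stats(x[~dry], y[~dry]),
    }


# Illustrative synchronised pairs (assumed values, in mm).
ftir = [2.0, 4.0, 8.0, 10.0]
gps = [1.5, 4.5, 7.0, 9.5]
stats = dry_wet_stats(ftir, gps)
```

A positive bias here means the first series reads higher than the second, which matches the convention of taking FTIR as the reference in the discussion of wet and dry biases.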
Taking FTIR measurements as a reference, for the whole data set and for the "wet" subset we observe an underestimation by GPS and MW data, with a larger dry bias for the latter. The same holds for the "dry" FTIR-GPS pairs. Partly, this systematic discrepancy can be explained by the differences in instrument elevations above sea level (last column of Table 5) discussed at the beginning of Sect. 3. However, this is not the only reason for the systematic disagreement of IWV values, since the observed differences are larger than those estimated for the different elevations (Table 5). In contrast, for the "dry" subset, the biases of the FTIR-MW and GPS-MW pairs are quite different: GPS measurements have a dry bias and FTIR measurements have no bias compared to MW data. This probably results from the increasing errors of MW measurements in a dry atmosphere. Taking into account the differences in elevation of the instruments and the corresponding estimates of IWV differences, we may reduce the wet bias of FTIR measurements with respect to GPS data up to 0.2 mm, independent of the observed IWV values. In this context, biases in pairs with MW measurements depend on the IWV values themselves. Thus, for the whole data set and the "wet" data set, we may reduce the dry bias of MW measurements up to 0.2-0.3 and 0.4-0.6 mm in comparison with GPS and FTIR measurements, respectively. For small IWV values (<6 mm), the dry MW bias turns into a wet bias estimated as 0.2 and 0.4 mm compared to FTIR and GPS data.

Empirical statistical assessment of IWV measurement errors
Having three co-located methods for IWV retrieval at our disposal, we may empirically evaluate the accuracy of the individual methods. The individual estimate of the IWV value measured by method A can be expressed as
$$ IWV_A = IWV_{true} + \delta_A + \varepsilon_A , $$
where $IWV_{true}$ is the true value of IWV, and $\delta_A$ and $\varepsilon_A$ are the systematic and statistical errors, respectively. Taking into account spatial and temporal misalignments of the two types of measurements (A and B) and assuming that statistical measurement errors are uncorrelated and have a zero mean, we can express the square of the observed standard deviation $\sigma_{A-B}$ as follows:
$$ \sigma_{A-B}^2 = \sigma_A^2 + \sigma_B^2 + \sigma_{spatial}^2 + \sigma_{temporal}^2 . \qquad (1) $$
Since we inter-compare three near-synchronised data sets, we may assume that the temporal misalignment is equal to zero. As we have noted earlier, the FTIR spectrometer tracks the Sun, the MW radiometer has a zenith-viewing geometry, and the GPS receiver gets its information from different satellites, providing a spatially averaged value of IWV. Therefore, the considered data triples might have a spatial disagreement (Virolainen et al., 2016). However, we do not have two-dimensional maps of the IWV fields at our disposal and thus cannot make a quantitative estimate of the spatial disagreement, so in this research we neglect this misalignment error, too. This means that we evaluate the upper bound of the measurement errors. Using Eq. (1) for each pair of data sets, we obtain a system of three linear equations, from which we can derive the empirical statistical errors for each of the compared methods:
$$ \sigma_{FTIR-GPS}^2 = \sigma_{FTIR}^2 + \sigma_{GPS}^2 , \quad \sigma_{FTIR-MW}^2 = \sigma_{FTIR}^2 + \sigma_{MW}^2 , \quad \sigma_{GPS-MW}^2 = \sigma_{GPS}^2 + \sigma_{MW}^2 . \qquad (2) $$
Using the standard deviation values from Table 5 in these equations, we get the statistical errors for the whole data set of compared IWV measurements (see the second column of Table 6). Since the differences between the considered measurements strongly depend on IWV values (see Sect. 3.2), we tried to obtain the same estimates for the "dry" and "wet" subsets. The third column of Table 6 displays the errors for the "wet" subset, while for the "dry" subset the system of Eq. (2) could not be solved, presumably due to correlations of the measurement errors of one or more instruments (MW, GPS) in a dry atmosphere. Comparing the results for the whole and "wet" data sets, we see that errors decrease significantly only for the MW technique; together with the fact that the MW error values are the largest among the three instruments, this confirms that MW measurements are less reliable in a dry atmosphere.
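Under the stated assumptions (uncorrelated zero-mean statistical errors, misalignment terms neglected), each pairwise variance is the sum of two individual variances, and the three unknowns follow in closed form. The sketch below solves this system and reports when no real solution exists, which is one way the "dry" subset case can arise. The function name and the numerical inputs are hypothetical; they are not the values of Table 5.

```python
import math


def three_way_errors(sd_ab, sd_ac, sd_bc):
    """Solve  sd_ab^2 = va + vb,  sd_ac^2 = va + vc,  sd_bc^2 = vb + vc
    for the individual error standard deviations of methods A, B and C.

    Returns (sigma_a, sigma_b, sigma_c), or None if any variance comes
    out negative, i.e. the assumptions behind the system do not hold.
    """
    va = 0.5 * (sd_ab**2 + sd_ac**2 - sd_bc**2)
    vb = 0.5 * (sd_ab**2 + sd_bc**2 - sd_ac**2)
    vc = 0.5 * (sd_ac**2 + sd_bc**2 - sd_ab**2)
    if min(va, vb, vc) < 0.0:
        return None
    return math.sqrt(va), math.sqrt(vb), math.sqrt(vc)


# Synthetic check: start from known individual errors (assumed values, mm),
# build the pairwise standard deviations, and recover the inputs.
true_a, true_b, true_c = 0.3, 0.5, 0.8
recovered = three_way_errors(
    math.hypot(true_a, true_b),
    math.hypot(true_a, true_c),
    math.hypot(true_b, true_c),
)

# A case where the pairwise scatters are mutually inconsistent.
unsolvable = three_way_errors(1.0, 0.2, 0.2)
```

A negative variance signals that at least one assumption fails, for example that the errors of two instruments are correlated, which is the explanation the text offers for the "dry" subset.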
To verify the quality and accuracy of our estimates, we define two more data sets: (a) the so-called "MW stable" data set, for which we select measurements with a variability of MW IWV values of less than 2 % over the 12-minute averaging interval; (b) the M-data set, for which we take only the MW and GPS measurements coincident with the M-type FTIR data. The estimates for the "MW stable" and M-data sets are shown in the fourth and fifth columns of Table 6, respectively. All estimates from Table 6
Summing up the results of the IWV retrieval accuracy assessment, we may conclude that at the St. Petersburg site the FTIR and GPS techniques demonstrate more stable and consistent results than the MW technique.

Discussion
A great number of studies are devoted to the analysis of differences in water vapour FTIR measurements caused by differences in retrieval schemes: spectral microwindows, algorithms, a priori information, etc. (Schneider et al., 2010a, 2010b) (see Table 4). At the St. Petersburg site, IWV ranges from below 1 mm to more than 40 mm, and both the mean difference and its standard deviation are very consistent for different IWV values, indicating the high quality of the IWV variability obtained from both retrieval schemes.
The spatial mismatch of the compared data sets might significantly influence the results of the comparison. Semenov et al. (2015) paired radiosonde data from Voejkovo with FTIR data from Peterhof; Berezin et al. (2016) did the same for MW data. Although the correlations between measurements are higher than 0.96, the root mean square differences exceeded 20 % for most of the collocated IWV data sets. The strong disagreement was mainly due to the natural spatial variability of IWV, given the distance of 50 km between Peterhof (location of the MW and FTIR instruments) and Voejkovo (radiosonde launches). This variability reached approximately 13 mm during a day (Semenov et al., 2015). Even for monthly means of FTIR and radiosonde data (correlations higher than 0.99), the SD values reached approximately 11 % (or 0.98 mm). Excluding the days
(Buehler et al., 2012; Palm et al., 2010) were originally designed for observations of other atmospheric species; IWV was derived as a by-product, so the IWV retrieval scheme and its accuracy had not been optimised. Buehler et al. (2012) reported differences between MW and FTIR data sets at the Kiruna site (Sweden), which varied from (-1.90 ± 12.85) % up to (22.79 ± 29.34) %, or from (-0.20 ± 0.92) mm up to (0.90 ± 1.08) mm. Virolainen et al. (2016) showed that at the St. Petersburg site (Russia) MW measurements overestimated FTIR data by 0.29 mm, independent of season. The SD values varied from 0.24 mm (the dry season) to 0.54 mm (the wet season), amounting to 0.42 mm (4.1 %) on average. In this study, the differences between the two types of measurements are larger and have the opposite sign, amounting to (0.85 ± 0.87) mm for FTIR vs. MW measurements. Such different results for the same site can be explained by the different FTIR retrieval schemes: the A-type in the earlier study (Virolainen et al., 2016) and the M-type in this study. We have shown that the two types of retrieval have a systematic difference in IWV values of about 9 %.
Comparisons between FTIR and GPS IWV data sets are discussed in several studies (Schneider et al., 2010a; Buehler et al., 2012; Mengistu et al., 2015). For both the Arctic and the African sites, GPS measurements overestimate FTIR data by 0.3-0.6 mm.
The standard deviation of the mean differences varies from 0.9 to 1.6 mm. Our estimates at the St. Petersburg site show nearly the same underestimation of GPS relative to FTIR data in percent as at the Izana site (Canary Islands), but a larger one in absolute terms. This wet bias of the FTIR measurements at the St. Petersburg site may come from the GPS sensor being located 34 m higher than the FTIR spectrometer, which might be crucial under specific atmospheric conditions.
Finally, many studies are devoted to comparisons of GPS and MW IWV measurements (van Baelen et al., 2005; Memmo et al., 2005; Morland et al., 2006; Buehler et al., 2012; Steinke et al., 2015; Roman et al., 2016)
Atmos. Meas. Tech. Discuss., doi:10.5194/amt-2017-135, 2017 Manuscript under review for journal Atmos. Meas. Tech. Discussion started: 29 May 2017 © Author(s) 2017. CC-BY 3.0 License.
random differences are much smaller than for ARM TWP (Tropical Western Pacific) and SGP (Southern Great Plains) stations.
It is worth mentioning that for the TWP site, the disagreement between the two GPS data sets reached (0.8 ± 3.1) mm. The differences between GPS and MW data at the St. Petersburg site are very similar to those reported by Roman et al. (2016) for the ARM NSA site.
Most of the differences presented in Table 7 (between FTIR, MW and GPS IWV data pairs) are larger than those observed in this study at the St. Petersburg site. In our opinion, the stringent spatial and temporal matching conditions applied here are the main reason for the good agreement between the different methods of IWV measurement. Our figures demonstrate that, with accurate spatial and temporal matching of the different types of IWV measurements, their disagreements are close to the total measurement errors of the individual methods and hence to the WMO goal requirements for the accuracy of IWV measurements in atmospheric chemistry.

Summary
We describe three methods for IWV measurements available at the St. Petersburg site and compare these observations. We analyse the MUSICA IWV retrievals (M-type) in comparison with the standard PROFFIT retrieval (A-type) to enable comparison with the results of previous studies. We evaluate the averaged IWV measurement errors for the whole period of measurements (2009-2016) from the error matrix calculations and demonstrate that the M-type retrieval is slightly more accurate (systematic errors of 2.0 vs. 2.3 %) and more precise (statistical errors of 0.4 vs. 0.9 %) than the A-type retrieval. We observe that the M-type retrieval overestimates the A-type retrieval by a scaling factor of 1.09 ± 0.2. The mean difference between the M-type and A-type retrievals amounts to (1.2 ± 0.8) mm or (8.9 ± 5.9) % and is mainly caused by the different spectroscopy in the spectral regions used in the A-type and M-type setups. We harmonise the M-type and A-type IWV retrievals to improve the continuity of the FTIR series of IWV measurements at the St. Petersburg site. Correcting the A-type retrievals by a factor of 1.09 reduces the differences between the M-type and A-type data to (0.14 ± 0.42) mm or (1 ± 3) %, which is close to the IWV measurement errors of the FTIR method.
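The harmonisation step summarised above amounts to a multiplicative correction followed by a re-evaluation of the difference statistics. The sketch below illustrates this under simple assumptions: the inputs are already matched pairs of M-type and A-type IWV values in mm, and the function names `harmonise_a_type` and `difference_stats` are hypothetical. Only the factor 1.09 is taken from the text.

```python
from statistics import fmean, stdev

SCALE_A_TO_M = 1.09  # correction factor for A-type retrievals quoted in the text

def harmonise_a_type(iwv_a_mm, scale=SCALE_A_TO_M):
    """Scale A-type IWV values (mm) to the level of the M-type retrieval."""
    return [scale * value for value in iwv_a_mm]

def difference_stats(iwv_m_mm, iwv_a_mm):
    """Mean and sample standard deviation of (M-type minus A-type), in mm.

    Both sequences must hold matched pairs in the same order and contain
    at least two pairs.
    """
    diffs = [m - a for m, a in zip(iwv_m_mm, iwv_a_mm)]
    return fmean(diffs), stdev(diffs)
```

Calling `difference_stats(iwv_m, harmonise_a_type(iwv_a))` on the matched series would give the residual bias and scatter after correction, which the text reports as (0.14 ± 0.42) mm.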
We analyse in detail the FTIR, MW and GPS techniques for IWV retrieval at the St. Petersburg site and compile triples of near-synchronous IWV measurements by all three methods. We show that, among all coincident pairs, FTIR and GPS measurements agree best, whereas the GPS and MW methods show the largest scatter of differences with the smallest bias. For the whole data set of synchronised triples, FTIR and GPS agree within (0.52 ± 0.63) mm or (4.5 ± 5.4) %, FTIR and MW within (0.85 ± 0.87) mm or (7.3 ± 7.4) %, and GPS and MW within (0.33 ± 0.97) mm or (2.9 ± 8.8) %. It is worth mentioning that in a dry atmosphere (IWV values less than 6 mm) the FTIR method is more reliable for IWV measurements than the MW or GPS techniques, for which the measurement errors increase with decreasing IWV values.
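Because the three pairwise comparisons are computed on the same triples, the mean differences should close: (FTIR - MW) = (FTIR - GPS) + (GPS - MW). The short check below applies this to the values quoted above. It assumes that each pair "A vs. B" is reported as A minus B; the helper name `bias_closure` is hypothetical.

```python
def bias_closure(ftir_minus_gps, gps_minus_mw, ftir_minus_mw):
    """Residual of the closure relation for three pairwise mean differences.

    A residual near zero means the three biases are mutually consistent.
    """
    return ftir_minus_mw - (ftir_minus_gps + gps_minus_mw)

# Mean differences quoted in the text (assumed sign convention: first minus second).
residual_mm = bias_closure(0.52, 0.33, 0.85)   # absolute biases, mm
residual_pct = bias_closure(4.5, 2.9, 7.3)     # relative biases, %

print(f"closure residual: {residual_mm:.2f} mm, {residual_pct:.1f} %")
```

The absolute biases close exactly at the quoted precision. The relative biases leave a residual of 0.1 %, which is within rounding. Relative differences need not close exactly anyway, since they may be normalised by different reference values.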
We observe an underestimation by the GPS and MW techniques with respect to FTIR data that arises, in particular, from differences in the elevation of the instruments (the GPS sensor is located 34 m higher than the Bruker 125HR, and the MW radiometer 54 m higher).
Accounting for differences in IWV values due to the different elevations of the instruments may significantly reduce the systematic discrepancies between FTIR, GPS and MW IWV measurements at the St. Petersburg site. Horizontal inhomogeneity of the water vapour field in the vicinity of the observing site might also produce discrepancies between the compared quantities because of the different observational geometries: the FTIR spectrometer tracks the Sun, the MW radiometer views the zenith, and the GPS receiver gathers information from different satellites, providing a spatially averaged value of IWV.
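The size of the elevation effect mentioned above can be gauged with a simple sketch. It is not the correction used in the study. It assumes that water vapour density decreases exponentially with height, with a scale height of 2 km. Both the profile shape and the value `ASSUMED_SCALE_HEIGHT_M` are assumptions introduced here for illustration; only the elevation differences of 34 m and 54 m come from the text.

```python
import math

ASSUMED_SCALE_HEIGHT_M = 2000.0  # assumption for illustration, not from the study

def column_fraction_below(delta_h_m, scale_height_m=ASSUMED_SCALE_HEIGHT_M):
    """Fraction of the lower instrument's column lying below an instrument
    mounted delta_h_m higher, for an exponential water vapour profile."""
    return 1.0 - math.exp(-delta_h_m / scale_height_m)

def adjust_to_lower_level(iwv_upper_mm, delta_h_m, scale_height_m=ASSUMED_SCALE_HEIGHT_M):
    """Scale an IWV value measured delta_h_m higher to the lower instrument level."""
    return iwv_upper_mm * math.exp(delta_h_m / scale_height_m)

# Elevation differences relative to the FTIR spectrometer, as given in the text.
for name, delta_h in (("GPS", 34.0), ("MW", 54.0)):
    share = 100.0 * column_fraction_below(delta_h)
    print(f"{name}: {share:.1f} % of the column lies below the sensor")
```

Under these assumptions, the layer between the instruments holds roughly 1.7 % (GPS) and 2.7 % (MW) of the column seen by the FTIR spectrometer. A site-specific humidity profile would be needed for an actual correction.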