Validation of GOME-2 / MetOp-A total water vapour column using reference radiosonde data from the GRUAN network

The main goal of this paper is to validate the total water vapour column (TWVC) measured by the Global Ozone Monitoring Experiment-2 (GOME-2) satellite sensor and generated using the GOME Data Processor (GDP) retrieval algorithm developed by the German Aerospace Centre (DLR). For this purpose, spatially and temporally collocated TWVC data from highly accurate sounding measurements for the period January 2009–May 2014 at six sites are used. These balloon-borne data are provided by the GCOS Reference Upper-Air Network (GRUAN). The correlation between GOME-2 and sounding TWVC data is reasonably good (determination coefficient, R², of 0.89) when all available radiosondes (1400) are employed in the intercomparison. When cloud-free cases (544) are selected by means of the satellite cloud fraction (CF < 5 %), the correlation improves markedly (R² ∼ 0.95). Nevertheless, the analysis of the relative differences between GOME-2 and GRUAN data shows a mean absolute bias error (weighted with the combined uncertainty derived from the estimated errors of both data sets) of 15 % for all-sky conditions (9 % for cloud-free cases). These results reveal a notable bias in the satellite TWVC data against the reference balloon-borne measurements, partially related to the cloudy conditions during the satellite overpass. The detailed analysis of the influence of cloud properties – CF, cloud top albedo (CTA) and cloud top pressure (CTP) – on the satellite-sounding differences reveals, as expected, a large effect of clouds on the GOME-2 TWVC data. For instance, the relative differences exhibit a large negative dependence on CTA, varying from −6 to −23 % when CTA rises from 0.3 to 0.8. Furthermore, the satellite-sounding TWVC differences show a strong dependence on the satellite solar zenith angle (SZA) for values above 50°. Hence the smallest relative differences found in this satellite-sounding comparison are achieved for those cloud-free cases with satellite SZA below 50°. Finally, the relative differences also show a negative dependence on the reference TWVC values, e.g. changing from +10 % (TWVC below 10 mm) to −10 % (TWVC above 40 mm) when cloud-free conditions with SZA below 50° are selected. Overall, relative differences within ±10 % with respect to reference sounding data for a large range of TWVC values can be considered a good result for satellite retrievals.


Introduction
Atmospheric water vapour is a key component of weather and the climate system because it plays a vital role in the formation of clouds and precipitation and in the growth of aerosols, and it contributes significantly to the energy balance of the Earth as a powerful greenhouse gas. Unlike most trace gases, atmospheric water vapour exhibits a highly variable spatial and temporal distribution. Hence, close monitoring of its variability and long-term changes is a critical issue for the scientific community (e.g. Hartmann et al., 2013).
Remote sensing instruments aboard satellite platforms provide an effective way to monitor the geographical and temporal distribution of the column-integrated amount of atmospheric water vapour, called total water vapour column (TWVC), thanks to their global coverage, high spatial resolution and accurate observations (e.g. Kaufman and Gao, 1992; Bauer and Schlüssel, 1993; Noël et al., 1999, 2004; Maurellis et al., 2000; Wagner et al., 2006; Li et al., 2006; Deeter, 2007; Lang et al., 2007; Mieruch et al., 2008; Pougatchev et al., 2009). Within this framework, the European satellite-borne atmospheric sensor Global Ozone Monitoring Experiment 2 (GOME-2) aboard the Meteorological Operational satellite program (MetOp-A and MetOp-B) provides the potential for a detailed analysis of the global distribution of the atmospheric water vapour (Grossi et al., 2014). MetOp-A and MetOp-B, launched in 2006 and 2012, respectively, belong to a series of three similar meteorological satellites from the European Organization for the Exploitation of Meteorological Satellites (EUMETSAT) (MetOp-C is expected to be in orbit in 2018). The main objective of the MetOp missions is to provide continuous and long-term observations of the most important trace gases, supporting operational meteorology, global weather forecasting and climate monitoring (Edwards et al., 2006). The three MetOp satellites will guarantee continuous TWVC time series using the same sensor (GOME-2) to at least the first half of the 2020s.
To ensure the quality and accuracy of the operational TWVC data derived from satellite observations, validation exercises using independent measurements recorded by reference instruments are required. Among them, atmospheric sounding through weather balloons equipped with pressure, temperature and humidity sensors is an essential technique to monitor the TWVC changes under all weather conditions (e.g. Ross and Elliott, 2001; Durre et al., 2009). Nevertheless, it is well known that radiosonde humidity records can contain sensor-dependent errors that vary notably over time and space (e.g. Vömel et al., 2007; Wang and Zhang, 2008; Dai et al., 2011). Therefore, the balloon-borne data used as reference in the validation of satellite observations must be generated by high-quality networks with identical instrumentation and a common mode of operation. For instance, the GCOS Reference Upper-Air Network (GRUAN) provides highly accurate sounding measurements complemented by ground-based instruments for the study of atmospheric processes (Seidel et al., 2009; Immler et al., 2010). GRUAN has developed a high-quality data product based on measurements of temperature, humidity, wind and pressure by the Vaisala RS92 radiosonde (Immler and Sommer, 2011; Dirksen et al., 2014).
This paper focuses on the validation of the TWVC data measured by the GOME-2/MetOp-A satellite instrument using as reference the balloon-borne data recorded between January 2009 and May 2014 at six GRUAN stations. In this satellite validation, we use the TWVC data inferred from the GOME Data Processor (GDP) retrieval algorithm (versions 4.6 and 4.7) generated by the German Aerospace Center, Remote Sensing Technology Institute (DLR-IMF) in the framework of the EUMETSAT Satellite Application Facility on Atmospheric Chemistry Monitoring (O3M SAF) (Valks et al., 2011). This retrieval algorithm is based on the classical Differential Optical Absorption Spectroscopy (DOAS) technique (Platt, 1994). Although some validation exercises of GOME-2 TWVC data have been carried out before (e.g. Kalakoski et al., 2011, 2014; Schröder and Schneider, 2012; Grossi et al., 2013, 2014), the present study should be considered complementary since it works with a homogeneous high-quality data set as reference (RS92 GRUAN Data Product, RS92-GDP) and focuses on the effects of cloudiness and geometrical properties, which have not been studied in detail up to now. It is therefore expected that this paper will improve the understanding of the quality of the GOME-2 TWVC data retrieved by the GDP retrieval algorithm.
The satellite and sounding data employed in this paper are described in Sect. 2. Section 3 explains the methodology applied in the validation. The results obtained are presented and discussed in Sect. 4 and, finally, the conclusions are summarized in Sect. 5.

Satellite observations
GOME-2 is a nadir-viewing, across-track scanning spectrometer covering the range from about 240 to 790 nm, launched on board EUMETSAT MetOp-A in October 2006 (Munro et al., 2006). This satellite instrument is an enhanced version of its predecessor GOME/ERS-2, launched in 1995 (Burrows et al., 1999), with an improved temporal coverage (near daily global coverage at the equator) thanks to a spatial resolution of 80 km × 40 km and a swath of 1920 km. MetOp-A has a sun-synchronous orbit, with a mean altitude of 817 km and an equator crossing time of 09:30 LT.
The operational algorithm for the retrieval of TWVC data from GOME-2/MetOp-A is the level-1-to-2 GOME Data Processor (GDP) (versions 4.6 and 4.7), integrated into the Universal Processor for Atmospheric Spectrometers (UPAS, version 1.3.9) processing system at DLR-IMF. A detailed description of GDP can be found in the Algorithm Theoretical Basis Document of Valks et al. (2011) and in the work of Grossi et al. (2014). Here a brief description is presented. In a first step, the algorithm retrieves the water vapour slant column density (SCD) by means of the DOAS methodology applied in the spectral range 614.0–683.2 nm. In a second step, correction factors derived from numerical simulations are applied to the SCD values in order to remove the absorption non-linearity effect, which is related to the highly fine structured water vapour (and O2) absorption bands (Wagner et al., 2003). Finally, TWVC values are obtained by dividing the corrected SCD values by appropriate air mass factors (AMFs), which are derived from the measured O2 absorption. The major improvement in the current GDP 4.7 compared with the previous GDP 4.6 is the empirical correction for the scan angle dependency, which almost completely removes the bias between east and west ground pixels in the 24 forward scans (Grossi et al., 2014). The new GDP 4.7 version was released in July 2013; hence, most of the GOME-2 TWVC data used in this study (∼ 86 %) correspond to the GDP 4.6 version.
According to Valks et al. (2011) and Grossi et al. (2014), the error budget of the TWVC retrieved from GOME-2 can be separated into errors affecting the retrieval of the slant columns (DOAS-related errors) and errors affecting the conversion of the slant column into a vertical column (AMF-related errors). Additionally, the general error contribution (e.g. cross-section, temperature, Ring effect, etc.) is taken as a constant of 10 %. Hence, the uncertainty of the GOME-2 TWVC data is high. GDP 4.6–4.7 provides an estimated error (1 standard deviation) for each satellite TWVC value.

Sounding measurements
The reference balloon-borne data used in this work to validate the GOME-2 TWVC observations are taken from the GRUAN network, which aims to provide traceable measurements of atmospheric profiles for a detailed characterization of essential climate variables (e.g. pressure, temperature, water vapour) over a long-term period (Seidel et al., 2009; Immler et al., 2010).
The GRUAN data product (RS92-GDP) currently available is based on balloon-borne measurements using Vaisala RS92 sondes (Immler and Sommer, 2011; Dirksen et al., 2014). This instrument is equipped with a wire-like capacitive temperature sensor ("Thermocap"), two polymer capacitive moisture sensors ("Humicap"), a silicon-based pressure sensor, and a GPS receiver to measure position, altitude and winds. The RS92 transmits recorded data at 1 s intervals, which are received, processed and stored by the DigiCora ground-based station equipment. The "Humicap" sensors are used to measure the relative humidity and consist of a thin hydrophilic polymer layer on a glass substrate which acts as the dielectric of a capacitor. The two humidity sensors alternate between measuring and being heated, which removes any coating of ice or liquid water acquired inside clouds.
The RS92 radiosonde has a proven quality, exhibiting the smallest systematic and random errors among the diverse types of radiosonde sensors (e.g. Miloshevich et al., 2006; Moradi et al., 2013). The high quality of this setup has enabled its participation in radiosonde inter-comparison campaigns under the auspices of the World Meteorological Organization (WMO) (e.g. Nash et al., 2011) and also its use in numerous inter-comparison exercises against both ground-based setups (e.g. Schneider et al., 2010; Buehler et al., 2012; Pérez-Ramírez et al., 2014) and satellite-based instruments (e.g. du Piesanie et al., 2013; Diedrich et al., 2015). Nevertheless, there are several error sources of RS92 measurements which may limit their quality, such as the solar radiation error related to the solar heating of the humidity sensor (see Miloshevich et al., 2009, and references within). Hence, the added value of the GRUAN product is associated with the implementation of an exhaustive data processing method including corrections for the different error sources, which guarantees high-quality sounding measurements (Immler and Sommer, 2011; Dirksen et al., 2014).
RS92-GDP provides vertical profiles of temperature (temp), relative humidity (rh), pressure (press), altitude, geopotential height and wind, together with other variables such as the water vapour volume mixing ratio derived from rh, temp and press. The TWVC used in this work is obtained by integrating vertical profiles of water vapour volume mixing ratio. Additionally, RS92-GDP is the first data set of balloon-borne measurements that provides vertically resolved uncertainty estimates which include, for humidity data, uncertainties in calibration, radiation correction, time lag correction and sonde preparation (Immler and Sommer, 2010; Dirksen et al., 2014).
To provide an uncertainty for the TWVC data used in our study, we use the "correlated uncertainty" (u_cor_rh) of the rh data provided by the RS92-GDP, which represents 1 sigma (i.e. 1 standard deviation from the mean). For each profile, a relative error associated with the corresponding TWVC is obtained as the weighted average of the ratio of u_cor_rh to rh, with weights given by the contribution of each layer to the TWVC. The mean value of the relative errors determined for all soundings analysed in this work is 3.5 %, which demonstrates the high quality of the reference GRUAN TWVC data.
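The integration of the mixing ratio profile and the layer-weighted relative error described above can be sketched in Python as follows. This is only an illustrative sketch: the hydrostatic integration formula, the reading of the volume mixing ratio as moles of water vapour per mole of dry air, the constants and all function names are assumptions made for this example and are not taken from the RS92-GDP processing.

```python
import numpy as np

G = 9.80665              # standard gravity in m s^-2 (assumed constant with height)
EPS = 18.015 / 28.964    # molar mass ratio of water vapour to dry air (assumed values)

def layer_contributions(vmr, press_hpa):
    """Water vapour mass per layer in kg m^-2 (numerically equal to mm)."""
    vmr = np.asarray(vmr, dtype=float)
    p = np.asarray(press_hpa, dtype=float) * 100.0   # hPa -> Pa
    w = EPS * vmr                  # mass mixing ratio (kg per kg of dry air)
    q = w / (1.0 + w)              # specific humidity
    q_mid = 0.5 * (q[:-1] + q[1:]) # layer-mean specific humidity
    dp = p[:-1] - p[1:]            # positive when pressure decreases with height
    return q_mid * dp / G

def twvc(vmr, press_hpa):
    """Total water vapour column as the sum of all layer contributions."""
    return float(np.sum(layer_contributions(vmr, press_hpa)))

def twvc_relative_error(vmr, press_hpa, rh, u_cor_rh):
    """Average of u_cor_rh / rh weighted by each layer's share of the TWVC."""
    contrib = layer_contributions(vmr, press_hpa)
    ratio = np.asarray(u_cor_rh, dtype=float) / np.asarray(rh, dtype=float)
    ratio_mid = 0.5 * (ratio[:-1] + ratio[1:])
    return float(np.sum(contrib * ratio_mid) / np.sum(contrib))
```

Profiles are expected ordered from the surface upwards. Weighting by layer contribution means that uncertainties in the humid lower troposphere dominate the TWVC error, while large relative uncertainties in the dry upper levels have little influence.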
RS92-GDP is stored in NetCDF format, and processed soundings that have passed quality control are freely disseminated through www.gruan.org/data.

Methodology
In this work, two co-location criteria are followed to select TWVC data for inter-comparison purposes. Firstly, the GOME-2 data are selected such that the distance between the centre of the satellite pixel and the location of the GRUAN station is always less than 100 km. Nevertheless, most cases are substantially below this figure, the 50th, 75th and 90th percentiles being 18, 29 and 51 km, respectively. The mean distance of all selected GOME-2 overpasses is 27 km. The second criterion concerns the measurement time: only those radiosondes with a difference between their launch time and the satellite overpass time smaller than 120 min are selected. Additionally, all GOME-2 TWVC data used in this work correspond to those cases which are not flagged by the GDP 4.6–4.7 retrieval algorithms as contaminated by heavy clouds (large fraction of the pixel covered by clouds and, simultaneously, high cloud albedo). Thus, the "H2O flag" is set when cloud albedo × cloud fraction > 0.6 or when the O2 absorption is too small (Valks et al., 2011; Grossi et al., 2014). The TWVC data for the remaining cloudy cases are provided by the satellite algorithms. RS92-GDP is currently available only for 14 GRUAN stations. Applying the two co-location criteria and the "H2O flag", a total of 1400 soundings from six GRUAN stations (Table 1) were compared against GOME-2 TWVC data throughout the period 2009–2014. Detailed information about these six stations can be found at www.gruan.org.
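The selection rules above can be illustrated with a short Python sketch. The thresholds (100 km, 120 min, cloud albedo × cloud fraction > 0.6) come from the text; the great-circle (haversine) distance, the spherical Earth radius, the dictionary layout and all function names are assumptions made for this example. The second flag condition (O2 absorption too small) is omitted because its threshold is not given here.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation (assumption)

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def is_collocated(pixel, sonde, max_km=100.0, max_minutes=120.0):
    """True when pixel centre and station are close in both space and time."""
    dist = great_circle_km(pixel["lat"], pixel["lon"], sonde["lat"], sonde["lon"])
    dt = abs(pixel["time"] - sonde["launch_time"])
    return dist < max_km and dt < timedelta(minutes=max_minutes)

def heavy_cloud_flag(cloud_fraction, cloud_albedo, threshold=0.6):
    """Cloud part of the 'H2O flag' only; the O2 absorption test is not modelled."""
    return cloud_fraction * cloud_albedo > threshold
```

A pair would be retained for the comparison when `is_collocated(...)` is true and `heavy_cloud_flag(...)` is false for the satellite scene.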
A linear regression analysis is performed between the TWVC values measured by the radiosonde and those observed by the satellite instrument. Regression coefficients, coefficients of determination (R²) and the root mean square errors (RMSEs) are evaluated in this analysis. Furthermore, the relative differences (RD) between radiosonde TWVC data (Rad) and satellite TWVC data (Sat) are obtained for each GRUAN site (Eq. 1). The combined uncertainties of these relative differences (σ(RD)) are derived from the estimated errors of both the satellite data (σ(Sat)) and the radiosonde data (σ(Rad)) (Eq. 2). From the relative differences and their combined uncertainties, the weighted mean bias error (MBEw, Eq. 3) and the weighted mean absolute bias error (MABEw, Eq. 4) are determined, where N is the number of satellite-radiosonde data pairs recorded at each GRUAN site.
Finally, the uncertainty (u) of the MBEw and MABEw parameters is characterized by Eq. (5).
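For orientation, the quantities defined in this section can be sketched in Python under explicitly stated assumptions. The formulas below are one plausible reading and are not the paper's own equations: the relative difference is taken as 100 × (Sat − Rad) / Rad, so that negative values mean satellite underestimation; σ(RD) is obtained by first-order propagation of independent errors; and the weighted means use inverse-variance weights 1/σ(RD)². The uncertainty u of Eq. (5) is deliberately not implemented, since its form cannot be inferred from the text. All function names are hypothetical.

```python
import numpy as np

def relative_difference(sat, rad):
    """RD in percent, assumed as 100 * (Sat - Rad) / Rad."""
    sat = np.asarray(sat, dtype=float)
    rad = np.asarray(rad, dtype=float)
    return 100.0 * (sat - rad) / rad

def rd_uncertainty(sat, rad, sigma_sat, sigma_rad):
    """sigma(RD) by first-order propagation, assuming independent errors."""
    sat = np.asarray(sat, dtype=float)
    rad = np.asarray(rad, dtype=float)
    d_sat = 100.0 / rad               # partial derivative of RD w.r.t. Sat
    d_rad = -100.0 * sat / rad ** 2   # partial derivative of RD w.r.t. Rad
    return np.sqrt((d_sat * np.asarray(sigma_sat, dtype=float)) ** 2
                   + (d_rad * np.asarray(sigma_rad, dtype=float)) ** 2)

def weighted_bias(rd, sigma_rd):
    """Return (MBEw, MABEw) using assumed inverse-variance weights."""
    rd = np.asarray(rd, dtype=float)
    w = 1.0 / np.asarray(sigma_rd, dtype=float) ** 2
    mbe_w = np.sum(w * rd) / np.sum(w)
    mabe_w = np.sum(w * np.abs(rd)) / np.sum(w)
    return float(mbe_w), float(mabe_w)
```

With weights of this kind, pairs with large estimated satellite errors (for example at high SZA) contribute little to the station averages.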

Regression analysis
First, a linear regression analysis between the GRUAN and GOME-2 TWVC data is performed for each GRUAN station and for all stations together in order to analyse their proportionality and similarity. Statistical parameters (the slope of the regression, R² and RMSE) derived from the linear fitting between radiosonde and GOME-2 data are shown in Table 2. Additionally, this table also shows the MBEw and MABEw parameters obtained by Eqs. (3) and (4). The results shown for all-sky conditions (upper rows) reveal a fair agreement between GRUAN and GOME-2 TWVC, with determination coefficients higher than 0.70 for the six stations. Furthermore, the linear regression analysis provides slopes close to unity, which is a sign of the proportionality between the sounding and satellite data. The negative values of all MBEw parameters indicate that GOME-2 TWVC data underestimate on average the GRUAN balloon-borne measurements when all-sky conditions are used in the inter-comparison. This underestimation is around 11 % when the six stations are analysed together. Table 2 also shows high MABEw values (between 13 and 22 %) with an average value of 15 % for all-sky conditions. These results reveal a notable bias in the GOME-2 TWVC with respect to the reference sounding data, which can be partially associated with the cloudy conditions during the satellite overpass. Although a satellite "H2O flag" has been used to select those GOME-2 TWVC data not contaminated by heavy cloudy conditions, the remaining cloudy cases can introduce a notable bias in the inter-comparison between satellite and sounding TWVC data. Thus, a correlation analysis between GOME-2 and GRUAN data has been performed only for those cases with satellite cloud fraction (CF) smaller than 5 % (called cloud-free cases). These cloud-free cases represent about 39 % of all cases. The average (±1 standard deviation) of the CF values for the remaining cases is (53 ± 32) %. Table 2 also shows the results for these cloud-free correlations (lower rows). When exclusively cloud-free conditions are used in the correlation analysis, it can be seen that, as expected, the statistical parameters R² and RMSE improve substantially. Furthermore, the MABEw displays a marked reduction for the six stations, showing a value around 9 % when all cloud-free data are analysed together. It must also be noted that MBEw values for cloud-free cases are substantially smaller (closer to zero) than for all-sky conditions and even turn positive for one station, which may suggest that GOME-2 TWVC data significantly underestimate the reference balloon-borne measurements for cloudy conditions, as will be verified in what follows.
Table 2 also reports the uncertainty of the weighted averages (Eq. 5) for each station and for all data, derived from a combined uncertainty using the estimated errors of the sounding and satellite TWVC data. The weight of the GOME-2 estimated errors in these combined uncertainties is larger than 99 %. The large estimated errors of satellite data produce high uncertainty values of the relative differences (∼ 67 % when all data are analysed together). The huge uncertainties obtained for Ny-Ålesund and Sodankylä (high-latitude stations) are related to the strong increase of the estimated errors of GOME-2 data with increasing solar zenith angle (SZA). This is likely associated with large errors affecting the conversion of the water vapour slant column density into a vertical column (AMF-related errors) for high SZA values (Grossi et al., 2014). Nevertheless, more than 98 % of the relative differences between GOME-2 and GRUAN TWVC data are within the uncertainty reported for each station. From this result, it must be emphasized that almost all relative differences are explained by the combined errors of the sounding and satellite data sets.
Figure 1 shows the relationship between satellite and sounding TWVC data for all-sky conditions (top plot), revealing a good agreement in the correlation but with a high degree of spread (RMSE ∼ 27 %). This large scatter is strongly reduced when cloud-free conditions are selected for the analysis (bottom plot), decreasing the RMSE value to about 16 %. Nevertheless, it can be seen for these cloud-free cases that GOME-2 data tend to overestimate (underestimate) GRUAN data for small (large) TWVC values. This issue will be analysed in detail in Sect. 4.4. Overall, the inclusion of cloudy cases in the satellite-sounding intercomparison produces an increase of both the scatter in the correlation and the bias with respect to the reference data. Therefore, our recommendation to potential users of the operational GOME-2 TWVC data is to work whenever possible with satellite data for cloud-free conditions.
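The regression statistics discussed in this section (slope, R² and RMSE) can be computed along the lines of the following Python sketch. It uses an ordinary least-squares fit of satellite against radiosonde TWVC; expressing the RMSE in percent of the mean radiosonde TWVC is an assumption made here, since the normalization behind the quoted percentages is not stated in the text. The function name and the returned dictionary keys are hypothetical.

```python
import numpy as np

def regression_stats(rad, sat):
    """Least-squares fit sat = slope * rad + intercept with R^2 and RMSE."""
    rad = np.asarray(rad, dtype=float)
    sat = np.asarray(sat, dtype=float)
    slope, intercept = np.polyfit(rad, sat, 1)   # degree-1 polynomial fit
    fitted = slope * rad + intercept
    ss_res = np.sum((sat - fitted) ** 2)         # residual sum of squares
    ss_tot = np.sum((sat - sat.mean()) ** 2)     # total sum of squares
    rmse = np.sqrt(np.mean((sat - fitted) ** 2))
    return {
        "slope": float(slope),
        "intercept": float(intercept),
        "r2": float(1.0 - ss_res / ss_tot),
        # Assumed normalization: percent of the mean reference TWVC.
        "rmse_percent": float(100.0 * rmse / rad.mean()),
    }
```

Applying the function once to all pairs and once to the subset with CF < 5 % would give the all-sky and cloud-free rows of a table such as Table 2.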

Dependence of the differences on geometrical parameters
The weighted mean relative differences between sounding and GOME-2 TWVC data (Eq. 3) as a function of satellite ground pixel solar zenith angle (SZA) are shown in Fig. 2 for cloud-free cases (CF < 5 %, in red) and cloudy conditions (CF > 50 %, in blue). The number of cases selected is 546 (39 %) for cloud-free conditions and 434 (31 %) for cloudy conditions. The curves for both cloud-free and cloudy conditions follow practically similar patterns, with a stable behaviour until 40–50°. Nevertheless, the cases with satellite SZA values up to 50° show a clear difference between cloudy and cloud-free conditions: while GOME-2 data strongly underestimate the GRUAN measurements recorded under cloudy conditions (relative differences around 20 %), the results for the cloud-free data set reveal a good agreement between satellite and balloon-borne data (relative differences between −5 and +3 %). For satellite SZA values above 50°, a monotonic increase is observed in the two curves up to high satellite SZA values, in agreement with other GOME-2 validation exercises (e.g. Kalakoski et al., 2011). Thus, the curve for cloudy cases shows a reduction of the underestimation (even turning positive for the highest SZA), while the relative differences for cloud-free cases show a significant dependence on SZA, with values between +2 % (SZA of 50°) and +23 % (SZA of 70°). The dependence on satellite SZA found for both cloud-free and cloudy data sets is currently under investigation and could be due to some calibration issues in the level 1B (calibrated radiances) satellite products. Another possible error source for this SZA dependence may be related to the correction factor applied to obtain the AMF of the water vapour, which is derived from the measured AMF of O2 absorption. This correction factor takes into account the different vertical profiles of these trace gases and it strongly depends on SZA (see Fig. 10 in Valks et al., 2011).
The significant SZA dependence found in the GOME-2 TWVC data leads to a systematic seasonal dependence with respect to the reference balloon-borne measurements, which is shown in Fig. 3 for five out of the six studied sites. This plot shows the evolution of the monthly averages of the weighted mean relative differences (Eq. 3). These monthly averages are determined only for those months with more than 10 available pairs of sounding-satellite data. It can be seen that the satellite observations markedly underestimate the sounding data in spring-summer months, while this underestimation clearly decreases (at some stations even turning into overestimation) for the autumn-winter months.
The satellite view zenith angle (VZA), also known as the satellite scan angle, is another relevant geometrical parameter. GOME-2/MetOp-A measures 24 scenes along the ground swath, one for each satellite VZA, stepping at 5° intervals between −60 and +60° (Munro et al., 2006). As noted in Sect. 2.1, the main improvement in the current GDP 4.7 with respect to the previous version GDP 4.6 is the inclusion of an empirical correction to reduce the bias detected for the outermost west pixels (positive scan angles). The weighted relative differences between sounding and satellite TWVC data have been determined using 10° bins of VZA (i.e. groups of two view scenes) whenever these groups contain at least 10 pairs of data. Additionally, the differences have been obtained using exclusively GOME-2 data from GDP 4.6 for those cases with satellite SZA smaller than 50° in order to minimize the effect of the SZA dependence on the relative differences. Hence the number of pairs of data used to analyse the VZA dependence is 617.
Figure 4 displays the sounding-satellite differences as a function of the satellite VZA, showing no significant dependence on VZA for all-sky and cloud-free conditions (the number of cloudy cases is too low to show any dependence). However, the relative differences for the outermost west scenes show a slight variation with respect to the stable behaviour shown by the other satellite view angles. Furthermore, there is an evident difference between the curves corresponding to cloud-free and cloudy conditions for the range of coincident VZA, similar to the results shown in Fig. 2. The limited number of satellite-sounding data pairs (50) using GDP 4.7 with SZA smaller than 50° prevents the analysis of the VZA dependence for the current version.
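The binned curves discussed in this section (relative differences per SZA or VZA interval, kept only when a bin holds enough pairs) can be produced along the lines of the following Python sketch. The minimum of 10 pairs per bin follows the text for the VZA analysis; the inverse-variance weighting inside each bin and the function name are assumptions made for this example.

```python
import numpy as np

def binned_weighted_mean(x, rd, sigma_rd, bin_edges, min_count=10):
    """Weighted mean of rd in each [lo, hi) bin of x with enough data pairs."""
    x = np.asarray(x, dtype=float)
    rd = np.asarray(rd, dtype=float)
    sigma_rd = np.asarray(sigma_rd, dtype=float)
    centres, means = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (x >= lo) & (x < hi)
        if mask.sum() < min_count:
            continue                       # too few pairs: bin is skipped
        w = 1.0 / sigma_rd[mask] ** 2      # assumed inverse-variance weights
        centres.append(0.5 * (lo + hi))
        means.append(np.sum(w * rd[mask]) / np.sum(w))
    return np.array(centres), np.array(means)
```

For example, `bin_edges = list(range(-60, 70, 10))` would give 10° VZA bins between −60 and +60°, and the same function can be applied separately to cloud-free (CF < 5 %) and cloudy (CF > 50 %) subsets.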

Dependence of the differences on cloud parameters
Cloudiness represents the most relevant atmospheric factor that can substantially decrease the accuracy of the trace gas column retrievals from satellite instruments (e.g. Koelemeijer and Stammes, 1999; Kokhanovsky et al., 2007; Antón and Loyola, 2011; du Piesanie et al., 2013). The previous section has shown the strong effect of the cloudy cases on the GOME-2-GRUAN inter-comparison. Therefore, it is of great interest to study the effects of the cloud parameters (CF, cloud top albedo (CTA) and cloud top pressure (CTP)) on the satellite-sounding differences. While the GOME-2 CF is retrieved by the OCRA algorithm using broadband radiance measurements in the UV-visible range, GOME-2 CTP and CTA are both derived from the ROCINN algorithm using the spectral information in and around the oxygen-A band (Loyola et al., 2007). The GDP 4.6 and 4.7 also use improved cloud retrieval algorithms including detection of sun glint effects (Loyola et al., 2011).
Figure 5. Differences between TWVC data retrieved by GOME-2 and GRUAN sounding data (Eq. 3) as a function of satellite cloud fraction for all cases, and those with satellite SZA < 50° and SZA > 50°.
Figure 4 showed a slight VZA dependence of the sounding-satellite differences for the outermost west satellite pixels using GOME-2 data from GDP 4.6. Thus, to minimize this dependence, the GDP 4.6 data with VZA higher than +30° (81 out of 1400) were removed from the analysis of cloud effects on the sounding-satellite differences.
The significant influence of cloudiness on the weighted relative differences between satellite and balloon-borne TWVC measurements (Eq. 3) is confirmed by Figs. 5–7, which show the marked dependence of these differences on the satellite CF, CTA and CTP, respectively. Each plot exhibits three curves corresponding to all cases (in black), and those cases with satellite SZA < 50° (in red) and SZA > 50° (in blue). One should bear in mind that these plots have been built using cases where the GOME-2 scene is always contaminated by some degree of cloudiness (CF > 5 %).
The relative differences as a function of satellite CF are shown in Fig. 5 using bins of 10 %. It can be seen that the underestimation rises with increasing CF up to a cloud coverage of 20–30 %, showing a more stable behaviour for higher CF values, with relative differences between −10 and −20 %. Regarding the CTA effect, Fig. 6 shows that when all cases are considered, the relative differences change from −6 to −23 % when CTA rises from 0.3 to 0.8. Finally, Fig. 7 shows an evident increase of the relative differences (in absolute terms) with decreasing CTP (increasing cloud top height) for both small and large satellite SZA cases. The three plots discussed show a common behaviour: the TWVC data inferred from the satellite instrument for SZA values below 50° clearly present a larger underestimation of the sounding measurements than the TWVC data for SZA values above 50°, confirming the SZA dependence observed in Fig. 2 for cloudy cases. From these results, one can infer that the cloud effects on the satellite TWVC retrieval are stronger for low SZA than for high SZA values, in agreement with the results reported for other satellite products like the total ozone column (e.g. Antón and Loyola, 2011).
The strong influence of cloudiness on the satellite TWVC retrieval is mainly associated with the so-called shielding effect, whereby the water vapour below clouds is hidden by them (Kokhanovsky and Rozanov, 2008). As most of the water vapour is found in the troposphere, with its volume mixing ratio increasing towards the surface, a large impact of the shielding effect on the satellite TWVC retrievals is expected (Mieruch et al., 2008, 2010). Thus, some retrieval algorithms make use of a cloud correction method to take into account the water vapour present below the clouds, e.g. the AMC-DOAS (Air Mass Corrected Differential Optical Absorption Spectroscopy) method used to retrieve TWVC from SCIAMACHY (Scanning and Imaging Absorption Spectrometer for Atmospheric Chartography) measurements in the visible spectral range (Noël et al., 2004). Du Piesanie et al. (2013) checked this correction method by means of a detailed analysis of the sounding-satellite differences as a function of cloud fraction, cloud optical thickness and cloud top height. They found no significant dependencies on the former two cloud properties, but found a strong dependence when investigating the bias as a function of cloud top height. Although the GDP retrieval algorithm provides a "H2O flag" for heavy cloudy conditions that invalidates the AMF determination and, consequently, the retrieved TWVC data, it does not apply any cloud correction method for the remaining cloudy cases (Valks et al., 2011; Grossi et al., 2014). Therefore, it is expected that the TWVC data derived from the GDP algorithm present a larger dependence on cloud properties than other satellite retrieval algorithms with some implemented cloud correction method.

Dependence of the differences on reference TWVC data
Figure 8 (top plot) shows the weighted relative differences between sounding and satellite data as a function of the reference GRUAN TWVC values (using bins of 10 mm). This dependence has been studied using those cases with SZA smaller than 50° (black line) in order to minimize the strong SZA effect shown in previous sections. Additionally, this data set has been divided into cloud-free (CF < 5 %, in red) and cloudy (CF > 50 %, in blue) cases in order to analyse the cloud effects on the dependence shown in this plot. The three curves show a similar pattern: the underestimation of the reference data by the satellite data grows with increasing TWVC values. For instance, for all cases with SZA < 50°, the relative differences change from 0 % for small TWVC values to −20 % for large TWVC values. Furthermore, the large underestimation associated with cloudy cases over the whole range of TWVC values can also be seen, in agreement with the results shown in the previous section. Figure 8 (bottom plot) shows only cloud-free cases for opposite SZA conditions. It can be seen that GOME-2 data clearly overestimate the reference data for those cases with SZA > 50°, most of them corresponding to low TWVC values (below 10 mm) recorded at high-latitude stations. The red curves shown in the two plots of Fig. 8 correspond to the same cases: cloud-free satellite scenes (cloud fraction < 5 %) with low solar zenith angles (SZA < 50°). For these particular conditions, the relative differences are within ±10 % for the whole range of data, showing a slight overestimation (underestimation) for small (large) TWVC values. Overall, the best agreement between GOME-2 and GRUAN TWVC data is obtained, as expected, for those cloud-free cases with low SZA.

Conclusions
The analysis of the relative differences between GOME-2 and GRUAN TWVC data yielded an average value (weighted with the combined uncertainty derived from the estimated errors of both data sets) of −11 % using 1440 radiosondes from the GRUAN network. The negative sign indicates that, on average, satellite data underestimate the reference sounding values. These relative differences showed a mean absolute value of 15 %. When cloud-free satellite scenes were selected in the inter-comparison, a clear improvement was observed, as expected, with a mean relative difference of −4 % (9 % in absolute terms).
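To make the two summary statistics concrete, a weighted mean bias and a weighted mean absolute bias can be sketched as below. The exact weighting scheme is defined earlier in the paper and is not reproduced here; this sketch assumes inverse-variance weights (1/u²) and a combined uncertainty obtained by adding the satellite and sounding uncertainties in quadrature. Both choices, and all function names, are assumptions made for illustration.

```python
import math


def combined_uncertainty(u_sat, u_ref):
    """Combine two independent uncertainties in quadrature (assumed form)."""
    return math.hypot(u_sat, u_ref)


def weighted_bias_stats(diffs, uncertainties):
    """Weighted mean bias and weighted mean absolute bias.

    diffs: relative differences in percent (satellite minus reference).
    uncertainties: combined uncertainty of each difference (same units).
    Weights are 1/u**2 (inverse-variance weighting, an assumption here).
    Returns (weighted_mean_bias, weighted_mean_absolute_bias).
    """
    weights = [1.0 / u ** 2 for u in uncertainties]
    total = sum(weights)
    mbe = sum(w * d for w, d in zip(weights, diffs)) / total
    mabe = sum(w * abs(d) for w, d in zip(weights, diffs)) / total
    return mbe, mabe
```

Because positive and negative differences cancel in the first statistic but not in the second, the absolute value of the mean bias can never exceed the mean absolute bias, which is consistent with the −11 % and 15 % figures quoted above.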
Nevertheless, the sounding-satellite differences obtained during cloud-free conditions displayed a strong dependence on SZA for angles above 50°. Thus, the cloud-free relative differences changed from +2 % (SZA of 50°) to +23 % (SZA of 70°). A similar pattern was also obtained for cloudy satellite scenes with SZA higher than 50°. This strong SZA effect leads to a systematic seasonal dependence of the sounding-satellite differences, with the GOME-2 TWVC data showing a larger underestimation of the reference GRUAN values in the spring-summer months than in the autumn-winter months. This effect could be associated with inaccuracies in the level 1B data (calibrated radiances) and/or with the geometrical correction factor applied to obtain the AMF of the water vapour in the GDP retrieval algorithm.
The influence of the cloud properties (CF, CTA and CTP) on the sounding-satellite differences was also studied in detail. GOME-2 data underestimate the reference GRUAN values when the satellite scene is contaminated with some degree of cloudiness, and this underestimation increases with increasing CF (up to 30 %) and CTA, and with decreasing CTP (increasing cloud top height). Therefore, although heavily cloudy conditions were removed from the analysis using the "H2O flag" provided by the satellite algorithm, the remaining cloudy cases cause a significant bias in the satellite-sounding inter-comparison.
Overall, the recommendation given to potential users of the operational GOME-2 TWVC data is to work with cloud-free data for SZA below 50°. Finally, we would like to emphasize that a continuous validation effort of the satellite data using independent measurements from reliable instruments is required in order to assess their accuracy and quality.
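In practice, this recommendation amounts to a simple screening of the satellite records. A minimal sketch is given below, using the cloud-free threshold of this study (CF < 5 %) and the 50° SZA limit. The record layout and the keys `cf_percent` and `sza_deg` are hypothetical and do not correspond to field names of the operational product.

```python
def is_recommended(cf_percent, sza_deg, cf_max=5.0, sza_max=50.0):
    """True for cloud-free scenes (CF below cf_max, in %) observed at a
    solar zenith angle below sza_max (in degrees)."""
    return cf_percent < cf_max and sza_deg < sza_max


def screen_records(records):
    """Keep only the records that pass the cloud and SZA screening.

    records: iterable of dicts with the hypothetical keys
    'cf_percent' and 'sza_deg'.
    """
    return [r for r in records
            if is_recommended(r["cf_percent"], r["sza_deg"])]
```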

Figure 1. GRUAN TWVC against GOME-2 TWVC data for all-sky conditions (top plot) and cloud-free conditions (bottom plot). The solid line is the line of unit slope.

Figure 2. Differences between TWVC data retrieved by GOME-2 and GRUAN sounding data (Eq. 3) as a function of the GOME-2 ground pixel solar zenith angle (SZA) for all, cloud-free and cloudy conditions.

Figure 3. Monthly averages of the differences between TWVC data retrieved by GOME-2 and GRUAN sounding data (Eq. 3) for five out of the six GRUAN sites used in this study.

Figure 8. Differences between GOME-2 and GRUAN sounding data (Eq. 3) as a function of the GRUAN TWVC values for cases with SZA below 50° (top plot) and cloud-free cases (bottom plot).

Table 1. GRUAN stations with available sounding data within 100 km and 120 min of the GOME-2 overpass.

Table 2. Parameters obtained in the correlation analysis between GOME-2 TWVC data and GRUAN radiosonde measurements during the period 2009–2014. Upper (lower) rows show the parameters obtained for all-sky (cloud-free) conditions. The parameters are: N, number of data; Slope, slope of the regression line; R², determination coefficient; RMSE, root mean square error; MBEw, weighted mean bias error; MABEw, weighted mean absolute bias error; u, uncertainty of MBEw and MABEw.