Total Column Water Vapour Retrieval from S-5P/TROPOMI in the Visible Blue Spectral Range

Total column water vapour (TCWV) has been retrieved from TROPOMI measurements in the visible blue spectral range and compared to a variety of reference data sets for clear-sky conditions during boreal summer and winter. The retrieval follows the common two-step DOAS approach: first, the spectral analysis is performed within a linearized scheme; then the retrieved slant column densities are converted to vertical columns using an iterative scheme for the water vapour a priori profile shape, which is based on an empirical parameterization of the water vapour scale height. Moreover, a modified albedo map was used that combines the OMI LER albedo and a scaled MODIS albedo map. The use of this alternative albedo is especially important over regions with very low albedo and a high probability of clouds, such as the Amazon region. The errors of the TCWV retrieval have been estimated theoretically, considering the contributions of a variety of uncertainty sources. For observations during clear-sky conditions, over ocean surface, and at low solar zenith angles the error is typically around 10-20%; during cloudy-sky conditions, over land surface, and at high solar zenith angles it reaches around 20-50%. In a validation study the retrieval is shown to capture the global water vapour distribution well: the retrieved H2O VCDs show very good agreement with the reference data sets over ocean for boreal summer and winter, and the modified albedo map substantially improves the retrieval's consistency with the reference data sets, in particular over tropical landmasses. However, over land the retrieval underestimates the VCD by about 10%, particularly during summertime. Our investigations show that this underestimation is likely caused by uncertainties in the surface albedo and the cloud input data: low-level clouds cause an underestimation, but for mid- to high-level clouds good agreement is found.
In addition, our investigations indicate that these biases can probably be further reduced by the use of updated cloud input data. The TCWV retrieval can easily be applied to further satellite sensors (e.g. GOME-2 or OMI) to create uniform long-term measurement data sets, which is particularly interesting for climate and trend studies of water vapour.

In the visible spectral range, total column water vapour (TCWV) has so far been retrieved mostly in the "red" spectral range because the absorption is strongest there. However, in this spectral range the ocean surface albedo is relatively low, leading to a low sensitivity for the lowermost troposphere, where the highest water vapour concentrations occur. In addition, current and past satellite sensors cannot resolve the fine absorption structure of water vapour in this spectral range, causing non-linear absorption effects (e.g. saturation) which have to be accounted for in post-processing. Thus Wagner et al. (2013) suggested applying retrievals in the "blue" spectral range (around 442 nm), where the absorption is much weaker than in the red, making the retrieval problem quasi-linear. In addition, the ocean surface albedo is much higher there, leading to a higher sensitivity to the near-surface layers. First operational analyses of a similar approach have been performed by Wang et al. (2019) for measurements of the Ozone Monitoring Instrument (OMI; Levelt et al., 2006).
In October 2017 the TROPOspheric Monitoring Instrument (TROPOMI; Veefkind et al., 2012) onboard ESA's Sentinel-5 Precursor (S-5P) satellite was launched into a sun-synchronous polar orbit with an equator crossing time of 13:30 local time.
TROPOMI is a UV-Vis-NIR push-broom spectrometer and consists of 450 detectors/rows covering a swath width of 2600 km.
The outstanding property of TROPOMI is that its spectral bands in the visible combine a high signal-to-noise ratio with an unprecedented spatial resolution of 3.5 × 7.5 km² (3.5 × 5.6 km² since August 2019) at nadir, which allows spectral analyses with unprecedented accuracy even on small spatial scales.
In this paper we introduce a TCWV retrieval that applies the spectral analysis approach of Wagner et al. (2013) to S-5P/TROPOMI observations. The paper is organized as follows: in Sect. 2 we give an overview of the retrieval, describing general retrieval principles and presenting the retrieval set-up. In Sect. 3 we present an empirical parameterization of the a priori water vapour profile shape and an iterative scheme making use of the relation between the water vapour profile shape and TCWV. In Sect. 4 we evaluate different input albedo products, and in Sect. 5 we perform a detailed uncertainty analysis including a variety of error sources. In Sect. 6 we present first TCWV results retrieved from TROPOMI measurements and perform a validation study using data sets from satellite measurements, ground-based measurements, and reanalysis models as reference. In Sect. 7 we draw conclusions and summarize the outcomes of our investigations.

Wavelength calibration and spectral analysis
In a first step, the wavelength alignment of the measured irradiance is calibrated for each of the 450 TROPOMI detectors/rows via a nonlinear least-squares fit in intensity space using the solar spectrum from Kurucz (1984) as reference. Simultaneously, the instrumental spectral response function (ISRF) is approximated by an asymmetric Super-Gaussian following the definition of Beirle et al. (2017).
Next, we perform a spectral analysis using the differential optical absorption spectroscopy (DOAS; Platt and Stutz, 2008) scheme, in which the attenuation along the light path is calculated via Beer-Lambert's law in optical depth space. In this formulation each trace gas of interest i contributes with its absorption cross section $\sigma_i(\lambda)$ and its slant column density $\mathrm{SCD}_i = \int_s c_i \, \mathrm{d}s$, i.e. its concentration integrated along the light path s, while a closure polynomial $\Phi$ accounts for Mie and Rayleigh scattering as well as low-frequency contributions. Table 1 summarizes the fit setup of the retrieval's spectral analysis. The fit window ranges from 430 nm to 450 nm and accounts for molecular absorption by water vapour (HITRAN 2008; Rothman et al., 2009), NO2 at 220 K (Vandaele et al., 1998), ozone (Serdyuchenko et al., 2014), and the O2-O2 dimer (Thalman and Volkamer, 2013). To account for the Ring effect we include two Ring spectra (Wagner et al., 2009), and for $\Phi$ we use a 5th-order polynomial. Furthermore, we include pseudo-absorbers accounting for intensity offset, for shift and stretch effects (Beirle et al., 2013), and for ISRF changes along the orbit (Beirle et al., 2017) with respect to the ISRF parameters w and k in Eq. (1). All molecular absorption cross sections are convolved with the ISRF of the corresponding TROPOMI row/detector determined during the calibration process.
The molecular absorption by water vapour within our fit window is relatively weak, and hence the modelled line lists vary strongly from HITRAN 2008 to HITRAN 2012 (Rothman et al., 2013) and to HITRAN 2016 (Gordon et al., 2017). Thus the choice of line list carries a high degree of uncertainty (see also Lampel et al., 2015).
Due to the high daily data volume of the TROPOMI L1B radiances (about 40 gigabytes per day for Band 4), the execution of a non-linear fit without high-performance infrastructure is demanding in computation time. Therefore, we implemented a weighted linear least-squares fit for our retrieval, in which the weights are the fractional coverage of each pixel within the fit window (details in Appendix A). This weighting of the outermost pixels of the fit window avoids "jumps" in the set of pixels included in the DOAS fit, which would occur for a fixed fit window due to the changing pixel-to-wavelength mapping across track. Thus, across-track "stripes" in the SCDs are avoided.
Figure 1 illustrates a typical example of such a spectral analysis of a TROPOMI measurement spectrum, in which the absorption structures of water vapour, NO2, and the Ring effect can be clearly identified and the residual spectrum shows a mainly noisy structure. Figure 2 depicts the distribution of the H2O SCD from one TROPOMI orbit (orbit number 6930) on 13 February 2019. It demonstrates that the TROPOMI retrieval is able to capture meso- to macro-scale water vapour patterns such as convective updrafts in the tropics and atmospheric rivers in the midlatitudes, whereby the small H2O SCD values in the tropics are caused by cloud shielding.
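The weighted linear least-squares fit with fractional pixel coverage described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the retrieval's actual implementation (which is detailed in Appendix A): the function names, the synthetic cross sections, the pixel grid, and the window edges are all hypothetical, and the sign convention of the optical depth `tau` is assumed.

```python
import numpy as np

def fractional_weights(pixel_edges, win_lo, win_hi):
    """Fraction of each pixel's wavelength interval that lies inside the fit window."""
    lo = np.maximum(pixel_edges[:-1], win_lo)
    hi = np.minimum(pixel_edges[1:], win_hi)
    overlap = np.clip(hi - lo, 0.0, None)      # zero for pixels outside the window
    return overlap / np.diff(pixel_edges)

def weighted_linear_fit(tau, cross_sections, wavelengths, weights, poly_order=5):
    """Solve tau ~ sum_i sigma_i * SCD_i + polynomial by weighted least squares."""
    span = wavelengths.max() - wavelengths.min()
    x = (wavelengths - wavelengths.mean()) / span          # scaled for conditioning
    poly = np.vander(x, poly_order + 1, increasing=True)   # closure polynomial terms
    design = np.column_stack([cross_sections.T, poly])
    sw = np.sqrt(weights)                                  # row scaling applies the weights
    coeffs, *_ = np.linalg.lstsq(design * sw[:, None], tau * sw, rcond=None)
    n_gas = cross_sections.shape[0]
    return coeffs[:n_gas], coeffs[n_gas:]

# Synthetic demo: all numbers below are made up for illustration only.
edges = np.linspace(429.0, 451.0, 111)          # 110 hypothetical pixels, 0.2 nm wide
centres = 0.5 * (edges[:-1] + edges[1:])
weights = fractional_weights(edges, 430.05, 449.95)
sigma = np.vstack([np.sin(2 * np.pi * centres / 1.7),
                   np.cos(2 * np.pi * centres / 0.9)])
tau = 3.0 * sigma[0] + 0.5 * sigma[1] + 0.1 + 0.02 * (centres - 440.0)
scd, poly_coeffs = weighted_linear_fit(tau, sigma, centres, weights)
print(scd)
```

Because the boundary pixels enter with a weight between 0 and 1 rather than being switched on or off, a small shift of the pixel-to-wavelength mapping changes the fit smoothly, which is the property the text credits with avoiding across-track stripes.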

To convert the slant column density to a vertical column density (VCD), we apply the so-called airmass factor (AMF):

$$\mathrm{VCD} = \frac{\mathrm{SCD}}{\mathrm{AMF}}$$
The airmass factor accounts for the non-trivial effects of atmospheric radiative transfer and is usually based on radiative transfer model (RTM) simulations. In our case we used the 3D Monte Carlo RTM McArtim (Deutschmann et al., 2011) and performed simulations at a wavelength of 442 nm for different retrieval scenarios (summarized in Tab. 2), assuming an aerosol-free atmosphere. These simulations yield a Jacobian vector $J = \partial \ln I / \partial \beta$ defined at each grid box i, from which the altitude-dependent AMFs (box AMFs, BAMF) can be calculated using the simulated intensity I normalised by the solar spectrum and the box thickness $\Delta h$. These BAMF profiles have to be combined with the partial vertical columns $\mathrm{VCD}_i$ of an a priori water vapour profile.
For a cloud-contaminated pixel we assume that the cloud is a Lambertian reflector with an albedo of 80% and use the cloud top height as surface altitude input for the AMF. Under the independent pixel approximation, the resulting cloud-affected AMF can then be calculated as a linear combination of the AMF for a clear-sky scenario and the AMF for a cloudy-sky scenario, weighted by the respective simulated intensities I and the effective cloud fraction $\zeta$.
... out that their calculated AMF change strongly depends on which reanalysis model data they were using. Weaver and Ramanathan (1995) approximated the water vapour profile by an exponential decay with altitude and a corresponding scale height that is defined in terms of the mean air temperature T within an atmospheric column, the mean lapse rate $\Gamma$ within the same column, the gas constant of water vapour $R_v$, and the specific latent heat L. However, this definition requires knowledge of the mean air temperature and/or the lapse rate and assumes that the relative humidity is constant with altitude. The former can only be estimated using numerical weather models, and the latter is very unlikely to occur in the atmosphere.
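The conversion chain described in this section (box AMFs weighted by an a priori profile, the independent pixel approximation for cloudy pixels, and VCD = SCD / AMF) can be sketched as below. The exact formulas of the source are not reproduced here; the profile-weighted mean and the intensity- and cloud-fraction-weighted combination are the commonly used forms and are assumed, as are all function names and numbers.

```python
import numpy as np

def total_amf(box_amf, partial_vcd):
    """Total AMF as the a priori profile-weighted mean of the box AMFs (assumed form)."""
    box_amf = np.asarray(box_amf, dtype=float)
    partial_vcd = np.asarray(partial_vcd, dtype=float)
    return float(np.sum(box_amf * partial_vcd) / np.sum(partial_vcd))

def cloud_affected_amf(amf_clear, amf_cloud, i_clear, i_cloud, cloud_fraction):
    """Independent pixel approximation: weight each AMF by cloud fraction and intensity."""
    w_cloud = cloud_fraction * i_cloud
    w_clear = (1.0 - cloud_fraction) * i_clear
    return (w_cloud * amf_cloud + w_clear * amf_clear) / (w_cloud + w_clear)

def vcd_from_scd(scd, amf):
    """VCD = SCD / AMF."""
    return scd / amf

# Toy numbers, chosen only so the arithmetic is easy to follow.
amf_profile = total_amf(box_amf=[1.0, 2.0], partial_vcd=[3.0, 1.0])
amf_pixel = cloud_affected_amf(amf_clear=1.0, amf_cloud=2.0,
                               i_clear=1.0, i_cloud=3.0, cloud_fraction=0.25)
vcd = vcd_from_scd(30.0, amf_pixel)
print(amf_profile, amf_pixel, vcd)
```

In this toy case the bright cloud covers only a quarter of the pixel but contributes half of the measured intensity, so the cloudy-sky AMF receives the same weight as the clear-sky AMF.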

Thus we seek an empirical parameterization of the scale height, focusing on its dependence on the H2O VCD and on the aforementioned atmospheric variability, i.e. on latitude, seasonal cycle, and surface properties (such as vegetation effects).
For these investigations we use profile data retrieved from measurements of the Constellation Observing System for Meteorology, Ionosphere, and Climate (COSMIC; Anthes et al., 2008) program provided by ROMSAF. The COSMIC data are based on the GPS radio occultation (RO) technique, which provides high-resolution vertical profiles of bending angles (Hajj et al., 2002) that can be used to retrieve the atmospheric refractivity N. The atmospheric refractivity is given by (Smith and Weintraub, 1953):

$$N = 77.6\,\frac{p}{T} + 3.73 \times 10^{5}\,\frac{E}{T^{2}}$$

with the air pressure p, the air temperature T, and the water vapour pressure E (p and E in hPa, T in K). GPS RO therefore allows profile information to be retrieved under all-weather conditions with a high vertical resolution, ranging from approximately 100 m in the lower troposphere to 1 km in the stratosphere (Anthes, 2011), and an accuracy of around 1 g/kg (Heise et al., 2006; Ho et al., 2010b), while having an almost uniform global distribution (Ho et al., 2010a). We use data retrieved between 2013 and 2016, which amounts to approximately 1.6 × 10⁶ profiles.

For the calculation of the scale height we resample the COSMIC profile to a 100 m grid up to 14 km or, more precisely, only consider profile data below the 150 hPa level (close to the tropopause height). Then we sum up the partial columns of the COSMIC profile data from the ground up to a (scale) height $H_\mathrm{sum}$ at which 63% of the H2O VCD is reached.
In order to evaluate this scale height approach we performed a synthetic study in which we compared AMFs calculated for the exponential profile shape with AMFs calculated directly from the COSMIC profiles. The results of the intercomparison are given in Figure 3. The 2D histograms reveal that the AMFs derived with the exponential profile agree well with the AMFs calculated directly from the COSMIC profiles, indicating that the chosen method can reproduce the shapes of the COSMIC profiles well. This good agreement can also be observed in the histograms of Fig. 4, which illustrate distributions of the relative deviation between the AMFs for selected latitude bins. These distributions have a sharp shape and peak around 0%, indicating that the AMFs from the exponential shape are almost unbiased with respect to the reference AMFs. In addition, Fig. 5 shows exemplary profiles for cases of good and bad agreement with the reference AMFs for the same selected latitude bins as in Figure 4. In general, the bad agreement (left column) occurs for profile shapes in which a distinctly strong gradient is observed in the lower troposphere, followed by quasi-constant values with altitude. Nevertheless, the maximum absolute relative AMF deviations are only around 15%.
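The scale height definition described above (the height below which 63% of the column is reached on a 100 m grid up to 14 km) can be sketched as follows. This is an illustrative reading only: the function name is hypothetical, the 63% threshold is taken as 1 - 1/e, the profile is assumed to start at the ground (0 m) with increasing altitudes, and the 150 hPa cut is approximated by the fixed 14 km top.

```python
import numpy as np

FRACTION = 1.0 - np.exp(-1.0)   # about 0.632, assumed meaning of "63%"

def scale_height_from_profile(altitude_m, concentration, top_m=14000.0, step_m=100.0):
    """Height below which FRACTION of the column (integrated up to top_m) is found."""
    grid = np.arange(0.0, top_m + step_m, step_m)          # 100 m grid from the ground
    conc = np.interp(grid, altitude_m, concentration)      # altitude_m must be increasing
    partial = 0.5 * (conc[1:] + conc[:-1]) * step_m        # partial columns per layer
    cumulative = np.concatenate([[0.0], np.cumsum(partial)])
    target = FRACTION * cumulative[-1]
    return float(np.interp(target, cumulative, grid))      # interpolate between levels

# Synthetic check with an exponential profile of known scale height (2 km).
z = np.arange(0.0, 20001.0, 250.0)
profile = np.exp(-z / 2000.0)
h_sum = scale_height_from_profile(z, profile)
print(h_sum)
```

For a purely exponential profile the result is close to the prescribed scale height, with a small low bias because the column is truncated at 14 km.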
The results of the intercomparison for prescribed cloudy-sky conditions and nadir viewing geometry are illustrated in Fig. 6, in which the panels show histograms of the relative AMF deviation for the same selected latitude bins as in Fig. 4 but for ...
... by the higher atmospheric variability due to stronger atmospheric dynamics in the midlatitudes. In addition, the uncertainty of the COSMIC profiles is higher there, because a drier atmosphere leads to a smaller sensitivity of the COSMIC profile retrieval to water vapour concentrations (compare Eq. (4) and Kursinski et al., 1997).
Figure 8 shows the same panels as Fig. 7 but for data over land. In general, the scatter for all latitude bins has increased distinctly, resulting in an inferior linear agreement between the H2O VCD and the scale height compared to the data over ocean, especially for deserts and northern polar regions. Fortunately, the surface albedo of these regions is usually high and thus the AMF is less dependent on the a priori profile shape. In addition, these regions are governed by an arid climate and thus the retrieved H2O VCDs are expected to be small. Correspondingly, the absolute H2O VCD errors due to uncertainties in the AMF are still relatively small.
In the following we investigate a parameterization of the scale height with respect to H2O VCD, latitude, and season for ocean and land separately.

Ocean
The regression line parameters of the orthogonal distance regression (ODR) fits between COSMIC TCWV and COSMIC scale height for each latitude bin and each month for data over ocean are illustrated in Figure 9. The fitted slopes (left panel in Fig. 9) indicate a quadratic dependence on latitude and reveal a seasonal shift towards higher latitudes during July, August, and September.

Also the values for the fitted intercept vary with latitude and season.
Thus the scale height over ocean $H_\mathrm{ocean}$ can be approximated by Eq. (6), with the latitude $\theta$ and the day of year t. The annual variation of the function parameters $a_i$, $b_i$, and $\theta_0$ from Eq. (6), fitted for the monthly data sets (illustrated in Fig. 9), is depicted in Figure A9. Most function parameters reveal an annual and a semi-annual cycle. Hence these function parameters can be approximated by a superposition of two simple cosine functions with prescribed frequencies, with t as the day of year and $\omega = 2\pi/365$. Such functions have also been fitted to the monthly data and are shown in Fig. A9 (solid orange lines), whereby we assumed that the day of year representing each month is the first day of the month. For most function parameters the fits coincide well with the data points, and in the cases of suboptimal fit results the annual variation of the data is relatively small, indicating that our choice of parameterization is valid.
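Fitting an annual plus semi-annual cycle to twelve monthly values, as described above, can be sketched with a linear least-squares fit. The exact functional form of the source is not reproduced; here each cosine with free amplitude and phase is rewritten as a cosine/sine pair, which is mathematically equivalent and keeps the problem linear. The five-coefficient form (offset plus two amplitude/phase pairs), the function names, and the synthetic monthly values are assumptions for illustration.

```python
import numpy as np

OMEGA = 2.0 * np.pi / 365.0   # annual angular frequency, as in the text

def harmonic_design(day_of_year):
    """Columns: offset, annual cos/sin, semi-annual cos/sin."""
    t = np.asarray(day_of_year, dtype=float)
    return np.column_stack([np.ones_like(t),
                            np.cos(OMEGA * t), np.sin(OMEGA * t),
                            np.cos(2 * OMEGA * t), np.sin(2 * OMEGA * t)])

def fit_annual_cycle(day_of_year, values):
    """Least-squares coefficients of the two-harmonic model."""
    coeffs, *_ = np.linalg.lstsq(harmonic_design(day_of_year),
                                 np.asarray(values, dtype=float), rcond=None)
    return coeffs

def evaluate_annual_cycle(coeffs, day_of_year):
    return harmonic_design(day_of_year) @ coeffs

# First day of each month (non-leap year) represents the month, as in the text.
days = np.array([1, 32, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335])
# Made-up monthly values of one function parameter.
monthly = (2.0 + 0.5 * np.cos(OMEGA * (days - 30))
           + 0.1 * np.cos(2 * OMEGA * (days - 100)))
coeffs = fit_annual_cycle(days, monthly)
annual_amplitude = float(np.hypot(coeffs[1], coeffs[2]))
print(coeffs[0], annual_amplitude)
```

The amplitude and phase of each cosine can be recovered from the fitted pair, e.g. the annual amplitude is the Euclidean norm of the annual cosine and sine coefficients.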

Altogether we have to fit 35 parameters to the complete data set of calculated COSMIC scale heights for the parameterization of the scale height over ocean. The quality of the parameterization in approximating the scale height is illustrated in Fig. 10 for different latitude zones. For the latitude zones including the tropics (-15° to 15° N) and subtropics (-35° to 35° N) we find good agreement between the parameterization and the calculated COSMIC scale height, with R² of 0.72 and 0.60, respectively.
However, including higher latitudes in the evaluation, i.e. midlatitudes (-60° to 60° N) and polar regions (-90° to 90° N), leads to increased scatter and a worsening of the parameterization (R² of 0.45 and 0.44, respectively). This inferior agreement is likely caused by the larger atmospheric variability in the midlatitudes (e.g. stronger atmospheric dynamics) as well as an increased uncertainty in the COSMIC water vapour profile measurements due to lower water vapour concentrations.

Land
Figure 11 shows the ratio $H_\mathrm{land}/H_\mathrm{ocean}$ as a function of the NDVI for data sets filtered by different landcover types; the solid red lines represent the robust regression results (summarized in Tab. 3) using the model from Siegel (1982). The left panel depicts the distribution for which no filter is applied. Except for low NDVI values, a linear relation between the ratio and the NDVI is observable; however, for NDVI values around 0.1 the ratio varies strongly between 0.7 and 3.0. In the center panel ...
... whereby in the following we use the results for the data set filtered for landcover types 7 and 15 globally. Since regions of landcover type 7 or 15 are usually arid, the retrieved H2O VCD is small and thus the error due to an inadequate parameterization of the AMF is much smaller than the fit error of the spectral analysis.

Iterative retrieval scheme
For the calculation of the H2O VCD we precomputed AMF look-up tables (LUTs) for different water vapour profile shapes with scale heights ranging from 0.5 km to 5.0 km. These LUTs can then be used within a fixed-point iteration. In our case the iterative retrieval scheme is based on a fixed-point iteration according to Steffensen's method (Steffensen, 1933; Wendland and Steinbach, 2015), where f is a function that calculates the scale height for a given VCD using Eq. (5) and (7), applies it to the precomputed AMF look-up tables, and from that returns a new VCD. The advantage of Steffensen's method is that it does not need a derivative and is able to determine the fixed point even for a non-contractive function (Wendland and Steinbach, 2015). For the first guess we derive the initial VCD from the SCD using a geometric AMF ($\mathrm{AMF}_\mathrm{geo} = 1/\cos(\mathrm{SZA}) + 1/\cos(\mathrm{VZA})$) and stop the iteration as soon as the logarithmic difference between two consecutive results is smaller than 5% (approximately 1 kg m⁻² assuming an average H2O VCD of 20 kg m⁻²) or after six iteration steps. We also tested other values for the first guess and could confirm that the convergence of the iterative scheme is independent of them.
Similar results can be found in Fig. 13, which illustrates the H2O VCD distributions of both approaches for the same scenario for clear-sky (effective cloud fraction CF < 20%, top row) and all-sky (CF ≤ 100%, bottom row) conditions. In addition, Fig. 13 depicts the TCWV distribution from the microwave sensor SSMIS f16, which has a temporal difference of around +2.3 hours. For the clear-sky case the VCD distributions of both approaches are almost identical, whereby for the constant scale height approach (left panel) very high VCDs (exceeding 80 kg m⁻²) can be observed at the edges of the cloudy regions in the northern subtropics.
For the all-sky case (bottom row) the differences between the two approaches are largest in cloudy regions, but, as mentioned before, even under these unfavourable observation conditions the iterative approach still gives reasonable VCD values, whereas the constant scale height approach distinctly overestimates the VCD.
In addition, the iterative approach shows overall good agreement with the SSMIS observations.
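The fixed-point iteration described in this section can be sketched with the standard form of Steffensen's method (Aitken's delta-squared acceleration applied to x = f(x)). The stopping rule follows the text (logarithmic difference below 5% or six steps); everything else is a toy assumption: the linear scale-height relation and the AMF function below are hypothetical stand-ins for the parameterization and the precomputed look-up tables, and the SCD and angles are made up.

```python
import math

def steffensen_fixed_point(f, x0, log_tol=0.05, max_iter=6):
    """Steffensen iteration for x = f(x); assumes positive iterates (VCDs)."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        ffx = f(fx)
        denom = ffx - 2.0 * fx + x
        # Fall back to the plain fixed-point value if the update is undefined.
        x_new = ffx if denom == 0.0 else x - (fx - x) ** 2 / denom
        if abs(math.log(x_new / x)) < log_tol:
            return x_new
        x = x_new
    return x

# Toy stand-ins (hypothetical): scale height in km as a function of VCD in kg m^-2,
# and an AMF that grows with scale height, replacing the look-up tables.
def toy_scale_height(vcd):
    return 1.0 + 0.03 * vcd

def toy_amf(scale_height_km):
    return 1.0 + 0.2 * scale_height_km

scd = 40.0                                     # made-up slant column, kg m^-2

def update_vcd(vcd):
    return scd / toy_amf(toy_scale_height(vcd))

# First guess from the geometric AMF, here for SZA = 30 deg and VZA = 0 deg.
amf_geo = 1.0 / math.cos(math.radians(30.0)) + 1.0 / math.cos(math.radians(0.0))
vcd = steffensen_fixed_point(update_vcd, scd / amf_geo)
print(vcd)
```

In this toy setting the iteration stops after two steps at the fixed point of the update function; with real look-up tables the number of steps and the limit naturally depend on the scene.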

In contrast, MODIS pixels have a much higher spatial resolution and MODIS' NIR channels are more sensitive to cloud contamination, yielding a larger sample size and allowing for correct cloud filtering. Hence the H2O VCD distribution using the MODIS surface reflectance results in a much smoother transition from ocean to land and in general in much higher VCD values over land along the equator. Thus in the following we use a combination of the MODIS and OMI albedos: the scaled MODIS Aqua blue surface reflectance over land and the monthly minimum OMI albedo over ocean.
The retrieval's spectral analysis directly yields the 1σ standard fit error of the H2O SCD, which is usually dominated by noise.

For a better understanding of these fit errors we separated them into data for small/large solar zenith angles (SZA < 20° and 70° < SZA ≤ 90°, respectively), low/high surface albedo (< 3% and > 15%, respectively), and clear-/cloudy-sky observation conditions (CF < 5% and CF > 20%, respectively). The distributions of the standard and relative fit errors of the spectral analysis are given in Fig. 15 and 16, respectively. The median values in Fig. 15 ...
... probability density and the medians also indicate that the distributions are positively skewed, in particular for high SZA scenarios: for these scenarios the relative errors easily exceed 100%. Nevertheless, using the locations of maximum probability density as a rule-of-thumb estimate, relative fit errors are around 10% for low SZAs and approximately 30% for high SZAs.

Uncertainties in the AMF
The uncertainty in the AMF depends on the uncertainty of its input parameters. Because the parameters of the viewing geometry (i.e. solar zenith angle, line-of-sight angle, and solar relative azimuth angle) are known with high accuracy, the most important uncertainties are those of the surface albedo, cloud fraction, cloud top height, and water vapour profile shape.
In order to estimate the contribution of each input parameter to the overall AMF uncertainty we define standard scenarios (summarized in Tab. 5), for which we calculate the AMF from the precalculated LUT and then vary the input parameter for each scenario according to its uncertainty assumption listed in Table 6. The uncertainties of the water vapour scale height have been derived from the fit results of the intercomparisons between the measured COSMIC scale height and the parameterized scale height over ocean (see Fig. 10) and land (see Fig. 11).
... Table 5. In contrast to the clear-sky scenarios, the impact of the surface albedo uncertainties has strongly decreased, but in general the contributions of all AMF errors have increased distinctly. The main source of AMF errors is still the uncertainty of the scale height over low vegetation, whose median values vary between 20% and 50% but which can also cause AMF errors larger than 60%. Table 6 summarizes the results of the different error sources considered in the AMF uncertainty for clear- and cloudy-sky conditions. For clear-sky conditions one can typically assume a relative AMF error of around 10-15% and for cloudy-sky conditions of around 10-25%.

Total H2O VCD uncertainty
The total relative H2O VCD uncertainty can be approximated from the relative uncertainties of the H2O SCD and the AMF. With our findings of typical relative AMF and H2O SCD uncertainties, the total relative VCD uncertainty is typically around 10-20% for observations during clear-sky conditions, over ocean surface, and at low solar zenith angles. During partly cloudy conditions, over land surface, and at high solar zenith angles the error reaches approximately 20-50%.
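As a rough plausibility check of these ranges, the relative SCD and AMF uncertainties can be combined in quadrature. This is an assumption on our part: it is the usual Gaussian error propagation for a quotient such as VCD = SCD / AMF with independent errors, and it is not confirmed to be the exact expression used in the source. The function name is hypothetical; the input values are the typical numbers quoted in the text.

```python
import math

def relative_vcd_uncertainty(rel_scd_error, rel_amf_error):
    """Quadrature sum of relative errors (assumes independent SCD and AMF errors)."""
    return math.hypot(rel_scd_error, rel_amf_error)

# Low SZA, clear sky: about 10% fit error and up to about 15% AMF error.
favourable = relative_vcd_uncertainty(0.10, 0.15)
# High SZA, cloudy sky: about 30% fit error and up to about 25% AMF error.
unfavourable = relative_vcd_uncertainty(0.30, 0.25)
print(round(favourable, 3), round(unfavourable, 3))
```

Under this assumption the two cases give roughly 18% and 39%, which fall inside the 10-20% and 20-50% ranges stated above.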

In order to evaluate the retrieval's performance we conducted a validation study for boreal summer (June, July, and August) 2018 and boreal winter (December, January, and February) 2018/2019, whereby we only include clear-sky observations (i.e. pixels with an effective cloud fraction smaller than 20%) and ice- and snow-free pixels. As reference data for the validation we use TCWV from the Special Sensor Microwave Imager/Sounder (SSMIS), from the reanalysis model ERA-5, and from the ground-based GPS network SuomiNet.
As cloud input data we use the cloud information (effective cloud fraction at 440 nm and cloud top height) as well as the surface altitude from the TROPOMI L2 NO2 product (Van Geffen et al., 2019), and as surface albedo input data we use the combination of the modified MODIS and OMI albedo described in Section 4.

SSMIS comparison
For the evaluation we use measurements from SSMIS onboard NOAA's f16 and f17 satellites, processed by Remote Sensing Systems (RSS) and provided by the NASA Global Hydrology Resource Center on a daily 0.25° × 0.25° grid. SSMIS can observe the TCWV distribution under all-sky conditions over ocean with an accuracy of around 1 kg m⁻² (Wentz, 1997; Mears et al., 2015). Since the equator crossing time (ECT) of SSMIS changes, we only include SSMIS observations whose ECT is within 3 hours (f16) or 5 hours (f17) of TROPOMI's ECT of 13:30 LT. For the intercomparison we only include SSMIS measurements that are not affected by rain.
... with R² = 0.89 for both seasons. Overall, considering the differences in collocation time (3 hours and 5 hours for f16 and f17, respectively), the comparison shows that the TROPOMI TCWV retrieval captures the water vapour distribution over ocean well.
To investigate the influence of clouds on our retrieval, we plot the difference (top row) and relative difference (bottom row) between TROPOMI and SSMIS as a function of the input cloud top height (CTH) in Figures 20 and 21 for f16 and f17, respectively. The median over the whole CTH range (blue dashed line) indicates an underestimation of the TROPOMI H2O VCD of approximately 12-13% (2.6 kg m⁻²). However, the large majority of data points lies within the CTH bin between 0 and 1 km, revealing that the underestimation of the TROPOMI TCWV is mainly caused by low clouds. For mid-level clouds the median difference almost vanishes, whereas for high clouds it first increases and then remains almost constant with cloud top height.
Further validation results for SSMIS f16 and f17, separated into different cloud fraction and cloud top height bins for July 2018, are given in Fig. A10 and Fig. A11, respectively. The results indicate that there is no dependence on cloud fraction but a distinct dependence on cloud top height: the retrieval underestimates for clouds below 1 km, is in very good agreement for mid-level clouds (1 km to 4 km), and overestimates for higher clouds.

ERA-5 comparison
For the intercomparison between the reanalysis model ERA-5 and TROPOMI we use ERA-5 TCWV data provided by the Copernicus Climate Change Service (2017) on a 0.25° × 0.25° grid. We only take into account values which are within 1 hour of the start of the sensing time of the TROPOMI orbit and separate the data into data over ocean and data over land.
The results of the intercomparison are summarized in Figure 22. Over ocean (top row in Fig. 22) ...
... systematic uncertainty within the input parameters for our retrieval. The influence of the cloud top height input is illustrated in Fig. 23 for data over ocean. The median is around −1.6 kg m⁻² (−7.1%) and −1.3 kg m⁻² (−6.7%) during summer and winter, respectively, whereby, similar to SSMIS, these underestimations are caused by the majority of data points within the 0-1 km CTH bin. For increasing CTH the deviation from the reference increases and leads to an overestimation. For data over land (Fig. 24) the CTH variability is much larger than over ocean, i.e. most data points are now distributed between 0 and 3 km, and the median is around −1.5 kg m⁻² (−10.3%) and −0.4 kg m⁻² (−4.0%) during summer and winter, respectively. Furthermore, low clouds still cause an underestimation and for mid- to high-level clouds the deviations almost cancel out, but one can also observe increasing scatter for winter data.
All these findings reveal that the combination of albedo uncertainties and uncertainties in the cloud properties (cloud fraction and cloud top height) as well as in the scale height parameterization has a distinct influence on the AMF. The cloud products from TROPOMI rely on the OMI albedo which, as we have demonstrated in Sect. 4, has several problems over land surface. In addition, the uncertainty of the OMI albedo over land surface is higher than over ocean due to the high spatiotemporal variability of the scenery, and the differences between the monthly minimum and the monthly mean albedo are higher over land than over ocean. Furthermore, the cloud top height is calculated via the cloud top pressure and has to be combined with the surface pressure. Thus the uncertainty of the cloud top height over land is higher than over ocean, since over ocean the topography is much simpler.
Nevertheless, the complex radiative interactions between albedo and clouds might amplify or cancel out these deviations and thus make it difficult to draw clear conclusions.
As for the SSMIS comparison, further validation results for ERA-5 over ocean and land, separated into different cloud fraction and cloud top height bins for July 2018, are given in Fig. A12 and Fig. A13. Similar to SSMIS, the results over ocean reveal an underestimation for low clouds and an overestimation for high clouds, and show that there is almost no dependence on cloud fraction.
Over land, low clouds still cause an underestimation; however, for cloud top heights above 2 km the retrieval shows very good agreement with ERA-5, indicating that the input cloud top height for our retrieval is too low.

SuomiNet/GPS comparison
For the intercomparison with TCWV from ground-based GPS we use data from the SuomiNet network (Ware et al., 2000) provided by UCAR. SuomiNet stations are distributed over North and Central America and provide data every 30 min with a typical accuracy of 2 kg m⁻² (Duan et al., 1996; Fang et al., 1998). Thus we only take into account TROPOMI pixels within a distance of 0.1° of the GPS station and within 2 hours of the GPS measurement.
... VCD of approximately 14% (3.5 kg m⁻²) during summer and of 8% (0.8 kg m⁻²) during winter. However, during summer the median values for each 1 km CTH bin (blue dots) reveal that the underestimation is mainly caused by low clouds, whereas for mid-level and high clouds the median difference almost vanishes. During winter this pattern is not clearly observable due to much larger scatter, but here too low clouds mainly cause the underestimation in TCWV, whereby the difference is generally within the range of accuracy of the SuomiNet retrieval.

Summary and conclusions
In this paper, we introduce a total column water vapour retrieval from TROPOMI spectra in the visible blue spectral range using an iterative vertical column conversion scheme and provide a detailed characterization of our retrieved H 2 O VCD by 430 performing a detailed uncertainty analysis and intercomparisons to reference data sets from the microwave sensor SSMIS, from the reanalysis model ERA-5 and from the ground-based measurements GPS network SuomiNet.
For the iteration scheme we describe the a priori water vapour profile as an exponential decay with a scale height H and developed an empirical parameterization for this scale height. This parameterization is based on COSMIC water vapour profile data and relates the a priori water vapour profile shape to the H 2 O VCD, the seasonal cycle, the latitude and the vegetation (and 435 NDVI, respectively). We demonstrate that we can correctly reproduce the scale heights in particular for data at low latitudes (tropics and subtropics). However we also observe an increasing scatter if higher latitudes are included in the comparison, likely because of the higher variability in H 2 O VCD due to midlatitudinal cyclone dynamics and a general higher uncertainty in the COSMIC profile data for drier atmospheric conditions. Overall, the retrieved profile heights are very reasonable and we obtain a substantial improvement using the new parameterisation compared to the use of a prescribed constant water vapour 440 profile.
For the uncertainty analysis we investigated the impact of several error sources on the H2O SCD and AMF, such as clouds, surface albedo, profile shape, and instrument properties. The error estimation reveals that the main SCD uncertainty is the fit error of the spectral analysis and that the main AMF uncertainties are caused by uncertainties in the surface albedo and the water vapour profile shape. For the H2O VCD we estimated a typical total relative error of around 10-20% for observations during clear-sky conditions, over ocean surface, and at low solar zenith angles, and of around 20-50% during cloudy-sky conditions, over land surface, and at high solar zenith angles.
In the validation study we demonstrate that for clear-sky conditions the retrieved TROPOMI H2O VCDs over ocean are in very good agreement with the reference data sets and correctly capture the global water vapour distribution. Over land the TROPOMI retrieval reproduces the TCWV distribution; however, we also observe a distinct underestimation of around 10%, in particular during boreal summer.
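How SCD and AMF uncertainties combine into a relative VCD error, as in the error budget summarised above, can be illustrated with first-order error propagation for VCD = SCD / AMF. The sketch assumes uncorrelated error contributions; this assumption, the function name, and the input values in the usage note are illustrative and are not taken from the error budget of this study.

```python
import math


def relative_vcd_error(rel_scd_error, rel_amf_errors):
    """Combine relative errors in quadrature for VCD = SCD / AMF.

    rel_scd_error:  relative SCD error (e.g. spectral fit error)
    rel_amf_errors: iterable of independent relative AMF error contributions
                    (e.g. surface albedo, profile shape, clouds)
    """
    amf_variance = sum(e * e for e in rel_amf_errors)
    return math.sqrt(rel_scd_error ** 2 + amf_variance)
```

For example, `relative_vcd_error(0.06, [0.08, 0.08, 0.03])` yields a total relative error of roughly 0.13. Because the terms add in quadrature, the largest single contribution dominates the total.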

Nevertheless, these underestimations might be caused by uncertainties in the external input data for the retrieval. For instance, the OMI LERs from Kleipool et al. (2008) are too high over tropical landmasses, likely due to incorrect cloud filtering, which causes too high AMFs and thus too low H2O VCDs. Although we tried to overcome this issue by using a surface reflectance product from MODIS Aqua, the cloud products from the TROPOMI L2 NO2 product still rely on the OMI LER for calculating the effective cloud fraction and cloud top height and thus also have a large uncertainty. The intercomparisons to the reference data sets show that these uncertainties in the cloud products have a substantial impact on the H2O VCD: our investigations reveal that the input cloud top height is probably too low, which in turn leads to higher AMFs and consequently to an underestimation in TCWV. Yet one has to consider that the radiative properties of the cloud and albedo products interact with a high degree of complexity, so that a clear explanation or suggestion on how to overcome these issues is beyond the scope of this paper.
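The link between too high AMFs and too low VCDs noted above follows directly from the VCD conversion. As a worked illustration (the 10% value is chosen for illustration only and is not a result of this study), suppose the AMF is overestimated by a relative amount ε while the SCD is unaffected:

```latex
\mathrm{VCD}' = \frac{\mathrm{SCD}}{(1+\varepsilon)\,\mathrm{AMF}}
\quad\Rightarrow\quad
\frac{\mathrm{VCD}' - \mathrm{VCD}}{\mathrm{VCD}}
= \frac{1}{1+\varepsilon} - 1
= -\frac{\varepsilon}{1+\varepsilon}
```

For ε = 0.10 this gives a relative VCD bias of about -9.1%, i.e. an AMF that is too high translates into a VCD that is too low by a similar relative amount.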

Overall, the successful application of the TCWV retrieval in the visible blue spectral range to TROPOMI measurements is very promising for further investigations, including application to other satellite sensors such as OMI, SCIAMACHY, and GOME-1/2 or the upcoming Sentinel-4 instrument, and extension of the retrieval to measurements contaminated by higher cloud fractions. As the retrieval allows fast processing of large data sets, investigations of long-term trends using a TCWV data set of merged time series from different satellite sensors are readily possible. However, since these data sets have to be uniform, they require consistent input data across the different satellite sensors, in particular for cloud products.
window (see also Fig. A1): Table 8
and that these variations are within the same range as the variation of the COSMIC profile data. Figure A7 depicts histograms of the relative AMF deviation for both methods for selected latitude regions assuming nadir viewing geometry and clear-sky conditions (as in Sect. 3.1 and Fig. 4). The peaks of the histograms for the sum method are close to the 0% line, indicating very good agreement with the AMFs calculated from the COSMIC profiles. In contrast, the histograms for the non-linear fit peak at values around 2% and show a broader distribution than the histograms of the sum method, thus revealing an inferior agreement with the reference AMFs. For cloudy-sky conditions (see Fig. A8), both methods are biased towards smaller AMF values (deviations of around -5%) for a cloud top height of 1 km, but for higher clouds both methods show similarly good agreement with the reference AMFs. Yet the variance in the AMFs for the sum method is much smaller than in the AMFs for the non-linear fit.
In summary the sum method is to be preferred because it provides more consistent results for clear-sky and cloudy-sky scenarios than the non-linear fit.
helped with the McArtim calculations and HS helped with the tessellation of the TROPOMI H2O VCD orbit data to a regular grid. TW supervised this study.
Competing interests. The authors declare that they have no conflict of interest.
providing reanalysis data. Furthermore we acknowledge UCAR and ROMSAF for providing SuomiNet and COSMIC data. We also would like to thank Stefan Schmitt and Johannes Lampel from the Institute of Environmental Physics at the University of Heidelberg for performing the analysis of the LP-DOAS measurements during CINDI-2 and for providing the WVMR results in a very useful format.