Improved water vapour retrieval from AMSU-B and MHS in the Arctic

Monitoring of water vapour in the Arctic on long timescales is essential for predicting Arctic weather and understanding climate trends, as well as for assessing its role in the positive feedback loop contributing to Arctic amplification. However, this monitoring is challenged by the sparseness of in situ measurements and by the problems that standard remote sensing retrieval methods for water vapour have in Arctic conditions. Here, we present advances in a retrieval algorithm for vertically integrated water vapour (total water vapour, TWV) in polar regions from satellite-based microwave humidity sounder data: (1) in addition to AMSU-B (Advanced Microwave Sounding Unit-B), we can now also use data from the successor instrument MHS (Microwave Humidity Sounder), and (2) artefacts caused by high cloud ice content in convective clouds are filtered out. Comparisons to in situ measurements from GPS and radiosondes during 2008 and 2009, to radiosondes during the N-ICE2015 campaign and to ERA5 reanalysis show the overall good performance of the updated algorithm.


Introduction
Water vapour is a key element of the hydrological cycle (Chahine, 1992; Serreze et al., 2006; Jones et al., 2007; Hanesiak et al., 2010); shifts in it affect atmospheric transport processes and create and intensify droughts and flooding (Trenberth et al., 2013). Additionally, as the most important greenhouse gas in the atmosphere, it has a dominant effect on climate and radiative forcing (Soden et al., 2002; Dessler et al., 2008; Kiehl and Trenberth, 1997; Trenberth et al., 2007; Ruckstuhl et al., 2007). Hence, it is essential to monitor its variability. Two factors make this important: water vapour increases with temperature, and other greenhouse gases are increasing anthropogenically (Solomon et al., 2010). The positive water vapour feedback has also been highlighted as one of the feedbacks responsible for Arctic amplification (Francis and Hunter, 2007; Miller et al., 2007; Screen and Simmonds, 2010; Ghatak and Miller, 2013). In summary, understanding the water vapour cycle is highly valuable, yet our comprehension of it is incomplete (Stevens and Bony, 2013). Throughout this paper, atmospheric water content refers to the vertically integrated water vapour mass in an air column with a cross-section of 1 m². We call it total water vapour (TWV; sometimes also called column water vapour, integrated water vapour or total precipitable water), and its units are hence kg m⁻².
Balloon-borne radiosondes are a standard method for measuring the water vapour profile. Additionally, ground-based retrievals by microwave radiometers and GPS-based retrievals (which have a lower vertical resolution) are well suited for monitoring in regions where ground stations can be installed. In the Arctic, however, neither radiosonde measurements nor ground-based retrievals are sufficient for this purpose because weather stations are too scarce. Only satellite measurements fulfil the global coverage requirements. An additional challenge is constructing a consistent long-term climate record, given the changes in measuring instruments and the degradation of existing ones. Because of the strong absorption properties of water vapour in the infrared and microwave range, suitable space-borne instruments can in principle ensure complete global coverage of water vapour retrievals (Miao et al., 2001; Bobylev et al., 2010). In polar regions, however, satellite retrieval of water vapour faces a number of obstacles. Cloud cover restricts infrared measurements, and the incomplete understanding of the high and highly variable sea ice emissivity challenges microwave measurements. Some studies, such as Weaver et al. (2017), have addressed TWV in the Arctic atmosphere, but none has been able to provide a long-term Arctic-wide dataset.

Published by Copernicus Publications on behalf of the European Geosciences Union.
A. M. Triana-Gómez et al.: Improved water vapour retrieval from AMSU-B and MHS in the Arctic
An important step for Arctic water vapour retrieval comes from the work of Miao et al. (2001). They used data from the SSM/T2 (Special Sensor Microwave Humidity) humidity sounder to develop an algorithm designed to work in the Antarctic. The key concept of this method is the use of several microwave channels with similar surface emissivity but different water vapour absorption. These are the three channels near the 183.31 GHz water vapour absorption line (183.31 ± 1, ±3 and ±7 GHz). Together with the channel at the 150 GHz window frequency, they allow the retrieval of TWV values up to about 7 kg m⁻². Above this value, two of the 183.31 GHz band channels become saturated and the sensor is no longer able to "see" through the whole atmospheric column. In other words, once TWV reaches a certain threshold, the brightness temperatures in these channels of the Advanced Microwave Sounding Unit-B (AMSU-B) no longer change with increasing TWV (Miao, 1998; Melsheimer and Heygster, 2008). This limited range is sufficient for Antarctica and for the Arctic in winter conditions (in the polar winter atmosphere, the water vapour column is typically around 3 kg m⁻², according to Serreze et al., 1995). It also suffices for the central Arctic (north of 70° N) for most of the year. However, because of this upper limit, the method cannot ensure monitoring of the complete annual cycle. The algorithm developed by Melsheimer and Heygster (2008) extends the TWV retrieval range over sea ice by including the AMSU-B 89 GHz channel in the retrieval. Using the triplet of the 183.31 ± 7, 150 and 89 GHz channels allows the retrieval to work up to the saturation limit of the 183.31 ± 7 GHz channel. This method has been compared with other datasets. In Rinke et al. (2009), a comparison with the HIRHAM model showed realistic patterns and maximum root-mean-square differences (RMSDs) for monthly data in summer of 1–2.5 kg m⁻².
For the comparison with Ny-Ålesund radiosondes in Palm et al. (2010), the correlation coefficient was 0.86 and the slope 0.8 ± 0.04. Lastly, in Buehler et al. (2012), AMSU-B TWV was compared to GPS data from Kiruna, with an RMSD of 1 kg m⁻² and a correlation coefficient of 0.86. However, the AMSU-B algorithm is not without problems. While its frequency range allows it to see through most clouds, the AMSU-B sensor is still sensitive to convective clouds with high ice content. Here we provide an approach for filtering out data affected by such ice clouds. This is intended as groundwork for the planned merging with TWV retrieved over open ocean from passive microwave imagers (product described by Wentz and Meissner, 2006).
In Sect. 2, we describe the algorithm in more detail. In Sect. 3, we evaluate the application of the algorithm to MHS (Microwave Humidity Sounder) instead of AMSU-B data, which is necessary for extending the dataset to recent years. We compare the results with different in situ data sources in Sect. 3.2 and with ERA5 reanalysis in Sect. 3.3. In Sect. 4, we evaluate the new ice cloud filtering developed for the algorithm, and in Sect. 5 we give our conclusions.

Data sources
The algorithm uses microwave radiometer satellite measurements from humidity sounders such as AMSU-B or MHS. These fly on board the NOAA (National Oceanic and Atmospheric Administration) satellites NOAA-15 to NOAA-19 and the EUMETSAT (European Organisation for the Exploitation of Meteorological Satellites) satellites Metop-A, Metop-B and Metop-C. The characteristics of each sensor are given in Table 1, and the launch date of each satellite in Table 2. Throughout this paper, AMSU-B TWV always refers to a retrieval from brightness temperatures of the sensor on NOAA-17. These data are taken from the Fundamental Climate Data Record (Ferraro and Meng, 2016), which provides an inter-satellite calibrated set of brightness temperatures as described in Ferraro (2016). MHS TWV refers to a retrieval from brightness temperatures of the sensor on NOAA-18, taken from the same source.
Additionally, to distinguish between surface types, the daily sea ice concentration from the ASI algorithm (ARTIST Sea Ice algorithm; Spreen et al., 2008) is used. Pixels with ice concentrations below 15 % are classified as open water and those above 80 % as sea ice. Pixels with intermediate concentrations are considered mixed and are not used.
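The surface classification by ice concentration can be sketched as follows. This is a minimal illustration that assumes a NumPy array of ASI ice concentrations in percent. The integer class codes and the function name are hypothetical. Land pixels, which would need a separate land mask, are not handled.

```python
import numpy as np

# Hypothetical integer codes for the surface classes (not taken from the paper).
OPEN_WATER, SEA_ICE, MIXED = 0, 1, 2


def classify_surface(ice_conc):
    """Classify ocean grid cells from sea ice concentration in percent.

    Below 15 % -> open water, above 80 % -> sea ice. Everything else, including
    exactly 15 % or 80 % and missing (NaN) values, is "mixed" and skipped.
    """
    ice_conc = np.asarray(ice_conc, dtype=float)
    surface = np.full(ice_conc.shape, MIXED, dtype=np.int8)
    surface[ice_conc < 15.0] = OPEN_WATER
    surface[ice_conc > 80.0] = SEA_ICE
    return surface


print(classify_surface([0.0, 10.0, 50.0, 95.0]))
```

Comparisons with NaN evaluate to False, so missing concentrations fall through to the mixed class and are excluded from the retrieval.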

Radiative transfer equation
The algorithm starts from the contracted form of the radiative transfer equation by Guissard and Sobieski (1994), which describes the brightness temperature (T_B) measured by a space-borne radiometer as follows: where θ is the zenith angle, T_s and T_0 are the surface and air temperatures, respectively, T_c is the cosmic background emission, ε the surface emissivity, and τ_0 the total opacity of the atmosphere in the vertical direction. The correction m_p accounts for both a non-isothermal atmosphere and the difference between the surface (skin) temperature, T_s, and the temperature of the atmosphere at the ground, T_0 (m_p = 1 corresponds to the isothermal case with T_0 = T_s). The approach by Melsheimer and Heygster (2008), summarized in the following, approximates the ground as a specular reflector. According to Hewison and English (1999), this should be adequate for remote sensing in the frequency range considered here.
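For orientation, the structure of the equation can be seen in its simplest special case. This is our own illustration, not the general form used in the retrieval. It assumes an isothermal atmosphere at T_0, a surface temperature T_s = T_0 (m_p = 1), a specular surface and μ = cos θ. Adding the attenuated surface emission, the reflected downwelling atmospheric and cosmic radiation, and the upwelling atmospheric emission gives:

```latex
\begin{aligned}
T_B(\theta) &= \bigl[\varepsilon T_0 + (1-\varepsilon)\bigl(T_0(1-t) + T_c\,t\bigr)\bigr]\,t + T_0(1-t),
  \qquad t = e^{-\tau_0/\mu},\ \mu = \cos\theta \\
&= T_0 - (1-\varepsilon)\,(T_0 - T_c)\,e^{-2\tau_0/\mu}, \\
T_{B,i} - T_{B,j} &= (1-\varepsilon)\,(T_0 - T_c)\,\bigl(e^{-2\tau_j/\mu} - e^{-2\tau_i/\mu}\bigr).
\end{aligned}
```

In this idealized case, the difference between two channels with equal emissivity is proportional to (1 − ε)(T_0 − T_c). It is negative for τ_i < τ_j and vanishes for ε = 1, and the ratio of two such differences no longer depends on ε. Departures from the idealization (m_p ≠ 1, T_s ≠ T_0) are what the bias terms absorb in the full derivation.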

Retrieval for equal emissivity assumption
Note that the complete derivation of the final TWV retrieval equation from the radiative transfer equation is described in detail in the original paper for the Antarctic by Miao et al. (2001) and in the subsequent Arctic extension by Melsheimer and Heygster (2008). We summarize it here because the basic mechanism is needed to understand the changes made. We start from microwave radiometer satellite measurements in three different channels i, j and k, as mentioned in Sect. 2.1. We assume that none of these three channels is saturated, i.e. the sensor is still sensitive to the whole atmospheric column and the ground. Additionally, we take the ground emissivity to be equal in all three channels, as they see the same footprint and the emissivity does not vary between the channels. The water vapour absorption (mass absorption coefficient k, in m² kg⁻¹), however, differs, with k_i < k_j < k_k. Following this, the brightness temperature difference ΔT_ij of two channels i and j can be expressed as follows: where τ_i is the nadir opacity of the atmosphere at the frequency of channel i and b_ij is a bias related to the term m_p for channels i and j: As shown in Melsheimer and Heygster (2008), Appendix II, the bias can be approximated as follows: where T(z) is the atmospheric temperature profile. Then we take the ratio of what we call compensated brightness temperature differences: We can express the opacities τ_i as the sum of the contributions of the atmospheric constituents: water vapour (τ_i^w) and oxygen (τ_i^oxygen). The latter is negligible for AMSU-B channels near the water vapour line, so with the water vapour mass absorption coefficients k_i and TWV W: If we approximate the differences of exponentials by products in Eq. (5) and take logarithms, we get the following equation: The three constants B_0, B_1 and B_2 depend on the mass absorption coefficients of the different channels.
The term quadratic in W can be neglected (Selbach, 2003; Miao et al., 2001). This leaves an equation linear in W that can be solved to yield our retrieval equation: where C_0 = B_0/B_1 and C_1 = 1/B_1. These calibration parameters are determined empirically by a regression analysis of simulated brightness temperatures based on radiosonde profiles, described in more detail below (Sect. 2.6).
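The emissivity cancellation behind the ratio can be checked numerically with a toy model. The sketch assumes an isothermal atmosphere with T_s = T_0 (so all biases are zero), a specular surface, and opacity τ = k·W with oxygen neglected. Under these assumptions, T_B = T_0 − (1 − ε)(T_0 − T_c)·exp(−2τ/μ). The absorption coefficients and function names are hypothetical and chosen only for illustration. The sketch shows that the ratio of the two brightness temperature differences is the same for every emissivity and grows with W.

```python
import numpy as np

T_C = 2.7  # cosmic background brightness temperature (K)


def tb_toy(w, k, eps, t0=250.0, theta_deg=0.0):
    """Toy brightness temperature (K): isothermal atmosphere at t0, T_s = t0, specular surface.

    w: TWV (kg m-2); k: water vapour mass absorption coefficient (m2 kg-1),
    so the vertical opacity is tau = k * w (oxygen neglected).
    """
    mu = np.cos(np.radians(theta_deg))
    return t0 - (1.0 - eps) * (t0 - T_C) * np.exp(-2.0 * k * w / mu)


def eta_toy(w, k_i, k_j, k_k, eps, **kwargs):
    """Ratio of brightness temperature differences (biases are zero in this toy model)."""
    d_ij = tb_toy(w, k_i, eps, **kwargs) - tb_toy(w, k_j, eps, **kwargs)
    d_jk = tb_toy(w, k_j, eps, **kwargs) - tb_toy(w, k_k, eps, **kwargs)
    return d_ij / d_jk


# Hypothetical coefficients with k_i < k_j < k_k, for illustration only.
K_I, K_J, K_K = 0.01, 0.05, 0.2

for w in (1.0, 3.0, 5.0):
    ratios = [eta_toy(w, K_I, K_J, K_K, eps) for eps in (0.7, 0.85, 0.95)]
    print(f"W = {w}: eta = {ratios}, ln(eta) = {np.log(ratios[0]):.4f}")
```

Both differences share the factor (1 − ε)(T_0 − T_c), which is why it cancels. In the real retrieval, the bias terms must be subtracted first for this cancellation to hold approximately.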

Extension of the retrieval
Normally, for TWV values above 7 kg m⁻², saturation occurs in Channel 19 (183.31 ± 3 GHz). To extend the retrieval range above this threshold, another channel that is less sensitive to water vapour is required to take its place in the triplet. This means that a new set of assumptions has to be made about the influence of surface emissivity. For AMSU-B, the next channel "in line" is the one at 89 GHz (Channel 16). Thus, the three channels i, j and k are now the AMSU-B Channels 16, 17 and 20 (89, 150 and 183.31 ± 7 GHz). Because Channel 16 is so far in frequency from the other two, we can no longer assume that it has the same surface emissivity as the others. Therefore, the retrieval equation needs to be rederived under the changed premise ε_i ≠ ε_j = ε_k. This leads to a similar-looking retrieval equation: where η_c is a modified ratio of compensated brightness temperatures: and C(τ_j, τ_k) is defined as follows: Since there is now a dependence on the emissivities ε_i or, equivalently, on the reflectivities r_i = 1 − ε_i, the surface emissivity at 89 GHz needs to be examined. Ideally, the ratio of the corresponding reflectivities would be determined for each footprint. However, that is not possible without knowing the atmospheric conditions and the surface temperature. As an approximation, the emissivity is parameterized, and fixed reflectivity ratios depending on the surface type are obtained. This was done for sea ice in Melsheimer and Heygster (2008) and for open water surfaces in Scarlat et al. (2018). The upper limit of this extended retrieval is about 15 kg m⁻². Here, we use this extended retrieval only over sea ice.

The "sub-algorithms": regime selection
As described in Sects. 2.3 and 2.4, three different channel triplets are used for the retrieval, depending on the water vapour amount and the saturation of channels; hence, there are three "sub-algorithms" or retrieval regimes. Each sub-algorithm reaches its upper retrieval limit when the channel that is most sensitive to water vapour becomes saturated. In the original algorithm formulation by Melsheimer and Heygster (2008), the retrieval always starts with the most sensitive sub-algorithm. It switches to the next one only when the following saturation condition is fulfilled: This means that for each satellite footprint, only one of the three sub-algorithms is finally used. As the sub-algorithms have been calibrated independently, the switch from one to the next can cause a jump in the retrieved value. A method avoiding this discontinuity will be discussed in a follow-up paper. Additionally, as the switch between regimes is made in brightness temperature space, it does not correspond to a strict cut-off value in water vapour. Table 3 summarizes the characteristics of each regime.
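The selection logic can be sketched as below: try the most sensitive triplet first, fall back when it is saturated, and use exactly one sub-algorithm per footprint. The saturation tests and retrieval functions are hypothetical placeholders supplied by the caller. The actual saturation condition and the calibrated retrieval equations are not reproduced here.

```python
from typing import Callable, Optional, Sequence, Tuple

# A sub-algorithm as (name, saturation test, retrieval function). Both callables take
# the brightness temperatures of one footprint, e.g. a dict keyed by channel.
SubAlgorithm = Tuple[str, Callable[[dict], bool], Callable[[dict], float]]


def select_and_retrieve(tb: dict, regimes: Sequence[SubAlgorithm]) -> Optional[Tuple[str, float]]:
    """Return (regime name, TWV) from the first regime whose triplet is not saturated.

    `regimes` must be ordered from the most to the least water-vapour-sensitive
    triplet, so exactly one sub-algorithm is used per footprint.
    """
    for name, is_saturated, retrieve in regimes:
        if not is_saturated(tb):
            return name, retrieve(tb)
    return None  # every triplet saturated: TWV above the retrievable range


# Hypothetical placeholders: a scalar "x" stands in for the real saturation criterion,
# and the retrieval functions return fixed dummy values.
demo_regimes = [
    ("regime 1", lambda tb: tb["x"] > 1.0, lambda tb: 1.0),
    ("regime 2", lambda tb: tb["x"] > 2.0, lambda tb: 2.0),
    ("regime 3", lambda tb: tb["x"] > 3.0, lambda tb: 3.0),
]
print(select_and_retrieve({"x": 2.5}, demo_regimes))
```

Because each regime is calibrated on its own, nothing in this cascade enforces continuity at the switching points. This is the source of the jumps in retrieved TWV mentioned above.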

Bias and calibration parameters
Since we ordered the channels by their water vapour sensitivity (τ_i < τ_j < τ_k), the difference of exponentials in ΔT_ij and ΔT_jk is negative. Therefore, the first term of each brightness temperature difference increases with increasing emissivity from negative values to 0 (reached when ε = 1). η_c does not depend on ε, which cancels in the ratio. In a plot with ΔT_jk as abscissa and ΔT_ij as ordinate, for constant W and varying ε, this is a straight line with slope η_c(W), running through the bias point (b_jk, b_ij). Since the biases depend only weakly on W and ε, all straight lines for different W run through almost the same point F = (F_jk, F_ij). Miao et al. (2001) and Melsheimer and Heygster (2008) call this point the focal point. The focal point F is found by simulating brightness temperatures for a set of different ε, with different input atmospheric profiles (including W) from radiosonde data. In these simulations, the surface temperature is taken as the ground-level atmospheric temperature, which makes the small emissivity dependence of the biases vanish (see Melsheimer and Heygster, 2008, Appendix II). Having determined the focal point, the simulated brightness temperature differences and the corresponding TWV values from the radiosonde profiles can be used to obtain the calibration parameters C_0 and C_1. Together with the two focal point coordinates F_jk and F_ij, the retrieval equation thus has four calibration parameters, all derived by this regression. The specific values for each viewing angle and regime of the AMSU-B sensor are given in Melsheimer and Heygster (2008), Appendix III. For MHS, all these calibration parameters were recalculated and are shown in Appendix A.
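A minimal sketch of the two calibration steps follows. It assumes simulated brightness temperature differences grouped by atmospheric profile (same W, varying ε). The focal point is estimated as the least-squares common intersection of the per-profile straight lines. The calibration is a linear regression of W on ln η with η = (ΔT_ij − F_ij)/(ΔT_jk − F_jk). Both the intersection approach and the function names are our own choices for illustration. The returned intercept and slope map onto C_0 and C_1 only up to the sign convention of the retrieval equation.

```python
import numpy as np


def focal_point(dt_jk, dt_ij, w_groups):
    """Least-squares common intersection (F_jk, F_ij) of the per-profile straight lines.

    dt_jk, dt_ij: simulated brightness temperature differences (K);
    w_groups: label of the atmospheric profile (same W) for each simulation,
    with the surface emissivity varied within each group.
    """
    dt_jk, dt_ij, w_groups = map(np.asarray, (dt_jk, dt_ij, w_groups))
    rows, rhs = [], []
    for g in np.unique(w_groups):
        sel = w_groups == g
        slope, intercept = np.polyfit(dt_jk[sel], dt_ij[sel], 1)
        # Line dt_ij = intercept + slope * dt_jk; the focal point satisfies
        # slope * F_jk - F_ij = -intercept.
        rows.append([slope, -1.0])
        rhs.append(-intercept)
    (f_jk, f_ij), *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return f_jk, f_ij


def calibrate(dt_jk, dt_ij, w, f_jk, f_ij):
    """Regress W on ln(eta), eta = (dT_ij - F_ij) / (dT_jk - F_jk); returns (intercept, slope)."""
    eta = (np.asarray(dt_ij, dtype=float) - f_ij) / (np.asarray(dt_jk, dtype=float) - f_jk)
    slope, intercept = np.polyfit(np.log(eta), np.asarray(w, dtype=float), 1)
    return intercept, slope
```

In synthetic data where every line passes exactly through one point, `focal_point` recovers that point. With simulations from real radiosonde profiles, the lines only nearly intersect, and the least-squares solution averages over them.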

Filtering ice cloud artefacts
The effect of ice clouds at the AMSU-B frequencies, as studied in Sreerekha (2005), is well known. It has been used for detecting tropical deep convection (Hong et al., 2005) and in an automated method for finding polar mesocyclones (Melsheimer et al., 2016). The latter method uses the sensitivity of retrieved TWV to convective clouds with high ice content as one of the main signatures of polar lows. Cloud ice particles are strong scatterers in the microwave range used, so the radiation from below such clouds is strongly scattered and hardly reaches the sensor. The AMSU-B retrieval is then only sensitive to the atmospheric water vapour above these clouds and retrieves erroneously low TWV. We have developed a procedure to recognize and screen such cases for the AMSU-B-MHS algorithm. Cloud ice contents high enough to affect our TWV retrieval are almost entirely caused by strong convective clouds. These are typically organized in rather small-scale cells (tens of kilometres) or clusters thereof, or take the shape of mesoscale structures such as polar lows with extents of at most a few hundred kilometres. Even in large-scale, synoptic low-pressure systems, convective clouds are organized in clusters and lines on the above-mentioned scales of tens to a few hundred kilometres. Therefore, image processing methods that rely on the size of ice cloud artefacts can be used. Our approach for eliminating the affected TWV is to find connected areas of low TWV (<4 kg m⁻²) that contain at least two pixels and fewer than 50 pixels and are surrounded by higher or non-retrieved values. We selected the threshold of 50 pixels because, on the chosen 0.25° latitude–longitude grid, it corresponds to areas of approximately 19 600 km² at 60° N and 7000 km² at 80° N. This amply covers the scale of events that need masking.
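The area covered by 50 grid cells can be checked with a short calculation. This sketch assumes a spherical Earth with a mean radius of 6371 km and cells centred on the given latitude. On a regular latitude–longitude grid, the cell area scales with the cosine of latitude. Fifty cells of 0.25° therefore cover roughly 19 300 km² at 60° N but only about 6 700 km² at 80° N. The function name is illustrative.

```python
import math

R_EARTH_KM = 6371.0  # mean Earth radius (spherical approximation)


def cell_area_km2(lat_deg, res_deg=0.25):
    """Area (km2) of a res_deg x res_deg latitude-longitude cell centred at lat_deg."""
    dlon = math.radians(res_deg)
    lat1 = math.radians(lat_deg - res_deg / 2)
    lat2 = math.radians(lat_deg + res_deg / 2)
    # Exact spherical-zone formula: R^2 * dlon * (sin(lat2) - sin(lat1)).
    return R_EARTH_KM ** 2 * dlon * (math.sin(lat2) - math.sin(lat1))


for lat in (60, 80):
    print(lat, round(50 * cell_area_km2(lat)))
```

A fixed pixel-count threshold thus corresponds to a physical size that shrinks poleward. This should be kept in mind when interpreting which convective features the filter can catch at different latitudes.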
Following this, we remove these connected areas with a succession of morphological operations (Gonzalez and Woods, 2007), using the Python tools described in van der Walt et al. (2014). First comes a dilation with a 7 × 7 square structuring element, and then a closing with a structuring element of the same size. We ensure that only data within the original connected areas are removed by comparing the resulting mask with the initial connected areas.
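The screening steps can be sketched with `scipy.ndimage`, used here as a substitute for the scikit-image tools cited above (both provide connected-component labelling and binary morphology). Several details are assumptions: 8-connectivity for the connected areas, and NaN marking non-retrieved pixels. We also read the final comparison step as keeping only low-TWV pixels inside the morphologically grown mask. Apart from areas touching the grid edge, a connected low-TWV area is bounded only by higher or NaN values, so the "surrounded" condition holds by construction. The function name is hypothetical.

```python
import numpy as np
from scipy import ndimage


def ice_cloud_mask(twv, low=4.0, min_size=2, max_size=50, box=7):
    """Sketch of the ice cloud artefact mask for a 2-D TWV grid (NaN = not retrieved)."""
    # Retrieved pixels below the TWV threshold; NaN is treated as "not low".
    low_px = np.where(np.isfinite(twv), twv, np.inf) < low
    # Connected areas of low TWV (8-connectivity is an assumption).
    labels, _ = ndimage.label(low_px, structure=np.ones((3, 3), dtype=bool))
    sizes = np.bincount(labels.ravel())
    keep = (sizes >= min_size) & (sizes < max_size)
    keep[0] = False  # label 0 is the background
    candidates = keep[labels]
    # Dilation followed by closing, both with a box x box square structuring element.
    se = np.ones((box, box), dtype=bool)
    grown = ndimage.binary_closing(ndimage.binary_dilation(candidates, structure=se), structure=se)
    # Keep only low-TWV pixels inside the grown mask (our reading of the comparison step).
    return grown & low_px
```

The dilation and closing merge nearby small artefacts into one region. Low-TWV pixels between them, such as isolated single pixels below the two-pixel minimum, are therefore masked as well, while large dry regions far from any candidate are left untouched.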

Evaluation of retrieval
In this section, we first evaluate the performance of the TWV retrieval using MHS data (Sect. 3.1). We then compare the satellite-based retrieval with in situ data (Sect. 3.2) and with ERA5 reanalysis data (Sect. 3.3).

Comparison between MHS- and AMSU-B-based retrievals
As shown in Table 1, there are some frequency and polarization differences between the AMSU-B and MHS sensors. According to the analysis in John et al. (2012), AMSU-B and MHS brightness temperatures show non-negligible discrepancies in the second and fifth channels (Channel 17 at 150 GHz and Channel 20 at 183.31 ± 7 GHz for AMSU-B). These are due to the differences in frequency, while the differences in polarization seem not to be relevant. This raises the question of whether the TWV algorithm performs equally well with MHS data as input and, if not, which adaptation would be needed to ensure consistent retrieval results. One main adjustment we made to the retrieval for MHS is the recalculation of all calibration parameters as described in Sect. 2.6 and shown in Appendix A. First, we evaluate the performance of the retrieval as a whole by comparing the TWV retrieved from both sensors in their overlap period (2008–2009). For this analysis, we considered all coincident points in the daily gridded data on a 0.25° grid. Figure 1 shows two density plots for the overlap months of January (top) and July (bottom) of 2008–2009, together with the results of a least-squares regression. Both datasets show good agreement, with most of the points along the one-to-one line. However, some outliers have high MHS TWV and low, almost constant, AMSU-B TWV, and vice versa, and they are especially striking in July. These points are mostly associated with time differences between the satellite overpasses. They amount to only about 0.27 % of the data, so they are not significant in the overall picture.
Table 4 shows the fit statistics for all months. The correlation ranges from 0.87 in June to 0.94 in September. The lowest slope (0.82) is found in December, while the slope is closest to 1.0 in May (0.91). The intercept increases in the summer months (June, July, August) but is relatively small in the other months. The RMSD behaves similarly, with higher values in the middle of the year and a maximum of 2.25 kg m⁻² in August, coinciding with the increased number of outliers. The minimum is 0.73 kg m⁻² in March. The bias is generally small (minimum of 0.04 kg m⁻² in March, maximum of 0.49 kg m⁻² in September) and positive except in May and June. In general, all parameters show the lowest agreement in the summer months, when the atmospheric variability is highest. However, we presume that the strongest contribution to the lower agreement in summer comes from the higher uncertainty and variability of the surface emission due to melt processes and the occurrence of melt ponds.
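The statistics reported in Table 4 (slope, intercept, correlation, RMSD and bias) can be computed along the lines of the sketch below. It assumes paired arrays of coincident TWV values with NaN for missing retrievals and an ordinary least-squares fit of y on x. The bias is defined as mean(y − x), and the RMSD is taken over the raw differences. The paper's exact sign convention and regression variant are not stated here, so treat these choices and the function name as assumptions.

```python
import numpy as np


def fit_stats(x, y):
    """Least-squares fit of y on x plus agreement statistics for paired TWV values.

    Pairs with a NaN in either array are dropped. The bias is mean(y - x), and
    the RMSD is the root mean square of y - x (not of the fit residuals).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = np.isfinite(x) & np.isfinite(y)  # keep only coincident, retrieved pairs
    x, y = x[ok], y[ok]
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    diff = y - x
    return {
        "slope": float(slope),
        "intercept": float(intercept),
        "r": float(r),
        "rmsd": float(np.sqrt(np.mean(diff ** 2))),
        "bias": float(np.mean(diff)),
    }


print(fit_stats([1.0, 2.0, 3.0, 4.0], [1.2, 2.1, 3.3, 3.9]))
```

With x as AMSU-B TWV and y as MHS TWV, a positive bias under this convention means MHS retrieves more water vapour on average.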
To check for a possible influence of the surface type on the consistency of our retrievals, we studied the MHS and AMSU-B TWV time series during 2008–2009 at locations over different surface types: sea ice, the marginal ice zone, land and open water. The location chosen for each study point is shown in Fig. 2. The background shows the surface classification used in the TWV retrieval for a day in early March 2008 (maximum ice extent). Figure 3 shows the monthly and yearly means of these time series for the four locations. Note the lack of data in the summer months over open water and ocean because of the limitations of the algorithm. All four time series show good agreement, which confirms the consistency between our retrievals. The bias and RMSD are small for all four surface types (ice: 0.1 ± 0.4 kg m⁻²; open water: 0.03 ± 0.15 kg m⁻²; marginal ice zone: 0.2 ± 0.7 kg m⁻²; land: 0.12 ± 0.19 kg m⁻²). They are slightly higher in the two cases with ice surfaces, which agrees with the higher error of our method at higher water vapour values (extended regime).

Comparison with in situ data sources
While TWV retrieved from AMSU-B has been validated with different data sources (Rinke et al., 2009; Palm et al., 2010; Buehler et al., 2012), the same cannot be said of the retrieval with MHS data. Therefore, we compare it with TWV derived from radiosondes launched during the N-ICE2015 campaign from January to June 2015 on board the research vessel Lance north of Svalbard (Cohen et al., 2017). For each radiosonde, we take the MHS value as the mean of all values within a 50 km radius around the radiosonde location. The resulting time series is shown in Fig. 4. The first thing to note is that the MHS series ends at the start of June. After that, the surface in the area is considered mixed according to the criteria described in Sect. 2.1. Both datasets show good visual agreement, except that MHS does not capture some of the quasi-periodic peaks in TWV in the N-ICE2015 dataset (seen roughly every 2 weeks in February and March). We have excluded these nine outliers associated with the quasi-periodic peaks from the following analysis. The scatter plot of all overlapping points of both datasets (Fig. 5), with the colour scale representing the month of the campaign, confirms the good agreement.
Additionally, to evaluate the satellite TWV retrieval, we used global positioning system (GPS) and radiosounding (RS) TWV observations during 2008–2009, the period common to the AMSU-B and MHS sensors. GPS and radiosonde TWV were measured at the five coastal Arctic stations Alert, Eureka, Ny-Ålesund, Resolute and Scoresbysund, shown in Fig. 6. These datasets are part of a homogenized time series. From the GPS data, 1 h averages of local integrated TWV have been computed every 6 h. The radiosoundings were performed once or twice per day at the selected sites (00:00 and 12:00 UTC). Further details about the processing can be found in Negusini et al. (2016). For the AMSU-B and MHS TWV, we selected points within ±1 h of the integrated GPS measurements (00:00, 06:00, 12:00 and 18:00 UTC) and within a 50 km radius around the GPS and RS stations. Additionally, TWV data from the ERA5 reanalysis (Copernicus Climate Change Service, C3S, 2017) were selected using the same conditions. The resulting AMSU-B, MHS, ERA5, GPS and radiosonde time series in Fig. 7 show generally consistent patterns and a reasonable seasonal evolution, with drier winters and wetter summers. Overall, the datasets agree less well during the summer months, mainly due to "spikier" data, i.e. more extreme water vapour values. Because of this pronounced seasonal cycle, we separate the results into summer (April to September) and winter (October to March) in the following analysis. Both satellite-derived TWV datasets seem to have a slight wet bias in summer with respect to the other datasets.
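The space–time matching can be sketched as follows. The sketch assumes satellite TWV given as flat arrays of footprint or grid-cell coordinates, times in hours on a common axis, and a spherical Earth for the great-circle distance. The 50 km radius and ±1 h window follow the text. The function names and the choice to average all matching values are illustrative assumptions.

```python
import numpy as np

R_EARTH_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance (km) between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * R_EARTH_KM * np.arcsin(np.sqrt(a))


def collocate(sat_lat, sat_lon, sat_time_h, sat_twv, st_lat, st_lon, st_time_h,
              radius_km=50.0, max_dt_h=1.0):
    """Mean satellite TWV within radius_km and +/- max_dt_h of one station observation.

    Returns NaN if no valid satellite value qualifies.
    """
    sat_twv = np.asarray(sat_twv, dtype=float)
    dist = haversine_km(np.asarray(sat_lat, dtype=float), np.asarray(sat_lon, dtype=float),
                        st_lat, st_lon)
    dt = np.abs(np.asarray(sat_time_h, dtype=float) - st_time_h)
    sel = (dist <= radius_km) & (dt <= max_dt_h) & np.isfinite(sat_twv)
    return float(sat_twv[sel].mean()) if sel.any() else float("nan")
```

Averaging all qualifying values smooths out footprint-scale noise. However, a single station value may then be compared with a spatial mean over up to about 7800 km² (a 50 km radius circle), which contributes to the RMSD in regions with strong TWV gradients.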
Scatter plots comparing each dataset (both satellite and reanalysis) with both radiosondes and GPS have been prepared for each season and station. As an example, Fig. 8 shows the results for Alert. The correlation coefficients vary between 0.55 and 0.88, and the correlations in winter seem to be generally lower. We presume this is just a numerical effect of the narrower data distribution. The RMSD, in contrast, is higher in summer (as seen in Fig. 9). The only difference between the two satellite-based retrievals seems to be a smaller number of coincident points between MHS TWV and radiosonde TWV (approximately half as many). Figure 9 shows all fit parameters for the five stations, with separate results for summer and winter. There is little difference between the results from the two satellite-based retrievals, which corroborates our confidence in the MHS-based retrieval. Across the three quality-indicating parameters (RMSD, bias and correlation coefficient), there is even a slight but consistent advantage for the MHS-based retrieval. The bias values are almost all negative. The RMSD is in line with typical values for TWV studies at high latitudes (as seen in Palm et al., 2010, for Ny-Ålesund and in Buehler et al., 2012, for Kiruna), which reassures us of the quality of the satellite-based TWV retrievals. The higher RMSD values in the Arctic summer in Fig. 9 can also be seen at high TWV values above 7 kg m⁻² during summer for all methods in Fig. 8a and c. One explanation for the smaller bias and RMSD during winter may be that the absolute TWV values are also smaller in winter. The lower correlation in winter is likely because the temporal coherence is less pronounced.
When fits like those in Fig. 8 are performed for all stations for ERA5 vs. GPS and radiosondes, the slopes are closer to one in summer (on average over all stations, 0.99 for GPS and 0.87 for radiosondes). In winter, ERA5 underestimates the in situ TWV to a higher degree (average slopes of 0.85 for GPS and 0.76 for radiosondes). The correlation coefficient shows a similar seasonal variation: it is higher in summer (on average 0.9 and 0.92) and lower in winter (0.85 and 0.75) for GPS and radiosondes, respectively. These values are very similar to the averages for the satellite data vs. the in situ data. The RMSD and bias are generally small, and smaller still in winter. The average RMSD is 1.89 kg m⁻² in summer and 1.10 kg m⁻² in winter for GPS, and 1.58 kg m⁻² in summer and 1.05 kg m⁻² in winter for radiosondes. The bias is generally negative for GPS, averaging −0.5 kg m⁻² in summer and −0.02 kg m⁻² in winter. For radiosondes, it is always positive, averaging 0.34 kg m⁻² in summer and 0.17 kg m⁻² in winter.

Comparison with ERA5 reanalysis
The reanalysis product ERA5 combines a variety of observations and a numerical model using an optimization procedure. Because ERA5 assimilates some of the observations used for verification here, namely radiosondes, it is not a completely independent estimate of TWV. ERA5 also assimilates some 183 GHz data over sea ice and snow-covered surfaces (as suggested in Bormann et al., 2017), including the MHS sounding channels. It is unclear to the authors which sensors were available and assimilated in ERA5 for the 2008–2010 period of this study. We therefore cannot assume that ERA5 TWV is entirely independent of microwave humidity sounder radiances.
For this study, we compiled all the overlapping daily means of TWV from AMSU-B and ERA5 (Copernicus Climate Change Service, C3S, 2017) for the complete months of January (top) and July (bottom) from 2008 to 2009. They are shown in Fig. 10 together with the results of a least-squares regression. Both datasets show good agreement, with most of the points along or parallel to the one-to-one line. Low AMSU-B TWV values paired with high ERA5 TWV values can be observed in both months but are more prominent in summer. These are remnants of ice cloud artefacts that were not entirely filtered out. Table 5 shows the fit statistics for all months. The correlation ranges from 0.71 in June to 0.88 in December. The worst slope (1.6) is found in September, while the slope is closest to 1.0 in August (0.97). The RMSD is higher in the summer months, with a maximum of 5.9 kg m⁻² in August, coinciding with the increased number of outliers. The minimum is 1.00 kg m⁻² in March. The bias is generally negative and behaves similarly to the RMSD. In general, all parameters show the lowest agreement in the summer months, when the atmospheric variability is highest.

Evaluation of changes and improvements in the retrieval: filtering ice cloud artefacts
Figure 11 shows daily averaged TWV maps for winter (left column) and summer (right column). The top and second rows show the AMSU-B-MHS algorithm with the ice cloud filtering (Sect. 2.7) already applied. The third row shows a different data product based on AMSR-E observations (Wentz and Meissner, 2006) over open ocean, and the bottom row shows the ERA5 reanalysis daily mean. The days chosen to represent each season (6 January and 6 July 2008, respectively) show what a typical retrieval looks like for the respective season.
The first thing to notice is the difference in the spatial coverage of AMSU-B TWV between winter and summer. In summer, the AMSU-B-MHS retrieval is restricted to the drier regions, mostly over sea ice and Greenland (the upper limit of the retrieval is usually about 15 kg m⁻² for sea ice surfaces). In winter, the retrieval is possible over most of the land, open water and sea ice areas. In contrast, the AMSR-E retrieval shows no significant variation in coverage between seasons: most open water areas are covered. As a consequence, the area covered by both methods is smaller in summer. This can be seen in Fig. 12, which maps the regional coverage of both algorithms for the same days (the orange area shows joint coverage). Still, TWV is retrieved in most of the Arctic in both seasons. In the particular summer example of Fig. 12, there is no overlap between the two datasets at all. ERA5 agrees qualitatively well with both AMSU-B and AMSR-E, showing similar patterns, particularly in winter.
To visualize the areas affected by the ice cloud artefact, Fig. 13 shows several areas of interest before (left) and after (right) filtering, on days spaced roughly every 3 months through 2008 (6 January, 2 April, 6 July and 14 October). These areas were chosen as representative cases for each season. Most artefacts (small regions of low TWV surrounded by high TWV) are removed, but some small areas of low TWV remain, such as the retrieved regions over land around 62° N, 70° W in Fig. 13 (October, bottom right). Note that these incorrectly retrieved areas are surrounded by grey pixels, which mark water vapour amounts too high to be retrieved with the AMSU-B method (above about 7 kg m⁻² over ocean or land). Comparison with the ERA5 atmospheric reanalysis confirmed that the remaining high TWV values are within the expected range. The high TWV values (>14 kg m⁻²) in the Hudson Bay area on 6 July 2008 also agree with ERA5.
To show the overall effectiveness of the ice cloud filtering, we compiled all overlapping TWV retrievals from AMSU-B and AMSR-E for the complete months of January (Fig. 14a, b) and July (Fig. 14c, d) from 2006 to 2008. Before filtering for ice cloud artefacts (Fig. 14a, c), there is a large cluster of points with high AMSR-E values for relatively low AMSU-B values. These correspond to retrievals affected by convective clouds with high ice content. Because the overlap area between AMSR-E and AMSU-B is small (Fig. 12), cloud artefacts make up a large fraction of the overlapping data points, particularly in summer. After filtering (Fig. 14b, d), this cluster is gone. The fit also improves considerably: the correlation reaches 0.6 in summer, and the slope in winter moves much closer to one (0.95, compared with 0.3 before filtering). Note also the jumps in the density of retrieved TWV values caused by switching between the sub-algorithms (Sect. 2.5), most notably near 6 kg m⁻² (Fig. 14). In the winter months, the ice cloud filter masks between 7.6 % (January) and 11 % (March) of the data, while in the summer months the percentage is much smaller, ranging from 0.18 % in August to 3.7 % in June. In summer, on average 55.5 % of the masked values (up to 94 % in July) come from the overlap area between AMSU-B and AMSR-E; in winter, this share averages 11.8 %.
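The comparison statistics quoted in this section and in Table 5 (correlation, least-squares slope, RMSD and bias), as well as the percentages of filtered data, can be computed as in the minimal sketch below. It rests on assumptions that the text does not specify: ordinary least squares of the satellite TWV on the reference TWV, bias defined as mean(satellite − reference), NaN marking missing values, and one boolean flag per retrieval for the ice cloud filter and for the AMSR-E overlap area. The filter criterion itself (Sect. 2.7) is not reproduced, and both function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_statistics(ref: Sequence[float], sat: Sequence[float]) -> dict:
    """Compare satellite TWV (sat) with reference TWV (ref), both in kg m^-2.

    Assumed conventions: OLS fit sat = slope * ref + intercept,
    bias = mean(sat - ref), RMSD = sqrt(mean((sat - ref)^2)).
    """
    # Keep only pairs where both values are valid (NaN marks missing data).
    pairs = [(r, s) for r, s in zip(ref, sat) if not (math.isnan(r) or math.isnan(s))]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two valid pairs")
    mean_r = sum(r for r, _ in pairs) / n
    mean_s = sum(s for _, s in pairs) / n
    cov = sum((r - mean_r) * (s - mean_s) for r, s in pairs)
    var_r = sum((r - mean_r) ** 2 for r, _ in pairs)
    var_s = sum((s - mean_s) ** 2 for _, s in pairs)
    slope = cov / var_r
    diffs = [s - r for r, s in pairs]
    return {
        "n": n,
        "slope": slope,
        "intercept": mean_s - slope * mean_r,
        "r": cov / math.sqrt(var_r * var_s),  # Pearson correlation
        "bias": sum(diffs) / n,
        "rmsd": math.sqrt(sum(d * d for d in diffs) / n),
    }


def filter_summary(masked: Sequence[bool], in_overlap: Sequence[bool]) -> Tuple[float, float]:
    """Return (% of retrievals masked, % of masked values in the overlap area).

    masked[i]     -- True if retrieval i was removed by the ice cloud filter
    in_overlap[i] -- True if retrieval i lies where AMSR-E also retrieves TWV
    """
    if len(masked) != len(in_overlap):
        raise ValueError("inputs must have the same length")
    n_masked = sum(masked)
    n_masked_overlap = sum(m and o for m, o in zip(masked, in_overlap))
    pct_masked = 100.0 * n_masked / len(masked) if masked else 0.0
    pct_overlap = 100.0 * n_masked_overlap / n_masked if n_masked else 0.0
    return pct_masked, pct_overlap
```

Computing the statistics per calendar month and per season, before and after applying the filter flags, gives the kind of comparison summarized above for Fig. 14.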

Conclusions
We provide an updated version of the TWV retrieval algorithm that originally used microwave humidity sounder data from AMSU-B as input. The updated algorithm can now also use data from MHS, the successor instrument of AMSU-B, and includes a filter for artefacts caused by convective clouds with high cloud ice content. The improved retrieval agrees better with another satellite product and with in situ data. The coefficients of the retrieval algorithm were adapted for MHS (Appendix A). We investigated the impact of the differences between AMSU-B and MHS on the retrieved TWV and found it to be negligible. A consistent, continuous dataset for 1999 to 2020 can therefore be generated by combining AMSU-B and MHS data. In addition, the MHS-based TWV was compared with radiosonde data from the N-ICE2015 campaign, and the results show good performance. TWV from both sensors was compared against GPS and radiosonde data at five Arctic coastal stations during 2008 and 2009 with satisfactory results: correlations averaged over all stations and methods are 0.82 in summer and 0.75 in winter, and RMSD values are in the range typical of TWV studies at high latitudes. The satellite-based TWV also compares well with the ERA5 reanalysis. Some artefacts from unfiltered ice clouds remain, but overall a correlation of 0.79 and an RMSD of 3.01 kg m⁻² indicate good correspondence.
The filter for ice cloud artefacts performs well, as shown by comparison with data from the AMSR-E-based algorithm, which works over open water. A remaining issue is the jumps in retrieved TWV values at the transitions between the retrieval regimes. In principle, these can be mitigated by comparing root-mean-square differences and biases for adjacent TWV regimes and choosing the optimal regime, i.e. channel combination, for each range of the water vapour column. Where regimes overlap, weighted averages can smooth the transition.
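The two mitigation steps named above, picking the regime with the best statistics for a TWV range and blending adjacent regimes where they overlap, could look like the following sketch. It is not the authors' method: the selection rule (smallest RMSD, ties broken by smaller absolute bias), the linear weight ramp, and driving the weight by the mean of the two estimates are all assumptions, and the function names and example bounds are hypothetical.

```python
from typing import Dict, Tuple


def choose_regime(stats: Dict[str, Tuple[float, float]]) -> str:
    """Pick the regime with the smallest RMSD for one TWV range.

    stats maps a regime name to (rmsd, bias) against reference data in that
    range; ties in RMSD are broken by the smaller absolute bias (assumed rule).
    """
    return min(stats, key=lambda name: (stats[name][0], abs(stats[name][1])))


def blend_regimes(twv_low: float, twv_high: float, lo: float, hi: float) -> float:
    """Blend TWV estimates from two adjacent regimes across an overlap [lo, hi].

    Below lo only the lower regime is used, above hi only the upper one, and
    in between the weight ramps linearly. The weight is driven by the mean of
    the two estimates, which is an assumption of this sketch.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    x = 0.5 * (twv_low + twv_high)
    w = min(max((x - lo) / (hi - lo), 0.0), 1.0)  # 0 -> lower regime, 1 -> upper regime
    return (1.0 - w) * twv_low + w * twv_high
```

For example, with a hypothetical overlap interval of 5 to 7 kg m⁻² around the jump near 6 kg m⁻², `blend_regimes(5.5, 6.5, lo=5.0, hi=7.0)` weights both estimates equally and returns 6.0, whereas estimates well below or above the interval pass through unchanged from the respective regime.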
The algorithm described here has an upper TWV limit that restricts retrieval in summer to the central Arctic and Greenland. However, combining its TWV data with TWV retrieved over open ocean from AMSR-E and AMSR2, a product of Remote Sensing Systems (RSS) (Wentz and Meissner, 2006), makes nearly complete year-round coverage of the whole Arctic possible, starting in 2000. This is the overall goal of future work.
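A simple way to realize such a combination on a common grid is to fill cells without a sounder retrieval with the AMSR-E/AMSR2 value, as sketched below. The sketch assumes both fields are already regridded to the same cells, with NaN marking missing retrievals; which product takes precedence where both exist is an open design choice, exposed here as a hypothetical `prefer` parameter rather than the authors' policy.

```python
import math
from typing import List, Sequence


def merge_twv(sounder: Sequence[float], amsr: Sequence[float],
              prefer: str = "sounder") -> List[float]:
    """Merge two gridded TWV fields (NaN = no retrieval) into one field.

    Where only one product has a value it is used; where both do, the
    product named by `prefer` wins (a hypothetical policy, not the paper's).
    Cells with neither remain NaN.
    """
    if prefer not in ("sounder", "amsr"):
        raise ValueError("prefer must be 'sounder' or 'amsr'")
    merged = []
    for s, a in zip(sounder, amsr):
        if math.isnan(s):
            merged.append(a)  # stays NaN if AMSR-E/AMSR2 is also missing
        elif math.isnan(a):
            merged.append(s)
        else:
            merged.append(s if prefer == "sounder" else a)
    return merged
```

Keeping the precedence explicit makes it easy to test both choices against reference data in the overlap area before committing to one.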

Appendix A
The following tables list the calibration parameters $C_0$, $C_1$, $F_{jk}$ and $F_{ij}$ of the TWV retrieval algorithm for the Arctic and, for the sake of completeness, the Antarctic, for 15 viewing angles spanning the range of MHS viewing angles. They were calculated in the same way as the parameters for the AMSU-B-based retrieval by Melsheimer and Heygster (2008). The retrieval equations are taken from Eqs. (5) and (8), with $T_{ij} = T_{\mathrm{b},i} - T_{\mathrm{b},j}$, where the MHS channels $i$, $j$ and $k$ are 5 (190.31 GHz), 4 (183.31 ± 3 GHz) and 3 (183.31 ± 1 GHz) for the low-TWV algorithm, and 2 (157 GHz), 5 (190.31 GHz) and 4 (183.31 ± 3 GHz) for the mid-TWV algorithm. For the extended algorithm, the retrieval equations are taken from Eqs. (10) and (9), with $i$, $j$ and $k$ being channels 1 (89.0 GHz), 2 (157 GHz) and 5 (190.31 GHz). The calibration parameters for the Arctic (Tables A1–A3) were derived from radiosonde data of the 29 World Meteorological Organization (WMO) stations in the Arctic that are located on the coast or on islands, for the years 1996 to 2002, amounting to about 27 000 radiosonde profiles.
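The channel assignments listed above can be kept in a small lookup table from which the brightness temperature differences $T_{ij}$ and $T_{jk}$ used by each sub-algorithm are formed. This sketch only builds those differences; it does not implement the retrieval equations or the tabulated coefficients. The dictionary name, regime keys and function name are hypothetical.

```python
from typing import Dict, Tuple

# MHS channel numbers (i, j, k) per sub-algorithm, as listed in Appendix A.
CHANNELS: Dict[str, Tuple[int, int, int]] = {
    "low": (5, 4, 3),       # 190.31, 183.31 +/- 3, 183.31 +/- 1 GHz
    "mid": (2, 5, 4),       # 157, 190.31, 183.31 +/- 3 GHz
    "extended": (1, 2, 5),  # 89, 157, 190.31 GHz
}


def tb_differences(tb: Dict[int, float], regime: str) -> Tuple[float, float]:
    """Return (T_ij, T_jk), where T_ij = T_b,i - T_b,j, for the chosen regime.

    tb maps MHS channel number to brightness temperature in K.
    """
    i, j, k = CHANNELS[regime]
    return tb[i] - tb[j], tb[j] - tb[k]
```

Keeping the channel triplets in one table keeps the three sub-algorithms consistent with the coefficient tables, which are indexed by the same channel pairs.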