Performance evaluation of multiple satellite rainfall products for Dhidhessa River Basin (DRB), Ethiopia

Abstract. Precipitation is a crucial driver of hydrological processes. Ironically, a reliable characterization of its spatiotemporal variability is challenging. Ground-based rainfall measurement using rain gauges is more accurate. However, installing a dense gauging network to capture rainfall variability can be impractical. Satellite-based rainfall estimates (SREs) could be good alternatives, especially for data-scarce basins like those in Ethiopia. However, SRE rainfall is plagued with uncertainties arising from many sources. The objective of this study was to evaluate the performance of the latest versions of several SRE products (i.e., CHIRPS2, IMERG6, TAMSAT3 and 3B42/3) for the Dhidhessa River Basin (DRB). Both statistical and hydrological modeling approaches were used for the performance evaluation. The Soil and Water Assessment Tool (SWAT) was used for hydrological simulations. The results showed that, although all four SRE products show promise for estimating and detecting rainfall in the DRB, the CHIRPS2 dataset performed best at annual, seasonal and monthly timescales. The hydrological simulation-based evaluation showed that SWAT's calibration results are sensitive to the rainfall dataset. The hydrological response of the basin was found to be dominated by subsurface processes, primarily by the groundwater flux. Overall, the study showed that both the CHIRPS2 and IMERG6 products could be reliable rainfall data sources for the hydrological analysis of the DRB. Moreover, the climatic season in the DRB influences rainfall and streamflow estimation. Such information is important for rainfall estimation algorithm developers.

G. K. Wedajo et al.: Satellite rainfall performance evaluation using hybrid techniques
Ground-based rainfall measurement techniques provide point measurements and are subject to missing data mainly due to measurement errors (Maggioni et al., 2016). It may also be infeasible to install and maintain dense ground-based gauging stations in remote areas like mountains, deserts, forests and large water bodies (Tapiador et al., 2012). On the other hand, radar-based rainfall measurement techniques cover larger areas and provide rainfall data at high spatial and temporal scales (Sahlaoui and Mordane, 2019). However, radar rainfall measurements have limitations due to the attenuation of the radar signal by several features that negatively affect the quality of rainfall measurement (Villarini and Krajewski, 2010; Berne and Krajewski, 2013; Sahlaoui and Mordane, 2019). Satellite-based rainfall estimates (SREs), however, provide high-resolution precipitation data, including in areas where ground-based rainfall measurements are impractical, sparse or nonexistent (Stisen and Sandholt, 2010).
As indirect rainfall estimation techniques, SRE products possess uncertainties resulting from errors in measurement, sampling, retrieval algorithm and bias correction processes (Dinku et al., 2010; Gebremichael et al., 2014; Tong et al., 2014). Local topography and climatic conditions can also affect the accuracy of SRE estimation (Bitew and Gebremichael, 2011). Hence, SRE products should be carefully evaluated before they are used for any application. Statistical and hydrological modeling are two common methods for evaluating SREs. The statistical evaluation method examines the intrinsic precipitation data quality, including its spatiotemporal characteristics, via pairwise comparison of the SRE products and ground observations. Its most critical drawback is the scale mismatch between area-averaged SRE data and point-like ground-based measurements. The hydrological modeling method evaluates the performance of an SRE product for a specific application, such as streamflow prediction at watershed scale. The two methods complement each other in that the statistical method provides information on data quality, while the hydrological modeling method assesses the usefulness of the data for hydrological applications (Thiemig et al., 2013). However, most studies used only statistical evaluation methods (e.g., Dinku et al., 2018; Ayehu et al., 2018).
Studies have recommended SRE products for data-scarce basins (Behrangi et al., 2011; Bitew and Gebremichael, 2011; Thiemig et al., 2013). However, there is no consensus regarding the "best" SRE product for different climatic regions. Nesbitt et al. (2008) found that CMORPH and PERSIANN produced higher rainfall rates compared to TRMM for the mountain ranges of Mexico. Dinku et al. (2008) reported the better performance of the TRMM and CMORPH products in Ethiopia and Zimbabwe, whereas PERSIANN outperformed TRMM in South America according to de Goncalves et al. (2006). Interestingly, the performance of SRE products seems to differ even within a basin. For the Blue Nile Basin in Ethiopia, for example, CMORPH overestimated precipitation for the lowland areas but underestimated it for the highlands (Bitew and Gebremichael, 2011; Habib et al., 2012; Gebremichael et al., 2014). The discrepancy in the findings of these studies shows that the performance of SREs varies with the region, topography, season and climatic conditions of the study area (Kidd and Huffman, 2011; Seyyedi et al., 2015; Nguyen et al., 2018). As such, many studies have recommended SRE evaluation at a local scale to verify its performance for specific applications (Hu et al., 2014; Toté et al., 2015; Kimani et al., 2017; Ayehu et al., 2018).
Studies have examined the performance of SREs in Ethiopia (Haile et al., 2013; Worqlul et al., 2014; Ayehu et al., 2018; Dinku et al., 2018). However, a majority of these studies used the statistical method to evaluate SREs, and no study has been completed for the Dhidhessa River Basin (DRB). With only 0.32 rain gauges per 1000 km², the DRB meets the World Meteorological Organization (WMO) data-scarce basin classification (WMO, 1994). Evaluating the performance of various SRE products in terms of characterizing the spatiotemporal distribution of rainfall in the DRB could assist with the planning and management of existing and planned water resource projects in the river basin.
SREs have been continuously updated to minimize bias and uncertainty. Evaluating and validating improved products for various climatic regions would be valuable (Kimani et al., 2017). Recently improved SRE products include the Tropical Rainfall Measuring Mission (TRMM) Multi-Satellite Precipitation Analysis version 7 (hereafter referred to as 3B43 for monthly and 3B42 for daily products), Climate Hazards Group InfraRed Precipitation with Station data version 2 (CHIRPS2), Tropical Applications of Meteorology using SATellite version 3 (TAMSAT3) and Integrated Multi-satellitE Retrievals for GPM version 6B (IMERG6). Studies have reported improvements in these new versions compared to their predecessors. However, to the best of the authors' knowledge, the rainfall detection and hydrological simulation capabilities of these SRE datasets have not been evaluated for basins in Ethiopia, including the DRB. This study examined the latest SRE products in terms of their rainfall detection and estimation skills and their ability to improve hydrological prediction for the DRB, a medium-sized river basin with scarce gauging data. As such, the objectives of this study were (1) to evaluate the intrinsic rainfall data quality and detection skills of multiple SRE products (i.e., 3B42/3, CHIRPS2, TAMSAT3 and IMERG6) and (2) to examine the hydrological prediction performance of SREs for the DRB. The Soil and Water Assessment Tool (SWAT), a physically based semi-distributed model that has performed well in humid tropical regions like Ethiopia, was used for the hydrological simulation.
2 Methods and materials

Descriptions of the study area
The Dhidhessa River drains into the Blue Nile River (Fig. 1). It is one of the largest and most important river basins in Ethiopia in terms of its physiography and hydrology (Yohannes, 2008). Located between 7°42′43″ and 10°2′55″ N latitude and 35°31′23″ and 37°7′60″ E longitude, the river basin exhibits highly variable topography that ranges from 619 m to 3213 m above mean sea level (amsl). The Dhidhessa River starts from the Sigmo mountain ranges and travels 494 km before it joins the Blue Nile River around the Wanbara and Yaso districts. The outlet considered for this study is the confluence of the Dhidhessa River and the Blue Nile River, which covers a total drainage area of 28 175 km². The river basin has many perennial tributaries (Fig. 1).
Temperature and precipitation in the Dhidhessa River Basin exhibit substantial spatial and seasonal variability. The mean maximum and minimum daily air temperatures in the river basin range from 20 to 33 °C and from 6 to 19 °C, respectively. The long-term mean annual rainfall ranges from 1200 to 2200 mm in the river basin. Soils in the DRB are generally deep and have high organic content, implying that they have high infiltration potential. The dominant soil type is Acrisols, while Cambisols and Nitisols are common (OWWDSE). Igneous, sedimentary and metamorphic rocks are common, but igneous rock, particularly basalt, is dominant in the basin. Forest, shrubland, grassland and agriculture are the dominant land cover types in the basin (Kabite et al., 2020). Major crops include perennial and cash crops like coffee, mango and avocado (OWWDSE, 2014).

Data sources and descriptions
For this study, we used different spatial and temporal datasets such as a digital elevation model (DEM), climate, streamflow, soil and land cover from different sources (Table 1).
The DEM derived from the Shuttle Radar Topography Mission (SRTM), with a 30 m × 30 m spatial resolution, was obtained from the United States Geological Survey (USGS). It is one of the inputs to the SWAT model, from which topographic and drainage parameters (e.g., drainage pattern, slope and watershed boundary) were derived. The soil map was obtained from the sources described in Table 1, and the soil physical properties required for the SWAT model were derived from it. Supervised image classification was used to prepare the land cover map for 2001. The DEM was used together with the land cover and soil maps to create hydrological response units (HRUs).
Rainfall data for nine stations within the river basin and for three nearby stations (Fig. 1) from 2001 to 2014 were obtained from the National Meteorological Agency (NMA) of Ethiopia. The rainfall data were used to evaluate the SREs using the statistical and hydrological modeling evaluation methods. In addition, Enhancing National Climate Service Time Series (ENACTS) gridded (4 km × 4 km) minimum and maximum air temperature data were obtained from the NMA. Daily streamflow data from 2001 to 2014 were obtained for a station near the town of Arjo (Fig. 1) from the Ethiopian Ministry of Water, Irrigation and Energy (EMoWI).
The hydrometeorological stations used for this study were selected due to their long-term records and better data quality. The observed streamflow was used to calibrate and validate the SWAT model. The land use map for 2001 and soil map were obtained from Kabite et al. (2020) and the Ethiopian Ministry of Water, Irrigation and Energy (EMoWI), respectively.

Satellite rainfall products
The satellite rainfall estimates (SREs) considered in this study include 3B42/3, TAMSAT3, CHIRPS2 and IMERG6. These datasets were selected because they (i) have relatively high spatial resolutions, (ii) are gauge-adjusted products, (iii) are the latest products and have been found to perform well by recent studies, and (iv) have not been compared for basins in Ethiopia, particularly IMERG6.
The TMPA provides rainfall products for the area covering 50° N to 50° S for the period from 1998 to the present at 0.25° × 0.25° spatial and 3 h temporal resolution. The 3 h rainfall product is aggregated to daily (3B42) and monthly (3B43) gauge-adjusted post-real-time precipitation. The performance of 3B42v7 is superior to that of its predecessor (i.e., 3B42v6) and the real-time TMPA product (3B42RT) (Yong et al., 2014). The 3B43 product was used in this study for the statistical evaluation, while the 3B42 product was used for the hydrological performance evaluation. A detailed description is given by Huffman et al. (2007).
The TAMSAT3 algorithm estimates precipitation indirectly using the cloud-index method, which is based on the cold cloud duration (CCD) relative to a predetermined temperature threshold. The CCD is the length of time that a satellite pixel is colder than a given temperature threshold. The algorithm calibrates the CCD using parameters that vary seasonally and spatially but are constant from year to year. This makes interannual variations in rainfall dependent only on the satellite observation. The dataset covers the whole of Africa at ∼4 km and 5 d (pentadal) resolutions for the period from 1983 to the present. The original 5 d temporal resolution is disaggregated to daily time steps using daily CCDs, from which monthly data are derived. The TAMSAT3 algorithm is improved compared to its predecessor (i.e., TAMSAT2). The details are described in Maidment et al. (2017).
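The CCD idea described above can be illustrated with a minimal sketch. It assumes a regularly sampled series of cloud-top brightness temperatures for one pixel and a linear CCD-to-rainfall calibration; the function names, the threshold and the coefficients `a0` and `a1` are hypothetical placeholders, not values from the TAMSAT3 algorithm.

```python
import numpy as np

def cold_cloud_duration(brightness_temp_k, threshold_k, step_hours):
    """Hours during which the pixel is colder than the temperature threshold."""
    temps = np.asarray(brightness_temp_k, dtype=float)
    return float(np.sum(temps < threshold_k) * step_hours)

def rainfall_from_ccd(ccd_hours, a0, a1):
    """Assumed linear calibration: rain = a0 + a1 * CCD, and no rain when CCD is zero."""
    if ccd_hours <= 0:
        return 0.0
    return max(0.0, a0 + a1 * ccd_hours)

# Hypothetical half-hourly brightness temperatures (K) for one pixel.
temps = [250.0, 230.0, 225.0, 240.0, 220.0, 260.0]
ccd = cold_cloud_duration(temps, threshold_k=233.0, step_hours=0.5)
rain = rainfall_from_ccd(ccd, a0=1.0, a1=2.0)  # illustrative coefficients only
```

Because the calibration coefficients are fixed from year to year in this kind of scheme, any year-to-year change in the estimate comes from the CCD term alone.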
The Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) is a quasi-global precipitation product at ∼5 km (0.05°) spatial resolution and is available at daily, pentadal (5 d) and monthly timescales. The CHIRPS precipitation data are available from 1981 to the present. It is a gauge-adjusted dataset that is calculated using weighted bias ratios rather than absolute station values, which minimizes the heterogeneity of the dataset. The latest version of CHIRPS, which uses more station data (i.e., CHIRPS version 2, hereafter CHIRPS2), was used in this study. A detailed description of CHIRPS2 is given in Funk et al. (2015).
The Global Precipitation Measurement (GPM) is the successor of TRMM with better rainfall detection capability. GPM provides precipitation measurements at 0.1° spatial and half-hourly temporal resolution. Integrated Multi-satellitE Retrievals for GPM (IMERG) is one of the GPM precipitation products estimated from all constellation microwave sensors, infrared-based observations from geosynchronous satellites and monthly gauge precipitation data. IMERG is the successor algorithm of TMPA. The IMERG products include Early Run (near real time with a latency of 6 h), Late Run (reprocessed near real time with a latency of 18 h) and Final Run (gauge-adjusted with a latency of 4 months). The IMERG Final Run product provides more accurate precipitation information than the near-real-time products as it is gauge-adjusted. The latest release of GPM IMERG Final Run version 6B (IMERG6) was used for this study. The details are described in Huffman et al. (2014).
In this study, the performance of the 3B42/3, TAMSAT3, CHIRPS2 and IMERG6 rainfall products was evaluated statistically and hydrologically. All the SREs considered in this study are gauge-corrected, and thus bias correction may not be required. For a fair comparison, rain gauge stations (e.g., Jimma and Nekemte) that were used for calibrating the SRE datasets were excluded. The rain gauge stations used for this study are shown in Fig. 1 and listed in Appendix Table A1. Detailed summaries of the data types used for this study are shown in Table 1.

Methodology
Satellite rainfall estimates offer several advantages compared to the conventional methods but can also be prone to multiple errors. The rainfall detection capability of SREs can be affected by local climate and topography (Xue et al., 2013; Meng et al., 2014). Therefore, the performance of SREs should be examined for a particular area before using the products for any application (Hu et al., 2014; Toté et al., 2015; Kimani et al., 2017).
The two common SRE performance evaluation methods are statistical (i.e., ground truthing) and hydrological modeling performance (Behrangi et al., 2011; Bitew and Gebremichael, 2011; Thiemig et al., 2013; Abera et al., 2016; Jiang et al., 2017), and both were used in this study. The methods complement each other, and their combined application is recommended for a more reliable SRE evaluation. The statistical evaluation method involves a pairwise comparison of SREs and the rain gauge products. The method provides insight into the intrinsic data quality, whereas the modeling approach assesses the usefulness of the data for a desired application (Thiemig et al., 2013). Statistical evaluation was performed for all the SRE products considered in this study (i.e., 3B43, CHIRPS2, TAMSAT3 and IMERG6) to examine their rainfall detection skills. Continuous and categorical validation indices were used to evaluate the performance of the products. In addition, the SRE products and gauge datasets were independently used as forcing to calibrate and verify the SWAT model. Accordingly, the streamflow prediction performance of the rainfall products was evaluated graphically and using statistical indices.

Statistical evaluation of satellite rainfall estimates
The statistical SRE evaluation method was conducted at monthly, seasonal and annual timescales for the overlapping period of all the rainfall data sources (i.e., 2001-2014). A daily comparison was excluded from this study due to weak performance reported in previous studies (Ayehu et al., 2018; Zhao et al., 2017; Li et al., 2018). This is attributed to the measurement time mismatch between ground and satellite rainfall products.
Two approaches are commonly used for the statistical evaluation method. The first approach is a pixel-to-pixel pairwise comparison of the spatially interpolated gauge-based and satellite-based data. The second approach is a point-to-pixel pairwise comparison in which satellite rainfall estimates are extracted for each gauge location, and the satellite-gauge data pairs are generated and compared. The second approach was used for this study. This is because the 12 rainfall stations considered in this study are too unevenly distributed throughout the basin to accurately represent spatial variability in rainfall in the DRB as required for the first approach. As a result, we chose to extract gauge-satellite rainfall pair values at each rain gauge location instead of interpolating the gauge measurements into gridded products.
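The point-to-pixel pairing can be sketched as a nearest-cell lookup. This is a minimal illustration that assumes a regular latitude-longitude grid described by its cell-centre coordinates; the function name, grid and gauge coordinates are hypothetical and are not taken from the study's data.

```python
import numpy as np

def extract_at_gauge(grid, lats, lons, gauge_lat, gauge_lon):
    """Return the value of the grid cell whose centre is nearest to the gauge."""
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    i = int(np.argmin(np.abs(lats - gauge_lat)))  # nearest row (latitude)
    j = int(np.argmin(np.abs(lons - gauge_lon)))  # nearest column (longitude)
    return float(np.asarray(grid)[i, j])

# Hypothetical 3 x 4 rainfall grid with cell-centre coordinates in degrees.
lats = [7.5, 8.0, 8.5]
lons = [35.5, 36.0, 36.5, 37.0]
grid = np.arange(12, dtype=float).reshape(3, 4)
value = extract_at_gauge(grid, lats, lons, gauge_lat=8.1, gauge_lon=36.4)
```

Repeating the lookup for every station and time step yields the satellite-gauge pairs without interpolating the gauge data.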
Accordingly, 168 and 2016 paired data points were extracted for annual and monthly analysis, respectively, and were evaluated using continuous validation indices such as the Pearson correlation coefficient (r), bias ratio (BIAS), Nash-Sutcliffe efficiency (E) and root mean square error (RMSE). The Pearson correlation coefficient (r) evaluates how well the estimates correspond to the observed values, BIAS reflects how much the satellite rainfall estimate over- or underestimates the rain gauge observations, and E shows how well the estimate predicts the observed time series. On the other hand, RMSE measures the average magnitude of the estimate errors. A summary of the performance indices is presented in Table 2.
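The pair counts follow from 12 stations over 14 years (12 × 14 = 168 annual pairs and 168 × 12 = 2016 monthly pairs). A minimal sketch of the continuous indices is given below; it assumes the standard textbook forms of r, BIAS, ME, RMSE and E, which may differ in detail from the expressions in Table 2, and the function name and sample values are hypothetical.

```python
import numpy as np

def continuous_indices(sat, gauge):
    """Continuous validation indices for paired satellite and gauge rainfall."""
    s = np.asarray(sat, dtype=float)
    g = np.asarray(gauge, dtype=float)
    err = s - g
    return {
        "r": float(np.corrcoef(s, g)[0, 1]),        # Pearson correlation
        "BIAS": float(s.sum() / g.sum()),           # ratio of totals, 1 is perfect
        "ME": float(err.mean()),                    # mean error
        "RMSE": float(np.sqrt(np.mean(err ** 2))),  # root mean square error
        "E": float(1.0 - np.sum(err ** 2) / np.sum((g - g.mean()) ** 2)),
    }

# Hypothetical monthly totals (mm) at one station.
scores = continuous_indices(sat=[110.0, 190.0, 330.0], gauge=[100.0, 200.0, 300.0])
```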
In addition to the continuous validation indices, tercile categories (i.e., percentile-based evaluation) along with the probability of exceedance were used to test the performance of SREs in detecting low- and high-end values. The tercile (percentile) and probability of exceedance methods better evaluate the rainfall detection capabilities of SREs at monthly timescales compared to other categorical indices such as probability of detection (POD), false alarm ratio (FAR) and critical success index (CSI). This is because POD, FAR and CSI are effective for daily analysis but not for monthly analysis.
A tercile is one of three equal groups, each containing one-third of the total data, into which a dataset is partitioned. Percentiles were used to calculate the terciles in this study. Accordingly, the lower, middle and higher terciles were defined using the 33rd, 67th and 100th percentiles: values up to the 33rd percentile form the lower tercile (P33), values between the 33rd and 67th percentiles form the middle tercile (P67), and values between the 67th and 100th percentiles form the higher tercile (P100). On the other hand, the probability of exceedance was calculated as the percentage probability that a given event is equaled or exceeded:
$$P = 100 \times \frac{m}{n + 1},$$
where P represents the percentage probability that a given event will be equaled or exceeded, m represents the rank of the event value, with 1 being the largest value, and n is the total number of events or data points on record.
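The tercile thresholds and the exceedance probability can be sketched as follows. The sketch assumes linear interpolation between order statistics for the percentiles and the rank-based form P = 100 m / (n + 1) for the exceedance probability; both choices, the function names and the sample values are assumptions made for illustration.

```python
import numpy as np

def tercile_thresholds(values):
    """Return the 33rd, 67th and 100th percentiles (P33, P67, P100)."""
    p33, p67, p100 = np.percentile(np.asarray(values, dtype=float), [33, 67, 100])
    return float(p33), float(p67), float(p100)

def exceedance_probability(values):
    """Sort values in descending order and assign P = 100 * m / (n + 1)."""
    ordered = np.sort(np.asarray(values, dtype=float))[::-1]
    n = ordered.size
    ranks = np.arange(1, n + 1)  # rank 1 is the largest value
    return ordered, 100.0 * ranks / (n + 1)

# Hypothetical monthly rainfall totals (mm).
sample = [10.0, 0.0, 50.0, 30.0]
p33, p67, p100 = tercile_thresholds(sample)
ordered, prob = exceedance_probability(sample)
```

Tied values receive consecutive ranks in this sketch; a product with many zero-rainfall months therefore shows a visibly lower probability of exceeding 0 mm.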
In general, SREs with r > 0.7 and relative bias (RB) within 10 % can be considered as reliable precipitation measurement sources (Brown, 2006; Condom et al., 2011). However, attention should be given to certain indices depending on the application of the product (Toté et al., 2015). For flood forecasting purposes, for example, an underestimation of rainfall should be avoided (i.e., mean error, ME, > 0 is desirable). In contrast, for drought monitoring, an overestimation must be avoided (i.e., ME < 0 is preferred) (Dembélé and Zwart, 2016).

SWAT model setup
The Soil and Water Assessment Tool (SWAT) is a semi-distributed, deterministic and continuous simulation watershed model that simulates many water quality and quantity fluxes (Arnold et al., 2012). It is a physically based and computationally efficient model that has been widely used for various hydrological and/or environmental applications in different regions of the world (Gassman et al., 2014). Furthermore, the SWAT model can be easily linked with calibration, sensitivity analysis and uncertainty analysis tools (e.g., SWAT Calibration and Uncertainty Program, SWAT-CUP), which made it preferable for this study.
The SWAT model follows a two-level discretization scheme: (i) subbasin creation based on topographic data and (ii) hydrological response unit (HRU) creation by further discretizing the subbasin based on land use and soil type. An HRU is a basic computational unit assumed to be homogeneous in hydrological response. Hydrological processes are first simulated at the HRU level and then routed at the subbasin level (Neitsch et al., 2009). The SWAT model estimates surface runoff using the modified United States Department of Agriculture (USDA) Soil Conservation Service (SCS) curve number method. In this study, a minimum threshold area of 400 km² was used for determining the number of subbasins, and a 5 % threshold for the soil, slope and land use was used for the HRU definition. Accordingly, 13 subbasins and 350 HRUs were created, with the Arjo gauging station as the outlet.
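For orientation, the basic SCS curve number relation can be sketched as below. This is the textbook form with an initial abstraction of 0.2 S and is only an illustration: SWAT additionally adjusts the curve number for soil moisture, which is omitted here, and the function name and input values are hypothetical.

```python
def scs_runoff_mm(rain_mm, curve_number):
    """Daily surface runoff (mm) from the basic SCS curve number equation."""
    # Retention parameter S (mm) from the curve number.
    retention = 25.4 * (1000.0 / curve_number - 10.0)
    # Initial abstraction, commonly approximated as 0.2 * S.
    initial_abstraction = 0.2 * retention
    if rain_mm <= initial_abstraction:
        return 0.0  # all rainfall is abstracted, so no runoff is generated
    excess = rain_mm - initial_abstraction
    return excess ** 2 / (excess + retention)

# Hypothetical 50 mm rain day on an HRU with CN = 75.
runoff = scs_runoff_mm(50.0, 75.0)
```

A lower curve number gives a larger retention parameter and less surface runoff, which is consistent with deep, permeable soils routing more water through subsurface pathways.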

SWAT model calibration and validation
The hydrological modeling performance evaluation is commonly performed in one of two ways: either by calibrating the hydrological model with gauge rainfall data and then validating it with SREs (i.e., static parameters) or by calibrating and validating the model independently with each rainfall product (i.e., dynamic parameters) and then comparing the accuracy of the streamflow predicted with each rainfall product. The latter is preferred for watersheds such as the DRB where gauging stations are sparse and unevenly distributed. Moreover, studies have reported that independently calibrating the hydrological model with SREs and gauge data improves the performance of the hydrological model (Zeweldi et al., 2011; Vernimmen et al., 2012; Lakew et al., 2017).
The calibration, validation and sensitivity analysis of SWAT were done using the SWAT-CUP software. The sequential uncertainty fitting (SUFI-2) algorithm implemented in SWAT-CUP was used in this study (Abbaspour et al., 2007). SUFI-2 provides more reasonable and balanced predictions than the generalized likelihood uncertainty estimation (GLUE) and the parameter solution (ParaSol) methods (Zhou et al., 2014; Wu and Chen, 2015) offered by the tool. It also accounts for the uncertainty attributed to input data, model parameters and model structure as total uncertainty (Abbaspour, 2015). The total uncertainty in the model prediction is commonly measured by the P factor and the R factor. The P factor represents the percentage of observed data enveloped by the 95 % prediction uncertainty (95PPU) simulated by the model. The R factor represents the ratio of the average width of the 95PPU band to the standard deviation of the observed data. For realistic model prediction, P factor ≥ 0.7 and R factor ≤ 1.5 are desirable (Abbaspour et al., 2007; Arnold et al., 2012).
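The two uncertainty measures can be sketched directly from their definitions. The sketch assumes that the lower and upper 95PPU bounds are already available for each time step and uses the sample standard deviation of the observations, which is an assumption about the convention; the function name and the values are hypothetical.

```python
import numpy as np

def p_and_r_factor(observed, lower_95ppu, upper_95ppu):
    """P factor: share of observations inside the 95PPU band.
    R factor: mean band width divided by the standard deviation of observations."""
    obs = np.asarray(observed, dtype=float)
    low = np.asarray(lower_95ppu, dtype=float)
    high = np.asarray(upper_95ppu, dtype=float)
    inside = (obs >= low) & (obs <= high)
    p_factor = float(inside.mean())
    r_factor = float(np.mean(high - low) / np.std(obs, ddof=1))
    return p_factor, r_factor

# Hypothetical monthly streamflow (m3/s) and 95PPU bounds.
p_factor, r_factor = p_and_r_factor(
    observed=[10.0, 20.0, 30.0, 40.0],
    lower_95ppu=[8.0, 15.0, 32.0, 35.0],
    upper_95ppu=[12.0, 25.0, 38.0, 45.0],
)
```

The two measures pull in opposite directions: widening the band raises the P factor but also raises the R factor, so a calibration is judged on the balance between them.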
The first step in the SWAT model calibration and validation process is determining the most sensitive parameters for a given watershed. For this study, 19 parameters were identified based on the recommendations of previous studies (Roth et al., 2018; Lemann et al., 2019). Global sensitivity analysis was performed on the 19 parameters, of which 11 were found to be sensitive for the DRB.

Table 2. SRE evaluation indices, mathematical descriptions and perfect score.

| Index | Mathematical expression | Description | Perfect score |
| --- | --- | --- | --- |
| Pearson correlation coefficient (r) | $r = \frac{\sum_{i=1}^{n} (R_g - \bar{R}_g)(R_s - \bar{R}_s)}{\sqrt{\sum_{i=1}^{n} (R_g - \bar{R}_g)^2}\,\sqrt{\sum_{i=1}^{n} (R_s - \bar{R}_s)^2}}$ | $R_g$ is the gauge rainfall observation; $R_s$ is the satellite rainfall estimate; $\bar{R}_g$ is the average gauge rainfall observation; $\bar{R}_s$ is the average satellite rainfall estimate. The value ranges from −1 to 1. | 1 |
| Root mean square error (RMSE, mm) | $\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (R_s - R_g)^2}$ | The number of data pairs is n; the value ranges from 0 to ∞. | 0 |
| Bias ratio (BIAS) | $\mathrm{BIAS} = \frac{\sum_{i=1}^{n} R_s}{\sum_{i=1}^{n} R_g}$ | A value above (below) 1 indicates an aggregate satellite overestimation (underestimation) of the ground precipitation amounts. | 1 |
| Relative bias (RB) | $\mathrm{RB} = \frac{\sum_{i=1}^{n} (R_s - R_g)}{\sum_{i=1}^{n} R_g} \times 100$ | This describes the systematic bias of the SREs; positive values indicate overestimation, while negative values indicate underestimation of precipitation amounts. | 0 |
| Mean error (ME) | $\mathrm{ME} = \frac{1}{n} \sum_{i=1}^{n} (R_s - R_g)$ | This describes the average errors of the SREs relative to the observed rainfall data. | 0 |
| Nash-Sutcliffe efficiency (E) | $E = 1 - \frac{\sum_{i=1}^{n} (R_s - R_g)^2}{\sum_{i=1}^{n} (R_g - \bar{R}_g)^2}$ | The value ranges from −∞ to 1; 0 < E ≤ 1 is acceptable. | 1 |
| Percent bias (PBIAS, %) | $\mathrm{PBIAS} = \frac{\sum_{i=1}^{n} (Q_o - Q_s)}{\sum_{i=1}^{n} Q_o} \times 100$ | $Q_o$ is observed discharge; $Q_s$ is simulated discharge for the available pairs of data, for which < ±15 % is very good. | 0 |
| Coefficient of determination (R²) | $R^2 = \frac{\left[\sum_{i=1}^{n} (Q_o - \bar{Q}_o)(Q_s - \bar{Q}_s)\right]^2}{\sum_{i=1}^{n} (Q_o - \bar{Q}_o)^2 \sum_{i=1}^{n} (Q_s - \bar{Q}_s)^2}$ | The value ranges from 0 to 1. | 1 |

Graphical and statistical measures were used to evaluate the prediction capability of the rainfall datasets. Accordingly, the performance of the model forced by each rainfall dataset was tested using the most widely used statistical indices (i.e., R², NSE and PBIAS) in addition to the P factor and R factor.
For reference, mean annual rainfall for the DRB is 1650 mm yr⁻¹ based on the rain gauge data, which is within 1.8 % to 3 % of the estimates provided by the products. However, total annual rainfall range estimates were substantially different among the products. The decreasing rainfall trend from the southern (highlands) to the northern (lowlands) part of the basin was captured by all products. In particular, TAMSAT3 and CHIRPS2 captured the rainfall variability in better detail, perhaps due to their high spatial resolution. On the other hand, the resolution of the 3B43 rainfall product seems too coarse to satisfactorily represent spatial variability in rainfall in the basin.
Figures 3 to 5 show the results of statistical evaluation indices calculated from rainfall from the rain gauges and from the SRE products. More specifically, Figs. 3 and 4 show correlation coefficients for the annual and monthly timescales, respectively. The results show that all four SRE products produced rainfall that correlates better with the ground-based rainfall observations at monthly timescales than at annual timescales. This may be because the performance of SREs improves with increased time aggregation and peaks at monthly timescales. More likely, the seasonal variability is much larger than the interannual variability and is apparently captured reasonably well, which leads to a higher degree of correlation for monthly data. The values of statistical evaluation indices for all products are summarized in Table 3. The results show that CHIRPS2 performed better for the DRB, with relatively higher r and E and lower BIAS, ME and RMSE at both annual and monthly timescales.

Statistical evaluation
Figures 3 to 5 and Table 3 show that generally, CHIRPS2 performed better than the other three products for the DRB. Correlation coefficients for both monthly and annual timescales, as well as all the indices presented in Fig. 5, favor CHIRPS2, indicating its superior performance. The relative performance of the other three SREs is inconsistent as it varies with the statistical indices used in this study. The 3B43 product, for example, performed worse based on Figs. 3 and 4 (i.e., correlation coefficients for annual and monthly timescales) and RMSE and E (Fig. 5) but performed better than the other two SREs based on BIAS and ME.
The tercile (percentile) categorical and probability of exceedance analysis results (Fig. 6) show that all the SREs considered in this study have a high rainfall detection capability for the DRB. The rainfall threshold used for this figure is 1 mm d⁻¹. The lower tercile (33rd percentile; P33), middle tercile (67th percentile; P67) and higher tercile (100th percentile; P100) of all SREs have values close to the corresponding gauge values, indicating that the SREs detect rainfall for the DRB. Among the products, CHIRPS2, 3B43 and IMERG6 are closest to the gauges for the lower, middle and higher terciles, respectively. Moreover, the probability of exceedance further confirms the rainfall detection capability of the SREs considered in this study for the DRB. The probability of exceedance result indicated that TAMSAT3 has an 80 % probability to exceed 0 mm, whereas the other products have nearly 100 % probability. This is because TAMSAT3 has more observations with zero rainfall values compared to the other products. Overall, TAMSAT3 exhibited relatively less rainfall detection skill, which could be attributed to the relatively greater sensitivity of TAMSAT3 to topographic effects.
Figure 7 shows the seasonal SRE performance evaluation results. The main rainy season in the DRB is from June to September, a short rainy season runs from March to May, and the rest of the year is the dry season (Fig. 9). The figure generally shows that the performance of the SREs varied from season to season and among the rainfall products. For example, CHIRPS2 is superior in detecting and estimating rainfall events for the DRB for all months (seasons), and its detection and estimation capability is better for the rainy season than for the dry season. Likewise, the rainfall detection capability of TAMSAT3 is stronger for the rainy season (May to November) but weaker for the dry season (December to April). Compared to the other SRE products, TAMSAT3 was generally poorly correlated with the gauge data for all months (seasons), and its BIAS was the highest for the rainy season but the lowest for the dry season.
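The tercile and probability of exceedance comparison can be illustrated with a short sketch. It assumes linear interpolation between order statistics for the percentiles and a strict "greater than" test for exceedance; neither choice is stated in the text, and the function names and sample series are hypothetical.

```python
def percentile(values, q):
    """Percentile q (0 to 100) using linear interpolation between sorted values."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lower = int(pos)                       # index at or below the position
    upper = min(lower + 1, len(data) - 1)  # next index, clamped to the end
    return data[lower] + (data[upper] - data[lower]) * (pos - lower)


def exceedance_probability(values, threshold):
    """Fraction of observations strictly greater than the threshold."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(1 for v in values if v > threshold) / len(values)


if __name__ == "__main__":
    # Hypothetical rainfall series (mm) with two zero-rainfall observations.
    rain_mm = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    for q in (33, 67, 100):
        print(f"P{q}: {percentile(rain_mm, q):.2f} mm")
    print(f"P(rain > 0 mm): {exceedance_probability(rain_mm, 0.0):.0%}")
```

A series with more zero values has a lower probability of exceeding 0 mm, which is the behavior described above for TAMSAT3.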

Hydrological modeling performance evaluation
The centroid of each subbasin was used as a gauging location for extracting rainfall from all the SRE rainfall datasets. Thus, for the SREs the basin is represented by a dense set of separate gauges (one per subbasin), unlike the representation of the measured rainfall. The performance of the rainfall products was evaluated using SWAT-CUP at monthly time steps. Table 4 shows details of the calibrated parameters, including their ranges, best fit values and sensitivity ranks when different rainfall datasets are used as inputs for the DRB. For parameters with the r prefix, the existing parameter value is multiplied by 1 plus the given value; for parameters with the v prefix, the existing value is replaced by the given value. The table shows that the ranges and the best fit values vary from one rainfall data source to another. This indicates the sensitivity of hydrological model performance to rainfall products, and thus the accurate characterization of rainfall variability is critical for reliable hydrological predictions. This finding is consistent with studies that reported that different precipitation datasets influence model performance, parameter estimation and uncertainty in streamflow predictions (Sirisena et al., 2018; Goshime et al., 2019). The relative sensitivity of the parameters also varied between the rainfall datasets. In general, the threshold depth of water in the shallow aquifer required for return flow to occur (mm) (GWQMN.gw), base flow alpha factor (ALPHA_BF.gw), groundwater delay (day) (GW_DELAY.gw), deep aquifer percolation fraction (RCHRG_DP.gw), and runoff curve number for moisture condition II (CN2.mgt) are the five most sensitive parameters. This seems to indicate that groundwater processes dominate streamflow in the DRB. This could be attributed to the dominantly deep and permeable soil, vegetated land surface, and dominant tertiary basaltic rocks in the DRB (Conway, 2000; Kabite and Gessesse, 2018).
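The two parameter-change types used in Table 4 (r prefix: multiply the existing value by 1 plus the given value; v prefix: replace the existing value) can be sketched as follows. This is an illustration only: the function name, the initial values and the changes are hypothetical and are not the calibrated values of this study.

```python
def apply_change(prefix, current_value, given_value):
    """Apply one calibration change to a parameter value.

    prefix "r": relative change, current value times (1 + given value).
    prefix "v": replacement, the given value is used directly.
    """
    if prefix == "r":
        return current_value * (1.0 + given_value)
    if prefix == "v":
        return given_value
    raise ValueError(f"unsupported prefix: {prefix!r}")


# Hypothetical initial values and changes, for illustration only.
initial = {"CN2.mgt": 75.0, "GW_DELAY.gw": 31.0}
changes = {"CN2.mgt": ("r", -0.10), "GW_DELAY.gw": ("v", 60.0)}

updated = {
    name: apply_change(prefix, initial[name], value)
    for name, (prefix, value) in changes.items()
}
print(updated)
```

A relative change preserves the spatial pattern of a parameter that varies between subbasins or land units, whereas a replacement assigns the same value everywhere. This is a common reason to treat the curve number as a relative change.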
The groundwater parameters can strongly affect the amount of streamflow, which can cause the over- or underestimation of streamflow. For this reason, the validation of streamflow was solely dependent on the rainfall products. Figure 8 compares the observed and the predicted streamflows for the calibration (2003 to 2008) and verification (2009 to 2014) periods for all five rainfall datasets. The goodness of the streamflow predictions is also summarized in Table 5. The results show that the peak streamflow is underestimated for all rainfall products, including gauges, but the streamflow volume is generally overestimated. This could be due to the uncertainty in SREs for the extreme rainfall events at daily scales (Jiang et al., 2017) and the SWAT model errors. The overestimated streamflows could also be attributed to the overestimation of rainfall by the SREs as described in the previous sections. Generally, the indices provided in Table 5 indicate that the streamflow predictions are good for CHIRPS2 and IMERG6 and satisfactory for the gauged rainfall but not for TAMSAT3 and 3B42 according to the classification system of Moriasi et al. (2007). The performance of the SREs is consistent with the climatology of the products. Mean monthly rainfall from 2001 to 2014 showed that TAMSAT3 and 3B42 deviate more from observed rainfall, while CHIRPS2 and IMERG6 are relatively closer (Fig. 9).
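The rating classes mentioned above can be made concrete with a small sketch. It uses the monthly Nash-Sutcliffe efficiency (NSE) thresholds of Moriasi et al. (2007) and the percent bias (PBIAS) sign convention of the same guideline, in which negative values indicate overestimation. Rating on NSE alone is a simplification, since the guideline also considers other statistics. The function names and sample flows are hypothetical.

```python
def percent_bias(observed, simulated):
    """PBIAS in percent; negative values mean the model overestimates."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    total_observed = sum(observed)
    if total_observed == 0:
        raise ValueError("sum of observed flows must be non-zero")
    residual = sum(o - s for o, s in zip(observed, simulated))
    return 100.0 * residual / total_observed


def rate_monthly_nse(nse):
    """Performance class for a monthly NSE value (Moriasi et al., 2007)."""
    if nse > 0.75:
        return "very good"
    if nse > 0.65:
        return "good"
    if nse > 0.50:
        return "satisfactory"
    return "unsatisfactory"


if __name__ == "__main__":
    # Hypothetical monthly flows (m3/s); the simulation is slightly too high.
    observed_flow = [10.0, 20.0, 30.0]
    simulated_flow = [12.0, 22.0, 32.0]
    print(f"PBIAS: {percent_bias(observed_flow, simulated_flow):.1f} %")
    for value in (0.80, 0.70, 0.60, 0.40):
        print(f"NSE {value:.2f}: {rate_monthly_nse(value)}")
```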

Discussion
The statistical SRE evaluation results showed that all the rainfall products captured the spatiotemporal rainfall variability in the DRB except the 3B43. The poor performance of 3B43 in capturing the basin's rainfall variability is in agreement with findings of two previous studies done for other basins in Ethiopia (Dinku et al., 2008; Worqlul et al., 2014). The reasons could be attributed to the fact that gauge adjustment for the 3B43 product did not use adequate gauge data from Ethiopian highlands due to the lack of data (Haile et al., 2013) and coarse spatial resolution of the dataset (Huffman et al., 2007). However, Gebremicael et al. (2019) reported the better performance of 3B43 for the Tekeze-Atbara Basin, which is located in the northern mountainous area of Ethiopia.
The SREs correlated better with observed rainfall at monthly than at annual timescales for all products. This is consistent with studies that reported that the performance of SREs improved with increased time aggregation that peaks at monthly timescales (Dembélé and Zwart, 2016; Katsanos et al., 2016; Zhao et al., 2017; Ayehu et al., 2018; Li et al., 2018; Guermazi et al., 2019). The weak agreement of SREs with observed data at annual timescales shows that the SREs considered in this study generally did not capture the interannual rainfall variability. In this regard, the 3B43 product in particular failed to capture annual rainfall variability compared to the other three SREs. Overall, all four SRE products overestimated rainfall for the DRB by 10 % for CHIRPS2 to 30 % for IMERG6 and TAMSAT3 (Fig. 5). This finding is consistent with studies that reported overestimation by the IMERG6 and 3B43 products for the alpine and gorge regions of China. However, Gebremicael et al. (2019) reported underestimation of rainfall by CHIRPS2 for the Tekeze-Atbara Basin, which is a mountainous and arid basin in northern Ethiopia. Ayehu et al. (2018) also reported a slight underestimation of rainfall by CHIRPS2 for the upper Blue Nile Basin. The discrepancy between our finding and the previous studies done for the basins in Ethiopia may be due to differences in watershed characteristics such as topography, vegetation cover and climatic conditions. Generally, this study showed that the SRE products considered here exhibited a satisfactory rainfall detection and estimation capability for the DRB. The products could be suitable for flood forecasting applications for the DRB (Toté et al., 2015). CHIRPS2 performed better than the other three SREs for annual, seasonal and monthly timescales in detecting and estimating rainfall for the basin.
The superiority of CHIRPS2 was also reported by previous studies for different parts of the world (Katsanos et al., 2016; Dembélé and Zwart, 2016), including basins in Ethiopia (Bayissa et al., 2017; Ayehu et al., 2018; Dinku et al., 2018; Gebremicael et al., 2019). For example, Dinku et al. (2018) reported better rainfall estimation capability of CHIRPS2 for eastern Africa compared to African Rainfall Climatology version 2 (ARC2) and TAMSAT3 products. Ayehu et al. (2018) reported the better performance of CHIRPS2 for the Blue Nile Basin compared to ARC2 and TAMSAT3. The better performance of CHIRPS2 has been attributed to the capability of the algorithm to integrate satellite, gauge and reanalysis products and its high spatial and temporal resolution. In contrast, generally, the 3B43 rainfall product performed poorly for the DRB for all timescales. This could be due to its coarse spatial resolution and lack of gauge adjustments for the highlands of Ethiopia (Haile et al., 2013). The IMERG6 showed a better rainfall detection and estimation capability for the study area than the 3B43 product, which is consistent with findings of previous studies (Huffman et al., 2015; Zhang et al., 2018, 2019). The better performance of IMERG6 is attributed to the inclusion of dual and high-frequency channels which improve light and solid precipitation detection capabilities (Huffman et al., 2015).
The hydrological simulation performance evaluation results of SREs showed that the accurate characterization of rainfall variability is critical for reliable hydrological predictions. This finding is consistent with studies that reported that different precipitation datasets influence model performance, parameter estimation and uncertainty in streamflow predictions (Sirisena et al., 2018; Goshime et al., 2019). The overestimation of streamflow for all SRE products could result from uncertainty in SREs for extreme rainfall events at daily scales (Zhao et al., 2017). The overestimated streamflow could also be attributed to the overestimation of rainfall by the SREs as described in the previous sections and the uncertainty in the SWAT model.
Overall, this study showed that CHIRPS2 and IMERG6 predicted streamflow better than the gauge rainfall and other two SRE products for the DRB. The superior hydrological performance of SRE products compared to gauge rainfall data was also reported by many other studies (Grusson et al., 2017; Bitew and Gebremichael, 2011; Goshime et al., 2019; Xian et al., 2019; Li et al., 2018; Belete et al., 2020). For example, Bitew and Gebremichael (2011) reported that satellite-based rainfall predicted streamflow better than gauge rainfall for complex high-elevation basins in Ethiopia. Likewise, a bias-corrected CHIRP rainfall dataset resulted in better streamflow prediction than a gauge rainfall dataset for the Ziway watershed in Ethiopia (Goshime et al., 2019).
The relatively poor performance of gauge rainfall compared to CHIRPS2 and IMERG6 shows that the existing rain gauges do not represent the spatiotemporal variability in rainfall in the DRB. The rain gauges in the DRB are sparse and unevenly distributed, and their records are incomplete. As previously mentioned, rain gauge density for the DRB is 0.32 per 1000 km², which is much lower than the World Meteorological Organization (WMO) recommendation of one gauge per 100-250 km² for mountainous areas of tropical regions such as the DRB (WMO, 1994).
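To make the size of this gap explicit, the two densities can be put on the same footing. The sketch below converts the WMO range of one gauge per 100-250 km² to gauges per 1000 km² and compares it with the reported 0.32 gauges per 1000 km². The function names are hypothetical; the numbers are those quoted in the paragraph above.

```python
def gauges_per_1000_km2(area_per_gauge_km2):
    """Convert 'one gauge per X km2' to gauges per 1000 km2."""
    if area_per_gauge_km2 <= 0:
        raise ValueError("area per gauge must be positive")
    return 1000.0 / area_per_gauge_km2


def shortfall_factor(observed_density, area_per_gauge_km2):
    """How many times denser the recommended network is than the observed one."""
    if observed_density <= 0:
        raise ValueError("observed density must be positive")
    return gauges_per_1000_km2(area_per_gauge_km2) / observed_density


if __name__ == "__main__":
    observed = 0.32  # gauges per 1000 km2, as reported for the DRB
    print(f"Area served by one DRB gauge: {1000.0 / observed:.0f} km2")
    for area in (250.0, 100.0):  # bounds of the WMO recommendation
        recommended = gauges_per_1000_km2(area)
        factor = shortfall_factor(observed, area)
        print(f"1 gauge per {area:.0f} km2 = {recommended:.0f} per 1000 km2, "
              f"about {factor:.1f} times the DRB density")
```

Under these figures the recommended network is roughly 12 to 31 times denser than the existing one, which supports the interpretation that the gauges undersample rainfall variability in the basin.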
In contrast to several previous studies on SRE evaluation, the present study combined statistical and hydrological performance evaluations for the Dhidhessa River Basin, a data-scarce river basin of the upper Blue Nile Basin. This approach is important for identifying SREs that better detect and estimate rainfall and for selecting application-specific rainfall products, such as for hydrological and climate change studies. The results of this study also highlight the seasonal dependence of the rainfall detection and hydrological performance capabilities of SREs for the DRB and similar basins in Ethiopia. In addition, the performance of IMERG6, which is the latest SRE product, was evaluated for Ethiopian basins for the first time. Overall, this study showed that CHIRPS2 and IMERG6 rainfall products performed best in terms of detecting and estimating rainfall, as well as predicting streamflow, for the DRB.

Conclusions
Satellite rainfall estimation is an alternative rainfall data source for hydrological and climate studies for data-scarce regions like Ethiopia. However, SREs contain uncertainties attributed to errors in measurement, sampling, retrieval algorithm and bias correction processes. Moreover, the accuracy of the rainfall estimation algorithm is influenced by topography and the climatic conditions of a given area. Therefore, SRE products should be evaluated locally before they are used for any application. In this study, we examined the intrinsic data quality and hydrological simulation performance of CHIRPS2, IMERG6, 3B42/3 and TAMSAT3 rainfall datasets for the DRB. The statistical evaluation results generally revealed that all four SRE products showed a promising rainfall estimation and detection capability for the DRB. Particularly, all SREs captured the south-north declining rainfall patterns of the study area. This could be due to the fact that all the SRE products were gauge adjusted and that they are the latest improved versions. However, all the SRE datasets overestimated rainfall for the DRB, indicating that the rainfall products could be applicable for flood studies but not for drought studies. The results also showed the stronger correlation of all SREs with measured rainfall data for the monthly timescales than for the annual timescales, which shows that all the rainfall products considered in this study cannot capture interannual rainfall variability.
The quantitative statistical indices showed that CHIRPS2 performed the best in estimating and detecting rainfall events for the DRB at monthly and annual timescales. This is likely because CHIRPS2 was developed by merging satellite, reanalysis and gauge datasets at high spatial resolution. In contrast, 3B43 performed poorly for the basin.
The hydrological-modeling-based performance evaluation showed that ranges, best fit values and relative sensitivities of SWAT's calibration parameters varied with the rainfall datasets. Overall, groundwater-flow-related parameters such as GWQMN.gw, ALPHA_BF.gw, GW_DELAY.gw and RCHRG_DP.gw were found to be more sensitive for all rainfall products. This showed that subsurface processes dominate hydrological responses in the DRB. The hydrological simulation performance results also showed that all the rainfall products, including the observed rainfall, overestimated streamflow and especially the high flows. The peak streamflow overestimation could be attributed to the uncertainty of SRE rainfall at shorter timescales (e.g., daily) and for individual rainfall events. The study showed that CHIRPS2 and IMERG6 predicted streamflow for the basin satisfactorily and even outperformed the gauge rainfall. The relatively poor performance of the gauge rainfall can be attributed to the fact that the gauges are too sparse to accurately characterize rainfall variability in the basin. Overall, the CHIRPS2 and IMERG6 products seem to perform better for the DRB in detecting rainfall events, estimating rainfall quantity, and improving streamflow predictions.
The new insights of this study include the following: (i) the SRE evaluation was done by combining statistical and hydrological modeling methods, (ii) the SREs considered in this study are the latest products and are reported to be the best in different studies (IMERG6 is the most recent product, and it is being evaluated in Ethiopian basins for the first time in this study), and (iii) the rainfall detection and estimation, as well as the streamflow prediction capability of SREs, is dependent on seasons. The results of this study are of interest to both scientific communities and water resource managers, and the paper contributes to improving our understanding of the latest SREs for Ethiopia and the DRB. However, the streamflow simulation capability of the selected SRE products should be tested with other hydrological models to see whether the model type affects the results.
Appendix A