Atmospheric Measurement Techniques An interactive open-access journal of the European Geosciences Union
Atmos. Meas. Tech., 11, 1921-1936, 2018
https://doi.org/10.5194/amt-11-1921-2018

Research article 06 Apr 2018


# Validation of new satellite rainfall products over the Upper Blue Nile Basin, Ethiopia

Getachew Tesfaye Ayehu1,2,3, Tsegaye Tadesse2, Berhan Gessesse1, and Tufa Dinku4
• 1Earth observation research division, Entoto Observatory & Research Center, Addis Ababa, 33679, Ethiopia
• 2National Drought Mitigation Center, School of Natural Resources, University of Nebraska-Lincoln, Lincoln, 830988, USA
• 3Geospatial Data & Technology Center, Institute of Land Administration, Bahir Dar University, Bahir Dar, 79, Ethiopia
• 4International Research Institute for Climate and Society, The Earth Institute at Columbia University, Palisades, NY 10964, USA
Abstract

Accurate measurement of rainfall is vital to analyze the spatial and temporal patterns of precipitation at various scales. However, the conventional rain gauge observations in many parts of the world such as Ethiopia are sparse and unevenly distributed. An alternative to traditional rain gauge observations could be satellite-based rainfall estimates. Satellite rainfall estimates could be used as a sole product (e.g., in areas with no or poor ground observations) or integrated with rain gauge measurements. In this study, the potential of the newly available Climate Hazards Group Infrared Precipitation with Stations (CHIRPS) rainfall product has been evaluated in comparison to rain gauge data over the Upper Blue Nile basin in Ethiopia for the period of 2000 to 2015. In addition, the Tropical Applications of Meteorology using SATellite and ground-based observations (TAMSAT 3) and the African Rainfall Climatology (ARC 2) products have been used as a benchmark and compared with CHIRPS. From the overall analysis at dekadal (10 days) and monthly temporal scale, CHIRPS exhibited better performance in comparison to TAMSAT 3 and ARC 2 products. An evaluation based on categorical/volumetric and continuous statistics indicated that, compared with the TAMSAT 3 and ARC 2 products, CHIRPS has the greatest skill in detecting rainfall events (POD = 0.99, 1.00) and in capturing rainfall volume (VHI = 1.00, 1.00), the highest correlation coefficients (r = 0.81, 0.88), better bias values (0.96, 0.96), and the lowest RMSE (28.45 mm dekad$^{-1}$, 59.03 mm month$^{-1}$) at dekadal and monthly analysis, respectively. CHIRPS overestimates the frequency of rainfall occurrence (up to 31 % at dekadal scale), although the volume of rainfall recorded during those events was very small. Indeed, TAMSAT 3 has shown a comparable performance with that of the CHIRPS product, mainly with regard to bias.
The ARC 2 product was found to have the weakest performance, underestimating rain-gauge-observed rainfall by about 24 %. In addition, the skill of CHIRPS is less affected by variation in elevation in comparison to TAMSAT 3 and ARC 2 products. CHIRPS resulted in average biases of 1.11, 0.99, and 1.00 at lower (< 1000 m a.s.l.), medium (1000 to 2000 m a.s.l.), and higher elevation (> 2000 m a.s.l.), respectively. Overall, the findings of this validation study show the potential of the CHIRPS product to be used for various operational applications such as rainfall pattern and variability study in the Upper Blue Nile basin in Ethiopia.

1 Introduction

Rainfall is a major component of the climate system and plays a key role in the Earth's hydrological cycle and energy balance. Rainfall variability in its rate, amount, and distribution substantially determines the Earth's ecosystem, water cycle, and climate (Huang and Van den Dool, 1993; Stillman et al., 2014). Thus, accurate measurement of rainfall is vital to analyze the spatial and temporal patterns of precipitation at various scales and advance our understanding of the effect of rainfall on agriculture, hydrology, and climatology. Conventionally, the rain gauge is the primary source of rainfall data and has been the most accurate and reliable approach for rainfall measurement. However, ground rainfall stations in many parts of the world and most parts of Ethiopia are very sparse and unevenly distributed. As a result, analysis using rain gauge observations is largely limited to the particular points where gauges are located. Because of this scattered distribution of weather stations, the dependability of rain gauge data to estimate areal rain and spatial distribution of rainfall over large areas of Ethiopia is considerably reduced. However, advances in remote sensing science have provided an opportunity to estimate rainfall from satellite observations, which are becoming an important source of rainfall data.

Satellite-derived rainfall estimates (SREs) are widely available from thermal infrared radiation (TIR) and passive microwave (PMW) channels, from geostationary and low-Earth-orbiting satellites, respectively. The TIR-based approaches use an indirect relationship to estimate rainfall from cloud top brightness temperatures. The TIR-based rainfall estimates have some uncertainties because cold but non-raining clouds such as cirrus can be misidentified as rain-producing, while warm clouds, which might generate a considerable amount of rain, can be missed (Trejo et al., 2016). In contrast, the PMW approach is based on the direct measurements of atmospheric liquid water content and rainfall intensity by penetrating clouds and as a result would give more accurate rainfall estimates (Kummerow et al., 2001; Young et al., 2014). However, observations from PMW are less frequent due to the relatively low temporal resolution of low-Earth-orbiting satellites. Combining TIR and PMW observations has therefore become a common approach to estimating rainfall from satellites.

Techniques for satellite rainfall estimation have limitations and embedded uncertainties because satellites do not measure rainfall directly; their observations must be related to precipitation through one or multiple surrogate variables (Wu et al., 2012; Toté et al., 2015). The uncertainties, therefore, may originate from temporal sampling, algorithm errors, and the satellite instruments themselves (Gebremichael et al., 2005). These may affect the accuracy of satellite-derived rainfall products and may result in a significant error when they are used for various purposes such as rainfall pattern and variability study. The accuracy of satellite-derived rainfall products has therefore received substantial attention. In this respect, stringent validation is essential to verify the performance of a product in diverse physiographic settings before it is used for the intended applications.

Several studies have been conducted in Ethiopia (e.g., Dinku et al., 2007, 2008, 2011a; Hirpa et al., 2010; Romilly and Gebremichael, 2011; Young et al., 2014; Gebre et al., 2015) and specifically to the Upper Blue Nile basin (e.g., Dinku et al., 2011b; Gebremichael et al., 2014; Fenta et al., 2014; Worqlul et al., 2014) to validate the performance of satellite-based rainfall products. These studies validated mainly the skills of Tropical Applications of Meteorology using SATellite and ground-based observations (TAMSAT; Grimes et al., 1999; Thorne et al., 2001; Maidment et al., 2014; Tarnavsky et al., 2014), Africa Rainfall Climatology version 2 (ARC 2; Novella and Thiaw, 2013), Tropical Rainfall Measuring Mission (TRMM; Huffman et al., 2007), Climate Prediction Centre (CPC) morphing technique (CMORPH; Joyce et al., 2004), and Precipitation Estimation from Remotely Sensed Information Using Artificial Neural Networks (PERSIANN; Hsu and Sorooshian, 2008) precipitation products at different spatial and temporal scale and topographic patterns. The results of these studies indicate that the skills of SREs vary with the characteristics of local climate, topography, and seasonal distributions of rainfall and have shown low to moderately high skills. There is now a newly available satellite rainfall product called the Climate Hazards Group Infrared Precipitation with Stations (CHIRPS; Funk et al., 2015) with a relatively high spatial and temporal resolution (i.e., 5 km resolution at daily temporal scale) and quasi-global coverage. So far, however, there has been very little work on the performance of CHIRPS satellite rainfall estimates over Ethiopia as well as other countries in Africa. That might be because CHIRPS is a relatively new dataset. The works of Toté et al. (2015) in Mozambique can be mentioned here as the first validation work we are aware of that reveals the potential applications of CHIRPS in Africa. Maidment et al. 
(2017a) have also validated the performance of satellite rainfall products (including CHIRPS v2.0) in four countries in Africa (Mozambique, Nigeria, Uganda, and Zambia). In general, these few validation works have shown the promising skill of CHIRPS in Africa and its potential for various operational applications on the continent. Nevertheless, to better exploit the relatively new CHIRPS rainfall product, more validation work needs to be done at different spatial and temporal scales in the region.

For this validation study, the Upper Blue Nile basin in Ethiopia was selected because of a relatively good density of rain gauge stations, varied topography, and high spatial and temporal variability of precipitation (Taye and Willems, 2013). The aim of this study was, therefore, to compare and validate the performance of CHIRPS with rain gauge observations that were collected from 32 weather stations from 2000 to 2015. CHIRPS performance was also compared against TAMSAT 3, TAMSAT 2, and ARC 2 satellite rainfall products. In the course of this analysis, both the TAMSAT and ARC 2 products have been validated as well. In addition, this study has also compared TAMSAT 2 and TAMSAT 3 products to assess the improvements made with the new version (TAMSAT 3; Maidment et al., 2017c). The analyses used dekadal (10 days) and monthly timescale rainfall data both from the satellite products and rain gauge observations.

The paper is structured as follows. Section 2 describes the study area, followed by the datasets used in the study (Sect. 3). Section 4 provides a detailed description of the methodology. The results and discussion are given in Sect. 5. Finally, Sect. 6 presents our conclusions.

2 Site descriptions

The Upper Blue Nile (UBN) basin is located in the northwestern part of Ethiopia with latitude between 7°45′ and 12°45′ N and longitude between 34°30′ and 39°45′ E (Fig. 1). The Blue Nile River originates from Lake Tana in Ethiopia and travels all the way to the Sudanese border to finally meet the White Nile at Khartoum. The UBN basin is a primary source of the Nile River, and it contributes about 60 % of the annual flow of the Nile (Conway, 2005; Degefu, 2003). The basin has an approximate drainage area of 176 000 km$^{2}$ (Conway, 2000). The basin is characterized by a complex topography with elevation ranging from 4261 m a.s.l. at the northeastern part of the basin to 500 m a.s.l. at the western part of the basin near the Ethiopian–Sudan border (Fig. 1). The rapidly varying topography of the basin leads to varying agro-ecology within short distances. The climate of the UBN basin ranges from humid to semi-arid. The main rainfall season (known as “Kiremt”) occurs from June to September. The dry season runs from October to January followed by a short rainy season (called “Belg”) from February to May. According to Kim et al. (2008), about 70 % of the annual precipitation in the study area (UBN basin) is observed during the Kiremt season. The UBN basin receives up to 2200 mm of annual rainfall. The annual mean rainfall varies between 1200 and 1800 mm (Conway, 2000) with an increasing trend from northeast to southwest (Kim et al., 2008). However, the basin is characterized by large temporal fluctuations in rainfall (Conway, 2000; Taye and Willems, 2013) on both intra-annual and interannual scales. As a result, the hydrological processes in the basin are quite complex and highly variable in space and time. The impact of rainfall variability in the basin is manifested in severe and recurrent climatic and hydrological extremes, such as floods and droughts, and in the ensuing low food production and poverty (Taye and Willems, 2012).
Although land use systems are diverse, the livelihoods of the majority of the population in the basin are highly dependent on rain-fed agriculture.

Figure 1. Elevation map of the Upper Blue Nile (UBN) basin and its location in Africa. The northeastern regions have higher elevation, while the northwestern regions have lower elevation.

3 Dataset

Rainfall data for this study were collected from ground-based weather stations and remote sensing satellite estimates.

## 3.1 Station data

Rain-gauge-observed daily rainfall data from 32 first- and second-class stations from 2000 to 2015 were collected from the National Meteorological Agency (NMA) of Ethiopia. First-class stations (synoptic stations) are those where all meteorological parameters are recorded every hour, while second-class stations are those where observations are taken every 3 h. Since the SREs evaluated here incorporate rain gauge data, the available rain gauge datasets were compared with the station archives (the data sources for the generation of CHIRPS, TAMSAT, and ARC 2), and any records used for the generation of the SREs were removed from the analysis to guarantee the complete independence of the validation dataset. Therefore, a total of 3460 complete dekadal observations that were not used for the generation/calibration of SREs were retained for the validation over the 32 stations.

## 3.2 Satellite rainfall data

High resolution satellite rainfall products selected for this study are CHIRPS v2.0 (a relatively new satellite rainfall product), TAMSAT 3, TAMSAT 2, and ARC 2. The TAMSAT 2 product was used in this study mainly to assess the improvements made by the recent version TAMSAT 3. These rainfall products were selected because they (i) have a relatively high spatial resolution, (ii) have relatively long time series, and (iii) are freely available. Brief descriptions of these datasets are given below.

### 3.2.1 CHIRPS, v2.0

CHIRPS is a quasi-global (50° S–50° N) gridded product available from 1981 to near present at 0.05° spatial resolution (approximately 5.3 km) and at daily, pentadal, dekadal, and monthly temporal resolution (Funk et al., 2015). The CHIRPS dataset is developed by the U.S. Geological Survey (USGS) and the Climate Hazards Group (CHG) at the University of California (Knapp et al., 2011; Funk et al., 2015). The development of CHIRPS products entails three major input datasets and processes. First, infrared precipitation (IRP) pentad (5-day) rainfall estimates are created from two TIR satellite observation archives (i.e., Globally Gridded Satellite (GridSat) and NOAA Climate Prediction Center dataset (CPC TIR)) using cold cloud durations (CCDs) and calibrated using the Tropical Rainfall Measuring Mission Multi-Satellite Precipitation Analysis (TMPA 3B42) precipitation pentads. The IRP pentads are then divided by their long-term IRP mean values to be expressed as a percent of normal. Second, the percent-of-normal IRP pentad is multiplied by the corresponding Climate Hazards Precipitation Climatology (CHPClim) pentad to produce an unbiased gridded estimate, with units of millimeters per pentad, called the CHG IR Precipitation (CHIRP). Third, the final CHIRPS product is produced by blending station data with the CHIRP dataset. Details of CHIRPS satellite rainfall products can be found in Funk et al. (2015).
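The percent-of-normal scaling described above can be illustrated for a single grid cell and pentad. This is only a minimal sketch of the arithmetic as summarized in this section, not the CHG implementation; the function name `chirp_pentad` and the numbers are hypothetical.

```python
def chirp_pentad(irp, irp_long_term_mean, chpclim):
    """Scale a climatology pentad by the IRP fraction of normal.

    irp                -- IRP estimate for this pentad (mm per pentad)
    irp_long_term_mean -- long-term mean IRP for the same pentad (mm per pentad)
    chpclim            -- CHPClim value for the same pentad (mm per pentad)
    """
    if irp_long_term_mean <= 0:
        # The ratio is undefined here; how such cells are treated in the
        # real product is not described in this section.
        raise ValueError("long-term IRP mean must be positive")
    fraction_of_normal = irp / irp_long_term_mean
    return fraction_of_normal * chpclim


# Hypothetical values: the IRP estimate is 125 % of its long-term mean,
# so the 20 mm climatological pentad is scaled up to 25 mm.
example = chirp_pentad(irp=30.0, irp_long_term_mean=24.0, chpclim=20.0)
```

The blending with station data that turns CHIRP into CHIRPS is a separate step and is not sketched here.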

### 3.2.2 TAMSAT

The TAMSAT product is developed by the University of Reading based on the Meteosat TIR (thermal infrared) channel. The TAMSAT rainfall estimation method (Dugdale et al., 1991; Grimes et al., 1999; Thorne et al., 2001; Maidment et al., 2014; Tarnavsky et al., 2014) assumes that rainfall is produced by convective clouds with cold cloud tops and that rainfall and CCD are linearly correlated. The retrieval algorithm is calibrated using local gauge records. TAMSAT products are available from 1983 onwards at 0.0375° spatial resolution (approximately 4 km) and at dekadal, monthly, and seasonal temporal resolution. This validation study considered the recent version of the TAMSAT product (TAMSAT 3) for the comparison to the CHIRPS product. However, the previous version (TAMSAT 2) was also included to assess the improvements made by TAMSAT 3. The principle of the TAMSAT method is the same for TAMSAT 2 and TAMSAT 3, but the calibration procedures and approaches have been improved. Details on the main differences between TAMSAT 3 and TAMSAT 2 have been provided by Maidment et al. (2017a).

### 3.2.3 ARC 2

ARC 2 is the revised version of ARC 1 (Novella and Thiaw, 2013). The ARC 2 satellite rainfall estimates were produced from two primary input data sources: (1) 3-hourly geostationary infrared (IR) data centered over Africa from the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT) and (2) quality-controlled Global Telecommunication System (GTS) gauge observations reporting 24 h rainfall accumulations over Africa (Novella and Thiaw, 2013). The ARC 2 dataset is available on a daily timescale with a grid resolution of 0.1° × 0.1° and with a spatial domain of 40° S–40° N and 20° W–55° E, encompassing the African continent, from 1983 to the present.

4 Methodology

This study has evaluated the performance of CHIRPS satellite rainfall estimates at dekadal and monthly temporal scales against observations from 32 rain gauges and compared it with TAMSAT 3, TAMSAT 2, and ARC 2 products for the period of 2000 to 2015. The dekadal and monthly data were further classified to validate the satellite products by elevation and for each month over the UBN basin, respectively. The double mass curve technique and correlation coefficient analysis (similar to Gebere et al., 2015) confirmed the consistency and homogeneity of rain gauge observations, respectively. The dekadal and monthly data were created by aggregating daily rain gauge observations and daily TAMSAT 3 and ARC 2 rainfall values, while the CHIRPS and TAMSAT 2 satellite products are available at dekadal and monthly timescales. The comparison between gridded satellite rainfall estimates and ground rainfall observations can be made using either grid-to-grid or point-to-grid comparison methods. However, an attempt to convert point ground observations to a gridded interpolated dataset led to poor results due to the uneven spatial distribution of gauge stations. Thus, this study has used the point-to-grid comparison approach. For each validation station, the value of the satellite grid cell containing the station was extracted and pair-wise comparisons with rain gauge values were undertaken.
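The two data-preparation steps described above, aggregating daily totals to dekads and locating the grid cell that contains a station, might look like the following. This is a minimal sketch under stated assumptions: dekads are taken as days 1–10, 11–20, and 21 to the end of the month (a common convention that the text does not spell out), the grid is assumed regular with its origin at the south-west corner, and all function names, the grid origin, and the station coordinates are hypothetical.

```python
import math
from collections import defaultdict
from datetime import date


def dekad_of(day):
    """Dekad index (1-3) within the month: days 1-10, 11-20, 21-end (assumed)."""
    return min((day.day - 1) // 10 + 1, 3)


def daily_to_dekadal(daily):
    """Sum a {date: mm} mapping into {(year, month, dekad): mm}."""
    totals = defaultdict(float)
    for day, mm in daily.items():
        totals[(day.year, day.month, dekad_of(day))] += mm
    return dict(totals)


def containing_cell(lat, lon, lat0, lon0, res):
    """(row, col) of the regular grid cell containing a station.

    lat0, lon0 -- southern and western edges of the grid (degrees, assumed)
    res        -- cell size in degrees; rows are counted northward from lat0
    """
    return math.floor((lat - lat0) / res), math.floor((lon - lon0) / res)


# Hypothetical daily gauge record for July 2010
daily = {
    date(2010, 7, 1): 5.0,
    date(2010, 7, 10): 3.0,
    date(2010, 7, 11): 2.0,
    date(2010, 7, 31): 4.0,
}
dekadal = daily_to_dekadal(daily)

# Hypothetical station on a 0.05 degree grid whose corner is at 7 N, 34 E
cell = containing_cell(11.62, 37.41, lat0=7.0, lon0=34.0, res=0.05)
```

The satellite value at `cell` would then be paired with the gauge total for the same dekad. Days with missing observations are not handled in this sketch.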

## 4.1 Performance analysis

The performances of satellite rainfall estimates were analyzed using categorical and volumetric indices and continuous statistical measures. The categorical indices are commonly derived from a 2 × 2 contingency table, which reports the number of hit (H), miss (M), false alarm (F), and true null events. To distinguish rain from no-rain events, a threshold value of 1.0 mm dekad$^{-1}$ or 1.0 mm month$^{-1}$ was used in evaluating the skills of the satellite products.

### 4.1.1 Categorical validation indices

This section summarizes the categorical indices used to assess how well the satellite products detect the rainfall events recorded by gauge observations. These include the probability of detection (POD), the false alarm ratio (FAR), and the critical success index (CSI). The POD score is defined as $H/\left(H+M\right)$, and it describes the fraction of the gauge-observed events detected correctly by the satellite, while the false alarm ratio, FAR $=F/\left(H+F\right)$, corresponds to the portion of events identified by the satellite but not confirmed by gauge observations. The critical success index, CSI $=H/\left(H+M+F\right)$, combines different aspects of the POD and FAR, describing the overall skill of the satellite products relative to gauge observation. All these categorical validation indices have score values ranging from 0 to 1; in general, 1 indicates perfect skill, except for FAR, where 0 is the perfect score.
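The three scores defined above can be computed from paired satellite and gauge totals as sketched below. This is an illustrative sketch rather than the code used in the study: the function names and sample values are hypothetical, and an event is assumed to mean a total strictly greater than the threshold, matching the inequalities used for the volumetric indices in Sect. 4.1.2.

```python
def contingency_counts(sat, gauge, t=1.0):
    """Count hits, misses, and false alarms for paired totals (mm)."""
    hits = misses = false_alarms = 0
    for s, g in zip(sat, gauge):
        if s > t and g > t:
            hits += 1          # both satellite and gauge report rain
        elif s <= t and g > t:
            misses += 1        # gauge reports rain, satellite does not
        elif s > t and g <= t:
            false_alarms += 1  # satellite reports rain, gauge does not
        # remaining pairs are true nulls and enter none of the scores
    return hits, misses, false_alarms


def _ratio(num, den):
    """Return num/den, or NaN when the score is undefined (den == 0)."""
    return num / den if den else float("nan")


def categorical_indices(sat, gauge, t=1.0):
    h, m, f = contingency_counts(sat, gauge, t)
    return {
        "POD": _ratio(h, h + m),
        "FAR": _ratio(f, h + f),
        "CSI": _ratio(h, h + m + f),
    }


# Hypothetical dekadal totals (mm)
sat = [12.0, 0.0, 3.5, 0.4, 8.0, 2.0]
gauge = [10.0, 4.0, 0.0, 0.0, 9.5, 0.5]
counts = contingency_counts(sat, gauge)
scores = categorical_indices(sat, gauge)
```

For these sample values there are two hits, one miss, and two false alarms, so POD is 2/3, FAR is 0.5, and CSI is 0.4.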

### 4.1.2 Volumetric validation indices

Since the contingency table metrics do not provide information regarding the volume of correctly (incorrectly) detected rainfall by the satellite products relative to rain gauge observations, recently AghaKouchak and Mehran (2013) suggested an extension of categorical table indices known as “volumetric indices”. In this study, therefore, the volumetric indices that include (a) volumetric hit index (VHI), (b) volumetric false alarm ratio (VFAR), and (c) the volumetric critical success index (VCSI) that were proposed by AghaKouchak and Mehran (2013) have been adopted to evaluate the volumetric performance of the selected satellite rainfall products.

$\begin{array}{}\text{(1)}& \mathrm{VHI}=\frac{\sum_{i=1}^{n}\left(S_{i}\mid\left(S_{i}>t\,\&\,G_{i}>t\right)\right)}{\sum_{i=1}^{n}\left(S_{i}\mid\left(S_{i}>t\,\&\,G_{i}>t\right)\right)+\sum_{i=1}^{n}\left(G_{i}\mid\left(S_{i}\le t\,\&\,G_{i}>t\right)\right)},\end{array}$

where VHI is the volume of rainfall correctly detected by the satellite relative to the sum of that volume and the volume of gauge-observed rainfall missed by the satellite.

$\begin{array}{}\text{(2)}& \mathrm{VFAR}=\frac{\sum_{i=1}^{n}\left(S_{i}\mid\left(S_{i}>t\,\&\,G_{i}\le t\right)\right)}{\sum_{i=1}^{n}\left(S_{i}\mid\left(S_{i}>t\,\&\,G_{i}>t\right)\right)+\sum_{i=1}^{n}\left(S_{i}\mid\left(S_{i}>t\,\&\,G_{i}\le t\right)\right)},\end{array}$

where VFAR is the volume of rainfall falsely detected by the satellite relative to the total volume of rainfall detected by the satellite (hits plus false alarms).

$\begin{array}{}\text{(3)}& \mathrm{VCSI}=\frac{\sum_{i=1}^{n}\left(S_{i}\mid\left(S_{i}>t\,\&\,G_{i}>t\right)\right)}{\sum_{i=1}^{n}\left(S_{i}\mid\left(S_{i}>t\,\&\,G_{i}>t\right)\right)+\sum_{i=1}^{n}\left(G_{i}\mid\left(S_{i}\le t\,\&\,G_{i}>t\right)\right)+\sum_{i=1}^{n}\left(S_{i}\mid\left(S_{i}>t\,\&\,G_{i}\le t\right)\right)},\end{array}$

where VCSI is the overall measure of volumetric performance.

Here S is the satellite rainfall estimate, G is the gauge observation, i runs from 1 to n, n is the sample size, and t is the threshold value (t= 1 mm in this study).
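Equations (1) to (3) translate almost directly into code. The sketch below is illustrative only: the function name and the sample totals are hypothetical, and undefined ratios (zero denominators) are not handled.

```python
def volumetric_indices(sat, gauge, t=1.0):
    """VHI, VFAR, and VCSI following Eqs. (1)-(3) for paired totals (mm)."""
    pairs = list(zip(sat, gauge))
    # Satellite volume during hits (S > t and G > t)
    hit_vol = sum(s for s, g in pairs if s > t and g > t)
    # Gauge volume during misses (S <= t and G > t)
    miss_vol = sum(g for s, g in pairs if s <= t and g > t)
    # Satellite volume during false alarms (S > t and G <= t)
    false_vol = sum(s for s, g in pairs if s > t and g <= t)
    return {
        "VHI": hit_vol / (hit_vol + miss_vol),
        "VFAR": false_vol / (hit_vol + false_vol),
        "VCSI": hit_vol / (hit_vol + miss_vol + false_vol),
    }


# Hypothetical dekadal totals (mm)
sat = [12.0, 0.0, 3.5, 0.4, 8.0, 2.0]
gauge = [10.0, 4.0, 0.0, 0.0, 9.5, 0.5]
vol = volumetric_indices(sat, gauge)
```

For these sample values the hit, miss, and false-alarm volumes are 20.0, 4.0, and 5.5 mm, giving VHI of about 0.83, VFAR of about 0.22, and VCSI of about 0.68. Two of the six pairs are false alarms, yet they carry only a small share of the detected volume, which is the kind of contrast between FAR and VFAR that the volumetric indices are meant to expose.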

### 4.1.3 Continuous statistical tools

In addition, the continuous statistical measures were used to quantify the overall performance of the satellite rainfall products.

$\begin{array}{}\text{(4)}& r=\frac{\sum \left(G-\bar{G}\right)\left(S-\bar{S}\right)}{\sqrt{\sum \left(G-\bar{G}\right)^{2}}\sqrt{\sum \left(S-\bar{S}\right)^{2}}}.\end{array}$

The Pearson correlation coefficient (r) is used to evaluate the strength of the linear relationship between the gauge observations and the satellite estimates. A value of 1 is the perfect score.

$\begin{array}{}\text{(5)}& \mathrm{RMSE}=\sqrt{\frac{\sum \left(G-S\right)^{2}}{n}}.\end{array}$

The root mean square error (RMSE) measures the average magnitude of the difference between the two datasets, giving greater weight to large errors. A value of 0 is the perfect score.

$\begin{array}{}\text{(6)}& \mathrm{Bias}=\frac{\sum S}{\sum G}.\end{array}$

Bias is a measure of how the average satellite rainfall magnitude compares to the ground rainfall observation. A value of 1 is the perfect score. A bias value above (below) 1 indicates an aggregate satellite overestimation (underestimation) of the ground precipitation amounts.

Here G is the gauge rainfall observation, S is the satellite rainfall estimate, $\bar{G}$ is the mean of the gauge rainfall observations, $\bar{S}$ is the mean of the satellite rainfall estimates, and n is the number of data pairs.
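The three continuous measures in Eqs. (4) to (6) can be sketched in a few lines. The function names and sample values below are hypothetical and serve only to illustrate the formulas.

```python
import math


def pearson_r(gauge, sat):
    """Pearson correlation coefficient, Eq. (4)."""
    n = len(gauge)
    g_mean = sum(gauge) / n
    s_mean = sum(sat) / n
    cov = sum((g - g_mean) * (s - s_mean) for g, s in zip(gauge, sat))
    g_ss = sum((g - g_mean) ** 2 for g in gauge)
    s_ss = sum((s - s_mean) ** 2 for s in sat)
    return cov / math.sqrt(g_ss * s_ss)


def rmse(gauge, sat):
    """Root mean square error, Eq. (5), in the units of the inputs."""
    return math.sqrt(sum((g - s) ** 2 for g, s in zip(gauge, sat)) / len(gauge))


def bias(gauge, sat):
    """Multiplicative bias, Eq. (6): total satellite over total gauge rainfall."""
    return sum(sat) / sum(gauge)


# Hypothetical dekadal totals (mm)
gauge = [10.0, 20.0, 30.0, 40.0]
sat = [12.0, 18.0, 33.0, 37.0]
r_val = pearson_r(gauge, sat)
rmse_val = rmse(gauge, sat)
bias_val = bias(gauge, sat)
```

In this example the over- and underestimates cancel, so the bias is exactly 1 even though the RMSE is about 2.55 mm and r is about 0.975. This is why the bias ratio is read together with RMSE and r rather than on its own.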

5 Results and discussions

The performances of satellite rainfall estimates were evaluated using the categorical indices (i.e., POD, FAR, and CSI), volumetric indices (i.e., VHI, VFAR, and VCSI), and a set of continuous statistics (i.e., correlation coefficient (r), bias, and RMSE) at dekadal and monthly temporal scale. High values of POD, VHI, CSI, VCSI, and r; small values of FAR, VFAR, and RMSE; and bias values of 1 (or near to 1) indicate good performance of the satellite rainfall products.

## 5.1 Spatial rainfall patterns of satellite products

Figure 2 provides the 16-year mean rainy season (June to September) and the annual rainfall of TAMSAT 2, ARC 2, TAMSAT 3, and CHIRPS satellite rainfall products over the UBN basin in Ethiopia for the period of 2000 to 2015. The wet/Kiremt season (June to September) produced the majority of the total annual precipitation. Therefore, both the rainy season (Fig. 2a) and annual estimates (Fig. 2b) generated by the satellite products have shown similar rainfall patterns. However, TAMSAT 2 and ARC 2 showed a decreasing trend of rainfall from west to the east region (or from low- to high-elevation areas) of the basin, while TAMSAT 3 and CHIRPS show a significant amount of rainfall in the central and southwest regions. The large discrepancy in TAMSAT 2 and ARC 2 rainfall pattern in the west and east areas could be attributed to the orographic effect on rainfall.

Figure 2. Comparison of mean satellite rainfall estimates for (a) Kiremt season (June–September) and (b) annual rainfall over the Upper Blue Nile basin for the period of 2000–2015. Years with missing values were not considered in the mean analysis.

## 5.2 Dekadal comparison

The dekadal comparisons were made using (i) all dekadal values from rain gauge observation and satellite products and (ii) classifications of the dekadal values, for further validation, per elevation of the UBN basin.

### 5.2.1 Overall validation at dekadal temporal scale

Table 1 gives an overall comparison between the satellite products and rain gauge observation from 2000 to 2015 at a dekadal temporal scale. In addition, Figs. 3 and 4 provide the cumulative distribution function (CDF) and the scatter plot, respectively.

Table 1. Summary of the point-to-grid evaluation at dekadal temporal scale using categorical, volumetric, and continuous statistical tools: probability of detection (POD), false alarm ratio (FAR), critical success index (CSI), volumetric hit index (VHI), volumetric false alarm ratio (VFAR), volumetric critical success index (VCSI), correlation coefficient (r), bias, and the root mean square error (RMSE). The RMSE values are shown in millimeters.

The overall evaluation and comparison summary, shown in Table 1, indicates that CHIRPS scored relatively higher POD, VHI, and VCSI values followed by TAMSAT 3 and TAMSAT 2. It is apparent from the same table that both TAMSAT products have shown similar skill, with almost identical POD, VHI, and VCSI values. Given these results, it is possible to conclude that the improvement made by TAMSAT 3 over the previous version TAMSAT 2 in detecting the frequency of rainfall events is negligible. On the other hand, ARC 2 scored relatively lower POD, VHI, and VCSI values.

However, ARC 2, TAMSAT 2, and TAMSAT 3 scored lower FAR and higher CSI values than CHIRPS. The CHIRPS product resulted in the highest FAR (0.31) and lowest CSI (0.68) values. Similarly, a FAR value of 0.29 (close to the 0.31 of this study) for CHIRPS was obtained by Toté et al. (2015) from the dekadal product validation in Mozambique. This means that TAMSAT (which hereafter refers to both version 2 and version 3) and ARC 2 products are better than CHIRPS in detecting the relative frequency of rain events. The overestimation of rainy days by CHIRPS might be related to the process of translating infrared (IR) CCD values into estimates of precipitation using the 0.25° grid cell TMPA datasets, which may generate too much light rain (Funk et al., 2015). Nevertheless, in terms of the volumetric indices, the VFAR value of CHIRPS (0.06) is much lower than its FAR, and its overall performance (VCSI = 0.94) is better than that of the TAMSAT and ARC 2 products. Since the volumes of rainfall detected by CHIRPS during false events were negligible, they had a minimal contribution to the total amounts of rainfall.

Furthermore, Table 1 shows that CHIRPS has better agreement with the rain gauge observations than TAMSAT and ARC 2 on most continuous statistical assessments, with the highest correlation coefficient (r), better bias values, and the lowest RMSE. Two likely explanations for the good performance of CHIRPS are the use of CHPClim and the inclusion of station data in the CHIRPS datasets (Funk et al., 2015). Indeed, TAMSAT 3 has scored values very comparable to those of the CHIRPS product, particularly for the bias ratio. Both CHIRPS and TAMSAT 3 have managed to reproduce the rainfall amount observed by rain gauge stations reasonably well (with an overall bias of 0.96 (i.e., underestimated only by 4 %) and 1.04 (overestimated only by 4 %), respectively), while TAMSAT 2 and ARC 2 showed a substantial underestimation of rain gauge observation by 31 and 24 %, respectively. The underestimations of ARC 2 and TAMSAT 2 might be attributed to the complex topography of the validation site (possibly dominated by warm rain processes) that may reduce the ability to identify rainy clouds (Dinku et al., 2007; Funk et al., 2015; Maidment et al., 2014) and to the calibration process using gauge stations. However, the statistical analysis in Table 1 reveals that the recent version, TAMSAT 3, has addressed the underestimation of rainfall by TAMSAT 2 and significantly improved the bias ratio. Thus, the overall dekadal validation and comparison indicated that CHIRPS has a high level of correspondence with rain gauge observations and may have useful skill for various applications in the study area.

In Fig. 3, the CDFs of dekadal rainfall from the satellite products and the rain gauge observations are presented to show how often the satellite estimates fall below or above the rain gauge observation values.

Figure 3. Cumulative distribution function (CDF) of dekadal rainfall for (a) ground rainfall observation, ARC 2, TAMSAT, and CHIRPS rainfall estimates and (b) magnified view of their CDF for 0 to 200 mm part over the Upper Blue Nile basin for the period of 2000–2015.

As can be seen in Fig. 3a, TAMSAT 3 has shown better performance (followed by CHIRPS) in detecting dekadal maximum values observed by rain gauge stations. The result shows significant improvements made by TAMSAT 3 in comparison to the previous TAMSAT 2 product. The plot in Fig. 3b further reveals that CHIRPS and TAMSAT 3 are very close to the rain gauge observation at all rainfall values, except at low rainfall accumulations (< 20 mm) for CHIRPS and at accumulations between 20 and 100 mm for TAMSAT 3, where they show a slight overestimation. The CHIRPS product has also demonstrated a slight underestimation at high rainfall amounts. A similar result for the CHIRPS product has been noted by the prior studies of Toté et al. (2015) and Trejo et al. (2016) in Mozambique and Venezuela, respectively. However, TAMSAT 2 and ARC 2 are well below the rain gauge observations. The comparison between SREs and rain gauge observations at the 80 % frequency level indicated that TAMSAT 3 and CHIRPS differ by only 5.9 mm above and 2.69 mm below, respectively, the 71.5 mm rainfall value observed by rain gauge stations, while ARC 2 and TAMSAT 2 are 13.84 and 16.3 mm below, respectively, at dekadal temporal scale. This shows that CHIRPS (followed by TAMSAT 3) is very close to rain-gauge-observed values, while TAMSAT 2 and ARC 2 are well below.
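A comparison at a fixed frequency level, such as the 80 % level discussed above, can be sketched with an empirical quantile. The text does not state how the values were read from the CDFs, so the nearest-rank definition used here is an assumption, and the function name and sample values are hypothetical.

```python
import math


def empirical_quantile(values, percent):
    """Nearest-rank quantile: smallest value whose empirical CDF >= percent/100.

    percent is an integer between 1 and 100.
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical dekadal totals (mm) for a gauge and a satellite product
gauge = [0.0, 5.0, 12.0, 30.0, 48.0, 60.0, 71.0, 85.0, 110.0, 150.0]
sat = [0.0, 4.0, 10.0, 28.0, 45.0, 58.0, 70.0, 82.0, 100.0, 140.0]

gauge_p80 = empirical_quantile(gauge, 80)
sat_p80 = empirical_quantile(sat, 80)
offset = sat_p80 - gauge_p80  # negative: satellite lies below the gauge
```

Here the satellite value at the 80 % level is 3 mm below the gauge value. Other quantile definitions, for example linear interpolation between ranks, would give slightly different offsets.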

In addition, the scatter plots shown in Fig. 4 were used to further examine the relationship between satellite rainfall products and rain gauge observations. The satellite rainfall estimates show better agreement with rain gauge observations at lower rainfall amounts, and the agreement gradually weakens towards higher values. CHIRPS and TAMSAT 3 have shown relatively better agreement with rain gauge observations (r= 0.81 and 0.78, respectively) than TAMSAT 2 and ARC 2 at the dekadal timescale, and ARC 2 has exhibited the lowest agreement with rain gauge values (r= 0.72) among the SREs. The regression values are consistent with the values presented in the CDF shown in Fig. 3.

Figure 4. Scatter plot between rain gauge observations and satellite rainfall estimates at dekadal temporal scale over the Upper Blue Nile basin for the period of 2000–2015.

### 5.2.2 Comparison at different elevations using the dekadal timescale data

The effect of topography on the skill of satellite rainfall products might be substantial (Hirpa et al., 2010). Stations selected in this study have a broad range of elevation, from 790 to 3098 m a.s.l. This wide range of elevation and spatial variation is essential to assess the dependence of the satellite rainfall products on topographic patterns. The dekadal timescale data were grouped by the 32 rain gauge stations. Thus, the skills of the satellite products at different station elevations have been validated, and the results are given in Figs. 5 and 6.

Figure 5 depicts the categorical and volumetric indices of the satellite products at different elevation values during 2000 to 2015. CHIRPS has shown a more prominent skill than TAMSAT and ARC 2 products and scored POD and VHI values close to 1.00 at most elevations. However, the competencies of TAMSAT and ARC 2 products in detecting rainfall events seem to reduce with elevation.

Figure 5. Categorical and volumetric indices of satellite rainfall products as a function of elevation: (a) probability of detection (POD), (b) false alarm ratio (FAR), (c) critical success index (CSI), (d) volumetric hit index (VHI), (e) volumetric false alarm ratio (VFAR), and (f) volumetric critical success index (VCSI) over the Upper Blue Nile basin for the period of 2000–2015.

A closer look at Fig. 5a and, to some extent, Fig. 5c reveals a clear trend of decreasing skill of TAMSAT 3 and ARC 2 with increasing elevation. Stations with relatively low elevations, ranging from 790 to 1928 m, yielded the highest POD and CSI values for TAMSAT and ARC 2 estimates, whereas the majority of the lowest skills of TAMSAT and ARC 2 were recorded at relatively high-elevation stations, ranging from 2000 to 3098 m. Further analysis of the correlation between the satellite skill scores (i.e., POD, FAR, CSI, VHI, VFAR, and VCSI) and elevation (given in Table 2) showed that the PODs of the TAMSAT 3, TAMSAT 2, and ARC 2 products have a substantial negative correlation with elevation.

The same table (Table 2) indicates that the skill of CHIRPS has resulted in a relatively low correlation coefficient with elevation. Overall, these results could imply that the skills of CHIRPS estimate are less affected by variation in elevation in comparison to TAMSAT and ARC 2 products. However, in most other indices no clear relationships between the skills of the SREs and change in elevation were observed.

Table 2. Pearson correlation between the skills of SREs and station elevations (only important correlations are presented here). Probability of detection (POD), critical success index (CSI), and bias.

From the statistical analysis presented in Fig. 6a, the satellite products have shown correlation coefficients (r) ranging from 0.32 to 0.91, independent of variation in elevation. The lowest correlation (r= 0.32) was scored by TAMSAT 2 at the “Sirinka” rain gauge station, at an elevation of 1861 m a.s.l. Moreover, the bias ratios for TAMSAT 2 and ARC 2 seem to have elevation-dependent trends (Fig. 6b and Table 2). CHIRPS and TAMSAT 3 have scored the best average bias ratios (1.00 and 1.07, respectively) independent of elevation, although they considerably under- or overestimate rainfall values at some elevations. The average bias ratios of the satellite products over wider elevation ranges were compared, and ARC 2 (TAMSAT 2) resulted in mean biases of 1.53 (1.35), 0.86 (0.73), and 0.77 (0.66) at low (< 1000 m a.s.l.), medium (1000 to 2000 m a.s.l.), and high elevation (> 2000 m a.s.l.), respectively. On the other hand, the CHIRPS dataset scored a bias of 1.11, 0.99, and 1.00, while TAMSAT 3 reached 1.14, 1.07, and 1.07 at low, medium, and high elevation, respectively. These results are in good agreement with those presented in Table 2 and Fig. 7. The results, as shown in Table 2, indicated that the bias ratios of ARC 2 and TAMSAT 2 have modest negative correlations with elevation ($r=-\mathrm{0.44}$ and $r=-\mathrm{0.38}$, respectively), while CHIRPS and TAMSAT 3 resulted in correlation values close to zero.
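The grouping of station bias ratios into the three elevation classes quoted above can be sketched as follows. The class limits follow the text (< 1000, 1000 to 2000, and > 2000 m a.s.l.). The function names and the station bias values are hypothetical and serve only as an illustration.

```python
def elevation_band(elevation_m):
    """Assign a station elevation (m a.s.l.) to a low/medium/high class."""
    if elevation_m < 1000:
        return "low"
    if elevation_m <= 2000:
        return "medium"
    return "high"


def mean_bias_by_band(stations):
    """Average the bias ratio per class; stations = [(elevation_m, bias)]."""
    sums = {}
    for elevation_m, bias in stations:
        band = elevation_band(elevation_m)
        total, count = sums.get(band, (0.0, 0))
        sums[band] = (total + bias, count + 1)
    return {band: total / count for band, (total, count) in sums.items()}


# Hypothetical (elevation, bias ratio) pairs, for illustration only.
stations = [(790, 1.5), (1500, 0.9), (1900, 0.8), (2500, 0.7), (3098, 0.8)]
band_means = mean_bias_by_band(stations)
print(band_means)
```

A class mean above 1 at low elevation combined with means below 1 at medium and high elevation would correspond to the elevation-dependent pattern described for TAMSAT 2 and ARC 2.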

Figure 6. Statistical validation of the satellite products as a function of elevation: (a) Pearson correlation coefficient (r), (b) bias, and (c) the root mean square error (RMSE) over the Upper Blue Nile basin for the period of 2000–2015.

The same result is shown in Fig. 7, in which TAMSAT 2 and ARC 2 underestimate rainfall values at the higher- (Fig. 7a) and medium-elevation (Fig. 7b) stations, while they overestimate them at the lower-elevation station (Fig. 7c). The average dekadal values from all stations given in Fig. 7d further showed that TAMSAT 2 and ARC 2 consistently underestimate rain gauge values, while CHIRPS and TAMSAT 3 show very close estimation, with better performance from CHIRPS. The relatively good performance of CHIRPS at different elevations is partly due to the inclusion of typical physiographic indicators such as elevation during the development of the datasets (Funk et al., 2015). This could make CHIRPS a relatively better satellite rainfall product that might be used in complex topographic areas, such as the UBN basin, to detect the pattern and variability of precipitation.

Figure 7. Comparison of the satellite products at gauge stations with wide differences in elevation (e.g., > 2000 m), based on dekadal averages over the Upper Blue Nile basin for the period of 2000–2015: (a) at “Nefas Mewucha” station with an elevation of 3098 m a.s.l., (b) at “Majate” station with an elevation of 2000 m a.s.l., (c) at “Metema” station with an elevation of 790 m a.s.l., and (d) dekadal rainfall average from all rain gauge stations. The x axis represents the 36 dekads of a year.

A possible explanation for TAMSAT 2 and ARC 2 overestimations at lower elevation might be the deep convective nature of the Intertropical Convergence Zone (ITCZ), the main rain-producing mechanism in Ethiopia (Seleshi and Zanke, 2004), in the lower-elevation areas that results in too deep cold clouds that may stay for a number of days. The underestimations at higher elevation could be linked to the potential evaporation of rainfall at the cloud base in high-altitude areas. However, results (in Figs. 6, 7 and Table 2) provide confirmatory evidence that the recent TAMSAT product (TAMSAT 3) has addressed many of the weaknesses of TAMSAT 2 in complex topographic areas, particularly the bias ratios, and the improvement in this regard is very encouraging.

Furthermore, Fig. 6c shows that the RMSEs of satellite products have no significant relationship to elevation. Nevertheless, CHIRPS and TAMSAT 3 have scored the lowest average RMSE (30.02 and 32.24 mm dekad−1) in comparison to ARC 2 and TAMSAT 2 RMSE (38.44 and 38.13 mm dekad−1, respectively).

## 5.3 Monthly comparison

The daily rain gauge observations and the TAMSAT 3 and ARC 2 products were aggregated to monthly total rainfall, while the CHIRPS and TAMSAT 2 satellite rainfall products are available at a monthly timescale. The monthly comparison was made (i) using all monthly values from rain gauge observations and satellite products and (ii) after classifying the monthly values into 12 classes to further validate the satellite products for each specific month in the UBN basin.
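A minimal sketch of the temporal aggregation is given below. It assumes the common dekad convention in which each month is split into days 1 to 10, 11 to 20, and 21 to the end of the month (36 dekads per year). The function names and the synthetic daily series are hypothetical, and missing days are not handled.

```python
from collections import defaultdict
from datetime import date, timedelta


def dekad_key(day):
    """Return (year, month, dekad); dekad 3 runs from day 21 to month end."""
    return (day.year, day.month, min((day.day - 1) // 10 + 1, 3))


def month_key(day):
    """Return (year, month) for monthly totals."""
    return (day.year, day.month)


def aggregate(daily, key_func):
    """Sum daily rainfall (mm) into the periods defined by key_func."""
    totals = defaultdict(float)
    for day, rain_mm in daily:
        totals[key_func(day)] += rain_mm
    return dict(totals)


# Synthetic series: 1 mm on every day of July 2010 (illustration only).
start = date(2010, 7, 1)
daily = [(start + timedelta(days=i), 1.0) for i in range(31)]

dekadal = aggregate(daily, dekad_key)
monthly = aggregate(daily, month_key)
print(dekadal)
print(monthly)
```

With this convention the third dekad has between 8 and 11 days, so dekadal totals are not strictly comparable in length across months.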

### 5.3.1 Overall comparison at a monthly temporal scale

Table 3 presents the summary of the overall monthly validation results. Figure 8 shows scatter plots of rain gauge observations and satellite rainfall estimates at a monthly temporal scale.

Table 3. Summary of the point-to-grid evaluation at a monthly temporal scale using categorical, volumetric, and continuous statistical tools. Probability of detection (POD), false alarm ratio (FAR), critical success index (CSI), volumetric hit index (VHI), volumetric false alarm ratio (VFAR), volumetric critical success index (VCSI), correlation coefficient (r), bias, and the root mean square error (RMSE). The RMSE values are shown in millimeters.

In general, the overall monthly comparisons between the four SREs and the rain gauge observations have shown a better agreement than the comparison at dekadal temporal scale. This is as expected because errors at sub-monthly scale show closely symmetric characteristics and may finally cancel each other out following the aggregation to monthly temporal scale. The monthly comparisons, shown in Table 3, indicate CHIRPS' better performance in most validation tools than TAMSAT and ARC 2. However, CHIRPS still has high FAR values and overestimates the frequency of rainfall events by 14 %, but its monthly FAR value is much improved in comparison to the dekadal timescale analysis (FAR = 0.31). From comparison of the TAMSAT products, TAMSAT 2 has outperformed the newer TAMSAT 3 in the scores of POD and CSI, while they showed equal values in FAR, VFAR, and VCSI values. ARC 2 exhibited the lowest categorical and volumetric values.
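The cancellation of roughly symmetric errors can be illustrated with an idealized example. In this hypothetical series the dekadal errors within each month sum exactly to zero, which real estimates only approximate. All names and values are assumptions made for illustration.

```python
import math


def rmse(observed, estimated):
    """Root mean square error between two equally long series."""
    n = len(observed)
    return math.sqrt(sum((e - o) ** 2 for o, e in zip(observed, estimated)) / n)


def monthly_totals(dekadal):
    """Sum consecutive groups of three dekads into monthly totals."""
    return [sum(dekadal[i:i + 3]) for i in range(0, len(dekadal), 3)]


# Two hypothetical months (six dekads), values in mm.
gauge_dekads = [30.0, 40.0, 50.0, 20.0, 25.0, 15.0]
# Errors of +6, -4, -2 and -5, +3, +2 mm cancel within each month.
satellite_dekads = [36.0, 36.0, 48.0, 15.0, 28.0, 17.0]

dekadal_rmse = rmse(gauge_dekads, satellite_dekads)
monthly_rmse = rmse(monthly_totals(gauge_dekads),
                    monthly_totals(satellite_dekads))
print(round(dekadal_rmse, 2), monthly_rmse)
```

In practice the cancellation is only partial, so monthly scores improve relative to dekadal scores without the error vanishing.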

Additionally, from the continuous statistical analysis in Table 3 and the scatter plot in Fig. 8, good agreement was found between rain gauge observations and all four SREs (r ≥ 0.80). CHIRPS scored the highest correlation coefficient (r= 0.88) and the lowest RMSE (59.03 mm month−1), while ARC 2 resulted in the largest RMSE (79.21 mm month−1) and the weakest but still fairly good correlation coefficient (r= 0.80). On the other hand, the CHIRPS and TAMSAT 3 satellite products resulted in bias values close to the perfect score of 1.00, whereas TAMSAT 2 and ARC 2 showed poor bias ratios and underestimated monthly gauge-observed rainfall by 31 and 24 %, respectively. In this respect, TAMSAT 3 shows significant improvements over the weak bias values of the previous version, TAMSAT 2.

Figure 8. Scatter plot between rain gauge observations and satellite rainfall estimates at a monthly temporal scale over the Upper Blue Nile basin for the period of 2000–2015.

Overall, the skill of CHIRPS is still better than the other satellite rainfall estimates in the monthly timescale analysis as well. In fact, TAMSAT 3 has shown a comparable performance and very close scores, in the majority of validation tools, to CHIRPS, particularly to bias ratio, similar to the dekadal timescale analysis above.

### 5.3.2 Comparison at each month

The performances of the satellite rainfall products were also evaluated for each month of the UBN basin, where a different amount of rainfall is recorded. Thus, the monthly data from all the 32 stations for the validation period of 2000–2015 (both from SREs and rain gauge observations) were categorized into 12-month classes. The months from June to September (wet months) contribute the largest proportion of annual rainfall in the study area, followed by low-rainfall months (from February to May) and the dry months (from October to January). Figures 9 and 10 illustrate the performances of all four SREs for the categorical, volumetric and continuous statistical validation tools.
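The grouping into month classes and seasons described here could be sketched as follows. The season limits follow the text. The record layout (year, month, gauge total, satellite total) and all function names are assumptions, and the sample records are hypothetical.

```python
from collections import defaultdict

# Season membership by calendar month, as described in the text.
SEASONS = {
    "wet": (6, 7, 8, 9),
    "low_rainfall": (2, 3, 4, 5),
    "dry": (10, 11, 12, 1),
}


def season_of(month):
    """Return the season name for a calendar month (1 to 12)."""
    for name, months in SEASONS.items():
        if month in months:
            return name
    raise ValueError(f"invalid month: {month}")


def group_by_month(records):
    """Collect (gauge_mm, satellite_mm) pairs into 12 month classes."""
    classes = defaultdict(list)
    for year, month, gauge_mm, satellite_mm in records:
        classes[month].append((gauge_mm, satellite_mm))
    return dict(classes)


# Hypothetical monthly totals in mm, for illustration only.
records = [
    (2000, 7, 310.0, 295.0),
    (2001, 7, 280.0, 290.0),
    (2000, 1, 2.0, 6.0),
]
month_classes = group_by_month(records)
print(month_classes[7], season_of(7))
```

Each month class can then be scored separately with the same categorical, volumetric, and continuous statistics used for the overall comparison.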

The categorical and volumetric analysis for each month, presented in Fig. 9, revealed that the performances of all four satellite rainfall products are very encouraging during the wet months and show good agreement with rain gauge observations, as shown in the lower semicircle of the polar plot (i.e., high POD, VHI, CSI, and VCSI, and low FAR and VFAR). A similar result has been obtained by Young et al. (2014) and Dinku et al. (2011b) during the wettest periods in the Ethiopian Highlands and over the upper Nile region in Ethiopia, respectively, using TAMSAT 2, ARC 2, TRMM, and CMORPH satellite rainfall products. This might be because the number of hits is noticeably larger than the number of missed and false events during the wet months. However, over the upper semicircle of the polar plot in the same figure (Fig. 9), dominated by dry and low-rainfall months, the satellite products have shown a relatively wider difference in their skills. CHIRPS has scored the highest POD, VHI, CSI, and VCSI values in comparison to TAMSAT and ARC 2 products. A comparable finding has been reported by Toté et al. (2015) and Young et al. (2014). However, CHIRPS still has high FAR (up to 0.4) and VFAR (0.31) values, particularly during the month of January. The increase in FAR and VFAR values of CHIRPS is because of the overdetection of rainfall events, which can perhaps be linked to its calibration with TMPA 3B42. The rather weak performance of TAMSAT and ARC 2 products during the dry and low-rainfall months could be associated with low frequency of rain events owing to the lower amount of rainfall detected by the satellites. Overall, the results highlighted that the skill of CHIRPS is relatively better than that of TAMSAT and ARC 2 products and that it has good agreement with rain-gauge-observed rainfall data in both the wet and dry seasons, although it overpredicts rainfall events, particularly for dry and low-rainfall months.
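The contrast between event-based and volume-based scores can be made concrete with a short sketch. It assumes the usual contingency-table definitions of POD, FAR, and CSI and the volumetric forms of AghaKouchak and Mehran (2013). The 1 mm rain/no-rain threshold, the function names, and the sample values are hypothetical.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def skill_indices(gauge, satellite, threshold=1.0):
    """Categorical and volumetric indices for paired rainfall totals (mm)."""
    hits = misses = false_alarms = 0
    vol_hit = vol_miss = vol_false = 0.0
    for g, s in zip(gauge, satellite):
        gauge_rain, sat_rain = g > threshold, s > threshold
        if gauge_rain and sat_rain:      # hit: both report rain
            hits += 1
            vol_hit += s
        elif gauge_rain:                 # miss: only the gauge reports rain
            misses += 1
            vol_miss += g
        elif sat_rain:                   # false alarm: only the satellite
            false_alarms += 1
            vol_false += s
    return {
        "POD": _ratio(hits, hits + misses),
        "FAR": _ratio(false_alarms, hits + false_alarms),
        "CSI": _ratio(hits, hits + misses + false_alarms),
        "VHI": _ratio(vol_hit, vol_hit + vol_miss),
        "VFAR": _ratio(vol_false, vol_hit + vol_false),
        "VCSI": _ratio(vol_hit, vol_hit + vol_miss + vol_false),
    }


# Hypothetical totals in mm, for illustration only.
gauge = [0.0, 0.0, 20.0, 30.0, 50.0, 0.5]
satellite = [2.0, 0.0, 18.0, 0.4, 50.0, 3.0]
scores = skill_indices(gauge, satellite, threshold=1.0)
print({name: round(value, 2) for name, value in scores.items()})
```

In this made-up sample, half of the satellite rain events are false alarms (FAR = 0.50), yet they carry only about 7 % of the satellite-detected volume (VFAR ≈ 0.07), which mirrors the kind of behaviour reported for CHIRPS.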

Figure 9. Categorical and volumetric validation of the satellite products for each month of the Upper Blue Nile basin for the period of 2000–2015: (a) probability of detection (POD), (b) false alarm ratio (FAR), (c) critical success index (CSI), (d) volumetric hit index (VHI), (e) volumetric false alarm ratio (VFAR), and (f) volumetric critical success index (VCSI).

Figure 10. Statistical validation of the satellite products for each month of the Upper Blue Nile basin for the period of 2000–2015: (a) Pearson correlation coefficient (r), (b) bias ratio, and (c) the root mean square error (RMSE). The RMSE values are shown in millimeters.

As can be seen from the continuous statistical validation presented in Fig. 10a, the correlation coefficients for all four satellite rainfall products are generally low (as low as r= 0.03) during the dry months, shown in the upper semicircle of the polar plot, except for the month of October. However, over the low-rainfall and wet months, the correlation coefficient for CHIRPS was relatively high in comparison to TAMSAT and ARC 2, except for the months of February and May, with values of r= 0.53, r= 0.82, r= 0.72, and r= 0.77 for the months of March, June, July, and September, respectively. TAMSAT 3 has also scored comparable correlation values during these months alongside CHIRPS, while TAMSAT 2 and ARC 2 scored the weakest values. In the months of February (r= 0.58) and May (r= 0.79), the highest correlation coefficients were recorded by TAMSAT 2 and TAMSAT 3, respectively. Over the months of December, April, and August all four SREs scored low correlation values.

Further, in Fig. 10b, CHIRPS shows better bias ratios in all months, except in the months of November (0.78) and December (0.70), and shows a slight overestimation (1.13) in the month of February, when only a small amount of rainfall was recorded. This result is consistent with the CDF in Fig. 3, where CHIRPS slightly overestimates rain-gauge-observed values at low rainfall accumulations. TAMSAT 3 scored the second-best bias ratio next to CHIRPS, except for the months of December to March, in which it considerably underestimates rain-gauge-observed rainfall. On the other hand, TAMSAT 2 and ARC 2 result in a weak bias ratio for all months, mainly for the dry months indicated in the upper semicircle of the polar plot (Fig. 10b). Overall, the dependency of the CHIRPS bias ratio on the monthly temporal pattern, particularly during the wet season, is very minimal in comparison to TAMSAT 3. These bias ratios appear to indicate the potential of CHIRPS satellite rainfall estimates for hydrological applications. Following the performance of CHIRPS during months of high rainfall, Trejo et al. (2016) also suggested its use for hydrological applications. For hydrological monitoring, it is vital to accurately estimate significant rain events (Dinku et al., 2007). CHIRPS scored the lowest RMSE, followed by TAMSAT 3. TAMSAT 2 and ARC 2 presented the largest values of RMSE (Fig. 10c). In fact, RMSE is higher in the wet months due to increased amounts of rainfall.

6 Conclusions

This study set out with the aim of evaluating the performance of CHIRPS satellite rainfall estimates against 32 rain gauge observations over the Upper Blue Nile (UBN) basin in Ethiopia for the period of 2000 to 2015. Then, the performance of CHIRPS was compared with TAMSAT (TAMSAT 2 and TAMSAT 3) and ARC 2 rainfall products. In the course of the analysis, the TAMSAT and ARC 2 products were validated as well. The TAMSAT 2 rainfall estimate was used in this study mainly to assess the improvements made by the recent version of TAMSAT product (TAMSAT 3). A point-to-grid-based comparison was carried out at dekadal and monthly temporal timescale using categorical, volumetric and continuous statistical validation tools. The dekadal and monthly timescale data were further utilized for the validation of the SREs at different elevations and for each particular month of the UBN basin, respectively.

From the overall validation at dekadal and monthly temporal scales, CHIRPS has shown the highest skill, the lowest RMSE, and better bias values than TAMSAT 3 and ARC 2. Indeed, TAMSAT 3 has scored values very comparable to those of the CHIRPS product, particularly for the bias ratio, while ARC 2 underestimates rain-gauge-observed rainfall by 24 %. Although CHIRPS overpredicted rainy days (i.e., a high false alarm ratio), its volumetric false alarm ratios are much lower, and its overall performance was better than that of TAMSAT 3 and ARC 2. Since the volumes of rainfall detected by CHIRPS during false events were negligible, they made a minimal contribution to the total amounts of rainfall. The findings of this study, therefore, indicated that event-based analysis alone might not be enough to verify the skill of a satellite rainfall product, as small rain events might lead to wrong conclusions.

Validation at different elevations indicated that all the SREs have generally good agreement with rain gauge observations and that their performances are independent of elevation, except for the skill of TAMSAT 3 and ARC 2 in detecting rainfall events (POD). The PODs of TAMSAT 3 and ARC 2 have a considerable negative correlation ($r=-\mathrm{0.55}$) with elevation, and their skill in detecting rainfall events decreased with an increase in elevation, while CHIRPS results in a relatively small positive correlation (r= 0.34). Compared to all the satellite rainfall products, CHIRPS still scores better values at most elevations. In fact, TAMSAT 3 has also scored a comparable average bias ratio (1.07), quite close to CHIRPS' perfect score of 1.00. Moreover, the bias ratios of TAMSAT 2 and ARC 2 seem to be affected by variation in elevation.

Generally, the validation for each specific month of the study area indicated that the performances of the SREs are better during the wet months, except for the RMSE, and show good agreement with rain gauge observations. In fact, RMSE is higher in the wet months due to increased amounts of rainfall. The best values were scored by CHIRPS, closely followed by TAMSAT 3, particularly for the correlation coefficient and the bias ratio. However, over the majority of low-rainfall and dry months, the SREs have shown weak performance, especially for POD and VHI (TAMSAT and ARC 2); FAR and VFAR (CHIRPS); and CSI, VCSI, correlation coefficient, and bias (all four SREs). Nevertheless, the overall skill of CHIRPS is relatively good during these months as well and was better than that of TAMSAT 3 and ARC 2. Good performance has also been observed from TAMSAT 3 alongside CHIRPS, particularly for the bias ratios.

To summarize the results, the performance of CHIRPS in the UBN basin is very encouraging and relatively better than the other satellite rainfall products (TAMSAT and ARC 2). More specifically, the reliable performance of CHIRPS at different elevations and during the wet months could make the product more appropriate for various hydrological and rainfall analysis functions in complex topographic areas, such as the UBN basin. The performance of TAMSAT 3 is very comparable to CHIRPS product and scores close values to CHIRPS in many of the validation indicators, particularly the bias ratios. This validation study has also provided confirmatory evidence that the recent version of the TAMSAT product (TAMSAT 3) has well addressed many of the weaknesses of TAMSAT 2 (e.g., underestimations up to 31 % in this study) in complex topographical areas, and the improvement in this regard is very encouraging. Future work will involve validation of the product in different rainfall categories and spatial and temporal scale as well as during drought and wet periods for complete understanding of its potential.

Data availability

Rainfall data for this study were collected from ground-based weather stations and remote sensing satellite estimates. Gauge station rainfall data are provided by the National Meteorological Agency of Ethiopia (http://www.ethiomet.gov.et, NMAE, 2016) upon request. The CHIRPS satellite rainfall estimates are publicly available and can be downloaded free of charge (ftp://ftp.chg.ucsb.edu/pub/org/chg/products/CHIRPS-2.0/, last access: 29 June 2017). The daily TAMSAT satellite rainfall estimates are freely available from the University of Reading research data archive (version 2.0, Maidment et al., 2017b; version 3.0, Maidment et al., 2017c). Alternatively, TAMSAT 2 products are provided at dekadal, monthly, and seasonal temporal resolution and can be accessed through https://www.tamsat.org.uk/data/archive (TAMSAT, 2017). The ARC 2 rainfall data are available from ftp://ftp.cpc.ncep.noaa.gov/fews/fewsdata/africa/arc2/geotiff/ (last access: 24 June 2017).

Competing interests

The authors declare that they have no conflict of interest.

Acknowledgements

This research was supported in part by the NASA Interdisciplinary Research in Earth Sciences Program, NASA Greater Horn of Africa Project NNX14AD30G and the Entoto Observatory & Research Center postgraduate program research fund. We are grateful to the data providers of CHIRPS, TAMSAT and ARC 2. The National Meteorological Agency of Ethiopia is also gratefully acknowledged for providing gauge station rainfall data found in the UBN basin.

Edited by: Piet Stammes
Reviewed by: Bozena Lapeta and one anonymous referee

References

AghaKouchak, A. and Mehran, A.: Extended contingency table: Performance metrics for satellite observations and climate model simulations, Water Resour. Res., 49, 7144–7149, https://doi.org/10.1002/wrcr.20498, 2013.

Conway, D.: Some aspects of climate variability in the northeast Ethiopian highlands-Wollo and Tigray, Ethiopian Journal of Science, 23, 139–161, 2000.

Conway, D.: From headwater tributaries to international river: Observing and adapting to climate variability and change in the Nile basin, Glob. Environ. Change, 15, 99–114, https://doi.org/10.1016/j.gloenvcha.2005.01.003, 2005.

Degefu, G. T.: The Nile Historical Legal and Developmental Perspectives, Trafford Publishing, St. Victoria, Canada, 2003.

Dinku, T., Ceccato, P., Grover-Kopec, E., Lemma, M., Connor, S., and Ropelewski, C.: Validation of satellite rainfall products over East Africa's complex topography, Int. J. Remote Sens., 28, 1503–1526, https://doi.org/10.1080/01431160600954688, 2007.

Dinku, T., Chidzambwa, S., Ceccato, P., Connor, S., and Ropelewski, C.: Validation of high resolution satellite rainfall products over complex Terrain, Int. J. Remote Sens., 29, 4097–4110, https://doi.org/10.1080/01431160701772526, 2008.

Dinku, T., Ceccato, P., and Connor, S.: Challenges of satellite rainfall estimation over mountainous and arid parts of east Africa, Int. J. Remote Sens., 32, 5965–5979, https://doi.org/10.1080/01431161.2010.499381, 2011a.

Dinku, T., Connor, S., and Ceccato, P.: Evaluation of satellite rainfall estimates and gridded gauge products over the Upper Blue Nile region, in: Nile River Basin, edited by: Melesse, A. M., Springer: Dordrecht, The Netherlands, Heidelberg, Germany, London, and New York, NY, 109–127, 2011b.

Dugdale, G., McDougall, V., and Milford, J.: Rainfall estimates in the Sahel from cold cloud statistics: Accuracy and limitations of operational systems, in: Proceedings of the Niamey Workshop on Soil Water Balance in the Sudano-Sahelian Zone, IAHS Publication, February 1991, 51–67, 1991.

Fenta, A., Rientjes, T., Haile, A., and Reggiani, P.: Satellite rainfall products and their reliability in the Blue Nile Basin, in: Nile River Basin, edited by: Melesse, A. M., Abtew, W., and Setegn, S. G., Springer International Publishing Switzerland, 109–127, 2014.

Funk, C., Peterson, P., Landsfeld, M., Pedreros, D., Verdin, J., Shukla, S., Husak, G., Rowland, J., Harrison, L., Hoell, A., and Michaelsen, J.: The climate hazards infrared precipitation with stations – a new environmental record for monitoring extremes, Sci. Data., 2, 1–21, https://doi.org/10.1038/sdata.2015.66, 2015.

Gebere, S., Alamirew, T., Merkel, B., and Melesse, A.: Performance of High Resolution Satellite Rainfall Products over Data Scarce Parts of Eastern Ethiopia, Remote Sens., 7, 11639–11663, https://doi.org/10.3390/rs70911639, 2015.

Gebremichael, M., Krajewski, W., Morrissey, M., Huffman, G., and Adler, R.: A detailed evaluation of GPCP one-degree daily rainfall estimates over the Mississippi River Basin, J. Appl. Meteorol., 44, 665–681, 2005.

Gebremichael, M., Bitew, M., Hirpa, F., and Tesfaye, G.: Accuracy of satellite rainfall estimates in the Blue Nile Basin, Lowland plain versus highland mountain, Water Resour. Res., 50, 8775–8790, https://doi.org/10.1002/2013WR014500, 2014.

Grimes, D., Pardo-Igúzquiza, E., and Bonifacio, R.: Optimal areal rainfall estimation using rain gauges and satellite data, J. Hydrol., 222, 93–108, 1999.

Hirpa, F., Gebremichael, M., and Hopson, T.: Evaluation of High-Resolution Satellite Precipitation Products over Very Complex Terrain in Ethiopia, J. Appl. Meteorol. Clim., 49, 1044–1051, https://doi.org/10.1175/2009JAMC2298.1, 2010.

Hsu, K. and Sorooshian, S.: Satellite-based precipitation measurement using PERSIANN system, in: Hydrological Modelling and the Water Cycle, Springer, Berlin, Heidelberg, 63, 27–48, 2008.

Huang, J. and van den Dool, H.: Monthly precipitation temperature relations and temperature prediction over the United States, J. Climate, 6, 1111–1132, 1993.

Huffman, G., Bolvin, D., Nelkin, E., Wolff, D., Adler, R., Gu, G., Hong, Y., Bowman, K., and Stocker, E.: The TRMM Multisatellite Precipitation Analysis (TMPA): Quasi-Global, Multiyear, Combined-Sensor Precipitation Estimates at Fine Scales, J. Hydrometeorol., 8, 38–55, 2007.

Joyce, R., Janowiak, J., Arkin, P., and Xie, P.: CMORPH: A Method That Produces Global Precipitation Estimates from Passive Microwave and Infrared Data at High Spatial and Temporal Resolution, J. Hydrometeorol., 5, 487–503, 2004.

Kim, U., Kaluarachchi, J., and Smakhtin, V.: Generation of Monthly precipitation under climate change for the upper Blue Nile River Basin, Ethiopia, J. Am. Water Resour., 44, 1231–1247, 2008.

Knapp, R., Ansari, S., Bain, L., Bourassa, A., Dickinson, J., Funk, C., Helms, N., Hennon, C., Holmes, D., Huffman, J., Kossin, P., Lee, T., Loew, A., and Magnusdottir, G.: Globally gridded satellite observations for climate studies, B. Am. Meteorol. Soc., 92, 893–907, 2011.

Kummerow, C., Hong, Y., Olson, W., Yang, S., Adler, R., McCollum, J., Ferraro, R., Petty, G., Shin, D., and Wilheit, T.: The evolution of the Goddard Profiling Algorithm (GPROF) for rainfall estimation from passive microwave sensors, J. Appl. Meteorol., 40, 1801–1820, 2001.

Maidment, R., Grimes, D., Allan, R., Tarnavsky, E., Stringer, M., Hewison, T., Roebeling, R., and Black, E.: The 30 year TAMSAT African Rainfall Climatology and Time series (TARCAT) data set, J. Geophys. Res.-Atmos., 119, 619–10, https://doi.org/10.1002/2014JD021927, 2014.

Maidment, R., Grimes, D., Black, E., Tarnavsky, E., Young, M., Greatrex, H., Allan, R., Stein, T., Nkonde, E., Senkunda, S., and Alcántara, E.: A new, long-term daily satellite-based rainfall dataset for operational monitoring in Africa, Sci. Data., 4, 170063, https://doi.org/10.1038/sdata.2017.63, 2017a.

Maidment, R., Black, E., and Tarnavsky, E.: TAMSAT daily rainfall estimates (version 2.0) dataset, University of Reading, https://doi.org/10.17864/1947.108, 2017b.

Maidment, R., Black, E., and Young, M.: TAMSAT daily rainfall estimates (version 3.0) datasets, University of Reading, https://doi.org/10.17864/1947.112, 2017c.

National Meteorology Agency of Ethiopia (NMAE): Data service, available at: http://www.ethiomet.gov.et, last access: 6 August 2016.

Novella, N. and Thiaw, W.: African rainfall climatology version 2 for famine early warning systems, J. Appl. Meteorol. Clim., 52, 588–606, https://doi.org/10.1175/JAMC-D-11-0238.1, 2013.

Romilly, T. G. and Gebremichael, M.: Evaluation of satellite rainfall estimates over Ethiopian river basins, Hydrol. Earth Syst. Sci., 15, 1505–1514, https://doi.org/10.5194/hess-15-1505-2011, 2011.

Seleshi, Y. and Zanke, U.: Recent changes in rainfall and rainy days in Ethiopia, Int. J. Climatol., 24, 973–983, https://doi.org/10.1002/joc.1052, 2004.

Stillman, S., Ninneman, J., Zeng, X., Franz, T., Scott, R., Shuttleworth, W., and Cummins, K.: Summer Soil Moisture Spatiotemporal Variability in Southeastern Arizona, J. Hydrometeorol., 15, 1473–1485, https://doi.org/10.1175/JHM-D-13-0173.1, 2014.

TAMSAT: TAMSAT data archive, available at: https://www.tamsat.org.uk/data/archive, last access: 16 February 2017.

Tarnavsky, E., Grimes, D., Maidment, R., Black, E., Allan, R., Stringer, M., Chadwick, R., and Kayitakire, F.: Extension of the TAMSAT satellite based rainfall monitoring over Africa and from 1983 to present, J. Appl. Meteorol. Clim., 53, 2805–2822, https://doi.org/10.1175/JAMC-D-14-0016.1, 2014.

Taye, M. and Willems, P.: Temporal variability of hydroclimatic extremes in the Blue Nile basin, Water Resour. Res., 48, 1–13, https://doi.org/10.1029/2011WR011466, 2012.

Thorne, V., Coakeley, P., Grimes, D., and Dugdale, G.: Comparison of TAMSAT and CPC rainfall estimates with rain gauges for southern Africa, Int. J. Remote Sens., 22, 1951–1974, 2001.

Toté, C., Patricio, D., Boogaard, H., van der Wijngaart, R., Tarnavsky, E., and Funk, C.: Evaluation of Satellite Rainfall Estimates for Drought and Flood Monitoring in Mozambique, Remote Sens., 7, 1758–1776, https://doi.org/10.3390/rs70201758, 2015.

Trejo, F., Barbosa, H., Murillo, M., and Farias, M.: Intercomparison of improved satellite rainfall estimation with CHIRPS gridded product and rain gauge data over Venezuela, Atmósfera, 29, 323–342, https://doi.org/10.20937/ATM.2016.29.04.04, 2016.

Worqlul, A. W., Maathuis, B., Adem, A. A., Demissie, S. S., Langan, S., and Steenhuis, T. S.: Comparison of rainfall estimations by TRMM 3B42, MPEG and CFSR with ground-observed data for the Lake Tana basin in Ethiopia, Hydrol. Earth Syst. Sci., 18, 4871–4881, https://doi.org/10.5194/hess-18-4871-2014, 2014.

Wu, H., Adler, R., Hong, Y., Tian, Y., and Policelli, F.: Evaluation of global flood detection using satellite-based rainfall and a hydrologic model, J. Hydrometeorol., 13, 1268–1284, https://doi.org/10.1175/JHM-D-11-087.1, 2012.

Young, M., Williams, C., Chiu, J., Maidment, R., and Chen, S.: Investigation of Discrepancies in Satellite Rainfall Estimates Over Ethiopia, J. Hydrometeorol., 15, 2347–2369, https://doi.org/10.1175/JHM-D-13-0111.1, 2014.