Something fishy going on? Evaluating the Poisson hypothesis for rainfall estimation using intervalometers: results from an experiment in Tanzania

Abstract. A new type of rainfall sensor (the intervalometer), which counts the arrival
of raindrops at a piezo electric element, is implemented during the Tanzanian
monsoon season alongside tipping bucket rain gauges and an impact
disdrometer. The aim is to test the validity of the Poisson hypothesis
underlying the estimation of rainfall rates using an experimentally determined
raindrop size distribution parameterisation based on
Marshall and Palmer (1948)'s exponential one. These parameterisations are defined
independently of the scale of observation and therefore implicitly assume that
rainfall is a homogeneous Poisson process. The results show that
28.3 % of the total intervalometer observed rainfall patches can
reasonably be considered Poisson distributed and that the main reasons for
Poisson deviations of the remaining 71.7 % are non-compliance with
the stationarity criterion (45.9 %), the presence of correlations
between drop counts (7.0 %), particularly at higher arrival rates
($\rho_A > 500$ m$^{-2}$ s$^{-1}$), and failing a $\chi^2$
goodness-of-fit test for a Poisson distribution (17.7 %). Our results
show that whilst the Poisson hypothesis is likely not strictly true for
rainfall that contributes most to the total rainfall amount, it is quite useful
in practice and may hold under certain rainfall conditions. The
parameterisation that uses an experimentally determined power law relation
between $N_0$ and rainfall rate results in the best estimates of rainfall
amount compared to co-located tipping bucket measurements. Despite the
non-compliance with the Poisson hypothesis, estimates of total rainfall amount
over the entire observational period derived from disdrometer drop counts are
within 4 % of co-located tipping bucket measurements. Intervalometer
estimates of total rainfall amount overestimate the co-located tipping bucket
measurement by 12 %. The intervalometer principle shows potential for
use as a rainfall measurement instrument.



Preface
The core scientific work of this Master of Science (MSc) thesis has been written up in the form of a journal paper titled: Is something fishy going on? Evaluating the Poisson hypothesis for rainfall estimation using intervalometers: first results from an experiment in Tanzania, hereafter referred to as "the paper". The paper will be submitted to the journal Atmospheric Measurement Techniques of the European Geosciences Union shortly after the conclusion of the MSc defense. The paper is included in this thesis as chapter 1, which is the core work of this thesis and can be read as a stand-alone document. The remainder of this thesis consists of supplementary materials in the form of introductory notes and an appendix.
Appendix A contains a README for the Python code that was developed during the course of the MSc research. Several thousand lines of code were written and, as much as possible, the author has attempted to write the code in such a way that it is easily legible to anybody who may be seeking to build on this research or re-analyse the data. To this end, the code is well spaced, sensible variable names have been chosen and a consistent naming hierarchy has been employed to the best of the author's ability. Despite these efforts it may still be difficult to understand the logical flow and purpose of each script or function individually, as well as how they relate to one another. The README therefore explains the task that each Python script or function completes, as well as the overall logical flow in the data analysis that links one script or function to the next. The actual code files are submitted as further supplementary materials alongside this thesis and can be found in the education repository at https://repository.tudelft.nl, by searching for the title of this MSc thesis.
Is something fishy going on? Evaluating the Poisson hypothesis for rainfall estimation using intervalometers: first results from an experiment in Tanzania. (TAH, 2017). In general, African climate has not been well researched (Otto et al., 2015; Washington et al., 2006).
There is a need for robust, inexpensive and accurate rainfall measuring instruments. For example, a recent review into the scaling up of index insurance for smallholder farmers (some of the world's poorest people) found that the sparsity of ground-based weather stations is a large challenge for insurers in Sub-Saharan Africa (Greatrex et al., 2015), and companies have been forced to look to other sources of data or to develop other indices by which to insure crops. Satellite missions, such as the Global Precipitation Measurement (GPM) mission, show good potential for bridging this gap. However, satellite observations, whilst providing good spatial coverage, do not cover the entire temporal period and the spatial resolution may be too coarse for some applications.
Satellite data face another issue in areas that lack ground-based data for validation. Since radars do not measure rainfall directly, rainfall estimates are dependent on an accurate parameterisation of the drop size distribution (DSD) in order to develop rainfall (R) to radar reflectivity (Z) relationships (Munchak and Tokay, 2008; Guyot et al., 2019). A foundational work in this regard is the negative exponential parameterisation presented by Marshall and Palmer (1948) as a fit to filter paper measurements of rain drop sizes for different rain rates. A lot of work has been done on determining the functional forms for these parameterisations and many different forms of the DSD have been proposed, of which the most widely used are the aforementioned exponential, gamma (Ulbrich, 1983; Tokay and Short, 1996; Iguchi et al., 2017) and lognormal distributions (Feingold and Levin, 1986). It has also been shown that the appropriate parameterisation is dependent on the type of rainfall (Atlas and Ulbrich, 1977) and the climatic setting (Battan, 1973; Bringi et al., 2003). Therefore, ground 'truthing' of DSDs for satellite retrievals is very important to ensure that the DSD is being parameterised correctly in the derivation of rainfall rates (Munchak and Tokay, 2008).
An assumption that is seldom explicitly mentioned in the presentation of these parameterisations is the homogeneity assumption. The concept of the DSD is only useful if at some minimum scale raindrops are distributed homogeneously in space and time. If this were not the case then the parameterisation would depend on the size of the sample volume, area or time period to which it pertains. Statistical homogeneity implies that the frequency of raindrops in a volume or arriving at a surface in fixed time intervals obeys Poisson statistics. The arrival of raindrops at a surface has long been considered an example of a Poisson process (Kostinski and Jameson, 1997; Joss and Waldvogel, 1969). However, this assumption has been questioned and several studies argue that the homogeneity assumption is unable to cope with the clumping of raindrops, both in time and space, that is observed in reality. To borrow Jameson and Kostinski's (1997) example: the 'streakiness' that is part of the lived experience of rainfall can be seen when sheets of rain pass across the pavement during thunderstorms. This clumping results in greater variability than is expected from Poisson statistics.
Generally, two different approaches have been taken to explain the enhanced variability. Some studies (Lovejoy and Schertzer, 1990; Lavergnat and Golé, 1998) propose to abandon the Poisson process framework and replace it with a scale dependent, multi-fractal framework. Others propose to generalize the homogeneous Poisson process (with a constant mean) to a doubly stochastic Poisson process or Cox process, where the mean itself is a random variable (Jameson and Kostinski, 1998; Smith, 1993). Abandoning the Poisson framework would require an entire re-working of how rainfall estimates are derived in radar meteorology.

The aim of this study is to formally assess the adequacy of the Poisson assumption and its importance in deriving rainfall estimates from ground based measurements. To this end nine intervalometers were deployed over a two month period during the Tanzanian tropical monsoon.

Instruments
Three different types of instrument were used in the experiment: a tipping bucket rain gauge made by Onset, in the US, equipped with a HOBO datalogger; an acoustic disdrometer made by Disdro in Delft, The Netherlands; and an intervalometer. The intervalometer is a simple device that registers the arrival of raindrops at the surface of a piezo electric drum. It has a minimum detectable drop diameter ($D_{\min}$) of 0.8 mm, determined by Pape (2018) in a lab experiment. Typical values of $D_{\min}$ for impact disdrometers are between 0.3 and 0.6 mm (Johnson et al., 2011). The larger than typical $D_{\min}$ value for the intervalometer means that the instrument may give slight underestimates of long term rainfall rates. The advantage of the intervalometer over a standard rain gauge is that the drop counts can be used to constrain radar observations. Furthermore, the combination of intervalometer measurements with rain gauge data can be used to give crude estimates of the observed mean drop sizes. More information about the intervalometer can be found at https://github.com/nvandegiesen/Intervalometer/wiki/Intervalometer or in Pape's (2018) report. The acoustic disdrometer registers the kinetic energy of drop impacts at a drum and converts this to an estimate of the drop size. It can be thought of as an intervalometer that not only counts drops but also provides estimates of the drop size. The minimum detectable drop diameter for the disdrometer is 0.6 mm. The tipping bucket rain gauge collects all drops over a surface area and funnels them to a small bucket with a resolution of 0.2 mm. When the bucket is full, it tips over. The volume of each tip is verified in situ via a field calibration experiment (WMO, 2014). A good discussion of the pros and cons of impact disdrometers can be found in e.g. Tokay et al. (2001) and Guyot et al. (2019), and for tipping buckets in e.g. Sevruk (1985) and WMO (2014).
In total, the experiment made use of nine intervalometers, one acoustic disdrometer and two tipping bucket rain gauges at eight different sites.

Experiment
Eight sites were selected along the southern coast of Mafia Island, Tanzania, in an approximate line, such that a rectangle 3.1 km in length and 500 m in width would cover all the sites. The dimension of the long axis of the experiment was chosen to be approximately that of the spatial resolution (5 km) of the GPM mission's Dual-frequency Precipitation Radar (DPR) instrument.

Data Availability
Over the course of the experiment there were some issues with the various instruments that affect the availability of data. The disdrometer picked up an oscillating signal from 20/05/2018 onwards that resulted in total corruption of the data. Some intervalometers experienced water damage, particularly in storms with high rainfall intensities, which caused the instruments to go offline for certain periods, and two were damaged beyond repair. The tipping bucket gauges experienced no known issues.

The complete data record is presented in figure 1.
Figure 1. Record of "Instrument Online" for each intervalometer site and the total rainfall amount [mm] from the tipping bucket at Pole Pole.
3 Methods
3.1 Deriving rainfall rates from rain drop arrival rates
An excellent review of the exponential parameterization of the DSD, including full derivations, exists in the literature; a short summary, mostly derived from that work, is presented below. The raindrop size distribution in a volume of air $N_V(D)$ [mm$^{-1}$ m$^{-3}$] is defined such that the quantity $N_V(D)\,dD$ represents the number of drops with diameters between $D$ and $D + dD$ per unit volume of air. Marshall and Palmer (1948) proposed a negative exponential parameterisation for $N_V(D)$, based on filter paper measurements, of the form:
$$N_V(D) = N_0 \exp(-\Lambda D)$$
If raindrops are assumed to fall at terminal velocity then $N_V(D)$ can be related to the DSD of drops arriving at a surface, $N_A(D)$, by
$$N_A(D) = v(D)\, N_V(D)$$
where $v(D)$ describes the relationship between drop diameter and terminal fall velocity. $N_A(D)$ is the form of the DSD that is observed by disdrometers and intervalometers (Smith, 1993).
$$v(D) = \alpha D^{\beta} \quad (5)$$
Atlas and Ulbrich (1977) showed that $\alpha = 3.778$ [m s$^{-1}$ mm$^{-\beta}$] and $\beta = 0.67$ [-] provide a close fit to the data of Gunn and Kinzer (1949) for 0.5 mm $\leq D \leq$ 5.0 mm. The mean rainfall arrival rate $\rho_A$ [m$^{-2}$ s$^{-1}$] is defined as the integral over all drop sizes of $N_A(D)$. For the intervalometer this is the integral between $D_{\min} = 0.8$ mm and $\infty$, since the instrument has a minimum detectable drop diameter:
$$\rho_A = \int_{D_{\min}}^{\infty} N_A(D)\,dD = \alpha N_0 \Lambda^{-(\beta+1)}\, \Gamma(\beta+1, \Lambda D_{\min})$$
where $\Gamma$ is the upper incomplete gamma function. It has been shown that, for self-consistency, the use of $\Lambda = 4.1R^{-0.21}$ requires $\alpha = 3.25$ and $\beta = 0.762$, which are quite similar to the values presented by Atlas and Ulbrich (1977). Using these $\alpha$, $\beta$ values and the Marshall and Palmer (1948) $R-\Lambda$ relationship, the rainfall rate ($R$) can then be calculated from the measured rainfall arrival rate $\rho_A$ by first calculating the rainfall arrival rate at different rainfall rates (0 to 200 mm/h in steps of 0.01 mm/h) and then fitting a third-order polynomial to the curve in Python using polyfit. The fit is forced through the origin and returns a correlation coefficient of approximately 1. The constants of the fitted polynomial can then be used to calculate the rainfall rate from the measured arrival rate. A plot of the $\rho_A-R$ curve and fitted polynomial is shown in figure 2.
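The chain from rainfall rate to arrival rate and back can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the thesis code: the intercept N0 = 8000 mm^-1 m^-3 is the classic Marshall and Palmer value and is assumed here, the truncated integral is evaluated with Simpson's rule instead of a library incomplete gamma function, and the inversion uses bisection as a stand-in for the third-order polynomial fit described above. All function names are hypothetical.

```python
import math

# Values quoted in the text, except N0, which is the classic Marshall-Palmer
# intercept and is an assumption of this sketch.
N0 = 8000.0    # [mm^-1 m^-3]
ALPHA = 3.25   # [m s^-1 mm^-beta]
BETA = 0.762   # [-]
D_MIN = 0.8    # [mm], intervalometer detection limit


def slope(rain_rate):
    """Marshall-Palmer slope parameter Lambda [mm^-1] for a rain rate in mm/h."""
    return 4.1 * rain_rate ** -0.21


def implied_rain_rate(rain_rate):
    """Rain rate [mm/h] from integrating the drop volume flux over the full DSD."""
    lam = slope(rain_rate)
    return 6e-4 * math.pi * ALPHA * N0 * math.gamma(4 + BETA) / lam ** (4 + BETA)


def arrival_rate(rain_rate, d_min=D_MIN, steps=2000):
    """Arrival rate rho_A [m^-2 s^-1] of drops larger than d_min (Simpson's rule)."""
    lam = slope(rain_rate)
    d_max = d_min + 50.0 / lam  # the integrand is negligible beyond this diameter
    h = (d_max - d_min) / steps

    def flux(d):
        return ALPHA * d ** BETA * N0 * math.exp(-lam * d)

    total = flux(d_min) + flux(d_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * flux(d_min + i * h)
    return total * h / 3.0


def rain_rate_from_arrival(rho_a, low=1e-3, high=500.0, iterations=60):
    """Invert arrival_rate by bisection; rho_A increases monotonically with R."""
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if arrival_rate(mid) < rho_a:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


if __name__ == "__main__":
    print(implied_rain_rate(10.0))                     # close to 10 if self-consistent
    print(rain_rate_from_arrival(arrival_rate(25.0)))  # recovers roughly 25
```

The first function is a quick check of the self-consistency argument: with the quoted $\alpha$ and $\beta$, the rain rate implied by the distribution should come back close to the rain rate used to set $\Lambda$. A polynomial fit would be faster for long records, whereas bisection avoids any fitting error in this sketch.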

Calibrating the intervalometer
The intervalometer is still in development as an instrument; if it gives poor estimates, a more trusted and proven instrument can be used to calibrate it. Sources of measurement error for the intervalometer are the calibration of the parameter $D_{\min}$ and the measurement of $\rho_A$. Errors in the determination of $D_{\min}$ affect the $\rho_A - R$ relationship. Errors in the rainfall arrival rate can result from splashing of drops from outside the sensor onto the sensor surface during high intensity rainfall (resulting in overestimates), from spurious drops caused by something other than rain falling on the sensor (resulting in overestimates), or from edge effects (resulting in underestimates). Drops with $D > D_{\min}$ landing near the edges of the sensor have a dampened signal and may not be recorded if $D$ is quite close to $D_{\min}$. If there are a priori disdrometer measurements of the mean drop size for different rainfall intensities or arrival rates, these can be used to constrain the intervalometer measurements and provide more accurate rainfall estimates. The observed mean drop sizes can be incorporated into the parameterisation to ensure that the expected mean drop sizes of the parameterised gamma distribution match the disdrometer measurements.
$N_A(D)$ is a gamma distribution; in this case truncated at $D_{\min}$.
The expected value (mean) of a left truncated gamma distribution and of a complete gamma distribution is given by (e.g. Johnson et al., 2011):
$$\mu_{D_A} = \frac{1}{\Lambda}\,\frac{\Gamma(\beta+2) - \gamma(\beta+2, \Lambda D_{\min})}{\Gamma(\beta+1) - \gamma(\beta+1, \Lambda D_{\min})} \quad \text{(truncated)}, \qquad \mu_{D_A} = \frac{\beta+1}{\Lambda} \quad \text{(complete)}$$
where $\gamma$ is the lower incomplete gamma function. Now, if the observed mean drop sizes ($\mu_{D_A,obs}$) are some function of $\rho_{A,obs}$, $f(\rho_{A,obs})$, then we can express the expected rainfall rate ($R_{exp}$) and a 'corrected' rainfall rate ($R_{corr}$) as functions of the expected and observed mean drop sizes by using the relationship $\Lambda = 4.1R^{-0.21}$. A good first guess for the form of $f(\rho_{A,obs})$ is the expectation of the gamma parameterisation above, but it could be any function or simply the observed data at each rainfall arrival rate. For the complete gamma distribution an analytical solution exists.
Dividing $R_{corr}$ by $R_{exp}$ gives the scaling between the two. $D_{eff}$ is an effective parameter that scales the expected rainfall rates (calculated from the parameterisation of the DSD) to the observed DSD. Alternatively, if the intervalometer is co-located with a rain gauge then the independent observations of rainfall can be used to give an estimate of the mean drop size relation by re-arranging equation 16.
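For the complete gamma case the scaling can be written out explicitly. The sketch below is one plausible reading of the derivation rather than the thesis implementation. It assumes that the mean diameter of arriving drops is $(\beta+1)/\Lambda$ and that $\Lambda = 4.1R^{-0.21}$ holds for both the expected and the corrected rainfall rate, so that the ratio of the two rates is the ratio of mean diameters raised to the power 1/0.21. The function names are hypothetical.

```python
BETA = 0.762  # [-], velocity exponent quoted in the text


def expected_mean_diameter(rain_rate):
    """Mean diameter [mm] of arriving drops for the complete gamma form."""
    lam = 4.1 * rain_rate ** -0.21  # Marshall-Palmer slope [mm^-1]
    return (BETA + 1.0) / lam


def corrected_rain_rate(rate_expected, mean_diameter_observed):
    """Scale the expected rain rate [mm/h] so the mean diameter matches the observation."""
    ratio = mean_diameter_observed / expected_mean_diameter(rate_expected)
    return rate_expected * ratio ** (1.0 / 0.21)


if __name__ == "__main__":
    # Observed drops smaller than expected pull the corrected rate down.
    print(corrected_rain_rate(10.0, 0.9 * expected_mean_diameter(10.0)))
```

Because the exponent 1/0.21 is close to 4.8, a modest mismatch in mean drop size translates into a large change in rainfall rate, which is why the calibration is sensitive to the drop size input.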
There is a small time delay between the first registering of drops on the surface of the intervalometer and the first tip of the tipping bucket, because the bucket must fill before it tips. Therefore it is not recommended to directly compare instantaneous rainfall rates (Ciach, 2003). However, by averaging over longer time periods, such as the length of an entire day, reasonable total rainfall amounts can be obtained in order to calculate $R_{eff}$ in equation 22.
For the truncated gamma distribution a numerical approach is required and can be implemented in Python.
Dividing $R_{corr}$ by $R_{exp}$ again gives the scaling for the truncated case. Note that equation 26 is the same as equation 16 with an extra term to account for the truncation. The $D_{\min}$ values in that equation are 0.6 mm for the observed drop sizes (from the disdrometer) and 0.8 mm for the expected drop sizes (from the intervalometer). $R_{exp}$, $\Lambda_{exp}$ and $\mu_{D_A,exp}$ are calculated from the observed rainfall arrival rate and the relevant equations. This relationship can also be used to derive estimates of the 'corrected' mean drop size by incorporating co-located tipping bucket measurements $R_{obs}$ with intervalometer estimates $R_{exp}$. Note that, since the tipping bucket measures all drops, it has no minimum detectable drop size; therefore equation 11 and equation 12 can be combined with the $R-\Lambda$ relationship to obtain an expression from which the 'corrected' drop sizes can be calculated directly.

Testing the Poisson homogeneity hypothesis
The concept of a drop size distribution depends on the assumption that at some minimum spatial or temporal scale (the primary element) the data is homogeneous. Homogeneity in a statistical sense implies that the data within the primary element follows Poisson statistics. In order for a process to be reasonably assumed as Poisson some key assumptions must hold. As applied to rainfall, these are as follows:
1. The random process is stationary.
2. The event counts in non-overlapping time intervals are statistically independent.
3. The probability of an event occurring during a time interval $t, t + \delta t$ is proportional to $\delta t$.
4. The probability of more than one event in a time interval $t, t + \delta t$ becomes negligible for sufficiently small $\delta t$.
If these fundamental assumptions hold then the distribution of event counts (rain drops) is given by (e.g. Feller, 2010):
$$p(k) = \frac{\mu^{k} e^{-\mu}}{k!}$$
where $\mu$ is the mean value per unit time and $k$ is the random number of drops observed during a particular counting interval (window of time). Kostinski and Jameson (1997) show that this evenly mixed Poisson model does not explain the clumpiness and super-Poisson variability that is observed in real rainfall. However, if $\mu$ itself is an unpredictable, random variable in time and space then a rainfall event can be sub-divided into $N$ patches, each with its own $\mu$. In order to derive an unconditioned PDF of the drop counts it is necessary to integrate over the probability distribution of the patches $f(\mu)$:
$$p(k) = \int_{0}^{\infty} \frac{\mu^{k} e^{-\mu}}{k!}\, f(\mu)\, d\mu$$
The variance of the Poisson mixture is enhanced beyond the variance of a pure Poisson PDF. Kostinski and Jameson (1997) show that the Poisson mixture provides a better description of the frequency of drop arrivals per unit time than a simple Poisson model. The definition of $f(\mu)$ in equation 31 implies that there is a definable coherence time $\tau$ over which $\mu$ can be considered stationary and to which the simple Poisson model can be applied. Estimating $f(\mu)$ with sufficient accuracy requires $t \ll \tau \ll T$, where $T$ is the entire length of a rainfall event, $\tau$ is the coherence time of a patch and $t$ is the counting interval for the raindrops. Kostinski and Jameson (1997) show that an order of magnitude difference between $t$ and $\tau$, and between $\tau$ and $T$, is sufficient.
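The variance enhancement can be illustrated with a toy mixture. The sketch below assumes a hypothetical event made of two equally likely patch types with mean counts of 2 and 10 drops per counting interval (values chosen for illustration only) and computes the mean and variance of the resulting count distribution directly from the probability mass function. For a pure Poisson process the variance equals the mean; for the mixture it is the mean plus the variance of the patch means.

```python
import math


def poisson_pmf(k, mu):
    """Probability of k drops in a counting interval with mean count mu."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def mixture_pmf(k, patch_means, weights):
    """Poisson mixture: weighted sum of Poisson pmfs, one per patch mean."""
    return sum(w * poisson_pmf(k, mu) for mu, w in zip(patch_means, weights))


def mean_and_variance(pmf, k_max=200):
    """Moments of a count distribution, truncated at k_max drops."""
    mean = sum(k * pmf(k) for k in range(k_max + 1))
    variance = sum((k - mean) ** 2 * pmf(k) for k in range(k_max + 1))
    return mean, variance


# Hypothetical patch means and weights, chosen for illustration only.
PATCH_MEANS = [2.0, 10.0]
WEIGHTS = [0.5, 0.5]

if __name__ == "__main__":
    mean, variance = mean_and_variance(
        lambda k: mixture_pmf(k, PATCH_MEANS, WEIGHTS))
    print(mean, variance)  # the variance clearly exceeds the mean
```

In this toy case the mean count is 6 while the variance is 22: the extra 16 is the variance of the patch means themselves, which is the super-Poisson contribution described above.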
For the intervalometer data, rain drops are aggregated into 10 second bins. Therefore, the minimum value for $\tau$ is 100 s and for $T$ it is 1000 s. The length of $\tau$ can be determined by calculating the normalized auto-correlation function of a rainfall event of length $T$ at increasing lag times. The lag time for which the auto-correlation drops below $1/e$ is defined as $\tau$ (Kostinski and Jameson, 1997). A rainfall event can then be sub-divided into $N$ patches of length $\tau$ and the fundamental Poisson assumptions can be tested on each patch.
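A minimal sketch of the coherence time estimate is given below. It assumes drop counts already aggregated into 10 second bins, uses a simple biased estimator of the normalized auto-correlation, and returns the first lag at which the auto-correlation falls below $1/e$. The function names and the synthetic test series are illustrative assumptions, not part of the thesis code.

```python
import math


def autocorrelation(counts, lag):
    """Normalized auto-correlation of a count series at the given lag (in bins)."""
    n = len(counts)
    mean = sum(counts) / n
    variance_sum = sum((c - mean) ** 2 for c in counts)
    if variance_sum == 0:
        return 0.0  # a constant series carries no correlation information
    covariance_sum = sum(
        (counts[i] - mean) * (counts[i + lag] - mean) for i in range(n - lag))
    return covariance_sum / variance_sum


def coherence_time(counts, bin_seconds=10.0):
    """First lag time [s] at which the auto-correlation drops below 1/e."""
    threshold = 1.0 / math.e
    for lag in range(1, len(counts)):
        if autocorrelation(counts, lag) < threshold:
            return lag * bin_seconds
    return None  # the series never decorrelates within its own length


if __name__ == "__main__":
    # Synthetic event: alternating 200 s patches of light and heavy drop counts.
    event = ([5] * 20 + [50] * 20) * 4
    print(coherence_time(event))
```

For the synthetic event the estimate is 70 s, shorter than the 200 s patch length because the auto-correlation of a blocky series decays linearly with lag.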
Assumptions 3 and 4 are trivial for rainfall, and assumptions 1 and 2 can be tested. A hierarchical test is used, where a patch of rainfall of length $\tau$ must pass each test before moving on to the next test. Note that, since it is impossible to know where such a patch of length $\tau$ may start or end in the data record, it is best to view $\tau$ as a moving window over which the statistical tests are conducted. Upon conclusion of the tests, the window shifts one data point forward in time and the tests are conducted again. This methodology also ensures that the number of effective samples is increased. The system of hierarchical tests is as follows.

1. Stationarity, assessed with two complementary tests:
(a) The KPSS null hypothesis is that the process is trend stationary.
(b) The ADF null hypothesis is that the series has a unit root (not stationary).
2. Auto-correlation function at increasing lag times must be within the 95% confidence limit (CL) of a Poisson process with n samples.
3. $\chi^2$ test for goodness of fit between the observed frequencies and the expected frequencies of a Poisson distribution with the same mean, p-value = 0.05.
4. Dispersion criterion, such that the observed dispersion must be within the 95% CL of a Poisson distribution of n samples.
5. Calculation of Kullback-Leibler (KL) divergence to give a sample independent indication of how well the observed distribution matches the Poisson distribution. The KL divergence, also known as the relative entropy, between two probability density functions is commonly used as a measure of similarity or 'distance' between the distributions (Hershey and Olsen, 2007).
Tests 1 and 2 test the stationarity and independence assumptions of a Poisson process. Test 3 checks that the distribution matches a Poisson distribution, and tests 4 and 5 are quality checks to ensure that the tests are providing good results. The quality checks are used because the sample size over which each test is conducted is often quite small. Figure 3 shows an example of a patch of rainfall that passes all of the tests and can therefore reasonably be assumed to comply with the Poisson homogeneity assumption.
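The two quality checks (tests 4 and 5) can be sketched with the standard library alone. The confidence limit used below is an assumption: it relies on the normal approximation that, for Poisson data, the dispersion index (variance divided by mean) has a standard deviation of about $\sqrt{2/(n-1)}$, which may differ from the exact limit used in the thesis. The KL divergence is computed from the observed relative frequencies to a Poisson distribution with the same mean. The stationarity, auto-correlation and $\chi^2$ tests are not reproduced here, and the function names and example patches are hypothetical.

```python
import math
from collections import Counter


def dispersion_index(counts):
    """Sample variance divided by sample mean; equal to 1 for an ideal Poisson sample."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


def dispersion_within_limits(counts, z=1.96):
    """Approximate 95% check, assuming var/mean is roughly normal around 1."""
    half_width = z * math.sqrt(2.0 / (len(counts) - 1))
    return abs(dispersion_index(counts) - 1.0) <= half_width


def kl_divergence_to_poisson(counts):
    """KL divergence from the observed pmf to a Poisson pmf with the same mean.

    The patch must contain at least one drop, otherwise the mean is zero.
    """
    n = len(counts)
    mean = sum(counts) / n
    divergence = 0.0
    for k, frequency in Counter(counts).items():
        p_observed = frequency / n
        log_p_poisson = -mean + k * math.log(mean) - math.lgamma(k + 1)
        divergence += p_observed * (math.log(p_observed) - log_p_poisson)
    return divergence


if __name__ == "__main__":
    clumpy_patch = [0] * 50 + [20] * 50  # two very different arrival regimes
    print(dispersion_index(clumpy_patch), kl_divergence_to_poisson(clumpy_patch))
```

A patch with a dispersion index far above 1 and a large KL divergence, such as the clumpy example, would be flagged as non-Poisson even before the formal tests are considered.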
The rainfall can be characterised by uncorrelated fluctuations around a constant mean rate of arrival, in this case 365.7 m$^{-2}$ s$^{-1}$. The corresponding probability mass function (pmf) of this patch of rainfall along with the expected pmf function

Rainfall Rates
The total rainfall amounts [mm] observed by the co-located tipping bucket, intervalometer and disdrometer at the main site (Pole Pole) for the longest 'online' period of the three instruments are presented in figure 5. Estimates of total rainfall derived from the disdrometer arrival rates are in good agreement with the tipping bucket record (the records match to within the standard error).
This is not the case for the intervalometer, which provides a large over-estimate of the total rainfall compared to the tipping bucket (by a factor of almost 3). The figure also shows that the intervalometer registers much higher arrival rates than the disdrometer over all rainfall events despite having a smaller sensor area and a larger minimum detectable drop size. Calibration of the intervalometer rainfall estimate (using equation 26) by the observed mean drop sizes results in good agreement with the tipping bucket record as a whole (within 8% of the tipping bucket value). Figure 6 shows the performance of the calibrated intervalometer estimates.
Figure caption (partial): [...] (in black) are the rainfall arrival rates measured by the disdrometer and intervalometer, respectively. The suffixes 'Exp' and 'Adj' refer to the un-calibrated rainfall from the exponential parameterisation and the calibrated rainfall, respectively.
In total, 51% of all rainfall patches fail the tests due to changes in the mean arrival rate or the presence of correlations between drop counts on scales as small as 2 minutes. It should be noted that these patches of rainfall are characterised by higher arrival rates (e.g. the rainfall that fails the independence test has a mean $\rho_A$ that is approximately 3 times the Poisson value). [...] is larger than what is expected for Poisson statistics. Again, this rainfall is characterised by higher rainfall arrival rates.
These average values are quite representative for all the sites, except for Chole Mjini. This site is atypical in that it was only online for a relatively short period between 30/04/2018 and 08/05/2018, and during this period 77% of all the rainfall was classified as Poisson. This can be clearly seen in the two middle panels of figure 9. The time series of rainfall arrival rates clearly show that the mean rainfall arrival rate is a good predictor of 'Poisson-ness'. Patches of rain with high rainfall arrival rates are typically not classified as Poisson, whereas patches of rainfall with low arrival rates are. This can be clearly seen in the top two right hand panels, where the rainfall peak does not pass the Poisson tests but the consistent light rainfall, characterised by a low rainfall arrival rate, does.
The reason for the high percentage of Poisson rain at the Chole Mjini site is that the rainfall over this period is dominated by consistent rainfall with a low arrival rate. The signature of this storm can also be seen in the other sites that were online. The data also show a positive correlation between the mean drop sizes and the arrival rate. This consistent, light, stratiform type of rainfall is quite atypical for the rainfall record as a whole. The time series for Pole Pole (top left) and Meremeta (bottom right) show that the record is dominated by intermittent sharp peaks of mostly non-Poisson rainfall followed by dry spells.
In the middle panel of figure 10 the reason for failing to be classified as Poisson rain is also presented for each data point. This panel also clearly shows that Poisson rain is found almost entirely at the bottom of the arrival rate range, $\rho_A \leq 600$ m$^{-2}$ s$^{-1}$, as was seen for the intervalometer. This range of rainfall arrival rates contributes little to the total rainfall: 69% of all drops fall in this range but they contribute only 16% to the total rainfall. Data greater than 2100 m$^{-2}$ s$^{-1}$ exclusively fail the stationarity and independence tests. This rainfall is therefore characterised by correlations between drop counts and fluctuations in the mean arrival rate at scales smaller than 2-22 minutes. At arrival rates between 700 and 1300 m$^{-2}$ s$^{-1}$ the rainfall is a mixture of Poisson rain and, mostly, patches of rainfall that fail the $\chi^2$ test. Data that fail the $\chi^2$ test are patches of stationary rainfall with uncorrelated fluctuations about the mean. However, the data are over- or under-dispersed compared to the expected Poisson value of 1 and therefore do not match the Poisson distribution. Mostly, these data are over-dispersed, i.e. the variance is greater than expected from Poisson statistics. As the arrival rate increases to between 1400 and 2000 m$^{-2}$ s$^{-1}$, a higher proportion of rainfall (in the sub-range) fails the stationarity and independence tests, indicating that the rainfall is becoming more and more dynamic (rapid changes in the mean and correlations between drop counts).
In the bottom panel, trends in the mean drop size for Poisson and non-Poisson rain are presented. The expected mean drop size of the parameterisation at each arrival rate is also shown. The expected drop sizes are a slight over-estimate of the observed drop sizes, although they are well within the standard error. The overall agreement between the expected and observed drop sizes is quite good, in particular over the region between 500 and 2500 m$^{-2}$ s$^{-1}$, which contributes most to the total rainfall amount. This region accounts for 63% of the total rainfall amount. The parameterisation overestimates most of the drop sizes at arrival rates greater than 2500 m$^{-2}$ s$^{-1}$; however, the data become quite sparse at higher arrival rates. The positive trend in mean drop size expected by the parameterisation is not as clear for the Poisson data as for the non-Poisson data. At arrival rates less than 700 m$^{-2}$ s$^{-1}$ the Poisson mean drop sizes are larger than the parameterisation and non-Poisson values, and at arrival rates greater than 700 m$^{-2}$ s$^{-1}$ the opposite is the case.

Rainfall Rates
Accurate estimates of total rainfall can be derived from disdrometer arrival rate measurements using Marshall and Palmer's (1948) parameterisation with no adjustment. This is because the expected mean drop size of the parameterisation shows good agreement with the observed mean drop sizes, i.e. it is within the standard error of the observed mean. In particular, the expected and observed values match quite closely over the range of rainfall arrival rates that contribute most to the total rainfall (63% of total rainfall occurs between 500 and 2500 m$^{-2}$ s$^{-1}$). The parameterisation under-estimates the observed mean drop size at low arrival rates ($\rho_A \leq 500$ m$^{-2}$ s$^{-1}$). However, it is known that impact disdrometers underestimate the number of small drops, and the number of drops in general, due to the truncation of drops below the detection limit. Therefore, this difference between the parameterisation and the observed values could be a result of under-reporting of small drops by the instrument. This leads to underestimates of rainfall at low arrival rates. The parameterisation also overestimates the mean size of drops at high arrival rates, which leads to over-estimates of the rainfall at high arrival rates. The key point is that Marshall and Palmer's (1948) parameterisation provides a good estimate of the observed mean drop sizes and consequently accurate rainfall estimates can be derived.
This is not the case for the intervalometer estimates of rainfall. The intervalometer results in large over-estimates (by a factor of approximately 3) of the total rainfall amount. This is because the intervalometer registers higher arrival rates than the disdrometer during each rainfall event at Pole Pole. The intervalometer has a smaller sensor area and a larger $D_{\min}$ value than the disdrometer, so it should not register higher arrival rates. The possible reasons for the overestimation are: splashing from the intervalometer housing onto the sensor during intense rainfall events; spurious drops due to an electromagnetic signal or physical interference with the sensor; or a minimum detectable drop diameter that is actually smaller than 0.8 mm. Comparison of the rainfall arrival rate records for the disdrometer and intervalometer, for example in figure 5, shows that when the intervalometer senses rain, so too does the disdrometer, and vice versa. Spurious drops from an interfering signal would also be expected to register outside the rainfall periods. This is not observed. Throughout the observational period and during all rainfall events the intervalometer registers a higher rainfall arrival rate than the disdrometer, i.e. the intervalometer overestimates are not confined to intense rainfall periods. These two findings indicate that spurious drops and splashing are unlikely causes for the higher arrival rates registered by the intervalometer. It is most likely that the parameter $D_{\min}$ was poorly determined and the intervalometer registers drops that are smaller than 0.8 mm. The overestimation of rainfall occurs because the parameterisation expects a much larger mean drop size than what is likely observed by the intervalometer.
Since the intervalometer and the disdrometer employ a similar sensor, it is reasonable to assume that the drop sizes observed by the intervalometer are of a similar size to those observed by the disdrometer. Using this assumption, the intervalometer results are calibrated by the expected values of the mean drop size for the disdrometer. This results in accurate rainfall rates for the intervalometer compared to the tipping bucket (within 5 %) for the entire experiment. This indicates that the actual minimum detectable drop size for the intervalometer is most likely closer to 0.6 mm than 0.8 mm.
Both forms of the calibration also result in good estimates at another intervalometer site approximately 1 km away (within 9 % of the tipping bucket value). This indicates that the observed mean drop size, and therefore the DSD, is reasonably stable over scales of 1 km. The calibration also gives good results outside the period of time when the disdrometer was online. The last rainfall estimate from the intervalometer is approximately 1 month later than the last measurement by the disdrometer. This indicates that the DSD is also relatively stable over the entire two month period of the experiment. The derivation of reasonably accurate rainfall measurements with both the disdrometer and the intervalometer indicates that Marshall and Palmer's (1948) parameterisation of the DSD is a good approximation of the observed DSD over the period of the experiment. The intervalometer also shows good potential for being used to derive estimates of the parameters β and D min of the drop size distribution.
Using only 22 data points it was possible to estimate β = 0.37, D min = 0.53. More work, with a larger data-set is necessary to fully assess the validity of using intervalometer measurements for deriving estimates of the DSD parameters but this first step shows good promise.

The results show that the majority of rainfall does not comply with the Poisson Homogeneity assumption. Over all the sites only 22.5 % of all the raindrops observed by the intervalometers can be reasonably assumed to behave according to Poisson statistics.
For the disdrometer only 15 % of the rainfall behaves according to Poisson statistics. The majority of this Poisson rainfall is to be found in a series of "atypical" rainfall events. These events are atypical because they are characterised by consistent periods of light rainfall that have a duration of up to several hours interspersed with sharper peaks of higher intensity. The rest of the rainfall record is characterised by short intense showers with high arrival rates preceded and followed by dry periods. It seems that rainfall can be divided into two types over the experimental period: consistent light rain, which is most often classified as Poisson, and short, intense showers that are never classified as Poisson.
The results of the tests indicate that high arrival rates are indicative of rainfall which has a fluctuating mean on very short time scales (< 2 min in some cases). Rainfall with high arrival rates is also characterised by correlations between drop counts on very short time scales. Almost all of the rainfall that contributes most to the total rainfall amount does not exhibit characteristics that are consistent with Poisson statistics. One would then expect that estimates of rainfall based on a parameterisation that has been defined independently of the size of a reference volume, thus implying an assumption of homogeneity, would not return good results. This is not the case.
Estimates of rainfall are good and, more surprisingly, the trend in mean drop size with increasing rainfall arrival rate is not only consistent with expected values derived from the parameterisation but also appears to be mostly captured by non-Poisson rainfall. The trend in the mean drop size of Poisson rainfall with increasing arrival rate is much less clear. This would imply that estimates of rainfall derived from an exponential parameterisation of the DSD would be less accurate over the patches of rainfall that contain Poisson rain as opposed to patches of rainfall that contain non-Poisson rain. In figure 11 two different rainfall patches of a similar total rainfall amount but very different arrival rate profiles are compared. One event contains a significant proportion of Poisson rain and the other contains no Poisson rain. The figure clearly shows that the quality of the rainfall estimate is much worse for the rainfall event that contains Poisson rain. In that event rainfall is underestimated by approximately 41 %. In the rainfall event with no Poisson rain, the parameterised estimate is within 10 % of the tipping bucket value. This seems to indicate that the presence of Poisson rainfall leads to worse rainfall estimates. However, the rainfall event with Poisson rain also contains a significantly higher proportion of light rainfall in general (both Poisson and not) compared to the rainfall event with no Poisson rain. It is known that impact disdrometers underestimate the numbers of small drops and therefore the rainfall rate at low rainfall arrival rates. So, whilst the figure does show that rainfall estimates are worse when there is Poisson rainfall, this cannot be disentangled from the fact that rainfall estimates in general are also worse when arrival rates are low. More work needs to be done in order to understand if the poor rainfall estimates are due to Poisson rain or are simply an artefact of the measuring instrument. However, this research does show that the compliance with the Poisson [...]

Figure 11. The performance of the rainfall parameterisation over a period of rainfall with a high proportion of "Poisson Rain" (top panel) compared to a period of rainfall with a similar total rainfall amount but with no "Poisson Rain" (bottom panel). The disdrometer estimates are plotted against the tipping bucket values and the rainfall arrival rate in both panels.

One of the criticisms that arises with the statistical tests employed in this research is that the tests are less strict with smaller sample sizes and also at lower arrival rates. This could bias the results such that rainfall with low arrival rates is more likely to pass all of the tests. This was understood by Cornford (1967) and led to his simple sampling criterion requiring 23 drops per bin size. This criterion is not fulfilled in this study. However, as is pointed out by Kostinski and Jameson (1997) and Jameson and Kostinski (1998), rainfall conditions are changing rapidly, sometimes at scales less than 2 minutes. The presence of these fine structures within rainfall would be obscured by larger sampling windows. Furthermore, sampling across such structures with different means may actually lead to increased uncertainty in the mean.
This increased uncertainty in the mean over an entire rainfall event would make it impossible to test the homogeneous Poisson assumption because rainfall is very rarely stationary over longer time periods. Therefore, in such cases the sampling criteria need to be adjusted to account for the patch size. In this research it was decided to treat τ as a moving window to increase the effective number of samples and account for the small sample size. In this way the same tests are run on each of the drop counts many times, providing more robust and reliable results.
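The moving-window idea can be illustrated with a minimal sketch. It is not the test hierarchy used in this research: it only slides a window over the binned drop counts and computes the index of dispersion (variance divided by mean), which is close to 1 for Poisson-distributed counts. The function names and the window length expressed in bins are assumptions made for illustration.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by mean; close to 1 for Poisson-distributed counts."""
    m = mean(counts)
    return variance(counts) / m if m > 0 else float("nan")


def moving_window_dispersion(counts, window):
    """Apply the dispersion index to every window of `window` consecutive bins.

    `window` is a hypothetical window length in bins (at least 2).
    Consecutive windows overlap, so each drop count is tested many times.
    """
    return [
        dispersion_index(counts[i:i + window])
        for i in range(len(counts) - window + 1)
    ]


# Hypothetical drop counts per time bin.
example_counts = [1, 3, 1, 3]
example_result = moving_window_dispersion(example_counts, window=2)
```

Because the windows overlap, the results for neighbouring windows are not independent, which should be kept in mind when interpreting how often a test is passed.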

Conclusions
This research leads to the following conclusions.

1. The majority of rainfall that was observed is not consistent with Poisson statistics on observation scales from 2-22 minutes. The observed Poisson rainfall is characterised by low mean rainfall arrival rates. No Poisson rain is observed with [...]
2. The majority of the Poisson rainfall can be associated with a series of storms over a three day period that are atypical in comparison to the entire observed rainfall period. These storms are characterised by long periods of light stratiform type rainfall, most likely caused by a large scale synoptic forcing. The rest of the rainfall record is mostly comprised of convective showers.
3. The homogeneous Poisson assumption does not apply for the majority of rainfall observed in this study. Rainfall shows correlations between drop counts and changes in the mean at scales as small as 2 min. It is possible that rainfall is homogeneously distributed at smaller time scales but these would be so small as to invalidate the very concept of a drop size distribution.
4. Despite the apparent invalidity of the homogeneous Poisson assumption, plots of mean drop sizes against rainfall arrival rate reveal that the expected mean drop sizes from Marshall and Palmer's (1948) parameterisation show good agreement with observed values both over spatial scales of 1 km and a temporal period of 2 months.
5. Total cumulative rainfall estimates derived from the disdrometer drop counts are within the standard error of the total rainfall amount measured by a co-located tipping bucket over the same time period.
6. The intervalometers at both tipping bucket sites give large overestimates of the total rainfall. This is most likely due to a poor calibration of the parameter D min. The actual D min is most likely close to 0.6 mm. Constraining the intervalometer arrival rates by the observed mean drop sizes results in rainfall estimates that are within 5-10 % of tipping bucket measurements. The form of the constraint relationship is the parameterisation used for the disdrometer measurements. The accuracy of rainfall estimates is determined by the accuracy of the DSD parameterisation.
7. It is possible to determine reasonable rainfall estimates using an intervalometer. It is also likely that the intervalometer can be used in conjunction with co-located rain gauges to give good estimates of mean drop sizes and therefore the parameters of the exponential DSD. In turn this may improve satellite-derived rainfall estimates. Due to its low cost, the instrument shows good potential for being deployed in Africa to alleviate the observational crisis.

mean_drop_sizes.py
A more detailed description of each script and function is presented in the next section.

A.2.1. Script Name: import_DC.py
Overview This script reads all of the raw intervalometer txt files from each of the sites and processes the data. The raw data comes in four forms, all mixed into one txt file: a millisecond Unix timestamp at the start of each txt file; a timestamp (in millisecond Unix time) for each time a drop is registered by the sensor; a check timestamp every 10 minutes, which confirms that the intervalometer is online even when it is not raining; and a voltage stamp (depending on the version of the Arduino software installed on the intervalometer). Version 6 includes voltage readings and Version 5 does not.
The script separates the drop data from the check and voltage data and records the start and end times of each txt file. The end time is taken as the time of the last drop. Unix time is converted to date-time in UTC. The script also imports a manual record of when the intervalometer was being physically handled, e.g. when working around the sensor to download data. The script deletes all drops registered within the manual record windows since these are spurious drops from touching the sensor. The start and end times of all the txt files are used to determine a continuous record for the instrument, i.e. the time period when the instrument was online and registering rainfall. Finally the continuous record data, drop arrival data and check/voltage data are each saved to their own txt file.
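The two cleaning steps described above (conversion of millisecond Unix time to UTC and removal of drops registered while the sensor was being handled) could be sketched as follows. This is an illustration using only the Python standard library, not the code of the script itself; the function names, the representation of the handling windows as (start, end) pairs and the inclusive window boundaries are assumptions.

```python
from datetime import datetime, timezone


def to_utc(ms):
    """Convert a millisecond Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000.0, tz=timezone.utc)


def remove_handled_drops(drop_ms, handling_windows):
    """Discard every drop timestamp that falls inside a handling window.

    `handling_windows` is a hypothetical list of (start_ms, end_ms) pairs
    taken from the manual record; the boundaries are treated as inclusive.
    """
    return [
        t for t in drop_ms
        if not any(start <= t <= end for start, end in handling_windows)
    ]


# Hypothetical example: three drops, the second one registered during handling.
drops = [1_500_000_000_000, 1_500_000_060_000, 1_500_000_120_000]
windows = [(1_500_000_050_000, 1_500_000_070_000)]
clean = remove_handled_drops(drops, windows)
clean_utc = [to_utc(t) for t in clean]
```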

Usage
Modify the root_path variable within this script so that it points to the folder where all the data has been saved. No other changes are necessary. This script can be run from the terminal by calling: python path_to_script -s 'Name_of_site' For example: python '/Users/didierdevilliers/Documents/TU_Delft/Graduation/Python_Scripts/import_DC.py' -s 'Didimiza' Alternatively, you can open the script in a python IDE, such as Spyder, and run it within that environment.
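A command-line option such as -s can be read with the argparse module from the Python standard library. The sketch below shows one way this might look; the long option name --site, the function name and the help texts are assumptions and are not taken from the script.

```python
import argparse


def parse_args(argv=None):
    """Parse the site name from the command line (or from a list, for testing)."""
    parser = argparse.ArgumentParser(
        description="Process the raw intervalometer files of one site."
    )
    # "-s" is the option used in the usage example; "--site" is an assumed long form.
    parser.add_argument("-s", "--site", required=True,
                        help="name of the intervalometer site, e.g. Didimiza")
    return parser.parse_args(argv)


# Equivalent to calling the script with: -s 'Didimiza'
args = parse_args(["-s", "Didimiza"])
```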

Inputs
The excel file containing the manual record of working around the sensors and all the raw intervalometer data txt files.

Outputs
The script produces three txt files which contain the continuous record data, the processed drop arrival data and the check/voltage data. These files are saved to sub-folders within the root_path folder.

A.2.2. Script Name: import_disdro.py
Overview This script reads the raw disdrometer csv file and removes any spurious drops from within the manual record windows that may have been caused by touching the sensor. The 'cleaned' data is saved to a txt file.

Usage
Modify the root_path variable within this script so that it points to the folder where all the data has been saved. No other changes are necessary. This script can be run from the terminal by calling: python path_to_script. Alternatively, you can open the script in a python IDE, such as Spyder, and run it within that environment.

Inputs
The excel file containing the manual record of working around the sensors and the csv file containing the raw disdrometer data.

Outputs
One txt file containing the cleaned drop data from the disdrometer.

A.2.3. Script Name: import_tb.py
Overview This script imports the raw tipping bucket data in csv files from Shamba Kilole, MIL1 and Pole Pole and removes any spurious tips from within the manual record windows. The results of the field calibration are also applied to convert from tips to mm of rainfall. The cleaned and processed rainfall amounts as well as the tips are saved to a txt file.

Usage
Modify the root_path variable within this script so that it points to the folder where all the data has been saved. No other changes are necessary. This script can be run from the terminal by calling: python path_to_script. Alternatively, you can open the script in a python IDE, such as Spyder, and run it within that environment.

Inputs
The csv files of raw tipping bucket data from the three sites and the excel file containing the manual record of working around the sensors.

Outputs
A txt file containing the rainfall tips and volume of each tip (determined by a field calibration experiment).

A.2.4. Script Name: continuous_record_dc.py
Overview This script reads all the processed intervalometer drop data txt files that were generated with the import_DC.py script as well as the files containing the continuous record (periods of online operation) for each intervalometer. All the drop data within the continuous record periods is merged into one data frame for each site along with an indication of which continuous record period a drop corresponds to. The complete continuous record of drops for each site is then re-sampled into 10 second time bins. The re-sampled drop data as well as the continuous record of drops are saved to separate txt files for each site. I.e. this script takes many different txt files of drop data from each site and combines them into two txt files, one with data that has been re-sampled to 10s time bins and one with the original timestamps. The script also makes some plots with the drop data for each site.
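The re-sampling step can be illustrated with a short sketch that counts drops in fixed 10 s bins. It uses only the Python standard library and is not the code of the script itself; the function name and the dictionary output format are assumptions.

```python
from collections import Counter


def bin_drop_counts(drop_ms, bin_s=10):
    """Count drops per fixed-width time bin.

    Returns {bin_start_ms: count} for every bin between the first and the
    last drop, including empty bins. `drop_ms` holds millisecond timestamps.
    """
    if not drop_ms:
        return {}
    width = bin_s * 1000
    counts = Counter((t // width) * width for t in drop_ms)
    first, last = min(counts), max(counts)
    return {start: counts.get(start, 0)
            for start in range(first, last + width, width)}


# Hypothetical drop timestamps in milliseconds.
binned = bin_drop_counts([0, 4000, 9999, 10000, 35000])
```

Empty bins are only meaningful inside a continuous record period. In this sketch the function would have to be applied to each continuous record period separately so that offline periods are not filled with zeros.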

Usage
Modify the root_path, drop_path and cr_path variables within this script so that they point to the relevant folders. No other changes are necessary. This script can be run from the terminal by calling: python path_to_script. Alternatively, you can open the script in a python IDE, such as Spyder, and run it within that environment.

Inputs
Processed drop data for each intervalometer site as well as the continuous record data for each intervalometer site.

Outputs
Two txt files for each intervalometer site, one containing the entire record of continuous drop data for the site and the other containing the same record but re-sampled into 10s time bins.

A.2.5. Script Name: continuous_record_disdro.py
Overview This script reads the processed disdrometer data txt file that was generated with the import_disdro.py script and deletes data after the malfunction date. The remaining data is re-sampled into 10s time bins. For each bin some basic statistical indices of the drop sizes (mean, median, var etc) are calculated.
The processed data are saved to a txt file. The script also makes some plots of the drop arrival time series.

Usage
Modify the root_path and disdro_path within this script so that they point to the relevant folders. No other changes are necessary. This script can be run from the terminal by calling: python path_to_script. Alternatively, you can open the script in a python IDE, such as Spyder, and run it within that environment.

Inputs
Processed drop data from the disdrometer.

Outputs
One txt file containing re-sampled drop data and basic statistical indices for each 10s bin.

A.2.6. Script Name: stations_online.py
Overview This script reads the processed tipping bucket data from Pole Pole generated by import_tb.py, the processed disdrometer data generated by continuous_record_disdro.py and the continuous record data for each intervalometer site generated by import_DC.py. It uses these to generate a plot showing the periods within the data record when the different instruments are online or offline in comparison to one another. The plot is saved to a specified path.

Usage
Modify the root_path variable and the path where the figure is saved so that they point to the relevant folders. No other changes are necessary. This script can be run from the terminal by calling: python path_to_script. Alternatively, you can open the script in a python IDE, such as Spyder, and run it within that environment.

Inputs
Continuous record data, processed tipping bucket data and processed disdrometer data.

Outputs
A plot saved in a png file.

A.2.7. Script Name: Poisson_Testing.py
Overview This script reads the re-sampled disdrometer and intervalometer data (for each site) and performs the following tasks. It first separates the raindrop record into distinct rain events using the function seperate_rain_events.py, where a new event starts after a dry period of more than 1 hour between consecutive drops. The auto-correlation of each rainfall event at increasing lag times is calculated. The lag time at which the auto-correlation drops below a set threshold is defined as τ. A check is performed to determine if t ≪ τ ≪ T, where t = 10 s and T is the length of the rainfall event. If the rainfall event passes this 'Kostinski' criterion it is labelled a 'Kostinski' storm. The script then passes all the Kostinski storms to another function called poisson_test.py. This function performs all of the hierarchical tests for 'Poisson-ness' on each of the Kostinski storms for each of the sites. The distinct rain events, Kostinski storms and the results of the Poisson tests are all saved to their own txt file. This analysis is performed for the intervalometer data at each site as well as the disdrometer data.
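The event separation and the search for the decorrelation lag can be sketched as follows. This is a minimal illustration with the Python standard library, not the code of the script or its functions. The function names are hypothetical and the auto-correlation threshold is left as an argument because its value is not stated here.

```python
def split_rain_events(drop_s, max_gap_s=3600):
    """Split drop times (in seconds) into events separated by dry gaps longer than max_gap_s."""
    events = []
    for t in sorted(drop_s):
        if events and t - events[-1][-1] <= max_gap_s:
            events[-1].append(t)      # gap short enough: same event
        else:
            events.append([t])        # first drop, or dry gap exceeded: new event
    return events


def autocorrelation(counts, lag):
    """Sample auto-correlation of binned drop counts at the given lag (in bins)."""
    n = len(counts)
    m = sum(counts) / n
    var = sum((c - m) ** 2 for c in counts)
    if var == 0 or lag >= n:
        return 0.0
    cov = sum((counts[i] - m) * (counts[i + lag] - m) for i in range(n - lag))
    return cov / var


def decorrelation_lag(counts, threshold):
    """First lag (in bins) at which the auto-correlation falls below `threshold`."""
    for lag in range(1, len(counts)):
        if autocorrelation(counts, lag) < threshold:
            return lag
    return None  # the auto-correlation never falls below the threshold


# Hypothetical drop times: the last drop follows a dry gap of 3601 s.
events = split_rain_events([0, 10, 3610, 7211])
```

Multiplying the returned lag by the bin length (10 s here) gives a time that can be compared with the bin length and the event duration T.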

Usage
Modify the root_path and disdro_path within this script so that they point to the relevant folders. No other changes are necessary. This script can be run from the terminal by calling: python path_to_script. Alternatively, you can open the script in a python IDE, such as Spyder, and run it within that environment.

Outputs
Three txt files for each intervalometer site as well as the disdrometer site. The txt files contain the separate rain events, the Kostinski storms and the results of the Poisson tests.

A.2.8. Script Name: PoissonAnalysis.py
Overview This script reads in many data files; the results of the Poisson tests, the Kostinski storms, the separate rain events, the re-sampled drop data and the un-resampled drop data for the disdrometer and all the intervalometer sites. Several different analyses are performed on the data. All the plots are saved to a prescribed path.

Usage
Modify the root_path within this script so that it points to the relevant folders. No other changes are necessary. This script must be opened in a python IDE, such as Spyder, and run within that environment.

Inputs
The function exp_drop_size.py and the following data for the intervalometer sites and the disdrometer: the results of the Poisson tests, the Kostinski storms, the separate rain events, the re-sampled drop data and the un-resampled drop data.

A.2.10. Function Name: seperate_rain_events.py
Overview This function takes three arguments: a data frame of re-sampled drop data, a string containing the name of the site and a string specifying the re-sample period (in this case 10s). The function takes the re-sampled drop data and separates it into distinct rainfall events using a criterion of more than 1 hour between consecutive drops. The value of τ for each storm is determined and the 'Kostinski' criterion is applied. The function returns the separated rain events, the Kostinski storms, the criterion used to determine the Kostinski storms and a continuous record counter.

Usage
The function must be imported into the relevant script (as you would import any python module) and can then be called within the script by its name. Note, for the import to work, the function location in your hard drive must be part of the Python PATH.

Inputs
This function takes three arguments: a data frame of re-sampled drop data, a string containing the name of the site and a string specifying the re-sample period (in this case 10s).

Outputs
The function returns the separated rain events, the Kostinski storms, the criterion used to determine the Kostinski storms and a continuous record counter.

A.2.11. Function Name: poisson_test.py
Overview This function takes the Kostinski storms and performs a series of tests on them to determine if the rainfall data can reasonably be assumed to comply with the Poisson homogeneity hypothesis. The tests are performed on a sub-section of each Kostinski storm with length τ, determined in the seperate_rain_events.py function. The series of tests that are performed are explained in the methodology section of the paper. This function returns the results of the tests for each sub-section of each Kostinski storm.
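One of the tests named in the paper is a χ2 goodness-of-fit test for a Poisson distribution. The sketch below shows how the Pearson χ2 statistic for such a test can be computed from a list of drop counts using only the Python standard library. It is an illustration and not the code of poisson_test.py: the function names are hypothetical, classes with small expected frequencies are not pooled, and no p-value is computed.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """Poisson probability of k events, evaluated in log space to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poisson_chi2(counts):
    """Pearson chi-square statistic of the counts against a fitted Poisson distribution.

    The Poisson mean is estimated by the sample mean. All values greater than
    or equal to the largest observed count form one tail class.
    Returns (statistic, degrees_of_freedom).
    """
    n = len(counts)
    lam = sum(counts) / n
    if lam == 0:
        raise ValueError("all counts are zero; the test is undefined")
    observed = Counter(counts)
    k_max = max(counts)
    stat, cumulative = 0.0, 0.0
    for k in range(k_max):
        p = poisson_pmf(k, lam)
        cumulative += p
        expected = n * p
        stat += (observed.get(k, 0) - expected) ** 2 / expected
    tail_expected = n * (1.0 - cumulative)
    stat += (observed[k_max] - tail_expected) ** 2 / tail_expected
    dof = (k_max + 1) - 1 - 1   # classes - 1 - one estimated parameter
    return stat, dof


# Hypothetical drop counts in three time bins.
stat, dof = poisson_chi2([0, 1, 2])
```

The statistic would then be compared with the critical value of the χ2 distribution for the returned degrees of freedom at the chosen significance level.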

Usage
The function must be imported into the relevant script (as you would import any python module) and can then be called within the script by its name. Note, for the import to work, the function location in your hard drive must be part of the Python PATH.

Inputs
The function takes the following arguments: Kostinski storm data, storm number (identifier), τ value, n = number of sub-sections within each storm, continuous record period, re-sample time, site name and the instrument (disdrometer or intervalometer). All of these inputs are calculated in previous scripts or are outputs of previous functions.

Outputs
This function returns the results of the tests for each sub-section of each Kostinski storm in a data frame.

A.2.12. Function Name: expon_rain.py
Overview This function calculates the rainfall rate from the rainfall arrival rate using Marshall and Palmer's (1948) parameterisation. The rainfall rate is calculated using the polynomial constants from the function exp_poly_constants.py. The function returns a data frame containing the rainfall rates.

Usage
The function must be imported into the relevant script (as you would import any python module) and can then be called within the script by its name. Note, for the import to work, the function location in your hard drive must be part of the Python PATH.

Inputs
The function takes three arguments: the rainfall arrival rate data, the instrument (disdrometer or intervalometer) and the value of D min. The function also uses another function called exp_poly_constants.py.

Outputs
The rainfall rates.

A.2.13. Function Name: exp_poly_constants.py
Overview This function fits a third degree polynomial to the relationship between rainfall arrival rate and rainfall rate given by Marshall and Palmer's (1948) parameterisation. It takes only one argument, D min, the minimum detectable drop size, and returns the polynomial constants.

Usage
The function must be imported into the relevant script (as you would import any python module) and can then be called within the script by its name. Note, for the import to work, the function location in your hard drive must be part of the Python PATH.

Inputs
The value of D min.

Outputs
The polynomial constants.

A.2.14. Function Name: exp_drop_size.py
Overview This function calculates the expected mean drop size as a function of rainfall arrival rate using the expectation of a left truncated gamma distribution. It first converts the rainfall arrival rate to rainfall rate using the function exp_poly_constants.py and then converts the rainfall rate to Λ in order to calculate the expected mean drop size. The function returns the expected drop size for the truncated distribution as well as for the complete distribution.
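The expectation of a left truncated gamma distribution can be illustrated numerically. The sketch below is not the code of exp_drop_size.py: it assumes that the drop diameters follow a gamma distribution with a given shape parameter and rate Λ, and it integrates the density above D min with a simple midpoint rule. The function name, the argument names and the integration settings are assumptions.

```python
import math


def truncated_gamma_mean(shape, rate, d_min, upper_factor=60.0, steps=20000):
    """E[D | D > d_min] for D ~ Gamma(shape, rate), by midpoint-rule integration.

    `shape` and `rate` are hypothetical inputs; in the text the rate is the
    DSD parameter Lambda. The integral is cut off at d_min + upper_factor / rate,
    beyond which the exponential tail is negligible.
    """
    def weight(d):
        # Unnormalised gamma density; the normalising constant cancels in the ratio.
        return d ** (shape - 1.0) * math.exp(-rate * d)

    upper = d_min + upper_factor / rate
    h = (upper - d_min) / steps
    numerator = denominator = 0.0
    for i in range(steps):
        d = d_min + (i + 0.5) * h
        w = weight(d)
        numerator += d * w
        denominator += w
    return numerator / denominator


# Check with shape = 1 (a pure exponential): the truncated mean is d_min + 1 / rate.
check = truncated_gamma_mean(shape=1.0, rate=2.0, d_min=0.5)
```

Setting d_min to zero gives the mean of the complete distribution, shape / rate, so the same function covers both of the cases returned by the function described above.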

Usage
The function must be imported into the relevant script (as you would import any python module) and can then be called within the script by its name. Note, for the import to work, the function location in your hard drive must be part of the Python PATH.

Inputs
The function takes three arguments: the rainfall arrival rate data, the instrument (disdrometer or intervalometer) and the value of D min. The function also uses another function called exp_poly_constants.py.

Outputs
The mean expected drop size for the truncated gamma and complete gamma distributions.

A.2.15. Function Name: expon_rain_adj.py
Overview This function constrains the rainfall estimates from Marshall and Palmer's (1948) parameterisation by a priori observations of the mean drop sizes from the disdrometer by using the equation derived in the methodology section of the paper for a complete gamma distribution.

Usage
The function must be imported into the relevant script (as you would import any python module) and can then be called within the script by its name. Note, for the import to work, the function location in your hard drive must be part of the Python PATH.

Inputs
The function takes three arguments: the rainfall arrival rate data, the instrument (disdrometer or intervalometer) and the value of D min. The function also uses two other functions, exp_poly_constants.py and exp_drop_size.py.

Outputs
The corrected rainfall rates.

A.2.16. Function Name: rain_adj_final.py
Overview This function constrains the rainfall estimates from Marshall and Palmer's (1948) parameterisation by a priori observations of the mean drop sizes from the disdrometer by using the equation derived in the methodology section of the paper for a left truncated gamma distribution.

Usage
The function must be imported into the relevant script (as you would import any python module) and can then be called within the script by its name. Note, for the import to work, the function location in your hard drive must be part of the Python PATH.

Inputs
The function takes three arguments: the rainfall arrival rate data, the instrument (disdrometer or intervalometer) and the value of D min. The function also uses two other functions, exp_poly_constants.py and exp_drop_size.py.

Outputs
The corrected rainfall rates.