Hydrologic Statistics
Reading: Chapter 11 in Applied Hydrology. Some slides by Venkatesh Merwade.
Hydrologic Models
Classification based on randomness.
Probability
A measure of how likely an event is to occur. For equally likely outcomes, it is the ratio of favorable outcomes to all possible outcomes. Probability is usually written as P(.).
P(getting a club from a deck of playing cards) = 13/52 = 0.25 = 25%
P(getting a 3 after rolling a die) = 1/6
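The two card and die examples can be checked with exact fractions. This is a minimal sketch: the helper name `classical_probability` is hypothetical, and it assumes all outcomes are equally likely.

```python
from fractions import Fraction


def classical_probability(favorable: int, total: int) -> Fraction:
    """Ratio of favorable outcomes to all possible (equally likely) outcomes."""
    if total <= 0 or not 0 <= favorable <= total:
        raise ValueError("need total > 0 and 0 <= favorable <= total")
    return Fraction(favorable, total)


# 13 clubs in a standard 52-card deck
p_club = classical_probability(13, 52)
# one face showing 3 on a fair six-sided die
p_three = classical_probability(1, 6)

print(p_club, float(p_club))  # 1/4 0.25
print(p_three)                # 1/6
```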
Random Variable
Random variable: a quantity used to represent probabilistic uncertainty
Examples: incremental precipitation, instantaneous streamflow, wind velocity
A random variable (X) is described by a probability distribution. A probability distribution is the set of probabilities associated with the values in the random variable's sample space.
Sampling terminology
Sample: a finite set of observations x1, x2, ..., xn of the random variable. A sample comes from a hypothetical infinite population possessing constant statistical properties.
Sample space: the set of possible samples that can be drawn from a population.
Event: a subset of a sample space.
Example:
Population: streamflow
Sample space: instantaneous streamflow, annual maximum streamflow, daily average streamflow
Sample: 100 observations of annual maximum streamflow
Event: daily average streamflow > 100 cfs
Types of sampling
Random sampling: every member of the population has an equal likelihood of being selected. Example: pick any streamflow value from the population.
Stratified sampling: the population is divided into groups, and random sampling is then applied within the groups. Example: pick a streamflow value from the annual maximum series.
Uniform sampling: data are selected so that the points are uniformly spaced in time or space. Example: pick streamflow values measured at midnight every Monday.
Convenience sampling: data are collected according to the convenience of the experimenter. Example: pick streamflow during summer.
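The four sampling schemes can be contrasted on a synthetic record. This is a sketch under stated assumptions: the daily flows are randomly generated placeholders (not real data), the strata are calendar years, the uniform spacing is every 7th day, and "summer" is taken as June 1 to August 31 of a 365-day year.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical record: 10 years x 365 days of synthetic daily flows (cfs)
N_YEARS, DAYS = 10, 365
daily = [[rng.uniform(20.0, 500.0) for _ in range(DAYS)] for _ in range(N_YEARS)]
all_days = [q for year in daily for q in year]

# Random sampling: every value in the record has the same chance of selection
random_sample = rng.sample(all_days, 20)

# Stratified sampling: divide the record into groups (years), then sample randomly within each group
stratified_sample = [rng.choice(year) for year in daily]

# Uniform sampling: points equally spaced in time (every 7th day)
uniform_sample = all_days[::7]

# Convenience sampling: only what is easy to collect (summer days, zero-based day indices 151 to 242)
summer_sample = [q for year in daily for q in year[151:243]]

print(len(random_sample), len(stratified_sample), len(uniform_sample), len(summer_sample))
# 20 10 522 920
```

Only the selection rule changes between the four lists; the underlying record is the same, which is what makes the schemes comparable.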
Summary statistics
Also called descriptive statistics
If x1, x2, ..., xn is a sample, then:
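Mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{X})^2$
Standard deviation: $S = \sqrt{S^2}$
Coefficient of variation: $CV = S / \bar{X}$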
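The four formulas translate directly into code. A minimal sketch with a hypothetical five-value sample: the n - 1 divisor gives the sample variance, which is also what Python's `statistics.variance` computes, so the standard library serves as a cross-check.

```python
import math
import statistics


def summary_statistics(sample):
    """Mean, sample variance (n - 1 divisor), standard deviation, and coefficient of variation."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    std = math.sqrt(variance)
    return {"mean": mean, "variance": variance, "std": std, "cv": std / mean}


flows = [120.0, 95.0, 210.0, 150.0, 175.0]  # hypothetical annual maximum flows
stats = summary_statistics(flows)
print(stats["mean"], stats["variance"])  # 150.0 2037.5

# Cross-check against the standard library
assert math.isclose(stats["variance"], statistics.variance(flows))
assert math.isclose(stats["std"], statistics.stdev(flows))
```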
Graphical display
Time series plots
Histograms/frequency distributions
Cumulative distribution functions
Flow duration curves
[Figure: example time series plot of streamflow over the years 1900 to 2000]
Histogram
A plot of bars whose height is the number ni, or the fraction ni/N, of data falling into each of several intervals of equal width
[Figure: three histograms of annual maximum flow (10^3 cfs) against number of occurrences]
Dividing the number of occurrences by the total number of points gives the probability mass function.
2) Fill one column with the data and another with the interval bounds (e.g., for a 50 cfs interval, fill 0, 50, 100, ...)
3) Go to Tools > Data Analysis > Histogram
4) Organize the plot in a presentable form (change fonts, scale, color, etc.)
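The same histogram and probability mass function can be computed without a spreadsheet. A sketch under assumptions: intervals are half-open, [start + k*width, start + (k+1)*width), starting at 0, and the flows are hypothetical values in 10^3 cfs. Spreadsheet histogram tools may assign a value that falls exactly on an interval bound to a different bin.

```python
def histogram_pmf(data, width, start=0.0):
    """Count values per equal-width interval and convert the counts to probabilities."""
    if width <= 0 or min(data) < start:
        raise ValueError("need width > 0 and all data >= start")
    n_bins = int((max(data) - start) // width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[int((x - start) // width)] += 1  # index of the interval containing x
    total = len(data)
    pmf = [c / total for c in counts]  # number of occurrences / total number of points
    edges = [start + k * width for k in range(n_bins + 1)]
    return edges, counts, pmf


flows = [30, 45, 60, 120, 130, 140, 260]  # hypothetical annual maximum flows (10^3 cfs)
edges, counts, pmf = histogram_pmf(flows, width=50)
print(counts)  # [2, 1, 3, 0, 0, 1]
```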
[Figure: histogram of annual maximum flow with number of occurrences and probability on the vertical axes]
[Figure: cumulative distribution function of annual maximum flow (10^3 cfs) with probability on the vertical axis, marking P(Q ≤ 50000) = 0.8 and P(Q ≤ 25000) = 0.4]
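Values such as P(Q ≤ q) can be read from an empirical cumulative distribution function: the fraction of observations less than or equal to q. A minimal sketch with hypothetical flows; this is one common estimator, and accumulating the histogram probabilities gives a binned version of the same curve.

```python
from bisect import bisect_right


def empirical_cdf(data):
    """Return a function F(q) = (number of observations <= q) / N."""
    ordered = sorted(data)
    n = len(ordered)

    def cdf(q):
        # bisect_right counts how many sorted values are <= q
        return bisect_right(ordered, q) / n

    return cdf


flows = [30, 45, 60, 120, 130, 140, 260, 310, 420, 510]  # hypothetical (10^3 cfs)
F = empirical_cdf(flows)
print(F(130), F(25), F(600))  # 0.5 0.0 1.0
```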
Flow duration curve: steps
1) Arrange flows in chronological order.
2) Find the number of records (N).
3) Sort the data from highest to lowest.
4) Rank the data (m = 1 for the highest value and m = N for the lowest value).
5) Compute the exceedance probability for each value using the following formula:
$p = 100 \times \frac{m}{N+1}$ (percent)
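The ranking steps and the formula fit in a few lines. A sketch, assuming a hypothetical five-value record and giving tied flows consecutive ranks (the steps do not say how ties are handled).

```python
def flow_duration_curve(flows):
    """Return (flow, exceedance probability in %) pairs, highest flow first."""
    n = len(flows)                          # number of records, N
    ranked = sorted(flows, reverse=True)    # sort from highest to lowest
    # rank m = 1 for the highest value, m = N for the lowest; p = 100 * m / (N + 1)
    return [(q, 100.0 * m / (n + 1)) for m, q in enumerate(ranked, start=1)]


flows = [120.0, 95.0, 210.0, 150.0, 175.0]  # hypothetical flows (cfs) in chronological order
for q, p in flow_duration_curve(flows):
    print(f"{q:6.1f} cfs  equaled or exceeded {p:5.1f}% of the time")
```

With N = 5, the middle flow (150 cfs) gets p = 100 × 3/6 = 50%, which is the median flow on the curve.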
[Figure: flow duration curve of Q (1000 cfs), with the median flow marked]
Statistical analysis
Regression analysis
Mass curve analysis
Flood frequency analysis
Many more, which are beyond the scope of this class!
Linear Regression
A technique to determine the relationship between two random variables.
Examples: the relationship between discharge and velocity in a stream; the relationship between discharge and water quality constituents
A regression model is given by:
$y_i = b_0 + b_1 x_i + e_i, \quad i = 1, 2, \ldots, n$
where
$y_i$ = ith observation of the response (dependent variable)
$x_i$ = ith observation of the explanatory (independent) variable
$b_0$ = intercept
$b_1$ = slope
$e_i$ = random error or residual for the ith observation
$n$ = sample size
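This excerpt does not show how b0 and b1 are estimated; ordinary least squares (OLS) is the usual choice. A sketch, assuming OLS and four hypothetical (x, y) pairs; the function name is illustrative.

```python
def fit_simple_linear_regression(x, y):
    """OLS estimates of intercept b0 and slope b1 for y = b0 + b1*x + e."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sxy / sxx                 # slope
    b0 = y_bar - b1 * x_bar        # intercept: the line passes through (x_bar, y_bar)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]  # e_i
    return b0, b1, residuals


x = [1.0, 2.0, 3.0, 4.0]  # hypothetical explanatory values (e.g., discharge)
y = [2.0, 3.0, 5.0, 6.0]  # hypothetical responses (e.g., a water quality constituent)
b0, b1, e = fit_simple_linear_regression(x, y)
print(round(b0, 6), round(b1, 6))  # 0.5 1.4
```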
[Figure: regression example plotting TDS (mg/L)]
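Total sum of squares: $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$, where $\bar{y}$ is the mean of the $y_i$
Error sum of squares: $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $\hat{y}_i = b_0 + b_1 x_i$ is the fitted value
Coefficient of determination: $R^2 = 1 - SSE/SST$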
The higher the value of $R^2$, the more successful the model is in explaining the variation in y. If $R^2$ is small, search for an alternative model (nonlinear or multiple regression) that can explain the variation in y more effectively.
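$R^2$ follows from the two sums of squares. A sketch in which the four observations are hypothetical and the coefficients b0 = 0.5, b1 = 1.4 are taken as given (they are the least squares estimates for these four points).

```python
def r_squared(observed, fitted):
    """Coefficient of determination R^2 = 1 - SSE/SST."""
    y_bar = sum(observed) / len(observed)
    sst = sum((yi - y_bar) ** 2 for yi in observed)                # total sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(observed, fitted))  # error sum of squares
    if sst == 0:
        raise ValueError("observed values have no variation")
    return 1.0 - sse / sst


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 6.0]
b0, b1 = 0.5, 1.4                     # hypothetical fitted intercept and slope
y_hat = [b0 + b1 * xi for xi in x]    # fitted values
print(round(r_squared(y, y_hat), 4))  # 0.98
```

Here SSE = 0.2 and SST = 10, so the line explains 98% of the variation in y.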