
04/04/2006

Hydrologic Statistics
Reading: Chapter 11 in Applied Hydrology
Some slides by Venkatesh Merwade

Hydrologic Models
Classification based on randomness.

Deterministic (e.g., rainfall-runoff analysis)

Analysis of hydrological processes using deterministic approaches. Hydrological parameters are based on physical relations among the various components of the hydrologic cycle. These models do not consider randomness; a given input always produces the same output.

Stochastic (e.g., flood frequency analysis)

Probabilistic description and modeling of hydrologic phenomena. Statistical analysis of hydrologic data.

Probability
A measure of how likely an event is to occur. It is a number expressing the ratio of favorable outcomes to all possible outcomes. Probability is usually represented as P(.)
P(getting a club from a deck of playing cards) = 13/52 = 0.25 = 25%
P(getting a 3 after rolling a die) = 1/6
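The two examples above can be checked with a short Python sketch of the classical definition (favorable outcomes divided by possible outcomes). The function name `classical_probability` is hypothetical, and the sketch assumes all outcomes are equally likely (a fair, well-shuffled deck and a fair die).

```python
from fractions import Fraction


def classical_probability(favorable, possible):
    """P(event) = number of favorable outcomes / number of possible outcomes.

    Assumes every outcome is equally likely.
    """
    if possible <= 0 or not 0 <= favorable <= possible:
        raise ValueError("need possible > 0 and 0 <= favorable <= possible")
    return Fraction(favorable, possible)


p_club = classical_probability(13, 52)  # 13 clubs in a 52-card deck
p_three = classical_probability(1, 6)   # one face showing 3 on a six-sided die
print(p_club, float(p_club))  # 1/4 0.25
print(p_three)                # 1/6
```

Using `Fraction` keeps the ratio exact (13/52 reduces to 1/4) instead of rounding it to a float.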

Random Variable
Random variable: a quantity used to represent probabilistic uncertainty
Examples: incremental precipitation, instantaneous streamflow, wind velocity

A random variable (X) is described by a probability distribution. A probability distribution is the set of probabilities associated with the values in a random variable's sample space.

Sampling terminology
Sample: a finite set of observations x1, x2, ..., xn of the random variable. A sample comes from a hypothetical infinite population possessing constant statistical properties.
Sample space: the set of possible samples that can be drawn from a population.
Event: a subset of a sample space.

Example:
Population: streamflow
Sample space: instantaneous streamflow, annual maximum streamflow, daily average streamflow
Sample: 100 observations of annual maximum streamflow
Event: daily average streamflow > 100 cfs

Types of sampling
Random sampling: each member of the population is equally likely to be selected.
Example: pick any streamflow value from a population.
Stratified sampling: the population is divided into groups, and random sampling is then used within the groups.
Example: pick a streamflow value from the annual maximum series.
Uniform sampling: data are selected so that the points are uniformly spaced in time or space.
Example: pick streamflow values measured at midnight every Monday.
Convenience sampling: data are collected according to the convenience of the experimenter.
Example: pick streamflow values measured during summer.
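Three of the four sampling schemes can be sketched in Python with the standard `random` module (convenience sampling follows no fixed rule, so it is not coded). The function names and the two-week flow list are hypothetical illustrations, not data from this document; the uniform example assumes daily values with day 0 falling on a Monday, so every 7th value is a Monday value.

```python
import random


def random_sample(population, k, rng):
    """Random sampling: every member of the population is equally likely to be picked."""
    return rng.sample(population, k)


def uniform_sample(series, step, offset=0):
    """Uniform sampling: keep values a fixed number of time steps apart."""
    return series[offset::step]


def stratified_sample(groups, k_per_group, rng):
    """Stratified sampling: divide into groups, then sample randomly within each group."""
    return {name: rng.sample(members, k_per_group) for name, members in groups.items()}


rng = random.Random(42)  # seeded so the example is repeatable
# Hypothetical daily flows (cfs) for two weeks; day 0 is assumed to be a Monday
daily_flow = [120, 135, 150, 140, 130, 125, 118, 110, 160, 210, 190, 170, 150, 140]
print(random_sample(daily_flow, 3, rng))
print(uniform_sample(daily_flow, 7))  # [120, 110]: the two Monday values
print(stratified_sample({"week 1": daily_flow[:7], "week 2": daily_flow[7:]}, 2, rng))
```

Passing an explicit `rng` object, rather than calling the module-level functions, makes each sample reproducible when the seed is fixed.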

Summary statistics
Also called descriptive statistics
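If $x_1, x_2, \ldots, x_n$ is a sample, then

Mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$ ($\mu$ for continuous data)
Variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{X})^2$ ($\sigma^2$ for continuous data)
Standard deviation: $S = \sqrt{S^2}$ ($\sigma$ for continuous data)
Coefficient of variation: $CV = S / \bar{X}$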

Other summary statistics include the median, skewness, and correlation coefficient.
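A minimal Python sketch of these formulas, using the standard `statistics` module for the mean, the (n - 1) sample variance, and the median. The skewness line uses the common bias-adjusted sample estimator n Σ(x_i - X̄)^3 / ((n - 1)(n - 2) S^3); treat that choice as an assumption, since other skewness estimators exist. The function name and the flow values are hypothetical.

```python
import math
import statistics


def summary_statistics(sample):
    """Descriptive statistics for a sample x1..xn (needs n >= 3 for skewness)."""
    n = len(sample)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = statistics.mean(sample)     # X-bar = (1/n) * sum(x_i)
    var = statistics.variance(sample)  # S^2 with the (n - 1) divisor
    std = math.sqrt(var)               # S = sqrt(S^2)
    cv = std / mean                    # CV = S / X-bar (undefined if the mean is 0)
    # Assumed estimator: bias-adjusted sample skewness
    skew = n * sum((x - mean) ** 3 for x in sample) / ((n - 1) * (n - 2) * std ** 3)
    return {
        "mean": mean,
        "variance": var,
        "std": std,
        "cv": cv,
        "median": statistics.median(sample),
        "skewness": skew,
    }


flows = [120, 95, 210, 160, 75]  # hypothetical annual maximum flows (10^3 cfs)
for name, value in summary_statistics(flows).items():
    print(f"{name}: {value:.3f}")
```

`statistics.variance` already divides by n - 1; `statistics.pvariance` would divide by n instead.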



Graphical display
Time series plots
Histograms/frequency distributions
Cumulative distribution functions
Flow duration curves


Time series plot


Plot of a variable versus time (bars, lines, or points). Example: annual maximum flow series.
[Figure: Annual max flow (10^3 cfs) vs. year, Colorado River near Austin]



Histogram
A plot of bars whose heights are the number n_i, or fraction (n_i/N), of data falling into each of several intervals of equal width
[Figure: Histograms of annual max flow (10^3 cfs) vs. number of occurrences, for interval widths of 50,000 cfs, 25,000 cfs, and 10,000 cfs]

Dividing the number of occurrences by the total number of points gives the probability mass function.

Using Excel to plot histograms


1) Make sure the Analysis ToolPak is added in Tools. This adds the Data Analysis command to the Tools menu.

2) Fill one column with the data and another with the intervals (e.g., for a 50 cfs interval, fill 0, 50, 100, ...).

3) Go to Tools > Data Analysis > Histogram.

4) Organize the plot in a presentable form (change fonts, scale, color, etc.)
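The binning done by the histogram steps, and the division of each count n_i by N that turns counts into a probability mass function, can be sketched in Python. `histogram_pmf` and the flow values are hypothetical; the sketch assumes each interval includes its lower bound and excludes its upper bound, and other tools may assign values that fall exactly on a boundary differently.

```python
import math


def histogram_pmf(data, width, start=0.0):
    """Bin data into equal-width intervals and convert counts to probabilities.

    Interval k is [start + k*width, start + (k+1)*width) (assumed convention).
    Returns (lower, upper, n_i, n_i / N) for every interval from the lowest to
    the highest occupied one, including empty intervals in between.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not data:
        raise ValueError("data must not be empty")
    counts = {}
    for x in data:
        k = math.floor((x - start) / width)  # index of the interval holding x
        counts[k] = counts.get(k, 0) + 1
    n_total = len(data)
    table = []
    for k in range(min(counts), max(counts) + 1):
        n_i = counts.get(k, 0)
        lower = start + k * width
        table.append((lower, lower + width, n_i, n_i / n_total))
    return table


# Hypothetical annual maximum flows in 10^3 cfs, binned at 50 (i.e., 50,000 cfs)
flows = [42, 75, 130, 98, 61, 210, 88, 55, 145, 71]
for lower, upper, n_i, p_i in histogram_pmf(flows, 50):
    print(f"{lower:>5.0f}-{upper:<5.0f} n_i={n_i:2d}  P={p_i:.2f}")
```

Changing `width` reproduces the effect shown in the three histograms: wider intervals give a smoother but coarser picture, narrower ones a noisier but more detailed one.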


Probability density function


The continuous form of the probability mass function is the probability density function (pdf).
[Figure: Number of occurrences and probability vs. annual max flow (10^3 cfs)]

The pdf is the first derivative of the cumulative distribution function (cdf).
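In symbols, writing f(x) for the pdf and F(x) for the cdf (this notation is assumed here, not taken from the slides), the two functions are related by:

```latex
f(x) = \frac{dF(x)}{dx}, \qquad
F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du, \qquad
P(a < X \le b) = F(b) - F(a) = \int_{a}^{b} f(x)\,dx
```

The last identity is why areas under the pdf, rather than its heights, are probabilities.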



Cumulative distribution function


Cumulating the pdf produces the cdf. The cdf describes the probability that a random variable is less than or equal to a specified value x.
[Figure: Cdf of annual max flow (10^3 cfs), with probability on the y axis; annotated points P(Q ≤ 50000) = 0.8 and P(Q ≤ 25000) = 0.4]
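Cumulating relative frequencies one observation at a time gives the empirical cdf, a step function that can be sketched in Python with the standard `bisect` module. `empirical_cdf` is a hypothetical helper, and the flow values in the usage line are made up.

```python
import bisect


def empirical_cdf(sample):
    """Return a function F with F(x) = (number of observations <= x) / n."""
    if not sample:
        raise ValueError("sample must not be empty")
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        # bisect_right returns how many sorted values are <= x
        return bisect.bisect_right(ordered, x) / n

    return cdf


F = empirical_cdf([30, 45, 60, 120, 250])  # hypothetical annual max flows, 10^3 cfs
print(F(50))  # 0.4: two of the five values are <= 50
```

`bisect_right` (rather than `bisect_left`) is what makes the comparison "less than or equal to", matching the cdf definition.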

Flow duration curve


A cumulative frequency curve that shows the percentage of time that specified discharges are equaled or exceeded.

Steps:
1) Arrange flows in chronological order.
2) Find the number of records (N).
3) Sort the data from highest to lowest.
4) Rank the data (m = 1 for the highest value and m = N for the lowest value).
5) Compute the exceedance probability p (in percent) for each value using the following formula:
$p = 100 \times \frac{m}{N + 1}$
6) Plot p on the x axis and Q (sorted) on the y axis.
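Steps 2 to 5 above can be sketched in Python as follows. `flow_duration_curve` and the flows are hypothetical; the chronological arrangement in step 1 does not change the result because step 3 re-sorts the values, and tied flows simply receive consecutive ranks in this sketch.

```python
def flow_duration_curve(flows):
    """Return (p, Q) pairs for a flow duration curve.

    Q is sorted from highest to lowest, m is its rank (1 = highest, N = lowest),
    and p = 100 * m / (N + 1) is the percent of time Q is equaled or exceeded.
    """
    n = len(flows)
    ranked = sorted(flows, reverse=True)
    return [(100.0 * m / (n + 1), q) for m, q in enumerate(ranked, start=1)]


# Hypothetical daily flows (cfs); in practice this would be the full record
print(flow_duration_curve([120, 300, 80, 150]))
# [(20.0, 300), (40.0, 150), (60.0, 120), (80.0, 80)]
```

Plotting the pairs (step 6) gives the curve; the median flow is the Q read at p = 50%.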



Flow duration curve in Excel

[Figure: Flow duration curve, Q (1000 cfs) vs. % of time Q will be exceeded, with the median flow marked]


Statistical analysis
Regression analysis
Mass curve analysis
Flood frequency analysis
Many more, which are beyond the scope of this class!


Linear Regression
A technique to determine the relationship between two random variables.
Examples: the relationship between discharge and velocity in a stream; the relationship between discharge and water quality constituents
A regression model is given by:

$y_i = b_0 + b_1 x_i + e_i, \quad i = 1, 2, \ldots, n$

where
$y_i$ = ith observation of the response (dependent variable)
$x_i$ = ith observation of the explanatory (independent) variable
$b_0$ = intercept
$b_1$ = slope
$e_i$ = random error or residual for the ith observation
$n$ = sample size

Least square regression


We have observations $x_1, x_2, \ldots, x_n$ of the independent variable and $y_1, y_2, \ldots, y_n$ of the dependent variable.
Define a linear model for $y_i$: $\hat{y}_i = b_0 + b_1 x_i, \quad i = 1, 2, \ldots, n$
Fit the model (find $b_0$ and $b_1$) such that the sum of the squares of the vertical deviations is a minimum:

Minimize $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$

Regression applet: http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html
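Setting the partial derivatives of the sum of squares with respect to b_0 and b_1 to zero gives closed-form estimates, sketched below in Python. `least_squares_fit` is a hypothetical name and the sample points are made up; on Python 3.10 or later, `statistics.linear_regression(x, y)` returns the same slope and intercept.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) minimizing sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Normal equations: b1 = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


print(least_squares_fit([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```

Because the points in the usage line lie exactly on y = 1 + 2x, the fitted intercept and slope recover 1 and 2.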

Linear Regression in Excel


Steps:
1) Prepare a scatter plot.
2) Fit a trend line.
[Figure: TDS (mg/L) vs. specific conductance (μS/cm), with fitted trend line TDS = 0.5946(sp. cond.) - 15.709, R² = 0.9903. Data are for the Brazos River near Highbank, TX.]

Alternatively, one can use Tools > Data Analysis > Regression.
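Once fitted, the trend line reported for the Brazos River near Highbank, TX (TDS = 0.5946(sp. cond.) - 15.709) can be used to estimate TDS from a specific conductance reading. The function name is hypothetical and the coefficients are copied from the fitted equation; as with any regression, estimates are better supported inside the range of the data used for the fit than outside it.

```python
def tds_from_specific_conductance(sp_cond):
    """Estimate TDS (mg/L) from specific conductance (uS/cm) with the fitted
    trend line TDS = 0.5946 * sp_cond - 15.709 (Brazos River near Highbank, TX)."""
    return 0.5946 * sp_cond - 15.709


print(round(tds_from_specific_conductance(1000), 3))  # 578.891
```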

Coefficient of determination (R²)


It is the proportion of observed y variation that can be explained by the simple linear regression model
$R^2 = 1 - \frac{SSE}{SST}$

where
$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares, with $\bar{y}$ the mean of the $y_i$
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ is the error sum of squares

The higher the value of R², the more successful the model is in explaining the variation in y. If R² is small, search for an alternative model (a nonlinear or multiple regression model) that can explain the variation in y more effectively.
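The R² formula can be sketched directly in Python from observed values y_i and model predictions ŷ_i. `r_squared` is a hypothetical helper and the values in the usage lines are made up to show the two extremes.

```python
def r_squared(y, y_hat):
    """Coefficient of determination R^2 = 1 - SSE / SST."""
    if len(y) != len(y_hat) or not y:
        raise ValueError("y and y_hat must be non-empty and the same length")
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # error sum of squares
    if sst == 0:
        raise ValueError("R^2 is undefined when all y values are equal")
    return 1.0 - sse / sst


y = [1, 2, 3]
print(r_squared(y, [2, 2, 2]))  # 0.0: predicting the mean explains none of the variation
print(r_squared(y, y))          # 1.0: a perfect fit
```

For a least squares line with an intercept, R² falls between 0 and 1; for other models, a fit worse than the mean can give a negative value.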
