6 CE 411 - HYDROLOGY (Statistical Measures)

AMIE LOU G.
CISNEROS, CE
Instructor
Eng’g & Technology Dep’t
Cor Jesu College
• When determining how to appropriately
analyze any collection of data, the first
consideration must be the characteristics of
the data themselves.
• A lower bound of zero. No negative values are
possible.
• Presence of outliers.
• Positive skewness.
• Non – normal distribution of data.
• Seasonal patterns
• Data reported only as below or above some
threshold.
• Autocorrelation.
• Dependence on other uncontrolled variables.
• The mean (𝒙 ) is computed as the sum of all
data values 𝒙𝒊 , divided by the sample size 𝒏:
𝒏
𝒙𝒊
𝒙=
𝒏
𝒊=𝟏
• INFLUENCE OF THE ANY OBSERVATION ON THE

MEAN
▫ The mean is the balance point of the data.
▫ Data points further from the center exert a
stronger downward force than those closer to the
center.
• SENSITIVITY OF THE MEAN
▫ The sensitivity to the magnitude of a small
number of points in the data set defines why
the mean is not a “resistant” measure of
location.
▫ It is not resistant to changes in the presence
of, or to changes in the magnitudes of, a few
outlying observations.
• The median or 50th percentile 𝑷𝟎.𝟓𝟎 , is the
central value of the distribution when the
data are ranked in order of magnitude.
• For an odd number of observations, the
median is the data point which has an equal
number of observations both above and
below it.
• For an even number of observations, it is the
average of the two central observations.
• To compute the median, first rank the
observations from the smallest to the largest,
so that 𝒙𝟏 is the smallest observation, up to
𝒙𝒏 , the largest observation. Then
▫ 𝒎𝒆𝒅𝒊𝒂𝒏 𝑷𝒏 = 𝒙(𝒏+𝟏)/𝟐 , when 𝒏 is odd, and

𝟏
▫ 𝒎𝒆𝒅𝒊𝒂𝒏 𝑷𝒏 = 𝟐 𝒙𝒏/𝟐 + 𝒙 𝒏/𝟐 +𝟏 , when 𝒏 is
even.
• The median is only minimally affected by
the magnitude of a single observation,
being determined solely by the relative
order of observations.
• The sample variance, and its square root the sample standard
deviation, are the classical measures of spread.
• Like the mean, they are strongly influenced by outlying values.
𝑛 2
𝑥𝑖 − 𝑥
𝑠2 =
𝑛−1
𝑖=1
𝑠 = 𝑠2
• They are computed using the squares of deviations of data
from the mean, so that outliers influence their magnitudes
even more so than for the mean.
• When outliers are present these measures are unstable and
inflated. They may give the impression of much greater
spread than is indicated by the majority of the data set.
• INTERQUARTILE RANGE (IQR)
▫ the most commonly used resistant measure of spread.
▫ It measures the range of the central 50 percent of the
data, and is not influenced at all by the 25 percent on
either end by considering a data set ordered from
smallest to largest: 𝒙𝒊 , 𝒊 = 𝟏, … , 𝒏
▫ Is defined as the 75th percentile minus the 25th
percentile.
 75th percentile – upper quartile, a value which exceeds
no more than 75 percent of the data and is exceeded by
no more than 25 percent of the data.
 25th percentile – lower quartile, a value which exceeds no
more than 25 percent of the data and is exceeded by no
more than 75 percent.
• Percentiles 𝑃𝑗 are computed using the following
equation:
𝑃𝑗 = 𝑥(𝑛+1)∙𝑗
 𝑛 = the sample size of 𝑥𝑖

 𝑗 = the fraction of data less than or equal to the
percentile value ( for the 25th , 50th and 75th
percentiles, 𝑗 = 0.25,0.50, 0.75, respectively.
𝐼𝑄𝑅 = 𝑃0.75 − 𝑃0.25

• MEAN ABSOLUTE DEVIATION
▫ Computed by first listing the absolute value of
the difference between each observation and
the median.
𝑴𝑨𝑫 = 𝒎𝒆𝒅𝒊𝒂𝒏 𝒙𝒊 − 𝒎𝒆𝒅𝒊𝒂𝒏 𝒙𝒊

• Hydrologic data are typically skewed,
meaning that data sets are not symmetric
around the mean or median, with extreme
values extending out longer in one
direction.
• When the data is skewed, the mean is not
expected to equal the median, but it pulled
toward the tail of the distribution.
• COEFFICIENT OF SKEWNESS 𝒈
▫ The skewness measure used most often.
▫ It is adjusted third moment divided by the
cube of the standard deviation.
𝑛
𝑛 𝑥𝑖 − 𝑥 3
𝑔=
𝑛−1 𝑛−2 𝑠3
𝑖=1
• QUARTILE SKEW COEFFICIENT
▫ A more resistant measure of skewness.
𝑃0.75 − 𝑃0.50 − 𝑃0.50 − 𝑃0.25

𝑞𝑠 =
𝑃0.75 − 𝑃0.25
▫ Right – skewed: +𝑞𝑠

▫ Left – skewed: −𝑞𝑠
• Observations whose values are quite different
than others in the data set, often cause
concern or alarm.
• Outliers may be the most important point in the
data set, and should be investigated further.
• Outliers can have one of three causes:
▫ A measurement or recording error;
▫ An observation from the population not similar to
that of most of the data;
▫ A rare event from a single population that is quite
skewed.
• Transformations are used for three purposes:
▫ To make data more symmetric;
▫ To make data more linear; and
▫ To make data more constant in variance.
• Graphs provide crucial information to the
data analyst which is difficult to obtain in
any other way.
• Computing statistical measures without
looking at a plot is an invitation to
misunderstanding data.
• Graphs provide visual summaries of data
which more quickly and completely
describe essential information than do
tables of numbers.
• Graphs are essential for two purposes:
▫ To provide insight for the analyst into the data
under scrutiny;
▫ To illustrate important concepts when
presenting the results to others.
• These procedures that often or should be the
“first look” at data.
• Patterns and theories of how the system
behaves are developed by observing the
data through graphs.
• Their results provide guidance for the
selection of appropriate deductive
hypothesis testing procedures.
• HISTOGRAMS
▫ are familiar graphics,
and their construction is
detailed in numerous
introductory texts on
statistics.
▫ Bars are drawn whose
height is the number 𝑛𝑖
𝑛
or fraction 𝑛𝑖 of data
falling into one of
several categories or
intervals.
• QUANTILE PLOTS
▫ Visually portray the
quantiles, or
percentiles of the
distribution of sample
data.
• For the sample:
2 3 5 45 46 47
48 50 90 151 208
• Find the mean, median, IQR, standard

deviation, MAD, skewness, and quartile
skew coefficient.

6 CE 411 - HYDROLOGY (Statistical Measures)

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

6 CE 411 - HYDROLOGY (Statistical Measures)

Uploaded by

Copyright:

Available Formats

AMIE LOU G.

• INFLUENCE OF THE ANY OBSERVATION ON THE

▫ 𝒎𝒆𝒅𝒊𝒂𝒏 𝑷𝒏 = 𝒙(𝒏+𝟏)/𝟐 , when 𝒏 is odd, and

 𝑛 = the sample size of 𝑥𝑖

𝐼𝑄𝑅 = 𝑃0.75 − 𝑃0.25

𝑴𝑨𝑫 = 𝒎𝒆𝒅𝒊𝒂𝒏 𝒙𝒊 − 𝒎𝒆𝒅𝒊𝒂𝒏 𝒙𝒊

𝑃0.75 − 𝑃0.50 − 𝑃0.50 − 𝑃0.25

▫ Right – skewed: +𝑞𝑠

• Find the mean, median, IQR, standard

You might also like