
Introduction to Probability and Statistics for Civil Engineering Applications

Prepared by:
Engr. Kenny B. Cantila

Probability is the study of chance and is a fundamental subject that we
apply in everyday life, while statistics is concerned with how we collect
data and analyze it using different techniques.

Traffic Engineering
(number of accidents, number of vehicles, traffic delay, routing)

Hydrology
(rainfall intensity, flood frequency analysis, return period)

Earthquake Engineering
(earthquake occurrence, ground motions, seismic risk assessment)

Material Testing
(compressive strength, tensile strength, modulus of rupture, density, unit
weight, moisture content, porosity, void ratio)

Rainfall Data

Graphical Representation

Line Diagram/Bar Chart


Dot diagram
Histogram
Frequency Polygon
Cumulative relative frequency diagram

Line Diagram/Bar Chart


In this type of graph, the horizontal axis gives the values of the discrete
variable and the occurrences are represented by the heights of vertical lines.
The horizontal spread of these lines and their relative heights indicate the
variability and other characteristics of the data.

Table 1: Number of flood occurrences per year from 1939 to 1972 at the
gauging station of Calamazza on the Magra River, between Pisa
and Genoa in northwestern Italy.

Number of Floods in a Year    Number of Occurrences
            0                           0
            1                           2
            2                           6
            3                           7
            4                           9
            5                           4
            6                           1
            7                           4
            8                           1
            9                           0
          Total                        34

A flood occurrence is defined as a river discharge exceeding 300 m³/s.
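The line diagram described above can be sketched in text form. The following is a minimal illustration, not the plotting method used for the slides; the function name `line_diagram` is hypothetical, and the counts are copied from Table 1.

```python
# Flood counts from Table 1: floods in a year -> number of years observed.
occurrences = {0: 0, 1: 2, 2: 6, 3: 7, 4: 9, 5: 4, 6: 1, 7: 4, 8: 1, 9: 0}


def line_diagram(freq):
    """Return one text row per value; the bar length equals the frequency."""
    return [f"{value:>2} | {'#' * count} ({count})"
            for value, count in sorted(freq.items())]


total_years = sum(occurrences.values())               # years of record
most_common = max(occurrences, key=occurrences.get)   # value with the tallest line

for row in line_diagram(occurrences):
    print(row)
print("years of record:", total_years, "| most frequent count:", most_common)
```

The horizontal position of each row corresponds to the discrete variable and the bar length to its number of occurrences, which is the same information a drawn line diagram conveys.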

[Figure: Line diagram/bar chart]

Dot Diagram
It is used to present continuous data. If the data are few (say, fewer than 25
items), a dot diagram is a useful visual aid.
Table 2: The first 15 items of the modulus of rupture data, measuring timber
strengths in N/mm²
29.11  29.93  32.02  32.40  33.06
34.12  35.58  39.34  40.53  41.64
45.54  48.37  48.78  50.98  65.35

Dot diagram for a short sample of timber strength (modulus of rupture, a
material property defined as the stress in a material just before it yields
in a flexure test).

Histogram
If there are at least, say, 25 observations, one of the most common graphical
forms is a block diagram called the histogram. For this purpose, the data are
divided into groups according to their magnitudes. The horizontal axis of the
graph gives the magnitudes. Blocks are drawn to represent the groups, each
of which has a distinct upper and lower limit. The area of a block is
proportional to the number of occurrences in the group. The variability of the
data is shown by the horizontal spread of the blocks, and the most common
values are found in blocks with the largest areas.
To draw a histogram, one divides the range into a number of classes or cells
nc. The number of occurrences in each class is counted and tabulated. These
are called frequencies.
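The counting step can be sketched as follows. This is an illustration only: it assumes non-negative data, classes that start at zero, and classes that include their lower limit but not their upper limit. The function name `class_frequencies` is hypothetical, and the sample is the 15 values of Table 2.

```python
import math


def class_frequencies(data, width, n_classes):
    """Count how many observations fall in each class [k*width, (k+1)*width)."""
    counts = [0] * n_classes
    for x in data:
        k = math.floor(x / width)   # index of the class containing x
        counts[k] += 1
    return counts


# First 15 modulus of rupture values (Table 2), in N/mm2.
sample = [29.11, 29.93, 32.02, 32.40, 33.06, 34.12, 35.58, 39.34,
          40.53, 41.64, 45.54, 48.37, 48.78, 50.98, 65.35]

frequencies = class_frequencies(sample, width=8, n_classes=9)
print(frequencies)
```

Each entry of `frequencies` is the height of one block when all classes have the same width.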

Number of classes (Sturges, 1926)

$$ n_c = 1 + 3.3 \log_{10} n $$

Number of classes (Freedman & Diaconis, 1981)

$$ n_c = \frac{r \, n^{1/3}}{2 \, \mathrm{iqr}}, \qquad \mathrm{iqr} = Q_3 - Q_1 $$

where:
nc  = number of classes
n   = number of data samples
iqr = interquartile range
r   = range
Q1  = first quartile (median of the lower half of the data)
Q3  = third quartile (median of the upper half of the data)
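As a quick check of the two rules, the sketch below evaluates them for n = 165, r = 70.22 and iqr = 11.66, the values used in the timber strength example of these slides. The function names are hypothetical, and the logarithm is assumed to be base 10, which is what reproduces the 8.32 quoted in that example.

```python
import math


def sturges_classes(n):
    """Sturges (1926): nc = 1 + 3.3 log10(n)."""
    return 1 + 3.3 * math.log10(n)


def freedman_diaconis_classes(r, n, iqr):
    """Freedman & Diaconis (1981): nc = r * n^(1/3) / (2 * iqr)."""
    return r * n ** (1 / 3) / (2 * iqr)


n, r, iqr = 165, 70.22, 11.66   # values from the timber strength example
print(round(sturges_classes(n), 2))
print(round(freedman_diaconis_classes(r, n, iqr), 2))
```

Both results are only starting points; the number of classes is then rounded to a convenient value so that the class width is a round number.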

Frequency Polygon
A frequency polygon is a useful diagnostic tool to determine the distribution
of a variable. It can be drawn by joining the midpoints of the tops of the
rectangles of a histogram after extending the diagram by one class on both
sides. We assume that equal class widths are used. If the ordinates of a
histogram are divided by the total number of observations, then a relative
frequency histogram is obtained. Thus, the ordinates for each class denote the
probabilities bounded by 0 and 1, by which we simply mean the chances of
occurrence. The resulting diagram is called the relative frequency polygon.
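The construction of the polygon can be sketched as below, assuming equal class widths as stated above. The function name `frequency_polygon` is hypothetical; the class centers and absolute frequencies are those of the timber strength data with a class width of 8 N/mm2.

```python
def frequency_polygon(centers, rel_freqs):
    """Return (x, y) vertices: class centers joined, padded by one empty class per side."""
    width = centers[1] - centers[0]   # equal class widths assumed
    xs = [centers[0] - width] + list(centers) + [centers[-1] + width]
    ys = [0.0] + list(rel_freqs) + [0.0]
    return list(zip(xs, ys))


centers = [4, 12, 20, 28, 36, 44, 52, 60, 68]   # class centers, N/mm2
abs_freqs = [1, 0, 7, 27, 58, 48, 16, 4, 4]     # absolute frequencies
n = sum(abs_freqs)
rel_freqs = [f / n for f in abs_freqs]          # ordinates between 0 and 1

vertices = frequency_polygon(centers, rel_freqs)
for x, y in vertices:
    print(x, round(y, 3))
```

The two added vertices with zero ordinate close the polygon on the horizontal axis, one class beyond each end of the histogram.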

Cumulative Relative Frequency Diagram

A cumulative frequency plot is a way to display cumulative information
graphically. It shows the number, percentage, or proportion of observations
that are less than or equal to particular values.

http://stattrek.com/statistics/charts/cumulative-plot.aspx

Table 3: Modulus of rupture data from 50 mm x 150 mm Swedish redwood and
whitewood timber in N/mm² (values in ascending order, reading down each column).
0.00 28.00 31.60 34.40 36.84 39.21 41.75 44.30 47.25 53.99
17.98 28.13 32.02 34.49 36.85 39.33 41.78 44.36 47.42 54.04
22.67 28.46 32.03 34.56 36.88 39.34 41.85 44.36 47.61 54.71
22.74 28.69 32.40 34.63 36.92 39.60 42.31 44.51 47.74 55.23
22.75 28.71 32.48 35.03 37.51 39.62 42.47 44.54 47.83 56.60
23.14 28.76 32.68 35.17 37.65 39.77 43.07 44.59 48.37 56.80
23.16 28.83 32.76 35.30 37.69 39.93 43.12 44.78 48.39 57.99
23.19 28.97 33.06 35.43 37.78 39.97 43.26 44.78 48.78 58.34
24.09 28.98 33.14 35.58 38.00 40.20 43.33 45.19 49.57 65.35
24.25 29.11 33.18 35.67 38.05 40.27 43.33 45.54 49.59 65.61
24.84 29.90 33.19 35.88 38.16 40.39 43.41 45.92 49.65 69.07
25.39 29.93 33.47 35.89 38.64 40.53 43.48 45.97 50.91 70.22
25.98 30.02 33.61 36.00 38.71 40.71 43.48 46.01 50.98
26.63 30.05 33.71 36.38 38.81 40.85 43.64 46.33 51.39
27.31 30.33 33.92 36.47 39.05 40.85 43.99 46.50 51.90
27.90 30.53 34.12 36.53 39.15 41.64 44.00 46.86 53.00
27.93 31.33 34.40 36.81 39.20 41.72 44.07 46.99 53.63

Number of classes (Sturges, 1926)

$$ n_c = 1 + 3.3 \log_{10} n = 1 + 3.3 \log_{10}(165) = 8.32, \text{ say } 9 $$

Class Width

$$ w = \frac{r}{n_c} = \frac{70.22}{9} = 7.80, \text{ say } 8 \ \mathrm{N/mm^2} $$

Class Upper    Class      Absolute     Relative     Cumulative Relative
Limit          Center     Frequency    Frequency    Frequency (%)
(N/mm²)        (N/mm²)
    8             4           1          0.006            0.61
   16            12           0          0.000            0.61
   24            20           7          0.042            4.85
   32            28          27          0.164           21.21
   40            36          58          0.352           56.36
   48            44          48          0.291           85.45
   56            52          16          0.097           95.15
   64            60           4          0.024           97.58
   72            68           4          0.024          100.00
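The relative and cumulative relative frequency columns follow directly from the absolute frequencies. A minimal sketch, assuming the absolute frequencies tabulated above for a class width of 8 N/mm2 (variable names are illustrative):

```python
from itertools import accumulate

abs_freqs = [1, 0, 7, 27, 58, 48, 16, 4, 4]   # absolute frequencies per class
n = sum(abs_freqs)                            # total number of observations

rel_freqs = [f / n for f in abs_freqs]                  # proportions
cum_pct = [100 * c / n for c in accumulate(abs_freqs)]  # running total, in percent

print([round(p, 3) for p in rel_freqs])
print([round(c, 2) for c in cum_pct])
```

The last cumulative value is always 100 percent, which is a convenient check on the tabulation.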

[Figure: Histogram for timber strength data with class width of 8 N/mm².
Horizontal axis: Modulus of Rupture (N/mm²), classes 0-7.99 to 64.00-71.99.
Vertical axis: Relative Frequency, 0.00 to 0.40.]

[Figure: Relative frequency polygon for timber strength data with class width
of 8 N/mm². Horizontal axis: Modulus of Rupture (N/mm²), classes 0-7.99 to
64.00-71.99. Vertical axis: Relative Frequency, 0.00 to 0.40.]

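Number of classes (Freedman & Diaconis, 1981)

$$ Q_1 = \frac{32.76 + 33.06}{2} = 32.91 $$

$$ Q_2 = 39.05 $$

$$ Q_3 = \frac{44.54 + 44.59}{2} = 44.57 $$

$$ \mathrm{iqr} = Q_3 - Q_1 = 44.57 - 32.91 = 11.66 $$

$$ r = x_{max} - x_{min} = 70.22 - 0.00 = 70.22 $$

$$ n_c = \frac{r \, n^{1/3}}{2 \, \mathrm{iqr}} = \frac{(70.22)(165)^{1/3}}{2(11.66)} = 16.52, \text{ say } 15 $$

Class Width

$$ w = \frac{r}{n_c} = \frac{70.22}{15} = 4.68, \text{ say } 5 \ \mathrm{N/mm^2} $$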
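The quartiles above can be reproduced from the raw data. The sketch below is one way to do it, assuming the Table 3 values as transcribed here (first row starting with 0.00, and the entry printed as "43..33" read as 43.33) and taking Q1 and Q3 as the medians of the lower and upper halves with the middle observation excluded.

```python
from statistics import median

# Table 3 values in N/mm2, typed row by row; 165 observations in total.
data = [
    0.00, 28.00, 31.60, 34.40, 36.84, 39.21, 41.75, 44.30, 47.25, 53.99,
    17.98, 28.13, 32.02, 34.49, 36.85, 39.33, 41.78, 44.36, 47.42, 54.04,
    22.67, 28.46, 32.03, 34.56, 36.88, 39.34, 41.85, 44.36, 47.61, 54.71,
    22.74, 28.69, 32.40, 34.63, 36.92, 39.60, 42.31, 44.51, 47.74, 55.23,
    22.75, 28.71, 32.48, 35.03, 37.51, 39.62, 42.47, 44.54, 47.83, 56.60,
    23.14, 28.76, 32.68, 35.17, 37.65, 39.77, 43.07, 44.59, 48.37, 56.80,
    23.16, 28.83, 32.76, 35.30, 37.69, 39.93, 43.12, 44.78, 48.39, 57.99,
    23.19, 28.97, 33.06, 35.43, 37.78, 39.97, 43.26, 44.78, 48.78, 58.34,
    24.09, 28.98, 33.14, 35.58, 38.00, 40.20, 43.33, 45.19, 49.57, 65.35,
    24.25, 29.11, 33.18, 35.67, 38.05, 40.27, 43.33, 45.54, 49.59, 65.61,
    24.84, 29.90, 33.19, 35.88, 38.16, 40.39, 43.41, 45.92, 49.65, 69.07,
    25.39, 29.93, 33.47, 35.89, 38.64, 40.53, 43.48, 45.97, 50.91, 70.22,
    25.98, 30.02, 33.61, 36.00, 38.71, 40.71, 43.48, 46.01, 50.98,
    26.63, 30.05, 33.71, 36.38, 38.81, 40.85, 43.64, 46.33, 51.39,
    27.31, 30.33, 33.92, 36.47, 39.05, 40.85, 43.99, 46.50, 51.90,
    27.90, 30.53, 34.12, 36.53, 39.15, 41.64, 44.00, 46.86, 53.00,
    27.93, 31.33, 34.40, 36.81, 39.20, 41.72, 44.07, 46.99, 53.63,
]
data.sort()

n = len(data)
half = n // 2                          # size of each half; middle value left out when n is odd
q1 = median(data[:half])               # median of the lower half
q2 = median(data)                      # overall median
q3 = median(data[-half:])              # median of the upper half
iqr = q3 - q1
r = max(data) - min(data)
nc = r * n ** (1 / 3) / (2 * iqr)      # Freedman & Diaconis estimate

print(n, round(q1, 2), q2, round(q3, 3), round(iqr, 3), r, round(nc, 2))
```

The unrounded third quartile is 44.565, which the hand calculation rounds to 44.57; the resulting number of classes agrees to two decimals either way.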

Class Upper    Class      Absolute     Relative     Cumulative Relative
Limit          Center     Frequency    Frequency    Frequency (%)
(N/mm²)        (N/mm²)
    5            2.5          1          0.006            0.61
   10            7.5          0          0.000            0.61
   15           12.5          0          0.000            0.61
   20           17.5          1          0.006            1.21
   25           22.5          9          0.055            6.67
   30           27.5         18          0.109           17.58
   35           32.5         26          0.158           33.33
   40           37.5         38          0.230           56.36
   45           42.5         34          0.206           76.97
   50           47.5         20          0.121           89.09
   55           52.5          9          0.055           94.55
   60           57.5          5          0.030           97.58
   65           62.5          0          0.000           97.58
   70           67.5          3          0.018           99.39
   75           72.5          1          0.006          100.00

[Figure: Histogram for timber strength data with class width of 5 N/mm².
Horizontal axis: Modulus of Rupture (N/mm²), classes 0-4.99 to 70.00-74.99.
Vertical axis: Relative Frequency, 0.000 to 0.250.]

[Figure: Relative frequency polygon for timber strength data with class width
of 5 N/mm². Horizontal axis: Modulus of Rupture (N/mm²), classes 0-4.99 to
70.00-74.99. Vertical axis: Relative Frequency, 0.000 to 0.250.]

[Figure: Cumulative relative frequency diagram for timber strength data]

Measure of Central Tendency


A measure of central tendency summarizes a data set by locating a central or
representative value.
Mean - the numerical average of the values, that is, their sum divided by the
number of observations in the sample.
Median - the central value in an ordered set, or the average of the two
central values if the number of values, n, is even.
Mode - the value that occurs most frequently.
Range - the difference between the maximum and minimum values.
$$ R = x_{max} - x_{min} $$
Outliers - values that lie well outside the normal range of the data.

Data from Concrete Test

Order    Density (kg/m³)    Compressive Strength (MPa)
  1          2,411                   49.9
  2          2,415                   50.7
  3          2,425                   52.5
  4          2,427                   53.2
  5          2,427                   53.4
  6          2,428                   54.4
  7          2,429                   54.6
  8          2,433                   55.8
  9          2,434                   56.3
 10          2,435                   56.7
 Sum        24,264                  537.5

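Given the data from the concrete test above, perform the following:

Determine the mean value of the density of concrete.

$$ \bar{x} = \frac{\sum x}{n} = \frac{24{,}264}{10} = 2{,}426.4 \ \mathrm{kg/m^3} $$

Determine the median value of the density of concrete.

$$ \tilde{x} = \frac{2{,}427 + 2{,}428}{2} = 2{,}427.5 \ \mathrm{kg/m^3} $$

Determine the mode of the density of concrete.
Most frequent value = 2,427 kg/m³

Determine the range of the density of concrete.

$$ R = x_{max} - x_{min} = 2{,}435 - 2{,}411 = 24 \ \mathrm{kg/m^3} $$

Determine the mean value of the compressive strength of concrete.

$$ \bar{x} = \frac{\sum x}{n} = \frac{537.5}{10} = 53.75 \ \mathrm{MPa} $$

Determine the median value of the compressive strength of concrete.

$$ \tilde{x} = \frac{53.4 + 54.4}{2} = 53.9 \ \mathrm{MPa} $$

Determine the mode of the compressive strength of concrete.
Most frequent value: none (no value occurs more than once)

Determine the range of the compressive strength of concrete.

$$ R = x_{max} - x_{min} = 56.7 - 49.9 = 6.8 \ \mathrm{MPa} $$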
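The same quantities can be checked with a few lines of Python. This is a sketch using the standard library; the helper names `modes` and `summary` are hypothetical, and a sample in which no value repeats is reported here as having no mode.

```python
from collections import Counter
from statistics import mean, median

density = [2411, 2415, 2425, 2427, 2427, 2428, 2429, 2433, 2434, 2435]    # kg/m3
strength = [49.9, 50.7, 52.5, 53.2, 53.4, 54.4, 54.6, 55.8, 56.3, 56.7]   # MPa


def modes(values):
    """Return the value(s) that repeat most often, or [] if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top] if top > 1 else []


def summary(values):
    """Mean, median, mode and range of a sample."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": modes(values),
        "range": max(values) - min(values),
    }


print(summary(density))
print(summary(strength))
```

For the compressive strength the printed values may show small floating-point rounding in the last digits.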

Harmonic Mean - It is applied in situations where the reciprocal of a
variable is averaged. It is the reciprocal of the mean of the reciprocals. Thus
the harmonic mean for a sample of observations, x1, x2, ..., xn, is

$$ \bar{x}_h = \frac{1}{\frac{1}{n} \sum_{i=1}^{n} \frac{1}{x_i}} $$

Example: Stream flow velocity. A practical example of the harmonic mean
is the determination of the mean velocity of a stream based on measurements
of travel times over a given reach of the stream using a floating device. For
instance, if three velocities are calculated as 0.20, 0.24, and 0.16 m/s,
calculate the harmonic mean.

$$ \bar{x}_h = \frac{1}{\frac{1}{3} \left( \frac{1}{0.20} + \frac{1}{0.24} + \frac{1}{0.16} \right)} = 0.19 \ \mathrm{m/s} $$
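The stream velocity example can be verified numerically. In this sketch the function `harmonic` is a hypothetical helper written directly from the definition, compared against `statistics.harmonic_mean` from the Python standard library.

```python
from statistics import harmonic_mean


def harmonic(values):
    """Reciprocal of the mean of the reciprocals."""
    n = len(values)
    return 1 / (sum(1 / x for x in values) / n)


velocities = [0.20, 0.24, 0.16]   # m/s

print(round(harmonic(velocities), 4))
print(round(harmonic_mean(velocities), 4))
```

The harmonic mean (about 0.195 m/s) is lower than the arithmetic mean of the same velocities (0.20 m/s), because the slower velocities correspond to longer travel times and therefore carry more weight.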

Geometric mean is used in averaging values that represent a rate of
change. Here the variable follows an exponential, that is, a logarithmic law.
For a sample of observations, x1, x2, ..., xn, the geometric mean is the positive
nth root of the product of the n values.

$$ \bar{x}_g = \left( \prod_{i=1}^{n} x_i \right)^{1/n} $$

Example: Suppose a storage bin for cement has dimensions 9.0 m x 2.0 m x
1.5 m. Find the mean value of length, width and height such that the volume
of the bin remains the same.

$$ \bar{x}_g = (9 \times 2 \times 1.5)^{1/3} = 3 \ \mathrm{m} $$

Population growth: Consider the case of populations of towns and cities that
increase geometrically, which means that a future increase is expected that is
proportional to the current population. Such information is invaluable for
planning and designing urban water supplies and sewerage systems. Suppose,
for example, that according to a census conducted in 1970 and again in 1990
the population of a city had increased from 230,000 to 310,000. An engineer
needs to verify, for purposes of design, the per capita consumption of water in
the intermediate period and hence tries to estimate the population in 1980.
The central value to use in this situation is the geometric mean of the two
numbers, which is

$$ \bar{x}_g = \left( \prod_{i=1}^{n} x_i \right)^{1/n} = \sqrt{230{,}000 \times 310{,}000} = 267{,}021 $$
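Both geometric mean examples (the cement bin and the 1980 population estimate) can be checked with a short sketch. The helper name `geometric` is hypothetical; `math.prod` requires Python 3.8 or later.

```python
import math


def geometric(values):
    """Positive nth root of the product of the n values."""
    return math.prod(values) ** (1 / len(values))


bin_dimensions = [9.0, 2.0, 1.5]    # m
populations = [230_000, 310_000]    # census counts for 1970 and 1990

print(round(geometric(bin_dimensions), 3))
print(round(geometric(populations)))
```

For the population case the geometric mean falls below the arithmetic mean of the two census counts (270,000), as expected for growth that is proportional to the current population.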

Measures of Dispersion
A measure of dispersion represents the degree of scatter shown by
observations or the inherent variability in a phenomenon under
observation. Dispersion also indicates the precision of the data. One
method of quantification is through an order statistic, that is, one of
ranked data. The simplest in the category is the range.

Mean absolute deviation (d) - measures the average absolute deviation from
the sample mean.

$$ d = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - \bar{x} \right| $$

Standard deviation (s) - the root mean square deviation about the mean.

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} $$

Example: Annual rainfall. If the annual rainfalls in a city are 50, 56, 42,
53, and 49 cm over a 5-year period, with a mean value of 50 cm, determine
the following:
a. Mean absolute deviation from the mean
b. Standard deviation from the mean
Part a:

$$ d = \frac{1}{5} \left( |50 - 50| + |56 - 50| + |42 - 50| + |53 - 50| + |49 - 50| \right) = 3.6 \ \mathrm{cm} $$

Part b:

$$ s = \sqrt{\frac{1}{5} \left[ (50 - 50)^2 + (56 - 50)^2 + (42 - 50)^2 + (53 - 50)^2 + (49 - 50)^2 \right]} = 4.69 \ \mathrm{cm} $$
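The rainfall example can be reproduced as follows. The helper names are hypothetical; the standard deviation divides by n, as in the worked example, which corresponds to `statistics.pstdev` rather than `statistics.stdev`.

```python
from statistics import mean, pstdev

rainfall = [50, 56, 42, 53, 49]   # annual rainfall, cm


def mean_abs_dev(values):
    """Average absolute deviation from the sample mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)


def std_dev(values):
    """Root mean square deviation about the mean (divisor n)."""
    m = mean(values)
    return (sum((x - m) ** 2 for x in values) / len(values)) ** 0.5


print(mean(rainfall), mean_abs_dev(rainfall), round(std_dev(rainfall), 2))
```

Using a divisor of n - 1 instead of n would give a slightly larger value; the slides use n.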

Measure of Asymmetry
Another important property of the histogram or frequency polygon is its
shape with respect to symmetry (on either side of the mode). The sample
coefficient of skewness measures the asymmetry of a set of data about its
mean. For a sample of observations, x1, x2, ..., xn, it is defined as

$$ g_1 = \frac{1}{n s^3} \sum_{i=1}^{n} (x_i - \bar{x})^3 $$

where s is the standard deviation.

Skewness

Value of g1    Skew                                      Remarks
Positive       Positive skew                             mean > median > mode
Zero           Symmetric (e.g. normal distribution)      mean = median = mode
Negative       Negative skew                             mean < median < mode

[Figure: three distribution shapes: negative skew (skewed to the left),
normal distribution (symmetrical), positive skew (skewed to the right)]

Skewness
A unitless indicator used in distribution analysis as a sign of asymmetry
and deviation from a normal distribution.

Interpretation:
Skewness > 0 - right-skewed distribution
- most values are concentrated to the left of the mean, with
extreme values to the right.
Skewness < 0 - left-skewed distribution
- most values are concentrated to the right of the mean, with
extreme values to the left.
Skewness = 0 - mean is equal to the median
- the distribution is symmetrical around the mean.
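A minimal sketch of the sample coefficient of skewness follows, assuming the form g1 = sum((xi - mean)^3) / (n s^3) with s computed using a divisor of n. The function name `skewness` is hypothetical, and the annual rainfall values from the dispersion example are reused purely for illustration.

```python
def skewness(values):
    """Sample coefficient of skewness g1 (divisor n throughout)."""
    n = len(values)
    m = sum(values) / n
    s = (sum((x - m) ** 2 for x in values) / n) ** 0.5
    return sum((x - m) ** 3 for x in values) / (n * s ** 3)


rainfall = [50, 56, 42, 53, 49]   # cm, from the annual rainfall example
symmetric = [1, 2, 3, 4, 5]       # evenly spaced, symmetric about 3

print(round(skewness(rainfall), 2))
print(skewness(symmetric))
```

The rainfall sample gives a negative value because its largest deviation from the mean (42 cm) lies on the low side.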

Measure of Peakedness

The extent of the relative steepness of ascent in the vicinity and on either side
of the mode in a histogram or frequency polygon is said to be a measure of its
peakedness or tail weight. This is quantified by the dimensionless sample
coefficient of kurtosis, which is defined for a sample of observations,
x1, x2, ..., xn, by

$$ g_2 = \frac{1}{n s^4} \sum_{i=1}^{n} (x_i - \bar{x})^4 $$

Kurtosis

Value of g2    Distribution                Peakedness
> 3            Leptokurtic distribution    Peaked
= 3            Mesokurtic distribution     Normal
< 3            Platykurtic distribution    Flat

Kurtosis
A unitless indicator used in distribution analysis as a sign of flattening or
"peakedness" of a distribution.

Interpretation:
Kurtosis > 3 - Leptokurtic distribution, sharper than a normal distribution,
with values concentrated around the mean and thicker tails. This means
high probability for extreme values.
Kurtosis < 3 - Platykurtic distribution, flatter than a normal distribution
with a wider peak. The probability for extreme values is less than for a
normal distribution, and the values are wider spread around the mean.
Kurtosis = 3 - Mesokurtic distribution, for example the normal distribution.
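A minimal sketch of the sample coefficient of kurtosis follows, assuming the form g2 = sum((xi - mean)^4) / (n s^4) with s computed using a divisor of n, so that a normal distribution corresponds to a value of 3. The function names are hypothetical, and the rainfall values from the dispersion example are reused purely for illustration.

```python
def kurtosis(values):
    """Sample coefficient of kurtosis g2 (divisor n throughout)."""
    n = len(values)
    m = sum(values) / n
    s2 = sum((x - m) ** 2 for x in values) / n   # variance
    return sum((x - m) ** 4 for x in values) / (n * s2 ** 2)


def classify(g2, tol=1e-9):
    """Label a kurtosis value using the thresholds listed above."""
    if g2 > 3 + tol:
        return "leptokurtic"
    if g2 < 3 - tol:
        return "platykurtic"
    return "mesokurtic"


rainfall = [50, 56, 42, 53, 49]   # cm, from the annual rainfall example
g2 = kurtosis(rainfall)
print(round(g2, 2), classify(g2))
```

With only five observations the estimate is very rough; the coefficient is more meaningful for larger samples such as the timber strength data.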

