
ENGSP103: MATHEMATICAL MODELLING AND ANALYSIS

ENGINEERING UNCERTAINTIES
Andy Chow (CEGE) - 2015

1. Uncertainties in engineering
Engineering design and analysis are essentially based on quantitative models. However,
irrespective of the level of sophistication of these models, they rest on certain idealized
assumptions or conditions. Consequently, information derived from these quantitative models
may not reflect reality closely.
A main reason why engineers should know about probability and statistics is that many
phenomena or problems of concern to engineers involve randomness, variability, or
uncertainties¹.
Because of the existence of (unknown) variability, there is usually a range of measurements or
observations from even apparently identical conditions. This suggests that engineering decisions
should not be based on just one single observation or measurement. Presumably, we may assume
worst conditions (e.g. the highest possible flood, smallest observed fatigue life of materials, etc).
However, the resulting design can be too costly because of this over-conservative approach.
Engineering designs and decisions often have to be made under uncertainty, and an appropriate trade-off analysis should consider the uncertainties involved.
With probability and statistics, engineers can have a better understanding of phenomena or
problems of concern, and hence make better decisions.
In summary, probability and statistics offer the mathematical basis for:

- collecting and analyzing data (statistics),
- evaluating the effects of uncertainties in engineering design (probability),
- deriving information and knowledge from measurements (both), and
- making decisions based on observations (both).

¹ Uncertainties can be regarded as quantities or phenomena that we do not expect and/or cannot predict exactly.

2. Random variables and probability distributions

The concepts of random variables and probability distributions are central to probability theory
and statistical analysis.

2.1 Random variable


A random variable is a variable (e.g. concrete strength, rainfall, traffic accident rate, etc.) whose
value (or range of values) is uncertain and unpredictable.
A random variable can be either discrete or continuous.
Discrete random variables can only take a countable set of individual values (e.g. integers);
continuous random variables can take any value between two limits (e.g. any real number
between zero and one).

2.2 Probability distributions


A random variable can be statistically specified by a probability distribution function.

2.2.1 Probability mass function


The distribution of a discrete random variable can be specified by a probability mass function
(pmf).

Definition (Probability mass function): The probability mass function (pmf) of a random variable
X gives the probability of a particular value taken by X, i.e.

$p_X(a) = P(X = a),$

where $p_X$ denotes the pmf of X.
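
As a quick illustration (a minimal Python sketch; the fair-die example is an assumption added here, not part of the notes), a pmf can be stored as a simple mapping from values to probabilities:

```python
# Minimal sketch: pmf of a fair six-sided die (illustrative example).
pmf = {x: 1 / 6 for x in range(1, 7)}  # p_X(x) = 1/6 for x = 1, ..., 6

print(pmf[3])             # P(X = 3) = 0.1666...
print(sum(pmf.values()))  # probabilities over all values sum to 1.0
```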

2.2.2 Cumulative distribution function of a discrete random variable


Definition (Cumulative distribution function): For a discrete (or continuous) random
variable X, the cumulative distribution function (cdf) specifies the probability of it being less
than or equal to a certain value a, i.e.

$F_X(a) = P(X \le a),$

where $F_X$ denotes the cdf of X.

We note that $F_X$ is a monotonically increasing function of a, and $0 \le F_X(a) \le 1$ for all possible a.
In the case of a discrete random variable, following the addition rule (Axiom 3, Section 2.2 in
Lecture 2), $F_X(a)$ is the sum of the probabilities of all possible values of X that are less than or
equal to a, i.e.

$F_X(a) = P(X \le a) = \sum_{x \le a} p_X(x).$
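
Continuing the illustrative die sketch (again an assumption, not from the notes), the cdf of a discrete random variable is just a running sum of its pmf:

```python
# Minimal sketch: cdf of a discrete random variable from its pmf.
pmf = {x: 1 / 6 for x in range(1, 7)}  # fair die, illustrative assumption

def cdf(a):
    # F_X(a) = sum of p_X(x) over all x <= a
    return sum(p for x, p in pmf.items() if x <= a)

print(cdf(3))  # P(X <= 3) = 0.5
print(cdf(6))  # P(X <= 6) = 1.0
```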

2.2.3 Probability density function


The distribution of a continuous random variable is specified by a probability density function
(pdf). The probability density function arises as a limiting case in which the sample size is
infinite.

Definition (Probability density function): The probability density function (pdf) of a random
variable X gives the probability of a range of values taken by X, i.e.

$\int_a^b f_X(x)\,dx = P(a \le X \le b),$

where $f_X$ denotes the pdf of X, and $f_X(x) \ge 0$ for all possible x.

Note that the quantity $\int_a^b f_X(x)\,dx = P(a \le X \le b)$ lies between zero and one.

As observed from the expression in the definition above, the pdf $f_X$ by itself is not a probability; it is
merely an intensity of probability, or a rate of change of probability with respect to the values of X.
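
To make this concrete, here is a minimal Python sketch that approximates $P(a \le X \le b)$ by numerical integration; the pdf $f_X(x) = 2x$ on $[0, 1]$ is an illustrative assumption:

```python
# Minimal sketch: P(a <= X <= b) as the area under an assumed pdf.
def f_X(x):
    return 2 * x  # valid pdf on [0, 1]: non-negative, total area 1

def prob(a, b, n=100_000):
    # Trapezoidal-rule approximation of the integral of f_X from a to b.
    h = (b - a) / n
    area = 0.5 * (f_X(a) + f_X(b)) + sum(f_X(a + i * h) for i in range(1, n))
    return area * h

print(prob(0.2, 0.5))  # ~0.21, the exact value being 0.5**2 - 0.2**2
print(prob(0.0, 1.0))  # ~1.0, the total probability
```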
Exercise: For a continuous random variable X, what is the probability $P(X = a)$?

2.2.4 Cumulative distribution function of a continuous random variable



In the case of a continuous random variable, the associated cdf $F_X(a)$ is defined as

$F_X(a) = P(X \le a) = \int_{-\infty}^{a} f_X(x)\,dx.$

It also follows that

$f_X(x) = \frac{dF_X(x)}{dx}.$

Also note that

$\int_{-\infty}^{\infty} f_X(x)\,dx = 1.$

3. Descriptors of random variables


3.1 Expectation

The expected value (denoted by E(X)) of a random variable X is a weighted average of X with
respect to its underlying probability distribution.
For a discrete random variable X,

$E(X) = \sum_x x\,p_X(x).$

For a continuous random variable X,

$E(X) = \int_{-\infty}^{\infty} x\,f_X(x)\,dx.$

Sometimes this expected value is also denoted by $\mu_X$.
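
For instance (a small Python sketch; the fair-die pmf is an illustrative assumption), the discrete expectation is a probability-weighted sum:

```python
# Minimal sketch: E(X) of a discrete random variable.
pmf = {x: 1 / 6 for x in range(1, 7)}  # fair die, illustrative assumption
expected = sum(x * p for x, p in pmf.items())
print(expected)  # E(X) = 3.5
```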

3.2 Variance and standard deviation


For a discrete random variable, the variance with respect to its mean is calculated as

$\sigma_X^2 = E\{[X - E(X)]^2\} = \sum_x (x - \mu_X)^2 p_X(x).$

For a continuous random variable,

$\sigma_X^2 = E\{[X - E(X)]^2\} = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx.$

We have a useful expression relating the variance and expected values as follows:

Theorem:

$\sigma_X^2 = E(X^2) - \mu_X^2,$

where $E(X^2) = \sum_x x^2 p_X(x)$ for the discrete case, or $E(X^2) = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx$ for the continuous case.

Proof:

$\sigma_X^2 = E\{[X - E(X)]^2\} = E\{X^2 - 2XE(X) + [E(X)]^2\}$
$= E(X^2) - 2E(X)[E(X)] + [E(X)]^2$
$= E(X^2) - [E(X)]^2$
$= E(X^2) - \mu_X^2.$

Or, for the discrete case,

$\sigma_X^2 = \sum_x (x - \mu_X)^2 p_X(x)$
$= \sum_x x^2 p_X(x) - 2\mu_X \sum_x x\,p_X(x) + \mu_X^2 \sum_x p_X(x)$
$= E(X^2) - 2\mu_X(\mu_X) + \mu_X^2$
$= E(X^2) - \mu_X^2.$

For the continuous case,

$\sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx$
$= \int_{-\infty}^{\infty} x^2 f_X(x)\,dx - 2\mu_X \int_{-\infty}^{\infty} x\,f_X(x)\,dx + \mu_X^2 \int_{-\infty}^{\infty} f_X(x)\,dx$
$= E(X^2) - 2\mu_X(\mu_X) + \mu_X^2$
$= E(X^2) - \mu_X^2.$

Given the variance, we also have the standard deviation $\sigma_X = \sqrt{\sigma_X^2}$.
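
The theorem can be checked numerically (an illustrative Python sketch, once more assuming the fair-die pmf):

```python
# Minimal sketch: variance computed two equivalent ways.
pmf = {x: 1 / 6 for x in range(1, 7)}  # fair die, illustrative assumption

mu = sum(x * p for x, p in pmf.items())                      # E(X)
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())     # E[(X - mu)^2]
var_thm = sum(x ** 2 * p for x, p in pmf.items()) - mu ** 2  # E(X^2) - mu^2

print(var_def, var_thm)  # both ~2.9167, as the theorem requires
```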

4. Useful probability distributions


This section introduces some mathematical functions that we use to represent the underlying
distribution of random variables.

4.1. Uniform distribution


The uniform distribution is the simplest type of continuous distribution, in which the pdf is constant
over a given interval (say, from a to b):

$f_X(x) = \begin{cases} \frac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$

Exercise: Show that the expected value and variance of the uniform distribution are
$\frac{a+b}{2}$ and $\frac{(b-a)^2}{12}$ respectively.
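
These formulas can be checked by simulation (an illustrative Python sketch; the interval [2, 5] is an assumption):

```python
# Minimal sketch: sample mean/variance of Uniform(a, b) vs the formulas.
import random

a, b = 2.0, 5.0  # illustrative interval
xs = [random.uniform(a, b) for _ in range(100_000)]

mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)

print(mean, (a + b) / 2)       # both ~3.5
print(var, (b - a) ** 2 / 12)  # both ~0.75
```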

4.2. Bernoulli distribution


Consider the simplest type of experiment, in which there is only a single trial with two possible
outcomes: "true" or "false". The outcomes are mutually exclusive and collectively exhaustive.
The Bernoulli distribution corresponds to the distribution of the probabilities of the two outcomes of
such an experiment.

Let X be a random variable following the Bernoulli distribution, and let X take only two values: 1 if
the outcome is true, and 0 otherwise.
Given that the probability of getting a true outcome is p, the corresponding pmf of this
Bernoulli distribution can be defined as

$p_X(a) = p^a (1-p)^{1-a},$

where a = 0 or 1 and $0 \le p \le 1$, and

$p_X(a) = 0 \quad \text{if } a \ne 0 \text{ or } 1.$

The cdf of the Bernoulli distribution can be computed as

$F_X(a) = \sum_{x \le a} p_X(x) = \begin{cases} 0, & a < 0 \\ 1-p, & 0 \le a < 1 \\ 1, & \text{otherwise} \end{cases}$

We can also derive (see Appendix) the expected value and variance of this Bernoulli distribution:

$\mu_X = (1)p + (0)(1-p) = p$
$\sigma_X^2 = (1-p)^2 p + (0-p)^2 (1-p) = (1-p)p$
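
A minimal Python sketch of these results (the value p = 0.3 is an illustrative assumption):

```python
# Minimal sketch: Bernoulli pmf, mean, and variance.
p = 0.3  # illustrative probability of a "true" outcome

def pmf(a):
    return p ** a * (1 - p) ** (1 - a) if a in (0, 1) else 0.0

mean = sum(a * pmf(a) for a in (0, 1))
var = sum((a - mean) ** 2 * pmf(a) for a in (0, 1))
print(mean, var)  # 0.3 and 0.21, i.e. p and p(1 - p)
```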

4.3 Binomial distribution


Say we now repeat the preceding true-false experiment n times (such a series of experiments
is called a Bernoulli sequence). We are interested in determining the probability of having exactly a
"true" outcomes out of these n trials.
We note that the probability of having a particular arrangement or permutation of exactly a
"true" outcomes among n trials is $p^a (1-p)^{n-a}$, where p is the probability of having a "true" value in one
trial.

However, recall from combinatorics that we could have $\binom{n}{a} = \frac{n!}{a!(n-a)!}$ such combinations
of a "true" values given n trials.

Consequently, we have

$p_X(a) = \binom{n}{a} p^a (1-p)^{n-a},$

which is regarded as the pmf of the Binomial distribution, where n is a positive integer, a is an integer
with $0 \le a \le n$, and $0 \le p \le 1$; and

$p_X(a) = 0$ otherwise.

Note that the Binomial distribution reduces to the Bernoulli distribution if n = 1 (i.e. only one trial).
The cdf of the Binomial distribution can be computed as

$F_X(a) = \sum_{x=0}^{a} p_X(x) = \sum_{x=0}^{a} \binom{n}{x} p^x (1-p)^{n-x}.$

The expected value and the variance of the binomial distribution are $\mu_X = np$ and
$\sigma_X^2 = np(1-p)$ (see Appendix 1). They are indeed also the sums of the individual expected
values and variances of the underlying Bernoulli distributions (following Theorem 3.2).
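
A small Python sketch of the binomial pmf and its moments (n = 10 and p = 0.2 are illustrative assumptions):

```python
# Minimal sketch: binomial pmf, mean np, and variance np(1 - p).
from math import comb

n, p = 10, 0.2  # illustrative parameters

def pmf(a):
    return comb(n, a) * p ** a * (1 - p) ** (n - a)

mean = sum(a * pmf(a) for a in range(n + 1))
var = sum((a - mean) ** 2 * pmf(a) for a in range(n + 1))
print(mean, var)  # 2.0 and 1.6, i.e. np and np(1 - p)
```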

Return period: When we use this Bernoulli sequence to model a problem over time, the
number of time intervals (trials) between two successive "true" outcomes (or occurrences of the
event) is called the recurrence time. Moreover, the mean (expected) recurrence time is called the
(average) return period.
For a Bernoulli sequence, it can be shown that the recurrence time T between two successive
events follows the pmf

$p_T(t) = P(T = t) = p(1-p)^{t-1},$

where t is the number of time intervals (or trials), and it is a positive integer.
This distribution is known as the Geometric distribution. The expected return period can be
shown (see Appendix 2) to be $\bar{T} = \frac{1}{p}$, with variance $\sigma_T^2 = \frac{1-p}{p^2}$.
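
The return period can also be seen by simulation (an illustrative Python sketch; p = 0.1, loosely a "10-year event", is an assumption):

```python
# Minimal sketch: mean recurrence time of a Bernoulli sequence is ~1/p.
import random

p = 0.1  # illustrative per-interval probability of the event
times = []
for _ in range(10_000):
    t = 1
    while random.random() >= p:  # repeat trials until a "true" occurs
        t += 1
    times.append(t)

print(sum(times) / len(times), 1 / p)  # both ~10, the return period
```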

4.4 Poisson distribution


Suppose now we perform a large number of trials (i.e. n is large) and the probability p of getting
a "true" value in each trial is small. The binomial distribution discussed in the previous section
can then be shown to converge to a limiting distribution called the Poisson distribution.
For large n and small p, the pmf of the binomial distribution converges to

$p_X(a) = \frac{\lambda^a}{a!} e^{-\lambda},$

which is known as the Poisson distribution, in which $\lambda$ is the expected number of "true" values
observed. The proof can be found in Appendix 3.
Sometimes we will replace $\lambda$ with $\nu t$, where $\nu$ is the rate of observing a "true" outcome, and t is
a given time period (say, a day, hour, minute, or second).
Hence, the pmf of the Poisson distribution can also be written as

$p_X(a) = \frac{(\nu t)^a}{a!} e^{-\nu t}.$

Moreover, the expected value and variance of a Poisson distribution can be shown to both equal $\lambda$
(Appendix 4).
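
The convergence can be observed numerically (an illustrative Python sketch; n = 1000 and p = 0.003 are assumptions):

```python
# Minimal sketch: binomial pmf vs its Poisson limit for large n, small p.
from math import comb, exp, factorial

n, p = 1000, 0.003  # illustrative: many trials, small probability
lam = n * p         # lambda, the expected number of "true" outcomes

for a in range(6):
    binom = comb(n, a) * p ** a * (1 - p) ** (n - a)
    poisson = lam ** a / factorial(a) * exp(-lam)
    print(a, round(binom, 5), round(poisson, 5))  # nearly identical columns
```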

4.5. Exponential distribution


The exponential distribution is closely related to the Poisson distribution as follows:
Given a Poisson distribution,

$p_X(a) = \frac{(\nu t)^a}{a!} e^{-\nu t},$

the recurrence time T between two successive events can be shown to follow the exponential
distribution:

$f_T(t) = \nu e^{-\nu t},$

which is the pdf of the exponential distribution, in which $t \ge 0$.

We also have the cdf of the exponential distribution:

$F_T(t) = P(T \le t) = 1 - e^{-\nu t},$

where T is the trial or time when we observe a "true" value.
The expected value and variance of the exponential distribution are determined (making use of
integration by parts) as $\mu_T = \frac{1}{\nu}$ and $\sigma_T^2 = \frac{1}{\nu^2}$ respectively (see Appendix 5). It is interesting to note that the
coefficient of variation of the exponential distribution is one.
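
A brief numerical check (an illustrative Python sketch; the rate ν = 2 events per unit time is an assumption) that exponential recurrence times have mean 1/ν:

```python
# Minimal sketch: exponential recurrence times with rate nu.
import math
import random

nu = 2.0  # illustrative event rate
# Inverse-transform sampling: if U ~ Uniform(0, 1), then
# T = -ln(1 - U) / nu has cdf F_T(t) = 1 - exp(-nu * t).
ts = [-math.log(1 - random.random()) / nu for _ in range(100_000)]

mean = sum(ts) / len(ts)
var = sum((t - mean) ** 2 for t in ts) / len(ts)
print(mean, var)  # ~0.5 and ~0.25, i.e. 1/nu and 1/nu^2
```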

4.6. Normal distribution


The normal distribution, also known as the Gaussian distribution, is probably the most popular and
widely used probability distribution.
The normal distribution has the following pdf:

$f(a) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{a-\mu}{\sigma}\right)^2\right],$

where $-\infty < a < \infty$; $\mu$ and $\sigma$ are the expected value (mean) and the standard deviation of the
distribution respectively.
A usual short notation for the normal distribution is $N(\mu, \sigma)$.

Figure 1. Normal distributions with (a) varying $\mu$ and (b) varying $\sigma$.

The corresponding cdf, F, of the normal distribution is

$F(a) = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] dx.$

4.6.1. Standard normal distribution


A normal distribution with $\mu = 0$ and $\sigma = 1$ is known as the standard normal distribution, and is
denoted as N(0, 1).
The corresponding pdf of the standard normal distribution is

$\phi(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}z^2\right),$

in which $\phi$ is the notation for the standard normal pdf and z denotes a random variable following the
standard normal distribution.
Note that $\phi$ is symmetric about zero.

Figure 2. Standard normal distribution, with the shaded area $\Phi(z_i)$ under $\phi(z)$ to the left of $z_i$.

The cdf of the standard normal distribution is

$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}x^2\right) dx,$

which is indicated by the shaded area in Figure 2.


Given the standard normal distribution, any other normal distribution can be determined as
follows:
Suppose a random variable X follows the normal distribution $N(\mu, \sigma)$ for some $\mu$ and $\sigma$. Given a
pair of real numbers a, b with a < b, we have

$P(a \le X \le b) = \int_a^b \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] dx,$

which is the area under the normal pdf curve between a and b.
Instead of calculating this integral directly, it can also be evaluated by making the
following change of variable:

$z = \frac{x-\mu}{\sigma}, \quad dx = \sigma\,dz.$

Hence,

$P(a \le X \le b) = \int_{(a-\mu)/\sigma}^{(b-\mu)/\sigma} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}z^2\right) dz = \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right),$

which can be recognized as the area under the standard normal pdf between $\frac{a-\mu}{\sigma}$ and $\frac{b-\mu}{\sigma}$.

Furthermore, it is useful to reckon that the areas (or probabilities) covered within one, two and
three standard deviations about the mean ($\mu = 0$) of the standard normal distribution are
respectively 68.3%, 95.4%, and 99.7% (see Figure 3).
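
This standardization is straightforward to compute (an illustrative Python sketch using the identity $\Phi(z) = \frac{1}{2}[1 + \mathrm{erf}(z/\sqrt{2})]$; the values $\mu = 100$, $\sigma = 15$ and the interval [85, 130] are assumptions):

```python
# Minimal sketch: P(a <= X <= b) for X ~ N(mu, sigma) via standardization.
from math import erf, sqrt

def Phi(z):
    # Standard normal cdf, via the error-function identity.
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 100.0, 15.0  # illustrative parameters
a, b = 85.0, 130.0       # illustrative interval

print(Phi((b - mu) / sigma) - Phi((a - mu) / sigma))  # ~0.8186

# Sanity check: coverage within 1, 2, 3 sigma = 68.3%, 95.4%, 99.7%.
for k in (1, 2, 3):
    print(k, round(Phi(k) - Phi(-k), 3))
```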
4.6.2. Further property of normal distribution: linear transformation
Suppose that X is a random variable following $N(\mu, \sigma)$; then

$Y = a + bX$

will be a random variable following $N(a + b\mu, |b|\sigma)$.
Following this property, given the standard normal random variable $Z \sim N(0, 1)$, any other
normally distributed random variable $X \sim N(\mu, \sigma)$ can be expressed in terms of Z as

$X = \mu + \sigma Z.$

Hence, it can be deduced that, for any X following the normal distribution $N(\mu, \sigma)$:

$P(\mu - \sigma \le X \le \mu + \sigma) \approx 0.683$
$P(\mu - 2\sigma \le X \le \mu + 2\sigma) \approx 0.954$
$P(\mu - 3\sigma \le X \le \mu + 3\sigma) \approx 0.997$

Figure 3. PDFs of the standard normal distribution with areas covering $\pm 1\sigma$, $\pm 2\sigma$, and $\pm 3\sigma$.

4.6.3. Sum (or difference) of independent Normal random variables


Consider X and Y, two independent normal random variables. It can be shown that their sum,
Z = X + Y, will also be a normal random variable with

$\mu_Z = \mu_X + \mu_Y$
$\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2$

Similarly, for the difference Z = X - Y:

$\mu_Z = \mu_X - \mu_Y$
$\sigma_Z^2 = \sigma_X^2 + \sigma_Y^2$

Note that the variance is the same as in the sum case.

The above results can be generalized for

$Z = \sum_{i=1}^{n} a_i X_i,$

where the $a_i$ are constants and the $X_i$ are statistically independent normal variables $\sim N(\mu_{X_i}, \sigma_{X_i})$. Z will
also follow a normal distribution with mean

$\mu_Z = \sum_{i=1}^{n} a_i \mu_{X_i},$

and variance

$\sigma_Z^2 = \sum_{i=1}^{n} a_i^2 \sigma_{X_i}^2.$

The above relationships for $\mu_Z$ and $\sigma_Z^2$ are also applicable to linear functions of any other
statistically independent random variables, irrespective of their distributions.
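
A simulation sketch of this result (illustrative Python; the two normal distributions and the constants are assumptions):

```python
# Minimal sketch: Z = a1*X1 + a2*X2 for independent normal X1, X2.
import random

mu1, s1 = 3.0, 2.0  # illustrative N(mu1, s1)
mu2, s2 = 1.0, 0.5  # illustrative N(mu2, s2)
a1, a2 = 2.0, -1.0  # illustrative constants

zs = [a1 * random.gauss(mu1, s1) + a2 * random.gauss(mu2, s2)
      for _ in range(100_000)]
mean = sum(zs) / len(zs)
var = sum((z - mean) ** 2 for z in zs) / len(zs)

print(mean, a1 * mu1 + a2 * mu2)                   # both ~5.0
print(var, a1 ** 2 * s1 ** 2 + a2 ** 2 * s2 ** 2)  # both ~16.25
```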
