MODELING & SIMULATION

INTRODUCTION

each day. For example, a physician says that a patient has a 50-50 chance of surviving a certain operation. Another physician may say that she is 95% certain that a patient has a particular disease.

Definition

If an event can occur in N mutually exclusive and equally likely ways, and if m of these possess a trait, E, the probability of the occurrence of E is read as

P(E) = m/N
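A minimal Python sketch of the classical definition, using one roll of a fair six-sided die as an illustrative example. The die and the helper name `classical_probability` are assumptions, not part of the original slides; `Fraction` keeps the ratio m/N exact.

```python
from fractions import Fraction

def classical_probability(outcomes, has_trait):
    """P(E) = m/N, assuming every outcome in `outcomes` is equally likely."""
    n_total = len(outcomes)                        # N: number of equally likely ways
    m = sum(1 for o in outcomes if has_trait(o))   # m: ways that possess trait E
    return Fraction(m, n_total)

# Illustrative example: one roll of a fair die, E = "the roll is even"
die = [1, 2, 3, 4, 5, 6]
p_even = classical_probability(die, lambda face: face % 2 == 0)
print(p_even)  # 1/2
```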

DEFINITION

of data collection. It consists of a number of trials (replications) under the same conditions.

Definition

Sample space: collection of unique, non-overlapping possible outcomes of a random circumstance.

outcome of a random circumstance.

space; often written as A, B, C, and so on

Male, Female

Complement: sometimes we want to know the probability that an event will not happen; the event opposite to the event of interest is called a complementary event.

probability of the complement is written $P(A^C)$ or $P(\bar{A})$

Example: The complement of the event “male” is the event “female”.

$$P(A) + P(A^C) = 1$$
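The complement rule can be checked on a small, finite, equally likely sample space. In this sketch the die and the event A = “roll is at least 5” are illustrative assumptions; the complement is simply the set difference between the sample space and A.

```python
from fractions import Fraction

# Illustrative sample space: one roll of a fair die (all outcomes equally likely)
sample_space = {1, 2, 3, 4, 5, 6}
A = {5, 6}                   # event A: "roll is at least 5"
A_c = sample_space - A       # complement of A: every outcome not in A

p_A = Fraction(len(A), len(sample_space))
p_A_c = Fraction(len(A_c), len(sample_space))
print(p_A, p_A_c, p_A + p_A_c)  # 1/3 2/3 1
```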

Views of Probability:

Subjective:

best guess about whether an outcome will occur.

physician’s opinion (based on information gained in the history and physical examination) about whether a patient has a specific disease. Such an estimate can be changed by the results of diagnostic procedures.

Objective

Classical

It is well known that the probability of flipping a fair coin and getting a “tail” is 0.50.

If a coin is flipped 10 times, is there a guarantee that exactly 5 tails will be observed?

What if the coin is flipped 100 times? With 1000 flips?

As the number of flips becomes larger, the proportion of coin flips that result in tails approaches 0.50.
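In a modeling and simulation setting, this long-run behavior can be explored directly by simulation. The sketch below assumes a fair coin modeled with Python's `random.Random`; the seed is arbitrary, so the exact proportions printed depend on it. Only the tendency toward 0.50 as the number of flips grows is the point.

```python
import random

def proportion_of_tails(n_flips, rng):
    """Simulate n_flips tosses of a fair coin and return the proportion of tails."""
    tails = sum(1 for _ in range(n_flips) if rng.random() < 0.5)
    return tails / n_flips

rng = random.Random(2024)  # fixed (arbitrary) seed so the run is reproducible
for n in (10, 100, 1000, 100_000):
    print(n, proportion_of_tails(n, rng))
```

With 10 flips the proportion can easily be 0.3 or 0.7; with 100,000 flips it typically sits within about 0.01 of 0.50.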

THE MEANING OF A VARIABLE

A variable refers to any quantity that may take on more than one value.

Population is a variable because it is not fixed or constant; it changes over time.

The unemployment rate is a variable because it may take on any value from 0% to 100%.

A random variable can be thought of as an unknown value that may change every time it is inspected.

THE MEANING OF A VARIABLE

continuous

A variable is discrete if its possible values have jumps or breaks.

e.g., Population: measured in integers or whole units: 1, 2, 3, …

A variable is continuous if there are no jumps or breaks.

e.g., Unemployment rate: need not be measured in whole units: 1.77, …, 8.99, …

DESCRIPTIVE STATISTICS

Descriptive statistics are used to describe the main features of a collection of data in quantitative terms.

Descriptive statistics aim to quantitatively summarize a data set.

descriptive analyses. For example:

Frequency Distribution

Central Tendency

Dispersion

Association

FREQUENCY DISTRIBUTION

certain values occur.

In statistics, a frequency distribution is a tabulation of the values that one or more variables take in a sample.

Consider the hypothetical prices of the Dec CME Live Cattle futures contract:

Month        Price (cents/lb)
May          67.05
June         66.89
July         67.45
August       68.39
September    67.45
October      70.10
November     68.39

FREQUENCY DISTRIBUTION

Univariate frequency distributions are often presented as lists ordered by quantity, showing the number of times each value appears.

A frequency distribution may be grouped or ungrouped:

For a small number of observations: ungrouped frequency distribution

For a large number of observations: grouped frequency distribution

Ungrouped
Price (X)      Frequency
66.89          1
67.05          1
67.45          2
68.39          2
70.10          1

Grouped
Price (X)      Frequency
65.00-66.99    1
67.00-68.99    5
69.00-70.99    1
71.00-72.99    0
73.00-74.99    0
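Both tables can be produced with `collections.Counter`. This sketch assumes classes of width 2.00 starting at 65.00, matching the grouped table; `class_label` is a hypothetical helper. Counting the seven prices directly places five of them (67.05, 67.45, 67.45, 68.39, 68.39) in the 67.00-68.99 class. Empty classes such as 71.00-72.99 simply do not appear in a `Counter`.

```python
from collections import Counter

prices = [67.05, 66.89, 67.45, 68.39, 67.45, 70.10, 68.39]  # cents/lb, May to November

# Ungrouped: count each distinct value, listed in ascending order
ungrouped = sorted(Counter(prices).items())

# Grouped: assumed classes of width 2.00 starting at 65.00 (65.00-66.99, 67.00-68.99, ...)
def class_label(x, start=65.0, width=2.0):
    lower = start + width * int((x - start) // width)
    return f"{lower:.2f}-{lower + width - 0.01:.2f}"

grouped = Counter(class_label(p) for p in prices)

for value, freq in ungrouped:
    print(value, freq)
for label in sorted(grouped):
    print(label, grouped[label])
```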

CENTRAL TENDENCY

In statistics, the term central tendency relates to the way in which quantitative data tend to cluster around a “central value”.

A measure of central tendency is any of a number of ways of specifying this “central value”.

There are three important descriptive statistics that give measures of the central tendency of a variable:

The Mean

The Median

The Mode

THE MEAN

The arithmetic mean is the most commonly used type of average and is often referred to simply as the average.

In mathematics and statistics, the arithmetic mean (or simply the mean) of a list of numbers is the sum of all numbers in the list divided by the number of items in the list.

If the list is a statistical population, then the mean of that population is called a population mean.

If the list is a statistical sample, we call the resulting statistic a sample mean.

If we denote a set of data by $X = (x_1, x_2, \ldots, x_n)$, then the sample mean is typically denoted with a horizontal bar over the variable ($\bar{X}$, pronounced “x bar”).

The Greek letter μ is used to denote the arithmetic mean of an entire population.

THE SAMPLE MEAN

$X = (x_1, x_2, \ldots, x_n)$ is given by

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{n}(X_1 + X_2 + \cdots + X_n)$$

That is, the observations are summed and the result is divided by the number of observations (n).

In the previous example, the mean price of the Dec CME Live Cattle futures contract is

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{7}(67.05 + 66.89 + \cdots + 68.39) = 67.96$$
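The same arithmetic in Python, as a quick check of the 67.96 figure (the standard library's `statistics.mean` would give the same result):

```python
prices = [67.05, 66.89, 67.45, 68.39, 67.45, 70.10, 68.39]  # cents/lb

n = len(prices)
x_bar = sum(prices) / n   # sum all observations, then divide by n
print(round(x_bar, 2))    # 67.96
```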

THE MEDIAN

the higher half of a sample or population from the lower half.

The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one.

If there is an even number of observations, then there is no single middle value, so one often takes the mean of the two middle values.

Organize the price data in the previous example in ascending order:

66.89, 67.05, 67.45, 67.45, 68.39, 68.39, 70.10

The median of this price series is 67.45, the fourth of the seven ordered values.
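A small sketch of the median rule described above, including the even-n case. The helper name `median` is illustrative (the standard library also provides `statistics.median`); the second call uses only the first four prices, purely to exercise the even-n branch.

```python
def median(values):
    """Sort the observations and take the middle one; average the two middle ones if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

prices = [67.05, 66.89, 67.45, 68.39, 67.45, 70.10, 68.39]
print(median(prices))      # 67.45
print(median(prices[:4]))  # even n: mean of 67.05 and 67.45
```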

THE MODE

In statistics, the mode is the value that occurs most frequently in a data set.

The mode is not necessarily unique, since the same maximum frequency may be attained at different values.

Organize the price data in the previous example in ascending order:

66.89, 67.05, 67.45, 67.45, 68.39, 68.39, 70.10

There are two modes in the given price data: 67.45 and 68.39.

Thus the mode of the sample data is not unique.

The sample price data set may be said to be bimodal.

A population or sample data set may be unimodal, bimodal, or multimodal.
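A sketch that returns every mode rather than a single one, so bimodal or multimodal data are reported as such. The helper `modes` is hypothetical; Python 3.8+ also offers `statistics.multimode` for the same purpose.

```python
from collections import Counter

def modes(values):
    """Return every value that attains the maximum frequency (there may be more than one)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

prices = [67.05, 66.89, 67.45, 68.39, 67.45, 70.10, 68.39]
print(modes(prices))  # [67.45, 68.39] -> bimodal
```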

STATISTICAL DISPERSION

or variation) is the variability or spread in a variable or probability distribution.

In particular, a measure of dispersion is a statistic (formula) that indicates how dispersed (i.e., spread out) the values of a given variable are.

Common measures of statistical dispersion are

The Variance, and

The Standard Deviation

These are the most commonly used measures of the spread of a distribution.

THE VARIANCE

expected (mean) value of the square of the deviation of that variable from its expected value or mean.

Thus the variance is a measure of the amount of variation within the values of that variable, taking account of all possible values and their probabilities.

If a random variable X has the expected (mean) value $E[X] = \mu$, then the variance of X can be given by:

$$\operatorname{Var}(X) = E[(X - \mu)^2] = \sigma_X^2$$

THE VARIANCE

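are discrete or continuous. It can be expanded as follows:

$$\begin{aligned}
\operatorname{Var}(X) &= E[(X - \mu)^2] \\
&= E[X^2 - 2\mu X + \mu^2] \\
&= E[X^2] - 2\mu E[X] + \mu^2 \\
&= E[X^2] - 2\mu^2 + \mu^2 \\
&= E[X^2] - \mu^2 \\
&= E[X^2] - (E[X])^2
\end{aligned}$$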
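The expansion can be checked exactly on a simple discrete random variable. The fair die below is an illustrative assumption; both the definition $E[(X - \mu)^2]$ and the shortcut $E[X^2] - (E[X])^2$ are computed with exact fractions.

```python
from fractions import Fraction

# Illustrative discrete random variable: a fair die, each face with probability 1/6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * p for x, p in pmf.items())                          # E[X]
var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())  # E[(X - mu)^2]
e_x2 = sum(x ** 2 * p for x, p in pmf.items())                   # E[X^2]
var_shortcut = e_x2 - mu ** 2                                    # E[X^2] - (E[X])^2

print(mu, var_definition, var_shortcut)  # 7/2 35/12 35/12
```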

THE VARIANCE: PROPERTIES

The variance of a constant a is zero, and the variance of a variable in a data set is 0 if and only if all entries have the same value.

$$\operatorname{Var}(a) = 0$$

Variance is invariant with respect to changes in a location parameter. That is, if a constant is added to all values of the variable, the variance is unchanged.

$$\operatorname{Var}(X + a) = \operatorname{Var}(X)$$

If all values are scaled by a constant, the variance is scaled by the square of that constant.

$$\operatorname{Var}(aX) = a^2 \operatorname{Var}(X)$$

$$\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X)$$
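These properties can be sanity-checked numerically on the price data. The sketch uses `statistics.pvariance` (which divides by n, i.e. treats the data as a population) and arbitrary illustrative constants a = 3 (scale) and b = 10 (shift); `math.isclose` absorbs floating-point rounding.

```python
import math
import statistics

prices = [67.05, 66.89, 67.45, 68.39, 67.45, 70.10, 68.39]  # cents/lb
a, b = 3.0, 10.0  # arbitrary illustrative constants: scale a, shift b

var_x = statistics.pvariance(prices)  # population variance: divides by n

checks = {
    "Var(b) = 0": statistics.pvariance([b] * len(prices)) == 0,
    "Var(X + b) = Var(X)": math.isclose(statistics.pvariance([x + b for x in prices]), var_x),
    "Var(aX) = a^2 Var(X)": math.isclose(statistics.pvariance([a * x for x in prices]), a**2 * var_x),
    "Var(aX + b) = a^2 Var(X)": math.isclose(statistics.pvariance([a * x + b for x in prices]), a**2 * var_x),
}
for name, holds in checks.items():
    print(f"{name}: {holds}")
```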

THE SAMPLE VARIANCE

variable X as $X_i$, where i = 1, 2, ..., n, then the sample variance can be used to estimate the population variance of $X = (x_1, x_2, \ldots, x_n)$. The sample variance is calculated as

$$S_x^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1} = \frac{1}{n-1}\left[(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2\right]$$

THE SAMPLE VARIANCE

calculating $S_x^2$: intuitively, once $\bar{X}$ is known, only n − 1 observation values are free to vary; one is predetermined by $\bar{X}$.

When n = 1, dividing by n gives a variance of zero for the single observation, regardless of the true variance. This bias needs to be corrected for when n is small.

$$S_x^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1} = \frac{1}{n-1}\left[(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2\right]$$
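The bias argument can be illustrated by simulation. This sketch assumes a normal population with true variance 1 and draws many samples of size n = 3. Averaged over replications, the divide-by-n estimate should land near (n − 1)/n ≈ 0.667, while the divide-by-(n − 1) estimate should land near 1. The seed and replication count are arbitrary choices, and `var_divide_by` is a hypothetical helper.

```python
import random

def var_divide_by(values, denom_offset):
    """Sum of squared deviations from the sample mean, divided by (n - denom_offset)."""
    n = len(values)
    x_bar = sum(values) / n
    ss = sum((x - x_bar) ** 2 for x in values)
    return ss / (n - denom_offset)

rng = random.Random(7)   # fixed (arbitrary) seed for reproducibility
n, reps = 3, 100_000     # small samples from an assumed N(0, 1) population
biased = unbiased = 0.0
for _ in range(reps):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    biased += var_divide_by(sample, 0) / reps    # divide by n
    unbiased += var_divide_by(sample, 1) / reps  # divide by n - 1

print(round(biased, 3), round(unbiased, 3))  # roughly 0.667 and 1.0
```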

THE SAMPLE VARIANCE

For the hypothetical price data for the Dec CME Live Cattle futures contract, 67.05, 66.89, 67.45, 67.45, 68.39, 68.39, 70.10, the sample variance can be calculated as

$$S_x^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1} = \frac{1}{7-1}\left[(67.05 - 67.96)^2 + \cdots + (70.10 - 67.96)^2\right] = 1.24$$
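Reproducing the calculation in Python; `statistics.variance` implements the same n − 1 estimator and serves as a cross-check.

```python
import statistics

prices = [67.05, 66.89, 67.45, 67.45, 68.39, 68.39, 70.10]  # cents/lb

n = len(prices)
x_bar = sum(prices) / n
s2 = sum((x - x_bar) ** 2 for x in prices) / (n - 1)  # divide by n - 1, not n

print(round(s2, 2))                           # 1.24
print(round(statistics.variance(prices), 2))  # same estimator from the standard library
```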

THE STANDARD DEVIATION

or distribution is the square root of its variance.

If a random variable X has the expected value (mean) $E[X] = \mu$, then the standard deviation of X can be given by:

$$\sigma_X = \sqrt{\sigma_X^2} = \sqrt{E[(X - \mu)^2]}$$

That is, the standard deviation σ (sigma) is the square root of the average value of $(X - \mu)^2$.

THE STANDARD DEVIATION

variable X as $X_i$, where i = 1, 2, ..., n, then the sample standard deviation can be used to estimate the population standard deviation of $X = (x_1, x_2, \ldots, x_n)$. The sample standard deviation is calculated as

$$S_x = \sqrt{S_x^2} = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}} = \sqrt{1.24} = 1.114$$
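The sample standard deviation follows by taking the square root of the sample variance; `statistics.stdev` computes the same quantity directly.

```python
import math
import statistics

prices = [67.05, 66.89, 67.45, 67.45, 68.39, 68.39, 70.10]  # cents/lb

s_x = math.sqrt(statistics.variance(prices))  # S_x = sqrt(S_x^2)
print(round(s_x, 3))                          # 1.114
print(round(statistics.stdev(prices), 3))     # stdev gives the same value directly
```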

THE MEAN ABSOLUTE DEVIATION

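$$\frac{\sum_{i=1}^{n} d_i}{n} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})}{n}$$

is always zero. The positive and negative deviations cancel out in the summation, which makes it a useless measure of dispersion.

The mean absolute deviation (MAD), calculated by:

$$\mathrm{MAD} = \frac{\sum_{i=1}^{n} |d_i|}{n} = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$$

solves the “canceling out” problem.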
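A sketch contrasting the plain mean deviation, which cancels to zero up to floating-point rounding, with the mean absolute deviation for the price data:

```python
prices = [67.05, 66.89, 67.45, 68.39, 67.45, 70.10, 68.39]  # cents/lb

n = len(prices)
x_bar = sum(prices) / n
deviations = [x - x_bar for x in prices]          # d_i = X_i - X_bar

mean_deviation = sum(deviations) / n              # ~0: positive and negative deviations cancel
mad = sum(abs(d) for d in deviations) / n         # mean absolute deviation

print(mean_deviation)
print(round(mad, 2))  # 0.86
```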

THE MSD AND RMSD

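squaring the deviations from the mean to obtain the mean squared deviation (MSD):

$$\mathrm{MSD} = \frac{\sum_{i=1}^{n} d_i^2}{n} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n}$$

Squaring, however, expresses the MSD in the squared units of the data. This problem can be solved by taking the square root of the MSD to obtain the root mean squared deviation (RMSD):

$$\mathrm{RMSD} = \sqrt{\mathrm{MSD}} = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n}}$$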
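MSD and RMSD for the price data, dividing by n rather than n − 1:

```python
import math

prices = [67.05, 66.89, 67.45, 68.39, 67.45, 70.10, 68.39]  # cents/lb

n = len(prices)
x_bar = sum(prices) / n
msd = sum((x - x_bar) ** 2 for x in prices) / n  # divide by n (S_x^2 uses n - 1)
rmsd = math.sqrt(msd)                            # back in the original units (cents/lb)

print(round(msd, 2), round(rmsd, 2))  # 1.06 1.03
```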

RMSD VS. STANDARD DEVIATION

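greater importance to the deviations that are larger in absolute value, which may or may not be desirable.

Because dividing by n − 1 rather than n corrects the downward bias discussed for the sample variance, a slight variation of the RMSD, known as the standard deviation ($S_x$), is more desirable as a measure of dispersion.

$$\mathrm{RMSD} = \sqrt{\mathrm{MSD}} = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n}}$$

$$S_x = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}}$$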
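Since the two formulas differ only in the divisor, the standard deviation is a rescaled RMSD. A short derivation that follows directly from the definitions above (it is not part of the original slides):

```latex
S_x^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}
      = \frac{n}{n-1}\cdot\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n}
      = \frac{n}{n-1}\,\mathrm{MSD}
\qquad\Longrightarrow\qquad
S_x = \sqrt{\frac{n}{n-1}}\;\mathrm{RMSD}
```

With n = 7, $\sqrt{7/6} \times 1.03 \approx 1.11$, consistent with the standard deviation and RMSD reported for the price data. For large n the factor approaches 1 and the two measures nearly coincide.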

VARIANCE VS. MSD

STANDARD DEVIATION VS. RMSD

Xi       X̄        Xi - X̄    |Xi - X̄|   (Xi - X̄)²
67.05    67.96    -0.91      0.91       0.83
66.89    67.96    -1.07      1.07       1.14
67.45    67.96    -0.51      0.51       0.26
68.39    67.96     0.43      0.43       0.18
67.45    67.96    -0.51      0.51       0.26
70.10    67.96     2.14      2.14       4.58
68.39    67.96     0.43      0.43       0.18
Total              0.00      6.00       7.44

MAD = 0.86
Variance = 1.24     MSD = 1.06
Std. Dev. = 1.11    RMSD = 1.03

ASSOCIATION

which two variables are related or associated, without implying that one causes the other

which multiple variables are related or associated, without implying that one causes any or some of the others

Covariance

Correlation Coefficient

Association: Bivariate Statistics

In Figure 3.3 (a), Y and X are positively but weakly correlated, while in Figure 3.3 (b) they are negatively and strongly correlated.

THE COVARIANCE

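with means (expected values) $E[X] = \mu$ and $E[Y] = \nu$, is

$$\begin{aligned}
\operatorname{Cov}(X, Y) &= E[(X - E[X])(Y - E[Y])] = E[(X - \mu)(Y - \nu)] \\
&= E[XY - \mu Y - \nu X + \mu\nu] \\
&= E[XY] - \mu E[Y] - \nu E[X] + \mu\nu \\
&= E[XY] - \mu\nu - \mu\nu + \mu\nu \\
&= E[XY] - \mu\nu
\end{aligned}$$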

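Cov(X, Y) can be negative, zero, or positive.

Random variables whose covariance is zero are called uncorrelated. Independent random variables are uncorrelated, but uncorrelated random variables are not necessarily independent.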

COVARIANCE

follows because under independence,

$$E[XY] = E[X]\,E[Y] = \mu\nu$$

and substituting, we get

$$\operatorname{Cov}(X, Y) = \mu\nu - \mu\nu = 0$$

variables have covariance zero although they are not independent.
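A standard illustration of zero covariance without independence, checked exactly with `Fraction`. The distribution is an assumed example: X is equally likely to be −1, 0, or 1, and Y = X². Y is completely determined by X, yet $\operatorname{Cov}(X, Y) = E[XY] - \mu\nu = 0$.

```python
from fractions import Fraction

# Assumed example: X is -1, 0, or 1 with equal probability, and Y = X**2
pmf = {x: Fraction(1, 3) for x in (-1, 0, 1)}

mu = sum(x * p for x, p in pmf.items())           # E[X] = 0
nu = sum(x**2 * p for x, p in pmf.items())        # E[Y] = E[X^2] = 2/3
e_xy = sum(x * x**2 * p for x, p in pmf.items())  # E[XY] = E[X^3] = 0
cov = e_xy - mu * nu                              # Cov(X, Y) = E[XY] - mu*nu

# Dependence: P(Y = 0) = 1/3, but P(Y = 0 | X = 0) = 1
p_y0 = sum(p for x, p in pmf.items() if x**2 == 0)

print(cov, p_y0)  # 0 1/3
```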

THE COVARIANCE: PROPERTIES

constants ("constant" in this context means non-random), then the

following facts are a consequence of the definition of covariance:

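$$\operatorname{Cov}(X, a) = 0$$

$$\operatorname{Cov}(X, X) = \operatorname{Var}(X)$$

$$\operatorname{Cov}(X, Y) = \operatorname{Cov}(Y, X)$$

$$\operatorname{Cov}(aX, bY) = ab\,\operatorname{Cov}(X, Y)$$

$$\operatorname{Cov}(X + a, Y + b) = \operatorname{Cov}(X, Y)$$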

VARIANCE OF THE SUM OF CORRELATED RANDOM VARIABLES

constants ("constant" in this context means non-random), then the

following facts are a consequence of the definition of variance and

covariance:

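$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y)$$

$$\operatorname{Var}(aX + bY) = a^2\operatorname{Var}(X) + b^2\operatorname{Var}(Y) + 2ab\operatorname{Cov}(X, Y)$$

equal to the sum of their variances.

$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$$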
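A numeric check of the identity for $\operatorname{Var}(aX + bY)$ on a small set of made-up paired values treated as a whole population. The data, the constants a = 2 and b = −3, and the helper names `pvar` and `pcov` are illustrative assumptions.

```python
import math

def pvar(xs):
    """Population variance: mean squared deviation from the mean (divide by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def pcov(xs, ys):
    """Population covariance: mean of the products of paired deviations (divide by n)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Made-up paired observations, treated as the whole population
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b = 2.0, -3.0

lhs = pvar([a * xi + b * yi for xi, yi in zip(x, y)])              # Var(aX + bY) computed directly
rhs = a**2 * pvar(x) + b**2 * pvar(y) + 2 * a * b * pcov(x, y)     # right-hand side of the identity
print(math.isclose(lhs, rhs))  # True
```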

THE SAMPLE COVARIANCE

two variables X and Y vary together:

$Y_i$, where i = 1, 2, ..., n, then the sample covariance can be used to estimate the population covariance between $X = (X_1, X_2, \ldots, X_n)$ and $Y = (Y_1, Y_2, \ldots, Y_n)$. The sample covariance is calculated as

$$S_{x,y} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$$

CORRELATION COEFFICIENT

cannot be easily interpreted, since it depends on the units in which we measure X and Y.

The related and more widely used correlation coefficient remedies this disadvantage by standardizing the deviations from the mean:

$$\rho_{x,y} = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac{\sigma_{X,Y}}{\sigma_X\,\sigma_Y}$$

$$\rho_{x,y} = \rho_{y,x}$$

CORRELATION COEFFICIENT

and $Y_i$, where i = 1, 2, ..., n, then the sample correlation coefficient can be used to estimate the population correlation coefficient between X and Y. The sample correlation coefficient is calculated as

$$r_{x,y} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{(n-1)\,S_x S_y}$$

CORRELATION COEFFICIENT

$$-1 \le r_{x,y} \le 1$$

$r_{x,y} = 1$ => X and Y are perfectly positively correlated

$r_{x,y} = -1$ => X and Y are perfectly negatively correlated
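A sketch of the sample correlation formula, checked on made-up data and on exact straight lines, where r should equal 1 or −1 up to floating-point rounding. The helper `sample_correlation` is illustrative.

```python
import math

def sample_correlation(xs, ys):
    """r_{x,y} = sum((X_i - X_bar)(Y_i - Y_bar)) / ((n - 1) * S_x * S_y)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / ((n - 1) * s_x * s_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up data
print(sample_correlation(x, [2.0, 4.0, 5.0, 4.0, 5.0]))  # strictly between -1 and 1
print(sample_correlation(x, [3 * xi + 1 for xi in x]))   # perfect positive line: 1 up to rounding
print(sample_correlation(x, [-2 * xi for xi in x]))      # perfect negative line: -1 up to rounding
```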

