Modeling & Simulation Lect 27 28

STATISTICAL CONCEPTS IN MODELING & SIMULATION
INTRODUCTION
People use the term probability many times each day. For example, a physician may say that a patient has a 50-50 chance of surviving a certain operation. Another physician may say that she is 95% certain that a patient has a particular disease.
Definition
If an event can occur in N mutually exclusive and equally likely ways, and if m of these possess a trait E, the probability of the occurrence of E is
P(E) = m/N
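As a small illustration of the classical definition, the sketch below counts outcomes and forms P(E) = m/N exactly with Python's `fractions` module. The die example and the helper name `classical_probability` are assumptions made for illustration, not part of the lecture.

```python
from fractions import Fraction

def classical_probability(outcomes, has_trait):
    """P(E) = m / N for N equally likely, mutually exclusive outcomes (hypothetical helper)."""
    n_total = len(outcomes)                          # N
    m = sum(1 for o in outcomes if has_trait(o))     # outcomes that possess trait E
    return Fraction(m, n_total)

# Assumed example: one roll of a fair die; E = "the roll is even".
die = [1, 2, 3, 4, 5, 6]
p_even = classical_probability(die, lambda o: o % 2 == 0)
print(p_even)  # 1/2
```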
DEFINITION
Experiment ==> any planned process of data collection. It consists of a number of trials (replications) under the same conditions.
Definition
Sample space: the collection of unique, non-overlapping possible outcomes of a random circumstance.
Simple event: one outcome in the sample space; a possible outcome of a random circumstance.
Event: a collection of one or more simple events in the sample space; often written as A, B, C, and so on.
Example: the sample space for the sex of a person is {Male, Female}.
Complement ==> sometimes we want to know the probability that an event will not happen; the event opposite to the event of interest is called the complementary event.
If A is an event, its complement is written A^C (or A with a bar over it). The probabilities of an event and its complement always sum to one:
P(A) + P(A^C) = 1
Example: the complement of the event "male" is the event "female".
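A minimal sketch of sample space, event, and complement as Python sets, assuming an illustrative experiment of two tosses of a fair coin (not an example from the lecture). Because the outcomes are equally likely, each probability is the event size divided by the sample-space size, and the complement is the set difference.

```python
from fractions import Fraction
from itertools import product

# Assumed example: sample space for two tosses of a fair coin.
sample_space = set(product("HT", repeat=2))   # {('H','H'), ('H','T'), ('T','H'), ('T','T')}

def prob(event):
    """Classical probability of an event (a subset of the sample space)."""
    return Fraction(len(event), len(sample_space))

A = {o for o in sample_space if "H" in o}     # event A: at least one head
A_c = sample_space - A                        # complement of A: no heads
print(prob(A), prob(A_c), prob(A) + prob(A_c))  # 3/4 1/4 1
```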
Views of Probability:
Subjective:
It is an estimate that reflects a person's opinion, or best guess, about whether an outcome will occur.
Important in medicine: it forms the basis of a physician's opinion (based on information gained in the history and physical examination) about whether a patient has a specific disease. Such an estimate can be changed by the results of diagnostic procedures.
Objective
Classical
It is well known that the probability of flipping a fair coin and getting a "tail" is 0.50.
If a coin is flipped 10 times, is there a guarantee that exactly 5 tails will be observed? If the coin is flipped 100 times? With 1000 flips?
As the number of flips becomes larger, the proportion of coin flips that result in tails approaches 0.50.
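The coin-flip argument can be checked by simulation. The sketch below assumes Python's pseudo-random generator as a stand-in for a fair coin (a flip counts as "tails" when `rng.random() < 0.5`); with small n the proportion can be far from 0.50, and it settles near 0.50 as n grows.

```python
import random

def tail_proportion(n_flips, rng):
    """Fraction of simulated fair-coin flips that land tails."""
    tails = sum(1 for _ in range(n_flips) if rng.random() < 0.5)
    return tails / n_flips

rng = random.Random(2024)          # fixed seed so the run is repeatable
for n in (10, 100, 1000, 100_000):
    print(n, tail_proportion(n, rng))
```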
THE MEANING OF A VARIABLE
A variable refers to any quantity that may take on more than one value.
- Population is a variable because it is not fixed or constant; it changes over time.
- The unemployment rate is a variable because it may take on any value from 0 to 100%.
A random variable can be thought of as an unknown value that may change every time it is inspected.
THE MEANING OF A VARIABLE
- A random variable may be either discrete or continuous.
- A variable is discrete if its possible values have jumps or breaks.
  e.g. Population, measured in integers or whole units: 1, 2, 3, …
- A variable is continuous if there are no jumps or breaks.
  e.g. The unemployment rate need not be measured in whole units: 1.77, …, 8.99, …
DESCRIPTIVE STATISTICS
- Descriptive statistics are used to describe the main features of a collection of data in quantitative terms.
- Descriptive statistics aim to quantitatively summarize a data set.
- Some statistical summaries are especially common in descriptive analyses, for example:
  - Frequency distribution
  - Central tendency
  - Dispersion
  - Association
FREQUENCY DISTRIBUTION
- Every set of data can be described in terms of how frequently certain values occur.
- In statistics, a frequency distribution is a tabulation of the values that one or more variables take in a sample.
- Consider the following hypothetical prices of Dec CME Live Cattle futures:

Month       Price (cents/lb)
May         67.05
June        66.89
July        67.45
August      68.39
September   67.45
October     70.10
November    68.39
FREQUENCY DISTRIBUTION
- Univariate frequency distributions are often presented as lists ordered by quantity, showing the number of times each value appears.
- A frequency distribution may be grouped or ungrouped:
  - for a small number of observations, an ungrouped frequency distribution;
  - for a large number of observations, a grouped frequency distribution.

Ungrouped:
Price (X)   Frequency
66.89       1
67.05       1
67.45       2
68.39       2
70.10       1
Total       7

Grouped:
Price (X)      Frequency
65.00-66.99    1
67.00-68.99    5
69.00-70.99    1
71.00-72.99    0
73.00-74.99    0
Total          7
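Both tables can be rebuilt from the raw prices. This is a minimal sketch assuming half-open classes of width 2.00 starting at 65.00, which reproduces the class labels used in the grouped table; the helper name `grouped` is hypothetical.

```python
from collections import Counter

# Hypothetical Dec CME Live Cattle futures prices (cents/lb), May-November.
prices = [67.05, 66.89, 67.45, 68.39, 67.45, 70.10, 68.39]

# Ungrouped: each distinct value with its count, in ascending order.
ungrouped = sorted(Counter(prices).items())

def grouped(values, start=65.00, width=2.00, n_classes=5):
    """Counts per class [start + k*width, start + (k+1)*width), labelled like the slide."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)      # index of the class containing v
        if 0 <= k < n_classes:
            counts[k] += 1
    labels = [f"{start + k * width:.2f}-{start + (k + 1) * width - 0.01:.2f}"
              for k in range(n_classes)]
    return list(zip(labels, counts))

for value, freq in ungrouped:
    print(f"{value:.2f}  {freq}")
for label, freq in grouped(prices):
    print(f"{label}  {freq}")
```

The class counts must add up to the number of observations (seven here), which is a quick way to catch a tallying mistake.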
CENTRAL TENDENCY
- In statistics, the term central tendency refers to the way in which quantitative data tend to cluster around a "central value".
- A measure of central tendency is any of a number of ways of specifying this "central value".
- Three important descriptive statistics give measures of the central tendency of a variable:
  - The mean
  - The median
  - The mode
THE MEAN
- The arithmetic mean is the most commonly used type of average and is often referred to simply as the average.
- In mathematics and statistics, the arithmetic mean (or simply the mean) of a list of numbers is the sum of all the numbers in the list divided by the number of items in the list.
- If the list is a statistical population, the mean of that population is called a population mean.
- If the list is a statistical sample, the resulting statistic is called a sample mean.
- If we denote a set of data by X = (x1, x2, ..., xn), the sample mean is typically denoted with a horizontal bar over the variable (X̄, pronounced "x bar").
- The Greek letter μ is used to denote the arithmetic mean of an entire population.
THE SAMPLE MEAN
- In mathematical notation, the sample mean of a set of data X = (x1, x2, ..., xn) is given by
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{n}\left(X_1 + X_2 + \cdots + X_n\right)$$
- To calculate the mean, all of the observations (values) of X are added and the result is divided by the number of observations (n).
- In the previous example, the mean price of the Dec CME Live Cattle futures contract is
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{7}\left(67.05 + 66.89 + \cdots + 68.39\right) = 67.96$$
THE MEDIAN
- In statistics, the median is the numeric value separating the higher half of a sample or population from the lower half.
- The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one.
- If there is an even number of observations, there is no single middle value, so one usually takes the mean of the two middle values.
- Arranging the price data from the previous example in ascending order gives
  66.89, 67.05, 67.45, 67.45, 68.39, 68.39, 70.10
- The median of this price series is the fourth (middle) value, 67.45.
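The sort-then-pick rule, including the even-n case, can be sketched as follows (a hypothetical helper, not a library function).

```python
def median(xs):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

prices = [67.05, 66.89, 67.45, 68.39, 67.45, 70.10, 68.39]
print(median(prices))        # 67.45
print(median([1, 2, 3, 4]))  # 2.5
```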
THE MODE
- In statistics, the mode is the value that occurs most frequently in a data set.
- The mode is not necessarily unique, since the same maximum frequency may be attained at different values.
- Arranging the price data from the previous example in ascending order gives
  66.89, 67.05, 67.45, 67.45, 68.39, 68.39, 70.10
- There are two modes in the given price data, 67.45 and 68.39 (each occurs twice).
- Thus the mode of the sample data is not unique.
- The sample price dataset may be said to be bimodal.
- A population or sample may be unimodal, bimodal, or multimodal.
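Because the mode need not be unique, a helper that returns every value attaining the maximum frequency is a natural sketch; `modes` is an illustrative name.

```python
from collections import Counter

def modes(xs):
    """All values that attain the maximum frequency (there may be more than one)."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

prices = [67.05, 66.89, 67.45, 68.39, 67.45, 70.10, 68.39]
print(modes(prices))  # [67.45, 68.39]  -> bimodal
```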
STATISTICAL DISPERSION
- In statistics, statistical dispersion (also called statistical variability or variation) is the variability or spread in a variable or probability distribution.
- In particular, a measure of dispersion is a statistic (formula) that indicates how dispersed (i.e., spread out) the values of a given variable are.
- Common measures of statistical dispersion are
  - the variance, and
  - the standard deviation.
- Dispersion is contrasted with location or central tendency; together they are the most used properties of distributions.
THE VARIANCE
- In statistics, the variance of a random variable or distribution is the expected (mean) value of the square of the deviation of that variable from its expected value or mean.
- Thus the variance is a measure of the amount of variation within the values of that variable, taking account of all possible values and their probabilities.
- If a random variable X has the expected (mean) value E[X] = μ, then the variance of X is given by
$$\operatorname{Var}(X) = E\left[(X - \mu)^2\right] = \sigma_x^2$$
THE VARIANCE
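- The above definition of variance encompasses random variables that are discrete or continuous. It can be expanded as follows:
$$\begin{aligned}
\operatorname{Var}(X) &= E\left[(X - \mu)^2\right] \\
&= E\left[X^2 - 2\mu X + \mu^2\right] \\
&= E\left[X^2\right] - 2\mu E[X] + \mu^2 \\
&= E\left[X^2\right] - 2\mu^2 + \mu^2 \\
&= E\left[X^2\right] - \mu^2 \\
&= E\left[X^2\right] - \left(E[X]\right)^2
\end{aligned}$$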
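The two ends of the derivation can be checked exactly for a discrete random variable. This sketch assumes X is the outcome of one roll of a fair die (an illustrative choice) and uses `Fraction` so both forms agree with no rounding.

```python
from fractions import Fraction

# Assumed example: X = outcome of one roll of a fair die, P(X = x) = 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(p * x for x, p in pmf.items())                          # E[X]
var_definition = sum(p * (x - mu) ** 2 for x, p in pmf.items())  # E[(X - mu)^2]
var_shortcut = sum(p * x ** 2 for x, p in pmf.items()) - mu ** 2  # E[X^2] - (E[X])^2

print(mu, var_definition, var_shortcut)  # 7/2 35/12 35/12
```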
THE VARIANCE: PROPERTIES
- Variance is non-negative because squares are positive or zero.
- The variance of a constant a is zero, and the variance of a variable in a data set is 0 if and only if all entries have the same value:
$$\operatorname{Var}(a) = 0$$
- Variance is invariant with respect to changes in a location parameter. That is, if a constant is added to all values of the variable, the variance is unchanged:
$$\operatorname{Var}(X + a) = \operatorname{Var}(X)$$
- If all values are scaled by a constant, the variance is scaled by the square of that constant:
$$\operatorname{Var}(aX) = a^2\operatorname{Var}(X)$$
$$\operatorname{Var}(aX + b) = a^2\operatorname{Var}(X)$$
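The properties can be spot-checked numerically. A minimal sketch using the standard library's `statistics.pvariance` (population variance, dividing by n) on the price data; the constants a and b are arbitrary choices, and `math.isclose` absorbs floating-point rounding.

```python
import math
from statistics import pvariance   # population variance: divides by n

xs = [67.05, 66.89, 67.45, 68.39, 67.45, 70.10, 68.39]
a, b = 3.0, 10.0   # arbitrary constants chosen for the check

checks = {
    "Var(a) = 0": pvariance([a] * len(xs)) == 0,
    "Var(X + a) = Var(X)": math.isclose(pvariance([x + a for x in xs]), pvariance(xs)),
    "Var(aX) = a^2 Var(X)": math.isclose(pvariance([a * x for x in xs]), a ** 2 * pvariance(xs)),
    "Var(aX + b) = a^2 Var(X)": math.isclose(pvariance([a * x + b for x in xs]), a ** 2 * pvariance(xs)),
}
for name, ok in checks.items():
    print(name, ok)
```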
THE SAMPLE VARIANCE
- If we have a series of n measurements X_i (i = 1, 2, ..., n) of a random variable X, the sample variance can be used to estimate the population variance of X. For X = (x1, x2, ..., xn), the sample variance is calculated as
$$S_x^2 = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{n-1} = \frac{1}{n-1}\left[\left(X_1 - \bar{X}\right)^2 + \left(X_2 - \bar{X}\right)^2 + \cdots + \left(X_n - \bar{X}\right)^2\right]$$

THE SAMPLE VARIANCE
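- The denominator (n − 1) is known as the degrees of freedom in calculating S_x^2. Intuitively, once X̄ is known, only n − 1 observation values are free to vary; the remaining one is determined by X̄.
- If the squared deviations were divided by n instead, a single observation (n = 1) would always give a variance of zero, regardless of the true variance. More generally, dividing by n underestimates the population variance on average, and this bias is largest when n is small; dividing by n − 1 corrects it.
$$S_x^2 = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{n-1} = \frac{1}{n-1}\left[\left(X_1 - \bar{X}\right)^2 + \left(X_2 - \bar{X}\right)^2 + \cdots + \left(X_n - \bar{X}\right)^2\right]$$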
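The bias of the divide-by-n version can be seen in a small simulation. This sketch assumes samples of size n = 2 drawn from a standard normal distribution (true variance 1); averaged over many samples, the divide-by-n estimate lands near (n − 1)/n = 0.5, while the divide-by-(n − 1) estimate lands near 1.

```python
import random

def divisor_n_and_n_minus_1(sample):
    """Return (sum of squared deviations / n, sum of squared deviations / (n - 1))."""
    n = len(sample)
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    return ss / n, ss / (n - 1)

rng = random.Random(7)           # fixed seed for repeatability
n, trials = 2, 100_000           # small samples make the bias easy to see
tot_n = tot_n1 = 0.0
for _ in range(trials):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]   # true variance = 1
    v_n, v_n1 = divisor_n_and_n_minus_1(sample)
    tot_n += v_n
    tot_n1 += v_n1

print("average with divisor n:    ", tot_n / trials)    # near (n - 1)/n = 0.5
print("average with divisor n - 1:", tot_n1 / trials)   # near 1.0
```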
THE SAMPLE VARIANCE
- For the hypothetical price data for the Dec CME Live Cattle futures contract, 67.05, 66.89, 67.45, 67.45, 68.39, 68.39, 70.10, the sample variance can be calculated as
$$S_x^2 = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{n-1} = \frac{1}{7-1}\left[\left(67.05 - 67.96\right)^2 + \cdots + \left(70.10 - 67.96\right)^2\right] \approx 1.24$$
THE STANDARD DEVIATION
- In statistics, the standard deviation of a random variable or distribution is the square root of its variance.
- If a random variable X has the expected value (mean) E[X] = μ, then the standard deviation of X is given by
$$\sigma_x = \sqrt{\sigma_x^2} = \sqrt{E\left[(X - \mu)^2\right]}$$
- That is, the standard deviation σ (sigma) is the square root of the average value of (X − μ)^2.
THE STANDARD DEVIATION
- If we have a series of n measurements X_i (i = 1, 2, ..., n) of a random variable X, the sample standard deviation can be used to estimate the population standard deviation of X. For X = (x1, x2, ..., xn), the sample standard deviation is calculated as
$$S_x = \sqrt{S_x^2} = \sqrt{\frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{n-1}} = \sqrt{1.24} \approx 1.114$$
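A minimal sketch of the sample standard deviation as the square root of the sample variance, compared with `statistics.stdev` from the standard library.

```python
import math
import statistics

prices = [67.05, 66.89, 67.45, 67.45, 68.39, 68.39, 70.10]
n = len(prices)
x_bar = sum(prices) / n
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in prices) / (n - 1))  # S_x = sqrt(S_x^2)

print(round(s_x, 3))                                # 1.114
print(math.isclose(s_x, statistics.stdev(prices)))  # True
```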
THE MEAN ABSOLUTE DEVIATION
- The mean (average) deviation of X from its mean, where d_i = X_i − X̄,
$$\frac{\sum_{i=1}^{n} d_i}{n} = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)}{n},$$
is always zero. The positive and negative deviations cancel out in the summation, which makes it a useless measure of dispersion.
- The mean absolute deviation (MAD), calculated by
$$\mathrm{MAD} = \frac{\sum_{i=1}^{n} \lvert d_i \rvert}{n} = \frac{\sum_{i=1}^{n}\left\lvert X_i - \bar{X}\right\rvert}{n},$$
solves the "canceling out" problem.
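The canceling-out problem and its fix can be seen side by side on the price data. In this sketch the plain mean deviation comes out as zero up to floating-point rounding, while the MAD does not.

```python
prices = [67.05, 66.89, 67.45, 68.39, 67.45, 70.10, 68.39]
n = len(prices)
x_bar = sum(prices) / n
d = [x - x_bar for x in prices]                 # deviations d_i = X_i - X_bar

mean_deviation = sum(d) / n                     # zero up to floating-point rounding
mad = sum(abs(di) for di in d) / n              # mean absolute deviation

print(f"{mean_deviation:.10f}")
print(round(mad, 2))  # 0.86
```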
THE MSD AND RMSD
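- An alternative way to address the canceling-out problem is to square the deviations from the mean, which gives the mean squared deviation (MSD):
$$\mathrm{MSD} = \frac{\sum_{i=1}^{n} d_i^2}{n} = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{n}$$
- Squaring expresses the measure in squared units of X. Taking the square root of the MSD returns it to the original units, giving the root mean squared deviation (RMSD):
$$\mathrm{RMSD} = \sqrt{\mathrm{MSD}} = \sqrt{\frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{n}}$$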
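A minimal sketch of the MSD and RMSD for the price data. Note the divisor is n, not n − 1, and the square root brings the result back to cents/lb.

```python
import math

prices = [67.05, 66.89, 67.45, 68.39, 67.45, 70.10, 68.39]
n = len(prices)
x_bar = sum(prices) / n
msd = sum((x - x_bar) ** 2 for x in prices) / n   # divide by n, not n - 1
rmsd = math.sqrt(msd)                             # back in the units of X (cents/lb)

print(round(msd, 2), round(rmsd, 2))  # 1.06 1.03
```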
RMSD VS. STANDARD DEVIATION
- When calculating the RMSD, squaring the deviations gives greater weight to deviations that are larger in absolute value, which may or may not be desirable.
- Dividing by n − 1 rather than n makes S_x^2 an unbiased estimator of the population variance. For this reason a slight variation of the RMSD, known as the standard deviation (S_x), is usually preferred as a measure of dispersion:
$$\mathrm{RMSD} = \sqrt{\mathrm{MSD}} = \sqrt{\frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{n}}$$
$$S_x = \sqrt{\frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{n-1}}$$
VARIANCE VS. MSD
STANDARD DEVIATION VS. RMSD

Price (X)   Mean    (Xi − Mean)   |Xi − Mean|   (Xi − Mean)^2
67.05       67.96   -0.91         0.91          0.83
66.89       67.96   -1.07         1.07          1.14
67.45       67.96   -0.51         0.51          0.26
68.39       67.96    0.43         0.43          0.18
67.45       67.96   -0.51         0.51          0.26
70.10       67.96    2.14         2.14          4.58
68.39       67.96    0.43         0.43          0.18
Total                0.00         6.00          7.44

MAD = 0.86
Variance = 1.24      MSD = 1.06
Std. Dev. = 1.11     RMSD = 1.03
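The whole comparison table can be regenerated from the raw prices. This sketch prints the row-by-row deviations and then the five summary measures; the only difference between Variance/MSD and Std. Dev./RMSD is the n − 1 versus n divisor applied to the same sum of squares.

```python
import math

prices = [67.05, 66.89, 67.45, 68.39, 67.45, 70.10, 68.39]
n = len(prices)
mean = sum(prices) / n

print(f"{'Price':>7} {'Mean':>7} {'Dev':>7} {'|Dev|':>7} {'Dev^2':>7}")
for x in prices:
    dev = x - mean
    print(f"{x:7.2f} {mean:7.2f} {dev:7.2f} {abs(dev):7.2f} {dev ** 2:7.2f}")

ss = sum((x - mean) ** 2 for x in prices)          # sum of squared deviations
summary = {
    "MAD": sum(abs(x - mean) for x in prices) / n,
    "Variance": ss / (n - 1),
    "MSD": ss / n,
    "Std. Dev.": math.sqrt(ss / (n - 1)),
    "RMSD": math.sqrt(ss / n),
}
for name, value in summary.items():
    print(f"{name} = {value:.2f}")
```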
ASSOCIATION
- Bivariate statistics can be used to examine the degree to which two variables are related or associated, without implying that one causes the other.
- Multivariate statistics can be used to examine the degree to which multiple variables are related or associated, without implying that one causes any or some of the others.
- Two common measures used in bivariate and multivariate statistics are
  - the covariance, and
  - the correlation coefficient.
Association: Bivariate Statistics
- In Figure 3.3 (a), Y and X are positively but weakly correlated, while in Figure 3.3 (b) they are negatively and strongly correlated.
THE COVARIANCE
- The covariance between two real-valued random variables X and Y, with means (expected values) E[X] = μ and E[Y] = ν, is
$$\begin{aligned}
\operatorname{Cov}(X, Y) &= E\left[(X - E[X])(Y - E[Y])\right] = E\left[(X - \mu)(Y - \nu)\right] \\
&= E\left[XY - \mu Y - \nu X + \mu\nu\right] \\
&= E[XY] - \mu E[Y] - \nu E[X] + \mu\nu \\
&= E[XY] - \mu\nu - \mu\nu + \mu\nu \\
&= E[XY] - \mu\nu
\end{aligned}$$
- Cov(X, Y) can be negative, zero, or positive.
- Random variables whose covariance is zero are called uncorrelated.
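Both forms of the covariance can be evaluated exactly for a small discrete joint distribution. The joint probabilities below are hypothetical, chosen only so the arithmetic is easy to follow; `expect` is an illustrative helper.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(X = x, Y = y); probabilities sum to 1.
joint = {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 1): F(1, 2)}

def expect(g):
    """E[g(X, Y)] under the joint distribution above."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

mu = expect(lambda x, y: x)      # E[X]
nu = expect(lambda x, y: y)      # E[Y]
cov_definition = expect(lambda x, y: (x - mu) * (y - nu))   # E[(X - mu)(Y - nu)]
cov_shortcut = expect(lambda x, y: x * y) - mu * nu         # E[XY] - mu*nu

print(mu, nu, cov_definition, cov_shortcut)  # 1/2 3/4 1/8 1/8
```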
COVARIANCE
- If X and Y are independent, then their covariance is zero. This follows because under independence
$$E[XY] = E[X]\,E[Y] = \mu\nu$$
- Recalling the final form of the covariance derivation given above, and substituting, we get
$$\operatorname{Cov}(X, Y) = \mu\nu - \mu\nu = 0$$
- The converse, however, is generally not true: some pairs of random variables have zero covariance even though they are not independent.
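A standard counterexample makes the converse concrete. This sketch assumes X is uniform on {−1, 0, 1} and Y = X^2: Y is completely determined by X, yet the covariance is exactly zero, and the product rule for independence fails.

```python
from fractions import Fraction as F

# Assumed example: X uniform on {-1, 0, 1} and Y = X**2, so Y depends on X completely.
joint = {(x, x * x): F(1, 3) for x in (-1, 0, 1)}

def expect(g):
    return sum(p * g(x, y) for (x, y), p in joint.items())

cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would require P(X=0, Y=0) = P(X=0) * P(Y=0).
p_joint = sum(p for (x, y), p in joint.items() if x == 0 and y == 0)
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)

print(cov)                   # 0
print(p_joint, p_x0 * p_y0)  # 1/3 1/9
```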
THE COVARIANCE: PROPERTIES
- If X and Y are real-valued random variables and a and b are constants ("constant" in this context means non-random), then the following facts are a consequence of the definition of covariance:
$$\operatorname{Cov}(X, a) = 0$$
$$\operatorname{Cov}(X, X) = \operatorname{Var}(X)$$
$$\operatorname{Cov}(X, Y) = \operatorname{Cov}(Y, X)$$
$$\operatorname{Cov}(aX, bY) = ab\,\operatorname{Cov}(X, Y)$$
$$\operatorname{Cov}(X + a, Y + b) = \operatorname{Cov}(X, Y)$$
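The same identities hold for the sample (n − 1) versions of covariance and variance, so they can be spot-checked on data. The paired observations and constants below are hypothetical; `sample_cov` is an illustrative helper.

```python
import math

def sample_cov(xs, ys):
    """S_xy = sum((X_i - X_bar)(Y_i - Y_bar)) / (n - 1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical paired observations and constants, chosen only for the check.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b = 7.0, -3.0

x_bar = sum(X) / len(X)
var_x = sum((x - x_bar) ** 2 for x in X) / (len(X) - 1)
checks = {
    "Cov(X, a) = 0": sample_cov(X, [a] * len(X)) == 0,
    "Cov(X, X) = Var(X)": math.isclose(sample_cov(X, X), var_x),
    "Cov(X, Y) = Cov(Y, X)": math.isclose(sample_cov(X, Y), sample_cov(Y, X)),
    "Cov(aX, bY) = ab Cov(X, Y)": math.isclose(
        sample_cov([a * x for x in X], [b * y for y in Y]), a * b * sample_cov(X, Y)),
    "Cov(X + a, Y + b) = Cov(X, Y)": math.isclose(
        sample_cov([x + a for x in X], [y + b for y in Y]), sample_cov(X, Y)),
}
for name, ok in checks.items():
    print(name, ok)
```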
VARIANCE OF THE SUM OF CORRELATED RANDOM VARIABLES
- If X and Y are real-valued random variables and a and b are constants ("constant" in this context means non-random), then the following facts are a consequence of the definitions of variance and covariance:
$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X, Y)$$
$$\operatorname{Var}(aX + bY) = a^2\operatorname{Var}(X) + b^2\operatorname{Var}(Y) + 2ab\,\operatorname{Cov}(X, Y)$$
- The variance of a finite sum of uncorrelated random variables is equal to the sum of their variances:
$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$$
- This is because, if X and Y are uncorrelated, their covariance is 0.
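The weighted-sum formula can be verified on sample data, since it also holds exactly for the n − 1 sample statistics. The data and the constants a = 2, b = −1 are hypothetical; the covariance term is what keeps the two sides equal when X and Y are correlated.

```python
import math

def sample_var(xs):
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def sample_cov(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical paired data and constants.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b = 2.0, -1.0

lhs = sample_var([a * x + b * y for x, y in zip(X, Y)])            # Var(aX + bY)
rhs = a ** 2 * sample_var(X) + b ** 2 * sample_var(Y) + 2 * a * b * sample_cov(X, Y)
print(lhs, rhs)  # 5.5 5.5
```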

THE SAMPLE COVARIANCE
- The covariance is one measure of how closely the values taken by two variables X and Y vary together.
- If we have a series of n measurements of X and Y written as X_i and Y_i, where i = 1, 2, ..., n, then the sample covariance can be used to estimate the population covariance between X = (X1, X2, …, Xn) and Y = (Y1, Y2, …, Yn). The sample covariance is calculated as
$$S_{x,y} = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{n-1}$$
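A minimal sketch of the sample covariance formula, with a basic input check; the paired data are hypothetical.

```python
def sample_covariance(xs, ys):
    """S_xy = sum((X_i - X_bar)(Y_i - Y_bar)) / (n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

X = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(sample_covariance(X, Y))  # 1.5
```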
CORRELATION COEFFICIENT
- A disadvantage of the covariance statistic is that its magnitude cannot be easily interpreted, since it depends on the units in which X and Y are measured.
- The related and more widely used correlation coefficient remedies this disadvantage by standardizing the deviations from the mean:
$$\rho_{x,y} = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac{\sigma_{X,Y}}{\sigma_X\,\sigma_Y}$$
- The correlation coefficient is symmetric, that is
$$\rho_{x,y} = \rho_{y,x}$$
- If we have a series of n measurements of X and Y written as X_i and Y_i, where i = 1, 2, ..., n, then the sample correlation coefficient can be used to estimate the population correlation coefficient between X and Y. The sample correlation coefficient is calculated as
$$r_{x,y} = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{(n-1)\,S_x\,S_y}$$
- The value of the correlation coefficient falls between −1 and 1:
$$-1 \le r_{x,y} \le 1$$
  - r_{x,y} = 0 => X and Y are uncorrelated
  - r_{x,y} = 1 => X and Y are perfectly positively correlated
  - r_{x,y} = −1 => X and Y are perfectly negatively correlated
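A minimal sketch of the sample correlation coefficient, on the same hypothetical paired data used for the covariance, plus two exact linear relationships that should give r = 1 and r = −1 (up to rounding).

```python
import math

def sample_correlation(xs, ys):
    """r_xy = sum((X_i - X_bar)(Y_i - Y_bar)) / ((n - 1) * S_x * S_y)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)

X = [1.0, 2.0, 3.0, 4.0, 5.0]              # hypothetical data
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(round(sample_correlation(X, Y), 4))                       # 0.7746
print(round(sample_correlation(X, [2 * x + 1 for x in X]), 4))  # 1.0
print(round(sample_correlation(X, [-x for x in X]), 4))         # -1.0
```

Because the deviations are divided by S_x and S_y, rescaling either variable (for example, changing cents to dollars) leaves r unchanged, which is exactly the unit problem the correlation coefficient is meant to fix.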
