9/9/2013

Statistics
The science of
decision-making
in the face of
uncertainty.
Probability
How do we measure
"uncertainty"?
A numerical measure of the
likelihood that an outcome or an
event occurs.
P(A) = probability of event A
Probability requirements:
1. $0 \le P(A) \le 1$
2. Sum of the probabilities of
all possible outcomes
must equal 1.
P(A) = 0 impossible event
P(A) = 1 certain event
mn Rule
If an operation can be done m ways and a
second operation can be done n ways,
then there are mn ways for the two
operations to occur in order.
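A minimal sketch of the mn rule, checked by listing every ordered pair. The shirts-and-pants data and the helper name `count_ordered_outcomes` are hypothetical, chosen only for illustration.

```python
from itertools import product
from math import prod


def count_ordered_outcomes(ways):
    """Multiply the number of ways each operation can be done (the mn rule)."""
    return prod(ways)


# Hypothetical example: m = 3 shirts, n = 4 pairs of pants.
shirts = ["red", "blue", "green"]
pants = ["jeans", "khakis", "shorts", "slacks"]

by_rule = count_ordered_outcomes([len(shirts), len(pants)])
by_listing = len(list(product(shirts, pants)))  # enumerate every (shirt, pants) pair

print(by_rule, by_listing)
```

Both counts agree, and the same function extends to more than two operations by passing a longer list.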
mn Rule
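Ex.)
A cafeteria offers 5 salads, 4 meats, 8 vegetables,
3 breads, 4 desserts, and 3 drinks. A meal is two
servings of vegetables, which may be identical,
and one serving each of the other items.
How many meals are available?

$5 \cdot 4 \cdot 8 \cdot 8 \cdot 3 \cdot 4 \cdot 3 = 46{,}080$
(counting the two vegetable servings in order)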
Sampling from a Population with Replacement

Sampling n times from a population of size N
with replacement would provide:

$N^n$ possibilities

Where N = population size
n = sample size


Sampling from a Population with
Replacement

Ex)
In a lottery six numbers are drawn from the digits
0 through 9, with replacement. How many
different groupings of six numbers can be drawn?


$N^n = 10^6 = 1{,}000{,}000$

A million six-digit numbers are available!
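The lottery count can be reproduced in a few lines. This is a sketch; the function name `samples_with_replacement` is an assumption, and the brute-force check uses a smaller draw of 3 digits to keep the listing short.

```python
from itertools import product


def samples_with_replacement(population_size, sample_size):
    """Number of ordered samples when each draw is replaced: N ** n."""
    return population_size ** sample_size


# Lottery from the slide: N = 10 digits (0 through 9), n = 6 draws.
lottery = samples_with_replacement(10, 6)

# Brute-force check on a smaller case: list every 3-digit draw.
small_listing = len(list(product(range(10), repeat=3)))

print(lottery, small_listing)
```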
Combinations: Sampling from a
Population without Replacement
This counting method uses combinations
Selecting n items from a population of N
without replacement
Combinations
Ex)
A tray contains 1,000 individual tax returns. If 3
returns are randomly selected without replacement
from the tray, how many possible samples are there?
$\binom{N}{n} = {}_{N}C_{n} = \frac{N!}{n!\,(N-n)!} = \frac{1000!}{3!\,(1000-3)!} = 166{,}167{,}000$
Combinations
Ex.)
Suppose a small law firm has 16 employees and three are
to be selected randomly to represent the company at the
annual meeting of the American Bar Association.
How many different combinations of lawyers could be sent
to the meeting?
Answer:
${}_{N}C_{n} = {}_{16}C_{3} = \frac{16!}{3!\,13!} = 560$
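Both combination examples can be checked with the factorial formula. A small sketch, where `combinations_count` is a hypothetical helper and Python's built-in `math.comb` is used only as a cross-check.

```python
from math import comb, factorial


def combinations_count(N, n):
    """Number of ways to select n items from N without replacement: N! / (n! (N-n)!)."""
    return factorial(N) // (factorial(n) * factorial(N - n))


tax_returns = combinations_count(1000, 3)  # tray of 1,000 tax returns, 3 selected
law_firm = combinations_count(16, 3)       # 16 employees, 3 selected

print(tax_returns, law_firm)
print(tax_returns == comb(1000, 3), law_firm == comb(16, 3))
```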
Discrete vs. Continuous Distributions
Random Variable - a variable which contains the outcomes of
a chance experiment

1. Discrete Random Variable
2. Continuous Random Variable

Discrete Random Variable - A random variable that only
takes on distinct values

Ex: Number of heads on 10 flips, Number of defective items
in a random sample of 100, Number of times you check your
watch during class, etc.


Continuous Random Variable - A random variable that can
take on infinitely many values, limited only by measurement
precision. Between any two values, there always exists
another valid value.

Ex: Time until a bulb goes out, height, etc.
Properties for Discrete R.V.
1. $0 \le p_i \le 1$, for each i
2. Sum of all probabilities
must equal 1.000.
Mean of a Discrete R.V.
$p_i = P(X = x_i)$
$\mu_X = \sum x_i \, p_i = x_1 p_1 + x_2 p_2 + \dots + x_k p_k$
Roll one die many, many times.
What do you expect the average
to be?
$x_i$   1    2    3    4    5    6
$p_i$   1/6  1/6  1/6  1/6  1/6  1/6

$\mu_X = 1(1/6) + 2(1/6) + \dots + 6(1/6) = 3.5$
Variance of a Discrete R.V.
$\sigma^2 = \sum (x_i - \mu_X)^2 \, p_i$
Variance is a measure of RISK.
Standard Deviation:
$\sigma = \sqrt{\sigma^2}$

Roll one die many, many times.
What do you expect the average
to be?
$x_i$   1    2    3    4    5    6
$p_i$   1/6  1/6  1/6  1/6  1/6  1/6

$\sigma^2 = \sum (x_i - \mu_X)^2 \, p_i$
$= (1-3.5)^2 (1/6) + (2-3.5)^2 (1/6) + \dots + (6-3.5)^2 (1/6)$
$= 35/12 \approx 2.92$
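The mean and variance formulas for a discrete random variable can be evaluated directly from the table of values and probabilities. The sketch below uses exact fractions to avoid rounding; the helper names are assumptions.

```python
from fractions import Fraction
from math import sqrt


def discrete_mean(values, probs):
    """Mean: sum of x_i * p_i."""
    return sum(x * p for x, p in zip(values, probs))


def discrete_variance(values, probs):
    """Variance: sum of (x_i - mean)^2 * p_i."""
    mu = discrete_mean(values, probs)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))


# One fair die: faces 1..6, each with probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

die_mean = discrete_mean(faces, probs)
die_variance = discrete_variance(faces, probs)
die_sd = sqrt(die_variance)  # standard deviation is the square root of the variance

print(die_mean, die_variance, round(die_sd, 4))
```

For the fair die the mean is 7/2 = 3.5 and the variance is 35/12, about 2.92.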

Bernoulli Distribution
A discrete data distribution
used to describe a population
of binary variable values.
Binary: one of only two outcomes
can occur,
coded as 0 or 1
Examples of Bernoulli Variables
Sex (male or female)
Major (business or not business)
Defective? (defective or non-defective)
Response to a T-F question
(true or false)
Where a student lives
(on-campus or off-campus)
Credit application result
(accept or deny)
Own home? (own or rent)
Course result (Pass or Fail)
Bernoulli distribution, one trial.
$X \sim \text{Bernoulli}(p)$
Bernoulli has one parameter:
p = the probability of success.

x          0        1
P(X = x)   1 - p    p

The mean of the Bernoulli is $\mu = p$
The standard deviation is $\sigma = \sqrt{p(1-p)}$
Binomial Distribution
A discrete data distribution used
to model a population of counts for n
independent repetitions
of a Bernoulli experiment.
Conditions:
1. a fixed number of trials, n.
2. all n trials must be independent of each other.
3. the same probability of success on each trial.
4. X = count of the number of successes.
Binomial distribution, n trials.
$X \sim \text{Binomial}(n, p)$
Binomial has two parameters:
n = the fixed number of trials,
p = the probability of success for each trial.

x          0  1  2  3  ...  n
P(X = x)   to be determined

The mean of the Binomial is $\mu = np$
The standard deviation is $\sigma = \sqrt{np(1-p)}$
If the population of X-values has a
binomial data distribution, then the
proportion of the population having the
value x is given by:
This is also the probability that a single value
of X will be exactly equal to x.
$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$
for x = 0, 1, 2, ..., n
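A short sketch of the binomial probability function. The function name `binomial_pmf` and the coin-flip numbers (n = 10, p = 0.5) are assumptions for illustration, not values from the slides.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Hypothetical example: number of heads in 10 flips of a fair coin.
n, p = 10, 0.5
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                      # probabilities over x = 0..n add up to 1
p_count_5 = binomial_pmf(5, n, p)     # P(Count = 5)

print(round(total, 6), round(p_count_5, 4))
```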
$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$

Big X is the
random variable.
Little x is a
specific value
of Big X.
Example:
P( Count = 5)

$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$
$n! = n(n-1)(n-2) \cdots (2)(1)$
$0! = 1$

These are the
possible values of X;
each value has
its own probability.
x = 0, 1, 2, ..., n
Binomial Distribution
Probability function:
$P(X) = \frac{n!}{X!\,(n-X)!} \, p^X q^{n-X}$, for $0 \le X \le n$, $q = 1 - p$

Mean value:
$\mu = n \cdot p$

Variance and Standard Deviation:
$\sigma^2 = n \cdot p \cdot q$
$\sigma = \sqrt{\sigma^2} = \sqrt{n \cdot p \cdot q}$
Binomial Distribution:
Demonstration Problem 5.3
According to the U.S. Census Bureau,
approximately 6% of all workers in Jackson,
Mississippi, are unemployed. In conducting a
random telephone survey in Jackson, what is the
probability of getting two or fewer unemployed
workers in a sample of 20?
In this example,
6% are unemployed => p = .06
The sample size is 20 => n = 20
94% are employed => q = .94
X is the number of successes desired
What is the probability of getting 2 or fewer
unemployed workers in the sample of 20? =>
$P(X \le 2)$
The hard part of this problem is identifying p, n,
and x; emphasize this when studying the
problems.
Binomial Distribution:
Demonstration Problem 5.3
$n = 20$, $p = .06$, $q = .94$
$P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2)$
$P(X = 0) = \frac{20!}{0!\,(20-0)!} (.06)^{0} (.94)^{20-0} = (1)(1)(.2901) = .2901$
$P(X = 1) = \frac{20!}{1!\,(20-1)!} (.06)^{1} (.94)^{20-1} = (20)(.06)(.3086) = .3703$
$P(X = 2) = \frac{20!}{2!\,(20-2)!} (.06)^{2} (.94)^{20-2} = (190)(.0036)(.3283) = .2246$
$P(X \le 2) = .2901 + .3703 + .2246 = .8850$
Binomial Distribution:
Demonstration Problem 5.3
What are the mean and standard deviation of
this distribution?
$\mu = n \cdot p = (20)(.06) = 1.20$
$\sigma^2 = n \cdot p \cdot q = (20)(.06)(.94) = 1.128$
$\sigma = \sqrt{\sigma^2} = \sqrt{1.128} = 1.062$
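Demonstration Problem 5.3 can be reproduced numerically. This is a sketch using n = 20 and p = .06 from the problem; the helper name `binomial_pmf` is an assumption.

```python
from math import comb, sqrt


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 20, 0.06
q = 1 - p

# P(X <= 2) = P(X = 0) + P(X = 1) + P(X = 2)
p_at_most_2 = sum(binomial_pmf(x, n, p) for x in range(3))

mean = n * p            # n * p
variance = n * p * q    # n * p * q
std_dev = sqrt(variance)

print(round(p_at_most_2, 4), round(mean, 2), round(variance, 3), round(std_dev, 3))
```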

Continuous Distributions
The Normal Distributions
a.k.a., The Bell Shaped Curve
Describes the shape for some
quantitative, continuous random variables.
Equation of the Normal Distribution
density function:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty \le x \le \infty$
Probability Density Function of
the Normal Distribution
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
Where:
$\mu$ = mean of X
$\sigma$ = std dev of X
$\pi$ = 3.14159 . . .
$e$ = 2.71828 . . .
Normal Population Distribution
has two parameters:
$\mu$ = mean, determines the location.
$\sigma$ = standard deviation,
determines spread.
These parameters are estimated
with the following sample statistics:
$\bar{x}$ is an estimator of $\mu$
s is an estimator of $\sigma$
Observe: For normal density curves,
Continuous distribution - Line does not break
Total area is always 1.0 or 100%.
Area == Relative Frequency
Bell-shaped, symmetrical distribution
Observe: For normal density curves,
Values can range from $-\infty$ to $+\infty$
Mean = median = mode
68% of data are within one std dev of mean,
95% within two std devs,
and 99.7% within three std devs

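Empirical Rule of the Normal Distribution
[Figure: normal curve over Z = -4 to 4; 68% of the area within $\pm 1\sigma$, 95% within $\pm 2\sigma$, 99.7% within $\pm 3\sigma$, and 0.15% in each tail beyond $\pm 3\sigma$]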
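The empirical-rule percentages can be checked against the standard normal curve. A minimal sketch using the error function from Python's `math` module; the helper name `prob_within` is an assumption.

```python
from math import erf, sqrt


def prob_within(k):
    """P(-k <= Z <= k) for a standard normal Z, i.e. area within k std devs of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))

# Area in one tail beyond 3 standard deviations.
one_tail_beyond_3 = (1 - prob_within(3)) / 2
print(round(one_tail_beyond_3, 5))
```

The exact areas are about 68.27%, 95.45% and 99.73%, which the rule rounds to 68%, 95% and 99.7%.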
Three Normals
[Figure: three normal density curves labeled A, B, and C]
Which population
has the
largest mean?
Which population
has the
smallest mean?
Which population
has the
largest std. dev.?
Which population
has the
smallest std. dev.?
For which
population does
the mean equal
its median?
Standard Normal
Notation:
$X \sim N(\mu = 120, \sigma = 8)$ or N(120, 8)
Z = the number of standard deviations
that an X-value is from the mean.
$Z = \frac{X - \mu}{\sigma}$
$Z \sim N(\mu = 0, \sigma = 1)$ or N(0,1)
Z follows the Standard Normal Distribution
Z Table
Second Decimal Place in Z
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.00 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.10 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.20 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.30 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517

0.90 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.00 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.10 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.20 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015

2.00 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817

3.00 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
3.40 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4998
3.50 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998
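The table entries are areas under the standard normal curve between 0 and Z. As a sketch, they can be regenerated with the error function; the helper name `area_zero_to_z` is an assumption.

```python
from math import erf, sqrt


def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z (the Z-table entry)."""
    return 0.5 * erf(z / sqrt(2))


# Rebuild one row of the table: Z = 1.00, 1.01, ..., 1.09.
row = [round(area_zero_to_z(1.0 + j / 100), 4) for j in range(10)]
print(row)

print(round(area_zero_to_z(0.05), 4))
```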

Table Lookup of a Standard
Normal Probability
$P(0 \le Z \le 1) = 0.3413$
Z 0.00 0.01 0.02

0.00 0.0000 0.0040 0.0080
0.10 0.0398 0.0438 0.0478
0.20 0.0793 0.0832 0.0871

1.00 0.3413 0.3438 0.3461

1.10 0.3643 0.3665 0.3686
1.20 0.3849 0.3869 0.3888
Applying the Z Formula
X is normally distributed with $\mu = 485$ and $\sigma = 105$.
$P(485 \le X \le 600) = P(0 \le Z \le 1.10) = .3643$
For X = 485, $Z = \frac{X - \mu}{\sigma} = \frac{485 - 485}{105} = 0$
For X = 600, $Z = \frac{X - \mu}{\sigma} = \frac{600 - 485}{105} = 1.10$

Z 0.00 0.01 0.02



0.00 0.0000 0.0040 0.0080
0.10 0.0398 0.0438 0.0478

1.00 0.3413 0.3438 0.3461

1.10 0.3643 0.3665 0.3686

1.20 0.3849 0.3869 0.3888
Applying the Z Formula
X is normally distributed with $\mu = 494$ and $\sigma = 100$.
$P(X \le 550) = P(Z \le 0.56) = .7123$
For X = 550, $Z = \frac{X - \mu}{\sigma} = \frac{550 - 494}{100} = 0.56$
0.5 + 0.2123 = 0.7123


Applying the Z Formula
X is normally distributed with $\mu = 494$ and $\sigma = 100$.
$P(X > 700) = P(Z > 2.06) = .0197$
For X = 700, $Z = \frac{X - \mu}{\sigma} = \frac{700 - 494}{100} = 2.06$
0.5 - 0.4803 = 0.0197


Applying the Z Formula
X is normally distributed with $\mu = 494$ and $\sigma = 100$.
$P(300 \le X \le 600) = P(-1.94 \le Z \le 1.06) = .8292$
For X = 300, $Z = \frac{X - \mu}{\sigma} = \frac{300 - 494}{100} = -1.94$
For X = 600, $Z = \frac{X - \mu}{\sigma} = \frac{600 - 494}{100} = 1.06$
0.4738 + 0.3554 = 0.8292
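The four "Applying the Z Formula" results can be reproduced without a table. This is a sketch; `phi` and `z_score` are hypothetical helper names, and Z is rounded to two decimals on purpose so the answers match a table lookup.

```python
from math import erf, sqrt


def phi(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def z_score(x, mu, sigma):
    """Z = (X - mu) / sigma, rounded to two decimals as in a Z-table lookup."""
    return round((x - mu) / sigma, 2)


# mu = 485, sigma = 105: P(485 <= X <= 600)
p_between_485_600 = phi(z_score(600, 485, 105)) - phi(z_score(485, 485, 105))

# mu = 494, sigma = 100
p_at_most_550 = phi(z_score(550, 494, 100))             # P(X <= 550)
p_above_700 = 1 - phi(z_score(700, 494, 100))           # P(X > 700)
p_between_300_600 = phi(z_score(600, 494, 100)) - phi(z_score(300, 494, 100))

print(round(p_between_485_600, 4), round(p_at_most_550, 4),
      round(p_above_700, 4), round(p_between_300_600, 4))
```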


Sampling Distributions
Fact:
Different samples of size n
will produce different values
of the sample mean $\bar{X}$ (statistic).
The population mean $\mu$ (parameter)
is fixed as long as the population
does not change.
Think of $\bar{X}$ as a random variable.
Example 1:
From pop. of all QUMT students,
randomly select n = 1 student.
Record Exam score:
Sampled value: 76
$\bar{x}$ = 76.0
Example 1, continued:
From pop. of all QUMT students,
randomly select n = 4 students.
Record Exam scores:
Sampled values: 64, 78, 94, 46
$\bar{x}$ = (64 + 78 + 94 + 46) / 4 = 70.5
Intuition (and Fact):
$\bar{x}$'s from larger samples
tend to be closer to the
true mean, $\mu$, than
$\bar{x}$'s from smaller samples.
Sampling Distribution of $\bar{X}$
is the distribution of all possible
sample means
calculated from
all possible samples
of size n.
Also called the population
of all possible x-bars.
Original Population: 300 QUMT students.
X = Exam score
$\mu$ = mean (unknown)
$\sigma$ = std dev (unknown)
Sample 1:
n = 4
92, 67, 80, 73
$\bar{x}$ = 78.0
Sample 2:
n = 4
52, 68, 90, 64
$\bar{x}$ = 68.5
Sample 22:
n = 4
Sample 23:
n = 4
63, 58, 87, 74
$\bar{x}$ = 70.5
And so on, . . . ,
until we collect every
possible sample of size n = 4.
How many samples
of size 4 are there from a
population of 300 members?
$\binom{300}{4} = 330{,}791{,}175$
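Sampling Distribution of
$\bar{x}$ for n = 4
Based on all
samples of
size n = 4
[Figure: sampling distribution of $\bar{x}$ along the x-axis, with center $\mu_{\bar{x}}$ and spread $\sigma_{\bar{x}}$]
Definitions, from previous slide:
$\mu_{\bar{x}}$ = average of all possible $\bar{x}$'s
(center of the sampling dist.)
$\sigma_{\bar{x}}$ = std. dev. of all possible $\bar{x}$'s
(spread of the sampling dist.)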
Compare parameters of the
original pop. of all individual scores
and the parameters of the
sampling dist. of all possible $\bar{x}$'s
Original population:
mean = $\mu$ & std. dev. = $\sigma$
Sampling distribution:
$\mu_{\bar{x}} = \mu$
(same mean as
individual values)
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
(different std. dev.,
but related!)
Original Population: 300 QUMT 6310 students.
X = Exam score.
If $\mu = 75$ and $\sigma = 10$,
then the population of all possible
$\bar{x}$-values for n = 4 will have
$\mu_{\bar{x}} = \mu = 75$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{4}} = 5$
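A short sketch of how the std. dev. of the sample means shrinks as n grows, using the std. dev. of 10 from this example. The function name `standard_error` and the extra sample sizes (1, 25, 100) are assumptions for illustration.

```python
from math import sqrt


def standard_error(sigma, n):
    """Std. dev. of all possible sample means of size n: sigma / sqrt(n)."""
    return sigma / sqrt(n)


sigma = 10
for n in (1, 4, 25, 100):
    # The center of the sampling distribution stays at the population mean;
    # only the spread changes with n.
    print(n, standard_error(sigma, n))
```

Quadrupling the sample size halves the spread of the sample means.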
We now know the parameters
of the population of
all possible x-bar values.
What is the distribution?
If original population has a
Normal dist, then the distribution
of $\bar{X}$ values is Normal also.
If $X \sim N(\mu, \sigma)$,
then for samples of size n,
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$.
Example:
Bottle filling machine for soft drink.
Bottles should contain 20.00 ounces;
assume actual contents follow a
normal distribution with a
mean of 20.18 oz. and a
standard deviation of 0.12 oz.
X = contents of one randomly selected bottle
$X \sim N(\mu = 20.18, \sigma = 0.12)$
a. Find the proportion of
individual bottles that contain
less than 20.00 oz.
X = content of one bottle.
$X \sim N(\mu = 20.18, \sigma = 0.12)$
$Z = \frac{20.00 - 20.18}{0.12} = -1.50$
$P(X < 20.00) = P(Z < -1.50) = .0668$
[Figure: normal curve with X-axis (20.00, 20.18) and Z-axis (-1.50, 0); left-tail area = .0668]
6.68% of the bottles will contain
less than 20.00 ounces.
Is this a problem?
b. Find the proportion of
six-packs whose mean
content is less than 20.00 oz.
$\bar{X}$ = mean of six-pack.
Is population of x-bars Normal?
Yes; because original pop. is Normal.
$\sigma_{\bar{x}} = 0.12 / \sqrt{6} = .0490$
$\bar{X} \sim N(\mu_{\bar{x}} = 20.18, \sigma_{\bar{x}} = .0490)$
$Z = \frac{20.00 - 20.18}{0.0490} = -3.67$
$P(\bar{X} < 20.00) = P(Z < -3.67) = .00012$
[Figure: normal curve with X-axis (20.00, 20.18) and Z-axis (-3.67, 0); left-tail area = .00012]
Only 0.012% of the six-packs
will contain an average
less than 20.00 ounces.
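Both parts of the bottle example can be checked numerically. A sketch under the example's stated assumptions (normal contents, mean 20.18 oz., std. dev. 0.12 oz.); `phi` is a hypothetical helper for the standard normal cumulative probability.

```python
from math import erf, sqrt


def phi(z):
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))


mu, sigma = 20.18, 0.12

# a. One bottle: Z = (20.00 - mu) / sigma
z_single = (20.00 - mu) / sigma
p_single = phi(z_single)

# b. Mean of a six-pack: the std. dev. of the mean is sigma / sqrt(6)
se_six_pack = sigma / sqrt(6)
z_six_pack = (20.00 - mu) / se_six_pack
p_six_pack = phi(z_six_pack)

print(round(z_single, 2), round(p_single, 4))
print(round(se_six_pack, 4), round(z_six_pack, 2), round(p_six_pack, 5))
```

Averaging over six bottles shrinks the spread, so a six-pack mean below 20.00 oz. is far rarer than a single underfilled bottle.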
New situation
But what if the
original population
is not normally
distributed?
If original population does
NOT have a Normal dist,
the $\bar{X}$ values are approximately
Normal IF n is large.
If X ~ NOT Normal, then
for large samples of size n,
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$, approximately.
How big is BIG?
Bigger is better, but
30 is enough!
This same phenomenon will happen
for ANY non-normal distribution,
IF n is BIG!
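A small simulation can illustrate the claim. This is only a sketch: the exponential population (mean 1, std. dev. 1), the seed, and the number of repetitions are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable


def sample_means(n, reps):
    """Draw `reps` samples of size n from a skewed (exponential) population
    and return the mean of each sample."""
    return [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]


n, reps = 30, 5000
means = sample_means(n, reps)

center = fmean(means)    # should be near the population mean, 1
spread = pstdev(means)   # should be near sigma / sqrt(n) = 1 / sqrt(30)

# Share of sample means within one std. dev. of the center (about 68% if Normal).
share_within_1 = sum(abs(m - center) <= spread for m in means) / reps

print(round(center, 3), round(spread, 3), round(1 / sqrt(n), 3), round(share_within_1, 3))
```

Even though the population is strongly skewed, the sample means cluster around the population mean with spread close to $\sigma / \sqrt{n}$.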
Reminder
When is the population of
all possible $\bar{X}$ values Normal?
Anytime the original pop.
is Normal (true for any n).
Anytime the original pop.
is not Normal, but
n is BIG (n > 30).
Reminder
When is the population of
all possible $\bar{X}$ values NOT Normal?
Anytime the original population
is not Normal AND
n is NOT BIG.
Z Formula for Sample Means
$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
Z Formula for Sample Proportions
$Z = \frac{\hat{p} - p}{\sqrt{\frac{p \cdot q}{n}}}$
where:
$\hat{p}$ = sample proportion
n = sample size
p = population proportion
$q = 1 - p$
$n \cdot p > 5$
$n \cdot q > 5$
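The two Z formulas can be sketched as small functions. The function names are assumptions, and so are the proportion numbers (p = 0.10, n = 100, sample proportion 0.13); the sample-mean call reuses the six-pack figures from the bottle example.

```python
from math import sqrt


def z_for_sample_mean(x_bar, mu, sigma, n):
    """Z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / sqrt(n))


def z_for_sample_proportion(p_hat, p, n):
    """Z = (p_hat - p) / sqrt(p * q / n), with q = 1 - p.

    Requires n * p > 5 and n * q > 5 so the normal approximation is reasonable.
    """
    q = 1 - p
    if n * p <= 5 or n * q <= 5:
        raise ValueError("n * p and n * q must both exceed 5")
    return (p_hat - p) / sqrt(p * q / n)


# Six-pack mean from the bottle example: x_bar = 20.00, mu = 20.18, sigma = 0.12, n = 6.
z_mean = z_for_sample_mean(20.00, 20.18, 0.12, 6)

# Hypothetical proportion example.
z_prop = z_for_sample_proportion(0.13, 0.10, 100)

print(round(z_mean, 2), round(z_prop, 2))
```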
