
Probability, Normal Distributions, and Sampling Distribution

Intensive Course in Statistics by Dr. Poch Bunnak, March 29-April 2, 2010
2010 Dr. Poch Bunnak 1

Content
I. Link between probability and inferential statistics
II. Important probability concepts
III. Types of probability distribution
    Discrete
    Continuous
IV. Normal and standard normal distributions
V. Sampling distributions
VI. Central Limit Theorem
VII. Exercise

I. Probability and Inferential Statistics


Probability is a numerical measure of the chance that a particular event will occur:
near-zero prob. of rain (virtually no chance of rain), 90% prob. of rain (almost certain that it will rain), or 50% prob. of rain (rain is just as likely to occur as not).
Using data on the number of TVs in the household, what is the prob. that there are one or two TVs in the household?
There are 15 males and 5 females in a class. A professor wants to select 2 students at random to help him conduct a project. What is the prob. that the two students chosen are female?
The return on a business is normally distributed with a mean of 10% and a standard deviation of 5%. What is the probability of losing money?
Sam is a lazy student taking a statistics course. When taking a quiz consisting of 10 MCQs, where each MCQ has 5 answers but only 1 is correct, he relies on guessing the answer. What is the probability that Sam will get all wrong answers? What is the probability that he will get 50% correct to pass the test?
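Three of these questions can already be answered with a few lines of code. The sketch below is illustrative only and rests on assumptions the slide does not state: the two students are drawn without replacement, Sam's guesses are independent (so the number of correct answers is binomial), and "50% correct" is computed both as exactly 5 and as at least 5 correct. The variable names are my own.

```python
from math import comb
from statistics import NormalDist

# Two students drawn at random (without replacement) from 15 males and 5 females.
p_two_females = comb(5, 2) / comb(20, 2)

# Return ~ Normal(mean 10%, sd 5%); losing money means a return below 0%.
p_loss = NormalDist(mu=10, sigma=5).cdf(0)

# Sam guesses on 10 MCQs; each guess is right with probability 1/5 (assumed independent).
n, p = 10, 1 / 5
p_all_wrong = (1 - p) ** n
p_exactly_half = comb(n, 5) * p**5 * (1 - p) ** 5
p_at_least_half = sum(
    comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(5, n + 1)
)

print(f"P(both female)       = {p_two_females:.4f}")
print(f"P(losing money)      = {p_loss:.4f}")
print(f"P(all wrong)         = {p_all_wrong:.4f}")
print(f"P(exactly 5 correct) = {p_exactly_half:.4f}")
print(f"P(at least 5 correct)= {p_at_least_half:.4f}")
```

The TV question is left out because it needs the household data, which the slide does not reproduce.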

Probability and Inferential Statistics


Statistical inference is based on the principles of probability: how often would we get the correct answer if we used our method many times?
Statistical inference takes into account the variability of the sampling process, which allows us to estimate precision and the likelihood of being correct.
Ex. Our goal is to estimate the mean for a population of 100,000. We take a random sample of 1,000 from that population and compute the sample mean to estimate the population mean.
Since we will miss a large portion of the population, it is unlikely that the sample mean will exactly match the population mean.
If we took several samples, we would likely get a different answer each time.
If we took an infinite number of random samples, however, the average of the sample means would equal the population mean.

Based on this principle of sampling variability, we can make an educated guess of the precision of a single sample mean.
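The idea of sampling variability can be seen in a small simulation. This is only a sketch: the population of 100,000 values is invented (drawn from a normal distribution with mean 50 and standard deviation 10, since the slide supplies no data), and 200 repeated samples stand in for "many" samples.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population of 100,000 values; the slide does not supply data.
population = [random.gauss(50, 10) for _ in range(100_000)]
population_mean = mean(population)

# Draw repeated random samples of 1,000 and record each sample mean.
sample_means = [mean(random.sample(population, 1_000)) for _ in range(200)]

print(f"population mean        = {population_mean:.3f}")
print(f"first five sample means = {[round(m, 3) for m in sample_means[:5]]}")
print(f"mean of sample means   = {mean(sample_means):.3f}")
```

Individual sample means differ from one another and from the population mean, while their average sits very close to it.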

II. Probability Concepts


A probability distribution is a listing of the outcomes of a random experiment and the corresponding probabilities.
This list must be exhaustive, i.e. ALL possible outcomes are included:

Die roll {1,2,3,4,5} is not exhaustive (6 is missing)

Die roll {1,2,3,4,5,6} is exhaustive

The list must be mutually exclusive, i.e. no two outcomes can occur at the same time:

Die roll {odd number or even number}

A list of exhaustive and mutually exclusive outcomes is called a sample space and is denoted by S.
The outcomes are denoted by $O_1, O_2, \ldots, O_k$. Using notation from set theory, we can represent the sample space and its outcomes as:

$S = \{O_1, O_2, \ldots, O_k\}$



Requirements of Probabilities
Given a sample space $S = \{O_1, O_2, \ldots, O_k\}$, the probabilities assigned to the outcomes must satisfy these requirements:
(1) The probability of any outcome is between 0 and 1

i.e. $0 \le P(O_i) \le 1$ for each $i$

(2) The sum of the prob. of all the outcomes equals 1

i.e. $P(O_1) + P(O_2) + \cdots + P(O_k) = 1$

$P(O_i)$ represents the probability of outcome $i$
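The two requirements translate directly into a check. The function name `is_valid_distribution` and the two example dictionaries are hypothetical, chosen only to illustrate the rules with the die-roll example.

```python
from math import isclose


def is_valid_distribution(probs):
    """Return True if every probability is in [0, 1] and they sum to 1."""
    each_in_range = all(0 <= p <= 1 for p in probs.values())
    sums_to_one = isclose(sum(probs.values()), 1.0)
    return each_in_range and sums_to_one


# Fair six-sided die: exhaustive list of outcomes.
fair_die = {face: 1 / 6 for face in range(1, 7)}
# Same die with outcome 6 left out: the probabilities no longer sum to 1.
missing_face = {face: 1 / 6 for face in range(1, 6)}

print(is_valid_distribution(fair_die))
print(is_valid_distribution(missing_face))
```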


Example 1
Six top-rated applicants applied for jobs: M1, M2, F3, M4, F5, M6. Only 2 vacancies are available. Since they are all highly qualified, the selection committee wants to select randomly. They want to know the probability that both female candidates are selected and the probability that one male and one female are selected.
Pairing: See Table 4.1 in SMSS, page 68.
Answer: the six applicants form 15 equally likely pairs, so P(f,f) = 1/15 ≈ .07, P(m,f) = 8/15 ≈ .53, P(m,m) = 6/15 = .40
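The pairing can be enumerated directly instead of read from a table. This sketch assumes the six applicants exactly as listed (four whose label starts with M, two with F) and that every pair is equally likely; the helper `count_females` is a name introduced here for illustration.

```python
from itertools import combinations

applicants = ["M1", "M2", "F3", "M4", "F5", "M6"]

# Every unordered pair of applicants is assumed equally likely.
pairs = list(combinations(applicants, 2))


def count_females(pair):
    """Number of female applicants (labels starting with 'F') in a pair."""
    return sum(name.startswith("F") for name in pair)


p_ff = sum(count_females(pair) == 2 for pair in pairs) / len(pairs)
p_mf = sum(count_females(pair) == 1 for pair in pairs) / len(pairs)
p_mm = sum(count_females(pair) == 0 for pair in pairs) / len(pairs)

print(f"{len(pairs)} pairs: P(f,f)={p_ff:.3f}, P(m,f)={p_mf:.3f}, P(m,m)={p_mm:.3f}")
```

Because the three events cover every pair, the three probabilities must sum to 1, which is a quick sanity check on any hand calculation.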


Example 2
A = sex of students (1=female, 2=male)
B = married before graduation (1=yes, 2=no)
We want to calculate P(B1 | A1)

          B1     B2     P(Ai)
A1        .11    .29    .40
A2        .06    .54    .60
P(Bj)     .17    .83    1.00

$P(B_1 \mid A_1) = P(A_1 \text{ and } B_1) / P(A_1) = .11 / .40 = .275$

Thus, there is a 27.5% chance that students are married before graduation given that they are female.
Note on marginal probability
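The same calculation can be done from the joint probabilities alone, with the marginal obtained by summing a row. A minimal sketch; the dictionary layout and variable names are my own choice.

```python
# Joint probabilities P(Ai and Bj) from the table in Example 2.
joint = {
    ("A1", "B1"): 0.11,
    ("A1", "B2"): 0.29,
    ("A2", "B1"): 0.06,
    ("A2", "B2"): 0.54,
}

# Marginal probability of A1: sum the joint probabilities across B.
p_a1 = sum(p for (a, _), p in joint.items() if a == "A1")

# Conditional probability: P(B1 | A1) = P(A1 and B1) / P(A1).
p_b1_given_a1 = joint[("A1", "B1")] / p_a1

print(f"P(A1) = {p_a1:.2f}, P(B1 | A1) = {p_b1_given_a1:.3f}")
```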

Counting Rule for Combination


Suppose that we want to count the possible outcomes of an experiment with n objects selected from a set of N objects; that is, the number of combinations of N objects taken n at a time. The number of possible outcomes is given by

$\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$

Example:
A quality control inspector wants to randomly select two of five parts to test for defects. In such a group of five parts, how many combinations of two parts may be selected? $\binom{5}{2} = \frac{5!}{2!\,3!} = 10$
A state lottery uses a random selection of 6 numbers from a group of 44 numbers:

$\binom{44}{6} = \frac{44!}{6!\,(44-6)!} = \frac{44!}{6!\,38!} = \frac{(44)(43)(42)(41)(40)(39)}{(6)(5)(4)(3)(2)(1)} = 7{,}059{,}052$
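Both counts can be verified with Python's standard library. The sketch computes each one twice, once with `math.comb` and once from the factorial formula, to show that they agree.

```python
from math import comb, factorial


def combinations_count(N, n):
    """Number of ways to choose n objects from N, via the factorial formula."""
    return factorial(N) // (factorial(n) * factorial(N - n))


parts = comb(5, 2)      # two of five parts
lottery = comb(44, 6)   # six of 44 lottery numbers

print(parts, combinations_count(5, 2))
print(lottery, combinations_count(44, 6))
```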

III. Discrete Probability Distribution


It is a probability distribution of a discrete random variable.
Conditions: $0 \le P(x) \le 1$ for every value $x$, and $\sum P(x) = 1$

Examples:
Number of students in a class
Number of children in the family
Number of TV sets in the household

Discrete Probability Distribution (cont.)


Using relative frequencies, consider the discrete (countable) number of televisions per household. Each probability is a count divided by the total number of households, for example:
1,218 / 101,501 = 0.012

e.g. P(X = 4) = P(4) = 0.076 = 7.6%

The Mean and Variance of a Discrete P. D.
The mean:
reports the central location of the data.
is the long-run average value of the random variable.
is called the expected value, E(X), in a probability distribution.
is a weighted average computed by $\mu = \sum [x\,P(x)]$, where

$\mu$ is the mean
P(x) is the probability of the various outcomes x.

The variance, denoted by $\sigma^2$ (sigma squared), measures the amount of spread (variation) of a distribution: $\sigma^2 = \sum [(x - \mu)^2\,P(x)]$
The standard deviation, $\sigma$, is the square root of $\sigma^2$.

Example of a Discrete P. D.
Compute $\mu$, $\sigma^2$, and $\sigma$ of the number of siblings using these data: 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5

The weighted average $\mu = \sum [x\,P(x)] = 2.65$

$\sigma^2 = \sum [(x - \mu)^2\,P(x)] = 1.73$, $\sigma = 1.31$
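The siblings example can be reproduced step by step: build the relative-frequency distribution P(x) from the 20 observations, then apply the weighted-average formulas for the mean and variance. The variable names are illustrative.

```python
from collections import Counter
from math import sqrt

siblings = [0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5]

# Relative frequencies serve as the probabilities P(x).
n = len(siblings)
P = {x: count / n for x, count in sorted(Counter(siblings).items())}

mu = sum(x * p for x, p in P.items())                    # mean: sum of x * P(x)
variance = sum((x - mu) ** 2 * p for x, p in P.items())  # sum of (x - mu)^2 * P(x)
sigma = sqrt(variance)                                   # standard deviation

for x, p in P.items():
    print(f"P({x}) = {p:.2f}")
print(f"mu = {mu:.2f}, variance = {variance:.2f}, sigma = {sigma:.2f}")
```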

IV. Continuous Probability Distribution


It is a probability distribution of a continuous variable that can assume an uncountable number of values.
We cannot list the possible values because there is an infinite number of them. Ex. Time taken to travel from PP to SHN.
Because there is an infinite number of values, the probability of each individual value is virtually 0. Ex. What is the probability that you will spend 4h36min50s traveling from PP to SHN? It is possible, but it is one out of so many possibilities that its probability becomes 0.
Thus, we can determine the probability of a range of values only: $P(a \le x \le b)$. Ex. What is the probability that you will spend between 4h30min and 5h00min traveling from PP to SHN?
Probability density function is used to calculate probabilities of a continuous variable: No need to

Probability Density Function


A function f(x) is called a probability density function (over the range $a \le x \le b$) if it meets the following requirements:
1) $f(x) \ge 0$ for all x between a and b, and
2) The total area under the curve between a and b is 1.0

[Figure: curve f(x) plotted against x between a and b, with the area under the curve equal to 1]
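Both requirements can be checked numerically for a candidate density. The sketch below uses a made-up density, f(x) = x/2 on the range 0 to 2 (not from the slides), and a simple trapezoid rule written here as the hypothetical helper `integrate`. It also shows how the probability of a range is an area under the curve.

```python
def integrate(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the trapezoid rule."""
    width = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * width)
    return total * width


def f(x):
    """Hypothetical density on the range 0 <= x <= 2."""
    return x / 2


a, b = 0.0, 2.0

# Requirement 1: f(x) >= 0 on a grid of points between a and b.
non_negative = all(f(a + i * (b - a) / 1000) >= 0 for i in range(1001))

# Requirement 2: total area under the curve is 1.
total_area = integrate(f, a, b)

# Probability of a range of values: P(0.5 <= x <= 1.0).
p_range = integrate(f, 0.5, 1.0)

print(non_negative, round(total_area, 6), round(p_range, 6))
```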



V. The Normal Probability Distribution


The normal distribution is the most important of all probability distributions.
Areas under the curve are obtained from the probability density function.
Characteristics:
1) Bell-shaped curve
2) Symmetrical shape around the mean
3) Mean = median = mode
4) Asymptotic curve (tails get closer and closer to the x-axis but never touch it).

Normal Probability Distribution, cont.


There are many NPDs that belong to a family of NPDs.
The shape of each NPD is determined by $\mu$ and $\sigma$.
Ex. There is only one NPD shape with $\mu = 10$ and $\sigma = 3$.


No matter what the values of $\mu$ and $\sigma$ are for an NPD, the total area under the curve is equal to one.
We can consider partial areas under the curve as representing probabilities. Ex. What is the probability that a value falls above the value of $\mu + 2\sigma$? Below $\mu + 3\sigma$? Etc.
Empirical Rules for NPD (See Page 52 in SMSS)


Area probabilities by standard deviation
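The areas by standard deviation can be computed from the standard normal distribution with Python's `statistics.NormalDist`. The sketch gives the area within 1, 2, and 3 standard deviations of the mean, plus the two tail questions asked earlier (above $\mu + 2\sigma$, below $\mu + 3\sigma$); because areas depend only on the distance from the mean in standard deviations, the standard normal suffices for any NPD.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

p_above_2sd = 1 - z.cdf(2)  # P(value > mu + 2 sigma)
p_below_3sd = z.cdf(3)      # P(value < mu + 3 sigma)

for k, area in within.items():
    print(f"within {k} sd: {area:.4f}")
print(f"above mu + 2 sigma: {p_above_2sd:.4f}")
print(f"below mu + 3 sigma: {p_below_3sd:.4f}")
```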


V. Standard Normal Probability Distribution


A normal distribution whose mean is zero and standard deviation is one is called the standard normal distribution.


Example for Calculating Normal Probabilities


The time required to build a computer is normally distributed with a mean of 50 minutes and a standard deviation of 10 minutes:

What is the probability that a computer is assembled in a time between 45 and 60 minutes? Or what is P(45 < X < 60) ?

Example, cont. P(45 < X < 60)?


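mean of 50 minutes and a standard deviation of 10 minutes
1. Convert x to z using $z = (x - \mu)/\sigma$: for x = 45, $z = (45 - 50)/10 = -.5$; for x = 60, $z = (60 - 50)/10 = 1$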


Example, cont.
We've converted P(45 < X < 60) for a normal distribution with mean = 50 and st. dev. = 10 to P(-.5 < Z < 1) [a standard normal distribution with mean = 0 and st. dev. = 1], meaning:
A z-value of x=45 is -.5, meaning that x=45 is .5 st. dev. below the mean
A z-value of x=60 is 1, meaning that x=60 is 1 st. dev. above the mean

So, where do we go from here? Find the probability for the areas between z=-.5 and z=0 and between z=0 and z=1.
Based on SNP table:
Area between z=-.5 and z=0: .5000 - .3085 = .1915
Area between z=0 and z=1: .5000 - .1587 = .3413
Both areas = .1915 + .3413 = .5328
Thus, the probability that a computer is assembled between 45 and 60 minutes is 53.28% (See graphs in next slides)

Example, cont.
P(-.5 < Z < 1) looks like this:
The probability is the area under the curve.
We will add up the two sections: P(-.5 < Z < 0) and P(0 < Z < 1)

[Figure: standard normal curve with the area between z = -.5 and z = 1 shaded]

Example, cont.
Use Table A on page 527 to look up probabilities P(0 < Z < z)

We can break up P(-.5 < Z < 1) into: P(-.5 < Z < 0) + P(0 < Z < 1)
The distribution is symmetric around zero, so (multiplying by -1 and re-arranging the terms):
P(-.5 < Z < 0) = P(.5 > Z > 0) = P(0 < Z < .5)
Hence: P(-.5 < Z < 1) = P(0 < Z < .5) + P(0 < Z < 1) = .1915 + .3413 = .5328
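The table lookup can be cross-checked in code. This sketch computes the probability two ways: directly from the normal distribution of build times (mean 50, st. dev. 10), and by converting to z-values and adding the two areas on either side of zero, as in the worked example. It uses `statistics.NormalDist` from the standard library rather than Table A.

```python
from statistics import NormalDist

mu, sigma = 50, 10
build_time = NormalDist(mu=mu, sigma=sigma)

# Direct calculation: P(45 < X < 60).
p_direct = build_time.cdf(60) - build_time.cdf(45)

# Via the standard normal: convert x to z, then add the two areas around zero.
z_low = (45 - mu) / sigma    # -0.5
z_high = (60 - mu) / sigma   # 1.0
std = NormalDist()
area_left = std.cdf(0) - std.cdf(z_low)     # P(-.5 < Z < 0)
area_right = std.cdf(z_high) - std.cdf(0)   # P(0 < Z < 1)
p_via_z = area_left + area_right

print(f"left area = {area_left:.4f}, right area = {area_right:.4f}")
print(f"P(45 < X < 60) = {p_direct:.4f} (direct), {p_via_z:.4f} (via z)")
```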

Practices
Do exercises 1-3 on page 72.
Do Problems 11 and 17 on page 89.


VI. Sampling Distribution & Sampling Error


Read SMSS Pages 77-82
Class demonstration!


VII. Central Limit Theorem (CLT)


CLT describes the shape of the distribution of sample means. It states that, in selecting a random sample of size n from a population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of $\bar{y}$ can be approximated by an N.P.D. with mean $\mu$ and standard error $\sigma_{\bar{y}} = \sigma/\sqrt{n}$, as n becomes large ($n \ge 30$).
Implications:
1: As n increases, $\sigma_{\bar{y}}$ decreases. Thus, $\bar{y}$ tends to be closer to $\mu$ for larger n, since the sampling dist. becomes less dispersed about $\mu$.
2: For large n, the sampling dist. of $\bar{y}$ is approx. normal, regardless of the shape of the pop. dist.
3: As n increases, the sampling dist. of $\bar{y}$ takes on a more normal shape.
4: We can use Table A to calculate the prob. that $\bar{y}$ is within $z\sigma_{\bar{y}}$ units of $\mu$.
5: Saying that for $n \ge 30$ the sampling dist. of $\bar{y}$ is normal does not mean that we have to select n = 30 for statistical inference. n should depend on the level of precision desired.
6: When the pop. has a normal P.D., the sampling dist. of $\bar{y}$ is a normal P.D. for any sample size.
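A simulation makes the theorem concrete. The sketch below is illustrative, not part of the course material: it assumes a strongly skewed population (exponential with mean 1 and standard deviation 1), draws 2,000 samples of size n = 30, and compares the spread of the sample means with the standard error predicted by $\sigma/\sqrt{n}$.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch is reproducible

n = 30               # sample size
replications = 2_000  # number of repeated samples

# Assumed population: exponential with rate 1 (mean 1, sd 1), clearly non-normal.
pop_mean, pop_sd = 1.0, 1.0

sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(replications)
]

observed_mean = mean(sample_means)
observed_se = stdev(sample_means)
theoretical_se = pop_sd / sqrt(n)

print(f"mean of sample means = {observed_mean:.3f} (population mean {pop_mean})")
print(f"sd of sample means   = {observed_se:.3f} (sigma/sqrt(n) = {theoretical_se:.3f})")
```

Rerunning with a larger n shrinks the spread of the sample means, which is implication 1 in the list above.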

Exercise
1. Problem 10, page 88 in SMSS
2. Problem 17, page 89 in SMSS
3. Problem 31, page 92 in SMSS
4. Use census 1998 data:
    1) Find measures of central tendency, dispersion, and shape of age of population
    2) Each student draws a 5% random sample and redoes 1) above.
    3) Compare the results
    4) Redo 2) and 3) by using a 15% random sample.
