3 Probability and Sampling Distributions

Probability, Normal Distributions, and Sampling Distribution
Intensive Course in Statistics by Dr. Poch Bunnak, March 29-April 2, 2010
2010 Dr. Poch Bunnak 1
Content
I. Link between probability and inferential statistics
II. Important probability concepts
III. Types of probability distribution: discrete, continuous
IV. Normal and standard normal distributions
V. Sampling distributions
VI. Central Limit Theorem
VII. Exercise
I. Probability and Inferential Statistics

Probability is a numerical measure of the chance that a particular event will occur:
a near-zero probability of rain (no chance of rain), a 90% probability of rain (almost certain that it will rain), or a 50% probability of rain (rain is just as likely to occur as not).
Typical questions:
Using data on the number of TVs in the household, what is the probability that there are one or two TVs in the household?
There are 15 males and 5 females in a class. A professor wants to select 2 students at random to help him conduct a project. What is the probability that the two students chosen are female?
The return on a business is normally distributed with a mean of 10% and a standard deviation of 5%. What is the probability of losing money?
Sam is a lazy student taking a statistics course. When taking a quiz consisting of 10 MCQs, where each MCQ has 5 answers but only 1 is correct, he relies on guessing the answers. What is the probability that Sam will get all answers wrong? What is the probability that he will get 50% correct to pass the test?
Probability and Inferential Statistics

Statistical inference is based on the principles of probability: how often would we get the correct answer if we used our method many times? Statistical inference takes into account the variability of the sampling process, which allows us to estimate precision and the likelihood of being correct.
Ex. Our goal is to estimate the mean for a population of 100,000. We take a random sample of 1000 from that population and compute the sample mean to estimate the population mean.
Since we will miss a large portion of the population, it is unlikely that the sample mean will exactly match the population mean. If we took several samples, we would likely get a different answer each time. If we took an infinite number of random samples, however, the average of the sample means would equal the population mean.
Based on this principle of sampling variability, we can make an educated guess of the precision of a single sample mean.
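The idea of sampling variability can be illustrated with a small simulation. The sketch below is only an illustration: the population values are hypothetical (drawn from a normal distribution with mean 50 and standard deviation 10, which is an assumption and not course data), and `sample_means` is a helper name introduced here.

```python
import random
import statistics


def sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated simple random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


# Hypothetical population of 100,000 values (illustrative only).
pop_rng = random.Random(42)
population = [pop_rng.gauss(50, 10) for _ in range(100_000)]
population_mean = statistics.fmean(population)

# 200 random samples of 1000 units each.
means = sample_means(population, sample_size=1000, num_samples=200)

print(f"population mean:        {population_mean:.3f}")
print(f"first three sample means: {[round(m, 3) for m in means[:3]]}")
print(f"average of sample means: {statistics.fmean(means):.3f}")
print(f"spread of sample means:  {statistics.stdev(means):.3f}")
```

Each sample mean differs a little from the population mean, but their average sits very close to it, and their spread indicates how precise a single sample mean is likely to be.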
II. Probability Concepts

A probability distribution is a listing of the outcomes of a random experiment and their corresponding probabilities.
This list must be exhaustive, i.e. ALL possible outcomes are included:
Die roll {1,2,3,4,5} is not exhaustive
Die roll {1,2,3,4,5,6} is exhaustive
The list must be mutually exclusive, i.e. no two outcomes can occur at the same time:
Die roll {odd number or even number}
A list of exhaustive and mutually exclusive outcomes is called a sample space and is denoted by S.
The outcomes are denoted by $O_1, O_2, \dots, O_k$. Using notation from set theory, we can represent the sample space and its outcomes as:
$S = \{O_1, O_2, \dots, O_k\}$

Requirements of Probabilities
Given a sample space $S = \{O_1, O_2, \dots, O_k\}$, the probabilities assigned to the outcomes must satisfy these requirements:
(1) The probability of any outcome is between 0 and 1,
i.e. $0 \le P(O_i) \le 1$ for each $i$

(2) The sum of the probabilities of all the outcomes equals 1,
i.e. $P(O_1) + P(O_2) + \dots + P(O_k) = 1$

$P(O_i)$ represents the probability of outcome $i$.
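The two requirements can be checked mechanically. The following is a minimal sketch; the function name `is_valid_distribution` and the two die examples are introduced here for illustration only.

```python
import math


def is_valid_distribution(probabilities, tol=1e-9):
    """Check requirement (1): each probability lies in [0, 1],
    and requirement (2): the probabilities sum to 1."""
    values = list(probabilities.values())
    in_range = all(0.0 <= p <= 1.0 for p in values)
    sums_to_one = math.isclose(sum(values), 1.0, abs_tol=tol)
    return in_range and sums_to_one


# A fair six-sided die: six outcomes, each with probability 1/6.
fair_die = {face: 1 / 6 for face in range(1, 7)}

# The same die with face 6 left out: the list is not exhaustive,
# so the probabilities no longer sum to 1.
missing_face = {face: 1 / 6 for face in range(1, 6)}

print(is_valid_distribution(fair_die))
print(is_valid_distribution(missing_face))
```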
Example 1
Six top-rated applicants applied for jobs: M1, M2, F3, M4, F5, M6. Only 2 vacancies are available. Since they are all highly qualified, the selection committee wants to select randomly. They want to know the probability that both female candidates are selected and the probability that one male and one female are selected.
Pairing: See Table 4.1 in SMSS, page 68.
Answer: with six applicants there are 15 equally likely pairs, so P(f,f) = 1/15 ≈ .07, P(m,f) = 8/15 ≈ .53, P(m,m) = 6/15 = .40
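The pairing can also be checked by brute force. The sketch below enumerates every possible pair for the six applicants as listed (four male, two female) and counts the pairs by number of women; the helper name `probability_of_females` is an assumption made for this illustration.

```python
from fractions import Fraction
from itertools import combinations

applicants = ["M1", "M2", "F3", "M4", "F5", "M6"]

# Every unordered pair is equally likely under random selection.
pairs = list(combinations(applicants, 2))


def probability_of_females(num_females):
    """Probability that a random pair contains exactly num_females women."""
    favourable = [
        pair for pair in pairs
        if sum(name.startswith("F") for name in pair) == num_females
    ]
    return Fraction(len(favourable), len(pairs))


print("number of pairs:", len(pairs))
print("P(f,f) =", probability_of_females(2))
print("P(m,f) =", probability_of_females(1))
print("P(m,m) =", probability_of_females(0))
```

The three probabilities cover all possible outcomes, so they add up to 1, as the requirements of probabilities demand.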
Example 2
A = sex of students (1 = female, 2 = male)
B = married before graduation (1 = yes, 2 = no)
We want to calculate P(B1 | A1)
         B1     B2     P(Ai)
A1       .11    .29    .40
A2       .06    .54    .60
P(Bj)    .17    .83    1.00
P(B1 | A1) = P(A1 and B1) / P(A1) = .11 / .40 = .275
Thus, there is a 27.5% chance that students are married before graduation given that they are female.
Note on marginal probability: the row totals P(Ai) and column totals P(Bj) are the marginal probabilities.
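The same conditional probability can be computed directly from the joint probabilities in the table. This is a small sketch; the dictionary layout and the function names are choices made here for illustration.

```python
# Joint probabilities P(Ai and Bj) from the table.
joint = {
    ("A1", "B1"): 0.11,
    ("A1", "B2"): 0.29,
    ("A2", "B1"): 0.06,
    ("A2", "B2"): 0.54,
}


def marginal_a(a):
    """Marginal probability P(A = a): sum the joint probabilities over B."""
    return sum(p for (ai, _), p in joint.items() if ai == a)


def conditional_b_given_a(b, a):
    """Conditional probability P(B = b | A = a) = P(a and b) / P(a)."""
    return joint[(a, b)] / marginal_a(a)


print(f"P(A1)      = {marginal_a('A1'):.2f}")
print(f"P(B1 | A1) = {conditional_b_given_a('B1', 'A1'):.3f}")
```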
Counting Rule for Combination

Suppose that we want to count the possible outcomes of an experiment in which n objects are selected from a set of N objects; that is, the number of combinations of N objects taken n at a time. The number of possible outcomes is given by
$\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$
Examples:
A quality control inspector wants to randomly select two of five parts to test for defects. In such a group of five parts, how many combinations of two parts may be selected?
$\binom{5}{2} = \frac{5!}{2!\,3!} = 10$
A state lottery uses a random selection of 6 numbers from a group of 44 numbers:
$\binom{44}{6} = \frac{44!}{6!\,(44-6)!} = \frac{44!}{6!\,38!} = \frac{(44)(43)(42)(41)(40)(39)}{(6)(5)(4)(3)(2)(1)} = 7{,}059{,}052$
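The counting rule can be verified with Python's standard library. The sketch below compares the built-in `math.comb` with the factorial formula written out explicitly; `combinations_by_formula` is a helper name introduced here.

```python
import math


def combinations_by_formula(N, n):
    """Number of combinations of N objects taken n at a time: N! / (n! (N - n)!)."""
    return math.factorial(N) // (math.factorial(n) * math.factorial(N - n))


# Two of five parts selected for inspection.
print(math.comb(5, 2), combinations_by_formula(5, 2))

# Six numbers drawn from 44 in the lottery example.
print(math.comb(44, 6), combinations_by_formula(44, 6))
```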
III. Discrete Probability Distribution

It is a probability distribution of a discrete random variable.
Conditions: $0 \le P(x) \le 1$ for every value x, and $\sum P(x) = 1$
Examples:
Number of students in a class
Number of children in the family
Number of TV sets in the household
Discrete Probability Distribution (cont.)

Using relative frequencies, consider the discrete (countable) number of televisions per household. Each probability is a count of households divided by the total number of households:
1,218 / 101,501 = 0.012
e.g. P(X = 4) = P(4) = 0.076 = 7.6%
The Mean and Variance of a Discrete P. D.
The mean:
reports the central location of the data.
is the long-run average value of the random variable.
is called the expected value, E(X), in a probability distribution.
is a weighted average computed by $\mu = \sum [x P(x)]$, where
$\mu$ is the mean
P(x) is the probability of the various outcomes x.
The variance, denoted by $\sigma^2$ (sigma squared), measures the amount of spread (variation) of a distribution: $\sigma^2 = \sum [(x - \mu)^2 P(x)]$
The standard deviation, $\sigma$, is the square root of $\sigma^2$.
Example of a Discrete P. D.
Compute $\mu$, $\sigma^2$, and $\sigma$ of the number of siblings using these data: 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5
The weighted average: $\mu = \sum [x P(x)] = 2.65$
$\sigma^2 = \sum [(x - \mu)^2 P(x)] = 1.73$
$\sigma = \sqrt{1.73} = 1.31$
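The siblings example can be reproduced step by step. This sketch builds P(x) from relative frequencies and then applies the weighted-average formulas for the mean and the variance; the variable names are chosen here for illustration.

```python
import math
from collections import Counter

siblings = [0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5]

# P(x) as relative frequency: count of each value divided by the sample size.
counts = Counter(siblings)
n = len(siblings)
pmf = {x: count / n for x, count in sorted(counts.items())}

# Mean as the weighted average of x with weights P(x).
mean = sum(x * p for x, p in pmf.items())

# Variance as the weighted average of squared deviations from the mean.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
std_dev = math.sqrt(variance)

for x, p in pmf.items():
    print(f"P({x}) = {p:.2f}")
print(f"mean = {mean:.2f}, variance = {variance:.2f}, std dev = {std_dev:.2f}")
```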
IV. Continuous Probability Distribution

It is a probability distribution of a continuous variable that can assume an uncountable number of values. We cannot list the possible values because there is an infinite number of them.
Ex. Time taken to travel from PP to SHN
Because there is an infinite number of values, the probability of each individual value is virtually 0.
Ex. What is the probability that you will spend exactly 4h36min50s traveling from PP to SHN? It is possible, but it is one out of so many possibilities that its probability becomes 0.
Thus, we can determine the probability of a range of values only: $P(a \le x \le b)$.
Ex. What is the probability that you will spend between 4h30min and 5h00min traveling from PP to SHN?
A probability density function is used to calculate probabilities of a continuous variable: No need to ...
Probability Density Function

A function f(x) is called a probability density function (over the range $a \le x \le b$) if it meets the following requirements:
1) $f(x) \ge 0$ for all x between a and b, and
2) The total area under the curve between a and b is 1.0
[Figure: curve of f(x) over the range a to b on the x-axis, with area = 1 under the curve]
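The two requirements, and the fact that a single exact value has probability 0, can be illustrated numerically. The sketch below assumes a purely hypothetical density: travel time spread evenly (uniformly) between 4 and 6 hours. Neither that range nor the uniform shape comes from the course material, and `integrate` is a simple midpoint-rule helper written for this example.

```python
def integrate(f, lower, upper, steps=10_000):
    """Approximate the area under f between lower and upper (midpoint rule)."""
    width = (upper - lower) / steps
    return sum(f(lower + (i + 0.5) * width) for i in range(steps)) * width


# Hypothetical range of travel times in hours (an assumption for illustration).
A, B = 4.0, 6.0


def density(x):
    """Uniform density on [A, B]: constant height 1 / (B - A), zero elsewhere."""
    return 1.0 / (B - A) if A <= x <= B else 0.0


total_area = integrate(density, A, B)   # requirement 2: should be 1.0
p_range = integrate(density, 4.5, 5.0)  # P(4h30min <= x <= 5h00min)
p_point = integrate(density, 4.6, 4.6)  # a single exact value has zero width

print(f"total area under the curve: {total_area:.4f}")
print(f"P(4.5 <= x <= 5.0):         {p_range:.4f}")
print(f"P(x = 4.6 exactly):         {p_point:.4f}")
```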

V. The Normal Probability Distribution

The normal distribution is the most important of all probability distributions.
Areas under the curve are obtained from the probability density function.
Characteristics:
1) Bell-shaped curve
2) Symmetrical shape around the mean, $\mu$
3) Mean = median = mode
4) Asymptotic curve (tails get closer and closer to the x-axis but never touch it).
Normal Probability Distribution, cont.

There are many NPDs that belong to a family of NPDs.
The shape of each NPD is determined by $\mu$ and $\sigma$.
Ex. There is only one NPD shape with $\mu = 10$ and $\sigma = 3$.
No matter what the values of $\mu$ and $\sigma$ are for a NPD, the total area under the curve is equal to one.
We can consider partial areas under the curve as representing probabilities.
Ex. What is the probability that a value falls above $\mu + 2\sigma$? Below $\mu + 3\sigma$? Etc.
Empirical Rules for NPD (See Page 52 in SMSS)
Area probabilities by standard deviation
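The areas within one, two, and three standard deviations of the mean can be computed rather than read from a chart. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the dictionary name `area_within` is introduced here.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area under the curve between -k and +k standard deviations from the mean.
area_within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, area in area_within.items():
    print(f"within {k} standard deviation(s) of the mean: {area:.4f}")

# Area above mean + 2 standard deviations (upper tail only).
print(f"above mean + 2 sd: {1 - standard_normal.cdf(2):.4f}")
```

Because every normal distribution can be rescaled to the standard normal, these areas apply to any NPD regardless of its mean and standard deviation.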
V. Standard Normal Probability Distribution

A normal distribution whose mean is zero and standard deviation is one is called the standard normal distribution.
Example for Calculating Normal Probabilities

The time required to build a computer is normally distributed with a mean of 50 minutes and a standard deviation of 10 minutes:
What is the probability that a computer is assembled in a time between 45 and 60 minutes? Or what is P(45 < X < 60)?
Example, cont. P(45 < X < 60)?

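Mean of 50 minutes and a standard deviation of 10 minutes
1. Convert x to z: $z = \frac{x - \mu}{\sigma}$
For x = 45: $z = \frac{45 - 50}{10} = -.5$
For x = 60: $z = \frac{60 - 50}{10} = 1$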
Example, cont.
We've converted P(45 < X < 60) for a normal distribution with mean = 50 and st. dev. = 10 to P(-.5 < Z < 1) [a standard normal distribution with mean = 0 and st. dev. = 1], meaning:
A z-value of x=45 is -.5, meaning that x=45 is .5 st. dev. below the mean
A z-value of x=60 is 1, meaning that x=60 is 1 st. dev. above the mean
So, where do we go from here? Find the probability for the areas between z=-.5 and z=0 and between z=0 and z=1. Based on the SNP table:
Area between z=-.5 and z=0: .5000 - .3085 = .1915
Area between z=0 and z=1: .5000 - .1587 = .3413
Both areas = .1915 + .3413 = .5328
Thus, the probability that a computer is assembled between 45 and 60 minutes is 53.28% (See graphs in next slides)
Example, cont.
P(-.5 < Z < 1) looks like this: the probability is the area under the curve.
We will add up the two sections: P(-.5 < Z < 0) and P(0 < Z < 1)
[Figure: standard normal curve with the area between z = -.5 and z = 1 shaded]
Example, cont.
Use Table A on page 527 to look up probabilities P(0 < Z < z)
We can break up P(-.5 < Z < 1) into: P(-.5 < Z < 0) + P(0 < Z < 1)
The distribution is symmetric around zero, so (multiplying by -1 and re-arranging the terms):
P(-.5 < Z < 0) = P(.5 > Z > 0) = P(0 < Z < .5)
Hence: P(-.5 < Z < 1) = P(0 < Z < .5) + P(0 < Z < 1) = .1915 + .3413 = .5328
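The table lookup for the computer assembly example can be cross-checked in code. This sketch uses `statistics.NormalDist` (Python 3.8 or later) in place of Table A; it computes the probability once on the original scale and once after converting x to z, and the variable names are chosen here for illustration.

```python
from statistics import NormalDist

MEAN, ST_DEV = 50, 10  # assembly time in minutes, from the example

# Directly on the original scale: P(45 < X < 60).
build_time = NormalDist(mu=MEAN, sigma=ST_DEV)
p_direct = build_time.cdf(60) - build_time.cdf(45)

# After converting x to z = (x - mean) / st. dev.: P(-.5 < Z < 1).
z_low = (45 - MEAN) / ST_DEV
z_high = (60 - MEAN) / ST_DEV
standard_normal = NormalDist()
p_from_z = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

# The two sections added up in the example.
left_section = standard_normal.cdf(0) - standard_normal.cdf(z_low)
right_section = standard_normal.cdf(z_high) - standard_normal.cdf(0)

print(f"z-values: {z_low} and {z_high}")
print(f"P(-.5 < Z < 0) = {left_section:.4f}")
print(f"P(0 < Z < 1)   = {right_section:.4f}")
print(f"P(45 < X < 60) = {p_direct:.4f}")
```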
Practices
Do exercises 1-3 on page 72.
Do Problems 11 and 17 on page 89.
VI. Sampling Distribution & Sampling Error

Read SMSS Pages 77-82
Class demonstration!
VII. Central Limit Theorem (CLT)

The CLT describes the shape of the distribution of sample means. It states that, in selecting a random sample of size n from a population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of $\bar{y}$ can be approximated by a N.P.D. with mean $\mu$ and standard error $\sigma_{\bar{y}} = \sigma/\sqrt{n}$, as n becomes large ($n \ge 30$).
Implications:
1: As n increases, $\sigma_{\bar{y}}$ decreases. Thus, $\bar{y}$ tends to be closer to $\mu$ for larger n, since the sampling dist. becomes less dispersed about $\mu$.
2: For large n, the sampling dist. of $\bar{y}$ is approx. normal, regardless of the shape of the pop. dist.
3: As n increases, the sampling dist. of $\bar{y}$ takes on a more normal shape.
4: We can use Table A to calculate the prob. that $\bar{y}$ is within $z\sigma_{\bar{y}}$ units of $\mu$.
5: Saying that for $n \ge 30$ the sampling dist. of $\bar{y}$ is normal does not mean that we have to select n = 30 for statistical inference. n should depend on the level of precision desired.
6: When the pop. has a normal P.D., the sampling dist. of $\bar{y}$ is a normal P.D. for any sample size.
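The standard error formula can be checked with a simulation. The sketch below assumes a deliberately skewed population, an exponential distribution with mean 1 and standard deviation 1; that choice and the helper name `simulate_sample_means` are illustrative assumptions, not part of the course material. It compares the observed spread of the sample means with $\sigma/\sqrt{n}$ for several sample sizes.

```python
import math
import random
import statistics

POP_MEAN = 1.0    # mean of the assumed exponential population
POP_ST_DEV = 1.0  # standard deviation of the same population


def simulate_sample_means(n, num_samples=5000, seed=1):
    """Draw num_samples random samples of size n and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]


# For each sample size, store (observed spread of sample means, sigma / sqrt(n)).
results = {}
for n in (5, 30, 100):
    means = simulate_sample_means(n)
    results[n] = (statistics.stdev(means), POP_ST_DEV / math.sqrt(n))

for n, (observed, theoretical) in results.items():
    print(f"n = {n:3d}: observed standard error = {observed:.4f}, "
          f"sigma / sqrt(n) = {theoretical:.4f}")
```

The observed spread shrinks as n grows and tracks $\sigma/\sqrt{n}$ closely, which is implication 1; plotting a histogram of `means` for each n would show the shape becoming more normal, which is implication 3.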
Exercise
1. Problem 10, page 88 in SMSS
2. Problem 17, page 89 in SMSS
3. Problem 31, page 92 in SMSS
4. Use census 1998 data:
1) Find measures of central tendency, dispersion, and shape of age of population
2) Each student draws a 5% random sample and redoes 1) above.
3) Compare the results
4) Redo 2) and 3) by using a 15% random sample.
