Final Handouts - Stat220

Probability Distributions
Random Variable: a function that assigns a real number to each element of the sample space
Probability Distribution: a table, formula, or graph listing all possible values that a random variable (rv) can take on, along with
the associated probabilities
Remark: We shall use an uppercase letter, say X, to denote a random variable and the corresponding lowercase
letter, x in this case, for one of its values.
Examples
1. In a random experiment of tossing a coin 3 times, let X be a random variable that pertains to the number of heads
in the outcome.
The possible outcomes in the sample space S and the corresponding values x are

Outcome (S)   HHH  HHT  HTH  THH  HTT  THT  TTH  TTT
x             3    2    2    2    1    1    1    0
Therefore, the probability distribution of X is given by

x                0     1     2     3
P(X=x) = f(x)    1/8   3/8   3/8   1/8
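This means that f(0) = P(X=0) = 1/8 is the probability of obtaining no heads in the random experiment,
f(1) = P(X=1) = 3/8 is the probability of obtaining exactly one head, and so on.
Notes:
1. f(x) = P(X=x)
2. The probabilities sum to 1: $\sum_{\text{all } x} f(x) = 1$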
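The coin-tossing distribution above can also be obtained by listing the sample space and counting heads. The following is a minimal Python sketch under that assumption; the function name heads_distribution is only illustrative and is not part of the handout.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_distribution(n_tosses):
    """Return {x: P(X = x)} for X = number of heads in n_tosses fair coin tosses."""
    # Every sequence of H/T is equally likely for a fair coin.
    outcomes = list(product("HT", repeat=n_tosses))
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

dist = heads_distribution(3)
for x, p in dist.items():
    print(x, p)
print("sum of probabilities:", sum(dist.values()))
```

Exact fractions are used so that the result can be compared directly with the table (1/8, 3/8, 3/8, 1/8) and with Note 2.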
2. A fair die is thrown. Assign random variable X = # of spots on any throw.
The probability distribution of X in tabular form is:

x       1     2     3     4     5     6
P(x)    1/6   1/6   1/6   1/6   1/6   1/6

with $\sum_{\text{all } x} P(x) = 1$.
[Figure: Probability Distribution in Graphical Presentation]
3. What is the probability distribution of number of girls in families with two children?
The random variable here (denoted by X) refers to the number of girls.
The possible outcomes are:
Outcomes   GG   BG   GB   BB
X          2    1    1    0

[Figure: bar chart of the probability distribution of X; vertical axis from 0 to 0.5]
How about family of three?

child #1   B   B   B   G   B   G   G   G
child #2   B   B   G   B   G   B   G   G
child #3   B   G   B   B   G   G   B   G
X          0   1   1   1   2   2   2   3

[Figure: bar chart of the probability distribution of X; vertical axis from 0 to 0.3]
How about a family of 10?
As family size increases, the distribution looks more and more normal.
[Figure: bar charts of the distribution; horizontal axis: Number of Successes]
Descriptive Characteristics of a RV: Mean and Variance

Expected Values (Mean) of a RV
Let X be a discrete rv with probability distribution

x           x1      x2      ...   xn
P(X = x)    f(x1)   f(x2)   ...   f(xn)

The mean or expected value of X is
$$\mu = E(X) = \sum_{i=1}^{n} x_i f(x_i)$$

Variance of a Random Variable
Let X be a discrete random variable with mean μ and probability distribution given by the above table.
The variance of X is
$$\sigma^2 = \mathrm{var}(X) = E(X - \mu)^2 = \sum_{i=1}^{n} (x_i - \mu)^2 f(x_i)$$
Computational Formula for σ²:
$$\mathrm{var}(X) = E(X^2) - [E(X)]^2, \quad \text{where } E(X^2) = \sum_{i=1}^{n} x_i^2 f(x_i)$$

Example
Compute the mean and variance of the following probability distribution, which represents the number of DVDs a person
rents from a video store during a single visit.

x       0      1      2      3      4      5
f(x)    0.06   0.58   0.22   0.10   0.03   0.01

Mean = 0*0.06 + 1*0.58 + 2*0.22 + 3*0.10 + 4*0.03 + 5*0.01 = 1.49
Variance = (0-1.49)^2*0.06 + (1-1.49)^2*0.58 + (2-1.49)^2*0.22 + (3-1.49)^2*0.10 + (4-1.49)^2*0.03 + (5-1.49)^2*0.01
= 0.8699
Or using the alternative formula
Variance = 0^2*0.06 + 1^2*0.58 + 2^2*0.22 + 3^2*0.10 + 4^2*0.03 + 5^2*0.01 - 1.49^2 = 3.09 - 2.2201 = 0.8699
Standard Deviation = square root of the variance = 0.932684
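The arithmetic in this example can be checked with a short Python sketch. The dictionary below holds the DVD distribution as recovered from the worked computation; the helper names (mean, variance, variance_shortcut) are illustrative assumptions, not part of the handout.

```python
import math

# DVD rental distribution: value x -> probability f(x)
dvd = {0: 0.06, 1: 0.58, 2: 0.22, 3: 0.10, 4: 0.03, 5: 0.01}

def mean(dist):
    """E(X) = sum of x * f(x)."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """var(X) = sum of (x - mu)^2 * f(x)."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def variance_shortcut(dist):
    """var(X) = E(X^2) - [E(X)]^2 (computational formula)."""
    ex2 = sum(x ** 2 * p for x, p in dist.items())
    return ex2 - mean(dist) ** 2

mu = mean(dvd)
var = variance(dvd)
print(round(mu, 2), round(var, 4), round(math.sqrt(var), 4))
```

Both variance functions should agree up to floating-point rounding, which mirrors the two hand computations above.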
SOME SPECIAL PROBABILITY DISTRIBUTIONS
Probability Distribution
  Discrete Probability Distribution
    Uniform Distribution
    Binomial Distribution
    Hypergeometric Distribution
    Poisson Distribution
  Continuous Probability Distribution
    Normal Distribution
Discrete Probability Distributions

A discrete random variable is a variable that can assume only a countable number of values.
Many possible outcomes:
- number of complaints per day
- number of TVs in a household
- number of rings before the phone is answered
Only two possible outcomes:
- gender: male or female
- defective: yes or no
- spreads peanut butter first vs. spreads jelly first

Continuous Probability Distributions
A continuous random variable is a variable that can assume any value on a continuum (an uncountable
number of values):
- thickness of an item
- time required to complete a task
- temperature of a solution
- height, in inches
These can potentially take on any value, depending only on the ability to measure accurately.
Exercise
Determine whether the following random variables are discrete or continuous. State possible values for the
random variable.
1. The number of light bulbs that burn out in a room of 10 light bulbs in the next year.
2. The number of leaves on a randomly selected Oak tree.
3. The length of time between calls to 911.
Discrete Probability Distributions
1. Uniform distribution
2. Binomial distribution
3. Hypergeometric distribution
4. Poisson distribution
1. Uniform Distribution
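$$f(x; N) = \frac{1}{N} \quad \text{for } x = a_1, a_2, \ldots, a_N$$
When the values are $1, 2, \ldots, N$:
$$E(X) = \frac{N + 1}{2}, \qquad V(X) = \frac{N^2 - 1}{12}$$
Example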
Suppose that an employee is selected at random from a staff of 10 to supervise a certain project. What is the
uniform distribution of X = employee number selected?
ANSWER:
Let X = employee number, with possible values {1, 2, ..., 10}.
Therefore, the probability distribution in formula form is f(x) = 1/10, x = 1, 2, ..., 10.
Exercises
1. Two clergymen are to be chosen at random from 4 clergymen A, B, C, D for a community mass. Find the
uniform distribution of X = selected pair of clergymen.
Answer: f(x) = 1/6, x = 1, 2, ..., 6 (one value for each of the 4C2 = 6 possible pairs)
2. Consider the random experiment of tossing a die where X = outcome of the die. Find the distribution of X.
Answer: f(x) = 1/6, x = 1, 2, ..., 6
2. Binomial Distribution
A binomial experiment is one that possesses the following properties:
1. The experiment consists of n repeated trials.
2. Each trial results in an outcome that may be classified as a success or a failure.
3. The probability of success, denoted by p, remains constant from trial to trial.
4. The repeated trials are independent.
The binomial random variable X is defined as the number of successes in n trials. Since it depends on the
number of trials and the probability of success on a given trial, the probability distribution of this random variable is
called the binomial distribution. The probability function or formula of the binomial distribution is
$$f(x) = {}_nC_x \, p^x q^{n-x}, \quad x = 0, 1, \ldots, n$$
where q = 1-p is the probability of failure.
$$E(X) = np, \qquad V(X) = npq$$
Example
From the previous example, the random experiment of tossing a coin 3 times is a binomial experiment since it
consists of 3 repeated tosses, each toss results in an outcome that may be classified as a success (when a head appears)
or a failure (when a tail appears), the probability of having a head is 1/2 for every trial, and lastly, each toss does not affect
the next toss.
The random variable X is defined as the number of heads in 3 tosses.
Applying the binomial formula, we have
f(0) = 3C0 (1/2)^0 (1/2)^3 = 1/8
f(1) = 3C1 (1/2)^1 (1/2)^2 = 3/8
f(2) = 3C2 (1/2)^2 (1/2)^1 = 3/8
f(3) = 3C3 (1/2)^3 (1/2)^0 = 1/8
This is the same answer as in the previous example.
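The binomial formula used above can be written as a small Python function. This is a sketch only; binomial_pmf is an illustrative name, and math.comb supplies the nCx term.

```python
from math import comb

def binomial_pmf(x, n, p):
    """f(x) = nCx * p^x * q^(n-x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

# Number of heads in 3 tosses of a fair coin (n = 3, p = 1/2)
coin = [binomial_pmf(x, 3, 0.5) for x in range(4)]
print(coin)  # [0.125, 0.375, 0.375, 0.125]

# Mean and variance computed from the pmf, to compare with np and npq
mean = sum(x * f for x, f in enumerate(coin))
var = sum((x - mean) ** 2 * f for x, f in enumerate(coin))
print(mean, var)
```

The printed values 0.125 and 0.375 are the decimals of 1/8 and 3/8, and the computed mean and variance can be compared with np = 1.5 and npq = 0.75.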
Exercises
1. Find the probability of obtaining exactly three 2s if an ordinary die is tossed 5 times.
Answer: 0.0322
2. A multiple-choice quiz has 10 questions, each with 4 possible answers of which only 1 is correct. What is the
probability that sheer guesswork yields from 5 to 6 correct answers?
Answer: 0.0746
3. Hypergeometric Distribution
Characteristics of a Hypergeometric Experiment:
a. A random sample of size n is selected from a population of N items.
b. k of the N items may be classified as successes and N-k are classified as failures.
$$f(x; N, n, k) = \frac{{}_kC_x \; {}_{N-k}C_{n-x}}{{}_NC_n}, \quad x = 0, 1, \ldots, \min(n, k)$$
$$E(X) = \frac{nk}{N}, \qquad V(X) = \frac{nk(N-k)(N-n)}{N^2(N-1)}$$
Example
A committee of size 5 is to be selected at random from 3 women and 5 men. Let X be the number of
women in the selected committee. What is the probability that there would be 2 women in the selected committee?
Answer: X ~ Hyp(N=8, n=5, k=3), since a committee of 5 is to be selected from 8 people and k=3 is the total number
of women.
Therefore, the probability distribution of X is
$$f(x) = \frac{{}_3C_x \; {}_5C_{5-x}}{{}_8C_5}, \quad x = 0, 1, 2, 3$$
and
$$P(\text{there are 2 women}) = P(X=2) = \frac{{}_3C_2 \; {}_5C_3}{{}_8C_5} = \frac{30}{56} = 0.5357$$
Exercises
1. From a class of 4 girls and 3 boys, 2 are to be selected at random. What is the probability that 2 girls are
selected?
Answer: 0.2857
2. A store has 10 iPhones of which 3 are defective. Five iPhones will be delivered to a customer. What is the
probability that there will be 2 defective iPhones in the delivery?
Answer: 0.4167
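The hypergeometric formula from this section can be sketched in Python as follows. The function name hypergeom_pmf and its argument order are assumptions chosen to match the notation f(x; N, n, k).

```python
from math import comb

def hypergeom_pmf(x, N, n, k):
    """f(x; N, n, k) = kCx * (N-k)C(n-x) / NCn."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Committee example: N = 8 people, n = 5 selected, k = 3 women, x = 2 women
committee = hypergeom_pmf(2, N=8, n=5, k=3)
print(round(committee, 4))  # 0.5357

# The probabilities over x = 0, 1, 2, 3 should add up to 1
total = sum(hypergeom_pmf(x, 8, 5, 3) for x in range(4))
print(total)
```

The same function can be used to check the exercise answers by changing N, n, k, and x.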
4. Poisson Distribution
A random variable X, the number of successes in a fixed interval, follows a Poisson process provided the
following conditions are met
1. The probability of two or more successes in any sufficiently small subinterval is 0.
2. The probability of success is the same for any two intervals of equal length.
3. The number of successes in any interval is independent of the number of successes in any other interval provided
the intervals are not overlapping.
$$f(x; \lambda) = \frac{e^{-\lambda} \lambda^x}{x!} \quad \text{for } x = 0, 1, 2, \ldots$$
$$E(X) = V(X) = \lambda$$
where λ is the average number of outcomes occurring in the given time interval or specified region.
Example
The number of typographical errors in new editions of textbooks varies considerably from book to book.
After some analysis, an instructor concludes that the number of errors is Poisson distributed with a mean of 1.5 typos per 100
pages. The instructor randomly selects 100 pages of a new book. What is the probability that there are no typos?
That is, what is P(X=0) given that λ = 1.5?
$$f(0) = \frac{e^{-1.5} (1.5)^0}{0!} = 0.2231$$
There is about a 22% chance of finding zero errors.
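The Poisson probability in this example can be checked with a short Python sketch. The name poisson_pmf and the parameter name lam (for λ) are illustrative choices.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """f(x; lambda) = e^(-lambda) * lambda^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

# Typographical errors example: lambda = 1.5 typos per 100 pages, x = 0
p_no_typos = poisson_pmf(0, 1.5)
print(round(p_no_typos, 4))  # 0.2231
```

Changing the arguments to (6, 4) or (5, 7) gives the probabilities asked for in the exercises that follow.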
Exercises
1. The average number of days school is closed due to typhoon during rainy season in a certain city is 4. What is the
probability that the schools in this city will close for 6 days during rainy season?
Answer: 0.1042
2. A certain bank receives an average of 7 bad checks per day. What is the probability that it will receive 5 bad
checks on any given day?
Answer: 0.1277
Special Continuous Probability Distribution
Normal Distribution
The most important probability distribution in the entire field of statistics is the normal distribution. Its
graph, called the normal curve, is a bell-shaped curve that describes many sets of data that occur in nature.
A normal random variable X is completely specified by two parameters: mean μ and variance σ².
Properties of the Normal Curve:
1. The mode, which is the point on the horizontal axis where the curve is a maximum, occurs at x = μ.
2. The curve is symmetric about a vertical axis through the mean μ.
3. The normal curve approaches the horizontal axis asymptotically as we proceed in either direction away
from the mean.
4. The total area under the curve and above the horizontal axis is equal to 1.
Unit/Standard Normal Distribution

To determine probabilities for a normal random variable, it is necessary that we perform a
change of scale, which converts the units of measurement from the original scale, the x scale, into standard units or z-scores by means of the formula
$$z = \frac{x - \mu}{\sigma}$$
The distribution of the variable Z is commonly known as the standard normal distribution, where the mean is
0 and the standard deviation is 1.
Note: We make use of standard normal table to evaluate probabilities involving normal random variables.
Shown below is a part of a standard normal table
Each entry is the area under the standard normal curve between 0 and z.

z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0    0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1    0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2    0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3    0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517

How to use:
To find P(0<z<0.34), look at the first column and locate 0.3, then on the first row locate 0.04 (i.e.,
0.3 + 0.04 = 0.34); the entry at the intersection is 0.1331. This is the probability of interest.
To find P(-0.34<z<0), note that there is no negative value of z in the table, but remember that the normal
curve is symmetric, i.e., P(-0.34<z<0) = P(0<z<0.34). Therefore, the probability of interest is also equal
to 0.1331.
To find P(-0.34<z<0.34), this is equal to P(-0.34<z<0) + P(0<z<0.34) = 0.1331 + 0.1331 = 0.2662.
To find P(0.11<z<0.34), this is equal to P(0<z<0.34) - P(0<z<0.11) = 0.1331 - 0.0438 = 0.0893.
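The table entries can be reproduced numerically. The sketch below assumes the standard identity P(0 < Z < z) = 0.5 * erf(z / sqrt(2)) and uses Python's math.erf; the function name area_0_to_z is illustrative.

```python
from math import erf, sqrt

def area_0_to_z(z):
    """Area under the standard normal curve between 0 and z (z >= 0)."""
    return 0.5 * erf(z / sqrt(2))

a34 = area_0_to_z(0.34)
a11 = area_0_to_z(0.11)
print(round(a34, 4))        # 0.1331, the table entry for z = 0.34
print(round(a34 - a11, 4))  # 0.0893, P(0.11 < Z < 0.34)
```

Because the table rounds each entry to four decimals, sums of table values can differ from a direct computation in the last digit.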
Example
If IQ scores are normally distributed with a mean of 100 and a standard deviation of 20, find the
probability of a person having an IQ score of at least 130.
ANSWER:
The random variable X follows a normal distribution with μ = 100 and σ = 20. We are asked to find P(X ≥ 130).
To use the standard normal table, we transform this into a z-score.
$$P(X \ge 130) = P\left(\frac{X - \mu}{\sigma} \ge \frac{130 - 100}{20}\right) = P(Z \ge 1.50)$$
From the normal table, P(0<z<1.50) = 0.4332.
Therefore, P(z ≥ 1.50) = 0.50 - 0.4332 = 0.0668.
Note: For a normal random variable, < is equivalent to ≤ and > is equivalent to ≥, because the probability of any single value is 0.
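The IQ example can be checked without a table. This is a minimal sketch that assumes the upper-tail area of a standard normal variable is 0.5 * erfc(z / sqrt(2)); normal_upper_tail is an illustrative name.

```python
from math import erfc, sqrt

def normal_upper_tail(x, mu, sigma):
    """P(X >= x) for X normal with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma          # change of scale to a z-score
    return 0.5 * erfc(z / sqrt(2))

p_iq = normal_upper_tail(130, mu=100, sigma=20)
print(round(p_iq, 4))  # 0.0668
```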
Exercise
1. Find the following areas under the standard normal curve:
a. to the left of z = 1.75
b. to the left of z = 2.33
c. to the right of z = 2.33
d. to the right of z = 1.23
e. to the left of z = -0.68
f. to the left of z = -2.58
g. to the right of z = -0.45
h. to the right of z = -1.25
i. between z = 0 and z = 2.40
j. between z = -2.30 and z = 0
k. between z = -0.58 and z = 3.4
l. between z = 1.23 and z = 2.28
2. Find z if
a. the area between 0 and z is .4922
b. area to the left of z is .9192
c. area to the right of z is .9974
d. area between -z and z is .6826
e. area between z and 0 is .4864
3. Word Problem
a. A certain type of storage battery lasts on the average 3.5 years with a standard deviation of 0.5 year.
Assuming that the battery lives are normally distributed, find the probability that a given battery will last
less than 2.6 years.
b. The quality grade-point averages of 300 college freshmen approximately follow a normal distribution
with a mean of 2.1 and a standard deviation of 0.5. How many of these freshmen would you expect to have
a grade-point average between 2.0 and 3.0, inclusive, if the grade-point averages are computed to the nearest tenth?
Something to Read
Biography of Blaise Pascal (1623-1662): Solver of the Chevalier's Dilemma
As a precocious child growing up in France, Blaise Pascal showed an early inclination toward mathematics.
Although his father would not permit Pascal to study mathematics before the age of 15 (removing all math texts
from his house), at age 12 Blaise discovered on his own that the sum of the angles of a triangle is two right angles.
Pascal went on to become a distinguished mathematician, as well as a physicist, theologian, and the inventor of the
first digital calculator. Most historians attribute the beginning of the study of probability to the correspondence
between Pascal and Pierre de Fermat in 1654. The two solved the Chevalier's Dilemma, a gambling problem
related to Pascal by his friend and Paris gambler the Chevalier de Mere. The problem involved determining the
expected number of times one could roll two dice without throwing a double 6. (Pascal proved that the break-even
point was 25 rolls.)
Biography of Carl Gauss (1777-1855): The Gaussian Distribution
The normal distribution began in the eighteenth century as a theoretical distribution for errors in disciplines where
fluctuations in nature were believed to behave randomly. Although he may not have been the first to discover the
formula, the normal distribution was named the Gaussian distribution after Carl Friedrich Gauss. A well-known and
respected German mathematician, physicist, and astronomer, Gauss applied the normal distribution while studying
the motion of planets and stars. Gauss's prowess as a mathematician was exemplified by one of his most important
discoveries. At the young age of 22, Gauss constructed a regular 17-gon by ruler and compasses, a feat that was the
major advance in mathematics since the time of the ancient Greeks. In addition to publishing close to 200 scientific
papers, Gauss invented the heliograph as well as a primitive telegraph device.
CHAPTER VI. INFERENTIAL STATISTICS
Inferential statistics is an area of Statistics that deals with methods used to make generalizations/inferences
about some characteristics of the population based on information contained in a sample. It is concerned with
making decisions or predictions about the parameters of the population such as the population mean μ and
population variance σ².
Types of Inferences
1. Estimation - estimating the value of the parameter of interest (point estimation and confidence interval
estimation)
- the objective of estimation is to determine the approximate value of a population parameter on the
basis of a sample statistic
2. Hypothesis testing - making decisions on whether or not the sample agrees with the researcher's
assertion regarding some characteristic of the population
- the objective is to make statement(s) regarding unknown population parameter values based on
sample data
Inferential Statistics
  Estimation
    Point Estimation
    Confidence Interval Estimation
  Hypothesis Testing
    Hypothesis of No Difference
    Hypothesis of No Relationship
Example
Research Problem: How effective is Paracetamol in treating headache?
Specific Objectives:
1. to estimate the population average length of time until the headache is gone after paracetamol intake
2. to determine whether treatment using paracetamol is better than the existing treatment that is known to
eliminate headache in 15 minutes after intake
Question: How do we achieve these objectives using inferential statistics?

To achieve objective 1, we use estimation and for objective 2 use hypothesis testing.
Point & Interval Estimation
1. Point Estimation: based on sample data, a single number is calculated to estimate the population parameter;
the resulting number is called a point estimate.
θ = parameter (unknown)
estimator θ̂ = 3 (point estimate)
2. Confidence Interval Estimation: based on sample data, two numbers are calculated to form an interval,
consisting of a lower limit and an upper limit, within which the parameter is expected to lie with
probability (1-α)100 percent; the resulting pair of numbers is called an interval estimate or a confidence
interval.
θ = parameter (unknown)
(2.5, 4) = confidence interval estimate
For example, suppose we want to estimate the mean rating of students on the food quality of a canteen. For n=25
students, the sample mean x̄ is calculated to be 89% out of 100% (perfect rating).
x̄ is called the point estimator while its specific value, 89%, is called the point estimate.
An alternative statement is:
The mean rating is between 85% and 93%.
(85%, 93%) is called an interval estimate.
Characteristics of a Good Point Estimate
1. The sampling distribution of the point estimator should be centered over the true value of the parameter to be
estimated. That is, the estimator should not consistently underestimate or overestimate the parameter of interest
(an unbiased estimator). An estimator is said to be unbiased if the mean of its distribution is equal to the value of the
parameter.
2. The spread (measured by variance) of the sampling distribution should be as small as possible. This ensures that,
with high probability, an estimate will fall close to the true value of the parameter.
Error of estimation: the difference between the estimate and the true value of the parameter
Margin of error: provides a practical upper bound for the error of estimation
- usually of the form z_{α/2}*SE, where SE is the standard error
Four commonly used confidence coefficients (1-α) and the corresponding values of z_{α/2}

1-α     α       α/2      z_{α/2}
0.90    0.10    0.05     1.645
0.95    0.05    0.025    1.96
0.98    0.02    0.01     2.33
0.99    0.01    0.005    2.575
Remarks:
1. The unbiased estimator of the population mean μ is the sample mean x̄.
2. The unbiased estimator of the population variance σ² is the sample variance s².
3. The standard error SE of the sample mean is calculated as s/√n.
Example
The heights in centimeters of a random sample of 5 basil plants are 14.6, 12.5, 15.3, 16.1 and 14.4. Find a
point estimate for the mean height of all basil plants.
Since a good point estimate for the population mean height of basil plants is the sample mean height,
$$\bar{x} = \frac{14.6 + 12.5 + 15.3 + 16.1 + 14.4}{5} = 14.58$$
Therefore, 14.58 cm is the estimated mean height of all basil plants.

Confidence Interval Estimation
When the sampling distribution of a point estimator is approximately normal, an interval estimator or
confidence interval (CI) can be constructed using the following reasoning. Assume that a 95% CI is to be
constructed.
1. For the standard normal variable Z, 95% of all values lie between -1.96 and 1.96.
2. For an unbiased point estimator with a normal sampling distribution, 95% of all point estimates lie within
1.96*SE of the parameter of interest.
3. The CI is: point estimate ± margin of error, where the margin of error is z_{α/2}*SE.
In general, a confidence interval can be established at any confidence coefficient (1-α).
From the table of commonly used confidence coefficients, if we want a 90% CI then z_{α/2} = 1.645.
If we want a 99% CI then z_{α/2} = 2.575.
Two Desirable Characteristics of a Good CI:

1. The CI must be as narrow as possible. The narrower the interval, the more exactly you have located the
estimated parameter.
2. The confidence coefficient must be near to 1. The larger the confidence coefficient, the more likely it is that
the interval will contain the estimated parameter.
Note: As the confidence coefficient increases, the CI becomes wider.
Confidence Interval Estimator for Population Mean
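1. A (1-α)100% large sample CI for a population mean μ
$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = \left( \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)$$
2. A (1-α)100% small sample CI for a population mean μ
$$\bar{x} \pm t_{\alpha/2,\,n-1} \frac{s}{\sqrt{n}} = \left( \bar{x} - t_{\alpha/2,\,n-1} \frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1} \frac{s}{\sqrt{n}} \right)$$
Where: t_{α/2,n-1} will come from a t-table (see Appendix Table 2)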

A part of the t-table (column headings are upper-tail areas)

df    0.05    0.025    0.01     0.005
1     6.31    12.71    31.82    63.66
2     2.92    4.30     6.96     9.92
3     2.35    3.18     4.54     5.84
4     2.13    2.78     3.75     4.60
5     2.02    2.57     3.36     4.03
6     1.94    2.45     3.14     3.71

Example: if α = .05 and n = 7 then t_{α/2,n-1} = t_{.025,6} = 2.45

Remarks:
1. Large sample CI means n ≥ 30 while small sample CI means n < 30.
2. If the population standard deviation σ is unknown and the sample size n ≥ 30, use the sample standard deviation s to
estimate σ and use the formula for the large sample CI.
3. The t_{α/2} value is found in the t-table with n-1 degrees of freedom.
4. A larger confidence level produces a wider confidence interval, which provides less information.
5. Larger values of σ produce wider confidence intervals.
6. Increasing the sample size decreases the width of the confidence interval while the confidence level can remain
unchanged, but this also increases the cost of obtaining additional data.
Example
1. In order to use our confidence interval estimator, we need the following pieces of data:
x̄ = 370.16 (calculated from the data)
z_{α/2} = 1.96 (for a 95% CI)
σ = 75 and n = 25 (given)
The large sample CI formula is used because σ is known/given.
Therefore
$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 370.16 \pm 1.96 \cdot \frac{75}{\sqrt{25}} = 370.16 \pm 29.40 = (340.76,\ 399.56)$$
The margin of error is 29.40.
The lower and upper confidence limits are 340.76 and 399.56.
To interpret, we are 95% confident that the population mean lies between 340.76 and 399.56.
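The interval in this example can be reproduced with a short Python sketch of the large sample CI formula. The function name z_confidence_interval is an assumption for illustration; the z value is passed in from the table rather than computed.

```python
from math import sqrt

def z_confidence_interval(xbar, sigma, n, z):
    """Return (lower, upper, margin) for xbar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin, margin

lower, upper, margin = z_confidence_interval(xbar=370.16, sigma=75, n=25, z=1.96)
print(f"{margin:.2f} ({lower:.2f}, {upper:.2f})")  # 29.40 (340.76, 399.56)
```

Passing z = 1.645 or z = 2.575 instead gives the 90% and 99% intervals for the same data, which shows how the width grows with the confidence level.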
2. A simple random sample of 5 months of food sales data provided the following information:
Units sold in each of the 5 sampled months: 94, 100, 85, 94, 92
To construct a 90% CI for the population mean food sales per month, we need the following data:
x̄ = 93, s = 5.39, n = 5
t_{α/2,n-1} = t_{0.05,4} = 2.132
Using the formula for the small sample CI, we have
$$93 \pm 2.132 \cdot \frac{5.39}{\sqrt{5}} = (87.86,\ 98.14)$$
Therefore, a 90% CI for the population mean food sales per month is from 87.86 to 98.14.
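The small sample CI for the food sales data can be sketched in Python using the standard library's statistics module. The t value 2.132 is taken from the example rather than computed, and t_confidence_interval is an illustrative name. Because the code keeps s unrounded (about 5.385 instead of 5.39), the limits may differ from the hand computation in the second decimal place.

```python
from math import sqrt
from statistics import mean, stdev

def t_confidence_interval(data, t_value):
    """Return (xbar, s, lower, upper) for xbar +/- t * s / sqrt(n)."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)               # sample standard deviation (divisor n - 1)
    margin = t_value * s / sqrt(n)
    return xbar, s, xbar - margin, xbar + margin

sales = [94, 100, 85, 94, 92]
xbar, s, lower, upper = t_confidence_interval(sales, t_value=2.132)
print(xbar, round(s, 2))
print(round(lower, 2), round(upper, 2))
```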
Exercises
1. Forty-seven sedentary men were studied over a one-year period. These men tried to lose weight through exercise
alone. The results of the study showed that the mean weight loss of these men was 0.7 kilograms with a standard
deviation of 4.8. Find the 95% confidence interval for the population mean and interpret the result.
2. Assume that the monthly salary of a food technologist is normally distributed. Find a 99% confidence interval
for the population mean monthly salary and interpret it if the observed salaries of 5 food technologists are as
follows:
22,130
18,465
25,616
22,440
19,869
Hypothesis Testing
The goal is to make statement(s) regarding unknown population parameter values based on sample data.
A hypothesis is a claim (assumption) about a population parameter, such as the
population mean.
Example: The mean weekly expenditure on food in this city is μ = $142.
Review of Statistical Notations:

                      Parameters (population)    Statistics (sample)
Mean                  μ                          x̄
Variance              σ²                         s²
Standard Deviation    σ                          s
Proportion            p                          p̂
Elements of Hypothesis Testing

The Hypotheses
Null hypothesis (Ho)
- the statement being tested; it represents what the experimenter doubts to be true
Alternative hypothesis (Ha)
- the operational statement of the theory that the experimenter believes to be true
and wishes to prove
Note: The null hypothesis and alternative hypothesis must be non-overlapping statements about the
population.
Example
Consider a test by a light bulb manufacturer to examine the life of a new long-life bulb it hopes to market.
The leading brand of bulb in the market has a mean burning time of 2000 hours. For advertising purposes, the
manufacturer wishes to prove that the new bulb has a longer mean burning time.
Let μ = mean burning time of the new long-life bulb
Statement of hypotheses: Ho: μ = 2000 vs Ha: μ > 2000
The Test Statistic
- a statistic computed from the sample data that is especially sensitive to the differences between Ho and Ha
- The test statistic should tend to take on certain values when Ho is true and different values when Ha is true.
- The decision to reject Ho depends on the value of the test statistic.
- A decision rule based on the value of the test statistic: Reject Ho if the computed value of the test
statistic falls in the region of rejection.
Region of Rejection/Rejection Region
- the set of all values of the test statistic which will lead to the rejection of Ho
Factors that Determine the Region of Rejection
- the behavior of the test statistic if the null hypothesis were true
- the alternative hypothesis: the location of the region of rejection depends on the form of Ha
- level of significance (α): the smaller α is, the smaller the region of rejection
Level of Significance and the Rejection Region
Level of Significance = α

Ho: μ = μo    Ha: μ < μo    Left-tailed test (also called lower tail test)
Ho: μ = μo    Ha: μ > μo    Right-tailed test (also called upper tail test)
Ho: μ = μo    Ha: μ ≠ μo    Two-tailed test

Level of Significance (α)
- the size of the risk (0 < α < 1) of erroneously rejecting Ho that the researcher is willing to take
A Summary of Possible Decisions in Hypothesis Testing and their Chances of Occurrence

                    True Situation
Decision            Ho is true                      Ho is false
Reject Ho           TYPE I error                    CORRECT decision
                    chance of occurrence = α        chance of occurrence = 1-β
                    (level of significance)         (power of the test)
Do not reject Ho    CORRECT decision                TYPE II error
                    chance of occurrence = 1-α      chance of occurrence = β
The experimenter is free to determine α. If the test leads to the rejection of Ho, the researcher can then
conclude that there is sufficient evidence supporting Ha at the α level of significance.
1-β = power of the test
We would like α and β to be as small as possible, but unfortunately, there is an inverse relationship between the two. The test
procedures that we will study are developed by minimizing β for a chosen α.
The choice of α usually depends on the consequences associated with making a Type I error.

Common Choices of α    Consequences of Type I error
0.01 or smaller        very serious
0.05                   moderately serious
0.1                    not too serious

Because of the inverse relationship of α and β, setting a very small α should also be avoided if the
researcher cannot afford a very large risk of committing a Type II error.
Guidelines for Testing Hypothesis

1. State the Ho and Ha.
2. Determine the test statistic to use and the tail in distribution where the rejection region is to be found.
3. Decision rule
4. Computation
5. Decision
6. Conclusion/Interpretation, answering the problem.
Note: Steps 3 and 4 are sometimes interchanged in some books.
Tests Concerning One Population Mean
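Ho: μ = μo (where μo is a specified value of the population mean)

Test statistic, large-sample test (z-test):
$$z = \frac{\bar{x} - \mu_o}{\sigma / \sqrt{n}}$$
Test statistic, small-sample test (t-test):
$$t = \frac{\bar{x} - \mu_o}{s / \sqrt{n}}$$

Ha         Decision rule (z-test): reject Ho if    Decision rule (t-test): reject Ho if
μ > μo     z > z_α                                 t > t_{α,n-1}
μ < μo     z < -z_α                                t < -t_{α,n-1}
μ ≠ μo     |z| > z_{α/2}                           |t| > t_{α/2,n-1}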
Example
Test the claim that the true mean # of gas stoves in US homes is less than 3. (Assume σ = 0.8)
Solution
1. Formulate the appropriate null and alternative hypotheses
Ho: μ = 3
Ha: μ < 3
2. This is a left-tailed test
Suppose that α = .05 is chosen for this test
3. Determine the rejection region
This is a one-tailed test with α = .05.
Since σ is known, the cutoff value is a z value:
Reject Ho if z < -z_α = -1.645; otherwise do not reject Ho
4. Computation of test statistic.
Suppose a sample is taken with the following results:
n = 100, x̄ = 2.84 (σ = 0.8 is assumed known)
Then the test statistic is:
$$z = \frac{\bar{x} - \mu_o}{\sigma / \sqrt{n}} = \frac{2.84 - 3}{0.8 / \sqrt{100}} = \frac{-0.16}{0.08} = -2.0$$
5 and 6. Reach a decision and interpret the result
Since z = -2.0 < -1.645, we reject the null hypothesis and conclude that, at the 5% level of significance, we have
sufficient evidence to say that the mean number of gas stoves in US homes is less than 3.
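The steps of this left-tailed z-test can be sketched in Python. The function names z_statistic and reject_left_tailed are illustrative assumptions, and the critical value 1.645 is taken from the example rather than computed.

```python
from math import sqrt

def z_statistic(xbar, mu0, sigma, n):
    """Large-sample test statistic z = (xbar - mu0) / (sigma / sqrt(n))."""
    return (xbar - mu0) / (sigma / sqrt(n))

def reject_left_tailed(z, z_alpha):
    """Decision rule for Ha: mu < mu0, reject Ho if z < -z_alpha."""
    return z < -z_alpha

z = z_statistic(xbar=2.84, mu0=3, sigma=0.8, n=100)
decision = reject_left_tailed(z, z_alpha=1.645)
print(round(z, 2), decision)  # -2.0 True
```

For a right-tailed or two-tailed test, only the decision rule changes (z > z_α, or |z| > z_{α/2}); the test statistic is computed the same way.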
