Introduction
Most people think of statistics as the study of the numerical features of a subject or population. Statisticians mean the
same thing, but they also emphasize the methods of collecting data, summarizing and presenting data, and drawing
inferences from data.
We all see on TV how political pundits justify opposing points of view by presenting statistics from respectable sources.
How could something be a science when it justifies two opposing points of view? The answer is that statistics has a
scientific basis but it can be misrepresented in use.
Example. During the saga of President Clinton's impeachment, we observed the following:
1. One pundit says that, according to statistics, the majority of Americans think that character matters.
2. The other pundit says, also according to statistics, that the majority of Americans think the president is doing a
good job.
The implication here is that one of them was "wrong." But the science of statistics says that both were correct. Data was
collected and analyzed, and it was found that the majority of Americans think that character matters and that the
majority of Americans think the president is doing a good job. It does not matter to the science of statistics which one of
the statistically established facts you or I want to believe.
Another point about the nature of statistics as a science is that it is not a deterministic science. It does not have laws
like force is equal to mass times acceleration. Statements in statistics come with a probability (i.e., quantified chance) of
being correct. When a weatherman says that it will rain today, he means that there is, say, a ninety-five percent chance
that it will rain today. Roughly, this means that if he makes the same prediction one hundred times, he will be correct about 95
times, and it will not rain on the other 5 days. The problem is that sometimes a weatherman will leave out the information that
the chance is only 95 percent. Such information hiding is sometimes done for simplicity.
Before I conclude this introduction, let me tell you an interesting anecdote about the development of this subject. When
the proposal to establish the Indian Statistical Institute in Calcutta was considered by the government of India in the early
part of the last century, some critics asked, "Then why not an institute in astrology?" At the inception of statistics as a science
there was a lot of skepticism about its scientific validity. Those days are gone, and statistics is not likened to astrology any
more! Statistics is a well-founded and precise science. It is a nondeterministic science in nature; it makes precise
probabilistic statements only.
In this course we will be talking about two branches of statistics. The first one is called descriptive statistics and deals
with methods of processing, summarizing, and presenting data. The other part deals with the scientific methods of
drawing inferences and forecasting from the data, and is called inferential or inductive statistics.
In the rest of this lesson and the next we deal with descriptive statistics, which includes the presentation of data in the
form of tables, graphs, and computations of various averages of data.
In statistics we use a small representative "sample" to study a big "population." The reason for this is the cost or even the
impossibility of studying the whole population.
Definitions. A complete collection of data on the group under study is called the population or the universe.
A member of the population is called a sampling unit. Therefore, the population consists of all its sampling units.
Example. Suppose we are studying the daily rainfall in Lawrence. Since daily rainfall could be from 0 inches to anything
above 0, the population here is all nonnegative numbers (i.e., the interval [0, ∞)). A sample from this population would
be the observed amount of daily rainfall in Lawrence on some number of days. A sample of size 11 would be the observed
daily rainfall in Lawrence on 11 days.
Variables
Many definitions of variables are available in standard textbooks. For our purpose the following definition will suffice.
Definition. A variable is a rule or a formula or a mechanism that associates a value with each member of the
population. So, given a member w, a variable X assigns a value X(w) to w. For us X(w) will be a characteristic (like height,
weight, time, salary) of the population.
Example. Suppose we are studying the KU student population. The population is the whole collection of KU students. A
KU student is a sampling unit. If GPA is the "characteristic" that we are studying, then X = the GPA of a student is a
variable. So, given a student, X has a value. For example:
To give another example, if credit hours completed is the characteristic studied, T = the number of course credit hours
completed so far by a student is a variable.
Similarly, given any other characteristic like weight, annual income, annual expenditure, you can construct a variable for
this population.
A variable that takes numerical values is called a quantitative variable. So, the variables X, Z, and T above are
quantitative variables, while Y is not. A variable that takes non-numerical values is called a qualitative variable. So, the
variable Y above is a qualitative variable. We will mostly be concerned with quantitative variables.
We discuss two types of quantitative variables: continuous and discrete variables. A quantitative variable that can assume
any numerical value over an interval is called a continuous variable. Since Z above can (hypothetically) assume any
value between 0 and 100 inches, Z is a continuous variable. T assumes only integer values and is therefore not a
continuous variable.
A quantitative variable is called a discrete variable if there are breaks between its successive possible values. A
different way to understand a discrete variable is that the possible values of the variable can be written down (or can be
counted) in a (finite or infinite) list. We say that the values of a discrete variable are countable.
If a variable assumes only a finite number of values, then it is also called a finite variable. Otherwise the variable is called
an infinite variable. A finite variable is definitely a discrete variable. The variable T above is a discrete variable.
1. Examples of continuous variables are weight, length, volume, area, and time.
2. For this course, examples of discrete variables are always the number of something—number of typos, number of
road accidents, number of phone calls.
Parameters and Statistics
Definition 1. Given a set of data, any numerical value computed from the data using a formula or a rule is called
a quantitative measure of the data.
Definition 2. A quantitative measure of population data is called a parameter. In other words, parameters belong to
the whole population and are computed (if feasible) from the WHOLE population data. Examples: the average GPA of all
KU students, the height of the tallest student in KU, the average income of the entire KU student population.
One way to study a population is to know some of the parameters of the population. Unfortunately, computing such
parameters could be expensive or even impossible. Essentially, parameters are unknown and the main game of
statistics is to try to estimate parameters on the basis of small samples collected from the population.
Definition 3. A quantitative measure of sample data is called a statistic. So, any constant that we compute from a
sample is a statistic. We use these statistics to estimate the parameters of the population. For example, the average
height computed from a sample is a reasonable estimate for the (parameter) average height of the KU student
population. Obviously, we do not expect the value of the statistic to be exactly equal to the parameter value.
Hopefully, the error will be small or will exceed our tolerable limit very rarely (say once in 100 trials).
Sometimes it will be impossible to know the actual value of a parameter. For example, let μ be the mean lifetime of the
light bulbs produced by a company. The company cannot test all the bulbs it produces to find this mean
lifetime. So, the best it can do is to test a few bulbs, compute the sample mean lifetime (a statistic) of these bulbs,
and use it as an estimate for the mean lifetime (parameter μ) of all the bulbs it produces.
Definition 4. The data that has not been processed or organized in any form is called raw data. When the data is
arranged in an increasing or decreasing order, then it is called an array. The range of the data is the difference between
the largest and the smallest value of the data.
In this section we talk about representation of data organized in tabular form. Such a representation is called
a frequency distribution. We are mostly concerned with numerical data (i.e., quantitative data), but also consider
some non-numerical data (i.e., qualitative data).
Example. (from Khazanie, p. 18) The following is data on the blood group of 36 patients in a hospital:
O A B O A A A O O
O A O A B O O O AB
B A A O O A A O AB
O A A B A O A O O
We have four types of blood groups, namely, O, A, B, AB. Each of these blood groups may be referred to as a "class."
The frequency of a class is defined as the number of data members that belong to that class. For example, the frequency
of the class O is 16; the frequency of class A is 14. A table that lists the classes and the corresponding frequency is called
the frequency distribution of this qualitative data. Following is the frequency distribution of this data:
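As a quick check on the class frequencies quoted above, the counting can be sketched in Python. This is only an illustration; the variable names are my own, and the data is copied row by row from the example.

```python
from collections import Counter

# Blood groups of the 36 patients, copied row by row from the example.
blood_groups = (
    "O A B O A A A O O "
    "O A O A B O O O AB "
    "B A A O O A A O AB "
    "O A A B A O A O O"
).split()

# Counter maps each class (blood group) to its frequency.
frequency = Counter(blood_groups)

print("Class  Frequency")
for blood_group in ("O", "A", "B", "AB"):
    print(f"{blood_group:<5}  {frequency[blood_group]}")
print(f"Total  {sum(frequency.values())}")
```

The four class frequencies add up to the number of patients, which is a useful sanity check on any frequency distribution.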
Ungrouped Data
For the quantitative data, we consider two types of frequency table. When we are working with a large set of data we
group that data into a few classes and construct a "frequency table," which we will discuss later. If the data set is small or
if the number of values that appear in the data is small we need not group the data. Instead, we make a list of all the
data members and give the corresponding frequency for each data member in a table. The number of times a data
member (i.e., value) appears in the data is called the frequency of the data member. A list that presents the data
members and the corresponding frequency in a tabular form is called a frequency table or frequency distribution.
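The relative frequency and percentage frequency of a data member x are defined as follows:
\[ \text{relative frequency of } x = \frac{\text{frequency of } x}{\text{size of the data}} \]
and
\[ \text{percentage frequency of } x = \frac{\text{frequency of } x}{\text{size of the data}} \times 100. \]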
The frequency table may also contain the relative and percentage frequency. Since we did not group the data into a few
classes, we call this the frequency distribution of the ungrouped data.
Example 1.2.1 To estimate the mean time taken to complete a three-mile drive by a race car, the race car did several
time trials, and the following sample of times taken (in seconds) to complete the laps was collected:
50 48 49 46 54 53 52 51 47 56 52 51
51 53 50 49 48 54 53 51 52 54 54 53
55 48 51 50 52 49 51 53 55 54 50
Note that there are 35 observations here. So we say that the size of the sample (or data) is 35. Also the values present
are 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56. Since there are only 11 distinct values present we can make a frequency
table for the ungrouped data. The following is the frequency distribution of this ungrouped data:
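One possible way to build such a table for the ungrouped lap times is sketched below in Python. The layout of the printed table is my own choice; the relative frequency is shown as a fraction of the sample size and the percentage frequency is rounded to two decimals.

```python
from collections import Counter

# Lap times (in seconds) from Example 1.2.1.
times = [
    50, 48, 49, 46, 54, 53, 52, 51, 47, 56, 52, 51,
    51, 53, 50, 49, 48, 54, 53, 51, 52, 54, 54, 53,
    55, 48, 51, 50, 52, 49, 51, 53, 55, 54, 50,
]

n = len(times)              # size of the sample
frequency = Counter(times)  # value -> number of times it appears

print("Time  Frequency  Relative  Percentage")
for value in sorted(frequency):
    f = frequency[value]
    print(f"{value:>4}  {f:>9}  {f}/{n:<6}  {100 * f / n:>9.2f}")
```

Sorting the distinct values before printing mirrors what the calculator steps later in this lesson do with "sortA."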
Grouped Data
When we are working with a large set of data that has too many distinct data members (i.e., values), we group the
whole set of data into a few class intervals and give the corresponding "frequency" of the class. When the data is
presented in this way, the data is called grouped data. The number of data members that fall in a class interval is called
the class frequency and the relative and percentage frequencies are computed by the same formulas as above. A list
that gives various class intervals and the corresponding class frequencies in a tabular form is called a class frequency
table or class frequency distribution of the data. The frequency distribution may also include the relative and
percentage frequencies.
Sometimes it is convenient or necessary to group data into class intervals and construct a class frequency distribution.
This is the case when there are too many distinct numbers present in the data—too many even to fit into a simple table
on a page for presentation. In such situations, we group the data in a few class intervals. While class frequency
distribution is very good for presentation and convenient for other reasons, we lose a lot of information in this process.
There is no way we can recover the original data from the class frequency distribution.
Given a set of data, a good question would be, How many class intervals should we have? The answer is that it should not
be too few nor should it be too many. If we take too few (say one), then all the information will be lost. On the other
hand, if we take too many, we will have the problem of having to work with ungrouped data. (In this course we will
always tell you how many classes to take.) Although sometimes it may be necessary to take class intervals of varying
width, in this course we only consider classes of equal class width.
1. Range: Pick a suitable number L less than or equal to the smallest value present in the data. Pick a suitable
number H greater than or equal to the highest value present in the data. The range R that we consider is R = H - L.
2. Number of Classes: Decide on a suitable number of classes. (In this course we will tell you the number of
classes.)
3. Class Width: The class width is
\[ \text{class width} = w = \frac{R}{\text{number of classes}}. \]
We will pick L, H, and the number of classes so that the class width is a "round number."
4. Classes: The classes are the intervals [L, L+w], [L+w, L+2w], ..., [H-w, H].
Since this definition creates an ambiguous situation in which a data value may fall into two classes, we need a
convention to address this situation.
5. Frequency: Find the frequency for each of the classes. You can use an advanced calculator or some software
(like Excel) to count frequencies.
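The steps above can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the function name is hypothetical, the choices L = 45, H = 57 and 4 classes for the lap-time data are made up for the example, and a value on a class boundary is counted in the upper class (one common convention).

```python
def class_frequencies(data, low, high, num_classes):
    """Count the data values in each of num_classes equal-width classes.

    A value on an interior boundary is counted in the upper class;
    the value `high` itself is counted in the last class.
    """
    width = (high - low) / num_classes
    counts = [0] * num_classes
    for value in data:
        if not low <= value <= high:
            raise ValueError(f"{value} lies outside [{low}, {high}]")
        # Floor division finds the class; min() keeps `high` in the last class.
        index = min(int((value - low) // width), num_classes - 1)
        counts[index] += 1
    intervals = [(low + i * width, low + (i + 1) * width) for i in range(num_classes)]
    return intervals, counts


# Lap times (in seconds) from Example 1.2.1.
times = [
    50, 48, 49, 46, 54, 53, 52, 51, 47, 56, 52, 51,
    51, 53, 50, 49, 48, 54, 53, 51, 52, 54, 54, 53,
    55, 48, 51, 50, 52, 49, 51, 53, 55, 54, 50,
]

# Hypothetical choices for illustration: L = 45, H = 57, 4 classes, so w = 3.
intervals, counts = class_frequencies(times, low=45, high=57, num_classes=4)
for (lower, upper), count in zip(intervals, counts):
    print(f"{lower:g} - {upper:g}: {count}")
```

With a class width that is not a round number, floating-point division can misplace values that sit exactly on a boundary, which is one more reason to pick L, H, and the number of classes so that w is round.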
A few more important definitions. The above intervals are called class intervals. The w above is called the
class size or width. The lower end of the class is called the lower limit and the upper end of the class is called the upper limit.
The class mark is the midpoint of the class, defined as follows:
\[ \text{class mark} = \frac{\text{lower limit} + \text{upper limit}}{2}. \]
A class limit is also called a class boundary. I took a slightly different approach when I defined the classes, so that for
us class limits and class boundaries are the same. Although all the approaches are essentially the same, many slightly
different approaches are possible depending on the situation.
Example 1.2.2 The following is the weight (in ounces), at birth, of a certain number of babies.
We will construct a class frequency table of this data by dividing the whole range of data into class intervals.
Solution: Note that the lowest value is 62 and the highest value is 156. We take L = 60, H = 160, so R = H - L = 100. We
made this choice of L and H precisely so that R = 100 is a "nice" number. Now we decide to have 5 class intervals, and
so w = R/5 = 20. According to what I said above, our classes should be [60, 80], [80, 100], [100, 120], [120, 140], [140,
160]. But if we do so, then there is a risk that some data members (like 80, 100, 120, 140) will fall in two classes. One
way to avoid this is to add 0.5 to all the class boundaries. So, our classes are [60.5, 80.5], [80.5, 100.5], [100.5, 120.5],
[120.5, 140.5], [140.5, 160.5].
Classes          Frequency   Relative Frequency   Percentage Frequency
60.5 - 80.5      9           9/99                 9.09
80.5 - 100.5     20          20/99                20.20
100.5 - 120.5    25          25/99                25.25
120.5 - 140.5    37          37/99                37.37
140.5 - 160.5    8           8/99                 8.08
Total            99          1                    100
1.3 Pictorial Representation of Data
Another way to represent data is to use pictures and graphs. We see such pictorial representation in newspapers and
other sources every day. Pictorial representation is particularly important when you have to represent data to people with
limited technical background, like newspaper readers or a governmental or congressional body.
The pie chart is a commonly used pictorial representation of data. When you do your tax return every year, you find a few
pie charts in the instruction book for form 1040. These charts show what proportion/percentage of each tax dollar goes
for particular expenses. I reproduced the following pie charts from the 1040 instruction book of 1999.
Pie charts are self explanatory; we do not need to discuss them further.
The Histogram
Among pictorial representations, the most useful in this course is the histogram. The histogram of data is the graphical
representation of the frequency distribution of the data, where we plot the variable on the horizontal axis
and above each class interval, we erect a bar of height equal to the frequency of the class. Such a
histogram is called a frequency histogram.
If, instead, we erect bars of height equal to the relative frequency, then the graph is called a relative frequency
histogram. Similarly, we can construct a percentage frequency histogram.
Remark. Take a look at the Stem and Leaf Diagram discussed in any textbook.
Example 1.3.1. Following is the frequency table of data on height (in inches) of some babies at birth. Sketch the
histogram of the following data:
Height Frequency
16-17 3
17-18 8
18-19 34
19-20 60
20-21 72
21-22 18
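As a rough sketch of what the frequency histogram for this table looks like, the following Python snippet prints one bar per class. It is a text approximation only: the bars run sideways rather than upward, and the scale of one "#" per 2 babies (rounded up) is an assumption made to keep the lines short.

```python
import math

# Frequency table from Example 1.3.1: (class interval in inches, frequency).
height_table = [
    ("16-17", 3),
    ("17-18", 8),
    ("18-19", 34),
    ("19-20", 60),
    ("20-21", 72),
    ("21-22", 18),
]

UNITS_PER_MARK = 2  # assumed scale: one '#' stands for up to 2 babies

bars = {}
for interval, frequency in height_table:
    # The bar length is proportional to the class frequency.
    bars[interval] = "#" * math.ceil(frequency / UNITS_PER_MARK)
    print(f"{interval} | {bars[interval]} ({frequency})")
```

Dividing each frequency by the total before drawing would give the relative frequency histogram, which has the same shape.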
For a given value x of a variable, the cumulative frequency of the data, for x, is the number of data members that are
less than or equal to x.
Definition. Given a frequency distribution of some data, for a class boundary x, the cumulative frequency is the sum
of the frequencies of all the classes that lie at or below x. The cumulative frequency distribution is a table that gives the cumulative
frequencies against some x values (for us the class boundaries). We also define cumulative relative frequency and
cumulative percentage frequency as follows:
\[ \text{cumulative relative frequency of } x = \frac{\text{cumulative frequency of } x}{\text{size of the data}} \]
and
\[ \text{cumulative percentage frequency of } x = \frac{\text{cumulative frequency of } x}{\text{size of the data}} \times 100. \]
Example 1.3.2 Once again we consider the data on birth weight of babies in Example 1.2.2 that we discussed in the last
section. A cumulative frequency distribution can be constructed from the frequency distribution.
Solution: We have seen the frequency distribution before. The following is the cumulative frequency distribution:
Weight   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percentage Frequency
60.5     0                      0                               0
80.5     9                      9/99                            9.09
100.5    29                     29/99                           29.29
120.5    54                     54/99                           54.55
140.5    91                     91/99                           91.92
160.5    99                     1                               100
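The cumulative frequencies are just running totals of the class frequencies, which can be sketched in Python as below. The class boundaries and frequencies are taken from the birth-weight frequency table; the variable names are my own.

```python
from itertools import accumulate

# Class boundaries and class frequencies from the birth-weight example.
boundaries = [60.5, 80.5, 100.5, 120.5, 140.5, 160.5]
class_frequencies = [9, 20, 25, 37, 8]

n = sum(class_frequencies)  # size of the data

# No observation lies at or below the lowest boundary, so the list starts at 0;
# accumulate() then produces the running totals 9, 29, 54, 91, 99.
cumulative = [0] + list(accumulate(class_frequencies))

print("Weight  Cum. freq.  Cum. rel. freq.  Cum. % freq.")
for boundary, cf in zip(boundaries, cumulative):
    print(f"{boundary:>6}  {cf:>10}  {cf}/{n:<12}  {100 * cf / n:>11.2f}")
```

Plotting these cumulative frequencies against the class boundaries and joining the points with line segments gives the ogive described next.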
The Ogive
Definition. The ogive is a line graph, where we plot the variable on the horizontal axis and the cumulative frequency on
the vertical axis. If we plot the cumulative relative frequency on the vertical axis, then the line graph is called
the relative frequency ogive.
Use of Calculators
Because we will be using calculators (TI-83) extensively in this course, let me explain how you enter data in the TI-83.
1. Press "stat."
2. To input data, enter "edit."
3. Enter your data (say in L1).
4. Press "stat."
5. Enter "sortA" L1.
6. Press "stat" and then enter "edit." On L1 you will see that the data is sorted in an increasing order.
7. Now you can count the frequencies.
Exercise 1.2.1 To estimate the mean time taken to complete a three-mile drive by a race car, the race car did several
time trials, and the following sample of times taken (in seconds) to complete the laps was collected:
50 48 49 46 54 53 52 51 47 56 52 51
51 53 50 49 48 54 53 51 52 54 54 53
55 48 51 50 52 49 51 53 55 54 50
Construct a histogram.
Exercise 1.2.2. The following is the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in May
2000.
Construct a class frequency table of this data by dividing the whole range of data into class intervals:
Solution
Exercise 1.2.3. The following are the length (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in May
2000.
Construct a frequency table for this data by dividing the whole range into class intervals:
Note: If a data member falls on the boundary, count it in the right/upper class-interval.
Solution
Exercise 1.2.4. The following data represents the number of typos in a sample of 30 books published by some publisher.
Construct a frequency table (by sorting in your calculator). Also construct a histogram.
Solution
Exercise 1.2.5. Following is data on the hourly wages (paid only in whole dollars) in an industry.
9 11 8 9 10 11 7 10 12 13
7 11 8 11 14 9 10 9 11 7
13 13 14 12 9 8 12 14 15 9
9 7 12 7 12 7 7 11 13 9
11 9 9 9 10 14 11 12 14 7
Construct a frequency table (by sorting in your calculator). Also construct a histogram.
Solution
Exercise 1.2.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in an industry.
7 11 7 11 10 9 10 10 12 13
7 8 11 11 14 9 7 9 11 7
9 13 12 14 7 8 7 14 15 9
9 7 11 9 12 9 12 11 14 9
12 13 7 9 10 14 11 12 13 7
15 15 16 16 15 16 11 7 18 19
15 16 15 15 16 16 17 16 16 13
15 15 16 15 16 15 15 17 16 12
16 15 15 16 15 15 19 8 16 17
16 16 15 16 16 16 13 12 8
Introduction
In this lesson we talk about two types of constants that we compute from data:
1. measures of central tendency, and
2. measures of dispersion.
A measure of central tendency represents an "average value." Mean, median, mode (if you already know these) are
measures of central tendency. A measure of dispersion is a measure of how widely the data is scattered around.
The most common measure of central tendency is the mean or arithmetic mean, which is the sum of all the data values
divided by the number of data values:
\[ \text{mean} = \frac{\text{sum of all data values}}{\text{number of data values}}. \]
If we denote a data value (i.e., the variable) by x and if n is the size of the data, then the above formula is written as
\[ \text{mean} = \bar{x} = \frac{\sum x}{n}. \]
If the data is a sample, then the mean is called the sample mean. Again, if x denotes the variable, the data is sometimes
denoted by \(x_1, x_2, \ldots, x_n\) and then
\[ \text{mean} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}. \]
If you have not seen the notation ∑ before, it simply means summation. For example,
\[ \sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n. \]
Weighted Mean
Sometimes, different values in data carry different weight. Let us consider the following data and the corresponding
frequency distribution that we computed earlier:
Example 2.1.1 To estimate the mean time taken to complete a three-mile drive by a race car, the race car did several
time trials. The following are sample times taken (in seconds) to complete the laps:
50 48 49 46 54 53 52 51 47 56 52 51
51 53 50 49 48 54 53 51 52 54 54 53
55 48 51 50 52 49 51 53 55 54 50
Following is the frequency distribution of this data:
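\[ \text{mean} = \bar{x} = \frac{46\times1 + 47\times1 + 48\times3 + 49\times3 + 50\times4 + 51\times6 + 52\times4 + 53\times5 + 54\times5 + 55\times2 + 56\times1}{1+1+3+3+4+6+4+5+5+2+1} = \frac{1799}{35} = 51.4 \]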
The mean of the original data is the weighted mean of the data values 46, 47, 48, 49, 50, 51, 52, 53, 54, 55 and 56 with
the corresponding frequency as the weight. So, a new formula for the mean would be
\[ \bar{x} = \text{mean} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i} \]
where \(f_i\) is the frequency of \(x_i\). The weighted mean is defined in a more general context as follows:
Definition. If the values \(x_1, x_2, \ldots, x_n\) in a data set have different weights, and the value \(x_i\) has weight \(w_i\), then the weighted
mean is defined as
\[ \text{weighted mean} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}. \]
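To see that the frequency-weighted formula agrees with the ordinary mean, here is a small Python sketch using the lap times of Example 2.1.1. The helper `weighted_mean` is a hypothetical name introduced only for this illustration.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Raw lap times (in seconds) from Example 2.1.1.
times = [
    50, 48, 49, 46, 54, 53, 52, 51, 47, 56, 52, 51,
    51, 53, 50, 49, 48, 54, 53, 51, 52, 54, 54, 53,
    55, 48, 51, 50, 52, 49, 51, 53, 55, 54, 50,
]

# The same data as a frequency table: distinct values and their frequencies.
values = list(range(46, 57))
frequencies = [1, 1, 3, 3, 4, 6, 4, 5, 5, 2, 1]

plain_mean = sum(times) / len(times)                # sum of x divided by n
mean_from_table = weighted_mean(values, frequencies)

print(plain_mean, mean_from_table)
```

Both computations divide the same total, 1799, by the same count, 35, so they must agree.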
Properties of Mean
1. Combining two means. Suppose we have two sets of data. The mean of the first set is \(\bar{x}\), and the size of the
first set is m; the mean of the second set is \(\bar{y}\), and the size of the second set is n. The mean of the combined data is
\[ \frac{m\bar{x} + n\bar{y}}{m + n}. \]
2. Effect of translation. Let \(\bar{x}\) be the mean of \(x_1, x_2, \ldots, x_n\). Then the mean of \(y_1 = x_1 + d,\ y_2 = x_2 + d,\ \ldots,\ y_n = x_n + d\) is given by
\[ \bar{y} = \bar{x} + d. \]
3. Effect of multiplication by a constant. Let \(\bar{x}\) be the mean of \(x_1, \ldots, x_n\). Then the mean of \(z_1 = c x_1,\ \ldots,\ z_n = c x_n\)
is given by
\[ \bar{z} = c\bar{x}. \]
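The three properties of the mean (combining two means, translation by d, and multiplication by a constant c) can be checked numerically. The sketch below uses made-up scores and exact fractions so that the comparisons are not blurred by rounding; the combined mean is computed as (m × first mean + n × second mean) / (m + n).

```python
from fractions import Fraction


def mean(data):
    """Exact arithmetic mean as a Fraction."""
    return Fraction(sum(data), len(data))


first = [73, 65, 90, 58]  # hypothetical scores, m = 4
second = [81, 77, 92]     # hypothetical scores, n = 3
m, n = len(first), len(second)

# 1. Combining two means: weight each mean by the size of its data set.
combined = (m * mean(first) + n * mean(second)) / (m + n)
print(combined, mean(first + second))

# 2. Effect of translation: adding d to every value adds d to the mean.
d = 7
print(mean([x + d for x in first]), mean(first) + d)

# 3. Effect of multiplication: scaling every value by c scales the mean by c.
c = Fraction(3, 2)
print(mean([c * x for x in first]), c * mean(first))
```

Each printed pair shows the same number twice, once computed directly from the transformed data and once from the property.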
Remark (effect of translation): Your teacher tells you that the mean score for the midterm in your class is 73. After you
complain and request a change, he agrees that everyone can add 7 points to their score. The new mean score is (old mean
+ 7) = 73 + 7 = 80. This is what we mean by "effect of translation."
Example (effect of multiplication by c): Suppose you have some data x1, x2, ..., xn on salaries in an industry in the
United States and the mean is $37000. On a certain day, 1 U.S. dollar = 1.4729 Canadian dollars (say c = 1.4729). So, in
Canadian dollars the mean is 37000 × c = 37000 × 1.4729 = 54497.30. Similarly, a change of units (inches to feet or cm) is a
"multiplication by a constant c."
Example 2.1.2. A student took PHSX 115 (College Physics), PSYC 120 (Personality), FREN 110 (Elementary French), BUS
241 (Managerial Accounting), and MATH 365 (Elementary Statistics). The number of credit hours and the student's grade
for each course are given in the following table:
Course PHSX 115 PSYC 120 FREN 110 BUS 241 MATH 365
Grade (Points) B (3 points) A (4 points) B (3 points) C (2 points) B (3 points)
Credit Hours 4 3 5 3 3
What is the student's GPA?
Solution. The GPA is the weighted average of the points (corresponding to the grades), weight being the course-credit
hours. So, the GPA = (3x4+4x3+3x5+2x3+3x3)/(4+3+5+3+3) = 54/18 = 3.
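The same GPA computation can be written as a short Python sketch, with the grade points weighted by credit hours. The tuple layout is my own choice for the illustration.

```python
# (course, grade points, credit hours) from the table in Example 2.1.2.
courses = [
    ("PHSX 115", 3, 4),
    ("PSYC 120", 4, 3),
    ("FREN 110", 3, 5),
    ("BUS 241", 2, 3),
    ("MATH 365", 3, 3),
]

# Numerator: grade points weighted by credit hours; denominator: total hours.
total_points = sum(points * hours for _, points, hours in courses)
total_hours = sum(hours for _, _, hours in courses)
gpa = total_points / total_hours

print(f"GPA = {total_points}/{total_hours} = {gpa}")
```

Note that the unweighted average of the five grades, (3+4+3+2+3)/5 = 3, happens to coincide here; with different credit hours the two would usually differ.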
The Median
The median represents the middle value of the data. Half the data will be less than or equal to the median, and half the
data will be greater than or equal to the median. You are above the median American income if half the American
population is making less than you make.
Definition. Suppose the data is arranged in an increasing order (i.e., in an array). If the size of data is ODD then
the median is the middle value. If it is EVEN, then the median is the mean of the middle two values.
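The definition translates directly into a short procedure: sort the data, then take the middle value or the mean of the two middle values. The Python sketch below applies it to the 35 lap times used earlier (odd size) and to a made-up list of four values (even size).

```python
def median(data):
    """Median by the definition: sort, then take the middle value(s)."""
    if not data:
        raise ValueError("median of empty data is undefined")
    ordered = sorted(data)  # arrange the data in an array
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                         # odd size: middle value
    return (ordered[middle - 1] + ordered[middle]) / 2  # even size: mean of two


# Lap times (in seconds) from the race car example; the size is odd (35).
times = [
    50, 48, 49, 46, 54, 53, 52, 51, 47, 56, 52, 51,
    51, 53, 50, 49, 48, 54, 53, 51, 52, 54, 54, 53,
    55, 48, 51, 50, 52, 49, 51, 53, 55, 54, 50,
]
print(median(times))

# A made-up data set of even size, for illustration only.
print(median([12, 7, 10, 9]))
```

Python's standard library offers `statistics.median`, which follows the same rule.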
The Percentiles
Definition. For a number p between 0 and 100, the pth percentile \(x_p\) of the data is a number such that at least p percent of
the data members are less than or equal to \(x_p\) and at least (100 - p) percent of the data members are greater than or equal to \(x_p\).
The Mode
Definition. The MODE of the data is the value or values that have the highest frequency. For example, the mode of the
set {1, 3, 5, 5, 7} is {5} because it has the highest frequency. The mode of {1, 1, 3, 5, 5, 7} is {1, 5} because 1 and 5
both have the highest frequency. Such a set is said to be bimodal.
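A minimal Python sketch of this definition counts the frequency of each value and keeps every value that attains the highest frequency. The function name `modes` is hypothetical.

```python
from collections import Counter


def modes(data):
    """Return all values that have the highest frequency, in increasing order."""
    counts = Counter(data)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([1, 3, 5, 5, 7]))     # a single mode
print(modes([1, 1, 3, 5, 5, 7]))  # a bimodal data set
```

Returning a list rather than a single number keeps the bimodal case from being silently reduced to one value.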
1. Enter the frequency table in the calculator, say, x-values in L1 and frequencies in L2.
2. Select "1-Var Stats" in the CALC menu and enter.
3. The calculator will ask for the lists. Type in the list L1, L2 and enter.
4. The calculator will give a list of numbers; x-bar is the mean x.
Exercise 2.2.1. The following is the price (in dollars) of a stock (say, CISCO SYSTEMS) checked by a trader several times
on a particular day.
Exercise 2.2.2. The following figures refer to the GPA of six students.
Exercise 2.2.3. The following data give the lifetime (in days) of light bulbs.
Exercise 2.2.4. An athlete ran an event 32 times. The following frequency table gives the time taken (in seconds) by the
athlete to complete the events.
Exercise 2.2.5. Following is data on the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in
May 2000.
Exercise 2.2.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in an industry.
7 11 7 11 10 9 10 10 12 13
7 8 11 11 14 9 7 9 11 7
9 13 12 14 7 8 7 14 15 9
9 7 11 9 12 9 12 11 14 9
12 13 7 9 10 14 11 12 13 7
15 15 16 16 15 16 11 7 18 19
15 16 15 15 16 16 17 16 16 13
15 15 16 15 16 15 15 17 16 12
16 15 15 16 15 15 19 8 16 17
16 16 15 16 16 16 13 12 8
Compute the mean and median hourly wage.
Solution
Exercise 2.2.7. Following is the frequency table on the number of typos in a sample of 30 books published by a publisher.
Exercise 2.2.8. Following is data on the length (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in
May 2000.
Range
Clearly, the measures of central tendency—mean, median, mode—cannot tell us the "whole story" about the data.
Example 2.3.1. Suppose two sections of the statistics class have the following percentage score distribution at the end of
the semester:
Section A 81 84 83 80 82
Section B 72 93 92 82 71
Both these sections have the same mean—82. But in Section A, everybody will get a B grade. In section B, we will have
two C's, one B and two A's.
The measure of dispersion is a measure of how widely the data is scattered around. In section A, the data has a very
small dispersion or variability, whereas section B has a large dispersion.
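A very simple measure of dispersion is the range of the data, as we have defined before:
\[ \text{range} = \text{largest value} - \text{smallest value}. \]
The mean deviation is the mean of the absolute deviations \(|x_i - \bar{x}|\) from the mean:
\[ \text{mean deviation} = \frac{\sum |x_i - \bar{x}|}{n}. \]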
Remark.
Definition. The sample standard deviation s is defined as the square root of the sample variance \(s^2\). So, to compute
the sample standard deviation, we have to compute the sample variance first.
The mean deviation for section A = (1+2+1+2+0)/5 = 6/5 and the mean deviation for section B = (10+11+10+0+11)/5 =
42/5. Since the variability of section B is much higher, its mean deviation is much larger.
The sample variance for section A is
\[ s^2 = \frac{(81-82)^2+(84-82)^2+(83-82)^2+(80-82)^2+(82-82)^2}{5-1} = \frac{1+4+1+4+0}{4} = \frac{10}{4} = 2.5. \]
The sample variance for section B is
\[ s^2 = \frac{(72-82)^2+(93-82)^2+(92-82)^2+(82-82)^2+(71-82)^2}{5-1} = \frac{100+121+100+0+121}{4} = \frac{442}{4} = 110.5. \]
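These hand computations for the two sections can be reproduced with a few lines of Python. In this sketch `statistics.variance` and `statistics.stdev` are the sample versions (they divide by n - 1), and `mean_deviation` is a small helper written for this illustration.

```python
from statistics import mean, stdev, variance

# Percentage scores from Example 2.3.1.
section_a = [81, 84, 83, 80, 82]
section_b = [72, 93, 92, 82, 71]


def mean_deviation(data):
    """Mean of the absolute deviations from the mean."""
    center = mean(data)
    return sum(abs(x - center) for x in data) / len(data)


for name, scores in (("A", section_a), ("B", section_b)):
    print(
        f"Section {name}: mean = {mean(scores)}, "
        f"mean deviation = {mean_deviation(scores)}, "
        f"variance = {variance(scores)}, "
        f"standard deviation = {stdev(scores):.3f}"
    )
```

Both sections have mean 82, but every measure of dispersion is much larger for Section B.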
The mean and the standard deviation tell us a lot about how the data is distributed.
Chebyshev's Rule. This rule applies to all kinds of data. Suppose \(\bar{x}\) is the mean and s is the standard deviation of the
data. Then we have the following:
1. At least 0 percent of the observations will fall within 1 standard deviation of the mean, i.e., within \((\bar{x}-s, \bar{x}+s)\). This
is trivially true.
2. At least 75 percent of the observations will fall within 2 standard deviations of the mean, i.e., within \((\bar{x}-2s, \bar{x}+2s)\).
3. At least 88.9 percent (roughly 89 percent) of the observations will fall within 3 standard deviations of the mean, i.e., within \((\bar{x}-3s, \bar{x}+3s)\).
4. More generally, for any k ≥ 1, at least \(100(1 - 1/k^2)\) percent of the data will be within k standard deviations of the mean, i.e.,
within \((\bar{x}-ks, \bar{x}+ks)\).
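The bound in Chebyshev's Rule is easy to tabulate and to compare with actual data. The Python sketch below computes 100(1 - 1/k²) for k = 1, 2, 3 and, as an illustration, the observed percentage for the Section B scores of Example 2.3.1; the helper names are hypothetical.

```python
from statistics import mean, stdev


def chebyshev_lower_bound(k):
    """Minimum percentage of data within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("the bound is only informative for k >= 1")
    return 100 * (1 - 1 / k**2)


def percent_within(data, k):
    """Observed percentage of data strictly within k sample standard deviations."""
    center, spread = mean(data), stdev(data)
    inside = sum(1 for x in data if abs(x - center) < k * spread)
    return 100 * inside / len(data)


section_b = [72, 93, 92, 82, 71]  # scores from Example 2.3.1
for k in (1, 2, 3):
    bound = chebyshev_lower_bound(k)
    observed = percent_within(section_b, k)
    print(f"k = {k}: at least {bound:.1f} percent, observed {observed:.1f} percent")
```

The observed percentages are never below the bound, but they can be far above it, because the rule makes no assumption about the shape of the data.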
Chebyshev's Rule makes no assumption about the data or the variable. If we make some assumptions about the data,
then we can improve the above rule as follows.
The Empirical Rule: Suppose the histogram of the data is symmetric around the vertical line \(x = \bar{x}\); in other words,
the histogram should fit into a bell-shaped curve.
[Figure: Bell-shaped Curve]
Then we have the following:
1. Approximately 68.3 percent of the observations will fall in the interval \((\bar{x}-s, \bar{x}+s)\).
2. Approximately 95.4 percent of the observations will fall in the interval \((\bar{x}-2s, \bar{x}+2s)\).
3. Approximately 99.7 percent of the observations will fall within the interval \((\bar{x}-3s, \bar{x}+3s)\).
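The three percentages can be reproduced under the assumption that the bell-shaped curve is the normal curve; the text here does not name the curve, so this identification is my own. For a normal distribution, the proportion within k standard deviations of the mean is erf(k/√2), which the following Python sketch evaluates.

```python
import math


def normal_percent_within(k):
    """Percentage of a normal distribution within k standard deviations of its mean."""
    return 100 * math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_percent_within(k):.1f} percent")
```

Compare these with the much weaker guarantees of Chebyshev's Rule (0, 75, and about 89 percent), which hold without any assumption on the histogram.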
Question: What does it mean when the variance or mean deviation of some data is zero? The answer is that all the data
members are EQUAL!
Practice Problem. Consider the exercises 2.2.1 through 2.2.8. For each problem, compute the mean and standard
deviation of the data and find what percentage of the data are within one, two, or three standard deviations from the
mean.
When a frequency table is given, we can use new formulas to compute the mean and variance of the data.
Formulas. Suppose the data consisting of n observations are given in a frequency table (ungrouped). Let xi denote the
values and fi be the frequency of xi. Then
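1. the mean is
\[ \bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{\sum f_i x_i}{n}, \]
2. the variance is
\[ s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1}, \]
3. equivalently, the variance can be computed as
\[ s^2 = \frac{1}{n - 1}\left[\sum (f_i x_i^2) - n\bar{x}^2\right]. \]
4. If the data is given in a frequency table of the grouped data, we use the same formulas, with \(x_i\) as the class
mark, which is the average of the class limits.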
Example 2.3.2. The following table extends the frequency table of the time taken to complete a lap by a race car
(example 2.1.1) to compute mean and variance using the above formulas.
Time x   Frequency f   fx     fx^2
46       1             46     2116
47       1             47     2209
48       3             144    6912
49       3             147    7203
50       4             200    10000
51       6             306    15606
52       4             208    10816
53       5             265    14045
54       5             270    14580
55       2             110    6050
56       1             56     3136
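The column totals of this table, and the mean and variance that follow from them, can be computed with the short Python sketch below. It uses the frequency-table formulas with n - 1 in the denominator of the variance; the variable names are my own.

```python
import math

# (time x, frequency f) rows of the table in Example 2.3.2.
table = [
    (46, 1), (47, 1), (48, 3), (49, 3), (50, 4), (51, 6),
    (52, 4), (53, 5), (54, 5), (55, 2), (56, 1),
]

n = sum(f for _, f in table)                 # number of observations
sum_fx = sum(f * x for x, f in table)        # total of the fx column
sum_fx2 = sum(f * x * x for x, f in table)   # total of the fx^2 column

mean = sum_fx / n
# Shortcut formula: [sum(f x^2) - n * mean^2] / (n - 1).
variance = (sum_fx2 - n * mean**2) / (n - 1)
# Definition: sum(f (x - mean)^2) / (n - 1); it should give the same value.
variance_by_definition = sum(f * (x - mean) ** 2 for x, f in table) / (n - 1)
std_dev = math.sqrt(variance)

print(f"n = {n}, sum fx = {sum_fx}, sum fx^2 = {sum_fx2}")
print(f"mean = {mean:.2f}, variance = {variance:.4f}, standard deviation = {std_dev:.4f}")
```

The two variance formulas agree up to floating-point rounding; the shortcut form needs only the column totals, which is why the table carries the fx and fx^2 columns.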
Example 2.3.3. Following is the class frequency distribution of the data on birth weight of some babies (exercise 1.2,
Lesson 1):
We can use the above formula to compute (approximate) variance and the standard deviation of the birth weight.
Remarks.
1. Note that we can only get an approximate mean and variance if we use the class marks with the above
formulas. If you use the original data instead, you may notice a difference.
2. Because of the availability of computers, the importance of such approximations has declined.
Comment: We have had detailed discussions of various formulas for defining the mean, variance, and other constants. It
is important to understand these concepts and formulas.
It is equally important to appreciate the value and necessity of using calculators or other available software (like Excel). It
is almost impossible (and unnecessary) to compute these constants manually and correctly, unless one is specially gifted
with numerical computations.
1. Follow the same steps used for computing the mean (using either raw data or the
frequency table).
2. The calculator will give a list of numbers; SX is the standard deviation.
3. The variance is the square of the standard deviation.
Problems on 2.3: Variance, Standard Deviation, and Use of the Frequency Table
Exercise 2.3.1. The following is the price (in dollars) of a stock (say, CISCO SYSTEMS) checked by a trader several times
on a particular day.
Exercise 2.3.2. The following figures refer to the GPA of six students.
Exercise 2.3.3. The following data give the lifetime (in days) of certain light bulbs.
Exercise 2.3.5. Following is data on the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in
May 2000.
Exercise 2.3.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in an industry.
7 11 7 11 10 9 10 10 12 13
7 8 11 11 14 9 7 9 11 7
9 13 12 14 7 8 7 14 15 9
9 7 11 9 12 9 12 11 14 9
12 13 7 9 10 14 11 12 13 7
15 15 16 16 15 16 11 7 18 19
15 16 15 15 16 16 17 16 16 13
15 15 16 15 16 15 15 17 16 12
16 15 15 16 15 15 19 8 16 17
16 16 15 16 16 16 13 12 8
Compute the variance and standard deviation of the hourly wages.
Solution
Exercise 2.3.7. Following is the frequency table on the number of typos in a sample of 30 books published by a publisher.
No. of Typos 156 158 159 160 162
Frequency 6 4 5 6 9
Find the mean number, variance, and standard deviation of typos in a book.
Solution
Exercise 2.3.8. Following is data on the length (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in
May 2000.
Exercise 2.3.9. The following is the frequency table of weight (in pounds) of some salmon in a river. Find the variance
and standard deviation.
Weight x 31 32 33 34 35 36 37
Frequency f 3 2 4 5 6 5 9
Solution
Exercise 2.3.10. The following data represents the time (in minutes) taken by students to drive to campus.
23 17 19 24 42 33 20 22 15 9
26 37 29 19 35 18 30 21 11 23
13 27 32 32 23 35 25 33 24 23
Find the mean, variance, and the standard deviation of the data.
Lesson 3 : Probability
Introduction 3.1 Basic Concept of Probability 3.2 Sets and Subsets, Statistical Experiments, Sample
Space, Events, Probability
3.3 Laws of Probability 3.4 Counting Techniques and Probability 3.5 Conditional Probability and Independent Events
Homework 8 - 11
Introduction
Some of the early theory of probability originated in gambling and later theories developed in bioscience. We get very
tempted when we see somebody win $1 million in a lottery, but lottery operators design their games and machines in
such a way that they will make more money than they give, in the long run.
We are all familiar with simple probabilistic statements. If you toss a coin the probability that the HEAD will show
up is 1 out of 2. If you roll a die the probability of getting the face 5 is 1 out of 6. The probability of having an
accident on a particular busy street on a particular day is 1 out of 100. (When a child says "probably we should
invite Aaron for my birthday," however, the "probably" may have little to do with mathematics of probability, but shows
the awareness of the concept of probability at a basic human level.)
When we toss a coin for a large number of times we find that essentially half the time the head shows up. As we continue
to toss, we see that the ratio of the number of Heads to the number of tosses remains close to and moves around 1/2. So
we say that if we toss a coin, the probability that the head will show up is .50. On the other hand, if this ratio remains
close to and moves around .49 then we will say the probability of heads is .49. To understand the concept of probability
empirically, we visit a flash animation of a coin tossing experiment.
We observe the accidents on a street over a long period of time and observe that on about one in a hundred days there is
an accident. The longer we continue to observe, we see that the ratio of the number of days there is an accident to the
number of days observed remains close to one to one hundred. So we say that probability of an accident on a day on that
street is 1 percent.
These examples explain the basic notion of probability. The probability of an event is understood as the "relative
frequency," the ratio of occurrences of the EVENT to the total number of times the EXPERIMENT is repeated.
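The relative-frequency idea can also be explored with a short simulation. This is only a sketch: the function name heads_ratio, its p_heads parameter, and the fixed seed are assumptions made for illustration, and a pseudo-random generator stands in for a real coin.

```python
import random

def heads_ratio(num_tosses, p_heads=0.5, seed=0):
    """Simulate coin tosses and return (number of heads) / (number of tosses)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(1 for _ in range(num_tosses) if rng.random() < p_heads)
    return heads / num_tosses

for n in (10, 100, 10_000, 100_000):
    print(n, heads_ratio(n))
```

As the number of tosses grows, the printed ratio should settle near the value of p_heads, which is the behavior described above for both the .50 and the .49 coin.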
3.2 Sets and Subsets, Statistical Experiments, Sample Space, Events, Probability
This section provides basic definitions that we will need for the rest of the course.
Definition. By a set S we mean a collection of objects. The objects in this set S are also called elements of the set. A
set E is said to be a subset of a set S if each element of E is also an element of S. We write
E⊆S
The following are some examples. We also explain the usage of braces to describe a set.
1. Let D = the collection of all 52 cards in a deck. Then D is a set. Let E be the collection of all the hearts in this
deck. Then E is a subset of D. In brace notation
E = {x ∈ D : x is a heart}.
L⊆T
C⊆T
In brace notation
3. Let N be the collection of all integers, and let E be the collection of even integers. Then N and E are sets and
E⊆N
In brace notation
N = {n : n is an integer}
E = {n ∈ N : n is even}.
4. Let R be the set of all (real) numbers. Let I be the set of all numbers between 0 and 1, not equal to 0,1. Then R,I
are sets and I is a subset of R. In brace notation
R = {x : x is a real number}
I = {x ∈ R : 0 < x < 1}.
5. S = {1,7,13,17,19} is a set.
6. Let S be the collection of you and your siblings, B be the collection of your brothers, and F be the collection of
your sisters. Then S,B,F are sets and we have
F⊆S
B ⊆ S.
Definitions.
1. A statistical experiment is a procedure that produces exactly one out of many possible outcomes. All the
possible outcomes are known, but which outcome will result when you perform the experiment is not known.
2. Given an experiment, the set of all possible outcomes is called the sample space.
3. Given an experiment, an outcome of the experiment is called a sample point. So, the sample space consists of
sample points.
Examples. The following are examples of some experiments and their sample spaces.
1. Suppose your experiment is tossing a coin. The outcomes are H (heads) and T (tails). So, the sample space is S =
{H,T}.
2. Suppose your experiment is tossing a coin twice. The sample points (or outcomes) are HH,HT,TH,TT and the
sample space is S = {HH,HT,TH,TT}.
3. Your experiment is rolling a die. The outcomes are 1,2,3,4,5,6 and the sample space is S = {1,2,3,4,5,6}.
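4. Suppose that your experiment is rolling a die twice. Then the sample space is the set of all 36 ordered pairs
S = {(i,j) : i, j = 1,2,3,4,5,6}, where i is the result of the first roll and j is the result of the second.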
5. Suppose your experiment is to determine the number of road accidents in Lawrence on a particular day. So, the sample
space is S = {0,1,2,3 ... }.
6. Suppose the experiment is to determine the sex of an unborn child. Then the sample space is S = {Female, Male}.
7. Suppose your experiment is to determine the blood group of a patient in a lab. Then the sample space is S =
{O,A,B,AB}.
8. Suppose your experiment is to observe the annual wheat production in Kansas. Then the sample space is S = {x : x is a
nonnegative number} = {x ∈ R : x ≥ 0} = [0, ∞).
Definition. The sample space S is called a finite sample space if S has only a finite number of outcomes. If S has infinitely
many elements, it is called an infinite sample space. Note that examples 1, 2, 3, 4, 6, and 7 above have finite sample spaces,
and 5 and 8 have infinite sample spaces.
Events
Definitions. Given an experiment and its sample space S, the following are important definitions.
1. A subset of the sample space S is called an event. So, an event E consists of outcomes, and we have
E ⊆ S.
2. The event ∅ that has no outcome is called the empty event or impossible event. The impossible event
consists of no outcome; if you perform the experiment, the impossible event will never occur.
3. Since S is also a subset of S, S is an event. This event S is called the sure event. If you perform the experiment,
this event is sure to occur.
Remark. Often, we will describe events in "English," and we may have to identify them as a subset of the sample space
and also conversely.
1. Look at example 2 above, the experiment on tossing a coin twice. Let E be the event that at least one of the tosses gave
T, and let F be the event that both tosses gave the same face. Then
E = {HT, TH, TT} and F = {HH, TT}.
2. Look at example 4 above, the experiment on rolling a die twice. Let E5 be the event that the first roll showed 5. Then
E5 = {(5,1),(5,2),(5,3),(5,4),(5,5),(5,6)}.
Let T5 be the event that the sum of the two rolls is 5. Then
T5 = {(1,4),(2,3),(3,2),(4,1)}.
Let T1 be the event that the sum of the two rolls is 1. Because T1 has no outcome, it is an impossible event. Let
T13 be the event that the sum of the two rolls is 13. Then T13 is also an impossible event.
3. Look at the example 5 above—the experiment on road accidents. Let E be the event that there is no accident on
that day. Then
E = {0}.
4. Look at example 8 above—the experiment on annual wheat production. Let E be the event that there will be more
than 1000 units of wheat production in 1998. Then
E = (1000, ∞).
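Translating an event from "English" into a subset can be sketched in code. The following assumes the two-roll sample space is represented as ordered pairs (first roll, second roll); that representation is an illustrative choice, not something fixed by the text.

```python
from itertools import product

# Sample space for rolling a die twice: all ordered pairs (first roll, second roll).
S = set(product(range(1, 7), repeat=2))

E5 = {(i, j) for (i, j) in S if i == 5}        # first roll shows 5
T5 = {(i, j) for (i, j) in S if i + j == 5}    # sum of the two rolls is 5
T1 = {(i, j) for (i, j) in S if i + j == 1}    # impossible: the sum is at least 2
T13 = {(i, j) for (i, j) in S if i + j == 13}  # impossible: the sum is at most 12

print(len(S))                       # number of sample points
print(sorted(E5))
print(sorted(T5))
print(T1 == set(), T13 == set())    # both are the empty event
print(E5 <= S and T5 <= S)          # every event is a subset of S
```

Each event is built by filtering the sample space with the English condition, which is exactly the brace notation {x ∈ S : condition}.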
Given a sample space S, in the MATHEMATICS of probability we have rules for how to compute the probability of an event
E. Although the MATHEMATICS of probability was inspired by the empirical concept of probability, we do not derive
anything from our intuitive ideas. We are guided by the precise rules and laws that we set up.
Definition. Let
S = { e1, e2, ... ,en }.
be a finite sample space. The probability of a simple event {e} is a number (possibly given) denoted by P({e}) which
has the following properties:
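1. 0 ≤ P({e}) ≤ 1.
2. The probabilities of all the simple events add up to one: P({e1}) + P({e2}) + ... + P({en}) = 1.
3. If E is an event, then the probability of E, P(E), is defined as the sum of the probabilities of all the simple events in
E:
$$P(E) = \sum_{e \in E} P(\{e\}).$$
In particular, P(Sure Event) = P(S) = 1.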
Remark. If we know the probabilities P({e}) of all the simple events {e}, we will be able to compute the probability of
any event E using 3. The probabilities of the simple events will
1. either be given
One of the most frequently used models to compute probabilities of simple events is called EQUALLY LIKELY
OUTCOMES.
Definition. Let S = {e1, ... , eN} be a finite sample space. We say that all the outcomes are equally likely if all the
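outcomes have the same probability. So, in this case, we have P({e}) = 1/N for each outcome e, and for any event E,
$$P(E) = \sum_{e \in E} P(\{e\}) = \sum_{e \in E} \frac{1}{N} = \frac{n(E)}{n(S)},$$
where n(E) and n(S) denote the number of outcomes in E and in S.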
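Under the equally likely model, computing a probability reduces to counting. Here is a minimal sketch, assuming a standard 52-card deck drawn at random; the helper prob and the event names are hypothetical.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]

# Sample space: the 52 cards of a deck, each assumed equally likely to be drawn.
S = set(product(ranks, suits))

def prob(event):
    """P(E) = n(E) / n(S) under the equally-likely-outcomes model."""
    return Fraction(len(event), len(S))

hearts = {card for card in S if card[1] == "hearts"}
face_cards = {card for card in S if card[0] in {"J", "Q", "K"}}

print(prob(hearts))       # 13 of the 52 cards are hearts
print(prob(face_cards))   # 12 of the 52 cards are face cards
print(prob(S))            # the sure event
```

Using Fraction keeps the answers exact (for instance 1/4 for a heart) instead of rounded decimals.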
Problems on 3.2
Exercise 3.2.1. The following table gives the blood group distribution of a certain population.
Find the probability that a random sample of blood will be of Blood Group A or B or AB. (Here S={O, A, B, AB} and we
want to compute the probability P(E) of the event E={A, B, AB}.
Solution
Exercise 3.2.2. A student wants to pick a school based on its grade distribution. Following is the most recent grade
distribution in a school:
Grade Distribution (Unreal Data)
Grades                   A   B   C   D   F
Percentage of Students   19  33  31  14  3
Find the probability that a randomly picked student will have at least a B average.
Solution
Exercise 3.2.3. The following table gives the probability distribution of a loaded die.
Find the probability that the face 2 or 3 or 6 will show up when you roll the die.
Solution
Exercise 3.2.4. An urn contains 7 apples and 3 oranges and 5 pears. One piece of fruit is picked at random. Find the
probability that
Solution
1. the sum is 8,
Solution
Exercise 3.2.6. A letter is chosen at random from the letters of the English alphabet. Find the probability that
Solution
Following are a few notations from set theory, which we will be using in the context of sample spaces and events. Let E and
F be subsets of a set S.
The union of E and F is
E ∪ F = {x ∈ S : x ∈ E or x ∈ F}.
So, if you put together the elements of E and F in a single collection, you get the union E ∪ F.
The intersection of E and F is
E ∩ F = {x ∈ S : x ∈ E and x ∈ F}.
So, if you take all the elements common to both E and F, you get the intersection of E and F.
The complement of E is
Ec = {x ∈ S : x is not in E}.
So, the complement Ec of E is the collection of all the elements in S that are not in E.
Remark. If we can understand and interpret the above definitions in our context of sample spaces and events, that is
adequate. For us, S will be a fixed sample space and E,F will be events.
1. E ∪ F is the event that consists of all outcomes that are either in E or in F (or both). So the occurrence of either E
or F is the same as the occurrence of E ∪ F. That is why some textbooks use the notation (E or F) for E ∪ F. So,
notationally, as in some textbooks,
E ∪ F = E or F.
2. E ∩ F is the event that consists of all the outcomes that are both in E and F. So the simultaneous occurrence of E
and F is the same as the occurrence of E ∩ F. That is why E ∩ F is denoted by (E and F) in some texts.
Notationally, as in some textbooks,
E ∩ F = E and F.
3. Similarly, Ec is the event that consists of all the outcomes in S that are not in E. So, the occurrence of Ec is the
same as the nonoccurrence of E. Notationally, as in some textbooks,
E c = (not E)
Laws of Probability
First, probability behaves like area and the laws of probability are like that of area.
Some formulas and definitions: Let S be sample space and let E and F be two events.
1. We have
P(E ∪ F) = P(E) + P(F) - P(E ∩ F).
We subtract P(E ∩ F) because we counted it twice: once in P(E) and once in P(F).
2. Definition. We say E and F are mutually exclusive if E ∩ F = ∅, i.e., E and F have no outcome in common.
Since P(∅) = 0, it follows from 1 that, for mutually exclusive events,
P(E ∪ F) = P(E) + P(F).
3. We also have
P(Ec) = 1 - P(E).
4. Definition. Let E be an event. We say that the odds of an event E occuring are a to b if
P(E) = a/(a+b)
Remark: This concept of ODDS is used often in gambling. When the odds in favor of a horse are 2 to 3, essentially this
means that the probability the horse will win is 2/5. We say "essentially" because in actual betting, the probability is
slightly less than 2/5, so that in the long run the gambling establishment makes more money than it gives. (This
instructor is not particularly experienced in such betting or horse races.)
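The laws above can be verified on a small example. The sketch below assumes one roll of a fair die with equally likely outcomes; the events E and F and the helper odds_to_probability are illustrative choices. It checks the addition rule P(E ∪ F) = P(E) + P(F) - P(E ∩ F), the complement rule, and the conversion from odds to probability.

```python
from fractions import Fraction

S = set(range(1, 7))   # one roll of a fair die (illustrative example)

def prob(event):
    """Equally likely outcomes: P(E) = n(E) / n(S)."""
    return Fraction(len(event), len(S))

E = {2, 4, 6}   # the roll is even
F = {4, 5, 6}   # the roll is at least 4

# Law 1: P(E or F) = P(E) + P(F) - P(E and F)
assert prob(E | F) == prob(E) + prob(F) - prob(E & F)

# Law 3: P(not E) = 1 - P(E)
assert prob(S - E) == 1 - prob(E)

def odds_to_probability(a, b):
    """Odds of a to b correspond to P(E) = a / (a + b)."""
    return Fraction(a, a + b)

print(prob(E | F))                # 2/3
print(odds_to_probability(2, 3))  # 2/5
```

Python's set operators |, &, and - play the roles of union, intersection, and complement relative to S.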
1. E or F occur,
Solution
Exercise 3.3.3. The probability that a Christmas tree is taller than 6 feet is .30; the probability that a Christmas tree
weighs more than sixty pounds is 0.25; and the probability that a Christmas tree is either taller than 6 feet or more than
sixty pounds is .4.
1. Find the probability that a Christmas tree is both taller than 6 feet and weighs more than sixty pounds.
2. Find the probability that a Christmas tree is not taller than 6 feet.
3. Find the probability that a Christmas tree is either less than 6 feet tall or less than sixty pounds in weight.
4. Find the probability that a Christmas tree is neither taller than 6 feet nor heavier than sixty pounds.
Solution
Exercise 3.3.4. The probability that a student majors in liberal arts is .44; the probability that a student majors in
business is .33; and the probability that a student majors in either liberal arts or business is .65. Find the probabilities
Solution
Counting techniques are important and useful to learn. You might like to know, for example,
1. the number of English words (formal) of 5 letters, (A formal word is any sequence of letters from the English
alphabet. For example, eezq is a formal word.)
2. the number of ways you can deal a hand of 13 cards from a deck of 52 cards, or
3. the number of ways you can assign the first row of 11 seats to 231 guests.
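Such counts are often written with factorials: for a positive integer n, n! = n(n-1)(n-2) ··· 2·1, and by convention 0! = 1.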
One of the main tools for such counting is the following principle:
The Basic Counting Principle. Suppose we have an experiment that is a combination of r sub-experiments, performed
one after the other, such that
1. the first sub-experiment has n1 outcomes;
2. corresponding to each outcome of the first sub-experiment, the second sub-experiment has n2 outcomes;
3. corresponding to each outcome of the first and the second sub-experiments, the third sub-experiment has
n3 outcomes;
...
r. corresponding to each outcome of each of the previous r-1 sub-experiments, the rth sub-experiment has
nr outcomes.
Then the experiment has a total of n1 × n2 × ... × nr outcomes.
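The principle says that the combined experiment has n1 × n2 × ... × nr outcomes. A brute-force check is sketched below with three hypothetical sub-experiments (a coin toss, a die roll, and drawing a suit), chosen only for illustration.

```python
from itertools import product
from math import prod

# Three hypothetical sub-experiments performed one after the other.
coin = ["H", "T"]                                  # n1 = 2 outcomes
die = [1, 2, 3, 4, 5, 6]                           # n2 = 6 outcomes
suit = ["hearts", "diamonds", "clubs", "spades"]   # n3 = 4 outcomes

outcomes = list(product(coin, die, suit))          # every combined outcome
predicted = prod(len(stage) for stage in (coin, die, suit))

print(len(outcomes), predicted)
```

Listing all outcomes is feasible only for small cases; the point of the principle is that the product gives the count without listing anything.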
Remark. Here we have used the word "experiment" in a slightly different sense than the statistical experiments. The
basic counting principle will be used to count the number of outcomes in sample spaces and events.
Examples.
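3.4.1. Count the number of words of length four that you can construct from the English alphabet. Answer: 26 × 26 × 26 × 26.
We use the counting principle by splitting this experiment into four sub-experiments: picking the first, the second, the
third, and the fourth letter. Each sub-experiment has 26 outcomes.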
3.4.2. Count the number of ways you can assign the 11 seats in the first row in a concert hall to 231 guests.
3.4.3. Contrast: How many ways can you form a committee of 11 members from a group of 231 people? Unlike
assigning seats, here the order of selection of the members will be ignored. The 11 members, when permuted around,
will have different seat assignments but in the same committee. Forming the committee is a "combination" problem that
comes below.
Remark. The difference between assigning 11 seats in a row and forming a committee of 11 is that in the first case
the order of assignment is important. Assigning the first row to the same 11 guests in two different ways will count as
two different outcomes. When we form a committee, the order in which we pick 11 members does not make any
difference.
Definition. Suppose we have n objects. We pick r of them one by one (without ever putting them back) and arrange
them in a row. Such an ordered arrangement will be called a permutation of n objects taken r at a time. The number
of permutations of n objects taken r at a time is denoted by nPr. It follows from the basic counting principle that
$${}_{n}P_{r} = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!}.$$
In contrast, we can pick r objects from a collection of n objects one by one but place the object back in the collection
before the next pick, and arrange all of them in a row. Such selection and arrangement is called picking with
replacement. Constructing a formal word of length 4 is an experiment of picking with replacement.
Remark: Example 3.4.1 is a problem on picking with replacement because a letter can be selected more than once.
Example 3.4.2 is a permutation problem.
Definition. Suppose we have n objects in a container. We pick r of them all at a time. In this case the order of selection
does not come into consideration. Such a selection is called a combination of n objects taken r at a time. The number of
combinations of n objects taken r at a time is denoted by nCr and is given by
$${}_{n}C_{r} = \frac{n!}{r!\,(n-r)!}.$$
Examples. 1. Count the number of ways you can form a committee of 11 from a group of 231 people. Answer: 231C11
2. Count the number of ways you can deal a hand of 13 cards from a deck of 52 cards. Answer: 52C13.
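These counts are too large to work out by hand, but they are easy to evaluate with software. The sketch below uses Python's math.perm and math.comb (available in Python 3.8 and later) for nPr and nCr, and checks that every committee of 11 corresponds to 11! seat assignments.

```python
from math import comb, factorial, perm

# Picking with replacement: formal words of length 4 (Example 3.4.1).
print(26 ** 4)

# Permutations: assigning 11 seats in a row to 231 guests (order matters).
seatings = perm(231, 11)

# Combinations: a committee of 11 from 231 people (order ignored).
committees = comb(231, 11)

# Each committee can be seated in 11! different orders.
print(seatings == committees * factorial(11))

# Bridge hands: 13 cards from a deck of 52.
print(comb(52, 13))
```

The comparison prints True, which is the relation nPr = r! × nCr between ordered and unordered selections.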
Exercise 3.4.2. A homeowner would like to install a new storm door. The local store offers 2 brand names; each brand
has 4 different styles and 3 colors. How many choices does the homeowner have?
Solution
Exercise 3.4.3. Suppose in the World Cup soccer tournament, group A has 8 teams. Each team of group A has to play all
the other teams in the group. How many games will be played among the group A teams? Answer: 8C2
Exercise 3.4.4. How many ways can you deal a hand of 13 cards from a deck of 52 cards? Answer: 52C13
Exercise 3.4.5. How many ways can you deal a hand of 4 spades, 3 hearts, 3 diamonds, and 3 clubs?
Solution
Solution-variation
Exercise 3.4.6. We have 13 students in a class. How many ways can we assign the 4 seats in the first row? Solution
Exercise 3.4.7. Programming languages sometimes use a hexadecimal system (also called "hex") of numbers. In this
system, 16 digits are used and denoted by 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F. Suppose you form a 6-digit number in a
hexadecimal system.
1. What is the probability that the number will start with a letter digit?
2. What is the probability that the number is divisible by 16 (i.e., ends with 0)?
Solution
Here the sample space is the collection of all the 6-digit hex numbers.
Exercise 3.4.8. You are playing Bridge and you are dealt a hand of 13 cards.
1. What is the probability that you will get a hand of 4 spades, 3 hearts, 3 diamonds and 3 clubs?
Solution
Exercise 3.4.9. A committee of 9 is selected at random from a group of 11 students, 17 mothers and 13 fathers.
1. What is the probability that the committee has 3 students, 3 mothers, and 3 fathers, i. e., is a balanced
committee?
2. What is the probability that the committee has 4 mothers and 5 fathers?
Solution
Exercise 3.4.10. Three scholarships of unequal value will be awarded from a group of 35 applicants. How many ways
can such a selection be made?
Solution
Sometimes when new information becomes available, the probability of an event may have to be reevaluated in light of
this new information. Suppose we have a sample space S and an event E. Now suppose we have new information that an
event C has occurred. We will have to reevaluate the conditional probability of E given that C has occurred. The
conditional probability of E given that C has occurred is denoted by P(E|C). Clearly, P(E|C) may be different from P(E). In
fact, now that C has occurred, our old sample space is no longer relevant. And C assumes the role of the new sample
space.
Example. Suppose we pick a KU student at random and let E be the event that the student is taller than 6 feet. Then we
have the following observations.
P(E) = n(E)/n(S).
3. Now suppose we know that the student selected is a male. Let us denote the event that the student is a male by
C. The probability that the student is taller than 6 feet, given that the student is a male, is higher than "simple"
P(E). In fact, our new sample space is C, which is the whole KU male student population, not S, which is the
whole KU student population.
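4. The probability that the student is taller than 6 feet in height, given that the student is a male, is now
$$P(E|C) = \frac{n(E \cap C)}{n(C)}.$$
Dividing the numerator and the denominator by n(S), we can rewrite this as
$$P(E|C) = \frac{n(E \cap C)/n(S)}{n(C)/n(S)} = \frac{P(E \cap C)}{P(C)}.$$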
Based on the above example, we give the following definition and formula.
$$P(E|C) = \frac{P(E \cap C)}{P(C)}, \quad \text{if } P(C) \neq 0.$$
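To see the formula at work, here is a small sketch that assumes two rolls of a fair die with equally likely outcomes. The events and the helper cond_prob are illustrative and are not taken from the exercises.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))   # two rolls of a fair die (illustrative)

def prob(event):
    return Fraction(len(event), len(S))

def cond_prob(E, C):
    """P(E|C) = P(E and C) / P(C); undefined when P(C) = 0."""
    if prob(C) == 0:
        raise ValueError("P(C) must be nonzero")
    return prob(E & C) / prob(C)

E = {(i, j) for (i, j) in S if i + j == 8}   # the sum is 8
C = {(i, j) for (i, j) in S if i == 5}       # the first roll is 5

print(prob(E))          # 5/36
print(cond_prob(E, C))  # 1/6
```

Knowing that the first roll is 5 shrinks the sample space to the six outcomes in C, and only one of them, (5,3), gives a sum of 8.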
Independent Events
If the conditional probability P(E|F) equals the "simple" probability P(E), then we say that E and F are independent. In this
case, we have the multiplication rule
P(E∩F) = P(E)P(F).
If two events are not independent, then they are said to be dependent.
Remark. Let us also describe what we mean by independence of 3 or more events. For events E1,E2, … , En, we say they
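are independent if the "multiplication rule" applies to every sub-collection of them. For example, E,F,G,H are independent if
all of the following hold:
2 events: P(E∩F) = P(E)P(F), P(E∩G) = P(E)P(G), P(E∩H) = P(E)P(H), P(F∩G) = P(F)P(G), P(F∩H) = P(F)P(H), P(G∩H) = P(G)P(H)
3 events: P(E∩F∩G) = P(E)P(F)P(G), P(E∩F∩H) = P(E)P(F)P(H), P(E∩G∩H) = P(E)P(G)P(H), P(F∩G∩H) = P(F)P(G)P(H)
4 events: P(E∩F∩G∩H) = P(E)P(F)P(G)P(H)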
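Whether two events are independent can be tested directly from the multiplication rule. The sketch below again assumes two rolls of a fair die; the events and the helper independent are hypothetical names used for illustration.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))   # two rolls of a fair die (illustrative)

def prob(event):
    return Fraction(len(event), len(S))

def independent(E, F):
    """True when the multiplication rule P(E and F) = P(E) P(F) holds."""
    return prob(E & F) == prob(E) * prob(F)

first_even = {(i, j) for (i, j) in S if i % 2 == 0}
second_is_6 = {(i, j) for (i, j) in S if j == 6}
sum_is_12 = {(i, j) for (i, j) in S if i + j == 12}

print(independent(first_even, second_is_6))  # True
print(independent(first_even, sum_is_12))    # False
```

The first pair involves different rolls, so the events are independent; the second pair is dependent because a sum of 12 forces the first roll to be 6.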
Find P(B|A).
Solution
P(A|B) = .8 P(B) = .1
Find P(A∩B).
Solution
Exercise 3.5.3. In a certain county, the probability that a person took a flu shot is .45 and the probability that a person
will get flu, given that he/she took a flu shot is .06. What is the probability that a randomly selected person took a flu
shot and will get flu?
Solution
Circuit 1 Circuit 2
Exercise 3.5.5. An airplane has two engines. The probability that engine 1 fails is 0.023 and the probability that engine 2
fails is 0.06. Assume that the engines function independently.
Solution
1. The probability that a patient in the emergency room will have health insurance is 0.75.
2. The probability that a patient in the emergency room will survive the treatment is 0.85.
3. The probability that a patient in the emergency room will have health insurance and will also survive is 0.7.
What is the conditional probability that a patient in the emergency room will survive, given that he/she has health
insurance?
Solution
Exercise 3.5.7. The probability that you will receive a wrong number call this week is 0.3; the probability that you will
receive a sales call this week is 0.8; and the probability that you will receive a survey call this week is 0.5. What is
the probability that you will receive one of each this week? (Assume that all these calls are independent.)
Homework 12 and 13
Definition. Let S be a sample space. Then a random variable X assigns a numerical value X(w) to each outcome w in S.
Examples. Suppose we pick a KU student at random. Then our sample space S is the whole population of KU students.
1. Let X be the GPA of the student. If w is a student, X has a value X(w) which is the GPA of w.
2. Define Y as follows :
Y(w) = 0 If w is Male
Y(w) = 1 If w is Female
Definitions. A random variable X is said to be a discrete random variable if the values that X can assume can be
written in a (possibly infinite) list
x1, x2, x3, ….
A random variable X is said to be a continuous random variable if X can assume any value in an interval.
Remark. In this course, examples of discrete random variables are always the number of something: number of typos,
number of accidents on a street, number of defective items in a lot, and so on. Examples of continuous random variables
are length, weight, and time.
So, Z,W are continuous random variables and X,Y,T,D are discrete random variables.
Examples.
1. Let X be the number of wrong number calls you receive in a day. Then X is a discrete random variable.
2. Let X be the waiting time before you receive the next wrong number call. Then X is a continuous random variable.
The probability distribution of a random variable X is a table or a rule or a method that answers probability-related
questions regarding X.
So, if the probability distribution of X is given in a table, then it looks like this:
Value Probability
x p(x)
x1 p(x1)
x2 p(x2)
x3 p(x3)
… …
Properties of Probability function. Suppose X is a discrete random variable that assumes values
x1, x2, x3, …
and let p(x) be the probability function. Then we have the following:
1. 0 ≤ p(xi) ≤ 1.
2. ∑ p(xi) = 1.
The mean μ is also called the expected value of X and is denoted by E(X). The mean μ is also called the population
mean.
Example. Suppose you design a coin toss game. In this game, you give the opponent $3 if a head comes and you collect
$1 if a tail comes. Let X be the money you receive. Then X assumes the values -3 and 1. You also have a loaded coin so
that
Value Probability
x p(x)
-3 1/9
1 8/9
So, the mean μ of X is given by
μ = ∑ xi p(xi) = (-3)(1/9) + 1(8/9) = 5/9.
Interpretation of mean μ of X. In this example (see the first example in section 4.1), the mean μ tells us your average
win per game if you play for a long time.
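The same computation can be sketched in code for any probability table. The variance line below assumes the standard definition σ² = ∑ (xi - μ)² p(xi), since the formula itself does not survive in this text; the variable names are illustrative.

```python
from fractions import Fraction

# Probability distribution of X from the coin toss game above.
distribution = {-3: Fraction(1, 9), 1: Fraction(8, 9)}

assert sum(distribution.values()) == 1   # probabilities add up to one

mu = sum(x * p for x, p in distribution.items())                # mean E(X)
var = sum((x - mu) ** 2 * p for x, p in distribution.items())   # variance

print(mu)    # 5/9
print(var)   # 128/81
```

So over many games you would win about 5/9 of a dollar per game on average.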
Similarly, if Z is the height then the mean μ = E(Z) is the actual mean height of the KU student population. If we take a
large sample from the KU student population and compute the sample mean, it should approximate μ.
The variance σ2 is also called the population variance. If we take a large sample and compute the sample variance
s2 then s2 will be an estimate for σ2. Similarly, σ is called the population standard deviation.
Exercise 4.2.1. The number of passengers X in a car on a freeway has the following probability distribution.
X=x 1 2 3 4 5
p(x) 0.35 0.30 0.15 0.15 0.05
Find:
Solution
Exercise 4.2.2. Karin is a plumber who works for 3 different employers. Employer A pays her $120 a day, employer B
pays her $70 a day, and employer C pays her $180 a day. She works for whoever calls her first. The probability that
employer A calls her first is 0.30; the probability that employer B calls her first is 0.20; and the probability that employer C
calls her first is 0.40 (the probability that no one calls is 0.10). What are the expected value and variance of Karin's income per day?
Solution
Exercise 4.2.3. An insurance company sells a flight insurance policy at a flat rate of $500 per flight. If a policyholder dies
in flight, the insurance company pays $100,000 to the survivors. The probability that a policyholder will die in flight is
.003. What are the expected value and variance of the company's gain per sale?
Solution
There are many random variables that we encounter fairly often. The first one that we discuss is called a Bernoulli
random variable.
Definition. There are many statistical experiments that have only two outcomes. In such cases, the outcomes may be
called a success or a failure. So the sample space is
S={s,f}.
Such an experiment is called a Bernoulli trial. Given a Bernoulli trial, we can define a random variable as
X = 1 if success
X = 0 if failure
If the probability P(success) = p then we have P(failure) = 1-p. So, the probability distribution of a Bernoulli random
variable is given by
Value Probability
x p(x)
0 1-p
1 p
The mean of X is
μ = 0(1-p)+1p = p.
The variance of X is
$$\sigma^2 = (0-p)^2(1-p) + (1-p)^2 p = p(1-p).$$
Definition. An interesting statistical experiment is a combination of n "identical and independent" Bernoulli trials. Such
an experiment is called a binomial experiment. More formally, given a positive integer n and a number p with 0 ≤ p ≤ 1
a binomial(n,p) experiment (or B(n,p) experiment) is characterized as follows:
Then X is called a binomial (n,p)-random (or B(n,p)-random) variable. Following are some important facts about a
B(n,p)-random variable X:
2. The mean of X is
μ = E(X) = np.
3. The variance of X is
σ2 = Variance(X) = np(1-p).
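The facts about the mean and variance can be checked numerically. The sketch below relies on the standard binomial probability formula P(X = x) = nCx p^x (1-p)^(n-x), which is stated here as an assumption because it does not appear in the surviving text. The parameters n = 10 and p = 0.5 are hypothetical.

```python
from math import comb, isclose

def binomial_pmf(n, p, x):
    """P(X = x) for a B(n, p)-random variable: nCx * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.5   # hypothetical parameters, chosen only for illustration
pmf = [binomial_pmf(n, p, x) for x in range(n + 1)]

mean = sum(x * q for x, q in enumerate(pmf))
variance = sum((x - mean) ** 2 * q for x, q in enumerate(pmf))

print(isclose(sum(pmf), 1.0))              # probabilities add up to one
print(isclose(mean, n * p))                # mean agrees with np
print(isclose(variance, n * p * (1 - p)))  # variance agrees with np(1-p)
print(sum(pmf[2:]))                        # P(X is at least 2)
```

An "at least" probability is obtained by adding the probabilities of all qualifying values, or equivalently as 1 minus the probability of the remaining values.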
Exercise 4.3.1. Let X be a B(6,.3)-random variable. Find P(X = 2). Also find the probability that X is at least 2.
Solution
Exercise 4.3.2. According to a report entitled "Pediatric Nutrition Surveillance" published by the Centers for Disease Control
(CDC), 18 percent of children younger than 2 years had anemia in 1997. On a particular day, a pediatrician examined 11
children.
Exercise 4.3.3. A gardener planted 15 seeds. The probability that a seed will germinate is 0.1.
Solution
1. What is the probability that a jury of 12 will have exactly 6 Hispanic members?
2. What is the probability that a jury of 12 will have more than 6 Hispanic members?
Solution
Exercise 4.3.5. From the hiring statistics of a corporation (say IBM), it is known that for every 4 interviews they give,
they make 1 job offer. Suppose that the corporation interviews 8 candidates each time it comes to campus. What are the
mean and standard deviation of the number of job offers made each time?
5.1 Probability Density Function (pdf) 5.2 The Normal Random Variable 5.3 Normal Approximation to Binomial
Homework 14 - 16
Given a sample space S, a continuous random variable was defined as a random variable X that can assume any value
in an interval. The probability distribution of a continuous random variable is described very differently from that of a
discrete random variable. We describe it as follows.
Definition. Let S be a sample space and X be a continuous random variable. Then there is a function f(x), of real numbers
x, to be called the probability density function, abbreviated as pdf of X. This pdf f(x) has the following properties:
The probability P(a ≤ X ≤ b) is the area under the graph of y = f(x), above the x-axis, between the vertical lines x = a and x = b.
Look at the animations on
1. exponential probability.
2. normal probability.
P(X = a) = 0.
5. The whole area under the graph of y = f(x) above the x-axis must be one.
Remark. Given a continuous random variable X, to get a model for f(x) we look at a large sample and look at the relative
frequency histogram of the X-values.
$$f(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
Then we say X is uniformly distributed between 0 and 1 because it has the same density everywhere between 0 and 1.
$$g(x) = \begin{cases} 1/4 & \text{if } -1 \le x \le 3 \\ 0 & \text{otherwise} \end{cases}$$
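For a constant density, the area under the graph is just a rectangle, so probabilities can be computed without calculus. The following is a minimal sketch; the function uniform_probability and its parameters are hypothetical names.

```python
def uniform_probability(a, b, low, high):
    """P(a <= X <= b) when X is uniform on [low, high].

    The density is the constant 1 / (high - low) on [low, high], so the
    probability is the area of a rectangle: overlap length times height.
    """
    overlap = max(0.0, min(b, high) - max(a, low))
    return overlap / (high - low)

print(uniform_probability(0.25, 0.75, 0, 1))   # f(x) = 1 on [0, 1]
print(uniform_probability(0, 2, -1, 3))        # g(x) = 1/4 on [-1, 3]
print(uniform_probability(-1, 3, -1, 3))       # whole area under g
print(uniform_probability(2, 2, -1, 3))        # P(X = a) is zero
```

The last two lines illustrate the properties of a pdf: the whole area under the graph is one, and a single point has probability zero.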
The mean μ, variance σ2 and standard deviation σ of continuous random variables X are interpreted as we did for
discrete random variables. As before, the mean μ, which is also called the expectation E(X), represents the average value
of X.
But the definitions involve some calculus, which we are trying to avoid. For those who have had calculus, the
definitions are as follows.
Suppose f(x) is the pdf of a continuous random variable X. Then the mean of X is
$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx,$$
the variance of X is
$$\sigma^2 = \mathrm{Variance}(X) = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx,$$
and the standard deviation σ is the square root of the variance σ2.
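For readers who prefer to see the integrals evaluated, here is a rough numerical sketch in Python. It approximates the integrals with a midpoint rule (our choice of method, not the course's) for the density that equals 1/4 on [-1, 3] and 0 otherwise; the function names are hypothetical.

```python
def integrate(func, low, high, steps=10_000):
    """Approximate the integral of func over [low, high] by the midpoint rule."""
    width = (high - low) / steps
    return sum(func(low + (i + 0.5) * width) for i in range(steps)) * width


def g(x):
    """pdf that equals 1/4 on [-1, 3] and 0 otherwise."""
    return 0.25 if -1 <= x <= 3 else 0.0


# g vanishes outside [-1, 3], so integrating over that interval is enough.
total_area = integrate(g, -1, 3)
mean = integrate(lambda x: x * g(x), -1, 3)
variance = integrate(lambda x: (x - mean) ** 2 * g(x), -1, 3)
std_dev = variance ** 0.5

print(total_area, mean, variance, std_dev)
```

For this density the exact values are mean 1 and variance 4/3, so the printed numbers should be very close to those.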
1. Example 1. Normal
2. Example 2. t Distribution
3. Example 3. Chi-square Distribution
The most commonly encountered random variable in nature is the normal random variable. As we have seen in the last
section, the probability distribution of a random variable is determined by the pdf of the random variable. The pdf of a
normal random variable is described below.
PDF of a Normal Random Variable: Suppose f(x) is the pdf of a normal random variable X. Then we have the following
properties of f(x).
1. The graph of the pdf y = f(x) has a symmetric bell shape as illustrated below:
8. If X is a normal random variable, we say X is normally distributed, or X has normal distribution. We also
write X has N(μ,σ)-distribution.
Definition. A normal random variable is called a Standard Normal Random Variable if it has mean μ = 0 and standard
deviation σ = 1. So, a N(0,1)-random variable is called a standard normal variable. In some textbooks the standard normal
random variable is denoted by Z. The GOOD NEWS is that a table is available to compute these probabilities. The following
properties of Z will be useful.
1. The graph of the pdf y = f(x) of the standard normal random variable Z is symmetric around the y-axis.
2. The total area under the graph above the x-axis is one.
3. So, on each side of the y-axis, the area under the graph above the x-axis is .5.
4. Visit the flash animation on Standard Normal Probability to see illustrations of the above.
Using the Probability Tables: Tables are used widely to compute probability. However, due to the use of various
software programs on probability, the importance of such tables has declined. In this chapter, we will use the Z-table to
compute probability for the standard normal random variable. We note the following:
Inverse Probability: Sometimes we will be given the probability and asked to compute a "cut off" point.
1. Example: We may be given P(Z < c) = .975 and asked to compute c. You will see from the table P(Z<1.96) = .975
and conclude that c=1.96.
2. Example: We may be given P(l < Z) = .005 and asked to compute l. P(l<Z) represents the area on the right side
of l, under the bell curve. So, P(Z < l) = 1 - .005 = .995. From the table P(Z<2.58) = .995 (actually .9951, but
the exact match is not always expected). So, l=2.58.
3. Visit the animation on Inverse Z distribution to inspect a particular type of cut-off problem that we will use later.
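The two cut-off examples above can also be checked with software instead of the Z-table. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are ours.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal random variable

# Example 1: find c with P(Z < c) = .975
c = Z.inv_cdf(0.975)

# Example 2: find l with P(l < Z) = .005, i.e. P(Z < l) = 1 - .005 = .995
cutoff_l = Z.inv_cdf(1 - 0.005)

print(round(c, 2), round(cutoff_l, 2))
```

Rounded to two decimals, the software values agree with the table values 1.96 and 2.58 used above.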
Given a N(μ,σ)-random variable X, we can use the Z-table to compute probabilities for X because of the following theorem.
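Theorem. Let X be a N(μ, σ)-random variable. Then Z = (X-μ)/σ is a standard normal random variable. So,
$$P(a < X < b) = P\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right).$$
The same holds with ≤ in place of <, because P(X = a) = 0 for a continuous random variable.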
Problem Solving: We will have two types of problems in this section—probability computation and problems of inverse
probability (or cut-off points).
1. For a problem on normal random variables X with mean μ and standard deviation σ, the first step
is STANDARDIZATION.
2. Then, we look at the Z-table.
3. Example: Suppose X is a N(2, .5) random variable and P(X<L) = .95. What is the cut-off L? First, we standardize
and we have P((X-μ)/σ < (L-μ)/σ) = P(Z < (L-μ)/σ) = .95. From the table, P(Z < 1.65) = .95 (approximately). So,
(L-μ)/σ = 1.65 and L = μ + 1.65σ = 2 + 1.65 × .5 = 2.825.
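The standardization step can be sketched in Python as follows. This is only an illustration using `statistics.NormalDist`; the helper `prob_between` and the interval (1.5, 2.5) are our own choices.

```python
from statistics import NormalDist

mu, sigma = 2, 0.5   # the N(2, .5) random variable from the example
Z = NormalDist()     # N(0, 1)


def prob_between(a, b):
    """P(a < X < b) for X ~ N(mu, sigma), computed by standardizing."""
    return Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma)


# Cut-off L with P(X < L) = .95: solve (L - mu)/sigma = z, so L = mu + z*sigma.
z = Z.inv_cdf(0.95)
L = mu + z * sigma

print(round(z, 3), round(L, 3))
print(round(prob_between(1.5, 2.5), 4))
```

Software uses a more precise z (about 1.645) than the rounded table value 1.65, so the computed cut-off is slightly below 2.825.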
Ubiquity of Normal Random Variables: Any random variable that we encounter in nature is, almost certainly, either
normal or approximately normal. If there is one concept that you take from this course it is this: nature's random variables
are normal or approximately normal. You will hear about normal random variables and the bell curve in your workplace or
anywhere you may have to use statistics.
Problems on 5.2: the Normal Random Variable
Exercise 5.2.2. Let X be a normal random variable with mean μ = 3 and standard deviation σ = 1.5 .
Exercise 5.2.3. The length of life of some light bulbs produced in a factory is normally distributed with mean 8640 hours
and standard deviation 1440 hours. Find the probability that a bulb will last
Solution
Exercise 5.2.4. The length X of a fish in a lake has normal distribution with mean 67 cm and standard deviation 21 cm.
What proportion (i.e., probability) of fish are between 44 cm and 110 cm long?
Solution
Exercise 5.2.5. The diameter of the pumpkins in my patch has normal distribution with mean 13 inches and standard
deviation 4.5 inches. What proportion (i.e., probability) of pumpkins is above 22 inches?
Solution
Exercise 5.2.6. The annual expenditure X of a student is approximately normally distributed with mean μ = 11,000
dollars and standard deviation σ = 1500 dollars. What percent of students spend less than 10,000 dollars?
Solution
Exercise 5.2.7. Suppose the annual production X of milk per cow is normally distributed with μ = 5500 liters and
standard deviation σ = 150 liters. What percent of cows have annual yield less than 5155 liters?
Solution
Exercise 5.2.8. The amount of vegetable oil X produced by a machine in a day is normally distributed with μ = 130 liters
and standard deviation σ = 25 liters. What is the probability that a machine will produce between 120 liters and 150 liters
on a day?
Solution
Exercise 5.2.9. The weight X at birth of babies is normally distributed with mean μ = 114 oz and standard deviation σ =
18 oz. What percent of babies will have birth weight below 141 oz?
Solution
Exercise 5.2.11. The length X of a fish in a lake has normal distribution with mean 67 cm and standard deviation 21 cm.
On a fishing trip to the lake, you are instructed to release those in the lower 33 percent in length. What is the cut-off
length?
Solution
Exercise 5.2.12. The telephone company's data shows that length X of their international calls has normal distribution
with mean 11.5 minutes and standard deviation 4.3 minutes. The company decided to give a special rate for the longest
20 percent of calls. What is the cut-off time length?
Solution
Exercise 5.2.13. The weight X of babies (of a fixed age) is normally distributed with mean μ = 212 oz and standard
deviation σ = 25 oz. Doctors would be concerned (not necessarily alarmed) if a baby is among the lower 5.05 percent in
weight. Find the cut-off weight L below which the doctors will be concerned.
Solution
Exercise 5.2.14. Monthly water consumption X per household, in a subdivision in Kansas City, has normal distribution
with mean 15000 gallons and standard deviation 3000 gallons. It has been decided that a surcharge will be imposed for
those in the top 25 percent. Find the cut-off consumption U in gallons.
Solution
A wide range of random variables behave approximately like a normal random variable. One such example is
binomial(n,p)-random variables.
Roughly, if X is a B(n,p) random variable, then X behaves approximately like a normal random variable with mean μ = np
and standard deviation σ = √(np(1-p)).
However, X is discrete, while a normal random variable Y is continuous, so for any number r we have P(Y = r) = 0.
Because of this, some correction needs to be done. The following theorem states how to use normal approximation to
binomial random variables.
Theorem. Suppose X is a B(n,p) random variable. If n is large and p is not very close to 0 or 1, then X behaves,
approximately, like a N(μ, σ) random variable where μ = np and σ = √(np(1-p)).
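To see how well the approximation works, the sketch below compares an exact binomial probability with its normal approximation. It assumes the usual half-unit continuity correction, P(X ≤ k) ≈ P(Y < k + 0.5); the numbers n = 100, p = 0.3, k = 35 are hypothetical and not taken from the exercises.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Approximate P(X <= k) by P(Y < k + 0.5) with Y ~ N(np, sqrt(np(1-p)))."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    # Assumed continuity correction: extend the cut-off by half a unit.
    return NormalDist(mu, sigma).cdf(k + 0.5)


n, p, k = 100, 0.3, 35  # hypothetical numbers for illustration
exact = binom_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(round(exact, 4), round(approx, 4))
```

The two printed values should differ only in the third decimal place, which is the sense in which X "behaves approximately" like a normal random variable.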
Exercise 5.3.1. A Lawrence bank knows that 35 percent of its customers will visit the drive-through window. If 400
customers visit the bank, what is the approximate probability that more than 120 will visit the drive-through window?
Solution
Exercise 5.3.2. It is known that the probability that a household owns a food processor is 0.1. If 190 households are
interviewed, find the approximate probability that
Solution
Exercise 5.3.3. The campaign committee of a candidate claims that sixty percent of the voters are in favor of the
candidate. You interview 150 voters. Assuming that the campaign committee's claim is accurate, what is the approximate
probability that less than 77 will favor the candidate?
Solution
Exercise 5.3.4. A technique is used to fertilize eggs in a fertility clinic laboratory. It is known that the probability that an
egg will be fertilized by this technique is 0.1. If 500 eggs are treated, what is the probability that at least 60 eggs will be
fertilized?
Solution
Exercise 5.3.5. The probability that a computer chip produced in a factory is defective is .2. If you have a sample of 60
chips, what is the probability that the number of defective chips will be less than 20?
Solution
Exercise 5.3.6. The probability that a light bulb produced by a machine is defective is p = 0.2. Suppose a quality control
inspector takes a sample of 120 bulbs. What is the probability that more than 30 bulbs will be defective?
Solution
Exercise 5.3.7. Suppose the probability that a student has access to the Internet is p = 0.8. Suppose you interview 160
students. What is the probability that less than 120 students will have access to the Internet?
Solution
Exercise 5.3.8. Suppose that the probability that a person favors medical use of marijuana is p = 0.6. If 780 individuals
are interviewed, what is the probability that less than 450 will be in favor?
Solution
Exercise 5.3.9. Suppose that the probability that a middle-income family invests in the stock market is p = 0.8. If we
interview 880 middle-income families, what is the probability that more than 700 have invested in the stock market?
Solution
Exercise 5.3.10. Suppose that an insurance company knows from experience that the probability that a life-insurance
policyholder will survive another 10 years is p = 0.9. The company has 2280 policyholders. What is the probability that
more than 2025 will survive another 10 years?
Introduction 6.1 Central Limit Theorem and Sampling Distribution of the Proportion Homework 17
Introduction
The sample mean x̄ that we have computed in the previous chapters is, in fact, the observed value of a random variable X̄.
Similarly, the sample variance s² that we have computed before is the observed value of a random variable S². Each time
you collect a sample/data, the computed sample mean x̄ is the value of the random variable X̄ for this sample. This is
explained in the following example.
Example. Suppose we want to study the height distribution of the U.S. population. We collect data of size n = 1713. We
shall consider that the height $x_i$ of the ith individual in this sample is, in fact, the observed value of a random
variable $X_i$. Here $X_i$ is the notation for the height of the ith member of the sample, which could be the height of any
person from the whole U.S. population. When we have finished collecting data, we have n measurements
$x_1, x_2, \ldots, x_n$.
We (re)define the sample mean as the random variable
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}.$$
We also (re)define the sample variance S² as the random variable
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2.$$
So, the sample mean we computed before in Lesson 2 is a value of X̄.
We also say that X1, X2, … , Xn is a sample from the population X = height of an American. We assume that our sampling
was done with replacement. Such a sample has the following properties.
1. Let X = height of an American and let the mean of X be μ and the variance σ². Then X is called the parent or
the population random variable. Also μ and σ² are called the population mean and variance.
2. Then, each sample member $X_i$ has the same distribution as X. So, the mean of $X_i$ is μ and the variance of $X_i$ is σ².
3. The sample members $X_1, X_2, \ldots, X_n$ are all mutually independent.
4. The distribution of X̄ is called the sampling distribution of X̄.
5. Theorem. The mean of the sample mean X̄ is the population mean μ, that is
$$E(\bar{X}) = E(X) = \mu.$$
The variance and standard deviation of X̄ are
$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}.$$
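The statements about the mean and variance of the sample mean can be checked by simulation. The sketch below is only an illustration under assumptions of our own: the parent population is taken to be uniform on [0, 1] (so μ = 1/2 and σ² = 1/12), and the sample size and number of repetitions are arbitrary.

```python
import random
from statistics import fmean, pvariance

random.seed(0)  # fixed seed so the run is repeatable

n = 25            # sample size
trials = 20_000   # number of samples drawn

# Assumed parent population: uniform on [0, 1], so mu = 1/2 and sigma^2 = 1/12.
mu, var = 0.5, 1 / 12

# Each entry is the observed sample mean of one sample of size n.
sample_means = [fmean(random.random() for _ in range(n)) for _ in range(trials)]

print("mean of sample means:", fmean(sample_means), "vs mu =", mu)
print("variance of sample means:", pvariance(sample_means), "vs sigma^2/n =", var / n)
```

Both printed pairs should be close to each other, which is what the theorem predicts.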
Remark. In the above discussion, we have assumed that the sampling was done with replacement. That means that each
time a sample member is drawn, it is placed back before we select the next member. A member could, therefore, appear
more than once. Although this may seem unnatural, when we are working with a large population this is not likely to
happen and is most natural from the statistical point of view. (How often would one receive calls twice for the same
poll?)
The type of sampling where we do not place back the item selected before we select the next one is called sampling
without replacement. Although many textbooks have a lengthy discussion of this concept, we will not emphasize it. All our
samples are drawn with replacement and have the above properties.
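Central Limit Theorem (CLT). Assume n is large.
1. Then the sample mean X̄ has, approximately, $N(\mu, \sigma_{\bar{X}})$ distribution, where $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
2. So, approximately,
$$P(a < \bar{X} < b) = P\left(\frac{a-\mu}{\sigma_{\bar{X}}} < Z < \frac{b-\mu}{\sigma_{\bar{X}}}\right)$$
or, equivalently,
$$P(a < \bar{X} < b) = P\left(\frac{a-\mu}{\sigma/\sqrt{n}} < Z < \frac{b-\mu}{\sigma/\sqrt{n}}\right),$$
where Z is a standard normal random variable.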
3. If the parent population X is Normal, then 1) and 2) are exact.
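A probability about the sample mean can be computed by standardizing with σ/√n. The sketch below uses hypothetical numbers (population mean 50, standard deviation 12, sample size 36) and a helper name of our own.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_above(c, mu, sigma, n):
    """Approximate P(sample mean > c) using the CLT."""
    se = sigma / sqrt(n)  # standard deviation of the sample mean
    z = (c - mu) / se     # standardized cut-off
    return 1 - NormalDist().cdf(z)


# Hypothetical population: mean 50, standard deviation 12, sample size 36.
p_above_53 = prob_sample_mean_above(53, mu=50, sigma=12, n=36)
print(round(p_above_53, 4))
```

Here σ/√n = 2, so the cut-off 53 standardizes to z = 1.5.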
Suppose you are conducting a poll to determine the proportion p (or percentage) of people in favor of a certain
presidential candidate. You interview a randomly selected sample of n voters. Then you let X be the number of people
among these n voters who are in favor of the candidate. Then X/n is the proportion in this sample that are in favor of the
candidate. We use this sample proportion X/n as an estimate for the proportion of the entire voter population that are in
favor of the candidate. This is the number X/n that the pollsters report on TV every evening before the election.
Here p is the proportion of voters that are in favor of the candidate. So, X is a B(n,p) random variable. We have already
seen (section 5.3 in lesson 5) that, approximately, X follows a N(μ, σ) distribution, where μ = np, σ = √(np(1-p)). From
this it follows that the sample proportion X/n, approximately, has
N(p, σ) distribution
where σ = √(p(1-p)/n).
In fact, the same could be derived from the central limit theorem. Let
Y=1 if success
Y=0 if failure
Here by "success" we mean that the voter is in favor of the candidate. Then Y is a Bernoulli(p) random variable and the
mean of Y is p and the variance(Y) = p(1-p). The response of each voter in the sample could thus be represented as a
random variable as follows
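Then $X_1, X_2, \ldots, X_n$ is a sample from the Y-population, and the sample proportion
$$\frac{X}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \bar{X}$$
is the sample mean. So, by the CLT, the sample proportion X̄ = X/n has, approximately,
N(p, σ) distribution
where σ = √(p(1-p)/n).
2. So, approximately,
$$P\left(a < \frac{X}{n} < b\right) = P\left(\frac{a-p}{\sigma_{X/n}} < Z < \frac{b-p}{\sigma_{X/n}}\right),$$
where $\sigma_{X/n} = \sqrt{p(1-p)/n}$ and Z is a standard normal random variable.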
Remark. The same thing applies when you are trying to estimate the proportion of success p. Some examples might be
the proportion of defective items, the proportion of people in favor of capital punishment, the proportion of immigrants.
Remark. The normal approximation of the sample proportion given above is not really different from the normal
approximation of the binomial random variable (section 5.3). The only difference is the way we use them. In section 5.3,
we used continuity correction. For large n, continuity correction is, in fact, negligible and will not have any effect.
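As an illustration of the normal approximation for a sample proportion, the sketch below computes the chance that a poll of n = 400 voters lands within 3 percentage points of a true proportion p = 0.5. These numbers and the helper name are hypothetical, and no continuity correction is applied.

```python
from math import sqrt
from statistics import NormalDist


def prob_proportion_between(a, b, p, n):
    """Approximate P(a < X/n < b) when X ~ B(n, p), using N(p, sqrt(p(1-p)/n))."""
    sigma = sqrt(p * (1 - p) / n)  # standard deviation of the sample proportion
    Z = NormalDist()
    return Z.cdf((b - p) / sigma) - Z.cdf((a - p) / sigma)


prob = prob_proportion_between(0.47, 0.53, p=0.5, n=400)
print(round(prob, 4))
```

With these numbers the standard deviation of X/n is 0.025, so the interval corresponds to -1.2 < Z < 1.2.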
Problems on 6.1: Central Limit Theorem and Sampling Distribution of the Proportion
Exercise 6.1.1. It is known that the tuition paid per semester by students in a university has a distribution with mean
$2,050 and standard deviation $310. If 64 students are interviewed, what is the approximate probability that the sample
mean tuition paid will be above $2,060?
Solution
Exercise 6.1.2.
The monthly water consumption X per household in a subdivision in Kansas City has normal distribution with mean 15000
gallons and standard deviation 3000 gallons. What is the probability that the mean consumption of the 44 households in
the subdivision will exceed 16000 gallons?
Solution
Exercise 6.1.3. According to some data, the annual Kansas wheat export X has a mean 733 million dollars and standard
deviation 163 million dollars. What is the probability that over the next 10 years Kansas wheat exports will exceed 8040
million dollars?
Solution
Exercise 6.1.4. According to a report entitled "Pediatric Nutrition Surveillance" published by Centers for Disease Control
(CDC) 18 percent of the children younger than two had anemia in 1997. On a particular day in that year, a pediatrician
examined 180 children.
Solution
Exercise 6.1.5. On one day during an impeachment hearing, it is claimed that 75 percent of eligible voters think the
President should not be impeached. Suppose we interview 700 voters. Assuming the above, what is the probability that the
sample proportion of voters who do not think the President should be impeached
Introduction
The name of the game in statistics is trying to understand the POPULATION on the basis of the information
available in the SAMPLE. Part of what we mean by "understand" is estimating the values of the population parameters.
The game here is to use suitable sample STATISTICS to estimate population parameters. For example, we may like
to use the sample mean x̄ as an estimate for the population mean μ.
1. The first one is called point estimation. In point estimation, we give a number as an estimate for the parameter.
For example, if we are trying to estimate the mean height μ of the American population, we may take a sample of
a certain size, compute the sample mean height x̄, and call it an estimate for μ.
2. The second one is called interval estimation. In interval estimation we give an interval (L, U) and say that the
parameter will be within this interval (with a certain level of confidence). For example, when estimating the mean
height μ of the American population, we may take a sample, compute the sample mean x̄ and say that the
population mean μ is in the interval (x̄-1, x̄+1). Obviously, in interval estimation, the smaller the length, U-L, of
the interval and the higher the level of confidence, the better the estimation is.
As we have already mentioned, we use a statistic to estimate a parameter. The statistic T used to estimate a
parameter θ is called an estimator of θ. The computed value t of T is called a point estimate or an estimate of θ. For
example, the sample mean X̄ is an estimator of μ and the computed value x̄ is an estimate of μ. The estimator is a
sampling random variable. Similarly, the sample variance S² is an estimator of the population variance σ² and the
computed value s² is an estimate of σ².
It may be intuitively clear to you why X̄ and S² would be reasonable estimators, respectively, for μ and σ². Mathematically,
the reasons are as follows:
1. We have
$$E(\bar{X}) = \mu \quad\text{and}\quad E(S^2) = \sigma^2.$$
For this reason we say X̄ and S² are unbiased estimators, respectively, for μ and σ².
2. The variance
$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$$
is small if n is large. So, for large n, the standard deviation $\sigma_{\bar{X}}$ of X̄ is small. This means that values of X̄ will be
close to the mean μ more frequently. This improves the level of confidence for X̄ as an estimator of μ. View the
animation on normal distribution to see how the probability mass concentrates around the mean μ as the standard
deviation decreases.
Interval Estimation
We would almost never expect a point estimate t of a parameter θ to be exactly equal to the actual value of θ. This is why
it is more reasonable to give an interval (L,U) and say that θ would be within this interval. Here L, U will be statistics.
Since the computed values l, u of L, U depend on the sample, we do not expect that the value of θ will always be
within this computed interval (l,u). We are happy as long as the true value of θ falls within the interval (l,u) most
often (or often enough), allowing the possibility of being "wrong" a few times.
But how often is often enough? The probability P(L < θ < U) tells us how often the parameter will fall within (l,u). So, it is
also reasonable to give the probability P(L < θ < U) or P(θ ∉ (L,U)). This is what we do in interval estimation; the
resulting interval is also called a confidence interval of θ.
Definition. Let θ be a population parameter. An interval estimate for θ provides the following:
1. An interval (L, U), where L and U are statistics.
2. The probability 1-α = P(L < θ < U), which is called the level of confidence. And (L,U) is said to be a (1-α)100 percent
confidence interval of θ.
3. In practice, α will be a small number, like, 0.1, 0.01, 0.05.
View the animation on inverse Z-distribution to understand the numbers $z_\alpha$. As mentioned above, for us α will be a small
number such as .1, .01, .05 and so on. At the end of the Z-table is a list of the numbers $z_\alpha$ that we may need frequently.
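Suppose X is a random variable with mean μ and variance σ². We want to construct a confidence interval for μ.
We assume that σ is known. Let $X_1, X_2, \ldots, X_n$ be a sample from X. Note that from the CLT we have, approximately,
$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1-\alpha,$$
where Z = (X̄-μ)√n/σ.
If we simplify, we get
$$P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha.$$
Theorem. Assume that σ is known. Then a (1-α)100 percent confidence interval for μ is given by
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$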
Remarks.
1. If you go on computing (1-α)100 percent confidence intervals on a regular basis, the true value of μ will fall
outside the confidence interval about α100 percent of the time.
2. The confidence interval we computed above may also be called a (1-α)100 percent two sided confidence interval
for μ. There could be all kinds of confidence intervals. For example, if
then (L, ∞) will be a (1-α)100 percent one sided (upper) confidence interval for μ.
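1. The length of the two-sided confidence interval is
$$l = 2 z_{\alpha/2}\,\sigma/\sqrt{n}.$$
2. The margin of error is half of this length:
$$E = z_{\alpha/2}\,\sigma/\sqrt{n}.$$
3. The sample size n needed for a (1-α)100 percent confidence interval to have a preassigned margin of error E is
given by
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2.$$
To be sure, always round upward in this class. Also use the Z-table for online homework.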
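The margin of error, the confidence interval and the sample size formula can be sketched in Python as follows. The sketch assumes $z_\alpha$ is defined by P(Z > $z_\alpha$) = α, takes the interval to be x̄ ± E, and uses hypothetical numbers (x̄ = 100, σ = 10, n = 49); the function names are ours.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_crit(alpha):
    """z_alpha, assumed here to satisfy P(Z > z_alpha) = alpha."""
    return NormalDist().inv_cdf(1 - alpha)


def z_interval(xbar, sigma, n, conf):
    """Two-sided confidence interval for mu when sigma is known."""
    alpha = 1 - conf
    E = z_crit(alpha / 2) * sigma / sqrt(n)  # margin of error
    return xbar - E, xbar + E, E


def sample_size(sigma, E, conf):
    """Smallest n whose margin of error is at most E (always rounded upward)."""
    alpha = 1 - conf
    return ceil((z_crit(alpha / 2) * sigma / E) ** 2)


# Hypothetical data: sample mean 100, known sigma 10, n = 49, 95 percent confidence.
low, high, E = z_interval(xbar=100, sigma=10, n=49, conf=0.95)
n_needed = sample_size(sigma=10, E=2, conf=0.95)
print(round(low, 2), round(high, 2), round(E, 2), n_needed)
```

Software uses z ≈ 1.95996 rather than the table value 1.96, so results can differ from hand computations in the last decimal place.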
Exercise 7.1.1. Assume that you have a normal population with mean μ and standard deviation σ = 15. Suppose you
have collected a sample of size 25 and the sample mean x̄ was found to be 81.
Exercise 7.1.2. Assume that you have a normal population with mean μ and standard deviation σ = 9.8. Suppose you
have collected a sample of size 14 and the sample mean x̄ was found to be 151.1.
Exercise 7.1.3. The time taken by an athlete to run an event is normally distributed with mean μ and known standard
deviation σ = 3.5 seconds. To estimate the mean μ, he ran 16 times and the sample mean was found to be x̄ = 33
seconds.
1. Find the margin of error in estimating the true mean μ with 95 percent level of confidence.
2. Find a 99 percent confidence interval for μ.
Solution
Exercise 7.1.4. A population has normal distribution with variance σ² = 289. How large a sample do we need to estimate
the mean μ within 3 units from the true value of μ, with 90 percent confidence?
Solution
Exercise 7.1.5. The tuition X paid by a student per semester in a university has a distribution with mean μ and σ = $416.
How large a sample should you draw so that you are 95 percent sure that the true value of μ will be within $10 of the
sample mean x̄?
Solution
Let X be a normal random variable with mean μ and variance σ². Unlike in the last section, in this section we assume
that σ is not known, and we try to compute a confidence interval of μ. In the last section, the main tool (or fact) that we
used was that
Z = (X̄-μ)√n/σ
has, approximately, N(0,1) distribution. Since σ is now unknown, we replace σ by the sample standard deviation S and use
T = (X̄-μ)√n/S.
The distribution of T is known as the t-distribution with n-1 degrees of freedom, which we have not discussed. As we did
for the N(0,1) random variable, we will now give the properties of the t-distribution.
About t-distribution
Given a positive integer ν, there is a random variable T = $t_\nu$ that is said to have t-distribution with ν degrees of freedom.
The useful properties of the t-distribution are listed below:
1. A t-random variable has degrees of freedom. If a random variable T has t-distribution with ν degrees of
freedom, then we say that T has $t_\nu$ distribution.
4. The graph of the pdf of a t-random variable is symmetric around the y-axis and has a bell shape.
5. For a T = $t_\nu$ random variable, if the degrees of freedom ν is large, then it can be approximated by a N(0,1)
random variable.
6. For a number 0 < α < 1 and any positive integer ν, we define a number $t_{\nu,\alpha}$ by the equation
$$P(T > t_{\nu,\alpha}) = \alpha,$$
where T has t-distribution with ν degrees of freedom.
7. Tables are available, one for each degree of freedom ν, that can be used to compute the probability for T-random
variables. We will need only some of the numbers $t_{\nu,\alpha}$. A table sufficient for our purposes is provided.
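Theorem. Let X be a normal random variable with mean μ and standard deviation σ. Let $X_1, X_2, \ldots, X_n$ be a
sample of size n from the X population. Then
$$T = \frac{(\bar{X}-\mu)\sqrt{n}}{S}$$
has t-distribution with n-1 degrees of freedom. So,
$$P\left(-t_{n-1,\alpha/2} < \frac{(\bar{X}-\mu)\sqrt{n}}{S} < t_{n-1,\alpha/2}\right) = 1-\alpha.$$
If we simplify, we get
$$P\left(\bar{X} - t_{n-1,\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,\alpha/2}\frac{S}{\sqrt{n}}\right) = 1-\alpha.$$
Under the set-up of the theorem, a (1-α)100 percent confidence interval for μ is given by
$$\bar{x} - E < \mu < \bar{x} + E,$$
where $E = t_{n-1,\alpha/2}\, s/\sqrt{n}$.
E is also called the margin of error.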
A Frequently Asked Question: To estimate μ, when do we use the ZInterval and when do we use the TInterval? Answer:
We use the TInterval only when σ is not known.
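When σ is not known, the interval is x̄ ± E with E = $t_{n-1,\alpha/2}$ s/√n. The Python standard library has no t-quantile function, so the sketch below takes the critical value as an input read from a t-table. The data (n = 16, x̄ = 50, s = 8) are hypothetical; the value $t_{15,0.025}$ = 2.131 is the usual table entry.

```python
from math import sqrt


def t_interval(xbar, s, n, t_crit):
    """Confidence interval for mu when sigma is unknown.

    t_crit is t_{n-1, alpha/2}, read from a t-table by the caller.
    """
    E = t_crit * s / sqrt(n)  # margin of error
    return xbar - E, xbar + E, E


# Hypothetical data: n = 16, sample mean 50, sample standard deviation 8.
# For 95 percent confidence, t_{15, 0.025} = 2.131 from a t-table.
low, high, E = t_interval(xbar=50, s=8, n=16, t_crit=2.131)
print(round(low, 3), round(high, 3), round(E, 3))
```

Passing the table value explicitly keeps the sketch close to how the computation is done by hand in this section.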
Exercise 7.2.1. Assume that we have a normal population with mean μ and standard deviation σ. We have a sample of
size n = 18 that has sample mean x̄ = 170.5 and standard deviation s = 13.3. Find the margin of error and compute a 99
percent confidence interval for μ.
Solution
Exercise 7.2.2. Suppose that the time taken to complete a problem in a Math 365 test is normally distributed with
mean μ and standard deviation σ. A sample of size 23 was taken, and sample mean and standard deviation were found to
be x̄ = 4.7 and s = .47. Estimate the mean time μ taken to complete a problem using a 98 percent confidence interval.
Solution
Exercise 7.2.3. It is assumed that the lifetime (in hours) of lightbulbs produced in a factory is normally distributed with
mean μ and standard deviation σ. To estimate μ the following data was collected on the lifetime of bulbs.
Compute a 95 percent confidence interval for μ. Write down the formula for (1-α)100 percent confidence interval that you
use here.
Solution
Exercise 7.2.4. To estimate the mean weight (in pounds) of salmon in a river the following sample was collected:
34.7 33.8 38.2 20.3 27.8 45.3 43.1 37.3 32.5 32.3
31.8 41.5 44.5 29.2 25.3 29.6 39.5 29.1 37.3
Compute a 99 percent confidence interval for the population mean μ. Write down the formula for (1-α)100 percent confidence
interval that you use here.
Solution
Exercise 7.2.5. Suppose we collect a sample from a normal population of size n = 40 with sample mean x̄ = 18.6 and
standard deviation s = 9.486. Construct a 95 percent confidence interval for mean μ.
Solution
Exercise 7.2.6. The time taken by an athlete to run an event is normally distributed with mean μ and unknown standard
deviation σ. To estimate the mean μ he ran 16 times and the sample mean was found to be x̄ = 33 seconds and the
sample standard deviation s = 3.5 seconds.
1. Find the margin of error in estimating the true mean μ with 99 percent level of confidence.
2. Find a 99 percent confidence interval for μ.
Solution
Let X be a normal random variable with mean μ and variance σ². In this section, we will construct a confidence interval
for σ². We will take a sample $X_1, X_2, \ldots, X_n$ of size n from the X population. Let X̄ be the sample mean and let S² be the
sample variance. To compute a confidence interval for σ², we will be using the distribution of
U = (n-1)S²/σ²
The distribution of U is known as the χ² distribution with n-1 degrees of freedom, which we have not discussed. Next we
will give the properties of a χ² random variable.
About χ2-distribution
Given a positive integer ν, there is a random variable $\chi^2_\nu$ that is said to have χ² distribution with ν degrees of freedom.
The useful properties of the χ² distribution are listed below.
1. A χ² random variable has degrees of freedom. If a random variable U has χ² distribution with ν degrees of
freedom, then we say that U has $\chi^2_\nu$-distribution.
4. The graph of the pdf of a χ² random variable is skewed to the right. If the degrees of freedom, ν, is large then it
can be approximated by a normal random variable with mean ν and variance 2ν.
View the animations on pdf of Chi-Square random variable and probability distribution of Chi-Square.
5. If U is a $\chi^2_\nu$ random variable then the mean of U is ν. (We will not need this.) This fact is reflected in the
animation above.
6. For a number 0 < α < 1 and any positive integer ν, we define a number $\chi^2_{\nu,\alpha}$ by the equation
$$P(U > \chi^2_{\nu,\alpha}) = \alpha,$$
where U has χ² distribution with ν degrees of freedom.
View the animation on inverse Chi-Square distribution to understand the numbers $\chi^2_{\nu,\alpha}$.
7. Tables are available, one for each degree of freedom ν, that can be used to compute probability for χ²-random
variables. For our purpose, only some of the numbers $\chi^2_{\nu,\alpha}$ will be needed. Here is a link for a table that will be
sufficient for us.
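Theorem. Let X be a normal random variable with mean μ and variance σ². Let $X_1, X_2, \ldots, X_n$ be a sample of size n from
the X population. Then
$$T = \frac{(n-1)S^2}{\sigma^2}$$
has χ² distribution with n-1 degrees of freedom. So,
$$P\left(\chi^2_{n-1,1-\alpha/2} < \frac{(n-1)S^2}{\sigma^2} < \chi^2_{n-1,\alpha/2}\right) = 1-\alpha.$$
If we simplify, we get
$$P(L < \sigma^2 < U) = 1-\alpha,$$
where
$$L = \frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}}, \qquad U = \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}}.$$
Theorem. Under the same set-up as in the above theorem, a (1-α)100 percent confidence interval for the variance σ² is
given by
$$l < \sigma^2 < u,$$
where
$$l = \frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}}, \qquad u = \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}},$$
or, written out,
$$\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}.$$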
Use of Calculators: The TI-83 will not compute the confidence interval for σ². If data is given, it is important to use the
calculator to compute the sample variance s².
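Since a calculator may not provide this interval, the arithmetic can be sketched in Python. The Python standard library has no χ² quantile function, so the two critical values are passed in from a χ² table. The data (n = 10, s² = 4) are hypothetical; 19.023 and 2.700 are the usual table entries for 9 degrees of freedom at α/2 = 0.025 and 1-α/2 = 0.975.

```python
def variance_interval(s2, n, chi2_upper, chi2_lower):
    """Confidence interval (l, u) for the variance of a normal population.

    chi2_upper is chi^2_{n-1, alpha/2} and chi2_lower is chi^2_{n-1, 1-alpha/2},
    both read from a chi-square table by the caller.
    """
    l = (n - 1) * s2 / chi2_upper  # dividing by the larger number gives the lower limit
    u = (n - 1) * s2 / chi2_lower
    return l, u


# Hypothetical data: n = 10, sample variance 4, 95 percent confidence.
l, u = variance_interval(s2=4.0, n=10, chi2_upper=19.023, chi2_lower=2.700)
print(round(l, 4), round(u, 4))
```

Note that the interval is not symmetric around s², because the χ² distribution is skewed to the right.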
Exercise 7.3.1. Suppose that we have collected a sample of size n = 26 from a normal population with mean μ and
variance σ². The sample variance was found to be s² = 26.7. Compute a 95 percent confidence interval for σ².
Solution
Exercise 7.3.2. The following is sample data on the amount (in 1000 bushels) of wheat harvested by Kansas farmers in
2002.
Solution
Exercise 7.3.3. The following is data on monthly gas consumption (in ccf) during the winter months by a household.
154 222 264 257 127
Solution
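Once again, let p be the population proportion of a certain attribute. We want to compute a confidence interval for p. We
let
$X_i$ = 1 if the ith sample member is a success
$X_i$ = 0 if the ith sample member is a failure
and let
X = $X_1 + \cdots + X_n$
be the number of successes in the sample, and let
X̄ = X/n
be the sample proportion of success. We have seen that, approximately, the sample proportion X̄ has
$N(\mu_{\bar{X}}, \sigma_{\bar{X}})$-distribution, where $\mu_{\bar{X}} = p$ and $\sigma_{\bar{X}} = \sqrt{p(1-p)/n}$.
Therefore,
$$P\left(-z_{\alpha/2} < \frac{\bar{X}-p}{\sigma_{\bar{X}}} < z_{\alpha/2}\right) = 1-\alpha.$$
Since p is unknown and $\sigma_{\bar{X}}$ depends on p, this will not directly produce a confidence interval for p. But the sample
proportion x̄ of success is a point estimate of p. So we have an approximate (1-α)100 percent confidence interval for p given by
$$\bar{x} - e < p < \bar{x} + e,$$
where
$$e = z_{\alpha/2}\sqrt{\bar{x}(1-\bar{x})/n}.$$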
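Following are some of the useful formulas and definitions that we may need.
1. The margin of error is
$$e = z_{\alpha/2}\sqrt{\bar{x}(1-\bar{x})/n}.$$
2. The conservative margin of error is
$$E = \frac{z_{\alpha/2}}{\sqrt{4n}}.$$
It can be checked that the margin of error e is always less than or equal to the conservative margin of error E,
because x̄(1-x̄) is never larger than 1/4.
3. Theorem. For a (1-α)100 percent confidence interval for p, if we are given a preassigned conservative margin of
error E, then the sample size n that we need to take is given by
$$n = \left(\frac{z_{\alpha/2}}{2E}\right)^2.$$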
President Clinton has 64 percent approval rating. The poll has a margin of error plus or minus 3.1
percentage points. The poll surveyed 972 people.
They mean that the sample proportion x̄ of people who "approve" President Clinton is 0.64. Normally they don't tell us the
level of confidence they are using. Assuming that they are using a 95 percent confidence interval, they mean that
$$E = \frac{z_{\alpha/2}}{\sqrt{4n}} = \frac{1.96}{\sqrt{4 \times 972}} = 0.031.$$
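The poll numbers above can be reproduced with a short Python sketch. It assumes a 95 percent level of confidence (as in the text), takes the interval to be x̄ ± e, and obtains the sample size by solving E = $z_{\alpha/2}$/√(4n) for n; the function names are ours.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_half_alpha(conf):
    """z_{alpha/2} for a given level of confidence 1 - alpha."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def proportion_interval(xbar, n, conf):
    """Approximate confidence interval for p, plus both margins of error."""
    z = z_half_alpha(conf)
    e = z * sqrt(xbar * (1 - xbar) / n)  # margin of error
    E = z / sqrt(4 * n)                  # conservative margin of error
    return xbar - e, xbar + e, e, E


def conservative_sample_size(E, conf):
    """Solve E = z/sqrt(4n) for n and round upward."""
    return ceil((z_half_alpha(conf) / (2 * E)) ** 2)


# The poll quoted above: sample proportion 0.64 among 972 people.
low, high, e, E = proportion_interval(0.64, 972, 0.95)
print(round(low, 3), round(high, 3), round(e, 4), round(E, 3))
print(conservative_sample_size(0.03, 0.95))
```

The conservative margin of error rounds to 0.031, matching the 3.1 percentage points reported with the poll, and e comes out slightly smaller than E.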
Exercise 7.4.1. In a sample of 197 apples from a lot, 19 were found to be sour. Set a 99 percent confidence interval for
the proportion p of sour apples in the lot.
Solution
Exercise 7.4.2. A new vaccine was tried on 147 randomly selected individuals, and it was determined that 97 of them
developed immunity. Find a 95 percent confidence interval for the proportion p of individuals in the population for whom
the vaccine would help.
Solution
Exercise 7.4.3. Before a congressional election, a poll was conducted. Out of 887 randomly selected voters interviewed,
389 said that they would vote for Candidate A, and 359 said that they would vote for Candidate B.
1. Construct a 98 percent confidence interval for the proportion p of voters who would vote for A.
2. Construct a 98 percent confidence interval for the proportion p of voters who would vote for B.
3. What is the conservative margin of error for both?
Solution
Exercise 7.4.4. If a pollster wanted to estimate the proportion p of Americans who think that the President should not be
impeached, how large a sample should he/she take so that the true value of p will be within .02 of the sample proportion,
with 99 percent confidence?
Solution
Exercise 7.4.5. The proportion p of defective lightbulbs produced by a machine needs to be estimated within .01 to
determine whether the machine needs to be replaced. How large a sample should we take to do this with 90 percent
confidence?
Solution
Exercise 7.4.6. In a poll released on October 28,1998, it was revealed that 60 percent of Americans wanted President
Clinton rebuked but not impeached. The poll was conducted among 1,013 adults, and it had a margin of error of 3
percentage points.
Solution: News media polls use 95 percent confidence intervals. When they say "margin of error," they mean "conservative
margin of error." The conservative margin of error E and level of confidence 1 - α are related by the formula E =
zα/2 /√(4n). For this problem E = .03, 1 - α = .95, and n = 1,013. We can check zα/2 /√(4n) = 1.96/√(4 × 1013) =
0.03079.
Lesson 8 : Comparing Two Populations
Introduction
In this lesson we try to compare two populations. We will consider the following:
1. Compute a confidence interval of the difference μ1- μ2 of the means of two populations. For example, we may like
to estimate the difference μ1 - μ2 between the mean μ1 = annual male income and the mean μ2 = annual
female income in the United States.
2. Compute a confidence interval of the difference p1-p2 of the proportions of an attribute present (or proportions of
"success") in two populations. For example, we may like to estimate the difference p1-p2 between p1 = the
proportion of defective items produced by the new machine and p2 = the proportion of defective items produced
by the old machine.
Suppose X, Y are two similar random variables. Let the mean and standard deviation of X be, respectively, μ1 and σ1. Let
the mean and standard deviation of Y be, respectively, μ2 and σ2. We want to compute a confidence interval for the
difference μ1 - μ2. So we do the following.
1. We draw a sample X1, X2, …, Xm, of size m, from the X population and we draw a sample Y1, Y2, …, Yn, of size n,
from the Y population. Let
X̄ = (X1+X2+ … +Xm)/m
Ȳ = (Y1+Y2+ … +Yn)/n
be the two sample means.
2. By the CLT, X̄ has (at least approximately)
N(μ1, σ1/√m )
distribution.
3. Similarly, Ȳ has (at least approximately)
N(μ2, σ2/√n )
distribution.
4. Now we assume that the X samples and Y samples are mutually independent. In that case, it follows that X̄ - Ȳ has
N(μ1 - μ2, σ)
distribution, where
σ = √( σ1²/m + σ2²/n ).
5. It follows that
P( -zα/2 < [(X̄ - Ȳ) - (μ1 - μ2)]/σ < zα/2 ) = 1 - α.
6. If we simplify, we get that a (1-α)100 percent confidence interval for μ1 - μ2 is
(X̄ - Ȳ) - E < μ1 - μ2 < (X̄ - Ȳ) + E
where
E = zα/2 σ
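As a rough sketch of the computation when σ1 and σ2 are known, the following Python function returns the interval (X̄ - Ȳ) ± E with E = zα/2 σ and σ = √(σ1²/m + σ2²/n). The function name and the numbers in the usage line are hypothetical and are not taken from the exercises.

```python
from math import sqrt
from statistics import NormalDist


def mean_diff_interval(xbar, ybar, sigma1, sigma2, m, n, confidence=0.95):
    """Confidence interval for mu1 - mu2 when sigma1, sigma2 are known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)        # z_{alpha/2}
    sigma = sqrt(sigma1**2 / m + sigma2**2 / n)    # std. deviation of Xbar - Ybar
    margin = z * sigma                             # E = z_{alpha/2} * sigma
    diff = xbar - ybar                             # point estimate of mu1 - mu2
    return diff - margin, diff + margin


# Made-up inputs, only to show the call
low, high = mean_diff_interval(10.0, 8.0, 2.0, 3.0, m=50, n=60)
print(low, high)
```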
Exercise 8.1.1. Suppose we have two normal populations with means μ1, μ2 and standard deviations σ1, σ2, respectively.
It is known that σ1 = 8.1 and σ2 = 11.3. A sample of size m = 64 was collected from the first population, and the sample
mean was found to be x = 3.7. A sample of size n = 99 was collected from the second population, and the sample mean
was found to be y = 4.1. Compute a 95 percent confidence interval for the difference of means μ1 - μ2.
Solution
Exercise 8.1.2. The birth weights of babies in developed and developing countries are normally distributed with
mean μ1, μ2 and standard deviation σ1, σ2, respectively. (My data is not real.) We are given σ1 = 2.3 pounds and σ2 = 2.9
pounds. A sample of m = 35 babies from the developed nations was collected and the sample mean birth weight
was found to be x = 8.9 pounds. A sample of n = 48 babies from the developing nations was collected and the
sample mean birth weight was found to be y = 7.1 pounds.
1. Compute a point estimate of the difference of mean birth weight μ1- μ2.
2. Determine the margin of error of the difference μ1- μ2 at the 95 percent level of confidence.
Solution
Exercise 8.1.3. African elephants and Indian elephants are different in height, weight, and length of ear and tusk. It is
natural to assume that all these are normally distributed. The mean height and standard deviation of African elephants
are μ1, σ1 = 1.2 feet, respectively. The mean height and standard deviation of Indian elephants are μ2, σ2 = 1.1 feet,
respectively. A sample of 25 African elephants was collected and the sample mean height was found to be x = 10.9
feet. A sample of size 28 Indian elephants was collected and the sample mean height was found to be y = 9.1 feet.
2. Determine the maximum error of the difference μ1- μ2 at the 99 percent level of confidence.
Solution
As in the last section, we have two populations X, Y. We assume that X has N(μ1, σ1) distribution and Y has N(μ2, σ2)
distribution. Unlike in the last section, we assume that σ1, σ2 are unknown. We try to find a confidence interval
for μ1 - μ2.
We take a sample X1, X2, …, Xm of size m from the X population, and we take a sample Y1,Y2, …, Yn from the Y
population. Following are some facts and notations.
1. Assumptions: We make an important assumption that the variances σ12 and σ22 are equal. So, we write
σ1 = σ2 = σ.
And, we also assume that the X-sample and the Y-sample are mutually independent.
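2. Let X̄ and SX² be the sample mean and sample variance of the X-sample and let Ȳ and SY² be the sample mean and sample
variance of the Y-sample.
3. The pooled estimate of σ² is
Sp² = [(m-1)SX² + (n-1)SY²] / [m+n-2] = [(m-1)/(m+n-2)] SX² + [(n-1)/(m+n-2)] SY².
Although both SX², SY² are estimators of σ², Sp² is a better estimator for σ² because it uses both the samples.
One can see that Sp² is a weighted average of SX² and SY².
4. It follows that
T = [(X̄ - Ȳ) - (μ1 - μ2)] / [Sp √(1/m + 1/n)]
has a t-distribution with m+n-2 degrees of freedom.
5. Using the same kind of computations that we have done before, we see that a (1-α)100 percent confidence
interval for μ1 - μ2 is given by
(X̄ - Ȳ) - E < μ1 - μ2 < (X̄ - Ȳ) + E
where
E = t(m+n-2, α/2) Sp √(1/m + 1/n)
and t(m+n-2, α/2) denotes the value that a t random variable with m+n-2 degrees of freedom exceeds with probability α/2.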
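A minimal Python sketch of the pooled computation follows. It assumes the usual margin of error E = t · Sp · √(1/m + 1/n), where the critical value t (for m+n-2 degrees of freedom) is looked up in a t-table and passed in as t_crit, since the Python standard library has no t-distribution. The function names are illustrative only.

```python
from math import sqrt


def pooled_variance(sx2, sy2, m, n):
    """Sp^2: weighted average of the two sample variances."""
    return ((m - 1) * sx2 + (n - 1) * sy2) / (m + n - 2)


def pooled_interval(xbar, ybar, sx2, sy2, m, n, t_crit):
    """Interval (Xbar - Ybar) +/- E for mu1 - mu2, equal unknown variances.

    t_crit is the table value for m + n - 2 degrees of freedom (an input,
    not computed here).
    """
    sp = sqrt(pooled_variance(sx2, sy2, m, n))
    margin = t_crit * sp * sqrt(1 / m + 1 / n)
    diff = xbar - ybar
    return diff - margin, diff + margin
```

Note that the function expects sample variances; if sample standard deviations are given, square them first.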
Exercise 8.2.1. Suppose that we are comparing two "similar" normal populations with means μ1, μ2 respectively and the
populations both have standard deviation σ. We collected a sample of size m = 11 from the first population that produced
a sample mean x = 13.2 and sample standard deviation s1 = 2.33. A sample of size n = 13 was collected from the
second population that had sample mean y = 11.5 and sample variance s2 = 2.73.
3. Compute the margin of error in estimating μ1- μ2 at the 90 percent level of confidence.
4. Compute a 90 percent confidence interval for μ1- μ2.
Solution
Exercise 8.2.2. Suppose we have two normal populations with means μ1, μ2 and equal standard deviation σ. A sample
of size m = 64 was collected from the first population and the sample mean and standard deviation were found to be x =
3.7, s1 = 9.2 . A sample of size n = 99 was collected from the second population and the sample mean and standard
deviation were y = 4.1, s2 = 8.7.
2. Compute the margin of error for a 95 percent confidence interval for μ1- μ2.
3. Compute a 95 percent confidence interval for the difference of mean μ1- μ2.
Solution
Exercise 8.2.3. The birth weights of babies in developed and developing countries are normally distributed with
mean μ1, μ2 and equal standard deviation σ. (My data is not real.) Suppose the following data about birth weights
from developed and developing nations were collected.
Developed
8.8 8.1 6.3 9.7 6.3 Developing
7.1 5.3 7.7 9.1 8.1 6.3 5.2 8.3 5.9 5.5
8.2 7.9 8.3 8.9 9.0 7.1 8.1 7.9 6.3 6.9
10.1 9.9 8.8 7.8 5.2 9.1 8.1 7.0 4.9 5.3
1. Compute a point estimate of the difference of mean birth weight μ1- μ2.
3. Determine the maximum error of the difference μ1- μ2 at the 95 percent level of confidence.
Solution
Exercise 8.2.4. African elephants and Indian elephants are different in height, weight, and length of ear and tusk. It is
natural to assume that all these are normally distributed. Assume that the heights of African and Indian elephants have an
equal standard deviation σ. The mean heights of African elephants and Indian elephants are μ1, μ2, respectively. Suppose the
following data were collected on the height of elephants from the two continents (these are not real data).
African
10.9 11.7 9.3 9.9 11.5 Indian
8.8 12.9 11.7 9.1 11.1 7.1 8.3 8.2 9.1 10.3
9.1 8.7 10.5 11.3 12.3 9.3 9.7 8.9 8.8 9.1
13.1 12.9 9.5 10.7 11.3 7.9 9.9 9.2 8.8 8.1
8.7 8.8 9.3 10.1 9.9
9.9
Solution
Similarly, we might like to compare the proportion of defective items produced by an old machine and new machine in a
factory.
Assume we have two populations. Let p1 be the proportion of Population 1 that has an attribute A and let p2 be the
proportion of Population 2 that has the attribute A. We want to compute a confidence interval for p1-p2.
So, we take a sample of size m from Population 1 and let X be the number of sample members that have the attribute A
and X̄ = X/m be the sample proportion that has the attribute A. (We may say that X is the number of "successes" in this
sample from Population 1 and X̄ = X/m is the proportion of "success".) We take a sample from Population 2 of size
n, which is independent of the other sample. Let Y be the number of sample members that have attribute A
and Ȳ = Y/n be the sample proportion that has the attribute A. (So, Ȳ = Y/n is the sample proportion of "success" from
Population 2.)
(Let me explain the context of the example above. We interview m males and X would be the number of males
in this sample who make more than fifty thousand annually and X̄ = X/m would be the proportion of the males
in this sample who make more than fifty thousand annually. Similarly, we interview n females and Ȳ = Y/n
would be the proportion of females in this sample who make more than fifty thousand.)
We develop a confidence interval for p1-p2 as follows.
1. Let
X̄ = X/m
Ȳ = Y/n
be the two sample proportions.
2. As we have seen before, by CLT, we have that X̄ has approximately N(p1,σ1) distribution where σ1 = √(p1(1-p1)
/m) and Ȳ has approximately N(p2,σ2) distribution where σ2 = √(p2(1-p2) /n).
4. As we have assumed that the X samples and Y samples are mutually independent, it follows that X̄ - Ȳ has approximately
N(p1-p2,σ) distribution where σ = √( σ1² + σ2² ).
7. As in section 7.4, we use X̄ as an estimate for p1 and Ȳ as an estimate for p2 and get the following theorem.
Theorem. An approximate (1-α)100 percent confidence interval for p1-p2 is given by
(X̄ - Ȳ) - E < p1 - p2 < (X̄ - Ȳ) + E
where
E = zα/2 √( X̄(1-X̄)/m + Ȳ(1-Ȳ)/n )
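The theorem can be sketched in Python as follows, with the margin E as written in the theorem and the interval centered at the difference of the two sample proportions. The function name and the counts in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_diff_interval(x, m, y, n, confidence=0.95):
    """Approximate confidence interval for p1 - p2.

    x successes out of m in sample 1, y successes out of n in sample 2.
    """
    p1_hat, p2_hat = x / m, y / n                      # sample proportions
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * sqrt(p1_hat * (1 - p1_hat) / m + p2_hat * (1 - p2_hat) / n)
    diff = p1_hat - p2_hat
    return diff - margin, diff + margin


# Made-up counts, only to show the call
print(proportion_diff_interval(50, 100, 40, 100))
```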
Exercise 8.3.1. Suppose two independent samples were collected from two populations. We want to compare the
proportions p1, p2, respectively, of an attribute A present in these two populations. Use a 95 percent confidence interval to
estimate p1-p2. We are given that x = 55 had the attribute A in a sample of size m = 117 from the first population and y
= 37 had the attribute A in a sample of size n = 79 from the second population.
Solution
Exercise 8.3.2. To compare the proportions p1, p2 of defective items produced by new and old machines, respectively,
samples were collected. In a sample of 57 items from the new machine, 6 were found to be defective; and in a sample of
41 items from the old, 9 were defective. Compute a 99 percent confidence interval for p1-p2.
Solution
Exercise 8.3.3. To compare the proportions p1,p2 of men and women, respectively, who watch football, data was
collected. In a sample of 199 men, 83 said that they watch football; and in a sample of 161 women, 51 said they watch
football. (These are not real data.) Construct a 99 percent confidence interval for p1-p2.
9.1 The Philosophy of Testing Hypotheses
9.2 Developing a Test
9.3 Testing on a Single Population
9.4 Testing Hypotheses on Variance σ²
9.5 Population Proportion
9.6 Testing of Hypotheses to Compare Two Populations
9.7 Comparing Means of Two Populations: σ1, σ2 Unknown
9.8 Comparing Proportions p1, p2 of Two Populations
Homework 28 - 32
Example 1. Suppose we want to test the hypothesis that the disparity between the wages (annual income) of working
men and women does not exist any more. Let μ1 be the mean annual income of men and μ2 be the mean annual income
of working women. So, our Null hypothesis H0 and the alternative hypothesis HA may be written as
H0 : μ1- μ2 > 0
HA : μ1- μ2 = 0
Example 2. A TV commentator mentions that only about 10 years ago the average life expectancy of a human being was
75, and now it has increased substantially. To test the claim of this commentator, we let μ be the average life expectancy
of a human being. Then we set up our Null and alternative hypotheses as follows:
H0 : μ =75
HA : μ >75
1. Definition. A statistical hypothesis is a statement, claim, or proposition regarding a population. Most often, it
is about the values of the population parameters. In the above two examples, H0 and HA are statistical
hypotheses.
2. It is important to consider which is a Null hypothesis and which is an alternative hypothesis in a given context.
Essentially, one is the negation of the other.
3. The Null hypothesis H0 represents the status quo; it is something that you have believed for a long time, or it
is some assumption or method that has been working reliably for you for a long time. You want to hold on to
the Null hypothesis unless there is very strong evidence, in the collected data, that the alternative
hypothesis is better.
4. The alternative hypothesis represents a new claim or something out of the ordinary. It could be a
researcher's new technology or some sales person's claim that his/her product is better. We would be very
skeptical about the alternative hypothesis and would accept it only if there is very strong evidence, in the
collected data, in favor of it.
5. Given a Null hypothesis H0 and an alternative hypothesis HA, a test of hypothesis is a rule or a procedure to
decide, based on the collected sample, whether to accept H0 or HA.
Our test will be based on the value of a test statistic. The rule is also called the decision rule or a test of
significance.
6. Two Types of errors. In this process of testing, we may commit two types of errors.
1. If we reject H0 when it is in fact true, then it is called a type one error.
2. If we accept H0 when it is in fact false, then it is called a type two error.
3. The probability of committing a type one error is called the level of significance and is, normally,
denoted by α. Usually, α will be a small number such as .1, .05, or .01.
Let X be a random variable with mean μ and standard deviation σ. Some of our hypothesis tests will look like the
following.
H0 : μ = 75
HA : μ ≠ 75
or
H0 : μ = 75
HA : μ > 75
or
H0 : μ = 75
HA : μ < 75
or
H0 : μ = μ0
HA : μ > μ0
or
H0 : μ = μ0
HA : μ < μ0
To Develop a test:
Suppose we have a random variable X with mean μ and standard deviation σ. We want to develop a test procedure for the
following null and alternative hypotheses.
H0 : μ = μ0
HA : μ ≠ μ0
We take a sample X1, X2, …, Xm of size m from the X population and let X̄ be the sample mean.
1. We assume that the sample size m is large enough, so we have by CLT that X̄ has approximately
N(μ, σX̄)
distribution, where
σX̄ = σ/√m.
2. Both type one and type two errors can be controlled by increasing the sample size m. But once the sample size is
fixed, it is not possible to control both simultaneously. If you want to reduce the probability of type one error, the
probability of type two error will go up. The converse is also true. Since we are more concerned about type
one error, we will try to minimize the probability of type one error, which is also called the level of
significance. So we want to develop a test at the level of significance α.
3. Since X̄ is an estimate of μ, we will reject our null hypothesis H0 only if X̄ and μ0 are far apart, that is, if
| X̄ - μ0| is large.
4. Also, if H0 is true, then μ = μ0 and
Z = (X̄-μ0) /σX̄
has approximately N(0,1) distribution, where
σX̄ = σ/√m.
Expression Z above will be called a test statistic and we will accept H0 if the observed (absolute) value |z| of |Z|
is small and reject H0 if the observed value |z| of |Z| is large.
5. If H0 is true, then
P(|Z| > zα/2) = α.
So, at the level of significance α, our decision rule is:
Reject H0 if |z| > zα/2
Accept H0 otherwise.
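Some Hypotheses and Decision Rules. We will assume that the value of σ is known. Below, z is the observed value of the
test statistic Z, and zα denotes the value with P(Z > zα) = α.
1. Two-tail test: Suppose we are testing
H0 : μ = μ0
HA : μ ≠ μ0
Reject H0 if |z| > zα/2
Accept H0 otherwise.
2. Left-tail test: Suppose we are testing
H0 : μ = μ0
HA : μ < μ0
Reject H0 if z < -zα
Accept H0 otherwise.
3. Right-tail test: Suppose we are testing
H0 : μ = μ0
HA : μ > μ0
Reject H0 if z > zα
Accept H0 otherwise.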
Definition. The set of values (that is, the intervals) that leads to the rejection of the Null hypothesis H0 is called
the rejection region or the critical region.
Definition. Suppose we have a test statistic T to test H0 against HA. Let the observed value of T be t. The P-value is
defined as the probability, assuming H0 is true, that T will take a value at least as extreme as t. In the above
decision rules, our test statistic is
Z = (X̄-μ0) /σX̄
and, if z is the observed value of Z, the P-values of the three tests are as follows.
Two-tail test: p=P(Z ∉(-|z|,|z|))
Left-tail test: p=P(Z < z)
Right-tail test: p=P(Z > z)
1. In the TI-83 menu the above test is called the Z-Test, which comes under TESTS.
2. When we use calculators (say TI-83) for testing hypotheses, the calculator will give us z-values and p-values.
3. We can use the z-values with the above decision rules to test hypotheses.
4. Alternately, at the level of significance α, if the P-value=p then
Reject H0 if p < α
Accept H0 otherwise.
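For readers without a TI-83, the Z-Test output can be imitated with a few lines of Python. This is only a sketch: the names z_test and decide, the tail labels, and the numbers in the usage lines are all hypothetical choices, and the three P-value formulas are the ones listed above.

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, mu0, sigma, m, tail="two"):
    """Return (z, p) for H0: mu = mu0 when sigma is known.

    tail is "two" (HA: mu != mu0), "left" (HA: mu < mu0)
    or "right" (HA: mu > mu0).
    """
    z = (xbar - mu0) / (sigma / sqrt(m))   # observed test statistic
    phi = NormalDist().cdf                 # standard normal cdf
    if tail == "two":
        p = 2 * (1 - phi(abs(z)))          # P(Z outside (-|z|, |z|))
    elif tail == "left":
        p = phi(z)                         # P(Z < z)
    elif tail == "right":
        p = 1 - phi(z)                     # P(Z > z)
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z, p


def decide(p, alpha):
    """P-value decision rule: reject H0 if p < alpha."""
    return "reject H0" if p < alpha else "accept H0"


# Made-up numbers, only to show the call
z, p = z_test(xbar=52.0, mu0=50.0, sigma=10.0, m=100, tail="two")
print(z, p, decide(p, 0.05))
```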
Remark. For the rest of this chapter, we will test hypotheses for various parameters.
1. In each case, as above, we will have three tests—the two-tail test, the left-tail test, and the right-tail test.
2. In each case, the calculator will give the value of the test statistic (as the z-value above) and the p-value.
3. If we use the p-value for a test, then the decision rule will remain the same for all the tests to come:
Reject H0 if p < α
Accept H0 otherwise.
Exercise 9.2.1. Assume that you have a normal population with mean μ and standard deviation σ = 15. Suppose you
have collected a sample of size 25 and the sample mean X was found to be 81. We want to test the null hypothesis
H0 : μ = 75
HA : μ ≠ 75
At the 5 percent level of significance will you reject or accept the null hypothesis?
Solution
Exercise 9.2.2. (Change the level of significance.) Assume that you have a normal population with mean μ and standard
deviation σ = 15. Suppose you have collected a sample of size 25 and the sample mean X was found to be 81. We want to
test the null hypothesis
H0 : μ = 75
HA : μ ≠ 75
At the 1 percent level of significance will you reject or accept the null hypothesis?
Solution : Same as 9.2.1
Exercise 9.2.3. (Change the alternative hypothesis) Assume that you have a normal population with mean μ and
standard deviation σ = 15. Suppose you have collected a sample of size 25 and the sample mean X was found to be 81.
We want to test the null hypothesis
H0 : μ = 75
HA : μ > 75
At the 5 percent level of significance will you reject or accept the null hypothesis?
Solution
Exercise 9.2.4. The time taken by an athlete to run an event is normally distributed with mean μ and known standard
deviation σ = 3.5 seconds. The coach believes that his mean has improved from last year's mean 34 seconds. To test, the
athlete ran 16 times and the sample mean was found to be X = 31 seconds.
Solution
In this section, we assume that X is a N(μ,σ) random variable. In the last section, we assumed that σ was known; but in
this section we assume that σ is not known. We will do all three tests as in the above section, but assume that the value
of σ is not known.
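Once again, we draw a sample X1, X2, …, Xm of size m from the X population. Let X̄ and S² be the sample mean and
variance, respectively. The test statistic we use is
T = ((X̄-μ0) √m) /S
If H0: μ = μ0 is true then T has a t-distribution with m-1 degrees of freedom. Using the same kind of arguments, we
formulate the following decision rules. Below, t is the observed value of T, and t(m-1, α) denotes the value that a t random
variable with m-1 degrees of freedom exceeds with probability α.
1. Two-tail test:
H0 : μ = μ0
HA : μ ≠ μ0
Reject H0 if |t| > t(m-1, α/2)
Accept H0 otherwise.
2. Left-tail test:
H0 : μ = μ0
HA : μ < μ0
Reject H0 if t < -t(m-1, α)
Accept H0 otherwise.
3. Right-tail test:
H0 : μ = μ0
HA : μ > μ0
Reject H0 if t > t(m-1, α)
Accept H0 otherwise.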
1. In the TI-83 menu the above test is called the T-Test, which comes under TESTS. Use it when σ is not known.
2. The calculator will give us t-values and p-values.
3. We can use the t-values with the above decision rules to test hypotheses.
4. Alternately, at the level of significance α, if the P-value=p then
Reject H0 if p < α
Accept H0 otherwise.
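When raw data are given, the statistic T = ((X̄-μ0) √m)/S can be computed directly. The sketch below uses only the Python standard library; because that library has no t-distribution, the critical value must still come from a t-table and is passed in as t_crit. The function names and the right-tail helper are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, mu0):
    """Return (T, degrees of freedom) for H0: mu = mu0, sigma unknown."""
    m = len(sample)
    # stdev() is the sample standard deviation S (divisor m - 1)
    t = (mean(sample) - mu0) * sqrt(m) / stdev(sample)
    return t, m - 1


def right_tail_decision(t, t_crit):
    """Right-tail rule: reject H0 if the observed t exceeds the table value."""
    return "reject H0" if t > t_crit else "accept H0"
```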
Exercise 9.3.1. It is assumed that the lifetime (in hours) of light bulbs produced in a factory is normally distributed with
mean μ and standard deviation σ. The mean lifetime for an average light bulb on the market is 6000 hours. To estimate μ,
the following data was collected on the lifetime of light bulbs.
5110 4671 6441 3331 5055 5270 5335 4973 1837 5487
7783 4560 6074 4777 4707 5263 4978 5418 5123 5017
The producer claims that the mean life expectancy of the bulbs is more than the average bulbs on the market.
Solution
Exercise 9.3.2. To estimate the mean weight (in pounds) of salmon in a river, the following sample was collected.
Solution
Exercise 9.3.3. A supplier of light bulbs claims that the mean lifetime of his bulbs is longer than that of the bulbs
available on the market. It is known that the mean lifetime of the bulbs on the market is 3456 hours. To test the claim of
the supplier, you test a sample of 26 bulbs and find the sample mean to be 3720 hours and the sample standard deviation
to be s = 1152 hours. At 5 percent level of significance, would you accept the claim of the supplier?
Solution
Exercise 9.3.4. It is believed that the mean length of babies at birth in the United States is higher than the world wide
mean of 18.7 inches. A sample of 26 babies in the United States was collected, and the sample mean and standard
deviation was found to be x = 19 inches, s = 1 inch. At 1 percent level of significance, do you believe that babies in the
United States are longer?
Solution
Exercise 9.3.5. A car manufacturer claims that a new model of car will get more mileage per gallon than the old model.
The old model gets a mean mileage of 33 miles per gallon. To test the claim, 9 cars from the new model were tested and
the sample mean was found to be x = 35 miles and standard deviation s = 2.2 miles. At 5 percent level of significance,
would you accept the claim of this manufacturer?
Solution
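Once again, let X be a N(μ, σ) random variable. We would like to test the Null hypothesis that
H0 : σ² = σ0².
As usual we draw a sample X1, X2, …, Xm of size m from the X population. Let S² be the sample variance. The test statistic
we use is
Y = (m-1)S²/σ0².
If H0 : σ² = σ0² is true, then Y has a χ²-distribution with m-1 degrees of freedom. Using the same kind of arguments, we
formulate the following decision rules. Below, y is the observed value of Y, and χ²(m-1, α) denotes the value that a χ²
random variable with m-1 degrees of freedom exceeds with probability α.
1. Two-tail test:
H0 : σ² = σ0²
HA : σ² ≠ σ0²
Reject H0 if y < χ²(m-1, 1-α/2) or y > χ²(m-1, α/2)
Accept H0 otherwise.
2. Left-tail test:
H0 : σ² = σ0²
HA : σ² < σ0²
Reject H0 if y < χ²(m-1, 1-α)
Accept H0 otherwise.
3. Right-tail test:
H0 : σ² = σ0²
HA : σ² > σ0²
Reject H0 if y > χ²(m-1, α)
Accept H0 otherwise.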
Remark. The TI-83 does not have a test for σ². So, one has to use the above decision rules for this section.
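Since the calculator offers no help here, the statistic Y = (m-1)S²/σ0² has to be computed by hand or with a few lines of code. The following Python sketch does that from raw data; the χ² critical value is assumed to be looked up in a table and passed in, and the function names are hypothetical.

```python
from statistics import variance


def variance_test_statistic(sample, sigma0_sq):
    """Return (Y, degrees of freedom) with Y = (m - 1) S^2 / sigma0^2."""
    m = len(sample)
    # variance() is the sample variance S^2 (divisor m - 1)
    y = (m - 1) * variance(sample) / sigma0_sq
    return y, m - 1


def right_tail_decision(y, chi2_crit):
    """Right-tail rule (HA: sigma^2 > sigma0^2): reject H0 if y exceeds
    the chi-square table value for m - 1 degrees of freedom."""
    return "reject H0" if y > chi2_crit else "accept H0"
```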
Exercise 9.4.1 Suppose that we have collected a sample of size n = 23 from a normal population with mean μ and
variance σ². The sample variance was found to be s² = 46.7. At 5 percent level of significance, would you conclude
that σ² is bigger than 25?
Solution
Exercise 9.4.2 Following is data on the life expectancies of a group of people older than 75.
87 92 81 76 81
87 79 88 88 79
81 89 97 91 82
At one percent level of significance, would you conclude that the variance, σ², of life expectancies is higher than 16?
Solution
Exercise 9.4.3 Following is data on a household's monthly gas consumption (in ccf) during the winter months.
154 222 264 257 127
228 240 393 278 140
At 5 percent level of significance, would you conclude that the variance σ² of gas consumption is less than 6400 ccf²?
Solution
9.5 Population Proportion
Let p be the population proportion that has a particular attribute A. We want to test Null hypothesis
H0 : p = p 0 .
As usual, we draw (or interview) a sample of size m. Let X be the number of sample members that have this attribute
and X̄ = X/m be the sample proportion. (So, X̄ is the sample proportion of "success.") The test statistic we use is
Z = (X̄-p0) /σX̄
where
σX̄ = √[(p0(1-p0)) /m].
If H0 : p = p0 is true, then Z has approximately N(0,1) distribution. As before, with z the observed value of Z and zα the
value with P(Z > zα) = α, our decision rules are
1. Two-tail test:
H0 : p = p0
HA : p ≠ p0
Reject H0 if |z| > zα/2
Accept H0 otherwise.
2. Left-tail test:
H0 : p = p0
HA : p < p0
Reject H0 if z < -zα
Accept H0 otherwise.
3. Right-tail test:
H0 : p = p0
HA : p > p0
Reject H0 if z > zα
Accept H0 otherwise.
1. In the TI-83 menu the above test is called the 1-PropZTest, which comes under TESTS.
2. The calculator will ask for p0, the number of success x, and the sample size n.
3. The calculator will give us z-values and p-values; p-cap is, in fact, the sample proportion of success x̄ = x/n.
4. We can use the z-values with the above decision rules to test hypotheses.
5. Alternately, at the level of significance α, if the P-value=p then
Reject H0 if p < α
Accept H0 otherwise.
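A rough Python counterpart of the 1-PropZTest is sketched below. It returns the observed z together with the left-tail, right-tail, and two-tail P-values, so that any of the three tests can be decided with the rule "reject H0 if p < α". The function name and the counts in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(x, m, p0):
    """Return (z, p_left, p_right, p_two) for H0: p = p0.

    x is the number of successes in a sample of size m.
    """
    p_hat = x / m                       # sample proportion
    se = sqrt(p0 * (1 - p0) / m)        # standard deviation under H0
    z = (p_hat - p0) / se
    phi = NormalDist().cdf(z)           # P(Z < z)
    return z, phi, 1 - phi, 2 * min(phi, 1 - phi)


# Made-up counts, only to show the call
z, p_left, p_right, p_two = one_proportion_z(60, 100, 0.5)
print(z, p_right)
```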
Problems on 9.5: Population Proportion
Exercise 9.5.1. In a sample of 197 apples from a lot, 19 were found to be sour.
1. At one percent level of significance, would you conclude that more than 10 percent of the apples are sour?
2. At five percent level of significance, would you conclude that more than 10 percent of the apples are sour?
3. At ten percent level of significance, would you conclude that more than 10 percent of the apples are sour?
Solution
Exercise 9.5.2. A new vaccine was tried on 147 randomly selected individuals, and it was determined that 61 of them got
the virus. It is known that usually fifty percent of the population get the virus.
1. At one percent level of significance, would you conclude that the vaccine is effective?
2. At five percent level of significance, would you conclude that the vaccine is effective?
3. At ten percent level of significance, would you conclude that the vaccine is effective?
Solution
Exercise 9.5.3. Before an election for a congressional seat, a poll was conducted. Out of 887 randomly selected voters
interviewed, 389 said that they would vote for Candidate A, and 359 said that they would vote for Candidate B.
1. At five percent level of significance, would you conclude that candidate A will receive more than 40 percent of the
vote?
Solution
2. At ten percent level of significance, would you conclude that candidate A will receive more than 40 percent of the
vote?
3. At ten percent level of significance, would you conclude that candidate B will receive more than 40 percent of the
vote?
Solution
As we have computed confidence intervals to compare two populations, in this section we will do significance tests to
compare two populations.
Let X be a random variable with mean μ1 and standard deviation σ1 and let Y be a random variable with mean μ2 and
standard deviation σ2. (For example, X could be the height of an American male and Y could be the height of an American
female.)
We may like to compare the equality (or inequality) of means μ1, μ2. So, our Null hypothesis is given by
H0 : μ1 = μ2
or equivalently
H0 : μ1- μ2 = 0.
So, as before we collect a sample X1, X2, …, Xm, of size m from the X-population and a sample Y1, Y2, …, Yn, of size n, from
the Y-population. Let X̄ and S1² be the sample mean and variance, respectively, of the X-sample. Let Ȳ and S2² be the
sample mean and variance, respectively, of the Y-sample. When σ1, σ2 are known, the test statistic we use is
Z = (X̄-Ȳ)/σd
where
σd = √( σ1² /m + σ2² /n )
If the Null hypothesis H0 : μ1- μ2 = 0 is true, then Z has N(0,1) distribution. As before, with z the observed value of Z and
zα the value with P(Z > zα) = α, our decision rules are formulated as follows.
1. Two-tail test:
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 ≠ 0
Reject H0 if |z| > zα/2
Accept H0 otherwise.
2. Left-tail test:
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 < 0
Reject H0 if z < -zα
Accept H0 otherwise.
3. Right-tail test:
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 > 0
Reject H0 if z > zα
Accept H0 otherwise.
Remark. If the sample sizes m, n are large, we can use S1, S2 as estimates for σ1, σ2 in the above expression for Z. So, the
modified formula for Z would be:
Z = (X̄-Ȳ)/Sd
where
Sd = √( S1² /m + S2² /n )
1. In the TI-83 menu the above test is called the 2-SampZTest, which comes under TESTS.
2. Use it when σ1 and σ2 are known.
3. The calculator will give us z-values and p-values.
4. We can use the z-values with the above decision rules to test hypotheses.
5. Alternately, at the level of significance α, if the P-value=p then
Reject H0 if p < α
Accept H0 otherwise.
Problems on 9.6: Testing of Hypotheses to Compare Two Populations — σ1, σ2 Known
Exercise 9.6.1. Suppose we have two normal populations with means μ1, μ2 and standard deviations σ1, σ2, respectively.
It is known that σ1 = 8.1 and σ2 = 11.3. A sample of size m = 64 was collected from the first population, and the sample
mean was found to be x = 3.7. A sample of size n = 99 was collected from the second population, and the sample mean
was found to be y = 4.1. At 5 percent level of significance, would you conclude that μ1 ≠ μ2?
Solution
Exercise 9.6.2. Suppose the birth weights of babies in developed and developing countries are normally distributed with
mean μ1, μ2 and standard deviation σ1, σ2, respectively. (My data is not real, as is often the case.) It is known that σ1 =
2.3 pounds and σ2 = 2.9 pounds. A sample of size m = 35 babies from the developed nations was collected, and the
sample mean birth weight was found to be X = 8.9 pounds. A sample of size n = 48 babies from the developing nations
was collected, and the sample mean birth weight was found to be y = 7.6 pounds.
1. At 5 percent level of significance, would you conclude that the mean birth weight of babies in the developed
nations is higher than that of the developing nations?
2. At 1 percent level of significance, would you conclude that the mean birth weight of babies in developed nations is
higher than that of developing nations?
Solution
Exercise 9.6.3. African elephants and Indian elephants are different in height, weight, and length of ear and tusk. It is
natural to assume that all these are normally distributed. The mean and standard deviation of the height of African elephants
are μ1, σ1 = 1.5 feet, respectively. The mean and standard deviation of the height of Indian elephants are μ2, σ2 = 1.3
feet, respectively. A sample of size 25 African elephants was collected, and the sample mean height was found to be x =
10.9 feet. A sample of size 28 Indian elephants was collected, and the sample mean height was found to be y = 9.1 feet.
1. At 5 percent level of significance, would you conclude that the mean height of African elephants is higher than
that of the Indian elephants?
2. At 1 percent level of significance, would you conclude that the mean height of African elephants is higher than
that of the Indian elephants?
Solution
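As we did with confidence intervals, we consider the case where σ1, σ2 are not known, but we assume that the standard
deviations are equal:
σ1 = σ2 = σ.
As before, we estimate σ² by the pooled estimate
Sp² = [(m-1)SX² + (n-1)SY²] / [m+n-2]
where SX and SY are the respective sample standard deviations of the corresponding samples. The test statistic that we
use is
T = (X̄-Ȳ) /[Sp √( 1/m+1/n) ]
If the Null hypothesis H0 : μ1- μ2 = 0 is true, then T has a t-distribution with m+n-2 degrees of freedom. We formulate
the test hypotheses and the decision rules as follows. Below, t is the observed value of T, and t(m+n-2, α) denotes the
value that a t random variable with m+n-2 degrees of freedom exceeds with probability α.
1. Two-tail test:
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 ≠ 0
Reject H0 if |t| > t(m+n-2, α/2)
Accept H0 otherwise.
2. Left-tail test:
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 < 0
Reject H0 if t < -t(m+n-2, α)
Accept H0 otherwise.
3. Right-tail test:
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 > 0
Reject H0 if t > t(m+n-2, α)
Accept H0 otherwise.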
1. In the TI-83 menu the above test is called the 2-SampTTest, which comes under TESTS.
2. Use it when σ1 and σ2 are unknown but equal (σ1 = σ2 = σ). Either s1 and s2 will be given or raw data will be given.
3. Always use Pooled estimate of σ by selecting YES for "Pooled".
4. The calculator will give t-values and p-values and also the pooled estimate SXP.
5. We can use the t-values with the above decision rules to test hypotheses.
6. Alternately, at the level of significance α, if the P-value=p then
Reject H0 if p < α
Accept H0 otherwise.
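When raw data are given, the pooled statistic T = (X̄-Ȳ)/[Sp √(1/m+1/n)] can also be computed with a short Python sketch like the one below. It assumes the usual pooled variance Sp² = [(m-1)SX² + (n-1)SY²]/(m+n-2); the function name is hypothetical, and the critical value still has to come from a t-table with m+n-2 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x_sample, y_sample):
    """Return (T, degrees of freedom) for H0: mu1 - mu2 = 0,
    assuming equal but unknown standard deviations."""
    m, n = len(x_sample), len(y_sample)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((m - 1) * variance(x_sample) + (n - 1) * variance(y_sample)) / (m + n - 2)
    t = (mean(x_sample) - mean(y_sample)) / (sqrt(sp2) * sqrt(1 / m + 1 / n))
    return t, m + n - 2
```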
Exercise 9.7.1. Suppose that we are comparing two similar normal populations with means μ1, μ2, respectively, and equal
standard deviation σ. We collected a sample of size m = 11 from the first population that produced a sample mean x =
13.2 and sample standard deviation s1 = 2.33. A sample of size n = 13 was collected from the second population that
had sample mean y = 11.5 and sample variance s2 = 2.73.
8.2 7.9 8.3 8.9 9.0 7.1 8.1 7.9 6.3 6.9
10.1 9.9 8.8 7.8 5.2 9.1 8.1 7.0 4.9 5.3
Solution
Exercise 9.7.4. African elephants and Indian elephants are different in height, weight, and length of ear and tusk. It is
natural to assume that all these are normally distributed. Assume that the heights of African and Indian elephants have an
equal standard deviation σ. The mean heights of the African elephants and Indian elephants are μ1, μ2, respectively. The
following data were collected on the height of the elephants from the two continents (these are not real data):
9.1 8.7 10.5 11.3 12.3 9.3 9.7 8.9 8.8 9.1
13.1 12.9 9.5 10.7 11.3 7.9 9.9 9.2 8.8 8.1
8.7 8.8 9.3 10.1 9.9
9.9
1. At 5 percent level of significance, would you conclude that the mean height of African elephants is higher than
that of Indian elephants?
2. At 1 percent level of significance, would you conclude that the mean height of African elephants is higher than
that of Indian elephants?
Solution
H0 : μ1- μ2 = 0.
We continue to denote the first population random variable by X and the second population random variable by Y. We also
assume that X and Y have normal distribution, and that they are independent.
In certain situations, it is natural to collect samples in "pairs" (X,Y) from the two populations and consider the difference D
= X-Y. So, D has mean
μD = μ1- μ2
Also D has
N(μD, σD )-distribution
where
σD = √( σ1² + σ2² ).
We will collect samples in pairs (X1,Y1), …,(Xn,Yn) and look at the corresponding D-sample:
D1 = X1-Y1, …, Dn = Xn-Yn.
Let
D̄ = ( D1+…+Dn )/n
SD² = [∑ (Di-D̄)²] / (n-1)
T = D̄ / (SD/√n)
If the Null hypothesis H0 : μD = μ1- μ2 = 0 is true, then T has a t-distribution with degrees of freedom n-1.
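H0 : μ1 - μ2 = 0
HA : μ1 - μ2 ≠ 0
Reject H0 if |T| > t(n-1, α/2)
Accept H0 otherwise.
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 < 0
Reject H0 if T < -t(n-1, α)
Accept H0 otherwise.
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 > 0
Reject H0 if T > t(n-1, α)
Accept H0 otherwise.
Here α is the level of significance and t(n-1, α) is the value that a t-variable with n-1 degrees of freedom exceeds with probability α.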
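The paired test can be sketched in a few lines of Python. The sketch assumes the usual paired statistic T = D̄/(SD/√n), where D̄ and SD are the mean and standard deviation of the D-sample. The function names paired_t and reject_h0 and the data are hypothetical. Critical values are taken from a t-table, since the Python standard library has no t-distribution.

```python
import math
import statistics


def paired_t(xs, ys):
    """Return (T, mean of differences, S_D, degrees of freedom) for paired samples."""
    if len(xs) != len(ys):
        raise ValueError("paired samples must have the same length")
    d = [x - y for x, y in zip(xs, ys)]  # the D-sample
    n = len(d)
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)  # sample standard deviation, divisor n-1
    t = d_bar / (s_d / math.sqrt(n))
    return t, d_bar, s_d, n - 1


def reject_h0(t, alternative, t_crit):
    """Apply the decision rule for the given alternative hypothesis.

    t_crit is the positive table value for the matching tail area:
    alpha/2 for "two-sided", alpha for "less" or "greater".
    """
    if alternative == "two-sided":
        return abs(t) > t_crit
    if alternative == "less":
        return t < -t_crit
    if alternative == "greater":
        return t > t_crit
    raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")


# Hypothetical paired observations (x_i, y_i), not taken from the exercises.
x = [8.2, 7.9, 8.6, 9.1, 8.4, 7.7, 8.8, 8.0, 8.5, 8.3]
y = [7.9, 7.8, 8.1, 8.7, 8.5, 7.2, 8.4, 7.9, 8.0, 8.1]

t, d_bar, s_d, df = paired_t(x, y)

# Table values for 9 degrees of freedom at the 5 percent level.
two_sided = reject_h0(t, "two-sided", 2.262)  # upper-tail area 0.025
right_tailed = reject_h0(t, "greater", 1.833)  # upper-tail area 0.05

print(f"T = {t:.3f}, df = {df}, two-sided: {two_sided}, right-tailed: {right_tailed}")
```

Passing the critical value in as an argument keeps the table lookup explicit: a different α or sample size only changes that one number.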
Example. Suppose we are comparing two models of cars to see how fast they accelerate. In this case, to avoid any
variation due to individual drivers, we take n drivers and let each driver drive one car of each model. So, (xi,yi) are the
accelerations of the first and second model driven by driver i. Thus, we will have n pairs of observations.
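Remark. The same technique of paired t-test will give us that a (1-α)100 percent confidence interval for μD = μ1- μ2 is
D̄ ± t · SD/√n
where D̄ and SD are the mean and standard deviation of the D-sample and t is the value that a t-variable with n-1 degrees of freedom exceeds with probability α/2.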
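A minimal Python sketch of this interval, assuming the standard form D̄ ± t·SD/√n with t read from a t-table for n-1 degrees of freedom and upper-tail area α/2. The function name paired_ci and the summary values are hypothetical.

```python
import math


def paired_ci(d_bar, s_d, n, t_crit):
    """Return (low, high) for the confidence interval d_bar ± t_crit * s_d / sqrt(n)."""
    half_width = t_crit * s_d / math.sqrt(n)
    return d_bar - half_width, d_bar + half_width


# Hypothetical summary of a D-sample: mean 1.2, standard deviation 0.8, n = 16.
# 2.131 is the table value for 15 degrees of freedom and upper-tail area 0.025,
# which gives a 95 percent confidence interval.
low, high = paired_ci(1.2, 0.8, 16, 2.131)

print(f"95 percent confidence interval: ({low:.4f}, {high:.4f})")
```

If the interval does not contain 0, the two-sided test at the same α rejects H0 : μD = 0.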
Once again, we have two populations and let p1 be the proportion of Population 1 that has a certain attribute A and let
p2 be the population proportion of Population 2 that has attribute A. We want to compare p1 and p2. We want to test the
equality of these two proportions. So, our Null hypothesis is
H0 : p1-p2 = 0.
We take a sample of size m from Population 1 and let X be the number of the sample members that have this attribute A,
and X̄ = X/m be the sample mean. Similarly, we take a sample (or interview) of size n from Population 2 and let Y be the number of the
sample members that have this attribute A and Ȳ = Y/n be the sample mean. (So, X̄, Ȳ are the proportions of "success" in the
two samples.)
Write
P = (X+Y)/(m+n)
and
Z = (X/m - Y/n)/sD
where
sD = √[ P(1-P)(1/m + 1/n) ]
If H0 : p1-p2 = 0 is true, then Z has, approximately, N(0,1) distribution. Now our test hypotheses and the decision rules
are as follows.
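H0 : p1 - p2 = 0
HA : p1 - p2 ≠ 0
Reject H0 if |Z| > z(α/2)
Accept H0 otherwise.
H0 : p1 - p2 = 0
HA : p1 - p2 < 0
Reject H0 if Z < -z(α)
Accept H0 otherwise.
H0 : p1 - p2 = 0
HA : p1 - p2 > 0
Reject H0 if Z > z(α)
Accept H0 otherwise.
Here α is the level of significance and z(α) is the value that a standard normal variable exceeds with probability α.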
1. In the TI-83 menu the above test is called the 2-PropZTest, which comes under TESTS.
2. The calculator will give us z-values and p-values. Also, in our notations, p̂1 = X/m, p̂2 = Y/n, p̂ = P.
3. We can use the z-values with the above decision rules to test hypotheses.
4. Alternately, at the level of significance α, if the P-value=p then
Reject H0 if p < α
Accept H0 otherwise.
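The same z-value and p-value can be sketched in Python using only the standard library. The sketch assumes the usual pooled statistic Z = (X/m - Y/n)/sD with sD = √[P(1-P)(1/m + 1/n)] and P = (X+Y)/(m+n). The function names two_prop_z and p_value and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_prop_z(x, m, y, n):
    """Return (Z, p1_hat, p2_hat, pooled proportion P) for counts x of m and y of n."""
    p1_hat = x / m
    p2_hat = y / n
    p_pooled = (x + y) / (m + n)
    s_d = math.sqrt(p_pooled * (1 - p_pooled) * (1 / m + 1 / n))
    z = (p1_hat - p2_hat) / s_d
    return z, p1_hat, p2_hat, p_pooled


def p_value(z, alternative):
    """P-value under the approximate N(0,1) distribution of Z."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))
    if alternative == "less":
        return std_normal.cdf(z)
    if alternative == "greater":
        return 1 - std_normal.cdf(z)
    raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")


# Hypothetical counts, not taken from the exercises:
# 60 of 100 in the first sample and 45 of 100 in the second have attribute A.
z, p1_hat, p2_hat, p_pooled = two_prop_z(60, 100, 45, 100)
p = p_value(z, "two-sided")

alpha = 0.05
print(f"Z = {z:.3f}, P-value = {p:.4f}, reject H0: {p < alpha}")
```

The normal approximation is only reasonable for fairly large samples, so the result for small m or n should be read with caution.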
Exercise 9.8.1. Suppose two independent samples were collected from two populations. We want to compare the
proportions p1, p2, respectively, of an attribute A present in these two populations. We are given that x = 55 had the
attribute A in a sample of size m = 117 from the first population, and y = 37 had the attribute A in a sample of size n = 79
from the second population.
At 1 percent level of significance, would you conclude that p1 > p2?
Solution
Exercise 9.8.2. To compare the proportions p1,p2 of defective items produced by new and old machines, respectively,
samples were collected. In a sample of 57 items from the new machine, 6 were found to be defective; and in a sample of
41 items from the old machine, 9 were defective.
At 5 percent level of significance, would you conclude that p1 < p2?
Solution
Exercise 9.8.3. Data was collected to compare the proportions p1,p2 of men and women, respectively, who watch
football. In a sample of 199 men, 83 said that they watch football; and in a sample of 161 women, 51 said they watch
football. (These are not real data).
1. At 5 percent level of significance, would you conclude that the proportion of men who watch football is higher than
the proportion of women who watch football?
2. At 1 percent level of significance, would you conclude that the proportion of men who watch football is higher than
the proportion of women who watch football?