
Lesson 1: The Language and Terminology

Introduction | 1.1 Definitions & Concepts | 1.2 Frequency Distribution | 1.3 Pictorial Representation of Data | Homework 1-3

Introduction

Most people think of statistics as the study of the numerical features of a subject or population. Statisticians agree, but they also emphasize the methods of collecting data, summarizing and presenting data, and drawing inferences from data.

We all see on TV how political pundits justify opposing points of view by presenting statistics from respectable sources.
How could something be a science when it justifies two opposing points of view? The answer is that statistics has a
scientific basis but it can be misrepresented in use.

Example. During the saga of President Clinton's impeachment, we observed the following:

1. One pundit says that, according to statistics, the majority of Americans think that character matters.
2. The other pundit says, also according to statistics, that the majority of Americans think the president is doing a
good job.

The implication here is that one of them was "wrong." But the science of statistics says that both were correct. Data was
collected and analyzed, and it was found that the majority of Americans think that character matters and that the
majority of Americans think the president is doing a good job. It does not matter to the science of statistics which one of
the statistically established facts you or I want to believe.

Another point about the nature of statistics as a science is that it is not a deterministic science. It does not have laws like "force equals mass times acceleration." Statements in statistics come with a probability (i.e., a quantified chance) of being correct. When a weatherman says that it will rain today, he means that there is, say, a ninety-five percent chance that it will rain today. Roughly, this means that if he makes the same prediction on one hundred days, it will rain on about 95 of those days and not on the other 5. The problem is that a weatherman sometimes omits the fact that the chance is only 95 percent; this information is sometimes left out for simplicity.

Before I conclude this introduction, let me tell you an interesting anecdote about the development of this subject. When the government of India was considering the proposal to establish the Indian Statistical Institute in Calcutta in the early part of the last century, some critics asked, "Then why not an institute of astrology?" When statistics was first emerging as a science, there was a lot of skepticism about its scientific validity. Those days are gone, and statistics is no longer likened to astrology! Statistics is a well-founded and precise science. It is nondeterministic in nature: the statements it makes are precise, but they are probabilistic statements.

In this course we will be talking about two branches of statistics. The first one is called descriptive statistics and deals
with methods of processing, summarizing, and presenting data. The other part deals with the scientific methods of
drawing inferences and forecasting from the data, and is called inferential or inductive statistics.

In the rest of this lesson and the next we deal with descriptive statistics, which includes the presentation of data in the
form of tables, graphs, and computations of various averages of data.

1.1 Basic Definitions and Concepts

In statistics we use a small representative "sample" to study a big "population." The reason for this is the cost or even the
impossibility of studying the whole population.

Population and Sample

Definitions. A complete collection of data on the group under study is called the population or the universe.

A member of the population is called a sampling unit. Therefore, the population consists of all its sampling units.

A sample is a collection of sampling units selected from the population.


Most often, we will work with numerical characteristics (like height, weight, and salary) of a group. So usually
the population is a large collection of numbers and the sample is a small subset of the population.

Example. Suppose we are studying the daily rainfall in Lawrence. Since daily rainfall could be from 0 inches to anything
above 0, the population here is all nonnegative numbers (i.e., the interval [0, ∞)). A sample from this population would
be the observed amount of daily rainfall in Lawrence on some number of days. A sample of size 11 would be the observed
daily rainfall in Lawrence on 11 days.

Variables

Many definitions of variables are available in standard textbooks. For our purpose the following definition will suffice.

Definition. A variable is a rule, a formula, or a mechanism that associates a value with each member of the population. So, given a member w, a variable X assigns a value X(w) to w. For us, X(w) will be a characteristic (like height, weight, time, or salary) of the member w.

Example. Suppose we are studying the KU student population. The population is the whole collection of KU students, and each KU student is a sampling unit. If GPA is the "characteristic" that we are studying, then X = the GPA of a student is a variable. So, given a student, X has a value. For example:

X(Donald Smith) = 3.25, X(Sam Donaldson) = 3.11, X(Karen Currie) = 3.89, X(King Who) = 2.13

On the other hand, if GENDER is the "characteristic" that we are studying, then Y = gender of a student is a variable. So,
given a student, Y has a value. For example:

Y(Donald Smith) = Male, Y(Sam Donaldson) = Male, Y(Karen Currie) = Female, Y(King Who) = Male

If HEIGHT is the characteristic that we are studying, then Z = the height of a student is a variable.

To give another example, if credit hours completed is the characteristic studied, T = the number of course credit hours
completed so far by a student is a variable.

Similarly, given any other characteristic like weight, annual income, annual expenditure, you can construct a variable for
this population.

A variable that takes numerical values is called a quantitative variable. So, the variables X, Z, and T above are
quantitative variables, while Y is not. A variable that takes non-numerical values is called a qualitative variable. So, the
variable Y above is a qualitative variable. We will mostly be concerned with quantitative variables.
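As an illustration only, a variable can be modeled in Python as a mapping from members of the population to values. The sketch below reuses the four students from the example; the helper name is_quantitative is our own (not standard terminology), and it simply checks whether every assigned value is a number.

```python
# A variable is modeled as a mapping: population member -> value.
# The names and values are the ones used in the example above.
X = {"Donald Smith": 3.25, "Sam Donaldson": 3.11,
     "Karen Currie": 3.89, "King Who": 2.13}          # GPA
Y = {"Donald Smith": "Male", "Sam Donaldson": "Male",
     "Karen Currie": "Female", "King Who": "Male"}    # gender


def is_quantitative(variable):
    """Return True if every value the variable assigns is a number."""
    return all(isinstance(v, (int, float)) for v in variable.values())


print(X["Karen Currie"])      # 3.89
print(is_quantitative(X))     # True  (GPA is quantitative)
print(is_quantitative(Y))     # False (gender is qualitative)
```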

We discuss two types of quantitative variables: continuous and discrete. A quantitative variable that can assume any numerical value over an interval is called a continuous variable. Since Z above can (hypothetically) assume any value between 0 and 100 inches, Z is a continuous variable. T assumes only integer values and is therefore not a continuous variable.

A quantitative variable is called a discrete variable if there are breaks (gaps) between its successive possible values. A different way to understand a discrete variable is that its possible values can be written down (or counted) in a (finite or infinite) list. We say that the values of a discrete variable are countable.

If a variable assumes only a finite number of values, then it is also called a finite variable; otherwise, it is called an infinite variable. A finite variable is always a discrete variable. The variable T above is a discrete variable.

Examples of Continuous and Discrete Variables

1. Examples of continuous variables are weight, length, volume, area, and time.
2. In this course, our examples of discrete variables will always be counts: the number of typos, the number of road accidents, the number of phone calls.

Parameters and Statistics

Definition 1. Given a set of data, any numerical value computed from the data using a formula or a rule is called
a quantitative measure of the data.

Definition 2. A quantitative measure of population data is called a parameter. In other words, parameters belong to the whole population and are computed (if feasible) from the WHOLE population data. Examples: the average GPA of all KU students, the height of the tallest student at KU, the average income of the entire KU student population.

One way to study a population is to know some of its parameters. Unfortunately, computing such parameters could be expensive or even impossible. In practice, parameters are usually unknown, and the main task of statistics is to estimate them on the basis of small samples collected from the population.

Definition 3. A quantitative measure of sample data is called a statistic. So, any number that we compute from a sample is a statistic. We use statistics to estimate the parameters of the population. For example, the average height computed from a sample is a reasonable estimate of the (parameter) average height of the KU student population. Of course, we do not expect the value of the statistic to be exactly equal to the parameter value. Ideally, the error will be small, and it will exceed our tolerable limit only very rarely (say, once in 100 trials).

Why do we need a statistic?

Sometimes it is impossible to know the actual value of a parameter. For example, let μ be the mean lifetime of the light bulbs produced by a company. The company cannot test all the bulbs it produces to find this mean. So, the best it can do is to test a few bulbs, compute their sample mean lifetime (a statistic), and use it as an estimate of the mean lifetime (the parameter μ) of all the bulbs it produces.
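To make the distinction between a parameter and a statistic concrete, the following sketch simulates a made-up population of bulb lifetimes. The numbers are assumptions for illustration only (10,000 bulbs, lifetimes drawn from a normal distribution with mean 1000 hours and standard deviation 100 hours); a real company would not know the population values. The parameter μ is computed from the whole population, and the statistic x̄ from a sample of 25 bulbs.

```python
import random
import statistics

# Hypothetical population: lifetimes (in hours) of 10,000 bulbs.
# In practice these values are unknown; we simulate them only so we can
# compare the parameter with the statistic.
rng = random.Random(2024)
population = [rng.gauss(1000, 100) for _ in range(10_000)]

mu = statistics.mean(population)      # parameter: uses the WHOLE population

sample = rng.sample(population, 25)   # test only 25 bulbs (no repeats)
x_bar = statistics.mean(sample)       # statistic: uses the sample only

print(f"parameter mu     = {mu:.1f} hours")
print(f"statistic x-bar  = {x_bar:.1f} hours")
print(f"estimation error = {x_bar - mu:.1f} hours")
```

Rerunning with a different seed gives a different sample and a different x̄, while μ stays essentially the same; the error x̄ - μ is what the notes hope will be small.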

Definition 4. Data that has not been processed or organized in any form is called raw data. When the data is arranged in increasing or decreasing order, it is called an array. The range of the data is the difference between the largest and the smallest values of the data:

range = highest value - lowest value.
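A small sketch of these three terms, using a short hypothetical list of raw data (not taken from the notes): sorting gives an array, and the range is the highest value minus the lowest.

```python
# Hypothetical raw data, in the order it was recorded.
raw = [12, 7, 19, 3, 15, 7, 10]

array = sorted(raw)                      # increasing order: an array
array_desc = sorted(raw, reverse=True)   # decreasing order is also an array
data_range = max(raw) - min(raw)         # highest value - lowest value

print(array)        # [3, 7, 7, 10, 12, 15, 19]
print(data_range)   # 16
```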

1.2 Frequency Distribution

In this section we talk about the representation of data organized in tabular form. Such a representation is called a frequency distribution. We are mostly concerned with numerical data (i.e., quantitative data), but we also consider some non-numerical data (i.e., qualitative data).

Example. (from Khazanie, p. 18) The following is data on the blood group of 36 patients in a hospital:

O A B O A A A O O
O A O A B O O O AB
B A A O O A A O AB
O A A B A O A O O

We have four types of blood groups, namely, O, A, B, AB. Each of these blood groups may be referred to as a "class."
The frequency of a class is defined as the number of data members that belong to that class. For example, the frequency
of the class O is 16; the frequency of class A is 14. A table that lists the classes and the corresponding frequency is called
the frequency distribution of this qualitative data. Following is the frequency distribution of this data:

Blood Group   Frequency
O             16
A             14
B             4
AB            2
Total         36
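The counting can also be done mechanically. The sketch below uses Python's collections.Counter to tally the 36 blood groups from the example; it reproduces the frequencies in the table.

```python
from collections import Counter

# Blood groups of the 36 patients, copied row by row from the example.
blood = """
O A B O A A A O O
O A O A B O O O AB
B A A O O A A O AB
O A A B A O A O O
""".split()

freq = Counter(blood)                 # class -> frequency
for group in ["O", "A", "B", "AB"]:
    print(group, freq[group])
print("Total", sum(freq.values()))
```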

Ungrouped Data

For quantitative data, we consider two types of frequency table. When we are working with a large set of data, we group the data into a few classes and construct a "frequency table," which we will discuss later. If the data set is small, or if the number of distinct values that appear in the data is small, we need not group the data. Instead, we list the distinct data values and give the corresponding frequency of each in a table. The number of times a data member (i.e., value) appears in the data is called the frequency of the data member. A list that presents the data members and the corresponding frequencies in tabular form is called a frequency table or frequency distribution.
The relative frequency and percentage frequency of a data member x are defined as follows:

$$\text{relative frequency of } x = \frac{\text{frequency of } x}{\text{total number of data points}}$$

and

$$\text{percentage frequency of } x = \frac{\text{frequency of } x}{\text{total number of data points}} \times 100.$$

The frequency table may also contain the relative and percentage frequency. Since we did not group the data into a few
classes, we call this the frequency distribution of the ungrouped data.

Example 1.2.1 To estimate the mean time taken to complete a three-mile drive by a race car, the race car did several
time trials, and the following sample of times taken (in seconds) to complete the laps was collected:

50 48 49 46 54 53 52 51 47 56 52 51
51 53 50 49 48 54 53 51 52 54 54 53
55 48 51 50 52 49 51 53 55 54 50

Note that there are 35 observations here. So we say that the size of the sample (or data) is 35. Also the values present
are 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56. Since there are only 11 distinct values present we can make a frequency
table for the ungrouped data. The following is the frequency distribution of this ungrouped data:

Time (in seconds)   Frequency   Relative Frequency   Percentage Frequency
46                  1           1/35                 2.86
47                  1           1/35                 2.86
48                  3           3/35                 8.57
49                  3           3/35                 8.57
50                  4           4/35                 11.43
51                  6           6/35                 17.14
52                  4           4/35                 11.43
53                  5           5/35                 14.29
54                  5           5/35                 14.29
55                  2           2/35                 5.71
56                  1           1/35                 2.86
Total               35          1                    100
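The same table can be produced by a short program. This sketch counts each distinct lap time and prints the relative frequency as a fraction over n = 35 and the percentage frequency to two decimals, matching the layout above; the column formatting is our own choice.

```python
from collections import Counter

# Lap times (in seconds) from Example 1.2.1.
times = [50, 48, 49, 46, 54, 53, 52, 51, 47, 56, 52, 51,
         51, 53, 50, 49, 48, 54, 53, 51, 52, 54, 54, 53,
         55, 48, 51, 50, 52, 49, 51, 53, 55, 54, 50]

n = len(times)          # sample size
freq = Counter(times)   # value -> frequency

print("Time  Freq  Relative  Percent")
for value in sorted(freq):
    f = freq[value]
    relative = f"{f}/{n}"          # relative frequency, written as in the table
    percent = 100 * f / n          # percentage frequency
    print(f"{value:>4}  {f:>4}  {relative:>8}  {percent:>7.2f}")
print(f"Total {n}")
```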

Grouped Data

When we are working with a large set of data that has too many distinct values, we group the whole set of data into a few class intervals and give the corresponding "frequency" of each class. When the data is presented in this way, it is called grouped data. The number of data members that fall in a class interval is called the class frequency, and the relative and percentage frequencies are computed by the same formulas as above. A list that gives the class intervals and the corresponding class frequencies in tabular form is called a class frequency table or class frequency distribution of the data. The frequency distribution may also include the relative and percentage frequencies.

Grouped Data and Loss of Information

Sometimes it is convenient or necessary to group data into class intervals and construct a class frequency distribution. This is the case when there are too many distinct numbers in the data, too many even to fit into a simple table on a page. In such situations, we group the data into a few class intervals. While a class frequency distribution is very good for presentation and convenient for other reasons, we lose a lot of information in the process: there is no way to recover the original data from the class frequency distribution.

Given a set of data, a good question is: how many class intervals should we have? The answer is neither too few nor too many. If we take too few (say, one), then almost all the information is lost. On the other hand, if we take too many, we are back to working with essentially ungrouped data. (In this course we will always tell you how many classes to take.) Although it may sometimes be necessary to use class intervals of varying width, in this course we consider only classes of equal width.

Steps to Construct Frequency Distribution

1. Range: Pick a suitable number L less than or equal to the smallest value present in the data. Pick a suitable number H greater than or equal to the highest value present in the data. The range R that we consider is R = H - L.

2. Number of Classes: Decide on a suitable number of classes. (In this course we will tell you the number of
classes.)

3. Class Width: We have

$$\text{class width} = w = \frac{R}{\text{number of classes}}.$$

We will pick L, H, and the number of classes so that class width is a "round number."

4. Classes: We divide our interval [L, H] into subintervals, to be called classes, as

[L, L+w], [L+w, L+2w], [L+2w, L+3w], ..., [H-w, H]

Because adjacent classes share an endpoint, a data value equal to that endpoint could fall into two classes, so we need a convention to resolve this ambiguity.

5. Frequency: Find the frequency for each of the classes. You can use an advanced calculator or some software
(like Excel) to count frequencies.

A few more important definitions. The above intervals are called class intervals. The w above is called the class size or class width. The lower end of a class is called its lower limit, and the upper end is called its upper limit. The class mark is the midpoint of the class, defined as follows:

$$\text{class mark} = \frac{\text{lower limit of class} + \text{upper limit of class}}{2}.$$

A class limit is also called a class boundary. I took a slightly different approach when I defined the classes, so that for us class limits and class boundaries are the same. Many slightly different conventions are possible depending on the situation, but they are all essentially the same.

Example 1.2.2 The following is the weight (in ounces), at birth, of 99 babies.

74 105 124 110 119 137 96 110 120 115 140
65 135 123 129 72 121 117 96 107 80 91
74 123 124 124 134 78 138 106 130 97 145
93 133 128 96 126 124 125 127 62 127 92
95 118 126 94 127 121 117 124 93 135 156
143 125 120 147 138 72 119 89 81 113 91
133 127 138 122 110 113 100 115 110 135 141
97 127 120 110 107 111 126 132 120 108 148
143 103 92 124 150 86 121 98 74 85 99

We will construct a class frequency table of this data by dividing the whole range of data into class intervals.

Solution: Note that the lowest value is 62 and the highest value is 156. We take L = 60 and H = 160, so R = H - L = 100. We chose L and H precisely so that R = 100 is a "nice" number. Now we decide to have 5 class intervals, so w = R/5 = 20. According to the steps above, our classes should be [60, 80], [80, 100], [100, 120], [120, 140], [140, 160]. But then some data members (like 80, 100, 120, 140) would fall into two classes. One way to avoid this is to add 0.5 to all the class boundaries; since every weight is a whole number, no data value can then fall on a boundary. So, our classes are [60.5, 80.5], [80.5, 100.5], [100.5, 120.5], [120.5, 140.5], [140.5, 160.5].

So the frequency distribution is as follows:

Classes         Frequency   Relative Frequency   Percentage Frequency
60.5 - 80.5     9           9/99                 9.09
80.5 - 100.5    20          20/99                20.20
100.5 - 120.5   25          25/99                25.25
120.5 - 140.5   37          37/99                37.37
140.5 - 160.5   8           8/99                 8.08
Total           99          1                    100

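The construction steps can be checked on this example with a short program. The sketch below assumes the choices made in the solution (L = 60, H = 160, 5 classes, every boundary shifted by 0.5), counts the weights strictly inside each class, and also prints each class mark. Because all weights are whole numbers, none lies on a boundary, so strict inequalities are safe here.

```python
# Birth weights (ounces) of the 99 babies in Example 1.2.2.
weights = [
    74, 105, 124, 110, 119, 137, 96, 110, 120, 115, 140,
    65, 135, 123, 129, 72, 121, 117, 96, 107, 80, 91,
    74, 123, 124, 124, 134, 78, 138, 106, 130, 97, 145,
    93, 133, 128, 96, 126, 124, 125, 127, 62, 127, 92,
    95, 118, 126, 94, 127, 121, 117, 124, 93, 135, 156,
    143, 125, 120, 147, 138, 72, 119, 89, 81, 113, 91,
    133, 127, 138, 122, 110, 113, 100, 115, 110, 135, 141,
    97, 127, 120, 110, 107, 111, 126, 132, 120, 108, 148,
    143, 103, 92, 124, 150, 86, 121, 98, 74, 85, 99,
]

L, H, k = 60, 160, 5     # range endpoints and number of classes
w = (H - L) // k         # class width, 20

# Shift every boundary by 0.5 so no whole-number weight lies on a boundary.
classes = [(L + i * w + 0.5, L + (i + 1) * w + 0.5) for i in range(k)]

n = len(weights)
freqs = []
for lower, upper in classes:
    f = sum(lower < x < upper for x in weights)   # class frequency
    freqs.append(f)
    mark = (lower + upper) / 2                    # class mark (midpoint)
    print(f"{lower}-{upper}: frequency {f}, class mark {mark}, {100 * f / n:.2f}%")
```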
1.3 Pictorial Representation of Data

Another way to represent data is to use pictures and graphs. We see such pictorial representations in newspapers and other sources every day. Pictorial representation is particularly important when you have to present data to people with limited technical background, like newspaper readers or a government or congressional body.

The Pie Chart

The pie chart is a commonly used pictorial representation of data. When you do your tax return every year, you find a few pie charts in the instruction book for Form 1040. These charts show what proportion (percentage) of each tax dollar goes to particular expenses. I reproduced the following pie charts from the 1040 instruction book of 1999.

Pie charts are self-explanatory; we do not need to discuss them further.

The Histogram

Among pictorial representations, the most useful in this course is the histogram. The histogram of data is a graphical representation of its frequency distribution: we plot the variable on the horizontal axis, and above each class interval we erect a bar whose height equals the frequency of the class. Such a histogram is called a frequency histogram.

If, instead, we erect bars of height equal to the relative frequency, then the graph is called a relative frequency
histogram. Similarly, we can construct a percentage frequency histogram.

The following is a histogram.


We have decided to avoid unequal class widths, which makes our discussion of the histogram fairly simple.

Remark. Take a look at the stem-and-leaf diagram, which is discussed in most textbooks.

Example 1.3.1. The following is the frequency table of data on the height (in inches) of some babies at birth. Sketch the histogram of this data:

Height (inches)   Frequency
16-17             3
17-18             8
18-19             34
19-20             60
20-21             72
21-22             18
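If you want a quick look at the shape before sketching by hand, a text histogram can be printed with a few lines of code. This is a rough sketch, not a substitute for a proper graph: the histogram is turned on its side (classes down the left, bars running right), and the scale of one '#' per 2 babies, rounded down, is an arbitrary choice made so that the bars fit on a page.

```python
# Frequency table from Example 1.3.1: (class interval, frequency).
table = [("16-17", 3), ("17-18", 8), ("18-19", 34),
         ("19-20", 60), ("20-21", 72), ("21-22", 18)]

SCALE = 2   # arbitrary: one '#' per 2 babies


def text_histogram(rows, scale):
    """Return one line per class; bar length is frequency // scale."""
    lines = []
    for label, freq in rows:
        bar = "#" * (freq // scale)
        lines.append(f"{label} | {bar} {freq}")
    return lines


for line in text_histogram(table, SCALE):
    print(line)
```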

The Cumulative Frequency Distributions

For a given value x of a variable, the cumulative frequency of the data, for x, is the number of data members that are
less than or equal to x.

Definition. Given a frequency distribution of some data, for a class boundary x, the cumulative frequency is the sum of the frequencies of all classes whose upper boundary is less than or equal to x. The cumulative frequency distribution is a table that gives the cumulative frequencies against some x values (for us, the class boundaries). We also define the cumulative relative frequency and cumulative percentage frequency as follows:

$$\text{cumulative relative frequency of } x = \frac{\text{cumulative frequency of } x}{\text{total number of data points}}$$

$$\text{cumulative percentage frequency of } x = \frac{\text{cumulative frequency of } x}{\text{total number of data points}} \times 100$$

Example 1.3.2 Once again we consider the data on the birth weight of babies in Example 1.2.2 from the previous section. A cumulative frequency distribution can be constructed from the frequency distribution.

Solution: We have seen the frequency distribution before. The following are the cumulative distributions:

Weight   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percentage Frequency
60.5     0                      0                               0
80.5     9                      9/99                            9.09
100.5    29                     29/99                           29.29
120.5    54                     54/99                           54.55
140.5    91                     91/99                           91.92
160.5    99                     1                               100

The Ogive

Definition. The ogive is a line graph, where we plot the variable on the horizontal axis and the cumulative frequency on
the vertical axis. If we plot the cumulative relative frequency on the vertical axis, then the line graph is called
the relative frequency ogive.
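The cumulative columns, and hence the points of the ogive, can be computed from the class frequencies with a running sum. The sketch below applies Python's itertools.accumulate to the class frequencies of Example 1.2.2; each printed pair (boundary, cumulative frequency) is a point of the ogive, and the last two columns give the cumulative relative and percentage frequencies.

```python
from itertools import accumulate

# Class boundaries and class frequencies from Example 1.2.2 (birth weights).
boundaries = [60.5, 80.5, 100.5, 120.5, 140.5, 160.5]
frequencies = [9, 20, 25, 37, 8]

# Cumulative frequency: 0 at the lowest boundary, then running totals.
cumulative = [0] + list(accumulate(frequencies))
n = cumulative[-1]   # total number of data points

# Each (boundary, cumulative frequency) pair is one point of the ogive.
for x, cf in zip(boundaries, cumulative):
    print(f"{x:6}  {cf:3}  {cf}/{n}  {100 * cf / n:6.2f}")
```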

Use of Calculators

Because we will be using calculators (TI-83) extensively in this course, let me explain how you enter data in the TI-83.



Enter Your Data:

1. Press the STAT button.
2. Select "Edit" in the Edit menu and press ENTER.
3. You will find six lists named L1, L2, L3, L4, L5, L6.
4. Let's say you want to enter your data in L1. If L1 has some data, clear it by pressing the STAT button and selecting ClrList in the Edit menu. When ClrList appears, type L1 and press ENTER. To type "L1" on your TI-83, simply press 2nd then 1.
5. Once L1 is cleared, select Edit in the Edit menu and press ENTER.
6. Now type in your data values one by one, pressing ENTER after each.

It is not easy to construct a frequency table for a data set unless you are systematic. Traditionally, we used tally marks to count frequencies. Now you can use software programs (e.g., Excel). Let me show you a method using a calculator (TI-83).

1. Press STAT.
2. To input data, select Edit.
3. Enter your data (say, in L1).
4. Press STAT.
5. Select SortA( in the Edit menu, type L1, and press ENTER.
6. Press STAT and select Edit. In L1 you will see that the data is sorted in increasing order.
7. Now you can count the frequencies by counting each run of equal values.
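The calculator method (sort first, then count each run of equal values) has a direct analogue in Python. The sketch below uses itertools.groupby on sorted data; the function name frequency_table is our own.

```python
from itertools import groupby


def frequency_table(data):
    """Sort the data, then count each run of equal values.

    Returns a list of (value, frequency) pairs in increasing order of value.
    """
    table = []
    for value, run in groupby(sorted(data)):
        table.append((value, sum(1 for _ in run)))
    return table


# Hypothetical data for illustration.
print(frequency_table([3, 1, 2, 3, 3, 1]))   # [(1, 2), (2, 1), (3, 3)]
```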

Problems on 1.2: Frequency Distribution

Exercise 1.2.1 To estimate the mean time taken to complete a three-mile drive by a race car, the race car did several
time trials, and the following sample of times taken (in seconds) to complete the laps was collected:

50 48 49 46 54 53 52 51 47 56 52 51
51 53 50 49 48 54 53 51 52 54 54 53
55 48 51 50 52 49 51 53 55 54 50

The following is the frequency distribution of this ungrouped data:

Time (in seconds)   Frequency   Relative Frequency   Percentage Frequency
46                  1           1/35                 2.86
47                  1           1/35                 2.86
48                  3           3/35                 8.57
49                  3           3/35                 8.57
50                  4           4/35                 11.43
51                  6           6/35                 17.14
52                  4           4/35                 11.43
53                  5           5/35                 14.29
54                  5           5/35                 14.29
55                  2           2/35                 5.71
56                  1           1/35                 2.86
Total               35          1                    100

Construct a histogram.

Exercise 1.2.2. The following is the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in May
2000.

94 105 124 110 119 137 96 110 120 115 119


104 135 123 129 72 121 117 96 107 80 80
96 123 124 124 134 78 138 106 130 97 134
111 133 128 96 126 124 125 127 62 127 96
116 118 126 94 127 121 117 124 93 135 112
120 125 120 147 138 72 119 89 81 113 100
109 127 138 122 110 113 100 115 110 135 120
97 127 120 110 107 111 126 132 120 108 148
133 103 92 124 150 86 121 98

Construct a class frequency table of this data by dividing the whole range of data into class intervals:

[60.5-70.5], [70.5-80.5], [80.5-90.5], [90.5-100.5], [100.5-110.5], [110.5-120.5], [120.5-130.5], [130.5-140.5], [140.5-150.5]


Exercise 1.2.3. The following are the lengths (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

18 18.5 19 18.5 19 21 18 19 20 20.5


19 19 21.5 19.5 20 17 20 20 19 20.5
18 18.5 20 19.5 20.75 20 21 18 20.5 20
21 19 20.5 19 20 19.5 17.75 20 19.5 20
20.5 17 21 18.5 20 20 20 18.5 19.5 19
18 20.5 18 20 19 19 19.5 20 20.75 21
17.75 19 18 19 20 18.5 20 19 21 19
19.5 20 20 19 19.5 20 19.5 18.5 20.5 19.5
20.25 20 19.5 19.5 20 20 20 21 20 19
18.5 20.5 21.5 18 19.5 18

Construct a frequency table for this data by dividing the whole range into class intervals:

[16-17], [17-18], [18-19], [19-20], [20-21], [21-22].

Note: If a data member falls on a boundary, count it in the right-hand (upper) class interval.

Exercise 1.2.4. The following data represents the number of typos in a sample of 30 books published by some publisher.

156 159 162 160 156 162


159 160 156 156 160 162
156 159 162 156 162 158
160 158 159 162 158 158
162 160 159 162 162 160

Construct a frequency table (by sorting in your calculator). Also construct a histogram.

Exercise 1.2.5. Following is data on the hourly wages (paid only in whole dollars) in an industry.

9 11 8 9 10 11 7 10 12 13
7 11 8 11 14 9 10 9 11 7
13 13 14 12 9 8 12 14 15 9
9 7 12 7 12 7 7 11 13 9
11 9 9 9 10 14 11 12 14 7

Construct a frequency table (by sorting in your calculator). Also construct a histogram.

Exercise 1.2.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in an industry.

7 11 7 11 10 9 10 10 12 13
7 8 11 11 14 9 7 9 11 7
9 13 12 14 7 8 7 14 15 9
9 7 11 9 12 9 12 11 14 9
12 13 7 9 10 14 11 12 13 7
15 15 16 16 15 16 11 7 18 19
15 16 15 15 16 16 17 16 16 13
15 15 16 15 16 15 15 17 16 12
16 15 15 16 15 15 19 8 16 17
16 16 15 16 16 16 13 12 8

Construct a frequency table (by sorting in your calculator).

Lesson 2: Measures of Central Tendency and Measures of Dispersion

Introduction | 2.1 Measures of Central Tendency: Mean | 2.2 Measures of Central Tendency: Median, Mode | 2.3 Measures of Dispersion | Homework 4-7

Introduction

In this lesson we talk about two types of constants that we compute from data:

1. measures of central tendency and

2. measures of dispersion.

A measure of central tendency represents an "average value." Mean, median, mode (if you already know these) are
measures of central tendency. A measure of dispersion is a measure of how widely the data is scattered around.

2.1 Measure of Central Tendency: Mean

The most common measure of central tendency is the mean, or arithmetic mean.

Definition. The mean or the arithmetic mean of a set of data is given by

$$\text{mean} = \frac{\text{sum of all the data values}}{\text{size of the data}}.$$

If we denote a data value (i.e., the variable) by x and if n is the size of the data, then the above formula is written as

$$\text{mean} = \bar{x} = \frac{\sum x}{n},$$

where ∑ denotes summation.

If the data is a sample, then the mean is called the sample mean. Again, if x denotes the variable, the data is sometimes denoted by $x_1, x_2, \dots, x_n$, and then

$$\text{mean} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}.$$

If you have not seen the notation ∑ before, it simply means summation. For example,

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n.$$
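In code, the formula is a one-liner. The sketch below applies it to the 35 lap times from Example 1.2.1 (the values are copied from that example).

```python
def mean(xs):
    """Arithmetic mean: the sum of the data values divided by the data size n."""
    return sum(xs) / len(xs)


# Lap times (in seconds) from Example 1.2.1.
times = [50, 48, 49, 46, 54, 53, 52, 51, 47, 56, 52, 51,
         51, 53, 50, 49, 48, 54, 53, 51, 52, 54, 54, 53,
         55, 48, 51, 50, 52, 49, 51, 53, 55, 54, 50]

print(mean(times))   # 51.4  (= 1799 / 35)
```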

Weighted Mean

Sometimes, different values in data carry different weight. Let us consider the following data and the corresponding
frequency distribution that we computed earlier:

Example 2.1.1 To estimate the mean time taken to complete a three-mile drive by a race car, the race car did several
time trials. The following are sample times taken (in seconds) to complete the laps:

50 48 49 46 54 53 52 51 47 56 52 51
51 53 50 49 48 54 53 51 52 54 54 53
55 48 51 50 52 49 51 53 55 54 50

Following is the frequency distribution of this data:

Time (in seconds)   46  47  48  49  50  51  52  53  54  55  56
Frequency            1   1   3   3   4   6   4   5   5   2   1

Now we want to compute the mean time. So, we add all the data values and divide by the data size, 35. We have already computed the frequency distribution, which tells us that, in the data, 46 was present 1 time, 47 was present 1 time, 48 was present 3 times, and so on. So, using the frequency distribution, we compute the mean as follows:

$$\text{mean} = \bar{x} = \frac{46\cdot1 + 47\cdot1 + 48\cdot3 + 49\cdot3 + 50\cdot4 + 51\cdot6 + 52\cdot4 + 53\cdot5 + 54\cdot5 + 55\cdot2 + 56\cdot1}{1+1+3+3+4+6+4+5+5+2+1} = \frac{1799}{35} = 51.4$$
The mean of the original data is the weighted mean of the data values 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, and 56, with the corresponding frequencies as the weights. So, a new formula for the mean would be

$$\bar{x} = \text{mean} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i},$$

where $f_i$ is the frequency of $x_i$. The weighted mean is defined in a more general context as follows:

Definition. If the values $x_1, x_2, \dots, x_n$ in a data set have different weights, and the value $x_i$ has weight $w_i$, then the weighted mean is defined as

$$\text{weighted mean} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}.$$
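A minimal sketch of the weighted-mean formula follows. The function name weighted_mean is our own; the two checks reproduce the race-car mean (weights = frequencies) and the GPA computed in Example 2.1.2 (weights = credit hours).

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Race-car frequency table: distinct times 46..56 weighted by their frequencies.
values = list(range(46, 57))
freqs = [1, 1, 3, 3, 4, 6, 4, 5, 5, 2, 1]
print(weighted_mean(values, freqs))   # 51.4  (= 1799 / 35)

# GPA from Example 2.1.2: grade points weighted by credit hours.
points = [3, 4, 3, 2, 3]
hours = [4, 3, 5, 3, 3]
print(weighted_mean(points, hours))   # 3.0  (= 54 / 18)
```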

Properties of the Mean

1. Combining two means. Suppose we have two sets of data. The mean of the first set is $\bar{x}$, and the size of the first set is m; the mean of the second set is $\bar{y}$, and the size of the second set is n. The mean of the combined data is

$$\text{combined mean} = \frac{m\bar{x} + n\bar{y}}{m + n}.$$

This is the weighted mean of $\bar{x}$ and $\bar{y}$ with weights m and n, respectively.

2. Effect of translation. Let $\bar{x}$ be the mean of $x_1, x_2, \dots, x_n$. Then the mean of $y_1 = x_1 + d, y_2 = x_2 + d, \dots, y_n = x_n + d$ is given by

$$\bar{y} = \bar{x} + d.$$

3. Effect of multiplication by a constant. Let $\bar{x}$ be the mean of $x_1, \dots, x_n$. Then the mean of $z_1 = cx_1, z_2 = cx_2, \dots, z_n = cx_n$ is given by

$$\bar{z} = c\bar{x}.$$
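The three properties can be checked numerically. The sketch below uses Python's fractions module so that the comparisons are exact; the two small data sets are hypothetical, and c = 1.4729 is the exchange rate used in the salary example of this section.

```python
from fractions import Fraction


def mean(xs):
    """Exact mean (no floating-point rounding)."""
    return Fraction(sum(xs), len(xs))


first = [2, 4, 6]    # hypothetical data set 1: size m = 3, mean 4
second = [10, 20]    # hypothetical data set 2: size n = 2, mean 15
m, n = len(first), len(second)

# 1. Combining two means: the weighted mean of the two means.
combined = (m * mean(first) + n * mean(second)) / (m + n)
assert combined == mean(first + second)

# 2. Translation: adding d to every value adds d to the mean.
d = 7
assert mean([x + d for x in first]) == mean(first) + d

# 3. Multiplication: multiplying every value by c multiplies the mean by c.
c = Fraction(14729, 10000)   # 1.4729
assert mean([c * x for x in first]) == c * mean(first)

print(combined)   # 42/5
```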

Remark (effect of translation): Your teacher tells you that the mean score for the midterm in your class is 73. After the class complained and requested a change, he agreed to add 7 points to everyone's score. The new mean score is (old mean + 7) = 73 + 7 = 80. This is what we meant by "effect of translation."

Example (effect of multiplication by c): Suppose you have some data x1, x2, ..., xn on salaries in an industry in the United States, and the mean is $37,000. On a certain day, 1 U.S. dollar = 1.4729 Canadian dollars (so c = 1.4729). So, in Canadian dollars, the mean is 37000 × c = 37000 × 1.4729 = 54,497.30 Canadian dollars. Similarly, a change of units (for example, from inches to feet or to centimeters) is a "multiplication by a constant c."

Example 2.1.2. A student took PHSX 115 (College Physics), PSYC 120 (Personality), FREN 110 (Elementary French), BUS 241 (Managerial Accounting), and MATH 365 (Elementary Statistics). The number of credit hours and the student's grade for each course are given in the following table:

Course         PHSX 115      PSYC 120      FREN 110      BUS 241       MATH 365
Grade (Points) B (3 points)  A (4 points)  B (3 points)  C (2 points)  B (3 points)
Credit Hours   4             3             5             3             3

What is the student's GPA?

Solution. The GPA is the weighted average of the points (corresponding to the grades), the weights being the course credit hours. So, GPA = (3x4 + 4x3 + 3x5 + 2x3 + 3x3)/(4 + 3 + 5 + 3 + 3) = 54/18 = 3.

2.2 Measure of Central Tendency: Median and Mode

The Median

The median represents the middle value of the data. Half the data will be less than or equal to the median, and half the data will be greater than or equal to the median. For example, your income is above the median American income if more than half of the American population makes less than you do.

Definition. Suppose the data is arranged in increasing order (i.e., in an array). If the size of the data is ODD, then the median is the middle value. If it is EVEN, then the median is the mean of the middle two values.
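The definition translates directly into code. A minimal sketch (the function name median is our own):

```python
def median(data):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    array = sorted(data)
    n = len(array)
    mid = n // 2
    if n % 2 == 1:                             # odd size: the middle value
        return array[mid]
    return (array[mid - 1] + array[mid]) / 2   # even size: mean of middle two


print(median([7, 1, 5]))      # 5
print(median([4, 1, 3, 2]))   # 2.5
```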

The Percentiles

Definition. For a number p between 0 and 100, the pth percentile x_p of the data is a number such that at least p percent of the data members are less than or equal to x_p and at least (100 - p) percent of the data members are greater than or equal to x_p.

1. The 25th percentile is called the first quartile Q1.


2. The median is the 50th percentile, also called the second quartile Q2.
3. The 75th percentile is called the third quartile Q3.
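Because more than one number can satisfy the percentile definition, it is easiest to express as a check rather than a formula. The sketch below, with a helper name of our own, tests whether a candidate value x qualifies as the pth percentile. It reads "below" and "above" inclusively (at or below x, at or above x), the same way the median definition does, and compares counts × 100 with p × n to avoid rounding.

```python
def is_percentile(data, p, x):
    """True if at least p% of data is <= x and at least (100 - p)% is >= x."""
    n = len(data)
    at_or_below = sum(1 for d in data if d <= x)
    at_or_above = sum(1 for d in data if d >= x)
    # Integer comparison: count/n >= p/100  <=>  count*100 >= p*n.
    return at_or_below * 100 >= p * n and at_or_above * 100 >= (100 - p) * n


data = [1, 2, 3, 4, 5]
print(is_percentile(data, 50, 3))   # True: 3 is the median
print(is_percentile(data, 50, 4))   # False: only 2 of 5 values are >= 4
```

For [1, 2, 3, 4] and p = 50, both 2 and 3 (and every number between them) pass the check; the median rule picks 2.5 from among them.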

The Mode

There is one other measure of central tendencies that should be mentioned.

Definition. The MODE of the data is the value or values that have the highest frequency. For example, the mode of the
set {1, 3, 5, 5, 7} is {5} because it has the highest frequency. The mode of {1, 1, 3, 5, 5, 7} is {1, 5} because 1 and 5
both have the highest frequency. Such a set is said to be bimodal.
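A minimal Python sketch that finds all modes with `collections.Counter`, applied to the two sets from the definition (the helper name `modes` is illustrative):

```python
from collections import Counter


def modes(data):
    """Illustrative helper: sorted list of the value(s) with the highest frequency."""
    counts = Counter(data)        # value -> frequency
    top = max(counts.values())    # the highest frequency
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 3, 5, 5, 7]))     # [5]
print(modes([1, 1, 3, 5, 5, 7]))  # [1, 5]  (bimodal)
```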

Use of Calculators (TI-83):

Entering your data

1. Press the STAT button.
2. Select "Edit" in the EDIT menu and press ENTER.
3. You will see six lists named L1, L2, L3, L4, L5, L6.
4. Suppose you want to enter your data in L1.
5. If L1 already contains data, clear it: press STAT, select ClrList in the EDIT menu, type L1, and press ENTER.
6. Once L1 is cleared, select Edit in the EDIT menu and press ENTER.
7. Type in your data values one by one, pressing ENTER after each.

Sorting data and computing the median

1. Enter your data in a list, say L1.
2. Press STAT, select SortA( in the EDIT menu, and press ENTER.
3. The calculator asks for the list: type L1, close the parenthesis, and press ENTER.
4. The calculator displays Done.
5. Press STAT, select Edit in the EDIT menu, and press ENTER.
6. You will see that the data in L1 are now sorted in increasing order.
7. If the data size is odd, the median is the middle value; if it is even, the median is the average of the two middle
values.

Computing the mean if only raw data is given

1. Enter your data in a list, say L1.
2. Press STAT, move to the CALC menu, select "1-Var Stats", and press ENTER.
3. The calculator asks for the list: type L1 and press ENTER.
4. The calculator displays a list of numbers; x̄ (read "x-bar") is the mean.

Computing the mean if the frequency table is given

1. Enter the frequency table in the calculator, say, the x-values in L1 and the frequencies in L2.
2. Press STAT, move to the CALC menu, select "1-Var Stats", and press ENTER.
3. The calculator asks for the lists: type L1, L2 (values first, then frequencies) and press ENTER.
4. The calculator displays a list of numbers; x̄ (read "x-bar") is the mean.

Problems on 2.2: Mean and Median

Exercise 2.2.1. The following is the price (in dollars) of a stock (say, CISCO SYSTEMS) checked by a trader several times
on a particular day.

138 142 127 137 148 130 142 133


Find the median price and mean price observed by the trader.

Exercise 2.2.2. The following figures refer to the GPA of six students.

3.0 3.3 3.1 3.0 3.1 3.1


Find the median and mean GPA.

Exercise 2.2.3. The following data give the lifetime (in days) of light bulbs.

138 952 980 967 992 197 215 157


Find the mean and median lifetime of these bulbs.

Exercise 2.2.4. An athlete ran an event 32 times. The following frequency table gives the time taken (in seconds) by the
athlete to complete the events.

Time (in seconds)   Frequency
26                  3
27                  6
28                  5
29                  6
30                  9
31                  3
Total               32
Compute the mean and median time taken by the athlete.

Exercise 2.2.5. Following is data on the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in
May 2000.

94 105 124 110 119 137 96 110 120 115 119
104 135 123 129 72 121 117 96 107 80 80
96 123 124 124 134 78 138 106 130 97 134
111 133 128 96 126 124 125 127 62 127 96
116 118 126 94 127 121 117 124 93 135 112
120 125 120 147 138 72 119 89 81 113 100
109 127 138 122 110 113 100 115 110 135 120
97 127 120 110 107 111 126 132 120 108 148
133 103 92 124 150 86 121 98
Compute the mean and median weight, at birth, of the babies.

Exercise 2.2.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in an industry.

7 11 7 11 10 9 10 10 12 13
7 8 11 11 14 9 7 9 11 7
9 13 12 14 7 8 7 14 15 9
9 7 11 9 12 9 12 11 14 9
12 13 7 9 10 14 11 12 13 7
15 15 16 16 15 16 11 7 18 19
15 16 15 15 16 16 17 16 16 13
15 15 16 15 16 15 15 17 16 12
16 15 15 16 15 15 19 8 16 17
16 16 15 16 16 16 13 12 8
Compute the mean and median hourly wage.
Exercise 2.2.7. The following is the frequency table of the number of typos in a sample of 30 books published by a
publisher.

No. of typos   156   158   159   160   162
Frequency      6     4     5     6     9
Find the mean and median number of typos in a book.

Exercise 2.2.8. Following is data on the length (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in
May 2000.

18 18.5 19 18.5 19 21 18 19 20 20.5
19 19 21.5 19.5 20 17 20 20 19 20.5
18 18.5 20 19.5 20.75 20 21 18 20.5 20
21 19 20.5 19 20 19.5 17.75 20 19.5 20
20.5 17 21 18.5 20 20 20 18.5 19.5 19
18 20.5 18 20 19 19 19.5 20 20.75 21
17.75 19 18 19 20 18.5 20 19 21 19
19.5 20 20 19 19.5 20 19.5 18.5 20.5 19.5
20.25 20 19.5 19.5 20 20 20 21 20 19
18.5 20.5 21.5 18 19.5 18
Compute the mean and median length, at birth, of these babies.

2.3 Measures of Dispersion

Range
Clearly, the measures of central tendency—mean, median, mode—cannot tell us the "whole story" about the data.

Example 2.3.1. Suppose two sections of the statistics class have the following percentage score distribution at the end of
the semester:

Section A 81 84 83 80 82
Section B 72 93 92 82 71
Both these sections have the same mean—82. But in Section A, everybody will get a B grade. In section B, we will have
two C's, one B and two A's.

A measure of dispersion measures how widely the data are scattered. In section A the data have very small dispersion
(variability), whereas in section B the dispersion is large.

A very simple measure of dispersion is the range of the data as we have defined before:

range = largest value - smallest value.
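For the two sections of Example 2.3.1, a short Python sketch of the range. The helper is named `data_range` here, an illustrative name chosen to avoid shadowing Python's built-in `range`.

```python
section_a = [81, 84, 83, 80, 82]  # Example 2.3.1, section A
section_b = [72, 93, 92, 82, 71]  # Example 2.3.1, section B


def data_range(data):
    """Illustrative helper: range = largest value - smallest value."""
    return max(data) - min(data)


print(data_range(section_a))  # 4
print(data_range(section_b))  # 22
```

Both sections have mean 82, but the ranges (4 versus 22) already show that section B is far more spread out.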

Mean Deviation, Sample Variance, and Standard Deviation

We will discuss three more measures of dispersion.

Suppose we have a data set $x_1, x_2, \ldots, x_n$ of size n. We denote the mean of the data by $\bar{x}$. Three definitions
follow.

Definition. The mean deviation of the data is defined as follows:

$$\text{mean deviation} = \frac{|x_1-\bar{x}| + \cdots + |x_n-\bar{x}|}{n}$$

So, the mean deviation is the mean of the absolute deviations $|x_i-\bar{x}|$ from the mean.

Definition. The sample variance $s^2$ of the data is defined as follows:

$$s^2 = \frac{(x_1-\bar{x})^2 + \cdots + (x_n-\bar{x})^2}{n-1}$$

Remark.

1. Note that we denote the sample variance as the square of a number s.
2. Also note that we divide by n-1, not by n. Dividing by n-1 makes $s^2$ an unbiased estimate of the variance of the
population from which the data were randomly sampled; dividing by n would, on average, underestimate it.
3. We would like our measure of dispersion to have the same units as our data, but the formula involves the squares
$(x_i-\bar{x})^2$, so the unit of $s^2$ is the unit of the data squared. If the data are in feet, the variance is in
square feet. To solve this problem we define another measure of dispersion, the standard deviation, denoted s.

Definition. The sample standard deviation s is defined as the square root of the sample variance: $s = \sqrt{s^2}$. So, to
compute the sample standard deviation, we have to compute the sample variance first.

If we simplify the definition of sample variance, we get the following formula:

$$s^2 = \frac{(x_1^2 + x_2^2 + \cdots + x_n^2) - n\bar{x}^2}{n-1}$$
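The simplified formula follows from expanding each square and using the fact that $x_1 + \cdots + x_n = n\bar{x}$ (the definition of the mean). A short derivation:

$$
\begin{aligned}
\sum_{i=1}^{n}(x_i-\bar{x})^2 &= \sum_{i=1}^{n}x_i^2 - 2\bar{x}\sum_{i=1}^{n}x_i + n\bar{x}^2 \\
&= \sum_{i=1}^{n}x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
&= \sum_{i=1}^{n}x_i^2 - n\bar{x}^2 .
\end{aligned}
$$

Dividing both sides by n - 1 gives the formula. As a check with section A of Example 2.3.1: $\sum x_i^2 = 33630$ and $n\bar{x}^2 = 5 \times 82^2 = 33620$, so $s^2 = 10/4 = 2.5$, the same value the definition gives.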

Let us quickly do some computation with the above example 2.3.1.

The mean deviation for section A is (1+2+1+2+0)/5 = 6/5 = 1.2, and the mean deviation for section B is
(10+11+10+0+11)/5 = 42/5 = 8.4. Since the variability of section B is much higher, its mean deviation is much larger.

Let us compute the sample variances:

For section A the sample variance is

$$\frac{(81-82)^2+(84-82)^2+(83-82)^2+(80-82)^2+(82-82)^2}{5-1} = \frac{1+4+1+4+0}{4} = \frac{10}{4} = 2.5.$$

For section B the sample variance is

$$\frac{(72-82)^2+(93-82)^2+(92-82)^2+(82-82)^2+(71-82)^2}{5-1} = \frac{100+121+100+0+121}{4} = \frac{442}{4} = 110.5.$$
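A minimal Python sketch that reproduces these hand computations for Example 2.3.1 (helper names are illustrative). It also prints the standard deviations, about 1.58 for section A and 10.51 for section B.

```python
import math

section_a = [81, 84, 83, 80, 82]
section_b = [72, 93, 92, 82, 71]


def mean(xs):
    return sum(xs) / len(xs)


def mean_deviation(xs):
    """Illustrative helper: mean of the absolute deviations |x_i - xbar|."""
    xbar = mean(xs)
    return sum(abs(x - xbar) for x in xs) / len(xs)


def sample_variance(xs):
    """Illustrative helper: sum of squared deviations divided by n - 1 (not n)."""
    xbar = mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


for name, xs in (("A", section_a), ("B", section_b)):
    s2 = sample_variance(xs)
    print(name, mean_deviation(xs), s2, math.sqrt(s2))
# A: mean deviation 1.2, variance 2.5, standard deviation about 1.58
# B: mean deviation 8.4, variance 110.5, standard deviation about 10.51
```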

Application of Standard deviation

The mean and the standard deviation tell us a lot about how the data is distributed.

Chebyshev's Rule. This rule applies to all kinds of data. Suppose $\bar{x}$ is the mean and s is the standard deviation of
the data. Then we have the following:

1. At least 0 percent of the observations fall within 1 standard deviation of the mean, i.e., within
$(\bar{x}-s, \bar{x}+s)$. This statement is trivially true and gives no information.
2. At least 75 percent of the observations fall within 2 standard deviations of the mean, i.e., within
$(\bar{x}-2s, \bar{x}+2s)$.
3. At least 89 percent (more precisely 8/9, about 88.9 percent) of the observations fall within 3 standard deviations of
the mean, i.e., within $(\bar{x}-3s, \bar{x}+3s)$.

4. More generally, for any k ≥ 1, at least $100(1 - 1/k^2)$ percent of the data lie within k standard deviations of the
mean, i.e., within $(\bar{x}-ks, \bar{x}+ks)$.
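A small Python sketch of Chebyshev's bound, together with a check against the section B scores of Example 2.3.1 (helper names are illustrative; the open interval matches the one in the rule). Within one standard deviation, 3 of the 5 section B scores (60 percent) lie inside the interval, and all of them lie within two standard deviations, consistent with the guaranteed 0 and 75 percent.

```python
import statistics

section_b = [72, 93, 92, 82, 71]  # Example 2.3.1, section B


def chebyshev_lower_bound(k):
    """Illustrative helper: Chebyshev's guaranteed percentage,
    100 * (1 - 1/k**2), of data within k standard deviations of the mean."""
    return 100 * (1 - 1 / k ** 2)


def percent_within(data, k):
    """Illustrative helper: percentage of observations in (xbar - k*s, xbar + k*s)."""
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    inside = sum(1 for x in data if xbar - k * s < x < xbar + k * s)
    return 100 * inside / len(data)


for k in (1, 2, 3):
    print(k, round(chebyshev_lower_bound(k), 1), percent_within(section_b, k))
# bounds 0.0, 75.0, 88.9; section B actually has 60.0, 100.0, 100.0
```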

Chebyshev's Rule makes no assumption about the data or the variable. If we make some assumptions about the data,
then we can improve the above rule as follows.

The Empirical Rule: Suppose the histogram of the data is bell-shaped and symmetric about the vertical line $x = \bar{x}$.
In other words, the histogram should fit a bell-shaped curve.

Figure: Bell-shaped curve

Then we have the following:

1. Approximately 68.3 percent of the observations will fall in the interval $(\bar{x}-s, \bar{x}+s)$.
2. Approximately 95.4 percent of the observations will fall in the interval $(\bar{x}-2s, \bar{x}+2s)$.
3. Approximately 99.7 percent of the observations will fall in the interval $(\bar{x}-3s, \bar{x}+3s)$.
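The empirical-rule percentages are areas under the normal curve, the standard model for a bell-shaped histogram (this identification is the assumption behind the rule). For a normal variable, the probability of lying within k standard deviations of the mean is erf(k/√2), which the Python sketch below evaluates with `math.erf` (the helper name is illustrative):

```python
import math


def normal_within(k):
    """Illustrative helper: P(|Z| < k) for a normal variable Z, i.e. the share of a
    normal population within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * normal_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```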

Question: What does it mean when the variance or mean deviation of some data is zero? The answer is that all the data
members are EQUAL!

Practice Problem. Consider the exercises 2.2.1 through 2.2.8. For each problem, compute the mean and standard
deviation of the data and find what percentage of the data are within one, two, or three standard deviations from the
mean.

Use of the Frequency Table

When a frequency table is given, we can use new formulas to compute the mean and variance of the data.

Formulas. Suppose the data, consisting of n observations, are given in an (ungrouped) frequency table. Let $x_i$ denote the
values and $f_i$ the frequency of $x_i$, so that $n = \sum f_i$. Then

1. the mean is

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{\sum f_i x_i}{n},$$

2. the variance is

$$s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n-1},$$

3. a simplified formula for the variance is

$$s^2 = \frac{1}{n-1}\left[\sum f_i x_i^2 - n\bar{x}^2\right].$$

4. If the data are given in a frequency table of grouped data, we use the same formulas, with $x_i$ taken to be the class
mark, which is the average of the class limits.

Example 2.3.2. The following table extends the frequency table of the time taken to complete a lap by a race car
(example 2.1.1) to compute mean and variance using the above formulas.

Time x   Frequency f   fx     fx^2
46       1             46     2116
47       1             47     2209
48       3             144    6912
49       3             147    7203
50       4             200    10000
51       6             306    15606
52       4             208    10816
53       5             265    14045
54       5             270    14580
55       2             110    6050
56       1             56     3136
Total    35            1799   92673

So, the mean is $\bar{x} = 1799/35 = 51.4$ and the variance is

$$s^2 = \frac{92673 - 35 \times 51.4^2}{35-1} \approx 6.0118.$$
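The frequency-table formulas can be sketched in Python; applied to the lap-time table of Example 2.3.2, the sketch reproduces the mean 51.4 and the variance of about 6.0118 (the helper name `freq_mean_var` is illustrative):

```python
import math


def freq_mean_var(values, freqs):
    """Illustrative helper: mean and sample variance from an ungrouped frequency
    table, using mean = sum(f*x)/n and variance = (sum(f*x**2) - n*mean**2)/(n - 1)."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    mean = sum_fx / n
    var = (sum_fx2 - n * mean ** 2) / (n - 1)
    return mean, var


times = list(range(46, 57))                # lap times 46, 47, ..., 56 (Example 2.3.2)
freqs = [1, 1, 3, 3, 4, 6, 4, 5, 5, 2, 1]  # their frequencies (total 35)
mean, var = freq_mean_var(times, freqs)
print(round(mean, 4), round(var, 4), round(math.sqrt(var), 4))
```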

Example 2.3.3. Following is the class frequency distribution of the data on birth weight of some babies (exercise 1.2,
Lesson 1):

Classes       Frequency f   Class mark x   fx        fx^2
60.5-80.5     9             70.5           634.5     44732.25
80.5-100.5    20            90.5           1810      163805
100.5-120.5   25            110.5          2762.5    305256.25
120.5-140.5   37            130.5          4828.5    630119.25
140.5-160.5   8             150.5          1204      181202
Total         99                           11239.5   1325114.75

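We can use the above formulas to compute the approximate mean, variance, and standard deviation of the birth weight.

So, the mean is $\bar{x} = 11239.5/99 \approx 113.53$, and, keeping the mean unrounded, the variance is

$$s^2 = \frac{1325114.75 - 99\,\bar{x}^2}{99-1} \approx 500.93,$$

so the standard deviation is $s \approx 22.38$.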
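For grouped data the only extra step is replacing each class by its class mark. The Python sketch below (helper name illustrative) uses the class limits and frequencies of Example 2.3.3 and the definitional form of the variance, so the mean is never rounded; it gives s² ≈ 500.93 and s ≈ 22.38. If the mean is first rounded to 113.53 and plugged into the simplified formula, the result is about 501.01, which shows how rounding an intermediate value can shift the last digits.

```python
def grouped_mean_var(classes, freqs):
    """Illustrative helper: approximate mean and sample variance of grouped data,
    replacing each class by its class mark (midpoint of the class limits)."""
    marks = [(low + high) / 2 for low, high in classes]  # class marks
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(marks, freqs)) / n
    # Definitional form: sum of f*(mark - mean)**2 over n - 1, with the unrounded mean.
    var = sum(f * (m - mean) ** 2 for m, f in zip(marks, freqs)) / (n - 1)
    return mean, var


classes = [(60.5, 80.5), (80.5, 100.5), (100.5, 120.5), (120.5, 140.5), (140.5, 160.5)]
freqs = [9, 20, 25, 37, 8]  # Example 2.3.3
mean, var = grouped_mean_var(classes, freqs)
print(round(mean, 2), round(var, 2), round(var ** 0.5, 2))  # 113.53 500.93 22.38
```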

Remarks.

1. When we use class marks in the above formulas, we get only an approximate mean and variance; if you compute them from
the original data, you may notice a difference.
2. Because computers are widely available, the importance of such approximations has declined.

Comment: We have discussed in detail various formulas for the mean, the variance, and related quantities. It is important
to understand these concepts and formulas.

It is equally important to appreciate the value and necessity of using calculators or other available software (like
Excel). For data sets of any realistic size, computing these quantities correctly by hand is tedious and error-prone.

Use of Calculators (TI-83):

Computing the variance and standard deviation

1. Follow the same steps used for computing the mean (using either raw data or the frequency table).
2. The calculator displays a list of numbers; Sx is the sample standard deviation s. (The value σx shown just below it
divides by n instead of n-1.)
3. The variance is the square of the standard deviation.

Problems on 2.3: Variance, Standard Deviation, and Use of the Frequency Table

Exercise 2.3.1. The following is the price (in dollars) of a stock (say, CISCO SYSTEMS) checked by a trader several times
on a particular day.

138 142 127 137 148 130 142 133


Find the variance and standard deviation of the price.

Exercise 2.3.2. The following figures refer to the GPA of six students.

3.0 3.3 3.1 3.0 3.1 3.1


Find the variance and standard deviation of GPA.

Exercise 2.3.3. The following data give the lifetime (in days) of certain light bulbs.

138 952 980 967 992 197 215 157


Find the variance and standard deviation of the lifetime of these bulbs.
Exercise 2.3.4. An athlete ran an event 32 times. The following frequency table gives the time taken (in seconds) by the
athlete to complete the events.

Time (in seconds)   Frequency
15.6                3
15.7                6
15.8                5
15.9                6
16.0                9
16.1                3
Total               32
Compute the variance and standard deviation of time taken by the athlete.

Exercise 2.3.5. Following is data on the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in
May 2000.

94 105 124 110 119 137 96 110 120 115 119
104 135 123 129 72 121 117 96 107 80 80
96 123 124 124 134 78 138 106 130 97 134
111 133 128 96 126 124 125 127 62 127 96
116 118 126 94 127 121 117 124 93 135 112
120 125 120 147 138 72 119 89 81 113 100
109 127 138 122 110 113 100 115 110 135 120
97 127 120 110 107 111 126 132 120 108 148
133 103 92 124 150 86 121 98
Compute the variance and standard deviation of the weight, at birth, of these babies.

Exercise 2.3.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in an industry.

7 11 7 11 10 9 10 10 12 13
7 8 11 11 14 9 7 9 11 7
9 13 12 14 7 8 7 14 15 9
9 7 11 9 12 9 12 11 14 9
12 13 7 9 10 14 11 12 13 7
15 15 16 16 15 16 11 7 18 19
15 16 15 15 16 16 17 16 16 13
15 15 16 15 16 15 15 17 16 12
16 15 15 16 15 15 19 8 16 17
16 16 15 16 16 16 13 12 8
Compute the variance and standard deviation of the hourly wages.

Exercise 2.3.7. The following is the frequency table of the number of typos in a sample of 30 books published by a publisher.
No. of Typos 156 158 159 160 162
Frequency 6 4 5 6 9
Find the mean number, variance, and standard deviation of typos in a book.

Exercise 2.3.8. Following is data on the length (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in
May 2000.

18 18.5 19 18.5 19 21 18 19 20 20.5
19 19 21.5 19.5 20 17 20 20 19 20.5
18 18.5 20 19.5 20.75 20 21 18 20.5 20
21 19 20.5 19 20 19.5 17.75 20 19.5 20
20.5 17 21 18.5 20 20 20 18.5 19.5 19
18 20.5 18 20 19 19 19.5 20 20.75 21
17.75 19 18 19 20 18.5 20 19 21 19
19.5 20 20 19 19.5 20 19.5 18.5 20.5 19.5
20.25 20 19.5 19.5 20 20 20 21 20 19
18.5 20.5 21.5 18 19.5 18
Compute the variance and standard deviation of the length, at birth, of these babies.

Exercise 2.3.9. The following is the frequency table of the weight (in pounds) of some salmon in a river.

Weight x 31 32 33 34 35 36 37

Frequency f 3 2 4 5 6 5 9
Find the variance and the standard deviation.

Exercise 2.3.10. The following data represents the time (in minutes) taken by students to drive to campus.

23 17 19 24 42 33 20 22 15 9
26 37 29 19 35 18 30 21 11 23
13 27 32 32 23 35 25 33 24 23
Find the mean, variance, and the standard deviation of the data.

Lesson 3 : Probability

Introduction 3.1 Basic Concept of Probability 3.2 Sets and Subsets, Statistical Experiments, Sample
Space, Events, Probability
3.3 Laws of Probability 3.4 Counting Techniques and Probability 3.5 Conditional Probability and Independent Events
Homework 8 - 11
Introduction

Probability to a statistician is the probability of the occurrence of an event. To an ordinary person it is the
quantified chance of occurrence of that event.

Some of the early theory of probability originated in gambling and later theories developed in bioscience. We get very
tempted when we see somebody win $1 million in a lottery, but lottery operators design their games and machines in
such a way that they will make more money than they give, in the long run.

3.1 Basic Concept of Probability

We are all familiar with simple probabilistic statements. If you toss a coin, the probability that heads will show up is 1
out of 2. If you roll a die, the probability of getting the face 5 is 1 out of 6. The probability of having an accident on
a particular busy street on a particular day is 1 out of 100. (When a child says "probably we should invite Aaron for my
birthday," however, the "probably" may have little to do with the mathematics of probability, but it shows an awareness
of the concept of probability at a basic human level.)

When we toss a coin a large number of times, we find that heads shows up about half the time. As we continue to toss, the
ratio of the number of heads to the number of tosses stays close to 1/2, fluctuating around it. So we say that if we toss
a coin, the probability that heads will show up is 0.50. On the other hand, if this ratio stays close to and fluctuates
around 0.49, then we say the probability of heads is 0.49.
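The relative-frequency idea can be explored with a simulated coin. A minimal Python sketch, assuming Python's pseudorandom generator as a stand-in for physical tosses (the helper name `head_ratio` and the seed are illustrative): as the number of tosses grows, the printed ratio of heads to tosses settles near the chosen probability.

```python
import random


def head_ratio(n_tosses, p_head=0.5, seed=1):
    """Illustrative helper: simulate n_tosses of a coin that lands heads with
    probability p_head and return (number of heads) / (number of tosses)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p_head)
    return heads / n_tosses


for n in (10, 100, 1000, 100000):
    print(n, head_ratio(n))                # fair coin: ratio drifts toward 0.5
print(100000, head_ratio(100000, 0.49))    # biased coin: ratio near 0.49
```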

We observe the accidents on a street over a long period of time and find that there is an accident on about one day in a
hundred. The longer we continue to observe, the closer the ratio of the number of days with an accident to the number of
days observed stays to one in one hundred. So we say that the probability of an accident on that street on a given day is
1 percent.

These examples explain the basic notion of probability. The probability of an event is understood as the "relative
frequency," the ratio of occurrences of the EVENT to the total number of times the EXPERIMENT is repeated.

3.2 Sets and Subsets, Statistical Experiments, Sample Space, Events, Probability

This section provides basic definitions that we will need for the rest of the course.

Sets and Subsets

Definition. By a set S we mean a collection of objects. The objects in this set S are also called elements of the set. A
set E is said to be a subset of a set S if each element of E is also an element of S. We write

E⊆S

to mean that E is a subset of S. In particular, a subset E of S is never a larger collection than S.

The following are some examples. We also explain the usage of braces to describe a set.

1. Let D = the collection of all 52 cards in a deck. Then D is a set. Let E be the collection of all the hearts in this
deck. Then E is a subset of D. In brace notation

E = {x ∈ D : x is a heart}

This is read "the set of x in D such that x is a heart."


2. Let T be the collection of all those who filed a tax return with the IRS for the year 1999. Then T is a set. Let L be the
collection of those whose Adjusted Gross Income on the return was less than or equal to $30,000. Then L is a subset
of T. Let C be the collection of those who declared capital gains income. Then C is a subset of T. We write

L ⊆ T
C ⊆ T

In brace notation

L = {x ∈ T : the Adjusted Gross Income of x is less than or equal to $30,000}.

The symbol ∈ means "is an element of"; so

x ∈ T means x is an element of T.

3. Let N be the collection of all integers, and let E be the collection of even integers. Then N and E are sets and

E⊆N

In brace notation

N = {n : n is an integer}
E = {n ∈ N : n is even}.

4. Let R be the set of all (real) numbers. Let I be the set of all numbers strictly between 0 and 1 (that is, excluding 0
and 1). Then R and I are sets and I is a subset of R. In brace notation

R = {x : x is a real number}
I = {x ∈ R : 0 < x < 1}.

5. S = {1,7,13,17,19} is a set.

6. Let S be the collection of you and your siblings, B be the collection of your brothers, and F be the collection of
your sisters. Then S,B,F are sets and we have

F⊆S
B ⊆ S.

Statistical Experiments and Sample Space

Definitions.

1. A statistical experiment is a procedure that produces exactly one out of many possible outcomes. All the
possible outcomes are known, but which outcome will result when you perform the experiment is not known.

2. Given an experiment, the set of all possible outcomes is called the sample space.

3. Given an experiment, an outcome of the experiment is called a sample point. So, the sample space consists of
sample points.
Examples. The following are examples of some experiments and their sample spaces.

1. Suppose your experiment is tossing a coin. The outcomes are H (heads) and T (tails). So, the sample space is S =
{H,T}.

2. Suppose your experiment is tossing a coin twice. The sample points (or outcomes) are HH, HT, TH, TT, and the
sample space is S = {HH, HT, TH, TT}.

3. Your experiment is rolling a die. The outcomes are 1,2,3,4,5,6 and the sample space is S = {1,2,3,4,5,6}.

4. Suppose that your experiment is rolling a die twice. Then the sample space is

S = { (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
      (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
      (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
      (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
      (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
      (6,1) (6,2) (6,3) (6,4) (6,5) (6,6) }

In brace notation, we can write

S = {(i,j) : i = 1,2,3,4,5,6 and j = 1,2,3,4,5,6}.

5. Suppose your experiment is to determine the number of road accidents in Lawrence on a particular day. So, the sample
space is S = {0,1,2,3 ... }.

6. Suppose the experiment is to determine the sex of an unborn child. Then the sample space is S = {Female, Male}.

7. Suppose your experiment is to determine the blood group of a patient in a lab. Then the sample space is S =
{O,A,B,AB}.

8. Suppose your experiment is to observe the annual wheat production in Kansas. Then the sample space is S = {x : x is a
nonnegative number} = {x ∈ R : x ≥ 0} = [0, ∞).

Definition. The sample space S is called a finite sample space if S has only a finite number of outcomes. If S has
infinitely many outcomes, it is called an infinite sample space. Note that examples 1, 2, 3, 4, 6, and 7 above have finite
sample spaces, and examples 5 and 8 have infinite sample spaces.
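The two-dice sample space of example 4 can be generated mechanically. A minimal Python sketch using `itertools.product`, which yields every ordered pair (i, j):

```python
from itertools import product

faces = range(1, 7)
# S = {(i, j) : i = 1,...,6 and j = 1,...,6}; product() yields every ordered pair.
S = set(product(faces, repeat=2))

print(len(S))        # 36 outcomes: a finite sample space
print((5, 2) in S)   # True
```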

Events

Definitions. Given an experiment and its sample space S, the following are important definitions.

1. A subset of the sample space S is called an event. So, an event E consists of outcomes, and we have

E ⊆ S.

2. The event ∅, which contains no outcomes, is called the empty event or impossible event. If you perform the
experiment, the impossible event never occurs.
3. Since S is also a subset of S, S is an event. This event S is called the sure event. If you perform the experiment,
this event is sure to occur.

4. A simple event consists of a single outcome.

Remark. Often, we will describe events in "English," and we may have to identify them as subsets of the sample space,
and conversely.

Examples. The following are some examples of events.

1. Look at example 2 above, the experiment of tossing a coin twice. Let E be the event that at least one of the tosses
gave T, and let F be the event that both tosses gave the same face. Then

E = {HT, TH, TT} and F = {HH,TT}.

2. Look at example 4 above, the experiment of rolling a die twice. Let E5 be the event that the first roll showed 5. Then

E5 = {(5,1), (5,2), (5,3), (5,4), (5,5), (5,6)}.

Let T5 be the event that the sum of the two "rolls" is 5. Then

T5 = {(1,4), (2,3), (3,2), (4,1)}.

Let T1 be the event that the sum of the two rolls is 1. Because T1 has no outcome, it is an impossible event. Let
T13 be the event that the sum of the two rolls is 13. Then T13 is also an impossible event.

3. Look at the example 5 above—the experiment on road accidents. Let E be the event that there is no accident on
that day. Then

E = {0}.

4. Look at example 8 above—the experiment on annual wheat production. Let E be the event that there will be more
than 1000 units of wheat production in 1998. Then

E = (1000, ∞).
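Because events are subsets of the sample space, they can be built by filtering it. A Python sketch for the two-dice events of example 2 above (the variable names mirror the text):

```python
from itertools import product

S = set(product(range(1, 7), repeat=2))  # sample space of two rolls of a die

E5 = {(i, j) for (i, j) in S if i == 5}        # first roll showed 5
T5 = {(i, j) for (i, j) in S if i + j == 5}    # sum of the two rolls is 5
T1 = {(i, j) for (i, j) in S if i + j == 1}    # sum is 1: impossible
T13 = {(i, j) for (i, j) in S if i + j == 13}  # sum is 13: impossible

print(sorted(T5))                  # [(1, 4), (2, 3), (3, 2), (4, 1)]
print(T1 == set(), T13 == set())   # True True
print(E5 <= S)                     # True: every event is a subset of S
```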

The Theory of Probability

Given a sample space S, in the MATHEMATICS of probability we have rules for how to compute the probability of an event
E. Although the MATHEMATICS of probability was inspired by the empirical concept of probability, we do not derive
anything from our intuitive ideas. We are guided by the precise rules and laws that we set up.

For now we will be dealing with finite sample spaces.

Definition. Let

$$S = \{e_1, e_2, \ldots, e_n\}$$

be a finite sample space. The probability of a simple event {e} is a number (possibly given), denoted P({e}), with the
following properties:

1. 0 ≤ P({e}) ≤ 1.

2. The sum of the probabilities of all the simple events is 1:

$$P(\{e_1\}) + P(\{e_2\}) + \cdots + P(\{e_n\}) = 1.$$

3. If E is an event, then the probability of E, P(E), is defined as the sum of the probabilities of all the simple events
in E:

$$P(E) = \sum_{e \in E} P(\{e\}).$$

4. So, we also have

P(impossible event) = P(∅) = 0

P(sure event) = P(S) = 1.
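A minimal Python sketch of this definition, using exact fractions. The probabilities are made up for a hypothetical loaded die (they are not the die of Exercise 3.2.3), and the helper name `prob` is illustrative:

```python
from fractions import Fraction

# Hypothetical probabilities of the simple events {1}, ..., {6} for a loaded die.
P_simple = {1: Fraction(1, 12), 2: Fraction(1, 12), 3: Fraction(1, 6),
            4: Fraction(1, 6), 5: Fraction(1, 4), 6: Fraction(1, 4)}

# Properties 1 and 2: each probability lies in [0, 1], and they sum to 1.
assert all(0 <= p <= 1 for p in P_simple.values())
assert sum(P_simple.values()) == 1


def prob(event):
    """Illustrative helper (property 3): P(E) is the sum of P({e}) over e in E."""
    return sum((P_simple[e] for e in event), Fraction(0))


print(prob({2, 4, 6}))       # 1/2  (even face: 1/12 + 1/6 + 1/4)
print(prob(set()))           # 0    (impossible event)
print(prob(set(P_simple)))   # 1    (sure event)
```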

Remark. If we know the probabilities P({e}) of all the simple events {e}, we can compute the probability of any event E
using property 3. The probabilities of the simple events will

1. either be given

2. or we will be given a rule how to compute it.

Probability with Equally Likely Outcomes

One of the most frequently used models to compute probabilities of simple events is called EQUALLY LIKELY
OUTCOMES.

Definition. Let S = {e1, ... , eN} be a finite sample space. We say that all the outcomes are equally likely if all the
outcomes have the same probability. So, in this case, we have

$$P(\{e_1\}) = P(\{e_2\}) = \cdots = P(\{e_N\}) = \frac{1}{N}.$$

Also, in this case, for an event E,

$$P(E) = \sum_{e \in E} P(\{e\}) = \sum_{e \in E} \frac{1}{N} = \frac{\text{number of outcomes in } E}{\text{number of outcomes in } S}.$$

If n(E) denotes the number of outcomes in E, then

$$P(E) = \frac{n(E)}{n(S)}.$$
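With equally likely outcomes, P(E) = n(E)/n(S) reduces to counting. A Python sketch for the event T5 (the sum of two rolls is 5) from the examples above, using exact fractions (the helper name is illustrative):

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes of two rolls


def prob_equally_likely(event, sample_space):
    """Illustrative helper: P(E) = n(E) / n(S) when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


T5 = [(i, j) for (i, j) in S if i + j == 5]
print(prob_equally_likely(T5, S))  # 1/9  (4 outcomes out of 36)
```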

Problems on 3.2

Probability of Simple Events Given in a Table

Exercise 3.2.1. The following table gives the blood group distribution of a certain population.

Blood Group Distribution

Blood group                O    A    B    AB
Percentage of population   47   42   8    3

Find the probability that a random sample of blood will be of blood group A, B, or AB. (Here S = {O, A, B, AB}, and we
want to compute the probability P(E) of the event E = {A, B, AB}.)

Exercise 3.2.2. A student wants to pick a school based on its grade distribution. The following is the most recent grade
distribution in a school:

Grade Distribution (hypothetical data)

Grade                    A    B    C    D    F
Percentage of students   19   33   31   14   3

Find the probability that a randomly picked student will have at least a B average.

Exercise 3.2.3. The following table gives the probability distribution of a loaded die.

Probability Distribution for a Die

Face          1      2      3      4      5      6
Probability   0.20   0.15   0.15   0.10   0.05   0.35

Find the probability that the face 2 or 3 or 6 will show up when you roll the die.

Find the Probability with Equally Likely Outcomes

Exercise 3.2.4. An urn contains 7 apples, 3 oranges, and 5 pears. One piece of fruit is picked at random. Find the
probability that

1. the fruit is an apple,


2. the fruit is either an apple or a pear, and

3. the fruit is an orange.


Exercise 3.2.5. A die is rolled twice. Find the probability that

1. the sum is 8,

2. only 2 or 3 showed up in both the rolls, and

3. the first roll produced a bigger number.


Exercise 3.2.6. A letter is chosen at random from the letters of the English alphabet. Find the probability that

1. the letter is either I or U,

2. the letter is in the word ALWAYS, and

3. the letter is not in the word NEVER.


3.3 Laws of Probability

Notations from Set Theory

Following are a few notations from the set theory, which we will be using in the context of sample spaces and events.

Notations. Let S be a set and E, F be two subsets of S.

1. The union E ∪ F, of E and F is the set defined as follows:

E ∪ F = {x ∈ S : x ∈ E or x ∈ F}.

So, if you put together the elements of E and F in a single collection, you get the union E ∪ F.

2. The intersection E ∩ F, of E and F is defined as follows:

E ∩ F = {x ∈ S : both x ∈ E and x ∈ F}.

So, if you take all the elements common to both E and F, you get the intersection of E and F.

3. The complement $E^c$ of E is defined as follows:

$E^c$ = {x ∈ S : x ∉ E}.

So, the complement $E^c$ of E is the collection of all the elements in S that are not in E.

Remark. If we can understand and interpret the above definitions in our context of sample spaces and events, that is
adequate. For us, S will be a fixed sample space and E,F will be events.

1. E ∪ F is the event that consists of all outcomes that are either in E or in F (or both). So the occurrence of either E
or F is the same as the occurrence of E ∪ F. That is why some textbooks use the notation (E or F) for E ∪ F. So,
notationally, as in some textbooks,

E ∪ F = E or F.

2. E ∩ F is the event that consists of all the outcomes that are both in E and F. So the simultaneous occurrence of E
and F is the same as the occurrence of E ∩ F. That is why E ∩ F is denoted by (E and F) in some texts.
Notationally, as in some textbooks,

E ∩ F = E and F.

3. Similarly, $E^c$ is the event that consists of all the outcomes in S that are not in E. So, the occurrence of $E^c$ is
the same as the nonoccurrence of E. Notationally, as in some textbooks,

$E^c$ = (not E).
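Python's built-in sets implement these operations directly: `|` is union, `&` is intersection, and the complement relative to S is the difference `S - E`. A sketch with the coin-tossing events E and F from the earlier examples:

```python
S = {"HH", "HT", "TH", "TT"}  # sample space for tossing a coin twice
E = {"HT", "TH", "TT"}        # at least one toss gave T
F = {"HH", "TT"}              # both tosses gave the same face

print(sorted(E | F))   # union of E and F:         ['HH', 'HT', 'TH', 'TT']
print(sorted(E & F))   # intersection of E and F:  ['TT']
print(sorted(S - E))   # complement of E in S:     ['HH']
```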

Laws of Probability

Following are some of the laws of probability.

First, note that probability behaves like area, and the laws of probability resemble those of area.

Some formulas and definitions: Let S be a sample space and let E and F be two events.

1. We have

P(E ∪ F) = P( E or F) = P(E)+P(F)- P(E ∩ F)

= P(E) + P(F) -P(E and F)

We subtract P(E ∩ F) because we counted it twice: once in P(E) and once in P(F).

2. Definition. We say E and F are mutually exclusive if E ∩ F = ∅, i.e., E and F have no outcome in common.
Since P(∅) = 0, it follows from 1 that

if E and F are mutually exclusive then


P(E ∪ F) = P(E) + P(F)

3. We also have

P($E^c$) = 1 - P(E).

4. Definition. Let E be an event. We say that the odds of an event E occurring are a to b if

P(E) = a/(a+b).
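The laws can be checked by exact counting on the two-dice sample space with equally likely outcomes. In the Python sketch below (helper names illustrative), E is the event "first roll is 5" and F is the event "sum is 7"; the last line converts the odds 2 to 3 from the remark that follows into a probability.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))  # two rolls of a die, equally likely


def P(event):
    """Illustrative helper: equally likely outcomes, so P(E) = n(E) / n(S)."""
    return Fraction(len(event), len(S))


E = {s for s in S if s[0] == 5}          # first roll is 5
F = {s for s in S if s[0] + s[1] == 7}   # sum is 7

# Law 1 (addition rule): P(E or F) = P(E) + P(F) - P(E and F).
assert P(E | F) == P(E) + P(F) - P(E & F)
# Law 3 (complement rule): P(not E) = 1 - P(E).
assert P(S - E) == 1 - P(E)


def odds_to_probability(a, b):
    """Illustrative helper: odds of a to b correspond to P(E) = a / (a + b)."""
    return Fraction(a, a + b)


print(P(E | F))                   # 11/36
print(odds_to_probability(2, 3))  # 2/5
```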

Remark: This concept of odds is often used in gambling. When the odds in favor of a horse are 2 to 3, this essentially
means that the probability that the horse will win is 2/5. We say "essentially" because in actual betting the
probability is slightly less than 2/5, so that in the long run the gambling establishment makes more money than it pays
out. (This instructor is not particularly experienced in such betting or horse races.)

Problems on 3.3: Laws of Probability

Exercise 3.3.1. Let E, F, G be three events such that

P(E) = 0.3, P(F) = 0.7, P(G) = 0.6,
P(E ∩ F) = 0.2, P(E ∪ G) = 0.7.

Find the probability that

1. E or F occur,

2. both E and G occur, and

3. E does not occur.


Exercise 3.3.2. Let E, F, G be events.

1. If the odds in favor of E are 3 to 5, find the probability that E occurs.

2. If the odds against F are 3 to 4, find P(F).

3. If P(G) = 7/10, what are the odds in favor of G?

Exercise 3.3.3. The probability that a Christmas tree is taller than 6 feet is 0.30; the probability that a Christmas tree
weighs more than sixty pounds is 0.25; and the probability that a Christmas tree is either taller than 6 feet or heavier
than sixty pounds is 0.40.

1. Find the probability that a Christmas tree is both taller than 6 feet and weighs more than sixty pounds.

2. Find the probability that a Christmas tree is not taller than 6 feet.
3. Find the probability that a Christmas tree is either less than 6 feet tall or less than sixty pounds in weight.

4. Find the probability that a Christmas tree is neither taller than 6 feet nor heavier than sixty pounds.


Exercise 3.3.4. The probability that a student majors in liberal arts is .44; the probability that a student majors in
business is .33; and the probability that a student majors in either liberal arts or business is .65. Find the probabilities

1. that a student majors in both liberal arts and business.

2. that a student majors in neither liberal arts nor business.


3.4 Counting Techniques and Probability

Counting techniques are important and useful to learn. You might like to know, for example,

1. the number of (formal) English words of 5 letters (a formal word is any sequence of letters from the English
alphabet; for example, eezq is a formal word),

2. the number of ways you can deal a hand of 13 cards from a deck of 52 cards, or

3. the number of ways you can assign the first row of 11 seats to 231 guests.

Before we go further into counting, let us recall the factorial notation.

Notation. Let n be a positive integer. Then n! (read "n factorial") is defined as

n! = 1 · 2 · … · (n-2) · (n-1) · n,

and by convention 0! = 1.

So n! is the product of all integers from 1 up to n.

One of the main tools for such counting is the following principle:

The Basic Counting Principle. Suppose we have an experiment that is a combination of r sub-experiments, performed
one after the other, such that

1. the first sub-experiment has n1 outcomes;

2. corresponding to each outcome of the first sub-experiment, the second sub-experiment has n2 outcomes;

3. corresponding to each outcome of the first and the second sub-experiments, the third sub-experiment has
n3 outcomes;

…

r. corresponding to each outcome of each of the previous r-1 sub-experiments, the rth sub-experiment has
nr outcomes.

Then our original experiment will have n1 · n2 · … · nr outcomes.
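The counting principle can be checked by brute force on a small case. The sketch below (the guests "A" to "G" are hypothetical) lists every complete outcome stage by stage; the options at each stage depend on the earlier choices, but their number does not, which is exactly the situation the principle describes.

```python
from math import prod

def count_outcomes(num_stages, options_for):
    """Count the outcomes of an experiment made of num_stages sub-experiments,
    performed one after the other, by listing every complete outcome.

    options_for(prefix) returns the outcomes available at the next stage,
    given the tuple `prefix` of outcomes already chosen.
    """
    def extend(prefix):
        if len(prefix) == num_stages:
            return 1  # one complete outcome of the whole experiment
        return sum(extend(prefix + (choice,)) for choice in options_for(prefix))
    return extend(())

GUESTS = "ABCDEFG"  # 7 hypothetical guests

def seats_left(prefix):
    """Guests not yet seated: which ones depends on earlier picks, how many does not."""
    return [g for g in GUESTS if g not in prefix]

print(count_outcomes(3, seats_left), prod([7, 6, 5]))  # 210 210
```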

Remark. Here we use the word "experiment" in a slightly different sense than in statistical experiments. The
basic counting principle will be used to count the number of outcomes in sample spaces and events.

Examples.

3.4.1. Count the number of words of length four that you can construct from the English alphabet. Answer: 26 x 26 x 26 x 26

We use the counting principle by splitting this experiment into four sub-experiments:

Stage   Job to do               Number of ways
1.      Pick the 1st letter     26
2.      Pick the 2nd letter     26
3.      Pick the 3rd letter     26
4.      Pick the 4th letter     26

Answer = Product = 26^4 = 456976

3.4.2. Count the number of ways you can assign the 11 seats in the first row in a concert hall to 231 guests.

Stage   Job to do          Number of ways
1.      Assign seat 1      231
2.      Assign seat 2      230
3.      Assign seat 3      229
4.      Assign seat 4      228
5.      Assign seat 5      227
6.      Assign seat 6      226
7.      Assign seat 7      225
8.      Assign seat 8      224
9.      Assign seat 9      223
10.     Assign seat 10     222
11.     Assign seat 11     221

Answer = Product = 221*222*...*230*231
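Both answers above are plain products, so a couple of lines of Python reproduce them; Python integers have arbitrary precision, so the 11-factor seating product is exact. This assumes Python 3.8+ for math.prod.

```python
from math import factorial, prod

# Example 3.4.1: four stages, 26 letters at each stage (letters may repeat)
words_of_length_4 = 26 ** 4

# Example 3.4.2: seat 1 has 231 choices, seat 2 has 230, ..., seat 11 has 221
seatings = prod(range(221, 232))  # 221 * 222 * ... * 231, as an exact integer

assert seatings == factorial(231) // factorial(220)  # the same product written with factorials
print(words_of_length_4)  # 456976
print(seatings)
```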

3.4.3. Contrast: How many ways can you form a committee of 11 members from a group of 231 people? Unlike
assigning seats, here the order of selection of the members is ignored: permuting the same 11 members gives
different seat assignments but the same committee. Forming a committee is a "combination" problem, treated
below.

Remark. The difference between assigning 11 seats in a row and forming a committee of 11 is that in the first case
the order of assignment is important. Assigning the first row to the same 11 guests in two different ways will count as
two different outcomes. When we form a committee, the order in which we pick 11 members does not make any
difference.

Definition. Suppose we have n objects. We pick r of them one by one (without ever putting them back) and arrange
them in a row. Such an ordered arrangement is called a permutation of n objects taken r at a time. The number
of permutations of n objects taken r at a time is denoted by nPr. It follows from the basic counting principle that

nPr = n (n-1) (n-2) ... (n-r+1) = n!/(n-r)!

Number of permutations nPr = product of r integers starting from n downward.

In contrast, we can pick r objects from a collection of n objects one by one, putting each object back in the collection
before the next pick, and arrange them in a row. Such a selection and arrangement is called picking with
replacement. Constructing a formal word of length 4 is an experiment of picking with replacement.

Remark: Example 3.4.1 is a problem on picking with replacement because a letter can be selected more than once.
Example 3.4.2 is a permutation problem.
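A small brute-force check of the two ordered-selection formulas, assuming Python 3.8+ (for math.perm): itertools.permutations lists ordered picks without replacement (nPr of them), and itertools.product with repeat=r lists ordered picks with replacement (n^r of them). The sizes n = 6, r = 3 are arbitrary.

```python
from itertools import permutations, product
from math import factorial, perm

n, r = 6, 3          # small sizes so every arrangement can be listed
objects = range(n)

# Ordered picks WITHOUT replacement: each object can appear at most once
without_replacement = sum(1 for _ in permutations(objects, r))
# Ordered picks WITH replacement: each pick has all n objects available again
with_replacement = sum(1 for _ in product(objects, repeat=r))

assert without_replacement == perm(n, r) == factorial(n) // factorial(n - r)  # nPr
assert with_replacement == n ** r
print(without_replacement, with_replacement)  # 120 216
```

For larger sizes, call perm directly instead of listing: for example perm(35, 3) counts the ways to award the three unequal scholarships of Exercise 3.4.10, since the order of the awards matters.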

Definition. Suppose we have n objects in a container. We pick r of them all at once, so the order of selection
does not come into consideration. Such a selection is called a combination of n objects taken r at a time. The number of
combinations of n objects taken r at a time is denoted by nCr and is given by

nCr = n!/(r!(n-r)!)

Examples. 1. Count the number of ways you can form a committee of 11 from a group of 231 people. Answer: 231C11
2. Count the number of ways you can deal a hand of 13 cards from a deck of 52 cards. Answer: 52C13.
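The combination formula can be sketched the same way, again assuming Python 3.8+ (for math.comb). The check also shows why nCr = nPr/r!: each unordered selection of r objects can be lined up in r! different orders.

```python
from itertools import combinations
from math import comb, factorial, perm

n, r = 6, 3
unordered = sum(1 for _ in combinations(range(n), r))  # selections, order ignored

assert unordered == comb(n, r) == factorial(n) // (factorial(r) * factorial(n - r))  # nCr
# Each selection of r objects can be lined up in r! orders, so nPr = nCr * r!
assert perm(n, r) == comb(n, r) * factorial(r)

print(unordered)        # 20
print(comb(52, 13))     # 635013559600 possible 13-card hands
print(comb(231, 11))    # committees of 11 from 231 people
```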

Problems on 3.4: Counting Techniques and Probability

Exercise 3.4.1. Find 5!


Solution

Exercise 3.4.2. A homeowner would like to install a new storm door. The local store offers 2 brand names; each brand
has 4 different styles and 3 colors. How many choices does the homeowner have?
Solution

Exercise 3.4.3. Suppose in the World Cup soccer tournament, group A has 8 teams. Each team of group A has to play all
the other teams in the group. How many games will be played among the group A teams? Answer: 8C2

Exercise 3.4.4. How many ways can you deal a hand of 13 cards from a deck of 52 cards? Answer: 52C13

Exercise 3.4.5. How many ways can you deal a hand of 4 spades, 3 hearts, 3 diamonds, and 3 clubs?
Solution
Solution-variation

Exercise 3.4.6. We have 13 students in a class. How many ways can we assign the 4 seats in the first row?
Solution

Exercise 3.4.7. Programming languages often use the hexadecimal number system (also called "hex"). In this
system, 16 digits are used, denoted by 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F. Suppose you form a 6-digit number in the
hexadecimal system.

1. What is the probability that the number will start with a letter digit?

2. What is the probability that the number is divisible by 16 (i.e., ends with 0)?

Solution
Here the sample space is the collection of all the 6-digit hex numbers (a leading 0 is allowed).

• Using the counting principle, the number of hex numbers is n(S) = 16^6.
• Let E be the event that the number starts with a letter digit (A, B, C, D, E, or F).
• Again, by the counting principle, the number of hex numbers in E is n(E) = 6*16^5.
• So, P(E) = n(E)/n(S) = 6/16 = 3/8.
• Let F be the event that the number is divisible by 16. A hex number is divisible by 16 exactly when its last
digit is 0.
• So, the number of hex numbers in F is n(F) = 16^5*1 = 16^5.
• So, P(F) = 16^5/16^6 = 1/16.
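The counting argument can be cross-checked by brute force. Listing all 16^6 six-digit strings is slow in pure Python, so this sketch enumerates the 16^3 = 4096 three-digit strings instead (an assumption for speed; the ratios do not depend on the length) and then evaluates the counting-principle formulas for the 6-digit case.

```python
from fractions import Fraction
from itertools import product

HEX_DIGITS = "0123456789ABCDEF"
LETTER_DIGITS = "ABCDEF"

def hex_probabilities(length):
    """List every hex string of the given length (leading 0 allowed) and return
    (P(starts with a letter digit), P(last digit is 0, i.e. divisible by 16))."""
    total = starts_with_letter = ends_with_zero = 0
    for digits in product(HEX_DIGITS, repeat=length):
        total += 1
        starts_with_letter += digits[0] in LETTER_DIGITS
        ends_with_zero += digits[-1] == "0"
    return Fraction(starts_with_letter, total), Fraction(ends_with_zero, total)

# Brute force on 3-digit strings ...
print(hex_probabilities(3))  # (Fraction(3, 8), Fraction(1, 16))

# ... and the counting-principle answer for the 6-digit case of the exercise
p_E_6 = Fraction(6 * 16**5, 16**6)
p_F_6 = Fraction(1 * 16**5, 16**6)
print(p_E_6, p_F_6)  # 3/8 1/16
```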

Exercise 3.4.8. You are playing Bridge and you are dealt a hand of 13 cards.

1. What is the probability that you will get a hand of 4 spades, 3 hearts, 3 diamonds and 3 clubs?

2. What is the probability that you will get all 4 aces?

3. What is the probability that you will get all 13 spades?

Solution
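For Exercise 3.4.8, every 13-card hand is equally likely, so each probability is (number of favorable hands)/52C13, and each favorable count is a product of combinations, one factor per suit or group. A minimal sketch, assuming Python 3.8+:

```python
from fractions import Fraction
from math import comb

hands = comb(52, 13)  # all equally likely 13-card hands

# 1. choose 4 of the 13 spades and 3 from each of the other three suits
p_4333 = Fraction(comb(13, 4) * comb(13, 3) ** 3, hands)
# 2. all 4 aces, plus any 9 of the remaining 48 cards
p_all_aces = Fraction(comb(4, 4) * comb(48, 9), hands)
# 3. exactly one hand consists of all 13 spades
p_all_spades = Fraction(comb(13, 13), hands)

print(round(float(p_4333), 5), p_all_aces, p_all_spades)
```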

Exercise 3.4.9. A committee of 9 is selected at random from a group of 11 students, 17 mothers and 13 fathers.

1. What is the probability that the committee has 3 students, 3 mothers, and 3 fathers, i.e., is a balanced
committee?

2. What is the probability that the committee has 4 mothers and 5 fathers?

3. What is the probability that the committee has all students?

Solution
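Exercise 3.4.9 follows the same pattern with three groups (11 + 17 + 13 = 41 people, committees of 9). In this sketch the helper p_committee is hypothetical; as a sanity check, the probabilities of all possible group make-ups should add up to 1.

```python
from fractions import Fraction
from math import comb

GROUP_SIZES = {"students": 11, "mothers": 17, "fathers": 13}
TOTAL = sum(GROUP_SIZES.values())  # 41 people
COMMITTEE = 9

def p_committee(students, mothers, fathers):
    """P(a random committee of 9 has exactly these numbers from each group)."""
    if students + mothers + fathers != COMMITTEE:
        return Fraction(0)
    favorable = (comb(GROUP_SIZES["students"], students)
                 * comb(GROUP_SIZES["mothers"], mothers)
                 * comb(GROUP_SIZES["fathers"], fathers))
    return Fraction(favorable, comb(TOTAL, COMMITTEE))

balanced = p_committee(3, 3, 3)          # 1. three of each
mothers_fathers = p_committee(0, 4, 5)   # 2. 4 mothers and 5 fathers
all_students = p_committee(9, 0, 0)      # 3. all students
print(float(balanced), float(mothers_fathers), float(all_students))
```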

Exercise 3.4.10. Three scholarships of unequal value will be awarded from a group of 35 applicants. How many ways
can such a selection be made?
Solution

3.5 Conditional Probability and Independent Events

Sometimes when new information becomes available, the probability of an event may have to be reevaluated in light of
this new information. Suppose we have a sample space S and an event E. Now suppose we have new information that an
event C has occurred. We then have to reevaluate the probability of E given that C has occurred. This conditional
probability of E given C is denoted by P(E|C), and it may be different from P(E). In fact, now that C has occurred, our
old sample space is no longer relevant, and C takes the role of the new sample space.

Example. Suppose we pick a KU student at random and let E be the event that the student is taller than 6 feet. Then we
have the following observations.

1. The sample space S is the whole KU student population.

2. Since all the outcomes are equally likely, we have

P(E) = (number of KU students who are taller than 6 feet)/(total number of KU students) = n(E)/n(S).

3. Now suppose we know that the student selected is male. Let us denote the event that the student is male by
C. The probability that the student is taller than 6 feet, given that the student is male, is higher than the "simple"
probability P(E). In fact, our new sample space is C, which is the whole KU male student population, not S, which is the
whole KU student population.

4. We now have the probability that the student is taller than 6 feet given that the student is male:

P(E|C) = (number of male KU students who are taller than 6 feet)/(total number of male KU students) = n(E∩C)/n(C).

Dividing the numerator and the denominator by n(S) shows that

P(E|C) = [n(E∩C)/n(S)]/[n(C)/n(S)] = P(E∩C)/P(C).

Based on the above example, we give the following definition and formula.

Definition. Let S be a sample space and E, C be two events.

1. The conditional probability of E given that C has occurred is

P(E|C) = P(E∩C)/P(C), provided P(C) ≠ 0.

2. We get the following formula:

P(E∩C) = P(E|C)P(C).
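The idea that C becomes the new sample space can be checked by counting on a small equally likely sample space. The two-dice example here is a hypothetical illustration, not taken from the notes: E is "the sum is 8" and C is "the first die shows 3". Counting inside C and using the formula P(E∩C)/P(C) give the same answer, and that answer differs from P(E).

```python
from fractions import Fraction
from itertools import product

# Sample space S: ordered results of rolling two fair dice, 36 equally likely outcomes
S = list(product(range(1, 7), repeat=2))

def P(event):
    """n(event)/n(S) for an event given as a yes/no test on outcomes."""
    return Fraction(sum(1 for w in S if event(w)), len(S))

def E(w):  # the two dice add up to 8
    return w[0] + w[1] == 8

def C(w):  # the first die shows 3 (the "new information")
    return w[0] == 3

def E_and_C(w):
    return E(w) and C(w)

# Way 1: make C the new sample space and count inside it
C_outcomes = [w for w in S if C(w)]
by_counting = Fraction(sum(1 for w in C_outcomes if E(w)), len(C_outcomes))
# Way 2: the formula P(E|C) = P(E∩C)/P(C)
by_formula = P(E_and_C) / P(C)

print(P(E), by_counting, by_formula)  # 5/36 1/6 1/6
```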

Independent Events

If the conditional probability P(E|F) equals the "simple" probability P(E), then we say that E and F are independent. In this
case, by the formula P(E∩F) = P(E|F)P(F),

P(E∩F) = P(E)P(F).

Definition. We say that two events E and F are independent if

P(E∩F) = P(E)P(F).

If two events are not independent, then they are said to be dependent.

Remark. Let us also describe what we mean by independence of 3 or more events. For events E1, E2, … , En, we say they
are independent if the "multiplication rule" applies to every subcollection of two or more of them. For example, E, F, G, H
are independent if all of the following hold:

2 events

P(E∩F) = P(E)P(F), P(E∩G) = P(E)P(G),

P(E∩H) = P(E)P(H), P(F∩G) = P(F)P(G),

P(F∩H) = P(F)P(H), P(G∩H) = P(G)P(H)

3 events

P(E∩F∩G) = P(E)P(F)P(G), P(E∩F∩H) = P(E)P(F)P(H),

P(E∩G∩H) = P(E)P(G)P(H), P(F∩G∩H) = P(F)P(G)P(H)

4 events

P(E∩F∩G∩H) = P(E)P(F)P(G)P(H)
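Why all the 3-event (and 4-event) conditions are needed can be seen from a standard textbook example, sketched below on two fair coin tosses: A = "first toss is heads", B = "second toss is heads", C = "the two tosses match". Every pair satisfies the multiplication rule, yet P(A∩B∩C) = 1/4 while P(A)P(B)P(C) = 1/8. The checker mutually_independent is a hypothetical helper.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

# Two fair coin tosses; all four outcomes equally likely
S = list(product("HT", repeat=2))

def P(event):
    return Fraction(sum(1 for w in S if event(w)), len(S))

def mutually_independent(events):
    """Check the multiplication rule for every subcollection of 2 or more events."""
    for k in range(2, len(events) + 1):
        for group in combinations(events, k):
            p_all = P(lambda w: all(e(w) for e in group))
            if p_all != prod(P(e) for e in group):
                return False
    return True

def A(w): return w[0] == "H"      # first toss is heads
def B(w): return w[1] == "H"      # second toss is heads
def C(w): return w[0] == w[1]     # the two tosses match

print(mutually_independent([A, B]), mutually_independent([A, C]), mutually_independent([B, C]))  # True True True
print(mutually_independent([A, B, C]))  # False
```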

Problems on 3.5: Conditional Probability and Independent Events

Exercise 3.5.1. Let A, B be two events. Given that

P(A) = .66 P(A ∩ B) = .11

Find P(B|A).
Solution

Exercise 3.5.2. Given

P(A|B) = .8 P(B) = .1

Find P(A∩B).
Solution
Exercise 3.5.3. In a certain county, the probability that a person took a flu shot is .45 and the probability that a person
will get flu, given that he/she took a flu shot is .06. What is the probability that a randomly selected person took a flu
shot and will get flu?
Solution

Exercise 3.5.4. Consider the following two circuit diagrams:

[Figure: circuit diagrams, Circuit 1 and Circuit 2]

For each of the two circuits do the following:

As you can see, current flows through two switches A and B to the radio and back to the battery. It is given that the
probability that the switch A is closed is 0.91 and the probability that the switch B is closed is 0.83. Assume that the two
switches function independently. Find the probability that the radio is playing.
Solution

Exercise 3.5.5. An airplane has two engines. The probability that engine 1 fails is 0.023 and the probability that engine 2
fails is 0.06. Assume that the engines function independently.

1. What is the probability that both engines fail?

2. What is the probability that both will not fail?

3. What is the probability that neither will fail?

Solution
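For independent engines, the multiplication rule and the complement rule answer Exercise 3.5.5. This sketch reads part 2, "both will not fail", as "it is not the case that both fail" (an interpretation, not stated in the notes), which is why it differs from part 3, "neither will fail".

```python
p1_fail, p2_fail = 0.023, 0.06   # engines fail independently

p_both_fail = p1_fail * p2_fail                  # 1. multiplication rule
p_not_both_fail = 1 - p_both_fail                # 2. complement of "both fail"
p_neither_fails = (1 - p1_fail) * (1 - p2_fail)  # 3. both engines keep working
print(round(p_both_fail, 5), round(p_not_both_fail, 5), round(p_neither_fails, 5))
# 0.00138 0.99862 0.91838
```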

Exercise 3.5.6. Following are data from a hospital emergency room:

1. The probability that a patient in the emergency room will have health insurance is 0.75.

2. The probability that a patient in the emergency room will survive the treatment is 0.85.

3. The probability that a patient in the emergency room will have health insurance and will also survive is 0.7.

What is the conditional probability that a patient in the emergency room will survive, given that he/she has health
insurance?
Solution

Exercise 3.5.7. The probability that you will receive a wrong number call this week is 0.3; the probability that you will
receive a sales call this week is 0.8; and the probability that you will receive a survey call this week is 0.5. What is
the probability that you will receive one of each this week? (Assume that all these calls are independent.)

Lesson 4: Random Variables


4.1 Random Variables 4.2 Probability Distribution 4.3 The Bernoulli and Binomial
Experiments

Homework 12 and 13

4.1 Random Variables

Definition. Let S be a sample space. Then a random variable X assigns a numerical value X(w) to each outcome w in S.

Examples. Suppose we pick a KU student at random. Then our sample space S is the whole population of KU students.

1. Let X be the GPA of the student. If w is a student, X has a value X(w) which is the GPA of w.
2. Define Y as follows:

Y(w) = 0 if w is male
Y(w) = 1 if w is female

3. Let Z be the height of student w.
4. Let T be the number of credit hours completed by w.
5. Let W be the weight of w.
6. Let D be the total expenses (rounded up to the nearest dollar) of w in 1997.

Then X,Y,Z,T,W,D are all random variables.

Definitions. A random variable X is said to be a discrete random variable if the values that X can assume can be
written in a (possibly infinite) list
x1, x2, x3, ….

A random variable X is said to be a continuous random variable if X can assume any value in an interval.

Remark. In this course, examples of discrete random variables are always the number of something: number of typos,
number of accidents on a street, number of defective items in a lot, and so on. Examples of continuous random variables
are length, weight, and time.

So, Z,W are continuous random variables and X,Y,T,D are discrete random variables.

Examples.

1. Let X be the number of wrong number calls you receive in a day. Then X is a discrete random variable.

2. Let X be the waiting time before you receive the next wrong number call. Then X is a continuous random variable.

First, we will be concerned with the discrete random variables.

4.2 Probability Distribution

The probability distribution of a random variable X is a table or a rule or a method that answers probability-related
questions regarding X.

Definition. Suppose X is a discrete random variable that assumes the values


x1,x2,….

The probability distribution of X can be described by giving


p(xi) = P(X = xi)

in a table or by a formula. This function p(xi) is called the probability function of X.

So, if the probability distribution of X is given in a table, then it looks like this:

Value   Probability
x       p(x)
x1      p(x1)
x2      p(x2)
x3      p(x3)
…       …

Properties of the probability function. Suppose X is a discrete random variable that assumes the values
x1, x2, x3, …

and let p(x) be the probability function. Then we have the following:

1. 0 ≤ p(xi) ≤ 1.

2. ∑ p(xi) = 1.
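The two properties can be checked mechanically for any table. In this sketch a distribution is stored as a Python dict {value: probability} (a representation chosen for illustration); with decimal probabilities stored as floats, the total is compared with 1 within a small tolerance because decimals like 0.35 are not exact in binary.

```python
import math

def is_probability_function(pmf, tol=1e-9):
    """Check the two properties of a discrete probability function {x_i: p(x_i)}."""
    probs = list(pmf.values())
    in_range = all(0 <= p <= 1 for p in probs)                 # 1. 0 <= p(x_i) <= 1
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)   # 2. sum of p(x_i) = 1
    return in_range and sums_to_one

passengers = {1: 0.35, 2: 0.30, 3: 0.15, 4: 0.15, 5: 0.05}  # table of Exercise 4.2.1
not_a_pmf = {0: 0.5, 1: 0.6}                                 # probabilities add up to 1.1
print(is_probability_function(passengers), is_probability_function(not_a_pmf))  # True False
```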

Definition. Let X be a discrete random variable that assumes the values


x1, x2, x3, …

Then the mean μ of X is defined as


μ = ∑ xi p(xi).

The mean μ is also called the expected value of X and is denoted by E(X). The mean μ is also called the population
mean.

Example. Suppose you design a coin toss game. In this game, you give the opponent $3 if a head comes and you collect
$1 if a tail comes. Let X be the money you receive. Then X assumes the values -3 and 1. You also have a loaded coin so
that

P(H) = 1/9 P(T) = 8/9.

Then the probability distribution of X is given by

Value   Probability
x       p(x)
-3      1/9
1       8/9

So, the mean μ of X is given by
μ = ∑ xi p(xi) = (-3)(1/9) + 1(8/9) = 5/9.

Interpretation of the mean μ of X. In this example, the mean μ tells us your average win per game if you play for a
long time.
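The long-run interpretation of μ can be seen by simulating many rounds of the loaded-coin game. This sketch uses Python's random module with an arbitrary fixed seed so the run is reproducible; the average win should come out close to 5/9 ≈ 0.556, though not exactly equal to it.

```python
import random

def play_once(rng):
    """One round: heads (probability 1/9) you pay $3, tails (probability 8/9) you win $1."""
    return -3 if rng.random() < 1 / 9 else 1

rng = random.Random(2024)  # arbitrary fixed seed
n_games = 200_000
average_win = sum(play_once(rng) for _ in range(n_games)) / n_games
print(average_win)  # close to 5/9
```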

Similarly, if Z is the height then the mean μ = E(Z) is the actual mean height of the KU student population. If we take a
large sample from the KU student population and compute the sample mean, it should approximate μ.

Definition. Let X be a discrete random variable that assumes values


x1,x2, x3, …

Then the variance σ^2 of X is defined as

σ^2 = Variance(X) = ∑ (xi - μ)^2 p(xi).

Expanding (xi - μ)^2 = xi^2 - 2μxi + μ^2 and using ∑ xi p(xi) = μ and ∑ p(xi) = 1, we get

σ^2 = Variance(X) = ∑ xi^2 p(xi) - μ^2.

The standard deviation σ of X is defined as the positive square root of the variance of X:

standard deviation of X = σ = √Variance(X).

The variance σ^2 is also called the population variance. If we take a large sample and compute the sample variance
s^2, then s^2 will be an estimate for σ^2. Similarly, σ is called the population standard deviation.
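A minimal sketch of μ, σ^2 and σ for a distribution stored as a dict {value: probability} (an illustrative representation), computing the variance both from the definition and from the shortcut formula. With exact Fractions the two agree exactly on the coin game: σ^2 = 128/81.

```python
from fractions import Fraction
from math import sqrt

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance_by_definition(pmf):
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def variance_shortcut(pmf):
    return sum(x ** 2 * p for x, p in pmf.items()) - mean(pmf) ** 2

coin_game = {-3: Fraction(1, 9), 1: Fraction(8, 9)}
print(mean(coin_game), variance_by_definition(coin_game), variance_shortcut(coin_game))  # 5/9 128/81 128/81
print(sqrt(variance_shortcut(coin_game)))  # standard deviation, about 1.257
```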

Problems on 4.2: Probability Distribution

Exercise 4.2.1. The number of passengers X in a car on a freeway has the following probability distribution.

X=x 1 2 3 4 5
p(x) 0.35 0.30 0.15 0.15 0.05
Find:

1. the expected number of passengers in a car;

2. the variance σ^2 of the number of passengers;

3. the probability that the number of passengers in a car is at least 3.

Solution

Exercise 4.2.2. Karin is a plumber who works for 3 different employers. Employer A pays her $120 a day, employer B
pays her $70 a day, and employer C pays her $180 a day. She works for whoever calls her first. The probability that
employer A calls her first is 0.30; the probability that employer B calls first is 0.20; and the probability that employer C calls
her first is 0.40 (the probability that no one calls is 0.10). What are the expected value and the variance of Karin's daily income?
Solution

Exercise 4.2.3. An insurance company sells a flight insurance policy at a flat rate of $500 per flight. If a policyholder dies
in flight, the insurance company pays $100,000 to the survivors. The probability that a policyholder will die in flight is
0.003. What are the expected gain of the company per sale and the variance of that gain?
Solution

4.3 The Bernoulli and Binomial Experiments

There are many random variables that we encounter fairly often. The first one that we discuss is called a Bernoulli
random variable.

Definition. There are many statistical experiments that have only two outcomes. In such cases, the outcomes may be
called a success or a failure. So the sample space is

S={s,f}.

Here s means success and f means failure.

Such an experiment is called a Bernoulli trial. Given a Bernoulli trial, we can define a random variable as

X = 1 if success
X = 0 if failure

If the probability P(success) = p then we have P(failure) = 1-p. So, the probability distribution of a Bernoulli random
variable is given by

Value   Probability
x       p(x)
0       1-p
1       p

The mean of X is

μ = 0(1-p) + 1p = p.

The variance of X is

σ^2 = ∑ xi^2 p(xi) - μ^2 = (0^2(1-p) + 1^2 p) - p^2 = p - p^2 = p(1-p).

Binomial Random Variable

Definition. An interesting statistical experiment is a combination of n "identical and independent" Bernoulli trials. Such
an experiment is called a binomial experiment. More formally, given a positive integer n and a number p with 0 ≤ p ≤ 1,
a binomial(n,p) experiment (or B(n,p) experiment) is characterized as follows:

1. A binomial experiment consists of n identical and independent Bernoulli trials.


2. The probability of success in each trial remains fixed and is equal to p.

Definition. Given a B(n,p)-experiment, let

X = total number of successes in these n trials.

Then X is called a binomial (n,p)-random (or B(n,p)-random) variable. Following are some important facts about a
B(n,p)-random variable X:

1. X can assume values 0,1,…,n. The probability distribution is given by

p(r) = P(X = r) = P(r successes) = nCr p^r (1-p)^(n-r)

where r runs through 0,1,2,…,n.

2. The mean of X is

μ = E(X) = np.

3. The variance of X is

σ^2 = Variance(X) = np(1-p).
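A minimal sketch of the B(n,p) probability function, assuming Python 3.8+ for math.comb. It builds the whole table for n = 6, p = 0.3 (the values of Exercise 4.3.1), checks numerically that the probabilities add to 1 and that the mean and variance agree with np and np(1-p), and then reads off P(X = 2) and P(X ≥ 2).

```python
from math import comb, isclose

def binomial_pmf(r, n, p):
    """P(X = r) for a B(n, p) random variable: nCr * p**r * (1-p)**(n-r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def binomial_table(n, p):
    return {r: binomial_pmf(r, n, p) for r in range(n + 1)}

n, p = 6, 0.3
table = binomial_table(n, p)
mu = sum(r * q for r, q in table.items())
var = sum(r ** 2 * q for r, q in table.items()) - mu ** 2

assert isclose(sum(table.values()), 1)
assert isclose(mu, n * p) and isclose(var, n * p * (1 - p))

p_eq_2 = table[2]                        # P(X = 2)
p_at_least_2 = 1 - table[0] - table[1]   # complement of {X = 0 or X = 1}
print(round(p_eq_2, 6), round(p_at_least_2, 6))  # 0.324135 0.579825
```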

Problems on 4.3: Binomial Experiments

Exercise 4.3.1. Let X be a B(6,.3)-random variable. Find P(X = 2). Also find the probability that X is at least 2.
Solution

Exercise 4.3.2. According to a report entitled "Pediatric Nutrition Surveillance" published by the Centers for Disease Control
(CDC), 18 percent of children younger than 2 years had anemia in 1997. On a particular day, a pediatrician examined 11
children.

1. What is the probability that none will have anemia?


2. What is the probability that exactly 5 will have anemia?
3. What is the probability that all will have anemia?
4. Compute the expectation and variance of the number of children with anemia.
5. What is the probability that at least 7 will have anemia?
Solution

Exercise 4.3.3. A gardener planted 15 seeds. The probability that a seed will germinate is 0.1.

1. What is the probability that exactly 3 seeds will germinate?


2. What is the probability that exactly 4 seeds will germinate?
3. What is the probability that exactly 9 seeds will germinate?
4. Compute the expected number of seeds that will germinate.
5. Compute the standard deviation of the number of seeds that will germinate.
6. What is the probability that at most 4 seeds will germinate?

Solution

Exercise 4.3.4. In a particular county, 60 percent of the population is Hispanic.

1. What is the probability that a jury of 12 will have exactly 6 Hispanic members?
2. What is the probability that a jury of 12 will have more than 6 Hispanic members?

Solution

Exercise 4.3.5. From the hiring statistics of a corporation (say IBM), it is known that for every 4 interviews they give,
they make 1 job offer. Suppose that the corporation interviews 8 candidates each time it comes to campus. What are the
mean and standard deviation of the number of job offers made each time?

Lesson 5: Continuous Random Variables

5.1 Probability Density Function (pdf) 5.2 The Normal Random Variable 5.3 Normal Approximation to Binomial
Homework 14 - 16

5.1 Probability Density Function (pdf)

Given a sample space S, a continuous random variable was defined as a random variable X that can assume any value
in an interval. The probability distribution of a continuous random variable is described very differently from that of a
discrete random variable. We describe it as follows.

Definition. Let S be a sample space and X be a continuous random variable. Then there is a function f(x) of real numbers
x, called the probability density function (abbreviated pdf) of X. This pdf f(x) has the following properties:

1. We have f(x) ≥ 0 for all real numbers x.


2. For any two real numbers a ≤ b (also for a = -∞ and b = ∞) the probability that X will be between a and b is
given by the area under the graph of y = f(x), above the x-axis and between the vertical lines x = a and x = b. In
mathematical notations we have

P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a < X < b) =

the area under the graph of y = f(x), above x-axis, between the vertical lines x = a and x = b.
Look at the animations on
1. exponential probability.
2. normal probability.

3. If you have had calculus:

P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a < X < b) = ∫_a^b f(x) dx.

4. It follows that for any real number a

P(X = a) = 0.

This is very much in contrast with the discrete random variables.

5. The whole area under the graph of y = f(x) above the x-axis must be one.

Remark. Given a continuous random variable X, to get a model for f(x) we look at a large sample and look at the relative
frequency histogram of the X-values.

Example. Let X have the following pdf:

f(x) = 1 if 0 ≤ x ≤ 1, and f(x) = 0 otherwise.

Then we say X is uniformly distributed between 0 and 1 because it has the same density everywhere between 0 and 1.

Similarly, Y is said to be uniformly distributed between -1 and 3 if the pdf of Y is given by

g(x) = 1/4 if -1 ≤ x ≤ 3, and g(x) = 0 otherwise.
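For a uniform pdf the area under the graph is a rectangle, so probabilities need no calculus. This sketch (helper names are illustrative) computes P(a ≤ Y ≤ b) for Y uniform on [-1, 3] as height times the length of the part of [a, b] inside [-1, 3]; it also shows that the whole area is 1 and that a single point has probability 0.

```python
def uniform_pdf(lo, hi):
    """Return the pdf of a random variable uniformly distributed on [lo, hi]."""
    height = 1 / (hi - lo)
    return lambda x: height if lo <= x <= hi else 0.0

def uniform_prob(lo, hi, a, b):
    """P(a <= X <= b) for X uniform on [lo, hi]: the area of the rectangle of
    height 1/(hi - lo) over the part of [a, b] that lies inside [lo, hi]."""
    left, right = max(a, lo), min(b, hi)
    return max(0.0, right - left) / (hi - lo)

g = uniform_pdf(-1, 3)
print(g(0), g(5))                    # 0.25 0.0
print(uniform_prob(-1, 3, 0, 2))     # 0.5
print(uniform_prob(-1, 3, -10, 10))  # 1.0  (the whole area under g)
print(uniform_prob(-1, 3, 1, 1))     # 0.0  (P(Y = 1) = 0)
```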

The Mean and Variance

The mean μ, variance σ^2 and standard deviation σ of a continuous random variable X are interpreted as they were for
discrete random variables. As before, the mean μ, which is also called the expectation E(X), represents the average value
of X.

The definitions, however, involve calculus, which we are trying to avoid. For those who have had calculus, they are as
follows.

Suppose f(x) is the pdf of a continuous random variable X. Then the mean of X is

μ = E(X) = ∫_{-∞}^{∞} x f(x) dx

and the variance of X is

σ^2 = Variance(X) = ∫_{-∞}^{∞} (x - μ)^2 f(x) dx

and the standard deviation σ is the square root of the variance σ^2.
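The integrals can be approximated numerically without doing any calculus by hand. This sketch applies the midpoint rule (a simple numerical method chosen here for illustration) to the pdf g of Y, uniform on [-1, 3]. Since g is zero outside [-1, 3], integrating over that interval covers the whole real line. The results should be close to total area 1, μ = 1 and σ^2 = 4^2/12 = 4/3.

```python
def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def g(x):  # pdf of Y, uniform on [-1, 3]
    return 0.25 if -1 <= x <= 3 else 0.0

total = integrate(g, -1, 3)                                # should be 1
mu = integrate(lambda x: x * g(x), -1, 3)                  # E(Y)
var = integrate(lambda x: (x - mu) ** 2 * g(x), -1, 3)     # Variance(Y)
print(round(total, 6), round(mu, 6), round(var, 6))  # 1.0 1.0 1.333333
```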

Look at the following flash animations of graphs of some pdfs:

1. Example 1. Normal
2. Example 2. t Distribution
3. Example 3. Chi-square Distribution

5.2 The Normal Random Variable

The most commonly encountered random variable in nature is the normal random variable. As we have seen in the last
section, the probability distribution of a random variable is determined by the pdf of the random variable. The pdf of a
normal random variable is described below.

PDF of a Normal Random Variable: Suppose f(x) is the pdf of a normal random variable X. Then we have the following
properties of f(x).

1. The graph of the pdf y = f(x) has a symmetric bell shape as illustrated below:

2. Look at the flash animation of the pdf of normal random variables.


3. The pdf f(x) is completely known if we know the mean μ and the standard deviation σ.
4. The graph is symmetric around the vertical line x = μ. The graph is also peaked at x = μ.
5. The graph approaches the x-axis at both ends of the x-axis.
6. The larger the standard deviation σ is, the flatter the graph of y = f(x) will be.
7. In fact,
f(x) = 1/[σ√(2π)] exp[-(x-μ)^2/(2σ^2)] for -∞ < x < ∞.

8. If X is a normal random variable, we say X is normally distributed, or X has normal distribution. We also
write X has N(μ,σ)-distribution.
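The pdf formula in item 7 can be written directly in Python and used to check items 4 to 6 numerically: symmetry about μ, the peak at μ, total area 1 (approximated with the midpoint rule over μ ± 10σ, which holds essentially all of the area), and a lower, flatter peak for a larger σ. The parameters μ = 2, σ = 0.5 are arbitrary.

```python
import math

def normal_pdf(x, mu, sigma):
    """f(x) = 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)**2 / (2*sigma**2))"""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 2.0, 0.5   # illustrative parameters

# Symmetric about x = mu, with the peak at x = mu
assert math.isclose(normal_pdf(mu - 0.7, mu, sigma), normal_pdf(mu + 0.7, mu, sigma))
assert normal_pdf(mu, mu, sigma) > normal_pdf(mu + 0.1, mu, sigma)

# Total area, approximated over mu +/- 10 sigma
steps = 10_000
h = 20 * sigma / steps
area = h * sum(normal_pdf(mu - 10 * sigma + (i + 0.5) * h, mu, sigma) for i in range(steps))

# A larger sigma gives a lower, flatter peak
print(round(area, 6), round(normal_pdf(0, 0, 1), 4), round(normal_pdf(0, 0, 2), 4))  # 1.0 0.3989 0.1995
```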

Definition. A normal random variable is called a Standard Normal Random Variable if it has mean μ = 0 and standard
deviation σ = 1. So, a N(0,1)-random variable is called a standard normal variable. In some textbooks the standard normal
random variable is denoted by Z. The GOOD NEWS is that a table is available to compute probabilities for Z. The following
properties of Z will be useful.

1. The graph of the pdf y = f(x) of the standard normal random variable Z is symmetric around the y-axis.
2. The total area under the graph above the x-axis is one.
3. So, on each side of the y-axis, the area under the graph above the x-axis is .5.
4. Visit the flash animation on Standard Normal Probability to see illustrations of the above.

Using the Probability Tables: Tables are widely used to compute probabilities. However, with the spread of statistical
software, the importance of such tables has declined. In this chapter, we will use the Z-table to
compute probabilities for the standard normal random variable. We note the following:

1. Tables are available in many different formats.


2. Visit the Z-table and try to understand it.
3. This table gives P(Z<z) for numbers z.
4. The probability P(Z<z) is the area on the left side of z, under the bell curve.
5. The number z is read from the left column and top. The probability P(Z<z) is given in the middle.
6. So, P(a < Z < b) = P(Z < b) - P(Z<a) = the difference between the probability P(Z < b) and P(Z<a) that we read
from the table.
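The table value P(Z < z) can also be computed from the error function, using the standard identity P(Z < z) = (1 + erf(z/√2))/2; math.erf is in Python's standard library. This is a sketch of what the table contains, not a replacement for learning to read it.

```python
import math

def phi(z):
    """Standard normal table value P(Z < z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def prob_between(a, b):
    """P(a < Z < b) = P(Z < b) - P(Z < a)."""
    return phi(b) - phi(a)

print(round(phi(1.96), 4))                # 0.975
print(round(prob_between(-1.1, 2.5), 4))  # 0.8581, Exercise 5.2.1 part 1
print(round(1 - phi(1.5), 4))             # 0.0668, P(1.5 < Z) = 1 - P(Z < 1.5)
```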

Inverse Probability: Sometimes we will be given the probability and asked to compute a "cut off" point.

1. Example: We may be given P(Z < c) = .975 and asked to compute c. You will see from the table that P(Z<1.96) = .975
and conclude that c = 1.96.
2. Example: We may be given P(l < Z) = .005 and asked to compute l. P(l<Z) represents the area on the right side
of l, under the bell curve. So, P(Z < l) = 1 - .005 = .995. From the table P(Z<2.58) = .995 (actually .9951, but
the exact match is not always expected). So, l=2.58.
3. Visit the animation on Inverse Z distribution to inspect a particular type of cut-off problem that we will use later.
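Reading the table backwards corresponds to the inverse CDF. Assuming Python 3.8+, statistics.NormalDist provides inv_cdf, so both cut-off examples above can be sketched as:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

c = Z.inv_cdf(0.975)       # P(Z < c) = .975
l = Z.inv_cdf(1 - 0.005)   # P(l < Z) = .005, so P(Z < l) = .995
print(round(c, 2), round(l, 2))  # 1.96 2.58
```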

Given a N(μ,σ)-random variable X, we can use the Z-table to compute probabilities for X because of the following theorem.

Theorem. Let X be a N(μ, σ)-random variable. Then Z = (X-μ)/σ is a standard normal random variable. So,

P(a < X < b) = P((a-μ)/σ < Z < (b-μ)/σ)

OR

P(a < X < b) = P(A < Z < B)

where A = (a-μ)/σ and B = (b-μ)/σ.

Now we can use the Z-table.
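Standardization can be sketched in code with the numbers of Exercise 5.2.3 (bulb life N(8640, 1440)). The function standardizes a and b and uses only the standard normal CDF; the cross-check computes the same probability directly from NormalDist(8640, 1440), assuming Python 3.8+.

```python
from statistics import NormalDist

def prob_between_normal(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma), found by standardizing:
    P(A < Z < B) with A = (a - mu)/sigma and B = (b - mu)/sigma."""
    Z = NormalDist()  # standard normal
    A, B = (a - mu) / sigma, (b - mu) / sigma
    return Z.cdf(B) - Z.cdf(A)

p_less = prob_between_normal(float("-inf"), 5040, 8640, 1440)  # P(X < 5040)
p_mid = prob_between_normal(5040, 8640, 8640, 1440)            # P(5040 < X < 8640)

# Same probability without standardizing, as a cross-check
X = NormalDist(8640, 1440)
assert abs(p_less - X.cdf(5040)) < 1e-12

print(round(p_less, 4), round(p_mid, 4))  # 0.0062 0.4938
```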

Problem Solving: We will have two types of problems in this section—probability computation and problems of inverse
probability (or cut-off points).

1. For a problem on normal random variables X with mean μ and standard deviation σ, the first step
is STANDARDIZATION.
2. Then, we look at the Z-table.
3. Example: Suppose X is a N(2, .5) random variable and P(X<L) = .95. What is the cut-off L? First, we standardize
and we have P((X-μ)/σ < (L-μ)/σ) = P(Z < (L-μ)/σ) = .95. From the table, P(Z < 1.65) = .95 (approximately). So,
(L-μ)/σ = 1.65 and L = μ + 1.65σ = 2 + 1.65*.5 = 2.825.
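The same cut-off computation, sketched with statistics.NormalDist (Python 3.8+). Using the exact z value 1.6449 instead of the table's 1.65 gives L ≈ 2.822, slightly below the 2.825 obtained from the table. For an upper cut-off such as "the top 25 percent", the same function applies with probability 0.75.

```python
from statistics import NormalDist

def lower_cutoff(prob, mu, sigma):
    """Cut-off L with P(X < L) = prob for X ~ N(mu, sigma).
    Standardizing gives (L - mu)/sigma = z with P(Z < z) = prob, so L = mu + z*sigma."""
    z = NormalDist().inv_cdf(prob)
    return mu + z * sigma

L = lower_cutoff(0.95, 2, 0.5)
print(round(L, 3))  # 2.822

# Upper cut-off for the top 25 percent: P(X < U) = 0.75 (numbers of Exercise 5.2.14)
U = lower_cutoff(0.75, 15000, 3000)
```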

Ubiquity of Normal Random Variables: Many random variables that we encounter in nature are normal or approximately
normal. If there is one concept that you take from this course it is this: many of nature's random variables
are normal or approximately normal. You will hear about normal random variables and the bell curve in your workplace or
anywhere you may have to use statistics.
Problems on 5.2: the Normal Random Variable

Exercise 5.2.1. Let Z be the standard normal random variable.

1. Find the probability P(-1.1 < Z < 2.5).


2. Find the probability P(Z < -2.1).
3. Find the probability P(-2.1 < Z < -1.5).
4. Find the probability P(1.5 < Z).

Experiment with the normal animation.


Solution

Exercise 5.2.2. Let X be a normal random variable with mean μ = 3 and standard deviation σ = 1.5.

1. Find the probability P(-1.1 < X < 2.5).


2. Find the probability P(X < -2.1).
3. Find the probability P(-1.2 < X < -0.5).
4. Find the probability P(1.5 < X).

Experiment with the normal animation.


Solution

Exercise 5.2.3. The length of life of some light bulbs produced in a factory is normally distributed with mean 8640 hours
and standard deviation 1440 hours. Find the probability that a bulb will last

1. less than 5040 hours;


2. between 5040 hours and 8640 hours.

Solution

Exercise 5.2.4. The length X of a fish in a lake has normal distribution with mean 67 cm and standard deviation 21 cm.
What proportion (i.e., probability) of fish are between 44 cm and 110 cm long?
Solution

Exercise 5.2.5. The diameter of the pumpkins in my patch has normal distribution with mean 13 inches and standard
deviation 4.5 inches. What proportion (i.e., probability) of pumpkins is above 22 inches?
Solution

Exercise 5.2.6. The annual expenditure X of a student is approximately normally distributed with mean μ = 11,000
dollars and standard deviation σ = 1500 dollars. What percent of students spend less than 10,000 dollars?
Solution

Exercise 5.2.7. Suppose the annual production X of milk per cow is normally distributed with μ = 5500 liters and
standard deviation σ = 150 liters. What percent of cows have annual yield less than 5155 liters?
Solution

Exercise 5.2.8. The amount of vegetable oil X produced by a machine in a day is normally distributed with μ = 130 liters
and standard deviation σ = 25 liters. What is the probability that a machine will produce between 120 liters and 150 liters
on a day?
Solution

Exercise 5.2.9. The weight X at birth of babies is normally distributed with mean μ = 114 oz and standard deviation σ =
18 oz. What percent of babies will have birth weight below 141 oz?
Solution

Exercise 5.2.10. Let Z be the standard normal random variable.

1. Given that P(-1.1 < Z < c)=.6881, find c.


2. Given that P(Z < c)=0.0222, find c.
3. Given that P(c < Z < 1.5) = 0.0919, find c.
4. Given that P(c < Z) = 0.102, find c.
Experiment with the normal animation.
Solution

Exercise 5.2.11. The length X of a fish in a lake has normal distribution with mean 67 cm and standard deviation 21 cm.
On a fishing trip to the lake, you are instructed to release those in the lower 33 percent in length. What is the cut-off
length?
Solution

Exercise 5.2.12. The telephone company's data shows that the length X of its international calls has normal distribution
with mean 11.5 minutes and standard deviation 4.3 minutes. The company decided to give a special rate for the longest
20 percent of calls. What is the cut-off time length?
Solution

Exercise 5.2.13. The weight X of babies (of a fixed age) is normally distributed with mean μ = 212 oz and standard
deviation σ = 25 oz. Doctors would be concerned (not necessarily alarmed) if a baby is among the lower 5.05 percent in
weight. Find the cut-off weight L below which the doctors will be concerned.
Solution

Exercise 5.2.14. Monthly water consumption X per household, in a subdivision in Kansas City, has normal distribution
with mean 15000 gallons and standard deviation 3000 gallons. It has been decided that a surcharge will be imposed for
those in the top 25 percent. Find the cut-off consumption U in gallons.
Solution

5.3 Normal Approximation to Binomial

A wide range of random variables behave approximately like a normal random variable. One such example is a
binomial B(n,p) random variable.

Roughly, if X is a B(n,p) random variable, then X behaves approximately like a normal random variable with mean μ = np
and standard deviation σ = √(np(1-p)).

As we know, a B(n,p) random variable X is discrete and

P(X = r) = nCr p^r (1-p)^(n-r),   r = 0, 1, 2, …, n.

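On the other hand, if Y is a N(μ, σ) random variable, then, because Y is continuous,

P(Y = r) = 0 for every r.

Because of this, a correction is needed when the continuous normal distribution is used to approximate the discrete
binomial distribution. The following theorem states how to use the normal approximation for binomial random variables.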

Theorem. Suppose X is a B(n,p) random variable. If n is large and p is not very close to 0 or 1, then X behaves,
approximately, like a N(μ, σ) random variable, where

μ = np and σ = √(np(1-p)).

We have, for r = 0, 1, …, n,

P(X = r) ≈ P(r - 0.5 < Y < r + 0.5) = P(L < Z < R),

where Y is the approximating N(μ, σ) random variable, L = (r - 0.5 - μ)/σ and R = (r + 0.5 - μ)/σ.

More generally, for r, s = 0, 1, …, n with r ≤ s,

P(r ≤ X ≤ s) ≈ P(r - 0.5 < Y < s + 0.5) = P(L < Z < R),

where L = (r - 0.5 - μ)/σ and R = (s + 0.5 - μ)/σ. Now use the Z-table.

This adjustment by 0.5 on each side is called the continuity correction.
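The theorem can be checked numerically. The following is a minimal Python sketch (assuming Python 3.8 or later, for `math.comb` and `statistics.NormalDist`) that compares the normal approximation with continuity correction against the exact binomial sum. The function names and the numbers n = 100, p = 0.4 are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

def binomial_exact(n, p, r, s):
    """Exact P(r <= X <= s) for X ~ B(n, p), summing the binomial probabilities."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, s + 1))

def binomial_normal_approx(n, p, r, s):
    """Approximate P(r <= X <= s) by N(np, sqrt(np(1-p))) with continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    z = NormalDist()                    # standard normal Z (plays the role of the Z-table)
    left = (r - 0.5 - mu) / sigma       # L = (r - 0.5 - mu)/sigma
    right = (s + 0.5 - mu) / sigma      # R = (s + 0.5 - mu)/sigma
    return z.cdf(right) - z.cdf(left)   # P(L < Z < R)

# Hypothetical illustration: X ~ B(100, 0.4), P(35 <= X <= 45).
approx = binomial_normal_approx(100, 0.4, 35, 45)
exact = binomial_exact(100, 0.4, 35, 45)
print(round(approx, 4), round(exact, 4))
```

For these numbers μ = 40, σ = √24, and L = -5.5/√24 ≈ -1.12, R ≈ 1.12; the approximate and exact probabilities should agree closely.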


Problems on 5.3: Normal Approximation to Binomial

Exercise 5.3.1. A Lawrence bank knows that 35 percent of its customers will visit the drive-through window. If 400
customers visit the bank, what is the approximate probability that more than 120 will visit the drive-through window?
Solution

Exercise 5.3.2. It is known that the probability that a household owns a food processor is 0.1. If 190 households are
interviewed, find the approximate probability that

1. more than 26 households own a food processor;


2. fewer than 30 households own a food processor.

Solution

Exercise 5.3.3. The campaign committee of a candidate claims that sixty percent of the voters are in favor of the
candidate. You interview 150 voters. Assuming that the campaign committee's claim is accurate, what is the approximate
probability that fewer than 77 will favor the candidate?
Solution

Exercise 5.3.4. A technique is used to fertilize eggs in a fertility clinic laboratory. It is known that the probability that an
egg will be fertilized by this technique is 0.1. If 500 eggs are treated, what is the probability that at least 60 eggs will be
fertilized?
Solution

Exercise 5.3.5. The probability that a computer chip produced in a factory is defective is 0.2. If you have a sample of 60
chips, what is the probability that the number of defective chips will be less than 20?
Solution

Exercise 5.3.6. The probability that a light bulb produced by a machine is defective is p = 0.2. Suppose a quality control
inspector takes a sample of 120 bulbs. What is the probability that more than 30 bulbs will be defective?
Solution

Exercise 5.3.7. Suppose the probability that a student has access to the Internet is p = 0.8. Suppose you interview 160
students. What is the probability that fewer than 120 students will have access to the Internet?
Solution

Exercise 5.3.8. Suppose that the probability that a person favors medical use of marijuana is p = 0.6. If 780 individuals
are interviewed, what is the probability that fewer than 450 will be in favor?
Solution

Exercise 5.3.9. Suppose that the probability that a middle-income family invests in the stock market is p = 0.8. If we
interview 880 middle-income families, what is the probability that more than 700 have invested in the stock market?
Solution

Exercise 5.3.10. Suppose that an insurance company knows from experience that the probability that a life-insurance
policyholder will survive another 10 years is p = 0.9. The company has 2280 policyholders. What is the probability that
more than 2025 will survive another 10 years?

Lesson 6: Sampling Distribution

Introduction 6.1 Central Limit Theorem and Sampling Distribution of the Proportion Homework 17

Introduction

The sample mean x̄ that we computed in the previous chapters is, in fact, the observed value of a random variable X̄.
Similarly, the sample variance s² that we computed before is the observed value of a random variable S². Each time
you collect a sample, the computed sample mean x̄ is the value of the random variable X̄ for that sample. This is
explained in the following example.

Example. Suppose we want to study the height distribution of the U.S. population. We collect data of size n = 1713. We
regard the height xi of the i-th individual in this sample as the observed value of a random
variable Xi. Here Xi denotes the height of the i-th member of the sample, who could be any person
from the whole U.S. population. When we finish collecting data, we have n measurements
x1, x2, …, xn.

They are, respectively, the observed values of n random variables


X1, X2, …, Xn.

We (re)define the sample mean X̄ as the random variable

X̄ = (X1 + X2 + … + Xn)/n.

We also (re)define the sample variance S² as the random variable

S² = [1/(n - 1)] ∑ (Xi - X̄)², where the sum runs over i = 1, 2, …, n.

So, the sample mean we computed before in Lesson 2 is a value of X̄.
We also say that X1, X2, … , Xn is a sample from the population X = height of an American. We assume that our sampling
was done with replacement. Such a sample has the following properties.

1. Let X = height of an American, and let the mean of X be μ and its variance σ². Then X is called the parent or
the population random variable. Also, μ and σ² are called the population mean and variance.
2. Each sample member Xi has the same distribution as X. So, the mean of Xi is μ and the variance of Xi is σ².
3. The sample members X1, X2, …, Xn are all mutually independent.
4. The distribution of X̄ is called the sampling distribution of X̄.
5. Theorem. The mean of the sample mean X̄ is the population mean μ, that is,

E(X̄) = μ.

The variance of the sample mean X̄ is given by

Var(X̄) = σ²/n.

So, the standard deviation of X̄, denoted by σX̄, is given by

σX̄ = σ/√n.

6. Definition. The standard deviation σX̄ is also called the standard error.
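To see the theorem in action, here is a small simulation sketch in Python (standard library only). It assumes, purely for illustration, a Uniform(0, 1) population, for which μ = 0.5 and σ = √(1/12); each sample member is drawn independently, which models sampling with replacement. The average of many simulated sample means should be close to μ, and their standard deviation close to σ/√n. The helper name is hypothetical.

```python
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size n (independent draws, i.e. with replacement)
    from a Uniform(0, 1) population and return the list of sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]

n = 25
means = simulate_sample_means(n, reps=20_000)
mu = 0.5                      # population mean of Uniform(0, 1)
sigma = (1 / 12) ** 0.5       # population standard deviation of Uniform(0, 1)
print("mean of sample means:", statistics.fmean(means), "vs mu =", mu)
print("sd of sample means:  ", statistics.pstdev(means), "vs sigma/sqrt(n) =", sigma / n ** 0.5)
```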

Remark. In the above discussion, we have assumed that the sampling was done with replacement. That means that each
time a sample member is drawn, it is put back before we select the next member. A member could, therefore, appear
more than once. Although this may seem unnatural, when we are working with a large population it is unlikely to
happen, and sampling with replacement is the most natural model from the statistical point of view. (How often would
one receive two calls for the same poll?)

Sampling in which we do not put back the selected item before we select the next one is called sampling
without replacement. Although many textbooks discuss this concept at length, we will not emphasize it. All our
samples are drawn with replacement and have the above properties.

6.1 Central Limit Theorem and Sampling Distribution of the Proportion

Central Limit Theorem


Suppose X1, X2, …, Xn is a sample from a population X with mean μ and variance σ².

Assume n is large.

1. Then the sample mean X̄ is, approximately, distributed as

N(μ, σX̄)

where σX̄ = σ/√n.

2. So, approximately,

P(a < X̄ < b) = P(L < Z < R)

where L = (a - μ)/σX̄ and R = (b - μ)/σX̄

OR

P(a < X̄ < b) = P((a - μ)/(σ/√n) < Z < (b - μ)/(σ/√n)).

3. If the parent population X is normal, then statements 1 and 2 are exact.
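The CLT recipe in item 2 can be sketched as a short Python function. `clt_probability` is a hypothetical helper name, the values μ = 100, σ = 15, n = 36 are illustrative only, and `statistics.NormalDist().cdf` plays the role of the Z-table.

```python
import math
from statistics import NormalDist

def clt_probability(a, b, mu, sigma, n):
    """Approximate P(a < Xbar < b) for the mean of a sample of size n from a
    population with mean mu and standard deviation sigma (CLT)."""
    se = sigma / math.sqrt(n)          # sigma_Xbar = sigma / sqrt(n)
    z = NormalDist()                   # standard normal Z
    left, right = (a - mu) / se, (b - mu) / se
    return z.cdf(right) - z.cdf(left)  # P(L < Z < R)

# Hypothetical numbers: mu = 100, sigma = 15, n = 36, so sigma_Xbar = 2.5.
p = clt_probability(95, 105, mu=100, sigma=15, n=36)
print(round(p, 4))
```

Here σX̄ = 15/6 = 2.5, so L = -2 and R = 2, giving about 0.9545. Passing `math.inf` as `b` gives a one-sided probability P(X̄ > a).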

Sampling Distribution of the Proportion

Suppose you are conducting a poll to determine the proportion p (or percentage) of people in favor of a certain
presidential candidate. You interview a randomly selected sample of n voters. Let X be the number of people
among these n voters who are in favor of the candidate. Then X/n is the proportion of this sample that is in favor of the
candidate. We use this sample proportion X/n as an estimate of the proportion of the entire voter population that is in
favor of the candidate. This is the number that pollsters report on TV every evening before an election.

Here p is the proportion of voters who are in favor of the candidate. So, X is a B(n,p) random variable. We have already
seen (Section 5.3 in Lesson 5) that, approximately, X follows a N(μ, σ) distribution, where μ = np and σ = √(np(1-p)).
Dividing by n, the sample proportion X/n has mean np/n = p and standard deviation √(np(1-p))/n = √(p(1-p)/n).
So X/n, approximately, has the

N(p, σ) distribution,
where σ = √(p(1-p)/n).

In fact, the same could be derived from the central limit theorem. Let

Y=1 if success
Y=0 if failure

Here by "success" we mean that the voter is in favor of the candidate. Then Y is a Bernoulli(p) random variable with
mean p and variance p(1-p). The response of each voter in the sample can thus be represented as a
random variable as follows:

Xi = 1 if the i-th sample member is a success

Xi = 0 if the i-th sample member is a failure

Then X1, X2, …, Xn is a sample from the Y-population, and the sample proportion

X/n = X̄ = (X1 + X2 + … + Xn)/n

is the sample mean. So, by the CLT, the sample proportion X̄ = X/n, approximately, has the

N(p, σ) distribution,

where σ = √(p(1-p)/n).

The final formulas regarding the sample proportion X̄ = X/n are as follows:

1. The mean μ and the standard deviation σ of X̄ = X/n are given by

μ = p,   σ = √(p(1-p)/n).

2. So, approximately,

P(a < X̄ < b) = P(L < Z < R)

where L = (a - μ)/σ and R = (b - μ)/σ

OR

P(a < X/n < b) = P((a - p)/σX/n < Z < (b - p)/σX/n),

where σX/n = √(p(1-p)/n).
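A similar sketch for the sample proportion, assuming Python 3.8 or later and using a hypothetical helper and hypothetical numbers (p = 0.5, n = 100):

```python
import math
from statistics import NormalDist

def proportion_probability(a, b, p, n):
    """Approximate P(a < X/n < b) using X/n ~ N(p, sqrt(p(1-p)/n))."""
    sd = math.sqrt(p * (1 - p) / n)    # standard deviation of X/n
    z = NormalDist()                   # standard normal Z
    return z.cdf((b - p) / sd) - z.cdf((a - p) / sd)

# Hypothetical poll: p = 0.5, n = 100.
prob = proportion_probability(0.45, 0.55, p=0.5, n=100)
print(round(prob, 4))
```

Here σX/n = √(0.25/100) = 0.05, so the probability is P(-1 < Z < 1) ≈ 0.6827.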

Remark. The same method applies whenever we are trying to estimate a proportion of successes p. Some examples are
the proportion of defective items, the proportion of people in favor of capital punishment, and the proportion of immigrants.

Remark. The normal approximation of the sample proportion given above is not really different from the normal
approximation of the binomial random variable (Section 5.3). The only difference is the way we use them. In Section 5.3,
we used the continuity correction. For large n, the continuity correction shifts the sample proportion by only 0.5/n, so
its effect is negligible.

Problems on 6.1: Central Limit Theorem and Sampling Distribution of the Proportion

Problems on Central Limit Theorem:

Exercise 6.1.1. It is known that the tuition paid per semester by students in a university has a distribution with mean
$2,050 and standard deviation $310. If 64 students are interviewed, what is the approximate probability that the sample
mean tuition paid will be above $2,060?
Solution

Exercise 6.1.2. The monthly water consumption X per household in a subdivision in Kansas City has a normal distribution
with mean 15000 gallons and standard deviation 3000 gallons. What is the probability that the mean consumption of the
44 households in the subdivision will exceed 16000 gallons?
Solution

Exercise 6.1.3. According to some data, the annual Kansas wheat export X has mean 733 million dollars and standard
deviation 163 million dollars. What is the probability that the total Kansas wheat exports over the next 10 years will
exceed 8040 million dollars?
Solution

Problems on Population Proportion:

Exercise 6.1.4. According to a report entitled "Pediatric Nutrition Surveillance" published by the Centers for Disease Control
(CDC), 18 percent of children younger than two had anemia in 1997. On a particular day in that year, a pediatrician
examined 180 children.

1. What is the expected (sample) proportion of children with anemia?


2. What is the variance of the sample proportion of children with anemia?
3. What is the probability that the proportion will exceed 0.20?

Solution

Exercise 6.1.5. On one day during an impeachment hearing, it is claimed that 75 percent of eligible voters think the
President should not be impeached. Suppose we interview 700 voters. Assuming the above, what is the probability that the
sample proportion of voters who do not think the President should be impeached

1. is less than .73?


2. is less than .70?
3. is less than .60?
Lesson 7: Estimation

Introduction 7.1 Point and Interval Estimation 7.2 When σ Is Unknown 7.3 Confidence Interval for σ² 7.4 About the Population Proportion Homework 18 - 24

Introduction

The name of the game in statistics is trying to understand the POPULATION on the basis of the information
available in the SAMPLE. Part of what we mean by "understand" is estimating the values of the population parameters.
The game here is to use suitable sample STATISTICS to estimate population parameters. For example, we may wish
to use the sample mean x̄ as an estimate of the population mean μ.

We consider two methods of estimating parameters.

1. The first one is called point estimation. In point estimation, we give a number as an estimate of the parameter.
For example, if we are trying to estimate the mean height μ of the American population, we may take a sample of
a certain size, compute the sample mean height x̄, and call it an estimate of μ.
2. The second one is called interval estimation. In interval estimation, we give an interval (L, U) and say that the
parameter will be within this interval (with a certain level of confidence). For example, when estimating the mean
height μ of the American population, we may take a sample, compute the sample mean x̄, and say that the
population mean μ is in the interval (x̄ - 1, x̄ + 1). In interval estimation, the shorter the interval (the smaller U - L)
and the higher the level of confidence, the better the estimate.

7.1 Point and Interval Estimation

As we have already mentioned, we use a statistic to estimate a parameter. The statistic T used to estimate a
parameter θ is called an estimator of θ. The computed value t of T is called a point estimate, or simply an estimate, of θ.
For example, the sample mean X̄ is an estimator of μ and its computed value x̄ is an estimate of μ. An estimator is a
random variable whose distribution is a sampling distribution. Similarly, the sample variance S² is an estimator of the
population variance σ² and the computed value s² is an estimate of σ².

It may be intuitively clear to you why X̄ and S² would be reasonable estimators, respectively, of μ and σ². Mathematically,
the reasons are as follows:

1. We have

E(X̄) = μ and E(S²) = σ².

For this reason we say X̄ and S² are unbiased estimators, respectively, of μ and σ².

2. Var(X̄) = σ²/n
is small if n is large. So, as n increases, the standard deviation σX̄ = σ/√n of X̄ decreases. This means that values of
X̄ will be close to the mean μ more frequently. This improves the level of confidence in X̄ as an estimator of μ. (For a
normal distribution, the probability mass concentrates around the mean μ as the standard deviation decreases.)

Interval Estimation

We would almost never expect a point estimate t of a parameter θ to be exactly equal to the actual value of θ. This is why
it is more reasonable to give an interval (L, U) and say that θ would be within this interval. Here L and U are statistics.
Since the computed values L = l and U = u depend on the sample, we do not expect the value of θ to always be
within the computed interval (l, u). We are satisfied as long as the true value of θ falls within the interval (l, u) most
of the time (or often enough), allowing for the possibility of being "wrong" occasionally.

But how often is often enough? The probability P(L < θ < U) tells us how often the parameter will fall within (l, u). So, it is
also reasonable to give the probability P(L < θ < U), or equivalently P(θ ∉ (L, U)). This is what we do in interval
estimation; the resulting interval is called a confidence interval for θ.
Definition. Let θ be a population parameter. An interval estimate for θ provides the following:

1. It gives an interval (L,U) as an estimate for θ. Here L,U are statistics.


2. It also gives the probability P(L < θ < U). This number

P(L < θ < U) = 1- α

is called the level of confidence. And (L,U) is said to be a (1-α)100 percent confidence interval of θ.
3. In practice, α will be a small number, such as 0.1, 0.05, or 0.01.

We need the following definition.


Definition: Given a number 0 < α < 1, the number zα is defined by the formula

P(Z > zα) = α.

As mentioned above, α will be a small number, such as 0.1, 0.05, or 0.01. At the end of the Z-table is a list of the
numbers zα that we may need frequently.
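In place of the list at the end of the Z-table, the numbers zα can be computed from the inverse of the standard normal CDF, since P(Z > zα) = α means P(Z ≤ zα) = 1 - α. A minimal Python sketch (the name `z_alpha` is hypothetical):

```python
from statistics import NormalDist

def z_alpha(alpha):
    """Return z_alpha, the number with P(Z > z_alpha) = alpha for standard normal Z."""
    return NormalDist().inv_cdf(1 - alpha)

for alpha in (0.05, 0.025, 0.005):
    print(alpha, round(z_alpha(alpha), 3))
```

This prints approximately 1.645, 1.96 and 2.576, the values used for 90, 95 and 99 percent two-sided intervals (where α/2 is 0.05, 0.025 and 0.005).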

A (1-α)100 percent confidence interval for the mean μ:

Suppose X is a random variable with mean μ and variance σ². We want to construct a confidence interval for μ.
We assume that σ is known. Let X1, X2, …, Xn be a sample from X. Note that from the CLT we have, approximately,

P(-zα/2 < Z < zα/2) = 1 - α

where Z = (X̄ - μ)√n/σ.

Multiplying the inequality by σ/√n and solving for μ, we get

P(X̄ - E < μ < X̄ + E) = 1 - α

where E = zα/2 σ/√n.

So we have the following theorem.

Theorem. Assume that σ is known. Then a (1-α)100 percent confidence interval for μ is given by

x̄ - E < μ < x̄ + E

where E = zα/2 σ/√n.
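The theorem translates directly into code. This is a sketch, not a replacement for the Z-table method used in class; `z_interval` is a hypothetical name, and the inputs x̄ = 50, σ = 10, n = 100 are made up.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """(1 - alpha)100 percent confidence interval for mu when sigma is known:
    xbar - E < mu < xbar + E with E = z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z_half = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    E = z_half * sigma / math.sqrt(n)              # margin of error
    return xbar - E, xbar + E

# Hypothetical data: xbar = 50, sigma = 10, n = 100, 95 percent confidence.
low, high = z_interval(50, 10, 100, 0.95)
print(round(low, 2), round(high, 2))
```

With these inputs E = 1.96 × 10/10 ≈ 1.96, so the interval is roughly (48.04, 51.96).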

Remarks.

1. If you compute (1-α)100 percent confidence intervals over and over from new samples, then in the long run the true
value of μ will fail to lie within the confidence interval about α100 percent of the time.
2. The confidence interval we computed above may also be called a (1-α)100 percent two-sided confidence interval
for μ. Other kinds of confidence intervals are possible. For example, if

P(L < μ < ∞) =1 - α.

then (L, ∞) will be a (1-α)100 percent one sided (upper) confidence interval for μ.

Definitions and Formulas:

1. The length l of this (1-α)100 percent confidence interval for μ is given by

l = 2zα/2σ/√n.

2. The margin of error E is defined as

E = zα/2σ/√n.

3. The sample size n needed for a (1-α)100 percent confidence interval to have a preassigned margin of error E is
given by
n = (zα/2 σ/E)².

To be safe, always round n up to the next integer in this class. Also use the Z-table for online homework.
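The sample-size formula, with the round-up rule, as a Python sketch (hypothetical function name and inputs):

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, E, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= E,
    i.e. n = (z_{alpha/2} * sigma / E)^2 rounded up."""
    alpha = 1 - confidence
    z_half = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    return math.ceil((z_half * sigma / E) ** 2)    # always round up

# Hypothetical: sigma = 10, desired margin E = 2, 95 percent confidence.
print(sample_size_for_mean(10, 2, 0.95))
```

Here (1.96 × 10/2)² ≈ 96.04, which rounds up to 97.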

Use of Calculators (if you have a TI-83): Z-interval


1. Press STAT and then select TESTS.
2. Select ZInterval and press ENTER.
3. Input: select Stats (not Data) in this section.
4. Enter the values of σ, x̄, n and C-Level.
5. Select Calculate and press ENTER. It will give you the confidence interval.
6. The margin of error is E = (width of the interval)/2.
7. To compute the sample size, use the formula above.

Problems on 7.1: Point and Interval Estimation

Exercise 7.1.1. Assume that you have a normal population with mean μ and standard deviation σ = 15. Suppose you
have collected a sample of size 25 and the sample mean x̄ was found to be 81.

1. Find a 99 percent confidence interval for μ.


2. Find the margin of error at 99 percent level of confidence.
Solution

Exercise 7.1.2. Assume that you have a normal population with mean μ and standard deviation σ = 9.8. Suppose you
have collected a sample of size 14 and the sample mean x̄ was found to be 151.1.

1. Find a 99 percent confidence interval for μ.


2. Find the margin of error at 99 percent level of confidence.
Solution

Exercise 7.1.3. The time taken by an athlete to run an event is normally distributed with mean μ and known standard
deviation σ = 3.5 seconds. To estimate the mean μ, he ran 16 times and the sample mean was found to be x̄ = 33
seconds.

1. Find the margin of error in estimating the true mean μ with 95 percent level of confidence.
2. Find a 99 percent confidence interval for μ.
Solution
Exercise 7.1.4. A population has a normal distribution with variance σ² = 289. How large a sample do we need to estimate
the mean μ within 3 units of the true value of μ, with 90 percent confidence?
Solution

Exercise 7.1.5. The tuition X paid by a student per semester in a university has a distribution with mean μ and σ = $416.
How large a sample should you draw so that you are 95 percent sure that the true value of μ will be within $10 of the
sample mean x̄?
Solution

7.2 When σ Is Unknown

Let X be a normal random variable with mean μ and variance σ². Unlike in the last section, in this section we assume
that σ is not known, and we try to compute a confidence interval for μ. In the last section, the main tool (or fact) that we
used was that

Z = (X̄ - μ)√n/σ

has the N(0,1) distribution. In this section, we use the distribution of

T = (X̄ - μ)√n/S.

The distribution of T is known as the t-distribution with n - 1 degrees of freedom, which we have not yet discussed. As we
did for the N(0,1) random variable, we will now give the properties of the t-distribution.

About t-distribution
Given a positive integer ν, there is a random variable T = tν that is said to have the t-distribution with ν degrees of freedom.
The useful properties of the t-distribution are listed below:

1. A t random variable has degrees of freedom. If a random variable T has the t-distribution with ν degrees of
freedom, then we say that T has the tν distribution.

2. The t random variables are continuous random variables.

3. The mean of a t random variable is ZERO (for ν > 1; for ν = 1 the mean does not exist).

4. The graph of the pdf of a t random variable is symmetric about the y-axis and has a bell shape, with heavier tails
than the N(0,1) curve.


5. For a T = tν random variable, if the degrees of freedom ν is large, then it can be approximated by a N(0,1)
random variable.

6. For a number 0 < α < 1 and any positive integer ν, we define a number tν, α by the equation

P(T > tν, α) = α

where T has t-distribution with degrees of freedom ν.



7. Tables are available, one for each number of degrees of freedom ν, that can be used to compute probabilities for t
random variables. We will need only some of the numbers tν,α. A table sufficient for our purposes is provided at the
link for a table.

Theorem. Let X be a normal random variable with mean μ and standard deviation σ. Let X1, X2, …, Xn be a
sample of size n from the X population. Then

T = (X̄ - μ)√n/S

has the t-distribution with n - 1 degrees of freedom.

So,
P(-tn-1,α/2 < (X̄ - μ)√n/S < tn-1,α/2) = 1 - α.
Multiplying the inequality by S/√n and solving for μ, we get

P(X̄ - E < μ < X̄ + E) = 1 - α

where E = tn-1,α/2 S/√n.

A (1-α)100 percent Confidence Interval for μ

Under the set-up of the theorem, a (1-α)100 percent confidence interval for μ is given by

x̄ - E < μ < x̄ + E

where E = tn-1,α/2 s/√n.
E is also called the margin of error.
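A sketch of the t-interval in Python. The standard library has no inverse t function, so this version assumes the critical value tn-1,α/2 is looked up in the t-table and passed in; `t_interval` and the sample data are hypothetical.

```python
import math
import statistics

def t_interval(data, t_crit):
    """Confidence interval xbar - E < mu < xbar + E with E = t * s / sqrt(n).

    `t_crit` is t_{n-1, alpha/2}, read from a t-table (the Python standard
    library has no t quantile function)."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)          # sample standard deviation (divisor n - 1)
    E = t_crit * s / math.sqrt(n)       # margin of error
    return xbar - E, xbar + E

# Hypothetical sample of size 5; for 95 percent confidence the table gives t_{4, 0.025} = 2.776.
sample = [10, 12, 9, 11, 13]
print(t_interval(sample, 2.776))
```

For this sample x̄ = 11, s = √2.5 ≈ 1.581, and E = 2.776 × 1.581/√5 ≈ 1.96.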

A Frequently Asked Question: To estimate μ, when do we use the ZInterval and when do we use the TInterval? Answer:
We use the TInterval only when σ is not known; when σ is known, we use the ZInterval.

Use of Calculators (if you have a TI-83): T-interval


1. If we have raw data, enter the data into the calculator.
2. Press STAT and then select TESTS.
3. Select TInterval and press ENTER.
4. Input: select Stats or Data, depending on what is given.
5. Enter the values of the sample standard deviation s, x̄ and n, or the list where you have the
data, and the C-Level.
6. Select Calculate and press ENTER. It will give you the confidence interval.
7. The margin of error is E = (width of the interval)/2.

Problems on 7.2: When σ Is Unknown

Exercise 7.2.1. Assume that we have a normal population with mean μ and standard deviation σ. We have a sample of
size n = 18 that has sample mean x̄ = 170.5 and sample standard deviation s = 13.3. Find the margin of error and
compute a 99 percent confidence interval for μ.
Solution

Exercise 7.2.2. Suppose that the time taken to complete a problem in a Math 365 test is normally distributed with
mean μ and standard deviation σ. A sample of size 23 was taken, and sample mean and standard deviation were found to
be x̄ = 4.7 and s = 0.47. Estimate the mean time μ taken to complete a problem using a 98 percent confidence interval.
Solution

Exercise 7.2.3. It is assumed that the lifetime (in hours) of lightbulbs produced in a factory is normally distributed with
mean μ and standard deviation σ. To estimate μ the following data was collected on the lifetime of bulbs.

5110 4671 6441 3331 5055 5270 5335 4973 1837


7783 4560 6074 4777 4707 5263 4978 5418 5123

Compute a 95 percent confidence interval for μ. Write down the formula for (1-α)100 percent confidence interval that you
use here.
Solution

Exercise 7.2.4. To estimate the mean weight (in pounds) of salmon in a river the following sample was collected:

34.7 33.8 38.2 20.3 27.8 45.3 43.1 37.3 32.5 32.3
31.8 41.5 44.5 29.2 25.3 29.6 39.5 29.1 37.3
Compute a 99 percent confidence interval for the population mean μ. Write down the formula for the (1-α)100 percent
confidence interval that you use here.
Solution

Exercise 7.2.5. Suppose we collect a sample of size n = 40 from a normal population, with sample mean x̄ = 18.6 and
sample standard deviation s = 9.486. Construct a 95 percent confidence interval for the mean μ.
Solution

Exercise 7.2.6. The time taken by an athlete to run an event is normally distributed with mean μ and unknown standard
deviation σ. To estimate the mean μ, he ran 16 times and the sample mean was found to be x̄ = 33 seconds and the
sample standard deviation s = 3.5 seconds.

1. Find the margin of error in estimating the true mean μ with 99 percent level of confidence.
2. Find a 99 percent confidence interval for μ.
Solution

7.3 Confidence Interval for σ²

Let X be a normal random variable with mean μ and variance σ². In this section, we will construct a confidence interval
for σ². We will take a sample X1, X2, …, Xn of size n from the X population. Let X̄ be the sample mean and let S² be the
sample variance. To compute a confidence interval for σ², we will use the distribution of

U = (n - 1)S²/σ².

The distribution of U is known as the χ² distribution with n - 1 degrees of freedom, which we have not yet discussed.
Next we will give the properties of a χ² random variable.

About χ²-distribution

Given a positive integer ν, there is a random variable χ²ν that is said to have the χ² distribution with ν degrees of freedom.
The useful properties of the χ² distribution are listed below.

1. A χ² random variable has degrees of freedom. If a random variable U has the χ² distribution with ν degrees of
freedom, then we say that U has the χ²ν-distribution.

2. The χ² random variables are all continuous random variables.

3. A χ² random variable is always nonnegative.

4. The graph of the pdf of a χ² random variable is skewed to the right. If the degrees of freedom ν is large, then a χ²ν
random variable can be approximated by a normal random variable with mean ν and standard deviation √(2ν).

5. If U is a χ²ν random variable, then the mean of U is ν. (We will not need this.)

6. For a number 0 < α < 1 and any positive integer ν, we define a number χ²ν,α by the equation

P(U > χ²ν,α) = α

where U has the χ² distribution with ν degrees of freedom.

7. Tables are available, one for each number of degrees of freedom ν, that can be used to compute probabilities for χ²
random variables. For our purposes, only some of the numbers χ²ν,α will be needed. Here is a link for a table that will be
sufficient for us.
Theorem. Let X be a normal random variable with mean μ and variance σ². Let X1, X2, …, Xn be a sample of size n from
the X population. Then

(n - 1)S²/σ²

has the χ² distribution with n - 1 degrees of freedom.

So,

P(χ²n-1,1-α/2 < (n - 1)S²/σ² < χ²n-1,α/2) = 1 - α.

Solving the inequalities for σ², we get

P(L < σ² < U) = 1 - α

where

L = (n - 1)S²/χ²n-1,α/2

U = (n - 1)S²/χ²n-1,1-α/2

Theorem. Under the same set-up as in the above theorem, a (1-α)100 percent confidence interval for the variance σ² is
given by

l < σ² < u

where

l = (n - 1)s²/χ²n-1,α/2

u = (n - 1)s²/χ²n-1,1-α/2

OR

(n - 1)s²/χ²n-1,α/2 < σ² < (n - 1)s²/χ²n-1,1-α/2.
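Likewise for σ², a Python sketch that takes the two χ² critical values from the table (the standard library does not provide χ² quantiles). The helper name and the sample below are hypothetical; note that the larger critical value χ²n-1,α/2 produces the lower endpoint l.

```python
import statistics

def variance_interval(s2, n, chi2_alpha_half, chi2_one_minus_alpha_half):
    """Confidence interval l < sigma^2 < u with
    l = (n-1) s^2 / chi2_{n-1, alpha/2} and u = (n-1) s^2 / chi2_{n-1, 1-alpha/2}.

    Both critical values are read from a chi-square table."""
    l = (n - 1) * s2 / chi2_alpha_half
    u = (n - 1) * s2 / chi2_one_minus_alpha_half
    return l, u

# Hypothetical sample of size 10; for 95 percent confidence the table gives
# chi2_{9, 0.025} = 19.023 and chi2_{9, 0.975} = 2.700.
sample = [4.1, 5.3, 3.8, 6.0, 4.9, 5.5, 4.4, 5.1, 3.9, 5.0]
s2 = statistics.variance(sample)       # sample variance s^2 (divisor n - 1)
print(variance_interval(s2, len(sample), 19.023, 2.700))
```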

Use of Calculators: The TI-83 will not compute the confidence interval for σ². If data are given, it is important to use
the calculator to compute the sample variance s².

Problems for 7.3: Confidence Interval for σ²

Exercise 7.3.1. Suppose that we have collected a sample of size n = 26 from a normal population with mean μ and
variance σ². The sample variance was found to be s² = 26.7. Compute a 95 percent confidence interval for σ².
Solution

Exercise 7.3.2. The following is sample data on the amount (in 1000 bushels) of wheat harvested by Kansas farmers in
2002.

206 300 200 385 280

600 225 933 320 260


1. Compute a 99 percent confidence interval for the variance σ² of the harvest.

Solution

Exercise 7.3.3. The following is data on monthly gas consumption (in ccf) during the winter months by a household.
154 222 264 257 127

228 240 393 278 140


1. Compute a 99 percent confidence interval for the variance σ².

Solution

7.4 About the Population Proportion

Once again, let p be the population proportion of a certain attribute. We want to compute a confidence interval for p. We
let

X = 1 if success
X = 0 if failure

where "success" means that the sampled individual has the attribute.


So, X is a Bernoulli(p) random variable. We draw a sample X1, X2, …, Xn from the X population, and let

X = X1 + … + Xn

be the total number of successes and

X̄ = X/n

be the sample proportion of successes. We have seen that, approximately, the sample proportion X̄ has the

N(μX̄, σX̄)-distribution

where μX̄ = p and σX̄ = √(p(1-p)/n).

Therefore,
P(-zα/2 < (X̄ - p)/σX̄ < zα/2) = 1 - α.

In an attempt to compute a confidence interval for p, we solve for p and get

P(X̄ - zα/2 σX̄ < p < X̄ + zα/2 σX̄) = 1 - α.

Since σX̄ = √(p(1-p)/n) depends on the unknown p, this does not yet produce a confidence interval for p. But the
observed sample proportion x̄ of successes is a point estimate of p. Replacing p by x̄ in σX̄, we get an approximate
(1-α)100 percent confidence interval for p given by

x̄ - e < p < x̄ + e

where

e = zα/2 √(x̄(1-x̄)/n)

Following are some of the useful formulas and definitions that we may need.

1. The margin of error e is defined as

e = zα/2 √(x̄(1-x̄)/n).

2. A conservative margin of error E is defined as

E = zα/2/√(4n).

Since x̄(1-x̄) ≤ 1/4 for every x̄ between 0 and 1, the margin of error e is always less than or equal to the
conservative margin of error E.

3. Theorem. For a (1-α)100 percent confidence interval for p, if we are given a preassigned conservative margin of
error E, then the sample size n that we need to take is given by

n = (zα/2/(2E))², rounded up to the next integer.
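The formulas of this section can be collected into a short Python sketch. All function names are hypothetical; `NormalDist().inv_cdf` supplies zα/2 in place of the Z-table, and the counts (60 successes in 100 trials) are made up.

```python
import math
from statistics import NormalDist

def z_half(confidence):
    """z_{alpha/2} for a (1 - alpha)100 percent interval."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def proportion_interval(successes, n, confidence):
    """Approximate interval xbar - e < p < xbar + e, e = z_{alpha/2} sqrt(xbar(1-xbar)/n)."""
    xbar = successes / n                                   # sample proportion
    e = z_half(confidence) * math.sqrt(xbar * (1 - xbar) / n)
    return xbar - e, xbar + e

def conservative_margin(n, confidence):
    """E = z_{alpha/2} / sqrt(4n); never smaller than e because xbar(1-xbar) <= 1/4."""
    return z_half(confidence) / math.sqrt(4 * n)

def sample_size_for_proportion(E, confidence):
    """n = (z_{alpha/2} / (2E))^2, rounded up to the next integer."""
    return math.ceil((z_half(confidence) / (2 * E)) ** 2)

# Hypothetical: 60 successes in 100 trials, 95 percent confidence.
print(proportion_interval(60, 100, 0.95))
print(conservative_margin(100, 0.95))
print(sample_size_for_proportion(0.03, 0.95))
```

For 60 successes out of 100 at 95 percent confidence, e ≈ 0.096 while E ≈ 0.098, and a conservative margin of 0.03 requires n = 1068.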


Remark. In the days of Clinton's impeachment, we often heard TV newscasters read something like the following.

President Clinton has 64 percent approval rating. The poll has a margin of error plus or minus 3.1
percentage points. The poll surveyed 972 people.

They mean that the sample proportion x̄ of people who approve of President Clinton is 0.64. Normally they don't tell us
the level of confidence they are using. Assuming that they are using a 95 percent confidence interval, they mean that
E = zα/2/√(4n) = 1.96/√(4 × 972) = 0.031.

Use of Calculators (if you have a TI-83): 1-PropZint


1. Press STAT and then select TESTS.
2. Select 1-PropZInt and press ENTER.
3. Enter the values of the number of successes x, n and C-Level.
4. Select Calculate and press ENTER. It will give you the confidence interval.
5. The margin of error is e = (width of the interval)/2.
6. To compute the conservative margin of error, use the formula in the definition.
7. To compute the sample size, use the formula above.

Problems on 7.4: About the Population Proportion

Exercise 7.4.1. In a sample of 197 apples from a lot, 19 were found to be sour. Construct a 99 percent confidence
interval for the proportion p of sour apples in the lot.
Solution

Exercise 7.4.2. A new vaccine was tried on 147 randomly selected individuals, and it was determined that 97 of them
developed immunity. Find a 95 percent confidence interval for the proportion p of individuals in the population whom
the vaccine would help.
Solution

Exercise 7.4.3. Before a congressional election, a poll was conducted. Out of 887 randomly selected voters interviewed,
389 said that they would vote for Candidate A, and 359 said that they would vote for Candidate B.

1. Construct a 98 percent confidence interval for the proportion p of voters who would vote for A.
2. Construct a 98 percent confidence interval for the proportion p of voters who would vote for B.
3. What is the conservative margin of error for both?

Solution

Exercise 7.4.4. If a pollster wanted to estimate the proportion p of Americans who think that the President should not be
impeached, how large a sample should he/she take so that the true value of p will be within .02 of the sample proportion,
with 99 percent confidence?
Solution

Exercise 7.4.5. The proportion p of defective lightbulbs produced by a machine needs to be estimated within .01 to
determine whether the machine needs to be replaced. How large a sample should we take to do this with 90 percent
confidence?
Solution

Exercise 7.4.6. In a poll released on October 28, 1998, it was revealed that 60 percent of Americans wanted President
Clinton rebuked but not impeached. The poll was conducted among 1,013 adults, and it had a margin of error of 3
percentage points.

1. Can you relate the last two numbers?


2. What is the level of confidence used here?

Solution: News media polls usually use 95 percent confidence intervals, and when they say "margin of error," they mean the "conservative
margin of error." The conservative margin of error E and the level of confidence 1 - α are related by the formula
$E = z_{\alpha/2}/\sqrt{4n}$. For this problem E = 0.03, 1 - α = 0.95, and n = 1,013. We can check that
$z_{\alpha/2}/\sqrt{4n} = 1.96/\sqrt{4 \times 1013} \approx 0.03079$.
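To see which confidence level fits the reported 3 points, you can compute the conservative margin of error for n = 1,013 at a few common levels. Here is an illustrative Python sketch; `conservative_margins` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def conservative_margins(n, levels=(0.90, 0.95, 0.99)):
    """Map each confidence level 1 - alpha to E = z_{alpha/2} / sqrt(4n)."""
    margins = {}
    for confidence in levels:
        z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
        margins[confidence] = z / sqrt(4 * n)
    return margins


for level, e in conservative_margins(1013).items():
    print(f"{level:.0%}: E = {e:.4f}")
```

The 95 percent level gives about 0.031, the closest to the reported 0.03. The 90 percent level gives about 0.026 and the 99 percent level about 0.040.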

Lesson 8: Comparing Two Populations

Introduction 8.1 Confidence Interval of μ1 - μ2 8.2 When σ1 and σ2 are Unknown

8.3 Comparing Two Population Proportions Homework 25 - 27

Introduction

In this lesson we compare two populations. We will consider the following:

1. Compute a confidence interval for the difference μ1 - μ2 of the means of two populations. For example, we may want
to estimate the difference μ1 - μ2 between the mean μ1 = annual male income and the mean μ2 = annual
female income in the United States.

2. Compute a confidence interval for the difference p1 - p2 of the proportions of an attribute present (or proportions of
"success") in two populations. For example, we may want to estimate the difference p1 - p2 between p1 = the
proportion of defective items produced by the new machine and p2 = the proportion of defective items produced
by the old machine.

8.1 Confidence Interval of μ1- μ2

Suppose X, Y are two similar random variables. Let the mean and standard deviation of X be μ1 and σ1, respectively. Let the
mean and standard deviation of Y be μ2 and σ2, respectively. We want to compute a confidence interval for the
difference μ1 - μ2. We proceed as follows.

1. We draw a sample X1, X2, …, Xm of size m from the X population, and we draw a sample Y1, Y2, …, Yn of size n
from the Y population. Let

$\bar{X} = (X_1 + X_2 + \cdots + X_m)/m$

$\bar{Y} = (Y_1 + Y_2 + \cdots + Y_n)/n$

be the corresponding sample means.

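2. By the CLT, $\bar{X}$ has (approximately)

$N(\mu_1, \sigma_1/\sqrt{m})$

distribution and $\bar{Y}$ has (approximately)

$N(\mu_2, \sigma_2/\sqrt{n})$

distribution. (If the populations are normal, these distributions are exact.)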

3. $\bar{X} - \bar{Y}$ is a natural estimator of μ1 - μ2.

4. Now we assume that the X samples and Y samples are mutually independent. In that case, $\bar{X} - \bar{Y}$ has

$N(\mu_1 - \mu_2, \sigma)$ distribution,

where
$\sigma = \sqrt{\sigma_1^2/m + \sigma_2^2/n}$.

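5. It follows that

$P\left(-z_{\alpha/2} < \dfrac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sigma} < z_{\alpha/2}\right) = 1 - \alpha,$

where σ is as above in (4).

6. If we simplify, we get

$P\left(\bar{X} - \bar{Y} - z_{\alpha/2}\sigma < \mu_1 - \mu_2 < \bar{X} - \bar{Y} + z_{\alpha/2}\sigma\right) = 1 - \alpha,$

where σ is as above in (4).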

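7. Theorem. A (1-α)100 percent confidence interval for μ1 - μ2 is given by

$\bar{x} - \bar{y} - z_{\alpha/2}\sigma < \mu_1 - \mu_2 < \bar{x} - \bar{y} + z_{\alpha/2}\sigma,$

where σ is as above in (4) and $\bar{x}$, $\bar{y}$ are the observed sample means.

This formula is usable only if we know the values of σ1 and σ2.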

8. The margin of error is given by

E = zα/2 σ

where σ is as above in (4).
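The theorem translates directly into code. Here is a minimal Python sketch that assumes σ1 and σ2 are known and the samples are independent. The function name is hypothetical, and the sketch mirrors the formulas above, not the calculator's internals.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_interval(x_bar, y_bar, sigma1, sigma2, m, n, confidence):
    """(1 - alpha)100 percent CI for mu1 - mu2 with sigma1, sigma2 known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    sigma = sqrt(sigma1 ** 2 / m + sigma2 ** 2 / n)       # sd of X_bar - Y_bar, as in (4)
    e = z * sigma                                         # margin of error E
    diff = x_bar - y_bar                                  # point estimate
    return diff - e, diff + e


# Hypothetical numbers, not taken from the exercises.
print(two_sample_z_interval(10.0, 8.0, 3.0, 4.0, 9, 16, 0.95))
```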

Use of Calculators (if you have a TI-83): 2-SampZInt

1. Press STAT and then select TESTS.
2. Select 2-SampZInt and press ENTER.
3. Input: select Stats (not Data) in this section.
4. Enter the values of σ1, σ2, $\bar{x}$, m, $\bar{y}$, n, and the confidence level (C-Level).
5. Select Calculate and press ENTER. The calculator displays the confidence interval.
6. The margin of error is E = (width of the interval)/2.

Problems on 8.1: Confidence Interval of μ1 - μ2

Exercise 8.1.1. Suppose we have two normal populations with means μ1, μ2 and standard deviations σ1, σ2, respectively.
It is known that σ1 = 8.1 and σ2 = 11.3. A sample of size m = 64 was collected from the first population, and the sample
mean was found to be $\bar{x}$ = 3.7. A sample of size n = 99 was collected from the second population, and the sample mean
was found to be $\bar{y}$ = 4.1. Compute a 95 percent confidence interval for the difference of means μ1 - μ2.
Solution

Exercise 8.1.2. The birth weights of babies in developed and developing countries are normally distributed with
means μ1, μ2 and standard deviations σ1, σ2, respectively. (My data is not real.) Suppose σ1 = 2.3 pounds and σ2 = 2.9
pounds. A sample of m = 35 babies from the developed nations was collected, and the sample mean birth weight
was found to be $\bar{x}$ = 8.9 pounds. A sample of n = 48 babies from the developing nations was collected, and the
sample mean birth weight was found to be $\bar{y}$ = 7.1 pounds.
1. Compute a point estimate of the difference of mean birth weight μ1- μ2.

2. Determine the margin of error of the difference μ1- μ2 at the 95 percent level of confidence.

3. Construct a 95 percent confidence interval for μ1- μ2.

Solution

Exercise 8.1.3. African elephants and Indian elephants differ in height, weight, and length of ear and tusk. It is
natural to assume that all of these are normally distributed. The mean height and standard deviation of African elephants
are μ1 and σ1 = 1.2 feet, respectively. The mean height and standard deviation of Indian elephants are μ2 and σ2 = 1.1 feet,
respectively. A sample of 25 African elephants was collected, and the sample mean height was found to be $\bar{x}$ = 10.9
feet. A sample of 28 Indian elephants was collected, and the sample mean height was found to be $\bar{y}$ = 9.1 feet.

1. Compute a point estimate of the difference of mean height μ1- μ2.

2. Determine the maximum error of the difference μ1- μ2 at the 99 percent level of confidence.

3. Construct a 99 percent confidence interval for μ1- μ2.

Solution

8.2 When σ1 and σ2 are Unknown

As in the last section, we have two populations X, Y. We assume that X has N(μ1, σ1) distribution and Y has N(μ2, σ2)
distribution. Unlike in the last section, we assume that σ1, σ2 are unknown. We want to find a confidence interval
for μ1 - μ2.

We take a sample X1, X2, …, Xm of size m from the X population, and we take a sample Y1, Y2, …, Yn of size n from the Y
population. Following are some facts and notation.

1. Assumptions: We make the important assumption that the variances $\sigma_1^2$ and $\sigma_2^2$ are equal. So, we write

σ1 = σ2 = σ.

We also assume that the X-sample and the Y-sample are mutually independent.

2. Let $\bar{X}$ and $S_X^2$ be the sample mean and sample variance of the X-sample, and let $\bar{Y}$ and $S_Y^2$ be the sample mean and sample
variance of the Y-sample.

3. Definition. Define the pooled estimate $S_p^2$ of σ² as follows:

$S_p^2 = \dfrac{(m-1)S_X^2 + (n-1)S_Y^2}{m+n-2} = \dfrac{\sum_{i=1}^{m} (X_i - \bar{X})^2 + \sum_{j=1}^{n} (Y_j - \bar{Y})^2}{m+n-2}$

Although both $S_X^2$ and $S_Y^2$ are estimators of σ², $S_p^2$ is a better estimator of σ² because it uses both samples.
$S_p^2$ is a weighted average of $S_X^2$ and $S_Y^2$, with weights (m-1)/(m+n-2) and (n-1)/(m+n-2).

4. It follows that

$T = \dfrac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p\sqrt{1/m + 1/n}}$

has a t-distribution with m+n-2 degrees of freedom.

5. Using the same kind of computations that we have done before, we see that a (1-α)100 percent confidence
interval for μ1 - μ2 is given by

$\bar{x} - \bar{y} - E < \mu_1 - \mu_2 < \bar{x} - \bar{y} + E,$

where

$E = t_{m+n-2,\,\alpha/2}\, s_p \sqrt{1/m + 1/n}$
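Here is a sketch of the pooled computation in Python. The standard library has no t-distribution, so the critical value $t_{m+n-2,\alpha/2}$ is passed in as an argument. Read it from a t-table, or, if SciPy is installed, `scipy.stats.t.ppf(1 - alpha/2, m + n - 2)` returns it. The function names are hypothetical.

```python
from math import sqrt


def pooled_sd(s1, s2, m, n):
    """s_p: square root of ((m-1)s1^2 + (n-1)s2^2) / (m+n-2)."""
    sp2 = ((m - 1) * s1 ** 2 + (n - 1) * s2 ** 2) / (m + n - 2)
    return sqrt(sp2)


def pooled_t_interval(x_bar, y_bar, s1, s2, m, n, t_crit):
    """CI for mu1 - mu2 assuming equal variances.

    t_crit must be t_{m+n-2, alpha/2} for the desired confidence level.
    """
    sp = pooled_sd(s1, s2, m, n)
    e = t_crit * sp * sqrt(1 / m + 1 / n)   # margin of error E
    diff = x_bar - y_bar                    # point estimate of mu1 - mu2
    return diff - e, diff + e


# Hypothetical data: m = 5, n = 7, so m + n - 2 = 10 and t_{10, 0.025} = 2.228.
print(pooled_t_interval(5.0, 3.0, 2.0, 2.0, 5, 7, 2.228))
```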

Use of Calculators (if you have a TI-83): 2-SampTInt

1. If you have raw data, enter the data into the calculator in two lists (say L1, L2).
2. Press STAT and then select TESTS.
3. Select 2-SampTInt and press ENTER.
4. Input: select Stats or Data, depending on what is given.
5. Enter the sample standard deviations s1, s2, the sample means $\bar{x}$, $\bar{y}$, and the sample sizes m, n (or the lists that contain the data),
and the confidence level (C-Level). Set Pooled to Yes, since this section assumes equal variances.
6. Select Calculate and press ENTER. The calculator displays the confidence interval and also the pooled estimate (Sxp) of the common
standard deviation σ.
7. The margin of error is E = (width of the interval)/2.

Problems on 8.2: When σ1 and σ2 Are Unknown

Exercise 8.2.1. Suppose that we are comparing two "similar" normal populations with means μ1, μ2 respectively and the
populations both have standard deviation σ. We collected a sample of size m = 11 from the first population that produced
a sample mean x = 13.2 and sample standard deviation s1 = 2.33. A sample of size n = 13 was collected from the
second population that had sample mean y = 11.5 and sample variance s2 = 2.73.

1. Compute the pooled estimate sp of σ.

2. Find a point estimate of μ1 - μ2.

3. Compute the margin of error in estimating μ1 - μ2 at the 90 percent level of confidence.

4. Compute a 90 percent confidence interval for μ1 - μ2.

Solution

Exercise 8.2.2. Suppose we have two normal populations with means μ1, μ2 and equal standard deviation σ. A sample
of size m = 64 was collected from the first population, and the sample mean and standard deviation were found to be $\bar{x}$ =
3.7, s1 = 9.2. A sample of size n = 99 was collected from the second population, and the sample mean and standard
deviation were $\bar{y}$ = 4.1, s2 = 8.7.

1. Compute the pooled estimate sp of σ.

2. Compute the margin of error for a 95 percent confidence interval for μ1 - μ2.

3. Compute a 95 percent confidence interval for the difference of means μ1 - μ2.

Solution

Exercise 8.2.3. The birth weights of babies in developed and developing countries are normally distributed with
means μ1, μ2 and equal standard deviation σ. (My data is not real.) Suppose the following birth weight data
from developed and developing nations were collected.

Developed
8.8 8.1 6.3 9.7 6.3 Developing

7.1 5.3 7.7 9.1 8.1 6.3 5.2 8.3 5.9 5.5

8.2 7.9 8.3 8.9 9.0 7.1 8.1 7.9 6.3 6.9

10.1 9.9 8.8 7.8 5.2 9.1 8.1 7.0 4.9 5.3

7.2 6.3 7.1 6.3 6.1 5.8


5.7 6.8 8.3 7.7

1. Compute a point estimate of the difference of mean birth weight μ1- μ2.

2. Compute the pooled estimate of σ.

3. Determine the maximum error of the difference μ1- μ2 at the 95 percent level of confidence.

4. Construct a 95 percent confidence interval for μ1- μ2.

Solution

Exercise 8.2.4. African elephants and Indian elephants differ in height, weight, and length of ear and tusk. It is
natural to assume that all of these are normally distributed. Assume that the heights of African and Indian elephants have an
equal standard deviation σ. The mean heights of African elephants and Indian elephants are μ1, μ2, respectively. Suppose the
following data were collected on the heights of elephants from the two continents (these are not real data).

African
10.9 11.7 9.3 9.9 11.5 Indian

8.8 12.9 11.7 9.1 11.1 7.1 8.3 8.2 9.1 10.3

9.1 8.7 10.5 11.3 12.3 9.3 9.7 8.9 8.8 9.1

13.1 12.9 9.5 10.7 11.3 7.9 9.9 9.2 8.8 8.1
8.7 8.8 9.3 10.1 9.9
9.9

1. Compute a point estimate of the difference of mean height μ1- μ2.


2. Determine the maximum error of the difference μ1- μ2 at the 99 percent level of confidence.
3. Construct a 99 percent confidence interval for μ1- μ2.

Solution

8.3 Comparing Two Population Proportions


In this section, we compute a confidence interval for the difference p1-p2 of two population proportions. An example
follows.
Example. We would like to have an estimate for the difference between the proportion p1 of males who are making more
than fifty thousand dollars annually and the proportion p2 of females who are making more than fifty thousand dollars
annually. We construct a confidence interval for p1-p2.

Similarly, we might want to compare the proportions of defective items produced by an old machine and a new machine in a
factory.
Assume we have two populations. Let p1 be the proportion of Population 1 that has an attribute A and let p2 be the
proportion of Population 2 that has the attribute A. We want to compute a confidence interval for p1-p2.

So, we take a sample of size m from Population 1, let X be the number of sample members that have the attribute A,
and let $\bar{X} = X/m$ be the sample proportion that has the attribute A. (We may say that X is the number of "successes" in this
sample from Population 1 and $\bar{X} = X/m$ is the proportion of "successes".) We take a sample of size n from Population 2,
independent of the first sample. Let Y be the number of sample members that have the attribute A
and let $\bar{Y} = Y/n$ be the sample proportion that has the attribute A. (So, $\bar{Y} = Y/n$ is the sample proportion of "successes" from
Population 2.)

(To explain the context of the example above: we interview m males; X is the number of males
in this sample who make more than fifty thousand dollars annually, and $\bar{X} = X/m$ is the proportion of the males
in this sample who make more than fifty thousand dollars annually. Similarly, we interview n females, and $\bar{Y} = Y/n$
is the proportion of females in this sample who make more than fifty thousand dollars annually.)
We develop a confidence interval for p1-p2 as follows.

1. Notation. For the sample proportions, we use the following notation:

$\bar{X} = X/m$

$\bar{Y} = Y/n$

2. As we have seen before, by the CLT, $\bar{X}$ has approximately $N(p_1, \sigma_1)$ distribution, where $\sigma_1 = \sqrt{p_1(1-p_1)/m}$,
and $\bar{Y}$ has approximately $N(p_2, \sigma_2)$ distribution, where $\sigma_2 = \sqrt{p_2(1-p_2)/n}$.

3. $\bar{X} - \bar{Y}$ is a natural estimator of p1 - p2.

4. Since we have assumed that the X samples and Y samples are mutually independent, $\bar{X} - \bar{Y}$ has approximately
$N(p_1 - p_2, \sigma)$ distribution, where $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$.

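5. So, it follows that

$P\left(-z_{\alpha/2} < \dfrac{(\bar{X} - \bar{Y}) - (p_1 - p_2)}{\sigma} < z_{\alpha/2}\right) \approx 1 - \alpha$

6. If we simplify, we get $P\left((\bar{X} - \bar{Y}) - z_{\alpha/2}\sigma < p_1 - p_2 < (\bar{X} - \bar{Y}) + z_{\alpha/2}\sigma\right) \approx 1 - \alpha$

7. Since σ1 and σ2 depend on the unknown p1 and p2, we proceed as in Section 7.4: we use $\bar{X}$ as an estimate of p1 and $\bar{Y}$ as an estimate of p2, and get the following theorem.
Theorem. An approximate (1-α)100 percent confidence interval for p1 - p2 is given by

$\bar{x} - \bar{y} - E < p_1 - p_2 < \bar{x} - \bar{y} + E,$

where
$E = z_{\alpha/2}\sqrt{\bar{x}(1-\bar{x})/m + \bar{y}(1-\bar{y})/n}$

8. E is called the margin of error.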
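Here is a minimal Python sketch of the theorem above. It relies on the large-sample normal approximation, so it is only trustworthy when both samples contain enough successes and failures. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_interval(x, m, y, n, confidence):
    """Approximate CI for p1 - p2 from independent samples of sizes m and n."""
    x_bar, y_bar = x / m, y / n                          # sample proportions
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    e = z * sqrt(x_bar * (1 - x_bar) / m + y_bar * (1 - y_bar) / n)
    diff = x_bar - y_bar                                 # point estimate of p1 - p2
    return diff - e, diff + e


# Hypothetical counts: 50 of 100 versus 40 of 100.
print(two_prop_z_interval(50, 100, 40, 100, 0.95))
```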

Use of Calculators (if you have a TI-83): 2-PropZInt

1. Press STAT and then select TESTS.
2. Select 2-PropZInt and press ENTER.
3. Enter the numbers of successes and the sample sizes (x1 = x, n1 = m, x2 = y, n2 = n) and the confidence level (C-Level).
4. Select Calculate and press ENTER. The calculator displays the confidence interval.
5. The margin of error is E = (width of the interval)/2.

Problems on 8.3: Comparing Two Population Proportions

Exercise 8.3.1. Suppose two independent samples were collected from two populations. We want to compare the
proportions p1, p2, respectively, of an attribute A present in these two populations. Use a 95 percent confidence interval to
estimate p1 - p2. We are given that x = 55 had the attribute A in a sample of size m = 117 from the first population, and y
= 37 had the attribute A in a sample of size n = 79 from the second population.
Solution
Exercise 8.3.2. To compare the proportions p1, p2 of defective items produced by new and old machines, respectively,
samples were collected. In a sample of 57 items from the new machine, 6 were found to be defective; and in a sample of
41 items from the old machine, 9 were defective. Compute a 99 percent confidence interval for p1 - p2.
Solution
Exercise 8.3.3. To compare the proportions p1,p2 of men and women, respectively, who watch football, data was
collected. In a sample of 199 men, 83 said that they watch football; and in a sample of 161 women, 51 said they watch
football. (These are not real data.) Construct a 99 percent confidence interval for p1-p2.

Lesson 9: Testing Hypotheses

9.1 The Philosophy of Testing Hypotheses 9.2 Developing a Test 9.3 Testing on a Single Population

9.4 Testing Hypotheses on Variance σ² 9.5 Population Proportion 9.6 Testing of Hypotheses to Compare Two Populations

9.7 Comparing Means of Two Populations: σ1, σ2 Unknown 9.8 Comparing Proportions p1, p2 of Two Populations Homework 28 - 32

9.1 The Philosophy of Testing Hypotheses


In this lesson we will test a hypothesis H0, called the Null hypothesis, against a hypothesis HA, called the alternative
hypothesis. Only one of these two hypotheses is true. Based on the collected sample and a testing criterion that we
will set up, we will accept one of them and reject the other.

Example 1. Suppose we want to test the hypothesis that the disparity between the wages (annual incomes) of working
men and women no longer exists. Let μ1 be the mean annual income of working men and μ2 be the mean annual income
of working women. So, our Null hypothesis H0 and the alternative hypothesis HA may be written as

H0 : μ1 - μ2 > 0
HA : μ1 - μ2 = 0

Example 2. A TV commentator mentions that only about 10 years ago the average life expectancy of a human being was
75, and now it has increased substantially. To test the claim of this commentator, we let μ be the average life expectancy
of a human being. Then we set up our Null and alternative hypotheses as follows:
H0 : μ =75
HA : μ >75

1. Definition. A statistical hypothesis is a statement, claim, or proposition regarding a population. Most often, it
is about the values of the population parameters. In the above two examples, H0 and HA are statistical
hypotheses.
2. It is important to decide which hypothesis is the Null hypothesis and which is the alternative hypothesis in a given context.
Essentially, one is the negation of the other.
3. The Null hypothesis H0 represents the status quo; it is something that you have believed for a long time, or it
is some assumption or method that has been working reliably for you for a long time. You want to hold on to
the Null hypothesis unless there is very strong evidence, in the collected data, that the alternative
hypothesis is better.
4. The alternative hypothesis represents a new claim or something out of the ordinary. It could be a
researcher's new technology or a salesperson's claim that his/her product is better. We would be very
skeptical about the alternative hypothesis and would accept it only if there is very strong evidence, in the
collected data, in favor of it.
5. Given a Null hypothesis H0 and an alternative hypothesis HA, a test of hypothesis is a rule or a procedure to
decide, based on the collected sample, whether to accept H0 or HA.

Our test will be based on the value of a test statistic. The rule is also called the decision rule or a test of
significance.

6. Two types of errors. In this process of testing, we may commit two types of errors.
1. If we reject H0 when it is in fact true, then it is called a type one error.
2. If we accept H0 when it is in fact false, then it is called a type two error.
3. The probability of committing a type one error is called the level of significance and is normally
denoted by α. Usually α is 0.10, 0.05, 0.01, or some other small number.

9.2 Developing a Test

Let X be a random variable with mean μ and standard deviation σ. Some of our hypothesis tests will look like the
following.
H0 : μ = 75
HA : μ ≠ 75

or
H0 : μ = 75
HA : μ > 75

or
H0 : μ = 75
HA : μ < 75

More generally, we test hypotheses like


H0 : μ = μ0
HA : μ ≠ μ0

or
H0 : μ = μ0
HA : μ > μ0

or
H0 : μ = μ0
HA : μ < μ0

To Develop a test:

Suppose we have a random variable X with mean μ and standard deviation σ. We want to develop a test procedure for the
following null and alternative hypotheses.
H0 : μ = μ 0
HA : μ ≠ μ 0

We take a sample X1, X2, …, Xm of size m from the X population and let $\bar{X}$ be the sample mean.

1. We assume that the sample size m is large enough, so by the CLT $\bar{X}$ has (approximately)

$N(\mu, \sigma_{\bar{X}})$

distribution, where

$\sigma_{\bar{X}} = \sigma/\sqrt{m}$.

2. Both type one and type two errors can be reduced by increasing the sample size m. But once the sample size is
fixed, it is not possible to reduce both simultaneously: if you reduce the probability of a type one error, the
probability of a type two error goes up, and vice versa. Since we are more concerned about type
one errors, we fix the probability of a type one error, which is also called the level of
significance, at a small value α. So we want to develop a test at the level of significance α.

3. Since $\bar{X}$ is a good estimator of μ, and since the alternative hypothesis is

HA : μ ≠ μ0

we will reject our null hypothesis H0 only if $\bar{X}$ and μ0 are far apart, that is, if

$|\bar{X} - \mu_0|$ is large.
4. Also, if H0 is true, then μ = μ0 and

$Z = (\bar{X} - \mu_0)/\sigma_{\bar{X}}$

has N(0,1) distribution, where

$\sigma_{\bar{X}} = \sigma/\sqrt{m}$.
Z is called the test statistic. We will accept H0 if the observed absolute value |z| of Z
is small and reject H0 if |z| is large.

5. If H0 is true, then

$P(Z \notin (-z_{\alpha/2}, z_{\alpha/2})) = \alpha$

6. So, at the level of significance α, our decision rule is

Reject H0 if $z \notin (-z_{\alpha/2}, z_{\alpha/2})$, where $z = (\bar{x} - \mu_0)/\sigma_{\bar{X}}$

Accept H0 otherwise.

7. The above decision rule works only if we know the value of σ.
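Step 5 says that, when H0 is true, the two-tail rule rejects with probability α. A small Monte Carlo sketch in Python illustrates this. It repeatedly samples from a normal population with μ = μ0 and counts how often the rule rejects. The default parameter values are arbitrary illustration choices, and the function name is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_one_error_rate(mu0=75.0, sigma=15.0, m=25, alpha=0.05,
                        trials=10_000, seed=1):
    """Fraction of simulated samples (drawn with H0 true) that the rule rejects."""
    rng = random.Random(seed)                        # fixed seed: reproducible run
    z_half = NormalDist().inv_cdf(1 - alpha / 2)     # z_{alpha/2}
    sigma_xbar = sigma / sqrt(m)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(m)]   # H0 is true
        z = (fmean(sample) - mu0) / sigma_xbar
        if abs(z) >= z_half:                         # z outside (-z_{a/2}, z_{a/2})
            rejections += 1
    return rejections / trials


print(type_one_error_rate())  # close to 0.05
```

The estimate fluctuates around α. With 10,000 trials it typically lands within a few thousandths of α.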

Some Hypotheses and Decision Rules. We will assume that the value of σ is known.
1. Two-tail test: Suppose we are testing

H0 : μ = μ0
HA : μ ≠ μ0

At the level of significance α, our decision rule is


Reject H0 if $z \notin (-z_{\alpha/2}, z_{\alpha/2})$, where $z = (\bar{x} - \mu_0)/\sigma_{\bar{X}}$

Accept H0 otherwise.
2. Left-tail test: Suppose we are testing

H0 : μ = μ0
HA : μ < μ0

At the level of significance α, our decision rule is


Reject H0 if $z < -z_{\alpha}$, where $z = (\bar{x} - \mu_0)/\sigma_{\bar{X}}$

Accept H0 otherwise.

3. Right-tail test: Suppose we are testing


H0 : μ = μ0
HA : μ > μ0

At the level of significance α, our decision rule is


Reject H0 if $z > z_{\alpha}$, where $z = (\bar{x} - \mu_0)/\sigma_{\bar{X}}$

Accept H0 otherwise.
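You can collect the three decision rules into one function. Here is a minimal Python sketch that assumes σ is known. The name `z_test` and the `tail` strings are hypothetical choices.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar, mu0, sigma, m, alpha, tail="two"):
    """Return (z, reject) for H0: mu = mu0 with sigma known.

    tail: "two" for HA: mu != mu0, "left" for HA: mu < mu0,
    "right" for HA: mu > mu0.
    """
    z = (x_bar - mu0) / (sigma / sqrt(m))      # observed test statistic
    std = NormalDist()                         # standard normal N(0, 1)
    if tail == "two":
        reject = abs(z) >= std.inv_cdf(1 - alpha / 2)   # z outside (-z_{a/2}, z_{a/2})
    elif tail == "left":
        reject = z < -std.inv_cdf(1 - alpha)            # z < -z_alpha
    elif tail == "right":
        reject = z > std.inv_cdf(1 - alpha)             # z > z_alpha
    else:
        raise ValueError("tail must be 'two', 'left', or 'right'")
    return z, reject


# Hypothetical: x_bar = 80, mu0 = 75, sigma = 10, m = 16 gives z = 2.0.
print(z_test(80, 75, 10, 16, 0.05, tail="right"))  # (2.0, True)
```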

Definition. The set of values (that is, the intervals) that leads to the rejection of the Null hypothesis H0 is called
the rejection region or the critical region.
Definition. Suppose we have a test statistic T to test H0 against HA. Let the observed value of T be t. The P-value is
defined as the probability, assuming H0 is true, that T will take a value at least as extreme as t (in the direction of HA). In the above
decision rules, our test statistic is
$Z = (\bar{X} - \mu_0)/\sigma_{\bar{X}}$

If Z = z is the observed value of Z, then we have the following.

1. For the two-tail test, the P-value is given by

p = P(Z ∉ (-|z|, |z|)) = 2P(Z > |z|)

2. For the left-tail test, the P-value is given by

p=P(Z < z)

3. For the right-tail test, the P-value is given by

p=P(Z > z)
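Here is a short Python sketch of the three P-value formulas, using the standard normal CDF from `statistics.NormalDist`. The function name is hypothetical.

```python
from statistics import NormalDist


def z_p_value(z, tail="two"):
    """P-value of an observed z, assuming H0 is true so that Z ~ N(0, 1)."""
    cdf = NormalDist().cdf
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))   # P(Z not in (-|z|, |z|))
    if tail == "left":
        return cdf(z)                  # P(Z < z)
    if tail == "right":
        return 1 - cdf(z)              # P(Z > z)
    raise ValueError("tail must be 'two', 'left', or 'right'")


p = z_p_value(2.0)          # two-tail, about 0.0455
print(p, p < 0.05)          # reject H0 at alpha = 0.05 when p < alpha
```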

Use of Calculators and P-values:

1. In the TI-83 menu the above test is called the Z-Test, which comes under TESTS.
2. When we use calculators (say TI-83) for testing hypotheses, the calculator will give us z-values and p-values.
3. We can use the z-values with the above decision rules to test hypotheses.
4. Alternatively, at the level of significance α, if the P-value = p, then
Reject H0 if p < α
Accept H0 otherwise.

Remark. For the rest of this chapter, we will test hypotheses for various parameters.
1. In each case, as above, we will have three tests—the two-tail test, the left-tail test, and the right-tail test.
2. In each case, the calculator will give the value of the test statistic (as the z-value above) and the p-value.
3. If we use the p-value for a test, then the decision rule will remain the same for all the tests to come:

Reject H0 if p < α
Accept H0 otherwise.

Problems on 9.2: Developing a Test —σ Known

Exercise 9.2.1. Assume that you have a normal population with mean μ and standard deviation σ = 15. Suppose you
have collected a sample of size 25 and the sample mean $\bar{x}$ was found to be 81. We want to test the null hypothesis
H0 : μ = 75
HA : μ ≠ 75

At the 5 percent level of significance will you reject or accept the null hypothesis?
Solution

Exercise 9.2.2. (Change the level of significance.) Assume that you have a normal population with mean μ and standard
deviation σ = 15. Suppose you have collected a sample of size 25 and the sample mean $\bar{x}$ was found to be 81. We want to
test the null hypothesis
H0 : μ = 75
HA : μ ≠ 75

At the 1 percent level of significance will you reject or accept the null hypothesis?
Solution : Same as 9.2.1

Exercise 9.2.3. (Change the alternative hypothesis) Assume that you have a normal population with mean μ and
standard deviation σ = 15. Suppose you have collected a sample of size 25 and the sample mean $\bar{x}$ was found to be 81.
We want to test the null hypothesis
H0 : μ = 75
HA : μ > 75

At the 5 percent level of significance will you reject or accept the null hypothesis?
Solution

Exercise 9.2.4. The time taken by an athlete to run an event is normally distributed with mean μ and known standard
deviation σ = 3.5 seconds. The coach believes that the athlete's mean time has improved from last year's mean of 34 seconds. To test this, the
athlete ran the event 16 times, and the sample mean was found to be $\bar{x}$ = 31 seconds.

1. Formulate the null and the alternative hypotheses.


2. At 5 percent level of significance, would the coach accept or reject his belief that the athlete has improved?

Solution

9.3 Testing on a Single Population

In this section, we assume that X is a N(μ,σ) random variable. In the last section, we assumed that σ was known; in
this section we assume that σ is not known. We will do the same three tests as in the last section.

Once again, we draw a sample X1, X2, …, Xm of size m from the X population. Let $\bar{X}$ and $S^2$ be the sample mean and
variance, respectively. The test statistic we use is
$T = (\bar{X} - \mu_0)\sqrt{m}/S$
If H0: μ = μ0 is true, then T has a t-distribution with m-1 degrees of freedom. Using the same kind of arguments, we
formulate the following decision rules.

1. Two-tail test: Suppose we are testing

H0 : μ= μ0
HA : μ ≠ μ0

At the level of significance α, our decision rule is


Reject H0 if $t \notin (-t_{m-1,\alpha/2}, t_{m-1,\alpha/2})$, where $t = (\bar{x} - \mu_0)\sqrt{m}/s$

Accept H0 otherwise.

2. Left-tail test: Suppose we are testing


H0 : μ = μ0
HA : μ < μ0

At the level of significance α, our decision rule is


Reject H0 if $t < -t_{m-1,\alpha}$, where $t = (\bar{x} - \mu_0)\sqrt{m}/s$

Accept H0 otherwise.

3. Right-tail test: Suppose we are testing

H0 : μ = μ0
HA : μ > μ0

At the level of significance α, our decision rule is

Reject H0 if $t > t_{m-1,\alpha}$, where $t = (\bar{x} - \mu_0)\sqrt{m}/s$

Accept H0 otherwise.
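Here is a minimal Python sketch of the t statistic and the three decision rules. The standard library has no t quantiles, so the sketch assumes you supply the critical value from a t-table. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(data, mu0):
    """Return (t, df) with t = (x_bar - mu0) * sqrt(m) / s and df = m - 1."""
    m = len(data)
    x_bar = mean(data)
    s = stdev(data)                 # sample standard deviation (divisor m - 1)
    return (x_bar - mu0) * sqrt(m) / s, m - 1


def t_test_reject(t, t_crit, tail="two"):
    """Decision rule; t_crit is t_{m-1, alpha/2} for "two", t_{m-1, alpha} otherwise."""
    if tail == "two":
        return abs(t) >= t_crit
    if tail == "left":
        return t < -t_crit
    if tail == "right":
        return t > t_crit
    raise ValueError("tail must be 'two', 'left', or 'right'")


# Hypothetical data; df = 4, and t_{4, 0.05} = 2.132 from a t-table.
t, df = t_statistic([1, 2, 3, 4, 5], 2)
print(t, df, t_test_reject(t, 2.132, tail="right"))
```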

Use of Calculators and P-values:

1. In the TI-83 menu the above test is called the T-Test, which comes under TESTS. Use it when σ is not known.
2. The calculator will give us t-values and p-values.
3. We can use the t-values with the above decision rules to test hypotheses.
4. Alternatively, at the level of significance α, if the P-value = p, then
Reject H0 if p < α
Accept H0 otherwise.

Problems on 9.3: Testing on a Single Population —σ Unknown

Exercise 9.3.1. It is assumed that the lifetime (in hours) of light bulbs produced in a factory is normally distributed with
mean μ and standard deviation σ. The mean lifetime for an average light bulb on the market is 6000 hours. To estimate μ,
the following data were collected on the lifetimes of light bulbs.

5110 4671 6441 3331 5055 5270 5335 4973 1837 5487
7783 4560 6074 4777 4707 5263 4978 5418 5123 5017
The producer claims that the mean lifetime of its bulbs is more than that of the average bulb on the market.

1. Formulate your null and alternative hypotheses.


2. Write down your decision rule.
3. At one percent level of significance, what will you decide?

Solution

Exercise 9.3.2. To estimate the mean weight (in pounds) of salmon in a river, the following sample was collected.

34.7 33.8 38.2 20.3 27.8
45.3 43.1 37.3 32.5 32.3
31.8 41.5 44.5 29.2 25.3
29.6 39.5 29.1 37.3
Last year the mean weight was found to be 35 pounds. You want to test to determine if the mean weight has changed
significantly this year.

1. Formulate your null and alternative hypotheses.


2. Write down your decision rule.
3. At one percent level of significance, what will you decide?

Solution

Exercise 9.3.3. A supplier of light bulbs claims that the mean lifetime of his bulbs is longer than that of the bulbs
available on the market. It is known that the mean lifetime of the bulbs on the market is 3456 hours. To test the claim of
the supplier, you test a sample of 26 bulbs and find the sample mean to be 3720 hours and the sample standard deviation
to be s = 1152 hours. At 5 percent level of significance, would you accept the claim of the supplier?
Solution

Exercise 9.3.4. It is believed that the mean length of babies at birth in the United States is higher than the worldwide
mean of 18.7 inches. A sample of 26 babies in the United States was collected, and the sample mean and standard
deviation were found to be $\bar{x}$ = 19 inches, s = 1 inch. At the 1 percent level of significance, do you believe that babies in the
United States are longer?
Solution

Exercise 9.3.5. A car manufacturer claims that a new model of car will get more miles per gallon than the old model.
The old model gets a mean mileage of 33 miles per gallon. To test the claim, 9 cars of the new model were tested, and
the sample mean was found to be $\bar{x}$ = 35 miles per gallon with standard deviation s = 2.2 miles per gallon. At the 5 percent level of significance,
would you accept the claim of this manufacturer?
Solution

9.4 Testing Hypotheses on Variance σ²

Once again, let X be a N(μ, σ) random variable. We would like to test the Null hypothesis

$H_0 : \sigma^2 = \sigma_0^2$.

As usual, we draw a sample X1, X2, …, Xm of size m from the X population. Let $S^2$ be the sample variance. The test statistic
we use is

$Y = (m-1)S^2/\sigma_0^2$.

If $H_0 : \sigma^2 = \sigma_0^2$ is true, then Y has a $\chi^2$-distribution with m-1 degrees of freedom. Using the same kind of arguments, we
formulate the following decision rules.
formulate the following decision rules.

1. Two-tail test: Suppose we are testing

$H_0 : \sigma^2 = \sigma_0^2$
$H_A : \sigma^2 \neq \sigma_0^2$

At the level of significance α, our decision rule is

Reject H0 if $y \notin (\chi^2_{m-1,\,1-\alpha/2}, \chi^2_{m-1,\,\alpha/2})$, where $y = (m-1)s^2/\sigma_0^2$

Accept H0 otherwise.

2. Left-tail test: Suppose we are testing

$H_0 : \sigma^2 = \sigma_0^2$
$H_A : \sigma^2 < \sigma_0^2$

At the level of significance α, our decision rule is

Reject H0 if $y < \chi^2_{m-1,\,1-\alpha}$, where $y = (m-1)s^2/\sigma_0^2$

Accept H0 otherwise.

3. Right-tail test: Suppose we are testing

$H_0 : \sigma^2 = \sigma_0^2$
$H_A : \sigma^2 > \sigma_0^2$

At the level of significance α, our decision rule is

Reject H0 if $y > \chi^2_{m-1,\,\alpha}$, where $y = (m-1)s^2/\sigma_0^2$

Accept H0 otherwise.
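Neither the TI-83 nor Python's standard library has this test built in, so here is a hedged Python sketch of the statistic and the decision rules. It assumes you read the $\chi^2$ critical values from a table. The function names are hypothetical.

```python
from statistics import variance


def chi_square_statistic(data, sigma0_sq):
    """Return (y, df) with y = (m - 1) s^2 / sigma0^2 and df = m - 1."""
    m = len(data)
    s_sq = variance(data)           # sample variance (divisor m - 1)
    return (m - 1) * s_sq / sigma0_sq, m - 1


def chi_square_reject(y, tail, lower=None, upper=None):
    """Apply the decision rules above using table values.

    two:   lower = chi2_{m-1, 1-alpha/2}, upper = chi2_{m-1, alpha/2}
    left:  lower = chi2_{m-1, 1-alpha}
    right: upper = chi2_{m-1, alpha}
    """
    if tail == "two":
        return y <= lower or y >= upper   # y outside (lower, upper)
    if tail == "left":
        return y < lower
    if tail == "right":
        return y > upper
    raise ValueError("tail must be 'two', 'left', or 'right'")


# Hypothetical data with m = 5 (df = 4); chi2_{4, 0.05} = 9.488 from a table.
y, df = chi_square_statistic([1, 2, 3, 4, 5], 1.0)
print(y, df, chi_square_reject(y, "right", upper=9.488))
```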

Remark. The TI-83 does not have a test for σ². So, one has to use the above decision rules for this section.

Problems on 9.4: Testing Hypotheses on Variance σ2

Exercise 9.4.1. Suppose that we have collected a sample of size n = 23 from a normal population with mean μ and
variance σ². The sample variance was found to be s² = 46.7. At the 5 percent level of significance, would you conclude
that σ² is bigger than 25?
Solution

Exercise 9.4.2. Following are data on the life expectancies of a group of people older than 75.

87 92 81 76 81
87 79 88 88 79
81 89 97 91 82
At the one percent level of significance, would you conclude that the variance σ² of life expectancies is higher than 16?
Solution

Exercise 9.4.3. Following are data on a household's monthly gas consumption (in ccf) during the winter months.
154 222 264 257 127
228 240 393 278 140
At the 5 percent level of significance, would you conclude that the variance σ² of gas consumption is less than 6400 ccf²?
Solution
9.5 Population Proportion

Let p be the proportion of the population that has a particular attribute A. We want to test the Null hypothesis
$H_0 : p = p_0$.

As usual, we draw (or interview) a sample of size m. Let X be the number of sample members that have this attribute,
and let $\bar{X} = X/m$ be the sample proportion. (So, $\bar{X}$ is the sample proportion of "successes.") The test statistic we use is
$Z = (\bar{X} - p_0)/\sigma_{\bar{X}}$

where
$\sigma_{\bar{X}} = \sqrt{p_0(1-p_0)/m}$.

If H0 : p = p0 is true, then Z has approximately N(0,1) distribution. As before, our decision rules are

1. Two-tail test: Suppose we are testing

H0 : p = p0
HA : p ≠ p0

At the level of significance α, our decision rule is

Reject H0 if $z \notin (-z_{\alpha/2}, z_{\alpha/2})$, where $z = (\bar{x} - p_0)/\sigma_{\bar{X}}$

Accept H0 otherwise.

2. Left-tail test: Suppose we are testing

H0 : p = p0
HA : p < p0

At the level of significance α, our decision rule is

Reject H0 if $z < -z_{\alpha}$, where $z = (\bar{x} - p_0)/\sigma_{\bar{X}}$

Accept H0 otherwise.

3. Right-tail test: Suppose we are testing

H0 : p = p0
HA : p > p0

At the level of significance α, our decision rule is

Reject H0 if $z > z_{\alpha}$, where $z = (\bar{x} - p_0)/\sigma_{\bar{X}}$

Accept H0 otherwise.
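Here is a minimal Python sketch of the one-proportion test. It returns the z-value and P-value much like the calculator does. It applies the P-value rule "reject H0 if p < α", which is equivalent to the critical-value rules above. It assumes the large-sample normal approximation holds. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(x, m, p0, alpha, tail="two"):
    """Return (z, p_value, reject) for H0: p = p0, large-sample approximation."""
    x_bar = x / m                                # sample proportion of successes
    sigma = sqrt(p0 * (1 - p0) / m)              # sd of x_bar when H0 is true
    z = (x_bar - p0) / sigma
    cdf = NormalDist().cdf
    if tail == "two":
        p_value = 2 * (1 - cdf(abs(z)))
    elif tail == "left":
        p_value = cdf(z)
    elif tail == "right":
        p_value = 1 - cdf(z)
    else:
        raise ValueError("tail must be 'two', 'left', or 'right'")
    return z, p_value, p_value < alpha           # reject H0 if p < alpha


# Hypothetical: 60 successes in 100 trials, H0: p = 0.5 vs HA: p > 0.5.
print(one_prop_z_test(60, 100, 0.5, 0.05, tail="right"))
```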

Use of Calculators and P-values:

1. In the TI-83 menu the above test is called the 1-PropZTest, which comes under TESTS.
2. The calculator will ask for p0, the number of success x, and the sample size n.
3. The calculator will give us z-values and p-values; p-cap is, in fact, sample proportion of success x = x/n.
4. We can use the z-values with the above decision rules to test hypotheses.
5. Alternatively, at the level of significance α, if the P-value is p, then
Reject H0 if p < α
Accept H0 otherwise.
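The P-value rule can be sketched the same way. The helpers below are hypothetical names computed with `statistics.NormalDist`, not the calculator's internal method; they convert an observed z into a P-value for each kind of test, and the same conversion applies to the z-values reported by the 2-SampZTest and 2-PropZTest later in this chapter.

```python
from statistics import NormalDist


def z_p_value(z, tail):
    """P-value of an observed z when Z ~ N(0, 1) under H0."""
    phi = NormalDist().cdf                 # standard normal CDF
    if tail == "two":
        return 2 * (1 - phi(abs(z)))       # probability beyond |z| in both tails
    if tail == "left":
        return phi(z)                      # P(Z <= z)
    if tail == "right":
        return 1 - phi(z)                  # P(Z >= z)
    raise ValueError("tail must be 'two', 'left', or 'right'")


def reject_h0(p_value, alpha):
    """Reject H0 exactly when the P-value is below alpha."""
    return p_value < alpha
```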
Problems on 9.5: Population Proportion

Exercise 9.5.1. In a sample of 197 apples from a lot, 19 were found to be sour.

1. At the one percent level of significance, would you conclude that more than 10 percent of the apples are sour?
2. At the five percent level of significance, would you conclude that more than 10 percent of the apples are sour?
3. At the ten percent level of significance, would you conclude that more than 10 percent of the apples are sour?

Solution

Exercise 9.5.2. A new vaccine was tried on 147 randomly selected individuals, and it was determined that 61 of them got
the virus. It is known that usually fifty percent of the population get the virus.

1. At the one percent level of significance, would you conclude that the vaccine is effective?
2. At the five percent level of significance, would you conclude that the vaccine is effective?
3. At the ten percent level of significance, would you conclude that the vaccine is effective?

Solution

Exercise 9.5.3. Before an election for a congressional seat, a poll was conducted. Out of 887 randomly selected voters
interviewed, 389 said that they would vote for Candidate A, and 359 said that they would vote for Candidate B.

1. At the five percent level of significance, would you conclude that Candidate A will receive more than 40 percent of the
vote?
2. At the ten percent level of significance, would you conclude that Candidate A will receive more than 40 percent of the
vote?
3. At the ten percent level of significance, would you conclude that Candidate B will receive more than 40 percent of the
vote?
Solution

9.6 Testing of Hypotheses to Compare Two Populations

Just as we computed confidence intervals to compare two populations, in this section we carry out significance tests to
compare two populations.
Let X be a random variable with mean μ1 and standard deviation σ1 and let Y be a random variable with mean μ2 and
standard deviation σ2. (For example, X could be the height of an American male and Y could be the height of an American
female.)
We may want to test whether the means μ1 and μ2 are equal. So, our Null hypothesis is given by

H0 : μ1 = μ2

or equivalently
H0 : μ1- μ2 = 0.

So, as before, we collect a sample $X_1, X_2, \ldots, X_m$ of size m from the X-population and a sample $Y_1, Y_2, \ldots, Y_n$
of size n from the Y-population. Let $\bar{X}$ and $S_1^2$ be the sample mean and variance, respectively, of the X-sample.
Let $\bar{Y}$ and $S_2^2$ be the sample mean and variance, respectively, of the Y-sample.

First, assume that σ1, σ2 are known

If σ1, σ2 are known, then the test statistic that we use is

$Z = (\bar{X} - \bar{Y})/\sigma_d$

where
$\sigma_d = \sqrt{\sigma_1^2/m + \sigma_2^2/n}$

If the Null hypothesis H0 : μ1 - μ2 = 0 is true, then Z has an N(0,1) distribution (exactly when both populations are
normal, and approximately when m and n are large). As before, our decision rules are formulated as follows.

1. Two-tail test: Suppose we are testing

H0 : μ1 - μ2= 0
HA : μ1 - μ2≠ 0

At the level of significance α, our decision rule is

Reject H0 if $z \notin (-z_{\alpha/2}, z_{\alpha/2})$, where $z = (\bar{x} - \bar{y})/\sigma_d$

Accept H0 otherwise.

2. Left-tail test: Suppose we are testing

H0 : μ1 - μ2 = 0
HA : μ1 - μ2 < 0

At the level of significance α, our decision rule is

Reject H0 if $z < -z_{\alpha}$, where $z = (\bar{x} - \bar{y})/\sigma_d$

Accept H0 otherwise.

3. Right-tail test: Suppose we are testing

H0 : μ1 - μ2 = 0
HA : μ1 - μ2 > 0

At the level of significance α, our decision rule is

Reject H0 if $z > z_{\alpha}$, where $z = (\bar{x} - \bar{y})/\sigma_d$

Accept H0 otherwise.

Remark. If the sample sizes m, n are large, we can use S1, S2 as estimates of σ1, σ2 in the above expression for Z. So,
the modified formula for Z is
$Z = (\bar{X} - \bar{Y})/S_d$

where

$S_d = \sqrt{S_1^2/m + S_2^2/n}$
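Assuming two independent samples, the two-sample z-test, including the large-sample substitution described in the Remark, might look like the sketch below. The function name and argument order are hypothetical; `sd1` and `sd2` hold σ1, σ2 when they are known, or S1, S2 when m and n are large. Critical values come from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def two_sample_ztest(xbar, ybar, sd1, sd2, m, n, alpha, tail):
    """Z test of H0: mu1 - mu2 = 0 for two independent samples.

    sd1, sd2 -- the known sigma1, sigma2; for large m, n the sample
                standard deviations s1, s2 may be passed instead.
    tail     -- "two", "left", or "right" (hypothetical labels)
    Returns (z, reject_h0).
    """
    sigma_d = math.sqrt(sd1**2 / m + sd2**2 / n)   # sigma_d (or S_d)
    z = (xbar - ybar) / sigma_d
    std_normal = NormalDist()
    if tail == "two":
        reject = abs(z) >= std_normal.inv_cdf(1 - alpha / 2)
    elif tail == "left":
        reject = z < -std_normal.inv_cdf(1 - alpha)
    elif tail == "right":
        reject = z > std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("tail must be 'two', 'left', or 'right'")
    return z, reject
```

For instance, the data of Exercise 9.6.1 correspond to `two_sample_ztest(3.7, 4.1, 8.1, 11.3, 64, 99, 0.05, "two")`.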

Use of Calculators and P-values:

1. In the TI-83 menu the above test is called the 2-SampZTest, which comes under TESTS.
2. Use it when σ1 and σ2 are known.
3. The calculator will give us z-values and P-values.
4. We can use the z-values with the above decision rules to test hypotheses.
5. Alternatively, at the level of significance α, if the P-value is p, then
Reject H0 if p < α
Accept H0 otherwise.
Problems on 9.6: Testing of Hypotheses to Compare Two Populations — σ1, σ2 Known

Exercise 9.6.1. Suppose we have two normal populations with means μ1, μ2 and standard deviations σ1, σ2, respectively.
It is known that σ1 = 8.1 and σ2 = 11.3. A sample of size m = 64 was collected from the first population, and the sample
mean was found to be $\bar{x} = 3.7$. A sample of size n = 99 was collected from the second population, and the sample mean
was found to be $\bar{y} = 4.1$. At the 5 percent level of significance, would you conclude that μ1 ≠ μ2?
Solution

Exercise 9.6.2. Suppose the birth weights of babies in developed and developing countries are normally distributed with
means μ1, μ2 and standard deviations σ1, σ2, respectively. (My data is not real, as is often the case.) It is known that
σ1 = 2.3 pounds and σ2 = 2.9 pounds. A sample of m = 35 babies from the developed nations was collected, and the
sample mean birth weight was found to be $\bar{x} = 8.9$ pounds. A sample of n = 48 babies from the developing nations
was collected, and the sample mean birth weight was found to be $\bar{y} = 7.6$ pounds.

1. At the 5 percent level of significance, would you conclude that the mean birth weight of babies in the developed
nations is higher than that of the developing nations?
2. At the 1 percent level of significance, would you conclude that the mean birth weight of babies in developed nations is
higher than that of developing nations?

Solution

Exercise 9.6.3. African elephants and Indian elephants differ in height, weight, and length of ear and tusk. It is
natural to assume that all these are normally distributed. The mean and standard deviation of the height of African elephants
are μ1 and σ1 = 1.5 feet, respectively. The mean and standard deviation of the height of Indian elephants are μ2 and σ2 = 1.3
feet, respectively. A sample of 25 African elephants was collected, and the sample mean height was found to be $\bar{x}$ =
10.9 feet. A sample of 28 Indian elephants was collected, and the sample mean height was found to be $\bar{y}$ = 9.1 feet.

1. At the 5 percent level of significance, would you conclude that the mean height of African elephants is higher than
that of the Indian elephants?
2. At the 1 percent level of significance, would you conclude that the mean height of African elephants is higher than
that of the Indian elephants?

Solution

9.7 Comparing Means of Two Populations: σ1, σ2 Unknown

As we did with confidence intervals, we now consider the case where σ1, σ2 are not known, but we assume that the standard
deviations are equal:
σ1 = σ2 = σ.

In this case, we use the pooled estimator Sp of σ given by

$S_p = \sqrt{\dfrac{(m-1)S_X^2 + (n-1)S_Y^2}{m+n-2}}$

where SX and SY are the respective sample standard deviations of the corresponding samples. The test statistic that we
use is
$T = \dfrac{\bar{X} - \bar{Y}}{S_p\sqrt{1/m + 1/n}}$
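The pooled estimate Sp and the statistic T can be computed directly from summary statistics, as given in Exercise 9.7.1. A minimal sketch, assuming normal populations with a common σ; the function name is hypothetical.

```python
import math


def pooled_t_statistic(xbar, ybar, s_x, s_y, m, n):
    """Pooled two-sample t statistic for H0: mu1 - mu2 = 0, assuming sigma1 = sigma2.

    s_x, s_y are sample standard deviations (not variances).
    Returns (t, s_p, degrees_of_freedom).
    """
    df = m + n - 2
    # Pooled estimate of the common sigma, weighting each variance by its df.
    s_p = math.sqrt(((m - 1) * s_x**2 + (n - 1) * s_y**2) / df)
    t = (xbar - ybar) / (s_p * math.sqrt(1 / m + 1 / n))
    return t, s_p, df
```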

If the Null hypothesis H0 : μ1 - μ2 = 0 is true, then T has a t-distribution with m+n-2 degrees of freedom. We formulate
the test hypotheses and the decision rules as follows.

1. Two-tail test: Suppose we are testing

H0 : μ1 - μ2 = 0
HA : μ1 - μ2 ≠ 0

At the level of significance α, our decision rule is


Reject H0 if $t \notin (-t_{m+n-2,\alpha/2}, t_{m+n-2,\alpha/2})$, where $t = (\bar{x} - \bar{y})/[s_p\sqrt{1/m + 1/n}]$

Accept H0 otherwise.

2. Left-tail test: Suppose we are testing

H0 : μ1 - μ2 = 0
HA : μ1 - μ2 < 0

At the level of significance α, our decision rule is

Reject H0 if $t < -t_{m+n-2,\alpha}$, where $t = (\bar{x} - \bar{y})/[s_p\sqrt{1/m + 1/n}]$

Accept H0 otherwise.

3. Right-tail test: Suppose we are testing

H0 : μ1 - μ2 = 0
HA : μ1 - μ2 > 0

At the level of significance α, our decision rule is

Reject H0 if $t > t_{m+n-2,\alpha}$, where $t = (\bar{x} - \bar{y})/[s_p\sqrt{1/m + 1/n}]$

Accept H0 otherwise.
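Python's standard library has no t-distribution, so the sketch below assumes the critical value is read from a t-table and passed in as `t_crit`; it then applies the decision rules above to raw data (as in Exercises 9.7.3 and 9.7.4). The name `pooled_t_test` and the `tail` labels are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t_test(xs, ys, t_crit, tail):
    """Pooled two-sample t test of H0: mu1 - mu2 = 0 from raw data.

    t_crit -- t-table value with m+n-2 degrees of freedom, supplied by the user:
              t_{m+n-2, alpha/2} for tail="two", t_{m+n-2, alpha} otherwise.
    Returns (t, degrees_of_freedom, reject_h0).
    """
    m, n = len(xs), len(ys)
    df = m + n - 2
    # variance() is the sample variance (divisor m-1 or n-1).
    s_p = math.sqrt(((m - 1) * variance(xs) + (n - 1) * variance(ys)) / df)
    t = (mean(xs) - mean(ys)) / (s_p * math.sqrt(1 / m + 1 / n))
    if tail == "two":
        reject = abs(t) >= t_crit          # t outside (-t_crit, t_crit)
    elif tail == "left":
        reject = t < -t_crit
    elif tail == "right":
        reject = t > t_crit
    else:
        raise ValueError("tail must be 'two', 'left', or 'right'")
    return t, df, reject
```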

Use of Calculators and P-values:

1. In the TI-83 menu the above test is called the 2-SampTTest, which comes under TESTS.
2. Use it when σ1 = σ2 = σ are UNKNOWN and equal. Either the sample standard deviations s1, s2 or the raw data will be given.
3. Always use the pooled estimate of σ by selecting YES for "Pooled".
4. The calculator will give t-values and P-values, and also the pooled estimate SXP.
5. We can use the t-values with the above decision rules to test hypotheses.
6. Alternatively, at the level of significance α, if the P-value is p, then
Reject H0 if p < α
Accept H0 otherwise.

Problems on 9.7: Comparing Means of Two Populations (σ1, σ2 Unknown)

Exercise 9.7.1. Suppose that we are comparing two similar normal populations with means μ1, μ2, respectively, and equal
standard deviation σ. We collected a sample of size m = 11 from the first population that produced a sample mean $\bar{x}$ =
13.2 and sample standard deviation s1 = 2.33. A sample of size n = 13 was collected from the second population that
had sample mean $\bar{y}$ = 11.5 and sample variance s2 = 2.73.

At the 5 percent level of significance, would you conclude that μ1 ≠ μ2?
Solution
Exercise 9.7.2. Suppose we have two normal populations with means μ1, μ2 and equal standard deviation σ. A sample of
size m = 64 was collected from the first population, and the sample mean and standard deviation were found to be $\bar{x}$ = 3.1,
s1 = 9.2. A sample of size n = 99 was collected from the second population, and the sample mean and standard deviation
were $\bar{y}$ = 4.4, s2 = 8.7. At the 5 percent level of significance, would you conclude that μ1 ≠ μ2?
Solution
Exercise 9.7.3. Suppose the birth weights of babies in developed and developing countries are normally distributed with
means μ1, μ2 and equal standard deviation σ. (My data is not real, as is often the case.) The following data on birth
weights in developed and developing nations were collected.

8.8 8.1 6.3 9.7 6.3


7.1 5.3 7.7 9.1 8.1 6.3 5.2 8.3 5.9 5.5

8.2 7.9 8.3 8.9 9.0 7.1 8.1 7.9 6.3 6.9

10.1 9.9 8.8 7.8 5.2 9.1 8.1 7.0 4.9 5.3

7.2 6.3 7.1 6.3 6.1 5.8


5.7 6.8 8.3 7.7
1. At the 5 percent level of significance, would you conclude that the mean birth weight of babies in the developed
countries is higher than that in developing countries?
2. At the 1 percent level of significance, would you conclude that the mean birth weight of babies in the developed
countries is higher than that in developing countries?

Solution

Exercise 9.7.4. African elephants and Indian elephants differ in height, weight, and length of ear and tusk. It is
natural to assume that all these are normally distributed. Assume that the heights of African and Indian elephants have an
equal standard deviation σ. The mean heights of the African elephants and Indian elephants are μ1, μ2, respectively. The
following data were collected on the height of the elephants from the two continents (these are not real data):

10.9 11.7 9.3 9.9 11.5


8.8 12.9 11.7 9.1 11.1 7.1 8.3 8.2 9.1 10.3

9.1 8.7 10.5 11.3 12.3 9.3 9.7 8.9 8.8 9.1

13.1 12.9 9.5 10.7 11.3 7.9 9.9 9.2 8.8 8.1
8.7 8.8 9.3 10.1 9.9
9.9

1. At the 5 percent level of significance, would you conclude that the mean height of African elephants is higher than
that of Indian elephants?
2. At the 1 percent level of significance, would you conclude that the mean height of African elephants is higher than
that of Indian elephants?

Solution

9.7 Paired t-test


Once again, we are testing equality of means μ1, μ2 of two populations. So, our Null Hypothesis is

H0 : μ1- μ2 = 0.

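We continue to denote the first population random variable by X and the second population random variable by Y. We also
assume that X and Y are jointly normal, so that X, Y, and X-Y are all normally distributed. However, X and Y need not be
independent: in paired data the two measurements in a pair are usually related, which is the reason for pairing them.

In certain situations, it is natural to collect samples in "pairs" (X,Y) from the two populations and consider the difference
D = X-Y. So, D has mean
μD = μ1 - μ2

and our Null hypothesis becomes

H0 : μD = 0.

Also, D has an
N(μD, σD)-distribution

where

$\sigma_D = \sqrt{\sigma_1^2 + \sigma_2^2 - 2\,\mathrm{Cov}(X,Y)}$,

which reduces to $\sigma_D = \sqrt{\sigma_1^2 + \sigma_2^2}$ when X and Y are independent.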

We will collect samples in pairs (X1,Y1), …,(Xn,Yn) and look at the corresponding D-sample:

D1 = X1-Y1, …, Dn = Xn-Yn.

Let
$\bar{D} = (D_1 + \cdots + D_n)/n$
$S_D^2 = \sum_{i=1}^{n} (D_i - \bar{D})^2/(n-1)$

be the sample mean and variance, respectively, of the D-sample.

The test statistic that we will use is


$T = \bar{D}\sqrt{n}/S_D$

If the Null hypothesis H0 : μD = μ1 - μ2 = 0 is true, then T has a t-distribution with n-1 degrees of freedom.

The following are decision rules for the Paired t-test.

1. Two-tail test: Suppose we are testing

H0 : μ1 - μ2 = 0
HA : μ1 - μ2 ≠ 0

At the level of significance α, our decision rule is

Reject H0 if $t \notin (-t_{n-1,\alpha/2}, t_{n-1,\alpha/2})$, where $t = \bar{d}\sqrt{n}/s_D$

Accept H0 otherwise.

2. Left-tail test: Suppose we are testing

H0 : μ1 - μ2 = 0
HA : μ1 - μ2 < 0

At the level of significance α, our decision rule is

Reject H0 if $t < -t_{n-1,\alpha}$, where $t = \bar{d}\sqrt{n}/s_D$

Accept H0 otherwise.

3. Right-tail test: Suppose we are testing

H0 : μ1 - μ2 = 0
HA : μ1 - μ2 > 0

At the level of significance α, our decision rule is

Reject H0 if $t > t_{n-1,\alpha}$, where $t = \bar{d}\sqrt{n}/s_D$

Accept H0 otherwise.
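A sketch of the paired t-test, working on the differences Di = Xi - Yi, follows. Because Python's standard library has no t-distribution, `t_crit` is assumed to be read from a t-table with n-1 degrees of freedom; the function name and `tail` labels are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_test(xs, ys, t_crit, tail):
    """Paired t test of H0: mu_D = mu1 - mu2 = 0.

    xs[i], ys[i] are the two measurements on the i-th pair.
    t_crit -- t-table value with n-1 degrees of freedom, supplied by the user:
              t_{n-1, alpha/2} for tail="two", t_{n-1, alpha} otherwise.
    Returns (t, degrees_of_freedom, reject_h0).
    """
    if len(xs) != len(ys):
        raise ValueError("paired samples must have the same length")
    d = [x - y for x, y in zip(xs, ys)]      # D_i = X_i - Y_i
    n = len(d)
    t = mean(d) * math.sqrt(n) / stdev(d)    # dbar * sqrt(n) / s_D
    if tail == "two":
        reject = abs(t) >= t_crit
    elif tail == "left":
        reject = t < -t_crit
    elif tail == "right":
        reject = t > t_crit
    else:
        raise ValueError("tail must be 'two', 'left', or 'right'")
    return t, n - 1, reject
```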

Example. Suppose we are comparing two models of cars to see how fast they accelerate. In this case, to avoid any
variation due to individual drivers, we take n drivers and have each driver drive one car of each model. So, (xi, yi) are the
accelerations of the first and second models driven by driver i. Thus, we will have n pairs of observations.
Remark. The same reasoning as in the paired t-test gives the following (1-α)100 percent confidence interval for
μD = μ1 - μ2:

$\bar{d} - t_{n-1,\alpha/2}\, s_D/\sqrt{n} < \mu_1 - \mu_2 < \bar{d} + t_{n-1,\alpha/2}\, s_D/\sqrt{n}$
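The confidence interval in the Remark can be computed from paired data as sketched below. The table value $t_{n-1,\alpha/2}$ is assumed to be supplied as `t_crit`, since the standard library has no t-distribution; the function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_confidence_interval(xs, ys, t_crit):
    """(1 - alpha)100 percent confidence interval for mu1 - mu2 from paired data.

    t_crit -- the t-table value t_{n-1, alpha/2}, supplied by the user.
    Returns (lower, upper).
    """
    if len(xs) != len(ys):
        raise ValueError("paired samples must have the same length")
    d = [x - y for x, y in zip(xs, ys)]
    n = len(d)
    dbar = mean(d)
    half_width = t_crit * stdev(d) / math.sqrt(n)   # t_{n-1,alpha/2} * s_D / sqrt(n)
    return dbar - half_width, dbar + half_width
```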

9.8 Comparing Proportions p1, p2 of Two Populations

Once again, we have two populations. Let p1 be the proportion of Population 1 that has a certain attribute A, and let
p2 be the proportion of Population 2 that has attribute A. We want to compare p1 and p2 by testing whether these two
proportions are equal. So, our Null hypothesis is
H0 : p1-p2 = 0.

We take a sample of size m from Population 1 and let X be the number of sample members that have this attribute A,
and let $\bar{X} = X/m$ be the sample proportion. Similarly, we take a sample (or interview) of size n from Population 2 and
let Y be the number of sample members that have this attribute A, and let $\bar{Y} = Y/n$ be the sample proportion. (So,
$\bar{X}$, $\bar{Y}$ are the proportions of "success" in the two samples.)

Write

$P = (X+Y)/(m+n)$

for the proportion of successes in the two samples combined. If the Null hypothesis

H0 : p1 = p2

is true, then P is the natural (pooled) estimate of the common value p1 = p2.

The test statistic we use here is

$Z = (\bar{X} - \bar{Y})/s_D$

where
$s_D = \sqrt{P(1-P)(1/m + 1/n)}$

If H0 : p1 - p2 = 0 is true, then, for large m and n, Z has approximately an N(0,1) distribution. Now our test hypotheses
and the decision rules are as follows.

1. Two-tail test: Suppose we are testing

H0 : p1 - p2 = 0
HA : p1 - p2 ≠ 0

At the level of significance α, our decision rule is

Reject H0 if $z \notin (-z_{\alpha/2}, z_{\alpha/2})$, where $z = (\bar{x} - \bar{y})/s_D$

Accept H0 otherwise.

2. Left-tail test: Suppose we are testing

H0 : p1 - p2 = 0
HA : p1 - p2 < 0

At the level of significance α, our decision rule is as follows:

Reject H0 if $z < -z_{\alpha}$, where $z = (\bar{x} - \bar{y})/s_D$

Accept H0 otherwise.

3. Right-tail test: Suppose we are testing


H0 : p1 - p2 = 0
HA : p1 - p2 > 0

At the level of significance α, our decision rule is

Reject H0 if $z > z_{\alpha}$, where $z = (\bar{x} - \bar{y})/s_D$

Accept H0 otherwise.
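The two-proportion test above, with the pooled estimate P, can be sketched as follows. This is a minimal illustration under the normal approximation (large m and n), not the calculator's 2-PropZTest; the function name and `tail` labels are hypothetical, and critical values come from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def two_prop_ztest(x, m, y, n, alpha, tail):
    """Z test of H0: p1 - p2 = 0 using the pooled proportion P.

    x, m -- successes and sample size from Population 1
    y, n -- successes and sample size from Population 2
    tail -- "two", "left", or "right" (hypothetical labels)
    Returns (z, reject_h0).
    """
    xbar, ybar = x / m, y / n                  # sample proportions
    pooled = (x + y) / (m + n)                 # P, the pooled estimate
    s_d = math.sqrt(pooled * (1 - pooled) * (1 / m + 1 / n))
    z = (xbar - ybar) / s_d
    std_normal = NormalDist()
    if tail == "two":
        reject = abs(z) >= std_normal.inv_cdf(1 - alpha / 2)
    elif tail == "left":
        reject = z < -std_normal.inv_cdf(1 - alpha)
    elif tail == "right":
        reject = z > std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("tail must be 'two', 'left', or 'right'")
    return z, reject
```

For example, the data of Exercise 9.8.1 correspond to `two_prop_ztest(55, 117, 37, 79, 0.01, "right")`.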

Use of Calculators and P-values:

1. In the TI-83 menu the above test is called the 2-PropZTest, which comes under TESTS.
2. The calculator will give us z-values and P-values. Also, in our notation, p1-cap = $\bar{x}$, p2-cap = $\bar{y}$, and p-cap = P.
3. We can use the z-values with the above decision rules to test hypotheses.
4. Alternatively, at the level of significance α, if the P-value is p, then
Reject H0 if p < α
Accept H0 otherwise.

Problems on 9.8: Comparing Proportions p1, p2 of Two Populations

Exercise 9.8.1. Suppose two independent samples were collected from two populations. We want to compare the
proportions p1, p2, respectively, of an attribute A present in these two populations. We are given that x = 55 had the
attribute A in a sample of size m = 117 from the first population, and y = 37 had the attribute A in a sample of size n = 79
from the second population.
At the 1 percent level of significance, would you conclude that p1 > p2?
Solution
Exercise 9.8.2. To compare the proportions p1, p2 of defective items produced by new and old machines, respectively,
samples were collected. In a sample of 57 items from the new machine, 6 were found to be defective; and in a sample of
41 items from the old machine, 9 were defective.
At the 5 percent level of significance, would you conclude that p1 < p2?
Solution
Exercise 9.8.3. Data was collected to compare the proportions p1, p2 of men and women, respectively, who watch
football. In a sample of 199 men, 83 said that they watch football; and in a sample of 161 women, 51 said they watch
football. (These are not real data.)

1. At the 5 percent level of significance, would you conclude that the proportion of men who watch football is higher than
the proportion of women who watch football?
2. At the 1 percent level of significance, would you conclude that the proportion of men who watch football is higher than
the proportion of women who watch football?
