
Statistics: Introduction chp1

Definitions
Statistics
Collection of methods for planning experiments, obtaining data, and then
organizing, summarizing, presenting, analyzing, interpreting, and drawing
conclusions.
Variable
Characteristic or attribute that can assume different values
Random Variable
A variable whose values are determined by chance.
Population
All subjects possessing a common characteristic that is being studied.
Sample
A subgroup or subset of the population.
Parameter
Characteristic or measure obtained from a population.
Statistic (not to be confused with Statistics)
Characteristic or measure obtained from a sample.
Descriptive Statistics
Collection, organization, summarization, and presentation of data.
Inferential Statistics
Generalizing from samples to populations using probabilities. Performing
hypothesis testing, determining relationships between variables, and making
predictions.
Qualitative Variables
Variables which assume non-numerical values.
Quantitative Variables
Variables which assume numerical values.
Discrete Variables
Variables which assume a finite or countable number of possible values.
Usually obtained by counting.
Continuous Variables
Variables which can assume any value within an interval, so there are infinitely
(uncountably) many possible values. Usually obtained by measurement.
Nominal Level
Level of measurement which classifies data into mutually exclusive, all
inclusive categories in which no order or ranking can be imposed on the data.
Ordinal Level
Level of measurement which classifies data into categories that can be ranked.
Precise differences between the ranks are not meaningful.
Interval Level
Level of measurement which classifies data that can be ranked and differences
are meaningful. However, there is no meaningful zero, so ratios are
meaningless.
Ratio Level
Level of measurement which classifies data that can be ranked, differences are
meaningful, and there is a true zero. True ratios exist between the different
units of measure.
Random Sampling
Sampling in which the data is collected using chance methods or random
numbers.
Systematic Sampling
Sampling in which data is obtained by selecting every kth object.
Convenience Sampling
Sampling in which data that is readily available is used.
Stratified Sampling
Sampling in which the population is divided into groups (called strata)
according to some characteristic. Each of these strata is then sampled using one
of the other sampling techniques.
Cluster Sampling
Sampling in which the population is divided into groups (usually
geographically). Some of these groups are randomly selected, and then all of
the elements in those groups are selected.


Population vs Sample
The population includes all objects of interest whereas the sample is only a portion of
the population. Parameters are associated with populations and statistics with samples.
Parameters are usually denoted using Greek letters (mu, sigma) while statistics are
usually denoted using Roman letters (x bar, s).

There are several reasons why we don't work with populations. They are usually large,
and it is often impossible to get data for every object we're studying. Sampling does
not usually occur without cost, and the more items surveyed, the larger the cost.

We compute statistics, and use them to estimate parameters. The computation is the
first part of the statistics course (Descriptive Statistics) and the estimation is the
second part (Inferential Statistics).

Discrete vs Continuous
Discrete variables are usually obtained by counting. There are a finite or countable
number of choices available with discrete data. You can't have 2.63 people in the
room.

Continuous variables are usually obtained by measuring. Length, weight, and time are
all examples of continuous variables. Since continuous variables are real numbers, we
usually round them. This implies a boundary depending on the number of decimal
places. For example: 64 is really anything 63.5 <= x < 64.5. Likewise, if there are two
decimal places, then 64.03 is really anything 64.025 <= x < 64.035. Boundaries
always have one more decimal place than the data and end in a 5.
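As a sketch of the boundary rule, the hypothetical Python helper `class_boundaries` below reads the number of decimal places from the recorded value (passed as a string so that trailing zeros are kept) and subtracts and adds half a unit in that last place. It uses `decimal.Decimal` so that results such as 64.025 are exact rather than floating-point approximations.

```python
from decimal import Decimal


def class_boundaries(recorded: str) -> tuple[Decimal, Decimal]:
    """Return (lower, upper) boundaries for a rounded measurement.

    The value is given as a string so the number of recorded decimal
    places is preserved ("64" and "64.00" are different recordings).
    """
    value = Decimal(recorded)
    exponent = value.as_tuple().exponent      # 0 for "64", -2 for "64.03"
    # Half of one unit in the last recorded place: 0.5, 0.05, 0.005, ...
    half_unit = Decimal(5).scaleb(exponent - 1)
    return value - half_unit, value + half_unit


for reading in ["64", "64.03"]:
    low, high = class_boundaries(reading)
    print(f"{reading}: {low} <= x < {high}")
```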

Levels of Measurement
There are four levels of measurement: Nominal, Ordinal, Interval, and Ratio. These go
from lowest level to highest level. Data is classified according to the highest level
which it fits. Each additional level adds something the previous level didn't have.

- Nominal is the lowest level. Only names are meaningful here.
- Ordinal adds an order to the names.
- Interval adds meaningful differences.
- Ratio adds a true zero so that ratios are meaningful.

Types of Sampling
There are five types of sampling: Random, Systematic, Convenience, Cluster, and
Stratified.
- Random sampling is analogous to putting everyone's name into a hat and
drawing out several names. Each element in the population has an equal
chance of being selected. While this is the preferred way of sampling, it is often
difficult to do. It requires that a complete list of every element in the
population be obtained. Computer-generated lists are often used with random
sampling. You can generate random numbers using the TI-82 calculator.
- Systematic sampling is easier to do than random sampling. In systematic
sampling, the list of elements is "counted off". That is, every kth element is
taken. This is similar to lining everyone up and numbering off "1,2,3,4; 1,2,3,4;
etc". When done numbering, all people numbered 4 would be used.
- Convenience sampling is very easy to do, but it's probably the worst technique
to use. In convenience sampling, readily available data is used, for example the
first people the surveyor runs into.
- Cluster sampling is accomplished by dividing the population into groups,
usually geographically. These groups are called clusters or blocks. The clusters
are randomly selected, and each element in the selected clusters is used.
- Stratified sampling also divides the population into groups called strata.
However, this time it is by some characteristic, not geographically. For
instance, the population might be separated into males and females. A sample
is taken from each of these strata using either random, systematic, or
convenience sampling.
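The five techniques can be sketched in Python with the standard `random` module. The function names, the 24-member population, and the cluster and stratum contents below are hypothetical illustrations, not part of any calculator procedure; the stratified version assumes simple random sampling within each stratum, which is only one of the options listed above.

```python
import random


def simple_random_sample(population, n, rng):
    # "Names in a hat": every element has the same chance of being drawn.
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    # Random starting position among the first k elements, then every kth one.
    start = rng.randrange(k)
    return population[start::k]


def convenience_sample(population, n):
    # Whatever is easiest to reach; here simply the first n elements.
    return population[:n]


def cluster_sample(clusters, m, rng):
    # Randomly pick m whole clusters and keep every element in them.
    chosen = rng.sample(sorted(clusters), m)
    return [element for name in chosen for element in clusters[name]]


def stratified_sample(strata, sizes, rng):
    # Simple random sample of the requested size within each stratum.
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}


rng = random.Random(2024)            # seeded so a run can be repeated
people = list(range(1, 25))          # 24 hypothetical population members
print(simple_random_sample(people, 6, rng))
print(systematic_sample(people, 4, rng))
print(convenience_sample(people, 6))
blocks = {"north": [1, 2, 3], "south": [4, 5, 6, 7], "east": [8, 9], "west": [10, 11, 12]}
print(cluster_sample(blocks, 2, rng))
print(stratified_sample({"male": people[:12], "female": people[12:]},
                        {"male": 3, "female": 3}, rng))
```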


TI-82: Generating Random Numbers


You can generate random numbers on the TI-82 calculator using the following
sequence, where N is the number of different values that could occur and S is
the smallest of those values.
int (N*rand+S)

INT is found under the MATH menu (math num 4). RAND is also found under
the MATH menu (math prb 1).

Simulate the rolling of a die (1-6): int (6*rand+1)


Simulate the flipping of a coin (0-1): int (2*rand)

This works because the rand function returns a random number between 0
and 1 (including 0 but not including 1). When it is multiplied by N, it falls
between 0 and N, and then S is added, so it falls between S and S+N (never
reaching S+N). Taking the integer part then leaves one of the N whole numbers
S, S+1, ..., S+N-1.

If you need random whole numbers from A to B (inclusive), then you can
generate them using the following formulas.
N=B-A+1
int (N*rand+A)

Notice it is B-A+1 not B-A. Everyone agrees there are 10 numbers between 1
and 10 (inclusive). But, if you take 10-1, you get 9, not 10. Also, in the formula
above, replace the N by the actual number of different values.
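For comparison, the same formulas can be written in Python. This is a sketch assuming the TI-82's `int(` behaves as a greatest-integer function, which is why it uses `math.floor` rather than Python's `int()` (the two differ only for negative numbers). `ti_int_rand` and `random_between` are hypothetical names.

```python
import math
import random


def ti_int_rand(n, s, rng=random):
    # Mirrors int(N*rand+S): rng.random() is in [0, 1), so n*rand + s is in
    # [s, s + n) and the greatest-integer step gives one of s, ..., s + n - 1.
    return math.floor(n * rng.random() + s)


def random_between(a, b, rng=random):
    # Whole numbers from a to b inclusive: there are N = B - A + 1 of them.
    return ti_int_rand(b - a + 1, a, rng)


print([ti_int_rand(6, 1) for _ in range(10)])  # die rolls, like int(6*rand+1)
print([ti_int_rand(2, 0) for _ in range(10)])  # coin flips, like int(2*rand)
print(random_between(1, 10))                   # N = 10 - 1 + 1 = 10 values
```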

Since the calculator remembers the last formula put in, and evaluates it when
you hit enter, to generate more random numbers, just hit enter again. Each
time you hit enter, you will get another random number.

Sampling Lab

The purpose of this laboratory exercise is to familiarize yourself with the different
sampling techniques.

You need one page from a movie listing (like those contained in TV Guide). Note, if you
actually use TV Guide®, then you need to use two facing pages. Pick a page with
little extraneous material, other than the listings, on it.

For the purposes of this sampling project, a movie is included on the page or in a
cluster if the running time for the movie falls on the page.

Random Sampling

Number each movie on the page. If there are a lot of movies, you may wish to number
every other or every third movie.

Generate a random sample of 8 numbers between 1 and the number of movies on the
page. Write down the # generated and the running time for the movie corresponding
to that number.

Systematic Sampling

Generate a random number between 1 and 6. Beginning with the movie corresponding
to that number, and then taking every 6th movie thereafter, write the # of the movie
and the running length of the movie.

Convenience Sampling

Write down the running time of the first eight movies.

Stratified Sampling

On a separate piece of paper, write down the running times of all PG/PG13, R, and
not-rated (either NR or no rating given) movies in three columns; ignore all other
types (NC17, G, etc.). Split a sample of 8 among the three types in proportion to the
number of movies of each type (if R is 40%, then sample 40% of 8 = 3.2 -> 3 R movies).
Use random sampling within each movie type. Record the running lengths of the
movies selected.
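The proportional split can be sketched as below. The counts are hypothetical (chosen so that R is 40% of the page), and the helper `proportional_allocation` simply rounds each share, as in the 3.2 -> 3 example. Rounding can leave the total one above or below 8, and Python's `round()` sends exact halves to the even neighbor, so check the total before drawing the sample.

```python
def proportional_allocation(counts, n):
    # Each stratum gets n * (its share of the total), rounded to a whole number.
    total = sum(counts.values())
    return {name: round(n * count / total) for name, count in counts.items()}


# Hypothetical page: 10 PG/PG13, 8 R (40%), and 2 not-rated movies.
shares = proportional_allocation({"PG/PG13": 10, "R": 8, "NR": 2}, 8)
print(shares, "total:", sum(shares.values()))
```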

Cluster Sampling

Divide the page into equal regions so that each region (cluster) has roughly 3 to 4
movies. Randomly select 3 clusters, and record the running length of all movies
in those clusters.

Statistics: Frequency Distributions & Graphs chp 2

Definitions
Raw Data
Data collected in original form.
Frequency
The number of times a certain value or class of values occurs.
Frequency Distribution
The organization of raw data in table form with classes and frequencies.
Categorical Frequency Distribution
A frequency distribution in which the data is only nominal or ordinal.
Ungrouped Frequency Distribution
A frequency distribution of numerical data. The raw data is not grouped.
Grouped Frequency Distribution
A frequency distribution where several numbers are grouped into one class.
Class Limits
Separate one class in a grouped frequency distribution from another. The
limits could actually appear in the data and have gaps between the upper limit
of one class and the lower limit of the next.
Class Boundaries
Separate one class in a grouped frequency distribution from another. The
boundaries have one more decimal place than the raw data and therefore do
not appear in the data. There is no gap between the upper boundary of one
class and the lower boundary of the next class. The lower class boundary is
found by subtracting 0.5 units from the lower class limit and the upper class
boundary is found by adding 0.5 units to the upper class limit.
Class Width
The difference between the upper and lower boundaries of any class. The
class width is also the difference between the lower limits of two consecutive
classes or the upper limits of two consecutive classes. It is not the difference
between the upper and lower limits of the same class.
Class Mark (Midpoint)
The number in the middle of the class. It is found by adding the upper and
lower limits and dividing by two. It can also be found by adding the upper and
lower boundaries and dividing by two.
Cumulative Frequency
The number of values less than the upper class boundary for the current class.
This is a running total of the frequencies.
Relative Frequency
The frequency divided by the total frequency. This gives the percent of values
falling in that class.
Cumulative Relative Frequency (Relative Cumulative Frequency)
The running total of the relative frequencies or the cumulative frequency
divided by the total frequency. Gives the percent of the values which are less
than the upper class boundary.
Histogram
A graph which displays the data by using vertical bars of various heights to
represent frequencies. The horizontal axis can be either the class boundaries,
the class marks, or the class limits.
Frequency Polygon
A line graph. The frequency is placed along the vertical axis and the class
midpoints are placed along the horizontal axis. These points are connected
with lines.
Ogive
A frequency polygon of the cumulative frequency or the relative cumulative
frequency. The vertical axis is the cumulative frequency or relative cumulative
frequency. The horizontal axis is the class boundaries. The graph always starts
at zero at the lowest class boundary and will end up at the total frequency (for
a cumulative frequency) or 1.00 (for a relative cumulative frequency).
Pareto Chart
A bar graph for qualitative data with the bars arranged according to
frequency.
Pie Chart
Graphical depiction of data as slices of a pie. The frequency determines the
size of the slice. The number of degrees in any slice is the relative frequency
times 360 degrees.
Pictograph
A graph that uses pictures to represent data.
Stem and Leaf Plot
A data plot which uses part of the data value as the stem and the rest of the
data value (the leaf) to form groups or classes. This is very useful for sorting
data quickly.


Statistics: Grouped Frequency Distributions

Guidelines for classes

1. There should be between 5 and 20 classes.
2. The class width should be an odd number. For whole-number data, this will
guarantee that the class midpoints are integers instead of decimals.
3. The classes must be mutually exclusive. This means that no data value can fall
into two different classes.
4. The classes must be all inclusive or exhaustive. This means that all data values
must be included.
5. The classes must be continuous. There are no gaps in a frequency distribution.
Classes that have no values in them must be included (unless it's the first or
last class, which is dropped).
6. The classes must be equal in width. The exception here is the first or last class.
It is possible to have a "below ..." or "... and above" class. This is often used
with ages.

Creating a Grouped Frequency Distribution

1. Find the largest and smallest values.
2. Compute the Range = Maximum - Minimum.
3. Select the number of classes desired. This is usually between 5 and 20.
4. Find the class width by dividing the range by the number of classes and
rounding up. There are two things to be careful of here. You must round up,
not off. Normally 3.2 would round to be 3, but in rounding up, it becomes 4. If
the range divided by the number of classes gives an integer value (no
remainder), then you can either add one to the number of classes or add one
to the class width. Sometimes you're locked into a certain number of classes
because of the instructions. The Bluman text fails to mention the case when
there is no remainder.
5. Pick a suitable starting point less than or equal to the minimum value. You will
be able to cover: "the class width times the number of classes" values. You
need to cover one more value than the range. Follow this rule and you'll be
okay: The starting point plus the number of classes times the class width must
be greater than the maximum value. Your starting point is the lower limit of
the first class. Continue to add the class width to this lower limit to get the
rest of the lower limits.
6. To find the upper limit of the first class, subtract one from the lower limit of
the second class. Then continue to add the class width to this upper limit to
find the rest of the upper limits.
7. Find the boundaries by subtracting 0.5 units from the lower limits and adding
0.5 units to the upper limits. The boundaries are also half-way between the
upper limit of one class and the lower limit of the next class. Depending on
what you're trying to accomplish, it may not be necessary to find the
boundaries.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies. Depending on what you're trying to
accomplish, it may not be necessary to find the cumulative frequencies.
11. If necessary, find the relative frequencies and/or relative cumulative
frequencies.
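Steps 1 through 11 can be sketched in Python for whole-number data. This is an illustration under assumptions: the function name `grouped_frequency` and the sample data are hypothetical, the starting point defaults to the minimum, and when the range divides evenly the sketch adds one to the class width (one of the two options in step 4).

```python
import math


def grouped_frequency(data, classes, start=None):
    """Grouped frequency distribution for whole-number data (a sketch)."""
    low, high = min(data), max(data)
    data_range = high - low
    width = math.ceil(data_range / classes)      # round UP, not off
    if data_range % classes == 0:
        width += 1                               # no remainder: widen the classes
    if start is None:
        start = low
    if start > low or start + classes * width <= high:
        raise ValueError("these classes do not cover every data value")

    n = len(data)
    rows, cumulative = [], 0
    for i in range(classes):
        lower = start + i * width
        upper = lower + width - 1                # next lower limit minus one
        freq = sum(lower <= x <= upper for x in data)
        cumulative += freq
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,
            "freq": freq,
            "cum_freq": cumulative,
            "rel_freq": freq / n,
            "cum_rel_freq": cumulative / n,
        })
    return width, rows


sample = [12, 15, 18, 22, 25, 27, 31, 33, 38, 40, 41, 45, 47]   # hypothetical data
width, table = grouped_frequency(sample, 5)
print("class width:", width)
for row in table:
    print(row["limits"], row["boundaries"], row["midpoint"],
          row["freq"], row["cum_freq"], round(row["rel_freq"], 3))
```

With this data the range is 47 - 12 = 35, which divides evenly by 5 classes, so the width becomes 8. Because 8 is even, the midpoints come out as decimals (15.5, 23.5, ...), which is what guideline 2 warns about.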

It is possible to have the TI-82 calculator find the frequencies for you. You will have
to find the class width and class boundaries first.

TI-82: Lists and Statistics


There are two features of the TI-82 calculator that will be used: Lists and
Statistics. The STAT key is located at the top center of the calculator and the
LIST key is obtained by 2nd STAT.

There are six lists that you can work with at any time on the calculator. Each
set of data requires a list. If you include frequencies for the frequency
distribution, then it will require a list for the data and a separate list for the
frequencies. The lists are labeled L1, L2, L3, L4, L5, and L6 and are accessed
on the calculator by pressing "2nd 1", "2nd 2", etc.

STAT Key
STAT has two major categories, EDIT and CALC

STAT-EDIT

1. Edit - Use this to enter data into a list.
2. SortA( - This will sort a list in ascending order. This is useful if you want
to find the frequencies after you have already established the limits or
boundaries. Since the data is sorted in order, you just have to go
through and count the number in each class. You don't need to do the
tally. This will replace the list you tell it to sort.
3. SortD( - This will sort a list in descending order. This will replace the list
you tell it to sort.
4. ClrList - This will erase the contents of the lists you specify.

STAT-CALC

1. 1-Var Stats - This is used when there is only one variable. It will handle
both raw data and frequency distributions.
2. 2-Var Stats - This is used when there are two variables, x and y.
This won't happen until the end of the semester.
3. Setup - You will need to check the setup before you find any other
statistical values from this menu. It allows you to specify which list(s)
you put the data into and if necessary, which list contains the
frequencies.
4. Med-Med - A regression model that isn't used in this course.
5. LinReg(ax+b) - A regression model that will be used later in the course
after we talk about two variable statistics.
6. QuadReg - A regression model that isn't used in this course.
7. CubicReg - A regression model that isn't used in this course.
8. QuartReg - A regression model that isn't used in this course.
9. LinReg(a+bx) - A regression model that will be used later in the course
after we talk about two variable statistics. The book uses this model;
however, we will use #5 instead.
10. LnReg - A regression model that isn't used in this course.
11. ExpReg - A regression model that isn't used in this course.
12. PwrReg - A regression model that isn't used in this course.
LIST Key
The LIST command has two major sections, OPS (Operations) and MATH.

LIST-OPS

1. SortA( - This will sort a list in ascending order. This command is
equivalent to the SortA( command under the STAT key.
2. SortD( - This will sort a list in descending order. This command is
equivalent to the SortD( command under the STAT key.
3. dim - This function will return the dimensions of a list. The dimension of
a list is the number of elements in the list. This is also used as a command
to set the dimensions of a list.
4. Fill( - This command will fill a list with a constant. This is useful if you
need to set an entire list to all be one number.
5. seq - This function will generate a sequence of numbers according to
the function specified as the first argument. A list is returned, but you
must save it to one of the six lists if you want to use it for anything.

LIST-MATH

1. min( - Returns the minimum value in a list.
2. max( - Returns the maximum value in a list.
3. mean( - Returns the arithmetic mean of all numbers in the list. The
mean is the sum of the list divided by the dimension of the list.
4. median( - Returns the median of the list. The median is the middle
number when the list is sorted in ascending order. If the dimension is an
even number, the median is the midpoint between the two middle
values when the list is sorted in ascending order.
5. sum - Returns the sum of the values in the list.
6. prod - Returns the product of the values in the list. If the product of a list
is zero, then at least one of the numbers is zero.
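The LIST-MATH functions have close counterparts in Python's built-ins and standard library (`min`, `max`, `sum`, `len`, `statistics.median`, `math.prod`). The hypothetical helper `list_median` spells out the median rule described above; the list values are made up.

```python
import math
import statistics


def list_median(values):
    # Sort ascending, then take the middle value (odd count) or the
    # midpoint of the two middle values (even count).
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


values = [7, 3, 9, 4, 0, 6]          # hypothetical list with six elements
print(min(values), max(values))      # min( and max(
print(sum(values) / len(values))     # mean( is the sum divided by dim
print(list_median(values), statistics.median(values))  # both give 5.0
print(sum(values))                   # sum
print(math.prod(values))             # prod is 0 because one element is 0
```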

Other Keys
VARS

The VARS key can be used to retrieve the value of a statistic.

VARS Statistics
This will save a lot of retyping of values and allow you to use the full accuracy
of the calculator instead of losing digits when re-entering numbers.

Here are some common values you will be using:

Keystrokes Statistic

VARS 5 1 n, the sample size

VARS 5 1 x bar, the sample mean

VARS 5 1 Sx, the sample standard deviation

VARS 5 1 minX, the minimum value

VARS 5 1 maxX, the maximum value

There are other values under statistics which you will use. You may have to
arrow to other submenus first for some of them.

STORE

This key will save values. You may save a scalar value to a real variable (A-Z)
or a list value to a list (L1 - L6). You can use the STORE key to save a value
to the dimension of a list to set its size. You can use the STORE key to save a
list generated by the sequence command to a list.

Mathematical Operations and Functions

Lists can be used as arguments of functions. If they are, the function is
applied to each element in the list. Mathematical operations can be performed
on lists. For more information on lists, see the Introduction to the TI-82.

Entering Data
Always start with a clean set of data. You don't want to mix data from one
problem with data from another problem. Before starting any new problem,
you should clear out existing data.

STAT ClrList L1,L2,L3


Another way to clear the lists is to go into STAT EDIT, arrow to the top so that
the list name is highlighted. Then press the CLEAR key and ENTER.

You may only need to specify one list, but you can specify more than one, just
separate them with commas.

After the lists have been cleared, you can enter the new lists:

STAT Edit

Select the list that you want to use. The default will be L1. This will be fine for
most things, but do realize you can use any of the lists. Just be sure to check
the setup later.

Type in each number, pressing ENTER after each one. When you are done entering,
press the QUIT key (2nd MODE).

If you need to correct data, just go back to STAT EDIT without clearing the list
first.

TI-82: Histograms, BoxPlots


You can use the calculator to draw histograms, box-plots, and compute the
frequency of each class.

See the instructions on using the calculator to do statistics and lists. This
provides an overview as well as some helpful advice for working with statistics
on the calculator.

Histograms
1. Enter the data.
2. Determine the class width and the lower class boundary (not limit) of the
first class using the techniques for creating grouped frequency
distributions.
3. Turn off any regular plots: Hit Y= and position the cursor over any equal
sign which is in inverse video (white on black) by arrowing left and then
down if necessary. Hit enter while the cursor is on the equal sign to
toggle between displaying the function (equal sign highlighted) and not
displaying the function (equal sign not highlighted).
4. Press the STATPLOT key (2nd Y=)
5. Select a plot (usually plot 1) and hit enter
6. Turn the plot on by highlighting the ON and pressing enter.
7. Set the TYPE to histogram (last type)
8. Set the XLIST to the list you put the data into
9. Set the FREQ to 1.
10. Select WINDOW
11. Put the lower class boundary for the first class in XMIN
12. The XMAX value should be the lower class boundary for the first
class plus the number of classes times the class width.
13. The Class Width should be stored in XSCL
14. YMIN should be set to 0
15. YMAX should be at least the largest frequency in any class. This
is difficult to know if you're generating the histogram without first writing
the table by hand. If the histogram displayed doesn't fit on the screen,
go back and change this number. A good initial guess might be the
sample size divided by the number of classes. You might round it up to
a nice number (multiple of 5) or add one or two so that the graph is
completely shown on the screen.
16. YSCL should be set based on the YMAX value. A factor of YMAX
would be a good choice (so if YMAX is 30, let YSCL be 5). If your YMAX
is small (say under 10), you might want to set YSCL to 1. YSCL determines
how many marks are placed along the vertical axis.
17. Hit the GRAPH key.
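The WINDOW arithmetic in steps 11 through 16 can be sketched as a small function. The name `histogram_window` and the rule for the first YMAX guess (sample size divided by the number of classes, rounded up to a multiple of 5) are assumptions taken from the advice above; if bars run off the top of the screen, raise YMAX by hand.

```python
import math


def histogram_window(first_lower_boundary, width, classes, n):
    """Suggested WINDOW values for a histogram (a rough starting point)."""
    xmax = first_lower_boundary + classes * width
    # First YMAX guess: n / classes, rounded up to a multiple of 5.
    ymax = 5 * math.ceil(n / classes / 5)
    yscl = 1 if ymax < 10 else 5           # 5 is always a factor of ymax here
    return {"Xmin": first_lower_boundary, "Xmax": xmax, "Xscl": width,
            "Ymin": 0, "Ymax": ymax, "Yscl": yscl}


# Hypothetical example: first boundary 11.5, width 8, 5 classes, 13 values.
print(histogram_window(11.5, 8, 5, 13))
```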

Finding the Frequency


1. Generate a histogram first
2. Hit the TRACE key
3. The "min" value is the lower class boundary
4. The "max" value is the upper class boundary
5. The "n" value is the frequency for that class.
6. Use the left and right arrow keys to get the values for all the classes.

Box Plots
1. Enter the data.
2. Turn off any regular plots: Hit Y= and position the cursor over any equal
sign which is in inverse video (white on black) by arrowing left and then
down if necessary. Hit enter while the cursor is on the equal sign to
toggle between displaying the function (equal sign highlighted) and not
displaying the function (equal sign not highlighted).
3. Press the STATPLOT key (2nd Y=)
4. Select a plot (usually plot 1) and hit enter
5. Turn the plot on by highlighting the ON and pressing enter.
6. Set the TYPE to box-plot (3rd type)
7. Set the XLIST to the list you put the data into
8. Set the FREQ to 1.
9. Zoom to Statistics mode (ZOOM 9)

Press the TRACE key with the box plot displayed to find the five numbers
associated with it. You may use the left and right arrow keys to find all five
numbers. Note that the calculator uses the quartiles instead of the hinges. The
hinges and quartiles are the same unless the remainder when the sample size
is divided by four is three.

TI-82: Plotting an Ogive


The Ogive is a frequency polygon (line plot) graph of the cumulative frequency
or the relative cumulative frequency.

The horizontal axis is marked with the class boundaries and the vertical axis is
the cumulative frequency. All class boundaries are used, so there will be one more class
boundary than the number of classes.

The following example assumes the class boundaries are in List 1 and the
cumulative frequencies are in List 2. You are free to use any two lists that you
desire, but you should make the appropriate adjustments in the instructions if
you don't use List 1 and List 2.

1. Enter the class boundaries into List 1. Start with the lower boundary of
the first class and end with the upper boundary of the last class.
2. Enter the cumulative frequencies into List 2. Start with 0 for the first
value because there is nothing less than the first lower class boundary.
3. Turn off any regular plots: Hit Y= and position the cursor over any equal
sign which is in inverse video (white on black) by arrowing left and then
down if necessary. Hit enter while the cursor is on the equal sign to
toggle between displaying the function (equal sign highlighted) and not
displaying the function (equal sign not highlighted).
4. Press the STATPLOT key (2nd Y=)
5. Select a plot (usually plot 1) and hit enter
6. Turn the plot on by highlighting the ON and pressing enter.
7. Set the TYPE to LinePlot (2nd type)
8. Set the XLIST to List 1
9. Set the YLIST to List 2
10. Set the MARKER to any of the three values
11. Select WINDOW
12. Put the lower class boundary for the first class in XMIN
13. The XMAX value should be the upper class boundary of the last
class
14. The Class Width should be stored in XSCL
15. YMIN should be set to 0
16. YMAX should be set to the total frequency if using cumulative
frequencies in List 2 and set to 1.00 if using relative cumulative
frequencies in List 2.
17. YSCL should be set appropriately based on YMAX.
18. Hit the GRAPH key.

Relative Frequencies

There is no need to re-enter the data if you wish to use relative cumulative
frequencies instead of cumulative frequencies.

The following assumes that the cumulative frequencies are in List 2.

Replace the ### by the total frequency. You can't put "###" into the calculator.
L2 / ### STORE L2

This will replace the cumulative frequencies with the relative cumulative
frequencies.

To replace relative cumulative frequencies with cumulative frequencies,
change the division to multiplication.
L2 * ### STORE L2
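The lists for an ogive can also be built in Python. This sketch assumes the class boundaries and the class frequencies are already known; the hypothetical `ogive_points` prepends the 0 for the first lower boundary and, when `relative=True`, divides by the total frequency just as `L2 / ### STORE L2` does.

```python
from itertools import accumulate


def ogive_points(boundaries, frequencies, relative=False):
    """(boundary, cumulative frequency) pairs for an ogive."""
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need one more boundary than there are classes")
    cumulative = [0, *accumulate(frequencies)]   # nothing lies below the first boundary
    if relative:
        total = cumulative[-1]
        cumulative = [c / total for c in cumulative]
    return list(zip(boundaries, cumulative))


# Hypothetical boundaries and frequencies for five classes.
print(ogive_points([11.5, 19.5, 27.5, 35.5, 43.5, 51.5], [3, 3, 2, 3, 2]))
```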

PIE Program
The TI-82 doesn't support pie charts directly as it does with scatterplots, box
plots, and histograms.

Place the frequencies or relative frequencies in List 1. If List 1 is empty or
the sum of List 1 is zero, the program instructs you to put the frequencies in List 1.
Turn off any graphs that may be on before running the PIE program.
Otherwise, the graphs will overlay the pie chart and it will take longer to draw.

The program will ask the user if they wish to place the labels on the graph. If
the user enters 1 for yes, then the values in List 1 will be placed in the graph.
This is where the difference between frequencies and relative frequencies
appears.

This program will force the calculator into radian mode and turn the axes off,
zoom standard and then zoom square. It will then draw a circle and proceed
to draw the lines which define the pie graph.
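The source of the PIE program is not shown here, so the following is only a sketch of the arithmetic a pie chart needs: each slice spans the relative frequency times 360 degrees, and the slices are laid end to end. The function `pie_slices` and the frequencies are hypothetical; `math.radians` converts the angles for a program working in radian mode.

```python
import math


def pie_slices(frequencies):
    # Each slice spans (relative frequency) * 360 degrees; slices are laid end to end.
    total = sum(frequencies)
    slices, start = [], 0.0
    for f in frequencies:
        sweep = f / total * 360
        slices.append((start, start + sweep))
        start += sweep
    return slices


for begin, end in pie_slices([5, 3, 2]):        # hypothetical frequencies
    print(f"{begin:6.1f} to {end:6.1f} degrees "
          f"({math.radians(begin):.3f} to {math.radians(end):.3f} radians)")
```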

To reset the graphing screen to normal when done viewing the pie chart, you
need to:

1. DRAW CLRDRAW
2. WINDOW FORMAT AXESON
3. MODE DEGREE (depending on your use, leaving it in Radian mode
may be preferred)

Statistics: Data Description chp3

Definitions

Statistic
Characteristic or measure obtained from a sample
Parameter
Characteristic or measure obtained from a population
Mean
Sum of all the values divided by the number of values. This can either be a
population mean (denoted by mu) or a sample mean (denoted by x bar)
Median
The midpoint of the data after being ranked (sorted in ascending order). There
are as many numbers below the median as above the median.
Mode
The most frequent number
Skewed Distribution
The majority of the values lie together on one side with a very few values (the
tail) to the other side. In a positively skewed distribution, the tail is to the right
and the mean is larger than the median. In a negatively skewed distribution,
the tail is to the left and the mean is smaller than the median.
Symmetric Distribution
The data values are evenly distributed on both sides of the mean. In a
symmetric distribution, the mean equals the median.
Weighted Mean
The mean when each value is multiplied by its weight and summed. This sum
is divided by the total of the weights.
Midrange
The mean of the highest and lowest values. (Max + Min) / 2
Range
The difference between the highest and lowest values. Max - Min
Population Variance
The average of the squares of the distances from the population mean. It is
the sum of the squares of the deviations from the mean divided by the
population size. The units on the variance are the units of the population
squared.
Sample Variance
Unbiased estimator of a population variance. Instead of dividing by the
population size, the sum of the squares of the deviations from the sample
mean is divided by one less than the sample size. The units on the variance are
the units of the sample data squared.
Standard Deviation
The square root of the variance. The population standard deviation is the
square root of the population variance and the sample standard deviation is
the square root of the sample variance. The sample standard deviation is not
the unbiased estimator for the population standard deviation. The units on
the standard deviation are the same as the units of the population/sample.
Coefficient of Variation
Standard deviation divided by the mean, expressed as a percentage. We won't
work with the Coefficient of Variation in this course.
Chebyshev's Theorem
The proportion of the values that fall within k standard deviations of the mean
is at least $1 - \frac{1}{k^2}$, where k > 1. Chebyshev's theorem can be applied to any
distribution regardless of its shape.
Empirical or Normal Rule
Only valid when a distribution is bell-shaped (normal). Approximately 68% lies
within 1 standard deviation of the mean; 95% within 2 standard deviations;
and 99.7% within 3 standard deviations of the mean.
Standard Score or Z-Score
The value obtained by subtracting the mean and dividing by the standard
deviation. When all values are transformed to their standard scores, the new
mean (for Z) will be zero and the standard deviation will be one.
Percentile
A value below which a given percent of the data lies. The data must be
ranked to find percentiles.
Quartile
The 25th, 50th, or 75th percentile. The 50th percentile is also called the
median.
Decile
The 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, or 90th percentile.
Lower Hinge
The median of the lower half of the numbers (up to and including the median).
The lower hinge is the first Quartile unless the remainder when dividing the
sample size by four is 3.
Upper Hinge
The median of the upper half of the numbers (including the median). The
upper hinge is the 3rd Quartile unless the remainder when dividing the sample
size by four is 3.
Box and Whiskers Plot (Box Plot)
A graphical representation of the minimum value, lower hinge, median, upper
hinge, and maximum. Some textbooks, and the TI-82 calculator, define the
five values as the minimum, first Quartile, median, third Quartile, and
maximum.
Five Number Summary
Minimum value, lower hinge, median, upper hinge, and maximum.
InterQuartile Range (IQR)
The difference between the 3rd and 1st Quartiles.
Outlier
An extremely high or low value when compared to the rest of the values.
Mild Outliers
Values which lie between 1.5 and 3.0 times the InterQuartile Range below the
1st Quartile or above the 3rd Quartile. Note, some texts use hinges instead of
Quartiles.
Extreme Outliers
Values which lie more than 3.0 times the InterQuartile Range below the 1st
Quartile or above the 3rd Quartile. Note, some texts use hinges instead of
Quartiles.

Stats: Measures of Central Tendency

The term "Average" is vague


Average could mean one of four things: the arithmetic mean, the median, the midrange,
or the mode. For this reason, it is better to specify which average you're talking about.

Mean
This is what people usually intend when they say "average".

Population Mean: $\mu = \frac{\sum x}{N}$

Sample Mean: $\bar{x} = \frac{\sum x}{n}$

Frequency Distribution: $\bar{x} = \frac{\sum f \cdot x}{\sum f}$
The mean of a frequency distribution is also the weighted mean.
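A minimal Python sketch of the frequency-distribution mean as a weighted mean, sum(f * x) / sum(f). The function names and the small distribution below are hypothetical, chosen only to show that the weighted form agrees with the mean of the expanded raw data.

```python
def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(f * x) / sum(f)."""
    total = sum(freqs)
    if total == 0:
        raise ValueError("frequencies must not all be zero")
    return sum(f * x for x, f in zip(values, freqs)) / total


def raw_mean(data):
    """Mean of raw data: sum(x) / n (or N for a population)."""
    return sum(data) / len(data)


# Hypothetical distribution: 1 occurs twice, 2 three times, 3 five times.
values = [1, 2, 3]
freqs = [2, 3, 5]
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
print(frequency_mean(values, freqs), raw_mean(expanded))  # 2.3 2.3
```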

Median
The data must be ranked (sorted in ascending order) first. The median is the number in
the middle.

To find the depth of the median, there are several formulas that could be used; the one
that we will use is:
Depth of median = 0.5 * (n + 1)
Raw Data

The median is the number in the "depth of the median" position. If the sample size is
even, the depth of the median will be a decimal -- you need to find the midpoint
between the numbers on either side of the depth of the median.
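The raw-data procedure can be sketched in Python as follows (the function name is illustrative). It ranks the data, computes depth = 0.5 * (n + 1), and takes the midpoint of the two neighbouring values when the depth ends in .5.

```python
def median_by_depth(data):
    """Median of raw data using depth of median = 0.5 * (n + 1)."""
    ranked = sorted(data)               # the data must be ranked first
    n = len(ranked)
    if n == 0:
        raise ValueError("median of empty data")
    depth = 0.5 * (n + 1)               # 1-based position in the ranked data
    if depth.is_integer():
        return ranked[int(depth) - 1]   # odd n: a single middle value
    below = int(depth)                  # even n: depth ends in .5
    return (ranked[below - 1] + ranked[below]) / 2


print(median_by_depth([4, 5, 3, 6, 5]))   # depth 3   -> 5
print(median_by_depth([4, 1, 3, 2]))      # depth 2.5 -> 2.5
```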

Ungrouped Frequency Distribution

Find the cumulative frequencies for the data. The first value with a cumulative
frequency greater than or equal to the depth of the median is the median. If the depth
of the median is exactly 0.5 more than the cumulative frequency of the previous class,
then the median is the midpoint between the values of the two classes.
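A sketch of the cumulative-frequency rule, assuming the values are listed in ascending order and every frequency is positive (the function name is hypothetical). To avoid comparing decimals, it tests 2 * cumulative >= n + 1 instead of cumulative >= depth, and n == 2 * previous instead of depth = previous + 0.5.

```python
def median_from_frequencies(values, freqs):
    """Median of an ungrouped frequency distribution.

    Assumes `values` are ascending and every frequency is positive.
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    previous = 0                              # cumulative frequency of earlier classes
    for i, (x, f) in enumerate(zip(values, freqs)):
        cumulative = previous + f
        if 2 * cumulative >= n + 1:           # cumulative frequency >= depth
            if 2 * previous == n:             # depth = previous + 0.5
                return (values[i - 1] + x) / 2
            return x
        previous = cumulative
    raise AssertionError("unreachable for positive frequencies")


print(median_from_frequencies([1, 2, 3, 4], [2, 3, 1, 4]))  # depth 5.5 -> 2.5
print(median_from_frequencies([1, 2, 3], [2, 3, 2]))        # depth 4   -> 2
```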

Grouped Frequency Distribution

This is the tough one.

Since the data is grouped, you have lost all original information. Some textbooks have
you simply take the midpoint of the class. This is an over-simplification which isn't
the true value (but much easier to do). The correct process is to interpolate.

Find what proportion of the way into the median class the median lies by dividing
the sample size by 2, subtracting the cumulative frequency of the previous class, and
then dividing all that by the frequency of the median class.

Multiply this proportion by the class width and add it to the lower boundary of the
median class.
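The interpolation can be sketched in Python, assuming all classes have the same width (the function name and the class boundaries and frequencies in the example are made up). It computes lower boundary + ((n/2 - previous cumulative frequency) / median class frequency) * width.

```python
def grouped_median(lower_boundaries, width, freqs):
    """Interpolated median of a grouped frequency distribution.

    lower_boundaries: lower class boundary of each class, ascending.
    width: common class width (assumed equal for all classes).
    freqs: frequency of each class.
    """
    n = sum(freqs)
    half = n / 2
    cumulative = 0                                 # cumulative frequency of previous classes
    for lower, f in zip(lower_boundaries, freqs):
        if f > 0 and cumulative + f >= half:       # this is the median class
            proportion = (half - cumulative) / f   # how far into the class the median lies
            return lower + proportion * width
        cumulative += f
    raise ValueError("empty distribution")


# Hypothetical classes 0.5-10.5, 10.5-20.5, 20.5-30.5, 30.5-40.5
print(grouped_median([0.5, 10.5, 20.5, 30.5], 10, [3, 5, 8, 4]))  # 23.0
```

Here the median class is 20.5-30.5; simply taking its midpoint would give 25.5 instead of the interpolated 23.0.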

Mode
The mode is the most frequent data value. There may be no mode if no one value
appears more than any other. There may also be two modes (bimodal), three modes
(trimodal), or more than three modes (multi-modal).

For grouped frequency distributions, the modal class is the class with the largest
frequency.
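A small sketch of finding the mode(s) with `collections.Counter`. The function name is hypothetical, and the "no mode" rule is interpreted as: several distinct values that all occur equally often. Because only counting is involved, it also works for nominal data such as color names.

```python
from collections import Counter


def modes(data):
    """Return a sorted list of the mode(s); an empty list means there is no mode."""
    counts = Counter(data)
    if not counts:
        return []
    highest = max(counts.values())
    # No mode when several distinct values all appear equally often.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)


print(modes([4, 5, 3, 6, 5]))        # [5]
print(modes([1, 2, 2, 3, 3, 4]))     # [2, 3]  (bimodal)
print(modes([1, 2, 3]))              # []      (no mode)
```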

Midrange
The midrange is simply the midpoint between the highest and lowest values.

Summary
The Mean is used in computing other statistics (such as the variance) and does not
exist for open ended grouped frequency distributions (1). It is often not appropriate for
skewed distributions such as salary information.

The Median is the center number and is good for skewed distributions because it is
resistant to change.

The Mode is used to describe the most typical case. The mode can be used with
nominal data whereas the others can't. The mode may or may not exist and there may
be more than one value for the mode (2).

The Midrange is not used very often. It is a very rough estimate of the average and is
greatly affected by extreme values (even more so than the mean).

Property                     Mean     Median   Mode     Midrange
Always Exists                No (1)   Yes      No (2)   Yes
Uses all data values         Yes      No       No       No
Affected by extreme values   Yes      No       No       Yes
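To illustrate the last row of the table, the sketch below replaces one value of a small data set with an extreme value. The data are made up for illustration. The mean and midrange move a lot, and the median does not move at all.

```python
import statistics


def summary(data):
    """Mean, median, and midrange of a data set."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "midrange": (max(data) + min(data)) / 2,
    }


original = [4, 5, 3, 6, 5]
with_outlier = [4, 5, 3, 60, 5]        # replace 6 with an extreme value
print(summary(original))       # mean 4.6, median 5, midrange 4.5
print(summary(with_outlier))   # mean 15.4, median 5, midrange 31.5
```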

Using the TI-82


One can find the mean, median, and midrange using the list functions of the TI-82.
You can also find the measures of variation with the TI-82 calculator.


Stats: Measures of Variation


Range
The range is the simplest measure of variation to find. It is simply the highest value
minus the lowest value.
RANGE = MAXIMUM - MINIMUM

Since the range only uses the largest and smallest values, it is greatly affected by
extreme values; that is, it is not resistant to change.

Variance
"Average Deviation"

The range only involves the smallest and largest numbers, and it would be desirable to
have a statistic which involved all of the data values.

The first attempt one might make at this is something they might call the average
deviation from the mean and define it as:

$\frac{\sum (x - \mu)}{N}$

The problem is that this summation is always zero. So, the average deviation will
always be zero. That is why the average deviation is never used.

Population Variance

So, to keep it from being zero, the deviation from the mean is squared and called the
"squared deviation from the mean". This "average squared deviation from the mean"
is called the variance.

$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

Unbiased Estimate of the Population Variance

One would expect the sample variance to simply be the population variance with the
population mean replaced by the sample mean. However, one of the major uses of a
statistic is to estimate the corresponding parameter, and that formula has the problem
that, on average, the estimated value is smaller than the parameter. To counteract this,
the sum of the squares of the deviations is divided by one less than the sample size.

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
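A sketch comparing the two divisors, using exact fractions so that the deviations visibly sum to zero. The function names are illustrative, and the data set (from the sum of squares example in this section) is treated as a population for one formula and as a sample for the other. The standard library's `statistics.pvariance` and `statistics.variance` use the same N and n - 1 divisors and serve as a cross-check.

```python
import statistics
from fractions import Fraction


def deviations(data):
    """Deviations x - mean, computed exactly with fractions."""
    mean = Fraction(sum(data), len(data))
    return [x - mean for x in data]


def population_variance(data):
    """Sum of squared deviations divided by the population size N."""
    return sum(d * d for d in deviations(data)) / len(data)


def sample_variance(data):
    """Sum of squared deviations divided by n - 1 (requires n >= 2)."""
    return sum(d * d for d in deviations(data)) / (len(data) - 1)


data = [4, 5, 3, 6, 5]
print(sum(deviations(data)))              # 0, so an "average deviation" is always 0
print(float(population_variance(data)))   # 1.04
print(float(sample_variance(data)))       # 1.3
print(statistics.pvariance(data), statistics.variance(data))  # standard-library cross-check
```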

Standard Deviation
There is a problem with variances. Recall that the deviations were squared. That
means that the units were also squared. To get the units back the same as the original
data values, the square root must be taken.

The sample standard deviation is not an unbiased estimator of the population
standard deviation.

The calculator does not have a variance key on it. It does have a standard deviation
key. You will have to square the standard deviation to find the variance.

Sum of Squares (shortcuts)


The sum of the squares of the deviations from the mean is given a shortcut notation
and several alternative formulas.

$SS(x) = \sum (x - \bar{x})^2$

A little algebraic simplification returns:

$SS(x) = \sum x^2 - \frac{(\sum x)^2}{n}$

What's wrong with the first formula, you ask? Consider the following example; the
last row contains the totals of the columns.
1. Total the data values: 23
2. Divide by the number of values to get the mean: 23/5 = 4.6
3. Subtract the mean from each value to get the numbers in the second column.
4. Square each number in the second column to get the values in the third
column.
5. Total the numbers in the third column: 5.2
6. Divide this total by one less than the sample size to get the variance: 5.2 / 4 =
1.3

x     x - mean            (x - mean)^2
4     4 - 4.6 = -0.6      (-0.6)^2 = 0.36
5     5 - 4.6 =  0.4      ( 0.4)^2 = 0.16
3     3 - 4.6 = -1.6      (-1.6)^2 = 2.56
6     6 - 4.6 =  1.4      ( 1.4)^2 = 1.96
5     5 - 4.6 =  0.4      ( 0.4)^2 = 0.16
23    0.00 (Always)       5.2

Not too bad, you think. But this can get pretty bad if the sample mean doesn't happen
to be a "nice" number with a short decimal. Think about having a mean of 19/7 =
2.714285714285... Those subtractions get nasty, and when you square them, they're
really bad. Another problem with the first formula is that it requires you to know the
mean ahead of time. For a calculator, this would mean that you have to save all of the
numbers that were entered. The TI-82 does this, but most scientific calculators don't.

Now, let's consider the shortcut formula. The only things that you need to find are the
sum of the values and the sum of the values squared. There is no subtraction and no
decimals or fractions until the end. The last row contains the sums of the columns, just
like before.

1. Record each number in the first column and the square of each number in the
second column.
2. Total the first column: 23
3. Total the second column: 111
4. Compute the sum of squares: 111 - 23*23/5 = 111 - 105.8 = 5.2
5. Divide the sum of squares by one less than the sample size to get the variance
= 5.2 / 4 = 1.3

x     x^2
4     16
5     25
3     9
6     36
5     25
23    111
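Both sum of squares formulas can be checked in a short sketch (function names are illustrative). Exact fractions are used so that the two results compare equal. The shortcut version makes one pass and keeps only n, the sum of x, and the sum of x^2, much like a calculator that does not store the data.

```python
from fractions import Fraction


def ss_definition(data):
    """SS(x) = sum((x - mean)^2): needs the mean first, so all values must be stored."""
    mean = Fraction(sum(data), len(data))
    return sum((x - mean) ** 2 for x in data)


def ss_shortcut(data):
    """SS(x) = sum(x^2) - (sum x)^2 / n: only running totals are needed."""
    n = 0
    total = 0
    total_sq = 0
    for x in data:          # one pass, accumulating sums as values arrive
        n += 1
        total += x
        total_sq += x * x
    return total_sq - Fraction(total * total, n)


data = [4, 5, 3, 6, 5]
ss = ss_shortcut(data)
print(float(ss_definition(data)), float(ss))   # 5.2 5.2
print(float(ss / (len(data) - 1)))             # variance: 1.3
```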

Chebyshev's Theorem
The proportion of the values that fall within k standard deviations of the mean will be
at least $1 - 1/k^2$, where k is a number greater than 1.

"Within k standard deviations" means the interval $\bar{x} - ks$ to $\bar{x} + ks$.

Chebyshev's Theorem is true for any sample set, no matter what the distribution.
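A sketch that compares the Chebyshev lower bound with the observed proportion for a deliberately skewed, made-up data set. The function names are hypothetical. The sample standard deviation s is used. Because s is at least as large as the population standard deviation, the interval is slightly wider and the bound still holds.

```python
import statistics


def chebyshev_bound(k):
    """Minimum proportion within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def proportion_within(data, k):
    """Observed proportion of values in [mean - k*s, mean + k*s]."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)          # sample standard deviation
    return sum(1 for x in data if abs(x - mean) <= k * s) / len(data)


skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]   # made-up, deliberately skewed data
for k in (1.5, 2, 3):
    print(k, chebyshev_bound(k), proportion_within(skewed, k))
```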

Empirical Rule
The empirical rule is only valid for bell-shaped (normal) distributions. The following
statements are true.

• Approximately 68% of the data values fall within one standard deviation of the
mean.
• Approximately 95% of the data values fall within two standard deviations of
the mean.
• Approximately 99.7% of the data values fall within three standard deviations
of the mean.
The empirical rule will be revisited later in the chapter on normal probabilities.
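A quick simulation sketch of the empirical rule. The mean 100, standard deviation 15, sample size, and seed are arbitrary assumptions. The proportions are random estimates, so they come out only roughly equal to 68%, 95%, and 99.7%.

```python
import random
import statistics


def empirical_check(sample_size=20_000, seed=0):
    """Proportion of simulated normal values within 1, 2, and 3 standard deviations."""
    rng = random.Random(seed)
    data = [rng.gauss(100, 15) for _ in range(sample_size)]   # arbitrary mean 100, sd 15
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    return [sum(1 for x in data if abs(x - mean) <= k * sd) / sample_size for k in (1, 2, 3)]


print(empirical_check())   # roughly [0.68, 0.95, 0.997]
```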

Using the TI-82 to find these values


You may use the TI-82 to find the measures of central tendency and the measures of
variation using the list handling capabilities of the calculator.


Stats: Measures of Position

Standard Scores (z-scores)


The standard score is obtained by subtracting the mean and dividing the difference by
the standard deviation. The symbol is z, which is why it's also called a z-score.

The mean of the standard scores is zero and the standard deviation is 1. This is the
nice feature of the standard score -- no matter what the original scale was, when the
data is converted to its standard score, the mean is zero and the standard deviation is
1.
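A minimal sketch of standard scores. The text does not say whether the population or sample standard deviation is intended, so the `sample` flag is an assumption. Either way, the scores have mean 0 and a standard deviation of 1, measured with the same kind of standard deviation.

```python
import statistics


def z_scores(data, sample=True):
    """Standard scores: subtract the mean, then divide by the standard deviation."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data) if sample else statistics.pstdev(data)
    return [(x - mean) / sd for x in data]


z = z_scores([4, 5, 3, 6, 5])
print([round(v, 3) for v in z])
print(statistics.mean(z), statistics.stdev(z))   # approximately 0 and 1
```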

Percentiles, Deciles, Quartiles


Percentiles (100 regions)

The kth percentile is the number which has k% of the values below it. The data must
be ranked.

1. Rank the data.
2. Find k% (k/100) of the sample size, n.
3. If this is an integer, add 0.5. If it isn't an integer, round up.
4. Find the number in this position. If your depth ends in 0.5, then take the
midpoint between the two numbers.
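The four steps can be sketched in Python (the function name is hypothetical). The `parts` parameter is an assumption that covers the decile and quartile variants described below (divide by 10 or 4 instead of 100). An exact `Fraction` is used for the position so that the "is it an integer?" test is reliable.

```python
import math
from fractions import Fraction


def locate_percentile(data, k, parts=100):
    """Value at the kth percentile (parts=100), decile (parts=10), or quartile (parts=4)."""
    if not 0 < k < parts:
        raise ValueError("k must be strictly between 0 and parts")
    ranked = sorted(data)                          # step 1: rank the data
    position = Fraction(k, parts) * len(ranked)    # step 2: k/parts of n, exactly
    if position.denominator == 1:                  # step 3: integer -> add 0.5
        i = int(position)
        return (ranked[i - 1] + ranked[i]) / 2     # step 4: midpoint of positions i and i + 1
    i = math.ceil(position)                        # step 3: not an integer -> round up
    return ranked[i - 1]                           # step 4: value in that position


data = list(range(1, 21))
print(locate_percentile(data, 80))              # position 16 -> (16 + 17) / 2 = 16.5
print(locate_percentile(data, 1, parts=4))      # first quartile: position 5 -> 5.5
print(locate_percentile([4, 5, 3, 6, 5], 50))   # position 2.5 -> rounds up to 3 -> 5
```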

It is sometimes easier to count from the high end rather than counting from the low
end. For example, the 80th percentile is the number which has 80% below it and 20%
above it. Rather than counting 80% from the bottom, count 20% from the top.

Note: The 50th percentile is the median.

If you wish to find the percentile for a number (rather than locating the kth percentile),
then

1. Take the number of values below the number.
2. Add 0.5.
3. Divide by the total number of values.
4. Convert it to a percent.
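A one-function sketch of those four steps (the name is hypothetical). It counts only the values strictly below the given number, as the steps say, and does nothing special about ties.

```python
def percentile_rank(data, value):
    """Percentile of a given value: (number below + 0.5) / n, as a percent."""
    below = sum(1 for x in data if x < value)
    return (below + 0.5) * 100 / len(data)


data = list(range(1, 21))
print(percentile_rank(data, 16))   # (15 + 0.5) * 100 / 20 = 77.5
```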

Deciles (10 regions)

The percentiles divide the data into 100 equal regions. The deciles divide the data into
10 equal regions. The instructions are the same for finding a percentile, except instead
of dividing by 100 in step 2, divide by 10.

Quartiles (4 regions)

The quartiles divide the data into 4 equal regions. Instead of dividing by 100 in step 2,
divide by 4.

Note: The 2nd quartile is the same as the median. The 1st quartile is the 25th percentile,
the 3rd quartile is the 75th percentile.

The quartiles are commonly used (much more so than the percentiles or deciles). The
TI-82 calculator will find the quartiles for you. Some textbooks include the quartiles
in the five number summary.

Hinges
The lower hinge is the median of the lower half of the data up to and including the
median. The upper hinge is the median of the upper half of the data, starting from and
including the median.
The hinges are the same as the quartiles unless the remainder when dividing the
sample size by four is three (like 39 / 4 = 9 R 3).

The statement about the lower half or upper half including the median tends to be
confusing to some students. If the median is split between two values (which happens
whenever the sample size is even), the median isn't included in either since the median
isn't actually part of the data.

Example 1: sample size of 20

The median will be in position 10.5. The lower half is positions 1 - 10 and the upper
half is positions 11 - 20. The lower hinge is the median of the lower half and would be
in position 5.5. The upper hinge is the median of the upper half and would be in
position 5.5 starting with original position 11 as position 1 -- this is the original
position 15.5.

Example 2: sample size of 21

The median is in position 11. The lower half is positions 1 - 11 and the upper half is
positions 11 - 21. The lower hinge is the median of the lower half and would be in
position 6. The upper hinge is the median of the upper half and would be in position 6
when starting at position 11 -- this is original position 16.
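A sketch of the hinge rule (the function name is illustrative). Each half has (n + 1) // 2 values. When n is odd, both halves include the median. When n is even, the median lies between two data values and belongs to neither half. Applied to the values 1 through 20 and 1 through 21, this reproduces the two examples above.

```python
import statistics


def hinges(data):
    """Lower and upper hinges: medians of the lower and upper halves of the ranked data."""
    ranked = sorted(data)
    n = len(ranked)
    half = (n + 1) // 2                 # size of each half (includes the median when n is odd)
    lower_half = ranked[:half]
    upper_half = ranked[n - half:]
    return statistics.median(lower_half), statistics.median(upper_half)


print(hinges(range(1, 21)))   # n = 20: positions 5.5 and 15.5 -> (5.5, 15.5)
print(hinges(range(1, 22)))   # n = 21: positions 6 and 16     -> (6, 16)
```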

Five Number Summary


The five number summary consists of the minimum value, lower hinge, median, upper
hinge, and maximum value. Some textbooks use the quartiles instead of the hinges.

Box and Whiskers Plot


A graphical representation of the five number summary. A box is drawn between the
lower and upper hinges with a line at the median. Whiskers (a single line, not a box)
extend from the hinges to lines at the minimum and maximum values.

Interquartile Range (IQR)


The interquartile range is the difference between the third and first quartiles. That's
it: Q3 - Q1

Outliers
Outliers are extreme values. There are mild outliers and extreme outliers. The Bluman
text does not distinguish between mild outliers and extreme outliers and just treats
either as an outlier.

Extreme Outliers

Extreme outliers are any data values which lie more than 3.0 times the interquartile
range below the first quartile or above the third quartile. x is an extreme outlier if ...
x < Q1 - 3 * IQR

or
x > Q3 + 3 * IQR
Mild Outliers

Mild outliers are any data values which lie between 1.5 times and 3.0 times the
interquartile range below the first quartile or above the third quartile. x is a mild
outlier if ...
Q1 - 3 * IQR <= x < Q1 - 1.5 * IQR

or
Q3 + 1.5 * IQR < x <= Q3 + 3 * IQR
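The fences above can be sketched as a small classifier (the function name is hypothetical). Q1 and Q3 are supplied by the caller, so either quartiles or hinges can be used. The boundary cases follow the inequalities above: a value exactly 3 IQRs beyond a quartile is classified as mild.

```python
def classify_outlier(x, q1, q3):
    """Classify x using the mild (1.5 * IQR) and extreme (3 * IQR) fences."""
    iqr = q3 - q1
    if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
        return "extreme outlier"
    if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
        return "mild outlier"
    return "not an outlier"


# Hypothetical quartiles Q1 = 10 and Q3 = 20, so IQR = 10:
# mild fences at -5 and 35, extreme fences at -20 and 50.
for x in (25, 40, 50, 55, -10, -25):
    print(x, classify_outlier(x, 10, 20))
```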

Stats: Counting Techniques chp4

Definitions
Factorial
The factorial of a positive integer is the product of each natural number up to and
including the integer.
Permutation
An arrangement of objects in a specific order.
Combination
A selection of objects without regard to order.
Tree Diagram
A graphical device used to list all possibilities of a sequence of events in a
systematic way.


Stats: Counting Techniques

Fundamental Theorems

Arithmetic

Every integer greater than one is either prime or can be expressed as a unique
product of prime numbers (apart from the order of the factors).

Algebra

Every polynomial function in one variable of degree n > 0 has at least one real or
complex zero.

Linear Programming

If there is a solution to a linear programming problem, then it will occur at a corner
point or on a boundary between two or more corner points.

Fundamental Counting Principle


In a sequence of events, the total possible number of ways all events can be performed is
the product of the possible number of ways each individual event can be performed.

The Bluman text calls this multiplication principle 2.


Factorials
If n is a positive integer, then
n! = n (n-1) (n-2) ... (3)(2)(1)
n! = n (n-1)!

A special case is 0!
0! = 1
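The recursive definition n! = n (n-1)!, together with the special case 0! = 1, can be written in Python as a sketch (the function name is illustrative). The result is checked against the standard library's `math.factorial`.

```python
import math


def factorial(n):
    """n! = n * (n-1)!, with the special case 0! = 1."""
    if n < 0:
        raise ValueError("factorial is defined here only for non-negative integers")
    if n == 0:
        return 1
    return n * factorial(n - 1)


print(factorial(0), factorial(5))            # 1 120
print(factorial(10) == math.factorial(10))   # True
```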

Permutations
A permutation is an arrangement of objects without repetition where order is
important.

Permutations using all the objects

A permutation of n objects, arranged into one group of size n, without repetition, and
order being important is:

nPn = P(n,n) = n!

Example: Find all permutations of the letters "ABC"


ABC ACB BAC BCA CAB CBA
Permutations of some of the objects

A permutation of n objects, arranged in groups of size r, without repetition, and order
being important is:

nPr = P(n,r) = n! / (n-r)!

Example: Find all two-letter permutations of the letters "ABC"


AB AC BA BC CA CB
Shortcut formula for finding a permutation

Assuming that you start at n and count down to 1 in your factorials ...

P(n,r) = first r factors of n factorial
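A sketch of the shortcut (the function name is hypothetical). It multiplies the first r factors n, n-1, ..., n-r+1. `itertools.permutations` lists the actual two-letter arrangements of "ABC", and `math.perm` is a cross-check.

```python
import math
from itertools import permutations


def permutation_count(n, r):
    """P(n, r) as the first r factors of n!: n * (n-1) * ... * (n-r+1)."""
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n")
    result = 1
    for factor in range(n, n - r, -1):
        result *= factor
    return result


print(permutation_count(3, 2))                       # 6
print(["".join(p) for p in permutations("ABC", 2)])  # ['AB', 'AC', 'BA', 'BC', 'CA', 'CB']
print(permutation_count(10, 3) == math.perm(10, 3))  # True (720)
```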


Distinguishable Permutations

Sometimes letters are repeated and all of the permutations aren't distinguishable from
each other.

Example: Find all permutations of the letters "BOB"

To help you distinguish, I'll write the second "B" as "b"


BOb BbO OBb ObB bBO bOB

If you just write "B" as "B", however ...


BOB BBO OBB OBB BBO BOB

There are really only three distinguishable permutations here.


BOB BBO OBB

If a word has N letters, k of which are distinct, and you let n1, n2, n3, ..., nk be the
frequency of each of the k letters, then the total number of distinguishable
permutations is given by:

$\frac{N!}{n_1! \, n_2! \, n_3! \cdots n_k!}$

Consider the word "STATISTICS":

Here is the frequency of each letter: S=3, T=3, A=1, I=2, C=1; there are 10 letters
total.

Permutations = 10! / (3! 3! 1! 2! 1!) = (10*9*8*7*6*5*4*3*2*1) / (6 * 6 * 1 * 2 * 1) = 3628800 / 72 = 50400
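A sketch of the distinguishable-permutations formula (the function name is illustrative). `collections.Counter` supplies the letter frequencies. Dividing N! by one factorial at a time always leaves a whole number, so integer division is safe here. The brute-force check on "BOB" counts the distinct arrangements directly.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def distinguishable_permutations(word):
    """N! / (n1! n2! ... nk!) where n1..nk are the letter frequencies."""
    result = factorial(len(word))
    for count in Counter(word).values():
        result //= factorial(count)    # each intermediate quotient is a whole number
    return result


print(distinguishable_permutations("STATISTICS"))   # 50400
print(distinguishable_permutations("BOB"))          # 3
print(len(set(permutations("BOB"))))                # 3, by brute-force listing
```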

You can find distinguishable permutations using the TI-82.

Combinations
A combination is an arrangement of objects without repetition where order is not
important.
Note: The difference between a permutation and a combination is not whether there is
repetition or not -- there must not be repetition with either, and if there is repetition,
you cannot use the formulas for permutations or combinations. The only difference
in the definition of a permutation and a combination is whether order is
important.

A combination of n objects, arranged in groups of size r, without repetition, and order
not being important is:

nCr = C(n,r) = n! / ( (n-r)! * r! )

Another way to write a combination of n things, r at a time, is using the binomial
notation: $\binom{n}{r}$

Example: Find all two-letter combinations of the letters "ABC"


AB = BA AC = CA BC = CB

There are only three two-letter combinations.

Shortcut formula for finding a combination

Assuming that you start at n and count down to 1 in your factorials ...

C(n,r) = first r factors of n factorial divided by the last r factors of n factorial

Pascal's Triangle

Combinations are used in the binomial expansion theorem from algebra to give the
coefficients of the expansion (a+b)^n. They also form a pattern known as Pascal's
Triangle.
              1
            1   1
          1   2   1
        1   3   3   1
      1   4   6   4   1
    1   5  10  10   5   1
  1   6  15  20  15   6   1
1   7  21  35  35  21   7   1

Each element in the table is the sum of the two elements directly above it. Each
element is also a combination. The n value is the number of the row (start counting at
zero) and the r value is the element in the row (start counting at zero). That would
make the 20 in the next to last row C(6,3) -- it's in the row #6 (7th row) and position #3
(4th element).
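A sketch that builds the triangle by the "sum of the two elements above" rule (the function name is hypothetical). It then confirms that row n, position r matches C(n, r) from `math.comb`, counting both from zero.

```python
import math


def pascal_rows(count):
    """First `count` rows (count >= 1) of Pascal's Triangle, row 0 first."""
    rows = [[1]]
    while len(rows) < count:
        previous = rows[-1]
        # each inner element is the sum of the two elements directly above it
        middle = [a + b for a, b in zip(previous, previous[1:])]
        rows.append([1] + middle + [1])
    return rows


rows = pascal_rows(8)
for row in rows:
    print(" ".join(str(v) for v in row))
print(rows[6][3], math.comb(6, 3))   # 20 20
```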

Symmetry

Pascal's Triangle illustrates the symmetric nature of a combination: C(n,r) = C(n,n-r)

Example: C(10,4) = C(10,6) or C(100,99) = C(100,1)

Shortcut formula for finding a combination

Since combinations are symmetric, if n-r is smaller than r, then switch the
combination to its alternative form and then use the shortcut given above.

C(n,r) = first r factors of n factorial divided by the last r factors of n factorial
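A sketch of the combination shortcut combined with the symmetry switch (the function name is hypothetical). It uses the smaller of r and n - r, then divides the first r factors of n! by the last r factors (which multiply to r!). `math.comb` is used only as a cross-check.

```python
import math


def combination_count(n, r):
    """C(n, r) via the shortcut: first r factors of n! over the last r factors (r!)."""
    if not 0 <= r <= n:
        raise ValueError("need 0 <= r <= n")
    r = min(r, n - r)                  # symmetry: C(n, r) = C(n, n - r)
    numerator = 1
    denominator = 1
    for i in range(r):
        numerator *= n - i             # n, n-1, ..., n-r+1
        denominator *= r - i           # r, r-1, ..., 1
    return numerator // denominator


print(combination_count(10, 4), combination_count(10, 6))   # 210 210
print(combination_count(100, 99))                            # 100 (computed as C(100, 1))
print(combination_count(3, 2) == math.comb(3, 2))            # True
```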

TI-82
You can use the TI-82 graphing calculator to find factorials, permutations, and
combinations.

Tree Diagrams
Tree diagrams are a graphical way of listing all the possible
outcomes. The outcomes are listed in an orderly fashion, so
listing all of the possible outcomes is easier than just trying
to make sure that you have them all listed. It is called a tree
diagram because of the way it looks.

The first event appears on the left, and then each sequential
event is represented as branches off of the first event.

The tree diagram to the right shows the possible ways
of flipping two coins. The final outcomes are obtained by following each branch to its
conclusion. From top to bottom, they are:
HH HT TH TT
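Listing every path through a tree can be sketched with `itertools.product`, which gives one branch choice per stage. The function name is hypothetical, and it assumes every stage has the same branches, as with repeated coin flips. The paths come out in the same top-to-bottom order as the diagram.

```python
from itertools import product


def tree_outcomes(branches, stages):
    """List every path through a tree where each stage has the same branches."""
    return ["".join(path) for path in product(branches, repeat=stages)]


print(tree_outcomes("HT", 2))        # ['HH', 'HT', 'TH', 'TT']
print(len(tree_outcomes("HT", 3)))   # 8 outcomes for three coins
```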

Stats: Probability chp5

Definitions
Probability Experiment
Process which leads to well-defined results called outcomes
Outcome
The result of a single trial of a probability experiment
Sample Space
Set of all possible outcomes of a probability experiment
Event
One or more outcomes of a probability experiment
Classical Probability
Uses the sample space to determine the numerical probability that an event
will happen. Also called theoretical probability.
Equally Likely Events
Events which have the same probability of occurring.
Complement of an Event
All the outcomes in the sample space except those in the given event.
Empirical Probability
Uses a frequency distribution to determine the numerical probability. An
empirical probability is a relative frequency.
Subjective Probability
Uses probability values based on an educated guess or estimate. It employs
opinions and inexact information.
Mutually Exclusive Events
Two events which cannot happen at the same time.
Disjoint Events
Another name for mutually exclusive events.
Independent Events
Two events are independent if the occurrence of one does not affect the
probability of the other occurring.
Dependent Events
Two events are dependent if the first event affects the outcome or occurrence
of the second event in a way the probability is changed.
Conditional Probability
The probability of an event occurring given that another event has already
occurred.
Bayes' Theorem
A formula which allows one to find the probability that an event occurred as
the result of a particular previous event.


Stats: Introduction to Probability

Sample Spaces
A sample space is the set of all possible outcomes. However, some sample spaces are
better than others.

Consider the experiment of flipping two coins. It is possible to get 0 heads, 1 head, or
2 heads. Thus, the sample space could be {0, 1, 2}. Another way to look at it is
{ HH, HT, TH, TT }. The second way is better because each outcome is equally likely to
occur.

When writing the sample space, it is highly desirable to have events which are equally
likely.

Another example is rolling two dice. The sums are { 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 }.
However, these sums aren't equally likely. The only way to get a sum of 2 is to roll a 1
on both dice, but you can get a sum of 4 by rolling a 1-3, 2-2, or 3-1. The following
table illustrates a better sample space for the sum obtained when rolling two dice.

                  Second Die
First Die    1    2    3    4    5    6
    1        2    3    4    5    6    7
    2        3    4    5    6    7    8
    3        4    5    6    7    8    9
    4        5    6    7    8    9   10
    5        6    7    8    9   10   11
    6        7    8    9   10   11   12

Classical Probability
The above table lends itself to describing data another way -- using a probability
distribution. Let's consider the frequency distribution for the above sums.

Sum   Frequency   Relative Frequency
2     1           1/36
3     2           2/36
4     3           3/36
5     4           4/36
6     5           5/36
7     6           6/36
8     5           5/36
9     4           4/36
10    3           3/36
11    2           2/36
12    1           1/36

If just the first and last columns were written, we would have a probability
distribution. The relative frequency of a frequency distribution is the probability of the
event occurring. This is only true, however, if the events are equally likely.

This gives us the formula for classical probability. The probability of an event
occurring is the number in the event divided by the number in the sample space.
Again, this is only true when the events are equally likely. A classical probability is
the relative frequency of each event in the sample space when each event is equally
likely.

P(E) = n(E) / n(S)
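A sketch of P(E) = n(E) / n(S) over the 36 equally likely ordered pairs of two dice (the function name is hypothetical). `Fraction` keeps the answers exact, and it reduces them automatically, so 3/36 prints as 1/12.

```python
from fractions import Fraction
from itertools import product


def classical_probability(sample_space, event):
    """P(E) = n(E) / n(S), valid only when every outcome in S is equally likely."""
    outcomes = list(sample_space)
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


two_dice = list(product(range(1, 7), repeat=2))   # the 36 (first, second) pairs
print(classical_probability(two_dice, lambda roll: sum(roll) == 4))   # 1/12 (= 3/36)
print(classical_probability(two_dice, lambda roll: sum(roll) == 7))   # 1/6  (= 6/36)
```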


Empirical Probability
Empirical probability is based on observation. The empirical probability of an event is
the relative frequency of a frequency distribution based upon observation.

P(E) = f / n
Probability Rules
There are two rules which are very important.

All probabilities are between 0 and 1 inclusive
0 <= P(E) <= 1

The sum of all the probabilities in the sample space is 1

There are some other rules which are also important.

The probability of an event which cannot occur is 0.

The probability of any event which is not in the sample space is zero.

The probability of an event which must occur is 1.

The probability of the sample space is 1.

The probability of an event not occurring is one minus the probability of it occurring.
P(E') = 1 - P(E)


Stats: Probability Rules

"OR" or Unions
Mutually Exclusive Events

Two events are mutually exclusive if they cannot occur at the same time. Another
word that means mutually exclusive is disjoint.

If two events are disjoint, then the probability of them both occurring at the same time
is 0.
Disjoint: P(A and B) = 0

If two events are mutually exclusive, then the probability of either occurring is the
sum of the probabilities of each occurring.

Specific Addition Rule

Only valid when the events are mutually exclusive.

P(A or B) = P(A) + P(B)


Example 1:

Given: P(A) = 0.20, P(B) = 0.70, A and B are disjoint

I like to use what's called a joint probability distribution. (Since disjoint means
nothing in common, joint is what they have in common -- so the values that go on the
inside portion of the table are the intersections or "and"s of each pair of events).
"Marginal" is another word for totals -- it's called marginal because they appear in the
margins.

            B       B'      Marginal
A           0.00    0.20    0.20
A'          0.70    0.10    0.80
Marginal    0.70    0.30    1.00

The values given in the problem are P(A) = 0.20, P(B) = 0.70, and P(A and B) = 0.00
(since A and B are disjoint). The grand total is always 1.00. The rest of
the values are obtained by addition and subtraction.

Non-Mutually Exclusive Events

In events which aren't mutually exclusive, there is some overlap. When P(A) and P(B)
are added, the probability of the intersection (and) is added twice. To compensate for
that double addition, the intersection needs to be subtracted.

General Addition Rule

Always valid.
P(A or B) = P(A) + P(B) - P(A and B)
Example 2:

Given P(A) = 0.20, P(B) = 0.70, P(A and B) = 0.15

            B       B'      Marginal
A           0.15    0.05    0.20
A'          0.55    0.25    0.80
Marginal    0.70    0.30    1.00
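A sketch of filling in the joint probability table from P(A), P(B), and P(A and B) by subtraction, then applying the general addition rule. The function names are hypothetical. Probabilities are entered as `Fraction("0.20")` and similar so that the subtractions are exact rather than floating-point approximations.

```python
from fractions import Fraction as F


def joint_table(p_a, p_b, p_a_and_b):
    """Fill a 2x2 joint probability table from P(A), P(B), and P(A and B)."""
    return {
        ("A", "B"): p_a_and_b,
        ("A", "B'"): p_a - p_a_and_b,
        ("A'", "B"): p_b - p_a_and_b,
        ("A'", "B'"): 1 - p_a - p_b + p_a_and_b,
    }


def p_a_or_b(p_a, p_b, p_a_and_b):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b


table = joint_table(F("0.20"), F("0.70"), F("0.15"))      # Example 2
for cell, p in table.items():
    print(cell, float(p))
print(float(p_a_or_b(F("0.20"), F("0.70"), F("0.15"))))   # 0.75
```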

Interpreting the table

Certain things can be determined from the joint probability distribution. Mutually
exclusive events will have a probability of zero in the intersection (A and B) cell. All
inclusive events will have a zero in the cell opposite the intersection (A' and B'). All
inclusive means that there is nothing outside of those two events: P(A or B) = 1.

            B                                  B'                                 Marginal
A           A and B are Mutually Exclusive     .                                  .
            if this value is 0
A'          .                                  A and B are All Inclusive if       .
                                               this value is 0
Marginal    .                                  .                                  1.00

"AND" or Intersections
Independent Events

Two events are independent if the occurrence of one does not change the probability
of the other occurring.

An example would be rolling a 2 on a die and flipping a head on a coin. Rolling the 2
does not affect the probability of flipping the head.
If events are independent, then the probability of them both occurring is the product of
the probabilities of each occurring.

Specific Multiplication Rule

Only valid for independent events

P(A and B) = P(A) * P(B)


Example 3:

P(A) = 0.20, P(B) = 0.70, A and B are independent.

            B       B'      Marginal
A           0.14    0.06    0.20
A'          0.56    0.24    0.80
Marginal    0.70    0.30    1.00

The 0.14 is because the probability of A and B is the probability of A times the
probability of B or 0.20 * 0.70 = 0.14.

Dependent Events

If the occurrence of one event does affect the probability of the other occurring, then
the events are dependent.

Conditional Probability

The probability of event B occurring given that event A has already occurred is read "the
probability of B given A" and is written: P(B|A)

General Multiplication Rule

Always works.

P(A and B) = P(A) * P(B|A)


Example 4:

P(A) = 0.20, P(B) = 0.70, P(B|A) = 0.40

A good way to think of P(B|A) is that 40% of A is B. 40% of the 20% which was in
event A is 8%, thus the intersection is 0.08.

            B       B'      Marginal
A           0.08    0.12    0.20
A'          0.62    0.18    0.80
Marginal    0.70    0.30    1.00

Independence Revisited

The following four statements are equivalent

1. A and B are independent events
2. P(A and B) = P(A) * P(B)
3. P(A|B) = P(A)
4. P(B|A) = P(B)

The last two hold because if two events are independent, the occurrence of one doesn't
change the probability of the occurrence of the other. This means that the probability
of B occurring, whether A has happened or not, is simply the probability of B
occurring.
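A sketch of the general multiplication rule together with statement 2 used as an independence test (the function names are hypothetical). The values are those from Examples 3 and 4, stored as exact fractions. The exact `==` comparison is only meaningful for exact values like these; with floats you would compare within a tolerance.

```python
from fractions import Fraction as F


def p_a_and_b(p_a, p_b_given_a):
    """General multiplication rule: P(A and B) = P(A) * P(B|A)."""
    return p_a * p_b_given_a


def independent(p_a, p_b, p_a_and_b):
    """A and B are independent exactly when P(A and B) = P(A) * P(B)."""
    return p_a_and_b == p_a * p_b


p_a, p_b = F("0.20"), F("0.70")
joint = p_a_and_b(p_a, F("0.40"))                 # Example 4: 0.08
print(float(joint), independent(p_a, p_b, joint))  # 0.08 False
print(independent(p_a, p_b, F("0.14")))            # Example 3: True
print(float(joint / p_a), float(p_b))              # P(B|A) = 0.4 vs P(B) = 0.7
```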


Stats: Conditional Probability

Conditional Probability
Recall that the probability of an event occurring given that another event has already
occurred is called a conditional probability.

The probability that event B occurs, given that event A has already occurred is

P(B|A) = P(A and B) / P(A)

This formula comes from the general multiplication rule and a little bit of
algebra.

Since we are given that event A has occurred, we have a reduced sample space.
Instead of the entire sample space S, we now have a sample space of A since we know
A has occurred. So the old rule (the number in the event divided by the number in the
sample space) still applies. It is the number in A and B (must be in A
since A has occurred) divided by the number in A. If you then divide the numerator and
denominator of the right hand side by the number in the sample space S, then you
have the probability of A and B divided by the probability of A.

Examples
Example 1:

The question, "Do you smoke?" was asked of 100 people. Results are shown in the
table.

          Yes    No    Total
Male      19     41    60
Female    12     28    40
Total     31     69    100

• What is the probability of a randomly selected individual being a male who
smokes? This is just a joint probability. The number of "Male and Smoke"
divided by the total = 19/100 = 0.19
• What is the probability of a randomly selected individual being a male? This is
the total for male divided by the total = 60/100 = 0.60. Since no mention is
made of smoking or not smoking, it includes all the cases.
• What is the probability of a randomly selected individual smoking? Again,
since no mention is made of gender, this is a marginal probability, the total
who smoke divided by the total = 31/100 = 0.31.
• What is the probability of a randomly selected male smoking? This time,
you're told that you have a male - think of stratified sampling. What is the
probability that the male smokes? Well, 19 males smoke out of 60 males, so
19/60 = 0.31666...
• What is the probability that a randomly selected smoker is male? This time,
you're told that you have a smoker and asked to find the probability that the
smoker is also male. There are 19 male smokers out of 31 total smokers, so
19/31 = 0.6129 (approx)
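The five questions can be reproduced from the counts in the table with a small sketch. The helper names `prob`, `conditional`, `male`, and `smokes` are hypothetical. The conditional probability is computed as P(event and given) / P(given), which is the same as restricting attention to the "given" row or column.

```python
from fractions import Fraction

# Counts from the "Do you smoke?" table.
counts = {
    ("Male", "Yes"): 19, ("Male", "No"): 41,
    ("Female", "Yes"): 12, ("Female", "No"): 28,
}
total = sum(counts.values())


def prob(condition):
    """Probability that a randomly selected person satisfies condition(gender, smokes)."""
    return Fraction(sum(c for key, c in counts.items() if condition(*key)), total)


def conditional(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda g, s: event(g, s) and given(g, s)) / prob(given)


def male(gender, smokes_answer):
    return gender == "Male"


def smokes(gender, smokes_answer):
    return smokes_answer == "Yes"


print(prob(lambda g, s: male(g, s) and smokes(g, s)))   # 19/100
print(conditional(smokes, male))                         # 19/60
print(conditional(male, smokes))                         # 19/31
```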

After that last part, you have just worked a Bayes' Theorem problem. I know you
didn't realize it - that's the beauty of it. A Bayes' problem can be set up so it appears to
be just another conditional probability. In this class we will treat Bayes' problems as
another conditional probability and not involve the large messy formula given in the
text (and every other text).

Example 2:

There are three major manufacturing companies that make a product: Aberations,
Brochmailians, and Chompieliens. Aberations has a 50% market share, and
Brochmailians has a 30% market share. 5% of Aberations' product is defective, 7% of
Brochmailians' product is defective, and 10% of Chompieliens' product is defective.

This information can be placed into a joint probability distribution

Company         Good                   Defective             Total
Aberations      0.50-0.025 = 0.475     0.05(0.50) = 0.025    0.50
Brochmailians   0.30-0.021 = 0.279     0.07(0.30) = 0.021    0.30
Chompieliens    0.20-0.020 = 0.180     0.10(0.20) = 0.020    0.20
Total           0.934                  0.066                 1.00

The percent of the market share for Chompieliens wasn't given, but since the
marginals must add to 1.00, they have a 20% market share.
Notice that the 5%, 7%, and 10% defective rates don't go into the table directly. This
is because they are conditional probabilities and the table is a joint probability table.
These defective probabilities are conditional upon which company was given. That is,
the 7% is not P(Defective), but P(Defective|Brochmailians). The joint probability
P(Defective and Brochmailians) = P(Defective|Brochmailians) * P(Brochmailians).
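
A minimal Python sketch of how the joint table above could be built from the market shares and the conditional defect rates, assuming exact fractions. The dictionary names are hypothetical, not from the notes.

```python
from fractions import Fraction

# Marginal probabilities (market shares) and conditional probabilities
# P(Defective | company) from Example 2.
share = {"Aberations": Fraction(50, 100), "Brochmailians": Fraction(30, 100)}
share["Chompieliens"] = 1 - sum(share.values())  # marginals must add to 1.00
defect_rate = {"Aberations": Fraction(5, 100),
               "Brochmailians": Fraction(7, 100),
               "Chompieliens": Fraction(10, 100)}

# Multiplication rule: P(Defective and company) = P(Defective|company) * P(company).
# The Good cell is then the row total minus the Defective cell.
joint = {}
for company, p in share.items():
    defective = defect_rate[company] * p
    joint[company] = {"Good": p - defective, "Defective": defective}

for company, row in joint.items():
    print(f"{company:14} {float(row['Good']):.3f} "
          f"{float(row['Defective']):.3f} {float(share[company]):.2f}")
total_good = sum(row["Good"] for row in joint.values())
total_defective = sum(row["Defective"] for row in joint.values())
print(f"{'Total':14} {float(total_good):.3f} {float(total_defective):.3f} 1.00")
```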

The "good" probabilities can be found by subtraction as shown above, or by
multiplication using conditional probabilities. If 7% of Brochmailians' product is
defective, then 93% is good: 0.93(0.30) = 0.279.

- What is the probability that a randomly selected product is defective?
P(Defective) = 0.066
- What is the probability that a defective product came from Brochmailians?
P(Brochmailians|Defective) = P(Brochmailians and Defective) / P(Defective) =
0.021/0.066 = 7/22 = 0.318 (approx).
- Are these events independent? No. If they were, then
P(Brochmailians|Defective) = 0.318 would have to equal
P(Brochmailians) = 0.30, but it doesn't. Also, P(Aberations and
Defective) = 0.025 would have to equal P(Aberations)*P(Defective) =
0.50*0.066 = 0.033, and it doesn't.
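
The two answers above can be checked with a short Python sketch that works only from the joint table. It divides a joint probability by the marginal of the given event, then tests whether every joint cell equals the product of its marginals. The names are illustrative.

```python
from fractions import Fraction

# Defective column (joint probabilities) and market shares (marginals),
# copied from the joint probability table above.
joint_defective = {"Aberations": Fraction(25, 1000),
                   "Brochmailians": Fraction(21, 1000),
                   "Chompieliens": Fraction(20, 1000)}
share = {"Aberations": Fraction(1, 2),
         "Brochmailians": Fraction(3, 10),
         "Chompieliens": Fraction(1, 5)}

p_defective = sum(joint_defective.values())  # marginal P(Defective) = 0.066

# Conditional probability = joint probability / marginal of the given event.
p_broch_given_defective = joint_defective["Brochmailians"] / p_defective
print(p_broch_given_defective, round(float(p_broch_given_defective), 3))

# Independence would require P(company and D) == P(company) * P(D) in every row.
independent = all(joint_defective[c] == share[c] * p_defective for c in share)
print("independent:", independent)
```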

The second question asked above is a Bayes' problem. Again, my point is, you don't
have to know Bayes' formula to work a Bayes' problem.

Bayes' Theorem
However, just for the sake of argument, let's say that you want to know what Bayes'
formula is.

Let's use the same example, but shorten each event to its one-letter initial, i.e., A, B,
C, and D instead of Aberations, Brochmailians, Chompieliens, and Defective.

P(D|B) is not a Bayes' problem; it is given in the problem. Bayes' formula finds the
reverse conditional probability P(B|D).

It is based on the fact that the given event (D) is made up of three parts: the part of D
in A, the part of D in B, and the part of D in C.

$$P(B \mid D) = \frac{P(B \text{ and } D)}{P(A \text{ and } D) + P(B \text{ and } D) + P(C \text{ and } D)}$$

Inserting the multiplication rule for each of these joint probabilities gives

$$P(B \mid D) = \frac{P(D \mid B)\,P(B)}{P(D \mid A)\,P(A) + P(D \mid B)\,P(B) + P(D \mid C)\,P(C)}$$

However, and I hope you agree, it is much easier to take the joint probability divided
by the marginal probability. The table does the adding for you and makes the
problems doable without having to memorize the formulas.
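
For comparison, here is a sketch of Bayes' formula applied directly to the given quantities: market shares as P(A), P(B), P(C) and defect rates as P(D|A), P(D|B), P(D|C). `bayes_posterior` is a hypothetical helper. It should agree with the joint-over-marginal answer of 7/22.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods, target):
    """Hypothetical helper: P(target | D) from P(company) and P(D | company)."""
    # Denominator: P(D) = P(D|A)P(A) + P(D|B)P(B) + P(D|C)P(C)
    p_d = sum(likelihoods[k] * priors[k] for k in priors)
    # Numerator: P(D|target) * P(target)
    return likelihoods[target] * priors[target] / p_d

priors = {"A": Fraction(50, 100), "B": Fraction(30, 100), "C": Fraction(20, 100)}
likelihoods = {"A": Fraction(5, 100), "B": Fraction(7, 100), "C": Fraction(10, 100)}

print(bayes_posterior(priors, likelihoods, "B"))  # 7/22
```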


Stats: Probability Distributions chp 6

Definitions
Random Variable
Variable whose values are determined by chance
Probability Distribution
The values a random variable can assume and the corresponding probabilities
of each.
Expected Value
The theoretical mean of the variable.
Binomial Experiment
An experiment with a fixed number of independent trials. Each trial can only
have two outcomes, or outcomes which can be reduced to two outcomes. The
probability of each outcome must remain constant from trial to trial.
Binomial Distribution
The outcomes of a binomial experiment with their corresponding
probabilities.
Multinomial Distribution
A probability distribution resulting from an experiment with a fixed number of
independent trials. Each trial has two or more mutually exclusive outcomes.
The probability of each outcome must remain constant from trial to trial.
Poisson Distribution
A probability distribution used when a density of items is distributed over a
period of time. The sample size needs to be large and the probability of
success needs to be small.
Hypergeometric Distribution
A probability distribution of a variable with two outcomes when sampling is
done without replacement.

Stats: Probability Distributions

Probability Functions
A probability function is a function which assigns probabilities to the values of a
random variable.

- All the probabilities must be between 0 and 1 inclusive.
- The sum of the probabilities of the outcomes must be 1.

If these two conditions aren't met, then the function isn't a probability function. There
is no requirement that the values of the random variable only be between 0 and 1, only
that the probabilities be between 0 and 1.
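
The two conditions translate directly into a check. This is a minimal Python sketch, assuming the probabilities are given as exact fractions. With floating-point values, a small tolerance on the sum would be needed. `is_probability_function` is a hypothetical name.

```python
from fractions import Fraction

def is_probability_function(probabilities):
    """Hypothetical helper: True if every probability is in [0, 1] and they sum to 1."""
    probs = list(probabilities)
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1

# The x-values themselves can be anything; only the probabilities are checked.
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(is_probability_function(die.values()))                       # True
print(is_probability_function([Fraction(1, 2), Fraction(1, 3)]))   # False (sums to 5/6)
print(is_probability_function([Fraction(3, 2), Fraction(-1, 2)]))  # False (out of range)
```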

Probability Distributions
A listing of all the values the random variable can assume with their corresponding
probabilities make a probability distribution.

A note about random variables. A random variable does not mean that the values can
be anything (a random number). Random variables have a well-defined set of
outcomes and well-defined probabilities for the occurrence of each outcome. The word
random refers to the fact that the outcomes happen by chance; that is, you don't
know which outcome will occur next.

Here's an example probability distribution that results from the rolling of a single fair
die.

x       1     2     3     4     5     6     sum
p(x)    1/6   1/6   1/6   1/6   1/6   1/6   6/6 = 1

Mean, Variance, and Standard Deviation


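Consider the following.

The definitions for population mean and variance used with an ungrouped frequency
distribution were:

$$\mu = \frac{\sum x f}{N} \qquad \sigma^2 = \frac{\sum (x-\mu)^2 f}{N}$$

Some of you might be confused by dividing only by N. Recall that this is the
population variance; the sample variance, which is the unbiased estimator of the
population variance, divides by n-1 instead.

Using algebra, this is equivalent to:

$$\mu = \sum x \cdot \frac{f}{N} \qquad \sigma^2 = \sum x^2 \cdot \frac{f}{N} - \left(\sum x \cdot \frac{f}{N}\right)^2$$

Recall that a probability is a long-term relative frequency. So every f/N can be
replaced by p(x). This simplifies to:

$$\mu = \sum x \, p(x) \qquad \sigma^2 = \sum x^2 \, p(x) - \left(\sum x \, p(x)\right)^2$$

Even better, the last portion of the variance is the mean squared. So, the two
formulas that we will be using are:

$$\mu = \sum x \, p(x) \qquad \sigma^2 = \sum x^2 \, p(x) - \mu^2$$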

Here's the example we were working on earlier.

x          1     2     3     4      5      6      sum
p(x)       1/6   1/6   1/6   1/6    1/6    1/6    6/6 = 1
x p(x)     1/6   2/6   3/6   4/6    5/6    6/6    21/6 = 3.5
x^2 p(x)   1/6   4/6   9/6   16/6   25/6   36/6   91/6 = 15.1667

The mean is 7/2 or 3.5.
The variance is 91/6 - (7/2)^2 = 35/12 = 2.916666...
The standard deviation is the square root of the variance = 1.7078.

Do not use rounded off values in the intermediate calculations. Only round off the
final answer.
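
The same computation in Python, as a sketch. Exact fractions keep the intermediate values unrounded, which matches the advice above, and only the square root is converted to a float. `distribution_summary` is an illustrative name.

```python
import math
from fractions import Fraction

def distribution_summary(dist):
    """Mean, variance, and standard deviation of a discrete distribution {x: p(x)}."""
    mean = sum(x * p for x, p in dist.items())                    # sum of x p(x)
    variance = sum(x * x * p for x, p in dist.items()) - mean**2  # sum of x^2 p(x) - mu^2
    return mean, variance, math.sqrt(variance)

die = {x: Fraction(1, 6) for x in range(1, 7)}
mean, variance, sd = distribution_summary(die)
print(mean, variance, round(sd, 4))  # 7/2 35/12 1.7078
```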

You can learn how to find the mean and variance of a probability distribution using
lists with the TI-82 or using the program called pdist.

Stats: Binomial Probabilities

Binomial Experiment
A binomial experiment is an experiment which satisfies these four conditions:

- A fixed number of trials
- Each trial is independent of the others
- There are only two outcomes
- The probability of each outcome remains constant from trial to trial.

These can be summarized as: an experiment with a fixed number of independent
trials, each of which can only have two possible outcomes.

The fact that each trial is independent actually means that the probabilities remain
constant.

Examples of binomial experiments

- Tossing a coin 20 times to see how many tails occur.
- Asking 200 people if they watch ABC news.
- Rolling a die to see if a 5 appears.

Examples which aren't binomial experiments

- Rolling a die until a 6 appears (not a fixed number of trials)
- Asking 20 people how old they are (not two outcomes)
- Drawing 5 cards from a deck for a poker hand (done without replacement, so
not independent)

Binomial Probability Function


Example:

What is the probability of rolling exactly two sixes in 6 rolls of a die?

There are five things you need to do to work a binomial story problem.

1. Define Success first. Success must be for a single trial. Success = "Rolling a 6 on
a single die"
2. Define the probability of success (p): p = 1/6
3. Find the probability of failure: q = 5/6
4. Define the number of trials: n = 6
5. Define the number of successes out of those trials: x = 2

Anytime a six appears, it is a success (denoted S), and anytime something else appears,
it is a failure (denoted F). The ways you can get exactly 2 successes in 6 trials are
given below, with the probability of each written to the right of it.
Because the trials are independent, the probability of the event (all six dice) is the
product of the probabilities of the individual outcomes (each die).
1 FFFFSS 5/6 * 5/6 * 5/6 * 5/6 * 1/6 * 1/6 = (1/6)^2 * (5/6)^4
2 FFFSFS 5/6 * 5/6 * 5/6 * 1/6 * 5/6 * 1/6 = (1/6)^2 * (5/6)^4
3 FFFSSF 5/6 * 5/6 * 5/6 * 1/6 * 1/6 * 5/6 = (1/6)^2 * (5/6)^4
4 FFSFFS 5/6 * 5/6 * 1/6 * 5/6 * 5/6 * 1/6 = (1/6)^2 * (5/6)^4
5 FFSFSF 5/6 * 5/6 * 1/6 * 5/6 * 1/6 * 5/6 = (1/6)^2 * (5/6)^4
6 FFSSFF 5/6 * 5/6 * 1/6 * 1/6 * 5/6 * 5/6 = (1/6)^2 * (5/6)^4
7 FSFFFS 5/6 * 1/6 * 5/6 * 5/6 * 5/6 * 1/6 = (1/6)^2 * (5/6)^4
8 FSFFSF 5/6 * 1/6 * 5/6 * 5/6 * 1/6 * 5/6 = (1/6)^2 * (5/6)^4
9 FSFSFF 5/6 * 1/6 * 5/6 * 1/6 * 5/6 * 5/6 = (1/6)^2 * (5/6)^4
10 FSSFFF 5/6 * 1/6 * 1/6 * 5/6 * 5/6 * 5/6 = (1/6)^2 * (5/6)^4
11 SFFFFS 1/6 * 5/6 * 5/6 * 5/6 * 5/6 * 1/6 = (1/6)^2 * (5/6)^4
12 SFFFSF 1/6 * 5/6 * 5/6 * 5/6 * 1/6 * 5/6 = (1/6)^2 * (5/6)^4
13 SFFSFF 1/6 * 5/6 * 5/6 * 1/6 * 5/6 * 5/6 = (1/6)^2 * (5/6)^4
14 SFSFFF 1/6 * 5/6 * 1/6 * 5/6 * 5/6 * 5/6 = (1/6)^2 * (5/6)^4
15 SSFFFF 1/6 * 1/6 * 5/6 * 5/6 * 5/6 * 5/6 = (1/6)^2 * (5/6)^4
Notice that each of the 15 probabilities is exactly the same: (1/6)^2 * (5/6)^4.

Also, note that the 1/6 is the probability of success and you needed 2 successes. The
5/6 is the probability of failure, and if 2 of the 6 trials were success, then 4 of the 6
must be failures. Note that 2 is the value of x and 4 is the value of n-x.

Further note that there are fifteen ways this can occur. This is the number of ways 2
successes can occur in 6 trials without repetition and with order not being important,
or a combination of 6 things, 2 at a time: 6C2 = 15.

The probability of getting exactly x successes in n trials, with the probability of
success on a single trial being p (and the probability of failure q = 1 - p), is:

P(X=x) = nCx * p^x * q^(n-x)
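
The formula can be sketched in Python with `math.comb` (Python 3.8+) for nCx. `binomial_pmf` is a hypothetical helper, and fractions are used so the die example stays exact.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * q^(n-x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# Exactly two sixes in six rolls of a fair die.
print(comb(6, 2))  # 15 arrangements, as listed above
prob = binomial_pmf(2, 6, Fraction(1, 6))
print(prob, float(prob))
```

For the coin example that follows, `binomial_pmf(6, 10, Fraction(1, 2))` gives 105/512, which is 0.205078125.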


Example:

A coin is tossed 10 times. What is the probability that exactly 6 heads will occur?

1. Success = "A head is flipped on a single coin"
2. p = 0.5
3. q = 0.5
4. n = 10
5. x = 6

P(x=6) = 10C6 * 0.5^6 * 0.5^4 = 210 * 0.015625 * 0.0625 = 0.205078125

Mean, Variance, and Standard Deviation

The mean, variance, and standard deviation of a binomial distribution are extremely
easy to find.

$$\mu = np \qquad \sigma^2 = npq \qquad \sigma = \sqrt{npq}$$

Another way to remember the variance is mu*q (since np is mu).


Example:

Find the mean, variance, and standard deviation for the number of sixes that appear
when rolling 30 dice.

Success = "a six is rolled on a single die". p = 1/6, q = 5/6.

The mean is 30 * (1/6) = 5. The variance is 30 * (1/6) * (5/6) = 25/6. The standard
deviation is the square root of the variance = 2.041241452 (approx)
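
A small sketch of these formulas in Python. `binomial_moments` is an illustrative name, and the inputs are from the 30-dice example above.

```python
import math
from fractions import Fraction

def binomial_moments(n, p):
    """Mean np, variance npq, and standard deviation sqrt(npq) of a binomial."""
    q = 1 - p
    mean = n * p
    variance = n * p * q
    return mean, variance, math.sqrt(variance)

mean, variance, sd = binomial_moments(30, Fraction(1, 6))
print(mean, variance, sd)  # 5 25/6 2.041241452...
```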

Stats: Other Discrete Distributions

Multinomial Probabilities
A multinomial experiment is an extension of the binomial experiment. The difference
is that in a multinomial experiment, there are more than two possible outcomes.
However, there are still a fixed number of independent trials, and the probability of
each outcome must remain constant from trial to trial.

Instead of using a combination, as in the case of the binomial probability, the number
of ways the outcomes can occur is counted using distinguishable permutations.

An example here will be much more useful than a formula.

The probability that a person will pass a College Algebra class is 0.55, the probability
that a person will withdraw before the class is completed is 0.40, and the probability
that a person will fail the class is 0.05. Find the probability that in a class of 30
students, exactly 16 pass, 12 withdraw, and 2 fail.
Outcome     x    p(outcome)
Pass        16   0.55
Withdraw    12   0.40
Fail        2    0.05
Total       30   1.00

The probability is found using this formula:

$$P = \frac{30!}{16!\,12!\,2!} \times 0.55^{16} \times 0.40^{12} \times 0.05^{2}$$

You can do this on the TI-82.
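
If a TI-82 isn't handy, the formula can be evaluated with a few lines of Python (3.8+ for `math.prod`). This is a sketch. `multinomial_probability` is a hypothetical helper, and the counts and probabilities are those of the College Algebra example.

```python
from math import factorial, prod

def multinomial_probability(counts, probs):
    """n! / (x1! x2! ... xk!) * p1^x1 * p2^x2 * ... * pk^xk"""
    n = sum(counts)
    # Number of distinguishable permutations of the outcomes.
    ways = factorial(n) // prod(factorial(x) for x in counts)
    return ways * prod(p**x for p, x in zip(probs, counts))

# 30 students: 16 pass, 12 withdraw, 2 fail.
counts = [16, 12, 2]
probs = [0.55, 0.40, 0.05]
print(multinomial_probability(counts, probs))
```

With only two outcomes this reduces to the binomial formula, since n!/(x!(n-x)!) is nCx.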

Poisson Probabilities
Named after the French mathematician Simeon Poisson, Poisson probabilities are
useful when there are a large number of independent trials with a small probability of
success on a single trial and the variables occur over a period of time. They can also be
used when a density of items is distributed over a given area or volume.

$$p(x;\lambda) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}$$

Lambda in the formula is the mean number of occurrences. If you're approximating a
binomial probability using the Poisson, then lambda is the same as mu, or n * p.

Example:

If there are 500 customers per eight-hour day in a check-out lane, what is the
probability that there will be exactly 3 in line during any five-minute period?

The expected value during any one five minute period would be 500 / 96 =
5.2083333. The 96 is because there are 96 five-minute periods in eight hours. So, you
expect about 5.2 customers in 5 minutes and want to know the probability of getting
exactly 3.
p(3;500/96) = e^(-500/96) * (500/96)^3 / 3! = 0.1288 (approx)
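
A Python sketch of the Poisson formula, reproducing the check-out lane example. `poisson_pmf` is an illustrative name.

```python
import math

def poisson_pmf(x, lam):
    """p(x; lambda) = e^(-lambda) * lambda^x / x!"""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 500 / 96  # expected customers per five-minute period
print(round(poisson_pmf(3, lam), 4))  # 0.1288
```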

Hypergeometric Probabilities
Hypergeometric experiments occur when the trials are not independent of each other
and occur due to sampling without replacement -- as in a five card poker hand.

Hypergeometric probabilities involve the multiplication of two combinations together
and then division by the total number of combinations.

Example:

What is the probability of selecting 3 men and 4 women when 7 people are chosen
from a group of 7 men and 10 women?

P = (7C3 * 10C4) / 17C7 = (35 * 210) / 19448 = 7350/19448 = 0.3779 (approx)

Note that the numbers in the numerator combinations add up to the numbers used in
the denominator combination: 7 + 10 = 17 and 3 + 4 = 7.

This can be extended to more than two groups, in which case it is called an extended
hypergeometric problem.

You can use the TI-82 to find hypergeometric probabilities.
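
As an alternative to the TI-82, here is a Python sketch. `hypergeometric_probability` (a hypothetical helper) multiplies one combination per group and divides by the combination of the totals, so it also covers the extended, more-than-two-group case.

```python
from fractions import Fraction
from math import comb

def hypergeometric_probability(group_sizes, picks):
    """Product of combinations for each group divided by the total combinations."""
    numerator = 1
    for size, k in zip(group_sizes, picks):
        numerator *= comb(size, k)
    denominator = comb(sum(group_sizes), sum(picks))
    return Fraction(numerator, denominator)

# 3 men (from 7) and 4 women (from 10): 7C3 * 10C4 / 17C7
p = hypergeometric_probability([7, 10], [3, 4])
print(p, round(float(p), 4))
```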

Stats: Normal Distribution chp7

Definitions
Central Limit Theorem
Theorem which states that as the sample size increases, the sampling distribution of
the sample means will become approximately normally distributed.
Correction for Continuity
A correction applied to convert a discrete distribution to a continuous
distribution.
Finite Population Correction Factor
A correction applied to the standard error of the means when the sample size
is more than 5% of the population size and the sampling is done without
replacement.
Sampling Distribution of the Sample Means
Distribution obtained by using the means computed from random samples of
a specific size.
Sampling Error
Difference which occurs between the sample statistic and the population
parameter due to the fact that the sample isn't a perfect representation of the
population.
Standard Error of the Mean
The standard deviation of the sampling distribution of the sample means. It is
equal to the standard deviation of the population divided by the square root
of the sample size.
Standard Normal Distribution
A normal distribution in which the mean is 0 and the standard deviation is 1. It
is denoted by z.
Z-score
Also known as z-value. A standardized score in which the mean is zero and the
standard deviation is 1. The Z score is used to represent the standard normal
distribution.

Stats - Normal Distributions

Any Normal Distribution

- Bell-shaped
- Symmetric about the mean
- Continuous
- Never touches the x-axis
- Total area under the curve is 1.00
- Approximately 68% lies within 1 standard deviation of the mean, 95% within 2
standard deviations, and 99.7% within 3 standard deviations of the mean. This
is the Empirical Rule mentioned earlier.
- Data values are represented by x, which has mean mu and standard deviation
sigma.
- Probability function given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Standard Normal Distribution

Same as a normal distribution, but also ...

- Mean is zero
- Variance is one
- Standard deviation is one
- Data values are represented by z.
- Probability function given by

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$$

Normal Probabilities
Comprehension of this table is vital to success in the course!

There is a table which must be used to look up standard normal probabilities. The
z-score is broken into two parts: the whole number and tenths digit are looked up along
the left side, and the hundredths digit is looked up across the top. The value at the
intersection of the row and column is the area under the curve between zero and the
z-score looked up.

Because of the symmetry of the normal distribution, look up the absolute value of any
z-score.

Computing Normal Probabilities

There are several different situations that can arise when asked to find normal
probabilities.

Situation                        Instructions

Between zero and any number      Look up the area in the table

Between two positives, or        Look up both areas in the table and subtract the
between two negatives            smaller from the larger

Between a negative and           Look up both areas in the table and add them together
a positive

Less than a negative, or         Look up the area in the table and subtract from 0.5000
greater than a positive

Greater than a negative, or      Look up the area in the table and add to 0.5000
less than a positive
This can be shortened into two rules.

1. If there is only one z-score given, use 0.5000 for the second area, otherwise
look up both z-scores in the table
2. If the two numbers are the same sign, then subtract; if they are different signs,
then add. If there is only one z-score, then use the inequality to determine the
second sign (< is negative, and > is positive).
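
The table can be imitated with the error function, since the area between 0 and z equals 0.5 * erf(|z| / sqrt 2). The following Python sketch applies the two rules above. The function names are illustrative, and results agree with the table to about four decimal places.

```python
import math

def table_area(z):
    """Area between 0 and z under the standard normal curve (what the table lists)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

def prob_between(z1, z2):
    """P(z1 < Z < z2) using the same-sign/different-sign rule from the notes."""
    a1, a2 = table_area(z1), table_area(z2)
    if z1 * z2 >= 0:  # same sign (or one is zero): subtract
        return abs(a1 - a2)
    return a1 + a2    # different signs: add

def prob_less_than(z):
    """P(Z < z): only one z-score, so 0.5000 supplies the second area."""
    return 0.5 + table_area(z) if z > 0 else 0.5 - table_area(z)

print(round(table_area(1.96), 4))         # 0.475
print(round(prob_between(-1.0, 2.0), 4))  # different signs: 0.3413 + 0.4772
print(round(prob_less_than(-1.5), 4))     # 0.5000 - 0.4332
```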
Finding z-scores from probabilities

This is more difficult, and requires you to use the table inversely. You must look up
the area between zero and the value on the inside part of the table, and then read the z-
score from the outside. Finally, decide if the z-score should be positive or negative,
based on whether it was on the left side or the right side of the mean. Remember, z-
scores can be negative, but areas or probabilities cannot be.

Situation                              Instructions

Area between 0 and a value             Look up the area in the table
                                       Make negative if on the left side

Area in one tail                       Subtract the area from 0.5000
                                       Look up the difference in the table
                                       Make negative if in the left tail

Area including one complete half       Subtract 0.5000 from the area
(less than a positive or greater       Look up the difference in the table
than a negative)                       Make negative if on the left side

Within z units of the mean             Divide the area by 2
                                       Look up the quotient in the table
                                       Use both the positive and negative z-scores

Two tails with equal area              Subtract the area from 1.000
(more than z units from the mean)      Divide the difference by 2
                                       Look up the quotient in the table
                                       Use both the positive and negative z-scores
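
The inverse lookups can be sketched with Python's `statistics.NormalDist` (3.8+), whose `inv_cdf` returns the z-score with a given area to its left. The helper names are hypothetical. Each mirrors one row of the table above and returns the positive z-score, which you would negate for the left side.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def z_from_table_area(area):
    """Inverse table lookup: positive z with the given area between 0 and z."""
    return std_normal.inv_cdf(0.5 + area)

def z_within(central_area):
    """Within z units of the mean: divide the area by 2, then look it up."""
    return z_from_table_area(central_area / 2)

def z_one_tail(tail_area):
    """Area in one tail: subtract the area from 0.5000, then look it up."""
    return z_from_table_area(0.5 - tail_area)

print(round(z_within(0.95), 2))      # 1.96 (use both +1.96 and -1.96)
print(round(z_one_tail(0.05), 3))    # 1.645 (negate if in the left tail)
```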

Using the table becomes easier with practice, so work lots of the normal probability
problems!


Standard Normal Probabilities

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990

The values in the table are the areas between zero and the z-score. That is,
P(0<Z<z-score)

Stats: Central Limit Theorem

Sampling Distribution of the Sample Means


Instead of working with individual scores, statisticians often work with means. What
happens is that several samples are taken, the mean is computed for each sample, and
then the means are used as the data, rather than individual scores being used. The
resulting distribution is a sampling distribution of the sample means.

When all of the possible sample means are computed, then the following properties
are true:

 The mean of the sample means will be the mean of the population
 The variance of the sample means will be the variance of the population
divided by the sample size.
 The standard deviation of the sample means (known as the standard error of the
mean) will be smaller than the population standard deviation and will be equal
to the standard deviation of the population divided by the square root of the
sample size.
 If the population has a normal distribution, then the sample means will have a
normal distribution.
 If the population is not normally distributed, but the sample size is sufficiently
large, then the sample means will have an approximately normal distribution.
Some books define sufficiently large as at least 30 and others as at least 31.
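
These properties can be checked by brute force on a small population. The sketch below assumes a hypothetical population (the six faces of a die) and samples of size 2 drawn with replacement. It enumerates every possible sample and compares the mean and variance of the sample means with mu and sigma^2/n.

```python
from itertools import product
from statistics import fmean, pvariance

# Hypothetical population: the six faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
n = 2  # sample size

# Every possible sample of size n (drawn with replacement) and its mean.
sample_means = [fmean(s) for s in product(population, repeat=n)]

print(fmean(sample_means), fmean(population))              # both 3.5
print(pvariance(sample_means), pvariance(population) / n)  # both about 1.4583
```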

The formula for a z-score when working with the sample means is:

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

Finite Population Correction Factor


If the sample size is more than 5% of the population size and the sampling is done
without replacement, then a correction needs to be made to the standard error of the
means.

In the following, N is the population size and n is the sample size. The adjustment is
to multiply the standard error by the square root of the quotient of the difference
between the population and sample sizes and one less than the population size:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$$

For the most part, we will be ignoring this in class.
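
Here is a Python sketch of the z-score for a sample mean, with the finite population correction applied only when a population size N is supplied. The function names and the numbers in the usage lines are hypothetical.

```python
import math

def standard_error(sigma, n, N=None):
    """sigma / sqrt(n), times sqrt((N - n) / (N - 1)) when a finite N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

def z_for_sample_mean(x_bar, mu, sigma, n, N=None):
    """z = (x_bar - mu) / (standard error of the mean)."""
    return (x_bar - mu) / standard_error(sigma, n, N)

# Hypothetical numbers: mu = 100, sigma = 15, n = 36.
print(z_for_sample_mean(104, 100, 15, 36))         # 1.6 (standard error 15/6 = 2.5)
print(z_for_sample_mean(104, 100, 15, 36, N=500))  # n is over 5% of N, so correct it
```

The correction factor is always below 1, so it shrinks the standard error and makes the z-score a bit larger in magnitude.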


Stats: Normal Approximation to Binomial


Recall that according to the Central Limit Theorem, the sample mean of any
distribution will become approximately normal if the sample size is sufficiently large.

It turns out that the binomial distribution can be approximated using the normal
distribution if np and nq are both at least 5. Furthermore, recall that the mean of a
binomial distribution is np and the variance of the binomial distribution is npq.

Continuity Correction Factor


There is a problem with approximating the binomial with the normal. That problem
arises because the binomial distribution is a discrete distribution while the normal
distribution is a continuous distribution. The basic difference here is that with discrete
values, we are talking about heights but no widths, and with the continuous
distribution we are talking about both heights and widths.

The correction is to either add or subtract 0.5 of a unit from each discrete x-value.
This fills in the gaps to make it continuous. This is very similar to the expanding of
class limits into class boundaries that we did with grouped frequency distributions.

Examples

Discrete     Continuous
x = 6        5.5 < x < 6.5
x > 6        x > 6.5
x >= 6       x > 5.5
x < 6        x < 5.5
x <= 6       x < 6.5

As you can see, whether or not the equal to is included makes a big difference in the
discrete distribution and the way the conversion is performed. However, for a
continuous distribution, equality makes no difference.
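
Here is a Python sketch of the whole normal approximation for P(X = x), using the error function for the normal CDF. The helper names are hypothetical. As an illustration, it reuses the earlier coin example (n = 10, p = 0.5, x = 6), whose exact binomial answer is 0.2051.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X < x) for a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def approx_binomial_exactly(x, n, p):
    """Normal approximation to P(X = x), using the x - 0.5 < X < x + 0.5 correction."""
    q = 1 - p
    if min(n * p, n * q) < 5:
        raise ValueError("np and nq should both be at least 5")
    mu = n * p
    sigma = math.sqrt(n * p * q)
    return normal_cdf(x + 0.5, mu, sigma) - normal_cdf(x - 0.5, mu, sigma)

# Reusing the coin example: exactly 6 heads in 10 tosses.
print(round(approx_binomial_exactly(6, 10, 0.5), 4))
```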

Steps to working a normal approximation to the binomial distribution

1. Identify success, the probability of success, the number of trials, and the
desired number of successes. Since this is a binomial problem, these are the
same things which were identified when working a binomial problem.
2. Convert the discrete x to a continuous x. Some people would argue that step 3
should be done before this step, but go ahead and convert the x before you
forget about it and miss the problem.
3. Find the smaller of np or nq. If the smaller one is at least five, then the larger
must also be, so the approximation will be considered good. When you find np,
you're actually finding the mean, mu, so denote it as such.
4. Find the standard deviation, sigma = sqrt (npq). It might be easier to find the
variance and just stick the square root in the final calculation - that way you
don't have to work with all of the decimal places.
5. Compute the z-score using the standard formula for an individual score (not the
one for a sample mean).

6. Calculate the probability desired.

Stats: Estimation chp8

Definitions
Confidence Interval
An interval estimate with a specific level of confidence
Confidence Level
The percent of the time the true mean will lie in the interval estimate given.
Consistent Estimator
An estimator which gets closer to the value of the parameter as the sample
size increases.
Degrees of Freedom
The number of data values which are allowed to vary once a statistic has been
determined.
Estimator
A sample statistic which is used to estimate a population parameter. It must
be unbiased, consistent, and relatively efficient.
Interval Estimate
A range of values used to estimate a parameter.
Maximum Error of the Estimate
The maximum difference between the point estimate and the actual
parameter. The Maximum Error of the Estimate is one-half the width of the
confidence interval for means and proportions.
Point Estimate
A single value used to estimate a parameter.
Relatively Efficient Estimator
The estimator for a parameter with the smallest variance.
T distribution
A distribution used when the population variance is unknown.
Unbiased Estimator
An estimator whose expected value is equal to the parameter being
estimated.

Stats: Introduction to Estimation

One area of concern in inferential statistics is the estimation of the population
parameter from the sample statistic. It is important to realize the order here. The
sample statistic is calculated from the sample data, and the population parameter is
inferred (or estimated) from this sample statistic. Let me say that again: statistics are
calculated, parameters are estimated.

We talked about problems of obtaining the value of the parameter earlier in the course
when we talked about sampling techniques.

Another area of inferential statistics is sample size determination. That is, how large
of a sample should be taken to make an accurate estimation. In these cases, the
statistics can't be used since the sample hasn't been taken yet.

Point Estimates
There are two types of estimates we will find: Point Estimates and Interval Estimates.
The point estimate is the single best value.

A good estimator must satisfy three conditions:

- Unbiased: The expected value of the estimator must be equal to the parameter
being estimated
- Consistent: The value of the estimator approaches the value of the parameter
as the sample size increases
- Relatively Efficient: The estimator has the smallest variance of all estimators
which could be used

Confidence Intervals
The point estimate is going to be different from the population parameter because of
sampling error, and there is no way to know how close it is to the actual
parameter. For this reason, statisticians like to give an interval estimate, which is a
range of values used to estimate the parameter.

A confidence interval is an interval estimate with a specific level of confidence. A
level of confidence is the probability that the interval estimate will contain the
parameter. The level of confidence is 1 - alpha, so an area of 1 - alpha lies within the
confidence interval.

Maximum Error of the Estimate

The maximum error of the estimate is denoted by E and is one-half the width of the
confidence interval. The basic confidence interval for a symmetric distribution is set
up so that the point estimate minus the maximum error of the estimate is less than the
true population parameter, which is less than the point estimate plus the maximum
error of the estimate:

point estimate - E < parameter < point estimate + E

This formula will work for means and proportions because they will use the Z or T
distributions, which are symmetric. Later, we will talk about variances, which don't use
a symmetric distribution, and the formula will be different.

Area in Tails

Since the level of confidence is 1-alpha, the amount in the tails is alpha. There is a
notation in statistics which means the score which has the specified area in the right
tail.

Examples:
 Z(0.05) = 1.645 (the Z-score which has 0.05 to the right, and 0.4500 between 0
and it)
 Z(0.10) = 1.282 (the Z-score which has 0.10 to the right, and 0.4000 between 0
and it).

As a shorthand notation, the () are usually dropped, and the probability is written as a
subscript. The Greek letter alpha is used to represent the area in both tails for a
confidence interval, and so alpha/2 will be the area in one tail.

Here are some common values:

Confidence   Area between     Area in one       z-score
Level        0 and z-score    tail (alpha/2)

50%          0.2500           0.2500            0.674
80%          0.4000           0.1000            1.282
90%          0.4500           0.0500            1.645
95%          0.4750           0.0250            1.960
98%          0.4900           0.0100            2.326
99%          0.4950           0.0050            2.576

Notice in the table above that the area between 0 and the z-score is simply one-half of
the confidence level. So, if there is a confidence level which isn't given above, all you
need to do to find it is divide the confidence level by two, look up that area in the
inside part of the Z-table, and read the z-score from the outside.

Also notice that if you look at the Student's t distribution table, the top row is a level of
confidence, and the bottom row is the z-score. In fact, this is where I got the extra
digit of accuracy from.

Stats: Estimating the Mean

You are estimating the population mean, mu, not the sample mean, x bar.

Population Standard Deviation Known


If the population standard deviation, sigma, is known, then the mean has a normal (Z)
distribution.

The maximum error of the estimate is given by the formula for E:

$$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

The z here is the z-score obtained from the normal table, or the bottom of the t-table
as explained in the introduction to estimation. The z-score depends only on the level
of confidence, so you may get in the habit of writing it next to the level of confidence.

Once you have computed E, I suggest you save it to the memory on your calculator.
On the TI-82, a good choice would be the letter E. The reason for this is that the limits
for the confidence interval are now found by subtracting and adding the maximum
error of the estimate from/to the sample mean.
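
Here is a Python sketch of the interval, assuming sigma is known. `NormalDist().inv_cdf` supplies z(alpha/2) in place of the table lookup. The function name and the sample values (x-bar = 50, sigma = 10, n = 25) are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence):
    """x_bar - E < mu < x_bar + E with E = z(alpha/2) * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z(alpha/2)
    E = z * sigma / math.sqrt(n)             # maximum error of the estimate
    return x_bar - E, x_bar + E

# Hypothetical sample: x_bar = 50, sigma = 10, n = 25, 95% confidence.
low, high = z_confidence_interval(50, 10, 25, 0.95)
print(round(low, 2), round(high, 2))  # 46.08 53.92
```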

Student's t Distribution
When the population standard deviation is unknown, the mean has a Student's t
distribution. The Student's t distribution was created by William T. Gosset, an Irish
brewery worker. The brewery wouldn't allow him to publish his work under his name,
so he used the pseudonym "Student".

The Student's t distribution is very similar to the standard normal distribution.

 It is symmetric about its mean


 It has a mean of zero
 It has a standard deviation and variance greater than 1.
 There are actually many t distributions, one for each degree of freedom
 As the sample size increases, the t distribution approaches the normal
distribution.
 It is bell shaped.
 The t-scores can be negative or positive, but the probabilities are always
positive.

Degrees of Freedom

A degree of freedom occurs for every data value which is allowed to vary once a
statistic has been fixed. For a single mean, there are n-1 degrees of freedom. This
value will change depending on the statistic being used.
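As a small illustration with made-up numbers: once the mean of n = 4 values is fixed at 10, any three values can be chosen freely, but the fourth is then forced, which is why a single mean has n - 1 = 3 degrees of freedom. The helper name is hypothetical.

```python
def forced_last_value(mean, free_values, n):
    """Return the one value that is no longer free once the mean is fixed."""
    # The total must equal n * mean, so the last value is whatever is left over.
    return n * mean - sum(free_values)

free = [8, 12, 9]                        # n - 1 = 3 values chosen freely
last = forced_last_value(mean=10, free_values=free, n=4)
print(last)                              # 11
```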

Population Standard Deviation Unknown


If the population standard deviation, sigma, is unknown, then the mean has a Student's
t distribution, and the sample standard deviation, s, is used instead of the population
standard deviation.

The maximum error of the estimate is given by the formula for E shown:

$$E = t_{\alpha/2} \frac{s}{\sqrt{n}}$$

The t here is the t-score obtained from the Student's t table. The t-score depends on
both the level of confidence and the sample size (through the n-1 degrees of freedom).

Once you have computed E, I suggest you save it to the memory on your calculator.
On the TI-82, a good choice would be the letter E. The reason for this is that the limits
for the confidence interval are now found by subtracting and adding the maximum
error of the estimate from/to the sample mean.

Notice the confidence interval, $\bar{x} - E < \mu < \bar{x} + E$, is the same as for a
population mean when the population standard deviation is known. The only thing that
has changed is the formula for the maximum error of the estimate.
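Python's standard library has no inverse t distribution, so this sketch hard-codes a few 95% critical values copied from the Student's t table below (SciPy's scipy.stats.t.ppf could supply them instead). The dictionary T_95, the function name, and the sample numbers (x bar = 50, s = 8, n = 16) are assumptions for illustration only.

```python
from math import sqrt

# A few 95% critical values copied from the Student's t table (df: t).
T_95 = {5: 2.571, 10: 2.228, 15: 2.131, 20: 2.086, 30: 2.042}

def mean_interval_sigma_unknown(xbar, s, n, t_table=T_95):
    """95% confidence interval for mu when sigma is unknown (s with df = n - 1)."""
    df = n - 1
    t = t_table[df]                  # KeyError if this df isn't in the excerpt
    E = t * s / sqrt(n)              # maximum error of the estimate
    return xbar - E, xbar + E

low, high = mean_interval_sigma_unknown(xbar=50, s=8, n=16)
print(f"{low:.3f} < mu < {high:.3f}")
```

With df = 15 the table gives t = 2.131, so E = 2.131(8/4) = 4.262 and the interval is 45.738 < mu < 54.262.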


Student's t Critical Values

Conf. Level     50%    80%    90%    95%    98%    99%
One Tail      0.250  0.100  0.050  0.025  0.010  0.005
Two Tail      0.500  0.200  0.100  0.050  0.020  0.010
df = 1        1.000  3.078  6.314 12.706 31.821 63.657
2             0.816  1.886  2.920  4.303  6.965  9.925
3             0.765  1.638  2.353  3.182  4.541  5.841
4             0.741  1.533  2.132  2.776  3.747  4.604
5             0.727  1.476  2.015  2.571  3.365  4.032
6             0.718  1.440  1.943  2.447  3.143  3.707
7             0.711  1.415  1.895  2.365  2.998  3.499
8             0.706  1.397  1.860  2.306  2.896  3.355
9             0.703  1.383  1.833  2.262  2.821  3.250
10            0.700  1.372  1.812  2.228  2.764  3.169
11            0.697  1.363  1.796  2.201  2.718  3.106
12            0.695  1.356  1.782  2.179  2.681  3.055
13            0.694  1.350  1.771  2.160  2.650  3.012
14            0.692  1.345  1.761  2.145  2.624  2.977
15            0.691  1.341  1.753  2.131  2.602  2.947
16            0.690  1.337  1.746  2.120  2.583  2.921
17            0.689  1.333  1.740  2.110  2.567  2.898
18            0.688  1.330  1.734  2.101  2.552  2.878
19            0.688  1.328  1.729  2.093  2.539  2.861
20            0.687  1.325  1.725  2.086  2.528  2.845
21            0.686  1.323  1.721  2.080  2.518  2.831
22            0.686  1.321  1.717  2.074  2.508  2.819
23            0.685  1.319  1.714  2.069  2.500  2.807
24            0.685  1.318  1.711  2.064  2.492  2.797
25            0.684  1.316  1.708  2.060  2.485  2.787
26            0.684  1.315  1.706  2.056  2.479  2.779
27            0.684  1.314  1.703  2.052  2.473  2.771
28            0.683  1.313  1.701  2.048  2.467  2.763
29            0.683  1.311  1.699  2.045  2.462  2.756
30            0.683  1.310  1.697  2.042  2.457  2.750
40            0.681  1.303  1.684  2.021  2.423  2.704
50            0.679  1.299  1.676  2.009  2.403  2.678
60            0.679  1.296  1.671  2.000  2.390  2.660
70            0.678  1.294  1.667  1.994  2.381  2.648
80            0.678  1.292  1.664  1.990  2.374  2.639
90            0.677  1.291  1.662  1.987  2.368  2.632
100           0.677  1.290  1.660  1.984  2.364  2.626
z             0.674  1.282  1.645  1.960  2.326  2.576

Stats: Estimating the Proportion

You are estimating the population proportion, p.

All estimation done here is based on the fact that the normal can be used to
approximate the binomial distribution when np and nq are both at least 5. Thus, the p
that we're talking about is the probability of success on a single trial from the binomial
experiments.

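Recall:

The best point estimate for p is p hat, the sample proportion:

$$\hat{p} = \frac{x}{n}$$

The formula for z from the normal approximation to the binomial is

$$z = \frac{x - np}{\sqrt{npq}}$$

If this formula for z is divided by n in both the numerator and the denominator, then
the formula for z becomes:

$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$

Solving this for p to come up with a confidence interval gives the maximum error of
the estimate as:

$$E = z_{\alpha/2} \sqrt{\frac{pq}{n}}$$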

This is not, however, the formula that we will use. The problem with estimation is that
you don't know the value of the parameter (in this case p), so you can't use it to
estimate itself - if you knew it, then there would be no problem to work out. So we
will replace the parameter by the statistic in the formula for the maximum error of the
estimate.

The maximum error of the estimate is given by the formula for E shown:

$$E = z_{\alpha/2} \sqrt{\frac{\hat{p}\hat{q}}{n}}$$

The Z here is the z-score obtained from the normal table, or the bottom of the t-table as
explained in the introduction to estimation. The z-score depends only on the level of
confidence, so you may get in the habit of writing it next to the level of confidence.

When you're computing E, I suggest that you find the sample proportion, p hat, and
save it to P on the calculator. This way, you can find q hat as (1-P). Do NOT round the
value for p hat and then use the rounded value in the calculations; this will lead to error.

Once you have computed E, I suggest you save it to the memory on your calculator.
On the TI-82, a good choice would be the letter E. The reason for this is that the limits
for the confidence interval are found by subtracting the maximum error of the estimate
from the sample proportion and adding it to the sample proportion:

$$\hat{p} - E < p < \hat{p} + E$$
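The same recipe for a proportion, sketched in Python. The function name and the data (60 successes in 200 trials) are made up; note that p hat is kept at full precision, as the advice above recommends, and that np hat = 60 and nq hat = 140 are both at least 5.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, confidence_level=0.95):
    """Confidence interval for p using the normal approximation to the binomial."""
    p_hat = x / n            # keep full precision; don't round p hat
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    E = z * sqrt(p_hat * q_hat / n)
    return p_hat - E, p_hat + E

low, high = proportion_interval(x=60, n=200)
print(f"{low:.4f} < p < {high:.4f}")
```

Here p hat = 0.3 and E is about 0.0635, so the interval is roughly 0.2365 < p < 0.3635.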


Stats: Sample Size Determination chp9

The sample size determination formulas come from the formulas for the maximum
error of the estimates. The formula is solved for n. Be sure to round the answer
obtained up to the next whole number, not off to the nearest whole number. If you
round off, then you will exceed your maximum error of the estimate in some cases.
By rounding up, you will have a smaller maximum error of the estimate than allowed,
but this is better than having a larger one than desired.

Population Mean
Here is the formula for the sample size, which is obtained by solving the maximum
error of the estimate formula for the population mean for n:

$$n = \left( \frac{z_{\alpha/2}\,\sigma}{E} \right)^2$$
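A minimal sketch of the sample size formula for a mean, assuming sigma = 15 and a desired maximum error of E = 2 (made-up numbers). The key detail is math.ceil, which always rounds up rather than off.

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, E, confidence_level=0.95):
    """Smallest n whose maximum error of the estimate does not exceed E."""
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    n = (z * sigma / E) ** 2
    return ceil(n)          # always round up, never off

print(sample_size_mean(sigma=15, E=2))   # 217
```

The unrounded value is about 216.08; rounding off would give 216 and slightly exceed the allowed error, so the answer is 217.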

Population Proportion
Here is the formula for the sample size, which is obtained by solving the maximum
error of the estimate formula for the population proportion for n:

$$n = pq \left( \frac{z_{\alpha/2}}{E} \right)^2$$

Some texts use p hat and q hat, but since the sample hasn't been taken, there is no value
for the sample proportion. p and q are taken from a previous study, if one is available. If
there is no previous study or estimate available, then use 0.5 for p and q, as these are
the values which will give the largest sample size. It is better to have too large a sample
size and come under the maximum error of the estimate than to have too small a sample
size and exceed the maximum error of the estimate.
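The proportion version, sketched with p = 0.5 as the default when no prior estimate exists. The maximum error E = 0.03 and the "previous study" value p = 0.2 are hypothetical inputs chosen only to show how a prior estimate shrinks the required sample.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(E, p=0.5, confidence_level=0.95):
    """Sample size for estimating p; p = 0.5 is the conservative default."""
    q = 1 - p
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return ceil(p * q * (z / E) ** 2)     # round up to the next whole number

print(sample_size_proportion(E=0.03))          # no prior estimate
print(sample_size_proportion(E=0.03, p=0.2))   # hypothetical prior study, p = 0.2
```

The first call gives 1068 and the second 683; since pq is largest at p = 0.5, no other guess for p can require more than 1068 here.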


Stats: Hypothesis Testing

Definitions
Null Hypothesis ( H0 )
Statement of zero or no change. If the original claim includes equality (<=, =, or
>=), it is the null hypothesis. If the original claim does not include equality (<, not
equal, >), then the null hypothesis is the complement of the original claim. The
null hypothesis always includes the equal sign. The decision is based on the
null hypothesis.
Alternative Hypothesis ( H1 or Ha )
Statement which is true if the null hypothesis is false. The type of test (left,
right, or two-tail) is based on the alternative hypothesis.
Type I error
Rejecting the null hypothesis when it is true (saying false when true). Usually
the more serious error.
Type II error
Failing to reject the null hypothesis when it is false (saying true when false).
alpha
Probability of committing a Type I error.
beta
Probability of committing a Type II error.
Test statistic
Sample statistic used to decide whether to reject or fail to reject the null
hypothesis.
Critical region
Set of all values which would cause us to reject H0
Critical value(s)
The value(s) which separate the critical region from the non-critical region.
The critical values are determined independently of the sample statistics.
Significance level ( alpha )
The probability of rejecting the null hypothesis when it is true. alpha = 0.05
and alpha = 0.01 are common. If no level of significance is given, use alpha =
0.05. The level of significance is the complement of the level of confidence in
estimation.
Decision
A statement based upon the null hypothesis. It is either "reject the null
hypothesis" or "fail to reject the null hypothesis". We will never accept the
null hypothesis.
Conclusion
A statement which indicates the level of evidence (sufficient or insufficient), at
what level of significance, and whether the original claim is rejected (null) or
supported (alternative).


Stats: Hypothesis Testing

Introduction
Be sure to read through the definitions for this section before trying to make sense out
of the following.

The first thing to do when given a claim is to write the claim mathematically (if
possible), and decide whether the given claim is the null or alternative hypothesis. If
the given claim contains equality, or a statement of no change from the given or
accepted condition, then it is the null hypothesis, otherwise, if it represents change, it
is the alternative hypothesis.

The following example is not a mathematical example, but may help introduce the
concept.

Example
"He's dead, Jim," said Dr. McCoy to Captain Kirk.
Mr. Spock, as the science officer, is put in charge of statistically determining the
correctness of Bones' statement and deciding the fate of the crew member (to vaporize
or try to revive).

His first step is to arrive at the hypothesis to be tested.

Does the statement represent a change in previous condition?

- Yes, there is change, thus it is the alternative hypothesis, H1
- No, there is no change, therefore it is the null hypothesis, H0

The correct answer is that there is change. Dead represents a change from the
accepted state of alive. The null hypothesis always represents no change. Therefore,
the hypotheses are:

- H0 : Patient is alive.
- H1 : Patient is not alive (dead).

States of nature are something that you, as a statistician, have no control over. Either
it is, or it isn't. This represents the true nature of things.

Possible states of nature (Based on H0)

- Patient is alive (H0 true - H1 false)
- Patient is dead (H0 false - H1 true)

Decisions are something that you have control over. You may make a correct decision
or an incorrect decision. It depends on the state of nature as to whether your decision
is correct or in error.

Possible decisions (Based on H0 ) / conclusions (Based on claim )

- Reject H0 / "Sufficient evidence to say patient is dead"
- Fail to Reject H0 / "Insufficient evidence to say patient is dead"

There are four possibilities that can occur based on the two possible states of nature
and the two decisions which we can make.

Statisticians will never accept the null hypothesis; we will fail to reject it. In other
words, we'll say that it isn't, or that we don't have enough evidence to say that it isn't,
but we'll never say that it is, because someone else might come along with another
sample which shows that it isn't, and we don't want to be wrong.
Statistically (double) speaking ...

                     State of Nature
Decision             H0 True                          H0 False
Reject H0            Patient is alive,                Patient is dead,
                     sufficient evidence of death     sufficient evidence of death
Fail to reject H0    Patient is alive,                Patient is dead,
                     insufficient evidence of death   insufficient evidence of death

In English ...

                     State of Nature
Decision             H0 True                          H0 False
Reject H0            Vaporize a live person           Vaporize a dead person
Fail to reject H0    Try to revive a live person      Try to revive a dead person

Were you right? ...

                     State of Nature
Decision             H0 True                          H0 False
Reject H0            Type I Error (alpha)             Correct Assessment
Fail to reject H0    Correct Assessment               Type II Error (beta)

Which of the two errors is more serious? Type I or Type II?

Since Type I is the more serious error (usually), that is the one we concentrate on. We
usually pick alpha to be very small (0.05, 0.01). Note: alpha is not a Type I error.
Alpha is the probability of committing a Type I error. Likewise beta is the probability
of committing a Type II error.

Conclusions

Conclusions are sentence answers which include whether there is enough evidence or
not (based on the decision), the level of significance, and whether the original claim is
supported or rejected.

Conclusions are based on the original claim, which may be the null or alternative
hypothesis. The decisions are always based on the null hypothesis.

Original claim is H0 ("REJECT" the claim):
- Reject H0 ("SUFFICIENT"): There is sufficient evidence at the alpha level of
  significance to reject the claim that (insert original claim here).
- Fail to reject H0 ("INSUFFICIENT"): There is insufficient evidence at the alpha level
  of significance to reject the claim that (insert original claim here).

Original claim is H1 ("SUPPORT" the claim):
- Reject H0 ("SUFFICIENT"): There is sufficient evidence at the alpha level of
  significance to support the claim that (insert original claim here).
- Fail to reject H0 ("INSUFFICIENT"): There is insufficient evidence at the alpha level
  of significance to support the claim that (insert original claim here).
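The conclusion template above is mechanical enough to sketch as a small Python function: the decision picks "sufficient" or "insufficient", and whether the claim is H0 or H1 picks "reject" or "support". The function name and the sample claim are hypothetical.

```python
def conclusion(reject_h0: bool, claim_is_null: bool, alpha: float, claim: str) -> str:
    """Fill in the conclusion template for a hypothesis test."""
    evidence = "sufficient" if reject_h0 else "insufficient"
    # Claim in H0 -> we speak of rejecting it; claim in H1 -> of supporting it.
    verb = "reject" if claim_is_null else "support"
    return (f"There is {evidence} evidence at the {alpha} level of "
            f"significance to {verb} the claim that {claim}.")

print(conclusion(True, False, 0.05, "the mean weight is more than 16 ounces"))
```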


Stats: Type of Tests

This document will explain how to determine if the test is a left tail, right tail, or two-
tail test.

The type of test is determined by the Alternative Hypothesis ( H1 )

Left Tailed Test


H1: parameter < value
Notice the inequality points to the left

Decision Rule: Reject H0 if t.s. < c.v. (t.s. = test statistic, c.v. = critical value)

Right Tailed Test


H1: parameter > value
Notice the inequality points to the right
Decision Rule: Reject H0 if t.s. > c.v.

Two Tailed Test


H1: parameter not equal to value
Another way to write "not equal" is "< or >"
Notice the inequality points to both sides

Decision Rule: Reject H0 if t.s. < c.v. (left) or t.s. > c.v. (right)

The decision rule can be summarized as follows:

Reject H0 if the test statistic falls in the critical region

(Reject H0 if the test statistic is more extreme than the critical value)
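The three decision rules can be sketched as one Python function. This assumes a symmetric distribution (normal or t), so a two-tailed test uses the critical values -c.v. and +c.v.; the function name and the sample numbers are illustrative only.

```python
def reject_h0(test_statistic: float, critical_value: float, tail: str) -> bool:
    """Classical decision rule: reject H0 if the t.s. falls in the critical region."""
    if tail == "left":
        return test_statistic < critical_value           # c.v. is negative
    if tail == "right":
        return test_statistic > critical_value           # c.v. is positive
    if tail == "two":
        cv = abs(critical_value)                         # symmetric: -cv and +cv
        return test_statistic < -cv or test_statistic > cv
    raise ValueError("tail must be 'left', 'right', or 'two'")

print(reject_h0(-2.10, -1.645, "left"))   # True
print(reject_h0(1.50, 1.645, "right"))    # False
print(reject_h0(-2.00, 1.960, "two"))     # True
```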


Stats: Confidence Intervals as Tests

Using the confidence interval to perform a hypothesis test only works with a two-tailed
test.

- If the hypothesized value of the parameter lies within the confidence interval
  with a 1-alpha level of confidence, then the decision at an alpha level of
  significance is to fail to reject the null hypothesis.
- If the hypothesized value of the parameter lies outside the confidence interval
  with a 1-alpha level of confidence, then the decision at an alpha level of
  significance is to reject the null hypothesis.

Sounds simple enough, right? It is.

However, it has a couple of problems.

- It only works with two-tail hypothesis tests.
- It requires that you compute the confidence interval first. This involves taking a
  z-score or t-score and converting it into an x-score, which is more difficult than
  standardizing an x-score.
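Here is a sketch of the confidence-interval approach for a mean with sigma known, checked against the ordinary two-tailed z test. The hypothesized value (mu = 100) and the sample (x bar = 104, sigma = 15, n = 36) are made up; both approaches should reach the same decision.

```python
from math import sqrt
from statistics import NormalDist

def ci_test(mu0, xbar, sigma, n, alpha=0.05):
    """Two-tailed test of H0: mu = mu0 using a (1 - alpha) confidence interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    E = z * sigma / sqrt(n)
    low, high = xbar - E, xbar + E
    decision = "fail to reject H0" if low < mu0 < high else "reject H0"
    return (low, high), decision

interval, decision = ci_test(mu0=100, xbar=104, sigma=15, n=36)
print(interval, decision)

# The usual two-tailed z test on the same data, for comparison.
z_ts = (104 - 100) / (15 / sqrt(36))
print(z_ts, abs(z_ts) > NormalDist().inv_cdf(0.975))
```

The interval (about 99.10 to 108.90) contains 100, and z = 1.6 is inside plus or minus 1.96, so both say fail to reject H0.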


Stats: Hypothesis Testing Steps

Here are the steps to performing hypothesis testing:

1. Write the original claim and identify whether it is the null hypothesis or the
alternative hypothesis.
2. Write the null and alternative hypotheses. Use the alternative hypothesis to
identify the type of test.
3. Write down all information from the problem.
4. Find the critical value using the tables.
5. Compute the test statistic.
6. Make a decision to reject or fail to reject the null hypothesis. A picture showing
the critical value and test statistic may be useful.
7. Write the conclusion.


Stats: Testing a Single Proportion


You are testing p, you are not testing p hat. If you knew the value of p, then there
would be nothing to test.

All hypothesis testing is done under the assumption the null hypothesis is true!

I can't emphasize this enough. The values for all population parameters in the test
statistics come from the null hypothesis. This is true not only for proportions, but for all
of the testing we're going to be doing.

The sample proportion, p hat, has an approximately normal distribution if np and nq
are both at least 5. Remember that we are approximating the binomial using the normal,
and that the p we're talking about is the probability of success on a single trial. The test
statistic is:

$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$

The critical value is found from the normal table, or from the bottom row of the t-
table.

The steps involved in the hypothesis testing remain the same. The only thing that
changes is the formula for calculating the test statistic and perhaps the distribution
which is used.

General Pattern
Notice the general pattern of these test statistics is (observed - expected) / standard
deviation.
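A sketch of the single-proportion test following the steps above, with made-up data: H0: p = 0.5 against a right-tailed H1: p > 0.5, 60 successes in 100 trials, alpha = 0.05. Note that p and q in the standard error come from the null hypothesis, not from p hat.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(x, n, p0):
    """Test statistic for H0: p = p0; p0 and q0 come from the null hypothesis."""
    q0 = 1 - p0
    if n * p0 < 5 or n * q0 < 5:
        raise ValueError("np and nq must both be at least 5 for the normal approximation")
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * q0 / n)   # (observed - expected) / std. error

z = one_proportion_z(x=60, n=100, p0=0.5)
cv = NormalDist().inv_cdf(1 - 0.05)           # right-tailed critical value
print(round(z, 3), round(cv, 3), z > cv)
```

The test statistic is 2.0 and the critical value is 1.645, so the decision is to reject H0.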


Stats: Probability Values


Classical Approach
The Classical Approach to hypothesis testing is to compare a test statistic and a
critical value. It is best used for distributions which give areas and require you to look
up the critical value (like the Student's t distribution) rather than distributions which
have you look up a test statistic to find an area (like the normal distribution).

The Classical Approach also has three different decision rules, depending on whether
it is a left tail, right tail, or two tail test.

One problem with the Classical Approach is that if a different level of significance is
desired, a different critical value must be read from the table.

P-Value Approach
The P-Value Approach, short for Probability Value, approaches hypothesis testing
in a different manner. Instead of comparing z-scores or t-scores as in the classical
approach, you're comparing probabilities, or areas.

The level of significance (alpha) is the area in the critical region. That is, the area in
the tails to the right or left of the critical values.

The p-value is the area to the right or left of the test statistic. If it is a two tail test, then
look up the probability in one tail and double it.

If the test statistic is in the critical region, then the p-value will be less than the level
of significance. It does not matter whether it is a left tail, right tail, or two tail test.
This rule always holds.

Reject the null hypothesis if the p-value is less than the level of significance.

You will fail to reject the null hypothesis if the p-value is greater than or equal to the
level of significance.

The p-value approach is best suited for the normal distribution when doing
calculations by hand. However, many statistical packages will give the p-value but not
the critical value. This is because it is easier for a computer or calculator to find the
probability than it is to find the critical value.
Another benefit of the p-value is that the statistician immediately knows at what level
the testing becomes significant. That is, a p-value of 0.06 would lead to rejecting the null
hypothesis at a 0.10 level of significance, but to failing to reject it at a 0.05 level of
significance.
Warning: Do not decide on the level of significance after calculating the test statistic
and finding the p-value.

Here is a proportion to help you keep the order straight. Any proportion equivalent to
the following statement is correct.

The test statistic is to the p-value as the critical value is to the level of significance.
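A sketch of the p-value approach for a z test statistic, using statistics.NormalDist.cdf for the area to the left. The test statistic z = 2.0 is a made-up value; for a two-tailed test the sketch finds the area in one tail and doubles it, as described above. Alpha is fixed before the p-value is computed, per the warning.

```python
from statistics import NormalDist

def p_value(z, tail):
    """p-value for a z test statistic; tail is 'left', 'right', or 'two'."""
    std_normal = NormalDist()
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))   # one tail, doubled
    raise ValueError("tail must be 'left', 'right', or 'two'")

alpha = 0.05                      # chosen before looking at the data
for tail in ("right", "two"):
    p = p_value(2.0, tail)
    print(tail, round(p, 4), "reject H0" if p < alpha else "fail to reject H0")
```

Here the right-tail p-value is about 0.0228 and the two-tail p-value about 0.0455; both are below 0.05, so H0 is rejected either way.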


Stats: Two Parameter Testing chp10

Definitions
Dependent Samples
Samples in which the subjects are paired or matched in some way. Dependent
samples must have the same sample size, but it is possible to have the same
sample size without being dependent.
Independent Samples
Samples which are not related. Independent samples may or may not have the same
sample size.
Pooled Estimate of the Variance
A weighted average of the two sample variances when the variances are
equal. The variances are "close enough" to be considered equal, but not
exactly the same, so this pooled estimate brings the two together to find the
average variance.

Stats: Dependent Means

There are two possible cases when testing two population means, the dependent case
and the independent case. Most books treat the independent case first, but I'm putting
the dependent case first because it follows immediately from the test for a single
population mean in the previous chapter.

The Mean of the Difference:


The idea with the dependent case is to create a new variable, D, which is the
difference between the paired values. You will then be testing the mean of this new
variable.

Here are some steps to help you accomplish the hypothesis testing

1. Write down the original claim in simple terms. For example: After > Before.
2. Move everything to one side: After - Before > 0.
3. Call the difference you have on the left side D: D = After - Before > 0.
4. Convert to proper notation: $\mu_D > 0$.
5. Compute the new variable D and be sure to follow the order you have defined
in step 3. Do not simply take the smaller away from the larger. From this point,
you can think of having a new set of values. Technically, they are called D, but
you can think of them as x. The original values from the two samples can be
discarded.
6. Find the mean and standard deviation of the variable D. Use these as the values
in the t-test from chapter 9.
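The steps above can be sketched in Python with made-up before/after data. The differences are computed as After - Before in that order, and then the one-sample t statistic is applied to D with H0: mu_D = 0 and df = n - 1.

```python
from math import sqrt
from statistics import mean, stdev

before = [10, 12, 9, 11, 13]        # made-up paired data
after  = [12, 14, 9, 13, 15]

# Steps 3 and 5: D = After - Before, in that order, for every pair.
d = [a - b for a, b in zip(after, before)]

# Step 6: treat D as a single sample and use the one-sample t test on mu_D.
d_bar, s_d, n = mean(d), stdev(d), len(d)
t = (d_bar - 0) / (s_d / sqrt(n))   # H0: mu_D = 0
df = n - 1
print(d, d_bar, round(s_d, 4), round(t, 3), df)
```

Here D = [2, 2, 0, 2, 2], t = 4.0 and df = 4. For a right-tailed test at alpha = 0.05, the table's df = 4 row gives 2.132, so t falls in the critical region.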

Stats: Independent Means

Sums and Differences of Independent Variables


Independent variables can be combined to form new variables. The mean and variance
of the combination can be found from the means and the variances of the original
variables.

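Combination of Variables: In English (Melodic Mathematics)

- $\mu_{X+Y} = \mu_X + \mu_Y$: The mean of a sum is the sum of the means.
- $\mu_{X-Y} = \mu_X - \mu_Y$: The mean of a difference is the difference of the means.
- $\sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y$: The variance of a sum is the sum of the variances.
- $\sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y$: The variance of a difference is the sum of the
  variances.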

The Difference of the Means:


Since we are combining two variables by subtraction, the important rules from the
table above are that the mean of the difference is the difference of the means and the
variance of the difference is the sum of the variances.

It is important to note that the variance of the difference is the sum of the variances; it
is not true that the standard deviation of the difference is the sum of the standard
deviations. When we go to find the standard error, we must combine variances to do so.
Also, you're probably wondering why the variance of the difference is the sum of the
variances instead of the difference of the variances. Since the values are squared, the
negative associated with the second variable becomes positive, and it becomes the
sum of the variances (that is, $\sigma^2_{X-Y} = \sigma^2_X + (-1)^2\sigma^2_Y$). Also,
variances can't be negative, and if you took the difference of the variances, it could be
negative.
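A small exact check of these rules, using two independent fair dice as a made-up example: enumerating all 36 equally likely outcomes with Fraction values shows that the variance of X - Y equals the sum of the two variances, while the standard deviation of the difference is not the sum of the standard deviations.

```python
from fractions import Fraction
from itertools import product
from math import sqrt
from statistics import pvariance

faces = [Fraction(k) for k in range(1, 7)]       # one fair die
var_die = pvariance(faces)                        # 35/12

# All 36 equally likely outcomes of two independent dice, X - Y.
differences = [x - y for x, y in product(faces, faces)]
var_diff = pvariance(differences)

print(var_die, var_diff)                          # 35/12 35/6
print(sqrt(var_diff), 2 * sqrt(var_die))          # sd of difference vs. sum of sds
```

The variance of the difference is 35/6 = 35/12 + 35/12, but its standard deviation (about 2.42) is smaller than the sum of the standard deviations (about 3.42).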

Population Variances Known

When the population variances are known, the difference of the means has a normal
distribution. The variance of the difference is the sum of the variances divided by the
sample sizes. This makes sense, hopefully, because according to the central limit
theorem, the variance of the sampling distribution of the sample means is the variance
divided by the sample size, so what we are doing is adding the variance of each mean
together. The test statistic is shown:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
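A minimal sketch of this test statistic; the function name and the numbers (x bar 1 = 52, x bar 2 = 50, variances 16 and 25, sample sizes 40 and 50) are made up for illustration. The hypothesized difference mu_diff defaults to 0, the usual H0.

```python
from math import sqrt

def two_mean_z(x1, x2, var1, var2, n1, n2, mu_diff=0.0):
    """z for the difference of means when the population variances are known."""
    se = sqrt(var1 / n1 + var2 / n2)   # add the variances of the two means
    return ((x1 - x2) - mu_diff) / se

z = two_mean_z(x1=52, x2=50, var1=16, var2=25, n1=40, n2=50)
print(round(z, 3))
```

The standard error is sqrt(0.4 + 0.5), about 0.949, so z is about 2.108. Passing sample variances instead gives the large-sample statistic described next.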

Population Variances Unknown, but both sample sizes large

When the population variances aren't known, the difference of the means has a
Student's t distribution. However, if both sample sizes are large enough, then you will
be using the normal row from the t-table, so your book lumps this under the normal
distribution, rather than the t-distribution. This gives us the chance to work the problem
without knowing if the population variances are equal or not. The test statistic is
shown, and is identical to above, except the sample variances are used instead of the
population variances:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

Population Variances Unknown, unequal with small sample sizes

Ok, you're probably wondering how do you know if the variances are equal or not if
you don't know what they are. Some books teach the F-test to test the equality of two
variances, and if your book does that, then you should use the F-test to see. Other
books (statisticians) argue that if you do the F-test first to see if the variances are
equal, and then use the same level of significance to perform the t-test to test the
difference of the means, the overall level of significance isn't the same. So, the
Bluman text tells the student whether or not the variances are equal and the Triola
text.
Since you don't know the population variances, you're going to be using a Student's t
distribution. Since the variances are unequal, there is no attempt made to average them
together as we will in the next situation. The degrees of freedom are the smaller of the
two degrees of freedom (n-1 for each). The "min" function means take the minimum, or
smaller, of the two values. Otherwise, the formula is the same as we used with large
sample sizes:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad df = \min(n_1 - 1,\; n_2 - 1)$$
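A sketch of the unequal-variance case with the conservative min(n1 - 1, n2 - 1) degrees of freedom described above. The summary statistics (x bar 1 = 20, s1 = 4, n1 = 10; x bar 2 = 17, s2 = 3, n2 = 12) are made up, and the function name is hypothetical.

```python
from math import sqrt

def two_mean_t_unequal(x1, s1, n1, x2, s2, n2, mu_diff=0.0):
    """t statistic without pooling; conservative df = min(n1 - 1, n2 - 1)."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    t = ((x1 - x2) - mu_diff) / se
    df = min(n1 - 1, n2 - 1)
    return t, df

t, df = two_mean_t_unequal(x1=20, s1=4, n1=10, x2=17, s2=3, n2=12)
print(round(t, 3), df)
```

This gives t of about 1.957 with df = 9, which would then be compared with the df = 9 row of the t table.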

Population Variances Unknown but equal with small sample sizes

If the variances are equal, then an effort is made to average them together. Now, equal
does not mean identical. It is possible for two variances to be statistically equal but be
numerically different. We will find a pooled estimate of the variance, which is simply
the weighted mean of the variances. The weighting factors are the degrees of freedom:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Once the pooled estimate of the variance is computed, this mean (average) variance is
used in the place of the individual sample variances. Otherwise, the formula is the same
as before. The degrees of freedom are the sum of the individual degrees of freedom:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}, \qquad df = (n_1 - 1) + (n_2 - 1) = n_1 + n_2 - 2$$
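The pooled version, sketched on the same made-up summary statistics as the unequal-variance case so the two can be compared. The pooled variance weights each sample variance by its degrees of freedom, and df is their sum.

```python
from math import sqrt

def pooled_t(x1, s1, n1, x2, s2, n2, mu_diff=0.0):
    """t statistic using the pooled estimate of the variance."""
    df1, df2 = n1 - 1, n2 - 1
    sp2 = (df1 * s1**2 + df2 * s2**2) / (df1 + df2)   # weighted by degrees of freedom
    se = sqrt(sp2 / n1 + sp2 / n2)
    t = ((x1 - x2) - mu_diff) / se
    return sp2, t, df1 + df2

sp2, t, df = pooled_t(x1=20, s1=4, n1=10, x2=17, s2=3, n2=12)
print(round(sp2, 2), round(t, 3), df)
```

Here s_p squared = (9(16) + 11(9))/20 = 12.15, t is about 2.010, and df = 20.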


Stats: Two Proportions

Remember that the normal distribution can be used to approximate the binomial
distribution in certain cases. Specifically, the approximation was considered good
when np and nq were both at least 5. Well, now, we're talking about two proportions,
so np and nq must be at least 5 for both samples.
We don't have a way to test the values of two proportions directly; what we can test is
the difference between the proportions. So, much like the test for two means from
independent populations, we will be looking at the difference of the proportions.

We will also be computing an average proportion and calling it p-bar. It is the total
number of successes divided by the total number of trials. The necessary definitions
are:

$$\hat{p}_1 = \frac{x_1}{n_1}, \quad \hat{p}_2 = \frac{x_2}{n_2}, \quad \bar{p} = \frac{x_1 + x_2}{n_1 + n_2}, \quad \bar{q} = 1 - \bar{p}$$

The test statistic has the same general pattern as before (observed minus expected
divided by standard error). The test statistic used here is similar to that for a single
population proportion, except the difference of the proportions is used instead of a
single proportion, and the value of p-bar is used instead of p in the standard error
portion.

Since we're using the normal approximation to the binomial, the difference of
proportions has a normal distribution. The test statistic is:

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\bar{p}\bar{q}}{n_1} + \dfrac{\bar{p}\bar{q}}{n_2}}}$$

Some people will be tempted to try to simplify the denominator of this test statistic
incorrectly. It can be simplified, but the correct simplification is not to simply place
the product of p-bar and q-bar over the sum of the n's. Remember that to add
fractions, you must have a common denominator, that is why this simplification is
incorrect.

The correct simplification would be to factor a p-bar and q-bar out of the two
expressions:

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

This is usually the formula given, because it is easier to calculate, but I wanted to give
it the other way first so you could compare it to the other formulas and see how similar
they all are.
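A sketch of the two-proportion test statistic for H0: p1 = p2 (so p1 - p2 = 0), using the factored form of the denominator. The counts (45 of 100 versus 30 of 100) are made up, and the function name is hypothetical.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """z for H0: p1 = p2, using the pooled proportion p-bar."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)      # total successes / total trials
    q_bar = 1 - p_bar
    se = sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

z = two_proportion_z(x1=45, n1=100, x2=30, n2=100)
print(round(z, 3))
```

Here p-bar = 75/200 = 0.375 and z is about 2.191. Writing the denominator as sqrt(p_bar*q_bar/n1 + p_bar*q_bar/n2) gives the same value.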

Stats: Correlation & Regression chp11

Definitions
Coefficient of Determination
The percent of the variation that can be explained by the regression equation
Correlation
A method used to determine if a relationship between variables exists
Correlation Coefficient
A statistic or parameter which measures the strength and direction of a
relationship between two variables
Dependent Variable
A variable in correlation or regression that cannot be controlled, that is, it
depends on the independent variable.
Independent Variable
A variable in correlation or regression which can be controlled, that is, it is
independent of the other variable.
Pearson Product Moment Correlation Coefficient
A measure of the strength and direction of the linear relationship between
two variables
Regression
A method used to describe the relationship between two variables.
Regression Line
The best fit line.
Scatter Plot
A plot of the data values on a coordinate system. The independent variable is
graphed along the x-axis and the dependent variable along the y-axis.
Standard Error of the Estimate
The standard deviation of the observed values about the predicted values

TI-82: Scatter Plots, Regression Lines
You can use the calculator to draw scatter plots.

See the instructions on using the calculator to do statistics and lists. This provides an
overview as well as some helpful advice for working with statistics on the calculator.

Scatter Plots
1. Enter the x values into L1 and the y values into L2.
2. Go to Stat Plot (2nd y=)
3. Turn Plot 1 on
4. Choose the type to be scatter plot (1st type)
5. Set Xlist to L1
6. Set Ylist to L2
7. Set the Mark to any of the three choices
8. Zoom to the Stat setting (#9)

Note, the Ylist and Mark won't show up until you select a scatter plot

Regression Lines
1. Set up the scatter plot as instructed above
2. Go into the Stats, Calc, Setup screen
3. Set up the 2-Var Stats so that: Xlist = L1, Ylist = L2, Freq = 1
4. Calculate the Linear Regression (ax+b) (#5)
5. Go into the Plot screen.
6. Position the cursor on the Y1 plot and hit CLEAR to erase it.
7. While still in the Y1 data entry field, go to the VARS, STATS, EQ screen and
choose option 7 which is the regression equation
8. Hit GRAPH

Regression Lines, part 2


The above technique works, but it requires that you change the equation being
graphed every time you change problems. It is possible to stick the regression
equation "ax+b" into the Y1 plot and then it will automatically graph the correct
regression equation each time.
Do this once

1. Set up the scatter plot as instructed above
2. Go into the Plot screen.
3. Position the cursor on the Y1 plot and hit CLEAR to erase it.
4. Enter a*x+b into the function. The a and b can be found under the VARS,
STATS, EQ screen.

Do this for each graph

1. Go into the Stats, Calc, Setup screen
2. Set up the 2-Var Stats so that: Xlist = L1, Ylist = L2, Freq = 1
3. Calculate the Linear Regression (ax+b) (#5)
4. Hit the GRAPH key

It is important that you calculate the linear regression variables before trying to graph
the regression line. If you change the data in the lists or have not calculated the linear
regression equation, then you will get an "ERR: Undefined" when you try to graph
the data.

Be sure to turn off the stat plots and/or the Y1 plot when you need to graph other
data.

Stats: Correlation

Sum of Squares
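We introduced a notation earlier in the course called the sum of squares. This notation
was the SS notation, and will make these formulas much easier to work with.

$$SS(x) = \sum x^2 - \frac{(\sum x)^2}{n}$$
$$SS(y) = \sum y^2 - \frac{(\sum y)^2}{n}$$
$$SS(xy) = \sum xy - \frac{(\sum x)(\sum y)}{n}$$

Notice these are all the same pattern. SS(x) could be written as

$$SS(x) = \sum x \cdot x - \frac{(\sum x)(\sum x)}{n}$$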
Also note that $SS(x) = \sum (x - \bar{x})^2 = (n-1)s_x^2$, so the sample variance is
$s_x^2 = SS(x)/(n-1)$.
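Since all three SS formulas follow one pattern, a single helper can compute them. The following is a minimal Python sketch; the helper name `ss` and the five data pairs are hypothetical and only illustrate the formulas.

```python
def ss(u, v):
    """Sum of squares SS(uv) = sum(u*v) - (sum u)(sum v) / n.

    SS(x) is ss(x, x), SS(y) is ss(y, y), and SS(xy) is ss(x, y).
    """
    if len(u) != len(v) or len(u) == 0:
        raise ValueError("lists must be non-empty and of equal length")
    n = len(u)
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n


# Hypothetical sample data, used only to illustrate the formulas
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

ss_x, ss_y, ss_xy = ss(x, x), ss(y, y), ss(x, y)
print(ss_x, ss_y, ss_xy)  # 10.0 6.0 6.0

# SS(x) is also the sum of squared deviations from the mean, i.e. (n - 1) * s^2
mean_x = sum(x) / len(x)
dev_ss_x = sum((xi - mean_x) ** 2 for xi in x)
print(dev_ss_x)  # 10.0
```

Writing SS(x) as `ss(x, x)` mirrors the "same pattern" observation: only the arguments change.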

Pearson's Correlation Coefficient


This is a measure of linear correlation. The population parameter is denoted by the
Greek letter rho and the sample statistic is denoted by the Roman letter r.

Here are some properties of r

• r only measures the strength of a linear relationship. There are other kinds of
relationships besides linear.
• r is always between -1 and 1 inclusive. -1 means perfect negative linear
correlation and +1 means perfect positive linear correlation.
• r has the same sign as the slope of the regression (best fit) line.
• r does not change if the independent (x) and dependent (y) variables are
interchanged.
• r does not change if the scale on either variable is changed. You may add or
subtract a constant to/from all the x-values or y-values, or multiply or divide them
by a positive constant, without changing the value of r. (Multiplying by a negative
constant flips the sign of r.)
• r itself does not have a Student's t distribution; rather, when rho = 0, the test
statistic computed from r has a Student's t distribution with n-2 degrees of freedom.

Here is the formula for r. Don't worry about it; we won't be finding it this way.

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$

This formula can be simplified through some simple algebra and then some substitutions
using the SS notation discussed earlier. If you divide the numerator and denominator
by n, then you get something which is hopefully starting to look familiar.

$$r = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sqrt{\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]\left[\sum y^2 - \frac{(\sum y)^2}{n}\right]}}$$

Each of these values has been seen before in the Sum of Squares notation section. So,
the linear correlation coefficient can be written in terms of sums of squares.

$$r = \frac{SS(xy)}{\sqrt{SS(x) \cdot SS(y)}}$$

This is the formula that we would be using for calculating the linear correlation
coefficient if we were doing it by hand. Luckily for us, the TI-82 has this calculation
built into it, and we won't have to do it by hand at all.

Hypothesis Testing
The claim we will be testing is "There is significant linear correlation."

The Greek letter for r is rho, so the parameter used for linear correlation is rho.

• H0: rho = 0
• H1: rho <> 0

The test statistic computed from r has a t distribution with n-2 degrees of freedom
and is given by:

$$t = r\sqrt{\frac{n-2}{1-r^2}}$$

Now, there are n-2 degrees of freedom this time. This is a difference from before. As
an over-simplification, you subtract one degree of freedom for each variable, and
since there are 2 variables, the degrees of freedom are n-2.

This doesn't look like our familiar pattern, test statistic = (observed value - expected
value) / standard error.

If you consider the standard error for r to be

$$s_r = \sqrt{\frac{1-r^2}{n-2}}$$

the formula for the test statistic is

$$t = \frac{r - \rho}{\sqrt{\frac{1-r^2}{n-2}}}$$

which does look like the pattern we're looking for.

Remember that hypothesis testing is always done under the assumption that the null
hypothesis is true.

Since H0 is rho = 0, this formula is equivalent to the one given in the book.
Additional Note: 1 - r^2 is later identified as the coefficient of non-determination.
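A minimal Python sketch of the t test statistic for H0: rho = 0. The function name and the example value r = 0.7746 with n = 5 are hypothetical; the critical value still has to come from a t table.

```python
import math


def correlation_t(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom.

    Uses t = r * sqrt((n - 2) / (1 - r^2)), which equals
    (r - 0) / s_r with s_r = sqrt((1 - r^2) / (n - 2)).
    """
    if n < 3:
        raise ValueError("need at least 3 data pairs")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


# Hypothetical example: r = 0.7746 from n = 5 pairs
t, df = correlation_t(0.7746, 5)
print(round(t, 3), df)  # 2.121 3
```

With 3 degrees of freedom, the two-tailed critical value at alpha = 0.05 from a t table is 3.182, so a test statistic of about 2.121 would not be significant.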

Hypothesis Testing Revisited


If you are testing to see if there is significant linear correlation (a two-tailed test), then
there is another way to perform the hypothesis testing. There is a table of critical
values for the Pearson product moment correlation coefficient (PPMC) given in the
textbook. The degrees of freedom are n-2.

The test statistic in this case is simply the value of r. You compare the absolute value
of r (don't worry if it's negative or positive) to the critical value in the table. If the
absolute value of r is greater than the critical value, then there is significant linear
correlation. Furthermore, you are able to say there is significant positive linear
correlation if the original value of r is positive, and significant negative linear
correlation if the original value of r is negative.

This is the most common technique used. However, the first technique, with the
t-value, must be used if it is not a two-tail test, or if a different level of significance
(other than 0.01 or 0.05) is desired.
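The PPMC table and the t test give the same answer: solving t = r sqrt(df / (1 - r^2)) for r gives r = t / sqrt(t^2 + df). A Python sketch, assuming you already have the two-tailed critical t value from a t table (3.182 is the alpha = 0.05, df = 3 value); the function names and the example r are hypothetical.

```python
import math


def critical_r(t_crit, df):
    """Convert a two-tailed critical t value into a critical value for |r|."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


def significant_correlation(r, n, t_crit):
    """Compare |r| with the critical value for df = n - 2."""
    r_crit = critical_r(t_crit, n - 2)
    return abs(r) > r_crit, r_crit


is_sig, r_crit = significant_correlation(0.7746, 5, 3.182)
print(is_sig, round(r_crit, 3))  # False 0.878
```

Because the absolute value of r is compared, a strongly negative r such as -0.95 would be reported as significant (negative) correlation.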

Causation
If there is a significant linear correlation between two variables, then one of five
situations can be true.

• There is a direct cause and effect relationship
• There is a reverse cause and effect relationship
• The relationship may be caused by a third variable
• The relationship may be caused by complex interactions of several variables
• The relationship may be coincidental


Stats: Regression

The idea behind regression is that when there is significant linear correlation, you can
use a line to estimate the value of the dependent variable for certain values of the
independent variable.

The regression equation should only be used

• When there is significant linear correlation. That is, when you reject the null
hypothesis that rho=0 in a correlation hypothesis test.
• When the value of the independent variable being used in the estimation is close to
the original values. That is, you should not use a regression equation obtained
using x's between 10 and 20 to estimate y when x is 200.
• With the same population. That is, if x is the height of a male, and y is the weight
of a male, then you shouldn't use the regression equation to estimate the weight
of a female.
• Within the same time frame. If data is from the 1960s, it probably isn't valid in
the 1990s.

Assuming that you've decided that you can have a regression equation because there is
significant linear correlation between the two variables, the equation becomes:
y' = ax + b or y' = a + bx (some books use y-hat instead of y-prime). The Bluman text
uses the second form; however, more people are familiar with the notation
y = mx + b, so I will use the first.

a is the slope of the regression line:

$$a = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$

b is the y-intercept of the regression line:

$$b = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2}$$

The regression line is sometimes called the "line of best fit" or the "best fit line".

Since it "best fits" the data, it makes sense that the line passes through the means.

The regression equation is the line with slope a passing through the point $(\bar{x}, \bar{y})$.

Another way to write the equation would be

$$y' - \bar{y} = a(x - \bar{x})$$

Apply just a little algebra, and we have the formulas for a and b that we would use (if
we were stranded on a desert island without the TI-82) ...

$$a = \frac{SS(xy)}{SS(x)}, \qquad b = \bar{y} - a\bar{x}$$

It also turns out that the slope of the regression line can be written as
$a = r \frac{s_y}{s_x}$. Since the standard deviations can't be negative, the sign of the
slope is determined by the sign of the correlation coefficient. This agrees with the
statement made earlier that the slope of the regression line will have the same sign as
the correlation coefficient.
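For the desert-island case, here is a minimal Python sketch of a = SS(xy)/SS(x) and b = y-bar - a x-bar, with a check of the a = r s_y / s_x form. The function names and data are hypothetical; it uses only the standard `statistics` module.

```python
import math
import statistics


def ss(u, v):
    """SS(uv) = sum(u*v) - (sum u)(sum v) / n."""
    n = len(u)
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n


def regression_line(x, y):
    """Return slope a and intercept b of y' = a*x + b."""
    a = ss(x, y) / ss(x, x)
    b = statistics.mean(y) - a * statistics.mean(x)  # line passes through the means
    return a, b


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
a, b = regression_line(x, y)
print(round(a, 4), round(b, 4))  # 0.6 2.2

# The slope also equals r * s_y / s_x (sample standard deviations)
r = ss(x, y) / math.sqrt(ss(x, x) * ss(y, y))
print(round(r * statistics.stdev(y) / statistics.stdev(x), 4))  # 0.6
```

Computing b from the means, rather than from the longer raw-sum formula, follows the point-slope form above.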

TI-82
Luckily, the TI-82 will find these values for us (isn't it a wonderful calculator?). We
can also use the TI-82 to plot the regression line on the scatter plot.


TI-82: Correlation / Regression


See the instructions on using the calculator to do statistics and lists. This
provides an overview as well as some helpful advice for working with statistics
on the calculator.

Calculating Values
1. Enter the data. Put the x-values into list 1 and the y-values into list 2.
2. Go into the Stats, Calc, Setup screen
3. Set up the 2-Var Stats so that: Xlist = L1, Ylist = L2, Freq = 1
4. Calculate the Linear Regression (ax+b) (#5)

This screen will give you the sample linear correlation coefficient, r; the slope
of the regression equation, a; and the y-intercept of the regression equation,
b.

Just record the value of r.

To write the regression equation, substitute the values of a and b into the
equation "y-hat = ax+b".

To find the coefficient of determination, square r. You can find the variable r
under VARS, STATS, EQ, r (#6).

Stats: Coefficient of Determination

Coefficient of Determination
The coefficient of determination is ...

• the percent of the variation that can be explained by the regression equation.
• the explained variation divided by the total variation.
• the square of r.

What's all this variation stuff?

Every sample has some variation in it (unless all the values are identical, and that's
unlikely to happen). The total variation is made up of two parts, the part that can be
explained by the regression equation and the part that can't be explained by the
regression equation.
Well, the ratio of the explained variation to the total variation is a measure of how
good the regression line is. If the regression line passed through every point on the
scatter plot exactly, it would be able to explain all of the variation. The further the line
is from the points, the less it is able to explain.
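The split into explained and unexplained variation can be checked numerically. A Python sketch, assuming the line y' = 0.6x + 2.2 has already been found as the least-squares line for this hypothetical data (the decomposition total = explained + unexplained only holds for the least-squares line).

```python
def variation_breakdown(x, y, a, b):
    """Split the variation of y around its mean for the line y' = a*x + b.

    Assumes (a, b) is the least-squares line, so total = explained + unexplained.
    """
    mean_y = sum(y) / len(y)
    predicted = [a * xi + b for xi in x]
    total = sum((yi - mean_y) ** 2 for yi in y)
    explained = sum((pi - mean_y) ** 2 for pi in predicted)
    unexplained = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    return total, explained, unexplained


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
a, b = 0.6, 2.2  # least-squares slope and intercept for this data
total, explained, unexplained = variation_breakdown(x, y, a, b)
print(round(total, 4), round(explained, 4), round(unexplained, 4))  # 6.0 3.6 2.4
print(round(explained / total, 4))  # 0.6, which is r^2 for this data
```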

Coefficient of Non-Determination
The coefficient of non-determination is ...

• The percent of variation which is unexplained by the regression equation
• The unexplained variation divided by the total variation
• 1 - r^2

Standard Error of the Estimate


The coefficient of non-determination was used in the t-test to see if there was
significant linear correlation. It was in the numerator of the standard error formula.

The standard error of the estimate is the square root of the unexplained variation
(the coefficient of non-determination times the total variation) divided by its degrees
of freedom, n-2.

$$s_{est} = \sqrt{\frac{\sum (y - y')^2}{n-2}} = \sqrt{\frac{(1-r^2)\,SS(y)}{n-2}}$$
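Both forms of the standard error of the estimate give the same number. A minimal Python sketch, assuming hypothetical data whose least-squares line is y' = 0.6x + 2.2, with r^2 = 0.6 and SS(y) = 6.

```python
import math


def standard_error_of_estimate(x, y, a, b):
    """s_est = sqrt(sum((y - y')^2) / (n - 2)) for the line y' = a*x + b."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 data pairs")
    unexplained = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(unexplained / (n - 2))


x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]
s_est = standard_error_of_estimate(x, y, 0.6, 2.2)  # least-squares line for this data

# Same value from the coefficient of non-determination: r^2 = 0.6, SS(y) = 6
r_squared, ss_y, n = 0.6, 6.0, len(x)
s_est_alt = math.sqrt((1 - r_squared) * ss_y / (n - 2))
print(round(s_est, 4), round(s_est_alt, 4))  # 0.8944 0.8944
```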

Confidence Interval for y'


The following only works when the sample size is large.
Large in this instance is usually taken to be more than 100.
We're not going to cover this in class, but it is provided here for
your information. The maximum error of the estimate is
given, and this maximum error of the estimate is subtracted
from and added to the estimated value of y.

Stats: Chi-Square chp12


Definitions
Chi-square distribution
A distribution obtained by multiplying the ratio of sample variance to
population variance by the degrees of freedom when random samples are
selected from a normally distributed population
Contingency Table
Data arranged in table form for the chi-square independence test
Expected Frequency
The frequencies obtained by calculation.
Goodness-of-fit Test
A test to see if a sample comes from a population with the given distribution.
Independence Test
A test to see if the row and column variables are independent.
Observed Frequency
The frequencies obtained by observation. These are the sample frequencies.


Stats: Chi-Square Distribution

The chi-square (χ²) distribution is obtained from the values of the ratio of the sample
variance and population variance multiplied by the degrees of freedom. This occurs
when the population is normally distributed with population variance sigma^2.

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

Properties of the Chi-Square


• Chi-square is non-negative. It is the ratio of two non-negative values, so it
must be non-negative itself.
• Chi-square is non-symmetric.
• There are many different chi-square distributions, one for each degree of
freedom.
• The degrees of freedom when working with a single population variance are n-1.

Chi-Square Probabilities
Since the chi-square distribution isn't symmetric, the method for looking up left-tail
values is different from the method for looking up right tail values.

• Area to the right - just use the area given.
• Area to the left - the table requires the area to the right, so subtract the given
area from one and look this area up in the table.
• Area in both tails - divide the area by two. Look up this area for the right
critical value and one minus this area for the left critical value.

DF which aren't in the table


When the degrees of freedom aren't listed in the table, there are a couple of choices
that you have.

• You can interpolate. This is probably the more accurate way. Interpolation
involves estimating the critical value by figuring how far the given degrees of
freedom are between the two df in the table and going that far between the
critical values in the table. Most people born in the 1970s didn't have to learn
interpolation in high school because they had calculators which would do
logarithms (we had to use tables in the "good old" days).
• You can go with the critical value which is less likely to cause you to reject in
error (type I error). For a right tail test, this is the critical value further to the
right (larger). For a left tail test, it is the value further to the left (smaller). For a
two-tail test, it's the value further to the left and the value further to the right.
Note, it is not the column with the degrees of freedom further to the right, it's
the critical value which is further to the right. The Bluman text has this wrong
on page 422. The guideline is right, the instructions are wrong.
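Both options can be sketched in a few lines of Python. The function names are hypothetical; the example uses the right-tail alpha = 0.05 chi-square values 43.773 (df = 30) and 55.758 (df = 40) from a standard table to estimate the value for df = 35.

```python
def interpolate_critical_value(df, df_low, cv_low, df_high, cv_high):
    """Linear interpolation of a critical value between two table rows.

    df_low and df_high are degrees of freedom listed in the table,
    and cv_low, cv_high are the critical values listed for them.
    """
    if not df_low < df_high:
        raise ValueError("df_low must be smaller than df_high")
    if not df_low <= df <= df_high:
        raise ValueError("df must lie between the two table rows")
    fraction = (df - df_low) / (df_high - df_low)  # how far df is between the rows
    return cv_low + fraction * (cv_high - cv_low)


def conservative_right_critical_value(cv_low, cv_high):
    """For a right tail test, the safer choice is the larger critical value."""
    return max(cv_low, cv_high)


estimate = interpolate_critical_value(35, 30, 43.773, 40, 55.758)
print(round(estimate, 4))  # 49.7655
print(conservative_right_critical_value(43.773, 55.758))  # 55.758
```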


Stats: Single Population Variance


The variable $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$ has a chi-square distribution if the
population has a normal distribution. The degrees of freedom are n-1. We can use this
to test the population variance under certain conditions.

Conditions for testing


• The population has a normal distribution
• The data is from a random sample
• The observations must be independent of each other
• The test statistic has a chi-square distribution with n-1 degrees of freedom and
is given by:

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

Testing is done in the same manner as before. Remember, all hypothesis testing is
done under the assumption the null hypothesis is true.
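A minimal Python sketch of the test statistic for a single population variance. The function name and the numbers (n = 10, s^2 = 4, claimed sigma^2 = 2.5) are hypothetical.

```python
def variance_test_statistic(sample_variance, n, claimed_variance):
    """Chi-square statistic (n - 1) s^2 / sigma^2 with n - 1 degrees of freedom."""
    if n < 2 or claimed_variance <= 0:
        raise ValueError("need n >= 2 and a positive claimed variance")
    df = n - 1
    return df * sample_variance / claimed_variance, df


# Hypothetical example: n = 10, s^2 = 4, H0: sigma^2 = 2.5
chi2, df = variance_test_statistic(4, 10, 2.5)
print(chi2, df)  # 14.4 9
```

For a right tail test at alpha = 0.05 with 9 degrees of freedom, the table critical value is 16.919, so a test statistic of 14.4 would not lead to rejecting H0.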

Confidence Intervals
If you solve the test statistic formula for the population variance, you get:

$$\sigma^2 = \frac{(n-1)s^2}{\chi^2}$$

1. Find the two critical values (alpha/2 and 1-alpha/2)
2. Compute the value for the population variance given above, once with each
critical value.
3. Place the population variance between the two values calculated in step 2 (put
the smaller one first).

Note, the left-hand endpoint of the confidence interval comes when the right critical
value is used and the right-hand endpoint of the confidence interval comes when the
left critical value is used. This is because the critical values are in the denominator and
so dividing by the larger critical value (right tail) gives the smaller endpoint.
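The three steps above can be sketched in Python. The function name and the example are hypothetical (n = 10, s^2 = 4, 95% confidence); the critical values 19.023 and 2.700 are the 9 df chi-square table values with 0.025 and 0.975 to the right.

```python
def variance_confidence_interval(sample_variance, n, chi2_right, chi2_left):
    """Confidence interval for sigma^2 using (n - 1) s^2 / chi^2.

    chi2_right has alpha/2 to its right and chi2_left has 1 - alpha/2
    to its right, both with n - 1 degrees of freedom.
    """
    numerator = (n - 1) * sample_variance
    lower = numerator / chi2_right  # larger critical value gives the smaller endpoint
    upper = numerator / chi2_left
    return lower, upper


low, high = variance_confidence_interval(4, 10, 19.023, 2.700)
print(round(low, 3), round(high, 3))  # 1.892 13.333
```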

Stats: Goodness-of-fit Test


The idea behind the chi-square goodness-of-fit test is to see if the sample comes from
the population with the claimed distribution. Another way of looking at that is to ask
if the frequency distribution fits a specific pattern.

Two values are involved, an observed value, which is the frequency of a category
from a sample, and the expected frequency, which is calculated based upon the
claimed distribution. The derivation of the formula is very similar to that of the
variance which was done earlier (chapter 2 or 3).

The idea is that if the observed frequency is really close to the claimed (expected)
frequency, then the square of the deviations will be small. The square of the deviation
is divided by the expected frequency to weight frequencies. A difference of 10 may be
very significant if 12 was the expected frequency, but a difference of 10 isn't very
significant at all if the expected frequency was 1200.

If the sum of these weighted squared deviations is small, the observed frequencies are
close to the expected frequencies and there would be no reason to reject the claim that
it came from that distribution. Only when the sum is large is there a reason to question
the distribution. Therefore, the chi-square goodness-of-fit test is always a right tail
test.

The test statistic has a chi-square distribution when the following assumptions are
met:

• The data are obtained from a random sample
• The expected frequency of each category must be at least 5. This goes back to
the requirement that the data be normally distributed. You're simulating a
multinomial experiment (using a discrete distribution) with the goodness-of-fit
test (and a continuous distribution), and if each expected frequency is at least
five then you can use the normal distribution to approximate (much like the
binomial). If the expected

The following are properties of the goodness-of-fit test

• The data are the observed frequencies. This means that there is only one data
value for each category. Therefore, ...
• The degrees of freedom is one less than the number of categories, not one less
than the sample size.
• It is always a right tail test.
• It has a chi-square distribution.
• The value of the test statistic doesn't change if the order of the categories is
switched.
• The test statistic is (O is the observed and E the expected frequency of each
category)

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Interpreting the Claim


There are four ways you might be given a claim.

1. The values occur with equal frequency. Other words for this are "uniform", "no
preference", or "no difference". To find the expected frequencies, total the
observed frequencies and divide by the number of categories. This quotient is
the expected frequency for each category.
2. Specific proportions or probabilities are given. To find the expected
frequencies, multiply the total of the observed frequencies by the probability
for each category.
3. The expected frequencies are given to you. In this case, you don't have to do
anything.
4. A specific distribution is claimed. For example, "The data is normally
distributed". To work a problem like this, you need to group the data and find
the frequency for each class. Then, find the probability of being within that
class by converting the scores to z-scores and looking up the probabilities.
Finally, multiply the probabilities by the total observed frequency. (It's not
really as bad as it sounds).

Using the TI-82


You can use the lists on the TI-82 to perform the chi-square goodness-of-fit test.


TI-82: Goodness-of-Fit
You can perform a chi-square goodness-of-fit test using the TI-82. Here are
the steps.

1. Enter the observed frequencies into List 1.
2. Enter the expected frequencies into List 2.
a. If you're given the expected frequencies, enter them into List 2.
b. If you're given probabilities, then enter the probabilities into List 2
and multiply List 2 by the sum of List 1 and replace List 2 with that
product: sum L1 * L2 -> L2
c. If you're testing that all categories appear with equal frequency,
then you can a) enter that value into List 2, or b) enter the total
frequency into each value of List 2 and then divide the list by the
number of categories: L2 / k -> L2 (replace k by the number of
categories), or c) enter 1 for each value in List 2 and then multiply
the list by the common expected frequency: L2 * E -> L2 (replace
E by the expected frequency)
3. Calculate the test statistic: sum ((L1 - L2)^2 / L2)
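The same list arithmetic can be sketched in Python, with Python lists standing in for L1 and L2. The function names and both data sets are hypothetical, and rejecting expected frequencies below 5 is a design choice made here to enforce the assumption stated earlier.

```python
def expected_from_probabilities(observed, probabilities):
    """Expected frequency = total observed * probability (TI-82: sum L1 * L2 -> L2)."""
    total = sum(observed)
    return [total * p for p in probabilities]


def expected_equal(observed):
    """Equal-frequency claim: total observed / number of categories, for each one."""
    k = len(observed)
    return [sum(observed) / k] * k


def goodness_of_fit(observed, expected):
    """Chi-square statistic sum((O - E)^2 / E) and df = k - 1 (right tail test)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if min(expected) < 5:
        raise ValueError("each expected frequency should be at least 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


# Hypothetical die rolled 60 times; claim: all faces equally likely
rolls = [8, 12, 10, 9, 11, 10]
stat1, df1 = goodness_of_fit(rolls, expected_equal(rolls))
print(round(stat1, 4), df1)  # 1.0 5

# Hypothetical claim with given proportions 0.5, 0.3, 0.2
observed = [45, 35, 20]
stat2, df2 = goodness_of_fit(observed, expected_from_probabilities(observed, [0.5, 0.3, 0.2]))
print(round(stat2, 4), df2)  # 1.3333 2
```

The statistic is then compared with the right-tail chi-square critical value for k - 1 degrees of freedom.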

Stats: Test for Independence

In the test for independence, the claim is that the row and column variables are
independent of each other. This is the null hypothesis.

The multiplication rule said that if two events were independent, then the probability
of both occurring was the product of the probabilities of each occurring. This is key to
working the test for independence. If you end up rejecting the null hypothesis, then
the assumption must have been wrong and the row and column variables are
dependent. Remember, all hypothesis testing is done under the assumption the null
hypothesis is true.

The test statistic used is the same as the chi-square goodness-of-fit test. The principle
behind the test for independence is the same as the principle behind the goodness-of-
fit test. The test for independence is always a right tail test.

In fact, you can think of the test for independence as a goodness-of-fit test where the
data is arranged into table form. This table is called a contingency table.

The test statistic has a chi-square distribution when the following assumptions are
met:

• The data are obtained from a random sample
• The expected frequency of each category must be at least 5.

The following are properties of the test for independence

• The data are the observed frequencies.
• The data is arranged into a contingency table.
• The degrees of freedom are the degrees of freedom for the row variable times
the degrees of freedom for the column variable, (rows - 1)(columns - 1). It is not
one less than the sample size, it is the product of the two degrees of freedom.
• It is always a right tail test.
• It has a chi-square distribution.
• The expected value is computed by taking the row total times the column total
and dividing by the grand total.
• The value of the test statistic doesn't change if the order of the rows or columns
is switched.
• The value of the test statistic doesn't change if the rows and columns are
interchanged (transpose of the matrix).
• The test statistic is

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
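A minimal Python sketch of the test for independence, with the contingency table stored as a list of rows. The function name and the 2 x 3 table of observed frequencies are hypothetical; the critical value still comes from the chi-square table.

```python
def independence_test(table):
    """Chi-square test for independence on a contingency table (list of rows).

    Expected = row total * column total / grand total; df = (rows - 1)(columns - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical 2 x 3 contingency table of observed frequencies
observed = [[10, 20, 30],
            [20, 20, 20]]
stat, df = independence_test(observed)
print(round(stat, 4), df)  # 5.3333 2
```

Transposing the table, `[list(col) for col in zip(*observed)]`, gives the same statistic and degrees of freedom, matching the last property in the list.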

Using the TI-82


There is a program called CONTING (for contingency table) for the TI-82 which will
compute the test statistic for you. You still need to look up the critical value in the
table.


CONTING Program
This program completes a test for independence using a contingency table.
The observed frequencies must be contained in matrix [A] and the result is a
test statistic having a chi-square distribution.

When the program is done running, the following variables are defined

• List 1 contains the observed frequencies
• List 2 contains the expected frequencies
• List 3 contains the row totals
• List 4 contains the column totals

Stats: F-Test chp13

Definitions
F-distribution
The ratio of two independent chi-square variables divided by their respective
degrees of freedom. If the population variances are equal, this simplifies to be
the ratio of the sample variances.
Analysis of Variance (ANOVA)
A technique used to test a hypothesis concerning the means of three or more
populations.
One-Way Analysis of Variance
Analysis of Variance when there is only one independent variable. The null
hypothesis will be that all population means are equal, the alternative
hypothesis is that at least one mean is different.
Between Group Variation
The variation due to the interaction between the samples, denoted SS(B) for
Sum of Squares Between groups. If the sample means are close to each other
(and therefore the Grand Mean) this will be small. There are k samples
involved with one data value for each sample (the sample mean), so there are
k-1 degrees of freedom.
Between Group Variance
The variance due to the interaction between the samples, denoted MS(B) for
Mean Square Between groups. This is the between group variation divided by
its degrees of freedom.
Within Group Variation
The variation due to differences within individual samples, denoted SS(W) for
Sum of Squares Within groups. Each sample is considered independently, no
interaction between samples is involved. The degrees of freedom is equal to
the sum of the individual degrees of freedom for each sample. Since each
sample has degrees of freedom equal to one less than their sample sizes, and
there are k samples, the total degrees of freedom is k less than the total
sample size: df = N - k.
Within Group Variance
The variance due to the differences within individual samples, denoted MS(W)
for Mean Square Within groups. This is the within group variation divided by
its degrees of freedom.
Scheffe' Test
A test used to find where the differences between means lie when the
Analysis of Variance indicates the means are not all equal. The Scheffe' test is
generally used when the sample sizes are different.
Tukey Test
A test used to find where the differences between the means lie when the
Analysis of Variance indicates the means are not all equal. The Tukey test is
generally used when the sample sizes are all the same.
Two-Way Analysis of Variance
An extension to the one-way analysis of variance. There are two independent
variables. There are three sets of hypotheses with the two-way ANOVA. The
first null hypothesis is that there is no interaction between the two factors.
The second null hypothesis is that the population means of the first factor are
equal. The third null hypothesis is that the population means of the second
factor are equal.
Factors
The two independent variables in a two-way ANOVA.
Treatment Groups
Groups formed by making all possible combinations of the two factors. For
example, if the first factor has 3 levels and the second factor has 2 levels, then
there will be 3x2=6 different treatment groups.
Interaction Effect
The effect one factor has on the other factor
Main Effect
The effects of the independent variables.

Stats: F-Test

The F-distribution is formed by the ratio of two independent chi-square variables
divided by their respective degrees of freedom.

$$F = \frac{\chi^2_1 / df_1}{\chi^2_2 / df_2} = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}$$

Since F is formed by chi-square, many of the chi-square properties carry over to the
F distribution.

• The F-values are all non-negative
• The distribution is non-symmetric
• The mean is approximately 1
• There are two independent degrees of freedom, one for the numerator, and
one for the denominator.
• There are many different F distributions, one for each pair of degrees of
freedom.

F-Test
The F-test is designed to test if two population variances are equal. It does this by
comparing the ratio of two variances. So, if the variances are equal, the ratio of the
variances will be 1.

All hypothesis testing is done under the assumption the null hypothesis is true.

If the null hypothesis is true, the population variances are equal, so the F test
statistic (s1^2 / sigma1^2) / (s2^2 / sigma2^2) can be simplified (dramatically) to
F = s1^2 / s2^2. This ratio of sample variances will be the test statistic used. If the null
hypothesis is false, then we will reject the null hypothesis that the ratio was equal to 1
and our assumption that they were equal.
There are several different F-tables. Each one has a different level of significance. So,
find the correct level of significance first, and then look up the numerator degrees of
freedom and the denominator degrees of freedom to find the critical value.

You will notice that all of the tables only give level of significance for right tail tests.
Because the F distribution is not symmetric, and there are no negative values, you
may not simply take the opposite of the right critical value to find the left critical
value. The way to find a left critical value is to reverse the degrees of freedom, look
up the right critical value, and then take the reciprocal of this value. For example, the
critical value with 0.05 on the left with 12 numerator and 15 denominator degrees of
freedom is found by taking the reciprocal of the critical value with 0.05 on the right
with 15 numerator and 12 denominator degrees of freedom.

Avoiding Left Critical Values

Since the left critical values are a pain to calculate, they are often avoided altogether.
This is the procedure followed in the textbook. You can force the F test into a right
tail test by placing the sample with the larger variance in the numerator and the smaller
variance in the denominator. It does not matter which sample has the larger sample
size, only which sample has the larger variance.

The numerator degrees of freedom will be the degrees of freedom for whichever
sample has the larger variance (since it is in the numerator) and the denominator
degrees of freedom will be the degrees of freedom for whichever sample has the
smaller variance (since it is in the denominator).

If a two-tail test is being conducted, you still have to divide alpha by 2, but you only
look up and compare the right critical value.

Assumptions / Notes

• The larger variance should always be placed in the numerator
• The test statistic is F = s1^2 / s2^2 where s1^2 > s2^2
• Divide alpha by 2 for a two tail test and then find the right critical value
• If standard deviations are given instead of variances, they must be squared
• When the degrees of freedom aren't given in the table, go with the value with
the larger critical value (this happens to be the smaller degrees of freedom).
This is so that you are less likely to reject in error (type I error)
• The populations from which the samples were obtained must be normal.
• The samples must be independent
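A minimal Python sketch of the right-tail F test setup: square standard deviations if needed, put the larger variance on top, and carry its degrees of freedom into the numerator. The function names and the sample values (s = 2 with n = 21, s = 3 with n = 16) are hypothetical; the critical value, looked up at alpha/2 for a two-tail test, still comes from an F table.

```python
def f_test_statistic(s_a, n_a, s_b, n_b, standard_deviations=False):
    """Right-tail F statistic with the larger variance in the numerator.

    Returns (F, numerator df, denominator df). Set standard_deviations=True
    when s_a and s_b are standard deviations, so they are squared first.
    """
    var_a = s_a ** 2 if standard_deviations else s_a
    var_b = s_b ** 2 if standard_deviations else s_b
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


def left_critical_value(right_cv_with_df_reversed):
    """Left critical value = 1 / (right critical value with the df reversed)."""
    return 1 / right_cv_with_df_reversed


F, df_num, df_den = f_test_statistic(2, 21, 3, 16, standard_deviations=True)
print(F, df_num, df_den)  # 2.25 15 20
```

The sample size does not decide which sample goes on top; only the variance does, which is why the second sample ends up in the numerator here.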

Stats: One-Way ANOVA

A One-Way Analysis of Variance is a way to test the equality of three or more means
at one time by using variances.

Assumptions

• The populations from which the samples were obtained must be normally or
approximately normally distributed.
• The samples must be independent.
• The variances of the populations must be equal.

Hypotheses

The null hypothesis will be that all population means are equal, the alternative
hypothesis is that at least one mean is different.

In the following, lower case letters apply to the individual samples and capital letters
apply to the entire set collectively. That is, n is one of many sample sizes, but N is the
total sample size.

Grand Mean

The grand mean of a set of samples is the total of all the data values divided by the
total sample size.

$$\bar{X}_{GM} = \frac{\sum x}{N}$$

This requires that you have all of the sample data available to you, which is usually
the case, but not always. It turns out that all that is necessary to perform a one-way
analysis of variance are the number of samples, the sample means, the sample
variances, and the sample sizes.

Another way to find the grand mean is to find the weighted average of the sample
means. The weight applied is the sample size.

$$\bar{X}_{GM} = \frac{\sum n\bar{x}}{\sum n}$$

Total Variation

The total variation (not variance) is the sum of the squares of the differences of each
data value from the grand mean.

$$SS(T) = \sum (x - \bar{X}_{GM})^2$$

There is the between group variation and the within group variation. The whole idea
behind the analysis of variance is to compare the ratio of between group variance to
within group variance. If the variance caused by the interaction between the samples
is much larger when compared to the variance that appears within each group, then it
is because the means aren't the same.

Between Group Variation

The variation due to the interaction between the samples is denoted SS(B) for Sum of
Squares Between groups. If the sample means are close to each other (and therefore
the Grand Mean) this will be small. There are k samples involved with one data value
for each sample (the sample mean), so there are k-1 degrees of freedom.

$$SS(B) = \sum n(\bar{x} - \bar{X}_{GM})^2$$

The variance due to the interaction between the samples is denoted MS(B) for Mean
Square Between groups. This is the between group variation divided by its degrees of
freedom. It is also denoted by $s_b^2$.

$$MS(B) = s_b^2 = \frac{SS(B)}{k-1}$$

Within Group Variation

The variation due to differences within individual samples is denoted SS(W) for Sum
of Squares Within groups. Each sample is considered independently, and no
interaction between samples is involved. The degrees of freedom is equal to the sum
of the individual degrees of freedom for each sample. Since each sample has degrees
of freedom equal to one less than its sample size, and there are k samples, the total
degrees of freedom is k less than the total sample size: df = N - k.

$$SS(W) = \sum df \cdot s^2 = \sum (n-1)s^2$$

The variance due to the differences within individual samples is denoted MS(W) for
Mean Square Within groups. This is the within group variation divided by its degrees
of freedom. It is also denoted by $s_w^2$. It is the weighted average of the variances
(weighted with the degrees of freedom).

$$MS(W) = s_w^2 = \frac{SS(W)}{N-k}$$

F test statistic

Recall that an F variable is the ratio of two independent chi-square variables divided
by their respective degrees of freedom. Also recall that the F test statistic is the ratio
of two sample variances. Well, it turns out that's exactly what we have here. The F test
statistic is found by dividing the between group variance by the within group
variance. The degrees of freedom for the numerator are the degrees of freedom for
the between group (k-1) and the degrees of freedom for the denominator are the
degrees of freedom for the within group (N-k).

$$F = \frac{s_b^2}{s_w^2} = \frac{MS(B)}{MS(W)}$$

Summary Table

All of this sounds like a lot to remember, and it is. However, there is a table which
makes things really nice.

Source    SS              df     MS               F
Between   SS(B)           k-1    SS(B) / (k-1)    MS(B) / MS(W)
Within    SS(W)           N-k    SS(W) / (N-k)    .
Total     SS(W) + SS(B)   N-1    .                .

Notice that each Mean Square is just the Sum of Squares divided by its degrees of
freedom, and the F value is the ratio of the mean squares. Do not put the larger
variance in the numerator; always divide the between variance by the within variance.
If the between variance is smaller than the within variance (F < 1), then the means are
really close to each other and you will fail to reject the claim that they are all equal. The
degrees of freedom of the F-test are in the same order they appear in the table (nifty,
eh?).
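Putting the pieces together, the whole summary table can be filled in from the sample means, variances, and sizes. This is a minimal Python sketch under that assumption, not the TI-82 program itself; the function name `anova_table` and the numbers are hypothetical.

```python
def anova_table(means, variances, sizes):
    """Build the one-way ANOVA summary table from summary statistics."""
    k, n_total = len(means), sum(sizes)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total
    ss_b = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_w = sum((n - 1) * v for n, v in zip(sizes, variances))
    # Each mean square is its sum of squares divided by its degrees of freedom
    ms_b, ms_w = ss_b / (k - 1), ss_w / (n_total - k)
    return {
        "Between": (ss_b, k - 1, ms_b, ms_b / ms_w),  # SS, df, MS, F
        "Within": (ss_w, n_total - k, ms_w),          # SS, df, MS
        "Total": (ss_b + ss_w, n_total - 1),          # SS, df
    }

table = anova_table([10.0, 12.0, 14.0], [4.0, 6.0, 5.0], [5, 5, 5])
for source, row in table.items():
    print(source, row)
```

Note that the between and within degrees of freedom (2 and 12) add up to the total (14), just as the sums of squares do.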

Decision Rule

The decision will be to reject the null hypothesis if the test statistic from the table is
greater than the F critical value with k-1 numerator and N-k denominator degrees of
freedom.
If the decision is to reject the null, then at least one of the means is different.
However, the ANOVA does not tell you where the difference lies. For this, you need
another test, either the Scheffé or the Tukey test.
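The decision rule is a right-tailed comparison. A small Python sketch, assuming the critical value has already been read from an F table (about 3.89 for 2 and 12 degrees of freedom at alpha = 0.05); the function name `anova_decision` is hypothetical.

```python
def anova_decision(f_stat, f_crit):
    """Right-tailed decision rule for the one-way ANOVA."""
    if f_stat > f_crit:
        return "reject H0: at least one mean differs (follow up with Scheffe or Tukey)"
    return "fail to reject H0"

# F(2, 12) critical value at alpha = 0.05 is about 3.89 from an F table
print(anova_decision(4.0, 3.89))
print(anova_decision(0.5, 3.89))
```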

TI-82

Ok, now for the really good news. There's a program called ANOVA for the TI-82
calculator which will do all of the calculations and give you the values that go into the
table for you. You must have the sample means, sample variances, and sample sizes to
use the program. If you have the sum of squares, then it is much easier to finish the
table by hand (this is what we'll do with the two-way analysis of variance).


ANOVA Program
Performs a one-way Analysis of Variance. List 1 must contain the means of
the samples, list 2 must contain the sample variances, and list 3 must contain
the sample sizes. Note that the three lists must be the same size. The user is
reminded of these requirements when running the program.

The grand mean is displayed, followed by the sum of squares, degrees of
freedom, and mean square for the between group and within group.
The total sum of squares and degrees of freedom, along with the F test
statistic, are also shown.

Upon completion, the program will give the user the chance to run the Scheffé
test if the sample sizes are different or the Tukey test if the sample sizes are
the same. All possible pairs are compared.

Stats: Scheffé and Tukey Tests


When the decision from the One-Way Analysis of Variance is to reject the null
hypothesis, it means that at least one of the means isn't the same as the other means.
What we need is a way to figure out where the differences lie, not just that there is a
difference.

This is where the Scheffé and Tukey tests come into play. They will help us analyze
pairs of means to see if there is a difference -- much like the difference of two means
covered earlier.

Hypotheses
Both tests are set up to test if pairs of means are different:
$H_0: \mu_i = \mu_j$ versus $H_1: \mu_i \ne \mu_j$. The formulas
refer to mean i and mean j. The values of i and j vary, and the total
number of tests will be equal to a combination of k objects, 2 at a time,
C(k,2) = k(k-1)/2, where k is the number of samples.
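To see which pairs get tested, the C(k,2) comparisons can be listed directly. A quick Python sketch, assuming k = 4 samples numbered 1 through 4 purely for illustration:

```python
from itertools import combinations
from math import comb

k = 4  # hypothetical number of samples
# Every unordered pair (i, j) with i < j is one Scheffe or Tukey comparison
pairs = list(combinations(range(1, k + 1), 2))
print(pairs)
print(len(pairs), comb(k, 2))
```

With four samples there are 4(3)/2 = 6 pairs to compare.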

Scheffé Test
The Scheffé test is customarily used with unequal sample sizes, although it could be
used with equal sample sizes.

The critical value for the Scheffé test is the degrees of freedom for the between
variance times the critical value for the one-way ANOVA:

$$CV = (k-1)\,F(k-1,\, N-k,\, \alpha)$$

The test statistic is a little bit harder to compute:

$$F_s = \frac{(\bar{X}_i - \bar{X}_j)^2}{s_w^2\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

Pure mathematicians will argue that this shouldn't be called F
because it doesn't have an F distribution (it's the degrees
of freedom times an F), but we'll live with it.

Reject H0 if the test statistic is greater than the critical value. Note that this is a
right-tailed test. If there is no difference between the means, the numerator will be close
to zero, so performing a left-tailed test wouldn't show anything.
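A minimal Python sketch of one Scheffé comparison, assuming MS(W) and the one-way ANOVA critical value F(k-1, N-k, alpha) are already known. The function name `scheffe_test` and the numbers (means 10 and 14, n = 5 each, MS(W) = 5, k = 3, F(2, 12, 0.05) of about 3.89) are illustrative only.

```python
def scheffe_test(mean_i, mean_j, n_i, n_j, ms_w, k, f_crit):
    """Compare one pair of means with the Scheffe test.

    f_crit is the one-way ANOVA critical value F(k-1, N-k, alpha) from an F table.
    Returns (F_s, CV, reject).
    """
    f_s = (mean_i - mean_j) ** 2 / (ms_w * (1 / n_i + 1 / n_j))
    cv = (k - 1) * f_crit  # CV = (k-1) F(k-1, N-k, alpha)
    return f_s, cv, f_s > cv  # right-tailed test

f_s, cv, reject = scheffe_test(10.0, 14.0, 5, 5, 5.0, 3, 3.89)
print(f_s, cv, reject)
```

Here F_s = 16 / (5 x 0.4) = 8.0, which exceeds CV = 2 x 3.89 = 7.78, so this pair would be declared different.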

Tukey Test
The Tukey test is only usable when the sample sizes are the same.

The Critical Value is looked up in a table. It is Table N in the Bluman text. There are
actually several different tables, one for each level of significance. The number of
samples, k, is used as an index along the top, and the degrees of freedom for the within
group variance, v = N-k, are used as an index along the left side.

The test statistic is found by dividing the difference between the
means by the square root of the ratio of the within group variance
and the sample size:

$$q = \frac{\bar{X}_i - \bar{X}_j}{\sqrt{s_w^2 / n}}$$

Reject the null hypothesis if the absolute value of the test statistic
is greater than the critical value (just like the linear correlation coefficient critical
values).
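A Python sketch of one Tukey comparison, assuming equal sample sizes n and a critical value read from a studentized range table (roughly 3.77 for k = 3 and v = 12 at alpha = 0.05). The function names and numbers are hypothetical.

```python
from math import sqrt

def tukey_q(mean_i, mean_j, ms_w, n):
    """Tukey test statistic for one pair of means; every sample has size n."""
    return (mean_i - mean_j) / sqrt(ms_w / n)

def tukey_reject(q, q_crit):
    """Reject H0 when |q| exceeds the table value."""
    return abs(q) > q_crit

# Illustrative numbers: means 10 and 14, n = 5, MS(W) = 5.0 (k = 3, v = N - k = 12)
q = tukey_q(10.0, 14.0, 5.0, 5)
q_crit = 3.77  # approximate table value for k = 3, v = 12, alpha = 0.05
print(q, tukey_reject(q, q_crit))
```

The sign of q only reflects which mean was listed first; the absolute value is what gets compared.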

TI-82
The ANOVA program for the TI-82 will do all of the pairwise comparisons for you
after it has given the ANOVA summary table. You will need to know how to find the
critical values and make the comparisons.


Stats: Two-Way ANOVA

The two-way analysis of variance is an extension of the one-way analysis of variance.
There are two independent variables (hence the name two-way).

Assumptions

• The populations from which the samples were obtained must be normally or
approximately normally distributed.
• The samples must be independent.
• The variances of the populations must be equal.
• The groups must have the same sample size.

Hypotheses

There are three sets of hypotheses with the two-way ANOVA.

The null hypotheses for each of the sets are given below.
1. The population means of the first factor are equal. This is like the one-way
ANOVA for the row factor.
2. The population means of the second factor are equal. This is like the one-way
ANOVA for the column factor.
3. There is no interaction between the two factors. This is similar to performing a
test for independence with contingency tables.

Factors

The two independent variables in a two-way ANOVA are called factors. The idea is
that there are two variables, factors, which affect the dependent variable. Each factor
will have two or more levels within it, and the degrees of freedom for each factor is
one less than the number of levels.

Treatment Groups

Treatment groups are formed by making all possible combinations of the two
factors. For example, if the first factor has 3 levels and the second factor has 2 levels,
then there will be 3x2=6 different treatment groups.

As an example, let's assume we're planting corn. The type of seed and type of
fertilizer are the two factors we're considering in this example. There are 3 types of
seed and 5 types of fertilizer, so this example has 3x5 = 15 treatment groups. There are
3-1=2 degrees of freedom for the type of seed, and 5-1=4 degrees of freedom for the
type of fertilizer. There are 2*4 = 8 degrees of freedom for the interaction between the
type of seed and type of fertilizer.
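The counting above can be sketched in a few lines of Python. The seed and fertilizer labels are taken from the data table in this section; everything else is illustrative.

```python
from itertools import product

seeds = ["A-402", "B-894", "C-952"]
fertilizers = ["I", "II", "III", "IV", "V"]

# Every (seed, fertilizer) combination is one treatment group
groups = list(product(seeds, fertilizers))

df_seed = len(seeds) - 1              # levels of factor 1, minus one
df_fert = len(fertilizers) - 1        # levels of factor 2, minus one
df_interaction = df_seed * df_fert    # product of the two main-effect df

print(len(groups), df_seed, df_fert, df_interaction)
```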

The data that actually appears in the table are samples. In this case, 2 samples from
each treatment group were taken.

             Fert I     Fert II   Fert III   Fert IV    Fert V
Seed A-402   106, 110   95, 100   94, 107    103, 104   100, 102
Seed B-894   110, 112   98, 99    100, 101   108, 112   105, 107
Seed C-952   94, 97     86, 87    98, 99     99, 101    94, 98

Main Effect

The main effect involves the independent variables one at a time. The interaction is
ignored for this part. Just the rows or just the columns are used, not mixed. This is the
part which is similar to the one-way analysis of variance. Each of the variances
calculated to analyze the main effects is like a between group variance.

Interaction Effect

The interaction effect is the effect that one factor has on the other factor. The degrees
of freedom here is the product of the two degrees of freedom for each factor.

Within Variation

The Within variation is the sum of squares within each treatment group. Each treatment
group contributes degrees of freedom equal to one less than the sample size (remember
all treatment groups must have the same sample size for a two-way ANOVA). The total
number of treatment groups is the product of the number of levels for each factor. The
within variance is the within variation divided by its degrees of freedom.
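A minimal Python sketch of the within variation for the corn data, assuming the treatment groups are listed row by row (Seed A across Fert I to V, then Seed B, then Seed C). The function name `within_variation` is hypothetical.

```python
# Corn data, one [x1, x2] pair per treatment group, listed row by row
cells = [
    [106, 110], [95, 100], [94, 107], [103, 104], [100, 102],
    [110, 112], [98, 99], [100, 101], [108, 112], [105, 107],
    [94, 97], [86, 87], [98, 99], [99, 101], [94, 98],
]

def within_variation(cells):
    """Return (SS(W), df, MS(W)): squared deviations from each group's own mean."""
    ss_w = 0.0
    for cell in cells:
        m = sum(cell) / len(cell)
        ss_w += sum((x - m) ** 2 for x in cell)
    df_w = sum(len(c) - 1 for c in cells)  # ab(n - 1)
    return ss_w, df_w, ss_w / df_w

print(within_variation(cells))
```

There are 15 groups of size 2, so the within degrees of freedom are 15(2 - 1) = 15.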

The within group is also called the error.

F-Tests

There is an F-test for each of the hypotheses, and the F-test is the mean square for
each main effect and the interaction effect divided by the within variance. The
numerator degrees of freedom come from each effect, and the denominator degrees of
freedom is the degrees of freedom for the within variance in each case.

Two-Way ANOVA Table

It is assumed that main effect A has a levels (and A = a-1 df), main effect B has b
levels (and B = b-1 df), n is the sample size of each treatment, and N = abn is the total
sample size. Notice the overall degrees of freedom is once again one less than the total
sample size.

Source               SS              df                 MS        F
Main Effect A        given           A = a-1            SS / df   MS(A) / MS(W)
Main Effect B        given           B = b-1            SS / df   MS(B) / MS(W)
Interaction Effect   given           A*B = (a-1)(b-1)   SS / df   MS(A*B) / MS(W)
Within               given           N - ab = ab(n-1)   SS / df
Total                sum of others   N - 1 = abn - 1

Summary

The following results were calculated using the Quattro Pro spreadsheet. It provides the
p-values, and the critical values are for alpha = 0.05.

Source of Variation   SS          df   MS         F        P-value    F-crit
Seed                  512.8667     2   256.4333   28.283   0.000008   3.682
Fertilizer            449.4667     4   112.3667   12.393   0.000119   3.056
Interaction           143.1333     8    17.8917    1.973   0.122090   2.641
Within                136.0000    15     9.0667
Total                1241.4667    29

From the above results, we can see that the main effects are both significant, but the
interaction between them isn't. That is, the types of seed aren't all equal, and the types
of fertilizer aren't all equal, but the type of seed doesn't interact with the type of
fertilizer.
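The sums of squares in the spreadsheet output can be reproduced from the raw corn data. This is a Python sketch of the usual equal-replication two-way ANOVA computations, not the spreadsheet's own method; the function name `two_way_anova` is hypothetical, and the critical values are copied from the output above rather than computed.

```python
# Corn data from the table above: one row per seed, one [x1, x2] pair per fertilizer
data = {
    "A-402": [[106, 110], [95, 100], [94, 107], [103, 104], [100, 102]],
    "B-894": [[110, 112], [98, 99], [100, 101], [108, 112], [105, 107]],
    "C-952": [[94, 97], [86, 87], [98, 99], [99, 101], [94, 98]],
}

def two_way_anova(cells):
    """Two-way ANOVA with n observations in every treatment group.

    cells[i][j] is the list of observations for level i of factor A (rows)
    and level j of factor B (columns).
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    N = a * b * n
    values = [x for row in cells for cell in row for x in cell]
    grand_mean = sum(values) / N

    row_means = [sum(sum(cell) for cell in row) / (b * n) for row in cells]
    col_means = [sum(sum(cells[i][j]) for i in range(a)) / (a * n) for j in range(b)]
    cell_means = [[sum(cell) / n for cell in row] for row in cells]

    # Main effects: like between group variation, using only rows or only columns
    ss_a = b * n * sum((m - grand_mean) ** 2 for m in row_means)
    ss_b = a * n * sum((m - grand_mean) ** 2 for m in col_means)
    # Variation among treatment-group means; what the main effects don't explain is interaction
    ss_cells = n * sum((m - grand_mean) ** 2 for row in cell_means for m in row)
    ss_ab = ss_cells - ss_a - ss_b
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_w = ss_total - ss_cells  # within (error) variation

    ss = {"A": ss_a, "B": ss_b, "AxB": ss_ab, "Within": ss_w}
    df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1), "Within": N - a * b}
    ms = {src: ss[src] / df[src] for src in ss}
    f = {src: ms[src] / ms["Within"] for src in ("A", "B", "AxB")}
    return ss, df, ms, f, ss_total

ss, df, ms, f, ss_total = two_way_anova(list(data.values()))

# Critical values (alpha = 0.05) copied from the Quattro Pro output above
f_crit = {"A": 3.682, "B": 3.056, "AxB": 2.641}
significant = {src: f[src] > f_crit[src] for src in f_crit}
for src in ("A", "B", "AxB"):
    print(f"{src:4s} SS={ss[src]:9.4f} df={df[src]:2d} MS={ms[src]:9.4f} "
          f"F={f[src]:7.3f} significant={significant[src]}")
print(f"Within SS={ss['Within']:9.4f} df={df['Within']:2d} MS={ms['Within']:9.4f}")
print(f"Total  SS={ss_total:9.4f} df={sum(df.values()):2d}")
```

Factor A is the seed (rows) and factor B is the fertilizer (columns), matching the layout of the data table.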
Error in Bluman Textbook

The two-way ANOVA, Example 13-9, in the Bluman text has the incorrect values in
it. The student would have no way of knowing this because the book doesn't explain
how to calculate the values.

Here is the correct table:

Source of Variation   SS       df   MS       F
Sample                 3.920    1    3.920    4.752
Column                 9.680    1    9.680   11.733
Interaction           54.080    1   54.080   65.552
Within                 3.300    4    0.825
Total                 70.980    7

The student will be responsible for finishing the table, not for coming up with the sum
of squares which go into the table in the first place.
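Finishing the table only takes the rule that each mean square is its sum of squares over its degrees of freedom, and each F is a mean square over MS(W). A Python sketch using the SS and df values from the corrected Bluman table:

```python
# SS and df as given in the corrected table; MS and F are what the student fills in
ss = {"Sample": 3.920, "Column": 9.680, "Interaction": 54.080, "Within": 3.300}
df = {"Sample": 1, "Column": 1, "Interaction": 1, "Within": 4}

ms = {src: ss[src] / df[src] for src in ss}
for src in ("Sample", "Column", "Interaction"):
    print(src, round(ms[src], 3), round(ms[src] / ms["Within"], 3))
print("Within", round(ms["Within"], 3))
print("Total", round(sum(ss.values()), 3), sum(df.values()))
```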
