
Stats: Counting Techniques

Fundamental Theorems

Arithmetic

Every integer greater than one is either prime or can be expressed as a unique
product of prime numbers.




Algebra

Every polynomial function in one variable of degree n > 0 has at least one real or
complex zero.

Linear Programming

If there is a solution to a linear programming problem, then it will occur at a corner
point or on a boundary between two or more corner points.

   
   
Fundamental Counting Principle

In a sequence of events, the total number of ways all events can be performed is
the product of the number of ways each individual event can be performed.

The Bluman text calls this multiplication principle 2.

  
Factorials

If n is a positive integer, then

$$n! = n(n-1)(n-2)\cdots(3)(2)(1)$$
$$n! = n \cdot (n-1)!$$

A special case is 0!

$$0! = 1$$



   
Permutations

A permutation is an arrangement of objects, without repetition, where order is
important.

    
   

A permutation of n objects, arranged into one group of size n, without repetition, and
with order being important, is:

$$P(n,n) = n!$$

Example: Find all permutations of the letters "ABC"

ABC  ACB  BAC  BCA  CAB  CBA

       

A permutation of n objects, arranged in groups of size r, without repetition, and with
order being important, is:

$$P(n,r) = \frac{n!}{(n-r)!}$$

Example: Find all two-letter permutations of the letters "ABC"

AB  AC  BA  BC  CA  CB

Shortcut formula for finding a permutation

Assuming that you start at n and count down to 1 in your factorials ...

P(n,r) = first r factors of n factorial
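The shortcut can be sketched in a few lines of Python. This is only an illustration: the function name `perm_count` is made up for this example, and `itertools.permutations` is used to list the arrangements of "ABC" so the counts can be checked by eye.

```python
from itertools import permutations


def perm_count(n: int, r: int) -> int:
    """P(n, r) as the product of the first r factors of n factorial."""
    result = 1
    for factor in range(n, n - r, -1):  # n, n-1, ..., n-r+1
        result *= factor
    return result


# List the arrangements themselves for the letters "ABC".
all_abc = ["".join(p) for p in permutations("ABC")]
two_letter = ["".join(p) for p in permutations("ABC", 2)]

print(all_abc)     # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
print(two_letter)  # ['AB', 'AC', 'BA', 'BC', 'CA', 'CB']
print(perm_count(3, 3), perm_count(3, 2))  # 6 6
```

Multiplying only the first r factors avoids computing two large factorials and then dividing them.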

  
     

Sometimes letters are repeated and all of the permutations aren't distinguishable from
each other.

Example: Find all permutations of the letters "BOB"

To help you distinguish, I'll write the second "B" as "b"

BOb  BbO  OBb  ObB  bOB  bBO

If you just write "b" as "B", however ...

BOB  BBO  OBB  OBB  BOB  BBO

There are really only three distinguishable permutations here.

BBO  BOB  OBB

If a word has N letters, k of which are unique, and you let n1, n2, n3, ..., nk be the
frequencies of each of the k letters, then the total number of distinguishable
permutations is given by:

$$\frac{N!}{n_1! \, n_2! \, n_3! \cdots n_k!}$$

Consider the word "STATISTICS":

Here are the frequencies of each letter: S=3, T=3, A=1, I=2, C=1. There are 10 letters
total.

$$\frac{10!}{3! \, 3! \, 1! \, 2! \, 1!} = \frac{3628800}{72} = 50400$$

You can find distinguishable permutations using the TI-82.
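If you don't have the calculator handy, the same count (N factorial divided by the factorial of each letter's frequency) can be sketched in Python. The function name `distinguishable_permutations` is just a label chosen for this example.

```python
from collections import Counter
from math import factorial


def distinguishable_permutations(word: str) -> int:
    """N! divided by the factorial of each letter's frequency."""
    total = factorial(len(word))
    for frequency in Counter(word).values():
        total //= factorial(frequency)  # each division is exact
    return total


print(distinguishable_permutations("BOB"))         # 3
print(distinguishable_permutations("STATISTICS"))  # 50400
```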

  
Combinations

A combination is an arrangement of objects, without repetition, where order is not
important.

Note: The difference between a permutation and a combination is not whether there is
repetition or not -- there must not be repetition with either, and if there is repetition,
you cannot use the formulas for permutations or combinations. The only difference
is whether order is important.

A combination of n objects, arranged in groups of size r, without repetition, and with
order not being important, is:

$$C(n,r) = \frac{n!}{r! \, (n-r)!}$$


Another way to write a combination of n things, r at a time, is using the binomial
notation:

$$C(n,r) = \binom{n}{r}$$

Example: Find all two-letter combinations of the letters "ABC"

AB = BA    AC = CA    BC = CB

There are only three two-letter combinations.


Shortcut formula for finding a combination

Assuming that you start at n and count down to 1 in your factorials ...

C(n,r) = first r factors of n factorial divided by the last r factors of n factorial
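Here is a small Python sketch of that shortcut, checked against the standard library's `math.comb`. The name `comb_shortcut` is invented for this example.

```python
import math
from itertools import combinations


def comb_shortcut(n: int, r: int) -> int:
    """First r factors of n! divided by the last r factors of n!."""
    numerator = 1
    denominator = 1
    for i in range(r):
        numerator *= n - i    # first r factors: n, n-1, ..., n-r+1
        denominator *= i + 1  # last r factors: 1, 2, ..., r
    return numerator // denominator


# Order does not matter, so AB and BA are listed only once.
print(["".join(c) for c in combinations("ABC", 2)])  # ['AB', 'AC', 'BC']
print(comb_shortcut(3, 2), math.comb(3, 2))          # 3 3
```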

    
 

Combinations are used in the binomial expansion theorem from algebra to give the
coefficients of the expansion (a+b)^n. They also form a pattern known as Pascal's
Triangle.

                     1
                   1   1
                 1   2   1
               1   3   3   1
             1   4   6   4   1
           1   5  10  10   5   1
         1   6  15  20  15   6   1
       1   7  21  35  35  21   7   1

Each element in the table is the sum of the two elements directly above it. Each
element is also a combination. The n value is the number of the row (start counting at
zero) and the r value is the element in the row (start counting at zero). That would
make the 20 in the next to last row C(6,3) -- it's in row #6 (7th row) and position #3
(4th element).

Symmetry

Pascal's Triangle illustrates the symmetric nature of a combination.

$$C(n,r) = C(n, n-r)$$

Example: C(10,4) = C(10,6) or C(100,99) = C(100,1)
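Both ideas (each entry is the sum of the two entries above it, and each row reads the same in both directions) can be checked with a short Python sketch. The helper name `pascal_rows` is an assumption made for this example.

```python
import math


def pascal_rows(count: int) -> list[list[int]]:
    """Build the first `count` rows of Pascal's Triangle (row 0 is [1])."""
    rows = [[1]]
    for _ in range(count - 1):
        prev = rows[-1]
        middle = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + middle + [1])
    return rows


rows = pascal_rows(8)
print(rows[6])     # [1, 6, 15, 20, 15, 6, 1]
print(rows[6][3])  # 20, which is C(6,3)

# Symmetry: C(n, r) equals C(n, n - r).
print(math.comb(10, 4) == math.comb(10, 6))    # True
print(math.comb(100, 99) == math.comb(100, 1)) # True
```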


Shortcut formula for finding a combination

Since combinations are symmetric, if n-r is smaller than r, then switch the
combination to its alternative form and then use the shortcut given above.

C(n,r) = first r factors of n factorial divided by the last r factors of n factorial

TI-82
You can use the TI-82 graphing calculator to find factorials, permutations, and
combinations.

  

Tree diagrams are a graphical way of listing all the possible
outcomes. The outcomes are listed in an orderly fashion, so
listing all of the possible outcomes is easier than just trying
to make sure that you have them all listed. It is called a tree
diagram because of the way it looks.

The first event appears on the left, and then each sequential
event is represented as branches off of the first event.

The tree diagram for flipping two coins shows the possible outcomes. The final
outcomes are obtained by following each branch to its conclusion. They are, from
top to bottom:

HH  HT  TH  TT
Stats: Introduction to Probability

Sample Spaces

A sample space is the set of all possible outcomes. However, some sample spaces are
better than others.

Consider the experiment of flipping two coins. It is possible to get 0 heads, 1 head, or
2 heads. Thus, the sample space could be {0, 1, 2}. Another way to look at it is
{ HH, HT, TH, TT }. The second way is better because each outcome is as likely to
occur as any other.

When writing the sample space, it is highly desirable to have events which are equally
likely.

Another example is rolling two dice. The sums are { 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 }.
However, these sums aren't equally likely. The only way to get a sum of 2 is to roll a 1
on both dice, but you can get a sum of 4 by rolling a 1-3, 2-2, or 3-1. The following
table illustrates a better sample space for the sum obtained when rolling two dice.

                   Second Die
First Die     1    2    3    4    5    6
    1         2    3    4    5    6    7
    2         3    4    5    6    7    8
    3         4    5    6    7    8    9
    4         5    6    7    8    9   10
    5         6    7    8    9   10   11
    6         7    8    9   10   11   12




   
The above table lends itself to describing data another way -- using a probability
distribution. Let's consider the frequency distribution for the above sums.

Sum    Frequency    Relative Frequency
  2        1              1/36
  3        2              2/36
  4        3              3/36
  5        4              4/36
  6        5              5/36
  7        6              6/36
  8        5              5/36
  9        4              4/36
 10        3              3/36
 11        2              2/36
 12        1              1/36

If just the first and last columns were written, we would have a probability
distribution. The relative frequency of a frequency distribution is the probability of the
event occurring. This is only true, however, if the events are equally likely.

This gives us the formula for classical probability. The probability of an event
occurring is the number in the event divided by the number in the sample space.
Again, this is only true when the events are equally likely. A classical probability is
the relative frequency of each event in the sample space when each event is equally
likely.

$$P(E) = \frac{n(E)}{n(S)}$$
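The two-dice distribution can be rebuilt by listing all 36 equally likely rolls and counting how many produce each sum. This is a minimal sketch; the variable names are chosen for this example only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# Frequency of each sum, then classical probability n(E) / n(S).
frequency = Counter(first + second for first, second in outcomes)
probability = {s: Fraction(f, len(outcomes)) for s, f in frequency.items()}

for s in sorted(frequency):
    # Printed unreduced (e.g. 2/36) to match the table in the notes.
    print(s, frequency[s], f"{frequency[s]}/{len(outcomes)}")
```

`Fraction` keeps the probabilities exact, so they can be added without rounding error.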



Empirical Probability

Empirical probability is based on observation. The empirical probability of an event is
the relative frequency of a frequency distribution based upon observation.

$$P(E) = \frac{f}{n}$$

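Probability Rules

There are two rules which are very important.

All probabilities are between 0 and 1 inclusive.

$$0 \le P(E) \le 1$$

The sum of all the probabilities in the sample space is 1.

There are some other rules which are also important.

The probability of an event which cannot occur is 0.

The probability of any event which is not in the sample space is zero.

The probability of an event which must occur is 1.

The probability of the sample space is 1.

The probability of an event not occurring is one minus the probability of it
occurring.

$$P(E') = 1 - P(E)$$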


 
Stats: Probability Rules

"OR" or Unions

Mutually Exclusive Events

Two events are mutually exclusive if they cannot occur at the same time. Another
word that means mutually exclusive is disjoint.

If two events are disjoint, then the probability of them both occurring at the same time
is 0.

$$\text{Disjoint: } P(A \text{ and } B) = 0$$

If two events are mutually exclusive, then the probability of either occurring is the
sum of the probabilities of each occurring.

Specific Addition Rule

Only valid when the events are mutually exclusive.

$$P(A \text{ or } B) = P(A) + P(B)$$

Example 1:

Given: P(A) = 0.20, P(B) = 0.70, A and B are disjoint

I like to use what's called a joint probability distribution. (Since disjoint means
nothing in common, joint is what they have in common -- so the values that go on the
inside portion of the table are the intersections or "and"s of each pair of events).
"Marginal" is another word for totals -- it's called marginal because they appear in the
margins.

              B       B'      Marginal
A            0.00    0.20      0.20
A'           0.70    0.10      0.80
Marginal     0.70    0.30      1.00

The values 0.20, 0.70, and 0.00 are given in the problem. The grand total is always
1.00. The rest of the values are obtained by addition and subtraction.

Non-Mutually Exclusive Events

In events which aren't mutually exclusive, there is some overlap. When P(A) and P(B)
are added, the probability of the intersection (and) is added twice. To compensate for
that double addition, the intersection needs to be subtracted.

General Addition Rule

Always valid.

$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$$

Example 2:

Given: P(A) = 0.20, P(B) = 0.70, P(A and B) = 0.15

              B       B'      Marginal
A            0.15    0.05      0.20
A'           0.55    0.25      0.80
Marginal     0.70    0.30      1.00
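Filling in a joint probability table is just addition and subtraction, so it is easy to sketch in Python. The function names `joint_table` and `p_a_or_b` are made up for this example, and the probabilities are passed as strings so that `Fraction` keeps them exact.

```python
from fractions import Fraction


def joint_table(p_a: str, p_b: str, p_a_and_b: str) -> dict:
    """Fill the four inside cells from P(A), P(B), and P(A and B)."""
    a, b, ab = Fraction(p_a), Fraction(p_b), Fraction(p_a_and_b)
    return {
        ("A", "B"): ab,
        ("A", "B'"): a - ab,           # row A must total P(A)
        ("A'", "B"): b - ab,           # column B must total P(B)
        ("A'", "B'"): 1 - a - b + ab,  # grand total is always 1
    }


def p_a_or_b(p_a: str, p_b: str, p_a_and_b: str) -> Fraction:
    """General addition rule: subtract the doubly counted intersection."""
    return Fraction(p_a) + Fraction(p_b) - Fraction(p_a_and_b)


table = joint_table("0.20", "0.70", "0.15")
for cell, value in table.items():
    print(cell, float(value))
print(float(p_a_or_b("0.20", "0.70", "0.15")))  # 0.75
```

With P(A and B) set to "0.00", the same functions reproduce the table for disjoint events.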

Interpreting the table

Certain things can be determined from the joint probability distribution. Mutually
exclusive events will have a probability of zero. All inclusive events will have a zero
opposite the intersection. All inclusive means that there is nothing outside of those
two events: P(A or B) = 1.

              B                          B'                         Marginal
A             A and B are Mutually       .                          .
              Exclusive if this
              value is 0
A'            .                          A and B are All            .
                                         Inclusive if this
                                         value is 0
Marginal      .                          .                          1.00

"AND" or Intersections

Independent Events

Two events are independent if the occurrence of one does not change the probability
of the other occurring.

An example would be rolling a 2 on a die and flipping a head on a coin. Rolling the 2
does not affect the probability of flipping the head.

If events are independent, then the probability of them both occurring is the product of
the probabilities of each occurring.

Specific Multiplication Rule

Only valid for independent events

$$P(A \text{ and } B) = P(A) \cdot P(B)$$

Example 3:

Given: P(A) = 0.20, P(B) = 0.70, A and B are independent.

              B       B'      Marginal
A            0.14    0.06      0.20
A'           0.56    0.24      0.80
Marginal     0.70    0.30      1.00

The 0.14 is because the probability of A and B is the probability of A times the
probability of B, or 0.20 * 0.70 = 0.14.

Dependent Events

If the occurrence of one event does affect the probability of the other occurring, then
the events are dependent.

Conditional Probability

The probability of event B occurring given that event A has already occurred is read
"the probability of B given A" and is written: P(B|A)

General Multiplication Rule

Always works.

$$P(A \text{ and } B) = P(A) \cdot P(B \mid A)$$

Example 4:

Given: P(A) = 0.20, P(B) = 0.70, P(B|A) = 0.40

A good way to think of P(B|A) is that 40% of A is B. 40% of the 20% which was in
event A is 8%, thus the intersection is 0.08.

              B       B'      Marginal
A            0.08    0.12      0.20
A'           0.62    0.18      0.80
Marginal     0.70    0.30      1.00

Independence Revisited

The following four statements are equivalent

1. A and B are independent events
2. P(A and B) = P(A) * P(B)
3. P(A|B) = P(A)
4. P(B|A) = P(B)

The last two are because if two events are independent, the occurrence of one doesn't
change the probability of the occurrence of the other. This means that the probability
of B occurring, whether A has happened or not, is simply the probability of B
occurring.
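As a rough check of these equivalences, here is a Python sketch that applies statement 2 to the numbers used in the examples (P(A) = 0.20, P(B) = 0.70). The function name `is_independent` is an assumption made for this example.

```python
from fractions import Fraction


def is_independent(p_a: Fraction, p_b: Fraction, p_a_and_b: Fraction) -> bool:
    """Statement 2: independent exactly when P(A and B) = P(A) * P(B)."""
    return p_a_and_b == p_a * p_b


p_a, p_b = Fraction("0.20"), Fraction("0.70")

# Independent case: the intersection is 0.20 * 0.70 = 0.14.
print(is_independent(p_a, p_b, Fraction("0.14")))  # True

# Dependent case: P(B|A) = 0.40, so P(A and B) = P(A) * P(B|A) = 0.08.
p_a_and_b = p_a * Fraction("0.40")
print(float(p_a_and_b))                    # 0.08
print(is_independent(p_a, p_b, p_a_and_b)) # False

# Statement 4 fails too: P(B|A) = 0.40 is not P(B) = 0.70.
print(p_a_and_b / p_a == p_b)              # False
```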


 
Stats: Conditional Probability

Conditional Probability

Recall that the probability of an event occurring given that another event has already
occurred is called a conditional probability.

The probability that event B occurs, given that event A has already occurred, is

$$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$$

This formula comes from the general multiplication rule and a little bit of
algebra.

Since we are given that event A has occurred, we have a reduced sample space.
Instead of the entire sample space S, we now have a sample space of A since we know
A has occurred. So the old rule about being the number in the event divided by the
number in the sample space still applies. It is the number in A and B (must be in A
since A has occurred) divided by the number in A. If you then divide the numerator and
denominator of the right hand side by the number in the sample space S, then you
have the probability of A and B divided by the probability of A.

Examples

Example 1:

The question, "Do you smoke?" was asked of 100 people. Results are shown in the
table.

            Yes     No    Total
Male         19     41      60
Female       12     28      40
Total        31     69     100

- What is the probability of a randomly selected individual being a male who
  smokes? This is just a joint probability. The number of "Male and Smoke"
  divided by the total = 19/100 = 0.19
- What is the probability of a randomly selected individual being a male? This is
  the total for male divided by the total = 60/100 = 0.60. Since no mention is
  made of smoking or not smoking, it includes all the cases.
- What is the probability of a randomly selected individual smoking? Again,
  since no mention is made of gender, this is a marginal probability, the total who
  smoke divided by the total = 31/100 = 0.31.
- What is the probability of a randomly selected male smoking? This time, you're
  told that you have a male -- think of stratified sampling. What is the probability
  that the male smokes? Well, 19 males smoke out of 60 males, so 19/60 =
  0.31666...
- What is the probability that a randomly selected smoker is male? This time,
  you're told that you have a smoker and asked to find the probability that the
  smoker is also male. There are 19 male smokers out of 31 total smokers, so
  19/31 = 0.6129 (approx)
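The five answers can be reproduced from the four inside cells of the table (19 male smokers, 41 male non-smokers, 12 female smokers, 28 female non-smokers). This is a minimal Python sketch; the helper name `count` and the variable names are chosen for this example.

```python
from fractions import Fraction

# Inside cells of the table: (gender, smokes) -> number of people.
counts = {
    ("Male", "Yes"): 19, ("Male", "No"): 41,
    ("Female", "Yes"): 12, ("Female", "No"): 28,
}
total = sum(counts.values())  # 100


def count(gender=None, smokes=None) -> int:
    """Add up the cells that match; None means 'any' (a marginal total)."""
    return sum(n for (g, s), n in counts.items()
               if gender in (None, g) and smokes in (None, s))


p_male_and_smoker = Fraction(count("Male", "Yes"), total)                # joint
p_male = Fraction(count(gender="Male"), total)                           # marginal
p_smoker = Fraction(count(smokes="Yes"), total)                          # marginal
p_smoker_given_male = Fraction(count("Male", "Yes"), count(gender="Male"))
p_male_given_smoker = Fraction(count("Male", "Yes"), count(smokes="Yes"))

print(p_male_and_smoker, p_male, p_smoker)       # 19/100 3/5 31/100
print(p_smoker_given_male, p_male_given_smoker)  # 19/60 19/31
```

The conditional probabilities only change the denominator: the reduced sample space is the row or column you were told about.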

After that last part, you have just worked a Bayes' Theorem problem. I know you
didn't realize it -- that's the beauty of it. A Bayes' problem can be set up so it appears to
be just another conditional probability. In this class we will treat Bayes' problems as
another conditional probability and not involve the large messy formula given in the
text (and every other text).

Example 2:

There are three major manufacturing companies that make a product: Aberations,
Brochmailians, and Chompieliens. Aberations has a 50% market share, and
Brochmailians has a 30% market share. 5% of Aberations' product is defective, 7% of
Brochmailians' product is defective, and 10% of Chompieliens' product is defective.

This information can be placed into a joint probability distribution

Company          Good                   Defective              Total
Aberations       0.50-0.025 = 0.475     0.05(0.50) = 0.025     0.50
Brochmailians    0.30-0.021 = 0.279     0.07(0.30) = 0.021     0.30
Chompieliens     0.20-0.020 = 0.180     0.10(0.20) = 0.020     0.20
Total            0.934                  0.066                  1.00

The percent of the market share for Chompieliens wasn't given, but since the
marginals must add to be 1.00, they have a 20% market share.
Notice that the 5%, 7%, and 10% defective rates don't go into the table directly. This
is because they are conditional probabilities and the table is a joint probability table.
These defective probabilities are conditional upon which company was given. That is,
the 7% is not P(Defective), but P(Defective|Brochmailians). The joint probability
P(Defective and Brochmailians) = P(Defective|Brochmailians) * P(Brochmailians).

The "good" probabilities can be found by subtraction as shown above, or by
multiplication using conditional probabilities. If 7% of Brochmailians' product is
defective, then 93% is good. 0.93(0.30)=0.279.

- What is the probability a randomly selected product is defective? P(Defective)
  = 0.066
- What is the probability that a defective product came from Brochmailians?
  P(Brochmailians|Defective) = P(Brochmailians and Defective) / P(Defective) =
  0.021/0.066 = 7/22 = 0.318 (approx).
- Are these events independent? No. If they were, then
  P(Brochmailians|Defective)=0.318 would have to equal
  P(Brochmailians)=0.30, but it doesn't. Also, P(Aberations and
  Defective)=0.025 would have to be P(Aberations)*P(Defective) =
  0.50*0.066=0.033, and it isn't.

The second question asked above is a Bayes' problem. Again, my point is, you don't
have to know Bayes' formula just to work a Bayes' problem.

Bayes' Theorem

However, just for the sake of argument, let's say that you want to know what Bayes'
formula is.

Let's use the same example, but shorten each event to its one letter initial, i.e., A, B, C,
and D instead of Aberations, Brochmailians, Chompieliens, and Defective.

P(D|B) is not a Bayes problem. This is given in the problem. Bayes' formula finds the
reverse conditional probability P(B|D).

It is based on the fact that the given (D) is made of three parts: the part of D in A, the
part of D in B, and the part of D in C.

$$P(B \mid D) = \frac{P(B \text{ and } D)}{P(A \text{ and } D) + P(B \text{ and } D) + P(C \text{ and } D)}$$

Inserting the multiplication rule for each of these joint probabilities gives

$$P(B \mid D) = \frac{P(D \mid B) \, P(B)}{P(D \mid A) \, P(A) + P(D \mid B) \, P(B) + P(D \mid C) \, P(C)}$$

However, and I hope you agree, it is much easier to take the joint probability divided
by the marginal probability. The table does the adding for you and makes the
problems doable without having to memorize the formulas.
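To see the table doing the adding, here is a Python sketch of the manufacturing example. The dictionary names are assumptions made for this example; the 20% share for company C comes from the marginals having to add to 1.00.

```python
from fractions import Fraction

# Market shares P(company) and conditional defect rates P(D | company).
share = {"A": Fraction("0.50"), "B": Fraction("0.30"), "C": Fraction("0.20")}
defect_rate = {"A": Fraction("0.05"), "B": Fraction("0.07"), "C": Fraction("0.10")}

# Joint probabilities P(company and D) from the general multiplication rule.
joint_defective = {c: defect_rate[c] * share[c] for c in share}

# Marginal P(D) is the column total; Bayes is joint divided by marginal.
p_defective = sum(joint_defective.values())
p_b_given_d = joint_defective["B"] / p_defective

print(float(p_defective))  # 0.066
print(p_b_given_d)         # 7/22
```

Written this way, Bayes' formula is nothing more than one cell of the table divided by its column total.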

 
Stats: Hypothesis Testing

Introduction

Be sure to read through the definitions for this section before trying to make sense out
of the following.

The first thing to do when given a claim is to write the claim mathematically (if
possible), and decide whether the given claim is the null or alternative hypothesis. If
the given claim contains equality, or a statement of no change from the given or
accepted condition, then it is the null hypothesis; otherwise, if it represents change, it
is the alternative hypothesis.

The following example is not a mathematical example, but may help introduce the
concept.

Example

"He's dead, Jim," said Dr. McCoy to Captain Kirk.

Mr. Spock, as the science officer, is put in charge of statistically determining the
correctness of Bones' statement and deciding the fate of the crew member (to vaporize
or try to revive).

His first step is to arrive at the hypothesis to be tested.

Does the statement represent a change in previous condition?

- Yes, there is change, thus it is the alternative hypothesis, H1
- No, there is no change, therefore it is the null hypothesis, H0

The correct answer is that there is change. Dead represents a change from the accepted
state of alive. The null hypothesis always represents no change. Therefore,
the hypotheses are:

- H0 : Patient is alive.
- H1 : Patient is not alive (dead).
States of nature are something that you, as a statistician, have no control over. Either it
is, or it isn't. This represents the true nature of things.

Possible states of nature (based on H0)

- Patient is alive (H0 true - H1 false)
- Patient is dead (H0 false - H1 true)

Decisions are something that you have control over. You may make a correct decision
or an incorrect decision. It depends on the state of nature as to whether your decision
is correct or in error.

Possible decisions (based on H0) / conclusions (based on claim)

- Reject H0 / "Sufficient evidence to say patient is dead"
- Fail to Reject H0 / "Insufficient evidence to say patient is dead"

There are four possibilities that can occur based on the two possible states of nature
and the two decisions which we can make.

Statisticians will never accept the null hypothesis; we will only fail to reject it. In other
words, we'll say that it isn't, or that we don't have enough evidence to say that it isn't,
but we'll never say that it is, because someone else might come along with another
sample which shows that it isn't, and we don't want to be wrong.

                       State of Nature
Decision               H0 True                          H0 False
Reject H0              Patient is alive,                Patient is dead,
                       Sufficient evidence of death     Sufficient evidence of death
Fail to reject H0      Patient is alive,                Patient is dead,
                       Insufficient evidence of death   Insufficient evidence of death

                       State of Nature
Decision               H0 True                          H0 False
Reject H0              Vaporize a live person           Vaporize a dead person
Fail to reject H0      Try to revive a live person      Try to revive a dead person

                       State of Nature
Decision               H0 True                          H0 False
Reject H0              Type I Error (alpha)             Correct Assessment
Fail to reject H0      Correct Assessment               Type II Error (beta)

Which of the two errors is more serious? Type I or Type II?

Since Type I is the more serious error (usually), that is the one we concentrate on. We
usually pick alpha to be very small (0.05, 0.01). Note: alpha is not a Type I error.
Alpha is the probability of committing a Type I error. Likewise, beta is the
probability of committing a Type II error.

  

Conclusions

Conclusions are sentence answers which include whether there is enough evidence or
not (based on the decision), the level of significance, and whether the original claim is
supported or rejected.

Conclusions are based on the original claim, which may be the null or alternative
hypothesis. The decisions are always based on the null hypothesis.

Original claim is H0 ("REJECT"):

- Reject H0 ("SUFFICIENT"): There is sufficient evidence at the alpha level of
  significance to reject the claim that (insert original claim here)
- Fail to reject H0 ("INSUFFICIENT"): There is insufficient evidence at the alpha
  level of significance to reject the claim that (insert original claim here)

Original claim is H1 ("SUPPORT"):

- Reject H0 ("SUFFICIENT"): There is sufficient evidence at the alpha level of
  significance to support the claim that (insert original claim here)
- Fail to reject H0 ("INSUFFICIENT"): There is insufficient evidence at the alpha
  level of significance to support the claim that (insert original claim here)

Stats: Measures of Variation

Range

The range is the simplest measure of variation to find. It is simply the highest value
minus the lowest value.

$$\text{Range} = \text{Maximum} - \text{Minimum}$$

Since the range only uses the largest and smallest values, it is greatly affected by
extreme values; that is, it is not resistant to change.

Variance

"Average Deviation"

The range only involves the smallest and largest numbers, and it would be desirable to
have a statistic which involved all of the data values.

The first attempt one might make at this is something they might call the average
deviation from the mean and define it as:

$$\frac{\sum (x - \mu)}{N}$$

The problem is that this summation is always zero. So, the average deviation will
always be zero. That is why the average deviation is never used.

Population Variance

So, to keep it from being zero, the deviation from the mean is squared and called the
"squared deviation from the mean". This "average squared deviation from the mean"
is called the variance.

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

Unbiased Estimate of the Population Variance

One would expect the sample variance to simply be the population variance with the
population mean replaced by the sample mean. However, one of the major uses of
statistics is to estimate the corresponding parameter. This formula has the problem
that the estimated value isn't, on average, the same as the parameter. To counteract
this, the sum of the squares of the deviations is divided by one less than the sample size.

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Standard Deviation

There is a problem with variances. Recall that the deviations were squared. That
means that the units were also squared. To get the units back the same as the original
data values, the square root must be taken.

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \qquad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

The sample standard deviation is not the unbiased estimator for the population
standard deviation.

The calculator does not have a variance key on it. It does have a standard deviation
key. You will have to square the standard deviation to find the variance.

Sum of Squares (shortcuts)

The sum of the squares of the deviations from the mean is given a shortcut notation
and several alternative formulas.

$$SS(x) = \sum (x - \bar{x})^2$$

A little algebraic simplification returns:

$$SS(x) = \sum x^2 - \frac{\left(\sum x\right)^2}{n}$$

What's wrong with the first formula, you ask? Consider the following example. The
last row contains the totals for the columns.

1. Total the data values: 23
2. Divide by the number of values to get the mean: 23/5 = 4.6
3. Subtract the mean from each value to get the numbers in the second column.
4. Square each number in the second column to get the values in the third column.
5. Total the numbers in the third column: 5.2
6. Divide this total by one less than the sample size to get the variance: 5.2 / 4 =
   1.3

 x     x - mean            (x - mean)^2
 4     4 - 4.6 = -0.6      (-0.6)^2 = 0.36
 5     5 - 4.6 =  0.4      ( 0.4)^2 = 0.16
 3     3 - 4.6 = -1.6      (-1.6)^2 = 2.56
 6     6 - 4.6 =  1.4      ( 1.4)^2 = 1.96
 5     5 - 4.6 =  0.4      ( 0.4)^2 = 0.16
23     0.00 (Always)       5.2

Not too bad, you think. But this can get pretty bad if the sample mean doesn't happen
to be a "nice" rational number. Think about having a mean of 19/7 =
2.714285714285... Those subtractions get nasty, and when you square them, they're
really bad. Another problem with the first formula is that it requires you to know the
mean ahead of time. For a calculator, this would mean that you have to save all of the
numbers that were entered. The TI-82 does this, but most scientific calculators don't.

Now, let's consider the shortcut formula. The only things that you need to find are the
sum of the values and the sum of the values squared. There is no subtraction and no
decimals or fractions until the end. The last row contains the sums of the columns, just
like before.

1. Record each number in the first column and the square of each number in the
   second column.
2. Total the first column: 23
3. Total the second column: 111
4. Compute the sum of squares: 111 - 23*23/5 = 111 - 105.8 = 5.2
5. Divide the sum of squares by one less than the sample size to get the variance =
   5.2 / 4 = 1.3

 x     x^2
 4     16
 5     25
 3      9
 6     36
 5     25
23    111
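Both routes to the sum of squares can be compared side by side in a short Python sketch. The variable names are chosen for this example; `Fraction` is used so that the two results can be compared exactly rather than as rounded decimals.

```python
from fractions import Fraction

data = [4, 5, 3, 6, 5]
n = len(data)

# First formula: subtract the mean from each value, square, and total.
mean = Fraction(sum(data), n)                      # 23/5 = 4.6
ss_definition = sum((x - mean) ** 2 for x in data)

# Shortcut: only the sum of the values and the sum of their squares.
sum_x = sum(data)                                  # 23
sum_x_squared = sum(x * x for x in data)           # 111
ss_shortcut = sum_x_squared - Fraction(sum_x ** 2, n)

variance = ss_shortcut / (n - 1)
print(float(ss_definition), float(ss_shortcut))    # 5.2 5.2
print(float(variance))                             # 1.3
```

The shortcut needs only two running totals, which is why a calculator can use it without storing every value entered.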

Chebyshev's Theorem

The proportion of the values that fall within k standard deviations of the mean will be
at least $1 - 1/k^2$, where k is any number greater than 1.

"Within k standard deviations" is interpreted as the interval from $\bar{x} - ks$ to $\bar{x} + ks$.

Chebyshev's Theorem is true for any sample set, no matter what the distribution.
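A quick way to get a feel for the bound is to compute it for a few values of k and compare it with what a small data set actually does. This is only a sketch: the function names are invented for this example, and the data set is the five values used in the variance example.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum proportion of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def proportion_within(data: list[float], k: float) -> float:
    """Observed proportion of values between mean - k*s and mean + k*s."""
    n = len(data)
    mean = sum(data) / n
    s = (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5  # sample std dev
    return sum(1 for x in data if abs(x - mean) <= k * s) / n


print(chebyshev_lower_bound(2))  # 0.75
print(chebyshev_lower_bound(3))  # about 0.889

data = [4, 5, 3, 6, 5]
# The observed proportion is never below the guaranteed minimum.
print(proportion_within(data, 1.5) >= chebyshev_lower_bound(1.5))  # True
```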

Empirical Rule

The empirical rule is only valid for bell-shaped (normal) distributions. The following
statements are true.

- Approximately 68% of the data values fall within one standard deviation of the
  mean.
- Approximately 95% of the data values fall within two standard deviations of
  the mean.
- Approximately 99.7% of the data values fall within three standard deviations of
  the mean.

The empirical rule will be revisited later in the chapter on normal probabilities.

TI-82
You may use the TI-82 to find the measures of central tendency and the measures of
variation using the list handling capabilities of the calculator.
