
Probability theory

Much inspired by the presentation of Krenn and Samuelsson
Three views of probability
Frequentist
Mathematical
Bayesian (knowledge-based)
Sample space
A universe of elementary outcomes. In
elementary treatments, we pretend that we
can come up with sets of equiprobable
outcomes (dice, coins, ...). Outcomes are
very small.
An event is a set of those outcomes. Events
are bigger than outcomes -- more
interesting.
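Sets model outcomes and events directly. Here is a minimal sketch in Python. It assumes a single roll of a fair six-sided die as the sample space, and the variable names are illustrative, not from the slides:

```python
# Sample space for one roll of a die: six elementary outcomes.
sample_space = {1, 2, 3, 4, 5, 6}

# Events are sets of outcomes (subsets of the sample space).
even = {2, 4, 6}
at_least_five = {5, 6}

# Events combine with ordinary set operations.
even_or_big = even | at_least_five    # union
even_and_big = even & at_least_five   # intersection
not_even = sample_space - even        # complement

# Every event must be contained in the sample space.
assert even <= sample_space and at_least_five <= sample_space
print(sorted(even_or_big), sorted(even_and_big), sorted(not_even))
# [2, 4, 5, 6] [6] [1, 3, 5]
```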
Probability measure
Every event (=set of outcomes) is assigned
a probability, by a function we call a
probability measure.
The probability of every set is between 0
and 1, inclusive.
The probability of the whole set of
outcomes is 1.
If A and B are two events with no common
outcomes, then the probability of their
union is the sum of their probabilities.
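For a small finite sample space, the three properties can be checked exhaustively. This is a sketch that assumes a fair die, so every outcome is equiprobable and the measure just counts outcomes. `prob` is a hypothetical helper name:

```python
from fractions import Fraction
from itertools import combinations

sample_space = frozenset({1, 2, 3, 4, 5, 6})

def prob(event):
    """Uniform probability measure: each equiprobable outcome counts 1/6."""
    return Fraction(len(event & sample_space), len(sample_space))

# Enumerate every event: all 2**6 = 64 subsets of the sample space.
events = [frozenset(c)
          for r in range(len(sample_space) + 1)
          for c in combinations(sorted(sample_space), r)]

# Every event has probability between 0 and 1, inclusive.
assert all(0 <= prob(e) <= 1 for e in events)
# The whole set of outcomes has probability 1.
assert prob(sample_space) == 1
# For events with no common outcomes, the union's probability is the sum.
assert all(prob(a | b) == prob(a) + prob(b)
           for a in events for b in events if not (a & b))
print(len(events), "events checked")
```

Exact `Fraction` arithmetic is used so the equality checks are not disturbed by floating-point rounding.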
Cards
Our universe of outcomes is single card
draws from a standard 52-card deck.
Events: a red card (1/2); a jack (1/13).
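Both probabilities follow from counting cards. Here is a sketch that assumes a standard 52-card deck, with each card represented as a hypothetical `(rank, suit)` tuple:

```python
from fractions import Fraction
from itertools import product

RANKS = [str(n) for n in range(2, 11)] + ["J", "Q", "K", "A"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(RANKS, SUITS))  # 52 (rank, suit) outcomes

def prob(predicate):
    """Fraction of the equiprobable outcomes (cards) that satisfy the event."""
    hits = sum(1 for card in deck if predicate(card))
    return Fraction(hits, len(deck))

p_red = prob(lambda card: card[1] in ("hearts", "diamonds"))
p_jack = prob(lambda card: card[0] == "J")
print(p_red, p_jack)  # 1/2 1/13
```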
Other things to remember
The probability that event P will not happen
(=event ~P will happen) is 1-prob(P).
The probability of the empty event (no outcomes at all) is 0.
$$p(A \cup B) = p(A) + p(B) - p(A \cap B)$$
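All three facts can be checked on the deck. This sketch assumes a standard 52-card deck and uses "red or jack" as the union. The two jacks that are also red must not be counted twice:

```python
from fractions import Fraction
from itertools import product

RANKS = [str(n) for n in range(2, 11)] + ["J", "Q", "K", "A"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(RANKS, SUITS))

def p(event):
    """Uniform probability measure on the deck."""
    return Fraction(len(event), len(deck))

red = {c for c in deck if c[1] in ("hearts", "diamonds")}
jacks = {c for c in deck if c[0] == "J"}

assert p(deck - red) == 1 - p(red)   # complement rule
assert p(set()) == 0                 # empty event

direct = p(red | jacks)                          # count the union directly
formula = p(red) + p(jacks) - p(red & jacks)     # 1/2 + 1/13 - 1/26
assert direct == formula
print(direct)  # 7/13
```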
Independence (definition)
Two events A and B are independent if the
probability of $A \cap B$ equals the probability
of A times the probability of B:
$p(A \cap B) = p(A)\,p(B)$.
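On a standard 52-card deck (assumed here), "red" and "jack" turn out to be independent. "Red" and "hearts" are not, because knowing a card is a heart settles that it is red. `independent` is a hypothetical helper:

```python
from fractions import Fraction
from itertools import product

RANKS = [str(n) for n in range(2, 11)] + ["J", "Q", "K", "A"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(RANKS, SUITS))

def p(event):
    return Fraction(len(event), len(deck))

def independent(a, b):
    """Definition: p(A and B) == p(A) * p(B)."""
    return p(a & b) == p(a) * p(b)

red = {c for c in deck if c[1] in ("hearts", "diamonds")}
jacks = {c for c in deck if c[0] == "J"}
hearts = {c for c in deck if c[1] == "hearts"}

print(independent(red, jacks))   # True:  1/26 == 1/2 * 1/13
print(independent(red, hearts))  # False: 1/4  != 1/2 * 1/4
```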
Conditional probability
This means: what's the probability of A if I
already know B is true?
$$p(A \mid B) = \frac{p(A \text{ and } B)}{p(B)} = \frac{p(A \cap B)}{p(B)}$$
Probability of A given B.
p(A) is the prior probability; p(A|B) is called
a posterior probability. Once you know B is
true, the universe you care about shrinks to
B.
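The shrinking-universe picture can be checked on a standard 52-card deck (assumed here). It asks how likely a jack is once we know the card is a face card (J, Q or K). `cond` is a hypothetical helper name:

```python
from fractions import Fraction
from itertools import product

RANKS = [str(n) for n in range(2, 11)] + ["J", "Q", "K", "A"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(RANKS, SUITS))

def p(event):
    return Fraction(len(event), len(deck))

def cond(a, b):
    """p(A | B) = p(A and B) / p(B); undefined when p(B) == 0."""
    if p(b) == 0:
        raise ValueError("conditioning event has probability 0")
    return p(a & b) / p(b)

jacks = {c for c in deck if c[0] == "J"}
faces = {c for c in deck if c[0] in ("J", "Q", "K")}

print(p(jacks), cond(jacks, faces))  # 1/13 1/3: prior vs. posterior

# Same answer by shrinking the universe to B and counting inside it.
assert cond(jacks, faces) == Fraction(len(jacks & faces), len(faces))
```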

Bayes' rule
prob(A and B) = prob(B and A); so, just using
the definition of prob(X|Y),
prob(A|B) prob(B) = prob(B|A) prob(A);
hence
$$\mathrm{prob}(A \mid B) = \frac{\mathrm{prob}(B \mid A)\,\mathrm{prob}(A)}{\mathrm{prob}(B)}$$
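As a sanity check, Bayes' rule can be compared with counting on a standard 52-card deck (assumed here). It computes prob(hearts | red) from prob(red | hearts), which is 1, and then counts the hearts among the red cards directly:

```python
from fractions import Fraction
from itertools import product

RANKS = [str(n) for n in range(2, 11)] + ["J", "Q", "K", "A"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(RANKS, SUITS))

def p(event):
    return Fraction(len(event), len(deck))

def cond(a, b):
    """prob(A | B) straight from the definition."""
    return p(a & b) / p(b)

red = {c for c in deck if c[1] in ("hearts", "diamonds")}
hearts = {c for c in deck if c[1] == "hearts"}

# Bayes' rule: prob(A|B) = prob(B|A) * prob(A) / prob(B), with A = hearts, B = red.
via_bayes = cond(red, hearts) * p(hearts) / p(red)   # 1 * 1/4 / (1/2)
direct = cond(hearts, red)                           # 13 hearts among 26 red cards

assert via_bayes == direct
print(via_bayes)  # 1/2
```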
Bayes' rule as scientific reasoning
A hypothesis H which is supported by a set
of data D merits our belief to the degree
that:
1. We believed H before we learned about
D;
2. H predicts data D; and
3. D is unlikely.
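In terms of Bayes' rule, the three conditions are a high prior prob(H), a high likelihood prob(D|H), and a low prob(D). The sketch below moves each factor in turn, starting from a baseline. All numbers are hypothetical, chosen only so that prob(D|H) prob(H) never exceeds prob(D):

```python
from fractions import Fraction

def posterior(prior, likelihood, evidence):
    """Bayes' rule: prob(H|D) = prob(D|H) * prob(H) / prob(D)."""
    if not 0 < evidence <= 1:
        raise ValueError("prob(D) must be in (0, 1]")
    return likelihood * prior / evidence

# Hypothetical baseline, for illustration only.
base = {"prior": Fraction(1, 10), "likelihood": Fraction(1, 2), "evidence": Fraction(1, 5)}

print(posterior(**base))                                    # 1/4
print(posterior(**{**base, "prior": Fraction(1, 5)}))       # 1/2: believed H more beforehand
print(posterior(**{**base, "likelihood": Fraction(4, 5)}))  # 2/5: H predicts D better
print(posterior(**{**base, "evidence": Fraction(1, 10)}))   # 1/2: D is less likely overall
```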
A random variable
a.k.a. stochastic variable.
A random variable isn't a variable. It's a
function. It maps from the sample space to
the real numbers. This is a convenience: it is
our way of translating events (whatever
they are) to numbers.
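A random variable is literally a function, so it can be written as one. This sketch assumes two coin tosses as the sample space, with a hypothetical random variable `X` that counts heads. An event such as "X = 1" is then the set of outcomes that `X` maps to 1:

```python
from itertools import product

# Sample space: the four outcomes of tossing a coin twice.
sample_space = ["".join(t) for t in product("HT", repeat=2)]

def X(outcome: str) -> int:
    """Random variable: maps an outcome to a real number (the number of heads)."""
    return outcome.count("H")

print({o: X(o) for o in sample_space})
# {'HH': 2, 'HT': 1, 'TH': 1, 'TT': 0}

# The event "X = 1", recovered as the outcomes that X maps to 1.
print([o for o in sample_space if X(o) == 1])  # ['HT', 'TH']
```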
Distribution function
The distribution function F takes a real number x
as its input, finds all those outcomes in the
sample space that the random variable maps onto x
or anything less than x, and returns their total
probability: $F(x) = P(X \le x)$.
For a die, F(0) = 0; F(1) = 1/6; F(2) = 1/3;
F(3) = 1/2; F(4) = 2/3; F(5) = 5/6; and F(6)
= F(7) = 1.
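The die values above can be reproduced directly. This sketch assumes a fair die, with the random variable being the face shown. Evaluating F between faces shows that it is a step function:

```python
from fractions import Fraction

FACES = [1, 2, 3, 4, 5, 6]  # fair die; X(outcome) = the face shown

def F(x):
    """Distribution function: total probability of outcomes with X <= x."""
    return Fraction(sum(1 for face in FACES if face <= x), len(FACES))

print([str(F(x)) for x in range(8)])
# ['0', '1/6', '1/3', '1/2', '2/3', '5/6', '1', '1']
print(F(2.5))  # 1/3: F stays flat between faces
```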
Discrete and continuous distribution functions
If the set of values that the distribution
function takes on is finite, or countable,
then the random variable (which isn't a
variable, it's a function) is discrete; otherwise
it's continuous (also, it ought to be mostly
differentiable).
Distribution function aggregates
It's a little bit counterintuitive, in a way. What
about a function P for a die that tells us that
P(1) = 1/6, P(2) = 1/6, ..., P(6) = 1/6?
That's a frequency function, or probability
function. We'll use the letter f for this. For the
case of continuous variables, we don't want to
ask what the probability of a single exact value
(such as 1/6) is, because the answer is always 0.
Rather, we ask what's the probability that
the value is in the interval (a,b) -- that's OK.
So for continuous variables, we care about
the derivative of the distribution function at
a point (that's the derivative of an integral,
after all...). This is called a probability
density function. The probability that a
random variable has a value in a set A is the
integral of the p.d.f. over that set A.
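Numerically, integrating the density over (a, b) gives F(b) - F(a), differentiating F gives back the density, and the interval shrunk to a single point has probability 0. The sketch assumes an exponential distribution with rate 1, which does not appear in the slides and is chosen only because F and f have simple closed forms. It also uses the trapezoid rule, chosen here for simplicity:

```python
import math

# Assumed example: exponential distribution with rate 1.
def F(x):
    """Distribution function: P(X <= x)."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def f(x):
    """Probability density function: the derivative of F."""
    return math.exp(-x) if x >= 0 else 0.0

def integrate(g, a, b, n=10_000):
    """Composite trapezoid rule on [a, b] with n panels."""
    h = (b - a) / n
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n)))

a, b = 0.5, 2.0
print(integrate(f, a, b), F(b) - F(a))           # both about 0.4712
x, dx = 1.0, 1e-6
print((F(x + dx) - F(x - dx)) / (2 * dx), f(x))  # both about 0.3679
print(integrate(f, 1.0, 1.0))                    # 0.0: a single point has no probability
```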
Frequency function f
The values of the frequency function f must
add up to 1!
The integral of the probability density
function must be 1.
A set of nonnegative numbers that adds up to 1
is called a distribution.
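For a discrete variable, the frequency function is the size of each step in the distribution function, f(k) = F(k) - F(k-1). Those steps are nonnegative and add up to 1. A sketch assuming a fair die:

```python
from fractions import Fraction

FACES = [1, 2, 3, 4, 5, 6]  # fair die

def F(x):
    """Distribution function: P(X <= x)."""
    return Fraction(sum(1 for face in FACES if face <= x), len(FACES))

# Frequency function recovered from F: the height of each step.
f = {k: F(k) - F(k - 1) for k in FACES}

assert all(v >= 0 for v in f.values())
assert sum(f.values()) == 1
print(f[3], sum(f.values()))  # 1/6 1
```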
Means that have nothing to do
with meaning
The mean is the average; in everyday terms, we
add all the values and divide by the number of
items. The symbol is 'E', for 'expected' (why is the
mean expected? What else would you expect?)
Since the frequency function f tells you what
fraction of the outcomes have any particular value,
the mean is
$$E(X) = \sum_i x_i\, f(x_i)$$
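The weighted sum is a one-liner. The sketch compares a fair die with a hypothetical loaded die, whose probabilities are made up for illustration (faces 1 to 5 at 1/10 each, face 6 at 1/2):

```python
from fractions import Fraction

def mean(f):
    """E(X) = sum over values x of x * f(x)."""
    return sum(x * px for x, px in f.items())

fair = {x: Fraction(1, 6) for x in range(1, 7)}
# Hypothetical loaded die, for illustration only.
loaded = {**{x: Fraction(1, 10) for x in range(1, 6)}, 6: Fraction(1, 2)}

print(mean(fair), mean(loaded))  # 7/2 9/2
```

For the fair die this is the everyday average (1 + 2 + ... + 6) / 6. For the loaded die, the extra weight on 6 pulls the mean up.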
Weight a moment...
The mean is the first moment; the second moment
about the mean is the variance, which tells you how
much the random variable jiggles. It's the sum of the
differences from the mean (square those differences
so they're positive), each weighted by f. The square
root of this is the standard deviation. (We don't
divide by N here; that's inside the f-function,
remember?)
$$\sigma^2 = \sum_i (x_i - \mu)^2\, f(x_i), \qquad \mu = E(X)$$
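For a fair die (assumed here), the variance and standard deviation follow directly from the formula. No division by N appears, because the weights f(x) = 1/6 already take care of it:

```python
import math
from fractions import Fraction

f = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die

mu = sum(x * px for x, px in f.items())               # first moment
var = sum((x - mu) ** 2 * px for x, px in f.items())  # weighted squared deviations
sd = math.sqrt(var)                                   # standard deviation

print(mu, var, round(sd, 4))  # 7/2 35/12 1.7078
```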
Particular probability
distributions:
Binomial
Gaussian, also known as normal
Poisson
Binomial distribution
If we run an experiment n times
(independently: simultaneous or not, we
don't care), and we care only about how
many times altogether a particular outcome
occurs -- that's a binomial distribution, with
2 parameters: the probability p of that
outcome on a single trial, and n the number
of trials.
If you toss a coin 4 times, what's the
probability that you'll get 3 heads?
If you draw a card 5 times (with
replacement), what's the probability that
you'll get exactly 1 ace?
If you generate words randomly, what's the
probability that you'll have two the's in the
first 10 words?
In general, the probability of exactly k
occurrences in n trials (with q = 1 - p) is
$$\binom{n}{k} p^k q^{n-k} = \frac{n!}{k!\,(n-k)!}\, p^k q^{n-k}$$
Normal or Gaussian distribution
Start off with something simple, like this:
$$e^{-x^2}$$
That's symmetric around the y-axis (negative and
positive x are treated the same way): if x = 0, then the
value is 1, and it slides to 0 as you go off to infinity,
either positive or negative.
Gaussian or normal distribution
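Well, x's average can be something other
than 0: it can be any old $\mu$:
$$e^{-(x-\mu)^2}$$
And its variance ($\sigma^2$) can be any positive
number, which puts $2\sigma^2$ in the denominator of
the exponent (the curve above corresponds to $\sigma^2 = 1/2$):
$$e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$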
And then normalize--
so that it all adds up (integrates, really) to 1,
we have to divide by a normalizing factor of $\sigma\sqrt{2\pi}$:
$$\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
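A numerical check of the normalized formula: for a few choices of $\mu$ and $\sigma$ (picked arbitrarily), the area under the curve should come out as 1, the mean as $\mu$, and the variance as $\sigma^2$. This is a sketch using the trapezoid rule over $\mu \pm 10\sigma$, on the assumption that the tails beyond that range are negligible:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def integrate(g, a, b, n=20_000):
    """Composite trapezoid rule on [a, b] with n panels."""
    h = (b - a) / n
    return h * (0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n)))

results = []
for mu, sigma in [(0.0, 1.0), (3.0, 0.5), (-2.0, 4.0)]:
    pdf = lambda x, mu=mu, sigma=sigma: normal_pdf(x, mu, sigma)
    lo, hi = mu - 10 * sigma, mu + 10 * sigma
    area = integrate(pdf, lo, hi)
    mean = integrate(lambda x: x * pdf(x), lo, hi)
    var = integrate(lambda x: (x - mean) ** 2 * pdf(x), lo, hi)
    results.append((mu, sigma, area, mean, var))
    print(f"mu={mu}, sigma={sigma}: area={area:.6f}, mean={mean:.6f}, var={var:.6f}")
```

Dropping the $1/(\sigma\sqrt{2\pi})$ factor from `normal_pdf` makes the area come out as $\sigma\sqrt{2\pi}$ instead of 1, which is why the factor is needed.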