
Business Econometrics using SAS Tools (BEST)

Class IV: Probability Refresher

Probability

Quantifying randomness
The context: An experiment that admits several
possible outcomes
Some outcome will occur
The observer is uncertain which (or what) before
the experiment takes place
Event space = the set of possible outcomes. (Also
called the sample space.)
Probability = a measure of likelihood attached to
the events in the event space. (Try to define
probability without using a word that means
probability.)

Rules (Axioms) of
Probability
An event E will occur or not occur
P(E) is a number that equals the probability
that E will occur.
By convention, 0 ≤ P(E) ≤ 1.
E' = the event that E does not occur
P(E') = the probability that E does not
occur.

Essential Results for Probability
If P(E) = 0, then E cannot (will not) occur.
If P(E) = 1, then E must (will) occur.
E and E' are exhaustive: either E or E' will occur.
Something will occur: P(E) + P(E') = 1.
Only one thing can occur. If E occurs, then E' will not occur: E and E' are exclusive.

Joint Events
Pairs (or groups) of events: A and B
One or the other occurs: A or B = A ∪ B
Both events occur: A and B = A ∩ B
Independent events: Occurrence of A does not affect the probability of B
An addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
The product rule for independent events: P(A ∩ B) = P(A)P(B)

Application
Survey of 27,326 German individuals over 5 years. The first table gives frequencies, the second gives sample proportions. E.g., .04186 = 1144/27326, .52123 = 14243/27326.

Frequencies
              Female      Male     Total
Uninsured       1144      1979      3123
Insured        11939     12264     24203
Total          13083     14243     27326

Sample proportions
              Female      Male     Total
Uninsured     .04186    .07242    .11429
Insured       .43691    .44880    .88571
Total         .47877    .52123   1.00000

The Addition Rule Application
Survey of 27,326 German individuals over 5 years

              Female      Male     Total
Uninsured     .04186    .07242    .11429
Insured       .43691    .44880    .88571
Total         .47877    .52123   1.00000

An individual is drawn randomly from the sample of 27,326 observations.
P(Female or Insured) = P(Female) + P(Insured) - P(Female and Insured)
                     = .47877 + .88571 - .43691
                     = .92757
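
The addition rule calculation can be reproduced from the frequency counts in the survey table. This is a minimal Python sketch; the variable names are illustrative and do not come from the slides.

```python
# Frequency counts from the survey table (27,326 observations).
counts = {
    ("Female", "Uninsured"): 1144,
    ("Male", "Uninsured"): 1979,
    ("Female", "Insured"): 11939,
    ("Male", "Insured"): 12264,
}
n = sum(counts.values())

# Marginal and joint sample proportions.
p_female = sum(c for (sex, _), c in counts.items() if sex == "Female") / n
p_insured = sum(c for (_, ins), c in counts.items() if ins == "Insured") / n
p_female_and_insured = counts[("Female", "Insured")] / n

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_female_or_insured = p_female + p_insured - p_female_and_insured

# Cross-check: "Female or Insured" is everyone except uninsured males.
p_complement = 1 - counts[("Male", "Uninsured")] / n

print(round(p_female_or_insured, 5), round(p_complement, 5))
```

Computed from the unrounded counts, the result is .92758. The .92757 on the slide comes from adding three proportions that were each already rounded to five decimals.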

Independent Events
Events are independent if the occurrence of one
does not affect probabilities related to the other.
Events A and B are independent if P(A|B) = P(A).
I.e., conditioning on B does not affect the
probability of A.
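
As an illustration of the definition, the sketch below compares P(Insured) with P(Insured|Female) using the counts from the survey table. It only compares sample proportions and is not a formal test of independence; the variable names are illustrative.

```python
# Counts from the survey table.
female_insured, female_total = 11939, 13083
insured_total, n = 24203, 27326

p_insured = insured_total / n                           # marginal P(Insured)
p_insured_given_female = female_insured / female_total  # conditional P(Insured | Female)

# Under independence the two proportions would be (approximately) equal.
gap = p_insured_given_female - p_insured
print(round(p_insured, 5), round(p_insured_given_female, 5), round(gap, 5))
```

In this sample the conditional proportion is higher than the marginal one, so conditioning on Female changes the probability of Insured, and the two events do not look independent.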

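Using Conditional Probabilities: Bayes Theorem

Target: P(A|B)

$$
\begin{aligned}
P(A \mid B) &= \frac{P(A,B)}{P(B)} && \text{(Definition)} \\
            &= \frac{P(B \mid A)P(A)}{P(B)} && \text{(Theorem)} \\
            &= \frac{P(B \mid A)P(A)}{P(A,B) + P(\text{not}A,B)} \\
            &= \frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid \text{not}A)P(\text{not}A)} && \text{(Computation)}
\end{aligned}
$$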

Drug Testing

Notation
+ = test indicates disease, - = test indicates no disease
D = presence of disease, N = absence of disease
Known Data
P(Disease) = P(D) = .005 (Fairly rare) (Incidence)
P(Test correctly indicates disease) = P(+|D) = .98 (Sensitivity)
(Correct detection of the disease)
P(Test correctly indicates absence) = P(-|N) = .95 (Specificity)
(Correct failure to detect the disease)
Objectives: Deduce
P(D|+) (Probability disease really is present | test positive)
P(N|-) (Probability disease really is absent | test negative)
Note, P(D|+) = the probability that a patient actually has the disease
when the test says they do.

More Information
Deduce: Since P(+|D) = .98, we know P(-|D) = .02 because P(-|D) + P(+|D) = 1.
[P(-|D) is P(False negative).]
Deduce: Since P(-|N) = .95, we know P(+|N) = .05 because P(-|N) + P(+|N) = 1.
[P(+|N) is P(False positive).]
Deduce: Since P(D) = .005, P(N) = .995 because P(D) + P(N) = 1.

Now, Use Bayes Theorem
We have P(+|D) = .98. What is P(D|+)?

$$
P(D \mid +) = \frac{P(D \text{ and } +)}{P(+)} = \frac{P(+ \mid D)P(D)}{P(+)} \quad \text{(By Bayes Theorem)}
$$

P(+) = P(D and +) + P(N and +) = P(+|D)P(D) + P(+|N)P(N), so

$$
P(D \mid +) = \frac{P(+ \mid D)P(D)}{P(+)} = \frac{P(+ \mid D)P(D)}{P(+ \mid D)P(D) + P(+ \mid N)P(N)} = \frac{.98(.005)}{.98(.005) + .05(.995)} = 0.08966 \quad \text{(Yikes!!)}
$$

Using the same approach, P(N|-) = .95(.995) / [.95(.995) + .02(.005)] = 0.999894
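
The Bayes theorem computation can be written as a small reusable function. This is a sketch under the assumption of a two-way partition (disease / no disease); the function name `posterior` and its parameter names are illustrative, not part of the slides.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """P(H | evidence) by Bayes theorem with the partition (H, not H)."""
    numerator = p_evidence_given_h * prior
    denominator = numerator + p_evidence_given_not_h * (1.0 - prior)
    return numerator / denominator

p_d = 0.005          # incidence, P(D)
sensitivity = 0.98   # P(+|D)
specificity = 0.95   # P(-|N)

# P(D|+): evidence is a positive test; the false positive rate is 1 - specificity.
p_d_given_pos = posterior(p_d, sensitivity, 1.0 - specificity)

# P(N|-): hypothesis is "no disease"; the false negative rate is 1 - sensitivity.
p_n_given_neg = posterior(1.0 - p_d, specificity, 1.0 - sensitivity)

print(round(p_d_given_pos, 5))  # 0.08966
print(round(p_n_given_neg, 6))
```

Raising the `p_d` argument shows how strongly P(D|+) depends on the incidence: the low value here is driven by the disease being rare, not by a poor test.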

Expected Value - A Risky Business Venture
4 Alternative Projects: Success depends on economic conditions, which cannot be forecasted perfectly.

                   Boom     Recession    Expected
(Probability)     (90%)       (10%)       Value
Beer            -10,000     +12,000      -7,800
Fine Wine       +20,000      -8,000     +17,200
Both            +10,000      +4,000      +9,400
T-bill           +3,000      +3,000      +3,000
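
The expected values in the table are probability-weighted averages of the two payoffs. A minimal Python sketch follows; the dictionary layout is an illustrative choice.

```python
p_boom, p_recession = 0.90, 0.10

# Payoffs (boom, recession) for each project, taken from the table.
payoffs = {
    "Beer": (-10_000, 12_000),
    "Fine Wine": (20_000, -8_000),
    "Both": (10_000, 4_000),
    "T-bill": (3_000, 3_000),
}

# E[payoff] = P(boom) * payoff in boom + P(recession) * payoff in recession.
expected = {
    name: p_boom * boom + p_recession * recession
    for name, (boom, recession) in payoffs.items()
}
best = max(expected, key=expected.get)

for name, value in expected.items():
    print(f"{name:10s} {value:+10,.0f}")
```

Ranking by expected value alone picks Fine Wine, but that ignores risk: Fine Wine loses money in a recession, while Both and T-bill never do.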

Actuarially Fair Insurance

Insurance policy
You pay premium = F
If you collect on the policy, the payout = W
Probability they pay you = P
Expected profit to them is E[Profit] = F - P x W > 0 if F/W > P
When is insurance fair? When E[Profit] = 0, i.e., F = P x W.
Applications
Automobile deductible
Consumer product warranties
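
A short sketch of the fair-premium condition E[Profit] = F - P x W = 0. The payout and claim probability used here are hypothetical numbers chosen for illustration; they do not come from the slides.

```python
def expected_profit(premium, payout, p_claim):
    """Insurer's expected profit: E[Profit] = F - P x W."""
    return premium - p_claim * payout

def fair_premium(payout, p_claim):
    """Premium at which E[Profit] = 0, i.e. F = P x W."""
    return p_claim * payout

# Hypothetical policy: pays 10,000 with probability 0.02.
W, P = 10_000, 0.02
F_fair = fair_premium(W, P)

print(F_fair, expected_profit(F_fair, W, P), expected_profit(250, W, P))
```

Any premium above the fair value gives the insurer a positive expected profit, which is the F/W > P condition.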

Litigation Risk Analysis


Form probability tree for decisions and
outcomes
Determine conditional expected payoffs
(gains or losses)
Choose strategy to optimize expected value of payoff function (minimize loss or maximize (net) gain).

Litigation Risk Analysis: Using Probabilities to Determine a Strategy

P(Upper path) = P(Causation|Liability,Document)P(Liability|Document)P(Document)
              = P(Causation,Liability|Document)P(Document)
              = P(Causation,Liability,Document)
              = .7(.6)(.4) = .168. (Similarly for lower path, probability = .5(.3)(.6) = .09.)

Two paths to a favorable outcome. Probability = (upper) .7(.6)(.4) + (lower) .5(.3)(.6) = .168 + .09 = .258.
How can I use this to decide whether to litigate or not?
Suppose the cost to litigate = $1,000,000 and a favorable outcome pays
$3,000,000.
What should you do?
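
One way to approach the question is to compare the expected net payoff of litigating with the payoff of not litigating. The sketch below makes three assumptions that the slides do not state: the $1,000,000 cost is paid whatever the outcome, not litigating pays 0, and the decision maker is risk neutral (cares only about expected value).

```python
# Path probabilities from the probability tree.
p_upper = 0.7 * 0.6 * 0.4
p_lower = 0.5 * 0.3 * 0.6
p_win = p_upper + p_lower

cost = 1_000_000    # cost to litigate (assumed paid regardless of outcome)
payout = 3_000_000  # payment if the outcome is favorable

# Expected net gain from litigating; not litigating is assumed to pay 0.
expected_net = p_win * payout - cost

# Payout at which litigating would just break even in expectation.
break_even_payout = cost / p_win

print(round(p_win, 3), round(expected_net), round(break_even_payout))
```

Under these assumptions the expected net payoff is negative, so a risk-neutral litigant would not proceed unless the favorable payout were close to $3.9 million or the chance of winning were higher.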

Random Variable
Definition: A variable that will take a value
assigned to it by the outcome of a random
experiment.
Realization of a random variable: The
outcome of the experiment after it occurs.
The value that is assigned to the random
variable is the realization.
X = the variable, x = the realization
Use random variables to organize the
information about a random occurrence.

Types of Random Variables
Discrete: Takes integer values
Finite: How many female children in families with 4 children; values = 0,1,2,3,4
Infinite: How many people will catch a certain disease per year in a given population? Values = 0,1,2,3,… (How can the number be infinite? It is a model.)
Continuous: A measurement. How long will a light bulb last? Values = 0 to ∞
Intervals and preferences: On the scale 1=worst,2,3,4,5=best, how do you feel about candidate _____ ? (What does this ranking mean? Intensity of feelings should be continuously variable.)

Probability Distribution
Range of the random variable = the set of values it can take
Discrete: A set of integers. May be finite or infinite
Continuous: A range of values
Probability distribution: Probabilities associated with values in the range.

Notation

Probability distribution = probabilities assigned to outcomes.
P(X=x) or P(Y=y) is common.
Probability function = $P_X(x)$. Sometimes called the density function.
Cumulative probability is Prob(X ≤ x) for a specific value x.

Rules for Probabilities
1. 0 ≤ P(x) ≤ 1 (Valid probabilities)
2. $\sum_{x} P(x) = 1$, where the sum runs over all possible outcomes x
3. For different values of x, say A and B, Prob(X=A or X=B) = P(A) + P(B)
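
The three rules can be checked on a concrete discrete distribution. The sketch below uses the number of female children in a family with 4 children and assumes, purely for illustration, that births are independent and each child is female with probability 0.5 (a binomial model that the slides do not specify).

```python
from math import comb

# Assumed model: X ~ Binomial(n=4, p=0.5), values 0, 1, 2, 3, 4.
n, p = 4, 0.5
pmf = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

# Rule 1: every probability lies between 0 and 1.
rule_1 = all(0.0 <= prob <= 1.0 for prob in pmf.values())

# Rule 2: the probabilities sum to 1 over all possible outcomes.
rule_2 = abs(sum(pmf.values()) - 1.0) < 1e-12

# Rule 3: for different values, Prob(X=A or X=B) = P(A) + P(B).
p_0_or_4 = pmf[0] + pmf[4]

print(pmf, rule_1, rule_2, p_0_or_4)
```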

Common Results for Random Variables

Concentration of Probability
(μ = the mean, σ = the standard deviation)
For almost any random variable, 2/3 of the probability lies within μ ± 1σ
For almost any random variable, 95% of the probability lies within μ ± 2σ
For almost any random variable, more than 99.5% of the probability lies within μ ± 3σ

What it means: For any random outcome,
An (observed) outcome more than one σ away from μ is somewhat unusual.
One that is more than 2σ away is very unusual.
One that is more than 3σ away from the mean is so unusual that it might be an outlier (a freak outcome).
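
These shares can be computed exactly for one specific case. The sketch below assumes a normal distribution, which is the reference case for this rule of thumb; for other distributions the shares will differ somewhat.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1 (an assumed reference case).
z = NormalDist()

# Probability of falling within k standard deviations of the mean.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in within.items():
    print(f"within {k} standard deviation(s): {share:.4f}")
```

For the normal case the shares are about 68%, 95% and 99.7%, in line with the 2/3, 95% and "more than 99.5%" figures.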

Probabilities for two Events, A,B


Marginal Probability = The probability
of an event not considering any other
events. P(A)
Joint Probability = The probability that
two events happen at the same time.
P(A,B)
Conditional Probability = The probability that one event happens given that another event has happened. P(A|B)

Independence

Random variables are independent


if the occurrence of one does not
affect the probability distribution of
the other.
If P(Y|X) does not change when X
changes, then the variables are
independent.

Two Important Math Results


For two random variables,
P(X,Y) = P(X|Y) P(Y)
P(Color blind, Male) = P(Color blind|Male)P(Male)
= .05 x .5 = .025
For two independent random variables,
P(X,Y) = P(X) P(Y)
P(Ace,Heart) = P(Ace) x P(Heart).
(This does not work if they are not independent.)
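
The card example can be verified by enumerating a standard 52-card deck. This is an illustrative sketch; the helper `prob` is a hypothetical name, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(v) for v in range(2, 11)] + ["J", "Q", "K"]
suits = ["Hearts", "Diamonds", "Clubs", "Spades"]
deck = list(product(ranks, suits))  # 52 equally likely cards

def prob(event):
    """Probability of an event = share of cards in the deck that satisfy it."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

p_ace = prob(lambda c: c[0] == "A")
p_heart = prob(lambda c: c[1] == "Hearts")
p_ace_and_heart = prob(lambda c: c == ("A", "Hearts"))

# General rule: P(X,Y) = P(X|Y)P(Y), so P(Ace|Heart) = P(Ace,Heart)/P(Heart).
p_ace_given_heart = p_ace_and_heart / p_heart

print(p_ace, p_heart, p_ace_and_heart, p_ace_given_heart)
```

Because P(Ace|Heart) equals P(Ace), rank and suit are independent and the joint probability is the product of the marginals.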

Measuring How Variables Move Together: Covariance

$$
\mathrm{Cov}(X,Y) = \sum_{\text{values of } X} \; \sum_{\text{values of } Y} P(x,y)\,(x - \mu_X)(y - \mu_Y)
$$

Covariance can be positive or negative.
The measure will be positive if it is likely that Y is above its mean when X is above its mean.
It is usually denoted $\sigma_{XY}$.

Correlation is Units Free
Correlation Coefficient

$$
\rho_{XY} = \frac{\text{Covariance}(x,y)}{\text{Standard deviation}(x)\;\text{Standard deviation}(y)}
$$

$-1.00 \le \rho_{XY} \le +1.00$.

Aspect of Correlation

Independence implies zero correlation. If the variables are independent, then the numerator of the correlation coefficient is 0.
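
Covariance and correlation can be computed directly from a joint probability table. The two joint distributions below are hypothetical examples made up for illustration: one in which X and Y tend to move together, and one built as P(x,y) = P(x)P(y) so that the variables are independent.

```python
from math import sqrt

def cov_and_corr(joint):
    """Covariance and correlation from a joint pmf given as {(x, y): probability}."""
    mean_x = sum(p * x for (x, y), p in joint.items())
    mean_y = sum(p * y for (x, y), p in joint.items())
    var_x = sum(p * (x - mean_x) ** 2 for (x, y), p in joint.items())
    var_y = sum(p * (y - mean_y) ** 2 for (x, y), p in joint.items())
    cov = sum(p * (x - mean_x) * (y - mean_y) for (x, y), p in joint.items())
    return cov, cov / (sqrt(var_x) * sqrt(var_y))

# Hypothetical joint distribution: X and Y tend to be high or low together.
dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Hypothetical independent case: P(x, y) = P(x)P(y).
px = {0: 0.5, 1: 0.5}
py = {0: 0.7, 1: 0.3}
independent = {(x, y): px[x] * py[y] for x in px for y in py}

cov_dep, corr_dep = cov_and_corr(dependent)
cov_ind, corr_ind = cov_and_corr(independent)
print(round(cov_dep, 4), round(corr_dep, 4), round(cov_ind, 4), round(corr_ind, 4))
```

The independent table gives a covariance of 0 (up to floating-point error), so the correlation is 0 as well. The converse does not hold in general: zero correlation does not by itself imply independence.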

Math Facts 1: Mean of a Sum
Mean of a sum. The mean of X+Y = E[X+Y] = E[X] + E[Y]
Mean of a weighted sum
Mean of aX + bY = E[aX] + E[bY] = aE[X] + bE[Y]

Math Facts 2: Variance of a Sum
Variance of a Sum
Var[x+y] = Var[x] + Var[y] + 2Cov(x,y)
Variance of a sum equals the sum of the variances only if the variables are uncorrelated.
Standard deviation of a sum
The standard deviation of x+y is not equal to the sum of the standard deviations.

$$
\sigma_{x+y} = \sqrt{\sigma_x^2 + \sigma_y^2 + 2\sigma_{xy}}
$$

Math Facts 3: Variance of a Weighted Sum
Var[ax+by] = Var[ax] + Var[by] + 2Cov(ax,by)
           = a²Var[x] + b²Var[y] + 2ab Cov(x,y).
Also, Cov(x,y) is the numerator in $\rho_{xy}$, so Cov(x,y) = $\rho_{xy}\sigma_x\sigma_y$.

$$
\sigma_{ax+by} = \sqrt{a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,\rho_{xy}\sigma_x\sigma_y}
$$
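
The mean and variance rules for a weighted sum can be checked numerically by computing E[ax+by] and Var[ax+by] directly from a joint distribution and comparing them with the formulas. The joint table and the weights a and b below are hypothetical values chosen for illustration.

```python
# Hypothetical joint pmf {(x, y): probability} and arbitrary weights.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
a, b = 2.0, 3.0

def E(f):
    """Expected value of f(x, y) under the joint pmf."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

mean_x, mean_y = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - mean_x) ** 2)
var_y = E(lambda x, y: (y - mean_y) ** 2)
cov_xy = E(lambda x, y: (x - mean_x) * (y - mean_y))

# Direct computation from the distribution of ax + by.
mean_direct = E(lambda x, y: a * x + b * y)
var_direct = E(lambda x, y: (a * x + b * y - mean_direct) ** 2)

# The formulas: E[ax+by] = aE[x] + bE[y];
# Var[ax+by] = a^2 Var[x] + b^2 Var[y] + 2ab Cov(x,y).
mean_formula = a * mean_x + b * mean_y
var_formula = a**2 * var_x + b**2 * var_y + 2 * a * b * cov_xy

print(mean_direct, mean_formula, round(var_direct, 6), round(var_formula, 6))
```

Setting a = b = 1 reduces this to the plain sum, and replacing the table with an independent one drops the covariance term.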
