
Probability and Probability Distributions
Instructor: Basesh Gala
Indian Institute of Quantitative Finance


Conditional probability and independence
If we know that one event has occurred, it may change our view of the probability of another event. Let
A = {rain today}, B = {rain tomorrow}, C = {rain in 90 days' time}.
It is likely that knowing A has occurred will change your view of the probability that B will occur, but not of the probability that C will occur.
We write P(B|A) ≠ P(B), P(C|A) = P(C), where P(B|A) denotes the conditional probability of B, given A.
We say that A and C are independent, but A and B are not.
Note that for independent events P(A ∩ C) = P(A)P(C).

Conditional probability - tornado forecasting
Consider the classic data set on the next slide, consisting of forecasts and observations of tornadoes (Finley, 1884).
Let
F = {Tornado forecast}
T = {Tornado observed}
Use the frequencies in the table to estimate probabilities. It's a large sample, so the estimates should not be too bad.

Forecasts of tornadoes

                      Tornado forecast   No tornado forecast   Total
Tornado observed                    28                    23      51
No tornado observed                 72                  2680    2752
Total                              100                  2703    2803


Conditional probability - tornado forecasting
P(T) = 51/2803 = 0.0182
P(T|F) = 28/100 = 0.2800
P(T|F^c) = 23/2703 = 0.0085, where F^c = {no tornado forecast}

Knowledge of the forecast changes P(T), so F and T are not independent.

P(F|T) = 28/51 = 0.5490

P(T|F) and P(F|T) are often confused, but they are different quantities and can take very different values.
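Each estimate above is a count from the table divided by the size of the relevant row, column or whole table. A minimal Python sketch that recomputes them from the four cell counts; the variable names are illustrative only, not from the source:

```python
# Cell counts from the Finley (1884) tornado table
forecast_obs = 28        # tornado forecast, tornado observed
noforecast_obs = 23      # no tornado forecast, tornado observed
forecast_noobs = 72      # tornado forecast, no tornado observed
noforecast_noobs = 2680  # no tornado forecast, no tornado observed
total = forecast_obs + noforecast_obs + forecast_noobs + noforecast_noobs

# Unconditional: tornado-observed row total over the grand total
p_T = (forecast_obs + noforecast_obs) / total
# Conditional on F: restrict attention to the "tornado forecast" column
p_T_given_F = forecast_obs / (forecast_obs + forecast_noobs)
# Conditional on F^c: restrict attention to the "no tornado forecast" column
p_T_given_notF = noforecast_obs / (noforecast_obs + noforecast_noobs)
# Conditional on T: restrict attention to the "tornado observed" row
p_F_given_T = forecast_obs / (forecast_obs + noforecast_obs)

for name, value in [("P(T)", p_T), ("P(T|F)", p_T_given_F),
                    ("P(T|F^c)", p_T_given_notF), ("P(F|T)", p_F_given_T)]:
    print(f"{name} = {value:.4f}")
```

Swapping which margin you divide by is exactly the difference between P(T|F) and P(F|T).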

Conditional probability - tornado forecasting
P(T ∩ F) = 28/2803 = P(T)P(F|T) = P(F)P(T|F) ≠ P(F)P(T).
The two formulae for the probability of an intersection always hold.
If A and B are independent, then P(A|B) = P(A) and P(B|A) = P(B), so P(A ∩ B) = P(A)P(B).
Equating the two formulae for P(A ∩ B) gives
P(B|A) = P(B)P(A|B)/P(A).

This is Bayes' Theorem, though in the usual statement of the theorem P(A) is expanded in a more complicated-looking fashion, as P(A) = P(A|B)P(B) + P(A|B^c)P(B^c).
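As a check on the tornado data, Bayes' Theorem with P(A) expanded as P(A|B)P(B) + P(A|B^c)P(B^c) should recover P(T|F) = 0.28 from P(T), P(F|T) and P(F|T^c) = 72/2752. A minimal sketch; the function name `bayes` is hypothetical:

```python
def bayes(p_b, p_a_given_b, p_a_given_not_b):
    """Return P(B|A) by Bayes' Theorem.

    P(A) is expanded as P(A|B)P(B) + P(A|B^c)P(B^c).
    """
    p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)
    return p_b * p_a_given_b / p_a

# Tornado example: B = T (tornado observed), A = F (tornado forecast)
p_T = 51 / 2803             # P(T)
p_F_given_T = 28 / 51       # P(F|T)
p_F_given_notT = 72 / 2752  # P(F|T^c): forecasts among the 2752 non-tornado cases

print(round(bayes(p_T, p_F_given_T, p_F_given_notT), 4))  # 0.28
```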

Random variables
Often we take measurements which have different
values on different occasions. Furthermore, the
values are subject to random or stochastic
variation - they are not completely predictable,
and so are not deterministic. They are random
variables.
Examples are crop yield, maximum temperature,
number of cyclones in a season, rain/no rain.

Continuous and discrete random variables
A continuous random variable is one which can (in
theory) take any value in some range, for example
crop yield, maximum temperature.
A discrete variable has a countable set of values.
They may be
- counts, such as numbers of cyclones
- categories, such as much above average, above average, near average, below average, much below average
- binary variables, such as rain/no rain

Probability distributions
If we measure a random variable many times, we can
build up a distribution of the values it can take.
Imagine an underlying distribution of values which
we would get if it were possible to take more and
more measurements under the same conditions.
This gives the probability distribution for the
variable.
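The idea of "more and more measurements" can be imitated by simulation. A minimal sketch, assuming independent days each wet with probability 0.2 (the value used in Example 1 below); the function name is hypothetical:

```python
import random

def relative_frequency(p_rain, n_days, seed=1):
    """Fraction of simulated days that are wet, each wet with probability p_rain."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    wet_days = sum(rng.random() < p_rain for _ in range(n_days))
    return wet_days / n_days

# As n_days grows, the relative frequency settles down near p_rain.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(0.2, n))
```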


Discrete probability distributions


A discrete probability distribution associates a
probability with each value of a discrete random
variable.
Example 1. The random variable has two values, Rain/No Rain. P(Rain) = 0.2, P(No Rain) = 0.8 gives a probability distribution.
Example 2. Let X = Number of wet days in a 10-day period. P(X=0) = 0.1074, P(X=1) = 0.2684, P(X=2) = 0.3020, ..., P(X=6) = 0.0055, ... (see Slide 24 for more on this example).
Note that P(Rain) + P(No Rain) = 1; P(X=0) + P(X=1) + P(X=2) + ... + P(X=6) + ... + P(X=10) = 1.


Continuous probability distributions


Because continuous random variables can take all values in a range, it is not possible to assign (non-zero) probabilities to individual values.
Instead we have a continuous curve, called a probability density function, which allows us to calculate the probability of a value lying within any interval.
This probability is calculated as the area under the curve between the values of interest. The total area under the curve must equal 1.
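"Area under the curve" can be approximated numerically. A minimal sketch using the trapezoidal rule on a standard normal density (chosen only as a familiar example curve); the function names are hypothetical:

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Normal (Gaussian) probability density function."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def area_under(pdf, a, b, steps=10_000):
    """Trapezoidal-rule approximation to the area under pdf from a to b."""
    h = (b - a) / steps
    interior = sum(pdf(a + i * h) for i in range(1, steps))
    return h * (0.5 * pdf(a) + interior + 0.5 * pdf(b))

print(round(area_under(normal_pdf, -1.0, 1.0), 4))    # P(-1 < X < 1)
print(round(area_under(normal_pdf, -10.0, 10.0), 4))  # total area, about 1
```

The interval [-10, 10] stands in for the whole real line; the area outside it is negligible for this density.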

Families of probability distributions


The number of different probability distributions is
unlimited. However, certain families of
distributions give good approximations to the
distributions of many random variables.
Important families of discrete distributions include
binomial, multinomial, Poisson, hypergeometric and
negative binomial.
Important families of continuous distributions
include normal (Gaussian), exponential, gamma,
lognormal, Weibull and extreme value.

Families of discrete distributions


We consider only two: binomial and Poisson.
There are many more.
Do not use a particular distribution unless you
are satisfied that the assumptions which
underlie it are (at least approximately)
satisfied.


Binomial distributions
1. The data arise from a sequence of n independent trials.
2. At each trial there are only two possible outcomes, conventionally called success and failure.
3. The probability of success, p, is the same in each trial.
4. The random variable of interest is the number of successes, X, in the n trials.

The assumptions of independence and constant p in 1 and 3 are important. If they are invalid, so is the binomial distribution.
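Under assumptions 1 to 4, P(X = k) = C(n, k) p^k (1 - p)^(n - k). The probabilities quoted in Example 2 (wet days in a 10-day period) agree to four decimal places with n = 10 and p = 0.2; that value of p is inferred, not stated in the source, and the independence caveat for consecutive wet days still applies. A minimal sketch:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Inferred parameters n = 10, p = 0.2 (an assumption, see above)
for k in (0, 1, 2, 6):
    print(k, round(binomial_pmf(k, 10, 0.2), 4))

# The probabilities over all possible values k = 0, ..., n sum to 1
print(sum(binomial_pmf(k, 10, 0.2) for k in range(11)))
```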

Binomial distributions - examples


It is unlikely that the binomial distribution would be
appropriate for the number of wet days in a period of 10
consecutive days, because of non-independence of rain on
consecutive days.
It might be appropriate for the number of frost-free Januarys,
or the number of crop failures, in a 10-year period, if we
can assume no inter-annual dependence and no trend in p,
the frost-free probability, or crop failure probability.


Poisson distributions
Poisson distributions are often used to describe the number of occurrences of a rare event. For example:
- the number of tropical cyclones in a season
- the number of occasions in a season when river levels exceed a certain value
The main assumptions are that events occur
- at random (the occurrence of an event doesn't change the probability of it happening again)
- at a constant rate
Poisson distributions also arise as approximations to binomials when n is large and p is small.
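The approximation can be checked numerically by comparing a binomial with a Poisson distribution of the same mean, n*p. A minimal sketch; n = 1000 and p = 0.003 are hypothetical values chosen so that n is large, p is small and the mean is 3:

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean)."""
    return exp(-mean) * mean**k / factorial(k)

n, p = 1000, 0.003  # hypothetical values: n large, p small, so n * p = 3
for k in range(6):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, n * p), 4))
```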

Poisson distributions - an example
Suppose that we can assume that the number of cyclones, X, in a particular area in a season has a Poisson distribution with a mean (average) of 3.
Then P(X=0) = 0.05, P(X=1) = 0.15, P(X=2) = 0.22, P(X=3) = 0.22, P(X=4) = 0.17, P(X=5) = 0.10, ...
Note:
- There is no upper limit to X, unlike the binomial, where the upper limit is n.
- Assuming a constant rate of occurrence, the number of cyclones in 2 seasons would also have a Poisson distribution, but with mean 6.
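The quoted probabilities follow from P(X = k) = e^(-3) 3^k / k!. The two-season claim can be checked by adding two independent seasonal counts (convolving their distributions) and comparing with a Poisson distribution with mean 6; independence between seasons is an assumption here. A minimal sketch with hypothetical function names:

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """P(X = k) for X ~ Poisson(mean)."""
    return exp(-mean) * mean**k / factorial(k)

def two_season_pmf(k, mean_per_season):
    """P(X1 + X2 = k) for independent X1, X2 ~ Poisson(mean_per_season)."""
    return sum(poisson_pmf(j, mean_per_season) * poisson_pmf(k - j, mean_per_season)
               for j in range(k + 1))

print([round(poisson_pmf(k, 3), 2) for k in range(6)])
# [0.05, 0.15, 0.22, 0.22, 0.17, 0.1]
print(round(two_season_pmf(6, 3), 4), round(poisson_pmf(6, 6), 4))  # the two agree
```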

Normal (Gaussian) distributions


Normal (also known as Gaussian) distributions are by far the most commonly used family of continuous distributions.
They are bell-shaped (see Slide 20) and are indexed by two parameters:
- the mean: the distribution is symmetric about this value
- the standard deviation: this determines the spread of the distribution. Roughly 2/3 of the distribution lies within 1 standard deviation of the mean, and 95% within 2 standard deviations.
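The "2/3" and "95%" rules of thumb can be checked with the error function: for any normal variable, the probability of lying within k standard deviations of the mean is erf(k/√2), whatever the mean and standard deviation. A minimal sketch; `prob_within` is a hypothetical name:

```python
from math import erf, sqrt

def prob_within(k):
    """P(|X - mean| < k * sd) for any normal X; equals erf(k / sqrt(2))."""
    return erf(k / sqrt(2))

print(round(prob_within(1), 4))  # 0.6827, roughly 2/3
print(round(prob_within(2), 4))  # 0.9545, roughly 95%
```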

Deviations from normality - skewness


Some variables deviate from normality because their distributions are symmetric but too flat or too long-tailed.
A more common type of deviation is skewness, where one tail of the distribution is much longer than the other.
Positive skewness, illustrated on the next slide, is the most common type; it occurs for wind speeds and for rainfall amounts.
Negatively-skewed distributions, with longer tails to the left, sometimes occur, for example for surface pressure.
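One common numerical summary of skewness is the moment coefficient: the average cubed deviation from the mean divided by the cube of the standard deviation. It is positive for a longer right tail and negative for a longer left tail. A minimal sketch; the data are hypothetical:

```python
def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5.

    m2 and m3 are the second and third central moments (dividing by n).
    Positive values indicate a longer right tail, negative a longer left tail.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

right_tailed = [0, 0, 1, 1, 2, 3, 12]  # hypothetical, e.g. rainfall amounts
print(sample_skewness(right_tailed) > 0)                # True: longer right tail
print(sample_skewness([-v for v in right_tailed]) < 0)  # True: mirrored, longer left tail
```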

A positively-skewed Weibull distribution
[Figure: Weibull probability density f(x) plotted against x, for x from 0 to 10]

Families of skewed distributions


There are several families of skewed distributions,
including Weibull, gamma and lognormal. Each
family has 2 or more parameters which can be
varied to fit a variety of shapes.
One particular family (strictly 3 families) consists of so-called extreme value distributions. As the name suggests, these can be used to model extremes over a period, for example maximum wind speed, minimum temperature, greatest 24-hour rainfall or highest flood.
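Why extremes get their own family can be seen from a standard result: if n values are independent with distribution function F, their maximum has distribution function F(x)^n. As an illustration only, assume hypothetical exponential daily values with mean 1; the yearly maximum is then close to a Gumbel (type I extreme value) distribution with location ln n. A minimal sketch with hypothetical function names:

```python
import math

def max_cdf_exact(x, n):
    """P(max of n independent Exp(1) values <= x) = F(x)**n, F(x) = 1 - e^(-x)."""
    return (1 - math.exp(-x)) ** n if x > 0 else 0.0

def gumbel_cdf(x, location, scale=1.0):
    """Gumbel (type I extreme value) CDF: exp(-exp(-(x - location) / scale))."""
    return math.exp(-math.exp(-(x - location) / scale))

n = 365  # hypothetical: one value per day, maximum taken over a year
for x in (4.0, 6.0, 8.0, 10.0):
    print(x, round(max_cdf_exact(x, n), 4), round(gumbel_cdf(x, math.log(n)), 4))
```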

Other probability distributions


We have sketched a few of the main probability distributions, but there are many others. Examples which don't fit standard patterns include
- Proportion of sky covered by cloud: this may have large probability values near 0 and 1, with lower probabilities in between, i.e. U-shaped rather than bell-shaped.
- Daily rainfall: this is neither (purely) discrete nor continuous. Positive values are continuous, but there is also a non-zero (discrete) probability of taking the value zero.
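A mixed distribution like daily rainfall can be described by its distribution function, which jumps by P(R = 0) at zero and then rises continuously. A minimal sketch; the exponential form for wet-day amounts and the parameter values are hypothetical, chosen only for illustration:

```python
import math

def rainfall_cdf(x, p_dry, mean_wet):
    """P(R <= x) for daily rainfall R with a point mass at zero.

    p_dry: probability of a dry day (R = 0 exactly), the discrete part.
    mean_wet: mean amount on wet days; wet-day amounts are assumed
    exponential here purely for illustration.
    """
    if x < 0:
        return 0.0
    return p_dry + (1 - p_dry) * (1 - math.exp(-x / mean_wet))

print(rainfall_cdf(0.0, 0.6, 5.0))  # 0.6: the jump at zero equals P(R = 0)
```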