
Introduction to Probability and Distributions


by
Hrishikesh Khaladkar
Department of Mathematics
Fergusson College, Pune

May 20, 2018



Classical Definition of Probability

If a random experiment results in N exhaustive, mutually exclusive
and equally likely outcomes, of which m are favorable to the
happening of the event A, then the probability of occurrence of A
is given by P(A) = m/N
exhaustive cases: the total possible outcomes of a random
experiment
mutually exclusive cases: the happening of any one of them
excludes the happening of all the others in the same experiment
equally likely outcomes: none of them is expected to occur
in preference to another
favorable cases: the outcomes which entail (result
in) the happening of the event
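As an illustrative sketch of the classical definition (not part of the original slides), the following Python snippet counts exhaustive and favorable cases for two fair dice; the event A ("the sum is 7") is an assumed example:

```python
from fractions import Fraction
from itertools import product

# Exhaustive cases: all 36 mutually exclusive, equally likely outcomes
outcomes = list(product(range(1, 7), repeat=2))

# Favorable cases for the event A: "the sum of the two faces is 7"
favorable = [o for o in outcomes if sum(o) == 7]

# Classical definition: P(A) = m / N
p_A = Fraction(len(favorable), len(outcomes))
print(p_A)  # 1/6
```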

Axiomatic Definition of Probability

Given a sample space S of a random experiment, the probability of
occurrence of an event A is defined as a function P from the events
of S to [0, 1] satisfying the following axioms
P(A) is real and non-negative, that is P(A) ≥ 0 (non-negativity)
P(S) = 1 (axiom of certainty)
For mutually exclusive events A₁, A₂, ..., Aₙ,
P(A₁ ∪ A₂ ∪ ... ∪ Aₙ) = P(A₁) + P(A₂) + ... + P(Aₙ) (axiom of additivity)

Related Results
0 ≤ P(A) ≤ 1 for any event A
Complement Principle
P(A) + P(Ā) = 1, where Ā denotes the complement of A
Inclusion-Exclusion Principle
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Mutually exclusive events:
If A and B are mutually exclusive then A ∩ B = ∅, hence
P(A ∪ B) = P(A) + P(B)
Conditional Probability
If the event A occurs followed by the event B, the conditional
probability of B given A is
P(B|A) = P(A ∩ B)/P(A). Similarly, P(A|B) = P(A ∩ B)/P(B)
Independent Events
If A does not depend on B then P(A|B) = P(A) and
P(B|A) = P(B). Therefore P(A ∩ B) = P(A)P(B)
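These results can be checked mechanically on a small finite sample space. A minimal sketch, again using two fair dice (the events A and B are assumed examples):

```python
from fractions import Fraction
from itertools import product

outcomes = set(product(range(1, 7), repeat=2))

def P(event):
    # classical probability on a finite, equally likely sample space
    return Fraction(len(event), len(outcomes))

A = {o for o in outcomes if o[0] == 6}      # first die shows 6
B = {o for o in outcomes if sum(o) >= 10}   # the sum is at least 10

assert P(outcomes - A) == 1 - P(A)           # complement principle
assert P(A | B) == P(A) + P(B) - P(A & B)    # inclusion-exclusion
print(P(A & B) / P(A))  # conditional probability P(B|A) = 1/2
```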

Bayes’ Theorem

If an event A can occur only in conjunction with one of the mutually
exclusive and exhaustive events E₁, E₂, ..., Eₙ, and if A actually
happens, then the probability that it was preceded by the particular
event Eᵢ (i = 1, 2, ..., n) is given by
P(Eᵢ|A) = P(A ∩ Eᵢ)/P(A) = P(Eᵢ)P(A|Eᵢ) / Σᵢ₌₁ⁿ P(Eᵢ)P(A|Eᵢ)
Special case
P(B|A) = P(A ∩ B)/P(A)
This theorem has given rise to a new branch called Bayesian
statistics, which has immense applications, including the Bayes
classifier in machine learning.
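A numerical sketch of Bayes’ theorem. The three machines E1, E2, E3 and all the probabilities below are hypothetical numbers chosen purely for illustration:

```python
from fractions import Fraction

# Hypothetical prior: three mutually exclusive, exhaustive machines
# producing 50%, 30% and 20% of all items.
prior = {"E1": Fraction(1, 2), "E2": Fraction(3, 10), "E3": Fraction(1, 5)}
# Hypothetical likelihoods P(A|Ei): probability an item is defective.
likelihood = {"E1": Fraction(1, 50), "E2": Fraction(1, 25), "E3": Fraction(1, 10)}

# Total probability: P(A) = sum over i of P(Ei) P(A|Ei)
p_A = sum(prior[e] * likelihood[e] for e in prior)

# Bayes' theorem: P(Ei|A) = P(Ei) P(A|Ei) / P(A)
posterior = {e: prior[e] * likelihood[e] / p_A for e in prior}
print(posterior["E3"])  # 10/21
```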

Random Variables

Random Variable A random variable may be defined as a
real-valued function on the sample space of a random
experiment, taking values on the real line R
Discrete Random Variable If the random variable assumes
only a finite or countably infinite set of values, it is called
a discrete random variable
Examples include: the number of heads appearing when a coin is
tossed, the number of accidents occurring on a road, etc.
Continuous Random Variable If the random variable
assumes an uncountably infinite set of values, it is
called a continuous random variable
Examples include: height, weight, temperature fluctuations,
etc.

Probability Mass Function

Let X be a discrete random variable.
With each value xᵢ of X we associate a number pᵢ = P(X = xᵢ), which is
known as the probability of xᵢ.
If it satisfies the following conditions
pᵢ = P(X = xᵢ) ≥ 0
Σᵢ pᵢ = 1
then the function pᵢ = P(X = xᵢ) is called the probability mass
function of the variable X
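A minimal sketch of these two conditions, using the pmf of a fair six-sided die as an assumed example:

```python
from fractions import Fraction

# pmf of a fair die: p_i = P(X = x_i) = 1/6 for each face
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Both defining conditions of a probability mass function:
assert all(p >= 0 for p in pmf.values())   # p_i >= 0
assert sum(pmf.values()) == 1              # sum of p_i = 1
```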

Probability Density Function

Let X be a continuous random variable taking values over an
interval [a, b].
With each value x of X we associate a number p(x), known as
the probability density of X at x.
If it satisfies the following conditions
p(x) ≥ 0
For any two distinct points c and d in [a, b],
P(c ≤ X ≤ d) = area under the curve between x = c and x = d
P(a ≤ X ≤ b) = 1
then the function p(x) is called the probability density
function of the variable X

Distribution Function

If X is a discrete random variable with probability mass
function p(xᵢ), then the distribution function F is
defined as
F(x) = P(X ≤ x) = Σ p(xᵢ) taken over all values xᵢ ≤ x
If X is a continuous random variable over [a, b] with probability
density function p(x), then the distribution function F is
defined as
F(x) = P(X ≤ x) = ∫ₐˣ p(t) dt
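A sketch of the discrete case: accumulating the pmf of a fair die (an assumed example) into its distribution function.

```python
from fractions import Fraction

# pmf of a fair die; F(x) accumulates p_i over all x_i <= x
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def F(x):
    return sum(p for xi, p in pmf.items() if xi <= x)

print(F(3))  # 1/2
assert F(0) == 0 and F(6) == 1   # F rises from 0 to 1
```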

Expected Value or the Mean

Let X be a random variable. The expected value (or mean) of X
is defined as
E(X) = Σ x P(X = x) for discrete distributions
E(X) = ∫ₐᵇ x p(x) dx for continuous distributions
Properties
E(c) = c for a constant c
E(aX + b) = aE(X) + b (linearity)
E(XY) = E(X)E(Y) when X and Y are independent (multiplicative property)
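The discrete formula and the linearity property can be sketched with the fair-die pmf (an assumed example):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def E(g):
    # E[g(X)] = sum of g(x) * P(X = x)
    return sum(g(x) * p for x, p in pmf.items())

mean = E(lambda x: x)
print(mean)  # 7/2

# Linearity: E(aX + b) = a E(X) + b
a, b = 3, 5
assert E(lambda x: a * x + b) == a * mean + b
```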

Variance and Standard Deviation

Let X be a random variable. The variance of X is defined as
σ² = E[X − E(X)]² = E(X²) − [E(X)]²
Properties
Var(c) = 0 for a constant c
Var(aX + b) = a²Var(X)
Var(aX − b) = a²Var(X)
The standard deviation is defined as the positive square root of
the variance
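Both forms of the definition, and the Var(aX + b) = a²Var(X) property, can be checked on the fair-die pmf (an assumed example):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}
E = lambda g: sum(g(x) * p for x, p in pmf.items())

mean = E(lambda x: x)
var = E(lambda x: (x - mean) ** 2)          # E[X - E(X)]^2
assert var == E(lambda x: x * x) - mean**2  # = E(X^2) - E(X)^2
print(var)  # 35/12

# Var(aX + b) = a^2 Var(X): the shift b does not change the spread
a, b = 3, 5
mean2 = E(lambda x: a * x + b)
assert E(lambda x: (a * x + b - mean2) ** 2) == a**2 * var
```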

Moments

Let X be a random variable (continuous or discrete). The r-th
moment about any point A is given by
µᵣ = E[(X − A)ʳ]
In particular, if the point A = X̄ = E(X), then µᵣ is called
the r-th central moment.
Similarly, if the point A = 0, then µ′ᵣ is called the r-th raw
moment.
Significance of Moments
µ′₁ = X̄ is the mean of the distribution
µ₂ = σ² gives the variance of the distribution
µ₃ helps us define the skewness of the distribution
µ₄ helps us define the kurtosis of the distribution.
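A sketch of moments about a point, again on the fair-die pmf (an assumed example); note how the raw and central moments recover the mean and variance:

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}
E = lambda g: sum(g(x) * p for x, p in pmf.items())

def moment(r, A=0):
    # r-th moment about the point A: mu_r = E[(X - A)^r]
    return E(lambda x: (x - A) ** r)

mean = moment(1)           # first raw moment = the mean
var = moment(2, A=mean)    # second central moment = the variance
print(mean, var)  # 7/2 35/12

# the die is symmetric about its mean, so the third central moment
# (which drives skewness) vanishes
assert moment(3, A=mean) == 0
```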

Binomial Distribution

What is the binomial distribution? If X is a random variable which
denotes the number of successes in n trials satisfying the
conditions
n: the number of trials is finite
each trial results in one of two mutually exclusive outcomes, termed
success and failure
trials are independent
p: the probability of success is constant in every trial, and so is
q = 1 − p
then the probability of r successes is given by
P(X = r) = C(n, r) pʳ qⁿ⁻ʳ where r = 0, 1, 2, ..., n

We say that X → B(n, p)
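A sketch of the binomial pmf; n = 10 and p = 0.3 are assumed example values:

```python
from math import comb

def binom_pmf(r, n, p):
    # P(X = r) = C(n, r) p^r q^(n-r)
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 10, 0.3
probs = [binom_pmf(r, n, p) for r in range(n + 1)]
assert abs(sum(probs) - 1) < 1e-9              # a valid pmf
mean = sum(r * pr for r, pr in enumerate(probs))
assert abs(mean - n * p) < 1e-9                # mean = np
```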

Various parameters for Binomial Distribution

Mean of the distribution: np
Variance: npq
Standard deviation: √(npq)
Mode
1 X = k and X = k − 1 if (n + 1)p is an integer k (bimodal)
2 X = k, the integer part of (n + 1)p, when (n + 1)p is not an integer

Applications of Binomial Distribution

The distribution arises when the underlying events have two
possible outcomes, the chances of which remain constant.
The number of defectives found in samples of size n from a
stable production process is a binomial variable.
Its use in genetic engineering arises because the inheritance
of biological characteristics depends on genes, which occur
in pairs

Poisson Distribution

What is the Poisson distribution?

A random variable X is said to follow a Poisson distribution under
the following conditions
n: the number of trials is indefinitely large, n → ∞
p: the constant probability of success in each trial is
indefinitely small, p → 0
np = λ is finite
Then the probability distribution of X is given by
P(X = r) = e⁻λ λʳ / r!  where r = 0, 1, 2, ...
We say X → P(λ)
Note that the Poisson distribution is the limiting case of the
binomial distribution
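The limiting relationship can be sketched numerically; λ = 2, n = 10 000 and r = 3 are assumed example values:

```python
from math import exp, factorial, comb

def poisson_pmf(r, lam):
    # P(X = r) = e^(-lam) lam^r / r!
    return exp(-lam) * lam**r / factorial(r)

# Poisson as the limiting case of the binomial: n large, p small, np = lam
lam, n = 2.0, 10_000
p = lam / n
r = 3
binom = comb(n, r) * p**r * (1 - p)**(n - r)
assert abs(binom - poisson_pmf(r, lam)) < 1e-3

# a valid pmf: the probabilities sum (numerically) to 1
assert abs(sum(poisson_pmf(k, lam) for k in range(100)) - 1) < 1e-12
```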

Various Parameters of the Poisson Distribution

Mean of the distribution: λ
Variance: λ
Standard deviation: √λ
Mode: X = λ and X = λ − 1 if λ is an integer (bimodal);
X = ⌊λ⌋, the integer part of λ, when λ is not an integer

Applications of the Poisson Distribution

This distribution applies to rare events, where the probability of
occurrence in any small interval of time is very small.
The number of fatal automobile accidents per month.
The number of typing errors per page.
The number of atoms disintegrating per second from a
radioactive material.
The number of bomb hits on a square mile of London in 1944.
The number of defects in a manufactured article.

Hypergeometric Distribution

Let X be a discrete random variable which denotes the number of
successes obtained out of a population of size N, where
the population has size N
a simple random sample of size n is drawn without replacement
k members of the population possess a certain characteristic
(the k successes)
x denotes the number of successes within the sample of size n
Then X follows the hypergeometric distribution if its probability
mass function is given by
P(X = x) = C(k, x) C(N − k, n − x) / C(N, n)  where x = 0, 1, ..., n
We say that X → H(N, n, k)
This distribution mostly arises in sampling without replacement
from a finite population whose elements are classified into two
categories
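A sketch of the hypergeometric pmf; counting aces (k = 4) in a 5-card hand from a 52-card deck is an assumed example:

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, n, k):
    # P(X = x) = C(k, x) C(N - k, n - x) / C(N, n)
    return Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))

# the number of aces (k = 4) in a 5-card hand from a 52-card deck
N, n, k = 52, 5, 4
probs = [hypergeom_pmf(x, N, n, k) for x in range(min(n, k) + 1)]
assert sum(probs) == 1  # a valid pmf
print(probs[1])  # probability of exactly one ace
```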

Introduction to Normal Distribution

If X is a continuous random variable following the normal
distribution with mean µ and standard deviation σ, then its
probability density function is
p(x) = (1/(σ√(2π))) e^(−(x − µ)²/(2σ²))  where −∞ < x < ∞
Discovered by the mathematician De Moivre, who obtained the
equation while dealing with problems arising in games of
chance.
But heavily used by Gauss to describe the theory of accidental
errors of measurement involved in the calculation of the orbits of
planets and asteroids. Hence it is also known as the Gaussian
distribution.
We say that X → N(µ, σ²)

Standard Normal Variate

If X is a continuous random variable following the normal distribution
with mean µ and standard deviation σ, then define a new variable
Z = (X − µ)/σ. This is called a standard normal variate.
Special features
E(Z) = E[(X − µ)/σ] = (1/σ)E(X − µ) = (1/σ)(E(X) − µ) = 0
Var(Z) = Var[(X − µ)/σ] = (1/σ²)Var(X − µ) = (1/σ²)Var(X) = 1
Hence the standard normal variate is a normal variate with mean 0
and standard deviation 1, that is, Z → N(0, 1)
Hence the pdf is p(z) = (1/√(2π)) e^(−z²/2)

Probability tables are available for the standard normal
variate at various levels of significance
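A simulation sketch (not a proof) of standardisation: the values µ = 100 and σ = 15 are assumed examples, and `random.gauss` supplies the normal samples.

```python
import random

random.seed(42)
mu, sigma = 100.0, 15.0
xs = [random.gauss(mu, sigma) for _ in range(100_000)]

# standardise every observation: Z = (X - mu) / sigma
zs = [(x - mu) / sigma for x in xs]

z_mean = sum(zs) / len(zs)
z_var = sum(z * z for z in zs) / len(zs) - z_mean**2
print(round(z_mean, 2), round(z_var, 2))  # close to 0 and 1
```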


Features of Normal distribution

The graph of p(x) is a bell shaped curve symmetrical about


the value X = µ or Z = 0
Since the distribution is symmetric mean, mode and median
coincide
As x increases numerically (on either sides) the value of p(x)
decreases rapidly.
1
The maximum value of [p(x)]max = √ which means that
σ 2π
it is inversely proportional to standard deviation.For large σ ,
p(x) decreases (which means curve tends to flatten) and for
small values of σ, p(x) increases(which means that the curve
has a sharp peak).
The X axis is an asymptote to the curve
Introduction to Probability and Distributions

Applications of Normal Distribution

If X is a normal variate with mean µ and variance σ², then
P(µ − 3σ < X < µ + 3σ) = P(−3 < Z < +3) = 0.9973
P(|Z| > 3) = 1 − 0.9973 = 0.0027
That means the probability of the standard normal variate going
outside the limits ±3 is practically zero. This forms the basis for
the entire theory of large samples.
Most of the discrete probability distributions tend to the normal
as the number of trials increases indefinitely.
The entire theory of small-sample tests is based on the
assumption that the parent population from which the sample
has been drawn follows the normal distribution.
The Central Limit Theorem
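The three-sigma figure above can be recomputed from the standard normal cdf, since P(−3 < Z < 3) = Φ(3) − Φ(−3) = erf(3/√2):

```python
from math import erf, sqrt

# P(mu - 3*sigma < X < mu + 3*sigma) = P(-3 < Z < 3) = erf(3 / sqrt(2))
p = erf(3 / sqrt(2))
print(round(p, 4))  # 0.9973
```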

Exponential Distribution

A continuous random variable X is said to follow the exponential
distribution (or negative exponential distribution) if its probability
density function is of the form p(x) = λe^(−λx) where x ≥ 0, λ > 0
This is a special case of the gamma distribution
If λ = 1, the distribution p(x) = e⁻ˣ is called the standard
exponential distribution.
The mathematics of the exponential distribution is often simple,
so it is possible to obtain explicit formulas in terms
of elementary functions without troublesome quadratures.
Hence models constructed from exponential variables are used
as approximate representations of other models
We say X → Exp(λ)


Properties of the Exponential Distribution

The probability density curve is a J-shaped curve
Mean of the distribution: 1/λ
Standard deviation of the distribution: 1/λ
If F(x) denotes the distribution function P(X ≤ x),
then P(X > x) = 1 − F(x) is called the survival function in
biomedical applications and the reliability function in
industrial applications.
If X → Exp(λ) then P(X < t₁ + t₂ | X > t₁) = P(X < t₂)
where t₁ > 0, t₂ > 0. This is called the lack-of-memory
property. Conversely, if X is a continuous random variable
satisfying this property, then X follows the exponential distribution.
In reliability theory, p(x)/(1 − F(x)) is called the hazard rate.
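The lack-of-memory property can be sketched analytically from the survival function S(x) = e^(−λx); λ = 0.5 and the points t₁, t₂ are assumed example values:

```python
from math import exp

lam = 0.5
S = lambda x: exp(-lam * x)   # survival function P(X > x) = e^(-lam*x)

# lack of memory: P(X > t1 + t2 | X > t1) = P(X > t2)
t1, t2 = 1.3, 2.7
lhs = S(t1 + t2) / S(t1)      # conditional survival
rhs = S(t2)
assert abs(lhs - rhs) < 1e-12
```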

Applications of the Exponential Distribution

Random processes taking place over time, under certain
assumptions, follow the Poisson distribution. In such cases the
waiting time between occurrences follows the exponential
distribution. For example:
Trucks arriving per hour at a warehouse follow a Poisson process
under certain assumptions; the waiting time between two
successive arrivals then follows the exponential distribution.
Similarly, user log-ons in a large computer network can
follow a Poisson distribution; the waiting time between two
successive log-ons then follows the exponential distribution.
Some kinds of electrical components, such as fuses, safety
valves, glassware and transistors, do not experience an aging
process, hence their lifetime is reasonably assumed to be
exponential. One of the interesting facts is that the probability
that a component will survive t more units, given that it has
already survived s units, is the same as the probability that a
newly installed component will survive more than t units. This
reflects the lack-of-memory property

Chi-Square Distribution

Given a standard normal variate Z = (X − µ)/σ, the square Z² is a
chi-square (χ²) variate with 1 degree of freedom.
Hence if X₁, X₂, ..., Xₖ are k independent normal variates with
means µ₁, µ₂, ..., µₖ and standard deviations σ₁, σ₂, ..., σₖ, then the
variate
χ² = ((X₁ − µ₁)/σ₁)² + ((X₂ − µ₂)/σ₂)² + ... + ((Xₖ − µₖ)/σₖ)²
is called a chi-square variate with k degrees of freedom.
Its pdf is given by
p(χ²) = (1/(2^(k/2) Γ(k/2))) e^(−χ²/2) (χ²)^(k/2 − 1)  where 0 < χ² < ∞
We say that X → χ²(k)
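A simulation sketch of the definition: summing k squared standard normal variates and checking that the sample mean is near k (the chi-square mean equals its degrees of freedom); k = 4 is an assumed example.

```python
import random

random.seed(7)
k = 4  # degrees of freedom

# a chi-square variate with k d.f. is the sum of k squared N(0,1) variates
samples = [sum(random.gauss(0, 1) ** 2 for _ in range(k))
           for _ in range(100_000)]

mean = sum(samples) / len(samples)
assert abs(mean - k) < 0.1  # the chi-square mean equals its d.f.
```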

Chi-Square Distribution

Note that for higher degrees of freedom this distribution
tends to the normal distribution

t Distribution

Given that X → N(0, 1) and Y → χ² with n degrees of freedom
(X and Y independent), the variable t defined by t = X/√(Y/n)
is said to follow the t distribution with parameter n (t → tₙ)
The pdf of the distribution is given by
p(t) = (1/(√n B(1/2, n/2))) (1 + t²/n)^(−(n+1)/2)  where −∞ < t < ∞
We say t → tₙ

t Distribution

Note that for higher degrees of freedom this distribution
tends to the normal distribution

F Distribution

Suppose X and Y are independent χ² variates with n₁ and n₂
degrees of freedom respectively. Then F = (X/n₁)/(Y/n₂) is said
to follow the F distribution with n₁ and n₂ degrees of freedom,
F → F(n₁, n₂)
The pdf of the distribution is given by
p(F) = (n₁/n₂)^(n₁/2) F^(n₁/2 − 1) / [B(n₁/2, n₂/2) (1 + (n₁/n₂)F)^((n₁+n₂)/2)]
where 0 < F < ∞
We say that F → F(n₁, n₂)

F Distribution

Note that for higher degrees of freedom n₁ and n₂ this
distribution tends to the normal distribution

A short note

The three distributions t, χ² and F are interrelated.
All three are used as sampling distributions to
infer various population parameters from samples
of size less than 30 (small samples)

Thank You!!
