Fall 2015
Instructor:
Ajit Rajwade
Topic Overview
We will study several useful families of random variables: Bernoulli, binomial, Poisson, multinomial, hypergeometric, Gaussian, uniform, exponential, and chi-square.
Bernoulli Distribution
Definition
Let X be a random variable whose value is 1 when a trial succeeds (which happens with probability p) and 0 when it fails (with probability 1-p). Then X is called a Bernoulli random variable with parameter p.
Properties
E[X] = p(1) + (1-p)(0) = p.
Var(X) = p(1-p)^2 + (1-p)(0-p)^2 = p(1-p).
What's the median?
What is (are) the mode(s)?
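The mean and variance above are easy to check with a quick simulation (a minimal sketch; p = 0.3 and the sample count are arbitrary choices):

```python
import random

# Simulate Bernoulli(p) trials and check E[X] = p, Var(X) = p(1-p).
rng = random.Random(0)
p = 0.3
samples = [1 if rng.random() < p else 0 for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, var)  # mean ≈ p = 0.3, var ≈ p(1-p) = 0.21
```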
Binomial Distribution
Defintion
Let X be a random variable denoting the number of successes in n independent Bernoulli trials, each with success probability p. Then
P(X = i) = C(n,i) p^i (1-p)^(n-i).
This is called the binomial pmf with parameters (n,p).
Definition
The pmf of X is given as follows:
P(X = i) = C(n,i) p^i (1-p)^(n-i), where C(n,i) = n! / (i! (n-i)!).
Explanation: Consider a sequence of trials with i successes and n-i failures. The probability that this particular sequence occurs is p^i (1-p)^(n-i), by the product rule. But there are C(n,i) such sequences, so we add their individual probabilities using the sum rule.
Definition
Example: In 5 coin tosses, if we had two heads and three tails, any one particular such sequence has probability p^2 (1-p)^3, and there are C(5,2) = 10 such sequences.
Definition
The pmf of X is given as follows:
P(X = i) = C(n,i) p^i (1-p)^(n-i), where C(n,i) = n! / (i! (n-i)!).
To verify it is a valid pmf:
Σ_{i=0}^{n} P(X = i) = Σ_{i=0}^{n} C(n,i) p^i (1-p)^(n-i) = (p + (1-p))^n = 1.
Binomial theorem!
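As a sanity check, the pmf can be summed numerically (the values of n and p are arbitrary illustrative choices):

```python
from math import comb

# Binomial pmf: P(X = i) = C(n, i) p^i (1-p)^(n-i)
def binom_pmf(i, n, p):
    return comb(n, i) * p**i * (1 - p)**(n - i)

# The probabilities sum to (p + (1-p))^n = 1, as the binomial theorem promises.
n, p = 12, 0.3
total = sum(binom_pmf(i, n, p) for i in range(n + 1))
print(total)
```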
Example 1
Let's say you design a smart self-driving car and you have
Example 2
At least half of an airplane's engines need to function for it to complete its flight safely.
Properties
Mean: Recall that binomial random variable X is the sum of n independent Bernoulli random variables X_i, each with success probability p. Hence
E(X) = Σ_{i=1}^{n} E(X_i) = np.
Variance: by independence,
Var(X) = Σ_{i=1}^{n} Var(X_i) = np(1-p).
Properties
The CDF is given as follows:
F_X(i) = P(X ≤ i) = Σ_{k=0}^{i} C(n,k) p^k (1-p)^(n-k).
Example
Bits are sent over a communications channel in packets
of 12. If the probability of a bit being corrupted over this
channel is 0.1 and such errors are independent, what is
the probability that no more than 2 bits in a packet are
corrupted?
If 6 packets are sent over the channel, what is the
probability that at least one packet will contain 3 or more
corrupted bits?
Let X denote the number of packets containing 3 or more
corrupted bits. What is the probability that X will exceed
its mean by more than 2 standard deviations?
Example
We want P(X = 0) + P(X = 1) + P(X = 2).
P(X = 0) = (0.1)^0 (0.9)^12 = 0.282
P(X = 1) = C(12,1) (0.1)^1 (0.9)^11 = 0.377
P(X = 2) = C(12,2) (0.1)^2 (0.9)^10 = 0.230
So the answer is 0.889.
Example
The probability that 3 or more bits in a packet are corrupted is 1 - 0.889 = 0.111.
Let Y = number of packets (out of 6) with 3 or more corrupted bits.
P(Y ≥ 1) = 1 - P(Y = 0) = ?
Example
Here μ = np = 6(0.111) = 0.666 and σ = √(np(1-p)) = √(6 × 0.111 × 0.889) ≈ 0.77, so μ + 2σ ≈ 2.2.
We want P(Y > μ + 2σ) = P(Y > 2.2) = P(Y ≥ 3) = 1 - P(Y=0) - P(Y=1) - P(Y=2).
P(Y = 0) = C(6,0) (0.111)^0 (0.889)^6 = 0.4936
P(Y = 1) = C(6,1) (0.111)^1 (0.889)^5 = 0.370
P(Y = 2) = C(6,2) (0.111)^2 (0.889)^4 = 0.115
P(Y > μ + 2σ) = 1 - (0.4936 + 0.370 + 0.115) = 0.0214
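The three answers can be reproduced with a short script. It carries full precision instead of the rounded 0.111 used on the slide, so the last value comes out near 0.021 rather than 0.0214:

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# P(no more than 2 of 12 bits corrupted), bit error probability 0.1
p_le2 = sum(binom_pmf(k, 12, 0.1) for k in range(3))
q = 1 - p_le2                       # P(a packet has 3 or more bad bits)
# P(at least one of 6 packets has 3 or more bad bits)
p_ge1 = 1 - binom_pmf(0, 6, q)
# P(Y >= 3) where Y ~ Binomial(6, q); note mu + 2*sigma lies between 2 and 3
mu, sigma = 6 * q, sqrt(6 * q * (1 - q))
p_ge3 = 1 - sum(binom_pmf(k, 6, q) for k in range(3))
print(p_le2, p_ge1, p_ge3)
```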
Related distributions
In a sequence of Bernoulli trials, let X be the random variable for the trial number that gave the first success. Then X is called a geometric random variable and its pmf is
P(X = k) = (1-p)^(k-1) p, k = 1, 2, ...
Properties: Mode
Let X ~ Binomial(n,p).
Then P(X = k) ≥ P(X = k-1) iff k ≤ (n+1)p (prove this).
Also P(X = k) ≥ P(X = k+1) iff k ≥ (n+1)p - 1 (prove this).
Any integer-valued k which satisfies both the above inequalities is a mode of the distribution.
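The two inequalities bracket the mode in the interval [(n+1)p - 1, (n+1)p], which can be spot-checked numerically (illustrative n and p):

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.3   # (n+1)p = 6.3, so the mode should be the integer 6
mode = max(range(n + 1), key=lambda k: binom_pmf(k, n, p))
print(mode)
```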
Poisson distribution
Recall the binomial pmf:
P(X = i) = (n! / (i!(n-i)!)) p^i (1-p)^(n-i).
Here p is the success probability. Setting λ = np and letting n → ∞ with λ held fixed, we can express it in the form
P(X = i) = e^{-λ} λ^i / i!.
Definition
The Poisson distribution is used to model the number of events occurring in a fixed interval of time (or space) when the events occur independently at a constant average rate λ.
Properties
To double-check that it is indeed a valid pmf, we check that:
Σ_{i=0}^{∞} e^{-λ} λ^i / i! = e^{-λ} e^{λ} = 1.
Properties
The afore-mentioned analysis (λ = np) suggests that the expected value is λ; we can verify this directly:
E(X) = Σ_{i=0}^{∞} i e^{-λ} λ^i / i!
= Σ_{i=1}^{∞} i e^{-λ} λ^i / i!
= λ e^{-λ} Σ_{i=1}^{∞} λ^{i-1} / (i-1)!
= λ e^{-λ} Σ_{j=0}^{∞} λ^j / j!   (substituting j = i-1)
= λ e^{-λ} e^{λ} = λ.
Properties
Variance:
Var(X) = E(X^2) - (E(X))^2.
E(X^2) = Σ_{i=0}^{∞} i^2 e^{-λ} λ^i / i! = λ(λ+1)   (by a derivation similar to the mean).
Var(X) = λ(λ+1) - λ^2 = λ.
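Both moments can be verified numerically by truncating the series far into the tail (λ = 4 is an arbitrary choice; terms are built recursively to avoid huge factorials):

```python
from math import exp

lam = 4.0
pmf = []
term = exp(-lam)             # P(X = 0)
for i in range(200):
    pmf.append(term)
    term *= lam / (i + 1)    # P(X = i+1) = P(X = i) * lam/(i+1)
total = sum(pmf)
mean = sum(i * p for i, p in enumerate(pmf))
var = sum(i * i * p for i, p in enumerate(pmf)) - mean**2
print(total, mean, var)  # ≈ 1, λ, λ
```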
[Figure: Poisson pmfs plotted for several values of λ; probabilities on the y-axis (0 to 0.35), i on the x-axis (0 to 60).]
Properties
Consider independent Poisson random variables X and Y with means λ1 and λ2. Then X + Y is Poisson distributed with mean λ1 + λ2.
Successive probabilities satisfy
P(X = i) / P(X = i-1) = (λ^i / i!) / (λ^{i-1} / (i-1)!) = λ / i,   with P(X = 0) = e^{-λ} > 0,
so the pmf increases while i ≤ λ and decreases thereafter; the mode is ⌊λ⌋ (when λ is an integer, λ - 1 is also a mode).
Examples of Poisson-modeled quantities:
Number of earthquakes in a region over a span of years
Number of wrong numbers dialed in a day
Number of laptops that fail on the first day of use
For a Poisson process with rate λ, the number of events X in a time interval of length t satisfies
P(X = i) = (λt)^i e^{-λt} / i!.
Now, the sensor element does not count all the incident
photons.
The number of counted photons is distributed as per
Binomial(n,p) where n, the number of incident photons is
itself Poisson distributed, and p is the success probability of
the Bernoulli trials that constitute the Binomial distribution.
Conditioned on n, the count is binomial; marginally, however (by "Poisson thinning"), the number of counted photons is again Poisson distributed, with mean pλ, rather than being described by a Binomial distribution.
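A simulation of this thinned Poisson (drawing Poisson samples via Knuth's multiplication method; λ = 10 and p = 0.4 are arbitrary) shows the counted photons behaving like Poisson(pλ), whose mean and variance are both pλ:

```python
import random
from math import exp

# Knuth's method for drawing one Poisson(lam) sample.
def poisson_sample(lam, rng):
    threshold, k, prod = exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(1)
lam, p = 10.0, 0.4
counted = []
for _ in range(50_000):
    n = poisson_sample(lam, rng)                       # incident photons
    counted.append(sum(1 for _ in range(n) if rng.random() < p))
mean = sum(counted) / len(counted)
var = sum((c - mean) ** 2 for c in counted) / len(counted)
print(mean, var)  # both should be close to p*lam = 4
```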
Multinomial distribution
Definition
Consider a sequence of n independent trials, each of which results in one of k possible outcomes, with respective probabilities p1, ..., pk.
Definition
Let X = (X1, ..., Xk) be a k-dimensional random variable for which Xj counts the number of trials that resulted in outcome j.
Definition
Then the pmf of X is given as follows:
P(X = (x1, x2, ..., xk)) = P(X1 = x1, X2 = x2, ..., Xk = xk) = (n! / (x1! x2! ... xk!)) p1^{x1} p2^{x2} ... pk^{xk},
where ∀i, 0 ≤ pi ≤ 1, Σ pi = 1, and x1 + x2 + ... + xk = n.
Definition
The success probabilities for each category, i.e. p1, ..., pk, must sum to 1.
Properties
Mean vector:
E(Xj) = n pj for each j = 1, ..., k   (each Xj is Binomial(n, pj)).
Properties
For vector-valued random variables, the variance is replaced by a covariance matrix. Prove this as an exercise:
Var(Xi + Xj) = n(pi + pj)(1 - pi - pj)
But Var(Xi + Xj) = E[((Xi - μi) + (Xj - μj))^2] = Var(Xi) + Var(Xj) + 2 Cov(Xi, Xj)
Hence Cov(Xi, Xj) = -n pi pj.
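A simulation sketch (category probabilities chosen arbitrarily) confirms E(X_i) = n p_i and Cov(X_i, X_j) = -n p_i p_j:

```python
import random

# Draw Multinomial(n, probs) by running n categorical trials per draw.
def multinomial_draw(n, probs, rng):
    counts = [0] * len(probs)
    for _ in range(n):
        u, acc = rng.random(), 0.0
        for i, p in enumerate(probs):
            acc += p
            if u <= acc or i == len(probs) - 1:  # last slot guards round-off
                counts[i] += 1
                break
    return counts

rng = random.Random(2)
n, probs = 30, [0.2, 0.5, 0.3]
draws = [multinomial_draw(n, probs, rng) for _ in range(40_000)]
m0 = sum(d[0] for d in draws) / len(draws)
m1 = sum(d[1] for d in draws) / len(draws)
cov01 = sum(d[0] * d[1] for d in draws) / len(draws) - m0 * m1
print(m0, cov01)  # E(X_0) = n*p_0 = 6, Cov(X_0, X_1) = -n*p_0*p_1 = -3
```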
Hypergeometric distribution
When you sample with replacement, the probability of picking any particular object stays the same on every draw (e.g., again 1/k).
When you sample without replacement, the probability changes from draw to draw.
Definition
Consider a set of objects of which N are of good quality and M are of defective quality. We pick a sample of n objects at random, without replacement.
There are C(N+M, n) ways of doing this.
Definition
There are C(N,i) C(M,n-i) ways to pick i good-quality objects (and n-i defective ones), so the number X of good-quality objects in the sample has pmf
P(X = i) = C(N,i) C(M,n-i) / C(N+M,n).
This is called the hypergeometric distribution.
Properties
Consider random variable Xi which has value 1 if the i-th object picked is of good quality, and 0 otherwise. For example,
P(X2 = 1) = P(X2 = 1 | X1 = 1) P(X1 = 1) + P(X2 = 1 | X1 = 0) P(X1 = 0)
= (N-1)/(N+M-1) · N/(N+M) + N/(N+M-1) · M/(N+M)
= N/(N+M).
In general, P(Xi = 1) = N/(N+M).
Properties
Note that:
X = Σ_{i=1}^{n} Xi
E(X) = Σ_{i=1}^{n} E(Xi) = nN/(N+M)
Var(X) = Var(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} Var(Xi) + 2 Σ_{i=1}^{n} Σ_{j=i+1}^{n} Cov(Xi, Xj)
Var(Xi) = P(Xi = 1)(1 - P(Xi = 1)) = NM/(N+M)^2
Properties
Note that:
Cov(Xi, Xj) = E(Xi Xj) - E(Xi) E(Xj)
E(Xi Xj) = P(Xi Xj = 1) = P(Xj = 1, Xi = 1) = P(Xj = 1 | Xi = 1) P(Xi = 1)
= (N-1)/(N+M-1) · N/(N+M)
Hence
Cov(Xi, Xj) = N(N-1)/((N+M)(N+M-1)) - (N/(N+M))^2 = -NM/((N+M)^2 (N+M-1))
Properties
Note that:
Var(X) = nNM/(N+M)^2 - n(n-1)NM/((N+M)^2 (N+M-1))
= (nNM/(N+M)^2) [1 - (n-1)/(N+M-1)]
= np(1-p) [1 - (n-1)/(N+M-1)]
Recall: each Xi is a Bernoulli random variable with parameter p = N/(N+M).
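The mean and variance formulas can be checked exactly against the pmf for small N, M, n (values chosen arbitrarily):

```python
from math import comb

N, M, n = 7, 5, 4   # 7 good, 5 defective, draw 4 without replacement
pmf = {i: comb(N, i) * comb(M, n - i) / comb(N + M, n)
       for i in range(max(0, n - M), min(n, N) + 1)}
mean = sum(i * p for i, p in pmf.items())
var = sum(i * i * p for i, p in pmf.items()) - mean**2
p_good = N / (N + M)
var_formula = n * p_good * (1 - p_good) * (1 - (n - 1) / (N + M - 1))
print(mean, var, var_formula)  # mean = n*N/(N+M) = 7/3
```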
Gaussian distribution
Definition
A continuous random variable X is said to be normally distributed with parameters μ and σ^2, denoted X ~ N(μ, σ^2), if its pdf is
f(x) = (1/(σ√(2π))) exp(-(x-μ)^2 / (2σ^2)).
Definition
If μ = 0 and σ = 1, it is called the standard normal distribution, denoted N(0,1):
f(x) = (1/√(2π)) exp(-x^2/2).
Properties
Mean:
E(X) = (1/(σ√(2π))) ∫_{-∞}^{∞} x e^{-(x-μ)^2/(2σ^2)} dx.
Substituting y = x - μ:
E(X) = (1/(σ√(2π))) ∫_{-∞}^{∞} (y + μ) e^{-y^2/(2σ^2)} dy = μ,
since the y term integrates to 0 (why?).
Properties
Variance:
Var(X) = E((X-μ)^2) = (1/(σ√(2π))) ∫_{-∞}^{∞} (x-μ)^2 e^{-(x-μ)^2/(2σ^2)} dx.
Substituting y = x - μ:
= (1/(σ√(2π))) ∫_{-∞}^{∞} y^2 e^{-y^2/(2σ^2)} dy = σ^2 (why?).
Properties
If X ~ N(μ, σ^2) and if Y = aX + b, then
Y ~ N(aμ + b, a^2 σ^2).
Proof on board. And in the book.
Properties
Median = mean (why?)
Because of symmetry of the pdf about the mean.
Mode = mean: can be checked by setting the first derivative of the pdf to zero.
Properties
The CDF is given by:
Φ(x) = F_X(x) = (1/(σ√(2π))) ∫_{-∞}^{x} e^{-(t-μ)^2/(2σ^2)} dt.
It has no closed form, but it is expressed via the error function, defined as:
erf(x) = (2/√π) ∫_{0}^{x} e^{-t^2} dt.
It follows that for the standard normal (μ = 0, σ = 1):
Φ(x) = (1/2) [1 + erf(x/√2)].
Properties
For a Gaussian with mean μ and standard deviation σ, it follows that:
F_X(x) = (1/2) [1 + erf((x - μ)/(σ√2))].
Properties
The probability that a Gaussian random variable has a value within n standard deviations of its mean is Φ(n) - Φ(-n):
n    Φ(n) - Φ(-n)
1    68.2%
2    95.4%
3    99.7%
4    99.99366%
5    99.9999%
6    99.9999998%
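These percentages follow from Φ(n) - Φ(-n) = erf(n/√2), which math.erf evaluates directly:

```python
from math import erf, sqrt

# P(|X - mu| <= n*sigma) = Phi(n) - Phi(-n) = erf(n / sqrt(2))
for n in range(1, 7):
    print(n, 100 * erf(n / sqrt(2)))  # 68.27, 95.45, 99.73, ...
```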
A strange phenomenon
Let's say you draw n = 2 values, called x1 and x2, from the Uniform[0,1] distribution (which has mean μ), and compute
y_j = (Σ_{i=1}^{n} x_i - nμ) / √n.
Repeat this many times (j = 1, 2, ...) and plot a histogram of the resulting y_j values.
A strange phenomenon
Now suppose you repeat the earlier two steps with larger and larger n.
It turns out that as n grows larger and larger, the histogram starts resembling a 0-mean Gaussian distribution with variance equal to that of the sampling distribution.
Now if you repeat the experiment with samples drawn from any other distribution instead of [0,1] uniform random (i.e., you change the sampling distribution), the phenomenon still occurs, though the resemblance may only start showing up at larger n. In general,
Y_n = Σ_{i=1}^{n} (x_i - μ_i) / √n,
where μ_i is the mean of the distribution of the i-th sample.
Formally: let x_1, x_2, ... be independent random variables with means μ_i and variances σ_i^2, and let s_n^2 = Σ_{i=1}^{n} σ_i^2. Define the indicator function
1_A : X → {0,1},   1_A(x) = 1 if x ∈ A, 0 otherwise.
Lindeberg condition: for every ε > 0,
lim_{n→∞} (1/s_n^2) Σ_{i=1}^{n} E[(x_i - μ_i)^2 · 1_{|x_i - μ_i| > ε s_n}] = 0.
If this condition holds and s_n^2 / n → σ^2, then
Y_n = Σ_{i=1}^{n} (x_i - μ_i) / √n ~ N(0, σ^2).
In the iid case with mean μ and variance σ^2, dividing by a further √n gives
Σ_{i=1}^{n} (x_i - μ) / n ~ N(0, σ^2/n)   (why?)
and hence
Σ_{i=1}^{n} x_i / n ~ N(μ, σ^2/n),
i.e., for large n the sample mean is approximately Gaussian around μ.
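A small experiment (sample means of Uniform[0,1] draws, for which σ^2 = 1/12) illustrates Var(sample mean) ≈ σ^2/n:

```python
import random

rng = random.Random(3)
n, trials = 50, 20_000
var_x = 1.0 / 12.0   # variance of Uniform[0,1]
means = [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]
m = sum(means) / trials
v = sum((x - m) ** 2 for x in means) / trials
print(m, v, var_x / n)  # sample mean ≈ 0.5, its variance ≈ σ²/n
```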
Application
Your friend tells you that in 10000 successive independent tosses of a coin, he observed a given number of heads, and claims the coin is fair. How would you check this claim? This is an instance of hypothesis testing.
Let X be the number of heads in n independent tosses of a coin with heads probability p. Then X ~ Binomial(n, p).
Hence, for large n, the following random variable is approximately standard normal:
(X - np) / √(np(1-p))
More precisely, it can be shown that when n → ∞, we have, ∀ a, b with a < b:
P(a ≤ (X - np)/√(np(1-p)) ≤ b) → Φ(b) - Φ(a),
where X ~ Binomial(n, p).
This is called the de Moivre-Laplace theorem and is a special case of the central limit theorem.
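The approximation can be compared against the exact binomial sum (n = 1000 and p = 0.5 are arbitrary choices that keep the exact sum cheap; a small continuity-correction gap remains):

```python
from math import comb, erf, sqrt

n, p = 1000, 0.5
a, b = -1.0, 1.0
mu, sd = n * p, sqrt(n * p * (1 - p))
lo, hi = int(mu + a * sd), int(mu + b * sd)
# exact P(lo < X <= hi) for X ~ Binomial(n, p)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(lo + 1, hi + 1))

def phi(x):  # standard normal CDF via erf
    return 0.5 * (1 + erf(x / sqrt(2)))

approx = phi(b) - phi(a)
print(exact, approx)  # both near 0.68
```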
Let X_1, ..., X_n be iid samples with mean μ and variance σ^2, and define the sample mean
X̄ = Σ_{i=1}^{n} X_i / n.
Then E(X̄) = μ and Var(X̄) = Σ_{i=1}^{n} Var(X_i) / n^2 = σ^2/n.
Define the sample variance
S^2 = Σ_{i=1}^{n} (X_i - X̄)^2 / (n-1).
Expanding the square gives the identity
(n-1) S^2 = Σ_{i=1}^{n} X_i^2 - n X̄^2.
We can compute E(S^2) as follows:
E((n-1) S^2) = E(Σ_{i=1}^{n} X_i^2 - n X̄^2) = n E(X_1^2) - n E(X̄^2).
Using E(W^2) = Var(W) + (E(W))^2 for any random variable W:
E((n-1) S^2) = n [Var(X_1) + (E(X_1))^2] - n [Var(X̄) + (E(X̄))^2]
= n σ^2 + n μ^2 - n (σ^2/n) - n μ^2 = (n-1) σ^2.
Hence E(S^2) = σ^2, i.e., S^2 is an unbiased estimate of σ^2.
If the sample variance were instead defined as
S^2 = Σ_{i=1}^{n} (X_i - X̄)^2 / n,
we would have (using the same identity Σ_{i=1}^{n} (X_i - X̄)^2 = Σ_{i=1}^{n} X_i^2 - n X̄^2):
E(S^2) = ((n-1)/n) σ^2,
which is biased.
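The bias can be verified exactly by enumerating all samples from a tiny discrete distribution (X ∈ {0,1} equally likely, so σ^2 = 0.25; sample size n = 3):

```python
from itertools import product

# X is 0 or 1 with probability 1/2 each: σ² = 0.25; sample size n = 3.
n = 3
exp_unbiased = exp_biased = 0.0
for sample in product([0, 1], repeat=n):
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)
    prob = 0.5 ** n
    exp_unbiased += prob * ss / (n - 1)   # divide by n-1
    exp_biased += prob * ss / n           # divide by n
print(exp_unbiased, exp_biased)  # 0.25 and (n-1)/n * 0.25
```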
Chi-square distribution
If Z1, Z2, ..., Zn are independent standard normal random variables, then X = Z1^2 + Z2^2 + ... + Zn^2 is said to have a chi-square distribution with n degrees of freedom. Its pdf involves the gamma function,
Γ(y) = ∫_{0}^{∞} x^{y-1} e^{-x} dx.
Chi-square distribution
You will be able to obtain the expression for the chi-square pdf using transformation of random variables, covered later in the course.
Additive property
If X1 and X2 are independent chi-square random variables with n1 and n2 degrees of freedom respectively, then X1 + X2 is a chi-square random variable with n1 + n2 degrees of freedom.
Chi-square distribution
Tables for the chi-square distribution are available for various values of n.
For iid X_i ~ N(μ, σ^2), start from the identity
(n-1) S^2 = Σ_{i=1}^{n} (X_i - X̄)^2 = Σ_{i=1}^{n} (X_i - μ)^2 - n(X̄ - μ)^2.
Dividing by σ^2 and rearranging:
Σ_{i=1}^{n} ((X_i - μ)/σ)^2 = Σ_{i=1}^{n} (X_i - X̄)^2 / σ^2 + ((X̄ - μ)/(σ/√n))^2.
The left-hand side is chi-square with n degrees of freedom. The last term is the square of a standard normal random variable (since X̄ ~ N(μ, σ^2/n)), i.e., chi-square with 1 degree of freedom.
It turns out that these two quantities are independent random variables. The proof
of this requires multivariate statistics and transformation of random variables, and is
deferred to a later point in the course. If you are curious, you can look up the proof, but it's not on the exam for now.
Given this fact about independence, it then follows that the middle term, (n-1)S^2/σ^2 = Σ_{i=1}^{n} (X_i - X̄)^2 / σ^2, has a chi-square distribution with n-1 degrees of freedom.
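A simulation sketch: (n-1)S^2/σ^2 computed from normal samples should have mean n-1 and variance 2(n-1), matching a chi-square distribution with n-1 degrees of freedom (n = 5 and σ = 2 are arbitrary):

```python
import random

rng = random.Random(4)
n, sigma, trials = 5, 2.0, 30_000
vals = []
for _ in range(trials):
    xs = [rng.gauss(0.0, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    vals.append(sum((x - xbar) ** 2 for x in xs) / sigma**2)  # (n-1)S²/σ²
m = sum(vals) / trials
v = sum((x - m) ** 2 for x in vals) / trials
print(m, v)  # mean ≈ n-1 = 4, variance ≈ 2(n-1) = 8
```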
Uniform distribution
A uniform random variable over the interval [a,b] has a constant pdf, f(x) = 1/(b-a) for a ≤ x ≤ b and 0 elsewhere, so that it integrates to 1.
It is easy to show that its mean and median are equal to (b+a)/2.
Applications
Uniform random variables, especially over the [0,1] interval, are the basic building blocks of random sampling: most methods for sampling from other distributions start from Uniform(0,1) draws.
Applications
For now, we will study two applications. How do you sample a value of a discrete random variable X with
P(X = x_i) = p_i, 1 ≤ i ≤ n, Σ_{i=1}^{n} p_i = 1?
Draw u ~ Uniform(0,1).
If u ≤ p_1, the sampled value is x_1.
If p_1 < u ≤ p_1 + p_2, the sampled value is x_2.
...
If p_1 + p_2 + ... + p_{n-1} < u ≤ p_1 + p_2 + ... + p_n, the sampled value is x_n.
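The scheme above is a minimal inverse-CDF sampler; a sketch (the helper name sample_discrete is our own):

```python
import random

def sample_discrete(values, probs, rng):
    """Inverse-CDF sampling: return x_i when u lands in the i-th slab of [0,1]."""
    u, acc = rng.random(), 0.0
    for x, p in zip(values, probs):
        acc += p
        if u <= acc:
            return x
    return values[-1]  # guard against floating-point round-off in the cumsum

rng = random.Random(5)
values, probs = ["a", "b", "c"], [0.5, 0.3, 0.2]
draws = [sample_discrete(values, probs, rng) for _ in range(50_000)]
freqs = {v: draws.count(v) / len(draws) for v in values}
print(freqs)  # close to {'a': 0.5, 'b': 0.3, 'c': 0.2}
```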
Applications
Uniform random variables over [0,1] can also be used to draw a random subset B_k of size k from a set of n objects {a_1, ..., a_n}. Define indicator variables (for j = 1, ..., n):
I_j = 1 if a_j ∈ B_k, else 0.
We want
P(I_j = 1 | I_1, I_2, ..., I_{j-1}) = (k - Σ_{i=1}^{j-1} I_i) / (n - (j-1)), 2 ≤ j ≤ n.
Algorithm:
U_1 ~ Uniform(0,1); I_1 = 1 if U_1 ≤ k/n, else 0.
U_2 ~ Uniform(0,1); I_2 = 1 if U_2 ≤ (k - I_1)/(n-1), else 0.
...
U_j ~ Uniform(0,1); I_j = 1 if U_j ≤ (k - I_1 - I_2 - ... - I_{j-1})/(n - j + 1), else 0.
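A sketch of this sequential subset sampler (the function name is our own); it always returns exactly k distinct items, and each item is included with probability k/n:

```python
import random

def sample_subset(n, k, rng):
    """Scan items 1..n, including item j with probability (k - chosen)/(n - j + 1)."""
    chosen = []
    for j in range(1, n + 1):
        if rng.random() < (k - len(chosen)) / (n - j + 1):
            chosen.append(j)
    return chosen

rng = random.Random(6)
subset = sample_subset(10, 4, rng)
freq_1 = sum(1 in sample_subset(10, 4, rng) for _ in range(20_000)) / 20_000
print(subset, freq_1)  # 4 distinct items; item 1 appears with prob k/n = 0.4
```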
Exponential Distribution
Motivation
Consider a Poisson process with an average rate of λ successes per unit time.
Motivation
Let X ~ Poisson(λt) denote the number of successes in the time interval (0,t).
Let T be the time of the first success; T is a random variable whose distribution we are going to derive.
P(T ≤ t) = 1 - P(T > t) = 1 - P(X = 0).
The probability that the first success occurred after time t = the probability that there was no success in the time interval (0,t), i.e., X = 0 in that interval.
P(T ≤ t) = 1 - P(X = 0) = 1 - e^{-λt} (λt)^0 / 0! = 1 - e^{-λt}.
Motivation
F_T(t) = 1 - e^{-λt}
f_T(t) = λ e^{-λt}, t ≥ 0
(0 elsewhere)
Such a random variable T is called an exponential
random variable. It models the waiting time for a
Poisson process. It has a parameter λ.
Mean:
E(T) = ∫_{0}^{∞} t λ e^{-λt} dt = 1/λ.
Variance:
Var(T) = E(T^2) - (E(T))^2.
E(T^2) = ∫_{0}^{∞} t^2 λ e^{-λt} dt = 2/λ^2.
Hence Var(T) = 2/λ^2 - 1/λ^2 = 1/λ^2.
Median: solving F_T(u) = 1 - e^{-λu} = 1/2 gives u = ln 2 / λ.
Properties: Memorylessness
A non-negative random variable T is said to be
memoryless if:
∀ s, t ≥ 0: P(T > s + t | T > t) = P(T > s).
Meaning: this gives the probability that, given we have already waited t units of time without a success, we wait at least s units more. Memorylessness says this equals the unconditional P(T > s). By the definition of conditional probability,
P(T > s + t | T > t) = P(T > s + t, T > t) / P(T > t).
Properties: Memorylessness
You can easily verify that this holds for the exponential
distribution.
P(T > t) = e^{-λt}, P(T > s) = e^{-λs}, P(T > s+t) = e^{-λ(s+t)}.
Hence P(T > s+t | T > t) = e^{-λ(s+t)} / e^{-λt} = e^{-λs} = P(T > s).
Example
Suppose that the number of miles a car can run before its battery wears out is exponentially distributed with mean μ miles. If the car has already run l miles, then by memorylessness the probability that it runs at least k more miles is
P(T > k + l | T > l) = P(T > k + l, T > l) / P(T > l) = (1 - F_T(k+l)) / (1 - F_T(l)) = P(T > k) = e^{-k/μ}.
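A numeric spot-check of memorylessness (λ, s, t chosen arbitrarily):

```python
from math import exp

lam, s, t = 0.5, 2.0, 3.0

def surv(x):  # P(T > x) for T ~ Exponential(lam)
    return exp(-lam * x)

lhs = surv(s + t) / surv(t)   # P(T > s+t | T > t)
rhs = surv(s)                 # P(T > s)
print(lhs, rhs)               # equal: the exponential is memoryless
```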
Property: Minimum
Consider independent exponentially distributed random variables X_1, ..., X_n with rate parameters λ_1, ..., λ_n. Then
P(min_i X_i > x) = Π_{i=1}^{n} P(X_i > x)   (due to independence)
= Π_{i=1}^{n} e^{-λ_i x} = e^{-(Σ_{i=1}^{n} λ_i) x},
so min_i X_i is exponentially distributed with parameter Σ_{i=1}^{n} λ_i.
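A simulation sketch (rates chosen arbitrarily; exponential draws via the inverse-CDF transform T = -ln(1-U)/λ):

```python
import random
from math import log

rng = random.Random(7)
rates = [0.5, 1.0, 1.5]   # min should be Exponential(0.5 + 1.0 + 1.5)

def exp_sample(lam, rng):
    return -log(1.0 - rng.random()) / lam   # inverse-CDF draw

mins = [min(exp_sample(l, rng) for l in rates) for _ in range(100_000)]
m = sum(mins) / len(mins)
print(m)  # ≈ 1/(0.5 + 1.0 + 1.5) = 1/3
```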