Definition: Let $S$ be a sample space and $\mathcal{F}$ a sigma-field defined over it. Let $P : \mathcal{F} \to \mathbb{R}$ be a mapping from the sigma-algebra into the real line such that for each $A \in \mathcal{F}$, there exists a unique $P(A) \in \mathbb{R}$. Clearly $P$ is a set function and is called probability if it satisfies the following axioms:
1. $P(A) \ge 0$ for all $A \in \mathcal{F}$
2. $P(S) = 1$
3. Countable additivity: If $A_1, A_2, \ldots$ are pair-wise disjoint events, i.e. $A_i \cap A_j = \emptyset$ for $i \ne j$, then
$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$
Conditional Probability
The probability of an event B under the condition that another event A has occurred is called the conditional probability of B given A and is defined by
$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \qquad P(A) > 0$
so that
$P(A \cap B) = P(A)\, P(B \mid A)$
The events A and B are independent if $P(B \mid A) = P(B)$ and $P(A \mid B) = P(A)$, so that $P(A \cap B) = P(A)\, P(B)$.
Bayes Theorem
Suppose $A_1, A_2, \ldots, A_n$ form a partition of $S$, that is, $S = A_1 \cup A_2 \cup \cdots \cup A_n$ and $A_i \cap A_j = \emptyset$ for $i \ne j$. Suppose the event B occurs if one of the events $A_1, A_2, \ldots, A_n$ occurs. Thus we have the information of the probabilities $P(A_i)$ and $P(B \mid A_i),\ i = 1, 2, \ldots, n$. We ask the following question: given that B has occurred, what is the probability that a particular event $A_k$ has occurred? In other words, what is $P(A_k \mid B)$?
$P(A_k \mid B) = \frac{P(A_k)\, P(B \mid A_k)}{P(B)} = \frac{P(A_k)\, P(B \mid A_k)}{\sum_{i=1}^{n} P(A_i)\, P(B \mid A_i)}$
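As a quick numerical illustration, here is a minimal Python sketch that evaluates the posterior probabilities $P(A_k \mid B)$ from assumed priors $P(A_i)$ and likelihoods $P(B \mid A_i)$; the numerical values are made up for the example.

```python
import numpy as np

# Hypothetical prior probabilities P(A_i) of a partition A_1, A_2, A_3
prior = np.array([0.5, 0.3, 0.2])
# Hypothetical conditional probabilities P(B | A_i)
likelihood = np.array([0.02, 0.10, 0.40])

# Total probability: P(B) = sum_i P(A_i) P(B | A_i)
p_b = np.sum(prior * likelihood)

# Bayes theorem: P(A_k | B) = P(A_k) P(B | A_k) / P(B)
posterior = prior * likelihood / p_b

print("P(B) =", p_b)              # 0.12
print("P(A_k | B) =", posterior)  # posterior probabilities sum to 1
```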
Random Variables
Consider the probability space $(S, \mathcal{F}, P)$. A random variable is a function $X : S \to \mathbb{R}$ (illustrated in Figure 2). The distribution function of the random variable X is the function defined by
$F_X(x) = P(\{s \mid X(s) \le x,\ s \in S\}) = P(\{X \le x\})$
for all $x \in \mathbb{R}$. It is also called the cumulative distribution function, abbreviated as CDF. The notation $F_X(x)$ is used to denote the CDF of the RV X at the point x.
Properties
$F_X(x)$ is a non-decreasing and right-continuous function of $x$.
$F_X(-\infty) = 0$ and $F_X(\infty) = 1$
$P(\{x_1 < X \le x\}) = F_X(x) - F_X(x_1)$
For a discrete random variable X taking values $x_i$, the probability mass function is
$p_X(x_i) = P(\{s \mid X(s) = x_i\}) = P(\{X = x_i\})$
Definition: A random variable X defined on the probability space $(S, \mathcal{F}, P)$ is said to be continuous if $F_X(x)$ is absolutely continuous. Thus $F_X(x)$ can be expressed in terms of a probability density function (PDF) $f_X(x)$ as
$F_X(x) = \int_{-\infty}^{x} f_X(u)\, du$
Clearly, $f_X(x) \ge 0$ for all $x$, and
$\int_{-\infty}^{\infty} f_X(x)\, dx = 1$
Consider two random variables X and Y defined on the same probability space, so that each sample point s is mapped to the point $(X(s), Y(s))$ in the plane. The probability $P(\{X \le x, Y \le y\})$ is a function of $(x, y)$ and is called the joint distribution function of X and Y, denoted by $F_{X,Y}(x, y)$.
Clearly,
$F_{X,Y}(-\infty, -\infty) = F_{X,Y}(-\infty, y) = F_{X,Y}(x, -\infty) = 0$
and $F_{X,Y}(\infty, \infty) = 1$.
$P(\{x_1 < X \le x_2,\ y_1 < Y \le y_2\}) = F_{X,Y}(x_2, y_2) - F_{X,Y}(x_1, y_2) - F_{X,Y}(x_2, y_1) + F_{X,Y}(x_1, y_1)$
The marginal distribution functions are obtained as $F_X(x) = F_{X,Y}(x, \infty)$ and $F_Y(y) = F_{X,Y}(\infty, y)$.
The joint probability density function $f_{X,Y}(x, y)$, when it exists, satisfies
$F_{X,Y}(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f_{X,Y}(u, v)\, dv\, du$
If $f_{X,Y}(x, y)$ is continuous at $(x, y)$, then
$f_{X,Y}(x, y) = \frac{\partial^2}{\partial x\, \partial y} F_{X,Y}(x, y)$
Clearly,
$f_{X,Y}(x, y) \ge 0 \ \ \forall (x, y)$
and $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx\, dy = 1$.
The conditional density of Y given X = x is defined by
$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$
when $f_X(x) \ne 0$. Thus,
$f_{X,Y}(x, y) = f_X(x)\, f_{Y|X}(y \mid x) = f_Y(y)\, f_{X|Y}(x \mid y)$
The random variables X and Y are independent if and only if
$f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$
Bayes' rule for continuous random variables follows as
$f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{f_X(x)\, f_{Y|X}(y \mid x)}{f_Y(y)} = \frac{f_{X,Y}(x, y)}{\int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx} = \frac{f_{Y|X}(y \mid x)\, f_X(x)}{\int_{-\infty}^{\infty} f_X(u)\, f_{Y|X}(y \mid u)\, du}$
Expected values
The expected value of a function $g(X, Y)$ of continuous random variables X and Y is given by
$E\, g(X, Y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\, f_{X,Y}(x, y)\, dx\, dy$
Particularly, we have
$E\, X = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x\, f_{X,Y}(x, y)\, dx\, dy = \int_{-\infty}^{\infty} x \left(\int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy\right) dx = \int_{-\infty}^{\infty} x\, f_X(x)\, dx$
For two continuous random variables X and Y, the joint moment of order $m + n$ is defined as
$E(X^m Y^n) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^m y^n\, f_{X,Y}(x, y)\, dx\, dy$
and the joint central moment of order $m + n$ is defined as
$E(X - \mu_X)^m (Y - \mu_Y)^n = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu_X)^m (y - \mu_Y)^n\, f_{X,Y}(x, y)\, dx\, dy$
where $\mu_X = E\, X$ and $\mu_Y = E\, Y$.
The variance $\sigma_X^2$ is given by $\sigma_X^2 = E(X - \mu_X)^2 = E\, X^2 - \mu_X^2$. The covariance $\mathrm{Cov}(X, Y)$ between two RVs X and Y is given by
$\mathrm{Cov}(X, Y) = E(X - \mu_X)(Y - \mu_Y) = E\, XY - \mu_X \mu_Y$.
Two RVs X and Y are called uncorrelated if $\mathrm{Cov}(X, Y) = 0$, or in other words, $E\, XY = \mu_X \mu_Y$.
If X and Y are independent, they are always uncorrelated. The converse is generally not true.
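A standard counterexample, sketched below in Python, shows why the converse fails; the choice $X \sim U(-1, 1)$, $Y = X^2$ is ours for illustration, not from the text. Here $\mathrm{Cov}(X, Y) = E\,X^3 = 0$, yet Y is completely determined by X.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200_000)   # X ~ U(-1, 1)
y = x**2                                   # Y = X^2, fully dependent on X

# Sample covariance is close to 0, i.e. X and Y are (nearly) uncorrelated
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)
print("Cov(X, Y) estimate:", cov_xy)

# But they are not independent: P(Y < 0.25 | |X| > 0.5) = 0, while P(Y < 0.25) is about 0.5
print("P(Y < 0.25) estimate:", np.mean(y < 0.25))
print("P(Y < 0.25 | |X| > 0.5) =", np.mean(y[np.abs(x) > 0.5] < 0.25))
```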
The ratio
$\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$
is called the correlation coefficient.
Suppose we wish to approximate Y by a linear function of X:
$\hat{Y} = aX + b$
where $\hat{Y}$ is the approximation of Y. Such an approximation is called linear regression.
Approximation error: $Y - \hat{Y}$
Mean-square approximation error: $E(Y - \hat{Y})^2 = E(Y - aX - b)^2$
Minimizing $E(Y - \hat{Y})^2$ with respect to a and b gives the optimal values of a and b. Corresponding to the optimal solutions for a and b, we have
$\frac{\partial}{\partial a} E(Y - aX - b)^2 = 0$
$\frac{\partial}{\partial b} E(Y - aX - b)^2 = 0$
Solving, we get the slope
$a = \frac{\mathrm{Cov}(X, Y)}{\sigma_X^2} = \rho_{X,Y} \frac{\sigma_Y}{\sigma_X}$
and the intercept
$b = \mu_Y - a\, \mu_X$
so that the regression line (Figure 2) is
$\hat{Y} - \mu_Y = \rho_{X,Y} \frac{\sigma_Y}{\sigma_X} (X - \mu_X)$
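In practice the optimal coefficients are estimated by replacing the moments with sample averages. Below is a minimal Python sketch on synthetic data; the model Y = 2X + 1 + noise and all parameter values are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, size=100_000)             # X with sigma_X = 2
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, x.size)   # Y = 2X + 1 + noise

# a = Cov(X, Y) / Var(X),  b = mu_Y - a * mu_X
cov_xy = np.mean(x * y) - x.mean() * y.mean()
a = cov_xy / x.var()
b = y.mean() - a * x.mean()
print("a estimate:", a, " b estimate:", b)   # close to 2 and 1

# Equivalent form of the slope: a = rho_{X,Y} * sigma_Y / sigma_X
rho = cov_xy / (x.std() * y.std())
print("rho*sigma_Y/sigma_X:", rho * y.std() / x.std())
```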
Conditional expectation
Suppose X and Y are two continuous RVs. The conditional expectation of Y given X = x is defined by
$\mu_{Y|X=x} = E(Y \mid X = x) = \int_{-\infty}^{\infty} y\, f_{Y|X}(y \mid x)\, dy$
Two RVs X and Y are said to be jointly Gaussian, with means $\mu_1$ and $\mu_2$, variances $\sigma_1^2$ and $\sigma_2^2$ and correlation coefficient $\rho_{X,Y} = \rho$, if their joint density is
$f_{X,Y}(x, y) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \frac{(x - \mu_1)^2}{\sigma_1^2} - \frac{2\rho (x - \mu_1)(y - \mu_2)}{\sigma_1 \sigma_2} + \frac{(y - \mu_2)^2}{\sigma_2^2} \right] \right\}$
If $\rho_{X,Y} = 0$, we have
$f_{X,Y}(x, y) = \frac{1}{2\pi \sigma_1 \sigma_2}\, e^{-\frac{(x - \mu_1)^2}{2\sigma_1^2} - \frac{(y - \mu_2)^2}{2\sigma_2^2}} = \frac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{(x - \mu_1)^2}{2\sigma_1^2}} \cdot \frac{1}{\sqrt{2\pi}\,\sigma_2}\, e^{-\frac{(y - \mu_2)^2}{2\sigma_2^2}} = f_X(x)\, f_Y(y)$
For jointly Gaussian RVs, uncorrelatedness implies independence.
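A small numerical check of the factorization above when $\rho_{X,Y} = 0$; the parameter values and the evaluation point below are arbitrary choices.

```python
import numpy as np

mu1, mu2, s1, s2 = 1.0, -2.0, 2.0, 0.5   # assumed means and standard deviations, rho = 0
x, y = 0.3, -1.2                          # arbitrary evaluation point

def gauss(u, mu, s):
    """Univariate Gaussian density N(mu, s^2) evaluated at u."""
    return np.exp(-((u - mu) ** 2) / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)

# Joint density with rho = 0 ...
f_joint = np.exp(-((x - mu1) ** 2) / (2 * s1**2)
                 - ((y - mu2) ** 2) / (2 * s2**2)) / (2 * np.pi * s1 * s2)
# ... equals the product of the marginal densities
print(f_joint, gauss(x, mu1, s1) * gauss(y, mu2, s2))   # identical values
```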
Random Vectors
We can extend the definition of joint RVs to $n$ random variables $X_1, X_2, \ldots, X_n$ defined on the same probability space $(S, \mathcal{F}, P)$. We denote these $n$ RVs by the random vector
$\mathbf{X} = [X_1\ X_2\ \cdots\ X_n]'$
A particular value of the random vector $\mathbf{X}$ is denoted by $\mathbf{x} = [x_1\ x_2\ \cdots\ x_n]'$.
The CDF of the random vector $\mathbf{X}$ is defined as the joint CDF of $X_1, X_2, \ldots, X_n$. Thus
$F_{\mathbf{X}}(\mathbf{x}) = F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n)$
The joint pdf, when it exists, is given by
$f_{\mathbf{X}}(\mathbf{x}) = \frac{\partial^n}{\partial x_1 \partial x_2 \cdots \partial x_n} F_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n)$
The mean vector of $\mathbf{X}$ is defined as
$\mu_{\mathbf{X}} = E(\mathbf{X}) = [E(X_1)\ E(X_2)\ \cdots\ E(X_n)]' = [\mu_{X_1}\ \mu_{X_2}\ \cdots\ \mu_{X_n}]'$
Similarly, for each $(i, j)$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, n$, $i \ne j$, we can define the joint moment $E(X_i X_j)$. All the joint moments and the mean-square values $E X_i^2$, $i = 1, 2, \ldots, n$, can be represented in a correlation matrix $\mathbf{R}_{\mathbf{X},\mathbf{X}}$ given by
$\mathbf{R}_{\mathbf{X},\mathbf{X}} = E\,\mathbf{X}\mathbf{X}' = \begin{bmatrix} E X_1^2 & E X_1 X_2 & \cdots & E X_1 X_n \\ E X_2 X_1 & E X_2^2 & \cdots & E X_2 X_n \\ \vdots & \vdots & \ddots & \vdots \\ E X_n X_1 & E X_n X_2 & \cdots & E X_n^2 \end{bmatrix}$
Similarly, all the possible covariances and the variances can be represented in terms of a matrix called the covariance matrix $\mathbf{C}_{\mathbf{X},\mathbf{X}}$ defined by
$\mathbf{C}_{\mathbf{X},\mathbf{X}} = E(\mathbf{X} - \mu_{\mathbf{X}})(\mathbf{X} - \mu_{\mathbf{X}})' = \begin{bmatrix} \mathrm{var}(X_1) & \mathrm{cov}(X_1, X_2) & \cdots & \mathrm{cov}(X_1, X_n) \\ \mathrm{cov}(X_2, X_1) & \mathrm{var}(X_2) & \cdots & \mathrm{cov}(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(X_n, X_1) & \mathrm{cov}(X_n, X_2) & \cdots & \mathrm{var}(X_n) \end{bmatrix}$
It can be shown that
$\mathbf{C}_{\mathbf{X},\mathbf{X}} = \mathbf{R}_{\mathbf{X},\mathbf{X}} - \mu_{\mathbf{X}}\mu_{\mathbf{X}}'$
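The identity $\mathbf{C}_{\mathbf{X},\mathbf{X}} = \mathbf{R}_{\mathbf{X},\mathbf{X}} - \mu_{\mathbf{X}}\mu_{\mathbf{X}}'$ is easy to verify numerically. A sketch with numpy on simulated data; the 3-dimensional mean vector and covariance matrix used to generate samples are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
# 100000 samples of a hypothetical 3-dimensional random vector X (rows = samples)
X = rng.multivariate_normal(mean=[1.0, 0.0, -2.0],
                            cov=[[2.0, 0.5, 0.0],
                                 [0.5, 1.0, 0.3],
                                 [0.0, 0.3, 1.5]],
                            size=100_000)

mu = X.mean(axis=0)                 # sample mean vector
R = (X.T @ X) / X.shape[0]          # correlation matrix R = E[X X']
C = R - np.outer(mu, mu)            # covariance matrix C = R - mu mu'

# Compare against numpy's own (1/N-normalized) covariance estimate
print(np.allclose(C, np.cov(X, rowvar=False, bias=True)))
```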
The random variables $X_1, X_2, \ldots, X_n$ are called identically distributed if each random variable has the same marginal distribution function, that is, for every $x \in \mathbb{R}$,
$F_{X_1}(x) = F_{X_2}(x) = \cdots = F_{X_n}(x)$
An important subclass of independent random variables is the independent and identically distributed (iid) random variables. The random variables $X_1, X_2, \ldots, X_n$ are called iid if $X_1, X_2, \ldots, X_n$ are mutually independent and each of $X_1, X_2, \ldots, X_n$ has the same marginal distribution function.
The random variables $X_1, X_2, \ldots, X_n$ are jointly Gaussian if their joint density is of the form
$f_{X_1, X_2, \ldots, X_n}(x_1, x_2, \ldots, x_n) = \frac{1}{\sqrt{(2\pi)^n \det(\mathbf{C}_{\mathbf{X}})}}\, e^{-\frac{1}{2}(\mathbf{x} - \mu_{\mathbf{X}})' \mathbf{C}_{\mathbf{X}}^{-1} (\mathbf{x} - \mu_{\mathbf{X}})}$
Remark
The properties of the two-dimensional Gaussian random variables can be extended to multiple jointly Gaussian random variables.
If $X_1, X_2, \ldots, X_n$ are jointly Gaussian, then the marginal PDF of each of $X_1, X_2, \ldots, X_n$ is Gaussian.
If the jointly Gaussian random variables $X_1, X_2, \ldots, X_n$ are uncorrelated, then $X_1, X_2, \ldots, X_n$ are also independent.
Inequalities based on expectations
The mean and variance also give some quantitative information about the bounds of an RV. The following inequalities are extremely useful in many practical problems.
Markov and Chebyshev Inequalities
For a random variable X which takes only nonnegative values,
$P\{X \ge a\} \le \frac{E(X)}{a}$
where $a > 0$. This follows because
$E(X) = \int_0^{\infty} x\, f_X(x)\, dx \ge \int_a^{\infty} x\, f_X(x)\, dx \ge \int_a^{\infty} a\, f_X(x)\, dx = a\, P\{X \ge a\}$
so that
$P\{X \ge a\} \le \frac{E(X)}{a}$
Clearly, applying the Markov inequality to the nonnegative random variable $(X - k)^2$,
$P\{(X - k)^2 \ge a\} \le \frac{E(X - k)^2}{a}$
Taking $k = \mu_X$ and $a = \varepsilon^2$,
$P\{|X - \mu_X| \ge \varepsilon\} = P\{(X - \mu_X)^2 \ge \varepsilon^2\} \le \frac{\sigma_X^2}{\varepsilon^2}$
which is the Chebyshev inequality.
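Both bounds are easy to check by simulation. A minimal Python sketch for an exponential random variable with mean 1; the thresholds a and epsilon are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=1_000_000)   # nonnegative, E(X) = 1, var(X) = 1

a = 3.0
print("Markov:    P(X >= a) =", np.mean(x >= a), " bound:", x.mean() / a)

eps = 2.0
lhs = np.mean(np.abs(x - x.mean()) >= eps)
print("Chebyshev: P(|X - mu| >= eps) =", lhs, " bound:", x.var() / eps**2)
```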
Theorem 1 (Weak law of large numbers, WLLN): Suppose $\{X_n\}$ is a sequence of random variables defined on a probability space $(S, \mathcal{F}, P)$ with finite means $\mu_i = E X_i,\ i = 1, 2, \ldots$ and finite second moments, and let $S_n = \sum_{i=1}^{n} X_i$. If
$\lim_{n \to \infty} \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1, j \ne i}^{n} \mathrm{cov}(X_i, X_j) = 0$
then
$\frac{S_n}{n} \xrightarrow{P} \frac{1}{n} \sum_{i=1}^{n} \mu_i$
Note that $\frac{S_n}{n} \xrightarrow{P} \frac{1}{n} \sum_{i=1}^{n} \mu_i$ means that for any $\varepsilon > 0$,
$\lim_{n \to \infty} P\left\{ \left| \frac{S_n}{n} - \frac{1}{n} \sum_{i=1}^{n} \mu_i \right| \ge \varepsilon \right\} = 0$
Proof: We have
$E\left( \frac{S_n}{n} - \frac{1}{n} \sum_{i=1}^{n} \mu_i \right)^2 = E\left( \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_i) \right)^2$
$= \frac{1}{n^2} \sum_{i=1}^{n} E(X_i - \mu_i)^2 + \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1, j \ne i}^{n} E(X_i - \mu_i)(X_j - \mu_j)$
$= \frac{1}{n^2} \sum_{i=1}^{n} \sigma_i^2 + \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1, j \ne i}^{n} \mathrm{cov}(X_i, X_j)$
Now $\lim_{n \to \infty} \frac{1}{n^2} \sum_{i=1}^{n} \sigma_i^2 = 0$, as each $\sigma_i^2$ is finite. Also, by hypothesis,
$\lim_{n \to \infty} \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1, j \ne i}^{n} \mathrm{cov}(X_i, X_j) = 0$
Therefore
$\lim_{n \to \infty} E\left( \frac{S_n}{n} - \frac{1}{n} \sum_{i=1}^{n} \mu_i \right)^2 = 0$
By the Chebyshev inequality,
$P\left\{ \left| \frac{S_n}{n} - \frac{1}{n} \sum_{i=1}^{n} \mu_i \right| \ge \varepsilon \right\} \le \frac{1}{\varepsilon^2} E\left( \frac{S_n}{n} - \frac{1}{n} \sum_{i=1}^{n} \mu_i \right)^2$
so that
$\lim_{n \to \infty} P\left\{ \left| \frac{S_n}{n} - \frac{1}{n} \sum_{i=1}^{n} \mu_i \right| \ge \varepsilon \right\} = 0$
i.e. $\frac{S_n}{n} \xrightarrow{P} \frac{1}{n} \sum_{i=1}^{n} \mu_i$.
In particular, if the $X_i$ are uncorrelated with a common mean $\mu$ and common variance $\sigma^2$, then $\frac{1}{n^2} \sum_{i=1}^{n} \sigma_i^2 = \frac{\sigma^2}{n} \to 0$ and $\frac{S_n}{n} \xrightarrow{P} \mu$.
Central limit theorem (CLT): Suppose $X_1, X_2, \ldots$ are iid random variables with mean $\mu$ and finite variance $\sigma^2$. Let $S_n = \sum_{i=1}^{n} X_i$ and $Z_n = \frac{S_n - n\mu}{\sigma \sqrt{n}}$. Then
$\lim_{n \to \infty} F_{Z_n}(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du$
that is, $Z_n$ converges in distribution to a standard Gaussian random variable.
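A simulation sketch of both limit theorems for iid Bernoulli random variables; the choice p = 0.3 and the sample sizes are assumptions for illustration. The sample mean $S_n/n$ concentrates around $\mu$, and the standardized sum $Z_n$ behaves like a standard Gaussian.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n, trials = 0.3, 10_000, 100_000
mu, sigma = p, np.sqrt(p * (1 - p))        # mean and std of a single Bernoulli(p)

# S_n = X_1 + ... + X_n for iid Bernoulli(p), sampled directly as a Binomial(n, p)
S_n = rng.binomial(n, p, size=trials)

# WLLN: the sample mean S_n/n concentrates around mu
print("P(|S_n/n - mu| >= 0.01):", np.mean(np.abs(S_n / n - mu) >= 0.01))

# CLT: Z_n = (S_n - n*mu) / (sigma*sqrt(n)) is approximately N(0, 1)
Z_n = (S_n - n * mu) / (sigma * np.sqrt(n))
print("P(Z_n <= 1):", np.mean(Z_n <= 1.0), "  (Phi(1) is about 0.8413)")
```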
Random processes
Definition: Consider a probability space $(S, \mathcal{F}, P)$. A random process can be defined on $(S, \mathcal{F}, P)$ as an indexed family of random variables $\{X(s, t),\ s \in S,\ t \in \Gamma\}$, where $\Gamma$ is an index set usually denoting time.
Thus $X(s, t)$ is a function defined on $S \times \Gamma$. Figure 1 illustrates a random process. The random process $\{X(s, t),\ s \in S,\ t \in \Gamma\}$ is also referred to as a random function or a stochastic process.
We observe the following in the case of a random process $\{X(s, t),\ s \in S,\ t \in \Gamma\}$:
(1) For a fixed time $t = t_0$, the collection $\{X(s, t_0),\ s \in S\}$ is a random variable.
(2) For a fixed sample point $s = s_0$, the collection $\{X(s_0, t),\ t \in \Gamma\}$ is no longer a function of the sample space. It is a deterministic function on $\Gamma$ and is called a realization of the random process. Thus each realization corresponds to a particular sample point, and the cardinality of $S$ determines the number of such realizations. The collection of all the possible realizations of a random process is called the ensemble.
(3) When both $s$ and $t$ are fixed, at $s = s_0$ and $t = t_0$, $X(s_0, t_0)$ becomes a single number.
The underlying sample space and the index set are usually omitted to simplify the notation, and the random process $\{X(s, t),\ s \in S,\ t \in \Gamma\}$ is generally denoted by $\{X(t)\}$.
Figure 1: A random process, showing the realizations $X(s_1, t)$, $X(s_2, t)$, $X(s_3, t)$ corresponding to sample points $s_1, s_2, s_3$, plotted against time $t$.
To describe $\{X(t),\ t \in \Gamma\}$ we have to consider the collection of the random variables at all possible values of $t$. For any positive integer $n$, the collection $X(t_1), X(t_2), \ldots, X(t_n)$ represents $n$ jointly distributed random variables. Thus the random process $\{X(t),\ t \in \Gamma\}$ at these $n$ instants $t_1, t_2, \ldots, t_n$ can be described by specifying the $n$-th order joint distribution function
$F_{X(t_1), X(t_2), \ldots, X(t_n)}(x_1, x_2, \ldots, x_n) = P(X(t_1) \le x_1, X(t_2) \le x_2, \ldots, X(t_n) \le x_n)$
and the $n$-th order joint probability density function defined by
$f_{X(t_1), X(t_2), \ldots, X(t_n)}(x_1, x_2, \ldots, x_n) = \frac{\partial^n}{\partial x_1 \partial x_2 \cdots \partial x_n} F_{X(t_1), X(t_2), \ldots, X(t_n)}(x_1, x_2, \ldots, x_n)$
However, we would have to consider the joint distribution function for very large $n$ and all possible choices of $t_1, t_2, \ldots, t_n$ to describe the random process in sufficient detail. This being a formidable task, we have to look for other descriptions of a random process.
Moments of a random process
We defined the moments of a random variable and joint moments of random variables. We can similarly define all the possible moments and joint moments of a random process $\{X(t),\ t \in \Gamma\}$. Particularly, the following moments are important.
The mean of the process at time $t$: $\mu_X(t) = E\, X(t)$.
$R_X(t_1, t_2)$ = autocorrelation function of the process at times $t_1$ and $t_2$ = $E(X(t_1) X(t_2))$. Note that $R_X(t_1, t_2) = R_X(t_2, t_1)$.
The autocovariance function $C_X(t_1, t_2)$ of the random process at times $t_1$ and $t_2$ is defined by
$C_X(t_1, t_2) = E(X(t_1) - \mu_X(t_1))(X(t_2) - \mu_X(t_2)) = R_X(t_1, t_2) - \mu_X(t_1)\mu_X(t_2)$
In particular, $C_X(t, t) = E(X(t) - \mu_X(t))^2$ is the variance of the process at time $t$.
For a discrete random process, we can define the autocorrelation sequence similarly.
For a wide-sense stationary (WSS) process, the mean is constant and the autocorrelation depends only on the lag $\tau = t_1 - t_2$, written $R_X(\tau)$. If $R_X(\tau)$ drops quickly, the signal samples are less correlated, which in turn means that the signal changes rapidly with respect to time; such a signal has strong high-frequency components. If $R_X(\tau)$ drops slowly, the signal samples are highly correlated and such a signal has fewer high-frequency components.
$R_X(\tau)$ is directly related to the frequency-domain representation of a WSS process. The power spectral density $S_X(\omega)$ is the contribution to the average power at frequency $\omega$ and is given by
$S_X(\omega) = \int_{-\infty}^{\infty} R_X(\tau)\, e^{-j\omega\tau}\, d\tau$
with the inverse relation
$R_X(\tau) = \frac{1}{2\pi} \int_{-\infty}^{\infty} S_X(\omega)\, e^{j\omega\tau}\, d\omega$
This pair of relations constitutes the Wiener-Khinchin theorem.
Example
PSD of the amplitude-modulated random-phase sinusoid $X(t) = A\, M(t) \cos(\omega_c t + \Phi)$, where $\Phi \sim U[0, 2\pi]$ and $M(t)$ is a WSS process independent of $\Phi$:
$R_X(\tau) = \frac{A^2}{2} R_M(\tau) \cos \omega_c \tau$
$S_X(\omega) = \frac{A^2}{4} \left[ S_M(\omega - \omega_c) + S_M(\omega + \omega_c) \right]$
where $S_M(\omega)$ is the PSD of $M(t)$.
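A simulation sketch of the autocorrelation in this example, taking $M(t) \equiv 1$ (so $R_M(\tau) = 1$) and assumed values $A = 2$, $\omega_c = 0.4\pi$; the ensemble average over the random phase is compared with $(A^2/2)\cos\omega_c\tau$.

```python
import numpy as np

rng = np.random.default_rng(5)
A, wc = 2.0, 0.4 * np.pi           # assumed amplitude and carrier frequency (rad/sample)
n = np.arange(200)                 # discrete time instants
phi = rng.uniform(0, 2 * np.pi, size=20_000)   # one random phase per realization

# realizations X[n] = A cos(wc n + phi), one row per realization (M(t) taken as 1)
X = A * np.cos(wc * n[None, :] + phi[:, None])

# ensemble-average autocorrelation E{X[n + m] X[n]} at n = 0 for a few lags m
for m in [0, 1, 5]:
    est = np.mean(X[:, m] * X[:, 0])
    print(f"m={m}: estimate {est:.3f}   theory {(A**2 / 2) * np.cos(wc * m):.3f}")
```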
The Wiener-Khinchin theorem is also valid for discrete-time random processes.
For a WSS discrete-time process $\{X[n]\}$ with autocorrelation sequence $R_X[m]$,
$S_X(\omega) = \sum_{m=-\infty}^{\infty} R_X[m]\, e^{-j\omega m}, \qquad -\pi \le \omega \le \pi$
or $S_X(f) = \sum_{m=-\infty}^{\infty} R_X[m]\, e^{-j 2\pi f m}, \qquad -\tfrac{1}{2} \le f \le \tfrac{1}{2}$
and
$R_X[m] = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_X(\omega)\, e^{j\omega m}\, d\omega$
For a discrete-time random process, the generalized PSD is defined in the z-domain as follows:
$S_X(z) = \sum_{m=-\infty}^{\infty} R_X[m]\, z^{-m}$
Consider a discrete-time linear time-invariant (LTI) system with impulse response $h[n]$, so that a deterministic input $x[n]$ produces the output
$y[n] = \sum_{k=-\infty}^{\infty} h[k]\, x[n-k]$
The frequency response of the system is
$H(\omega) = \sum_{n=-\infty}^{\infty} h[n]\, e^{-j\omega n}$
If the input is a random process $\{X[n]\}$, the output is the random process
$Y[n] = \sum_{k=-\infty}^{\infty} h[k]\, X[n-k]$
in the sense that each realization is subjected to the convolution operation. Assume that $\{X[n]\}$ is WSS. Then the expected value of the output is given by
$\mu_Y = E\, Y[n] = \sum_{k=-\infty}^{\infty} h[k]\, E\, X[n-k] = \mu_X \sum_{k=-\infty}^{\infty} h[k] = \mu_X H(0)$
The cross-correlation of the input $X[n+m]$ and the output $Y[n]$ is given by
$E\, X[n+m]\, Y[n] = E\left( X[n+m] \sum_{k=-\infty}^{\infty} h[k]\, X[n-k] \right) = \sum_{k=-\infty}^{\infty} h[k]\, E\, X[n+m]\, X[n-k] = \sum_{k=-\infty}^{\infty} h[k]\, R_X[m+k] = \sum_{l=-\infty}^{\infty} h[-l]\, R_X[m-l]$
so that $R_{XY}[m] = h[-m] * R_X[m]$.
Similarly,
$E\, Y[n+m]\, Y[n] = E\left( Y[n+m] \sum_{k=-\infty}^{\infty} h[k]\, X[n-k] \right) = \sum_{k=-\infty}^{\infty} h[k]\, E\, Y[n+m]\, X[n-k] = \sum_{k=-\infty}^{\infty} h[k]\, R_{YX}[m+k]$
so that $R_Y[m] = h[-m] * R_{YX}[m]$, where $R_{YX}[m] = h[m] * R_X[m]$. Taking discrete-time Fourier transforms of these relations gives
$S_Y(\omega) = |H(\omega)|^2\, S_X(\omega)$
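A numerical sketch of $S_Y(\omega) = |H(\omega)|^2 S_X(\omega)$: white noise (flat $S_X(\omega) = \sigma^2$) is passed through a simple 2-tap moving-average filter (an arbitrary choice), and the output PSD is estimated by an averaged periodogram.

```python
import numpy as np

rng = np.random.default_rng(6)
sigma2, N, segments = 1.0, 256, 2000
h = np.array([0.5, 0.5])                         # 2-tap moving-average filter

# Averaged periodogram estimate of the output PSD S_Y(w)
Sy = np.zeros(N)
for _ in range(segments):
    x = rng.normal(0.0, np.sqrt(sigma2), N)      # white noise input, S_X(w) = sigma2
    y = np.convolve(x, h)[:N]                    # filter each realization
    Sy += np.abs(np.fft.fft(y))**2 / N
Sy /= segments

# Theory: S_Y(w) = |H(w)|^2 * S_X(w)
w = 2 * np.pi * np.arange(N) / N
H = h[0] + h[1] * np.exp(-1j * w)
for k in [0, 32, 64, 128]:                       # a few frequency bins
    print(f"w = {w[k]:.2f}: estimate {Sy[k]:.3f}   theory {abs(H[k])**2 * sigma2:.3f}")
```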
A white noise process $X(t)$ is defined by its PSD
$S_X(\omega) = \frac{N_0}{2}, \qquad -\infty < \omega < \infty$
where $N_0$ is a real constant called the intensity of the white noise. The corresponding autocorrelation function is given by
$R_X(\tau) = \frac{N_0}{2}\, \delta(\tau)$
where $\delta(\tau)$ is the Dirac delta function. The average power of the process is
$E\, X^2(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{N_0}{2}\, d\omega = \infty$
Figure: (a) PSD $S_X(\omega) = N_0/2$ and (b) autocorrelation $R_X(\tau) = (N_0/2)\delta(\tau)$ of a white noise process.
A discrete-time white noise process $\{X[n]\}$ is similarly defined by the autocorrelation sequence
$R_X[m] = \sigma^2\, \delta[m]$
Here $\sigma^2$ is the variance of $X[n]$, which is independent of $n$, and $\delta[m]$ is the unit impulse signal. By taking the discrete-time Fourier transform, we get
$S_X(\omega) = \sigma^2$
so that the PSD of a discrete-time white noise process is flat over $-\pi \le \omega \le \pi$.
Note that a white noise process is described by its second-order statistics only and is therefore not unique. Particularly, if in addition each $X[n]$ is Gaussian distributed, the process is called a white Gaussian noise process. Similarly, a sequence of independent, identically distributed Bernoulli random variables, centered to zero mean, is also a white noise process. If a white noise sequence of variance $\sigma_v^2$ is applied to an LTI system with frequency response $H(\omega)$, the output PSD is
$S_Y(\omega) = \sigma_v^2\, |H(\omega)|^2$
A WSS random signal $\{X[n]\}$ that satisfies the Paley-Wiener condition $\int_{-\pi}^{\pi} |\ln S_X(\omega)|\, d\omega < \infty$ is called a regular process and can be considered as the output of a linear filter driven by a white noise sequence $\{V[n]\}$ of variance $\sigma_v^2$; its generalized PSD can then be factored as
$S_X(z) = \sigma_v^2\, H_c(z)\, H_a(z)$
where $H_c(z)$ is the causal minimum-phase transfer function and $H_a(z) = H_c(z^{-1})$ is the corresponding anticausal (maximum-phase) factor.
Figure: innovation representation of a regular process, $V[n] \to H_c(z) \to X[n]$, and the corresponding whitening filter, $X[n] \to 1/H_c(z) \to V[n]$.
The spectral factorization theorem enables us to model a regular random process as an output of a minimum phase linear filter with
white noise as input. Different models are developed using different forms of linear filters.
These models are mathematically described by linear constant coefficient difference equations.
In statistics, random-process modeling using difference equations is known as time series analysis.
In most practical situations, the process may be considered as the output of a filter that has both poles and zeros.
Such a process $\{X[n]\}$ is modelled as the output of the filter
$H(z) = \frac{\sum_{i=0}^{q} b_i z^{-i}}{1 + \sum_{i=1}^{p} a_i z^{-i}}$
driven by the white noise sequence $V[n]$, so that
$X[n] = -\sum_{i=1}^{p} a_i X[n-i] + \sum_{i=0}^{q} b_i V[n-i]$
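A minimal sketch that generates a realization of such a model: an AR(2) special case ($q = 0$, $b_0 = 1$, with assumed coefficients $a_1 = -0.75$, $a_2 = 0.5$) driven by unit-variance white Gaussian noise, implemented directly from the difference equation above.

```python
import numpy as np

rng = np.random.default_rng(7)
a = [-0.75, 0.5]           # assumed AR coefficients a_1, a_2 (p = 2), stable filter
b = [1.0]                  # b_0 (q = 0), so X[n] = -a_1 X[n-1] - a_2 X[n-2] + V[n]
N = 10_000

v = rng.normal(0.0, 1.0, N)        # white Gaussian noise input V[n]
x = np.zeros(N)
for n in range(N):
    # X[n] = -sum_i a_i X[n-i] + sum_i b_i V[n-i], with zero initial conditions
    ar = sum(a[i - 1] * x[n - i] for i in range(1, len(a) + 1) if n - i >= 0)
    ma = sum(b[i] * v[n - i] for i in range(len(b)) if n - i >= 0)
    x[n] = -ar + ma

# sample autocorrelation at a few lags (a second-order description of {X[n]})
for m in range(4):
    print(f"R_X[{m}] estimate: {np.mean(x[m:] * x[:N - m]):.3f}")
```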