Handout 1
Proof: Since E and E^c are disjoint and their union is S, from statement 3 of the axiom,

Pr{S} = Pr{E} + Pr{E^c}.

From statement 1, we obtain the desired property, i.e.

1 = Pr{E} + Pr{E^c}. ¤
• If E and F are not disjoint, then

Pr{E ∪ F} = Pr{E} + Pr{F} − Pr{E, F}

Since E ∪ (F − E) = E ∪ F and E ∩ (F − E) = ∅,

Pr{E ∪ F} = Pr{E} + Pr{F − E} = Pr{E} + Pr{F} − Pr{F, E},

where the last equality uses Pr{F − E} = Pr{F} − Pr{F, E}. ¤
Course notes were prepared by Prof. R.M.A.P. Rajatheva and revised by Dr. Poompat Saengudomlert.
It is common to write Pr{E ∩ F} as Pr{E, F}. We shall adopt this notation.
• If F1, . . . , Fn are disjoint, then

Pr{⋃_{i=1}^{n} Fi} = Σ_{i=1}^{n} Pr{Fi}
Proof: The statement follows from induction. For example, consider n = 3. Since F1 ∪ F2 and F3 are disjoint, we can write

Pr{F1 ∪ F2 ∪ F3} = Pr{F1 ∪ F2} + Pr{F3} = Pr{F1} + Pr{F2} + Pr{F3}. ¤
The conditional probability of event E given that event F happens (or in short given event F), denoted by Pr{E|F}, is defined as

Pr{E|F} = Pr{E, F} / Pr{F}
Proof: Write

E = ⋃_{i=1}^{n} (E ∩ Fi).

Since the events E ∩ F1, . . . , E ∩ Fn are disjoint, Pr{E} = Σ_{i=1}^{n} Pr{E, Fi} = Σ_{i=1}^{n} Pr{E|Fi} Pr{Fi}. ¤

Pr{Fi|E} = Pr{E|Fi} Pr{Fi} / Σ_{j=1}^{n} Pr{E|Fj} Pr{Fj}
Proof: Write Pr{Fi|E} as Pr{E, Fi}/Pr{E} = Pr{E|Fi} Pr{Fi}/Pr{E}, and apply the total probability theorem to Pr{E} in the denominator. ¤

Applying the definition of conditional probability repeatedly yields

Pr{F1, . . . , Fn} = Pr{F1} ∏_{i=2}^{n} Pr{Fi | F1, . . . , Fi−1}
or equivalently

Pr{E|F} = Pr{E}.

In addition, events E and F are conditionally independent given event G if

Pr{E, F|G} = Pr{E|G} Pr{F|G}.
• If X(s) can take any real value in a continuous range (which requires S to be uncountable), then X(s) is a continuous random variable.
The basic idea behind a random variable is that we can consider probabilistic events as numerical-valued events, which leads us to a probability function. With this function,
we can neglect the underlying mapping from s to X, and consider a random variable X
as a direct numerical outcome of a probabilistic experiment or action.
1.1.3 Probability Functions
By using a random variable X, we can define numerical-valued events such as X = x and
X ≤ x for x ∈ R. The probability function
FX (x) = Pr{X ≤ x}
is known as the cumulative distribution function (CDF) or simply the distribution func-
tion. Note that the CDF is defined for all x ∈ R.
• It is customary to denote a random variable by an upper-case letter, e.g. X, and
denote its specific value by a lower-case letter, e.g. x.
• The nature of the function FX (x) is determined by random variable X, which is
identified in the subscript. When the associated random variable X is clear from
the context, we often write F (x) instead of FX (x).
• Since FX (x) indicates a probability value, it is dimensionless.
The probability density function (PDF) is defined as the derivative of the CDF, i.e.

fX(x) = dFX(x)/dx
NOTE: A common mistake is to think that fX (x) = Pr{X = x}; it is not always true.
Overall, the PDF fX(x) or the CDF FX(x) provides a complete description of random variable X.
1.1.4 Continuous vs. Discrete Random Variables
Roughly speaking, a continuous random variable has a continuous CDF. A discrete ran-
dom variable has a staircase CDF. A mixed-type random variable has a CDF containing
discontinuities, but the CDF is not necessarily constant between discontinuities. Fig-
ure 1.1 illustrates different types of CDFs.
Figure 1.1: CDFs and PDFs of different types of random variables.
Since the PDF is the derivative of the CDF, a continuous random variable has a PDF without impulses. However, the PDF of a discrete or mixed-type random variable contains impulses due to the discontinuities in the CDF.
PMF
For a discrete random variable, let 𝒳 denote the countable set of all possible values of X(s). We can then define a probability mass function (PMF) as

fX(x) = Pr{X = x}

where x ∈ 𝒳. Note that a PMF is only meaningful for a discrete random variable. The same notation fX(x) is used for both the PDF and the PMF; it is usually clear from the context which type of function is referred to by fX(x).
Example 1.1 : Consider rolling a die. The set of sample points of this probabilistic experiment is S = {1, 2, 3, 4, 5, 6}. The natural definition of an associated random variable is
X(s) = s, s ∈ S.
The corresponding PMF is

fX(x) = 1/6 for x ∈ {1, 2, 3, 4, 5, 6}.

Figure 1.2: PDF and CDF of the result of a die roll.
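As a quick numerical illustration (a minimal sketch in Python; the sample size and seed are arbitrary choices), the PMF and CDF of the die roll can be estimated from simulated outcomes and compared with the exact values:

import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)     # simulated die rolls, values 1..6

values = np.arange(1, 7)
pmf_est = np.array([(rolls == v).mean() for v in values])   # empirical PMF
cdf_est = np.cumsum(pmf_est)                                # empirical CDF at x = 1, ..., 6

print("exact PMF    :", np.full(6, 1 / 6))
print("estimated PMF:", pmf_est)
print("estimated CDF:", cdf_est)             # approaches 1/6, 2/6, ..., 1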
Handout 2
The PDF for X (or Y) alone is called a marginal PDF of X (or Y) and can be found from the joint PDF by integrating over the other random variable, i.e.

fX(x) = ∫_{−∞}^{∞} fXY(x, y) dy,    fY(y) = ∫_{−∞}^{∞} fXY(x, y) dx.
Example 1.2 : Suppose that fXY(x, y) = (1/4) e^{−|x|−|y|}. The marginal PDF of X is

fX(x) = ∫_{−∞}^{∞} (1/4) e^{−|x|−|y|} dy = (1/4) e^{−|x|} ∫_{−∞}^{∞} e^{−|y|} dy
      = (1/2) e^{−|x|} ∫_{0}^{∞} e^{−y} dy = (1/2) e^{−|x|} × (−e^{−y})|_{0}^{∞} = (1/2) e^{−|x|}.
1.2 Functions of Random Variables
Consider a random variable Y that is obtained as a function of another random variable
X. In particular, suppose that Y = g(X). We first consider g that is monotonic (either
increasing or decreasing).
Monotonic Functions
If g is monotonic, each value y of Y has a unique inverse denoted by g −1 (y), as illustrated
in figure 1.3.
When g is monotonically increasing, FY(y) = Pr{Y ≤ y} = Pr{X ≤ g^{−1}(y)} = FX(g^{−1}(y)), yielding

fY(y) = dFY(y)/dy = fX(g^{−1}(y)) · dg^{−1}(y)/dy.

Similarly, when g is monotonically decreasing,

fY(y) = −fX(g^{−1}(y)) · dg^{−1}(y)/dy.

It follows that, for a monotonic function g, we have

fY(y) = fX(g^{−1}(y)) · |dg^{−1}(y)/dy|
Example 1.3 : Let Y = g(X), where g(x) = ax + b. Then, g^{−1}(y) = (y − b)/a, yielding dg^{−1}(y)/dy = 1/a. It follows that

fY(y) = fX((y − b)/a) · |1/a| = (1/|a|) fX((y − b)/a). ¤
Figure 1.4: Nonmonotonic function of random variable X.
Nonmonotonic Functions
If g is not monotonic, then several values of x can correspond to a single value of y, as
illustrated in figure 1.4.
We can view g as having multiple monotonic components g1, . . . , gK, where K is the number of monotonic components, and sum the PDFs from these components, i.e.

fY(y) = Σ_{k=1}^{K} fX(g_k^{−1}(y)) · |dg_k^{−1}(y)/dy|
Example 1.4 : Let Y = g(X), where g(x) = ax2 with a > 0, as illustrated in figure 1.5.
Figure 1.5: Y = aX 2 with a > 0.
Note that

|dg_1^{−1}(y)/dy| = |dg_2^{−1}(y)/dy| = |(1/(2a)) (y/a)^{−1/2}| = 1/(2√(ay)).

It follows that

fY(y) = (1/(2√(ay))) [fX(−√(y/a)) + fX(√(y/a))] for y ≥ 0. ¤
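The derived PDF can be checked by Monte Carlo simulation. The sketch below assumes that X is a standard Gaussian random variable and a = 2; both choices are illustrative only and are not part of the example:

import numpy as np
from scipy.stats import norm

a = 2.0                                 # illustrative constant
rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)      # assumed input distribution: X ~ N(0, 1)
y = a * x**2

hist, edges = np.histogram(y, bins=400, range=(0.0, 10.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# fY(y) = [fX(-sqrt(y/a)) + fX(sqrt(y/a))] / (2*sqrt(a*y)), y >= 0
f_formula = (norm.pdf(-np.sqrt(centers / a)) + norm.pdf(np.sqrt(centers / a))) / (2 * np.sqrt(a * centers))

for yv in (0.5, 1.0, 2.0, 4.0):
    i = np.argmin(np.abs(centers - yv))
    print(f"y={yv:4.1f}  simulated={hist[i]:.4f}  formula={f_formula[i]:.4f}")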
Handout 3
where E[·] denotes the operator for taking the expected value of a random variable. For convenience, we also denote E[X] by X̄.

Suppose that Y = g(X), i.e. Y is a function of X. One way to find E[Y] is to first compute fY(y) and then compute E[Y] = ∫_{−∞}^{∞} y fY(y) dy. However, it is often easier to use the following identity.

E[Y] = E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx
Another useful property in taking the expectation is the linearity property, which follows directly from the linearity of the integration operation. In particular, for any random variables X1, . . . , XN and any real numbers a1, . . . , aN,

E[Σ_{n=1}^{N} a_n X_n] = Σ_{n=1}^{N} a_n E[X_n]
• Mean of X, denoted by E[X] or X̄: Note that the mean of X is equal to the 1st moment of X.

• Mean square of X, denoted by E[X²]: The mean square of X is equal to the 2nd moment of X. More specifically,

E[X²] = ∫_{−∞}^{∞} x² fX(x) dx
• Variance of X, denoted by var[X] or σX²: The variance of X is equal to the 2nd central moment of X. More specifically,

var[X] = ∫_{−∞}^{∞} (x − X̄)² fX(x) dx
where var[·] denotes the operator for taking the variance of a random variable.
Note that the mean E[X] can be thought of as the best guess of X in terms of the mean
square error. In particular, consider the problem of finding a number a that minimizes
the mean square error MSE = E[(X − a)2 ]. We show below that the error is minimized
by setting a = E[X]. In particular, solving dMSE/da = 0 yields
0 = (d/da) E[X² − 2aX + a²] = (d/da)(E[X²] − 2aE[X] + a²) = −2E[X] + 2a,

or equivalently a = E[X].
Roughly speaking, the variance σX² measures the effective width of the PDF around the mean. We next provide a more quantitative discussion on the variance.
For a nonnegative random variable X and any a > 0, the Markov inequality states that

Pr{X ≥ a} ≤ E[X]/a.

Proof: Pr{X ≥ a} = ∫_{a}^{∞} fX(x) dx ≤ ∫_{a}^{∞} (x/a) fX(x) dx ≤ (1/a) ∫_{0}^{∞} x fX(x) dx = E[X]/a. ¤
Figure 1.6 illustrates how Pr{|X − E[X]| ≥ b} in the Chebyshev inequality is equal to the area under the “tails” of the PDF. In particular, for b = 2σX, we have

Pr{|X − E[X]| ≥ 2σX} ≤ 1/4,

which means that we can expect at least 75% of observations on random variable X to be within the range E[X] ± 2σX. Thus, the smaller the variance, the smaller the spread of likely values.
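As a numerical illustration of how conservative the bound can be, the following Python sketch compares the empirical tail probability with the Chebyshev bound; the choice of an exponential distribution for X is an arbitrary assumption made only for this example:

import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=1_000_000)   # E[X] = 1, var[X] = 1 (illustrative choice)

mean, std = x.mean(), x.std()
for k in (1.5, 2.0, 3.0):
    b = k * std
    empirical = np.mean(np.abs(x - mean) >= b)   # Pr{|X - E[X]| >= b}
    chebyshev = 1.0 / k**2                       # Chebyshev bound: var[X]/b^2 = 1/k^2
    print(f"k={k}: empirical={empirical:.4f}  Chebyshev bound={chebyshev:.4f}")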
Figure 1.6: Area under the PDF tails for the Chebyshev inequality.
To help compute the variance σX², the following identity is sometimes useful.

σX² = E[X²] − X̄²
Multivariate Expectations
Consider a function g(X, Y) of two random variables X and Y. Then,

E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fX,Y(x, y) dx dy
When g(X, Y ) = XY , we have
E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y fX,Y(x, y) dx dy.
In addition, if X and Y are independent, i.e. fX,Y (x, y) = fX (x)fY (y), we can write
E[XY] = (∫_{−∞}^{∞} x fX(x) dx)(∫_{−∞}^{∞} y fY(y) dy) = E[X] E[Y].
Thus, for independent random variables X and Y ,
E[XY ] = E[X]E[Y ] for independent X and Y
In addition,
" N # N
X X
var Xn = var[Xn ] for uncorrelated X1 , . . . , XN
n=1 n=1
Finally, recall that E[XY ] = E[X]E[Y ] for independent X and Y . It follows that
independent random variables are uncorrelated. However, the converse is not true in
general.
1.4 Real and Complex Random Vectors and Their Functions
1.4.1 Real Random Vectors
A real random vector is a vector of random variables. In particular, let X = (X1 , . . . , XN ),
where X1 , . . . , XN are random variables. By convention, a real random vector is a column
vector. The statistics of X is fully described by the joint CDF of X1, . . . , XN, i.e.

FX(x) = Pr{X1 ≤ x1, . . . , XN ≤ xN}.

1.4.2 Complex Random Variables

A complex random variable Z is defined in terms of two real random variables X and Y as

Z = X + iY.

The mean of Z is

E[Z] = Z̄ = X̄ + iȲ,

while the variance of Z is

σZ² = E[|Z − Z̄|²].

The covariance of two complex random variables Z1 and Z2 is defined as

C_{Z1Z2} = E[(Z1 − Z̄1)(Z2 − Z̄2)*]
1.4.3 Functions of Random Vectors
Consider N random variables X1 , . . . , XN . Let Y1 , . . . , YN be functions of X1 , . . . , XN . In
particular,
Yn = gn (X1 , . . . , XN ), n = 1, . . . , N.
Let X = (X1 , . . . , XN ) and Y = (Y1 , . . . , YN ). In addition, let g(x) = (g1 (x), . . . , gN (x)).
Assuming that g is invertible, then the joint PDF of Y can be written in terms of the
joint PDF of X as
fY(y) = |J(y)| fX(g^{−1}(y))

where J(y) is the Jacobian determinant, i.e. the determinant of the N × N matrix whose (i, j) entry is ∂g_i^{−1}(y)/∂y_j.
Suppose that there are multiple solutions of x for y = g(x). We can view g as having
multiple components g1 , . . . , gK . It follows that
fY(y) = Σ_{k=1}^{K} |J_k(y)| fX(g_k^{−1}(y))
Example 1.6 : Suppose that we know the joint PDF fX1,X2(x1, x2) for random variables X1 and X2. Define

Y1 = g1(X1, X2) = X1 + X2
Y2 = g2(X1, X2) = X1

The inverse transformation is x1 = y2 and x2 = y1 − y2, whose Jacobian determinant equals −1, so that |J(y)| = 1 and fY1,Y2(y1, y2) = fX1,X2(y2, y1 − y2).
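A quick Monte Carlo check of this result (a minimal sketch; taking X1 and X2 to be independent standard Gaussians is an assumption made only for this illustration):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
x1 = rng.standard_normal(2_000_000)
x2 = rng.standard_normal(2_000_000)
y1, y2 = x1 + x2, x1

# Estimate fY1,Y2 near a point (y1s, y2s) by counting samples in a small box.
y1s, y2s, h = 0.8, 0.3, 0.05
in_box = (np.abs(y1 - y1s) < h) & (np.abs(y2 - y2s) < h)
density_est = in_box.mean() / (2 * h) ** 2

density_formula = norm.pdf(y2s) * norm.pdf(y1s - y2s)   # fX1,X2(y2, y1 - y2) for independent N(0,1)
print("estimated:", density_est, "  formula:", density_formula)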
Handout 4
so that we can write X = AZ. We shall derive the PDF fX (x) in what follows. For
simplicity, we focus on the case with M = N . However, the resultant PDF expressions
are also valid for M 6= N .
We begin with the marginal PDF of Zm, which is the zero-mean unit-variance Gaussian PDF, i.e.

fZ(z) = (1/√(2π)) e^{−z²/2}.

Since the Zm's are IID, we can write

fZ(z) = ∏_{m=1}^{N} (1/√(2π)) e^{−z_m²/2} = (1/(2π)^{N/2}) e^{−z^T z/2}.

Using the identity fX(x) = (1/|det A|) fZ(A^{−1}x), we can write

(A^{−1}x)^T (A^{−1}x) = x^T (A^{−1})^T A^{−1} x = x^T (A^T)^{−1} A^{−1} x = x^T (AA^T)^{−1} x.
Let CX be the covariance matrix for random vector X. It is easy to see that X̄ = A Z̄ = 0, yielding

CX = E[XX^T] = E[AZ(AZ)^T] = E[AZZ^T A^T] = A E[ZZ^T] A^T = AA^T

where the last equality follows from the fact that E[ZZ^T] = I. Since CX = AA^T,

fX(x) = (1/((2π)^{N/2} √(det CX))) e^{−(1/2) x^T CX^{−1} x}    (zero-mean jointly Gaussian)

fX(x) = (1/((2π)^{N/2} √(det CX))) e^{−(1/2) (x − X̄)^T CX^{−1} (x − X̄)}    (jointly Gaussian)
The proof is similar to the zero-mean jointly Gaussian case and is omitted.
Some important properties of jointly Gaussian random vector X are listed below.
1. A linear transformation of X yields another jointly Gaussian random vector.
2. The PDF of X is fully determined by the mean X and the covariance matrix CX ,
which are the first-order and second-order statistics.
Example 1.7 : Recall that the Gaussian PDF has the form

fX(x) = (1/√(2πσX²)) e^{−(x − X̄)²/(2σX²)}.
We now show that two jointly Gaussian random variables are independent if they are
uncorrelated. Let X1 and X2 be jointly Gaussian and uncorrelated. It follows that the
covariance matrix of X = (X1, X2) has the form

CX = [ σ1²  0 ; 0  σ2² ],

where σ1² and σ2² are the variances of X1 and X2 respectively. By substituting

√(det CX) = σ1 σ2  and  CX^{−1} = [ 1/σ1²  0 ; 0  1/σ2² ]
into the joint PDF expression of X, we can write
fX1,X2(x1, x2) = (1/(2πσ1σ2)) e^{−(1/2)[(x1 − X̄1)²/σ1² + (x2 − X̄2)²/σ2²]}
             = ((1/√(2πσ1²)) e^{−(x1 − X̄1)²/(2σ1²)}) · ((1/√(2πσ2²)) e^{−(x2 − X̄2)²/(2σ2²)})
             = fX1(x1) fX2(x2),
which implies that X1 and X2 are independent. The argument can in fact be extended in
a straightforward manner to show that uncorrelated jointly Gaussian random variables
X1 , . . . , XN are independent. ¤
Example Engineering Applications

Name          Application
uniform       modeling of quantization error
Gaussian      amplitude distribution of thermal noise; approximation of other distributions
exponential   message length and interarrival time in data communications
Rayleigh      fading in communication channels; envelope of bandpass Gaussian noise
binomial      number of random transmission errors in a transmitted block of n digits
Poisson       traffic model, e.g. number of message arrivals in a given time interval
Note that the above integral cannot be evaluated to get a closed form expression. Hence,
in practice, the error function is evaluated using a table lookup. In a typical computa-
tional software, e.g. MATLAB, there is a command to evaluate the error function.
In digital communications, it is customary to use the Q function, where
Q(x) = ∫_{x}^{∞} (1/√(2π)) e^{−u²/2} du.
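In software, the Q function can be evaluated from the complementary error function via the identity Q(x) = (1/2) erfc(x/√2). A minimal Python sketch:

import numpy as np
from scipy.special import erfc

def qfunc(x):
    # Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))
    return 0.5 * erfc(np.asarray(x) / np.sqrt(2.0))

print(qfunc(0.0))   # 0.5
print(qfunc(1.0))   # about 0.1587
print(qfunc(3.0))   # about 1.35e-3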
Roughly speaking, the CLT states that, as N gets large, the CDF of (SN − X̄)/(σX/√N) approaches that of a zero-mean unit-variance Gaussian RV.
Handout 5
Since the integration in the above definition resembles the inverse Fourier transform,
it follows that ΦX (ν) and fX (x) are a Fourier transform pair. More explicitly, if we
substitute x by f and ν by 2πt, then we can write
ΦX(2πt) = ∫_{−∞}^{∞} e^{j2πft} fX(f) df,
since ΦX (ν) and fX (x) form a Fourier transform pair. The characteristic function can be
used instead of the PDF as a complete statistical description of a random variable. By
using the characteristic functions, we can exploit properties of Fourier transform pairs to
compute several quantities of interest, as indicated below.
Note that the second last equality follows from the independence between X and Y. Since multiplication in the time domain corresponds to convolution in the frequency domain, having ΦZ(ν) = ΦX(ν) ΦY(ν) is equivalent to having fZ(z) = fX(z) ∗ fY(z).
Hence, for independent X1, . . . , XN and Z = Σ_{n=1}^{N} Xn,

ΦZ(ν) = ∏_{n=1}^{N} ΦXn(ν),    fZ(z) = fX1(z) ∗ · · · ∗ fXN(z)
Notice that setting ν = 0 will make the last integral equal to the mean of X, yielding
E[X] = −j (d/dν) ΦX(ν) |_{ν=0}.
The above argument can be extended to obtain the nth moment (as long as the
characteristic function is differentiable up to the nth order), i.e.
E[X^n] = (−j)^n (d^n/dν^n) ΦX(ν) |_{ν=0}
Finally, suppose that ΦX (ν) can be expressed as a Taylor series expansion around
ν = 0. Then, ΦX (ν) can be written in terms of the moments of X as follows.
ΦX(ν) = Σ_{n=0}^{∞} (ν^n/n!) · (d^n/dν^n) ΦX(ν)|_{ν=0} = Σ_{n=0}^{∞} ((jν)^n/n!) · E[X^n]
Recall the following Fourier transform pair for the Gaussian pulse.
A e^{−πt²/τ²} ↔ A τ e^{−πτ²f²}

By setting A = 1 and τ = 1/√(2πσX²), we can write

e^{−σX²(2πt)²/2} e^{jX̄(2πt)} ↔ (1/√(2πσX²)) e^{−(f − X̄)²/(2σX²)}.

Since the right hand side is equal to fX(f), the left hand side is equal to ΦX(2πt). It follows that

ΦX(ν) = e^{jX̄ν − σX²ν²/2}
Finally, note that it is possible to obtain the above expression through direct integration.
However, there will be more computation involved.
Note that the moment generating function is equivalent to the characteristic function
ΦX (ν) when s = jν.
As the name suggests, there is a close relationship between ΨX (s) and the nth moment
of X. In particular,
E[X^n] = (d^n/ds^n) ΨX(s) |_{s=0}
The proof is quite similar to using the characteristic function and is thus omitted.
The mean or first moment is computed below. For the exponential random variable with parameter λ, ΦX(ν) = λ/(λ − jν) = jλ/(ν + jλ), so that

E[X] = −j (d/dν) ΦX(ν) |_{ν=0} = −j · (−jλ)/(ν + jλ)² |_{ν=0} = 1/λ.
ΦY(ν) = Σ_{m=0}^{∞} (−σX²ν²/2)^m / m! = Σ_{m=0}^{∞} (−1)^m σX^{2m} ν^{2m} / (2^m m!) = Σ_{m=0}^{∞} ((jν)^{2m}/(2m)!) · ((2m)! σX^{2m} / (2^m m!))
      = Σ_{n=0, n even}^{∞} ((jν)^n/n!) · (n! σX^n / (2^{n/2} (n/2)!)) = Σ_{n=0, n even}^{∞} ((jν)^n/n!) · ((1 · 2 · 3 · · · n)/(2 · 4 · 6 · · · n)) σX^n
      = Σ_{n=0, n even}^{∞} ((jν)^n/n!) · 1 · 3 · 5 · · · (n − 1) σX^n.

By comparing term by term with the Taylor series expansion mentioned previously, i.e.

ΦY(ν) = Σ_{n=0}^{∞} ((jν)^n/n!) · E[Y^n],

we obtain E[Y^n] = 1 · 3 · 5 · · · (n − 1) σX^n for even n, and E[Y^n] = 0 for odd n.
Handout 6
Pr{|X − X̄| ≥ δ} ≤ σX²/δ²

where δ > 0.
We now provide an alternative derivation of this inequality. Consider the function g(y) defined as follows.

g(y) = 1 for |y| ≥ δ, and g(y) = 0 for |y| < δ

Figure 2.7 illustrates that g(y) ≤ y²/δ², which implies that

E[g(Y)] ≤ E[Y²/δ²].

Since E[g(Y)] = Pr{|Y| ≥ δ} and, for a zero-mean random variable Y, E[Y²] = σY², it follows that

Pr{|Y| ≥ δ} ≤ σY²/δ².
Finally, let Y = X − X̄. Since σX² = σY², we can write the desired expression, i.e.

Pr{|X − X̄| ≥ δ} ≤ σX²/δ².
Figure 2.7: Bound on function g(y) for the Chebyshev inequality.
The Chebyshev bound is found to be “loose” for a large number of practical appli-
cations. One reason is the looseness of the function y 2 /δ 2 as an upper bound on the
function g(y).
2.7.2 Chernoff Bound
Tighter upper bounds can often be obtained using the Chernoff bound, which is derived
as follows. First, define the function g(x) as
g(x) = 1 for x ≥ δ, and g(x) = 0 for x < δ

Figure 2.8 illustrates that g(x) ≤ e^{s(x−δ)} for any s > 0, which implies that

E[g(X)] ≤ E[e^{s(X−δ)}].

It follows that

Pr{X ≥ δ} ≤ e^{−sδ} E[e^{sX}], s > 0
Figure 2.8: Bound on function g(x) for the Chernoff bound.
The above expression gives an upper bound on the “upper tail” of the PDF. The
tightest bound can be obtained by minimizing the upper bound expression with respect
to s, i.e. solving for s from
0 = (d/ds) E[e^{s(X−δ)}] = E[(X − δ) e^{s(X−δ)}] = e^{−sδ} (E[X e^{sX}] − δ E[e^{sX}]).

Thus, the tightest bound is obtained by setting s = s*, where

E[X e^{s*X}] = δ E[e^{s*X}], s* > 0

An upper bound on the “lower tail” of the PDF can be derived similarly, yielding

Pr{X ≤ δ} ≤ e^{−sδ} E[e^{sX}], s < 0
Another Look at the Chernoff Bound
Recall that the Chebyshev bound can be derived from the Markov inequality, i.e. Pr{X ≥ a} ≤ X̄/a for a nonnegative random variable X. Similarly, we can derive the Chernoff bound from the Markov inequality, as stated formally below.
Proof: Take esX as a random variable in the Markov inequality. In addition, view the
event X ≥ δ as being equivalent to esX ≥ esδ for s > 0. Finally, view the event X ≤ δ as
being equivalent to esX ≥ esδ for s < 0. ¤
As an example, consider a random variable Y with the Laplacian PDF fY(y) = (1/2) e^{−|y|}, for which E[Y] = 0 and var[Y] = 2. The exact tail probability is

Pr{Y ≥ δ} = (1/2) e^{−δ}    (exact)

for any δ > 0. The Chebyshev bound is

Pr{|Y| ≥ δ} ≤ 2/δ².

Since fY(y) is even, we can write

Pr{Y ≥ δ} ≤ 1/δ²    (Chebyshev)
The Chernoff bound is given by

Pr{Y ≥ δ} ≤ e^{−sδ} E[e^{sY}] = e^{−sδ}/(1 − s²).

The bound can be optimized by setting s = (−1 + √(1 + δ²))/δ, yielding

Pr{Y ≥ δ} ≤ (δ²/(2(−1 + √(1 + δ²)))) e^{1 − √(1 + δ²)} ≈ (δ/2) e^{−δ}    (Chernoff)

for δ ≫ 1. Thus, the Chernoff bound (with exponential decrease) is much tighter than the Chebyshev bound (with polynomial decrease) for large δ. ¤
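These three expressions can be tabulated numerically, for example in Python (a minimal sketch; the values of δ are arbitrary):

import numpy as np

for delta in (2.0, 5.0, 10.0):
    exact = 0.5 * np.exp(-delta)
    chebyshev = 2.0 / delta**2                       # uses var[Y] = 2 for the Laplacian above
    s = (-1.0 + np.sqrt(1.0 + delta**2)) / delta     # optimizing value of s
    chernoff = np.exp(-s * delta) / (1.0 - s**2)
    print(f"delta={delta:4.1f}  exact={exact:.2e}  Chebyshev={chebyshev:.2e}  Chernoff={chernoff:.2e}")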
2.7.3 Tail Probabilities for a Sum of IID Random Variables
Let X1, X2, . . . be IID random variables with finite mean X̄ and finite variance σX². Define the sample mean

SN = (1/N) Σ_{n=1}^{N} Xn.

Note that the mean of SN is

E[SN] = E[(1/N) Σ_{n=1}^{N} Xn] = (1/N) Σ_{n=1}^{N} E[Xn] = (1/N) · N X̄ = X̄.
The weak law of large numbers states that, for any ε > 0, lim_{N→∞} Pr{|SN − X̄| ≥ ε} = 0.

Proof: Take SN as a random variable in the Chebyshev inequality and consider the limit as N → ∞. ¤
As discussed above, the weak law of large numbers results from applying the Cheby-
shev inequality to the sample mean SN . Let us now consider applying the Chernoff bound
to SN . We start by writing, for any s > 0,
Pr{SN ≥ δ} = Pr{N SN ≥ Nδ} ≤ e^{−sNδ} E[e^{sN SN}] = e^{−sNδ} E[e^{sX1} · · · e^{sXN}].

The bound is minimized by choosing s = s* such that the derivative of the bound is equal to zero, i.e.

E[X1 e^{s*X1}] = δ E[e^{s*X1}], s* > 0
Example 2.10 : Consider IID random variables X1, X2, . . . with

Xn = 1 with probability p, and Xn = −1 with probability 1 − p,

where we assume that p < 1/2. We shall use the Chernoff bound to show that

Pr{Σ_{n=1}^{N} Xn ≥ 0} ≤ (4p(1 − p))^{N/2}.
n=1
PN
First, note that having n=1 Xn ≥ 0 is equivalent to having SN ≥ 0. Hence,
( N )
X ¡ £ ¤¢N
Pr Xn ≥ 0 = Pr {Sn ≥ 0} ≤ E esX1 , s > 0.
n=1
£ ¤
From the given PDF, E esX1 = pes + (1 − p)e−s , yielding
( N )
X ¡ ¢N
Pr Xn ≥ 0 ≤ pes + (1 − p)e−s , s > 0.
n=1
q
1−p
The bound can be minimized by setting es = p
, yielding the desired expression. ¤
where RN (ν) is the remainder term that goes to 0 as N → ∞.
It follows that

ln ΦW(ν) = N ln(1 − ν²/(2N) + RN(ν)).

We now use the fact that ln(1 + x) ≈ x for small x to write

lim_{N→∞} ln ΦW(ν) = −ν²/2,

or equivalently

lim_{N→∞} ΦW(ν) = e^{−ν²/2},
which is the characteristic function of a zero-mean unit-variance Gaussian random vari-
able. Thus, in the limit as N → ∞, W becomes a zero-mean unit-variance Gaussian
random variable.
In general, the PDF of W may not approach the Gaussian PDF. However, the CDF
of W will approach the Gaussian CDF, as stated previously in the central limit theorem.
Handout 7
fY(y) = (1/(2√y)) [fX(−√y) + fX(√y)], y ≥ 0.

Substituting fX(x) = (1/√(2πσ²)) e^{−x²/(2σ²)} and using the even property of fX(x),

fY(y) = (1/√(2πyσ²)) e^{−y/(2σ²)}, y ≥ 0
With the above PDF, Y is called a chi-square random variable with one degree of freedom.
Its characteristic function is written below.
ΦY(ν) = ∫_{0}^{∞} e^{jνy} (1/√(2πyσ²)) e^{−y/(2σ²)} dy
      = ∫_{0}^{∞} (1/√(2πσ²)) e^{−(1 − j2σ²ν)y/(2σ²)} dy/√y.

By substituting u = ((1 − j2σ²ν)y)^{1/2}, we can write

ΦY(ν) = (2/(1 − j2σ²ν)^{1/2}) ∫_{0}^{∞} (1/√(2πσ²)) e^{−u²/(2σ²)} du = 1/(1 − j2σ²ν)^{1/2},

where the last equality follows from the fact that the integral is equal to half the area under a zero-mean Gaussian PDF curve (with variance σ²).
Consider now N IID zero-mean Gaussian random variables X1, . . . , XN with variance σ². Let Z = Σ_{n=1}^{N} Xn². Then, Z is a chi-square random variable with N degrees of freedom. We find the PDF of Z by writing its characteristic function as follows. For convenience, let Yn = Xn². Note that each Yn is a chi-square random variable with one degree of freedom. Since Z is a sum of IID random variables Y1, . . . , YN, we can write ΦZ(ν) = (ΦY1(ν))^N, yielding

ΦZ(ν) = 1/(1 − j2σ²ν)^{N/2}
The corresponding PDF, obtained by inverting ΦZ(ν), is

fZ(z) = z^{N/2−1} e^{−z/(2σ²)} / ((2σ²)^{N/2} Γ(N/2)), z ≥ 0,

where Γ(p) is the Gamma function defined as

Γ(p) = ∫_{0}^{∞} x^{p−1} e^{−x} dx, p > 0
Below are some key properties of the Gamma function. Their proofs are left as exercises.
1. Γ(1) = 1
2. Γ(p) = (p − 1)Γ(p − 1)
3. Γ(n) = (n − 1)!, n = 1, 2, . . .
4. Γ(1/2) = √π
Finally, it should be noted that, for N = 2, a chi-square random variable with two
degrees of freedom is equivalent to an exponential random variable. You should be able
to verify that
E[Z] = N σ 2 , var[Z] = 2N σ 4 .
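A quick simulation check of these two moments (a minimal sketch; N, σ, and the number of trials are arbitrary choices):

import numpy as np

N, sigma, trials = 5, 2.0, 500_000
rng = np.random.default_rng(5)
x = sigma * rng.standard_normal((trials, N))   # N IID zero-mean Gaussians with variance sigma^2
z = np.sum(x**2, axis=1)                       # chi-square with N degrees of freedom (scaled by sigma^2)

print("E[Z]  :", z.mean(), "  theory:", N * sigma**2)
print("var[Z]:", z.var(),  "  theory:", 2 * N * sigma**4)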
Rayleigh PDF
2
Let Xp1 and X2 be two IID zero-mean Gaussian random variables with variance σ . Define
R = X12 + X22 . Then, R is a Rayleigh random variable. The PDF of R is derived as
follows. We first define Y = X12 + X22 . It follows that Y has the exponential PDF
1 − y2
fY (y) = e 2σ , y ≥ 0.
2σ 2
Since R = √Y, we can write

fR(r) = fY(r²) · |d(r²)/dr| = fY(r²) · 2r,

yielding

fR(r) = (r/σ²) e^{−r²/(2σ²)}, r ≥ 0
The mean and variance of a Rayleigh random variable are given by

E[R] = σ √(π/2),    var[R] = ((4 − π)/2) σ².
Bernoulli Distribution
A Bernoulli random variable X has the following probabilities:

X = 0 with probability 1 − p, and X = 1 with probability p.

The event that X = 1 is often referred to as a “success”. You should be able to verify that

E[X] = p, var[X] = p(1 − p), ΦX(ν) = 1 − p + p e^{jν}.
Binomial Distribution
Let X1, . . . , XN be IID Bernoulli random variables with parameter p. Then, Y = Σ_{n=1}^{N} Xn is a binomial random variable whose probabilities are given by

Pr{Y = k} = C(N, k) p^k (1 − p)^{N−k}, k = 0, 1, . . . , N,

where C(N, k) = N!/(k!(N − k)!). The value Pr{Y = k} gives the probability that k out of N events are “successful”, where each event is successful with probability p. You should be able to verify that

E[Y] = Np, var[Y] = Np(1 − p), ΦY(ν) = (1 − p + p e^{jν})^N.
Geometric Distribution
Consider an experiment in which each independent trial is successful with probability p. Let X denote the number of trials required until the first success, i.e. the first X − 1 trials fail. Then, X is a geometric random variable with the following probabilities:

Pr{X = k} = (1 − p)^{k−1} p, k = 1, 2, . . .

E[X] = 1/p, var[X] = (1 − p)/p², ΦX(ν) = p e^{jν}/(1 − (1 − p) e^{jν}).

Alternatively, X can be defined as the number of failures before the first success. In this case, the probabilities of X are

Pr{X = k} = (1 − p)^k p, k = 0, 1, . . .
Poisson Distribution
A Poisson random variable X with parameter λ has the following probabilities:

Pr{X = k} = e^{−λ} λ^k / k!, k = 0, 1, . . .

A Poisson random variable represents the number of arrivals in one time unit for an arrival process in which interarrival times are independent exponential random variables. You should be able to verify that

E[X] = λ, var[X] = λ, ΦX(ν) = exp(λ(e^{jν} − 1)).
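The arrival-process interpretation can be illustrated by counting how many independent exponential interarrival times fit into one time unit (a minimal sketch; λ and the number of trials are arbitrary choices):

import numpy as np

lam, trials = 3.0, 100_000
rng = np.random.default_rng(6)

counts = np.empty(trials, dtype=int)
for i in range(trials):
    t, k = 0.0, 0
    while True:
        t += rng.exponential(1.0 / lam)   # interarrival time with mean 1/lambda
        if t > 1.0:
            break
        k += 1
    counts[i] = k                          # number of arrivals in one time unit

print("E[X]   :", counts.mean(), "  theory:", lam)
print("var[X] :", counts.var(),  "  theory:", lam)
print("Pr{X=0}:", np.mean(counts == 0), "  theory:", np.exp(-lam))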
Handout 8
2 Random Processes
2.1 Definition of Random Processes
Recall that a random variable is a mapping from the sample space S to the set of real
numbers R. In comparison, a stochastic process or random process is a mapping from
the sample space S to the set of real-valued functions called sample functions. Figure 2.1
illustrates the mapping for a random process.
Figure 2.1: Mapping from sample points in the sample space to sample functions.
The autocorrelation function of X(t) is defined as

RX(t1, t2) = E[X(t1) X*(t2)].
Similarly, the cross-correlation function of two random processes X(t) and Y(t) is defined as

RXY(t1, t2) = E[X(t1) Y*(t2)]

The cross-covariance function of X(t) and Y(t) is defined as

CXY(t1, t2) = E[(X(t1) − X̄(t1))(Y(t2) − Ȳ(t2))*] = RXY(t1, t2) − X̄(t1) Ȳ*(t2)
By analogy with random variables, random processes X(t) and Y(t) are uncorrelated if

CXY(t1, t2) = 0 for all t1, t2 ∈ R,

and statistically independent if the joint CDF satisfies

FX(t1),...,X(tm),Y(t'1),...,Y(t'n)(x1, . . . , xm, y1, . . . , yn) = FX(t1),...,X(tm)(x1, . . . , xm) · FY(t'1),...,Y(t'n)(y1, . . . , yn)
Time Averages
The mean X̄(t) as defined above is also referred to as the ensemble average. The time average of sample function x(t) is denoted and defined as follows.

⟨x(t)⟩ = lim_{T→∞} (1/T) ∫_{−T/2}^{T/2} x(t) dt
2.3 Stationary, Ergodic, and Cyclostationary Processes
A random process is strict sense stationary (SSS) if, for all values of n ∈ Z+ and t1, . . . , tn, τ ∈ R, the joint CDF satisfies

FX(t1+τ),...,X(tn+τ)(x1, . . . , xn) = FX(t1),...,X(tn)(x1, . . . , xn)

for all x1, . . . , xn ∈ C. Roughly speaking, the statistics of the random process look the same at all times.
For the purpose of analyzing communication systems, it is usually sufficient to assume a stationary condition that is weaker than SSS. In particular, a random process X(t) is wide-sense stationary (WSS) if, for all t1, t2 ∈ R,

X̄(t1) = X̄(t2)  and  RX(t1, t2) depends only on the difference t1 − t2.

Roughly speaking, for a WSS random process, the first and second order statistics look the same at all times. Note that an SSS random process is always WSS, but the converse is not always true.
Since the autocorrelation function RX (t1 , t2 ) of a WSS random process only depends
on the time difference t1 −t2 , we usually write RX (t1 , t2 ) as a function with one argument,
i.e. RX (t1 − t2 ). Similarly, for a WSS process, we can write the autocovariance function
CX (t1 , t2 ) as CX (t1 − t2 ).
A random process is ergodic if all statistical properties that are ensemble averages
are equal to the corresponding time averages. An ergodic process must be SSS, but
ergodicity is a stronger condition than the SSS condition, i.e. some SSS process is not
ergodic. Since all statistical properties of an ergodic process can be determined from a
single sample function, each sample function of an ergodic process is representative of the
entire process.
A randomly phased sinusoid and a stationary Gaussian process are examples of ergodic processes. However, a test of ergodicity for an arbitrary random process is quite difficult in general and is beyond the scope of this course. For analysis, we shall assume that the
random process of interest is ergodic, unless explicitly stated otherwise.
Example 2.1 (Randomly phased sinusoid): Consider the random process X(t) defined as

X(t) = A cos(2πf0 t + Φ),

where A, f0 > 0 are constants and Φ is a random variable uniformly distributed in the interval [0, 2π]. The mean of X(t) is computed as

X̄(t) = E[A cos(2πf0 t + Φ)] = (1/(2π)) ∫_{0}^{2π} A cos(2πf0 t + ϕ) dϕ = 0,

where the last equality follows from the fact that the integral is taken over one period of the cosine function and is hence zero.
The autocovariance function CX(t1, t2) is computed as

CX(t1, t2) = RX(t1, t2) − X̄(t1) X̄(t2) = RX(t1, t2) = (A²/2) cos(2πf0 (t1 − t2)),

where the last equality follows from the previous example. Since X̄(t) = 0 and CX(t1, t2) depends only on t1 − t2, X(t) is WSS. ¤
Example 2.3 : Consider the random process defined as
X(t) = 6 e^{Φt},

where Φ is a random variable uniformly distributed in [0, 2]. The ensemble average is

X̄(t) = ∫_{−∞}^{∞} 6 e^{ϕt} fΦ(ϕ) dϕ = ∫_{0}^{2} 6 e^{ϕt} · (1/2) dϕ = (3/t) e^{ϕt}|_{0}^{2} = (3/t)(e^{2t} − 1).

Since X̄(t) depends on time t, X(t) is not WSS. The autocorrelation function RX(t1, t2) is computed as

RX(t1, t2) = E[X(t1) X*(t2)] = E[6 e^{Φt1} · 6 e^{Φt2}]
           = 36 ∫_{0}^{2} e^{ϕ(t1+t2)} · (1/2) dϕ = (18/(t1 + t2)) e^{ϕ(t1+t2)}|_{0}^{2}
           = (18/(t1 + t2)) (e^{2(t1+t2)} − 1). ¤
Handout 9
Example 2.4 : The first two statements are proven below.

1. RXY(−τ) = R*YX(τ)

2. |RXY(τ)| ≤ √(RX(0) RY(0))

3. |RXY(τ)| ≤ (1/2)(RX(0) + RY(0))
The third statement is somewhat more difficult to show. To do so, we can justify the following statement (using the same argument as for the derivation of the Schwarz inequality):

E[U(t) V*(t)] ≤ √(E[|U(t)|²] E[|V(t)|²]),

and use the above inequality to establish the third statement by setting U(t) = X(t) and V(t) = X(t − τ).
2.4 Gaussian Processes
A random process X(t) is a zero-mean Gaussian process if, for all N ∈ Z+ and t1 , . . . , tN ∈
R, (X(t1 ), . . . , X(tN )) is a zero-mean jointly Gaussian random vector. In addition, we
say that X(t) is a Gaussian process if it is the sum of a zero-mean Gaussian process and
some deterministic function µ(t). Note that X̄(t) = µ(t).
Some important properties of Gaussian process X(t) are listed below. The proofs are
beyond the scope of this course and are omitted.
1. If we pass X(t) through an LTI filter with impulse response h(t), the output X(t) ∗
h(t) is a Gaussian process.
2. The statistics of X(t) is fully determined by the mean X(t) and the covariance
function CX (t1 , t2 ).
3. We refer to a quantity of the form ∫_{−∞}^{∞} X(t) u(t) dt as an observable or linear functional of X(t). Any set of linear functionals of X(t) are jointly Gaussian.
RX (τ ) ↔ GX (f )
For τ = 0, we have, through the inverse Fourier transform, the average power of the
signal equal to
E[|X(t)|²] = RX(0) = ∫_{−∞}^{∞} GX(f) df
As an illustration, consider again the randomly phased sinusoid X(t) = A cos(2πf0 t + Φ), where A and f0 are positive constants and Φ is uniformly distributed in [0, 2π]. Recall that RX(τ) = (A²/2) cos(2πf0 τ). It follows that

GX(f) = F{RX(τ)} = (A²/4) δ(f − f0) + (A²/4) δ(f + f0).
In addition, as another illustration of the ergodicity of X(t), consider computing the time-average autocorrelation for an arbitrary sample function with Φ = ϕ as follows.

⟨X(t) X*(t − τ)⟩ = lim_{T→∞} (1/T) ∫_{−T/2}^{T/2} A cos(2πf0 t + ϕ) A cos(2πf0 (t − τ) + ϕ) dt
                = lim_{T→∞} (A²/(2T)) ∫_{−T/2}^{T/2} (cos(2πf0 τ) + cos(4πf0 t − 2πf0 τ + 2ϕ)) dt
                = (A²/2) cos(2πf0 τ) + lim_{T→∞} (A²/(2T)) ∫_{−T/2}^{T/2} cos(4πf0 t − 2πf0 τ + 2ϕ) dt
                = (A²/2) cos(2πf0 τ),

since the remaining integral stays bounded as T → ∞. Note that the time average is equal to the ensemble average. ¤
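The same conclusion can be checked numerically for a single sample function (a minimal sketch; A, f0, the fixed phase, the sampling rate, and the finite averaging window used to approximate the limit are all arbitrary choices):

import numpy as np

A, f0, phi = 2.0, 5.0, 1.234
T, fs = 2000.0, 200.0                        # averaging window (s) and sampling rate (Hz)
t = np.arange(0.0, T, 1.0 / fs)
x = A * np.cos(2 * np.pi * f0 * t + phi)     # one sample function with a fixed phase

for tau in (0.0, 0.05, 0.1):
    lag = int(round(tau * fs))
    time_avg = np.mean(x[lag:] * x[:len(x) - lag])        # <x(t) x(t - tau)>
    ensemble = (A**2 / 2) * np.cos(2 * np.pi * f0 * tau)  # RX(tau)
    print(f"tau={tau:.2f}  time average={time_avg:+.4f}  ensemble={ensemble:+.4f}")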
where we have applied the WSS properties of X(t) and Y (t) to write E[X(t)] = E[X(0)]
and E[Y (t)] = E[Y (0)]. The autocorrelation function of Z(t) is computed below.
Note that RZ (t1 , t2 ) depends only on t1 − t2 . It follows that Z(t) is also WSS. Conse-
quently, we can write
RZ (τ ) = RX (τ ) + RY (τ ) + RXY (τ ) + RY X (τ )
GZ (f ) = GX (f ) + GY (f ) + GXY (f ) + GY X (f )
where we define the cross PSDs as GXY(f) = F{RXY(τ)} and GYX(f) = F{RYX(τ)}.
If X(t) and Y (t) have zero mean and are uncorrelated, then RXY (τ ) = RY X (τ ) = 0
for all τ , yielding
RZ (τ ) = RX (τ ) + RY (τ ).
In terms of the PSD,
GZ (f ) = GX (f ) + GY (f ).
Thus, for zero-mean uncorrelated jointly WSS random signals, superposition holds for the
autocorrelation function as well as for the PSD.
Next, consider the product Z(t) = X(t) Y(t) of two independent jointly WSS random signals. Then

RZ(τ) = RX(τ) RY(τ) ↔ GZ(f) = GX(f) ∗ GY(f)
Proof: Using the independence between X(t) and Y(t), the mean of Z(t) is written as Z̄(t) = E[X(t)] E[Y(t)] = X̄ Ȳ. Using the independence between X(t) and Y(t) and their WSS properties, the autocorrelation function is written as RZ(t1, t2) = E[X(t1)Y(t1) X*(t2)Y*(t2)] = RX(t1, t2) RY(t1, t2).
Note that RZ (t1 , t2 ) depends only on t1 − t2 . It follows that Z(t) is also WSS. Conse-
quently, we can write RZ (τ ) = RX (τ )RY (τ ).
Since multiplication in the time domain corresponds to convolution in the frequency
domain, we can write GZ (f ) = GX (f ) ∗ GY (f ). ¤
Example 2.6 (Modulated random signal): Consider the modulated random signal

Y(t) = X(t) cos(2πf0 t + Φ),

where X(t) is a WSS random signal while the random phase Φ is uniformly distributed in [0, 2π] and is independent of X(t).
Recall that U(t) = cos(2πf0 t + Φ) is WSS with the autocorrelation function RU(τ) = (1/2) cos(2πf0 τ). It follows that Y(t) is WSS with the following autocorrelation function.

RY(τ) = RX(τ) · (1/2) cos(2πf0 τ) = (1/2) RX(τ) cos(2πf0 τ)
In the frequency domain,

GY(f) = GX(f) ∗ [(1/4) δ(f − f0) + (1/4) δ(f + f0)] = (1/4) GX(f − f0) + (1/4) GX(f + f0). ¤
Handout 10
Properties of Y (t)
1. Mean, autocorrelation, and PSD: The mean of Y(t) is computed below.

E[Y(t)] = E[∫_{−∞}^{∞} h(τ) X(t − τ) dτ] = ∫_{−∞}^{∞} h(τ) E[X(t − τ)] dτ = ∫_{−∞}^{∞} h(τ) X̄(t − τ) dτ
Assuming that X(t) is WSS, the autocorrelation function of Y (t) is computed below.
RY(τ) = E[(∫_{−∞}^{∞} h(η) X(τ − η) dη)(∫_{−∞}^{∞} h*(−ξ) X*(ξ) dξ)]
      = E[∫_{−∞}^{∞} ∫_{−∞}^{∞} h(η) h*(−ξ) X(τ − η) X*(ξ) dξ dη]
      = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(η) h*(−ξ) E[X(τ − η) X*(ξ)] dξ dη
      = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(η) h*(−ξ) RX(τ − η − ξ) dξ dη
      = ∫_{−∞}^{∞} h(η) (∫_{−∞}^{∞} h*(−ξ) RX(τ − η − ξ) dξ) dη    [inner integral = z(τ − η), where z(τ) = h*(−τ) ∗ RX(τ)]
      = ∫_{−∞}^{∞} h(η) z(τ − η) dη
      = h(τ) ∗ z(τ) = h(τ) ∗ h*(−τ) ∗ RX(τ)
For an ergodic process, RY(0) yields the average power of a filtered random signal, i.e.

P = RY(0) = ∫_{−∞}^{∞} |H(f)|² GX(f) df.

RY(τ) = h(τ) ∗ h*(−τ) ∗ RX(τ)

GY(f) = |H(f)|² GX(f)

P = RY(0) = ∫_{−∞}^{∞} |H(f)|² GX(f) df
2. Stationarity: If the input X(t) is WSS, then the output Y (t) is also WSS. In
addition, if X(t) is SSS, so is Y (t).
3. PDF: In general, it is difficult to determine the PDF of the output, even when the
PDF of the input signal is completely specified.
However, when the input is a Gaussian process, the output is also a Gaussian pro-
cess. The statistics of the output process is fully determined by the mean function
and the autocovariance function.
Example 2.7 : Consider the LTI system whose input x(t) and output y(t) are related
by
y(t) = x(t) + ax(t − T ).
The corresponding impulse response is h(t) = δ(t) + a δ(t − T), with frequency response

H(f) = 1 + a e^{−j2πfT}.
† Power Spectrum Estimation
One problem that is often encountered in practice is to estimate the PSD of a random
signal x(t) when only a segment of length T of a single sample function is available.
Let us consider a single sample function of an ergodic random process x(t). Its
truncated version is given as
xT(t) = x(t) for |t| ≤ T/2, and xT(t) = 0 otherwise.
Since xT(t) is strictly time-limited, its Fourier transform XT(f) exists. An alternative definition of the PSD of X(t) is stated as

GX(f) = lim_{T→∞} (1/T) E[|XT(f)|²].

A “natural” estimate of the PSD can be found by simply omitting the limiting and expectation operations to obtain

ĜX(f) = (1/T) |XT(f)|².
This spectral estimate is called a periodogram. In practice, spectral estimation based on
a periodogram consists of the following steps.
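As a rough illustration, a periodogram-based estimate might be computed as follows (a minimal sketch, not a prescribed procedure; the test signal, segment length, and averaging over segments are all assumptions made only for this example):

import numpy as np

fs, T_seg, n_seg = 1000.0, 1.0, 200          # sampling rate, segment length, number of segments
N = int(T_seg * fs)
rng = np.random.default_rng(7)

N0 = 1.0                                     # test signal: white noise with PSD N0/2 = 0.5
psd_sum = np.zeros(N)
for _ in range(n_seg):
    x = rng.standard_normal(N) * np.sqrt(fs * N0 / 2)   # samples of band-limited white noise
    X = np.fft.fft(x) / fs                               # approximate Fourier transform of x_T(t)
    psd_sum += np.abs(X) ** 2 / T_seg                    # periodogram (1/T)|X_T(f)|^2

psd_avg = psd_sum / n_seg                    # averaging over segments reduces the variance
print("estimated PSD near f = 0:", psd_avg[:3])          # should be close to N0/2 = 0.5
print("theoretical white level :", N0 / 2)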
Thermal Noise
• It is due to random motion of electrons in any conductor.
• It has a Gaussian PDF according to the central limit theorem. Note that the number
of electrons involved is quite large, with their motions statistically independent from
one another.
• The noise voltage (in V) across the terminals of a resistor with resistance R (in Ohm) has a Gaussian PDF with the mean and variance, denoted by µ and σ², given by

µ = 0,    σ² = (2(πkT)²/(3h)) R,

where k is Boltzmann's constant ≈ 1.38 × 10⁻²³ J/K, h is Planck's constant ≈ 6.63 × 10⁻³⁴ J·s, and T is the absolute temperature in K.
The noise PSD (in V²/Hz) is

GN(f) = 2Rh|f| / (e^{h|f|/kT} − 1).

For |f| ≪ kT/h, e^{h|f|/kT} ≈ 1 + h|f|/kT, yielding

GN(f) ≈ 2kTR

For T = 273–373 K (0–100 degrees Celsius), kT/h ≈ 10¹² Hz. Thus, for all practical purposes, the PSD of thermal noise is constant.
Shot Noise
• It is associated with the discrete flow of charge carriers across semiconductor junc-
tions or with the emission of electrons from a cathode.
• Shot noise has a Gaussian PDF with zero mean according to the central limit
theorem.
• Shot noise has a constant power spectrum, with the noise level being independent
of the temperature.
White Noise
Several types of noise sources have constant PSDs over a wide range of frequencies. Such
a noise source is called white noise by the analogy to white light which contains all the
frequencies of visible light.
In general, we write the PSD of white noise as

GN(f) = N0/2,

where the factor 1/2 is included to indicate that half of the power is associated with positive frequencies while the other half is associated with negative frequencies, so that the power passed by an ideal bandpass filter with bandwidth B is given by N0 B. The corresponding autocorrelation function is

RN(τ) = (N0/2) δ(τ).
NOTE: White noise is not necessarily Gaussian noise. Conversely, Gaussian noise is not
necessarily white noise.
Consider now a sample of a zero-mean white noise process N(t). The variance of the sample is

E[|N(t)|²] = RN(0) = ∞.
Therefore, white noise has infinite power.
Filtered White Noise
Consider now filtered white noise corresponding to the ideal band-limited filter, i.e.

GN(f) = N0/2 for |f| ≤ B, and GN(f) = 0 otherwise,

with the corresponding autocorrelation function

RN(τ) = F⁻¹{GN(f)} = N0 B sinc(2Bτ).

It follows that a sample of band-limited zero-mean white Gaussian noise is a zero-mean Gaussian random variable with the variance

E[|N(t)|²] = RN(0) = N0 B.

More generally, white noise passed through a filter with frequency response H(f) has the output PSD

GN(f) = (N0/2) |H(f)|²,

and is referred to as colored noise, which is again due to the analogy to colored light containing only some frequencies of visible light.
The average output noise power of an arbitrary LPF with frequency response H(f) is then RN(0) = (N0/2) ∫_{−∞}^{∞} |H(f)|² df. On the other hand, the average output power of an ideal LPF with the same DC gain |H(0)| and bandwidth B is given by

RN(0) = N0 B |H(0)|².

By equating these two noise powers, we can define the noise equivalent bandwidth of an arbitrary LPF as

BN = ∫_{−∞}^{∞} |H(f)|² df / (2|H(0)|²)
Thus, the noise equivalent bandwidth of an arbitrary LPF is defined as the bandwidth
of the ideal LPF that produces the same output power from identical white noise input.
The definition can also be extended to bandpass filters in the same fashion.
Example 2.8 : Consider an LPF based on the RC circuit with the frequency response

H(f) = 1/(1 + jf/f0),

where f0 = 1/(2πRC). Since H(0) = 1,

BN = (1/2) ∫_{−∞}^{∞} |H(f)|² df = (1/2) ∫_{−∞}^{∞} df/(1 + f²/f0²) = ∫_{0}^{∞} df/(1 + f²/f0²).

Setting z = f/f0 yields

BN = f0 ∫_{0}^{∞} dz/(1 + z²) = f0 · arctan z|_{0}^{∞} = (π/2) f0 = 1/(4RC).

The corresponding noise power is

RN(0) = N0 BN = 4kTR/(4RC) = kT/C. ¤
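The closed-form result can be confirmed by numerical integration of |H(f)|² (a minimal sketch; the R and C values are arbitrary illustrative choices):

import numpy as np
from scipy.integrate import quad

R, C = 1.0e3, 1.0e-6                       # 1 kOhm, 1 uF (illustrative values)
f0 = 1.0 / (2 * np.pi * R * C)

def H2(f):
    # |H(f)|^2 for the first-order RC lowpass filter
    return 1.0 / (1.0 + (f / f0) ** 2)

half_integral, _ = quad(H2, 0.0, np.inf)   # integral of |H(f)|^2 over f >= 0
BN = 2 * half_integral / (2 * H2(0.0))     # BN = int |H|^2 df / (2 |H(0)|^2)

print("numerical BN     :", BN)
print("formula 1/(4RC)  :", 1.0 / (4 * R * C))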
Handout 11
4. The channel adds zero-mean white Gaussian noise N (t) to the transmitted signal.
In addition, this noise is uncorrelated with the transmitted signal.
The first three assumptions indicate that the channel is distortionless over the message bandwidth W. The response Y(t) of an AWGN channel to a transmitted signal X(t) is given by
given by
Y (t) = aX(t − td ) + N (t).
If the transmitted signal X(t) has average power SX and message bandwidth W and
the receiver includes an ideal lowpass filter with bandwidth of exactly W , the power of
the channel output is given by
E[|Y(t)|²] = E[|aX(t − td)|²] + E[|N(t)|²] = a² SX + N0 W.

The resultant signal-to-noise ratio (SNR) is

SNR = a² SX / (N0 W).
Matched Filter
Consider the problem of detecting whether a pulse of a known shape p(t) has been transmitted or not. Thus, the output of the AWGN channel is given either by

Y(t) = a p(t − td) + N(t)

or by

Y(t) = N(t).

Without loss of generality, assume that a = 1 and td = 0 in what follows. Assume that the receiver structure in figure 3.1 is used.
Figure 3.1: Receiver structure for pulse detection.
In addition, we base our decision about the presence or the absence of p(t) on the
output Ỹ (t) of the receiver filter h(t) sampled at time instant t = t0 . More specifically, if
the pulse is present,
Ỹ(t0) = ∫_{−∞}^{∞} h(t0 − τ) Y(τ) dτ
      = ∫_{−∞}^{∞} h(t0 − τ) p(τ) dτ + ∫_{−∞}^{∞} h(t0 − τ) N(τ) dτ
      = p̃(t0) + Ñ(t0),
where p̃(t) and Ñ (t) are the filtered pulse and the filtered noise respectively.
The key question here is as follows: What is the optimal impulse response of the
receiver filter? Intuitively, the optimal filter (in terms of minimizing the decision error
probability) should maximize the SNR at t = t0 . This SNR can be written as
SNR = |p̃(t0)|² / E[|Ñ(t0)|²] = |∫_{−∞}^{∞} H(f) P(f) e^{j2πf t0} df|² / ∫_{−∞}^{∞} |H(f)|² GN(f) df.

By the Schwarz inequality, the SNR is maximized by choosing

H(f) = K (P*(f)/GN(f)) e^{−j2πf t0},

where K is an arbitrary constant. Note that the optimal filter amplifies frequency components of the signal and attenuates frequency components of the noise.

In the case of white noise with GN(f) = N0/2, we can write

H(f) = K (P*(f)/(N0/2)) e^{−j2πf t0}.
Thus, the optimal impulse response is determined by the pulse shape. In particular, the optimal impulse response, h(t) = (2K/N0) p*(t0 − t), is matched to the pulse shape. For this reason, this optimal filter is called a matched filter.

Assume that the pulse p(t) is nonzero only in the interval [0, T]. Substituting the expression of h(t) into the expression for Ỹ(t0) yields

Ỹ(t0) = ∫_{−∞}^{∞} h(t0 − τ) Y(τ) dτ = (2K/N0) ∫_{0}^{T} p*(τ) Y(τ) dτ.

Note that Ỹ(t0) is the correlation between the transmitted pulse p(t) and the received signal Y(t). The result indicates that we can implement this optimal filtering as a correlation receiver, as illustrated in figure 3.2.
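A discrete-time sketch of the correlation receiver is given below, assuming a rectangular pulse, white Gaussian noise, and an arbitrary sampling step (all illustrative choices); it checks that the output SNR approaches the well-known matched-filter value 2E/N0:

import numpy as np

rng = np.random.default_rng(8)
dt, T, N0 = 1e-3, 0.1, 0.02
t = np.arange(0.0, T, dt)
p = np.ones_like(t)                              # assumed pulse shape (rectangular)
E = np.sum(np.abs(p) ** 2) * dt                  # pulse energy

def correlate_once(pulse_present):
    noise = rng.standard_normal(len(t)) * np.sqrt(N0 / (2 * dt))   # discrete-time white noise
    y = p + noise if pulse_present else noise
    return np.sum(np.conj(p) * y) * dt           # correlation receiver output (up to a constant)

outputs1 = np.array([correlate_once(True) for _ in range(20_000)])
outputs0 = np.array([correlate_once(False) for _ in range(20_000)])

snr = (outputs1.mean() - outputs0.mean()) ** 2 / outputs0.var()
print("simulated SNR:", snr, "  theory 2E/N0:", 2 * E / N0)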
Handout 12
In binary detection, the received signal is given either by Y(t) = p1(t) + N(t) or by Y(t) = p2(t) + N(t), where the pulses p1(t) and p2(t) are nonzero only in [0, T], have equal energy E = ∫_{0}^{T} |p1(t)|² dt = ∫_{0}^{T} |p2(t)|² dt, and have correlation coefficient ρ = (1/E) ∫_{0}^{T} p1(t) p2*(t) dt.

Figure 2.3: Receiver structure for binary detection.
Given that p1(t) is transmitted, the outputs of the two matched filters are

Z1 = ∫_{0}^{T} Y(t) p1*(t) dt = ∫_{0}^{T} p1(t) p1*(t) dt + ∫_{0}^{T} N(t) p1*(t) dt = E + N1,

Z2 = ∫_{0}^{T} Y(t) p2*(t) dt = ∫_{0}^{T} p1(t) p2*(t) dt + ∫_{0}^{T} N(t) p2*(t) dt = ρE + N2.
In addition, given that p1 (t) is transmitted, a detection error occurs when Z2 > Z1 , or
equivalently Z = Z2 − Z1 > 0.
When N (t) is zero-mean Gaussian noise, N1 and N2 are jointly Gaussian random
variables. We compute the mean and the variance of N1 below.
E[N1] = E[∫_{0}^{T} N(t) p1*(t) dt] = ∫_{0}^{T} E[N(t)] p1*(t) dt = 0,

var[N1] = E[(∫_{0}^{T} N(τ) p1*(τ) dτ)(∫_{0}^{T} N(η) p1*(η) dη)*]
        = ∫_{0}^{T} ∫_{0}^{T} E[N(τ) N*(η)] p1*(τ) p1(η) dτ dη
        = ∫_{0}^{T} ∫_{0}^{T} (N0/2) δ(τ − η) p1*(τ) p1(η) dτ dη
        = (N0/2) ∫_{0}^{T} p1*(η) p1(η) dη = E N0/2.
Similarly, N2 has mean 0 and variance EN0 /2. The covariance between N1 and N2 is
computed as follows.
"µZ ¶ µZ T ¶∗ #
T
E[N1 N2∗ ] = E N (τ )p∗1 (τ )dτ N (η)p∗2 (η)dη
0 0
Z T Z T
= E[N (τ )N ∗ (η)]p∗1 (τ )p2 (η)dτ dη
0 0
Z T Z T
N0
= δ(τ − η)p∗1 (τ )p2 (η)dτ dη
0 0 2
Z T
N0 EN0
= p∗1 (η)p2 (η)dη = ρ∗ .
2 0 2
where the last equality follows from a practical assumption that p1 (t) and p2 (t) are real,
and hence ρ is real.
Therefore, given that p1(t) is transmitted, the probability of detection error is

Pr{Z > 0 | p1(t)} = Pr{ (Z − (ρ − 1)E)/√((1 − ρ)E N0) > (1 − ρ)E/√((1 − ρ)E N0) | p1(t) } = Q(√((1 − ρ)E/N0)),

where (Z − (ρ − 1)E)/√((1 − ρ)E N0) is a zero-mean unit-variance Gaussian random variable.
By symmetry, given that p2(t) is transmitted, the probability of detection error is the same. In summary, the overall bit error probability is

Pe = Q(√((1 − ρ)E/N0)).
For ergodic systems, the bit error probability is equal to the bit error rate (BER), which is a key performance measure of a digital communication system. The BER is the average fraction of bits received in error in an indefinitely long sequence of transmitted bits.
It is customary to describe the performance of a digital communication system by
plotting the BER against the ratio Eb /N0 , where Eb is the average energy used per
transmitted bit. Significant comparisons among different communication systems are
possible using such plots. As a specific example, we shall compare two scenarios of
binary detection discussed above.
1. Antipodal signals: ρ = −1

2. Orthogonal signals: ρ = 0

It follows that

Pe^antipodal = Q(√(2E/N0)),    Pe^orthogonal = Q(√(E/N0))
Figure 2.4 indicates that antipodal signals perform better compared with orthogonal signals. In particular, for the same BER, orthogonal signals require 3 dB more energy per bit than antipodal signals. In other words, there is a 3-dB penalty in terms of the signal energy.
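The two error probabilities can be tabulated directly from the Q function, for example in Python (a minimal sketch; the grid of Eb/N0 values is an arbitrary choice):

import numpy as np
from scipy.special import erfc

def qfunc(x):
    return 0.5 * erfc(x / np.sqrt(2.0))

EbN0_dB = np.arange(0, 15, 2)
EbN0 = 10.0 ** (EbN0_dB / 10.0)
pe_antipodal = qfunc(np.sqrt(2.0 * EbN0))   # rho = -1
pe_orthogonal = qfunc(np.sqrt(EbN0))        # rho = 0

for db, pa, po in zip(EbN0_dB, pe_antipodal, pe_orthogonal):
    print(f"Eb/N0 = {db:2d} dB   antipodal = {pa:.2e}   orthogonal = {po:.2e}")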
† Appendix: Wiener-Khinchine Theorem
Recall that the PSD is defined as the Fourier transform of the autocorrelation function.
The Wiener-Khinchine theorem states that the PSD is indeed equal to the following
quantity, which was previously mentioned as an alternative definition of the PSD.
GX(f) = lim_{T→∞} (1/T) E[|XT(f)|²],
Figure 2.4: BER (on a log10 scale) versus Eb/N0 in dB for orthogonal and antipodal signals.
where XT(f) is the Fourier transform of the truncation xT(t) of the sample function x(t), i.e.

xT(t) = x(t) for |t| ≤ T/2, and xT(t) = 0 otherwise.
GX(f) = F{RX(τ)},

provided that ∫_{−∞}^{∞} |τ RX(τ)| dτ < ∞.
(We use x(t) to denote a random process in this section; the capital X(f) is already used to refer to its Fourier transform.)
Figure 2.5 shows the region of integration in the domain set of (ξ, τ). By changing the order of integration, we can write

E[|XT(f)|²] = ∫_{0}^{T} ∫_{−T/2}^{T/2−τ} RX(τ) e^{−j2πfτ} dξ dτ + ∫_{−T}^{0} ∫_{−T/2−τ}^{T/2} RX(τ) e^{−j2πfτ} dξ dτ
            = ∫_{0}^{T} (T − τ) RX(τ) e^{−j2πfτ} dτ + ∫_{−T}^{0} (T + τ) RX(τ) e^{−j2πfτ} dτ
            = ∫_{−T}^{T} (T − |τ|) RX(τ) e^{−j2πfτ} dτ.
Figure 2.5: Region of integration for the derivation of the Wiener-Khinchine theorem.
Dividing by T, we can write

(1/T) E[|XT(f)|²] = ∫_{−T}^{T} RX(τ) e^{−j2πfτ} dτ − (1/T) ∫_{−T}^{T} |τ| RX(τ) e^{−j2πfτ} dτ.

Since ∫ f(τ) dτ ≤ ∫ |f(τ)| dτ for real f(τ), the real part and the imaginary part of the last integral are at most ∫_{−∞}^{∞} |τ RX(τ)| dτ, which is assumed to be finite. It follows that the limit of the last term as T → ∞ is equal to zero, yielding GX(f) = F{RX(τ)} as desired. ¤
A random process X(t) is wide-sense cyclostationary with period T0 if

X̄(t + nT0) = X̄(t)  and  RX(t + nT0, t − τ + nT0) = RX(t, t − τ)

for all t, τ ∈ R and n ∈ Z. In other words, for any τ ∈ R, X̄(t) and RX(t, t − τ) as functions of t are periodic with period T0.

For a wide-sense cyclostationary process X(t), the PSD is given by

GX(f) = F{⟨RX(t, t − τ)⟩},

where

⟨RX(t, t − τ)⟩ = (1/T0) ∫_{−T0/2}^{T0/2} RX(t, t − τ) dt
Example 2.9 : Consider Y(t) = X(t) cos(2πf0 t), where X(t) is WSS. We compute the mean and the autocorrelation function of Y(t) as follows.

Ȳ(t) = X̄ cos(2πf0 t),    RY(t, t − τ) = RX(τ) cos(2πf0 t) cos(2πf0 (t − τ)).

Since Ȳ(t) and RY(t, t − τ) are periodic with period T0 = 1/f0, it follows that Y(t) is wide-sense cyclostationary.

In addition,

⟨RY(t, t − τ)⟩ = (1/2) RX(τ) cos(2πf0 τ),

yielding the PSD

GY(f) = (1/4) GX(f − f0) + (1/4) GX(f + f0).

Note that this is the same PSD as for Y(t) = X(t) cos(2πf0 t + Φ), where Φ is uniformly distributed in [0, 2π]. ¤