Handout 1
Proof: Since E and E^c are disjoint and their union is S, from statement 3 of the axiom,

Pr{S} = Pr{E} + Pr{E^c}.

From statement 1, we obtain the desired property, i.e.

1 = Pr{E} + Pr{E^c}. ¤
• If E and F are not disjoint, then

Pr{E ∪ F} = Pr{E} + Pr{F} − Pr{E, F}

Since E ∪ (F − E) = E ∪ F and E ∩ (F − E) = ∅,

Pr{E ∪ F} = Pr{E} + Pr{F − E} = Pr{E} + Pr{F} − Pr{F, E},

where the last equality uses Pr{F − E} = Pr{F} − Pr{F, E}. ¤
Course notes were prepared by Prof. R.M.A.P. Rajatheva and revised by Dr. Poompat Saengudomlert.
It is common to write Pr{E ∩ F} as Pr{E, F}. We shall adopt this notation.
• If F1, . . . , Fn are disjoint, then

Pr{⋃_{i=1}^{n} Fi} = Σ_{i=1}^{n} Pr{Fi}
Proof: The statement follows from induction. For example, consider n = 3. Since F1 ∪ F2 and F3 are disjoint, we can write

Pr{F1 ∪ F2 ∪ F3} = Pr{F1 ∪ F2} + Pr{F3} = Pr{F1} + Pr{F2} + Pr{F3}. ¤
The conditional probability of event E given that event F happens (or in short given event F), denoted by Pr{E|F}, is defined as

Pr{E|F} = Pr{E, F} / Pr{F}
Proof: Write

E = ⋃_{i=1}^{n} (E ∩ Fi).

Since the events E ∩ F1, . . . , E ∩ Fn are disjoint, Pr{E} = Σ_{i=1}^{n} Pr{E, Fi} = Σ_{i=1}^{n} Pr{E|Fi} Pr{Fi}. ¤

Pr{Fi|E} = Pr{E|Fi} Pr{Fi} / Σ_{j=1}^{n} Pr{E|Fj} Pr{Fj}
Proof: Write Pr{Fi|E} as Pr{E, Fi}/Pr{E} = Pr{E|Fi} Pr{Fi}/Pr{E}, and apply the total probability theorem to Pr{E} in the denominator. ¤

Applying the definition of conditional probability repeatedly yields

Pr{F1, . . . , Fn} = Pr{F1} ∏_{i=2}^{n} Pr{Fi | F1, . . . , Fi−1}
or equivalently

Pr{E|F} = Pr{E}.

In addition, events E and F are conditionally independent given event G if

Pr{E, F|G} = Pr{E|G} Pr{F|G}.
• If X(s) can take any real value in a continuous range (which requires S to be uncountable), then X(s) is a continuous random variable.
The basic idea behind a random variable is that we can consider probabilistic events as numerical-valued events, which leads us to a probability function. With this function,
we can neglect the underlying mapping from s to X, and consider a random variable X
as a direct numerical outcome of a probabilistic experiment or action.
1.1.3 Probability Functions
By using a random variable X, we can define numerical-valued events such as X = x and
X ≤ x for x ∈ R. The probability function
FX (x) = Pr{X ≤ x}
is known as the cumulative distribution function (CDF) or simply the distribution func-
tion. Note that the CDF is defined for all x ∈ R.
• It is customary to denote a random variable by an upper-case letter, e.g. X, and
denote its specific value by a lower-case letter, e.g. x.
• The nature of the function FX (x) is determined by random variable X, which is
identified in the subscript. When the associated random variable X is clear from
the context, we often write F (x) instead of FX (x).
• Since FX (x) indicates a probability value, it is dimensionless.
The probability density function (PDF) is defined as the derivative of the CDF, i.e.

fX(x) = dFX(x)/dx
NOTE: A common mistake is to think that fX (x) = Pr{X = x}; it is not always true.
Overall, the PDF fX(x) or the CDF FX(x) provides a complete description of random variable X.
1.1.4 Continuous vs. Discrete Random Variables
Roughly speaking, a continuous random variable has a continuous CDF. A discrete ran-
dom variable has a staircase CDF. A mixed-type random variable has a CDF containing
discontinuities, but the CDF is not necessarily constant between discontinuities. Fig-
ure 1.1 illustrates different types of CDFs.
Figure 1.1: CDFs and PDFs of different types of random variables.
Since the PDF is the derivative of the CDF, a continuous random variable has a PDF without impulses. However, the PDF of a discrete or mixed-type random variable contains impulses due to the discontinuities in the CDF.
PMF
For a discrete random variable, let 𝒳 denote the countable set of all possible values of X(s). We can then define a probability mass function (PMF) as

fX(x) = Pr{X = x}

where x ∈ 𝒳. Note that a PMF is only meaningful for a discrete random variable. The same notation fX(x) is used for both the PDF and the PMF; it is usually clear from the context which type of function is referred to by fX(x).
Example 1.1 : Consider rolling a die. The set of sample points of this probabilistic experiment is S = {1, 2, 3, 4, 5, 6}. The natural definition of an associated random variable is
X(s) = s, s ∈ S.
The corresponding PMF is

fX(x) = 1/6 for x ∈ {1, 2, 3, 4, 5, 6}.

Figure 1.2: PDF and CDF of the result of a die roll.
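As a quick numerical illustration (a minimal sketch in Python; the sample size and seed are arbitrary choices), the PMF and CDF of the die roll can be estimated from simulated outcomes and compared with the exact values:

import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)     # simulated die rolls, values 1..6

values = np.arange(1, 7)
pmf_est = np.array([(rolls == v).mean() for v in values])   # empirical PMF
cdf_est = np.cumsum(pmf_est)                                # empirical CDF at x = 1, ..., 6

print("exact PMF    :", np.full(6, 1 / 6))
print("estimated PMF:", pmf_est)
print("estimated CDF:", cdf_est)             # approaches 1/6, 2/6, ..., 1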
Handout 2
The PDF for X (or Y) alone is called a marginal PDF of X (or Y) and can be found from the joint PDF by integrating over the other random variable, i.e.

fX(x) = ∫_{−∞}^{∞} fXY(x, y) dy,    fY(y) = ∫_{−∞}^{∞} fXY(x, y) dx.
Example 1.2 : Suppose that fXY(x, y) = (1/4) e^{−|x|−|y|}. The marginal PDF of X is

fX(x) = ∫_{−∞}^{∞} (1/4) e^{−|x|−|y|} dy = (1/4) e^{−|x|} ∫_{−∞}^{∞} e^{−|y|} dy
      = (1/2) e^{−|x|} ∫_{0}^{∞} e^{−y} dy = (1/2) e^{−|x|} × (−e^{−y})|_{0}^{∞} = (1/2) e^{−|x|}.
1.2 Functions of Random Variables
Consider a random variable Y that is obtained as a function of another random variable
X. In particular, suppose that Y = g(X). We first consider g that is monotonic (either
increasing or decreasing).
Monotonic Functions
If g is monotonic, each value y of Y has a unique inverse denoted by g −1 (y), as illustrated
in figure 1.3.
When g is monotonically increasing, FY(y) = Pr{Y ≤ y} = Pr{X ≤ g^{−1}(y)} = FX(g^{−1}(y)), yielding

fY(y) = dFY(y)/dy = fX(g^{−1}(y)) · dg^{−1}(y)/dy.

Similarly, when g is monotonically decreasing,

fY(y) = −fX(g^{−1}(y)) · dg^{−1}(y)/dy.

It follows that, for a monotonic function g, we have

fY(y) = fX(g^{−1}(y)) · |dg^{−1}(y)/dy|
Example 1.3 : Let Y = g(X), where g(x) = ax + b. Then, g^{−1}(y) = (y − b)/a, yielding dg^{−1}(y)/dy = 1/a. It follows that

fY(y) = fX((y − b)/a) · |1/a| = (1/|a|) fX((y − b)/a). ¤
Figure 1.4: Nonmonotonic function of random variable X.
Nonmonotonic Functions
If g is not monotonic, then several values of x can correspond to a single value of y, as
illustrated in figure 1.4.
We can view g as having multiple monotonic components g1, . . . , gK, where K is the number of monotonic components, and sum the PDFs from these components, i.e.

fY(y) = Σ_{k=1}^{K} fX(g_k^{−1}(y)) · |dg_k^{−1}(y)/dy|
Example 1.4 : Let Y = g(X), where g(x) = ax2 with a > 0, as illustrated in figure 1.5.
Figure 1.5: Y = aX 2 with a > 0.
Note that

|dg_1^{−1}(y)/dy| = |dg_2^{−1}(y)/dy| = |(1/(2a)) (y/a)^{−1/2}| = 1/(2√(ay)).

It follows that

fY(y) = (1/(2√(ay))) [fX(−√(y/a)) + fX(√(y/a))] for y ≥ 0. ¤
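The derived PDF can be checked by Monte Carlo simulation. The sketch below assumes that X is a standard Gaussian random variable and a = 2; both choices are illustrative only and are not part of the example:

import numpy as np
from scipy.stats import norm

a = 2.0                                 # illustrative constant
rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)      # assumed input distribution: X ~ N(0, 1)
y = a * x**2

hist, edges = np.histogram(y, bins=400, range=(0.0, 10.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# fY(y) = [fX(-sqrt(y/a)) + fX(sqrt(y/a))] / (2*sqrt(a*y)), y >= 0
f_formula = (norm.pdf(-np.sqrt(centers / a)) + norm.pdf(np.sqrt(centers / a))) / (2 * np.sqrt(a * centers))

for yv in (0.5, 1.0, 2.0, 4.0):
    i = np.argmin(np.abs(centers - yv))
    print(f"y={yv:4.1f}  simulated={hist[i]:.4f}  formula={f_formula[i]:.4f}")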
Handout 3
where E[·] denotes the operator for taking the expected value of a random variable. For convenience, we also denote E[X] by X̄.

Suppose that Y = g(X), i.e. Y is a function of X. One way to find E[Y] is to first compute fY(y) and then compute E[Y] = ∫_{−∞}^{∞} y fY(y) dy. However, it is often easier to use the following identity.

E[Y] = E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx
Another useful property in taking the expectation is the linearity property, which follows directly from the linearity of the integration operation. In particular, for any random variables X1, . . . , XN and any real numbers a1, . . . , aN,

E[Σ_{n=1}^{N} a_n X_n] = Σ_{n=1}^{N} a_n E[X_n]
• Mean of X, denoted by E[X] or X̄: Note that the mean of X is equal to the 1st moment of X.

• Mean square of X, denoted by E[X²]: The mean square of X is equal to the 2nd moment of X. More specifically,

E[X²] = ∫_{−∞}^{∞} x² fX(x) dx
• Variance of X, denoted by var[X] or σX²: The variance of X is equal to the 2nd central moment of X. More specifically,

var[X] = ∫_{−∞}^{∞} (x − X̄)² fX(x) dx
where var[·] denotes the operator for taking the variance of a random variable.
Note that the mean E[X] can be thought of as the best guess of X in terms of the mean
square error. In particular, consider the problem of finding a number a that minimizes
the mean square error MSE = E[(X − a)2 ]. We show below that the error is minimized
by setting a = E[X]. In particular, solving dMSE/da = 0 yields
0 = (d/da) E[X² − 2aX + a²] = (d/da)(E[X²] − 2aE[X] + a²) = −2E[X] + 2a,

or equivalently a = E[X].
Roughly speaking, the variance σX² measures the effective width of the PDF around the mean. We next provide a more quantitative discussion on the variance.
For a nonnegative random variable X and any a > 0, the Markov inequality states that

Pr{X ≥ a} ≤ E[X]/a.

Proof: Pr{X ≥ a} = ∫_{a}^{∞} fX(x) dx ≤ ∫_{a}^{∞} (x/a) fX(x) dx ≤ (1/a) ∫_{0}^{∞} x fX(x) dx = E[X]/a. ¤
Figure 1.6 illustrates how Pr{|X − E[X]| ≥ b} in the Chebyshev inequality is equal to the area under the “tails” of the PDF. In particular, for b = 2σX, we have

Pr{|X − E[X]| ≥ 2σX} ≤ 1/4,

which means that we can expect at least 75% of observations on random variable X to be within the range E[X] ± 2σX. Thus, the smaller the variance, the smaller the spread of likely values.
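As a numerical illustration of how conservative the bound can be, the following Python sketch compares the empirical tail probability with the Chebyshev bound; the choice of an exponential distribution for X is an arbitrary assumption made only for this example:

import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=1_000_000)   # E[X] = 1, var[X] = 1 (illustrative choice)

mean, std = x.mean(), x.std()
for k in (1.5, 2.0, 3.0):
    b = k * std
    empirical = np.mean(np.abs(x - mean) >= b)   # Pr{|X - E[X]| >= b}
    chebyshev = 1.0 / k**2                       # Chebyshev bound: var[X]/b^2 = 1/k^2
    print(f"k={k}: empirical={empirical:.4f}  Chebyshev bound={chebyshev:.4f}")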
Figure 1.6: Area under the PDF tails for the Chebyshev inequality.
To help compute the variance σX², the following identity is sometimes useful.

σX² = E[X²] − X̄²
Multivariate Expectations
Consider a function g(X, Y) of two random variables X and Y. Then,

E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fX,Y(x, y) dx dy
When g(X, Y ) = XY , we have
E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y fX,Y(x, y) dx dy.
In addition, if X and Y are independent, i.e. fX,Y (x, y) = fX (x)fY (y), we can write
E[XY] = (∫_{−∞}^{∞} x fX(x) dx)(∫_{−∞}^{∞} y fY(y) dy) = E[X] E[Y].
Thus, for independent random variables X and Y ,
E[XY ] = E[X]E[Y ] for independent X and Y
In addition,
" N # N
X X
var Xn = var[Xn ] for uncorrelated X1 , . . . , XN
n=1 n=1
Finally, recall that E[XY ] = E[X]E[Y ] for independent X and Y . It follows that
independent random variables are uncorrelated. However, the converse is not true in
general.
1.4 Real and Complex Random Vectors and Their Functions
1.4.1 Real Random Vectors
A real random vector is a vector of random variables. In particular, let X = (X1 , . . . , XN ),
where X1 , . . . , XN are random variables. By convention, a real random vector is a column
vector. The statistics of X is fully described by the joint CDF of X1, . . . , XN, i.e.

FX(x) = Pr{X1 ≤ x1, . . . , XN ≤ xN}.

1.4.2 Complex Random Variables

A complex random variable Z is defined in terms of two real random variables X and Y as

Z = X + iY.

The mean of Z is

E[Z] = Z̄ = X̄ + iȲ,

while the variance of Z is

σZ² = E[|Z − Z̄|²].

The covariance of two complex random variables Z1 and Z2 is defined as

C_{Z1Z2} = E[(Z1 − Z̄1)(Z2 − Z̄2)*]
1.4.3 Functions of Random Vectors
Consider N random variables X1 , . . . , XN . Let Y1 , . . . , YN be functions of X1 , . . . , XN . In
particular,
Yn = gn (X1 , . . . , XN ), n = 1, . . . , N.
Let X = (X1 , . . . , XN ) and Y = (Y1 , . . . , YN ). In addition, let g(x) = (g1 (x), . . . , gN (x)).
Assuming that g is invertible, then the joint PDF of Y can be written in terms of the
joint PDF of X as
fY(y) = |J(y)| fX(g^{−1}(y))

where J(y) is the Jacobian determinant, i.e. the determinant of the N × N matrix whose (i, j) entry is ∂g_i^{−1}(y)/∂y_j.
Suppose that there are multiple solutions of x for y = g(x). We can view g as having
multiple components g1 , . . . , gK . It follows that
fY(y) = Σ_{k=1}^{K} |J_k(y)| fX(g_k^{−1}(y))
Example 1.6 : Suppose that we know the joint PDF fX1,X2(x1, x2) for random variables X1 and X2. Define

Y1 = g1(X1, X2) = X1 + X2
Y2 = g2(X1, X2) = X1

The inverse transformation is x1 = y2 and x2 = y1 − y2, whose Jacobian determinant equals −1, so that |J(y)| = 1 and fY1,Y2(y1, y2) = fX1,X2(y2, y1 − y2).
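A quick Monte Carlo check of this result (a minimal sketch; taking X1 and X2 to be independent standard Gaussians is an assumption made only for this illustration):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
x1 = rng.standard_normal(2_000_000)
x2 = rng.standard_normal(2_000_000)
y1, y2 = x1 + x2, x1

# Estimate fY1,Y2 near a point (y1s, y2s) by counting samples in a small box.
y1s, y2s, h = 0.8, 0.3, 0.05
in_box = (np.abs(y1 - y1s) < h) & (np.abs(y2 - y2s) < h)
density_est = in_box.mean() / (2 * h) ** 2

density_formula = norm.pdf(y2s) * norm.pdf(y1s - y2s)   # fX1,X2(y2, y1 - y2) for independent N(0,1)
print("estimated:", density_est, "  formula:", density_formula)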
Handout 4
so that we can write X = AZ. We shall derive the PDF fX (x) in what follows. For
simplicity, we focus on the case with M = N . However, the resultant PDF expressions
are also valid for M 6= N .
We begin with the marginal PDF of Zm, which is the zero-mean unit-variance Gaussian PDF, i.e.

fZ(z) = (1/√(2π)) e^{−z²/2}.

Since the Zm's are IID, we can write

fZ(z) = ∏_{m=1}^{N} (1/√(2π)) e^{−z_m²/2} = (1/(2π)^{N/2}) e^{−z^T z/2}.

Using the identity fX(x) = (1/|det A|) fZ(A^{−1}x), we can write

(A^{−1}x)^T (A^{−1}x) = x^T (A^{−1})^T A^{−1} x = x^T (A^T)^{−1} A^{−1} x = x^T (AA^T)^{−1} x.
Let CX be the covariance matrix for random vector X. It is easy to see that X̄ = A Z̄ = 0, yielding

CX = E[XX^T] = E[AZ(AZ)^T] = E[AZZ^T A^T] = A E[ZZ^T] A^T = AA^T

where the last equality follows from the fact that E[ZZ^T] = I. Since CX = AA^T,

fX(x) = (1/((2π)^{N/2} √(det CX))) e^{−(1/2) x^T CX^{−1} x}    (zero-mean jointly Gaussian)

fX(x) = (1/((2π)^{N/2} √(det CX))) e^{−(1/2) (x − X̄)^T CX^{−1} (x − X̄)}    (jointly Gaussian)
The proof is similar to the zero-mean jointly Gaussian case and is omitted.
Some important properties of jointly Gaussian random vector X are listed below.
1. A linear transformation of X yields another jointly Gaussian random vector.
2. The PDF of X is fully determined by the mean X and the covariance matrix CX ,
which are the first-order and second-order statistics.
Example 1.7 : Recall that the Gaussian PDF has the form

fX(x) = (1/√(2πσX²)) e^{−(x − X̄)²/(2σX²)}.
We now show that two jointly Gaussian random variables are independent if they are
uncorrelated. Let X1 and X2 be jointly Gaussian and uncorrelated. It follows that the
covariance matrix of X = (X1, X2) has the form

CX = [ σ1²  0 ; 0  σ2² ],

where σ1² and σ2² are the variances of X1 and X2 respectively. By substituting

√(det CX) = σ1 σ2  and  CX^{−1} = [ 1/σ1²  0 ; 0  1/σ2² ]
into the joint PDF expression of X, we can write
fX1,X2(x1, x2) = (1/(2πσ1σ2)) e^{−(1/2)[(x1 − X̄1)²/σ1² + (x2 − X̄2)²/σ2²]}
             = ((1/√(2πσ1²)) e^{−(x1 − X̄1)²/(2σ1²)}) · ((1/√(2πσ2²)) e^{−(x2 − X̄2)²/(2σ2²)})
             = fX1(x1) fX2(x2),
which implies that X1 and X2 are independent. The argument can in fact be extended in
a straightforward manner to show that uncorrelated jointly Gaussian random variables
X1 , . . . , XN are independent. ¤
Example Engineering Applications

Name          Application
uniform       modeling of quantization error
Gaussian      amplitude distribution of thermal noise; approximation of other distributions
exponential   message length and interarrival time in data communications
Rayleigh      fading in communication channels; envelope of bandpass Gaussian noise
binomial      number of random transmission errors in a transmitted block of n digits
Poisson       traffic model, e.g. number of message arrivals in a given time interval
Note that the above integral cannot be evaluated to get a closed form expression. Hence,
in practice, the error function is evaluated using a table lookup. In a typical computa-
tional software, e.g. MATLAB, there is a command to evaluate the error function.
In digital communications, it is customary to use the Q function, where
Q(x) = ∫_{x}^{∞} (1/√(2π)) e^{−u²/2} du.
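In software, the Q function can be evaluated from the complementary error function via the identity Q(x) = (1/2) erfc(x/√2). A minimal Python sketch:

import numpy as np
from scipy.special import erfc

def qfunc(x):
    # Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))
    return 0.5 * erfc(np.asarray(x) / np.sqrt(2.0))

print(qfunc(0.0))   # 0.5
print(qfunc(1.0))   # about 0.1587
print(qfunc(3.0))   # about 1.35e-3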
Roughly speaking, the CLT states that, as N gets large, the CDF of (SN − X̄)/(σX/√N) approaches that of a zero-mean unit-variance Gaussian RV.
Handout 5
Since the integration in the above definition resembles the inverse Fourier transform,
it follows that ΦX (ν) and fX (x) are a Fourier transform pair. More explicitly, if we
substitute x by f and ν by 2πt, then we can write
ΦX(2πt) = ∫_{−∞}^{∞} e^{j2πft} fX(f) df,
since ΦX (ν) and fX (x) form a Fourier transform pair. The characteristic function can be
used instead of the PDF as a complete statistical description of a random variable. By
using the characteristic functions, we can exploit properties of Fourier transform pairs to
compute several quantities of interest, as indicated below.
Note that the second last equality follows from the independence between X and Y. Since multiplication in the time domain corresponds to convolution in the frequency domain, having ΦZ(ν) = ΦX(ν) ΦY(ν) is equivalent to having fZ(z) = fX(z) ∗ fY(z).
Hence, for independent X1, . . . , XN and Z = Σ_{n=1}^{N} Xn,

ΦZ(ν) = ∏_{n=1}^{N} ΦXn(ν),    fZ(z) = fX1(z) ∗ · · · ∗ fXN(z)
Notice that setting ν = 0 will make the last integral equal to the mean of X, yielding
E[X] = −j (d/dν) ΦX(ν) |_{ν=0}.
The above argument can be extended to obtain the nth moment (as long as the
characteristic function is differentiable up to the nth order), i.e.
E[X^n] = (−j)^n (d^n/dν^n) ΦX(ν) |_{ν=0}
Finally, suppose that ΦX (ν) can be expressed as a Taylor series expansion around
ν = 0. Then, ΦX (ν) can be written in terms of the moments of X as follows.
ΦX(ν) = Σ_{n=0}^{∞} (ν^n/n!) · (d^n/dν^n) ΦX(ν)|_{ν=0} = Σ_{n=0}^{∞} ((jν)^n/n!) · E[X^n]
Recall the following Fourier transform pair for the Gaussian pulse.
A e^{−πt²/τ²} ↔ A τ e^{−πτ²f²}

By setting A = 1 and τ = 1/√(2πσX²), we can write

e^{−σX²(2πt)²/2} e^{jX̄(2πt)} ↔ (1/√(2πσX²)) e^{−(f − X̄)²/(2σX²)}.

Since the right hand side is equal to fX(f), the left hand side is equal to ΦX(2πt). It follows that

ΦX(ν) = e^{jX̄ν − σX²ν²/2}
Finally, note that it is possible to obtain the above expression through direct integration.
However, there will be more computation involved.
Note that the moment generating function is equivalent to the characteristic function
ΦX (ν) when s = jν.
As the name suggests, there is a close relationship between ΨX (s) and the nth moment
of X. In particular,
E[X^n] = (d^n/ds^n) ΨX(s) |_{s=0}
The proof is quite similar to using the characteristic function and is thus omitted.
The mean or first moment is computed below. For the exponential random variable with parameter λ, ΦX(ν) = λ/(λ − jν) = jλ/(ν + jλ), so that

E[X] = −j (d/dν) ΦX(ν) |_{ν=0} = −j · (−jλ)/(ν + jλ)² |_{ν=0} = 1/λ.
ΦY(ν) = Σ_{m=0}^{∞} (−σX²ν²/2)^m / m! = Σ_{m=0}^{∞} (−1)^m σX^{2m} ν^{2m} / (2^m m!) = Σ_{m=0}^{∞} ((jν)^{2m}/(2m)!) · ((2m)! σX^{2m} / (2^m m!))
      = Σ_{n=0, n even}^{∞} ((jν)^n/n!) · (n! σX^n / (2^{n/2} (n/2)!)) = Σ_{n=0, n even}^{∞} ((jν)^n/n!) · ((1 · 2 · 3 · · · n)/(2 · 4 · 6 · · · n)) σX^n
      = Σ_{n=0, n even}^{∞} ((jν)^n/n!) · 1 · 3 · 5 · · · (n − 1) σX^n.

By comparing term by term with the Taylor series expansion mentioned previously, i.e.

ΦY(ν) = Σ_{n=0}^{∞} ((jν)^n/n!) · E[Y^n],

we obtain E[Y^n] = 1 · 3 · 5 · · · (n − 1) σX^n for even n, and E[Y^n] = 0 for odd n.
Handout 6
Pr{|X − X̄| ≥ δ} ≤ σX²/δ²

where δ > 0.
We now provide an alternative derivation of this inequality. Consider the function g(y) defined as follows.

g(y) = 1 for |y| ≥ δ, and g(y) = 0 for |y| < δ

Figure 2.7 illustrates that g(y) ≤ y²/δ², which implies that

E[g(Y)] ≤ E[Y²/δ²].

Since E[g(Y)] = Pr{|Y| ≥ δ} and, for a zero-mean random variable Y, E[Y²] = σY², it follows that

Pr{|Y| ≥ δ} ≤ σY²/δ².
Finally, let Y = X − X̄. Since σX² = σY², we can write the desired expression, i.e.

Pr{|X − X̄| ≥ δ} ≤ σX²/δ².
Figure 2.7: Bound on function g(y) for the Chebyshev inequality.
The Chebyshev bound is found to be “loose” for a large number of practical appli-
cations. One reason is the looseness of the function y 2 /δ 2 as an upper bound on the
function g(y).
2.7.2 Chernoff Bound
Tighter upper bounds can often be obtained using the Chernoff bound, which is derived
as follows. First, define the function g(x) as
g(x) = 1 for x ≥ δ, and g(x) = 0 for x < δ

Figure 2.8 illustrates that g(x) ≤ e^{s(x−δ)} for any s > 0, which implies that

E[g(X)] ≤ E[e^{s(X−δ)}].

It follows that

Pr{X ≥ δ} ≤ e^{−sδ} E[e^{sX}], s > 0
Figure 2.8: Bound on function g(x) for the Chernoff bound.
The above expression gives an upper bound on the “upper tail” of the PDF. The
tightest bound can be obtained by minimizing the upper bound expression with respect
to s, i.e. solving for s from
0 = (d/ds) E[e^{s(X−δ)}] = E[(X − δ) e^{s(X−δ)}] = e^{−sδ} (E[X e^{sX}] − δ E[e^{sX}]).

Thus, the tightest bound is obtained by setting s = s*, where

E[X e^{s*X}] = δ E[e^{s*X}], s* > 0

An upper bound on the “lower tail” of the PDF can be derived similarly, yielding

Pr{X ≤ δ} ≤ e^{−sδ} E[e^{sX}], s < 0
Another Look at the Chernoff Bound
Recall that the Chebyshev bound can be derived from the Markov inequality, i.e. Pr{X ≥ a} ≤ X̄/a for a nonnegative random variable X. Similarly, we can derive the Chernoff bound from the Markov inequality, as stated formally below.
Proof: Take esX as a random variable in the Markov inequality. In addition, view the
event X ≥ δ as being equivalent to esX ≥ esδ for s > 0. Finally, view the event X ≤ δ as
being equivalent to esX ≥ esδ for s < 0. ¤
As an example, consider a random variable Y with the Laplacian PDF fY(y) = (1/2) e^{−|y|}, for which E[Y] = 0 and var[Y] = 2. The exact tail probability is

Pr{Y ≥ δ} = (1/2) e^{−δ}    (exact)

for any δ > 0. The Chebyshev bound is

Pr{|Y| ≥ δ} ≤ 2/δ².

Since fY(y) is even, we can write

Pr{Y ≥ δ} ≤ 1/δ²    (Chebyshev)
The Chernoff bound is given by

Pr{Y ≥ δ} ≤ e^{−sδ} E[e^{sY}] = e^{−sδ}/(1 − s²).

The bound can be optimized by setting s = (−1 + √(1 + δ²))/δ, yielding

Pr{Y ≥ δ} ≤ (δ²/(2(−1 + √(1 + δ²)))) e^{1 − √(1 + δ²)} ≈ (δ/2) e^{−δ}    (Chernoff)

for δ ≫ 1. Thus, the Chernoff bound (with exponential decrease) is much tighter than the Chebyshev bound (with polynomial decrease) for large δ. ¤
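These three expressions can be tabulated numerically, for example in Python (a minimal sketch; the values of δ are arbitrary):

import numpy as np

for delta in (2.0, 5.0, 10.0):
    exact = 0.5 * np.exp(-delta)
    chebyshev = 2.0 / delta**2                       # uses var[Y] = 2 for the Laplacian above
    s = (-1.0 + np.sqrt(1.0 + delta**2)) / delta     # optimizing value of s
    chernoff = np.exp(-s * delta) / (1.0 - s**2)
    print(f"delta={delta:4.1f}  exact={exact:.2e}  Chebyshev={chebyshev:.2e}  Chernoff={chernoff:.2e}")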
2.7.3 Tail Probabilities for a Sum of IID Random Variables
Let X1, X2, . . . be IID random variables with finite mean X̄ and finite variance σX². Define the sample mean

SN = (1/N) Σ_{n=1}^{N} Xn.

Note that the mean of SN is

E[SN] = E[(1/N) Σ_{n=1}^{N} Xn] = (1/N) Σ_{n=1}^{N} E[Xn] = (1/N) · N X̄ = X̄.
The weak law of large numbers states that, for any ε > 0, lim_{N→∞} Pr{|SN − X̄| ≥ ε} = 0.

Proof: Take SN as a random variable in the Chebyshev inequality and consider the limit as N → ∞. ¤
As discussed above, the weak law of large numbers results from applying the Cheby-
shev inequality to the sample mean SN . Let us now consider applying the Chernoff bound
to SN . We start by writing, for any s > 0,
Pr{SN ≥ δ} = Pr{N SN ≥ Nδ} ≤ e^{−sNδ} E[e^{sN SN}] = e^{−sNδ} E[e^{sX1} · · · e^{sXN}].

The bound is minimized by choosing s = s* such that the derivative of the bound is equal to zero, i.e.

E[X1 e^{s*X1}] = δ E[e^{s*X1}], s* > 0
Example 2.10 : Consider IID random variables X1, X2, . . . with

Xn = 1 with probability p, and Xn = −1 with probability 1 − p,

where we assume that p < 1/2. We shall use the Chernoff bound to show that

Pr{Σ_{n=1}^{N} Xn ≥ 0} ≤ (4p(1 − p))^{N/2}.
n=1
PN
First, note that having n=1 Xn ≥ 0 is equivalent to having SN ≥ 0. Hence,
( N )
X ¡ £ ¤¢N
Pr Xn ≥ 0 = Pr {Sn ≥ 0} ≤ E esX1 , s > 0.
n=1
£ ¤
From the given PDF, E esX1 = pes + (1 − p)e−s , yielding
( N )
X ¡ ¢N
Pr Xn ≥ 0 ≤ pes + (1 − p)e−s , s > 0.
n=1
q
1−p
The bound can be minimized by setting es = p
, yielding the desired expression. ¤
where RN (ν) is the remainder term that goes to 0 as N → ∞.
It follows that

ln ΦW(ν) = N ln(1 − ν²/(2N) + RN(ν)).

We now use the fact that ln(1 + x) ≈ x for small x to write

lim_{N→∞} ln ΦW(ν) = −ν²/2,

or equivalently

lim_{N→∞} ΦW(ν) = e^{−ν²/2},
which is the characteristic function of a zero-mean unit-variance Gaussian random vari-
able. Thus, in the limit as N → ∞, W becomes a zero-mean unit-variance Gaussian
random variable.
In general, the PDF of W may not approach the Gaussian PDF. However, the CDF
of W will approach the Gaussian CDF, as stated previously in the central limit theorem.
Handout 7
fY(y) = (1/(2√y)) [fX(−√y) + fX(√y)], y ≥ 0.

Substituting fX(x) = (1/√(2πσ²)) e^{−x²/(2σ²)} and using the even property of fX(x),

fY(y) = (1/√(2πyσ²)) e^{−y/(2σ²)}, y ≥ 0
With the above PDF, Y is called a chi-square random variable with one degree of freedom.
Its characteristic function is written below.
ΦY(ν) = ∫_{0}^{∞} e^{jνy} (1/√(2πyσ²)) e^{−y/(2σ²)} dy
      = ∫_{0}^{∞} (1/√(2πσ²)) e^{−(1 − j2σ²ν)y/(2σ²)} dy/√y.

By substituting u = ((1 − j2σ²ν)y)^{1/2}, we can write

ΦY(ν) = (2/(1 − j2σ²ν)^{1/2}) ∫_{0}^{∞} (1/√(2πσ²)) e^{−u²/(2σ²)} du = 1/(1 − j2σ²ν)^{1/2},

where the last equality follows from the fact that the integral is equal to half the area under a zero-mean Gaussian PDF curve (with variance σ²).
Consider now N IID zero-mean Gaussian random variables X1, . . . , XN with variance σ². Let Z = Σ_{n=1}^{N} Xn². Then, Z is a chi-square random variable with N degrees of freedom. We find the PDF of Z by writing its characteristic function as follows. For convenience, let Yn = Xn². Note that each Yn is a chi-square random variable with one degree of freedom. Since Z is a sum of IID random variables Y1, . . . , YN, we can write ΦZ(ν) = (ΦY1(ν))^N, yielding

ΦZ(ν) = 1/(1 − j2σ²ν)^{N/2}
The corresponding PDF, obtained by inverting ΦZ(ν), is

fZ(z) = z^{N/2−1} e^{−z/(2σ²)} / ((2σ²)^{N/2} Γ(N/2)), z ≥ 0,

where Γ(p) is the Gamma function defined as

Γ(p) = ∫_{0}^{∞} x^{p−1} e^{−x} dx, p > 0
Below are some key properties of the Gamma function. Their proofs are left as exercises.
1. Γ(1) = 1
2. Γ(p) = (p − 1)Γ(p − 1)
3. Γ(n) = (n − 1)!, n = 1, 2, . . .
4. Γ(1/2) = √π
Finally, it should be noted that, for N = 2, a chi-square random variable with two
degrees of freedom is equivalent to an exponential random variable. You should be able
to verify that
E[Z] = N σ 2 , var[Z] = 2N σ 4 .
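A quick simulation check of these two moments (a minimal sketch; N, σ, and the number of trials are arbitrary choices):

import numpy as np

N, sigma, trials = 5, 2.0, 500_000
rng = np.random.default_rng(5)
x = sigma * rng.standard_normal((trials, N))   # N IID zero-mean Gaussians with variance sigma^2
z = np.sum(x**2, axis=1)                       # chi-square with N degrees of freedom (scaled by sigma^2)

print("E[Z]  :", z.mean(), "  theory:", N * sigma**2)
print("var[Z]:", z.var(),  "  theory:", 2 * N * sigma**4)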
Rayleigh PDF
2
Let Xp1 and X2 be two IID zero-mean Gaussian random variables with variance σ . Define
R = X12 + X22 . Then, R is a Rayleigh random variable. The PDF of R is derived as
follows. We first define Y = X12 + X22 . It follows that Y has the exponential PDF
1 − y2
fY (y) = e 2σ , y ≥ 0.
2σ 2
Since R = √Y, we can write

fR(r) = fY(r²) · |d(r²)/dr| = fY(r²) · 2r,

yielding

fR(r) = (r/σ²) e^{−r²/(2σ²)}, r ≥ 0
The mean and variance of a Rayleigh random variable are given by

E[R] = σ √(π/2),    var[R] = ((4 − π)/2) σ².
Bernoulli Distribution
A Bernoulli random variable X has the following probabilities:

X = 0 with probability 1 − p, and X = 1 with probability p.

The event that X = 1 is often referred to as a “success”. You should be able to verify that

E[X] = p, var[X] = p(1 − p), ΦX(ν) = 1 − p + p e^{jν}.
Binomial Distribution
Let X1, . . . , XN be IID Bernoulli random variables with parameter p. Then, Y = Σ_{n=1}^{N} Xn is a binomial random variable whose probabilities are given by

Pr{Y = k} = C(N, k) p^k (1 − p)^{N−k}, k = 0, 1, . . . , N,

where C(N, k) = N!/(k!(N − k)!). The value Pr{Y = k} gives the probability that k out of N events are “successful”, where each event is successful with probability p. You should be able to verify that

E[Y] = Np, var[Y] = Np(1 − p), ΦY(ν) = (1 − p + p e^{jν})^N.
Geometric Distribution
Consider an experiment in which each independent trial is successful with probability p. Let X denote the number of trials required until the first success, i.e. the first X − 1 trials fail. Then, X is a geometric random variable with the following probabilities:

Pr{X = k} = (1 − p)^{k−1} p, k = 1, 2, . . .

E[X] = 1/p, var[X] = (1 − p)/p², ΦX(ν) = p e^{jν}/(1 − (1 − p) e^{jν}).

Alternatively, X can be defined as the number of failures before the first success. In this case, the probabilities of X are

Pr{X = k} = (1 − p)^k p, k = 0, 1, . . .
Poisson Distribution
A Poisson random variable X with parameter λ has the following probabilities:

Pr{X = k} = e^{−λ} λ^k / k!, k = 0, 1, . . .

A Poisson random variable represents the number of arrivals in one time unit for an arrival process in which interarrival times are independent exponential random variables. You should be able to verify that

E[X] = λ, var[X] = λ, ΦX(ν) = exp(λ(e^{jν} − 1)).
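The arrival-process interpretation can be illustrated by counting how many independent exponential interarrival times fit into one time unit (a minimal sketch; λ and the number of trials are arbitrary choices):

import numpy as np

lam, trials = 3.0, 100_000
rng = np.random.default_rng(6)

counts = np.empty(trials, dtype=int)
for i in range(trials):
    t, k = 0.0, 0
    while True:
        t += rng.exponential(1.0 / lam)   # interarrival time with mean 1/lambda
        if t > 1.0:
            break
        k += 1
    counts[i] = k                          # number of arrivals in one time unit

print("E[X]   :", counts.mean(), "  theory:", lam)
print("var[X] :", counts.var(),  "  theory:", lam)
print("Pr{X=0}:", np.mean(counts == 0), "  theory:", np.exp(-lam))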
Handout 8
2 Random Processes
2.1 Definition of Random Processes
Recall that a random variable is a mapping from the sample space S to the set of real
numbers R. In comparison, a stochastic process or random process is a mapping from
the sample space S to the set of real-valued functions called sample functions. Figure 2.1
illustrates the mapping for a random process.
Figure 2.1: Mapping from sample points in the sample space to sample functions.
The autocorrelation function of X(t) is defined as

RX(t1, t2) = E[X(t1) X*(t2)].
Similarly, the cross-correlation function of two random processes X(t) and Y(t) is defined as

RXY(t1, t2) = E[X(t1) Y*(t2)]

The cross-covariance function of X(t) and Y(t) is defined as

CXY(t1, t2) = E[(X(t1) − X̄(t1))(Y(t2) − Ȳ(t2))*] = RXY(t1, t2) − X̄(t1) Ȳ*(t2)
By analogy with random variables, random processes X(t) and Y(t) are uncorrelated if

CXY(t1, t2) = 0 for all t1, t2 ∈ R,

and statistically independent if the joint CDF satisfies

FX(t1),...,X(tm),Y(t'1),...,Y(t'n)(x1, . . . , xm, y1, . . . , yn) = FX(t1),...,X(tm)(x1, . . . , xm) · FY(t'1),...,Y(t'n)(y1, . . . , yn)
Time Averages
The mean X̄(t) as defined above is also referred to as the ensemble average. The time average of sample function x(t) is denoted and defined as follows.

⟨x(t)⟩ = lim_{T→∞} (1/T) ∫_{−T/2}^{T/2} x(t) dt
2.3 Stationary, Ergodic, and Cyclostationary Processes
A random process is strict sense stationary (SSS) if, for all values of n ∈ Z+ and t1, . . . , tn, τ ∈ R, the joint CDF satisfies

FX(t1+τ),...,X(tn+τ)(x1, . . . , xn) = FX(t1),...,X(tn)(x1, . . . , xn)

for all x1, . . . , xn ∈ C. Roughly speaking, the statistics of the random process look the same at all times.
For the purpose of analyzing communication systems, it is usually sufficient to assume a stationary condition that is weaker than SSS. In particular, a random process X(t) is wide-sense stationary (WSS) if, for all t1, t2 ∈ R,

X̄(t1) = X̄(t2)  and  RX(t1, t2) depends only on the difference t1 − t2.

Roughly speaking, for a WSS random process, the first and second order statistics look the same at all times. Note that an SSS random process is always WSS, but the converse is not always true.
Since the autocorrelation function RX (t1 , t2 ) of a WSS random process only depends
on the time difference t1 −t2 , we usually write RX (t1 , t2 ) as a function with one argument,
i.e. RX (t1 − t2 ). Similarly, for a WSS process, we can write the autocovariance function
CX (t1 , t2 ) as CX (t1 − t2 ).
A random process is ergodic if all statistical properties that are ensemble averages
are equal to the corresponding time averages. An ergodic process must be SSS, but
ergodicity is a stronger condition than the SSS condition, i.e. some SSS process is not
ergodic. Since all statistical properties of an ergodic process can be determined from a
single sample function, each sample function of an ergodic process is representative of the
entire process.
A randomly phased sinusoid and a stationary Gaussian process are examples of ergodic processes. However, a test of ergodicity for an arbitrary random process is quite difficult in general and is beyond the scope of this course. For analysis, we shall assume that the
random process of interest is ergodic, unless explicitly stated otherwise.
Example 2.1 (Randomly phased sinusoid): Consider the random process X(t) defined as

X(t) = A cos(2πf0 t + Φ),

where A, f0 > 0 are constants and Φ is a random variable uniformly distributed in the interval [0, 2π]. The mean of X(t) is computed as

X̄(t) = E[A cos(2πf0 t + Φ)] = (1/(2π)) ∫_{0}^{2π} A cos(2πf0 t + ϕ) dϕ = 0,

where the last equality follows from the fact that the integral is taken over one period of the cosine function and is hence zero.
The autocovariance function CX(t1, t2) is computed as

CX(t1, t2) = RX(t1, t2) − X̄(t1) X̄(t2) = RX(t1, t2) = (A²/2) cos(2πf0 (t1 − t2)),

where the last equality follows from the previous example. Since X̄(t) = 0 and CX(t1, t2) depends only on t1 − t2, X(t) is WSS. ¤
Example 2.3 : Consider the random process defined as
X(t) = 6 e^{Φt},

where Φ is a random variable uniformly distributed in [0, 2]. The ensemble average is

X̄(t) = ∫_{−∞}^{∞} 6 e^{ϕt} fΦ(ϕ) dϕ = ∫_{0}^{2} 6 e^{ϕt} · (1/2) dϕ = (3/t) e^{ϕt}|_{0}^{2} = (3/t)(e^{2t} − 1).

Since X̄(t) depends on time t, X(t) is not WSS. The autocorrelation function RX(t1, t2) is computed as

RX(t1, t2) = E[X(t1) X*(t2)] = E[6 e^{Φt1} · 6 e^{Φt2}]
           = 36 ∫_{0}^{2} e^{ϕ(t1+t2)} · (1/2) dϕ = (18/(t1 + t2)) e^{ϕ(t1+t2)}|_{0}^{2}
           = (18/(t1 + t2)) (e^{2(t1+t2)} − 1). ¤
Handout 9
Example 2.4 : The first two statements are proven below.

1. RXY(−τ) = R*YX(τ)

2. |RXY(τ)| ≤ √(RX(0) RY(0))

3. |RXY(τ)| ≤ (1/2)(RX(0) + RY(0))
The third statement is somewhat more difficult to show. To do so, we can justify the following statement (using the same argument as for the derivation of the Schwarz inequality):

E[U(t) V*(t)] ≤ √(E[|U(t)|²] E[|V(t)|²]),

and use the above inequality to establish the third statement by setting U(t) = X(t) and V(t) = X(t − τ).
2.4 Gaussian Processes
A random process X(t) is a zero-mean Gaussian process if, for all N ∈ Z+ and t1 , . . . , tN ∈
R, (X(t1 ), . . . , X(tN )) is a zero-mean jointly Gaussian random vector. In addition, we
say that X(t) is a Gaussian process if it is the sum of a zero-mean Gaussian process and
some deterministic function µ(t). Note that X̄(t) = µ(t).
Some important properties of Gaussian process X(t) are listed below. The proofs are
beyond the scope of this course and are omitted.
1. If we pass X(t) through an LTI filter with impulse response h(t), the output X(t) ∗
h(t) is a Gaussian process.
2. The statistics of X(t) is fully determined by the mean X(t) and the covariance
function CX (t1 , t2 ).
3. We refer to a quantity of the form ∫_{−∞}^{∞} X(t) u(t) dt as an observable or linear functional of X(t). Any set of linear functionals of X(t) are jointly Gaussian.
RX (τ ) ↔ GX (f )
For τ = 0, we have, through the inverse Fourier transform, the average power of the
signal equal to
E[|X(t)|²] = RX(0) = ∫_{−∞}^{∞} GX(f) df
As an illustration, consider again the randomly phased sinusoid X(t) = A cos(2πf0 t + Φ), where A and f0 are positive constants and Φ is uniformly distributed in [0, 2π]. Recall that RX(τ) = (A²/2) cos(2πf0 τ). It follows that

GX(f) = F{RX(τ)} = (A²/4) δ(f − f0) + (A²/4) δ(f + f0).
In addition, as another illustration of the ergodicity of X(t), consider computing the time-average autocorrelation for an arbitrary sample function with Φ = ϕ as follows.

⟨X(t) X*(t − τ)⟩ = lim_{T→∞} (1/T) ∫_{−T/2}^{T/2} A cos(2πf0 t + ϕ) A cos(2πf0 (t − τ) + ϕ) dt
                = lim_{T→∞} (A²/(2T)) ∫_{−T/2}^{T/2} (cos(2πf0 τ) + cos(4πf0 t − 2πf0 τ + 2ϕ)) dt
                = (A²/2) cos(2πf0 τ) + lim_{T→∞} (A²/(2T)) ∫_{−T/2}^{T/2} cos(4πf0 t − 2πf0 τ + 2ϕ) dt
                = (A²/2) cos(2πf0 τ),

since the remaining integral stays bounded as T → ∞. Note that the time average is equal to the ensemble average. ¤
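The same conclusion can be checked numerically for a single sample function (a minimal sketch; A, f0, the fixed phase, the sampling rate, and the finite averaging window used to approximate the limit are all arbitrary choices):

import numpy as np

A, f0, phi = 2.0, 5.0, 1.234
T, fs = 2000.0, 200.0                        # averaging window (s) and sampling rate (Hz)
t = np.arange(0.0, T, 1.0 / fs)
x = A * np.cos(2 * np.pi * f0 * t + phi)     # one sample function with a fixed phase

for tau in (0.0, 0.05, 0.1):
    lag = int(round(tau * fs))
    time_avg = np.mean(x[lag:] * x[:len(x) - lag])        # <x(t) x(t - tau)>
    ensemble = (A**2 / 2) * np.cos(2 * np.pi * f0 * tau)  # RX(tau)
    print(f"tau={tau:.2f}  time average={time_avg:+.4f}  ensemble={ensemble:+.4f}")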
where we have applied the WSS properties of X(t) and Y (t) to write E[X(t)] = E[X(0)]
and E[Y (t)] = E[Y (0)]. The autocorrelation function of Z(t) is computed below.
Note that RZ (t1 , t2 ) depends only on t1 − t2 . It follows that Z(t) is also WSS. Conse-
quently, we can write
RZ (τ ) = RX (τ ) + RY (τ ) + RXY (τ ) + RY X (τ )
GZ (f ) = GX (f ) + GY (f ) + GXY (f ) + GY X (f )
where we define the cross PSDs as GXY(f) = F{RXY(τ)} and GYX(f) = F{RYX(τ)}.
If X(t) and Y (t) have zero mean and are uncorrelated, then RXY (τ ) = RY X (τ ) = 0
for all τ , yielding
RZ (τ ) = RX (τ ) + RY (τ ).
In terms of the PSD,
GZ (f ) = GX (f ) + GY (f ).
Thus, for zero-mean uncorrelated jointly WSS random signals, superposition holds for the
autocorrelation function as well as for the PSD.
Next, consider the product Z(t) = X(t) Y(t) of two independent jointly WSS random signals. Then

RZ(τ) = RX(τ) RY(τ) ↔ GZ(f) = GX(f) ∗ GY(f)
Proof: Using the independence between X(t) and Y(t), the mean of Z(t) is written as Z̄(t) = E[X(t)] E[Y(t)] = X̄ Ȳ. Using the independence between X(t) and Y(t) and their WSS properties, the autocorrelation function is written as RZ(t1, t2) = E[X(t1)Y(t1) X*(t2)Y*(t2)] = RX(t1, t2) RY(t1, t2).
Note that RZ (t1 , t2 ) depends only on t1 − t2 . It follows that Z(t) is also WSS. Conse-
quently, we can write RZ (τ ) = RX (τ )RY (τ ).
Since multiplication in the time domain corresponds to convolution in the frequency
domain, we can write GZ (f ) = GX (f ) ∗ GY (f ). ¤
Example 2.6 (Modulated random signal): Consider the modulated random signal

Y(t) = X(t) cos(2πf0 t + Φ),

where X(t) is a WSS random signal while the random phase Φ is uniformly distributed in [0, 2π] and is independent of X(t).
Recall that U(t) = cos(2πf0 t + Φ) is WSS with the autocorrelation function RU(τ) = (1/2) cos(2πf0 τ). It follows that Y(t) is WSS with the following autocorrelation function.

RY(τ) = RX(τ) · (1/2) cos(2πf0 τ) = (1/2) RX(τ) cos(2πf0 τ)
In the frequency domain,

GY(f) = GX(f) ∗ [(1/4) δ(f − f0) + (1/4) δ(f + f0)] = (1/4) GX(f − f0) + (1/4) GX(f + f0). ¤
Handout 10
Properties of Y (t)
1. Mean, autocorrelation, and PSD: The mean of Y(t) is computed below.

E[Y(t)] = E[∫_{−∞}^{∞} h(τ) X(t − τ) dτ] = ∫_{−∞}^{∞} h(τ) E[X(t − τ)] dτ = ∫_{−∞}^{∞} h(τ) X̄(t − τ) dτ
Assuming that X(t) is WSS, the autocorrelation function of Y (t) is computed below.
RY(τ) = E[(∫_{−∞}^{∞} h(η) X(τ − η) dη)(∫_{−∞}^{∞} h*(−ξ) X*(ξ) dξ)]
      = E[∫_{−∞}^{∞} ∫_{−∞}^{∞} h(η) h*(−ξ) X(τ − η) X*(ξ) dξ dη]
      = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(η) h*(−ξ) E[X(τ − η) X*(ξ)] dξ dη
      = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(η) h*(−ξ) RX(τ − η − ξ) dξ dη
      = ∫_{−∞}^{∞} h(η) (∫_{−∞}^{∞} h*(−ξ) RX(τ − η − ξ) dξ) dη    [inner integral = z(τ − η), where z(τ) = h*(−τ) ∗ RX(τ)]
      = ∫_{−∞}^{∞} h(η) z(τ − η) dη
      = h(τ) ∗ z(τ) = h(τ) ∗ h*(−τ) ∗ RX(τ)
For an ergodic process, RY(0) yields the average power of a filtered random signal, i.e.

P = RY(0) = ∫_{−∞}^{∞} |H(f)|² GX(f) df.

RY(τ) = h(τ) ∗ h*(−τ) ∗ RX(τ)

GY(f) = |H(f)|² GX(f)

P = RY(0) = ∫_{−∞}^{∞} |H(f)|² GX(f) df
2. Stationarity: If the input X(t) is WSS, then the output Y (t) is also WSS. In
addition, if X(t) is SSS, so is Y (t).
3. PDF: In general, it is difficult to determine the PDF of the output, even when the
PDF of the input signal is completely specified.
However, when the input is a Gaussian process, the output is also a Gaussian pro-
cess. The statistics of the output process is fully determined by the mean function
and the autocovariance function.
Example 2.7 : Consider the LTI system whose input x(t) and output y(t) are related
by
y(t) = x(t) + ax(t − T ).
The corresponding impulse response is h(t) = δ(t) + a δ(t − T), with frequency response

H(f) = 1 + a e^{−j2πfT}.
† Power Spectrum Estimation
One problem that is often encountered in practice is to estimate the PSD of a random
signal x(t) when only a segment of length T of a single sample function is available.
Let us consider a single sample function of an ergodic random process x(t). Its
truncated version is given as
xT(t) = x(t) for |t| ≤ T/2, and xT(t) = 0 otherwise.
Since xT(t) is strictly time-limited, its Fourier transform XT(f) exists. An alternative definition of the PSD of X(t) is stated as

GX(f) = lim_{T→∞} (1/T) E[|XT(f)|²].

A “natural” estimate of the PSD can be found by simply omitting the limiting and expectation operations to obtain

ĜX(f) = (1/T) |XT(f)|².
This spectral estimate is called a periodogram. In practice, spectral estimation based on
a periodogram consists of the following steps.
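As a rough illustration, a periodogram-based estimate might be computed as follows (a minimal sketch, not a prescribed procedure; the test signal, segment length, and averaging over segments are all assumptions made only for this example):

import numpy as np

fs, T_seg, n_seg = 1000.0, 1.0, 200          # sampling rate, segment length, number of segments
N = int(T_seg * fs)
rng = np.random.default_rng(7)

N0 = 1.0                                     # test signal: white noise with PSD N0/2 = 0.5
psd_sum = np.zeros(N)
for _ in range(n_seg):
    x = rng.standard_normal(N) * np.sqrt(fs * N0 / 2)   # samples of band-limited white noise
    X = np.fft.fft(x) / fs                               # approximate Fourier transform of x_T(t)
    psd_sum += np.abs(X) ** 2 / T_seg                    # periodogram (1/T)|X_T(f)|^2

psd_avg = psd_sum / n_seg                    # averaging over segments reduces the variance
print("estimated PSD near f = 0:", psd_avg[:3])          # should be close to N0/2 = 0.5
print("theoretical white level :", N0 / 2)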
Thermal Noise
• It is due to random motion of electrons in any conductor.
• It has a Gaussian PDF according to the central limit theorem. Note that the number
of electrons involved is quite large, with their motions statistically independent from
one another.
• The noise voltage (in V) across the terminals of a resistor with resistance R (in Ohm) has a Gaussian PDF with the mean and variance, denoted by µ and σ², given by

µ = 0,    σ² = (2(πkT)²/(3h)) R,

where k is Boltzmann's constant ≈ 1.38 × 10⁻²³ J/K, h is Planck's constant ≈ 6.63 × 10⁻³⁴ J·s, and T is the absolute temperature in K.
The noise PSD (in V²/Hz) is

GN(f) = 2Rh|f| / (e^{h|f|/kT} − 1).

For |f| ≪ kT/h, e^{h|f|/kT} ≈ 1 + h|f|/kT, yielding

GN(f) ≈ 2kTR

For T = 273–373 K (0–100 degrees Celsius), kT/h ≈ 10¹² Hz. Thus, for all practical purposes, the PSD of thermal noise is constant.
Shot Noise
• It is associated with the discrete flow of charge carriers across semiconductor junc-
tions or with the emission of electrons from a cathode.
• Shot noise has a Gaussian PDF with zero mean according to the central limit
theorem.
• Shot noise has a constant power spectrum, with the noise level being independent
of the temperature.
White Noise
Several types of noise sources have constant PSDs over a wide range of frequencies. Such
a noise source is called white noise by the analogy to white light which contains all the
frequencies of visible light.
In general, we write the PSD of white noise as

GN(f) = N0/2,

where the factor 1/2 is included to indicate that half of the power is associated with positive frequencies while the other half is associated with negative frequencies, so that the power passed by an ideal bandpass filter with bandwidth B is given by N0 B. The corresponding autocorrelation function is

RN(τ) = (N0/2) δ(τ).
NOTE: White noise is not necessarily Gaussian noise. Conversely, Gaussian noise is not
necessarily white noise.
Consider now a sample of a zero-mean white noise process N(t). The variance of the sample is

E[|N(t)|²] = RN(0) = ∞.
Therefore, white noise has infinite power.
Filtered White Noise
Consider now filtered white noise corresponding to the ideal band-limited filter, i.e.

GN(f) = N0/2 for |f| ≤ B, and GN(f) = 0 otherwise,

with the corresponding autocorrelation function

RN(τ) = F⁻¹{GN(f)} = N0 B sinc(2Bτ).

It follows that a sample of band-limited zero-mean white Gaussian noise is a zero-mean Gaussian random variable with the variance

E[|N(t)|²] = RN(0) = N0 B.

More generally, white noise passed through a filter with frequency response H(f) has the output PSD

GN(f) = (N0/2) |H(f)|²,

and is referred to as colored noise, which is again due to the analogy to colored light containing only some frequencies of visible light.
The average output noise power of an arbitrary LPF with frequency response H(f) is then RN(0) = (N0/2) ∫_{−∞}^{∞} |H(f)|² df. On the other hand, the average output power of an ideal LPF with the same DC gain |H(0)| and bandwidth B is given by

RN(0) = N0 B |H(0)|².

By equating these two noise powers, we can define the noise equivalent bandwidth of an arbitrary LPF as

BN = ∫_{−∞}^{∞} |H(f)|² df / (2|H(0)|²)
Thus, the noise equivalent bandwidth of an arbitrary LPF is defined as the bandwidth
of the ideal LPF that produces the same output power from identical white noise input.
The definition can also be extended to bandpass filters in the same fashion.
Example 2.8 : Consider an LPF based on the RC circuit with the frequency response

H(f) = 1/(1 + jf/f0),

where f0 = 1/(2πRC). Since H(0) = 1,

BN = (1/2) ∫_{−∞}^{∞} |H(f)|² df = (1/2) ∫_{−∞}^{∞} df/(1 + f²/f0²) = ∫_{0}^{∞} df/(1 + f²/f0²).

Setting z = f/f0 yields

BN = f0 ∫_{0}^{∞} dz/(1 + z²) = f0 · arctan z|_{0}^{∞} = (π/2) f0 = 1/(4RC).

The corresponding noise power is

RN(0) = N0 BN = 4kTR/(4RC) = kT/C. ¤
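The closed-form result can be confirmed by numerical integration of |H(f)|² (a minimal sketch; the R and C values are arbitrary illustrative choices):

import numpy as np
from scipy.integrate import quad

R, C = 1.0e3, 1.0e-6                       # 1 kOhm, 1 uF (illustrative values)
f0 = 1.0 / (2 * np.pi * R * C)

def H2(f):
    # |H(f)|^2 for the first-order RC lowpass filter
    return 1.0 / (1.0 + (f / f0) ** 2)

half_integral, _ = quad(H2, 0.0, np.inf)   # integral of |H(f)|^2 over f >= 0
BN = 2 * half_integral / (2 * H2(0.0))     # BN = int |H|^2 df / (2 |H(0)|^2)

print("numerical BN     :", BN)
print("formula 1/(4RC)  :", 1.0 / (4 * R * C))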
Handout 11
4. The channel adds zero-mean white Gaussian noise N (t) to the transmitted signal.
In addition, this noise is uncorrelated with the transmitted signal.
The first three assumptions indicate that the channel is distortionless over the message bandwidth W. The response Y(t) of an AWGN channel to a transmitted signal X(t) is given by
given by
Y (t) = aX(t − td ) + N (t).
If the transmitted signal X(t) has average power SX and message bandwidth W and
the receiver includes an ideal lowpass filter with bandwidth of exactly W , the power of
the channel output is given by
E[|Y(t)|²] = E[|aX(t − td)|²] + E[|N(t)|²] = a² SX + N0 W.

The resultant signal-to-noise ratio (SNR) is

SNR = a² SX / (N0 W).
Matched Filter
Consider the problem of detecting whether a pulse of a known shape p(t) has been transmitted or not. Thus, the output of the AWGN channel is given either by

Y(t) = a p(t − td) + N(t)

or by

Y(t) = N(t).

Without loss of generality, assume that a = 1 and td = 0 in what follows. Assume that the receiver structure in figure 3.1 is used.
Figure 3.1: Receiver structure for pulse detection.
In addition, we base our decision about the presence or the absence of p(t) on the
output Ỹ (t) of the receiver filter h(t) sampled at time instant t = t0 . More specifically, if
the pulse is present,
Ỹ(t0) = ∫_{−∞}^{∞} h(t0 − τ) Y(τ) dτ
      = ∫_{−∞}^{∞} h(t0 − τ) p(τ) dτ + ∫_{−∞}^{∞} h(t0 − τ) N(τ) dτ
      = p̃(t0) + Ñ(t0),
where p̃(t) and Ñ (t) are the filtered pulse and the filtered noise respectively.
The key question here is as follows: What is the optimal impulse response of the
receiver filter? Intuitively, the optimal filter (in terms of minimizing the decision error
probability) should maximize the SNR at t = t0 . This SNR can be written as
SNR = |p̃(t0)|² / E[|Ñ(t0)|²] = |∫_{−∞}^{∞} H(f) P(f) e^{j2πf t0} df|² / ∫_{−∞}^{∞} |H(f)|² GN(f) df.

By the Schwarz inequality, the SNR is maximized by choosing

H(f) = K (P*(f)/GN(f)) e^{−j2πf t0},

where K is an arbitrary constant. Note that the optimal filter amplifies frequency components of the signal and attenuates frequency components of the noise.

In the case of white noise with GN(f) = N0/2, we can write

H(f) = K (P*(f)/(N0/2)) e^{−j2πf t0}.
Thus, the optimal impulse response is determined by the pulse shape. In particular, the optimal impulse response, h(t) = (2K/N0) p*(t0 − t), is matched to the pulse shape. For this reason, this optimal filter is called a matched filter.

Assume that the pulse p(t) is nonzero only in the interval [0, T]. Substituting the expression of h(t) into the expression for Ỹ(t0) yields

Ỹ(t0) = ∫_{−∞}^{∞} h(t0 − τ) Y(τ) dτ = (2K/N0) ∫_{0}^{T} p*(τ) Y(τ) dτ.

Note that Ỹ(t0) is the correlation between the transmitted pulse p(t) and the received signal Y(t). The result indicates that we can implement this optimal filtering as a correlation receiver, as illustrated in figure 3.2.
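A discrete-time sketch of the correlation receiver is given below, assuming a rectangular pulse, white Gaussian noise, and an arbitrary sampling step (all illustrative choices); it checks that the output SNR approaches the well-known matched-filter value 2E/N0:

import numpy as np

rng = np.random.default_rng(8)
dt, T, N0 = 1e-3, 0.1, 0.02
t = np.arange(0.0, T, dt)
p = np.ones_like(t)                              # assumed pulse shape (rectangular)
E = np.sum(np.abs(p) ** 2) * dt                  # pulse energy

def correlate_once(pulse_present):
    noise = rng.standard_normal(len(t)) * np.sqrt(N0 / (2 * dt))   # discrete-time white noise
    y = p + noise if pulse_present else noise
    return np.sum(np.conj(p) * y) * dt           # correlation receiver output (up to a constant)

outputs1 = np.array([correlate_once(True) for _ in range(20_000)])
outputs0 = np.array([correlate_once(False) for _ in range(20_000)])

snr = (outputs1.mean() - outputs0.mean()) ** 2 / outputs0.var()
print("simulated SNR:", snr, "  theory 2E/N0:", 2 * E / N0)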
Handout 12
In binary detection, the received signal is given either by Y(t) = p1(t) + N(t) or by Y(t) = p2(t) + N(t), where the pulses p1(t) and p2(t) are nonzero only in [0, T], have equal energy E = ∫_{0}^{T} |p1(t)|² dt = ∫_{0}^{T} |p2(t)|² dt, and have correlation coefficient ρ = (1/E) ∫_{0}^{T} p1(t) p2*(t) dt.

Figure 2.3: Receiver structure for binary detection.
Given that p1(t) is transmitted, the outputs of the two matched filters are

Z1 = ∫_{0}^{T} Y(t) p1*(t) dt = ∫_{0}^{T} p1(t) p1*(t) dt + ∫_{0}^{T} N(t) p1*(t) dt = E + N1,

Z2 = ∫_{0}^{T} Y(t) p2*(t) dt = ∫_{0}^{T} p1(t) p2*(t) dt + ∫_{0}^{T} N(t) p2*(t) dt = ρE + N2.
In addition, given that p1 (t) is transmitted, a detection error occurs when Z2 > Z1 , or
equivalently Z = Z2 − Z1 > 0.
When N (t) is zero-mean Gaussian noise, N1 and N2 are jointly Gaussian random
variables. We compute the mean and the variance of N1 below.
E[N1] = E[∫_{0}^{T} N(t) p1*(t) dt] = ∫_{0}^{T} E[N(t)] p1*(t) dt = 0,

var[N1] = E[(∫_{0}^{T} N(τ) p1*(τ) dτ)(∫_{0}^{T} N(η) p1*(η) dη)*]
        = ∫_{0}^{T} ∫_{0}^{T} E[N(τ) N*(η)] p1*(τ) p1(η) dτ dη
        = ∫_{0}^{T} ∫_{0}^{T} (N0/2) δ(τ − η) p1*(τ) p1(η) dτ dη
        = (N0/2) ∫_{0}^{T} p1*(η) p1(η) dη = E N0/2.
Similarly, N2 has mean 0 and variance EN0 /2. The covariance between N1 and N2 is
computed as follows.
"µZ ¶ µZ T ¶∗ #
T
E[N1 N2∗ ] = E N (τ )p∗1 (τ )dτ N (η)p∗2 (η)dη
0 0
Z T Z T
= E[N (τ )N ∗ (η)]p∗1 (τ )p2 (η)dτ dη
0 0
Z T Z T
N0
= δ(τ − η)p∗1 (τ )p2 (η)dτ dη
0 0 2
Z T
N0 EN0
= p∗1 (η)p2 (η)dη = ρ∗ .
2 0 2
where the last equality follows from a practical assumption that p1 (t) and p2 (t) are real,
and hence ρ is real.
Therefore, given that p1(t) is transmitted, the probability of detection error is

Pr{Z > 0 | p1(t)} = Pr{ (Z − (ρ − 1)E)/√((1 − ρ)E N0) > (1 − ρ)E/√((1 − ρ)E N0) | p1(t) } = Q(√((1 − ρ)E/N0)),

where (Z − (ρ − 1)E)/√((1 − ρ)E N0) is a zero-mean unit-variance Gaussian random variable.
By symmetry, given that p2(t) is transmitted, the probability of detection error is the same. In summary, the overall bit error probability is

Pe = Q(√((1 − ρ)E/N0)).
For ergodic systems, the bit error probability is equal to the bit error rate (BER), which is a key performance measure of a digital communication system. The BER is the average fraction of bits received in error in an indefinitely long sequence of transmitted bits.
It is customary to describe the performance of a digital communication system by
plotting the BER against the ratio Eb /N0 , where Eb is the average energy used per
transmitted bit. Significant comparisons among different communication systems are
possible using such plots. As a specific example, we shall compare two scenarios of
binary detection discussed above.
1. Antipodal signals: ρ = −1

2. Orthogonal signals: ρ = 0

It follows that

Pe^antipodal = Q(√(2E/N0)),    Pe^orthogonal = Q(√(E/N0))
Figure 2.4 indicates that antipodal signals perform better compared with orthogonal signals. In particular, for the same BER, orthogonal signals require 3 dB more energy per bit than antipodal signals. In other words, there is a 3-dB penalty in terms of the signal energy.
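The two error probabilities can be tabulated directly from the Q function, for example in Python (a minimal sketch; the grid of Eb/N0 values is an arbitrary choice):

import numpy as np
from scipy.special import erfc

def qfunc(x):
    return 0.5 * erfc(x / np.sqrt(2.0))

EbN0_dB = np.arange(0, 15, 2)
EbN0 = 10.0 ** (EbN0_dB / 10.0)
pe_antipodal = qfunc(np.sqrt(2.0 * EbN0))   # rho = -1
pe_orthogonal = qfunc(np.sqrt(EbN0))        # rho = 0

for db, pa, po in zip(EbN0_dB, pe_antipodal, pe_orthogonal):
    print(f"Eb/N0 = {db:2d} dB   antipodal = {pa:.2e}   orthogonal = {po:.2e}")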
† Appendix: Wiener-Khinchine Theorem
Recall that the PSD is defined as the Fourier transform of the autocorrelation function.
The Wiener-Khinchine theorem states that the PSD is indeed equal to the following
quantity, which was previously mentioned as an alternative definition of the PSD.
GX(f) = lim_{T→∞} (1/T) E[|XT(f)|²],
Figure 2.4: BER (on a log10 scale) versus Eb/N0 in dB for orthogonal and antipodal signals.
where XT(f) is the Fourier transform of the truncation xT(t) of the sample function x(t), i.e.

xT(t) = x(t) for |t| ≤ T/2, and xT(t) = 0 otherwise.
GX(f) = F{RX(τ)},

provided that ∫_{−∞}^{∞} |τ RX(τ)| dτ < ∞.
(We use x(t) to denote a random process in this section; the capital X(f) is already used to refer to its Fourier transform.)
Figure 2.5 shows the region of integration in the domain set of (ξ, τ). By changing the order of integration, we can write

E[|XT(f)|²] = ∫_{0}^{T} ∫_{−T/2}^{T/2−τ} RX(τ) e^{−j2πfτ} dξ dτ + ∫_{−T}^{0} ∫_{−T/2−τ}^{T/2} RX(τ) e^{−j2πfτ} dξ dτ
            = ∫_{0}^{T} (T − τ) RX(τ) e^{−j2πfτ} dτ + ∫_{−T}^{0} (T + τ) RX(τ) e^{−j2πfτ} dτ
            = ∫_{−T}^{T} (T − |τ|) RX(τ) e^{−j2πfτ} dτ.
Figure 2.5: Region of integration for the derivation of the Wiener-Khinchine theorem.
Dividing by T, we can write

(1/T) E[|XT(f)|²] = ∫_{−T}^{T} RX(τ) e^{−j2πfτ} dτ − (1/T) ∫_{−T}^{T} |τ| RX(τ) e^{−j2πfτ} dτ.

Since ∫ f(τ) dτ ≤ ∫ |f(τ)| dτ for real f(τ), the real part and the imaginary part of the last integral are at most ∫_{−∞}^{∞} |τ RX(τ)| dτ, which is assumed to be finite. It follows that the limit of the last term as T → ∞ is equal to zero, yielding GX(f) = F{RX(τ)} as desired. ¤
A random process X(t) is wide-sense cyclostationary with period T0 if

X̄(t + nT0) = X̄(t)  and  RX(t + nT0, t − τ + nT0) = RX(t, t − τ)

for all t, τ ∈ R and n ∈ Z. In other words, for any τ ∈ R, X̄(t) and RX(t, t − τ) as functions of t are periodic with period T0.

For a wide-sense cyclostationary process X(t), the PSD is given by

GX(f) = F{⟨RX(t, t − τ)⟩},

where

⟨RX(t, t − τ)⟩ = (1/T0) ∫_{−T0/2}^{T0/2} RX(t, t − τ) dt
Example 2.9 : Consider Y(t) = X(t) cos(2πf0 t), where X(t) is WSS. We compute the mean and the autocorrelation function of Y(t) as follows.

Ȳ(t) = X̄ cos(2πf0 t),    RY(t, t − τ) = RX(τ) cos(2πf0 t) cos(2πf0 (t − τ)).

Since Ȳ(t) and RY(t, t − τ) are periodic with period T0 = 1/f0, it follows that Y(t) is wide-sense cyclostationary.

In addition,

⟨RY(t, t − τ)⟩ = (1/2) RX(τ) cos(2πf0 τ),

yielding the PSD

GY(f) = (1/4) GX(f − f0) + (1/4) GX(f + f0).

Note that this is the same PSD as for Y(t) = X(t) cos(2πf0 t + Φ), where Φ is uniformly distributed in [0, 2π]. ¤