
EE132A - Introduction to Communication Systems

Module 1
Basic Probability

Winter Quarter 2015

An outline of this module...

Basic communication block diagram


Motivate course outline using this block diagram
Basic probability: sample space, events & probability
Independence
Conditional probability
Random variables: pdfs, cdfs
Expectation (mean) and variance
Expectation of multiple random variables
Uncorrelated vs. Independence

Basic communication block diagram


Figure: Msg → Transmitter → Channel → Receiver → Estimated message.

Messages are generated by a source, e.g., video, speech, or a web-page request.


The transmitter maps the message into a signal suitable for transmission over the channel.
The channel is either wired (telephone line, TV cable, etc.) or wireless (riding on
electromagnetic waves).
The receiver maps the received signal, i.e., the transmitted signal after transformation by the channel,
into an estimate of the message.
In digital communication the message is a bit sequence; this is the focus of
this course. In analog communication the message is an analog waveform, a
topic we will touch upon only briefly in the class.

In order to study this basic scenario, we will break up the problem into modules.

Figure: Breakdown of the transmitter and receiver into modules.
Transmitter: Txmt msg → Encoder (Tx3) → Waveform mapping (Tx2) → Up-converter (Tx1) → Channel.
Receiver: Channel → Down-converter (Rx1) → Baseband front-end (Rx2) → Decoder (Rx3) → Estimated msg.

Modules

Receiver design for discrete-time channels: Rx3.

Waveform design & baseband front-end: Tx2, Rx2.

Sequences of transmitted messages: transmit pulse shaping: Tx2 refined.

Encoding to handle errors: error-correcting codes & decoding: Tx3 and Rx3 refined.

Bandpass communication: up-converter & down-converter: Tx1 & Rx1.

ISI channels & OFDM: modification of the transmitter waveform & Rx baseband front-end.

Application to wireless systems, DSL systems.

Basic Probability

Formal probability theory starts with the concept of a probability space consisting
of
(i) An abstract space Ω, the sample space, containing all distinguishable elementary
outcomes of an experiment.
(ii) An event space F, consisting of a collection of subsets of Ω that we
consider to be the possible events to which we want to assign probabilities. We
require an algebraic structure that we will specify soon.
(iii) A probability measure P, which assigns a number between 0 and 1 to every
event in F. It has to satisfy some axioms that we will explain later.

A natural question to ask is whether we can make F the set of all subsets of Ω.
This is in fact a valid choice when Ω is a finite space, i.e., |Ω| < ∞.

However, for infinite spaces like Ω = [0, 1] this is much trickier, due to the
uncountably infinite number of outcomes in Ω for this case.
To begin with, we will introduce notions for a discrete probability space, where
|Ω| < ∞ or |Ω| is countably infinite.

Probability Measure
A probability measure P satisfies the following axioms:

1st: P(F) ≥ 0   ∀ F ∈ F.

2nd: P(Ω) = 1, i.e., the probability of everything is 1.

3rd: If F_i, i = 1, ..., n are disjoint, then P(∪_{i=1}^n F_i) = Σ_{i=1}^n P(F_i).

If N(A), N(B) are frequencies of disjoint events A, B, then
P(A ∪ B) = (N(A) + N(B))/N = P(A) + P(B), where N is the total number
of trials. This is the frequentist way of interpreting this property.

4th: If F_i, i = 1, 2, ... are disjoint, then P(∪_{i=1}^∞ F_i) = Σ_{i=1}^∞ P(F_i).

Notes

P is a measure in the same sense as mass, length, area and volume, each of
which satisfies axioms 1, 3, 4.
But P is special since it is bounded, due to axiom 2.
Probability is also different due to aspects like conditioning and
independence, which do not occur in this analogy.
Sometimes (read: mostly) it is convenient to write the probability of an
outcome ω ∈ Ω as P(ω) instead of P({ω}). This abuse of notation is (very)
common.

Examples: discrete probability spaces

A sample space Ω is said to be discrete if it is countable.

Flipping a coin: Ω = {H, T}, F = {∅, {H}, {T}, Ω}.

Rolling a die: Ω = {1, 2, 3, 4, 5, 6}, F = set of all subsets of Ω.
Number of packets arriving at a node in a communication network in an
interval [0, T]: Ω = N = {0, 1, 2, ...}.
Flipping a coin till the first head occurs: Ω = {H, TH, TTH, ...}.

Notes

For discrete spaces, F can be taken to be the set of all subsets of Ω, which is
also called the power set of Ω.
F need not necessarily be the entire power set.

A probability measure can be defined by assigning probabilities to the individual
outcomes: P(ω) ≥ 0 ∀ ω ∈ Ω and Σ_{ω∈Ω} P(ω) = 1. For any event A,
P(A) = Σ_{ω∈A} P(ω).

Examples:

Suppose we take a die-rolling experiment with Ω = {1, 2, 3, 4, 5, 6} and
assign P(ω) = 1/6, ∀ ω ∈ Ω. Clearly Σ_ω P(ω) = 1. If A = the event that we get
an even die roll, then A = {2, 4, 6} and
P(A) = Σ_{ω∈A} P(ω) = 3 × 1/6 = 1/2.

For the number of packets arriving in time [0, T],
P({k}) = (λT)^k e^{−λT} / k!   for k = 0, 1, 2, ... and λ > 0.

This is the Poisson probability distribution (which we will define formally in a while), and λ is
the average number of packets per unit time.
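
As a quick numerical sanity check of this formula, the short Python sketch below evaluates the Poisson pmf for an assumed rate λ = 2 packets per unit time and interval length T = 3 (illustrative values only), and verifies that the probabilities sum to (nearly) 1.

import math

def poisson_pmf(k, lam, T):
    """P({k}) = (lam*T)^k * exp(-lam*T) / k!  -- probability of k arrivals in [0, T]."""
    mu = lam * T
    return mu**k * math.exp(-mu) / math.factorial(k)

lam, T = 2.0, 3.0                               # assumed rate and interval (illustrative)
pmf = [poisson_pmf(k, lam, T) for k in range(50)]
print(sum(pmf))                                 # ~1.0, as required of a probability measure
print(poisson_pmf(6, lam, T))                   # probability of exactly 6 arrivals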

Basic Probability Laws

P(A^c) = 1 − P(A):
A ∪ A^c = Ω and A ∩ A^c = ∅,
therefore P(A) + P(A^c) = P(Ω) = 1.

If A ⊆ B then P(A) ≤ P(B):
B = A ∪ (B \ A),
therefore P(B) = P(A) + P(B \ A) ≥ P(A).

P(A ∪ B) = P(A) + P(B) − P(A ∩ B), hence
P(A ∪ B) ≤ P(A) + P(B). Union of events bound:
P(∪_i A_i) ≤ Σ_i P(A_i).

Law of total probability: Let A_1, A_2, ... be events that partition Ω, i.e.,
A_i ∩ A_j = ∅ for i ≠ j and ∪_i A_i = Ω. Then for any event B,
P(B) = Σ_i P(B ∩ A_i).

This is sometimes useful in finding probabilities of sets.

Independence

Given a probability space (Ω, F, P), two events F, G ∈ F are said to be
independent of each other if P(F ∩ G) = P(F)P(G). A collection of events
{F_i}_{i=1}^k is independent (or mutually independent) if for any distinct subcollection
{F_{l_i}}_{i=1}^m we have
P(∩_{i=1}^m F_{l_i}) = Π_{i=1}^m P(F_{l_i}).

Note: It is not sufficient to only have P(∩_{i=1}^k F_i) = Π_{i=1}^k P(F_i), since it does not
imply the independence of a subcollection.
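
To illustrate the note above, here is a small numerical check using one standard construction (the specific events are our own illustrative choice, not from the slides): on Ω = {1, ..., 8} with equally likely outcomes, the product rule holds for the full triple of events but fails for a pair, so the collection is not mutually independent.

from itertools import combinations
from fractions import Fraction

omega = set(range(1, 9))                                  # uniform outcomes, P({w}) = 1/8
F1, F2, F3 = {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 6, 7, 8}     # assumed illustrative events

def P(event):
    return Fraction(len(event & omega), len(omega))

# Product rule for the full collection holds ...
print(P(F1 & F2 & F3) == P(F1) * P(F2) * P(F3))           # True: 1/8 = (1/2)^3
# ... but fails for subcollections, so the events are not mutually independent.
for A, B in combinations([F1, F2, F3], 2):
    print(P(A & B), P(A) * P(B))                          # e.g. P(F1 ∩ F2) = 3/8 ≠ 1/4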

Conditional Probability
Suppose we have a probability space (Ω, F, P) and an observer has told us that
an event G has occurred. Thus the observer knows the outcome of the
experiment, but we do not know which element of G has occurred. We now want
the probability that another event F has occurred. We denote this
conditional probability by P(F|G). For a fixed G, we want to be able to
compute P(F|G) for all F ∈ F. Therefore, we define a new probability measure
P_G(F) = P(F|G) on (Ω, F). We derive this measure from first principles.
Since we are told that G has occurred, ω ∈ G, and so P_G must assign zero
measure to G^c, i.e., we should have
P(G^c|G) = 0 and P(G|G) = 1.
Therefore, by the total probability law,
P(F|G) = P(F ∩ Ω|G) = P(F ∩ [G ∪ G^c]|G)
       = P(F ∩ G|G) + P(F ∩ G^c|G) = P(F ∩ G|G),
since F ∩ G^c ⊆ G^c has zero conditional measure.

Next, intuitively we should expect that relative probabilities do not change
given the knowledge that G has indeed occurred. For example, if F ⊆ G and H ⊆ G
and P(F) = 2P(H), then we expect P(F|G) = 2P(H|G). Hence if
P(F ∩ G) = 2P(H ∩ G) we expect P(F|G) = 2P(H|G).
Therefore we would want
P(F ∩ G|G) / P(H ∩ G|G) = P(F ∩ G) / P(H ∩ G), i.e.,
P(F|G) / P(H|G) = P(F ∩ G) / P(H ∩ G)   ∀ F, H, G ∈ F.
Hence, if we take H = Ω, we get
P(F|G) / P(G|G) = P(F ∩ G) / P(G), or
P(F|G) = P(F ∩ G) / P(G),

which is the definition of conditional probability. Note that we need P(G) > 0 for
this to work, i.e., the conditioning event should have a non-zero probability measure.
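
A minimal Monte-Carlo sketch of this definition, reusing the earlier die-rolling space; the events G = {even roll} and F = {roll ≥ 4} are our own illustrative choices. The empirical frequency of F among the trials where G occurred approaches P(F ∩ G)/P(G) = (2/6)/(3/6) = 2/3.

import random

random.seed(0)
N = 200_000
count_G, count_FG = 0, 0
for _ in range(N):
    roll = random.randint(1, 6)        # uniform die roll
    if roll % 2 == 0:                  # event G = {2, 4, 6} occurred
        count_G += 1
        if roll >= 4:                  # event F = {4, 5, 6} occurred as well
            count_FG += 1

print(count_FG / count_G)              # empirical P(F|G), close to 2/3
print((2/6) / (3/6))                   # P(F ∩ G) / P(G) = 2/3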

Note that if two events are independent, i.e., P(F ∩ G) = P(F)P(G), then we
have
P(F|G) = P(F ∩ G) / P(G) = P(F),
i.e., we have the intuitive result that the probability of F is unaffected by the
knowledge that G has occurred. This is what we expect from the notion of
independence.
Note that it is less useful to define independence as P(F) = P(F|G), since this
requires P(G) > 0 and is therefore slightly less general.
If P(F) > 0 and P(G) > 0, then
P(G|F) = P(F ∩ G) / P(F) = P(F|G)P(G) / P(F).

Bayes Rule
Let A_1, A_2, ..., A_n be events with non-zero probability measure which partition the
space Ω, i.e.,
∪_{i=1}^n A_i = Ω and A_i ∩ A_j = ∅ for i ≠ j, and let B ∈ F be any other event.

Then, by the law of total probability,
P(B) = Σ_{i=1}^n P(B ∩ A_i) = Σ_{i=1}^n P(B|A_i)P(A_i).

Moreover,
P(A_j|B) = P(A_j ∩ B) / P(B) = P(B|A_j)P(A_j) / P(B).

Hence we get the Bayes rule,
P(A_j|B) = P(B|A_j)P(A_j) / Σ_{i=1}^n P(B|A_i)P(A_i),   j = 1, 2, ..., n.

Clearly this works for a countably infinite number of events as well.

Example - Binary Communication Channel


Figure: Model for a noisy binary communication channel. Input priors: P(0) = 0.2, P(1) = 0.8. Transition probabilities: P(0|0) = 0.9, P(1|0) = 0.1, P(1|1) = 0.975, P(0|1) = 0.025.


Ω = {(0,0), (0,1), (1,0), (1,1)},

where the pair (i, j) denotes (tx bit, rx bit). Since Ω is finite, F = 2^Ω, the
power set. Moreover, we just need to assign the elementary outcomes a probability
measure,
P({(i, j)}) = P(input = i) · P(output = j | input = i),   i, j ∈ {0, 1}.

Let us define the events

A = {0 is sent} = {(0,0)} ∪ {(0,1)}

B = {0 is received} = {(0,0)} ∪ {(1,0)}.

Then the a-posteriori probability that 0 was sent is given by [using Bayes rule]
P(A|B) = P(A)P(B|A) / [P(A)P(B|A) + P(A^c)P(B|A^c)]
       = (0.2 × 0.9) / (0.2 × 0.9 + 0.8 × 0.025) = 0.18 / 0.2 = 0.9.

Note that P(A|B) = 0.9 > P(A) = 0.2.
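
A minimal Python sketch of this posterior computation, using the channel numbers above (the variable names are our own):

# Channel model from the figure: prior on the transmitted bit and transition probabilities.
p_send0 = 0.2                        # P(A) = P(0 sent)
p_rx0_given_send0 = 0.9              # P(B|A)
p_rx0_given_send1 = 0.025            # P(B|A^c)

# Law of total probability: P(B) = P(A)P(B|A) + P(A^c)P(B|A^c)
p_rx0 = p_send0 * p_rx0_given_send0 + (1 - p_send0) * p_rx0_given_send1

# Bayes rule: P(A|B) = P(A)P(B|A) / P(B)
posterior = p_send0 * p_rx0_given_send0 / p_rx0
print(p_rx0, posterior)              # 0.2 and 0.9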

Now suppose the same channel is used twice to send independent bits of
information. Let

A_i = {0 is sent on i-th transmission},   A_i^c = {1 is sent on i-th transmission},

B_i = {0 is received on i-th transmission},   B_i^c = {1 is received on i-th transmission},
where i = 1, 2.
Now P(A_1 ∩ A_2) = P(A_1)P(A_2) = 0.2 × 0.2 due to independence. Suppose we are
interested in the error event E_i, i.e.,

E_i = {in i-th use of the channel, tx bit ≠ rx bit} = (A_i ∩ B_i^c) ∪ (A_i^c ∩ B_i),   i = 1, 2,

and we are interested in the event that both transmissions are received
erroneously, i.e., the event E_1 ∩ E_2.

Since the channel uses cause independent errors as well, we would expect
P(E_1 ∩ E_2) = P(E_1)P(E_2).
P(E_1) = P[(A_1 ∩ B_1^c) ∪ (A_1^c ∩ B_1)] = P(A_1 ∩ B_1^c) + P(A_1^c ∩ B_1)
       = P(A_1)P(B_1^c|A_1) + P(A_1^c)P(B_1|A_1^c)
       = 0.2 × 0.1 + 0.8 × 0.025 = 0.04
P(E_1 ∩ E_2) = P(E_1)P(E_2) = (0.04)^2 = 16 × 10^{-4} = 1.6 × 10^{-3}.
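
As a rough check, the sketch below simulates two independent uses of this channel and estimates the probability that both transmissions are in error; the empirical value should be close to 1.6 × 10^{-3} (the number of trials and seed are our own choices).

import random

random.seed(1)

def channel_use():
    """Simulate one use of the channel; return (tx bit, rx bit)."""
    tx = 0 if random.random() < 0.2 else 1             # P(0 sent) = 0.2
    flip = 0.1 if tx == 0 else 0.025                   # P(1|0) = 0.1, P(0|1) = 0.025
    rx = tx ^ 1 if random.random() < flip else tx
    return tx, rx

N = 1_000_000
both_errors = 0
for _ in range(N):
    tx1, rx1 = channel_use()
    tx2, rx2 = channel_use()
    if tx1 != rx1 and tx2 != rx2:                      # event E1 ∩ E2
        both_errors += 1

print(both_errors / N)                                 # ≈ 1.6e-3 = P(E1)P(E2)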

Random Variables
Random variables are just functions mapping elementary outcomes in the abstract
space Ω to real numbers. Here the power of the probability-space
formalism becomes more apparent, since we can define
X : Ω → R as a random variable, X(ω) ∈ R,
and similarly define a random vector as
X : Ω → R^d, i.e., X(ω) ∈ R^d is a random vector,
and a discrete-time random process as a function as well,
X_t : Ω → R, i.e., X_t(ω) ∈ R, t ∈ Z.

Notes:

The random variable/vector/process inherits its probability measure from
the underlying probability space (Ω, F, P).

We distinguish the probability measure of the r.v., which is P_X : B(R) → [0, 1],
from the underlying measure P of the probability space.
We need an important condition on the r.v.: the pre-images of events must be in F, that
is,
X^{-1}(A) = {ω : X(ω) ∈ A} ∈ F,
where the sets A are open sets in R (or their unions, intersections, and
complements).
In general P_X is defined for all F_X ∈ F_X, i.e., the event space of subsets of
X(Ω) ⊆ R.

Description of Random Variables:

A cumulative distribution function (c.d.f.) is
F_X(a) = P_X(X ∈ (−∞, a]) = Pr{X(ω) ≤ a}.

Now from a cumulative distribution function (which always exists for a random
variable), we can define a probability density function (which may or may not be
well defined):
F_X(x + Δx) − F_X(x) = Pr{x < X(ω) ≤ x + Δx}.

If F_X is smooth, then we can write, using the mean-value theorem, for Δx
sufficiently small,
F_X(x + Δx) − F_X(x) ≈ f_X(x) Δx.
A probability density function (p.d.f.) is
f_X(a) = dF_X(x)/dx |_{x=a},   if definable.

If the random variable is discrete, then a more suitable description is the
probability mass function (p.m.f.),
P_X(x) = Pr{X = x}.

Example:
Suppose we toss a coin 4 times, each toss being independent. If the coin is
unbiased, then each toss can be H or T, each with probability 1/2. Let X =
number of H's occurring in the 4 tosses. Clearly X ∈ {0, 1, 2, 3, 4}.

Pr{X = 0} = 1/16
Pr{X = 1} = C(4,1) · 1/16 = 4 · 1/16 = 1/4
Pr{X = 2} = C(4,2) · 1/16 = 6 · 1/16 = 3/8
Pr{X = 3} = C(4,3) · 1/16 = 4 · 1/16 = 1/4
Pr{X = 4} = 1/16
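
A short sketch verifying these numbers with the binomial pmf Pr{X = k} = C(4, k)(1/2)^4:

from math import comb

n, p = 4, 0.5
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
print(pmf)                   # {0: 0.0625, 1: 0.25, 2: 0.375, 3: 0.25, 4: 0.0625}
print(sum(pmf.values()))     # 1.0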

Joint, marginal and conditional pmfs

Now we will carefully define and use joint, marginal and conditional probability
mass functions. Let X, Y be discrete random variables, i.e., their ranges are finite or
countably infinite.

P_XY(x, y) = Pr{X = x, Y = y} = P({ω : X(ω) = x, Y(ω) = y}),   ∀ x ∈ 𝒳, y ∈ 𝒴,

where as shorthand we define 𝒳 = X(Ω), 𝒴 = Y(Ω). Note that
Σ_{x∈𝒳} Σ_{y∈𝒴} P_XY(x, y) = 1.

Marginal pmf: P_X(x). Use the law of total probability:
P_X(x) = Σ_{y∈𝒴} P_XY(x, y),   ∀ x ∈ 𝒳.

Conditional pmf: P_{X|Y=y}(x|y) is defined as
P_{X|Y=y}(x|y) = P_XY(x, y) / P_Y(y),   ∀ x ∈ 𝒳, y : P_Y(y) ≠ 0.

Chain rule:
P_XY(x, y) = P_{X|Y}(x|y)P_Y(y) = P_{Y|X}(y|x)P_X(x).

Independence:
P_XY(x, y) = P_X(x)P_Y(y)   ∀ (x, y) ∈ 𝒳 × 𝒴,
which is equivalent to
P_{Y|X}(y|x) = P_Y(y)   ∀ y ∈ 𝒴, x : P_X(x) ≠ 0.
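
A small sketch of these definitions, reusing the binary-channel numbers from the earlier example (X = transmitted bit, Y = received bit) to build the joint pmf via the chain rule and then recover marginals and conditionals:

# Joint pmf from the chain rule: P_XY(x, y) = P_X(x) * P_{Y|X}(y|x)
P_X = {0: 0.2, 1: 0.8}
P_Y_given_X = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.025, (1, 1): 0.975}   # keys are (y, x)

P_XY = {(x, y): P_X[x] * P_Y_given_X[(y, x)] for x in (0, 1) for y in (0, 1)}
assert abs(sum(P_XY.values()) - 1.0) < 1e-12

# Marginal pmf of Y by the law of total probability
P_Y = {y: sum(P_XY[(x, y)] for x in (0, 1)) for y in (0, 1)}

# Conditional pmf P_{X|Y}(x|y)
P_X_given_Y = {(x, y): P_XY[(x, y)] / P_Y[y] for x in (0, 1) for y in (0, 1)}
print(P_Y)                    # {0: 0.2, 1: 0.8}
print(P_X_given_Y[(0, 0)])    # 0.9, matching the earlier posterior P(A|B)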

Bayes Rule for pmfs

Given P_X(x) and P_{Y|X}(y|x) for every (x, y) ∈ 𝒳 × 𝒴, we can find
P_{X|Y}(x|y) = P_{Y|X}(y|x)P_X(x) / P_Y(y) = P_{Y|X}(y|x)P_X(x) / Σ_{x'∈𝒳} P_{Y|X}(y|x')P_X(x').

Continuous-valued random variables and Bayes rule

F_XY(x, y) = Pr{X ≤ x, Y ≤ y} = P({ω : X(ω) ≤ x, Y(ω) ≤ y})

is the joint cdf of X, Y.

Independence: X, Y are independent if ∀ x, y, F_XY(x, y) = F_X(x)F_Y(y).
Joint pdf: If F_XY(x, y) is differentiable in x, y then
f_XY(x, y) = ∂²F_XY(x, y)/∂x∂y = lim_{Δx→0, Δy→0} Pr{x < X ≤ x + Δx, y < Y ≤ y + Δy} / (Δx Δy).

We then define the conditional pdf as
f_{Y|X}(y|x) = f_XY(x, y) / f_X(x),   if f_X(x) ≠ 0.

Bayes rule:
f_{X|Y}(x|y) = f_{Y|X}(y|x)f_X(x) / ∫_{−∞}^{∞} f_XY(u, y) du.

Expectation or Mean
Expectation of a random variable X is defined as

E[X] = Σ_x x P_X(x)        for discrete random variables,
E[X] = ∫ x f_X(x) dx       for continuous random variables.

Fundamental Theorem of Expectation:

For a function g(X) of a random variable,

E[g(X)] = Σ_x g(x) P_X(x)        (discrete),
E[g(X)] = ∫ g(x) f_X(x) dx       (continuous).

Variance

Second moment (mean-square) of X:

E[X²] = Σ_x x² P_X(x)      (discrete),
E[X²] = ∫ x² f_X(x) dx     (continuous).

Variance of X:

Var(X) = E[(X − E[X])²]
       = E[X² + (E[X])² − 2X E[X]]
       = E[X²] + (E[X])² − 2(E[X])²,
i.e.,  Var(X) = E[X²] − (E[X])².

The standard deviation is σ_X = √Var(X), i.e., Var(X) = σ_X².
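
A brief numerical illustration using the 4-coin-toss pmf from earlier; computing E[X] and Var(X) from the pmf via the Fundamental Theorem of Expectation gives 2 and 1 for this example:

from math import comb, sqrt

n, p = 4, 0.5
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

mean = sum(x * px for x, px in pmf.items())                # E[X]
second_moment = sum(x**2 * px for x, px in pmf.items())    # E[X^2]
var = second_moment - mean**2                              # Var(X) = E[X^2] - (E[X])^2
print(mean, var, sqrt(var))                                # 2.0, 1.0, 1.0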

Expectation involving two random variables, Covariance

E[g(X, Y)] = ∫∫ g(x, y) f_XY(x, y) dx dy

(the extension of the Fundamental Theorem of Expectation to two random variables).

Covariance: Cov(X, Y) = E[(X − E[X])(Y − E[Y])]. X and Y are
uncorrelated if Cov(X, Y) = 0.

Uncorrelated versus independence

If Cov(X, Y) = 0, then X and Y are uncorrelated. Is there any relationship
between independence and uncorrelatedness? Suppose X ⊥ Y, where ⊥ denotes
independence. Then

E[XY] = ∫∫ xy f_XY(x, y) dx dy = ∫∫ xy f_X(x) f_Y(y) dx dy
      = (∫ x f_X(x) dx)(∫ y f_Y(y) dy),

or E[XY] = E[X] E[Y], so

Cov(X, Y) = E[XY] − E[X] E[Y] = 0.

Hence independence ⇒ uncorrelated.
But uncorrelated ⇏ independence!

Example
Let X, Y ∈ {−2, −1, 1, 2}, with joint pmf

P_XY(x, y) = 2/5    for (x, y) = (−1, −1), (1, 1),
P_XY(x, y) = 1/10   for (x, y) = (−2, 2), (2, −2),
P_XY(x, y) = 0      otherwise.

Clearly X and Y are not independent. However,

E[X] = Σ_{x,y} x P_XY(x, y) = 1 · (2/5) + (−1) · (2/5) + 2 · (1/10) + (−2) · (1/10) = 0,

and similarly E[Y] = 0.

E[XY] = 1 · (2/5) + 1 · (2/5) + (−4) · (1/10) + (−4) · (1/10) = 0,

so Cov(X, Y) = E[XY] − E[X] E[Y] = 0: X and Y are uncorrelated!
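
A quick numerical check of this example, using the joint pmf written above:

pmf = {(-1, -1): 2/5, (1, 1): 2/5, (-2, 2): 1/10, (2, -2): 1/10}

EX  = sum(x * p for (x, y), p in pmf.items())
EY  = sum(y * p for (x, y), p in pmf.items())
EXY = sum(x * y * p for (x, y), p in pmf.items())
print(EX, EY, EXY - EX * EY)          # 0.0 0.0 0.0 (up to float rounding) -> uncorrelated

# Not independent: P_XY(-1, -1) differs from P_X(-1) * P_Y(-1)
PX_m1 = sum(p for (x, y), p in pmf.items() if x == -1)    # P(X = -1) = 0.4
PY_m1 = sum(p for (x, y), p in pmf.items() if y == -1)    # P(Y = -1) = 0.5
print(pmf[(-1, -1)], PX_m1 * PY_m1)   # 0.4 vs 0.2 -> not equal, hence not independent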

Notations for the course


A random variable will always be denoted by an upper-case letter, e.g., X,
while a realization of it will be denoted by a lower-case letter, e.g., X = x.

P_X and f_X denote the pmf and the pdf of a discrete and a continuous random
variable, respectively.

Pr(·) or P(·) will denote the probability measure of an event or a subset
of the sample space.

The probability of error will be denoted by P_e.

Expectations will be denoted by E; a subscript indicating the random
variable over which the expectation is taken may sometimes be given for
clarity, e.g., E_{X|Y}[X|Y].

Vectors, which are almost everywhere column (tall) vectors by default, will
be denoted bold-face, e.g., X (a random vector) or v (a vector).

Matrices are also bold-face, e.g., the covariance matrix K_X; their dimensions
will be specified when they are defined.
