Module 1
Basic Probability
[Block diagram: message → Transmitter → Channel → Receiver → estimated message]
In order to study this basic scenario, we will break up the problem into modules.
[Block diagram: Txmt msg → Encoder (Tx3) → Waveform mapping (Tx2) → Up-converter (Tx1) → Channel → Down-converter (Rx1) → Baseband front-end (Rx2) → Decoder (Rx3) → Estimated msg]
Modules
Encoding to handle errors: error-correcting codes and decoding (Tx3 and Rx3 refined).
Basic Probability
Formal probability theory starts with the concept of a probability space $(\Omega, \mathcal{F}, P)$ consisting of
(i) An abstract space $\Omega$, the sample space, containing all distinguishable elementary outcomes of an experiment.
(ii) An event space $\mathcal{F}$, consisting of a collection of subsets of $\Omega$ which we consider to be the possible events to which we want to assign probabilities. We require an algebraic structure that we will specify soon.
(iii) A probability measure $P$, which assigns a number between 0 and 1 to every event in $\mathcal{F}$. It has to satisfy some axioms that we will explain later.
A natural question to ask is whether we can make $\mathcal{F}$ the set of all subsets of $\Omega$. This is in fact a valid choice when $\Omega$ is a finite space, i.e. $|\Omega| < \infty$. However, for infinite spaces like $[0, 1]$ this is much more tricky due to the uncountably infinite number of outcomes in $\Omega$ in this case.
In the beginning we will introduce notions for a discrete probability space, where $|\Omega| < \infty$ or $|\Omega|$ is countably infinite.
Probability Measure
A probability measure $P$ satisfies the following axioms:
1. $P(F) \geq 0$, $\forall F \in \mathcal{F}$.
2. $P(\Omega) = 1$.
3. If $F_1, \ldots, F_n$ are disjoint events in $\mathcal{F}$, then
$$P\left(\bigcup_{i=1}^{n} F_i\right) = \sum_{i=1}^{n} P(F_i).$$
4. If $F_i$, $i = 1, 2, \ldots$ are disjoint events in $\mathcal{F}$, then
$$P\left(\bigcup_{i=1}^{\infty} F_i\right) = \sum_{i=1}^{\infty} P(F_i).$$
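As a quick sanity check (a minimal sketch added for illustration, not from the original slides), the following Python snippet builds a measure on a small finite $\Omega$ from point masses and verifies the axioms numerically; all names are illustrative.

    omega = {1, 2, 3, 4, 5, 6}          # a fair die
    mass = {w: 1/6 for w in omega}      # point masses on elementary outcomes

    def P(event):
        # Probability measure induced by the point masses.
        return sum(mass[w] for w in event)

    # Axiom 1 (non-negativity) and axiom 2 (P(Omega) = 1).
    assert all(P({w}) >= 0 for w in omega)
    assert abs(P(omega) - 1) < 1e-12

    # Axioms 3/4 (additivity over disjoint events).
    F1, F2 = {1, 2}, {5}
    assert abs(P(F1 | F2) - (P(F1) + P(F2))) < 1e-12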
Notes
$P$ is a measure in the same sense as mass, length, area and volume, each of which satisfies axioms 1, 3, 4.
But $P$ is special since it is bounded, due to axiom 2.
Probability also differs through aspects like conditioning and independence, which do not occur in this analogy.
Sometimes (read as: mostly) it is convenient to write the probability of an elementary outcome $s \in \Omega$ as $P(s)$ instead of $P(\{s\})$. This abuse of notation is (very) common.
Notes
For discrete spaces $\mathcal{F}$ can be taken to be the set of all subsets of $\Omega$, which is also called the power set of $\Omega$.
$\mathcal{F}$ need not necessarily be the entire power set; it only needs to contain the events to which we wish to assign $P(\cdot)$.
Examples:
A fair die: $\Omega = \{1, \ldots, 6\}$ with $P(\{i\}) = \frac{1}{6}$; an event with three outcomes, e.g. the odd faces, has probability $3 \cdot \frac{1}{6} = \frac{1}{2}$.
The number of packets arriving in a time interval $T$:
$$P(\{k\}) = \frac{(\lambda T)^k e^{-\lambda T}}{k!} \quad \text{for } k = 0, 1, 2, \ldots \text{ and } \lambda > 0.$$
This is the Poisson probability distribution (we will define it in a while), and $\lambda$ is the average number of packets per unit time.
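As a quick numerical check (an illustrative sketch, not from the slides; $\lambda$ and $T$ below are arbitrary example values): the Poisson masses are non-negative and sum to 1, so they define a valid probability measure on the countable space $\Omega = \{0, 1, 2, \ldots\}$.

    from math import exp, factorial

    lam, T = 2.5, 1.0   # example rate (packets per unit time) and interval

    def poisson_pmf(k):
        return (lam * T) ** k * exp(-lam * T) / factorial(k)

    # Truncated sum; the tail beyond k = 200 is negligible here.
    total = sum(poisson_pmf(k) for k in range(200))
    print(total)   # ~1.0, consistent with axiom 2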
Some immediate properties:
$P(A^c) = 1 - P(A)$, since $A \cup A^c = \Omega$, $A \cap A^c = \emptyset$, and so $P(A) + P(A^c) = P(\Omega) = 1$.
$P(A \cup B) = P(A) + P(B \setminus A) \leq P(A) + P(B)$, and $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
Union bound: $P\left(\bigcup_{i=1}^{n} A_i\right) \leq \sum_{i=1}^{n} P(A_i)$.
If $A_1, \ldots, A_n$ partition $\Omega$, then $P(B) = \sum_{i=1}^{n} P(B \cap A_i)$.
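These identities are easy to sanity-check on a finite space; the sketch below (added for illustration) reuses the uniform measure on a fair die:

    omega = {1, 2, 3, 4, 5, 6}
    P = lambda event: len(event) / len(omega)   # uniform measure

    A, B = {1, 2, 3}, {3, 4}
    assert abs(P(A | B) - (P(A) + P(B) - P(A & B))) < 1e-12   # inclusion-exclusion
    assert P(A | B) <= P(A) + P(B)                            # union bound
    assert abs(P(omega - A) - (1 - P(A))) < 1e-12             # complement rule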
Independence
$\{F_i\}_{i=1}^{k}$ are independent (or mutually independent) if for any distinct subcollection $\{F_{l_i}\}_{i=1}^{m}$ we have
$$P\left(\bigcap_{i=1}^{m} F_{l_i}\right) = \prod_{i=1}^{m} P(F_{l_i}).$$
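Mutual independence requires the product rule for every subcollection, not just pairs. The classic two-coin example below (added for illustration, not from the slides) is pairwise independent but fails the triple product rule:

    from itertools import product

    # Two fair coin flips; uniform measure on the 4 outcomes.
    omega = set(product('HT', repeat=2))
    P = lambda ev: len(ev) / len(omega)

    F1 = {w for w in omega if w[0] == 'H'}     # first flip is H
    F2 = {w for w in omega if w[1] == 'H'}     # second flip is H
    F3 = {w for w in omega if w[0] == w[1]}    # both flips equal

    # Pairwise independent:
    for A, B in [(F1, F2), (F1, F3), (F2, F3)]:
        assert abs(P(A & B) - P(A) * P(B)) < 1e-12

    # ...but not mutually independent:
    print(P(F1 & F2 & F3), P(F1) * P(F2) * P(F3))   # 0.25 vs 0.125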
Conditional Probability
Suppose we have a probability space $(\Omega, \mathcal{F}, P)$ and an observer has told us that an event $G$ has occurred. Thus the observer knows the outcome of the experiment, but we do not know which element of $G$ has occurred. Now we need to calculate the probability that another event $F$ has occurred. We denote this conditional probability as $P(F|G)$. For a fixed $G$, we want to be able to compute $P(F|G)$, $\forall F \in \mathcal{F}$. Therefore, we define a new probability measure $P_G(F) = P(F|G)$ on $(\Omega, \mathcal{F})$. We derive this measure from first principles.
Since we are told that $G$ has occurred, the outcome $\omega \in G$, and so $P_G$ must assign zero measure to $G^c$, i.e. we should have
$$P(G^c|G) = 0, \qquad P(G|G) = 1.$$
Therefore
$$P(F|G) = P(F \cap G|G) + P(F \cap G^c|G) = P(F \cap G|G) \quad \text{[total probability law]}.$$
Next, intuitively we should expect that relative probabilities should not change given the knowledge that $G$ has indeed occurred. For example, if $F \subseteq G$ and $H \subseteq G$ and if $P(F) = 2P(H)$, then we expect $P(F|G) = 2P(H|G)$. Hence if $P(F \cap G) = 2P(H \cap G)$ we expect $P(F|G) = 2P(H|G)$.
Therefore we would want
$$\frac{P(F \cap G|G)}{P(H \cap G|G)} = \frac{P(F|G)}{P(H|G)} = \frac{P(F \cap G)}{P(H \cap G)}, \quad \forall F, H, G \in \mathcal{F}.$$
Hence if we take $H = G$, we get
$$\frac{P(F|G)}{P(G|G)} = \frac{P(F \cap G)}{P(G)}, \quad \text{or} \quad P(F|G) = \frac{P(F \cap G)}{P(G)},$$
which is the definition of conditional probability. Note that we need $P(G) > 0$ for this to work, i.e., the conditioning event should have a non-zero probability measure.
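As a small worked check (added for illustration): on a fair die, conditioning on "the outcome is even" rescales probabilities within $G$ exactly as the formula prescribes:

    omega = {1, 2, 3, 4, 5, 6}
    P = lambda ev: len(ev) / len(omega)   # fair die

    G = {2, 4, 6}            # observer reports: the outcome is even
    F = {4, 5, 6}            # event of interest: the outcome is at least 4

    cond = P(F & G) / P(G)   # P(F | G) by the definition above
    print(cond)              # (2/6) / (3/6) = 2/3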
Note that if two events are independent, i.e. $P(F \cap G) = P(F)P(G)$, then we have
$$P(F|G) = \frac{P(F \cap G)}{P(G)} = P(F),$$
i.e. we have the intuitive result that the probability of $F$ is unaffected by the knowledge that $G$ has occurred. This is what we expect from the notion of independence.
Note that it is not useful to define independence as $P(F) = P(F|G)$, since it requires $P(G) > 0$ and is therefore slightly less general.
If $P(F) > 0$ and $P(G) > 0$, then
$$P(G|F) = \frac{P(F \cap G)}{P(F)} = \frac{P(F|G)P(G)}{P(F)}.$$
Bayes Rule
Let $A_1, A_2, \ldots, A_n$ be events with non-zero probability measure which partition the space $\Omega$, i.e.,
$$\bigcup_{i=1}^{n} A_i = \Omega, \qquad A_i \cap A_j = \emptyset \text{ for } i \neq j.$$
Moreover, for any event $B$ with $P(B) > 0$,
$$P(A_j|B) = \frac{P(B|A_j)P(A_j)}{\sum_{i=1}^{n} P(B \cap A_i)} = \frac{P(B|A_j)P(A_j)}{\sum_{i=1}^{n} P(B|A_i)P(A_i)}, \quad j = 1, 2, \ldots, n.$$
Example: a binary channel.
[Channel transition diagram: inputs $0, 1$ with $P(1) = 0.8$ (so $P(0) = 0.2$); transition probabilities $P(0|0) = 0.9$, $P(1|0) = 0.1$, $P(0|1) = 0.025$, $P(1|1) = 0.975$,]
where the pair $(i, j)$ denotes (tx bit, rx bit). Since $\Omega$ is finite, $\mathcal{F} = 2^{\Omega}$, the power set. Moreover, we just need to assign the elementary outcomes a probability measure:
$$P(\{(i, j)\}) = P(\text{input } i) \cdot P(\text{output } j \,|\, \text{input } i), \quad i, j \in \{0, 1\}.$$
For example, by Bayes rule,
$$P(\text{input } 0 \,|\, \text{output } 0) = \frac{0.2 \times 0.9}{0.2 \times 0.9 + 0.8 \times 0.025} = \frac{0.18}{0.20} = 0.9.$$
Now suppose the same channel is used twice to send independent bits of information. Let $E_i$ be the event that the $i$-th transmission is received erroneously, $i = 1, 2$; we are interested in the event that both transmissions are received erroneously, i.e., the event $E_1 \cap E_2$. Since independent channel uses cause independent errors as well, we would expect
$$P(E_1 \cap E_2) = P(E_1)P(E_2),$$
where $P(E_1) = P(0)P(1|0) + P(1)P(0|1) = 0.2 \times 0.1 + 0.8 \times 0.025 = 0.04$.
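A sketch (added for illustration) verifying the posterior and the two-use error probability numerically:

    p_in = {0: 0.2, 1: 0.8}                # P(input i)
    p_ch = {(0, 0): 0.9, (0, 1): 0.1,      # p_ch[(i, j)] = P(output j | input i)
            (1, 0): 0.025, (1, 1): 0.975}

    # Posterior by Bayes rule: P(input 0 | output 0).
    p_out0 = sum(p_in[i] * p_ch[(i, 0)] for i in (0, 1))   # total probability
    print(p_in[0] * p_ch[(0, 0)] / p_out0)                 # 0.18 / 0.20 = 0.9

    # Single-use error probability and the two-use joint error.
    p_err = p_in[0] * p_ch[(0, 1)] + p_in[1] * p_ch[(1, 0)]
    print(p_err, p_err ** 2)               # P(E1) = 0.04, P(E1 ∩ E2) = 0.0016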
Random Variables
Random variables are just functions mapping elementary outcomes in the abstract space to a real number. Here is where the power of the probability space formalism becomes more apparent, since we can define
$$X : \Omega \to \mathbb{R} \quad \text{as a random variable } X(\omega),$$
and similarly define a random vector as
$$\mathbf{X} : \Omega \to \mathbb{R}^d, \quad \text{i.e., } \mathbf{X}(\omega) \in \mathbb{R}^d \text{ is a random vector},$$
and a discrete-time random process as a function as well:
$$X : \Omega \times \mathbb{Z} \to \mathbb{R}, \quad \text{i.e., } X_t(\omega) \in \mathbb{R},\ t \in \mathbb{Z}.$$
Notes:
The cumulative distribution function, $F_X(x) = P\{\omega : X(\omega) \leq x\}$, always exists for a random variable. From it we can define a probability density function (which may or may not be well defined): since
$$F_X(x + \Delta x) - F_X(x) = P\{x < X(\omega) \leq x + \Delta x\},$$
we set
$$f_X(a) = \left.\frac{dF_X(x)}{dx}\right|_{x = a}, \quad \text{if definable}.$$
Example:
Suppose we toss a coin 4 times, each toss being independent. If the coin is unbiased, then each toss can be H or T, each with probability $\frac{1}{2}$. Let $X$ = the number of H's occurring in the trial. Clearly $X \in \{0, 1, 2, 3, 4\}$.
$$\Pr\{X = 0\} = \frac{1}{16}$$
$$\Pr\{X = 1\} = \binom{4}{1} \frac{1}{16} = \frac{1}{4}$$
$$\Pr\{X = 2\} = \binom{4}{2} \frac{1}{16} = \frac{6}{16} = \frac{3}{8}$$
$$\Pr\{X = 3\} = \binom{4}{3} \frac{1}{16} = \frac{1}{4}$$
$$\Pr\{X = 4\} = \frac{1}{16}$$
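These are just the Binomial(4, 1/2) probabilities; a quick check (added for illustration):

    from math import comb

    n, p = 4, 0.5
    pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
    print(pmf)   # {0: 0.0625, 1: 0.25, 2: 0.375, 3: 0.25, 4: 0.0625}
    assert abs(sum(pmf.values()) - 1) < 1e-12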
Joint PMF: for discrete random variables $X$ and $Y$,
$$P_{XY}(x, y) = \Pr\{X = x, Y = y\} = P(\{\omega : X(\omega) = x, Y(\omega) = y\}), \quad \forall x \in \mathcal{X}, y \in \mathcal{Y},$$
where as a shorthand we define $\mathcal{X} = X(\Omega)$, $\mathcal{Y} = Y(\Omega)$. Note that
$$\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P_{XY}(x, y) = 1.$$
Marginal PMF:
$$P_X(x) = \sum_{y \in \mathcal{Y}} P_{XY}(x, y), \quad \forall x \in \mathcal{X}.$$
Conditional PMF:
$$P_{X|Y}(x|y) = \frac{P_{XY}(x, y)}{P_Y(y)}, \quad \forall x \in \mathcal{X},\ y : P_Y(y) > 0.$$
Chain rule: $P_{XY}(x, y) = P_X(x) \, P_{Y|X}(y|x)$.
Independence: $P_{XY}(x, y) = P_X(x) P_Y(y)$, $\forall x \in \mathcal{X}, y \in \mathcal{Y}$; equivalently, $P_{Y|X}(y|x) = P_Y(y)$, $\forall y \in \mathcal{Y}$, $x : P_X(x) > 0$.
Bayes rule for PMFs: given $P_{Y|X}$ and $P_X$, we can find
$$P_{X|Y}(x|y) = \frac{P_{Y|X}(y|x) \, P_X(x)}{P_Y(y)} = \frac{P_{Y|X}(y|x) \, P_X(x)}{\sum_{x' \in \mathcal{X}} P_{Y|X}(y|x') \, P_X(x')}.$$
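A compact sketch (added for illustration; the joint table is an arbitrary example) of marginalization, conditioning, the chain rule, and Bayes rule for PMFs:

    # Joint pmf P_XY as a dict keyed by (x, y).
    P_XY = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}
    xs = {x for x, _ in P_XY}
    ys = {y for _, y in P_XY}

    P_X = {x: sum(P_XY[(x, y)] for y in ys) for x in xs}   # marginals
    P_Y = {y: sum(P_XY[(x, y)] for x in xs) for y in ys}
    P_YgX = {(y, x): P_XY[(x, y)] / P_X[x] for x in xs for y in ys}

    # Chain rule: P_XY(x, y) = P_X(x) * P_{Y|X}(y|x).
    assert all(abs(P_XY[(x, y)] - P_X[x] * P_YgX[(y, x)]) < 1e-12
               for x in xs for y in ys)

    # Bayes rule for PMFs: P_{X|Y}(x|y) from P_{Y|X} and P_X.
    x0, y0 = 1, 0
    bayes = P_YgX[(y0, x0)] * P_X[x0] / sum(P_YgX[(y0, x)] * P_X[x] for x in xs)
    assert abs(bayes - P_XY[(x0, y0)] / P_Y[y0]) < 1e-12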
For continuous random variables, the joint CDF is
$$F_{XY}(x, y) = \Pr\{X \leq x, Y \leq y\} = P(\{\omega : X(\omega) \leq x, Y(\omega) \leq y\}),$$
and the joint pdf (when it exists) is
$$f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x \, \partial y} = \lim_{\Delta x, \Delta y \to 0} \frac{\Pr\{x < X \leq x + \Delta x,\ y < Y \leq y + \Delta y\}}{\Delta x \, \Delta y}.$$
Conditional pdf:
$$f_{Y|X}(y|x) = \frac{f_{XY}(x, y)}{f_X(x)} \quad \text{if } f_X(x) > 0.$$
Bayes rule for pdfs:
$$f_{X|Y}(x|y) = \frac{f_{Y|X}(y|x) \, f_X(x)}{\int_{-\infty}^{\infty} f_{XY}(u, y) \, du}.$$
Expectation or Mean
The expectation of a random variable $X$ is defined as
$$E[X] = \begin{cases} \sum_{x} x \, P_X(x) & \text{discrete random variables} \\ \int_{x} x \, f_X(x) \, dx & \text{continuous random variables} \end{cases}$$
and, more generally,
$$E[g(X)] = \begin{cases} \sum_{x} g(x) \, P_X(x) & \text{discrete} \\ \int_{x} g(x) \, f_X(x) \, dx & \text{continuous} \end{cases}$$
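For instance (added for illustration), a discrete sum and a crude Riemann sum both follow the definition directly:

    from math import comb

    # Discrete: E[X] for the coin example above, X ~ Binomial(4, 1/2).
    pmf = {k: comb(4, k) / 16 for k in range(5)}
    print(sum(x * p for x, p in pmf.items()))   # 2.0

    # Continuous: E[X] for X ~ Uniform[0, 1], i.e. f_X(x) = 1 on [0, 1].
    N = 100_000
    dx = 1.0 / N
    print(sum((i + 0.5) * dx * 1.0 * dx for i in range(N)))   # ~0.5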
Variance
The second moment is
$$E[X^2] = \begin{cases} \sum_{x} x^2 \, P_X(x) & \text{discrete} \\ \int_{x} x^2 \, f_X(x) \, dx & \text{continuous} \end{cases}$$
Variance of $X$:
$$\mathrm{Var}(X) = E\left[(X - E[X])^2\right] = E\left[X^2 + (E[X])^2 - 2X E[X]\right] = E[X^2] + (E[X])^2 - 2(E[X])^2,$$
i.e., $\mathrm{Var}(X) = E[X^2] - (E[X])^2$.
The standard deviation is $\sigma_X = \sqrt{\mathrm{Var}(X)}$.
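The identity $\mathrm{Var}(X) = E[X^2] - (E[X])^2$ is easy to check numerically (added for illustration), again on the Binomial(4, 1/2) pmf:

    from math import comb

    pmf = {k: comb(4, k) / 16 for k in range(5)}
    EX = sum(x * p for x, p in pmf.items())
    EX2 = sum(x**2 * p for x, p in pmf.items())

    var_def = sum((x - EX)**2 * p for x, p in pmf.items())   # E[(X - E[X])^2]
    var_id = EX2 - EX**2                                     # E[X^2] - (E[X])^2
    print(var_def, var_id)   # both 1.0 (= n p (1 - p))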
For two random variables we can similarly define $E[g(X, Y)]$. If $X$ and $Y$ are independent, then
$$E[XY] = \int x \, f_X(x) \, dx \int y \, f_Y(y) \, dy, \quad \text{or} \quad E[XY] = E[X] \, E[Y],$$
so that
$$\mathrm{Cov}(X, Y) = E[XY] - E[X] \, E[Y] = 0.$$
Hence independence $\Rightarrow$ uncorrelated. But uncorrelated $\not\Rightarrow$ independence!
Example
Let $X, Y \in \{-2, -1, 1, 2\}$ with
$$P_{XY}(x, y) = \begin{cases} \frac{2}{5} & (x, y) = (1, -1), (-1, 1) \\[2pt] \frac{1}{10} & (x, y) = (-2, -2), (2, 2) \\[2pt] 0 & \text{otherwise} \end{cases}$$
Then
$$E[X] = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} x \, P_{XY}(x, y) = 1 \cdot \frac{2}{5} + (-1) \cdot \frac{2}{5} + 2 \cdot \frac{1}{10} + (-2) \cdot \frac{1}{10} = 0,$$
and similarly $E[Y] = 0$. Moreover,
$$E[XY] = (-1) \cdot \frac{2}{5} + (-1) \cdot \frac{2}{5} + 4 \cdot \frac{1}{10} + 4 \cdot \frac{1}{10} = 0$$
$\Rightarrow$ uncorrelated! Yet $X$ and $Y$ are not independent: e.g. $P_{XY}(1, 1) = 0$ while $P_X(1) P_Y(1) = \frac{2}{5} \cdot \frac{2}{5} > 0$.
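A direct check of this example (added for illustration, assuming the joint pmf as reconstructed above):

    P_XY = {(1, -1): 2/5, (-1, 1): 2/5, (-2, -2): 1/10, (2, 2): 1/10}

    EX  = sum(x * p for (x, y), p in P_XY.items())
    EY  = sum(y * p for (x, y), p in P_XY.items())
    EXY = sum(x * y * p for (x, y), p in P_XY.items())
    print(EX, EY, EXY)   # all 0, so Cov(X, Y) = 0: uncorrelated

    # ...but not independent: P_XY(1, 1) = 0 while P_X(1) * P_Y(1) > 0.
    PX1 = sum(p for (x, y), p in P_XY.items() if x == 1)
    PY1 = sum(p for (x, y), p in P_XY.items() if y == 1)
    print(P_XY.get((1, 1), 0), PX1 * PY1)   # 0 vs 0.16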
Vectors, which are almost everywhere column (tall) vectors by default, will be denoted in bold face, e.g. $\mathbf{X}$ (a random vector) or $\mathbf{v}$ (a vector).
Matrices are also bold face, e.g. the covariance matrix $\mathbf{K}_X$; their dimensions, specified when they are defined, will determine their usage.