
Substitute Class - Lecture Notes

Abhijit Kiran Valluri


February 10, 2016

1 Birthday Problem

The birthday problem, also often referred to as the birthday paradox, is the problem of determining the
probability that at least one pair of individuals, in a group of n, have the same birthday. Using the
pigeonhole principle, one can readily see that the probability reaches 1 when we have a group of 367
people or more (considering the extra day, February 29, in a leap year). But what is surprising to most
people, at first glance, is that the probability exceeds 1/2 with a group of just 23 people. Most
people typically guess that at least 100 people would be needed before this happens. In fact, there is no
paradox here at all; the confusion is merely due to the counter-intuitiveness of the result.

1.1 Calculating the probability

Before we calculate the probability, we should note that we make the following assumptions: (1) all the
days of the year are equally likely to be the birthday of a person; (2) we have a representative sample
of the population. That is, we have not cherry-picked the group of n people such that any particular
day, week, or month is more likely to be a birthday within the group. This, together with the first
assumption, implies that we can simply assume that each person is equally likely to have his/her birthday
on any day of the year.
Then, let us define the event A = {At least two people have the same birthday}. Now, consider the
complement of this event, A^c. The birthdays of all the individuals are mutually independent events.
Hence, for A^c to occur, we can reason as follows: person 1 can have their birthday on any of the 366 days,
as this does not cause any clashes; person 2 can have their birthday on any of the remaining 365 days;
similarly, person 3 can have their birthday on any of the remaining 364 days, and so on.
Then, the probability of event A^c is

P(A^c) = \frac{366}{366} \cdot \frac{365}{366} \cdots \frac{367 - n}{366},    (1)

when there are n people. When n = 23, P(A^c) \approx 0.4937 and hence P(A) > 0.5. Therefore, we only need
about 23 people in the group to have more than a 50% chance of two people sharing a birthday. If
there are 70 people, then this probability is about 99.9%.
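As a quick check, the probability in (1) is easy to evaluate numerically. Below is a minimal Python sketch (not part of the original notes), using the same 366-day model as the text:

    def birthday_collision_prob(n, days=366):
        """P(at least two of n people share a birthday), assuming uniform, independent birthdays."""
        p_no_collision = 1.0
        for k in range(n):
            p_no_collision *= (days - k) / days  # person k+1 avoids the k birthdays taken so far
        return 1.0 - p_no_collision

    print(birthday_collision_prob(23))  # ~0.5063, i.e. P(A^c) ~ 0.4937
    print(birthday_collision_prob(70))  # ~0.999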

2 Binary Symmetric Communication Channel

A binary symmetric channel (BSC) is a very common communication channel used in information theory
and communication theory courses as a learning tool and to develop some fundamental theorems of
communication theory. Essentially, during a single channel use, the channel takes in a binary input {0, 1}
and outputs a binary bit, with a small probability of flipping the bit, given by the crossover probability.
Q: How do we infer the input from the output in such a channel?

Figure 1: A binary symmetric channel, with crossover probability p


A: Well, the input and output are often represented by random variables, X and Y respectively. Given
the above figure and the crossover probability, we can compute the conditional probability of Y given X
as:
P(Y = 0 \mid X = 0) = P(Y = 1 \mid X = 1) = 1 - p,    (2)

P(Y = 0 \mid X = 1) = P(Y = 1 \mid X = 0) = p.    (3)

Using Bayes' theorem, we can get the probability distribution of X given Y as follows:

P(X = x \mid Y = 0) = \frac{P(Y = 0 \mid X = x)\,P(X = x)}{P(Y = 0 \mid X = x)\,P(X = x) + P(Y = 0 \mid X = x')\,P(X = x')},    (4)

and likewise for the other probabilities. Now, the distribution of X is called the a priori probability
distribution, which gives us the distribution of the symbols at the input. The above-defined probability
distribution of X given Y is the a posteriori probability. This idea is the basis for the so-called MAP
(maximum a posteriori) estimate that is used in estimation and detection theory.
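To make this concrete, here is a small Python sketch (not from the notes) of the posterior in (4) and the resulting MAP decision. The function names and the values p = 0.1 and q = P(X = 1) = 0.5 are illustrative choices:

    def posterior_x_given_y(y, p, q):
        """Return {x: P(X = x | Y = y)} for the BSC, via Bayes' theorem, eq. (4)."""
        prior = {0: 1 - q, 1: q}          # a priori distribution of X
        def likelihood(y, x):             # P(Y = y | X = x), eqs. (2)-(3)
            return (1 - p) if y == x else p
        evidence = sum(likelihood(y, x) * prior[x] for x in (0, 1))
        return {x: likelihood(y, x) * prior[x] / evidence for x in (0, 1)}

    def map_estimate(y, p=0.1, q=0.5):
        post = posterior_x_given_y(y, p, q)
        return max(post, key=post.get)    # input with the largest a posteriori probability

    print(posterior_x_given_y(0, p=0.1, q=0.5))  # {0: 0.9, 1: 0.1}
    print(map_estimate(0))                       # 0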

3 Examples of PMFs, PDFs and CDFs

Let us take a look at some common probability mass functions (PMFs) and probability density functions
(PDFs) and their corresponding cumulative distribution functions (CDFs).

3.1 Binomial

X \sim B(n, p)    (5)

A binomial random variable, X, is obtained when we perform n Bernoulli trials, i.e., n repetitions of a
success/failure experiment such as a coin toss. The probability of success in each trial is p.
We then count the total number of successes in the n trials. Clearly, this is a discrete random variable.
We see that the n Bernoulli trials are independent. Hence, the probability of X = k is simply the
probability of there being exactly k successes in n trials. We can choose these k successes to be any of
the n trials in \binom{n}{k} ways, and hence we get the following

P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}.    (6)
Clearly, the sum of the above terms from k = 0 to n is 1.
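A short Python sketch (illustrative, not from the notes) evaluates the PMF in (6) and confirms numerically that it sums to 1; n = 10 and p = 0.3 are arbitrary choices:

    from math import comb

    def binomial_pmf(k, n, p):
        return comb(n, k) * p**k * (1 - p)**(n - k)  # eq. (6)

    n, p = 10, 0.3
    print(sum(binomial_pmf(k, n, p) for k in range(n + 1)))  # 1.0, up to floating-point error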

3.2 Geometric

X \sim \text{Geo}(p)    (7)

The geometric random variable is also a discrete random variable. It is the discrete analogue of
the exponential random variable. There are two slightly different but equally valid ways of defining the
geometric random variable. I'll tell you one of those two. We define the geometric random variable as the
number of Bernoulli trials needed to get one success. The support set is {1, 2, 3, . . .}. The parameter p is
the probability of success in the Bernoulli trials. So, taking a coin toss with heads as success and
tails as failure, we keep tossing the coin until we see a head. The probability that we see the first head on
the k-th trial is the probability of seeing tails in the first k − 1 trials and then seeing a head. This gives us

P(X = k) = (1 - p)^{k-1} \, p.    (8)
Summing the above from 1 to infinity gives us 1, as expected.
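The same numerical check works for (8). A quick Python sketch (p = 0.25 is arbitrary; the infinite sum is truncated):

    def geometric_pmf(k, p):
        return (1 - p)**(k - 1) * p  # tails on the first k-1 tosses, then a head

    p = 0.25
    print(sum(geometric_pmf(k, p) for k in range(1, 200)))  # ~1.0, as expected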

3.3 Uniform (continuous)

X \sim U(a, b)    (9)

The uniform random variable can be defined for both the discrete and the continuous case. We will look
at the continuous version. The uniform random variable is defined on an interval [a, b]. The probability of
the variable taking any single value in this interval is zero, as a point is a set of measure zero¹, so single
points are not a meaningful way to describe the distribution. Instead, we can say that the probability of
any portion of the line segment from a to b is the length of that portion of the segment, divided by the
length of the whole segment, b − a.
More useful is its PDF. This is given as
f(x) = \begin{cases} \frac{1}{b-a} & x \in [a, b] \\ 0 & \text{otherwise.} \end{cases}    (10)
The CDF can be obtained from the PDF as

F(x) = \int_{-\infty}^{x} f(u)\,du = \begin{cases} 0 & x < a \\ \frac{x-a}{b-a} & x \in [a, b) \\ 1 & x \geq b. \end{cases}    (11)
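The PDF and CDF in (10) and (11) translate directly into code. A minimal Python sketch, with illustrative endpoints a = 2 and b = 5:

    def uniform_pdf(x, a, b):
        return 1 / (b - a) if a <= x <= b else 0.0  # eq. (10)

    def uniform_cdf(x, a, b):
        if x < a:
            return 0.0
        if x >= b:
            return 1.0
        return (x - a) / (b - a)  # eq. (11): fraction of the segment to the left of x

    a, b = 2.0, 5.0
    print(uniform_cdf(3.5, a, b))  # 0.5: the midpoint splits the interval evenly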

3.4 Exponential

X \sim \exp(\lambda)    (12)

The exponential random variable is a continuous random variable. It is the continuous analogue of
the geometric distribution that we saw earlier. A very important property of the exponential distribution
is that it is memoryless.
The parameter \lambda > 0 is also called the rate. The PDF of the random variable is given by

f(x) = \begin{cases} \lambda e^{-\lambda x} & x \geq 0 \\ 0 & x < 0. \end{cases}    (13)
The CDF is given by

F(x) = \begin{cases} 1 - e^{-\lambda x} & x \geq 0 \\ 0 & x < 0. \end{cases}    (14)
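A small Python sketch (not from the notes) of the CDF in (14), together with a numerical check of the memorylessness property P(X > s + t | X > s) = P(X > t); the values of lambda, s, and t are arbitrary:

    from math import exp

    def exp_cdf(x, lam):
        return 1 - exp(-lam * x) if x >= 0 else 0.0  # eq. (14)

    def survival(x, lam):
        return 1 - exp_cdf(x, lam)                   # P(X > x)

    lam, s, t = 0.5, 2.0, 3.0
    print(survival(s + t, lam) / survival(s, lam))   # P(X > s+t | X > s) ...
    print(survival(t, lam))                          # ... equals P(X > t): memorylessness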

¹You may be inclined to take the Real Analysis courses from the Mathematics department in a future semester to fully
understand and appreciate measure theory, which underpins probability theory.

3.5 Gaussian (or Normal)

X \sim \mathcal{N}(\mu, \sigma^2)    (15)

The Gaussian random variable, also called the Normal random variable, is a very common random
variable, probably the most important one that you will learn about in probability theory, at least for
engineers. It appears almost everywhere. And oftentimes, engineers will stick in a Gaussian random
variable for an unknown distribution just because it is so nice to work with! And if it works, all the merrier!
The parameters \mu and \sigma^2 are respectively the mean and variance of the distribution. The PDF is
given by

f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right), \quad x \in \mathbb{R}.    (16)
Unfortunately, the CDF of the Gaussian random variable cannot be expressed in terms of elementary
functions and hence has no closed-form expression.
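Although there is no elementary closed form, the Gaussian CDF is commonly written in terms of the error function erf, which Python's standard library provides. A sketch (not from the notes) using that identity:

    from math import erf, sqrt

    def normal_cdf(x, mu=0.0, sigma=1.0):
        # Phi(x) = (1/2) * (1 + erf((x - mu) / (sigma * sqrt(2))))
        return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

    print(normal_cdf(0.0))   # 0.5: half the mass lies below the mean
    print(normal_cdf(1.96))  # ~0.975, the familiar two-sided 95% point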
Due to the shape of the PDF, the normal distribution is sometimes also called the bell curve. The
phrase "grading on a curve" usually refers to using the bell curve, namely the normal distribution, to
distribute the grades over the point scale!
Since the Gaussian distribution is so widespread in its occurrence and usage, I have decided to include
a figure of the bell curve for you to inspect.

Figure 2: The world famous bell curve, i.e. the probability density function of the Gaussian random
variable.
