
Chapter 4: Sequence of Random Variables

Chebyshev's Inequality, Convergence, Law of Large Numbers, Central Limit Theorem, random walks.

The standard deviation


Standard deviation is the positive square root of the mean-square deviation of the observations from their arithmetic mean.
Formula:

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}}, \qquad s^2 = \text{variance}$$

Using $n-1$ in the denominator produces a more accurate estimate of the true population standard deviation.

Note: Because the sum of the deviations equals zero, once n-1 of the
deviations are specified, the last deviation is already determined.
Hence the denominator uses the number of quantities that are free to
vary, called degrees of freedom.
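
As a quick illustration of the $n-1$ (degrees of freedom) denominator, here is a minimal Python sketch (the data values are made up for illustration) that computes the sample standard deviation directly from the formula and compares it with NumPy's ddof=1 option:

import numpy as np

data = np.array([4.0, 7.0, 6.0, 5.0, 8.0])          # hypothetical observations

n = len(data)
xbar = data.mean()                                   # arithmetic mean
s = np.sqrt(((data - xbar) ** 2).sum() / (n - 1))    # sample standard deviation

print(s)                                             # formula above
print(data.std(ddof=1))                              # NumPy, same n-1 denominator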

Chebyshev's inequality
If X is a random variable with mean $\mu$ and variance $\sigma^2$, then for any value $k > 0$,

$$P\{\,|X - \mu| \ge k\,\} \le \frac{\sigma^2}{k^2}$$

For a proof see Page 127, § 4.9 in your textbook.


Alternatively, for a given sample:

Let $\bar{x}$ and $s$ be the sample mean and standard deviation of a data set. Assuming that $s > 0$, Chebyshev's inequality states that for any value of $k \ge 1$, greater than $100(1 - 1/k^2)$ percent of the data lie within the interval from $\bar{x} - ks$ to $\bar{x} + ks$.

Continued
Note: Thus, letting $k = 3/2$, Chebyshev's inequality shows that greater than $100(5/9) \approx 55.56\%$ of the data from any data set lies within a distance of $1.5s$ of the sample mean $\bar{x}$;
letting $k = 2$ shows that greater than $75\%$ of the data lies within $2s$ of the sample mean; and
letting $k = 3$ shows that greater than $800/9 \approx 88.9\%$ of the data lies within $3s$ of the sample mean.
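
These bounds can be checked empirically. Here is a minimal Python sketch (the exponential data set is an arbitrary illustration, not from the text) that compares the observed fraction of data within $ks$ of $\bar{x}$ against the Chebyshev lower bound $1 - 1/k^2$:

import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=10_000)       # hypothetical, clearly non-normal data

xbar, s = data.mean(), data.std(ddof=1)

for k in (1.5, 2.0, 3.0):
    within = np.mean(np.abs(data - xbar) <= k * s)   # observed fraction within k*s
    bound = 1 - 1 / k**2                             # Chebyshev guarantee
    print(f"k={k}: observed {within:.3f} >= bound {bound:.3f}")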

Empirical rule

[Figure: the empirical rule]

Sequence of random variables


A sequence of random variables is an indexed set of random variables: $\{X_1, X_2, \ldots, X_n, \ldots\}$
Suppose we have a sample space S and a probability measure P. Also, suppose that the sample space S consists of a finite number of elements, i.e.

$$S = \{s_1, s_2, \ldots, s_k\}$$

Then a random variable X is a mapping that assigns a real number to each of the possible outcomes $s_i$, $i = 1, 2, \ldots, k$. Thus $X(s_i) = x_i$ for $i = 1, 2, \ldots, k$.
To define a sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ it is given that we have an underlying sample space S. In particular, each $X_n$ is a function from S to the real numbers. Thus,

$$X_n(s_i) = x_{ni}, \qquad \text{for } i = 1, 2, \ldots, k$$

In sum, a sequence of random variables is in fact a sequence of functions $X_n : S \to \mathbb{R}$.

Example 1
Consider the following random experiment. A fair coin is tossed once. Here the sample space has only two elements, S = {H, T}. We define a sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ on this sample space as follows:

$$X_n(s) = \begin{cases} \dfrac{1}{n+1} & \text{if } s = H \\[4pt] 1 & \text{if } s = T \end{cases}$$

a. Are the $X_i$'s independent?
b. Find the pmf and cdf of $X_n$, $F_{X_n}(x)$, for $n = 1, 2, \ldots$
c. As $n \to \infty$, what does $F_{X_n}(x)$ look like?

Soln. a. The $X_i$'s are not independent because their values are determined by the same coin toss. In particular,

$$P(X_1 = 1, X_2 = 1) = P(T) = \frac{1}{2}$$

which is different from

$$P(X_1 = 1) \cdot P(X_2 = 1) = P(T) \cdot P(T) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$$

Continued
b. Each $X_n$ takes two possible values that are equally likely. Thus the pmf of $X_n$ is given by

$$P_{X_n}(x) = P(X_n = x) = \begin{cases} \dfrac{1}{2} & \text{if } x = \dfrac{1}{n+1} \\[4pt] \dfrac{1}{2} & \text{if } x = 1 \end{cases}$$

From this we can get the cdf of $X_n$:

$$F_{X_n}(x) = P(X_n \le x) = \begin{cases} 1 & \text{if } x \ge 1 \\[2pt] \dfrac{1}{2} & \text{if } \dfrac{1}{n+1} \le x < 1 \\[4pt] 0 & \text{if } x < \dfrac{1}{n+1} \end{cases}$$

The adjacent figure shows the cdf of $X_n$ for different values of n. We see in the figure that the cdf of $X_n$ approaches the cdf of a Bernoulli(1/2) random variable as $n \to \infty$.

Note: This means that the sequence converges in distribution to a Bernoulli(1/2) random variable as $n \to \infty$.
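
As a quick numerical check of part (c), here is a minimal Python sketch (the function names and the evaluation points x are my own illustrative choices) that tabulates $F_{X_n}(x)$ for increasing n next to the Bernoulli(1/2) cdf:

def F_Xn(x, n):
    """cdf of X_n from the example: jumps of 1/2 at 1/(n+1) and at 1."""
    if x < 1 / (n + 1):
        return 0.0
    elif x < 1:
        return 0.5
    return 1.0

def F_bern_half(x):
    """cdf of a Bernoulli(1/2) random variable: jumps of 1/2 at 0 and at 1."""
    if x < 0:
        return 0.0
    elif x < 1:
        return 0.5
    return 1.0

xs = [-0.5, 0.05, 0.3, 0.99, 1.0]            # arbitrary evaluation points
for n in (1, 5, 50, 500):
    print(n, [F_Xn(x, n) for x in xs])
print("Bern", [F_bern_half(x) for x in xs])  # F_Xn(x) -> F(x) at every continuity point of F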

Example 2
Consider the following random experiment. A fair coin is tossed repeatedly forever. Here, the sample space S consists of all possible sequences of heads and tails. We define the sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ as follows:

$$X_n = \begin{cases} 0 & \text{if the } n\text{th coin toss results in a head} \\ 1 & \text{if the } n\text{th coin toss results in a tail} \end{cases}$$

In this example, the $X_i$'s are independent because each $X_i$ is the result of a different coin toss. In fact, the $X_i$'s are independent and identically distributed (iid) Bernoulli(1/2) random variables.

A special kind of sequence of r.v.


A very common and particularly useful class of sequences of random variables is IID.
Definition: IID (Independent and identically distributed)
Independent: essentially, the value of one random variable does not affect the value of any other; for instance, a coin doesn't remember which side last landed up, so consecutive flips are said to be independent.
Identically distributed: each random variable X has associated with it a cumulative distribution function (CDF), which is derived from the probability measure. The CDF gives the probability of the value of the random variable being less than or equal to a certain value. When two or more random variables have the same CDF, we say they are identically distributed.

Convergence
We are interested in the behaviour of functions of random variables such as means, variances, and proportions.
For large samples, exact distributions can be difficult / impossible to
obtain.
In these situations we would like to see if a sequence of random
variables X1, X2, .., Xn, converges to a random variable X, i.e.
we would like to see if Xn gets closer and closer to X in some sense
as n increases. For example, suppose we are interested in knowing
the value of a random variable X, but we are not able to observe X
directly. Instead we can do some measurements and come up with an
estimate of X, call it X1. We then perform more measurements and
update our estimate of X, call it X2. Continuing this process, we obtain X1, X2, X3, ... As n increases, our estimate gets better and better, i.e. Xn converges to X.

Example
Let $Y_1, Y_2, Y_3, \ldots$ be a sequence of iid random variables. Let

$$X_n = \frac{1}{n} \sum_{i=1}^{n} Y_i$$

be the average of the first n of the $Y_i$'s. This defines a new sequence $X_1, X_2, \ldots$ In other words, the sequence of interest $X_1, X_2, \ldots$ might be a sequence of statistics based on some other sequence of iid random variables. Note that the original sequence $Y_1, Y_2, \ldots$ is iid, but the sequence $X_1, X_2, \ldots$ is not iid.

Types of convergence
There are two main types of convergence.
Convergence in Probability (CIP) - A sequence of random variables is said to converge in probability to X if the probability of it differing from X goes to zero as n gets large, written as

$$X_n \xrightarrow{P} X \quad \text{if } \forall\, \epsilon > 0, \quad P\left(|X_n - X| > \epsilon\right) \to 0 \text{ as } n \to \infty$$

(Limit of an estimator.)

Convergence in Distribution (CID) - A sequence of random variables is said to converge in distribution to X if the limit of the corresponding cumulative distribution functions is the cumulative distribution function of X, written as

$$X_n \xrightarrow{d} X \quad \text{if } \lim_{n \to \infty} F_n(t) = F(t) \quad \forall\, t \text{ at which } F \text{ is continuous}$$

(Limit of a CDF.)

Continued
Besides these, there are also other types of convergence.
Convergence in Quadratic Mean - A sequence of random variables is said to converge in quadratic mean to X, written as

$$X_n \xrightarrow{qm} X, \quad \text{if } E\left[(X_n - X)^2\right] \to 0 \text{ as } n \to \infty$$

It is used primarily because it is stronger than CIP or CID and it can be computed relatively easily.
Convergence Almost Surely - A sequence of random variables is said to converge almost surely to X, written as

$$X_n \xrightarrow{a.s.} X \quad \text{if } \forall\, \epsilon > 0, \quad P\left(\lim_{n \to \infty} |X_n - X| < \epsilon\right) = 1$$
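
To make convergence in probability concrete, here is a minimal Python sketch (the choice of exponential $Y_i$'s with mean 2 and of $\epsilon = 0.1$ is arbitrary) that estimates $P(|X_n - \mu| > \epsilon)$ by simulation for the running-average sequence $X_n = \frac{1}{n}\sum_{i=1}^{n} Y_i$ from the earlier example:

import numpy as np

rng = np.random.default_rng(1)
mu, eps = 2.0, 0.1            # true mean of the Y_i's and the tolerance epsilon
n_paths = 1_000               # number of simulated sequences

for n in (10, 100, 1_000, 10_000):
    Y = rng.exponential(scale=mu, size=(n_paths, n))   # iid Y_1,...,Y_n per row
    Xn = Y.mean(axis=1)                                 # X_n = average of the first n Y_i's
    prob = np.mean(np.abs(Xn - mu) > eps)               # estimate of P(|X_n - mu| > eps)
    print(f"n={n:>6}: P(|X_n - mu| > {eps}) ~ {prob:.4f}")   # -> 0 as n grows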

Continued
Relation between different types of convergence:

$$X_n \xrightarrow{a.s.} X \;\Rightarrow\; X_n \xrightarrow{P} X \;\Rightarrow\; X_n \xrightarrow{d} X, \qquad X_n \xrightarrow{qm} X \;\Rightarrow\; X_n \xrightarrow{P} X$$

Convergence in probability (CIP) is the mode of convergence in the Law of Large Numbers (LLN); convergence in distribution (CID) is the mode of convergence in the Central Limit Theorem (CLT).

Limit Theorems
Limit theorems can be used to obtain properties of estimators as the sample size tends to infinity.
Convergence in Probability: limit of an estimator
Convergence in Distribution: limit of a CDF
Two important theorems in probability:
the Law of Large Numbers (LLN)
the Central Limit Theorem (CLT)

Law of Large Numbers (LLN)


It states that if you repeat an experiment independently a large number of times and average the results, what you obtain should be close to the expected value, i.e. the mean of a large sample is close to the mean of the distribution.
Two main versions of the law of large numbers:
the Weak Law of Large Numbers (WLLN)
the Strong Law of Large Numbers (SLLN)
The difference between them is mostly theoretical.
The Weak Law of Large Numbers: Let $X_1, X_2, \ldots, X_n$ be iid random variables with a finite expected value, i.e. $E(X_i) = \mu < \infty$. Then for any $\epsilon > 0$,

$$\lim_{n \to \infty} P\left(\,\left|\bar{X} - \mu\right| \ge \epsilon\,\right) = 0$$

Continued..
Proof: The proof is easier if we additionally assume that the random variables have a finite variance, i.e.

$$\mathrm{Var}(X_i) = \sigma^2 < \infty$$

Using Chebyshev's inequality,

$$P\left(\,\left|\bar{X} - \mu\right| \ge \epsilon\,\right) \le \frac{\mathrm{Var}(\bar{X})}{\epsilon^2} = \frac{\sigma^2}{n\,\epsilon^2} \to 0 \text{ as } n \to \infty$$

Application: Suppose that a sequence of independent trials is performed. Let E be a fixed event and let the probability that E occurs on a given trial be P(E). Let

$$X_i = \begin{cases} 1 & \text{if } E \text{ occurs on trial } i \\ 0 & \text{if } E \text{ does not occur on trial } i \end{cases}$$

Then $X_1 + X_2 + \cdots + X_n$ represents the number of times that E occurs in the first n trials. Because $E[X_i] = P(E)$, by the WLLN it follows that for any $\epsilon > 0$, the probability that the proportion of the first n trials in which E occurs differs from P(E) by more than $\epsilon$ goes to 0 as n increases.
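
A minimal Python sketch of this application (the event probability P(E) = 0.3 is just an illustrative choice, not from the text): it simulates independent trials and shows the running proportion of occurrences settling near P(E):

import numpy as np

rng = np.random.default_rng(2)
p_E = 0.3                                   # hypothetical probability of the event E
trials = rng.random(100_000) < p_E          # X_i = 1 if E occurs on trial i, else 0

running_prop = np.cumsum(trials) / np.arange(1, trials.size + 1)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n={n:>6}: proportion = {running_prop[n - 1]:.4f}")   # -> P(E) = 0.3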

Distribution depends on sample size

[Figure: probability density functions of the sample mean from a standard normal population, for several sample sizes.]
The sample mean $\bar{X}$ is centered about the population mean, but its spread becomes more and more reduced as the sample size increases.

The Central Limit theorem


One of the most important results in Probability
What does it say?
this theorem asserts that the sum of a large number of
independent random variables has a distribution that is
approximately normal.
it also helps explain the remarkable fact that the
empirical frequencies of so many natural populations
exhibit a bell-shaped (that is, a normal) curve.

Statement: Central Limit Theorem


Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of independent and identically distributed random variables, each having mean $E[X_i] = \mu < \infty$ and variance $0 < \mathrm{Var}(X_i) = \sigma^2 < \infty$. Then the random variable

$$Z_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}$$

converges in distribution to the standard normal random variable as $n \to \infty$. That is,

$$\lim_{n \to \infty} P\left(Z_n \le x\right) = \Phi(x)$$

This means that if $X_1, X_2, \ldots, X_n, \ldots$ is a sequence of i.i.d. random variables with mean $\mu$ and variance $\sigma^2$, then for n large, the distribution of $X_1 + X_2 + \cdots + X_n$ is approximately normal with mean $n\mu$ and variance $n\sigma^2$.

Central Limit Theorem


An interesting fact about the CLT is that it does not matter what the distribution of the $X_i$'s is. The $X_i$'s can be discrete, continuous or mixed random variables. Let us look at some examples:
Example 1. Let's assume that the $X_i$'s are Bernoulli(p). Then $E[X_i] = p$ and $\mathrm{Var}(X_i) = p(1-p)$. Also, $Y_n = X_1 + X_2 + \cdots + X_n$ has a Binomial(n, p) distribution. Thus

$$Z_n = \frac{Y_n - np}{\sqrt{n\,p\,(1-p)}}$$

where $Y_n \sim$ Binomial(n, p). The figure shows the pmf for different values of n. The shape of the pmf gets closer to the normal pdf as n increases (see the next slide).
Example 2. Let's assume that the $X_i$'s are Uniform(0,1). Then $E[X_i] = 1/2$ and $\mathrm{Var}(X_i) = 1/12$. In this case,

$$Z_n = \frac{X_1 + X_2 + \cdots + X_n - n/2}{\sqrt{n/12}}$$

The figure shows the pdf of $Z_n$ for different values of n. The shape of the pdf gets closer to the normal pdf as n increases (next slide).

Example 1

Fig: In these figures, $Z_n$ is the normalized sum of n independent Bernoulli(p) random variables. The shape of its PMF, $P_{Z_n}(z)$, resembles the normal curve as n increases.

Example 2

Fig: $Z_n$ is the normalized sum of n independent Uniform(0,1) random variables. The shape of its PDF, $f_{Z_n}(z)$, gets closer to the normal curve as n increases.
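
A minimal Python sketch reproducing the idea behind these two figures (the values of n, p and the number of simulated sums are arbitrary choices, not from the text): it simulates the normalized sums $Z_n$ and compares a few quantiles with those of a standard normal.

import numpy as np

rng = np.random.default_rng(3)
n, p, reps = 100, 0.5, 50_000     # hypothetical choices of n, p and simulation size

# Example 1: normalized sum of Bernoulli(p) random variables
Y = rng.binomial(1, p, size=(reps, n)).sum(axis=1)
Z_bern = (Y - n * p) / np.sqrt(n * p * (1 - p))

# Example 2: normalized sum of Uniform(0,1) random variables
U = rng.random(size=(reps, n)).sum(axis=1)
Z_unif = (U - n / 2) / np.sqrt(n / 12)

qs = [0.025, 0.5, 0.975]
print("Bernoulli sums:", np.quantile(Z_bern, qs))   # ~ [-1.96, 0, 1.96], the standard normal quantiles
print("Uniform sums:  ", np.quantile(Z_unif, qs))   # ~ [-1.96, 0, 1.96]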

A Remark
We could have looked directly at $Y_n = X_1 + X_2 + \cdots + X_n$, so why do we normalize it first and say that the normalized version ($Z_n$) becomes approximately normal?
This is because $E[Y_n] = nE[X_i] = n\mu$ and $\mathrm{Var}(Y_n) = n\,\mathrm{Var}(X_i) = n\sigma^2$ go to infinity as n goes to infinity. We normalize $Y_n$ in order to have a finite mean and variance ($E[Z_n] = 0$, $\mathrm{Var}(Z_n) = 1$).
Nevertheless, for any fixed n, the CDF of $Z_n$ is obtained by scaling and shifting the CDF of $Y_n$. Thus, the two CDFs have similar shapes.

Example:
An insurance company has 25000 automobile policy holders. If the
yearly claim of a policy holder is a random variable with mean 320 and
standard deviation 540, approximate the probability that the total yearly
claim exceeds 8.3 million.

Solution: Let X denote the total yearly claim. Number the policy holders, and let $X_i$ denote the yearly claim of policy holder i. With n = 25,000, we have from the central limit theorem that

$$X = \sum_{i=1}^{n} X_i$$

will have approximately a normal distribution with mean $320 \times 25{,}000 = 8 \times 10^6$ and standard deviation $540\sqrt{25{,}000} \approx 8.5381 \times 10^4$.

Contd.
Therefore,

$$P\{X > 8.3 \times 10^6\} = P\left\{ \frac{X - 8 \times 10^6}{8.5381 \times 10^4} > \frac{8.3 \times 10^6 - 8 \times 10^6}{8.5381 \times 10^4} \right\}$$
$$= P\left\{ \frac{X - 8 \times 10^6}{8.5381 \times 10^4} > \frac{0.3 \times 10^6}{8.5381 \times 10^4} \right\} \approx P\{Z > 3.51\} \approx 0.00023$$

Thus there are only about 2.3 chances out of 10,000 that the total yearly claim will exceed 8.3 million.
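
A quick numerical check of this answer, as a minimal Python sketch using SciPy's standard normal survival function:

from math import sqrt
from scipy.stats import norm

n, mu_i, sigma_i = 25_000, 320.0, 540.0   # from the problem statement
mean_total = n * mu_i                     # 8.0e6
sd_total = sigma_i * sqrt(n)              # ~8.5381e4

z = (8.3e6 - mean_total) / sd_total       # ~3.51
print(z, norm.sf(z))                      # P(Z > 3.51) ~ 0.00023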

Important Applications
The importance of the central limit theorem is that, in many real applications, a certain random variable of interest is a sum of a large number of independent random variables. In these situations, we are often able to use the CLT to justify using the normal distribution.
Here are a few examples:
Laboratory measurement errors are usually modeled by normal
random variables.
In communication and signal processing, Gaussian noise is the most
frequently used model for noise.
In finance, the percentage changes in the prices of some assets are
sometimes modeled by normal random variables.
When we do random sampling from a population to obtain statistical
knowledge about the population, we often model the resulting
quantity as a normal random variable.

Continued
The CLT can also simplify our computations significantly. In a problem where we are interested in the sum of one thousand i.i.d. random variables, it might be extremely difficult to find the distribution of the sum by direct calculation. Using the CLT, we can immediately write down the approximate distribution, as long as we know the mean and variance of the $X_i$'s.

How large a sample is needed?


The central limit theorem leaves open the question of how large the sample size n needs to be for the normal approximation to be valid.
The answer depends on the population distribution of the sample
data
For instance, if the underlying population distribution is normal, then
the sample mean X will also be normal regardless of the sample
size.
A general rule of thumb is that one can be confident of the normal
approximation whenever the sample size n is at least 30. That is,
practically speaking, no matter how nonnormal the underlying
population distribution is, the sample mean of a sample of size at
least 30 will be approximately normal.

How to apply the CLT


Here are the steps that we need in order to apply the CLT:
Write the random variable of interest, Y, as the sum of n i.i.d. random variables $X_i$:
$$Y = X_1 + X_2 + \cdots + X_n$$
Find E(Y) and Var(Y) by noting that
$$E[Y] = n\mu, \qquad \mathrm{Var}(Y) = n\sigma^2, \quad \text{where } \mu = E(X_i) \text{ and } \sigma^2 = \mathrm{Var}(X_i)$$
According to the CLT, conclude that

$$\frac{Y - E(Y)}{\sqrt{\mathrm{Var}(Y)}} = \frac{Y - n\mu}{\sigma\sqrt{n}}$$

is approximately standard normal; thus, to find $P(y_1 \le Y \le y_2)$, we can write

$$P(y_1 \le Y \le y_2) = P\left( \frac{y_1 - n\mu}{\sigma\sqrt{n}} \le \frac{Y - n\mu}{\sigma\sqrt{n}} \le \frac{y_2 - n\mu}{\sigma\sqrt{n}} \right) \approx \Phi\left( \frac{y_2 - n\mu}{\sigma\sqrt{n}} \right) - \Phi\left( \frac{y_1 - n\mu}{\sigma\sqrt{n}} \right)$$
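
As a sketch only (the function name clt_prob and its parameters are my own, not from the text, and SciPy is assumed to be available), the recipe above can be wrapped in a small Python helper:

from math import sqrt
from scipy.stats import norm

def clt_prob(y1, y2, n, mu, sigma):
    """Approximate P(y1 <= Y <= y2) for Y = X_1 + ... + X_n with E[X_i] = mu, SD(X_i) = sigma."""
    mean_Y = n * mu
    sd_Y = sigma * sqrt(n)
    return norm.cdf((y2 - mean_Y) / sd_Y) - norm.cdf((y1 - mean_Y) / sd_Y)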

Examples
Example 1. A bank teller serves customers standing in the queue one by
one. Suppose that the service time Xi for customer i has mean
E[Xi]=2(minutes) and Var(Xi)=1. We assume that service times for
different bank customers are independent. Let Y be the total time the
bank teller spends serving 50 customers. Find P(90<Y<110).

Example 2. In a communication system each data packet consists of


1000 bits. Due to the noise, each bit may be received in error with
probability 0.1. It is assumed that bit errors occur independently. Find the
probability that there are more than 120 errors in a certain data packet.
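
Both examples fit the recipe above. Here is a minimal, self-contained Python sketch (treating the error count in Example 2 as a sum of 1000 Bernoulli(0.1) indicators); the printed values are what the plain normal approximation gives, without a continuity correction, and are not stated in the text:

from math import sqrt
from scipy.stats import norm

def clt_prob(y1, y2, n, mu, sigma):
    # same helper as sketched earlier
    sd = sigma * sqrt(n)
    return norm.cdf((y2 - n * mu) / sd) - norm.cdf((y1 - n * mu) / sd)

# Example 1: Y = total service time for 50 customers, E[X_i] = 2, Var(X_i) = 1
print(clt_prob(90, 110, n=50, mu=2.0, sigma=1.0))                      # ~0.843

# Example 2: Y = number of bit errors among 1000 bits, X_i ~ Bernoulli(0.1);
# P(Y < 0) is negligible, so P(Y > 120) ~ 1 - P(0 <= Y <= 120)
p = 0.1
print(1 - clt_prob(0, 120, n=1000, mu=p, sigma=sqrt(p * (1 - p))))     # ~0.017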

Random Walk
