
ST 370 - Random Variable Review

Random variables are simply numerical results from any experiment or physical process that
produces random outcomes. Typically we denote random variables by capital letters like $X$ or $Y$
and seek to understand where these random variables spend their time; in other words, we want to
make calculations such as $P(a < X < b)$. But before studying random variables, a fair question is

Why do we need to study random variables?

The field of statistics is about how to use data sets, random samples from a larger set of
data called a population (possibly from a process such as the wafers coming off a semiconductor
production line), to make inferences about the population or process. The first step in most studies
is to summarize the data with simple graphs such as histograms and boxplots and with numerical
summaries such as the sample mean $\bar{X}$ and standard deviation $s$.
To go beyond these simple descriptive statistics requires that we develop a language to talk
about the randomness in data sets. For this purpose the language and calculus of random variables
were developed and refined during the 20th century. We will look at just the barest essentials
of this language and theory. In the end we want to think of data sets as groups of random variables
$X_1, \ldots, X_n$. Our understanding of the calculus of random variables will enable us to make important
probability statements about how these random samples behave and how much information they
contain. For the first part of the discussion we focus on just one random variable, say $X_1$, but for
simplicity we will drop the subscript.

Discrete Random Variables

A simple experiment like flipping a coin twice results in a basic sample space of possible
outcomes $S = \{(H,H), (H,T), (T,H), (T,T)\}$. A random variable from this experiment is $X$ = the
number of heads. It has sample space $S_X = \{0, 1, 2\}$ and associated probabilities $P(X = 0) = 1/4$,
$P(X = 1) = 1/2$, $P(X = 2) = 1/4$ (assuming it is a fair coin). The sample space $S_X$ together with the
associated list of probabilities is called the probability distribution of $X$.
In some cases the probabilities can be put in the form of a function, such as for the binomial
random variable (with success probability $p$ and number of trials $n$)
\[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n. \]

This latter function is called a probability function (or probability mass function or probability
density function). The argument of the function is often a small letter like x, but for the binomial
we often use i or k to emphasize that the possible values are integers.
The expected value of a random variable is the weighted average of the possible values,
\[ E(X) = \sum_i x_i P(X = x_i), \]
where the weights are the probabilities of the values. We call $E(X)$ the mean of $X$ and often use
the notation $\mu$ or $\mu_X$.
For the coin experiment above,
\[ E(X) = (0)(1/4) + (1)(1/2) + (2)(1/4) = 1. \]
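The coin calculation can be reproduced by brute force. The following is a minimal sketch, assuming a fair coin so that each of the four outcomes has probability 1/4; the variable names are ours and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space for two flips of a fair coin; each outcome has probability 1/4.
outcomes = list(product("HT", repeat=2))
prob_each = Fraction(1, len(outcomes))

# Probability distribution of X = number of heads.
pmf = {}
for outcome in outcomes:
    x = outcome.count("H")
    pmf[x] = pmf.get(x, Fraction(0)) + prob_each

# Expected value: weighted average of the possible values.
mean = sum(x * p for x, p in pmf.items())

print(pmf)
print(mean)
```

Listing the sample space first and then collapsing it to the values of X mirrors the way the distribution of X is built from S in the text.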
It is harder to find the expected value of a binomial random variable, but the result is quite simple:
\[ \sum_{k=0}^{n} k P(X = k) = \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k} = np. \]

(Actually the coin experiment random variable above is a binomial random variable with $n = 2$
and $p = 1/2$. Thus the expected value is $np = (2)(1/2) = 1$.)
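The identity E(X) = np can be checked numerically by summing k P(X = k) directly. This is only a sketch: the function names are ours, and the values n = 10, p = 3/10 are illustrative choices that do not come from the notes.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_mean(n, p):
    """E(X) computed directly as the sum of k * P(X = k) over k = 0, ..., n."""
    return sum(k * binomial_pmf(k, n, p) for k in range(n + 1))

n, p = 10, Fraction(3, 10)  # illustrative values, not from the notes
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

print(total)                       # the probabilities sum to one
print(binomial_mean(n, p), n * p)  # direct sum versus np
```

With n = 2 and p = 1/2 the same function returns 1, the coin-experiment mean.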
Another important expected value is $E(X - \mu)^2$, which is called the variance of $X$. We often
use the notation $\mathrm{Var}(X)$ or $\sigma^2$ or $\sigma_X^2$:
\[ \mathrm{Var}(X) = E(X - \mu)^2 = \sigma^2 = \sum_i (x_i - \mu)^2 P(X = x_i). \]

The variance is a weighted average of the squared distances from the mean $\mu$. By multiplying out the above
expression and simplifying we are able to derive the computing formula
\[ \mathrm{Var}(X) = E(X^2) - \mu^2 = \sum_i x_i^2 P(X = x_i) - \mu^2. \]

The standard deviation of $X$ is just the square root of the variance, $\mathrm{sd}(X) = \sqrt{\sigma^2} = \sigma$.
For the simple coin experiment we have
\[ \mathrm{Var}(X) = (0-1)^2(1/4) + (1-1)^2(1/2) + (2-1)^2(1/4) = 1/2, \]
or by the computing formula
\[ \mathrm{Var}(X) = (0^2)(1/4) + (1^2)(1/2) + (2^2)(1/4) - (1)^2 = 1/2 + 1 - 1 = 1/2, \]
and the standard deviation is $\sigma = \sqrt{1/2} = .707$. For the binomial random variable, we find
$\mathrm{Var}(X) = np(1-p)$. (Since the coin experiment $X$ is a binomial random variable, we have
$\mathrm{Var}(X) = np(1-p) = (2)(1/2)(1 - 1/2) = 1/2$, which agrees with the direct approach.)
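Both routes to the variance can be written as short functions and compared. The sketch below uses the coin distribution from the text; the function names are ours.

```python
from fractions import Fraction

def variance_by_definition(pmf):
    """Weighted average of squared distances from the mean."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def variance_by_computing_formula(pmf):
    """E(X^2) minus the squared mean."""
    mu = sum(x * p for x, p in pmf.items())
    return sum(x**2 * p for x, p in pmf.items()) - mu**2

# Distribution of X = number of heads in two flips of a fair coin.
coin_pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

var_def = variance_by_definition(coin_pmf)
var_cf = variance_by_computing_formula(coin_pmf)
sd = float(var_def) ** 0.5

print(var_def, var_cf)  # both equal 1/2
print(round(sd, 3))     # 0.707
```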
Note that the random variables mentioned so far have a finite set of possible values in $S_X$. It
is possible to have an infinite list of possible values, such as for the Poisson random variable
\[ P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \qquad k = 0, 1, \ldots \]

This is not a problem since the probabilities go to zero quite fast as $k$ gets large, fast
enough in fact that the probabilities sum to one, $\sum_{k=0}^{\infty} P(X = k) = 1$, as they must for any
random variable. In fact, we can also find the infinite sums $\mu = \sum_{k=0}^{\infty} k P(X = k) = \lambda$ and
$\sum_{k=0}^{\infty} k^2 P(X = k) = \lambda + \lambda^2$, so that $\mathrm{Var}(X) = \lambda + \lambda^2 - \lambda^2 = \lambda$. (It may seem a bit strange to
have the mean of a random variable equal to its variance. But this is an important distinguishing
feature of the Poisson distribution. A simple way to check whether a data set appears to be from
a Poisson distribution is to compare the sample mean $\bar{X}$ and sample variance $s^2$, which should be
approximately equal.)
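Because the Poisson probabilities shrink so quickly, the infinite sums can be approximated by stopping after a moderate number of terms. This sketch assumes a rate of 3.5 and a cutoff of 100 terms; both are illustrative choices of ours, not values from the notes.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with parameter lam."""
    return exp(-lam) * lam**k / factorial(k)

lam = 3.5      # illustrative rate
cutoff = 100   # terms beyond this are negligibly small for this rate

total = sum(poisson_pmf(k, lam) for k in range(cutoff))
mean = sum(k * poisson_pmf(k, lam) for k in range(cutoff))
second_moment = sum(k**2 * poisson_pmf(k, lam) for k in range(cutoff))
variance = second_moment - mean**2

print(total)           # close to 1
print(mean, variance)  # both close to lam
```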

Continuous Random Variables

Random variables with finite or even infinite lists of possible values are called discrete random
variables because the values are separated (or discrete). When a random variable can take all
possible values in an interval, say $X \in [0, 1]$, then we call $X$ a continuous random variable because
the possible values are not separated but rather continuous. Many physical measurements are
continuous, such as rainfall or wind speed or humidity. Even though the measurement of such
quantities is necessarily discrete because of the limitation of measuring devices (say to the nearest
.01 inch for rainfall), we still call such random variables continuous.
Continuous random variables require a different probability description than that used with
discrete random variables because of the uncountably infinite number of possible values in the
sample space. Although $S_X = [0, 1]$ is simple to write down, it is not possible to assign a positive
probability to each point in $S_X$. Instead we make the mind-boggling assumption that each point
actually has probability 0, but sub-intervals of $S_X$ have positive probability described exactly by
the area under a curve of a function $f(x)$ called the density function of $X$. This function must
be nonnegative, $f(x) \ge 0$, and integrate to one, $\int_{-\infty}^{\infty} f(x)\,dx = 1$, so that probabilities such as
$P(a < X < b) = \int_a^b f(x)\,dx$ obey the laws of probability. Thus the probability distribution of a
continuous random variable $X$ is described completely by the function $f$ and the subset $S_X$ of
$(-\infty, \infty)$ where $f(x) > 0$.
The simplest examples of continuous random variables are the uniform($c$, $d$) (assuming $c < d$)
with density
\[ f(x) = \frac{1}{d-c}, \qquad x \in [c, d], \]
and the exponential($\beta$) with density
\[ f(x) = \frac{1}{\beta} e^{-x/\beta}, \qquad x \ge 0. \]

Note that we have been amiss in not defining the above densities outside $S_X = [c, d]$ or $S_X = [0, \infty)$,
respectively. Our convention will always be that outside the region where $f$ is defined, $f(x)$ is 0.
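The two densities and the zero-outside convention translate directly into code, and a crude numerical integral confirms that each has total area one. This is a sketch under our own assumptions: the exponential parameter (its mean) is named beta here, the integrator is a plain midpoint rule, and the parameter values are illustrative.

```python
from math import exp

def uniform_density(x, c, d):
    """Density of uniform(c, d); zero outside [c, d] by convention."""
    return 1.0 / (d - c) if c <= x <= d else 0.0

def exponential_density(x, beta):
    """Density of the exponential with mean beta; zero for x < 0 by convention."""
    return exp(-x / beta) / beta if x >= 0 else 0.0

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation to the integral of f from a to b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Illustrative parameters: uniform(2, 5) and an exponential with mean 2.
area_uniform = integrate(lambda x: uniform_density(x, 2.0, 5.0), 0.0, 10.0)
area_exponential = integrate(lambda x: exponential_density(x, 2.0), -5.0, 60.0)
prob_3_to_4 = integrate(lambda x: uniform_density(x, 2.0, 5.0), 3.0, 4.0)

print(area_uniform, area_exponential)  # both close to 1
print(prob_3_to_4)                     # close to 1/3
```

The integration limits deliberately extend past the region where each density is positive, so the check also exercises the convention that f is 0 there.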
The expected value $E(X)$ for continuous random variables,
\[ E(X) = \int_{-\infty}^{\infty} x f(x)\,dx, \]
is a weighted average similar to the expected value $E(X) = \sum_i x_i P(X = x_i)$ for discrete random
variables. Similarly we use the notation $\mu$ for $E(X)$ and define the variance by
\[ \mathrm{Var}(X) = \sigma^2 = E(X - \mu)^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx, \]

with the computing formula version
\[ E(X - \mu)^2 = E(X^2) - \mu^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2. \]

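For the uniform($c$, $d$) density we have
\[ E(X) = \int_c^d x \, \frac{1}{d-c}\,dx = \left[ \frac{x^2}{2(d-c)} \right]_c^d = \frac{d^2 - c^2}{2(d-c)} = \frac{d+c}{2} \]
and
\[ E(X^2) = \int_c^d x^2 \, \frac{1}{d-c}\,dx = \left[ \frac{x^3}{3(d-c)} \right]_c^d = \frac{d^3 - c^3}{3(d-c)} = \frac{d^2 + c^2 + cd}{3}, \]
and
\[ \mathrm{Var}(X) = \frac{d^2 + c^2 + cd}{3} - \left( \frac{d+c}{2} \right)^2 = \frac{(d-c)^2}{12}. \]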
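The last simplification, from the computing formula to (d - c)^2 / 12, is easy to get wrong by hand, so an exact arithmetic check can be reassuring. The sketch below evaluates the closed forms with exact fractions for a few integer endpoints; the function name and the endpoints are illustrative choices of ours.

```python
from fractions import Fraction

def uniform_moments(c, d):
    """Mean, second moment, and variance of uniform(c, d) from the closed forms."""
    mean = Fraction(d + c, 2)
    second_moment = Fraction(d**2 + c**2 + c * d, 3)
    variance = second_moment - mean**2  # computing formula
    return mean, second_moment, variance

for c, d in [(0, 1), (2, 5), (-3, 7)]:
    mean, second_moment, variance = uniform_moments(c, d)
    # The variance from the computing formula should match (d - c)^2 / 12.
    print((c, d), mean, variance, Fraction((d - c) ** 2, 12))
```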

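For the exponential($\beta$) density we have
\[ E(X) = \int_0^{\infty} x \, \frac{1}{\beta} e^{-x/\beta}\,dx = \beta \int_0^{\infty} y e^{-y}\,dy \qquad \text{(after making the change of variable } y = x/\beta\text{)} \]
\[ = \beta \left[ -y e^{-y} - e^{-y} \right]_0^{\infty} = \beta\,[-0 - 0 + 0 + 1] = \beta. \]
Similarly we find $E(X^2) = 2\beta^2$ and $\mathrm{Var}(X) = 2\beta^2 - \beta^2 = \beta^2$.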
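A simulation gives a quick sanity check on these exponential moments. This is a sketch, not a proof: the mean parameter is named beta here, its value 2 and the sample size are illustrative, and the results are only approximately equal to the theoretical values because of sampling variation.

```python
import random

random.seed(370)  # fixed seed so the run is repeatable
beta = 2.0        # illustrative mean, not from the notes
n = 200_000

# random.expovariate takes the rate 1/beta, not the mean beta.
sample = [random.expovariate(1.0 / beta) for _ in range(n)]

sample_mean = sum(sample) / n
sample_second_moment = sum(x * x for x in sample) / n
sample_variance = sample_second_moment - sample_mean**2

print(sample_mean)           # should be near beta
print(sample_second_moment)  # should be near 2 * beta**2
print(sample_variance)       # should be near beta**2
```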

Distribution Functions
For computing probabilities about $X$ such as $P(a < X < b) = \int_a^b f(x)\,dx$, it sometimes helps
to define a function
\[ F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt, \]

which is called the cumulative distribution function of $X$ or simply the distribution function of
$X$. If $F$ is known, then it can be simpler to calculate $P(a < X < b) = F(b) - F(a)$ rather than
integrate $f$ from $a$ to $b$ for each new $a$ and $b$.
For example, the exponential($\beta$) distribution function is
\[ F(x) = \int_0^x \frac{1}{\beta} e^{-t/\beta}\,dt = \left[ -e^{-t/\beta} \right]_0^x = 1 - e^{-x/\beta}, \qquad x \ge 0. \]

Since $F(x)$ is 0 for $x < 0$, our convention is to ignore $F$ over that interval. If lifetimes of light bulbs
have an exponential distribution and we want the probability that a light bulb with average life
of $\beta = 800$ hours will last longer than 1000 hours, we have $P(X > 1000) = 1 - P(X \le 1000) =
1 - (1 - e^{-1000/800}) = e^{-1.25} = .29$.
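The light bulb calculation is a one-liner once the distribution function is available. In this sketch the function name is ours and the mean parameter is called beta; the numbers 800 and 1000 are the ones used in the text, while the interval (500, 1000) is an extra illustration.

```python
from math import exp

def exponential_cdf(x, beta):
    """F(x) = 1 - exp(-x / beta) for x >= 0, and 0 for x < 0."""
    return 1.0 - exp(-x / beta) if x >= 0 else 0.0

beta = 800.0  # average life in hours, from the text

# P(X > 1000) = 1 - F(1000)
p_longer = 1.0 - exponential_cdf(1000.0, beta)
print(round(p_longer, 2))  # 0.29

# P(a < X < b) = F(b) - F(a), with an illustrative interval
p_between = exponential_cdf(1000.0, beta) - exponential_cdf(500.0, beta)
print(p_between)
```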

To get the density function $f$ from the distribution function $F(x)$, just take the derivative of
$F(x)$ with respect to $x$. For example, $d[1 - \exp(-x/\beta)]/dx = (1/\beta)\exp(-x/\beta)$.
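The most important continuous distribution is the normal distribution whose density function
is
\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty, \quad -\infty < \mu < \infty, \quad 0 < \sigma < \infty. \]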

The mean of this distribution is $E(X) = \mu$, and the variance is $\mathrm{Var}(X) = \sigma^2$. Thus the normal
distribution has natural parameters $\mu$ and $\sigma^2$ which turn out to be its mean and variance as their
names suggest.
The normal distribution is very important because many types of data and measurements can
be well-modeled by a normal density. That is, many data sets have histograms which resemble
the bell-shaped curve of a normal density. The normal distribution also has special mathematical
properties which are beyond the scope of this course. In one aspect, though, the normal distribution
is very hard to work with: the function $e^{-x^2}$ has no antiderivative that can be written in closed form.
Thus, to find probabilities for normal random variables we need to use numerical integration.
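To make the idea of numerical integration concrete, here is a minimal sketch that integrates the standard normal density with composite Simpson's rule and compares the result with the error function from Python's standard library. Simpson's rule is our choice of method for illustration; the notes do not say which numerical method is intended.

```python
from math import erf, exp, pi, sqrt

def normal_density(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# P(-1 < Z < 1) for a standard normal, computed two ways.
p_numeric = simpson(normal_density, -1.0, 1.0)
p_erf = erf(1.0 / sqrt(2.0))

print(p_numeric, p_erf)  # the two values should agree to many decimal places
```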

But we may not always have a computer or calculator available to do the numerical integration.
Thus the classical approach is to put in a table the values of the distribution function $F(z)$ for the
standard normal random variable (usually called $Z$) whose density is
\[ f(z) = \frac{1}{\sqrt{2\pi}} \, e^{-z^2/2}, \qquad -\infty < z < \infty. \]

The standard normal $Z$ has mean $\mu = 0$ and variance $\sigma^2 = 1$. The key to using the standard
normal distribution table is that $X = \sigma Z + \mu$ is also a normal random variable, with mean $\mu$ and
variance $\sigma^2$. Thus, to find probabilities for a general normal random variable $X$, we need to
convert the probability about $X$ into a probability about $Z$, and then look up the probabilities for
$Z$ in the table.
For example, suppose that we want $P(5 < X < 8)$, where $X$ has mean $\mu = 4$ and standard
deviation $\sigma = 3$. Replacing $X$ by $\sigma Z + \mu = 3Z + 4$, we have
\[ P(5 < 3Z + 4 < 8) = P(5 - 4 < 3Z < 8 - 4) = P(1/3 < Z < 4/3) = F_Z(4/3) - F_Z(1/3) \]
\[ \approx F_Z(1.33) - F_Z(0.33) = .9082 - .6293 = .2789. \]
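The table lookup can be mimicked in code by writing the standard normal distribution function in terms of the error function. This is a sketch with function names of our own; it computes the probability once with the exact endpoints 1/3 and 4/3 and once with the two-decimal endpoints used for the table.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """F_Z(z) written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_probability(a, b, mu, sigma):
    """P(a < X < b) for normal X, by converting to Z = (X - mu) / sigma."""
    return standard_normal_cdf((b - mu) / sigma) - standard_normal_cdf((a - mu) / sigma)

p_exact = normal_probability(5.0, 8.0, mu=4.0, sigma=3.0)
p_table = standard_normal_cdf(1.33) - standard_normal_cdf(0.33)

print(round(p_exact, 4), round(p_table, 4))
```

The two printed values differ slightly in the third decimal place because the table calculation rounds 1/3 and 4/3 to two decimals before the lookup.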

Transformations
We have just seen how a linear transformation $X = \sigma Z + \mu$ or its inverse $Z = (X - \mu)/\sigma$ helps
with probability calculations. In general we may know the density and/or distribution function of
a random variable $X$ and want the density and/or distribution function of some transformation
of $X$, say $Y = aX$, or $Y = aX + b$, or some general one-to-one function $Y = h(X)$ with inverse
$h^{-1}(Y) = X$.
The easiest method to remember is the distribution function method, which starts from the definition
\[ F_Y(y) = P(Y \le y). \]

The steps for this method are

1. Substitute $h(X)$ for $Y$ in the definition of $F_Y(y)$ as a probability.
2. Use algebra to get $X$ isolated on the left-hand side of the $\le$ symbol.
3. Substitute for $x$ in $F_X(x)$ the expression on the right-hand side of the $\le$ symbol.
An example is $Y = X^2$ where $X$ has an exponential($\beta$) distribution:
\[ F_Y(y) = P(Y \le y) = P(X^2 \le y) = P(X \le \sqrt{y}) = F_X(\sqrt{y}) = 1 - e^{-\sqrt{y}/\beta}, \qquad y \ge 0. \]
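The result of the distribution function method can be compared with a simulation: draw exponential values, square them, and see what fraction fall at or below y. This sketch assumes a mean parameter named beta with the illustrative value 2; the agreement is only approximate because of sampling variation.

```python
import random
from math import exp, sqrt

def cdf_of_square(y, beta):
    """F_Y(y) = F_X(sqrt(y)) = 1 - exp(-sqrt(y) / beta) for Y = X**2, y >= 0."""
    return 1.0 - exp(-sqrt(y) / beta) if y >= 0 else 0.0

random.seed(370)  # fixed seed so the run is repeatable
beta = 2.0        # illustrative mean, not from the notes
n = 100_000

# random.expovariate takes the rate 1/beta; squaring gives draws of Y = X**2.
ys = [random.expovariate(1.0 / beta) ** 2 for _ in range(n)]

for y in (1.0, 4.0, 16.0):
    empirical = sum(value <= y for value in ys) / n
    print(y, round(empirical, 3), round(cdf_of_square(y, beta), 3))
```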

The distribution method is not as useful when you start with the density of $X$ and want the
density of $Y$ because you have to first find the distribution function of $X$ to use in the above
sequence. Then at the end you must take the derivative of the distribution function of $Y$ to get the
density function of $Y$. If you put all those steps together, the density function method is simply


\[ f_Y(y) = f_X\!\left( h^{-1}(y) \right) \left| \frac{d}{dy}\, h^{-1}(y) \right|. \]

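For example, with $Y = X^2$ we have $X = \sqrt{Y} = h^{-1}(Y)$ and
\[ \frac{d(h^{-1}(y))}{dy} = \frac{d(\sqrt{y})}{dy} = \frac{1}{2\sqrt{y}}. \]
Thus, for the exponential($\beta$) density $f_X(x) = \frac{1}{\beta} e^{-x/\beta}$, we have
\[ f_Y(y) = \frac{1}{\beta} \, e^{-\sqrt{y}/\beta} \, \frac{1}{2\sqrt{y}}, \qquad y > 0. \]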
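Since the density is the derivative of the distribution function, the density method can be checked against a numerical derivative of F_Y. The sketch below does this for Y = X^2 with X exponential; the mean parameter is named beta, its value 2 is illustrative, and the central-difference step size is our choice.

```python
from math import exp, sqrt

def cdf_y(y, beta):
    """F_Y(y) = 1 - exp(-sqrt(y) / beta) for y >= 0."""
    return 1.0 - exp(-sqrt(y) / beta)

def density_y(y, beta):
    """f_Y(y) = f_X(sqrt(y)) * d(sqrt(y))/dy for y > 0."""
    f_x = exp(-sqrt(y) / beta) / beta
    return f_x / (2.0 * sqrt(y))

def central_difference(f, y, h=1e-5):
    """Numerical derivative of f at y."""
    return (f(y + h) - f(y - h)) / (2.0 * h)

beta = 2.0  # illustrative mean, not from the notes
for y in (0.5, 4.0, 9.0):
    slope = central_difference(lambda t: cdf_y(t, beta), y)
    print(y, slope, density_y(y, beta))  # the last two columns should agree closely
```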