
MATH2011 Statistical Distribution Theory

Chapter 1: Discrete random variables and some simple examples


In this module we are going to examine the properties of distributions of random variables, we
begin by defining what we mean by a random variable.

1.1 Definitions:
The set of all possible elementary outcomes of an experiment is called the sample space.
A random variable is a mapping of a sample space to the real line.
Note: we will usually denote a random variable by a capital letter (e.g. Y) and the value taken by a random
variable by a lower case letter (e.g. y).
A discrete random variable is a random variable that can take only a finite or countably infinite
number of values.
Note: in this module discrete random variables will usually be integer-valued.
[see 1024 P10]

1.2 Examples
If the experiment consists of tossing a coin with outcomes head or tail and we toss the coin once,
then clearly the sample space is {head, tail}. One possible random variable, X say, is the number of
heads obtained. X can only take values 0 or 1 and so is a discrete random variable.
Suppose we are interested in monitoring the number of hits at a web site in a year. Denote the
number by Y. Clearly, Y must be integer-valued and so is a discrete random variable, but there is no
obvious upper bound to Y, so it may be convenient to take the set of possible values of Y to be the
countably infinite set {0, 1, 2, …}.

1.3 The probability function of a discrete random variable


Associated with a discrete random variable is a probability function (sometimes called the
probability mass function), which gives the probability of each possible value of the random
variable.
Let the random variable of interest be denoted by X with a set of possible values D. Then the
probability function p(.) is given by
p(x) = P(X = x), for x ∈ D.
Note: It is important to include the domain D when specifying a probability function.

Clearly p(x) ≥ 0 for all real values x and Σ_{x∈D} p(x) = 1.

[see 1024 P10]

1.4 Example
If the experiment consists of tossing a coin with outcomes head or tail and we toss the coin once,
then clearly the sample space is {head, tail}. Suppose X, the number of heads obtained, is our
random variable of interest. Then, if the coin is fair, the probability function of X is
p(0) = p(1) = 1/2.
However, if the fairness of the coin is unknown the probability function of X could be taken to be
p(x) = θ^x (1 − θ)^(1−x), x = 0, 1.
Here θ is a parameter, i.e. a fixed but unknown constant. Clearly, θ must lie between 0 and 1 in
this case since it is a probability. This is a common situation encountered in Statistics: we might
assume we know the form of a probability function but it contains one or more unknown quantities
(parameters) whose value(s) we need to estimate from sample data.

1.5 Bernoulli random variables

A Bernoulli trial is an experiment with just two possible outcomes, success and failure, which
occur with probabilities θ and 1 − θ respectively, where θ is the success probability.
A Bernoulli random variable X has probability function

p(x) = θ^x (1 − θ)^(1−x), x = 0, 1,
where 0 ≤ θ ≤ 1.
This is a basic building block for some familiar but more complex discrete random variables.
[see 1024 P10]

1.6 Discrete random variables derived from Bernoulli trials


1.6.1 Binomial random variables
Suppose we undertake a fixed number, n, of independent Bernoulli trials, each with success
probability θ. Let X be the number of successes in these n trials. Then X is a Binomial random
variable with probability function

p(x) = n!/(x!(n − x)!) θ^x (1 − θ)^(n−x), x = 0, 1, …, n,

where 0 ≤ θ ≤ 1.
We will often say in such circumstances X is Binomially distributed or X is Binomial(n, θ) or
X ~ Binomial(n, θ).
[see 1024 P10]
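As a quick numerical check of this probability function, here is a minimal Python sketch using only the standard library; the values n = 10 and θ = 0.3 are illustrative choices, not taken from the notes.

from math import comb

def binomial_pmf(x, n, theta):
    # p(x) = n!/(x!(n-x)!) * theta^x * (1 - theta)^(n - x), for x = 0, 1, ..., n
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

n, theta = 10, 0.3                                   # illustrative values only
probs = [binomial_pmf(x, n, theta) for x in range(n + 1)]
print(probs)
print(sum(probs))                                    # sums to 1, as a probability function must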

1.6.2 Negative Binomial random variables (including Geometric random variables)
Suppose we undertake a sequence of independent Bernoulli trials, each with success probability θ.
Let X be the number of failures that occur before the kth success. Then X is called a Negative
Binomial random variable with probability function

p(x) = (k + x − 1)!/((k − 1)! x!) θ^k (1 − θ)^x, for x = 0, 1, 2, …,

where 0 ≤ θ ≤ 1.
In the special case with k = 1, X is called a Geometric random variable with probability function

p(x) = θ(1 − θ)^x, for x = 0, 1, 2, … .

[see 1024 P11]
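The Negative Binomial and Geometric probability functions above are equally easy to evaluate numerically; a small Python sketch, with k = 2 and θ = 0.5 chosen purely for illustration.

from math import factorial

def neg_binomial_pmf(x, k, theta):
    # p(x) = (k + x - 1)! / ((k - 1)! x!) * theta^k * (1 - theta)^x, x = 0, 1, 2, ...
    return factorial(k + x - 1) / (factorial(k - 1) * factorial(x)) * theta**k * (1 - theta)**x

k, theta = 2, 0.5                                    # illustrative values only
print([round(neg_binomial_pmf(x, k, theta), 4) for x in range(8)])

# With k = 1 this reduces to the Geometric probability function theta * (1 - theta)^x.
print([round(neg_binomial_pmf(x, 1, theta), 4) for x in range(8)])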

1.7 Poisson random variables


Poisson random variables arise in a variety of practical situations where events occur apparently at
random with a rate of occurrence λ per unit time, e.g. queuing theory. The random variable Y is
defined as
Y = the number of events in an interval of fixed length t.
Provided that an event occurs instantaneously, the range of possible values for Y is 0, 1, 2, …,
showing that this is a discrete random variable defined over the non-negative integers.
Then the probability function for a Poisson random variable is given by

p(y) = e^(−λt) (λt)^y / y!, y = 0, 1, ...,
where λ > 0.
This result was obtained in detail in MATH1024.
Of course, by defining the time unit appropriately one can take t = 1, giving probability function

p(y) = e^(−λ) λ^y / y!, y = 0, 1, ...,

where λ > 0. This is a useful form of the probability function if one is aiming to model count data
more generally, particularly when time is not the main focus.
[see 1024 P13]

1.8 A relationship between Poisson and Binomial random variables


Suppose that X ~ Binomial(n, θ) with nθ = λ and let n → ∞; then it can be shown that for fixed x

p(x) = n!/(x!(n − x)!) θ^x (1 − θ)^(n−x) → e^(−λ) λ^x / x!  as n → ∞.

So for large n and small θ, Binomial probabilities can be approximated by Poisson probabilities.
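To see the quality of the approximation numerically, here is a short Python sketch comparing Binomial(n, θ) probabilities with Poisson(λ = nθ) probabilities; n = 100 and θ = 0.02 (so λ = 2) are illustrative choices of a large n and a small θ.

from math import comb, exp, factorial

n, theta = 100, 0.02                 # illustrative: large n, small theta
lam = n * theta                      # lambda = n * theta = 2

for x in range(6):
    binom_p = comb(n, x) * theta**x * (1 - theta)**(n - x)
    poisson_p = exp(-lam) * lam**x / factorial(x)
    print(x, round(binom_p, 4), round(poisson_p, 4))

The two sets of probabilities should be close, which is the sense in which the Poisson probabilities approximate the Binomial ones.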

1.9 A practical illustration of using discrete random variables

Though this module is concerned primarily with obtaining theoretical results about random
variables, it is useful to remember that the results are often useful in applications. Here is an
illustration of a simple example in statistical modelling.
Suppose that we have data consisting of the number of oil-producing wells in a region of Texas
(data given in Davis (1986)). This shows the locations of oil-field discovery wells in part of the
Eastern Shelf area of the Permian Basin, Fisher and Noland counties, Texas. One question that
could be asked of these data is: are the oil wells occurring at random in this region, or is there some
pattern to their distribution?

We may investigate this by defining a suitable random variable and investigating its distribution to
see whether a Poisson distribution might be appropriate. If so, then this would confirm that the
wells are occurring at random. Suppose that we define selected areas according to the grid of
squares in the picture below (these squares are called quadrats), and count the number of wells in
each area. Then the pictorial representation has been transformed into data consisting of counts of
wells over the grid. The data are discrete, in that only integer values are obtainable.


FIGURE 1. Locations of oil field discovery wells in part of the Eastern Shelf
area of the Permian Basin, Fisher and Noland counties, Texas. Quadrats are
approximately 10 square miles in size.

Suppose we count the number of wells in each quadrat. The data set produced is shown below,
from which we might ask the following questions.

1 What is the average number of wells per quadrat?


2 What proportion of quadrats have no wells?
3 Is this the pattern we would expect if these wells were randomly distributed over this region?
The counts in the 160 quadrats are shown below:

0 1 2 2 1 2 0 0 0 0
0 0 2 0 3 3 1 1 0 0
0 3 0 0 3 5 2 1 1 0
2 3 1 0 1 4 1 1 0 1
3 2 2 0 0 1 2 1 2 0
3 0 0 3 1 0 2 1 0 0
2 0 0 0 0 3 1 1 1 0
4 0 0 1 0 0 4 2 1 1
2 0 1 0 0 0 0 1 2 1
1 0 0 1 0 0 2 2 1 0
0 2 2 0 0 1 0 0 0 1
0 1 0 0 1 1 1 0 2 2
1 1 3 4 2 1 3 0 1 0
0 3 3 3 1 0 0 0 0 1
2 2 3 1 1 0 0 0 0 0
6 3 1 2 2 1 0 0 0 0

The first step is to summarise these data and extract answers to some or all of these questions.
If we count the number of 0s, 1s, etc. and arrange the results in a table, the following is produced:

Number   Frequency   Cumulative frequency   Relative frequency   Cumulative relative frequency
0 69 69 0.4313 0.4313
1 43 112 0.2687 0.7000
2 26 138 0.1625 0.8625
3 16 154 0.1000 0.9625
4 4 158 0.0250 0.9875
5 1 159 0.0062 0.9938
6 1 160 0.0062 1.0000

This table shows the frequency of the number of wells in a quadrat, the cumulative frequency, the
proportion and the cumulative proportion. We can see that a proportion 0.4313 of the quadrats
contain no wells. Multiplying these proportions by 100 gives percentage frequencies which show
the percentage of the sample which has 0, 1, etc wells.

To obtain a diagram illustrating this data the frequencies or relative frequencies may be plotted
against the count number as in the diagram below.

Figure 2 Frequency diagram for the discrete well data

The average number of wells per quadrat is found by adding the 160 observations and dividing by
the sample size (i.e. number of quadrats = 160).

i.e. average = (0 + 1 + 2 + 2 + … + 0)/160
= (43 × 1 + 26 × 2 + 16 × 3 + … + 1 × 6)/160
= Σ(no. of wells × freq)/160
= Σ(no. of wells × rel. freq)
= 1.0625.
i.e. on average there is just over one well per quadrat in this area.

Does the Poisson distribution provide a good explanation for these data?
Notice that if t is the length of the interval considered (in this case the area of a quadrat), and λ is
the rate of occurrence per unit area, then we would expect on average an interval to contain μ = λt
events. This value is the mean of the random variable of interest and is the parameter in the
Poisson probability function which must be determined in order to try to answer the question. Since

we do not know the true mean μ or the true rate λ of the occurrence of the oil wells, we shall need
to use the sample mean 1.06 as an estimate of μ.

If the Poisson distribution fits the data, the relative frequencies should be close to the
corresponding Poisson probabilities with parameter 1.06.

These are calculated below:


p(0) = e^(−1.06) (1.06)^0 / 0! = 0.3464
p(1) = e^(−1.06) (1.06)^1 / 1! = 0.3672
p(2) = e^(−1.06) (1.06)^2 / 2! = 0.1946
p(3) = e^(−1.06) (1.06)^3 / 3! = 0.0687
p(4) = e^(−1.06) (1.06)^4 / 4! = 0.0182
p(5) = e^(−1.06) (1.06)^5 / 5! = 0.0038
p(6) = e^(−1.06) (1.06)^6 / 6! = 0.0007

The expected frequencies for this model may be found using 160 × p(x), for x = 0, 1, 2, ..., giving:

Number of wells   Observed frequencies   Relative frequencies   Poisson probabilities   Expected frequencies
0 69 0.4313 0.3464 55.4
1 43 0.2687 0.3672 58.8
2 26 0.1625 0.1946 31.4
3 16 0.1000 0.0687 11.0
4 4 0.0250 0.0182 2.9
5 1 0.0062 0.0038 0.6
6 1 0.0062 0.0007 0.1

with very small values for x = 7, 8, 9, … .
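The fitted probabilities and expected frequencies in the table above can be reproduced in a few lines of Python; a minimal sketch, using the sample mean 1.06 as the Poisson parameter exactly as in the text.

from math import exp, factorial

observed = [69, 43, 26, 16, 4, 1, 1]        # frequencies of 0, 1, ..., 6 wells per quadrat
n_quadrats = sum(observed)                  # 160
print(sum(x * f for x, f in enumerate(observed)) / n_quadrats)   # sample mean 1.0625

lam = 1.06                                  # estimate of the Poisson mean, as in the notes
for x, obs in enumerate(observed):
    p = exp(-lam) * lam**x / factorial(x)   # Poisson probability
    print(x, obs, round(obs / n_quadrats, 4), round(p, 4), round(n_quadrats * p, 1))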

There are considerable differences between the observed and expected frequencies: there are too
many quadrats with no wells and too many with large numbers of wells, relative to what would be
expected under the Poisson assumptions (which themselves depend on events, i.e. the presence of a
well, being randomly distributed in two-dimensional space). The problem here appears to be that the
wells are clustering together too much for the Poisson model, which depends on random occurrence,
to be valid. So our Poisson model appears not to be suitable in this case. In fact, a version of the
Negative Binomial model (see
model appears not to be suitable in this case. In fact, a version of the Negative Binomial model (see
section 1.6.2) turns out to fit the data very well. The details are beyond what we have done so far
(they require results that you will meet in the second half of the module) but, purely for illustration
that a good model for these data can be found, here are the corresponding results assuming a
Negative Binomial model.

Number of wells   Observed frequencies   Relative frequencies   Neg. Bin. probabilities   Expected frequencies
0 69 0.4313 0.4124 66.0
1 43 0.2687 0.3112 49.8
2 26 0.1625 0.1611 25.8
3 16 0.1000 0.0706 11.3
4 4 0.0250 0.0281 4.5
5 1 0.0062 0.0106 1.7
6 1 0.0062 0.0038 0.6

with very small values for x = 7,8,9...

The fit here is seen to be very much better, indicating that these oil-producing wells tend to cluster.
The Negative Binomial is thus seen to be a more appropriate model than the Poisson in explaining
this data set.

Final remarks
In this chapter and in subsequent chapters, where material appeared in MATH1024 lectures, I refer
you back to the relevant notes from 2015-16 using the notation 1024 Px or 1024 Sx to mean
Probability Lecture x or Statistics Lecture x from the MATH1024 notes.
Where relevant I also give information where the material can be found in Mood, Graybill and Boes
(MGB), though the notation in MGB is not always the same as in these notes.
For this chapter the MGB reference is:
MGB Chapter II section 3.1 and Chapter III sections 2.2, 2.4 and 2.5.

MATH2011 Statistical Distribution Theory

Chapter 2: Continuous random variables and some simple examples


We shall now look at random variables which may take any values within an interval, for example
in the range from zero to infinity. Such random variables are called continuous random variables.
In this chapter we revise three well-known types of continuous random variables.

2.1 Exponential random variables


Suppose that we have a Poisson process [1024 P13] with events occurring at random at rate λ per
unit time, but this time we consider the random variable Y = the time interval between two events.
Clearly this variable cannot be negative, but can take any positive value.
The domain of Y is (0, ∞), or 0 < y < ∞.
Consider P(Y > y) = P(no events in an interval of length y)
= e^(−λy) (λy)^0 / 0!
= e^(−λy),
i.e. P(Y ≤ y) = 1 − e^(−λy) for 0 < y < ∞.
Y is called an Exponential random variable. [1024 P17]

2.2 Cumulative distribution function


The expression P(Y ≤ y) above is known as the cumulative distribution function (cdf) of Y, and
gives the probability that Y ≤ y, denoted by F(y) = P(Y ≤ y).
The cdf of any random variable is bounded by 0 and 1 and is monotonically increasing. In addition,
for a continuous random variable it is also a continuous function.
For an Exponential random variable, as in 2.1, we have:
F(y) = 1 − e^(−λy) for 0 < y < ∞.
[1024 P17]

2.3 Probability density function


If for a random variable Y there exists a function f(y) such that F(y) = ∫_{−∞}^{y} f(u) du, then f(y) is called
the probability density function (pdf) of Y.

If such a function exists, then it has certain properties:
a) f(y) ≥ 0, for all y.
b) ∫_{−∞}^{∞} f(y) dy = 1.
c) ∫_{a}^{b} f(y) dy = F(b) − F(a) = P(a < Y ≤ b).

General relationship between the density function and the distribution function
We may find f(y) from F(y) by noting that
f(y) = dF(y)/dy
or, if we have f(y), we may find
F(y) = ∫_{−∞}^{y} f(u) du.
Typically, in many situations of practical interest, the pdf is more convenient to use than the cdf.
[1024 P17]

2.4 Example: Probability density function of an exponential random variable


For the example concerning the time interval between events in a Poisson process,
F(y) = 1 − e^(−λy), y > 0.
Consider the function f(y) = λ e^(−λy), y > 0.
This is such that, using F(y) = ∫_{−∞}^{y} f(u) du,

we have that F(y) = ∫_{0}^{y} λ e^(−λu) du = 1 − e^(−λy), y > 0,

which is the cdf of the Exponential distribution.

Thus f(y) = λ e^(−λy), y > 0, is the density function of the Exponential distribution.
[1024 P17]

Here is a numerical example.


Suppose the lifetime of a certain type of electronic component is described by an Exponential
random variable with parameter λ = 0.01 hours⁻¹. What is the probability such a component will
have a lifetime of between 100 and 200 hours?
The probability is the area under the curve f(y) = 0.01 e^(−0.01y) between y = 100 and y = 200:

P(100 ≤ Y ≤ 200) = ∫_{100}^{200} 0.01 e^(−0.01y) dy
= e^(−1) − e^(−2) = 0.37 − 0.14 = 0.23.
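This calculation is easily checked numerically using the Exponential cdf F(y) = 1 − e^(−λy) rather than integrating directly; a minimal Python sketch.

from math import exp

lam = 0.01                                  # rate per hour
def F(y):
    # Exponential cdf: F(y) = 1 - exp(-lam * y)
    return 1 - exp(-lam * y)

# P(100 <= Y <= 200) = F(200) - F(100) = e^(-1) - e^(-2)
print(F(200) - F(100))                      # approximately 0.23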

2.5 Normal random variables
The most important continuous probability model is the Normal distribution. Two examples of
Normal distributions superimposed on observed sets of data are given below. The first example
shows a frequency diagram of heights of young adult males while the second involves diastolic
blood pressures of schoolboys.


Figure 1. A distribution of heights of young adult males, with an approximating Normal
distribution (Martin, 1949, Table 17 (Grade 1))


Figure 2. A distribution of diastolic blood pressures of schoolboys, with an approximating
Normal distribution (Rose, 1962, Table 1)

Both distributions are approximately symmetrical about their central values and they exhibit a
similar shape, even though the units of the measurements are very different.
The observed frequencies have been approximated by a smooth curve which in each case is based
on a Normal probability distribution model with appropriately chosen mean and standard deviation
(see later).
Normal random variables are ubiquitous in theoretical statistics and in application areas. One reason
for this is the central limit theorem (which you met last year and which we shall prove later in this
module).
[1024 P19-20]

2.6 Probability density function of the Normal random variable


The pdf of a Normal random variable is given by

f(y) = (1/(σ√(2π))) exp(−(y − μ)²/(2σ²)), −∞ < y < ∞

(where exp(z) is a convenient way of writing the exponential function e^z). Note that μ and σ > 0
are parameters. If Y is a random variable with the above pdf, we will write Y ~ N(μ, σ²).

The pdf of a Normal random variable is the familiar bell curve.

The curve is shown in Figure 3. On the horizontal axis of this figure are marked the positions of
the mean μ and the values of y that differ from μ by ±σ, ±2σ and ±3σ. The symmetry of the curve
about μ is evident from the mathematical model, since changing the sign of (y − μ) leaves f(y) unchanged.

The figure shows that a relatively small proportion of the area under the curve lies outside the two
values y = μ − 2σ and y = μ + 2σ. The vertical scale is arranged so that the area under the curve is
equal to one. This implies that the area between any two points on the horizontal axis represents
the probability that the variable takes a value between these two points. For example, the
probability that the variable takes a value in the interval y = μ − 2σ up to y = μ + 2σ is very nearly
0.95 and the probability that Y lies outside this range is correspondingly approximately 0.05. It is
important to be able to find the area under any part of a Normal pdf.


Figure 3 The probability density function of a Normal random variable showing the scales of
the original variable and the standardised variable.

Now f(y) depends on two parameters, the mean μ and standard deviation σ. It might be thought
therefore that any relevant probabilities would have to be worked out separately for every pair of
values μ, σ. Fortunately this is not so. We have seen that the probability that Y lies in the interval
μ − 2σ up to μ + 2σ is about 0.95, which is true without specifying the values of μ and σ. In fact
the probabilities depend on an expression of the departure of y from μ as a multiple of σ. The
statement above is equivalent to saying that there is a probability of approximately 0.95 that y lies
within two standard deviations of the mean. On the diagram these multiples are marked on the axis
as ±1, ±2 and ±3 as shown on the lower scale. The probabilities under various parts of any Normal
pdf may be expressed in terms of the standardised deviate

z = (y − μ)/σ.
A few important results are given in the table below. More detailed tables of the Normal
probabilities are available (just search on standard normal tables online).

Some probabilities associated with Normal random variables
Standardised deviate      Probability of greater deviation
z = (y − μ)/σ             In either direction     In one direction
0.0 1.000 0.500
1.0 0.317 0.159
2.0 0.046 0.023
3.0 0.0027 0.0013

1.645 0.10 0.05


1.960 0.05 0.025
2.576 0.01 0.005

This table shows probabilities of obtaining a standardised deviate z = (y − μ)/σ more extreme (in
either direction or in one direction) than the tabulated value. For example, for z = 2.0 the
probability of obtaining a value of (y − μ)/σ outside ±2.0 is 0.046, while the probability of (y − μ)/σ
being greater than 2.0 is 0.023. (By symmetry, the probability that (y − μ)/σ is less than −2.0 is
also 0.023.) The figure below illustrates these probabilities.

[Figure: the standard Normal density, with the two tail areas of 0.023 beyond z = ±2 shaded.]
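The tabulated probabilities can be reproduced from the standard Normal cdf Φ; a small Python sketch using math.erf, relying on the identity Φ(z) = ½(1 + erf(z/√2)).

from math import erf, sqrt

def Phi(z):
    # standard Normal cdf
    return 0.5 * (1 + erf(z / sqrt(2)))

for z in [0.0, 1.0, 2.0, 3.0, 1.645, 1.960, 2.576]:
    one_sided = 1 - Phi(z)                  # P(Z > z)
    two_sided = 2 * one_sided               # P(|Z| > z), by symmetry
    print(z, round(two_sided, 4), round(one_sided, 4))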

The usual tabulation of Normal probabilities is in the form of the cumulative probability that
z = (y − μ)/σ is less than the tabulated value. This may be used for any Normally distributed
random variable Y ~ N(μ, σ²) because

P(Y ≤ y) = F(y) = ∫_{−∞}^{y} (1/(σ√(2π))) exp(−(u − μ)²/(2σ²)) du
= ∫_{−∞}^{(y−μ)/σ} (1/√(2π)) exp(−z²/2) dz = Φ((y − μ)/σ) = P(Z ≤ (y − μ)/σ),

where Z ~ N(0, 1) is a standard Normal random variable.

[1024 P19-20]

2.7 Example of Normal probability calculations
Suppose that daily water use at a factory varies about a mean of 15,500 gallons with standard
deviation 1,140 gallons. If demand is Normally distributed

(i) What proportion of days does the demand fall short of 14,000 gallons?
(ii) What proportion of days does demand exceed 18,000 gallons?
(iii) What is your reaction to a demand of 35,000 gallons?

In each case we first need to calculate the standardised deviate, z = (y − μ)/σ. Using the table
of the Normal distribution function, and using the symmetry property where necessary, we have
(i) z = (14,000 − 15,500)/1,140 = −1.32.
From tables, the upper tail probability for z = 1.32 is 0.0934, and the lower tail
probability for z = −1.32 will be identical.
Thus 9.34% of daily demands fall short of 14,000 gallons.

(ii) z = (18,000 − 15,500)/1,140 = 2.19, with upper tail probability 0.01426, i.e. about
1% of daily demands exceed 18,000 gallons.

(iii) z = (35,000 − 15,500)/1,140 = 17.11. This lies beyond the range of the tables, but the
tail probability is less than one in a billion. One would be surprised and an
explanation would be sought. It is possible that a mis-recording error has occurred, such
as two days' data being taken together. This idea of surprise at an extreme result of
low probability, as predicted by a statistical model, will be important later in this
module and also in modules such as MATH2010 Statistical Methods I.
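The three calculations can be checked directly with the same Φ function as before; a minimal Python sketch, again using math.erf for the standard Normal cdf.

from math import erf, sqrt

def Phi(z):
    # standard Normal cdf
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 15500, 1140
print(Phi((14000 - mu) / sigma))            # (i)   about 0.09
print(1 - Phi((18000 - mu) / sigma))        # (ii)  about 0.014
print(1 - Phi((35000 - mu) / sigma))        # (iii) vanishingly small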

Frequency distributions resembling the Normal pdf in shape are often observed but this form should
not be taken as the norm - despite the use of the name 'Normal'. Many observed distributions are
undeniably far from 'Normal' in shape yet should not be regarded as abnormal in any way.

The importance of Normal random variables lies in the central place they occupy in sampling
theory, which we shall discuss later. Many of the usual estimation and testing procedures require
that the Normal model for the behaviour of the measurement is reasonably valid.

2.8 The use of a Normal approximation to Binomial probabilities
We have seen in 1.6.1 that the Binomial model is appropriate when considering the number of
successes in independent Bernoulli trials. However, Binomial probabilities can often be
approximated by Normal probabilities when n, the number of trials, is large.

Suppose that we have a Binomial situation, i.e. n trials of a dichotomous random variable (success
or failure) with constant probability of success θ. The probability that the number of successes is r is
given by the Binomial probabilities

P(Y = r) = n!/(r!(n − r)!) θ^r (1 − θ)^(n−r) for r = 0, 1, …, n.

Therefore the probability that Y takes a value between r1 and r2 is given by

P(r1 ≤ Y ≤ r2) = Σ_{r=r1}^{r2} n!/(r!(n − r)!) θ^r (1 − θ)^(n−r).

The Binomial mean μ = nθ and variance σ² = nθ(1 − θ) may be used here (see Chapter 4 for
details).

If n is sufficiently large, the observed number of individuals Y (with the particular characteristic,
success) in the sample of size n is approximately Normally distributed with mean value μ = nθ and
standard deviation σ = √(nθ(1 − θ)).

Account should be taken of the fact that Binomial random variables are discrete, while the Normal
is continuous. Slightly more accurate approximations are provided if a continuity correction is
used. Thus the probability that the sample will contain between r1 and r2 individuals with the
characteristic of interest is approximately given by the standard Normal probability from

z1 = (r1 − 0.5 − nθ)/√(nθ(1 − θ))   to   z2 = (r2 + 0.5 − nθ)/√(nθ(1 − θ)).
[1024 P21]
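A sketch of the approximation in action, compared against the exact Binomial sum; the values n = 100, θ = 0.4, r1 = 35 and r2 = 45 are illustrative only.

from math import comb, erf, sqrt

def Phi(z):
    # standard Normal cdf
    return 0.5 * (1 + erf(z / sqrt(2)))

n, theta = 100, 0.4                          # illustrative values
r1, r2 = 35, 45
mu = n * theta                               # Binomial mean
sigma = sqrt(n * theta * (1 - theta))        # Binomial standard deviation

# exact Binomial probability P(r1 <= Y <= r2)
exact = sum(comb(n, r) * theta**r * (1 - theta)**(n - r) for r in range(r1, r2 + 1))

# Normal approximation with continuity correction
z1 = (r1 - 0.5 - mu) / sigma
z2 = (r2 + 0.5 - mu) / sigma
approx = Phi(z2) - Phi(z1)

print(exact, approx)                         # the two values should be close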

2.9 Lognormal random variables

Perhaps the second most commonly occurring distribution in scientific investigations is the
Lognormal. The random variable Y is said to be Lognormal if X = log Y is a Normal random
variable. (Note that all logarithms are assumed to be to base e in this module.)

We shall show later that the pdf is given by:

f(y) = (1/(yσ√(2π))) exp(−(log y − μ)²/(2σ²)), y > 0.
Note that, as in the Normal case, the Lognormal model is a two-parameter model. It has applications
in a variety of fields, such as Economics, where a multiplicative form of the central limit theorem may
apply.
[1024 P additional handout]

Final remarks
In MGB the material in this chapter is covered in Chapter II section 2 and Chapter III sections 3.2,
3.3 and 3.5.

MATH2011 Statistical Distribution Theory

Chapter 6: Maxima and Minima

6.1 Order Statistics


Suppose that Y1, …, Yn represent n independently and identically distributed random
variables each with cumulative distribution function F.
Suppose that the corresponding observed values are y1, …, yn.
Let these values, when ordered, be represented by

y(1) < y(2) < … < y(n).

The y(i), i = 1, …, n, are called the order statistics corresponding to y1, …, yn.

You have already met certain order statistics. For example, the sample median is an order
statistic: for odd values of n the sample median is equal to y({n+1}/2), while for even n the
sample median is defined as

median = [y(n/2) + y(n/2+1)]/2.

We shall concentrate, however, on two particular order statistics: y(1), the sample
minimum, and y(n), the sample maximum. We define the corresponding random variables:

Y(1) = min{Y1, …, Yn}; Y(n) = max{Y1, …, Yn}.

6.2 Applications where maxima and/or minima are of interest

There are many areas of application. For example:

Understanding outliers in statistical data: an outlier will usually be the largest or
smallest data point.

In reliability engineering a system will tend to fail at its weakest point (which might
be thought of as the point with the minimum strength).

In designing coastal defences one needs to understand the distribution of the wave
heights of the highest tides.

In insurance the behaviour of the largest claims is important.

There is a whole area of statistics devoted to the study of extremes (Extreme value
theory). In this short chapter we just give a brief introduction to the subject.

6.3 The cdf of Y(n), the largest value in a random sample of size n.
Since Y(n) = max{Y1, …, Yn}, the probability that Y(n) ≤ y gives the cumulative distribution
function of Y(n), the sample maximum.

Now the event {Y(n) ≤ y} is identical to the event {Y1 ≤ y and Y2 ≤ y and … and Yn ≤ y}.

So
Gn(y) = P(Y(n) ≤ y) = P(all Yi ≤ y) = P(Y1 ≤ y and Y2 ≤ y and … and Yn ≤ y).

Thus, by independence,

Gn(y) = P(Y1 ≤ y) P(Y2 ≤ y) … P(Yn ≤ y) = [F(y)]^n.

6.4 Example: a simple discrete experiment
Suppose I roll a fair die twice. What is the probability function of the maximum of the two
scores?
We know that F(y) = y/6 for y = 1, 2, 3, 4, 5, 6. Also, n = 2 in this example.
So the distribution function of the higher score is:
G2(y) = (y/6)² for y = 1, 2, …, 6.
Hence
P(Y(2) = y) = (y/6)² − [(y − 1)/6]² for y = 1, 2, …, 6.
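This probability function is easy to verify by enumerating the 36 equally likely outcomes; a minimal Python sketch.

from fractions import Fraction

# probability function of the maximum of two fair die rolls, by direct enumeration
counts = {}
for a in range(1, 7):
    for b in range(1, 7):
        m = max(a, b)
        counts[m] = counts.get(m, 0) + 1

for y in range(1, 7):
    enumerated = Fraction(counts[y], 36)
    formula = Fraction(y**2 - (y - 1)**2, 36)    # (y/6)^2 - ((y-1)/6)^2
    print(y, enumerated, formula)                # the two columns agree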

6.5 The pdf of the maximum in the continuous case
If the Yi are continuous, each with density function f, then the density function of Y(n) may
be found by differentiating Gn(y) with respect to y to give:

gn(y) = d/dy [F(y)]^n = n [F(y)]^(n−1) f(y),

where the domain of the maximum is the same as that of each of the Yi.

6.6 Example: the maximum of a uniform random sample

Suppose each of the Yi is Uniform on (0, θ).

Then on (0, θ) we have F(y) = y/θ.
Hence
gn(y) = n (y/θ)^(n−1) (1/θ) = n y^(n−1)/θ^n, 0 < y < θ.

Here is an illustration of the above pdf when θ = 1 and n = 1, 5 and 10.
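The same illustration can be produced numerically; a minimal Python sketch evaluating gn(y) = n y^(n−1)/θ^n with θ = 1 at a grid of y values for n = 1, 5 and 10.

theta = 1.0

def g(y, n):
    # pdf of the maximum of n Uniform(0, theta) variables: n * y^(n-1) / theta^n
    return n * y**(n - 1) / theta**n

for y in [0.1 * i for i in range(1, 10)]:
    print(round(y, 1), [round(g(y, n), 3) for n in (1, 5, 10)])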

Note how the probability piles up against the upper end point of the domain of the pdf.
How do you think the expected value and variance of the sample maximum in this case
will change as n increases?
What would happen to Y(n) when the domain of the Yi has no finite upper end point?

6.7 The cdf of Y(1), the smallest value in a random sample of size n.
Since Y(1) = min{Y1, …, Yn}, the probability that Y(1) ≤ y gives the cumulative distribution
function of Y(1), the smallest value in the sample.

Now

P(Y(1) ≤ y) = 1 − P(Y(1) > y) = 1 − P(all Yi > y) = 1 − P(Y1 > y and Y2 > y and … and Yn > y),

which gives, using independence,

G1(y) = 1 − P(Y1 > y) P(Y2 > y) … P(Yn > y) = 1 − [1 − F(y)]^n.

6.8 The pdf of the minimum in the continuous case

If the Yi are continuous, each with probability density function f, then the pdf of Y(1) may
be found by differentiating G1(y) with respect to y to give:

g1(y) = n [1 − F(y)]^(n−1) f(y),

where the domain of the minimum is the same as that of each of the Yi.

6.9 Example: The distribution of the minimum of an Exponential random sample.
Suppose that Yi for i = 1, …, n, are independent Exponential random variables, each with
probability density function
f(y) = λ e^(−λy), 0 < y < ∞.

Find the pdf of the smallest value in this sample of size n.

Here we have Y(1) = min{Y1, …, Yn} and the pdf of Y(1) is

g1(y) = n [1 − F(y)]^(n−1) f(y) for 0 < y < ∞.

Now F(y) = 1 − e^(−λy) so that

g1(y) = n (e^(−λy))^(n−1) λ e^(−λy) = nλ e^(−nλy), for 0 < y < ∞.

That is, the distribution of the smallest value in an Exponential random sample of size n
with parameter λ (i.e. with mean value 1/λ) is also an Exponential random variable but
with parameter nλ (i.e. with mean value 1/(nλ)).

So in this case Y(1) has expected value 1/(nλ) and variance 1/(nλ)², which both decrease as
n increases.

Is that a surprise?
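The closure result can also be checked by simulation; a minimal Python sketch that draws repeated samples of size n from an Exponential(λ) distribution and compares the sample mean and variance of the minimum with 1/(nλ) and 1/(nλ)². The values λ = 2 and n = 5 are illustrative only.

import random

lam, n, reps = 2.0, 5, 100_000                   # illustrative parameter values
minima = [min(random.expovariate(lam) for _ in range(n)) for _ in range(reps)]

mean = sum(minima) / reps
var = sum((m - mean) ** 2 for m in minima) / reps

print(mean, 1 / (n * lam))                       # both close to 0.1
print(var, 1 / (n * lam) ** 2)                   # both close to 0.01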

6.10 Closing remarks
We have only given a taster here of an interesting area of statistics.
We have only looked at the marginal behaviour of the minimum and the maximum,
and we have only considered the extreme order statistics. The results can be extended
to include other order statistics.
The closure result in 6.9 hints at some interesting structure in the probabilistic
behaviour of maxima and minima. The central limit theorem basically says that under
certain conditions the sum of n independent, identically distributed random variables
is approximately Normal as n grows large. There are corresponding results for
maxima and minima, though the large-n distribution is not Normal (it is the so-called
generalised extreme value distribution).
MGB go into much greater detail on this topic in their Chapter VI section 5.

