
Review of Probability and

Statistics

Aswini Kumar Mishra


Faculty, Department of Economics

BITS-PILANI, K.K. BIRLA GOA CAMPUS


Some Important Definitions
Random Experiments
Sample Space
Sample Points
Events
Types of Events: Mutually Exclusive,
Equally Likely, Collectively Exhaustive

2
Probability Definitions
Classical Definition
Empirical Definition
Probability Properties: Probability of an Event
Mutually Exclusive and Exhaustive Events
Statistically Independent Events

3
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE

[Table: possible outcomes of the red die: 1, 2, 3, 4, 5, 6]

This sequence provides an example of a discrete random variable. Suppose that you have
a red die which, when thrown, takes the numbers from 1 to 6 with equal probability.
4
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE

[Table: columns = red die outcomes 1–6, rows = green die outcomes 1–6; the cells are still empty]

Suppose that you also have a green die that can take the numbers from 1 to 6 with equal
probability.
5
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE

[Table: columns = red die outcomes 1–6, rows = green die outcomes 1–6; the cells will hold X, the sum]

We will define a random variable X as the sum of the numbers when the dice are thrown.

6
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE

[Table: as above, with the cell for red = 4, green = 6 filled in with X = 10]

For example, if the red die is 4 and the green one is 6, X is equal to 10.

7
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE

[Table: as above, with the cell for red = 2, green = 5 filled in with X = 7]

Similarly, if the red die is 2 and the green one is 5, X is equal to 7.

8
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE

            red   1    2    3    4    5    6
green   1         2    3    4    5    6    7
        2         3    4    5    6    7    8
        3         4    5    6    7    8    9
        4         5    6    7    8    9   10
        5         6    7    8    9   10   11
        6         7    8    9   10   11   12

The table shows all the possible outcomes.

9
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE

[Table: the full table of sums, as above, with a column listing the possible values X = 2, 3, ..., 12]

If you look at the table, you can see that X can be any of the numbers from 2 to 12.

10
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE

[Table: the table of sums, with columns for X (2 to 12) and its frequency f, still to be filled in]

We will now define f, the frequencies associated with the possible values of X.

11
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE

[Table: as above, with the frequency f = 4 entered for X = 5]

For example, there are four outcomes which make X equal to 5.

12
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE

 X    f
 2    1
 3    2
 4    3
 5    4
 6    5
 7    6
 8    5
 9    4
10    3
11    2
12    1

Similarly you can work out the frequencies for all the other values of X.

13
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE

[Table: X and f as above, with a column p for the probabilities still to be filled in]

Finally we will derive the probability of obtaining each value of X.

14
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE

[Table: X, f, and the empty probability column p, as above]

If there is 1/6 probability of obtaining each number on the red die, and the same on the
green die, each outcome in the table will occur with 1/36 probability.
15
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE

 X    f     p
 2    1    1/36
 3    2    2/36
 4    3    3/36
 5    4    4/36
 6    5    5/36
 7    6    6/36
 8    5    5/36
 9    4    4/36
10    3    3/36
11    2    2/36
12    1    1/36

Hence to obtain the probabilities associated with the different values of X, we divide the
frequencies by 36.
16
PROBABILITY DISTRIBUTION EXAMPLE: X IS THE SUM OF TWO DICE

[Figure: bar chart of the probabilities 1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36 against X = 2, 3, ..., 12]

The distribution is shown graphically. In this example it is symmetrical, highest for X equal
to 7 and declining on either side.
17
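As a cross-check on the table and the bar chart, here is a minimal Python sketch that enumerates the 36 equally likely outcomes and tabulates the frequencies and probabilities; it assumes nothing beyond two fair dice.

```python
from fractions import Fraction
from collections import Counter

# Enumerate all 36 equally likely (red, green) outcomes and tally X = red + green.
freq = Counter(red + green for red in range(1, 7) for green in range(1, 7))

for x in range(2, 13):
    p = Fraction(freq[x], 36)
    print(f"X = {x:2d}: frequency {freq[x]}, probability {p} = {float(p):.4f}")

# The 36 outcomes account for all the probability.
assert sum(freq.values()) == 36
```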
Some Important Concepts
PDF of a discrete random variable
PDF of a continuous random variable
Joint PDF of discrete and continuous random variables
Marginal PDF of discrete and continuous random variables
Statistical Independence
Conditional PDF

18
Characteristics of Probability
Distributions
A probability distribution can often be
summarized in terms of a few of its
characteristics, known as the moments of the
distribution.

Two of the most widely used moments are


the mean or expected value, and the
variance.

19
EXPECTED VALUE OF A DISCRETE RANDOM VARIABLE

Definition of E(X), the expected value of d.r.v.X:

E(X) = x1 p1 + ... + xn pn = Σ xi pi = Σ xi f(xi)   (sums running from i = 1 to n)

The expected value of a random variable, also known as its population mean, is the
weighted average of its possible values, the weights being the probabilities attached to the
values.
Note that the sum of the probabilities must be unity, so there is no need to divide by the
sum of the weights.
20
EXPECTED VALUE OF A RANDOM VARIABLE

xi
x1
x2
x3
x4
x5
x6
x7
x8
x9
x10
x11

This sequence shows how the expected value is calculated, first in abstract and then with
the random variable defined in the first sequence. We begin by listing the possible values
of X. 21
EXPECTED VALUE OF A RANDOM VARIABLE

xi pi
x1 p1
x2 p2
x3 p3
x4 p4
x5 p5
x6 p6
x7 p7
x8 p8
x9 p9
x10 p10
x11 p11

Next we list the probabilities attached to the different possible values of X.

22
EXPECTED VALUE OF A RANDOM VARIABLE

xi pi xi pi
x1 p1 x1 p1
x2 p2
x3 p3
x4 p4
x5 p5
x6 p6
x7 p7
x8 p8
x9 p9
x10 p10
x11 p11

Then we define a column in which the values are weighted by the corresponding
probabilities.
23
EXPECTED VALUE OF A RANDOM VARIABLE

xi pi xi pi
x1 p1 x1 p1
x2 p2 x2 p2
x3 p3
x4 p4
x5 p5
x6 p6
x7 p7
x8 p8
x9 p9
x10 p10
x11 p11

We do this for each value separately.

24
EXPECTED VALUE OF A RANDOM VARIABLE

xi pi xi pi
x1 p1 x1 p1
x2 p2 x2 p2
x3 p3 x3 p3
x4 p4 x4 p4
x5 p5 x5 p5
x6 p6 x6 p6
x7 p7 x7 p7
x8 p8 x8 p8
x9 p9 x9 p9
x10 p10 x10 p10
x11 p11 x11 p11

Here we are assuming that n, the number of possible values, is equal to 11, but it could be
any number.
25
EXPECTED VALUE OF A RANDOM VARIABLE

xi pi xi pi
x1 p1 x1 p1
x2 p2 x2 p2
x3 p3 x3 p3
x4 p4 x4 p4
x5 p5 x5 p5
x6 p6 x6 p6
x7 p7 x7 p7
x8 p8 x8 p8
x9 p9 x9 p9
x10 p10 x10 p10
x11 p11 x11 p11
Σ xi pi = E(X)
The expected value is the sum of the entries in the third column.

26
EXPECTED VALUE OF A RANDOM VARIABLE

xi pi xi pi xi pi
x1 p1 x1 p1 2 1/36
x2 p2 x2 p2 3 2/36
x3 p3 x3 p3 4 3/36
x4 p4 x4 p4 5 4/36
x5 p5 x5 p5 6 5/36
x6 p6 x6 p6 7 6/36
x7 p7 x7 p7 8 5/36
x8 p8 x8 p8 9 4/36
x9 p9 x9 p9 10 3/36
x10 p10 x10 p10 11 2/36
x11 p11 x11 p11 12 1/36
Σ xi pi = E(X)
The random variable X defined in the previous sequence could be any of the integers from 2
to 12 with probabilities as shown.
27
EXPECTED VALUE OF A RANDOM VARIABLE

xi pi xi pi xi pi xi pi
x1 p1 x1 p1 2 1/36 2/36
x2 p2 x2 p2 3 2/36
x3 p3 x3 p3 4 3/36
x4 p4 x4 p4 5 4/36
x5 p5 x5 p5 6 5/36
x6 p6 x6 p6 7 6/36
x7 p7 x7 p7 8 5/36
x8 p8 x8 p8 9 4/36
x9 p9 x9 p9 10 3/36
x10 p10 x10 p10 11 2/36
x11 p11 x11 p11 12 1/36
Σ xi pi = E(X)
X could be equal to 2 with probability 1/36, so the first entry in the calculation of the
expected value is 2/36.
28
EXPECTED VALUE OF A RANDOM VARIABLE

xi pi xi pi xi pi xi pi
x1 p1 x1 p1 2 1/36 2/36
x2 p2 x2 p2 3 2/36 6/36
x3 p3 x3 p3 4 3/36
x4 p4 x4 p4 5 4/36
x5 p5 x5 p5 6 5/36
x6 p6 x6 p6 7 6/36
x7 p7 x7 p7 8 5/36
x8 p8 x8 p8 9 4/36
x9 p9 x9 p9 10 3/36
x10 p10 x10 p10 11 2/36
x11 p11 x11 p11 12 1/36
Σ xi pi = E(X)
The probability of X being equal to 3 is 2/36, so the second entry is 6/36.

29
EXPECTED VALUE OF A RANDOM VARIABLE

xi pi xi pi xi pi xi pi
x1 p1 x1 p1 2 1/36 2/36
x2 p2 x2 p2 3 2/36 6/36
x3 p3 x3 p3 4 3/36 12/36
x4 p4 x4 p4 5 4/36 20/36
x5 p5 x5 p5 6 5/36 30/36
x6 p6 x6 p6 7 6/36 42/36
x7 p7 x7 p7 8 5/36 40/36
x8 p8 x8 p8 9 4/36 36/36
x9 p9 x9 p9 10 3/36 30/36
x10 p10 x10 p10 11 2/36 22/36
x11 p11 x11 p11 12 1/36 12/36
Σ xi pi = E(X)
Similarly for the other 9 possible values.

30
EXPECTED VALUE OF A RANDOM VARIABLE

xi pi xi pi xi pi xi pi
x1 p1 x1 p1 2 1/36 2/36
x2 p2 x2 p2 3 2/36 6/36
x3 p3 x3 p3 4 3/36 12/36
x4 p4 x4 p4 5 4/36 20/36
x5 p5 x5 p5 6 5/36 30/36
x6 p6 x6 p6 7 6/36 42/36
x7 p7 x7 p7 8 5/36 40/36
x8 p8 x8 p8 9 4/36 36/36
x9 p9 x9 p9 10 3/36 30/36
x10 p10 x10 p10 11 2/36 22/36
x11 p11 x11 p11 12 1/36 12/36
Σ xi pi = E(X) = 252/36

To obtain the expected value, we sum the entries in this column.

31
EXPECTED VALUE OF A RANDOM VARIABLE

xi pi xi pi xi pi xi pi
x1 p1 x1 p1 2 1/36 2/36
x2 p2 x2 p2 3 2/36 6/36
x3 p3 x3 p3 4 3/36 12/36
x4 p4 x4 p4 5 4/36 20/36
x5 p5 x5 p5 6 5/36 30/36
x6 p6 x6 p6 7 6/36 42/36
x7 p7 x7 p7 8 5/36 40/36
x8 p8 x8 p8 9 4/36 36/36
x9 p9 x9 p9 10 3/36 30/36
x10 p10 x10 p10 11 2/36 22/36
x11 p11 x11 p11 12 1/36 12/36
Σ xi pi = E(X) = 252/36 = 7

The expected value turns out to be 7. Actually, this was obvious anyway. We saw in the
previous sequence that the distribution is symmetrical about 7.
32
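The same calculation can be sketched in a few lines of Python, using the distribution tabulated earlier: weight each value of X by its probability and sum.

```python
from fractions import Fraction

# Probability distribution of X, the sum of two fair dice.
p = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

expected_value = sum(x * p[x] for x in p)
print(expected_value)  # 7
```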
EXPECTED VALUE OF A RANDOM VARIABLE

Alternative notation for E(X):

E(X) = μX

Very often the expected value of a random variable is represented by μ, the Greek letter mu. If
there is more than one random variable, their expected values are differentiated by adding
subscripts to μ. 33
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE

Definition of E[g(X)], the expected value of a function of X:


E[g(X)] = g(x1) p1 + ... + g(xn) pn = Σ g(xi) pi   (sum running from i = 1 to n)

To find the expected value of a function of a random variable, you calculate all the possible
values of the function, weight them by the corresponding probabilities, and sum the
results. 34
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE

Definition of E[g(X)], the expected value of a function of X:


E[g(X)] = g(x1) p1 + ... + g(xn) pn = Σ g(xi) pi

Example:

E(X²) = x1² p1 + ... + xn² pn = Σ xi² pi

For example, the expected value of X2 is found by calculating all its possible values,
multiplying them by the corresponding probabilities, and summing.
35
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE

xi pi
x1 p1
x2 p2
x3 p3







xn pn

The calculation of the expected value of a function of a random variable will be outlined in
general and then illustrated with an example.
36
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE

xi pi
x1 p1
x2 p2
x3 p3







xn pn

First you list the possible values of X and the corresponding probabilities.

37
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE

xi pi g(xi)
x1 p1 g(x1)
x2 p2 g(x2)
x3 p3 g(x3)
...
...
...
...
...
...
...
xn pn g(xn)

Next you calculate the function of X for each possible value of X.

38
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE

xi pi g(xi) g(xi ) pi
x1 p1 g(x1) g(x1) p1
x2 p2 g(x2)
x3 p3 g(x3)
...
...
...
...
...
...
...
xn pn g(xn)

Then, one at a time, you weight the value of the function by its corresponding probability.

39
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE

xi pi g(xi) g(xi ) pi
x1 p1 g(x1) g(x1) p1
x2 p2 g(x2) g(x2) p2
x3 p3 g(x3) g(x3) p3
... ...
... ...
... ...
... ...
... ...
... ...
... ...
xn pn g(xn) g(xn) pn

You do this individually for each possible value of X.

40
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE

xi pi g(xi) g(xi ) pi
x1 p1 g(x1) g(x1) p1
x2 p2 g(x2) g(x2) p2
x3 p3 g(x3) g(x3) p3
... ...
... ...
... ...
... ...
... ...
... ...
... ...
xn pn g(xn) g(xn) pn
Σ g(xi) pi
The sum of the weighted values is the expected value of the function of X.

41
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE

xi pi g(xi) g(xi ) pi xi pi
x1 p1 g(x1) g(x1) p1 2 1/36
x2 p2 g(x2) g(x2) p2 3 2/36
x3 p3 g(x3) g(x3) p3 4 3/36
... ... 5 4/36
... ... 6 5/36
... ... 7 6/36
... ... 8 5/36
... ... 9 4/36
... ... 10 3/36
... ... 11 2/36
xn pn g(xn) g(xn) pn 12 1/36
Σ g(xi) pi
The process will be illustrated for X2, where X is the random variable defined in the first
sequence. The 11 possible values of X and the corresponding probabilities are listed.
42
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE

xi pi g(xi) g(xi ) pi xi pi xi2


x1 p1 g(x1) g(x1) p1 2 1/36 4
x2 p2 g(x2) g(x2) p2 3 2/36 9
x3 p3 g(x3) g(x3) p3 4 3/36 16
... ... 5 4/36 25
... ... 6 5/36 36
... ... 7 6/36 49
... ... 8 5/36 64
... ... 9 4/36 81
... ... 10 3/36 100
... ... 11 2/36 121
xn pn g(xn) g(xn) pn 12 1/36 144
Σ g(xi) pi
First you calculate the possible values of X2.

43
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE

xi pi g(xi) g(xi ) pi xi pi xi2 xi2 pi


x1 p1 g(x1) g(x1) p1 2 1/36 4 0.11
x2 p2 g(x2) g(x2) p2 3 2/36 9
x3 p3 g(x3) g(x3) p3 4 3/36 16
... ... 5 4/36 25
... ... 6 5/36 36
... ... 7 6/36 49
... ... 8 5/36 64
... ... 9 4/36 81
... ... 10 3/36 100
... ... 11 2/36 121
xn pn g(xn) g(xn) pn 12 1/36 144
Σ g(xi) pi
The first value is 4, which arises when X is equal to 2. The probability of X being equal to 2
is 1/36, so the weighted function is 4/36, which we shall write in decimal form as 0.11.
44
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE

xi pi g(xi) g(xi ) pi xi pi xi2 xi2 pi


x1 p1 g(x1) g(x1) p1 2 1/36 4 0.11
x2 p2 g(x2) g(x2) p2 3 2/36 9 0.50
x3 p3 g(x3) g(x3) p3 4 3/36 16 1.33
... ... 5 4/36 25 2.78
... ... 6 5/36 36 5.00
... ... 7 6/36 49 8.17
... ... 8 5/36 64 8.89
... ... 9 4/36 81 9.00
... ... 10 3/36 100 8.33
... ... 11 2/36 121 6.72
xn pn g(xn) g(xn) pn 12 1/36 144 4.00
Σ g(xi) pi
Similarly for all the other possible values of X.

45
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE

xi pi g(xi) g(xi ) pi xi pi xi2 xi2 pi


x1 p1 g(x1) g(x1) p1 2 1/36 4 0.11
x2 p2 g(x2) g(x2) p2 3 2/36 9 0.50
x3 p3 g(x3) g(x3) p3 4 3/36 16 1.33
... ... 5 4/36 25 2.78
... ... 6 5/36 36 5.00
... ... 7 6/36 49 8.17
... ... 8 5/36 64 8.89
... ... 9 4/36 81 9.00
... ... 10 3/36 100 8.33
... ... 11 2/36 121 6.72
xn pn g(xn) g(xn) pn 12 1/36 144 4.00
Σ g(xi) pi = 54.83

The expected value of X2 is the sum of its weighted values in the final column. It is equal to
54.83. It is the average value of the figures in the previous column, taking the differing
probabilities into account. 46
EXPECTED VALUE OF A FUNCTION OF A RANDOM VARIABLE

xi pi g(xi) g(xi ) pi xi pi xi2 xi2 pi


x1 p1 g(x1) g(x1) p1 2 1/36 4 0.11
x2 p2 g(x2) g(x2) p2 3 2/36 9 0.50
x3 p3 g(x3) g(x3) p3 4 3/36 16 1.33
... ... 5 4/36 25 2.78
... ... 6 5/36 36 5.00
... ... 7 6/36 49 8.17
... ... 8 5/36 64 8.89
... ... 9 4/36 81 9.00
... ... 10 3/36 100 8.33
... ... 11 2/36 121 6.72
xn pn g(xn) g(xn) pn 12 1/36 144 4.00
Σ g(xi) pi = 54.83

Note that E(X²) is not the same thing as E(X) squared. In the previous sequence we saw
that E(X) for this example was 7, and its square is 49, not 54.83.
47
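A corresponding sketch for E(X²), using the same hard-coded distribution; it also shows that E(X²) differs from the square of E(X).

```python
from fractions import Fraction

p = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

e_x2 = sum(x**2 * p[x] for x in p)   # E(X^2)
e_x = sum(x * p[x] for x in p)       # E(X)

print(float(e_x2))       # 54.833..., the 54.83 in the table
print(float(e_x) ** 2)   # 49.0 -- not the same as E(X^2)
```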
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE

Population variance of X:  σX² = E[(X − μ)²]

E[(X − μ)²] = (x1 − μ)² p1 + ... + (xn − μ)² pn = Σ (xi − μ)² pi   (sum from i = 1 to n)

The third sequence defined the expected value of a function of a random variable X. There
is only one function that is of much interest to us, at least initially: the squared deviation
from the population mean. 48
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE

Population variance of X:  σX² = E[(X − μ)²]

E[(X − μ)²] = (x1 − μ)² p1 + ... + (xn − μ)² pn = Σ (xi − μ)² pi   (sum from i = 1 to n)

The expected value of the squared deviation is known as the population variance of X. It is
a measure of the dispersion of the distribution of X about its population mean.
49
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE

xi      pi      xi − μ      (xi − μ)²      (xi − μ)² pi

2 1/36 5 25 0.69
3 2/36 4 16 0.89
4 3/36 3 9 0.75
5 4/36 2 4 0.44
6 5/36 1 1 0.14
7 6/36 0 0 0.00
8 5/36 1 1 0.14
9 4/36 2 4 0.44
10 3/36 3 9 0.75
11 2/36 4 16 0.89
12 1/36 5 25 0.69
5.83

We will calculate the population variance of the random variable X defined in the first
sequence. We start as usual by listing the possible values of X and the corresponding
probabilities. 50
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE

xi      pi      xi − μ      (xi − μ)²      (xi − μ)² pi

2 1/36 5 25 0.69
3 2/36 4 16 0.89
4 3/36 3 9 0.75
5 4/36 2 4 0.44
6 5/36 1 1 0.14
7 6/36 0 0 0.00        μX = E(X) = 7
8 5/36 1 1 0.14
9 4/36 2 4 0.44
10 3/36 3 9 0.75
11 2/36 4 16 0.89
12 1/36 5 25 0.69
5.83

Next we need a column giving the deviations of the possible values of X about its
population mean. In the second sequence we saw that the population mean of X was 7.
51
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE

xi      pi      xi − μ      (xi − μ)²      (xi − μ)² pi

2 1/36 5 25 0.69
3 2/36 4 16 0.89
4 3/36 3 9 0.75
5 4/36 2 4 0.44
6 5/36 1 1 0.14
7 6/36 0 0 0.00        μX = E(X) = 7
8 5/36 1 1 0.14
9 4/36 2 4 0.44
10 3/36 3 9 0.75
11 2/36 4 16 0.89
12 1/36 5 25 0.69
5.83

When X is equal to 2, the deviation is 5.

52
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE

xi      pi      xi − μ      (xi − μ)²      (xi − μ)² pi

2 1/36 5 25 0.69
3 2/36 4 16 0.89
4 3/36 3 9 0.75
5 4/36 2 4 0.44
6 5/36 1 1 0.14
7 6/36 0 0 0.00        μX = E(X) = 7
8 5/36 1 1 0.14
9 4/36 2 4 0.44
10 3/36 3 9 0.75
11 2/36 4 16 0.89
12 1/36 5 25 0.69
5.83

Similarly for all the other possible values.

53
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE

xi      pi      xi − μ      (xi − μ)²      (xi − μ)² pi

2 1/36 5 25 0.69
3 2/36 4 16 0.89
4 3/36 3 9 0.75
5 4/36 2 4 0.44
6 5/36 1 1 0.14
7 6/36 0 0 0.00
8 5/36 1 1 0.14
9 4/36 2 4 0.44
10 3/36 3 9 0.75
11 2/36 4 16 0.89
12 1/36 5 25 0.69
5.83

Next we need a column giving the squared deviations. When X is equal to 2, the squared
deviation is 25.
54
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE

xi      pi      xi − μ      (xi − μ)²      (xi − μ)² pi

2 1/36 5 25 0.69
3 2/36 4 16 0.89
4 3/36 3 9 0.75
5 4/36 2 4 0.44
6 5/36 1 1 0.14
7 6/36 0 0 0.00
8 5/36 1 1 0.14
9 4/36 2 4 0.44
10 3/36 3 9 0.75
11 2/36 4 16 0.89
12 1/36 5 25 0.69
5.83

Similarly for the other values of X.

55
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE

xi      pi      xi − μ      (xi − μ)²      (xi − μ)² pi

2 1/36 5 25 0.69
3 2/36 4 16 0.89
4 3/36 3 9 0.75
5 4/36 2 4 0.44
6 5/36 1 1 0.14
7 6/36 0 0 0.00
8 5/36 1 1 0.14
9 4/36 2 4 0.44
10 3/36 3 9 0.75
11 2/36 4 16 0.89
12 1/36 5 25 0.69
5.83

Now we start weighting the squared deviations by the corresponding probabilities. What do
you think the weighted average will be? Have a guess.
56
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE

xi      pi      xi − μ      (xi − μ)²      (xi − μ)² pi

2 1/36 5 25 0.69
3 2/36 4 16 0.89
4 3/36 3 9 0.75
5 4/36 2 4 0.44
6 5/36 1 1 0.14
7 6/36 0 0 0.00
8 5/36 1 1 0.14
9 4/36 2 4 0.44
10 3/36 3 9 0.75
11 2/36 4 16 0.89
12 1/36 5 25 0.69
5.83

A reason for making an initial guess is that it may help you to identify an arithmetical error,
if you make one. If the initial guess and the outcome are very different, that is a warning.
57
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE

xi      pi      xi − μ      (xi − μ)²      (xi − μ)² pi

2 1/36 5 25 0.69
3 2/36 4 16 0.89
4 3/36 3 9 0.75
5 4/36 2 4 0.44
6 5/36 1 1 0.14
7 6/36 0 0 0.00
8 5/36 1 1 0.14
9 4/36 2 4 0.44
10 3/36 3 9 0.75
11 2/36 4 16 0.89
12 1/36 5 25 0.69
5.83

We calculate all the weighted squared deviations.

58
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE

xi      pi      xi − μ      (xi − μ)²      (xi − μ)² pi

2 1/36 5 25 0.69
3 2/36 4 16 0.89
4 3/36 3 9 0.75
5 4/36 2 4 0.44
6 5/36 1 1 0.14
7 6/36 0 0 0.00
8 5/36 1 1 0.14
9 4/36 2 4 0.44
10 3/36 3 9 0.75
11 2/36 4 16 0.89
12 1/36 5 25 0.69
5.83

The sum is the population variance of X.

59
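A short sketch of the same variance calculation, weighting each squared deviation by its probability; the distribution is hard-coded from the earlier table.

```python
from fractions import Fraction

p = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

mu = sum(x * p[x] for x in p)                   # population mean, 7
var = sum((x - mu) ** 2 * p[x] for x in p)      # E[(X - mu)^2]
print(mu, float(var))                           # 7 5.8333...
```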
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE

Population variance of X

σX² = E[(X − μX)²]

In equations, the population variance of X is usually written σX², σ being the Greek letter sigma.

60
POPULATION VARIANCE OF A DISCRETE RANDOM VARIABLE

Standard deviation of X

σX = √E[(X − μX)²]

The standard deviation of X is the square root of its population variance. Usually written σX,
it is an alternative measure of dispersion. It has the same units as X.
61
EXPECTED VALUE RULES

1. E(X + Y) = E(X) + E(Y)

This sequence states the rules for manipulating expected values. First, the additive rule.
The expected value of the sum of two random variables is the sum of their expected values.
62
EXPECTED VALUE RULES

1. E(X + Y) = E(X) + E(Y)


Example generalization:
E(W + X + Y + Z) = E(W) + E(X) + E(Y) + E(Z)

This generalizes to any number of variables. An example is shown.

63
EXPECTED VALUE RULES

1. E(X + Y) = E(X) + E(Y)


2. E(bX) = bE(X)

The second rule is the multiplicative rule. The expected value of a variable multiplied by a
constant is equal to the constant multiplied by the expected value of the variable.
64
EXPECTED VALUE RULES

1. E(X + Y) = E(X) + E(Y)


2. E(bX) = bE(X)
Example:
E(3X) = 3E(X)

For example, the expected value of 3X is three times the expected value of X.

65
EXPECTED VALUE RULES

1. E(X + Y) = E(X) + E(Y)


2. E(bX) = bE(X)
3. E(b) = b

Finally, the expected value of a constant is just the constant. Of course this is obvious.

66
EXPECTED VALUE RULES

1. E(X + Y) = E(X) + E(Y)


2. E(bX) = bE(X)
3. E(b) = b

Y = b1 + b2X
E(Y) = E(b1 + b2X)

As an exercise, we will use the rules to simplify the expected value of an expression.
Suppose that we are interested in the expected value of a variable Y, where Y = b1 + b2X.
67
EXPECTED VALUE RULES

1. E(X + Y) = E(X) + E(Y)


2. E(bX) = bE(X)
3. E(b) = b

Y = b1 + b2X
E(Y) = E(b1 + b2X)
= E(b1) + E(b2X)

We use the first rule to break up the expected value into its two components.

68
EXPECTED VALUE RULES

1. E(X + Y) = E(X) + E(Y)


2. E(bX) = bE(X)
3. E(b) = b

Y = b1 + b2X
E(Y) = E(b1 + b2X)
= E(b1) + E(b2X)
= b1 + b2E(X)

Then we use the second rule to replace E(b2X) by b2E(X) and the third rule to simplify E(b1)
to just b1. This is as far as we can go in this example.
69
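A quick numerical check of this result, using the two-dice distribution and illustrative constants b1 = 2 and b2 = 3 (chosen here purely for the example).

```python
from fractions import Fraction

p = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}
b1, b2 = 2, 3   # arbitrary constants for the illustration

e_x = sum(x * p[x] for x in p)
e_y = sum((b1 + b2 * x) * p[x] for x in p)   # E(b1 + b2 X) computed directly

print(e_y, b1 + b2 * e_x)   # both 23: E(Y) = b1 + b2 E(X)
```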
INDEPENDENCE OF TWO RANDOM VARIABLES

Two random variables X and Y are said to be


independent if and only if
E[f(X)g(Y)] = E[f(X)] E[g(Y)]
for any functions f(X) and g(Y).

This very short sequence presents an important definition, that of the independence of two
random variables.
70
INDEPENDENCE OF TWO RANDOM VARIABLES

Two random variables X and Y are said to be


independent if and only if
E[f(X)g(Y)] = E[f(X)] E[g(Y)]
for any functions f(X) and g(Y).

Two variables X and Y are independent if and only if, given any functions f(X) and g(Y), the
expected value of the product f(X)g(Y) is equal to the expected value of f(X) multiplied by
the expected value of g(Y). 71
INDEPENDENCE OF TWO RANDOM VARIABLES

Two random variables X and Y are said to be


independent if and only if
E[f(X)g(Y)] = E[f(X)] E[g(Y)]
for any functions f(X) and g(Y).

Special case: if X and Y are independent,


E(XY) = E(X) E(Y)

As a special case, the expected value of XY is equal to the expected value of X multiplied by
the expected value of Y if and only if X and Y are independent.
72
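A quick check of the special case with the two dice, which are independent by construction: the expected value of the product of the red and green scores equals the product of their expected values.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely (red, green) pairs
p = Fraction(1, 36)

e_red = sum(r * p for r, g in outcomes)
e_green = sum(g * p for r, g in outcomes)
e_product = sum(r * g * p for r, g in outcomes)

print(e_product, e_red * e_green)   # both 49/4: E(RG) = E(R) E(G)
```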
ALTERNATIVE EXPRESSION FOR POPULATION VARIANCE

σX² = E(X²) − μ²

σX² = E[(X − μ)²]

= E(X² − 2μX + μ²)

= E(X²) + E(−2μX) + E(μ²)

= E(X²) − 2μE(X) + μ²

= E(X²) − 2μ² + μ² = E(X²) − μ²

This sequence derives an alternative expression for the population variance of a random
variable. It provides an opportunity for practising the use of the expected value rules.
73
ALTERNATIVE EXPRESSION FOR POPULATION VARIANCE

σX² = E(X²) − μ²

σX² = E[(X − μ)²]

= E(X² − 2μX + μ²)

= E(X²) + E(−2μX) + E(μ²)

= E(X²) − 2μE(X) + μ²

= E(X²) − 2μ² + μ² = E(X²) − μ²

We start with the definition of the population variance of X.

74
ALTERNATIVE EXPRESSION FOR POPULATION VARIANCE

σX² = E(X²) − μ²

σX² = E[(X − μ)²]

= E(X² − 2μX + μ²)

= E(X²) + E(−2μX) + E(μ²)

= E(X²) − 2μE(X) + μ²

= E(X²) − 2μ² + μ² = E(X²) − μ²

We expand the quadratic.

75
ALTERNATIVE EXPRESSION FOR POPULATION VARIANCE

σX² = E(X²) − μ²

σX² = E[(X − μ)²]

= E(X² − 2μX + μ²)

= E(X²) + E(−2μX) + E(μ²)

= E(X²) − 2μE(X) + μ²

= E(X²) − 2μ² + μ² = E(X²) − μ²

Now the first expected value rule is used to decompose the expression into three separate
expected values.
76
ALTERNATIVE EXPRESSION FOR POPULATION VARIANCE

σX² = E(X²) − μ²

σX² = E[(X − μ)²]

= E(X² − 2μX + μ²)

= E(X²) + E(−2μX) + E(μ²)

= E(X²) − 2μE(X) + μ²

= E(X²) − 2μ² + μ² = E(X²) − μ²

The second expected value rule is used to simplify the middle term and the third rule is
used to simplify the last one.
77
ALTERNATIVE EXPRESSION FOR POPULATION VARIANCE

σX² = E(X²) − μ²

σX² = E[(X − μ)²]

= E(X² − 2μX + μ²)

= E(X²) + E(−2μX) + E(μ²)

= E(X²) − 2μE(X) + μ²

= E(X²) − 2μ² + μ² = E(X²) − μ²

The middle term is rewritten, using the fact that E(X) and μ are just different ways of writing
the population mean of X.
78
ALTERNATIVE EXPRESSION FOR POPULATION VARIANCE

σX² = E(X²) − μ²

σX² = E[(X − μ)²]

= E(X² − 2μX + μ²)

= E(X²) + E(−2μX) + E(μ²)

= E(X²) − 2μE(X) + μ²

= E(X²) − 2μ² + μ² = E(X²) − μ²

Hence we get the result.

79
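A short sketch confirming that the two expressions for the variance agree for the two-dice distribution (54.83 − 49 = 5.83).

```python
from fractions import Fraction

p = {x: Fraction(6 - abs(x - 7), 36) for x in range(2, 13)}

mu = sum(x * p[x] for x in p)
var_direct = sum((x - mu) ** 2 * p[x] for x in p)   # E[(X - mu)^2]
var_alt = sum(x**2 * p[x] for x in p) - mu**2       # E(X^2) - mu^2

print(float(var_direct), float(var_alt))            # both 5.8333...
```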
THE FIXED AND RANDOM COMPONENTS OF A RANDOM VARIABLE

Population mean of X: E(X) = μX

In observation i, the random component is given by ui = xi − μX

Hence xi can be decomposed into fixed and random components: xi = μX + ui

Note that the expected value of ui is zero:

E(ui) = E(xi − μX) = E(xi) − E(μX) = μX − μX = 0

In this short sequence we shall decompose a random variable X into its fixed and random
components. Let the population mean of X be mX.
80
THE FIXED AND RANDOM COMPONENTS OF A RANDOM VARIABLE

Population mean of X: E(X) = μX

In observation i, the random component is given by ui = xi − μX

Hence xi can be decomposed into fixed and random components: xi = μX + ui

Note that the expected value of ui is zero:

E(ui) = E(xi − μX) = E(xi) − E(μX) = μX − μX = 0

The actual value of X in any observation will in general be different from μX. We will call the
difference ui, so ui = xi − μX.
81
THE FIXED AND RANDOM COMPONENTS OF A RANDOM VARIABLE

Population mean of X: E(X) = μX

In observation i, the random component is given by ui = xi − μX

Hence xi can be decomposed into fixed and random components: xi = μX + ui

Note that the expected value of ui is zero:

E(ui) = E(xi − μX) = E(xi) − E(μX) = μX − μX = 0

Re-arranging this equation, we can write xi as the sum of its fixed component, μX, which is
the same for all observations, and its random component, ui.
82
THE FIXED AND RANDOM COMPONENTS OF A RANDOM VARIABLE

Population mean of X: E(X) = μX

In observation i, the random component is given by ui = xi − μX

Hence xi can be decomposed into fixed and random components: xi = μX + ui

Note that the expected value of ui is zero:

E(ui) = E(xi − μX) = E(xi) − E(μX) = μX − μX = 0

The expected value of the random component is zero. It does not systematically tend to
increase or decrease X. It just makes it deviate from its population mean.
83
CONTINUOUS RANDOM VARIABLES

[Figure: bar chart of the probabilities 1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36 against X = 2, 3, ..., 12]

A discrete random variable is one that can take only a finite set of values. The sum of the
numbers when two dice are thrown is an example.
84
CONTINUOUS RANDOM VARIABLES

[Figure: bar chart of the probabilities 1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36 against X = 2, 3, ..., 12]

Each value has associated with it a finite probability, which you can think of as a packet of
probability. The packets sum to unity because the variable must take one of the values.
85
CONTINUOUS RANDOM VARIABLES

[Figure: height plotted against X over the range 55 to 75]

However, most random variables encountered in econometrics are continuous. They can
take any one of an infinite set of values defined over a range (or possibly, ranges).
86
CONTINUOUS RANDOM VARIABLES

[Figure: height plotted against X over the range 55 to 75]

As a simple example, take the temperature in a room. We will assume that it can be
anywhere from 55 to 75 degrees Fahrenheit with equal probability within the range.
87
CONTINUOUS RANDOM VARIABLES

[Figure: height plotted against X over the range 55 to 75]

In the case of a continuous random variable, the probability of it being equal to a given finite
value (for example, temperature equal to 55.473927) is always infinitesimal.
88
CONTINUOUS RANDOM VARIABLES

[Figure: height plotted against X over the range 55 to 75]

For this reason, you can only talk about the probability of a continuous random variable
lying between two given values. The probability is represented graphically as an area.
89
CONTINUOUS RANDOM VARIABLES

[Figure: the interval from 55 to 56 marked on the X axis]

For example, you could measure the probability of the temperature being between 55 and
56, both measured exactly.
90
CONTINUOUS RANDOM VARIABLES

[Figure: rectangle of height 0.05 over the interval 55–56]

Given that the temperature lies anywhere between 55 and 75 with equal probability, the
probability of it lying between 55 and 56 must be 0.05.
91
CONTINUOUS RANDOM VARIABLES

[Figure: rectangle of height 0.05 over the interval 56–57]

Similarly, the probability of the temperature lying between 56 and 57 is 0.05.

92
CONTINUOUS RANDOM VARIABLES

[Figure: unit-interval rectangles of height 0.05 along the range 55–75]

And similarly for all the other one-degree intervals within the range.

93
CONTINUOUS RANDOM VARIABLES

[Figure: unit-interval rectangles of height 0.05 along the range 55–75]

The probability per unit interval is 0.05 and accordingly the area of the rectangle
representing the probability of the temperature lying in any given unit interval is 0.05.
94
CONTINUOUS RANDOM VARIABLES

[Figure: unit-interval rectangles of height 0.05 along the range 55–75]

The probability per unit interval is called the probability density and it is equal to the height
of the unit-interval rectangle.
95
CONTINUOUS RANDOM VARIABLES

f(X) = 0.05 for 55 ≤ X ≤ 75
f(X) = 0 for X < 55 and X > 75

[Figure: the density, of height 0.05, over the range 55–75]

Mathematically, the probability density is written as a function of the variable, for example
f(X). In this example, f(X) is 0.05 for 55 < X < 75 and it is zero elsewhere.
96
CONTINUOUS RANDOM VARIABLES

probability density f(X):
f(X) = 0.05 for 55 ≤ X ≤ 75
f(X) = 0 for X < 55 and X > 75

[Figure: the probability density function, height 0.05, over the range 55–75]

The vertical axis is given the label probability density, rather than height. f(X) is known as
the probability density function and is shown graphically in the diagram as the thick black
line. 97
CONTINUOUS RANDOM VARIABLES

probability density f(X):
f(X) = 0.05 for 55 ≤ X ≤ 75
f(X) = 0 for X < 55 and X > 75

[Figure: the probability density function, height 0.05, over the range 55–75]

Suppose that you wish to calculate the probability of the temperature lying between 65 and
70 degrees.
98
CONTINUOUS RANDOM VARIABLES

probability density f(X):
f(X) = 0.05 for 55 ≤ X ≤ 75
f(X) = 0 for X < 55 and X > 75

[Figure: the probability density function, height 0.05, over the range 55–75]

To do this, you should calculate the area under the probability density function between 65
and 70.
99
CONTINUOUS RANDOM VARIABLES

probability density f(X):
f(X) = 0.05 for 55 ≤ X ≤ 75
f(X) = 0 for X < 55 and X > 75

[Figure: the probability density function, height 0.05, over the range 55–75]

Typically you have to use the integral calculus to work out the area under a curve, but in
this very simple example all you have to do is calculate the area of a rectangle.
100
CONTINUOUS RANDOM VARIABLES

probability density f(X):
f(X) = 0.05 for 55 ≤ X ≤ 75
f(X) = 0 for X < 55 and X > 75

[Figure: the shaded rectangle between 65 and 70 has width 5 and height 0.05, so its area is 0.25]

The height of the rectangle is 0.05 and its width is 5, so its area is 0.25.

101
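A small sketch that approximates these areas numerically by summing thin rectangles under the density; the function names are illustrative, not part of the slides.

```python
def f(x):
    """Uniform density for the temperature example: 0.05 on [55, 75], 0 elsewhere."""
    return 0.05 if 55 <= x <= 75 else 0.0

def prob(a, b, n=100_000):
    """Approximate P(a <= X <= b) as the area under f between a and b."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

print(prob(65, 70))   # ~0.25, the area of the 5-by-0.05 rectangle
print(prob(55, 75))   # ~1.0, the total probability
```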
CONTINUOUS RANDOM VARIABLES

[Figure: probability density f(X) over the range 65 to 75]

Now suppose that the temperature can lie in the range 65 to 75 degrees, with uniformly
decreasing probability as the temperature gets higher.
102
CONTINUOUS RANDOM VARIABLES

[Figure: triangular probability density over 65–75; vertical axis marked from 0.05 to 0.20]

The total area of the triangle is unity because the probability of the temperature lying in the
65 to 75 range is unity. Since the base of the triangle is 10, its height must be 0.20.
103
CONTINUOUS RANDOM VARIABLES

probability density f(X):
f(X) = 1.50 − 0.02X for 65 ≤ X ≤ 75
f(X) = 0 for X < 65 and X > 75

[Figure: the triangular density, falling from 0.20 at X = 65 to 0 at X = 75]

In this example, the probability density function is a line of the form f(X) = b1 + b2X. To pass
through the points (65, 0.20) and (75, 0), b1 must equal 1.50 and b2 must equal -0.02.
104
CONTINUOUS RANDOM VARIABLES

probability density f(X):
f(X) = 1.50 − 0.02X for 65 ≤ X ≤ 75
f(X) = 0 for X < 65 and X > 75

[Figure: the triangular density, falling from 0.20 at X = 65 to 0 at X = 75]

Suppose that we are interested in finding the probability of the temperature lying between
65 and 70 degrees.
105
CONTINUOUS RANDOM VARIABLES

probability density f(X):
f(X) = 1.50 − 0.02X for 65 ≤ X ≤ 75
f(X) = 0 for X < 65 and X > 75

[Figure: the triangular density, falling from 0.20 at X = 65 to 0 at X = 75]

We could do this by evaluating the integral of the function over this range, but there is no
need.
106
CONTINUOUS RANDOM VARIABLES

probability density f(X):
f(X) = 1.50 − 0.02X for 65 ≤ X ≤ 75
f(X) = 0 for X < 65 and X > 75

[Figure: the triangular density, falling from 0.20 at X = 65 to 0 at X = 75]

It is easy to show geometrically that the answer is 0.75. This completes the introduction to
continuous random variables.
107
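The same numerical-integration sketch, applied to the triangular density, reproduces the 0.75 result.

```python
def f(x):
    """Density for the second example: f(X) = 1.5 - 0.02 X on [65, 75], 0 elsewhere."""
    return 1.5 - 0.02 * x if 65 <= x <= 75 else 0.0

def prob(a, b, n=100_000):
    """Approximate P(a <= X <= b) as the area under f between a and b."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

print(prob(65, 70))   # ~0.75
print(prob(65, 75))   # ~1.0
```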
EXPECTED VALUE OF A CONTINUOUS RANDOM VARIABLE

Definition of E(X), the expected value of c.r.v.X:


E(X) = ∫ x f(x) dx   (integrated over the range of X)

The only difference between this case and the expected value of a d.r.v. is that we replace
the summation symbol by the integral symbol.

108
EXPECTED VALUE OF A CONTINUOUS RANDOM VARIABLE

Given the continuous PDF

f(x) = x²/9  where 0 ≤ x ≤ 3

E(X) = ∫₀³ x (x²/9) dx = 2.25
Here the definition is applied to a specific PDF: integrating x f(x) over the range 0 to 3 gives
an expected value of 2.25.

109
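A short sketch that approximates the integral numerically for the PDF in this example.

```python
def f(x):
    """PDF from the slide: f(x) = x^2 / 9 for 0 <= x <= 3, 0 elsewhere."""
    return x**2 / 9 if 0 <= x <= 3 else 0.0

def expected_value(a, b, n=100_000):
    """Approximate E(X) = integral of x f(x) dx over [a, b]."""
    width = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * width
        total += x * f(x) * width
    return total

print(expected_value(0, 3))   # ~2.25
```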
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Covariance

cov(X, Y) = σXY = E[(X − μX)(Y − μY)]

The covariance of two random variables X and Y, often written σXY, is defined to be the
expected value of the product of their deviations from their population means.
110
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Covariance

cov(X, Y) = σXY = E[(X − μX)(Y − μY)]

E[(X − μX)(Y − μY)] = E(X − μX) E(Y − μY)
= [E(X) − E(μX)] [E(Y) − E(μY)]
= [μX − μX] [μY − μY] = 0 × 0 = 0

If two variables are independent, their covariance is zero. To show this, start by rewriting
the covariance as the product of the expected values of its factors.
111
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Covariance

cov(X, Y) = σXY = E[(X − μX)(Y − μY)]

E[(X − μX)(Y − μY)] = E(X − μX) E(Y − μY)
= [E(X) − E(μX)] [E(Y) − E(μY)]
= [μX − μX] [μY − μY] = 0 × 0 = 0

We are allowed to do this because (and only because) X and Y are independent (see the
earlier sequence on independence).
112
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Covariance

cov(X, Y) = σXY = E[(X − μX)(Y − μY)]

E[(X − μX)(Y − μY)] = E(X − μX) E(Y − μY)
= [E(X) − E(μX)] [E(Y) − E(μY)]
= [μX − μX] [μY − μY] = 0 × 0 = 0

The expected values of both factors are zero because E(X) = μX and E(Y) = μY. E(μX) = μX
and E(μY) = μY because μX and μY are constants. Thus the covariance is zero.
113
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Covariance rules

1. If Y = V + W,
cov(X, Y) = cov(X, V) + cov(X,W).

2. If Y = bZ, where b is a constant


cov(X, Y) = bcov(X, Z)

3. If Y = b, where b is a constant,
cov(X, Y) = 0

There are some rules that follow in a perfectly straightforward way from the definition of
covariance, and since they are going to be used frequently in later chapters it is worthwhile
establishing them immediately. First, the addition rule. 114
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Covariance rules

1. If Y = V + W,
cov(X, Y) = cov(X, V) + cov(X,W).

2. If Y = bZ, where b is a constant


cov(X, Y) = bcov(X, Z)

3. If Y = b, where b is a constant,
cov(X, Y) = 0

Next, the multiplication rule, for cases where a variable is multiplied by a constant.

115
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Covariance rules

1. If Y = V + W,
cov(X, Y) = cov(X, V) + cov(X,W).

2. If Y = bZ, where b is a constant


cov(X, Y) = bcov(X, Z)

3. If Y = b, where b is a constant,
cov(X, Y) = 0

Finally, a primitive rule that is often useful.

116
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Covariance rules

1. If Y = V + W,
cov(X, Y) = cov(X, V) + cov(X,W).
Proof:
Since Y = V + W, μY = μV + μW
cov(X, Y) = E[(X − μX)(Y − μY)]
= E[(X − μX)([V + W] − [μV + μW])]
= E[(X − μX)(V − μV) + (X − μX)(W − μW)]
= cov(X, V) + cov(X, W).

The proofs of the rules are straightforward. In each case the proof starts with the definition
of cov(X, Y).
117
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Covariance rules

1. If Y = V + W,
cov(X, Y) = cov(X, V) + cov(X,W).
Proof:
Since Y = V + W, μY = μV + μW
cov(X, Y) = E[(X − μX)(Y − μY)]
= E[(X − μX)([V + W] − [μV + μW])]
= E[(X − μX)(V − μV) + (X − μX)(W − μW)]
= cov(X, V) + cov(X, W).

We now substitute for Y and re-arrange.

118
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Covariance rules

1. If Y = V + W,
cov(X, Y) = cov(X, V) + cov(X,W).
Proof:
Since Y = V + W, μY = μV + μW
cov(X, Y) = E[(X − μX)(Y − μY)]
= E[(X − μX)([V + W] − [μV + μW])]
= E[(X − μX)(V − μV) + (X − μX)(W − μW)]
= cov(X, V) + cov(X, W).

This gives us the result.

119
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Covariance rules

2. If Y = bZ,
cov(X, Y) = bcov(X, Z).
Proof:
Since Y = bZ, μY = bμZ
cov(X, Y) = E[(X − μX)(Y − μY)]
= E[(X − μX)(bZ − bμZ)]
= b E[(X − μX)(Z − μZ)]
= b cov(X, Z).

Next, the multiplication rule, for cases where a variable is multiplied by a constant. The Y
terms have been replaced by the corresponding bZ terms.
120
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Covariance rules

2. If Y = bZ,
cov(X, Y) = bcov(X, Z).
Proof:
Since Y = bZ, μY = bμZ
cov(X, Y) = E[(X − μX)(Y − μY)]
= E[(X − μX)(bZ − bμZ)]
= b E[(X − μX)(Z − μZ)]
= b cov(X, Z).

b is a common factor and can be taken out of the expression, giving us the result that we
want.
121
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Covariance rules

3. If Y = b,
cov(X, Y) = 0.
Proof:
Since Y = b, μY = b
cov(X, Y) = E[(X − μX)(Y − μY)]
= E[(X − μX)(b − b)]
= E(0)
= 0.

The proof of the third rule is trivial.

122
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Example use of covariance rules

Suppose Y = b1 + b2Z

cov(X, Y) = cov(X, [b1 + b2Z])


= cov(X, b1) + cov(X, b2Z)
= 0 + cov(X, b2Z)
= b2cov(X, Z)

Here is an example of the use of the covariance rules. Suppose that Y is a linear function of
Z and that we wish to use this to decompose cov(X, Y). We substitute for Y (first line) and
then use covariance rule 1 (second line). 123
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Example use of covariance rules

Suppose Y = b1 + b2Z

cov(X, Y) = cov(X, [b1 + b2Z])


= cov(X, b1) + cov(X, b2Z)
= 0 + cov(X, b2Z)
= b2cov(X, Z)

Next we use covariance rule 3 (third line), and finally covariance rule 2 (fourth line).

124
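A simulation-style sketch of this decomposition: with simulated data for X and Z, and arbitrary illustrative constants b1 and b2, cov(X, b1 + b2Z) agrees with b2 cov(X, Z).

```python
import random

random.seed(0)
n = 200_000

# Simulated X and a Z correlated with it; b1 and b2 are arbitrary constants for the illustration.
X = [random.gauss(0, 1) for _ in range(n)]
Z = [x + random.gauss(0, 1) for x in X]
b1, b2 = 5.0, 3.0
Y = [b1 + b2 * z for z in Z]

def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (c - mv) for a, c in zip(u, v)) / len(u)

# cov(X, b1 + b2 Z) equals b2 cov(X, Z); the constant b1 contributes nothing.
print(round(cov(X, Y), 3), round(b2 * cov(X, Z), 3))
```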
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Variance rules

1. If Y = V + W,
var(Y) = var(V) + var(W) + 2cov(V, W).

2. If Y = bZ, where b is a constant,


var(Y) = b2var(Z).

3. If Y = b, where b is a constant,
var(Y) = 0.

4. If Y = V + b, where b is a constant,
var(Y) = var(V).

Corresponding to the covariance rules, there are parallel rules for variances. First the
addition rule.
125
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Variance rules

1. If Y = V + W,
var(Y) = var(V) + var(W) + 2cov(V, W).

2. If Y = bZ, where b is a constant,


var(Y) = b2var(Z).

3. If Y = b, where b is a constant,
var(Y) = 0.

4. If Y = V + b, where b is a constant,
var(Y) = var(V).

Next, the multiplication rule, for cases where a variable is multiplied by a constant.

126
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Variance rules

1. If Y = V + W,
var(Y) = var(V) + var(W) + 2cov(V, W).

2. If Y = bZ, where b is a constant,


var(Y) = b2var(Z).

3. If Y = b, where b is a constant,
var(Y) = 0.

4. If Y = V + b, where b is a constant,
var(Y) = var(V).

A third rule to cover the special case where Y is a constant.

127
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Variance rules

1. If Y = V + W,
var(Y) = var(V) + var(W) + 2cov(V, W).

2. If Y = bZ, where b is a constant,


var(Y) = b2var(Z).

3. If Y = b, where b is a constant,
var(Y) = 0.

4. If Y = V + b, where b is a constant,
var(Y) = var(V).

Finally, it is useful to state a fourth rule. It depends on the first three, but it is so often of
practical value that it is worth keeping it in mind separately.
128
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Variance rules

1. If Y = V + W,
var(Y) = var(V) + var(W) + 2cov(V, W).

Proof:
var(Y) = cov(Y, Y) = cov([V + W], Y)
= cov(V, Y) + cov(W, Y)
= cov(V, [V + W]) + cov(W, [V + W])
= cov(V, V) + cov(V, W) + cov(W, V) + cov(W, W)
= var(V) + 2cov(V, W) + var(W)

Note: var(X) = E[(X − μX)²] = E[(X − μX)(X − μX)] = cov(X, X).

The proofs of these rules can be derived from the results for covariances, noting that the
variance of Y is equivalent to the covariance of Y with itself.
129
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Variance rules

1. If Y = V + W,
var(Y) = var(V) + var(W) + 2cov(V, W).

Proof:
var(Y) = cov(Y, Y) = cov([V + W], Y)
= cov(V, Y) + cov(W, Y)
= cov(V, [V + W]) + cov(W, [V + W])
= cov(V, V) + cov(V,W) + cov(W, V) + cov(W, W)
= var(V) + 2cov(V, W) + var(W)

We start by replacing one of the Y arguments by V + W.

130
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Variance rules

1. If Y = V + W,
var(Y) = var(V) + var(W) + 2cov(V, W).

Proof:
var(Y) = cov(Y, Y) = cov([V + W], Y)
= cov(V, Y) + cov(W, Y)
= cov(V, [V + W]) + cov(W, [V + W])
= cov(V, V) + cov(V,W) + cov(W, V) + cov(W, W)
= var(V) + 2cov(V, W) + var(W)

We then use covariance rule 1.

131
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Variance rules

1. If Y = V + W,
var(Y) = var(V) + var(W) + 2cov(V, W).

Proof:
var(Y) = cov(Y, Y) = cov([V + W], Y)
= cov(V, Y) + cov(W, Y)
= cov(V, [V + W]) + cov(W, [V + W])
= cov(V, V) + cov(V,W) + cov(W, V) + cov(W, W)
= var(V) + 2cov(V, W) + var(W)

We now substitute for the other Y argument in both terms and use covariance rule 1 a
second time.
132
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Variance rules

1. If Y = V + W,
var(Y) = var(V) + var(W) + 2cov(V, W).

Proof:
var(Y) = cov(Y, Y) = cov([V + W], Y)
= cov(V, Y) + cov(W, Y)
= cov(V, [V + W]) + cov(W, [V + W])
= cov(V, V) + cov(V,W) + cov(W, V) + cov(W, W)
= var(V) + 2cov(V, W) + var(W)

This gives us the result. Note that the order of the arguments does not affect a covariance
expression and hence cov(W, V) is the same as cov(V, W).
133
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Variance rules

2. If Y = bZ, where b is a constant,


var(Y) = b2var(Z).

Proof:
var(Y) = cov(Y, Y) = cov(bZ, bZ)
= b2cov(Z, Z)
= b2var(Z).

The proof of variance rule 2 is even more straightforward. We start by writing var(Y) as
cov(Y, Y). We then substitute bZ for both of the Y arguments and take the b terms outside as
common factors. 134
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Variance rules

3. If Y = b, where b is a constant,
var(Y) = 0.

Proof:
var(Y) = cov(b, b) = 0.

The third rule is trivial. We make use of covariance rule 3. Obviously if a variable is
constant, it has zero variance.
135
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Variance rules

4. If Y = V + b, where b is a constant,
var(Y) = var(V).

Proof:
var(Y) = var(V) + 2cov(V, b) + var(b)
= var(V)

The fourth variance rule starts by using the first. The second term on the right side is zero
by covariance rule 3. The third is also zero by variance rule 3.
136
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Variance rules

4. If Y = V + b, where b is a constant,
var(Y) = var(V).

Proof:
var(Y) = var(V) + 2cov(V, b) + var(b)
= var(V)

[Figure: the distribution of V, centred on μV, and the distribution of V + b, shifted to be centred on μV + b]

The intuitive reason for this result is easy to understand. If you add a constant to a
variable, you shift its entire distribution by that constant. The expected value of the
squared deviation from the mean is unaffected. 137
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Correlation

ρXY = σXY / √(σX² σY²)

cov(X, Y) is unsatisfactory as a measure of association between two variables X and Y


because it depends on the units of measurement of X and Y.
138
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Correlation

ρXY = σXY / √(σX² σY²)

A better measure of association is the population correlation coefficient because it is


dimensionless. The numerator possesses the units of measurement of both X and Y.
139
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Correlation

ρXY = σXY / √(σX² σY²)

The variances of X and Y in the denominator possess the squared units of measurement of
those variables.
140
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Correlation

ρXY = σXY / √(σX² σY²)

However, once the square root has been taken into account, the units of measurement are
the same as those of the numerator, and the expression as a whole is unit free.
141
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Correlation

ρXY = σXY / √(σX² σY²)

If X and Y are independent, ρXY will be equal to zero because σXY will be zero.

142
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Correlation

ρXY = σXY / √(σX² σY²)

If there is a positive association between them, σXY, and hence ρXY, will be positive. If there
is an exact positive linear relationship, ρXY will assume its maximum value of 1. Similarly, if
there is a negative relationship, ρXY will be negative, with minimum value of −1. 143
COVARIANCE, COVARIANCE AND VARIANCE RULES, AND CORRELATION

Correlation

ρXY = σXY / √(σX² σY²)

If X and Y are independent, ρXY will be equal to zero because σXY will be zero. If there is a
positive association between them, σXY, and hence ρXY, will be positive. If there is an exact
positive linear relationship, ρXY will assume its maximum value of 1. Similarly, if there is a
negative relationship, ρXY will be negative, with minimum value of −1.
144
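A simulation sketch illustrating that the correlation is unit-free: rescaling X changes the covariance but leaves ρ unchanged. The simulated relationship between X and Y is illustrative only.

```python
import random

random.seed(1)
n = 200_000

X = [random.gauss(10, 2) for _ in range(n)]
Y = [0.5 * x + random.gauss(0, 1) for x in X]

def cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (c - mv) for a, c in zip(u, v)) / len(u)

def corr(u, v):
    return cov(u, v) / (cov(u, u) * cov(v, v)) ** 0.5

# Rescaling X (changing its units) alters the covariance but not the correlation.
X_rescaled = [100 * x for x in X]
print(round(cov(X, Y), 3), round(cov(X_rescaled, Y), 3))
print(round(corr(X, Y), 4), round(corr(X_rescaled, Y), 4))
```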
SAMPLING AND ESTIMATORS

Suppose we have a random variable X and we wish to


estimate its unknown population mean mX.
Planning (beforehand concepts)
Our first step is to take a sample of n observations {X1, ..., Xn}.
Before we take the sample, while we are still at the
planning stage, the Xi are random quantities. We know
that they will be generated randomly from the
distribution for X, but we do not know their values in
advance.
So now we are thinking about random variables on two
levels: the random variable X, and its random sample
components.

145
SAMPLING AND ESTIMATORS

Suppose we have a random variable X and we wish to


estimate its unknown population mean mX.
Planning (beforehand concepts)
Our first step is to take a sample of n observations {X1,
, Xn}.
Before we take the sample, while we are still at the
planning stage, the Xi are random quantities. We know
that they will be generated randomly from the
distribution for X, but we do not know their values in
advance.
So now we are thinking about random variables on two
levels: the random variable X, and its random sample
components.

146
SAMPLING AND ESTIMATORS

Suppose we have a random variable X and we wish to


estimate its unknown population mean mX.
Planning (beforehand concepts)
Our first step is to take a sample of n observations {X1,
, Xn}.
Before we take the sample, while we are still at the
planning stage, the Xi are random quantities. We know
that they will be generated randomly from the
distribution for X, but we do not know their values in
advance.
So now we are thinking about random variables on two
levels: the random variable X, and its random sample
components.

147
SAMPLING AND ESTIMATORS

Suppose we have a random variable X and we wish to


estimate its unknown population mean mX.
Realization (afterwards concepts)
Once we have taken the sample we will have a set of
numbers {x1, ..., xn}.
This is called by statisticians a realization. The lower
case is to emphasize that these are numbers, not
variables.

148
SAMPLING AND ESTIMATORS

Suppose we have a random variable X and we wish to


estimate its unknown population mean mX.
Planning (beforehand concepts)
Back to the plan. Having generated a sample of n
observations {X1, , Xn}, we plan to use them with a
mathematical formula to estimate the unknown
population mean mX.
This formula is known as an estimator. In this context,
the standard (but not only) estimator is the sample mean

X̄ = (1/n)(X1 + ... + Xn)
An estimator is a random variable because it depends on
the random quantities {X1, , Xn}.

149
SAMPLING AND ESTIMATORS

Suppose we have a random variable X and we wish to


estimate its unknown population mean mX.
Planning (beforehand concepts)
Back to the plan. Having generated a sample of n
observations {X1, , Xn}, we plan to use them with a
mathematical formula to estimate the unknown
population mean mX.
This formula is known as an estimator. In this context,
the standard (but not only) estimator is the sample mean

X̄ = (1/n)(X1 + ... + Xn)
An estimator is a random variable because it depends on
the random quantities {X1, , Xn}.

150
SAMPLING AND ESTIMATORS

Suppose we have a random variable X and we wish to


estimate its unknown population mean mX.
Realization (afterwards concepts)
The actual number that we obtain, given the realization
{x1, ..., xn}, is known as our estimate.

151
SAMPLING AND ESTIMATORS

[Figure: probability density functions of X and of X̄, both centred on μX]

We will see why these distinctions are useful and important in a comparison of the
distributions of X and X̄. We will start by showing that X̄ has the same mean as X.
152
SAMPLING AND ESTIMATORS

E(X̄) = E[(1/n)(X1 + ... + Xn)]

= (1/n) E(X1 + ... + Xn)

= (1/n) [E(X1) + ... + E(Xn)]

= (1/n) [μX + ... + μX] = μX

We start by replacing X̄ by its definition and then using expected value rule 2 to take 1/n out
of the expression as a common factor.
153
SAMPLING AND ESTIMATORS

E(X̄) = E[(1/n)(X1 + ... + Xn)]

= (1/n) E(X1 + ... + Xn)

= (1/n) [E(X1) + ... + E(Xn)]

= (1/n) [μX + ... + μX] = μX

Next we use expected value rule 1 to replace the expectation of a sum with a sum of
expectations.
154
SAMPLING AND ESTIMATORS

E(X̄) = E[(1/n)(X1 + ... + Xn)]

= (1/n) E(X1 + ... + Xn)

= (1/n) [E(X1) + ... + E(Xn)]

= (1/n) [μX + ... + μX] = μX

Now we come to the bit that requires thought. Start with X1. When we are still at the
planning stage, X1 is a random variable and we do not know what its value will be.
155
SAMPLING AND ESTIMATORS

E(X̄) = E[(1/n)(X1 + ... + Xn)]

= (1/n) E(X1 + ... + Xn)

= (1/n) [E(X1) + ... + E(Xn)]

= (1/n) [μX + ... + μX] = μX

All we know is that it will be generated randomly from the distribution of X. The expected
value of X1, as a beforehand concept, will therefore be mX. The same is true for all the other
sample components, thinking about them beforehand. Hence we write this line.
156
SAMPLING AND ESTIMATORS

E(X̄) = E[(1/n)(X1 + ... + Xn)]

= (1/n) E(X1 + ... + Xn)

= (1/n) [E(X1) + ... + E(Xn)]

= (1/n) [μX + ... + μX] = μX

Thus we have shown that the mean of the distribution of X̄ is μX.

157
SAMPLING AND ESTIMATORS

[Figure: probability density functions of X and of X̄, both centred on μX]

We will next demonstrate that the variance of the distribution of X̄ is smaller than that of X,
as depicted in the diagram.
158
SAMPLING AND ESTIMATORS

σX̄² = var[(1/n)(X1 + ... + Xn)]

= (1/n²) var(X1 + ... + Xn)

= (1/n²) [var(X1) + ... + var(Xn)]

= (1/n²) [σX² + ... + σX²]

= (1/n²) (n σX²) = σX² / n

We start by replacing X̄ by its definition and then using variance rule 2 to take 1/n out of the
expression as a common factor (it comes out squared, as 1/n²).
159
SAMPLING AND ESTIMATORS

σX̄² = var[(1/n)(X1 + ... + Xn)]

= (1/n²) var(X1 + ... + Xn)

= (1/n²) [var(X1) + ... + var(Xn)]

= (1/n²) [σX² + ... + σX²]

= (1/n²) (n σX²) = σX² / n

Next we use variance rule 1 to replace the variance of a sum with a sum of variances. In
principle there are many covariance terms as well, but they are zero if we assume that the
sample values are generated independently. 160
SAMPLING AND ESTIMATORS

σX̄² = var[(1/n)(X1 + ... + Xn)]

= (1/n²) var(X1 + ... + Xn)

= (1/n²) [var(X1) + ... + var(Xn)]

= (1/n²) [σX² + ... + σX²]

= (1/n²) (n σX²) = σX² / n

Now we come to the bit that requires thought. Start with X1. When we are still at the
planning stage, we do not know what the value of X1 will be.
161
SAMPLING AND ESTIMATORS

σX̄² = var[(1/n)(X1 + ... + Xn)]

= (1/n²) var(X1 + ... + Xn)

= (1/n²) [var(X1) + ... + var(Xn)]

= (1/n²) [σX² + ... + σX²]

= (1/n²) (n σX²) = σX² / n

All we know is that it will be generated randomly from the distribution of X. The variance of
X1, as a beforehand concept, will therefore be sX2. The same is true for all the other sample
components, thinking about them beforehand. Hence we write this line. 162
SAMPLING AND ESTIMATORS

σX̄² = var[(1/n)(X1 + ... + Xn)]

= (1/n²) var(X1 + ... + Xn)

= (1/n²) [var(X1) + ... + var(Xn)]

= (1/n²) [σX² + ... + σX²]

= (1/n²) (n σX²) = σX² / n

Thus we have demonstrated that the variance of the sample mean is equal to the variance of
X divided by n, a result with which you will be familiar from your statistics course.
163
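A simulation sketch of these two results: repeated samples of size n from the two-dice distribution have sample means that average about 7, with a variance close to σX²/n. The sample size and number of replications are arbitrary illustrative choices.

```python
import random

random.seed(2)

def draw_x():
    # One draw of X, the sum of two fair dice (population mean 7, variance 5.83).
    return random.randint(1, 6) + random.randint(1, 6)

n = 25          # sample size
reps = 50_000   # number of samples drawn "at the planning stage"

means = [sum(draw_x() for _ in range(n)) / n for _ in range(reps)]

grand_mean = sum(means) / reps
var_of_means = sum((m - grand_mean) ** 2 for m in means) / reps

print(round(grand_mean, 3))                            # close to mu_X = 7
print(round(var_of_means, 4), round(5.8333 / n, 4))    # close to sigma_X^2 / n
```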
UNBIASEDNESS AND EFFICIENCY

Unbiasedness of X̄:

E(X̄) = E[(1/n)(X1 + ... + Xn)] = (1/n) E(X1 + ... + Xn)

= (1/n) [E(X1) + ... + E(Xn)] = (1/n)(n μX) = μX

Much of the analysis in this course will be concerned with three properties of estimators:
unbiasedness, efficiency, and consistency. The first two, treated here, relate to finite
sample analysis: analysis where the sample has a finite number of observations. 164
UNBIASEDNESS AND EFFICIENCY

Unbiasedness of X̄:

E(X̄) = E[(1/n)(X1 + ... + Xn)] = (1/n) E(X1 + ... + Xn)

= (1/n) [E(X1) + ... + E(Xn)] = (1/n)(n μX) = μX

Consistency, a property that relates to analysis when the sample size tends to infinity, is
treated in a later slideshow.
165
UNBIASEDNESS AND EFFICIENCY

Unbiasedness of X̄:

E(X̄) = E[(1/n)(X1 + ... + Xn)] = (1/n) E(X1 + ... + Xn)

= (1/n) [E(X1) + ... + E(Xn)] = (1/n)(n μX) = μX

Suppose that you wish to estimate the population mean mX of a random variable X given a
sample of observations. We will demonstrate that the sample mean is an unbiased
estimator, but not the only one. 166
UNBIASEDNESS AND EFFICIENCY

Unbiasedness of X̄:

E(X̄) = E[(1/n)(X1 + ... + Xn)] = (1/n) E(X1 + ... + Xn)

= (1/n) [E(X1) + ... + E(Xn)] = (1/n)(n μX) = μX

We will start with the proof in the previous sequence. We use the second expected value
rule to take the 1/n factor out of the expectation expression.
167
UNBIASEDNESS AND EFFICIENCY

Unbiasedness of X̄:

E(X̄) = E[(1/n)(X1 + ... + Xn)] = (1/n) E(X1 + ... + Xn)

= (1/n) [E(X1) + ... + E(Xn)] = (1/n)(n μX) = μX

Next we use the first expected value rule to break up the expression into the sum of the
expectations of the observations.
168
UNBIASEDNESS AND EFFICIENCY

Unbiasedness of X̄:

E(X̄) = E[(1/n)(X1 + ... + Xn)] = (1/n) E(X1 + ... + Xn)

= (1/n) [E(X1) + ... + E(Xn)] = (1/n)(n μX) = μX

Thinking about the sample values {X1, ..., Xn} at the planning stage, each expectation is
equal to μX, and hence the expected value of the sample mean, before we actually generate
the sample, is μX. 169
UNBIASEDNESS AND EFFICIENCY

Unbiasedness of X̄:

E(X̄) = E[(1/n)(X1 + ... + Xn)] = (1/n) E(X1 + ... + Xn)

= (1/n) [E(X1) + ... + E(Xn)] = (1/n)(n μX) = μX
Generalized estimator Z = l1X1 + l2X2

However, the sample mean is not the only unbiased estimator of the population mean. We
will demonstrate this supposing that we have a sample of two observations (to keep it
simple). 170
UNBIASEDNESS AND EFFICIENCY

Unbiasedness of X̄:

E(X̄) = E[(1/n)(X1 + ... + Xn)] = (1/n) E(X1 + ... + Xn)

= (1/n) [E(X1) + ... + E(Xn)] = (1/n)(n μX) = μX
Generalized estimator Z = l1X1 + l2X2

We will define a generalized estimator Z which is the weighted sum of the two observations,
l1 and l2 being the weights.
171
UNBIASEDNESS AND EFFICIENCY

Unbiasedness of X̄:

E(X̄) = E[(1/n)(X1 + ... + Xn)] = (1/n) E(X1 + ... + Xn)

= (1/n) [E(X1) + ... + E(Xn)] = (1/n)(n μX) = μX
Generalized estimator Z = l1X1 + l2X2
E(Z) = E(l1X1 + l2X2) = E(l1X1) + E(l2X2)
= l1 E(X1) + l2 E(X2) = (l1 + l2) μX
= μX if (l1 + l2) = 1

We will analyze the expected value of Z and find out what condition the weights have to
satisfy for Z to be an unbiased estimator.
172
UNBIASEDNESS AND EFFICIENCY

Unbiasedness of X̄:

E(X̄) = E[(1/n)(X1 + ... + Xn)] = (1/n) E(X1 + ... + Xn)

= (1/n) [E(X1) + ... + E(Xn)] = (1/n)(n μX) = μX
Generalized estimator Z = l1X1 + l2X2
E(Z) = E(l1X1 + l2X2) = E(l1X1) + E(l2X2)
= l1 E(X1) + l2 E(X2) = (l1 + l2) μX
= μX if (l1 + l2) = 1

We begin by decomposing the expectation using the first expected value rule.

173
UNBIASEDNESS AND EFFICIENCY

Unbiasedness of X̄:

E(X̄) = E[(1/n)(X1 + ... + Xn)] = (1/n) E(X1 + ... + Xn)

= (1/n) [E(X1) + ... + E(Xn)] = (1/n)(n μX) = μX
Generalized estimator Z = l1X1 + l2X2
E(Z) = E(l1X1 + l2X2) = E(l1X1) + E(l2X2)
= l1 E(X1) + l2 E(X2) = (l1 + l2) μX
= μX if (l1 + l2) = 1

Now we use the second expected value rule to bring l1 and l2 out of the expected value
expressions.
174
UNBIASEDNESS AND EFFICIENCY

Unbiasedness of X̄:

E(X̄) = E[(1/n)(X1 + ... + Xn)] = (1/n) E(X1 + ... + Xn)

= (1/n) [E(X1) + ... + E(Xn)] = (1/n)(n μX) = μX
Generalized estimator Z = l1X1 + l2X2
E(Z) = E(l1X1 + l2X2) = E(l1X1) + E(l2X2)
= l1 E(X1) + l2 E(X2) = (l1 + l2) μX
= μX if (l1 + l2) = 1

The expected value of X in each observation, before we generate the sample, is mX.

175
UNBIASEDNESS AND EFFICIENCY

Unbiasedness of X̄:

E(X̄) = E[(1/n)(X1 + ... + Xn)] = (1/n) E(X1 + ... + Xn)

= (1/n) [E(X1) + ... + E(Xn)] = (1/n)(n μX) = μX
Generalized estimator Z = l1X1 + l2X2
E(Z) = E(l1X1 + l2X2) = E(l1X1) + E(l2X2)
= l1 E(X1) + l2 E(X2) = (l1 + l2) μX
= μX if (l1 + l2) = 1

Thus Z is an unbiased estimator of mX if the sum of the weights is equal to one. An infinite
number of combinations of l1 and l2 satisfy this condition, not just the sample mean.
176
UNBIASEDNESS AND EFFICIENCY

[Figure: probability density functions of estimator A and estimator B, both centred on μX]

How do we choose among them? The answer is to use the most efficient estimator, the one
with the smallest population variance, because it will tend to be the most accurate.
177
UNBIASEDNESS AND EFFICIENCY

[Figure: probability density functions of two unbiased estimators, A and B, both centred on μX]

In the diagram, A and B are both unbiased estimators but B is superior because it is more
efficient.
178
UNBIASEDNESS AND EFFICIENCY

Generalized estimator: Z = \lambda_1 X_1 + \lambda_2 X_2

\sigma_Z^2 = \mathrm{var}(\lambda_1 X_1 + \lambda_2 X_2)
           = \mathrm{var}(\lambda_1 X_1) + \mathrm{var}(\lambda_2 X_2) + 2\,\mathrm{cov}(\lambda_1 X_1, \lambda_2 X_2)
           = \lambda_1^2 \sigma_{X_1}^2 + \lambda_2^2 \sigma_{X_2}^2
           = (\lambda_1^2 + \lambda_2^2)\sigma_X^2
           = (\lambda_1^2 + [1 - \lambda_1]^2)\sigma_X^2 \quad \text{if } \lambda_1 + \lambda_2 = 1
           = (2\lambda_1^2 - 2\lambda_1 + 1)\sigma_X^2

We will analyze the variance of the generalized estimator and find out what condition the
weights must satisfy in order to minimize it.
179
UNBIASEDNESS AND EFFICIENCY

Generalized estimator: Z = \lambda_1 X_1 + \lambda_2 X_2

\sigma_Z^2 = \mathrm{var}(\lambda_1 X_1 + \lambda_2 X_2)
           = \mathrm{var}(\lambda_1 X_1) + \mathrm{var}(\lambda_2 X_2) + 2\,\mathrm{cov}(\lambda_1 X_1, \lambda_2 X_2)
           = \lambda_1^2 \sigma_{X_1}^2 + \lambda_2^2 \sigma_{X_2}^2
           = (\lambda_1^2 + \lambda_2^2)\sigma_X^2
           = (\lambda_1^2 + [1 - \lambda_1]^2)\sigma_X^2 \quad \text{if } \lambda_1 + \lambda_2 = 1
           = (2\lambda_1^2 - 2\lambda_1 + 1)\sigma_X^2

The first variance rule is used to decompose the variance.

180
UNBIASEDNESS AND EFFICIENCY

Generalized estimator: Z = \lambda_1 X_1 + \lambda_2 X_2

\sigma_Z^2 = \mathrm{var}(\lambda_1 X_1 + \lambda_2 X_2)
           = \mathrm{var}(\lambda_1 X_1) + \mathrm{var}(\lambda_2 X_2) + 2\,\mathrm{cov}(\lambda_1 X_1, \lambda_2 X_2)
           = \lambda_1^2 \sigma_{X_1}^2 + \lambda_2^2 \sigma_{X_2}^2
           = (\lambda_1^2 + \lambda_2^2)\sigma_X^2
           = (\lambda_1^2 + [1 - \lambda_1]^2)\sigma_X^2 \quad \text{if } \lambda_1 + \lambda_2 = 1
           = (2\lambda_1^2 - 2\lambda_1 + 1)\sigma_X^2

Note that we are assuming that X1 and X2 are independent observations and so their
covariance is zero. The second variance rule is used to bring λ1 and λ2 out of the variance
expressions. 181
UNBIASEDNESS AND EFFICIENCY

Generalized estimator: Z = \lambda_1 X_1 + \lambda_2 X_2

\sigma_Z^2 = \mathrm{var}(\lambda_1 X_1 + \lambda_2 X_2)
           = \mathrm{var}(\lambda_1 X_1) + \mathrm{var}(\lambda_2 X_2) + 2\,\mathrm{cov}(\lambda_1 X_1, \lambda_2 X_2)
           = \lambda_1^2 \sigma_{X_1}^2 + \lambda_2^2 \sigma_{X_2}^2
           = (\lambda_1^2 + \lambda_2^2)\sigma_X^2
           = (\lambda_1^2 + [1 - \lambda_1]^2)\sigma_X^2 \quad \text{if } \lambda_1 + \lambda_2 = 1
           = (2\lambda_1^2 - 2\lambda_1 + 1)\sigma_X^2

The variance of X1, at the planning stage, is σX². The same goes for the variance of X2.

182
UNBIASEDNESS AND EFFICIENCY

Generalized estimator: Z = \lambda_1 X_1 + \lambda_2 X_2

\sigma_Z^2 = \mathrm{var}(\lambda_1 X_1 + \lambda_2 X_2)
           = \mathrm{var}(\lambda_1 X_1) + \mathrm{var}(\lambda_2 X_2) + 2\,\mathrm{cov}(\lambda_1 X_1, \lambda_2 X_2)
           = \lambda_1^2 \sigma_{X_1}^2 + \lambda_2^2 \sigma_{X_2}^2
           = (\lambda_1^2 + \lambda_2^2)\sigma_X^2
           = (\lambda_1^2 + [1 - \lambda_1]^2)\sigma_X^2 \quad \text{if } \lambda_1 + \lambda_2 = 1
           = (2\lambda_1^2 - 2\lambda_1 + 1)\sigma_X^2

Now we take account of the condition for unbiasedness and re-write the variance of Z,
substituting for λ2.
183
UNBIASEDNESS AND EFFICIENCY

Generalized estimator: Z = \lambda_1 X_1 + \lambda_2 X_2

\sigma_Z^2 = \mathrm{var}(\lambda_1 X_1 + \lambda_2 X_2)
           = \mathrm{var}(\lambda_1 X_1) + \mathrm{var}(\lambda_2 X_2) + 2\,\mathrm{cov}(\lambda_1 X_1, \lambda_2 X_2)
           = \lambda_1^2 \sigma_{X_1}^2 + \lambda_2^2 \sigma_{X_2}^2
           = (\lambda_1^2 + \lambda_2^2)\sigma_X^2
           = (\lambda_1^2 + [1 - \lambda_1]^2)\sigma_X^2 \quad \text{if } \lambda_1 + \lambda_2 = 1
           = (2\lambda_1^2 - 2\lambda_1 + 1)\sigma_X^2

The quadratic is expanded. To minimize the variance of Z, we must choose λ1 so as to
minimize the final expression.
184
UNBIASEDNESS AND EFFICIENCY

Generalized estimator: Z = \lambda_1 X_1 + \lambda_2 X_2

\sigma_Z^2 = \mathrm{var}(\lambda_1 X_1 + \lambda_2 X_2)
           = \mathrm{var}(\lambda_1 X_1) + \mathrm{var}(\lambda_2 X_2) + 2\,\mathrm{cov}(\lambda_1 X_1, \lambda_2 X_2)
           = \lambda_1^2 \sigma_{X_1}^2 + \lambda_2^2 \sigma_{X_2}^2
           = (\lambda_1^2 + \lambda_2^2)\sigma_X^2
           = (\lambda_1^2 + [1 - \lambda_1]^2)\sigma_X^2 \quad \text{if } \lambda_1 + \lambda_2 = 1
           = (2\lambda_1^2 - 2\lambda_1 + 1)\sigma_X^2

\frac{d\sigma_Z^2}{d\lambda_1} = 0 \quad \Rightarrow \quad 4\lambda_1 - 2 = 0 \quad \Rightarrow \quad \lambda_1 = \lambda_2 = 0.5
We differentiate with respect to λ1 to obtain the first-order condition.

185
UNBIASEDNESS AND EFFICIENCY

Generalized estimator: Z = \lambda_1 X_1 + \lambda_2 X_2

\sigma_Z^2 = \mathrm{var}(\lambda_1 X_1 + \lambda_2 X_2)
           = \mathrm{var}(\lambda_1 X_1) + \mathrm{var}(\lambda_2 X_2) + 2\,\mathrm{cov}(\lambda_1 X_1, \lambda_2 X_2)
           = \lambda_1^2 \sigma_{X_1}^2 + \lambda_2^2 \sigma_{X_2}^2
           = (\lambda_1^2 + \lambda_2^2)\sigma_X^2
           = (\lambda_1^2 + [1 - \lambda_1]^2)\sigma_X^2 \quad \text{if } \lambda_1 + \lambda_2 = 1
           = (2\lambda_1^2 - 2\lambda_1 + 1)\sigma_X^2

\frac{d\sigma_Z^2}{d\lambda_1} = 0 \quad \Rightarrow \quad 4\lambda_1 - 2 = 0 \quad \Rightarrow \quad \lambda_1 = \lambda_2 = 0.5
The expression is minimized for λ1 = 0.5. It follows that λ2 = 0.5 as well. So we have
demonstrated that the sample mean is the most efficient unbiased estimator, at least in this
example. (Note that the second differential is positive, confirming that we have a minimum.)
186
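The minimization can also be checked numerically. The sketch below (same assumed, illustrative population as before) compares the simulated variance of Z with the theoretical value (2λ1² − 2λ1 + 1)σX² for a grid of weights; both are smallest at λ1 = 0.5, where the variance equals σX²/2, the variance of the sample mean of two observations.

```python
import numpy as np

mu_X, sigma_X = 10.0, 2.0        # assumed population parameters
replications = 200_000
rng = np.random.default_rng(1)

X1 = rng.normal(mu_X, sigma_X, replications)
X2 = rng.normal(mu_X, sigma_X, replications)

for lam1 in [0.1, 0.3, 0.5, 0.7, 0.9]:
    Z = lam1 * X1 + (1.0 - lam1) * X2
    theory = (2 * lam1**2 - 2 * lam1 + 1) * sigma_X**2   # theoretical var(Z)
    print(f"lambda1 = {lam1:.1f}: simulated var(Z) = {Z.var():.3f}, "
          f"theoretical = {theory:.3f}")
```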
UNBIASEDNESS AND EFFICIENCY

[Figure: the variance factor f(λ1) = 2λ1² − 2λ1 + 1 plotted for λ1 between 0 and 1]

Alternatively, we could find the minimum graphically. Here is a graph of the expression as a
function of λ1.
187
UNBIASEDNESS AND EFFICIENCY

[Figure: the variance factor f(λ1) = 2λ1² − 2λ1 + 1 plotted for λ1 between 0 and 1, with its minimum at λ1 = 0.5]

Again we see that the variance is minimized for λ1 = 0.5 and so the sample mean is the most
efficient unbiased estimator.
188
CONFLICTS BETWEEN UNBIASEDNESS AND MINIMUM VARIANCE

[Figure: probability density functions of estimator A (unbiased) and estimator B (biased, with a smaller variance)]

Suppose that you have alternative estimators of a population characteristic θ, one unbiased,
the other biased but with a smaller variance. How do you choose between them?
189
CONFLICTS BETWEEN UNBIASEDNESS AND MINIMUM VARIANCE

[Figure: a loss function plotted against the size of the estimation error, negative or positive]

One way is to define a loss function which reflects the cost to you of making errors, positive
or negative, of different sizes.
190
CONFLICTS BETWEEN UNBIASEDNESS AND MINIMUM VARIANCE

\mathrm{MSE}(Z) = E[(Z - \theta)^2] = \sigma_Z^2 + (\mu_Z - \theta)^2

[Figure: probability density function of estimator B]

A widely-used loss function is the mean square error of the estimator, defined as the
expected value of the square of the deviation of the estimator about the true value of the
population characteristic. 191
CONFLICTS BETWEEN UNBIASEDNESS AND MINIMUM VARIANCE

\mathrm{MSE}(Z) = E[(Z - \theta)^2] = \sigma_Z^2 + (\mu_Z - \theta)^2

[Figure: probability density function of estimator B, with the bias shown as the distance between θ and μZ]

The mean square error involves a trade-off between the variance of the estimator and its
bias. Suppose you have a biased estimator like estimator B above, with expected value μZ.
192
CONFLICTS BETWEEN UNBIASEDNESS AND MINIMUM VARIANCE

\mathrm{MSE}(Z) = E[(Z - \theta)^2] = \sigma_Z^2 + (\mu_Z - \theta)^2

[Figure: probability density function of estimator B, with the bias shown as the distance between θ and μZ]

The mean square error can be shown to be equal to the sum of the variance of the estimator
and the square of the bias.
193
CONFLICTS BETWEEN UNBIASEDNESS AND MINIMUM VARIANCE

\mathrm{MSE}(Z) = E[(Z - \theta)^2]
              = E[(Z - \mu_Z + \mu_Z - \theta)^2]
              = E[(Z - \mu_Z)^2 + (\mu_Z - \theta)^2 + 2(Z - \mu_Z)(\mu_Z - \theta)]
              = E[(Z - \mu_Z)^2] + E[(\mu_Z - \theta)^2] + E[2(Z - \mu_Z)(\mu_Z - \theta)]
              = \sigma_Z^2 + (\mu_Z - \theta)^2 + 2(\mu_Z - \theta) E(Z - \mu_Z)
              = \sigma_Z^2 + (\mu_Z - \theta)^2 + 2(\mu_Z - \theta)(\mu_Z - \mu_Z)
              = \sigma_Z^2 + (\mu_Z - \theta)^2

To demonstrate this, we start by subtracting and adding μZ.

194
CONFLICTS BETWEEN UNBIASEDNESS AND MINIMUM VARIANCE

\mathrm{MSE}(Z) = E[(Z - \theta)^2]
              = E[(Z - \mu_Z + \mu_Z - \theta)^2]
              = E[(Z - \mu_Z)^2 + (\mu_Z - \theta)^2 + 2(Z - \mu_Z)(\mu_Z - \theta)]
              = E[(Z - \mu_Z)^2] + E[(\mu_Z - \theta)^2] + E[2(Z - \mu_Z)(\mu_Z - \theta)]
              = \sigma_Z^2 + (\mu_Z - \theta)^2 + 2(\mu_Z - \theta) E(Z - \mu_Z)
              = \sigma_Z^2 + (\mu_Z - \theta)^2 + 2(\mu_Z - \theta)(\mu_Z - \mu_Z)
              = \sigma_Z^2 + (\mu_Z - \theta)^2

We expand the quadratic using the rule (a + b)² = a² + b² + 2ab, where a = Z − μZ and
b = μZ − θ.
195
CONFLICTS BETWEEN UNBIASEDNESS AND MINIMUM VARIANCE

\mathrm{MSE}(Z) = E[(Z - \theta)^2]
              = E[(Z - \mu_Z + \mu_Z - \theta)^2]
              = E[(Z - \mu_Z)^2 + (\mu_Z - \theta)^2 + 2(Z - \mu_Z)(\mu_Z - \theta)]
              = E[(Z - \mu_Z)^2] + E[(\mu_Z - \theta)^2] + E[2(Z - \mu_Z)(\mu_Z - \theta)]
              = \sigma_Z^2 + (\mu_Z - \theta)^2 + 2(\mu_Z - \theta) E(Z - \mu_Z)
              = \sigma_Z^2 + (\mu_Z - \theta)^2 + 2(\mu_Z - \theta)(\mu_Z - \mu_Z)
              = \sigma_Z^2 + (\mu_Z - \theta)^2

We use the first expected value rule to break up the expectation into its three components.

196
CONFLICTS BETWEEN UNBIASEDNESS AND MINIMUM VARIANCE

\mathrm{MSE}(Z) = E[(Z - \theta)^2]
              = E[(Z - \mu_Z + \mu_Z - \theta)^2]
              = E[(Z - \mu_Z)^2 + (\mu_Z - \theta)^2 + 2(Z - \mu_Z)(\mu_Z - \theta)]
              = E[(Z - \mu_Z)^2] + E[(\mu_Z - \theta)^2] + E[2(Z - \mu_Z)(\mu_Z - \theta)]
              = \sigma_Z^2 + (\mu_Z - \theta)^2 + 2(\mu_Z - \theta) E(Z - \mu_Z)
              = \sigma_Z^2 + (\mu_Z - \theta)^2 + 2(\mu_Z - \theta)(\mu_Z - \mu_Z)
              = \sigma_Z^2 + (\mu_Z - \theta)^2

The first term in the expression is by definition the variance of Z.

197
CONFLICTS BETWEEN UNBIASEDNESS AND MINIMUM VARIANCE

\mathrm{MSE}(Z) = E[(Z - \theta)^2]
              = E[(Z - \mu_Z + \mu_Z - \theta)^2]
              = E[(Z - \mu_Z)^2 + (\mu_Z - \theta)^2 + 2(Z - \mu_Z)(\mu_Z - \theta)]
              = E[(Z - \mu_Z)^2] + E[(\mu_Z - \theta)^2] + E[2(Z - \mu_Z)(\mu_Z - \theta)]
              = \sigma_Z^2 + (\mu_Z - \theta)^2 + 2(\mu_Z - \theta) E(Z - \mu_Z)
              = \sigma_Z^2 + (\mu_Z - \theta)^2 + 2(\mu_Z - \theta)(\mu_Z - \mu_Z)
              = \sigma_Z^2 + (\mu_Z - \theta)^2

(μZ − θ) is a constant, so the second term, (μZ − θ)², is also a constant and its expected value is just itself.

198
CONFLICTS BETWEEN UNBIASEDNESS AND MINIMUM VARIANCE

\mathrm{MSE}(Z) = E[(Z - \theta)^2]
              = E[(Z - \mu_Z + \mu_Z - \theta)^2]
              = E[(Z - \mu_Z)^2 + (\mu_Z - \theta)^2 + 2(Z - \mu_Z)(\mu_Z - \theta)]
              = E[(Z - \mu_Z)^2] + E[(\mu_Z - \theta)^2] + E[2(Z - \mu_Z)(\mu_Z - \theta)]
              = \sigma_Z^2 + (\mu_Z - \theta)^2 + 2(\mu_Z - \theta) E(Z - \mu_Z)
              = \sigma_Z^2 + (\mu_Z - \theta)^2 + 2(\mu_Z - \theta)(\mu_Z - \mu_Z)
              = \sigma_Z^2 + (\mu_Z - \theta)^2

In the third term, (μZ − θ) may be brought out of the expectation, again because it is a
constant, using the second expected value rule.
199
CONFLICTS BETWEEN UNBIASEDNESS AND MINIMUM VARIANCE

\mathrm{MSE}(Z) = E[(Z - \theta)^2]
              = E[(Z - \mu_Z + \mu_Z - \theta)^2]
              = E[(Z - \mu_Z)^2 + (\mu_Z - \theta)^2 + 2(Z - \mu_Z)(\mu_Z - \theta)]
              = E[(Z - \mu_Z)^2] + E[(\mu_Z - \theta)^2] + E[2(Z - \mu_Z)(\mu_Z - \theta)]
              = \sigma_Z^2 + (\mu_Z - \theta)^2 + 2(\mu_Z - \theta) E(Z - \mu_Z)
              = \sigma_Z^2 + (\mu_Z - \theta)^2 + 2(\mu_Z - \theta)(\mu_Z - \mu_Z)
              = \sigma_Z^2 + (\mu_Z - \theta)^2

Now E(Z) is μZ, and E(μZ) is μZ.

200
CONFLICTS BETWEEN UNBIASEDNESS AND MINIMUM VARIANCE

\mathrm{MSE}(Z) = E[(Z - \theta)^2]
              = E[(Z - \mu_Z + \mu_Z - \theta)^2]
              = E[(Z - \mu_Z)^2 + (\mu_Z - \theta)^2 + 2(Z - \mu_Z)(\mu_Z - \theta)]
              = E[(Z - \mu_Z)^2] + E[(\mu_Z - \theta)^2] + E[2(Z - \mu_Z)(\mu_Z - \theta)]
              = \sigma_Z^2 + (\mu_Z - \theta)^2 + 2(\mu_Z - \theta) E(Z - \mu_Z)
              = \sigma_Z^2 + (\mu_Z - \theta)^2 + 2(\mu_Z - \theta)(\mu_Z - \mu_Z)
              = \sigma_Z^2 + (\mu_Z - \theta)^2

Hence the third term is zero and the mean square error of Z is shown to be the sum of the
variance of Z and the bias squared.
201
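A small simulation confirms the decomposition. The sketch below is illustrative: the true value θ, the population, and the deliberately biased estimator (the sample mean shrunk by a factor of 0.9) are all assumptions chosen to create a visible bias.

```python
import numpy as np

theta, sigma_X = 10.0, 2.0       # assumed true value and population s.d.
n, replications = 25, 200_000
rng = np.random.default_rng(2)

samples = rng.normal(theta, sigma_X, size=(replications, n))
Z = 0.9 * samples.mean(axis=1)   # a deliberately biased estimator of theta

mse = np.mean((Z - theta) ** 2)          # E[(Z - theta)^2]
variance = Z.var()                       # sigma_Z^2
bias_sq = (Z.mean() - theta) ** 2        # (mu_Z - theta)^2

print(f"MSE          = {mse:.4f}")
print(f"var + bias^2 = {variance + bias_sq:.4f}")
```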
CONFLICTS BETWEEN UNBIASEDNESS AND MINIMUM VARIANCE

[Figure: probability density functions of estimator A (unbiased, larger variance) and estimator B (biased, smaller variance)]

In the case of the estimators shown, estimator B is probably a little better than estimator A
according to the MSE criterion.
202
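The comparison in the diagram can be made concrete with invented numbers. In the sketch below the variances and bias are assumptions chosen so that A is unbiased with a larger variance and B is biased with a smaller variance; B then wins under the MSE criterion.

```python
# Stylized comparison of two estimators by mean square error.
# All numbers are assumed for illustration; they are not taken from the diagram.

var_A, bias_A = 4.0, 0.0      # estimator A: unbiased, larger variance
var_B, bias_B = 1.5, 1.0      # estimator B: biased, smaller variance

mse_A = var_A + bias_A ** 2   # = 4.0
mse_B = var_B + bias_B ** 2   # = 2.5  -> B preferred under the MSE criterion

print(f"MSE(A) = {mse_A:.2f}, MSE(B) = {mse_B:.2f}")
```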
ESTIMATORS OF VARIANCE, COVARIANCE, AND CORRELATION

Variance:   \mathrm{var}(X) = \sigma_X^2 = E[(X - \mu_X)^2]

We have seen that the variance of a random variable X is given by the expression above.

203
ESTIMATORS OF VARIANCE, COVARIANCE, AND CORRELATION

Variance:   \mathrm{var}(X) = \sigma_X^2 = E[(X - \mu_X)^2]

Estimator:  s_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2

Given a sample of n observations, the usual estimator of the variance is the sum of the
squared deviations around the sample mean divided by n − 1, typically denoted sX².
204
ESTIMATORS OF VARIANCE, COVARIANCE, AND CORRELATION

Variance:   \mathrm{var}(X) = \sigma_X^2 = E[(X - \mu_X)^2]

Estimator:  s_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2

Since the variance is the expected value of the squared deviation of X about its mean, it
makes intuitive sense to use the average of the sample squared deviations as an estimator.
But why divide by n − 1 rather than by n? 205
ESTIMATORS OF VARIANCE, COVARIANCE, AND CORRELATION

Variance:   \mathrm{var}(X) = \sigma_X^2 = E[(X - \mu_X)^2]

Estimator:  s_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2

The reason is that the sample mean is by definition in the middle of the sample, while the
unknown population mean is not, except by coincidence.
206
ESTIMATORS OF VARIANCE, COVARIANCE, AND CORRELATION

Variance:   \mathrm{var}(X) = \sigma_X^2 = E[(X - \mu_X)^2]

Estimator:  s_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2

As a consequence, the sum of the squared deviations from the sample mean tends to be
slightly smaller than the sum of the squared deviations from the population mean.
207
ESTIMATORS OF VARIANCE, COVARIANCE, AND CORRELATION

Variance:   \mathrm{var}(X) = \sigma_X^2 = E[(X - \mu_X)^2]

Estimator:  s_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2

Hence a simple average of the squared sample deviations is a downwards biased estimator
of the variance. However, the bias can be shown to be a factor of (n − 1)/n. Thus one can
allow for the bias by dividing the sum of the squared deviations by n − 1 instead of n. 208
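A simulation shows the effect of the divisor directly. The sketch below (assumed normal population, small sample size so that the bias is visible) averages the two estimators over many samples: the n-divisor version understates σX² by roughly the factor (n − 1)/n, while the (n − 1)-divisor version is unbiased.

```python
import numpy as np

mu_X, sigma_X = 10.0, 2.0        # assumed population parameters
n, replications = 10, 200_000
rng = np.random.default_rng(3)

samples = rng.normal(mu_X, sigma_X, size=(replications, n))

var_n = samples.var(axis=1, ddof=0)           # divide by n
var_n_minus_1 = samples.var(axis=1, ddof=1)   # divide by n - 1

print(f"true variance                   = {sigma_X**2:.3f}")
print(f"average of the /n estimator     = {var_n.mean():.3f} "
      f"(about (n-1)/n * sigma^2 = {(n - 1) / n * sigma_X**2:.3f})")
print(f"average of the /(n-1) estimator = {var_n_minus_1.mean():.3f}")
```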
ESTIMATORS OF VARIANCE, COVARIANCE, AND CORRELATION

Variance:   \mathrm{var}(X) = \sigma_X^2 = E[(X - \mu_X)^2]

Estimator:  s_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2

Covariance: \mathrm{cov}(X, Y) = \sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)]

Estimator:  s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})

A similar adjustment has to be made when estimating a covariance. For two random variables X
and Y an unbiased estimator of the covariance σXY is given by the sum of the products of the
deviations around the sample means divided by n − 1. 209
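The sketch below (with an invented bivariate sample) computes the estimator exactly as written and checks it against numpy's np.cov, which also divides by n − 1 by default.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
# Assumed illustrative data: Y is related to X plus noise
X = rng.normal(10.0, 2.0, n)
Y = 3.0 + 0.5 * X + rng.normal(0.0, 1.0, n)

# Unbiased covariance estimator: sum of products of deviations, divided by n - 1
s_XY = np.sum((X - X.mean()) * (Y - Y.mean())) / (n - 1)

print(f"manual s_XY = {s_XY:.4f}")
print(f"np.cov      = {np.cov(X, Y)[0, 1]:.4f}")
```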
ESTIMATORS OF VARIANCE, COVARIANCE, AND CORRELATION

Correlation: \rho_{XY} = \frac{\sigma_{XY}}{\sqrt{\sigma_X^2 \sigma_Y^2}}

The population correlation coefficient ρXY for two variables X and Y is defined to be their
covariance divided by the square root of the product of their variances.
210
ESTIMATORS OF VARIANCE, COVARIANCE, AND CORRELATION

Correlation: \rho_{XY} = \frac{\sigma_{XY}}{\sqrt{\sigma_X^2 \sigma_Y^2}}

Estimator:

r_{XY} = \frac{s_{XY}}{\sqrt{s_X^2 s_Y^2}}
       = \frac{\frac{1}{n-1}\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\frac{1}{n-1}\sum (X_i - \bar{X})^2 \cdot \frac{1}{n-1}\sum (Y_i - \bar{Y})^2}}
       = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}

The sample correlation coefficient, rXY, is obtained from this by replacing the covariance and
variances by their estimators.
211
ESTIMATORS OF VARIANCE, COVARIANCE, AND CORRELATION

Correlation: \rho_{XY} = \frac{\sigma_{XY}}{\sqrt{\sigma_X^2 \sigma_Y^2}}

Estimator:

r_{XY} = \frac{s_{XY}}{\sqrt{s_X^2 s_Y^2}}
       = \frac{\frac{1}{n-1}\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\frac{1}{n-1}\sum (X_i - \bar{X})^2 \cdot \frac{1}{n-1}\sum (Y_i - \bar{Y})^2}}
       = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}

The 1/(n − 1) terms in the numerator and the denominator cancel and one is left with a
straightforward expression.
212
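Finally, the sketch below (reusing the same kind of invented data) evaluates rXY from this last expression and checks that it matches np.corrcoef.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
X = rng.normal(10.0, 2.0, n)                      # assumed illustrative data
Y = 3.0 + 0.5 * X + rng.normal(0.0, 1.0, n)

dx, dy = X - X.mean(), Y - Y.mean()
# r_XY = sum of cross-deviations / sqrt(product of the sums of squared deviations)
r_XY = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

print(f"manual r_XY = {r_XY:.4f}")
print(f"np.corrcoef = {np.corrcoef(X, Y)[0, 1]:.4f}")
```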
