EC114 Introduction to Quantitative Economics


10. Further Inference Topics
Department of Economics
University of Essex
13/15 December 2011
Outline 2/31
1. Correlations and Independence
2. Tests About Two Populations
Reference: R. L. Thomas, Using Statistics in Economics, McGraw-Hill, 2005, sections 6.3-6.4.
Correlations and Independence 3/31
Although the covariance can tell us whether there is a
positive or a negative linear association between X and Y,
it tells us nothing about the strength of this association.
For example, what constitutes a large linear association,
whether positive or negative?
The correlation coefficient, ρ, provides such information,
and is defined as

    ρ = Cov(X, Y) / (√V(X) · √V(Y)).

The usefulness of the correlation coefficient lies in the fact
that, unlike the covariance, it can only take values within a
definite finite range.
Correlations and Independence 4/31
While a covariance can take any value between −∞ and
+∞, the correlation is restricted to values within the range
−1 to +1.
When there is an exact (perfect) positive linear association
between X and Y, the correlation takes the value ρ = +1.
Similarly, when there is an exact (perfect) negative linear
association between X and Y, the correlation is ρ = −1.
Furthermore, when there is no linear association between
X and Y at all, then ρ = 0.
The correlation coefficient gives us a standard by which we
can judge the strength of any linear association between
two variables.
Clearly, if ρ were to take a value close to zero, we would
judge the association to be a very weak one.
However, values close to +1 or −1 would imply strong
positive and negative linear associations, respectively.
Correlations and Independence 5/31
For the die-rolling example, we have already computed the
covariance as −1.0625, whereas we found V(X) = 17.19
and V(Y) = 0.94.
Hence, we obtain the correlation as:

    ρ = −1.0625 / (√17.19 · √0.94) = −0.26.

The correlation is negative, as expected, but the value of ρ
is rather closer to 0 than to −1.
So we can say that there is a fairly weak negative linear
association between X and Y in this case.
This is not unexpected because, intuitively, we would not
expect a close relationship between X, the product of the
two numbers on the two dice, and Y, their difference.
Correlations and Independence 6/31
Recalling that Cov(X, Y) = E(XY) − E(X)E(Y), we can write ρ as

    ρ = [E(XY) − E(X)E(Y)] / (√V(X) · √V(Y)).

It follows that, if E(XY) = E(X)E(Y), then ρ = 0, i.e. the
correlation between X and Y will be zero.
But this refers to linear relationships, and in Economics,
relationships are not always linear.
Hence we often need to discover whether any non-linear
relationships between variables are present.
Correlations and Independence 7/31
In the lecture on probability (Lecture 2), we saw that
independence between two events A and B implied that
Pr(A and B) = Pr(A) Pr(B).
That is, the joint probability of two independent events
occurring is given by the product of the marginal
probabilities of the individual events.
Two random variables, X and Y, are said to be independent
if the joint probabilities are the product of the relevant
marginal probabilities for all possible combinations of X
and Y.
That is, X and Y are independent if and only if
p(X, Y) = f (X) g(Y) for all X and Y.
Correlations and Independence 8/31
Consider the following two independent random variables,
X and Y, whose joint and marginal distributions are:
Y\X 1 2 3 4 5 g(Y)
5 0.06 0.04 0.04 0.04 0.02 0.20
10 0.09 0.06 0.06 0.06 0.03 0.30
15 0.15 0.10 0.10 0.10 0.05 0.50
f (X) 0.30 0.20 0.20 0.20 0.10
Notice that the relationship p(X, Y) = f (X)g(Y) holds for all
combinations of X and Y.
For example, p(4, 5) = f(4)g(5) = 0.20 × 0.20 = 0.04 and p(1, 10) = f(1)g(10) = 0.30 × 0.30 = 0.09.
If just one combination of X and Y were to fail to obey the
condition p(X, Y) = f (X)g(Y), then the variables could no
longer be called independent.
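As a quick illustration, here is a minimal Python sketch (assuming NumPy is available) that checks the factorisation p(X, Y) = f(X)g(Y) for every cell of the table above:

```python
import numpy as np

# Joint distribution p(X, Y) from the slide: rows are Y = 5, 10, 15; columns are X = 1..5.
p = np.array([[0.06, 0.04, 0.04, 0.04, 0.02],
              [0.09, 0.06, 0.06, 0.06, 0.03],
              [0.15, 0.10, 0.10, 0.10, 0.05]])

f = p.sum(axis=0)  # marginal distribution of X (column sums)
g = p.sum(axis=1)  # marginal distribution of Y (row sums)

# X and Y are independent iff every joint probability equals the product of the marginals.
independent = np.allclose(p, np.outer(g, f))
print(f, g, independent)  # [0.3 0.2 0.2 0.2 0.1] [0.2 0.3 0.5] True
```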
Correlations and Independence 9/31
While zero correlation (ρ = 0) implies the absence of any
linear association between X and Y, independence is a
stronger condition.
Independence implies the absence of any association
between X and Y, linear or nonlinear.
Hence independence implies zero correlation, but zero
correlation does not necessarily imply independence.
Thus the condition E(XY) = E(X)E(Y) implies
Cov(X, Y) = 0, and hence zero correlation, but does not
necessarily imply independence.
For independence we also require p(X, Y) = f (X)g(Y).
Thus, if two variables are uncorrelated, they are not
linearly associated, but they could still be not independent
(i.e. dependent) if there were some nonlinear (possibly
weak) association present.
Correlations and Independence 10/31
Although the discussion here has been restricted to
discrete variables, the concepts of independence and
correlation apply equally well to continuous variables.
However, it can be shown that the distinction between the
two concepts disappears when continuous variables are
normally distributed.
If two normally distributed variables, X and Y, are
uncorrelated, then they must be independent.
Another useful property of normally distributed variables is
given by the following theorem.
Theorem
Any linear function of a series of independently and normally
distributed variables is itself normally distributed.
Example: if X, Y and Z are all independent normally
distributed random variables, then it follows that
W = 2X + 4Y − 3Z will also be normally distributed.
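To see the theorem at work, here is a small simulation sketch in Python (the means and standard deviations chosen for X, Y and Z are purely illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (hypothetical) independent normal variables.
x = rng.normal(1.0, 2.0, 100_000)   # mean 1, sd 2
y = rng.normal(0.0, 1.0, 100_000)   # mean 0, sd 1
z = rng.normal(3.0, 0.5, 100_000)   # mean 3, sd 0.5

w = 2 * x + 4 * y - 3 * z           # the linear function from the example

# By the theorem, W is again normal, here with mean 2(1) + 4(0) - 3(3) = -7
# and variance 4(2^2) + 16(1^2) + 9(0.5^2) = 34.25.
print(w.mean(), w.var())            # sample moments close to -7 and 34.25
```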
Correlations and Independence 11/31
Correlations can also be computed for samples.
The sample correlation coefficient, R, is defined as

    R = Σ(X − X̄)(Y − Ȳ) / (√Σ(X − X̄)² · √Σ(Y − Ȳ)²).

As with the population correlation we find that

    −1 ≤ R ≤ 1,

with the same interpretation of values, e.g. R = −1 implies
a perfect negative correlation between X and Y, etc.
To compute R we don't need to worry about normalising
anything by n or n − 1.
Correlations and Independence 12/31
To see this, note that

    Σ(X − X̄)(Y − Ȳ) = ΣXY − nX̄Ȳ,
    Σ(X − X̄)² = ΣX² − nX̄²,
    Σ(Y − Ȳ)² = ΣY² − nȲ².
As an example, a sample of 10 trials of the two-dice
experiment yields the following values for X and Y:
X 3 2 12 3 4 4 6 12 4 8
Y 2 1 1 2 0 3 1 1 0 2
From these values we obtain:

    ΣX = 58,  ΣX² = 458,  ΣXY = 72,  ΣY = 13,  ΣY² = 25.
Correlations and Independence 13/31
Hence X̄ = 58/10 = 5.8 and Ȳ = 13/10 = 1.3, and so

    Σ(X − X̄)(Y − Ȳ) = 72 − 10(5.8)(1.3) = −3.40,
    Σ(X − X̄)² = 458 − 10(5.8)² = 121.60,
    Σ(Y − Ȳ)² = 25 − 10(1.3)² = 8.10.

It follows that

    R = −3.4 / (√121.6 · √8.1) = −0.108,

suggesting that there is a weak negative linear relationship
between X and Y.
(Note that the population correlation is ρ = −0.26.)
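A minimal Python sketch of this calculation (assuming NumPy is available) reproduces the sample correlation directly from the ten observations:

```python
import numpy as np

# The 10 trials of the two-dice experiment from the slide.
X = np.array([3, 2, 12, 3, 4, 4, 6, 12, 4, 8])
Y = np.array([2, 1, 1, 2, 0, 3, 1, 1, 0, 2])

n = len(X)
sxy = (X * Y).sum() - n * X.mean() * Y.mean()   # sum of cross-products about the means, -3.4
sxx = (X ** 2).sum() - n * X.mean() ** 2        # sum of squares of X about its mean, 121.6
syy = (Y ** 2).sum() - n * Y.mean() ** 2        # sum of squares of Y about its mean, 8.1

R = sxy / np.sqrt(sxx * syy)
print(R)                          # about -0.108
print(np.corrcoef(X, Y)[0, 1])    # the same value from NumPy's built-in routine
```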
Tests About Two Populations 14/31
Often in statistics we need to compute parameter values
relating to two or more different populations.
Consider two cities, A and B.
Suppose that a researcher suspects that mean annual
income in city B is greater than in city A, and wishes to test
whether this is actually the case.
Let μ₁ and μ₂ denote the population mean incomes in
cities A and B respectively.
As always, we formulate a null hypothesis:

    H₀: μ₁ = μ₂ (no difference between mean incomes)

and an alternative hypothesis:

    H_A: μ₁ < μ₂ (mean income is greater in city B).
Tests About Two Populations 15/31
How can we derive a suitable test statistic?
Let X₁ be the annual income of a resident of city A, and
X₂ the annual income of a resident of city B.
We therefore have a population of very many values of X₁
from city A, with a mean μ₁ and a variance σ₁².
Similarly, we have a population of very many values of X₂
from city B, with mean μ₂ and variance σ₂².
Notice that the absolute sizes of the populations in the two
cities are unimportant, provided both cities are large.
We can now apply the Central Limit Theorem (CLT) to both
populations in turn.
Tests About Two Populations 16/31
Suppose we take a sufficiently large sample of size n₁ from
city A and compute the sample mean income X̄₁.
Then the CLT implies that

    X̄₁ ~ N(μ₁, σ₁²/n₁).

Similarly, taking a sufficiently large sample of size n₂ from
city B yields the following result for the sample mean X̄₂:

    X̄₂ ~ N(μ₂, σ₂²/n₂).

As we are interested in the difference between the two
unknown population means, μ₁ − μ₂, it makes sense to
base any test statistic on the quantity X̄₁ − X̄₂, the
difference between the two sample means.
Tests About Two Populations 17/31
If it were possible to take very many samples, each
yielding a value for X̄₁ − X̄₂, we would obtain a sampling
distribution for X̄₁ − X̄₂, from which we could derive a
suitable test statistic.
We therefore need to find the sampling distribution for X̄₁ − X̄₂.
As both X̄₁ and X̄₂ are normally distributed, it follows (from
the Theorem on slide 10) that X̄₁ − X̄₂ is also normally
distributed.
We therefore need to find the mean and variance of X̄₁ − X̄₂.
Tests About Two Populations 18/31
It is straightforward to show that

    E(X̄₁ − X̄₂) = E(X̄₁) − E(X̄₂) = μ₁ − μ₂.

If we assume that the two samples are independent (not
unreasonable) then Cov(X̄₁, X̄₂) = 0 and so

    V(X̄₁ − X̄₂) = V(X̄₁) + V(X̄₂) = σ₁²/n₁ + σ₂²/n₂.

Hence

    X̄₁ − X̄₂ ~ N(μ₁ − μ₂, σ₁²/n₁ + σ₂²/n₂).

This is summarised in the diagram on the next slide.
Tests About Two Populations 19/31
[Figure: sampling distribution of X̄₁ − X̄₂, centred on μ₁ − μ₂]
We can use this normal sampling distribution to derive an
appropriate test statistic.
Tests About Two Populations 20/31
We can now standardise X̄₁ − X̄₂ in the usual way by
subtracting the mean and dividing by the standard
deviation to obtain a N(0, 1) distribution:

    [X̄₁ − X̄₂ − (μ₁ − μ₂)] / √(σ₁²/n₁ + σ₂²/n₂) ~ N(0, 1).

However, we require this distribution under H₀: μ₁ = μ₂,
resulting in the test statistic

    TS = (X̄₁ − X̄₂) / √(σ₁²/n₁ + σ₂²/n₂) ~ N(0, 1)

under H₀.
Tests About Two Populations 21/31
Recall that H_A: μ₁ < μ₂, implying that μ₁ − μ₂ < 0.
We therefore have a lower-sided one-tail test.
Adopting a 5% level of significance, the critical value from
the N(0, 1) distribution is −1.64 and our test criterion
becomes:

    reject H₀ if TS < −1.64,
    but reserve judgement if TS > −1.64.
The test criterion is illustrated on the next slide.
Tests About Two Populations 22/31
[Figure: N(0, 1) density showing the 5% lower-tail rejection region below −1.64]
Suppose we take samples of size n₁ = n₂ = 200 and obtain:

    X̄₁ = 14,860, s₁ = 1655,  X̄₂ = 17,230, s₂ = 2108.

As σ₁² and σ₂² are unknown we replace them with the
unbiased estimators s₁² and s₂².
Tests About Two Populations 23/31
Then

    TS = (14,860 − 17,230) / √(1655²/200 + 2108²/200) = −12.52.

Using the test criterion we find that TS < −1.64 and hence
we reject H₀: μ₁ = μ₂ in favour of H_A: μ₁ < μ₂, i.e. there is
evidence that the mean income in city A is below that in
city B.
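A minimal Python sketch of this large-sample test (assuming NumPy and SciPy are available; the summary statistics are those on the previous slide) is:

```python
import numpy as np
from scipy import stats

# Summary statistics from the two samples (cities A and B).
n1, xbar1, s1 = 200, 14_860, 1655
n2, xbar2, s2 = 200, 17_230, 2108

# Large-sample test statistic for H0: mu1 = mu2 against HA: mu1 < mu2,
# with the unknown variances replaced by their unbiased estimators.
ts = (xbar1 - xbar2) / np.sqrt(s1**2 / n1 + s2**2 / n2)

crit = stats.norm.ppf(0.05)     # lower-tail 5% critical value, about -1.645
p_value = stats.norm.cdf(ts)    # lower-tail p-value

print(ts, crit, ts < crit)      # about -12.5, -1.645, True -> reject H0
print(p_value)
```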
Tests About Two Populations 24/31
The preceding results were based on large samples and
the CLT.
However, when samples are small we have to make use of
the Student's t-distribution.
Provided that both populations:
1. are normally distributed, and
2. have the same variance σ² (i.e. σ₁² = σ₂² = σ²),
then

    [X̄₁ − X̄₂ − (μ₁ − μ₂)] / (σ√(1/n₁ + 1/n₂)) ~ N(0, 1),

even when samples are small.
But this will not be the case when σ² is unknown and has
to be replaced by an estimator of it.
Tests About Two Populations 25/31
When σ² is unknown we can estimate it using

    s² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / (n₁ + n₂ − 2).

It then follows that

    [X̄₁ − X̄₂ − (μ₁ − μ₂)] / (s√(1/n₁ + 1/n₂)) ~ t(n₁ + n₂ − 2),

i.e. a t-distribution with n₁ + n₂ − 2 degrees of freedom.
For example, if H₀: μ₁ = μ₂ then we can use the test statistic

    TS = (X̄₁ − X̄₂) / (s√(1/n₁ + 1/n₂)) ~ t(n₁ + n₂ − 2) under H₀

and apply the usual testing procedure.
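A minimal Python sketch of this pooled-variance t-test (assuming NumPy and SciPy are available; the two small samples here are purely hypothetical illustrations) is:

```python
import numpy as np
from scipy import stats

def pooled_t_test(x1, x2):
    """Two-sample t statistic with a pooled variance estimate, as on the slide."""
    n1, n2 = len(x1), len(x2)
    s2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
    ts = (x1.mean() - x2.mean()) / np.sqrt(s2 * (1 / n1 + 1 / n2))
    return ts, n1 + n2 - 2  # statistic and its degrees of freedom

# Hypothetical small samples, just to show the mechanics.
x1 = np.array([12.1, 14.3, 11.8, 13.0, 12.5])
x2 = np.array([14.9, 15.2, 13.8, 16.1, 14.4, 15.0])

ts, df = pooled_t_test(x1, x2)
p_value = 2 * stats.t.cdf(-abs(ts), df)   # two-tail p-value for HA: mu1 != mu2
print(ts, df, p_value)

# SciPy's built-in routine gives the same statistic with equal_var=True:
print(stats.ttest_ind(x1, x2, equal_var=True))
```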
Tests About Two Populations 26/31
In addition to tests of the equality of two population means,
we can also conduct a test of the equality of the variances
of two populations.
In this case we shall take the null and alternative
hypotheses to be:
    H₀: σ₁² = σ₂²,   H_A: σ₁² ≠ σ₂².

It turns out that a suitable test statistic is the ratio of the
two sample variances, s₁² and s₂².
We therefore have

    TS = s₁² / s₂² ~ F(n₁ − 1, n₂ − 1),

which is an F-distribution with n₁ − 1 degrees of freedom
for the numerator and n₂ − 1 degrees of freedom for the
denominator.
Tests About Two Populations 27/31
An example of an F-distribution with 20 d.f. for the
numerator and 20 d.f. for the denominator is as follows:
[Figure: density of the F(20, 20) distribution]
The distribution is strictly positive and not symmetric, so we
have to find two critical values for a two-tail test from the
following table:
Tests About Two Populations 28/31
Upper 2.5% critical values of the F-distribution with v₁ degrees of freedom for the numerator
and v₂ degrees of freedom for the denominator

v₂ \ v₁   1      2      3      4      5      6      7      8      9
1 647.79 799.50 864.16 899.58 921.85 937.11 948.22 956.66 963.28
2 38.51 39.00 39.17 39.25 39.30 39.33 39.36 39.37 39.39
3 17.44 16.04 15.44 15.10 14.88 14.73 14.62 14.54 14.47
4 12.22 10.65 9.98 9.60 9.36 9.20 9.07 8.98 8.90
5 10.01 8.43 7.76 7.39 7.15 6.98 6.85 6.76 6.68
6 8.81 7.26 6.60 6.23 5.99 5.82 5.70 5.60 5.52
7 8.07 6.54 5.89 5.52 5.29 5.12 4.99 4.90 4.82
8 7.57 6.06 5.42 5.05 4.82 4.65 4.53 4.43 4.36
9 7.21 5.71 5.08 4.72 4.48 4.32 4.20 4.10 4.03
10 6.94 5.46 4.83 4.47 4.24 4.07 3.95 3.85 3.78
11 6.72 5.26 4.63 4.28 4.04 3.88 3.76 3.66 3.59
12 6.55 5.10 4.47 4.12 3.89 3.73 3.61 3.51 3.44
13 6.41 4.97 4.35 4.00 3.77 3.60 3.48 3.39 3.31
14 6.30 4.86 4.24 3.89 3.66 3.50 3.38 3.29 3.21
15 6.20 4.77 4.15 3.80 3.58 3.41 3.29 3.20 3.12
16 6.12 4.69 4.08 3.73 3.50 3.34 3.22 3.12 3.05
18 5.98 4.56 3.95 3.61 3.38 3.22 3.10 3.01 2.93
20 5.87 4.46 3.86 3.51 3.29 3.13 3.01 2.91 2.84
22 5.79 4.38 3.78 3.44 3.22 3.05 2.93 2.84 2.76
24 5.72 4.32 3.72 3.38 3.15 2.99 2.87 2.78 2.70
26 5.66 4.27 3.67 3.33 3.10 2.94 2.82 2.73 2.65
28 5.61 4.22 3.63 3.29 3.06 2.90 2.78 2.69 2.61
30 5.57 4.18 3.59 3.25 3.03 2.87 2.75 2.65 2.57
40 5.42 4.05 3.46 3.13 2.90 2.74 2.62 2.53 2.45
60 5.29 3.93 3.34 3.01 2.79 2.63 2.51 2.41 2.33
120 5.15 3.80 3.23 2.89 2.67 2.52 2.39 2.30 2.22
∞   5.02 3.69 3.12 2.79 2.57 2.41 2.29 2.19 2.11
NB: Entries are F_u such that Pr(F(v₁, v₂) > F_u) = 0.025.
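As a quick cross-check (assuming SciPy is available), an entry of this table can be reproduced with scipy.stats.f.ppf:

```python
from scipy import stats

# Upper 2.5% critical value with 8 d.f. for the numerator and 30 d.f. for the denominator.
print(stats.f.ppf(0.975, 8, 30))   # about 2.65, matching the table entry used on the next slide
```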
Tests About Two Populations 29/31
Table A.4 in Thomas provides additional d.f. for the
numerator as well as additional significance levels.
The table gives the upper-tail critical value, F_u; the
lower-tail critical value is simply the inverse of this, i.e.

    F_l = 1 / F_u.

For example, with 8 d.f. for the numerator and 30 d.f. for the
denominator, the table gives

    F_u = 2.65, so F_l = 1/2.65 = 0.38.

The test criterion for the test is:

    reject H₀ if TS < F_l or if TS > F_u,
    but reserve judgement if F_l < TS < F_u.
Tests About Two Populations 30/31
For example, suppose we have two samples yielding:

    n₁ = 10, s₁² = 14.5,  n₂ = 20, s₂² = 4.8.

The resulting test statistic is

    TS = 14.5 / 4.8 = 3.02

and has an F(9, 19) distribution under the null.
The upper two-tail 5% critical value (which puts 2.5% into
each tail) is 2.88 and the lower-tail value is 1/2.88 = 0.35.
As TS > 2.88 we reject the null in favour of the alternative
that σ₁² ≠ σ₂².
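A minimal Python sketch of this variance-ratio test (assuming SciPy is available) follows. The lower critical value uses the slide's 1/F_u shortcut; scipy.stats.f.ppf(0.025, df1, df2) would give the exact lower-tail value, and the two coincide when the numerator and denominator degrees of freedom are equal.

```python
from scipy import stats

# Summary statistics for the two samples on the slide.
n1, s1_sq = 10, 14.5
n2, s2_sq = 20, 4.8

ts = s1_sq / s2_sq                    # test statistic, about 3.02
df1, df2 = n1 - 1, n2 - 1             # F(9, 19) under H0: equal variances

f_u = stats.f.ppf(0.975, df1, df2)    # upper 2.5% critical value, about 2.88
f_l = 1 / f_u                         # lower critical value via the slide's rule, about 0.35

reject = (ts < f_l) or (ts > f_u)
print(ts, f_l, f_u, reject)           # reject = True at the 5% level
```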
Summary 31/31
Summary
Correlations and independence.
Tests about two populations.
Next term:
Econometrics (but first enjoy the vacation...)