
Chapter 0: Statistics for Economists

References: Amemiya (1994), “Introduction to Statistics and Econometrics”, chap. 1 and 2; Goldberger (1991), “A Course in Econometrics”, chap. 1.

1 Probability and statistics


A probability has two different interpretations:

• Uncertainty. A crucial concept in economics (ex: individual decision-making, economic policy...).
= The Bayesian interpretation.

• Repeated events. The limit of empirical frequencies (ex: coin toss).
= The classical, or frequentist, interpretation.

In economics events often occur only once, so repetition is a thought experiment. Examples: time series data and country-level data.

Statistics is the “science of observing data and making inferences about characteristics of a [the?] random mechanism that has generated the data.”
• The first goal is to describe the data. Types of datasets: cross-sections (individual variation, indexed by i), time series (temporal variation, indexed by t) and panels (combining the two, indexed by i and t).
• The “random mechanism” refers to a model.
Example: revealed preferences.

1.1 Describing the data


Sample mean: In this course, we abstract from the (very interesting) issues linked to data collection. So, we assume that we observe a sample of variables of interest, e.g. wages, say y1, ..., yN. We can compute the mean:

\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i.

Ex: (y1, ..., yN) is a random sample taken from the US working population in 2000; ȳ is the sample mean.

What does the “average” wage ȳ represent?
One may be interested in the average wage of the random mechanism that has gener-
ated the data (say: µ). Ex: the average wage of the US working population in 2000.
The statistician assumes that y1, ..., yN are draws (often: iid draws) from an underlying random variable Y with mean µ.

Bayesian perspective: I don’t know µ, but I have an idea (a prior) about µ, say µ1. Under some assumptions (normality), the Bayesian solution (the posterior mean) is a weighted average of µ1 and ȳ, where the weights depend on the uncertainty about µ1. In this perspective the sample is given, and µ is a random variable (=stochastic).
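For concreteness, a standard sketch under normality (the prior variance τ² and the variance σ² of Y are notation introduced here, not in the text above): if µ ~ N(µ1, τ²) and y1, ..., yN are iid N(µ, σ²), the posterior mean is

E(\mu \mid y_1, \ldots, y_N) = \frac{\sigma^2}{\sigma^2 + N \tau^2}\, \mu_1 + \frac{N \tau^2}{\sigma^2 + N \tau^2}\, \bar{y},

so the weight on µ1 grows as the uncertainty about µ1 (i.e., τ²) shrinks.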

Frequentist perspective: If I could draw many different samples from Y, how close would ȳ and µ be? In other words, is it reasonable to estimate µ by ȳ? Or: if I replace µ by ȳ, will I make large mistakes on average, that is, over sufficiently many different samples? In this perspective µ is fixed (=deterministic) and there can be many samples.
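A minimal simulation sketch of this thought experiment (all numbers are illustrative assumptions, e.g. a normal Y with µ = 10):

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N = 10.0, 2.0, 100         # hypothetical mean and dispersion of Y
n_rep = 10_000                        # number of repeated samples (thought experiment)

# Draw many samples from Y and compute the sample mean of each one.
ybars = rng.normal(mu, sigma, size=(n_rep, N)).mean(axis=1)

print(ybars.mean())                   # close to mu: on average the mistake is small
print(ybars.std())                    # spread of ybar across samples, about sigma/sqrt(N)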

Many other statistics: the pth moment \frac{1}{N} \sum_{i=1}^{N} y_i^p, the variance \frac{1}{N} \sum_{i=1}^{N} y_i^2 - \bar{y}^2, the median med(y), defined by

\frac{N}{2} \leq \#\{i,\ y_i \leq \mathrm{med}(y)\} < \frac{N}{2} + 1,

and the αth quantile c_α(y) (0 ≤ α ≤ 1), defined by

\alpha N \leq \#\{i,\ y_i \leq c_\alpha(y)\} < \alpha N + 1.

Other statistics, which are not always numbers: the histogram, the density (the continuous limit of the histogram when the width of the bins tends to zero), the mode (the maximum of the density). Ex: is it reasonable to replace the (unknown) density by the empirical histogram?
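A short sketch of these descriptive statistics on a hypothetical wage sample (the numbers are made up for illustration):

import numpy as np

y = np.array([9.5, 10.1, 12.3, 8.7, 11.0, 10.4, 13.2, 9.9])   # hypothetical wages

ybar = y.mean()                          # sample mean
var = (y ** 2).mean() - ybar ** 2        # variance, as defined above
m3 = (y ** 3).mean()                     # third moment (p = 3)
med = np.median(y)                       # median
c90 = np.quantile(y, 0.9)                # 0.9 quantile
counts, bins = np.histogram(y, bins=3)   # histogram: counts of observations per bin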

Association: Economics mostly aims at finding relations between variables (production, labor and capital; inflation and unemployment; demand, supply and price...). Useful statistics are the covariance:

\mathrm{Cov}(y, z) = \frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y})(z_i - \bar{z}) = \frac{1}{N} \sum_{i=1}^{N} y_i z_i - \bar{y} \cdot \bar{z}

and the correlation coefficient:

\mathrm{Corr}(y, z) = \frac{\mathrm{Cov}(y, z)}{\sigma(y)\,\sigma(z)}.
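A direct translation of these two formulas (the data are made up; note that NumPy's std uses the 1/N convention by default, matching the definitions above):

import numpy as np

y = np.array([2.1, 2.5, 2.8, 3.0, 3.3])        # e.g. log-wages
z = np.array([10.0, 12.0, 14.0, 16.0, 18.0])   # e.g. years of schooling

cov = (y * z).mean() - y.mean() * z.mean()     # (1/N) sum y_i z_i - ybar * zbar
corr = cov / (y.std() * z.std())               # Cov(y, z) / (sigma(y) sigma(z))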

Also: Transition probability matrices.
A last example is a nonparametric regression estimate of the conditional mean of yi given zi, such as the Nadaraya-Watson kernel estimator:

\hat{y}(z) = \frac{\sum_{i=1}^{N} y_i \, K\left(\frac{z_i - z}{h}\right)}{\sum_{i=1}^{N} K\left(\frac{z_i - z}{h}\right)}.

In this expression, h is the bandwidth, which is usually small (e.g., h = N^{-β}, 0 < β < 1). The larger h, the smoother the curve. When h tends to infinity, ŷ(z) tends to ȳ irrespective of z. K is a continuous, nonnegative and symmetric function which goes to zero at infinity. Example: the Gaussian kernel. Ex: yi is log-wage and zi is years of schooling.
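A minimal implementation of the estimator above with a Gaussian kernel; the simulated schooling/log-wage data are purely illustrative assumptions:

import numpy as np

def nw_estimate(z0, z, y, h):
    """Nadaraya-Watson estimate of E(y | z = z0) with bandwidth h."""
    u = (z - z0) / h
    k = np.exp(-0.5 * u ** 2)           # Gaussian kernel (the normalizing constant cancels)
    return np.sum(y * k) / np.sum(k)

rng = np.random.default_rng(0)
N = 500
z = rng.uniform(8.0, 20.0, size=N)               # years of schooling
y = 0.06 * z + rng.normal(0.0, 0.3, size=N)      # log-wages (illustrative)
h = N ** (-1.0 / 5.0)                            # bandwidth of the form N^(-beta)

print(nw_estimate(12.0, z, y, h))    # estimated conditional mean at 12 years of schooling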

All these statistics can be studied in relation to the random process that has generated the data: the Data Generating Process (DGP). In economics, behavioral assumptions often lead to specific DGPs (= models).

2 Linking the theory to the data


Theory: an economic model describes the behavior of an economic agent. Ex: Mincer’s model (1958).
s = years of schooling, y(s) = yearly wage at schooling level s, T = retirement date, r = interest rate. The value of studying s years is:

V(s) = \int_s^T y(s) \exp(-rt)\, dt.
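Since y(s) does not depend on t, it factors out of the integral; as an intermediate step,

V(s) = y(s) \int_s^T \exp(-rt)\, dt = y(s)\, \frac{\exp(-rs) - \exp(-rT)}{r}.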

Maximizing V (s) w.r.t. s yields

\frac{d \ln y(s)}{ds} = r\, \frac{\exp(-rs)}{\exp(-rs) - \exp(-rT)}.

When T tends to infinity we obtain

\frac{d \ln y(s)}{ds} = r.

Empirical analysis: strictly speaking, the model implies

ln y(s) = rs + b.

The relation is deterministic. However, in the data we do not observe a deterministic (linear) relation. So, there is an infinite discrepancy between the model and the data.

A reason for that could be that the theory holds ceteris paribus, everything else equal. Still, even when taking homogeneous segments (males in a certain occupation living in a given city...), the relationship between wage and age will never look deterministic. Moreover, such an approach is subject to the curse of dimensionality: if we condition on too many variables (sex, occupation, city of residence...) we end up with empty cells.

Solution:
• We interpret the theoretical relation given by the model as one giving a conditional
average. Let E(Y |X) be the conditional expectation of Y given X. We will give a precise
meaning to this object in Chapter 4. We interpret the Mincer equation as:

E(ln Y |S) = rS + b.

This relation is still not exactly satisfied in the data.


• We interpret the sample that we have as coming from an infinite population. In
this perspective, the theoretical relation holds in the population. If different samples are
drawn from the same population then we will obtain different estimates.

In conclusion, when confronting the Mincer model with the data we ask whether the relation between log-wages and years of schooling is linear in the population. The information we have to answer that question is contained in one given sample drawn from that population. Statistics provides ways of estimating the relation and of testing hypotheses about it (e.g., is the model rejected by the data? By how much will my years at Chicago increase my wage?).
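As a final sketch, estimating r and b from one (simulated) sample by least squares; the values 0.05 and 1.5 below are assumptions used only to generate fake data:

import numpy as np

rng = np.random.default_rng(0)
N = 1_000
s = rng.integers(8, 21, size=N).astype(float)            # years of schooling
log_w = 0.05 * s + 1.5 + rng.normal(0.0, 0.4, size=N)    # log-wage = r*s + b + noise

# Least-squares fit of log-wage on schooling: the slope estimates r, the intercept b.
r_hat, b_hat = np.polyfit(s, log_w, deg=1)
print(r_hat, b_hat)                                      # close to 0.05 and 1.5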
