Statistics is:
the “science of observing data and making inferences about characteristics of a [the?]
random mechanism that has generated the data.”
• The first goal consists of describing the data. Types of datasets: cross-sections
(individual variation, indexed by i), time series (temporal variation, indexed by t), and
panels (which combine the previous two, indexed by i and t).
• The “random mechanism” refers to a model.
Example: revealed preferences.
\[
\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i.
\]
Ex: (y1, ..., yN) is a random sample taken from the US working population in 2000. ȳ is
the sample mean.
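As a quick illustration, the sample mean can be computed directly from its formula (the wage numbers below are made up):

```python
import numpy as np

# Hypothetical wages for N = 5 individuals (illustrative numbers only)
y = np.array([32000.0, 45000.0, 28000.0, 61000.0, 39000.0])
N = len(y)

# Sample mean: (1/N) * sum_i y_i
y_bar = y.sum() / N
print(y_bar)  # 41000.0
```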
What does the “average” wage ȳ represent?
One may be interested in the average wage of the random mechanism that has generated
the data (say: µ). Ex: the average wage of the US working population in 2000.
The statistician assumes that y1 , ..., yN are draws (often: iid draws) from an underlying
random variable Y with mean µ.
Bayesian perspective: I don’t know µ, but I have an idea (a prior) about µ, say µ1. Under
some assumptions (normality), the Bayesian solution (the posterior mean) is a weighted
average of µ1 and ȳ, where the weights depend on the uncertainty about µ1. In this
perspective the sample is given, and µ is a random variable (= stochastic).
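A minimal numerical sketch of this weighted average, assuming a normal prior µ ~ N(µ1, τ²) and iid normal draws with known variance σ² (all numbers below are illustrative):

```python
import numpy as np

# Normal-normal model: prior mu ~ N(mu1, tau2), data y_i ~ N(mu, sigma2) iid,
# with sigma2 known. All numbers are illustrative, not from the text.
mu1, tau2 = 10.0, 4.0      # prior mean and prior variance (uncertainty about mu1)
sigma2 = 1.0               # known variance of each draw
y = np.array([11.0, 12.0, 10.5, 11.5])
N, y_bar = len(y), y.mean()

# Posterior mean = precision-weighted average of the prior mean and sample mean
w_prior = (1.0 / tau2) / (1.0 / tau2 + N / sigma2)
posterior_mean = w_prior * mu1 + (1.0 - w_prior) * y_bar
print(posterior_mean)
```

The less certain the prior (larger τ²), the smaller the weight on µ1 and the closer the posterior mean is to ȳ.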
Frequentist perspective: If I could draw many different samples from Y , how close
would ȳ and µ be? In other words, is it reasonable to estimate µ by ȳ? Or: if I replace
µ by ȳ, will I make large mistakes on average, that is, across sufficiently many different
samples? In this perspective µ is fixed (= deterministic) and there can be many samples.
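This thought experiment can be sketched by simulation: since we choose µ ourselves, we can draw many samples and check how far ȳ typically falls from µ (the normal DGP below is an assumption for illustration):

```python
import numpy as np

# Draw many samples of size N from Y ~ N(mu, 1), compute y_bar in each,
# and look at how close y_bar is to mu on average across samples.
rng = np.random.default_rng(0)
mu, N, n_samples = 5.0, 100, 2000

y_bars = rng.normal(mu, 1.0, size=(n_samples, N)).mean(axis=1)

bias = y_bars.mean() - mu   # close to 0: y_bar is unbiased for mu
spread = y_bars.std()       # close to 1/sqrt(N) = 0.1: typical estimation error
print(bias, spread)
```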
Many other statistics: pth moments ($\frac{1}{N}\sum_{i=1}^{N} y_i^p$), the variance ($\frac{1}{N}\sum_{i=1}^{N} y_i^2 - \bar{y}^2$), the median:
\[
\frac{N}{2} \le \#\{i,\; y_i \le \mathrm{med}(y)\} < \frac{N}{2} + 1,
\]
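These statistics can be computed directly on a small made-up sample; the last line checks the defining inequality of the median:

```python
import numpy as np

# Illustrative sample
y = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0])
N = len(y)

p = 2
pth_moment = (y**p).sum() / N                  # (1/N) sum_i y_i^p
variance = (y**2).sum() / N - y.mean()**2      # (1/N) sum_i y_i^2 - y_bar^2
med = np.median(y)

# med(y) satisfies N/2 <= #{i : y_i <= med(y)} < N/2 + 1
count_below = (y <= med).sum()
print(pth_moment, variance, med, count_below)
```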
Other statistics are not always numbers: the histogram, the density (the continuous
limit of the histogram when the width of the bins tends to zero), the mode (the maximum
of the density). Ex: is it reasonable to replace the (unknown) density by the empirical
histogram?
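On simulated data the density is known, so that question can be checked numerically; a sketch with a standard normal sample:

```python
import numpy as np

# Compare the empirical histogram to the true (here known, because simulated)
# standard normal density, bin by bin.
rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=50_000)

heights, edges = np.histogram(y, bins=50, range=(-4, 4), density=True)
midpoints = (edges[:-1] + edges[1:]) / 2
true_density = np.exp(-midpoints**2 / 2) / np.sqrt(2 * np.pi)

# Largest discrepancy between histogram height and density at bin midpoints
max_gap = np.abs(heights - true_density).max()
print(max_gap)
```

With a large sample and moderately narrow bins, the gap is small, in line with the idea that the density is the continuous limit of the histogram.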
\[
\mathrm{Corr}(y, z) = \frac{\mathrm{Cov}(y, z)}{\sigma(y)\,\sigma(z)}.
\]
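The correlation can be computed from its definition on a small illustrative sample:

```python
import numpy as np

# Illustrative paired observations
y = np.array([1.0, 2.0, 3.0, 4.0])
z = np.array([2.0, 1.0, 4.0, 3.0])

# Corr(y, z) = Cov(y, z) / (sigma(y) * sigma(z))
cov = ((y - y.mean()) * (z - z.mean())).mean()
corr = cov / (y.std() * z.std())
print(corr)  # 0.6
```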
Also: Transition probability matrices.
A last example is a nonparametric regression estimate of the conditional mean of yi
given zi, such as the Nadaraya-Watson kernel estimator:
\[
y(z) = \frac{\sum_{i=1}^{N} y_i \, K\!\left(\frac{z_i - z}{h}\right)}{\sum_{i=1}^{N} K\!\left(\frac{z_i - z}{h}\right)}.
\]
In this expression, h is the bandwidth, and is usually small (= N^{−β}, 0 < β < 1). The
larger h, the smoother the curve. When h tends to infinity, y(z) tends to ȳ irrespective of
z. K is a continuous, nonnegative and symmetric function which goes to zero at infinity.
Example: the Gaussian kernel. Ex: yi is log-wage and zi is years of schooling.
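A minimal sketch of this estimator with a Gaussian kernel, on simulated schooling/log-wage data (the data-generating numbers are made up); it also illustrates that as h grows very large, y(z) collapses to the sample mean ȳ:

```python
import numpy as np

# Simulated data: z_i plays the role of years of schooling, y_i of log-wage.
# The linear DGP and its coefficients are illustrative assumptions.
rng = np.random.default_rng(2)
N = 500
z_data = rng.uniform(8, 20, size=N)
y_data = 0.1 * z_data + rng.normal(0.0, 0.05, size=N)

def nw_estimate(z, h, zi=z_data, yi=y_data):
    """y(z) = sum_i y_i K((z_i - z)/h) / sum_i K((z_i - z)/h)."""
    K = np.exp(-((zi - z) / h) ** 2 / 2)   # Gaussian kernel (constants cancel)
    return (yi * K).sum() / K.sum()

small_h = nw_estimate(12.0, h=0.5)   # local average of y near z = 12
huge_h = nw_estimate(12.0, h=1e6)    # h -> infinity: collapses to y_bar
print(small_h, huge_h)
```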
All these statistics can be studied in relation to the random process that has generated
the data: the Data Generating Process (DGP). In economics, behavioral assumptions
often lead to specific DGPs, i.e., models.
Example: the Mincer equation. Suppose each additional year of schooling s raises the
wage y(s) at a constant proportional rate r:
\[
\frac{d \ln y(s)}{ds} = r.
\]
Integrating with respect to s gives
\[
\ln y(s) = rs + b.
\]
In the data, however, such a relation never holds exactly. A reason for that could be
that the theory holds ceteris paribus, everything else equal. Still, even by taking
homogeneous segments (males in a certain occupation living in a given city...) the
relationship between wage and age will never look deterministic. Moreover, such an
approach is subject to the curse of dimensionality: if we condition on too many
variables (sex, occupation, city of residence...) we end up with empty cells.
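The curse of dimensionality can be illustrated by counting cells: with a few conditioning variables the number of cells quickly exceeds the sample size, so most cells are empty (the variable counts below are hypothetical):

```python
import numpy as np

# Hypothetical numbers of categories per conditioning variable,
# e.g. sex, occupation, city, age group.
rng = np.random.default_rng(3)
N = 1000
levels = [2, 10, 50, 20]

# Assign each of N individuals to a random cell in the cross-classification
cells = np.stack([rng.integers(0, k, size=N) for k in levels], axis=1)
n_cells = np.prod(levels)                  # 2*10*50*20 = 20,000 cells
n_occupied = len(np.unique(cells, axis=0)) # at most N = 1,000 occupied

print(n_cells, n_occupied)  # far more cells than occupied ones
```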
Solution:
• We interpret the theoretical relation given by the model as one giving a conditional
average. Let E(Y |X) be the conditional expectation of Y given X. We will give a precise
meaning to this object in Chapter 4. We interpret the Mincer equation as:
\[
E(\ln Y \mid S) = rS + b.
\]
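As a sketch of what estimating this relation looks like, one can simulate data satisfying E(ln Y | S) = rS + b and recover r and b by ordinary least squares (the true values below are made up):

```python
import numpy as np

# Simulate log-wages that are linear in schooling S plus noise;
# the true coefficients r = 0.1 and b = 1.5 are hypothetical.
rng = np.random.default_rng(4)
N, r_true, b_true = 5000, 0.1, 1.5

S = rng.uniform(8, 20, size=N)
log_Y = b_true + r_true * S + rng.normal(0.0, 0.3, size=N)

# OLS slope and intercept of log_Y on S
r_hat = np.cov(S, log_Y, ddof=0)[0, 1] / S.var()
b_hat = log_Y.mean() - r_hat * S.mean()
print(r_hat, b_hat)  # close to (0.1, 1.5)
```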
In conclusion, when confronting the Mincer model with the data, we ask whether the
relation between log-wages and years of schooling is linear in the population. The
information we have to answer that question is contained in one given sample drawn from
that population.
Statistics provides ways of estimating the relation, and testing hypotheses about it
(e.g., is the model rejected by the data? By how much will my years at Chicago increase
my wage?).