Luke Stein

Stanford graduate economics core (2006–7)


Reference and review (last updated April 11, 2012)

Contents

1 Econometrics
 1.1 Probability foundations
 1.2 Random variables
 1.3 Transformations of random variables
 1.4 Properties of random variables
 1.5 Distributions
 1.6 Random samples
 1.7 Convergence of random variables
 1.8 Parametric models
 1.9 Statistics
 1.10 Point estimation
 1.11 Decision theory
 1.12 Hypothesis testing
 1.13 Time-series concepts
 1.14 Ordinary least squares
 1.15 Linear Generalized Method of Moments
 1.16 Linear multiple equation GMM
 1.17 ML for multiple-equation linear models
 1.18 Unit root processes
 1.19 Limited dependent variable models
 1.20 Inequalities
 1.21 Thoughts on MaCurdy questions

2 Microeconomics
 2.1 Choice Theory
 2.2 Producer theory
 2.3 Comparative statics
 2.4 Consumer theory
 2.5 Choice under uncertainty
 2.6 General equilibrium
 2.7 Games
 2.8 Dominance
 2.9 Equilibrium concepts
 2.10 Basic game-theoretic models
 2.11 Repeated games with complete information
 2.12 Collusion and cooperation
 2.13 Principal-agent problems
 2.14 Social choice theory

3 Macroeconomics
 3.1 Models
 3.2 Imperfect competition
 3.3 General concepts
 3.4 Dynamic programming mathematics
 3.5 Continuous time
 3.6 Uncertainty
 3.7 Manuel Amador
 3.8 John Taylor

4 Mathematics
 4.1 General mathematical concepts
 4.2 Set theory
 4.3 Binary relations
 4.4 Functions
 4.5 Correspondences
 4.6 Linear algebra

5 References

Statistical tables

1 Econometrics

1.1 Probability foundations

Basic set theory (C&B 1.1.4) All sets A, B, C satisfy:

1. Commutativity: A ∪ B = B ∪ A and A ∩ B = B ∩ A;
2. Associativity: A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C;
3. Distributive laws: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C);
4. DeMorgan's Laws: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ and (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ.

Disjointness (C&B 1.1.5) Two events are disjoint (a.k.a. mutually exclusive) iff A ∩ B = ∅. Events {Aᵢ} are pairwise disjoint or mutually exclusive iff Aᵢ ∩ Aⱼ = ∅ for all i ≠ j. Two events with nonzero probability cannot be both mutually exclusive and independent (see ex. 1.39).

Partition (C&B 1.1.6) A₁, A₂, ... is a partition of S iff:

1. S = ⋃ᵢ Aᵢ (i.e., covering);
2. A₁, A₂, ... are pairwise disjoint (i.e., non-overlapping).

Sigma algebra (C&B 1.2.1) A collection of subsets of S (i.e., a subset of the power set of S) is a sigma algebra, denoted B, iff:

1. ∅ ∈ B;
2. If A ∈ B, then Aᶜ ∈ B (closed under complementation; along with the first axiom this gives S ∈ B);
3. If {Aᵢ} ⊆ B, then ⋃ᵢ Aᵢ ∈ B (closed under countable unions).

A cdf completely determines the probability distribution of a random variable if its probability function is defined only for events in the Borel field B¹, the smallest sigma algebra containing all the intervals of real numbers of the form (a, b), [a, b), (a, b], [a, b]. If probabilities are defined for a larger class of events, two random variables may have the same cdf but not the same probability for every event (see C&B p. 33).

Probability function (C&B 1.2.4, 8–9, 11) Given a sample space S and associated sigma algebra B, a function P: B → ℝ is a probability function iff it satisfies the Kolmogorov Axioms or Axioms of Probability:

1. P(A) ≥ 0 for all A ∈ B;
2. P(S) = 1;
3. If {Aᵢ} ⊆ B are pairwise disjoint, then P(⋃ᵢ Aᵢ) = Σᵢ P(Aᵢ) (countable additivity for pairwise disjoint sets).

For any probability function P and A, B ∈ B:

1. P(∅) = 0;
2. P(A) ≤ 1;
3. P(Aᶜ) = 1 − P(A);
4. P(B ∩ Aᶜ) = P(B) − P(A ∩ B) ("B but not A" is B minus both A and B);
5. P(A ∪ B) = P(A) + P(B) − P(A ∩ B);
6. If A ⊆ B, then P(A) ≤ P(B).

If {Cᵢ} partitions A, then P(A) = Σᵢ P(A ∩ Cᵢ).

Probability space (Hansen 1-31, 1-28) (Ω, F, P) where:

1. Ω is the universe (e.g., S, the sample space);
2. F is the σ-field (e.g., B¹);
3. P is a probability measure (e.g., the probability measure that governs all random variables).

A random variable X induces a probability measure P_X defined by P_X(B) ≡ P(X ∈ B) = P(F). This gives the probability space (ℝ, B, P_X).

Counting (C&B sec. 1.2.3) The number of possible arrangements of size r from n objects is:

              No replacement     With replacement
  Ordered     n!/(n−r)!          nʳ
  Unordered   C(n, r)            C(n+r−1, r)

where C(n, r) ≡ n!/(r!(n−r)!). (Unordered with replacement is a.k.a. "stars and bars.")

Conditional probability (C&B 1.3.2, ex. 1.38) For A, B ∈ S with P(B) > 0, the conditional probability of A given B is P(A|B) ≡ P(A ∩ B)/P(B).

1. If A and B are disjoint (A ∩ B = ∅), then P(A|B) = P(B|A) = 0.
2. If P(B) = 1, then ∀A, P(A|B) = P(A).
3. If A ⊆ B, then P(B|A) = 1 and P(A|B) = P(A)/P(B).
4. If A and B are mutually exclusive, then P(A|A ∪ B) = P(A)/[P(A) + P(B)].
5. P(A ∩ B ∩ C) = P(A|B ∩ C) · P(B|C) · P(C).

Bayes' Rule (C&B 1.3.5) A formula for "turning around" conditional probabilities: P(A|B) = P(B|A) P(A)/P(B). More generally, if {Aᵢ} partition the sample space and B is any set, then for all i,

  P(Aᵢ|B) = P(B|Aᵢ) P(Aᵢ) / Σⱼ P(B|Aⱼ) P(Aⱼ).

Independence of events (C&B 1.3.7, 9, 4.2.10) A and B are statistically independent iff P(A ∩ B) = P(A) P(B), or identically iff P(A|B) = P(A) (which happens iff P(B|A) = P(B)).

1. If A and B are independent, then the following pairs are also independent: A and Bᶜ, Aᶜ and B, Aᶜ and Bᶜ.
2. Two events with nonzero probability cannot be both mutually exclusive and independent (see ex. 1.39).
3. If X, Y are independent r.v.s, then for any A, B ⊆ ℝ, the events {X ∈ A} and {Y ∈ B} are independent events.

Mutual independence of events (C&B 1.3.12) A collection of events {Aᵢ} are mutually independent iff for any subcollection Aᵢ₁, ..., Aᵢₖ we have P(⋂ⱼ Aᵢⱼ) = Πⱼ P(Aᵢⱼ). Note that pairwise independence does not imply mutual independence.
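A quick numeric check of Bayes' Rule over a three-event partition (all numbers below are made up for illustration):

```python
# Bayes' Rule over a partition {A_1, A_2, A_3}:
# P(A_i|B) = P(B|A_i) P(A_i) / sum_j P(B|A_j) P(A_j).
prior = [0.5, 0.3, 0.2]        # P(A_i); the A_i partition the sample space
likelihood = [0.9, 0.5, 0.1]   # P(B|A_i)

evidence = sum(l * p for l, p in zip(likelihood, prior))   # P(B) = 0.62
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]

print(posterior)   # [0.726..., 0.242..., 0.032...]; sums to 1
```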
1.2 Random variables

Random variable (1.4.1, 1.5.7–8, 10, sec. 3.2) A function X: S → ℝ where S is a sample space.

1. Continuous iff its cdf is a continuous function and discrete iff its cdf is a step function (i.e., if the sample space is countable).
2. X and Y are identically distributed iff ∀A ∈ B¹, P(X ∈ A) = P(Y ∈ A), or identically iff ∀x, F_X(x) = F_Y(x).
3. Note identical distribution says nothing about (in)dependence.

Random vector (C&B 4.1.1) An n-dimensional random vector is a function X: S → ℝⁿ where S is a sample space.

Measurability (Hansen 1-28, 4-11) A r.v. X: (Ω, F) → (ℝ, B) is F-measurable iff ∀B ∈ B, {ω : X(ω) ∈ B} ∈ F (i.e., the preimage of every element of B is an element of F).

If F and G are both σ-fields with G ⊆ F, then X is G-measurable ⟹ X is F-measurable (i.e., if the preimage of every B is in G, it is also in its superset F).

Smallest σ-field (Hansen 4-12) The smallest σ-field that makes a r.v. Z: (Ω, F) → (ℝ, B) measurable is σ(Z) ≡ {G : ∃B ∈ B, G = Z⁻¹(B)} (i.e., the set of preimages of elements of B).

Independence of r.v.s (C&B 4.2.5, 7, p. 154, 4.3.5) X and Y are independent r.v.s (written X ⫫ Y) iff any of the following equivalent conditions hold:

1. ∀A, B, P(X ∈ A, Y ∈ B) = P(X ∈ A) · P(Y ∈ B).
2. F_XY(x, y) ≡ P(X ≤ x, Y ≤ y) = F_X(x) F_Y(y).
3. f(x, y) = f_X(x) f_Y(y) (i.e., the joint pdf/pmf is the product of the marginal pdfs/pmfs).
4. f(y|x) = f_Y(y) (i.e., the conditional pdf/pmf equals the marginal pdf/pmf).
5. ∃g(x), h(y) such that ∀x, y, f(x, y) = g(x)h(y) (i.e., the joint pdf/pmf is separable). Note that functional forms may appear separable, but limits may still depend on the other variable; if the support set of (X, Y) is not a cross product, then X and Y are not independent.

For any functions g(t) and h(t), X ⫫ Y ⟹ g(X) ⫫ h(Y).

Independence of random vectors (C&B 4.6.5) X₁, ..., Xₙ are mutually independent iff for every (x₁, ..., xₙ), the joint pdf/pmf is the product of the marginal pdfs/pmfs; i.e., f(x₁, ..., xₙ) = Πᵢ f_{Xᵢ}(xᵢ).

1. Knowledge about the values of some coordinates gives us no information about the values of the other coordinates.
2. The conditional distribution of any subset of the coordinates, given the values of the rest of the coordinates, is the same as the marginal distribution of the subset.
3. Mutual independence implies pairwise independence, but pairwise independence does not imply mutual independence.

Mean independence (Metrics P.S. 3-4c, Metrics section) X is mean independent of Y (written X ⫫ₘ Y) iff E(X|Y) = E(X).

1. Mean independence is not symmetric (i.e., X ⫫ₘ Y does not imply that Y ⫫ₘ X).
2. Independence implies mean independence (i.e., X ⫫ Y ⟹ X ⫫ₘ Y and Y ⫫ₘ X).
3. X ⫫ₘ Y ⟹ E[X|g(Y)] = E[X] for any function g(·).
4. X ⫫ₘ Y ⟹ Cov(X, g(Y)) = 0 for any function g(·).

Cumulative distribution function (C&B 1.5.1, 3, p. 147) F_X(x) ≡ P(X ≤ x). By the Fundamental Theorem of Calculus, (d/dx)F_X(x) = f_X(x) for a continuous r.v. at continuity points of f_X. A function F is a cdf iff:

1. lim_{x→−∞} F(x) = 0 and lim_{x→+∞} F(x) = 1;
2. F(·) is nondecreasing;
3. F(·) is right-continuous; i.e., ∀x₀, lim_{x↓x₀} F(x) = F(x₀).

A random vector X has joint cdf F_X(x₁, ..., xₙ) ≡ P(X₁ ≤ x₁, ..., Xₙ ≤ xₙ). By the Fundamental Theorem of Calculus, ∂ⁿF_X(x)/(∂x₁ ⋯ ∂xₙ) = f_X(x) for a continuous (in all dimensions) random vector at continuity points of f_X.

Probability mass function (C&B 1.6.1, 5, 4.1.3) For a discrete r.v., f_X(x) ≡ P(X = x). A function f_X is a pmf iff:

1. ∀x, f_X(x) ≥ 0;
2. Σₓ f_X(x) = 1.

f_X gives the probability of any event: P(X ∈ B) = Σₖ 1(xₖ ∈ B) f_X(xₖ).

A discrete random vector X has joint pmf f_X(v) ≡ P(X = v).

Marginal pmf (C&B 4.1.6, p. 178) For a discrete random vector, f_{Xᵢ}(xᵢ) ≡ P(Xᵢ = xᵢ) = Σ f_X(x), where the sum runs over the other coordinates; i.e., hold Xᵢ = xᵢ, and sum f_X over all remaining possible values of X.

We can also take the marginal pmf for multiple i by holding these and summing f_X over all remaining possible values of X.

Conditional pmf (C&B 4.2.1) For (X, Y) a discrete random vector, f(y|x) ≡ P(Y = y|X = x) = f(x, y)/f_X(x), where f(x, y) is the joint pmf and f_X(x) is the marginal pmf.

Probability density function (C&B 1.6.3, 5, 4.1.10) For a continuous r.v., f_X(x) is defined as the function which satisfies F_X(x) = ∫_{−∞}^{x} f_X(t) dt for all x. A function f_X is a pdf iff:

1. ∀x, f_X(x) ≥ 0;
2. ∫_ℝ f_X(x) dx = 1.

f_X gives the probability of any event: P(X ∈ B) = ∫_ℝ 1(x ∈ B) f_X(x) dx.

A continuous (in all dimensions) random vector X has joint pdf f_X(x₁, ..., xₙ) iff ∀A ⊆ ℝⁿ, P(X ∈ A) = ∫⋯∫_A f_X(x₁, ..., xₙ) dx₁ ⋯ dxₙ.

Marginal pdf (C&B p. 145, 178) For a continuous (in all dimensions) random vector,

  f_{Xᵢ}(xᵢ) ≡ ∫⋯∫_{ℝⁿ⁻¹} f_X(x) dx₁ ⋯ dxᵢ₋₁ dxᵢ₊₁ ⋯ dxₙ;

i.e., hold Xᵢ = xᵢ, and integrate f_X over ℝ in all Xⱼ for j ≠ i.

We can also take the marginal pdf for multiple i by holding these and integrating f_X over ℝ in all Xⱼ that aren't being held.

Conditional pdf (C&B 4.2.3, p. 178) For (X, Y) a continuous random vector, f(y|x) ≡ f(x, y)/f_X(x) as long as f_X(x) ≠ 0, where f(x, y) is the joint pdf and f_X(x) is the marginal pdf.

We can also condition for/on multiple coordinates: e.g., for (X₁, X₂, X₃, X₄) a continuous random vector, f(x₃, x₄|x₁, x₂) ≡ f(x₁, x₂, x₃, x₄)/f_{X₁X₂}(x₁, x₂), where f is the joint pdf and f_{X₁X₂} is the marginal pdf in X₁ and X₂.

Borel Paradox (4.9.3) Be careful when we condition on events of probability zero: two events of probability zero may be equivalent, but the probabilities conditional on the two events are different!

Stochastic ordering (C&B ex. 1.49, ex. 3.41–2) cdf F_X is stochastically greater than cdf F_Y iff F_X(t) ≤ F_Y(t) at all t, with strict inequality at some t. This implies P(X > t) ≥ P(Y > t) at all t, with strict inequality at some t.

A family of cdfs {F(x|θ)} is stochastically increasing in θ iff θ₁ > θ₂ ⟹ F(x|θ₁) stochastically greater than F(x|θ₂). A location family is stochastically increasing in its location parameter; if a scale family has sample space [0, ∞), it is stochastically increasing in its scale parameter.

Support set (C&B eq. 2.1.7) The support set (a.k.a. support) of a r.v. X is 𝒳 ≡ {x : f_X(x) > 0}, where f_X is a pmf or pdf (or in general, any nonnegative function).
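A minimal sketch (with a made-up joint pmf) of the marginal and conditional pmf definitions, and of checking independence by comparing the joint pmf to the product of its marginals:

```python
import numpy as np

# Made-up joint pmf of a discrete random vector (X, Y):
joint = np.array([[0.10, 0.20],    # rows index x in {0, 1},
                  [0.30, 0.40]])   # columns index y in {0, 1}

f_X = joint.sum(axis=1)               # marginal pmf: sum over the other coordinate
f_Y = joint.sum(axis=0)
cond_Y_given_x0 = joint[0] / f_X[0]   # f(y|x=0) = f(x, y)/f_X(x)

# Independence check: does the joint pmf factor into marginals everywhere?
print(np.allclose(joint, np.outer(f_X, f_Y)))   # False here: X and Y are dependent
```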
1.3 Transformations of random variables

Transformation ℝ¹ → ℝ¹ (C&B 2.1.3, 5, 8) A discrete r.v. can be transformed into a discrete r.v. A continuous r.v. can be transformed into either a continuous or a discrete r.v. (or mixed!). When Y = g(X) and 𝒴 ≡ g(𝒳) (where 𝒳 is the support of X):

1. If g is monotone increasing on 𝒳, then F_Y(y) = F_X(g⁻¹(y)) for y ∈ 𝒴;
2. If g is monotone decreasing on 𝒳 and X is a continuous r.v., then F_Y(y) = 1 − F_X(g⁻¹(y)) for y ∈ 𝒴.

If g is monotone, f_X is continuous on 𝒳, and g⁻¹ has a continuous derivative on 𝒴, then:

  f_Y(y) = f_X(g⁻¹(y)) |(d/dy) g⁻¹(y)| for y ∈ 𝒴; 0 otherwise.

If {Aᵢ}ᵢ₌₀ᵏ partitions 𝒳, with P(X ∈ A₀) = 0; f_X continuous on each Aᵢ; and {gᵢ}ᵢ₌₁ᵏ satisfying:

1. g(x) = gᵢ(x) for x ∈ Aᵢ,
2. gᵢ monotone on Aᵢ,
3. ∀i, gᵢ(Aᵢ) = 𝒴 (i.e., all Aᵢ have the same image under their respective gᵢ's) [Hansen note 2-15 suggests this need not hold],
4. ∀i, gᵢ⁻¹ has a continuous derivative on 𝒴;

then:

  f_Y(y) = Σᵢ₌₁ᵏ f_X(gᵢ⁻¹(y)) |(d/dy) gᵢ⁻¹(y)| for y ∈ 𝒴; 0 otherwise.

Transformation ℝ² → ℝ² (C&B p. 158, 185) Let U = g₁(X, Y) and V = g₂(X, Y) where:

1. (X, Y) has pdf f_XY and support 𝒜;
2. g₁ and g₂ define a 1-to-1 transformation from 𝒜 to ℬ ≡ {(u, v) : u ∈ g₁(𝒜), v ∈ g₂(𝒜)} (i.e., the support of (U, V));
3. The inverse transform is X = h₁(U, V) and Y = h₂(U, V);

then:

  f_UV(u, v) = f_XY(h₁(u, v), h₂(u, v)) |J| for (u, v) ∈ ℬ; 0 otherwise;

where J is the Jacobian,

  J ≡ det [ ∂x/∂u  ∂x/∂v ; ∂y/∂u  ∂y/∂v ].

If the transformation is not 1-to-1, we can partition 𝒜 into {𝒜ᵢ} such that 1-to-1 transformations exist from each 𝒜ᵢ to ℬ which map (x, y) ↦ (u, v) appropriately. Letting x = h₁ᵢ(u, v) and y = h₂ᵢ(u, v) be the inverses, and Jᵢ the Jacobian, on 𝒜ᵢ:

  f_UV(u, v) = Σᵢ f_XY(h₁ᵢ(u, v), h₂ᵢ(u, v)) |Jᵢ| for (u, v) ∈ ℬ; 0 otherwise.

For generalization to the ℝⁿ → ℝⁿ case for n > 2, see C&B p. 185.

Convolution formulae (C&B 5.2.9, ex. 5.6) X ⫫ Y, both continuous. Then:

1. f_{X+Y}(z) = ∫_ℝ f_X(w) f_Y(z − w) dw.
2. f_{X−Y}(z) = ∫_ℝ f_X(w) f_Y(w − z) dw.
3. f_{XY}(z) = ∫_ℝ (1/|w|) f_X(w) f_Y(z/w) dw.
4. f_{X/Y}(z) = ∫_ℝ |w| f_X(wz) f_Y(w) dw.

Probability integral transformation (C&B 2.1.10, ex. 2.10) If Y = F_X(X) (for X continuous) then Y ~ Unif(0, 1). Can be used to generate random samples from a particular distribution: generate a uniform random variable and apply the inverse of the cdf of the target distribution. If X is discrete, Y is stochastically greater than Unif(0, 1).

1.4 Properties of random variables

Expected value, mean (C&B 2.2.1, 5–6, 4.6.6)

  E g(X) ≡ ∫_ℝ g(x) f_X(x) dx if X is continuous; Σ_{x∈𝒳} g(x) f_X(x) if X is discrete;

provided the integral or sum exists and that E|g(X)| ≠ ∞. For constants a, b, c and functions g₁, g₂ such that E(g₁(X)), E(g₂(X)) exist:

1. E[a g₁(X) + b g₂(X) + c] = a E(g₁(X)) + b E(g₂(X)) + c (i.e., expectation is a linear operator);
2. If ∀x, g₁(x) ≥ 0, then E g₁(X) ≥ 0;
3. If ∀x, g₁(x) ≥ g₂(x), then E g₁(X) ≥ E g₂(X);
4. If ∀x, a ≤ g₁(x) ≤ b, then a ≤ E g₁(X) ≤ b.

The mean is the MSE minimizing predictor for X; i.e., min_b E(X − b)² = E(X − E X)². If X₁, ..., Xₙ are mutually independent, then E[g₁(X₁) ⋯ gₙ(Xₙ)] = E[g₁(X₁)] ⋯ E[gₙ(Xₙ)].

Conditional expectation (C&B p. 150; Hansen 4-14–6; Hayashi 138–9) a.k.a. regression of Y on X. E(Y|X) is a r.v. which is a function of X. For discrete (X, Y), E(g(Y)|x) ≡ Σ_𝒴 g(y) f(y|x). For continuous (X, Y), E(g(Y)|x) ≡ ∫_ℝ g(y) f(y|x) dy. Conditional expected value has all the standard properties of expected value. Also:

1. E[g(X)|X] = g(X) for any function g.
2. E[g(X)h(Y)|X] = g(X) E[h(Y)|X] for any functions g and h.
3. X ⫫ Y ⟹ E(Y|X) = E(Y) (i.e., knowing X gives us no additional information about E Y).
4. E(Y|X) = E(Y) ⟹ Cov(X, Y) = 0.
5. E(Y|X) is the MSE minimizing predictor of Y based on knowledge of X (i.e., min_{g(x)} E[Y − g(X)]² = E[Y − E(Y|X)]²).

Let X be a r.v. that takes values in (ℝ, B), let G ≡ σ(X) (i.e., the smallest sigma field measuring X), and assume E|Y| < ∞. The conditional expected value of Y given X is defined implicitly (and non-uniquely) as satisfying:

1. E|E(Y|X)| < ∞;
2. E(Y|X) is G-measurable (i.e., E(Y|X) cannot rely on more information than X does);
3. ∀G ∈ G, ∫_G E(Y|X) dP(ω) = ∫_G Y dP(ω) (i.e., E[E(Y|X)|X ∈ G] = E[Y|X ∈ G]);

where the notation ∫_B dP_X(x) means ∫_B f_X(x) dx if X is continuous, and means Σ_{x∈B} f_X(x) if X is discrete.

Two-way rule for expectations (C&B p. 58, ex. 2.21) If Y = g(X), then E g(X) = E Y; i.e., ∫_ℝ g(x) f_X(x) dx = ∫_ℝ y f_Y(y) dy.

Law of Iterated Expectations (C&B 4.4.3; Hansen 4-21) E X = E[E(X|Y)], provided the expectations exist. More generally, when L ⊆ M (i.e., L contains less information, M contains more),

  E[X|L] = E[E(X|M)|L] = E[E(X|L)|M].

Median (C&B ex. 2.17–18) m such that P(X ≤ m) ≥ ½ and P(X ≥ m) ≥ ½. If X is continuous, the median minimizes absolute deviation; i.e., min_a E|X − a| = E|X − m|.

Mode (C&B ex. 2.27) f(x) is unimodal with mode equal to a iff a ≤ x ≤ y ⟹ f(a) ≥ f(x) ≥ f(y) and a ≥ x ≥ y ⟹ f(a) ≥ f(x) ≥ f(y).

1. Modes are not necessarily unique.
2. If f is symmetric and unimodal, then the point of symmetry is a mode.

Symmetric distribution (C&B ex. 2.25–26) If f_X is symmetric about a (i.e., ∀ε, f_X(a + ε) = f_X(a − ε)), then:

1. X and 2a − X are identically distributed;
2. If a = 0, then M_X is symmetric about 0;
3. a is the median;
4. If E X exists, then E X = a;
5. For odd k, the kth central moment μₖ is zero (if it exists); if the distribution is symmetric about 0, then all odd moments are zero (if they exist).
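A sketch of the probability integral transformation used in reverse for sampling: applying the inverse cdf of an Exponential(1) target (an arbitrary choice for the example) to uniform draws.

```python
import numpy as np

rng = np.random.default_rng(0)

# If U ~ Unif(0,1), then F^{-1}(U) has cdf F.  For the Exponential(1)
# target, F(x) = 1 - exp(-x), so F^{-1}(u) = -log(1 - u).
u = rng.uniform(size=100_000)
x = -np.log(1.0 - u)

print(x.mean(), x.var())   # both near 1, the Exponential(1) mean and variance
```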
Moment (C&B 2.3.1, 11; Hansen 2-37) For n ∈ ℤ, the nth moment of X is μ′ₙ ≡ E Xⁿ. Also denote μ′₁ = E X as μ. The nth central moment is μₙ ≡ E(X − μ)ⁿ.

1. Two different distributions can have all the same moments, but only if the variables have unbounded support sets.
2. A distribution is uniquely determined by its moments if all moments are defined and limₙ→∞ Σₖ₌₁ⁿ μ′ₖ rᵏ/k! exists for all r in a neighborhood of zero.

Variance (C&B 2.3.2, 4, p. 60, 4.5.6, ex. 4.58) Var X ≡ μ₂ = E(X − E X)² = E X² − (E X)². Often parametrized as σ².

1. For constants a, b, if Var X ≠ ∞, then Var(aX + b) = a² Var X;
2. Assuming variances exist, Var(aX + bY) = a² Var X + b² Var Y + 2ab Cov(X, Y);
3. Var[Y − E(Y|X)] = E[Var(Y|X)].

Multivariate variance (?) Var X ≡ E[XX′] − E[X] E[X]′. Thus:

1. Var(X + Y) = Var(X) + Cov(X, Y) + Cov(X, Y)′ + Var(Y);
2. Var(AX) = A Var(X) A′.

Conditional variance (C&B p. 151, 4.4.7; Greene 81–4) a.k.a. scedastic function. Var(Y|X) ≡ E[(Y − E[Y|X])²|X] = E[Y²|X] − (E[Y|X])².

1. X ⫫ Y ⟹ Var(Y|X) = Var(Y).
2. Conditional variance identity: provided the expectations exist,

  Var(Y) = E[Var(Y|X)] + Var[E(Y|X)],

where the first term is the residual variance and the second the regression variance. Implies that on average, conditioning reduces the variance of the variable subject to conditioning (Var(Y) ≥ E[Var(Y|X)]).

Standard deviation (C&B 2.3.2) σ ≡ √(Var X).

Covariance (C&B 4.5.1, 3, ex. 4.58–9; Greene 77) Cov(X, Y) ≡ E[(X − E X)(Y − E Y)] = E[(X − E X)Y] = E[X(Y − E Y)] = E(XY) − (E X)(E Y). If X, Y, Z all have finite variances, then:

1. Cov(X, Y) = Cov[X, E(Y|X)];
2. Cov[X, Y − E(Y|X)] = 0;
3. Cov(X, Y) = E[Cov(X, Y|Z)] + Cov[E(X|Z), E(Y|Z)];
4. Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z);
5. Cov(aX + bY, cX + dY) = ac Var(X) + bd Var(Y) + (ad + bc) Cov(X, Y).

Multivariate covariance (Hansen 5-28; Hayashi 75–6) Cov(X, Y) ≡ E[(X − E X)(Y − E Y)′] = E(XY′) − (E X)(E Y′). Thus:

1. Cov(AX, BY) = A Cov(X, Y) B′;
2. Cov(X, Y) = Cov(Y, X)′.

Correlation (C&B 4.5.2, 5, 7) Corr(X, Y) ≡ ρ_XY ≡ Cov(X, Y)/(σ_X σ_Y).

1. Corr(a₁X + b₁, a₂Y + b₂) = Corr(X, Y).
2. Correlation is in the range [−1, 1], with ±1 indicating a perfectly linear relationship (+1 for positive slope, −1 for negative slope), by the Cauchy-Schwarz Inequality.
3. X ⫫ Y ⟹ Cov(X, Y) = ρ_XY = 0 (assuming finite moments); note however that zero covariance need not imply independence.

Skewness (C&B ex. 2.28; Greene 66) α₃ ≡ μ₃ · (μ₂)^(−3/2), where μᵢ is the ith central moment. Measures the lack of symmetry in the pdf. Equals 0 for any normal, t, or uniform; 2 for exponential; 2√(2/r) for χ²ᵣ; 2/√a for gamma(a, b).

Kurtosis (C&B ex. 2.28) α₄ ≡ μ₄ · (μ₂)⁻², where μᵢ is the ith central moment. Measures the "peakedness" of the pdf. α₄ = 3 for any normal. (Sometimes normalized by subtracting 3.)

Moment generating function (C&B 2.3.6–7, 11–12, 15, 4.2.12, 4.6.7, 9) M_X(t) ≡ E e^(tX), as long as the expectation exists for t in a neighborhood of 0. If M_X exists, then ∀n ∈ ℤ, n ≥ 0,

  μ′ₙ ≡ E Xⁿ = (dⁿ/dtⁿ) M_X(t) evaluated at t = 0.

1. It is possible for all moments to exist, but not the mgf.
2. If r.v.s have equal mgfs in some neighborhood of 0, then the variables are identically distributed (i.e., an extant mgf characterizes a distribution).
3. If the mgfs of a sequence of r.v.s converge to M_X in some neighborhood of zero, then the cdfs of the sequence converge to F_X at all points where F_X is continuous.
4. For constants a, b, if M_X exists, then M_(aX+b)(t) = e^(bt) M_X(at).
5. For X ⫫ Y, M_(X+Y)(t) = M_X(t) M_Y(t). For X₁, ..., Xₙ mutually independent, M_(ΣXᵢ) = Πᵢ M_(Xᵢ).
6. For X₁, ..., Xₙ mutually independent and Z ≡ (a₁X₁ + b₁) + ⋯ + (aₙXₙ + bₙ), then M_Z(t) = e^(t Σbᵢ) Πᵢ M_(Xᵢ)(aᵢt).

Characteristic function (C&B sec. 2.6.2) φ_X(t) ≡ E e^(itX), where i = √(−1).

1. The cf always exists.
2. A cf completely determines a distribution: if the cfs of a sequence of r.v.s converge to φ_X in some neighborhood of zero, then the cdfs of the sequence converge to F_X at all points where F_X is continuous.
3. For X ~ N(0, 1), φ_X(t) = e^(−t²/2).
4. We can recover probability from a cf: for all a, b such that P(X = a) = P(X = b) = 0,

  P(X ∈ [a, b]) = lim_{T→∞} (1/2π) ∫_{−T}^{T} [(e^(−ita) − e^(−itb))/(it)] φ_X(t) dt.

Other generating functions (C&B sec. 2.6.2) Cumulant generating function ≡ log[M_X(t)], if the mgf exists.

Factorial moment generating function (a.k.a. probability-generating function when X is discrete) ≡ E t^X, if the expectation exists.

1.5 Distributions

Normal distribution (C&B p. 102–4, 2.1.9, 3.6.5, 4.2.14, 4.3.4, 6, 5.3.3; Wikipedia) Normal (a.k.a. Gaussian) is particularly important because it is analytically tractable, has a familiar symmetric bell shape, and the CLT shows that it can approximate many distributions in large samples. If X is normal with mean (and median) μ and variance σ², then X ~ N(μ, σ²) with pdf

  f_X(x) = (1/√(2πσ²)) e^(−(x−μ)²/(2σ²)) = (1/σ) φ((x − μ)/σ).

f_X has maximum at μ and inflection points at μ ± σ. Moments are E X = μ, E X² = μ² + σ², E X³ = μ³ + 3μσ², E X⁴ = μ⁴ + 6μ²σ² + 3σ⁴.

Stein's Lemma: If g(·) is differentiable with E|g′(X)| < ∞, then E[g(X)(X − μ)] = σ² E g′(X).

Z ≡ (X − μ)/σ is distributed N(0, 1) (i.e., standard normal). E[Zᵏ] = 0 if k odd, E[Zᵏ] = 1 · 3 · 5 ⋯ (k − 1) if k even. The cdf is denoted Φ(·); the pdf is

  φ(z) ≡ f_Z(z) = (1/√(2π)) e^(−z²/2).

1. P(|X − μ| ≤ σ) = P(|Z| ≤ 1) ≈ 68.26%;
2. P(|X − μ| ≤ 2σ) = P(|Z| ≤ 2) ≈ 95.44%;
3. P(|X − μ| ≤ 3σ) = P(|Z| ≤ 3) ≈ 99.74%.

Independence and zero-covariance are equivalent for linear functions of normally distributed r.v.s. If normally distributed random vectors are pairwise independent, they are mutually independent.

Given an iid sample Xᵢ ~ N(μ, σ²), the log-likelihood is

  L(x) = −(n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) Σ(xᵢ − μ)².

Many distributions can be generated with manipulations/combinations of normals:

1. Square of a standard normal is χ²₁.
2. If X ~ N(μ, σ²), Y ~ N(γ, τ²), and X ⫫ Y, then X + Y ~ N(μ + γ, σ² + τ²) (i.e., independent normals are additive in mean and variance).
3. The sum and difference of independent normal r.v.s are independent normal as long as the variances are equal.
4. Ratio of independent standard normals is Cauchy (σ = 1, θ = 0); look for the kernel of the exponential distribution when dividing normals.
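A simulation check of the conditional variance identity Var(Y) = E[Var(Y|X)] + Var[E(Y|X)], using a made-up model where both conditional moments are known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

# Model (chosen for the example): X ~ N(0,1) and Y|X ~ N(2X, 3), so
# E(Y|X) = 2X and Var(Y|X) = 3.  The identity gives Var(Y) = 3 + 4 Var(X) = 7.
n = 1_000_000
x = rng.normal(size=n)
y = 2 * x + rng.normal(scale=np.sqrt(3.0), size=n)

print(y.var())               # ~ 7 (total variance)
print(3.0 + np.var(2 * x))   # E[Var(Y|X)] + Var[E(Y|X)], also ~ 7
```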
Bivariate normal distribution (C&B 4.5.10, ex. 4.45) Parameters μ_X, μ_Y ∈ ℝ; σ_X, σ_Y > 0; ρ ∈ [−1, 1]; and pdf (on ℝ²):

  f(x, y) = (2π σ_X σ_Y √(1 − ρ²))⁻¹ exp{ −(1/(2(1 − ρ²))) [ ((x − μ_X)/σ_X)² − 2ρ((x − μ_X)/σ_X)((y − μ_Y)/σ_Y) + ((y − μ_Y)/σ_Y)² ] }.

1. The marginal distributions of X and Y are N(μ_X, σ²_X) and N(μ_Y, σ²_Y) (note marginal normality does not imply joint normality).
2. The correlation between X and Y is ρ.
3. For any constants a and b, the distribution of aX + bY is N(aμ_X + bμ_Y, a²σ²_X + b²σ²_Y + 2abρσ_Xσ_Y).
4. All conditional distributions of Y given X = x are normal: Y|X = x ~ N(μ_Y + ρ(σ_Y/σ_X)(x − μ_X), σ²_Y(1 − ρ²)), where ρσ_Y/σ_X = Cov(X, Y)/σ²_X.

Multivariate normal distribution (Hansen 5-17–35; MaCurdy p. 6; Greene 94) The p-dimensional normal, N_p(μ, Σ), has pdf

  f(x) = (2π)^(−p/2) |Σ|^(−1/2) exp[ −½ (x − μ)′ Σ⁻¹ (x − μ) ],

where μ = E[X] and Σᵢⱼ = Cov(Xᵢ, Xⱼ).

A linear transformation of a normal is normal: if X ~ N_p(μ, Σ), then for any A ∈ ℝ^(q×p) with full row rank (which implies q ≤ p), and any b ∈ ℝ^q, we have AX + b ~ N_q(Aμ + b, AΣA′). In particular, Σ^(−1/2)(X − μ) ~ N(0, I), where Σ^(−1/2) = (Σ^(1/2))⁻¹ = HΛ^(−1/2)H′.

The following transformations of X ~ N_p(μ, Σ) are independent iff AΣB′ = Cov(AX, BX) = 0:

1. AX ~ N(Aμ, AΣA′) and BX ~ N(Bμ, BΣB′);
2. AX ~ N(Aμ, AΣA′) and X′BX ~ χ²_rank(BΣ) (where BΣ is an idempotent matrix);
3. X′AX ~ χ²_rank(AΣ) and X′BX ~ χ²_rank(BΣ) (where AΣ and BΣ are idempotent matrices).

Chi squared distribution (C&B 5.3.2; Hansen 5-29–32; MaCurdy p. 6; Greene 92) χ²ₙ (chi squared with n degrees of freedom) has mean n and variance 2n. Can be generated from normals:

1. If Z ~ N(0, 1), then Z² ~ χ²₁ (i.e., the square of a standard normal is a chi squared with 1 degree of freedom);
2. If X₁, ..., Xₙ are independent with Xᵢ ~ χ²_pᵢ, then ΣXᵢ ~ χ²_Σpᵢ (i.e., independent chi squared variables add to a chi squared, and the degrees of freedom add);
3. If X ~ Nₙ(μ, Σ), then (X − μ)′Σ⁻¹(X − μ) ~ χ²ₙ;
4. If X ~ Nₙ(0, I) and P (n × n) is an idempotent matrix, then X′PX ~ χ²_rank(P) = χ²_tr(P);
5. If X ~ Nₙ(0, I), then the sum of squared deviations from the sample mean X′MX ~ χ²_(n−1), where M is the demeaning matrix;
6. If X ~ Nₙ(0, Σ) and B (n × n) is such that BΣ is an idempotent matrix, then X′BX ~ χ²_rank(BΣ) = χ²_tr(BΣ).

Student's t distribution (C&B 5.3.4; Greene 69–70) If X₁, ..., Xₙ are iid N(μ, σ²), then √n(X̄ − μ)/σ ~ N(0, 1). However, we will generally not know σ. Using the sample variance rather than the true variance gives √n(X̄ − μ)/s ~ t_(n−1).

Generally, N(0, 1)/√(χ²_(n−1)/(n − 1)) ~ t_(n−1), where the numerator and denominator are independent. If a t distribution has p degrees of freedom, there are only p − 1 defined moments. t has thicker tails than normal.

t₁ is the Cauchy distribution (the ratio of two independent standard normals). t_∞ is standard normal.

Snedecor's F distribution (C&B 5.3.6–8) (χ²ₚ/p)/(χ²_q/q) ~ F_(p,q), where the two chi squared r.v.s are independent. The F distribution is also related by transformation with several other distributions:

1. 1/F_(p,q) ~ F_(q,p) (i.e., the reciprocal of an F r.v. is another F with the degrees of freedom switched);
2. (t_q)² ~ F_(1,q);
3. (p/q)F_(p,q)/(1 + (p/q)F_(p,q)) ~ beta(p/2, q/2).

Lognormal distribution (C&B p. 625) If X ~ N(μ, σ²), then Y ≡ e^X is lognormally distributed. (Note: a lognormal is not the log of a normally distributed r.v.)

  E Y = e^(μ + σ²/2);
  Var Y = e^(2(μ+σ²)) − e^(2μ+σ²).

Exponential families (C&B 3.4; Mahajan 1-5–6, 11) Any family of pdfs or pmfs that can be expressed as

  f(x|θ) = h(x) c(θ) exp( Σᵢ₌₁ᵏ wᵢ(θ) tᵢ(x) ),

where h(x) ≥ 0, {tᵢ(x)} are real-valued functions, c(θ) ≥ 0, {wᵢ(θ)} are real-valued functions, and the support does not depend on θ.

Includes normal, gamma, beta, χ², binomial, Poisson, and negative binomial. C&B Theorem 3.4.2 gives results that may help calculate mean and variance using differentiation, rather than summation/integration.

Can be re-parametrized as:

  f(x|η) = h(x) c*(η) exp( Σᵢ₌₁ᵏ ηᵢ tᵢ(x) ),

over "natural parameter space" H ≡ {η = (η₁, ..., ηₖ) : ∫_ℝ h(x) exp(Σᵢ₌₁ᵏ ηᵢtᵢ(x)) dx < ∞}, where for all η ∈ H, we have c*(η) ≡ [∫_ℝ h(x) exp(Σᵢ₌₁ᵏ ηᵢtᵢ(x)) dx]⁻¹ to ensure the pdf integrates to 1.

The joint distribution of an iid sample from an exponential family will also be an exponential family (closure under random sampling).

Location and Scale families (C&B 3.5.1–7, p. 121) If f(x) is a pdf and μ, σ are constants with σ > 0, then g(x) is also a pdf:

  g(x) ≡ (1/σ) f((x − μ)/σ).

X ~ g(x) iff ∃Z ~ f such that X = σZ + μ. Assume X and Z exist; P(X ≤ x) = P(Z ≤ (x − μ)/σ), and if E Z and Var Z exist, then E X = σ E Z + μ and Var(X) = σ² Var Z.

1. The family of pdfs f(x − μ) indexed by μ is the "location family" with standard pdf f(x) and location parameter μ.
2. The family of pdfs (1/σ)f(x/σ) indexed by σ > 0 is the "scale family" with standard pdf f(x) and scale parameter σ (e.g., exponential).
3. The family of pdfs (1/σ)f((x − μ)/σ) indexed by μ and σ > 0 is the "location-scale family" with standard pdf f(x), location parameter μ, and scale parameter σ (e.g., uniform, normal, double exponential, Cauchy).

Stable distribution (Hansen 5-15) Let X₁, X₂ be iid F, and define Y = aX₁ + bX₂ + c. Then F is a stable distribution iff ∀a, b, c, ∃d, e such that dY + e ~ F.
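A simulation sketch of the linear-transformation property of the multivariate normal, AX + b ~ N(Aμ + b, AΣA′), with made-up μ, Σ, A, b:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up parameters for the illustration:
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
A = np.array([[1.0, 1.0],
              [3.0, -1.0]])
b = np.array([0.5, 0.0])

X = rng.multivariate_normal(mu, Sigma, size=500_000)
Y = X @ A.T + b            # each row is A x + b

print(Y.mean(axis=0))      # ~ A mu + b
print(np.cov(Y.T))         # ~ A Sigma A'
print(A @ Sigma @ A.T)     # theoretical covariance for comparison
```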
1.6 Random samples

Random sample, iid (5.1.1) R.v.s X₁, ..., Xₙ are a random sample of size n from population f(x) (a.k.a. n iid r.v.s with pdf/pmf f(x)) if they are mutually independent, each with marginal pdf/pmf f(x). By independence, f(x₁, ..., xₙ) = Πᵢ f(xᵢ).

Statistic (C&B 5.2.1; Mahajan 1-7) A r.v. Y ≡ T(X₁, ..., Xₙ), where T is a real or vector-valued function T(x₁, ..., xₙ) whose domain includes the support of (X₁, ..., Xₙ). The distribution of Y is called its sampling distribution.

Alternately, any measurable function of the data (as distinct from a parameter, which is a function of the distribution).

Unbiased estimator (Hansen 5-14; C&B 7.3.2) A statistic θ̂ is unbiased for θ iff E_θ(θ̂) = θ for all θ. That is, if Bias[θ̂] ≡ E_θ θ̂ − θ = 0 for all θ.

Sample mean (C&B 5.2.2, 4, 6–8, 10, p. 216, 5.5.2) X̄ ≡ (1/n) Σᵢ Xᵢ (i.e., the arithmetic average of the values in a random sample). For any real numbers, the arithmetic average minimizes SSR (i.e., x̄ ∈ argmin_a Σᵢ(xᵢ − a)²).

1. If E Xᵢ = μ < ∞, then E X̄ = μ (i.e., X̄ is an unbiased estimator of μ).
2. If Var Xᵢ = σ² < ∞, then Var X̄ = σ²/n.
3. If Xᵢ have mgf M_X(t), then M_X̄(t) = [M_X(t/n)]ⁿ.
4. (Law of Large Numbers) If {Xᵢ} iid, E Xᵢ = μ < ∞ and Var Xᵢ = σ² < ∞, then the series X̄ₙ →as μ (this also implies convergence in probability, the "Weak" LLN).

The distribution of the Xᵢ, together with n, characterize the distribution of X̄:

1. If Xᵢ ~ N(μ, σ²), then X̄ ~ N(μ, σ²/n).
2. If Xᵢ ~ gamma(α, β), then X̄ ~ gamma(nα, β/n).
3. If Xᵢ ~ Cauchy(θ, σ), then X̄ ~ Cauchy(θ, σ).
4. If Xᵢ ~ (1/σ)f((x − μ)/σ) are members of a location-scale family, then X̄ = σZ̄ + μ, where {Zᵢ}ᵢ₌₁ⁿ is a random sample with Zᵢ ~ f(z).

Sample variance (C&B 5.2.3–4, 6; Greene 102–4)

  s² ≡ (1/(n−1)) Σᵢ (Xᵢ − X̄)² = (1/(n−1)) [Σᵢ Xᵢ² − nX̄²].

Sample standard deviation is s ≡ √(s²). If Var(Xᵢ) = σ² < ∞, then E s² = σ² (i.e., s² is an unbiased estimator of σ²). s²_aX = a²s²_X.

For any real numbers {xᵢ}ᵢ₌₁ⁿ, we have Σᵢ(xᵢ − x̄)² = Σᵢxᵢ² − nx̄².

Sample covariance (Greene 102–4)

  s_XY ≡ (1/(n−1)) Σᵢ (Xᵢ − X̄)(Yᵢ − Ȳ) = (1/(n−1)) [Σᵢ XᵢYᵢ − nX̄Ȳ].

If Cov(Xᵢ, Yᵢ) = σ_XY < ∞, then E s_XY = σ_XY (i.e., s_XY is an unbiased estimator of σ_XY). s_(aX,bY) = ab · s_XY.

For any real numbers {xᵢ, yᵢ}ᵢ₌₁ⁿ, we have Σᵢ(xᵢ − x̄)(yᵢ − ȳ) = Σᵢxᵢyᵢ − nx̄ȳ.

Sample correlation (Greene 102–4) r_XY ≡ s_XY/(s_X s_Y). r_(aX,bY) = (ab/|ab|) r_XY.

Order statistic (C&B 5.4.1–4) The order statistics of a sample X₁, ..., Xₙ are the ordered values from X₍₁₎ (the sample minimum) to X₍ₙ₎ (the sample maximum). Thus the sample median is

  M ≡ X₍₍ₙ₊₁₎/₂₎ if n is odd; ½(X₍ₙ/₂₎ + X₍ₙ/₂₊₁₎) if n is even.

If {Xᵢ}ᵢ₌₁ⁿ are iid continuous r.v.s, then

  F_X₍ⱼ₎(x) = Σₖ₌ⱼⁿ C(n, k) [F_X(x)]ᵏ [1 − F_X(x)]ⁿ⁻ᵏ;
  f_X₍ⱼ₎(x) = (n! / ((j−1)!(n−j)!)) f_X(x) [F_X(x)]ʲ⁻¹ [1 − F_X(x)]ⁿ⁻ʲ.

See C&B 5.4.3 for discrete r.v.s.

Samples from the normal distribution (C&B 5.3.1, 6) {Xᵢ}ᵢ₌₁ⁿ iid N(μ, σ²) gives:

1. X̄ ⫫ s² (can also be shown with Basu's Theorem);
2. X̄ ~ N(μ, σ²/n);
3. Var(s²) = 2σ⁴/(n − 1);
4. (n − 1)s²/σ² ~ χ²_(n−1).

If {Xᵢ}ᵢ₌₁ⁿ iid N(μ_X, σ²_X) and {Yᵢ}ᵢ₌₁ᵐ iid N(μ_Y, σ²_Y), then

  (s²_X/σ²_X) / (s²_Y/σ²_Y) ~ F_(n−1,m−1).

1.7 Convergence of random variables

[Diagram: implications among modes of convergence. Xₙ →Ls X ⟹ Xₙ →Lr X for s ≥ r; Xₙ →Lr X ⟹ Xₙ →p X; Xₙ →as X ⟹ Xₙ →p X; Xₙ →p X ⟹ Xₙ →d X.]

See more LLNs and CLTs in "Time-series concepts."

Convergence in probability (C&B 5.5.1–4, 12; Hansen 5-41; Hayashi 89; D&M 103; MaCurdy p. 9) {Xᵢ}ᵢ₌₁^∞ converges in probability to X iff ∀ε > 0, limₙ→∞ P(|Xₙ − X| ≥ ε) = 0, or equivalently limₙ→∞ P(|Xₙ − X| < ε) = 1. Written as Xₙ →p X or Xₙ − X = o_p(1) or plimₙ→∞ Xₙ = X.

1. Convergence in probability is implied by almost sure convergence or convergence in Lᵖ (for p > 0).
2. Convergence in probability implies convergence in distribution (but not conversely).
3. (Weak Law of Large Numbers) If {Xᵢ} iid with E Xᵢ = μ < ∞ and Var Xᵢ = σ² < ∞, then the series X̄ₙ →p μ (a stronger result gives convergence almost surely).
4. (Continuous Mapping Theorem) If Xₙ →p X and h is a continuous function, then h(Xₙ) →p h(X).
5. If E Xₙ → μ and Var Xₙ → 0, then Xₙ →p μ.

Uniform convergence in probability (Hayashi 456–7; MaCurdy p. 14; D&M 137) {Qᵢ(θ)}ᵢ₌₁^∞ converges in probability to Q₀(θ) uniformly iff sup_{θ∈Θ} ‖Qₙ(θ) − Q₀(θ)‖ →p 0.

That is, ∀ε > 0, limₙ→∞ P(sup_{θ∈Θ} ‖Qₙ(θ) − Q₀(θ)‖ ≥ ε) = 0, or equivalently limₙ→∞ P(sup_{θ∈Θ} ‖Qₙ(θ) − Q₀(θ)‖ < ε) = 1.

This is stronger than pointwise convergence. Uniform convergence in probability is the regularity condition required to pass plims through functions or to reverse the order of differentiation and integration.

Little o error notation (D&M 108–13; Hansen 5-42; MathWorld) Roughly speaking, a function is o(z) iff it is of lower asymptotic order than z.

f(n) = o(g(n)) iff limₙ→∞ f(n)/g(n) = 0. If {f(n)} is a sequence of random variables, then f(n) = o_p(g(n)) iff plimₙ→∞ f(n)/g(n) = 0.

We write Xₙ − X = o_p(n^(−γ)) iff n^γ(Xₙ − X) →p 0.
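A simulation check of the order-statistic cdf formula above, for the jth order statistic of n iid Unif(0, 1) draws (so F(x) = x); the values of n, j, and x are arbitrary choices:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)

# F_{X_(j)}(x) = sum_{k=j}^n C(n,k) F(x)^k (1 - F(x))^(n-k), with F(x) = x here.
n, j, x = 5, 2, 0.3
samples = np.sort(rng.uniform(size=(200_000, n)), axis=1)
empirical = (samples[:, j - 1] <= x).mean()          # X_(j) is column j-1
theoretical = sum(comb(n, k) * x**k * (1 - x)**(n - k) for k in range(j, n + 1))

print(empirical, theoretical)   # both ~ 0.472
```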
Big O error notation (D&M 108–13; MathWorld) Roughly speaking, a function is O(z) iff it is of the same asymptotic order as z.

f(n) = O(g(n)) iff |f(n)/g(n)| < K for all n > N, for some positive integer N and some constant K > 0. If {f(n)} is a sequence of random variables, then f(n) = O_p(g(n)) iff f(n)/g(n) is bounded in probability; i.e., ∀ε > 0, ∃K, N such that P(|f(n)/g(n)| > K) < ε for all n > N.

Order symbols (D&M 111–2)

  O(nᵖ) ± O(n^q) = O(n^max{p,q});
  o(nᵖ) ± o(n^q) = o(n^max{p,q});
  O(nᵖ) ± o(n^q) = O(nᵖ) if p ≥ q; o(n^q) if p < q;
  O(nᵖ) · O(n^q) = O(n^(p+q));
  o(nᵖ) · o(n^q) = o(n^(p+q));
  O(nᵖ) · o(n^q) = o(n^(p+q)).

These identities cover sums, as long as the number of terms summed is independent of n.

Asymptotic equivalence (D&M 110–1) f(n) =ᵃ g(n) (asymptotic equality) iff limₙ→∞ f(n)/g(n) = 1.

Convergence in Lᵖ (Hansen 5-43–5; Hayashi 90; MaCurdy p. 11) {Xᵢ}ᵢ₌₁^∞ converges in Lᵖ to X iff limₙ→∞ E(|Xₙ − X|ᵖ) = 0. Note this requires the existence of a pth moment. Written Xₙ →Lp X. Convergence in L² is a.k.a. convergence in mean square/quadratic mean.

1. Convergence in Lᵖ is implied by convergence in L^q for q ≥ p.
2. Convergence in Lᵖ implies convergence in Lʲ for j ≤ p.
3. Convergence in Lᵖ (for p > 0) implies convergence in probability and in distribution (but not conversely).
4. (Continuous Mapping Theorem extension) If Xₙ →Lp X and h is a continuous function, then h(Xₙ) →Lp h(X).

Convergence in L² to a constant requires limₙ→∞ E[(Xₙ − X)′(Xₙ − X)] = limₙ→∞ [Bias² + Var] = 0. Thus it is necessary and sufficient that squared bias and variance go to zero.

Almost sure convergence (C&B 5.5.6, 9; Hayashi 89; D&M 106, 131; MaCurdy p. 10) {Xᵢ}ᵢ₌₁^∞ converges almost surely to X iff ∀ε > 0, P(limₙ→∞ |Xₙ − X| < ε) = 1. Written Xₙ →as X. Also known as strong convergence, or convergence almost everywhere.

1. Almost sure convergence implies convergence in probability and in distribution (but not conversely).
2. (Strong Law of Large Numbers) If {Xᵢ} iid with E Xᵢ = μ < ∞ and Var Xᵢ = σ² < ∞, then the series X̄ₙ →as μ.
3. (Strong Law of Large Numbers, niid) If {Xᵢ} niid with E Xᵢ = 0 and Σᵢ Var(Xᵢ)/i² < ∞, then the series X̄ₙ →as 0.
4. (Continuous Mapping Theorem extension) If Xₙ →as X and h is a continuous function, then h(Xₙ) →as h(X).

Convergence in distribution (C&B 5.5.10–13; Hayashi 90–1; Greene 119–20; D&M 107) {Xᵢ}ᵢ₌₁^∞ converges in distribution to X iff limₙ→∞ F_Xₙ(x) = F_X(x) at all points where F_X is continuous. Written as Xₙ →d X or Xₙ − X = O_p(1), or "X is the limiting distribution of Xₙ."

1. Convergence in distribution is implied by almost sure convergence, convergence in Lᵖ (for p > 0), or convergence in probability.
2. Convergence in distribution implies convergence in probability if the series converges to a constant.

Central Limit Theorem for iid samples (C&B 5.5.14–15; Hansen 5-60–65; Hayashi 96) Lindeberg-Levy CLT: √n(X̄ₙ − μ)/σ →d N(0, 1), as long as the iid Xᵢ's each have finite mean and finite, nonzero variance. Note a weaker form requires mgfs of the Xᵢ to exist in a neighborhood of zero.

In the multivariate case, iid Xᵢ ~ (μ, Σ) satisfy √n(X̄ₙ − μ) →d N(0, Σ). Proved using the Cramer-Wold Device.

We also have a CLT for niid samples (Lyapounov's Theorem), an ergodic stationary mds CLT, and a CLT for MA(∞) processes.

Central Limit Theorem for niid samples (Hansen 5-62; D&M 126; MaCurdy p. 21–2) [Lyapounov's Theorem] If Xᵢ ~ niid(μ, σᵢ²) and a (2 + δ)th moment exists for each Xᵢ, then

  √n(X̄ₙ − μ) →d N(0, σ̄²),

where σ̄² ≡ limₙ→∞ (1/n) Σᵢ σᵢ², as long as the Xᵢ's each have finite mean and finite, nonzero variance. Note a weaker form requires mgfs of the Xᵢ to exist in a neighborhood of zero.

Implies that if εᵢ ~ niid(0, σ²) (with extant (2 + δ)th moment), and {zᵢ} is a series of (non-stochastic) constants, then n^(−1/2) Z′ε →d N(0, σ²S_zz), where S_zz ≡ limₙ→∞ (1/n) Σᵢ zᵢ² = limₙ→∞ (1/n) Z′Z.

Slutsky's Theorem (C&B 5.5.17; Hayashi 92–3) If Xₙ →d X and Yₙ →p a, where a is a constant, then:

1. YₙXₙ →d aX;
2. Xₙ + Yₙ →d X + a.

Delta Method (C&B 5.5.24, 26, 28; Wikipedia; Hayashi 93–4) Let {Xᵢ}ᵢ₌₁^∞ be a sequence of r.v.s satisfying √n(Xₙ − θ) →d N(0, σ²). For a given function g and specific θ, suppose g′(θ) exists and g′(θ) ≠ 0. Then:

  √n[g(Xₙ) − g(θ)] →d N(0, σ²[g′(θ)]²).

If g′(θ) = 0, but g″(θ) exists and g″(θ) ≠ 0, we can apply the second-order Delta Method and get n[g(Xₙ) − g(θ)] →d ½σ²g″(θ)χ²₁.

Alternate formulation: If B is an estimator for β, then the variance of a function h(B) ∈ ℝ is Var(h(B)) ≈ ∇h(β)′ Var(B) ∇h(β). If h(B) is vector-valued, the variance is H Var(B) H′, where H = ∂h/∂β′ (i.e., Hᵢⱼ ≡ ∂hᵢ(β)/∂βⱼ).

1.8 Parametric models

Parametric model (Mahajan 1-1–2) Describes an (unknown) probability distribution P that is assumed to be a member of a family of distributions 𝒫. We describe 𝒫 with a parametrization: a map from a (simpler) space Θ to 𝒫 such that 𝒫 = {P_θ : θ ∈ Θ}. If Θ is a "nice" subset of Euclidean space, and the mapping P_θ is "smooth," then 𝒫 is called a parametric model. It is a regular parametric model if either all P_θ are continuous, or all are discrete.

Parameter (Mahajan 1-2) A mapping from the family of distributions 𝒫 to another space (typically a subset of Euclidean space). Can be explicitly or implicitly defined.

A function of the distribution (as distinct from a statistic, which is a function of the data).

Identification (Mahajan 1-3–4; C&B 11.2.2) A parameter is (point) identified if it is uniquely determined for every distribution; i.e., if the mapping is one-to-one. When the parameter is implicitly defined as the solution to an optimization problem (e.g., θ(P) = argmax_b Q₀(b, P)), identification corresponds to existence of a unique solution to the optimization.

Two elements θ₁, θ₂ ∈ Θ are observationally equivalent iff they imply P_θ₁ = P_θ₂. Identification of θ means there is no other element of Θ observationally equivalent to θ; i.e., P_θ₁ = P_θ₂ ⟹ θ₁ = θ₂.

Identification in exponential families (Mahajan 1-5–6, Metrics P.S. 5-1) For iid sampling from an exponential family where ηᵢ(θ) = θᵢ, if the k × k (Fisher Information) matrix

  I(θ*) ≡ E[ (∂ log p(x, θ*)/∂θ) (∂ log p(x, θ*)/∂θ)′ ]

is nonsingular for every θ* ∈ Θ, then θ is identified.

Conditional homoscedasticity (Mahajan 2-17) The assumption that Var(Y|Z) = σ²; i.e., the variance of Y does not depend on Z.
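A simulation sketch of the (first-order) Delta Method with g(x) = x², where the limiting variance should be σ²[g′(μ)]² = σ²(2μ)²; all parameter values below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# X_i iid with mean mu = 2 and variance sigma^2 = 1, g(x) = x^2, so
# sqrt(n)(g(Xbar) - g(mu)) ->d N(0, sigma^2 (2 mu)^2) = N(0, 16).
mu, sigma, n, reps = 2.0, 1.0, 500, 20_000
xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
stat = np.sqrt(n) * (xbar**2 - mu**2)

print(stat.var())    # ~ 16
print(stat.mean())   # ~ 0
```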
Regression model (Mahajan 1-6–7, 3-9; Hayashi 465–6; Metrics P.S. 7-7) {Yᵢ, Xᵢ}ᵢ₌₁ⁿ, where Yᵢ ∈ ℝ is the "dependent variable" (a.k.a. response) and Xᵢ ∈ ℝᵈ are the "covariates" or "explanatory variables." Possible specifications include:

1. E(Yᵢ|Xᵢ) = g(Xᵢ) for some unknown function g (nonparametric regression);
2. E(Yᵢ|Xᵢ) = g(Xᵢ′θ₀) for some unknown θ₀ ∈ ℝᵈ and unknown function g (single-index model; semi-parametric);
3. E(Yᵢ|Xᵢ) = Xᵢ′θ₀ for some unknown θ₀ ∈ ℝᵈ;
4. (Yᵢ|Xᵢ) ~ N(Xᵢ′θ₀, σ²) for some unknown θ₀ ∈ ℝᵈ and σ² ∈ ℝ₊ (Gaussian regression model; θ₀ is identified and the conditional MLE θ̂ = (X′X)⁻¹X′Y is consistent if E(XᵢXᵢ′) is nonsingular; the conditional MLE of the variance is σ̂² = (1/n)(Y − Xθ̂)′(Y − Xθ̂)).

Linear regression model with non-stochastic covariates (Mahajan 1-10–1, 18–9; Metrics P.S. 5-3) A two-dimensional Gaussian regression model with Xᵢ = (1, xᵢ)′ known. The parameter (θ₀, σ²) is identified as long as the xᵢ are not all identical, with complete sufficient statistic (ΣYᵢ, ΣYᵢ², ΣxᵢYᵢ). MLE computed in problem set.

Seemingly Unrelated Regressions (Mahajan 2-11, 21) {Yᵢ, Xᵢ}ᵢ₌₁ⁿ where Yᵢ ∈ ℝᵐ and Xᵢ = (X₁ᵢ′, ..., Xₘᵢ′)′ ∈ ℝ^(m²). We assume the Yᵢ are (multivariate) normally distributed with means E[Yₛᵢ] = xₛᵢ′βₛ where βₛ ∈ ℝᵐ, and the parameter of interest is β = (β₁′, ..., βₘ′)′ ∈ ℝ^(m²).

Probit model (Mahajan 2-12–3, 3-9–10; Hayashi 466) iid {Wᵢ}ᵢ₌₁ⁿ = {Yᵢ, Zᵢ}ᵢ₌₁ⁿ where Yᵢ ∈ {0, 1} and Yᵢ has conditional distribution P(Yᵢ = 1|Zᵢ) = Φ(θ₀′Zᵢ), where Φ(·) is the cdf of the standard normal. θ₀ is identified and the MLE is consistent if E(ZᵢZᵢ′) is nonsingular.

An alternate motivation is the threshold crossing model: Yᵢ* = θ₀′Zᵢ − εᵢ where εᵢ ⫫ Zᵢ and standard normal, and Yᵢ = 1{Yᵢ* > 0}.

Nonlinear least squares (Mahajan 2-16–7, 3-6–7, 17–9) iid {Yᵢ, Zᵢ}ᵢ₌₁ⁿ where Yᵢ has conditional expectation E(Y|Z) = ψ(Z, θ). The parameter θ can also be defined implicitly as θ = argmin_b E[Y − ψ(Z, b)]². The identification condition is that for all b ≠ θ, we have P(ψ(Z, b) ≠ ψ(Z, θ)) > 0.

See Mahajan 3-17–9 for asymptotic properties, including the heteroscedasticity robust asymptotic variance matrix.

Linear instrumental variables (Mahajan 2-22–3, 3-11–2) Yᵢ = Xᵢ′θ + εᵢ with moment conditions E[(Yᵢ − Xᵢ′θ)Zᵢ] = 0 for random vector Zᵢ. The Zᵢ are instruments for the endogenous regressors Xᵢ (endogenous because E[εᵢXᵢ] ≠ 0). The identification condition for θ is that E(ZᵢXᵢ′) has full column rank, which requires that the dimension of Zᵢ be at least the dimension of Xᵢ.

1.9 Statistics

Sufficient statistic (Mahajan 1-8–10; C&B 6.2.1) T(X) is sufficient for {P_θ : θ ∈ Θ} (or more compactly for θ) iff the conditional distribution of X given T(X) does not depend on θ; i.e., p(x|T(X)) = p(x|T(x), θ). Once the value of a sufficient statistic is known, the sample does not carry any further information about θ. Useful for:

1. Decision theory: base decision rules on sufficient statistics (for any decision rule, we can always come up with a rule based only on a sufficient statistic that has the same risk);
2. Dealing with nuisance parameters in hypothesis testing: find sufficient statistics for the nuisance parameters and condition decision rules on them;
3. Unbiased estimation: look for unbiased estimators that are functions of sufficient statistics.

Any one-to-one function of a sufficient statistic is also sufficient. Outside exponential families, it is rare to have a sufficient statistic of smaller dimension than the data.

Factorization theorem (Mahajan 1-9; C&B 6.2.6) In a regular parametric model {P_θ : θ ∈ Θ}, a statistic T(X) (with range 𝒯) is sufficient for θ iff there exists a function g: 𝒯 × Θ → ℝ and a function h such that f(x, θ) = g(T(x), θ) h(x) for all x and θ.

Minimal sufficient statistic (Mahajan 1-12, 19; C&B 6.2.11) T(X) is minimal sufficient if it is sufficient, and for any other sufficient statistic S(X) we can find a function r such that T(X) = r(S(X)). This means that a minimal sufficient statistic induces the coarsest possible partition on the data; i.e., it has achieved the maximal amount of data reduction possible while still retaining all information about the parameter.

Any one-to-one function of a minimal sufficient statistic is minimal sufficient. If a minimal sufficient statistic exists, then any complete sufficient statistic is also a minimal sufficient statistic.

Likelihood function (Mahajan 1-13–4) L(x, θ) ≡ p(x, θ). This is the same as the pdf/pmf, but considered as a function of θ instead of x.

The likelihood ratio Λ(x, θ) ≡ L(x, θ)/L(x, θ₀), where θ₀ is fixed and known, with the support of P_θ a subset of the support of P_θ₀ for all θ. The likelihood ratio is minimal sufficient for θ.

Ancillary statistic (Mahajan 1-14; C&B 6.2.16) S(X) is ancillary for θ iff the distribution of S(X) does not depend on θ.

It is first-order ancillary iff E S(X) does not depend on θ.

Complete statistic (Mahajan 1-16; C&B 6.2.21, 28) T: 𝒳 → 𝒯 is complete iff for every measurable real function g: 𝒯 → ℝ, ∀θ, E_θ[g(T)] = 0 implies that g(T) = 0 almost everywhere. Equivalently, T is complete if no non-constant function of T is first-order ancillary.

If a minimal sufficient statistic exists, then any complete statistic is also a minimal sufficient statistic.

Basu's theorem (Mahajan 1-19; C&B 6.2.24, 28) If T(X) is a complete minimal sufficient statistic, then T(X) is independent of every ancillary statistic. Note the "minimal" wasn't really necessary: if a minimal sufficient statistic exists, then any complete statistic is also a minimal sufficient statistic.

Statistics in exponential families (Mahajan 1-11, 17; C&B 6.2.10, 25) By the factorization theorem, T(X) ≡ (T₁(X), ..., Tₖ(X)) is sufficient for θ. (N.B. The statistic must contain all the Tᵢ.)

If X is an iid sample, then T(X) ≡ (Σᵢ T₁(Xᵢ), ..., Σᵢ Tₖ(Xᵢ)) is a complete statistic if the set {(η₁(θ), ..., ηₖ(θ)) : θ ∈ Θ} contains an open set in ℝᵏ. (Usually, all we'll check is dimensionality.)

1.10 Point estimation

Estimator (Mahajan 2-2) Any measurable function of the data. Note this must not be a function of any parameters of the distribution.

Extremum estimator (Hayashi 446) An estimator θ̂ such that there is a scalar ("objective") function Qₙ(θ) such that θ̂ maximizes Qₙ(θ) subject to θ ∈ Θ ⊆ ℝᵖ. The objective function depends not only on θ, but also on the data (a sample of size n).

Analogy principle (Mahajan 2-2–3) Consider finding an estimator that satisfies the same properties in the sample that the parameter satisfies in the population; i.e., seek to estimate θ(P) with θ(Pₙ), where Pₙ is the empirical distribution, which puts mass 1/n at each sample point. Note this distribution converges uniformly to P.

Consistent estimator (Mahajan 3-2; Hansen 5-41–2; C&B 10.1.1, 3) The sequence of estimators {θ̂ₙ}ₙ₌₁^∞ is consistent for θ iff θ̂ₙ →p θ(P). The sequence is superconsistent iff θ̂ₙ − θ = o_p(n^(−1/2)). Superconsistency implies consistency.

If limₙ→∞ Var θ̂ₙ = 0 (variance goes to zero) and limₙ→∞ E θ̂ₙ = θ (bias goes to zero) for every θ ∈ Θ, then {θ̂ₙ} is consistent (sufficient, not necessary, by Chebychev's Inequality).
1. is a compact subset of Rd [generally not satisfied]; Asymptotically normal estimator(Mahajan 3-2, 134) The se- 5. (Invariance property) If is an MLE of , then h() is
quence of estimators {n }i is ( n) asymptotically normal an MLE of h().
2. Qn (b) is continuous in b for any realization of the data d
W [usually easily checked]; iff n(n (P )) N(0, V (P )) for some symmetric pos- Consistency for for MLE (Hayashi 463465; Mahajan 3-810) Let
itive definite matrix V (P ) (somewhat inaccurately referred {yt , xt } be ergodic stationary, and let be the conditional
3. Qn (b) is a measurable function of the data for all b
to as the asymptotic variance of n ). MLE that maximizes the average log conditional likelihood
[generally assumed].
Suppose that (derived under the assumption that {yt , xt } is iid).
These conditions ensure that n is well-defined. Suppose Suppose conditions (specified on Hayashi 464) allow us to
1. n is consistent for ; p
there exists a function Q0 (b) such that: apply a general consistency theorem. Then 0 despite
2. interior(); the fact that the MLE was derived under the iid assumption.
1. Identification: Q0 () is uniquely (globally) maximized 3. Qn (b) is twice continuously differentiable in a neighbor-
on at ; hood N of ; Asymptotic normality for MLE (Mahajan 3-202; C&B 10.1.12;

Qn () d Hayashi 4746; D&M 258, 2603, 2704)Suppose {Wi } {Yi , Zi }


2. Uniform convergence: Qn () converges uniformly in 4. n N(0, ); is an iid sample, that Z is ancillary for , that n
probability to Q0 () [can be verified by checking more 1 P
5. Uniform convergence of the Hessian: There exists a argmaxb n log p(Yi |Zi , b) argmaxb Qn (b). Define
primitive conditions; in particular, for M-estimators a
matrix H(b) that is continuous and nonsingular at score s(Wi , b)
log p(Yi |Zi ,b)
and Hessian H(Wi , b)
Uniform Weak LLN will suffice]. b
such that s(Wi ,b) 2
log p(Yi |Zi ,b)
p
2 b
= b b0
. Suppose:
Then n
. Qn (b) p
sup
0 H(b)
0.
bN 1. n is consistent for generally fails either because
Consistency without compact parameter space (Mahajan 3- number of parameters increases with sample size, or
5, 6; Hayashi 458) Let n argmaxb Qn (W, b) d
Then N(0, H()1 H()1 ).
n(n ) model not asymptotically identified (even if it is identi-
argmaxb Qn (b) as above. Suppose that: fied by any finite sample);
d
Asymptotic variance (C&B 10.1.9) If kn [n ] N(0, 2 ) for 2. interior();
1. True parameter interior(); some sequence of constants {kn }, then 2 is the asymptotic 3. p(Y |Z, b) is twice continuously differentiable in b for any
2. is a convex set; variance. (Y, Z);
3. Qn (b) is concave in b for any realization of the data W Asymptotically efficient estimator (C&B 10.1.112) {n } is 4. E[s(W, )] = 0 and E[H(W, )] = E[s(W, )s(W, )0 ]
[will be true for MLE, since log-likelihood is concave, if d (this is stated as an assumption, but is the Information
N(0, 2 ) and 2
asymptotically efficient if n[n ]
only after a re-parametrization]; is the CRLB. Equality and hence holds if its requirements do);
d
4. Qn (b) is a measurable function of the data for all b . 5. 1n
P
Under regularity conditions, the MLE is consistent and s(Wi , )
N(0, ) for some > 0;
asymptotically efficient.
6. E(supbN kH(W, b)k) < , which implies via ULLN
These conditions ensure that n is well-defined. Suppose
Maximum Likelihood estimator (Mahajan 2-410, 36; Hayashi 448 that
there exists a function Q0 (b) such that:
9, 4635) argmaxb L(X, b). Equivalently, for iid 1 n
X
p
1. Identification: Q0 () is uniquely (globally) maximized data, = argmaxb n 1 P
log p(Xi , b). Estimating = sup
n H(W i , b) E[H(W, b)]
0;
bN i=1
on at ; argmaxb EP log p(X, b). An M-Estimator with q(Xi , b)
p log p(Xi , b). Note: 7. E[H(W, )] is nonsingular (only required at true param-
2. Pointwise convergence: Qn (b)
Q0 (b) for all b .
eter).
1. The identification condition is that the parameter being
p
Then n exists with probability approaching 1 and n
. estimated is identified. Then n is asymptotically normal with variance I()1 (note
See Mahajan 3-6 for M-estimators. 2. MLE need not always exist, and if they do, need not be this is the Fisher information for one observation, not the
unique. joint distribution). The MLE is not necessarily unbiased,
Uniform (Weak) Law of Large Numbers (Mahajan 3-6; Hayashi 3. We may not be able to get a closed-form expression for but in the limit the variance attains the CRLB hence MLE
459) Suppose {Wi }i is ergodic stationary and that: , and therefore have to characterize it as the solution is asymptotically efficient. No GMM estimator can achieve
to a maximization problem (or its FOCs). lower variance.
1. is compact; 4. The expected value of (log) likelihood is uniquely max- Estimation of asymptotic variance can either be done by es-
imized at the true parameter as long as is identified; timating the Hessian or the score (either gives a consistent
2. q(Wi , b) is continuous in b for all Wi ;
i.e., the Kullback-Liebler Divergence expression using the Fisher Information Equality).
3. q(Wi , b) is measurable in Wi for all b;   
p(X, ) M-estimator (Mahajan 2-156; Hayashi 447)
4. EP [supb |q(Wi , b)|] < . K(b, ) E log > 0 for all b 6= , 1 P
p(X, b) argminb n q(Xi , b) (assuming iid data). Estimating
1 P
= argminb EP q(X, b).
Then n i q(Wi , b) converges uniformly to E[q(Wi , b)], and or equivalently, if p(x, b) = p(x, ) for all x implies that
E[q(Wi , b)] is a continuous function of b. b = . 1. MLE is an M-Estimator with q(Xi , b) log L(Xi , b).

This is quasi-ML if used for non-iid data. It can be consistent even for (non-iid) ergodic stationary processessee Hayashi 4645.

12
2. Sample mean is an M-Estimator with q(Xi , b) (Xi for some specified weight matrix Smm symmetric and Then n is asymptotically normal with variance
b)2 . positive definite. The quadratic form in S defines a
norm. Correct specification requires orthogonality condition [M ()W M ()0 ]1 M ()W S()W M ()0 [M ()W M ()0 ]1 .
Asymptotic normality for M-estimator (Mahajan 3-147; E[m(Xi , )] = 0. If we choose W = S()1 (the efficient choice), the asymp-
Hayashi 4704; D&M 593) Suppose {Wi } is an iid sample, that
1 P Extremum estimators can typically be thought of as GMM totic variance reduces to [M ()S()1 M ()0 ]1 .
n argmaxb n q(Wi , b) argmaxb Qn (b). De-
q(Wi ,b) estimators when we characterize them by FOCs. Includes Assuming conditions for ULLN apply,Pwe can estimate terms
fine score s(Wi , b) b
and Hessian H(Wi , b) MLE and M-estimators that can be characterized by FOCs. b 1
using consistent estimators: S m(Xi , n )m(Xi , n )0
2 n
s(Wi ,b) q(Wi ,b)
b
= b b0
. Suppose: and M
c 1 P m(Xi ,n )
.
Consistency for GMM estimators (Mahajan 3-101; Hayashi 467 n b

1. n is consistent for ; 8) Let n argmaxb Qn (b) (as above), where Efficient GMM (Mahajan 3-267; Hayashi 2123) Given above GMM
2. interior(); estimator, if W = S()1 (the inverse of the variance of the
 n 0  X n 
1 1X 1 moment conditions), then the asymptotic variance reduces to
3. q(W, b) is twice continuously differentiable in b for any Qn (b) = m(Xi , b) Sn m(Xi , b) .
2 n i=1 n i=1 Ve = [M ()S()1 M ()0 ]1 .
W;
d This is the lowest variance (in the matrix sense) that can be
4. 1n N(0, ) for some > 0;
P
s(Wi , ) The true parameter satisfies EP [m(W, )] = 0 and achieved. Therefore the optimal choice of weighting matrices
5. E(supbN kH(W, b)k) < , which implies via ULLN hence uniquely maximizes limit function Q0 (b) = is any sequence of random matrices that converge to S 1 . A
p
that 21 E(m(W, b)]0 S E(m(W, b)]. Suppose that: natural choice is to develop preliminary estimates n
(often using the identity as weighting matrix) and generating
1 n
X
1. is a compact subset of Rd ;
p
sup
H(Wi , b) E[H(W, b)] 0; h X X i1
bN
n
i=1
Wn = n 1
m(Xi , n ) m(Xi , n )0 .
2. m(b) is continuous in b for any realization of the data;
p
6. E[H(W, )] is nonsingular (only required at true param- 3. m(b) is a measurable function of the data for all b Under conditions necessary to implement ULLN, Wn

eter). (this ensures that is measurable); S 1 .

Then n is asymptotically normal with variance 4. The weight matrices Sn converge in probability to some Minimum Distance estimator (Mahajan 2-234) Estimator n
(E[H(W, )])1 (E[H(W, )])1 . symmetric positive definite matrix S. argminb gn (b)0 Sn gn (b) for some square weight matrix Sn .
Under appropriate conditions for a ULLN to apply, this GMM is a special case where gn are sample averages of some
Suppose further that: function.
variance is estimated by:
h X i1 h X i 1. Identification: E[m(W, b)] 6= 0 for any b 6= ; Asymptotic normality for MD estimator (Mahajan 3-22
Vb = 1
n
H(Wi , n ) 1
n
s(Wi , n )s(Wi , n )0 4) Let n argmaxb 21 [gn (b)]0 Wn [gn (b)]
| {z } 2. Dominance to ensure ULLN applies: argmaxb Qn (b). Define Jacobian [transpose of way we

E[supb km(W, b)k] < . gn (b)
b
usually write derivatives?] Gn (b) b
. Suppose:
h X i1
1 p
H(Wi , n ) . Then n
.
n 1. The matrix Gn (n )Wn Gn (bn ) is invertible (where bn
Showing identification and dominance for nonlinear GMM is is a point between and n for which a mean value
Method of Moments estimator (Mahajan 2-1920) Suppose iid expansion holds);
quite difficult and usually just assumed. If objective function
data from a parametric model with Rd identified and
is concave, we can replace compact ; continuous, measur- d
the first d moments of P exist: {mj ()}dj=1 {E X j }dj=1 . 2. ngn () N(0, S()) (by a CLT);
able m; and dominance by requirement that E[m(W, b)] exist
Method of Moments estimator gives moments equal to sam- p
and be finite for all b . 3. Gn (bn )
G() (by a ULLN);
ple moments:
p
n 4. Wn
W.
1X j Asymptotic normality for GMM estimator (Mahajan 3-246;
bj
mj () = m X for all j {1, . . . , d}. 1 0
Hayashi 47881) Let n argmaxb [mn (b)] Wn [mn (b)]
n i=1 i 2 P Then n is asymptotically normal with variance
1
argmaxb Qn (b), where mn (b) n m(Xi , bd1 )m1 .
1 P m(b) mn (b) [G()W G()0 ]1 G()W S()W G()0 [G()W G()0 ]1 .
Generalized Method of Moments estimator (Mahajan 2-201; Jacobian Mn (b)dm n b
= b
. Suppose:
Hayashi 447, 468) GMM estimates parameter R
d satisfy- If we choose W = S()1 (the efficient choice), the asymp-
ing E[m(X, )] = 0 where m : X Rm a vector of m 1. The matrix Mn (n )Wn Mn (bn ) is invertible; totic variance reduces to [G()S()1 G()0 ]1 .
moment conditions. If is identified, it is the unique solution
2. nmn ()
d
N(0, E[m(X, )m(X, )0 ]) N(0, S()) Uniformly minimum variance unbiased estimator (Mahajan
to the moment conditions.
2-279, Metrics P.S. 6-1; C&B 7.3.7, 17, 1920, 23, 7.5.1) An unbiased
(by a CLT);
When m d we typically cant find a satisfying all moment estimator (X) of a quantity g() is a UMVUE (a.k.a. best
conditions, so instead we seek = argminb Qn (b) where p m(X,)
3. Mn (bn )
E[ ] M () (by a ULLN); unbiased estimator) iff has finite variance and for every un-
b
 X 0  X  biased estimator (X) of g(), we have Var (X) Var (X)
1 1 p
Qn (b) n m(Xi , ) S n m(Xi , ) 4. Wn
W. for all . Note:

If {Wi } is non-iid ergodic stationary, then is the long-run variance of {s(Wi , )}; Gordins conditions are sufficient for this convergence.

13
1. Unbiased estimators may not exist; Then Type I/Type II error (Mahajan 4-2) Type I error: rejecting H
when in fact H . Type II error: accepting H when
2. Not every unbiased estimator is a UMVUE; 
dg()
0 
dg()

Var (X) I()1 . K .
3. If a UMVUE exists, it is unique; d d
Power (Mahajan 4-23) () P ((X) = 1), for K . The
4. (Rao-Blackwell) If h(X) is an unbiased estimator of
Attained iff there exists a function a() such that chance of (correctly) rejecting H given that the true param-
g(), and T (X) is a sufficient statistic for , then
eter is K . One minus the probability of Type II error
(T ) E[h(X)|T ] is unbiased for g(), and has vari- a()[(x) g()] =
log f (x|).
at a given .
ance (weakly) lower than h(X) for all means we only
need consider statistics that are functions of the data
Size (Mahajan 4-3; C&B 8.3.5) For H , the power function
only through sufficient statistics; 1.11 Decision theory () P ((X) = 1) gives the probability of a Type I error
5. If (T ) is an unbiased estimator of g() and is a func- (rejecting H incorrectly). Size is the largest probability of
tion of a complete statistic T (X), then all other un- Loss function (Mahajan 2-245) We observe x X with unknown this occurring: supH ().
biased estimators that are functions of T are equal to distribution P P. Loss function l : P A R+ (where A
(T ) almost everywhere; is the action space), l(Pb , a) gives the loss that occurs if the Level (Mahajan 4-3; C&B 8.3.6) A test is called level for some
statistician chooses action a and the true distribution is Pb . [0, 1] if its size is at most .
6. (Lehmann-Scheffe) If (T ) is a function (only) of a com- Common examples when estimating parameter v(P ) include
plete statistic T (X), then (T ) is the unique UMVUE
Distance function principle (Mahajan 4-78) When the hypoth-
of E (T ); 1. Quadratic loss: l(P, a) [v(P ) a]2 ; esis is framed in terms of a parameter for which we have
7. (Hausman Principle) W is a UMVUE for E W iff it is 2. Absolute value loss: l(P, a) |v(P ) a|; a good estimate , it may be reasonable to reject if the dis-
uncorrelated with every unbiased estimator of 0 (prac- tance between estimate and H is large; i.e., reject if
tically, impossible to prove, except for the case where 3. 0-1 loss: action space is {0, 1} and l(P , a) is zero if T (x) inf bH d(, b) > k, where d(, ) is some distance
W is a function only of a complete statistic). a , one otherwise. function.

Fisher Information (Mahajan 2-301, Metrics P.S. 6-3)


Decision rule (Mahajan 2-25) A mapping from the sample space X p-value (Mahajan 4-89; C&B 8.3.26) The smallest level of significance
to the action space A. at which a researcher using a given test would reject H on the
  0 
I() E
log f (x, )
log f (x, ) . basis of observed data x. The magnitude of the P-value can
Risk function (Mahajan 2-25) Expected loss (expectation is taken
be interpreted as a measure of the strength of the evidence
| {z } using the true distribution): R(P , ) = E [l(P , )].
Score for H.

Fisher information for an iid sample is n times information Mean squared error risk function (Mahajan 2-256; Greene 109 C&B offers a different conception: a test statistic with
11; C&B 7.3.1) If loss function l(, ) is quadratic loss, risk p(x) [0, 1] for all x X , and P (p(x) ) for every
for each individual observation. For (univariate) normal,
function is MSE risk function: R(P , ) E [(X) H and every [0, 1].

2 0
 
2 0
 g()]2 , where g() is the parameter being estimated. Note
I(, 2 ) = I(, 2 )1 = If we observe an iid
sample from normal with
known variance
0 1 4

;
0 2 4 . R(P , ) = Bias [(X)]2 + Var [(X)].
2 and use T (x) = nX/, the P-value is ( nX/).
Typically it is infeasible to minimize MSE, since it depends
Cramer-Rao Inequality (Mahajan 2-301, 5; C&B 7.3.911, 15) Given on the (presumably unknown) parameter g(). So we often Uniformly most powerful test (Mahajan 4-10, 19; C&B 8.3.11) Let
a sample X f (x|) and an estimator (X) with E (X) = use UMVUE instead. C be a set of tests. A test c C with power function ()
g(), suppose that is a UMP class C test of H against K : K iff for every
c C and every K , we have () c (). That is, for
1. The support does not depend on ; 1.12 Hypothesis testing every K , the test c is the most powerful class C test.

2. pdf f (x|) is differentiable in almost everywhere; In many cases, UMP level tests do not exist.
Hypothesis testing (Mahajan 4-12) We observe data x from dis-
3. E || < (or per C&B, Var < ); tribution P P (usually {P : }), and must decide
Simple likelihood ratio statistic (Mahajan 4-10) For simple null
whether P PH P. The null hypothesis H is that
4. The operations of differentiation and integration can be and alternate hypotheses, L(x, H , K ) p(x|K )/p(x|H )
P PH P, or equivalently H . If PH is a singleton
d (where p is either a pdf or pmf). By convention,
R
interchanged in d (x)f (x, ) dx; we call H simple, otherwise composite (identically for K).
L(x, H , K ) = 0 when both numerator and denominator
5. Fisher Information I() is nonsingular. Note under pre- The alternate hypothesis K is P PK P (or K ),
are zero.
vious conditions, where PH PK = and we assume the maintained hypoth-

esis P PH PK .
(a) I() = Var [ log f (x, )]; Neyman-Pearson Lemma (Mahajan 4-12; C&B 8.3.12) Consider
(b) If f (, ) is twice differentiable and double integra- Test statistic (Mahajan 4-2) Test statistic (a.k.a. test function) is a testing H : = 0 against K : = 1 where pdf/pmf is p(|).
tion and differentiation under the integral sign can decision rule : X {0, 1}, where 1 corresponds to rejecting Consider (randomized) likelihood ratio test function:
be interchanged, then (Fisher Information Equal- H, and 0 to accepting it. Typically we evaluate a test over
ity): this action space using the 0-1 loss function. 1,
L(x, 0 , 1 ) > k;

2
 k (x) = 0, L(x, 0 , 1 ) < k;
I() = E log f (x, ) . Equivalently defined by a critical region in the sample space
0 (0, 1), L(x, 0 , 1 ) = k.

C {x X : (x) = 1}.

14

1. If k is a size (> 0) test (i.e., E0 [k (X)] = ), then Likelihood ratio test (Mahajan 4-245, 346; Greene 15967; C&B 1. n( 0 ) has Taylor expansion
k is the most powerful in the class of level tests. 10.3.1, 3; D&M 275) Test that rejects for large values of
Qn (0 )
2. For each [0, 1] there exists a MP size likelihood n( 0 ) = 1 n + op ;
supK p(x, )
ratio test of the form k . L(X) .
supH p(x, ) Qn (0 ) d
3. If a test is a MP level test, then it must be a level 2. n

N(0, ) for some positive definite ;
likelihood ratio test. That is, there exists k and such
Because distribution properties are often complex (and be- 3. n( 0 ) converges in distribution (to something);
that for any {0 , 1 }, we have P ((X) 6= (X)) = cause H is often of smaller dimension than = H K ,
0. 4. = .
which means sup over K equals sup over ), we often use
Corollary: Power of a MP test its size. Consider the test These conditions are satisfied for ML and efficient GMM un-
sup p(x, ) p(x, ) der certain conditions.
that rejects for all data with probability ; it has power , (x) =
supH p(x, ) p(x, C ) a()
so MP test cant do worse. Under null H0 : ar1 (0 ) = 0, with A() 0 and
A(0 ) of full row rank (no redundant restrictions), define
Monotone likelihood ratio models (Mahajan 4-168; C&B 8.3.16) where is the MLE and C is the constrained MLE. the constrained estimator as c argmaxb Qn (W, b) s.t.
{P : }, where R is a MLR family if L(x, 1 , 2 ) a(b) = 0. Then
We will generally base our test on some monotonic function
is a monotone function of some function of the data T (x)
of (x). Often 2 log (x) = 2[log p(x, )log p(x, C )]. Writ-  Q ( ) 0
in the same direction for all pairs 2 > 1 . If L(x, 1 , 2 ) e 1 Qn (c )
 
n c d
ten more generally, we often have LM n 2r
is increasing (decreasing) in T (x), the family is increasing
(decreasing) MLR in T (x).
| {z } | {z } | {z }
d 1p pp 1p
2r ,
2n[Qn () Qn (C )]
In the one parameter exponential family f (x|) =
h(x) exp[()T (x) B()], where e is a consistent estimator under the null (i.e., con-
where r isPthe number of restrictions (and where here,
   1
Qn (b) n log p(xi , b)). structed using c ).
L(x, 1 , 2 ) = exp (2 ) (1 ) T (x) B(2 ) B(1 ) ;
Asymptotic significance (Mahajan 39) A test n () has asymp-
Wald statistic (Hayashi 48991; Greene 15967; Mahajan 368; D&M 278) totic significance level iff limn PP (n (X) = 1) for
and if () is strictly increasing in , the family is MLR Consider an estimator n argmaxb Qn (W, b) satisfying:
increasing in T (x). all P P0 .

Suppose {P : } where R is MLR increasing in 1. n( 0 ) has Taylor expansion Asymptotic size (Mahajan 39) Asymptotic size (a.k.a. limiting
T (x). Let t () be the test that rejects if T (x) > t (and size) of a test n () is limn supP P0 PP (n (X) = 1).
possibly with some random probability if T (x) = t). Then, Qn (0 )
n( 0 ) = 1 n + op ; Consistent test (Mahajan 39) A test n () is consistent iff

1. The power function t () is increasing in for all limn PP (n (X) = 1) = 1 for all P P1 .
and all t; Qn (0 ) d
2. n

N(0, ) for some positive definite ;
2. If E0 [t (X)] = > 0, then t () is UMP level for 1.13 Time-series concepts
testing H : 0 against K : > 0 . 3. n( 0 ) converges in distribution (to something);
4. = . iid white noise = stationary mds with finite variance =
Unbiased test (Mahajan 4-20; C&B 8.3.9; Greene 1578) A test for white noise.
H : H against K : K is unbiased of level iff
These conditions are satisfied for ML and efficient GMM un- Stationarity (Hayashi 989; Hansen B1-11; D&M 132) {zi } is (strictly)
der certain conditions. stationary iff for any finite set of subscripts i1 , . . . , ir , the
1. () for all H (i.e., level ); and
a()
Under null H0 : ar1 (0 ) = 0, with A() and A(0 ) joint distribution of (zi , zi1 , zi2 , zir ) depends only on i1
2. () for all K (i.e., power everywhere). 0
of full row rank (no redundant restrictions), i, . . . , ir i but not on i.
If is biased, then there is some H H and K K Any (measurable) transformation of a stationary process is
d
such that (H ) > (K ); i.e., we are more likely to reject W na()0 [A()
b 1 A()0 ]1 a()
2r . stationary.
under some true value than some false value.
Covariance stationarity (Hayashi 99100, 401; Hansen B1-11) {zi } is
Caution: is the inverse of the asymptotic variance of the covariance- (or weakly) stationary iff:
Uniformly most powerful unbiased test (Mahajan 4-203) A
estimator.
test for H : H against K : K is UMPU of
1. E[zi ] does not depend on i, and
level iff it is unbiased at level and it is more powerful Note the restricted estimator doesnt enter calculation of the
(for all K ) than any other unbiased test of level . Wald statistic, so the size of resulting tests may be limited. 2. j Cov[zi , zij ] (the j-th order autocovariance) ex-
Also, the Wald statistic (unlike the LR or LM statistics) is ists, is finite, and depends only on j but not on i.
The first condition is not strictly necessary, since the con-
not invariant to the restriction formulation.
stant randomized level test is unbiased. If is UMP level A strictly stationary process is covariance-stationary as long
, it will also be UMPU level . as the variance and covariances are finite.
Lagrange multiplier statistic (Hayashi 489, 49193; Greene 15967;
See Mahajan 4-23 for UMPU tests in one parameter expo- Mahajan389; D&M 2758) Consider an estimator n LLN for covariance-stationary processes with vanishing au-
nential families. argmaxb Qn (W, b) Rp satisfying: tocovariances:

15
2 L
1. If limj j = 0, then y ; Martingale (Hayashi 1023; Hansen B1-14; D&M 133) {xi } is a CLT for MA(): If {t } is iid white noise, and absolute

P
P P martingale with respect to {zi } (where xi zi ) iff summability jZ |j | < holds, then
2. If jZ j < , then limn Var( ny) = jZ j <
E[xi |zi1 , zi2 , . . . , z1 ] = xi1 for i 2.
(called the long-run variance).
 X   
d
Random walk (Hayashi 103; Hansen B1-14) An example of a martin- n[y ]
N 0, j = N 0, 2 [(1)]2 .
White noise (Hayashi 101; Hansen B1-28) {i } is white noise iff: jZ
gale. Let {gi } be an iid white noise process. The sequence
1. It is covariance-stationary, of cumulative sums {zi } where zi = g1 + + gi is a random
walk. Autoregressive process of degree one (Hayashi 3768, 385) yt =
2. E[i ] = 0 for all i (zero mean), and c+yt1 +t , with {t } white noise and Var t = 2 . Equiv-
3. j Cov[i , ij ] = 0 for j 6= 0 (no serial correlation). Martingale difference sequence (Hayashi 104; D&M 134; Hansen alently, (1 L)yt = c + t , or (1 L)(yt ) = t where
B1-14) {gi } is an mds iff E[gi |gi1 , gi2 , . . . , g1 ] = 0 for i 2. c/(1 ).
An iid sequence with mean zero and finite variance is an The cumulative sum of an mds is a martingale; conversely,
independent (i.e., iid) white noise process. 1. If || < 1, then the unique covariance-stationary so-
the first differences of a martingale are an mds. A mds has
lution is the MA() yt = + j
P
no serial correlation (i.e., Cov(gi , gj ) = 0 for all i 6= j). j=0 tj ; it has
Ergodicity (Hayashi 1012, 4025; Hansen B1-12, 26; D&M 1323) {zi } is absolutely summable coefficients.
ergodic iff it is stationary and for any two bounded functions
Ergodic stationary martingale differences CLT (Hayashi E yt = ,
f : Rk R and g : Rl R, {g }
1067; Hansen B1-156) Given a stationary ergodic mds i j = j 2 /(1 2 ),
with E[gi gi0 ] = ,


lim E[f (zi , . . . , zi+k ) g(zi+n , . . . , zi+n+l )] Autocovariances j are absolutely summable,
n
1 X
n j j /0 = j ,
= E[f (zi , . . . , zi+k )] E[g(zi+n , . . . , zi+n+l )] . ng
d
gi
N(0, ). Long-run variance jZ j = 2 /(1 )2 .
P
n i=1
(Note the RHS needs no limit in n, since by stationarity it 2. If || > 1, then the unique covariance-stationary solu-
doesnt affect g().) Intuitively, represents asymptotic in- tion is a MA() of future values of .
dependence: two elements sufficiently far apart are nearly Lag operator (Hayashi 36974; Hansen B1-313) Lj xt xtj . Filter
(L) 0 + 1 L + P 2 3. If || = 1 (unit root), there is no covariance-
independent. 2 L + . Filter applied to a constant
is (L)c = (1)c = c j=0 j . stationary solution.
A sequence of the form {Z(zi , . . . , zi+k )} is also ergodic.
Ergodic Theorem is LLN for ergodic processes: if {zi } er- Autoregressive process of degree p
P
1. If coefficients are absolutely summable ( |j | < ), (Hayashi 37880, 385)
1 P as
godic stationary with E[zi ] = , then zn n i zi .
and {xt } is covariance-stationary, then yt (L)xt
converges in L2 and is covariance-stationary; if autoco- yt = c + 1 yt1 + + p ytp + t ,
Gordins CLT for zero-mean ergodic stationary processes:
variances of {xt } are absolutely summable, then so are
Suppose {zi } is ergodic stationary and with {t } white noise and Var t = 2 . Equivalently,
autocovariances of {yt }.
(L)yt = c + t where (L) 1 1 L p Lp (note
1. E[zt zt0 ] exists and is finite; 2. As long as 0 6= 0, the inverse (L)1 is well defined. the minus signs). Equivalently, (L)(yt ) = t where
P
L 2 3. If (z) = 0 = |z| > 1 (the stability condi- c/(1) = c/(1 j ).
2. E[zt |ztj , ztj1 , . . . ] 0 as j (i.e., knowing
about the distant past does you no good in estimating tion), then the coefficients of (L)1 are absolutely Assuming stability (i.e., (z) = 0 = |z| > 1),
today); summable.
P q 1. The unique covariance-stationary solution is the
0 r ] , Moving average
3. j=0 E[rtj tj < where rtj P process (Hayashi 3668, 402; Hansen B1-289, 367) MA() yt = + (L)1 t ; it has absolutely summable
E[zt |ztj , ztj1 , . . . ] E[zt |ztj1 , ztj2 , . . . ] are yt = + j=0 j tj , with {t } white noise and Var t = coefficients;
the revisions to conditional expectations as the in- 2 . Equivalently, (yt ) = (L)t . Given absolute summa-
P 2. E yt = ;
formation set increases. bility jZ |j | < , a sufficient condition for convergence
(in L2 ), 3. Autocovariances j are absolutely summable;
Then 4. Long-run variance jZ j = 2 /[(1)]2 .
P
1. E[yt ] = ;
1. E[zt ] = 0;
2. Autocovariances j = 2
P
k=0 j+k k (if we have a
ARMA process of degree (p, q) (Hayashi 3803, 385; Hansen B1-30,
2. Autocovariances
P are absolutely summable (i.e., MA(q) process, j = (j 0 +j+1 1 + +q qj )2 437) Autoregressive/moving average process with p autore-
jZ |j | < ); for |j| q); gressive lags and q moving average lags:
3. z is asymptotically normal with long-run variance as in 3. Autocovariances j are absolutely summable (i.e.,
covariance-stationary processes with vanishing covari- P yt = c + 1 yt1 + + p ytp + 0 t + + q tq ,
jZ |j | < );
ances:
with {t } white noise and Var t = 2 . Equivalently,
 X 
4. Long-run variance jZ j = 2 [(1)]2 .
P
d
nz
N 0, j .
jZ 5. If {t } is iid white noise, {yt } is ergodic stationary. (L)yt = c + (L)t

Hansen B1-26 gives a slightly different statement for nonzero-mean ergodic stationary processes.

This is actually a corollary of Billingsleys stronger CLT (which does not require ergodic stationarity) stated in Hansen B1-16.

16
where (L) 1 1 L p Lp (note the minus signs) GARCH process (Hansen B1-389) GARCH(p, q) is a sequence 1. Exact ML: use the fact that
and (L) 0 + 1 L + + q Lq . Equivalently, {ri } with conditional variance given by f (yn , . . . , y0 ) f (yn1 , . . . , y0 )
f (yn , . . . , y0 ) =
f (yn1 , . . . , y0 ) f (yn2 , . . . , y0 )
(L)(yt ) = (L)t Var[ri |ri1 , . . . , r1 ] i2 =
2 2 2 2 f (y0 )
P + 1 ri1 + + q riq + 1 i1 + + ip .
where c/(1) = c/(1 j ). Note if (z) and n
hY i
(z) have a common root, we can factor it out and get an = f (yt |yt1 , . . . , y0 ) f (y0 ).
For {ri } a GARCH(1, 1) process, define vi ri2 i2 . Then
ARMA(p 1, q 1). we can write {ri2 } as an ARMA(1, 1) and {i2 } as an AR(1), t=1

Assuming stability (i.e., (z) = 0 = |z| > 1), with shocks {vi }: Log-likelihood is generally nonlinear due to the last
term.
ri2 = + (1 + 1 )ri1
2
+ vt 1 vt1 ,
1. The unique covariance-stationary solution is the 2. Conditional ML: use the (linear)
MA() yt = + (L)1 (L)t ; it has absolutely i2 = + (1 + 1 )i1
2
+ 1 vt1 .
n
summable coefficients; X
log f (yn , . . . , y0 |y0 ) = log f (yt |yt1 , . . . , y0 ).
2. E yt = ; Local level model (Hansen B1-412) Xt is true latent variable, t=1
Yt is observed value:
3. Autocovariances j are absolutely summable.
Yt = Xt + t , 1.14 Ordinary least squares
If (z) = 0 = |z| > 1 (here called the invertability Xt = Xt1 + t
condition), we can write as a AR(): (L)1 (L)yt = Ordinary least squares model (Hayashi 412, 34, 10910, 123, 126;

(c/(1)) + t . with {t } (measurement error) and {t } independent iid Mahajan 6-46, 123) Let

shocks. Yt is an ARMA(1, 1) with Yt = Yt1 +t +t t1 .


xi1

1

For ARMA(1, 1) yt = c + yt1 + t t1 , long-run vari-
ance is . .
Estimating number of lags (Greene 8345) Calculate sample au- Xi = . , = . ,

1 2
 . .
X
j = 2 . tocorrelations and partial autocorrelations using OLS: |{z} |{z}
K1 xiK K1 K
jZ
1
acf : For k , regress yt on ytk (and potentially a con- y1

1
0
X1
stant);
Y = . , = . , X = .. .
P P . .
For ARMA(p, q) yt = c + [ i yti ] + t [ j tj ],

i j pacf : For k , regress yt on yt1 , . . . , ytk (and poten- . . .
long-run variance is
|{z} |{z} |{z}
tially a constant), and use the coefficient on ytk . n1 yn n1 n nK Xn0

1
P !2 The model may assume:
X [(1)]2 j j Typically, an AR(p) will have acf declining monotonically,
j = 2 = 2 .
[(1)]2 and pacf irregular out to lag p, after which they will abruptly
P
jZ
1 i i 1. Linearity (p. 4): Y = X + ;
drop to zero and remain there.
2. Strict exogeneity (p. 79): E[|X] = 0 (n.b., conditional
Typically, an MA(q) will have pacf declining monotonically, on regressors for all observations); implies:
Autoregressive conditional heteroscedastic process and acf irregular out to lag q, after which they will abruptly
(Hayashi 1045; Hansen B1-38) ARCH(q) is a martingale E[] = 0,
drop to zero and remain there.
difference sequence {gi } with conditional variance E[xjk i ] = 0 for all i = 1, . . . , n, j = 1, . . . , n,
2
Var[gi |gi1 , . . . , g1 ] = + 1 gi1 2 .
+ + q giq For An ARMA(p, q) is a mixture of the above. k = 1, . . . , K,
example, ARCH(1) is: Cov(i , xjk ) = 0 for all i = 1, . . . , n, j = 1, . . . , n,
Estimating P AR(p) (Hayashi 3924, 5479) Suppose {yt } with yt = k = 1, . . . , K;
q c + [ pj=1 j ytj ] + t , and {t } iid white noise (note we
2
gi = i + gi1 need iid for ergodicity and stationarity). Then we can use 3. No multicolinearity (p. 10): rank(X) = K with proba-
OLS to regress yt on (1, yt1 , . . . , ytp )0 , since the model bility 1 (full column rank);
and {i } iid with E i = 0 and Var i = 1. satisfies linearity, stationary ergodicity, independence of re- 4. Spherical error variance (p. 102): E[0 |X] = 2 In , or
gressors from errors, conditional homoscedasticity, mds and equivalently,
1. E[gi |gi1 , . . . , g1 ] = 0; finite second moments of g, and the rank condition. We have Conditional homoscedasticity: E[2i |X] = 2 for all
consistent, asymptotically normal estimation of , as well as i = 1, . . . , n,
2. Var[gi |gi1 , . . . , g1 ] = E[gi2 |gi1 , . . . , g1 ]
= + 2 ,
gi1 consistent variance estimation.
which depends on the history of the process (own con- No correlation between observations: E[i j |X] =
Under Gaussian AR(1), conditional ML for is numerically 0 for i 6= j;
ditional heteroscedasticity); 2
equal to OLS estimate, and conditional ML CML = SSR .
n 5. Normal errors (p. 34): |X N(0, 2 In ), which given
3. Strictly stationary and ergodic if || < 1 and g1 drawn
Maximum likelihood with serial correlation (Hayashi 54347) X and N(0, 2 In );
above assumptions implies
from an appropriate distribution (or process started in
infinite past); Note that treating a serially correlated process as if it were 6. Ergodic stationarity (p. 109): {yi , Xi } is jointly station-
iid may give consistent (quasi-) ML estimators, since it is ary and ergodic (implies unconditional homoscedastic-
4. If the process is stationary, E[gi2 ] = /(1 ). an M-estimator no matter the datas distribution. ity, but allows conditional heteroscedasticity);

It is not clear to me what the white noise sequence and MA coefficients are that generate this sequence, but since t + t t1 is covariance stationary with two nonzero autocovariances, we should be able to fit one.

17
7. Predetermined regressors (p. 10910) All regressors Asymptotic properties of OLS estimator (Hayashi 113, 115) Projection matrix (Hayashi 189; Mahajan 6-910; D&M 2923) P
are orthogonal to the contemporaneous error term: OLS estimator b satisfies X(X 0 X)1 X 0 . Projects onto space spanned by columns of
E[xik i ] = 0 for all i = 1, . . . , n, k = 1, . . . , K; also p X (P X = X). Fitted values Yb Xb = P Y . Idempotent
written E[gi ] = 0 where gi Xi i ; 1. Consistent: b
(under assumptions 1, 68). and symmetric.
d
8. Rank condition (p. 110): The K K matrix xx 2. Asymptotically normal: n(b )
N(0, Avar(b)) An oblique projection matrix P X(X 0 1 X)1 X 0 1
1 1
E[Xi Xi0 ] is nonsingular (and hence finite); where Avar(b) = xx Sxx (assumptions 1, 6, 89). (and its associated annihilator) are idempotent, but not sym-
9. gi is a mds with finite second moments (p. 110): {gi } 3. Consistent variance estimation: Assuming existence of metric.
{Xi i } is a mds (which implies assumption 7); the a consistent estimator Sb for S E[gi g 0 ], estimator
i
K K matrix S E[gi gi0 ] is nonsingular; \ Sxx 1 b 1 Annihilator matrix (Hayashi 189, 301; Mahajan 6-910) M I P .
Avar(b) SSxx is consistent for Avar(b) (assump- Residuals e = Y Yb = M Y = M , and thus are orthogonal
10. Finite fourth moments for regressors (p. 123): tion 6). to X (since M X = 0).
E[(xik xij )2 ] exists and is finite for all k, j = 1, . . . , K;
OLS error variance estimator s2 (a.k.a. variance of residu- Sum of squared residuals SSR e0 e = Y 0 M Y = Y 0 e =
11. Conditional homoscedasticity (p. 126): E[2i |xi ] = 2 > als) is consistent for E[2i ], assuming expectation exists and e0 Y = 0 M .
0 (implies S = 2 xx ). is finite (assumptions 1, 68).
M is idempotent and symmetric. rank(M ) = tr(M ) = nK.
Ordinary least squares estimators (Hayashi 168, 19, 278, 31, Estimating S (Hayashi 1234, 1278) If (usually b) is consistent,
512; Mahajan 6-611) OLS estimator for is b then i yi x0i is consistent. Then under assumptions 1, Standard error of the regression (Hayashi 19) SER s2 .
argmin SSR() argmin (Y X )0 (Y X ). FOCs give 6, and 10, a consistent estimator for S is [a.k.a. standard error of the equation.]
normal equations (a.k.a. moment equations) X 0 Xb =
X 0 Y X 0 e = 0 where e is OLS residual e Y Xb.
n Sampling error (Hayashi 19, 35) b = (X 0 X)1 X 0 . Under nor-
b 1
X
S 2 xi x0i . mal error assumption, b |X N(0, 2 (X 0 X)1 ).
By no multicolinearity, X 0 X is nonsingular (hence positive n i=1 i
definite and satisfying SOC), so OLS R2 (Hayashi 201, 67; Mahajan 6-112; Greene 2503) In population,
Under assumptions 1, 68, and 11 (note with conditional ho-
the fraction of variation in Y attributable to variation in X
1
b = (X 0 X)1 X 0 Y = SXX sXY moscedasticity, we no longer require finite fourth moments),
is
can use Sb s2 Sxx where s is the (consistent) OLS estimate Var Var(x0 )
minimizes normalized distance function of 2 . 1 = = (y,x0 )2 .
Var y Var y
1 1 Biases affecting OLS (Hayashi OLS estimation in-
(Y Z )0 (Y Z ) = SSR.
1889, 1947)
(Uncentered) sample analogue is: Yb 0 Yb /Y 0 Y = 1
2n 2 2n 2 consistent due to, e.g.: e0 e/Y 0 Y .
1. Unbiased: E[b|X] = (under assumptions 13). 1. Simultaneity (p. 1889); Centered: should be used if regressors include a constant;
2. Gauss-Markov: b is most efficient among linear unbi- 2. Errors-in-variables (e.g., classical measurement error)
(yi y)2 e0 e
P
ased estimators, i.e., best linear (in Y ) unbiased esti- (p. 1945); =1 P .
2 (yi y)2
P
mator (assumptions 14). (yi y)
3. Unobserved variables (p. 1967).
3. Cov(b, e|X) = 0, where e Y Xb (assumptions 14). Be careful when comparing equations on the basis of the
Maximum likelihood estimator for OLS model (Hayashi 49;
4. Var[b|X] = 2 (X 0 X)1 (assumptions 14); therefore a fit (i.e., R2 ): the equations must share the same dependent
Mahajan 6-189) Assuming normal errors, the OLS estimate
\ s2 (X 0 X)1 . Under variable to get a meaningful comparison.
common estimator is Var(b|X) for is also the ML estimate.
normal error assumption, CRLB Standard error (Hayashi 357)
ML = b;
 2 0 1 
(X X) 0 2 1 0 1 nK 2
I 1 = ML = ee = SSR = s . q q
4 n n n SE(bk ) s2 [(X 0 X)1 ]kk = \ kk .
[Var(b|X)]
0 2 /n
Best linear predictor (Mahajan 6-23; Hayashi 13940) Population
so b achieves CRLB and is UMVUEthis result is
analogue of OLS. Suppose (y, x) a r.v. with y R and Robust standard error (Hayashi 117, 211-2)
stronger than Gauss-Markov, but requires normal er-
x Rk . Best linear predictor of y given X is
rors. q q
b (y|x) L(y|x) = x0 E[xx0 ]1 E[xy].
E SE (bk ) 1 \
n
Avar(bk ) 1
[S 1 SS
n xx
1
b xx ]kk .
OLS estimator for 2 is
SSR e0 e If one regressor in x [1 x0 ]0 is constant, then E
b (y|x) OLS t test (Hayashi 357; Mahajan 6-1316) Assuming normal er-
s2 = . b (y|1, x) = + 0 x where = Var[x]1 Cov[x, y] and rors. Test for H : k = k . Since bk k |X
nK nK E
= E[y] 0 E[x]. N(0, 2 [(X 0 X)1 ]kk ) under null, we have
Unbiased (assumptions 14). Under normal error assump-
tion, Var(s2 |X) = 2 4 /(n K) which does not meet CRLB Fitted value (Hayashi 18) Yb Xb = P Y . Thus OLS residuals bk k
zk p N(0, 1).
but is UMVUE. e = Y Yb . 2 [(X 0 X)1 ]kk

Variance of b is less than or equal to variance of any other linear unbiased estimator of in the matrix sense; i.e., difference is positive semidefinite.

18
If 2 is unknown, use s2 to get t-ratio: CX, C, we get transformed model y = X + which Short and long regressions (Mahajan 6-337) Suppose we have
satisfies assumptions of classical linear regression model. best linear predictors
bk k bk k Thus we have the unbiased, efficient estimator:
tk p tnK b (y|x1 , x2 ) = x0 1 + x0 2 ,
s2 [(X 0 X)1 ]kk SE(bk ) E 1 2
GLS argmin(y X )0 V 1 (y X )
b (y|x1 ) = x0 .
E
under null. Accept for tk close to zerot distribution sym- 1
metric about zero. (X 0 V 1 X)1 X 0 V 1 y Then 1 = iff either E[x1 x2 ] = 0 or 2 = 0 since
Level- confidence interval for k is bk SE(bk )t/2 (nK). 2 0 1 1
Var(GLS |X) (X V X) . 1
= 1 + E(x1 x01 ) E(x1 x02 )2 .

OLS robust t test (Hayashi 1179) Test for H : k = k . Under This can also be seen as an IV estimator with instruments
null, we have V 1 X. Note this result is the population analogue of the Frisch-
Waugh result for b1 .
n(bk k ) bk k d Note consistency and other attractive properties rely on
tk q
N(0, 1). strict exogeneity of regressors; if they are merely predeter-
\ SE (bk )
Avar(bk ) mined, the estimator need not be consistent (unlike OLS, 1.15 Linear Generalized Method of Mo-
which is). ments
OLS F test (Hayashi 3944, 53; Mahajan 6-167, 205) Assuming nor-
mal errors. Test for H : R#rK K1 = r#r1 , where Feasible generalized least squares (Hayashi 59, 133-7, 4157; Ma-
This material is also covered (with slightly different
rank(R) = #r (i.e., R has full row rankthere are no re- hajan 6-259; D&M 298301, 295) In practice, the covariance ma-
notationnotably the exchange of x with zin Mahajan
dundant restrictions). Accept if F F#r,nK (under null) trix 2 V is generally not known (even up to a constant).
7.
is small, where: Feasible GLS uses an estimate Vb , and under reasonable
conditions, yields estimates that are not only consistent but Linear GMM model (Hayashi 198203, 212, 2256, 4067) The model
also asymptotically equivalent to genuine GLS estimates, and may assume:
(Rb r)0 [R(X 0 X)1 R0 ]1 (Rb r)/#r therefore share their efficiency properties. However, even
F when this is the case, the performance in finite samples of 1. Linearity (p. 198): yi = zi0 + i with zi , RL ;
s2
FGLS may be much inferior to that of genuine GLS.
0\
= (Rb r) [RVar(b|X)R ] 0 1
(Rb r)/#r 2. Ergodic stationarity (p. 198): xi RK instru-
Caution that FGLS is that consistency is not guaranteed if ments; unique, non-constant elements of {(yi , zi , xi )}
(SSRR SSRU )/#r n K 2/n
= = ( 1) the regressors are merely predetermined (rather than strictly are jointly ergodic stationary (stronger than individu-
SSRU /(n k) #r exogenous). ally ergodic stationary);
(and LU /LR = (SSRU / SSRR )n/2 the likelihood ra- Weighted least squares (Hayashi 589; Mahajan 6-278) GLS where 3. Orthogonality (p. 198): All K elements of xi predeter-
tio). The first and second expressions are based on Wald there is no correlation in the error term between observations mined (i.e., orthogonal to current error term): E[gi ]
principle; the third on LR. (i.e., the matrix V is diagonal). GLS divides each element E[xi i ] = E[xi (yi zi0 )] = 0;
F ratio is square of relevant t ratio. F is preferred to multiple of X and Y by the standard deviation of the relevant error 4. Rank condition for identification (p. 200): xz
t testsgives an eliptical rather than rectangular confidence term, and then runs OLS. That is, observations with a higher E[xi zi0 ]KL has full column rank (i.e., rank L);
region. conditional variance get a lower weight.
K > L: Overidentified/determined (use GMM es-
Can also think of as timator),
OLS robust Wald statistic (Hayashi 118, 122) Test for
K = L: Just/exactly identified/determined (GMM
X
H : R#rK K1 = r#r1 , where rank(R) = #r (i.e., = argmin 1
i2
(yi x0i )2 .
R has full row rankthere are no redundant restrictions). i
estimator reduces to IV estimator),
Under null, K < L: Not/under identified/determined;
Thus the weights typically passed to OLS software are the
d reciprocal of the variances (not the reciprocal of the standard 5. Requirements for asymptotic normality (p. 2023):
W n(Rb r)0 [RAvar(b)R
\ 0 ]1 (Rb r)
2#r .
deviations). {gi } {xi i } is a martingale difference sequence
For H : a#a () = 0, where A#aK () a() is continu- with K K matrix of cross moments E(gi gi0 ) non-
ous and of full row rank, under null, Frisch-Waugh Theorem (D&M 1924; Hayashi 724; Greene 2457) singular;
Suppose we partition X into X1 and X2 (and therefore
S Plimn Var[ P ng] = Avar(g) =
0 1 d into 1 and 2 ). Then OLS estimators
W na(b)0 A(b)Avar(b)A(b) 2#a .
 
1 1
\ a(b) Avar( n gi ) = Avar( n 2i xi x0i );
b1 = (X10 X1 )1 X10 (Y X2 b2 ); By assumption 2 and ergodic stationary mds CLT,
Generalized least squares (Hayashi 549, 4157; Mahajan 6-259; S = E[gi gi0 ] = E[2i xi x0i ];
D&M 28992, 295) Without assumption 4 (spherical error vari-
b2 = (X20 M1 X2 )1 (X20 M1 Y )
6. Finite fourth moments (p. 212): E[(xik zil )2 ] exists and
ance), we have E[0 |X] 2 V with Vnn 6= I assumed to = (X20 X2 )1 X20 Y
is finite for all k and l;
be nonsingular. Gauss-Markov no longer holds; the t and F
tests are no longer valid; however, the OLS estimator is still where X2 M1 X2 and Y M1 Y are the residuals from 7. Conditional homoscedasticity (p. 2256): E(2i |xi ) =
unbiased. regressing X2 and Y respectively on X1 . 2 , which implies S E[gi gi0 ] = E[2i xi x0i ] =
By the orthogonal decomposition of V 1 , we can write In addition, the residuals from regressing Y on X equal the 2 E[xi x0i ] = 2 xx . Note:
V 1 C 0 C for some (non-unique C). Writing y Cy, X residuals from regressing Y on X2 . By assumption 5, 2 > 0 and xx nonsingular;

19
b = 2 1 P xi x0 = 2 Sxx where 2 consistent, so
S c)
(assumptions 14, 5 or 8). Note sampling error (W Distance principle statistic is by
n i i
no longer need assumption 6 for consistent estima- 0 W
= (Sxz c Sxz )1 S 0 W
c gn ().
xz
tion of S;
b1 ), S
b1 ) J(u (S
b1 ), S d
b1 )
8. {gt } {xt t } satisfies Gordins condition (p. 4067): it
3. Consistent variance estimation: If S
b is a consistent es- J(r (S 2#a ,
timator of S, then (assumption 2)
is ergodic stationary and
(a) E[gt gt0 ] exists and is finite; where J is 2n times the minimized objective function:
\c
L2 Avar((W )) =
(b) E[gt |gtj , gtj1 , . . . ] 0 as j ;
0 c
P q 0 r ] (Sxz W Sxz )1 Sxz
0 c bc 0 c
W S W Sxz (Sxz W Sxz )1 . b1 ) ngn ()0 S
J(, S b1 gn ()
(c) j=0 E[rtj tj < , where rtj
E[gt |gtj , gtj1 , . . . ] E[gt |gtj1 , gtj2 , . . . ]; 1
n
(Y Z )0 X S
b1 X 0 (Y Z )
4. Consistent estimation of S: If is consistent, S
P the long-run covariance matrix S = Avar(g)
and
E[gi gi0 ] exists and is finite, and assumptions 1, 2, (5
jZ j is nonsingular. implied?] and 6, then
LM statistic is on Majajan 7-712 or D&M 617.
Instrumental variables estimator (Hayashi 2056) A method of X X p
moments estimator based on K orthogonality (moment) con- b
S 1
n
2i xi x0i = 1
n
(yi zi0 )2 xi x0i
S. Note that unlike the Wald, these statisticswhich are nu-
ditions. For just identified GMM model (K = L; identifica- i i
merically equal in finite samplesare only asymptotically
tion also requires nonsingularity of xz E[xi zi0 ]), chi squared under efficient GMM. If the hypothesis restric-
For estimation of S under assumption 8 in place of 5,
 X 1  X  tions are linear and the weighting matrix is optimally chosen,
1
IV Sxz sxy n 1
xi zi0 1
xi yi . see Hayashi 40712 on kernel estimation and VARHAC.
n they are also numerically equal to the Wald statistic.
i i 5. Choosing instruments: Adding (correct) instruments
Numerically equivalent to linear GMM estimator with any generally increases efficiency, but also increases finite-
sample bias; weak instruments (that are almost un- Efficient GMM (Hayashi 2125; D&M 588-9, 5978, 6078) Lower
symmetric positive definite weighting matrix. If instruments c )) (0 S 1 xz )1 is achieved if W
correlated with regressors) make asymptotics poorly ap- bound Avar((W xz
c cho-
are same as regressors (xi = zi ), reduces to OLS estimator. c = S 1 . Two-step efficient GMM:
For properties, see GMM estimators. proximate finite-sample properties. sen so that W plim W

If one non-constant regressor and one non-constant instru-


GMM estimator can be seen as a GLS estimator: GMM min- 1
ment, plim IV = Cov(x, y)/ Cov(x, z). 1. Calculate (W c ) for some W
c (usually Sxx ; use residuals
imizes e0 X W
c X 0 e = e0 W cX 0 .
cGLS X W
cGLS e where W
0
i yi zi (W ) to obtain a consistent estimator S
c b of
Linear GMM estimator (Hayashi 20612, 4067; D&M 21920) For
GMM hypothesis testing (Hayashi 2112, 2224; Mahajan 7-712; S.
overidentified case (K > L), cannot generally satisfy all K
moment conditions, so estimator is D&M 6178) Assume 15. Under null, Wald principle gives:
b1 ).
2. Calculate (S
c ) argmin 1 gn ()0 W
(W c gn () 1. H0 : l = l ,
2

Note the requirement of estimating Sb1 harms small-sample
argmin 1
2n2
(Y Z )0 X W
c X 0 (Y Z ) c ) l d
l (W properties, since it requires the estimation of fourth mo-
tl
N(0, 1).
SEl ments. Efficient GMM is often outperformed in bias and
with gn () 1 P 0 variance by equally-weighted GMM (W c = I) in finite sam-
i xi (yi zi ) and some symmetric positive
n
c p 2. H0 : R#rL L1 = r#r1 , where rank(R) = #r (i.e., ples.
definite matrix W W (where W c can depend on data, and
W also symmetric and positive definite). R has full row rankthere are no redundant restric-
tions), Testing overidentifying restrictions (Hayashi 21721; Mahajan 7-
0 c
GMM (Sxz W Sxz )1 Sxz
0 c
W sxy 136; D&M 2327, 6156) Assuming overidentification (K > L),

(Z 0 X W
c X 0 Z)1 Z 0 X W
c X 0 Y.
W n(R(W
 \c
c ) r)0 R[Avar((W ))]R0
1 a consistent estimator S b (ensured by assumption 6), and as-
sumptions 15, then
If K = L (just identified case), reduces to IV estimator for d
2#r .
c ) r)
(R(W
any choice of W
c.
b1 ))0 S
b1 gn ((S
b1 )) d
J ngn ((S 2KL .
c)
1. Consistency: (W
p
(under assumptions 14). 3. H0 : a#a () = 0, where A#aL () a() is continu-
ous and of full row rank,
c ) ) d
2. Asymptotic normality: n((W
To test a subset of orthogonality conditions (as long as K1
N(0, Avar((W ))) where
c
0
\c c ))0 1
 remainingnon-suspectinstruments are enough to ensure
W na((W
c )) A((W
c ))[Avar((W ))]A((W
identification), calculate J1 statistic using remaining instru-
Avar((W
c )) = d d
2#a .
c ))
a((W ments, and C J J1 2 (K K1 ) (this is a distance
(0xz W xz )1 0xz W SW xz (0xz W xz )1 principle test, since J is 2n times the minimized objective
| {z }| {z }| {z } (Note W is not numerically invariant to the represen- function). In finite samples, use the leading submatrix of Sb
1 1 tation of the restriction.) to calculate J1 .

20
Two-Stage Least Squares (Hayashi 22631; D&M 21524) Under as- Test by estimating either artificial regression using OLS: 5. Conditional homoscedasticity becomes
sumption 7 (conditional homoscedasticity), efficient GMM
E[im ih |xim , xih ] = mh for all m, h = 1, . . . , M ;
becomes 2SLS. Minimizing normalized distance function Y = Z + Px Z + in this case
Y = Z + Mx Z + ;
1
(Y Z )0 Px (Y Z ) 11 E[xi1 x0i1 ] 1M E[xi1 x0iM ]

2n 2 i.e., including either fitted values or residuals from a first-
stage regression of suspect regressors Z on known instru- S= .. .. ..
.

. . .
yields estimator ments X. If Z are exogenous, the coefficient or should
M 1 E[xiM x0i1 ] 0
M M E[xiM xiM ]
be zero.
b1 ) = ([ 2 Sxx ]1 ) = (S 1 )
2SLS (S xx
Interpretation note: although the test is often interpreted as
0 1
= (Sxz Sxx Sxz )1 Sxz
0 1
Sxx sxy . a test for exogeneity, the test is actually for the consistency Linear multiple-equation GMM estimator (Hayashi 266-7,
of the OLS estimate. OLS can be consistent even if Z are 2703) Hayashi 270 summarizes comparison to single-equation
Note 2SLS does not depend on 2 . endogenous. GMM. As in single-equation GMM case, GMM minimizes
1
Asymptotic variance Avar(2SLS ) = 2 (0xz 1 1 es- Analogous test will work for nonlinear model as long as sus- (Y Z )0 X W
c X 0 (Y Z ), hence
xx xz ) 2n2
timated by pect regressors Z enter linearly; if Z enter nonlinearly,
failure is due to Jensens inequality. 0 c
0 1
GMM (Sxz W Sxz )1 Sxz
0 c
W sxy
\
Avar(2SLS ) 2 (Sxz Sxx Sxz )1
(Z 0 X W
c X 0 Z)1 Z 0 X W
c X 0 Y,
2 n(Z 0 Px Z)1 1.16 Linear multiple equation GMM

with 2 1 P
zi0 2SLS )2 . See Hayashi 284 for summary of multiple-equation GMM es- (See Hayashi 267 for huge partitioned matrix representation;
n i (yi
timators. This material is also covered (with slightly different can also be written with Y , , and stacked vectors, and X
If errors are heteroscedastic, 2SLS is inefficient GMM. Ro- notationnotably the exchange of x with zin Mahajan 7. and Z block matrices), where
1
bust covariance estimate (since Wc = Sxx ) is
Linear multiple-equation GMM model (Hayashi 259265, 269 1 P
xi1 z0i1

70, 2745) Hayashi 270 summarizes comparison to single-
\ n
Avar(2SLS ) = equation GMM. ..
Sxz .
,
0 1
Sxz )1 Sxz
0 1 b 1 0 1
Sxz )1 . Model has M linear equations, where equation m has Lm

(Sxz Sxx Sxx SSxx Sxz (Sxz Sxx |{z}
1
xiM z0iM
P P P
Km Lm
regressors and Km instruments. Assumptions are exactly n
1

parallel to single-equation GMM.
P
xi1 yi1
Other characterizations of 2SLS minimize kPx (Y Z)k2 = n
(Y Z)0 Px (Y Z), or else note: sxy

..
.
1. Each equation must satisfy its (equation-by-equation)
P .
|{z}
rank condition for identification. P
Km 1
1
xiM yiM
0 0 1 0 1 0 0 1 0 n
2SLS = (Z X(X X) X Z) Z X(X X) X Y
2. The vector of all (yi , zi , xi ) must be jointly ergodic
= (Z 0 Px Z)1 Z 0 Px Y stationarythis is stronger than equation-by-equation
1. If each equation is exactly identified, then choice of
ergodic stationarity.
weighting matrix doesnt matter, and multiple-equation
where the projection matrix Px X(X 0 X)1 X 0 . So can 3. Asymptotic normality requires that {gi } be a mds with GMM is numerically equivalent to equation-by-equation
view 2SLS as: nonsingular second moment, where IV estimation.
1. IV regression of Y (or Px Y ) on Z with Px Z (fitted val-
xi1 i1

2. Equation-by-equation GMM corresponds to using a
ues from regressing Z on X) as instruments, or; . block diagonal weighting matrix.
gi .. .

2. OLS regression of Y (or Px Y ) on Px Z (fitted values |{z}
P
Km 1 xiM iM 3. Assuming overidentification, efficient multiple-equation
from regressing Z on X)this is why its called two-
stage least squares. GMM will be asymptotically more efficient than
By ergodic stationary mds CLT, SP P
Km Km = equation-by-equation efficient GMM; they will only be
E[gi gi0 ] where asymptotically equivalent if efficient weighting matrix
Caution: Dont view standard errors reported by these re-
gressions are relevant for 2SLS, since they ignore errors from 1
X is block diagonal (i.e., if E[im ih xim x0ih ] = 0 for all
S lim Var[ ng] = Avar(g) = Avar( n gi ) m 6= h; under conditional homoscedasticity, this is
first stage (i.e., calculation of fitted values Px Z). n

E[i1 i1 xi1 x0i1 ] E[i1 iM xi1 x0iM ]
E[im ih ] = 0).
Durbin-Wu-Hausman test (D&M 23742; MaCurdy) Suppose we .. .. ..
= .

do not know whether certain regressors are endogenous or . . .
0 0
Full-Information Instrumental Variables Efficient (Hayashi
not, and therefore whether or not they need to be included E[iM i1 xiM xi1 ] E[iM iM xiM xiM ] 2756; Mahajan 716-7) Efficient GMM estimator with:
as instruments. Assuming endogeneity, OLS will be incon-
sistent and we prefer 2SLS; assuming exogeneity, both OLS 4. Consistent estimation of S requires finite fourth mo-
and 2SLS estimates will be consistent and we prefer OLS (it ments E[(ximk zihj )2 ] for all k = 1, . . . , Km ; j = Conditional homoscedasticity (E(0 |X) = M M
is BLUE, and unbiased under appropriate assumptions). 1, . . . , Lh ; and m, h = 1, . . . , M . Inn ).

21
b1 , where
Weighting matrix is S Suppose we have linearity; joint ergodic stationarity; orthog- Seemingly Unrelated Regressions (Hayashi 27983, 309; Mahajan
X onality; rank condition; mds with finite second moments; 717) [a.k.a. Joint Generalized Least Squares] Efficient GMM
b 1
S b 0,
Xi X conditional homoscedasticity; E[zim z0ih ] exists and is finite estimator with:
n i
i for all m, h; and xim = xi (common instruments). Then
1 0 1
xi1 x0iM
P P
11 n i xi1 xi1 1M n i 3SLS is consistent, asymptotically normal, and efficient.
Conditional homoscedasticity (E(0 |X) = M M

= .
.. .. ..
, 3SLS is more efficient than multiple-equation 2SLS unless
. Inn ),
P. either

1 P 1
M 1 n i xiM x0i1 M M n i xiM x0iM
The set of instruments the same across equations (X =
1. is diagonal, in which case the two estimators are
P IM M XnK , where X are the instruments for each
and Xi the Km M block-diagonal matrix of instruments asymptotically equivalent (although different in finite
equation).
for the ith observation; and samples),
X 2. Each structural equation is exactly identified, in which The set of instruments the union of all the regressors.
b
1
n
i 0i , case the system is exactly identified, and 2SLS and 3SLS
i are computationally identical in finite samples.
Note this is the 3SLS estimator; it just has another name
or Multiple equation Two-Stage Least Squares (MaCurdy) Effi- when instruments are the union of all regressors. Here, all re-
X cient GMM estimator with: gressors satisfy cross orthogonality: each is predetermined
1
mh n
im ih in every other equation. Since regressors are a subset of
i Conditional homoscedasticity across equations ( = b are from OLS.
instruments, residuals used to estimate
X 2 IM M ),
= 1
n
(yim z0im m )(yih z0ih h ) Normalized distance function
i Conditional homoscedasticity (E(0 |X) = M M
Inn = 2 InM nM ),
for some consistent estimator m of m (usually 2SLS). The set of instruments the same across equations (X =
1
2n
(Y Z )0 (
b 1 I)(Y Z )
Suppose we have linearity; joint ergodic stationarity; orthog- IM M XnK , where X are the instruments for each
onality; rank condition; mds with finite second moments; equation).
yields estimator
conditional homoscedasticity; and E[zim z0ih ] exists and is fi-
Weighting matrix doesnt matter up toP a factor of propor-
nite for all m, h. Then FIVE is consistent, asymptotically tionality, and is thus S b1 = I ( 1 0 1 = I
i xi xi ) b 1 I)Z 1 Z 0 (
n
SUR = Z 0 ( b 1 I)y,
 
normal, and efficient. 1 0 1
( n X X) . Normalized distance function
\ b 1 I)Z 1 .
SUR ) = n Z 0 (
 
Three-Stage Least Squares (Hayashi 2769, 308; Mahajan 717; 1 Avar(
MaCurdy) Efficient GMM estimator with:
12
1 ..
(Y Z )0

PX (Y Z )

Conditional homoscedasticity (E(0 |X) = M M 2n .
2
M
Inn ), Suppose we have linearity; joint ergodic stationarity; orthog-
The set of instruments the same across equations (X = yields estimator onality; mds with finite second moments; conditional ho-
moscedasticity; and xim = m zim . Then SUR is consis-
S
IM M XnK , where X are the instruments for each 1 0
2SLS = Z 0 (I PX )Z

equation). Z (I PX )y tent, asymptotically normal, and efficient.
b0 Z]
= [Z b 1 Z
b0 y,
Weighting matrix is Sb1 = b 1 ( 1 P xi x0 )1 = b 1 If each equation is just identified, then all the regressors
n i i
1 0 1 1 P 0 must be the same and SUR is called the multivariate regres-
( n X X) , where
b =
n i i i is the matrix of calcu- where Zb (IP )Z are the fitted values from the first-stage sionit is numerically equivalent to equation-by-equation
X
lated as in FIVE using 2SLS residuals. Normalized distance regression. The covariance matrix simplifies since = 2 I: OLS. If some equation is overidentified, we gain efficiency
function over equation-by-equation OLS as long as some errors are
1
(Y Z )0 (
b 1 P )(Y Z ) 1
correlated across equations (i.e., mh 6= 0); we are taking
2n X \ 2SLS ) = n 2 Z 0 (I PX )Z

Avar(
yields estimator advantage of exclusion restrictions vs. multivariate regres-
b0 Z]
= n 2 [Z b 1 ;
sion.
b 1 P )Z 1 Z 0 (
= Z 0 ( b 1 P )y
 
3SLS X X
 0 1 1 0 1 however, if 6= 2 I, we instead get
= Zb (b I)Zb Z
b (
b I)y, Multiple-equation GMM with common coefficients
b0 Z]
= n[Z b 1 Z
b0 ( I)Z[
bZ b0 Z]
b 1 . (Hayashi 2869) When all equations have same coefficient
where Zb (I P )Z are the fitted values from the first-
X vector RL ,
stage regression of the Zm s on X. This is the same estimator as equation-by-equation 2SLS,
and the estimated within-equation covariance matrices are
also the same. Joint estimation gives the off-diagonal (i.e., 1. Linearity becomes yim = z0im + im for m = 1, . . . , M ;
\ b 1 P )Z 1 .
3SLS ) = n Z 0 (
 
Avar( X cross-equation) blocks of of the covariance matrix. i = 1, . . . , n.

When regressors are a subset of instruments, 2SLS becomes OLS.

We need neither the rank conditions (per Hayashi 285 q. 7) nor assumption that E[zim z0ih ] exists and is finite for all m, hit is implied.

22
1 P b 1 ( 1 P xi x0 )1 . Caution:
2. Rank condition requires full column rank of (n xi x0i )1 rather than n i Structural form
Running a standard OLS will give incorrect standard errors.
E(xi1 z0i1 )

. Using matrix notation as in random effects estimator, yt + B xt = t
xz .. ;

|{z} |{z} |{z} |{z} |{z}
1  X
M M M 1 M K K1 M 1
|{z}  X 
P
Km L
0
E(xiM ziM ) pOLS = n 1
Zi0 Zi 1
n
Zi0 yi ,
i i
Note here xz is a stacked rather than a block matrix, has nonsingular ;
= (Z 0 Z)1 Z 0 y;
and the rank condition is weaker than in the standard 1 1
multiple-equation GMM modelrather than requiring Avar(pOLS ) = E[Zi0 Zi ] E[Zi0 Zi ] E[Zi0 Zi ] ; 5. t |xt N(0, ) for positive definite ;
equation-by-equation rank conditions, it suffices for one
\
Avar(pOLS ) = 6. The vector that collects elements of yt (
equation to satisfy the rank condition.
yt1 , . . . , ytM , zt1 , . . . , ztM ) and xt is iid (stronger than
0 c
GMM (Sxz W Sxz )1 Sxz
0 c
W sxy 
1
1 
1

1
1 a jointly ergodic stationary mds);
Zi0 Zi Zi0 Z Zi0 Zi
P P P
n i n i
b i
n i ,
(as in standard multiple-equation GMM estimator), but here 7. (, ) is an interior points in a compact parameter space.
= n(Z 0 Z)1 Z 0 (
b I)Z (Z 0 Z)1 .
1P 1P  
xi1 z0i1

n n
xi1 yi1
.. .. We can rewrite complete system as a reduced form
Sxz , sxy .
. .
1 P
xiM ziM0 1 P
xiM yiM 1.17 ML for multiple-equation linear mod-
n n
els yt = 0 xt + v t .
|{z} |{z}
Imposition of conditional homoscedasticity (and other appro- 1 B 1 t
priate assumptions as above) gives common-coefficient ver- Structural form (D&M 2123; Hayashi 52930) Equations each
sions of FIVE, 3SLS, and SUR estimatorsthe latter is the present one endogenous variable as a function of exogenous
random effects estimator. and endogenous variables (and an error term). It is conven- xt is orthogonal to vector of errors vt , so this is a multivariate
tional to write simultaneous equation models so that each regression model; given our normality assumption, we have
Random effects estimator (Hayashi 28990, 2923) Efficient endogenous variable appears on the left-hand side of exactly
multiple-equation GMM (SUR) where all equations have one equation, but there is nothing sacrosancy about this
yt |xt N(1 B0 xt , 1 (1 )0 ),
same coefficient vector RL , conditional homoscedasticity convention.
applies, and the set of instruments for all equations is the
Can also be written with the errors a function of all variables
union of all regressors. and can estimate (on which and B depend) and by
(one error per equation)
We can write the model as yi = Zi + i , where maximum likelihood. Generally useP
concentrated ML to first
1 n
Reduced form (D&M 2134; Hayashi 52930) Each endogenous vari- maximize over to get ()
b = n t=1 (yt + Bxt )(yt +
0
yi1 zi1 i1 Bxt )0 and then over .

able is on the left side of exactly one equation (one endoge-
yi = .. , Zi = .. , i = .. ; nous variable per equation), and only predetermined vari-

|{z} . |{z} . |{z} . ables (exogenous variables and lagged endogenous variables) Estimator is consistent and asymptotically normal (and
M 1 yiM M L
0
ziM M 1 iM are on the right side (along with an error term). asymptotically equivalent to 3SLS ) even if t |xt is not nor-
mal, as long as model is linear in ys and errors (not neces-
or as y = Z + by stacking the above (note Z is no longer Full Information Maximum Likelihood (Hayashi 52635; sarily in regressors or parameters).
block-diagonal as it was for SUR). Then MaCurdy) Application of (potentially quasi-) maximum like-
 X 1  X  lihood to 3SLS model with two extensions: iid (dependent Maximum likelihood for SUR (Hayashi 5257; MaCurdy P.S. 3 1(v))
RE = 1
n
Zi0
b 1 Zi 1
n
Zi0
b 1 yi , variables, regressors, and instruments), and conditionally A specialization of FIML, since SUR is 3SLS with the set of
i i homoscedastic normally distributed errors. Model requires: instruments the set of all regressors. In FIML notation, that
b 1 I)Z 1 Z 0 (
= Z 0 ( b 1 I)y;
 
0 +
1. Linearity: ytm = ztm means ztm is a subvector of xt ; therefore yt just collects the
m tm for m = 1, . . . , M ;
1 (sole remaining endogenous variables) yt1 , . . . , ytM and the
Avar(RE ) = E[Zi0 1 Zi ]

; 0 ] of full column rank for all
2. Rank conditions: E[xt ztm
1 structural form parameter = I. We get a closed form ex-
m;
 X
\
Avar(RE ) = n1 0 b 1
Zi Zi , pression (for some consistent estimator ),
b unlike for general
i 3. E[xt x0t ] nonsingular; FIML:
b 1 I)Z 1 .
= n Z 0 (
 
4. The system can be written as a complete system of
b 1 I)Z 1 Z 0 (
b = Z 0 ( b 1 I)y.
 
simultaneous equationslet yt be a vector of all en- ()
Pooled OLS (Hayashi 2903, 30910) Inefficient multiple-equation dogenous variables (those in yt1 , . . . , ytM , zt1 , . . . , ztM
GMM where all equations have same coefficient vector but not in xt ) :
RL calculated by pooling all observations and applying OLS. The number of endogenous variables equals the LR statistic is valid even if the normality assumption fails to
Equivalent to GMM with weighting matrix W c = IM numbers of equations (i.e., yt RM ); hold.

Caution: yt does not merely collect yt1 , . . . , ytM as usual.

Per Hayashi 534 and MaCurdy, this asymptotic equivalence does not carry over to nonlinear (in endogenous variables and/or errors) equation systems; nonlinear FIML is more efficient than nonlinear 3SLS.

23
Limited Information Maximum Likelihood (Hayashi 53842; where t (L)t and j = (j+1 + j+2 + ). 1.19 Limited dependent variable models
MaCurdy) Consider the mth equation in the FIML frame-
Eventually, the time trend (assuming there is one, i.e., 6= 0)
work, and partition regressors into endogenous yt and
dominates the stochastic trend a.k.a. driftless random walk, Binary response model (D&M 5124, 51721; MaCurdy) Dependent
predetermined xt :
which dominates which dominates the stationary process. variable yt {0, 1} and information set at time t is t . Then
0
ytm = ztm m +t = yt0 m + x0t m +t
|{z} |{z} |{z} |{z} |{z} |{z} Brownian motion (Hansen B2-57, 18, 21; Hayashi 56772) The ran-
1Lm Lm 1 1Mm Mm 1 1Km Km 1
dom CADLAG (continuous from right, limit from left) func- index
where Lm = Mm + Km . tion W : [0, 1] R (i.e., an element of D[0, 1]) such that:
z }| {
E[yt |t ] = Pr[yt = 1|t ] F (h(xt , )) .
Combining with the relevant rows from the FIML reduced | {z }
form yt = 0 xt + vt gives a complete 1 + Mm equation sys- 1. W (0) = 0 almost surely; transformation
tem: 2. For 0 t1 < t2 < < tk 1, the random variables
yt + B xt = t W (t2 ) W (t1 ), W (t3 ) W (t2 ), . . . , W (tk ) W (tk1 ) If E[yt |t ] = F (x0t ), then FOCs for maximizing log like-
|{z} |{z} |{z} |{z} |{z} are independent normals with W (s)W (t) N(0, st); lihood are same as for weighted least squares with weights
(1+Mm )(1+Mm ) (1+Mm )1 (1+Mm )K K1 (1+Mm )1
3. W (t) is continuous in t with probability 1. [F (1 F )]1/2 , which is reciprocal of squareroot of error
with   variance.
y
yt tm , t tm vt ,
 
yt If {t } is a driftless I(1) process, so that t is zero-
mean I(0) with 2 the long-run variance of {t } and FOCs may not have a finite solution if data set doesnt iden-
 0
m 00
  
1 m0 0 Var(t ), then tify all parameters.
, B ,
0 IMm 0
and xt assumed to be the first Km elements of xt . T Z 1
1 X 2 d Probit model (Hayashi 4512, 460, 4667,4778; D&M 5145;
2
t1 W (r)2 dr MaCurdy)
We estimate this structural form by ML as in FIML. Estima- 2
T t=1 0 Binary response model with E[yt |t ] = (x0t ).
tor is consistent, and asymptotically normal even if t |xt is
T
not normal, as long as model is linear in ys and errors (not 1 X d Can also think of as yt = x0t + t where t N(0, 1) and
1
2 W (1)2 0

necessarily in regressors or parameters). It has the same t t1

T t=1 2 yt = Iyt >0 .
asymptotic distribution as 2SLS (again, if specification is Z 1
linear), but generally outperforms 2SLS in finite samples. 2 Log likelihood function (QML if non-iid ergodic stationary)
W (u) dW (u)
0

1.18 Unit root processes jointly (i.e., the W () are all the same).
X
yt (x0t ) + (1 yt )(1 (x0t ))
Integrated process of order 0 (Hayashi 558) A sequence of r.v.s t
Function Central Limit Theorem (Hansen B2-917) [Donskers
is an I(0) process if it:
Invariance Principle] A sequence {Yt } (where Y0 = 0) can
1. is strictly stationary; be mapped into a D[0, 1] function by is concave. MLE is identified iff E[xt x0t ] is nonsingular. MLE
P
2. has long-run variance jZ j (0, ). is consistent and asymptotically normal if identification con-
Wn (u) Ybunc dition holds, and if {yt , xt } is ergodic stationary.
It can be written as + ut , where {ut } is a zero-mean sta-
tionary process with positive, finite long-run variance. or equivalently by
Logit model (Hayashi 5081; D&M 515) Binary response model with
Integrated process of order d (Hayashi 559) A sequence of r.v.s t t+1
 E[yt |t ] = (x0t ) where () is the logistic function: (v)
{t } is an I(d) process if its dth differences d t (1L)d t Yt for u n
, n
. exp(v)/(1 + exp(v)).
are an I(0) process.
Pt
If Yt s=1 s for {s } iid white noise, then n1/2 Wn (u) Can also think of as yt = x0t + t where t have extreme
Unit root process (Hayashi 55960, 565-6) An I(1) process, a.k.a. d
n1/2 Ybunc
W (u), where W () is a Brownian motion. value distribution and yt = Iyt >0 .
difference-stationary process. If first-differences are mean-
zero, the process is called driftless.
Dickey-Fuller Test (Hayashi 5737; Hansen B2-24) Test whether an Log likelihood function (QML if non-iid ergodic stationary)
If t + ut + (L)t , the process can be written
AR(1) process has a unit root (the null hypothesis). In case
t
X without drift, use OLS to regress yt = yt1 + t where {t } X
t = 0 + t + us iid white noise. Under null, yt (x0t ) + (1 yt )(1 (x0t ))
s=1 t
t 1 P R1
X T t Yt1 t d 0W (u) dW (u)
= t + (1) s + t +(0 0 ) T (T 1) = 1 P 2

R1 DF.
s=1 T2 t Yt1
2
0 W (u) du is concave. MLE is identified iff E[xt x0t ] is nonsingular. MLE
|{z} | {z } |{z} is consistent and asymptotically normal if identification con-
time trend stoch. trend stationary proc. Tests with drift or non-zero mean covered in Hayashi. dition holds, and if {yt , xt } is ergodic stationary.

The last expression assumes {t } is iid white noise.

24
Truncated response model (Hayashi 5117; D&M 5347; MaCurdy) Maximum Likelihood is best estimation technique (called Cauchy-Schwarz Inequality (C&B 4.7.3; Wikipedia)
Model is yt = x0t 0 + t with t |xt N(0, 02 , but we only generalized Tobit here), but also sometimes estimated us- q
see observations where yt > c for some known threshold c. ing either |EXY | E |XY | (E X 2 )(E Y 2 ).
Density for each observation is
1. Heckman two-step: first use probit estimation on all This is Holders
p Inequality with p = q = 2. Implies that
f (yt ) observations to estimate , and then use OLS for ob- Cov(X, Y ) Var(X) Var(Y ).
f (yt |yt > c) = .
Pr(yt > c) servations where zt > 0 to estimate ; note the reported Equality obtains iff X and Y are linearly dependent.
covariance matrix in the second step will not be correct.
Note that if yt N(0 , 02 ), Minkowskis Inequality (C&B 4.7.5)
2. Nonlinear least squares: only uses any data on obser-
E[yt |yt > c] = 0 + 0 (v) vations where zt > 0; generally has identification prob- [E |X + Y |p ]1/p [E |X|p ]1/p + [E |Y |p ]1/p
Var[yt |yt > c] = 02 [1 0 (v)] = 02 1 (v)[(v) v] lems, since estimation is principally based on exploiting for p 1. Implies that if X and Y have finite pth moment,
 
nonlinearity of ()/(). then so does X + Y . Implies that E |X|p E |X|p .
c0 (v)
where v 0
and (v) 1(v)
is the inverse Mills ratio.
Jensens Inequality (C&B 4.7.7) If g() is a convex function, then
Could estimate 1.20 Inequalities E g(X) g(E X).
c x0t 0
 
Equality obtains iff, for every line a + bx tangent to g(x) at
E[yt |yt > c] = x0t 0 + 0 Bonferronis Inequality (C&B 1.2.1011) Bounds below the prob- x = E X, P (g(X) = a + bX) = 1 (i.e., if g() is linear almost
0
ability of an intersection in terms of individual events: everywhere in the support of X).
by nonlinear least squares, but ML is preferred because it is P (A B) P (A) + P (B) 1; or more generally
asymptotically more efficient (and we have already imposed
full distributional assumptions). n
\  n
X n
X 1.21 Thoughts on MaCurdy questions
P Ai P (Ai ) (n 1) = 1 P (AC
i ).
Extension to nonlinear index function (in place of x0t ) is i=1 i=1 i=1 Default answer
straighforward.
Booles Inequality (C&B 1.2.11) Bounds above the probability If we assume , we know that is consis-
Tobit model (Hayashi 51821; D&M 53742; MaCurdy) [a.k.a. censored
of
S 
tent, assuming correct specification and e.g., ergodic station-
response model] All observations are observed, but we do not P a union in terms of individual events: P i Ai
arity, regressors orthogonal to the contemporary error term,
see yt if it is less than some known threshold c; like a trun- i P (Ai ).
instruments orthogonal to the contemporary error term, ap-
cated regression model combined with a probit model, with
Chebychevs Inequality (C&B 3.6.12, 5.8.3; Greene 66) A widely propriate rank condition(s) for identification.
the coefficient vectors restricted to be proportional to each
other. applicable but necessarily conservative inequality: for any We can test the null hypothesis that , which corre-
r > 0 and g(x) nonnegative, sponds to = 0, using a Wald test on Model . The
Model is statistic W is asymptotically distributed 2 ( )
yt = max{x0t 0 + t , c} P (g(X) r) E g(X)/r. under the null, thus we can(not) reject the null hypothesis.
| {z }
yt Note we have used normal/robust standard errors.
Letting g(x) = (x )2 / 2 where = E X and 2 = Var X
where t |xt N(0, 02 ) and {xt , yt } iid. gives P (|X | t) t2 . We can also test the null hypothesis using a distance prin-
ciple test, where model is the unrestricted model, and
Log-likelihood (a mix of density and mass) is See C&B 5.8.3 for a (complex) form that applies to sample model represents the restrictions imposed by the null
mean and variance rather than the (presumably unknown) hypothesis. Using a suitably normalized distance func-
yt x0t 0 c x0t 0
    
X 1 X
log + log . population values. tion QT , the statistic LR 2T (Qr Qu ) =
0 0 0
y >c
t y =c t is asymptotically distributed 2 ( ) under the
Markovs Inequality (C&B 3.8.3) Like Chebychev, but imposes null.
Reparametrize to get concavity. conditions on Y to provide more information at equality
[Note we cannot use the multiple equation 2SLS estimates to
Sample selection model (D&M 5425; MaCurdy) Rather than trun- point. If P (Y 0) = 1 (i.e., Y is nonnegative) and
conduct a distance principle test, since generating a suitably
cating according to dependent variable, truncate on another P (Y = 0) < 1 (i.e., Y not trivially 0), and r > 0, then
normalized distance function would require having estimates
variable correlated with it. Suppose yt (e.g., wage), and zt P (Y r) E Y /r with equality iff P (Y = r) = 1 P (Y =
for the variance of the error for each equation (along with
(e.g., weeks) are latent variables with 0).
the SSR for each equation).]
[Note that the validity of this distance function requires the
   0      2
Numerical inequality lemma (C&B 4.7.1) Let a and b be any
  
yt xt u ut
= 0 + t , N 0, , positive numbers, p and q positive (necessarily > 1) satis- restricted and unrestricted models to use the same weighting
zt wt vt vt 1
fying p1 + 1q = 1. Then p1 ap + 1q bq ab, with equality iff matrix; since 6= , the test statistic is not
and xt and wt exogenous or predetermined. We observe zt ap = bq . valid. However, given the regression output available, this is
(the sign of zt ) and yt = yt if zt > 0. the best we can do.]
For observations where zt > 0 (and hence we observe yt ), Holders Inequality (C&B 4.7.2) Let positive p and q (necessarily Thus we can(not) reject the null hypothesis.
model becomes > 1) satisfy p1 + 1
q
= 1, then
If we instead assume , our tests are no longer valid;
(wt0 ) in this case is consistent (again, assuming correct
yt = xt0 + + t . p 1/p q 1/q
|EXY | E |XY | (E |X| ) (E |Y | ) . specification), so. . .
(wt0 )

25
Thoughts on models 2SLS should generally have better finite sample 3. 2SLS (single-equation): 12 e0 Pz e
properties since it uses a priori knowledge of the Reported value is usually e0 Pz e; required normalization
1. Two models with and without a restriction is for LR form of the S matrix. is 1/ 2 .
testing.
FIML, LIML, 3SLS, and 2SLS are all asymptoti- Invalid under heteroscedasticity (not efficient
2. A model that includes only fitted values of an endoge- cally equivalent. GMM).
nous variable is 2SLS. Validity under serial correlation???
3. A model that includes both a (potentially) endogenous Hypothesis tests Make sure to use the same estimate for 2 in
variable and its fitted values or residuals is for a DWH both restricted and unrestricted models (typically
test. Wald: basically always possible as long as estimator is con- SSR/n or SSR/(nk) from one of the two models).
4. A model that includes lagged residuals is for testing sistent and standard errors are calculated appropriately 4. SUR / JGLS: e0 ( b 1 I)e
serial correlation. Reported value is usually as desired; no normalization
Be careful when using formulae from Hayashi; they are is needed.
generally based on the asymptotic variance matrix; re-
Which model and standard errors to use 1
gression output doesnt estimate this, it estimates n Invalid under heteroscedasticity (across
times the asymptotic variance matrix. observationsnot efficient GMM; however, its fine
A less restrictive model will still be consistent and asymp- with heteroscedasticity across equations).
State assumptions that justify consistency of estimation Validity under serial correlation???
totically more efficient; it is only harmed on finite sample
technique, and standard errors used.
properties; same goes for robust standard errors. Watch out: this isnt actually valid, since re-
stricted and unrestricted estimation will use a dif-
Check for exact identificationmodels often simplify in these Distance principle / LR
ferent weighting matrix; however, we often com-
cases.
ment that this isnt valid, but use it anyway since
Need to use 2n times a suitably normalized maximized
Look ahead at all the questions; often its best to make an thats the best output available.
objective function, i.e., one that satisfies the prop-
assumption early, since later questions will ask you to give 5. 2SLS (multiple-equation):
erty:
up that assumption. 1
"
QT
# 12
2 QT

1. Heteroscedasticity across observations a
Var T = . 0 ..

e Pz
e

0 0 0 .
Parametrize and use GLS. 2
M
Parametrize and use ML. 1
For efficient GMM, QT (X 0 e)0
b 1 (X 0 e)
satisfies Reported value is usually e0 (I Pz )e; no normalization
Others? 2
the property,
where
b consistently estimates the vari- can get us a usable distance function.
2. Homoscedasticity (across observations) ance of T X 0 e. Invalid under heteroscedasticity (across
3SLS is efficient GMM. We cannot generally use LR tests in the presence of observationsnot efficient GMM; however, its
FIML is asymptotically equivalent to 3SLS. heteroscedasticity. fine with heteroscedasticity across equations).
Invalid if errors are correlated across equations
3. Homoscedasticity (across observations), uncorrelated For GMM estimators, must use efficient weighting ma- (i.e., Phi is not a diagonal matrix)
errors across equations (i.e., diagonal) trix for LR to be valid.
Validity under serial correlation???
3SLS is efficient GMM; 2SLS is not efficient GMM Restricted and unrestricted estimates should use same I dont think any of the regression programs actu-
(e.g., cannot use minimized distance function for weighting matrix to be valid. ally give this output (they typically give e0 (IPz )e;
LR testing), but it is asymptotically distributed maximizing this objective function gives the same
identically to 3SLS Including the 2n and the normalization, use:
estimator, but not the same maximized objective
2SLS should generally have better finite sample function); thus we cant do LR testing here here,
1. MLE: 2 loglik
properties since it uses a priori knowledge of the even though it may be theoretically valid.
form of the S matrix (it assumes that S is 2 I; even Reported value is usually loglik; required normalization
is 2. 6. 3SLS: e0 ( b 1 Pz )e
though it is actually only diagonal, the resulting es-
Reported value is usually as desired; no normalization
timator is numerically identical to what we would Valid no matter what if model is correctly specified. is needed.
get if we only assumed diagonal S: equation-by-
equation 2SLS). 2. LS: SSR/ 2 = e0 e/ 2 Invalid under heteroscedasticity (across
Reported value is usually SSR; required normalization observationsnot efficient GMM; however, its
FIML, LIML, 3SLS, and 2SLS are all asymptoti-
is 1/ 2 . fine with heteroscedasticity across equations).
cally equivalent.
Validity under serial correlation???
4. Homoscedasticity (across observations), uncorrelated Invalid under heteroscedasticity.
Watch out: this isnt actually valid, since re-
errors across equations, homoscedasticity across equa- Validity under serial correlation??? stricted and unrestricted estimation will use a dif-
tions (i.e., = 2 I) Make sure to use the same estimate 2 in both re- ferent weighting matrix; however, we often com-
2SLS and 3SLS are both efficient GMM; they are stricted and unrestricted models (typically SSR/n ment that this isnt valid, but use it anyway since
asymptotically equal and identically distributed. or SSR/(n k) from one of the two models). thats the best output available.

26
Multiple equation systems Finding asymptotic distributions Choice rule (Choice 6) Given a choice set B and preference rela-
tion %, choice rule C(B, %) {x B : y B, x % y}. This
1. Structural equation: can have multiple endogenous 1. CLTs for correspondence gives the set of best elements of B.
variables; typically written as explicitly solved for one
endogenous variable. iid sample If % is complete and transitive and |B| finite and non-empty,
niid sample then C(B, %) 6= .
2. Reduced form equation: only one endogenous vari-
able per equation (and one equation per endogenous ergodic stationary process Revealed preference (Choice 68; MWG 11) We observe choices
variable). ergodic stationary mds and deduce a preference relation. Consider revealed pref-
erences C : 2B 2B satisfying A, C(A) A. Assuming
MA()
the revealed preference sets are always non-empty, there is
Instruments 2. Delta method a well-defined preference relation % (complete and transi-
tive) satisfying A, C(A, %) = C(A) iff C satisfies HARP
1. Must be correlated with endogenous variable and un- 3. MLE is asymptotically normal with variance equal to (or WARP, and the set of budget sets contains all subsets of
correlated with error. inverse of Fisher info (for a single observation, not joint up to three elements).
2. Typically things that are predeterminednot meant distribution)
technically, they were actually determined before the Houthakers Axiom of Revealed Preferences (Choice 6-88) A
endogenous variable. set of revealed preferences C : 2B 2B satisfies HARP iff
Unreliable standard errors x, y U V such that x C(U ) and y C(V ), it is also
3. Predetermined variables, lagged endogenous variables, the case that x C(V ) and y C(U ). In other words, sup-
interactions of the above. 1. Remember standard errors are asymptotic estimates ex- pose two different choice sets both contain x and y; if x is
4. In 2SLS, check the quality of instruments correlation cept in Gaussian OLS; therefore finite sample inference preferred to all elements of one choice set, and y is preferred
with endogenous variable (note this does not check the may be inaccurate. to all elements of the other, then x is also preferred to all
lack of correlation with error) by looking at the first 2. If you run a multiple-stage regression technique in sepa- elements of the second, and y is also preferred to all elements
stage regression: rate stages (e.g., sticking in fitted values along the way). of the first.
R2 ; Weak Axiom of Revealed Preference (MWG 1011) Choice
3. If you stick something into the regression that doesnt
t-statistics on each instrument; belong (e.g., fitted values for a DWH testalthough structure (B, C())where B is the set of budget sets
F -statistic for model as a whole, and potentially for some reason this one may be OK, inverse Mills for satisfies WARP iff B, B 0 B; x, y B; x, y B 0 ;
F -statistic on excluded instruments. sample selection, . . . ). x C(B); y C(B 0 ) together imply that x C(B 0 ).
(Basically, HARP, but only for budget sets).
4. Heteroscedastic errors (when not using robust standard
Omitted variables errors). Generalized Axiom of Revealed Preference (Micro P.S. 1.3) A
set of revealed preferences C : A B satisfies GARP if for
When we get asked to find out what happens if variables are 5. Serially correlated errors (when not using HAC stan- any sequences A1 , . . . , An and x1 , . . . , xn where
excluded (i.e., an incorrectly specified model is estimated), a dard errors).
good tool is the Frisch-Waugh Theorem. 1. i {1, . . . , n}, xi Ai ;
6. Poor instruments (see above).
2. i {1, . . . , n 1}, xi+1 C(Ai );
7. When regression assumptions fail ( e.g., using regular 3. x1 C(An );
Finding probability limits/showing consistency standard errors when inappropriate, failure of fourth
moment assumptions, . . . ). then xi C(Ai ) for all i. (That is, there are no revealed
1. Solve as explicit function of data and use LLN with
preference cycles except for revealed indifference.)
CMT/Slutsky.
2. Solve as explicit function of data and show bias 0 Utility function (Choice 9-13; MWG 9) Utility function u : X R
and variance 0. 2 Microeconomics represents % on X iff x % y u(x) u(y).
3. Solve explicitly, find the probability that | 0 | < , 1. This turns choice rule into a maximization problem:
and show the limit of this probability is 1 (per C&B 2.1 Choice Theory C(B, %) = argmaxyB u(y).
468). 2. A preference relation can be represented by a utility
4. MaCurdy: if the estimator is defined by LT () Rational preference relation (Choice 4; MWG 67) A binary rela- function only if it is rational (complete and transitive).
1 P tion % is a rational preference relation iff it satisfies 3. If |X| is finite, then any rational preference relation %
T t lt () = 0, show that lt (0 ) niid(0), and that
p can be represented by a utility function; if |X| is infinite,
it satisfies an LLN so that LT (0 )
0.
1. Completeness: x, y, x % y y % x (NB: implies x % x); this is not necessarily the case.
5. MaCurdy: if the estimator is defined P by minimiz- 4. If X Rn , then % (complete, transitive) can be repre-
ing HT ()0 MT HT () with HT T1 2. Transitivity: x, y, z, (x % yy % z) = x % z (which
t ht , show that sented by a continuous utility function iff % is contin-
ht (0 ) niid(0), and that it satisfies an LLN so that rules out cycles, except where theres indifference).
p
uous (i.e., limn (xn , yn ) = (x, y) and n, xn % yn
HT (0 ) 0. imply x % y).
If % is rational, then  is both irreflexive and transitive;
6. Aprajit/Hayashi: general consistency theorems with is reflexive, transitive, and symmetric; and x  y % z = 5. The property of representing % on X is ordinal (i.e.,
and without compact parameter space. x  z. invariant to monotone increasing transformations).

27
Interpersonal comparison (Choice 13) It is difficult to weigh util- Quasi-linear preferences (Choice 201; MWG 45) Suppose % on Constant returns to scale (Producer 5) y Y implies y Y
ity tradeoffs between people. Two possible systems are X = R Y is complete and transitive, and that for all 0; i.e., nonincreasing and nondecreasing returns
Rawls veil of ignorance (which effectively makes all the to scale.
choices one persons), and a system of just noticeable dif- 1. The numeraire good (good 1) is valuable: (t, y) %
ferences (which suffers transitivity issues). (t0 , y) iff t t0 ; Convexity (Producer 6) y, y 0 Y imply ty + (1 t)y 0 Y for all
t [0, 1]. Vaguely nonincreasing returns to specialization.
2. Compensation is possible: For every y, y 0 Y , there If 0 Y , then convexity implies nonincreasing returns to
Continuous preference (Choice 11; MWG 467, Micro P.S. 1-5) % on
exists some t R such that (0, y) (t, y 0 ); scale. Strictly convex iff for t (0, 1), the convex combina-
X is continuous if for any sequence {(xn , yn )} n=1 with
limn (xn , yn ) = (x, y) and n, xn % yn , we have x % y. 3. No wealth effects: If (t, y) % (t0 , y 0 ), then for all d R, tion is in the interior of Y .
Equivalently, % is continuous iff for all x, the upper and (t + d, y) % (t0 + d, y 0 ).
Transformation function (Producer 6, 24) A function T : Rn R
lower contour sets of x are both closed sets. % is rational
with T (y) 0 y Y . Can be interpreted as the
and continuous iff it can be represented by a continuous util- Then there exists a utility function representing % of the
amount of technical progress required to make y feasible.
ity function. form u(t, y) = t + v(y) for some v : Y R. (Note it can also
The set {y : T (y) = 0} is the transformation frontier.
be represented by utility functions that arent of this form.)
Monotone preference (Choice 156; MWG 423) % is monotone iff Conversely, any preference relation % on X = R Y rep- Kuhn-Tucker FOC gives necessary condition T (y ) = p,
x y = x % y (i.e., more of something is better). resented by a utility function of the form u(t, y) = t + v(y) which means the price vector is normal to the production
(MWG uses x  y = x  y; this is not equivalent.) satisfies the above conditions. (MWG uses slightly different possibility frontier at the optimal production plan.
formulation.)
Strictly/strongly monotone iff x > y = x  y. Marginal rate of transformation (Producer 67) When the
Lexicographic preferences (MWG 46) A preference relation % on transformation function T is differentiable, MRT between
% is (notes) monotone iff u() is nondecreasing. % is strictly T (y) T (y)
monotone iff u() monotone increasing. Strictly monotone R2 defined by (x, y) % (x0 , y 0 ) iff x > x0 or x = x0 y y 0 . goods k and l is MRTk,l (y) y / y . Measures
l k
= (notes or MWG) monotone. MWG monotone = Lexicographic preferences are complete, transitive, strongly the extra amount of good k that can be obtained per unit
locally non-satiated (on e.g., Rn monotone, and strictly convex; however, they is not contin- reduction of good l. Equals the slope of the transformation
+ ).
uous and cannot be represented by any utility function. frontier.
Locally non-satiated preference (Choice 156; MWG 423) % is lo-
cally non-satiated on X iff for any y X and > 0, there Production function (Producer 7) For a firm with only a single
exists x X such that kx yk and x  y (i.e., there 2.2 Producer theory output q (and inputs z), defined as f (z) max q such that
are no thick indifference curves). % is locally non-satiated T (q, z) 0. Thus Y = {(q, z) : q f (z)}, allowing for
iff u() has no local maxima in X. Strictly monotone = Competitive producer behavior (Producer 12) Firms choose a free disposal.
MWG monotone = locally non-satiated (on e.g., Rn + ).
production plan (technologically feasible set of inputs and
Marginal rate of technological substitution (Producer 7)
outputs) to maximize profits. Assumptions include:
When the production function f is differentiable, MRTS
Convex preference (Choice 156; MWG 445) % is convex on X iff f (z) f (z)
(de facto, X is a convex set, and) x % y and x0 % y together 1. Firms are price takers (applies to both input and output between goods k and l is MRTSk,l (z) z / z . Mea-
l k
imply that t (0, 1), tx + (1 t)x0 % y (i.e., one never gets markets); sures how much of input k must be used in place of one unit
worse off by mixing goods). Equivalently, % is convex on X of input l to maintain the same level of output. Equals the
2. Technology is exogenously given; slope of the isoquant.
iff the upper contour set of any y X (i.e., {x X : x % y})
is a convex set. Can be interpreted as diminishing marginal 3. Firms maximize profits; should be true as long as
Profit maximization (Producer 78) The firms optimal produc-
rates of substitution. The firm is competitive; tion decisions are given by correspondence y : Rn Rn
% is strictly convex on X iff X is a convex set, and x % y There is no uncertainty about profits;
and x0 % y (with x 6= x0 ) together imply that t (0, 1), y(p) argmax p y = {y Y : p y = (p)}.
Managers are perfectly controlled by owners. yY
tx + (1 t)x0  y.
% is (strictly) convex iff u() is (strictly) quasi-concave. Production plan (Producer 4) A vector y = (y1 , . . . , yn ) Rn Resulting profits are given by
where an output has yk > 0 and an input has yk < 0.
Homothetic preference (MWG 45, Micro P.S. 1.6) % is homothetic (p) sup p y.
yY
iff for all > 0, x % y x % y. (MWG uses 0, Production set (Producer 4) Set Y Rn of feasible production
x y = x y.) A continuous preference relation is plans; generally assumed to be non-empty and closed. Rationalization: profit maximization functions (Producer 9
homothetic iff it can be represented by a utility function that 11, 13)
is homogeneous of degree one (note it can also be represented Free disposal (Producer 5) y Y and y0 y imply y0 Y.
by utility functions that arent). 1. Profit function () is rationalized by production set Y
Shutdown (Producer 5) 0Y. iff p, (p) = supyY p y.
Separable preferences (Choice p.189) Suppose % on X Y is rep-
2. Supply correspondence y() is rationalized by produc-
resented by u(x, y). Then preferences over x do not depend Nonincreasing returns to scale (Producer 5) y Y implies y tion set Y iff p, y(p) argmaxyY p y.
on y iff there exist functions v : X R and U : R Y R Y for all [0, 1]. Implies shutdown.
such that U is increasing in its first argument and (x, y), 3. () or y() is rationalizable if it is rationalized by some
u(x, y) = U (v(x), y). Note that this property is asymmet- Nondecreasing returns to scale (Producer 5, Micro P.S. 2-1) y Y production set.
ric. Preferences over x given y will be represented by v(x), implies y Y for all 1. Along with shutdown, implies 4. () and y() are jointly rationalizable if they are both
regardless of y. that (p) = 0 or (p) = + for all p. rationalized by the same production set.

28
We seek a Y that rationalizes both y() and (). Consider an The latter two properties imply WAPM. The second de- Single-output case (Producer 22) For a single-output firm with
inner bound: allSproduction plans the firm chooses must scribes the FOC of the maximization problem, the third term free disposal, production set described as {(q, z) : z
be feasible (Y I pP y(p)). Consider an outer bound: describes the second-order condition. Rm+ , q [0, f (z)]}. With positive output price p,
Y can only include points that dont give higher profits than profit-maximization requires q = f (z), so firms maximize
(p) (Y O {y : p y (p) for all p P }). A nonempty- Rationalization: differentiable y() (Producer 16) Differentiable maxzRm +
pf (z) w z, where w Rm
+ input prices.
valued supply correspondence y() and profit function () y : P Rn on an open convex set P Rn is rationalizable
on a price set are jointly rationalized by production set Y iff: iff Cost minimization (Producer 22, Micro P.S. 2-4) For a fixed output
level q 0, firms minimize costs, choosing inputs according
1. p y = (p) for all y y(p) (adding-up); 1. y() is homogeneous of degree zero; to a conditional factor demand correspondence:
2. Y I Y Y O ; i.e., p y 0 (p) for all p, p0 , and all 2. The Jacobian Dy(p) is symmetric and positive semidef-
inite. c(q, w) inf w z;
y 0 y(p0 ) (Weak Axiom of Profit Maximization). z : f (z)q

We construct () by adding-up, ensure Hotellings Lemma Z (q, w) argmin w z
If we observe a firms choices for all positive price vectors on z : f (z)q
an open convex set P , then necessary conditions for ratio- by symmetry and homogeneity of degree zero, and ensure
nalizability include: convexity of () by Hotellings lemma and PSD. = {z : f (z) q, and w z = c(q, w)}.

Rationalization: differentiable () (Producer 17) Differentiable Once these problems are solved, firms solve maxq0 pq
1. () must be a convex function;
: P R on a convex set P Rn is rationalizable iff c(q, w).
2. () must be homogeneous of degree one; i.e., (p) = c
(p) for all p P and > 0; 1. () is homogeneous of degree one; By the envelope theorem, w
(w, q) = Z (q, w).

3. y() must be homogeneous of degree zero; i.e., y(p) = 2. () is convex. Rationalization: single-output cost function (Producer 23, Mi-
y(p) for all p P and > 0. cro P.S. 2-2) Conditional factor demand function z : R W
We construct y() by Hotellings Lemma, and ensure adding- Rn and differentiable cost function c : R W R for a fixed
Loss function (Producer 12) L(p, y) (p) p y. This is the up by homogeneity of degree one; convexity of () is given. output q on an open convex set W Rm of input prices are
loss from choosing y rather than the profit-maximizing fea- jointly rationalizable iff
sible production plan. The outer bound can be written Rationalization: general y() and () (Producer 179) y : P
Y 0 = {y : inf p L(p, y) 0}. Rn and : P R on a convex set P Rn are jointly ratio- 1. c(q, w) = w z(q, w) (adding-up);
nalizable iff for any selection y(p) y(p),
2. w c(q, w) = z(q, w) (Shephards Lemma);
Hotellings Lemma (Producer 14) (p) = y(p), assuming differ-
entiable (). Equivalently, p L(p, y)|p=p0 = (p0 ) y = 0 1. p y(p) = (p) (adding-up); 3. c(q, ) is concave.
for all y y(p0 ). An example of the Envelope Theorem. Im- 2. (Producer Surplus Formula) For any p, p0 P ,
plies that if () is differentiable at p, then y(p) must be a Other necessary properties follow from corresponding prop-
singleton.
Z 1 erties of profit-maximization, e.g.,
(p0 ) = (p) + (p0 p) y(p + (p0 p)) d;
0
Substitution matrix (Producer 156) The Jacobian of the optimal 1. c(q, ) is homogeneous of degree one in w;
supply function, Dy(p) = [yi /pj ]. By Hotellings Lemma, 3. (p0 p) (y(p0 ) y(p)) 0 (Law of Supply). 2. Z (q, ) is homogeneous of degree zero in w;
Dy(p) = D2 (p) (the Hessian of the profit function), hence
the substitution matrix is symmetric. Convexity of () im- 3. If Z (q, ) is differentiable, then the matrix
Producer Surplus Formula (Producer 1720) (p0 ) = (p) +
plies positive semidefiniteness. R1 0 0
Dw Z (q, w) = Dw
2 c(q, w) is symmetric and negative
0 (p p) y(p + (p p)) d. semidefinite;
Law of Supply (Producer 16) (p0 p) (y(p0 ) y(p)) 0; i.e., sup- 1. Works in the opposite direction of Hotellings Lemma: 4. Under free disposal, c(, w) is nondecreasing in q;
ply curves are upward-sloping. Law of Supply is the finite- it recovers the firms profits from its choices, rather than
difference equivalent of PSD of substitution matrix. Follows 5. If the production function has nondecreasing (nonin-
the other way around. creasing) RTS, the average cost function c(q, w)/q is
from WAPM (p y(p) p y(p0 )).
2. If () is differentiable, integrating Hotellings Lemma nonincreasing (nondecreasing) in q;
Rationalization: y() and differentiable () (Producer 15)
along the linear path from p to p0 gives PSF; however 6. If the production function is concave, the cost function
y : P Rn (the correspondence ensured to be a func- PSF is more general (doesnt require differentiability of c(q, w) is convex in q.
tion by Hotellings lemma, given differentiable ()) and ()).
differentiable : P R on an open convex set P Rn are 3. As written the integral is along a linear path, but it is Monopoly pricing (MWG 3847) Suppose demand at price p is
jointly rationalizable iff actually path-independent. x(p), continuous and strictly decreasing at all p for which
x(p) > 0. Suppose the monopolist faces cost function
1. p y(p) = (p) (adding-up); 4. PSF allows calculation of change in profits when price of
c(q). Monopolist solves maxp px(p) c(x(p)) for optimal
good i changes by knowing only the supply function for
price, or maxq0 p(q)q c(q) for optimal quantity (where
2. (p) = y(p) (Hotellings Lemma); good i; we need not know the prices or supply functions
p() = x1 () is the inverse demand function). Further as-
for other goods: (pi , b) (pi , a) = ab yi (pi ) dpi .
R
3. () is convex. sumptions:

If Y is convex and closed and has free disposal, and P = Rn O
+ \ {0}, then Y = Y .

29
1. p(q), c(q) continuous and twice differentiable at all Increasing differences (Producer 301, Micro P.S. 2-4) F : X T Milgrom-Shannon Monotonicity Theorem (Producer 34)
q 0; R (with X, T R) has ID (a.k.a. weakly increasing differ- X (t) argmaxxX F (x, t) is nondecreasing in t in SSO for
ences) iff for all x0 > x and t0 > t, F (x0 , t0 ) + F (x, t) all sets X R iff it has the single-crossing condition (which
2. p(0) > c0 (0) (ensures that supply and demand curves
F (x0 , t) + F (x, t0 ). Strictly/strongly increasing differences is non-symmetric): for all x0 > x and t0 > t,
cross);
(SID) iff F (x0 , t0 ) + F (x, t) > F (x0 , t) + F (x, t0 ).
3. There exists a unique socially optimal output level
Assuming F (, ) is sufficiently smooth, all of the following F (x0 , t) F (x, t) = F (x0 , t0 ) F (x, t0 ), and
q O (0, ) such that p(q o ) = c0 (q o ).
are equivalent:
F (x0 , t) > F (x, t) = F (x0 , t0 ) > F (x, t0 ).
A solution qm [0, q o ]
exists, and satisfies FOC p0 (q m )q m +
1. F has ID;
p(q m ) = c0 (q m ). If p0 (q) < 0, then p(q m ) > c0 (q m ); i.e.,
monopoly price exceeds optimal price. 2. Fx (x, t) is nondecreasing in t for all x; MCS: robustness to objective function perturbation
3. Ft (x, t) is nondecreasing in x for all t; (Producer 345) [Milgrom-Shannon] X (t)
argmaxxX [F (x, t) + G(x)] is nondecreasing in t in SSO
2.3 Comparative statics 4. Fxt (x, t) 0 for all (x, t); for all functions G : X R iff F () has ID. Note Topkis
5. F (x, t) is supermodular. gives sufficiency of ID.
Implicit function theorem (Producer 278) Consider x(t)
argmaxxX F (x, t). Suppose:
Additional results: Complement inputs (Producer 402) Restrict attention to price
1. F is twice continuously differentiable; vectors (p, w) Rm+1
+ at which input demand correspon-
1. If F (, ) and G both have ID, then for all , 0, the dence z(p, w) is single-valued. If production function f (z) is
2. X is convex; function F + G also has ID. increasing and supermodular, then z(p, w) is nondecreasing
2. If F has ID, and g1 () and g2 () are nondecreasing func- in p and nonincreasing in w. That is, supermodularity of the
3. Fxx < 0 (strict concavity of F in x; together with con-
tions, then F (g1 (), g2 ()) has ID. production function implies price-theoretic complementarity
vexity of X, this ensures a unique maximizer);
of inputs.
4. t, x(t) is in the interior of X. 3. Suppose h() is twice differentiable. Then h(x t) has
ID in x, t iff h() is concave. If profit function (p, w) is continuously differentiable, then
Then the unique maximizer is given by Fx (x(t), t) = 0, and zi (p, w) is nonincreasing in wj for all i 6= j iff (p, w) is
Supermodularity (Producer 37) F : X Rn on a sublattice X is supermodular in w.
Fxt (x(t), t) supermodular iff for all x, y X, we have F (x y) + F (x
x0 (t) = . y) F (x) + F (y).
Fxx (x(t), t) Substitute inputs (Producer 412) Suppose there are only two in-
If X is a product set, F () is supermodular iff it has ID in all puts. Restrict attention to price vectors (p, w) R3+ at which
Note by strict concavity, the denominator is negative, so x0 (t) pairs (xi , xj ) with i 6= j (holding other variables xij fixed). input demand correspondence z(p, w) is single-valued. If pro-
and Fxt (x(t), t) have the same sign. duction function f (z) is increasing and submodular, then
Submodularity (Producer 41) F () is submodular iff F () is su- z1 (p, w) is nondecreasing in w2 and z2 (p, w) is nondecreas-
Envelope theorem (Clayton Producer I 68) Given a constrained op- permodular. ing in w1 . That is, submodularity of the production function
timization v() = maxx f (x, ) such that g1 (x, ) b1 ; . . . ; implies price-theoretic substitutability of inputs in the two
gK (x, ) bK , comparative statics on the value function are Topkis Theorem (Producer 312, 8) If input case.
given by:
If there are 3 inputs, feedback between inputs with un-
1. F : X T R (with X, T R) has ID,
K changing prices makes for unpredictable results.
v f X gk L
= k = 2. t0 > t,
i i x k=1
i x
i x If profit function (p, w) is continuously differentiable, then
3. x X (t) argmaxX F (, t), and x0 X (t0 ), then zi (p, w) is nondecreasing in wj for all i 6= j iff (p, w) is
(for Lagrangian L) for all such that the set of binding con- submodular in w.
straints does not change in an open neighborhood. min{x, x0 } X (t) and max{x, x0 } X (t0 ). In other
words, X (t) X (t0 ) in strong set order. This implies
Can be thought of as application first of chain rule, and then sup X () and inf X () are nondecreasing; if X () is single- LeChatelier principle (Producer 4245) Argument (a.k.a
of FOCs. valued, then X () is nondecreasing. Samuelson-LeChatelier principle) that firms react more
to input price changes in the long-run than in the short-
If F : X T R (with X a lattice and T fully ordered) is run, because it has more inputs that it can adjust. Does
Envelope theorem (integral form) (Clayton Producer II 910)
supermodular in x and has ID in (x, t); t0 > t; and x X (t) not consistently hold; only holds if each pair of inputs are
[a.k.a. Holmstroms Lemma] Given an optimization v(q) =
and x0 X (t0 ), then (x x0 ) X (t) and (x x0 ) X (t0 ).
maxx f (x, q), the envelope theorem gives us v 0 (q) = substitutes everywhere or complements everywhere.
In other words, X () is nondecreasing in t in the stronger
fq0 (x(q), q). Integrating gives
set order. Suppose twice differentiable production function f (k, l) sat-
Z q2 isfies either fkl 0 everywhere, or fkl 0 everywhere. Then
f
v(q2 ) = v(q1 ) + (x(q), q) dq. Monotone Selection Theorem (Producer 32) Analogue of Top- if wage wl increases (decreases), the firms labor demand will
q1 q kis Theorem for SID. If F : X T R with X, T R has decrease (increase), and the decrease (increase) will be larger
SID, t0 > t, x X (t), and x0 X (t0 ), then x0 x. in the long-run than in the short-run.

30
2.4 Consumer theory 2. For z z(p) (where z() is excess demand z(p) 3. Slutsky matrix is negative semidefinite (since e(, u) is
x(p, p e) e), we have p z = 0; concave in prices),
vp
Roy: x= v/w 3. v(p, w) is nonincreasing in p and strictly increasing in 4. Slutsky matrix satisfies Dp h(p, u)p = 0 (i.e., h(, u is
r w. homogeneous of degree zero in prices).
x(p, w)
K S 2 v(p,O w)
v=u(x) Expenditure minimization problem (Consumer 9) minxRn +
p Rationalization: differentiable x (?) Slutsky matrix can be
x such that u(x) u; where u > u(0) and p  0. Finds the generated using Slutsky equation. Rationalizability requires
x x h x=h(p,v) v(p,e)=u
Slutsky: p i + wi xj = p i
h=x(p,e) e(p,v)=w
cheapest bundle that yields utility at least u. Equivalent to Marshallian demand to be homogeneous of degree 0, and
j j
cost minimization for a single-output firm with production the Slutsky matrix to be symmetric and negative semidefi-
function u. nite. [Potentially also positive everywhere and/or increasing
 Shephard: h=p e
r  in w?]
If p  0, u() is continuous, and x such that u(x) u, then
h(p, u) 2 e(p, u) EMP has a solution.
Adding-up: e=ph Rationalization: differentiable e (?) Rationalizability re-
Expenditure function (Consumer 9) e(p, u) minxRn p x such quires e to be:
+
Budget set (Consumer 2) Given prices p and wealth w, B(p, w)
that u(x) u.
{x Rn
+ : p x w}. 1. Homogeneous of degree one in prices;
Hicksian demand correspondence (Consumer 9) [a.k.a. com- 2. Concave in prices;
Utility maximization problem (Consumer 12, 68)
pensated demand] h : Rn n
+ R+ R+ with h(p, u) {x
maxxRn u(x) such that p x w, or equivalently 3. Increasing in u;
+ Rn+ : u(x) u} = argmin xRn p x such that u(x) u.
maxxB(p,w) u(x). Assumes: +
4. Positive everywhere, or equivalently nondecreasing in
Relating Marshallian and Hicksian demand (Consumer 10) all prices.
1. Perfect information,
Suppose preferences are continuous and locally non-satiated,
2. Price taking, and p  0, w 0, u u(0). Then: Slutsky equation (Consumer 134) Relates Hicksian and Marshal-
3. Linear prices, lian demand. Suppose preferences are continuous and lo-
1. x(p, w) = h(p, v(p, w)), cally non-satiated, p  0, and demand functions h(p, u) and
4. Divisible goods.
2. e(p, v(p, w) = w, x(p, w) are single-valued and differentiable. Then for all i, j,
P
Construct Lagrangian L = u(x) + (w p x) + i i xi . 3. h(p, u) = x(p, e(p, u)), xi (p, w) hi (p, u(x(p, w))) xi (p, w)
If u is concave and differentiable, Kuhn-Tucker conditions = xj (p, w),
4. v(p, e(p, u) = u. pj pj w
(FOCs, nonnegativity, complementary slackness) are neces-
} | {z }
sary and sufficient. Thus u/xk pk with equality if
| {z } | {z
Rationalization: h and differentiable e (Consumer 11) Hicksian Total Substitution Wealth
xk > 0. For any two goods consumed in positive quantities, demand function h : P R+ Rn
u/x + and differentiable expen-
xi hi xi
pj /pk = u/x j MRSjk . The Lagrange multiplier on the diture function e : P R R on an open convex set P Rn or more concisely, pj
= pj
w j
x .
k
budget constraint is the value in utils of an additional unit are jointly rationalizable by expenditure-minimization for a
of wealth; i.e., the shadow price of wealth or marginal utility given utility level u of a monotone utility function iff: Derived by differentiating hi (p, u) = xi (p, e(p, u)) with re-
of wealth or income. spect to pj and applying Shephards lemma.
1. e(p, u) = p h(p, u) (adding-uptogether with Shep-
Indirect utility function (Consumer 34) v(p, w) hards Lemma, ensures e(, u) is homogeneous of degree Normal good (Consumer 15) Marshallian demand xi (p, w) increas-
supxB(p,w) u(x). Homogeneous of degree zero. one in prices); ing in w. By Slutsky equation, normal goods must be regular.
2. p e(p, u) = h(p, u) (Shephards Lemmaequivalent to
Marshallian demand correspondence (Consumer 34) [a.k.a. Inferior good (Consumer 15) Marshallian demand xi (p, w) decreas-
envelope condition applied to e(p, u) = minh p h);
Walrasian or uncompensated demand] x : Rn ing in w.
+ R+
Rn 3. e(, u) is concave in prices.
+ with x(p, w) {x B(p, w) : u(x) = v(p, x)} =
argmaxxB(p,w) u(x). Regular good (Consumer 15) Marshallian demand xi (p, w) de-
Rationalization: differentiable h (Consumer 12) A continuously creasing in pi .
differentiable Hicksian demand function h : P R+ Rn
+ on
1. Given continuous preferences, x(p, w) 6= for p  0
an open convex set P Rn is rationalizable by expenditure- Giffen good (Consumer 15) Marshallian demand xi (p, w) increas-
and w 0.
minimization with a monotone utility function iff ing in pi . By Slutsky equation, Giffen goods must be infe-
2. Given convex preferences, x(p, w) is convex-valued. rior.
3. Given strictly convex preferences, x(p, w) is single- 1. Hicksian demand is increasing in u; and
valued. 2. The Slutsky matrix Substitute goods (Consumer 156) Goods i and j substitutes iff
4. x(p, w) is homogeneous of degree zero. Hicksian demand hi (p, u) is increasing in pj . Symmetric re-
h1 (p,u) hn (p,u)
p1
p1 lationship. In a two-good world, the goods must be substi-
tutes.

Walras Law (Consumer 4) Given locally non-satiated preferences: Dp h(p, u) =
.. .. ..
. . .

1. For x x(p, w), we have p x = w (i.e., Marshallian de- h1 (p,u) hn (p,u) Complement goods (Consumer 156) Goods i and j complements
pn
pn
mand is on budget line, and we can replace inequality iff Hicksian demand hi (p, u) is decreasing in pj . Symmetric
constraint with equality in consumer problem); is symmetric, relationship.

31
Gross substitute (Consumer 157) Good i is a gross substitute for Equivalent variation (Consumer 21, 3) How much additional ex- 2.5 Choice under uncertainty
good j iff Marshallian demand xi (p, w) is increasing in pj . penditure is required at old prices p to achieve same utility
Not necessarily a symmetric relationship. as consumption at p0 (equivalent to price changeconsumer Lottery (Uncertainty 24) A vector of probabilities adding to 1 as-
faces either new prices or revised wealth). signed to each possible outcome (prize). The set of lotteries
Gross complement (Consumer 157) Good i is a gross complement for a given prize space is convex.
for good j iff Marshallian demand xi (p, w) is decreasing in EV e(p, u0 ) e(p0 , u0 )
pj . Not necessarily a symmetric relationship. Preference axioms under uncertainty (Uncertainty 56) In ad-
= e(p, u0 ) w dition to (usual) completeness and transitivity, assume pref-
Engle curve (Consumer 156) [a.k.a. income expansion curve] For erences are:
a given price p, the locus of Marshallian demands at various which giveswhen only price i is changingthe area to the
left of the Hicksian demand curve corresponding to the new 1. Continuous: For any p, p0 , p00 P with p % p0 % p00 ,
wealth levels.
utility u0 by the consumer surplus formula and Shephards there exists [0, 1] such that p + (1 )p00 p0 .
Offer curve (Consumer 167) [a.k.a. price expansion path] For a Lemma: 2. Independent: [a.k.a. substitution axiom] For any p,
given wealth w and prices (for goods other than good i) pi , p0 , p00 P and [0, 1], we have p % p0
pi e(p, u0 ) pi p + (1 )p00 % p0 + (1 )p00 .
Z Z
the locus of Marshallian demands at various prices pi .
= dpi = hi (p, u0 ) dpi .
p0i pi p0i Expected utility function (Uncertainty 610) Utility function
Roys identity (Consumer 178) Gives Marshallian demand from
indirect utility: u : P R has expected utility form iff there are numbers
Marshallian consumer surplus (Consumer (u1 , . . . , un ) for each of P
the n (certain) outcomes such that
R p 234) Area to the left
v(p, w)/pi of Marshallian demand curve: CS p0i xi (p, w) dpi . for every p P, u(p) = i pi ui .
xi (p, w) = , i
v(p, w)/w Equivalently, for any p, p0 P, [0, 1], we have
Price index (Consumer 256) Given a basket of goods consumed at u(p + (1 )p0 ) = U (p) + (1 )U (p0 ).
Derived by differentiating v(p, e(p, u)) = u with respect quantity x given price p, and quantity x0 given price p0 , Unlike a more general utility function, an expected utility
to p and applying Shephards lemma. Alternately, by ap- 0 0 functions is not merely ordinalit is not invariant to any
x p x
plying envelope theorem to utility maximization problem 1. Laspeyres index: ppx = e(p,u) (basket is old pur- increasing transformation, only to affine transformations. If
v L
v(p, w) = maxx : pxw u(x) (giving w = w = and chases). Overestimates welfare effects of inflation due u() is an expected utility representation of %, then v() is
v L to substitution bias. also an expected utility representation of % iff a R, b > 0
p
= p
= x.)
x 0 0 e(p0 ,u0 ) such that v(p) = a + bu(p) for all p P.
2. Paasche index: ppx 0 = px0 (basket is new pur-
Consumer welfare: price changes (Consumer 20-2, 4) Welfare Preferences can be represented by an expected utility func-
chases). Underestimates welfare effects of inflation.
change in utils when prices change from p to p0 is v(p0 , w) tion iff they are complete, transitive, and satisfy continuity
v(p, w), but because utility is ordinal, this is meaningless. e(p0 ,u)
3. Ideal index: e(p,u) for some fixed utility level u, gener- and independence (assuming |P| < ; otherwise we also
More useful to have dollar-denominated measure. So mea- need the sure thing principle). Obtains since both require
ally either u(x) or u(x0 ); the percentage compensation
sure amount of additional wealth required to reach some ref- indifference curves to be parallel straight lines.
in the wealth of a consumer with utility u needed to
erence utility, generally either previous utility (CV) or new
make him as well off at the new prices as he was at the Bernoulli utility function (Uncertainty 12) Assuming prize space
utility (EV).
old ones. X is an interval on the real line, Bernoulli utility function
If preferences are quasi-linear, then CV = EV. u : X R assumed increasing and continuous.
Paasche Ideal Laspeyres. Substitution biases result from
On any range where the good in question is either normal or
using the same basket of goods at new and old prices. Include von Neumann-Morgenstern utility function (Uncertainty 12)
inferior, min{CV, EV} CS max{CV, EV}.
An expected utility representation of preferences over lot-
Compensating variation (Consumer 21, 3) How much less wealth 1. New good bias, teries characterized by a cdf over prizes X (an interval on
consumer needs to achieve same utility at prices p0 as she 2. Outlet bias. the real line). If F (x) is the probability of receiving less
had at p (compensating for price changeconsumer faces than or equal to x, and u() is the Bernoulli
R utility function,
both new prices and new wealth). Aggregating consumer demand (Consumer 2932) then vN-M utility function U (F ) R u(x) dF (x).

0 Risk aversion (Uncertainty 124) A decision maker is (strictly) risk-


CV e(p, u) e(p , u) 1. Can we predict aggregate demand knowing only aggre-
averse iff for Rany non-degenerate lottery F () with expected
= w e(p0 , u) gate wealth (not distribution)? True iff indirect utility
value EF = R x dF (x), the lottery EF which pays EF for
functions take Gorman form: vi (p, wi ) = ai (p) + b(p)wi
certain is (strictly) preferred to F .
with the same function b() for all consumers.
which giveswhen only price i is changingthe area to the R R
Stated mathematically, u(x) dF (x) u( R x dF (x)) for all
left of the Hicksian demand curve corresponding to the old 2. Can aggregate demand be explained as though there
F (), which by Jensens inequality obtains iff u() is concave.
utility u by the consumer surplus formula and Shephards were a single positive representative consumer?
Lemma: The following notions of u() being more risk-averse then
3. (If 2 holds), can the welfare of the representative con- v() are equivalent:
Z pi Z pi sumer be used as a proxy for some welfare aggregate of
e(p, u) individual consumers? (i.e., Do we have a normative 1. F %u x = F %v x for all F and x.
= dpi = hi (p, u) dpi .
p0i pi p0i representative consumer?) 2. Certain equivalent c(F, u) c(F, v) for all F .

Note this does not mean F %u G = F %v G where G is also a risky prospectthis would be a stronger version of more risk averse.

32
3. u() is more concave than v(); i.e., there exists an If insurance is actuarially fair (q = p), the agent fully insures 1. i, ui () is continuous;
increasing concave function g() such that u = g v. (a = L) for all wealth levels. If p < q, the agents insurance 2. i, ui () is increasing; i.e., ui (x0 ) > ui (x) whenever
4. Arrow-Pratt coefficient A(x, u) A(x, v) for all x. coverage a will decrease (increase) with wealth if the agent x0  x;
has decreasing (increasing) absolute risk aversion.
3. i, ui () is concave;
Certain equivalent (Uncertainty 134) c(F, u) is the certain pay-
out such that c(F,u) u F , or equivalently u(c(F, u)) = Portfolio problem (Uncertainty 235) A risk-averse agent with 4. i, ei  0;
R wealth w must choose to allocate investment between a
R u(x) dF (x). Given risk aversion (i.e., concave u), c(F, u) safe asset that returns r and a risky asset that pays re- Walrasian equilibrium (G.E. 4, 17, 249) A WE for economy E is
EF . a vector of prices and allocations (p, (xi )iI ) such that:
turn z with cdf F .
Absolute risk aversion (Uncertainty 146) For a twice differen- If risk-neutral, the agent will invest all in the asset with 1. Agents maximize their utilities: maxxBi (p) ui (x) for
tiable Bernoulli utility function u(), the Arrow-Pratt coeffi- higher expected return (r or E z). If (strictly) risk-averse, all i I;
cient of absolute risk aversion is A(x) u00 (x)/u0 (x). she will invest at least some in the risky asset as long as P i =
P i
its real expected return is positive. (To see why, consider
2. Markets clear: iI xlP iI el for all l L, or
u() has decreasing (constant, increasing) absolute risk aver- equivalently iI xi = iI ei .
P
sion iff A(x) is decreasing (. . . ) in x. Under DARA, if I will marginal utility to investing in the risky asset at investment
gamble $10 when poor, I will gamble $10 when rich. a = 0.) Under assumptions 14 above, a WE exists (proof using fixed
Since R(x) = xA(x), we have IARA = IRRA. If u is more risk-averse than v, then u will invest less in the point theorem). WE are not generally unique, but are locally
risky asset than v for any initial level of wealth. An agent unique (and there are an odd number of them). Price ad-
Certain equivalent rate of return (Uncertainty 16) A propor- with decreasing (constant, increasing) absolute risk aversion justment process (tatonnement) may not converge to an
tionate gamble pays tx where t is a non-negative random will invest more (same, less) in the risky asset at higher levels equilibrium.
variable with cdf F . The certain
R equivalent rate of return is of wealth.
cr(F, x, u) t where u(tx) = u(tx) dF (t). Feasible allocation (G.E. 4) An allocation (xi )iI RIL
+ is fea-
sible iff iI xi iI ei .
P P
Subjective probabilities (Uncertainty 268) We relax the assump-
Relative risk aversion (Uncertainty 16) For a twice differentiable tion that there are objectively correct probabilities for var-
Bernoulli utility function u(), the coefficient of relative risk ious states of the world to be realized. If preferences over Pareto optimality (G.E. 5) A feasible allocation x (xi )iI for
aversion is R(x) xu00 (x)/u0 (x) = xA(x). acts (bets) satisfy a set of properties similar in spirit to economy E is Pareto optimal iff there is no other feasible al-
the vN-M axioms (completeness, transitivity, something like location x such that ui (xi ) ui (xi ) for all i I with strict
u() has decreasing (constant, increasing) relative risk aver-
continuity, the sure thing principle, and two axioms that have inequality for some i I.
sion iff R(x) is decreasing (. . . ) in x. An agent exhibits
DRRA iff certain equivalent rate of return cr(F, x) is increas- the flavor of substitution), then decision makers choices are
consistent with some utility function and some prior proba- Edgeworth box (G.E. 610) Graphical representation of the two-
ing in x. Under DRRA, if I will invest 10% of my wealth in good, two-person exchange economy. Bottom left corner is
a risky asset when poor, I will invest 10% when rich. bility distribution (Savage 1954).
origin for one consumer; upper right corner is origin for other
Since R(x) = xA(x), we have DRRA = DARA. Savage requires an exhaustive list of possible states of the consumer (with direction of axes reversed). Budget line has
world. No reason to assume different decision makers are us- slope p1 /p2 , and passes through endowment e.
First-order stochastic dominance (Uncertainty 178) cdf G first- ing the same implied probability distribution over states, al-
order stochastically dominates cdf F iff G(x) F (x) for all though we often make a common prior assumption, which 1. Locus of Marshallian demands for each consumer as rel-
x. implies that differences in opinion are due to differences in ative prices shift is her offer curve. WE are intersec-
information. tions of the two consumers offer curves.
Equivalently, forR every nondecreasing function u : R R,
R
u(x) dG(x) u(x) dF (x). 2. Set of PO allocations is locus of points of tangency be-
tween the two consumers indifference curves, generally
Equivalently, we can construct G as a compound lottery 2.6 General equilibrium a path from the upper right to lower left of the Edge-
starting with F and followed by (weakly) upward shifts. worth box.
Walrasian model (G.E. 34, 5) An economy E ((ui , ei )iI )
Second-order stochastic dominance (Uncertainty 1821) cdf G comprises: 3. Portion of Pareto set that lies between the indifference
second-order stochastically dominates cdf FR (where F and curves that pass through e is the contract curve: PO
x 1. L commodities (indexed l L {1, . . . , L)); outcomes preferred by both consumers to their endow-
G have the same mean ) iff for every x, G(y) dy
Rx
F (y) dy. 2. I agents (indexed i I {1, . . . , I)), each with ments.

Equivalently, Endowment ei RL
+ , and
First Welfare Theorem (G.E. 11) If i, ui () is increasing (i.e.,
R for every (nondecreasing?) concave function
ui (x0 ) > ui (x) whenever x0  x) and (p, (xi )iI ) is a WE,
R
u : R R, u(x) dG(x) u(x) dF (x). Utility function ui : RL R.
+ then the allocation (xi )iI is PO. Note implicit assumptions
Equivalently, we can construct F as a compound lottery such as
starting with G and followed by mean-preserving spreads. Given market prices p RL + , each agent chooses con-
sumption to maximize utility given a budget constraint: 1. All agents face same prices;
Demand for insurance (Uncertainty 213) A risk-averse agent maxxRL ui (x) such that p x p ei , or equivalently
with wealth w faces a probability of p of incurring a loss + 2. All agents are price takers;
p Bi (p) {x : p x p ei }.
L. She can insure against this loss by buying a policy that 3. Markes exist for all goods, and individuals can freely
will pay out a in the event the loss occurs, at cost qa. We often assume (some or all of): participate;

If E F > E G, there is always a concave utility function that will prefer F to G.

33
4. Prices are somehow arrived at. 2. If z(p ) = 0 (i.e., p are WE prices), then for any p not 2. There is a unique roota node that has no predecessors
colinear with p , we have p z(p) > 0. and is a predecessor for everything else;
Proof by adding results of Walras Law across consumer at dp 3. There is a unique path (following precedence) from the
potentially Pareto-improving allocation. Result shows that 3. The tatonnement process dy = z(p(t)) with > 0
root to each terminal node (those nodes without suc-
allocation cannot be feasible. converges to WE prices for any initial condition p(0).
cessors).
4. Any change that raises the excess demand for good k
Second Welfare Theorem (G.E. 11-3) If allocation (ei )iI is PO will increase the equilibrium price of good k. We also add:
and
1. Information: a partition over the set of nodes such that
Incomplete markets (Jackson) Under incomplete markets, Wal-
1. i, ui () is continuous; rasian equilibria may: The same player makes the decisions at all nodes
within any element of the partition;
2. i, ui () is increasing; i.e., ui (x0 ) > ui (x) whenever
x0  x; 1. Fail to be efficient, and even fail to be constrained ef- The same actions are available at all nodes within
ficient (i.e., there may be more efficient outcomes that any element of the partition;
3. i, ui () is concave; are feasible under restricted trading regime); No element of the partition contains both a node
4. i, ei  0; 2. Fail to exist (although only rarely);
and its predecessor.
2. Payoffs: a mapping from terminal nodes to a vector of
then there exists a price vector p RL such that (p, (ei )iI ) 3. Have prices/trades that depend on the resolution of un- utilities.
+
is a WE. certainty.
Perfect recall (Bernheim 6) Informally, a player never forgets ei-
Note this does not say that starting from a given endowment, Rational expectations equilibrium (Jackson) Suppose l (prim- ther a decision he made in the past or information that he
every PO allocation is a WE. Thus decentralizing a PO al- itive) goods, state space S (with |S| < ), and n agents possessed when making a previous decision.
location is not simply a matter of identifying the correct each of whom has
priceslarge-scale redistribution may be required as well. Perfect information (Bernheim 7) Every information set is a sin-
Proof by separating hyperplane theorem. Consider the set endowment ei : S Rl+ , gleton.
of changes to total endowment that strictly improve every preferences ui : Rl+ S R, Complete information (Bernheim 13, 734) Each player knows the
consumers utility; by concavity of ui (), this set is convex. payoffs received by every player at every terminal node.
Separate this set from 0, and show that the resulting prices information Ii (including information contained in ei
and ui ). Per Harsanyi, a game of incomplete information can be writ-
are nonnegative, and that at these prices, ei maximizes each
ten as a game of imperfect information by adding nature as a
consumers utility. |S|l
The allocations xi : S Rl+ (or equivalently xi R+ ) player whose choices determine hidden characteristics. The
Excess demand (G.E. 18) z i (p) xi (p, p ei ) ei , where xi is the probability governing Natures decisions is taken to be com-
and prices p : S Rl+ are an REE iff:
agents Marshallian demand. Walras Law gives p z i (p) = 0. mon knowledge. NE of this Bayesian game is a Bayesian
1. Information revelation: xi is measurable with respect NE.
Aggregate excess demand is z(p) iI z i (p). If z(p) = 0,
P
i i
to Ii Iprices for all i; Strategy (Bernheim 78) A mapping that assigns a feasible action to
then (p, (x (p, p e ))iI ) is a WE.
all information sets for which a player is the decision-maker
P P
2. Market clearing: i xi (s) i ei (s) for all s S;
Sonnenschein-Mantel-Debreu Theorem (G.E. 30) Let B (i.e., a complete contingent plan). Notation is:
3. Optimizing behavior: xi (s) argmaxx ui [xi (s)] such
RL++ be open and bounded, and f : B R
L be continu-
that xi is measurable with respect to Ii Iprices and 1. Si the set of player is feasible strategies;
ous, homogeneous of degree zero, and satisfy p z(p) = 0 for p(s) xi (s) p(s) ei (s) for all i. Q
all p. Then there exist an economy E with aggregate excess 2. S j Sj the set of feasible strategy profiles;
Q
demand function z(p) satisfying z(p) = f (p) on B. 3. Si j6=i Sj the set of feasible strategy profiles for
2.7 Games every player but i.
Interpretation is that without special assumptions, pretty
much any any comparative statics result could be obtained Payoff function (Bernheim 8) gi : S R gives player is expected
in a GE model. However, Brown and Matzkin show that if Game tree (Bernheim 25) Description of a game comprising:
utility if everyone plays according to s S.
we can observe endowments as well as prices, GE may be
1. Nodes, Normal form (Bernheim 8) A description of a game as a collection
testable.
2. A mapping from nodes to the set of players, of players {1, . . . , I}, a strategy profile set S, and a payoff
Gross substitutes property (G.E. 325) A Marshallian demand function g : S RI where g(s) = (g1 (s), . . . , gI (s)).
3. Branches,
function x(p) satisfies the gross substitutes property if for
Revelation principle (Bernheim 79) In a game with externali-
all k, whenever p0k > pk and p0k = pk , then xk (p0 ) > 4. A mapping from branches to the set of action labels,
ties, when the mechanism designer doesnt have information
xk (p); i.e., all pairs of goods are (strict) gross substitutes. 5. Precedence (a partial ordering), about preferences, agents will have an incentive to under-
Implies that excess demand function satisfies gross substi- state or exaggerate their preferences. The revelation princi-
tutes. If every individual satisfies gross substitutes, then so 6. A probability distribution over branches for all notes
that map to nature. ple states that in searching for an optimal mechanism within
does aggregate excess demand. a much broader class, the designer can restrict attention to
If aggregate excess demand satisfies gross substitutes, We assume: direct revelation mechanisms (those that assume agents have
revealed their true preferences) for which truth-telling is an
1. The economy has at most one (price-normalized) WE. 1. There is a finite number of nodes; optimal strategy for each agent.

34
Proper subgame (Bernheim 91) Consider a node t in an extensive Weakly dominated strategy (Bernheim 20) si is a weakly domi- Pure strategy Nash equilibrium (Bernheim 2933) s S is a
form game, with information set h(t) and successors S(t). nated strategy iff there exists some probability distribution PSNE iff for all s S, we have gi (si , si ) gi (si , si ).
Then {t} S(t) (along with associated mappings from infor- over Si {s1i , . . . , sM
i } such that for all si Si , we have The strategies played in a PSNE are all rationalizable.
mation sets to players, from branches to action labels, and A finite game of perfect information has a PSNE [Zermelo].
from terminal notes to payoffs) is a proper subgame iff M
X Proved using backwards induction.
m
gi (sm
i , si ) gi (si , si ),
1. h(t) = {t} (the information set for t is a singleton); and m=1
If S1 , . . . , SI are compact, convex Euclidean sets and gi is
continuous in s and quasiconcave in si , then there exists a
2. t0 S(t), we have h(t0 ) S(t) (every player knows we with strict inequality for some si Si . PSNE. By Berges Theorem, the best response correspon-
are at t). dence is upper hemi-continuous. By quasiconcavity of gi ,
We cannot iteratively delete weakly dominated strategies; the best response correspondence is convex valued. Thus by
System of beliefs (Bernheim 94) Given decision nodes X, infor- unlike for strict domination, the order of deletion matters. Kakutanis Fixed Point Theorem, it has a fixed point.
mation sets H, including the information set h(t) contain-
ing t X, and (h) the player who makes the decision at Mixed strategy Nash equilibrium (Bernheim 569) We define a
information set h H, a system of beliefs 2.9 Equilibrium concepts new game where the strategy space is the set of probability
Pis a mapping distributions over (normal form) strategies in the original
: X [0, 1] such that h H we have th (t) = 1.
That is, a set of probability distributions over nodes in each PSNE rationalizable ISD. Rationalizable strategies are game (i.e., the set of mixed strategies). A MSNE of the
information set. a best response based on some prior. PSNE are best re- original game is any PSNE of this new game. Players must
sponses based on a common prior. be indifferent between playing any strategies included (with
strictly positive probability) in a MSNE.
Strictly mixed strategy (Bernheim 98) Behavior strategy profile Normal form equilibrium concepts (static games): THPE
is strictly mixed if every action at every information set is MSNE; PSNE MSNE; MSNE are made up of rationaliz- Another approach is to consider randomization over ac-
selected with strictly positive probability. able strategies. BNE are MSNE of extended game that tions at each information set (behavior strategies). However,
includes nature choosing types. Kuhns Theorem assures us that for any game of perfect
Note since every information set is reached with strictly pos- recall there are mixed strategies that yields the same dis-
itive probability, one can completely infer a system of beliefs Extensive form equilibrium concepts (dynamic games) : tribution over outcomes as any combination of behavioral
using Bayes rule. strategies; also it gives that there are behavioral strategies
that yield the same distribution over outcomes as any com-
MSNE bination of mixed strategies.
2.8 Dominance
Every finite game has a MSNE.
WPBE SPNE
Dominant strategy (Bernheim 13) si is a (strictly) dominant PBE Trembling-hand perfection (Bernheim 657) There is always
strategy iff for all s S with si =
6 si , we have gi (si , si ) > some risk that another player will make a mistake. Bern-
SE
gi (si , si ). heim notes include two equivalent rigorous definitions; origi-
EFT
HPE nal definitions are for finite games, but there is an extension
Dominated strategy (Bernheim 15) si is a (strictly) dominated available for infinite games. Key notes:
strategy iff there exists some probability distribution over
Si {s1i , . . . , sM 1. In a THPE, no player selects a weakly dominated strat-
i } such that for all si Si , we have
egy with positive probability.
M
X Rationalizable strategy (Bernheim 27) 2. For two player games, an MSNE is a THPE iff no player
m gi (sm
i , si ) > gi (si , si ). selects a weakly dominated strategy with positive prob-
m=1 1. A 1-rationalizable strategy is a best response to some ability.
(independent) probability distribution over other play- 3. For more than two player games, the set of THPE may
ers strategies. be smaller than the set of MSNE where no player selects
Iterated deletion of dominated strategies (Bernheim 156) We
iteratively delete (strictly) dominated strategies from the 2. A k-rationalizable strategy is a best response to some a weakly dominated strategy with positive probability;
game. Relies on common knowledge of rationality (i.e., ev- (independent) probability distribution over other play- if we allow correlations between the trembles of different
eryone is, everyone knows, everyone knows everyone knows, ers (k 1)-rationalizable strategies. players, the sets are the same.
. . . ). If this yields a unique outcome, the game is dominance
3. A rationalizable strategy is k-rationalizable for all k. Correlated equilibrium (Bernheim 70) In a finite game
solvable. The order of deletion is irrelevant.
({Si }Ii=1 , g), a probability distribution over S is a CE iff
For two player games, strategies that survive iterative dele- For two player games, rationalizable strategies are precisely for all i and si chosen with strictly positive probability, si
tion of dominated strategies are precisely are precisely the those that survive iterative deletion of dominated strategies. solves
set of rationalizable strategies. This equivalence holds for This equivalence holds for games with more than two play- max Esi [gi (s0i , si )|si , ];
s0i Si
games with more than two players only if we do not insist on ers only if we do not insist on independence in defining ra-
independence in defining rationalizability; if we do insist on tionalizability; if we do insist on independence, the set of i.e., player i has no incentive to defect from any strategy si ,
independence, the set of rationalizable strategies is smaller. rationalizable strategies is smaller. assuming that other players respond per .

Unclear whether PBE is the intersection of WPBE and SPNE, or a subset of the intersection.

Question is whether to allow other players randomized strategies to be correlated with each other.

35
Bayesian Nash equilibrium (Bernheim 734) Per Harsanyi, we Extensive form trembling hand perfection (Bernheim 1025) 2. Characterized by strategic complements; i.e., best re-
write our game of incomplete information as a game of imper- Agent normal form is the normal form that would obtain sponse curves are upward sloping.
fect information where nature selects hidden characteristics. if each player selected a different agent to make her decisions
3. All production is done by most efficient firm.
A BNE is a MSNE of this Bayesian game. A pure strategy at every information set, and all of a players agents acted
BNE is a PSNE of the Bayesian game. independently with the object of maximizing the players Bertrand competitionnon-spatial differentiation
payoffs. An EFTHPE is a THPE in the agent normal form (Bernheim 3840) Firm i chooses price for good i. Demand
Characterized by a set of decision rules that determine each
of the game. for good i is given by Q(pi , p1 ). Strategic complements;
players strategy contingent on his type.
EFTHPE SE; for generic finite games, they are the same. i.e., best response curves are upward sloping.
Subgame perfect Nash equilibrium (Bernheim 92) A MSNE in
behavior strategies is a SPNE iff for every proper sub- Bertrand competitionhorizontal differentiation
game, the restriction of to the subgame forms a MSNE in 2.10 Basic game-theoretic models (Bernheim 402) a.k.a. Hotelling spatial location model. Con-

behavior strategies. sumers are indexed by [0, 1], representing location. Each
Other models and examples are throughout Bernheim lecture consumer purchases zero or one unit, with payoff 0 if no
For finite games, we can find SPNE using backwards induc- notes. purchase, and v pi t(xi )2 from purchasing a type xi
tion on subgames of extensive form. good at price pi . v is value of good, t is unit transport cost.
A SPNE need not be a WPBE, and a WPBE need not be a Cournot competition (Bernheim 11, 4650) Firms simultaneously
If firms cannot choose product type xi , prices are strategic
SPNE. choose quantity. Inverse demand is P (Q) (monotonically de-
complements; i.e., best response curves are upward sloping.
creasing); cost to firm i of producing quantity qi is ci (qi )
Sequentially rational strategy (Bernheim 94) Behavior strategy where ci (0) = 0. Normal form:
Bertrand competitionvertical differentiation (Bernheim
profile is sequentially rational given a system of beliefs 425) Consumers are indexed by [0, 1], representing value
1. Strategies: S = RI+ ;
Sinformation sets h H, the actions for player (h)
iff for all attached to quantity. Each consumer purchases zero or
at h [ th S(t)] are optimal starting from h given an ini- P
2. Payouts: gi (s) = P ( j sj )si ci (si ). one unit, with payoff 0 if no purchase, and vi pi from
tial probability over h governed by , and given that other purchasing a quality xi good at price pi .
players will adhere to (h) . To ensure PSNE existence, we need quasiconcavity of gi in
si (generally dont worry about unboundedness of strategy Sequential competition (Bernheim 10712) First one firm selects
Weak perfect Bayesian equilibrium (Bernheim 935) Implausi- set). Sufficient conditions are ci () convex and P () concave. price/quantity, then the other firm follows. The leader al-
ble equilibria can still be SPNE, because we lack beliefs at The former rules out increasing returns to scale. The latter ways does (weakly) better than in the simultaneous choice
each information set. Behavior strategy profile and sys- is not a conventional property of demand functions, but is model; whether the follower does better or worse than in si-
tem of beliefs are a WPBE iff: satisfied for linear functions. multaneous choice depends whether there are strategic com-
Note: plements or substitutes. This also determines which firm
1. is sequentially rational given beliefs , and does better in the sequential choice model.
2. Where possible, is computed from using Bayes 1. Characterized by strategic substitutes; i.e., best re-
Herfindahl-Hirshman index (Bernheim 37) H 10000 i 2i ,
P
rule; i.e., for any information set h with Pr(h| ) > 0, sponse curves are downward sloping.
for all t h, where i is the market share of firm i. When all N firms
Pr(t| ) 2. Production is spread among firms. evenly split market, H = 10000/N .
(t) = .
Pr(h| ) Bertrand competition (Bernheim 11, 37) Firms simultaneously P pi c0i (qi )
choose price; consumers purchase from low-price firm. De- Lerner index (Bernheim 37) L i i Li , where Li pi
Note only restriction on out-of-equilibrium beliefs is that is firm is margin and i is the market share of firm i.
mand is Q(p) (monotonically decreasing); cost to firm i of
they exist. A SPNE need not be a WPBE, and a WPBE
producing quantity qi is ci (qi ) where ci (0) = 0. Normal form
need not be a SPNE. Monopolistic competition (Bernheim 1358)
(two-firm case):
Perfect Bayesian equilibrium (Bernheim 935) ( , ) is a PBE 1. Strategies: S = RI+ ; 1. Products are distinct, and each firm faces a downward-
if it a WPBE in all proper subgames. This ensures it is also sloping demand curve;
a SPNE. 2. Payouts:
2. The decisions of any given firm have negligible effects on
any other firm (note this does not hold for the Hotelling

Consistent strategy (Bernheim 98) Behavior strategy profile is 0,
si > si ;
consistent given a system of beliefs iff there exists a se- model, where firms have a measurable effect on their
gi (s) = si Q(si ) ci (Q(si )), si < si ;
quence of strictly mixed behavior strategy profiles n 1
 neighbors);
s Q(si ) ci (Q(si )) ,
2 i
si = si .
such that n , where n is generated from n by Bayes 3. There is free entry, with zero profits.
rule.
Note:
Can formalize as a vector of N differentiated commodities
Sequential equilibrium (Bernheim 98) ( , ) is a SE if it is se- (for N large) and a numeraire good y, where representative
1. One firm case is monopoly (in which case demand mat-
quentially rational and consistent. consumer has utility
ters); more than one yields perfectly competitive out-
SE places additional restriction on beliefs vs. WPBE, hence come since only the marginal cost of the second-most N
SE WPBE; also, can show that SE are SPNE, so an SE is
X 
efficient firm mattersthe properties of the demand u(x, y) = y + g f (xi )
also a PBE. curve are then irrelevant. i=1

36
and the curvature of f () gives the extent of substitutability Public good (Bernheim 514) A non-rival and non-excludable good. Folk Theorem (Bernheim 1712, 5, 78) Consider a supergame
of the goods. formed by repeating a finite stage game an infinite number
Spence signaling model (Bernheim 21835) Workers choose edu- of times; suppose players use the average payoff criterion.
Relationship between equilibrium variety and optimal vari-
cation level; education does nothing for workers productiv- Then the set of feasible and individually rational payoffs is
ety is dictated by:
ity, but is less costly on margin for more productive workers. precisely the set of average payoffs for Nash equilibria (which
All workers have same outside opportunities. Three types of need not be SPNE).
1. When a product is added, revenues generated fall short
WPBE:
of incremental consumer surplus because firms cant
1. Anything can happen; this makes comparative statics
perfectly price discriminate; this biases towards too lit- 1. Separating equilibria: different types choose different are problematic.
tle entry. education levels.
2. Inability to write binding contracts is not very dam-
2. Firms dont take into account the effect of introducing 2. Pooling equilibria: different types choose same educa- aging; anything attainable through a contract can also
a product on the profits of others; if goods are substi- tion levels. (Note pooling equilibria with strictly posi- be obtained through some self-enforcing agreement (if
tutes, this biases towards too much entry. tive levels of education are highly inefficienteducation there is no discounting).
adds nothing to productivity nor does it differentiate
If goods are complements, these biases reinforce and we have For discounted payoffs, all feasible payoffs that strictly ex-
workers.)
too little variety relative to social optimum. ceed minmax payoffs for every player are the average payoff
3. Hyprids: some members of differently-typed groups
for some NE and all discount rates sufficiently close to 1.
Entry deterrence (Bernheim ) An incumbent can take some ac- pool, others separate.
tion with long-term commitment prior to the arrival of an Subject to some technical conditions, versions of the folk
entrant that makes entry less attractive. For example: Equilibrium satisfies equilibrium dominance condition a.k.a. theorem (with and without discounting) hold for SPNE.
the intuitive criterion iff whenever an individual gets some
1. Selecting to produce a middle-of-the-road product level of education that no one should in equilbrium, no Nash reversion (Bernheim 1767) a.k.a. grim trigger strategy.
(Hotelling model); note the deterrent action is the same low-productivity worker should ever choose, and some high- Players attempt to support cooperation, reverting to a static
as what the firm would do if it faced no entry with cer- productivity worker could conceivably choose, the firm as- (stage-game) equilibrium as punishment if someone deviates.
tainty (entry is blockaded.) sumes he is a high-productivity worker with certainty. If a Nash reversion strategy is a NE, then it is a SPNE.
2. Producing a proliferating range of products (e.g., RTE Finitely repeated game (Bernheim 17980) If there is a unique (up
cereal). to payoffs) NE for the stage game, there is a unique SPNE
2.11 Repeated games with complete infor-
3. Preemptive investment in production capacity; shifts a mation for the repeated game, consisting of the repeated stage game
portion of marginal cost to a sunk cost. Note entrant equilibrium. However, cooperation may be possible when the
may also limit capacity to reduce threat to incumbent. Infinitely repeated game (Bernheim 1645) A supergame formed stage game has multiple NE.
by (potentially) infinite repetitions of some stage game (i.e.,
Vickrey auction (Bernheim 223, 823) a.k.a. second-price auction. Stick-and-carrot equilibrium (Bernheim 1856) If a firm strays
the game played in each period). Note there need not ac-
I bidders simultaneously submit sealed bids for an item that from the (cooperative) equilibrium path, all firms including
tually be infinite repetitions, but there must be a nonzero
each values at vi . Winning bidder pays second highest bid. itself punish it for one period. If any firm does not partici-
possibility in each stage that the game will continue.
Bidding pi = vi weakly dominates any other pure strategy, pate in the punishment (including the punished firm itself),
and is not weakly dominated by any other strategy. Analysis Given absence of terminal nodes, need mapping from strate- it gets punished again in the next period. The punishment
does not depend on complete vs. incomplete information gies to expected payoffs, e.g., is the stick; the fact that punishment will end as soon as
everyone bidding their valuation is a BNE. P t the firm punishes itself is the carrot.
1. Discounted payoffs: ui (vi ) = t=0 vi (t), where the
First-price sealed bid auction (Bernheim 837) I bidders simul- discount factor may reflect both time preference and the
taneously submit sealed bids for an item that each values at probability of continuation. 2.12 Collusion and cooperation
PT
vi . Winning bidder pays his own bid. Symmetric BNE gives 2. Average payoffs: ui (vi ) = limT T1 t=1 vi (t). Core (Jackson; MWG 6534) The core is the set of allocations not
a decision rule where each bids below his valuation.
3. Overtaking criterion: a strategy is preferred iff there is blocked by any coalition. A coalition will block an alloca-
Realized revenues typically differ from Vickrey (second-price some T beyond which all partial sums of stage game tion if there is some allocation feasible for the coalition such
sealed bid) auction, but expected revenue is the same given payoffs exceed the corresponding partial sum for other that each member is strictly better off (strong blocking),
independent private valuations (per the revenue equivalence strategies; n.b., only a partial ordering. or that every member is weakly better off and some member
theorem). is strictly better off (weak blocking).
Feasible payoff (Bernheim 16970) The convex hull of payoff vectors
English auction (Bernheim 87) a.k.a. ascending price auction Core allocations must be Pareto optimal.
from pure strategy profiles in the stage game. Note this is
Posted price of good is slowly increased until only one bid- potentially larger than the set of payoffs achievable through The set of Walrasian equilibria is a subset of the core (ba-
der remains. Staying in until posted price exceeds valuation mixed strategies, since we allow for correlations. sically, the FWT) since there are no externalities. In other
is a weakly dominated strategy; outcomes are equivalent to settings, the core could be empty due to externalities.
Vickrey (second-price sealed bid) auction. Individually rational payoff (Bernheim 1701) (Stage game) pay-
off vectors where each player receives at least his minmax Core convergence (Jackson; MWG 6557) As we increase the num-
Dutch auction (Bernheim 87) a.k.a. descending price auction payoff ber of replicas of a pure exchange economy, the core of the
Posted price of good is slowly decreased until a bidder buys. im min max i (). replicated economy converges to (equal treatment replicas
i i
Outcomes are equivalent to first-price sealed bid auction. of) the W.E. of the original economy.

37
Nontransferable utility game (Jackson) Feasible allocations for In a simple game, i is a veto player iff V (S) = 1 = i S. Two-sided matching (cf, roommate matching);
a coalition T must be listed explicitly. The core is nonempty iff there is at least one veto player.
Shapley values are then: One-to-one matching (cf, firm and workers);
Transferable utility game (Jackson; MWG 676) Feasible alloca- ( Strict preferences.
1
|S|
tions for a coalition S are V (S) R+ . Generally assume: , i is a veto player;
SV
i = # of veto players
0, otherwise.
The algorithm is not strategy-proof: there may be profitable
1. The normalization V () = 0;
If the set of veto players is a carrier, SV core. manipulations (lying by rejecting proposals), but they are
2. V (I) V (S) + V (I \ S) for all I S, where I is the typically difficult to implement.
set of all players; Convex game (Jackson; MWG 683) A TU game V () is convex iff
S T and i 6 T implies
3. Superadditivity (a stronger version of the prior assump-
tion): V (S T ) V (S) + V (T ) when S T = . V (S {i}) V (S) V (T {i}) V (T ); 2.13 Principal-agent problems

In a TU game, an allocation can be strictly blocked iff it can i.e., there are increasing marginal returns as coalitions grow. Moral hazard (Jackson) Hidden action problems. Agent takes
be weakly blocked; the weak and strong cores are therefore For a convex game, the core is nonempty; the Shapley value (non-contractable) action e, outcome has some distribu-
identically is in the core. tion that varies based on e. Generally assume risk-neutral
X X principal and risk-averse (and/or liability-limited) agent, so
{x R|I| : xi = V (I) and S I, xi V (S)}. One-sided matching (Jackson) Players are each allocated one ob- that optimal solution is not merely selling the firm to the
i i ject; each player has ordinal preferences over objects. Given agent.
strict preferences, we can find the unique (weak) core alloca-
tion using the Top Trading Cycles algorithm: Principal structures optimal contractpayment w() as a
Shapley value (Jackson; MWG 673, 67981) Given a characteristic function of realized outcomethat, for a desired effort level
function V : 2I R+ , the Shapley value is value function: 1. Look for cycles among individuals top choices; e ,
X |S|! (|I| |S| 1)!  2. Assign each member of cycles her top choice;
SV 1. Maximizes principals expected payoff: E[ w()|e ];

i (V ) V (S{i})V (S) . 3. Return to step 1, with already assigned individu-
SI s.t. i6S
|I|!
als/objects removed from consideration. 2. Satisfies the agents participation constraint (i.e., indi-
That is, the average marginal value that i contributes over vidual rationality constraint): E[u(w())|e ] g(e )
We can support this core as a competitive equilibrium by as-
all possible orderings in which members could be added to u, where g() is the agents cost of effort, and u is his
signing a common price to the objects in each cycle as they
a coalition. It is a normative allocation, and can be seen reservation utility;
are assigned. The price for cycles assigned in the same round
as a positive description under certain kinds of bargaining can vary across cycles, but must be strictly lower than the 3. Satisfies the agents incentive compatibility constraint:
regimes. prices for objects assigned in previous rounds. e argmaxe E[u(w())|e] g(e).
The Shapley value is the unique value function satisfying: The algorithm is strategy-proof: truth telling dominates.
Adverse selection (Jackson) Hidden information problems.
1. Symmetry: If we relabel agents, the Shapley values are Two-sided matching (Jackson) Two sets M and W , which may
Can be mitigated using, e.g., warantees, repeated interac-
relabeled accordingly. have different cardinalities. Every individual is either
tion, reputation mechanisms, signaling/screening.
matched to someone in the other set, or remains unmatched.
2. Carrier: T I is a carrier iff VP
(S T ) = V (S) for all
Can find two core allocations (here strict = weak) using Gale-
S I; if T is a carrier, then iT i (V ) = V (I) = Signaling (Jackson) Agents choose a costly signal (which per
Shapley [a.k.a. Deferred acceptance] algorithm: Suppose M
V (T ). Spence is often assumed to be meaningless other than for
propose, then
its informational value), and then principals Bertrand bid
3. Dummy: i is a dummy iff V (S {i}) = V (S) for all
1. Each M proposes to his favorite W to whom he has not for contracts with agents.
S I; if i is a dummy, i (V ) = 0. Note Carrier =
Dummy. yet proposed (unless he would prefer to stay single);
A multitude of pure strategy sequential equilibria typically
4. Additivity: (V + W ) = (V ) + (W ); implies that 2. Each W who has been proposed to accepts her favorite existboth pooling (various types choose same signal) and
(V ) = (V ). Convenient, but not mathematically proposal (unless she would prefer to stay single)this separating (various types choose different signals).
clear why this should hold. could involve breaking an engagement she made in a
previous round;
Screening (Jackson) Bertrand principals announce a schedule of
Simple game (Jackson) A TU game where 3. Return to step 1, where the only unengaged M pro- contracts they are willing to engage in (pairs of signals and
pose (i.e., those who either had a proposal rejected last wages), and then agents choose a contract (and hence a sig-
1. V (S) {0, 1}, round, or had an engagement broken). nal).
2. S T = V (S) V (T ), The core is a lattice; Gale-Shapley algorithm yields best No pooling equilibria can exist, and there is only one sep-
for proposing group (and worst for other group). Thus arating equilibrium that can exist (but may not). Thus in
3. V (S) = 1 = V (I \ S) = 0,
if we get same result from M -proposing and W -proposing contrast with signaling, where our problem was a multitude
4. V (I) = 1. algorithms, core is a singleton. Note this result relies on: of PSNE, screening games may have no PSNE.

38
2.14 Social choice theory Vickrey-Clarke-Groves mechanism (Jackson) Let i i be We further assume U () satisfies Inada conditions; that the
the type of individual i, and i be his announced type. Sup- production function is constant returns to scale and satis-
Voting rule (Jackson) n individuals must choose among a set A of pose d() makes ex post efficient decisions. Then a mecha- fies Fk0 > 0, Fkk 00 < 0, F 0 > 0, and F 00 < 0; and TVC
n nn
alternatives. Each individual has a complete, transitive pref- nism with transfers limt t U 0 (ct )f 0 (kt )kt = 0.
erence relation i over A that does not have indifference. X  The problem can P be rewritten in SL canonical form
ti () = uj (d(), j ) + xi (i ) t
A voting rule R() R(1 , . . . , n ) gives a social welfare as maximizing t U (f (kt ) kt+1 ) subject to kt+1
j6=i
ordering over A (which in general allows indifference). The [0, f (kt )], the given k0 , and conditions on f () and U ()
corresponding strict social welfare ranking is P (). as above. In functional equation form, we write V (k)
for any xi () is dominant-strategy incentive compatible maxk0 [0,f (k)] U (f (k) k0 ) + V (k0 ) (again, with additional
Neutrality (Jackson) For A = {a, b} (i.e., |A| = 2), consider  and (DSIC). conditions as above).
0 with i 6=0i . Then R() is neutral (over alternatives) iff Conversely, if (d, t) is DSIC, d() is ex post efficient, and i Steady state satisfies f 0 (k ) = 1. Note the utility func-
aR()b bR(0 )a. is sufficiently rich such that vi {vi : D R} there exists tion does not affect the steady state (although it will affect
i such that vi () = ui (, i ), then t() must satisfy the above dynamics).
Anonymity (Jackson) Let () be a permutation over {1, . . . , n}. condition.
Define 0i (i) . Then R() is anonymous (over individuals) Linearization of the intereuler
iff aR()b aR(0 )b. Note that Gibbard-Satterthwaite says that a general DSIC
mechanism must be dictatorial. However, here we have re- u0 (f (kt ) kt+1 ) f 0 (kt+1 )u0 (f (kt+1 ) kt+2 ) = 0
Monotonicity (Jackson) Consider  and 0 with j =0j for all stricted ourselves to quasilinear preferences, and get a DSIC
j 6= i, and i = a while 0i = b. Then R() satisfies mono- mechanism without being dictatorial. However, although it about steady state gives:
tonicity iff bR()a = bR(0 )a. reaches an efficient decision it is not always balanced (pay-
ments do not total to zero), and hence is not typically overall  " U 0 (c )
#
1
+ U 00 (c ) f 00 (k ) 1
 
kt+2 k 1+ kt+1 k
R() satisfies strict monotonicity iff bR()a = bP (0 )a. efficient.
kt+1 k 1 0 kt k
Mays Theorem (Jackson) Let A = {a, b} (i.e., |A| = 2). Then Pivotal mechanism (Jackson) An example of a VCG mechanism
R() is complete, transitive (which has no bite for |A| = 2), where The optimal policy function g(k) satisfies:
neutral, anonymous, and satisfies strict monotonicity iff it is
majority rule.
X  X
ti () = uj (d(), j ) max uj (d, j ). 1. g() is single-valued, since the value function is strictly
d
If we replace strict monotonicity with monotonicity, we get j6=i j6=i concave.
that R() must be a quota rule. 2. g(0) = 0 since that is the only feasible choice.
That is, the transfers are the externality imposed on others
Unanimity (Jackson) R() is unanimous iff a i bi implies that by choosing d() instead of d(i ). 3. g(k) is in the interior of (k), since otherwise exactly
aP ()b. one of U 0 (c) and U 0 (c0 ) would be infinite, violating the
Ensures feasibility, since transfers are always negative. How-
Intereuler condition.
ever, budget generally doesnt balancethere is a need to
Arrows independence of irrelevant alternatives (Jackson) 4. g 0 (k) > 0, since as k increases, marginal cost of saving
burn money.
Consider  and 0 with a i b a 0i b for all i. goes down, while marginal benefit stays same.
R() satisfies AIIA iff aR()b aR(0 )b; i.e., R() 5. There is unique k > 0 such that g(k ) = k .
over {a, b} only depends on  over {a, b}, not over any other 3 Macroeconomics
(irrelevant) alternatives. 6. k > k = k > g(k) > k , and k < k = k <
g(k) < k ; i.e., capital moves closer to (without crossing
Arrows Theorem (Jackson; Micro P.S. 4) Let |A| 3. Then R()
3.1 Models over) the steady-state level.
is complete, transitive, unanimous, and satisfies AIIA iff 7. The sequence k0 , g(k0 ), g(g(k0 )), . . . displays mono-
it is dictatorial (i.e., if there is some i such that , Two period intertemporal choice model (Nir 1-11) Maximize
U (C0 , C1 ) subject to C0 + S0 Y0 and C1 RS0 . The tonic and global convergence.
aP ()b a i b). 1
constraints can be rewritten C0 + R C 1 Y0 .
The theorem is tight; i.e., giving up any of completeness, Endowment economy, Arrow-Debreu (Nir 13-46) At every
transitivity, unanimity, or AIIA allows a non-dictatorial R(). Neoclassical growth model (discrete time) (Nir 3-35, 11-3, period household gets yt units of good; there is no storage,
8, 10, 1628, 12-1226) (a.k.a. Ramsey model) Maximize
and trading in all commodities (goods delivered
P t in each pe-
Condorcet winner (Jackson) a A is a Condorcet winner iff it is P t riod)
P takes place
P in period 0. Maximize U (ct ) subject
t U (ct ) subject to:
majority preferred to every other alternative. to pt ct pt yt (budget constraint), and ensure yt = ct
1. ct + kt+1 F (kt , nt ) + (1 )kt for all t (RHS can be (market clearing).
Condorcet-consistent rule (Jackson) A voting rule that picks denoted f (kt , n), or f (kt ), since the lack of disutility of
the Condorcet winner if there is one. Endowment economy, sequential markets (Nir 13-79) At ev-
working ensures that nt = 1);
ery period household gets yt units of good; there is no stor-
Gibbard-Satterthwaite Theorem (Jackson) Let 3 |A| < , 2. kt+1 0 for all t; age, and trading in assets
P (loans across periods) takes place
and F () be a social choice function (note we do not require a each period. Maximize t U (ct ) subject to ct + at+1
3. ct 0 for all t;
full ordering). Then F () has range A (implied by unanimity) yt + Rt at (budget constraint), and ensure yt = ct , at+1 = 0
and is strategy-proof (i.e., DSIC) iff it is dictatorial. 4. k0 is given. (market clearing).

39
c1 1 n1+
Production economy, Arrow-Debreu (Nir 13-106) Household 1. CRRA utility: U (c, n) = 1
1+
; Imperfect competition: intermediate goods (Nir 21) Inter-
owns capital rented for production by competitive firms, mediate goods producers have identical production technolo-
which also rent labor from households. Capital depreciates at 2. Cobb-Douglas production: yt = At kt n1
t ; gies (affected by same shocks): Qj = zkj n1
j , where
rate . There is no storage, and trading in all commodities > 0 is overhead cost (since there arent really economic
3. Productivity given by log At = log At1 + t with t
(goods delivered, capital rental, and labor in each period) profits). Has constant marginal cost, and therefore increas-
distributed iid normal.
takes place in period 0. ing returns to scale (because of ).
1. Households
P t P Optimal taxationprimal approach (Nir 10) Solve HH prob- Monopolistic problem is to maximize over prices pQ(p)
P maximize U (ct ) subject to pt [ct +
lem and combine FOC and budget constraint with govern-
kt+1 ] pt [rt kt + (1 )kt + nt wt ]. c(Q(p)); first-order condition gives markup p/ MC =

ment budget constraint to get one expression that ties to- = 1 .
2. Firms maximize (in each period) pt F (kt , nt ) pt rt kt 1+
gether allocations but has no prices or taxes. Then gover-
pt wt nt .
ment maximizes HH utility over allocations subject to this
3. Markets clear: ct + kt+1 = F (kt , nt ) + (1 )kt . constraint. 3.3 General concepts
Note the rental rate of capital (rt ) and wage rate (wt ) are Note that in the case with capital,
the problem is not sta-
Cobb-Douglas production function (Macro P.S. 1) F (K, N ) =
measured in units of consumption good. tionary; i.e., we cannot use our typical dynamic program-
zK N 1 . Constant returns to scale. Fraction of output
ming tools, and need to worry about the governments ability
goes to capital, and 1 to labor.
Production economy, sequential markets (Nir 13-25) As to commit to a future tax policy.
above, household owns capital rented for production by
Constant relative risk aversion utility function (Nir 14-15,
competitive firms, which also rent labor from households. 1 1)/(1 ), with > 0 (reduced to
18-724) U (c) = (c
Capital depreciates at rate . There is no storage, and each 3.2 Imperfect competition log utility with = 1; empirically we expect [1, 5]).
period, goods delivered, capital rental, and labor are traded.
Relative risk aversion is . Higher risk aversion also cor-
Note there is no inter-period trading, so we do not need a Imperfect competition model (Nir 21) Consider two stages of responds to a higher desire to smooth consumption over
price for goods. production: final good manufacturer operates competitively, time.
P t purchasing inputs from intermediate good producers, who
1. Households maximize U (ct ) subject to ct + kt+1
are small enough to take general price levels and aggregate Additively separability with stationary discounting (Nir 1-
rt kt + (1 )kt + nt wt .
demand as given, but whose products are imperfect substi- 24) General assumption that U (C0 , C1 ) = u(C0 ) + u(C1 )
2. Firms maximize (in each period) F (kt , nt )rt kt wt nt . tutes to the production process. with (0, 1).
3. Markets clear: ct + kt+1 = F (kt , nt ) + (1 )kt .
Imperfect competition: final goods (Nir 21) Final good pro- Intertemporal Euler equation (Nir 1-28, 11-5) u0 (ct ) =
Formulated recursively (assuming n = 1), ducer has Dixit-Stiglitz production technology: Ru0 (ct+1 ), where R is the return on not-consuming (e.g.,
f 0 (kt+1 ) Fk0 (kt+1 , nt+1 ) + (1 )). This first-order differ-
1. Households have V (k, K) = maxc,k0 [U (c) + V (k0 , K 0 ) Z 1 1/ ence equation is given by FOCs of the maximization problem
subject to c + k0 = R(K)k + W (K); where K is aggre- Y = Qj dj for two consecutive time periods.
gate capital, and R(K) and W (K) are the rental and 0
wage rates as functions of aggregate capital. We further Intertemporal elasticity of substitution (Nir 1-31, Max notes)
1 U 0 (C )
require the forecast of future capital to be rational: which is constant elasticity of substitution ( 1 between all d log C0 C0
/d log R = d log C /d log U 0 (C0 ) .
K 0 = G(K), where G() is the optimal aggregate policy pairs of inputs) and constant returns to scale. (0, 1),
C 1 1 1

function. with 1 corresponding to a competitive market where all For CRRA utility function U (c) = c1 /(1 ), intereuler
2. Firms maximize F (K) RK W , which gives FOC inputs are perfect substitutes. is C0 = C1 R so IEoS = 1 = RRA1 .
R(K) = Fk (K) + 1 and W (K) = Fn (K). Conditional factor demands are therefore General equilibrium (Nir 2-11) A competitive equilibrium is a set
3. Markets clear: C + K 0 = F (K) + (1 )K. of allocations and prices such that
hp i 1 hp i 1
4. Consistency: G(K) = g(K, K). j 1 j 1
Qj (Y ) = Y = Y,
P 1. Actors (i.e., households, firms, . . . ) maximize their ob-
Overlapping generations (Nir 17-156, 17-32) In an endowment R1 jectives;
OLGP economy, a competitive equilibrium is Pareto optimal where the Lagrange multiplier on [ 0 Qj dj]1/ Y also sat-
2. Markets clear; and
iff
t=0 1/pt = where the interest rate is Rt = pt /pt+1 . isfies = E /Y P , the (aggregate) average cost. Solving
for this gives 3. Resource constraints are satisfied.
In an an OLG economy with production, a competitive equi-
0 (K ) + 1 1.
librium is dynamic efficient iff FK
 1 Inada conditions (Nir 3-5) We generally assume utility satisfies
Z 1
Inada conditions, which imply that the nonnegativity con-
Real Business Cycles model (Nir 18-714) Benchmark RBC =P = pj1 .
model is NCGM with endogenous hours, ownership of firms 0 straint on consumption will not bind:
and capital by households, and stochastic productivity At
The own-price elasticity of demand is 1
, ignoring the 1. U (0) = 0;
multiplying labor (so production function is F (kt , At nt )). 1
We specified: effects through P . 2. U is continuously differentiable;

We generally need to impose restrictions on first-period taxation, since otherwise the government will tax capital very highly then (since its supplied perfectly inelastically).

Note the Taylor approximation of c1 1 about = 1 is c1 1 (c0 1) c0 (log c)( 1) = (1 ) log c. Thus as 1, we have U (c) = (c1 1)/(1 ) log c.

40
3. U 0 (x) > 0 (U is strictly increasing); Normed vector space (Nir 6-56) A vector space S and a norm Contraction Mapping Theorem (Nir 8-5) [a.k.a. Banach fixed
4. U 00 (x) 0 (U is strictly concave); kk : S R such that: point theorem.] If T is a contraction on a complete metric
space, then
5. limx0+ U 0 (x) = ; 1. x S, kxk 0;
6. limx U 0 (x) = 0. 1. T has exactly one fixed point V such that T V = V ;
2. x S, kxk = 0 x = 0;
and
Transversality condition (Macro P.S. 2) In NCGM, 3. Triangle inequality: x, y S, kx + yk kxk + kyk. 2. The sequence {Vi } where Vi+1 = T Vi converges to V
limt t U 0 (ct )f 0 (kt )kt = 0. from any starting V0 .
4. Scalar multiplication: x S, R, kxk = || kxk.
No Ponzi condition (Max notes) Credit constraint on agents, Contraction on subsets (Nir 8-11) Suppose T is a contraction on
which should never bind but suffice to prevent the optimality Note any normed vector space induces a metric space with
(x, y) kx yk. a complete metric space (S, ), with fixed point V = T V .
of a doubling strategy. For example, t, at k for some Further suppose Y S is a closed set, that Z Y , and that
k. y Y , T y Z. Then V Z.
Supremum norm (Rn ) (Nir 6-78) kks : Rn R with kxks
Dynamic systems (Nir 12-411, Macro P.S. 5-4) Let the sequence supi=1,...,n |xi |.
Blackwells sufficient conditions for a contraction (Nir 8-
{xt } evolve according to xt+1 = W xt . Using eigen de-
Euclidean norm kkE : Rn R with 123) Blackwell gives sufficient (not necessary) conditions for
composition W = P P 1 , giving the decoupled system (Nir 6-9)
an operator to be a contraction on the metric space B(Rn , R)
P 1 xt+1 = P 1 xt . Then if xt P 1 xt , we have v (the set of bounded functions Rn R) with the sup norm.
xti = ti x0i and hence xt = P t X0 .
u n
An operator T is a contraction if it satisfies
uX
n
kxkE t |xi |n .
For 2 2 case, if i=1
1. Monotonicity: if x, f (x) g(x), then x, T f (x)
1. |1 | > 1 and |2 | > 1, then the system explodes un- T g(x); and
less x0,1 = x0,2 = 0 (source). Continuous function (Nir 6-10; Micro math 3) f : S R is contin-
uous at x iff y S and > 0, there exists > 0 such that 2. Discounting: there exists (0, 1) such that for all
2. |1 | < 1 and |2 | < 1, then the system converges for 0, f (), and x, we have T (f (x) + ) T (f (x)) +
any x0,1 and x0,2 (sink). ky xk < = |f (y) f (x)| < .
[slight abuse of notation].
3. |1 | < 1 and |2 | > 1, then the system converges for Equivalently, iff for all sequences xn converging to x, the
any x0,1 as long as x0,2 = 0 (saddle path). sequence f (xn ) converges to f (x). Principle of Optimality (Nir 9) Under certain conditions, the
solutions to the following two problems are the same:
The speed of convergence for a state variable equals one mi- Supremum norm (real-valued functions) (Nir 6-78) Let
1. Sequence problem: W (x0 ) = max{xt+1 } t
P
nus the slope of the optimal policy function (1 g 0 (k)). C(X) denote the set of bounded, continuous functions from t=0 F (xt , xt+1 )
X to R. Then kks : C(X) R with kf ks supxX |f (x)|. such that x0 given and t, xt+1 (xt ).
First Welfare Theorem (Nir 13-27) The competitive equilibrium
2. Functional equation: V (x) = maxx0 (x) [F (x, x0 ) +
is Pareto optimal. Then if we know our solution to the social Convergent sequence (Nir 7-4) The sequence {xi } i=0 in S con- V (x0 )].
planner problem is the unique Pareto optimal solution, we verges to x S (or equivalently, the sequence has limit x) iff
know it must equal the competitive equilibrium. > 0, there exists n such that i > n = kxi xk < . In Assume
other words, there is a point in the sequence beyond which
Ramsey equilibrium (Nir 20-3, final solutions) A Ramsey equilib- all elements are arbitrarily close to the limit. 1. (x) is nonempty for all x;
rium is allocations, prices, and taxes such that:
Cauchy sequence (Nir 7-9) The sequence {xi }
in S is Cauchy 2. For all initial conditions x0 and feasible plans {xt }, the
i=0
1. Households maximize utility subject to their budget iff > 0, there exists n such that i > n and j > n together limit u({xt }) limn t F (xt , xt+1 ) exists (although
constraints, taking prices and taxes as given, and imply that kxi xj k < . In other words, there is a point in it may be ).
2. Government maximizes households utility while financ- the sequence beyond which all elements are arbitrarily close
to each other. Every convergent sequence is Cauchy (shown Then:
ing government expenditures (i.e., meeting its budget
constraint). via triangle inequality), but not every Cauchy sequence is
1. If W (x0 ) is the supremum over feasible {xt } of u({xt }),
convergent.
then W satisfies the FE.
3.4 Dynamic programming mathematics Complete metric space (Nir 7-1016) A metric space (S, ) is 2. Any solution V to the FE that satisfies boundedness
complete iff every Cauchy sequence in S converges to some condition limn n V (xn ) = 0 is a solution to the
Metric space (Nir 6-34) A set S and a distance function : S point in S. Importantly, if C(X) is the set of bounded, con- SP.
S R such that: tinuous functions from X to R, and kks : C(X) R is the 3. Any feasible plan {xt } that attains the supremum in
sup norm, then (C(X), kks is a complete metric space. the SP satisfies W (xt ) = F (xt , xt+1 ) + w(xt+1 ) for
1. x, y S, (x, y) 0;
t.
2. x, y S, (x, y) = 0 x = y; Contraction (Nir 8-34) If (S, ) is a metric space, the operator
T : S S is a contraction of modulus (0, 1) iff x, 4. Any feasible plan {xt } that satisfies
3. x, y S, (x, y) = (y, x);
y S, we have (T x, T y) (x, y). That is, T brings any W (xt ) = F (xt , xt+1 ) + w(xt+1 ) for t and
4. Triangle inequality: x, y, z S, (x, z) (x, y) + two elements closer together. Every contraction is a contin- limn sup t W (xt ) 0 attains the supremum in
(y, z). uous mapping. the SP. [?]

41
So approach is: solve FE, pick the unique solution that sat- 3.5 Continuous time 1. x x (1 + x);
isfies boundeness condition, construct a plan from the policy 2. xy x y (1 + x + y);
corresponding to this solution, check the limit condition to Dynamic systems in continuous time (Nir 14-23, 7) In R, the

make sure this plan is indeed optimal for the SP. system 3. x y x
y (1 + x + y);

dxt dxt 4. f (x) f (x ) + f 0 (x )x x = f (x )(1 + x).


Dynamic programming: bounded returns (Nir 10) Given the xt = axt dt
= ax x
= a dt
SP/FE as above, assume:
log xt = at + c Given Y = F (K, L), log-linearization gives Yb = Y K K b +
1. x takes on values in a convex subset of Rl ,
and (x) is xt = x0 e at Y L L
b where Y X is the elasticity of Y with respect to X:
nonempty and compact-valued for all x. This implies 0 (K , L )X
FX
assumption 1 above. converges iff real(a) < 0. Y X .
F (K , L )
2. The function F is bounded and continuous; (0, 1). In Rn , the system xt = Axt x t = xt where xt = P 1 xt
Together with assumption 1, this implies assumption 2 and the eigen decomposition is A = P P 1 . Therefore
Note that x d
(log x log x ) = d
(log x) = x/x.
above. dt dt
e1 t
3. For each x0 , the function F (, x0 ) is increasing in each
of its first arguments. xt = et x0 =
..
x0 . 3.6 Uncertainty
.

e n t
4. () is monotone; i.e., x1 x2 = (x1 ) (x2 ). Uncertainty: general setup (Nir 16-36) s {y1 , . . . , yn } set
5. F is strictly concave. of possible states of economy. st realization at time t.
st (s0 , . . . , st ) history at time t. () probability func-
Q P
For 2 2 case (recall det A = i and tr A = i ),
6. is convex; i.e., if x01 (x1 ) and x02 (x2 ), then
tion.
x01 + (1 )x02 (x1 + (1 )x2 ).
det A < 0: Saddle path Often assume a Markov process: (st+1 |st ) = (st+1 |st );
7. F is continuously differentiable on the interior of the set det A > 0, tr A > 0: Unstable i.e., only the previous period matters. Process described by
on which it is defined. tr A < 0: Sink
Ptransition matrix P where Pij = (st+1 = yj |st = yi ) with
a
Then j Pij = 1 for all i. Invariant distribution is eigenvector
Linearization in continuous time (Nir 14-6, Max notes) xt = 1n = P .
f (xt ) = xt Df (x )(xt x ). [cf discrete time
1. Under assumptions 1 and 2, the operator T where
xt+1 = g(xt ) = xt+1 x Dg (x )(xt x ).] Lucas tree (Nir 16-224) Assume von Neumann-Morgenstern util-
T V (x) maxx0 (x) [F (x, x0 ) + V (x0 )] maps bounded
ity function, thus objective function
continuous functions into bounded continuous func-
NCGM
R int
continuous time (Nir 14-815) Maximize
tions, is a contraction (and therefore has a unique fixed X
t=0 e U (ct ) dt such that kt = Wt + Rt kt ct kt . X X
point), and is the value function for the corresponding t (z t )U (ct (z t )) = E0 t U (ct (z t )).
The Hamiltonian is given by:
FE. t=0 z t t=0
h i
2. Under assumptions 14, the value function V (unique H et U (ct ) + t [ Wt + Rt kt ct kt ] Price (at time 0) of a claim that delivers one unit at time t
fixed point of T as defined above) is strictly increasing. | {z }
given history z t is pt (z t ). Budget constraint (Arrow-Debreu)
Budget constraint w/o kt
3. Under assumptions 12 and 56, the value function V is is
t X X
strictly concave, and the corresponding optimal policy e U (ct ) + t [Wt + Rt kt ct kt ]. X X
pt (z t )ct (z t ) pt (z t )yt (z t )
correspondence g is a continuous function. t=0 z t t=0 z t
H H d
4. Under assumptions 12 and 57, the value function V is First order conditions ct
= 0 and kt
= dt (et t ) =
H
where yt is a (stochastic) endowment. Normalizing p0 = 1
differentiable (by Benveniste-Scheinkman) and satisfies et (t t ) (or equivalently, k = t ), along with TVC gives FOC
envelope condition V 0 (x0 ) = F |
x (x0 ,g(x0 )
. limt e t
t
t kt = 0 (or equivalently, limt t kt = 0),
characterize the solution. Solving gives: t (z t )U 0 (yt (z t )) U 0 (yt (z t ))
Benveniste-Scheinkman Theorem (Nir 10-25) Suppose pt (z t ) = = t (z t ) .
U 0 (y0 )
U 0 (ct )
1. X Rl is a convex set; ct = U 00 (ct )
( Rt + )
Risk-free bond pricing (Nir 16-256) An asset purchased at time
2. V : X R is a concave function; kt = Wt + Rt kt ct kt . t that pays 1 consumption unit at time t + 1. Price is:
| {z }
3. x0 interior(X); and D is a neighborhood of x0 , with =yt by CRS P t
D X; zt+1 pt+1 (zt+1 , z )
qtRF (z t ) =
Imposing functional forms for U () and F () and log- t
pt (z )
4. Q : D R is a concave differentiable function with
linearizing allows us to get a continuous time dynamic system X (zt+1 , z t ) U 0 (yt+1 (zt+1 , z t ))
Q(x0 ) = V (x0 ) and x D, Q(x) V (x).
for kt and ct . =
z
t (z t ) U 0 (yt (z t ))
Then V () is differentiable at x0 with V 0 (x0 ) = Q0 (x0 ). t+1
Log-linearization (Nir 14-24, Max notes) Write every variable x as
Et [U 0 (yt+1 (zt+1 , z t ))]
The envelope condition of a value function is sometimes elog x and then linearize in log x about steady state log x . = .
called the Benveniste-Scheinkman condition. Use notation x log x log x xx
. For example, U 0 (yt (z t ))
x

42
Stock pricing (Nir 16-267) An asset purchased at time t that pays x = Ra (x ) + y, the RHS of which is stochastic. So endowment that has no aggregate shock. Neither can com-
dividend dt (z t ). Price is: x . mit to remain in an insurance contract. Value to insurer in
excess of autarky is:
P P s s Thus in an infinite-horizon model, complete and incomplete
s=t+1 z s ps (z )ds (z )
qttree (z t ) = t
markets with R = 1 necessarily look different. Q(0 , s0 ) =
pt (z ) X
max u(1 c) u(1 y(s0 )) + (s)Q((s), s)
X U 0 (ys (z s )) Halls martingale hypothesis (Manuel) Suppose u(c) = Ac
= Et st 0 ds (z s ). Bc2 (which may be a second-order Taylor expansion of the
c,{(s)}s
sS
s=t+1
U (yt (z t ))
true utility function), and that only a risk-free bond is avail- such that
able, with R = 1. Intereuler gives that E ct+1 = ct ; that is, X
[PK ()] : u(c) u(y(s0 )) + (s)(s) 0 ,
3.7 Manuel Amador {ct } follows an AR(1) process.
sS
However, many models have c today as a predictor of c to- [IC-1 ((s)(s))] : (s) 0,
Lucas cost of busines cycles (Manuel) We find that the wel- morrow, even without the same intereuler (e.g., the Aiyagari
fare impact of eliminating the business cycle is very small model). Tests are always rejected, but are generally run with [IC-2 ((s)(s))] : Q((s), s) 0.
(< 0.04%), especially compared to the impact of raising the aggregate data, with resulting problems of
growth rate of consumption. Complaints include the facts
FOCs and envelope condition give that
that: 1. Goods aggregation (i.e., how do you define consumption
given a basket of goods?), Q0 (0 , s0 ) = (s) + (1 + (s))Q0 ((s), s),
1. A representative agent is bad for measuring utility and
2. Agent aggregation (i.e., even if intereuler holds for each thus either consumption stays the same, increases the the
estimating the variance in consumption.
of hetrogeneous agents, it may not hold in aggregate), lowest level that satisfies IC-1, or falls to the highest level
2. The model treats shocks as transitory; treating con- that satisfies IC-2.
sumption as persistent results in significantly higher 3. Time aggregation (i.e., how do you deal with data that
cost estimates. is in discrete time period chunks?). Bulow-Rogoff defaultable debt model (Manuel) An agent
(who need only have monotone preferences) cannot com-
3. Uncertainty could also affect the consumption growth mit to repay a risk-neutral insurer; if he defaults, he will
The presence of durable goods introduces an MA component
rate, which requires an endogenous growth model (not only be able to save in a complete-markets Swiss bank
to {ct }, making it an ARMA process.
the NCGM). account that pays expected return R.
Incomplete markets: finite horizon (Manuel) Suppose a two One-sided lack of commitment (Manuel) One reason markets If wealth
might be incomplete is a lack of commitment. Suppose risk-
XX y(s )
period model with uncertain income in the second period, W (st ) (s |st )
and incomplete markets in which only a risk-free bond is averse consumuer can borrow from a risk-neutral lender, but t s
R t
available at return R. The consumer chooses savings a to cant commit to repayment. Suppose R = 1; lender takes
stochastic endowment and gives consumer consumption c(s) is finite for allst ,and we impose no Ponzi condition (natural
maximize u(y0 a) + E u(y1 + Ra), yielding FOC (in- borrowing constraint) that debt
tereuler) today, promising consumer future lifetime value w(s). Thus
u0 (y0 a ) = R E u0 (y1 + Ra ). value to lender is: XX p(s )
D(st ) (s |st ) t W (st )
X t s
R
If instead there were no uncertainty in the second period, the
 
P (v) = max (s) y(s) c(s) + P (w(s))
analogous FOC would be {c(s),w(s)}s
sS for all st , then debt will always be nonpositiveif the agent
were ever a borrower, he would do better to defect.
u0 (y0 a) = Ru0 (E y1 + Ra). such that
Thus to get (e.g., international) debt, we need either
By Jensens inequality, optimal savings is higher in the un-
X  
[PK ()] : (s) u(c(s)) + w(s) v, 1. Infinite wealth in some state st ,
certain world (a > a) as long as u000 > 0. sS 2. A limited ability to save in nearly-complete markets, or
Thus in a two period model, the difference between savings [IC ((s))] : u(c(s)) + w(s) u(y(s)) + vautarky . 3. Lenders ability to punish.
in complete and incomplete markets depends on the sign of | {z }
u000 . E
u(y(s0 )) Hyperbolic discounting (Manuel) Agent has a higher discount
1
rate between today and tomorrow than between tomorrow
Incomplete markets: infinite horizon (Manuel) Again, we and following periods (time-inconsistent). Often parameter-
suppose uncertain income, and the availability of only a FOCs and envelope condition give that ized as - discounting:
risk-free bond. Taking FOC and envelope condition of
1 1
+ (s0 ),
X
= 0 u(ct ) + u(ct+ ).
u0 (c(s0 )
X
V (x) = max u(x a) + (s)V [Ra + y(s)] u (c(s)
=1
a[,x]
sS
so consumption is weakly increasing over time. This is not Caution: Suppose an agent only likes candy bars when hes
gives that V 0 (x) R E[V 0 (x0 )]. Thus if R 1, we consistent with G.E., where we therefore must have R < 1. healthy. If he knows hes healthy today, he may prefer 1 to-
have V 0 (x) is a nonnegative supermartingale and converges day to 2 tomorrow, but 2 the day after tomorrow to 1 tomor-
to a finite value by Dobbs Convergence Theorem. Thus Two-sided lack of commitment (Manuel) [Kocherlakota] Sup- row. This may wind up looking like hyperbolic discounting,
x converges, but cannot converge to a finite value since pose two risk-averse consumuers who each get a stochastic but isnt.

43
3.8 John Taylor 6. Use 0 = H0 + P
0 as another equation to solve for 4 Mathematics
solution.
Taylor Rule (Taylor) A policy rule (or family of policy rules given
different coefficients) for the nominal interest rate as a func- Cagan model (Taylor) Rational expectations model given by: 4.1 General mathematical concepts
tion of (four-quarter average) inflation and the GDP output
gap (expressed as a percentage deviation from trend): mt pt = (pt+1 pt ). Elasticity (Wikipedia) Elasticity of f (x) is
i = + 0.5y + 0.5( 2) + 2
Derived from Cagan money demand equation with
log f (x) f x f /f xf 0 (x)

= 1.5 + 0.5y + 1.
log x x f (x) x/x f (x) .
= = =
|{z}
>1 1. Either zero coefficient on yt , or yt = 0 for all t, and
The greater than one principal suggests that the coefficient 2. rt ignored in it = rt + t+1 = rt + pt+1 pt .
on inflation should exceed one. Eulers law (Producer 14) If f is differentiable, it is homogeneous
Price reactions to money supply shocks are not quite one- of degree k iff p f (p) = kf (p).
Granger causality (Taylor) pt Granger causes yt iff the er- for-one, since theres an anticipated future deflation which
ror from predicting yt based on lagged y and p causes people to hold some of the extra money. One direction proved by differentiating f (p) = k f (p) with
2
(prediction (yt |pt1 , pt2 , . . . , yt1 , yt2 , . . . )) is less than respect to , and then setting = 1. Implies that if f is ho-
the prediction error from predicting yt based on lagged y only mogeneous of degree one, then f is homogeneous of degree
Dornbush model (Taylor) Rational expectations model given by:
2
(prediction (yt |yt1 , yt2 , . . . )). Note this is not a philo- zero.
sophical point, and does not address causation vs. correla- mt pt = (et+1 et )
tion issues. Strong set order (Producer 32) A B in the strong set order iff
pt pt1 = (et pt ).
That is, the hypothesis that p does not Granger cause y is for all a A and b B with a b, then a B and b A.
the hypothesis that coefficients on lagged ps are all zero. Equivalently, every element in A \ B is every element in
(Note higher e is currency depreciation.)
A B, which is every element in B \ A.
Okuns Law (Taylor) Prices adjust gradually, but the exchange rate overshoots.
By first equation, if money supply is shocked there must be Meet (Producer 36) For x, y Rn , the meet is x y
Y Y
= 2.5(u u ) expected future currency appreciation (e ); but there must (min{x1 , y1 }, . . . , min{xn , yn }). More generally, on a par-
Y be long run depreciation (e ) by second equation. tially ordered set, x y is the greatest lower bound of x and
where Y is potential GDP and u is the natural rate of y.
unemployment. Philips Curve (Taylor) Connection between inflation and either
GDP gap or unemployment (by Okuns Law).
Cagan money demand (Taylor) mt pt = ayt bit , where money Join (Producer 36) For x, y Rn , the meet is x y
supply m, price level p, and output gap y are all logs. LHS 1. Traditional (short-run): (max{x1 , y1 }, . . . , max{xn , yn }). More generally, on a par-
is real money balance; when interest rate is higher, demand tially ordered set, x y is the least upper bound of x and
y.
for money is lower (semi-log specification). = sr y = 2.5sr (u u ).
Note Lucas prefers log-log specification mt pt = ayt
b log(it ). 2. Expectations-augmented (long-run): Sublattice (Producer 37) A set X is a sublattice iff x, y X, we
have x y X and x y X. Any sublattice in Rn can be
Rational expectations models (Taylor) Variables in system are described as an intersection of sets of the forms
= + lr y = 2.5lr (u u ).
a function of exogenous shocks, lags, and expected future
values of variables. General solution method:
Time inconsistency (Taylor) Three types of policy plans 2 that 1. A product set X1 Xn ; or
1. Set up in vector notation (may require time shifting and aim to maximize social welfare S(x1 , x2 , 1 , 2 ) where are 2. A set {(x1 , . . . , xn ) : xi g(xj )}, where g() is an in-
substitution. policies and x are responses (including expectations of pol- creasing function.
2. Use method of undetermined coefficients to get a (de- icy):
terministic) difference equation in the coefficients (i )
1. Consistent (discretion): Take x1 as given when choosing Orthants of Euclidean space (?)
on the MA() representation of variables.
2 .
3. Guess a form for the particular solution, and solve for
2. Optimal (rule-based): Recognize that x1 responds to 1. Rn
+ {x : x 0} {x : xi 0 i}, which includes the
coefficients.
2 . axes and 0.
4. Cholesky decompose the coefficient matrix (A =
HH 1 ) in the homogenous part of the determinis- 3. Inconsistent (cheating): Promise optimal, but then 2. {x : x > 0} {x : xi 0 i} \ 0, which includes the
tic difference equation to get a decoupled equation (in name consistent. axes, but not 0.
i H 1 iH ).
3. Rn
++ {x : x  0} {x : xi > 0 i}, which includes
5. Apply stability condition to pin down some element(s) Lucas supply function (Taylor) yt = (t t ) + t . Can be
neither the axes nor 0.
of 0 . seen directly from Expectations-augmented Philips curve.

44
Sum of powers (?) 4.4 Functions Taylor series (C&B 5.5.201) If g(x) has derivatives of order r (i.e.,
n g (r) (x) exists), then for any constant a the Taylor polynomial
X n(n + 1)
i= Function (Math camp) A binary relation that is of order r about a is:
i=1
2
n 1. Well defined on its range: x, y such that xRy; r
X n(n + 1)(2n + 1) g (i) (a)
i2 =
X
2. Uniquely defined on its range: xRy xRz = y = z. g(x) Tr (x) (x a)i .
i=1
6 i!
i=0
n Surjective Range is Y . (a.k.a. onto.)
X n2 (n + 1)2 (Math camp)
i3 =
4 The remainder from the approximation Taylor approxima-
i=1 Injective (Math camp) x 6= x0 = f (x) 6= f (x0 ).
tion, g(x) Tr (x) equals ax g (r+1) (t) (x t)r /r! dt. This
R
n
X n(n + 1)(2n + 1)(3n2 + 3n 1) error always tends to 0 faster than the highest-order explicit
i4 = Bijective (Math camp) Both surjective and injective. (a.k.a. one-
30 to-one.) termi.e., limxa [g(x) Tr (x)]/(x a)r = 0.
i=1

Geometric series (?) Monotone (C&B p. 50) g(x) monotone on its domain iff either u > Derivative of power P series (Hansen 2-29) If g(t) a0 + a1 t +
n v = g(u) > g(v) (increasing), or u < v = g(u) < g(v)
X
ia(1 Rn+1 ) a2 t2 + = i ai ti , then
aR = (decreasing). A monotone function is one-to-one and onto
1R
i=0 from its domain to its image (note, not necessarily to its
range). k

X a
Ri = (if |R| < 1) g(t) = ak k!.
1R tk
t=0
i=0 Convex function (Hansen 2-22) g : Rn R is convex iff x, y
Rn , [0, 1], g(x) + (1 )g(y) g(x + (1 )y).
4.2 Set theory This is useful for calculating moments if we can express the
Binomial theorem (C&B 3.2.2)
P For nany x, y R and n Z, mgf as a power series.
n 0, then (x + y)n = n
 i ni
Supremum lim supn An limn An i=0 i x y . Special cases: 1 =
T Slimit (Math camp)
n i n
(p + (1 p))n = n
 ni and 2n =
Pn

P
n=1 k=n Ak . Set of all elements appearing in an infinite i=0 i p (1 p) i=0 i .
Taylor series examples (?) f (x) f (x0 ) + f (x0 ) (x x0 ).
number of Ai .
Gamma function (C&B p. 99) () 1 et dt
R Note that for concave (convex) f (), the LHS is weakly less
0 t =
(greater) than the RHS.
Infimum
S limit (Math lim inf n An
camp) limn An R1 1 dt on > 0. ( is also defined everywhere
T 0 [log(1/t)]
n=1 k=n Ak . Set of all elements appearing in all but else except for 0, 1, 2, . . . .)
a finite number of Ai . For small , we have:

Partition (Math camp) {A } such that A A and 1. ( + 1) = (), when > 0.


en 1 + n;
2. (n) = (n 1)! for integer n > 0.
1. Non-empty: A 6= ; (1 + )n 1 + n;
2. Exhaustive:
S
A = A; 3. ( 12 ) = .
log(1 + ) ;
3. Non-overlapping: Ai 6= Aj = Ai Aj = .
a 1
20
gamma(x)
log a.
Convex set (Math camp) x and x0
S, t [0, 1], tx + (1 t)x0
15
S (i.e., the linear combination of any two points in S is also 10

in S.) 5

0
Second-order is
-5

4.3 Binary relations -10

-15
f (x) f (x0 ) + f (x0 ) (x x0 ) + 12 (x x0 ) 2 f (x0 )(x x0 ).
Complete (Math camp) aRb bRa. -20
-4 -2 0 2 4

Transitive (Math camp) aRb bRc = aRc.


Beta function (C&B eq. 3.3.17) B(, ) () ()/( + ). Mean value expansion (Mahajan 3-13; Hayashi 4701) If h : Rp
Symmetric (Math camp) aRb bRa.
Rq is a continuously differentiable function, then h() admits
Negative transitive (Math camp) aRb = cRb aRc. Logistic function (D&M 515; Metrics P.S. 5-4a)
a mean value expansion about

Reflexive aRa. ex
(Math camp)
(x) (1 + ex )1 .
1 + ex h(b) = h() +
h(b)
(b )
Irreflexive (Math camp) a 6= b = aRb. b
Inverse is 1 () `() log[/(1 )]. First derivative
Asymmetric (Math camp) aRb b 6= c = cRa.
where b is a value lying between b and (note the MVT ap-
ex
Equivalence relation (Math camp) A binary relation that is re- (x) 0 (x) = (x)(1 (x)) = (x)x. plies to individual elements of h(), so b actually differs from
flexive, symmetric, and transitive. (1 + ex )2 element to element in the vector equation.

45
Pn Pm
4.5 Correspondences Determinant (Amemiya Ch. 11; Greene 236) |A| 6. tr(Anm A0 ) = tr(A0 A) = i=1
2
j=1 aij =
i+j a |A |, where j is an arbitrarily chosen in-
P Pm m
0a = 0 ), where a are the columns
i (1)
P
ij ij a
j=1 j j j=1 tr(a a
j j i
Correspondence (Micro math 5) : X Y is a mapping from teger and Aij is the matrix that deletes the ith row and jth of A.
elements of X to subsets of Y (i.e., for x X, we have column from A.
(x) Y ). Caution: in general, tr(AB) 6= tr(A) tr(B).
This is the volume of the parallelotope with edges along the
columns of A.
Lower semi-continuity (Micro math 7; Clayton G.E. I) : X Y Linear independence, rank (Amemiya Ch. P 11; Greene 203, 39) Vec-
is lower semi-/hemi-continuous at x X iff for every open tors {xi } linearly independent iff i ci x i = 0 = i,
1. Determinant of a 2 2 matrix A is a11 a22 a12 a21 .
set G Y containing (x), there is an open set U (x) X ci = 0.
containing x such that if x0 U (x), then (x0 ) G 6= . 2. Determinant of a 3 3 matrix A is a11 a22 a33
a11 a32 a23 a21 a12 a33 + a21 a32 a13 + a31 a12 a23 1. The following are equivalent and called nonsingularity
Intuitively, any element in (x) can be approached from a31 a22 a13 . of square A:
all directions.
3. |A| = |A0 |. |A| 6= 0, A1 exists;
For a single-valued correspondence (i.e., a function), lsc
4. For diagonal A, |A| =
Q
aii ; thus |I| = 1. A is column independent or row independent;
usc continuous. i
y, x, Ax = y;
5. |cAnn | = cn |A|.
Upper semi-continuity (Micro math; Clayton G.E. I; Bernheim 32) Ax = 0 = x = 0.
: X Y is upper semi-/hemi-continuous at x X iff 6. If A, B both n n, then |AB| = |A| |B|; together with
|I| = 1, this gives |A1 | = |A|1 . 2. Column rank (maximum number of linearly indepen-
for every open set G Y containing (x), there is an open dent columns) equals row rank, which implies rank A =
set U (x) X containing x such that if x0 U (x), then 7. If any column/row of A is all zeroes, |A| = 0; switch- rank A0 .
(x0 ) G. ing two adjacent columns/rows changes the sign of the
3. Anm is full rank iff rank equals min(n, m); a square
Identically, if for any sequences xt x and yt y with determinant; if two c/rs are identical, the determinant
matrix is full rank iff it is nonsingular.
yt (xt ) for all t, we have y (x). is zero.
4. rank(A) = rank(A0 A) = rank(AA0 ); in particular,
Intuitively, (x) does not suddenly contain new points; 8. For any eigenvalue , |A I| = 0.
Anm with m n (i.e., tall) is full rank iff A0 A
i.e., the graph of the function is a closed set.
Q
9. |Ann | = i i (i.e., determinant is the product of is nonsingular.
For a single-valued correspondence (i.e., a function), usc eigenvalues).
5. A is nonsingular = rank(AB) = rank(B).
lsc continuous.
Matrix inverse (Amemiya Ch. 11; Greene 301) A1 A = AA1 = I 6. rank(Ann ) equals the number of nonzero eigenvalues
Brouwers Fixed Point Theorem (Clayton G.E. I) If a function for any matrix A such that |A| 6= 0. (not necessarily distinct).
f : A A is continuous, and A is compact and convex, then
1. If A, B both n n, and |A| =
6 0, |B| 6= 0, then Eigen decomposition (Mathworld) For any square A with a di-
f (x) = x for some fixed point x A.
(AB)1 = B1 A1 . agonal matrix containing eigenvalues of A, and P a matrix
whose columns are the eigenvectors of A (ordered as in ),
Kakutanis Fixed Point Theorem (Clayton G.E. I; Bernheim 32) 2. If A1 exists, then (A1 )0 = (A0 )1 .
then A = PP1 .
Suppose S is compact and convex. : S S is convex-
3. The inverse of a 2 2 matrix is
valued and upper semi-continuous. Then there exists a fixed Orthogonal decomposition (Amemiya Ch. 11; Greene 378) a.k.a.
point s S such that s (s ).  1    
spectral decomposition. For any symmetric A, Hnn ,
a b 1 d b 1 d b
= = . nn such that A = HH0 , where H0 H = I (i.e., H is
c d |A| c a ad bc c a
orthogonal) and is diagonal. Note that H0 AH = .
4.6 Linear algebra
4. For diagonal A, the inverse is diagonal with elements This is just the eigen decomposition for a symmetric matrix;
Matrix (Amemiya Ch. 11) Matrix Anm {aij } has n rows and m a1
ii . here the matrix of eigenvectors is orthogonal.
columns.
Orthogonal matrix (Amemiya Ch. 11) A square matrix Ann is or- Matrix squareroot (Greene 42) Given a positive definite symmet-
1. Square iff n = m. thogonal iff A0 = A1 . This corresponds to having columns ric matrix A = HH0 , we can get a (symmetric!) matrix
orthogonal and normalized. squareroot A1/2 H1/2 H0 , where 1/2 is the diagonal
2. Transpose A0 {aji } is m n. matrix that takes the squareroot of diagonal elements of .
P
3. Symmetric iff square and i, j, aij = aji , or equiva- Trace (Amemiya Ch. 11; Metrics section; Greene 412) tr(Ann ) i aii
lently iff A = A0 . (i.e., the sum of the diagonal elements). Eigenvector, eigenvalue (Amemiya Ch. 11; Woodford 6702) Note
some results may only hold for symmetric matrices (not made
4. Diagonal iff i 6= j, aij = 0 (i.e., nondiagonal elements 1. tr(cA) = c tr(A). explicit in Amemiya). Let the orthogonal decomposition of
are 0). symmetric A be A = HH0 , where H is orthogonal and
2. tr(A0 ) = tr(A).
is diagonal.
Matrix multiplication (Amemiya Ch. 11) If c is a scalar, cA = 3. tr(A + B) = tr(A) + tr(B).
Ac = {caij }. If A is n m and
P B is m r, then C = AB is 1. Diagonal elements of D(i ) are eigenvalues (a.k.a.
4. tr(Anm Bmn ) = tr(BA).
an n r matrix with cij = m k=1 aik bkj . If AB is defined, characteristic roots) of A; columns of H are the corre-
then (AB)0 = B0 A0 .
P
5. Trace is the sum of eigenvalues: tr(Ann ) = i i . sponding eigenvectors (a.k.a. characteristic vectors).

46
2. If h is the eigenvector corresponding to , then Ah = 7. A 0 = |A| 0 and A > 0 = |A| > 0 Partitioned matrices (Hayashi 670673; Greene 323)
h and |A I| = 0. (This is an alternate definition note the natural analogues for negative (semi)definite 
A11 A12

1
for eigenvectors/eigenvalues.) matrices do not hold. A21 A22 = |A22 | |A11 A12 A22 A21 |

3. Matrix operations f (A) can be reduced to the corre-
sponding scalar operation: f (A) = HD[f (i )]H0 .
Orthogonal completion (Hansen 5-24) For a wide matrix Ank = |A11 | |A22 A21 A1 11 A12 |;
(i.e., n < k) with full row rank (i.e., all rows are linearly inde-
P
pendent), we can construct a (non-unique) (k n)k matrix A11 A1N c1 i A1i ci

4. rank(Ann ) equals the number of nonzero eigenvalues
(not necessarily distinct). A such that: . .. .. .. = .
.. . . P .. ;

.
5. Anm Bmn and BA have the same nonzero eigenval- AM 1 AM N cN A c
1. The rows of A and A are linearly independent (i.e., i M i i
ues.
|(A0 , A0 )| 6= 0;
A11 c1

A11 c1

6. Symmetric Ann and Bnn can be diagonalized by the
2. The rows of A are orthogonal to the rows of A (i.e., .. . ..
.. = ;

same orthogonal matrix H iff AB = BA. .
A A0 = 0knn , or identically, AA0 = 0nkn ). .

Q
7. |Ann | = i i (i.e., determinant is the product of
AM M cN AM M cM
eigenvalues). For a tall matrix Bnk (i.e., n > k) with full column rank
A11 B11

P (i.e., all columns are linearly independent), we can construct
8. tr(Ann ) = i i (i.e., trace is the sum of eigenval- Adiag Bdiag = .
a (non-unique) n (n k) matrix B such that: .. ;

ues).
AM M BM M
9. A and A1 have same eigenvectors and inverse eigen- 1. The columns of B and B are linearly independent (i.e.,
values. |(B, B )| 6= 0;
A0diag BAdiag
2. The columns of B are orthogonal to the columns
A 2 2 matrix A has two explosive eigenvalues (outside the
of B (i.e., B0 B = 0nkk , or identically, B0 B = A011 B11 A11 A011 B1M AM M

unit circle) iff either
0knk ). = . .. .
.. .. ;

.
Case I:
Idempotent matrix (Hansen 5-334; Amemiya 37; Hayashi 301, 367, A0M M BM 1 A11 A0M M BM M AM M
244-5) P is idempotent iff PP = P.
|A| + tr A 1,
|A| tr A 1, 1. If P is idempotent, then so is I P. A0diag Bc
|A| 1. A011 B11 c1 + . . . + A011 B1M cM

2. Every eigenvalue of a (symmetric?) idempotent matrix
is either 0 or 1. = ..
;

Case II: .
3. A symmetric and idempotent matrix is positive semidef- 0 0
AM M BM 1 c1 + . . . + AM M BM M cM
|A| + tr A 1, inite. 1
|A| tr A 1, 4. For any Xnk , we can find an idempotent projection A11
matrix (that projects onto the subspace of Rn spanned A1
..
diag =
.
|A| 1 (trivially). .

by the columns of X): PX X(X0 X)1 X0 .

1
AM M
Positive definiteness, &c. (Amemiya Ch. 11; Greene 469) Symmet- 5. For any Xnk tall (i.e., n > k), we can find an idempo-
ric A is positive definite iff x 6= 0, x0 Ax > 0. Also written tent projection matrix (that projects onto the subspace The inverse of a 2 2 partitioned matrix is:
A > 0; if A B is positive definite, write A > B. If equal- of Rn not spanned by columns of X): I PX = PX .  1
ity isnt strict, A is positive semidefinite (a.k.a. nonnegative A11 A12
6. rank(P) = tr(P) if P is idempotent. A21 A22
definite). Negative definite and negative semidefinite (a.k.a.
 1
nonpositive definite) similarly defined. A11 + A1 1
A1

Matrix derivatives (Ying handout; Greene 5153; MaCurdy) [Note
= 11 A12 F2 A21 A11 11 A12 F2 ,
derivative convention may be transpose of MWG.] F2 A21 A1
11 F2
1. A symmetric matrix is positive definite iff its eigenval-
ues are all positive; similar for other (semi-)definiteness. 1.
x
Ax = A0 . where F2 (A22 A21 A1 11 A12 )
1 . The upper left block

2. If A > 0, all diagonal elements are > 0 (consider a 0 can also be written as F1 (A11 A12 A1
22 A21 )
1
2. x
x Ax = (A + A0 )x.
quadratic form with a unit vector).
3.
x0 Ax = xx0 . Kronecker product (Hayashi 673; Greene 345) For AM N and
3. B0 B 0; if B has full column rank, then B0 B > 0. A BKL ,
4. A > 0 = A1 > 0 (i.e., the inverse of a positive 4.
log |A| = A0 1 . a11 B a1N B

A
definite matrix is positive definite). . .. ..
5.
log |A| = A0 . AB= . .
A1 . .
5. Bnk tall (i.e., n k), Ann 0 = B0 AB 0. aM 1 B aM N B

If B is full rank (i.e., rank(B) = k), then Ann > 6. w
log |A| = |A|1 w |A| = tr[A1 w
A] (may re-
0 = B0 AB > 0. quire symmetric A?). is M K N L. Operation is not commutative.

6. For Ann and Bnn both positive definite, A 7. w
A1 = A1 [ w A]A1 (may require symmetric 1. (A B)(C D) = AC BD (assuming matrix multi-
B B1 A1 , and A > B B1 > A1 . A?). plications are conformable);

47
1
2. (A B)0 = A0 B0 ; Deviation from means (Hansen 5-345;P Greene 145) M x is the is the symmetric idempotent matrix with 1 n
on diagonal,
3. (A B)1 = A1 B1 ; vector of xi x, and x0 M x = i (xi x)2 , where and n 1
off diagonal.
4. tr(AM M BKK ) = tr(A) tr(B);
5. |AM M BKK | = |A|M |B|K . M I (0 )1 0 = I 1 0
n

5 References MaCurdy MaCurdy: Economics 272 notes


Mahajan Mahajan: Economics 270/271 notes
Amemiya Amemiya: Introduction to Econometrics and Statistics Manuel Amador: Economics 211 lectures
Bernheim Bernheim: Economics 203 notes Math camp Fan: Math camp lectures
C&B Casella, Berger: Statistical Inference, 2 ed. MathWorld mathworld.wolfram.com
Choice Levin, Milgrom, Segal: Introduction to Choice Theory notes Max notes Floetotto: Economics 210 section
Clayton Featherstone: Economics 202 section notes Micro Math Levin, Rangel: Useful Math for Microeconomists notes
Consumer Levin, Milgrom, Segal: Consumer Theory notes MWG Mas-Collell, Whinston, Green: Microeconomic Theory
D&M Davidson, MacKinnon: Estimation and Inference in Econometrics Nir Jaimovich: Economics 210 notes
G.E. Levin: General Equilibrium notes Producer Levin, Milgrom, Segal: Producer Theory notes
Greene Greene: Econometric Analysis, 3 ed. Taylor Taylor: Economics 212 notes
Hansen Hansen: Economics 270/271 notes Uncertainty Levin: Choice Under Uncertainty notes
Hayashi Hayashi: Econometrics Wikipedia wikipedia.com
Jackson Jackson: Economics 204 lectures Woodford Woodford: Interest and Prices


In Greene, M is called M0 .

48
Index
() (Gamma function), 45 Autoregressive conditional heteroscedastic process, 17
() (standard normal cdf), 7 Autoregressive process of degree p, 16
3 (skewness), 7 Autoregressive process of degree one, 16
4 (kurtosis), 7 Autoregressive/moving average process of degree (p, q), 16
- discounting, 43 Axioms of Probability, 4
2 distribution, 8
(expected value), 6 B() (Beta function), 45
n (central moment), 7 B1 (Borel field), 4
0n (moment), 7 Banach fixed point theorem, 41
() (characteristic function), 7 Basic set theory, 4
() (standard normal pdf), 7 Basus theorem, 11
(correlation), 7 Bayes Rule, 4
(standard deviation), 7 Bayesian Nash equilibrium, 36
2 (variance), 7 Benveniste-Scheinkman Theorem, 42
0-1 loss, 14 Bernoulli utility function, 32
2SLS (Two-Stage Least Squares), 21 Bertrand competition, 36
3SLS (Three-Stage Least Squares), 22 Bertrand competitionhorizontal differentiation, 36
Bertrand competitionnon-spatial differentiation, 36
Absolute risk aversion, 33 Bertrand competitionvertical differentiation, 36
Absolute value loss, 14 Best linear predictor, 18
Action space, 14 Best linear unbiased estimator, 18
Adding-up, 29 Best unbiased estimator, 13
Additively separability with stationary discounting, 40 Beta function, 45
Adverse selection, 38 Biases affecting OLS, 18
Aggregating consumer demand, 32 Big O error notation, 10
AIIA (Arrows independence of irrelevant alternatives), 39 Bijective, 45
Almost sure convergence, 10 Billingsley CLT, 16
Alternate hypothesis, 14 Binary response model, 24
Analogy principle, 11 Binomial theorem, 45
Ancillary statistic, 11 Bivariate normal distribution, 8
Annihilator matrix, 18 Blackwells sufficient conditions for a contraction, 41
Anonymity, 39 BLP (best linear predictor), 18
AR(1) (autoregressive process of degree one), 16 BLUE (best linear unbiased estimator), 18
AR(p) (autoregressive process of degree p), 16 BNE (Bayesian Nash equilibrium), 36
ARCH (autoregressive conditional heteroscedastic) process, 17 Bonferronis Inequality, 25
ARMA process of degree (p, q), 16 Booles Inequality, 25
Arrows independence of irrelevant alternatives, 39 Borel field, 4
Arrows Theorem, 39 Borel Paradox, 5
Arrow-Pratt coefficient of absolute risk aversion, 33 Brouwers Fixed Point Theorem, 46
Ascending price auction, 37 Brownian motion, 24
Associativity, 4 BS (Benveniste-Scheinkman Theorem), 42
Asymmetric, 45 Budget set, 31
Asymptotic equivalence, 10 BUE (best unbiased estimator), 13
Asymptotic normality for GMM estimator, 13 Bulow-Rogoff defaultable debt model, 43
Asymptotic normality for M-estimator, 13
Asymptotic normality for MD estimator, 13 Cagan model, 44
Asymptotic normality for MLE, 12 Cagan money demand, 44
Asymptotic properties of OLS estimator, 18 CARA (constant absolute risk aversion), 33
Asymptotic significance, 15 Carrier axiom, 38
Asymptotic size, 15 Cauchy sequence, 41
Asymptotic variance, 12 Cauchy-Schwarz Inequality, 25
Asymptotically efficient estimator, 12 cdf (cumulative distribution function), 5
Asymptotically normal estimator, 12 CE (correlated equilibrium), 35
Autocovariance, 15 Censored response model, 25

49
Central Limit Theorem for MA(), 16 Continuous r.v., 5
Central Limit Theorem for ergodic stationary mds, 16 Contract curve, 33
Central Limit Theorem for iid samples, 10 Contraction, 41
Central Limit Theorem for niid samples, 10 Contraction Mapping Theorem, 41
Central Limit Theorem for zero-mean ergodic stationary processes, 16 Contraction on subsets, 41
Central moment, 7 Convergence almost surely, 10
Certain equivalent, 33 Convergence in Lp , 10
Certain equivalent rate of return, 33 Convergence in distribution, 10
Characteristic function, 7 Convergence in mean square, 10
Characteristic root, 46 Convergence in probability, 9
Characteristic vector, 46 Convergence in quadratic mean, 10
Chebychevs Inequality, 25 Convergent sequence, 41
Chi squared distribution, 8 Convex function, 45
Choice rule, 27 Convex game, 38
CLT (Central Limit Theorem) for MA(), 16 Convex preference, 28
CLT (Central Limit Theorem) for ergodic stationary mds, 16 Convex set, 45
CLT (Central Limit Theorem) for iid samples, 10 Convexity, 28
CLT (Central Limit Theorem) for niid samples, 10 Convolution formulae, 6
CLT (Central Limit Theorem) for zero-mean ergodic stationary processes, 16 Core, 37
CMT (Continuous Mapping Theorem), 9 Core convergence, 37
CMT (Contraction Mapping Theorem), 41 Correlated equilibrium, 35
Cobb-Douglas production function, 40 Correlation, 7
Coefficient of absolute risk aversion, 33 Correspondence, 46
Coefficient of relative risk aversion, 33 Cost function, 29
Commutativity, 4 Cost minimization, 29
Compensated demand correspondence, 31 Cost of business cycles, 43
Compensating variation, 32 Counting, 4
Competitive producer behavior, 28 Cournot competition, 36
Complement goods, 31 Covariance, 7
Complement inputs, 30 Covariance stationarity, 15
Complete, 45 Cramer-Rao Inequality, 14
Complete information, 34 Cramer-Wold Device, 10
Complete metric space, 41 Critical region, 14
Complete statistic, 11 CRLB (Cramer-Rao lower bound), 14
Completeness, 27 CRRA (coefficient of relative risk aversion), 33
Conditional expectation, 6 CRRA (constant relative risk aversion), 33
Conditional factor demand correspondence, 29 CRRA (constant relative risk aversion) utility function, 40
Conditional homoscedasticity, 10 CRS (constant returns to scale), 28
Conditional pdf, 5 CS (Marshallian consumer surplus), 32
Conditional pmf, 5 Cumulant generating function, 7
Conditional probability, 4 Cumulative distribution function, 5
Conditional variance, 7 CV (compensating variation), 32
Condorcet winner, 39
Condorcet-consistent rule, 39 DARA (decreasing absolute risk aversion), 33
Consistency for for MLE, 12 Decision rule, 14
Consistency for GMM estimators, 13 Defaultable debt, 43
Consistency with compact parameter space, 11 Deferred acceptance algorithm, 38
Consistency without compact parameter space, 12 Delta Method, 10
Consistent estimator, 11 Demand for insurance, 33
Consistent strategy, 36 DeMorgans Laws, 4
Consistent test, 15 Derivative of power series, 45
Constant relative risk aversion utility function, 40 Descending price auction, 37
Constant returns to scale, 28 Determinant, 46
Consumer welfare: price changes, 32 Deviation from means, 48
Continuous function, 41 Diagonal matrix, 46
Continuous Mapping Theorem, 9, 10 Dickey-Fuller Test, 24
Continuous preference, 28 Difference-stationary process, 24

50
Direct revelation mechanism, 34 Extensive form trembling hand perfection, 36
Discrete r.v., 5 Extremum estimator, 11
Disjointness, 4
Distance function principle, 14 F distribution, 8
Distributive laws, 4 F test, 19
Dixit-Stiglitz aggregator, 40 Factorial moment generating function, 7
Dobbs Convergence Theorem, 43 Factorization theorem, 11
Dominant strategy, 35 FCLT (Function Central Limit Theorem), 24
Dominated strategy, 35 Feasible allocation, 33
Donskers Invariance Principle, 24 Feasible generalized least squares, 19
Dornbush model, 44 Feasible payoff, 37
DRRA (decreasing relative risk aversion), 33 FGLS (Feasible Generalized Least Squares), 19
Dummy axiom, 38 FIML (Full Information Maximum Likelihood), 23
Durbin-Wu-Hausman test, 21 Finitely repeated game, 37
Dutch auction, 37 First Welfare Theorem, 33, 41
DWH (Durbin-Wu-Hausman) test, 21 First-order ancillary statistic, 11
Dynamic programming: bounded returns, 42 First-order stochastic dominance, 33
Dynamic systems, 41 First-price sealed bid auction, 37
Dynamic systems in continuous time, 42 Fisher Information, 10, 14
Fisher Information Equality, 14
Edgeworth box, 33 Fitted value, 18
Efficient GMM, 13, 20 FIVE (Full-Information Instrumental Variables Efficient), 21
EFTHPNE (Extensive form trembling hand perfect equilibrium), 36 Folk Theorem, 37
Eigen decomposition, 46 Free disposal, 28
Eigenvalue, 46 Frisch-Waugh Theorem, 19
Eigenvector, 46 Full Information Maximum Likelihood, 23
Eigenvector, eigenvalue, 46 Full-Information Instrumental Variables Efficient, 21
Elasticity, 44 Function, 45
Elasticity of intertemporal substitution, 40 Function Central Limit Theorem, 24
EMP (expenditure minimization problem), 31 FWL (Frisch-Waugh-Lovell) Theorem, 19
Empirical distribution, 11
Endowment economy, Arrow-Debreu, 39 Gale-Shapley algorithm, 38
Endowment economy, sequential markets, 39 Game tree, 34
Engle curve, 32 Gamma function, 45
English auction, 37 GARCH process, 17
Entry deterrence, 37 GARP (Generalized Axiom of Revealed Preference), 27
Envelope theorem, 30 Gauss-Markov theorem, 18
Envelope theorem (integral form), 30 Gaussian distribution, 7
Equilibrium dominance, 37 Gaussian regression model, 11
Equivalence relation, 45 General equilibrium, 40
Equivalent variation, 32 Generalized Axiom of Revealed Preference, 27
Ergodic stationary martingale differences CLT, 16 Generalized least squares, 19
Ergodic Theorem, 16 Generalized Method of Moments estimator, 13
Ergodicity, 16 Generalized Tobit model, 25
Estimating AR(p), 17 Geometric series, 45
Estimating S, 18 Gibbard-Satterthwaite Theorem, 39
Estimating number of lags, 17 Giffen good, 31
Estimator, 11 GLS (Generalized Least Squares), 19
Euclidean norm, 41 GMM (Generalized Method of Moments), 13, 19
Eulers law, 44 GMM hypothesis testing, 20
EV (equivalent variation), 32 Gordins CLT, 16
Excess demand, 34 Gordins condition, 16
Expected utility function, 32 Gorman form, 32
Expected value, mean, 6 Granger causality, 44
Expenditure function, 31 Grim trigger strategy, 37
Expenditure minimization problem, 31 Gross complement, 32
Exponential families, 8 Gross substitute, 32

51
Gross substitutes property, 34 Instrumental variables estimator, 20
Groves mechanism, 39 Integrated process of order d, 24
GS (Gale-Shapley) algorithm, 38 Integrated process of order 0, 24
GS (Gibbard-Satterthwaite) Theorem, 39 Intereuler equation, 40
Interpersonal comparison, 28
Holders Inequality, 25 Intertemporal elasticity of substitution, 40
Halls martingale hypothesis, 43 Intertemporal Euler equation, 40
Hamiltonian, 42 Irreflexive, 45
Hansens test of overidentifying restrictions, 20 Iterated deletion of dominated strategies, 35
HARP (Houthakers Axiom of Revealed Preferences), 27 Iterated expectations, 6
Hausman Principle, 14 IV (instrumental variables) estimator, 20
Heckman two-step, 25
Herfindahl-Hirshman index, 36 J statistic, 20
Heteroscedasticity robust asymptotic variance matrix, 11 Jacobian, 6
Heteroscedasticity-robust standard error, 18 Jensens Inequality, 25
Hicksian demand correspondence, 31 JGLS (Joint Generalized Least Squares), 22
Holmstroms Lemma, 30 Join, 44
Homothetic preference, 28 Joint cdf, 5
Hotelling spatial location model, 36 Joint Generalized Least Squares, 22
Hotellings Lemma, 29 Joint pdf, 5
Houthakers Axiom of Revealed Preferences, 27 Joint pmf, 5
Hyperbolic discounting, 43
Hypothesis testing, 14 Kakutanis Fixed Point Theorem, 46
Kocherlakota two-sided lack of commitment model, 43
I(0) process, 24 Kolmogorov Axioms or Axioms of Probability, 4
I(1) process, 24 Kronecker product, 47
I(d) process, 24 Kuhns Theorem, 35
ID (increasing differences), 30 Kullback-Liebler Divergence, 12
Ideal index, 32 Kurtosis, 7
Idempotent matrix, 47
Identification, 10 Lag operator, 16
Identification in exponential families, 10 Lagrange multiplier statistic, 15
IEoS (intertemporal elasticity of substitution), 40 Landau symbols, 9, 10
Imperfect competition model, 40 Laspeyres index, 32
Imperfect competition: final goods, 40 Law of Iterated Expectations, 6
Imperfect competition: intermediate goods, 40 Law of Large Numbers for covariance-stationary processes with vanishing autocovariances, 15
Imperfect information, 34 Law of Large Numbers for ergodic processes (Ergodic Theorem), 16
Implicit function theorem, 30 Law of Large Numbers for iid samples, 9, 10
Inada conditions, 40 Law of Large Numbers for niid samples, 10
Income expansion curve, 32 Law of Supply, 29
Incomplete information, 34 LeChatelier principle, 30
Incomplete markets, 34 Lehmann-Scheffe Theorem, 14
Incomplete markets: finite horizon, 43 Lerner index, 36
Incomplete markets: infinite horizon, 43 Level, 14
Increasing differences, 30 Lexicographic preferences, 28
Independence of events, 4 lhc (lower hemi-continuity), 46
Independence of r.v.s, 5 Likelihood function, 11
Independence of random vectors, 5 Likelihood ratio test, 15
Independent, 4, 5 lim inf, 45
Indirect utility function, 31 lim sup, 45
Individually rational payoff, 37 Limited Information Maximum Likelihood, 24
Inferior good, 31 Limiting size, 15
Infimum limit, 45 LIML (Limited Information Maximum Likelihood), 24
Infinitely repeated game, 37 Lindeberg-Levy CLT, 10
Information Equality, 14 Linear GMM estimator, 20
Injective, 45 Linear GMM model, 19
Inner bound, 29 Linear independence, rank, 46

52
Linear instrumental variables, 11 Mean independence, 5
Linear multiple-equation GMM estimator, 21 Mean squared error risk function, 14
Linear multiple-equation GMM model, 21 Mean value expansion, 45
Linear regression model with non-stochastic covariates, 11 Measurability, 5
Linearization in continuous time, 42 Median, 6
Little o error notation, 9 Meet, 44
LLN (Law of Large Numbers) for covariance-stationary processes with vanishing autocovariances, Method of Moments estimator, 13
15 Metric space, 41
LLN (Law of Large Numbers) for ergodic processes (Ergodic Theorem), 16 mgf (moment generating function), 7
LLN (Law of Large Numbers) for iid samples, 9 Milgrom-Shannon, 30
Local level model, 17 Milgrom-Shannon Monotonicity Theorem, 30
Locally non-satiated preference, 28 Minimal sufficient statistic, 11
Location and Scale families, 8 Minimum Distance estimator, 13
Log-linearization, 42 Minkowskis Inequality, 25
Logistic function, 45 Mixed strategy Nash equilibrium, 35
Logit model, 24 MLE (Maximum Likelihood estimator), 12
Lognormal distribution, 8 MLR (monotone likelihood ratio), 15
Long-run variance, 16 Mode, 6
Loss function, 14, 29 Moment, 7
Lottery, 32 Moment generating function, 7
Lower hemi-continuity, 46 Monopolistic competition, 36
Lower semi-continuity, 46 Monopoly pricing, 29
lsc (lower semi-continuity), 46 Monotone, 45
Lucas cost of busines cycles, 43 Monotone comparative statics, 30
Lucas supply function, 44 Monotone likelihood ratio models, 15
Lucas tree, 42 Monotone preference, 28
Lyapounovs Theorem, 10 Monotone Selection Theorem, 30
Monotonicity, 39
M-estimator, 12 Moral hazard, 38
MA (moving average) process, 16 Moving average process, 16
Maintained hypothesis, 14 MRS (marginal rate of substitution), 31
Marginal pdf, 5 MRT (marginal rate of transformation), 28
Marginal pmf, 5 MRTS (marginal rate of technological substitution), 28
Marginal rate of substitution, 31 MSE (mean squared error), 14
Marginal rate of technological substitution, 28 MSNE (mixed strategy Nash equilibrium), 35
Marginal rate of transformation, 28 Multiple equation 2SLS (Two-Stage Least Squares), 22
Marginal utility of wealth, 31 Multiple equation TSLS (Two-Stage Least Squares), 22
Markov process, 42 Multiple equation Two-Stage Least Squares, 22
Markovs Inequality, 25 Multiple-equation GMM with common coefficients, 22
Marshallian consumer surplus, 32 Multivariate covariance, 7
Marshallian demand correspondence, 31 Multivariate normal distribution, 8
Martingale, 16 Multivariate regression, 22
Martingale difference sequence, 16 Multivariate variance, 7
Matrix, 46 Mutual exclusivity, 4
Matrix derivatives, 47 Mutual independence of events, 4
Matrix inverse, 46
Matrix multiplication, 46 Nash reversion, 37
Matrix squareroot, 46 Natural parameter space, 8
Matrix transpose, 46 NCGM (neoclassical growth model), 39
Maximum Likelihood estimator, 12 NCGM in continuous time, 42
Maximum likelihood estimator for OLS model, 18 Negative transitive, 45
Maximum likelihood for SUR, 23 Neoclassical growth model (discrete time), 39
Maximum likelihood with serial correlation, 17 Neutrality, 39
Mays Theorem, 39 Neyman-Pearson Lemma, 14
MCS (monotone comparative statics), 30 NLS (nonlinear least squares), 11
MCS: robustness to objective function perturbation, 30 No Ponzi condition, 41
mds (martingale difference sequence), 16 Nondecreasing returns to scale, 28

53
Nonincreasing returns to scale, 28 PE (Pareto efficiency), 33
Nonlinear least squares, 11 Perfect Bayesian equilibrium, 36
Nonsingularity, 46 Perfect information, 34
Nontransferable utility game, 38 Perfect recall, 34
Normal distribution, 7 Philips Curve, 44
Normal equations, 18 Pivotal mechanism, 39
Normal form, 34 pmf (probability mass function), 5
Normal good, 31 PO (Pareto optimality), 33
Normative representative consumer, 32 Point identification, 10
Normed vector space, 41 Pooled OLS, 23
NPL (Neyman-Pearson Lemma), 14 Portfolio problem, 33
NTU (nontransferable utility) game, 38 Positive definiteness, &c., 47
Null hypothesis, 14 Positive representative consumer, 32
Numerical inequality lemma, 25 Power, 14
Power series, 45
O error notation, 10 Preference axioms under uncertainty, 32
o error notation, 9 Price expansion path, 32
Observational equivalence, 10 Price index, 32
Offer curve, 32, 33 Primal approach to optimal taxation, 40
Okuns Law, 44 Principle of Optimality, 41
OLG (overlapping generations), 40 Probability density function, 5
OLS F test, 19 Probability function, 4
OLS R2 , 18 Probability integral transformation, 6
OLS t test, 18 Probability mass function, 5
OLS (ordinary least squares), 17 Probability space, 4
OLS residual, 18 Probability-generating function, 7
OLS robust t test, 19 Probit model, 11, 24
OLS robust Wald statistic, 19 Producer Surplus Formula, 29
One-sided lack of commitment, 43 Production economy, Arrow-Debreu, 40
One-sided matching, 38 Production economy, sequential markets, 40
One-to-one, 45 Production function, 28
Onto, 45 Production plan, 28
Optimal taxationprimal approach, 40 Production set, 28
Order statistic, 9 Profit maximization, 28
Order symbols, 10 Projection matrix, 18
Ordinary least squares estimators, 18 Proper subgame, 35
Ordinary least squares model, 17 The Property, 26
Orthants of Euclidean space, 44 Proportionate gamble, 33
Orthogonal completion, 47 PSF (Producer Surplus Formula), 29
Orthogonal decomposition, 46 PSNE (pure strategy Nash equilibrium), 35
Orthogonal matrix, 46 Public good, 37
Other generating functions, 7 Pure strategy Nash equilibrium, 35
Outer bound, 29
Overlapping generations, 40 Quadrants of Euclidean space, 44
Overtaking criterion, 37 Quadratic loss, 14
Quasi-linear preferences, 28
p-value, 14
Paasche index, 32 R2 , 18
Parameter, 10 Ramsey equilibrium, 41
Parametric model, 10 Ramsey model, 39
Pareto optimality, 33 Random effects estimator, 23
Pareto set, 33 Random sample, iid, 9
Partition, 4, 45 Random variable, 5
Partitioned matrices, 47 Random vector, 5
Payoff function, 34 Random walk, 16
PBE (perfect Bayesian equilibrium), 36 Rank, 46
pdf (probability density function), 5 Rao-Blackwell Theorem, 14

54
Rational expectations equilibrium, 34 Seemingly Unrelated Regressions, 11, 22
Rational expectations models, 44 Separable preferences, 28
Rational preference relation, 27 Sequential competition, 36
Rationalizable strategy, 35 Sequential equilibrium, 36
Rationalization: h and differentiable e, 31 Sequentially rational strategy, 36
Rationalization: y() and differentiable (), 29 SER (standard error of the regression), 18
Rationalization: differentiable (), 29 Shadow price of wealth, 31
Rationalization: differentiable e, 31 Shapley value, 38
Rationalization: differentiable h, 31 Shephard's Lemma, 29, 31
Rationalization: differentiable x, 31 Short and long regressions, 19
Rationalization: differentiable y(·), 29 Shutdown, 28
Rationalization: general y(·) and π(·), 29 SID (strictly increasing differences), 30
Rationalization: profit maximization functions, 28 Sigma algebra, 4
Rationalization: single-output cost function, 29 Signaling, 37, 38
RBC (Real Business Cycles) model, 40 Simple game, 38
RE (random effects) estimator, 23 Simple likelihood ratio statistic, 14
Real Business Cycles model, 40 Single-crossing condition, 30
Reduced form, 23 Single-output case, 29
REE (rational expectations equilibrium), 34 Singularity, 46
Reflexive, 45 Size, 14
Regression, 6 Skewness, 7
Regression model, 11 Slutsky equation, 31
Regular good, 31 Slutsky matrix, 31
Relating Marshallian and Hicksian demand, 31 Slutsky's Theorem, 10
Relative risk aversion, 33 SM (supermodularity), 30
Revealed preference, 27 Smallest σ-field, 5
Revelation principle, 34 Snedecor's F distribution, 8
Revenue equivalence theorem, 37 Sonnenschein-Mantel-Debreu Theorem, 34
Risk aversion, 32 Spectral decomposition, 46
Risk function, 14 Spence signaling model, 37, 38
Risk-free bond pricing, 42 SPNE (subgame perfect Nash equilibrium), 36
Robust standard error, 18 Square matrix, 46
Robust t test, 19 SSO (strong set order), 44
Robust Wald statistic, 19 Stable distribution, 8
Roy's identity, 32 Stage game, 37
RRA (relative risk aversion), 33 Standard deviation, 7
r_XY (sample correlation), 9 Standard deviation, 7
Standard error of the regression, 18
s² (sample variance), 9 Stars and bars, 4
Sample correlation, 9 Stationarity, 15
Sample covariance, 9 Statistic, 9
Sample mean, 9 Statistics in exponential families, 11
Sample median, 9 Stein's Lemma, 7
Sample selection model, 25 Stick-and-carrot equilibrium, 37
Sample standard deviation, 9 Stochastic ordering, 5
Sample variance, 9 Stock pricing, 43
Samples from the normal distribution, 9 Strategy, 34
Sampling error, 18 Strict dominance, 35
Samuelson-LeChatelier principle, 30 Strict stationarity, 15
Scedastic function, 7 Strictly increasing differences, 30
Score, 14 Strictly mixed strategy, 35
Screening, 38 Strong Law of Large Numbers, 9, 10
SE (sequential equilibrium), 36 Strong set order, 44
Second Welfare Theorem, 34 Strongly increasing differences, 30
Second-order stochastic dominance, 33 Structural form, 23
Second-price auction, 37 Student's t distribution, 8
SEE (standard error of the equation), 18 Subgame perfect Nash equilibrium, 36
Subjective probabilities, 33 Two period intertemporal choice model, 39
Sublattice, 44 Two-sided lack of commitment, 43
Submodularity, 30 Two-sided matching, 38
Substitute goods, 31 Two-Stage Least Squares, 21
Substitute inputs, 30 Two-way rule for expectations, 6
Substitution matrix, 29 Type I/Type II error, 14
Sufficient statistic, 11
Sum of powers, 44 uhc (upper hemi-continuity), 46
Sup norm, 41 ULLN (Uniform Law of Large Numbers), 12
Superconsistent estimator, 11 UMP (uniformly most powerful) test, 14
Supergame, 37 UMP (utility maximization problem), 31
Supermodularity, 30 UMPU (uniformly most powerful unbiased) test, 15
Support, 5 UMVUE (uniformly minimum variance unbiased estimator), 13
Support set, 5 Unanimity, 39
Supremum limit, 45 Unbiased estimator, 9
Supremum norm (Rⁿ), 41 Unbiased test, 15
Supremum norm (real-valued functions), 41 Uncertainty: general setup, 42
SUR (seemingly unrelated regressions), 11, 22 Uncompensated demand correspondence, 31
Sure thing principle, 32 Uniform (Weak) Law of Large Numbers, 12
Surjective, 45 Uniform convergence in probability, 9
s_XY (sample covariance), 9 Uniformly minimum variance unbiased estimator, 13
Symmetric, 45 Uniformly most powerful test, 14
Symmetric distribution, 6 Uniformly most powerful unbiased test, 15
Symmetric matrix, 46 Unimodality, 6
System of beliefs, 35 Unit root process, 24
Upper hemi-continuity, 46
t distribution, 8 Upper semi-continuity, 46
t test, 18 usc (upper semi-continuity), 46
Taylor Rule, 44 Utility function, 27
Taylor series, 45 Utility maximization problem, 31
Taylor series examples, 45
Test function, 14 Variance, 7
Test statistic, 14 Variance of residuals, 18
Testing overidentifying restrictions, 20 VCG (Vickrey-Clarke-Groves) mechanism, 39
THPE (trembling-hand perfect equilibrium), 35 Veto player, 38
Three-Stage Least Squares, 22 Vickrey auction, 37
Threshold crossing model, 11 Vickrey-Clarke-Groves mechanism, 39
Time inconsistency, 44 von Neumann-Morgenstern utility function, 32
Tobit model, 25 Voting rule, 39
Top Trading Cycles algorithm, 38
Topkis' Theorem, 30 Wald statistic, 15
Trace, 46 Walras' Law, 31
Transferable utility game, 38 Walrasian demand correspondence, 31
Transformation R¹ → R¹, 6 WAPM (Weak Axiom of Profit Maximization), 29
Transformation R² → R², 6 WARP (Weak Axiom of Revealed Preference), 27
Transformation frontier, 28 WAPM (Weak Axiom of Profit Maximization), 29
Transformation function, 28 WARP (Weak Axiom of Revealed Preference), 27
Transitive, 45 WE (Walrasian equilibrium), 33
Transitivity, 27 Weak Axiom of Profit Maximization, 29
Transpose, 46 Weak Axiom of Revealed Preference, 27
Transversality condition, 41 Weak dominance, 35
Trembling-hand perfection, 35 Weak Law of Large Numbers, 9
Truncated response model, 25 Weak perfect Bayesian equilibrium, 36
TSLS (Three-Stage Least Squares), 22 Weak stationarity, 15
TSLS (Two-Stage Least Squares), 21 Weakly dominated strategy, 35
TU (transferable utility) game, 38 Weakly increasing differences, 30
TVC (transversality condition), 41 Weighted least squares, 19
White noise, 16
White's standard error, 18
WID (weakly increasing differences), 30
Wiener process, 24
WLS (Weighted Least Squares), 19
WPBE (weak perfect Bayesian equilibrium), 36

X̄ (sample mean), 9

Zermelo's Theorem, 35
0-1 loss, 14
Standard normal critical values
(two-tailed: e.g., 5% of observations from N(0,1) will fall outside ±1.96)
(one-tailed: e.g., 5% of observations from N(0,1) will fall above 1.64)

Significance
50% 25% 10% 5% 2.5% 1% 0.5% 0.25% 0.1% 0.05% 0.025% 0.01%
two-tailed 0.67 1.15 1.64 1.96 2.24 2.58 2.81 3.02 3.29 3.48 3.66 3.89
one-tailed (0.00) 0.67 1.28 1.64 1.96 2.33 2.58 2.81 3.09 3.29 3.48 3.72
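
These critical values are inverse-CDF evaluations, so the table can be reproduced with any statistical library. A minimal sketch in Python using scipy (the library choice is an assumption; the notes themselves prescribe no software):

    # Normal critical values via the inverse survival function.
    from scipy.stats import norm

    alpha = 0.05
    print(norm.isf(alpha / 2))  # two-tailed: alpha split across both tails -> 1.96
    print(norm.isf(alpha))      # one-tailed: all of alpha in the upper tail -> 1.64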

Chi-squared critical values
(one-tailed: e.g., 5% of observations from χ²(1) will exceed 3.84)
Significance
df 50% 25% 10% 5% 2.5% 1% 0.5% 0.25% 0.1% 0.05% 0.025% 0.01%
1 0.45 1.32 2.71 3.84 5.02 6.63 7.88 9.14 10.83 12.12 13.41 15.14
2 1.39 2.77 4.61 5.99 7.38 9.21 10.60 11.98 13.82 15.20 16.59 18.42
3 2.37 4.11 6.25 7.81 9.35 11.34 12.84 14.32 16.27 17.73 19.19 21.11
4 3.36 5.39 7.78 9.49 11.14 13.28 14.86 16.42 18.47 20.00 21.52 23.51
5 4.35 6.63 9.24 11.07 12.83 15.09 16.75 18.39 20.52 22.11 23.68 25.74
6 5.35 7.84 10.64 12.59 14.45 16.81 18.55 20.25 22.46 24.10 25.73 27.86
7 6.35 9.04 12.02 14.07 16.01 18.48 20.28 22.04 24.32 26.02 27.69 29.88
8 7.34 10.22 13.36 15.51 17.53 20.09 21.95 23.77 26.12 27.87 29.59 31.83
9 8.34 11.39 14.68 16.92 19.02 21.67 23.59 25.46 27.88 29.67 31.43 33.72
10 9.34 12.55 15.99 18.31 20.48 23.21 25.19 27.11 29.59 31.42 33.22 35.56
11 10.34 13.70 17.28 19.68 21.92 24.72 26.76 28.73 31.26 33.14 34.98 37.37
12 11.34 14.85 18.55 21.03 23.34 26.22 28.30 30.32 32.91 34.82 36.70 39.13
13 12.34 15.98 19.81 22.36 24.74 27.69 29.82 31.88 34.53 36.48 38.39 40.87
14 13.34 17.12 21.06 23.68 26.12 29.14 31.32 33.43 36.12 38.11 40.06 42.58
15 14.34 18.25 22.31 25.00 27.49 30.58 32.80 34.95 37.70 39.72 41.70 44.26
16 15.34 19.37 23.54 26.30 28.85 32.00 34.27 36.46 39.25 41.31 43.32 45.92
17 16.34 20.49 24.77 27.59 30.19 33.41 35.72 37.95 40.79 42.88 44.92 47.57
18 17.34 21.60 25.99 28.87 31.53 34.81 37.16 39.42 42.31 44.43 46.51 49.19
19 18.34 22.72 27.20 30.14 32.85 36.19 38.58 40.88 43.82 45.97 48.08 50.80
20 19.34 23.83 28.41 31.41 34.17 37.57 40.00 42.34 45.31 47.50 49.63 52.39
21 20.34 24.93 29.62 32.67 35.48 38.93 41.40 43.78 46.80 49.01 51.17 53.96
22 21.34 26.04 30.81 33.92 36.78 40.29 42.80 45.20 48.27 50.51 52.70 55.52
23 22.34 27.14 32.01 35.17 38.08 41.64 44.18 46.62 49.73 52.00 54.22 57.07
24 23.34 28.24 33.20 36.42 39.36 42.98 45.56 48.03 51.18 53.48 55.72 58.61
25 24.34 29.34 34.38 37.65 40.65 44.31 46.93 49.44 52.62 54.95 57.22 60.14
26 25.34 30.43 35.56 38.89 41.92 45.64 48.29 50.83 54.05 56.41 58.70 61.66
27 26.34 31.53 36.74 40.11 43.19 46.96 49.64 52.22 55.48 57.86 60.18 63.16
28 27.34 32.62 37.92 41.34 44.46 48.28 50.99 53.59 56.89 59.30 61.64 64.66
29 28.34 33.71 39.09 42.56 45.72 49.59 52.34 54.97 58.30 60.73 63.10 66.15
30 29.34 34.80 40.26 43.77 46.98 50.89 53.67 56.33 59.70 62.16 64.56 67.63
31 30.34 35.89 41.42 44.99 48.23 52.19 55.00 57.69 61.10 63.58 66.00 69.11
32 31.34 36.97 42.58 46.19 49.48 53.49 56.33 59.05 62.49 65.00 67.44 70.57
33 32.34 38.06 43.75 47.40 50.73 54.78 57.65 60.39 63.87 66.40 68.87 72.03
34 33.34 39.14 44.90 48.60 51.97 56.06 58.96 61.74 65.25 67.80 70.29 73.48
35 34.34 40.22 46.06 49.80 53.20 57.34 60.27 63.08 66.62 69.20 71.71 74.93
36 35.34 41.30 47.21 51.00 54.44 58.62 61.58 64.41 67.99 70.59 73.12 76.36
37 36.34 42.38 48.36 52.19 55.67 59.89 62.88 65.74 69.35 71.97 74.52 77.80
38 37.34 43.46 49.51 53.38 56.90 61.16 64.18 67.06 70.70 73.35 75.92 79.22
39 38.34 44.54 50.66 54.57 58.12 62.43 65.48 68.38 72.05 74.73 77.32 80.65
40 39.34 45.62 51.81 55.76 59.34 63.69 66.77 69.70 73.40 76.09 78.71 82.06
41 40.34 46.69 52.95 56.94 60.56 64.95 68.05 71.01 74.74 77.46 80.09 83.47
42 41.34 47.77 54.09 58.12 61.78 66.21 69.34 72.32 76.08 78.82 81.47 84.88
43 42.34 48.84 55.23 59.30 62.99 67.46 70.62 73.62 77.42 80.18 82.85 86.28
44 43.34 49.91 56.37 60.48 64.20 68.71 71.89 74.93 78.75 81.53 84.22 87.68
45 44.34 50.98 57.51 61.66 65.41 69.96 73.17 76.22 80.08 82.88 85.59 89.07
46 45.34 52.06 58.64 62.83 66.62 71.20 74.44 77.52 81.40 84.22 86.95 90.46
47 46.34 53.13 59.77 64.00 67.82 72.44 75.70 78.81 82.72 85.56 88.31 91.84
48 47.34 54.20 60.91 65.17 69.02 73.68 76.97 80.10 84.04 86.90 89.67 93.22
49 48.33 55.27 62.04 66.34 70.22 74.92 78.23 81.38 85.35 88.23 91.02 94.60
50 49.33 56.33 63.17 67.50 71.42 76.15 79.49 82.66 86.66 89.56 92.37 95.97
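
The chi-squared critical values follow the same recipe, one inverse survival-function call per (significance, df) pair; again a sketch assuming scipy:

    # Chi-squared critical values: chi2.isf(alpha, df) is the point exceeded
    # with probability alpha by a chi-squared draw with df degrees of freedom.
    from scipy.stats import chi2

    for df in (1, 2, 3, 10):
        print(df, round(chi2.isf(0.05, df), 2))  # 3.84, 5.99, 7.81, 18.31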
Standard normal tail probabilities

% of obs. from N(0,1) % of obs. from N(0,1) % of obs. from N(0,1) % of obs. from N(0,1)
x above x outside x x above x outside x x above x outside x x above x outside x
0.00 50.0% 100.0% 1.00 15.9% 31.7% 2.00 2.3% 4.6% 3.00 0.13% 0.27%
0.02 49.2% 98.4% 1.02 15.4% 30.8% 2.02 2.2% 4.3% 3.02 0.13% 0.25%
0.04 48.4% 96.8% 1.04 14.9% 29.8% 2.04 2.1% 4.1% 3.04 0.12% 0.24%
0.06 47.6% 95.2% 1.06 14.5% 28.9% 2.06 2.0% 3.9% 3.06 0.11% 0.22%
0.08 46.8% 93.6% 1.08 14.0% 28.0% 2.08 1.9% 3.8% 3.08 0.10% 0.21%
0.10 46.0% 92.0% 1.10 13.6% 27.1% 2.10 1.8% 3.6% 3.10 0.097% 0.19%
0.12 45.2% 90.4% 1.12 13.1% 26.3% 2.12 1.7% 3.4% 3.12 0.090% 0.18%
0.14 44.4% 88.9% 1.14 12.7% 25.4% 2.14 1.6% 3.2% 3.14 0.084% 0.17%
0.16 43.6% 87.3% 1.16 12.3% 24.6% 2.16 1.5% 3.1% 3.16 0.079% 0.16%
0.18 42.9% 85.7% 1.18 11.9% 23.8% 2.18 1.5% 2.9% 3.18 0.074% 0.15%
0.20 42.1% 84.1% 1.20 11.5% 23.0% 2.20 1.4% 2.8% 3.20 0.069% 0.14%
0.22 41.3% 82.6% 1.22 11.1% 22.2% 2.22 1.3% 2.6% 3.22 0.064% 0.13%
0.24 40.5% 81.0% 1.24 10.7% 21.5% 2.24 1.3% 2.5% 3.24 0.060% 0.12%
0.26 39.7% 79.5% 1.26 10.4% 20.8% 2.26 1.2% 2.4% 3.26 0.056% 0.11%
0.28 39.0% 77.9% 1.28 10.0% 20.1% 2.28 1.1% 2.3% 3.28 0.052% 0.10%
0.30 38.2% 76.4% 1.30 9.7% 19.4% 2.30 1.1% 2.1% 3.30 0.048% 0.097%
0.32 37.4% 74.9% 1.32 9.3% 18.7% 2.32 1.0% 2.0% 3.32 0.045% 0.090%
0.34 36.7% 73.4% 1.34 9.0% 18.0% 2.34 0.96% 1.9% 3.34 0.042% 0.084%
0.36 35.9% 71.9% 1.36 8.7% 17.4% 2.36 0.91% 1.8% 3.36 0.039% 0.078%
0.38 35.2% 70.4% 1.38 8.4% 16.8% 2.38 0.87% 1.7% 3.38 0.036% 0.072%
0.40 34.5% 68.9% 1.40 8.1% 16.2% 2.40 0.82% 1.6% 3.40 0.034% 0.067%
0.42 33.7% 67.4% 1.42 7.8% 15.6% 2.42 0.78% 1.6% 3.42 0.031% 0.063%
0.44 33.0% 66.0% 1.44 7.5% 15.0% 2.44 0.73% 1.5% 3.44 0.029% 0.058%
0.46 32.3% 64.6% 1.46 7.2% 14.4% 2.46 0.69% 1.4% 3.46 0.027% 0.054%
0.48 31.6% 63.1% 1.48 6.9% 13.9% 2.48 0.66% 1.3% 3.48 0.025% 0.050%
0.50 30.9% 61.7% 1.50 6.7% 13.4% 2.50 0.62% 1.2% 3.50 0.023% 0.047%
0.52 30.2% 60.3% 1.52 6.4% 12.9% 2.52 0.59% 1.2% 3.52 0.022% 0.043%
0.54 29.5% 58.9% 1.54 6.2% 12.4% 2.54 0.55% 1.1% 3.54 0.020% 0.040%
0.56 28.8% 57.5% 1.56 5.9% 11.9% 2.56 0.52% 1.0% 3.56 0.019% 0.037%
0.58 28.1% 56.2% 1.58 5.7% 11.4% 2.58 0.49% 0.99% 3.58 0.017% 0.034%
0.60 27.4% 54.9% 1.60 5.5% 11.0% 2.60 0.47% 0.93% 3.60 0.016% 0.032%
0.62 26.8% 53.5% 1.62 5.3% 10.5% 2.62 0.44% 0.88% 3.62 0.015% 0.029%
0.64 26.1% 52.2% 1.64 5.1% 10.1% 2.64 0.41% 0.83% 3.64 0.014% 0.027%
0.66 25.5% 50.9% 1.66 4.8% 9.7% 2.66 0.39% 0.78% 3.66 0.013% 0.025%
0.68 24.8% 49.7% 1.68 4.6% 9.3% 2.68 0.37% 0.74% 3.68 0.012% 0.023%
0.70 24.2% 48.4% 1.70 4.5% 8.9% 2.70 0.35% 0.69% 3.70 0.011% 0.022%
0.72 23.6% 47.2% 1.72 4.3% 8.5% 2.72 0.33% 0.65% 3.72 0.0100% 0.020%
0.74 23.0% 45.9% 1.74 4.1% 8.2% 2.74 0.31% 0.61% 3.74 0.0092% 0.018%
0.76 22.4% 44.7% 1.76 3.9% 7.8% 2.76 0.29% 0.58% 3.76 0.0085% 0.017%
0.78 21.8% 43.5% 1.78 3.8% 7.5% 2.78 0.27% 0.54% 3.78 0.0078% 0.016%
0.80 21.2% 42.4% 1.80 3.6% 7.2% 2.80 0.26% 0.51% 3.80 0.0072% 0.014%
0.82 20.6% 41.2% 1.82 3.4% 6.9% 2.82 0.24% 0.48% 3.82 0.0067% 0.013%
0.84 20.0% 40.1% 1.84 3.3% 6.6% 2.84 0.23% 0.45% 3.84 0.0062% 0.012%
0.86 19.5% 39.0% 1.86 3.1% 6.3% 2.86 0.21% 0.42% 3.86 0.0057% 0.011%
0.88 18.9% 37.9% 1.88 3.0% 6.0% 2.88 0.20% 0.40% 3.88 0.0052% 0.010%
0.90 18.4% 36.8% 1.90 2.9% 5.7% 2.90 0.19% 0.37% 3.90 0.0048% 0.0096%
0.92 17.9% 35.8% 1.92 2.7% 5.5% 2.92 0.18% 0.35% 3.92 0.0044% 0.0089%
0.94 17.4% 34.7% 1.94 2.6% 5.2% 2.94 0.16% 0.33% 3.94 0.0041% 0.0081%
0.96 16.9% 33.7% 1.96 2.5% 5.0% 2.96 0.15% 0.31% 3.96 0.0037% 0.0075%
0.98 16.4% 32.7% 1.98 2.4% 4.8% 2.98 0.14% 0.29% 3.98 0.0034% 0.0069%
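
The tail percentages above are direct evaluations of the standard normal survival function; a sketch under the same scipy assumption:

    # N(0,1) tail probabilities: "above x" is the survival function,
    # "outside x" doubles it by symmetry of the normal density.
    from scipy.stats import norm

    x = 1.96
    print(norm.sf(x))      # 0.0250 -> 2.5% above x
    print(2 * norm.sf(x))  # 0.0500 -> 5.0% outside +/- x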
Chi-squared tail probabilities
% of observations from χ²(1)
x above x x above x x above x x above x x above x x above x
0.000 100.0% 1.250 26.4% 2.500 11.4% 3.750 5.3% 5.000 2.5% 6.250 1.2%
0.025 87.4% 1.275 25.9% 2.525 11.2% 3.775 5.2% 5.025 2.5% 6.275 1.2%
0.050 82.3% 1.300 25.4% 2.550 11.0% 3.800 5.1% 5.050 2.5% 6.300 1.2%
0.075 78.4% 1.325 25.0% 2.575 10.9% 3.825 5.0% 5.075 2.4% 6.325 1.2%
0.100 75.2% 1.350 24.5% 2.600 10.7% 3.850 5.0% 5.100 2.4% 6.350 1.2%
0.125 72.4% 1.375 24.1% 2.625 10.5% 3.875 4.9% 5.125 2.4% 6.375 1.2%
0.150 69.9% 1.400 23.7% 2.650 10.4% 3.900 4.8% 5.150 2.3% 6.400 1.1%
0.175 67.6% 1.425 23.3% 2.675 10.2% 3.925 4.8% 5.175 2.3% 6.425 1.1%
0.200 65.5% 1.450 22.9% 2.700 10.0% 3.950 4.7% 5.200 2.3% 6.450 1.1%
0.225 63.5% 1.475 22.5% 2.725 9.9% 3.975 4.6% 5.225 2.2% 6.475 1.1%
0.250 61.7% 1.500 22.1% 2.750 9.7% 4.000 4.6% 5.250 2.2% 6.500 1.1%
0.275 60.0% 1.525 21.7% 2.775 9.6% 4.025 4.5% 5.275 2.2% 6.525 1.1%
0.300 58.4% 1.550 21.3% 2.800 9.4% 4.050 4.4% 5.300 2.1% 6.550 1.0%
0.325 56.9% 1.575 20.9% 2.825 9.3% 4.075 4.4% 5.325 2.1% 6.575 1.0%
0.350 55.4% 1.600 20.6% 2.850 9.1% 4.100 4.3% 5.350 2.1% 6.600 1.0%
0.375 54.0% 1.625 20.2% 2.875 9.0% 4.125 4.2% 5.375 2.0% 6.625 1.0%
0.400 52.7% 1.650 19.9% 2.900 8.9% 4.150 4.2% 5.400 2.0% 6.650 0.99%
0.425 51.4% 1.675 19.6% 2.925 8.7% 4.175 4.1% 5.425 2.0% 6.675 0.98%
0.450 50.2% 1.700 19.2% 2.950 8.6% 4.200 4.0% 5.450 2.0% 6.700 0.96%
0.475 49.1% 1.725 18.9% 2.975 8.5% 4.225 4.0% 5.475 1.9% 6.725 0.95%
0.500 48.0% 1.750 18.6% 3.000 8.3% 4.250 3.9% 5.500 1.9% 6.750 0.94%
0.525 46.9% 1.775 18.3% 3.025 8.2% 4.275 3.9% 5.525 1.9% 6.775 0.92%
0.550 45.8% 1.800 18.0% 3.050 8.1% 4.300 3.8% 5.550 1.8% 6.800 0.91%
0.575 44.8% 1.825 17.7% 3.075 8.0% 4.325 3.8% 5.575 1.8% 6.825 0.90%
0.600 43.9% 1.850 17.4% 3.100 7.8% 4.350 3.7% 5.600 1.8% 6.850 0.89%
0.625 42.9% 1.875 17.1% 3.125 7.7% 4.375 3.6% 5.625 1.8% 6.875 0.87%
0.650 42.0% 1.900 16.8% 3.150 7.6% 4.400 3.6% 5.650 1.7% 6.900 0.86%
0.675 41.1% 1.925 16.5% 3.175 7.5% 4.425 3.5% 5.675 1.7% 6.925 0.85%
0.700 40.3% 1.950 16.3% 3.200 7.4% 4.450 3.5% 5.700 1.7% 6.950 0.84%
0.725 39.5% 1.975 16.0% 3.225 7.3% 4.475 3.4% 5.725 1.7% 6.975 0.83%
0.750 38.6% 2.000 15.7% 3.250 7.1% 4.500 3.4% 5.750 1.6% 7.000 0.82%
0.775 37.9% 2.025 15.5% 3.275 7.0% 4.525 3.3% 5.775 1.6% 7.025 0.80%
0.800 37.1% 2.050 15.2% 3.300 6.9% 4.550 3.3% 5.800 1.6% 7.050 0.79%
0.825 36.4% 2.075 15.0% 3.325 6.8% 4.575 3.2% 5.825 1.6% 7.075 0.78%
0.850 35.7% 2.100 14.7% 3.350 6.7% 4.600 3.2% 5.850 1.6% 7.100 0.77%
0.875 35.0% 2.125 14.5% 3.375 6.6% 4.625 3.2% 5.875 1.5% 7.125 0.76%
0.900 34.3% 2.150 14.3% 3.400 6.5% 4.650 3.1% 5.900 1.5% 7.150 0.75%
0.925 33.6% 2.175 14.0% 3.425 6.4% 4.675 3.1% 5.925 1.5% 7.175 0.74%
0.950 33.0% 2.200 13.8% 3.450 6.3% 4.700 3.0% 5.950 1.5% 7.200 0.73%
0.975 32.3% 2.225 13.6% 3.475 6.2% 4.725 3.0% 5.975 1.5% 7.225 0.72%
1.000 31.7% 2.250 13.4% 3.500 6.1% 4.750 2.9% 6.000 1.4% 7.250 0.71%
1.025 31.1% 2.275 13.1% 3.525 6.0% 4.775 2.9% 6.025 1.4% 7.275 0.70%
1.050 30.6% 2.300 12.9% 3.550 6.0% 4.800 2.8% 6.050 1.4% 7.300 0.69%
1.075 30.0% 2.325 12.7% 3.575 5.9% 4.825 2.8% 6.075 1.4% 7.325 0.68%
1.100 29.4% 2.350 12.5% 3.600 5.8% 4.850 2.8% 6.100 1.4% 7.350 0.67%
1.125 28.9% 2.375 12.3% 3.625 5.7% 4.875 2.7% 6.125 1.3% 7.375 0.66%
1.150 28.4% 2.400 12.1% 3.650 5.6% 4.900 2.7% 6.150 1.3% 7.400 0.65%
1.175 27.8% 2.425 11.9% 3.675 5.5% 4.925 2.6% 6.175 1.3% 7.425 0.64%
1.200 27.3% 2.450 11.8% 3.700 5.4% 4.950 2.6% 6.200 1.3% 7.450 0.63%
1.225 26.8% 2.475 11.6% 3.725 5.4% 4.975 2.6% 6.225 1.3% 7.475 0.63%
Chi-squared tail probabilities
% of observations from χ²(2)
x above x x above x x above x x above x x above x x above x
0.00 100.0% 2.50 28.7% 5.00 8.2% 7.50 2.4% 10.00 0.67% 12.50 0.19%
0.05 97.5% 2.55 27.9% 5.05 8.0% 7.55 2.3% 10.05 0.66% 12.55 0.19%
0.10 95.1% 2.60 27.3% 5.10 7.8% 7.60 2.2% 10.10 0.64% 12.60 0.18%
0.15 92.8% 2.65 26.6% 5.15 7.6% 7.65 2.2% 10.15 0.63% 12.65 0.18%
0.20 90.5% 2.70 25.9% 5.20 7.4% 7.70 2.1% 10.20 0.61% 12.70 0.17%
0.25 88.2% 2.75 25.3% 5.25 7.2% 7.75 2.1% 10.25 0.59% 12.75 0.17%
0.30 86.1% 2.80 24.7% 5.30 7.1% 7.80 2.0% 10.30 0.58% 12.80 0.17%
0.35 83.9% 2.85 24.1% 5.35 6.9% 7.85 2.0% 10.35 0.57% 12.85 0.16%
0.40 81.9% 2.90 23.5% 5.40 6.7% 7.90 1.9% 10.40 0.55% 12.90 0.16%
0.45 79.9% 2.95 22.9% 5.45 6.6% 7.95 1.9% 10.45 0.54% 12.95 0.15%
0.50 77.9% 3.00 22.3% 5.50 6.4% 8.00 1.8% 10.50 0.52% 13.00 0.15%
0.55 76.0% 3.05 21.8% 5.55 6.2% 8.05 1.8% 10.55 0.51% 13.05 0.15%
0.60 74.1% 3.10 21.2% 5.60 6.1% 8.10 1.7% 10.60 0.50% 13.10 0.14%
0.65 72.3% 3.15 20.7% 5.65 5.9% 8.15 1.7% 10.65 0.49% 13.15 0.14%
0.70 70.5% 3.20 20.2% 5.70 5.8% 8.20 1.7% 10.70 0.47% 13.20 0.14%
0.75 68.7% 3.25 19.7% 5.75 5.6% 8.25 1.6% 10.75 0.46% 13.25 0.13%
0.80 67.0% 3.30 19.2% 5.80 5.5% 8.30 1.6% 10.80 0.45% 13.30 0.13%
0.85 65.4% 3.35 18.7% 5.85 5.4% 8.35 1.5% 10.85 0.44% 13.35 0.13%
0.90 63.8% 3.40 18.3% 5.90 5.2% 8.40 1.5% 10.90 0.43% 13.40 0.12%
0.95 62.2% 3.45 17.8% 5.95 5.1% 8.45 1.5% 10.95 0.42% 13.45 0.12%
1.00 60.7% 3.50 17.4% 6.00 5.0% 8.50 1.4% 11.00 0.41% 13.50 0.12%
1.05 59.2% 3.55 16.9% 6.05 4.9% 8.55 1.4% 11.05 0.40% 13.55 0.11%
1.10 57.7% 3.60 16.5% 6.10 4.7% 8.60 1.4% 11.10 0.39% 13.60 0.11%
1.15 56.3% 3.65 16.1% 6.15 4.6% 8.65 1.3% 11.15 0.38% 13.65 0.11%
1.20 54.9% 3.70 15.7% 6.20 4.5% 8.70 1.3% 11.20 0.37% 13.70 0.11%
1.25 53.5% 3.75 15.3% 6.25 4.4% 8.75 1.3% 11.25 0.36% 13.75 0.10%
1.30 52.2% 3.80 15.0% 6.30 4.3% 8.80 1.2% 11.30 0.35% 13.80 0.10%
1.35 50.9% 3.85 14.6% 6.35 4.2% 8.85 1.2% 11.35 0.34% 13.85 0.098%
1.40 49.7% 3.90 14.2% 6.40 4.1% 8.90 1.2% 11.40 0.33% 13.90 0.096%
1.45 48.4% 3.95 13.9% 6.45 4.0% 8.95 1.1% 11.45 0.33% 13.95 0.093%
1.50 47.2% 4.00 13.5% 6.50 3.9% 9.00 1.1% 11.50 0.32% 14.00 0.091%
1.55 46.1% 4.05 13.2% 6.55 3.8% 9.05 1.1% 11.55 0.31% 14.05 0.089%
1.60 44.9% 4.10 12.9% 6.60 3.7% 9.10 1.1% 11.60 0.30% 14.10 0.087%
1.65 43.8% 4.15 12.6% 6.65 3.6% 9.15 1.0% 11.65 0.30% 14.15 0.085%
1.70 42.7% 4.20 12.2% 6.70 3.5% 9.20 1.01% 11.70 0.29% 14.20 0.083%
1.75 41.7% 4.25 11.9% 6.75 3.4% 9.25 0.98% 11.75 0.28% 14.25 0.080%
1.80 40.7% 4.30 11.6% 6.80 3.3% 9.30 0.96% 11.80 0.27% 14.30 0.078%
1.85 39.7% 4.35 11.4% 6.85 3.3% 9.35 0.93% 11.85 0.27% 14.35 0.077%
1.90 38.7% 4.40 11.1% 6.90 3.2% 9.40 0.91% 11.90 0.26% 14.40 0.075%
1.95 37.7% 4.45 10.8% 6.95 3.1% 9.45 0.89% 11.95 0.25% 14.45 0.073%
2.00 36.8% 4.50 10.5% 7.00 3.0% 9.50 0.87% 12.00 0.25% 14.50 0.071%
2.05 35.9% 4.55 10.3% 7.05 2.9% 9.55 0.84% 12.05 0.24% 14.55 0.069%
2.10 35.0% 4.60 10.0% 7.10 2.9% 9.60 0.82% 12.10 0.24% 14.60 0.068%
2.15 34.1% 4.65 9.8% 7.15 2.8% 9.65 0.80% 12.15 0.23% 14.65 0.066%
2.20 33.3% 4.70 9.5% 7.20 2.7% 9.70 0.78% 12.20 0.22% 14.70 0.064%
2.25 32.5% 4.75 9.3% 7.25 2.7% 9.75 0.76% 12.25 0.22% 14.75 0.063%
2.30 31.7% 4.80 9.1% 7.30 2.6% 9.80 0.74% 12.30 0.21% 14.80 0.061%
2.35 30.9% 4.85 8.8% 7.35 2.5% 9.85 0.73% 12.35 0.21% 14.85 0.060%
2.40 30.1% 4.90 8.6% 7.40 2.5% 9.90 0.71% 12.40 0.20% 14.90 0.058%
2.45 29.4% 4.95 8.4% 7.45 2.4% 9.95 0.69% 12.45 0.20% 14.95 0.057%
Chi-squared tail probabilities
% of observations from χ²(3)
x above x x above x x above x x above x x above x x above x
0.00 100.0% 2.50 47.5% 5.00 17.2% 7.50 5.8% 10.00 1.9% 12.50 0.59%
0.05 99.7% 2.55 46.6% 5.05 16.8% 7.55 5.6% 10.05 1.8% 12.55 0.57%
0.10 99.2% 2.60 45.7% 5.10 16.5% 7.60 5.5% 10.10 1.8% 12.60 0.56%
0.15 98.5% 2.65 44.9% 5.15 16.1% 7.65 5.4% 10.15 1.7% 12.65 0.55%
0.20 97.8% 2.70 44.0% 5.20 15.8% 7.70 5.3% 10.20 1.7% 12.70 0.53%
0.25 96.9% 2.75 43.2% 5.25 15.4% 7.75 5.1% 10.25 1.7% 12.75 0.52%
0.30 96.0% 2.80 42.3% 5.30 15.1% 7.80 5.0% 10.30 1.6% 12.80 0.51%
0.35 95.0% 2.85 41.5% 5.35 14.8% 7.85 4.9% 10.35 1.6% 12.85 0.50%
0.40 94.0% 2.90 40.7% 5.40 14.5% 7.90 4.8% 10.40 1.5% 12.90 0.49%
0.45 93.0% 2.95 39.9% 5.45 14.2% 7.95 4.7% 10.45 1.5% 12.95 0.47%
0.50 91.9% 3.00 39.2% 5.50 13.9% 8.00 4.6% 10.50 1.5% 13.00 0.46%
0.55 90.8% 3.05 38.4% 5.55 13.6% 8.05 4.5% 10.55 1.4% 13.05 0.45%
0.60 89.6% 3.10 37.6% 5.60 13.3% 8.10 4.4% 10.60 1.4% 13.10 0.44%
0.65 88.5% 3.15 36.9% 5.65 13.0% 8.15 4.3% 10.65 1.4% 13.15 0.43%
0.70 87.3% 3.20 36.2% 5.70 12.7% 8.20 4.2% 10.70 1.3% 13.20 0.42%
0.75 86.1% 3.25 35.5% 5.75 12.4% 8.25 4.1% 10.75 1.3% 13.25 0.41%
0.80 84.9% 3.30 34.8% 5.80 12.2% 8.30 4.0% 10.80 1.3% 13.30 0.40%
0.85 83.7% 3.35 34.1% 5.85 11.9% 8.35 3.9% 10.85 1.3% 13.35 0.39%
0.90 82.5% 3.40 33.4% 5.90 11.7% 8.40 3.8% 10.90 1.2% 13.40 0.38%
0.95 81.3% 3.45 32.7% 5.95 11.4% 8.45 3.8% 10.95 1.2% 13.45 0.38%
1.00 80.1% 3.50 32.1% 6.00 11.2% 8.50 3.7% 11.00 1.2% 13.50 0.37%
1.05 78.9% 3.55 31.4% 6.05 10.9% 8.55 3.6% 11.05 1.1% 13.55 0.36%
1.10 77.7% 3.60 30.8% 6.10 10.7% 8.60 3.5% 11.10 1.1% 13.60 0.35%
1.15 76.5% 3.65 30.2% 6.15 10.5% 8.65 3.4% 11.15 1.1% 13.65 0.34%
1.20 75.3% 3.70 29.6% 6.20 10.2% 8.70 3.4% 11.20 1.1% 13.70 0.33%
1.25 74.1% 3.75 29.0% 6.25 10.0% 8.75 3.3% 11.25 1.0% 13.75 0.33%
1.30 72.9% 3.80 28.4% 6.30 9.8% 8.80 3.2% 11.30 1.0% 13.80 0.32%
1.35 71.7% 3.85 27.8% 6.35 9.6% 8.85 3.1% 11.35 1.00% 13.85 0.31%
1.40 70.6% 3.90 27.2% 6.40 9.4% 8.90 3.1% 11.40 0.97% 13.90 0.30%
1.45 69.4% 3.95 26.7% 6.45 9.2% 8.95 3.0% 11.45 0.95% 13.95 0.30%
1.50 68.2% 4.00 26.1% 6.50 9.0% 9.00 2.9% 11.50 0.93% 14.00 0.29%
1.55 67.1% 4.05 25.6% 6.55 8.8% 9.05 2.9% 11.55 0.91% 14.05 0.28%
1.60 65.9% 4.10 25.1% 6.60 8.6% 9.10 2.8% 11.60 0.89% 14.10 0.28%
1.65 64.8% 4.15 24.6% 6.65 8.4% 9.15 2.7% 11.65 0.87% 14.15 0.27%
1.70 63.7% 4.20 24.1% 6.70 8.2% 9.20 2.7% 11.70 0.85% 14.20 0.26%
1.75 62.6% 4.25 23.6% 6.75 8.0% 9.25 2.6% 11.75 0.83% 14.25 0.26%
1.80 61.5% 4.30 23.1% 6.80 7.9% 9.30 2.6% 11.80 0.81% 14.30 0.25%
1.85 60.4% 4.35 22.6% 6.85 7.7% 9.35 2.5% 11.85 0.79% 14.35 0.25%
1.90 59.3% 4.40 22.1% 6.90 7.5% 9.40 2.4% 11.90 0.77% 14.40 0.24%
1.95 58.3% 4.45 21.7% 6.95 7.4% 9.45 2.4% 11.95 0.76% 14.45 0.24%
2.00 57.2% 4.50 21.2% 7.00 7.2% 9.50 2.3% 12.00 0.74% 14.50 0.23%
2.05 56.2% 4.55 20.8% 7.05 7.0% 9.55 2.3% 12.05 0.72% 14.55 0.22%
2.10 55.2% 4.60 20.4% 7.10 6.9% 9.60 2.2% 12.10 0.70% 14.60 0.22%
2.15 54.2% 4.65 19.9% 7.15 6.7% 9.65 2.2% 12.15 0.69% 14.65 0.21%
2.20 53.2% 4.70 19.5% 7.20 6.6% 9.70 2.1% 12.20 0.67% 14.70 0.21%
2.25 52.2% 4.75 19.1% 7.25 6.4% 9.75 2.1% 12.25 0.66% 14.75 0.20%
2.30 51.3% 4.80 18.7% 7.30 6.3% 9.80 2.0% 12.30 0.64% 14.80 0.20%
2.35 50.3% 4.85 18.3% 7.35 6.2% 9.85 2.0% 12.35 0.63% 14.85 0.19%
2.40 49.4% 4.90 17.9% 7.40 6.0% 9.90 1.9% 12.40 0.61% 14.90 0.19%
2.45 48.4% 4.95 17.5% 7.45 5.9% 9.95 1.9% 12.45 0.60% 14.95 0.19%
Chi-squared tail probabilities
% of observations from χ²(4)
x above x x above x x above x x above x x above x x above x
0.00 100.0% 2.50 64.5% 5.00 28.7% 7.50 11.2% 10.00 4.0% 12.50 1.4%
0.05 100.0% 2.55 63.6% 5.05 28.2% 7.55 11.0% 10.05 4.0% 12.55 1.4%
0.10 99.9% 2.60 62.7% 5.10 27.7% 7.60 10.7% 10.10 3.9% 12.60 1.3%
0.15 99.7% 2.65 61.8% 5.15 27.2% 7.65 10.5% 10.15 3.8% 12.65 1.3%
0.20 99.5% 2.70 60.9% 5.20 26.7% 7.70 10.3% 10.20 3.7% 12.70 1.3%
0.25 99.3% 2.75 60.0% 5.25 26.3% 7.75 10.1% 10.25 3.6% 12.75 1.3%
0.30 99.0% 2.80 59.2% 5.30 25.8% 7.80 9.9% 10.30 3.6% 12.80 1.2%
0.35 98.6% 2.85 58.3% 5.35 25.3% 7.85 9.7% 10.35 3.5% 12.85 1.2%
0.40 98.2% 2.90 57.5% 5.40 24.9% 7.90 9.5% 10.40 3.4% 12.90 1.2%
0.45 97.8% 2.95 56.6% 5.45 24.4% 7.95 9.3% 10.45 3.3% 12.95 1.2%
0.50 97.4% 3.00 55.8% 5.50 24.0% 8.00 9.2% 10.50 3.3% 13.00 1.1%
0.55 96.8% 3.05 54.9% 5.55 23.5% 8.05 9.0% 10.55 3.2% 13.05 1.1%
0.60 96.3% 3.10 54.1% 5.60 23.1% 8.10 8.8% 10.60 3.1% 13.10 1.1%
0.65 95.7% 3.15 53.3% 5.65 22.7% 8.15 8.6% 10.65 3.1% 13.15 1.1%
0.70 95.1% 3.20 52.5% 5.70 22.3% 8.20 8.5% 10.70 3.0% 13.20 1.0%
0.75 94.5% 3.25 51.7% 5.75 21.9% 8.25 8.3% 10.75 3.0% 13.25 1.0%
0.80 93.8% 3.30 50.9% 5.80 21.5% 8.30 8.1% 10.80 2.9% 13.30 0.99%
0.85 93.2% 3.35 50.1% 5.85 21.1% 8.35 8.0% 10.85 2.8% 13.35 0.97%
0.90 92.5% 3.40 49.3% 5.90 20.7% 8.40 7.8% 10.90 2.8% 13.40 0.95%
0.95 91.7% 3.45 48.6% 5.95 20.3% 8.45 7.6% 10.95 2.7% 13.45 0.93%
1.00 91.0% 3.50 47.8% 6.00 19.9% 8.50 7.5% 11.00 2.7% 13.50 0.91%
1.05 90.2% 3.55 47.0% 6.05 19.5% 8.55 7.3% 11.05 2.6% 13.55 0.89%
1.10 89.4% 3.60 46.3% 6.10 19.2% 8.60 7.2% 11.10 2.5% 13.60 0.87%
1.15 88.6% 3.65 45.5% 6.15 18.8% 8.65 7.0% 11.15 2.5% 13.65 0.85%
1.20 87.8% 3.70 44.8% 6.20 18.5% 8.70 6.9% 11.20 2.4% 13.70 0.83%
1.25 87.0% 3.75 44.1% 6.25 18.1% 8.75 6.8% 11.25 2.4% 13.75 0.81%
1.30 86.1% 3.80 43.4% 6.30 17.8% 8.80 6.6% 11.30 2.3% 13.80 0.80%
1.35 85.3% 3.85 42.7% 6.35 17.4% 8.85 6.5% 11.35 2.3% 13.85 0.78%
1.40 84.4% 3.90 42.0% 6.40 17.1% 8.90 6.4% 11.40 2.2% 13.90 0.76%
1.45 83.5% 3.95 41.3% 6.45 16.8% 8.95 6.2% 11.45 2.2% 13.95 0.75%
1.50 82.7% 4.00 40.6% 6.50 16.5% 9.00 6.1% 11.50 2.1% 14.00 0.73%
1.55 81.8% 4.05 39.9% 6.55 16.2% 9.05 6.0% 11.55 2.1% 14.05 0.71%
1.60 80.9% 4.10 39.3% 6.60 15.9% 9.10 5.9% 11.60 2.1% 14.10 0.70%
1.65 80.0% 4.15 38.6% 6.65 15.6% 9.15 5.7% 11.65 2.0% 14.15 0.68%
1.70 79.1% 4.20 38.0% 6.70 15.3% 9.20 5.6% 11.70 2.0% 14.20 0.67%
1.75 78.2% 4.25 37.3% 6.75 15.0% 9.25 5.5% 11.75 1.9% 14.25 0.65%
1.80 77.2% 4.30 36.7% 6.80 14.7% 9.30 5.4% 11.80 1.9% 14.30 0.64%
1.85 76.3% 4.35 36.1% 6.85 14.4% 9.35 5.3% 11.85 1.9% 14.35 0.63%
1.90 75.4% 4.40 35.5% 6.90 14.1% 9.40 5.2% 11.90 1.8% 14.40 0.61%
1.95 74.5% 4.45 34.9% 6.95 13.9% 9.45 5.1% 11.95 1.8% 14.45 0.60%
2.00 73.6% 4.50 34.3% 7.00 13.6% 9.50 5.0% 12.00 1.7% 14.50 0.59%
2.05 72.7% 4.55 33.7% 7.05 13.3% 9.55 4.9% 12.05 1.7% 14.55 0.57%
2.10 71.7% 4.60 33.1% 7.10 13.1% 9.60 4.8% 12.10 1.7% 14.60 0.56%
2.15 70.8% 4.65 32.5% 7.15 12.8% 9.65 4.7% 12.15 1.6% 14.65 0.55%
2.20 69.9% 4.70 31.9% 7.20 12.6% 9.70 4.6% 12.20 1.6% 14.70 0.54%
2.25 69.0% 4.75 31.4% 7.25 12.3% 9.75 4.5% 12.25 1.6% 14.75 0.52%
2.30 68.1% 4.80 30.8% 7.30 12.1% 9.80 4.4% 12.30 1.5% 14.80 0.51%
2.35 67.2% 4.85 30.3% 7.35 11.9% 9.85 4.3% 12.35 1.5% 14.85 0.50%
2.40 66.3% 4.90 29.8% 7.40 11.6% 9.90 4.2% 12.40 1.5% 14.90 0.49%
2.45 65.4% 4.95 29.2% 7.45 11.4% 9.95 4.1% 12.45 1.4% 14.95 0.48%
Chi-squared tail probabilities
% of observations from χ²(5)
x above x x above x x above x x above x x above x x above x
0.00 100.0% 2.50 77.6% 5.00 41.6% 7.50 18.6% 10.00 7.5% 12.50 2.9%
0.05 100.0% 2.55 76.9% 5.05 41.0% 7.55 18.3% 10.05 7.4% 12.55 2.8%
0.10 100.0% 2.60 76.1% 5.10 40.4% 7.60 18.0% 10.10 7.2% 12.60 2.7%
0.15 100.0% 2.65 75.4% 5.15 39.8% 7.65 17.7% 10.15 7.1% 12.65 2.7%
0.20 99.9% 2.70 74.6% 5.20 39.2% 7.70 17.4% 10.20 7.0% 12.70 2.6%
0.25 99.8% 2.75 73.8% 5.25 38.6% 7.75 17.1% 10.25 6.8% 12.75 2.6%
0.30 99.8% 2.80 73.1% 5.30 38.0% 7.80 16.8% 10.30 6.7% 12.80 2.5%
0.35 99.7% 2.85 72.3% 5.35 37.5% 7.85 16.5% 10.35 6.6% 12.85 2.5%
0.40 99.5% 2.90 71.5% 5.40 36.9% 7.90 16.2% 10.40 6.5% 12.90 2.4%
0.45 99.4% 2.95 70.8% 5.45 36.3% 7.95 15.9% 10.45 6.3% 12.95 2.4%
0.50 99.2% 3.00 70.0% 5.50 35.8% 8.00 15.6% 10.50 6.2% 13.00 2.3%
0.55 99.0% 3.05 69.2% 5.55 35.2% 8.05 15.4% 10.55 6.1% 13.05 2.3%
0.60 98.8% 3.10 68.5% 5.60 34.7% 8.10 15.1% 10.60 6.0% 13.10 2.2%
0.65 98.6% 3.15 67.7% 5.65 34.2% 8.15 14.8% 10.65 5.9% 13.15 2.2%
0.70 98.3% 3.20 66.9% 5.70 33.7% 8.20 14.6% 10.70 5.8% 13.20 2.2%
0.75 98.0% 3.25 66.2% 5.75 33.1% 8.25 14.3% 10.75 5.7% 13.25 2.1%
0.80 97.7% 3.30 65.4% 5.80 32.6% 8.30 14.0% 10.80 5.5% 13.30 2.1%
0.85 97.4% 3.35 64.6% 5.85 32.1% 8.35 13.8% 10.85 5.4% 13.35 2.0%
0.90 97.0% 3.40 63.9% 5.90 31.6% 8.40 13.6% 10.90 5.3% 13.40 2.0%
0.95 96.6% 3.45 63.1% 5.95 31.1% 8.45 13.3% 10.95 5.2% 13.45 2.0%
1.00 96.3% 3.50 62.3% 6.00 30.6% 8.50 13.1% 11.00 5.1% 13.50 1.9%
1.05 95.8% 3.55 61.6% 6.05 30.1% 8.55 12.8% 11.05 5.0% 13.55 1.9%
1.10 95.4% 3.60 60.8% 6.10 29.7% 8.60 12.6% 11.10 4.9% 13.60 1.8%
1.15 95.0% 3.65 60.1% 6.15 29.2% 8.65 12.4% 11.15 4.8% 13.65 1.8%
1.20 94.5% 3.70 59.3% 6.20 28.7% 8.70 12.2% 11.20 4.8% 13.70 1.8%
1.25 94.0% 3.75 58.6% 6.25 28.3% 8.75 11.9% 11.25 4.7% 13.75 1.7%
1.30 93.5% 3.80 57.9% 6.30 27.8% 8.80 11.7% 11.30 4.6% 13.80 1.7%
1.35 93.0% 3.85 57.1% 6.35 27.4% 8.85 11.5% 11.35 4.5% 13.85 1.7%
1.40 92.4% 3.90 56.4% 6.40 26.9% 8.90 11.3% 11.40 4.4% 13.90 1.6%
1.45 91.9% 3.95 55.7% 6.45 26.5% 8.95 11.1% 11.45 4.3% 13.95 1.6%
1.50 91.3% 4.00 54.9% 6.50 26.1% 9.00 10.9% 11.50 4.2% 14.00 1.6%
1.55 90.7% 4.05 54.2% 6.55 25.6% 9.05 10.7% 11.55 4.2% 14.05 1.5%
1.60 90.1% 4.10 53.5% 6.60 25.2% 9.10 10.5% 11.60 4.1% 14.10 1.5%
1.65 89.5% 4.15 52.8% 6.65 24.8% 9.15 10.3% 11.65 4.0% 14.15 1.5%
1.70 88.9% 4.20 52.1% 6.70 24.4% 9.20 10.1% 11.70 3.9% 14.20 1.4%
1.75 88.3% 4.25 51.4% 6.75 24.0% 9.25 9.9% 11.75 3.8% 14.25 1.4%
1.80 87.6% 4.30 50.7% 6.80 23.6% 9.30 9.8% 11.80 3.8% 14.30 1.4%
1.85 86.9% 4.35 50.0% 6.85 23.2% 9.35 9.6% 11.85 3.7% 14.35 1.4%
1.90 86.3% 4.40 49.3% 6.90 22.8% 9.40 9.4% 11.90 3.6% 14.40 1.3%
1.95 85.6% 4.45 48.7% 6.95 22.4% 9.45 9.2% 11.95 3.5% 14.45 1.3%
2.00 84.9% 4.50 48.0% 7.00 22.1% 9.50 9.1% 12.00 3.5% 14.50 1.3%
2.05 84.2% 4.55 47.3% 7.05 21.7% 9.55 8.9% 12.05 3.4% 14.55 1.2%
2.10 83.5% 4.60 46.7% 7.10 21.3% 9.60 8.7% 12.10 3.3% 14.60 1.2%
2.15 82.8% 4.65 46.0% 7.15 21.0% 9.65 8.6% 12.15 3.3% 14.65 1.2%
2.20 82.1% 4.70 45.4% 7.20 20.6% 9.70 8.4% 12.20 3.2% 14.70 1.2%
2.25 81.4% 4.75 44.7% 7.25 20.3% 9.75 8.3% 12.25 3.2% 14.75 1.1%
2.30 80.6% 4.80 44.1% 7.30 19.9% 9.80 8.1% 12.30 3.1% 14.80 1.1%
2.35 79.9% 4.85 43.4% 7.35 19.6% 9.85 8.0% 12.35 3.0% 14.85 1.1%
2.40 79.1% 4.90 42.8% 7.40 19.3% 9.90 7.8% 12.40 3.0% 14.90 1.1%
2.45 78.4% 4.95 42.2% 7.45 18.9% 9.95 7.7% 12.45 2.9% 14.95 1.1%
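
All five chi-squared tables evaluate the same survival function at different degrees of freedom; a closing sketch, scipy again assumed:

    # Upper-tail probability P(X > x) for X ~ chi-squared with df degrees of freedom.
    from scipy.stats import chi2

    x = 5.00
    for df in range(1, 6):
        print(df, f"{chi2.sf(x, df):.1%}")  # 2.5%, 8.2%, 17.2%, 28.7%, 41.6%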