Xavier D'Haultfœuille
CREST-INSEE
Outline
Model and motivation
Inference in quantile regressions
Additional properties
Quantile regression in practice
Quantile IV
Quantile restrictions in nonlinear models
Model and motivation
Prologue: quantiles
- If U ~ U[0, 1] and q denotes the quantile function of Y, then q(U) has the same distribution as Y (the quantile transform).
- The τ-th conditional quantile of Y given X is defined as
q_τ(Y|X) = inf{u : F_{Y|X}(u|X) ≥ τ}.  (1)
The model
- In the location-scale model, OLS consistently estimates β, but by running OLS we miss the fact that the effect of X differs according to quantiles of the unobserved variable ε.
[Figure 2.2: Engel curves for food. Data from Engel (1857), as used by Koenker and Hallock (2001): food expenditure plotted against household income, with seven estimated quantile regression lines for different values of τ. The median is indicated by the dashed line while the OLS estimate is the dotted line.]
- In this case, q_τ(Y|X) = X'β_τ, so instead of β, we estimate β_τ. The difference β_τ − β is independent of X and will typically be close to zero. If some components of β are independent of τ (homogeneous effects), the contamination does not affect their estimation.
Inference in quantile regressions
Proposition
Consider the check function ρ_τ(u) = (τ − 1{u < 0})u. Then:
q_τ(Y) ∈ arg min_a E[ρ_τ(Y − a)].
Proof sketch: the derivative of a ↦ E[ρ_τ(Y − a)] equals ∫_{−∞}^a f_Y(y) dy − τ = F_Y(a) − τ, which is zero at a = q_τ(Y).
Thus, integrating over P_X,
(x ↦ q_τ(Y|X = x)) = arg min_{h(·)} E[ρ_τ(Y − h(X))].
In the linear model, β̂_τ accordingly minimizes (1/n) Σ_{i=1}^n ρ_τ(Y_i − X_i'β); for τ = 1/2 this is least absolute deviations, (1/n) Σ_{i=1}^n |Y_i − X_i'β|.
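As a quick numerical illustration (not part of the original slides), minimizing the empirical check loss over a grid of candidate values recovers the empirical quantile; the normal sample, grid and seed below are arbitrary choices:

```python
import numpy as np

# rho_tau(u) = (tau - 1{u<0}) u, the check function from the proposition.
def check_loss(u, tau):
    return (tau - (u < 0)) * u

rng = np.random.default_rng(0)
y = rng.normal(size=20_000)
tau = 0.25

# Minimize a |-> mean rho_tau(y - a) over a grid; the minimizer should be
# (up to grid resolution) the empirical tau-quantile.
grid = np.linspace(-2.0, 0.0, 801)
losses = [check_loss(y - a, tau).mean() for a in grid]
a_star = grid[np.argmin(losses)]
print(a_star, np.quantile(y, tau))   # both ≈ -0.674, the N(0,1) 25% quantile
```

The empirical criterion is piecewise linear and convex in a, and its exact minimizer is the sample quantile itself, which is why the two printed values agree up to the grid resolution.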
Identification
- β_τ is identified as the unique minimizer of β ↦ E[ρ_τ(Y − X'β)] under standard conditions (positive conditional density of Y at X'β_τ and nonsingular E[XX']).
Consistency
- Consistency of β̂_τ follows from a general argument for M-estimators, based on a uniform law of large numbers.
Theorem
(van der Vaart, 1998, Theorem 5.7) Let Θ denote the set of parameters and suppose that for all ε > 0:
sup_{θ∈Θ} |(1/n) Σ_{i=1}^n m(U_i, θ) − E[m(U_1, θ)]| →^P 0,  (5)
inf_{θ: d(θ,θ_0) ≥ ε} E[m(U_1, θ)] > E[m(U_1, θ_0)].  (6)
Then any sequence θ̂_n of approximate minimizers of the sample criterion satisfies θ̂_n →^P θ_0.
Consistency
Example: the standard Glivenko-Cantelli theorem. Let us consider the functions m(x, t) = 1{x ≤ t}. Then, if Y_1 is continuous:
sup_{t∈R} |(1/n) Σ_{i=1}^n m(Y_i, t) − E[m(Y_1, t)]| →^P 0.
Proof: fix ε > 0 and consider −∞ = t_0 < ... < t_K = +∞ such that F(t_k) − F(t_{k−1}) < ε. Then for all t ∈ [t_{k−1}, t_k],
F_n(t) − F(t) ≤ F_n(t_k) − F(t_{k−1}) ≤ F_n(t_k) − F(t_k) + ε.
Similarly, F_n(t) − F(t) ≥ F_n(t_{k−1}) − F(t_{k−1}) − ε. Thus,
|F_n(t) − F(t)| ≤ max{|F_n(t_k) − F(t_k)|, |F_n(t_{k−1}) − F(t_{k−1})|} + ε.
Consistency
As a result,
sup_{t∈R} |F_n(t) − F(t)| ≤ max_{i∈{0,...,K}} |F_n(t_i) − F(t_i)| + ε.
By the weak law of large numbers, the maximum tends to zero. The result follows.
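The Glivenko-Cantelli statement above is easy to check by simulation; here is a minimal sketch (uniform draws, so F(t) = t on [0, 1], and the supremum can be computed exactly at the order statistics):

```python
import numpy as np

# Monte-Carlo illustration of Glivenko-Cantelli: sup_t |F_n(t) - F(t)| -> 0.
rng = np.random.default_rng(1)

def sup_deviation(n):
    y = np.sort(rng.uniform(size=n))   # F(t) = t on [0, 1]
    i = np.arange(1, n + 1)
    # The supremum of |F_n - F| is attained at (or just before) the order
    # statistics, where F_n jumps from (i-1)/n to i/n.
    return np.maximum(i / n - y, y - (i - 1) / n).max()

devs = {n: sup_deviation(n) for n in (100, 10_000, 1_000_000)}
print(devs)
```

The deviations shrink roughly like 1/√n, consistent with the theorem.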
This proof can be generalized to classes of functions different from (1{· ≤ t})_{t∈R}. An ε-bracket in L_r is a set of functions f with l ≤ f ≤ u, where l and u are two functions satisfying (∫|u − l|^r dF)^{1/r} < ε. For a given class of functions F, define the bracketing number N_[](ε, F, L_r) as the minimum number of ε-brackets needed to cover F.
Proposition
(van der Vaart, 1998, Theorem 19.4) Suppose that for all ε > 0, N_[](ε, F, L_1) < ∞. Then
sup_{f∈F} |(1/n) Σ_{i=1}^n f(X_i) − E[f(X_1)]| →^P 0.
Consistency
The proposition applies to many cases, see van der Vaart (1998), chapter
19, for examples. In particular, it holds with parametric families satisfying
|(Ui , 1 ) (Ui , 2 )| m(Ui )||1 2 ||, E (m(U1 )) < .
(7)
In quantile regression,
| (Y X 0 1 ) (Y X 0 2 )|
max(, 1 )|X 0 (1 2 )|
Thus (7) holds provided that E (||X ||) < . This establishes consistency
of b since we can then apply the theorem above.
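The Lipschitz bound on the check function used for (7) can also be verified numerically; a minimal sketch, with random scalars standing in for Y and the two indices X'β_1, X'β_2:

```python
import numpy as np

# Numerical check of the Lipschitz bound behind (7):
# |rho_tau(y - b1) - rho_tau(y - b2)| <= max(tau, 1 - tau) * |b1 - b2|.
rng = np.random.default_rng(2)

def rho(u, tau):
    return (tau - (u < 0)) * u

tau = 0.3
y, b1, b2 = 5 * rng.normal(size=(3, 100_000))
gap = (np.abs(rho(y - b1, tau) - rho(y - b2, tau))
       - max(tau, 1 - tau) * np.abs(b1 - b2)).max()
print(gap)   # <= 0 up to floating-point rounding
```

The bound holds because ρ_τ has slopes τ and τ − 1, both bounded in absolute value by max(τ, 1 − τ).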
Asymptotic normality
- In the smooth case, the estimator θ̂ solves the first-order condition
(1/n) Σ_{i=1}^n ∂m/∂θ(U_i, θ̂) = 0.  (8)
Then, expanding around θ_0, we get
0 = (1/n) Σ_{i=1}^n ∂m/∂θ(U_i, θ_0) + [(1/n) Σ_{i=1}^n ∂²m/∂θ∂θ'(U_i, θ_0)](θ̂ − θ_0) + o_P(||θ̂ − θ_0||).
Hence, provided that one can show that ||θ̂ − θ_0|| = O_P(1/√n), we have
[(1/n) Σ_{i=1}^n ∂²m/∂θ∂θ'(U_i, θ_0)] √n(θ̂ − θ_0) = −(1/√n) Σ_{i=1}^n ∂m/∂θ(U_i, θ_0) + o_P(1).
Asymptotic normality
- By the weak law of large numbers, the central limit theorem and Slutsky's lemma, we get:
√n(θ̂ − θ_0) →^L N(0, J^{−1} H J^{−1}),
where J = E[∂²m/∂θ∂θ'(U_i, θ_0)] and H = V(∂m/∂θ(U_i, θ_0)). This kind of variance is often called a sandwich formula.
- The first-order condition (8) may not hold exactly either. However, the 0 can be replaced by an o_P(1/√n) term, which is sufficient subsequently.
Asymptotic normality
Two key ideas for these kinds of situations:
- Even if θ ↦ ψ(U_i, θ) is not differentiable at θ_0 (ψ denoting the score of the criterion), its expectation Q(θ) = E[ψ(U_1, θ)] may be differentiable.
- Write the approximate first-order condition as
o_P(1) = (1/√n) Σ_{i=1}^n ψ(U_i, θ̂) = G_n(θ̂) + Q'(θ̃) √n(θ̂ − θ_0),  (9)
where θ̃ ∈ (θ_0, θ̂) and G_n(θ) = (1/√n) Σ_{i=1}^n [ψ(U_i, θ) − Q(θ)]. G_n is the empirical process indexed by θ, whose uniform behavior must be controlled.
Asymptotic normality
Proposition
(van der Vaart, Theorem 19.5) G_n, as a process indexed by f ∈ F, converges to a continuous Gaussian process if
∫_0^1 √(ln N_[](ε, F, L_2)) dε < ∞.
Asymptotic normality
- For the classes of functions involved in quantile regression, the bracketing numbers grow only polynomially, N_[](ε, F, L_2) ≤ K/ε^d for constants K and d. Thus the bracketing integral is finite and one can apply the previous theorem.
- Combining these elements, we obtain
√n(θ̂ − θ_0) →^L N(0, Q'(θ_0)^{−1} V(ψ(U_i, θ_0)) Q'(θ_0)^{−1}).
Asymptotic normality
- Besides, since E[(τ − 1{Y ≤ X'β_τ})²|X] = τ(1 − τ),
V(ψ(U_i, β_τ)) = τ(1 − τ) E[XX'].
Asymptotic normality
- Finally, we get:
√n(β̂_τ − β_τ) →^L N(0, τ(1 − τ) E[f_{ε|X}(0|X)XX']^{−1} E[XX'] E[f_{ε|X}(0|X)XX']^{−1}).
- If ε is independent of X, the asymptotic variance reduces to
[τ(1 − τ)/f_ε(q_τ(ε))²] E[XX']^{−1}.
This formula is similar to the one for the OLS estimator, except that σ² is replaced by τ(1 − τ)/f_ε(q_τ(ε))². In general, as we let τ → 1 or τ → 0, f_ε(q_τ(ε))² becomes very small and thus β̂_τ becomes imprecise. This is logical since data are often more dispersed at the tails.
Asymptotic normality
- Remark 2: this result applies in particular to simple quantiles q̂_τ, in which case we have:
√n(q̂_τ − q_τ) →^L N(0, τ(1 − τ)/f_Y(q_τ)²).
- To use these results, the density terms must be estimated. Note that 1/f(F^{−1}(τ)) is the derivative of the quantile function (the "sparsity"):
F^{−1}'(τ) = 1/f(F^{−1}(τ)) = lim_{h→0} [F^{−1}(τ + h) − F^{−1}(τ − h)]/(2h).
- Similarly, with residuals ε̂_i = Y_i − X_i'β̂_τ, the matrix E[f_{ε|X}(0|X)XX'] can be estimated by
(1/(2n h_n)) Σ_{i=1}^n 1{|ε̂_i| ≤ h_n} X_i X_i'.  (11)
- Note that we can replace the uniform kernel 1{|u| ≤ 1}/2 in (11) by any other density function.
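As an illustration of (11) in the simplest case (no covariates, X = 1), one can estimate f_Y(q_τ) with the uniform kernel and form the plug-in standard error of a sample quantile; the bandwidth rule below is an arbitrary choice:

```python
import numpy as np

# Plug-in asymptotic standard error for a sample quantile, using the
# uniform-kernel density estimate implicit in (11) with X = 1.
rng = np.random.default_rng(3)
tau, n = 0.5, 100_000
y = rng.normal(size=n)

q_hat = np.quantile(y, tau)
h = n ** (-1 / 5)                                   # bandwidth (an assumption)
f_hat = np.mean(np.abs(y - q_hat) <= h) / (2 * h)   # uniform-kernel estimate of f_Y(q_tau)
se = np.sqrt(tau * (1 - tau)) / (f_hat * np.sqrt(n))

# For N(0,1) and tau = 0.5, f_Y(q_tau) = 1/sqrt(2*pi) ≈ 0.3989.
print(f_hat, se)
```

The estimated density should be close to 0.3989, so the plug-in standard error matches the asymptotic formula √(τ(1 − τ))/(f_Y(q_τ)√n).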
- Confidence interval on a component of β_τ:
IC_{1−α} = [β̂_τ − z_{1−α/2} √V̂_as, β̂_τ + z_{1−α/2} √V̂_as],
where z_{1−α/2} is the (1 − α/2)-th quantile of the N(0,1) distribution.
- For a smooth transformation g(β_τ), the delta method yields the asymptotic variance g'(β̂_τ)' V̂_as g'(β̂_τ).
- The bootstrap is an alternative: from B bootstrap estimates β̂_1, ..., β̂_B, the variance can be estimated by
V̂_as = (1/B) Σ_{b=1}^B (β̂_b − β̂)²,
and a percentile confidence interval is IC_{1−α} = [q*_{α/2}, q*_{1−α/2}], where q*_u denotes the u-th quantile of (β̂_1, ..., β̂_B).
- For inference uniform in τ, one instead chooses a critical value z_{1−α} such that Pr(T_n(τ) ≤ z_{1−α} for all τ ∈ T) → 1 − α, for a suitable statistic T_n(τ) and a set T of quantile indices.
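A minimal sketch of the percentile bootstrap interval for a simple quantile; the sample size, number of replications B and the exponential design are arbitrary choices:

```python
import numpy as np

# Percentile-bootstrap interval IC_{1-alpha} = [q*_{alpha/2}, q*_{1-alpha/2}]
# for a tau-quantile.
rng = np.random.default_rng(4)
tau, alpha, B = 0.5, 0.05, 999

y = rng.exponential(size=2000)          # true median = ln 2 ≈ 0.693
boot = np.array([np.quantile(rng.choice(y, size=y.size), tau)
                 for _ in range(B)])    # resample with replacement each time
lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
print(round(lo, 3), round(hi, 3))
```

The resulting interval should cover the point estimate and, in repeated samples, the true median with probability close to 1 − α.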
Additional properties
Quantile crossing
- The estimated curves τ ↦ x'β̂_τ need not be increasing in τ, because of:
- misspecification, i.e. q_τ(Y|X) ≠ X'β_τ;
- finite sample errors.
- Rearrangement solution: compute
F̂(y|x) = ∫_0^1 1{x'β̂_u ≤ y} du,  (13)
which is a proper cdf in y, and invert it to obtain monotone quantile curves.
[Figure: an original quantile curve Q and its rearranged, monotone version Q*, plotted against u ∈ [0, 1].]
[Figure: annual earnings quantile curves for veterans and nonveterans, plotted against the quantile index. Panel A: original curves; Panel B: rearranged curves.]
Quantile regression in practice
Computation of β̂_τ
- The quantile regression program can be written as the linear program
min_{(β,u,v)} τ 1'u + (1 − τ) 1'v
s.t. Xβ + u − v − Y = 0, u ≥ 0, v ≥ 0.  (14)
- Then one can show that (i) the feasible set S is a convex polyhedron and (ii) if solutions exist, then they are vertices of S. The simplex algorithm exploits this structure. For large problems, interior point methods are faster; they solve barrier problems of the form
min_x c'x − μ Σ_{k=1}^n ln x_k  s.t. Bx = d.  (15)
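The linear program (14) can be solved directly with an off-the-shelf LP solver; a minimal sketch using SciPy's HiGHS interface on simulated data (the design is an arbitrary choice):

```python
import numpy as np
from scipy.optimize import linprog

# Quantile regression as the linear program (14):
#   min_{beta, u, v} tau * 1'u + (1 - tau) * 1'v
#   s.t. X beta + u - v = Y,  u >= 0, v >= 0.
def rq(X, y, tau):
    n, p = X.shape
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])      # X beta + u - v = y
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(0, 2, size=n)
X = np.column_stack([np.ones(n), x])
y = 1 + 2 * x + rng.normal(size=n)   # homoskedastic: all quantile slopes equal 2

beta_med = rq(X, y, 0.5)
print(beta_med)                      # ≈ [1, 2], since the median of N(0,1) is 0
```

At an optimum, u and v hold the positive and negative parts of the residuals, so the objective equals Σ ρ_τ(Y_i − X_i'β).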
Software programs
- In R, the quantreg package:
library(quantreg)
rq(y ~ x1 + x2, tau = (single quantile or vector of quantiles), data = (dataset), method = ("br" or "fn"))
where "br" is the Barrodale-Roberts simplex algorithm and "fn" the Frisch-Newton interior point method.
- In SAS: proc quantreg.
- In Stata: qreg, or sqreg for several quantiles simultaneously.
An example
SAS code:
ods graphics on;
proc quantreg data=birth_weights ci=sparsity/iid alg=interior(tolerance=1e-4);
model birth_weight = boy married black age age2 high_school some_college college prenatal_second prenatal_third no_prenatal smoker nb_cigarettes / quantile=0.05 to 0.95 by 0.05 plot=quantplot;
run;
ods graphics off;
Stata code:
sqreg birth_weight boy married black age age2 high_school some_college college prenatal_second prenatal_third no_prenatal smoker nb_cigarettes, quantiles(0.05 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95)
An example
Quantile and Objective Function
Quantile: 0.1    Objective Function: 31108564.261    Predicted Value at Mean: 2727.4037

Parameter Estimates
Parameter         DF  Estimate   Std. Error  95% Confidence Limits   Pr > |t|
Intercept          1  2150.419   41.9615     2068.176    2232.662    <.0001
boy                1  83.8925    3.8034      76.4380     91.3471     <.0001
married            1  64.9045    4.9650      55.1734     74.6357     <.0001
black              1  -251.465   5.4947      -262.234    -240.696    <.0001
age                1  38.3584    3.0443      32.3916     44.3251     <.0001
age2               1  -0.6657    0.0523      -0.7682     -0.5631     <.0001
high_school        1  6.5725     5.7090      -4.6170     17.7620     0.2496
some_college       1  36.6800    6.4022      24.1319     49.2281     <.0001
college            1  76.1075    6.7700      62.8384     89.3765     <.0001
prenatal_second    1  -4.1840    5.9940      -15.9321    7.5641      0.4852
prenatal_third     1  22.2022    12.2669     -1.8405     46.2449     0.0703
no_prenatal        1  -472.532   19.1648     -510.095    -434.970    <.0001
smoker             1  -156.928   10.6564     -177.815    -136.042    <.0001
nb_cigarettes      1  -5.8266    0.8140      -7.4221     -4.2311     <.0001
Quantile IV
Motivation
- Standard quantile regression requires the regressors to be exogenous; when some are endogenous, instruments Z can restore identification.
- The model is Y = X'β_τ + U with q_τ(U|Z) = 0. Then
P(Y ≤ X'β_τ|Z) = P(X'β_τ + U ≤ X'β_τ|Z) = P(U ≤ 0|Z) = τ.  (16)
"
#0
"
#
n
n
1X
1X
0
0
b
= arg min
g (Zi ) 1{Yi Xi }
g (Zi ) 1{Yi Xi } .
Wn
n i=1
n i=1
- Then, if W_n →^P W, we can show that β̂_τ is consistent and asymptotically normal, with:
√n(β̂_τ − β_0) →^L N(0, J^{−1} H J^{−1}).
- Inference can be based on the statistic
T_n(β_0) = ((1/√n) Σ_{i=1}^n g(Z_i)(τ − B_i(β_0)))' W_n ((1/√n) Σ_{i=1}^n g(Z_i)(τ − B_i(β_0))),
where B_i(β) = 1{Y_i ≤ X_i'β}.
- Chernozhukov and Hansen's "inverse quantile regression": write X = (X_0', X_1')' with X_0 endogenous, Z_0 the instruments, and let α_0 denote the true coefficient on X_0. Then q_τ(Y − X_0'α_0 | X_1, Z_0) = X_1'β_0, so the instruments receive a zero coefficient. In other words,
(β_0, 0) = arg min_{(β,γ)} E[ρ_τ(Y − X_0'α_0 − X_1'β − Z_0'γ)].  (17)
- In practice, run this quantile regression for each α_j on a grid α_1, ..., α_J and take
α̂ = arg min_{j=1,...,J} W_n(α_j),
where W_n(α) measures the magnitude of the estimated coefficient on Z_0; then set β̂ = β̂(α̂).
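A minimal sketch of this grid-search idea with one endogenous binary regressor and a binary instrument; the data-generating process, the grid and the LP-based quantile regression routine are illustrative assumptions, not the authors' code:

```python
import numpy as np
from scipy.optimize import linprog

# Quantile regression via the linear program (14), used as a building block.
def rq(X, y, tau):
    n, p = X.shape
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    return linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs").x[:p]

rng = np.random.default_rng(7)
n = 800
z = rng.binomial(1, 0.5, n).astype(float)   # binary instrument
v = rng.normal(size=n)
d = (1.5 * z + v > 0).astype(float)         # endogenous regressor: depends on v
eps = v + 0.5 * rng.normal(size=n)          # error correlated with d through v
y = 2.0 * d + eps                           # true effect alpha_0 = 2 at the median

tau = 0.5
X = np.column_stack([np.ones(n), z])        # constant + instrument
grid = np.arange(0.0, 4.01, 0.2)
# For each candidate alpha, quantile-regress y - alpha*d on [1, z] and record
# the coefficient on the instrument; keep the alpha making it closest to zero.
gam = [abs(rq(X, y - a * d, tau)[1]) for a in grid]
a_hat = grid[int(np.argmin(gam))]
print(a_hat)
```

With a strong instrument, the selected α̂ should be close to the true value 2.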
[Table: application — IV estimates of the effect on earnings across quantiles, together with the first-stage probabilities P(Z_0 = 0|X_1) and P(Z_0 = 1|X_1).]
Quantile restrictions in nonlinear models
Introduction
Proposition
Let g be an increasing, left-continuous function. Then
g(q_τ(Y)) = q_τ(g(Y)).
Proof: recall that q_τ(g(Y)) = inf{x ∈ R : F_{g(Y)}(x) ≥ τ}. We have
P(Y ≤ q_τ(Y)) ≤ P(g(Y) ≤ g(q_τ(Y))).
Thus, g(q_τ(Y)) ≥ q_τ(g(Y)). Conversely, let u = q_τ(g(Y)) and g^−(v) = sup{x : g(x) ≤ v}. Then
P(g(Y) ≤ u) ≤ P(Y ≤ g^−(u)).
As a result, g^−(u) ≥ q_τ(Y). Because g is left-continuous, g(g^−(u)) ≤ u. Thus, q_τ(g(Y)) = u ≥ g(q_τ(Y)), which ends the proof.
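A quick numerical check of this equivariance property with g = log (using NumPy's "inverted_cdf" method, which returns an order statistic, so the identity holds exactly in the sample):

```python
import numpy as np

# Equivariance of quantiles under increasing transforms: q_tau(g(Y)) = g(q_tau(Y)).
rng = np.random.default_rng(6)
y = rng.lognormal(size=10_000)

taus = (0.1, 0.5, 0.9)
lhs = [np.quantile(np.log(y), t, method="inverted_cdf") for t in taus]  # q_tau(log Y)
rhs = [np.log(np.quantile(y, t, method="inverted_cdf")) for t in taus]  # log q_tau(Y)
print(lhs, rhs)
```

Note that this exactness fails for the mean: E[log Y] ≠ log E[Y] in general, which is one reason quantile restrictions are attractive in nonlinear models.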
- Consider a binary choice model Y = 1{X'θ_0 + ε ≥ 0}. The data only reveal
P(Y = 1|X = x) = ∫ 1{u ≥ −x'θ_0} dF_{ε|X=x}(u),
and a mean restriction imposes only that ∫ u dF_{ε|X=x}(u) = 0. This is not enough for identification: consider θ ≠ θ_0. For all x, it is possible (exercise...) to build a distribution function G_x ≠ F_{ε|X=x} such that:
∫ 1{u ≥ −x'θ} dG_x(u) = P(Y = 1|X = x),
∫ u dG_x(u) = 0.
- A quantile restriction q_τ(ε|X) = 0, in contrast, does have identifying power.
- Under the quantile restriction, only the direction Vect(θ_0) is identified in the binary choice model; it can be estimated by the maximum score estimator, solving
max_θ W_n(θ)
for the corresponding sample score criterion W_n.
- In the censored regression model Y = max(0, X'θ_0 + ε), Powell's censored quantile regression estimator solves
min_θ Σ_{i=1}^n ρ_τ(Y_i − max(0, X_i'θ)).
- This objective is not convex, but it can be minimized iteratively: given the current estimate θ̂, update
θ̂ = arg min_θ (1/n) [Σ_{i: X_i'θ̂ ≥ 0} ρ_τ(Y_i − X_i'θ) + Σ_{i: X_i'θ̂ < 0} ρ_τ(Y_i)],
each step being a standard linear quantile regression on the subsample {i : X_i'θ̂ ≥ 0}.
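A minimal sketch of this iterative scheme on simulated censored data; the design, the starting value and the LP-based quantile regression routine are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linprog

# Quantile regression via the linear program (14), used as a building block.
def rq(X, y, tau):
    n, p = X.shape
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    return linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs").x[:p]

rng = np.random.default_rng(8)
n = 800
x = rng.uniform(-2, 2, size=n)
X = np.column_stack([np.ones(n), x])
y = np.maximum(0.0, 0.5 + 1.0 * x + rng.normal(size=n))   # left-censoring at 0

tau = 0.5
theta = rq(X, y, tau)                  # starting value: ignore the censoring
for _ in range(20):
    keep = X @ theta >= 0              # observations with positive fitted index
    new = rq(X[keep], y[keep], tau)    # standard QR on that subsample
    if np.allclose(new, theta):
        break                          # the subsample has stabilized
    theta = new
print(theta)
```

On the region where the index is positive, the conditional median of Y is X'θ_0 itself, which is why iterating subsample quantile regressions can approach the true parameter (0.5, 1) here.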