Inference@LS-Kneip 12
1.1 Why does the bootstrap work?

Consider an i.i.d. sample of Bernoulli variables $Y_i \in \{0,1\}$ with $\hat p$ the sample proportion, and bootstrap observations $Y_i^*$ drawn independently with replacement from $\mathcal{Y}_n := \{Y_1, \ldots, Y_n\}$. Then
$$P^*(Y_i^* = 1) = P(Y_i^* = 1 \mid \mathcal{Y}_n) = \hat p, \qquad P^*(Y_i^* = 0) = P(Y_i^* = 0 \mid \mathcal{Y}_n) = 1 - \hat p,$$
and
$$E^*(\hat p^*) = E(\hat p^* \mid \mathcal{Y}_n) = \hat p, \qquad \mathrm{Var}^*(\hat p^*) = E[(\hat p^* - \hat p)^2 \mid \mathcal{Y}_n] = \frac{\hat p(1 - \hat p)}{n}.$$

The conditional distribution of $n\hat p^* = S^*$ given $\mathcal{Y}_n$ is equal to $B(n, \hat p)$. In a slight abuse of notation we will write
$$\mathrm{distr}(n\hat p^* \mid \mathcal{Y}_n) = B(n, \hat p).$$
As $n \to \infty$ the central limit theorem implies that the (conditional) distribution of
$$\Big(\frac{\sqrt{n}(\hat p^* - \hat p)}{\sqrt{\hat p(1 - \hat p)}} \,\Big|\, \mathcal{Y}_n\Big)$$
converges (stochastically) to a $N(0,1)$-distribution. Moreover, $\hat p$ is a consistent estimator of $p$ and therefore $\hat p(1 - \hat p) \to_P p(1 - p)$ as $n \to \infty$. This implies that asymptotically $\hat p(1 - \hat p)$ may be replaced by $p(1 - p)$, and

the law of $\Big(\frac{\sqrt{n}(\hat p^* - \hat p)}{\sqrt{p(1 - p)}} \,\Big|\, \mathcal{Y}_n\Big)$ converges stochastically to a $N(0,1)$-distribution.

More precisely, as $n \to \infty$
$$\sup_x \Big| P\Big(\frac{\sqrt{n}(\hat p^* - \hat p)}{\sqrt{\hat p(1 - \hat p)}} \le x \,\Big|\, \mathcal{Y}_n\Big) - \Phi(x)\Big| \to_P 0,$$
where $\Phi$ denotes the distribution function of the standard normal distribution.

We can conclude that for large $n$
$$\mathrm{distr}(\sqrt{n}(\hat p^* - \hat p) \mid \mathcal{Y}_n) \approx \mathrm{distr}(\sqrt{n}(\hat p - p)) \approx N(0, p(1 - p))$$
as well as
$$\mathrm{distr}\Big(\frac{\sqrt{n}(\hat p^* - \hat p)}{\sqrt{\hat p(1 - \hat p)}} \,\Big|\, \mathcal{Y}_n\Big) \approx \mathrm{distr}\Big(\frac{\sqrt{n}(\hat p - p)}{\sqrt{p(1 - p)}}\Big) \approx N(0, 1)$$
$\Rightarrow$ Bootstrap consistent
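The identity $\mathrm{distr}(n\hat p^* \mid \mathcal{Y}_n) = B(n, \hat p)$ is easy to check by simulation. A minimal Python sketch (the sample size, $p = 0.3$, seed, and number of replications are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Resampling a 0/1 sample with replacement and summing the draws gives
# S* = n * phat*; given the sample, S* is exactly B(n, phat).
y = rng.binomial(1, 0.3, size=50)        # observed 0/1 sample
n, phat = len(y), y.mean()

m = 100_000                              # bootstrap replications
s_star = rng.choice(y, size=(m, n), replace=True).sum(axis=1)

# Empirical moments of S* match E(S*) = n*phat, Var(S*) = n*phat*(1 - phat)
```

The binomial form follows because each $Y_i^*$ is an independent Bernoulli($\hat p$) draw given $\mathcal{Y}_n$.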
Example 2: Estimating a population mean

Let $Y_1, \ldots, Y_n$ denote an i.i.d. random sample with mean $\mu$ and variance $\sigma^2$. In the following $F$ will denote the corresponding distribution function.

$\bar Y = \frac{1}{n}\sum_{i=1}^n Y_i$ is an unbiased estimator of $\mu$.

Problem: Construct a confidence interval for $\mu$.

For normally distributed $Y_i$ we have $\sqrt{n}\,\frac{\bar Y - \mu}{S} \sim t_{n-1}$, and hence
$$P\Big(\bar Y - t_{n-1,1-\frac{\alpha}{2}}\frac{S}{\sqrt{n}} \le \mu \le \bar Y + t_{n-1,1-\frac{\alpha}{2}}\frac{S}{\sqrt{n}}\Big) = 1 - \alpha.$$
The bootstrap approach:

Random samples $Y_1^*, \ldots, Y_n^*$ are generated by drawing observations independently and with replacement from the available sample $\mathcal{Y}_n := \{Y_1, \ldots, Y_n\}$.

$Y_1^*, \ldots, Y_n^*$ $\Rightarrow$ estimator $\bar Y^* = \frac{1}{n}\sum_{i=1}^n Y_i^*$

Means and variances of the conditional distributions of $Y_i^*$ and $\bar Y^*$ given $\mathcal{Y}_n$:
$$E^*(Y_i^*) = E(Y_i^* \mid \mathcal{Y}_n) = \bar Y,$$
$$\mathrm{Var}^*(Y_i^*) = E[(Y_i^* - \bar Y)^2 \mid \mathcal{Y}_n] = S^2 := \frac{1}{n}\sum_{i=1}^n (Y_i - \bar Y)^2.$$

Moreover,
$$E^*(\bar Y^*) = \bar Y, \qquad \mathrm{Var}^*(\bar Y^*) = S^2/n.$$

The law of $\Big(\frac{\sqrt{n}(\bar Y^* - \bar Y)}{S} \,\Big|\, \mathcal{Y}_n\Big)$ converges stochastically to a $N(0,1)$-distribution.

More precisely, as $n \to \infty$
$$\sup_x \Big| P\Big(\frac{\sqrt{n}(\bar Y^* - \bar Y)}{S} \le x \,\Big|\, \mathcal{Y}_n\Big) - \Phi(x)\Big| \to_P 0,$$
where $\Phi$ denotes the distribution function of the standard normal distribution.
We can conclude that for large $n$
$$\mathrm{distr}(\sqrt{n}(\bar Y^* - \bar Y) \mid \mathcal{Y}_n) \approx \mathrm{distr}(\sqrt{n}(\bar Y - \mu)) \approx N(0, \sigma^2)$$
as well as
$$\mathrm{distr}\Big(\frac{\sqrt{n}(\bar Y^* - \bar Y)}{S} \,\Big|\, \mathcal{Y}_n\Big) \approx \mathrm{distr}\Big(\frac{\sqrt{n}(\bar Y - \mu)}{\sigma}\Big) \approx N(0, 1)$$
$\Rightarrow$ Bootstrap consistent

$$P^*(\bar Y^* \le t_{\frac{\alpha}{2}}) \approx \frac{\alpha}{2}, \qquad P^*(\bar Y^* > t_{1-\frac{\alpha}{2}}) \approx \frac{\alpha}{2}.$$
Here, $P^*$ denotes probabilities with respect to the conditional distribution of $\bar Y^*$ given $\mathcal{Y}_n := \{Y_1, \ldots, Y_n\}$.

In practice:
- Draw $m$ bootstrap samples (e.g. $m = 2000$) and calculate the corresponding estimates $\bar Y_1^*, \bar Y_2^*, \ldots, \bar Y_m^*$.
- Order the resulting estimates $\bar Y_{(1)}^* \le \bar Y_{(2)}^* \le \cdots \le \bar Y_{(m)}^*$.
- Set $t_{\frac{\alpha}{2}} := \bar Y^*_{([m+1]\frac{\alpha}{2})}$ and $t_{1-\frac{\alpha}{2}} := \bar Y^*_{([m+1][1-\frac{\alpha}{2}])}$.
A basic bootstrap confidence interval:

By construction of $t_{\frac{\alpha}{2}}$ and $t_{1-\frac{\alpha}{2}}$ we have
$$P^*(\bar Y^* - \bar Y \le t_{\frac{\alpha}{2}} - \bar Y) \approx \frac{\alpha}{2}, \qquad P^*(\bar Y^* - \bar Y \le t_{1-\frac{\alpha}{2}} - \bar Y) \approx 1 - \frac{\alpha}{2}.$$

We have seen that the bootstrap is consistent, and therefore $\mathrm{distr}(\bar Y^* - \bar Y \mid \mathcal{Y}_n) \approx \mathrm{distr}(\bar Y - \mu)$ asymptotically. This implies that for large $n$
$$P(\bar Y - \mu \le t_{\frac{\alpha}{2}} - \bar Y) \approx \frac{\alpha}{2}, \qquad P(\bar Y - \mu \le t_{1-\frac{\alpha}{2}} - \bar Y) \approx 1 - \frac{\alpha}{2},$$
and therefore
$$P\big(\bar Y - (t_{1-\frac{\alpha}{2}} - \bar Y) \le \mu \le \bar Y - (t_{\frac{\alpha}{2}} - \bar Y)\big) \approx 1 - \alpha$$
$\Rightarrow$ confidence interval $[2\bar Y - t_{1-\frac{\alpha}{2}},\; 2\bar Y - t_{\frac{\alpha}{2}}]$
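The construction above (draw $m$ bootstrap samples, order the means, invert) can be sketched in a few lines of Python; the function name, data, and seed are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def basic_bootstrap_ci(y, alpha=0.05, m=2000, rng=rng):
    """Basic bootstrap CI for the mean: [2*ybar - t_{1-a/2}, 2*ybar - t_{a/2}],
    with the t-quantiles taken from the m ordered bootstrap means."""
    y = np.asarray(y)
    n = len(y)
    ybar = y.mean()
    # draw m bootstrap samples with replacement and compute their means
    boot_means = rng.choice(y, size=(m, n), replace=True).mean(axis=1)
    t_lo, t_hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return 2 * ybar - t_hi, 2 * ybar - t_lo

y = rng.normal(loc=5.0, scale=2.0, size=100)   # illustrative data
lo, hi = basic_bootstrap_ci(y)
```

By construction the interval always contains $\bar Y$, since the bootstrap means are centered at $\bar Y$.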
General approach: Basic bootstrap $1-\alpha$ confidence interval

Random sample $\mathcal{Y}_n := \{Y_1, \ldots, Y_n\}$; unknown parameter (vector) $\theta$ $\Rightarrow$ estimator $\hat\theta$ and bootstrap estimator $\hat\theta^*$.

We will assume that the bootstrap is consistent: $\mathrm{distr}(\hat\theta^* - \hat\theta \mid \mathcal{Y}_n) \approx \mathrm{distr}(\hat\theta - \theta)$ if $n$ is sufficiently large.

Determine the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $t_{\frac{\alpha}{2}}$ and $t_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n := \{Y_1, \ldots, Y_n\}$ (the bootstrap distribution):
$$P^*(\hat\theta^* \le t_{\frac{\alpha}{2}}) \approx \frac{\alpha}{2}, \qquad P^*(\hat\theta^* > t_{\frac{\alpha}{2}}) \approx 1 - \frac{\alpha}{2},$$
$$P^*(\hat\theta^* \le t_{1-\frac{\alpha}{2}}) \approx 1 - \frac{\alpha}{2}, \qquad P^*(\hat\theta^* > t_{1-\frac{\alpha}{2}}) \approx \frac{\alpha}{2}.$$
Here, $P^*$ denotes probabilities with respect to the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n := \{Y_1, \ldots, Y_n\}$.

Consistency of the bootstrap implies that for large $n$
$$P(\hat\theta - \theta \le t_{\frac{\alpha}{2}} - \hat\theta) \approx \frac{\alpha}{2}, \qquad P(\hat\theta - \theta \le t_{1-\frac{\alpha}{2}} - \hat\theta) \approx 1 - \frac{\alpha}{2},$$
and therefore
$$P\big(\hat\theta - (t_{1-\frac{\alpha}{2}} - \hat\theta) \le \theta \le \hat\theta - (t_{\frac{\alpha}{2}} - \hat\theta)\big) \approx 1 - \alpha$$
$\Rightarrow$ confidence interval $[2\hat\theta - t_{1-\frac{\alpha}{2}},\; 2\hat\theta - t_{\frac{\alpha}{2}}]$
Example: Bootstrap confidence interval for a median

Given: i.i.d. sample $\mathcal{Y}_n := \{Y_1, \ldots, Y_n\}$; $Y_i$ possesses a continuous distribution with (unknown) density $f$.

We are now interested in estimating the median $\mu_{med}$ of the underlying distribution. Recall that the median is defined by
$$P(Y_i \le \mu_{med}) = P(Y_i \ge \mu_{med}) = 0.5$$
Bootstrap:

i.i.d. re-sample $Y_1^*, \ldots, Y_n^*$ from $\mathcal{Y}_n$ $\Rightarrow$ estimators $\bar Y^* = \frac{1}{n}\sum_{i=1}^n Y_i^*$ and $S^{*2} = \frac{1}{n-1}\sum_{i=1}^n (Y_i^* - \bar Y^*)^2$

$n$ large $\Rightarrow$ approximately
$$\mathrm{distr}\Big(\frac{\sqrt{n}(\bar Y^* - \bar Y)}{S^*} \,\Big|\, \mathcal{Y}_n\Big) \approx \mathrm{distr}\Big(\frac{\sqrt{n}(\bar Y - \mu)}{S}\Big) \approx N(0, 1)$$
or
$$\mathrm{distr}\Big(\frac{\bar Y^* - \bar Y}{S^*} \,\Big|\, \mathcal{Y}_n\Big) \approx \mathrm{distr}\Big(\frac{\bar Y - \mu}{S}\Big)$$

Therefore, the (conditional) distribution of $\frac{\bar Y^* - \bar Y}{S^*}$ (given $\mathcal{Y}_n$) can be used to approximate the distribution of $\frac{\bar Y - \mu}{S}$:
$$P^*\Big(\frac{\bar Y^* - \bar Y}{S^*} \le \xi_{1-\frac{\alpha}{2}}\Big) \approx 1 - \frac{\alpha}{2}, \qquad P^*\Big(\frac{\bar Y^* - \bar Y}{S^*} > \xi_{1-\frac{\alpha}{2}}\Big) \approx \frac{\alpha}{2}.$$
In practice:

- Draw $m$ bootstrap samples (e.g. $m = 2000$) and calculate the corresponding estimates $Z_1^* := \frac{\bar Y_1^* - \bar Y}{S_1^*},\; Z_2^* := \frac{\bar Y_2^* - \bar Y}{S_2^*},\; \ldots,\; Z_m^* := \frac{\bar Y_m^* - \bar Y}{S_m^*}$.
- Order the resulting estimates $Z_{(1)}^* \le Z_{(2)}^* \le \cdots \le Z_{(m)}^*$.
- Set $\xi_{\frac{\alpha}{2}} := Z^*_{([m+1]\frac{\alpha}{2})}$ and $\xi_{1-\frac{\alpha}{2}} := Z^*_{([m+1][1-\frac{\alpha}{2}])}$.

$$P^*\Big(\frac{\bar Y^* - \bar Y}{S^*} \le \xi_{1-\frac{\alpha}{2}}\Big) \approx 1 - \frac{\alpha}{2}, \qquad P^*\Big(\frac{\bar Y^* - \bar Y}{S^*} > \xi_{1-\frac{\alpha}{2}}\Big) \approx \frac{\alpha}{2}.$$

This yields the $1-\alpha$ confidence interval
$$[\bar Y - \xi_{1-\frac{\alpha}{2}} S,\; \bar Y - \xi_{\frac{\alpha}{2}} S]$$
General construction of a bootstrap-t interval (unknown real valued parameter $\theta \in \mathbb{R}$):

Random sample $\mathcal{Y}_n := \{Y_1, \ldots, Y_n\}$; unknown parameter (vector) $\theta$. Assume that the estimator $\hat\theta$ of $\theta$ is asymptotically normal,
$$\sqrt{n}(\hat\theta - \theta) \to_L N(0, v) \qquad \Big(\Leftrightarrow \; \frac{\sqrt{n}(\hat\theta - \theta)}{\sqrt{v}} \to_L N(0, 1)\Big)$$
and that a consistent estimator $\hat v \equiv \hat v(Y_1, \ldots, Y_n)$ of $v$ is available. One might then replace $v$ by $\hat v$ to obtain
$$\frac{\sqrt{n}(\hat\theta - \theta)}{\sqrt{\hat v}} \to_L N(0, 1)$$
Obviously, $\frac{\sqrt{n}(\hat\theta - \theta)}{\sqrt{v}}$ and $\frac{\sqrt{n}(\hat\theta - \theta)}{\sqrt{\hat v}}$ are asymptotic pivot statistics.

- Based on an i.i.d. re-sample $Y_1^*, \ldots, Y_n^*$ from $\{Y_1, \ldots, Y_n\}$, calculate bootstrap estimates $\hat\theta^*$ and $\hat v^*$.
- Determine the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $\xi_{\frac{\alpha}{2}}$ and $\xi_{1-\frac{\alpha}{2}}$ of the conditional distribution of $\frac{\sqrt{n}(\hat\theta^* - \hat\theta)}{\sqrt{\hat v^*}}$ given $\mathcal{Y}_n$.

Bootstrap-t interval:
$$\Big[\hat\theta - \xi_{1-\frac{\alpha}{2}}\sqrt{\hat v/n},\; \hat\theta - \xi_{\frac{\alpha}{2}}\sqrt{\hat v/n}\Big]$$
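A minimal sketch of this construction for the mean ($\theta = \mu$, $\hat v = S^2$); function name, data, and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_t_ci(y, alpha=0.05, m=2000, rng=rng):
    """Bootstrap-t CI for the mean: quantiles of the studentized statistic
    sqrt(n)*(ybar* - ybar)/S* are used in place of normal/t quantiles."""
    y = np.asarray(y)
    n = len(y)
    ybar, s = y.mean(), y.std(ddof=1)
    z = np.empty(m)
    for b in range(m):
        yb = rng.choice(y, size=n, replace=True)
        z[b] = np.sqrt(n) * (yb.mean() - ybar) / yb.std(ddof=1)
    xi_lo, xi_hi = np.quantile(z, [alpha / 2, 1 - alpha / 2])
    # invert the pivot: mu in [ybar - xi_hi*s/sqrt(n), ybar - xi_lo*s/sqrt(n)]
    return ybar - xi_hi * s / np.sqrt(n), ybar - xi_lo * s / np.sqrt(n)

y = rng.exponential(scale=2.0, size=80)   # skewed illustrative data
lo, hi = bootstrap_t_ci(y)
```

Because each resample is studentized by its own $S^*$, the interval adapts to skewness in the sampling distribution.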
1.3 The Parametric Bootstrap

$$[2\hat\theta - t_{1-\frac{\alpha}{2}},\; 2\hat\theta - t_{\frac{\alpha}{2}}],$$
where $t_{\frac{\alpha}{2}}$ and $t_{1-\frac{\alpha}{2}}$ now denote the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles of the conditional distribution of $\hat\theta^*$ given $F(\cdot, \hat\theta)$.

Asymptotically we obtain
$$P\Big(\xi_{\frac{\alpha}{2}} \le \frac{\hat\theta - \theta}{\sqrt{v(\hat\theta, n)}} \le \xi_{1-\frac{\alpha}{2}}\Big) \approx 1 - \alpha,$$
where $\xi_{\frac{\alpha}{2}}$ and $\xi_{1-\frac{\alpha}{2}}$ are the corresponding quantiles of the conditional distribution of $\frac{\hat\theta^* - \hat\theta}{\sqrt{v(\hat\theta^*, n)}}$ given $F(\cdot, \hat\theta)$.
Example: Exponential distribution

Assume that $Y$ follows an exponential distribution with parameter $\theta$. Density and distribution function are then given by
$$f(y, \theta) = \frac{1}{\theta} e^{-y/\theta}, \qquad F(y, \theta) = 1 - e^{-y/\theta}$$
We have $E(Y_i) = \theta$ and $\mathrm{Var}(Y_i) = \theta^2$. The maximum likelihood estimator of $\theta$ is given by $\hat\theta = \frac{1}{n}\sum_{i=1}^n Y_i$, and $\mathrm{Var}(\hat\theta) = \frac{\theta^2}{n}$.
The parametric bootstrap can then be used to construct confidence intervals. The following procedure is straightforward, but there also exist alternative approaches.

- An i.i.d. re-sample $Y_1^*, \ldots, Y_n^*$ is generated by randomly drawing observations from an exponential distribution with parameter $\hat\theta$.
- $Y_1^*, \ldots, Y_n^*$ $\Rightarrow$ estimator $\hat\theta^* = \frac{1}{n}\sum_{i=1}^n Y_i^*$
- Calculation of $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $\xi_{\frac{\alpha}{2}}$ and $\xi_{1-\frac{\alpha}{2}}$ with
$$P^*\Big(\frac{\hat\theta^*}{\hat\theta} \le \xi_{\frac{\alpha}{2}}\Big) = \frac{\alpha}{2}, \qquad P^*\Big(\frac{\hat\theta^*}{\hat\theta} \le \xi_{1-\frac{\alpha}{2}}\Big) = 1 - \frac{\alpha}{2},$$
where $P^*(\cdot)$ denotes probabilities calculated with respect to the exponential distribution with parameter $\hat\theta$.

This yields
$$P\Big(\xi_{\frac{\alpha}{2}} \le \frac{\hat\theta}{\theta} \le \xi_{1-\frac{\alpha}{2}}\Big) = 1 - \alpha$$
$\Rightarrow$ Confidence interval: $\Big[\frac{\hat\theta}{\xi_{1-\frac{\alpha}{2}}},\; \frac{\hat\theta}{\xi_{\frac{\alpha}{2}}}\Big]$
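The procedure can be sketched directly, exploiting that resampling from $\mathrm{Exp}(\hat\theta)$ only requires simulating exponential means; names, data, and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def param_boot_ci(y, alpha=0.05, m=2000, rng=rng):
    """Parametric bootstrap CI for the exponential parameter theta,
    using quantiles of the scale-free ratio theta*/theta_hat."""
    y = np.asarray(y)
    n = len(y)
    theta_hat = y.mean()                  # MLE of theta
    # bootstrap ratios theta*/theta_hat under Exp(theta_hat)
    ratios = rng.exponential(scale=theta_hat, size=(m, n)).mean(axis=1) / theta_hat
    xi_lo, xi_hi = np.quantile(ratios, [alpha / 2, 1 - alpha / 2])
    # invert P(xi_lo <= theta_hat/theta <= xi_hi) ~ 1 - alpha
    return theta_hat / xi_hi, theta_hat / xi_lo

y = rng.exponential(scale=3.0, size=60)   # illustrative data
lo, hi = param_boot_ci(y)
```

Since $\hat\theta/\theta$ is an exact pivot for the exponential family, the ratio $\hat\theta^*/\hat\theta$ has exactly the same distribution, up to Monte Carlo error.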
Basic bootstrap interval:
$$[2\hat\theta - t_{1-\frac{\alpha}{2}},\; 2\hat\theta - t_{\frac{\alpha}{2}}],$$
where $t_{\frac{\alpha}{2}}$ and $t_{1-\frac{\alpha}{2}}$ are the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles of the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n$.

Bootstrap-t interval:
$$\Big[\hat\theta - \xi_{1-\frac{\alpha}{2}}\sqrt{\hat v/n},\; \hat\theta - \xi_{\frac{\alpha}{2}}\sqrt{\hat v/n}\Big],$$
where $\xi_{\frac{\alpha}{2}}$ and $\xi_{1-\frac{\alpha}{2}}$ are the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles of the conditional distribution of $\frac{\sqrt{n}(\hat\theta^* - \hat\theta)}{\sqrt{\hat v^*}}$ given $\mathcal{Y}_n$.

Percentile interval:
$$[t_{\frac{\alpha}{2}},\; t_{1-\frac{\alpha}{2}}]$$
1.5 Subsampling: Inference for a sample maximum

Data: i.i.d. random sample $\mathcal{Y}_n := \{Y_1, \ldots, Y_n\}$.

We now consider the situation that the $Y_i$ only take values in a compact interval $[0, \theta]$ such that
$$P(Y_i \in [0, \theta]) = 1.$$
Furthermore, $Y_i$ possesses a density $f$ which is continuous on $[0, \theta]$ and satisfies $f(y) > 0$ for $y \in (0, \theta]$, and $f(y) = 0$ for $y \notin [0, \theta]$. The maximum $\theta$ of $Y_i$ is unknown and has to be estimated from the data.

Similar types of extreme value problems frequently arise in econometrics. An example is the analysis of production efficiencies of different firms. The above situation may arise if we consider production outputs $Y_i$ of a sample of firms with identical inputs. A firm then is efficient if its output equals the maximal possible value $\theta$. Note that in practice usually more complicated problems have to be considered, where production outputs depend on individually different values of input variables $\Rightarrow$ Frontier Analysis.
Consistent estimator of $\theta$:
$$\hat\theta := \max_{i=1,\ldots,n} Y_i$$

The ordinary bootstrap fails here:
$$P^*(\hat\theta^* - \hat\theta = 0) = P(\hat\theta^* = \hat\theta \mid \mathcal{Y}_n) \to 1 - e^{-1},$$
while $P(\hat\theta - \theta = 0) = 0$!

One can conclude that even for large sample sizes $\mathrm{distr}(\hat\theta^* - \hat\theta \mid \mathcal{Y}_n)$ will be very different from $\mathrm{distr}(\hat\theta - \theta)$ $\Rightarrow$ basic bootstrap confidence intervals are incorrect.

A possible remedy is to use subsampling. Similar to the ordinary bootstrap, subsampling relies on i.i.d. re-sampling from $\mathcal{Y}_n$, and the only difference consists in the fact that subsampling is based on drawing a smaller number $k < n$ of observations.
Subsampling bootstrap:

- Choose some $k < n$.
- Determine an i.i.d. re-sample $Y_1^*, \ldots, Y_k^*$ by randomly drawing $k$ observations from $\{Y_1, \ldots, Y_n\}$ $\Rightarrow$ bootstrap estimator $\hat\theta^* := \max_{i=1,\ldots,k} Y_i^*$

For the above problem subsampling is consistent if $k = n^\gamma$ for some $0 < \gamma < 1$.
Confidence interval based on subsampling:

Calculation of $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $t_{\frac{\alpha}{2}}$ and $t_{1-\frac{\alpha}{2}}$ with
$$P^*\big(k(\hat\theta - \hat\theta^*) \le t_{\frac{\alpha}{2}}\big) = \frac{\alpha}{2}, \qquad P^*\big(k(\hat\theta - \hat\theta^*) \le t_{1-\frac{\alpha}{2}}\big) = 1 - \frac{\alpha}{2},$$
where $P^*(\cdot)$ denotes probabilities calculated with respect to the conditional distribution of $\hat\theta^*$ given $\mathcal{Y}_n$.

This yields
$$P^*\big(t_{\frac{\alpha}{2}} \le k(\hat\theta - \hat\theta^*) \le t_{1-\frac{\alpha}{2}}\big) \approx 1 - \alpha,$$
$$P\big(t_{\frac{\alpha}{2}} \le n(\theta - \hat\theta) \le t_{1-\frac{\alpha}{2}}\big) \approx 1 - \alpha.$$
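A minimal sketch of this subsampling interval (with the statistic $k(\hat\theta - \hat\theta^*)$ as reconstructed above); the choice $k = n^{1/2}$, the uniform test data, and all names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def subsampling_ci_max(y, alpha=0.05, m=2000, gamma=0.5, rng=rng):
    """Subsampling CI for the right endpoint theta: quantiles of
    k*(theta_hat - theta*) approximate the law of n*(theta - theta_hat)."""
    y = np.asarray(y)
    n = len(y)
    k = max(2, int(n ** gamma))           # subsample size k = n**gamma < n
    theta_hat = y.max()
    stats = np.empty(m)
    for b in range(m):
        theta_star = rng.choice(y, size=k, replace=True).max()
        stats[b] = k * (theta_hat - theta_star)
    t_lo, t_hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    # invert P(t_lo <= n*(theta - theta_hat) <= t_hi) ~ 1 - alpha
    return theta_hat + t_lo / n, theta_hat + t_hi / n

y = rng.uniform(0.0, 1.0, size=500)       # true endpoint theta = 1
lo, hi = subsampling_ci_max(y)
```

Note that the interval lies to the right of $\hat\theta$, as it must: $\theta \ge \hat\theta$ always holds for an endpoint.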
1.6 Appendix

Properties of the empirical distribution function $F_n$:
- $0 \le F_n(x) \le 1$
- $F_n(x) = 0$ if $x < X_{(1)}$
- $F_n(x) = 1$ if $x \ge X_{(n)}$
- $F_n$ is a monotonically increasing step function
Example:

x1    x2    x3    x4    x5    x6    x7    x8
5.20  4.80  5.40  4.60  6.10  5.40  5.80  5.50

[Figure: empirical distribution function of the sample; y-axis 0.0 to 1.0, x-axis 4.0 to 6.5]
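With the eight observations above, $F_n$ can be computed directly; a small sketch (the helper name `ecdf` is my choice):

```python
import numpy as np

def ecdf(sample):
    """Return a function x -> F_n(x), the empirical distribution function."""
    xs = np.sort(np.asarray(sample))
    n = len(xs)
    def F_n(x):
        # proportion of observations <= x (right-continuous step function)
        return np.searchsorted(xs, x, side="right") / n
    return F_n

x = [5.20, 4.80, 5.40, 4.60, 6.10, 5.40, 5.80, 5.50]
F = ecdf(x)
```

For instance, five of the eight observations are $\le 5.40$, so $F_n(5.40) = 5/8$.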
Theoretical properties of $F_n$

Theorem: For every $x \in \mathbb{R}$ we obtain
$$n F_n(x) \sim B(n, F(x))$$

Consequences:
- $E(F_n(x)) = F(x)$, i.e. $F_n(x)$ is an unbiased estimator of $F(x)$
- $\mathrm{Var}(F_n(x)) = \frac{1}{n} F(x)(1 - F(x))$ $\Rightarrow$ the standard error of $F_n(x)$ decreases as $n$ increases ($F_n(x)$ is a consistent estimator of $F(x)$)

Theorem of Glivenko-Cantelli:
$$P\Big(\lim_{n\to\infty} \sup_{x \in \mathbb{R}} |F_n(x) - F(x)| = 0\Big) = 1$$
1.6.2 Consistency of estimators

Convergence in probability:
Let $X_1, X_2, \ldots$ and $X$ be random variables defined on a probability space $(\Omega, \mathcal{A}, P)$. $X_n$ converges in probability to $X$ if
$$\lim_{n\to\infty} P[|X_n - X| < \epsilon] = 1$$
for every $\epsilon > 0$. One often uses the notation $X_n \to_P X$.

Weak consistency:
An estimator $\hat\theta$ is called weakly consistent if $\hat\theta_n \to_P \theta$.

Convergence in mean square:
Let $X_1, X_2, \ldots$ and $X$ be random variables defined on a probability space $(\Omega, \mathcal{A}, P)$. $X_n$ converges in mean square to $X$ if
$$\lim_{n\to\infty} E\big(|X_n - X|^2\big) = 0$$
Notation: $X_n \to_{MSE} X$

Mean square consistency:
$\hat\theta$ is mean square consistent if $\hat\theta_n \to_{MSE} \theta$.
Strong convergence (convergence with probability 1):
Let $X_1, X_2, \ldots$ and $X$ be random variables defined on a probability space $(\Omega, \mathcal{A}, P)$. $X_n$ converges with probability 1 (or almost surely) to $X$ if
$$P\Big[\lim_{n\to\infty} X_n = X\Big] = 1$$
Notation: $X_n \to_{a.s.} X$

Strong consistency (consistency with probability 1):
An estimator $\hat\theta$ is strongly consistent if $\hat\theta_n \to_{a.s.} \theta$.

- $X_n \to_{MSE} X$ implies $X_n \to_P X$
- $X_n \to_{a.s.} X$ implies $X_n \to_P X$

Example: for the sample mean,
$$MSE(\bar X) := E((\bar X - \mu)^2) = \mathrm{Var}(\bar X) = \frac{\sigma^2}{n} \to 0 \quad \text{as } n \to \infty$$
$\Rightarrow \bar X \to_P \mu$ as $n \to \infty$.
[Figure: distributions of $\bar X$ for n = 9 and n = 144]
1.6.3 Convergence in distribution

Let $Z_1, Z_2, \ldots$ be a sequence of random variables with distribution functions $F_1, F_2, \ldots$, and let $Z$ be a random variable with distribution function $F$. $Z_n$ converges in distribution to $Z$ if
$$\lim_{n\to\infty} F_n(x) = F(x)$$
for every continuity point $x$ of $F$.

Notation: $Z_n \to_L Z$
Theorem (Berry-Esséen): Let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with mean $E(X_i) = \mu$ and variance $\mathrm{Var}(X_i) = E((X_i - \mu)^2) = \sigma^2 > 0$. Then, if $G_n$ denotes the distribution function of $\frac{\sqrt{n}(\bar X - \mu)}{\sigma}$,
$$\sup_t |G_n(t) - \Phi(t)| \le \frac{33}{4} \cdot \frac{E(|X_i - \mu|^3)}{\sigma^3 n^{1/2}}$$
Inequality of Chebyshev:
$$P[|X - \mu| > k\sigma] \le \frac{1}{k^2} \quad \text{for all } k > 0$$
$$P[\mu - k\sigma \le X \le \mu + k\sigma] \ge 1 - \frac{1}{k^2}$$

k    P[mu - k*sigma <= X <= mu + k*sigma]
2    >= 1 - 1/4 = 0.75
3    >= 1 - 1/9 ~ 0.89
4    >= 1 - 1/16 = 0.9375

Generalization:
$$P[|X - \mu| > k] \le \frac{E(|X - \mu|^r)}{k^r} \quad \text{for all } k > 0, \; r = 1, 2, \ldots$$
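The bound is easy to verify empirically; a small sketch using an exponential distribution, where $\mu = \sigma = 1$ (distribution and sample size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)

# Empirical check of Chebyshev's inequality P(|X - mu| > k*sigma) <= 1/k**2
# for Exp(1), which has mu = sigma = 1.
x = rng.exponential(scale=1.0, size=200_000)
mu, sigma = 1.0, 1.0
for k in (2, 3, 4):
    freq = np.mean(np.abs(x - mu) > k * sigma)
    assert freq <= 1 / k**2   # the bound holds, here with a lot of slack
```

For the exponential the exact tail probability at $k = 2$ is $e^{-3} \approx 0.05$, far below the Chebyshev bound $0.25$: the inequality is universal but typically loose.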
Cauchy-Schwarz inequality:
Let $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ be arbitrary real numbers. Then
$$\Big(\sum_{i=1}^n x_i y_i\Big)^2 \le \Big(\sum_{i=1}^n x_i^2\Big)\Big(\sum_{i=1}^n y_i^2\Big)$$

Integrated version:
$$\Big(\int_a^b f(x) g(x)\,dx\Big)^2 \le \Big(\int_a^b f(x)^2\,dx\Big)\Big(\int_a^b g(x)^2\,dx\Big)$$

Hölder inequality:
Let $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$. Let $x_i, y_i \ge 0$, $i = 1, \ldots, n$ be arbitrary numbers. Then
$$\sum_{i=1}^n x_i y_i \le \Big(\sum_{i=1}^n x_i^p\Big)^{1/p}\Big(\sum_{i=1}^n y_i^q\Big)^{1/q}$$
2 Bootstrap and Regression Models

Problem: Analyze the influence of some explanatory (independent) variables $X_1, X_2, \ldots, X_p$ on a response variable (or dependent variable) $Y$.

Observations:
$$(Y_1, X_{11}, \ldots, X_{1p}), (Y_2, X_{21}, \ldots, X_{2p}), \ldots, (Y_n, X_{n1}, \ldots, X_{np})$$

Model:
$$Y_i = \beta_0 + \sum_{j=1}^p \beta_j X_{ij} + \epsilon_i \qquad [\,\epsilon_i \sim N(0, \sigma^2)\,]$$
Remark: Regression analysis is usually a conditional analysis. The goal is to estimate the regression function $m$ which is the conditional expectation of $Y$ given $X_1, \ldots, X_p$. Standard inference studies the behavior of estimators conditional on the observed values.

However, different types of bootstrap may be used depending on how the data is generated.

1) Random design: $(Y_1, X_{11}, \ldots, X_{1p}), (Y_2, X_{21}, \ldots, X_{2p}), \ldots, (Y_n, X_{n1}, \ldots, X_{np})$ is a sample of i.i.d. random vectors, i.e. observations are independent and identically distributed.
Example: $p + 1$ measurements from $n$ individuals randomly drawn from an underlying population.

2) $(X_{j1}, \ldots, X_{jp})$, $j = 1, \ldots, n$, are random vectors which are, however, not independent or not identically distributed (e.g. time series data, where the X-variables are observed in successive time periods).

3) Fixed design: Data are collected at pre-specified, non-random values $X_{jk}$ (corresponding for example to different experimental conditions).
The model can be rewritten in matrix notation:
$$Y = X\beta + \epsilon, \qquad E(\epsilon) = 0, \quad \mathrm{Cov}(\epsilon) = \sigma^2 I_n, \qquad [\,\epsilon \sim N_n(0, \sigma^2 I_n)\,]$$
with
$$Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1p} \\ 1 & X_{21} & X_{22} & \cdots & X_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{np} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}$$

Properties of the least squares estimator $\hat\beta$:

1. Unbiasedness:
$$E(\hat\beta) = \begin{pmatrix} E(\hat\beta_0) \\ \vdots \\ E(\hat\beta_p) \end{pmatrix} = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_p \end{pmatrix} = \beta$$

2. Covariance matrix:
$$\mathrm{Cov}(\hat\beta) = \sigma^2 [X^T X]^{-1}$$
Estimation of $\sigma^2$:

The residuals $\hat\epsilon_i = Y_i - \hat Y_i = Y_i - \hat\beta_0 - \sum_{j=1}^p \hat\beta_j X_{ij}$ estimate the error terms $\epsilon_i$.

Estimator $\hat\sigma^2$ of $\sigma^2$:
$$\hat\sigma^2 = \frac{1}{n - p - 1}\sum_{i=1}^n (Y_i - \hat Y_i)^2$$

- $\hat\sigma^2$ is an unbiased estimator of $\sigma^2$
- If the true error terms $\epsilon_i$ are normally distributed, then $(n - p - 1)\frac{\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-p-1}$

Let $\omega_{ij}$, $i, j = 1, \ldots, p+1$ denote the elements of the matrix $\Omega = [X^T X]^{-1}$. Then, for normal errors,
$$\frac{\hat\beta_j - \beta_j}{\hat\sigma\sqrt{\omega_{jj}}} \sim t_{n-p-1}$$

Note: Under the normality assumption, $\frac{\hat\beta_j - \beta_j}{\hat\sigma\sqrt{\omega_{jj}}}$ is a pivot statistic. In the general case (under some weak regularity conditions), this quantity is an asymptotic pivot statistic: $n\,\omega_{jj}$ converges to the $j$-th diagonal element of the matrix $C$, and therefore
$$\frac{\hat\beta_j - \beta_j}{\hat\sigma\sqrt{\omega_{jj}}} \to_L N(0, 1) \quad \text{as } n \to \infty$$
2.1 Bootstrapping Pairs

$$P^*(\hat\beta_j^* \le t_{1-\frac{\alpha}{2},j}) \approx 1 - \frac{\alpha}{2}, \qquad P^*(\hat\beta_j^* > t_{1-\frac{\alpha}{2},j}) \approx \frac{\alpha}{2}.$$
Here, $P^*$ denotes probabilities with respect to the conditional distribution of $\hat\beta_j^*$ given $\mathcal{Y}_n$.

Approximate $1-\alpha$ (symmetric) confidence interval:
$$[2\hat\beta_j - t_{1-\frac{\alpha}{2},j},\; 2\hat\beta_j - t_{\frac{\alpha}{2},j}]$$
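Bootstrapping pairs, i.e. resampling whole observations $(Y_i, X_i)$, can be sketched for a simple-regression slope as follows; names, data, and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

def pairs_bootstrap_ci(y, x, alpha=0.05, m=2000, rng=rng):
    """Basic bootstrap CI for the slope beta_1 by resampling (Y_i, X_i) pairs."""
    y, x = np.asarray(y), np.asarray(x)
    n = len(y)
    slope = np.polyfit(x, y, 1)[0]        # least-squares slope on the data
    boot = np.empty(m)
    for b in range(m):
        idx = rng.integers(0, n, size=n)  # draw n pairs with replacement
        boot[b] = np.polyfit(x[idx], y[idx], 1)[0]
    t_lo, t_hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return 2 * slope - t_hi, 2 * slope - t_lo

x = rng.uniform(0, 10, size=120)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=120)   # true slope 2
lo, hi = pairs_bootstrap_ci(y, x)
```

Because pairs are resampled jointly, the procedure remains valid under heteroscedastic errors.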
Remark: Under some weak regularity conditions the bootstrap is consistent, whenever $\frac{\hat\beta_j - \beta_j}{\hat\sigma\sqrt{\omega_{jj}}}$ is an asymptotic pivot statistic.

Bootstrap-t version: determine the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles $\xi_{\frac{\alpha}{2},j}$ and $\xi_{1-\frac{\alpha}{2},j}$ of the conditional distribution of $\frac{\hat\beta_j^* - \hat\beta_j}{\hat\sigma^*\sqrt{\omega_{jj}^*}}$ given $\mathcal{Y}_n$.
Consistency (bootstrapping pairs): consider estimation of the slope $\beta_1$ in the simple linear model $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$.

Random design implies that $(Y_i, X_i)$, and hence $(\epsilon_i, X_i)$, $i = 1, \ldots, n$ are independent and identically distributed. Under some regularity conditions (existence of moments) we have
$$\frac{1}{n}\sum_i (X_i - \bar X)^2 \to_p E(X_i - \mu_x)^2 = \sigma_X^2,$$
where
$$\sigma_{v,X}^2 = E\big((X_i - \mu_x)^2 \epsilon_i^2\big).$$
If $\epsilon_i$ and $X_i$ are independent and $\sigma^2 = \mathrm{var}(\epsilon_i)$ does not depend on $X_i$, then $\sigma_{v,X}^2 = \sigma^2\sigma_X^2$. We then generally obtain for large $n$
$$\mathrm{distr}(\sqrt{n}(\hat\beta_1 - \beta_1)) \approx \mathrm{distr}\Bigg(\frac{\frac{1}{\sqrt{n}}\sum_i (X_i - \bar X)\epsilon_i}{\frac{1}{n}\sum_i (X_i - \bar X)^2}\Bigg) \approx N\Big(0, \frac{\sigma_{v,X}^2}{\sigma_X^4}\Big)$$
Now consider the bootstrap estimator $\hat\beta_1^*$,
$$\hat\beta_1^* = \frac{\sum_i (X_i^* - \bar X^*) Y_i^*}{\sum_i (X_i^* - \bar X^*)^2} = \hat\beta_1 + \frac{\frac{1}{n}\sum_i (X_i^* - \bar X^*)\hat\epsilon_i^*}{\frac{1}{n}\sum_i (X_i^* - \bar X^*)^2},$$
where $\hat\epsilon_i^* = Y_i^* - \hat\beta_0 - \hat\beta_1 X_i^*$.

Recall that by definition, $(Y_i^*, X_i^*)$, and hence $(\hat\epsilon_i^*, X_i^*)$, $i = 1, \ldots, n$ are independent and identically distributed observations (conditional on $\mathcal{Y}_n$). We obtain $E\big(\frac{1}{n}\sum_i (X_i^* - \bar X^*)^2 \mid \mathcal{Y}_n\big) = \frac{1}{n}\sum_i (X_i - \bar X)^2 =: \hat\sigma_X^2$, and
$$\Big|\frac{1}{n}\sum_i (X_i^* - \bar X^*)^2 - \hat\sigma_X^2\Big| \to_P 0$$
as $n \to \infty$. Moreover, $E\big(\frac{1}{n}\sum_i (X_i^* - \bar X^*)\hat\epsilon_i^* \mid \mathcal{Y}_n\big) = 0$ and
$$\mathrm{var}\Big(\frac{1}{\sqrt{n}}\sum_i (X_i^* - \bar X^*)\hat\epsilon_i^* \,\Big|\, \mathcal{Y}_n\Big) = \frac{1}{n}\sum_i (X_i - \bar X)^2\hat\epsilon_i^2$$
$\Rightarrow$ Bootstrap consistent
2.2 Bootstrapping Residuals

Matrix notation:
$$\hat\epsilon = \begin{pmatrix} \hat\epsilon_1 \\ \vdots \\ \hat\epsilon_n \end{pmatrix} = (I - \underbrace{X[X^T X]^{-1}X^T}_{=:H})Y = (I - H)\epsilon$$
$$\mathrm{Cov}(\hat\epsilon) = \sigma^2 (I - H)$$

Standardized residuals:
$$r_i = \frac{\hat\epsilon_i}{\sqrt{1 - h_{ii}}} \quad \Rightarrow \quad \mathrm{var}(r_i) = \sigma^2$$

We have $\sum_i \hat\epsilon_i = 0$. For the standardized residuals it is, however, not guaranteed that $\bar r = \frac{1}{n}\sum_i r_i$ is equal to zero. The residual bootstrap thus relies on resampling centered standardized residuals $\tilde r_i := r_i - \bar r$.
Note: Residual plots play an important role in validating regression models.

a.) Nonlinear model:
[Figure: residuals vs. fitted values, titled "lack of model fit"; x-axis: fitted y, y-axis: residuals]

b.) Heteroscedasticity:
[Figure: residuals vs. fitted values, titled "heteroscedasticity"; x-axis: fitted y_i, y-axis: residuals]
Bootstrapping Residuals

- Original data: i.i.d. sample $(Y_1, X_1), \ldots, (Y_n, X_n)$ $\Rightarrow$ estimator $\hat\beta$
- Calculate (centered) standardized residuals
$$r_i = \frac{\hat\epsilon_i}{\sqrt{1 - h_{ii}}}, \qquad \tilde r_i = r_i - \bar r, \quad i = 1, \ldots, n$$
- Generate random samples $\epsilon_1^*, \ldots, \epsilon_n^*$ of residuals by drawing observations independently and with replacement from $\{\tilde r_1, \ldots, \tilde r_n\}$.
- Calculate
$$Y_i^* = \hat\beta_0 + \sum_{j=1}^p \hat\beta_j X_{ij} + \epsilon_i^*, \quad i = 1, \ldots, n$$
$\Rightarrow$ confidence interval $[2\hat\beta_j - t_{1-\frac{\alpha}{2},j},\; 2\hat\beta_j - t_{\frac{\alpha}{2},j}]$
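The residual bootstrap steps listed above can be sketched for a simple regression as follows; names, data, and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)

def residual_bootstrap_slopes(y, x, m=2000, rng=rng):
    """Residual bootstrap: refit on Y* = fitted + eps*, where eps* is
    resampled from the centered standardized residuals."""
    y, x = np.asarray(y), np.asarray(x)
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    fitted = X @ beta
    H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
    r = (y - fitted) / np.sqrt(1 - np.diag(H))    # standardized residuals
    r = r - r.mean()                              # centered
    slopes = np.empty(m)
    for b in range(m):
        eps = rng.choice(r, size=n, replace=True)
        slopes[b] = np.linalg.lstsq(X, fitted + eps, rcond=None)[0][1]
    return beta[1], slopes

x = rng.uniform(0, 10, size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=100)
b1, slopes = residual_bootstrap_slopes(y, x)
```

Quantiles of `slopes` then give the basic interval $[2\hat\beta_1 - t_{1-\alpha/2},\, 2\hat\beta_1 - t_{\alpha/2}]$ exactly as in the display above.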
Consistency: for the slope in the simple linear model,
$$\hat\beta_1^* = \frac{\sum_i (X_i - \bar X) Y_i^*}{\sum_i (X_i - \bar X)^2} = \hat\beta_1 + \frac{\frac{1}{n}\sum_i (X_i - \bar X)\epsilon_i^*}{\frac{1}{n}\sum_i (X_i - \bar X)^2}$$
Let $\hat\sigma_X^2 := \frac{1}{n}\sum_i (X_i - \bar X)^2$. If the errors $\epsilon_i$ are i.i.d. with zero mean, we have
$$E(\epsilon_i^* \mid \mathcal{Y}_n) = 0, \qquad \mathrm{var}(\epsilon_i^* \mid \mathcal{Y}_n) = \frac{1}{n}\sum_i \tilde r_i^2 =: \tilde\sigma^2,$$
and therefore
$$\mathrm{var}\Big(\frac{1}{\sqrt{n}}\sum_i (X_i - \bar X)\epsilon_i^* \,\Big|\, \mathcal{Y}_n\Big) = \frac{1}{n}\sum_i (X_i - \bar X)^2\,\tilde\sigma^2$$
Implementation of the wild bootstrap:

- Original data: i.i.d. sample $(Y_1, X_1), \ldots, (Y_n, X_n)$ $\Rightarrow$ estimator $\hat\beta$
- Calculate (centered) standardized residuals
$$r_i = \frac{\hat\epsilon_i}{\sqrt{1 - h_{ii}}}, \qquad \tilde r_i = r_i - \bar r, \quad i = 1, \ldots, n$$
- Generate $n$ independent random variables $\epsilon_i^*$ from binary distributions,
$$P\Big(\epsilon_i^* = \tilde r_i\,\frac{1 - \sqrt{5}}{2}\Big) = \gamma, \qquad P\Big(\epsilon_i^* = \tilde r_i\,\frac{1 + \sqrt{5}}{2}\Big) = 1 - \gamma,$$
$i = 1, \ldots, n$, where $\gamma = \frac{5 + \sqrt{5}}{10}$.
- Calculate
$$Y_i^* = \hat\beta_0 + \sum_{j=1}^p \hat\beta_j X_{ij} + \epsilon_i^*, \quad i = 1, \ldots, n$$
$\Rightarrow$ confidence interval $[2\hat\beta_j - t_{1-\frac{\alpha}{2},j},\; 2\hat\beta_j - t_{\frac{\alpha}{2},j}]$

For the slope in the simple linear model,
$$\hat\beta_1^* = \frac{\sum_i (X_i - \bar X) Y_i^*}{\sum_i (X_i - \bar X)^2} = \hat\beta_1 + \frac{\frac{1}{n}\sum_i (X_i - \bar X)\epsilon_i^*}{\frac{1}{n}\sum_i (X_i - \bar X)^2},$$
and by construction
$$\mathrm{var}\Big(\frac{1}{\sqrt{n}}\sum_i (X_i - \bar X)\epsilon_i^* \,\Big|\, \mathcal{Y}_n\Big) = \frac{1}{n}\sum_i (X_i - \bar X)^2\,\tilde r_i^2 =: \hat\sigma_{w,X}^2.$$
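The two-point error distribution above has $E(\epsilon_i^* \mid \mathcal{Y}_n) = 0$ and $\mathrm{var}(\epsilon_i^* \mid \mathcal{Y}_n) = \tilde r_i^2$, which a short simulation confirms; names, data, and seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

# Wild bootstrap errors: eps*_i = r_i * V_i with the two-point distribution
# P(V = (1-sqrt5)/2) = gamma, P(V = (1+sqrt5)/2) = 1-gamma,
# gamma = (5+sqrt5)/10, chosen so that E(V) = 0 and E(V^2) = 1.
def wild_errors(r, rng=rng):
    r = np.asarray(r)
    gamma = (5 + np.sqrt(5)) / 10
    v = np.where(rng.random(len(r)) < gamma,
                 (1 - np.sqrt(5)) / 2,
                 (1 + np.sqrt(5)) / 2)
    return r * v

r = rng.normal(size=100_000)    # stand-in for centered residuals
eps = wild_errors(r)
```

Because each $\epsilon_i^*$ inherits the magnitude of its own residual $\tilde r_i$, the wild bootstrap preserves heteroscedasticity, unlike plain residual resampling.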
2.4 Generalizations

1) Nonlinear regression:
$$Y_i = g(X_i, \theta) + \epsilon_i$$
[Figure: scatter plot with fitted nonlinear regression curve; x-axis: age in years]
Model: $Y_i = e^{\theta X_i} + \epsilon_i$

An estimator $\hat\theta$ is determined by (nonlinear) least squares; residual: $\hat\epsilon_i = Y_i - e^{\hat\theta X_i}$

Bootstrap: Random design $\Rightarrow$ bootstrapping pairs; bootstrapping residuals for homoscedastic errors; wild bootstrap for heteroscedastic errors.
2) Median Regression:

Linear model: $Y_i = \beta_0 + \sum_j \beta_j X_{ij} + \epsilon_i$

In some applications the errors possess heavy tails ($\Rightarrow$ outliers!). In such situations estimation of $\beta$ by least squares may not be appropriate, and statisticians tend to use more robust methods. A sensible procedure then is to determine estimates $\hat\beta$ by minimizing
$$\sum_{i=1}^n \Big|Y_i - \beta_0 - \sum_j \beta_j X_{ij}\Big|$$

Asymptotic distribution:
$$\sqrt{n}(\hat\beta - \beta) \to_L N(0, V),$$
where the asymptotic covariance matrix $V$ depends on the error density at zero.
$\Rightarrow$ Bootstrapping residuals:

Calculate centered residuals
$$\hat\epsilon_t = X_t - \hat\rho X_{t-1}, \qquad \tilde\epsilon_t = \hat\epsilon_t - \frac{1}{n-1}\sum_t \hat\epsilon_t, \quad t = 2, \ldots, n$$
$\Rightarrow$ confidence interval $[2\hat\rho - t_{1-\frac{\alpha}{2}},\; 2\hat\rho - t_{\frac{\alpha}{2}}]$