November 23
Patrick Breheny
Introduction
Regression splines (parametric)
Smoothing splines (nonparametric)
Introduction
We seek the function $f$ that minimizes the penalized criterion
$\sum_{i=1}^n \{y_i - f(x_i)\}^2 + \lambda \int \{f''(u)\}^2 \, du$
The first term captures the fit to the data, while the second penalizes curvature; note that for a line, $f''(u) = 0$ for all $u$
Here, $\lambda$ is the smoothing parameter, and it controls the tradeoff between the two terms:
$\lambda = 0$ imposes no restrictions, and $f$ will therefore interpolate the data
$\lambda = \infty$ renders curvature impossible, thereby returning us to ordinary linear regression
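The role of the smoothing parameter can be illustrated with a small sketch. It assumes simulated data (not the data used in these slides) and a reasonably recent version of R in which smooth.spline() accepts a lambda argument; the specific values of lambda are arbitrary choices for illustration.

```r
set.seed(1)
n <- 40                                   # simulated data, for illustration only
x <- seq(0, 1, length.out = n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

fit_small <- smooth.spline(x, y, lambda = 1e-6)  # weak penalty: wiggly fit
fit_large <- smooth.spline(x, y, lambda = 1e3)   # strong penalty: nearly a line
fit_lm    <- lm(y ~ x)                           # ordinary linear regression

# Residual sums of squares: the weakly penalized fit tracks the data more closely
rss_small <- sum((y - predict(fit_small, x)$y)^2)
rss_large <- sum((y - predict(fit_large, x)$y)^2)

# Largest gap between the heavily penalized spline and the least-squares line
max_gap <- max(abs(predict(fit_large, x)$y - fitted(fit_lm)))
```

As lambda grows, max_gap should shrink toward zero, while a smaller lambda drives the residual sum of squares down.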
Splines
Basis functions
$f(x) = \sum_{m=1}^{M} \beta_m h_m(x)$
[Figures: spnbmd versus age, with separate panels for female and male]
$h_1(x) = 1$,
$h_2(x) = x$,
$h_3(x) = (x - \xi_1)_+$,
$h_4(x) = (x - \xi_2)_+$
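A minimal sketch of this basis in R follows. It assumes the constant function $h_1(x) = 1$ as the first basis element, two made-up knots, and noise-free simulated data; the helper pos() is a hypothetical name for the positive-part function $(u)_+$.

```r
# Hypothetical helper: positive part (u)_+
pos <- function(u) pmax(u, 0)

x   <- seq(0, 3, length.out = 61)
xi1 <- 1                                  # illustrative knot locations
xi2 <- 2

# Columns are h1(x) = 1, h2(x) = x, h3(x) = (x - xi1)_+, h4(x) = (x - xi2)_+
H <- cbind(h1 = 1, h2 = x, h3 = pos(x - xi1), h4 = pos(x - xi2))

# A continuous piecewise linear function is an exact linear combination of the columns
beta <- c(1, 2, -3, 4)
y    <- drop(H %*% beta)

# Least squares on the basis recovers the coefficients (no noise was added)
beta_hat <- coef(lm(y ~ H - 1))
```

Because the model is linear in the basis functions, ordinary least squares is all that is needed to fit it.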
Splines
The preceding is an example of a spline: a piecewise polynomial of degree $m - 1$ that is continuous up to its first $m - 2$ derivatives
By requiring continuous derivatives, we ensure that the resulting function is as smooth as possible
We can obtain more flexible curves by increasing the degree of the spline and/or by adding knots
However, there is a tradeoff:
Few knots/low degree: The resulting class of functions may be too restrictive (bias)
Many knots/high degree: We run the risk of overfitting (variance)
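The tradeoff can be explored numerically. The sketch below uses simulated data and an arbitrary split into training and held-out halves (both are assumptions for illustration), fitting cubic regression splines of increasing flexibility with bs() from the splines package.

```r
library(splines)

set.seed(2)
n   <- 200                                # simulated data, for illustration only
dat <- data.frame(x = runif(n))
dat$y <- sin(2 * pi * dat$x) + rnorm(n, sd = 0.3)
train <- rep(c(TRUE, FALSE), length.out = n)

# Fit a cubic regression spline with `k` basis functions on the training half,
# then report the residual sum of squares on both halves
rss_for <- function(k) {
  fit <- lm(y ~ bs(x, df = k, Boundary.knots = c(0, 1)), data = dat[train, ])
  c(train = sum(resid(fit)^2),
    test  = sum((dat$y[!train] - predict(fit, dat[!train, ]))^2))
}

res <- sapply(c(3, 6, 20), rss_for)       # columns: 3, 6, and 20 basis functions
```

The training error can only go down as basis functions are added, whereas the held-out error in the second row of res need not; comparing the two rows gives a rough picture of the bias and variance sides of the tradeoff.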
$h_j(x) = x^{j-1}, \quad j = 1, \ldots, m$
$h_{m+l}(x) = (x - \xi_l)_+^{m-1}, \quad l = 1, \ldots, K$
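As a rough check on the truncated power basis of order $m$ with $K$ knots (polynomial terms $x^{j-1}$ plus one truncated term $(x - \xi_l)_+^{m-1}$ per knot), the sketch below builds it by hand and compares the least-squares fit with the one obtained from bs(). The helper tpower() is a hypothetical name, the data are simulated, and the knot locations are arbitrary.

```r
library(splines)

# Hypothetical helper: truncated power basis of order m (degree m - 1), m >= 2
tpower <- function(x, knots, m = 4) {
  poly_part  <- outer(x, 0:(m - 1), `^`)                      # x^(j-1), j = 1..m
  trunc_part <- outer(x, knots,
                      function(x, k) pmax(x - k, 0)^(m - 1))  # (x - xi_l)_+^(m-1)
  cbind(poly_part, trunc_part)
}

set.seed(3)
x <- sort(runif(100))                     # simulated data, for illustration only
y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)
knots <- c(1/3, 2/3)

H      <- tpower(x, knots, m = 4)         # m + K = 6 columns
fit_tp <- lm(y ~ H - 1)
fit_bs <- lm(y ~ bs(x, knots = knots))    # cubic B-spline basis plus intercept

# Both bases span the same space of cubic splines, so the fits should agree
gap <- max(abs(fitted(fit_tp) - fitted(fit_bs)))
```

The two bases differ only in parameterization; the B-spline version is generally preferred in practice because it is better conditioned numerically.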
Quadratic splines
[Figure: spnbmd versus age; panels: female, male]
Cubic splines
[Figure: spnbmd versus age; panels: female, male]
Additional notes
B-splines in R
Fortunately, one can use B-splines without knowing the details behind their complicated construction
In the splines package (which by default is installed but not loaded), the bs() function will implement a B-spline basis for you:
library(splines)
X <- bs(x,knots=quantile(x,probs=c(1/3,2/3)))  # knots at the tertiles of x
X <- bs(x,df=5)                                 # cubic, 5 basis functions
X <- bs(x,degree=2,df=10)                       # quadratic, 10 basis functions
Xp <- predict(X,newx=x)                         # evaluate the same basis at new x values
By default, bs uses degree=3, places knots at evenly spaced quantiles (when df is specified), and does not return a column for the intercept
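The sketch below shows one way bs() might be used inside lm(), with simulated data standing in for a real data set (the variable names and the data-generating curve are assumptions). Note that the predict method for a bs basis takes its new values through the newx argument.

```r
library(splines)

set.seed(4)
x <- runif(150, 10, 25)                   # simulated data, for illustration only
y <- 0.1 * exp(-(x - 13)^2 / 8) + rnorm(150, sd = 0.02)

X5  <- bs(x, df = 5)                      # cubic basis, 5 columns, no intercept column
X10 <- bs(x, degree = 2, df = 10)         # quadratic basis, 10 columns

# Using bs() in the model formula lets predict() rebuild the basis on new data
fit  <- lm(y ~ bs(x, df = 5))
grid <- data.frame(x = seq(min(x), max(x), length.out = 50))
pred <- predict(fit, newdata = grid)

# The basis itself can also be evaluated at new points, reusing the original knots
Xp <- predict(X5, newx = grid$x)
```

Putting bs() directly in the formula is usually the more convenient route, since the knots chosen from the original data are stored with the fit and reused at prediction time.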
[Figures: spnbmd versus age, with separate panels for female and male]
[Figure: female and male curves versus Age]
[Figure: spline and loess fits of spnbmd versus age; panels: female, male]
Natural splines in R
[Figure: spnbmd versus age; panels: female, male]
[Figure: fitted curves, including $\hat{f}(x)$, on a vertical scale from 0 to 1; horizontal axes run from 100 to 220]
We will now see that the solution to this problem lies in the
family of natural cubic splines
Terminology
Design matrix
Let $\{N_j\}_{j=1}^n$ denote the collection of natural cubic spline basis functions and $N$ denote the $n \times n$ design matrix consisting of the basis functions evaluated at the observed values:
$N_{ij} = N_j(x_i)$
$f(x) = \sum_{j=1}^n N_j(x)\beta_j$
$f(\mathbf{x}) = N\beta$
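One way to see the $n \times n$ design matrix concretely is to build a natural cubic spline basis with a knot at every observed value using ns() from the splines package. This is only a sketch: the basis returned by ns() spans the space of natural cubic splines but is not necessarily the particular set of functions $N_j$ used here, and the data are made up.

```r
library(splines)

x <- c(0.5, 1.1, 1.9, 2.4, 3.3, 4.0, 4.8, 5.5)   # made-up distinct x values
y <- c(1.2, 0.7, 1.5, 2.1, 1.8, 0.9, 1.1, 1.6)   # made-up responses
n <- length(x)

# Boundary knots at the extremes, interior knots at the remaining n - 2 points;
# with intercept = TRUE this gives n basis functions
N_basis <- ns(x, knots = x[-c(1, n)], Boundary.knots = range(x), intercept = TRUE)
N <- matrix(c(N_basis), n, n)            # plain n x n matrix of N_j(x_i)

# With no penalty, N beta = y has a unique solution: the interpolating natural spline
beta <- solve(N, y)
fhat <- drop(N %*% beta)
```

Since N is square and invertible, the unpenalized fit interpolates the data exactly, which is why a penalty on curvature is needed to obtain a smooth estimate.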
Solution
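$\mathbb{V}\{\hat{f}(x_0)\} = \sigma^2 \sum_i l_i(x_0)^2$
$\hat{\sigma}^2 = \frac{\sum_i (y_i - \hat{y}_i)^2}{n - 2\nu + \tilde{\nu}}$
$\nu = \mathrm{tr}(L)$
$\tilde{\nu} = \mathrm{tr}(L'L)$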
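These quantities can be computed numerically once the smoother matrix $L$ is available. smooth.spline() does not return $L$ directly, so the sketch below recovers it column by column by smoothing unit vectors at a fixed smoothing level; this relies on the fit being linear in $y$ for a fixed spar. The data are simulated and the choice of $x_0$ is arbitrary.

```r
set.seed(5)
n <- 30                                   # simulated data, for illustration only
x <- seq(10, 25, length.out = n)
y <- 0.1 * exp(-(x - 13)^2 / 8) + rnorm(n, sd = 0.02)

fit <- smooth.spline(x, y)                # smoothing level chosen by GCV

# Column j of L is the fit to the j-th unit vector at the same smoothing level
L <- sapply(seq_len(n), function(j) {
  e <- numeric(n)
  e[j] <- 1
  predict(smooth.spline(x, e, spar = fit$spar), x)$y
})

nu       <- sum(diag(L))                  # tr(L), the effective degrees of freedom
nu_tilde <- sum(L^2)                      # tr(L'L)
sigma2   <- sum((y - fit$y)^2) / (n - 2 * nu + nu_tilde)

# Estimated variance of the fit at x0 = x[10]: sigma^2 times the sum of squared weights
i0       <- 10
var_f_x0 <- sigma2 * sum(L[i0, ]^2)
```

The trace of the reconstructed matrix should agree with the df component reported by smooth.spline(), which provides a simple sanity check on the construction.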
GCV
[Figure: GCV versus $\log(\lambda)$; panels: male, female]
[Figure: grids of panels for Males and Females; horizontal axes run from 10 to 25]
[Figure: spnbmd versus age; panels: female, male]
R implementation
Recall that local regression had a simple, standard function for basic one-dimensional smoothing (loess) and an extensive package for more comprehensive analyses (locfit)
Spline-based smoothing is similar: smooth.spline does not require any packages and implements simple one-dimensional smoothing:
fit <- smooth.spline(x,y)
plot(fit,type="l")
predict(fit,xx)
By default, the function will choose $\lambda$ based on GCV, but this can be changed to CV, or you can specify $\lambda$
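The different ways of setting the amount of smoothing might look as follows. This is a sketch on simulated data; it assumes a version of R in which smooth.spline() accepts the lambda argument in addition to cv and df.

```r
set.seed(6)
x <- seq(10, 25, length.out = 60)         # simulated data, for illustration only
y <- 0.1 * exp(-(x - 13)^2 / 8) + rnorm(60, sd = 0.02)

fit_gcv <- smooth.spline(x, y)                            # default: GCV
fit_cv  <- smooth.spline(x, y, cv = TRUE)                 # leave-one-out CV
fit_df  <- smooth.spline(x, y, df = 5)                    # target effective df
fit_lam <- smooth.spline(x, y, lambda = fit_gcv$lambda)   # lambda given directly

# Predictions on a fine grid come back as a list with components x and y
xx   <- seq(10, 25, length.out = 200)
pred <- predict(fit_gcv, xx)
```

Specifying df is often the most interpretable option, since the effective degrees of freedom are on the same scale regardless of the units of x.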
Hypothesis testing
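We are often interested in testing whether the smaller of two nested models provides an adequate fit to the data
In ordinary regression, this is accomplished via the F-statistic:
$F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS})/q}{\mathrm{RSS}/(n - p)}$,
where $\mathrm{RSS}_0$ and $\mathrm{RSS}$ are the residual sums of squares of the smaller and larger models, $q$ is the difference in their numbers of parameters, and $n - p$ is the residual degrees of freedom of the larger model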
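For ordinary regression, the standard F-statistic for nested models divides the drop in residual sum of squares per added parameter by the residual variance estimate from the larger model. The sketch below computes it by hand for a linear fit nested within a cubic regression spline and compares the result with anova(); the data are simulated and the choice of df = 5 is arbitrary.

```r
library(splines)

set.seed(7)
n <- 120                                  # simulated data, for illustration only
x <- runif(n, 10, 25)
y <- 0.1 * exp(-(x - 13)^2 / 8) + rnorm(n, sd = 0.02)

fit0 <- lm(y ~ x)                         # smaller model: straight line
fit1 <- lm(y ~ bs(x, df = 5))             # larger model: cubic regression spline

rss0 <- sum(resid(fit0)^2)
rss1 <- sum(resid(fit1)^2)
q    <- length(coef(fit1)) - length(coef(fit0))   # number of extra parameters

# F = [(RSS0 - RSS) / q] / [RSS / residual df of the larger model]
F_manual <- ((rss0 - rss1) / q) / (rss1 / df.residual(fit1))
p_value  <- pf(F_manual, q, df.residual(fit1), lower.tail = FALSE)

tab <- anova(fit0, fit1)                  # the same test, computed by R
```

The linear model is nested in the spline model because the intercept together with the cubic B-spline columns spans all polynomials of degree three or less, so the comparison is a valid nested test.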