Professional Documents
Culture Documents
Stephen Boyd
EE103
Stanford University
November 3, 2016
Outline
Validation
Feature engineering
y ≈ f (x)
(x1 , y1 ), . . . , (xN , yN )
I model form:
fˆ(x) = θ1 f1 (x) + · · · + θp fp (x)
I fi : Rn → R are basis functions that we choose
I θi are model parameters that we choose
I ŷi = fˆ(xi ) is (the model’s) prediction of yi
I we’d like ŷi ≈ yi , i.e., model is consistent with observed data
std(y)
fˆ(u) = avg(y) + ρ (u − avg(x))
std(x)
fˆ(x)
90
(million barrels per day)
Petroleum consumption
80
70
60
6
(million barrels per day)
Petroleum consumption
−2
I fi (x) = xi−1 , i = 1, . . . , p
I model is a polynomial of degree less than p
fˆ(x) = θ1 + θ2 x + · · · + θp xp−1
I A is a Vandermonde matrix
xp−1
1 x1 ··· 1
1 x2 ··· xp−1
2
A= .
.. ..
..
. .
1 xN ··· xp−1
N
x x
degree 10 degree 15
fˆ(x) fˆ(x)
x x
so model is
ŷ = θ1 + θ2 x1 + · · · + θn+1 xn = xT θ2:n + θ1
I β = θ2:n+1 , v = θ1
I time zeries z1 , z2 , . . .
I auto-regressive (AR) prediction model:
ẑt+1 = β1 zt + · · · + βM zt−M , t = M + 1, M + 2, . . .
I M is memory of model
I ẑt+1 is prediction of next value, based on previous M values
I we’ll choose β to minimize sum of squares of prediction errors,
(ẑM +1 − zM +1 )2 + · · · + (ẑT − zT )2
yi = zM +i , xi = (zM +i−1 , . . . , zi ), i = 1, . . . , T − M
solid line shows one-hour ahead predictions from AR model, first 5 days
70
68
66
Temperature (◦ F)
64
62
60
58
56
20 40 60 80 100 120
k
Validation
Feature engineering
Validation 21
Generalization
basic idea:
I goal of model is not to predict outcome in the given data
I instead it is to predict the outcome on new, unseen data
Validation 22
Validation
Validation 23
Validation
Validation 24
Example
I polynomial fit to training data using N = 100 training points
degree 2 degree 6
fˆ(x) fˆ(x)
x x
degree 10 degree 15
fˆ(x) fˆ(x)
x x
Validation 25
Example
1
Data set
Test set
0.8
Relative RMS error
0.6
0.4
0.2
0 5 10 15 20
Degree
Validation 26
Cross validation
Validation 27
Example
2
I house price, regression fit with x = (area/1000 ft. , bedrooms)
I 774 sales, divided into 5 folds of 155 sales each
I fit 5 regression models, removing each fold
Validation 28
Outline
Validation
Feature engineering
Feature engineering 29
Feature engineering
ŷ = θ1 f1 (x) + · · · + θp fp (x)
Feature engineering 30
Transforming features
Feature engineering 31
Example
Feature engineering 32
Example
Feature engineering 33