Predictive Learning
LECTURE SET 4
Statistical Learning Theory
OUTLINE of Set 4
Objectives and Overview
Inductive Learning Problem
Setting
Keep-It-Direct Principle
Analysis of ERM
VC-dimension
Generalization Bounds
Structural Risk Minimization (SRM)
Summary and Discussion
Objectives
Problems with philosophical approaches:
- lack quantitative description/characterization of ideas
- no real predictive power (unlike the Natural Sciences)
- no agreement on basic definitions/concepts (unlike the Natural Sciences)
Characteristics of Scientific Theory
- Problem setting
- Solution approach
- Math proofs (technical analysis)
- Constructive methods
- Applications
Note: Problem Setting and Solution Approach are independent (of each other).
History and Overview (cont'd)
MAIN CONCEPTUAL CONTRIBUTIONS
- Distinction between problem setting, inductive principle and learning algorithms
- Direct approach to estimation with finite data (KID principle)
- Math analysis of ERM (standard inductive setting)
- Two factors responsible for generalization:
  - empirical risk (fitting error)
  - complexity (capacity) of approximating functions
Importance of VC-theory
Math results addressing the main question:
- under what general conditions does the ERM approach lead to (good) generalization?
Probabilistic Approach
(Figure: decision boundary in the (x1, x2) plane.)
ERM Approach
Quadratic and linear decision boundaries estimated via minimization of squared loss.
(Figure: decision boundaries in the (x1, x2) plane.)
Estimation of multivariate functions
Properties of high-dimensional data
Keep-It-Direct Principle
Empirical Comparison
Target function (sine-squared): g(x) = sin²(2x), x ∈ [0, 1]
(Figure: target function on [0, 1].)
Conclusion
1. Conditions for consistency of ERM
2. Generalization bounds
3. Inductive principles (for finite samples)
4. Constructive methods (learning algorithms) for implementing (3)
NOTE: (1) → (2) → (3) → (4)
Consistency/Convergence of ERM
Convergence of empirical risk R_emp to expected risk R does not imply consistency of ERM.
Models estimated via ERM (w*) are always biased estimates of the functions minimizing true risk:
R_emp(w_n*) ≤ R(w_n*) (on average)
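This bias can be seen numerically. A minimal sketch (not from the lecture): fit a one-parameter model f(x, w) = wx by least squares and compare average empirical risk with average true risk; the data-generating setup below is a hypothetical choice.

```python
import random

# Sketch: the empirical risk of the ERM solution w* is, on average,
# a downward-biased estimate of its true (expected) risk.
# Model class: f(x, w) = w*x; data: y = x + Gaussian noise (assumed setup).
random.seed(0)

def trial(n_train=10, n_test=2000, sigma=0.3):
    xs = [0.1 + 0.9 * i / (n_train - 1) for i in range(n_train)]
    ys = [x + random.gauss(0, sigma) for x in xs]
    # ERM = least-squares estimate of w
    w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    r_emp = sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / n_train
    # True risk estimated on a large independent test set
    xt = [random.uniform(0.1, 1.0) for _ in range(n_test)]
    yt = [x + random.gauss(0, sigma) for x in xt]
    r_true = sum((y - w * x) ** 2 for x, y in zip(xt, yt)) / n_test
    return r_emp, r_true

trials = [trial() for _ in range(300)]
avg_emp = sum(t[0] for t in trials) / len(trials)
avg_true = sum(t[1] for t in trials) / len(trials)
print(avg_emp < avg_true)
```

Averaged over repeated samples, the training-set risk of the fitted model sits below its risk on fresh data, which is exactly the bias stated above.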
SHATTERING
Linear indicator functions can split 3 data points in 2D into all 2^3 = 8 possible binary partitions.
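This claim can be checked directly. A sketch (not from the slides): the perceptron converges exactly when a labeling is linearly separable, so running it over all 2^3 labelings of three points in general position counts the realizable dichotomies.

```python
from itertools import product

# Sketch: verify that 3 points in general position in 2D are shattered
# by linear indicator functions. The perceptron finds a separating line
# whenever one exists, so it serves as a separability test.
def separable(points, labels, epochs=1000):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), y in zip(points, labels):
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:
                w[0] += y * x1
                w[1] += y * x2
                b += y
                errors += 1
        if errors == 0:
            return True
    return False  # no separating line found within the epoch budget

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # not collinear
n_shattered = sum(separable(points, ls) for ls in product([-1, 1], repeat=3))
print(n_shattered)  # all 2^3 = 8 dichotomies are realized
```

For four points the analogous check fails (e.g., the XOR-style labeling is not linearly separable), which is why the VC-dimension of linear indicator functions in 2D is 3.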
VC DIMENSION
Definition: A set of functions has VC-dimension h if there exist h samples that can be shattered by this set of functions, but no h + 1 samples can be shattered.
VC-dimension and Falsifiability
Example: one-parameter threshold indicator functions I(x > c).
Linear indicator functions: f(x, w) = I( Σ_{i=1}^{m} w_i g_i(x) + w_0 > 0 )
VC-dimension vs number of parameters
- VC-dimension can be equal to DoF (number of parameters). Example: linear estimators.
- VC-dimension can be smaller than DoF. Example: penalized estimators.
- VC-dimension can be larger than DoF. Examples: feature selection; the one-parameter family I(sin(wx) > 0), whose VC-dimension is infinite.
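The sin(wx) example can be made concrete. A sketch of the classic construction (the points x_i = 10^-i and the formula for w are assumptions taken from the standard argument, not from the slides):

```python
import math
from itertools import product

# Sketch: sign(sin(w*x)) shatters arbitrarily many points with a single
# parameter w, so its VC-dimension exceeds its number of parameters.
n = 4
xs = [10.0 ** -(i + 1) for i in range(n)]  # x_i = 10^-i

def shattering_w(labels):
    # Classic construction: c_i = 1 exactly where the desired label is -1
    c = [(1 - y) // 2 for y in labels]
    return math.pi * (1 + sum(ci * 10 ** (i + 1) for i, ci in enumerate(c)))

ok = all(
    all(math.copysign(1, math.sin(shattering_w(ls) * x)) == y
        for x, y in zip(xs, ls))
    for ls in product([-1, 1], repeat=n)
)
print(ok)  # every one of the 2^4 labelings is realized
```

The same construction works for any n, which is the sense in which this one-parameter family has infinite VC-dimension.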
Ten training samples
(Figure: training samples and fitted models.)
Generalization Bounds
Bounds for learning machines (regression): the following bound holds with probability 1 − η for all approximating functions:

  R(ω) ≤ R_emp(ω) / ( 1 − c √ε_n )₊

where ε_n = a1 [ h ( ln(a2·n/h) + 1 ) − ln(η/4) ] / n is called the confidence interval, and a1, a2, c are theoretical constants.
A practical form of the VC bound can be obtained by setting the confidence level η = min(4/√n, 1) and theoretical constants a1 = a2 = c = 1:

  R(ω) ≤ R_emp(ω) · [ 1 − √( h/n − (h/n) ln(h/n) + (ln n)/(2n) ) ]₊⁻¹
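As an illustration (a sketch, not part of the slides), the practical bound reduces to a multiplicative penalty on the empirical risk; supplying h for a given model class is the user's assumption.

```python
import math

# Practical VC penalization factor, assuming a1 = a2 = c = 1 and
# confidence level eta = min(4/sqrt(n), 1), as in the bound above.
def vc_penalty(n, h):
    p = h / n  # requires h < n
    arg = p - p * math.log(p) + math.log(n) / (2 * n)
    d = 1.0 - math.sqrt(arg)
    return float("inf") if d <= 0 else 1.0 / d  # the (.)_+ convention

# Estimated risk bound = R_emp * vc_penalty(n, h);
# the penalty grows with the complexity ratio h/n.
for h in (2, 5, 10):
    print(h, round(vc_penalty(40, h), 3))
```

Multiplying the empirical risk of each candidate model by this factor gives the "vc" model-selection criterion compared against fpe, gcv, and cv elsewhere in this set.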
(Figure: comparison of model selection criteria — fpe, gcv, vc, cv — in terms of selected degrees of freedom and resulting risk.)
R(ω) ~ R_emp(ω)
SRM Approach
Use VC-dimension as a controlling parameter for minimizing the VC bound:

  R(ω) ≤ R_emp(ω) + Φ(n/h)

1. Keep Φ(n/h) fixed and minimize R_emp(ω) (most statistical and neural network methods)
2. Keep R_emp(ω) fixed (small) and minimize Φ(n/h) (Support Vector Machines)
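Strategy 1 can be sketched on a toy two-element structure (an assumed example, not from the slides): constants ⊂ lines, using the practical VC penalty with a1 = a2 = c = 1 and taking h = number of parameters, which is a valid choice for linear estimators.

```python
import math

# Toy SRM sketch: structure S1 ⊂ S2 with S1 = constant models (h = 1)
# and S2 = linear models (h = 2); select the element minimizing
# R_emp * (practical VC penalty). Data below are hypothetical.
def vc_penalty(n, h):
    p = h / n
    arg = p - p * math.log(p) + math.log(n) / (2 * n)
    d = 1.0 - math.sqrt(arg)
    return float("inf") if d <= 0 else 1.0 / d

xs = [i / 9 for i in range(10)]
ys = [0.05 + 0.1 * x + (0.01 if i % 2 else -0.01) for i, x in enumerate(xs)]
n = len(xs)

xbar, ybar = sum(xs) / n, sum(ys) / n
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))

models = {
    1: lambda x: ybar,                       # S1: best constant fit
    2: lambda x: ybar + slope * (x - xbar),  # S2: best linear fit
}
bounds = {h: vc_penalty(n, h)
             * sum((y - f(x)) ** 2 for x, y in zip(xs, ys)) / n
          for h, f in models.items()}
best_h = min(bounds, key=bounds.get)
print(best_h)  # the linear element wins on this near-linear data
```

On this near-linear sample the drop in empirical risk from S1 to S2 outweighs the larger penalty, so SRM selects the linear element.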
A set of algebraic polynomials f_m(x, w) = Σ_{i=0}^{m} w_i x^i is a structure, since f_1 ⊂ f_2 ⊂ … ⊂ f_k ⊂ …

More generally, f_m(x, w, V) = b + Σ_{i=0}^{m} w_i g(x, v_i), where g(x, v_i) is a set of basis functions (dictionary).
Another example is a structure defined via penalization: S_k = { f(x, w) : ‖w‖² ≤ c_k }, with c_1 < c_2 < …

Minimization of empirical risk (MSE) on each element S_k of a structure is a constrained minimization problem. This optimization problem can be equivalently stated as minimization of the penalized empirical risk functional:

  R_pen(w, λ_k) = R_emp(w) + λ_k ‖w‖²

where the choice of λ_k is determined by c_k (λ_k ~ c_k).
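A minimal sketch of this equivalence for a one-parameter model f(x, w) = wx (the data below are hypothetical): the penalized solution has the closed form w = Σxy / (Σx² + λn), so increasing λ, i.e. shrinking the norm budget c_k, shrinks w toward 0.

```python
# Penalized ERM for f(x, w) = w*x:
#   minimize (1/n) * sum (y - w*x)^2 + lam * w^2
# Setting the derivative to zero gives w = sum(x*y) / (sum(x^2) + lam*n).
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [0.1, 0.3, 0.45, 0.8, 0.9]  # hypothetical noisy data

def ridge_w(xs, ys, lam):
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam * n)

# Tighter constraint (larger lam) -> smaller |w|
print(ridge_w(xs, ys, 0.0) > ridge_w(xs, ys, 0.1) > ridge_w(xs, ys, 1.0) > 0)
```

Sweeping λ thus traces out the same sequence of solutions as sweeping the constraint radius c_k over the elements of the structure.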
Empirical comparison
- y-values ~ target fct sin²(2x)
- additive Gaussian noise with st. dev. 0.05
Experimental set-up
- training set ~ 40 samples
- validation set ~ 40 samples (for model selection)
- SRM structures defined on algebraic polynomials:
  - dictionary (polynomial degrees 1 to 10)
  - penalization (fixed degree-10 polynomial)
Results:
- dictionary: fitted polynomial with coefficients 0.6186, 22.7337 (x²), 41.1772 (x³), 19.2736 (x⁴), 55.9565 (x⁵)
- penalization: lambda = 1.013e-005
- sparse polynomial (feature selection)

Visual results: target fct ~ red line, feature selection ~ black solid, dictionary ~ green, penalization ~ yellow line.
(Figure: fitted models vs. target function on [0, 1].)
SRM Summary
SRM structure ~ complexity ordering on a set of admissible models (approximating functions)
Many different structures can be defined on the same set of approximating functions (possible models)
How to choose the best structure?
- depends on application data
- VC theory cannot provide an answer
SRM = mechanism for complexity control
- selecting optimal complexity for a given data set
Summary and Discussion
Methodology
- learning problem setting (KID principle)
- concepts (risk minimization, VC-dimension, structure)
Interpretation/evaluation of existing methods
Model selection using VC-bounds
New types of inference (TBD later)
What theory cannot do:
- provide formalization (for a given application)
- select a good structure
- there is always a gap between theory and applications