
Basics of regression

Problem statement

Polynomial fitting (for illustration)


Note that the regressor minimizes the squared error on the training samples:

$E(\mathbf{w}) = \frac{1}{2}\sum_{i=1}^{N}(t_i - y_i)^2$

This relates to empirical risk minimization: minimizing the error on the training samples.
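As an illustration, here is a minimal sketch of fitting a polynomial by minimizing the squared error on the training samples, i.e. the empirical risk above. The data and the polynomial order are assumed toy choices, not taken from the slides.

```python
# Minimal sketch: polynomial fitting by empirical risk minimization.
# The synthetic data below is an assumed toy example.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)                                   # training inputs
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)   # noisy scalar targets

M = 3                                        # assumed polynomial order
w = np.polyfit(x, t, deg=M)                  # least-squares polynomial coefficients
y = np.polyval(w, x)                         # predictions on the training samples

empirical_risk = 0.5 * np.sum((t - y) ** 2)  # the training error being minimized
print(empirical_risk)
```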

Learning the weights

Use of a simple model

Performance of regression



Complexity of the model


When we fit a higher-order model with limited training samples, we make the model overly complex. This gives rise to overfitting, and, in a way related to the curse of dimensionality, the performance deteriorates on the test set. Complex models have large oscillations in the learned weights.
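A quick way to see this is to fit the same small data set with increasing polynomial order and watch the magnitude of the learned weights grow. The sketch below uses assumed toy data.

```python
# Sketch: with only 10 training samples, higher-order fits produce weights
# with much larger magnitudes (the oscillations described above).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)

for M in (1, 3, 9):
    w = np.polyfit(x, t, deg=M)
    print(f"M={M}: max |w| = {np.abs(w).max():.2e}")
```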


Trend of weights with different orders


Techniques to control overfitting

Regularization term
A training error of zero does not mean that the test error will be zero; overfitting generally drives the training error to zero. Overfitting can be avoided by regularization: the regularizer smooths the oscillations by adding a penalty term that penalizes large weights, keeping them from taking extreme values.


If the training data is well behaved and available in plenty, regularization may not contribute much. At the very least, adding the term will not deteriorate the performance; hence, in practice, it is recommended to incorporate the penalty term.


Selection of model order and regularization factor

Math behind regression: learning the weight vector

We are given $N$ training samples $x_1, x_2, \ldots, x_N$ of a single dimension, with scalar target values $t_1, t_2, \ldots, t_N$:

$(x_1, t_1), (x_2, t_2), \ldots, (x_N, t_N)$

Define $(M+1)$ non-linear basis functions

$\phi(x) = [\phi_0(x), \phi_1(x), \phi_2(x), \ldots, \phi_M(x)]^T$

For a polynomial fit,

$\phi(x) = [1, x, \ldots, x^M]^T$
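As a concrete illustration (the function name here is my own, not from the slides), the polynomial basis vector can be built as:

```python
# Sketch: the polynomial basis phi(x) = [1, x, x^2, ..., x^M] for a scalar x.
import numpy as np

def poly_basis(x, M):
    """Return the (M+1)-dimensional basis vector [1, x, ..., x^M]."""
    return np.array([x ** m for m in range(M + 1)])

print(poly_basis(0.5, M=3))   # [1.    0.5   0.25  0.125]
```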

Notations

Denote

$\mathbf{w} = [w_0, w_1, w_2, \ldots, w_M]^T_{(M+1)\times 1}$ (vector of weights)

$\mathbf{t} = [t_1, t_2, \ldots, t_N]^T_{N\times 1}$ (vector of targets)

$\mathbf{y} = [y_1, y_2, \ldots, y_N]^T_{N\times 1}$ (vector of predicted values)

Predicted value

$y_1 = \phi^T(x_1)\mathbf{w}$
$y_2 = \phi^T(x_2)\mathbf{w}$
$\vdots$
$y_N = \phi^T(x_N)\mathbf{w}$

Linear combinations of these basis functions give the predicted value for each of the samples. The idea is to learn the $(M+1)$ weights.

Error function

$\begin{bmatrix} \phi^T(x_1) \\ \phi^T(x_2) \\ \vdots \\ \phi^T(x_N) \end{bmatrix}_{N\times(M+1)} \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_M \end{bmatrix}_{(M+1)\times 1} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}_{N\times 1}$

Let

$\Phi = \begin{bmatrix} \phi^T(x_1) \\ \phi^T(x_2) \\ \vdots \\ \phi^T(x_N) \end{bmatrix}_{N\times(M+1)}$

We can write $\mathbf{y} = \Phi\mathbf{w}$.

Error function:

$E(\mathbf{w}) = \frac{1}{2}\sum_{i=1}^{N}(t_i - y_i)^2 = \frac{1}{2}(\mathbf{t} - \Phi\mathbf{w})^T(\mathbf{t} - \Phi\mathbf{w})$
Error function: quadratic

$E(\mathbf{w}) = \frac{1}{2}\sum_{i=1}^{N}(t_i - y_i)^2$
$= \frac{1}{2}(\mathbf{t} - \Phi\mathbf{w})^T(\mathbf{t} - \Phi\mathbf{w})$
$= \frac{1}{2}\left[(\mathbf{t} - \Phi\mathbf{w})^T\mathbf{t} - (\mathbf{t} - \Phi\mathbf{w})^T\Phi\mathbf{w}\right]$
$= \frac{1}{2}\left(\mathbf{t}^T\mathbf{t} - \mathbf{w}^T\Phi^T\mathbf{t} - \mathbf{t}^T\Phi\mathbf{w} + \mathbf{w}^T\Phi^T\Phi\mathbf{w}\right)$

Using $\mathbf{w}^T\Phi^T\mathbf{t} = \mathbf{t}^T\Phi\mathbf{w}$,

$E(\mathbf{w}) = \frac{1}{2}\left(\mathbf{t}^T\mathbf{t} - 2\mathbf{w}^T\Phi^T\mathbf{t} + \mathbf{w}^T\Phi^T\Phi\mathbf{w}\right)$
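The summation form and the matrix form above are the same quantity; a small numerical check, with assumed random data, makes this concrete:

```python
# Sketch: verify that 0.5 * sum_i (t_i - y_i)^2 equals 0.5 * (t - Phi w)^T (t - Phi w).
import numpy as np

rng = np.random.default_rng(0)
N, M = 10, 3
x = rng.uniform(0, 1, N)
t = np.sin(2 * np.pi * x)
w = rng.standard_normal(M + 1)

Phi = np.vander(x, M + 1, increasing=True)   # rows are phi(x_i)^T = [1, x_i, ..., x_i^M]
y = Phi @ w

E_sum = 0.5 * np.sum((t - y) ** 2)
E_mat = 0.5 * (t - y) @ (t - y)
print(np.isclose(E_sum, E_mat))              # True
```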


$\frac{\partial E(\mathbf{w})}{\partial \mathbf{w}} = \Phi^T\Phi\mathbf{w} - \Phi^T\mathbf{t}$

At the optimal weight vector, we have

$\frac{\partial E(\mathbf{w})}{\partial \mathbf{w}} = 0$
$\Phi^T\mathbf{t} - \Phi^T\Phi\mathbf{w} = 0$
$\mathbf{w} = (\Phi^T\Phi)^{-1}\Phi^T\mathbf{t}$

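In code, this closed-form solution is usually computed with a linear solve rather than an explicit matrix inverse; a sketch with assumed toy data:

```python
# Sketch: w = (Phi^T Phi)^(-1) Phi^T t via the normal equations.
import numpy as np

rng = np.random.default_rng(0)
N, M = 20, 3
x = rng.uniform(0, 1, N)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(N)

Phi = np.vander(x, M + 1, increasing=True)        # N x (M+1) design matrix
w_star = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)  # solves (Phi^T Phi) w = Phi^T t
print(w_star)
```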

With regularization

$E(\mathbf{w}) = \frac{1}{2}\sum_{i=1}^{N}\left(t_i - \mathbf{w}^T\phi(x_i)\right)^2 + \frac{\lambda}{2}\mathbf{w}^T\mathbf{w}$

It can be shown that:

$\mathbf{w}^* = (\Phi^T\Phi + \lambda I)^{-1}\Phi^T\mathbf{t}$
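A corresponding sketch for the regularized solution; the value of the regularization factor passed as lam is an assumed hyperparameter:

```python
# Sketch: regularized weights w* = (Phi^T Phi + lambda I)^(-1) Phi^T t.
import numpy as np

def ridge_weights(Phi, t, lam):
    """Closed-form regularized least-squares weights."""
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ t)

# e.g. w_star = ridge_weights(Phi, t, lam=1e-3)
```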

Extension to high-dimensional input features

$y(\mathbf{x}, \mathbf{w}) = w_0 + w_1 x_1 + \ldots + w_M x_M$

Let

$\mathbf{w} = [w_0, w_1, w_2, \ldots, w_M]^T_{(M+1)\times 1}$ and $\phi(\mathbf{x}) = [1, x_1, x_2, \ldots, x_M]^T_{(M+1)\times 1}$

so that

$y(\mathbf{x}, \mathbf{w}) = \mathbf{w}^T\phi(\mathbf{x})$

where $\mathbf{x}$ is an $M$-dimensional training feature vector.

Predicted value

$y_1 = \phi^T(\mathbf{x}_1)\mathbf{w}$
$y_2 = \phi^T(\mathbf{x}_2)\mathbf{w}$
$\vdots$
$y_N = \phi^T(\mathbf{x}_N)\mathbf{w}$

Linear combinations of the basis functions $\phi(\mathbf{x})$ give the predicted value for each of the samples. The idea is to learn the $(M+1)$ weights.

We can derive a similar expression as before:

$\mathbf{w}^* = (\Phi^T\Phi + \lambda I)^{-1}\Phi^T\mathbf{t}$

where

$\Phi = \begin{bmatrix} \phi^T(\mathbf{x}_1) \\ \phi^T(\mathbf{x}_2) \\ \vdots \\ \phi^T(\mathbf{x}_N) \end{bmatrix}_{N\times(M+1)}$
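For M-dimensional inputs the only change is how the design matrix is built; a sketch with assumed data and my own helper name:

```python
# Sketch: design matrix for M-dimensional inputs with phi(x) = [1, x_1, ..., x_M],
# followed by the same regularized closed-form solution.
import numpy as np

def design_matrix(X):
    """X is N x M; return the N x (M+1) matrix whose rows are [1, x_1, ..., x_M]."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))                      # N = 50 samples, M = 4 features (assumed)
t = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(50)

Phi = design_matrix(X)
lam = 0.1                                             # assumed regularization factor
w_star = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ t)
print(w_star)                                         # bias followed by the 4 weights
```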


Later we will revisit the regression problem and derive the weights (a) from a probabilistic framework and (b) using optimization techniques (gradient descent).

We will now switch gears to explore the generative model of classifiers: parametric and non-parametric approaches.
