Team 25
M Sriram (EE16B027), P Venkat (CS16B017)
Guide: Prashanth LA
Abstract
A detailed report on Programming Assignment 1, consisting of an analysis for selecting the best model for a Bayes classifier and for linear regression.
Part I
1 Question 1
1.1 Decision boundary and Decision Surface
1.3 Conclusion
Table 1: Classification accuracy (%) for models a–e

Dataset1     a       b       c       d       e
Training     68.60   96.83   96.86   96.79   96.86
Validation   66.67   96.00   96.00   95.56   96.00

Dataset2     a       b       c       d       e
Training     66.63   85.56   87.08   86.44   90.48
Validation   70.37   73.33   86.96   76.15   90.96
From the training and validation accuracies in Table 1 we observe that model e attains the highest accuracy, so model e is the best model.
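The selection rule above can be sketched in a few lines; the accuracies are the Dataset2 validation numbers from Table 1 (rounded), and the labels a–e are the assignment's model names.

```python
# Minimal sketch of the selection rule: choose the model with the
# highest validation accuracy. Values are the Dataset2 validation
# accuracies from Table 1 (rounded); labels a-e are the model names.
val_acc = {"a": 70.37, "b": 73.33, "c": 86.96, "d": 76.15, "e": 90.96}
best_model = max(val_acc, key=val_acc.get)
print(best_model)  # -> e
```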
2 Question 2
2.1 Plot of classification accuracies for the given 5 training sets with standard error bars
Training set sizes: 100, 500, 1000, 2000, and 4000 points.
The accuracy is about 85% for approximately 25 points.
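Each error bar can be computed as the standard error of the classification accuracy over repeated training sets of the same size; a minimal sketch, where the five accuracy values are hypothetical placeholders rather than the assignment's actual numbers:

```python
import numpy as np

# Sketch of the error-bar computation: standard error of the
# classification accuracy over repeated training sets of one size.
# The five accuracy values are hypothetical placeholders.
acc = np.array([84.2, 85.1, 86.0, 84.8, 85.4])
mean_acc = acc.mean()
std_err = acc.std(ddof=1) / np.sqrt(len(acc))
```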
3 Question 3
3.1 Training error for only one feature, only two features, and all three features
Table 2: Training error (%) for one, two, and three features

           one feature   two features   three features
Dataset3   8.48          5.76           9.05
Dataset4   12.57         11.90          8.10
3.2 Conclusion:
In Dataset 3 the determinant of the covariance matrix for the three-feature case is almost 0. This implies that the three features are not independent of one another, so the training error increases when we fit a normal density over 3 features instead of 2. Thus, training error can increase as the number of features grows, because the added features can be dependent on the existing ones.
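The dependence argument can be illustrated on synthetic data: if the third feature is an exact linear combination of the first two, the 3×3 covariance matrix is singular and its determinant is numerically zero.

```python
import numpy as np

# Sketch of the dependence argument: a third feature that is a linear
# combination of the first two makes the covariance matrix singular,
# so its determinant is (numerically) zero. Data is synthetic.
rng = np.random.default_rng(0)
x1 = rng.normal(size=1000)
x2 = rng.normal(size=1000)
x3 = 2.0 * x1 - x2                    # exactly dependent third feature
X = np.stack([x1, x2, x3], axis=1)
det3 = np.linalg.det(np.cov(X, rowvar=False))
```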
4 Question 4
4.1 Estimation of sv
sv is estimated from the given dataset by fixing the mean at -1.
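A possible sketch of this estimate, assuming sv denotes the standard deviation of a Gaussian whose mean is fixed at -1 (the report does not define sv explicitly, so this interpretation and the data below are assumptions):

```python
import numpy as np

# Hedged sketch: assuming sv is the standard deviation of a Gaussian
# with known mean -1 (an assumption; sv is not defined in the report),
# the maximum-likelihood estimate averages squared deviations from -1.
x = np.array([-1.5, -0.5, -2.0, 0.0, -1.0])  # placeholder data
mu = -1.0
sv = np.sqrt(np.mean((x - mu) ** 2))
```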
4.3 Plotting estimated densities
5 Question 5
Polynomial regression of degrees 1, 3, 5, and 9 for sin(2πx) + tanh(2πx) with Gaussian noise.
For a training set of 10 points we obtain these polynomials:
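The fitting step can be sketched with least-squares polynomial fits; the noise level and random seed here are illustrative choices, not the assignment's exact settings.

```python
import numpy as np

# Sketch of the fitting step: least-squares polynomial fits of degrees
# 1, 3, 5 and 9 to 10 noisy samples of sin(2*pi*x) + tanh(2*pi*x).
# Noise level 0.1 and the seed are illustrative.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + np.tanh(2 * np.pi * x) + rng.normal(0.0, 0.1, 10)
fits = {d: np.polyfit(x, y, d) for d in (1, 3, 5, 9)}
```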
Figure 6: polynomial regression
Ridge Regression for degree 9
RMS Error and Scatter plot
Table 3: Fitted polynomial coefficients (w0, w1, ..., wd) for each degree

Degree 1:  1.440, -1.187
Degree 3:  -0.419, 21.453, -57.346, 38.460
Degree 6:  -2.035, 64.553, -348.266, 793.288, -798.942, 248.107, 48.006
Degree 9:  22.376, -1452.3, 2.961e+04, -2.83e+05, 1.51e+06, -4.81e+06, 9.36e+06, -1.08e+07, 6.83e+06
Conclusion
The best model is degree 3.
If the training set size increases, overfitting decreases.
For low values of λ, increasing λ reduces overfitting.
For high values of λ, increasing λ further makes the fit too smooth, so the error rises again (underfitting).
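The shrinkage effect behind these conclusions can be seen with the closed-form ridge solution for a degree-9 polynomial; the data and λ values here are illustrative.

```python
import numpy as np

# Sketch of ridge regression for a degree-9 polynomial: the
# closed-form solution (X^T X + lam I)^{-1} X^T y. Larger lam
# shrinks the weight vector, trading variance for bias.
def ridge_poly_fit(x, y, degree, lam):
    X = np.vander(x, degree + 1, increasing=True)   # columns 1, x, x^2, ...
    return np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, 10)
w_small_lam = ridge_poly_fit(x, y, 9, 1e-8)
w_large_lam = ridge_poly_fit(x, y, 9, 10.0)
```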
6 Question 6
6.1 Empirical risk histogram plots for different λ and different degrees
Figure 10:
6.2 Conclusion
If the degree of the polynomial is increased, the bias decreases and the variance increases. In the histograms, the width corresponds to the bias and the height to the variance, so as the degree of the polynomial increases the histogram becomes narrower.
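The experiment behind these histograms can be sketched as: refit the same model on many independent training sets and collect the resulting empirical risks, whose spread the histograms visualise. Degree, sample sizes, and the target function here are illustrative.

```python
import numpy as np

# Sketch of the histogram experiment: fit a degree-3 polynomial on many
# independent noisy training sets and record the empirical risk against
# the true function. The collected risks are what gets histogrammed.
rng = np.random.default_rng(0)
risks = []
for _ in range(200):
    x = np.sort(rng.uniform(0.0, 1.0, 20))
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, 20)
    w = np.polyfit(x, y, 3)
    x_test = np.linspace(0.0, 1.0, 100)
    risks.append(np.mean((np.polyval(w, x_test) - np.sin(2 * np.pi * x_test)) ** 2))
risks = np.array(risks)
```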
7 Question 7
Linear regression with Gaussian basis functions
Predicted models for varying training set sizes
Figure 11:
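A minimal sketch of Gaussian-basis regression, with features φ_j(x) = exp(-(x - c_j)² / (2s²)); the centers, width s, λ = 0.01, and the synthetic data are illustrative assumptions, not the assignment's settings.

```python
import numpy as np

# Sketch of linear regression with a Gaussian basis:
# phi_j(x) = exp(-(x - c_j)^2 / (2 s^2)). Centers, width s,
# and lam = 0.01 are illustrative; data is synthetic.
def gaussian_design(x, centers, s):
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * s ** 2))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, 200)
centers = np.linspace(0.0, 1.0, 10)
Phi = gaussian_design(x, centers, s=0.1)
lam = 0.01
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(len(centers)), Phi.T @ y)
rms = np.sqrt(np.mean((Phi @ w - y) ** 2))
```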
Varying the regularisation coefficient for the best model (training size 2000)
Figure 12:
Scatter plots for the model trained on 2000 points with λ = 0.01, on the training and test sets
Figure 13: Training and test
RMS Error
Table 4: RMS error for varying training set sizes (λ = 0.01)

                 Train20   Train100   Train1000   Train2000
Training error   29.90     28.52      32.99       32.00
Test error       54.56     32.66      31.20       31.19