
CS5691: Programming Assignment 1

15th March 2019

Team 25

M Sriram (EE16B027), P Venkat (CS16B017)

Indian Institute of Technology, Madras

Guide: Prashanth LA
Abstract
A detailed report on programming assignment 1. It consists of an analysis of
selecting the best model for the Bayes classifier and for linear regression.

1 Question 1 - Bayes Classification

Figure 1: Confusion matrix for the test data

Part I

1.1 Decision boundary and Decision Surface

Figure 2: Decision surface and boundary

1.2 Contour curves and eigenvectors

Figure 3: Contour curves and eigenvectors

1.3 Conclusion

Table 1: Classification accuracy (%) for models a-e
Dataset1       a        b        c        d        e
Training       68.60    96.83    96.86    96.79    96.86
Validation     66.67    96.00    96.00    95.56    96.00
Dataset2       a        b        c        d        e
Training       66.63    85.56    87.08    86.44    90.48
Validation     70.37    73.33    86.96    76.15    90.96

From the accuracy table for the training and validation data we can observe
that model e has the highest accuracy on both datasets, so model e is the best.
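The accuracies above and the confusion matrix in Figure 1 follow from comparing predicted labels with true labels. A minimal sketch of that computation (the label arrays below are hypothetical, for illustration only):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows = true class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def accuracy(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    return 100.0 * np.mean(np.asarray(y_true) == np.asarray(y_pred))

# Hypothetical labels for a 3-class problem
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
print(cm)
print(accuracy(y_true, y_pred))  # 4 of 6 correct
```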

2 Question 2
2.1 Plot of classification accuracies for the given 5 training
sets with standard error bars
The training set sizes are 100, 500, 1000, 2000 and 4000 points.
The accuracy is 85% for approximately 25 points.

Figure 4: Classification accuracies
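Each point in Figure 4 is a mean over the 5 training sets, and the error bar is the standard error of that mean. A sketch of the computation (the accuracy values below are hypothetical placeholders):

```python
import numpy as np

# Hypothetical accuracies (%): rows = 5 training sets, cols = sizes 100..4000
sizes = [100, 500, 1000, 2000, 4000]
acc = np.array([
    [80.1, 84.2, 86.0, 87.1, 88.0],
    [79.5, 83.8, 85.5, 86.9, 87.8],
    [81.0, 84.5, 86.2, 87.3, 88.1],
    [78.9, 83.1, 85.0, 86.5, 87.6],
    [80.4, 84.0, 85.8, 87.0, 87.9],
])

mean_acc = acc.mean(axis=0)
# Standard error of the mean: sample std / sqrt(number of runs)
std_err = acc.std(axis=0, ddof=1) / np.sqrt(acc.shape[0])

# With matplotlib this would be: plt.errorbar(sizes, mean_acc, yerr=std_err)
for s, m, e in zip(sizes, mean_acc, std_err):
    print(f"n={s}: {m:.2f}% +/- {e:.2f}")
```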

3 Question 3
3.1 Training error for only one feature, only two features,
and all three features

Table 2: Training error (%) for one, two and three features
            one feature   two features   three features
Dataset3    8.48          5.76           9.05
Dataset4    12.57         11.90          8.10

3.2 Conclusion:
In Dataset 3 the determinant of the covariance matrix for the three-feature
case is almost 0. This implies that the three features are not jointly
independent, so the training error increases when we fit a normal distribution
over 3 features instead of 2.
This shows that the training error can increase as the number of features
increases, because the added features can be dependent on the existing ones.
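This near-singularity can be checked directly: if one feature is (approximately) a linear combination of the others, the covariance determinant collapses towards 0. A small numpy sketch with synthetic data (the features and noise scale below are illustrative, not the assignment's dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=1000)
x2 = rng.normal(size=1000)
# Third feature is almost a linear combination of the first two
x3 = 2.0 * x1 - x2 + rng.normal(scale=1e-3, size=1000)

X = np.stack([x1, x2, x3], axis=1)
cov = np.cov(X, rowvar=False)
det_cov = np.linalg.det(cov)
print(det_cov)  # close to 0 -> the three features are not jointly independent
```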

4 Question 4
4.1 Estimation of σ²
σ² is estimated from the given dataset by taking the mean to be −1.

4.2 Bayesian estimation of mean.


The mean is estimated from the formulas
µ_n = (nσ_0² / (nσ_0² + σ²)) x̄_n + (σ² / (nσ_0² + σ²)) µ_0
σ_n² = σ_0² σ² / (nσ_0² + σ²)
p(x|D) = N(µ_n, σ² + σ_n²)
where x̄_n is the sample mean of the n observations.
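The posterior update above can be sketched as follows (the data, prior mean µ_0 and the two variances below are hypothetical; σ² is assumed known):

```python
import numpy as np

def bayes_mean_posterior(x, mu0, sigma0_sq, sigma_sq):
    """Posterior N(mu_n, sigma_n^2) for the mean of N(mu, sigma^2),
    under a N(mu0, sigma0^2) prior, given observations x.
    The predictive density is then N(mu_n, sigma^2 + sigma_n^2)."""
    n = len(x)
    xbar = np.mean(x)
    denom = n * sigma0_sq + sigma_sq
    mu_n = (n * sigma0_sq / denom) * xbar + (sigma_sq / denom) * mu0
    sigma_n_sq = sigma0_sq * sigma_sq / denom
    return mu_n, sigma_n_sq

# Hypothetical data for illustration
x = np.array([-1.2, -0.8, -1.1, -0.9])
mu_n, sigma_n_sq = bayes_mean_posterior(x, mu0=0.0, sigma0_sq=1.0, sigma_sq=1.0)
print(mu_n, sigma_n_sq)
```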

4.3 Plotting estimated densities

Figure 5: Estimated density plots

5 Question 5
Polynomial regression for degrees 1, 3, 5 and 9 on sin(2πx) + tanh(2πx)
with Gaussian noise.
For a training set of 10 points we obtain the following polynomial fits.
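A sketch of the fit, assuming the target sin(2πx) + tanh(2πx) with additive Gaussian noise and a plain least-squares polynomial fit via numpy.polyfit (the noise scale and random seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
x = np.linspace(0.0, 1.0, n)
t = np.sin(2 * np.pi * x) + np.tanh(2 * np.pi * x) + rng.normal(scale=0.1, size=n)

fits = {}
for degree in (1, 3, 5, 9):
    w = np.polyfit(x, t, degree)   # least-squares coefficients, highest power first
    fits[degree] = w
    pred = np.polyval(w, x)
    rms = np.sqrt(np.mean((pred - t) ** 2))
    print(f"degree {degree}: training RMS = {rms:.4f}")
```

With 10 points the degree-9 polynomial can interpolate the training data, so its training RMS is near zero even though it generalizes poorly.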

Figure 6: Polynomial regression

Ridge Regression for degree 9

Figure 7: Ridge regression
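Ridge regression adds a penalty λ‖w‖² that shrinks the weights; a sketch under the same setup, solving the regularized normal equations (the λ values and data below are hypothetical):

```python
import numpy as np

def ridge_polyfit(x, t, degree, lam):
    """Solve (Phi^T Phi + lam*I) w = Phi^T t for polynomial basis weights."""
    Phi = np.vander(x, degree + 1, increasing=True)   # columns [1, x, x^2, ...]
    A = Phi.T @ Phi + lam * np.eye(degree + 1)
    return np.linalg.solve(A, Phi.T @ t)

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + np.tanh(2 * np.pi * x) + rng.normal(scale=0.1, size=10)

for lam in (1e-9, 1e-6, 1e-3, 1.0):
    w = ridge_polyfit(x, t, degree=9, lam=lam)
    print(f"lambda={lam:g}: max |w| = {np.abs(w).max():.3g}")
```

Larger λ pushes the degree-9 weights away from the huge magnitudes seen in the unregularized fit (compare Table 3).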

Varying the training data size for degree 9

Figure 8: Varying the training data size

RMS Error and Scatter plot

Figure 9: Scatter plot for the best model and RMS error

Table of w values for all models

Table 3: Weight vectors w for each fitted degree
Degree 1: 1.440, -1.187
Degree 3: -0.419, 21.453, -57.346, 38.460
Degree 6: -2.035, 64.553, -348.266, 793.288, -798.942, 248.107, 48.006
Degree 9: 22.376, -1452.3, 2.961e+04, -2.83e+05, 1.51e+06, -4.81e+06, 9.36e+06, -1.08e+07, 6.83e+06

Conclusion
The best model is degree 3.
If the training data size increases, then overfitting decreases.
For low values of λ, increasing λ decreases overfitting.
For high values of λ, increasing λ further degrades the fit, as the model
starts to underfit.

6 Question 6
6.1 Empirical risk histogram plots for different λ and
different degrees

Figure 10: Empirical risk histograms

6.2 Conclusion
If the degree of the polynomial is increased, then the bias decreases and the
variance increases.
The width of the histogram corresponds to the bias and the height to the
variance, so if the degree of the polynomial increases, the histogram becomes
narrower.
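Histograms like these come from repeating the fit over many independently sampled training sets and collecting the empirical (training) risks. A sketch of that procedure (the target function, dataset size, noise level and repeat count are hypothetical choices):

```python
import numpy as np

rng = np.random.default_rng(2)

def empirical_risks(degree, n_repeats=200, n_points=15, noise=0.2):
    """Fit n_repeats independently sampled datasets; return the training risks."""
    risks = []
    for _ in range(n_repeats):
        x = rng.uniform(0.0, 1.0, n_points)
        t = np.sin(2 * np.pi * x) + rng.normal(scale=noise, size=n_points)
        w = np.polyfit(x, t, degree)
        pred = np.polyval(w, x)
        risks.append(np.mean((pred - t) ** 2))   # empirical (training) risk
    return np.array(risks)

risks_low = empirical_risks(degree=1)
risks_high = empirical_risks(degree=7)
# Higher degree -> lower empirical risk on the training set (lower bias)
print(risks_low.mean(), risks_high.mean())
```

Plotting each array with a histogram (e.g. matplotlib's hist) reproduces the qualitative behaviour described above.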

7 Question 7
Linear regression with Gaussian basis functions.
Predicted models for varying training sizes.
Figure 11: Predicted models for varying training size
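A sketch of linear regression with Gaussian basis functions (the target function, basis centres, width and λ below are hypothetical choices, not the assignment's settings):

```python
import numpy as np

def gaussian_design(x, centres, width):
    """Phi[i, j] = exp(-(x_i - c_j)^2 / (2*width^2)), plus a bias column."""
    Phi = np.exp(-((x[:, None] - centres[None, :]) ** 2) / (2 * width ** 2))
    return np.hstack([np.ones((len(x), 1)), Phi])

def fit(x, t, centres, width, lam):
    """Regularized least squares: (Phi^T Phi + lam*I) w = Phi^T t."""
    Phi = gaussian_design(x, centres, width)
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ t)

rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, 2000)
t = np.sin(np.pi * x) + rng.normal(scale=0.3, size=2000)

centres = np.linspace(-1.0, 1.0, 10)
w = fit(x, t, centres, width=0.2, lam=0.01)
pred = gaussian_design(x, centres, 0.2) @ w
rms = np.sqrt(np.mean((pred - t) ** 2))
print(rms)  # training RMS error, roughly the noise level
```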

Varying the regularization coefficient λ for the best model (training size 2000)

Figure 12: Varying λ for the 2000-point model

Scatter plots for the model trained on 2000 points with λ = 0.01, for the
training and test sets
Figure 13: Training and test

RMS Error
Table 4: RMS error for varying training sizes
                 Train20   Train100   Train1000   Train2000
Training error   29.90     28.52      32.99       32.00
Test error       54.56     32.66      31.20       31.19
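The entries in Table 4 follow the usual RMS error definition; a minimal sketch (the sample vectors below are illustrative):

```python
import numpy as np

def rms_error(pred, target):
    """Root-mean-square error between predictions and targets."""
    pred, target = np.asarray(pred), np.asarray(target)
    return np.sqrt(np.mean((pred - target) ** 2))

print(rms_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # sqrt(4/3)
```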

