
CEE384: Numerical Methods for Engineers

Application Project 4 - Regression

clc
Dist=[2.4 1.5 2.4 1.8 1.8 2.9 1.2 3 1.2];
width=[2.9 2.1 2.3 2.1 1.8 2.7 1.5 2.9 1.5];

% plot the data


scatter(Dist,width,'r')
hold on

% fit a straight line with linear regression


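y=transpose(width); % from row vector to column vector
n=numel(Dist); % number of observations (9)
X=[ones(n,1) transpose(Dist)]; % design matrix: column of ones, then distances
format long
B=X\y; % least-squares solution: B(1) is the intercept, B(2) the slope
ycalc=X*B; % fitted widths at the observed distances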

% add the fitted line


plot(Dist,ycalc)
xlabel('Distance')
ylabel('Lane width')
title('Linear Regression Relation Between distance and lane width')
grid on

x=Dist'; %from row vector to column vector.


Sxx=sum((x-mean(x)).^2);
Syy=sum((y-mean(y)).^2);
Sxy=sum((x-mean(x)).*(y-mean(y)));
SSE=Syy-B(2)*Sxy;
S2yx=SSE/(n-2);
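
As a cross-check on the backslash solution, the slope and intercept of a simple linear regression can also be obtained in closed form from the sums Sxx and Sxy. The following is a minimal sketch that repeats the data so it runs on its own; the names b0 and b1 are introduced here for illustration and do not appear in the original script.

```matlab
% Closed-form simple linear regression, assuming the same nine data points.
Dist  = [2.4 1.5 2.4 1.8 1.8 2.9 1.2 3 1.2];
width = [2.9 2.1 2.3 2.1 1.8 2.7 1.5 2.9 1.5];
x = Dist';  y = width';

Sxx = sum((x - mean(x)).^2);
Sxy = sum((x - mean(x)).*(y - mean(y)));

b1 = Sxy/Sxx;                % slope (illustrative name)
b0 = mean(y) - b1*mean(x);   % intercept (illustrative name)

% Compare with the least-squares solution of the design-matrix system.
X = [ones(numel(x),1) x];
B = X\y;
fprintf('slope:     %.6f (closed form) vs %.6f (backslash)\n', b1, B(2));
fprintf('intercept: %.6f (closed form) vs %.6f (backslash)\n', b0, B(1));
```

The two routes should agree to rounding error, since both minimize the same sum of squared residuals.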
Standard error of estimate:

Standard_error_of_estimate = sqrt(SSE/(n-2))

Standard_error_of_estimate =

0.236108935136509
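
The shortcut SSE = Syy - B(2)*Sxy can be checked against the direct definition of SSE, the sum of squared differences between observed and fitted widths. This is a minimal sketch with the data repeated so the block runs on its own; SSE_direct, SSE_shortcut and se are names introduced here for illustration.

```matlab
% Standard error of estimate computed two ways, assuming the same data.
Dist  = [2.4 1.5 2.4 1.8 1.8 2.9 1.2 3 1.2];
width = [2.9 2.1 2.3 2.1 1.8 2.7 1.5 2.9 1.5];
x = Dist';  y = width';  n = numel(x);

X = [ones(n,1) x];
B = X\y;                                  % B(1) intercept, B(2) slope

SSE_direct = sum((y - X*B).^2);           % sum of squared residuals

Syy = sum((y - mean(y)).^2);
Sxy = sum((x - mean(x)).*(y - mean(y)));
SSE_shortcut = Syy - B(2)*Sxy;            % formula used in the script

% n-2 degrees of freedom because two parameters are estimated.
se = sqrt(SSE_direct/(n-2))
```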

scaled residual

y1 = polyfit(Dist,width,1); % y1(1) is the slope, y1(2) the intercept
predict_width = y1(1)*Dist + y1(2); % width predicted
Residual = width - predict_width

Residual =

Columns 1 through 3

0.422903033908388 0.283045806067817 -0.177096966091612

Columns 4 through 6
0.062998215348007 -0.237001784651993 -0.143842950624628

Columns 7 through 9

-0.096906603212374 -0.017192147531231 -0.096906603212374
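
The values above are raw residuals, in the same units as lane width. One common scaling divides each residual by the standard error of estimate. The sketch below assumes that definition (some texts standardize differently, for example with a leverage correction) and repeats the data so it runs on its own; scaled_residual is a name introduced here.

```matlab
% Scaled residuals = raw residuals / standard error of estimate (assumed definition).
Dist  = [2.4 1.5 2.4 1.8 1.8 2.9 1.2 3 1.2];
width = [2.9 2.1 2.3 2.1 1.8 2.7 1.5 2.9 1.5];

p = polyfit(Dist,width,1);               % p(1) slope, p(2) intercept
Residual = width - polyval(p,Dist);      % raw residuals

n  = numel(Dist);
se = sqrt(sum(Residual.^2)/(n-2));       % standard error of estimate

scaled_residual = Residual/se
```

Scaled residuals are dimensionless, which makes it easier to judge whether any single observation sits unusually far from the fitted line.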

coefficient of determination

coefficient_of_determination = Sxy^2/(Sxx*Syy)

coefficient_of_determination =

0.837403331350387
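
For a straight-line fit with an intercept, the coefficient of determination equals the squared correlation coefficient between the two variables, and also 1 - SSE/Syy. The sketch below checks both forms, assuming the same data; r2_corr and r2_sse are illustrative names.

```matlab
% Two equivalent ways to obtain R^2 for a simple linear fit.
Dist  = [2.4 1.5 2.4 1.8 1.8 2.9 1.2 3 1.2];
width = [2.9 2.1 2.3 2.1 1.8 2.7 1.5 2.9 1.5];

R = corrcoef(Dist,width);                 % 2x2 correlation matrix
r2_corr = R(1,2)^2;                       % squared correlation coefficient

p   = polyfit(Dist,width,1);
SSE = sum((width - polyval(p,Dist)).^2);  % unexplained variation
Syy = sum((width - mean(width)).^2);      % total variation
r2_sse = 1 - SSE/Syy;

fprintf('R^2 from corrcoef: %.6f, from 1 - SSE/Syy: %.6f\n', r2_corr, r2_sse);
```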

lane width when average safe distance is 2 m

x_new = 2; % distance of interest, in m (separate name so the design matrix X is not overwritten)
ycalc=B(2)*x_new+B(1);
Lane_width = ycalc

Lane_width =

2.183700178465199
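
The same prediction can be made with polyfit and polyval, which avoids writing out the line equation by hand. This is a minimal sketch assuming the same data; p and d_query are illustrative names.

```matlab
% Predict lane width at a given distance from the fitted line.
Dist  = [2.4 1.5 2.4 1.8 1.8 2.9 1.2 3 1.2];
width = [2.9 2.1 2.3 2.1 1.8 2.7 1.5 2.9 1.5];

p = polyfit(Dist,width,1);    % p(1) slope, p(2) intercept
d_query = 2;                  % distance of interest, in m
Lane_width = polyval(p,d_query)

% The query lies inside the observed range of distances, so this is
% interpolation rather than extrapolation.
inside_range = d_query >= min(Dist) && d_query <= max(Dist)
```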
Discuss the adequacy of your regression

• Linearity: there is a linear relationship between the x and y variables, as the
scatter plot shows.

• Normality: this is tested by examining the normality of the residuals. The
histogram indicates the residuals are skewed, which means that the assumption is
not satisfied.
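• No multicollinearity: multiple regression assumes that the independent
variables are not highly correlated with each other. This assumption is tested
using tolerance or Variance Inflation Factor (VIF) values.

Tolerance, T = 1 - R^2, where R^2 is normally taken from regressing one independent variable on the others

Here the model R^2 is used: R^2 = 0.8374

T = 1 - 0.8374 = 0.1626

With T > 0.1 there is no multicollinearity in the data, hence this assumption is met. The model also has only one independent variable (distance), so there are no other predictors for it to be correlated with.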

• Homoscedasticity: a plot of standardized residuals versus predicted values
shows the points evenly distributed across all values of the independent
variable. Hence, the assumption is met.
Therefore the majority of the assumptions are met for this system.
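
The residual checks discussed above can be reproduced with two plots: a histogram of the residuals for normality, and residuals against predicted values for homoscedasticity. The following is a minimal sketch, assuming the residuals are scaled by the standard error of estimate and that the histogram function is available (MATLAB R2014b or later); the names fitted and scaled are introduced here.

```matlab
% Diagnostic plots for the linear fit, assuming the same nine data points.
Dist  = [2.4 1.5 2.4 1.8 1.8 2.9 1.2 3 1.2];
width = [2.9 2.1 2.3 2.1 1.8 2.7 1.5 2.9 1.5];

p        = polyfit(Dist,width,1);
fitted   = polyval(p,Dist);                          % predicted widths
Residual = width - fitted;
se       = sqrt(sum(Residual.^2)/(numel(Dist)-2));   % standard error of estimate
scaled   = Residual/se;                              % scaled residuals (assumed definition)

figure
subplot(1,2,1)
histogram(scaled)                  % shape of the residual distribution
xlabel('Scaled residual')
ylabel('Count')
title('Residual histogram')

subplot(1,2,2)
scatter(fitted,scaled,'r')         % spread of residuals across predictions
hold on
plot([min(fitted) max(fitted)],[0 0],'k--')   % zero reference line
xlabel('Predicted lane width')
ylabel('Scaled residual')
title('Residuals vs predicted values')
grid on
```

With only nine observations, both plots are coarse, so any reading of skewness or spread from them is tentative.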
