
Prediction

- Predicting the identity of one thing based purely on the description of another, related thing
- Not necessarily future events, just unknowns
- Based on the relationship between a thing that you can know and a thing you need to predict
- Predictor => prediction

Predictors and Prediction

Classification and Prediction

Linear Regression

- Regression analysis is a statistical methodology that is most often used for numeric prediction
- Estimating the relationship among variables

Regression Line
Linear regression consists of finding the best-fitting straight line through the points. The best-fitting line is called a regression line.

Simple Regression
In simple linear regression, we predict scores on one variable
from the scores on a second variable.
The variable we are predicting is called the criterion variable
and is referred to as Y. The variable we are basing our
predictions on is called the predictor variable and is referred
to as X.
When there is only one predictor variable, the prediction
method is called simple regression. In simple linear
regression, the topic of this section, the predictions of Y
when plotted as a function of X form a straight line.

Table 1: Actual Data

X     Y
1.00  1.00
2.00  2.00
3.00  1.30
4.00  3.75
5.00  2.25

Standard Deviation
Table 2: Statistics for computing the regression line

Mx    My    Sx      Sy      r
3.00  2.06  1.5811  1.0720  0.627

Sx = sqrt(Σ(X - Mx)² / (n - 1))
Sy = sqrt(Σ(Y - My)² / (n - 1))

Correlation
r = Σxy / sqrt(Σx² · Σy²)
where x = X - Mx and y = Y - My are deviation scores.
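As an illustrative sketch (the slides themselves contain no code), the Table 2 statistics can be reproduced with Python's standard-library `statistics` module:

```python
# Recomputing the Table 2 statistics for the example data.
# Note: Y = 3.75 and 2.25 in the last two rows match the deviation
# scores y = 1.69 and 0.19 shown in Table 3.
import statistics

X = [1.00, 2.00, 3.00, 4.00, 5.00]
Y = [1.00, 2.00, 1.30, 3.75, 2.25]

Mx = statistics.mean(X)   # mean of X
My = statistics.mean(Y)   # mean of Y
Sx = statistics.stdev(X)  # sample standard deviation of X
Sy = statistics.stdev(Y)  # sample standard deviation of Y

# Pearson correlation from deviation scores: r = Σxy / sqrt(Σx² · Σy²)
x = [xi - Mx for xi in X]
y = [yi - My for yi in Y]
r = sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) * sum(b * b for b in y)) ** 0.5

print(round(Mx, 2), round(My, 2), round(Sx, 4), round(Sy, 4), round(r, 3))
# → 3.0 2.06 1.5811 1.072 0.627
```

The printed values agree with Table 2.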

Table 3: Deviation scores and their products

X     Y     x = X-Mx  x² = (X-Mx)²  y = Y-My  y² = (Y-My)²  xy
1.00  1.00  -2.0      4             -1.06     1.1236        2.12
2.00  2.00  -1.0      1             -0.06     0.0036        0.06
3.00  1.30   0.0      0             -0.76     0.5776        0.00
4.00  3.75   1.0      1              1.69     2.8561        1.69
5.00  2.25   2.0      4              0.19     0.0361        0.38
Sum                   10                       4.5970        4.2500

Formula for the regression line

Y' = bX + A

where:
- Y' is the predicted score
- b is the slope of the line: b = r·(Sy/Sx) = 0.627 × (1.0720/1.5811) = 0.425
- A is the Y intercept: A = My - b·Mx = 2.06 - 0.425 × 3 = 0.785

Therefore:
Y' = 0.425X + 0.785

Table 4: Errors of prediction

X     Y     Y'     Y-Y'   (Y-Y')²
1.00  1.00  1.210  -0.21  0.044
2.00  2.00  1.635   0.37  0.133
3.00  1.30  2.060  -0.76  0.578
4.00  3.75  2.485   1.27  1.600
5.00  2.25  2.910  -0.66  0.436

Errors of Prediction
The error of prediction for a point is the difference between its actual and predicted scores, Y - Y'. For example, the first point has a Y of 1.00 and a predicted Y' of 1.21. Therefore, its error of prediction is -0.21.

Table 5: Predicted Data

X     Y'
1.00  1.21
2.00  1.64
3.00  2.06
4.00  2.49
5.00  2.91
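The predicted scores, the errors of prediction, and their sum of squares can be reproduced with a short sketch (illustrative, not from the slides):

```python
# Predicted scores and errors of prediction for the fitted line
# Y' = 0.425X + 0.785, using the example data.
X = [1.00, 2.00, 3.00, 4.00, 5.00]
Y = [1.00, 2.00, 1.30, 3.75, 2.25]

b, A = 0.425, 0.785
Y_pred = [b * xi + A for xi in X]                 # predicted scores Y'
errors = [yi - yp for yi, yp in zip(Y, Y_pred)]   # errors of prediction Y - Y'
sse = sum(e * e for e in errors)                  # sum of squared errors

for xi, yp, e in zip(X, Y_pred, errors):
    print(f"{xi:.2f}  {yp:.3f}  {e:+.2f}")
print(f"SSE = {sse:.3f}")  # → SSE = 2.791
```

The per-row values match the Y', Y-Y', and (Y-Y')² columns shown above.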

Linear regression consists of finding the best-fitting straight line through the points. The best-fitting line is called a regression line.

The black diagonal line in the figure is the regression line and consists of the predicted score on Y for each possible value of X.
The vertical lines from the points to the regression line represent the errors of prediction.
As you can see, the red point is very near the regression line; its error of prediction is small.
By contrast, the yellow point is much higher than the regression line, and therefore its error of prediction is large.
You may have noticed that we did not specify what is meant by "best-fitting line."
By far the most commonly used criterion for the best-fitting line is the line that minimizes the sum of the squared errors of prediction. That is the criterion that was used to find the line in the figure.
The sum of the squared errors of prediction shown in Table 4 is lower than it would be for any other straight line.
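As a sanity check of the least-squares criterion (a sketch, not from the slides), perturbing the fitted slope or intercept in either direction increases the sum of squared errors:

```python
# The fitted line Y' = 0.425X + 0.785 minimizes the sum of squared
# errors of prediction: every nearby line has a larger SSE.
X = [1.00, 2.00, 3.00, 4.00, 5.00]
Y = [1.00, 2.00, 1.30, 3.75, 2.25]

def sse(b, A):
    """Sum of squared errors of prediction for the line Y' = bX + A."""
    return sum((yi - (b * xi + A)) ** 2 for xi, yi in zip(X, Y))

best = sse(0.425, 0.785)
for db, dA in [(0.05, 0.0), (-0.05, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    # each perturbed line fits strictly worse than the regression line
    assert sse(0.425 + db, 0.785 + dA) > best
print(f"minimum SSE = {best:.3f}")  # → minimum SSE = 2.791
```

Because SSE is a quadratic function of b and A with its minimum at the least-squares estimates, any other choice of slope or intercept gives a strictly larger sum of squared errors.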
