
years   year since 1995 (x)   income (y)   income based on model   e (residual value)
1995    0                     53807        52847                   960
1996    1                     55217        54729.3                 487.7
1997    2                     55209        56611.6                 -1402.6
1998    3                     55415        58493.9                 -3078.9
1999    4                     63100        60376.2                 2723.8
2000    5                     63206        62258.5                 947.5
2001    6                     63761        64140.8                 -379.8
2002    7                     65766        66023.1                 -257.1

The year and x columns continue through 2014 (x = 8 to 19) with no observed income; the only entry in that range is the model's predicted income of 81081.5 for 2010 (x = 15).

Residual and deviation calculations:

e^2          x − x̄   y − ȳ       (x − x̄)^2   (y − ȳ)^2
921600       -3.5     -5628.13    12.25        31675791
237851.29    -2.5     -4218.13    6.25         17792579
1967286.76   -1.5     -4226.13    2.25         17860133
9479625.21   -0.5     -4020.13    0.25         16161405
7419086.44   0.5      3664.875    0.25         13431309
897756.25    1.5      3770.875    2.25         14219498
144248.04    2.5      4325.875    6.25         18713195
66100.41     3.5      6330.875    12.25        40079978

RSS (residual sum of squares) = sum of e^2 = 21133554
RSE (residual standard error, from the residual sum of squares) = 1876.7683, the standard deviation of e
Σ(x − x̄)^2 = 42
Percentage error = RSE / mean income = 0.0315768 = 3.158%
Correlation (x, y): computed on the correlation worksheet further below.
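
These columns can be reproduced outside the spreadsheet; a minimal Python/numpy sketch (my own illustration, not part of the worksheet):

import numpy as np

x = np.arange(8)                                   # years since 1995 (0..7)
y = np.array([53807, 55217, 55209, 55415,
              63100, 63206, 63761, 65766], float)  # observed income

slope, intercept = np.polyfit(x, y, 1)             # least-squares line (~1882.25, ~52847.25)
fitted = intercept + slope * x                     # "income based on model"
resid = y - fitted                                 # e (residual value)

rss = np.sum(resid ** 2)                           # residual sum of squares (~21133554)
rse = np.sqrt(rss / (len(x) - 2))                  # residual standard error (~1876.8)
pct_error = rse / y.mean()                         # percentage error (~3.16%)

print(slope, intercept, rss, rse, pct_error)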

[Chart: income vs. years (1995 to 2002), observed income with a linear trendline; f(x) = 1882.25x + 52847.25, R² = 0.87563661.]
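
As a usage example of the trendline (my arithmetic, using the chart's coefficients): the predicted income for 2010, i.e. x = 15, is

\[
f(15) = 1882.25 \times 15 + 52847.25 = 81081.0,
\]

which agrees with the 81081.5 shown in the worksheet's model column up to the rounding of the coefficients used there.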

How accurate is the sample mean as an estimate of the population mean?

How accurate are β̂₀ and β̂₁ as estimates of β₀ and β₁? We have established that the average of the β̂₀'s and β̂₁'s over many data sets will be very close to β₀ and β₁, but that a single estimate of β̂₀ or β̂₁ may be a substantial underestimate or overestimate of β₀ and β₁. How far off will that single estimate be? In general, we answer this question by computing the standard errors SE(β̂₀) and SE(β̂₁).

Worksheet cells:
S.E.(β̂₀)^2   2052.956436661   45.30956
S.E.(β̂₁)^2   14031839605659   3745910
var           1876.768250655

Variance: in general, σ² is not known, but it can be estimated from the data. This estimate of σ is known as the residual standard error.

If SE(β̂₁) is small, then even relatively small values of β̂₁ may provide strong evidence that β₁ ≠ 0, and hence that there is a relationship between X and Y. In contrast, if SE(β̂₁) is large, then β̂₁ must be large in absolute value in order for us to reject the null hypothesis.
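
For reference, the standard simple linear regression formulas behind these standard-error cells (textbook formulas, not taken from the worksheet) are

\[
\mathrm{SE}(\hat{\beta}_0)^2 = \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right],
\qquad
\mathrm{SE}(\hat{\beta}_1)^2 = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2},
\]

where σ² = Var(ε) and, in practice, σ is replaced by its estimate RSE = √(RSS/(n − 2)).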

t-statistic

t = (β̂₁ − 0) / SE(β̂₁)

The t-statistic measures the number of standard deviations that β̂₁ is away from 0; the slope estimate here is β̂₁ = 1882.25. The t-distribution has a bell shape, and for values of n greater than approximately 30 it is quite similar to the normal distribution.

It is a simple matter to compute the probability of observing any value equal to |t| or larger, assuming β₁ = 0. We call this probability the p-value.

If we see a small p-value, then we can infer that there is an association between the predictor and the response. We reject the null hypothesis, that is, we declare a relationship to exist between X and Y, if the p-value is small enough.
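
A small Python sketch of this test for the income example (my own illustration, assuming numpy and scipy are available; not part of the worksheet):

import numpy as np
from scipy import stats

x = np.arange(8)                        # years since 1995
y = np.array([53807, 55217, 55209, 55415,
              63100, 63206, 63761, 65766], float)

slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)
n = len(x)

rse = np.sqrt(np.sum(resid ** 2) / (n - 2))             # estimate of sigma
se_slope = rse / np.sqrt(np.sum((x - x.mean()) ** 2))   # SE(beta1 hat)
t_stat = slope / se_slope                               # std. deviations away from 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)         # two-sided p-value

print(t_stat, p_value)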

The RSE is an estimate of the standard deviation of ε. Roughly speaking, it is the average amount that the response will deviate from the true regression line.

If the predictions obtained using the model are very close to the true outcome values, that is, if ŷᵢ ≈ yᵢ for i = 1, ..., n, then the RSE will be small, and we can conclude that the model fits the data very well. On the other hand, if ŷᵢ is very far from yᵢ for one or more observations, then the RSE may be quite large, indicating that the model does not fit the data well.

The RSE provides an absolute measure of lack of fit of the model to the data. But since it is measured in the units of Y, it is not always clear what constitutes a good RSE.

The R² statistic provides an alternative measure of fit. It always takes on a value between 0 and 1, and is independent of the scale of Y.

TSS measures the total variance in the response Y, and can be thought of as the amount of variability inherent in the response before the regression is performed. In contrast, RSS measures the amount of variability that is left unexplained after performing the regression. Hence, TSS − RSS measures the amount of variability in the response that is explained (or removed) by performing the regression, and R² measures the proportion of variability in Y that can be explained using X.
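
As a check, plugging the worksheet's sums into these definitions (RSS = 21133554 from the e² column, TSS ≈ 169933887 from the (y − ȳ)² column, n = 8):

\[
\mathrm{RSE} = \sqrt{\frac{\mathrm{RSS}}{n-2}} = \sqrt{\frac{21133554}{6}} \approx 1876.77,
\qquad
R^2 = \frac{\mathrm{TSS}-\mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{21133554}{169933887} \approx 0.8756,
\]

which match the RSE and R² cells on the worksheet.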

Correlation is also a measure of the linear relationship between X and Y.

We might be able to use r = Cor(X, Y) instead of R² in order to assess the fit of the linear model.

In fact, it can be shown that in the simple linear regression setting, R² = r². In other words, the squared correlation and the R² statistic are identical.

In the next section we will discuss the multiple linear regression problem, in which we use several predictors simultaneously to predict the response. The concept of correlation between the predictors and the response does not extend automatically to this setting, since correlation quantifies the association between a single pair of variables rather than between a larger number of variables. We will see that R² fills this role. Correlation is denoted by 'r' and it is also called the Pearson correlation coefficient.
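
Using the column sums from the correlation worksheet below (Σ(x − x̄)(y − ȳ) = 79054.5, Σ(x − x̄)² = 42, Σ(y − ȳ)² ≈ 169933887):

\[
r = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2}\,\sqrt{\sum_i (y_i-\bar{y})^2}}
  = \frac{79054.5}{\sqrt{42}\,\sqrt{169933887}} \approx 0.9358,
\qquad
r^2 \approx 0.8756 = R^2 .
\]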


The same 1995 to 2002 income data, with the additional columns used for the covariance and correlation calculation (the year, income, model, e and e² columns repeat the table above):

x − x̄   y − ȳ       (x − x̄)^2   (y − ȳ)^2    (x − x̄)(y − ȳ)
-3.5     -5628.13    12.25        31675791     19698.4375
-2.5     -4218.13    6.25         17792579     10545.3125
-1.5     -4226.13    2.25         17860133     6339.1875
-0.5     -4020.13    0.25         16161405     2010.0625
0.5      3664.875    0.25         13431309     1832.4375
1.5      3770.875    2.25         14219498     5656.3125
2.5      4325.875    6.25         18713195     10814.6875
3.5      6330.875    12.25        40079978     22158.0625
sums                 42           169933887    79054.5

RSS = sum of e^2 = 21133554
Standard deviation of e (RSE) = 1876.7683
Percentage error (RSE / mean income) = 0.0315767528 = 3.158%
Sample covariance(X, Y) = Σ(x − x̄)(y − ȳ) / (n − 1) = 79054.5 / 7 = 11293.5
Sample correlation(X, Y) = 0.9357545672
Squared sample correlation(X, Y) = 0.87563661, which equals R².

[Chart: the same income scatter with linear trendline, f(x) = 1882.25x + 52847.25, R² = 0.87563661.]
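
The covariance and correlation cells can be cross-checked with numpy (a sketch of mine, not part of the workbook):

import numpy as np

x = np.arange(8)
y = np.array([53807, 55217, 55209, 55415, 63100, 63206, 63761, 65766], float)

cov_xy = np.cov(x, y, ddof=1)[0, 1]   # sample covariance, ~11293.5
r = np.corrcoef(x, y)[0, 1]           # sample correlation, ~0.93575
print(cov_xy, r, r ** 2)              # r**2 should agree with R^2 ~ 0.8756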

Type of Repair   Repair person   Months Since   Repair time (Hours)
Electrical       Dave Newton     2              2.9
Mechanical       Dave Newton     6              3
Electrical       Bob Jones       8              4.8
Mechanical       Dave Newton     3              1.8
Electrical       Dave Newton     2              2.9
Electrical       Bob Jones       7              4.9
Mechanical       Bob Jones       9              4.2
Mechanical       Bob Jones       8              4.8
Electrical       Bob Jones       4              4.4
Electrical       Dave Newton     6              4.5

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.7816138423
R Square            0.6109201985
Adjusted R Square   0.5622852234
Standard Error      0.713792687
Observations        10

ANOVA
             df   SS      MS       F              Significance F
Regression   1    6.4     6.4      12.561334642   0.0075733033
Residual     8    4.076   0.5095
Total        9    10.476

            Coefficients   Standard Error   t Stat          P-value        Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
Intercept   4.62           0.319217794      14.4728774115   5.08347E-007   3.883882447     5.356117553     3.883882447     5.356117553
Coded       -1.6           0.4514421336     -3.5441973198   0.0075733033   -2.6410274269   -0.5589725731   -2.6410274269   -0.5589725731

RESIDUAL OUTPUT
Observation   Predicted Repair time(Hours)   Residuals   Standard Residuals
1             3.02                           -0.12       -0.178313988
2             3.02                           -0.02       -0.029718998
3             4.62                           0.18        0.267470982
4             3.02                           -1.22       -1.8128588779
5             3.02                           -0.12       -0.178313988
6             4.62                           0.28        0.416065972
7             4.62                           -0.42       -0.624098958
8             4.62                           0.18        0.267470982
9             4.62                           -0.22       -0.326908978
10            3.02                           1.48        2.1992058518
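
This output can be reproduced in Python (a sketch of mine, assuming statsmodels is installed; not part of the workbook), regressing repair time on the coded repair person:

import numpy as np
import statsmodels.api as sm

coded = np.array([1, 1, 0, 1, 1, 0, 0, 0, 0, 1])   # Dave Newton = 1, Bob Jones = 0
hours = np.array([2.9, 3.0, 4.8, 1.8, 2.9, 4.9, 4.2, 4.8, 4.4, 4.5])

fit = sm.OLS(hours, sm.add_constant(coded)).fit()
print(fit.params)     # intercept ~ 4.62, coded coefficient ~ -1.6
print(fit.rsquared)   # ~ 0.611
print(fit.summary())  # table comparable to the SUMMARY OUTPUT above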

Hand calculation on the coded data (Dave Newton coded 1, Bob Jones coded 0):

Repair person   Repairer Coded (x)   Repair time (Hours) (y)   (x − x̄)^2   (y − ȳ)
Dave Newton     1                    2.9                       0.25         -0.92
Dave Newton     1                    3                         0.25         -0.82
Bob Jones       0                    4.8                       0.25         0.98
Dave Newton     1                    1.8                       0.25         -2.02
Dave Newton     1                    2.9                       0.25         -0.92
Bob Jones       0                    4.9                       0.25         1.08
Bob Jones       0                    4.2                       0.25         0.38
Bob Jones       0                    4.8                       0.25         0.98
Bob Jones       0                    4.4                       0.25         0.58
Dave Newton     1                    4.5                       0.25         0.68

x bar = 0.5, y bar = 3.82
Σ(x − x̄)^2 = 2.5
Σ(y − ȳ) = -2.66454E-015 (effectively 0)
Other cells: ###   Beta1   6.25
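
Completing the hand calculation from the columns above (the cross-product sum Σ(x − x̄)(y − ȳ), which is not legible in the export, works out to −4):

\[
\hat{\beta}_1 = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2} = \frac{-4}{2.5} = -1.6,
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = 3.82 - (-1.6)(0.5) = 4.62,
\]

matching the Coded and Intercept coefficients in the SUMMARY OUTPUT above.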
