
years   year since 1995 (x)   income (y)   income based on model   e (residual value)
1995    0                     53807        52847                   960
1996    1                     55217        54729.3                 487.7
1997    2                     55209        56611.6                 -1402.6
1998    3                     55415        58493.9                 -3078.9
1999    4                     63100        60376.2                 2723.8
2000    5                     63206        62258.5                 947.5
2001    6                     63761        64140.8                 -379.8
2002    7                     65766        66023.1                 -257.1

The year and x columns continue through 2014 (x = 8 to 19) with no observed income; the only entry in that range is the model's predicted income of 81081.5 for 2010 (x = 15).

Residual and deviation calculations:

e^2          x − x̄   y − ȳ       (x − x̄)^2   (y − ȳ)^2
921600       -3.5     -5628.13    12.25        31675791
237851.29    -2.5     -4218.13    6.25         17792579
1967286.76   -1.5     -4226.13    2.25         17860133
9479625.21   -0.5     -4020.13    0.25         16161405
7419086.44   0.5      3664.875    0.25         13431309
897756.25    1.5      3770.875    2.25         14219498
144248.04    2.5      4325.875    6.25         18713195
66100.41     3.5      6330.875    12.25        40079978

RSS (residual sum of squares) = sum of e^2 = 21133554
RSE (residual standard error, from the residual sum of squares) = 1876.7683, the standard deviation of e
Σ(x − x̄)^2 = 42
Percentage error = RSE / mean income = 0.0315768 = 3.158%
Correlation (x, y): computed on the correlation worksheet further below.
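
These columns can be reproduced outside the spreadsheet; a minimal Python/numpy sketch (my own illustration, not part of the worksheet):

import numpy as np

x = np.arange(8)                                   # years since 1995 (0..7)
y = np.array([53807, 55217, 55209, 55415,
              63100, 63206, 63761, 65766], float)  # observed income

slope, intercept = np.polyfit(x, y, 1)             # least-squares line (~1882.25, ~52847.25)
fitted = intercept + slope * x                     # "income based on model"
resid = y - fitted                                 # e (residual value)

rss = np.sum(resid ** 2)                           # residual sum of squares (~21133554)
rse = np.sqrt(rss / (len(x) - 2))                  # residual standard error (~1876.8)
pct_error = rse / y.mean()                         # percentage error (~3.16%)

print(slope, intercept, rss, rse, pct_error)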

[Chart: income vs. years (1995 to 2002), observed income with a linear trendline; f(x) = 1882.25x + 52847.25, R² = 0.87563661.]
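
As a usage example of the trendline (my arithmetic, using the chart's coefficients): the predicted income for 2010, i.e. x = 15, is

\[
f(15) = 1882.25 \times 15 + 52847.25 = 81081.0,
\]

which agrees with the 81081.5 shown in the worksheet's model column up to the rounding of the coefficients used there.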

How accurate is the sample mean as an estimate of the population mean?

How accurate are β̂₀ and β̂₁ as estimates of β₀ and β₁? We have established that the average of the β̂₀'s and β̂₁'s over many data sets will be very close to β₀ and β₁, but that a single estimate of β̂₀ or β̂₁ may be a substantial underestimate or overestimate of β₀ and β₁. How far off will that single estimate be? In general, we answer this question by computing the standard errors SE(β̂₀) and SE(β̂₁).

Worksheet cells:
S.E.(β̂₀)^2   2052.956436661   45.30956
S.E.(β̂₁)^2   14031839605659   3745910
var           1876.768250655

Variance: in general, σ² is not known, but it can be estimated from the data. This estimate of σ is known as the residual standard error.

If SE(β̂₁) is small, then even relatively small values of β̂₁ may provide strong evidence that β₁ ≠ 0, and hence that there is a relationship between X and Y. In contrast, if SE(β̂₁) is large, then β̂₁ must be large in absolute value in order for us to reject the null hypothesis.
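
For reference, the standard simple linear regression formulas behind these standard-error cells (textbook formulas, not taken from the worksheet) are

\[
\mathrm{SE}(\hat{\beta}_0)^2 = \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\right],
\qquad
\mathrm{SE}(\hat{\beta}_1)^2 = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2},
\]

where σ² = Var(ε) and, in practice, σ is replaced by its estimate RSE = √(RSS/(n − 2)).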

t-statistic

t = (β̂₁ − 0) / SE(β̂₁)

The t-statistic measures the number of standard deviations that β̂₁ is away from 0; the slope estimate here is β̂₁ = 1882.25. The t-distribution has a bell shape, and for values of n greater than approximately 30 it is quite similar to the normal distribution.

It is a simple matter to compute the probability of observing any value equal to |t| or larger, assuming β₁ = 0. We call this probability the p-value.

If we see a small p-value, then we can infer that there is an association between the predictor and the response. We reject the null hypothesis, that is, we declare a relationship to exist between X and Y, if the p-value is small enough.
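
A small Python sketch of this test for the income example (my own illustration, assuming numpy and scipy are available; not part of the worksheet):

import numpy as np
from scipy import stats

x = np.arange(8)                        # years since 1995
y = np.array([53807, 55217, 55209, 55415,
              63100, 63206, 63761, 65766], float)

slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)
n = len(x)

rse = np.sqrt(np.sum(resid ** 2) / (n - 2))             # estimate of sigma
se_slope = rse / np.sqrt(np.sum((x - x.mean()) ** 2))   # SE(beta1 hat)
t_stat = slope / se_slope                               # std. deviations away from 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)         # two-sided p-value

print(t_stat, p_value)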

The RSE is an estimate of the standard deviation of ε. Roughly speaking, it is the average amount that the response will deviate from the true regression line.

If the predictions obtained using the model are very close to the true outcome values, that is, if ŷᵢ ≈ yᵢ for i = 1, ..., n, then the RSE will be small, and we can conclude that the model fits the data very well. On the other hand, if ŷᵢ is very far from yᵢ for one or more observations, then the RSE may be quite large, indicating that the model does not fit the data well.

The RSE provides an absolute measure of lack of fit of the model to the data. But since it is measured in the units of Y, it is not always clear what constitutes a good RSE.

The R² statistic provides an alternative measure of fit. It always takes on a value between 0 and 1, and is independent of the scale of Y.

TSS measures the total variance in the response Y, and can be thought of as the amount of variability inherent in the response before the regression is performed. In contrast, RSS measures the amount of variability that is left unexplained after performing the regression. Hence, TSS − RSS measures the amount of variability in the response that is explained (or removed) by performing the regression, and R² measures the proportion of variability in Y that can be explained using X.
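
As a check, plugging the worksheet's sums into these definitions (RSS = 21133554 from the e² column, TSS ≈ 169933887 from the (y − ȳ)² column, n = 8):

\[
\mathrm{RSE} = \sqrt{\frac{\mathrm{RSS}}{n-2}} = \sqrt{\frac{21133554}{6}} \approx 1876.77,
\qquad
R^2 = \frac{\mathrm{TSS}-\mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{21133554}{169933887} \approx 0.8756,
\]

which match the RSE and R² cells on the worksheet.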

Correlation is also a measure of the linear relationship between X and Y.

We might be able to use r = Cor(X, Y) instead of R² in order to assess the fit of the linear model.

In fact, it can be shown that in the simple linear regression setting, R² = r². In other words, the squared correlation and the R² statistic are identical.

In the next section we will discuss the multiple linear regression problem, in which we use several predictors simultaneously to predict the response. The concept of correlation between the predictors and the response does not extend automatically to this setting, since correlation quantifies the association between a single pair of variables rather than between a larger number of variables. We will see that R² fills this role. Correlation is denoted by 'r' and it is also called the Pearson correlation coefficient.
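
Using the column sums from the correlation worksheet below (Σ(x − x̄)(y − ȳ) = 79054.5, Σ(x − x̄)² = 42, Σ(y − ȳ)² ≈ 169933887):

\[
r = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2}\,\sqrt{\sum_i (y_i-\bar{y})^2}}
  = \frac{79054.5}{\sqrt{42}\,\sqrt{169933887}} \approx 0.9358,
\qquad
r^2 \approx 0.8756 = R^2 .
\]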


The same 1995 to 2002 income data, with the additional columns used for the covariance and correlation calculation (the year, income, model, e and e² columns repeat the table above):

x − x̄   y − ȳ       (x − x̄)^2   (y − ȳ)^2    (x − x̄)(y − ȳ)
-3.5     -5628.13    12.25        31675791     19698.4375
-2.5     -4218.13    6.25         17792579     10545.3125
-1.5     -4226.13    2.25         17860133     6339.1875
-0.5     -4020.13    0.25         16161405     2010.0625
0.5      3664.875    0.25         13431309     1832.4375
1.5      3770.875    2.25         14219498     5656.3125
2.5      4325.875    6.25         18713195     10814.6875
3.5      6330.875    12.25        40079978     22158.0625
sums                 42           169933887    79054.5

RSS = sum of e^2 = 21133554
Standard deviation of e (RSE) = 1876.7683
Percentage error (RSE / mean income) = 0.0315767528 = 3.158%
Sample covariance(X, Y) = Σ(x − x̄)(y − ȳ) / (n − 1) = 79054.5 / 7 = 11293.5
Sample correlation(X, Y) = 0.9357545672
Squared sample correlation(X, Y) = 0.87563661, which equals R².

[Chart: the same income scatter with linear trendline, f(x) = 1882.25x + 52847.25, R² = 0.87563661.]
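
The covariance and correlation cells can be cross-checked with numpy (a sketch of mine, not part of the workbook):

import numpy as np

x = np.arange(8)
y = np.array([53807, 55217, 55209, 55415, 63100, 63206, 63761, 65766], float)

cov_xy = np.cov(x, y, ddof=1)[0, 1]   # sample covariance, ~11293.5
r = np.corrcoef(x, y)[0, 1]           # sample correlation, ~0.93575
print(cov_xy, r, r ** 2)              # r**2 should agree with R^2 ~ 0.8756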

Type of Repair   Repair person   Months Since   Repair time (Hours)
Electrical       Dave Newton     2              2.9
Mechanical       Dave Newton     6              3
Electrical       Bob Jones       8              4.8
Mechanical       Dave Newton     3              1.8
Electrical       Dave Newton     2              2.9
Electrical       Bob Jones       7              4.9
Mechanical       Bob Jones       9              4.2
Mechanical       Bob Jones       8              4.8
Electrical       Bob Jones       4              4.4
Electrical       Dave Newton     6              4.5

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.7816138423
R Square            0.6109201985
Adjusted R Square   0.5622852234
Standard Error      0.713792687
Observations        10

ANOVA
             df   SS      MS       F              Significance F
Regression   1    6.4     6.4      12.561334642   0.0075733033
Residual     8    4.076   0.5095
Total        9    10.476

            Coefficients   Standard Error   t Stat          P-value        Lower 95%       Upper 95%       Lower 95.0%     Upper 95.0%
Intercept   4.62           0.319217794      14.4728774115   5.08347E-007   3.883882447     5.356117553     3.883882447     5.356117553
Coded       -1.6           0.4514421336     -3.5441973198   0.0075733033   -2.6410274269   -0.5589725731   -2.6410274269   -0.5589725731

RESIDUAL OUTPUT
Observation   Predicted Repair time(Hours)   Residuals   Standard Residuals
1             3.02                           -0.12       -0.178313988
2             3.02                           -0.02       -0.029718998
3             4.62                           0.18        0.267470982
4             3.02                           -1.22       -1.8128588779
5             3.02                           -0.12       -0.178313988
6             4.62                           0.28        0.416065972
7             4.62                           -0.42       -0.624098958
8             4.62                           0.18        0.267470982
9             4.62                           -0.22       -0.326908978
10            3.02                           1.48        2.1992058518
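
This output can be reproduced in Python (a sketch of mine, assuming statsmodels is installed; not part of the workbook), regressing repair time on the coded repair person:

import numpy as np
import statsmodels.api as sm

coded = np.array([1, 1, 0, 1, 1, 0, 0, 0, 0, 1])   # Dave Newton = 1, Bob Jones = 0
hours = np.array([2.9, 3.0, 4.8, 1.8, 2.9, 4.9, 4.2, 4.8, 4.4, 4.5])

fit = sm.OLS(hours, sm.add_constant(coded)).fit()
print(fit.params)     # intercept ~ 4.62, coded coefficient ~ -1.6
print(fit.rsquared)   # ~ 0.611
print(fit.summary())  # table comparable to the SUMMARY OUTPUT above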

Hand calculation on the coded data (Dave Newton coded 1, Bob Jones coded 0):

Repair person   Repairer Coded (x)   Repair time (Hours) (y)   (x − x̄)^2   (y − ȳ)
Dave Newton     1                    2.9                       0.25         -0.92
Dave Newton     1                    3                         0.25         -0.82
Bob Jones       0                    4.8                       0.25         0.98
Dave Newton     1                    1.8                       0.25         -2.02
Dave Newton     1                    2.9                       0.25         -0.92
Bob Jones       0                    4.9                       0.25         1.08
Bob Jones       0                    4.2                       0.25         0.38
Bob Jones       0                    4.8                       0.25         0.98
Bob Jones       0                    4.4                       0.25         0.58
Dave Newton     1                    4.5                       0.25         0.68

x bar = 0.5, y bar = 3.82
Σ(x − x̄)^2 = 2.5
Σ(y − ȳ) = -2.66454E-015 (effectively 0)
Other cells: ###   Beta1   6.25
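
Completing the hand calculation from the columns above (the cross-product sum Σ(x − x̄)(y − ȳ), which is not legible in the export, works out to −4):

\[
\hat{\beta}_1 = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2} = \frac{-4}{2.5} = -1.6,
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} = 3.82 - (-1.6)(0.5) = 4.62,
\]

matching the Coded and Intercept coefficients in the SUMMARY OUTPUT above.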
