INTRODUCTORY LINEAR REGRESSION
CHAPTER OUTLINE:
4.1 SIMPLE LINEAR REGRESSION
- Curve fitting
- Linear correlation
Introduction:
Regression is a statistical procedure for establishing the relationship between two or more variables.
4.1 The Simple Linear Regression Model
The simple linear regression model is an equation that describes a dependent variable (Y) in terms of an independent variable (X) plus a random error:
Y = β₀ + β₁X + ε
where,
β₀ = intercept of the line with the Y-axis
β₁ = slope of the line
ε = random error
The random error ε is the difference between a data point and its deterministic value.
The regression line is estimated from the collected data by fitting a straight line to the data set and obtaining the equation of that line,
ŷ = β̂₀ + β̂₁x
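As a quick sketch of what the model means, the following Python snippet generates data from Y = β₀ + β₁X + ε. The values β₀ = 2, β₁ = 3 and the normal error distribution are made up purely for illustration; they are not from the chapter.

```python
import random

random.seed(42)

beta0, beta1 = 2.0, 3.0   # hypothetical true parameters (illustration only)
noise_sd = 1.0            # standard deviation of the random error epsilon

# X values and the corresponding Y = beta0 + beta1*X + epsilon
xs = [x / 2 for x in range(-10, 11)]
ys = [beta0 + beta1 * xi + random.gauss(0, noise_sd) for xi in xs]

print(len(xs), len(ys))   # 21 points generated
```

Fitting a straight line to such (x, y) pairs is exactly the estimation problem the rest of the section addresses.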
Example 4.1:
SCATTER DIAGRAM — Positive Linear Relationship
[Figure: regression line E(y) rising to the right; intercept β₀, slope β₁ is positive]
SCATTER DIAGRAM — Negative Linear Relationship
[Figure: regression line E(y) falling to the right; intercept β₀, slope β₁ is negative]
SCATTER DIAGRAM — No Relationship
[Figure: flat regression line E(y); no linear pattern in the points]
LINEAR REGRESSION MODEL
X | -3 | -2 | -1 | 0 | 1 |  2 |  3
Y |  1 |  2 |  3 | 5 | 8 | 11 | 12
4.1.2 INFERENCES ABOUT ESTIMATED PARAMETERS
LEAST SQUARES METHOD
Theorem:
Given the sample data (xᵢ, yᵢ), i = 1, 2, ..., n, the coefficients of the least squares line
ŷ = β̂₀ + β̂₁x
are:
i) Slope of the estimated regression equation,
β̂₁ = Sxy / Sxx
ii) y-intercept of the estimated regression equation,
β̂₀ = ȳ − β̂₁x̄
where
Sxy = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n
Sxx = Σxᵢ² − (Σxᵢ)²/n
Syy = Σyᵢ² − (Σyᵢ)²/n
Example 4.3: Students' scores in history
The data below represent scores obtained by ten primary school students before and after they were taken on a tour to the museum (which is supposed to increase their interest in history).
Before, x: 65 63 76 46 68 72 68 57 36 96
After,  y: 68 66 86 48 65 66 71 57 42 87
Solution
n = 10, Σx = 647, Σx² = 44279, x̄ = 64.7
Σy = 656, Σy² = 44884, ȳ = 65.6, Σxy = 44435

Sxy = 44435 − (647)(656)/10 = 1991.8
Sxx = 44279 − 647²/10 = 2418.1
Syy = 44884 − 656²/10 = 1850.4

a) β̂₁ = Sxy / Sxx = 1991.8 / 2418.1 = 0.8237
   β̂₀ = ȳ − β̂₁x̄ = 65.6 − 0.8237(64.7) = 12.3063
   ŷ = 12.3063 + 0.8237x

b) When x = 60:
   ŷ = 12.3063 + 0.8237(60) = 61.7283
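The arithmetic above can be checked with a short Python script that applies the least squares formulas directly to the Example 4.3 data:

```python
# Scores before (x) and after (y) the museum tour (Example 4.3)
x = [65, 63, 76, 46, 68, 72, 68, 57, 36, 96]
y = [68, 66, 86, 48, 65, 66, 71, 57, 42, 87]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_yy = sum(yi * yi for yi in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

Sxy = sum_xy - sum_x * sum_y / n   # 1991.8
Sxx = sum_xx - sum_x ** 2 / n      # 2418.1
Syy = sum_yy - sum_y ** 2 / n      # 1850.4

b1 = Sxy / Sxx                     # slope estimate
b0 = sum_y / n - b1 * sum_x / n    # intercept estimate

print(round(b1, 4), round(b0, 4))  # 0.8237 12.3063
print(round(b0 + b1 * 60, 2))      # predicted score when x = 60, about 61.73
```

Carrying full precision in the last line gives 61.7286 rather than the slide's 61.7283, which uses the rounded coefficients; the difference is pure rounding.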
4.1.3 ADEQUACY OF THE MODEL
COEFFICIENT OF DETERMINATION (R²)
The coefficient of determination measures the proportion of the variation in the dependent variable (Y) that is explained by the regression line and the independent variable (X).
The symbol for the coefficient of determination is r² or R².
If r = 0.90, then r² = 0.81. It means that 81% of the variation in the dependent variable (Y) is accounted for by the variation in the independent variable (X).
The rest of the variation, 0.19 or 19%, is unexplained and is called the coefficient of nondetermination.
The formula for the coefficient of nondetermination is 1.00 − r².
COEFFICIENT OF DETERMINATION (R²)
Relationship among SST, SSR, SSE:
SST = SSR + SSE
Σ(yᵢ − ȳ)² = Σ(ŷᵢ − ȳ)² + Σ(yᵢ − ŷᵢ)²
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
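As a numerical sanity check of the identity SST = SSR + SSE, one can fit the Example 4.3 data and compare the three sums of squares:

```python
x = [65, 63, 76, 46, 68, 72, 68, 57, 36, 96]
y = [68, 66, 86, 48, 65, 66, 71, 57, 42, 87]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
Sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
b1 = Sxy / Sxx
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * xi for xi in x]                     # fitted values
SST = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
SSR = sum((yh - y_bar) ** 2 for yh in y_hat)           # regression sum of squares
SSE = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error sum of squares

print(round(SST, 1), round(SSR + SSE, 1))  # the two agree: SST = SSR + SSE
```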
Solution:
r² = 0.84. It means that 84% of the variation in the dependent variable (Y) is explained by the variation in the independent variable (X).
4.1.4 Linear Correlation (r)
Correlation measures the strength of the linear relationship between two variables.
Also known as Pearson's product moment coefficient of correlation.
The symbol for the sample coefficient of correlation is r; for the population, ρ.
Formula:
r = Sxy / √(Sxx · Syy)
or
r = (sign of b₁) · √r²
Properties of r:
−1 ≤ r ≤ 1
Values of r close to 1 imply a strong positive linear relationship between x and y.
Values of r close to −1 imply a strong negative linear relationship between x and y.
Values of r close to 0 imply little or no linear relationship between x and y.
Refer to Example 4.3: Students' scores in history
c) Calculate the value of r and interpret its meaning.
Solution:
r = Sxy / √(Sxx · Syy)
  = 1991.8 / √(2418.1 × 1850.4)
  = 0.9416
Thus, there is a strong positive linear relationship between the scores obtained before (x) and after (y).
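Using the sums of squares already computed for Example 4.3, r can be verified in a couple of lines:

```python
import math

Sxy, Sxx, Syy = 1991.8, 2418.1, 1850.4   # from Example 4.3
r = Sxy / math.sqrt(Sxx * Syy)
print(round(r, 4))                        # 0.9416

# Equivalent route: r = (sign of b1) * sqrt(r^2)
b1 = Sxy / Sxx
r_alt = math.copysign(math.sqrt(r * r), b1)
print(round(r_alt, 4))                    # 0.9416
```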
Refer to Example 4.3: two tests for the significance of the regression:
i) t-Test
ii) F-Test
1) t-Test
1. Determine the hypotheses.
   H0: β₁ = 0 (no linear relationship)
   H1: β₁ ≠ 0 (a linear relationship exists)
2. Determine the critical value t(α/2, n−2), or use the p-value approach.
3. Compute the test statistic.
   t = β̂₁ / √(Var(β̂₁))
   where
   Var(β̂₁) = [(Syy − β̂₁Sxy) / (n − 2)] · (1 / Sxx)
1) t-Test (continued)
4. Determine the rejection rule.
   Reject H0 if:
   t < −t(α/2, n−2) or t > t(α/2, n−2)
   or p-value < α
5. Conclusion.
2) F-Test
1. Determine the hypotheses.
   H0: β₁ = 0 (no linear relationship)
   H1: β₁ ≠ 0 (a linear relationship exists)
2. Select the distribution to use: F-distribution.
3. Compute the test statistic: F = MSR/MSE.
4. Determine the rejection rule.
   Reject H0 if:
   p-value < α, or F > F(α, 1, n−2)
5. Conclusion.
Refer to Example 4.3: Students' scores in history
d) Test whether the scores before and after the trip are related. Use α = 0.05.
Solution:
1. H0: β₁ = 0 (no linear relationship)
   H1: β₁ ≠ 0 (a linear relationship exists)
2. α = 0.05, t(0.025, 8) = 2.306
3. Var(β̂₁) = [(Syy − β̂₁Sxy) / (n − 2)] · (1 / Sxx)
            = [(1850.4 − (0.8237)(1991.8)) / 8] · (1 / 2418.1)
            = 0.0108
   t_test = β̂₁ / √(Var(β̂₁)) = 0.8237 / √0.0108 = 7.926
4. Rejection rule:
   t_test > t(0.025, 8), since 7.926 > 2.306
5. Conclusion:
   Thus, we reject H0. The score before the trip (x) is linearly related to the score after the trip (y).
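The t statistic can be reproduced in Python. Note that carrying full precision for β̂₁ gives t ≈ 7.91, slightly below the slide's 7.926, which comes from the rounded intermediate values 0.8237 and 0.0108; the conclusion is unchanged either way.

```python
import math

n = 10
Sxy, Sxx, Syy = 1991.8, 2418.1, 1850.4   # from Example 4.3

b1 = Sxy / Sxx
var_b1 = (Syy - b1 * Sxy) / (n - 2) / Sxx   # Var(beta1_hat)
t = b1 / math.sqrt(var_b1)

print(round(var_b1, 4))   # 0.0108
print(round(t, 2))        # about 7.91

t_crit = 2.306            # t(0.025, 8) from the t table
print(abs(t) > t_crit)    # True -> reject H0
```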
ANALYSIS OF VARIANCE (ANOVA)
The value of the test statistic F for an ANOVA test is calculated as:
F = MSR / MSE
To calculate MSR and MSE, first compute the regression sum of squares (SSR) and the error sum of squares (SSE).
ANALYSIS OF VARIANCE (ANOVA)
General form of the ANOVA table:

Source of Variation | Degrees of Freedom (df) | Sum of Squares | Mean Squares      | Test Statistic
Regression          | 1                       | SSR            | MSR = SSR/1       | F = MSR/MSE
Error               | n − 2                   | SSE            | MSE = SSE/(n − 2) |
Total               | n − 1                   | SST            |                   |

ANOVA Test
1) Hypotheses: H0: β₁ = 0
               H1: β₁ ≠ 0
2) Select the distribution to use: F-distribution
3) Calculate the value of the test statistic: F
4) Determine the rejection and non-rejection regions
5) Make a decision: reject H0 / do not reject H0
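To make the table concrete, here is the ANOVA computation for the Example 4.3 data, using the algebraic shortcut SSR = β̂₁ · Sxy (equivalent to Σ(ŷᵢ − ȳ)²). For simple linear regression, F equals the square of the t statistic from the t-test.

```python
n = 10
Sxy, Sxx, Syy = 1991.8, 2418.1, 1850.4   # from Example 4.3

b1 = Sxy / Sxx
SSR = b1 * Sxy          # sum of squares due to regression
SST = Syy               # total sum of squares
SSE = SST - SSR         # sum of squares due to error

MSR = SSR / 1
MSE = SSE / (n - 2)
F = MSR / MSE

print(round(SSR, 1), round(SSE, 1))   # 1640.7 209.7
print(round(F, 2))                    # about 62.58
```

Since 62.58 far exceeds F(0.05, 1, 8), the F-test agrees with the earlier t-test: reject H0.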
Example 4.5
The manufacturer of Cardio Glide exercise equipment wants to study the relationship between the number of months since the glide was purchased and the length of time the equipment was used last week.
Solution (1):
Regression equation:
Ŷ = 9.939 − 0.637X
Solution (2):
1) Hypotheses:
   H0: β₁ = 0
   H1: β₁ ≠ 0
2) F-distribution table: F(0.01, 1, 8) = 11.2586
3) Test statistic:
   F = MSR/MSE = 17.303
   or, using the p-value approach: significance value = 0.003
4) Rejection region:
   Since the F statistic exceeds the F table value (17.303 > 11.2586), we reject H0;
   or, since the p-value is less than α (0.003 < 0.01), we reject H0.
5) Thus, there is a linear relationship between the variables (months X and hours Y).