Linear regression is the study of the linear relationship between two variables. It works by
fitting a linear equation to the observed data; the fitted equation is then used to predict values
of one variable from the other. In a simple linear relationship, only two variables are involved:
X is the independent variable
Y is the dependent variable
Example:
A nutritionist studying weight loss programs might want to find out whether reducing the intake
of carbohydrates can help a person lose weight.
o X is the carbohydrate intake
o Y is the weight
An entrepreneur might want to know whether increasing the cost of packaging his new
product will have an effect on the sales volume.
o X is cost
o Y is sales volume
A scatter plot is essentially a plot of the pairs of (x, y) values. The purpose of constructing
the plot is to examine the relationship between the two variables.
The scatter plots in figures (a) and (b) below show the existence of a positive and a
negative relationship, whilst (c) shows no relationship between the variables x and y.
1 Nurhana.PPD (2_14/15)
(a) Positive linear relationship
Example A
In BMX dirt-bike racing, jumping high or "getting air" depends on many factors: the rider's
skill, the angle of the jump, and the weight of the bike. The data below give the maximum height
for various bike weights.
Type               A      B      C      D      E      F      G      H      I      J
Weight (pounds)    19     19.5   20     20.5   21     22     22.5   23     23.5   24
Height (inches)    10.35  10.3   10.25  10.2   10.1   9.85   9.8    9.79   9.7    9.6
Construct a scatter diagram for these data. Does the scatter diagram exhibit a linear relationship
between weight and height of the BMX dirt-bike?
Solution:
Yes, the scatter diagram exhibits a negative linear relationship between weight and height.
Example B
The dosage chart below was prepared by a drug company for doctors who prescribe
Tobyamycin, a drug that combats serious bacterial infections, such as those in the central system,
in life-threatening situations.
Weight (pounds), X         88   99   110  121  132  143  154  165  176  187  198
Usual dosage (mg), Yu      40   45   50   55   60   65   70   75   80   85   90
Maximum dosage (mg), Ym    66   75   83   91   100  108  116  125  133  141  150
Solution:
Yes, the scatter diagrams exhibit a positive linear relationship between weight and
usual dosage, and also between weight and maximum dosage.
8.2 CORRELATION COEFFICIENT
Correlation measures the strength of a linear relationship between two variables. One
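numerical measure is the Pearson product moment correlation coefficient,

\[ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \]

where

\[ S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} \]

\[ S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} \]

\[ S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} \]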
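The formulas for S_xx, S_yy, S_xy and r translate directly into a few lines of code. The following is a minimal Python sketch using only the standard library; the function names `s_sums` and `pearson_r` and the small data set are illustrative assumptions, not part of these notes.

```python
from math import sqrt

def s_sums(xs, ys):
    """Return (S_xx, S_yy, S_xy) using the computational formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return s_xx, s_yy, s_xy

def pearson_r(xs, ys):
    """Pearson correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    s_xx, s_yy, s_xy = s_sums(xs, ys)
    return s_xy / sqrt(s_xx * s_yy)

# Made-up illustration data lying exactly on the line y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
print(pearson_r(xs, ys))  # prints 1.0 (perfect positive linear relationship)
```

Because the points lie exactly on an increasing straight line, S_xx = 10, S_yy = 40 and S_xy = 20, so r = 20 / sqrt(400) = 1.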
Properties of r:
1. \(-1 \le r \le 1\)
2. (a) A value of r equal to 1 implies there is a perfect positive linear relationship between x and y.
(b) A value of r equal to -1 implies there is a perfect negative linear relationship between x and y.
3. (a) Values of r close to 1 imply there is a strong positive linear relationship between x and y.
(b) Values of r close to -1 imply there is a strong negative linear relationship between x and y.
4. (a) Values of r close to 0.5 imply there is a weak positive linear relationship between x and y.
(b) Values of r close to -0.5 imply there is a weak negative linear relationship between x and y.
To find and interpret r, we build a table with five column headings, x, y, xy, x^2 and y^2,
and add a last row containing the sum of each column.
x          y          xy          x^2          y^2
Σx =       Σy =       Σxy =       Σx^2 =       Σy^2 =
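The five-column table can also be built programmatically. The sketch below is one possible way to do it in Python; the helper name `correlation_table` and the three data points are made up for illustration.

```python
def correlation_table(xs, ys):
    """Build rows (x, y, xy, x^2, y^2) and a final row of column totals."""
    rows = [(x, y, x * y, x * x, y * y) for x, y in zip(xs, ys)]
    totals = tuple(sum(column) for column in zip(*rows))
    return rows, totals

# Made-up illustration data (not taken from the examples in these notes)
rows, totals = correlation_table([1, 2, 3], [2, 4, 7])

header = ("x", "y", "xy", "x^2", "y^2")
print("".join(name.rjust(6) for name in header))
for row in rows:
    print("".join(str(value).rjust(6) for value in row))
# Last row: the column sums used in S_xx, S_yy and S_xy
print("".join(str(total).rjust(6) for total in totals))
```

The `totals` tuple holds Σx, Σy, Σxy, Σx^2 and Σy^2 in that order, which are exactly the quantities the formulas for S_xx, S_yy and S_xy need.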
Example C
The data in the table below are the circumferences (in feet) and heights (in feet) of trees in a
certain forest reserve.
Circumference, x   1.8   1.9   1.8   2.4   5.1   3.1   5.5   5.1   8.3   13.7
Height, y          21    33.5  24.6  40.7  73.2  24.9  40.4  45.3  53.5  93.9
Solution:
Circumference, x    Height, y    xy          x^2        y^2
1.8                 21           37.8        3.24       441
1.9                 33.5         63.65       3.61       1122.25
1.8                 24.6         44.28       3.24       605.16
2.4                 40.7         97.68       5.76       1656.49
5.1                 73.2         373.32      26.01      5358.24
3.1                 24.9         77.19       9.61       620.01
5.5                 40.4         222.2       30.25      1632.16
5.1                 45.3         231.03      26.01      2052.09
8.3                 53.5         444.05      68.89      2862.25
13.7                93.9         1286.43     187.69     8817.21
Σx = 48.7           Σy = 451     Σxy = 2877.63   Σx^2 = 364.31   Σy^2 = 25166.86
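\[ S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 364.31 - \frac{(48.7)^2}{10} = 364.31 - 237.169 = 127.141 \]

\[ S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} = 25166.86 - \frac{(451)^2}{10} = 25166.86 - 20340.1 = 4826.76 \]

\[ S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 2877.63 - \frac{48.7(451)}{10} = 2877.63 - 2196.37 = 681.26 \]

\[ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{681.26}{\sqrt{(127.141)(4826.76)}} = \frac{681.26}{783.38} \approx 0.87 \]

Strong positive linear relationship.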
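The arithmetic of Example C can be cross-checked from the column totals alone. The following is a minimal Python sketch under that assumption; the variable names are illustrative.

```python
from math import sqrt

# Column totals taken from the Example C table
n = 10
sum_x, sum_y = 48.7, 451
sum_xy, sum_x2, sum_y2 = 2877.63, 364.31, 25166.86

s_xx = sum_x2 - sum_x ** 2 / n      # 364.31 - 237.169
s_yy = sum_y2 - sum_y ** 2 / n      # 25166.86 - 20340.1
s_xy = sum_xy - sum_x * sum_y / n   # 2877.63 - 2196.37

r = s_xy / sqrt(s_xx * s_yy)
print(round(s_xx, 3), round(s_yy, 2), round(s_xy, 2), round(r, 2))
```

Working from the totals rather than the raw data mirrors the hand calculation, so each intermediate value can be compared line by line with the solution.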
Example D
A researcher wishes to see if there is a relationship between the ages and net worth of the
wealthiest people in Malaysia. The data for a specific year are shown.
Age, x                        73   65   53   54    79   69   61    65
Net wealth, y ($ billions)    16   26   50   21.5  40   16   19.6  19
Solution:
Age, x      Net wealth, y ($ billions)    xy          x^2        y^2
73          16                            1168        5329       256
65          26                            1690        4225       676
53          50                            2650        2809       2500
54          21.5                          1161        2916       462.25
79          40                            3160        6241       1600
69          16                            1104        4761       256
61          19.6                          1195.6      3721       384.16
65          19                            1235        4225       361
Σx = 519    Σy = 208.1                    Σxy = 13363.6   Σx^2 = 34227   Σy^2 = 6495.41
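\[ S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 34227 - \frac{(519)^2}{8} = 34227 - 33670.13 = 556.87 \]

\[ S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} = 6495.41 - \frac{(208.1)^2}{8} = 6495.41 - 5413.2 = 1082.21 \]

\[ S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 13363.6 - \frac{519(208.1)}{8} = 13363.6 - 13500.49 = -136.89 \]

\[ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{-136.89}{\sqrt{(556.87)(1082.21)}} = \frac{-136.89}{776.31} = -0.18 \]

Weak negative linear relationship.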
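The sign of r comes entirely from S_xy. One way to see why it is negative here is the algebraically equivalent deviation form, S_xy = Σ(x_i - x̄)(y_i - ȳ), which these notes do not use but which gives the same number. The Python sketch below compares the two forms on the Example D data; the variable names are illustrative.

```python
# Example D data: age (x) and net wealth in $ billions (y)
xs = [73, 65, 53, 54, 79, 69, 61, 65]
ys = [16, 26, 50, 21.5, 40, 16, 19.6, 19]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Computational form used in the notes
s_xy_computational = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

# Equivalent deviation form: sum of (x - x_bar) * (y - y_bar)
s_xy_deviation = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

print(round(s_xy_computational, 2), round(s_xy_deviation, 2))
```

In the deviation form, a point contributes a negative term when x is above its mean while y is below its mean (or the reverse), so a negative total means such points dominate.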
Example E
The data below were obtained in a study of the number of absences and the final grades of seven
randomly selected students from a statistics class.
Student                  A    B    C    D    E    F    G
Number of absences, x    6    2    15   9    12   5    8
Final grade, y (%)       82   86   43   74   58   90   78
Solution:
x          y           xy           x^2          y^2
6          82          492          36           6724
2          86          172          4            7396
15         43          645          225          1849
9          74          666          81           5476
12         58          696          144          3364
5          90          450          25           8100
8          78          624          64           6084
Σx = 57    Σy = 511    Σxy = 3745   Σx^2 = 579   Σy^2 = 38993
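\[ S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 579 - \frac{(57)^2}{7} = 579 - 464.14 = 114.86 \]

\[ S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} = 38993 - \frac{(511)^2}{7} = 38993 - 37303 = 1690 \]

\[ S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 3745 - \frac{57(511)}{7} = 3745 - 4161 = -416 \]

\[ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{-416}{\sqrt{(114.86)(1690)}} = \frac{-416}{440.58} = -0.94 \]

Strong negative linear relationship.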
Example F
Find the correlation coefficient, r, for the following set of data.
x    1    1    2    4    7    8    8
y    4    5    8    15   23   20   25
Solution:
x          y           xy          x^2          y^2
1          4           4           1            16
1          5           5           1            25
2          8           16          4            64
4          15          60          16           225
7          23          161         49           529
8          20          160         64           400
8          25          200         64           625
Σx = 31    Σy = 100    Σxy = 606   Σx^2 = 199   Σy^2 = 1884
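\[ S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 199 - \frac{(31)^2}{7} = 199 - 137.29 = 61.71 \]

\[ S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} = 1884 - \frac{(100)^2}{7} = 1884 - 1428.57 = 455.43 \approx 455 \]

\[ S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 606 - \frac{31(100)}{7} = 606 - 442.86 = 163.14 \]

\[ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{163.14}{\sqrt{(61.71)(455)}} = \frac{163.14}{167.57} = 0.97 \]

Strong positive linear relationship.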
8.3 SIMPLE LINEAR REGRESSION EQUATION
A linear regression equation is a mathematical equation that can be used to predict the
values of a dependent variable from known values of an independent variable. This equation
represents a straight line, so it has the form \(y = \beta_0 + \beta_1 x\), where \(\beta_1\) is the slope and
\(\beta_0\) is the intercept.
The basic simple linear regression model is given by

\[ y = \beta_0 + \beta_1 x + \varepsilon \]

where
x = independent variable
y = dependent variable
\(\beta_0\) = intercept of the line
\(\beta_1\) = slope of the line
\(\varepsilon\) = random error component
This regression line is estimated from the collected data by fitting a straight line to the
data set. The straight line fitted to the data set is the estimated linear regression line

\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]
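For any data point \((x_i, y_i)\), the deviation or error is given by

\[ e_i = y_i - \hat{y}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \]

The sum of squared errors,

\[ SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 \]

is minimized by finding the partial derivatives with respect to \(\hat{\beta}_0\) and \(\hat{\beta}_1\) and setting them equal to zero. That is,

\[ \frac{\partial SSE}{\partial \hat{\beta}_0} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \]

\[ \frac{\partial SSE}{\partial \hat{\beta}_1} = -2 \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \]

Rearranging gives the two normal equations

\[ n \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \]

\[ \hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i \]

whose solution is

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{\sum_{i=1}^{n} y_i}{n} - \hat{\beta}_1 \frac{\sum_{i=1}^{n} x_i}{n} \]

\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \]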
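As a check on the algebra, the two normal equations can be solved directly as a 2x2 linear system and compared with the closed-form solution for the slope and intercept. The following is a minimal Python sketch with made-up data; the function names `solve_normal_equations` and `closed_form` are illustrative.

```python
def solve_normal_equations(xs, ys):
    """Solve n*b0 + b1*sum(x) = sum(y) and b0*sum(x) + b1*sum(x^2) = sum(xy)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    det = n * sum_x2 - sum_x ** 2              # determinant of the 2x2 system
    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b1 = (n * sum_xy - sum_x * sum_y) / det
    return b0, b1

def closed_form(xs, ys):
    """Closed-form solution: b1 = S_xy / S_xx, b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b1 = s_xy / s_xx
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

# Made-up illustration data
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]
print(solve_normal_equations(xs, ys))
print(closed_form(xs, ys))
```

For this data both routes give an intercept of 0.5 and a slope of 1.4 (up to floating-point rounding), which is what the derivation predicts: the closed form is just the solved normal equations.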
Given any value of x, the predicted value of the dependent variable, \(\hat{y}\), can be found by
substituting x into the fitted simple linear regression equation.
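Prediction by substitution can be sketched in Python as a small function. The helper name `make_predictor` and the coefficients used here are hypothetical, chosen only for illustration.

```python
def make_predictor(intercept, slope):
    """Return a function that evaluates y_hat = intercept + slope * x."""
    def predict(x):
        return intercept + slope * x
    return predict

# Hypothetical fitted coefficients, chosen only for illustration
predict = make_predictor(intercept=0.5, slope=1.4)

for x in (1, 2.5, 4):
    print(x, predict(x))
```

Wrapping the fitted coefficients in a function keeps the estimation step separate from the prediction step, so the same line can be evaluated at as many x values as needed.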
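The table below summarizes the formulas used in linear regression.

Correlation coefficient:

\[ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} \]

where

\[ S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} \]

\[ S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} \]

\[ S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} \]

Regression line:

\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]

where

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{\sum_{i=1}^{n} y_i}{n} - \hat{\beta}_1 \frac{\sum_{i=1}^{n} x_i}{n} \]

\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \]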
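Example G (From Example C)

\[ S_{xx} = 127.141, \quad S_{xy} = 681.26, \quad S_{yy} = 4826.76, \quad \sum_{i=1}^{n} y_i = 451, \quad \sum_{i=1}^{n} x_i = 48.7, \quad n = 10 \]

Solution:
(a)
\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{681.26}{127.141} = 5.36 \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{451}{10} - (5.36)\frac{48.7}{10} = 45.1 - 26.1 = 19 \]

(b)
\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x = 19 + 5.36x \]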
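Example H (From Example D)

\[ S_{xx} = 556.87, \quad S_{xy} = -136.89, \quad S_{yy} = 1082.21, \quad \sum_{i=1}^{n} y_i = 208.1, \quad \sum_{i=1}^{n} x_i = 519, \quad n = 8 \]

Solution:
(a)
\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{-136.89}{556.87} = -0.25 \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{208.1}{8} - (-0.25)\frac{519}{8} = 26.01 + 16.22 = 42.23 \]

The estimated slope is \(\hat{\beta}_1 = -0.25\) and the estimated intercept is \(\hat{\beta}_0 = 42.23\).
(b)
\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x = 42.23 - 0.25x \]

(c) When x = 35,
\[ \hat{y} = 42.23 - 0.25(35) = 33.48 \]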
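Example I
From Example E

\[ S_{xx} = 114.86, \quad S_{xy} = -416, \quad S_{yy} = 1690, \quad \sum_{i=1}^{n} y_i = 511, \quad \sum_{i=1}^{n} x_i = 57, \quad n = 7 \]

Solution:
(a)
\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{-416}{114.86} = -3.62 \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{511}{7} - (-3.62)\frac{57}{7} = 73 + 29.48 = 102.48 \]

(b)
\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x = 102.48 - 3.62x \]

(c) When x = 2,
\[ \hat{y} = 102.48 - 3.62(2) = 95.24 \]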
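Example J
From Example F

\[ S_{xx} = 61.71, \quad S_{xy} = 163.14, \quad S_{yy} = 455, \quad \sum_{i=1}^{n} y_i = 100, \quad \sum_{i=1}^{n} x_i = 31, \quad n = 7 \]

Solution:
(a)
\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{163.14}{61.71} = 2.64 \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{100}{7} - (2.64)\frac{31}{7} = 14.29 - 11.69 = 2.6 \]

(b)
\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x = 2.6 + 2.64x \]

(c) When x = 3,
\[ \hat{y} = 2.6 + 2.64(3) = 10.52 \]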
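The regression examples built on the data of Examples C, D, E and F all start from summary statistics rather than raw data, so they can be reproduced with one small function. The Python sketch below is an illustration only: the name `fit_from_summary` is made up, and rounding the slope to two decimals before computing the intercept is an assumption inferred from the printed arithmetic, not a rule stated in these notes.

```python
def fit_from_summary(s_xy, s_xx, sum_y, sum_x, n):
    """Slope and intercept from summary statistics.

    The slope is rounded to 2 decimals before it enters the intercept,
    mirroring the worked arithmetic (an assumed rounding convention).
    """
    slope = round(s_xy / s_xx, 2)
    intercept = sum_y / n - slope * (sum_x / n)
    return slope, intercept

# (S_xy, S_xx, sum of y, sum of x, n), keyed by the example that supplied the data
examples = {
    "C": (681.26, 127.141, 451, 48.7, 10),
    "D": (-136.89, 556.87, 208.1, 519, 8),
    "E": (-416, 114.86, 511, 57, 7),
    "F": (163.14, 61.71, 100, 31, 7),
}

results = {name: fit_from_summary(*stats) for name, stats in examples.items()}
for name, (slope, intercept) in results.items():
    print(f"Data of Example {name}: y_hat = {intercept:.2f} + ({slope:.2f})x")
```

The rounding order matters. For the Example D data, using the unrounded slope of about -0.2458 would give an intercept near 41.96 instead of 42.23, so results should state which convention was used.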
Example K [FULL EXERCISE]
A study is conducted to determine the relationship between a driver's age and the number of
accidents he/she was involved in over a year.
Driver's age, x           16   24   18   17   23   27   32
Number of accidents, y    3    2    5    2    0    1    1
(a) Construct a scatter diagram for these data. Does the scatter diagram exhibit a linear
relationship between driver's age and number of accidents?
(b) Calculate the value of the correlation coefficient, r, and interpret the result.
(c) Find the estimates of the slope \(\hat{\beta}_1\) and the intercept \(\hat{\beta}_0\).
(d) Find the regression line equation.
(e) Predict the number of accidents for a driver who is 20 years old.
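Parts (b) to (e) follow the same steps as the earlier examples, so a hand solution can be checked with a short script. The following Python sketch is one way to do that; the variable names are illustrative, and no expected results are quoted because the notes give none for this exercise.

```python
from math import sqrt

# Example K data
ages = [16, 24, 18, 17, 23, 27, 32]       # x: driver's age
accidents = [3, 2, 5, 2, 0, 1, 1]         # y: number of accidents

n = len(ages)
sum_x, sum_y = sum(ages), sum(accidents)
s_xx = sum(x * x for x in ages) - sum_x ** 2 / n
s_yy = sum(y * y for y in accidents) - sum_y ** 2 / n
s_xy = sum(x * y for x, y in zip(ages, accidents)) - sum_x * sum_y / n

r = s_xy / sqrt(s_xx * s_yy)               # part (b)
slope = s_xy / s_xx                        # part (c)
intercept = sum_y / n - slope * sum_x / n  # part (c)
predicted = intercept + slope * 20         # part (e): driver aged 20

print(f"r = {r:.2f}")
print(f"y_hat = {intercept:.2f} + ({slope:.2f})x")   # part (d)
print(f"predicted accidents at age 20: {predicted:.2f}")
```

Unlike the worked examples, this sketch keeps the unrounded slope when computing the intercept, so the last decimal place may differ slightly from a hand calculation that rounds at each step.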
Example L [FULL EXERCISE]
The data below represent scores obtained by 10 primary school students before and after they
were taken on a tour of the museum (which is supposed to increase their interest in history).
Before, x    65   63   76   46   68   72   68   57   36   96
After, y     68   66   86   48   65   66   71   57   42   87
(a) Construct a scatter diagram for these data. Does the scatter diagram exhibit a linear
relationship between the scores before and after the tour of the museum?
(b) Calculate the value of the correlation coefficient, r, and interpret the result.
(c) Find the estimates of the slope \(\hat{\beta}_1\) and the intercept \(\hat{\beta}_0\).
(d) Find the regression line equation.
(e) Predict the score after the tour for a student who scored 60 marks before the tour.
Answer:
(a) Yes. The scatter diagram exhibits a linear relationship.
(b) r = 0.94 (strong positive linear relationship)
(c) \(\hat{\beta}_1 = 0.82\), \(\hat{\beta}_0 = 12.31\)
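The printed answers for parts (b) and (c) can be reproduced from the raw scores. The Python sketch below wraps the calculation in a helper, `regression_summary`, whose name and return format are illustrative assumptions.

```python
from math import sqrt

def regression_summary(xs, ys):
    """Return r, slope and intercept computed from S_xx, S_yy and S_xy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    slope = s_xy / s_xx
    return {
        "r": s_xy / sqrt(s_xx * s_yy),
        "slope": slope,
        "intercept": sum_y / n - slope * sum_x / n,
    }

# Example L data
before = [65, 63, 76, 46, 68, 72, 68, 57, 36, 96]   # x
after = [68, 66, 86, 48, 65, 66, 71, 57, 42, 87]    # y

summary = regression_summary(before, after)
predicted_after = summary["intercept"] + summary["slope"] * 60   # part (e)

print({name: round(value, 2) for name, value in summary.items()})
print(round(predicted_after, 2))
```

Rounded to two decimals, the computed values agree with the printed answers of r = 0.94, slope 0.82 and intercept 12.31. The intercept matches only when the unrounded slope is used, which suggests that is how the answer key was computed.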