
Linear Regression
OBJECTIVES
• Define what is meant by linear regression and correlation
• Construct the line of regression on a scattergram by calculating the intercept and slope
• Interpret the relationships using predictions, r and r²
When two or more variables are measured on each individual, we can estimate the relationship of one variable with another as a linear function of one on the other.

Usually performed to predict or estimate the value of one variable corresponding to a given value of another variable.
Applicable to continuous variables.
Assumptions Underlying Linear Regression
• For each value of X, there is a group of Y values, and these Y values are normally distributed.
• The means of these normal distributions of Y values all lie on the straight line of regression.
• The standard deviations of these normal distributions are equal.
• The Y values are statistically independent. This means that in the selection of a sample, the Y values chosen for a particular X value do not depend on the Y values for any other X values.
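The four assumptions describe how Y values are generated around the regression line. As a minimal sketch, the code below simulates data that satisfies them; the population line Y = 2 + 0.5X, the common standard deviation of 1 and the group sizes are hypothetical choices for illustration, not values from these notes.

```python
import random

# Hypothetical population values (assumptions for illustration only)
A_TRUE, B_TRUE, SIGMA = 2.0, 0.5, 1.0

def simulate(x_values, n_per_x, seed=0):
    """Draw n_per_x Y values for each X from a normal distribution
    centred on the hypothetical line A_TRUE + B_TRUE * X."""
    rng = random.Random(seed)              # seeded so the run is reproducible
    groups = {}
    for x in x_values:
        mean = A_TRUE + B_TRUE * x         # every group mean lies on the line
        # Same SIGMA for every X (equal spread); each draw is independent
        groups[x] = [rng.gauss(mean, SIGMA) for _ in range(n_per_x)]
    return groups

groups = simulate([0, 5, 10], n_per_x=2000)
for x, ys in groups.items():
    sample_mean = sum(ys) / len(ys)
    print(f"X = {x:2d}: sample mean {sample_mean:.2f}, line value {A_TRUE + B_TRUE * x:.2f}")
```

With large groups, each sample mean should sit close to the corresponding point on the line.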
Regression Analysis

• In regression analysis we use the independent variable (X) to estimate the dependent variable (Y).
• The relationship between the variables is linear.
• Linear regression extends to continuous predictor variables, for example:
– Systolic blood pressure predicted by body mass index
– Body mass index predicted by caloric intake
– Caloric intake predicted by a measure of stress
• Important to specify a biologically plausible model; implausible examples would be:
– Systolic blood pressure predicted by eye color
– Body mass index predicted by visual acuity
[Figure: number of rings in a tree (Y) against age in years (X); line Y = X]
[Figure: tree height in mm (Y) against age in days (X); line Y = bX, with b = 1/7 = 0.143, hence Y = 0.143X]
[Figure: Y against mcg of drug/mm³ in blood (X); line Y = a + bX]
Regression Analysis
The regression equation: Y = a + bX, where:
• Y is the average predicted value of Y for any X.
• a is the Y-intercept. It is the estimated Y value when X = 0.
• b is the slope of the line, or the average change in Y for each change of one unit in X.
• The least squares principle is used to obtain a and b.
Regression Analysis
• The least squares principle is used to obtain a and b. The equations to determine a and b are:
$$b = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{n\sum X^2 - \left(\sum X\right)^2}$$
$$a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$$
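A minimal sketch that transcribes the two least-squares formulas directly into Python. The function name `least_squares` is hypothetical; it builds the same sums (ΣX, ΣY, ΣXY, ΣX²) that appear in the formulas from raw X and Y lists.

```python
def least_squares(xs, ys):
    """Return (a, b) for Y = a + bX using the sum formulas for b and a."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all X values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom     # slope
    a = sum_y / n - b * sum_x / n                # intercept
    return a, b

# Points lying exactly on Y = 2 + 3X recover a = 2 and b = 3
print(least_squares([0, 1, 2, 3], [2, 5, 8, 11]))   # (2.0, 3.0)
```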
EXAMPLE 1
• Nadeem Arshad, the student body president at Punjab University, is concerned about the cost of textbooks to students. He believes there is a relationship between the number of pages in a text and the selling price of the book.
• To provide insight into the problem he
selects a sample of eight textbooks currently
on sale in the bookstore. Draw a scatter
diagram. Compute the correlation
coefficient.
EXAMPLE 1 continued

Book                  Pages  Price (Rs)
Intro to History        500          84
Basic Algebra           700          75
Intro to Psych          800          99
Intro to Sociology      600          72
Bus. Mgmt               400          69
Intro to Biology        500          81
Fund. of Jazz           600          63
Prin. of Nursing        800          93
Example 1 continued
[Figure: Scatter Diagram of Number of Pages (X, 400 to 800) and Selling Price of Text (Y, Rs 60 to 100)]
Example 1 continued

X = number of pages, Y = price (Rs)
Book                     X     Y        XY          X²       Y²
Intro to History       500    84    42,000     250,000    7,056
Basic Algebra          700    75    52,500     490,000    5,625
Intro to Psych         800    99    79,200     640,000    9,801
Intro to Sociology     600    72    43,200     360,000    5,184
Bus. Mgmt              400    69    27,600     160,000    4,761
Intro to Biology       500    81    40,500     250,000    6,561
Fund. of Jazz          600    63    37,800     360,000    3,969
Prin. of Nursing       800    93    74,400     640,000    8,649
Total                4,900   636   397,200   3,150,000   51,606
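The column totals can be checked mechanically. This sketch re-enters the eight (pages, price) pairs from the table and recomputes each total; the dictionary keys are just labels chosen for this example.

```python
# Pages (X) and price in Rs (Y) for the eight textbooks in Example 1
pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]

totals = {
    "X": sum(pages),
    "Y": sum(price),
    "XY": sum(x * y for x, y in zip(pages, price)),
    "X2": sum(x * x for x in pages),
    "Y2": sum(y * y for y in price),
}
print(totals)
# {'X': 4900, 'Y': 636, 'XY': 397200, 'X2': 3150000, 'Y2': 51606}
```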
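Example 1 continued
Develop a regression equation for the information given that can be used to estimate the selling price based on the number of pages.
$$b = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{n\sum X^2 - \left(\sum X\right)^2} = \frac{8(397{,}200) - (4{,}900)(636)}{8(3{,}150{,}000) - (4{,}900)^2} = \frac{61{,}200}{1{,}190{,}000} \approx 0.05143$$
$$a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} = \frac{636}{8} - 0.05143\left(\frac{4{,}900}{8}\right) \approx 48.0$$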
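The same arithmetic in code, as a minimal sketch that uses only the column totals and prints the intermediate numerator and denominator so each step of the hand calculation can be checked.

```python
n = 8
sum_x, sum_y = 4_900, 636
sum_xy, sum_x2 = 397_200, 3_150_000

numerator = n * sum_xy - sum_x * sum_y      # 3,177,600 - 3,116,400
denominator = n * sum_x2 - sum_x ** 2       # 25,200,000 - 24,010,000
b = numerator / denominator                 # slope
a = sum_y / n - b * sum_x / n               # intercept: 79.5 - b * 612.5
print(numerator, denominator, round(b, 5), round(a, 1))
# 61200 1190000 0.05143 48.0
```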
Example 1 continued
The regression equation is:
Y = a + bX
Y = 48.0 + 0.05143X
• The equation crosses the Y-axis at Rs 48. Taken literally, a book with no pages would cost Rs 48, but this is an extrapolation well outside the observed range of 400 to 800 pages.
• The slope of the line is 0.05143. Each additional page costs about 0.05 of a rupee (roughly 5 paisa).
• The sign of the b value and the sign of r will always be the same.
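Why must b and r share a sign? Both have the same numerator, n(ΣXY) − (ΣX)(ΣY), and both denominators are positive. The sketch below uses the Example 1 totals to show this and to check the standard identity b = r · s_y / s_x; here s_y / s_x is computed as the square root of the ratio of the two bracketed sums, since the common n(n − 1) factor cancels. The variable names are chosen for this example.

```python
import math

n = 8
sum_x, sum_y = 4_900, 636
sum_xy, sum_x2, sum_y2 = 397_200, 3_150_000, 51_606

s_xy = n * sum_xy - sum_x * sum_y     # shared numerator of b and r
s_xx = n * sum_x2 - sum_x ** 2        # never negative
s_yy = n * sum_y2 - sum_y ** 2        # never negative

b = s_xy / s_xx                       # least-squares slope
r = s_xy / math.sqrt(s_xx * s_yy)     # coefficient of correlation
print(round(b, 5), round(r, 3), round(r * math.sqrt(s_yy / s_xx), 5))
# 0.05143 0.614 0.05143
```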
Example 1 continued
We can use the regression equation to estimate values of Y.
• The estimated selling price of an 800-page book is Rs 89.14, found by
$$Y = 48.0 + 0.05143X = 48.0 + 0.05143(800) \approx 89.14$$
Correlation
• Correlation measures the strength of the association between two continuous variables.
Correlation and regression refer to the relationship that exists between two variables, X and Y, in the case where each particular value of X is paired with one particular value of Y.
Correlation and regression are two sides of the same coin. In the underlying logic, you can begin with either one and end up with the other.
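One concrete sense in which the two are "two sides of the same coin": regressing Y on X and X on Y gives two different slopes, but their product equals r². The sketch below checks this on the Example 1 textbook data; the helper `slope` is hypothetical.

```python
# Example 1 data: pages (X) and price in Rs (Y)
pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return (n * sxy - sx * sy) / (n * sxx - sx ** 2)

b_yx = slope(pages, price)   # price regressed on pages
b_xy = slope(price, pages)   # pages regressed on price
print(round(b_yx, 5), round(b_xy, 3), round(b_yx * b_xy, 4))
# 0.05143 7.328 0.3768
```

The product 0.3768 is the coefficient of determination r² for these data.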
A Scatter Diagram is a chart that portrays the relationship between the two variables.
The Dependent Variable is the variable being predicted or estimated.
The Independent Variable provides the basis for estimation. It is the predictor variable.
Correlation as Covariation
When you find two variables to be correlated, the
fundamental meaning of that fact is that the particular
paired instances of X and Y that you have observed
tend to co-vary.

The positive or negative sign of r, the coefficient of correlation, indicates the direction of covariation, and the magnitude of r², the coefficient of determination, provides a measure of the degree of covariation.
Thus, when we find a correlation coefficient of r = -0.86, the fundamental meaning of this numerical fact is that the particular paired instances of X and Y show some degree of covariation, and that the direction of that covariation is negative, or inverse.
When we square r to get the coefficient of determination, r² = 0.74, the fundamental meaning of this numerical fact is that the degree of covariation is 74%.
That is, 74% of the variance of the Y variable is coupled with variability in the X variable; similarly, 74% of the variance of the X variable is associated with variability in the Y variable.
Conversely, you could say that 26% of the variance of Y is unrelated to variability in X.
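The arithmetic behind the r = -0.86 illustration, as a short sketch: the sign gives the direction, squaring gives the shared share of variance, and the remainder is the unrelated share.

```python
r = -0.86
direction = "inverse" if r < 0 else "direct"   # sign of r gives direction
r_squared = r ** 2                             # coefficient of determination
unrelated = 1 - r_squared                      # share of variance not coupled with X
print(direction, round(r_squared, 2), round(unrelated, 2))   # inverse 0.74 0.26
```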
Conversely, you could say that 26% of the
variance of Y is unrelated to variability in X.
The coefficient of correlation measures the relationship between X and Y by asking the question: how much of the variation in the value of Y can be explained by the variation in the value of X? (Strictly, that proportion is r², the coefficient of determination.)
[Figure: scatter plot of thigh circumference (cm) against knee circumference (cm), followed by example scatter plots of y against x]
[Figures: scatter diagrams of Y against X (both axes 0 to 10) titled Perfect Negative Correlation, Perfect Positive Correlation, Zero Correlation and Strong Positive Correlation]
The Coefficient of Correlation, r
The Coefficient of Correlation (r) is a measure of the strength of the linear relationship between two variables.
It requires interval or ratio-scaled data.
It can range from -1.00 to 1.00.
Values of -1.00 or 1.00 indicate perfect correlation.
Values close to 0.0 indicate weak correlation.
Negative values indicate an inverse relationship and positive values indicate a direct relationship.
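The patterns in the four scatter diagrams can be reproduced with small constructed datasets. In this sketch the X values 1 to 9 and the four Y patterns are made up for illustration, and `pearson_r` is a hypothetical helper using the deviation form of r. The "zero" pattern is a symmetric U shape: r is 0 because there is no linear trend, even though Y clearly depends on X.

```python
import math

def pearson_r(xs, ys):
    """Coefficient of correlation from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
patterns = {
    "perfect positive": [x for x in xs],                         # Y = X
    "perfect negative": [10 - x for x in xs],                    # Y = 10 - X
    "zero": [(x - 5) ** 2 for x in xs],                          # U shape, no linear trend
    "strong positive": [x + (1 if x % 2 else -1) for x in xs],   # line plus alternating offset
}
results = {name: pearson_r(xs, ys) for name, ys in patterns.items()}
for name, r in results.items():
    print(f"{name:17s} r = {r:+.3f}")
```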
Formula for r
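We calculate the coefficient of correlation from the following formulae:
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1)\,s_x s_y} = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$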

Coefficient of Determination
The coefficient of determination (r²) is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X).
It is the square of the coefficient of correlation.
It ranges from 0 to 1.
It does not give any information on the direction of the relationship between the variables.
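The "proportion of total variation explained" reading can be checked directly: fit the least-squares line, then divide the variation of the fitted values about the mean of Y by the total variation of Y about its mean. A minimal sketch on the Example 1 data; for a straight-line fit with an intercept this ratio equals r².

```python
pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]
n = len(pages)

# Least-squares fit using the regression formulas
sum_x, sum_y = sum(pages), sum(price)
sum_xy = sum(x * y for x, y in zip(pages, price))
sum_x2 = sum(x * x for x in pages)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

y_bar = sum_y / n
fitted = [a + b * x for x in pages]
total_ss = sum((y - y_bar) ** 2 for y in price)        # total variation in Y
explained_ss = sum((f - y_bar) ** 2 for f in fitted)   # variation explained by X
r_squared = explained_ss / total_ss
print(round(r_squared, 4))   # 0.3768
```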
Example 1 continued

n(XY )  (X )(Y )


r
n(X 2
)  (X ) 2
nY   Y  
2 2

8(397,200)  ( 4,900)( 636)



8(3,150,000  (4,900) 8(51,606)  (636) 
2 2

 0.614

r2 = 0.3769
