
Correlation & Regression

By
Dr. Manas Kumar Pal
Simple Correlation & Regression
Introduction:
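Both correlation and regression are used to study the
relationship between variables. Correlation measures the
strength and direction of the relationship; regression
describes its form so that one variable can be predicted
from another.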

Correlation
When two or more variables move in sympathy with
each other, they are said to be correlated. If both
variables move in the same direction, they are
said to be positively correlated. If the variables move
in opposite directions, they are said to be
negatively correlated. If they move haphazardly,
there is no correlation between them.
Types of Correlation

a. Positive or Negative
b. Simple, Partial and Multiple
c. Linear and Nonlinear
Positive correlation:
Both variables (X and Y) vary in the same
direction. If variable X increases, variable Y also
increases; if variable X decreases, variable Y also
decreases.

Negative Correlation:
The given variables vary in opposite directions. If
one variable increases, the other decreases, and
if one variable decreases, the other increases.
Simple, Partial and Multiple correlations:
In simple correlation, the relationship between two
variables is studied.

In partial and multiple correlation, three or more
variables are studied.

In partial correlation, more than two variables are
studied, but the effect of one variable is held constant
and the relationship between the other two variables is
studied. In multiple correlation, three or more variables
are studied simultaneously.
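
As an illustration of "holding one variable constant", the standard first-order partial correlation formula can be sketched in Python. The formula itself is not given in the slides, and the three pairwise correlations below are hypothetical values chosen only for the example.

```python
import math

def partial_corr(r12: float, r13: float, r23: float) -> float:
    """Partial correlation between variables 1 and 2, holding variable 3 constant.

    Uses the standard first-order formula:
        r12.3 = (r12 - r13 * r23) / sqrt((1 - r13^2) * (1 - r23^2))
    """
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Hypothetical pairwise (simple) correlations, for illustration only.
r12, r13, r23 = 0.8, 0.6, 0.5
r12_3 = partial_corr(r12, r13, r23)
print(f"r12.3 = {r12_3:.4f}")
```

If variable 3 is uncorrelated with both of the others (r13 = r23 = 0), the partial correlation reduces to the simple correlation r12.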

Linear and Non-Linear correlation:


In linear correlation, the amount of change in one
variable bears a constant ratio to the amount of change
in the other variable, so the plotted points lie close to
a straight line.

This is not so in non-linear correlation.


Measures of correlation

i) Scatter Diagram
ii) Karl Pearson's correlation coefficient
iii) Spearman's Rank correlation coefficient
Scatter Diagram

A scatter diagram shows the direction in which the two
variables are related, but it does not give any quantitative
measure for comparison between sets of data.
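Karl Pearson's correlation coefficient

$$r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y}$$

$$r = \frac{\operatorname{Covariance}(x, y)}{\sqrt{\operatorname{var} x}\,\sqrt{\operatorname{var} y}} = \frac{\dfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}\,\sqrt{\sum_{i=1}^{n} y_i^2 - n\bar{y}^2}}$$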
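
A minimal Python sketch of the two equivalent forms of Pearson's formula (the deviation form and the computational form with raw sums) may help when checking hand calculations. The function names and the small data set are illustrative assumptions, not part of the slides.

```python
import math

def pearson_r(x, y):
    """Deviation form: sum of cross-deviations over the root of the squared deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def pearson_r_shortcut(x, y):
    """Computational form: raw sums corrected by n times the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    num = sum(xi * yi for xi, yi in zip(x, y)) - n * mean_x * mean_y
    den_x = sum(xi ** 2 for xi in x) - n * mean_x ** 2
    den_y = sum(yi ** 2 for yi in y) - n * mean_y ** 2
    return num / math.sqrt(den_x * den_y)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4), round(pearson_r_shortcut(x, y), 4))
```

Both functions return the same value up to floating-point rounding; the computational form is usually the easier one to tabulate by hand.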
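Rank correlation coefficient

$$r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y} = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$

where x and y are the ranks of the two series, d_i is the
difference between the two ranks of the i-th item, and n is
the number of pairs.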
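
The rank formula can be sketched in a few lines of Python. This assumes the ranks are already assigned and contain no ties; the function name and the sample ranks are hypothetical.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Rank correlation 1 - 6*sum(d^2) / (n*(n^2 - 1)), assuming untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical ranks, for illustration only.
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 4, 3, 5]
print(spearman_from_ranks(rank_x, rank_y))
```

Identical rankings give +1 and exactly reversed rankings give -1, which is a quick sanity check on any implementation.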
Properties of Correlation Coefficient.

Its value always lies between -1 and +1.
It is a relative measure
(does not have any unit attached to it).
It is the geometric mean of the two regression coefficients.
It is symmetrical: the correlation of x with y equals that of y with x.
It is not affected by change of origin or change of scale.
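
The symmetry and invariance properties can be checked numerically. The sketch below uses hypothetical data and arbitrary shift and scale constants; note that it assumes positive scale factors, since multiplying one variable by a negative number reverses the sign of r.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient in deviation form."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]

r_original = pearson_r(x, y)

# Change of origin and (positive) scale: u = (x - 6) / 2, v = 3y + 7.
u = [(xi - 6) / 2 for xi in x]
v = [3 * yi + 7 for yi in y]
r_transformed = pearson_r(u, v)

# Symmetry: swapping the roles of x and y leaves r unchanged.
r_swapped = pearson_r(y, x)

# A negative scale factor flips the sign of r.
r_negated = pearson_r(x, [-yi for yi in y])

print(r_original, r_transformed, r_swapped, r_negated)
```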
Problems

Q1. Given below are the monthly incomes and net savings of a sample
of 10 employees in an organization. Calculate the correlation
coefficient.

Emp. No.                       1   2   3   4   5   6   7   8   9  10
Monthly income (in '000 Rs)   78  36  98  25  75  82  90  62  65  39
Net savings                    8   5   9   6   6   6   8   5   5   4
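
One way to check a hand calculation for Q1 is to tabulate the sums used in the computational form of Pearson's formula. This is a sketch using the data from the table above; the variable names are illustrative.

```python
import math

# Data from Q1: monthly income (in '000 Rs) and net savings of 10 employees.
income = [78, 36, 98, 25, 75, 82, 90, 62, 65, 39]
savings = [8, 5, 9, 6, 6, 6, 8, 5, 5, 4]

n = len(income)
mean_x = sum(income) / n
mean_y = sum(savings) / n

sum_xy = sum(x * y for x, y in zip(income, savings))
sum_x2 = sum(x ** 2 for x in income)
sum_y2 = sum(y ** 2 for y in savings)

# r = (sum xy - n*xbar*ybar) / sqrt((sum x^2 - n*xbar^2) * (sum y^2 - n*ybar^2))
numerator = sum_xy - n * mean_x * mean_y
denominator = math.sqrt((sum_x2 - n * mean_x ** 2) * (sum_y2 - n * mean_y ** 2))
r = numerator / denominator

print(f"mean income = {mean_x}, mean savings = {mean_y}")
print(f"r = {r:.3f}")  # approximately 0.726
```

The result indicates a fairly strong positive correlation between income and savings in this sample.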

Q2. The following table shows the effect of an individual's social
status striving upon his purchase behaviour. Find the rank
correlation coefficient (RCC).

Individual                     1   2   3   4   5   6   7   8   9  10
Attitude of social striving    5   4   2  10   3   1   6   9   7   8
Sales                          6   2   1   7   4   3   9   5   8  10
Q3. In a competition, two judges have given scores to the
participants. Find the rank correlation coefficient (RCC).

Participant          1   2   3   4   5   6   7   8   9  10
Scores of Judge 1   33  42  65  62  60  58  75  50  70  80
Scores of Judge 2   45  35  70  78  62  60  65  55  75  85
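
Unlike Q2, the data in Q3 are raw scores, so they must first be converted to ranks. The sketch below assumes rank 1 goes to the highest score and that there are no tied scores (true for this data); the helper names are hypothetical.

```python
def ranks_desc(scores):
    """Rank scores with 1 = highest. Assumes no ties (tied scores would need average ranks)."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]

def spearman_from_ranks(rank_x, rank_y):
    """Rank correlation 1 - 6*sum(d^2) / (n*(n^2 - 1)), assuming untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Data from Q3.
judge1 = [33, 42, 65, 62, 60, 58, 75, 50, 70, 80]
judge2 = [45, 35, 70, 78, 62, 60, 65, 55, 75, 85]

rank1 = ranks_desc(judge1)
rank2 = ranks_desc(judge2)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
rcc = spearman_from_ranks(rank1, rank2)

print("Judge 1 ranks:", rank1)
print("Judge 2 ranks:", rank2)
print(f"sum of d^2 = {sum_d2}, RCC = {rcc:.3f}")  # approximately 0.879
```

Ranking in ascending order instead would give the same coefficient, as long as both series are ranked the same way.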
Regression
Definition:
It is used to predict the value of one variable (the
dependent variable) on the basis of other
variables (the independent variables).
Types of Regression
Simple Regression:
One dependent variable and one independent
variable.
Multiple Regression:
One dependent variable and many independent
variables.
Regression

The technique is used to predict the value of one
variable (the dependent variable y) based on
the values of other variables (the independent
variables x1, x2, ..., xn).

Dependent variable: denoted Y

Independent variables: denoted X1, X2, ..., Xn
The Model
The first order linear model
$$y = a + bx + \varepsilon$$
y = dependent variable
x = independent variable
a = y-intercept
b = slope of the line
ε = error variable
a and b are unknown and are therefore estimated
from the data.

[Figure: a straight line plotted on x and y axes, with y-intercept a]
Regression

The best line is the one that minimizes the
sum of squared vertical differences between
the points and the line, which can be obtained
by the principle of least squares. The smaller
the sum of squared differences, the better the
fit of the line to the data.
To calculate the estimates of the coefficients that
minimize the differences between the data points
and the line, use the formulas:

$$b = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}, \qquad a = \bar{y} - b\bar{x}$$

The regression equation that estimates the equation
of the first order linear model is:

$$\hat{y} = a + bx$$
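
The least-squares estimates can be sketched directly from the covariance and variance formulas. The function name and the data below are hypothetical and serve only to show the arithmetic.

```python
def fit_line(x, y):
    """Least-squares estimates for y = a + b*x using b = cov(x, y) / var(x), a = ybar - b*xbar."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n
    b = cov_xy / var_x
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
print(f"fitted line: y = {a:.2f} + {b:.2f} x")

# Predicted value of y at a new x (here x = 6).
y_hat = a + b * 6
print(f"prediction at x = 6: {y_hat:.2f}")
```

Because the same divisor n appears in both the covariance and the variance, it cancels in the ratio; using n - 1 in both places would give the same slope.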
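$$b_{yx} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = r\,\frac{\sigma_y}{\sigma_x}$$

and

$$b_{xy} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} y_i^2 - n\bar{y}^2} = r\,\frac{\sigma_x}{\sigma_y}$$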
Properties of Regression Coefficient.

The geometric mean of the two regression
coefficients is the correlation coefficient
(taken with the sign of the coefficients).
The arithmetic mean of the two regression
coefficients is greater than or equal to the
correlation coefficient (in absolute value).
If one regression coefficient is greater than one,
the other regression coefficient must be less than
one.
Regression coefficients are not affected by a change
of origin but are affected by a change of scale.
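
A short numerical check shows how the geometric and arithmetic means of the two regression coefficients compare with r, and how the coefficients respond to a change of origin and of scale. The data and helper name are hypothetical.

```python
import math

def regression_coefficients(x, y):
    """Return (b_yx, b_xy, r) computed from the raw-sum formulas."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * mean_x * mean_y
    sxx = sum(xi ** 2 for xi in x) - n * mean_x ** 2
    syy = sum(yi ** 2 for yi in y) - n * mean_y ** 2
    return sxy / sxx, sxy / syy, sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b_yx, b_xy, r = regression_coefficients(x, y)
geometric_mean = math.sqrt(b_yx * b_xy)   # equals |r|
arithmetic_mean = (b_yx + b_xy) / 2       # at least |r| in magnitude

# Change of origin (subtract 3 from every x): b_yx is unchanged.
b_yx_shifted = regression_coefficients([xi - 3 for xi in x], y)[0]

# Change of scale (multiply every x by 10): b_yx is divided by 10.
b_yx_scaled = regression_coefficients([10 * xi for xi in x], y)[0]

print(b_yx, b_xy, r)
print(geometric_mean, arithmetic_mean)
print(b_yx_shifted, b_yx_scaled)
```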
Q. Given the following information about
advertisement expenditure and sales of a company.

                             Advertisement in lakh (x)   Sales (y)
Arithmetic mean (A.M.)                  10                   90
Standard deviation (S.D.)                5                   12

Correlation coefficient = 0.8

1. Obtain the two regression equations.
2. Find the sales when the advertisement budget is 15 lakh.
3. What should the advertisement budget be if the company wants
to attain a sales target of 12 lakh?
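
When only the means, standard deviations and r are given, the regression coefficients follow from b_yx = r·σ_y/σ_x and b_xy = r·σ_x/σ_y, and each line passes through the point of means. The sketch below applies this to the figures in the question; the function names are illustrative.

```python
# Summary statistics from the question.
mean_x, mean_y = 10.0, 90.0   # advertisement (lakh), sales
sd_x, sd_y = 5.0, 12.0
r = 0.8

# Regression coefficients from r and the standard deviations.
b_yx = r * sd_y / sd_x   # slope of the regression of y on x
b_xy = r * sd_x / sd_y   # slope of the regression of x on y

# Each line passes through (mean_x, mean_y), which fixes the intercepts.
a_yx = mean_y - b_yx * mean_x
a_xy = mean_x - b_xy * mean_y

def predict_sales(advertisement):
    """Regression of y on x: estimate sales from advertisement expenditure."""
    return a_yx + b_yx * advertisement

def predict_advertisement(sales):
    """Regression of x on y: estimate advertisement expenditure from a sales target."""
    return a_xy + b_xy * sales

print(f"y on x: y = {a_yx:.2f} + {b_yx:.2f} x")
print(f"x on y: x = {a_xy:.2f} + {b_xy:.4f} y")
print(f"estimated sales at x = 15: {predict_sales(15):.1f}")
```

Part 3 uses the regression of x on y in the same way, by passing the sales target to `predict_advertisement`.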
THANK YOU
