
Correlation and Regression Analysis

Correlation
• Every day, managers make professional and personal decisions
that are based on predictions of future events.

• To make these forecasts, they rely on the relationship between
what is already known and what is to be estimated.

• If the decision makers can determine how the known is related
to the future event, they can aid the decision-making process
considerably.

• The important question here is how to determine the
relationship between variables.
Examples
• Family income and expenditure on luxury items
• Yield of a crop and quantity of fertilizer used
• Sales revenue and expenses incurred on advertising
• Frequency of smoking and lung damage
• Weight and height of individuals
• Number of accidents and number of motorcars in a city
• Demand for a commodity and its price

• In each of the above cases there is some relationship between
the variables.
Meaning and Definition
• A statistical technique that is used to analyse the strength
(magnitude) and direction of the relationship between
variables is called correlation analysis.

• “When the relationship is of a quantitative nature, the
appropriate statistical tool for discovering and measuring the
relationship and expressing it in a brief formula is known as
correlation.” (Croxton and Cowden)

• Correlation analysis is the statistical tool we can use to
describe the degree to which one variable is linearly related
to another.
Significance of Correlation

• 1. Measures, in one figure, the degree of relationship
between the variables.

• 2. Reflects the progressive development in the methods of
science and philosophy, which is characterized by increasing
knowledge of the nature of relationships.

• 3. Greatly helps in prediction analysis.

• 4. Contributes to the understanding of economic behavior.


Types of Correlation
• Positive Correlation

• Negative correlation

• Simple Correlation

• Partial Correlation

• Multiple correlation

• Linear Correlation

• Non-linear correlation
Positive Correlation

• Positive Correlation: If two variables change in the same
direction (i.e. if one increases the other also increases, or if
one decreases, the other also decreases), it is called a positive
correlation.
• Example
• (i) Advertising and Sales (ii) Heights and Weights (iii) Income
and Expenditure
• Negative Correlation: If two variables change in the opposite
direction (i.e. if one increases, the other decreases and vice
versa), the correlation is called negative correlation.
• Example
• Price and Demand
• Simple Correlation: Correlation is said to be Simple when
only two variables are studied.
• Example
• Supply and Demand
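• Partial and Multiple Correlations: When we study three or
more variables, it is known as either Multiple or Partial
correlation. In Multiple correlation three or more variables are
studied simultaneously. In Partial correlation the relationship
between two variables is studied while the effect of the other
variables is held constant.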
• Example
• Crop Yield depends upon Temperature and rainfall.
• Linear Correlation: If the amount of change in one variable
tends to bear a constant ratio to the amount of change in the
other variable, then the correlation is said to be linear. If we
plot these points on a graph we obtain a straight line.

• Non-linear correlation: If the amount of change in one
variable does not bear a constant ratio to the amount of change
in the other variable, the correlation is said to be Non-linear
or Curvi-linear. If we plot these points we obtain a curve.
Measurement
• Statisticians have developed two measures for describing the
correlation between two variables:

• The coefficient of correlation: The coefficient of correlation is
a number that indicates the strength and direction of the statistical
relationship between two variables.

• The coefficient of determination: The coefficient of
determination is the primary way we can measure the extent,
or strength, of association that exists between two variables.
Properties of Correlation Coefficient
• Property 1
• The correlation coefficient always lies between -1 and +1.
• Property 2
• Correlation coefficient is independent of change of origin and
scale.
• Property 3
• The sign of the correlation coefficient indicates the direction
of the relationship.
• Property 4
• The magnitude of the correlation coefficient indicates the
degree of correlation, which is commonly described as high,
moderate or low.
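
Properties 1 to 3 can be checked numerically. The following is a minimal sketch, assuming the deviation form of Pearson's coefficient; the helper name pearson_r and the small data set are illustrative and not taken from the slides. The change of scale uses positive scale factors, since a negative factor reverses the sign of the coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical illustrative data (not from the source).
xs = [10, 20, 30, 40, 50]
ys = [12, 25, 31, 48, 52]

r_original = pearson_r(xs, ys)

# Property 2: change of origin and scale, u = (x - 30) / 10, v = (y - 25) / 2.
us = [(x - 30) / 10 for x in xs]
vs = [(y - 25) / 2 for y in ys]
r_shifted = pearson_r(us, vs)

# Property 3: reversing the direction of one variable flips only the sign.
r_negated = pearson_r(xs, [-y for y in ys])

print(r_original, r_shifted, r_negated)
```

The first two printed values should agree up to rounding, and the third should be the same number with the opposite sign.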
METHODS OF DETERMINING CORRELATION

• Scatter Diagram Method

• Karl Pearson’s Coefficient of Correlation

• Spearman’s Rank Correlation Method

• Method of Least Squares


Scatter Diagram Method
• The scatter diagram method is an at-a-glance method to
understand an apparent relationship (if any) between two
variables.

• A scatter diagram (or a graph) can be drawn on graph paper
by plotting pairs of values of the variables. A line drawn
through these plotted points indicates the type of
relationship between the two variables.
Karl Pearson’s Correlation Coefficient

• A numeric measure of correlation is the one given by Karl
Pearson.

n = Number of pairs of observations

r = coefficient of correlation
Calculation
• The following table relates to the age of employees and the number
of days they reported sick in a month. Here x and y are deviations
from the means (46 and 4).

Age (X)   Sick Days (Y)   x = X - 46   y = Y - 4     xy      x²     y²
   30           1            -16          -3          48     256      9
   32           0            -14          -4          56     196     16
   35           2            -11          -2          22     121      4
   40           5             -6           1          -6      36      1
   48           2              2          -2          -4       4      4
   50           4              4           0           0      16      0
   52           6              6           2          12      36      4
   55           5              9           1           9      81      1
   57           7             11           3          33     121      9
   61           8             15           4          60     225     16
Total 460      40                                    230    1092     64
Calculation (contd.)

• Interpretation: Since the value of r is positive, the age of
employees and the number of sick days are positively correlated
to a high degree.
• Hence, we may conclude that as the age of an employee
increases, the employee is likely to go on sick leave more often.
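
The calculation can be reproduced with a short script. This is a sketch under one assumption: the slide's formula is not preserved in the text, so the deviation form r = Σxy / √(Σx² · Σy²), which matches the table's columns, is used here. The variable names are illustrative. Recomputing from the raw data gives Σxy = 230, Σx² = 1092 (the sum of the ten squared deviations listed in the table) and Σy² = 64, so r is approximately 0.87.

```python
from math import sqrt

# Data from the table: age of employees and sick days in a month.
ages = [30, 32, 35, 40, 48, 50, 52, 55, 57, 61]
sick_days = [1, 0, 2, 5, 2, 4, 6, 5, 7, 8]

n = len(ages)
mean_x = sum(ages) / n       # 46.0
mean_y = sum(sick_days) / n  # 4.0

# Deviations from the means (columns x and y of the table).
dx = [x - mean_x for x in ages]
dy = [y - mean_y for y in sick_days]

sum_xy = sum(a * b for a, b in zip(dx, dy))
sum_x2 = sum(a * a for a in dx)
sum_y2 = sum(b * b for b in dy)

# Deviation form of Pearson's coefficient (assumed, see lead-in).
r = sum_xy / sqrt(sum_x2 * sum_y2)

print(f"sum xy = {sum_xy}, sum x^2 = {sum_x2}, sum y^2 = {sum_y2}")
print(f"r = {r:.3f}")
```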
Coefficient of determination
• The coefficient of determination always has a value between 0
and 1.
• While squaring the value of correlation coefficient, the
information about the strength of the relationship is retained
but the information about the direction is lost.
• The value of the coefficient of determination represents the
proportion (or percentage) of the total variability in the
dependent variable, y, that is explained by the independent
variable.
• The coefficient of determination (r²) is defined as the ratio of
the explained variance to the total variance.
• If the value of r = 0.9, r² will be 0.81, which means
that 81 percent of the variation in the dependent variable is
explained by the independent variable.
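
The ratio definition can be illustrated on the age and sick-days data used earlier. This is a sketch that assumes a least-squares straight line is used to obtain the "explained" part; the variable names are illustrative. It compares explained variation divided by total variation with the square of r.

```python
# Data from the age / sick-days example.
ages = [30, 32, 35, 40, 48, 50, 52, 55, 57, 61]
sick_days = [1, 0, 2, 5, 2, 4, 6, 5, 7, 8]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(sick_days) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, sick_days))
sxx = sum((x - mean_x) ** 2 for x in ages)
syy = sum((y - mean_y) ** 2 for y in sick_days)  # total variation in y

# Least-squares line y = intercept + slope * x (assumed fitting method).
slope = sxy / sxx
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in ages]

# Variation in y accounted for by the fitted line.
explained = sum((p - mean_y) ** 2 for p in predicted)

r_squared_from_ratio = explained / syy
r_squared_from_r = sxy ** 2 / (sxx * syy)

print(f"explained / total = {r_squared_from_ratio:.4f}")
print(f"r squared         = {r_squared_from_r:.4f}")
```

Both printed values should agree, at roughly 0.757, meaning about 76 percent of the variation in sick days is accounted for by age in this data set.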
Spearman’s Rank Correlation Coefficient

• In 1904, the British psychologist Charles Edward Spearman
developed a method to measure the statistical association
(relationship) between two variables when ordinal (or rank)
data are available.
• This implies that Spearman’s rank correlation coefficient
method is applied in situations where a quantitative measure of
qualitative factors such as judgment, brand personalities,
beauty, intelligence, honesty, TV programmes or leadership
cannot be fixed, but individual observations can be arranged in
a definite order (or rank).
• The ranking is done by using a set of ordinal rank numbers
with 1 for the individual observations ranked first; 2 for the
individual observation ranked second and so on either in
terms of quantity or quality.
• Mathematically, Spearman’s rank correlation coefficient is
defined as

$$R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

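• Where R is the rank correlation coefficient, d is the difference
between the two ranks of an observation and n is the number of
pairs of observations.

• The number ‘6’ in the formula is a scaling device that ensures
the possible range of R is from -1 to +1.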
• When ranks are given
• If observations in a data set are already arranged in a particular
order (rank), then take the difference, d, between the two ranks
in each pair of observations.
• Square these differences and obtain the total.
• Apply the formula of Spearman’s correlation coefficient.
• When Ranks are not given
• If observations in a data set are not arranged in a particular
order (rank), then ranks are assigned by taking either the
highest value or the lowest value as rank one and so on for
values of both the variables.
• When ranks are equal
• If two or more observations of equal size are found when
ranking the observations in the data set (taking either
the highest or the lowest value as rank one), then the rank
assigned to each of them is the average of the ranks that these
observations would otherwise have occupied.

• For example: if two observations are ranked equal at third
place, then the average rank (3 + 4)/2 = 3.5 is assigned to
these two observations.
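• The modified Spearman rank correlation coefficient formula
for such a case is given by

$$R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right]}{n(n^2 - 1)}$$

• Where m_1, m_2, ... are the numbers of observations sharing
each tied rank.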
Example
• Ten competitors in a beauty contest are ranked by two Judges
in the following order.

Judge 1 (R1)   Judge 2 (R2)   D = R1 - R2    D²
      1              3             -2          4
      6              5              1          1
      5              8             -3          9
     10              4              6         36
      3              7             -4         16
      2             10             -8         64
      4              2              2          4
      9              1              8         64
      7              6              1          1
      8              9             -1          1
Total                                        200
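
The "ranks are given" procedure can be sketched for the judges' data as follows. The variable names are illustrative; the formula is the one defined above, R = 1 - 6Σd² / (n(n² - 1)), which applies here because neither judge assigns a tied rank.

```python
# Ranks awarded by the two judges to the ten competitors.
judge_1 = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
judge_2 = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]

n = len(judge_1)

# Step 1: difference between the two ranks of each competitor.
d = [r1 - r2 for r1, r2 in zip(judge_1, judge_2)]

# Step 2: square the differences and total them.
sum_d2 = sum(x * x for x in d)

# Step 3: apply Spearman's formula.
R = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(f"sum of d^2 = {sum_d2}")
print(f"R = {R:.4f}")
```

With Σd² = 200 and n = 10 this gives R of about -0.21, a weak negative association between the two judges' rankings.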
Example
• Calculate Spearman’s coefficient of rank correlation using
the data given.

  X     Y     R1     R2    D = R1 - R2     D²
 18    23    1      1          0           0
 20    27    2      2          0           0
 21    29    3      5         -2           4
 22    28    4      3.5        0.5         0.25
 27    28    5.5    3.5        2           4
 27    31    5.5    7         -1.5         2.25
 28    35    7      9         -2           4
 29    30    9      6          3           9
 29    36    9     10         -1           1
 29    33    9      8          1           1
Total                                     25.5
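
A sketch of the tied-ranks calculation for this data set follows. The helper names average_ranks and tie_correction are illustrative, not from the slides. Ranks are assigned from the lowest value upward, tied values receive the average of the ranks they occupy, and the term (m³ - m)/12 is added to Σd² for every group of m tied values in either variable.

```python
from collections import Counter


def average_ranks(values):
    """Rank from the smallest value (rank 1); ties get the average rank."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1  # first rank position occupied by v
        count = ordered.count(v)      # number of observations tied at v
        ranks.append(first + (count - 1) / 2)
    return ranks


def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


xs = [18, 20, 21, 22, 27, 27, 28, 29, 29, 29]
ys = [23, 27, 29, 28, 28, 31, 35, 30, 36, 33]

rank_x = average_ranks(xs)
rank_y = average_ranks(ys)

n = len(xs)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
correction = tie_correction(xs) + tie_correction(ys)

# Modified Spearman formula for tied ranks.
R = 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

print(f"sum of d^2 = {sum_d2}, tie correction = {correction}")
print(f"R = {R:.4f}")
```

Here X has one pair and one triple of tied values and Y has one pair, so the correction is 0.5 + 2 + 0.5 = 3, and with Σd² = 25.5 the coefficient is about 0.83.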
