
CORRELATION

By - Prof. Prashant

A measure of association between two numerical variables.
If two quantities vary in such a way that movements in one are accompanied by movements in the other, these quantities are correlated. Correlation analysis is used as a statistical tool to ascertain the association between two variables.
Correlation analysis deals with the association between two or more variables. - Simpson and Kafka

The problem in analyzing the association between two variables can be broken down into three steps:

1. Determine whether the two variables are related or independent of each other.
2. If there is a relationship between the two variables, determine its nature and strength (i.e., whether it is positive or negative, and how close the relationship is).
3. Determine whether there is a causal relationship between them, i.e., whether variation in one variable causes variation in the other.

Example: Typically, in the summer, as the temperature increases, people become thirstier.

For seven random summer days, a person recorded the temperature and their water consumption during a three-hour period spent outside.

Temperature (F)    Water Consumption (ounces)
75                 16
83                 20
85                 25
85                 27
92                 32
97                 48
99                 48


CORRELATION & CAUSATION


Correlation may be due to chance, particularly when the data pertain to a small sample.
Both variables may be influenced by one or more other variables.
The two variables may be influencing each other, so that we cannot say which is the cause and which is the effect.

TYPES OF CORRELATION
Positive and negative

Simple, partial and multiple


Linear and non-linear

Positive and negative


Whether the correlation is positive (direct) or negative (inverse) would depend upon the direction of change of the variables.

[Figure panels: Positive Correlation; Negative Correlation]

Simple, partial and multiple


The distinction between simple, partial and multiple correlation is based on the number of variables studied. When only two variables are studied, it is a problem of simple correlation. When three or more variables are studied, it is a problem of either multiple or partial correlation.

Linear and non-linear (Curvilinear)


The distinction between linear and non-linear correlation is based upon the constancy of the ratio of change between the variables.

If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, then the correlation is said to be linear. If the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, then the correlation is said to be non-linear (curvilinear).
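The constant-ratio idea can be sketched in a few lines of Python. This is only an illustration: the data sets are made up, the helper names `change_ratios` and `is_linear` are hypothetical, and the check is exact, whereas in real data the ratio only tends to be constant.

```python
def change_ratios(xs, ys):
    """Ratio of the change in y to the change in x between consecutive points."""
    return [
        (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    ]


def is_linear(xs, ys, tol=1e-9):
    """True when every consecutive change ratio equals the first one (within tol)."""
    ratios = change_ratios(xs, ys)
    return all(abs(r - ratios[0]) <= tol for r in ratios)


# Illustrative (made-up) data: one straight line, one curve.
xs = [1, 2, 3, 4, 5]
linear_ys = [2 * x + 1 for x in xs]   # change in y is always 2 per unit of x
curved_ys = [x ** 2 for x in xs]      # change in y grows as x grows

print(change_ratios(xs, linear_ys), is_linear(xs, linear_ys))
print(change_ratios(xs, curved_ys), is_linear(xs, curved_ys))
```

The first series has the same ratio between every pair of neighbouring points, so it is linear; in the second the ratio keeps changing, so it is curvilinear.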

[Figure: plots of y against x showing positive linear correlation, negative linear correlation, curvilinear relationships, and no correlation]

DIFFERENT METHODS OF STUDYING CORRELATION


Scatter Diagram Method

Graphic Method

Karl Pearson's Coefficient of Correlation

Rank Correlation Method

Concurrent Deviation Method

SCATTER DIAGRAM METHOD


The simplest method for ascertaining whether two variables are related is to prepare a dot chart called a scatter diagram. When this method is used, the given data are plotted on graph paper in the form of dots. For each pair of X and Y values we put a dot, and thus obtain as many points as the number of observations.

Discussed more in Regression
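As a rough illustration of putting one dot per (X, Y) pair, the sketch below draws a text-only scatter diagram of the temperature and water consumption data from the earlier example. The function name `scatter_rows` and the grid size are arbitrary choices for this sketch, and it assumes neither series is constant.

```python
def scatter_rows(xs, ys, width=40, height=12):
    """Return text rows with one '*' per (x, y) observation."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column / row index on the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(cells) for cells in grid]


temperature = [75, 83, 85, 85, 92, 97, 99]   # X values (degrees F)
water = [16, 20, 25, 27, 32, 48, 48]         # Y values (ounces)

rows = scatter_rows(temperature, water)
print("\n".join(rows))
```

The dots rise from the lower left to the upper right, which is the pattern a positive correlation produces.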

[Figure: scatter diagrams of y against x showing a strong degree of positive correlation, a strong degree of negative correlation, a weak degree of positive correlation, and a weak degree of negative correlation]

Examples of Approximate Correlation (r) Values


[Figure: five scatter plots of y against x with r = -1, r = -0.6, r = 0, r = +0.3 and r = +1]

GRAPHIC METHOD
In this method, the individual values of the two variables are plotted on graph paper.

Discussed more in Regression

KARL PEARSON'S COEFFICIENT OF CORRELATION

RANK CORRELATION METHOD

CONCURRENT DEVIATION

Calculating the Correlation Coefficient


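Sample correlation coefficient:

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}} $$

or the algebraic equivalent:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}} $$

where:

r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable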
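As a brief check on why the two forms of r agree, the following sketch expands the deviation sums using only the symbols already defined, with the means written as $\bar{x} = \sum x / n$ and $\bar{y} = \sum y / n$:

$$
\begin{aligned}
\sum (x - \bar{x})(y - \bar{y}) &= \sum xy - \bar{y}\sum x - \bar{x}\sum y + n\bar{x}\bar{y} = \sum xy - \frac{\sum x \sum y}{n}, \\
\sum (x - \bar{x})^2 &= \sum x^2 - \frac{\left(\sum x\right)^2}{n}, \qquad
\sum (y - \bar{y})^2 = \sum y^2 - \frac{\left(\sum y\right)^2}{n}.
\end{aligned}
$$

Substituting these into the first form and multiplying the numerator and the denominator by n gives the algebraic form.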

Calculation Example

Tree Height (y)    Trunk Diameter (x)    xy          y²           x²
35                 8                     280         1225         64
49                 9                     441         2401         81
27                 7                     189         729          49
33                 6                     198         1089         36
60                 13                    780         3600         169
21                 7                     147         441          49
45                 11                    495         2025         121
51                 12                    612         2601         144
Σ = 321            Σ = 73                Σ = 3142    Σ = 14111    Σ = 713

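Calculation Example (continued)

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}} = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = 0.886 $$

r = 0.886: relatively strong positive linear association between x and y.

[Figure: scatter plot of Tree Height, y (axis 0 to 70) against Trunk Diameter, x (axis 0 to 14)]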
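The arithmetic of this example can be reproduced with a short Python sketch. The variable names are illustrative only; the data and the algebraic formula are the ones used in the example.

```python
from math import sqrt

# Data from the calculation example.
height_y = [35, 49, 27, 33, 60, 21, 45, 51]    # tree height, y
diameter_x = [8, 9, 7, 6, 13, 7, 11, 12]       # trunk diameter, x

n = len(diameter_x)
sum_x = sum(diameter_x)
sum_y = sum(height_y)
sum_xy = sum(x * y for x, y in zip(diameter_x, height_y))
sum_x2 = sum(x * x for x in diameter_x)
sum_y2 = sum(y * y for y in height_y)

# Algebraic form of the sample correlation coefficient.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 73 321 3142 713 14111
print(round(r, 3))                           # 0.886
```

The column totals match those in the table, and the rounded result matches the value of r stated in the example.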

ALGEBRAIC METHOD (COVARIANCE METHOD)


$$ r = \frac{\mathrm{Cov}(x, y)}{\sigma_x \, \sigma_y}, \qquad \mathrm{Cov}(x, y) = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y}) $$
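A minimal Python sketch of the covariance method follows, applied to the temperature and water consumption data from the earlier example. It assumes the population standard deviation (divisor n), so that it matches the 1/n in the covariance; the function names are hypothetical.

```python
from statistics import mean, pstdev


def covariance(xs, ys):
    """Cov(x, y) = (1/n) * sum of (x - mean_x) * (y - mean_y)."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)


def correlation_from_covariance(xs, ys):
    """r = Cov(x, y) / (sigma_x * sigma_y), using population standard deviations."""
    return covariance(xs, ys) / (pstdev(xs) * pstdev(ys))


temperature = [75, 83, 85, 85, 92, 97, 99]   # degrees F
water = [16, 20, 25, 27, 32, 48, 48]         # ounces

r = correlation_from_covariance(temperature, water)
print(round(r, 3))
```

Using the sample standard deviation (divisor n - 1) together with a 1/n covariance would mix two conventions and give a slightly different value, which is why `pstdev` is used here.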

PROCESS OF CALCULATION
Calculate the means of the two series, X and Y. Take deviations of the individual values from their respective means, denoted x and y. In each case the deviation is the value of the individual item minus (−) the arithmetic mean. Square the deviations in both series and obtain the sums of the squared deviations. This gives Σx² and Σy².

PROCESS (Contd..)
Take the products of the deviations and sum them to obtain Σxy: each deviation in one series is multiplied by the corresponding deviation in the other series, and the products are then added. The values obtained in the preceding steps, Σxy, Σx² and Σy², are then used in the formula for correlation.
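The process can be sketched step by step in Python using the tree data from the calculation example. The final line assumes the deviation form r = Σxy / √(Σx² · Σy²), which is the sample correlation formula written with x and y as deviations from the means; the variable names are illustrative.

```python
from math import sqrt

X = [8, 9, 7, 6, 13, 7, 11, 12]        # trunk diameter
Y = [35, 49, 27, 33, 60, 21, 45, 51]   # tree height

# Step 1: means of the two series.
mean_x = sum(X) / len(X)
mean_y = sum(Y) / len(Y)

# Step 2: deviations from the means (item minus mean).
dev_x = [value - mean_x for value in X]
dev_y = [value - mean_y for value in Y]

# Step 3: sums of squared deviations.
sum_x2 = sum(d * d for d in dev_x)
sum_y2 = sum(d * d for d in dev_y)

# Step 4: sum of the products of corresponding deviations.
sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))

# Step 5: correlation from the three sums (assumed deviation form).
r = sum_xy / sqrt(sum_x2 * sum_y2)
print(round(r, 3))  # 0.886
```

The result agrees with the value obtained from the algebraic formula for the same data.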

SHORT-CUT METHOD
Choose convenient values as the assumed means of the two series, X and Y. Deviations (now dx and dy instead of x and y) are taken from the assumed means in the same manner as before. Obtain the sums of the dx and dy columns, that is, Σdx and Σdy.

SHORT-CUT METHOD (Contd)


The deviations dx and dy are squared and their totals, Σdx² and Σdy², are obtained. Finally, obtain Σdxdy, which is the sum of the products of the deviations taken from the assumed means in the two series.
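The slides list the quantities to compute but do not show the final formula. The sketch below assumes the standard assumed-mean form, r = (nΣdxdy − ΣdxΣdy) / √[(nΣdx² − (Σdx)²)(nΣdy² − (Σdy)²)], and applies it to the tree data from the calculation example. The assumed means (9 and 40, then 10 and 45) are arbitrary choices for illustration, and the function name is hypothetical.

```python
from math import sqrt


def shortcut_r(xs, ys, assumed_x, assumed_y):
    """Correlation coefficient from deviations about assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = sqrt(
        (n * sum_dx2 - sum_dx ** 2) * (n * sum_dy2 - sum_dy ** 2)
    )
    return numerator / denominator


diameter_x = [8, 9, 7, 6, 13, 7, 11, 12]
height_y = [35, 49, 27, 33, 60, 21, 45, 51]

r_a = shortcut_r(diameter_x, height_y, assumed_x=9, assumed_y=40)
r_b = shortcut_r(diameter_x, height_y, assumed_x=10, assumed_y=45)
print(round(r_a, 3), round(r_b, 3))  # 0.886 0.886
```

Both choices of assumed means give the same r, which is why any convenient values can be used: the correction terms ΣdxΣdy, (Σdx)² and (Σdy)² cancel the effect of the shift.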
