By - Prof. Prashant
CORRELATION
A measure of association between two numerical variables.
If two quantities vary in such a way that movements in one are accompanied by movements in the other, these quantities are correlated. Correlation analysis is used as a statistical tool to ascertain the association between two variables.
Correlation analysis deals with the association between two or more variables. - Simpson and Kafka
The problem in analyzing the association between two variables can be broken down into three steps:
Try to know whether the two variables are related or independent of each other.
If there is a relationship between the two variables, then know its nature & strength. (i.e., positive/negative; how close is that relationship)
Know if there is a causal relationship between them. i.e., variation in one variable causes variation in another.
Example: Typically, in the summer, people get thirstier as the temperature increases.
On seven random summer days, a person recorded the temperature and their water consumption during a three-hour period spent outside.
Temperature (F):     75  83  85  85  92  97  99
Water consumption:   16  20  25  27  32  48  48
TYPES OF CORRELATION
Positive and negative
Positive Correlation
Negative Correlation
Simple, multiple, and partial: this classification is based on the number of variables studied. When only two variables are studied, it is a problem of simple correlation. When three or more variables are studied, it is a problem of either multiple or partial correlation.
Linear and non-linear: if the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the correlation is said to be linear. If the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, the correlation is said to be non-linear (curvilinear).
[Figure: scatter plots of y against x showing curvilinear relationships and no correlation]
SCATTER DIAGRAM
GRAPHIC METHOD
[Figure: scatter diagrams of y against x for r = -1, r = -0.6, r = 0, r = +0.3, and r = +1]
GRAPHIC METHOD
This method is used when the individual values of the two variables are plotted on graph paper.
CONCURRENT DEVIATION
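The sample correlation coefficient:

\[
r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}
\]

or, in the algebraically equivalent computational form:

\[
r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}
\]

where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable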
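As a rough check that the two forms of the formula agree, the sketch below computes r both ways for the temperature and water-consumption data from the earlier example. The function names `r_deviation_form` and `r_computational_form` are illustrative choices, not part of the original slides.

```python
from math import sqrt


def r_deviation_form(xs, ys):
    """r from deviations about the means: sum((x - x_bar)(y - y_bar)) / sqrt(...)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


def r_computational_form(xs, ys):
    """r from raw sums: (n*Sxy - Sx*Sy) / sqrt((n*Sx2 - Sx^2)(n*Sy2 - Sy^2))."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Data from the temperature / water-consumption example.
temperature = [75, 83, 85, 85, 92, 97, 99]
water = [16, 20, 25, 27, 32, 48, 48]

r_dev = r_deviation_form(temperature, water)
r_comp = r_computational_form(temperature, water)
print(round(r_dev, 3), round(r_comp, 3))
```

Both functions should return the same value up to floating-point rounding. The computational form avoids calculating the means first, which is why it is convenient for hand calculation from a table of sums.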
Calculation Example
Tree Height   Trunk Diameter
     y              x             xy        y^2       x^2
    35              8            280       1225        64
    49              9            441       2401        81
    27              7            189        729        49
    33              6            198       1089        36
    60             13            780       3600       169
    21              7            147        441        49
    45             11            495       2025       121
    51             12            612       2601       144
Sum = 321      Sum = 73     Sum = 3142  Sum = 14111  Sum = 713
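Calculation Example (continued)

\[
r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}
  = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}}
  = 0.886
\]

r = 0.886: a relatively strong positive linear association between x and y.

[Figure: scatter plot of Tree Height, y (axis 0 to 70) against Trunk Diameter, x (axis 0 to 14)]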
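The calculation example can be reproduced with a short script. This is a minimal sketch that rebuilds the column totals of the table and substitutes them into the computational formula; the variable names are illustrative.

```python
from math import sqrt

# Data from the calculation example.
height = [35, 49, 27, 33, 60, 21, 45, 51]   # tree height, y
diameter = [8, 9, 7, 6, 13, 7, 11, 12]      # trunk diameter, x

n = len(height)

# Column totals, matching the bottom row of the table.
sum_y = sum(height)
sum_x = sum(diameter)
sum_xy = sum(x * y for x, y in zip(diameter, height))
sum_y2 = sum(y * y for y in height)
sum_x2 = sum(x * x for x in diameter)

# Substitute the totals into the computational formula for r.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print("totals:", sum_y, sum_x, sum_xy, sum_y2, sum_x2)
print("r =", round(r, 3))
```

The totals come out as 321, 73, 3142, 14111, and 713, and r rounds to 0.886, in agreement with the slide.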
PROCESS OF CALCULATION
Calculate the means of the two series, X and Y.
Take the deviations of the two series from their respective means, denoted x and y. In each case, the deviation is the value of the individual item minus (-) the arithmetic mean.
Square the deviations in both series and obtain the sums of the squared deviations. This gives Σx² and Σy².
PROCESS (Contd..)
Take the products of the deviations and sum them to obtain Σxy. That is, each deviation in one series is multiplied by the corresponding deviation in the other series, and the products are added.
The values obtained in the preceding steps, Σxy, Σx², and Σy², are then used in the formula for correlation.
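The steps above can be sketched in code. The example below reuses the tree data from the calculation example (an illustrative choice; the slides do not say which data to use here) and follows the process literally: means, deviations, squared deviations, products, then the formula r = Σxy / sqrt(Σx² · Σy²), where x and y now denote deviations from the means.

```python
from math import sqrt

# Tree data reused from the calculation example (illustrative choice).
X = [8, 9, 7, 6, 13, 7, 11, 12]        # trunk diameter
Y = [35, 49, 27, 33, 60, 21, 45, 51]   # tree height

# Step 1: means of the two series.
mean_x = sum(X) / len(X)
mean_y = sum(Y) / len(Y)

# Step 2: deviations, each item minus the arithmetic mean.
x_dev = [value - mean_x for value in X]
y_dev = [value - mean_y for value in Y]

# Step 3: sums of the squared deviations.
sum_x2 = sum(d * d for d in x_dev)
sum_y2 = sum(d * d for d in y_dev)

# Step 4: sum of the products of corresponding deviations.
sum_xy = sum(dx * dy for dx, dy in zip(x_dev, y_dev))

# Step 5: substitute into the deviation form of the formula.
r = sum_xy / sqrt(sum_x2 * sum_y2)
print("r =", round(r, 3))
```

Because the deviations are taken from the actual means, they are usually fractional (here the means are 9.125 and 40.125), which is what makes this route tedious by hand.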
SHORT-CUT METHOD
Choose convenient values as the assumed means of the two series, X and Y.
Obtain the deviations from the assumed means (now dx and dy instead of x and y) in the same manner as before.
Obtain the sums of the dx and dy columns, that is, Σdx and Σdy.
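A minimal sketch of the short-cut method follows. The text here stops after the sums of dx and dy, so the remaining steps are an assumption: the sketch uses the standard assumed-mean formula r = (NΣdxdy - ΣdxΣdy) / sqrt([NΣdx² - (Σdx)²][NΣdy² - (Σdy)²]). The tree data and the assumed means 9 and 40 are illustrative choices, not values from the slides.

```python
from math import sqrt

# Tree data reused from the calculation example (illustrative choice).
X = [8, 9, 7, 6, 13, 7, 11, 12]        # trunk diameter
Y = [35, 49, 27, 33, 60, 21, 45, 51]   # tree height
N = len(X)

# Convenient assumed means (hypothetical values, chosen to keep dx, dy whole).
assumed_x = 9
assumed_y = 40

# Deviations from the assumed means.
dx = [value - assumed_x for value in X]
dy = [value - assumed_y for value in Y]

# Column sums.
sum_dx = sum(dx)
sum_dy = sum(dy)
sum_dx2 = sum(d * d for d in dx)
sum_dy2 = sum(d * d for d in dy)
sum_dxdy = sum(a * b for a, b in zip(dx, dy))

# Assumed-mean formula; the correction terms in sum_dx and sum_dy
# compensate for the assumed means not being the true means.
numerator = N * sum_dxdy - sum_dx * sum_dy
denominator = sqrt((N * sum_dx2 - sum_dx ** 2) * (N * sum_dy2 - sum_dy ** 2))
r = numerator / denominator
print("r =", round(r, 3))
```

With whole-number assumed means, all deviations stay integers, and the result matches the value obtained from the actual means.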