
Chapter I: Correlation

1. Introduction

In the previous chapters we studied the characteristics of a single variable, for example marks, weights, heights, rainfall, prices, ages or sales. This type of analysis is called univariate analysis. When we study the relationship between two variables, it is called bivariate analysis. In this case the two variables are inter-related, and we are interested in finding out what type of relationship exists between them. For example, the price of a commodity and its sales are inter-related: as the price of the product increases the quantity sold decreases, and as the price decreases the quantity sold increases. We can therefore conclude that there is some relationship between price and sales. Such a relationship between two or more variables is called correlation.

2. Definition of correlation

Correlation analysis attempts to determine the degree of relationship between variables. In other words, correlation is an analysis of the co-variation between two or more variables. Correlation expresses the relationship or inter-dependence of two sets of variables in such a way that changes in the value of one variable result in changes in the other. One variable may be called the independent variable and the other the dependent variable. For instance, agricultural production depends on rainfall. Rainfall causes an increase in agricultural production, but an increase in agricultural production has no impact on rainfall. Therefore rainfall is the independent variable and production is the dependent variable.

2.1 Usefulness

Correlation is useful in the natural and social sciences. Here, however, we study the uses of correlation in business and economics.

1. Correlation is very useful to economists for studying the relationship between variables such as price and quantity demanded. For businessmen, it helps to estimate costs, sales, prices and other related variables.
2. Correlation analysis helps in measuring the degree of relationship between variables such as supply and demand, price and supply, income and expenditure, etc.
3. With the help of correlation analysis the relation between variables can be verified and tested. Correlation analysis helps to reduce the range of uncertainty of our predictions.
4. Correlation analysis is the basis for regression analysis.

3. Types of correlation

There are different types of correlation, but the important types are:
o Positive and negative
o Simple, partial and multiple
o Linear and non-linear

3.1 Positive and negative correlation

Whether correlation is positive or negative depends on the direction of change of the variables. If two variables tend to move together in the same direction, the relationship is called positive or direct correlation. That is, an increase in the value of one variable is accompanied by an increase in the value of the other, and a decrease in the value of one variable is accompanied by a decrease in the value of the other. Rainfall and yield of crops, and price and supply, are examples of positive correlation.
Chapter: Correlation

If two variables tend to move in opposite directions, the relationship is called negative or inverse correlation. It means that (i) an increase in the value of one variable is accompanied by a decrease in the value of the other, and (ii) a decrease in the value of one variable is accompanied by an increase in the value of the other. Price and demand, and yield of crops and price, are examples of negative correlation.

3.2 Simple, partial and multiple correlation

When we study only two variables, the relationship is described as simple correlation, for example supply of money and price level, or demand and price. In multiple correlation we study more than two variables simultaneously, for example the relationship of price, demand and supply of a commodity. When there is a multifactor relationship but we study the correlation of only two variables, excluding the others, it is called partial correlation. For example, there is a relationship among price, demand and supply, but when we study the correlation between price and demand while excluding the impact of supply, it is a partial correlation.

3.3 Linear and non-linear (curvilinear) correlation

If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the correlation is said to be linear. In such a case, if we plot the variables on a graph, we get a straight line. The example below shows linear correlation between the variables X and Y (every increase of 5 in X is accompanied by an increase of 4 in Y):

X    5   10   15   20   25
Y    4    8   12   16   20

Correlation is called non-linear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable. For example, if we double the amount of rainfall, the production of rice or wheat would not necessarily be doubled. In most practical cases we find a non-linear relation between the variables.

4. Methods of calculating correlation

The different methods of finding out the relationship between two variables are:

Graphic methods
1. Scatter diagram or scattergram
2. Simple graph or correlogram

Mathematical methods
o Karl Pearson's coefficient of correlation
o Spearman's rank coefficient of correlation
o Coefficient of concurrent deviation
o Method of least squares

4.1 Graphic method

4.1.1 Scatter diagram method


This is the simplest method of finding out whether there is any relationship between two variables. In this method, the given data are plotted on graph paper in the form of dots, with the values of X on the horizontal axis and the values of Y on the vertical axis. The pattern formed by the dots shows the relationship between the two variables.

[Diagrams 1 to 5: scatter diagrams]

If the plotted points form a straight line running from the lower left-hand corner to the upper right-hand corner, there is perfect positive correlation (Diagram 1). On the other hand, if the points lie on a straight line falling from the upper left-hand corner to the lower right-hand corner, there is perfect negative or inverse correlation (Diagram 2). If the plotted points fall in a narrow band rising from the lower left-hand corner to the upper right-hand corner, there is positive correlation between the variables (Diagram 3). If the plotted points fall in a narrow band from the upper left-hand corner to the lower right-hand corner, there is negative correlation (Diagram 4). If the plotted points lie scattered all over the diagram, there is no correlation between the two variables (Diagram 5).

Merits of the graphic method
o The scatter diagram is a simple and attractive method of finding out the nature of correlation between two variables.
o It is a non-mathematical method of studying correlation and is easy to understand.
o We can get a rough idea at a glance whether the correlation is positive or negative.
o It is not influenced by extreme items.
o It is a first step in finding out the relationship between two variables.

Demerits of the graphic method
o It gives only a rough idea.
o It does not give the exact (quantified) degree of correlation between the two variables.
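As an illustration only, a scatter diagram can be sketched in plain Python by marking each (X, Y) pair on a character grid. The data are the linear example from section 3.3. The function name `ascii_scatter` and the grid size are assumptions made for this sketch, and it assumes that neither series is constant.

```python
def ascii_scatter(xs, ys, width=21, height=11):
    """Return a text scatter diagram: X runs left to right, Y bottom to top."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the diagram
    return "\n".join("".join(line) for line in grid)


X = [5, 10, 15, 20, 25]
Y = [4, 8, 12, 16, 20]
print(ascii_scatter(X, Y))
```

For these data the marks rise from the lower left-hand corner to the upper right-hand corner, which is the pattern described for positive correlation.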

4.1.2 Simple graph or correlogram

If the values of the two variables X and Y are plotted on graph paper, we get two curves, one for X and another for Y. These two curves reveal whether or not the variables are related. If both curves move in the same direction, parallel to each other, upward or downward, the correlation between X and Y is said to be positive. If they move in opposite directions, the correlation is said to be negative. This method is used in the case of time series. It does not, however, reveal the extent to which the variables are related to each other.

Example

Draw a correlation graph from the following data:

Period            January   February   March   April   May   June
Variable 1 (X)       15        18        22      20     25     20
Variable 2 (Y)       30        35        43      41     51     40

[Figure: correlation graph of X and Y by month, January to June]
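The reading of the correlogram (do the two curves rise and fall together?) can be approximated without drawing it, by comparing the direction of each month-to-month change. This is a minimal sketch on the example data; the helper `directions` is a hypothetical name introduced here, and like the graph itself it shows only the direction of the relationship, not its degree.

```python
def directions(series):
    """Sign of each period-to-period change: +1 rise, -1 fall, 0 no change."""
    signs = []
    for previous, current in zip(series, series[1:]):
        change = current - previous
        signs.append((change > 0) - (change < 0))
    return signs


X = [15, 18, 22, 20, 25, 20]   # Variable 1, January to June
Y = [30, 35, 43, 41, 51, 40]   # Variable 2, January to June

same = sum(dx == dy for dx, dy in zip(directions(X), directions(Y)))
print(f"{same} of {len(X) - 1} changes move in the same direction")
```

When most changes share the same sign the curves move together, which the text describes as positive correlation.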

4.2 Mathematical method

4.2.1 Karl Pearson's coefficient of correlation

Karl Pearson, a statistician, suggested a mathematical method for measuring the magnitude of the linear relationship between two variables. The measure is known as the Pearsonian coefficient of correlation. It is the most widely used method and is denoted by the symbol r. The formulas for calculating it are:

\[ r = \frac{\sum xy}{N \sigma_x \sigma_y} \qquad (1) \]

\[ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} \qquad (2) \]

where
$x = X - \bar{X}$
$y = Y - \bar{Y}$
$\sigma_x$ = standard deviation of series X
$\sigma_y$ = standard deviation of series Y
$N$ = number of pairs of observations

Formula (1) is applied when the standard deviations of series X and Y are available. Formula (2) does not require the standard deviations, so the coefficient of correlation can be calculated more easily with it.

The value of the coefficient of correlation lies between +1 and -1. When r = +1, there is perfect positive correlation between the variables. When r = -1, there is perfect negative correlation between the variables. When r = 0, there is no linear relationship between the variables. In practice the value usually lies strictly between these limits. For example, when r = +0.8 there is positive correlation, because r is positive, and the magnitude of the correlation is 0.8. When r = -0.5 there is negative correlation and the magnitude of the correlation is 0.5.

The steps for calculating Karl Pearson's coefficient of correlation r are as follows:

Steps:
o Find the means of the two series X and Y.
o Take the deviations of the two series from their means and denote them by x and y.
o Square the deviations and calculate Σx² and Σy².
o Multiply the deviations x and y and calculate Σxy.
o Substitute the values in the formula.

The coefficient of correlation can also be calculated directly from the values of X and Y, without calculating the deviations. This formula is used when the values of X and Y are small. The formula is:

\[ r = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[N \sum X^2 - (\sum X)^2\right]\left[N \sum Y^2 - (\sum Y)^2\right]}} \]

Problem

Calculate the coefficient of correlation from the following data:

X   12    9    8   10   11   13    7
Y   14    8    6    9   11   12    3

Solution

We know that:

\[ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} \]

where $x = X - \bar{X}$ and $y = Y - \bar{Y}$.

Following the steps above:

  X     Y     x     y    xy    x²    y²
 12    14     2     5    10     4    25
  9     8    -1    -1     1     1     1
  8     6    -2    -3     6     4     9
 10     9     0     0     0     0     0
 11    11     1     2     2     1     4
 13    12     3     3     9     9     9
  7     3    -3    -6    18     9    36
 70    63     0     0    46    28    84

We have $\bar{X}$ = 70/7 = 10 and $\bar{Y}$ = 63/7 = 9, so that Σxy = 46, Σx² = 28 and Σy² = 84.

Putting these values in the formula:

\[ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{46}{\sqrt{28 \times 84}} = \frac{46}{48.50} = 0.95 \]
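The steps of the deviation method can be sketched in Python as a cross-check on the hand-worked table. This is a minimal sketch using only the standard library; the function name `pearson_r` is an assumption of this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the means (formula 2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]          # x = X - mean of X
    dev_y = [y - mean_y for y in ys]          # y = Y - mean of Y
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_x2 = sum(a * a for a in dev_x)
    sum_y2 = sum(b * b for b in dev_y)
    return sum_xy / sqrt(sum_x2 * sum_y2)


X = [12, 9, 8, 10, 11, 13, 7]
Y = [14, 8, 6, 9, 11, 12, 3]
print(round(pearson_r(X, Y), 2))
```

Recomputing the sums this way is a quick guard against slips in the product column, where a single wrong entry changes the final coefficient.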

Problem

Calculate the coefficient of correlation from the following data:

X    57    59    62    63    64    65    55    58    57
Y   113   117   126   126   130   129   111   116   112

Solution

We know that the coefficient of correlation is:

\[ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} \]

Following the same steps as before (find the means, take the deviations x and y, square them, multiply them, and substitute in the formula):

   X      Y     x    x²     y    y²    xy
  57    113    -3     9    -7    49    21
  59    117    -1     1    -3     9     3
  62    126    +2     4    +6    36    12
  63    126    +3     9    +6    36    18
  64    130    +4    16   +10   100    40
  65    129    +5    25    +9    81    45
  55    111    -5    25    -9    81    45
  58    116    -2     4    -4    16     8
  57    112    -3     9    -8    64    24
 540   1080     0   102     0   472   216

Here $\bar{X}$ = 540/9 = 60 and $\bar{Y}$ = 1080/9 = 120.

Putting the values in the formula:

\[ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{216}{\sqrt{102 \times 472}} = \frac{216}{219.42} = 0.98 \]
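The direct formula given earlier, which works on the raw values of X and Y without deviations, should give the same coefficient for this problem. The sketch below checks that; `pearson_r_raw` is a hypothetical name used only for this illustration.

```python
from math import sqrt


def pearson_r_raw(xs, ys):
    """Pearson's r computed directly from the raw values, without deviations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


X = [57, 59, 62, 63, 64, 65, 55, 58, 57]
Y = [113, 117, 126, 126, 130, 129, 111, 116, 112]
print(round(pearson_r_raw(X, Y), 2))
```

Both forms are algebraically the same; the raw-value form simply avoids the intermediate deviation columns.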

4.2.2 Alternative method for the calculation of the coefficient of correlation

The coefficient of correlation can also be calculated by applying the following formula:

\[ r = \frac{\sum xy}{N \sigma_x \sigma_y} \]

where
$\sigma_x$ = standard deviation of series X
$\sigma_y$ = standard deviation of series Y
$N$ = number of items

For the data of the previous problem we have Σxy = 216 and N = 9, and

\[ \sigma_x = \sqrt{\frac{\sum dx^2}{N}} = \sqrt{\frac{102}{9}} = 3.37, \qquad \sigma_y = \sqrt{\frac{\sum dy^2}{N}} = \sqrt{\frac{472}{9}} = 7.24 \]

Putting these values in the formula above we get:

\[ r = \frac{\sum xy}{N \sigma_x \sigma_y} = \frac{216}{9 \times 3.37 \times 7.24} = 0.98 \]

The correlation between height and weight is very high; that is, a change in height is very likely to be accompanied by a change in weight.

Problem

Find Karl Pearson's coefficient of correlation from the following data:

Wages             100   101   102   102   100    99    97    98    96    95
Cost of living     98    99    99    97    95    92    95    94    90    91

Solution

Method I

We know that the coefficient of correlation is:

\[ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} \]

where:
o x (dx) and y (dy) are the deviations of the series X and Y from their means
o Σx² (Σdx²) and Σy² (Σdy²) are the sums of the squared deviations
o Σxy (Σdxdy) is the sum of the products of the deviations

Let us calculate these quantities:

Wage (X)    dx    dx²    C.O.L. (Y)    dy    dy²    dx·dy
   100      +1     1        98         +3     9      +3
   101      +2     4        99         +4    16      +8
   102      +3     9        99         +4    16     +12
   102      +3     9        97         +2     4      +6
   100      +1     1        95          0     0       0
    99       0     0        92         -3     9       0
    97      -2     4        95          0     0       0
    98      -1     1        94         -1     1      +1
    96      -3     9        90         -5    25     +15
    95      -4    16        91         -4    16     +16
 Total       0    54                    0    96      61

The means are $\bar{X}$ = 990/10 = 99 and $\bar{Y}$ = 950/10 = 95. We have:

Σxy = Σdxdy = 61
Σx² = Σdx² = 54
Σy² = Σdy² = 96

Putting these values in the formula above:

\[ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{61}{\sqrt{54 \times 96}} = \frac{61}{72} = 0.847 \]

The coefficient of correlation of the two series is 0.847.

Method II

We know that the coefficient of correlation is:

\[ r = \frac{\sum xy}{N \sigma_x \sigma_y} \]

where
$\sigma_x$ = standard deviation of series X
$\sigma_y$ = standard deviation of series Y
$N$ = number of items

We have N = 10 and Σxy = 61, and

\[ \sigma_x = \sqrt{\frac{\sum dx^2}{N}} = \sqrt{\frac{54}{10}} = 2.32, \qquad \sigma_y = \sqrt{\frac{\sum dy^2}{N}} = \sqrt{\frac{96}{10}} = 3.10 \]

Putting these values in the formula above:

\[ r = \frac{\sum xy}{N \sigma_x \sigma_y} = \frac{61}{10 \times 2.32 \times 3.10} = 0.848 \]
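Methods I and II can be compared side by side in a short sketch. It assumes the population standard deviation (division by N), which is what the formula for σ above uses; `pstdev` and `mean` come from Python's standard `statistics` module, and the variable names are chosen for this example only.

```python
from math import sqrt
from statistics import mean, pstdev

wages = [100, 101, 102, 102, 100, 99, 97, 98, 96, 95]
cost_of_living = [98, 99, 99, 97, 95, 92, 95, 94, 90, 91]

n = len(wages)
dx = [x - mean(wages) for x in wages]
dy = [y - mean(cost_of_living) for y in cost_of_living]
sum_dxdy = sum(a * b for a, b in zip(dx, dy))

# Method I: r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))
r_method_1 = sum_dxdy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Method II: r = sum(dx*dy) / (N * sigma_x * sigma_y), population std. deviations
r_method_2 = sum_dxdy / (n * pstdev(wages) * pstdev(cost_of_living))

print(round(r_method_1, 4), round(r_method_2, 4))
```

With unrounded standard deviations the two methods agree; the difference between 0.847 and 0.848 in the hand calculation comes only from rounding σ to two decimal places.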

The coefficient of correlation of the two series is 0.848. The small difference from Method I arises from rounding the standard deviations.

4.3 Correlation of grouped bivariate data (continuous frequency series)

When the number of observations is very large, the data are classified into class intervals and presented as a bivariate frequency distribution (correlation table). In these cases the data are arranged in a continuous frequency series. The formula for calculating the coefficient of correlation in this case is:

\[ r = \frac{\sum f\,dx\,dy - \dfrac{\sum f\,dx \sum f\,dy}{N}}{\sqrt{\sum f\,dx^2 - \dfrac{(\sum f\,dx)^2}{N}} \times \sqrt{\sum f\,dy^2 - \dfrac{(\sum f\,dy)^2}{N}}} \]

This formula is similar to the assumed-mean formula for the coefficient of correlation. The only difference is that here the deviations are multiplied by the respective frequencies.

Steps
1. Find the mid-points of the various classes for the x and y variables.
2. Calculate the step deviations of the x variable, dx.
3. Calculate the step deviations of the y variable, dy.
4. Multiply dx, dy and the respective frequencies (f) and calculate Σf dx dy.
5. Multiply dx by the respective frequency (f) and calculate Σf dx.
6. Multiply dx² by the respective frequency (f) and calculate Σf dx².
7. Multiply dy by the respective frequency (f) and calculate Σf dy.
8. Multiply dy² by the respective frequency (f) and calculate Σf dy².
9. Put the values in the above formula to calculate r.

5. Error of measurement for the coefficient of correlation

The coefficient of correlation is more reliable if the error of measurement is reduced to the minimum. The probable error of r is:

\[ \text{P.E.}_r = 0.6745 \, \frac{1 - r^2}{\sqrt{N}} \]

5.1 Functions of probable error
o If the value of r is less than the probable error, the value of r is not at all significant.
o If the value of r is more than six times the probable error, the value of r is significant.
o If the probable error is less than 0.3, the correlation should not be considered at all.
o If the probable error is small, the correlation definitely exists.

Problem

The coefficient of correlation is r = 0.8 and N = 16. Calculate P.E.r and the upper and lower limits within which the correlation varies.

Solution

\[ \text{P.E.}_r = 0.6745 \, \frac{1 - r^2}{\sqrt{N}} = 0.6745 \times \frac{1 - (0.8)^2}{\sqrt{16}} = 0.6745 \times \frac{0.36}{4} = 0.06 \]

The limits of the correlation are 0.8 ± 0.06, that is, 0.74 and 0.86.

5.2 Conditions for the use of probable error
o The number of items should be large enough. When the number of pairs of observations is small, the probable error may lead to fallacious conclusions.
o The data should follow a normal distribution.
o The items in the sample must have been selected by random sampling and in an unbiased manner.
o The statistical measure for which the probable error is computed must have been calculated from a sample.

6. Rank correlation coefficient

Qualitative characteristics cannot be measured quantitatively, so Pearson's coefficient of correlation cannot be applied to them directly. The rank method is useful in dealing with qualitative characteristics such as intelligence, beauty, morality and character. It is based on the ranks given to the observations, and it can also be used when the items are irregular, extreme, erratic or inaccurate. The rank coefficient of correlation is calculated as:

\[ R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} \]

where:
R = rank coefficient of correlation
ΣD² = sum of the squares of the differences of the two ranks
N = number of paired observations

The value of R lies between -1 and +1.

If R = +1, there is complete agreement in the order of the ranks and they run in the same direction. If R = -1, there is complete disagreement in the order of the ranks and they run in opposite directions.

We may come across two types of problems:
o Where ranks are given
o Where ranks are not given

When the actual ranks are given, the steps are:
o Compute the difference of the two ranks (R1 and R2) and denote it by D.
o Square D to get D².
o Compute ΣD².
o Apply the formula.

Problem

Two teachers were asked to rank 6 students of a class. Compute the coefficient of rank correlation and discuss the result.

Student    R1    R2
   1       10     8
   2        8    10
   3        7     6
   4        9     6
   5        5     7
   6        3     5

Solution

We know that:

\[ R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} \]

where:
R = rank coefficient of correlation
D = difference of the two ranks (R1 - R2)
N = number of paired observations

Student    R1    R2    R1 - R2    (R1 - R2)²
   1       10     8       2           4
   2        8    10      -2           4
   3        7     6       1           1
   4        9     6       3           9
   5        5     7      -2           4
   6        3     5      -2           4
                                  ΣD² = 26

We have ΣD² = 26 and N = 6, so:

\[ R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 26}{6(6^2 - 1)} = 1 - \frac{156}{210} = 1 - 0.743 = 0.257 \]

There is a very weak correlation between the rankings of the two teachers.

Problem

A group of 6 students achieved the following marks in mathematics and statistics. Show whether the marks achieved have any significance for the ranking of the students. Discuss the result.

Mathematics    95    87    76    65    60    50
Statistics     80    90    60    80    70    60

Solution

We know that:

\[ R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} \]

where:
R = rank coefficient of correlation
ΣD² = sum of the squares of the differences of the two ranks
N = number of paired observations

Mathematics    Statistics    R1    R2    R1 - R2    (R1 - R2)²
    95             80         6     5       1           1
    87             90         5     6      -1           1
    76             60         4     3       1           1
    65             80         3     5      -2           4
    60             70         2     4      -2           4
    50             60         1     2      -1           1
                                             Σ(R1 - R2)² = 12
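The rank formula used in both problems can be sketched as a small Python function that takes the two lists of ranks exactly as they appear in the tables above. The name `rank_correlation` is an assumption of this sketch, and no correction for tied ranks is attempted.

```python
def rank_correlation(ranks_1, ranks_2):
    """Rank coefficient R = 1 - 6*sum(D^2) / (N*(N^2 - 1)) for paired ranks."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ranks given by the two teachers (first problem)
teacher_1 = [10, 8, 7, 9, 5, 3]
teacher_2 = [8, 10, 6, 6, 7, 5]

# Ranks from the mathematics / statistics table (second problem)
maths_rank = [6, 5, 4, 3, 2, 1]
stats_rank = [5, 6, 3, 5, 4, 2]

print(round(rank_correlation(teacher_1, teacher_2), 3))
print(round(rank_correlation(maths_rank, stats_rank), 3))
```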

We have N = 6 and ΣD² = Σ(R1 - R2)² = 12, so:

\[ R = 1 - \frac{6 \times 12}{6(6^2 - 1)} = 1 - \frac{72}{210} = 1 - 0.343 = 0.657 \]

The rank coefficient shows that the marks achieved are correlated: the students who did well in mathematics also tended to do well in statistics.

7. Correlation of time series data

A time series is a collection of data from different time periods; one of its variables is time. A time series shows two types of fluctuations: (1) long-term and (2) short-term. If we correlate two time series, the resulting coefficient of correlation includes both long-term and short-term changes. If we wish to study the correlation of short-term changes only, the trend values (moving averages) are subtracted from the original values, as in the steps below. The correlation for time series is calculated by:

\[ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} \]

Steps required:
o Calculate the moving average (of 5, 4, 3 or 2 years, months or days) for X and Y.
o Subtract the moving average from X and Y to obtain x = X - Xma and y = Y - Yma.
o If necessary, divide the deviations by an appropriate number to reduce them.
o Calculate Σx² and Σy².
o Multiply x and y and calculate Σxy.
o Apply the formula.

Problem

Calculate the coefficient of correlation for the following time series data:

Year          1999   2000   2001   2002   2003   2004   2005
Supply (X)      30     34     40     42     46     48     50
Price (Y)       10     12     14     18     20     24     30

Solution

Here Xma and Yma are 2-year moving averages, entered against the first year of each pair of years.

Year    Supply (X)    Xma    x = X - Xma    x²    Price (Y)    Yma    y = Y - Yma    y²    xy
1999        30         32        -2          4       10         11        -1          1     2
2000        34         37        -3          9       12         13        -1          1     3
2001        40         41        -1          1       14         16        -2          4     2
2002        42         44        -2          4       18         19        -1          1     2
2003        46         47        -1          1       20         22        -2          4     2
2004        48         49        -1          1       24         27        -3          9     3
2005        50                                       30
Total                                       20                                       20    14

We have Σxy = 14, Σx² = 20 and Σy² = 20, so:

\[ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{14}{\sqrt{20 \times 20}} = \frac{14}{20} = 0.70 \]
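The time series procedure can be sketched in Python as follows. The sketch mirrors the worked table by using a 2-period moving average placed against the first year of each pair, so the last year has no deviation; that placement and the function names `moving_average_2` and `short_term_r` are assumptions of this illustration rather than a general rule.

```python
from math import sqrt


def moving_average_2(series):
    """2-period moving averages, one per consecutive pair of values."""
    return [(a + b) / 2 for a, b in zip(series, series[1:])]


def short_term_r(xs, ys):
    """Correlate deviations of each series from its 2-period moving average."""
    # zip stops at the shorter list, so the final year is left out.
    dev_x = [x - m for x, m in zip(xs, moving_average_2(xs))]
    dev_y = [y - m for y, m in zip(ys, moving_average_2(ys))]
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    return sum_xy / sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))


supply = [30, 34, 40, 42, 46, 48, 50]
price = [10, 12, 14, 18, 20, 24, 30]
print(round(short_term_r(supply, price), 2))
```

A longer moving average (3, 4 or 5 periods) would need a different alignment of the averages with the years, which this sketch does not cover.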

Problem

Find the correlation between (i) the growth of GDP and inflation and (ii) the growth of GDP and unemployment.

Appendix

Development of M2, GDP, Inflation and Unemployment in Bangladesh from 1991 to 2008

Year    M2 (Crore Taka)    M2 growth (%)    Inflation (%)    GDP (Crore Taka)    GDP growth (%)    Unemployment (%)
1991        25004.4             ---              8.31             110518               ---                1.9
1992        28524.9            14.08             4.56             119542              8.17                ---
1993        31535.6            10.55             2.73             125370              4.88                ---
1994        36403.0            15.43             3.28             135412              8.01                ---
1995        42212.3            15.96             8.87             152518             12.63                2.5
1996        45690.5             8.24             6.65             166324              9.05                ---
1997        50627.5            10.80             2.52             180701              8.64                ---
1998        55869.0            10.35             6.99             200177             10.78                ---
1999        63026.3            12.81             8.91             219695              9.75                4.3
2000        74762.4            18.62             3.41             237086              7.92                ---
2001        87174.1            16.60             1.58             253546              6.94                ---
2002        98616.0            13.13             2.36             273201              7.75                4.3
2003       113994.4            15.60             5.14             300580             10.02                ---
2004       129721.7            13.80             5.14             332973             10.78                ---
2005       151446.6            16.75             6.48             370707             11.33                4.25
2006       180674.2            19.30             7.16             415728             12.14                ---
2007       211504.4            17.06             7.20             472477             13.65                ---
2008       248795.0            17.63             9.94             545822             15.52                ---

Note: GDP: at current prices. M2: broad money.

Sources: Bangladesh Bank, Bangladesh Bank Bulletin, 1995-6 & 2004-5. Bangladesh Bank, Economic Trends, Statistical Department, Monthly, Volume XXXIV, No. 7 & XXXVII, No. 11. Statistical Yearbook of Bangladesh, Bangladesh Bureau of Statistics, Planning Division, Ministry of Planning, Government of the People's Republic of Bangladesh, 1994, 1995, 2005 & 2008.
