Correlation and Regression11

Introduction
 The introduction to associations between two
quantitative variables usually involves a discussion of
correlation and regression.
 Decision making is based upon the understanding of
the relationship between two or more variables.
 For example, a sales manager might be interested in
knowing the impact of advertising on sales.
Sales and advertising expenses of ACC Cement
 Let us analyze the data given below.

Year    Sales (million rupees)    Advertising expenses (million rupees)
2002    32260.0                   184.3
2003    33718.8                   259.8
2004    39003.7                   334.8
2005    45498.0                   321.9
2006    37235.1                   336.0
2007    78651.1                   473.3
2008    83001.8                   500.0
2009    88031.7                   590.5
2010    86092.9                   616.45
2011    104919.3                  734.6

 At this point it is very difficult to predict any kind of
relationship between the variables.
 So let us make it more visible by drawing the scatter diagram.
Scatter diagram
[Figure: scatter plot of Sales (y-axis) against Advertising Expenses (x-axis)]
 The positive slope of the graph shows that there is
a positive relationship between the two sets of variables.
 That is, an increase in advertising expenses goes
with an increase in sales volume.
 We can say that there exists a positive correlation
between the above sets of variables.
Correlation
 Correlation is a statistical tool which studies the
relationship between two variables.
 Correlation analysis involves various methods and
techniques used for studying and measuring the
extent of the relationship between the two variables.
 "Two variables are said to be in correlation if a
change in one of the variables is accompanied by a
change in the other variable".
Significance of correlation
 The study of correlation is of immense use in practical
life.
 It contributes to the understanding of economic
behaviour and aids in locating the critically important
variables on which others depend.
 In business, correlation analysis enables the executive
to estimate costs, sales, prices and other variables on
the basis of some other series with which these costs,
sales or prices may be functionally related.
 Example 1: Let X be the number of cars you buy and Y be
the amount of money you spend (assuming all the cars
cost the same). The more cars you buy, the more money
you spend.
 We call this a positive correlation.
[Figure: Amount of Money Spent (y-axis) against Number of Cars (x-axis)]
 Example 2: Let X still be the number of cars, but now Y is your
bank balance. With each car you buy, your bank balance
gets smaller and smaller. As X goes up, Y goes down.
 We call this a negative correlation.
[Figure: Bank Balance (y-axis) against Number of Cars (x-axis)]
 Example 3: Let's try a less idealized example. Let X be shirt
size and Y be shoe size. As one size goes up so does the
other, but the relationship varies from person to person.
The correlation here is still positive but not perfect.
[Figure: Shoe Size (y-axis) against Shirt Size (x-axis)]
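The three examples above can be checked numerically. The sketch below uses Pearson's correlation coefficient (defined later, in the Correlation Coefficient section). The car price, the opening bank balance and the shirt and shoe sizes are hypothetical values chosen only for illustration; they are not data from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

CAR_PRICE = 200_000         # hypothetical price per car
START_BALANCE = 1_400_000   # hypothetical opening bank balance

cars = [0, 1, 2, 3, 4, 5, 6, 7]
spent = [CAR_PRICE * n for n in cars]           # Example 1
balance = [START_BALANCE - s for s in spent]    # Example 2

# Example 3: hypothetical shirt and shoe sizes for eight people
shirt = [36, 38, 38, 40, 42, 42, 44, 46]
shoe = [6, 7, 6, 8, 8, 9, 9, 11]

r_spent = pearson_r(cars, spent)       # perfect positive relationship
r_balance = pearson_r(cars, balance)   # perfect negative relationship
r_sizes = pearson_r(shirt, shoe)       # positive but not perfect

print(r_spent, r_balance, r_sizes)
```

The first two values sit at the ends of the scale because every point lies exactly on a straight line; the third falls strictly between 0 and 1.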
Causation and correlation
 Correlation measures the relationship between two or more
variables.
 Example: When the demand for a certain product goes up, its
price tends to go up as well, so there is a positive correlation
between the two variables.
 Causation, on the other hand, means that one thing causes
the other.
 Example: When you exercise, the number of calories you
burn per minute goes up, as the former causes the latter.
 Correlation and causation can happen at the same time. In the
exercise example above there is both correlation and
causation.
 In Example 1 (buying cars) there is a clear cause at work.
Buying cars causes you to spend money.
 Now think of Example 3. Shirt and shoe sizes are
correlated.
Does this mean that wearing bigger shirts causes you to wear
bigger shoes?
Of course not.
Correlation does not imply causation
 There could be a hidden factor Z at work causing both
X and Y. In Example 3 the hidden factor might be the
person's height. Larger people usually wear bigger
shirts and shoes.
 So correlation doesn't imply causation.
 For example, there is a positive correlation between
the number of firemen fighting a fire and the size
of the fire.
 However, this doesn't mean that bringing more
firemen will cause the size of the fire to increase.
The cause runs the other way: bigger fires call for more firemen
(this is called reverse causation).
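The hidden-factor idea can be illustrated with a small simulation. This is only a sketch under invented assumptions: heights are drawn from a normal distribution, and shirt and shoe sizes are each a made-up linear function of height plus independent noise. Neither size influences the other, yet the two end up correlated.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

random.seed(0)  # fixed seed so the run is repeatable

# Hidden factor Z: height in cm (hypothetical distribution)
heights = [random.gauss(170, 10) for _ in range(1000)]

# X and Y each depend on Z only (hypothetical coefficients and noise)
shirt = [0.25 * h + random.gauss(0, 1) for h in heights]
shoe = [0.15 * h + random.gauss(0, 1) for h in heights]

r_shirt_shoe = pearson_r(shirt, shoe)

# Remove the part of each size that height explains; only noise is left
shirt_rest = [s - 0.25 * h for s, h in zip(shirt, heights)]
shoe_rest = [t - 0.15 * h for t, h in zip(shoe, heights)]
r_residual = pearson_r(shirt_rest, shoe_rest)

print(r_shirt_shoe, r_residual)
```

Once the effect of height is removed, the remaining correlation is close to zero, which is what one would expect if height alone drives both sizes.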
Scatter diagram
 A scatter diagram is a graphical technique used to
analyze the relationship between two variables.
 It shows whether or not there is correlation between
two variables.
 Two sets of data are plotted on a graph, with the
y-axis being used for the variable to be predicted and
the x-axis being used for the variable used to make the
prediction.
Correlation Coefficient
 A measure of the degree to which the movements of two
variables are associated.
 The correlation coefficient varies from -1 to +1:
$-1 \le r \le 1$
 A negative value shows that the correlation is negative:
the variables move in opposite directions.
 -1 indicates perfect negative correlation.
 Zero indicates no linear correlation.
 A positive value shows that the correlation is positive:
the variables move in the same direction.
 +1 indicates perfect positive correlation.
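[Figure: scale of r from -1 (perfectly negative) through 0 (no correlation) to +1 (perfectly positive); the strength of the correlation increases toward either end]
 The correlation coefficient is denoted by r.
 Formula for the correlation coefficient:
 $$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$
 $$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
 Where n = sample size
 x = value of independent variable
 y = value of dependent variable
 $\bar{x}$, $\bar{y}$ = sample means of x and y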
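As a worked illustration, the sketch below applies a common computational form of Pearson's r, r = (nΣxy - ΣxΣy) / sqrt([nΣx² - (Σx)²][nΣy² - (Σy)²]), to the ACC sales and advertising figures from the Introduction. The choice of this form, and the function name, are assumptions made for the example.

```python
from math import sqrt

def correlation_coefficient(xs, ys):
    """Pearson's r from raw sums (computational form)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# ACC data, 2002 to 2011 (million rupees), as listed in the Introduction
advertising = [184.3, 259.8, 334.8, 321.9, 336.0,
               473.3, 500.0, 590.5, 616.45, 734.6]
sales = [32260.0, 33718.8, 39003.7, 45498.0, 37235.1,
         78651.1, 83001.8, 88031.7, 86092.9, 104919.3]

r = correlation_coefficient(advertising, sales)
print(round(r, 3))
```

The result is close to +1, which agrees with the strong positive relationship seen in the scatter diagram.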
Regression
 Regression and correlation analyses are based on the
relationship, or association, between two or more variables.
 One is called the dependent variable and the other is called the
independent variable.
 Suppose you want to forecast sales for your company and
you've concluded that your company's sales go up and down
depending on changes in GDP.
 Sales: dependent variable
 Change in GDP: independent variable
 A regression equation can be developed to forecast or
predict the variable we desire.
 Y = a + bX
where Y is the dependent variable, X is the independent
variable, a is the Y-intercept and b is the regression coefficient.
 The regression equation simply describes the relationship
between the dependent variable (y) and the independent
variable (x).
 The intercept, or "a", is the value of y (dependent variable)
if the value of x (independent variable) is zero. So if there
was no change in GDP, your company would still make
some sales. This value, when the change in GDP is zero, is
the intercept.
 The value of "b" means that if there is a 1% increase in
GDP, sales are likely to go up (or down) by "b" units.
 If the correlation is positive, b is positive and sales go up;
otherwise they go down.
The Standard Error of Estimate
 Measuring the reliability of the estimating equation.
[Figure: two scatter plots of Y against X; in Fig A the points lie close to the regression line, in Fig B they lie farther from it]
 A line is more accurate as an estimator when the data points lie close to the line
(as in Fig A) than when the data points are farther away from the line (as in Fig B).
[Figure: the error is the vertical distance between an actual value and the estimated value on the line]
 The standard error of estimate measures the variability, or
scatter, of the observed values around the regression line.
 $$S_e = \sqrt{\frac{\sum (Y - Y_{est})^2}{n - 2}}$$
 Where Y = values of the dependent variable
 Y est = estimated values from the estimating equation
that correspond to each Y value
 n = number of data points used to fit the regression line
Interpretation of standard error of estimate
 The larger the standard error of estimate, the greater
the scattering (or dispersion) of points around the
regression line.
 Conversely, if Se = 0 we expect the estimating equation
to be a perfect estimator of the dependent variable.
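A minimal sketch of the calculation, assuming the usual definition Se = sqrt(Σ(Y - Y est)² / (n - 2)) for a line fitted to n points. It uses the sales and GDP figures and the fitted line y = 88.15x + 34.58 from the example that follows; the function name is illustrative.

```python
from math import sqrt

def standard_error_of_estimate(xs, ys, a, b):
    """Scatter of observed ys around the line y = a + b*x.

    Assumes the divisor n - 2 (two estimated parameters, a and b).
    """
    n = len(xs)
    squared_errors = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(squared_errors / (n - 2))

# Change in GDP (in percent) and sales, 2007 to 2011
gdp_change = [1.0, 1.9, 2.4, 2.6, 2.9]
sales = [100, 250, 275, 200, 300]

se = standard_error_of_estimate(gdp_change, sales, a=34.58, b=88.15)
print(round(se, 1))

# Points lying exactly on the line give Se = 0
se_perfect = standard_error_of_estimate([1, 2, 3], [2, 4, 6], a=0, b=2)
print(se_perfect)
```

A value of roughly 50 units on sales that range from 100 to 300 indicates a fair amount of scatter around the line.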
Example

Year    Sale    Change in GDP
2007    100     1.00%
2008    250     1.90%
2009    275     2.40%
2010    200     2.60%
2011    300     2.90%

[Figure: scatter plot of sale (y-axis) against change in GDP (x-axis) with fitted line y = 88.15x + 34.58, R² = 0.687]
Interpretation
 The major outputs of simple linear regression are
 R-squared (coefficient of determination),
 the intercept,
 and the GDP coefficient.
 The R-squared number in this example is 68.7%. It
shows how well the model fits: about 68.7% of the variation
in sales is explained by the change in GDP.
 The intercept of 34.58 tells us that if the change in GDP was
forecasted to be zero, our sales would be about 35 units.
 The GDP coefficient of 88.15 tells us that if the change in GDP
rises by one percentage point, sales will likely go up by about 88 units.
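The fitted equation can be used directly for forecasting. The sketch below wraps the intercept and GDP coefficient from the example in a small function; the function name and the 2% input are illustrative only.

```python
INTERCEPT = 34.58         # a, from the fitted line in the example
GDP_COEFFICIENT = 88.15   # b, from the fitted line in the example

def predict_sales(gdp_change_pct):
    """Forecast sales (units) for a given change in GDP, in percent."""
    return INTERCEPT + GDP_COEFFICIENT * gdp_change_pct

forecast_zero = predict_sales(0.0)   # equals the intercept
forecast_two = predict_sales(2.0)    # 34.58 + 88.15 * 2
print(forecast_zero, round(forecast_two, 2))
```

Each additional percentage point of GDP change adds the coefficient, about 88 units, to the forecast.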
Least squares method
 The regression line should be drawn on the scatter
diagram in such a way that when the squared values of
the vertical distance from each plotted point to the line
are added, the total amount will be the smallest
possible amount. This criterion is called the method
of least squares.
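The criterion can be checked on the sales and GDP example above. The sketch below compares the sum of squared vertical distances for the fitted line y = 88.15x + 34.58 with a few other candidate lines; the alternative intercepts and slopes are arbitrary values picked for the comparison.

```python
def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances from each point to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

gdp_change = [1.0, 1.9, 2.4, 2.6, 2.9]   # change in GDP, percent
sales = [100, 250, 275, 200, 300]

fitted = sum_squared_errors(gdp_change, sales, a=34.58, b=88.15)

# Arbitrary alternative lines (intercept, slope) for comparison
alternatives = [(34.58, 80.0), (34.58, 95.0), (20.0, 88.15), (50.0, 88.15)]
others = [sum_squared_errors(gdp_change, sales, a, b) for a, b in alternatives]

print(round(fitted, 1), [round(v, 1) for v in others])
```

Every alternative line gives a larger total than the fitted line, which is what the least squares criterion requires.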
Regression line of Y on X
 Let the equation of the straight line be
 $$Y = a + bX \qquad (1)$$
 Let the sample size be n. Adding equation 1 over the n data points gives
 $$\sum Y = na + b\sum X \qquad (2)$$
 Multiplying equation 1 by X and adding over the n data points gives
 $$\sum XY = a\sum X + b\sum X^2 \qquad (3)$$
 By solving equations 2 and 3 we get
 $$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$$
 $$a = \bar{Y} - b\bar{X}$$
Regression line of X on Y
 Let the equation of the straight line be
 $$X = c + dY \qquad (1)$$
 Let the sample size be n. Adding equation 1 over the n data points gives
 $$\sum X = nc + d\sum Y \qquad (2)$$
 Multiplying equation 1 by Y and adding over the n data points gives
 $$\sum XY = c\sum Y + d\sum Y^2 \qquad (3)$$
 By solving equations 2 and 3 we get
 $$d = \frac{n\sum XY - \sum X \sum Y}{n\sum Y^2 - (\sum Y)^2}$$
 $$c = \bar{X} - d\bar{Y}$$
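Both regression lines follow from the same sums. The sketch below applies the formulas for b, a, d and c to the sales and GDP example; the function names are illustrative. It also shows a standard identity: the product of the two slopes, b × d, equals r².

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least squares line Y = a + bX."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)
    return a, b

def regression_x_on_y(xs, ys):
    """Return (c, d) for the line X = c + dY (roles of X and Y swapped)."""
    return regression_y_on_x(ys, xs)

gdp_change = [1.0, 1.9, 2.4, 2.6, 2.9]   # X: change in GDP, percent
sales = [100, 250, 275, 200, 300]        # Y: sales

a, b = regression_y_on_x(gdp_change, sales)
c, d = regression_x_on_y(gdp_change, sales)

print(round(a, 2), round(b, 2))   # line of Y on X
print(round(c, 3), round(d, 4))   # line of X on Y
print(round(b * d, 3))            # equals r squared
```

The first line reproduces the equation shown on the example chart, and b × d matches the R² reported there.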
 Cost accountants often estimate overhead based on
the level of production. At the Standard Knitting Co.
they have collected information on overhead expenses
and units produced at different plants and want to
estimate a regression equation to predict future
overhead.
 Overhead: 191 170 272 155 280 173 234 116 153 178
 Units: 40 42 53 35 56 39 48 30 37 40
Units (X)    Overhead (Y)    XY       X²       Y²
40           191             7640     1600     36481
42           170             7140     1764     28900
53           272             14416    2809     73984
35           155             5425     1225     24025
56           280             15680    3136     78400
39           173             6747     1521     29929
48           234             11232    2304     54756
30           116             3480     900      13456
37           153             5661     1369     23409
40           178             7120     1600     31684
Total: 420   1922            84541    18228    395024
 Correlation coefficient (r) = 0.983516
 Interpretation: There is a high positive correlation
between level of production and overhead.
 Let us now regress overhead on the units produced.
 Y = a + bX
 Where $$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2}$$
b = (10 × 84541 - 420 × 1922)/(10 × 18228 - 420²) = 38170/5880 = 6.491
a = average(Y) - b × average(X)
= 192.2 - 6.491 × 42 = -80.422

 Regression equation of overhead on level of
production is
 Y = -80.422 + 6.491 X
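The arithmetic for the Standard Knitting Co. data can be verified with a short script. This is a sketch that recomputes the column totals, r, b and a from the ten observations; the variable names are illustrative. Because b × average(X) is larger than average(Y), the intercept comes out negative.

```python
from math import sqrt

units = [40, 42, 53, 35, 56, 39, 48, 30, 37, 40]               # X
overhead = [191, 170, 272, 155, 280, 173, 234, 116, 153, 178]  # Y

n = len(units)
sum_x = sum(units)
sum_y = sum(overhead)
sum_xy = sum(x * y for x, y in zip(units, overhead))
sum_x2 = sum(x * x for x in units)
sum_y2 = sum(y * y for y in overhead)

# Correlation coefficient (computational form)
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Regression of overhead (Y) on units (X)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * (sum_x / n)

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(round(r, 6), round(b, 3), round(a, 2))

# Predicted overhead for a plant producing 50 units
predicted_50 = a + b * 50
print(round(predicted_50, 1))
```

As a sanity check, the prediction for 40 units is about 179, in line with the observed overheads of 191 and 178 at that production level.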
