shows an increment in sales volumes.
[Figure: Amount of Money Spent (y-axis) against Number of Cars (x-axis)]
Example 2: Let X still be the number of cars, but now Y is your
bank balance. With each car you buy, your bank balance
gets smaller. As X goes up, Y goes down.
We call this a negative correlation.
[Figure: Bank Balance (y-axis) against Number of Cars (x-axis)]
Example 3: Let's try a less idealized example. Let X be shirt
size and Y be shoe size. As one size goes up, so does the
other, but the relationship varies from person to person.
The correlation here is still positive, but not perfect.
[Figure: Shoe Size (y-axis) against Shirt Size (x-axis)]
Causation and correlation
Correlation measures the strength and direction of the relationship
between two or more variables.
Example: When the demand for a certain product goes up, its
price tends to go up as well, so there is a positive correlation
between the two variables.
Causation, on the other hand, means that one thing will cause
the other.
Example: When you exercise, the number of calories you burn
per minute goes up, because the former causes the latter.
Correlation and causation can happen at the same time. In the
exercise example, both correlation and causation are present.
In the first example there is a clear cause at work.
Buying cars causes you to spend money.
Now think of the third example. Shirt and shoe sizes are
correlated.
Does this mean that wearing bigger shirts causes you to wear
bigger shoes?
Of course not.
Correlation does not imply causation
There could be a hidden factor Z at work causing both
X and Y. In Example 3, the hidden factor might be the
person's height. Larger people usually wear bigger
shirts and bigger shoes.
So correlation doesn’t imply causation.
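The hidden-factor idea can be illustrated with a small sketch. Everything in it is made up for illustration: height plays the role of Z, and shirt and shoe sizes are each generated from height plus some person-to-person variation. The helper name `correlation`, the size formulas, and all the numbers are assumptions, not values taken from Example 3.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient computed from deviations about the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical people: height in cm is the hidden factor Z.
heights = [155, 160, 165, 170, 175, 180, 185, 190]

# Made-up person-to-person variation, fixed so the result is repeatable.
shirt_noise = [1, -2, 0, 2, -1, 1, -2, 1]
shoe_noise = [-0.5, 0.5, 1.0, -1.0, 0.5, -0.5, 0.0, 0.5]

# Both sizes depend on height only; neither is computed from the other.
shirt_sizes = [0.25 * h + e for h, e in zip(heights, shirt_noise)]
shoe_sizes = [0.1 * h - 9 + e for h, e in zip(heights, shoe_noise)]

r = correlation(shirt_sizes, shoe_sizes)
print(round(r, 3))  # positive, but below 1
```

Shirt size and shoe size come out positively correlated even though the code never lets one influence the other. The only link between them is the shared dependence on height.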
Correlation Coefficient
The correlation coefficient r always lies in the range $-1 \le r \le 1$.
A negative value shows that the correlation is negative:
the variables move in opposite directions.
-1 indicates perfect negative correlation.
Zero indicates no linear correlation.
A positive value shows that the correlation is positive:
the variables move in the same direction.
+1 indicates perfect positive correlation.
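$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$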
Where n = sample size
x = value of independent variable
y = value of dependent variable
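As a minimal sketch, r could be computed from the raw sums as follows. This assumes the standard Pearson product-moment formula written in terms of n, x and y. The function name `pearson_r` and the car-purchase figures are hypothetical, chosen only to mirror the first two examples.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from raw sums."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

# Made-up data: every car costs 200000 and the starting balance is 1200000.
cars = [0, 1, 2, 3, 4, 5, 6]
spent = [200000 * c for c in cars]
balance = [1200000 - s for s in spent]

r_spent = pearson_r(cars, spent)      # money spent rises with every car
r_balance = pearson_r(cars, balance)  # bank balance falls with every car
print(r_spent, r_balance)
```

Because both made-up relationships are exactly linear, the first value sits at the positive end of the scale and the second at the negative end.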
Regression
Regression and correlation analyses are based on the
relationship, or association, between two or more variables.
The variable being predicted is called the dependent variable,
and the variable used to predict it is called the independent variable.
Suppose you want to forecast sales for your company, and
you have concluded that your company's sales go up and down
depending on changes in GDP.
Y = a + bX
where Y is the dependent variable, a is the intercept, b is the slope,
and X is the independent variable.
The regression equation simply describes the relationship
between the dependent variable (y) and the independent
variable (x).
The intercept, or "a", is the value of y (dependent variable)
when the value of x (independent variable) is zero. So if there
were no change in GDP, your company would still make
some sales. That level of sales, when the change in GDP is zero,
is the intercept.
A line is more accurate as an estimator when the data points lie close to the line
(as in fig. A) than when the data points are farther away from the line (as in fig. B).
[Figure: regression line showing the actual value, the estimated value, and the error between them]
The standard error of estimate measures the variability, or
scatter, of the observed values around the regression line.
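$$ S_e = \sqrt{\frac{\sum \left(Y - \hat{Y}\right)^2}{n - 2}} $$
where Y = observed value of the dependent variable,
\hat{Y} = value estimated from the regression line,
and n = sample size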
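A minimal sketch of this calculation is shown below. It assumes the common textbook definition for simple linear regression, in which the sum of squared errors is divided by n - 2 before taking the square root. The function name `standard_error_of_estimate` and the data are hypothetical. The data were constructed so that Y = 1 + 2X is their least-squares line.

```python
import math

def standard_error_of_estimate(xs, ys, a, b):
    """Scatter of observed ys around the line y = a + b*x (divisor n - 2)."""
    n = len(xs)
    if n <= 2:
        raise ValueError("need more than 2 observations")
    # Each error is the actual value minus the value estimated by the line.
    squared_errors = [(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared_errors) / (n - 2))

# Made-up observations scattered around the line Y = 1 + 2X.
xs = [1, 2, 3, 4]
ys = [3.5, 4.5, 6.5, 9.5]

se = standard_error_of_estimate(xs, ys, a=1.0, b=2.0)
print(round(se, 4))

# Points lying exactly on the line give a standard error of zero.
se_perfect = standard_error_of_estimate(xs, [3, 5, 7, 9], a=1.0, b=2.0)
```

The closer the points lie to the line, the smaller the value, which matches the comparison between fig. A and fig. B.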
[Figure: scatter plot of sale (y-axis) against change in GDP (x-axis), with fitted line y = 88.15x + 34.58 and R² = 0.687]
year   sale   Change in GDP
2009   275    2.40%
Interpretation
The major outputs of simple linear regression are
R-squared (coefficient of determination),
the intercept,
and the GDP coefficient.
The R-squared number in this example is 68.7%. This means that
about 68.7% of the variation in sales is explained by the change
in GDP, which indicates how well the model fits the data.
The intercept of 34.58 tells us that if the change in GDP were
forecast to be zero, our sales would be about 35 units.
The GDP coefficient of 88.15 tells us that if GDP increases
by 1%, sales will likely go up by about 88 units.
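The interpretation above can be sketched as a small forecasting function. The intercept and coefficient are the ones quoted in the example. The function name `forecast_sales` is hypothetical, and the code assumes the change in GDP is entered in percentage points, so that 1.0 means 1%, in line with the reading of the coefficient given above.

```python
INTERCEPT = 34.58        # "a": forecast sales when the change in GDP is zero
GDP_COEFFICIENT = 88.15  # "b": extra units of sales per 1% change in GDP

def forecast_sales(gdp_change_pct):
    """Forecast sales from the fitted line Y = a + bX.

    Assumes gdp_change_pct is in percentage points (1.0 means 1%).
    """
    return INTERCEPT + GDP_COEFFICIENT * gdp_change_pct

print(forecast_sales(0.0))  # the intercept alone

# Each additional percentage point of GDP change adds one coefficient's worth.
extra_per_point = forecast_sales(2.0) - forecast_sales(1.0)
print(round(extra_per_point, 2))
```

With an R-squared of 68.7%, a forecast from this line is an estimate rather than an exact prediction.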
Least squares method
The regression line should be drawn on the scatter
diagram in such a way that when the squared values of
the vertical distances from each plotted point to the line
are added, the total is the smallest possible amount.
This criterion is called the method of least squares.
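The criterion can be checked numerically with a small sketch. The data are made up, and the function name `sum_squared_errors` is hypothetical. For this particular data set the least-squares line works out to Y = 2.2 + 0.6X, and the code compares its total squared vertical distance with that of a few other candidate lines.

```python
def sum_squared_errors(xs, ys, a, b):
    """Total of the squared vertical distances from each point to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up scatter diagram.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares line for this data: intercept 2.2, slope 0.6.
best = sum_squared_errors(xs, ys, a=2.2, b=0.6)

# Other lines one might draw through the same points by eye.
candidates = [(2.0, 0.6), (2.2, 0.7), (1.0, 1.0)]
others = [sum_squared_errors(xs, ys, a, b) for a, b in candidates]

print(round(best, 2), [round(s, 2) for s in others])
```

Each of the other candidate lines gives a larger total than the least-squares line, which is what the criterion requires.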
Regression line of Y on X
Let the equation of the straight line be
$$ Y = a + bX \quad (1) $$
Let the sample size be n. Summing equation 1 over the n observations gives
$$ \sum Y = na + b\sum X \quad (2) $$
Multiplying equation 1 by X and summing over the n observations gives
$$ \sum XY = a\sum X + b\sum X^2 \quad (3) $$
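As a sketch of where these equations lead, the two summed equations (the sum of Y equals na plus b times the sum of X, and the sum of XY equals a times the sum of X plus b times the sum of X squared) can be solved together for a and b. The function name `fit_line` and the data are hypothetical, and the closed-form solution used here is the usual algebraic one, not something quoted from the slides.

```python
def fit_line(xs, ys):
    """Solve the two summed equations for the intercept a and slope b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    # Eliminating a from the two equations gives b.
    b = (n * sum_xy - sum_x * sum_y) / denominator
    # Substituting b back into the first summed equation gives a.
    a = (sum_y - b * sum_x) / n
    return a, b

# Made-up sample of n = 5 observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))

# Both summed equations hold for the fitted values (up to rounding).
n = len(xs)
residual_eq2 = sum(ys) - (n * a + b * sum(xs))
residual_eq3 = sum(x * y for x, y in zip(xs, ys)) - (
    a * sum(xs) + b * sum(x * x for x in xs)
)
```

Substituting the fitted a and b back into both equations leaves residuals that are zero up to floating-point rounding.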