
SIMPLE LINEAR REGRESSION

Prepared for the presentation at Kuzgun Consulting


by Yeşim OZAN

Istanbul, 12.08.2015

Contents

1. Definition and Purpose of Simple Linear Regression
2. Derivation of Linear Regression Equations
3. Parameter Estimation
4. Hypothesis Test for Parameter Estimation
5. Analysis of Variance Approach to Test the Significance of Regression
6. Confidence Interval on Fitted Values
7. Coefficient of Determination

1. Definition and Purpose of Simple Linear Regression
A technique for determining how one variable of interest is affected by changes in another variable.

Used for three main purposes:

- To describe the linear dependence of one variable on another
- To predict values of one variable from values of another, for which more data are available
- To correct for the linear dependence of one variable on another, in order to clarify other features of its variability
1. Definition and Purpose of Simple Linear Regression (cont'd.)

Regression model:

Y_i = β_0 + β_1 X_i + ε_i

where Y_i is the dependent (response) variable, X_i is the independent (explanatory) variable, and ε_i is the residual (error) term.

1. Definition and Purpose of Simple Linear Regression (cont'd.)

Linear regression determines the best-fit line through a scatter plot of the data, chosen so that the sum of squared residuals (equivalently, the error variance) is minimized. The fit is "best" in precisely that least-squares sense.

2. Derivation of Linear Regression Equations (cont'd.)

Under these assumptions:

- E(ε_i) = 0
- Var(ε_i) = σ²
- ε_1, ε_2, …, ε_n are independent of each other

if E(Y | X = x) = β_0 + β_1 x is a function relating the random variables X and Y, and the observations can be expressed as

Y_i = β_0 + β_1 X_i + ε_i,   i = 1, 2, …, n

then this model is called the linear regression model.
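A standard least-squares sketch of how the regression equations are derived (a reconstruction in the deck's notation, not verbatim from the slides):

```latex
% Minimize the sum of squared errors over (beta_0, beta_1):
S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( Y_i - \beta_0 - \beta_1 X_i \right)^2

% Setting both partial derivatives to zero yields the normal equations:
\frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i) = 0, \qquad
\frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} X_i (Y_i - \beta_0 - \beta_1 X_i) = 0

% Solving the two equations simultaneously:
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
```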

3. Parameter Estimation

With these assumptions:

- ε_i ~ N(0, σ²)
- ε_1, ε_2, …, ε_n are independent of each other

the model parameters are β_0, β_1 ∈ ℝ and σ² ∈ (0, ∞).

Provided that X_i is fixed and Y_i is a random variable for i = 1, 2, …, n,

Y_i ~ N(β_0 + β_1 X_i, σ²)

and the Y_i are independent of each other for i = 1, 2, …, n.



3. Parameter Estimation (cont'd.)

Two methods are used: least squares and maximum likelihood. The maximum likelihood parameter estimators (identical, for β_0 and β_1, to the least-squares estimators derived above) are as follows:

β̂_1 = Σ (X_i − X̄)(Y_i − Ȳ) / Σ (X_i − X̄)²
β̂_0 = Ȳ − β̂_1 X̄

Once β̂_0 and β̂_1 are known, the fitted regression line can be written as:

Ŷ_i = β̂_0 + β̂_1 X_i

and the residuals can be written as:

e_i = Y_i − Ŷ_i
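A minimal numerical sketch of these estimators (the dataset below is made up purely for illustration):

```python
import numpy as np

# Hypothetical sample data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

x_bar, y_bar = x.mean(), y.mean()

# Least-squares / maximum-likelihood estimates of slope and intercept
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

# Fitted regression line and residuals
y_hat = beta0_hat + beta1_hat * x
residuals = y - y_hat

print(f"beta0_hat = {beta0_hat:.4f}, beta1_hat = {beta1_hat:.4f}")
```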



4. Hypothesis Test for Parameter Estimation

The hypothesis is:

H_0: β_1 = 0   versus   H_1: β_1 ≠ 0

The test statistic used for this test is:

t_0 = β̂_1 / se(β̂_1),   where se(β̂_1) = √( σ̂² / Σ (X_i − X̄)² ) and σ̂² = Σ e_i² / (n − 2)

The null hypothesis, H_0, is accepted if the calculated value of the test statistic is such that:

|t_0| < t_{α/2, n−2}

A similar procedure can be used to test the hypothesis on the intercept, β_0.
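A sketch of this slope test in code, using scipy's t distribution (the data and significance level are illustrative assumptions, not from the slides):

```python
import numpy as np
from scipy import stats

# Hypothetical data and significance level, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
n, alpha = len(x), 0.05

x_bar = x.mean()
Sxx = np.sum((x - x_bar) ** 2)
beta1_hat = np.sum((x - x_bar) * (y - y.mean())) / Sxx
beta0_hat = y.mean() - beta1_hat * x_bar
residuals = y - (beta0_hat + beta1_hat * x)

# se(beta1_hat) = sqrt(sigma2_hat / Sxx), with sigma2_hat = SSE / (n - 2)
sigma2_hat = np.sum(residuals ** 2) / (n - 2)
t0 = beta1_hat / np.sqrt(sigma2_hat / Sxx)

# Two-sided test of H0: beta1 = 0; accept H0 if |t0| < t_{alpha/2, n-2}
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
print(f"t0 = {t0:.3f}, t_crit = {t_crit:.3f}, reject H0: {abs(t0) > t_crit}")
```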



4. Hypothesis Test for Parameter Estimation (cont'd.)

[Figure: possible scatter plots of Y against X. Plots (a) and (b) represent cases when H_0: β_1 = 0 is not rejected. Plots (c) and (d) represent cases when H_0 is rejected.]

5. Analysis of Variance Approach to Test the Significance of Regression

The analysis of variance (ANOVA) is another method to test for the significance of regression.

A classic ANOVA table:

Source       Degrees of Freedom    Sum of Squares (SS)    Mean Squares (MS)
Regression   1                     SSR                    MSR
Error        n − 2                 SSE                    MSE
Total        n − 1                 SST

Each term included in the above ANOVA table is calculated separately. Such calculations are explained in the following slides.

5.1. Sum of Squares (SS)

SS is the sum of squared deviations of all the observations, Y_i, from their mean, Ȳ. In the context of ANOVA, this quantity is called the total sum of squares (abbreviated SST) because it relates to the total variance of the observations:

SST = Σ (Y_i − Ȳ)²

In a perfect model, the regression model is such that the resulting fitted regression line passes through all of the observations; the deviations of the fitted values, Ŷ_i, from the mean then make up the regression sum of squares (abbreviated SSR). SSR can be calculated using a relationship similar to the one for obtaining SST. Therefore:

SSR = Σ (Ŷ_i − Ȳ)²

5.1. Sum of Squares (SS) (cont'd.)

In a non-perfect model, a certain part of the total variability of the observed data still remains unexplained. This is called the error sum of squares (abbreviated SSE). SSE can be obtained as the sum of squares of the deviations of the observations from their fitted values:

SSE = Σ (Y_i − Ŷ_i)²

The total variability of the observed data (i.e., the total sum of squares, SST) can be written as:

Σ (Y_i − Ȳ)² = Σ (Ŷ_i − Ȳ)² + Σ (Y_i − Ŷ_i)²

The above equation amounts to the following:

SST = SSR + SSE
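A quick numerical check of this partition (same made-up data as in the earlier sketches):

```python
import numpy as np

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
y_hat = (y.mean() - beta1 * x.mean()) + beta1 * x

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
sse = np.sum((y - y_hat) ** 2)         # error sum of squares

# The partition SST = SSR + SSE holds up to floating-point error
assert np.isclose(sst, ssr + sse)
print(f"SST = {sst:.4f}, SSR + SSE = {ssr + sse:.4f}")
```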



5.1. Sum of Squares (SS) (cont'd.)

[Figure: scatter plots showing the deviations behind the sums of squares used in ANOVA. (a) shows the deviations for SST, (b) the deviations for SSR, and (c) the deviations for SSE.]

5.2. Mean Squares (MS)

Mean squares are obtained by dividing the sum of squares by the respective degrees of freedom. The error mean square, MSE, can be obtained as:

MSE = SSE / (n − 2)

MSE is an estimate of the variance, σ², of the random error term, ε, and can be written as:

σ̂² = MSE = Σ (Y_i − Ŷ_i)² / (n − 2)

Similarly, the regression mean square, MSR, can be obtained by dividing the regression sum of squares by the respective degrees of freedom as follows:

MSR = SSR / 1 = SSR

5.3. F Test

To test the hypothesis H_0: β_1 = 0, the statistic used is based on the F distribution. It can be shown that if the null hypothesis is true, then the statistic:

F_0 = MSR / MSE

follows the F distribution with 1 degree of freedom in the numerator and n − 2 degrees of freedom in the denominator. H_0 is rejected if the calculated statistic, F_0, is such that:

F_0 > F_{α; 1, n−2}

where F_{α; 1, n−2} is the percentile of the F distribution corresponding to a cumulative probability of (1 − α), and α is the significance level.
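A sketch of this F test in code, using scipy's F distribution (illustrative data and significance level, as before):

```python
import numpy as np
from scipy import stats

# Hypothetical data and significance level, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
n, alpha = len(x), 0.05

beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
y_hat = (y.mean() - beta1 * x.mean()) + beta1 * x

msr = np.sum((y_hat - y.mean()) ** 2) / 1   # MSR = SSR / 1
mse = np.sum((y - y_hat) ** 2) / (n - 2)    # MSE = SSE / (n - 2)
f0 = msr / mse

# Reject H0: beta1 = 0 if F0 exceeds the (1 - alpha) percentile of F(1, n-2)
f_crit = stats.f.ppf(1 - alpha, dfn=1, dfd=n - 2)
print(f"F0 = {f0:.3f}, F_crit = {f_crit:.3f}, reject H0: {f0 > f_crit}")
```

For simple linear regression this F test is equivalent to the two-sided t test on the slope: F_0 = t_0².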

6. Confidence Interval on Fitted Values

A (1 − α) · 100 percent confidence interval on any fitted value, Ŷ_0 at X = X_0, is obtained as follows:

Ŷ_0 ± t_{α/2, n−2} · √( MSE · ( 1/n + (X_0 − X̄)² / Σ (X_i − X̄)² ) )

The width of the confidence interval depends on the value of X_0: it is a minimum at X_0 = X̄ and widens as |X_0 − X̄| increases.
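A sketch of this interval in code (illustrative data; note how the interval is narrowest at the mean of x):

```python
import numpy as np
from scipy import stats

# Hypothetical data and significance level, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
n, alpha = len(x), 0.05

x_bar = x.mean()
Sxx = np.sum((x - x_bar) ** 2)
beta1 = np.sum((x - x_bar) * (y - y.mean())) / Sxx
beta0 = y.mean() - beta1 * x_bar
mse = np.sum((y - (beta0 + beta1 * x)) ** 2) / (n - 2)
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)

def ci_fitted(x0):
    """(1 - alpha)*100 percent confidence interval on the fitted value at x0."""
    y0_hat = beta0 + beta1 * x0
    half_width = t_crit * np.sqrt(mse * (1 / n + (x0 - x_bar) ** 2 / Sxx))
    return y0_hat - half_width, y0_hat + half_width

print(ci_fitted(x_bar))  # narrowest interval, at the mean of x
print(ci_fitted(6.0))    # wider, farther from the mean
```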

7. Coefficient of Determination

The coefficient of determination, R², is a measure of the amount of variability in the data accounted for by the regression model. It is the ratio of the regression sum of squares to the total sum of squares:

R² = SSR / SST = 1 − SSE / SST

R² can take on values between 0 and 1. The value of R² increases as more terms are added to the model, even if the new term does not contribute significantly to the model.
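Computed directly from the sums of squares (illustrative data as before):

```python
import numpy as np

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
y_hat = (y.mean() - beta1 * x.mean()) + beta1 * x

ssr = np.sum((y_hat - y.mean()) ** 2)
sst = np.sum((y - y.mean()) ** 2)

r_squared = ssr / sst  # equivalently, 1 - SSE / SST
print(f"R^2 = {r_squared:.4f}")
```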

Thank you for your attention.
