
Regression Analysis

Alok Srivastava
Robinson College of Business,
Georgia State University

Module 3
Regression

[Figure: dependent variable plotted against independent variable (x)]

Regression is the attempt to explain variation in a dependent variable using variation in one or more independent variables.
Regression by itself measures association, not causation: a good fit does not prove that the independent variables cause the changes in the dependent variable.
If the independent variable(s) sufficiently explain variation in the dependent variable, the model can be used for prediction.
Simple Linear Regression

[Figure: fitted straight line $y = b_0 + b_1 x$, dependent variable (y) vs. independent variable (x), where the slope is $b_1 = \Delta y / \Delta x$ and $b_0$ is the y-intercept]

The output of a regression is a function that predicts the dependent variable from values of the independent variables.
Simple regression fits a straight line to the data.
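As a minimal sketch (Python, standard library only), the straight line can be fitted with the usual closed-form least-squares formulas: slope $b_1 = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ and intercept $b_0 = \bar{y} - b_1 \bar{x}$. The function name `fit_simple` and the five data points are hypothetical, made up for illustration.

```python
from statistics import mean

def fit_simple(xs, ys):
    """Return (b0, b1) for the line y = b0 + b1*x that minimizes squared prediction errors."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: co-variation of x and y divided by variation of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept chosen so the line passes through (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical toy data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
b0, b1 = fit_simple(xs, ys)
print(f"y = {b0:.3f} + {b1:.3f}x")
```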
Simple Linear Regression

[Figure: each observation y and its prediction ŷ, plotted against independent variable (x)]

The function makes a prediction for each observed data point.
The observation is denoted by $y$ and the prediction by $\hat{y}$.
Simple Linear Regression

[Figure: prediction error shown as the gap between an observation y and its prediction ŷ]

For each observation, the variation can be described as:

$y = \hat{y} + \varepsilon$
Actual = Explained + Error
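The decomposition can be checked numerically. This sketch assumes a line has already been fitted; the data and the coefficients `b0 = 0.09`, `b1 = 1.97` are hypothetical (they happen to be the least-squares values for this made-up data, which is why the errors sum to zero).

```python
# Hypothetical data and coefficients (b0, b1 are the least-squares fit for this data).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
b0, b1 = 0.09, 1.97

y_hat = [b0 + b1 * x for x in xs]              # explained part (predictions)
errors = [y - yh for y, yh in zip(ys, y_hat)]  # unexplained part (prediction errors)

for y, yh, e in zip(ys, y_hat, errors):
    # Actual = Explained + Error, row by row
    print(f"{y:5.2f} = {yh:5.2f} + {e:+.2f}")
```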
Simple Linear Regression

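The output of a simple regression is the coefficient $\beta$ and the constant $A$ (the slope $b_1$ and intercept $b_0$ in the earlier notation).
The equation is then:
$y = A + \beta x + \varepsilon$
where $\varepsilon$ is the residual error.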

$\beta$ is the per-unit change in the dependent variable for each unit change in the independent variable. Mathematically:

$\beta = \dfrac{\Delta y}{\Delta x}$
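A quick numeric check of the slope interpretation, using hypothetical values `A = 10.0` and `beta = 2.5` for an already-fitted line: the ratio of the change in prediction to the change in x equals β for any two x values, and a one-unit increase in x moves the prediction by exactly β.

```python
# Hypothetical fitted line: y_hat = A + beta * x
A, beta = 10.0, 2.5

def predict(x):
    """Prediction from the hypothetical fitted line."""
    return A + beta * x

x1, x2 = 4.0, 7.0
delta_y = predict(x2) - predict(x1)   # change in prediction
delta_x = x2 - x1                     # change in independent variable
print(delta_y / delta_x)              # prints 2.5, i.e. beta

# One extra unit of x changes the prediction by beta.
print(predict(x1 + 1) - predict(x1))  # prints 2.5
```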
Regression

[Figure: dependent variable plotted against independent variable (x)]

A least squares regression selects the line with the lowest total sum
of squared prediction errors.
This value is called the Sum of Squares of Error, or SSE.
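To make "lowest total sum of squared prediction errors" concrete, this sketch computes SSE for a line and confirms that nudging the least-squares coefficients in any direction increases it. The data, the helper `sse`, and the coefficients 0.09 and 1.97 (the least-squares fit for this made-up data) are hypothetical.

```python
# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

def sse(b0, b1):
    """Sum of squared prediction errors for the line y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

best = sse(0.09, 1.97)  # least-squares coefficients for this data
# Nudging either coefficient up or down increases SSE.
for db0, db1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert sse(0.09 + db0, 1.97 + db1) > best
print(f"SSE of the least-squares line: {best:.4f}")
```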
Calculating SSR

[Figure: predictions compared with the mean ȳ of the dependent variable, plotted against independent variable (x)]

Sum of Squares Regression (SSR) is the sum of the squared differences between the prediction for each observation and the mean of the observed dependent variable, $\bar{y}$.
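A minimal sketch of the SSR calculation on hypothetical data, assuming the fitted coefficients are already known (0.09 and 1.97 are the least-squares values for these made-up points). Here $\bar{y}$ is the sample mean of the observed y values.

```python
from statistics import mean

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]   # hypothetical data
b0, b1 = 0.09, 1.97               # least-squares fit for this data

y_bar = mean(ys)                               # sample mean of y
y_hat = [b0 + b1 * x for x in xs]              # prediction for each observation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)   # squared gaps between prediction and mean
print(f"SSR = {ssr:.4f}")
```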
Regression Formulas

Total Sum of Squares (SST) is equal to SSR + SSE.

Mathematically,

$SSR = \sum (\hat{y} - \bar{y})^2$ (measure of explained variation)

$SSE = \sum (y - \hat{y})^2$ (measure of unexplained variation)

$SST = SSR + SSE = \sum (y - \bar{y})^2$ (measure of total variation in y)
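The identity SST = SSR + SSE holds for a least-squares line that includes a constant term; it does not hold for an arbitrary line. This sketch fits the line on hypothetical data and then checks the identity numerically.

```python
from statistics import mean

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]   # hypothetical data

# Least-squares fit of y_hat = b0 + b1 * x
x_bar, y_bar = mean(xs), mean(ys)
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))   # unexplained variation
sst = sum((y - y_bar) ** 2 for y in ys)                # total variation

print(f"SST = {sst:.4f}, SSR + SSE = {ssr + sse:.4f}")
```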
Coefficient of Determination

The proportion of total variation (SST) that is explained by the regression (SSR) is known as the Coefficient of Determination, and is often referred to as $R^2$.

$R^2 = \dfrac{SSR}{SST} = \dfrac{SSR}{SSR + SSE}$

$R^2$ ranges between 0 and 1.
The higher its value, the larger the share of the variation in y that the model explains.
It is often reported as a percentage.
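A sketch of computing $R^2 = SSR / SST$ directly from observations and predictions. The function name `r_squared` and the numbers are hypothetical; the predictions listed are the least-squares fit for the made-up observations.

```python
from statistics import mean

def r_squared(ys, y_hat):
    """Share of the total variation in ys explained by the predictions (R^2 = SSR / SST)."""
    y_bar = mean(ys)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)   # explained variation
    sst = sum((y - y_bar) ** 2 for y in ys)        # total variation
    return ssr / sst

ys = [2.1, 3.9, 6.2, 7.8, 10.0]         # hypothetical observations
y_hat = [2.06, 4.03, 6.00, 7.97, 9.94]  # least-squares predictions for that data
r2 = r_squared(ys, y_hat)
print(f"R^2 = {r2:.4f} ({r2:.2%})")
```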
Standard Error of Regression

The standard error of a regression measures the typical size of its prediction errors.
It can be used much like a standard deviation to build approximate prediction intervals.
Assuming roughly normal errors, $\hat{y} \pm 2$ standard errors covers about 95% of observations, and $\hat{y} \pm 3$ standard errors covers about 99.7%.
The standard error is the square root of the average squared prediction error, where SSE is divided by the degrees of freedom n − k rather than by n.

$\text{Standard Error} = \sqrt{\dfrac{SSE}{n - k}}$

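where n is the number of observations in the sample and k is the total number of variables in the model, counting the dependent variable (equivalently, the number of estimated coefficients including the constant, so k = 2 for simple regression).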
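A sketch of the standard-error formula and the rough ± 2 standard error band. The function name, the residuals, the new prediction, and k = 2 (constant plus one slope) are hypothetical. The band ignores uncertainty in the estimated coefficients, so a formal prediction interval would be somewhat wider, especially for small samples.

```python
import math

def standard_error(residuals, k):
    """sqrt(SSE / (n - k)); n = number of observations, k = number of estimated coefficients."""
    n = len(residuals)
    if n <= k:
        raise ValueError("need more observations than coefficients")
    sse = sum(e ** 2 for e in residuals)
    return math.sqrt(sse / (n - k))

# Hypothetical residuals from a simple regression (k = 2: constant A and slope beta).
residuals = [0.04, -0.13, 0.20, -0.17, 0.06]
se = standard_error(residuals, k=2)

y_hat_new = 12.0   # hypothetical prediction for a new x
print(f"SE = {se:.4f}; rough 95% band: {y_hat_new - 2 * se:.2f} to {y_hat_new + 2 * se:.2f}")
```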
Multiple Linear Regression

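More than one independent variable can be used to explain variance in the dependent variable, as long as the independent variables are not linearly related to one another (perfectly related variables cannot be separated, and strongly related ones give unstable coefficients).

A multiple regression takes the form:

$y = A + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$

where k is the number of independent variables, each with its own coefficient $\beta$.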
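One way to estimate the coefficients is to solve the least-squares normal equations $(X^\top X)\,b = X^\top y$, where $X$ holds a column of ones (for $A$) followed by the k independent variables. This pure-Python sketch uses Gauss-Jordan elimination; the function name `fit_multiple` and the data are hypothetical, and in practice a numerical library would be used. The data were generated from $y = 1 + 2X_1 + 3X_2$ with no error, so the fit should recover those coefficients. If the independent variables are linearly related, the system has no unique solution and the sketch raises an error.

```python
def fit_multiple(rows, ys):
    """rows: list of [x1, ..., xk] per observation; returns [A, b1, ..., bk]."""
    X = [[1.0] + list(r) for r in rows]   # leading 1 for the constant A
    n, p = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y] of the normal equations.
    M = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         + [sum(X[i][a] * ys[i] for i in range(n))] for a in range(p)]
    for col in range(p):
        # Partial pivoting: bring the largest remaining entry into the pivot row.
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("independent variables are linearly related")
        M[col], M[piv] = M[piv], M[col]
        # Eliminate this column from every other row (Gauss-Jordan).
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [m - f * c for m, c in zip(M[r], M[col])]
    # The coefficient block is now diagonal; divide out the pivots.
    return [M[i][p] / M[i][i] for i in range(p)]

# Hypothetical, noise-free data from y = 1 + 2*x1 + 3*x2.
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefs = fit_multiple(rows, ys)
print("A, b1, b2 =", [round(c, 6) for c in coefs])
```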


Nonlinear Regression

Nonlinear functions can also be fit as regressions.


Common choices include Power, Logarithmic,
Exponential, and Logistic, but any continuous function
can be used.
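A common shortcut for the exponential form $y = a e^{bx}$ is to take logarithms, $\ln y = \ln a + b x$, and fit that straight line by least squares. This is a sketch of that log-transform technique, not a general nonlinear least-squares solver: it minimizes squared errors in $\ln y$ rather than in $y$, and it requires every y to be positive. The function name and data are hypothetical; the data come from $y = 2e^{0.5x}$ exactly, so the fit should return a ≈ 2 and b ≈ 0.5.

```python
import math
from statistics import mean

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by a straight-line least-squares fit to ln(y)."""
    if any(y <= 0 for y in ys):
        raise ValueError("log transform needs every y > 0")
    logs = [math.log(y) for y in ys]
    x_bar, l_bar = mean(xs), mean(logs)
    b = (sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs))
         / sum((x - x_bar) ** 2 for x in xs))
    a = math.exp(l_bar - b * x_bar)   # intercept of the log-scale line is ln(a)
    return a, b

xs = [0, 1, 2, 3, 4]
ys = [2 * math.exp(0.5 * x) for x in xs]   # hypothetical, noise-free data
a, b = fit_exponential(xs, ys)
print(f"y = {a:.3f} * exp({b:.3f} x)")
```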
Exercise

Develop a model for estimating the heating oil used by a single-family home in the month of January, based on average temperature and the amount of attic insulation in inches.
$\hat{y} = 562.15 - 5.44x_1 - 20.01x_2$

where: $x_1$ = temperature [degrees F]
$x_2$ = attic insulation [inches]
Dependent variable: Gallons Consumed
--------------------------------------------------------------------
Parameter       Estimate   Standard Error   T Statistic   P-Value
--------------------------------------------------------------------
CONSTANT         562.151          21.0931       26.6509    0.0000
Insulation      -20.0123          2.34251      -8.54313    0.0000
Temperature     -5.43658         0.336216      -16.1699    0.0000
--------------------------------------------------------------------
R-squared = 96.561 percent
R-squared (adjusted for d.f.) = 95.9879 percent
Standard Error of Est. = 26.0138
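Using the fitted model is then a matter of plugging in values. This sketch uses the estimates from the output above; the house (30 °F average January temperature, 6 inches of attic insulation) is hypothetical and is assumed to fall within the range of the original data, which the slides do not show. The ± 2 standard error band follows the rule of thumb from the Standard Error slide and ignores coefficient uncertainty, so it is only a rough range.

```python
# Estimates copied from the regression output (dependent variable: gallons consumed).
CONSTANT = 562.151
TEMPERATURE = -5.43658   # gallons per degree F of average temperature
INSULATION = -20.0123    # gallons per inch of attic insulation
STD_ERR = 26.0138        # Standard Error of Est.

def predict_gallons(temp_f, insulation_in):
    """Predicted January heating oil use from the fitted two-variable model."""
    return CONSTANT + TEMPERATURE * temp_f + INSULATION * insulation_in

y_hat = predict_gallons(30, 6)   # hypothetical house: 30 °F, 6 in. of insulation
low, high = y_hat - 2 * STD_ERR, y_hat + 2 * STD_ERR
print(f"{y_hat:.1f} gallons (rough 95% range {low:.1f} to {high:.1f})")
```

Each coefficient is read holding the other variable fixed: one more inch of insulation lowers predicted consumption by about 20 gallons, and one more degree of average temperature lowers it by about 5.4 gallons.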