Learning Objectives
Understand what regression analysis is and where it is used
Regression
Regression provides a conceptually simple method for investigating functional relationships between one or more independent (explanatory) variables, or factors, and a dependent variable, the outcome of interest.
Regression in Business
Predict the future joint distribution of asset returns, then construct an optimal portfolio (choose weights).
Estimate the effect of price and advertisement on sales, then decide on the optimal price and ad campaign.
Predict the future probability of default from known characteristics of the borrower, then decide whether or not to lend (and, if so, how much).
Regression in Business
Sales volume and market movement (ice cream, houses)
Customer complaints over time
Key product specialization
Predicting the demographics and types of the future workforce for large companies
Estimating the impact of training
What price should I charge for my car? What will the interest rate be next month? Will this person like that movie?
Does your income increase if you complete this course? Will tax incentives change purchasing behaviour? Is my advertising campaign working?
Where to start?
Linear Prediction
Example: Predicting house prices
Problem: Predict the market price of a house from its observed characteristics.
Solution: Look at property data where we know both the price and some observed characteristics, then build a decision rule that predicts price as a function of those characteristics.
Observed characteristics: size, number of rooms, attached baths, garage space, UPS facility, neighbourhood, etc.
Variables such as price and size are easy to quantify, but what about others, such as aesthetics and workmanship?
The value that we seek to predict is called the dependent (or explained) variable, and we denote it as Y.
The variable that we use to guide prediction is the independent (or explanatory) variable, and we label it X.
Linear prediction
Recall that the equation of a line is $Y = b_0 + b_1 X$. Adding a random residual term u gives $Y = b_0 + b_1 X + u$.
The intercept is in units of Y (Rs. 1,00,000). The slope is in units of Y per unit of X (Rs. 1,00,000 per 1,000 sq ft).
Intercept b0: when X = 0, Y = b0, so the intercept is the predicted value of Y when X = 0.
Slope b1: when X increases by 1 unit (1,000 sq ft), predicted Y increases by b1 units (Rs. 1,00,000).
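A minimal sketch of how the intercept and slope are read. The coefficients b0 = 1.0 and b1 = 2.2 are hypothetical, chosen for illustration only (they are not estimated from any data in these slides), and `predict_price` is a hypothetical helper name. Y is price in Rs. 1,00,000 and X is size in units of 1,000 sq ft.

```python
# HYPOTHETICAL coefficients for illustration only (not estimated from real data).
# Y = price in Rs. 1,00,000; X = size in units of 1,000 sq ft.
B0 = 1.0  # intercept: predicted price when size is 0
B1 = 2.2  # slope: change in predicted price per extra 1,000 sq ft


def predict_price(size_thousand_sqft: float) -> float:
    """Predicted price (in Rs. 1,00,000) for a house of the given size."""
    return B0 + B1 * size_thousand_sqft


# Intercept: the prediction at X = 0 is exactly b0.
print(predict_price(0.0))

# Slope: increasing X by 1 unit (1,000 sq ft) raises the prediction by b1.
print(round(predict_price(3.0) - predict_price(2.0), 6))
```

The same one-unit difference holds at any starting size, which is what "linear" means here.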
Linear Prediction
We can now predict the price of a house when we know only its size.
Y = dependent variable; X1, X2, ..., Xp = independent variables. The linear relationship is written as:
$Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_p X_p + u$
Estimating this model requires more rigorous statistical tools than simple graphical methods.
Least Squares Method
A reasonable way to fit a line is to minimize the amount by which the fitted value differs from the actual value. This amount is called the residual, or error.
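$Y_i = b_0 + b_1 X_i + u_i$
$\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$ (fitted value)
What is the fitted value?
The dots are the observed values, and the line represents our fitted values, given by $\hat{Y}_i = \hat{b}_0 + \hat{b}_1 X_i$.
The residual $\hat{u}_i$ is the difference between an observed value and its fitted value:
$\hat{u}_i = Y_i - \hat{Y}_i$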
Why not simply minimize the total of the residuals, $\sum_i \hat{u}_i$? The total may be small even when the individual residuals are widely scattered, because positive residuals cancel out negative ones.
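OLS Criteria
Ordinary least squares (OLS) chooses $\hat{b}_0$ and $\hat{b}_1$ to minimize the sum of squared residuals, $\sum_i \hat{u}_i^2 = \sum_i (Y_i - \hat{Y}_i)^2$, so positive and negative residuals cannot cancel out.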
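With a single X variable, the least squares criterion has a closed-form solution: $\hat{b}_1 = \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) / \sum_i (X_i - \bar{X})^2$ and $\hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X}$. The sketch below applies it to four made-up houses (hypothetical data, not from these slides). It illustrates why the plain total of residuals is useless as a criterion: with an intercept in the model, the OLS residuals always sum to zero, while the sum of squared residuals still measures how far the points lie from the line.

```python
def ols_fit(x, y):
    """Least squares intercept and slope for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_sq_resid(b0, b1, x, y):
    """Sum of squared residuals for the line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# HYPOTHETICAL data: size (1,000 sq ft) and price (Rs. 1,00,000)
size = [1.0, 2.0, 3.0, 4.0]
price = [3.0, 6.0, 7.0, 10.0]

b0, b1 = ols_fit(size, price)
fitted = [b0 + b1 * xi for xi in size]
residuals = [yi - fi for yi, fi in zip(price, fitted)]

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
# Positive and negative residuals cancel: the total is (numerically) zero.
print(f"sum of residuals = {sum(residuals):.6f}")
# The squared criterion that OLS minimizes stays positive.
print(f"sum of squared residuals = {sum_sq_resid(b0, b1, size, price):.3f}")
# Nudging the slope away from the OLS value makes the criterion worse.
print(f"with slope + 0.1: {sum_sq_resid(b0, b1 + 0.1, size, price):.3f}")
```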
How well does the sample regression line fit the data? We want to know what proportion of the variation in Y our model explains.
[Figure: Ballentine (Venn diagram) view of r², with panels for r² = 0 and r² = 1]
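The proportion of variation explained is commonly computed as $r^2 = 1 - \text{SSR}/\text{SST}$, where SSR is the sum of squared residuals and SST is the total sum of squares of Y around its mean. The sketch below reuses a hypothetical four-house dataset and its least squares line $\hat{Y} = 1.0 + 2.2X$ (illustrative values, not from these slides). It also reproduces the two extreme cases in the figure: fitted values equal to the observed values give r² = 1, and predicting the mean of Y for every house gives r² = 0.

```python
def r_squared(y, fitted):
    """Proportion of the variation in y explained by the fitted values."""
    y_bar = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation in Y
    return 1 - ssr / sst


# HYPOTHETICAL data: size (1,000 sq ft) and price (Rs. 1,00,000)
size = [1.0, 2.0, 3.0, 4.0]
price = [3.0, 6.0, 7.0, 10.0]
fitted = [1.0 + 2.2 * x for x in size]  # least squares line for this data

print(round(r_squared(price, fitted), 3))                    # model fit
print(r_squared(price, price))                               # perfect fit: r² = 1
print(round(r_squared(price, [sum(price) / len(price)] * 4), 3))  # mean only: r² = 0
```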
Sam wants to predict sales of a compact cassette tape recorder across stores using advertisement and price data, where Sales is measured in number of units sold, Advertisement is the number of times the product is advertised within the store, and Price is in dollars. Predict sales of the recorder when Advertisement = 7 and Price = $132.
$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + u_i$
Estimated Equation
$\widehat{\text{Sales}} = 219.231 + 6.381 \times \text{Advertisement} - 1.671 \times \text{Price}$
Prediction
Predict Sales when Advertisement = 7 and Price = $132:
Sales = 219.231 + 6.381 × 7 − 1.671 × 132 = 219.231 + 44.667 − 220.572 = 43.326 units sold
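A quick check of the prediction arithmetic, using the coefficients from the estimated equation in the SPSS output; `predict_sales` is a hypothetical helper name.

```python
# Coefficients from the estimated equation (SPSS output)
B0, B_ADV, B_PRICE = 219.231, 6.381, -1.671


def predict_sales(advertisements: float, price_dollars: float) -> float:
    """Predicted units sold for a given number of in-store ads and a price in dollars."""
    return B0 + B_ADV * advertisements + B_PRICE * price_dollars


print(round(predict_sales(7, 132), 3))  # 43.326
```

Because the price coefficient is negative, the 1.671 × 132 term is subtracted; adding it instead is an easy way to get a wildly wrong prediction.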
R-Square
SPSS output: Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .884a   .782       .637                16.108
R Square = 0.782 indicates that the model explains 78.2% of the variation in Y (Sales).
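The Adjusted R Square in the same output penalizes R Square for the number of explanatory variables: $\bar{R}^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)$, with k explanatory variables excluding the intercept. The sample size is not reported in the output shown; the value n = 6 used below is an assumption, inferred because it is the sample size that reproduces the reported .637 from R Square = .782 with k = 2.

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R-square for n observations and k explanatory variables (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# ASSUMPTION: n = 6 stores is inferred, not reported in the SPSS output.
print(round(adjusted_r_squared(0.782, n=6, k=2), 3))
```

With so few observations, the penalty is large, which is why the adjusted value (.637) is well below R Square (.782).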
Hypothesis testing
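Compute $t = \hat{b}_j / SE(\hat{b}_j)$ for each coefficient. For df = n − k (k = number of estimated coefficients, including the intercept) and the chosen level of significance, read the critical value $t_{crit}$ from the t table. Decision rule: if the calculated $|t| > t_{crit}$, reject H0.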
Hypothesis testing
Test whether each slope coefficient has a significant effect on the Y variable at the 0.05 significance level.
$H_0: b_1 = 0$ versus $H_1: b_1 \neq 0$
For Advertisement, Sig. (SPSS output) = 0.061. Since 0.061 > 0.05, do not reject H0: advertisement has no statistically significant effect on Sales at the 5% level.
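Coefficients (SPSS output), Price in Dollars row:
Variable           B (unstandardized)   Std. Error   Beta (standardized)   t        Sig.
Price in Dollars   -1.671               .684         -.706                 -2.441   .092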
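A minimal sketch of both decision rules for the slope coefficients. The t statistic is recomputed from the Price row of the coefficients table. The degrees of freedom are an assumption: n = 6 stores is inferred (not reported), with k = 3 estimated coefficients, giving df = n − k = 3, for which a standard t table gives a two-tailed 5% critical value of 3.182. The Sig. rule uses the p-values reported by SPSS directly.

```python
# Coefficient and standard error for Price, from the SPSS coefficients table
b_price, se_price = -1.671, 0.684
t_stat = b_price / se_price  # about -2.443; SPSS reports -2.441 from unrounded values

# ASSUMPTION: n = 6 stores (inferred, not reported) and k = 3 estimated
# coefficients (intercept + 2 slopes), so df = n - k = 3.
# Two-tailed critical value for alpha = 0.05 and df = 3, read from a t table.
T_CRIT = 3.182
ALPHA = 0.05

reject_by_t = abs(t_stat) > T_CRIT
print(f"Price: t = {t_stat:.3f}, |t| > {T_CRIT}? {reject_by_t}")

# Equivalent rule using the Sig. (p-value) column reported by SPSS
sig_values = {"Advertisement": 0.061, "Price in Dollars": 0.092}
for name, sig in sig_values.items():
    decision = "reject H0" if sig < ALPHA else "do not reject H0"
    print(f"{name}: Sig. = {sig} -> {decision}")
```

Both rules agree here: neither slope is significant at the 5% level.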
Is the regression as a whole significant? Test whether at least one X variable has an effect on Y: $H_0: b_1 = b_2 = 0$ versus $H_1$: at least one $b_j \neq 0$.
Statistic used: the F statistic, given in the ANOVA table of the SPSS output. At the 0.05 significance level, if Sig. < 0.05, reject H0.
a. Predictors: (Constant), Price in Dollars, Number of advertisement
b. Dependent Variable: Sales (units sold)
Sig. = 0.102 > 0.05, so do not reject H0: at the 5% level, there is no evidence that Sales depends on either X variable.
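The overall F statistic can also be recovered from R Square: $F = (R^2/k) / ((1 - R^2)/(n - k - 1))$. The ANOVA table itself is not reproduced here, so this sketch rests on an assumption: n = 6 stores, inferred from the R Square and Adjusted R Square values rather than reported. With df = (2, 3), a standard F table gives a 5% critical value of 9.55. `f_from_r_squared` is a hypothetical helper name, and because R Square is rounded, the result only approximates the F in the SPSS output.

```python
def f_from_r_squared(r2: float, n: int, k: int) -> float:
    """Overall F statistic from R-square, n observations and k explanatory variables."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# ASSUMPTION: n = 6 stores (inferred from R Square and Adjusted R Square), k = 2.
F = f_from_r_squared(0.782, n=6, k=2)

# Critical value for alpha = 0.05 with df = (k, n - k - 1) = (2, 3), read from an F table.
F_CRIT = 9.55

print(f"F = {F:.2f}, reject H0: {F > F_CRIT}")
```

An F of roughly 5.4 falls short of 9.55, which is consistent with the reported Sig. = 0.102: a fairly high R Square is not enough for significance when there are very few observations.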