
What is Modeling?

Given paired observations of two quantities m and l:

     m      l
    1.0   15.0
    1.5   17.0
    2.0   18.0
    2.5   19.5
    3.0   21.0

we can ask: what straight line m̂ = a + b·l best describes how m
depends on l? Fitting the line to the data gives a = −4 and b = 0.33,
so the model is m̂ = −4 + 0.33·l, and for a new value l = 20.7 it
predicts m̂ = −4 + 0.33(20.7) ≈ 2.9.

[Scatter plot of the data with the fitted line m̂ = a + b·l]
Simple Linear Regression

• The equation that describes how y is related to x and


an error term is called the regression model.
• The simple linear regression model is: y = β0 + β1x +ε
where:
β0 and β1 are called parameters of the model, ε is a
random variable called the error term.
Simple Linear Regression

• The simple linear regression equation is: E(y) = β0 + β1x


• Graph of the regression equation is a straight line.
• β0 is the y intercept of the regression line.
• β1 is the slope of the regression line.
• E(y) is the expected value of y for a given x value.
Simple Linear Regression

Positive Linear Relationship

[Graph: E(y) versus x; the regression line rises from intercept β0, slope β1 is positive]
Simple Linear Regression

Negative Linear Relationship

[Graph: E(y) versus x; the regression line falls from intercept β0, slope β1 is negative]
Simple Linear Regression

No Relationship

[Graph: E(y) versus x; the regression line is horizontal at intercept β0, slope β1 is zero]
Simple Linear Regression

 The estimated simple linear regression equation is:

    ŷ = b0 + b1x

 The graph is called the estimated regression line.
 b0 is the y intercept of the line.
 b1 is the slope of the line.
 ŷ is the estimated value of y for a given x value.
Simple Linear Regression

Regression Model                      Sample Data:
    y = β0 + β1x + ε                       x    y
Regression Equation                       x1   y1
    E(y) = β0 + β1x                        .    .
Unknown Parameters                         .    .
    β0, β1                                xn   yn

        Estimated Regression Equation
            ŷ = b0 + b1x
        Sample Statistics: b0, b1

b0 and b1 provide estimates of β0 and β1.
Simple Linear Regression

Least Squares Criterion

    min Σ(yi − ŷi)²

where:
    yi = observed value of the dependent variable for the ith observation
    ŷi = estimated value of the dependent variable for the ith observation
Simple Linear Regression

Slope for the Estimated Regression Equation

    b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²

y-Intercept for the Estimated Regression Equation

    b0 = ȳ − b1x̄

where:
    xi = value of the independent variable for the ith observation
    yi = value of the dependent variable for the ith observation
    x̄ = mean value of the independent variable
    ȳ = mean value of the dependent variable
Simple Linear Regression

Example: Reed Auto Sales

Reed Auto periodically has a special week-long sale. As part of the
advertising campaign, Reed runs one or more television commercials
during the weekend preceding the sale. Data from a sample of 5
previous sales are:

    Number of TV Ads    Number of Cars Sold
           1                    14
           3                    24
           2                    18
           1                    17
           3                    27
Simple Linear Regression

Slope for the Estimated Regression Equation

    b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² = 20/4 = 5

y-Intercept for the Estimated Regression Equation

    b0 = ȳ − b1x̄ = 20 − 5(2) = 10

Estimated Regression Equation

    ŷ = 10 + 5x
Simple Linear Regression

[Scatter plot: Cars Sold (0 to 30) versus TV Ads (0 to 4) with the fitted line y = 5x + 10]
Simple Linear Regression

Relationship Among SST, SSR, SSE

    SST = SSR + SSE

    Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)²

where:
    SST = Total Sum of Squares
    SSR = Sum of Squares due to Regression
    SSE = Sum of Squares due to Error
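The identity SST = SSR + SSE can be verified numerically for the Reed Auto data; a small sketch in plain Python, with fitted values taken from ŷ = 10 + 5x:

```python
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
y_bar = sum(y) / len(y)                      # 20
y_hat = [10 + 5 * xi for xi in x]            # fitted values from y-hat = 10 + 5x

sst = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # regression sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # error sum of squares

print(sst, ssr, sse)   # 114.0 100.0 14.0, and 114 = 100 + 14
```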
Simple Linear Regression

The coefficient of determination is: r² = SSR/SST

where:
    SSR = sum of squares due to regression
    SST = total sum of squares

For the Reed Auto example: r² = SSR/SST = 100/114 = 0.8772

The regression relationship is very strong; 88% of the


variability in the number of cars sold can be explained
by the linear relationship between the number of TV
ads and the number of cars sold.
Simple Linear Regression

rxy  sign of b1 coefficient of determinat ion


rxy  sign of b1 r 2

Where b1 is the the slope of the estimated regression


equation yˆ  b0  b1 x

The sign of b1 in the equation yˆ 10  5x is “+”.


rxy   0.8772

Hence, rxy = +0.9366
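The same arithmetic in a short Python sketch (values from the Reed Auto example):

```python
import math

r_squared = 100 / 114            # r^2 = SSR/SST from the Reed Auto example
b1 = 5                           # slope of y-hat = 10 + 5x, so its sign is "+"
r_xy = math.copysign(math.sqrt(r_squared), b1)
print(round(r_xy, 4))            # 0.9366
```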


Simple Linear Regression

1. The error ε is a random variable with mean of zero.


2. The variance of ε, denoted by σ2, is the same for all
values of the independent variable.
3. The values of ε are independent.
4. The error ε is a normally distributed random variable.
Simple Linear Regression

 To test for a significant regression relationship, we must conduct a
hypothesis test to determine whether the value of β1 is zero.
 Two tests, namely the t test and the F test, are commonly used.
 Both the t test and the F test require an estimate of σ², the
variance of ε in the regression model.
Simple Linear Regression

An Estimate of σ²
The mean square error (MSE) provides the estimate of σ², and the
notation s² is also used.

    s² = MSE = SSE/(n − 2)

where:  SSE = Σ(yi − ŷi)² = Σ(yi − b0 − b1xi)²
Simple Linear Regression

An Estimate of σ
 To estimate σ we take the square root of s².
 The resulting s is called the standard error of the estimate.

    s = √MSE = √(SSE/(n − 2))
Simple Linear Regression

Hypotheses
    H0: β1 = 0
    H1: β1 ≠ 0

Test Statistic

    t = b1 / sb1

where sb1 is the estimated standard deviation of b1.
Simple Linear Regression

Rejection Rule

    Reject H0 if p-value < α
    or t < −tα/2 or t > tα/2

where:
    tα/2 is based on a t distribution with n − 2 degrees of freedom
Simple Linear Regression
1. Determine the hypotheses.
       H0: β1 = 0
       H1: β1 ≠ 0

2. Specify the level of significance: α = 0.05

3. Select the test statistic.

       t = b1 / sb1

4. State the rejection rule.
       Reject H0 if p-value < 0.05 or |t| > 3.182 (with 3 degrees of freedom)
Simple Linear Regression

5. Compute the value of the test statistic.

       t = b1 / sb1 = 5 / 1.08 = 4.63

6. Determine whether to reject H0.
       t = 4.541 provides an area of 0.01 in the upper tail.
       Hence, the two-tailed p-value is less than 2(0.01) = 0.02.
       (Also, t = 4.63 > 3.182.) We can reject H0.
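The test statistic and the decision can be checked in a few lines of Python (b1, s_b1, and the critical value are the slide's numbers):

```python
b1 = 5.0
s_b1 = 1.08        # standard error of b1 from the slides
t = b1 / s_b1
t_crit = 3.182     # t value for alpha/2 = 0.025 with n - 2 = 3 degrees of freedom

print(round(t, 2))         # 4.63
print(abs(t) > t_crit)     # True -> reject H0
```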
Simple Linear Regression
 We can use a 95% confidence interval for β1 to test the hypotheses
just used in the t test.
 H0 is rejected if the hypothesized value of β1 is not included in
the confidence interval for β1.

 The form of a confidence interval for β1 is:

    b1 ± tα/2 sb1

where b1 is the point estimator, tα/2 sb1 is the margin of error, and
tα/2 is the t value providing an area of α/2 in the upper tail of a
t distribution with n − 2 degrees of freedom.
Simple Linear Regression
 Rejection Rule
    Reject H0 if 0 is not included in the confidence interval for β1.
 95% Confidence Interval for β1
    b1 ± tα/2 sb1 = 5 ± 3.182(1.08) = 5 ± 3.44, or 1.56 to 8.44
 Conclusion
    0 is not included in the confidence interval. Reject H0.
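The interval calculation in a short Python sketch (the slide's numbers):

```python
b1, s_b1 = 5.0, 1.08
t_crit = 3.182                       # t value for alpha/2 = 0.025, 3 degrees of freedom
margin = t_crit * s_b1               # margin of error, about 3.44
lo, hi = b1 - margin, b1 + margin

print(round(lo, 2), round(hi, 2))    # 1.56 8.44
print(not (lo <= 0 <= hi))           # True -> 0 is outside the interval, reject H0
```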
Simple Linear Regression

Hypotheses
    H0: β1 = 0
    H1: β1 ≠ 0

Test Statistic
    F = MSR/MSE
where MSR = SSR/1 is the mean square due to regression (simple linear
regression has one independent variable).
Simple Linear Regression

Rejection Rule
Reject H0 if p-value < α or F > Fα

where:
Fα is based on an F distribution with 1 degree of
freedom in the numerator and n-2 degrees of freedom
in the denominator
Simple Linear Regression

1. Determine the hypotheses.


H0: β1 = 0
H1: β1 ≠ 0
2. Specify the level of significance.
α = 0.05
3. Select the test statistic.
F = MSR/MSE

4. State the rejection rule.


Reject H0 if p-value < 0.05 or F > 10.13 (with 1 d.f. in
numerator and 3 d.f. in denominator)
Simple Linear Regression

5. Compute the value of the test statistic.


F = MSR/MSE = 100/4.667 = 21.43

6. Determine whether to reject H0.


F = 17.44 provides an area of 0.025 in the upper tail.
Thus, the p-value corresponding to F = 21.43 is less than
2(0.025) = 0.05. Hence, we reject H0.
The statistical evidence is sufficient to conclude that we
have a significant relationship between the number of TV
ads aired and the number of cars sold.
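The F statistic and decision can be checked numerically (sums of squares from the Reed Auto example):

```python
ssr, sse, n = 100.0, 14.0, 5     # sums of squares from the Reed Auto example
msr = ssr / 1                    # MSR: 1 degree of freedom in the numerator
mse = sse / (n - 2)              # MSE: n - 2 = 3 degrees of freedom in the denominator
f = msr / mse

print(round(f, 2))               # 21.43
print(f > 10.13)                 # True -> reject H0 at alpha = 0.05
```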
Simple Linear Regression
• If the assumptions about the error term ε appear questionable, the
hypothesis tests about the significance of the regression relationship
and the interval estimation results may not be valid.
• The residuals provide the best information about ε.
• Residual for Observation i:

    yi − ŷi

• Much of residual analysis is based on an examination of graphical plots.
Simple Linear Regression

If the assumption that the variance of ε is the same for all values of
x is valid, and the assumed regression model is an adequate
representation of the relationship between the variables, then the
residual plot should give an overall impression of a horizontal band
of points.
Simple Linear Regression

y  yˆ Good Pattern
Residual

x
Simple Linear Regression

y  yˆ Non-constant Variance
Residual

x
Simple Linear Regression

y  yˆ Model Form Not Adequate


Residual

x
Simple Linear Regression
Residuals
    Observation    Predicted Cars Sold    Residual
         1                 15                -1
         2                 25                -1
         3                 20                -2
         4                 15                 2
         5                 25                 2
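The table of predicted values and residuals can be reproduced in a couple of lines:

```python
x = [1, 3, 2, 1, 3]              # TV ads
y = [14, 24, 18, 17, 27]         # cars sold
y_hat = [10 + 5 * xi for xi in x]                     # predictions from y-hat = 10 + 5x
residuals = [yi - yh for yi, yh in zip(y, y_hat)]     # residual = observed - predicted

print(y_hat)        # [15, 25, 20, 15, 25]
print(residuals)    # [-1, -1, -2, 2, 2]
```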
Simple Linear Regression

TV Ads Residual Plot

[Residual plot: residuals (−3 to 3) versus TV Ads (0 to 4), showing a horizontal band of points]
Multiple Regression…

The simple linear regression model was used to analyze how one
variable (the dependent variable y) is related to one other variable
(the independent variable x).
Multiple regression allows for any number of independent variables.
We expect to develop models that fit the data better than would a
simple linear regression model.

Simple regression considers the relation between a single explanatory
variable and a response variable.
Multiple regression simultaneously considers the influence of multiple
explanatory variables on a response variable Y.

The intent is to look at the independent effect of each variable while
"adjusting out" the influence of potential confounders.
The Model…
We now assume we have k independent variables potentially related to
the one dependent variable. This relationship is represented in this
first-order linear equation:

    y = β0 + β1x1 + β2x2 + … + βkxk + ε

where y is the dependent variable, x1, …, xk are the independent
variables, β0, …, βk are the coefficients, and ε is the error variable.

In the one-variable, two-dimensional case we drew a regression line;
here we imagine a response surface.
Regression Modeling

 A simple regression model


(one independent variable)
fits a regression line in 2-
dimensional space

 A multiple regression model


with two explanatory
variables fits a regression
plane in 3-dimensional
space
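As an illustration of fitting such a regression plane, here is a sketch using NumPy's least squares solver. The data and the generating coefficients (3, 2, −1) are hypothetical, made up purely for demonstration:

```python
import numpy as np

# Hypothetical data: two explanatory variables x1, x2 and a response y
# generated as y = 3 + 2*x1 - 1*x2 (no noise, for illustration only)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 3 + 2 * x1 - 1 * x2

# Design matrix with a column of ones for the intercept b0
X = np.column_stack([np.ones_like(x1), x1, x2])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coeffs
print(np.round(coeffs, 4))   # recovers the generating coefficients 3, 2, -1
```

With noise-free data the solver recovers the plane exactly; with real data the fitted coefficients estimate the βs.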
Required Conditions…

For these regression methods to be valid, the following four
conditions for the error variable ε must be met:
• The probability distribution of the error variable ε is normal.
• The mean of the error variable is 0.
• The standard deviation of ε is σε, which is a constant.
• The errors are independent.
Estimating the Coefficients…
The sample regression equation is expressed as:

    ŷ = b0 + b1x1 + b2x2 + … + bkxk

We will check the following:

Assess the model…
    How well it fits the data
    Is it useful?
    Are any required conditions violated?
Employ the model…
    Interpreting the coefficients
    Predictions using the prediction equation
    Estimating the expected value of the dependent variable
