
International University IU

TA [SWC]
Statistics for Business | Chapter 11: Multiple Regression

STATISTICS FOR BUSINESS
CHAPTER 11
MULTIPLE REGRESSION

STRUCTURE OF PAPER
PART I - MULTIPLE REGRESSION MODEL
PART II - MEASURES OF PERFORMANCE OF A REGRESSION MODEL AND THE ANOVA TABLE
PART III - THE F TEST OF A MULTIPLE REGRESSION MODEL
PART IV - TESTS OF THE SIGNIFICANCE OF INDIVIDUAL REGRESSION PARAMETERS

PART I
MULTIPLE REGRESSION MODEL

The population regression model of a dependent variable $Y$ on a set of $k$ independent
variables $X_1, X_2, \ldots, X_k$ is given by
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon$$
where $\beta_0$ is the intercept of the regression surface and each $\beta_i$, $i = 1, \ldots, k$, is the slope
of the regression surface (sometimes called the response surface) with respect to variable $X_i$.




Model Assumptions:
1. For each observation, the error term $\varepsilon$ is normally distributed with mean zero and standard
deviation $\sigma$ and is independent of the error terms associated with all other observations.
That is,
$$\varepsilon_j \sim N(0, \sigma^2) \quad \text{for all } j = 1, 2, \ldots, n$$
independent of other errors.
2. In the context of regression analysis, the variables $X_i$ are considered fixed quantities, although
in the context of correlational analysis, they are random variables. In any case, the $X_i$
are independent of the error term $\varepsilon$. When we assume that the $X_i$ are fixed quantities, we are
assuming that we have realizations of the $k$ variables $X_i$ and that the only randomness in $Y$ comes
from the error term $\varepsilon$.

The Estimated Regression Relationship
The estimated regression relationship is
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$$
where $\hat{Y}$ is the predicted value of $Y$, the value lying on the estimated regression surface.
The terms $b_i$, $i = 0, \ldots, k$, are the least-squares estimates of the population regression
parameters $\beta_i$.
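As a minimal sketch of how the least-squares estimates $b_0, \ldots, b_k$ can be obtained, the Python code below solves the normal equations $(X'X)b = X'Y$ with plain Gaussian elimination. The function names `solve_linear_system` and `fit_least_squares` and the small data set are illustrative assumptions, not part of the course material; statistical packages generally rely on more numerically robust methods.

```python
def solve_linear_system(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Build the augmented matrix [a | rhs] so the inputs are not modified.
    m = [list(a[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bk] given rows of predictor values and responses y."""
    # Prepend a column of ones so that b0 plays the role of the intercept.
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from Y = 2 + 3*X1 - 1*X2 (no error term),
# so the estimates should recover 2, 3 and -1 up to round-off.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
y = [2 + 3 * x1 - x2 for x1, x2 in rows]
b = fit_least_squares(rows, y)
print([round(v, 6) for v in b])
```

Because the hypothetical responses contain no error term, the fitted coefficients coincide with the generating ones; with real data the estimates would differ from the population parameters.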









PART II
MEASURES OF PERFORMANCE OF A REGRESSION MODEL
AND THE ANOVA TABLE

Mean Square Error (MSE)
Standard Error of Estimate (s)
MSE and s are measures of how well the regression fits the data.
$$MSE = \frac{SSE}{n - (k + 1)}$$
$$s = \sqrt{MSE}$$

Multiple Correlation Coefficient ($R$)
Multiple Coefficient of Determination ($R^2$)
Adjusted Multiple Coefficient of Determination ($\bar{R}^2$)
$R$, $R^2$, and $\bar{R}^2$ are measures of how well the regression model fits the data. In other
words, they indicate how much of the variation in the dependent variable is explained
by the independent variables $X_i$; $R^2$ is exactly that proportion.

Multiple Coefficient of Determination
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

Multiple Correlation Coefficient
$$R = \sqrt{R^2}$$

Adjusted Multiple Coefficient of Determination
$$\bar{R}^2 = 1 - \frac{SSE / [n - (k + 1)]}{SST / (n - 1)} = 1 - (1 - R^2)\,\frac{n - 1}{n - (k + 1)}$$
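The measures above can be computed directly from the sums of squares. The following Python sketch is only an illustration: the function name `fit_measures` and the input numbers are hypothetical and chosen so that the arithmetic is easy to follow by hand.

```python
import math


def fit_measures(sse, sst, n, k):
    """Return MSE, s, R^2, R and adjusted R^2 for a model with k predictors."""
    df_error = n - (k + 1)           # error degrees of freedom
    mse = sse / df_error             # mean square error
    s = math.sqrt(mse)               # standard error of estimate
    r2 = 1 - sse / sst               # multiple coefficient of determination
    r = math.sqrt(r2)                # multiple correlation coefficient
    adj_r2 = 1 - (sse / df_error) / (sst / (n - 1))
    return {"MSE": mse, "s": s, "R2": r2, "R": r, "adj_R2": adj_r2}


# Hypothetical sums of squares: SSE = 20, SST = 100, n = 25 observations, k = 4.
m = fit_measures(sse=20.0, sst=100.0, n=25, k=4)
print(m)
```

With these inputs the error degrees of freedom are 20, so MSE = 1 and R² = 0.8, while the adjusted value drops to 0.76 because it penalizes the four predictors.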




ANOVA Table for Multiple Regression Model
Source of Variation   Sum of Squares (SS)   df            Mean Square (MS)            F ratio (F_T)
Regression (R)        SSR                   k             MSR = SSR / k               F_T = MSR / MSE
Error (E)             SSE                   n - (k + 1)   MSE = SSE / [n - (k + 1)]
Total (T)             SST                   n - 1
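A short Python sketch of how the ANOVA table is filled in from SSR, SSE, n and k is given below. The function name `anova_table` and the input values are hypothetical; the sketch relies on the decomposition SST = SSR + SSE.

```python
def anova_table(ssr, sse, n, k):
    """Return the ANOVA rows as tuples (source, SS, df, MS, F).

    Entries that are not defined for a row are returned as None.
    """
    df_reg, df_err = k, n - (k + 1)
    msr = ssr / df_reg
    mse = sse / df_err
    return [
        ("Regression", ssr, df_reg, msr, msr / mse),
        ("Error", sse, df_err, mse, None),
        ("Total", ssr + sse, n - 1, None, None),
    ]


# Hypothetical input: SSR = 90, SSE = 30, n = 20 observations, k = 4 predictors.
table = anova_table(ssr=90.0, sse=30.0, n=20, k=4)
for source, ss, df, ms, f in table:
    print(source, ss, df, ms, f)
```

Note that the regression and error degrees of freedom always add up to the total, n - 1, which is a quick check when filling in a table by hand.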







PART III
THE F TEST OF A MULTIPLE REGRESSION MODEL
The F test of a multiple regression model is a statistical hypothesis test for the existence
of a linear relationship between $Y$ and any of the $X_i$.

F Test of a Multiple Regression Model
HYPOTHESIS TESTING PROCESS:

STEP 01: Determine the null and alternative hypotheses:
$$H_0: \beta_1 = \beta_2 = \beta_3 = \cdots = \beta_k = 0$$
$$H_1: \text{Not all the } \beta_i \ (i = 1, \ldots, k) \text{ are zero}$$

STEP 02: Construct the ANOVA table for the multiple regression model
Source of Variation   Sum of Squares (SS)   df            Mean Square (MS)            F ratio (F_T)
Regression (R)        SSR                   k             MSR = SSR / k               F_T = MSR / MSE
Error (E)             SSE                   n - (k + 1)   MSE = SSE / [n - (k + 1)]
Total (T)             SST                   n - 1






STEP 03: Compute the test statistic value ($F_T$), based on the ANOVA table, and the
critical value ($F_C$), based on the level of significance
The test statistic value:
$$F_T = F\ \text{ratio} = \frac{MSR}{MSE}$$
At the level of significance $\alpha$, the critical value:
$$F_C = F_{(\alpha,\ k,\ n - (k + 1))}$$


STEP 04: CONCLUSION
+ Situation 01: We cannot reject the null hypothesis $H_0$ since $F_T < F_C$.
When we cannot reject the null hypothesis, there is no statistical
evidence of a linear relationship between the dependent variable and
any of the independent variables $X_i$ in the proposed regression model.

+ Situation 02: We can reject the null hypothesis $H_0$ since $F_T > F_C$.
When we can reject the null hypothesis, there is
statistical evidence to conclude that a regression relationship exists
between the dependent variable and at least one of the independent
variables $X_i$ proposed in the regression model.
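The four steps of the F test can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `f_test` and the sums of squares are hypothetical, and the critical value is passed in by the caller after being read from an F table (here $F_{(0.05,\,3,\,8)} = 4.07$).

```python
def f_test(ssr, sse, n, k, f_critical):
    """Return (F_T, reject_H0) for the overall F test of the regression."""
    msr = ssr / k                  # mean square regression
    mse = sse / (n - (k + 1))      # mean square error
    f_t = msr / mse                # test statistic
    return f_t, f_t > f_critical   # reject H0 only when F_T exceeds F_C


# Hypothetical sums of squares with n = 12 observations and k = 3 predictors.
f_t, reject = f_test(ssr=60.0, sse=16.0, n=12, k=3, f_critical=4.07)
print(f_t, reject)
```

Keeping the critical value as an argument mirrors the manual procedure, in which $F_C$ is looked up for the chosen level of significance and degrees of freedom.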










PART IV
TESTS OF THE SIGNIFICANCE OF INDIVIDUAL REGRESSION PARAMETERS
A test for the significance of an individual parameter is important because it tells us
not only whether there is evidence that variable $X_i$ ($i = 1, \ldots, k$) has a linear
relationship with $Y$ but also whether there is statistical evidence that variable $X_i$ has
explanatory power with respect to the dependent variable $Y$.

Tests of the Significance of Individual Regression Parameters
HYPOTHESIS TESTING PROCESS:

STEP 01: Determine the null and alternative hypotheses:
(Note: these are two-tailed tests.)
(1) $H_0: \beta_1 = 0$
    $H_1: \beta_1 \neq 0$
(2) $H_0: \beta_2 = 0$
    $H_1: \beta_2 \neq 0$
...
(k) $H_0: \beta_k = 0$
    $H_1: \beta_k \neq 0$

STEP 02: Compute the test statistic value ($t_T$ / $z_T$) and the critical values ($t_C$ / $z_C$)
based on the level of significance
+ Situation 01: If $n - (k + 1) \le 30$, we use the t distribution



For test $i$ ($i = 1, \ldots, k$), the test statistic value:
$$t_T = \frac{b_i}{s(b_i)}$$
At the level of significance ($\alpha$), the critical values:
$$t_C = t_{[\alpha/2,\ n - (k + 1)]}$$

+ Situation 02: If $n - (k + 1) > 30$, we use the z distribution
For test $i$ ($i = 1, \ldots, k$), the test statistic value:
$$z_T = \frac{b_i}{s(b_i)}$$
At the level of significance ($\alpha$), the critical values:
$$z_C = z_{[\alpha/2]}$$

STEP 03: CONCLUSION
At the level of significance $\alpha$, for each test,
+ Situation 01: We cannot reject $H_0$ since $t_T \in [-t_C, t_C]$ or $z_T \in [-z_C, z_C]$.
When we cannot reject the null hypothesis, the slope
parameter of $X_i$ is not significant: there is no statistical evidence of a linear
relationship between the dependent variable and the independent variable $X_i$.

+ Situation 02: We can reject $H_0$ since $t_T \notin [-t_C, t_C]$ or $z_T \notin [-z_C, z_C]$.
When we can reject the null hypothesis, the variable $X_i$
is significant. It means that there is statistical evidence that variable $X_i$
has a linear relationship with, and explanatory power with respect to,
the dependent variable.
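The procedure for a single coefficient can be sketched in Python as shown below. The function name `coefficient_test` and the example numbers are hypothetical; the critical value ($t_C$ or $z_C$) is assumed to be looked up from a table by the caller, for instance $z_C = 1.96$ at $\alpha = 0.05$.

```python
def coefficient_test(b_i, se_b_i, n, k, critical_value):
    """Two-tailed test of H0: beta_i = 0.

    critical_value is t_C when n - (k + 1) <= 30 and z_C otherwise;
    it is looked up from a table by the caller.
    """
    df = n - (k + 1)
    distribution = "t" if df <= 30 else "z"
    statistic = b_i / se_b_i
    # Reject H0 when the statistic falls outside [-critical, +critical].
    reject = abs(statistic) > critical_value
    return distribution, statistic, reject


# Hypothetical coefficient from a large sample: df = 100 - 5 = 95 > 30,
# so the z distribution applies, with z_C = 1.96 at alpha = 0.05.
result = coefficient_test(b_i=0.5, se_b_i=0.2, n=100, k=4, critical_value=1.96)
print(result)
```

Comparing the absolute value of the statistic with the critical value is equivalent to checking whether it lies outside the interval $[-t_C, t_C]$ or $[-z_C, z_C]$.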


Sample
PROBLEM 01: (The Form of Multiple-Choice Questions)
A sample of size 12 is taken from four populations. The SPSS output for the regression
analysis, which is missing some values, is as follows:
$n = 12$, $k = 4 - 1 = 3$
ANOVA
Model           Sum of Squares   df   Mean Square   F        Sig.
1 Regression    80.117           3    26.706        56.700   0.000
  Residual      3.768            8    0.471
  Total         83.885           11
a Predictors: (Constant), X1, X2, X3
b Dependent Variable: Y

Coefficients
Model          Coefficients   Std. Error   t        Sig.
1 (Constant)   45.56          5.674
  X1           2.754          0.775        3.5535   0.0075
  X2           3.56           1.107        3.2159   0.0123
  X3           1.85           1.065        1.7371   0.1206
Dependent Variable: Y; $\alpha = 0.05$
1. Fill in the missing values in the appropriate cells of the ANOVA table and the coefficients table.
2. Comment on the result of the regression:
SOLUTION
$$H_0: \beta_1 = \beta_2 = \beta_3 = 0$$
$$H_1: \text{Not all the } \beta_i \ (i = 1, 2, 3) \text{ are zero}$$


Since the test statistic value is very large ($F_T = 56.7$, far above the critical value
$F_{(0.05,\ 3,\ 8)} = 4.07$), we can strongly reject $H_0$ at any conventional
level of significance. Based on the ANOVA table for the regression model
and the hypothesis test, we have enough evidence to conclude that there is a
regression relationship between the dependent variable $Y$ and the independent
variables $X_i$.

3. Using $\alpha = 0.05$, X3 is a significant predictor: (1 point)
a. True
b. False
SOLUTION: 3b. False
$$H_0: \beta_3 = 0$$
$$H_1: \beta_3 \neq 0$$
Based on the table of coefficients, the p-value (0.1206) is larger than $\alpha = 0.05$; that is, the
test statistic value falls in the non-rejection region. So, we cannot reject $H_0$ at the 0.05 level
of significance. Based on the table and the hypothesis test, we conclude that the
variable X3 is not a significant predictor.

4. What does this model predict for X1 = 10, X2 = 15, X3 = 50?
a. 160
b. 219
c. 238
d. Other
SOLUTION: 4b. 219
Based on the table of coefficients, the estimated multiple regression model used for
prediction is:
$$\hat{Y} = 45.560 + 2.754 X_1 + 3.56 X_2 + 1.85 X_3$$
$$\hat{Y} = 45.560 + 2.754(10) + 3.56(15) + 1.85(50) = 219$$
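The prediction can be checked with a few lines of Python. The helper name `predict` is hypothetical; the coefficients are those of the coefficients table, and the inputs are the values given in the question (X1 = 10, X2 = 15, X3 = 50).

```python
def predict(coefficients, x):
    """Return b0 + b1*x1 + ... + bk*xk; coefficients[0] is the intercept."""
    intercept, slopes = coefficients[0], coefficients[1:]
    return intercept + sum(b * xi for b, xi in zip(slopes, x))


# Coefficients from the table above: constant, X1, X2, X3.
b = [45.56, 2.754, 3.56, 1.85]
y_hat = predict(b, [10, 15, 50])
print(round(y_hat, 2))
```

The individual terms are 27.54, 53.4 and 92.5, which together with the constant 45.56 give the predicted value of 219.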



5. Compute the value of R: 97.700%
SOLUTION
$$R = \sqrt{\frac{SSR}{SST}} = \sqrt{\frac{80.117}{83.885}} = 0.977 = 97.700\%$$

PROBLEM 02: (The Form of Writing Questions)
A grocery store forecasts the monthly demand (Y) for its products using multiple
regression. Three independent variables are used: $X_1$, $X_2$, and $X_3$. The data for the 12 months
of the year 2010 are collected. The regression results are shown below:
ANOVA table:
Source           SS        df   MS   F
Regression
Residual Error
Total            625.667

Coefficients
Predictors   Coefficients   S.E. of coefficients   t
Constant     -29.743        12.903
X1           1.104          0.283
X2           1.106          0.205
X3           -0.169         0.198

$R^2 = 94.76\%$. Level of significance is $\alpha = 0.05$.

1. Fill in the ANOVA table and comment on the relationships among the variables.



SOLUTION
Source           SS        df   MS        F
Regression       593.132   3    197.711   48.615
Residual Error   32.535    8    4.067
Total            625.667   11
With $R^2$ rounded to 0.948:
$$R^2 = 1 - \frac{SSE}{SST} \quad \text{or} \quad SSE = SST(1 - R^2) = 625.667(1 - 0.948) = 32.535$$
$$H_0: \beta_1 = \beta_2 = \beta_3 = 0$$
$$H_1: \text{Not all the } \beta_i \ (i = 1, 2, 3) \text{ are zero}$$

Since the test statistic value is very large ($F_T = 48.615$, well above the critical value
$F_{(0.05,\ 3,\ 8)} = 4.07$), we can strongly reject $H_0$ at
any conventional level of significance. Based on the ANOVA table for the regression
model and the hypothesis test, we have enough evidence to conclude that there is a
regression relationship between the dependent variable $Y$ and the independent
variables $X_i$ ($i = 1, 2, 3$).
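The way the ANOVA table is completed from SST and $R^2$ alone can be sketched in Python. The function name `anova_from_r2` is hypothetical, and $R^2$ is entered as 0.948, the rounded value used in the solution; entering the unrounded 0.9476 would give slightly different figures (SSE of about 32.785).

```python
def anova_from_r2(sst, r2, n, k):
    """Fill in the ANOVA entries from SST and R^2 for k predictors."""
    sse = sst * (1 - r2)           # unexplained variation
    ssr = sst - sse                # explained variation
    msr = ssr / k
    mse = sse / (n - (k + 1))
    return {"SSR": ssr, "SSE": sse, "MSR": msr, "MSE": mse, "F": msr / mse}


# R^2 is rounded to 0.948, as in the solution; n = 12 months, k = 3 predictors.
result = anova_from_r2(sst=625.667, r2=0.948, n=12, k=3)
print({name: round(value, 3) for name, value in result.items()})
```

Rounded to three decimals, these entries match the completed table: SSR = 593.132, SSE = 32.535, MSR = 197.711, MSE = 4.067 and F = 48.615.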

2. Write the regression equation. Which predictor should be removed from the equation?
SOLUTION
Predictors   Coefficients   S.E. of coefficients   t
Constant     -29.743        12.903
X1           1.104          0.283                  3.901
X2           1.106          0.205                  5.395
X3           -0.169         0.198                  -0.854

From the table, we can set up the regression equation as follows:
$$\hat{Y} = -29.743 + 1.104 X_1 + 1.106 X_2 - 0.169 X_3$$
To test whether the variables of the regression model are significant, we have to
conduct the t test of the individual regression parameters.



Our null and alternative hypotheses for each variable:
$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$
$$H_0: \beta_2 = 0 \qquad H_1: \beta_2 \neq 0$$
$$H_0: \beta_3 = 0 \qquad H_1: \beta_3 \neq 0$$

Based on the table of coefficients, we can compute the test statistic value of each
variable as follows:
$$t_T^{X_1} = \frac{b_1 - 0}{s(b_1)} = \frac{1.104}{0.283} = 3.901$$
$$t_T^{X_2} = \frac{b_2 - 0}{s(b_2)} = \frac{1.106}{0.205} = 5.395$$
$$t_T^{X_3} = \frac{b_3 - 0}{s(b_3)} = \frac{-0.169}{0.198} = -0.854$$

$df = n - (k + 1) = 12 - (3 + 1) = 8$
$\alpha = 0.05$, $\alpha/2 = 0.05/2 = 0.025$
The critical value: $t_C = t_{df,\ \alpha/2} = t_{8,\ 0.025} = 2.306$

Thus, at the 0.05 level of significance, for $X_1$ and $X_2$ we can reject $H_0$
because $t_T^{X_1}$ and $t_T^{X_2}$ are larger than $t_C$. On the other hand, we cannot reject the
null hypothesis for $X_3$ since $|t_T^{X_3}| < t_C$. Based on the hypothesis tests,
we have enough evidence to conclude that the variables $X_1$ and $X_2$ are significant.
However, $X_3$ is not significant and should be removed from the regression equation,
and the multiple regression model should then be re-estimated with the remaining variables.
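The three t tests can be reproduced with a short Python sketch. The variable names are illustrative; the coefficients and standard errors come from the table of coefficients, and the critical value 2.306 is the table value $t_{8,\ 0.025}$ quoted in the solution.

```python
# Coefficient and standard error for each predictor, from the coefficients table.
estimates = {"X1": (1.104, 0.283), "X2": (1.106, 0.205), "X3": (-0.169, 0.198)}
t_critical = 2.306  # t(8, 0.025), read from a t table as in the solution

# Test statistic for H0: beta_i = 0 is the coefficient over its standard error.
t_values = {name: b / se for name, (b, se) in estimates.items()}

# A predictor is not significant when its statistic lies within [-t_C, t_C].
not_significant = [name for name, t in t_values.items() if abs(t) <= t_critical]

for name, t in t_values.items():
    print(name, round(t, 3))
print("Candidates for removal:", not_significant)
```

Only X3 falls inside the non-rejection region, which agrees with the conclusion that it is the predictor to drop before refitting the model.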
