TABLE OF CONTENTS
Lecture 1
Descriptive Statistics
(Including Stem &
Leaf Plots, Box Plots,
Regression Example)
Stem & Leaf Display
Descriptive Statistics: Means,
Median, Std.Dev., IQR
Box Plot
Regression
Lecture 2
Central Limit
Theorem & CIs
Statement of Theorem
Simulations
Practical Issues & Examples
Tail Probabilities & Z-values
Z-Value Notation
Picture of CLT
Everything There Is to Know
Summary & 3 Types of CIs
Glossary
Lecture 3
CIs & Introduction to
Hypothesis Testing
Examples of Two Main Types
of CIs
Hypothesis Testing
Type I & Type II Error
Pictures of the Right and Left
Tail P-Values
Big Picture Recap
Glossary
Lecture 4
One- & Two-Tailed
Tests, Tests on
Proportion, & Two
Sample Test
Purposes
The Three Assumptions,
Terminology, & Notation
Modeling Cost in Terms of
Units
Estimation & Interpretation of
Coefficients
Decomposition of SS(Total)
Lecture 5
More Tests on Means
from Two Samples
Lecture 6
Simple Linear
Regression
How to Do Regression in
Minitab
How to Do Regression in Excel
Lecture 6 Addendum
Terminology, Examples
&Notation
Synonym Groups
Main Ideas
Examples of Correlation
Notation for Types of Variation
and R2
Lecture 7
Inferences About
Regression Coefficients
& Confidence/Prediction
Intervals for Y /Y
Modeling Home Prices Using
Rents
Regression
Output
Two Basic Tests
Test for Lack-of-Fit
Test on Coefficients
Prediction Intervals & Confidence
Intervals
How to Generate These Intervals
in Minitab 17
Lecture 8
Introduction to Multiple
Regression
Application to Predicting Product
Share (Super Bowl Broadcast)
3-D Scatterplot
Regression Output
Sequential Sums of Squares
Squared Coefficient t-Ratio
Measures Marginal Value
Discussion Questions on
Interpreting Output
Lecture 9
More Multiple
Regression Examples
Lecture 10
Strategies for Finding the
Best Model
Stepwise Approach
Best Subsets Approach
Procedure for Finding Best Model
Studying Successful Products (TV
Shows)
Best Subsets Output
Stepwise Options
Stepwise Output
Best Predictive Model
Regression on All Candidate
Predictors to Find Redundant
Predictors
Other Criteria for Selecting Models
Discoverers
Lecture 11
1-Way Analysis of
Variance (ANOVA) as a
Multiple Regression
Analysis of Covariance
Lecture 12
Chi-Square Tests for
Goodness-of-Fit &
Independence
Goodness-of-Fit Test Using Minitab
Test for Independence Using Minitab
Lecture 13
Executive Summary &
Notes for Final Exam,
Outline of the Course &
Review Questions
The Outlines
Tests Concerning Means
and Proportions &
Outline of Methods for
Regression
2.1, 2.9, 3.2, 3.3, 3.5, 4.6, 4.8, 5.5, 7.9, 50.
Stem and Leaf (with trimming)
Units: 0.10 (Thousand $)
  2 | 19
  3 | 235
  4 | 68
  5 | 5
  6 |
  7 | 9
  High: 500

MINITAB's version (this is an option in the Graph menu, or you can give the commands shown) produces the same display, with trimming:

MTB > Stem c1

Stem-and-Leaf Display
Leaf Unit = 1.0
 (9)  0 | 223334457
  1   1 |
  1   2 |
  1   3 |
  1   4 |
  1   5 | 0
Page 1 of 156
Units: $10
  3   2 | 348
  4   3 | 5
 (2)  4 | 01
  5   5 | 30
  3   6 |
  3   7 | 0
  2   8 |
  2   9 | 01
Descriptive Statistics

Variable   N    Mean   Median  TrMean  StDev   SE Mean
Salaries   10   8.78   4.05    4.46    14.58   4.61

Variable   Minimum  Maximum  Q1     Q3
Salaries   2.10     50.00    3.12   6.10
Page 2 of 156
Note how the median is a much better measure of a typical central value in this case.
Recall how the standard deviation is calculated.
First the sample variance is calculated:
s^2 = [Sum of (x_i - x-bar)^2]/(n - 1), and then s = (s^2)^(1/2).
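As a quick cross-check of these summary measures, here is a minimal Python sketch (added for illustration; it is not part of the original notes). It computes the mean, median, sample standard deviation, and IQR for the ten salaries listed above; note that numpy's default quartile interpolation differs slightly from Minitab's.

# Descriptive statistics for the ten salaries (in $ thousands) above.
import numpy as np

salaries = np.array([2.1, 2.9, 3.2, 3.3, 3.5, 4.6, 4.8, 5.5, 7.9, 50.0])

mean = salaries.mean()                      # 8.78
median = np.median(salaries)                # 4.05
s = salaries.std(ddof=1)                    # sample std. dev. (divide by n-1), about 14.6
q1, q3 = np.percentile(salaries, [25, 75])  # quartiles (interpolation differs slightly from Minitab)

print(f"mean={mean:.2f}, median={median:.2f}, s={s:.2f}, IQR={q3 - q1:.2f}")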
The Boxplot
Elements: Q1, the median, and Q3 are represented as a box, and 2 sets of fences
(inner and outer) are graphed below Q1 and above Q3: the inner fences at 1.5 IQR and the outer fences at 3 IQR beyond the quartiles.
The figures on pages 122-125 (in our text by Bowerman et al.) provide good
illustrations.
Page 3 of 156
Inner Fences
Page 4 of 156
[Boxplot of Salaries; vertical axis roughly 10 to 40.]
Page 5 of 156
Page 6 of 156
Page 7 of 156
[Time-series plot of the data: Google_Return, S&P_Return, and Standard_Normal, through 22-Apr-16.]
(Recall what the Standard Normal distribution looks like, e.g. http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg .)
MTB > describe c3 c5 c6
(Or, to do the same analysis from the menus: start from the Stat menu, go to Basic Statistics and then to Display
Descriptive Statistics, then in the dialog box select c3, c5, and c6 as the variables.)
Variable          N   N*  Mean    SE Mean  StDev  Minimum  Q1      Median  Q3     Maximum
Google_Return     29  1   -0.207  0.260    1.401  -5.414   -0.772  -0.086  0.771  1.674
S&P_Return        29  1    0.026  0.116    0.624  -1.198   -0.414   0.017  0.534  1.051
Standard_Normal   30  0   -0.134  0.170    0.931  -1.778   -0.813  -0.184  0.598  1.871
Page 8 of 156
In contrast to the boxplots on the previous page, many business distributions are
positively skewed. For example, here is a comparison of the revenue distributions
for the largest firms in three health care industries.
Boxplot of 2014 Revenues in Three Health Care Industries
(for Firms That Are Among the Largest 1000 in U.S.)
[Boxplots of Revenue (Billions), roughly 0 to 120, by industry, including Insurance & Managed Care (12 Firms) and Medical Facilities (13 Firms); labeled points include United_Health_Group, Express_Scripts_Holding, and HCA_Holdings.]
Page 9 of 156
10
Page 10 of 156
11
[Fitted line plot of Google_Return versus S&P_Return; S = 1.29557, R-Sq = 17.5%, R-Sq(adj) = 14.5%.]
Assume you have a "large" sample size n, and that you find the sample mean,
x-bar, as the average of n observations, each of which is from a parent
distribution with mean mu and standard deviation sigma. Then x-bar is approximately normal with mean mu and standard deviation sigma/sqrt(n).
By the Central Limit Theorem: if I sample 100 customers, then the proportion who
complain (p-hat) will have a normal distribution (approximately), with the same mean
(0.1) but a much smaller standard deviation of [p(1-p)/n]^(1/2) = [0.1(0.9)/100]^(1/2) = 0.03.
[Histogram of 1000 simulated sample proportions (frequency versus p-hat, 0.03 to 0.21): Mean = 0.09962, StDev = 0.02918. Note: this is approximately 0.03.]

[Histogram of 1000 simulated sample means (frequency versus x-bar, 0.8 to 6.4): Mean = 3.539, StDev = 1.184. Note: this is approximately 1.2.]
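The first of the two histograms above can be reproduced with a short simulation. This is a sketch added for illustration (not part of the original notes), assuming the histogram came from 1000 samples of n = 100 Bernoulli(p = 0.1) observations:

# Simulate the sampling distribution of a sample proportion (CLT illustration).
import numpy as np

rng = np.random.default_rng(1)
p, n, reps = 0.10, 100, 1000

# Each row is one sample of 100 customers; the row mean is the sample proportion p-hat.
p_hat = rng.binomial(1, p, size=(reps, n)).mean(axis=1)

print(p_hat.mean())              # close to 0.10
print(p_hat.std(ddof=1))         # close to sqrt(0.1*0.9/100) = 0.03
print(np.sqrt(p * (1 - p) / n))  # the CLT standard deviation, 0.03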
Page 14 of 156
Variable     N   N*  Mean   SE Mean  Median  StDev  Minimum  Q1    Q3     Maximum
S&P_Return   40  0   13.41  2.62     15.75   16.57  -36.55   4.99  27.74  37.20
Bernoulli Distribution: mu = p, sigma^2 = p(1-p).
Consequently:
sigma/sqrt(n) = [p(1-p)/n]^(1/2)
[Standard normal density with upper-tail areas 0.10, 0.05, and 0.025 marked; horizontal axis: Value of Normal Random Variable.]
Page 16 of 156
Z-Value Notation
z_alpha is used to represent the standard normal value above which there is a tail
probability of alpha.
Verify that z_0.10 = 1.28, z_0.05 = 1.645, and that z_0.025 = 1.96. (Use a normal table, e.g.,
http://www2.owen.vanderbilt.edu/bruce.cooil/cumulative_standard_normal.pdf.)
To verify that z_0.10 = 1.28: the tail probability above 1.28 is 0.10, so the cumulative probability below it is 0.90.
Verify that z_0.025 = 1.96: the tail probability above 1.96 is 0.025.
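If you prefer software to the printed table, the same z-values can be checked with scipy. This is a small sketch added for convenience; it is not part of the original notes.

# Verify z_0.10, z_0.05 and z_0.025 (upper-tail critical values of the standard normal).
from scipy.stats import norm

for alpha in (0.10, 0.05, 0.025):
    # norm.ppf returns the value with cumulative probability 1 - alpha below it.
    print(alpha, round(norm.ppf(1 - alpha), 3))   # 1.282, 1.645, 1.960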
Page 17 of 156
Cumulative Standard Normal Probabilities (table entry = P(Z <= z))

 z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0  .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
0.1  .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
0.2  .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
0.3  .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
0.4  .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
0.5  .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
0.6  .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
0.7  .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
0.8  .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
0.9  .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
1.0  .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
1.1  .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
1.2  .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
1.3  .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
1.4  .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
1.5  .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
1.6  .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
1.7  .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
1.8  .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
1.9  .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
2.0  .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
2.1  .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
2.2  .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
2.3  .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
2.4  .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
2.5  .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
2.6  .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
2.7  .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
2.8  .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
2.9  .9981  .9982  .9982  .9983  .9984  .9984  .9985  .9985  .9986  .9986
3.0  .9987  .9987  .9987  .9988  .9988  .9989  .9989  .9989  .9990  .9990
Page 18 of 156
Acknowledgment: This picture of the Central Limit Theorem is based on a much prettier graph made for this course by Tim Keiningham,
Global Chief Strategy Officer and Executive Vice President, Ipsos Loyalty (also a student in an earlier version of this course).
Page 19 of 156
2) Confidence interval for a proportion p:
   p-hat +/- z_(alpha/2) [ p-hat (1 - p-hat)/n ]^(1/2).
   This assumes n is large enough so that n*p-hat >= 5 (successes) and n*(1 - p-hat) >= 5 (failures).
3) Small-sample confidence interval for a mean mu:
   x-bar +/- t_(alpha/2) (s/sqrt(n)).
   This is the same as confidence interval (1), except that a t-value is now used in place of the z-value.

Example 2: n = 100 customers, p-hat = 0.2.
Find a 90% C.I. for p:  0.2 +/- (1.645)[.2(.8)/100]^(1/2)
Find an 80% C.I. for p: 0.2 +/- 1.28(0.04)
Example 3: Suppose we consider only the last 16 changes in the S&P:
n = 16, x-bar = 6.93, s/sqrt(n) = 4.69. Must use a t-value because n < 30.
Find a 99% C.I. for mu:
10
Glossary
Reference: Chapter 5 (pp. 188, 190) versus Chapter 3 (pp. 100, 110)
The Mean of a Distribution:
mu = E(X) = Sum of x*P(x).   (1)
The mean of a distribution (or a random variable X) is simply the weighted average of its
realizable outcomes, where each realizable value is weighted by its probability, P(x).
Contrast this definition with the definition of a sample mean:
x-bar = Sum of x*(1/n).   (2)
The only difference is that (1/n), the frequency with which each observation occurs in the
sample, replaces P(x) in equation (1).
The Variance of a Distribution:
sigma^2 = E[(X - mu)^2] = Sum of (x - mu)^2 * P(x).   (3)
Contrast this with the sample variance:
s^2 = [Sum of (x_i - x-bar)^2]/(n - 1).   (4)
_________________________________________________________________
ANSWERS to Examples (on Bottom of Previous Page)
Example 1
1) 90% CI:
2) 95% CI:
Example 2
1) 90% CI (alpha/2 = 0.05): p-hat +/- 1.645 [p-hat(1 - p-hat)/n]^(1/2)
2) 80% CI:
x-bar +/- z_(alpha/2) (s/sqrt(n))
Lecture 3
Confidence Intervals for Means and Proportions
& Introduction to Hypothesis Testing (Large Sample Mean)
Outline (Ch.9: 9.1-9.2)
Recap of C.I.s for Means and Proportions
One-tailed tests on sample mean
What is type I error? Type II error? Power?
Everything to Know About the Test Statistic and P-value
Recap of Confidence Intervals
Example 1
I have just designed a new type of mid-size car with a hybrid engine.
To determine its average fuel efficiency (mpg), I sample 30 mile per
gallon measurements from 30 different cars (city driving).
MTB > print c1
MPG (sorted)
70.4  70.5  70.8  71.2  72.5  73.5  75.1  77.0  77.2  77.4
77.9  78.3  80.3  80.9  81.1  81.4  84.2  84.2  84.3  85.4
85.6  86.3  86.3  86.7  89.3  89.7  89.9  90.6  91.0  92.1
Stem-and-leaf of MPG   N = 30
Leaf Unit = 1.0
  4   7 | 0001
  6   7 | 23
  7   7 | 5
 11   7 | 7777
 12   7 | 8
 (4)  8 | 0011
 14   8 |
 14   8 | 44455
  9   8 | 666
  6   8 | 999
  3   9 | 01
  1   9 | 2

[Boxplot of MPG; vertical axis 70 to 95.]
Variable   N   Mean   Median  TrMean  StDev  SE Mean
MPG        30  81.37  81.25   81.43   6.80   1.24

Variable   Minimum  Maximum  Q1     Q3
MPG        70.40    92.10    76.53  86.40
Find a 95% confidence interval for the real mean mpg (mu) and
interpret it.
C.I.: x-bar +/- z_(0.025)(s/sqrt(n)) = 81.37 +/- 1.96(1.24)
Note: for the corresponding large-sample interval on a proportion,
C.I.: p-hat +/- z_(alpha/2)[p-hat(1 - p-hat)/n]^(1/2),
we need n*p-hat >= 5 (successes) & n*(1 - p-hat) >= 5 (failures).
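A minimal Python sketch of this interval, added for illustration (the summary numbers are the ones in the output above):

# 95% confidence interval for the mean MPG, from the summary statistics above.
from scipy.stats import norm, t

n, xbar, s = 30, 81.37, 6.80
se = s / n ** 0.5                      # about 1.24 (the "SE Mean" in the output)

z = norm.ppf(0.975)                    # 1.96 for the large-sample (z) interval
print(f"z interval: ({xbar - z*se:.2f}, {xbar + z*se:.2f})")

tval = t.ppf(0.975, df=n - 1)          # 2.045; the t interval is slightly wider
print(f"t interval: ({xbar - tval*se:.2f}, {xbar + tval*se:.2f})")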
Page 24 of 156
Hypothesis Testing
Reconsider the new hybrid car example (example 1). Suppose that I want
to show that my new car has an average mpg () that is better than that of
the best performing competitor, for which the average mpg is 78. Formally, I want
to "disprove" a null hypothesis
H0: mu = 78 (or sometimes written as mu <= 78)
in favor of the alternative hypothesis:
H1: mu > 78.
Note that: n = 30, x-bar = 81.37, s = 6.8, s/sqrt(n) = 1.24. (For n < 30, the procedure is
identical except when we find the critical value. That case will also be discussed.)
To build a case for H1, I follow 3 logical steps (typical of all hypothesis testing).
1) Assume H0 is true.
2) Compute the test statistic z = (x-bar - 78)/(s/sqrt(n)).
3) Ask whether the observed value of z would be improbable if H0 were true.
Regarding step 3, if H0 is true, I would see values of z greater than z0.05 = 1.645
only 5% of the time. This seems improbable and it supports H1 and so a reasonable
decision rule is to: reject H0 in favor of H1 if z is greater than 1.645. This assumes
that I am willing to make a mistake 5% of the time.
Page 25 of 156
In this sample,
z = [x-bar - 78]/(s/sqrt(n)) = [81.37 - 78]/1.24 = 2.72 > 1.645 = z_0.05.
Therefore, I reject H0 in favor of H1.
SUMMARY: to test H0: mu = 78 versus H1: mu > 78
we use the decision rule: reject H0 if
z = [x-bar - 78]/(s/sqrt(n)) > z_alpha,
or equivalently if:
x-bar > 78 + z_alpha (s/sqrt(n)).
(For a small sample we would use a t-value with n - 1 degrees of freedom in place of z_alpha;
for n = 30, i.e. 29 df, the values corresponding to alpha = 0.001, 0.01, 0.025, 0.05, 0.10, and 0.20
are t = 3.4, 2.5, 2.0, 1.7, 1.3, or 0.85.)
Suppose that I had chosen alpha = 0.001. Then, since z_0.001 = 3.09 and z = 2.72,
I would accept H0 because z = 2.72 is not greater than z_0.001 = 3.09. In this case, I would be
concerned that I made a type II error. Type II error refers to the case where
the null hypothesis H0 is really false but I fail to reject it! The following
figure summarizes the situation with type I and II errors.
Page 26 of 156
DECISION      H0 IS TRUE         H1 IS TRUE
REJECT H0     Type I Error       Correct Decision
ACCEPT H0     Correct Decision   Type II Error

Note that to make a decision on whether to reject or accept H0: mu = 78, we simply
need to compare the test statistic z = [x-bar - 78]/(s/sqrt(n)) with an appropriate normal
value, z_alpha, that corresponds to the significance level alpha that is chosen beforehand. If
z > z_alpha, we reject H0 (otherwise accept H0).
[Distribution of the Test Statistic (Z) When H0 Is True. The P-value is the probability to the right of z. Marked on the axis: z_0.05 = 1.645 (reject H0: YES), the observed z = 2.72, and z_0.001 = 3.09 (reject H0: NO!).]
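The same test can be written out in a few lines of Python. This is a sketch added for illustration; the decision rule is the one stated above.

# One-sided z test of H0: mu = 78 versus H1: mu > 78 for the hybrid-car MPG example.
from scipy.stats import norm

n, xbar, s, mu0, alpha = 30, 81.37, 6.8, 78, 0.05
z = (xbar - mu0) / (s / n ** 0.5)          # 2.72
z_crit = norm.ppf(1 - alpha)               # 1.645
p_value = 1 - norm.cdf(z)                  # about 0.0033

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if z > z_crit else "Accept H0")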
Page 27 of 156
                                   Example 1              Example 2
H1:                                mu > 78                mu < 80
alpha:                             0.05                   0.10
Test statistic z:                  2.72                   1.10
Reject H0 (in favor of H1) if:     z > z_0.05 = 1.645     z < -z_0.10 = -1.28
Page 28 of 156
[Two standard normal density pictures for the test statistic z = 1.10 (cumulative probability 0.8643); in both cases the decision is NO (do not reject H0).]
1) H0: mu = mu0      2) H0: mu = mu0      3) H0: mu = mu0
   H1: mu > mu0         H1: mu < mu0         H1: mu not equal to mu0 ("no other way")
Reject H0 if:
   z > z_alpha          z < -z_alpha         |z| > z_(alpha/2)
(Note that z is the test statistic; the p-value is the tail probability beyond z.)
                     Example 1 (see bottom p.6)   Example 2            Example 3
Null Hypothesis      H0: mu = 78                  H0: mu = 80          H0: mu = 80
Alternative          H1: mu > 78                  H1: mu < 80          H1: mu not equal to 80
Significance Level   alpha = 0.05                 alpha = 0.10         alpha = 0.10
Test Statistic       z = 2.72                     z = 1.10             |z| = 1.10
P-Value              0.0033                       0.86                 0.27
Critical Value       z_0.05 = 1.645               -z_0.10 = -1.28      z_(0.10/2) = z_0.05 = 1.645
Decision             Reject H0                    Accept H0            Accept H0 because |1.10| < 1.645

[Picture of each test statistic and p-value (shaded area) relative to the standard normal distribution.]
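The three p-values in the table above can be reproduced directly. This is a sketch added for illustration, not part of the original notes.

# P-values for the right-tailed, left-tailed, and two-tailed z tests above.
from scipy.stats import norm

print(1 - norm.cdf(2.72))          # Example 1, H1: mu > 78  -> about 0.0033
print(norm.cdf(1.10))              # Example 2, H1: mu < 80  -> about 0.8643
print(2 * (1 - norm.cdf(1.10)))    # Example 3, H1: mu != 80 -> about 0.27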
Page 30 of 156
Glossary
alpha = significance level = maximum probability of making a type I error.
p-value = tail probability that corresponds to the test statistic, calculated for the
specific alternative hypothesis H1.
beta = probability of making a type II error (not rejecting H0 when H1 is true).
Power = 1 - beta = probability of making the correct decision when H1 is true.
beta decreases as the significance level alpha increases (ceteris paribus).
beta decreases as the sample size n increases (ceteris paribus).
Lecture 4
One and Two-Tailed Tests, Tests on a Sample
Proportion, & Introduction to Tests on Two Samples
Main References
(1) Ch.9: 9.3-9.4, Summary, Glossary, App. 9.3;
Ch.10: 10.1
(2) The Outline "Tests on Means and
Proportions" (referred to as "The Outline")
Topics
I. Tests on Means and Proportions from One Sample (Reference:
9.3-9.4)
Example of a two-tailed test (Case 1)
When to use t-values (Case 2)
Tests on a sample proportion (Case 3)
II. Tests on Means from Two Samples (Ref: 10.1)
Tests on means from two large samples (Case 4)
Tests on means when it is appropriate to assume variances are
equal (Case 5)
Reject H0 if z = [x-bar - 78]/(s/sqrt(n)) > z_alpha,
or equivalently if x-bar > 78 + z_alpha (s/sqrt(n)). Otherwise we
accept H0.
Page 32 of 156
(Example: H0: mu = 80 versus H1: mu < 80.)
Reject H0 if
z = (x-bar - mu0)/(s/sqrt(n)) < -z_alpha
(or equivalently if: x-bar < mu0 - z_alpha (s/sqrt(n)) ).
[Note that this is just Case 1 in the outline: mu0 refers to the constant used
in the null hypothesis, which is "80" in this last case.]
Page 33 of 156
( )
..
.(.)
.
.
= . > . = .
(. )
(. )
= .
df = (s1^2/n1 + s2^2/n2)^2 / [ (s1^2/n1)^2/(n1 - 1) + (s2^2/n2)^2/(n2 - 1) ]
   (here with the pieces 0.7/100 and 0.5/50, and n1 - 1 = 100 - 1, n2 - 1 = 50 - 1)
   = 133.
So for alpha = 0.1 and alpha = 0.01, the critical values are:
t(133)_0.1 = 1.29 and t(133)_0.01 = 2.35, respectively, and
the conclusions are the same in each case!
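For readers who want to evaluate this degrees-of-freedom formula by machine, here is a small helper added for illustration. The inputs shown are stand-ins built from the legible fragments above (s1^2/n1 = 0.007, s2^2/n2 = 0.010), so the printed value need not match the 133 degrees of freedom quoted in the notes.

# Approximate (Welch) degrees of freedom for a two-sample t test with unequal variances.
def welch_df(s1_sq_over_n1: float, s2_sq_over_n2: float, n1: int, n2: int) -> float:
    a, b = s1_sq_over_n1, s2_sq_over_n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Illustrative inputs only (not the verified course data).
print(round(welch_df(0.007, 0.010, 100, 50)))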
Case 5: What If We Are Willing To Assume Equal
Variances?
Example : I'm comparing weekly returns on the same stock
over two different periods. The average sample return is larger
during period 2. Can one show that the return during period 2
is significantly higher than during period 1 at the 0.01 level?
The data are: n1 = 21,
= . %, = .
n2 = 11,
= . %, = .
What are the appropriate hypotheses?
H0: mu1 - mu2 = 0
H1: mu1 < mu2, i.e., mu1 - mu2 < 0.
(Just like Case 4, with s_p used in place of s1 and s2.)
For the two-sided alternative H1: mu1 - mu2 not equal to 0, the critical value is 2.75.
Page 37 of 156
z = (x-bar - mu)/(sigma/sqrt(n))
has approximately a standard normal distribution. We have been using this single
result to justify the construction of confidence intervals and hypothesis tests.
When using this result, we have generally been approximating by substituting
the sample standard deviation, s, for it. If the sample is large enough, this
doesn't impose much additional error. But when samples are smaller (e.g., n < 30),
the convention is to accommodate the additional error (caused when using s for )
by using the fact that if the original distribution was normal, then the t-statistic,
t = (x-bar - mu)/(s/sqrt(n))
really has what is referred to as a t-distribution with n-1 degrees of freedom. The
degrees of freedom number, n-1, refers to the amount of information that the
sample standard deviation, s, contains about the true standard deviation . If we
have only 1 observation, we have no information about (n-1= 1-1 = 0), if we
have 2 observations we have essentially 1 piece of information about , and so on.
This is the reason we divide by the degrees of freedom, n-1, when calculating s:
s = [Sum of (x_i - x-bar)^2 / (n - 1)]^(1/2).
The real question becomes: why should we use the t-distribution when it relies
on the strong assumption that the original distribution is normal, which is
exactly the type of assumption we were trying to avoid by using the Central Limit
Theorem?! The answer is essentially this: by using t-values in place of z-values
we are doing something that accommodates the additional inaccuracy we generate
by using s to estimate , and in practice it works quite well even when the parent
distribution is not normal! Of course, t-values converge to z-values as the sample
size increases: see the t-table.
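To see that convergence numerically, here is a two-line check added for illustration (not part of the original notes):

# Upper 2.5% critical values: t approaches z = 1.96 as the degrees of freedom grow.
from scipy.stats import norm, t

for df in (5, 15, 30, 120):
    print(df, round(t.ppf(0.975, df), 3))   # 2.571, 2.131, 2.042, 1.980
print("z:", round(norm.ppf(0.975), 3))      # 1.960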
Page 38 of 156
             Days Inn   Super 8 Motel
Loans (N)    390        456
Failures*    23         18
Is there a higher failure rate for SBA loans to Days Inn than for
Super 8 Motel at the 0.05 level?
H0: p1 - p2 = 0 (or <= 0)
H1: p1 - p2 > 0,  D0 = 0

p-hat1 = 23/390 = 0.0590 ;   p-hat2 = 18/456 = 0.0395.

z = [p-hat1 - p-hat2 - 0] / sqrt( p-hat1(1 - p-hat1)/n1 + p-hat2(1 - p-hat2)/n2 )
  = [0.0590 - 0.0395 - 0] / sqrt( 0.0590(0.9410)/390 + 0.0395(0.9605)/456 )
  = 0.0195/0.0151 = 1.30.
Page 40 of 156
If we pool: p-hat = (23 + 18)/(390 + 456) = 0.0485, and

z = [0.0590 - 0.0395 - 0] / sqrt( 0.0485(0.9515)/390 + 0.0485(0.9515)/456 )
  = 0.0195/0.0148 = 1.32.
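Both versions of the statistic can be reproduced with a few lines of Python. This is a sketch added here; the counts are the ones in the table above.

# Two-sample test on proportions: SBA loan failures for Days Inn (1) vs Super 8 (2).
from math import sqrt

x1, n1 = 23, 390
x2, n2 = 18, 456
p1, p2 = x1 / n1, x2 / n2                       # 0.0590 and 0.0395

# Unpooled standard error (first calculation above):
se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
print((p1 - p2) / se_unpooled)                  # about 1.30

# Pooled standard error (second calculation above):
p_pool = (x1 + x2) / (n1 + n2)                  # 0.0485
se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
print((p1 - p2) / se_pooled)                    # about 1.32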
Case 1
Case 2
Cases 4 & 5
Case 7
Case 3
Case 6
Page 41 of 156
D0 = 0; the two-tailed probability is 0.194, so the one-tailed probability is 0.194/2 = 0.097.

Sample   X    N    Sample p
1        23   390  0.058974
2        18   456  0.039474

[Standard normal density with the test statistic marked.]
Page 42 of 156
If We Pool: Test and CI for Two Proportions

Sample   X    N    Sample p
1        23   390  0.058974
2        18   456  0.039474
Page 43 of 156
Odds Ratio = [p-hat1/(1 - p-hat1)] / [p-hat2/(1 - p-hat2)] = 1.5,   (1)
indicating that the odds of failure are about 1.5 times higher for the Days Inn
franchises. (To turn this into a health care example: imagine
companies are people, and that failure is a disease to which certain
people are more susceptible.)
Alternatively, since this is a prospective study, sometimes the results
are summarized in terms of the relative risk of failure (for small versus
large), which is simply the ratio of p-hat1 to p-hat2:
Relative Risk = 0.0590/0.0395 = 1.5,   (2)
indicating that failure is about 1.5 times more likely for the Days Inn. Of
course, remember that p-hat1 & p-hat2 are not good estimates, which is a
common problem in health/medical applications. Also, p-hat1 is not even
significantly larger than p-hat2 at the 0.05 level!
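A sketch of the two summary measures in Python, added for illustration:

# Odds ratio and relative risk for the Days Inn (1) vs Super 8 (2) failure rates.
p1, p2 = 23 / 390, 18 / 456

odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))   # about 1.53
relative_risk = p1 / p2                          # about 1.49

print(round(odds_ratio, 2), round(relative_risk, 2))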
Page 44 of 156
0___1/3____1_____3____
Finally, whenever p-hat1 > p-hat2, the odds ratio will be greater than the
relative risk:
[p-hat1/(1 - p-hat1)] / [p-hat2/(1 - p-hat2)] = (p-hat1/p-hat2) * [(1 - p-hat2)/(1 - p-hat1)] > (p-hat1/p-hat2).
Annualized Return
1999-2009
18.8%
-9.5%
-2.6%
Page 45 of 156
Year   CGM Focus Fund   Fidelity Growth Strategies Fund   Difference (CGM - Fidelity)
2010    16.94            25.63                             -8.69
2011   -26.29            -8.95                            -17.34
2012    14.23            11.78                              2.45
2013    37.61            37.87                             -0.26
2014     1.39            13.69                            -12.30
2015    -4.11             3.17                             -7.28
Mean     6.63            13.87                             -7.24

(The sign of the mean difference makes it clear we cannot reject H0 -- Fidelity
outperforms CGM! But we formally apply the test anyway, as an illustration.)
d-bar = -7.24 ;  s_d = 7.38.
The standard error of the mean difference is: s_d/sqrt(n) = 7.38/sqrt(6) = 3.01.
Test statistic: t = (d-bar - 0)/(s_d/sqrt(n)) = -7.24/3.01 = -2.40.
Critical value: t(5 df, 0.025) = 2.57
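The same paired comparison can be run directly from the two return columns. This sketch is added for illustration; scipy's ttest_rel reports a two-sided p-value.

# Paired t test: CGM Focus vs Fidelity Growth Strategies annual returns, 2010-2015.
import numpy as np
from scipy.stats import ttest_rel

cgm = np.array([16.94, -26.29, 14.23, 37.61, 1.39, -4.11])
fidelity = np.array([25.63, -8.95, 11.78, 37.87, 13.69, 3.17])

d = cgm - fidelity
print(d.mean(), d.std(ddof=1))        # about -7.24 and 7.38

t_stat, p_two_sided = ttest_rel(cgm, fidelity)
print(t_stat, p_two_sided)            # t is about -2.4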
1 Sample:
   Mean -- Large Sample (Case 1), Small Sample (Case 2)
   Proportion -- (Case 3)
2 Samples:
   Means -- Large Samples (Case 4), Equal Variances (Case 5), Paired Samples (Case 6)
   Proportions -- (Case 7)
Page 47 of 156
How?
Decomposition of SS(Total) = Sum over i of (y_i - y-bar)^2
(See third equation on page 492:
SS(Total) is referred to there as Total Variation.)
Measures of fit: MS(Error) (the variance of error), R2(Adjusted)

Purposes
1.
2. To predict Y.
3.
4.
Number of Units (x)   Cost (y) ($1000)   Predicted Cost (or fit) y-hat   Residual (y - y-hat)
1                     6                  5                               1
3                     14                 11                              3
4                     10                 14                              -4
5                     14                 17                              -3
7                     26                 23                              3
Page 50 of 156
In the plot above, the open circles are the actual observations of
y & x (cost & units), and the solid circles are the values of y^ & x
(predicted, or fitted cost, & units). The vertical distances
between open circles and solid circles represent the observed
errors or residuals of the regression model. The estimated
regression line is:
y-hat = b0 + b1 x   (3)
      = 2 + 3x.
SS(Error) = Sum over i of (y_i - y-hat_i)^2
          = 1^2 + 3^2 + (-4)^2 + (-3)^2 + 3^2 = 44.
This is the smallest value of the sum of squared errors
obtainable among all possible choices of b0 and b1 (that is what "least squares" means).
Please interpret these coefficients.
b0: predicted (or average) value of Y when X=0
(in this application it is the fixed cost) ;
b1: average change in Y per unit change in X
(in this application it is the variable cost).
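For readers who want to reproduce these estimates outside Minitab or Excel, here is a minimal sketch using statsmodels (added for illustration; it is not part of the original notes):

# Least-squares fit of Cost(y) on Units(x) for the five observations above.
import numpy as np
import statsmodels.api as sm

x = np.array([1, 3, 4, 5, 7])
y = np.array([6, 14, 10, 14, 26])

model = sm.OLS(y, sm.add_constant(x)).fit()
print(model.params)          # intercept b0 = 2, slope b1 = 3
print(model.ssr)             # SS(Error) = 44
print(model.rsquared)        # about 0.80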
Page 51 of 156
Decomposition of SS(Total)
SS(Total) = Sum over i of (y_i - y-bar)^2 = 224.
The regression model succeeds in reducing the uncertainty about
y if SS(Error) is significantly less than SS(Total). Also,
regression models actually allow us to decompose SS(Total)
into two parts, SS(Error) and SS(Regression):
SS(Total) = SS(Regression) + SS(Error);
where:
SS(Regression) = Sum over i of (y-hat_i - y-bar)^2.
Units (x)   Cost (y)   (y - y-bar)^2   (y-hat - y-bar)^2   (y - y-hat)^2
1           6          (6-14)^2        (5-14)^2            1^2
3           14         (14-14)^2       (11-14)^2           3^2
4           10         (10-14)^2       (14-14)^2           (-4)^2
5           14         (14-14)^2       (17-14)^2           (-3)^2
7           26         (26-14)^2       (23-14)^2           3^2
TOTALS:                224           = 180               + 44
Name of SS:            SS(Total)    =  SS(Regress.)      + SS(Error)
Minitab Summary: Main Regression Output of Version 17
(See Page 11 for a Comparison with Excel)

Regression Analysis: Cost(y) versus Units(x)
"MS" refers to "Mean Square", which is always the corresponding SS (Sum of Squares) divided by
DF (degrees of freedom): MS = SS/DF.

Analysis of Variance
Source       DF  Adj SS   Adj MS   F-Value  P-Value
Regression    1  180.00   180.00   12.27    0.039
  Units(x)    1  180.00   180.00   12.27    0.039
Error         3   44.00    14.67   <- Variance of Error
Total         4  224.00   (224/4)  <- Variance of Y

Model Summary (Measures of Fit)
S        R-sq    R-sq(adj)  R-sq(pred)
3.82971  80.36%  73.81%     38.11%

Coefficients  (text notation: b, s_b, t = b/s_b)
Term      Coef   SE Coef  T-Value  P-Value  VIF
Constant  2.00   3.83     0.52     0.638
Units(x)  3.000  0.856    3.50     0.039    1.00

Regression Equation
Cost(y) = 2.00 + 3.000 Units(x)
Page 53 of 156
Note that on the line just below the Analysis of Variance table in
the MINITAB output, there are 4 primary measures of fit:
s=3.83, R-sq=80.4%, R-Sq(adj)=73.8%, R-sq(pred)=38.1 .
The first three can be calculated using the information in the
Analysis of Variance table. The standard deviation s
represents the estimated standard deviation of the residuals, or
observed errors, also written as s,
s = [Variance of observed errors]^(1/2)
  = [SS(Error)/(n - [# parameters in model])]^(1/2)
  = [44/(5 - 2)]^(1/2) = [14.67]^(1/2) = 3.83.
[The text calls "s" the "standard error," and writes it as simply s.
See the shaded expressions on page 479.]

R^2 = SS(Regression)/SS(Total) = 1 - SS(Error)/SS(Total).
In this example:
R^2 = 180/224 = 0.8036, i.e. about 80.4%
(OR: R^2 = 1 - 44/224),
where "180," "44," and "224" are all shown in the Analysis of
Variance table.

A better measure of fit is found by adjusting R^2 so that it
estimates the proportion of the variance of y that is explained
by the fitted values from the model. This proportion is referred
to as "R^2(Adjusted),"
R^2(Adjusted) = 1 - MS(Error)/[SS(Total)/(n - 1)].
In this case:
R^2(Adjusted) = 1 - 14.67/[224/(5 - 1)] = 1 - 14.67/56 = 0.738, i.e. about 73.8%.
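The same arithmetic, written out as a short Python sketch (added for illustration):

# Measures of fit from the Analysis of Variance table (Cost vs Units example).
ss_error, ss_total, n, n_params = 44.0, 224.0, 5, 2

s = (ss_error / (n - n_params)) ** 0.5                               # 3.83
r_sq = 1 - ss_error / ss_total                                       # 0.8036
r_sq_adj = 1 - (ss_error / (n - n_params)) / (ss_total / (n - 1))    # 0.738

print(round(s, 2), round(r_sq, 3), round(r_sq_adj, 3))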
Correlation between Cost(y) and Units(x):
            Units(x)
Cost(y)     0.896
            0.039
Cell Contents: Correlation / P-value

r = [Sum of (x_i - x-bar)(y_i - y-bar)/(n - 1)] /
    { [Sum of (x_i - x-bar)^2/(n - 1)]^(1/2) * [Sum of (y_i - y-bar)^2/(n - 1)]^(1/2) }

(See pp. 125-127, 492-495 of the text for more examples and discussion.)
Alternatively:
r = (sign of b1) * [square root of R^2 (from simple regression)].
In the example: r = + sqrt(0.804) = 0.896.
10
Discussion Questions
(The Regression Output Is Redisplayed on the Next Page.)
1. Use the regression equation to predict the cost (Y) when the number of units (X) is 4.
   y-hat = b0 + b1 X
2. What was the actual cost for an order when units = 4? (What is the
   residual or error at that point?)
3.
4.
   s^2 = MS(Error) = 14.67
5.
   = 56
6. Show how R^2(Adjusted) is related to the variance of cost and the variance
   of the residuals.
   R^2(Adjusted) = 1 - MS(Error)/Variance(Cost) = 1 - 14.67/56
7. Show how R^2(unadjusted) is related to the correlation between cost (Y) and units (X).
   R^2 = r^2
11
Appendix
Comparison of MINITAB Output (Versions 14-17) with Excel

MINITAB 17 Output
Regression Analysis: Cost(y) versus Units(x)

Analysis of Variance
Source       DF  Adj SS   Adj MS   F-Value  P-Value
Regression    1  180.00   180.00   12.27    0.039
  Units(x)    1  180.00   180.00   12.27    0.039
Error         3   44.00    14.67
Total         4  224.00

Model Summary   (#4)
S        R-sq    R-sq(adj)  R-sq(pred)
3.82971  80.36%  73.81%     38.11%

Coefficients   (#5)
Term      Coef   SE Coef  T-Value  P-Value  VIF
Constant  2.00   3.83     0.52     0.638
Units(x)  3.000  0.856    3.50     0.039    1.00

Regression Equation
Cost(y) = 2.00 + 3.000 Units(x)

Excel Output
SUMMARY OUTPUT

Regression Statistics
Multiple R         0.896421
R Square           0.803571
Adjusted R Square  0.738095
Standard Error     3.829708
Observations       5

ANOVA
             df  SS    MS        F         Significance F
Regression    1  180   180       12.27273  0.039389
Residual      3   44   14.66667
Total         4  224

              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept     2             3.829708        0.522233  0.637618  -10.18784  14.18784   -10.18784    14.18784
X Variable 1  3             0.856349        3.503245  0.039389    0.274716  5.725284    0.274716    5.725284
Page 58 of 156
12
[Residual versus fitted value plot (fitted values roughly -10 to 20).]
Page 59 of 156
13
[Two additional residual versus fitted value plots.]
Page 60 of 156
14
Page 61 of 156
15
Page 62 of 156
, Error, Residual
7) Coefficients are sometimes referred to using the more general term parameters.
Coefficients are the parameters that are used in linear models.
Main Ideas
Simple linear regression refers to a regression model with only one predictor. The underlying
theoretical model is:
y = beta0 + beta1 x + epsilon,
where y represents a value of the dependent variable, x is a value of the predictor, epsilon represents
random error, and beta0 and beta1 represent unknown constants.
The corresponding estimated regression equation is:
y-hat = b0 + b1 x.
The regression coefficients b0 and b1 refer to sample estimates of the true coefficients beta0 and beta1,
respectively.
The sample correlation coefficient, r, estimates the true (or population) value of the correlation,
rho, which is a measure of the degree to which two variables (Y and X) are linearly related.
Of course, the sample correlation (r) and the slope coefficient (b1) are closely related:
b1 = Cov(X,Y)/Var(X)   (*)
and
r = Cov(X,Y)/[SD(X)*SD(Y)],
so that b1 = r*SD(Y)/SD(X).
Page 63 of 156
Examples of Correlation
Change in GDP vs Consumer Sentiment (1995-2015)
[Scatterplot of Change in GDP (-500 to 500) versus Consumer Sentiment (60 to 110); the points for 1999, 2009, and 2015 are labeled.]

[Second scatterplot (vertical axis 50 to 85, horizontal axis 20 to 100).]

[Scatterplot of MPG (16 to 26) versus Weight (2500 to 4000).]

Random Y vs Random X
[Scatterplot of Random Y versus Random X, both roughly -4 to 4.]
1. Total Variation = Sum of (y_i - y-bar)^2  (= SS(Total))
   The sum of squares of the observations of Y around their mean.
2. Explained Variation = Sum of (y-hat_i - y-bar)^2  (= SS(Regression))
   The sum of squares of the predicted values of Y (y-hat) around their mean (which is also y-bar).
3. Unexplained Variation = Sum of (y_i - y-hat_i)^2  (= SS(Error))
   The sum of squares of the differences between each observation and the corresponding
   predicted value.
Note:
R^2(Adjusted) = 1 - MS(Error)/[SS(Total)/(n - 1)]
              = 1 - {SS(Error)/(n - [# parameters in model])} / {SS(Total)/(n - 1)}.
For simple linear regression, which includes a constant and a slope coefficient: [# of
parameters in model] = 2.
[Reference: pp. 493-495, Essentials of Business Statistics (2015), 5th Edition, Bowerman
et al.]
Page 66 of 156
Lecture 7
Inferences About Regression Coefficients &
Confidence/Prediction Intervals for Y /Y
SS(Total)                    =  SS(Regression)               +  SS(Error)
Sum of (y_i - y-bar)^2       =  Sum of (y-hat_i - y-bar)^2   +  Sum of (y_i - y-hat_i)^2
SS of observed y-values         SS of fitted values             SS of errors (errors are the actual y
around their mean               around the mean                 minus the fitted or predicted y)

1) R^2(adj) = 1 - {SS(Error)/(n - [# parameters in model])} / {SS(Total)/(n - 1)}
            = 1 - MS(Error)/(Variance of Y).
   (Recall that the Variance of Y = SS(Total)/(n - 1).)
2) s^2 (the variance of error) = MS(Error) = SS(Error)/(n - [# parameters in model])
   = Sum of (y_i - y-hat_i)^2 / (n - [# parameters in model]).
Application
Column  Count  Name
C1      43     URBAN AREA AND STATE
C2      43     HOME PRICE -- Avg for 2400 sq. ft. new home, 4 bed, 2 bath on 8000 sq. ft. lot
C3      43     Apt Rent -- Avg for 950 sq. ft. unfurnished apt., 2 bed, 1.5-2 bath,
               excluding all utilities except water.
Other interesting data sets on home prices and rental rates by city:
https://smartasset.com/mortgage/price-to-rent-ratio-in-us-cities
https://smartasset.com/mortgage/rent-vs-buy#map .
Page 68 of 156
[Fitted line plot: HOME PRICE versus Apt Rent for all cities; S = 70561.6, R-Sq = 88.2%, R-Sq(adj) = 87.9%; San Francisco CA and Honolulu HI are labeled; Apt Rent ranges from 1000 to 4000 and HOME PRICE up to 1,200,000.]

[Fitted line plot: HOME PRICE versus Apt Rent restricted to Apt Rent between 1000 and 1500; S = 59728.1, R-Sq = 35.9%, R-Sq(adj) = 34.2%; HOME PRICE from 200,000 to 500,000.]
Page 69 of 156
[Residual versus fitted value plot; fitted values from 300,000 to 440,000, residuals from -150,000 to 50,000.]
Page 70 of 156
Rent (X)
$286.5
.
Page 71 of 156
Model Summary
R-sq(adj)  R-sq(pred)
34.17%     28.48%

Analysis of Variance (excerpt)
Source          Adj MS        F-Value  P-Value
Regression      72076453090   20.20    0.000
  Apt Rent      72076453090   20.20    0.000
Error           3567451386
  Lack-of-Fit   3664999695    1.92     0.401
  Pure Error    1909130145

Coefficients
Term      Coef   SE Coef  T-Value  P-Value  VIF
Constant  1894   77528    0.02     0.981
Apt Rent  286.5  63.7     4.49     0.000    1.00

Regression Equation
HOME PRICE = 1894 + 286.5 Apt Rent

R^2 = SS(Regression)/SS(Total) = 7.21 x 10^10 / 2.01 x 10^11 (or 72.1/201) = 36%.
R^2(adjusted) = 1 - MS(Error)/[SS(Total)/(n - 1)] = 1 - 3567451386/(2.01 x 10^11 / 37) = 34.2%.

Interpret R^2(adjusted):
Interpret s:
The analysis on the last page also includes two types of statistical tests.
1) Test for Lack-of-Fit
H0: The Model Is Appropriate versus H1: Model Is Not Appropriate
Here we hope that we do not reject H0. That is, we hope to see a
large p-value (e.g., p-value > 0.2). If the p-value is small and
forces us to reject H0, then we should try to find another model.
If there is substantial information that the model is inappropriate,
the Lack-of-Fit variance will be significantly larger than the
Pure-Error variance and the ratio of these two variances should
be significantly greater than 1. Here the p-value (0.4) indicates this
ratio, called the F-value (1.92), is not significantly greater than 1.
(Please find these numbers in the Analysis of Variance Table.)
F-value = 1.92 = (Lack-of-Fit Variance)/(Pure Error Variance) = MS(Lack-of-Fit)/MS(Pure Error)
        = 3.66 x 10^9 / 1.91 x 10^9.
SS(Error) = Sum of (y_i - y-hat_i)^2 = SS(Lack-of-Fit) + SS(Pure Error).
(For example, Rochester MN and Minot ND both have Apt Rent = 1122, with home prices
250723 and 333300; Mean = 292012.)
Consequently:
SS(Pure Error) = (218265 - 232562)^2 + (246858 - 232562)^2
+ (250723 - 292012)^2 + (333300 - 292012)^2 = 3.818 billion
SS(Lack-of-Fit) = SS(Error) - SS(Pure Error) = 128.4 billion - 3.8 billion
= 124.6 billion
Analysis of Variance Table (again)
Source          DF  Adj SS        Adj MS        F-Value  P-Value
Regression       1  72076453090   72076453090   20.20    0.000
  Apt Rent       1  72076453090   72076453090   20.20    0.000
Error           36  1.28428E+11   3567451386
  Lack-of-Fit   34  1.24610E+11   3664999695    1.92     0.401
  Pure Error     2  3818260289    1909130145
Total           37  2.00505E+11
H0: beta1 = 0 versus H1: beta1 not equal to 0
Test Statistic: t = (286.5 - 0)/63.7 = 4.49; p-value = 0.000.
(Is beta1 significantly greater than 100?) t = (286.5 - 100)/63.7 = 2.93 (Yes!!)
Page 75 of 156
10
y-hat +/- t_(0.025) [SE Fit],
where SE Fit stands for standard error of the fit and represents
the uncertainty due to the fact that we only have estimates of the
regression coefficients. In calculating the prediction, these
estimated coefficients are multiplied by the corresponding predictor
value (or predictor values in the case of multiple regression), so that
the resulting error is partly a function of the actual value(s) of the
predictor variable(s). SE Fit IS NOT something to be calculated
by hand!
Page 76 of 156
11
y-hat +/- t_(0.025) [ (SE Fit)^2 + MS(Error) ]^(1/2).
Here we use a standard deviation that is the square root of the sum
of two variances. The first variance is the one which we discussed
on the last page. The second variance is MS(Error): this is the
estimated variance of the error term in the model and it affects all
individual predictions but does not depend on the predictor values.
Here is an illustration of how you would do this analysis in Minitab.
Minitab 17
Page 77 of 156
12
#3
Variable  Setting
Apt Rent  1100
Fit      SE Fit    95% CI             95% PI
317061   11839.4   (293049, 341072)   (193569, 440552)

#2
Variable  Setting
Apt Rent  1300
Fit      SE Fit    95% CI             95% PI
374364   11367.6   (351309, 397418)   (251055, 497672)
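If you are working in Python rather than Minitab, statsmodels produces analogous confidence and prediction intervals. The sketch below is added for illustration and uses the small Units/Cost example from Lecture 6 so that it is self-contained; for the home-price model you would substitute the Apt Rent and HOME PRICE data.

# 95% confidence and prediction intervals for new X values, illustrated with the Units/Cost data.
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 3.0, 4.0, 5.0, 7.0])
y = np.array([6.0, 14.0, 10.0, 14.0, 26.0])

fit = sm.OLS(y, sm.add_constant(x)).fit()

x_new = sm.add_constant(np.array([4.0, 6.0]), has_constant='add')
pred = fit.get_prediction(x_new)

# mean_ci_* columns are the confidence intervals for the line (driven by SE Fit);
# obs_ci_* columns are the wider prediction intervals (SE Fit plus MS(Error)).
print(pred.summary_frame(alpha=0.05))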
Page 78 of 156
13
The Fitted Line Plot analysis (listed in the Regression menu) will
draw pictures of the confidence and prediction intervals, along with
the regression line (use the Options button).
Fitted Line Plot "Options" Provides Picture of the CIs and PIs
HOME PRICE = 1894 + 286.5 Apt Rent
600000
Regression
95% CI
95% PI
S
R-Sq
R-Sq(adj)
HOME PRICE
500000
59728.1
35.9%
34.2%
400000
300000
200000
1000
1100
1200
1300
1400
1500
Apt Rent
Note:
1. CIs are always narrower than PIs (and sometimes much narrower)
because they are confidence intervals for the line itself and do not
reflect the random error at individual points.
2. Both the CIs and PIs get narrower as X approaches its mean
because this is where the value of SE FIT is smallest.
Page 79 of 156
(Add Predictors / Drop Predictors)
SS(Total) = SS(Regression) + SS(Error):
Sum of (y_i - y-bar)^2 = Sum of (y-hat_i - y-bar)^2 + Sum of (y_i - y-hat_i)^2.
Model with no constant:
Sum of y_i^2 = Sum of y-hat_i^2 + Sum of (y_i - y-hat_i)^2.
SS(Total) remains unchanged as predictors are added to, or dropped
from, the model. But as predictors are added, SS(Regression)
increases by the amount that SS(Error) decreases, and as predictors
are dropped, SS(Regression) decreases by the amount that SS(Error)
increases.
Degrees of Freedom:
Total
= Regression
+
Error
[(n-1) or n] = [# of predictors] + [n-{# of parameters}].
Model with constant:
n-1
=
k
+
[n - (k+1) ].
Model with no constant:
n
=
k
+
[n - k].
(In simple linear regression: {# of predictors}= 1, and if there is a
constant in the model, {# of parameters} = 2.)
Page 80 of 156
Name
Share
Description
% of households that watch the
game among all households
watching TV (Market Share)
C60
49
C20
49
Over/Under
C37
49
|Line|
C40
49
Dallas
1967
1968
79
68
*
79
47
48
49
2013
2014
2015
69
69
72
71
69
69
*
43.0
48.0
47.5
47.5
|Line|
14.0
13.5
0
0
0
0
0
4.5
2.5
1.0
MTB > corr c7 c60 c20 c37 c40   (Correlations: Share, Share_lastYR, Over/Under, |Line|, Dallas)

                Share     Share_lastYR  Over/Under  |Line|
Share_lastYR    0.462
                0.001
Over/Under     -0.385    -0.466
                0.007     0.001
|Line|         -0.115    -0.095         0.241
                0.433     0.522         0.098
Dallas          0.365     0.070        -0.204       -0.017
                0.010     0.637         0.164        0.905

Cell Contents: Correlation (r)
               P-value
1.
2.
Page 81 of 156
Page 82 of 156
Minitab 17 Commands
Minitab 17 Menus
MTB> regr;
SUBC> response Share;
Stat Menu Regression Regression
SUBC> continuous 'Share_lastYR' Dallas;
Fit Regression Model
SUBC> terms 'Share_lastYR' Dallas;
SUBC> GFITS.
Regression Equation:
Page 83 of 156
5
Regression Analysis: Share versus Share_lastYR, Dallas

Analysis of Variance (ANOVA Table)
Source           DF  Adj SS  Adj MS   F-Value  P-Value
Regression        2  344.9   172.44   12.62    0.000
  Share_lastYR    1  180.7   180.71   13.22    0.001
  Dallas          1  140.2   140.21   10.26    0.002
Error            45  615.0    13.67
  Lack-of-Fit    23  327.4    14.24    1.09    0.422
  Pure Error     22  287.5    13.07
Total            47  959.9

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
3.69681  35.93%  33.08%     26.84%

Coefficients   (#5, #6)
Term          Coef   SE Coef  T-Value  P-Value  VIF
Constant      38.82  7.64     5.08     0.000
Share_lastYR  0.411  0.113    3.64     0.001    1.00
Dallas        4.60   1.44     3.20     0.002    1.00

Regression Equation: Share = 38.82 + 0.411 Share_lastYR + 4.60 Dallas
Check Assumptions
(See # 3)
versus
H1: Not H0
Page 84 of 156
Analysis of Variance (adjusted / marginal sums of squares)
Source         Adj MS   F-Value  P-Value
Regression     172.44   12.62    0.000
Share_lastYR   180.71   13.22    0.001
Dallas         140.21   10.26    0.002
Error           13.67

Analysis of Variance (sequential: Share_lastYR entered first)
Source         DF  Seq SS  Seq MS   F-Value  P-Value
Regression      2  344.9   172.44   12.62    0.000
Share_lastYR    1  204.7   204.67   14.98    0.000
Dallas          1  140.2   140.21   10.26    0.002
Error          45  615.0    13.67
Total          47  959.9

Analysis of Variance (sequential: Dallas entered first)
Source         DF  Seq SS  Seq MS   F-Value  P-Value
Regression      2  344.9   172.44   12.62    0.000
Dallas          1  164.2   164.18   12.01    0.001
Share_lastYR    1  180.7   180.71   13.22    0.001
Error          45  615.0    13.67
Total          47  959.9
Note that the three Analysis of Variance Tables (above) are identical except
for the rows that refer to predictors. The last two tables show the sequential
(rather than marginal) contribution of each predictor to SS(Regression).
The marginal effect of a predictor is what it contributes when it comes in
last. The sequential contribution refers to the amount by which
SS(Regression) increases when a predictor is entered in the order indicated.
Page 85 of 156
For example, the shaded rows show the amount that each predictor would
contribute to SS(Regression) if it were the only predictor in the model.
7. What would be the value of SS(Regression) in a simple linear
regression where Share is regressed on Share_lastYR?
204.7
8. What would be the value of SS(Regression) in a simple linear
regression where Share is regressed on Dallas?
164.2
Also note that in each of the Sequential tables the values of Seq SS
for the predictors add up to the Seq SS value for Regression which is
SS(Regression):
1st Sequential Table: 204.7 + 140.2 =
344.9;
nd
2 Sequential Table: 164.2 + 180.7 =
344.9.
Ultimately, a predictor's value in a given model depends on its
marginal contribution, i.e., what it contributes to SS(Regression) when
added last. Thus, these values are the ones provided in the default
version of the analysis of variance table (near the top of the last page).
Each coefficient's t-value also measures the corresponding predictor's
marginal value in the following way: the squared t-value of the
coefficient of each predictor is equal to that predictor's marginal
contribution to SS(Regression) divided by MS(Error). Here is the
coefficient table with t-values:
Coefficients
Term
Constant
Share_lastYR
Dallas
Coef
38.82
0.411
4.60
SE Coef
7.64
0.113
1.44
T-Value
5.08
3.64
3.20
P-Value
0.000
0.001
0.002
VIF
1.00
1.00
To illustrate:
t^2 (Share_lastYR) = [Adj SS for Share_lastYR]/MS(Error) = 180.7/13.67 = (3.64)^2 = 13.22
t^2 (Dallas)       = [Adj SS for Dallas]/MS(Error)       = 140.2/13.67 = (3.20)^2 = 10.26
Since the square of each t-value reflects the marginal value of the
predictor relative to MS(Error), the fit of a model will improve
(MS(Error) will decrease) whenever we remove predictors with
absolute t-values less than 1 (or if we add predictors with |t-value| >1).
Page 86 of 156
R-sq(adj)
35.82%
Coef
42.55
0.376
4.57
-0.00420
F-Value
9.74
15.61
10.70
2.92
P-Value
0.000
0.000
0.002
0.095
T-Value
5.46
3.33
3.25
-1.71
P-Value
0.000
0.002
0.002
0.095
R-sq(pred)
28.72%
SE Coef
7.80
0.113
1.41
0.00246
Regression Equation
Page 87 of 156
. (.)
.
= 1.30
10
4.57
= 1.82
1.41
Critical value: t (44 df, = 0.1) = 1.3 Conclusion: Yes!
(Reject H0)
DF
3
1
1
1
44
47
Seq SS
383.15
204.67
140.21
38.27
576.72
959.87
Seq MS
127.72
204.67
140.21
38.27
13.11
F-Value
9.74
15.61
10.70
2.92
P-Value
0.000
0.000
0.002
0.095
13.11
Critical value: F ( 2 & 44
Conclusion: Reject H0 .
13.11
= 6.8
Page 89 of 156
Compensation (millions)
Age (years)
Wins with Team (as HC)
Indicator for AFC Team
Indicator for Head Coach in Super B.
Menu Guide:
MINITAB 17: Stat >Regression >Regression >Fit Regression Model
In Dialog Box:
Response: c8
Continuous Predictors: c6 c23 c25 c15
[When fitting this model, I actually listed the predictors in the order: c6 c23 c25 c15. The order is
inconsequential, unless we plan to use the sequential sums of squares to test the importance of a
specific subgroup of predictors. If we plan to test a subgroup, those predictors should be listed
last.]
Model 1
Regression Equation
Model Summary
S
R-sq
0.955828 60.99%
R-sq(adj)
55.21%
R-sq(pred)
47.31%
Coefficients
Term             Coef     SE Coef  T-Value  P-Value  VIF
Constant         4.57     1.70     2.68     0.012
Age              -0.0016  0.0323   -0.05    0.960    1.32
AFC              -0.669   0.360    -1.86    0.074    1.14
HeadCoach_in_SB  1.092    0.415    2.63     0.014    1.45
W_with_Team      0.01494  0.00441  3.39     0.002    1.35
The simple correlation between age and salary is positive (r = 0.364, p-value = 0.040; not shown in the output above), but the negative coefficient
for age in the model above suggests there is age discrimination! (Until
we realize the coefficient for age does not even approach significance.)
This model and the significance of the positive simple correlation
between age and salary indicate that age is a proxy for past success (the
last two predictors in the model above).
Discussion Question
1. What is wrong with this model? What would be a better fitting
model?
The model without Age would fit better, because |tAge| < 1
(see discussion in last lecture, bottom of p.7).
What if the constant has a t-ratio that is less than one in
absolute value? If the primary purpose of the model is
prediction, you should take the constant out of the model (as
you would any other predictor), otherwise you might keep the
constant. For example, you would not eliminate the constant in
the Market Model. The purpose of the Market Model is not
prediction, and the constant is actually used to estimate the
security's return when the market return is zero.
Model 2
Here we drop Age
Regression Equation
Model 2 (Continued)
Analysis of Variance
Source         DF  Adj SS    Adj MS    F-Value  P-Value
Regression      3  38.5626   12.8542   14.59    0.000
Error          28  24.6698    0.8811
  Lack-of-Fit  25  23.7948    0.9518    3.26    0.180
  Pure Error    3   0.8750    0.2917
Total          31  63.2324
Model Summary
S
R-sq R-sq(adj) R-sq(pred)
0.938650 60.99%
56.81%
49.58%
Here Is the Summary for Model 1:
S
R-sq R-sq(adj) R-sq(pred)
0.955828 60.99%
55.21%
47.31%
Coefficients
Term             Coef     SE Coef  T-Value  P-Value  VIF
Constant         4.484    0.294    15.26    0.000
AFC              -0.665   0.343    -1.94    0.062    1.07
HeadCoach_in_SB  1.086    0.392    2.77     0.010    1.35
W_with_Team      0.01490  0.00425  3.50     0.002    1.30
'C:\MTBWIN\DATA\PROMOD.MTW'.
Count
C1
price
24
C3
baths
24
C4
lotsize
24
C5
space
24
C7
nROOMS
24
C11
two-car
24
C12
HALFbath
24
Model 1
Stat> Regression >Regression > Fit Regression Model
In regression dialog box:
1) List as continuous predictors: c12 c11 c4 c7 c5
2) Open Options, and set Sum of Squares to
Sequential.
Regression Equation:
price = 132.8 + 28.49 HALFbath + 22.17 two-car
+ 3.52 lotsize - 5.32 nROOMS + 12.0 space
Page 93 of 156
Analysis of Variance
Source      DF  Seq SS    Seq MS    F-Value  P-Value
Regression   5  12955.2   2591.04   12.17    0.000
  HALFbath   1   8530.4   8530.44   40.06    0.000
  two-car    1   3242.6   3242.62   15.23    0.001
  lotsize    1    969.3    969.25    4.55    0.047
  nROOMS     1    139.2    139.15    0.65    0.429
  space      1     73.7     73.72    0.35    0.564
Error       18   3832.6    212.92
Total       23  16787.7

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
14.5918  77.17%  70.83%     56.25%

Coefficients
Term      Coef   SE Coef  T-Value  P-Value  VIF
Constant  132.8  32.1     4.13     0.001
HALFbath  28.49  9.42     3.03     0.007    2.22
two-car   22.17  8.15     2.72     0.014    1.86
lotsize   3.52   1.96     1.79     0.090    1.61
nROOMS    -5.32  5.46     -0.97    0.343    2.52
space     12.0   20.4     0.59     0.564    3.42
6. What is wrong with this model? (I don't even bother to plot residuals.)
|t| < 1 for two predictors! (Thus, we know we can find a
better model.)
7. Is it best to drop both nRooms and Space, or should we hold on to
at least one of them? (Consider the marginal value of both predictors.)
{Total Marginal Value} = 139.2 + 73.7 = 212.9
Since 212.9 < MS(Error) we should drop both!
[It's just a coincidence that the total marginal value of
both predictors is actually equal to MS(Error). ]
Page 94 of 156
8. Do the following formal test at the 0.05 level. (The underlying question
is: does the information in these two predictors have significant
marginal value at the 0.05 level? We would deduce from the analysis
for the last question that they do not!)
H0: beta_nROOMS = beta_space = 0     H1: not H0
Test Statistic: F = [(139.2 + 73.7)/2] / MS(Error) = 106.45/212.92 = 0.5
Model Summary
R-sq(adj)  R-sq(pred)
72.29%     60.81%

Analysis of Variance
Source      Seq MS    F-Value  P-Value
Regression  4247.44   21.00    0.000
  HALFbath  8530.44   42.17    0.000
  two-car   3242.62   16.03    0.001
  lotsize    969.25    4.79    0.041
Error        202.27

Coefficients
Term      Coef    SE Coef  T-Value  P-Value  VIF
Constant  113.98  9.83     11.60    0.000
HALFbath  28.73   6.78     4.24     0.000    1.21
two-car   18.59   6.49     2.87     0.010    1.24
lotsize   3.92    1.79     2.19     0.041    1.41

Regression Equation
price = 113.98 + 28.73 HALFbath + 18.59 two-car + 3.92 lotsize
Page 95 of 156
10. In what ways is Model 2 superior to Model 1 (if the three assumptions
hold in #9)?
Every way (simpler and better fit).
11. Is Model 2 significant at the 0.01 level?
Yes (1st p-value in Analysis of Variance refers to the
overall Regression model, and it is 0.000).
12. Which coefficients in Model 2 are significant (i.e., significantly
nonzero) at the 0.02 level?
All except lotsize (p=0.04). Be sure to refer to the
Coefficient table and NOT the Analysis of Variance Table.
Page 96 of 156
Test Statistic:
0.5
Reject H0 Yes!
Page 97 of 156
17. Please give a 90% confidence interval for the difference between the
average price of a home with a two-car garage and the average price
without a two-car garage (holding other predictors constant).
b_two-car +/- t_0.05(20 df) * s_b(two-car) = 18.6 +/- 1.725(6.49) = 18.6 +/- 11.2
18. For the model on page 5, please test the following hypothesis at the
0.05 level.
H0: lotsize = NROOMS = space = 0
H1: not H0
Please find these
numbers on page 5.
Test Statistic:
Critical Value:
Conclusion: Accept H0 .
Page 98 of 156
10
45 45
11
In MINITAB 17, the Model subdialog box (below on the left) allows you to remove the constant term
(see the Include the constant option near the lower left corner).
The Model Subdialog Box
The buttons labelled Add in the top right corner of this dialog box allow one to add multiples or powers
of the predictors listed. For example, if we highlight AFC and Head_Coach_in_SB in the list at the top
left and hit the first Add button, we add the predictor AFC x Head_Coach_in_SB. This is
referred to as an interaction of the two predictors and allows us to estimate the incremental value (relative
to compensation) of being an AFC coach who also has been a head coach in a previous Super Bowl.
12
To Make a Prediction
After fitting a model,
predictions are made
from a separate part of
the Regression menu
(Stat Menu Regression
Regression Predict).
Here I am requesting
predicted compensation
for a head coach who
works in the AFC, has
not been a head coach for
a team in the Super Bowl,
and has 100 team wins.
That is, the numbers 1 0
100 refer to the values
of the three predictors in
the model:
AFC =1,
Head_Coach_in_SB=0,
W_with_Team=100.
Predicted R-squared = 1 - Variance(Error*)/Variance(Y),
where Variance(Error*), represents the variance of error that occurs
when each observation of Y is predicted without including that
observation when estimating the model. When Predicted R-squared is
not available, another approach to finding the best predictive model is
to minimize either the criterion or the Mallows Cp criterion:
Cp =
= () +
SS(Error)
,
[ + ]
( 2),
C2
C3
C4
100
100
100
0
64
0
C5
100
Viewers
C6
100
Comedy
C7
100
Drama
C8
100
Talent/Variety
C9
100
NewsMagazine
C10
100
Reality
C11
100
Sports
C19
100
Period
Name of Show
Shows
Repeat_Special_Premier
Network
Live+SD (Millions)
Indicator
Indicator
Indicator
Indicator
Indicator
Indicator
Weeks 1-4
Viewers
12.001
9.396
8.717
8.215
97
98
8.395
8.327
Repeat_Special_Premier
P
R
R
R
Network
NBC
CBS
CBS
CBS
Comedy
0
0
1
0
ABC
ABC
Drama
0
1
0
0
1
0
Reality
0
0
0
0
Period
1
1
1
1
.
0
0
0
1
.
Page 105 of 156
4
4
Comedy
Drama
0.025
0.803
-0.370
0.000
Talent/Varie
0.315
0.001
-0.219
0.028
NewsMagazine
-0.271
0.006
-0.159
0.114
Reality
-0.166
0.099
-0.159
0.114
between variables
and 1 is the
-0.287
-0.170
0.004
0.090
slope coefficient of a simple
Sports
-0.124
0.219
-0.065
0.523
-0.117
other.
0.248
-0.069
0.494
Premier
-0.152
0.130
-0.035
0.727
-0.015
0.882
0.045
0.656
Repeat
-0.353
0.000
0.134
0.182
0.043
0.675
-0.161
0.111
Special
-0.248
0.013
-0.017
0.870
-0.194
0.053
0.192
0.056
ABC
0.045
0.659
0.017
0.869
-0.187
0.062
0.050
0.618
CBS
0.063
0.531
0.231
0.021
0.229
0.022
-0.494
0.000
FOX
-0.203
0.043
-0.151
0.134
-0.068
0.501
0.263
0.008
Period_1
-0.488
0.000
0.046
0.649
-0.047
0.641
-0.162
0.108
Period_2
-0.088
0.384
-0.015
0.879
-0.236
0.018
0.250
0.012
Period_3
0.226
0.024
-0.015
0.879
0.141
0.160
-0.044
0.663
Comedy
Drama
Talent/Variety
-0.170
0.090
(-0.488)2
= 0.24 or
24%
(-0.353)2
= 0.12 or
12%
7
MTB > Breg Viewers Comedy-Reality Premier-FOX 'Period_1'-'Period_3'
Best Subsets Regression: Viewers versus Comedy, Drama, ... : Response is Viewers
(Two models are shown for each number of predictors; the Vars column gives the number of predictors in the model.)

Vars  R-Sq  R-Sq(adj)  R-Sq(pred)  Mallows Cp    S
  1   23.9    23.1       21.3        60.6     2.4982
  1   12.5    11.6        9.9        84.0     2.6786
  2   30.9    29.5       26.7        48.1     2.3916
  2   29.6    28.1       25.1        50.8     2.4147
  3   40.1    38.2       34.5        31.2     2.2393
  3   38.3    36.3       33.9        35.0     2.2727
  4   46.4    44.1       40.9        20.3     2.1298
  4   43.9    41.6       37.9        25.3     2.1773
  5   48.4    45.7       42.2        18.1     2.0995
  5   47.8    45.0       42.0        19.4     2.1123
  6   50.2    47.0       43.2        16.4     2.0737
  6   50.1    46.8       43.3        16.7     2.0766
  7   52.2    48.6       45.4        14.2     2.0418
  7   52.2    48.5       44.9        14.4     2.0439
  8   54.2    50.2       46.5        12.2     2.0106
  8   54.1    50.1       46.8        12.4     2.0130
  9   55.7    51.3       47.5        11.0     1.9875
  9   55.1    50.6       47.0        12.4     2.0028
 10   56.4    51.5       47.4        11.6     1.9827
 10   56.3    51.4       46.8        11.8     1.9855
 11   57.2    51.9       46.7        12.0     1.9765
 11   56.9    51.5       46.5        12.7     1.9836
 12   57.9    52.1       46.6        12.7     1.9724
 12   57.3    51.4       45.7        13.8     1.9852
 13   58.4    52.1       45.6        13.5     1.9711
 13   58.1    51.7       45.9        14.2     1.9788
 14   58.7    51.9       45.0        15.0     1.9763
[Predictor-inclusion grid: an X marks each of the candidate predictors -- Comedy, Drama, Talent/Variety, NewsMagazine, Reality, Premier, Repeat, Special, ABC, CBS, FOX, Period_1, Period_2, Period_3 -- included in each model listed above.]
(+)
] =
] =
(+)
4.39
4.42
Constant
Comedy
Drama
Talent/Variety
NewsMagazine
Reality
Premier
Repeat
Special
ABC
CBS
FOX
Period_1
Period_2
Period_3
NewsMagazine*CBS
Reality*CBS
-----Step
Coef
10.72
-2.40
-1.92
1.61
-3.00
-3.83
0.659
-1.785
-1.272
1.390
2.312
-2.499
-2.095
-1.131
-0.611
-0.36
0.50
S
R-sq
R-sq(adj)
R-sq(pred)
1---P
0.060
0.101
0.202
0.038
0.014
0.470
0.031
0.079
0.070
0.005
0.005
0.020
0.105
0.286
0.799
0.729
-----Step
Coef
10.71
-2.35
-1.87
1.61
-3.18
-3.83
0.677
-1.825
-1.278
1.380
2.252
-2.530
-2.055
-1.112
-0.609
0.56
1.99730
58.78%
50.83%
43.56%
2---P
0.061
0.103
0.198
0.011
0.013
0.455
0.023
0.075
0.070
0.004
0.004
0.019
0.107
0.286
-----Step
Coef
10.68
-2.33
-1.86
1.65
-3.15
-3.53
0.662
-1.866
-1.272
1.310
2.314
-2.530
-2.083
-1.087
-0.622
3---P
0.061
0.103
0.183
0.011
0.008
0.462
0.019
0.075
0.075
0.003
0.003
0.017
0.112
0.273
-----Step
Coef
10.55
-2.08
-1.63
1.86
-2.92
-3.27
4---P
0.081
0.136
0.124
0.015
0.010
-1.992
-1.434
1.187
2.194
-2.498
-1.849
-1.006
-0.591
0.010
0.035
0.096
0.003
0.004
0.022
0.135
0.294
0.692
1.98616
58.75%
51.38%
44.60%
1.97630
58.67%
51.86%
44.99%
1.97108
58.40%
52.11%
45.63%
********************************************************************************************
Constant
Comedy
Drama
Talent/Variety
NewsMagazine
Reality
Premier
Repeat
Special
ABC
CBS
FOX
Period_1
Period_2
Period_3
NewsMagazine*CBS
Reality*CBS
S
R-sq
R-sq(adj)
R-sq(pred)
-----Step
Coef
10.18
-2.02
-1.57
1.93
-2.86
-3.18
5---P
0.090
0.152
0.110
0.017
0.013
-1.997
-1.439
1.201
2.203
-2.511
-1.548
-0.704
0.010
0.035
0.093
0.003
0.004
0.040
0.245
1.97237
57.86%
52.05%
46.60%
-----Step
Coef
9.76
-2.02
-1.46
2.04
-2.94
-3.20
-2.379
-1.691
1.324
2.495
-2.578
-1.061
6---P
-----Step 8---Coef
P
8.237
0.091
0.182
0.092
0.014
0.012
-----Step 7---Coef
P
8.322
-0.641
0.279
3.372
-1.677
-1.827
0.000
0.019
0.013
3.495
-1.487
-1.610
0.000
0.032
0.022
0.001
0.009
0.062
0.000
0.003
0.087
-2.388
-1.695
1.393
2.566
-2.239
-1.069
0.001
0.009
0.050
0.000
0.006
0.086
-2.407
-1.738
1.267
2.460
-2.203
-1.090
0.001
0.008
0.071
0.000
0.007
0.081
1.97649
57.20%
51.85%
46.70%
1.98546
56.32%
51.41%
46.81%
1.98752
55.74%
51.31%
47.48%
The model in Step 8 (with 9 predictors) is the best predictive model on page 8.
The analysis continues for two more steps but R-sq(pred) decreases. Here is
the summary for those two steps:
S
R-sq
R-sq(adj)
R-sq(pred)
10
Analysis of Variance
Source         DF  Adj SS   Adj MS   F-Value  P-Value
Regression      9  447.70   49.744   12.59    0.000
Error          90  355.52    3.950
  Lack-of-Fit  24   81.99    3.416    0.82    0.694
  Pure Error   66  273.53    4.144
Total          99  803.22

Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
1.98752  55.74%  51.31%     47.48%

Coefficients
Term            Coef    SE Coef  T-Value  P-Value  VIF
Constant        8.237   0.628    13.12    0.000
Talent/Variety  3.495   0.656    5.32     0.000    1.68
NewsMagazine    -1.487  0.683    -2.18    0.032    1.16
Reality         -1.610  0.690    -2.33    0.022    1.18
Repeat          -2.407  0.691    -3.48    0.001    1.79
Special         -1.738  0.638    -2.73    0.008    1.16
ABC             1.267   0.693    1.83     0.071    2.08
CBS             2.460   0.681    3.61     0.000    2.93
FOX             -2.203  0.803    -2.74    0.007    1.47
Period_1        -1.090  0.617    -1.77    0.081    1.81

Regression Equation
Viewers = 8.237 + 3.495 Talent/Variety - 1.487 NewsMagazine - 1.610 Reality - 2.407 Repeat
- 1.738 Special + 1.267 ABC + 2.460 CBS - 2.203 FOX - 1.090 Period_1
11
12
&
The inclusion of any one of the 3 highlighted variables above will make it
impossible to do the best subsets analysis. If these variables had already been
created, I might not realize that they described mutually exclusive categories
(and, consequently, that the inclusion of the highlighted variables was
redundant information). Whenever I get the error message above, I
13
Other Criteria
(A Vocabulary for Conversations with Other Modelers)
1. Sometimes when we compare models, the sample size changes (perhaps
because of missing values). But if the sample size, n, is not changing, it is
easier to calculate and minimize Sp (rather than ) :
(Error)
=
=
.
[( ( + 1)]
1
2. Another approach to finding the best predictive model is to select the
model that minimizes the Akaike Information Criterion (AIC),
AIC = (
(SS(Error)
n
)+
log ( ).
When the sample size is sufficiently large, AIC is minimized whenever the
criterion is minimized (and R-sq(pred) is maximized), so that one will
generally select the same model using any one of these three criteria.
3. Sometimes the objective is not to find the best predictive model, but rather
to find the model that comes closest to describing the true relationship
between Y and a set of predictors. The criteria, that are used in this case,
incorporate a more severe penalty for model complexity. One approach to
finding the true scientific model is to minimize the Bayesian Information
Criterion (BIC),
BIC = (
(SS(Error)
)
n
()
BIC* = () { + [
If you compare BIC* with , you can see that BIC* imposes a greater
penalty for model complexity!
Bruce Cooil, 2016
14
Cp = SS(Error) / MS(Error for Model with All Predictors) - (n - 2p),
where p is the number of parameters in the candidate model.
Lecture 11
1-Way Analysis of Variance (ANOVA)
As a Type of Multiple Regression Analysis
Outline
Reference: Ch. 14: 14.8(again); Ch. 11: 11.1-11.2 (for main ideas only)
            Large-Cap  Mid-Cap  Small-Cap
            Blend      Blend    Blend
            14.1       6.8      10.7
            6.1        11.4     6.9
            3.3        10.2     6.4
            7.3        10.4     8.5
Mean        7.70       9.70     8.125
Std. Dev.   4.58       2.00     1.94
This analysis does not actually require the same number of observations
per category.
A simple (and elegant way) of summarizing the average differences in
return by group is to perform a multiple regression where:
Y is the return, and
Two indicator variables are used as predictor variables to denote
group membership. (Note: for k groups, only k-1 indicators are
needed.)
If we are primarily interested in contrasting large-cap funds with the
others, the above would be organized as follows for the multiple
regression analysis.
Fund  Return  Mid-Cap (X1)  Small-Cap (X2)
1     14.1    0             0
2     6.1     0             0
3     3.3     0             0
4     7.3     0             0
5     6.8     1             0
6     11.4    1             0
7     10.2    1             0
8     10.4    1             0
9     10.7    0             1
10    6.9     0             1
11    6.4     0             1
12    8.5     0             1
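As a cross-check, here is a sketch (added for illustration, not part of the notes) that fits this regression with statsmodels and recovers the three group means:

# 1-way ANOVA as a regression on two indicator variables (Mid-Cap and Small-Cap).
import numpy as np
import statsmodels.api as sm

returns = np.array([14.1, 6.1, 3.3, 7.3, 6.8, 11.4, 10.2, 10.4, 10.7, 6.9, 6.4, 8.5])
mid     = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0])
small   = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1])

X = sm.add_constant(np.column_stack([mid, small]))
fit = sm.OLS(returns, X).fit()

print(fit.params)                 # about [7.70, 2.00, 0.425]: Large-Cap mean and the two mean differences
print(fit.fvalue, fit.f_pvalue)   # overall F is about 0.46 with p-value 0.64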
Page 117 of 156
Analysis of Variance
Source      DF  Adj SS   Adj MS  F-Value  P-Value
Regression   2   8.8817  4.4408  0.46     0.644
Error        9  86.3275  9.5919
Total       11  95.2092

Model Summary
S        R-sq   R-sq(adj)  R-sq(pred)
3.09709  9.33%  0.00%      0.00%

Coefficients
Term       Coef   SE Coef  T-Value  P-Value  VIF
Constant   7.70   1.55     4.97     0.001
Mid-Cap    2.00   2.19     0.91     0.385    1.33
Small-Cap  0.425  2.19     0.19     0.850    1.33

Regression Equation
Return = 7.70 + 2.00 Mid-Cap + 0.425 Small-Cap

Sample means: Large-Cap = 7.70, Mid-Cap = 9.70, Small-Cap = 8.125.
From these means, how could we have deduced that the regression
equation would have been:
Return = 7.70 + 2.00 Mid-Cap + 0.425 Small-Cap ?
To answer this question, first note the Return values that this model
predicts for each of the 3 types of funds.
What's the predicted value of Return for Large-Cap?
(It will be the sample mean for "Large-Cap.")
Predictor values in this case are: Mid-Cap = 0, Small-Cap = 0, so the prediction is 7.70.
What's the predicted return for Mid-Cap funds? (It will be the
sample mean return for Mid-Cap funds.)
Predictor values in this case are: Mid-Cap = 1, Small-Cap = 0, so the prediction is 7.70 + 2.00 = 9.70.
What's the predicted return for Small-Cap funds?
Predictor values in this case are: Mid-Cap = 0, Small-Cap = 1, so the prediction is 7.70 + 0.425 = 8.125.
Page 119 of 156
4.
[Residual versus fitted value plot; residuals from -5.0 to 5.0, fitted values from 8.0 to 10.0.]
a.
Analysis of Variance
Source      DF   Adj SS  Adj MS  F-Value  P-Value
Regression   2   8.8817  4.4408     0.46    0.644
Error        9  86.3275  9.5919
Total       11  95.2092

F = 0.46, p-value of F = 0.644.   Conclusion: Accept H0.

Coefficients
Term       Coef  SE Coef  T-Value  P-Value   VIF
Constant   7.70     1.55     4.97    0.001
Mid-Cap    2.00     2.19     0.91    0.385  1.33
Small-Cap  0.43     2.19     0.19    0.850  1.33

b. H0: βMid-Cap = 0     H1: βMid-Cap > 0
   P-value: 0.385/2 = 0.1925     Conclusion: Accept H0

c. H0: βSmall-Cap = 0   (two-sided P-value from the table: 0.850)
   Conclusion: Accept H0

(For Example 2 below, the corresponding indicator coefficients are bMIAMI, bLApsdna, and bNorleans.)
Example 2:
Analysis of Network Share By Location of the First 36 Super Bowls
(Imagine This is the Entire Sample)
Data Display

Row  year  share  site  MIAMI  LApsdna  Norleans  Loc. 4
  1  1967   79.0     2      0        1         0       0
  2  1968   68.0     1      1        0         0       0
  3  1969   71.0     1      1        0         0       0
  4  1970   69.0     3      0        0         1       0
  5  1971   75.0     1      1        0         0       0
  6  1972   74.0     3      0        0         1       0
  7  1973   72.0     2      0        1         0       0
  8  1974   73.0     4      0        0         0       1
  9  1975   72.0     3      0        0         1       0
 10  1976   78.0     1      1        0         0       0
 11  1977   73.0     2      0        1         0       0
 12  1978   67.0     3      0        0         1       0
 13  1979   74.0     1      1        0         0       0
 14  1980   67.0     2      0        1         0       0
 15  1981   63.0     3      0        0         1       0
 16  1982   73.0     4      0        0         0       1
 17  1983   69.0     2      0        1         0       0
 18  1984   71.0     4      0        0         0       1
 19  1985   63.0     4      0        0         0       1
 20  1986   70.0     3      0        0         1       0
 21  1987   66.0     2      0        1         0       0
 22  1988   62.0     4      0        0         0       1
 23  1989   68.0     1      1        0         0       0
 24  1990   63.0     3      0        0         1       0
 25  1991   63.0     4      0        0         0       1
 26  1992   61.0     4      0        0         0       1
 27  1993   66.0     2      0        1         0       0
 28  1994   66.0     4      0        0         0       1
 29  1995   63.0     1      1        0         0       0
 30  1996   72.0     4      0        0         0       1
 31  1997   65.0     3      0        0         1       0
 32  1998   67.0     4      0        0         0       1
 33  1999   61.0     4      0        0         0       1
 34  2000   68.5     4      0        0         0       1
 35  2001   60.0     4      0        0         0       1
 36  2002   61.0     3      0        0         1       0
Analysis of Variance
Source       DF   Seq SS
Regression    3  148.323
  MIAMI       1   70.444
  LApsdna     1   73.389
  Norleans    1    4.490
Error        32  736.087
Total        35  884.410

Model Summary
      S    R-sq  R-sq(adj)  R-sq(pred)
4.79611  16.77%      8.97%       0.00%

Coefficients
Term      Coef  SE Coef  T-Value  P-Value   VIF
Constant 66.19     1.33    49.76    0.000
MIAMI     4.81     2.25     2.14    0.040  1.24
LApsdna   4.09     2.25     1.82    0.078  1.24
Norleans  0.92     2.08     0.44    0.662  1.27

Regression Equation
share = 66.2 + 4.81 MIAMI + 4.09 LApsdna + 0.92 Norleans
2. Means

site   N   Mean  StDev       95% CI
   1   7  71.00   5.10  (67.31, 74.69)
   2   7  70.29   4.75  (66.59, 73.98)
   3   9  67.11   4.46  (63.85, 70.37)
   4  13  66.19   4.88  (63.48, 68.90)
MS(Error) = [(7−1)5.10² + (7−1)4.75² + (9−1)4.46² + (13−1)4.88²] / [(7−1) + (7−1) + (9−1) + (13−1)] ≈ 23.0
(so S = √23.0 ≈ 4.80).

The pooled standard deviation (above) is calculated the same way as in
the Case 5 test for two means, and this is an illustration of how the
homoscedasticity assumption is used in regression models.
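As a quick numerical check (a sketch in Python, not part of the notes), pooling the four sample variances reproduces MS(Error) and S from the regression output above:

n_sites = [7, 7, 9, 13]
sd      = [5.10, 4.75, 4.46, 4.88]

num = sum((n - 1) * s**2 for n, s in zip(n_sites, sd))  # sum of (ni - 1) * si^2
den = sum(n - 1 for n in n_sites)                       # sum of (ni - 1) = 32
ms_error = num / den
print(ms_error, ms_error ** 0.5)   # about 23.0 and 4.80 (compare S = 4.79611)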
One problem with this analysis: SHARE tends to decline with time, and
games played at "Location 4" (i.e., other locations) tend to be more
common in later years, relative to games played at the first three
locations (especially MIAMI). To correct for the approximate linear
decline in SHARE over the years, I add YEARS as a predictor.
Page 125 of 156
Analysis of Covariance
Now we regress 'share' on 4 predictors:
year-1967, MIAMI, LApsdna, Norleans.
Analysis of Variance
Source       DF   Seq SS   Seq MS  F-Value  P-Value
Regression    4  447.891  111.973     7.95    0.000
  year-1967   1  422.879  422.879    30.03    0.000
  MIAMI       1    5.372    5.372     0.38    0.541
  LApsdna     1    8.260    8.260     0.59    0.450
  Norleans    1   11.382   11.382     0.81    0.376
Error        31  436.518   14.081
Total        35  884.410

Model Summary
      S    R-sq  R-sq(adj)  R-sq(pred)
3.75250  50.64%     44.27%      35.35%

Coefficients
Term        Coef  SE Coef  T-Value  P-Value   VIF
Constant   73.95    1.98     37.40    0.000
year-1967 -0.3221   0.0698   -4.61    0.000  1.35
MIAMI       0.64    1.98      0.32    0.748  1.57
LApsdna     0.53    1.92      0.27    0.786  1.48
Norleans   -1.54    1.71     -0.90    0.376  1.41

Regression Equation
share = 73.95 - 0.3221 year-1967 + 0.64 MIAMI
        + 0.53 LApsdna - 1.54 Norleans
1. H0: βMIAMI = βLApsdna = βNorleans = 0
   H1: not H0.
   (Note: This is a subset of the coefficients in the current model!!)
   Equivalent numerator: [5.372 + 8.260 + 11.382]/3
   F-statistic:
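The blank above can be completed with a short calculation (my sketch, using only numbers already in the ANOVA table; scipy is used to reproduce the F critical value that Minitab gives below):

from scipy import stats

num      = (5.372 + 8.260 + 11.382) / 3   # average SS of the 3 tested coefficients
ms_error = 436.518 / 31                   # = 14.081
F = num / ms_error
print(F)                                  # about 0.59
print(stats.f.ppf(0.80, 3, 31))           # F(alpha = 0.2; 3 & 31 df) is about 1.64
# Since F = 0.59 < 1.64, accept H0: once year is in the model,
# the three location indicators add no significant information.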
Using MINITAB to Look Up F Critical Values: Here I'm finding F(α = 0.2; 3 & 31 d.f.) = 1.6413
Bruce Cooil, 2016
Page 128 of 156
d. In This Example:
Test Statistic:
   χ² = (50−40)²/40 + (30−20)²/20 + (40−60)²/60 + (80−80)²/80 = 14.2
Critical Value: χ²0.1(3 d.f.) = 6.25 (Table A.5, p. 610, Appendix A).
Conclusion: Reject H0.
There has been a change in market shares at the 0.1 level.
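For reference, the same goodness-of-fit test can be run in Python (a sketch, not part of the lecture); the observed and expected counts below are read directly off the test statistic above.

from scipy import stats

observed = [50, 30, 40, 80]
expected = [40, 20, 60, 80]
chi2, p = stats.chisquare(observed, f_exp=expected)
print(chi2, p)   # chi-square is about 14.2 with 3 d.f.; p < 0.01, so reject H0 at the 0.1 level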
2.
b. Example: Suppose we want to test whether age (as measured by 3 age groups)
and soap brand preference are independent (α = 0.1):
H0: Age group and brand preference are independent
H1: Age group and brand preference are dependent.
Page 130 of 156
                        Brand Preferred
                 A         B       Other    Row Total (Ri)
< 25          10  (6)    5  (9)   15 (15)        30
25 to 40       5  (6)   10  (9)   15 (15)        30
> 40           5  (8)   15 (12)   20 (20)        40
Column
Total (Cj)    20        30        50            100
This table shows the number of people in each age and brand-preference
category. The numbers in parentheses represent the estimated expected cell
counts, Eij, that we would have if age and brand preference were independent:
   Eij = total*(prob. of being in age group)*(prob. of preferring that brand)
       = n*(Ri/n)*(Cj/n) = (Ri*Cj)/n.
For example:
   6 = (30*20)/100 ,  9 = (30*30)/100 , ... ,  20 = (40*50)/100.
c. Mechanics of the Test: We reject H0 in favor of H1 if the observed and
expected counts are sufficiently different. As before, we use the chi-square
statistic:
   χ² = Σ(all cells) [(Observed Count) − (Expected Count)]²/[Expected Count].
If H0 is true, χ² should behave approximately like a sum of (r−1)*(c−1) squared
independent standard normal random variables. Formally, we reject H0 in favor of
H1 if:
   χ² > χ²α (with (r−1)*(c−1) d.f.).
d. Test Statistic in this case:
   χ² = (10−6)²/6 + (5−9)²/9 + (5−6)²/6 + (10−9)²/9 + (5−8)²/8 + (15−12)²/12 = 6.6
   (the three terms for the "Other" column are all 0).
Critical Value: χ²0.1(4 d.f.) = 7.8 (Table A.5, p. 610, Appendix A).
Conclusion: Accept H0.
Age and brand preference are not significantly dependent.
e. Assumption: All expected counts should be at least 5.
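The same independence test in Python (a sketch): scipy's chi2_contingency takes the observed table and also returns the expected counts shown in parentheses above.

import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[10,  5, 15],    # < 25
                  [ 5, 10, 15],    # 25 to 40
                  [ 5, 15, 20]])   # > 40
chi2, p, df, expected = chi2_contingency(table, correction=False)
print(chi2, df, p)   # chi-square is about 6.6 with 4 d.f.; p is about 0.16 > 0.1, so accept H0
print(expected)      # matches the expected counts in parentheses above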
Page 131 of 156
Lecture 13
Executive Summary of Managerial Statistics (Mgt. 6381)
& Notes for the Final Quiz
1.
2.
3.
T-tests are used for tests on individual coefficients, and F-tests are used when testing groups of coefficients.
b.
c.
e.
B.
C.
D.
II.
III.
Seven Cases for Tests on Means and Proportions and Confidence Intervals
(Lectures 3-5 & The Outline)
Purposes
1.
Prediction of Y
2.
3.
4.
Descriptive Summary
IV. Regression
B.
(Continued)
Checking Assumptions
Using the plot of standardized residuals versus fit to check linearity,
homoscedasticity, and residual randomness.
C.
Interpretation of Coefficients
D.
2.
3.
4.
E.
Lack-of-Fit test
2.
3.
4.
5.
F.
95% C.I. for the mean of Y: Fit ± t0.025(df of error) [SE Fit]
G.
95% P.I. for a new value of Y: Fit ± t0.025(df of error) √[(SE Fit)² + MS(Error)]
A good criterion for finding the best predictive model (i.e., the
model that will minimize mean squared error when used to
predict y using new data): maximize Predicted R-square:
Predicted R-squared = 1 − PRESS/SS(Total)
Cp = SS(Error)/MS(Error for Model with All Predictors) − (n − 2(k+1))
2.
3.
4.
H.
2.
3.
Page 138 of 156
B.
C.
Page 139 of 156
Review Questions
(45)
1. Please refer to the analysis on page 1 of the Appendix. Here I am studying models
that predict the gross receipts from the opening release of a movie (represented by the
variable OPENING). Please use the best subsets analysis on the bottom of page 1 of
the Appendix to answer the following questions.
(6) a. Based on the best subsets analysis, which model minimizes the variance of the error (or
residuals)? (To identify the model, please specify the number of predictors and whether it
is the first or second model of that type.) Also, please specify the variance of the error.
Model Choice: the 1st model with 3 predictors (the one with the smallest S = 17.076).
Variance of the error: (17.076)² = 291.6
(6) d. Assume that the correlation between OPENING and BUDGET is positive. Is there
enough information in the best-subsets analysis to deduce the actual value of the sample
correlation between these two variables? If so, please specify its value.
Please circle one:   YES      NO (there is not enough information)
If YES, give the value of the correlation between OPENING and BUDGET: √0.219 = 0.47
(6) e. Assume that the correlation between OPENING and SUMMER is positive. Is there
enough information in the best-subsets analysis to deduce the actual value of the sample
correlation between these two variables? If so, please specify its value.
Please circle one:   YES      NO (there is not enough information)
If YES, give the value of the correlation between OPENING and SUMMER: √0.030 = 0.17
(9) f. Use the Sp* criterion to choose the best predictive model from among the following two
alternatives on page 1 of the Appendix: (1) the best predictive model according to Cp and
(2) the best predictive model according to R-sq(pred). Which of these two models has the
best Sp* value?
Model 1 (1st with 2 preds) is the better predictive model according to Sp*
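The arithmetic behind this answer (my worked version, using the Sp* = MS(Error)/(n − k − 2) form that appears in the answer to question h, with n = 32): the best model by Cp is the 1st model with 2 predictors (Cp = 2.6) and the best by R-sq(pred) is the 1st model with 3 predictors (R-sq(pred) = 13.4), so

   Sp*(1st model with 2 predictors) = 17.256²/(32 − 2 − 2) = 297.8/28 ≈ 10.6
   Sp*(1st model with 3 predictors) = 17.076²/(32 − 3 − 2) = 291.6/27 ≈ 10.8

and the 2-predictor model has the smaller (better) Sp*.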
Page 140 of 156
(6) g. In the best subsets analysis, consider the last model with all 4 predictors. Is it possible to
deduce whether the t-ratio of the coefficient of SUMMER is greater than 1 in absolute
value? (If it is not possible, please specify what additional analysis is needed.)
Please circle one: For predictor SUMMER in the 4-predictor model, |t-ratio| is:   > 1      < 1
Explain how you know or what additional information is needed:
1st model with 3 predictors has all the same predictors except for
SUMMER and it has a lower MS(Error) .
(45) 2. Now consider the simple linear regression model on page 2 of the Appendix.
(12)a. As a predictive model for OPENING, this is a dreadful model. Assuming that I knew
beforehand that this model would not have much predictive value, is there any rational
reason for doing the regression analysis on page 2? (Please explain very briefly.)
H0: βSTAR = 0     H1: βSTAR ≠ 0
P-value: 0.994
Conclusion: Accept H0
(12)e. Now briefly consider the two predictor model on page 3 of the Appendix. Here
OPENING is regressed on STAR and BUDGET. A superstar's agent concedes that this
model shows that adding a superstar to a movie actually results in a decrease of 13.2
million in OPENING receipts, assuming we hold BUDGET fixed. Nevertheless, this
agent argues that this decrease in OPENING can be prevented by increasing the total
BUDGET sufficiently, and that this alone shows that movies with superstars can make
more money than movies without superstars. Is there anything wrong with this
argument? (Please explain briefly and assume that the regression model itself is valid.)
Is there anything wrong with this argument? YES
(Please circle your answer.)
NO
Brief explanation:
The following questions refer to the model on page 4 of the Appendix. Here the dependent
variable, LN(Opening), is the natural logarithm of OPENING.
(3) a. What specific problem does taking the log of the dependent variable usually help solve in a
regression analysis?
H0: βSTAR = 0     H1: βSTAR < 0
P-value: 0.187/2 = 0.0935 < 0.1
Conclusion: Reject H0
(9) e. On average, does the value of LN(Opening) increase significantly more than 0.01 units for
every 1 unit increase in BUDGET at the 0.05 level, assuming we hold the other predictors
constant? Please specify:
H0: βBudget = 0.01        H1: βBudget > 0.01
Test statistic: t = (0.01937 − 0.01)/0.00426 ≈ 2.2
Critical Value: t0.05(27 d.f.) ≈ 1.70
Conclusion: Reject H0 (Yes)
H1: not H0
Test statistic: F = [(11.3039 − 9.7696)/3] / 0.42072 ≈ 1.2
Critical Value:
Conclusion: Accept H0
(9) g. From the analysis on page 4 of the Appendix, can we determine what the value of
R2(unadjusted) would be if we dropped BUDGET from this model and only did the
regression on the other three predictors (with a constant)? (If it is possible, please give the
value of R2(unadjusted) for this simpler model.)
R-squared (Unadjusted) =
(6) h. If I find a model for LN(Opening) that uses 5 predictors (and a constant) and reduces
MS(Error) to 0.400, is this new model of greater predictive value than the model on page 4
of the Appendix? (Assume that both models satisfy the assumptions of regression and
please justify your answer!)
Please circle I or II:   Best predictive model:   I.    II.
Justification (compare Sp* = MS(Error)/(n − k − 2)):
New 5-predictor model: 0.4/(32 − 7) = 0.0160
Model on page 4 of the Appendix: 0.42072/(32 − 6) = 0.0162
Page 143 of 156
(20) 4. A company wants to test whether the proportions of defective items in 3 large shipments are
the same. A random sample of 100 items is drawn from each shipment, and the results are
summarized below.

               Shipment 1   Shipment 2   Shipment 3   Total
Defective        10  (10)     15  (10)      5  (10)     30
Not Defective    90  (90)     85  (90)     95  (90)    270
Total           100          100          100          300

Record the expected counts in each category above and do the following test.
(The expected counts, in parentheses, are 10 = 30*100/300 and 90 = 270*100/300.)

H0 : p1 = p2 = p3 (OR: Shipment is independent of whether or not an item is defective)
H1 : not H0
Please specify:
Test statistic:
   χ² = (15−10)²/10 + (5−10)²/10 + (85−90)²/90 + (95−90)²/90 = 5.6
   (the two terms for Shipment 1 are 0)
Critical value: χ²(2 df, α = 0.1) = 4.6
Conclusion: Reject H0
Appendix, page 1

Column Count
Column  Count  Name
T C1       32  Movie
C2         32  Opening
C3         32  Budget
C6         32  Star
C7         32  Summer
C8         32  StarXSummer

Best Subsets Analysis (response is Opening; candidate predictors: Budget, Star, Summer, StarXSummer)

                R-Sq    R-Sq   Mallows
Vars   R-Sq   (adj)   (pred)       Cp       S   Predictors in the model
   1   21.9    19.3     12.3      3.8  17.900   Budget
   1    3.0     0.0      0.0     11.6  19.950   Summer
   2   29.9    25.0      9.6      2.6  17.256   Budget, Star
   2   23.6    18.3      8.7      5.2  18.010
   3   33.7    26.6     13.4      3.0  17.076   Budget, Star, StarXSummer
   3   32.2    24.9      7.3      3.7  17.272
   4   33.8    24.0      4.1      5.0  17.374   Budget, Star, Summer, StarXSummer
Page 145 of 156
Appendix, page 2 (simple regression of Opening on Star)

Analysis of Variance
Source      DF   Adj SS   Adj MS  F-Value  P-Value
Regression   1      0.0    0.024     0.00    0.994
  Star       1      0.0    0.024     0.00    0.994
Error       30  12313.9  410.462
Total       31  12313.9

Model Summary
      S   R-sq  R-sq(adj)  R-sq(pred)
20.2599  0.00%      0.00%       0.00%

Coefficients
Term      Coef  SE Coef  T-Value  P-Value   VIF
Constant 20.30     4.65     4.37    0.000
Star      0.06     7.29     0.01    0.994  1.00

Regression Equation
Opening = 20.30 + 0.06 Star

Fits and Diagnostics for Unusual Observations
                            Std
Obs  Opening    Fit  Resid  Resid
 21    92.73  20.30  72.43   3.67  R
 22    84.13  20.30  63.83   3.24  R
R  Large residual
Appendix, page 3 (regression of Opening on Star and Budget)

Analysis of Variance
Source         Adj MS  F-Value  P-Value
Regression     1839.5     6.18    0.006
  Star          977.6     3.28    0.080
  Budget       3679.0    12.36    0.001
Error           297.8
  Lack-of-Fit   240.6     0.54    0.879
  Pure Error    447.7

Model Summary
R-sq(adj)  R-sq(pred)
   25.04%       9.58%

Coefficients
Term       Coef  SE Coef  T-Value  P-Value   VIF
Constant   3.33     6.24     0.53    0.598
Star     -13.15     7.26    -1.81    0.080  1.37
Budget    0.398    0.113     3.52    0.001  1.37

Regression Equation
Opening = 3.33 - 13.15 Star + 0.398 Budget

Fits and Diagnostics for Unusual Observations
                            Std
Obs  Opening    Fit  Resid  Resid
 21    92.73  32.35  60.38   3.67  R
 22    84.13  39.11  45.02   2.83  R
R  Large residual
Appendix, page 4 (regression of LN(Opening) on Budget, Star, Summer, and StarXSummer)

Analysis of Variance
Source          Seq MS  F-Value  P-Value
Regression     2.82597     6.72    0.001
  Budget       9.76959    23.22    0.000
  Star         0.32415     0.77    0.388
  Summer       0.84325     2.00    0.168
  StarXSummer  0.36691     0.87    0.359
Error          0.42072
  Lack-of-Fit  0.45290    24.57    0.040
  Pure Error   0.01843

Model Summary
R-sq(adj)  R-sq(pred)
   42.45%      29.41%

Coefficients
Term           Coef  SE Coef  T-Value  P-Value   VIF
Constant      1.623    0.262     6.20    0.000
Budget      0.01937  0.00426     4.55    0.000  1.37
Star         -0.490    0.362    -1.35    0.187  2.40
Summer        0.147    0.302     0.49    0.631  1.73
StarXSummer   0.439    0.471     0.93    0.359  2.88

Regression Equation
LN(Opening) = 1.623 + 0.01937 Budget - 0.490 Star + 0.147 Summer
              + 0.439 StarXSummer

Fits and Diagnostics for Unusual Observations
Obs  LN(Opening)    Fit   Resid  Std Resid
  3        0.813  2.049  -1.236      -2.02  R
 21        4.530  3.037   1.493       2.48  R
R  Large residual
The Outlines
Tests Concerning Means and Proportions
and Confidence Intervals (outline of the
ideas in Lectures 2-5)
Outline of Methods for Regression
(summary of inferential methods in
Lectures 6-11)
Case  H0              Test Statistic                                  H1             Critical Region (When to Reject H0)
 1    μ = μ0          z = (x̄ − μ0) / (s/√n)                           μ < μ0         z < −zα
      (σ unknown, n > 30)                                             μ > μ0         z > zα
                                                                      μ ≠ μ0         |z| > zα/2
 2    μ = μ0          t = (x̄ − μ0) / (s/√n)                           μ < μ0         t < −tα(n − 1)
      (small sample)                                                  μ > μ0         t > tα(n − 1)
                                                                      μ ≠ μ0         |t| > tα/2(n − 1)
 3    p = p0          z = (p̂ − p0) / √[p0(1 − p0)/n]                  p < p0         z < −zα
                                                                      p > p0         z > zα
                                                                      p ≠ p0         |z| > zα/2
 4    μ1 − μ2 = D0    z = [(x̄1 − x̄2) − D0] / √[s1²/n1 + s2²/n2]       μ1 − μ2 < D0   z < −zα
      (two large samples; Sec. 10.1, p. 372, with σ1 and σ2           μ1 − μ2 > D0   z > zα
       replaced by s1 and s2, & p. 376)                               μ1 − μ2 ≠ D0   |z| > zα/2
Alternative tests that are especially appropriate when at least one sample is not large: a) Welch's t-test [MINITAB's
default approach, see shaded box on p. 376]: this uses the Case 4 test statistic as a t-statistic with a special formula for
the degrees of freedom; b) Mann-Whitney test, also known as the Wilcoxon Rank-Sum test [In MINITAB: Stat >
Nonparametrics > Mann-Whitney].
D0 is often zero, but refers to whatever constant we choose to use on the right side of the equation in H0.
Page 151 of 156
Case  H0              Test Statistic                                        H1             Critical Region (When to Reject H0)
 5    μ1 − μ2 = D0    t = [(x̄1 − x̄2) − D0] / [sp √(1/n1 + 1/n2)]            μ1 − μ2 < D0   t < −tα(n1 + n2 − 2)
      (appropriate if σ1 = σ2; Sec. 10.1, p. 375:                           μ1 − μ2 > D0   t > tα(n1 + n2 − 2)
       σ1 = σ2, independent samples)                                        μ1 − μ2 ≠ D0   |t| > tα/2(n1 + n2 − 2)
 6    p1 − p2 = D0    z = [(p̂1 − p̂2) − D0] / √[p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2]   p1 − p2 < D0   z < −zα
      (the denominator is typically calculated with the pooled              p1 − p2 > D0   z > zα
       estimate of p̂ when D0 = 0; see 1st note, near bottom of p. 390)      p1 − p2 ≠ D0   |z| > zα/2
 7    μ1 − μ2 = D0    Paired Samples: Sec. 10.2, p. 383

D0 is often zero, but refers to whatever constant we choose to use on the right side of the equation in H0.
CONFIDENCE INTERVALS
(To Accompany: "Tests Concerning Means and Proportions")

Case #  Parameter   Confidence Interval
  1     μ           x̄ ± zα/2 (s/√n)
  2     μ           x̄ ± tα/2(n − 1) (s/√n)      (small sample)
  3     p           p̂ ± zα/2 √[p̂(1 − p̂)/n]
  4     μ1 − μ2     (x̄1 − x̄2) ± zα/2 √[s1²/n1 + s2²/n2]
  5     μ1 − μ2     (x̄1 − x̄2) ± tα/2(n1 + n2 − 2) sp √(1/n1 + 1/n2)
  6     p1 − p2     (p̂1 − p̂2) ± zα/2 √[p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2]
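A small Python sketch (mine) of two of the intervals above, using scipy for the t quantiles; the illustration reuses the site-1 Super Bowl shares (n = 7, mean 71.00, SD 5.10). Note the result is wider than the interval in the Minitab means table earlier, because that table pools the standard deviation across all four sites.

from math import sqrt
from scipy import stats

def one_mean_ci(xbar, s, n, conf=0.95):
    # Case 2: xbar +/- t(alpha/2, n-1) * s/sqrt(n)
    t = stats.t.ppf(1 - (1 - conf) / 2, n - 1)
    half = t * s / sqrt(n)
    return xbar - half, xbar + half

def two_mean_pooled_ci(x1, s1, n1, x2, s2, n2, conf=0.95):
    # Case 5: (x1bar - x2bar) +/- t(alpha/2, n1+n2-2) * sp*sqrt(1/n1 + 1/n2)
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    t = stats.t.ppf(1 - (1 - conf) / 2, n1 + n2 - 2)
    half = t * sp * sqrt(1 / n1 + 1 / n2)
    return (x1 - x2) - half, (x1 - x2) + half

print(one_mean_ci(71.00, 5.10, 7))   # roughly (66.3, 75.7)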
b ± t0.025(df of error) [sb],                                            (3)
where b and sb are the coefficient estimate and the estimated standard error
of that coefficient, respectively. Both are provided in the coefficient table.

F = ([SS(Regression) − SS(Regression, Reduced Model)] / m) / MS(Error)
  = ([SS(Error, Reduced Model) − SS(Error)] / m) / MS(Error).            (5)
Rationale:
Here the Reduced Model refers to the model without the m predictors
that are being tested. This test evaluates the combined marginal value (or
incremental value) of the m predictors. SS(Regression, Reduced Model)
and SS(Error, Reduced Model) refer to the sum of squared regression
and the sum of squared error, respectively, of the model without the m
Page 155 of 156
predictors that are being tested. We should reject H0 if the test statistic, F,
is significantly greater than 1.
Decision Rule:
If F > Fα(m, df of error), reject H0 and conclude that there is useful
additional information in the subgroup of predictors (otherwise accept H0).
(Usually the p-value is not available for this test.)
95% C.I. for the mean of Y:    Fit ± t0.025(df of error) [SE Fit]                   (6)
95% P.I. for a new value of Y: Fit ± t0.025(df of error) √[(SE Fit)² + MS(Error)]   (7)
This accommodates both the error due to the estimation of coefficients and
the intrinsic uncertainty in the model (the variance of the error term).
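Finally, a sketch (mine, not from the notes) of formulas (6) and (7) as a small Python function; the inputs are the fitted value, its standard error, MS(Error), and the error degrees of freedom, all of which appear in standard regression output.

from math import sqrt
from scipy import stats

def ci_and_pi(fit, se_fit, ms_error, df_error, conf=0.95):
    t = stats.t.ppf(1 - (1 - conf) / 2, df_error)
    ci = (fit - t * se_fit, fit + t * se_fit)      # (6): interval for the mean of Y
    half = t * sqrt(se_fit**2 + ms_error)          # (7): adds the model's error variance
    pi = (fit - half, fit + half)                  # interval for a single new Y
    return ci, pi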