
ECON1203/ECON2292
Business and Economic
Statistics
Week 12
Week 12 topics
Regression case study
Chi-squared distribution
Hypothesis test for a population variance
Chi-squared test of goodness of fit
Chi-squared test of a contingency table
Key references
Berman, Brooks & Davidson (2000)
Keller 4.6, 17.2, 12.2, 15.1-15.2
Sydney Olympic Games and
the stock market
Did the Sydney Olympic Games announcement have a
positive stock market impact?
BDD use market model in finance with added dummy
variable for announcement effect
$$R_{it} = \beta_0 + \beta_1 R_{mt} + \beta_2 D_t + \varepsilon_t$$
$R_{it}$ = daily return on an industry $i$ accumulation index
$R_{mt}$ = market return on All Ordinaries accumulation index
$D_t$ = 1 if time $t$ is the day of the Games announcement on 23 Sept 1993; = 0 otherwise
BDD use daily data 4 Jan 1988 to 29 Nov 1996 & for
several industries
Sydney Olympic Games & the
stock market
Augmented market model was estimated for 23 industry
portfolios including banks & building materials
Which of these 2 industries would you expect to be impacted
more?
Banks:
$$R_t = \underset{(0.0570)}{0.0002} + \underset{(0.0000)}{1.0690}\,R_{mt} + \underset{(0.7987)}{0.0016}\,D_t$$
Building materials:
$$R_t = \underset{(0.7383)}{0.0000} + \underset{(0.0000)}{0.9346}\,R_{mt} + \underset{(0.0003)}{0.0202}\,D_t$$
Note: numbers in ( ) are p-values
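One way to read the dummy coefficient is as the shift in the fitted return on the announcement day. The sketch below simply plugs the estimates reported above into the augmented market model. The function name `fitted_return` and the 1% market return are illustrative assumptions, not part of BDD's analysis.

```python
# Estimates reported above: (b0, b1, b2) for each industry.
ESTIMATES = {
    "banks": (0.0002, 1.0690, 0.0016),
    "building_materials": (0.0000, 0.9346, 0.0202),
}

def fitted_return(industry, market_return, announcement_day):
    """Fitted industry return b0 + b1*R_mt + b2*D_t (error term set to zero)."""
    b0, b1, b2 = ESTIMATES[industry]
    d = 1 if announcement_day else 0  # the dummy variable D_t
    return b0 + b1 * market_return + b2 * d

# Hypothetical market return of 1% (0.01), used only for illustration.
for name in ESTIMATES:
    normal_day = fitted_return(name, 0.01, False)
    announce_day = fitted_return(name, 0.01, True)
    # The difference between the two fitted values is exactly b2.
    print(name, round(normal_day, 4), round(announce_day, 4),
          round(announce_day - normal_day, 4))
```

Whatever market return is chosen, the gap between the two fitted values equals the dummy coefficient, which is why the test of a zero coefficient on $D_t$ is the test of an announcement effect.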
Sydney Olympic Games & the
stock market
Did the announcement have
a significant impact for both
industries?
Note reported p-values refer to
null of zero coefficient
BDD found only 4 industries for which there was a significant impact:
building materials, developers & contractors, engineering & other services
Sydney Olympic Games & the
stock market
$\beta_1$ parameters represent sensitivity of each stock to the overall market
Banks have $b_1 = 1.069$
This industry has rate of return that is more sensitive to changes in the overall market than is the average stock
Converse true for building materials where $b_1 = 0.9346$
Thus $H_0: \beta_1 = 1$ is an interesting hypothesis
Could you use reported information to test this hypothesis?
No! Not with just reported p-values of 0.0000 for $H_0: \beta_1 = 0$
Good presentation involves reporting standard errors (ses)
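As a sketch of why standard errors matter here: if $\mathrm{se}(b_1)$ were reported (it is not, so the symbol below stands for an unknown quantity), the hypothesis that the slope equals 1 could be tested with the usual t statistic, using $n - 3$ degrees of freedom because the model has three coefficients.

```latex
t = \frac{b_1 - 1}{\mathrm{se}(b_1)} \sim t_{n-3} \quad \text{under } H_0: \beta_1 = 1
```

The reported p-values instead correspond to $t = b_1 / \mathrm{se}(b_1)$, and a p-value rounded to 0.0000 does not pin down $\mathrm{se}(b_1)$.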
Some different problems
Previously have concentrated on inference
associated with location problems
Means, proportions & conditional means (regression)
Many other interesting statistical problems
P1: CPRepairs has fluctuating demand for vehicle
repairs necessitating paying staff overtime
Changing variability could present peak load problems for
staff availability &/or morale
Inference problem: Has there been a change in the
variance of total overtime hours?
Some different problems
P2: CPRepairs concerned about consumer
satisfaction with service
Industry benchmarks for satisfaction levels are available
Inference problem: Do CPRepairs survey results differ from industry benchmarks?
P3: CPRepairs services a range of customer types
(private, business, government)
If split customer satisfaction results by type can we observe
differences?
Inference problem: Is customer satisfaction independent of
customer type?
Hypotheses about variances
In testing hypotheses about variances the obvious test statistic is based on $s^2$
In order to compare $s^2$ with $\sigma^2$ need appropriate sampling distribution
Need to consider a new distribution
If have random sampling from a normal population
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$$
Standardized test statistic is distributed Chi-squared with $n-1$ degrees of freedom
Hypotheses about variances
Chi-squared distribution is skewed to the right & depends on df but process of testing hypotheses is conceptually same as before
Consider
$$H_0: \sigma^2 = \sigma_0^2 \qquad H_1: \sigma^2 < \sigma_0^2$$
where $\sigma_0^2$ = hypothesized value
For specified $\alpha$ rejection rule is:
$$\text{Reject } H_0 \text{ if } \chi^2 = \frac{(n-1)s^2}{\sigma_0^2} < \chi^2_{1-\alpha,\,n-1}$$
P1: CPRepairs overtime
Staff numbers set assuming total of 50 hours
overtime per week & variance of 25
Is there evidence of a different variance?
Assume overtime hours per week is approximately normal
Choose $\alpha = .10$
Sample of 12 weeks produces $s^2 = 28.1$
Need Chi-squared distribution with (12 − 1) = 11 degrees of freedom
What are relevant percentage points?
Chi-squared critical values
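The table of critical values is not reproduced in this text. As a hedged sketch, the percentage points can be approximated numerically with the Python standard library alone: `chi2_cdf` uses the series expansion of the regularized incomplete gamma function and `chi2_upper_tail_point` inverts it by bisection. Both function names are illustrative, and the subscript is taken to mean the area to the right of the point, which is the convention these slides appear to follow.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for a Chi-squared variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total and n < 10000:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_upper_tail_point(area, df):
    """Value with the given area to its right, found by bisection."""
    target = 1.0 - area
    lo, hi = 0.0, df + 20.0 * math.sqrt(2.0 * df) + 20.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Percentage points needed for a two-tailed test at alpha = .10 with 11 df.
lower = chi2_upper_tail_point(0.95, 11)
upper = chi2_upper_tail_point(0.05, 11)
print(round(lower, 2), round(upper, 2))
```

In practice a printed table or a statistics package would be used; the point of the sketch is only that the two tail points are not symmetric about the mean of the distribution.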
P1: CPRepairs overtime...
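$$H_0: \sigma^2 = 25 \qquad H_1: \sigma^2 \neq 25$$
Two tailed test, $\alpha/2 = 0.05$
$$\chi^2_{1-.05,\,12-1} = \chi^2_{.95,11} = 4.57 \quad \& \quad \chi^2_{.05,11} = 19.68$$
$$\text{Reject } H_0 \text{ if } \frac{(n-1)s^2}{\sigma_0^2} \text{ is } < 4.57 \text{ or } > 19.68$$
$$\text{As } \frac{(n-1)s^2}{\sigma_0^2} = \frac{(12-1)\,28.1}{25} = 12.36 \text{ do not reject } H_0$$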
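The arithmetic of this test can be sketched in a few lines. The critical values 4.57 and 19.68 are typed in from the Chi-squared table for 11 degrees of freedom; the function names are illustrative.

```python
def variance_test_statistic(n, s2, sigma2_0):
    """Chi-squared statistic (n - 1) * s^2 / sigma_0^2."""
    return (n - 1) * s2 / sigma2_0

def two_tailed_decision(stat, lower_crit, upper_crit):
    """Reject H0 when the statistic falls in either tail."""
    if stat < lower_crit or stat > upper_crit:
        return "reject H0"
    return "do not reject H0"

# P1: n = 12 weeks, s^2 = 28.1, hypothesized variance 25.
stat = variance_test_statistic(n=12, s2=28.1, sigma2_0=25)
decision = two_tailed_decision(stat, lower_crit=4.57, upper_crit=19.68)
print(round(stat, 2), decision)
```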
P1: CPRepairs overtime
Alternatively could construct a CI for $\sigma^2$
$$\Pr\left(\chi^2_{1-\alpha/2,\,n-1} < \chi^2 < \chi^2_{\alpha/2,\,n-1}\right) = 1-\alpha, \text{ or } \Pr\left(\chi^2_{1-\alpha/2,\,n-1} < \frac{(n-1)s^2}{\sigma^2} < \chi^2_{\alpha/2,\,n-1}\right) = 1-\alpha$$
then rearranging
$$\Pr\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right) = 1-\alpha$$
& the $(1-\alpha)100\%$ CI is
$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right)$$
For P1 90% CI is
$$\left(\frac{(12-1)\,28.1}{19.68},\ \frac{(12-1)\,28.1}{4.57}\right) \text{ or } (15.71,\ 67.64)$$
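The interval for P1 can be checked with a short sketch. As before, the two critical values come from the Chi-squared table for 11 degrees of freedom, and `variance_ci` is an illustrative name.

```python
def variance_ci(n, s2, upper_tail_crit, lower_tail_crit):
    """CI for a population variance: divide (n - 1) * s^2 by each critical value.

    The larger critical value gives the lower limit and vice versa.
    """
    numerator = (n - 1) * s2
    return numerator / upper_tail_crit, numerator / lower_tail_crit

# P1: 90% CI with chi2_{.05,11} = 19.68 and chi2_{.95,11} = 4.57.
low, high = variance_ci(12, 28.1, upper_tail_crit=19.68, lower_tail_crit=4.57)
print(round(low, 2), round(high, 2))

# The hypothesized variance of 25 lies inside the interval.
print(low < 25 < high)
```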
P1: CPRepairs overtime
90% CI is (15.71, 67.64)
Notice the CI is not symmetric about $s^2$
Recall for population mean CI was sample mean ± margin of error
But for population variance the CI is ($s^2 - \text{error}_L < \sigma^2 < s^2 + \text{error}_U$) & $\text{error}_L \neq \text{error}_U$
CI includes $\sigma^2 = 25$ & conclude would not reject $H_0: \sigma^2 = 25$ at 10% level
While the point estimate of $\sigma^2$ is > 25 no statistical evidence of an increase in population variance
No evidence favouring a change in staff numbers
Chi-squared tests
Data often occurs in nominal (categorical) form
Private health insurance status & hospital type
Customer satisfaction surveys
There are several possible outcomes or categories
Categories are mutually exclusive & exhaustive
Think of each respondent/observation as being a trial
Recall binomial experiments; now the multinomial extension
Will often have expected or hypothesized distribution
of outcomes
Chi-squared tests
Want to compare observed & expected
distributions
Obviously could calculate differences in expected &
observed category frequencies
Inference problem is to determine whether those differences are statistically large
Chi-squared goodness of fit test used to
test if observed & expected distributions are
the same
Chi-squared tests
$H_0$ will specify probabilities $p_i$ that an observation falls into $i = 1, \ldots, c$ categories or cells
$H_0$ implies expected frequencies for sample of size $n$ ($e_i = p_i n$)
Assuming
Random sampling (independent trials)
Probabilities $p_i$ are constant over trials
Note, the test can be unreliable if any values of $e_i = p_i n$ get too small (e.g. 3 or 4)
Solution: merge categories where feasible
Chi-squared tests
The distribution theory underlying the test is not exact
It is large sample theory (a reason for above limitation)
Test statistic is given by
$$\chi^2 = \sum_{i=1}^{c} \frac{(o_i - e_i)^2}{e_i} \sim \chi^2_{c-1}$$
i.e. the statistic should behave like an observation from Chi-squared $(c-1)$ if null hypothesis is correct, where
$o_i$ = observed frequency in cell $i$
$e_i$ = expected frequency in cell $i$
P2: CPRepairs benchmarking
consumer satisfaction
In a national survey of all auto repair centres
customers were asked:
How would you rate the level of service provided by your
repair centre?
Distribution of responses:
Excellent (8%), Very good (47%), Fair (34%), Poor (11%)
CPRepairs conducted their own survey of 207
customers to compare with national results
Observed response frequencies:
Excellent (21), Very good (109), Fair (62), Poor (15)
P2: CPRepairs benchmarking
consumer satisfaction
Hypotheses
$H_0$: CPRepairs distribution of customer satisfaction is the same as the national distribution for all auto repairers
$p_1 = .08$, $p_2 = .47$, $p_3 = .34$, $p_4 = .11$
$H_1$: CPRepairs distribution is not the same as the national distribution
Test procedure
As $c = 4$ test has 3 degrees of freedom
Choose $\alpha = 0.05$
Decision rule is:
Reject $H_0$ if $\chi^2 > \chi^2_{.05,3} = 7.815$
P2: CPRepairs benchmarking
consumer satisfaction
Response     Observed frequency (o_i)   Expected frequency (e_i)   (o_i - e_i)^2 / e_i
Excellent     21                        .08 x 207 = 16.56          1.19
Very good    109                        .47 x 207 = 97.29          1.41
Fair          62                        .34 x 207 = 70.38          1.00
Poor          15                        .11 x 207 = 22.77          2.65
Total        207                        207                        $\chi^2$ = 6.25

Notice that observed frequencies tend to indicate higher levels of satisfaction compared to the national distribution

But as $\chi^2 = 6.25 < 7.815$ do not reject $H_0$ & conclude CPRepairs distribution of customer satisfaction responses is not statistically different from the distribution of national responses
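The goodness of fit calculation for P2 can be reproduced with the sketch below, which uses only the survey counts and national percentages given above. The function name `goodness_of_fit` is illustrative, and the critical value 7.815 is taken from the decision rule rather than computed.

```python
def goodness_of_fit(observed, probabilities):
    """Return expected frequencies, per-cell contributions and the statistic."""
    n = sum(observed)
    expected = [p * n for p in probabilities]  # e_i = p_i * n
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return expected, contributions, sum(contributions)

# Order of categories: Excellent, Very good, Fair, Poor.
observed = [21, 109, 62, 15]
national = [0.08, 0.47, 0.34, 0.11]

expected, contributions, chi2_stat = goodness_of_fit(observed, national)
for o, e, c in zip(observed, expected, contributions):
    print(o, round(e, 2), round(c, 2))

critical_value = 7.815  # chi-squared, area .05 to the right, 3 df
print(round(chi2_stat, 2), "reject H0" if chi2_stat > critical_value else "do not reject H0")
```

All four expected frequencies are well above the small values at which the large-sample approximation becomes unreliable, so no categories need merging here.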

Contingency tables
Recall SIA: private health insurance (PHI)
Survey data were summarized in a 2-way cross-tabulation
or contingency table
The 2 ways were PHI status & admission to hospital
PHI status had 2 levels (have PHI/don't have)
Admission had 3 levels (not admitted/admitted as private/admitted as public)
Previously used such tables as descriptive tools
Also checked whether events were independent
Now want to formally test whether random variables are
independent or not
Is there a relationship between the 2 categorical random
variables?
Contingency tables
Testing strategy is similar to that used for the
goodness of fit test
Compare observed cell frequencies with those expected
under null hypothesis of independence
How do you calculate the expected frequencies?
Previously these followed readily from the hypothesized
probability distribution
Now H
0
simply asserts independence
Recall what is required for independent events
$P(A \cap B) = P(A)P(B)$
Contingency tables
For a contingency table assume independence
Then use marginal (row & column) totals to generate
expected frequencies for each cell
Expected frequency for cell in row i & column j is:
$$e_{ij} = n\left(\frac{n_{i.}}{n}\right)\left(\frac{n_{.j}}{n}\right) = \frac{n_{i.}\,n_{.j}}{n}$$
where $n_{i.}$ = total obs. in row $i$, $i = 1, \ldots, r$
$n_{.j}$ = total obs. in column $j$, $j = 1, \ldots, c$
Contingency tables
Test statistic is now
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}} \sim \chi^2_{\nu}$$
where $o_{ij}$ = observed frequency of cell in row $i$ column $j$
$e_{ij}$ = expected frequency of cell in row $i$ column $j$
$\nu = (r-1)(c-1)$
P3: CPRepairs satisfaction by
consumer type
CPRepairs conducted their own survey to:
Benchmark their results versus national results (P2)
Investigate how well they were servicing different types of
customers (P3)
CPRepairs responses were classified into 3 types of
customers (private, business, government)
The 2-way contingency table is satisfaction response with 4
levels versus type with 3 levels
Is customer satisfaction independent of customer type?
P3: CPRepairs satisfaction by
consumer type
             Type
Response     Private   Business   Government   Total
Excellent       4         7          10          21
Very good      35        34          40         109
Fair           21        24          17          62
Poor            6         5           4          15
Total          66        70          71         207

This is the contingency table with cross-tabulation of
business types & satisfaction
These are the observed survey responses for CPRepairs
Now need to compare these with what would be
expected under independence
P3: CPRepairs satisfaction by
consumer type
             Type (expected frequencies under independence in parentheses)
Response     Private        Business       Government     Total
Excellent     4 (6.696)      7 (7.101)     10 (7.203)      21
Very good    35 (34.754)    34 (36.860)    40 (37.386)    109
Fair         21 (19.768)    24 (20.966)    17 (21.266)     62
Poor          6 (4.783)      5 (5.072)      4 (5.145)      15
Total        66             70             71             207

$$\chi^2 = \frac{(4 - 6.696)^2}{6.696} + \frac{(7 - 7.101)^2}{7.101} + \cdots + \frac{(4 - 5.145)^2}{5.145} = 4.5164$$

As $\chi^2 = 4.5164 < \chi^2_{.01,6} = 16.8119$ do not reject $H_0$ that type and satisfaction are independent
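The contingency table calculation for P3 can be sketched as follows: expected frequencies come from the row and column totals, and the statistic sums the squared differences over all 12 cells. The function names are illustrative, and the critical value 16.8119 is the one quoted above for 6 degrees of freedom.

```python
def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi2_independence(table):
    """Return the Chi-squared statistic and its degrees of freedom."""
    expected = expected_frequencies(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Rows: Excellent, Very good, Fair, Poor.
# Columns: Private, Business, Government.
observed = [
    [4, 7, 10],
    [35, 34, 40],
    [21, 24, 17],
    [6, 5, 4],
]

stat, df = chi2_independence(observed)
critical_value = 16.8119  # chi-squared, area .01 to the right, 6 df
print(round(stat, 4), df, "reject H0" if stat > critical_value else "do not reject H0")
```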



Further quantitative course
options
Second year
Introductory Econometrics
Statistics for Econometrics
Business Forecasting
Third year
Econometric Methods
Econometric Theory
Financial Econometrics
Honours
Applied Econometrics
Advanced Econometric
Theory
Elements of Econometrics
Microeconometric Modelling
of Choice
