Is there a correlation/association/relationship/interaction
between the variables?
Dependence: a relationship exists between variables.
Independence: no relationship exists between variables.

Positive relationship: as X increases, Y increases.
Negative relationship: as X increases, Y decreases.
No relationship: as X increases, Y doesn't change.
The Y axis is the DEPENDENT variable (Cost), which is influenced.
[Table: tests by variable type (Numerical or Categorical) for the Independent Variable (X) and the dependent variable. This section covers Chi-Square.]
Format convention
The independent variable is on the horizontal axis (X)
The dependent variable is on the vertical axis (Y).
Ex. Variables GENDER and VIEW OF LIFE
Which sentence makes more sense?
Does gender have an effect on view of life (Is life exciting, routine, or dull?)
Does view of life (Is life exciting, routine, or dull?) have an effect on gender?
Bad Presentation
Good Presentation
No Relationship
The 100% stacked bar chart does NOT significantly CHANGE for different categories of the IV.
Relationship
The 100% stacked bar chart DOES significantly CHANGE for different categories of the IV (at least one has to change for some relationship to be detected).
A sample is taken and organized into a two-way contingency table.

                 House Location
House Style      Urban   Rural   Total
Split-Level        63      49     112
Ranch              15      33      48
Total              78      82     160

Research Question: Is there a difference in house styles (DV) at different locations (IV)?
Is there a significant difference?
[Figure: bar chart of Split and Ranch house counts for Urban and Rural locations]
$$\chi^2_{\text{statistic}} = \sum_{\text{each cell}} \frac{(\text{observed} - \text{Expected})^2}{\text{Expected}}$$

Remember to state your result in the context of the specific problem!
df = (#rows - 1) x (#columns - 1)
Solution
Step 3: Find χ² significant
df = (#rows - 1)(#columns - 1) = (2 - 1)(2 - 1) = 1
α = .05 (default value)
$\chi^2_{\text{significant}} = \chi^2_{\alpha,df} = \chi^2_{.05,1} = 3.841$
Step 4: Compare χ² statistic to χ² significant
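Observed

                 Location
Style     Urban   Rural   Total
Split       63      49     112
Ranch       15      33      48
Total       78      82     160

Expected (Split, Urban):

$$E = \frac{112 \times 78}{160} = 55$$

Expected

                 Location
Style     Urban   Rural   Total
Split       55      57     112
Ranch       23      25      48
Total       78      82     160

$$\chi^2_{\text{statistic}} = \sum_{\text{each cell}} \frac{(\text{observed} - \text{Expected})^2}{\text{Expected}} = \frac{(63-55)^2}{55} + \frac{(49-57)^2}{57} + \frac{(15-23)^2}{23} + \frac{(33-25)^2}{25} = 7.62$$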
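The expected counts and the statistic above can be reproduced with a short script. This is a minimal sketch, not part of the original slides; the function names `expected_counts` and `chi_square` are made up for illustration. It computes the statistic twice: once with expected counts rounded to whole numbers, as in the tables above, and once with unrounded expected counts, which gives a somewhat larger value (about 8.4). Both exceed the critical value of 3.841.

```python
# Observed counts from the house example: rows = style, columns = location.
observed = [
    [63, 49],  # Split-Level: Urban, Rural
    [15, 33],  # Ranch: Urban, Rural
]

def expected_counts(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(obs, exp):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(obs, exp)
        for o, e in zip(obs_row, exp_row)
    )

exact = expected_counts(observed)                     # unrounded expected counts
rounded = [[round(e) for e in row] for row in exact]  # whole numbers, as in the slide

chi2_rounded = chi_square(observed, rounded)
chi2_exact = chi_square(observed, exact)

print(f"chi-square with rounded expected counts: {chi2_rounded:.2f}")
print(f"chi-square with exact expected counts:   {chi2_exact:.2f}")
```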
Your turn!
The market research group for Albers Brewery of Tucson, AZ, wants to know whether
preferences of beer type (light, regular, dark) differ by gender (male, female).
If beer preference is independent of gender, one advertising campaign will be initiated.
However, if beer preference depends on the gender of the beer drinker, the firm will
tailor its promotions to different target markets.
Are the variables related?
A survey was conducted and the following
data was collected:
[Figure: 100% stacked bar graph of beer preference by gender. One bar: Light 25%, Regular 50%, Dark 25%. Other bar: Light 43%, Regular 43%, Dark 14%.]
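The percentages in a 100% stacked bar graph are the shares of each DV category within each category of the IV. The sketch below is an illustration only. It assumes the survey counts are the observed values that appear in this example's chi-square calculation, laid out with beer type as rows and male/female as columns; that layout is inferred, since the data table itself is not shown here.

```python
# Assumed layout (not shown in the text): rows = beer type, columns = (male, female).
counts = {
    "Light":   (20, 30),
    "Regular": (40, 30),
    "Dark":    (20, 10),
}

male_total = sum(m for m, _ in counts.values())
female_total = sum(f for _, f in counts.values())

# Share of each beer type within each gender (the IV), in whole percent.
percentages = {
    beer: (round(100 * m / male_total), round(100 * f / female_total))
    for beer, (m, f) in counts.items()
}

for beer, (male_pct, female_pct) in percentages.items():
    print(f"{beer:8s} male {male_pct:3d}%   female {female_pct:3d}%")
```

If the bars differ noticeably between the two genders, as they do here, that is the visual hint of a relationship that the chi-square test then checks.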
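Step 3: Find χ² significant
df = (#rows - 1)(#columns - 1) = (3 - 1)(2 - 1) = 2
α = .05 (default value)
$\chi^2_{2,.05} = 5.991$
$\chi^2_{2,.01} = 9.210$
Step 4: Compare χ² statistic to χ² significant

$$\chi^2_{\text{statistic}} = \sum_{\text{each cell}} \frac{(\text{observed} - \text{Expected})^2}{\text{Expected}} = \frac{(20-27)^2}{27} + \frac{(40-37)^2}{37} + \frac{(20-16)^2}{16} + \frac{(30-23)^2}{23} + \frac{(30-33)^2}{33} + \frac{(10-14)^2}{14} = 6.604$$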
REJECT Ho at the .05 level of significance (χ² statistic (6.604) > χ² significant (5.991)). There is a
difference in beer preferences for men and women. A larger share of females than males prefer light beer. Men
prefer regular beer over light/dark, and females prefer light/regular over dark beer.
FAIL TO REJECT Ho at the .01 level of significance (χ² statistic (6.604) < χ² significant (9.210)).
There is not enough evidence to reject Ho. Any differences in cell frequencies could be explained by
chance.
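The two decisions above come from comparing one statistic with two different critical values. A minimal sketch of that comparison follows. The cell layout (rows = Light, Regular, Dark; columns = male, female) is an assumption inferred from the calculation, expected counts are rounded to whole numbers as in the slide, and the critical values are copied from the text rather than computed.

```python
# Assumed layout: rows = Light, Regular, Dark; columns = male, female.
observed = [[20, 30], [40, 30], [20, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts rounded to whole numbers, matching the slide's arithmetic.
expected = [[round(r * c / n) for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Critical values for df = 2, taken from the text above.
critical_values = {0.05: 5.991, 0.01: 9.210}

decisions = {
    alpha: "Reject Ho" if chi2 > critical else "Fail to reject Ho"
    for alpha, critical in critical_values.items()
}

print(f"chi-square statistic = {chi2:.3f}")
for alpha, decision in decisions.items():
    print(f"alpha = {alpha}: {decision}")
```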
How much evidence we need is related to how confident we want to be in our results.
α (the level of significance) is how often we are wrong when we reject Ho (also called Type I error).
Small Claims Court for endangerment of a child: Less evidence needed to convict; α = .05
means there is a 5% chance you are wrong. Casey Anthony verdict is Reject Ho (GUILTY).
Jury for 1st degree murder: More evidence needed to convict; α = .01 means there is a 1%
chance you are wrong. Casey Anthony verdict is Fail to Reject Ho (NOT GUILTY).
Statistically Significant
The value of α used depends on how confident you want to be in your results.
What is statistically significant to one person might not be to another.
Statisticians: Has to have at least a 95% chance of being true to be considered worth telling people about (why α = .05 is the default for any statistical program).
Manager: If something has a 90% chance of being true (α = .1), it is probably better to act as if it were true rather than false!
Chi-Square ($\chi^2$) Distribution
[Figure: chi-square distribution with the FAIL TO REJECT Ho region and the p value marked]
Since .04 (p-value) < .05 (α), Reject Ho.
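A p-value is the area under the chi-square distribution to the right of the statistic. For 1 and 2 degrees of freedom that area has a simple closed form, so it can be sketched with the standard library alone. The function names below are illustrative. Applying the df = 2 version to the beer statistic (6.604) gives about .037, which rounds to the .04 above if that comparison refers to the beer example (an assumption).

```python
import math

def p_value_df1(chi2):
    """Right-tail area of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

def p_value_df2(chi2):
    """Right-tail area of the chi-square distribution with 2 degrees of freedom."""
    return math.exp(-chi2 / 2)

p_beer = p_value_df2(6.604)   # beer example, df = 2
p_house = p_value_df1(7.62)   # house example, df = 1

alpha = 0.05
for label, p in [("beer", p_beer), ("house", p_house)]:
    decision = "Reject Ho" if p < alpha else "Fail to reject Ho"
    print(f"{label}: p-value = {p:.4f} -> {decision} at alpha = {alpha}")
```

Comparing the p-value with α leads to the same decision as comparing the statistic with the critical value.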
Chi-square Assumptions
The sample size is large
(expected frequency of each cell is > 5)
Your turn!
Young
Music 14
News 46
Sports 7
Not Young
12
23
12
28
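$$\chi^2_{\text{statistic}} = \sum_{\text{each cell}} \frac{(\text{observed} - \text{Expected})^2}{\text{Expected}} = \frac{(63-55)^2}{55} + \frac{(49-57)^2}{57} + \frac{(15-23)^2}{23} + \frac{(33-25)^2}{25} = 7.62$$

$$\phi = \sqrt{\frac{\chi^2}{n}} = \sqrt{\frac{7.62}{160}} = 0.22$$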
Squaring phi gives the proportion of variance that can be explained.
Whether the house location is urban or rural explains (.22 x .22 = .05) 5% of the variance in the style of house built.
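As a quick check of the arithmetic, here is a minimal sketch that takes the chi-square statistic and sample size from the house example and computes phi and its square.

```python
import math

chi2 = 7.62   # chi-square statistic from the house example
n = 160       # total sample size

phi = math.sqrt(chi2 / n)
explained = phi ** 2   # proportion of variance explained

print(f"phi = {phi:.2f}")
print(f"phi squared = {explained:.2f} ({explained:.0%} of the variance)")
```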
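$$\chi^2_{\text{statistic}} = \sum_{\text{each cell}} \frac{(\text{observed} - \text{Expected})^2}{\text{Expected}} = \frac{(20-27)^2}{27} + \frac{(40-37)^2}{37} + \frac{(20-16)^2}{16} + \frac{(30-23)^2}{23} + \frac{(30-33)^2}{33} + \frac{(10-14)^2}{14} = 6.604$$

Cramér's V

$$V = \sqrt{\frac{\chi^2}{n \cdot df}} = \sqrt{\frac{6.604}{150 \times 2}} = 0.15$$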
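The calculation above divides by n x df, with df = 2 for the 3 x 2 beer table. A widely used definition of Cramér's V divides instead by n x min(#rows - 1, #columns - 1), which is 1 for a 3 x 2 table. The sketch below computes both so the difference is visible; the helper name `cramers_v` and its `divisor` parameter are hypothetical.

```python
import math

def cramers_v(chi2, n, divisor):
    """Effect size sqrt(chi2 / (n * divisor)); the divisor depends on the convention used."""
    return math.sqrt(chi2 / (n * divisor))

chi2, n = 6.604, 150
n_rows, n_cols = 3, 2

# Convention used in the calculation above: divide by n * df.
v_df = cramers_v(chi2, n, (n_rows - 1) * (n_cols - 1))

# Common textbook convention: divide by n * min(rows - 1, columns - 1).
v_min = cramers_v(chi2, n, min(n_rows - 1, n_cols - 1))

print(f"V using df:                      {v_df:.2f}")
print(f"V using min(rows-1, columns-1):  {v_min:.2f}")
```

For a 2 x 2 table both conventions coincide with phi, since df and the minimum are both 1.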
[Figures: 100% stacked bar charts of house style (Split, Ranch) by location (Urban, Rural) and of beer preference (Light, Regular, Dark) by gender]
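Odds Ratio

           Outcome 1   Outcome 2   Total
Group 1        a           b        a+b
Group 2        c           d        c+d
Total         a+c         b+d     a+b+c+d

                 House Location
House Style      Urban   Rural   Total
Split-Level        63      49     112
Ranch              15      33      48
Total              78      82     160

Here Group 1 = Urban, Group 2 = Rural, Outcome 1 = Split-Level, and Outcome 2 = Ranch, so a = 63, b = 15, c = 49, d = 33.

$$OR = \frac{ad}{bc} = \frac{63 \times 33}{15 \times 49} = \frac{2079}{735} = 2.83$$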
The odds of outcome 1 in group 1 are OR times the odds in group 2 (higher if OR > 1; lower if OR < 1).
The odds of a split-level house style in urban locations were 2.83 times the odds in rural locations.
No universal agreement regarding what constitutes a strong or weak association:
OR > 2.0 is moderately strong; OR > 5.0 is strong
Weak associations are more likely to be explained by undetected biases or
confounders.
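The odds ratio for the house example can be checked in two equivalent ways: as the ratio of the two groups' odds, or as the cross-product ad/bc. A minimal sketch, using the cell labels from the 2 x 2 layout (Group 1 = Urban, Group 2 = Rural):

```python
# Cell labels follow the 2 x 2 layout: rows = groups, columns = outcomes.
a, b = 63, 15   # Urban: split-level, ranch
c, d = 49, 33   # Rural: split-level, ranch

odds_urban = a / b   # odds of split-level vs ranch in urban locations
odds_rural = c / d   # odds of split-level vs ranch in rural locations

odds_ratio = odds_urban / odds_rural
cross_product = (a * d) / (b * c)   # same quantity written as ad/bc

print(f"odds (urban) = {odds_urban:.2f}, odds (rural) = {odds_rural:.2f}")
print(f"odds ratio = {odds_ratio:.2f}")
```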
[Figure: odds ratios reported by community-based case-control, cohort, and hospital-based case-control studies published between 1977 and 1991, plotted on an Odds Ratio axis from 0.0 to 3.5 and labeled -ve Association and +ve Association. Source: www.contraceptiononline.org]
           Outcome 1   Outcome 2   Total
Group 1        a           b        a+b
Group 2        c           d        c+d
Total         a+c         b+d     a+b+c+d

$$OR = \frac{ad}{bc} = \frac{30 \times 90}{70 \times 10} = 3.9$$

$$RR = \frac{a/(a+b)}{c/(c+d)} = \frac{30/100}{10/100} = 3$$

Those who smoke are 3 times as likely (300% of the risk) to develop lung cancer as those who don't smoke.

$$RRR = \frac{(a/(a+b)) - (c/(c+d))}{c/(c+d)} = \frac{(30/100) - (10/100)}{10/100} = 2$$

$$ARR = (a/(a+b)) - (c/(c+d)) = (30/100) - (10/100) = .2$$

Those that smoke and got cancer: 30/100. Those that don't smoke and got cancer: 10/100.

Group A   Group B   RRI    ARI
10%       30%       200%   20%
1%        3%        200%   2%
.1%       .3%       200%   .2%
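The table shows that the relative increase stays at 200% while the absolute increase shrinks with the baseline risk. A minimal sketch of that calculation follows; the function name `risk_measures` is hypothetical, and Group A is treated as the baseline (lower-risk) group, which is an assumption based on the smoking example.

```python
def risk_measures(risk_group, risk_baseline):
    """Return (relative risk, relative increase, absolute increase) of a group vs. a baseline."""
    relative_risk = risk_group / risk_baseline
    relative_increase = (risk_group - risk_baseline) / risk_baseline
    absolute_increase = risk_group - risk_baseline
    return relative_risk, relative_increase, absolute_increase

# (Group A risk, Group B risk) for each row of the table, as proportions.
rows = [(0.10, 0.30), (0.01, 0.03), (0.001, 0.003)]

results = [risk_measures(group_b, group_a) for group_a, group_b in rows]

for (group_a, group_b), (rr, rel, absolute) in zip(rows, results):
    print(f"A = {group_a:.1%}, B = {group_b:.1%}: "
          f"RR = {rr:.1f}, relative increase = {rel:.0%}, absolute increase = {absolute:.1%}")
```

Reporting only the relative figure can make a small absolute change look large, which is why both are worth stating.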
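$$\text{adjusted standardized residual} = \frac{O - E}{\sqrt{E\left(1 - \frac{n_{row}}{n_{total}}\right)\left(1 - \frac{n_{column}}{n_{total}}\right)}}$$

$$\frac{20 - 27}{\sqrt{27\left(1 - \frac{50}{150}\right)\left(1 - \frac{80}{150}\right)}} = -2.3$$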
Adjusted standardized residuals

           male   female
Light      -2.3    3.5
Regular     0.9   -0.9
Dark        1.6   -1.9
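The residual formula can be applied to every cell of the beer table. This is a sketch under stated assumptions: the cell layout (rows = beer type, columns = male, female) is inferred from the row total of 50 and column total of 80 in the worked cell, and expected counts are left unrounded. In a table with only two columns the two residuals in a row come out equal in size and opposite in sign, so some computed values may differ from the figures listed above.

```python
import math

# Assumed layout: rows = beer type, columns = (male, female).
observed = {"Light": (20, 30), "Regular": (40, 30), "Dark": (20, 10)}

col_totals = [sum(col) for col in zip(*observed.values())]   # male, female totals
n_total = sum(col_totals)

adjusted = {}
for beer, row in observed.items():
    row_total = sum(row)
    residuals = []
    for o, col_total in zip(row, col_totals):
        e = row_total * col_total / n_total   # unrounded expected count
        denom = math.sqrt(e * (1 - row_total / n_total) * (1 - col_total / n_total))
        residuals.append((o - e) / denom)
    adjusted[beer] = tuple(residuals)

for beer, (male, female) in adjusted.items():
    print(f"{beer:8s} male {male:+.1f}   female {female:+.1f}")
```

Residuals beyond roughly 2 in absolute value are commonly read as the cells that contribute most to a significant chi-square.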
Example
Is it safer to fly in the front, middle, or back of the airplane?
Matt McCormick, a survival expert for the National Transportation Safety Board, told Travel Magazine that "There is no one safe place to sit."
Collected raw data must be organized into a frequency table.

Seat      f
Back      23
Middle    35
Front     29
Total     87

Expected: 87/3 = 29. Front is 29, Middle is 29, Back is 29.
This is a uniform distribution!
1. Form hypothesis.

$$\chi^2_{\text{statistic}} = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

Solution

Seat      f (observed)   Hypothesized distribution f (expected)
Back          23              29
Middle        35              29
Front         29              29
Total         87              87

$$\chi^2_{\text{statistic}} = \frac{(23-29)^2}{29} + \frac{(35-29)^2}{29} + \frac{(29-29)^2}{29} = 1.24 + 1.24 + 0 = 2.48$$

df = # outcomes - 1 = 3 - 1 = 2
α = .05 (default value)
$\chi^2_{\alpha,df} = \chi^2_{.05,2} = 5.991$
Step 4: Compare χ² statistic to χ² significant
2.48 < 5.991
χ² statistic < χ² significant
Fail to Reject Ho.
There is not enough evidence to refute the claim that "there is no one safe place to sit".
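The goodness-of-fit calculation for the airplane example can be sketched in a few lines. The expected frequency is the total divided equally across the three seat locations (the uniform distribution), and the critical value is copied from the text rather than computed.

```python
observed = {"Back": 23, "Middle": 35, "Front": 29}

total = sum(observed.values())
expected = total / len(observed)   # uniform distribution: same expected count everywhere

# Contribution of each seat location to the chi-square statistic.
contributions = {seat: (o - expected) ** 2 / expected for seat, o in observed.items()}
chi2 = sum(contributions.values())

critical = 5.991   # df = 2, alpha = .05, from the text above
decision = "Reject Ho" if chi2 > critical else "Fail to reject Ho"

for seat, value in contributions.items():
    print(f"{seat:7s} contributes {value:.2f}")
print(f"chi-square = {chi2:.2f} vs critical {critical}: {decision}")
```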
Season    fo     fe
Winter    78     322/4 = 80.5
Spring    71     80.5
Summer    87     80.5
Fall      86     80.5
Total     322    322

$$\chi^2_{\text{statistic}} = \frac{(78-80.5)^2}{80.5} + \frac{(71-80.5)^2}{80.5} + \frac{(87-80.5)^2}{80.5} + \frac{(86-80.5)^2}{80.5} = 2.10$$

Step 4: Compare χ² statistic to χ² significant
χ² statistic (2.10) < χ² significant (7.815)
Fail to Reject Ho.
p value (.55)
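Both the statistic and the p-value for the season example can be reproduced with the standard library. This is a minimal sketch: the right-tail area for 3 degrees of freedom is written in closed form, and the function name `p_value_df3` is made up for illustration.

```python
import math

observed = [78, 71, 87, 86]   # Winter, Spring, Summer, Fall
expected = sum(observed) / len(observed)   # 322 / 4, equal across seasons

chi2 = sum((o - expected) ** 2 / expected for o in observed)

def p_value_df3(x):
    """Right-tail area of the chi-square distribution with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = p_value_df3(chi2)
decision = "Reject Ho" if p_value < 0.05 else "Fail to reject Ho"

print(f"chi-square = {chi2:.2f}, p-value = {p_value:.2f}: {decision}")
```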
Assumptions
Cramér's V

$$V = \sqrt{\frac{\chi^2}{n \cdot df}} = \sqrt{\frac{2.10}{322 \times 3}} = .05$$
Interpretation:
A value of 0 indicates that the sample proportions are exactly equal
(a perfect fit) to the hypothesized proportions (i.e., O = E). As V
increases, the degree of departure from a perfect fit increases.
Since V = .05, there is a small effect, or a small departure from fit.
Research Question
7. Interpret results
Remember