
Two-Way Tables: Chi-Square Tests

Edpsy/Psych/Soc 589

Carolyn J. Anderson
Department of Educational Psychology

I L L I N O I S
UNIVERSITY OF ILLINOIS AT URBANA - CHAMPAIGN

Two-Way Tables: Chi-Square Tests Slide 1 of 45


Outline

■ Overview and Definitions
■ Chi-squared distribution
■ Pearson’s $X^2$ statistic
■ Likelihood ratio test statistic
■ Examples of
  ◆ Independence
  ◆ Homogeneous distributions
  ◆ Unrelated classifications
  ◆ Other
■ Residuals
■ (Partitioning Chi-square)
■ Comments


Overview and Definitions

■ For a 2-way table, a null hypothesis $H_O$ specifies a set of probabilities
      $H_O: \{\pi_{ij}\}$ for $i = 1, \dots, I$ and $j = 1, \dots, J$
■ “Expected frequencies” are the values expected if the null hypothesis is true,
      $\mu_{ij} = n\pi_{ij}$
■ To test a null hypothesis, we compare the observed frequencies $n_{ij}$ and the expected frequencies $\mu_{ij}$:
      $\{n_{ij} - \mu_{ij}\}$
■ The test statistics are functions of the observed and expected frequencies.
■ If the null hypothesis is true, then (for large samples) the test statistics are approximately distributed as chi-squared random variables, so they are referred to as “chi-squared tests”.


Null Hypotheses

The two most common tests/null hypotheses are

■ Chi-squared test of Independence.
■ Chi-squared test of Homogeneous Distributions.


The Chi-Squared Distribution

The “degrees of freedom”, $df$, completely specify a chi-squared distribution.

■ A chi-squared random variable is non-negative: $0 \le \chi^2$.
■ The mean of a chi-squared distribution $= df$.
■ The variance of a chi-squared distribution $= 2\,df$ and the standard deviation $= \sqrt{2\,df}$.
■ The shape is skewed to the right.
■ As $df$ increases, the mean gets larger and the distribution becomes more spread out.
■ As $df$ increases, the distribution becomes more “bell-shaped” (i.e., as $df \to \infty$, $\chi^2_{df}$ approaches a normal distribution).


Picture of Chi-Squared Distributions

[Figure: chi-squared distributions]


Pearson’s Chi-Squared Statistic

$$X^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(n_{ij} - \mu_{ij})^2}{\mu_{ij}}$$

■ $0 \le X^2$
■ When $n_{ij} = \mu_{ij}$ for all $(i, j)$, then $X^2 = 0$.
■ For “large” samples, $X^2$ has an approximate chi-squared distribution.
  A good rule: “large” means $\mu_{ij} \ge 5$ for all $(i, j)$.
■ The p-value for a test is the right-tail probability of $X^2$.
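The formula for $X^2$ and the rule of thumb for “large” samples can be sketched in a few lines of Python. This is only an illustration: the function names and the 1 × 2 table are hypothetical and do not come from the lecture.

```python
def pearson_x2(observed, expected):
    """Pearson's X^2: sum over all cells of (n_ij - mu_ij)^2 / mu_ij."""
    return sum(
        (n - mu) ** 2 / mu
        for obs_row, exp_row in zip(observed, expected)
        for n, mu in zip(obs_row, exp_row)
    )


def all_expected_at_least(expected, threshold=5.0):
    """Rule of thumb: every expected frequency is at least `threshold`."""
    return all(mu >= threshold for row in expected for mu in row)


# Hypothetical 1 x 2 table (assumed for illustration only).
observed = [[12, 8]]
expected = [[10.0, 10.0]]

x2 = pearson_x2(observed, expected)              # (2^2)/10 + ((-2)^2)/10
x2_perfect_fit = pearson_x2(expected, expected)  # observed == expected
large_enough = all_expected_at_least(expected)
```

When observed and expected frequencies coincide, the statistic is exactly zero, which matches the second bullet of the slide.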


Chi-Squared Distribution and p-value

[Figure: chi-squared distribution and p-value]


Likelihood Ratio Statistic

■ Need the maximum likelihood estimates of the parameters assuming
  ◆ the null hypothesis is true (simpler, restrictions on parameters).
  ◆ the alternative hypothesis is true (more general, no (or fewer) restrictions on parameters).
■ The test statistic is based on

$$\Lambda = \frac{\text{maximum of the likelihood when parameters satisfy } H_O}{\text{maximum of the likelihood when parameters are not restricted}}$$

■ The numerator ≤ denominator ($\max L(H_O) \le \max L(H_A)$).
■ $0 \le \Lambda \le 1$.
■ If $\max L(H_O) = \max L(H_A)$, then there is no evidence against $H_O$ (i.e., $\Lambda = 1$).
■ The smaller the likelihood under $H_O$, the more evidence against $H_O$ (i.e., the smaller $\Lambda$).


Likelihood Ratio Statistic for 2-way Table

The test statistic is $-2\log(\Lambda)$, which for contingency tables is

$$G^2 = 2 \sum_{i=1}^{I} \sum_{j=1}^{J} n_{ij} \log(n_{ij}/\mu_{ij})$$

This is the “likelihood ratio chi-squared statistic”.
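As a rough sketch of how $G^2$ is computed and how it relates to $\Lambda$ through $G^2 = -2\log\Lambda$, consider the following Python fragment. The function name and the small table are hypothetical, and the treatment of empty cells (a cell with $n_{ij} = 0$ contributes nothing) is a common convention assumed here, not something stated on the slide.

```python
import math


def likelihood_ratio_g2(observed, expected):
    """G^2 = 2 * sum of n_ij * log(n_ij / mu_ij) over all cells."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for n, mu in zip(obs_row, exp_row):
            if n > 0:  # assumed convention: 0 * log(0) counts as 0
                total += n * math.log(n / mu)
    return 2.0 * total


# Hypothetical 1 x 2 table (assumed for illustration only).
observed = [[12, 8]]
expected = [[10.0, 10.0]]

g2 = likelihood_ratio_g2(observed, expected)
lam = math.exp(-g2 / 2.0)  # invert G^2 = -2 log(Lambda)
```

Because $G^2 \ge 0$, the recovered $\Lambda$ always lies in $(0, 1]$, and it equals 1 only when the fit is perfect.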


Chi-Squared Tests Hypotheses

1. Independence
2. Homogeneous Distributions
3. Unrelated Classifications
4. Other

■ 1, 2, & 3 are all tests of “no association” or “no relationship”.
■ 1 & 2 are the most common.
■ 1, 2, & 3 all use the same formula to compute expected frequencies, but arrive at it from different starting points.
■ 4 depends on the (substantive) hypothesis you are testing.
■ These four tests differ in terms of
  ◆ Experimental procedure (i.e., sampling design)
  ◆ The null and alternative hypotheses
  ◆ Logic used to obtain estimates of expected frequencies assuming $H_O$ is true.


Independence

Situation: Two response variables (either Poisson sampling or multinomial sampling).

Null Hypothesis: The two variables are statistically independent.

Alternative Hypothesis: The two variables are dependent.

Definition of statistical independence:

      $H_O: \pi_{ij} = \pi_{i+}\pi_{+j}$

for all $i = 1, \dots, I$ and $j = 1, \dots, J$.

Statistical dependence means that the variables are not statistically independent:

      $H_A: \pi_{ij} \ne \pi_{i+}\pi_{+j}$

for at least one pair $(i, j)$, $i = 1, \dots, I$ and $j = 1, \dots, J$.

To test this hypothesis, we assume $H_O$ is true.


Expected Frequencies Under Independence

Given data, the observed marginal proportions $p_{i+}$ and $p_{+j}$ are the maximum likelihood estimates of $\pi_{i+}$ and $\pi_{+j}$, respectively; that is,

      $\hat{\pi}_{i+} = p_{i+}$
      $\hat{\pi}_{+j} = p_{+j}$

“Estimated Expected Frequencies” are

$$\hat{\mu}_{ij} = n\hat{\pi}_{i+}\hat{\pi}_{+j} = n(n_{i+}/n)(n_{+j}/n) = \frac{n_{i+}n_{+j}}{n}$$


Testing Independence

For “large” samples, to test the hypothesis that two variables are statistically independent, use either

$$G^2 = 2 \sum_{i} \sum_{j} n_{ij} \log(n_{ij}/\hat{\mu}_{ij})$$

or

$$X^2 = \sum_{i} \sum_{j} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}}$$

and compare the value to the appropriate chi-squared distribution.

General Rule for computing Degrees of Freedom:

      The number of parameters specified under the alternative hypothesis minus the number of parameters specified under the null hypothesis.


Computing Degrees of Freedom

      df = (# parameters in $H_A$) − (# parameters in $H_O$)

■ Null hypothesis has
  ◆ $(I - 1)$ unique parameters for the row margin, $\hat{\pi}_{i+}$.
  ◆ $(J - 1)$ unique parameters for the column margin, $\hat{\pi}_{+j}$.
■ Alternative hypothesis has
  $(IJ - 1)$ unique parameters. The only restriction on the parameters in $H_A$ is that the probabilities sum to 1.
■ Degrees of freedom are therefore

      $df = (IJ - 1) - [(I - 1) + (J - 1)] = (I - 1)(J - 1)$.

This is the same number we came up with when we considered how many numbers we need to completely describe the association in an $I \times J$ table.


Example: Two Items from the 1994 GSS

■ Item 1: A working mother can establish just as warm and secure a relationship with her children as a mother who does not work.
■ Item 2: Working women should have paid maternity leave.

Observed Frequencies: $n_{ij}$

                                        Item 2
                     Strongly                                  Strongly
Item 1               Agree      Agree   Neither   Disagree    Disagree    Total
Strongly Agree          97        96       22        17           2        234
Agree                  102       199       48        38           5        392
Disagree                42       102       25        36           7        212
Strongly Disagree        9        18        7        10           2         46
Total                  250       415      102       101          16        884


Example: Estimated Expected Values

■ Item 1: A working mother can establish just as warm and secure a relationship with her children as a mother who does not work.
■ Item 2: Working women should have paid maternity leave.

Estimated Expected Frequencies:

$$\hat{\mu}_{ij} = \frac{n_{i+}n_{+j}}{n}$$

                                        Item 2
                     Strongly                                  Strongly
Item 1               Agree      Agree   Neither   Disagree    Disagree    Total
Strongly Agree        66.18    109.85    27.00     26.74        4.24       234
Agree                110.86    184.03    45.23     44.79        7.10       392
Disagree              59.96     99.53    24.46     24.22        3.84       212
Strongly Disagree     13.01     21.60     5.31      5.26        0.83        46
Total                   250       415      102       101          16       884
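The estimated expected frequencies can be reproduced from the observed GSS counts with a short Python sketch. The counts are those of the observed-frequency table; the function name is made up for this illustration.

```python
# Observed 1994 GSS counts: rows = Item 1, columns = Item 2.
observed = [
    [97, 96, 22, 17, 2],     # Strongly Agree
    [102, 199, 48, 38, 5],   # Agree
    [42, 102, 25, 36, 7],    # Disagree
    [9, 18, 7, 10, 2],       # Strongly Disagree
]


def expected_under_independence(table):
    """mu_hat_ij = n_i+ * n_+j / n for every cell of a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_under_independence(observed)
for row in expected:
    print("  ".join(f"{mu:7.2f}" for mu in row))
```

The fitted table has the same row and column totals as the observed table, which is a quick sanity check on the computation.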


Example: Test Statistics

Statistic                             df    Value     p-value
Pearson Chi-square           $X^2$    12    47.576    < .001
Likelihood Ratio Chi-square  $G^2$    12    44.961    < .001

What’s the nature of the dependency? Residuals. . .
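A minimal Python sketch of the whole test for the GSS table follows. It assumes the observed counts from the example and uses the closed-form right-tail probability of a chi-squared distribution that holds when the degrees of freedom are even (here $df = 12$); the helper names are hypothetical.

```python
import math

observed = [
    [97, 96, 22, 17, 2],
    [102, 199, 48, 38, 5],
    [42, 102, 25, 36, 7],
    [9, 18, 7, 10, 2],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

cells = [(o, e) for o_row, e_row in zip(observed, expected)
         for o, e in zip(o_row, e_row)]
x2 = sum((o - e) ** 2 / e for o, e in cells)
g2 = 2.0 * sum(o * math.log(o / e) for o, e in cells if o > 0)
df = (len(row_totals) - 1) * (len(col_totals) - 1)


def chi2_right_tail_even_df(x, df):
    """P(chi2_df > x) for even df: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


p_x2 = chi2_right_tail_even_df(x2, df)
p_g2 = chi2_right_tail_even_df(g2, df)
print(f"X2 = {x2:.3f}, G2 = {g2:.3f}, df = {df}")
```

For odd degrees of freedom the closed form does not apply, and a statistical library or a table would be needed instead.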


Residuals

■ Raw Residuals: $n_{ij} - \hat{\mu}_{ij}$
  Problem: These tend to be large when $\hat{\mu}_{ij}$ is large.
  For Poisson random variables, mean = variance.
■ Pearson Residuals, often called “standardized residuals”:

$$\frac{n_{ij} - \hat{\mu}_{ij}}{\sqrt{\hat{\mu}_{ij}}}$$

                                        Item 2
                     Strongly                                  Strongly
Item 1               Agree      Agree   Neither   Disagree    Disagree
Strongly Agree         3.79     −1.32     −.96     −1.88       −1.09
Agree                  −.84      1.10      .41     −1.01        −.79
Disagree              −2.32       .25      .11      2.39        1.61
Strongly Disagree     −1.11      −.77      .73      2.07        1.28

If the null hypothesis is true, then these should be approximately normally distributed with mean = 0, but . . .


Adjusted Residuals

■ Problem with Pearson Residuals: The variance (standard deviation) of Pearson residuals is a bit too small.
■ Adjusted Residuals or “Haberman residuals” (Haberman, 1973):

$$\frac{n_{ij} - \hat{\mu}_{ij}}{\sqrt{\hat{\mu}_{ij}(1 - p_{i+})(1 - p_{+j})}}$$

If the null hypothesis is true, then these residuals have an asymptotic standard normal distribution.

                                        Item 2
                     Strongly                                  Strongly
Item 1               Agree      Agree   Neither   Disagree    Disagree
Strongly Agree         5.22     −2.12    −1.19     −2.33       −1.28
Agree                 −1.33      2.03      .59     −1.44       −1.06
Disagree              −3.14       .39      .13      2.92        1.82
Strongly Disagree     −1.35     −1.09      .80      2.25        1.33
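The adjusted residual formula can be applied to the GSS counts directly. The sketch below assumes the observed table from the example and computes every cell's adjusted residual; variable names are illustrative only.

```python
import math

observed = [
    [97, 96, 22, 17, 2],
    [102, 199, 48, 38, 5],
    [42, 102, 25, 36, 7],
    [9, 18, 7, 10, 2],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

adjusted = []
for i, row in enumerate(observed):
    p_row = row_totals[i] / n                      # p_i+
    out_row = []
    for j, n_ij in enumerate(row):
        p_col = col_totals[j] / n                  # p_+j
        mu = row_totals[i] * col_totals[j] / n     # mu_hat_ij
        out_row.append((n_ij - mu) / math.sqrt(mu * (1 - p_row) * (1 - p_col)))
    adjusted.append(out_row)

for row in adjusted:
    print("  ".join(f"{r:6.2f}" for r in row))
```

Since these residuals are approximately standard normal under the null hypothesis, values well beyond ±2 point to the cells that drive the dependence.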


Residuals and SAS

■ Reading the data (one line per cell: row, column, count):
    DATA GSS94;
      INPUT item1 item2 count;
      DATALINES;
    1 1 97
    1 2 96
    . . .
    4 5 2
    ;
    RUN;
■ PROC FREQ gives raw residuals (DEVIATION option) and the “cell contribution” to the Pearson chi-squared statistic, which is the squared Pearson residual (CELLCHI2 option).
    PROC FREQ DATA=GSS94;
      WEIGHT count;   /* each data line is a cell count, not a single case */
      TABLES item1*item2 / CELLCHI2;
    RUN;
■ PROC GENMOD gives adjusted residuals and lots more.
    PROC GENMOD DATA=GSS94;
      CLASS item1 item2;
      MODEL count = item1 item2 / link=log dist=P obstats;
    RUN;
  “AdjChiRes” are the adjusted chi-square (Haberman) residuals.


Another Example of Independence

“Specifically, there were about 26,000 applications to the Urbana campus this year. About 18,000 applicants were admitted using the 69% admissions rate cited in the article. The 160 "I list" applicants had a 77% admissions rate, according to the Tribune. This translates into the admission of 13 more applicants on the Category I list admissions rate versus the standard rate.”

Ignoring the ethical question, is 13 more applicants admitted statistically significant? In other words, is 77% statistically different from 69%?

Let’s look at the statistical question using all methods that we’ve discussed so far.


Admission Scandal Results

Binomial test of whether the admission rate from the I list is the same as the general admission rate. The results are significant whether we use the asymptotic test or the exact binomial test.

I-list: $H_O$: Probability of admission for the I list = .69 (i.e., the general admission proportion)

The FREQ Procedure

                                Cumulative    Cumulative
admit    Frequency    Percent    Frequency       Percent
yes         123        76.88        123           76.88
no           37        23.13        160          100.00

                         Large Sample    Exact Binomial
Proportion                  0.7688
ASE                         0.0333
95% Lower Conf Limit        0.7034           0.6956
95% Upper Conf Limit        0.8341           0.8317


Results Continued

Asymptotic (large sample) Test of H0: Proportion = 0.69

ASE under H0            0.0366
Z                       2.1538
One-sided Pr > Z        0.0156
Two-sided Pr > |Z|      0.0313

Sample Size = 160

                                         95% Confidence
Statistic                     Value         Interval
Difference of Proportions      .076      0.009    0.144
Odds ratio                    1.478      1.022    2.136
Relative Risk                 1.110      1.020    1.209
Correlation                   0.013
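The large-sample binomial results for the I list can be checked by hand. The Python sketch below assumes the counts from the FREQ output (123 admitted out of 160) and the null value 0.69; the multiplier 1.96 for the 95% interval is the usual normal approximation, assumed here rather than taken from the output.

```python
import math

admitted, n, p0 = 123, 160, 0.69

p_hat = admitted / n                              # sample proportion
ase = math.sqrt(p_hat * (1 - p_hat) / n)          # ASE at the estimate
ci = (p_hat - 1.96 * ase, p_hat + 1.96 * ase)     # large-sample 95% CI

ase_null = math.sqrt(p0 * (1 - p0) / n)           # ASE under H0
z = (p_hat - p0) / ase_null
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))   # upper-tail normal probability
p_two_sided = 2 * p_one_sided

print(f"z = {z:.4f}, one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")
```

The exact binomial limits in the output come from a different calculation and are not reproduced by this sketch.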


Test of Independence

                 Admission
             yes        no       Total
I list        123        37        160
general     18000      8000      26000
Total       18123      8037      26160

Statistics for Table of List by Admission

Statistic                       DF     Value      Prob
Chi-Square                       1     4.3659    0.0367
Likelihood Ratio Chi-Square      1     4.6036    0.0319
Continuity Adj. Chi-Square       1     4.0141    0.0451
Mantel-Haenszel Chi-Square       1     4.3657    0.0367
Phi Coefficient                       −0.0129
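The chi-square statistics for this 2 × 2 table can be reproduced as follows. The sketch assumes the counts shown in the table and uses the identity that, for one degree of freedom, the right-tail chi-squared probability equals `erfc(sqrt(x / 2))`; the Mantel-Haenszel statistic and the phi coefficient are not computed.

```python
import math

observed = [[123, 37],        # I list: admitted, not admitted
            [18000, 8000]]    # general: admitted, not admitted

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

cells = [(o, e) for o_row, e_row in zip(observed, expected)
         for o, e in zip(o_row, e_row)]
x2 = sum((o - e) ** 2 / e for o, e in cells)
g2 = 2.0 * sum(o * math.log(o / e) for o, e in cells)
x2_adj = sum((abs(o - e) - 0.5) ** 2 / e for o, e in cells)  # continuity-adjusted


def chi2_right_tail_df1(x):
    """P(chi2_1 > x), since a chi2_1 variable is a squared standard normal."""
    return math.erfc(math.sqrt(x / 2.0))


p_x2 = chi2_right_tail_df1(x2)
p_g2 = chi2_right_tail_df1(g2)
p_adj = chi2_right_tail_df1(x2_adj)
print(f"X2 = {x2:.4f} (p = {p_x2:.4f}), G2 = {g2:.4f} (p = {p_g2:.4f})")
```

With one degree of freedom, $X^2$ here is essentially the square of a two-sample z statistic, which is why the conclusion agrees with the binomial test.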


Homogeneous Distributions

Situation: Sample from different populations and observe classification on a response variable. The explanatory variable defines the populations, and the number sampled from each population is determined by the researcher; i.e., independent Binomial/Multinomial sampling.

Null Hypothesis: The distributions of responses from the different populations are the same.

Alternative Hypothesis: The distributions of responses from the different populations are different.

Effectiveness of Vitamin C for prevention of common cold.

                          Outcome
              Cold               No Cold
vitamin C     17/139 = .12      122/139 = .88      .12 + .88 = 1.00
placebo       31/140 = .22      109/140 = .78      .22 + .78 = 1.00
              48/279 = .17      231/279 = .83      .17 + .83 = 1.00


Chi-Square Test for Homogeneous Distributions

The null and alternative hypotheses are:

      $H_O: \pi_1 = \pi_2$ versus $H_A: \pi_1 \ne \pi_2$

and more generally,

      $H_O: \pi_{j|i} = \frac{\pi_{ij}}{\pi_{i+}} = \pi_{+j}$ versus $H_A: \pi_{j|i} = \frac{\pi_{ij}}{\pi_{i+}} \ne \pi_{+j}$

where $H_O$ holds for all $i = 1, \dots, I$ and $j = 1, \dots, J$, and $H_A$ holds for at least one pair $(i, j)$.

Assuming $H_O$ is true, the conditional distributions of the response variable given the explanatory variable should all be equal, and they should equal the marginal distribution of the response variable; that is,

      $\pi_{j|i} = \frac{\pi_{ij}}{\pi_{i+}} = \pi_{+j}$


Estimated Expected Frequencies

■ Expected frequencies equal

      $\mu_{ij} = n_{i+}\pi_{+j}$

  where $n_{i+}$ is given (fixed by design).
■ Given data, our (maximum likelihood) estimates of the marginal probabilities of responses are

      $\hat{\pi}_{j|i} = \hat{\pi}_{+j} = p_{+j} = n_{+j}/n$

■ Estimated Expected Frequencies are

$$\hat{\mu}_{ij} = n_{i+}\hat{\pi}_{+j} = n_{i+}(n_{+j}/n) = \frac{n_{i+}n_{+j}}{n}$$

  which is exactly the same formula that we use to compute estimated expected frequencies under independence.


Degrees of Freedom
for test of homogeneous distributions

Null Hypothesis has
      $(J - 1)$ unique parameters: the $\hat{\pi}_{+j}$, which sum to 1.

Alternative Hypothesis has
      $I(J - 1)$ unique parameters: for each of the $I$ populations, the values of $\hat{\pi}_{j|i}$, which must sum to 1.

Degrees of Freedom equal

      $df = I(J - 1) - (J - 1) = (I - 1)(J - 1)$

Same as for testing independence.


Example: Effectiveness of Vitamin C

Observed Frequencies
                  Outcome
              Cold    No Cold
vitamin C       17      122      139
placebo         31      109      140
                48      231      279

Expected Values
                  Outcome
              Cold     No Cold
vitamin C     23.91    115.09    139
placebo       24.09    115.91    140
              48       231       279

Test Statistic                        df    Value    p-value
Pearson Chi-Square           $X^2$     1    4.811     .03
Likelihood Ratio Chi-Square  $G^2$     1    4.872     .03

Adjusted Residuals
                  Outcome
              Cold     No Cold
vitamin C     −2.31     2.17
placebo        2.10    −2.22
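For the homogeneity test, the expected values follow from the fixed group sizes and the pooled response proportions. The sketch below assumes the Vitamin C counts from the table and computes the expected values and the two test statistics with their p-values for $df = 1$; it does not attempt to reproduce the residuals.

```python
import math

observed = [[17, 122],    # vitamin C: cold, no cold
            [31, 109]]    # placebo:   cold, no cold

group_sizes = [sum(row) for row in observed]          # n_i+, fixed by design
col_totals = [sum(col) for col in zip(*observed)]
n = sum(group_sizes)
pooled = [c / n for c in col_totals]                  # pi_hat_+j = n_+j / n

# mu_hat_ij = n_i+ * pi_hat_+j
expected = [[size * p for p in pooled] for size in group_sizes]

cells = [(o, e) for o_row, e_row in zip(observed, expected)
         for o, e in zip(o_row, e_row)]
x2 = sum((o - e) ** 2 / e for o, e in cells)
g2 = 2.0 * sum(o * math.log(o / e) for o, e in cells)
df = (len(observed) - 1) * (len(col_totals) - 1)

# For df = 1, P(chi2_1 > x) = erfc(sqrt(x / 2)).
p_x2 = math.erfc(math.sqrt(x2 / 2.0))
p_g2 = math.erfc(math.sqrt(g2 / 2.0))
print(f"X2 = {x2:.3f}, G2 = {g2:.3f}, df = {df}")
```

The arithmetic is identical to the independence case; only the sampling argument behind the expected values differs.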


Summary regarding Effectiveness of Vitamin C

Difference of Proportions = −.10     95% CI (−.19, −.01)
Relative Risk = .552                 95% CI (.32, .93)
Odds ratio = .490                    95% CI (.26, .93)
Correlation = −.131

Test Statistic                        df    Value    p-value
Pearson Chi-Square           $X^2$     1    4.811     .03
Likelihood Ratio Chi-Square  $G^2$     1    4.872     .03

Adjusted Residuals
                  Outcome
              Cold     No Cold
vitamin C     −2.31     2.17
placebo        2.10    −2.22
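The point estimates in this summary, and two of the intervals, can be recomputed from the four cell counts. The sketch assumes the standard large-sample (Wald) intervals with multiplier 1.96, which is an assumption about how the intervals were obtained; the relative-risk interval and the correlation are left out.

```python
import math

cold_c, no_cold_c = 17, 122      # vitamin C group
cold_p, no_cold_p = 31, 109      # placebo group
n_c, n_p = cold_c + no_cold_c, cold_p + no_cold_p

p_c, p_p = cold_c / n_c, cold_p / n_p

# Difference of proportions with a Wald interval.
diff = p_c - p_p
se_diff = math.sqrt(p_c * (1 - p_c) / n_c + p_p * (1 - p_p) / n_p)
diff_ci = (diff - 1.96 * se_diff, diff + 1.96 * se_diff)

# Relative risk of a cold, vitamin C versus placebo.
relative_risk = p_c / p_p

# Odds ratio with a Wald interval on the log scale.
odds_ratio = (cold_c * no_cold_p) / (no_cold_c * cold_p)
se_log_or = math.sqrt(1 / cold_c + 1 / no_cold_c + 1 / cold_p + 1 / no_cold_p)
or_ci = (math.exp(math.log(odds_ratio) - 1.96 * se_log_or),
         math.exp(math.log(odds_ratio) + 1.96 * se_log_or))

print(f"diff = {diff:.3f}, RR = {relative_risk:.3f}, OR = {odds_ratio:.3f}")
```

Both intervals exclude the "no effect" value (0 for the difference, 1 for the odds ratio), in line with the chi-square tests.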


Unrelated Classification

Situation: Both margins are fixed by design. The sample can be considered the population.

Example: 1970 draft lottery of 19–26 year olds (Fienberg, 1971). Each day of the year (including Feb 29) was typed on a slip of paper and inserted into a capsule. The capsules were mixed and were assigned a “drawing number” according to their position in the sequence of capsules picked from a bowl. The table below cross-classifies months by drawing number, with drawing numbers grouped into thirds.

                  Drawing Numbers
Month      1–122    123–244    245–366    Totals
Jan           9        12         10         31
Feb           7        12         10         29
March         5        10         16         31
April         8         8         14         30
May           9         7         15         31
June         11         7         12         30
July         12         7         12         31
Aug          13         7         11         31
Sept         10        15          5         30
Oct           9        15          7         31
Nov          12        12          6         30
Dec          17        10          4         31
Totals      122       122        122        366


Hypothesis of Unrelated Classification

Null Hypothesis: The row and column classifications are unrelated.

$H_O$: Drawing was random; that is, there is no relationship between drawing number and month of birth.

Alternative Hypothesis: The row and column classifications are related.

$H_A$: Drawing was not random; there is a relationship between drawing number and month of birth.


Expected Values

The logic to find the expected values follows that of homogeneous distributions.

■ $n_{i+}$ fixed for rows
■ $n_{+j}$ fixed for columns
■ $n_{+j}/n$ = proportion in column j.

If the null hypothesis is true, then the expected frequencies $\mu_{ij}$ are

$$
\mu_{ij} = (\text{\# in row } i)(\text{proportion in column } j)
         = n_{i+}\,(n_{+j}/n)
         = \frac{n_{i+}\, n_{+j}}{n}
$$

Degrees of Freedom = (I − 1)(J − 1).
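
The expected-value formula above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `expected_frequencies` and the small 2 × 3 table of counts are hypothetical and do not come from the slides.

```python
def expected_frequencies(table):
    """Return (mu, df) where mu[i][j] = n_i+ * n_+j / n and df = (I-1)(J-1)."""
    row_totals = [sum(row) for row in table]          # n_i+
    col_totals = [sum(col) for col in zip(*table)]    # n_+j
    n = sum(row_totals)
    mu = [[r * c / n for c in col_totals] for r in row_totals]
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return mu, df


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
example = [[10, 20, 30],
           [20, 20, 20]]

mu, df = expected_frequencies(example)
print(mu)  # each row: (# in row i) * (proportion in column j)
print(df)
```

Because both rows of this hypothetical table have the same total, the two rows of expected counts are identical, which is what "unrelated classifications" would look like.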


Example: 1970 Draft

Statistic                             df    Value     p–value
Pearson chi-square $X^2$              22    37.540    .02
Likelihood ratio chi-square $G^2$     22    38.669    .02

What’s the nature of the association?

Adjusted Residuals:

                  Drawing Number
Month     1–122    123–244    245–366
Jan        −.52       .64       −.12
Feb       −1.08       .93        .15
March     −2.11      −.15       2.27
April      −.80      −.83       1.63
May        −.52     −1.35       1.87
June        .42     −1.23        .82
July        .68     −1.35        .68
Aug        1.07     −1.35        .28
Sept        .01      2.00      −2.01
Oct        −.52      1.83      −1.32
Nov         .68      1.04      −1.72
Dec        2.67      −.15      −2.51

Explanation. . .
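
As a rough sketch of how a table of adjusted residuals can be computed, the code below assumes the usual definition, $(n_{ij} - \mu_{ij}) / \sqrt{\mu_{ij}(1 - p_{i+})(1 - p_{+j})}$ with $p_{i+} = n_{i+}/n$ and $p_{+j} = n_{+j}/n$; that this matches the definition used for the slide is an assumption. The Feb to Dec counts are copied from the draft table, while the Jan row (9, 12, 10) is inferred from the column totals of 122. The output should be close to the residuals above, but small differences are possible.

```python
import math

months = ["Jan", "Feb", "March", "April", "May", "June",
          "July", "Aug", "Sept", "Oct", "Nov", "Dec"]

# Rows: month of birth; columns: drawing number 1-122, 123-244, 245-366.
# The Jan row is inferred from the column totals, not copied from the table.
draft = [
    [9, 12, 10], [7, 12, 10], [5, 10, 16], [8, 8, 14],
    [9, 7, 15], [11, 7, 12], [12, 7, 12], [13, 7, 11],
    [10, 15, 5], [9, 15, 7], [12, 12, 6], [17, 10, 4],
]


def adjusted_residuals(table):
    """Adjusted residuals under the hypothesis of unrelated classifications."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out = []
        for j, n_ij in enumerate(row):
            mu = row_totals[i] * col_totals[j] / n
            var = mu * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out.append((n_ij - mu) / math.sqrt(var))
        result.append(out)
    return result


residuals = adjusted_residuals(draft)
for month, row in zip(months, residuals):
    print(f"{month:6s}" + "".join(f"{value:8.2f}" for value in row))
```

Large positive values in the first column for late months (and in the last column for early months) point to the pattern that the "Explanation" is about: later birth months tended to receive lower drawing numbers.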


Other Hypotheses

These can be either

■ Simpler than independence (example on the following slides).
■ More complex (e.g., symmetry and others, covered later in the semester).


Example of Other Hypothesis

The null hypothesis specifies the distribution of one or more of the margins.

Example (from Wickens, 1989): Suppose that there are 2 approaches to solving a problem and the answer is either correct or incorrect.

                   Answer
Method     Correct    Incorrect    Total
A                                  n/2  (proportion .5)
B                                  n/2  (proportion .5)
Total                              n

■ HO : Independence, and an equal number of students choose each method.
■ HA : Method and Answer are dependent and/or an unequal number of students choose each method.

Under HO , $n_{i+} = n/2$, so the expected frequencies are $\mu_{ij} = n_{i+} n_{+j}/n = n_{+j}/2$.
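
A minimal sketch of this hypothesis in Python follows. The counts are hypothetical (they are not from Wickens), and the helper names are made up for illustration; the only thing taken from the slide is that each expected frequency equals half of its column total.

```python
def expected_equal_methods(table):
    """Expected counts when rows are independent of columns and each of the
    two methods is chosen by half of the students: mu_ij = n_+j / 2."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[c / 2 for c in col_totals] for _ in table]


def pearson_x2(observed, expected):
    """Pearson statistic: sum over cells of (n_ij - mu_ij)^2 / mu_ij."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))


# Hypothetical counts: rows are Method A and B, columns are Correct and Incorrect.
counts = [[30, 10],
          [35, 25]]

expected = expected_equal_methods(counts)
x2 = pearson_x2(counts, expected)
print(expected)
print(round(x2, 3))
```

Note that the observed row totals (40 and 60) are not used to build the expected counts; the null hypothesis fixes the row proportions at .5 each, so departures from equal method choice also contribute to the statistic.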


Another Other Example

Testing Mendel’s theories of natural inheritance

Review:
  Y = yellow → dominant trait
  g = green → recessive trait

■ 1st generation: All plants have genotype Yg and phenotype is yellow.
■ 2nd generation: Possible genotypes and phenotypes are

Genotype    Phenotype    Probability (assuming random)
YY          yellow       25%
Yg          yellow       25%
gY          yellow       25%
gg          green        25%

Theory predicts that 75% will be yellow and 25% will be green.
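
To make the 75% / 25% prediction concrete, here is a small sketch of a Pearson test against fully specified probabilities. The observed counts are hypothetical, invented only for illustration; they are not Mendel's data.

```python
# Hypothetical second-generation counts (NOT Mendel's actual data).
observed = {"yellow": 760, "green": 240}

# Probabilities specified by the theory (the null hypothesis).
theory = {"yellow": 0.75, "green": 0.25}

n = sum(observed.values())
expected = {phenotype: n * p for phenotype, p in theory.items()}

# Pearson statistic: sum of (observed - expected)^2 / expected.
x2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

# No parameters are estimated, so df = (number of categories) - 1.
df = len(observed) - 1

print(expected)
print(round(x2, 4), df)
```

Here the null hypothesis is simpler than independence: it states the cell probabilities outright, so nothing has to be estimated from the data before computing the expected counts.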


Partitioning Chi-Square

Another way to investigate the nature of association

The sum of independent chi-squared statistics is itself a chi-squared statistic, with degrees of freedom equal to the sum of the degrees of freedom for the individual statistics.

For example, if

  $Z_1^2$ is chi-squared with $df_1 = 1$
  and $Z_2^2$ is chi-squared with $df_2 = 1$

then $(Z_1^2 + Z_2^2)$ is chi-squared with $df = df_1 + df_2 = 2$

. . . provided (of course) that $Z_1^2$ and $Z_2^2$ are independent.

“Partitioning chi-squared” uses this fact, but in reverse: We start with a chi-squared statistic with df > 1 and break it into component parts, each with df = 1.
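
The additivity property can be checked informally by simulation. The sketch below draws independent standard normal pairs, sums their squares, and compares the result with two known facts about a chi-squared variable with df = 2: its mean is 2 and its upper tail is $P(\chi^2_2 > x) = e^{-x/2}$. The sample size, seed, and function name are arbitrary choices made for this illustration.

```python
import math
import random


def simulate_sum_of_squares(n_draws=100_000, seed=0):
    """Draw Z1^2 + Z2^2 for independent standard normal Z1 and Z2."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 for _ in range(n_draws)]


draws = simulate_sum_of_squares()

mean = sum(draws) / len(draws)      # should be near df = 2
crit = 5.991                        # approximate .95 quantile for df = 2
tail = sum(d > crit for d in draws) / len(draws)

print(f"simulated mean      : {mean:.3f}")
print(f"simulated P(> 5.991): {tail:.4f}")
print(f"exact     P(> 5.991): {math.exp(-crit / 2):.4f}")
```

If the two squared terms were not independent (for example, if the same draw were used twice), the sum would no longer follow a chi-squared distribution with 2 degrees of freedom, which is why independence is required for partitioning.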


Partitioning Chi-Square by Example

Why partition? Partitioning chi–squared statistics helps to show that an association that was significant for the overall table primarily reflects differences between some categories and/or groups of categories.

We demonstrate the method by example, partitioning $G^2$ for a 3 × 3 table into (3 − 1)(3 − 1) = 4 parts.

Example: A sample of psychiatrists was classified with respect to their school of psychiatric thought and their beliefs about the origin of schizophrenia (Agresti, 1990; Gallagher et al., 1987).

School of                     Origin of Schizophrenia
Psychiatric Thought    Biogenic    Environmental    Combination
Eclectic                  90           12               78
Medical                   13            1                6
Psychoanalysis            19           13               50


Check for Relationship & Then Partition

First we check whether these two variables are independent.

Statistic    df    Value     p–value
$X^2$         4    22.378    < .001
$G^2$         4    23.036    < .001

School of                     Origin of Schizophrenia
Psychiatric Thought    Biogenic    Environmental    Combination
Eclectic                  90           12               78
Medical                   13            1                6
Psychoanalysis            19           13               50

Sub-table 1 (df = 1):
             Bio    Env
Eclectic      90     12      $G^2$ = .294
Medical       13      1      p-value = .59

Sub-table 2 (df = 1):
             Env    Com
Eclectic      12     78      $G^2$ = .005
Medical        1      6      p-value = .94

Sub-table 3 (df = 1):
             Bio    Env
Medical       13      1      $G^2$ = 6.100
Psychoan      19     13      p-value = .01

Sub-table 4 (df = 1):
             Env    Com
Medical        1      6      $G^2$ = .171
Psychoan      13     50      p-value = .68

But. . . .294 + .005 + 6.100 + .171 = 6.570 ≠ 23.036


Independent Component Tables

A general method proposed by Lancaster (1949): for each i = 2, . . . , I and j = 2, . . . , J, form the 2 × 2 table

$$
\begin{array}{c|c}
\sum_{a<i}\sum_{b<j} n_{ab} & \sum_{a<i} n_{aj} \\ \hline
\sum_{b<j} n_{ib} & n_{ij}
\end{array}
$$

Using this with our example:

School of                     Origin of Schizophrenia
Psychiatric Thought    Biogenic    Environmental    Combination
Eclectic                  90           12               78
Medical                   13            1                6
Psychoanalysis            19           13               50

Sub-Table 1 (df = 1):
             Bio    Env
Eclectic      90     12      $G^2$ = .294
Medical       13      1      $X^2$ = .264
                             $\hat{\theta}$ = .577

Sub-Table 2 (df = 1):
             Bio+Env    Com
Eclectic       102       78      $G^2$ = 1.359
Medical         14        6      $X^2$ = 1.314
                                 $\hat{\theta}$ = .560

Sub-Table 3 (df = 1):
             Bio    Env
Ecl+Med      103     13      $G^2$ = 12.953
Psychoan      19     13      $X^2$ = 14.989
                             $\hat{\theta}$ = 5.421

Sub-Table 4 (df = 1):
             Bio+Env    Com
Ecl+Med        116       84      $G^2$ = 8.430
Psychoan        32       50      $X^2$ = 8.397
                                 $\hat{\theta}$ = 2.158
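
The construction of the component tables can be sketched as follows. The function names are hypothetical, and the $G^2$ formula used is the usual likelihood ratio statistic for independence, $G^2 = 2\sum n_{ij}\log(n_{ij}/\mu_{ij})$; the counts are those of the psychiatrist example. The sketch builds the four sub-tables, computes $G^2$ and the sample odds ratio for each, and compares the sum of the components with $G^2$ for the full table.

```python
import math


def g2(table):
    """Likelihood ratio statistic for independence in a two-way table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    total = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij > 0:  # a zero count contributes nothing to the sum
                total += n_ij * math.log(n_ij * n / (rows[i] * cols[j]))
    return 2 * total


def lancaster_subtables(table):
    """2x2 component tables for i = 2..I and j = 2..J (0-based indices here)."""
    subtables = []
    for i in range(1, len(table)):
        for j in range(1, len(table[0])):
            top_left = sum(table[a][b] for a in range(i) for b in range(j))
            top_right = sum(table[a][j] for a in range(i))
            bottom_left = sum(table[i][b] for b in range(j))
            bottom_right = table[i][j]
            subtables.append([[top_left, top_right],
                              [bottom_left, bottom_right]])
    return subtables


def odds_ratio(t):
    """Sample odds ratio of a 2x2 table."""
    return (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])


# Rows: Eclectic, Medical, Psychoanalysis; columns: Bio, Env, Com.
table = [[90, 12, 78],
         [13, 1, 6],
         [19, 13, 50]]

subs = lancaster_subtables(table)
for k, sub in enumerate(subs, start=1):
    print(k, sub, f"G2 = {g2(sub):.3f}", f"odds ratio = {odds_ratio(sub):.3f}")

print(f"sum of components = {sum(g2(s) for s in subs):.3f}")
print(f"G2 for full table = {g2(table):.3f}")
```

Each sub-table compares a row and column with the pooled rows and columns that come before it, which is what keeps the components from overlapping.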


Description of Association

From Agresti (1990):

“The psychoanalytic school seems more likely than other schools to ascribe the origins of schizophrenia as being a combination. Of those who chose either the biogenic or environmental origin, members of the psychoanalytic school were somewhat more likely than the other schools to choose the environmental origin.”

With this partitioning, the likelihood ratio chi-squared statistics add up to $G^2$ for the full table:

.294 + 1.359 + 12.953 + 8.430 = 23.036

The Pearson $X^2$’s do not add up to the value for the full table:

.264 + 1.314 + 14.989 + 8.397 = 24.964 ≠ 22.378

. . . but this is OK because they are not supposed to add up exactly.


Necessary Conditions for Partitioning

You are not restricted to the method proposed by Lancaster; however, for partitioning to lead to a full decomposition of $G^2$, the following are necessary conditions (Agresti, 1990):

■ The degrees of freedom for the sub-tables must sum to the degrees of freedom for the original table.
■ Each cell count in the original table must be a cell in one and only one sub-table.
■ Each marginal total of the original table must be a marginal total for one and only one sub-table.

A better approach to studying the nature of association: estimating parameters that describe aspects of association, and models that represent association.


Summary Comments on Chi-Squared Tests

■ Chi–squared tests of no association only indicate how much evidence there is against HO .
■ Chi–squared tests are limited to “large” samples.
  ◆ As n increases relative to the size of the table, the distributions of $X^2$ and $G^2$ are better approximated by the chi–squared distribution.
  ◆ Since the sampling distributions of $X^2$ and $G^2$ are only approximated by chi–squared distributions, p–values should only be reported to 2 decimal places (3 at most).
  ◆ The distribution of $X^2$ converges faster to chi–squared than the distribution of $G^2$. (More about this later in the semester.)
  ◆ There are small-sample methods available: “exact tests”.
■ The tests that we’ve discussed have not used additional information that we may have about the variables.
■ In the case of ordinal variables, there are better methods.
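
As a small illustration of the advice on reporting p–values, the sketch below computes upper-tail chi-squared probabilities for the statistics reported in the draft and psychiatrist examples and rounds them to two decimals. It relies on the closed form $P(\chi^2_{2k} > x) = e^{-x/2}\sum_{i=0}^{k-1} (x/2)^i / i!$, which holds only for even degrees of freedom; the function name is hypothetical, and a statistics library would normally be used instead.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi-squared_df > x); valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    term = 1.0    # (x/2)^i / i! for i = 0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


reported = [
    ("1970 draft, X2", 37.540, 22),
    ("1970 draft, G2", 38.669, 22),
    ("psychiatrists, X2", 22.378, 4),
    ("psychiatrists, G2", 23.036, 4),
]

for label, value, df in reported:
    p = chi2_sf_even_df(value, df)
    shown = "< .001" if p < 0.001 else f"{p:.2f}"
    print(f"{label:20s} df = {df:2d}  p-value {shown}")
```

Printing more digits than this would suggest a precision that the large-sample approximation does not support.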
