
Hypothesis Testing:

Categorical Data
Analysis
Part 1

1
Z Test for Differences in
Two Proportions

2
Hypotheses for
Two Proportions

                     Research Questions
Hypothesis   No Difference /     Pop 1 ≥ Pop 2 /    Pop 1 ≤ Pop 2 /
             Any Difference      Pop 1 < Pop 2      Pop 1 > Pop 2

H0           p1 - p2 = 0         p1 - p2 ≥ 0        p1 - p2 ≤ 0
Ha           p1 - p2 ≠ 0         p1 - p2 < 0        p1 - p2 > 0

3
Z Test for Difference in Two
Proportions
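1. Assumptions
   Populations Are Independent
   Populations Follow Binomial Distribution
   Normal Approximation Can Be Used for
   Large Samples (All Expected Counts ≥ 5)
2. Z-Test Statistic for Two Proportions

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad \text{where } \hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$$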
4
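Sampling Distribution for Difference
Between Proportions

$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2;\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$

$$\hat{p}_1 - \hat{p}_2 \approx N\left(p_1 - p_2;\ \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}\right)$$

$$\hat{p}_1 - \hat{p}_2 \approx N\left(0;\ pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right) \text{ under } H_0: p_1 = p_2$$

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}, \quad \hat{q} = 1 - \hat{p}$$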
5
Z Test for Two Proportions
Thinking Challenge
You're an epidemiologist for the US
Department of Health and Human
Services. You're studying the
prevalence of disease X in two
states (MA and CA). In MA, 74 of
1500 people surveyed were
diseased and in CA, 129 of 1500
were diseased. At the .05 level, does
MA have a lower prevalence rate?
6
Z Test for Two Proportions
Solution*
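$$\hat{p}_{MA} = \frac{X_{MA}}{n_{MA}} = \frac{74}{1500} = .0493 \qquad \hat{p}_{CA} = \frac{X_{CA}}{n_{CA}} = \frac{129}{1500} = .0860$$

$$\hat{p} = \frac{X_{MA} + X_{CA}}{n_{MA} + n_{CA}} = \frac{74 + 129}{1500 + 1500} = .0677$$

$$Z = \frac{(.0493 - .0860) - 0}{\sqrt{.0677\,(1 - .0677)\left(\frac{1}{1500} + \frac{1}{1500}\right)}} = -4.00$$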
7
Z Test for Two Proportions
Solution*
H0: pMA - pCA = 0
Ha: pMA - pCA < 0
α = .05
nMA = 1500, nCA = 1500
Critical Value(s): Z = -1.645 (reject H0 if Z < -1.645; lower-tail area .05)
Test Statistic: Z = -4.00
Decision: Reject H0 at α = .05
Conclusion: There is evidence that the prevalence in MA is less than in CA
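The MA/CA calculation can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names `two_proportion_z` and `normal_cdf` are illustrative, and the lower-tail p-value is an addition computed from the standard normal CDF.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled Z statistic for H0: p1 - p2 = 0 (illustrative helper)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled proportion, valid under H0: p1 = p2
    p_pool = (x1 + x2) / (n1 + n2)
    std_error = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / std_error


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# MA: 74 of 1500 diseased; CA: 129 of 1500 diseased
z = two_proportion_z(74, 1500, 129, 1500)
p_value = normal_cdf(z)      # lower-tail test, Ha: pMA - pCA < 0
reject_h0 = z < -1.645       # critical value at the .05 level

print(f"Z = {z:.2f}, p-value = {p_value:.6f}, reject H0: {reject_h0}")
```

Using the pooled proportion in the standard error matches the test statistic on the slides; an unpooled standard error would be the choice for a confidence interval instead.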
8
χ² Test of Independence
Between 2 Categorical
Variables

9
χ² Test of Independence
1. Shows If a Relationship Exists Between 2
   Qualitative Variables, but Does Not Show
   Causality
2. Assumptions
   Multinomial Experiment
   All Expected Counts ≥ 5
3. Uses Two-Way Contingency Table

10
χ² Test of Independence
Contingency Table
1. Shows # Observations From 1 Sample
   Jointly in 2 Qualitative Variables

                  Residence (levels of variable 2)
Disease Status    Urban    Rural    Total
Disease              63       49      112
No disease           15       33       48
Total                78       82      160
(Rows of Disease Status are the levels of variable 1)
12
χ² Test of Independence
Hypotheses & Statistic
1. Hypotheses
   H0: Variables Are Independent
   Ha: Variables Are Related (Dependent)
2. Test Statistic

$$\chi^2 = \sum_{\text{all cells}} \frac{\left[n_{ij} - E(n_{ij})\right]^2}{E(n_{ij})}$$

   n_ij = Observed count, E(n_ij) = Expected count
   Degrees of Freedom: (r - 1)(c - 1), where r = # Rows and c = # Columns
13
χ² Test of Independence
Expected Counts
1. Statistical Independence Means Joint
   Probability Equals Product of Marginal
   Probabilities
2. Compute Marginal Probabilities & Multiply
   for Joint Probability
3. Expected Count Is Sample Size Times
   Joint Probability

14
Expected Count Example

                  Residence
Disease Status    Urban (Obs.)   Rural (Obs.)   Total
Disease                63             49         112
No Disease             15             33          48
Total                  78             82         160

Marginal probability (Disease row) = 112/160
Marginal probability (Urban column) = 78/160
Joint probability = (112/160) × (78/160)
Expected count = 160 × (112/160) × (78/160) = 54.6
15
Expected Count Calculation
Expected count = (Row total × Column total) / Sample size

                  Residence
                  Urban            Rural
Disease Status    Obs.    Exp.     Obs.    Exp.     Total
Disease            63     54.6      49     57.4      112
No Disease         15     23.4      33     24.6       48
Total              78     78        82     82        160

Disease / Urban = 112×78/160 = 54.6        Disease / Rural = 112×82/160 = 57.4
No Disease / Urban = 48×78/160 = 23.4      No Disease / Rural = 48×82/160 = 24.6
16
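The rule "row total × column total / sample size" can be applied to every cell at once. Below is a small sketch using the disease-by-residence table; the function name `expected_counts` is hypothetical and chosen only for illustration.

```python
def expected_counts(observed):
    """Expected cell counts under independence:
    (row total * column total) / sample size."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: Disease, No Disease; columns: Urban, Rural
observed = [[63, 49],
            [15, 33]]
expected = expected_counts(observed)

for label, row in zip(["Disease", "No Disease"], expected):
    print(label, [round(e, 1) for e in row])
```

Each row and column of the expected table sums to the same marginal total as the observed table, which is a quick sanity check on the arithmetic.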
χ² Test of Independence
Example on HIV
You randomly sample 286 sexually active
individuals and collect information on their HIV
status and history of STDs. At the .05 level, is
there evidence of a relationship?

              HIV
STDs Hx       No     Yes    Total
No            84      32     116
Yes           48     122     170
Total        132     154     286

17
χ² Test of Independence
Solution

E(nij) ≥ 5 in all cells

              HIV
              No               Yes
STDs Hx       Obs.    Exp.     Obs.    Exp.     Total
No             84     53.5      32     62.5      116
Yes            48     78.5     122     91.5      170
Total         132    132       154    154        286

No / No = 116×132/286 = 53.5       No / Yes = 116×154/286 = 62.5
Yes / No = 170×132/286 = 78.5      Yes / Yes = 170×154/286 = 91.5
18
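χ² Test of Independence
Solution

$$\chi^2 = \sum_{\text{all cells}} \frac{\left[n_{ij} - E(n_{ij})\right]^2}{E(n_{ij})}$$

$$= \frac{\left[n_{11} - E(n_{11})\right]^2}{E(n_{11})} + \frac{\left[n_{12} - E(n_{12})\right]^2}{E(n_{12})} + \cdots + \frac{\left[n_{22} - E(n_{22})\right]^2}{E(n_{22})}$$

$$= \frac{(84 - 53.5)^2}{53.5} + \frac{(32 - 62.5)^2}{62.5} + \cdots + \frac{(122 - 91.5)^2}{91.5} = 54.29$$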
19
χ² Test of Independence
Solution
H0: No Relationship
Ha: Relationship
α = .05
df = (2 - 1)(2 - 1) = 1
Critical Value(s): χ² = 3.841 (reject H0 if χ² > 3.841; upper-tail area .05)
Test Statistic: χ² = 54.29
Decision: Reject H0 at α = .05
Conclusion: There is evidence of a relationship
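The χ² statistic for the HIV example can be reproduced in a few lines. This is a sketch under stated assumptions: the function name `chi_square_independence` and its `round_expected` option are hypothetical. The option exists only because the slides round expected counts to one decimal before summing, which gives 54.29; unrounded expected counts give a slightly smaller value, and the decision is the same either way.

```python
def chi_square_independence(observed, round_expected=None):
    """Chi-square statistic and degrees of freedom for a two-way table.

    round_expected: optional number of decimals to round each expected
    count to before use (illustrative option, mimics hand calculation).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n
            if round_expected is not None:
                e_ij = round(e_ij, round_expected)
            chi2 += (n_ij - e_ij) ** 2 / e_ij
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Rows: STDs Hx No, Yes; columns: HIV No, Yes
hiv_table = [[84, 32],
             [48, 122]]

chi2_rounded, df = chi_square_independence(hiv_table, round_expected=1)
chi2_exact, _ = chi_square_independence(hiv_table)
critical_value = 3.841  # chi-square critical value for df = 1 at .05, from the slide

print(f"rounded expected counts: {chi2_rounded:.2f}")
print(f"exact expected counts:   {chi2_exact:.2f}")
print("reject H0:", chi2_exact > critical_value)
```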
20
Linear Regression and
Correlation Methods
Part 2

21
Regression Models

Relationship between one dependent variable and explanatory variable(s)
Use equation to set up relationship
   Numerical Dependent (Response) Variable
   1 or More Numerical or Categorical Independent
   (Explanatory) Variables
Used Mainly for Prediction & Estimation

22
Regression Modeling Steps
1. Hypothesize Deterministic Component
   Estimate Unknown Parameters
2. Specify Probability Distribution of Random Error Term
   Estimate Standard Deviation of Error
3. Evaluate the Fitted Model
4. Use Model for Prediction & Estimation

23
Linear Regression
Model

24
Linear Equations
Y = mX + b
m = Slope = (Change in Y) / (Change in X)
b = Y-intercept
[Figure: straight line drawn on X and Y axes]

1984-1994 T/Maker Co.

25
Linear Regression Model

1. Relationship Between Variables Is a Linear Function

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

   Y_i = Dependent (Response) Variable (e.g., CD+ c.)
   β0 = Population Y-Intercept
   β1 = Population Slope
   X_i = Independent (Explanatory) Variable (e.g., Years s. serocon.)
   ε_i = Random Error
Population & Sample
Regression Models

Population (Unknown Relationship):
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

Random Sample:
$$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$$

27
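Sample Linear Regression
Model

$$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\varepsilon}_i$$

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$

[Figure: fitted line on X and Y axes, marking an observed value, an unsampled observation, and the random error ε̂_i]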
28
Estimating Parameters:
Least Squares Method

29
Least Squares
1. Best Fit Means the Differences Between
   Actual Y Values & Predicted Y Values Are
   a Minimum. But Positive Differences Offset
   Negative Ones, So Square the Errors!

$$\sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$$

2. LS Minimizes the Sum of the Squared
   Differences (Errors) (SSE)
31
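Least Squares Graphically

$$\text{LS minimizes } \sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \hat{\varepsilon}_4^2$$

$$Y_2 = \hat{\beta}_0 + \hat{\beta}_1 X_2 + \hat{\varepsilon}_2$$

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$

[Figure: four observations on X and Y axes with the fitted line and the residuals ε̂1, ε̂2, ε̂3, ε̂4]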
32
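Coefficient Equations
Prediction equation
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$

Sample slope
$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

Sample Y-intercept
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$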
33
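Derivation of Parameters (1)
Least Squares (L-S):
Minimize squared error
$$\sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2$$

$$0 = \frac{\partial \sum \varepsilon_i^2}{\partial \beta_0} = \frac{\partial \sum \left(y_i - \beta_0 - \beta_1 x_i\right)^2}{\partial \beta_0} = -2\left(n\bar{y} - n\beta_0 - n\beta_1 \bar{x}\right)$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$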
34
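Derivation of Parameters (2)
Least Squares (L-S):
Minimize squared error
$$0 = \frac{\partial \sum \varepsilon_i^2}{\partial \beta_1} = \frac{\partial \sum \left(y_i - \beta_0 - \beta_1 x_i\right)^2}{\partial \beta_1} = -2\sum x_i \left(y_i - \beta_0 - \beta_1 x_i\right)$$

$$= -2\sum x_i \left(y_i - \bar{y} + \beta_1 \bar{x} - \beta_1 x_i\right)$$

$$\beta_1 \sum x_i (x_i - \bar{x}) = \sum x_i (y_i - \bar{y})$$

$$\beta_1 \sum (x_i - \bar{x})(x_i - \bar{x}) = \sum (x_i - \bar{x})(y_i - \bar{y})$$

$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}}$$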
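
The derivation replaces x_i with (x_i - x̄) on both sides without comment. A short justification, added here as a supporting step rather than taken from the slides, is that deviations from a mean sum to zero, so subtracting x̄ times such a sum changes nothing:

```latex
\sum_{i=1}^{n} \bar{x}\,(x_i - \bar{x}) = \bar{x}\left(\sum_{i=1}^{n} x_i - n\bar{x}\right) = 0
\quad\Longrightarrow\quad
\sum_{i=1}^{n} x_i (x_i - \bar{x}) = \sum_{i=1}^{n} (x_i - \bar{x})^2 = SS_{xx}

\sum_{i=1}^{n} \bar{x}\,(y_i - \bar{y}) = \bar{x}\left(\sum_{i=1}^{n} y_i - n\bar{y}\right) = 0
\quad\Longrightarrow\quad
\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = SS_{xy}
```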

35
Computation Table

Xi      Yi      Xi²      Yi²      XiYi
X1      Y1      X1²      Y1²      X1Y1
X2      Y2      X2²      Y2²      X2Y2
:       :       :        :        :
Xn      Yn      Xn²      Yn²      XnYn
ΣXi     ΣYi     ΣXi²     ΣYi²     ΣXiYi
36
Interpretation of Coefficients
1. Slope ($\hat{\beta}_1$)
   Estimated Y Changes by $\hat{\beta}_1$ for Each 1 Unit
   Increase in X
   If $\hat{\beta}_1$ = 2, then Y Is Expected to Increase by 2 for
   Each 1 Unit Increase in X
2. Y-Intercept ($\hat{\beta}_0$)
   Average Value of Y When X = 0
   If $\hat{\beta}_0$ = 4, then Average Y Is Expected to Be
   4 When X Is 0

37
Parameter Estimation Example
Obstetrics: What is the relationship between
Mother's Estriol level & Birthweight using the
following data?
Estriol Birthweight
(mg/24h) (g/1000)
1 1
2 1
3 2
4 2
5 4

38
Scatterplot
Birthweight vs. Estriol level

[Scatterplot: Birthweight (vertical axis, 0 to 4) against Estriol level (horizontal axis, 0 to 6)]

39
Parameter Estimation Solution
Table
        Xi     Yi     Xi²    Yi²    XiYi
         1      1       1      1       1
         2      1       4      1       2
         3      2       9      4       6
         4      2      16      4       8
         5      4      25     16      20
Σ       15     10      55     26      37
40
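Parameter Estimation Solution

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \frac{37 - \frac{(15)(10)}{5}}{55 - \frac{(15)^2}{5}} = 0.70$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 2 - (0.70)(3) = -0.10$$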
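The computation-table approach translates directly into code. Here is a minimal sketch using the estriol and birthweight data from the example; the function name `least_squares_from_sums` is illustrative and not from the slides.

```python
def least_squares_from_sums(xs, ys):
    """Slope and intercept from the column sums of the computation table."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Numerator and denominator of the slope formula
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n
    slope = ss_xy / ss_xx
    intercept = sum_y / n - slope * sum_x / n
    return slope, intercept


estriol = [1, 2, 3, 4, 5]        # mg/24h
birthweight = [1, 1, 2, 2, 4]    # g/1000

slope, intercept = least_squares_from_sums(estriol, birthweight)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Only the four sums ΣX, ΣY, ΣX², and ΣXY are needed for the coefficients; the ΣY² column of the table is not used at this stage.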
41
Coefficient Interpretation
Solution
1. Slope ($\hat{\beta}_1$)
   Birthweight (Y) Is Expected to Increase by .7
   Units for Each 1 Unit Increase in Estriol (X)
2. Y-Intercept ($\hat{\beta}_0$)
   Average Birthweight (Y) Is -.10 Units When
   Estriol Level (X) Is 0
   Difficult to explain
   The birthweight should always be positive

42
Parameter Estimation Thinking
Challenge
You're a Vet epidemiologist for the county
cooperative. You gather the following data:
Food (lb.)    Milk yield (lb.)
 4            3.0
 6            5.5
10            6.5
12            9.0
What is the relationship between cows' food intake and milk yield?

43
Scattergram
Milk Yield vs. Food intake*

[Scatterplot: Milk yield in lb. (vertical axis, 0 to 10) against Food intake in lb. (horizontal axis, 0 to 15)]

44
Parameter Estimation Solution
Table*
        Xi     Yi      Xi²     Yi²       XiYi
         4     3.0      16      9.00       12
         6     5.5      36     30.25       33
        10     6.5     100     42.25       65
        12     9.0     144     81.00      108
Σ       32    24.0     296    162.50      218

45
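Parameter Estimation Solution*

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}}{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}} = \frac{218 - \frac{(32)(24)}{4}}{296 - \frac{(32)^2}{4}} = 0.65$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 6 - (0.65)(8) = 0.80$$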
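As a cross-check, the same coefficients follow from the deviation form SSxy / SSxx given under Coefficient Equations. The sketch below applies it to the milk-yield data; the function name `least_squares_deviation_form` is hypothetical.

```python
def least_squares_deviation_form(xs, ys):
    """Slope = SS_xy / SS_xx using deviations from the sample means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept, ss_xy, ss_xx


food = [4, 6, 10, 12]            # lb.
milk = [3.0, 5.5, 6.5, 9.0]      # lb.

b1, b0, ss_xy, ss_xx = least_squares_deviation_form(food, milk)
print(f"SS_xy = {ss_xy}, SS_xx = {ss_xx}")
print(f"slope = {b1:.2f}, intercept = {b0:.2f}")
```

Here SSxy = 218 - (32)(24)/4 = 26 and SSxx = 296 - 32²/4 = 40, so the deviation form and the computation-table form agree.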
46
Coefficient Interpretation
Solution*
1. Slope ($\hat{\beta}_1$)
   Milk Yield (Y) Is Expected to Increase by
   .65 lb. for Each 1 lb. Increase in Food Intake
   (X)
2. Y-Intercept ($\hat{\beta}_0$)
   Average Milk Yield (Y) Is Expected to Be 0.8
   lb. When Food Intake (X) Is 0

47
Analysis of Variance

Part 3

48
Analysis of Variance
An analysis of variance is a technique that
partitions the total sum of squares of
deviations of the observations about their
mean into portions associated with
independent variables in the experiment
and a portion associated with error.

49
Analysis of Variance
The ANOVA table was previously
discussed in the context of regression
models with quantitative independent
variables. In this chapter the focus will be
on nominal independent variables (factors).

50
Analysis of Variance

A factor refers to a categorical
quantity under examination in an
experiment as a possible cause of
variation in the response variable.

51
Analysis of Variance

Levels refer to the categories,
measurements, or strata of a factor of
interest in the experiment.

52
One-Way ANOVA F-Test

1. Tests the Equality of 2 or More (p)
   Population Means
2. Variables
   One Nominal Independent Variable
   One Continuous Dependent Variable

53
One-Way ANOVA F-Test
Assumptions
1. Randomness & Independence of Errors
2. Normality
Populations (for each condition) are
Normally Distributed
3.Homogeneity of Variance
Populations (for each condition) have Equal
Variances

54
One-Way ANOVA F-Test
Hypotheses
H0: μ1 = μ2 = μ3 = ... = μp
   All Population Means Are Equal
   No Treatment Effect
Ha: Not All μj Are Equal
   At Least 1 Pop. Mean Is Different
   Treatment Effect
   NOT μ1 ≠ μ2 ≠ ... ≠ μp

55
One-Way ANOVA F-Test
Hypotheses
H0: μ1 = μ2 = μ3 = ... = μp
   All Population Means Are Equal
   No Treatment Effect
   [Figure: density f(X) against X with μ1 = μ2 = μ3]
Ha: Not All μj Are Equal
   At Least 1 Pop. Mean Is Different
   Treatment Effect
   NOT μ1 = μ2 = ... = μp
   Or μi ≠ μj for some i, j.
   [Figure: density f(X) against X with μ1 = μ2 ≠ μ3]
56
One-Way ANOVA
Basic Idea
1. Compares 2 Types of Variation to Test
Equality of Means
2. If Treatment Variation Is Significantly
Greater Than Random Variation then
Means Are Not Equal
3.Variation Measures Are Obtained by
Partitioning Total Variation

57
One-Way ANOVA
Partitions Total Variation

Total variation = Variation due to treatment + Variation due to random sampling

Variation due to treatment:
   Sum of Squares Among
   Sum of Squares Between
   Sum of Squares Treatment (SST)
   Among Groups Variation
Variation due to random sampling:
   Sum of Squares Within
   Sum of Squares Error (SSE)
   Within Groups Variation
58
Total Variation


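$$SS(\text{Total}) = \left(Y_{11} - \bar{Y}\right)^2 + \left(Y_{21} - \bar{Y}\right)^2 + \cdots + \left(Y_{ij} - \bar{Y}\right)^2$$

[Figure: Response, Y plotted for Group 1, Group 2, Group 3]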


59
Treatment Variation

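$$SST = n_1\left(\bar{Y}_1 - \bar{Y}\right)^2 + n_2\left(\bar{Y}_2 - \bar{Y}\right)^2 + \cdots + n_p\left(\bar{Y}_p - \bar{Y}\right)^2$$

[Figure: Response, Y plotted for Group 1, Group 2, Group 3, marking the group means Ȳ1, Ȳ2, Ȳ3 and the overall mean Ȳ]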


60
Random (Error) Variation

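$$SSE = \left(Y_{11} - \bar{Y}_1\right)^2 + \left(Y_{21} - \bar{Y}_1\right)^2 + \cdots + \left(Y_{pj} - \bar{Y}_p\right)^2$$

[Figure: Response, Y plotted for Group 1, Group 2, Group 3, marking the group means Ȳ1, Ȳ2, Ȳ3]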


61
One-Way ANOVA F-Test
Test Statistic
1. Test Statistic

$$F = \frac{MST}{MSE} = \frac{SST / (p - 1)}{SSE / (n - p)}$$

   MST Is Mean Square for Treatment
   MSE Is Mean Square for Error
2. Degrees of Freedom
   ν1 = p - 1
   ν2 = n - p
   p = # Populations, Groups, or Levels
   n = Total Sample Size
62
One-Way ANOVA
Summary Table

Source of     Degrees of    Sum of                   Mean Square            F
Variation     Freedom       Squares                  (Variance)
Treatment     p - 1         SST                      MST = SST/(p - 1)      MST/MSE
Error         n - p         SSE                      MSE = SSE/(n - p)
Total         n - 1         SS(Total) = SST + SSE

63
One-Way ANOVA F-Test
Critical Value
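If means are equal, F = MST / MSE ≈ 1.
Only reject large F!
Reject H0 if F > Fα(p - 1, n - p); otherwise Do Not Reject H0.
Always One-Tail!
[Figure: F distribution starting at 0, with the rejection region of area α in the upper tail beyond Fα(p - 1, n - p)]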

64
One-Way ANOVA F-Test
Example
As a vet epidemiologist you
want to see if 3 food
supplements have different
mean milk yields. You
assign 15 cows, 5 per food
supplement.
Question: At the .05 level, is
there a difference in mean
yields?

Food1    Food2    Food3
25.40    23.40    20.00
26.31    21.80    22.20
24.10    23.50    19.75
23.74    22.75    20.60
25.10    21.60    20.40

65
One-Way ANOVA F-Test
Solution
H0: μ1 = μ2 = μ3
Ha: Not All Equal
α = .05
ν1 = 2, ν2 = 12
Critical Value(s): F = 3.89 (reject H0 if F > 3.89; upper-tail area .05)
Test Statistic: F = MST / MSE = 23.5820 / .9211 = 25.6
Decision: Reject H0 at α = .05
Conclusion: There Is Evidence Pop. Means Are Different
66
Summary Table
Solution
Source of     Degrees of       Sum of       Mean Square     F
Variation     Freedom          Squares      (Variance)
Food          3 - 1 = 2        47.1640      23.5820         25.60
Error         15 - 3 = 12      11.0532        .9211
Total         15 - 1 = 14      58.2172
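
The summary table can be reproduced from the raw milk yields. What follows is a minimal sketch of the one-way ANOVA partition; the function name `one_way_anova` and the dictionary keys are illustrative, and the critical value 3.89 is taken from the solution slide rather than computed.

```python
def one_way_anova(groups):
    """Partition total variation into treatment (SST) and error (SSE)."""
    p = len(groups)                          # number of groups (levels)
    n = sum(len(g) for g in groups)          # total sample size
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    sst = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))
    sse = sum((y - m) ** 2
              for g, m in zip(groups, group_means) for y in g)
    mst = sst / (p - 1)
    mse = sse / (n - p)
    return {"df_treatment": p - 1, "df_error": n - p,
            "SST": sst, "SSE": sse, "MST": mst, "MSE": mse,
            "F": mst / mse}


food1 = [25.40, 26.31, 24.10, 23.74, 25.10]
food2 = [23.40, 21.80, 23.50, 22.75, 21.60]
food3 = [20.00, 22.20, 19.75, 20.60, 20.40]

anova = one_way_anova([food1, food2, food3])
critical_value = 3.89  # F(.05; 2, 12), from the solution slide

for key, value in anova.items():
    print(f"{key}: {value:.4f}" if isinstance(value, float) else f"{key}: {value}")
print("reject H0:", anova["F"] > critical_value)
```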

67
