Compare the blood pressure of patients taking a drug to that of patients not taking the drug
Two treatments: Drug vs. No drug
Drug    No drug
110     130
115     145
120     145
125     150
130     150
135     170
140     175

[Chart: blood pressure (y-axis) by patient number (x-axis)]
The p-value for the t-test tells us how likely it is that we would see a difference at least as large as the one observed
if the drug in fact had no effect.
Expressed another way:
In the absence of any effect of the drug, the p-value is the probability that random differences
in sampling the two groups would, on their own, produce a difference between the groups this large or larger.
A small p-value therefore means that chance alone is an unlikely explanation for the observed difference.
Using the Excel ttest workbook function:
=TTEST(A7:A13,B7:B13,2,2)
t-test p-value = 0.00253

t Stat                 -3.8
P(T<=t) one-tail       0.001265
t Critical one-tail    1.782287
P(T<=t) two-tail       0.00253
t Critical two-tail    2.178813
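To make the "t Stat" entry concrete, here is a minimal sketch that recomputes it from the worksheet data using only the Python standard library. It assumes the equal-variance (pooled) form of the two-sample t-test, which is what the last argument (type 2) of the TTEST formula above selects. The function name pooled_t is illustrative and is not part of Excel or of any library.

```python
from math import sqrt
from statistics import mean, variance

# Blood pressure readings from the worksheet
drug = [110, 115, 120, 125, 130, 135, 140]
no_drug = [130, 145, 145, 150, 150, 170, 175]

def pooled_t(x, y):
    """Two-sample t statistic and degrees of freedom, assuming equal variances."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    # Standard error of the difference between the two means
    se = sqrt(sp2 * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, df

t_stat, df = pooled_t(drug, no_drug)
print(f"t = {t_stat:.4f}, df = {df}")  # t = -3.8000, df = 12
```

The p-value itself requires the t distribution, which the standard library does not provide; a statistics package would be needed for that step.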
Single factor ANOVA can do the same thing as the t-test to determine if two groups (two treatments) are different.
Compare the blood pressure of patients taking a drug to that of patients not taking the drug
Two treatments: Drug vs. No drug
Drug    No drug
110     130
115     145
120     145
125     150
130     150
135     170
140     175

Groups     Count    Sum     Average     Variance
Drug       7        875     125         116.6667
No drug    7        1065    152.1429    240.4762

ANOVA
Source of Variation    SS          df    MS          F        P-value    F crit
Between Groups         2578.571    1     2578.571    14.44    0.00253    4.747225
Within Groups          2142.857    12    178.5714
Total                  4721.429    13
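The ANOVA table above can be rebuilt by hand. The following is a minimal sketch, assuming the usual single-factor decomposition into between-group and within-group sums of squares; the function name one_way_anova is illustrative. With only two groups, F equals the square of the t statistic (14.44 = 3.8 squared), which is why the ANOVA p-value matches the two-tailed t-test p-value.

```python
from statistics import mean

drug = [110, 115, 120, 125, 130, 135, 140]
no_drug = [130, 145, 145, 150, 150, 170, 175]

def one_way_anova(groups):
    """Return SS between, SS within, both degrees of freedom, and F."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    # Between groups: how far each group mean lies from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within groups: scatter of each value around its own group mean
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat

ss_b, ss_w, df_b, df_w, f_stat = one_way_anova([drug, no_drug])
print(f"SS between = {ss_b:.3f}, SS within = {ss_w:.3f}")
print(f"df = ({df_b}, {df_w}), F = {f_stat:.2f}")
```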
Use single-factor (1-way) Analysis of variance (ANOVA) to determine if two or more groups are different
Compare the blood pressure of patients taking three different drugs, Drug A, B and C.
Drug A    Drug B    Drug C
110       105       130
115       115       145
120       125       145
125       125       150
130       125       150
135       140       170
140       140       175

[Chart: readings for Drug A, Drug B and Drug C plotted by patient]
The p-value for the ANOVA tells us how likely it is that we would see differences between the groups
at least as large as those observed if none of the treatments had any effect,
that is, if the differences were just due to random differences in sampling the groups.
If at least one group mean differs enough from the others, ANOVA gives us a small p-value.
ANOVA does not tell us which group is the different one.
Using Menu: Tools / DataAnalysis / ANOVA Single Factor
SUMMARY
Groups    Count    Sum     Average     Variance
Drug A    7        875     125         116.6667
Drug B    7        875     125         158.3333
Drug C    7        1065    152.1429    240.4762

ANOVA
Source of Variation    SS          df    MS          F           P-value     F crit
Between Groups         3438.095    2     1719.048    10.00462    0.001198    3.554561
Within Groups          3092.857    18    171.8254
Total                  6530.952    20
Looking at the SUMMARY table, we notice that the average for Drug C is 152.1429, while
the average for each of the other two groups is 125.
The ANOVA table tells us that the P-value is 0.001198, which means that it is very unlikely
we would see this big a difference between the three groups just by chance.
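The P-value in the table can be checked without a statistics package, as a sketch under one stated assumption: when the Between Groups df is exactly 2, the upper tail of the F distribution has the closed form (1 + 2F/df_within)^(-df_within/2). This shortcut does not apply to other numerators; in general a statistics library is needed. The function name f_pvalue_df1_2 is illustrative.

```python
def f_pvalue_df1_2(f_stat, df_within):
    """Upper-tail probability of an F(2, df_within) distribution.

    Valid only when the numerator (between-groups) degrees of freedom equal 2.
    """
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)

# F and Within Groups df taken from the ANOVA table for Drugs A, B and C
p_value = f_pvalue_df1_2(10.00462, 18)
print(f"P-value = {p_value:.6f}")
```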
Use two-factor (2-way) Analysis of variance to determine if either of two factors affects the outcomes
Suppose we think that two factors, age and drug, may affect the patient's response
Use the Excel menu: Tools/Data Analysis/ ANOVA 2-factor without replication
               Factor: Drug
Factor: Age    Drug A    Drug B    Drug C
Under 21       118       128       135
21 to 55       120       130       136
Over 55        121       130       134

            Count    Sum    Average     Variance
Under 21    3        381    127         73
21 to 55    3        386    128.6667    65.33333
Over 55     3        385    128.3333    44.33333
Drug A      3
Drug B      3
Drug C      3

ANOVA
Source of Variation    SS              df    MS          F           P-value     F crit
Rows                   4.6666666667    2     2.333333    2           0.25        6.944272
Columns                360.66666667    2     180.3333    154.5714    0.000163    6.944272
Error                  4.6666666667    4     1.166667
Total                  370             8
The analysis indicates that there is a significant difference among the columns (Factor: Drug), with a p-value of 0.000163
The analysis indicates that there is NOT a significant difference among the rows (Factor: age), with a p-value of 0.25
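As an illustration of where the Rows, Columns and Error lines come from, here is a minimal sketch of the two-factor calculation with one observation per cell. It assumes the standard decomposition in which whatever is left after removing the row and column effects is treated as error. The function name and the dictionary keys are illustrative.

```python
from statistics import mean

# Rows are age groups; columns are Drug A, Drug B, Drug C (worksheet data)
table = [
    [118, 128, 135],  # Under 21
    [120, 130, 136],  # 21 to 55
    [121, 130, 134],  # Over 55
]

def two_factor_without_replication(table):
    n_rows, n_cols = len(table), len(table[0])
    values = [v for row in table for v in row]
    grand_mean = mean(values)
    row_means = [mean(row) for row in table]
    col_means = [mean(col) for col in zip(*table)]
    ss_rows = n_cols * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    # With one value per cell, the remainder serves as the error term
    ss_error = ss_total - ss_rows - ss_cols
    df_rows, df_cols = n_rows - 1, n_cols - 1
    ms_error = ss_error / (df_rows * df_cols)
    return {
        "SS rows": ss_rows,
        "SS columns": ss_cols,
        "SS error": ss_error,
        "F rows": (ss_rows / df_rows) / ms_error,
        "F columns": (ss_cols / df_cols) / ms_error,
    }

result = two_factor_without_replication(table)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```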
We might be concerned that we only treated three people with each drug, and feel that we would like more replicates.
Suppose we think that two factors, age and drug, may affect the patient's response
               Factor: Drug
Factor: Age    Drug A    Drug B    Drug C
Under 21       118       128       135
               117       126       136
               110       125       130
               118       131       135
21 to 55       120       130       136
               118       132       136
               121       129       140
               124       130       131
Over 55        121       130       134
               122       128       131
               119       130       138
               127       135       140
Use the Excel menu: Tools/Data Analysis/ ANOVA 2-factor with replication
Anova: Two-Factor With Replication

SUMMARY     Drug A      Drug B      Drug C      Total
Under 21
Count       4           4           4           12
Sum         463         510         536         1509
Average     115.75      127.5       134         125.75
Variance    14.91667    7           7.333333    70.20455

21 to 55
Count       4           4           4           12
Sum         483         521         543         1547
Average     120.75      130.25      135.75      128.9167
Variance    6.25        1.583333    13.58333    47.7197

Over 55
Count       4           4           4           12
Sum         489         523         543         1555
Average     122.25      130.75      135.75      129.5833
Variance    11.58333    8.916667    16.25       43.90152

Total
Count       12          12          12
Sum         1435        1554        1622
Average     119.5833    129.5       135.1667
Variance    17.35606    7           10.87879

ANOVA
Source of Variation    SS          df    MS          F           P-value     F crit
Sample                 100.6667    2     50.33333    5.182078    0.012453    3.354131
Columns                1493.167    2     746.5833    76.86463    7.14E-12    3.354131
Interaction            24.66667    4     6.166667    0.63489     0.641998    2.727765
Within                 262.25      27    9.712963
Total                  1880.75     35
The analysis indicates that there is NOT an interaction between the rows (Factor: age) and the columns (Factor: Drug), with a p-value of 0.64
The analysis indicates that there is a significant difference among the columns (Factor: Drug), with a p-value of 7.14E-12
The analysis indicates that there is a significant difference among the rows (Factor: age), with a p-value of 0.012
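With several readings per cell, the scatter inside each cell gives a direct estimate of error, so the interaction can be tested separately. The sketch below recomputes the three F values for the age by drug data above. It assumes a balanced design (the same number of readings in every cell); the function name and the dictionary layout are illustrative.

```python
from statistics import mean

# cells[age group][drug] holds the replicate readings from the worksheet
cells = {
    "Under 21": {"Drug A": [118, 117, 110, 118],
                 "Drug B": [128, 126, 125, 131],
                 "Drug C": [135, 136, 130, 135]},
    "21 to 55": {"Drug A": [120, 118, 121, 124],
                 "Drug B": [130, 132, 129, 130],
                 "Drug C": [136, 136, 140, 131]},
    "Over 55": {"Drug A": [121, 122, 119, 127],
                "Drug B": [130, 128, 130, 135],
                "Drug C": [134, 131, 138, 140]},
}

def two_factor_with_replication(cells):
    rows = list(cells)
    cols = list(cells[rows[0]])
    n = len(cells[rows[0]][cols[0]])  # readings per cell; balanced design assumed
    grand = mean([v for r in rows for c in cols for v in cells[r][c]])
    row_mean = {r: mean([v for c in cols for v in cells[r][c]]) for r in rows}
    col_mean = {c: mean([v for r in rows for v in cells[r][c]]) for c in cols}
    cell_mean = {(r, c): mean(cells[r][c]) for r in rows for c in cols}

    ss_sample = n * len(cols) * sum((row_mean[r] - grand) ** 2 for r in rows)
    ss_columns = n * len(rows) * sum((col_mean[c] - grand) ** 2 for c in cols)
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    # Interaction: variation among cell means not explained by rows or columns
    ss_interaction = ss_cells - ss_sample - ss_columns
    ss_within = sum((v - cell_mean[r, c]) ** 2
                    for r in rows for c in cols for v in cells[r][c])

    df_sample, df_columns = len(rows) - 1, len(cols) - 1
    df_interaction = df_sample * df_columns
    df_within = len(rows) * len(cols) * (n - 1)
    ms_within = ss_within / df_within
    return {
        "Sample": (ss_sample / df_sample) / ms_within,
        "Columns": (ss_columns / df_columns) / ms_within,
        "Interaction": (ss_interaction / df_interaction) / ms_within,
    }

f_values = two_factor_with_replication(cells)
for source, f_stat in f_values.items():
    print(f"{source}: F = {f_stat:.4f}")
```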
Suppose we think that two factors, age and drug, may affect the patient's response
               Factor: Drug
Factor: Age    Drug A    Drug B
Under 21       118       120
               118       118
               110       121
               118       123
Average        116       120.5      Drug B > Drug A if under 21
21 to 55       120       118
               118       117
               121       110
               125       118
Average        121       115.75     Drug B < Drug A if 21 to 55

            Drug A      Drug B      Total
Under 21
Count       4           4           8
Sum         464         482         946
Average     116         120.5       118.25
Variance    16          4.333333    14.5

21 to 55
Count       4           4           8
Sum         484         463         947
Average     121         115.75      118.375
Variance    8.666667    14.91667    17.98214

Total
Count       8           8
Sum         948         945
Average     118.5       118.125
Variance    17.71429    14.69643

ANOVA
Source of Variation    SS          df    MS          F           P-value     F crit
Sample                 0.0625      1     0.0625      0.005693    0.9411      4.747225
Columns                0.5625      1     0.5625      0.051233    0.82474     4.747225
Interaction            95.0625     1     95.0625     8.658444    0.012314    4.747225
Within                 131.75      12    10.97917
Total                  227.4375    15
The analysis indicates that there IS an interaction between the rows (Factor: age) and the columns (Factor: Drug), with a p-value of 0.012
The analysis indicates that there is NOT a significant difference among the columns (Factor: Drug), with a p-value of 0.82
The analysis indicates that there is NOT a significant difference among the rows (Factor: age), with a p-value of 0.94
Is it correct to conclude that neither age nor drug has any effect?
How should we analyze the data to determine the effect(s), if any, of age and drug?
What would we have concluded if we had not tested for the interaction between age and drug?
Because we have a significant interaction, we have to look at the effect of the drug separately in each age group
Under 21    Drug A    Drug B
            118       120
            118       118
            110       121
            118       123
Average     116       120.5      Drug B > Drug A if under 21

21 to 55    Drug A    Drug B
            120       118
            118       117
            121       110
            125       118
Average     121       115.75     Drug B < Drug A if 21 to 55
If we do not test for interaction, we would conclude that the drug has no effect.
If we test for interaction, we learn that the drug has different effects in different age groups:
the interaction is significant, and the difference between the drugs runs in opposite directions in the two age groups.
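A short sketch can make the cancellation visible. It computes the difference in average blood pressure (Drug B minus Drug A) within each age group and again after pooling both age groups. The variable names are illustrative; the numbers are the worksheet readings.

```python
from statistics import mean

# Readings from the worksheet, grouped by age group and drug
data = {
    "Under 21": {"Drug A": [118, 118, 110, 118], "Drug B": [120, 118, 121, 123]},
    "21 to 55": {"Drug A": [120, 118, 121, 125], "Drug B": [118, 117, 110, 118]},
}

# Difference in means (Drug B minus Drug A) within each age group
within_group = {age: mean(d["Drug B"]) - mean(d["Drug A"])
                for age, d in data.items()}

# The same difference when the age groups are pooled together
pooled_a = [v for d in data.values() for v in d["Drug A"]]
pooled_b = [v for d in data.values() for v in d["Drug B"]]
overall = mean(pooled_b) - mean(pooled_a)

for age, diff in within_group.items():
    print(f"{age}: Drug B - Drug A = {diff:+.2f}")
print(f"All ages pooled: Drug B - Drug A = {overall:+.3f}")
# Under 21: Drug B - Drug A = +4.50
# 21 to 55: Drug B - Drug A = -5.25
# All ages pooled: Drug B - Drug A = -0.375
```

The two within-group differences have opposite signs and nearly cancel when pooled, which is why the Columns line of the ANOVA shows almost no effect of drug.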
If you have a missing value, the number of observations is no longer the same in every treatment condition.
This situation is called an unbalanced design.
Excel cannot work on unbalanced designs.
You get an error message saying "Input range contains non-numeric data", because one of the cells is empty.
If you have an unbalanced design (missing values), use another statistics package, such as R, that will handle them.
Example: missing value in lower right cell of data.
               Factor: Drug
Factor: Age    Drug A    Drug B    Drug C
Under 21       118       128       135
               117       126       136
               110       125       130
               118       131       135
21 to 55       120       130       136
               118       132       136
               121       129       140
               124       130       131
Over 55        121       130       134
               122       128       131
               119       130       138
               127       135
PatientID    Drug
P1           Drug A
P2           Drug A
P3           Drug A
P4           Drug A
P5           Drug A
P6           Drug A
P7           Drug A
P8           Drug A
P9           Drug A
P10          Drug A
P11          Drug A
P12          Drug A
P13          Drug B
P14          Drug B
P15          Drug B
P16          Drug B
P17          Drug B
P18          Drug B
P19          Drug B
P20          Drug B
P21          Drug B
P22          Drug B
P23          Drug B
P24          Drug B
P25          Drug C
P26          Drug C
P27          Drug C
P28          Drug C
P29          Drug C
P30          Drug C
P31          Drug C
P32          Drug C
P33          Drug C
P34          Drug C
P35          Drug C
P36          Drug C
In Excel, we perform a repeated measures ANOVA using the Data Analysis menu item "ANOVA: Two-factor without replication".
Essentially, we consider treatment (the drug) to be one factor, and patient is the second factor.
PatientID    Drug A    Drug B    Drug C
1            118       110       125
2            117       126       117
3            110       125       110
4            118       121       131
5            120       130       130
6            118       132       132
7            121       119       129
8            124       130       130
9            121       130       118
10           122       128       121
11           119       130       130
12           127       135       135

PatientID    Count    Sum    Average     Variance
1            3        353    117.6667    56.33333
2            3        360    120         27
3            3        345    115         75
4            3        370    123.3333    46.33333
5            3        380    126.6667    33.33333
6            3        382    127.3333    65.33333
7            3        369    123         28
8            3        384    128         12
9            3        369    123         39
10           3        371    123.6667    14.33333
11           3        379    126.3333    40.33333
12           3        397    132.3333    21.33333
Drug A       12
Drug B       12
Drug C       12

ANOVA
Source of Variation    SS                 df    MS          F           P-value         F crit
Rows                   745.6388888889     11    67.78535    2.550889    0.0296064203    2.258518
Columns                332.0555555556     2     166.0278    6.247933    0.0070992405    3.443357
Error                  584.6111111111     22    26.57323
Total                  1662.3055555556    35
The ANOVA p-value for patients (rows) is 0.0296, which indicates that patients differ.
The ANOVA p-value for the drugs (columns) is 0.0071, which indicates that the drugs differ.
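One way to see why patient is entered as a second factor is to compare the F value for drug with and without the patient (Rows) term. The sketch below does this for the worksheet data. The "unblocked" figure is an illustration added here, not something the workbook computes: it folds the patient-to-patient variation back into the error term, as a single-factor ANOVA on the three drug columns would. Variable names are illustrative.

```python
from statistics import mean

# One row per patient: (Drug A, Drug B, Drug C) readings from the worksheet
readings = [
    (118, 110, 125), (117, 126, 117), (110, 125, 110), (118, 121, 131),
    (120, 130, 130), (118, 132, 132), (121, 119, 129), (124, 130, 130),
    (121, 130, 118), (122, 128, 121), (119, 130, 130), (127, 135, 135),
]
n_patients, n_drugs = len(readings), len(readings[0])
values = [v for row in readings for v in row]
grand = mean(values)

ss_total = sum((v - grand) ** 2 for v in values)
ss_patients = n_drugs * sum((mean(row) - grand) ** 2 for row in readings)
ss_drugs = n_patients * sum((mean(col) - grand) ** 2 for col in zip(*readings))
ss_error = ss_total - ss_patients - ss_drugs

ms_drugs = ss_drugs / (n_drugs - 1)
# Patient treated as a factor: patient-to-patient variation is removed from error
f_blocked = ms_drugs / (ss_error / ((n_patients - 1) * (n_drugs - 1)))
# Patient ignored: that variation stays in the error term
f_unblocked = ms_drugs / ((ss_patients + ss_error) / (n_patients * n_drugs - n_drugs))

print(f"SS patients = {ss_patients:.4f}, SS drugs = {ss_drugs:.4f}")
print(f"F for drug, patient as a factor: {f_blocked:.4f}")
print(f"F for drug, patient ignored:     {f_unblocked:.4f}")
```

On these data the F value for drug is larger when patient is treated as a factor, because differences between patients no longer inflate the error term.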
We probably want to know which of the three drugs are different from each other.
Comparisons between pairs of treatments made after an ANOVA are called "post-hoc" comparisons.
Excel doesn't have statistical tests that correct the p-values for doing multiple post-hoc comparisons.
To get p-values corrected for multiple comparisons we should use other software.
However, for now we'll use multiple t-tests, even though this method can give more false-positive results.
PatientID    Drug A    Drug B    Drug C
1            118       110       125
2            117       126       117
3            110       125       110
4            118       121       131
5            120       130       130
6            118       132       132
7            121       119       129
8            124       130       130
9            121       130       118
10           122       128       121
11           119       130       130
12           127       135       135

t-test for A vs B    0.007941
t-test for A vs C    0.022839
t-test for B vs C    0.822582
It appears that there is a significant difference between A and B (p = 0.0079), and between A and C (p = 0.023).
B and C do not appear to differ (p = 0.82).
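As a rough illustration of what correcting for multiple comparisons does, the sketch below applies two common adjustments, Bonferroni and Holm, to the three uncorrected t-test p-values above. These are swapped in here as examples; the workbook itself does not apply them, and dedicated post-hoc procedures in other software may give different results. The function names are illustrative.

```python
# Uncorrected p-values from the three pairwise t-tests
p_values = {"A vs B": 0.007941, "A vs C": 0.022839, "B vs C": 0.822582}

def bonferroni(pvals):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(pvals)
    return {name: min(1.0, p * m) for name, p in pvals.items()}

def holm(pvals):
    """Holm step-down: the smallest p-value is multiplied by m, the next by m - 1, ..."""
    m = len(pvals)
    ordered = sorted(pvals.items(), key=lambda item: item[1])
    adjusted, running_max = {}, 0.0
    for rank, (name, p) in enumerate(ordered):
        # Keep adjusted values non-decreasing in the order of the raw p-values
        running_max = max(running_max, min(1.0, (m - rank) * p))
        adjusted[name] = running_max
    return adjusted

bonf = bonferroni(p_values)
holm_adj = holm(p_values)
for name in p_values:
    print(f"{name}: raw {p_values[name]:.6f}, "
          f"Bonferroni {bonf[name]:.6f}, Holm {holm_adj[name]:.6f}")
```

With these inputs, A vs B stays below 0.05 under both adjustments, while A vs C stays below 0.05 under Holm but not under Bonferroni, which shows how the uncorrected t-tests can overstate the evidence.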