Professional Documents
Culture Documents
2. Let D = X1 - X2, where X1 is the score of the first member of the pair and
X2 is the score of the second member. Assume X1 and X2 are normally distributed. Recall that
E(X + Y) = E(X) + E(Y) (Expectations Rule 8) and V(X ± Y) = V(X) + V(Y) ± 2 COV(X,Y) =
σ2X ± Y (Expectations Rule 15). Then,
E(D) = µ D = E( X 1 - X 2 ) = E( X 1 ) - E( X 2 ) = µ 1 - µ 2 ,
V(D) = σ 2D = V( X 1 - X 2 ) = V( X 1 ) + V( X 2 ) - 2COV( X 1 , X 2 ) = σ 12 + σ 22 - 2 σ 12 ,
SD(D) = σ D = σ 12 + σ 22 - 2 σ 12
Note that, because X1 and X2 are not independent, the covariance is not 0. Now, let
N N
∑(X 1i − X 2i ) ∑D i
D= i =1
= i =1
N N
N N
∑ E( D ) ∑ µ i D
N *µD
E( D ) = i=1
= i =1
= = µD= µD
N N N
⎛ N ⎞ N
⎜ ∑ Di ⎟ ∑V( D ) + 2 − 2 σ 12
i
N σ 2D
=σD =σ1 σ2
2 2
V( D ) = V ⎜ i =1 ⎟ = i= 1
= = σ 2D
⎜ N ⎟ N
2
N
2
N N
⎜ ⎟
⎝ ⎠
− 2 σ 12
SD( D ) = σ D = σ 1 σ 2
2 2 2
+
=σ D
N N
N N
∑ ( D − D) ∑D
2
i
2
i
2
− ND
sD2 = i =1
= i =1
= s12 + s22 − 2s12
N −1 N −1
N N
∑ ( Di − D)2 ∑D
2
i
2
− ND
sD = i =1
= i =1
= s12 + s22 − 2s12
N −1 N −1
2
sD = sD
sD =
N N
Incidentally, note that the above formulas suggest two strategies for finding the sample variance
of D. You could first compute D, and then compute its variance. Or, you could compute the
sample variances and covariances of X1 and X2. We’ll discuss the latter option more in the next
handout.
d - µ D0 d - µ D0 d - µ D0
t= = =
sD sD 2
sD
N N
When the null hypothesis is true, the test statistic will have a T distribution with N - 1 degrees of
freedom.
6. Confidence interval:
d ± ( t α/2,v * s D / N ), i.e.
d - ( t α/2,v * s D / N ) ≤ µ D ≤ d + ( t α/2,v * s D / N )
µ D ± ( t α/2,v * s D / N ), i.e.
0
µ D - ( t α/2,v * s D / N ) ≤ d ≤ µ D + ( t α/2,v * s D / N )
0 0
(a) Test, at the .05 level, whether there is any significant difference between the average
scores of husbands and wives.
(b) Construct the 95% c.i. for the average difference between the husband’s and wife’s score.
∑D i
d= i= 1
= 31 = 3.875
8 8
N N
∑( D i - D ) ∑D - N D
2 2 2
i
369 - 8 * 3.8752
s 2
D = i= 1
N -1
= i= 1
N -1
= 7
= 35.554
s D = 35.554 = 5.963
d - µ D0 d - µ D0 d d
t= = = =
sD sD
2
35.554 2.1081
N N 8
Step 3. For α = .05, accept H0 if -2.365 # T7 # 2.365, or, equivalently, accept H0 if -4.986 # µ̂ D
# 4.986 (since 2.365 * 2.1081 = 4.986).
d - µ D0 d - µ D0 d d 3.875
t= = = = = = 1.838
sD 2
sD 35.554 2.1081 2.1081
N N 8
d ± ( t α/2,v * s D / N ), i.e.
- 1.11 ≤ µ D ≤ 8.86
Note that the hypothesized value of E(D) = 0 falls within this interval, hence the c.i. confirms
that we should not reject H0.
3. Test statistic. Recall that, if N1 and N2 are sufficiently large, then p^1 -
N(p1, p1q1/N1), p^2 - N(p2, p2q2/N2). Therefore,
E( pˆ 1 - pˆ 2 ) = p 1 - p 2 ,
p1 q1 p2 q2 N 2 p1 q1 + N 1 p 2 q 2
V( pˆ 1 - pˆ 2 ) = + =
N1 N2 N1N2
( + )pq ( N 1 + N 2 )
V( pˆ 1 - pˆ 2 ) = N 1 N 2 = * p*q
N1N 2 N1N2
But of course, we don’t know what p is. And, within our samples, we have two estimates of p,
p̂1 and p̂2 . The pooled estimate of p and q is therefore
N pˆ + N 2 pˆ 2 X 1 + X 2 ˆ +
pˆ = 1 1 = , q=1− X 1 X 2
N1+ N 2 N1+ N 2 N1+ N 2
This result is very intuitive. It says that the pooled estimate of p equals the total number of
successes in the two samples divided by the total number of cases. Substituting the pooled
estimates of p and q into the previous formula for the variance, we get
⎛ + ⎞⎛ + ⎞⎛ + ⎞
V( pˆ 1 - pˆ 2 ) = ⎜ N 1 N 2 ⎟⎜ X 1 X 2 ⎟⎜ 1 - X 1 X 2 ⎟
⎝ N 1 N 2 ⎠⎝ N 1 + N 2 ⎠⎝ N 1 + N 2 ⎠
Again, recall that we are assuming that N1 and N2 are sufficiently large. Otherwise, our use of
sample estimates is questionable.
X1 − X2
pˆ 1 - pˆ 2 N1 N2
z= =
⎛ N 1 + N 2 ⎞⎛ X 1 + X 2 ⎞⎛ X 1 + X 2 ⎞ ⎛ N 1 + N 2 ⎞⎛ X 1 + X 2 ⎞⎛ X +X2⎞
⎜ ⎟⎜ ⎟⎜ 1 - ⎟ ⎜ ⎟⎜ ⎟⎜ 1 − 1 ⎟
⎝ N 1 N 2 ⎠⎝ N 1 + N 2 ⎠⎝ N 1 + N 2 ⎠ ⎝ N 1 N 2 ⎠⎝ N 1 + N 2 ⎠⎝ N1+ N 2 ⎠
pˆ 1 qˆ 1 pˆ 2 qˆ 2
pˆ 1 - pˆ 2 ± zα/2 +
N1 N2
6. EXAMPLES:
1. Two groups, A and B, each consist of 100 randomly assigned people who have a disease.
One serum is given to Group A and a different serum is given to Group B; otherwise, the two
groups are treated identically. It is found that in groups A and B, 75 and 65 people, respectively,
recover from the disease.
(a) Test the hypothesis that the serums differ in their effectiveness using α = .05.
(b) Compute the approximate 95% c.i. for p^1 - p^2.
Solution. (a)
Step 1:
H0 : p1 - p 2 = 0 or, p1 = p2
HA : p1 - p2 <> 0 or, p1 <> p2
X1 − X2
pˆ 1 - pˆ 2 100 100
z= =
⎛ N 1 + N 2 ⎞⎛ X 1 + X 2 ⎞⎛ X 1 + X 2 ⎞ ⎛ 200 ⎞⎛ X 1 + X 2 ⎞⎛ X 1+ X 2 ⎞
⎜ ⎟⎜ ⎟⎜ 1 - ⎟ ⎜ ⎟⎜ ⎟⎜ 1 − ⎟
⎝ N 1 N 2 ⎠⎝ N 1 + N 2 ⎠⎝ N 1 + N 2 ⎠ ⎝ 10,000 ⎠⎝ 200 ⎠⎝ 200 ⎠
X1 − X2 75 65
−
z= 100 100 = 100 100 = 1.543
⎛ 200 ⎞⎛ X 1 + X 2 ⎞⎛ X 1+ X 2 ⎞ ⎛ 200 ⎞⎛ 75 + 65 ⎞⎛ 75 + 65 ⎞
⎜ ⎟⎜ ⎟⎜ 1 − ⎟ ⎜ ⎟⎜ ⎟⎜ 1 − ⎟
⎝ 10,000 ⎠⎝ 200 ⎠⎝ 200 ⎠ ⎝ 10,000 ⎠⎝ 200 ⎠⎝ 200 ⎠
- .026 ≤ p 1 - p 2 ≤ .226
Note that 0 falls within the approximate c.i., again suggesting H0 should not be rejected.
COMMENTS:
(1) Try working this problem as though it fell under Case II. You will find that results are
virtually identical.
(2) Again, remember the c.i. is only approximate, but it works pretty well with large samples.
mean ∑X
i= 1
i ∑( X
i=1
1i - X 2i ) ∑D
i=1
i
X= D= =
N N N
Sample 1 1
∑( x i - x ) = ∑( d i - d ) =
2 2
standard sX = sD =
N -1 N -1
deviation
1 1
( ∑ X i2 - N X 2 ) ( ∑ d i2 - N d 2 )
N -1 N -1
Estimated sX sD
standard s X = σ̂ X = s D = σ̂ D =
N N
error of the
estimate
Test x - µ X0 x - µ X0 d - µ D0 d - µ D0
statistic t= = t= =
sX 2
sX sD sD
2
N N N N