

Handout 11 Estimating the Difference Between Two Means


Introduction

In many applications one wishes to estimate the difference between two population means. For example, if X is the grade point average (GPA) of a randomly chosen undergraduate at George Mason University (GMU), and if Y is the GPA of a randomly chosen undergraduate at the University of Virginia (UVA), then one might wish to estimate $\mu_X - \mu_Y$ in order to establish bragging rights. There is an obvious point estimate for the difference $\mu_X - \mu_Y$, namely $\bar{X} - \bar{Y}$, the difference of the two sample averages.

The more interesting (and challenging) question is how to construct a confidence interval on the difference between population means. In this handout we consider two cases:

(1) The sample averages are based on simple random samples which are independent (i.e., each observation $X_i$ is independent of each observation $Y_j$).

(2) The sample averages are based on simple random samples such that the paired observations $X_i$ and $Y_i$, $1 \le i \le n$, are dependent, but the differences $D_i = X_i - Y_i$ are independent.

Note that $\bar{D}$ is the same as $\bar{X} - \bar{Y}$: averaging and then taking differences is the same as taking differences and then averaging. The best point estimator of $\mu_X - \mu_Y$ is therefore the same for both case (1) and case (2); the construction of an interval estimate is what differs in the two cases.

STOP. Review the concept of independence; also review the properties of population variances presented in Handout 5. This is a must if you want to actually understand what's going on, and not just mindlessly plug into canned formulas.

The only new result we need is the very intuitive:

Theorem. If one takes two independent simple random samples, and if $\bar{X}$ and $\bar{Y}$ are the corresponding sample averages, then $\bar{X}$ and $\bar{Y}$ are independent.

Putting this together with what we knew already :

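Theorem. Suppose the random variables $\bar{X}$ and $\bar{Y}$ are the sample averages based on independent simple random samples of sizes $n_1$ and $n_2$, respectively. Then if $W = \bar{X} - \bar{Y}$, the variance of W is:

$$\sigma_W^2 = \frac{\sigma_X^2}{n_1} + \frac{\sigma_Y^2}{n_2}$$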
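The variance formula follows in two short steps. The derivation below is a sketch that assumes the standard facts that $\operatorname{Var}(aU) = a^2 \operatorname{Var}(U)$, that variances of independent random variables add, and that a sample average of n observations has variance $\sigma^2 / n$:

```latex
\begin{align*}
\sigma_W^2 = \operatorname{Var}(\bar{X} - \bar{Y})
  &= \operatorname{Var}(\bar{X}) + (-1)^2 \operatorname{Var}(\bar{Y})
     && \text{(independence of $\bar{X}$ and $\bar{Y}$)} \\
  &= \frac{\sigma_X^2}{n_1} + \frac{\sigma_Y^2}{n_2}
     && \text{(variance of a sample average)}
\end{align*}
```

Note that the variances add even though the averages are subtracted.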

Application. Let Y be the GPA of a randomly chosen senior graduating from GMU in 2003; let X be the GPA of a randomly chosen senior graduating from the University of Virginia in 2003. Seniors at the University of Virginia believe that $\mu_X - \mu_Y$ is positive (naturally). Of course, GMU seniors are doubtful; so a group of statistically minded seniors at the University of Virginia take two simple random samples, one at GMU and one at UVA. Here are the results:

         sample size    sample mean    sample variance
GMU          64            2.945            0.36
UVA          49            3.152            0.25

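Assuming that GPA at both schools is normally distributed, calculate a 99% two-sided confidence interval for $\mu_X - \mu_Y$. Discuss the evidence for the claim that the average GPA for the seniors at UVA is higher than the average GPA for seniors at GMU in light of your result.

Solution. Let $W = \bar{X} - \bar{Y}$. From the last Theorem we know

$$\sigma_W^2 = \frac{\sigma_X^2}{49} + \frac{\sigma_Y^2}{64}$$

Since one has fairly large sample sizes, it's fairly safe to approximate the population variances by the sample variances given in the above table; doing so one computes that $\sigma_W$ is approximately 0.10357.

By normality of W,

$$P\left( -z_{0.005} \le \frac{W - (\mu_X - \mu_Y)}{\sigma_W} \le z_{0.005} \right) = 0.99$$

So

$$P\left( W - z_{0.005}\,\sigma_W \le \mu_X - \mu_Y \le W + z_{0.005}\,\sigma_W \right) = 0.99$$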
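The arithmetic behind this interval can be checked numerically. The following is a minimal sketch, not part of the original handout: the function name `two_sample_z_interval` is made up for illustration, and the critical value comes from `statistics.NormalDist` in Python's standard library rather than from a printed table, so the endpoints may differ from hand-computed values in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_interval(mean_x, var_x, n_x, mean_y, var_y, n_y, confidence):
    """Two-sided interval for mu_X - mu_Y based on independent samples.

    Hypothetical helper for illustration only. The sample variances stand
    in for the population variances (large-sample approximation).
    """
    # Standard deviation of W = Xbar - Ybar.
    sigma_w = sqrt(var_x / n_x + var_y / n_y)
    # Upper alpha/2 critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    w = mean_x - mean_y
    return sigma_w, (w - z * sigma_w, w + z * sigma_w)


# X = UVA (n = 49), Y = GMU (n = 64), using the sample results given above.
sigma_w, (low, high) = two_sample_z_interval(
    3.152, 0.25, 49, 2.945, 0.36, 64, confidence=0.99
)
print(round(sigma_w, 5), round(low, 3), round(high, 3))
```

The left endpoint comes out slightly below zero, which is what drives the cautious conclusion in the discussion.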

Substitution of the required values from the given data gives the 99% confidence interval: [-0.0597, 0.4737]. So there's a pretty good case that the mean GPA of graduating seniors at UVA is higher than the mean GPA of graduating seniors at GMU; on the other hand, noting that the left endpoint of the confidence interval is negative, if one wished to be 99% sure of the truth of one's conclusions, then one should still entertain a doubt that the mean GPA at UVA is actually higher!

The Paired T-statistic

Note that the calculations in the last application were predicated on the assumption that the random variables under investigation were independent. There are many applications for which this assumption is not warranted. Some examples:

(1) The score of a randomly chosen STAT250 student on examination 1, and the score of the same student on the final.

(2) The IQ scores at age 21 of adopted twins who were placed with two different families: one with a relatively higher household income, one with a relatively lower household income.

(3) Pairs of treated fence posts are tested for durability by putting pairs of posts treated by two different methods next to each other in various locations and measuring the time until the posts can no longer bear a given side load.

In general, suppose one has n pairs of observations $(X_i, Y_i)$, where for each i, the random variables $X_i$ and $Y_i$ are dependent. In many applications, one may still assume the differences $D_i = X_i - Y_i$, $1 \le i \le n$, are independent. Assuming that X and Y are normal, D is a normal random variable. Hence

$$\frac{\bar{D} - \mu_D}{s_D / \sqrt{n}}$$

has a Student-T distribution with n - 1 degrees of freedom. Note that $\mu_D = \mu_X - \mu_Y$, so one may construct a confidence interval on $\mu_X - \mu_Y$ in the usual way. (Note that the point estimate of $\mu_X - \mu_Y$ is $\bar{D} = \bar{X} - \bar{Y}$, the same as in the case for which X and Y are assumed to be independent.)
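Spelled out, "the usual way" means inverting the probability statement for the Student-T statistic. This is a sketch, writing $t_{\alpha/2}$ for the upper $\alpha/2$ critical value of the Student-T distribution with n - 1 degrees of freedom (notation assumed here):

```latex
\begin{align*}
P\!\left( -t_{\alpha/2} \le \frac{\bar{D} - \mu_D}{s_D/\sqrt{n}} \le t_{\alpha/2} \right)
  &= 1 - \alpha \\
\Longrightarrow\quad
P\!\left( \bar{D} - t_{\alpha/2}\,\frac{s_D}{\sqrt{n}}
   \;\le\; \mu_X - \mu_Y \;\le\;
   \bar{D} + t_{\alpha/2}\,\frac{s_D}{\sqrt{n}} \right)
  &= 1 - \alpha
\end{align*}
```

Only the n differences enter the interval, so the dependence between $X_i$ and $Y_i$ within a pair never has to be modeled.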

Application. Here are the IQ scores for 9 pairs of twins at age 21:

Rich Twin     110  132  120  108   88   95   90  156  134
Poor Twin     105  130  112  108   80   97   85  145  128
Difference      5    2    8    0    8   -2    5   11    6

Routine computations give $\bar{D} = 4.78$; $s_D = 4.147$. With n - 1 = 8 degrees of freedom, $t_{0.05} = 1.86$, and $\sqrt{n} = 3$. So a 90% two-sided confidence interval on the mean difference in IQ is [4.78 - 1.86(4.147)/3, 4.78 + 1.86(4.147)/3] or [2.21, 7.35].
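The "routine computations" can be reproduced with Python's standard library. This is a minimal sketch: the function name `paired_t_interval` is hypothetical, and the critical value 1.86 is passed in by hand, as if read from a t table for 8 degrees of freedom, because the standard library has no Student-T quantile function.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_interval(xs, ys, t_crit):
    """Two-sided paired-t interval for mu_X - mu_Y.

    Hypothetical helper for illustration; t_crit is the upper alpha/2
    critical value for len(xs) - 1 degrees of freedom, supplied by the caller.
    """
    differences = [x - y for x, y in zip(xs, ys)]
    n = len(differences)
    d_bar = mean(differences)
    s_d = stdev(differences)  # sample standard deviation (divisor n - 1)
    half_width = t_crit * s_d / sqrt(n)
    return d_bar, s_d, (d_bar - half_width, d_bar + half_width)


rich = [110, 132, 120, 108, 88, 95, 90, 156, 134]
poor = [105, 130, 112, 108, 80, 97, 85, 145, 128]

d_bar, s_d, (low, high) = paired_t_interval(rich, poor, t_crit=1.86)
print(round(d_bar, 2), round(s_d, 3), round(low, 2), round(high, 2))
```

The whole interval lies above zero, unlike the interval in the GPA application.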

Practice Problems (and Solutions) from Previous Examinations

1. A standardized chemistry test was given to 50 girls and 75 boys. The girls made an average score of 76 with a standard deviation of 6, while the boys made an average score of 82 with a standard deviation of 8. Assuming normality of the test scores:

(a) Find a 96% confidence interval for $\mu_1$, the mean score of all boys.
(b) Find a 96% confidence interval for $\mu_2$, the mean score of all girls.
(c) Find a 96% confidence interval for the difference $\mu_1 - \mu_2$.

Solution: Parts (a) and (b). Since the test scores are normally distributed, and the sample sizes are large enough to justify approximating the population standard deviation with the sample standard deviation, one may use the expression for a two-sided $100(1 - \alpha)\%$ confidence interval given in the notes:

$$\left[ \bar{x} - z_{\alpha/2}\,\frac{s}{\sqrt{n}} \; , \; \bar{x} + z_{\alpha/2}\,\frac{s}{\sqrt{n}} \right]$$

In the case at hand, $\alpha = 0.04$, so the standard normal table gives $z_{0.02} = 2.0537$. So the confidence interval for $\mu_1$ is

$$\left[ 82 - 2.0537\,\frac{8}{\sqrt{75}} \; , \; 82 + 2.0537\,\frac{8}{\sqrt{75}} \right]$$

or

[80.10, 83.90],

and the confidence interval for $\mu_2$ is

$$\left[ 76 - 2.0537\,\frac{6}{\sqrt{50}} \; , \; 76 + 2.0537\,\frac{6}{\sqrt{50}} \right]$$

or

[74.26, 77.74].
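Both intervals can be verified with a few lines of Python. This is a sketch under the same large-sample assumption as the solution: the helper name `one_sample_z_interval` is invented for illustration, and `statistics.NormalDist().inv_cdf` stands in for the standard normal table.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_interval(sample_mean, sample_sd, n, confidence):
    """Two-sided large-sample z interval for a single population mean.

    Hypothetical helper for illustration; the sample standard deviation
    stands in for the population standard deviation.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical value
    half_width = z * sample_sd / sqrt(n)
    return z, (sample_mean - half_width, sample_mean + half_width)


z, boys = one_sample_z_interval(82, 8, 75, confidence=0.96)
_, girls = one_sample_z_interval(76, 6, 50, confidence=0.96)

print(round(z, 4))
print([round(endpoint, 2) for endpoint in boys])
print([round(endpoint, 2) for endpoint in girls])
```

The two intervals do not overlap, which already suggests that the difference in part (c) is clearly positive.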

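Part (c). If B denotes the score of a randomly chosen boy and G denotes the score of a randomly chosen girl, there's no reason to believe B and G are dependent. Hence if one lets $W = \bar{B} - \bar{G}$, one is justified in writing

$$\sigma_W^2 = \frac{\sigma_B^2}{75} + \frac{\sigma_G^2}{50}$$

which, in the case at hand, gives (approximating population standard deviations by the sample standard deviations as before):

$$\sigma_W \approx \sqrt{\frac{8^2}{75} + \frac{6^2}{50}} \approx 1.2543$$

The required confidence interval for $\mu_1 - \mu_2$ is

$$\left[ (\bar{b} - \bar{g}) - z_{0.02}\,\sigma_W \; , \; (\bar{b} - \bar{g}) + z_{0.02}\,\sigma_W \right]$$

or

[6 - (2.0537)(1.2543), 6 + (2.0537)(1.2543)],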

which is computed to be [3.42, 8.58].

2. Twenty college freshmen were divided into 10 pairs, each member of the pair having approximately the same I.Q. One of each pair was selected at random and assigned to a mathematics section using programmed materials only. The other member of each pair was assigned to a section in which the professor lectured. At the end of the semester each group was given the same examination and the following results were recorded:

Pair    Programmed materials    Lecture    Difference
  1             76                 81          -5
  2             60                 52           8
  3             85                 87          -2
  4             58                 70         -12
  5             91                 86           5
  6             75                 77          -2
  7             82                 90          -8
  8             64                 63           1
  9             79                 85          -6
 10             88                 83           5

Give a 98% confidence interval for the difference in mean scores obtained using these two teaching techniques.

Solution: Because the students were paired up in such a way as to have similar intellectual abilities, it may be the case that X, the score of the student using programmed materials, and Y, the score of his partner who is attending lecture, are dependent random variables. On the other hand, there is no reason to believe that the differences in paired scores (displayed above) are dependent. Assuming test scores are normally distributed, one has all the hypotheses appropriate to using a paired t statistic. One computes $\bar{D} = -1.6$ and $s_D = 6.38$. From the tables of the Student t distribution (or MINITAB), one finds that for 10 - 1 = 9 degrees of freedom $t_{0.01} = 2.8214$. A $100(1 - \alpha)\%$ two-sided confidence interval for the mean difference has the form:

$$\left[ \bar{D} - t_{\alpha/2}\,\frac{s_D}{\sqrt{n}} \; , \; \bar{D} + t_{\alpha/2}\,\frac{s_D}{\sqrt{n}} \right]$$

Substituting the appropriate values found above, a 98% two-sided confidence interval is: [-7.29, 4.09]
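For readers working by hand, the summary statistics in this solution can be obtained from two running totals, using the shortcut $s_D^2 = \left( \sum d_i^2 - (\sum d_i)^2 / n \right) / (n - 1)$. The sketch below is illustrative only: the variable names are made up, and the critical value 2.8214 is copied from the solution rather than computed.

```python
from math import sqrt

# Differences (programmed materials minus lecture) for the 10 pairs.
differences = [-5, 8, -2, -12, 5, -2, -8, 1, -6, 5]

n = len(differences)
total = sum(differences)                      # sum of the d_i
total_sq = sum(d * d for d in differences)    # sum of the d_i squared

d_bar = total / n
# Shortcut formula for the sample standard deviation (divisor n - 1).
s_d = sqrt((total_sq - total ** 2 / n) / (n - 1))

t_crit = 2.8214  # t_{0.01} for 9 degrees of freedom, as quoted in the solution
half_width = t_crit * s_d / sqrt(n)
low, high = d_bar - half_width, d_bar + half_width

print(total, total_sq, d_bar, round(s_d, 2), round(low, 2), round(high, 2))
```

Because the interval contains zero, these data do not show a difference between the two teaching techniques at the 98% confidence level.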
