
3. The Sampling Distribution of the Variance

Theorem 8 (A random variable having the chi-square distribution) If $S^2$ is the variance of a random sample of size $n$ taken from a normal population having the variance $\sigma^2$, then
\[ \chi^2 = \frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2} \tag{37} \]

is a random variable having the chi-square distribution with parameter $\nu = n - 1$, cf. Theorem 4. The probability density function of a chi-square distribution is not symmetric, cf. Figure 4.

Example 26 A random sample of 10 observations is taken from a normal population having the variance $\sigma^2 = 42.5$. Find the approximate probability of obtaining a sample standard deviation between 3.14 and 8.94.

Solution. We want to find
\[ P(3.14 < S < 8.94) = P(3.14^2 < S^2 < 8.94^2) = P\left( \frac{9 \times 3.14^2}{42.5} < \frac{(n-1)S^2}{\sigma^2} < \frac{9 \times 8.94^2}{42.5} \right). \]
By Theorem 8, $(n-1)S^2/\sigma^2$ has a $\chi^2$ distribution with $n - 1 = 9$ degrees of freedom. We compute
\[ \frac{9 \times 3.14^2}{42.5} = 2.0879, \qquad \frac{9 \times 8.94^2}{42.5} = 16.9250. \]
By the chi-square distribution table, $\chi^2_{0.99} \approx 2.0879$ and $\chi^2_{0.05} \approx 16.9250$. As a result,
\[ P(3.14 < S < 8.94) \approx 0.99 - 0.05 = 0.94. \]
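The arithmetic of Example 26 can be checked numerically. The following is only a sketch and is not part of the original notes: it uses the Python standard library, the helper names `chi2_pdf` and `simpson` are illustrative, and instead of reading a table it integrates the chi-square density with 9 degrees of freedom between the two bounds by Simpson's rule.

```python
import math

def chi2_pdf(x, nu):
    """Density of the chi-square distribution with nu degrees of freedom."""
    if x <= 0:
        return 0.0
    return x ** (nu / 2 - 1) * math.exp(-x / 2) / (2 ** (nu / 2) * math.gamma(nu / 2))

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

n, sigma2 = 10, 42.5
lower = (n - 1) * 3.14 ** 2 / sigma2   # about 2.0879
upper = (n - 1) * 8.94 ** 2 / sigma2   # about 16.9250

# P(lower < chi-square with 9 d.f. < upper), by numerical integration
prob = simpson(lambda x: chi2_pdf(x, n - 1), lower, upper)
print(round(lower, 4), round(upper, 4), round(prob, 2))
```

The integral agrees with the table-based value 0.94 to two decimal places, which is all the table lookup can be expected to give.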


A problem closely related to that of finding the distribution of the sample variance is that of finding the distribution of the ratio of the variances of two independent random samples. This problem is important because it arises in tests in which we want to determine whether two samples come from populations having equal variances. If they do, the two sample variances should be nearly the same; that is, their ratio should be close to 1. To determine whether the ratio of two sample variances is too small or too large, we use the theory given in the following theorem.
Theorem 9 (A random variable having the F distribution) If $S_1^2$ and $S_2^2$ are the variances of independent random samples of size $n_1$ and $n_2$, respectively, taken from two normal populations having the same variance, then
\[ F = \frac{S_1^2}{S_2^2} \tag{38} \]

is a random variable having the F distribution with parameters $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$. This is a direct consequence of Theorem 4 and Definition 2.

Example 27 If two independent random samples of size $n_1 = 7$ and $n_2 = 13$ are taken from a normal population, what is the probability that the variance of the first sample will be at least three times as large as that of the second sample?

Solution. We are asked to find
\[ P(S_1^2 \ge 3 S_2^2) = P\left( \frac{S_1^2}{S_2^2} \ge 3 \right). \]
Since $\nu_1 = 7 - 1 = 6$ and $\nu_2 = 13 - 1 = 12$, the random variable $S_1^2 / S_2^2$ has an F-distribution with (6, 12) degrees of freedom. By the F-distribution table, $F_{0.05} = 3.00$.

Hence the probability that the variance of the first sample will be at least three times as large as that of the second sample is 0.05.
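As a cross-check of Example 27, the tail probability can be computed without a table. The sketch below is an illustration added here, not the method of the notes: it assumes the standard density of the F distribution, uses only the Python standard library, and the helper names `f_pdf` and `simpson` are made up for the example.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom."""
    if x <= 0:
        return 0.0
    log_const = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2)
                 - math.lgamma(d2 / 2) + (d1 / 2) * math.log(d1 / d2))
    return math.exp(log_const + (d1 / 2 - 1) * math.log(x)
                    - ((d1 + d2) / 2) * math.log1p(d1 * x / d2))

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

nu1, nu2 = 7 - 1, 13 - 1

# P(F >= 3) = 1 - P(F < 3), with the integral taken over [0, 3]
tail = 1.0 - simpson(lambda x: f_pdf(x, nu1, nu2), 0.0, 3.0)
print(round(tail, 3))
```

The result is very close to 0.05, consistent with the table entry $F_{0.05} = 3.00$ for (6, 12) degrees of freedom.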

Normal distributions have been assumed for Theorems 7 and 8.

Chapter 7. Inferences Concerning a Mean


Section 7.1. Point Estimation
Point Estimation: use a statistic like the sample mean $\bar{x}$ or the sample standard deviation $s$ to estimate a certain parameter of the population like the mean $\mu$ or the standard deviation $\sigma$.

Remark 5 Theoretical background for point estimation. According to Theorem 5, $\mu_{\bar{X}} = \mu$; therefore, upon repeated sampling, the distribution of $\bar{X}$ provides some information about $\mu$. On the other hand, note that for experiments with replacement or experiments with an infinite population, the random variables $X_1, X_2, \ldots, X_n$ in a random sample are independent, so, as shown in the tutorial, $\mu_{S^2} = \sigma^2$. As a result, upon repeated sampling, the distribution of $S$ provides some information about $\sigma$. Finally, according to Theorem 5, for experiments with replacement or experiments with an infinite population, the standard deviation of the sample mean $\bar{X}$ is given by $\sigma/\sqrt{n}$. Consequently, we may use $S/\sqrt{n}$ to estimate the standard deviation of $\bar{X}$.

Point estimation of a mean
Parameter: population mean $\mu$
Data: a random sample $X_1, X_2, \ldots, X_n$
Estimator: $\bar{X}$
Estimator of standard error: $S/\sqrt{n}$

Example 28 Scientists need to be able to detect small amounts of contaminants in the environment. As a check on the current capabilities, the following measurements were made on test specimens spiked with a known concentration 1.25 µg/l of lead. That is, the readings should average 1.25 if there is no background lead in the samples.

2.4 2.9 2.7 2.6 2.9 2.0 2.8 2.2 2.4 2.4 2.0 2.5

(a) Make a dot diagram. (b) Compute the point estimate $\bar{x}$ and its estimated standard error $s/\sqrt{n}$.

Solution. (a) [Dot diagram not shown.]

(b) The sample size is $n = 12$. We calculate
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{12} x_i = 2.4833, \qquad s^2 = \frac{1}{n-1} \sum_{i=1}^{12} (x_i - \bar{x})^2 = 0.0979. \]
Therefore the estimated standard error is
\[ \frac{s}{\sqrt{12}} = \sqrt{\frac{s^2}{12}} = 0.0903. \]
Because 2.4833 is quite different from 1.25 and the estimated standard error $s/\sqrt{12} = 0.0903$ is small, there appears to be either a bias due to the laboratory procedure or some lead already in the samples before they were spiked.
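The numbers in part (b) can be reproduced in a few lines. This is a minimal sketch using Python's standard `statistics` module; the variable names are chosen here for illustration and do not come from the notes.

```python
import math
import statistics

# Lead readings from Example 28
readings = [2.4, 2.9, 2.7, 2.6, 2.9, 2.0, 2.8, 2.2, 2.4, 2.4, 2.0, 2.5]

n = len(readings)
x_bar = statistics.mean(readings)       # point estimate of the mean
s2 = statistics.variance(readings)      # sample variance, divisor n - 1
std_err = math.sqrt(s2 / n)             # estimated standard error s / sqrt(n)

print(round(x_bar, 4), round(s2, 4), round(std_err, 4))
```

Note that `statistics.variance` uses the divisor $n - 1$, matching the definition of $s^2$ used in these notes.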


Since $E(\bar{X}) = \mu$, we call $\bar{X}$ an unbiased estimator of $\mu$. In general we have

Definition 5 (Unbiased estimator) A statistic $\hat{\theta}$ is said to be an unbiased estimator, or its value an unbiased estimate, if and only if the mean of the sampling distribution of the estimator equals $\theta$, whatever the value of $\theta$.

For example, since $E(S^2) = \sigma^2$, the sample variance $S^2$ is an unbiased estimator of $\sigma^2$; in other words, $s^2$ is an unbiased estimate of $\sigma^2$. For a given parameter $\theta$, its unbiased estimators may not be unique.

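Example 29 Let $X_1, X_2$ be a random sample of size 2 from a population. Show that both $\frac{X_1 + X_2}{2}$ and $\frac{aX_1 + bX_2}{a + b}$ are unbiased estimators of the mean $\mu$, where $a, b$ are any positive constants.

Solution. Since $E(X_1) = E(X_2) = \mu$,
\[ E\left( \frac{X_1 + X_2}{2} \right) = \frac{\mu + \mu}{2} = \mu, \quad \text{and} \quad E\left( \frac{aX_1 + bX_2}{a + b} \right) = \frac{a\mu + b\mu}{a + b} = \mu. \]
That is, both $\frac{X_1 + X_2}{2}$ and $\frac{aX_1 + bX_2}{a + b}$ are unbiased estimators of the mean $\mu$.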
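Unbiasedness can also be seen empirically by averaging each estimator over many repeated samples. The simulation below is only an illustration: the exponential population with mean 2, the weights $a = 1$, $b = 3$, the seed, and the number of trials are all assumptions made here, not values from the notes.

```python
import random

random.seed(0)                  # fixed seed so the run is repeatable
mu = 2.0                        # assumed population mean (exponential population)
a, b = 1.0, 3.0                 # arbitrary positive weights
trials = 200_000

sum_equal, sum_weighted = 0.0, 0.0
for _ in range(trials):
    x1 = random.expovariate(1 / mu)   # one draw with mean mu
    x2 = random.expovariate(1 / mu)
    sum_equal += (x1 + x2) / 2
    sum_weighted += (a * x1 + b * x2) / (a + b)

# Averages over repeated samples approximate the expected values
avg_equal = sum_equal / trials
avg_weighted = sum_weighted / trials
print(avg_equal, avg_weighted)
```

Both averages settle near the assumed mean, as the calculation in Example 29 predicts for any positive $a$ and $b$.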

As unbiased estimators are in general not unique, the following question is natural: how do we decide which of several unbiased estimators is best for estimating a given parameter?

Definition 6 (More efficient unbiased estimator) A statistic $\hat{\theta}_1$ is said to be a more efficient unbiased estimator of the parameter $\theta$ than the statistic $\hat{\theta}_2$ if
1. Both $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased estimators of $\theta$;
2. The variances satisfy
\[ \mathrm{Var}(\hat{\theta}_1) \le \mathrm{Var}(\hat{\theta}_2), \tag{39} \]
with strict inequality for at least one value of $\theta$.


Note that unbiased estimators, for example $\bar{X}$ and $S^2$, are themselves random variables; they have distributions, like the sampling distributions of $\bar{X}$ and $S^2$. Thus the variances in Eq. (39) are well-defined.

Example 30 To estimate, in a large population, the unknown proportion $p$ of individuals who possess a certain characteristic, two random samples of size $m$ and $n$ were independently drawn from the population. Let $X$ denote the number of individuals in the sample of size $m$ who possess the given characteristic and $Y$ the corresponding number for the sample of size $n$.
(i) Identify the distributions of $X$ and $Y$.
(ii) Show that both the statistics
\[ T_1 = \frac{1}{2} \left( \frac{X}{m} + \frac{Y}{n} \right) \quad \text{and} \quad T_2 = \frac{X + Y}{m + n} \]
are unbiased estimators of $p$.
(iii) Evaluate $\mathrm{Var}(T_1)$ and $\mathrm{Var}(T_2)$.
(iv) By referring to the relative efficiency of $T_1$ with respect to $T_2$, that is $\mathrm{Var}(T_2)/\mathrm{Var}(T_1)$, determine which is a better estimator.

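Solution. (i) Each sampled individual possesses the characteristic with probability $p$, and because the population is large the draws can be treated as independent. Therefore $X$ has a binomial distribution with parameters $m$ and $p$, so its mean is $mp$ and its variance is $mp(1-p)$, while $Y$ has a binomial distribution with parameters $n$ and $p$, with mean $np$ and variance $np(1-p)$.

(ii) Note that
\[ E(T_1) = \frac{1}{2} \left( \frac{E(X)}{m} + \frac{E(Y)}{n} \right) = \frac{1}{2} \left( \frac{mp}{m} + \frac{np}{n} \right) = p, \]
and
\[ E(T_2) = \frac{1}{m+n} E(X + Y) = \frac{1}{m+n} (mp + np) = p, \]
so both $T_1$ and $T_2$ are unbiased estimators.

(iii) Since $X$ and $Y$ are independent,
\[ \mathrm{Var}(T_1) = \frac{1}{4m^2} \mathrm{Var}(X) + \frac{1}{4n^2} \mathrm{Var}(Y) = \frac{1}{4m^2} mp(1-p) + \frac{1}{4n^2} np(1-p) = \frac{p(1-p)}{4} \left( \frac{1}{m} + \frac{1}{n} \right), \]
\[ \mathrm{Var}(T_2) = \frac{1}{(m+n)^2} \left( \mathrm{Var}(X) + \mathrm{Var}(Y) \right) = \frac{1}{(m+n)^2} (m+n) p(1-p) = \frac{p(1-p)}{m+n}. \]

(iv) Since
\[ \frac{\mathrm{Var}(T_2)}{\mathrm{Var}(T_1)} = \frac{4mn}{(m+n)^2} \le 1, \]
which holds because $(m+n)^2 - 4mn = (m-n)^2 \ge 0$, $T_2$ is a better unbiased estimator.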
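The variance formulas in (iii) and the ratio in (iv) can be verified with exact rational arithmetic. In this sketch the function names `var_t1` and `var_t2` and the values $m = 20$, $n = 80$, $p = 3/10$ are illustrative choices, not data from the example.

```python
from fractions import Fraction

def var_t1(m, n, p):
    """Var(T1) = p(1-p)/4 * (1/m + 1/n)."""
    return p * (1 - p) / 4 * (Fraction(1, m) + Fraction(1, n))

def var_t2(m, n, p):
    """Var(T2) = p(1-p)/(m+n)."""
    return p * (1 - p) / (m + n)

m, n, p = 20, 80, Fraction(3, 10)    # illustrative sample sizes and proportion

# Relative efficiency of T1 with respect to T2
ratio = var_t2(m, n, p) / var_t1(m, n, p)
print(var_t1(m, n, p), var_t2(m, n, p), ratio)
```

With unequal sample sizes the ratio is strictly below 1, and it equals 1 exactly when $m = n$, in which case $T_1$ and $T_2$ coincide.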


Section 7.2. Interval Estimation


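Example 31 A random sample of size $n = 100$ is taken from a population with standard deviation $\sigma = 5.1$. Let the sample mean be $\bar{x} = 21.6$ and $\alpha = 0.05$. Find the constant number $a$ such that
\[ P\left( \bar{x} - a \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + a \frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha. \tag{40} \]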

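Solution. Eq. (40) is equivalent to
\[ P\left( -a \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le a \right) = 0.95. \]
Let $Z := \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$. Since $n = 100$ is big, $Z$ is approximately standard normal. The above equation is equivalent to $P(-a < Z < a) = 0.95$. That is, $P(Z > a) = 0.025$. By the normal table, $a = z_{0.025} = 1.96$. As a byproduct, we have
\[ P\left( \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right) = P(20.6004 \le \mu \le 22.5996) = 0.95. \tag{41} \]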

We call the interval (20.6004, 22.5996) a confidence interval for $\mu$ having the degree of confidence $1 - \alpha = 0.95$, and its two endpoints 20.6004 and 22.5996 the confidence limits. For the above example, we see that given a random sample of large size from a population with known $\sigma$, a confidence interval with degree of confidence $1 - \alpha$ for $\mu$ is
\[ \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \tag{42} \]
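Interval (42) is easy to evaluate directly. The sketch below applies it to the numbers of Example 31; the function name `z_interval` is hypothetical, and the critical value is taken from `statistics.NormalDist` in the Python standard library rather than from a printed table.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sd, n, alpha):
    """Large-sample interval x_bar -/+ z_{alpha/2} * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point of N(0, 1)
    margin = z * sd / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Example 31: n = 100, sigma = 5.1, sample mean 21.6, alpha = 0.05
low, high = z_interval(21.6, 5.1, 100, 0.05)
print(round(low, 4), round(high, 4))
```

The computed critical value is slightly below the tabulated 1.96, but the limits still agree with (20.6004, 22.5996) to four decimal places.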

When $\sigma$ is unknown but the sample size is big, we may substitute the sample standard deviation $s$ for the standard deviation $\sigma$. In this case, given a random sample of large size from a population with unknown $\sigma$, a confidence interval with degree of confidence $1 - \alpha$ for $\mu$ is
\[ \bar{x} - z_{\alpha/2} \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2} \frac{s}{\sqrt{n}} \tag{43} \]

Example 32 Engineers fabricating a new transmission-type electron multiplier created an array of silicon nano-pillars on a flat silicon membrane. The precise structure can influence the electrical properties, so the heights of 50 nano-pillars were measured in nano-meters (nm), or $10^{-9}$ meters.

245 333 296 304 276 336 289 234 253 292
366 323 309 284 310 338 297 314 305 330
266 391 315 305 290 300 292 311 272 312
315 355 346 337 303 265 278 276 373 271
308 276 364 390 298 290 308 221 274 343

Construct a 99% confidence interval for the population mean of all nano-pillars.

Solution. By calculation,
\[ \bar{x} = 305.5800, \quad s = 36.9711, \quad z_{0.01/2} = 2.575. \]
From Eq. (43) we get
\[ 305.5800 - 2.575 \times \frac{36.9711}{\sqrt{50}} \le \mu \le 305.5800 + 2.575 \times \frac{36.9711}{\sqrt{50}}, \]
or (292.1166, 319.0434). We are 99% confident that the interval (292.1166, 319.0434) contains the true mean nano-pillar height.
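The summary statistics and the interval of Example 32 can be recomputed from the raw data. This is a sketch with the Python standard library; it deliberately uses the table value 2.575 quoted in the solution, so a more precise critical value would shift the limits slightly.

```python
import math
import statistics

# Heights (nm) of the 50 nano-pillars from Example 32
heights = [
    245, 333, 296, 304, 276, 336, 289, 234, 253, 292,
    366, 323, 309, 284, 310, 338, 297, 314, 305, 330,
    266, 391, 315, 305, 290, 300, 292, 311, 272, 312,
    315, 355, 346, 337, 303, 265, 278, 276, 373, 271,
    308, 276, 364, 390, 298, 290, 308, 221, 274, 343,
]

n = len(heights)
x_bar = statistics.mean(heights)
s = statistics.stdev(heights)        # sample standard deviation, divisor n - 1
z = 2.575                            # z_{0.005} as read from the normal table

margin = z * s / math.sqrt(n)
low, high = x_bar - margin, x_bar + margin
print(round(x_bar, 4), round(s, 4), round(low, 4), round(high, 4))
```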


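For a random sample of small size from a normal population, we know that
\[ t = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]
has a t-distribution with $n - 1$ degrees of freedom. By Theorem 7, a $100(1 - \alpha)\%$ small-sample confidence interval for the mean $\mu$ of a normal population is
\[ \bar{x} - t_{\alpha/2} \frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2} \frac{s}{\sqrt{n}} \tag{44} \]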

Example 33 The mean weight loss of $n = 16$ grinding balls after a certain length of time in mill slurry is 3.42 grams with a standard deviation of 0.68 gram. Construct a 99% confidence interval for the true mean weight loss of such grinding balls under the stated conditions. (It is assumed that the weight loss of grinding balls after a certain length of time in mill slurry has a normal distribution.)

Solution. We are given
\[ n = 16, \quad \bar{x} = 3.42, \quad s = 0.68, \quad \alpha = 0.01. \]
Moreover, by the t-table with $n - 1 = 15$ degrees of freedom, $t_{0.005} = 2.947$. Substituting them into Eq. (44) yields the interval
\[ 2.9190 \le \mu \le 3.9210. \]
Thus we are 99% confident that the interval from 2.9190 grams to 3.9210 grams contains the mean weight loss.
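The substitution into Eq. (44) is a short calculation. In the sketch below the critical value 2.947 is copied from the t-table as in the solution, since the Python standard library offers no t quantile function; the variable names are illustrative.

```python
import math

# Example 33: summary statistics for the grinding balls
n, x_bar, s = 16, 3.42, 0.68
t_crit = 2.947                       # t_{0.005} with 15 d.f., from the t-table

margin = t_crit * s / math.sqrt(n)   # t_{alpha/2} * s / sqrt(n)
low, high = x_bar - margin, x_bar + margin
print(round(low, 4), round(high, 4))
```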
