
Baruch College, Fall 2016, 09-06-16

STA 9700 Homework #1

Section 1.2

1.2.1 A professor recently bought six bags of Quikrete Mortar Mix and measured their net
weights as follows, in pounds:
54, 60, 63, 61, 58, 65.
These were from one production run. Assume the size of the production run was 5,000 bags.
(a) Using a pocket calculator where needed, compute a 95% confidence interval for $\mu$.
A 95% confidence interval for $\mu$ is computed as $\bar{y} \pm t_{.05/2}\, \frac{s_y}{\sqrt{n}}$, where

$$\bar{y} = \frac{1}{6}\sum y_i = (54 + 60 + 63 + 61 + 58 + 65)/6 = 60.167 \text{ pounds},$$

$$s_y = \sqrt{\frac{1}{n-1}\sum_i (y_i - \bar{y})^2} = \sqrt{[(54 - 60.17)^2 + (60 - 60.17)^2 + (63 - 60.17)^2 + (61 - 60.17)^2 + (58 - 60.17)^2 + (65 - 60.17)^2]/5} = \sqrt{74.833/5} = \sqrt{14.967} = 3.869,$$

and the t-critical value for $n - 1 = 5$ degrees of freedom and $\alpha = 5\%$ equals 2.571.
Therefore, for our sample data, the 95% confidence interval for $\mu$ is:

$$\bar{y} \pm t_{.05/2}\, \frac{s_y}{\sqrt{n}} = 60.167 \pm 2.571 \times 3.8687/\sqrt{6} = 60.167 \pm 2.571 \times 1.579 = 60.167 \pm 4.061 = (56.11, 64.23).$$

Answer: the 95% confidence interval for $\mu$ is (56.11, 64.23).
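The pocket-calculator arithmetic above can be double-checked with a short script. This is only a sketch in Python (the course itself uses SAS); the t-critical value 2.571 is copied from the text rather than computed, and the function name `confidence_interval` is made up for this illustration.

```python
import math
import statistics

weights = [54, 60, 63, 61, 58, 65]  # net weights in pounds, from the problem
T_CRIT = 2.571  # t-critical value for 5 df, alpha = 5% (taken from the text)

def confidence_interval(data, t_crit):
    """Return (mean, sample sd, standard error, (lower, upper))."""
    n = len(data)
    y_bar = statistics.mean(data)
    s_y = statistics.stdev(data)          # divides by n - 1
    std_err = s_y / math.sqrt(n)          # standard error of the sample mean
    half_width = t_crit * std_err
    return y_bar, s_y, std_err, (y_bar - half_width, y_bar + half_width)

y_bar, s_y, std_err, (lower, upper) = confidence_interval(weights, T_CRIT)
print(f"mean={y_bar:.3f} sd={s_y:.3f} se={std_err:.3f} CI=({lower:.2f}, {upper:.2f})")
```

The printed values should agree with the hand computation up to rounding.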

(b) What assumptions are you making about the nature of the sample and the nature of the
population of net weights?

I am assuming that the population of net weights of this production run of 5,000 Mortar Mix
bags follows approximately a Normal distribution and that the sample of these particular 6 bags
was drawn (bought) at random from that population.

1.2.2 For the above computation, complete the following sentence without using Greek symbols,
We are 95% confident that

We are 95% confident that the population average of the net weights of this production run of
5,000 bags of Quikrete Mortar Mix lies between 56.11 and 64.23 pounds (assuming that the
population follows a Normal distribution and that the 6 bags in the sample were purchased at
random from that population).

Section 1.3

1.3.1 Modify the SAS example program in Section 1.3 to run on the Quikrete data set. Be sure
to change both the variable name and the title. Add your name at the end of the title. Show your
program and the output. Check that your original outcome matches. (With some
experimentation, you should be able to Copy and Paste from SAS into Word. This will make
your much more readable than just adding pages printed directly from SAS. Often, using font
Courier 8 pt or 10 pt will get the SAS output to line up nicely.)

Editor

dm "out; clear; log; clear";

Data QuikreteMortarMix6;
Input NetWt;
Datalines;
54
60
63
61
58
65
;

Proc Means Data=QuikreteMortarMix6 Mean Std StdErr N CLM MaxDec=3;
Var NetWt;
Title "Univariate Summary for Net Weights of 6 Bags of Quikrete Mortar Mix, M. Kanakidi";
run; quit;

The Output

Section 1.4

1.4.1 Tell the Story of Many Possible Samples for the Quikrete data, still assuming the
production run was of size N=5000. Try to write this in your own words and for an audience of
beginners. Include a sketch of the probability distribution of the parent population and that of
the daughter population.

A professor bought six bags of Quikrete Mortar Mix that came from one production run of 5,000
bags. It is natural to suppose that many, or in fact all, of the bags from this production run
have different weights. We can therefore say that the professor obtained a so-called sample of
6 bags (weighing 54, 60, 63, 61, 58, 65 pounds) out of a population of 5,000 bags. These 6 bags
represent a sample point. It is also clear that on any other day (or even at another time of
day) the professor would probably have gotten a slightly different sample point (for example,
bags weighing 57, 62, 64, 56, 59, 60 pounds). All the possible sample points of the original
population can be counted with the simple formula for combinations without replacement:
${}_{5000}C_6 \approx 2.164 \times 10^{19}$. We therefore have about $2.164 \times 10^{19}$
possible samples of 6 bags, where no bag can appear more than once in a given sample point but
each bag appears in many, many different sample points. This complete collection of sample
points is called the sample space of Quikrete Mortar Mix bags for the production run of 5,000.
Each sample point has its own sample average. The population of $2.164 \times 10^{19}$ sample
averages is called the daughter population, as opposed to the original parent population of
5,000 net weights of the bags. The daughter population has its own statistical characteristics,
including its population average, which is called the grand average of the sample averages,
$\mu_{\bar{Y}}$. Because every bag appears in equally many sample points, the grand average and
the parent population average always have the same value, $\mu_{\bar{Y}} = \mu_Y$, as pictured
in the sketch of the probability distributions of the parent population and the daughter
population.
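The size of the sample space can be counted directly. The sketch below, in Python with only the standard library, counts the sample points of size 6 from 5,000 bags and also checks how many of them contain any one particular bag; the variable names are chosen just for this illustration.

```python
import math

POPULATION_SIZE = 5000  # bags in the production run (from the problem)
SAMPLE_SIZE = 6         # bags bought by the professor

# Number of distinct sample points: combinations without replacement.
num_sample_points = math.comb(POPULATION_SIZE, SAMPLE_SIZE)

# Sample points that contain one given bag: choose the other 5 from the remaining 4,999.
points_with_given_bag = math.comb(POPULATION_SIZE - 1, SAMPLE_SIZE - 1)

print(f"sample points: {num_sample_points:.3e}")
print(f"sample points containing a given bag: {points_with_given_bag:.3e}")
```

Every bag appears in the same number of sample points (a fraction 6/5000 of them), which is the counting fact behind the grand average matching the parent population average.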

Section 1.5

1.5.1 What does it mean with respect to the Quikrete data set to say that the sample average is
an unbiased estimator?

It means that the methodology of using the sample average to estimate the population average is
unbiased, because the grand average of the sample averages, $\mu_{\bar{Y}}$, equals the
population average, $\mu$. In other words, the sample averages of all possible samples are
centered exactly on the population average, so the method neither overestimates nor
underestimates it systematically. With respect to the Quikrete data we can say that
60.167 pounds is an unbiased estimate of $\mu$.
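Unbiasedness can be seen by brute force on a population small enough to enumerate. The sketch below is a toy illustration only: it pretends, purely hypothetically, that the six observed weights are the entire parent population and lists every possible sample of size 2. It is not a computation on the real 5,000 bags.

```python
from itertools import combinations

# Hypothetical toy parent population: the six observed weights, treated as if
# they were the whole population (an assumption made only for this illustration).
toy_population = [54, 60, 63, 61, 58, 65]
TOY_SAMPLE_SIZE = 2

# Daughter population: the sample average of every possible sample point.
sample_averages = [
    sum(point) / TOY_SAMPLE_SIZE
    for point in combinations(toy_population, TOY_SAMPLE_SIZE)
]

grand_average = sum(sample_averages) / len(sample_averages)
population_average = sum(toy_population) / len(toy_population)

print(len(sample_averages), grand_average, population_average)
```

The 15 sample averages differ from one another, yet their grand average coincides with the toy population average, which is what "unbiased" means here.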

Section 1.6

1.6.1 Explain the meaning of a 95% confidence interval for Quikrete using the Story of Many
Possible Samples.

According to our calculations, a 95% confidence interval for $\mu$ is (56.11, 64.23), based on
the net weights of the six bags originally bought. It is reasonable to suppose that not every
possible sample point of six bags will yield the same confidence interval. In fact, each sample
point yields its own interval. But 95% of the confidence intervals from the
$2.164 \times 10^{19}$ sample points will capture the population average of the parent
population of 5,000 bags.
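The "95% of all possible intervals" idea can be illustrated by simulation. The sketch below builds a hypothetical parent population of 5,000 weights (Normal with mean 60 and standard deviation 4, values invented for this illustration and not taken from the problem), draws many random samples of 6 bags, and counts how often the interval captures the population average. The t-critical value 2.571 is copied from the text.

```python
import math
import random
import statistics

random.seed(9700)  # fixed seed so the run is repeatable
T_CRIT = 2.571     # t-critical value for 5 df, alpha = 5% (taken from the text)

# Hypothetical parent population: mean 60 and sd 4 are assumptions for illustration.
parent_population = [random.gauss(60, 4) for _ in range(5000)]
population_average = statistics.mean(parent_population)

def interval_captures(sample):
    """True if the 95% confidence interval from this sample covers the population average."""
    y_bar = statistics.mean(sample)
    half_width = T_CRIT * statistics.stdev(sample) / math.sqrt(len(sample))
    return y_bar - half_width <= population_average <= y_bar + half_width

NUM_SAMPLES = 5000  # a tiny subset of the roughly 2.164e19 possible sample points
hits = sum(interval_captures(random.sample(parent_population, 6)) for _ in range(NUM_SAMPLES))
coverage = hits / NUM_SAMPLES
print(f"share of intervals capturing the population average: {coverage:.3f}")
```

The share should come out close to 0.95, although any single interval either captures the population average or does not.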

Section 1.7

1.7.1 Using the Quikrete data, above, perform a two-sided t-test with a null hypothesis claiming
$\mu$ is equal to 60 pounds. Do this task with a pocket calculator, showing all work.
(Fortunately, in question 1.2.1(a) you already computed the sample standard deviation
and the standard error of the sample mean, so the computations will be few.)

Explain your work as you go. For example, if $\bar{y}$ was close to $\mu_0$, would you reject the null?
Why divide the raw distance between the sample average and the claimed value of $\mu$ by the
standard error of the sample average?

Include a sketch of the relevant t-distribution showing the t-critical values, the t-statistic
(or "t-value"), the rejection regions, and the p-value.

For a two-sided test with a null hypothesis claiming $\mu_Y$ is equal to 60 pounds, the
alternative hypothesis states that $\mu_Y \neq 60$ pounds. So we have $H_0$: $\mu_Y = 60$ and
$H_A$: $\mu_Y \neq 60$. We will reject the claim that $\mu_Y = 60$ if our sample average is far,
in standardized units, from 60. (If $\bar{y}$ were close to $\mu_0$ in those units, we would not
reject the null.) To find how far 60 is in standardized units from 60.167, we divide the raw
distance between the observed sample average and the claimed value of $\mu$, found as
60.167 - 60 = 0.167, by the standard deviation of the daughter population of sample averages,
$\sigma_{\bar{Y}}$. However, the exact value of $\sigma_{\bar{Y}}$ is unknown in our case, so we
use the standard error of the sample mean, $s_{\bar{y}} = 1.579$, as an estimate for
$\sigma_{\bar{Y}}$. Hence, we have the distance that is called the t-statistic:

$$t = \frac{\bar{y} - \mu_0}{s_{\bar{y}}} = \frac{0.167}{1.579} = 0.106 \approx 0.11,$$

which is the number of standard errors between the sample average and the claimed value of
$\mu$. The division of the raw distance by the standard error of the sample average is done in
order to understand, in statistical terms, how far the claimed population average is from our
sample average. In statistics, words such as "far" or "close" carry no meaning by themselves,
because a raw distance depends on the units and on the variability of the data. However, if we
state that the sample average is greater than the claimed population average by 0.106 standard
errors, that is valuable information to work with. Now, since the t-critical value for
$n - 1 = 5$ degrees of freedom is 2.571 and the absolute value of 0.106 is not greater than the
t-critical value, we do not reject the null hypothesis. It is often the case that the null
hypothesis cannot be rejected when we work with small samples (a sample of 6 bags in our case);
however, failing to reject it does not mean that it is true.

t-Distribution with 5 degrees of freedom
t-critical values of -2.571 and 2.571
t-statistic is 0.11, p-value 0.9201
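The t-statistic and the p-value quoted above can be reproduced without statistical software. The sketch below uses only the Python standard library: it evaluates the Student t density with 5 degrees of freedom and integrates it numerically with Simpson's rule. The helper names `t_pdf` and `two_sided_p_value` are invented for this illustration, and the critical value 2.571 is copied from the text.

```python
import math
import statistics

weights = [54, 60, 63, 61, 58, 65]
MU_0 = 60        # claimed population average under the null hypothesis
T_CRIT = 2.571   # t-critical value for 5 df, alpha = 5% (taken from the text)

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df, steps=1000):
    """Two-sided p-value: 1 minus the area between -|t| and |t| (Simpson's rule, even steps)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t

n = len(weights)
std_err = statistics.stdev(weights) / math.sqrt(n)
t_stat = (statistics.mean(weights) - MU_0) / std_err
p_value = two_sided_p_value(t_stat, n - 1)
reject_null = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
```

A p-value this large says that a sample average at least this far from 60 would be entirely unsurprising if the claim were true, which matches the decision not to reject.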

1.7.2 Perform the above t-test using SAS, inserting your name into the output. Show the program
and the output.

Editor:
dm "out; clear; log; clear";

Data QuikreteMortarMix;
Input NetWt;
Datalines;
54
60
63
61
58
65
;
Proc Ttest Data=QuikreteMortarMix H0=60;
Var NetWt;
Title "t-test for Claim mu=60, M. Kanakidi";
run;quit;

The Output (the following page)

Section 1.8

1.8.1 Compute the 95% prediction interval for the Quikrete data. Use a pocket calculator and
show your work. (Again, you have already done almost all the needed computations.)
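The 95% prediction interval has the following form: $\bar{y} \pm t_{\alpha/2}\, s_y \sqrt{1 + (1/n)}$, i.e., in the
Quikrete data case: $60.167 \pm 2.571 \times 3.869 \times \sqrt{1 + (1/6)}$, or $60.167 \pm 10.744$.
Answer: the 95% prediction interval is $60.167 \pm 10.744$, i.e., (49.42, 70.91) pounds.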
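The prediction-interval arithmetic can be checked the same way as the confidence interval. This is a sketch in Python using the standard library; the t-critical value 2.571 is again copied from the text, and the small difference in the last digit of the half-width comes from using the unrounded standard deviation.

```python
import math
import statistics

weights = [54, 60, 63, 61, 58, 65]
T_CRIT = 2.571  # t-critical value for 5 df, alpha = 5% (taken from the text)

n = len(weights)
y_bar = statistics.mean(weights)
s_y = statistics.stdev(weights)

# Half-widths: the prediction interval uses sqrt(1 + 1/n), the confidence interval sqrt(1/n).
pi_half_width = T_CRIT * s_y * math.sqrt(1 + 1 / n)
ci_half_width = T_CRIT * s_y * math.sqrt(1 / n)

pi_lower, pi_upper = y_bar - pi_half_width, y_bar + pi_half_width
print(f"prediction interval: ({pi_lower:.2f}, {pi_upper:.2f})")
print(f"half-widths: prediction {pi_half_width:.3f} vs confidence {ci_half_width:.3f}")
```

The extra 1 under the square root accounts for the variability of a single bag, so the prediction interval is much wider than the confidence interval built from the same data.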
1.8.2 What is this interval trying to capture? How is that different from a confidence interval?

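The 95% prediction interval is trying to capture the net weight of a single bag drawn at random
from the parent population of 5,000 bags, that is, an individual value rather than an average.
The confidence interval tries to capture the location of the population average of the parent
population, $\mu_Y$, which is why it is the narrower of the two intervals.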
