
KEY

Summer 2013

ECON 335 Homework 2
75 points
Basic Statistics and CLRM
KEY

1. Suppose you collected data on the average number of text messages sent per month by 300 randomly selected teenagers over 2009 and 2010 and found the sample mean ($\bar{Y} = \frac{1}{300}\sum_{i=1}^{300} Y_i$) was 3,339 text messages with an estimated standard error of $se(\bar{Y}) = \hat{\sigma}/\sqrt{n} = 250$. Assume that the population data are normally distributed.

a) Provide an estimate of the mean and the variance of the population distribution of monthly teenager text messaging using your sample.

$\bar{Y}$ is our best estimate of the population mean, while the population variance is estimated, here, by $\hat{\sigma}^2 = (250\sqrt{n})^2 = (250\sqrt{300})^2 = 1.875 \times 10^7$. Note that we get this by using the definition of standard error (the standard deviation of the estimator, $\bar{Y}$) and solving for $\hat{\sigma}^2$. So we estimate that the population is approximately normally distributed $N(3339,\ 1.875 \times 10^7)$.

b) Using the distribution you calculated in a), compute the probability that any teenager will send more than 5,500 text messages.

By drawing the picture below, we see that we are looking for the area to the right of 5500.

[Figure: normal density of Y, with 3339 and 5500 marked on the horizontal axis]


Note that we want to use the estimates of the population parameters to calculate an area under that distribution, NOT perform a hypothesis test on the mean. As such, this is only an estimate as well. To find this, we need to transform the $N(3339,\ 1.875 \times 10^7)$ distribution into one we know about, namely a standard normal, or $N(0, 1)$. We make this transformation as follows:

$$P(Y > 5500) = P\left(\frac{Y - 3339}{4330.13} > \frac{5500 - 3339}{4330.13}\right) = P(Z > .50) = 1 - P(Z \le .50) \approx 1 - .6915 = .3085$$

Or, approximately 30.85%.
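The calculation in parts (a) and (b) can be checked numerically. The following is a minimal sketch using only Python's standard library; the variable names are illustrative and not part of the original key. Because it does not round the z-value to .50, the result differs slightly from the table-based .3085.

```python
from math import sqrt
from statistics import NormalDist

n = 300            # sample size
y_bar = 3339.0     # sample mean
se_y_bar = 250.0   # estimated standard error of the sample mean

# se(Y_bar) = sigma_hat / sqrt(n), so sigma_hat = se(Y_bar) * sqrt(n)
sigma_hat = se_y_bar * sqrt(n)
var_hat = sigma_hat ** 2

# Estimated population distribution N(y_bar, sigma_hat^2)
population = NormalDist(mu=y_bar, sigma=sigma_hat)

# Area to the right of 5500
p_above_5500 = 1.0 - population.cdf(5500)

print(f"sigma_hat    = {sigma_hat:.2f}")
print(f"var_hat      = {var_hat:.4g}")
print(f"P(Y > 5500)  = {p_above_5500:.4f}")
```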

c) Approximately what proportion of teenagers send between 500 and 1,000 text messages per month?

This is the same sort of calculation as in part (b). We're looking to find

$$P(500 \le Y \le 1000) = P\left(\frac{500 - 3339}{4330.13} \le Z \le \frac{1000 - 3339}{4330.13}\right) = P(-.656 \le Z \le -.540)$$

We can calculate this area as $P(Z \le -.540) - P(Z \le -.656) \approx .2946 - .2546 = .04$, or approximately 4%.

d) How many text messages per month is associated with the top 10% of all teenagers?

So, we are looking for the value of c such that $P(Y \le c) = 0.90$. Again, we are not testing the hypothesis about the population parameter, so we use our estimate of the population mean and variance as follows:


$$P(Y \le c) = 0.90 \;\Rightarrow\; P\left(\frac{Y - 3339}{4330.13} \le \frac{c - 3339}{4330.13}\right) = 0.90 \;\Rightarrow\; P\left(Z \le \frac{c - 3339}{4330.13}\right) = 0.90$$

The critical value for which 90% of the density of a standard normal variable lies to the left is about 1.29. So $\frac{c - 3339}{4330.13} \approx 1.29$. Solving for c, we obtain 8924.87. Thus, about 10% of teenagers send more than 8925 texts per month.

Again, note the form of questions (a)-(d): we are asking about probabilities related to the distribution of the population, NOT about inferences (drawing conclusions about the population parameter from the sample data) relating to the mean.
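Parts (c) and (d) can be checked the same way. This is a rough sketch with Python's standard library, assuming the estimated distribution from part (a); the variable names are illustrative. The exact 90th-percentile z-value is a little below the table value of 1.29 used above, so the computed cutoff comes out somewhat below 8,925.

```python
from math import sqrt
from statistics import NormalDist

# Estimated population distribution from part (a)
population = NormalDist(mu=3339.0, sigma=250.0 * sqrt(300))

# Part (c): proportion sending between 500 and 1,000 texts
p_between = population.cdf(1000) - population.cdf(500)

# Part (d): value c with 90% of the distribution to its left
c_top10 = population.inv_cdf(0.90)

print(f"P(500 <= Y <= 1000) = {p_between:.4f}")
print(f"90th percentile c   = {c_top10:.2f}")
```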

e) Provide the 95% confidence interval for the mean number of text messages per month, and provide an interpretation.

Here's where we get into inference. Since we are using an estimate of the population variance, we'll transform $\bar{Y}$ to a t-distribution:

$$t = \frac{\bar{Y} - \mu}{se(\bar{Y})} \sim t(n-1)$$

For a 95% confidence interval, we want to find the critical t-stats that leave .025 of the probability mass in each of the two tails. These t-stats for $t(299)$ are approximately $\pm 1.96$. Note these are approximately the same critical values we would obtain from the standard normal distribution; however, this is only because our number of observations here is large (if we had a smaller number of observations, the appropriate number of degrees of freedom, i.e. n-1, would be needed to obtain the correct critical value).

Our 95% confidence interval thus implies

$$P\left(-1.96 \le \frac{3339 - \mu}{250} \le 1.96\right) = .95$$

Solving, we obtain $P[2849 \le \mu \le 3829] = .95$.


Thus, the interval estimate for the mean number of texts per month by teenagers is (2849, 3829). We are 95% confident that this interval contains the true mean: in repeated samples, about 95% of intervals constructed this way would contain it.
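The interval in part (e) can be reproduced with a few lines of Python. This sketch uses the standard normal critical value as a stand-in for the t(299) value, which is the same large-sample approximation the key relies on; the variable names are illustrative.

```python
from statistics import NormalDist

y_bar = 3339.0   # sample mean
se = 250.0       # estimated standard error of the sample mean

# Critical value leaving .025 in each tail (normal approximation to t(299))
z_crit = NormalDist().inv_cdf(0.975)

lower = y_bar - z_crit * se
upper = y_bar + z_crit * se

print(f"critical value = {z_crit:.3f}")
print(f"95% CI         = ({lower:.0f}, {upper:.0f})")
```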

f) Test the null hypothesis that the mean number of text messages per month sent by teenagers is 2,500 at the 5% level of significance using a two-tailed test, being sure to write down the null and alternative hypotheses, the test statistic, indicate the rejection region, and provide a conclusion.

Here we test $H_0: \mu = 2500$ against the alternative $H_A: \mu \ne 2500$. Our test statistic (under the null hypothesis that $\mu = 2500$) is

$$t = \frac{\bar{Y} - \mu_0}{se(\bar{Y})} = \frac{3339 - 2500}{250} = 3.356 \sim t(299)$$

The critical t-values, which define the rejection region, are $\pm 1.96$. Our t-stat lies in the right tail of the rejection region, i.e. 1.96 < 3.356, so we REJECT the null hypothesis in favor of the alternative. In other words, we have enough evidence to conclude that the mean number of texts is different from 2500. This is consistent with our interval estimate: 2500 does NOT lie in the 95% confidence interval, so we would reject the null hypothesis.

g) Repeat part f), but now test the alternative hypothesis that the mean number of text messages sent by teenagers per month is greater than 2,500.

This is now a one-tailed test, with $H_0: \mu \le 2500$ against the alternative $H_A: \mu > 2500$. The test statistic is the same, but now the rejection region is the right tail of the distribution only. Large t-stat values give us evidence in favor of the alternative. The critical t-value is about 1.645, and clearly 1.645 < 3.356, so we REJECT the null hypothesis. Again, we have enough evidence to reject the null in favor of the alternative.
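The two tests in parts (f) and (g) use the same test statistic and differ only in the rejection region. A minimal sketch of both decisions follows, again approximating the t(299) critical values with standard normal ones; the helper name `t_stat` is hypothetical and only for illustration.

```python
from statistics import NormalDist

def t_stat(estimate, null_value, std_error):
    """Test statistic: distance of the estimate from the null, in standard errors."""
    return (estimate - null_value) / std_error

t = t_stat(3339.0, 2500.0, 250.0)

std_normal = NormalDist()
crit_two_tailed = std_normal.inv_cdf(0.975)  # .025 in each tail
crit_one_tailed = std_normal.inv_cdf(0.95)   # .05 in the right tail

reject_two_tailed = abs(t) > crit_two_tailed  # H_A: mu != 2500
reject_one_tailed = t > crit_one_tailed       # H_A: mu > 2500

print(f"t = {t:.3f}")
print(f"two-tailed: critical = {crit_two_tailed:.3f}, reject = {reject_two_tailed}")
print(f"one-tailed: critical = {crit_one_tailed:.3f}, reject = {reject_one_tailed}")
```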


2. Consider a CLRM of the form $Y_i = \beta_1 + \beta_2 X_i + u_i$ where $u_i \sim N(0, \sigma^2)$. (Use the assumptions of the CLRM discussed in the lecture notes to answer the following questions.)

a. Identify the independent variable in this model. Is it random? If so, how is it distributed (what is the mean and variance)? Is it observable or not?

The independent variable is X, which is fixed in repeated samples and thus nonrandom. It is observable, as it is part of the data.

b. Identify the dependent variable in this model. Is it random? If so, how is it distributed (what is the mean and variance)? Is it observable or not?

The dependent variable is Y, which is random since it is a linear function of the random variable u. It is distributed $Y_i \sim N(\beta_1 + \beta_2 X_i,\ \sigma^2)$, which can be shown by taking the expectation and variance of Y. BE SURE you understand this!

c. Identify the parameters of this model. Are they random? If so, how are they distributed (what is the mean and variance)? Is it observable or not?

The parameters are $\beta_1$ and $\beta_2$ which, being true, unknown population parameters, are non-random and not observable.

d. Identify the error term in this model. Is it random? If so, how is it distributed (what is the mean and variance)? Is it observable or not?

The error term, $u_i$, is random, distributed $u_i \sim N(0, \sigma^2)$, and unobservable since the underlying population parameters are unobservable.
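The distribution of Y stated in part (b) follows in two short steps. The derivation below is a sketch that relies on the assumptions already given: X is fixed in repeated samples, the parameters are non-random constants, and the error term has mean zero and variance $\sigma^2$.

```latex
\begin{align*}
E[Y_i] &= E[\beta_1 + \beta_2 X_i + u_i]
        = \beta_1 + \beta_2 X_i + E[u_i]
        = \beta_1 + \beta_2 X_i, \\
\operatorname{Var}(Y_i) &= \operatorname{Var}(\beta_1 + \beta_2 X_i + u_i)
        = \operatorname{Var}(u_i)
        = \sigma^2 .
\end{align*}
```

Since $Y_i$ is a constant plus a normally distributed term, it is itself normal, which gives $Y_i \sim N(\beta_1 + \beta_2 X_i,\ \sigma^2)$.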


3. Consider the following data, where all summations are over the index i:

  $X_i$    $Y_i$    $x_i = (X_i - \bar{X})$    $x_i^2$      $y_i = (Y_i - \bar{Y})$    $x_i y_i$
  3        6        3 - 1 = 2                  2^2 = 4      6 - 2.4 = 3.6              2(3.6) = 7.2
  2        2        1                          1            -0.4                       -0.4
  1        4        0                          0            1.6                        0
  -1       2        -2                         4            -0.4                       0.8
  0        -2       -1                         1            -4.4                       4.4
  Sum: 5   12       0                          10           0                          12

a. Complete the above table, putting the sums in the last row. What is $\bar{X}$? What is $\bar{Y}$?

$$\bar{X} = \frac{\sum X_i}{n} = \frac{5}{5} = 1 \qquad \bar{Y} = \frac{\sum Y_i}{n} = \frac{12}{5} = 2.4$$

b. Calculate $\hat{\beta}_1$ and $\hat{\beta}_2$ from this chart and provide an interpretation of each.

$\hat{\beta}_2$ is our estimated slope coefficient and can be calculated as follows:

$$\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{12}{10} = 1.2$$

It means that a one unit change in X will lead to a 1.2 unit change in Y.

$\hat{\beta}_1$ is our estimated intercept coefficient and can be calculated as follows:

$$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X} = 2.4 - 1.2(1) = 1.2$$

It means that the mean of Y will be equal to 1.2 when the value of X is zero. As we have seen and will continue to see, often this literal interpretation of the intercept term has no real economic importance.
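The deviation-form calculations in parts (a) and (b) can be reproduced in a few lines of plain Python. This is a sketch using the five data points from the table; the variable names are illustrative.

```python
X = [3, 2, 1, -1, 0]
Y = [6, 2, 4, 2, -2]
n = len(X)

# Sample means
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Deviations from the means (the lowercase x_i and y_i columns)
x_dev = [xi - x_bar for xi in X]
y_dev = [yi - y_bar for yi in Y]

# Column sums needed for the slope
sum_xy = sum(xd * yd for xd, yd in zip(x_dev, y_dev))
sum_xx = sum(xd * xd for xd in x_dev)

# OLS slope and intercept
beta2_hat = sum_xy / sum_xx
beta1_hat = y_bar - beta2_hat * x_bar

print(f"x_bar = {x_bar}, y_bar = {y_bar}")
print(f"sum x*y = {sum_xy:.1f}, sum x^2 = {sum_xx:.1f}")
print(f"beta2_hat = {beta2_hat:.2f}, beta1_hat = {beta1_hat:.2f}")
```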


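c. Use your results from b) to complete the following table, putting the sums in the last row.

  $X_i$    $Y_i$    $\hat{Y}_i$             $\hat{u}_i$        $\hat{u}_i^2$     $X_i \hat{u}_i$
  3        6        1.2 + 1.2(3) = 4.8      6 - 4.8 = 1.2      1.2^2 = 1.44      3(1.2) = 3.6
  2        2        3.6                     -1.6               2.56              -3.2
  1        4        2.4                     1.6                2.56              1.6
  -1       2        0                       2                  4                 -2
  0        -2       1.2                     -3.2               10.24             0
  Sum: 5   12       12                      0                  20.8              0

Note that the average prediction is $\bar{Y}$, that the least squares residuals sum to zero, and that the sum of the independent variable times the least squares residual is also zero. These are not coincidences: they are algebraic properties of least squares estimation.

d. Show that for this problem, $\bar{Y} = \hat{\beta}_1 + \hat{\beta}_2 \bar{X}$.

$$\hat{\beta}_1 + \hat{\beta}_2 \bar{X} = 1.2 + 1.2(1) = 2.4 = \bar{Y}$$

e. Show that for this problem, the mean of the predicted values $\hat{Y}_i$ equals $\bar{Y}$.

$$\frac{\sum \hat{Y}_i}{n} = \frac{12}{5} = 2.4 = \bar{Y}$$

f. Compute $\hat{\sigma}^2$. (Note: This is the S.E. of the regression squared.)

$$\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n - 2} = \frac{20.8}{3} = 6.9333$$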
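The table in part (c) and the checks in parts (d) through (f) can be verified numerically. The sketch below takes the estimates from part (b) as given and uses illustrative variable names; sums that should be exactly zero may print as tiny floating-point values.

```python
X = [3, 2, 1, -1, 0]
Y = [6, 2, 4, 2, -2]
n = len(X)
beta1_hat, beta2_hat = 1.2, 1.2   # estimates from part (b)

# Fitted values and least squares residuals
Y_hat = [beta1_hat + beta2_hat * xi for xi in X]
resid = [yi - yh for yi, yh in zip(Y, Y_hat)]

# Column sums from the table
sum_resid = sum(resid)                                 # should be ~0
sum_x_resid = sum(xi * ui for xi, ui in zip(X, resid)) # should be ~0
rss = sum(ui ** 2 for ui in resid)                     # sum of squared residuals

# Parts (d) and (e): the fitted line passes through the means
x_bar, y_bar = sum(X) / n, sum(Y) / n
line_at_x_bar = beta1_hat + beta2_hat * x_bar
mean_y_hat = sum(Y_hat) / n

# Part (f): estimated error variance with n - 2 degrees of freedom
sigma2_hat = rss / (n - 2)

print(f"sum of residuals     = {sum_resid:.6f}")
print(f"sum of X * residuals = {sum_x_resid:.6f}")
print(f"RSS                  = {rss:.2f}")
print(f"mean of Y_hat        = {mean_y_hat:.2f}")
print(f"sigma2_hat           = {sigma2_hat:.4f}")
```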


g. Compute the $R^2$ for this regression. Recall, $R^2 = 1 - \frac{RSS}{TSS}$. Provide an interpretation of your result.

First, we need to know that $RSS = \sum \hat{u}_i^2 = 20.8$; we calculated this in part (c). However, we still need to calculate $TSS = \sum y_i^2 = \sum (Y_i - \bar{Y})^2$:

$$TSS = (6 - 2.4)^2 + (2 - 2.4)^2 + (4 - 2.4)^2 + (2 - 2.4)^2 + (-2 - 2.4)^2 = 35.2$$

So, therefore

$$R^2 = 1 - \frac{RSS}{TSS} = 1 - \frac{20.8}{35.2} = 0.409$$

This means that approximately 41% of the variation in Y is explained by the independent variable X.
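As a quick check on the $R^2$ calculation, the sketch below recomputes both sums of squares from the raw data, taking the estimates from part (b) as given. The variable names are illustrative.

```python
X = [3, 2, 1, -1, 0]
Y = [6, 2, 4, 2, -2]
n = len(Y)
beta1_hat, beta2_hat = 1.2, 1.2   # estimates from part (b)

y_bar = sum(Y) / n

# Residual sum of squares: squared distance from each Y to its fitted value
rss = sum((yi - (beta1_hat + beta2_hat * xi)) ** 2 for xi, yi in zip(X, Y))

# Total sum of squares: squared distance from each Y to the sample mean
tss = sum((yi - y_bar) ** 2 for yi in Y)

r_squared = 1 - rss / tss

print(f"RSS = {rss:.1f}, TSS = {tss:.1f}, R^2 = {r_squared:.3f}")
```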
