
STA 6166, Section 8489, Fall 2007 Homework #4 Due 18 October 2007

RAMIN SHAMSHIRI UFID#: 9021-3353

Ramin Shamshiri

STA6166, HW#4, Oct.18.2007

Page 1

1. Chapter 4, Freund and Wilson, page 180. Do all 13 concept questions (true/false).
2. Chapter 4, Freund and Wilson, page 181. Under practice exercises, do questions 2, 3, and 4. If possible, use a software package to answer questions 2 and 3.
3. Chapter 7, Freund and Wilson, page 327. Under exercises, do question 1 by hand. Show all work.
4. Chapter 7, Freund and Wilson, page 328. Under exercises, do question 5 using a software package.
Do not submit raw output from your analysis. Please embed any tables, graphs, etc. into your write-up (as tables or graphs, etc.).


Chapter 4 Concept Questions

1. The t-distribution is more dispersed than the Normal.
Answer: True. The variance of the t-distribution is greater than 1, the variance of the standard normal.

2. The $\chi^2$ distribution is used for inference on the mean when the variance is unknown.
Answer: False. The $\chi^2$ distribution is used for inference on a variance or standard deviation.

3. The mean of the t distribution is affected by the degrees of freedom.
Answer: False. The mean of the t distribution is equal to zero, like the z distribution.

4. The quantity $(\bar{y} - \mu)/\sqrt{\sigma^2/n}$ has the t distribution with (n-1) degrees of freedom.
Answer: False. That is the z statistic for a mean, used when the population standard deviation is known.

5. In the t-test for a mean, the level of significance increases if the population standard deviation increases, holding the sample size constant.
Answer: False. There is no relation between the significance level and the population standard deviation.

6. The $\chi^2$ distribution is used for inferences on the variance.
Answer: True.

7. The mean of the t distribution is zero.
Answer: True.

8. When the test statistic is t and the number of degrees of freedom is >30, the critical value of t is very close to that of z.
Answer: True. As the sample size (and so the degrees of freedom) increases, the t distribution approaches the standard normal distribution.

9. The $\chi^2$ distribution is skewed and its mean is always 2.
Answer: False. The $\chi^2$ distribution is positively skewed and its shape changes with the degrees of freedom: the higher the degrees of freedom, the less skewed the distribution. Its mean equals its degrees of freedom, so it is not always 2. At about 100 degrees of freedom, the $\chi^2$ distribution becomes somewhat symmetric.

10. The variance of a binomial proportion is np(1-p).
Answer: False. np(1-p) is the variance of the binomial count; the variance of the sample proportion is p(1-p)/n.

11. The sampling distribution of a proportion is approximated by the $\chi^2$ distribution.
Answer: False. The sampling distribution of a proportion (binomial distribution) is approximated by the normal distribution.

12. The t test can be applied with absolutely no assumption about the distribution of the population.
Answer: False. The t test assumes that the population distribution is approximately normal.

13. The degrees of freedom for the t test do not necessarily depend on the sample size used in computing the mean.
Answer: True. They depend on the sample size used in computing the standard deviation.


Chapter 4- question No.2
The following sample was taken from a normally distributed population: 3, 4, 5, 5, 6, 6, 6, 7, 7, 9, 10, 11, 12, 12, 13, 13, 13, 14, 15

a- Compute the 0.95 confidence interval on the population mean.
95% confidence interval => $\alpha = 0.05$ => $\alpha/2 = 0.025$. With 19 - 1 = 18 degrees of freedom, the table gives $t_{\alpha/2} = t_{0.025} = 2.1$.

Statistic    Value
n            19
$\bar{y}$    9
$s^2$        14.44
$s$          3.81

If we had the population variance, we could use the normal distribution. Here we do not have the population variance, and the sample size is also less than 30, so we use the t-distribution. The confidence interval for the mean is:

$$\bar{y} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{y} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$

$$9 - 2.1\left(\frac{3.81}{\sqrt{19}}\right) < \mu < 9 + 2.1\left(\frac{3.81}{\sqrt{19}}\right)$$

$$7.17 < \mu < 10.83$$
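As a quick check of the interval above, the calculation can be sketched in Python with only the standard library. The critical value 2.101 is an assumption here: it is the tabulated t value for 18 degrees of freedom, hard-coded rather than computed, and the function name is illustrative.

```python
import math
import statistics

# Sample from exercise 2 (n = 19).
sample = [3, 4, 5, 5, 6, 6, 6, 7, 7, 9, 10, 11, 12, 12, 13, 13, 13, 14, 15]

# Assumption: two-sided 95% critical value t(0.025, df = 18) read from a t table.
T_CRIT_95_DF18 = 2.101


def t_confidence_interval(data, t_crit):
    """Return (lower, upper) for the mean using ybar +/- t * s / sqrt(n)."""
    n = len(data)
    ybar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    half_width = t_crit * s / math.sqrt(n)
    return ybar - half_width, ybar + half_width


lower, upper = t_confidence_interval(sample, T_CRIT_95_DF18)
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```

The table value is kept as a named constant so that a different confidence level only requires swapping in another tabulated t value.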

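b- Compute the 0.90 confidence interval on the population standard deviation $\sigma$.
90% confidence interval => $\alpha = 0.1$ => $\alpha/2 = 0.05$

To calculate these confidence intervals, the chi-square ($\chi^2$) distribution is needed. The chi-square statistic is obtained as $\chi^2 = (n-1)s^2/\sigma^2$ when random samples are selected from a normally distributed population whose variance is $\sigma^2$. From the $\chi^2$ table with 18 degrees of freedom, the right and left tail values are $\chi^2_{0.05} = 28.869$ and $\chi^2_{0.95} = 9.39$.

Formula for the confidence interval for a variance (d.f. = n-1):

$$\frac{(n-1)s^2}{\chi^2_{right}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{left}} \quad \text{or} \quad \frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{(1-\alpha/2)}}$$

$$\frac{(19-1)(14.44)}{28.869} < \sigma^2 < \frac{(19-1)(14.44)}{9.39}$$

$$9 < \sigma^2 < 27.68$$

Formula for the confidence interval for a standard deviation (d.f. = n-1):

$$\sqrt{\frac{(n-1)s^2}{\chi^2_{right}}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_{left}}}$$

$$\sqrt{\frac{(19-1)(14.44)}{28.869}} < \sigma < \sqrt{\frac{(19-1)(14.44)}{9.39}}$$

$$3 < \sigma < 5.26$$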
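The variance and standard deviation intervals can be sketched the same way. The two chi-square values are assumptions in the sense that they are copied from a table for 18 degrees of freedom, not computed, and the function name is illustrative.

```python
import math
import statistics

sample = [3, 4, 5, 5, 6, 6, 6, 7, 7, 9, 10, 11, 12, 12, 13, 13, 13, 14, 15]

# Assumption: chi-square table values for df = 18.
CHI2_RIGHT_05 = 28.869  # value with 5% of the area to its right
CHI2_LEFT_05 = 9.390    # value with 5% of the area to its left


def variance_confidence_interval(data, chi2_right, chi2_left):
    """Return (lower, upper) for sigma^2 using (n - 1) s^2 / chi-square."""
    n = len(data)
    sum_sq = (n - 1) * statistics.variance(data)  # equals sum of (y - ybar)^2
    return sum_sq / chi2_right, sum_sq / chi2_left


var_lo, var_hi = variance_confidence_interval(sample, CHI2_RIGHT_05, CHI2_LEFT_05)
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)
print(f"90% CI for the variance: ({var_lo:.2f}, {var_hi:.2f})")
print(f"90% CI for the standard deviation: ({sd_lo:.2f}, {sd_hi:.2f})")
```

Small differences from the hand calculation come from using the unrounded sample variance (about 14.444) instead of 14.44.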


Chapter 4- question No.3


Using the data in exercise 2, test the following hypotheses:

3.a
H0: $\mu = 13$
H1: $\mu \neq 13$

This is a test for the population mean, and we do not know the population variance, so we use the t-test. Using a 5% significance level, $\alpha = 0.05$. Since the alternative hypothesis is two-sided, this is a two-tailed test, so we need the t-value corresponding to $\alpha/2 = 0.025$ with 18 degrees of freedom.
t(df=18, $\alpha/2$=0.025) = 2.1
-t(df=18, $\alpha/2$=0.025) = -2.1
We reject the null hypothesis if the t-value from the test is either less than -2.1 or larger than 2.1. In other words, we reject the null hypothesis if |t-value| > 2.1.

$$t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{9 - 13}{3.81/\sqrt{19}} = -4.57$$

Since -4.57 is less than -2.1, we reject the null hypothesis: there is sufficient evidence that the population mean is not equal to 13. The figure below also shows the result.
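A minimal sketch of this two-tailed t-test in Python follows. The critical value 2.101 is an assumed table value for 18 degrees of freedom, and the function name is illustrative.

```python
import math
import statistics

sample = [3, 4, 5, 5, 6, 6, 6, 7, 7, 9, 10, 11, 12, 12, 13, 13, 13, 14, 15]
MU_0 = 13  # hypothesized mean under H0

# Assumption: t(0.025, df = 18) read from a t table.
T_CRIT = 2.101


def one_sample_t(data, mu0):
    """Return t = (ybar - mu0) / (s / sqrt(n))."""
    n = len(data)
    std_error = statistics.stdev(data) / math.sqrt(n)
    return (statistics.mean(data) - mu0) / std_error


t_stat = one_sample_t(sample, MU_0)
reject_h0 = abs(t_stat) > T_CRIT  # two-tailed decision rule
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```

With the unrounded standard deviation the statistic comes out near -4.59 instead of -4.57; the decision is the same.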


3.b
H0: $\sigma^2 = 10$
H1: $\sigma^2 \neq 10$

This is a test for the population variance, so we use the chi-square test. This distribution is used to test a claim about a single variance or standard deviation.

Formula for the chi-square test for a single variance (d.f. = n-1):

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

Assumptions for the chi-square test for a single variance:
The sample must be randomly selected.
The population must be normally distributed for the variable under study.
The observations must be independent of each other.

Using a 5% significance level, $\alpha = 0.05$. This is also a two-tailed test, so we need both the upper and lower critical values of $\chi^2$ from the table.
$\chi^2$(df=18, $\alpha/2$=0.025) = 31.526
$\chi^2$(df=18, 1-$\alpha/2$=0.975) = 8.231
We reject the null hypothesis if the $\chi^2$-value from the test is either less than 8.231 or larger than 31.526.

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{(19-1)(14.44)}{10} = 25.99$$

Since 25.99 lies between 8.231 and 31.526, we do not reject the null hypothesis: there is not enough evidence that the population variance differs from 10.
Figure below:
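The chi-square statistic for a single variance can be checked with a short sketch. It assumes the hypothesized variance is 10, as stated in H0, and takes both critical values from a table for 18 degrees of freedom; the function name is illustrative.

```python
import statistics

sample = [3, 4, 5, 5, 6, 6, 6, 7, 7, 9, 10, 11, 12, 12, 13, 13, 13, 14, 15]
SIGMA2_0 = 10  # hypothesized variance under H0

# Assumption: chi-square table values for df = 18 and alpha = 0.05 (two-tailed).
CHI2_LOWER = 8.231
CHI2_UPPER = 31.526


def chi_square_statistic(data, sigma2_0):
    """Return (n - 1) * s^2 / sigma0^2."""
    n = len(data)
    return (n - 1) * statistics.variance(data) / sigma2_0


chi2_stat = chi_square_statistic(sample, SIGMA2_0)
reject_h0 = chi2_stat < CHI2_LOWER or chi2_stat > CHI2_UPPER
print(f"chi-square = {chi2_stat:.2f}, reject H0: {reject_h0}")
```

Note that the denominator is the hypothesized variance itself (10), not its square.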


Chapter 4- question No.4
A local congressman indicated that he would support the building of a new dam on the Yahoo River if at least 60% of his constituents supported the dam. His legislative aide sampled 225 registered voters in his district and found 135 favored the dam. At the level of significance of 0.1, should the congressman support the building of the dam?

Answer:
A hypothesis test involving a population proportion can be considered as a binomial experiment when there are only two outcomes and the probability of a success does not change from trial to trial. For the binomial distribution, $\mu = np$ and $\sigma = \sqrt{npq}$. Since the normal distribution can be used to approximate the binomial distribution when $np \geq 5$ and $nq \geq 5$, the standard normal distribution can be used to test hypotheses for proportions.

Let's first check the conditions:

$np \geq 5$: 225 × 0.6 = 135 ≥ 5. Yes
$nq \geq 5$: 225 × 0.4 = 90 ≥ 5. Yes

The claim is that at least 60% of his constituents support the dam, so we test the hypotheses below:
H0: $p \geq 0.6$
H1: $p < 0.6$
This is equivalent to testing:
H0: $np \geq 135$
H1: $np < 135$
This is a left-tailed test with $\alpha = 0.1$, so the critical value is $-z_{0.1} = -1.28$ [from table]. We reject the null hypothesis, and with it the claim, if the z-value from the test is less than -1.28.
The z-test for the proportion ($\hat{p} = X/n = 135/225 = 0.6$ is the sample proportion):

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0/n}} = \frac{0.6 - 0.6}{\sqrt{0.6 \times 0.4/225}} = 0$$

Since z = 0 is not less than -1.28, we do not reject the null hypothesis. There is not enough evidence to reject the claim that at least 60% of his constituents support the dam, so the congressman should support the building of the dam.
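The z-test for a proportion can be sketched as follows, assuming the test is set up as H0: p ≥ 0.6 against H1: p < 0.6 (a left-tailed test). The critical value comes from the standard library's `statistics.NormalDist`; the function and variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 225          # sampled voters
successes = 135  # voters in favor of the dam
p0 = 0.6         # proportion stated in H0
alpha = 0.10


def proportion_z(successes, n, p0):
    """Return z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


z_stat = proportion_z(successes, n, p0)
z_crit = NormalDist().inv_cdf(alpha)  # lower-tail critical value, about -1.28
reject_h0 = z_stat < z_crit           # assumed left-tailed decision rule
print(f"z = {z_stat:.2f}, critical value = {z_crit:.2f}, reject H0: {reject_h0}")
```

Because the sample proportion equals the hypothesized value exactly, the numerator is zero and the statistic is zero whichever one-sided alternative is chosen.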

Figure below:


Chapter 7- Question1

Oxidation (y)    Temperature (x)
4                -2
3                -2
3                 0
2                 1
2                 2

1. a- Calculate the estimated regression line to predict oxidation based on temperature. Explain the meaning of the coefficients and the variance of residuals.

Solution:
X: independent variable = Temperature
Y: dependent variable = Oxidation
A mathematical expression for a straight line is: $y = a_0 + a_1 x + e$
$a_0$: intercept
$a_1$: slope
$e$: error or residual between model and observation, $e = y - a_0 - a_1 x$
So the error is the discrepancy between the true value of y and the approximate value, $a_0 + a_1 x$, predicted by the linear equation.

One strategy for a best fit is to minimize the sum of the residual errors for all the available data:

$$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)$$

Another logical criterion might be to minimize the sum of the absolute values of the discrepancies:

$$\sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|$$

A third strategy is the least-squares criterion, which minimizes the sum of the squares of the residuals:

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_{i,measured} - y_{i,model})^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$$

Among these three strategies, the third one is used here to determine $a_0$ and $a_1$. The procedure is as follows:

$$\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i)$$

$$\frac{\partial S_r}{\partial a_1} = -2 \sum \left[ (y_i - a_0 - a_1 x_i) x_i \right]$$


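Setting these derivatives to zero will result in a minimum $S_r$:

$$0 = \sum y_i - \sum a_0 - \sum a_1 x_i$$

$$0 = \sum y_i x_i - \sum a_0 x_i - \sum a_1 x_i^2$$

Since $\sum a_0 = n a_0$, these become:

$$\sum y_i - n a_0 - a_1 \sum x_i = 0$$

$$\sum x_i y_i - a_0 \sum x_i - a_1 \sum x_i^2 = 0$$

Solving the two equations gives:

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}$$

$$a_0 = \bar{y} - a_1 \bar{x}$$

where $\bar{y} = \sum y_i / n$ and $\bar{x} = \sum x_i / n$ are the means of y and x respectively.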

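The same approach, with different notation, is given in the Freund and Wilson book, pages 295 and 296, to find the slope and intercept parameters of the model:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

$$y = \mu_{y|x} + \varepsilon$$

$$\mu_{y|x} = \beta_0 + \beta_1 x$$

The least squares criterion requires that we choose estimates of $\beta_0$ and $\beta_1$ that minimize

$$\sum (y - \hat{\mu}_{y|x})^2 = \sum (y - \hat{\beta}_0 - \hat{\beta}_1 x)^2$$

The estimates are:

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

$$\hat{\beta}_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}$$

where $\bar{x} = \sum x / n$, $\bar{y} = \sum y / n$, $S_{xx} = \sum (x - \bar{x})^2$ and $S_{xy} = \sum (x - \bar{x})(y - \bar{y})$.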


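No      x      y      $x-\bar{x}$   $(x-\bar{x})^2$   $y-\bar{y}$   $(x-\bar{x})(y-\bar{y})$   $xy$    $x^2$
1      -2      4      -1.8          3.24              1.2           -2.16                      -8      4
2      -2      3      -1.8          3.24              0.2           -0.36                      -6      4
3       0      3       0.2          0.04              0.2            0.04                       0      0
4       1      2       1.2          1.44             -0.8           -0.96                       2      1
5       2      2       2.2          4.84             -0.8           -1.76                       4      4
Means: $\bar{x} = -0.2$, $\bar{y} = 2.8$
Sums: $\sum (x-\bar{x})^2 = 12.8$, $\sum (x-\bar{x})(y-\bar{y}) = -5.2$, $\sum xy = -8$, $\sum x^2 = 13$

$$\hat{\beta}_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{-5.2}{12.8} = -0.40625$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 2.8 - (-0.4)(-0.2) = 2.72$$

or, with the first notation (using $\sum x = -1$ and $\sum y = 14$):

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2} = \frac{5(-8) - (-1)(14)}{5(13) - (-1)^2} = \frac{-40 + 14}{65 - 1} = \frac{-26}{64} = -0.40625$$

$$a_0 = \bar{y} - a_1 \bar{x} = 2.8 - (-0.4)(-0.2) = 2.72$$

$\hat{\mu}_{y|x} = 2.72 - 0.40625x$ is the estimated regression line to predict y, which is oxidation, based on temperature (x).

Explanation:
The meaning of the coefficients: the equation $\hat{\mu}_{y|x} = 2.72 - 0.40625x$ is a straight line with intercept 2.72 and slope -0.40625. The intercept is the estimated oxidation at x = 0, and the slope means that the estimated oxidation decreases by about 0.41 for each one-unit increase in temperature.
Variance of residuals: $\hat{\varepsilon} = y - \hat{\mu}_{y|x}$ is called the residual. The mean square (MSE), or variance of these residuals, is:

$$MSE = \frac{\sum (y - \hat{\mu}_{y|x})^2}{n - 2}$$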
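The hand calculation of the slope and intercept can be reproduced with a few lines of Python. This is only a sketch of the $S_{xy}/S_{xx}$ formulas above; the function name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope via Sxy / Sxx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


temperature = [-2, -2, 0, 1, 2]  # x
oxidation = [4, 3, 3, 2, 2]      # y

b0, b1 = fit_line(temperature, oxidation)
print(f"estimated line: y = {b0:.5f} + ({b1:.5f}) x")
```

The unrounded intercept is 2.71875, which rounds to the 2.72 used in the text.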


1.b- Calculate the estimated oxidation thickness for each of the temperatures in the experiment.
Solution (using the unrounded intercept 2.71875):
x = -2 => $\hat{\mu}_{y|x}$ = 2.71875 - 0.40625(-2) = 3.531
x = 0 => $\hat{\mu}_{y|x}$ = 2.71875 - 0.40625(0) = 2.719
x = 1 => $\hat{\mu}_{y|x}$ = 2.71875 - 0.40625(1) = 2.313
x = 2 => $\hat{\mu}_{y|x}$ = 2.71875 - 0.40625(2) = 1.906

1.c- Calculate the residuals and make a residual plot. Discuss the distribution of the residuals.
Solution: $\hat{\varepsilon} = y - \hat{\mu}_{y|x}$

No      x      y      $\hat{\mu}_{y|x}$   $\hat{\varepsilon}$   $\hat{\varepsilon}^2$
1      -2      4      3.531               0.469                 0.2197
2      -2      3      3.531              -0.531                 0.2822
3       0      3      2.719               0.281                 0.0791
4       1      2      2.313              -0.313                 0.0977
5       2      2      1.906               0.094                 0.0088
Means: $\bar{x} = -0.2$, $\bar{y} = 2.8$, mean of $\hat{\mu}_{y|x}$ = 2.8
Sum: $\sum \hat{\varepsilon}^2 = 0.6875$

The residuals sum to zero, as they must for a least-squares line with an intercept.

$$MSE = \frac{\sum (y - \hat{\mu}_{y|x})^2}{n - 2} = \frac{0.6875}{3} = 0.2292$$

1.d- Test the hypothesis that $\beta_1 = 0$, using both analysis of variance and t tests.
Answer: Testing $\beta_1 = 0$ asks whether there is a linear relationship between X and Y.
H0: $\beta_1 = 0$
H1: $\beta_1 \neq 0$
Testing at the 95% confidence level, the significance level ($\alpha$) is 5%. Since this is a two-tailed test, we look up t($\alpha/2$=0.025, df=3) in the t-table, which is equal to 3.1824. We reject the null hypothesis if the absolute value of our t-test statistic is larger than 3.1824.

The test statistic, with df = n-2 = 5-2 = 3, is:

$$t = \frac{\hat{\beta}_1 - 0}{\sqrt{MSE / S_{xx}}}$$

$$MSE = \frac{\sum (y - \hat{\mu}_{y|x})^2}{n - 2} = \frac{0.6875}{3} = 0.2292, \qquad S_{xx} = \sum (x - \bar{x})^2 = 12.8$$

$$\sqrt{\frac{MSE}{S_{xx}}} = \sqrt{\frac{0.2292}{12.8}} = 0.1338$$

$$t = \frac{-0.40625}{0.1338} = -3.04$$

Since |t| = 3.04 is smaller than 3.1824, we do not reject H0.
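The residual sum of squares, the t statistic for the slope and the matching analysis-of-variance F statistic can be sketched together. The critical value 3.182 is an assumed table value for 3 degrees of freedom, and the function name and dictionary keys are illustrative. For simple linear regression the F statistic equals the square of the t statistic, so both tests lead to the same decision.

```python
import math

temperature = [-2, -2, 0, 1, 2]  # x
oxidation = [4, 3, 3, 2, 2]      # y

# Assumption: t(0.025, df = 3) read from a t table.
T_CRIT_DF3 = 3.182


def slope_test(xs, ys):
    """Return SSE, MSE, the slope's standard error, t and F for H0: slope = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    ssr = slope * sxy   # regression sum of squares (1 df)
    sse = syy - ssr     # residual sum of squares (n - 2 df)
    mse = sse / (n - 2)
    se_slope = math.sqrt(mse / sxx)
    return {
        "sse": sse,
        "mse": mse,
        "se_slope": se_slope,
        "t": slope / se_slope,
        "f": ssr / mse,
    }


result = slope_test(temperature, oxidation)
reject_h0 = abs(result["t"]) > T_CRIT_DF3
print(f"SSE = {result['sse']:.4f}, MSE = {result['mse']:.4f}")
print(f"t = {result['t']:.2f}, F = {result['f']:.2f}, reject H0: {reject_h0}")
```

Computing SSE as $S_{yy} - \hat{\beta}_1 S_{xy}$ avoids rounding the individual residuals before squaring them.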


Chapter 7- Question5
It is generally believed that taller persons make better basketball players because they are better able to put the ball in the basket. The table below lists the heights of a sample of 25 non-basketball athletes and the number of successful baskets made in a 60-s time period.

a- Perform a regression relating Goals to Height to ascertain whether there is such a relationship and, if there is, estimate the nature of that relationship.
Answer:
Goals: dependent variable (Y)
Height: independent variable (X)

No     x      y      $x-\bar{x}$   $(x-\bar{x})^2$   $y-\bar{y}$   $(x-\bar{x})(y-\bar{y})$   $xy$     $x^2$
1      71     15     -2            4                 -1.84         3.68                       1065     5041
2      74     19      1            1                  2.16         2.16                       1406     5476
3      70     11     -3            9                 -5.84         17.52                      770      4900
4      71     15     -2            4                 -1.84         3.68                       1065     5041
5      69     12     -4            16                -4.84         19.36                      828      4761
6      73     17      0            0                  0.16         0                          1241     5329
7      72     15     -1            1                 -1.84         1.84                       1080     5184
8      75     19      2            4                  2.16         4.32                       1425     5625
9      72     16     -1            1                 -0.84         0.84                       1152     5184
10     74     18      1            1                  1.16         1.16                       1332     5476
11     71     13     -2            4                 -3.84         7.68                       923      5041
12     72     15     -1            1                 -1.84         1.84                       1080     5184
13     73     17      0            0                  0.16         0                          1241     5329
14     72     16     -1            1                 -0.84         0.84                       1152     5184
15     71     15     -2            4                 -1.84         3.68                       1065     5041
16     75     20      2            4                  3.16         6.32                       1500     5625
17     71     15     -2            4                 -1.84         3.68                       1065     5041
18     75     19      2            4                  2.16         4.32                       1425     5625
19     78     22      5            25                 5.16         25.8                       1716     6084
20     79     23      6            36                 6.16         36.96                      1817     6241
21     72     16     -1            1                 -0.84         0.84                       1152     5184
22     75     20      2            4                  3.16         6.32                       1500     5625
23     76     21      3            9                  4.16         12.48                      1596     5776
24     74     19      1            1                  2.16         2.16                       1406     5476
25     70     13     -3            9                 -3.84         11.52                      910      4900
Sum    1825   421                  148                             179                        30912    133373

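$$\hat{\beta}_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{179}{148} = 1.209$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 16.84 - (1.209)(73) = -71.41$$

$$\hat{\mu}_{y|x} = -71.41 + 1.209x$$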
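Since this question calls for a software package, the fit can be sketched in plain Python as one possible check. The function name `fit_with_r2` is illustrative; it also returns the coefficient of determination.

```python
heights = [71, 74, 70, 71, 69, 73, 72, 75, 72, 74, 71, 72, 73,
           72, 71, 75, 71, 75, 78, 79, 72, 75, 76, 74, 70]
goals = [15, 19, 11, 15, 12, 17, 15, 19, 16, 18, 13, 15, 17,
         16, 15, 20, 15, 19, 22, 23, 16, 20, 21, 19, 13]


def fit_with_r2(xs, ys):
    """Return (intercept, slope, r_squared) for a least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


b0, b1, r2 = fit_with_r2(heights, goals)
print(f"goals = {b0:.2f} + {b1:.3f} * height, R^2 = {r2:.3f}")
```

With the unrounded slope the intercept is about -71.45; rounding the slope to 1.209 before computing the intercept gives a value closer to -71.4, which explains the small difference between hand and software results.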

[Figure: scatter plot of y (goals) against x (height), with fitted trend line y = 1.209x - 71.45 and $R^2$ = 0.935]

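b- Estimate the number of goals to be made by an athlete who is 60 in tall. How much confidence can be assigned to that estimate?
Answer: If x = 60, then using the model below:

$$\hat{\mu}_{y|x} = -71.41 + 1.209x$$

$$\hat{\mu}_{y|x} = -71.41 + 1.209(60) = 1.13$$

Because 60 in lies well below the observed heights (69 to 79 in), this estimate is an extrapolation, so little confidence can be assigned to it.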
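One way to attach a number to the confidence question is a 95% prediction interval for a single new athlete, sketched below. This goes beyond the hand calculation above: the standard prediction-interval formula and the tabulated value t(0.025, df = 23) = 2.069 are assumptions introduced here, and the function name is illustrative.

```python
import math

heights = [71, 74, 70, 71, 69, 73, 72, 75, 72, 74, 71, 72, 73,
           72, 71, 75, 71, 75, 78, 79, 72, 75, 76, 74, 70]
goals = [15, 19, 11, 15, 12, 17, 15, 19, 16, 18, 13, 15, 17,
         16, 15, 20, 15, 19, 22, 23, 16, 20, 21, 19, 13]

# Assumption: t(0.025, df = 23) read from a t table.
T_CRIT_DF23 = 2.069


def predict_with_interval(xs, ys, x0, t_crit):
    """Return (prediction, lower, upper) for one new observation at x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    mse = (syy - slope * sxy) / (n - 2)
    prediction = intercept + slope * x0
    # The (x0 - x_bar)^2 term grows quickly when x0 is far from the data.
    se_pred = math.sqrt(mse * (1 + 1 / n + (x0 - x_bar) ** 2 / sxx))
    return prediction, prediction - t_crit * se_pred, prediction + t_crit * se_pred


pred, lower, upper = predict_with_interval(heights, goals, 60, T_CRIT_DF23)
print(f"predicted goals at 60 in: {pred:.2f} (95% PI {lower:.2f} to {upper:.2f})")
```

The interval is wide and includes negative goal counts, which are impossible, a sign of how unreliable the line is this far outside the observed heights.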

