
2DI36 - Statistics

Final Exam Solution


June 28th, 2013

Important: below are the abbreviated answers to the 2DI36 exam of 28 June 2013. They are for
your reference only; your write-up on the exam has to be complete and well justified. Keep that in mind
when preparing for the upcoming exam, and when you write your own answers.

P.I: (25 points)


(a) (5 points) Sample Mean = $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = 5.344$ ; Sample Variance = $s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right) = 0.02692$ ; Sample Standard Deviation = $s = 0.1641$ ; Maximum = 5.63 ; Range = 5.63 − 5.1 = 0.53.
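As a small illustration of the shortcut variance formula used in (a), here is a minimal Python sketch. The exam data are not reproduced in these solutions, so the list `ph_values` is purely hypothetical and will not reproduce the numbers above.

```python
import math
import statistics

# Hypothetical pH readings; NOT the exam data, which is not listed in this solution.
ph_values = [5.1, 5.2, 5.3, 5.4, 5.5]

def sample_variance(xs):
    """Shortcut formula: s^2 = (sum of x_i^2 - n * xbar^2) / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    return (sum(x * x for x in xs) - n * xbar ** 2) / (n - 1)

xbar = sum(ph_values) / len(ph_values)
s2 = sample_variance(ph_values)
s = math.sqrt(s2)
data_range = max(ph_values) - min(ph_values)

# The shortcut formula should agree with the library implementation.
library_s2 = statistics.variance(ph_values)
```

The shortcut form is convenient by hand, but it subtracts two nearly equal numbers, so for data with a large mean and a small spread the two-pass definition is numerically safer.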
(b) (5 points) It is reasonable to assume this data corresponds to the realization of a random sample
from a normal distribution because the points in the QQ plot lie approximately on a line, and
the p-value from the Shapiro-Wilk test is quite large (in particular much larger than 0.05, which
means the normality assumption cannot be confidently rejected).
(c) (5 points) Let $\mu$ denote the mean pH value. We want to test the null hypothesis $H_0: \mu = 5.4$ against $H_1: \mu \neq 5.4$. The test statistic to use is $t_0 = \frac{\bar{x} - 5.4}{s/\sqrt{n}} = -1.079$. We should reject the null hypothesis if $|t_0| \geq t_{0.05;9} = 1.833$, so in this case there is not enough evidence to reject $H_0$.
(d) (5 points) Since $0.883 = t_{0.2,9} < |t_0| < t_{0.15,9} = 1.100$ and we are performing a two-sided test, we conclude the p-value is between $2 \times 0.15$ and $2 \times 0.2$. Therefore, 0.3 < p-value < 0.4.
(e) (5 points) The CI is
$$\left[\bar{x} - t_{0.05;9}\frac{s}{\sqrt{n}},\ \bar{x} + t_{0.05;9}\frac{s}{\sqrt{n}}\right] = \left[5.344 - 1.833\frac{0.1641}{\sqrt{10}},\ 5.344 + 1.833\frac{0.1641}{\sqrt{10}}\right] = [5.249, 5.439] .$$

Since 5.4 is inside the two-sided CI we conclude that we cannot reject H0 at the significance level
α = 0.1.
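The arithmetic in parts (c) to (e) can be re-traced from the summary statistics alone. The sketch below uses only the Python standard library, so the critical values are entered by hand from the t table (9 degrees of freedom) rather than computed; the variable names are illustrative.

```python
import math

# Summary statistics from part (a) and the hypothesised mean.
n, xbar, s = 10, 5.344, 0.1641
mu0 = 5.4
t_crit = 1.833                   # t_{0.05;9}, read from the table

se = s / math.sqrt(n)            # estimated standard error of the mean
t0 = (xbar - mu0) / se           # test statistic of part (c)
reject_h0 = abs(t0) >= t_crit

# Part (d): bracket the two-sided p-value with tabulated upper-tail points.
t_020, t_015 = 0.883, 1.100      # t_{0.2,9} and t_{0.15,9}
p_bracket = (2 * 0.15, 2 * 0.20) if t_020 < abs(t0) < t_015 else None

# Part (e): 90% two-sided confidence interval.
ci = (xbar - t_crit * se, xbar + t_crit * se)
mu0_inside_ci = ci[0] <= mu0 <= ci[1]
```

The test in (c) and the interval in (e) use the same critical value, which is why "5.4 lies inside the CI" and "do not reject $H_0$" are the same statement.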
P.II: (15 points)
(a) (5 points) The log-likelihood is simply
$$\ell(\theta) = \sum_{i=1}^{n} \log\left(\frac{X_i}{\theta}\exp\left(-\frac{X_i^2}{2\theta}\right)\right) .$$

The score (partial derivative of $\ell(\theta)$ with respect to $\theta$) is
$$\frac{\partial}{\partial\theta}\ell(\theta) = -\frac{n}{\theta} + \frac{\sum_{i=1}^{n} X_i^2}{2\theta^2} .$$

Finally, the MLE is found by solving $\frac{\partial}{\partial\theta}\ell(\theta) = 0$, resulting in
$$\hat{\theta}_{\mathrm{MLE}} = \frac{1}{2n}\sum_{i=1}^{n} X_i^2 .$$
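A quick numerical sanity check of this derivation: the sketch below evaluates the log-likelihood, the score, and the closed-form MLE on a tiny made-up sample. The observations in `xs` are hypothetical and serve only to confirm that the score vanishes at the estimate.

```python
import math

def log_likelihood(theta, xs):
    """l(theta) = sum_i log( (x_i / theta) * exp(-x_i^2 / (2 theta)) )."""
    return sum(math.log(x / theta) - x * x / (2 * theta) for x in xs)

def score(theta, xs):
    """d/dtheta l(theta) = -n / theta + sum(x_i^2) / (2 theta^2)."""
    return -len(xs) / theta + sum(x * x for x in xs) / (2 * theta ** 2)

def theta_mle(xs):
    """Closed-form solution of score = 0: sum(x_i^2) / (2n)."""
    return sum(x * x for x in xs) / (2 * len(xs))

xs = [1.0, 2.0, 3.0]           # hypothetical observations, for illustration only
theta_hat = theta_mle(xs)      # equals 14 / 6 for this sample
score_at_mle = score(theta_hat, xs)
```

For this sample the score is positive below `theta_hat` and negative above it, which is consistent with the stationary point being a maximum.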

(b) (5 points) The bias of the estimator is given by
$$E[\hat{\theta}_{\mathrm{ME}}] - \theta = \frac{2}{\pi}E[\bar{X}^2] - \theta .$$
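We have
$$\begin{aligned}
E[\bar{X}^2] &= V[\bar{X}] + (E[\bar{X}])^2 \\
&= \frac{V[X_1]}{n} + (E[X_1])^2 \\
&= \frac{E[X_1^2] - (E[X_1])^2}{n} + (E[X_1])^2 \\
&= \frac{2\theta - (\pi/2)\theta}{n} + (\pi/2)\theta \\
&= \left(\frac{2 - \pi/2}{n} + \pi/2\right)\theta .
\end{aligned}$$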
Putting everything together
$$E[\hat{\theta}_{\mathrm{ME}}] - \theta = \frac{2}{\pi}\left(\frac{2 - \pi/2}{n} + \pi/2\right)\theta - \theta = \frac{4 - \pi}{\pi n}\theta .$$
The estimator is therefore biased.
(c) (5 points) The estimator is biased, but the bias decreases as n grows, so for large n the bias will
be negligible (as an aside, this is what is known as an asymptotically unbiased estimator).
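The algebra in (b) and the shrinking bias in (c) can be checked numerically. This minimal sketch computes the bias once from the moments $E[X_1^2] = 2\theta$ and $(E[X_1])^2 = (\pi/2)\theta$ used above, and once from the closed form; the value of `theta` is arbitrary and chosen only for illustration.

```python
import math

def bias_from_moments(theta, n):
    """(2/pi) * E[Xbar^2] - theta, with E[X1^2] = 2*theta and (E[X1])^2 = (pi/2)*theta."""
    second_moment = 2 * theta
    mean_squared = (math.pi / 2) * theta
    e_xbar_sq = (second_moment - mean_squared) / n + mean_squared
    return (2 / math.pi) * e_xbar_sq - theta

def bias_closed_form(theta, n):
    """Closed form from part (b): (4 - pi) / (pi * n) * theta."""
    return (4 - math.pi) / (math.pi * n) * theta

theta = 2.0                                   # arbitrary illustrative value
sample_sizes = (10, 100, 1000)
biases = {n: bias_closed_form(theta, n) for n in sample_sizes}
```

The bias is proportional to $1/n$, so multiplying the sample size by ten divides the bias by ten.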
P.III: (15 points)
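(a) (5 points) The type I error probability is given by
$$\begin{aligned}
P_{\sigma=75}(S \geq 83.7) &= P_{\sigma=75}(S^2 \geq 83.7^2) \\
&= P_{\sigma=75}\left(\frac{(n-1)S^2}{\sigma^2} \geq \frac{(n-1)83.7^2}{\sigma^2}\right) \\
&= P(X \geq 13.70) = 0.25 ,
\end{aligned}$$
where $X$ follows a $\chi^2_{11}$ distribution. The last equality is obtained by consulting the $\chi^2$ table in
the statistical compendium.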
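(b) (5 points) The power of this test is given by
$$\begin{aligned}
P_{\sigma=110}(S \geq 83.7) &= P_{\sigma=110}(S^2 \geq 83.7^2) \\
&= P_{\sigma=110}\left(\frac{(n-1)S^2}{\sigma^2} \geq \frac{(n-1)83.7^2}{110^2}\right) \\
&= P(X \geq 6.37) > P(X \geq 7.58) = 0.75 ,
\end{aligned}$$
where $X$ follows a $\chi^2_{11}$ distribution.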
(c) (5 points) All that changes from the previous question is the number of samples n. The power is
now
$$\begin{aligned}
P_{\sigma=110}(S \geq 83.7) &= P_{\sigma=110}(S^2 \geq 83.7^2) \\
&= P_{\sigma=110}\left(\frac{(n-1)S^2}{\sigma^2} \geq \frac{(23)83.7^2}{110^2}\right) \\
&= P(X \geq 13.32) < P(X \geq 13.1) = 0.95 ,
\end{aligned}$$
where $X$ follows a $\chi^2_{23}$ distribution (there are now $n - 1 = 23$ degrees of freedom). Therefore, this number of samples is still not adequate to
ensure the power is larger than 0.95.
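The table look-ups in this problem can be cross-checked without a statistics library. The sketch below is one possible approach, not the method expected on the exam: it evaluates the $\chi^2$ upper-tail probability with a standard finite-series identity that holds for odd degrees of freedom. The sample sizes 12 and 24 are inferred from the 11 and 23 degrees of freedom, and the function names are illustrative.

```python
import math

def chi2_sf_odd(x, df):
    """P(X >= x) for X ~ chi-square with an odd number of degrees of freedom.

    Uses P = 2*Q(chi) + 2*phi(chi) * sum_{r=1}^{(df-1)/2} chi^(2r-1) / (1*3*...*(2r-1)),
    where chi = sqrt(x), Q is the standard normal upper tail and phi its density.
    """
    if df < 1 or df % 2 == 0:
        raise ValueError("this helper only handles odd degrees of freedom")
    chi = math.sqrt(x)
    normal_tails = math.erfc(chi / math.sqrt(2))        # equals 2 * Q(chi)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    total, term = 0.0, chi                              # term for r = 1
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)                         # move from r to r + 1
    return normal_tails + 2 * phi * total

def scaled_threshold(n, cutoff, sigma):
    """(n - 1) * cutoff^2 / sigma^2, the value compared with the chi-square table."""
    return (n - 1) * cutoff ** 2 / sigma ** 2

type_one_error = chi2_sf_odd(scaled_threshold(12, 83.7, 75), 11)   # part (a)
power_n12 = chi2_sf_odd(scaled_threshold(12, 83.7, 110), 11)       # part (b)
power_n24 = chi2_sf_odd(scaled_threshold(24, 83.7, 110), 23)       # part (c)
```

Under these assumptions the computed values agree with the table-based bounds: a type I error probability of about 0.25, a power above 0.75 in (b), and a power below 0.95 in (c).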

P.IV: (20 points)
(a) (5 points) Let Y denote the salary, and x denote the height. We assume that
$$Y = \beta_0 + \beta_1 x + \epsilon ,$$
where $\epsilon$ is a zero mean random error. From Figure 2 we see that most points lie close to a straight
line in the QQ plot, and that the Shapiro-Wilk p-value is still larger than 0.05. Therefore the
normality assumption is somewhat reasonable. As an aside, it seems there is only one point
that “challenges” the normality assumption (an outlier), and as a practitioner one should further
inspect that data point to see if there is any reason for the different behavior (e.g., that lawyer
was receiving unaccounted income from consultancy).
(b) (5 points) This can be cast as a t test. The test statistic value is given by $t_0 = 2.2929/0.6958 = 3.30$. Since $3.169 = t_{0.005,12-2} < |t_0| < t_{0.0025,12-2} = 3.581$ we have $2 \times 0.0025 <$ p-value $< 2 \times 0.005$,
so the p-value is between 0.005 and 0.01.
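The bracketing argument in (b) is short enough to spell out in a few lines. In this sketch the names `beta1_hat` and `se_beta1` are assumptions: the solution only gives the ratio 2.2929/0.6958, which is read here as slope estimate over its standard error. The critical values are copied from the t table with 10 degrees of freedom.

```python
# Assumed reading of the ratio in part (b): slope estimate / standard error.
beta1_hat, se_beta1 = 2.2929, 0.6958
t0 = beta1_hat / se_beta1

# Upper-tail points of the t distribution with 12 - 2 = 10 d.o.f. (from the table).
t_0005, t_00025 = 3.169, 3.581          # t_{0.005,10} and t_{0.0025,10}
bracket_holds = t_0005 < abs(t0) < t_00025

# Two-sided test: double the one-sided tail areas that bracket |t0|.
p_lower, p_upper = 2 * 0.0025, 2 * 0.005
```

Because $|t_0|$ falls between the two tabulated points, the two-sided p-value falls between `p_lower` and `p_upper`.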
(c) (5 points) A point estimate for Bob’s salary is simply given by
$$\hat{\mu}_{Y|x_0} = \hat{\beta}_0 + 73\hat{\beta}_1 = 108.617$$
thousand dollars.
(d) (5 points) The prediction interval has endpoints given by ($x_0 = 73$)
$$\hat{\mu}_{Y|x_0} \pm t_{0.025,12-2}\sqrt{\hat{\sigma}^2\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)} .$$
Note that $\hat{\sigma} = 9.506$, $\sqrt{\hat{\sigma}^2/S_{xx}} = 0.6958$, $t_{0.025,12-2} = 2.228$, and $\bar{x} = 70.333$ (direct computation). Therefore $S_{xx} = 186.64$. Plugging in everything, we conclude the endpoints of the interval
are
$$108.617 \pm 2.228\sqrt{9.506^2\left(1 + \frac{1}{12} + \frac{(73 - 70.333)^2}{186.64}\right)} = 108.617 \pm 22.428 .$$
The prediction interval is therefore [86.19, 131.05]. Note that there are different, but equally valid,
ways to reach this result.
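One way to reproduce the prediction interval from the quantities quoted in (d) is sketched below. It recovers $S_{xx}$ from $\hat{\sigma}$ and the slope standard error and then evaluates the half-width; the variable names are illustrative, and the critical value is taken from the t table rather than computed.

```python
import math

n, x0, xbar = 12, 73, 70.333
sigma_hat = 9.506          # residual standard error
se_slope = 0.6958          # sqrt(sigma_hat^2 / Sxx)
t_crit = 2.228             # t_{0.025,10}, read from the table
mu_hat = 108.617           # point estimate from part (c)

# Recover Sxx from the slope standard error.
sxx = (sigma_hat / se_slope) ** 2

# Half-width of the 95% prediction interval at x0.
half_width = t_crit * math.sqrt(
    sigma_hat ** 2 * (1 + 1 / n + (x0 - xbar) ** 2 / sxx)
)
interval = (mu_hat - half_width, mu_hat + half_width)
```

The leading 1 inside the square root is what separates a prediction interval for a single new salary from a confidence interval for the mean response; dropping it would give a much narrower interval.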
P.V: (25 points)
(a) (5 points) Let p be the true proportion of improperly capped bottles. We want to test $H_0: p = 0.03 = p_0$ against $H_1: p > 0.03$. From the data we conclude that $\hat{p} = 8/200 = 0.04$. The test
statistic to use in this case is
$$z_0 = \frac{n\hat{p} - np_0}{\sqrt{np_0(1 - p_0)}} = 0.8291 .$$
We should reject the null hypothesis if $z_0$ is larger than $z_\alpha = 1.645$. Therefore we cannot reject
the manufacturer's claim.
(b) (5 points) From the compendium we conclude that the p-value is 0.2033, which is larger than 0.05.
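Parts (a) and (b) can be checked with the standard library's `statistics.NormalDist`. This is a sketch with illustrative variable names. It evaluates the normal CDF at the unrounded $z_0$, so the p-value differs from the table value 0.2033 in the fourth decimal (the table look-up rounds $z_0$ to 0.83).

```python
import math
from statistics import NormalDist

n, defects, p0 = 200, 8, 0.03
p_hat = defects / n

# One-sided z test for a proportion (normal approximation).
z0 = (n * p_hat - n * p0) / math.sqrt(n * p0 * (1 - p0))
z_alpha = NormalDist().inv_cdf(0.95)          # about 1.645 for alpha = 0.05
reject_h0 = z0 > z_alpha

# Upper-tail p-value for H1: p > p0.
p_value = 1 - NormalDist().cdf(z0)
```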
(c) (5 points) The desired confidence interval is given by
$$\left[\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right] .$$
Plugging in the numbers we get
$$[0.017, 0.063] .$$
This is a 90% two-sided CI, which means that the 95% one-sided lower CI for p is given
by [0.017, 1]. Since $p_0$ is contained in this interval we conclude that we cannot reject $H_0$ at the
α = 0.05 significance level.
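The interval in (c) follows directly from $\hat{p} = 0.04$ and $n = 200$. The sketch below, with illustrative names, computes the 90% two-sided interval and reuses its lower endpoint for the 95% one-sided interval, as described above.

```python
import math
from statistics import NormalDist

n, p_hat, p0 = 200, 0.04, 0.03
alpha_two_sided = 0.10

z = NormalDist().inv_cdf(1 - alpha_two_sided / 2)     # z_{0.05}, about 1.645
half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)

two_sided_ci = (p_hat - half_width, p_hat + half_width)
one_sided_lower_ci = (two_sided_ci[0], 1.0)           # 95% one-sided interval

# H0 is not rejected when p0 lies inside the one-sided interval.
p0_inside = one_sided_lower_ci[0] <= p0 <= one_sided_lower_ci[1]
```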

(d) (5 points) The length of the CI is simply
$$2z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} .$$
Therefore we need that
$$n > \hat{p}(1 - \hat{p})\left(\frac{2z_{\alpha/2}}{0.01}\right)^2 .$$
Since we don’t know $\hat{p}$ we can be conservative, and require
$$n > 0.5(1 - 0.5)\left(\frac{2 \times 1.645}{0.01}\right)^2 = 27060.25 ,$$
therefore we need to inspect at least 27061 bottles. This large number is the price we pay for not
taking into consideration that p is small.
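The sample-size bound can be wrapped in a small helper to see how much the conservative choice $\hat{p} = 0.5$ costs. In this sketch the function name is illustrative, $z = 1.645$ is the rounded table value used above, and the planning value $p = 0.03$ in the second call is an assumption made only for comparison; it is not part of the exam answer.

```python
import math

def min_sample_size(p, max_length, z):
    """Smallest integer n with n > p * (1 - p) * (2 * z / max_length)^2."""
    bound = p * (1 - p) * (2 * z / max_length) ** 2
    return math.floor(bound) + 1

z = 1.645                                              # z_{0.05} from the table
n_conservative = min_sample_size(0.5, 0.01, z)         # worst case p = 0.5
n_small_p = min_sample_size(0.03, 0.01, z)             # assumed planning value p = 0.03
```

With the assumed planning value the requirement drops from 27061 to 3150 bottles, which illustrates the remark that the large number comes from ignoring that p is small.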
(e) (5 points) Let p be the true proportion of improperly capped bottles. We want to test $H_0: p = 0.005 = p_0$ against $H_1: p > 0.005$. The normal approximation we use in the tests above is only
reasonable if both $np_0$ and $n(1 - p_0)$ are larger than 5. However, if $p_0 = 0.005$ then, with $n = 200$, $np_0 = 1 < 5$,
and so the normal approximation is not valid.
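The rule of thumb in (e) is easy to encode as a check. The helper below is a minimal sketch with an illustrative name; the threshold of 5 is the one stated above.

```python
def normal_approx_ok(n, p0, threshold=5):
    """Rule of thumb: both n*p0 and n*(1 - p0) must be larger than the threshold."""
    return n * p0 > threshold and n * (1 - p0) > threshold

ok_part_a = normal_approx_ok(200, 0.03)     # n*p0 = 6, n*(1 - p0) = 194
ok_part_e = normal_approx_ok(200, 0.005)    # n*p0 = 1, which fails the rule
```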
