
THE AUSTRALIAN NATIONAL UNIVERSITY

SCHOOL OF FINANCE AND APPLIED STATISTICS


First Semester Examination Final, June 2006
QUANTITATIVE RESEARCH METHODS (STAT1008)
Study Period: 15 minutes
Time Allowed: 3 hours
Permitted Material: Calculator, dictionary and 1 A4 page with notes
on both sides

Instructions to Candidates:
Attempt ALL questions.
Each question is of equal mark value.
Start your solution to each question on a new page.
To ensure full marks show all the steps in working out your
solution. Marks may be deducted for failure to show appropriate
calculations or formulae.
Unless otherwise stated, use a significance level of 5%.
Selected statistical tables are attached to the back of the
examination paper.


STAT1008 Quantitative Research Methods Final Examination Semester 1, 2006


Question 1 (20 marks)
For each question below, choose the best answer from the options given.
Write your answer in your answer booklet clearly indicating the question (i to xx), and
your answer as the letter appropriate (A, B, C, or D).
You will gain 1 mark for each correct answer. Marks will not be deducted for
incorrect answers.
(i) The Central Limit Theorem states that as a sample size becomes larger, the
distribution of
a. The sample mean becomes more normal
b. The sample standard deviation becomes more normal
c. The population mean becomes more normal
d. None of the above.
(ii) In a simple regression, the R-squared value is computed to be 87.2%. From this
alone we can say
a. The regression is an appropriate model for the data
b. The regression is not an appropriate model for the data
c. The residuals will be normally distributed.
d. None of the above.
(iii) The same two variables discussed in part (ii) (above) are used to compute a
correlation coefficient. The absolute value of the correlation coefficient (to 3
decimal places) will be
a. 0.872
b. 0.934
c. 0.966
d. 0.760
(iv) A scale of very poor, poor, average, good, very good has been used to assess the
teaching methods of a lecturer. A regression is to be run using final exam result
as the response, and teaching rating as a dummy variable. How many columns
will be needed for the dummy variable coding?
a. 1
b. 2
c. 3
d. 4
(v) Suppose Minitab gives a sample correlation between two variables as -0.955,
with a p-value of 0.000. This means that
a. The null hypothesis, ρ = 0, would be rejected.
b. The null hypothesis, ρ = 0, would not be rejected.
c. The response is dependent on the explanatory variable.
d. 95.5% of the variation in the response is explained by the explanatory
variable.


(vi) A researcher asks a group of university students to select their favourite method
of assessment: open book examination, closed book examination, assignment,
group work or other. In summarising this data, which would be the most
appropriate graph to use?
a. Bar chart
b. Boxplot
c. Histogram
d. Scatterplot
(vii) In investigating the relationship between years of formal education and income, a
researcher finds that the covariance is +101.34. Which of the following
statements is true, based only on this figure?
a. There is a very strong positive relationship between the two variables
b. The correlation will be very close to 1.
c. Both years of formal education and income have very large variances.
d. A line of best fit for the data would have a positive slope.
(viii) A study is being performed based on in-home access to broadband internet
within the ACT. The ACT has been divided into 32 regions, and a random
selection of homes in each region is chosen for participation in the study. This
sampling plan is best described as
a. Simple random sampling
b. Systematic sampling
c. Stratified random sampling
d. Cluster sampling

NOTE: This question continues on the next page.



Use the information given below to answer questions (ix) to (xx).
It has long been claimed that if a system of flexible work hours is offered to staff, they
will have reduced demand for resources. A local health department decides to test this
theory among their 11 field workers. For 12 months, they record the distance driven
by each field worker in the course of carrying out his or her duties. Then, they switch
from a standard 5 day week to a flexible 4 day week system, and record the new
mileage driven over the 12 months immediately following. The data collected (ID
number of staff member, number of miles driven when working a 5 day week, number
of miles driven when working a 4 day week) are entered into Minitab. The data are
presented in a boxplot below.
[Figure: Boxplot of 5 Day, 4 Day. Vertical axis: Data (0 to 9000); groups: 5 Day, 4 Day.]

(ix) When the data are entered into Minitab, how many rows will be required?
a. 12
b. 3
c. 11
d. 10
(x) Based only on the boxplot, we can say that
a. The IQR is larger for the 5 day week than for the 4 day week
b. The Range is larger for the 4 day week than for the 5 day week
c. The mean of the 5 day week is higher.
d. There are more observations for the 5 day week than the 4 day week.
(xi) How many variables are present in the data set?
a. 2
b. 3
c. 4
d. 12



(xii) How many continuous variables are present in the data set?
a. 1
b. 2
c. 3
d. 11
A further variable is calculated: the difference between the mileages under the two
schemes, that is (5 Day mileage) - (4 Day mileage). Basic Descriptive Statistics for
all three variables are calculated and presented below, however some values have
been obscured by oil blots.
Descriptive Statistics: 5 Day, 4 Day, 5 Day - 4 Day

Variable         N    Mean   SE Mean   StDev   Sum      Sum of Squares
5 Day            11   4955   *oil1*    3161    *oil2*   369966249
4 Day            11   3973   *oil3*    2171    *oil4*   220755581
5 Day - 4 Day    11   982    *oil5*    1140    *oil6*   23593724


(xiii) The value which should be present at *oil1* is given by (to the nearest whole
number)
a. 287
b. 953
c. 908356
d. 3161
(xiv) The value which should be present at *oil4* is given by (to the nearest whole
number)
a. 43703
b. 23881
c. 14858
d. 20068689
Based on the above descriptive statistics, a 90% confidence interval for the average
mileage driven using a 5 day week is calculated to be
$\left(4955 - c\,\frac{3161}{\sqrt{n}},\ 4955 + c\,\frac{3161}{\sqrt{n}}\right)$.

(xv) In the above confidence interval, c=


a. 1.645
b. 1.96
c. 1.812
d. 2.228
(xvi) In the above confidence interval, n=
a. 10
b. 11
c. 12
d. 5



The local health department wishes to test if there is a significant difference between
the mileage under the two work day schemes.
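(xvii) The appropriate test statistic is given by
a. $\dfrac{4955 - 3973}{2711.56\sqrt{\frac{1}{11} + \frac{1}{11}}}$
b. $\dfrac{4955 - 3973}{\sqrt{\frac{3161}{11} + \frac{2171}{11}}}$
c. $\dfrac{982}{1140 / \sqrt{11}}$
d. $\dfrac{999921}{4713241}$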

(xviii) In testing if there is a significant difference between mileage under the two work
day schemes, the test statistic should be compared to which tables?
a. T tables with 10 degrees of freedom
b. Standard normal tables
c. T tables with 20 degrees of freedom
d. F tables with 11, 11 degrees of freedom.
(xix) The p-value for the test discussed in parts (xvii) and (xviii) (using a two-sided
alternative hypothesis) is 0.017. This means that
a. There is a 1.7% chance that the null hypothesis is true.
b. 1.7% of the time, the null hypothesis is true.
c. We would reject the null hypothesis.
d. We would reject the alternative hypothesis.
(xx) If the test were carried out against a one-sided alternative, which of the following
statements about the new p-value would be true?
a. The p-value will be equal to 0.017.
b. The p-value will be equal to 0.0085.
c. The p-value will be equal to 0.034.
d. None of the above.
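
The oil-blotted entries and the candidate test statistics in parts (xiii) to (xviii) all follow from the N, Mean and StDev values in the descriptive statistics table. The Python sketch below is illustrative only and is not part of the examination paper. The variable names are hypothetical, and it assumes the usual textbook definitions: SE Mean = StDev / sqrt(N), Sum = N x Mean, the equal-variance pooled standard deviation, and a paired t statistic computed from the differences.

```python
import math

# Summary values copied from the descriptive statistics table.
n = 11
mean_5day, sd_5day = 4955, 3161
mean_4day, sd_4day = 3973, 2171
mean_diff, sd_diff = 982, 1140

# Standard error of a sample mean: StDev / sqrt(N).
se_5day = sd_5day / math.sqrt(n)

# A column total can be recovered from its mean: Sum = N * Mean.
sum_4day = n * mean_4day

# Pooled standard deviation for two samples of equal size
# (this is where a value such as 2711.56 would come from).
pooled_sd = math.sqrt((sd_5day**2 + sd_4day**2) / 2)

# Paired t statistic: the same 11 workers are measured under both
# schemes, so the differences form a single sample with n - 1 df.
paired_t = mean_diff / (sd_diff / math.sqrt(n))
paired_df = n - 1

print(f"SE Mean (5 Day)  = {se_5day:.0f}")
print(f"Sum (4 Day)      = {sum_4day}")
print(f"Pooled StDev     = {pooled_sd:.2f}")
print(f"Paired t, df     = {paired_t:.3f}, {paired_df}")
```

The paired form is shown because each worker contributes one mileage under each scheme; comparing it with the pooled value illustrates why the two designs give different denominators.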



Question 2 (20 marks)

(a) The table below shows the results of a survey of voters including who they
voted for in the most recent federal election (in the House of Representatives)
and their positions on the death penalty for convicted murderers.
                    For     Against
Liberal/National    0.26    0.04
ALP                 0.12    0.24
Other               0.24    0.10
i. Find the marginal probability distribution of voting in the most recent
federal election.
ii. What is the probability that a randomly chosen Australian voter supports
the death penalty for convicted murderers?
iii. What is the probability that an Australian voted for the Liberal/National
candidate in the House of Representatives at the last election if it is
known that they are against the death penalty for convicted murderers?
iv. Are voting choice and position on the death penalty independent events?
Explain your answer.
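
Marginal and conditional probabilities of the kind asked for in part (a) are row and column sums of the joint table, and ratios of a cell to a marginal total. The Python sketch below is illustrative only; the dictionary layout and the helper name `marginal` are hypothetical choices, not something prescribed by the paper.

```python
import math

# Joint probabilities copied from the survey table:
# (party voted for, position on the death penalty) -> probability.
joint = {
    ("Liberal/National", "For"): 0.26,
    ("Liberal/National", "Against"): 0.04,
    ("ALP", "For"): 0.12,
    ("ALP", "Against"): 0.24,
    ("Other", "For"): 0.24,
    ("Other", "Against"): 0.10,
}

def marginal(index):
    """Sum the joint table over the other variable (0 = party, 1 = position)."""
    totals = {}
    for key, p in joint.items():
        totals[key[index]] = totals.get(key[index], 0.0) + p
    return totals

party = marginal(0)
position = marginal(1)

# Conditional probability: P(A | B) = P(A and B) / P(B).
p_ln_given_against = joint[("Liberal/National", "Against")] / position["Against"]

# Independence requires every cell to equal the product of its marginals.
independent = all(
    math.isclose(p, party[a] * position[b], abs_tol=1e-9)
    for (a, b), p in joint.items()
)

print({k: round(v, 2) for k, v in party.items()})
print({k: round(v, 2) for k, v in position.items()})
print(round(p_ln_given_against, 4), independent)
```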
(b) A commuter must pass through five traffic lights on her way to work, and will
have to stop at each one that is red. She estimates the probability model for the
number of red lights she hits as shown below.
# red lights    0      1      2      3      4      5
Probability     0.06   0.25   0.34   0.15   0.16   0.04
i. Find the expected number of red lights at which the commuter will
have to stop on her way to work.
ii. Find the standard deviation of the number of red lights.
iii. Find the expected number of red lights the commuter will face on
her way to work over a 5 day working week. What is the standard
deviation of the number of red lights faced over a 5 day working
week?
iv. The local council installs a new set of lights on the commuter's
route. The commuter wants to take a sample in order to estimate
the new mean number of red lights she can expect to be stopped at.
To estimate this mean to within half a red light (0.5), how many
journeys should she sample, assuming the new standard deviation
is equal to 2.5 red lights? That is, calculate the number of
observations she should make, clearly stating any assumptions you
make.
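
The calculations in part (b) use the definitions of expectation and variance for a discrete random variable, the rules for sums of independent variables, and the sample size formula n = (z σ / B)². The Python sketch below is illustrative only. It rests on two assumptions that the paper leaves to the candidate: the five working days are treated as independent, and a 95% confidence level is used for the sample size. All variable names are hypothetical.

```python
import math
from statistics import NormalDist

# Probability model copied from the table: number of red lights -> probability.
pmf = {0: 0.06, 1: 0.25, 2: 0.34, 3: 0.15, 4: 0.16, 5: 0.04}

# E(X) = sum of x * p(x); V(X) = sum of (x - mean)^2 * p(x).
mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
sd = math.sqrt(variance)

# Total over a 5 day week. ASSUMPTION: days are independent, so the
# means add and the variances add.
days = 5
week_mean = days * mean
week_sd = math.sqrt(days * variance)

# Sample size to estimate a mean to within `bound`.
# ASSUMPTION: 95% confidence level; sigma = 2.5 is given in the question.
confidence = 0.95
sigma, bound = 2.5, 0.5
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
n_required = math.ceil((z * sigma / bound) ** 2)

print(f"E(X) = {mean:.2f}, SD(X) = {sd:.4f}")
print(f"Week: mean = {week_mean:.1f}, SD = {week_sd:.3f}")
print(f"Journeys to sample: {n_required}")
```

The sample size is rounded up rather than to the nearest integer, since rounding down would give a bound slightly wider than the one requested.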



Question 3 (20 marks)

(a) Your pocket copy of Kyrgystan on a Budget claims that you can expect to spend
about 4237 soms (the local unit of currency) each day you spend in this country,
with a standard deviation of 360 soms. Assume that expenditure follows a normal
distribution.
i. Your budget allows you to spend 90,000 soms during your stay (not
including transport into and out of the country). What is the maximum
number of whole days you can spend in Kyrgystan on average, without
breaking your budget?
ii. What is the standard deviation of your total expenses for a stay of that
duration?
iii. How much money should you budget for each day in order to cover all
but the most expensive 10% of days?
iv. After a stay of 10 days, you find that you have spent 41,414 soms.
What percentage of travellers with the same length of stay will have
spent less than you (assuming the figures in Kyrgystan on a Budget
are accurate).
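
Part (a) combines the normal model for daily expenditure with the rules for sums of independent normal variables. The Python sketch below is illustrative only; it assumes that daily expenditures are independent of one another (the paper states normality but not independence), and the variable names are hypothetical. It uses `statistics.NormalDist` from the Python standard library in place of the printed normal tables.

```python
import math
from statistics import NormalDist

daily_mean, daily_sd = 4237, 360   # soms per day, from the guidebook
budget = 90_000                    # soms available for the stay

# Largest whole number of days whose expected total stays within budget.
max_days = budget // daily_mean

# SD of the total over that stay. ASSUMPTION: days are independent,
# so variances add and the SD grows with sqrt(number of days).
total_sd = daily_sd * math.sqrt(max_days)

# Daily amount that covers all but the most expensive 10% of days
# (the 90th percentile of the daily distribution).
daily_budget_90 = NormalDist(daily_mean, daily_sd).inv_cdf(0.90)

# Distribution of total spending over a 10 day stay, and the share of
# travellers expected to spend less than 41,414 soms.
stay = 10
total_10 = NormalDist(stay * daily_mean, daily_sd * math.sqrt(stay))
share_below = total_10.cdf(41_414)

print(max_days, round(total_sd, 1), round(daily_budget_90, 1))
print(f"{share_below:.1%} of travellers spend less")
```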
(b) Having completed your stay in Kyrgystan, you return home to Canberra, and
decide to put your budgeting skills to the test. Your part-time job pays well, at $24
an hour, but the number of hours per week is a random variable best represented by
a uniform distribution with possible values from 4 to 18. Assume each week is
independent.
i. Draw a graph representing the distribution of the number of hours of
work per week. Clearly label all axes and points of interest.
ii. Find the expected value of the hours of work per week.
iii. Find the variance of the hours of work per week.
iv. What is the probability you get no more than 6 hours work next week?
v. Your budget requires your job to bring in a minimum of $112 per week
to meet minimum expenses, or you will be forced to ask your parents
for money. What is the probability that you ask your parents for money
next week?
vi. What is the probability that you get more than 600 hours work over the
coming year (52 weeks)?
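
The quantities in part (b) follow from the mean (a + b)/2 and variance (b - a)²/12 of a uniform distribution, with the Central Limit Theorem used for the 52 week total. The Python sketch below is illustrative only. It assumes the hours are a continuous uniform variable on [4, 18] (the paper does not say whether the distribution is continuous or discrete), and the helper `uniform_cdf` and other names are hypothetical.

```python
import math
from statistics import NormalDist

low, high = 4, 18      # hours per week, ASSUMED continuous uniform
rate = 24              # dollars per hour

mean_hours = (low + high) / 2
var_hours = (high - low) ** 2 / 12

def uniform_cdf(x):
    """P(X <= x) for X uniform on [low, high]."""
    return min(max((x - low) / (high - low), 0.0), 1.0)

# Probability of no more than 6 hours of work in a week.
p_at_most_6 = uniform_cdf(6)

# Income below $112 means fewer than 112 / 24 hours of work.
p_ask_parents = uniform_cdf(112 / rate)

# Total hours over 52 independent weeks, approximated as normal (CLT).
weeks = 52
yearly = NormalDist(weeks * mean_hours, math.sqrt(weeks * var_hours))
p_over_600 = 1 - yearly.cdf(600)

print(mean_hours, round(var_hours, 3))
print(round(p_at_most_6, 4), round(p_ask_parents, 4), round(p_over_600, 4))
```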



Question 4 (20 marks)

(a) The Yummy biscuit company claims that every 500g package of their
chocolate chip cookies contains an average of at least 1000 chocolate chips.
Being a dedicated student of statistics, you determine to test their claim, and
taking a random sample of 16 packages, you find an average of 1238.2
chocolate chips with a standard deviation of 94.3.
i. Perform a hypothesis test at the 5% level to determine if the
company's claim is supported by your data.
ii. Comment on any assumptions you have made in performing
the inference in part (i).
(b) The Scrumptious biscuit company claim that their 500g packages of
chocolate chip cookies are tastier, and contain a different number of chocolate
chips on average than those produced by the Yummy company. You take a
random sample of 16 Scrumptious packages and find a sample average of
1382.2 with a standard deviation of 123.1.
i. Test at the 10% level whether the data support the assumption
of equal population variances.
ii. Perform an appropriate test of equality of means, using a
significance level of 10% and clearly stating any assumptions
you make.
iii. Given the results of your test in (ii), answer the following
question without performing any further calculations: Would
the value 0 be found within a 90% confidence interval for the
true mean difference in number of chocolate chips between
Yummy and Scrumptious packages? Explain your answer.
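
The tests in this question reduce to three statistics computed from the sample sizes, means and standard deviations. The Python sketch below is illustrative only and computes the statistics without drawing conclusions; critical values would still be read from the attached tables. It assumes normal populations, places the larger sample variance in the numerator of the F ratio, and shows the equal-variance (pooled) two-sample t statistic as one possible choice for part (b)(ii), which is appropriate only if equal variances are not rejected. All variable names are hypothetical.

```python
import math

# Sample summaries copied from the question.
n_y, mean_y, sd_y = 16, 1238.2, 94.3     # Yummy
n_s, mean_s, sd_s = 16, 1382.2, 123.1    # Scrumptious
claimed_mean = 1000

# (a) One-sample t statistic against the claimed mean, df = n - 1.
t_one = (mean_y - claimed_mean) / (sd_y / math.sqrt(n_y))
df_one = n_y - 1

# (b)(i) F ratio of sample variances, larger variance on top.
f_ratio = sd_s**2 / sd_y**2
f_df = (n_s - 1, n_y - 1)

# (b)(ii) Pooled two-sample t statistic. ASSUMPTION: equal population
# variances; otherwise an unequal-variance statistic would be used.
pooled_var = ((n_y - 1) * sd_y**2 + (n_s - 1) * sd_s**2) / (n_y + n_s - 2)
t_two = (mean_y - mean_s) / math.sqrt(pooled_var * (1 / n_y + 1 / n_s))
df_two = n_y + n_s - 2

print(f"one-sample t = {t_one:.3f} on {df_one} df")
print(f"F = {f_ratio:.3f} on {f_df} df")
print(f"pooled t = {t_two:.3f} on {df_two} df")
```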



Question 5 (20 marks)

Losing weight is often a difficult venture. In order to maximise their chances of
success, five friends embark on a weight-loss journey together, using a combination
of healthy diet and exercise in order to reach their goals. They record their weight loss
every week over 20 weeks. Some basic descriptive statistics of both variables appear
below.
Descriptive Statistics: Weeks, Weight lost

Variable      N     Mean     StDev   Minimum   Maximum
Weeks         100   10.500   5.795   1.000     20.000
Weight lost   100   7.290    4.242   -1.136    15.414


The data are presented below in a scatterplot.


[Figure: Scatterplot of Weight lost vs Weeks. Vertical axis: Weight lost (0.0 to 17.5); horizontal axis: Weeks (0 to 20).]


A regression is performed in Minitab on the data, and an excerpt of the output is given
below. However, some of the output has been obscured by sweat stains from the
exercise program.
Regression Analysis: Weight lost versus Weeks

The regression equation is
Weight lost = SWEAT STAIN 1

Predictor   Coef      SE Coef   T       P
Constant    -0.0742   0.2539    -0.29   0.771
Weeks       0.70133   0.02119   33.09   0.000

S = SWEAT STAIN 2    R-Sq = 91.8%    R-Sq(adj) = 91.7%

Analysis of Variance
Source           DF   SS       MS       F         P
Regression       1    1635.5   1635.5   1095.03   0.000
Residual Error   98   146.4    1.5
Total            99   1781.8

Unusual Observations
Obs   Weeks   Weight lost   Fit      SE Fit   Residual   St Resid
42    2.0     4.586         1.328    0.218    3.258      2.71R
47    7.0     7.787         4.835    0.143    2.952      2.43R
79    19.0    10.640        13.251   0.218    -2.611     -2.17R
97    17.0    14.282        11.848   0.184    2.433      2.01R

R denotes an observation with a large standardized residual.

(a) Describe the scatterplot. Would you expect the covariance between weeks and
weight loss to be positive or negative? Give a reason.
(b) Give the equation of the fitted model (SWEAT STAIN 1).
(c) Find the standard error of the estimate (SWEAT STAIN 2). Give an interpretation
of what this value means.
(d) The friends wish to test the value of the intercept; in particular, they wish to know
if the average weight loss at 0 weeks is 0 kg. Use the output (without performing
any calculations) to comment on this.
(e) It is often claimed that weight loss of over 0.5kg per week is unsustainable. Test if
the average weight loss per week by this group of friends is likely to be
unsustainable based on this criterion.
(f) Comment on the unusual observations flagged in the Minitab output. Are they a
cause for concern about the model?

NOTE: This question continues on the next page.


(g) The friends wish to find a 95% confidence interval for the average weight loss of
all people using the same combination of diet and exercise in week 15. Use the
output above and calculations you have made in earlier parts of this question, to
find this interval.
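Hint: You may find the following formulae useful:
$100(1-\alpha)\%$ Confidence Interval for y given that $x = x_g$:

$\hat{y} \pm t_{\alpha/2,\,n-2}\; s\, \sqrt{\dfrac{1}{n} + \dfrac{(x_g - \bar{x})^2}{(n-1)s_x^2}}$

$100(1-\alpha)\%$ Prediction Interval for y given that $x = x_g$:

$\hat{y} \pm t_{\alpha/2,\,n-2}\; s\, \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_g - \bar{x})^2}{(n-1)s_x^2}}$
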
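
The interval formulae in the hint, together with the standard error of the estimate and the slope test from parts (c) and (e), can be evaluated directly from the Minitab output. The Python sketch below is illustrative only and is not part of the examination paper. It assumes s = sqrt(SSE / error df), and it takes the critical value t(0.025, 98) as approximately 1.984, an assumed table value since the Python standard library has no t distribution. The function and variable names are hypothetical.

```python
import math

# Values copied from the regression output and descriptive statistics.
n = 100
b0, b1, se_b1 = -0.0742, 0.70133, 0.02119
sse, df_error = 146.4, 98
x_bar, s_x = 10.5, 5.795

# Standard error of the estimate: square root of the residual mean square.
s = math.sqrt(sse / df_error)

# t statistic for testing the slope against 0.5 kg per week.
t_slope = (b1 - 0.5) / se_b1

def mean_response_interval(x_g, t_crit):
    """Confidence interval for the mean of y at x = x_g (hint formula)."""
    fit = b0 + b1 * x_g
    half_width = t_crit * s * math.sqrt(
        1 / n + (x_g - x_bar) ** 2 / ((n - 1) * s_x**2)
    )
    return fit - half_width, fit + half_width

# ASSUMPTION: approximate two-sided 5% critical value for 98 df.
T_CRIT_98 = 1.984
lower, upper = mean_response_interval(15, T_CRIT_98)

print(f"s = {s:.4f}, slope t = {t_slope:.2f}")
print(f"95% CI for mean weight loss at week 15: ({lower:.3f}, {upper:.3f})")
```

Adding 1 under the square root inside `mean_response_interval` would turn the same calculation into the prediction interval given in the hint.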

_____________________________________________________________________
END OF EXAMINATION
