
Part A. Midterm Exam 2013

Notes: This exam has 9 questions. The duration is 2 hours. Books, notes, and calculators are allowed, but not computers, cellphones or on-line connectivity.

MT2013: Question 1

"A sampling of sampling questions"

a) The Human Resources Department of a large university maintains records on its faculty members. The table displays some of these data.

Place an X in the space beside each variable that is best described as Quantitative.

_Payroll Number
_Years of Employment
_Birth date
_Teaching Rating
_Faculty Classification
_Salary

b) Which of the following is (are) based on cross-sectional data?
_A. Company quarterly profits
_B. Percentage of Canadian adults who work full-time
_C. Historical closing stock prices
_D. Yearly student enrolments
_E. Annual costs

c) Which of the following is (are) time series data?
_A. Number of employees in 2012
_B. This month's demand for an automotive part
_C. This quarter's sales of automobiles
_D. Weekly receipts at a clothing boutique
_E. Percentage of employees who are female

d) The administration of a large university wants to study the types of wellness programs that would interest its employees. They plan to survey a random sample of employees. Under consideration are several sampling plans. Beside each plan, write the number of the sampling strategy given in the following list. Choose from among:
1: Simple Random Sampling
2: Stratified Random Sampling
3: Cluster Sampling
4: Systematic Sampling

_ (i) There are five categories of employees (administration, faculty, professional staff, clerical and maintenance). Randomly select ten individuals from each category.
_ (ii) Each employee has an ID number. Randomly select 50 numbers.
_ (iii) Randomly select a school within the university (e.g., Business School) and survey all of the individuals (administration, faculty, professional staff, clerical and maintenance) who work in that school.
_ (iv) The HR Department has an alphabetized list of newly hired employees (hired within the last five years). After starting the process by randomly selecting an employee from the list, every fifth name is chosen to be included in the sample.
e) A manufacturer of toys claims that less than 3% of his toys are defective. When 100 toys were drawn from one production run of 5,000 toys, 5% were found to be defective. For each term on the left, select the matching answer from the list to the right, and write the number in the blank.

_Population        1 The 3% value
_Sample            2 The 5% value
_Sampling Frame    3 The 100 toys
_Parameter         4 The 5,000 toys
_Statistic         5 All toys produced

MT2013: Question 2

"Could this label be called a 'phone tag'?"

A magazine that publishes product reviews conducted a survey of teenagers' preferences for cell phones. Three brands of cell phone designed specifically with teens in mind were the focus of the study. The table summarizes responses by brand and gender.

Cell Phone       Male   Female   Total
Call Me Maybe      55       87     142
Phone Fun XS       99      150     249
Black Keys II     196      113     309
Total             350      350     700

a) Which of the following charts would be appropriate for displaying the marginal distribution of cell phone brand?
_Bar Chart  _Histogram  _Line Graph  _Stem and Leaf Display  _Boxplot

b) What percent of teenagers preferred Call Me Maybe?
_A. 50%  _B. 16%  _C. 25%  _D. 41%  _E. 20%

c) What percent of female teenagers preferred the Phone Fun XS?
_A. 43%  _B. 60%  _C. 21%  _D. 50%  _E. 16%

d) What percent of teenagers who preferred the Black Keys II were males?
_A. 63%  _B. 32%  _C. 16%  _D. 41%  _E. 50%

e) Which of the following statements is true?
_A. It appears that cell phone brand preference and gender are not related.
_B. It appears that cell phone brand preference and gender are not independent.
_C. It appears that cell phone brand preference and gender are independent.
_D. A scatterplot will be more informative here than a table.
_E. None of the above

MT2013: Question 3

"Spring into these summary questions"

a) You have a set of 30 numbers. The standard deviation of these numbers is zero. You can be certain that:
_A. Half of the numbers are above the mean
_B. All of the numbers in the set are zero
_C. All of the numbers in the set are equal
_D. The numbers are evenly spaced below and above the mean

b) Here is the five number summary of the hourly wages for sales managers.

Min      Q1       Median   Q3       Max
20.94    37.64    44.77    49.24    67.11

(i) The shape of this distribution is best described as:
_A. Skewed to the right
_B. Symmetric
_C. Skewed to the left
_D. Not enough information to tell

(ii) The IQR for these data is:

(iii) Compute the lower and upper inner fences:
Space for calculations:

Lower inner fence:

Upper inner fence:

(iv) Are there any outliers, as defined by the "inner fences" criterion?
_A. Yes, only on the left side of the distribution
_B. Yes, only on the right side of the distribution
_C. Yes, on both sides of the distribution
_D. No

(v) Suppose there had been an error and that the lowest hourly wage for sales managers was $18.50 instead of $20.94. Indicate how this change would affect the following summary statistics (increase, decrease, or stay about the same):

a. Mean      Decrease   Stay the Same   Increase
b. Median    Decrease   Stay the Same   Increase
c. Range     Decrease   Stay the Same   Increase
d. IQR       Decrease   Stay the Same   Increase

c) In a perfectly symmetrical distribution, which of the following statements is false?
_ The distance from Q1 to Q2 is equal to the distance from Q2 to Q3.
_ The distance from ... to Q2 is the same as the distance ...
_ The distance from the smallest ...

d) Here is a stem-and-leaf plot of scores (out of 200) in a graduate finance course.
12 13 14 15 16

t34s78

347
26

17 J l8 9

(i) How many students were in the course?
(ii) What was the maximum score?
(iii) What is the median score?

e) An office supply chain has stores in Toronto and Vancouver. One of these stores is to be closed within the coming year. To help make the decision, management reviews sales data. Below are boxplots of monthly unit sales for both locations.

[Figure: boxplots of monthly unit sales for Toronto and Vancouver]

Which of the following statements is not correct?
_A. The IQR for sales in Toronto is larger than for Vancouver.
_B. Monthly sales are less variable in Vancouver compared to Toronto.
_ Monthly sales are higher in Toronto compared to Vancouver.
_ Both distributions are fairly symmetric.
_ Monthly sales are more variable in Vancouver compared to Toronto.

MT2013: Question 4

"Time for relationship-building"

a) A consumer research group investigating the relationship between the price of meat (per kg) and the fat content (grams) gathered data that produced the following scatterplot.

[Figure: Scatterplot of Fat Grams vs Price/kg]

(i) Which best describes the association between the price of meat and fat content?
_A. Negative, moderately strong
_B. Negative, weak
_C. Positive, strong
_D. Positive, weak
_E. No apparent association

(ii) If the point in the lower left hand corner ($2.00 per kilogram, 6 grams of fat) is removed, the correlation would most likely
_A. remain the same
_B. become stronger negative
_C. become weaker negative
_D. become positive
_E. become zero

b) For each of the following pairs of variables, would you expect a large negative correlation, a large positive correlation, or a small correlation? Circle your choices.

1. The age of a used car and its price      Large Neg.   Large Pos.   Small
2. The height and weight of a person        Large Neg.   Large Pos.   Small
3. The height and the IQ of a person        Large Neg.   Large Pos.   Small

c) For each of the following statements about the correlation coefficient, r, decide whether it is True or False. Circle your choices as appropriate.

1. r equals the proportion of times two variables lie on a straight line.   True   False
2. r will be +1.0 only if all the data lie exactly on a horizontal straight line.   True   False
3. r measures the fraction of outliers that appear in a scatterplot.   True   False
4. If the correlation between X and Y is r, the correlation between Y and X is -r.   True   False
5. r is a unitless number and must always lie between -1.0 and +1.0 inclusive.   True   False

MT2013: Question 5

"If mistrust is the opposite of trust, would mistress be the opposite of stress?"

A labour efficiency consultant collected some data on several employees of a manufacturing operation: their stress levels (X, on a scale from 0 to 10) and the productivity levels (Y, in parts made per hour). She only recorded some of the relevant computations, as follows:

x̄ = 5.4     ȳ = 57.5     b1 = -3.19
s_x = 3.3    s_y = 11.1    s_e = 4.3
a) Write the estimated regression equation here: ________ (Use two decimals only for each value)

b) Write the correlation coefficient here: ________ (Round to two decimals)

Space for work:

c) Complete this sentence: For each additional unit on the stress scale, the productivity level ________ parts per hour.

d) What percentage of variation in productivity levels can be explained by the stress level variable? Give your answer here, to the nearest whole percent: ________

e) Estimate the productivity of an individual whose stress level is 8: ________ (Round to nearest whole number)

f) Suppose the employee in part e) has an actual productivity level of 60 parts per hour. Compute the residual and use the fact that the standard deviation of the residuals is 4.3 to decide whether this data point would be considered an outlier. Explain why in one sentence only.

Residual:
Outlier?   Yes   No
Explanation:

g) Estimate the productivity of an individual whose stress level is unknown.

h) Give an interval in which the productivity level of 95% of employees would be expected to fall. Report to the nearest whole numbers. ________ to ________

MT2013: Question 6

"Can you answer the call of the bell?"

a) Which statistic(s) would you expect to have a normal distribution?

I. Height of women
II. Shoe sizes of men
III. Age (years) of first-year university students

_A. I & II only
_B. II & III only
_C. I & III only
_D. All three
_E. None of the three

b) The length of time taken by a statistics professor to solve The Globe & Mail crossword has a normal distribution. It is known that the probability of needing more than 20 minutes is 0.5, while the probability of needing more than 30 minutes is 0.1

(i) Find the mean and the standard deviation of the professor's solving time.

Mean =

SD =

(ii) What is the probability that the solving time is between 15 and 25 minutes?
_A. 0.38  _B. 0.17  _C. 0.68  _D. 0.06  _E. 0.12

c) A soft drink machine dispenses a cup, syrup and carbonated water, hopefully in order! The amount of syrup injected is normally distributed with mean 15 ml and variance 10 ml². The amount of water injected is normally distributed with mean 80 ml and variance 15 ml². The two amounts are independent of one another.

(i) Find the mean and standard deviation of the total amount of syrup and water dispensed.

Mean:

SD:

(ii) If 25 drinks are dispensed in a day, what are the mean and standard deviation of the total amount of liquid (syrup and water) that are required?

Mean:

SD:

d) Suppose the time it takes for a purchasing agent to complete an online ordering process is normally distributed with a mean of 8 minutes and a standard deviation of 2 minutes. Suppose a random sample of 25 ordering processes is selected.

(i) The standard deviation of the sampling distribution of mean times is
_A. 0.4 minutes
_B. 2 minutes
_C. 0.08 minutes
_D. 1.6 minutes
_E. 0.12 minutes

(ii) What is the probability that the sample mean will be less than 7.5 minutes?
_A. 0.3944  _B. 0.1056  _C. 0.2114  _D. 0.4013  _E. 0.8944
e) The mean height of male UBC students is 70 inches, with SD 3 inches. The mean height of female UBC students is 65 inches, with SD 4 inches. You measure the heights of random samples of 100 males and 100 females. Which result is the most unlikely? To decide, compute the z-score for each result and write the values in the space provided.

_A. One randomly chosen male having a height of 79 inches or more
_B. One randomly chosen female having a height of 74 inches or more
_C. All females in your sample having an average height of 68 inches or more
_D. All males in your sample having an average height of 73 inches or more

z-score for A =
z-score for B =
z-score for C =
z-score for D =

MT2013: Question 7

"Work with confidence!"

a) EU (European Union) countries report that 46% of their labour force is female. Is the percentage of females in the Canadian labour force the same? Statscan plans to check a random sample selected from more than 10,000 employment records on file to estimate the percentage of females in the Canadian labour force.

(i) Statscan wants to estimate the percentage of females in the Canadian labour force within ±5% with 90% confidence. How many employment records should be sampled?

_A. 121  _B. 269  _C. 451  _D. 382  _E. 1000

(ii) Suppose that Statscan wants to be confident of estimating the percentage of females in the labour force to within ±2% of the true percentage. Which of the following would they have to do?
_A. Decrease the sample size
_B. Select the same number of employment records
_C. Increase the sample size
_D. Decrease the precision
_E. Increase the sampling error

(iii) They actually select a random sample of 525 employment records, and find that 229 of the people are females. The 90% confidence interval is closest to:
_A. 40.1% to 47.2%
_B. 27.5% to 59.7%
_C. 17.8% to 69.4%
_D. 42.4% to 56.8%
_E. 12.4% to 71.0%


b) For each of the following statements about a 95% confidence interval (CI) for the mean, decide whether it is True or False. Circle your answers at the right.

1. Results from 95% of all samples will lie in this interval.   True   False
2. CIs are more informative than point estimates because they show how much the population parameters can vary.   True   False
3. The interval is wider than a 90% CI would be.   True   False
4. 95% of data values will fall in the range of a 95% CI for the mean.   True   False
5. We are 95% confident that the confidence interval includes the sample mean.   True   False
6. If we took many additional samples and computed a 95% CI for each, then approximately 95% of those intervals would contain the population mean.   True   False

MT2013: Question 8

"Hypothetically speaking"

Suppose that a report indicates that 28% of Canadians have experienced difficulty in making mortgage payments. Further suppose that a news organization randomly sampled 400 Canadians from 10 cities and found that 136 reported such difficulty. Does this indicate that the problem is more severe among these cities?
a) The correct null and alternative hypotheses are
_A. H0: p = 0.28 and Ha: p > 0.28
_B. H0: p = 0.28 and Ha: p < 0.28
_C. H0: p = 0.28 and Ha: p ≠ 0.28
_D. H0: p ≠ 0.28 and Ha: p = 0.28
_E. H0: p > 0.28 and Ha: p = 0.28

b) The correct value of the test statistic is:
_A. -1.28  _B. -2.67  _C. 2.67  _D. 1.28  _E. 1.96

Space for work:

c) The P-value corresponding to this test statistic is:
_A. 0.025  _B. 0.0177  _C. 0.2119  _D. 0.0522  _E. 0.0038

d) At α = .05, we can conclude that the percentage of Canadians in these cities experiencing difficulty making mortgage payments ...
_A. is significantly higher than 28%
_B. is significantly lower than 28%
_C. is not significantly different from 28%
_D. is equal to 28%
_E. is none of the above; no conclusion can be drawn with the given information.

e) Using the P-value in part c), which one of the following statements is true?
_A. A 90% confidence interval for p would contain 28%
_B. A 95% confidence interval for p would contain 28%
_C. A 95% confidence interval for p would not contain 28%
_D. None of the above

Part f) is unrelated to parts a) through e):

f) An opinion poll in a city of 200,000 was based on a simple random sample of 2000 people. Another poll is to be taken in the same way in a second city of population 400,000. In order for this poll to have the same margin of error as the poll in the first city, the sample size in the second city should be:
_A. 1000  _B. 2000  _C. 4000  _D. 8000

MT2013: Question 9

"No Surprise: A Statistics Test with a test"

Insurance companies track life expectancy information to assist in determining the cost of life insurance policies. Last year the average life expectancy of all policyholders was 77 years. ABI Insurance wants to determine if their clients now have a longer life expectancy, on average, so they randomly sample 20 of their recently paid policies. They will only change their premium structure if there is evidence that people who buy their policies are living longer than before. The sample has a mean of 78.6 years and a standard deviation of 4.48 years.

86  75  85  83  76
70  84  76  81  77
81  78  73  79  79
74  79  81  72  83

a) The appropriate null and alternative hypotheses are:

H0: ________        Ha: ________

b) Give the formula for the appropriate test statistic and compute its value.
Formula:
Space for work:

Computed value:

c) The corresponding P-value is:
_A. Greater than 0.20
_B. Between 0.10 and 0.20
_C. Between 0.05 and 0.10
_D. Between 0.025 and 0.05
_E. Between 0.01 and 0.025
_F. Less than 0.01

d) State your conclusion using α = .05. Write a statistically and grammatically correct sentence that tells ABI Insurance whether there is evidence to increase their premiums.

e) Suppose ABI randomly samples 100 recently paid policies. This sample yields a mean of 77.7 years and a standard deviation of 3.6 years. Compute a 95% confidence interval. Report it in the format [xx.x, xx.x] with one decimal place.

END OF MT2013 QUESTIONS; ANSWERS AND EXPLANATIONS FOLLOW

MT2013: Answer 1
a) Years of Employment, Teaching Rating
b) B.
c) D.
d) 2, 1, 3, 4.
e) 5, 3, 4, 1, 2. Population = All toys produced; Sample = 100 toys; Sampling Frame = 5,000 toys; Parameter = 3%; Statistic = 5%

Details and Comments:
a) Years of Employment has units (yrs); Teaching Rating does not have units but the rating is an average of ordinal data over a number of courses, and can range from 1 to 5 with fractional values possible.
b) "Percentage of Canadian adults who work full-time" is measured at one time point, hence cross-sectional. The other variables are measured repeatedly over time, hence longitudinal or time-series.
c) Only "Weekly receipts at a clothing boutique" is measured at more than one time point. The other variables are measured once each.
d) (i) The five categories are strata; random samples are taken within each one. (ii) Each employee has the same chance of being selected for the sample. (iii) One school is a reasonable representative of the entire university, hence a cluster. (iv) Choosing "every fifth name" makes it systematic.
e) The sampling frame is the production run, namely, that part of the population from which the sample can be drawn.
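The four sampling plans in part d) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the employee IDs, the 40-per-category head count, the three school names, and the helper function names are all hypothetical and do not come from the exam. The systematic sampler uses the common variant of a random start among the first k names.

```python
import random

def simple_random_sample(ids, n, rng):
    """Plan (ii): every employee has the same chance of selection."""
    return rng.sample(ids, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Plan (i): draw a separate random sample inside each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, rng):
    """Plan (iii): pick one cluster at random and survey everyone in it."""
    chosen = rng.choice(sorted(clusters))
    return chosen, list(clusters[chosen])

def systematic_sample(ordered_ids, k, rng):
    """Plan (iv): random start, then every k-th name on the ordered list."""
    start = rng.randrange(k)
    return ordered_ids[start::k]

# Hypothetical population: 5 employee categories with 40 people each.
rng = random.Random(2013)
categories = ["administration", "faculty", "professional staff",
              "clerical", "maintenance"]
strata = {c: [f"{c}-{i}" for i in range(40)] for c in categories}
all_ids = [e for members in strata.values() for e in members]
# Hypothetical schools used as clusters.
clusters = {"Business": all_ids[:50], "Arts": all_ids[50:120],
            "Science": all_ids[120:]}

srs = simple_random_sample(all_ids, 50, rng)
strat = stratified_sample(strata, 10, rng)
school, cluster = cluster_sample(clusters, rng)
systematic = systematic_sample(sorted(all_ids), 5, rng)
print(len(srs), {c: len(v) for c, v in strat.items()}, school, len(systematic))
```

The stratified plan guarantees ten people from every category, while the cluster plan surveys whoever happens to work in the one school that was drawn.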

MT2013: Answer 2
a) C.
b) E.
c) A.
d) A.
e) B.

Details and Comments:
a) Categorical data are displayed with a bar chart. Histograms, stem-and-leaf displays, boxplots (and usually line graphs) are for quantitative data.
b) 20% (142/700)
c) 43% (150/350)
d) 63% (196/309)
e) The column percentages for males are different from those for females, which suggests that cell phone brand preference and gender are related (not independent).
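The marginal and conditional percentages in Answer 2 can be checked directly from the counts in the Question 2 table. The sketch below is only a worked check; the dictionary layout and variable names are illustrative choices, not part of the exam.

```python
# Counts from the MT2013 Question 2 table (brand by gender).
counts = {
    "Call Me Maybe": {"Male": 55, "Female": 87},
    "Phone Fun XS": {"Male": 99, "Female": 150},
    "Black Keys II": {"Male": 196, "Female": 113},
}

grand_total = sum(sum(row.values()) for row in counts.values())
female_total = sum(row["Female"] for row in counts.values())
black_keys_total = sum(counts["Black Keys II"].values())

# b) Marginal percentage: Call Me Maybe out of all teenagers.
pct_call_me_maybe = 100 * sum(counts["Call Me Maybe"].values()) / grand_total
# c) Conditional on gender: Phone Fun XS among females.
pct_fun_xs_given_female = 100 * counts["Phone Fun XS"]["Female"] / female_total
# d) Conditional on brand: males among Black Keys II fans.
pct_male_given_black_keys = 100 * counts["Black Keys II"]["Male"] / black_keys_total

print(f"b) {pct_call_me_maybe:.1f}%")
print(f"c) {pct_fun_xs_given_female:.1f}%")
print(f"d) {pct_male_given_black_keys:.1f}%")
```

The only thing that changes between parts b), c) and d) is the denominator: the grand total, the female column total, or the Black Keys II row total.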

MT2013: Answer 3
a) C.
b) (i) C. (ii) 11.6 (iii) Lower inner fence = 20.24; Upper inner fence = 66.64 (iv) B. (v) Decrease, Stay the same, Increase, Stay the same
c) D.
d) 15, 189, 138
e) E.

Details and Comments:
a) Look at the formula for standard deviation. If all numbers are equal, then they are all equal to the mean, so all the deviations are zero. This is the only way the standard deviation can be zero.
b) (i) The median is closer to Q3 than to Q1 so the distribution is skewed to the left.
(ii) IQR = Q3 - Q1 = 49.24 - 37.64 = 11.6
(iii) Lower inner fence = 37.64 - 1.5×11.6 = 20.24; Upper inner fence = 49.24 + 1.5×11.6 = 66.64
(iv) Yes, only on the right side of the distribution since the maximum exceeds 66.64.
(v) Decreasing the lowest data value decreases the sum, and hence the mean. But it doesn't really affect which is the middle value or the quartiles. The range increases.
c) Quartiles divide the area of the distribution into four equal sections.
d) (i) Count up the number of data values. Don't forget to attach the leaf to the stem for the maximum and median.
e) Monthly sales are more variable in Vancouver compared to Toronto since the box is taller.
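The IQR and inner-fence arithmetic in part b) can be reproduced with a short script. This is a minimal sketch using only the five-number summary given in Question 3; the variable names are illustrative.

```python
# Five-number summary of hourly wages from MT2013 Question 3 b).
minimum, q1, median, q3, maximum = 20.94, 37.64, 44.77, 49.24, 67.11

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr   # values below this are flagged as outliers
upper_fence = q3 + 1.5 * iqr   # values above this are flagged as outliers

# With only the summary available, the extremes tell us whether at least
# one outlier exists on each side.
has_low_outlier = minimum < lower_fence
has_high_outlier = maximum > upper_fence

print(f"IQR = {iqr:.2f}")
print(f"Inner fences = [{lower_fence:.2f}, {upper_fence:.2f}]")
print(f"Low outlier: {has_low_outlier}, high outlier: {has_high_outlier}")
```

The minimum of 20.94 sits just inside the lower fence, while the maximum of 67.11 falls just outside the upper one, which is why only the right side has an outlier.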

MT2013: Answer 4
a) (i) A. Negative, moderately strong (ii) B. Become stronger negative
b) Large Neg.; Large Pos.; Small
c) False, False, False, False, True

Details and Comments:
a) (i) Top left to bottom right is negative association. (ii) Removing the lower left point reduces the scatter.
b) 1. The older the car, the lower the price. 2. The taller the person, the heavier the person. 3. Height has no connection with IQ.
c) 1. "Creative" but completely wrong. 2. The points must lie exactly on a straight line with a positive slope. 3. "Creative" but also completely wrong. 4. Corr of X and Y = Corr of Y and X. The roles are interchangeable. 5. Two of the properties of r.

MT2013: Answer 5
a) ŷ = 74.73 - 3.19x
b) -0.95
c) "decreases by 3.19"
d) 90%
e) 49
f) Residual = 11; it is an outlier since the residual is more than 2.5 s_e away from 0.

Details and Comments:
a) b0 = ȳ - b1·x̄ = 57.5 - (-3.19)(5.4) = 74.73
b) Rearrange the formula for b1: r = b1(s_x/s_y) = (-3.19)(3.3/11.1) = -0.95
c) Interpretation of slope.
d) r² = (-0.95)² = 0.90 or 90%
e) ŷ(8) = 74.73 - 3.19(8) = 49.21 (Round to 49)
f) Residual = 60 - 49 = 11; remember the 68-95-99.7 Rule for identifying outliers.
g) Since x is unknown, just use the mean of y.
h) Use the 68-95-99.7 Rule, i.e., 57.5 ± 2(11.1) = 35.3, 79.7
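The regression answers above follow from the summary statistics given in Question 5 alone. The sketch below reworks them step by step; the variable names are illustrative, and the script keeps the unrounded prediction, so its residual (about 10.8) differs slightly from the key's rounded value of 11.

```python
# Summary statistics given in MT2013 Question 5.
x_bar, y_bar = 5.4, 57.5      # mean stress level, mean productivity
b1 = -3.19                    # slope
s_x, s_y, s_e = 3.3, 11.1, 4.3

b0 = y_bar - b1 * x_bar       # a) intercept of the fitted line
r = b1 * s_x / s_y            # b) correlation from slope and SDs
r_squared = r ** 2            # d) share of variation explained

y_hat_8 = b0 + b1 * 8         # e) predicted productivity at stress level 8
residual = 60 - y_hat_8       # f) actual minus predicted
residual_in_sds = residual / s_e

# h) Rough 95% range for productivity using the 68-95-99.7 Rule.
low, high = y_bar - 2 * s_y, y_bar + 2 * s_y

print(f"y-hat = {b0:.2f} + ({b1:.2f})x, r = {r:.2f}, r^2 = {r_squared:.2f}")
print(f"y-hat(8) = {y_hat_8:.2f}, residual = {residual:.2f} "
      f"({residual_in_sds:.2f} residual SDs)")
print(f"95% range: {low:.1f} to {high:.1f}")
```

Whether the prediction is rounded to 49 first or not, the residual is a little more than 2.5 residual standard deviations from zero, so the outlier conclusion is the same.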

MT2013: Answer 6
a) A. I and II only
b) (i) Mean = 20; SD = 10 (ii) A. 0.38
c) (i) Mean = 95; SD = 5 (ii) Mean = 2375; SD = 25
d) (i) A. 0.4 minutes (ii) B. 0.1056
e) D. z-scores: 3, 2.25, 7.5, 10. D has the highest z-score and therefore is the most unlikely.

Details and Comments:
a) First-year students' ages will vary only slightly since most are within a year or two in age. There might be some older students, i.e., those returning to school etc., but it is highly unlikely to have students who are much younger than 18 or 19!
b) (i) Computations: Pr(Z > z) = 0.5 => z = 0, so X = μ + zσ => 20 = μ + 0 => μ = 20; Pr(Z > z) = 0.1587 => z = 1, so X = μ + zσ => 30 = 20 + 1σ => σ = 10
(ii) Computations: Pr(15 < X < 25) = Pr([15-20]/10 < Z < [25-20]/10) = Pr(-0.5 < Z < 0.5) = 1 - 2(0.3085) = 0.383
c) (i) Computations: E(X+Y) = E(X) + E(Y) = 15 + 80 = 95; Var(X+Y) = Var(X) + Var(Y) (since independent) = 10 + 15 = 25, so SD = √25 = 5
(ii) Computations: Mean = 25(95) = 2375; Var = 25(25) = 625; SD = √625 = 25
d) Computations: (i) SD(x̄) = 2/√25 = 0.4 (ii) Pr(x̄ < 7.5) = Pr(Z < [7.5-8]/0.4) = Pr(Z < -1.25) = 0.1056
e) z-score for A = [79-70]/3 = 3
z-score for B = [74-65]/4 = 2.25
z-score for C = [68-65]/[4/√100] = 7.5
z-score for D = [73-70]/[3/√100] = 10
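The normal-distribution computations in Answer 6 can be verified with Python's standard library instead of a z-table. This is a sketch under the assumption, taken from the Details above, that the tail probability for 30 minutes is 0.1587; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

# b) (i) Pr(X > 20) = 0.5 puts the mean at 20; Pr(X > 30) = 0.1587 fixes sigma.
mu = 20.0
z_30 = std_normal.inv_cdf(1 - 0.1587)      # close to 1
sigma = (30 - mu) / z_30                   # close to 10

# b) (ii) Probability of a solving time between 15 and 25 minutes.
solve_time = NormalDist(mu=20, sigma=10)
p_15_to_25 = solve_time.cdf(25) - solve_time.cdf(15)

# c) Independent amounts: means add and variances add.
total_mean = 15 + 80
total_sd = sqrt(10 + 15)
day_mean = 25 * total_mean
day_sd = sqrt(25 * (10 + 15))

# d) Sampling distribution of the mean of 25 ordering times.
se_mean = 2 / sqrt(25)
p_mean_below_7_5 = NormalDist(mu=8, sigma=se_mean).cdf(7.5)

# e) z-scores: single observations use the SD, sample means use SD / sqrt(n).
z_scores = {
    "A": (79 - 70) / 3,
    "B": (74 - 65) / 4,
    "C": (68 - 65) / (4 / sqrt(100)),
    "D": (73 - 70) / (3 / sqrt(100)),
}
most_unlikely = max(z_scores, key=z_scores.get)

print(round(sigma, 2), round(p_15_to_25, 3), round(p_mean_below_7_5, 4))
print(z_scores, most_unlikely)
```

Parts d) and e) show the same idea twice: averaging 25 or 100 observations shrinks the standard deviation by the square root of the sample size, which is what makes a modest shift in a sample mean so unlikely.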

MT2013: Answer 7
a) (i) B. 269 (ii) C. Increase the sample size (iii) A. [40.1%, 47.2%]
b) 1. False; 2. False; 3. True; 4. False; 5. False; 6. True

Details and Comments:
a) (i) n = (1.645²)(0.46)(0.54)/(0.05²) = 269
(ii) Look at the formula for the CI. The sample size is in the denominator of the margin of error, so increasing the sample size decreases the margin of error.
(iii) p̂ = 229/525 = 0.4362; 90% CI: 0.4362 ± 1.645√[(0.4362)(0.5638)/525] = 0.4362 ± 0.0356 or [0.4006, 0.4718]
b) 1. The interval changes from sample to sample
2. Population parameters don't vary; sample statistics vary
3. Higher confidence requires wider intervals
4. CIs are not about individual data values; they are about estimates
5. All CIs for the mean include the sample mean; only 95% include the population mean
6. Definition of a CI
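The sample-size and confidence-interval calculations in part a) can be reworked as follows. This is a minimal sketch: the helper name `sample_size_for_proportion` is hypothetical, and the critical value 1.645 is the one used in the answer key.

```python
from math import ceil, sqrt

Z_90 = 1.645  # critical value for 90% confidence, as used in the answer key

def sample_size_for_proportion(p_guess, margin, z=Z_90):
    """Smallest n whose margin of error is at most `margin` (hypothetical helper)."""
    return ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)

# (i) Within +/-5% using the EU figure of 46% as the planning value.
n_5pct = sample_size_for_proportion(0.46, 0.05)
# (ii) Tightening the margin to +/-2% requires a larger sample.
n_2pct = sample_size_for_proportion(0.46, 0.02)

# (iii) 90% confidence interval from 229 females in 525 records.
p_hat = 229 / 525
margin_of_error = Z_90 * sqrt(p_hat * (1 - p_hat) / 525)
ci_low, ci_high = p_hat - margin_of_error, p_hat + margin_of_error

print(n_5pct, n_2pct)
print(f"{p_hat:.4f} +/- {margin_of_error:.4f} -> [{ci_low:.3f}, {ci_high:.3f}]")
```

Because the margin of error appears squared in the denominator, cutting it from 5% to 2% multiplies the required sample size by roughly (5/2)² = 6.25.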

MT2013: Answer 8
a) A. H0: p = 0.28 and Ha: p > 0.28
b) C. 2.67
c) E. 0.0038
d) A.
e) C.
f) B. 2000

Details and Comments:
a) One-sided alternative since the question asks whether the problem is "more severe."
b) p̂ = 136/400 = 0.34; z = (p̂ - p0)/√[p0(1 - p0)/n] = (0.34 - 0.28)/√[(0.28)(0.72)/400] = 2.67
c) The P-value is the area to the right of 2.67 on a standard normal curve.
d) Since the P-value is less than 0.05, the null hypothesis is rejected; the true population proportion is significantly higher than 28%.
e) Rejecting the null hypothesis for a two-tailed alternative is equivalent to the usual (two-sided) confidence interval not containing the hypothesized value. Here the two-tailed P-value, 2(0.0038) = 0.0076, is less than 0.05, so a 95% confidence interval would not contain 28%.
f) Sampling variability only depends on sample size, as long as the population is large.
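The one-proportion z-test in Answer 8 can be checked numerically with the standard library. The sketch below uses the counts from the question; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.28            # hypothesized proportion under H0
n = 400
p_hat = 136 / n      # observed sample proportion

# The standard error uses p0, because the test assumes H0 is true.
se_null = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se_null

# One-sided (upper tail) P-value for Ha: p > 0.28.
p_value_one_sided = 1 - NormalDist().cdf(z)
# Two-sided P-value, relevant to the confidence-interval argument in part e).
p_value_two_sided = 2 * p_value_one_sided

reject_at_5pct = p_value_one_sided < 0.05

print(f"z = {z:.2f}, one-sided P = {p_value_one_sided:.4f}, "
      f"two-sided P = {p_value_two_sided:.4f}, reject H0: {reject_at_5pct}")
```

Both the one-sided and the two-sided P-values fall below 0.05, which matches the conclusions in parts d) and e).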

MT2013: Answer 9
a) H0: μ = 77 and Ha: μ > 77
b) Formula and computed value: t = (x̄ - μ0)/(s/√n) = (78.6 - 77)/(4.48/√20) = 1.597
c) C. Between 0.05 and 0.10
d) There is not sufficient evidence that the mean length of life of people who buy their policies is higher, so do not increase premiums.
e) [77.0, 78.4]

Details and Comments:
a) One-sided alternative since the question asks whether policy-buyers are "living longer" than before.
c) Use the t-table with 19 degrees of freedom
d) Since the P-value is greater than 0.05, do not reject the null hypothesis.
e) 77.7 ± 1.984 × 3.6/√100 = 77.7 ± 0.7
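The t statistic and the confidence interval in Answer 9 can be recomputed from the summary statistics. This sketch stays within the standard library, so instead of computing an exact P-value it brackets it with two one-sided critical values for 19 degrees of freedom (1.328 for a 0.10 tail and 1.729 for a 0.05 tail, as found in a standard t-table). The multiplier 1.984 is the one the answer key uses; the variable names are illustrative.

```python
from math import sqrt

# b) One-sample t statistic from the summary statistics in Question 9.
x_bar, mu_0, s, n = 78.6, 77.0, 4.48, 20
t_stat = (x_bar - mu_0) / (s / sqrt(n))

# c) Bracket the one-sided P-value with t-table critical values (df = 19).
T_CRIT_10 = 1.328   # upper-tail area 0.10
T_CRIT_05 = 1.729   # upper-tail area 0.05
p_between_05_and_10 = T_CRIT_10 < t_stat < T_CRIT_05

# d) At alpha = 0.05 the statistic must exceed 1.729 to reject H0.
reject_at_5pct = t_stat > T_CRIT_05

# e) 95% CI from the second sample (n = 100), using the key's multiplier.
x_bar_2, s_2, n_2, t_mult = 77.7, 3.6, 100, 1.984
margin = t_mult * s_2 / sqrt(n_2)
ci_low, ci_high = x_bar_2 - margin, x_bar_2 + margin

print(f"t = {t_stat:.3f}, 0.05 < P < 0.10: {p_between_05_and_10}, "
      f"reject H0: {reject_at_5pct}")
print(f"95% CI: [{ci_low:.1f}, {ci_high:.1f}]")
```

The statistic lands between the two critical values, so the P-value is between 0.05 and 0.10 and the null hypothesis is not rejected at the 5% level.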

END OF ANSWERS AND EXPLANATIONS TO MIDTERM 2013


Midterm Exam 2012

Notes: This exam has 9 questions. The duration is 2 hours. Books, notes, and calculators are allowed, but not computers, cellphones or on-line connectivity.

MT2012: Question 1

"A sole practitioner"

ASW, a regional shoe chain, has recently launched an online store. Sales via the Internet have been sluggish compared to their brick and mortar stores, and management suspects that its regular customers have concerns regarding the security of online transactions. To determine if this is the case, they plan to survey a sample of their regular customers.

a) Suppose that ASW's regular customers belong to a rewards program and have a customer rewards ID number. ASW decides to randomly select 100 numbers. This sampling plan is called:
_A. Simple Random Sampling
_B. Stratified Sampling
_C. Cluster Sampling
_D. Systematic Sampling
_E. Convenience Sampling

b) Suppose that ASW has an alphabetized list of regular customers who belong to their rewards program. After randomly selecting a customer on the list, every 25th customer from that point on is chosen to be in the sample. This sampling plan is called:
_A. Simple Random Sampling
_B. Stratified Sampling
_C. Cluster Sampling
_D. Systematic Sampling
_E. Convenience Sampling

c) "All regular ASW customers" is known as the ________ of the study.
_A. Parameter
_B. Statistic
_C. Target Population
_D. Sampling Frame
_E. Sample

d) Which of the following is the parameter of interest in the ASW study?
_A. All regular ASW customers
_B. % of regular ASW customers who have concerns about online security
_C. ASW customers who belong to the rewards program
_D. % of ASW customers who belong to the rewards program but don't shop online
_E. None of the above

e) One member of the management team at ASW suggests that their survey could be done online. Customers logging on to the online store would be asked to complete a survey and offered a coupon as incentive to participate. Which statement is true?
_A. This is a voluntary response sample
_B. This would result in an unbiased random sample
_C. This would result in a biased sample
_D. Both A and B
_E. Both A and C

MT2012: Question 2

"Planning

A brokerage firm gathered information on how their clients were investing for retirement. Here is a small sample of the data they collected.

a) Place an X in the space beside each variable that is best described as Quantitative.

_Respondent Number
_Age
_Gender
_Household Income
_Self-directed RRSP
_Book value of portfolio
Based on age, clients were categorized according to where the largest percentage of their retirement portfolio was invested, as shown in the table below.

                Age 50 or Younger   Over Age 50   Total
Mutual Funds           30               34          64
Stocks                 37               45          82
Bonds                  19               23          42
Total                  86              102         188

b) The percentage of clients who are over age 50 and invest in mutual funds is:
_A. 53.1%  _B. 33.3%  _C. 18.1%  _D. 34.0%  _E. 54.3%

c) Of the clients over age 50, the percentage who invest in mutual funds is:
_A. 53.1%  _B. 33.3%  _C. 18.1%  _D. 34.0%  _E. 54.3%

d) Of the clients who invest in mutual funds, the percentage over age 50 is:
_A. 53.1%  _B. 33.3%  _C. 18.1%  _D. 34.0%  _E. 54.3%

e) The percentage of clients over age 50 is:
_A. 53.1%  _B. 33.3%  _C. 18.1%  _D. 34.0%  _E. 54.3%

f) Consider the following side-by-side bar chart for the data:

[Figure: side-by-side bar chart of investment type by age group]

Does the chart indicate that mode of investment is independent of age?

Yes   No

Explain in one short sentence only.

MT2012: Question 3

"Mmm Marketing Manager Money"

Here is a histogram and the five number summary for salaries (in $) for a sample of 48 marketing managers.

[Figure: Histogram of Marketing Manager Salaries]

Min      Q1       Median   Q3       Max
46360    69693    77020    91750    129420

a) The shape of this distribution is:
_Bimodal  _Normal  _Symmetric  _Skewed to the left  _Skewed to the right

b) Which of the following is true?
_A. Mode < Median < Mean
_B. Median < Mode < Mean
_C. Mean < Median < Mode
_D. Mean < Mode < Median
_E. All three are equal

c) Which of the following is closest to the standard deviation?
_A. $3,676  _B. $13,843  _C. $20,765  _D. $83,060  _E. Can't tell without the data

d) The IQR for these data is:
_A. $83,060  _B. $22,057  _C. $69,693  _D. $77,020  _E. $14,566

e) Compute the lower and upper inner fences.
Space for calculations:

Lower inner fence:

Upper inner fence:

f) Are there any outliers, as defined by the "inner fences" criterion?
_A. Yes, only on the left side of the distribution
_B. Yes, only on the right side of the distribution
_C. Yes, on both sides of the distribution
_D. No

g) Suppose the marketing manager who was earning $129,420 got a raise and is now earning $140,000. Which of the following statements is true?
_A. The mean would increase
_B. The median would increase
_C. The range would stay the same
_D. The IQR would increase
_E. The IQR would decrease

The next two parts are not related to parts (a) through (g) above. The boxplots below show monthly sales revenue figures ($ thousands) for a discount office supply company with locations in three different regions of Canada (Atlantic, Central and West).

[Figure: Boxplot of Atlantic, Central, and West]

h) Which of the following statements is true?
_A. Central has the lowest sales revenues
_B. Central has the lowest median sales revenue
_C. West has the lowest mean sales revenue
_D. West has the lowest median sales revenue
_E. Atlantic has the lowest mean sales revenue

Which of the following statements is S!ry? A. West has the most variable sales revenues'

_B. West has the largest IQR.
_C. Central has the smallest IQR.
_D. Atlantic has the most variable sales revenues.
_E. Central has the least variable sales revenues.

MT20l2: Question

'.OMG: A great place to workro

To determine whether the cash bonus paid by a company is related to annual pay, data were gathered for 10 account executives at Outstanding Management Group lOivtC; wtro received cash bonuses in2007. The data and summary-statistics are shown b"to*.
ANNUAL PAY
$ 70,609 $ 58.487

CASH BONAS

$ tt,22s
$ 6.238

$ 104,s61
$ 43,922 $ 82.613 $ 116,250 $ 76.751 $ 68.513 $ 137,000 $ 94.469 Meun Stsndard Deviation
$ 8s,318

$ 14,194
$ 4,188

$ 11"863 $ r3,67t
$ 7,759

$ 20.760 $ s5,000
$ 34.368

$ 17,927
$ 15,618

$ 28.077 0.735

Conelation a)

what percentage of variability in cash bonuses. can be explained by pay?


b) What would the correlation be if the Dollars were converted to Euros at the current conversion rate of (1 Canadian Dollar = 0.76 Euros)?

c) Estimate the linear regression model that relates the response variable (cash bonus) to the predictor variable (annual pay).

Slope of the regression line:              (Report to three decimal places)
Intercept of the regression line:          (Report to nearest whole number)
Equation of the linear model:

Space for work:

d) From the equation in part c), estimate the cash bonus for an executive at OMG earning $82,613 a year, and compute the residual for this estimate.

Estimated cash bonus:

Residual:

e) Would you be confident in using your regression equation to estimate the cash bonus for an executive at OMG earning $200,000 a year?

Yes

No

Reason:

f) Below is a plot showing residuals versus fitted values for the estimated regression equation relating cash bonus to pay for the account executives at OMG.

[Figure: residuals versus fitted values]

Circle the conditions for linear regression which are violated, if any.

None are violated
Linearity
Normality
Constant Variance (Equal spread)
Independence

Parts (g) through (i) are unrelated to parts (a) through (f):

g) In commenting on the increase in home foreclosures (i.e. banks repossessing homes), a news reporter stated "there appears to be a strong correlation between home foreclosures and job loss of the head of household." Comment on this statement; use one sentence only.

h) A research study investigated the relationship between number of hours individuals spend on the Internet and age. Which is the predictor variable? Circle your choice.

Hours on Internet          Age

i) The correlation associated with the following scatterplot is:

[Figure: scatterplot]

_A. 1.00
_B. -1.00
_C. 0.50
_D. -0.50
_E. 0.00

MT2012: Question 5    "Greater attitude, greater latitude"

The Survey of Study Habits and Attitudes (SSHA) is a psychological test that measures academic motivation and study habits. Females score higher, on average, than males. The distribution of SSHA scores among the female students at a university has mean 120 and standard deviation 28; the distribution among male students has mean 105 and standard deviation 35. Scores are normally distributed. Assume also that scores are independent.

a) What percentage of female students have SSHA scores greater than 162? Report your [...]

b) What SSHA score is exceeded by only 10% of female students? Round your answer to the nearest whole number.

c) Compute the lower and upper quartiles for the distribution of scores of female students. Round your answers to the nearest whole numbers.

d) Suppose you select a single female student and a single male student at random and give them the SSHA test. What are the mean and the standard deviation of the difference (female minus male) between their scores? Report to one decimal place.

Mean =                    Standard Deviation =

e) Using your answers from part d), compute the probability that the chosen female has a higher score than the chosen male.

f) Suppose Angelina (a female) scores 78 on the SSHA, while Brad (a male) scores 70 on the SSHA. Use an appropriate calculation to determine who did worse compared to the average for their gender. Circle the name of the person who did worse.

Angelina          Brad

Explanation:

MT2012: Question 6    "A convenient truth"

Part I. A convenience store owner suspects that only 10% of the customers buy
magazines and thinks that he might be able to sell something more profitable. In order to decide whether he should stop selling them, he tracks the number of customers who buy magazines on a given day.
a) On that day he had 300 customers. Assuming it was a typical day and that his estimate is correct, what are the mean and standard deviation of the number of customers who buy magazines each day? Report your answers to one decimal place.

Mean:                    Standard Deviation:

b) What is the probability that 25 to 35 customers (inclusive) bought magazines that day?

c) How many magazine sales would you consider to be very strong evidence that his 10% estimate was too low? That is, what number of sales would be extremely unusually high? Hints: Use the Empirical (68-95-99.7) Rule. Remember to give a whole-number answer.

Part II. Past records indicate that the magazines he sells on any day have an average revenue of $150 with a standard deviation of $30. Suppose he takes a random sample of 36 past days' sales receipts and records the dollar value of magazine sales.
a) Describe the sampling distribution for the sample mean by naming the model and telling its mean and standard deviation.


b) Suppose the resulting sample mean is $130. Do you think that this sample result is unusually small? Explain.

MT2012: Question 7    "Talk about confidence!"

One division of a telecommunications equipment company reports that 12% of non-electrical components are reworked. Management wants to determine if this percentage is the same as the percentage rework for electrical components manufactured by the company. The Quality Control Department plans to check a random sample of the over 10,000 electrical components manufactured across all divisions.

a) The Quality Control Department wants to estimate the true percentage of rework for electrical components to within ±4%, with 99% confidence. How many components should they sample?

_A. 651
_B. 1000
_C. 344
_D. 438
_E. 579

b) They actually select a random sample of 450 electrical components and find that 46 of those had to be reworked. The 99% confidence interval is closest to:

_A. [ 0.0654, 0.1390 ]
_B. [ 0.0432, 0.1608 ]
_C. [ 0.0763, 0.1277 ]
_D. [ 0.0541, 0.1499 ]
_E. Cannot be determined with the given information.

c) The 95% confidence interval based on these data is 0.0742 to 0.1302. Which one of the following is the correct interpretation?

_A. The percentage of electronic components that are reworked is between 7.4% and 13.0%.
_B. We are 95% confident that between 7.4% and 13.0% of electrical components are reworked.
_C. The margin of error for the true percentage of electrical components that are reworked is between 7.4% and 13.0%.
_D. All samples of size 450 will yield a percentage of reworked electrical components that falls within 7.4% and 13.0%.
_E. There is a 95% chance that 7.4% to 13.0% of the electrical components have to be reworked.

d) Based on the 95% confidence interval, should the Quality Control Department conclude that the percentage of rework for the electrical components is lower than the rate of 12% for non-electrical components?

_A. Yes, because the lower limit of the confidence interval is 7.4%.
_B. Yes, because 12% is contained within the 95% confidence interval.
_C. No, because 12% is contained within the 95% confidence interval.
_D. No, because the upper limit of the confidence interval is 13.0%.
_E. We cannot say since the sample size is not large enough.

e) All else being equal, increasing the level of confidence desired will...:

_A. ...tighten the confidence interval
_B. ...decrease the margin of error
_C. ...increase precision
_D. ...increase the margin of error
_E. ...increase the margin of error and tighten the confidence interval

MT2012: Question 8    "A dip in chips"

A company manufacturing computer chips finds that 8% of all chips manufactured are defective. Management is concerned that high employee turnover is partially responsible for the high defect rate. In an effort to decrease the percentage of defective chips, management decides to provide additional training to those employees hired within the last year. After training was implemented, a sample of 450 chips revealed only 27 with defects. Was the additional training effective in lowering the defect rate?

a) The appropriate null and alternative hypotheses are:

H0:
Ha:

b) Give the formula for the appropriate test statistic and compute its value.

Test Statistic Formula:
Computed value:
Show your work:

c) Assume that the value of the test statistic is -1.4. Don't use your computed value from part b). The P-value associated with the given test statistic is closest to:

_A. 0.0404
_B. 0.05
_C. 0.0808
_D. 0.1616
_E. 0.9192

d) From the P-value in part c), and using a 1% significance level (i.e. α = .01), which of the following is true?

_A. Conclude that additional training significantly lowered the defect rate.
_B. Conclude that additional training did not significantly lower the defect rate.
_C. Conclude that additional training significantly increased the defect rate.
_D. Conclude that additional training did not affect the defect rate.
_E. No conclusion can be made with the given information.

MT2012: Question 9    "The non-profit motive"

A large software development firm recently relocated its facilities. Top management has [...] their professional employees to engage in local service activities. They [...] that the firm's professionals volunteer an average of more than 15 hours per month. If this is not the case, they will institute an incentive program to increase it. A sample of 24 professionals reported the following number of hours:

12  13  14  14  15  15  15  16  16  16  16  16
17  17  17  18  18  18  19  19  19  20  20  22

The sample has a mean of 16.75 hours and a standard deviation of 2.40 hours.

a) The correct null and alternative hypotheses are:

_A. H0: x̄ = 15 and Ha: x̄ > 15
_B. H0: μ = 15 and Ha: μ > 15
_C. H0: μ = 15 and Ha: μ < 15
_D. H0: μ ≠ 15 and Ha: μ = 15
_E. H0: μ = 15 and Ha: μ ≠ 15

b) The correct value of the test statistic is closest to:

_A. 3.572
_B. -3.572
_C. 1.327
_D. -1.327
_E. 0.729

c) Which of the following conclusions is correct?

_A. We reject the alternative hypothesis at the 5% significance level.
_B. We fail to reject the null hypothesis at the 5% significance level.
_C. An incentive program is needed since the evidence indicates professional employees volunteer an average of no more than 15 hours per month.
_D. We reject the null hypothesis; the firm shouldn't need to institute an incentive program since the evidence indicates that professional employees volunteer an average of more than 15 hours per month.
_E. No conclusion can be reached about the hypothesis with the information that is given.

d) It is appropriate to test the mean because:

_A. The data are a simple random sample from the population of interest
_B. The distribution of the sample data appears to be approximately normal
_C. Volunteer hours is likely to be independent across employees
_D. All of the above

e) A 95% confidence interval for the true mean number of hours of volunteer time is closest to:

_A. 16.75 ± 1.016
_B. 16.75 ± 0.840
_C. 16.75 ± 4.966
_D. 16.75 ± 4.114
_E. 2.40 ± 7.074

MT2012 - END OF QUESTIONS; ANSWERS AND EXPLANATIONS FOLLOW

MT2012: ANSWERS AND EXPLANATIONS

MT2012: Answer 1
a) A.   b) D.   c) C.   d) B.   e) E.

Details and Comments:
a) Each regular customer has the same chance of being selected for the sample.
b) Choosing "every 25th customer" makes it systematic.
c) The target population is the "universe" for which you want to be able to generalize.
d) A parameter is a numerical characteristic such as a mean or a proportion/percentage.
e) Since people can decide whether to answer or not, it is a voluntary response, and hence subject to bias. People who decide to participate may not be like people who decide not to participate.

MT2012: Answer 2
a) Age, Household Income, Book value of portfolio
b) C. 18.1%
c) B. 33.3%
d) A. 53.1%
e) E. 54.3%
f) Yes: The age distribution (ratio of younger to older) is about the same for each mode (i.e. type) of investment.

Details and Comments:
a) Age (yrs), Household Income ($), and Book Value ($) all have units and are measured on a continuum, so they are quantitative.
b) 34/188 = 0.181
c) 34/102 = 0.333
d) 34/64 = 0.531
e) 102/188 = 0.543
f) Look for differences across the clusters of bars.

MT2012: Answer 3
a) C. Skewed to the right
b) A. Mode < Median < Mean
c) B. $13,843
d) B. $22,057
e) Lower inner fence: $36,607.50; Upper inner fence: $124,835.50
f) B.   g) A.   h) B.   i) D.

Details and Comments:
a) Long right-hand tail: more of the area is piled up to the left.
b) The mode is the peak and it is clearly to the left of the median value of 77,020. The median is less than the mean for a right-skewed distribution.
c) Use the rule of thumb: s ≈ Range/6
d) IQR = Q3 - Q1 = 91,750 - 69,693 = 22,057
e) Lower inner fence = 69,693 - 1.5 × 22,057 = $36,607.50; Upper inner fence = 91,750 + 1.5 × 22,057 = $124,835.50
f) The maximum is larger than the upper fence but the minimum is not smaller than the lower fence.
g) The sum is increased so the mean is increased.
h) The median is the line in the interior of the box.
i) Variability is shown by the length of the box.
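
The fence arithmetic in parts d) and e) can be checked with a short sketch. The quartile values are the ones quoted in the solution; the function name `inner_fences` is an illustrative choice, not something defined in the exam.

```python
def inner_fences(q1, q3):
    """Return (IQR, lower inner fence, upper inner fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return iqr, q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Quartiles quoted in the solution to part d)
iqr, lower, upper = inner_fences(69_693, 91_750)
print(iqr, lower, upper)
```

Any observation below `lower` or above `upper` would be flagged as an outlier, which is the comparison made in part f).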

MT2012: Answer 4
a) r² = 0.735² = 0.5402 or 54%
b) Unchanged at 0.735
c) b1 = r(sy/sx) = 0.735(15,618/28,077) = 0.409; b0 = ȳ - b1·x̄ = 17,927 - (0.409)(85,318) = -16,968; ŷ = -16,968 + 0.409x
d) ŷ = -16,968 + 0.409($82,613) = $16,821; Residual = 11,863 - 16,821 = -$4,958
e) No; a prediction at $200,000 would be an extrapolation beyond the range of data.
f) Constant Variance (V-shape [...])
g) The two variables are categorical, not quantitative, so correlation is not appropriate.
h) Age
i) E. 0.00

Details and Comments:
a) This is the definition of r-squared.
b) The correlation coefficient has no units; it doesn't change if the measurement units are changed.
c) Straightforward application of least squares regression line formulas.
d) [...] the residual [...]
h) Age "precedes" Hours on Internet.
i) The best-fitting straight [...]
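
The regression arithmetic in parts a), c) and d) can be reproduced from the summary statistics alone. The sketch below assumes the rounding the question asks for (slope to three decimals, intercept to the nearest whole number); the variable names are illustrative.

```python
# Summary statistics from the OMG table in Question 4
r = 0.735                          # correlation between annual pay and cash bonus
mean_pay, sd_pay = 85_318, 28_077
mean_bonus, sd_bonus = 17_927, 15_618

r_squared = r ** 2                                 # part a): share of variability explained
slope = round(r * sd_bonus / sd_pay, 3)            # part c): b1 = r * sy / sx
intercept = round(mean_bonus - slope * mean_pay)   # part c): b0 = ybar - b1 * xbar

pay = 82_613
predicted = intercept + slope * pay                # part d): fitted bonus
residual = 11_863 - predicted                      # observed bonus minus fitted bonus

print(round(r_squared, 4), slope, intercept, round(predicted), round(residual))
```

The negative residual says this executive's actual bonus is below what the fitted line predicts for that level of pay.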

MT2012: Answer 5
a) 0.0668 or 6.68%
b) 156
c) Q1 = 101; Q3 = 139
d) Mean = 15; SD = 44.8
e) 0.63 or 63%
f) Angelina; Z-score for Angelina = -1.5, Z-score for Brad = -1.0

Details and Comments:
a) Standardize the X-value; Pr(X > 162) = Pr(Z > [162 - 120]/28) = Pr(Z > 1.5) = 0.0668. A score of 162 is 1.5 SDs above the average. Find the area to the right of 1.5.
b) Find the value of z that has an area of 10% to the right; then "unstandardize." z = 1.28; X = 120 + 1.28(28) = 155.8
c) Find z-values that have an area of 25% to the right and to the left; then unstandardize. Since the normal curve is symmetric, the z to the left is the negative of the z to the right. Q1: z = -0.675; X = 120 + (-0.675)(28) = 101.1. Q3: z = 0.675; X = 120 + (0.675)(28) = 138.9
d) Mean = 120 - 105 = 15; SD = √(28² + 35²) = 44.8
e) Pr(F - M > 0) = Pr(Z > [0 - 15]/44.8) = Pr(Z > -0.33) = 0.6293 or 0.63 or 63%
f) Z-score for Angelina = (78 - 120)/28 = -1.5; Z-score for Brad = (70 - 105)/35 = -1.0; Angelina did worse relative to the reference populations since her Z-score is more negative.
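
The normal-curve calculations in parts a) through e) can be checked without tables. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; results differ from the table-based answers only in the last digit, because the tables round z to two decimals.

```python
from math import sqrt
from statistics import NormalDist

female = NormalDist(mu=120, sigma=28)
male = NormalDist(mu=105, sigma=35)

p_above_162 = 1 - female.cdf(162)                      # a) area to the right of 162
score_top_10 = female.inv_cdf(0.90)                    # b) score exceeded by only 10%
q1, q3 = female.inv_cdf(0.25), female.inv_cdf(0.75)    # c) lower and upper quartiles

# d) difference of independent normals: means subtract, variances add
diff = NormalDist(mu=female.mean - male.mean,
                  sigma=sqrt(female.variance + male.variance))
p_female_higher = 1 - diff.cdf(0)                      # e) Pr(F - M > 0)

print(round(p_above_162, 4), round(score_top_10), round(q1), round(q3),
      round(diff.stdev, 1), round(p_female_higher, 2))
```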

MT2012: Answer 6
Part I.
a) Mean = np = 300 × 0.10 = 30.0; SD = √(npq) = √(300 × 0.10 × 0.90) = 5.2
b) Pr(25 ≤ X ≤ 35) = Pr([25 - 30]/5.2 < Z < [35 - 30]/5.2) = Pr(-0.96 < Z < 0.96) = 1 - 2(0.1685) = 0.663.
c) From the Empirical Rule, 3 SDs above the mean is extremely unusual; μ + 3σ = 30 + 3(5.2) = 45.6. Sales of 46 or more would be extremely unusual.

Part II.
a) Normal: Mean = 150 and SD = 30/√36 = 5
b) Pr(x̄ < 130) = Pr(Z < [130 - 150]/5) = Pr(Z < -4) < 0.0001. There is an extremely small probability of getting a sample mean this small.

Details and Comments:
Part I.
a) Use the mean and standard deviation of a count.
b) Use the normal sampling distribution of a count. (Note: Continuity correction was not needed, but if you used it correctly you would get an answer of 0.711.)

Part II.
a) Use the mean and standard deviation of a mean. (Note: The CLT applies here, but it is not necessary to say this in the answer.)
b) Use the normal sampling distribution of a mean.
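
Both parts of this answer can be sketched with the standard library. The code uses the unrounded SD (about 5.196) rather than the rounded 5.2, so the probabilities agree with the table-based answers to about two decimal places rather than exactly; variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Part I: count of magazine buyers among n customers, each buying with probability p
n, p = 300, 0.10
mean_count = n * p
sd_count = sqrt(n * p * (1 - p))

count = NormalDist(mean_count, sd_count)              # normal approximation to the count
p_25_to_35 = count.cdf(35) - count.cdf(25)            # b) without continuity correction
p_25_to_35_cc = count.cdf(35.5) - count.cdf(24.5)     # b) with continuity correction
unusual_sales = ceil(mean_count + 3 * sd_count)       # c) first whole number beyond 3 SDs

# Part II: sampling distribution of the mean of 36 daily revenues
xbar = NormalDist(150, 30 / sqrt(36))
p_below_130 = xbar.cdf(130)

print(round(mean_count, 1), round(sd_count, 1), round(p_25_to_35, 3),
      round(p_25_to_35_cc, 3), unusual_sales, p_below_130)
```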

MT2012: Answer 7
a) D.   b) A.   c) B.   d) C.   e) D.

Details and Comments:
a) n = (2.576)²(0.12)(0.88)/(0.04)² = 438
b) p̂ = 46/450 = 0.1022; 99% CI: 0.1022 ± 2.576 × √[(0.1022)(0.8978)/450] = 0.1022 ± 0.0368
c) Notice the wording and the use of the term "95% confident".
d) Values inside a confidence interval are likely values of the parameter. Evidence of a change or a difference depends on the target value being outside the CI.
e) Examine the CI formula; a higher confidence level requires a larger multiplier/critical value so the margin of error will be larger.
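
The sample-size and interval calculations in parts a) and b) can be sketched as follows. The critical value is computed from the standard normal rather than read from a table, and the variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

z99 = NormalDist().inv_cdf(0.995)      # two-sided 99% critical value, about 2.576

# a) sample size for a margin of error of 0.04, using the planning value p = 0.12
p_guess, margin = 0.12, 0.04
n_required = ceil(z99 ** 2 * p_guess * (1 - p_guess) / margin ** 2)

# b) 99% confidence interval from 46 reworked components out of 450
p_hat = 46 / 450
se = sqrt(p_hat * (1 - p_hat) / 450)
ci_low, ci_high = p_hat - z99 * se, p_hat + z99 * se

print(n_required, round(ci_low, 4), round(ci_high, 4))
```

Rounding the sample size up rather than to the nearest integer guarantees the margin of error is no larger than requested.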

MT2012: Answer 8
a) H0: p = 0.08 and Ha: p < 0.08
b) Formula and computed value: Z = (p̂ - p0)/√(p0·q0/n) = (0.06 - 0.08)/√[(0.08)(0.92)/450] = -1.56, where p̂ = 27/450 = 0.06
c) C.
d) B.

Details and Comments:
a) [...] the question asks whether the training was effective in lowering the defect rate.
b) Remember that the test statistic uses √(p0·q0/n) in the denominator, not √(p̂·q̂/n) as in the confidence interval.
c) Find the area to the left of -1.4 on the standard normal curve.
d) Since the P-value is not less than 0.05 the evidence is not statistically significant.
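
A minimal sketch of the one-proportion z-test in parts b) through d), using only the standard library. Note that part c) tells you to use -1.4 rather than the computed statistic, so the two are kept separate here; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p0, n, defects = 0.08, 450, 27
p_hat = defects / n

# b) one-proportion z statistic: the standard error uses the null value p0
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# c) lower-tail P-value for the statistic the question tells you to assume
p_value = NormalDist().cdf(-1.4)

# d) decision at the 1% significance level
reject_at_1_percent = p_value < 0.01

print(round(z, 2), round(p_value, 4), reject_at_1_percent)
```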

MT2012: Answer 9
a) B.   b) A.   c) D.   d) D.   e) A.

Details and Comments:
a) H0: μ = 15 and Ha: μ > 15, [...] since the question is about "increasing" the volunteer time.
b) t = (x̄ - μ0)/(s/√n) = (16.75 - 15)/(2.40/√24) = 3.572
c) The P-value is much smaller than 0.05 so reject the null hypothesis. The volunteer time is greater than 15 hours, so no incentive program is needed to get past 15 hours.
d) These are the assumptions/conditions for a one-sample t-test.
e) 16.75 ± 2.069 × 2.40/√24 = 16.75 ± 1.016
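
The t statistic and margin of error can be recomputed from the 24 observations listed in Question 9. This is a sketch under two assumptions: the data list is the sample as read from the question, and the critical value 2.069 (t with 23 degrees of freedom, 95% two-sided) is hard-coded from a t-table because the standard library has no t distribution. With these inputs the margin comes out near 1.014, which is still closest to option A.

```python
from math import sqrt
from statistics import mean, stdev

hours = [12, 13, 14, 14, 15, 15, 15, 16, 16, 16, 16, 16,
         17, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 22]

n = len(hours)
xbar, s = mean(hours), stdev(hours)    # sample mean and sample SD (n - 1 divisor)
se = s / sqrt(n)                       # standard error of the mean

t_stat = (xbar - 15) / se              # b) test statistic against mu0 = 15

t_crit = 2.069                         # assumed table value: df = 23, 95% two-sided
margin = t_crit * se                   # e) margin of error for the 95% CI

print(n, xbar, round(s, 2), round(t_stat, 3), round(margin, 3))
```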

END OF ANSWERS AND EXPLANATIONS TO MIDTERM 2012

Midterm Exam 2011


Notes: This exam has 9 questions. The duration is 2 hours. Books, notes, and calculators are allowed, but not computers, cellphones or on-line connectivity.

MT2011: Question 1    "First things first"

a) At the beginning of the term we asked all Commerce 291 students to complete our on-line survey. This survey was most likely designed to be:

_A. a census of all C291 students
_B. a random sample of business students
_C. a random sample of all C291 students
_D. a random sample of 2nd year UBC students
_E. all of the above

b) The survey asked a wide range of questions. For each variable, circle the description which best describes the type of data the variable represents.

Ethnic background        Categorical    Quantitative    Identifier
Height                   Categorical    Quantitative    Identifier
C290 grade               Categorical    Quantitative    Identifier
# hrs online per day     Categorical    Quantitative    Identifier

c) From the survey results, we can estimate that, on average, students spent 15.2 hours per week studying. This number seems high given that for a course load of 4 courses students spend 12 hours per week in the classroom and nearly half of the students reported doing paid work. What is the most likely explanation?

_A. the data are very skewed and the median is a better numerical summary
_B. the data are bimodal, the two groups are those that work and those that don't
_C. women study more than men
_D. none of the above

d) Unfortunately, not every C291-registered student responded to the survey. If it were true that students who didn't respond also spend less time studying, then our estimate of study time from the survey is:

_A. biased above the true average study time of C291 students
_B. biased below the true average study time of C291 students
_C. not a good estimate for study time of C291 students but we can't say whether it is too high or too low.
_D. a good estimate of average study time of C291 students

e) From the survey we find that the Commerce 290 Grade (call this variable X) has a symmetric, bell-shaped distribution. Also, 95% of the grades fall in the range 53 to 93. Use that information to compute the mean and standard deviation of X. Report to at least one decimal place.

Mean of X =                    SD of X =

MT2011: Question 2    "Stock answers are sufficient here"

a) The following data are the price-to-earnings ratios (P/E ratio) for a random sample of 25 stocks traded on the NYSE. The data values have been sorted from smallest to largest.

Data:   4   8  11  11  12  13  13  14  14  15  16  17  17  17  19
       21  22  22  24  24  26  28  33  35  39

The mean of these values is 19.0 and the standard deviation is 8.5.

i) Find the following:

Median =
Q1 =
Q3 =
IQR =
Inner fences =
Outliers =          (Note: outliers are [...]; if there are no outliers, write "None")

ii) Is the distribution symmetric or skewed? (Note: You do not have to draw a graph to answer this.) Circle your choice. Then give your reason.

Symmetric          Skewed

iii) Sketch a boxplot of these data. Use the version based on the five-number summary; do not use the modified version using fences.

b) Determine whether each statement is true or false. Circle your choice. No explanation is required.

1. If the mean and SD are equal for a measurement variable that only takes positive values, the distribution is symmetric.          True    False
2. If the mean and median are equal, the distribution must be normal.          True    False
3. If the mean and median are equal, the mode must also equal the mean and median.          True    False
4. The SD and IQR are always equal for a symmetric distribution.          True    False
5. The SD of a set of data values can never be zero.          True    False

MT2011: Question 3    "To-fu or not to-fu, that is the question"

Read the following survey design plan and then answer the questions after it.

Get Healthy, a producer of health foods, conducts a survey of the Lower Mainland to determine how receptive high school students would be to its TOFU BURGH product and what market potential (sales) it could expect. It plans the survey as follows:

i. From the list of all schools in the area, two groups are defined, public and private high schools, called PUBS and PRIS.
ii. From the PUBS, four schools are chosen randomly.
iii. From the PRIS, one school is chosen randomly.
iv. In the PUBS schools selected, on one day, researchers give every fifteenth student to exit the school a TOFU BURGH and a stamped, self-addressed postcard (like the one here).
v. In the PRIS school, researchers set up a stand outside the school and give a free TOFU BURGH and postcard to any student who comes to the stand.

[Figure: reply postcard addressed to the Get Healthy Foods Marketing Research Department, TOFU BURGH Study, 1236 S. E. Marine Drive, Vancouver, BC. It asks the student to circle how many TOFU BURGHs they would buy in a typical week (from 0 to "4 or more") and to mail the card before April 30, 2002.]

a) The overall survey sampling design planned by the company can best be described as:

_A. convenience sampling
_B. multi-stage sampling
_C. stratified sampling
_D. simple random sampling
_E. cluster sampling

b) In the PUBS selected, the sampling design uses:

_A. systematic sampling
_B. voluntary response strategy
_C. unacceptable bribery of students
_D. anecdotal responses

c) In the PRIS selected, the sampling design uses:

_A. systematic sampling
_B. voluntary response strategy
_C. unacceptable bribery of students
_D. anecdotal responses

d) One parameter of interest is likely to be:

_B. the number of high school students in the Lower Mainland
_C. the number of students who replied they would buy at least one TOFU BURGH in a typical week
_D. the proportion of students who replied they would buy at least one TOFU BURGH in a typical week

e) Which of the two samples is likely to have non-response bias?

_A. PUBS schools only
_B. PRIS school only
_C. Both PUBS and PRIS schools
_D. Neither will have non-response bias

MT2011: Question 4    "Unassociated questions about association ... how ironic"

Note: This question has three unrelated parts.

a) A business school conducted a survey of companies in its state. They mailed a questionnaire to small, medium-sized, and large companies. The rate of non-response is important in deciding how reliable survey results are. Here are the data on responses to this survey.

                Small    Medium    Large    Total
Response          375       160       40
No Response       225       240      160
Total             600       400      200

(i) What was the overall percent of non-response?

(ii) How is non-response related to the size of the business? Use percents to make your statement precise.

b) Investment reports now often include correlations. Following a table of correlations among mutual funds, a report adds, "Two funds can have perfect correlation, yet different levels of risk. For example, Fund A and Fund B may be perfectly correlated, yet Fund A moves 20% whenever Fund B moves 10%." Explain to someone who knows no statistics how this can happen.

c) A study shows that there is a positive correlation between the size of a hospital (measured by its number of beds, x) and the median number of days, y, that patients remain in the hospital. Does this mean that you can shorten a hospital stay by choosing a small hospital? Explain your answer choice.

Yes          No

Reason:

MT2011: Question 5    "Bart vs. Lisa does not refer to Simpson's Paradox"

a) At a well-known business school the grade point averages (GPA) of its 1000 undergraduates are normally distributed with mean 2.84 and standard deviation 0.40.

(i) What percentage of the undergraduates have GPAs below 2.00 (i.e. "on probation")?
Answer:

(ii) What GPA will be exceeded by only 20% of the student body?
Answer:

(iii) Compute the lower and upper quartiles, and the interquartile range for this distribution:

Q1 =          Q3 =          IQR =

b) Bart scores 725 on the mathematics section of the Scholastic Aptitude Test (SAT). In a reference population, SAT scores are normally distributed with mean 500 and standard deviation 100. Lisa scores 33 on the American College Test (ACT) mathematics test; ACT scores are normally distributed with mean 18 and standard deviation 6.

(i) What are the z-scores for each student?

Bart:          Lisa:

(ii) Circle either the name Bart or Lisa (above) based on who did better relative to the reference populations.

MT2011: Question 6    "Strength in numbers; numbers on strength"

a) To test the strength of building materials such as steel girders, engineers place increasing loads on the girders until they break. The pressure exerted by the load that eventually breaks the material is called the 'strength' of the girder. Generally speaking, the longer the girder, the less the strength. Your company makes steel girders. The engineer in charge of testing tells you that he has tested 10 girders to breaking point and has obtained data linking the length of each girder (in metres) to its strength (in kg per square centimetre). But his computer crashed just after he ran a regression analysis on the data and all he can remember is the lengths of the girders and a few strengths. He did manage to record the means and standard deviations of all the lengths and strengths and the r² of the regression, which was 0.719.

        (X) Length (m)    (Y) Strength (kg/cm²)
        1                 90
        1                 101
        2                 Lost
        2                 Lost
        3                 91
        3                 77
        4                 Lost
        4                 Lost
        5                 76
        5                 Lost
Mean    3.00              82.60
SD      1.49              10.72

Note: The means and standard deviations are calculated for the ENTIRE data set, including those that are missing.

(i) What is the correlation between length and strength? Report to three decimal places.

(ii) Work out a regression equation that predicts strength from length.

Equation:

(iii) You notice that the purchaser of your girders requires the 5 m girders to support an average load of 75 kg per square centimetre. Do you feel confident your girders will do that? Give a numerical rationale.

b) What is the correlation coefficient for the following three points in the X-Y plane? (STOP AND THINK BEFORE YOU START!)

X    1    3    5
Y    4    3    2

Answer:

c) An American study found that the correlation between two-year-old children's heights (measured in inches) and their weights (measured in pounds) was 0.46. What would the correlation coefficient be if you converted their heights to centimetres and weights to kilograms? (One inch = 2.54 cm and 1 pound = 0.454 kg.)

Answer:

d) An economist studied salaries of 321 bank employees with five or less years of employment in a national bank. He found that the relationship between years of service and salary was linear and that the regression equation predicting salary (in thousands of dollars) was: Salary = 21.5 + 3.1 × Years. He concludes that employees with 10 years of service should make an average salary of $52,500. Is his conclusion correct? If not, say why.

e) In part d) the economist has used the regression equation to make a prediction. Which of these numbers best measures the precision of this prediction?

_A. The standard deviation of y (sy)
_B. The standard deviation of x (sx)
_C. The square of the correlation coefficient (r²)
_D. The slope of the line (b1)
_E. The ratio of the two standard deviations (sy/sx)

f) An investigator measuring various characteristics of a large group of athletes found that the correlation coefficient between the weight of the athlete and the weight that the athlete could lift was r = 0.60. Determine whether each statement is true or false. Circle your choice.

(i) If an athlete gains 5 kg, he/she will be able to lift an additional 3 kg.          True    False
(ii) The more an athlete can lift, on the average the more that athlete weighs.          True    False
(iii) 36 per cent of the athlete's lifting ability can be attributed to his or her weight alone.          True    False
(iv) 60 per cent of the athlete's lifting ability can be attributed to his or her weight alone.          True    False

MT2011: Question 7    "Pack up all your troubles, and call it a day"

An important part of the customer service responsibilities of a telephone company relates to the speed with which troubles in residential service can be repaired. Suppose that past data indicate that there is a probability of 0.70 that service troubles can be repaired on the same day they are reported.

a) Suppose the company receives 100 trouble calls on a particular day. What is the approximate chance that 80% or more will receive same-day repairs?

b) Suppose it is also known that the repair time for a trouble call has a mean of 480 minutes and a standard deviation of 250 minutes. A random sample of 400 trouble calls was taken and the repair times recorded. Compute the probability that the mean of the 400 repair times is less than 500 minutes.

MT2011: Question 8    "Statistical analysis of a logo transformation"

An established clothing retailer, CHAP, is interested in customer response to a new logo. A survey randomly samples 100 customers; 55 of them say they would prefer the new logo to the previous one. However, CHAP will only change its logo if it is convinced that the newly designed logo is preferred by the majority (i.e. more than half) of its customers. Based on this information answer the following questions.

a) The sample estimate of the proportion of customers who prefer the newly designed logo over the previous one is:

_A. 0.55
_B. 55
_C. 100
_D. Not able to be determined from the information given

b) The standard error of this estimate is closest to:

_A. 0.0025
_B. 0.050
_C. 0.071
_D. 0.50

c) The 95% confidence interval for the true proportion of the customers who prefer the new logo over the previous one is closest to:

_A. 0.55 ± 0.098
_B. 0.55 ± 0.98
_C. 0.55 ± 0.0049
_D. 55 ± 9.8

d) How large a sample n would you need to estimate p, the proportion of people who prefer the newly designed logo over the previous one, with margin of error 0.05 with 99% confidence? Use the guess 0.5 as the value for p.

_A. 384
_B. 664
_C. 26
_D. 271

e) If a hypothesis test were conducted on these data, the test statistic would be 1.00. If the alternative hypothesis were one-sided, what would the P-value be?

_A. 0.0794
_B. 0.1587
_C. 0.3174
_D. 0.8413

f) Which of the following is a correct conclusion from the hypothesis test in part e)?

_A. Customers definitely prefer the new logo
_B. Customers definitely do not prefer the new logo
_C. There is not enough evidence to say customers prefer the new logo
_D. There is not enough evidence to say customers do not prefer the new logo

MT2011: Question 9

"The business of bus-ness"

You are the new Operations Manager of the local public transportation company and are especially interested in the reliability of bus service. You plan, on a monthly basis, to take a random sample of major bus stops and observe whether the buses depart on time or late and how late they are. (Buses never leave early since, if they arrive early, they wait until their departure will be exactly on time.)
a) The first month, you gather a random sample of 121 bus departures from a variety of times of day, days of the week, routes and locations. The sample has an average lateness of departure of 6.4 minutes with a standard deviation of 1.8 minutes. Which of the following is closest to a 95% confidence interval for the average lateness of departures for the entire bus system this month?

_ A. 6.4 ± 0.029
_ B. 6.4 ± 0.271
_ C. 6.4 ± 0.324
_ D. 6.4 ± 3.564


b) Which of the following would decrease the width of the confidence interval?

_ A. Reduce the confidence level
_ B. Increase the sample size
_ C. Reduce the sample standard deviation
_ D. All of the above

Five years ago, the system-wide mean lateness of departure was known to be 6.8 minutes. Using a 5% level of significance and the sample results of part a), carry out a hypothesis test to decide whether the system is improving; that is, whether the mean lateness has decreased from five years ago.

c) The appropriate null and alternative hypotheses are:

H0:

HA:

d) Give the formula for the appropriate test statistic and compute its value. (Show your work to the right.)

Formula:

Computed value:

e) Give a range in which the P-value is located.

f) From the P-value associated with this test statistic, which of the following is correct?

_ A. Do not reject H0 at the 10% significance level
_ B. Reject H0 at the 10% significance level but not at the 5% significance level
_ C. Reject H0 at the 5% significance level but not at the 1% significance level
_ D. Reject H0 at the 1% significance level

g) Using the 5% significance level, state your conclusion in a sentence that the bus company management can understand.

h) The distribution of lateness of departure is strongly skewed to the right. However, it is still appropriate to test the mean because:

_ A. The data are a simple random sample from the population of interest
_ B. The sample size is large enough for the Central Limit Theorem to apply
_ C. Since the sample is random, bus departures are independent of one another
_ D. All of the above

BONUS: In what century did the "equals" sign first appear in print?

_ A. 1300s _ B. 1400s _ C. 1500s _ D. 1600s _ E. 1700s _ F. 1800s _ G. 1900s

MT2011 - END OF QUESTIONS; ANSWERS AND EXPLANATIONS FOLLOW

ilr;2011'Answer
c)

A; d) B; e) Mean(E = Zi, Snpg; = 19 Details and Comments;


a

?tl::"?:#;,;:6ff#i,11f;;unt'Quantitarive;c2e0gradeeuantitative;

a) The goal was to survey the entire population of C291 students; that is the definition of a census.

P#T*' ;,?,iiifli; Jll,iff iijif.'iffi

h meas ured with uni ts

(cm,%o,and hrs,

;i,ji"!:"*rtit4x;*::*{r#la*mwi,hahighnumberof
ir,i.r, are missing ror a reason flriiq;ff:,ff#f:'r",Xi::T.-lu'i.'",, ?,Iffi:ni:ff :i',.#T[l;:,'.g*1.,f#,#:1,Rure):73t2(10)
MT20lll. Answer Zi) Median: f Z, ei';13,_et ,ni,,ine iuiu.,

a)

lnner fences G3's, 49 ir) The distribution

D.iI;.';H i*t.*"o

= 24,

Ieft = 11.
(0,40.5).J There are no outriers. i, quit, Jin r.", from rhe median.

ril6il;.un

10

20

30

40

b) All five statements are False.

a) r) With

Details and Comments:

fiTllg,T:i.t::ff g:.?6?iJf
,-?rti:T:Y, ii?Jh: sketch musr show

25 datapoints, the median is the 13th value. Thr

tt ts also acceprable to reporr ir-"r o oi*uy p/E the disrribu,il;;"#";; ro box and

sn:,:#ih,#::1,fi li,ry;ii:l,$li::I: rr,,il;#:: [:i#;*R) is negative, rhe.tAl;;


i#, oi.,,n,tiJn,u,
not needed in

;itn"
j.

*'.l.rt

is croser to the reft side 'r'i'i.""i."tiliii if#1f"ffif;ft"fjan

rhe skewnes

iT"m;:f.ffi.-,Hfllj;*?1"

to "work" so the dishibution is

Nor

symmerric.

is reason be true. 5. SD = 0 if all data values;.;,h"-r;;.

j. pere

*fr *:ixr#ti'l;:m;x;*ni",lx#j#i?y ;"ff#tr$;.::ffi no tor this to



MT2011: Answer 3
a) B or C; b) A; c) B; d) D; e) C

Details and Comments:
a) Both multi-stage sampling and stratified sampling are acceptable answers. Technically, multi-stage sampling is the preferred answer, since for PUBS, four schools are chosen randomly but the actual students are selected systematically.
b) Since every fifteenth student is selected, the selection is systematic, not random.
c) Since students are free to come, or not, to the stand, this is voluntary response.
d) Counts are not parameters because they are not adjusted for sample size; however, proportions are parameters.
e) Cards are handed out either to every fifteenth student or to volunteers; however, in each group not everyone who receives a card will mail the card in; that's non-response.

MT2011: Answer 4
a) (i) 52% (625/1200 = 0.52)
(ii) Non-response rates are: Small: 37.5%, Medium: 60%, Large: 80%. The larger the company, the higher the expected rate of non-response.
b) Correlation is not the same as slope. So a perfect correlation does not mean that the slope is 1; hence a 1 unit increase in x does not mean a 1 unit increase in y.
c) No: Larger hospitals are more likely to take more serious cases requiring longer length of stay.

Details and Comments:
a) (i) Sum across the columns to get the row totals of 575 Respondents and 625 Nonrespondents. Then divide by the overall total of 1200. (ii) Column percentages are needed here, not row percentages.
b) Remember the formula for slope: b1 = r(sy/sx). Even if r = 1, the slope is still the ratio of the SDs, which need not be equal.
c) Look for lurking variables to explain unusual or nonsensical correlations.

MT2011: Answer 5
a) (i) Pr(X < 2.00) = Pr(Z < [2.00 - 2.84]/0.40) = Pr(Z < -2.10) = 0.0179 or 1.79%
(ii) Z = 0.84; X = 2.84 + 0.84(0.40) = 3.18 (or 3.176)
(iii) Q1 = 2.57; Q3 = 3.11; IQR = 0.54
Q1 for Z = -0.675; X = 2.84 + (-0.675)(0.40) = 2.57
Q3 for Z = 0.675; X = 2.84 + (0.675)(0.40) = 3.11
IQR = 3.11 - 2.57 = 0.54
b) Bart: 2.25; Lisa: 2.50. Circle Lisa.

Details and Comments:
a) Remember to make sketches of the required areas so that you get the correct parts of the normal curve. In (i), standardize X to Z and find the corresponding area; in (ii) and (iii), begin with the area, find Z and "unstandardize" to get X.
b) Z-score for Bart = (725 - 500)/100 = 2.25; Z-score for Lisa = (33 - 18)/6 = 2.50. Lisa did better relative to the reference populations since her positive Z-score is higher.
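The normal-curve steps in Answer 5 can be checked numerically. The following is only a sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative, and the 0.80 area used for part (ii) is an assumption inferred from the Z of 0.84 quoted in the answer, since the question itself is not reproduced here.

```python
from statistics import NormalDist

# Population model taken from the answer key: mean 2.84, SD 0.40.
X = NormalDist(mu=2.84, sigma=0.40)

# (i) Standardize, then find the area to the left of 2.00.
p_below_2 = X.cdf(2.00)

# (ii) Start from an area, find Z, then "unstandardize" to get X.
# Assumption: the target area is 0.80, which corresponds to Z = 0.84.
x_80th = X.inv_cdf(0.80)

# (iii) Quartiles and interquartile range.
q1, q3 = X.inv_cdf(0.25), X.inv_cdf(0.75)
iqr = q3 - q1

# b) Z-scores relative to each reference population.
z_bart = (725 - 500) / 100
z_lisa = (33 - 18) / 6

print(round(p_below_2, 4), round(x_80th, 2))
print(round(q1, 2), round(q3, 2), round(iqr, 2))
print(z_bart, z_lisa)
```

Software carries more decimal places than the printed Z table, so the last digit can differ slightly from the hand calculation.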

MT2011: Answer 6
a) (i) r = -0.848 (Note that the correlation is negative!)
(ii) b1 = r(sy/sx) = -0.848(10.72/1.49) = -6.10


bo

=!

d) No

ll,i,;*l r;l 1,3",JR b) Perfect negative correlation' : r c) r:0.46, unchanged (coneration -? tpr"t p o"i, o"irir, ,t .y fall on a straight is i'nuaeant
-predictionJut
rb.

- bfi :82.6::t_g.10X3.00): 100.9; !:


j ;

-;.;;,^-'@!rv'rD'v
100.9

;j:; ;;:t m*:H:th:

_ 6.10x required 7 5 kgt cm2, you

y1.r..w;s

f) False, True, True, False

illiq[J"tiffi

J:'X*iin:ajn/;;;"iH'i,u,,or,i,prov-.,,q

line.) to thi -"ur*"..nt scales.) extrapolation beyond the range of data(that is,

*";iJ;; #:1i#,','#ni;:,:'f#??fi ii;a;;*;;;#?"'.bu'dingmighrralrdown b) Remember to makg.a ptoruerore doing the calculations. | (i) is farse becausS;d* 9ir[r *ir'i.un;;;;i;; fift of 3 kg onry on averase. A gain of 5 kg mighl eiieaoiiti#itift

Details and Comments: a) The minus sign is vital; the correlation is negative since the longer the girder, the strength' If you rotgtitttto'iiu-rlign the lower yo* iarculations of the slope, inlercept regression equation wiit and be itr.oo..t urra you up concluding that 5 m girders

gr."i..lriil;;i".

;1,H:li;ti|)||ft:;""I1{'r'.pp."
some people and less than 3 on average; (iii) uses the definition of ...

MT2011: Answer 7
a) Pr(p̂ > 0.80) = Pr(Z > 2.18) = 0.0145 or 1.45%
b) Pr(x̄ < 500) = Pr(Z < 1.60) = 0.945 or 94.5%
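Part b) can be reproduced as a short calculation. This is a sketch only: it takes the population standard deviation as 250 minutes, which is the value consistent with the Z of 1.60 in the answer key, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 480      # population mean repair time (minutes)
sigma = 250   # population SD (assumed; consistent with Z = 1.60 in the key)
n = 400       # sample size

# The SD of the sample mean has sqrt(n) in the denominator.
se_mean = sigma / sqrt(n)

# Standardize the cutoff of 500 minutes for the sample mean.
z = (500 - mu) / se_mean

# Area to the left of z on the standard normal curve.
prob = NormalDist().cdf(z)

print(se_mean, round(z, 2), round(prob, 3))
```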

Details and Comments: a) Use the sampling distribution of p̂. b) Use the sampling distribution of x̄ (i.e. remember the √n in the denominator). Both of these situations depend on

fr*lTr;;ough

ill',- *a u"'rlrrJo_ samples (r00 andz00,..rpr.tiu.r/. Remenrberio.ut. a skerch to ger


riJc.nout
the

it*r


MT2011: Answer 8
a) A; b) B; c) A; d) B; e) B; f) C

Details and Comments:
a) Reason: p̂ = 55/100 = 0.55
b) Reason: √[(0.55)(0.45)/100] = 0.050
c) Reason: 0.55 ± 1.96(0.050)
d) Reason: n = (2.576²)(0.5)(0.5)/(0.05²) = 664
e) Reason: Area to the right of 1.00 on the z-curve.
f) Reason: The P-value is not less than 0.05 (and not even less than 0.10).
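The reasons given for Answer 8 can be verified with a few lines of code. The sketch below uses only the Python standard library; the variable names are illustrative and not part of the exam.

```python
from math import sqrt, ceil
from statistics import NormalDist

n, successes = 100, 55
std_normal = NormalDist()

# a) Sample proportion.
p_hat = successes / n

# b) Standard error of the sample proportion.
se = sqrt(p_hat * (1 - p_hat) / n)

# c) Margin of error for a 95% confidence interval.
z95 = std_normal.inv_cdf(0.975)
margin_95 = z95 * se

# d) Sample size for margin of error 0.05 at 99% confidence, guessing p = 0.5.
z99 = std_normal.inv_cdf(0.995)
n_needed = ceil(z99**2 * 0.5 * 0.5 / 0.05**2)

# e) One-sided P-value for a test statistic of 1.00 (area to the right).
p_value = 1 - std_normal.cdf(1.00)

print(p_hat, round(se, 3), round(margin_95, 3), n_needed, round(p_value, 4))
```

Rounding the required sample size up, rather than to the nearest integer, guarantees the margin of error does not exceed the target.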

MT2011: Answer 9
a) C
b) D
c) H0: μ = 6.8; HA: μ < 6.8
d) t = (x̄ - μ0)/(s/√n) = (6.4 - 6.8)/(1.8/√121) = -2.44
e) 0.005 < P-value < 0.01
f) D. Reject H0 at the 1% significance level
g) There is strong evidence to say that the system is improving (or that mean lateness has decreased)
h) B or D (either is acceptable)

Details and Comments:
a) Reason: t* (120 df) = 1.980; CI = 6.4 ± 1.980(1.8/√121) = 6.4 ± 0.324
b) Examine the effect of each of these by referring to the formula for the CI.
c) This is a one-tailed alternative since the question asks whether mean lateness has decreased from five years ago.
d) Remember the minus sign on the test statistic.
e), f) & g) Reject H0 since the P-value is less than 0.01. Remember to state your conclusion in a sentence that answers the original question.
h) B is the most important of the three, but A and C are also needed for the test to work.
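The test statistic and confidence interval in Answer 9 follow from the same standard error. A minimal sketch, with illustrative variable names; the critical value 1.980 is copied from the answer key (t table, 120 df) rather than computed, because the standard library has no t distribution.

```python
from math import sqrt

n, xbar, s = 121, 6.4, 1.8   # sample size, sample mean, sample SD (minutes)
mu0 = 6.8                    # mean lateness five years ago (null value)

# Standard error of the sample mean.
se = s / sqrt(n)

# d) One-sample t statistic; negative because the sample mean is below mu0.
t_stat = (xbar - mu0) / se

# a) 95% confidence interval, using t* = 1.980 from the answer key.
t_star = 1.980
margin = t_star * se
ci = (xbar - margin, xbar + margin)

print(round(t_stat, 2), round(margin, 3), tuple(round(v, 3) for v in ci))
```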


BONUS: C. The "equals" sign first appeared in print in 1557.

MT2011 - END OF ANSWERS AND EXPLANATIONS

Midterm Exam 2010

Notes: This exam has 9 questions. The duration is 2 hours. Books, notes, and calculators are allowed, but not computers, cellphones or on-line connectivity.

MT2010: Question 1 "Mittens, means, and medians"


a) The Hudson's Bay Company was the official retailer of Olympics merchandise, including the very popular red mittens. Their database included information on each sale made to customers who paid by credit card (Visa only). Some of the variables they collected are listed below. Decide whether each variable would, for analysis, be most usefully considered as categorical, quantitative or neither.

Total amount of the sale ($)        Categorical   Quantitative   Neither
Country of origin on credit card    Categorical   Quantitative   Neither
Gender of the customer              Categorical   Quantitative   Neither
Visa credit card number             Categorical   Quantitative   Neither

b) Credit card customers were divided into two groups: Canadian residents and visitors to Canada. The average amount spent by all Canadian residents was $200. The average amount spent by all visitors to Canada was $300. What must be true about the average amount spent by all customers?

_ A. It must be $250
_ B. It must be larger than the median expenditure
_ C. It could be any number between $200 and $300
_ D. It must be larger than $250

c) A sample of 500 cash sales had a mean of $20 and a standard deviation of $40. The histogram of the data would most likely be:

_ A. skewed to the left (i.e. long left-hand tail)
_ B. approximately symmetric
_ C. skewed to the right (i.e. long right-hand tail)
_ D. bimodal

d) Which of the following is likely to have a mean that is smaller than the median?

_ A. The salaries of all National Hockey League players
_ B. The grades of students (out of 100) on a very easy exam on which most score very high or perfectly, but a few do very poorly
_ C. The prices of homes in Vancouver
_ D. The grades of students (out of 100) on a very difficult exam on which most score poorly, but a few do very well

e) Here is the frequency distribution of the ages of a sample of 100 employees of the Hudson's Bay Company.

Age (years)   Frequency
15-19              2
20-24             10
25-29             19
30-34             27
35-39             16
40-44             10
45-49              6
50-54              5
55-59              3
60-64              2
Total            100

(i) What percentage of the employees is 50 or older?

(ii) The median age of the employees is:

_ A. About 40
_ B. Between 30 and 34
_ C. Between 40 and 49
_ D. None of the above

(iii) The mean age of the employees is:

_ A. About 34 because about half are younger than 34 and half are older
_ B. Above the median because the distribution is approximately symmetric
_ C. Above the median because the distribution is skewed to the right
_ D. None of the above

f) Based on the following figure, decide whether each of the statements below the figure is more likely to be True or False. (Note: House income means "total household income" and is referred to simply as "income" in the statements.)

[Figure: household income (vertical axis, 0 to 350,000) by car brand: BMW, Cadillac, Lexus, Lincoln, Mercedes]

Mercedes buyers have the highest variability in income.        True   False
For each car type, the incomes are reasonably symmetric.       True   False
There is a positive correlation between income and brand.      True   False

MT2010: Question 2 "Catching some ZZZs"

a) Consider a standard normal random variable, Z (i.e. with mean 0 and standard deviation 1). Find the median, lower and upper quartiles and interquartile range (IQR) of Z.

Median of Z:

Lower quartile (Q1):

Upper quartile (Q3):

Interquartile Range:

b) What percentage of values of Z lie outside 1.5×IQR on each side of the median? That is, find the total percentage below "Median - 1.5×IQR" or above "Median + 1.5×IQR".

c) Draw a boxplot that would represent data obtained from a large sample of values of Z.

d) This part is unrelated to parts a), b) and c). Scores on the Wechsler Adult Intelligence Scale (WAIS), a standard IQ test, are approximately normally distributed for all age groups; however, the means and standard deviations of scores differ across different age groups. For the 20 to 34 age group, the mean is 110 and the standard deviation is 25, while for the 60 to 64 age group, the mean is 90 and the standard deviation is 25. Sarah is 29 and her mother Ann is 62. Sarah scores 135 on the WAIS while Ann scores 120. Which of the two has the higher score relative to her age group? Explain your choice with appropriate calculations.

_ Ann      _ Sarah

MT2010: Question 3 "Contender for gender offender"

A university offers only two degree programs, one in Engineering and one in English. Admission to the programs is competitive, and a women's group suspects discrimination against women in the admissions process. They obtain the following data from the university, a two-way classification of all applicants by gender and admissions decision.

                 Male   Female
Admitted          35      20
Not Admitted      45      40

a) Is there evidence of an association between the applicants' gender and success in obtaining admission? Why or why not?

b) The university replies that there is no discrimination. In its defence, it produces a three-way table that classifies applicants by gender, admission decision AND the program to which they applied.

                 Engineering         English
                 Male   Female       Male   Female
Admitted          30      10           5      10
Not Admitted      30      10          15      30

Is there an association between admission rates and gender in either program? Explain why or why not.

c) Are the answers in parts a) and b) contradictory? If so, how can you explain the contradiction?

d) After disregarding gender, are admission rates different in the two programs? Support your conclusion with an appropriate two-way table (i.e. admission decision by program).

MT2010: Question 4 "Beauty is in the eye of the frolder"

On a recent trip to Mars, scientists discovered a colony of small creatures that they named frolders. Due to the speed and agility of the frolders, the scientists could only capture five specimens to bring back to Earth to study. One scientist suspects the weight of the frolder may be related to the number of eyes it has. The following table shows the weight and number of eyes for each of the five specimens:

Specimen ID       A101   A102   A103   A104   A105
Weight (kg)          2      8      4     15      6
Number of Eyes       2     11      5     17      5
a) Plot these data below.

[Blank plotting grid titled "Frolder Study": horizontal axis Weight (kg), 0 to 20; vertical axis 0 to 20]

b) Briefly describe the association (must be brief for full marks!)

c) Which of the following values is the correct correlation coefficient for this data? Note: You can reason this out without doing the calculation.

_ A. r = 0.5
_ B. r = 0.975
_ C. r = 0
_ D. r = -0.954
_ E. r = -0.5

d) Looking at the scatterplot, is the correlation coefficient an appropriate measure? Why or why not?

e) A journalist reporting on this study claims that being heavier causes a frolder to grow more eyes. What is wrong with this statement?

f) Do you think these five frolders represent a random sample? Why or why not?

MT2010: Question 5 "Wires, dam wires, and electricians"

Electrical wires can corrode over time. And wires used near hydroelectric dams can corrode more quickly because of the extra moisture in the air. Corrosion rates (measured in hundredths of mils) are generally known for various types of wire, but electricians would like to be able to predict the corrosion rates near dams. Corrosion rates for 30 types of wire were measured in normal use and at dams to assess the relationship. A linear regression model can be constructed with wires in normal use as the x variable and the same wires used at dams as the y variable. The following scatterplot shows the data:
[Scatterplot: corrosion rate at dams (vertical axis, 0 to 1200) versus Wire (normal use) on the horizontal axis]

a) In this study, the response variable is:

_ A. Corrosion rate for a dam wire
_ B. Corrosion rate for a wire in normal use
_ C. Either rate; it does not matter which is considered the response
_ D. Neither; the instrument used to measure corrosion is the response variable

b) Is linear regression appropriate here? Choose the single best statement.

_ A. Yes, the scatterplot is straight enough
_ B. No, there is not enough scatter
_ C. No, there is too much scatter
_ D. Yes, there are no outliers

c) Summary statistics are presented below. Use them to calculate the regression line. Show the formulas and your work. Report your final answers to three decimal places.

r = 0.8691
x̄ = 304.6667     sx = 196.4466
ȳ = 554.0000     sy = 286.6104

d) A new type of wire has a corrosion rate measure of 555. What does the model predict for the corrosion measure of this type of wire used at a dam?

e) One of the data points is (220, 245). What is the value of the residual for this point?

f) What fraction of the variation in y is accounted for by the model?

g) Can the regression line be used to reliably estimate the dam wire corrosion rate for a wire which has a rate of 2500 mil under normal use? Give a reason.

_ Yes      _ No

Reason:

h) Fill in each blank with the letter of the ending that fits best.

(i) If the x and y variables are switched, _
(ii) If the units are changed for both x and y variables, _
(iii) If the units are changed for just the x variable, _
(iv) If a constant is added to the y variable, _

Endings:

A. ...the slope will change but the averages and standard deviations will not change.
B. ...sx will change but ȳ will not change.
C. ...the data will be normally distributed.
D. ...only the correlation will change.
E. ...the correlation, slope, and standard deviations will remain the same.
F. ...the correlation and slope will both change.
G. ...the slope will change, and sx and sy will also change.

MT2010: Question 6 "Putting the pedal to the medal"

Retain full precision throughout your calculations but write down only two decimals for your final answer.
For parts a) and b), assume that the weights of the gold medals, silver medals, and ribbons are all independent (especially since we have not learned how to deal with such questions otherwise ! ).
a) Each medal made for the recent Olympics is unique. Ours were the first Olympic Games for which the medals have not been identical! Complete gold medals (that is, the medal plus the ribbon) weigh 48 grams on average with a standard deviation of 6 grams. The ribbons that are attached to the medals weigh 8 grams on average with a standard deviation of 2 grams. Find the mean, variance and standard deviation of the weights of the gold medals without their ribbons.

b) Complete silver medals (i.e. medal plus ribbon) weigh 38 grams on average with a standard deviation of 5 grams. Find the mean, variance and standard deviation of a pair of complete medals (gold and silver) combined.

c) You were instructed to assume that the weights of the gold medals, silver medals, and lengths of ribbon are all independent. Is this a reasonable assumption? Explain why or why not in one brief sentence at most.

d) In some winter Olympic events, such as the snowboard parallel giant slalom, the winner is the rider with the best combined time over two runs. In some summer Olympic events, such as the javelin throw, the winner is the athlete with the best single distance out of four tries. Generally speaking, does the sum of two random times or the maximum of four random distances have greater variability?

_ A. Sum of two random times
_ B. Maximum of four random distances
_ C. Cannot say because time and distance are unrelated variables

Why? Explain in one sentence maximum.

MT2010: Question 7 "The food of the gods"

Chocolate bars produced by a certain machine are labeled 240 grams to comply with advertising rules and regulations. However, the distribution of the actual weight of these chocolate bars is claimed to be normal with a mean of 243 grams and a standard deviation of 3 grams.

a) Approximately what percentage of all chocolate bars produced by this machine would be expected to be between 240 and 246 grams?

b) A quality control manager initially plans to take a random sample of size n from the production line. If he were to double his sample size to 2n, the standard deviation of the sampling distribution of the sample mean x̄ would be multiplied by:

_ A. 1/2
_ B. 1/√2
_ C. √2
_ D. 2

c) The quality control manager plans to take a random sample of size n from the production line. How big should n be so that the sampling distribution of x̄ has standard deviation 0.3 grams?

_ A. 10
_ B. 100
_ C. 1000
_ D. Cannot be determined unless we know that the population is normal.

d) If the quality control manager takes a random sample of nine chocolate bars from the production line, what is the probability that the mean weight of the nine sample chocolate bars will be less than 240 grams?

_ A. 0
_ B. 0.0013
_ C. 0.1587
_ D. 0.9987

Show your work:

MT2010: Question 8 "Shooters for the shooters?"

A radio talk show host with a large audience is interested in the proportion p of adults in his listening area who think the drinking age should be lowered to 18. To find out, he poses the following question to his listeners: "Do you think that the drinking age should be reduced to 18, in light of the fact that 18-year-olds are eligible for military service?" He asks listeners to phone in and vote "yes" if they agree the drinking age should be lowered and "no" if not. Of the 100 people who phoned in, 70 answered "yes".
a) The sample estimate, p̂, of the proportion of adults who think the drinking age should be reduced is:

_ A. 70
_ B. 0.70
_ C. 0.69
_ D. Not able to be determined from the information given

b) The standard error of this estimate is closest to:

_ A. 0.089
_ B. 0.046
_ C. 0.0021
_ D. 0.0045

c) The margin of error for a 90% confidence interval is closest to:

_ A. 0.046
_ B. 0.075
_ C. 0.090
_ D. 0.690

d) How large a sample n would you need to estimate p with margin of error 0.01 with 95% confidence? Use the guess 0.6 as the value for p.

_ A. 6768
_ B. 9220
_ C. 9502
_ D. 9596

e) Which of the following assumptions for inference about a proportion using a confidence interval are violated in this case?

_ A. The data are a simple random sample from the population of interest
_ B. The success/failure condition
_ C. A third choice of no opinion needed to be included
_ D. There appear to be no violations

MT2010: Question 9 "Going postal"


A simple random sample of 100 canada Post emproyees these emplovees had;"rk.d;; found r]* 0""" ,.*rri-#uJ o.v*r,that the average time or2'0 vears. oo these data;;;;#ffi.nr. with standari deviation ,r,ur-L;;# lengrh of time thar the

fl:tr'il:"lui.|.t#:"Jr"i|;H.:"1ffi r'uu"*ot;Hffi
a) Give the appropriate null and alternative hypotheses.

b) Give the formula for the appropriate test statistic and compute its value.

c) Give a range in which the P-value is located.

t"l?:d;trJ:;il:x^;;:tj["r;ti"J;lfr
sie;hcrr;;i#;;;" ve! uv! _ D. Reject lroat the t%o,ifiiinrun..
i;r;i
fl B' Reject Ho atttr. rol' c. Rejecr,,Hp ar the s%

ri*in.iir.-u"t;;;,h.

I;;:J*3;orthero,,owingisco*e*?
5% significance level ther%significance rever

"y8lf"ff

fnlfl fl:J,ffi n:?,JJillJ;ffi :"','usioninonecrearrywordedsentence


nme the population orpostar emproyees have spenr wirh rhe

l"lli:"t#;ir'Jj*rn:T
_ A. 7.0 l.0.2 _ B. 7.0 + 0.4 _ C. 7.0 x 2.0 _ D. 7.0 + 4.0

;Tf:;'#Jj"tT::ff*ffi
Ring I

Bonus Question: Just for Fun and Bragging Rights over the r 7 davs winter.orympics you saw the olympic rings rogo "111r times. In the officiar countless logo, not rr,. roro version, each of the five

.rgr.-.il; t;;,i,#

*ilffi];,f H:[:sberrheorder"rir,.iiffi in,n.


Ring 2

RG'

Ring 4

RG'

END OF MIDTERM 2010 - ANSWERS AND EXPLANATIONS FOLLOW

MIDTERM EXAM 2010: ANSWERS AND EXPLANATIONS

MT2010: Answer 1
a) Quantitative; Categorical; Categorical; Neither
b) C
c) C
d) B
e) (i) 10% (ii) B (iii) C
f) True; True; False

Details and Comments:
a) Although the text considers an identifier variable, such as a Visa credit card number, a type of categorical variable, it is useless in that form; it is best thought of as Neither. You aren't likely to do any analysis on the Visa card number!
b) The average must lie between the minimum and maximum, but depending on skewness it could be smaller or larger than the midpoint or median.
c) The minimum value is 0 but the maximum can be very large, hence right-skewed.
d) All except B are likely to have a long right-hand tail, where the mean exceeds the median.
e) (i) (5+3+2)/100 = 10% (ii) 31% of values (2+10+19) are less than 30; including the 30-34 interval increases the cumulative count to 58% (2+10+19+27).
f) Incomes are not exactly symmetric, but for all practical purposes and especially for data analysis, they certainly are reasonably symmetric.
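The cumulative-count reasoning in part e) can be sketched in code. This is only an illustration: the frequencies are those read from the Question 1 table (they sum to 100), and the variable names are hypothetical.

```python
# Age class -> frequency, as read from the frequency table in Question 1 e).
freq = {
    "15-19": 2, "20-24": 10, "25-29": 19, "30-34": 27, "35-39": 16,
    "40-44": 10, "45-49": 6, "50-54": 5, "55-59": 3, "60-64": 2,
}
total = sum(freq.values())

# (i) Percentage of employees aged 50 or older.
pct_50_plus = 100 * sum(
    count for label, count in freq.items() if int(label.split("-")[0]) >= 50
) / total

# (ii) The median class is the first class whose cumulative count reaches half the total.
cumulative = 0
median_class = None
for label, count in freq.items():
    cumulative += count
    if cumulative >= total / 2:
        median_class = label
        break

print(total, pct_50_plus, median_class, cumulative)
```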

MT2010: Answer 2
a) Median = 0; Q1 = -0.675; Q3 = 0.675; IQR = 1.35
b) Prob. = 2×Pr(Z > 1.5×1.35) = 2×Pr(Z > 2.025) = 2×0.0215 = 0.0430 or about 4%
c) The boxplot is symmetric around 0, with the ends of the box at Q1 and Q3 at -0.675 and 0.675 (from part a). Since Z has no limits, the whiskers can't extend to the minimum and maximum. Instead, use inner fences; the whiskers should extend to -2.7 and 2.7.
d) Ann has a higher rank. Ann's z-score = (120 - 90)/25 = 1.2; Sarah's z-score = (135 - 110)/25 = 1

Details and Comments:
a) Z is symmetric so the median equals the mean. It is acceptable to report answers to two decimal places: for Q1: -0.68 or -0.67; for Q3: 0.68 or 0.67; for IQR: 1.36 or 1.34.
b) If you used IQR of 1.36, the probability is 0.0414. If you used IQR of 1.34, the probability is 0.0444.
c) Since the distribution is unbounded, any reasonable choice of whiskers is acceptable.
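The quartiles, inner fences and tail percentage for Z can be checked with the standard library. The sketch below is illustrative only; software carries more decimals than the table values (for example 0.6745 rather than 0.675), so the last digit may differ slightly from the key.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, SD 1

# a) Median, quartiles and IQR.
median = Z.inv_cdf(0.50)
q1, q3 = Z.inv_cdf(0.25), Z.inv_cdf(0.75)
iqr = q3 - q1

# b) Total area more than 1.5 x IQR away from the median (both tails).
outside = 2 * (1 - Z.cdf(median + 1.5 * iqr))

# c) Inner fences used for the boxplot whiskers.
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# d) Z-scores relative to each age group.
z_ann = (120 - 90) / 25
z_sarah = (135 - 110) / 25

print(round(q1, 3), round(q3, 3), round(iqr, 2), round(outside, 3))
print(round(lower_fence, 1), round(upper_fence, 1), z_ann, z_sarah)
```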


MT2010: Answer 3
a) Yes: Percent of males admitted = 35/80 = 0.4375 or 43.75%. Percent of females admitted = 20/60 = 0.33 or 33%.
b) No: Half of engineers of either sex are admitted. One-quarter of English students of either sex are admitted.
c) The English program is harder to get into, and that is where more females applied. This is an illustration of Simpson's paradox.
d)
                 Engineering   English   Row Total
Admitted              40          15         55
Not Admitted          40          45         85
Column Total          80          60        140

Admitted to Engineering: 40/80 = 0.50 or 50%
Admitted to English: 15/60 = 0.25 or 25%

Details and Comments: When a two-way table is provided, it is useful to add the row totals and the column totals. They are needed to compute conditional probabilities. Simpson's paradox is one of the most revealing illustrations of the need to dig deeper into the relationship between categorical variables. What might appear to be the result for a two-way table may well be reversed when a third variable is incorporated.
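The reversal described in Answer 3 can be seen by computing the admission rates at each level of aggregation. The sketch below is illustrative; the cell counts are those read from the three-way table in Question 3, and the helper name `rate` is hypothetical.

```python
# (admitted, applicants) for each program and gender, read from the three-way table.
counts = {
    ("Engineering", "Male"): (30, 60),
    ("Engineering", "Female"): (10, 20),
    ("English", "Male"): (5, 20),
    ("English", "Female"): (10, 40),
}

def rate(cells):
    """Admission rate after pooling a list of (admitted, applicants) cells."""
    admitted = sum(a for a, _ in cells)
    applicants = sum(n for _, n in cells)
    return admitted / applicants

# a) Pooled over programs: rates differ by gender.
by_gender = {
    g: rate([v for (_, gender), v in counts.items() if gender == g])
    for g in ("Male", "Female")
}

# b) Within each program: rates are identical for both genders.
within_program = {key: a / n for key, (a, n) in counts.items()}

# d) Pooled over gender: rates differ by program.
by_program = {
    p: rate([v for (program, _), v in counts.items() if program == p])
    for p in ("Engineering", "English")
}

print(by_gender)
print(within_program)
print(by_program)
```

The pooled gap arises only because a larger share of female applicants applied to the program with the lower admission rate.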

MT2010: Answer 4
a) [Scatterplot: Number of Eyes (vertical axis, 0 to 20) versus Weight (kg) (horizontal axis, 0 to 20)]
d) Yes; there is a clear linear relationship.
e) Correlation does not imply causation.
f) No. They were the slower ones, or the easier ones to catch.

Details and Comments:
c) Since the correlation is strong and positive, only 0.975 is a sensible choice for r.
d) Correlation coefficients require linear relationships.
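Although part c) is meant to be reasoned out without calculation, the value of r can be confirmed directly from the five specimens. A minimal sketch, assuming the data are weights 2, 8, 4, 15, 6 kg and eye counts 2, 11, 5, 17, 5 (reading the smudged table entries as 11 and 17); variable names are illustrative.

```python
from math import sqrt

weight = [2, 8, 4, 15, 6]   # kg
eyes = [2, 11, 5, 17, 5]    # number of eyes (assumed reading of the table)

n = len(weight)
mean_x = sum(weight) / n
mean_y = sum(eyes) / n

# Sums of squares and cross-products about the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weight, eyes))
sxx = sum((x - mean_x) ** 2 for x in weight)
syy = sum((y - mean_y) ** 2 for y in eyes)

# Correlation coefficient.
r = sxy / sqrt(sxx * syy)

print(mean_x, mean_y, sxy, sxx, syy, r)
```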

MT2010: Answer 5
a) A
b) A
c) b1 = r(sy/sx) = 0.8691(286.6104/196.4466) = 1.268
b0 = ȳ - b1x̄ = 554.0000 - 1.268(304.6667) = 167.683 (or 167.684)
ŷ = 167.683 + 1.268x (or ŷ = 167.684 + 1.268x)
d) ŷ = 167.683 + 1.268(555) = 871.423 (or 871.424)
e) ŷ = 167.683 + 1.268(220) = 446.643 (or 446.644)
Residual = e = 245 - 446.643 = -201.643 (or -201.644)
f) r² = 0.8691² = 0.755
g) No; this is extrapolation far beyond the range of data.
h) (i) A (ii) G (iii) B (iv) E

Details and Comments:
a) Response variable is on the vertical axis.
c) Beware of round-off error. Carry all available decimal places in the intermediate calculations, but report fewer as instructed.
d) Simple substitution.
e) Use the definition of residual: observed minus predicted.
f) This is the definition of r-squared.
g) Although it is mathematically correct to substitute 2500 into the regression equation, extrapolation far beyond the range of data is a major misuse of regression.
h) Examine the formulas for slope, intercept and correlations and test out the effect of the suggested changes. For (ii), correlation does not depend on units, but slope and SDs do change if both variables change. For (iv), the scatterplot is simply moved straight up, so SDs, slope, and correlation are not affected.
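The regression calculations in Answer 5 can be reproduced from the summary statistics alone. The sketch below carries full precision throughout, which is why the intercept lands on the "167.684" variant quoted in the key; the function name `predict` is illustrative.

```python
# Summary statistics from Question 5 c).
r = 0.8691
x_bar, s_x = 304.6667, 196.4466   # wires in normal use
y_bar, s_y = 554.0000, 286.6104   # same wires used at dams

# c) Least squares slope and intercept.
b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar

def predict(x):
    """Predicted dam-wire corrosion rate for a normal-use rate x."""
    return b0 + b1 * x

# d) Prediction for a wire with normal-use rate 555.
pred_555 = predict(555)

# e) Residual = observed minus predicted, for the data point (220, 245).
residual = 245 - predict(220)

# f) Fraction of variation in y accounted for by the model.
r_squared = r ** 2

print(round(b1, 3), round(b0, 3), round(pred_555, 2), round(residual, 2), round(r_squared, 3))
```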

MT2010: Answer 6
a) Mean(X-Y) = Mean(X) - Mean(Y) = 48 - 8 = 40
Var(X-Y) = Var(X) + Var(Y) = 36 + 4 = 40
SD(X-Y) = √40 = 6.32
b) Mean(X+Y) = Mean(X) + Mean(Y) = 48 + 38 = 86
Var(X+Y) = Var(X) + Var(Y) = 36 + 25 = 61
SD(X+Y) = √61 = 7.81
c) Yes: Heavier ribbons are not expected to be found only on heavier medals.
d) A

Details and Comments:
a) and b) The variance of a sum or difference of two independent variables is always the sum of the individual variances. Remember that calculations are not done with standard deviations; combine variances first and then take the square root.
d) The sum of two random variables generally has greater variability than a single random variable. However, if the question had asked about the "mean" of two measures rather than the sum, then the mean would have lesser variability than a single measure.
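The rule "combine variances first, then take the square root" can be written out as a short calculation. This sketch simply follows the answer key's independence assumption; the helper name `combine` is hypothetical.

```python
from math import sqrt

def combine(mean_x, sd_x, mean_y, sd_y, sign):
    """Mean, variance and SD of X + sign*Y for independent X and Y (sign is +1 or -1)."""
    mean = mean_x + sign * mean_y
    variance = sd_x ** 2 + sd_y ** 2   # variances add for both sums and differences
    return mean, variance, sqrt(variance)

# a) Gold medal without its ribbon: complete medal (48, SD 6) minus ribbon (8, SD 2).
mean_a, var_a, sd_a = combine(48, 6, 8, 2, sign=-1)

# b) Pair of complete medals: gold (48, SD 6) plus silver (38, SD 5).
mean_b, var_b, sd_b = combine(48, 6, 38, 5, sign=+1)

print(mean_a, var_a, round(sd_a, 2))
print(mean_b, var_b, round(sd_b, 2))
```

Adding the standard deviations directly (6 + 2 or 6 + 5) would overstate the spread, which is the error the comment above warns against.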

68% bl 7 ;";ri' ror d): ;'l;


a)

MT20l0: Answer

'

240)

rin.i","#"d;i?;1,[XliffJ*,1l]l:l'.rro, * rj:

?e!?ik and Commens; a) Use the 6g/95/r

=;]; - ,oo-rIt\ p,o, = pr (z<


243 x3 =

-3)

: 0 00r 3

d) since the question

:.:tft Y,lli;:tg':"iru,ruX1fi :y roffi =):;d:7;',:#'9{ygl;{:l:: la,ot"it^",'i,""L,,


i,
uuout

liif,'.T.#:oi,ntmffi
dardizationuses

(240,246)

a)B

MT20l0: Answer
b)

d."rr*t, *.un, ,rr" ,rt


e)A

B c)B d)B i
:70/100

o/16 = 3Ni.

aJ Keason:

Details and Comments:

: 0.70

_ -''' ry^,^i:!m; :ll."r"'_.,rj


b)1= 7'o-7.s

d) Reason: = e) The data arc'a""nuJni.iri. .iioiii,ro.ot21= e220 since peopte cnoose "'vw vvupro choose whefher or not to phone in! MT20l0: Answer a) He:p =7.5;Hu:

fr&?i;;ffi:soo46 , (i.66
p*7.5

9 ..F

il;JJ

iTfi{r:.il'iJH:io Ds t o; ffif[i./+,ff:i?;'fr u;;'"'


and Comments; a) This is a t_test of a ,ingte mean. Th

sav that the

d)

c
'rD]
have worked for

m#n rength vr of time rr'Ie emproyees

lepn

;;,'*?" f)rhe;;ilffi;1.H:*:f j:*Ti1i*t?iil,::f"::Ttili1i


$H.":*:,;rffiT:l
-I = 99 df Use the vatue'fo,

o statistic is negative, uaurln'tt"rrfr""gi,fri'uutr* t-ta6le you rook up the t;;*i1,]it[test O Since the p-value" i, r.otirin;rr.r " the null hypothesis rl?r-0r,, at the 5%o revet;but since

") p.ositive

:il:l'ff ti'.:irix",;tt'l?l#Tr,T,n,, ;:ii?,tt,;*lli;*;*#;ir;i'Ifi ll * Note rhar


fr

jrj;j*,:_.",,,""

Question: Olympic Rings colours
RED   BLUE   BLACK   YELLOW   GREEN
[...] the table.

END OF ANSWERS AND EXPLANATIONS TO MIDTERM 2010


Part B. Past Years' Midterm Exams


A collection of questions from midterm exams of past years, with answers and explanations

This section presents questions from midterm exams prior to 2010. Since course syllabi, textbooks, order of topics, and even notation, have changed, not every question from past exams is relevant today. So the exam questions have been reorganized by broad topic area as follows:

Section A: Descriptive Statistics
Section B: Scatterplots, Association, Correlation, Least Squares Regression
Section C: Normal Curve, Sampling Distributions, Combining Random Variables
Section D: Introduction to Inference, Confidence Intervals, Hypothesis Tests
Section E: Miscellaneous

Questions in each topic area are arranged from the most recent year and go back in time. Following the questions in each topic area is a set of answers and explanations/comments about the answers. The comments give details of calculations and common errors made by students.

Since the teaching of any course is dynamic and always undergoing change, there may still be some terminology or notation or even a few parts of questions which are unfamiliar to you. If you are unclear whether a particular question or topic is relevant to the current year, please ask your instructor.

SECTION A: DESCRIPTIVE STATISTICS


Question A1 (MT2009-Q1) "Not yet an Olympic Sport"

[...] male and female [...]
[Figure: boxplots of male and female ages]

a) Which team has more members? (circle the correct response)

Male     Female     Can't tell     Same size

b) Which of the following is most likely the mean age of [...]?

21     22     23     30

c) For each of the three measures below, fill in the numerical value in the blank provided [...] or none of these (circle one).

Measure                              Value:    Is a measure of:
Interquartile range (for males)      ______    Shape   Centre   Spread   None
50th percentile (for females)        ______    Shape   Centre   Spread   None
Oldest male member                   ______    Shape   Centre   Spread   None

d) The distribution of male ages is: (circle the correct response)

Symmetric     Skewed to the left     Skewed to the right

e) The distribution of female ages is: (circle the correct response)

Symmetric     Skewed to the left     Skewed to the right


f) The mean male age is 22.5 years. One of the members of the male team is 22 years old and has a z-score of -0.25. What is the standard deviation of male ages?

g) If we assume that male ages are normally distributed, what proportion of males on the team are 22 years of age or younger?

h) Which of the following is the best justification for the assumption of normality made in part g)? (Check the best response)
_ A. The Law of Large Numbers
_ B. The Central Limit Theorem
_ C. Least squares regression
_ D. None of the above

i) Team members are required to take a course in the history of underwater basket-weaving. The professor records the values of several variables for each student. These variables are listed below. For each one, decide whether it has been recorded as quantitative or categorical.

Score on the final exam (out of 200 points)      Quantitative   Categorical
Final grade for the course (A, B, C, D, or F)    Quantitative   Categorical
The number of lectures the student missed        Quantitative   Categorical
Brand name of favorite swimsuit                  Quantitative   Categorical

j) Universities across North America require underwater basket-weaving students to take a quantitative skills test. Percentage scores on this test have a mean of 30% and a standard deviation of 10%. Give a range within which you would expect to find the middle 95% of all North American underwater basket-weaving student test scores.


Question A2 (MT2008-Q1) "There are two kinds of data - good and bad!"

a) [...] of the data set in which CyberStat Corporation records information [...]

Employee #   Surname   Age     Gender   Salary    Job Type
[...]        Smith     39      Female   $62,100   Management
23467        Jones     [...]   Male     $47,350   [...]
98543        Chan      27      Female   $25,250   Clerical
76548        Wong      48      Male     $77,600   Management

[...] the variables below which are recorded as quantitative scale variables:

Employee #     Surname     Age     Gender     Salary     Job Type

b) Three small Statistics classes all took the same test. Histograms of the scores for each class are shown below.

[Histograms: Class 1, Class 2, Class 3; horizontal axes show test scores from about 40 to 100]

(i) Which class had the highest mean score?                        1   2   3
(ii) Which class had the highest median score?                     1   2   3
(iii) For which class are the mean and median most different?      1   2   3
(iv) Which class had the smallest standard deviation?              1   2   3

c) For each of these variables, decide whether its distribution is more likely symmetric or skewed right (long right-hand tail) or skewed left (long left-hand tail). Circle one.

Individual incomes in the United States     Symmetric   Skewed right   Skewed left
Age of male heart attack victims            Symmetric   Skewed right   Skewed left
Lifetimes of electric light bulbs           Symmetric   Skewed right   Skewed left
IQ scores of the Canadian population        Symmetric   Skewed right   Skewed left


Question A3 (MT2008-Q2) "A Nash-ional Game"

The data set to the right contains all the point differentials or margins in all NBA games played by the Phoenix Suns up to February 13 of the 2007/08 season. Negative numbers indicate losses, positive numbers indicate wins. The data have been arranged in ascending order for you (biggest loss to biggest win).

[Data table: Obs# 1 to 52, margins in ascending order]

a) Compute the various numerical summaries and put them into the table below part b) under "original data." Some have been computed for you.

NOTE: Part b) is not part of the current curriculum. You can ignore it. But think of it as a challenge question. It is easy to figure out. Instructions are given in the Answers/Comments.

b) Suppose the data undergo a transformation such that X* = 2X - 3, where X = the original variable and X* is the "new," transformed variable. Find all of the numerical summaries for X* and put them into the table below under "transformed data".

             Original Data (X)    Transformed Data (X*)
Mean         5.6
Median
Range
Q1
Q3
IQR
Std dev      11.7

c) Are there any outliers? Use the "inner fences" definition of outliers and the original data (not the transformed data) to identify any outliers.

Lower inner fence =

Upper inner fence =

Observation numbers of outliers:
Question A4 (MT2007-Q1) "Data, data, data! I can't make bricks without clay!"

a) A sample of shoppers at a mall was asked the following questions. Decide whether the type of data is more likely to be quantitative or categorical. (Circle your choice)

What is your age (in years)?                                        Categorical   Quantitative
How much did you spend (in $)?                                      Categorical   Quantitative
What is your marital status?                                        Categorical   Quantitative
Rate the availability of parking. (Excellent, Good, Fair, Poor)     Categorical   Quantitative

b) Here is a table of sources of electricity in Canada and the US and the percentage of electricity generated by each. Construct a bar graph to compare Canada and the US. Do NOT use separate sets of axes for each graph.

[Table: sources of electricity and percentages for Canada and the US]

c) A news article reports that, "Of the 411 players on National Basketball Association rosters in February 1998, only 139 made more than the league ______ salary of $2.36 million." Which word should go in the blank, mean or median? That is, is $2.36 million the mean or median salary for NBA players?

d) A study was made of the age of entering first-year university students. Which of the following is most likely to be the standard deviation?

[...]     _B. 1 year     _C. 5 years


e) The following histogram displays the December 2000 percentage unemployment rates in the 50 U.S. states and Puerto Rico. The labels on the horizontal axis should be interpreted as follows: the bar labelled "1" represents rates of 1.0% to 1.9%, the bar labelled "2" represents rates of 2.0% to 2.9%, etc.
[Histogram: Unemployment Rate on the horizontal axis, bars labelled 1 to 8; vertical axis from 0 to 24]

(i) What percentage of the rates (out of a total of 51 observations) is 5.0% or greater?

(ii) Estimate the median unemployment rate.


f) You have decided to sell your home. The market is booming now with the 2010 Olympic Games preparations, and therefore most sellers of houses with similar characteristics have received extremely good deals in the past few months. You ask the realtor for a summary of net prices of homes sold in your neighborhood. The realtor hands you the following two density curves, one of them of the prices of homes sold in the past few months in your neighborhood, and the other of the prices of homes sold during a deep economic recession.

[Density curves: Curve A and Curve B]

(i) Under the given assumptions, which of the two curves better represents the distribution of prices of homes sold in the past few months? Circle your answer choice.

Curve A     Curve B

(ii) A potential buyer offers to give you the mean, the median or the mode of the prices of all the homes sold in the past few months in your neighborhood. Assuming that the density curve is the one you chose in (i) directly above, which numerical measure would you prefer? Circle your answer choice.

If you chose Curve A:         Mean   Median   Mode
OR: If you chose Curve B:     Mean   Median   Mode

(iii) You are told that the mean price of 50 houses sold is $700,000. However, you notice that there was a mistake in the calculation, and that one of the buyers paid $500,000 instead of the $800,000 that was used when making this calculation. What is the actual mean price of the 50 houses sold?


SECTION A: ANSWERS AND EXPLANATIONS


Answer to Question A1 (MT2009-Q1)
a) Can't tell
b) 23
c) IQR = 3, Spread; 50th percentile = 22, Centre; Oldest male = 27, None
d) Symmetric
e) Skewed to the right
f) Z = -0.25 = (22 - 22.5)/σ, so σ = (22 - 22.5)/(-0.25) = 2
g) Pr(Z < -0.25) = 0.4013
h) D. None of the above
i) Quantitative, Categorical, Quantitative, Categorical
j) Empirical (68-95-99.7) Rule: 30 ± 20 = (10, 50) (Also accept 30 ± 19.6)
Note: Parts g), h) and j) are about "Sampling Distributions and the Normal Model". Check your notes or the textbook.

Details and Comments: a) Boxplots do not show sample sizes; they only show: min, Q1, median, Q3, and max. b) Since the age distribution for females is strongly skewed to the right, the mean is greater than the median. The median (from the graph) is 22, so the mean must be a little larger, hence 23. Note that 30 is close to the maximum and far above Q3 so it is not a realistic estimate of the mean. c) IQR (Males) = 24 - 21 = 3; 50th percentile (Females) = median = 22; Oldest Male = max = 27. f) Use the formula for standardizing X to Z; however, here both the values of X and Z are given and it is the value of σ which is unknown. h) The Central Limit Theorem cannot be used as the reason here since the sample is unlikely to be large.
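Parts f), g) and j) can be checked with a few lines of Python. This is only a sketch using the standard library's NormalDist class; the variable names are illustrative, and the numbers are the ones given in the question.

```python
from statistics import NormalDist

# f) Solve z = (x - mean) / sigma for sigma.
mean_age = 22.5
x = 22
z = -0.25
sigma = (x - mean_age) / z

# g) Proportion of males aged 22 or younger under a normal model.
p_younger = NormalDist().cdf(z)

# j) Middle 95% of test scores: mean 30, SD 10, using the empirical rule (2 SDs).
low, high = 30 - 2 * 10, 30 + 2 * 10

# The more exact multiplier behind "also accept 30 ± 19.6".
z95 = NormalDist().inv_cdf(0.975)

print(sigma, round(p_younger, 4), (low, high), round(z95, 2))
```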

Answer to Question A2 (MT2008-Q1)
a) The quantitative variables are Age and Salary.
b) Answers: 3, 3, 3, 1.
c) Answers:
Individual incomes in the United States     Skewed right (long right-hand tail)
Age of male heart attack victims            Skewed left (long left-hand tail)
Lifetimes of electric light bulbs           Skewed right (long right-hand tail)
IQ scores of the Canadian population        Symmetric (equal tails)

Details and Comments: a) Gender and Job Type are categorical; Employee # and Surname are simply strings and used as identifier variables. Taking the mean of the Employee # would not make sense. b) Class 3 has much more area to the right than Class 1 or Class 2 so the mean and median are also shifted to the right. And since the histogram for Class 3 shows the greatest skewness, it has the greatest difference between mean and median. Class 1 is less spread out (the tails are both smaller than in the other two classes) so it has the smallest standard deviation.


c) Incomes are skewed right because fewer people have very large incomes, more people have incomes at the lower end or middle. Age of heart attack victims is skewed left because heart attacks are much more likely in older people. Lifetimes of bulbs are skewed right because most bulbs last the amount of time they are engineered to last but some will last much longer; that is, quality is designed in. Only a few will fail early. Lifetimes in general are skewed right.

Answer to Question A3 (MT2008-Q2)

a) and b)
             Original Data (X)    Transformed Data (X*)
Mean         5.6                  8.2
Median       6.5                  10
Range        55                   110
Q1           -3                   -9
Q3           11                   19
IQR          14                   28
Std dev      11.7                 23.4

c) Lower inner fence = -3 - 1.5(14) = -24
Upper inner fence = 11 + 1.5(14) = 32
Observation numbers of outliers: 52
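The inner-fence calculation in part c) can be sketched in Python as follows. Q1 and Q3 are the values from the table above; the helper is_outlier is a hypothetical name, and the data themselves are not reproduced here.

```python
q1, q3 = -3, 11          # quartiles of the original data
iqr = q3 - q1            # 14

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

def is_outlier(margin):
    """True if a margin falls outside the inner fences."""
    return margin < lower_fence or margin > upper_fence

print(lower_fence, upper_fence)
```

Any margin below the lower fence or above the upper fence is flagged. Remember to report the observation number, not the margin.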

Details and Comments: Note that the question asked for the observation number(s), not the margin! For part b): Suppose the data are transformed (linearly) as follows: X* = a + bX; that is, multiply the original observations by "b" and then add "a". That shifts all the values of X up or down by the amount "a" and changes the size of the unit of measurement by "b".
Mean(X*) = a + b×Mean(X)
Median(X*) = a + b×Median(X)
Range(X*) = b×Range(X) [the effect of "a" is cancelled]
Q1(X*) = a + b×Q1(X)
Q3(X*) = a + b×Q3(X)
IQR(X*) = b×IQR(X) [the effect of "a" is cancelled]
SD(X*) = b×SD(X) [the effect of "a" is cancelled]
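The rules for a linear transformation can be checked on any small data set. The sketch below uses made-up margins (not the Suns data) with a = -3 and b = 2 as in part b). It assumes b is positive and computes quartiles by linear interpolation, which may differ slightly from the textbook's convention.

```python
import math
import statistics as st

def summaries(values):
    """Numerical summaries as in Question A3 (quartiles by linear interpolation)."""
    q1, _, q3 = st.quantiles(values, n=4, method="inclusive")
    return {
        "mean": st.mean(values),
        "median": st.median(values),
        "range": max(values) - min(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "sd": st.stdev(values),
    }

a, b = -3, 2  # X* = a + bX, i.e. X* = 2X - 3

# Hypothetical margins, for illustration only.
x = [-12, -3, 4, 7, 11, 20]
x_star = [a + b * v for v in x]

orig = summaries(x)
new = summaries(x_star)

# Measures of position are shifted and rescaled; measures of spread are only rescaled.
for name in ("mean", "median", "q1", "q3"):
    print(name, new[name], a + b * orig[name])
for name in ("range", "iqr", "sd"):
    print(name, new[name], b * orig[name])
```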


Answer to Question A4 (MT2007-Q1)

a) What is your age (in years)?         Quantitative
How much did you spend (in $)?          Quantitative
What is your marital status?            Categorical
Rate the availability of parking        Categorical

b) [Bar graph: Sources of Electricity; vertical axis from 0 to 80; recoverable category labels: Nuclear, Natural Gas]

c) "Of the 411 players on National Basketball Association rosters in February 1998, only 139 made more than the league MEAN salary of $2.36 million." If it were the median, then half of the 411 players (i.e. 205 or 206) would exceed the value.

d) 1 year is the typical difference in age between entering first-year university students.

e) (i) 5/51 = 0.098, so 9.8%. It is also acceptable to round to 10%. (ii) The median is in the 3.0-3.9 interval, so the median is best estimated as the midpoint of that interval at 3.5%. Comment: It is also acceptable to give the range 3.0-3.9. It is not acceptable to estimate the median as 3.[?]%.

f) (i) Curve B (ii) If you chose Curve A: Mean If you chose Curve B: Mode Note: The two choices offered in part (ii) are to give you a chance to get the correct answer to part (ii) even if you made the wrong choice in part (i).

(iii) [(50 × 700,000) - 300,000]/50 = $694,000

Comment: Use the formula for mean and adjust accordingly.
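The adjustment in (iii) can be written out step by step. This is a small Python sketch with illustrative variable names: recover the total from the reported mean, swap the wrong value for the right one, and divide again.

```python
n = 50
reported_mean = 700_000
value_used = 800_000      # price entered by mistake
value_paid = 500_000      # price actually paid

reported_total = n * reported_mean
corrected_total = reported_total - value_used + value_paid
corrected_mean = corrected_total / n

print(corrected_mean)
```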
