
How to proceed:

Please read each question thoroughly and answer each question in its entirety.
Car Wash

Q1 (25 Points). You're the manager of a car wash business and you want to know how many cars you wash on a daily basis. You collect data over a 75 day period.

Q-1a (5 Points) Create a graph of the data.

Q-2a (5 Points) Calculate the average number of cars per day and describe how outliers affect the value of the mean.

Q-2b (5 Points) Calculate the mean number of cars per day. What does this tell you and is this a better or a worse estimate of the number of calls per day? Explain.

Q-2c (2.5 Points) Calculate the mode. Explain its significance.

Q-2d (2.5 Points) Calculate the Standard Deviation. Explain what this tells you.

Q-2e (5 Points) Based on the data collected, is the graph skewed to the left, right, or symmetric? If so, please explain your reasoning and how to interpret this information.

Bins: 0, 20, 40, 60, 80, 100, 120, 140, 160, 180

Day     # Calls (cars washed)
1       45
2       30
3       53
4       69
5       67
6       45
7       17
8       46
9       32
10      33
11      59
12      74
13      42
14      35
15      38
16      18
17      41
18      63
19      72
20      43
21      45
22      44
23      48
24      48
25      67
26      72
27      41
28      52
29      75
30      40
31      34
32      40
33      88
34      63
35      38
36      48
37      45
38      50
39      150
40      84
41      28
42      37
43      52
44      44
45      49
46      70
47      75
48      58
49      23
50      62
51      10
52      71
53      80
54      70
55      41
56      47
57      99
58      38
59      29
60      83
61      60
62      54
63      35
64      46
65      51
66      58
67      72
68      86
69      48
70      48
71      51
72      62
73      62
74      85
75      150
total   4128    (75 days)

Range = Max Value - Min Value = 150 - 10 = 140

Question 1

Days      Car washed
0-20      879
21-39     1003
40-59     1200
60-79     1118

Graph bins (cars per day): 0-19, 20-39, 40-59, 60-79, 80-99, 100-119, 120-139, 140-159, 160-180

Question 2 a

Total car washed = 4128
Number of days = 75

Average = 4128 / 75 = 55.04

Outliers are values that "lie outside" the other values. An outlier that is far from the main group pulls the mean toward it, so the mean no longer describes a typical day. Here the two days with 150 cars raise the mean above what most days look like.

Question 2 c

Median = 49     Mode = 48

Question 2 d

The standard deviation must be computed from the 75 daily counts, not from the four group totals.

Sum of squared deviations from the mean (55.04) = 42654.88

Variance = 42654.88 / 74 = 576.42

SD = 24.01

Question 2 e

The graph is skewed to the right (positive skew): the two 150-car days form a long right tail. This means that the mean (55.04) lies to the right of the peak and of the median (49).


1-a) Histogram

Bin     Frequency
0       0
20      3
40      15
60      31
80      18
100     6
120     0
140     0
160     2
180     0
More    0

2-a) Mean = 55.04

Mean after removing 2 outliers with values 150 = 52.4383562

2-b) Same steps

2-c)

Modal group is the group with highest frequency

In this case it will be 40-60 (31 days)

2-d)
Standard deviation

Here f is the frequency of the group, X is the midpoint of the group, and x bar is the overall mean.
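
The mean, the mean without the two 150-car days, and the histogram counts above can be cross-checked from the raw daily counts. The following is a minimal sketch, assuming Python's standard `statistics` module and Excel-style bins (each value is counted in the first bin whose upper bound is at least that value). The names `cars`, `describe`, and `histogram` are illustrative and not part of the workbook.

```python
import statistics

# Daily counts for days 1-75, copied from the "# Calls" column of the worksheet.
cars = [
    45, 30, 53, 69, 67, 45, 17, 46, 32, 33,
    59, 74, 42, 35, 38, 18, 41, 63, 72, 43,
    45, 44, 48, 48, 67, 72, 41, 52, 75, 40,
    34, 40, 88, 63, 38, 48, 45, 50, 150, 84,
    28, 37, 52, 44, 49, 70, 75, 58, 23, 62,
    10, 71, 80, 70, 41, 47, 99, 38, 29, 83,
    60, 54, 35, 46, 51, 58, 72, 86, 48, 48,
    51, 62, 62, 85, 150,
]


def describe(values):
    """Return the basic summary statistics asked for in Q-2a to Q-2d."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample SD (divides by n - 1)
    }


def histogram(values, upper_bounds):
    """Count values per bin; a value lands in the first bin with bound >= value."""
    counts = {bound: 0 for bound in upper_bounds}
    more = 0  # values above the last bound (Excel's "More" row)
    for value in values:
        for bound in upper_bounds:
            if value <= bound:
                counts[bound] += 1
                break
        else:
            more += 1
    return counts, more


summary = describe(cars)
without_outliers = [v for v in cars if v != 150]
counts, more = histogram(cars, list(range(0, 200, 20)))

# A mean above the median is the usual sign of a right (positive) skew.
skew = "right" if summary["mean"] > summary["median"] else "left or symmetric"

print(summary)
print("mean without the two 150s:", statistics.mean(without_outliers))
print("histogram:", counts, "more:", more)
print("skew:", skew)
```

Comparing `summary["mean"]` with `summary["median"]` is only a rule of thumb for the direction of skew; the histogram itself remains the primary evidence.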


Income Sales Age
$ 26,748.51 $ 1,695,712.62 33.16
$ 53,063.79 $ 3,403,862.05 32.67
$ 36,090.14 $ 2,710,352.91 35.66
$ 32,058.07 $ 529,215.46 33.07
$ 47,843.42 $ 663,686.65 35.76
$ 50,180.97 $ 2,546,324.34 33.81
$ 30,710.08 $ 2,787,046.20 30.98
$ 29,141.70 $ 612,696.05 30.78
$ 55,980.15 $ 891,822.03 32.32
$ 28,730.88 $ 1,124,967.97 32.53
$ 31,109.23 $ 909,500.98 31.44
$ 55,614.12 $ 2,631,166.88 33.16
$ 23,038.43 $ 882,972.65 31.87
$ 34,531.72 $ 1,078,573.12 33.41
$ 30,350.36 $ 844,320.19 34.05
$ 38,964.94 $ 1,849,119.03 28.89
$ 49,392.77 $ 3,860,007.32 36.11
$ 25,595.69 $ 826,573.88 32.81
$ 29,622.61 $ 604,682.87 33.05
$ 31,586.10 $ 1,903,611.60 33.50
$ 49,674.56 $ 2,356,808.39 32.68
$ 28,878.98 $ 2,788,571.96 28.52
$ 24,287.08 $ 1,634,878.29 32.89
$ 46,711.24 $ 2,371,627.37 30.50
$ 43,449.81 $ 2,627,837.96 30.29
$ 31,694.45 $ 1,868,116.33 31.29
$ 45,459.22 $ 2,236,796.86 33.05
$ 47,047.34 $ 1,318,876.23 32.93
$ 26,433.24 $ 1,868,097.84 31.84
$ 33,396.66 $ 1,695,218.57 31.08
$ 26,179.36 $ 2,700,194.42 32.18
$ 33,454.64 $ 1,156,049.77 31.69
$ 42,271.50 $ 643,858.44 34.03

Sales
SUMMARY OUTPUT

Regression Statistics
Multiple R 0.3601347911
R Square 0.1296970678
Adjusted R Square 0.1016227797
Standard Error 857089.133581316
Observations 33

ANOVA
df SS
Regression 1 3393699486846.59
Residual 31 22772655269998.3
Total 32 26166354756844.9

Coefficients Standard Error


Intercept 545545.787265091 578170.228098795
Income 32.4943853039 15.1181142648

Calculate simple linear regression

Using Approach 2: Redefined data

m = Σ XiYi / Σ Xi^2

C = Y - m X

Income Sales Xi-X


$ 26,748.51 $ 1,695,712.62 -$ 1,192,543.25
$ 53,063.79 $ 3,403,862.05 -$ 1,166,227.97
$ 36,090.14 $ 2,710,352.91 -$ 1,183,201.62
$ 32,058.07 $ 529,215.46 -$ 1,187,233.69
$ 47,843.42 $ 663,686.65 -$ 1,171,448.34
$ 50,180.97 $ 2,546,324.34 -$ 1,169,110.79
$ 30,710.08 $ 2,787,046.20 -$ 1,188,581.68
$ 29,141.70 $ 612,696.05 -$ 1,190,150.06
$ 55,980.15 $ 891,822.03 -$ 1,163,311.61
$ 28,730.88 $ 1,124,967.97 -$ 1,190,560.88
$ 31,109.23 $ 909,500.98 -$ 1,188,182.53
$ 55,614.12 $ 2,631,166.88 -$ 1,163,677.64
$ 23,038.43 $ 882,972.65 -$ 1,196,253.33
$ 34,531.72 $ 1,078,573.12 -$ 1,184,760.04
$ 30,350.36 $ 844,320.19 -$ 1,188,941.40
$ 38,964.94 $ 1,849,119.03 -$ 1,180,326.82
$ 49,392.77 $ 3,860,007.32 -$ 1,169,898.99
$ 25,595.69 $ 826,573.88 -$ 1,193,696.07
$ 29,622.61 $ 604,682.87 -$ 1,189,669.15
$ 31,586.10 $ 1,903,611.60 -$ 1,187,705.66
$ 49,674.56 $ 2,356,808.39 -$ 1,169,617.20
$ 28,878.98 $ 2,788,571.96 -$ 1,190,412.78
$ 24,287.08 $ 1,634,878.29 -$ 1,195,004.68
$ 46,711.24 $ 2,371,627.37 -$ 1,172,580.52
$ 43,449.81 $ 2,627,837.96 -$ 1,175,841.95
$ 31,694.45 $ 1,868,116.33 -$ 1,187,597.31
$ 45,459.22 $ 2,236,796.86 -$ 1,173,832.54
$ 47,047.34 $ 1,318,876.23 -$ 1,172,244.42
$ 26,433.24 $ 1,868,097.84 -$ 1,192,858.52
$ 33,396.66 $ 1,695,218.57 -$ 1,185,895.10
$ 26,179.36 $ 2,700,194.42 -$ 1,193,112.40
$ 33,454.64 $ 1,156,049.77 -$ 1,185,837.12
$ 42,271.50 $ 643,858.44 -$ 1,177,020.26
Total $ 1,219,291.76 $ 57,623,147.23

X (mean of Income) = $ 36,948.24
Y (mean of Sales) = $ 1,746,155.98

m = 46.28

C = $ 36,329.36

Y = 46.28 X + 36,329.36
Income of 50,000: Y = 46.28 x 50,000 + 36,329.36
Y = 2,350,329.36
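
The Excel SUMMARY OUTPUT reports a slope of 32.494 and an intercept of 545,545.79, while the hand calculation gives m = 46.28. A minimal sketch, assuming the column totals printed in this worksheet are exact, suggests where the gap comes from: the least-squares slope uses sums centered on the means, whereas the ratio of the raw sums Σ XiYi / Σ Xi^2 is the slope of a line forced through the origin. The function name `least_squares` is illustrative.

```python
# Column totals as printed in the worksheet (n = 33 stores).
n = 33
sum_x = 1_219_291.76            # total of Income
sum_y = 57_623_147.23           # total of Sales
sum_xy = 2_233_513_159_552.76   # total of the XiYi column
sum_xx = 48_264_759_027.52      # total of the Xi^2 column


def least_squares(n, sum_x, sum_y, sum_xy, sum_xx):
    """Slope and intercept of the least-squares line from raw column sums."""
    s_xy = sum_xy - sum_x * sum_y / n   # sum of (Xi - Xbar)(Yi - Ybar)
    s_xx = sum_xx - sum_x ** 2 / n      # sum of (Xi - Xbar)^2
    slope = s_xy / s_xx
    intercept = sum_y / n - slope * sum_x / n
    return slope, intercept


slope, intercept = least_squares(n, sum_x, sum_y, sum_xy, sum_xx)
raw_ratio = sum_xy / sum_xx  # what the worksheet labels m

print(f"centered slope b1 = {slope:.4f}")
print(f"intercept b0      = {intercept:.2f}")
print(f"raw-sum ratio     = {raw_ratio:.2f}")
```

Under these assumptions the centered calculation should agree with the Excel coefficients, which are the ones used later in 2-d and 2-e.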
Growth    HS        College
0.8299    73.5949   17.8350
0.6619    88.4557   31.9439
0.9688    73.5362   18.6198
0.0821    79.1780   20.6284
0.4646    84.1838   35.2032
2.1796    93.4996   41.7057
1.8048    78.0234   28.0250
-0.0569   70.2949   15.0882
-0.1577   70.6674   10.9829
0.3664    63.7395   13.2458
2.2256    76.9059   19.5500
1.5158    82.9452   20.8135
0.1413    65.2127   16.9796
-1.0400   73.4944   32.9920
1.6836    80.2201   22.3185
2.3596    87.5973   24.5670
0.7840    85.3041   30.8790
0.1164    65.5884   17.4545
1.1498    80.6176   18.6356
0.0606    80.3790   38.3249
1.6338    79.8526   23.7780
1.1256    81.2371   16.9300
1.4884    70.2244   19.1429
4.7937    87.1046   30.8843
1.8922    80.2057   26.5570
1.8667    75.2914   28.3600
1.7896    77.6162   19.2490
0.2707    85.1753   35.4994
3.0129    74.1792   18.6375
3.4630    81.6991   41.1130
0.7041    73.4140   17.8566
-0.1569   73.7161   26.5426
0.7084    78.6493   29.8734

ANOVA (continued)
            MS                 F              Significance F
Regression  3393699486846.59   4.6197811737   0.0395229519
Residual    734601782903.171

Coefficients (continued)
            t Stat         P-value        Lower 95%           Upper 95%   Lower 95.0%
Intercept   0.9435729492   0.352684205    -633640.167248479   1724732     -633640
Income      2.1493676218   0.0395229519   1.6607879765        63.327983   1.660788
Yi-Y XiYi Xi^2
-$ 55,927,434.61 45357785973.1962 $ 715,482,787.22
-$ 54,219,285.17 180621821169.361 $ 2,815,765,809.16
-$ 54,912,794.32 97817015790.8567 $ 1,302,498,205.22
-$ 57,093,931.77 16965626229.7041 $ 1,027,719,852.12
-$ 56,959,460.57 31753039335.7167 $ 2,288,992,837.30
-$ 55,076,822.89 127777025064.905 $ 2,518,129,750.14
-$ 54,836,101.03 85590411827.1162 $ 943,109,013.61
-$ 57,010,451.17 17855004596.8518 $ 849,238,678.89
-$ 56,731,325.19 49924331180.645 $ 3,133,777,194.02
-$ 56,498,179.26 32321319606.2592 $ 825,463,465.57
-$ 56,713,646.25 28293875047.6085 $ 967,784,191.19
-$ 54,991,980.35 146330030659.96 $ 3,092,930,343.37
-$ 56,740,174.57 20342303681.0932 $ 530,769,256.86
-$ 56,544,574.10 37244985117.4933 $ 1,192,439,686.16
-$ 56,778,827.03 25625421843.1698 $ 921,144,352.13
-$ 55,774,028.20 72050812017.8433 $ 1,518,266,549.20
-$ 53,763,139.91 190656453557.505 $ 2,439,645,728.27
-$ 56,796,573.35 21156728794.5772 $ 655,139,346.58
-$ 57,018,464.36 17912284772.4455 $ 877,499,023.21
-$ 55,719,535.63 60127666358.76 $ 997,681,713.21
-$ 55,266,338.84 117073419827.233 $ 2,467,561,911.19
-$ 54,834,575.27 80531113774.7639 $ 833,995,485.84
-$ 55,988,268.94 39706419722.3449 $ 589,862,254.93
-$ 55,251,519.86 110781655223.928 $ 2,181,939,942.34
-$ 54,995,309.27 114179060116.237 $ 1,887,885,989.04
-$ 55,755,030.90 59208919615.3685 $ 1,004,538,160.80
-$ 55,386,350.37 101683040644.968 $ 2,066,540,683.01
-$ 56,304,270.99 62049618598.9176 $ 2,213,452,201.08
-$ 55,755,049.39 49379878442.4686 $ 698,716,176.90
-$ 55,927,928.66 56614638074.3896 $ 1,115,336,899.16
-$ 54,922,952.81 70689361660.2744 $ 685,358,890.01
-$ 56,467,097.45 38675229011.2514 $ 1,119,212,937.53
-$ 56,979,288.78 27216862215.546 $ 1,786,879,712.25
2233513159552.76 $ 48,264,759,027.52
25 Marks:

The data at left are monthly sales totals from a random sample of 33 stores in a large chain of nationwide clothing stores. All stores in the franchise, and thus within the sample, are approximately the same size and carry the same merchandise. The county, or in some cases counties, in which the store draws the majority of its customers is referred to here as the customer base. For each of the 33 stores, the variables in the data set are:
Sales: Latest one month sales total (dollars)
Income: Median family income of customer base (dollars)
Age: Median age of customer base (years)
HS: Percentage of customer base with a high school diploma
College: Percentage of customer base with a college diploma
Growth: Annual population growth rate of customer base over the past 10 years.

Q-2a (5 Points). Construct a scatter plot, using sales as the dependent variable and median family income as the independent variable. Discuss the scatter plot.

Q-2b (5 Points). Assuming a linear relationship, use the least-squares method to compute the regression coefficients b0 and b1 and state the regression equation.

Q-2c (2.5 points): Predict the sales based off income of $50,000.00

Q-2d (2.5 points): Explain why it would not be appropriate to use the model to predict sales when income is $10,000.

Q-2e (10 Points). Interpret the meaning of the Y-intercept, b0, and the slope, b1, in this problem.

2-a)
[Scatter plot: Sales]
The points are scattered widely around any straight line, so the linear relationship between income and sales is weak (R Square = 0.13 in the regression output).

Upper 95.0% (regression output, continued): Intercept 1724732, Income 63.327983

2-d)
At income = 10,000, Sales = 870,489.64
Standard Error = 857,089.13
The predicted sales are almost the same size as the standard error, so the prediction carries very little information. In addition, $10,000 lies far below the smallest income in the sample ($23,038.43), so the prediction would be an extrapolation outside the range of the data. That is why it would not be appropriate.

2-e)
Sales = b1*Income + b0
Sales = 32.494*Income + 545545.7873
b1 > 0, which means that sales increase as income increases.
When income increases by 1 unit (one dollar), predicted sales increase by 32.494 units (dollars).

b0 = 545545.7873, which is equal to the predicted sales when the family income = 0. No store in the sample has an income near 0, so b0 has no practical interpretation here.
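
The predictions in 2-c and 2-d can be reproduced from the Excel coefficients. The following is a minimal sketch, assuming the intercept and slope from the SUMMARY OUTPUT and the smallest and largest values in the Income column; `predict_sales` is a hypothetical helper that also flags whether an income falls inside the observed range.

```python
# Coefficients from the Excel SUMMARY OUTPUT.
B0 = 545545.787265091   # intercept
B1 = 32.4943853039      # slope for Income

# Smallest and largest values in the Income column of the sample.
INCOME_MIN = 23038.43
INCOME_MAX = 55980.15


def predict_sales(income):
    """Return (predicted sales, True if income is inside the sampled range)."""
    inside_range = INCOME_MIN <= income <= INCOME_MAX
    return B0 + B1 * income, inside_range


for income in (50_000, 10_000):
    sales, inside = predict_sales(income)
    note = "within data range" if inside else "extrapolation"
    print(f"income {income:>6}: predicted sales {sales:,.2f} ({note})")
```

The range check is only a rough guard: a prediction inside the range is still subject to the large standard error reported by the regression.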


25 Points:

Q-3: The cellular spinoff company Jog wants to estimate the proportion of households that would purchase an additional cellular line if it were made available with a free handset. A random sample of 500 accounts is selected. The results indicate that 135 of the accounts would purchase an additional line if the handset was free.

Construct a 99% confidence interval estimate of the population proportion of accounts that would purchase the additional line if the handset were free. Fill in the table below to arrive at your answer. Points breakdown below:

(2.5 Points)- Determine N, P-bar, and Confidence Level, Square Root, and determine the Center of Interval.
(2.5 Points)- Complete the table in full.
(5 Points)- Describe the confidence interval estimate of the population proportion of accounts that would purchase the additional line if the handset were free.

Q-4 (15 Points): The amount of time it takes to take your order at the local Briarpatch restaurant has a population mean μ of 3.1 minutes and a standard deviation σ of 0.40 minutes. If you select a random sample of 16 customers:

Q-4a (5 Points): What is the probability that the mean time spent per customer is at least 3 minutes?

Q-4b (5 Points): What is the probability that it takes between 3 minutes and 5 minutes to take an order? (Round to three decimals.)

Q-4c: (5 Points) What is the length of an order if only 1% of all orders are shorter? (Round to one decimal.)

Proportions

n >= 30

n                        500      (sample size = 500)
p-bar                    0.27     p-bar = number of accounts purchasing an additional line if handset was free / n = 135/500
confidence level         99%

Center of Interval       0.27     p-bar is the centre of interval
z*s/sqrt(n)              0.05     Z score = normsinv(1-alpha/2); standard deviation = s/sqrt(n)
Lower end of int'l       0.219    Lower end = Center of interval - Z*s/sqrt(n)
Upper end of int'l       0.321    Upper end = Center of interval + Z*s/sqrt(n)

Interval width           0.102
1-confidence level       1%
(1-confidence level)/2   0.005
z                        2.58
(p)(1-p)                 0.20
s = sqrt[(p)(1-p)]       0.44
sqrt(n)                  22.36
s/sqrt(n)                0.02

Check assumptions:
np > 5        OK
n(1-p) > 5    OK

We are 99% confident that the population proportion of accounts that would purchase the additional line if the handset were free lies between 0.219 and 0.321.
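
The Proportions table can be reproduced in a few lines. This is a minimal sketch, assuming the normal approximation used in the worksheet and Python's standard `statistics.NormalDist` in place of Excel's `normsinv`; the function name `proportion_ci` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_bar = successes / n
    # Two-sided critical value, the same quantity as normsinv(1 - alpha/2).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_bar * (1 - p_bar) / n)
    return p_bar, z, p_bar - margin, p_bar + margin


p_bar, z, lower, upper = proportion_ci(successes=135, n=500, confidence=0.99)
print(f"p-bar = {p_bar:.2f}, z = {z:.2f}")
print(f"99% CI: {lower:.3f} to {upper:.3f}")
```

Note that the lower end subtracts the margin from the center and the upper end adds it.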

Q4- a) Z score = (X - mean)/(standard deviation/sqrt(n))

P(X > 3) = P(Z > (3 - 3.1)/(0.40/sqrt(16)))

Z score = -1

Probability = 1 - p value at Z score, as we have to calculate P(X > 3)

Probability = 0.8413447

b) P(3 < X < 5) = P(Z < (5 - 3.1)/(0.40/sqrt(16))) - P(Z < (3 - 3.1)/(0.40/sqrt(16)))

Z score at X = 5: 19
Z score at X = 3: -1

Probability = 0.8413447

c) Probability = 1%
Z score at probability: -2.326348

Z score = (X - mean)/(standard deviation/sqrt(n))

Find X using the above formula: X = mean + Z * (standard deviation/sqrt(n))
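
The Q4 calculations can be checked numerically. This is a minimal sketch, assuming the sampling distribution of the mean is normal with standard error 0.40/sqrt(16) = 0.1, as in the Z score formula above. Part c is computed two ways because the question can be read as referring either to the sample mean or to a single order; which reading is intended is an assumption left to the reader.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3.1, 0.40, 16

# Sampling distribution of the mean time for 16 customers.
sample_mean_dist = NormalDist(mu, sigma / sqrt(n))

# a) P(mean >= 3)
p_at_least_3 = 1 - sample_mean_dist.cdf(3)

# b) P(3 < mean < 5)
p_between_3_and_5 = sample_mean_dist.cdf(5) - sample_mean_dist.cdf(3)

# c) 1st percentile, using the worksheet formula (sample mean) ...
x_1pct_mean = sample_mean_dist.inv_cdf(0.01)
# ... and, as an alternative reading, for a single order.
x_1pct_single = NormalDist(mu, sigma).inv_cdf(0.01)

print(f"a) {p_at_least_3:.7f}")
print(f"b) {p_between_3_and_5:.3f}")
print(f"c) sample mean: {x_1pct_mean:.1f} min, single order: {x_1pct_single:.1f} min")
```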
25 Points:
Q-5 (15 Points). You are the manager of a popular retail store. You want to determine whether the population mean wait time for customers to check out has changed in the past month from its previous population mean value of 4.25 minutes. From past experience, you can assume that the population is normally distributed with a population standard deviation of 1.6 minutes. You select a sample of 43 customers' wait times to check out during a one hour period. The sample mean is 4.75 minutes. Determine whether there is evidence at the 0.05 level of significance that the population mean wait time to check out has changed in the past month from its previous population mean value of 4.25 minutes. Explain your findings utilizing information from the hypothesis test you conduct.

Q-5a (2.5 Points): State the Null Hypothesis

Q-5b (2.5 Points): State the Alternative Hypothesis.

Q-5c (5 Points): Based on the information do we reject or accept the null hypothesis. Describe what this means for the results of the test. Fill in the table below to help answer the questions.

Q-5d (5 Points): Describe what would occur if the sample size was doubled.

Q-6 (5 points):
A sport preference poll yielded the following data for men and women. Use the 5% significance level and test to determine whether sport preference and gender are independent.

                   Sport Preference
                   Basketball   Football   Soccer   Total
Gender   Men       20           25         30       75
         Women     18           12         15       45
         Total     38           37         45       120

Q-7 (5 Points): Suppose that we observe a random sample of size n from a normally distributed population. If we are able to reject the null hypothesis in favor of the two-tailed alternative at the 5% significance level, is it true that we can definitely reject the null hypothesis in favor of the appropriate one-tailed alternative at the 2.5% significance level? Why or why not?

Mean

Null Hypothesis: Mean = 4.25      V = n-1
Sample Mean, x-bar     4.75       V = 43-1
Sample Std Dev., s     1.60       V = 42
Sample Size, n         43
Significance Level     5%

Sample Mean is in Rejection Region

Reject Null Hypothesis

Lower end of range
Center of range
Upper end of range

Sample Mean, x-bar

p-value
Significance Level

(Significance Level)/2
z
sqrt(n)
s/sqrt(n)
z*s/sqrt(n)

Value of z-statistic

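5a) Null hypothesis: μ = 4.25

5b) Alternate hypothesis: μ ≠ 4.25

5c) Z score = (X - mean)/(standard deviation/sqrt(n)) = (4.75 - 4.25)/(1.6/sqrt(43))

Z = 2.0491995
p value = 0.0404426 (two-tailed). Check the excel tab for the formula.
If p value < 0.05, reject the null hypothesis
If p value > 0.05, fail to reject the null hypothesis
Since 0.0404 < 0.05, we reject the null hypothesis: there is evidence that the mean wait time has changed from 4.25 minutes.

5d) With the sample size doubled (n = 86) and the same sample mean:
Z score = (X - mean)/(standard deviation/sqrt(n)) = (4.75 - 4.25)/(1.6/sqrt(86))
Z = 2.8980058
p value = 0.0037554
If p value < 0.05, reject the null hypothesis
If p value > 0.05, fail to reject the null hypothesis
Doubling the sample size shrinks the standard error, so the same 0.5 minute difference gives a larger Z and a smaller p value. The null hypothesis is still rejected, with stronger evidence.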
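
The Z scores and p values for 5c and 5d can be reproduced as follows. This is a minimal sketch of a two-tailed z test with known population standard deviation, assuming Python's standard `statistics.NormalDist`; the function name `z_test_two_tailed` is illustrative and not taken from the Excel tab.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_tailed(sample_mean, mu0, sigma, n):
    """Z statistic and two-tailed p value for H0: mean = mu0, sigma known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


for n in (43, 86):  # original sample size, then doubled
    z, p = z_test_two_tailed(sample_mean=4.75, mu0=4.25, sigma=1.6, n=n)
    decision = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"n = {n}: Z = {z:.7f}, p = {p:.7f} -> {decision}")
```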
7) This is not true for certain. Suppose the sample mean we observe lies on one side of the hypothesized mean. If the alternative for the one-tailed test points to the other side, then we obviously can't reject the null because the observed sample mean is in the wrong direction. But if the alternative points to the same side as the observed sample mean, we can reject the null at the 2.5% level. The reason is that we know the p-value for the two-tailed test was less than 0.05. The p-value for a one-tailed test is half of this, or less than 0.025, which implies rejection at the 2.5% level.
Observed
                   Basketball   Football   Soccer   Total
Gender   Men       20           25         30       75
         Women     18           12         15       45
         Total     38           37         45       120

Expected
                   Basketball   Football   Soccer   Total
Gender   Men       23.75        23.125     28.125   75
         Women     14.25        13.875     16.875   45
         Total     38           37         45       120

Chi square distribution

Please check the tabs for formula

Null hypothesis: Sports preference and gender are independent
Alternate hypothesis: Sports preference and gender are dependent

Chi test p val    0.31384908

If p value < 0.05, reject the null hypothesis
If p value > 0.05, fail to reject the null hypothesis

Since p value > 0.05, we fail to reject the null hypothesis: at the 5% significance level there is no evidence that sports preference and gender are dependent.
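
The expected counts and the chi-square p value can be checked without a statistics package. This is a minimal sketch, assuming the observed table from Q-6; it relies on the fact that for exactly 2 degrees of freedom the upper-tail chi-square probability has the closed form exp(-x/2), so the shortcut does not apply to tables of other sizes. The function name `chi_square_independence` is illustrative.

```python
from math import exp

# Observed counts: rows are Men, Women; columns are Basketball, Football, Soccer.
observed = [
    [20, 25, 30],
    [18, 12, 15],
]


def chi_square_independence(table):
    """Expected counts, chi-square statistic, and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, dof


expected, statistic, dof = chi_square_independence(observed)

# Closed form valid only when dof == 2.
p_value = exp(-statistic / 2) if dof == 2 else None

print("expected:", expected)
print(f"chi-square = {statistic:.4f}, dof = {dof}, p = {p_value:.8f}")
```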