
Running head: COMPREHENSIVE PROBLEM SETS

Comprehensive Problem Sets QNT 561

Chapter 3
88. Refer to the Baseball 2005 data, which reports information on the 30 major league teams for the 2005 baseball season.
a. Select the variable team salary and find the mean, median, and the standard deviation.
Mean = 73.06 Median = 66.20 Standard deviation = 34.23
b. Select the variable that refers to the year the stadium was built. (Hint: Subtract the year in which the stadium was built from the current year to find the stadium age and work with that variable.) Find the mean, median, and the standard deviation.
Mean = 28.20 Median = 17.50 Standard deviation = 25.94
c. Select the variable that refers to the seating capacity of the stadium. Find the mean, median, and the standard deviation.
Mean = 45,913 Median = 44,174 Standard deviation = 5,894

Chapter 5
56. Assume the likelihood that any flight on Northwest Airlines arrives within 15 minutes of the scheduled time is .90. We select four flights from yesterday for study.
Let x be the number of selected flights that arrived within 15 minutes of the scheduled time. Then x follows the binomial distribution with n = 4, p = 0.9, and q = 0.1.
a. What is the likelihood all four of the selected flights arrived within 15 minutes of the scheduled time?
P(x = 4) = 0.65610
b. What is the likelihood that none of the selected flights arrived within 15 minutes of the scheduled time?
P(x = 0) = 0.00010
c. What is the likelihood at least one of the selected flights did not arrive within 15 minutes of the scheduled time?
At least one late flight means that not all four were on time, so P(x ≤ 3) = 1 - P(x = 4) = 1 - 0.65610 = 0.34390

Binomial Distribution
x    p(x)
0    0.00010
1    0.00360
2    0.04860
3    0.29160
4    0.65610
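The binomial table for this problem can be checked with a short script. This is a minimal sketch, not the method used to produce the answers; the function name binomial_pmf and the variable names are chosen here for illustration. Part c is computed as the complement of "all four flights on time".

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 4, 0.9  # four flights, each on time with probability .90

# Full distribution of the number of on-time flights.
table = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

p_all_on_time = table[4]            # part a
p_none_on_time = table[0]           # part b
p_at_least_one_late = 1 - table[4]  # part c: complement of "all four on time"

for x, prob in table.items():
    print(f"{x}  {prob:.5f}")
print(f"P(at least one late) = {p_at_least_one_late:.5f}")
```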

Chapter 6
64. An internal study by the Technology Services department at Lahey Electronics revealed company employees receive an average of two emails per hour. Assume the arrival of these emails is approximated by the Poisson distribution.
a. What is the probability Linda Lahey, company president, received exactly 1 email between 4 P.M. and 5 P.M. yesterday?
P(x = 1) = 0.270670566
b. What is the probability she received 5 or more emails during the same period?
P(x ≥ 5) = 1 - P(x ≤ 4) = 0.052653017
c. What is the probability she did not receive any email during the period?
P(x = 0) = 0.135335283
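The three Poisson probabilities can be reproduced directly from the formula P(x) = mean^x e^(-mean) / x!. The sketch below assumes a mean of 2 emails per hour, as stated in the problem; the helper name poisson_pmf is illustrative.

```python
from math import exp, factorial


def poisson_pmf(x: int, mean: float) -> float:
    """Probability of exactly x arrivals when arrivals average `mean` per period."""
    return mean**x * exp(-mean) / factorial(x)


mean = 2.0  # emails per hour, from the problem statement

p_exactly_one = poisson_pmf(1, mean)
p_zero = poisson_pmf(0, mean)
# "5 or more" is the complement of 0, 1, 2, 3, or 4 emails.
p_five_or_more = 1 - sum(poisson_pmf(x, mean) for x in range(5))

print(f"P(x = 1)  = {p_exactly_one:.9f}")
print(f"P(x >= 5) = {p_five_or_more:.9f}")
print(f"P(x = 0)  = {p_zero:.9f}")
```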

Chapter 7
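50. Fast Service Truck Lines uses the Ford Super Duty F-750 exclusively. Management made a study of the maintenance costs and determined the number of miles traveled during the year followed the normal distribution. The mean of the distribution was 60,000 miles and the standard deviation 2,000 miles.
Normal distribution: mean = 60,000; standard deviation = 2,000
a. What percent of the Ford Super Duty F-750s logged 65,200 miles or more?
z = (65,200 - 60,000)/2,000 = 2.60, so P(z ≥ 2.60) = 0.0047, or about 0.47 percent.

b. What percent of the trucks logged more than 57,060 but less than 58,280 miles?
z = (57,060 - 60,000)/2,000 = -1.47 and z = (58,280 - 60,000)/2,000 = -0.86, so P(-1.47 < z < -0.86) = 0.1949 - 0.0708 = 0.1241, or about 12.41 percent.

c. What percent of the Fords traveled 62,000 miles or less during the year?
z = (62,000 - 60,000)/2,000 = 1.00, so P(z ≤ 1.00) = 0.8413, or about 84.13 percent.

d. Is it reasonable to conclude that any of the trucks were driven more than 70,000 miles? Explain.
A truck driven 70,000 miles would be z = (70,000 - 60,000)/2,000 = 5.00 standard deviations above the mean. Since the probability of that is close to 0, it is not reasonable to conclude that any of the trucks were driven more than 70,000 miles.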
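The four normal-distribution questions in this problem can be evaluated with Python's statistics.NormalDist. This is a sketch under the stated mean of 60,000 miles and standard deviation of 2,000 miles; the variable names are illustrative. Results differ slightly from table lookups because z is not rounded to two decimals.

```python
from statistics import NormalDist

# Annual mileage model from the problem statement.
miles = NormalDist(mu=60_000, sigma=2_000)

p_a = 1 - miles.cdf(65_200)                   # 65,200 miles or more
p_b = miles.cdf(58_280) - miles.cdf(57_060)   # between 57,060 and 58,280 miles
p_c = miles.cdf(62_000)                       # 62,000 miles or less
p_d = 1 - miles.cdf(70_000)                   # more than 70,000 miles

for label, prob in [("a", p_a), ("b", p_b), ("c", p_c), ("d", p_d)]:
    print(f"{label}. {prob:.7f}  ({100 * prob:.4f} percent)")
```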

Chapter 8
38. The mean amount purchased by a typical customer at Churchill's Grocery Store is $23.50 with a standard deviation of $5.00. Assume the distribution of amounts purchased follows the normal distribution. For a sample of 50 customers, answer the following questions.
The standard error of the sample mean is 5.00/sqrt(50) = 0.7071.
a. What is the likelihood the sample mean is at least $25.00?
P[mean ≥ 25] = P[Z ≥ (25 - 23.50)/0.7071] = P[Z ≥ 2.12] = 0.0170.
b. What is the likelihood the sample mean is greater than $22.50 but less than $25.00?
P[22.50 < mean < 25] = P[(22.50 - 23.50)/0.7071 < Z < (25 - 23.50)/0.7071] = P[-1.41 < Z < 2.12] = 0.9037.
c. Within what limits will 90 percent of the sample means occur?
90% of the sample means are within 1.65 standard errors of the population mean, so the limits are 23.50 ± 1.65 * 0.7071, which is 23.50 ± 1.17, or from 22.33 to 24.67.
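The sampling-distribution calculations in this problem can be sketched as follows. The code models the sample mean as normal with standard error sigma/sqrt(n); variable names are illustrative, and the exact results differ in the third or fourth decimal from answers based on z rounded to two places (2.12, -1.41, 1.65).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 23.50, 5.00, 50
std_error = sigma / sqrt(n)  # standard error of the sample mean

sample_mean = NormalDist(mu, std_error)

p_a = 1 - sample_mean.cdf(25.00)                       # mean at least $25.00
p_b = sample_mean.cdf(25.00) - sample_mean.cdf(22.50)  # between $22.50 and $25.00

# Middle 90 percent: 5 percent in each tail.
z90 = NormalDist().inv_cdf(0.95)
lower, upper = mu - z90 * std_error, mu + z90 * std_error

print(f"standard error = {std_error:.4f}")
print(f"a. {p_a:.4f}   b. {p_b:.4f}")
print(f"c. {lower:.2f} to {upper:.2f}")
```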

Chapter 9
54. Families USA, a monthly magazine that discusses issues related to health and health costs, surveyed 20 of its subscribers. It found that the annual health insurance premiums for a family with coverage through an employer averaged $10,979. The standard deviation of the sample was $1,000.
n = 20 x-bar = 10,979 s = 1,000 degrees of freedom (df) = n - 1 = 19
a. Based on this sample information, develop a 90 percent confidence interval for the population mean yearly premium.
Confidence = .90, so α = .10 and α/2 = .05
t (df = 19; α/2 = .05) = 1.729
Error = (1.729)(1000)/sqrt(20) = 386.62
Interval = x-bar ± Error = 10,979 ± 386.62
Interval is (10592.38, 11365.62)
b. How large a sample is needed to find the population mean within $250 at 99 percent confidence?
Confidence = .99; Error = 250; α = .01, so α/2 = .005
z (.005) = 2.5758
n = [(2.5758)(1000)/250]^2 = 106.16, which rounds up to n = 107
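A short sketch of both calculations follows. The t value 1.729 is taken from the text rather than computed, because the Python standard library has no t quantile function; the z value is computed with statistics.NormalDist. Variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Part a: 90 percent confidence interval for the mean premium.
n, x_bar, s = 20, 10_979.0, 1_000.0
t_crit = 1.729  # t for df = 19 and .05 in each tail, taken from the text
margin = t_crit * s / sqrt(n)
interval = (x_bar - margin, x_bar + margin)

# Part b: sample size for a $250 margin at 99 percent confidence.
allowed_error = 250.0
z = NormalDist().inv_cdf(1 - 0.01 / 2)      # .005 in each tail
n_raw = (z * s / allowed_error) ** 2
n_required = ceil(n_raw)                    # always round up to a whole subscriber

print(f"margin = {margin:.2f}, interval = ({interval[0]:.2f}, {interval[1]:.2f})")
print(f"z = {z:.4f}, n = {n_raw:.2f} -> {n_required}")
```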

Chapter 10
42. During recent seasons, Major League Baseball has been criticized for the length of the games. A report indicated that the average game lasts 3 hours and 30 minutes. A sample of 17 games revealed the following times to completion. (Note that the minutes have been changed to fractions of hours, so that a game that lasted 2 hours and 24 minutes is reported at 2.40 hours.)
2.98 2.40 2.70 2.25 3.23 3.17 2.93 3.18 2.80
2.38 3.75 3.20 3.27 2.52 2.58 4.45 2.45
Can we conclude that the mean time for a game is less than 3.50 hours? Use the .05 significance level.
Ho: μ ≥ 3.5 H1: μ < 3.5 α = .05 p-value = 0.001
At the .05 level of significance, we reject Ho because p-value (.001) < α (.05). There is enough sample evidence to support the claim that the average baseball game lasts less than 3.5 hours.
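The test statistic behind this decision can be sketched with the standard library. The code computes the one-sample t statistic and compares it with the one-tailed critical value for 16 degrees of freedom; the critical value -1.746 is taken from a t table and is an addition here, not a figure from the original answer.

```python
from math import sqrt
from statistics import mean, stdev

hours = [2.98, 2.40, 2.70, 2.25, 3.23, 3.17, 2.93, 3.18, 2.80,
         2.38, 3.75, 3.20, 3.27, 2.52, 2.58, 4.45, 2.45]
mu0 = 3.50  # hypothesized mean game length in hours

n = len(hours)
x_bar = mean(hours)
s = stdev(hours)  # sample standard deviation (n - 1 in the denominator)
t_stat = (x_bar - mu0) / (s / sqrt(n))

t_crit = -1.746  # left-tail .05 critical value, df = 16 (from a t table)
reject_h0 = t_stat < t_crit

print(f"n = {n}, mean = {x_bar:.4f}, s = {s:.4f}")
print(f"t = {t_stat:.3f}, reject Ho: {reject_h0}")
```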

Chapter 11
58. The amount of income spent on housing is an important component of the cost of living. The total costs of housing for homeowners might include mortgage payments, property taxes, and utility costs (water, heat, electricity). An economist selected a sample of 20 homeowners in New England and then calculated these total housing costs as a percent of monthly income, five years ago and now. The information is reported below. Is it reasonable to conclude the percent is less now than five years ago?
Homeowner         1    2    3    4    5    6    7    8    9   10
Five Years Ago  17%   20   29   43   36   43   45   19   49   49
Now             10%   39   37   27   12   41   24   26   28   26

Homeowner        11   12   13   14   15   16   17   18   19   20
Five Years Ago  35%   16   23   33   44   44   28   29   39   22
Now             32%   32   21   12   40   42   22   19   35   12

(1 represents now and 2 represents 5 years ago)
Ho: μ1 ≥ μ2 H1: μ1 < μ2 α = .05 p-value = 0.036
At the .05 level of significance, we reject Ho because p-value (.036) < α (.05). There is enough sample evidence to support the claim that the percent now is less than the percent five years ago.
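Because the same 20 homeowners are measured twice, one way to check this decision is a paired t statistic on the differences (now minus five years ago). The sketch below assumes that paired design; the one-tailed critical value -1.729 for 19 degrees of freedom comes from a t table, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

five_years_ago = [17, 20, 29, 43, 36, 43, 45, 19, 49, 49,
                  35, 16, 23, 33, 44, 44, 28, 29, 39, 22]
now = [10, 39, 37, 27, 12, 41, 24, 26, 28, 26,
       32, 32, 21, 12, 40, 42, 22, 19, 35, 12]

# One difference per homeowner: negative values mean the percent fell.
diffs = [n_i - a_i for n_i, a_i in zip(now, five_years_ago)]

n = len(diffs)
mean_diff = mean(diffs)
t_stat = mean_diff / (stdev(diffs) / sqrt(n))

t_crit = -1.729  # left-tail .05 critical value, df = 19 (from a t table)
reject_h0 = t_stat < t_crit

print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.3f}, reject Ho: {reject_h0}")
```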

Chapter 12
42. Martin Motors has in stock three cars of the same make and model. The president would like to compare the gas consumption of the three cars (labeled car A, car B, and car C) using four different types of gasoline. For each trial, a gallon of gasoline was added to an empty tank, and the car was driven until it ran out of gas. The following table shows the number of miles driven in each trial.

Distance (miles)
Types of Gasoline    Car A   Car B   Car C
Regular              22.4    20.8    21.5
Super regular        17.0    19.4    20.7
Unleaded             19.2    20.2    21.2
Premium unleaded     20.3    18.6    20.4

Using the .05 level of significance:
a. Is there a difference among types of gasoline? (1 = Regular; 2 = Super regular; 3 = Unleaded; 4 = Premium unleaded)
Ho: μ1 = μ2 = μ3 = μ4 H1: At least one mean differs from another α = .05 p-value = 0.166
At the .05 level of significance, we fail to reject Ho because p-value (.166) > α (.05). There is not enough sample evidence to show that there is a difference among types of gasoline.
b. Is there a difference in the cars? (1 = Car A; 2 = Car B; 3 = Car C)
Ho: μ1 = μ2 = μ3 H1: At least one mean differs from another α = .05 p-value = 0.424
At the .05 level of significance, we fail to reject Ho because p-value (.424) > α (.05). There is not enough sample evidence to show that there is a difference among the cars.
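The sums of squares for this problem can be sketched in a few lines. The answer above does not state which ANOVA layout produced its p-values, so the code below is an assumption: a two-way ANOVA without replication, with gasoline type and car as the two factors. The critical values 4.76 and 5.14 are from an F table for (3, 6) and (2, 6) degrees of freedom at the .05 level.

```python
# Miles driven per gallon: one row per car, columns in the order
# Regular, Super regular, Unleaded, Premium unleaded.
miles = {
    "Car A": [22.4, 17.0, 19.2, 20.3],
    "Car B": [20.8, 19.4, 20.2, 18.6],
    "Car C": [21.5, 20.7, 21.2, 20.4],
}

rows = list(miles.values())
cols = list(zip(*rows))              # one tuple per gasoline type
r, c = len(rows), len(cols)          # 3 cars, 4 gasoline types
grand_mean = sum(map(sum, rows)) / (r * c)

ss_total = sum((x - grand_mean) ** 2 for row in rows for x in row)
ss_cars = c * sum((sum(row) / c - grand_mean) ** 2 for row in rows)
ss_gas = r * sum((sum(col) / r - grand_mean) ** 2 for col in cols)
ss_error = ss_total - ss_cars - ss_gas

ms_error = ss_error / ((r - 1) * (c - 1))
f_gas = (ss_gas / (c - 1)) / ms_error
f_cars = (ss_cars / (r - 1)) / ms_error

print(f"F (gasoline) = {f_gas:.3f}  vs critical 4.76")
print(f"F (cars)     = {f_cars:.3f}  vs critical 5.14")
```

Both F statistics fall below their critical values, which agrees with failing to reject Ho for either factor.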

Chapter 13
37. A regional commuter airline selected a random sample of 25 flights and found that the correlation between the number of passengers and the total weight, in pounds, of luggage stored in the luggage compartment is 0.94. Using the .05 significance level, can we conclude that there is a positive association between the two variables?
n = 25 r = 0.94 α = .05
Ho: ρ ≤ 0 H1: ρ > 0
t = r * sqrt((n - 2)/(1 - r^2)) = 0.94 * sqrt(23/0.1164) = 13.21
p-value = 0 (to three decimal places)
At the .05 level of significance, we reject Ho because p-value (0) < α (.05). There is enough sample evidence to show that there is a positive association between the two variables.

40. A suburban hotel derives its gross income from its hotel and restaurant operations. The owners are interested in the relationship between the number of rooms occupied on a nightly basis and the revenue per day in the restaurant. Below is a sample of 25 days (Monday through Thursday) from last year showing the restaurant income and number of rooms occupied.
Day            1      2      3      4      5      6      7      8      9     10     11     12     13
Income    $1,452  1,361  1,426  1,470  1,456  1,430  1,354  1,442  1,394  1,459  1,399  1,458  1,537
Occupied      23     47     21     39     37     29     23     44     45     16     30     42     54

Day           14     15     16     17     18     19     20     21     22     23     24     25
Income    $1,425  1,445  1,439  1,348  1,450  1,431  1,446  1,485  1,405  1,461  1,490  1,426
Occupied      27     34     15     19     38     44     47     43     38     51     61     39

Use a statistical software package to answer the following questions.
a. Does the breakfast revenue seem to increase as the number of occupied rooms increases? Draw a scatter diagram to support your conclusion.
Yes. Revenue generally tends to increase as the number of rooms occupied increases, although the points are widely scattered. A scatter plot of revenue against rooms occupied supports this.

b. Determine the coefficient of correlation between the two variables. Interpret the value.
r = 0.423, which is a fairly weak positive correlation.
c. Is it reasonable to conclude that there is a positive relationship between revenue and occupied rooms? Use the .10 significance level.
Ho: ρ ≤ 0 H1: ρ > 0 α = .10
df = n - 2 = 25 - 2 = 23
One-tailed critical value t = 1.319
Test statistic t = r * sqrt((n - 2)/(1 - r^2)) = 0.423 * sqrt((25 - 2)/(1 - 0.423^2)) = 2.24
Conclusion: Since 2.24 > 1.319, we reject Ho at the .10 level of significance and conclude that there is a positive relationship between the variables.
d. What percent of the variation in revenue in the restaurant is accounted for by the number of rooms occupied?
r^2 = 0.423^2 = 0.179, so approximately 18% of the variation in revenue is explained by the variation in the number of rooms occupied.
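The correlation, test statistic, and coefficient of determination for the hotel data can be recomputed from the 25 days listed in the problem. This is a sketch with two assumptions: the stray value 27 in the extracted table is read as the rooms occupied on day 14, and the critical value 1.319 (one-tailed, .10 level, df = 23) is taken from a t table. The function name pearson_r is illustrative.

```python
from math import sqrt

income = [1452, 1361, 1426, 1470, 1456, 1430, 1354, 1442, 1394, 1459,
          1399, 1458, 1537, 1425, 1445, 1439, 1348, 1450, 1431, 1446,
          1485, 1405, 1461, 1490, 1426]
# Assumption: day 14 had 27 rooms occupied.
rooms = [23, 47, 21, 39, 37, 29, 23, 44, 45, 16,
         30, 42, 54, 27, 34, 15, 19, 38, 44, 47,
         43, 38, 51, 61, 39]


def pearson_r(x, y):
    """Sample correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


n = len(rooms)
r = pearson_r(rooms, income)
t_stat = r * sqrt((n - 2) / (1 - r**2))
r_squared = r**2

t_crit = 1.319  # one-tailed .10 critical value, df = 23 (from a t table)
print(f"r = {r:.3f}, t = {t_stat:.2f}, r^2 = {r_squared:.3f}, reject Ho: {t_stat > t_crit}")
```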

Chapter 14
17. The district manager of Jason's, a large discount electronics chain, is investigating why certain stores in her region are performing better than others. She believes that three factors are related to total sales: the number of competitors in the region, the population in the surrounding area, and the amount spent on advertising. From her district, consisting of several hundred stores, she selects a random sample of 30 stores. For each store she gathered the following information.
Y = total sales last year (in $ thousands).
X1 = number of competitors in the region.
X2 = population of the region (in millions).
X3 = advertising expense (in $ thousands).
The sample data were run on MINITAB, with the following results.

Analysis of variance
SOURCE        DF        SS         MS
Regression     3    3050.00    1016.67
Error         26    2200.00      84.62
Total         29    5250.00

Predictor     Coef    StDev    t-ratio
Constant     14.00     7.00       2.00
X1           -1.00     0.70      -1.43
X2           30.00     5.20       5.77
X3            0.20     0.08       2.50

a. What are the estimated sales for the Bryne store, which has four competitors, a regional population of 0.4 (400,000), and advertising expense of 30 ($30,000)?
y = 14 - 1(4) + 30(0.4) + 0.20(30) = 14 - 4 + 12 + 6 = 28, that is, estimated sales of $28 thousand.
b. Compute the R2 value.
With RSS the regression sum of squares, ESS the error sum of squares, and TSS the total sum of squares: R2 = RSS/TSS = 3050/5250 = 0.5810
c. Compute the multiple standard error of estimate.
MSEE = sqrt(ESS/(n - k)) = sqrt(2200/(30 - 4)) = sqrt(84.62) = 9.20, where k = 4 is the number of estimated coefficients.
d. Conduct a global test of hypothesis to determine whether any of the regression coefficients are not equal to zero. Use the .05 level of significance.
Ho: β1 = β2 = β3 = 0 H1: Not all of the βs are 0
F-stat on overall test: (RSS/3)/(ESS/(n - k)) = (3050/3)/(2200/26) = 1016.67/84.62 = 12.01. For the F-distribution with (3, 26) degrees of freedom, the .05 critical value is 2.98 and the p-value is less than 0.001, so we reject the hypothesis that all of the regression coefficients are zero.
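Parts a through d use only the figures in the MINITAB output, so they can be recomputed in a few lines. The sketch below hard-codes the sums of squares and coefficients from the output; the variable names are illustrative, and the critical value 2.98 for F with (3, 26) degrees of freedom is taken from an F table.

```python
from math import sqrt

# Figures from the MINITAB output.
ss_regression, ss_error, ss_total = 3050.0, 2200.0, 5250.0
n, predictors = 30, 3
df_error = n - (predictors + 1)  # 26
coef = {"const": 14.00, "x1": -1.00, "x2": 30.00, "x3": 0.20}

# a. Estimated sales: 4 competitors, 0.4 million people, $30 thousand advertising.
estimated_sales = coef["const"] + coef["x1"] * 4 + coef["x2"] * 0.4 + coef["x3"] * 30

# b. Coefficient of determination.
r_squared = ss_regression / ss_total

# c. Multiple standard error of estimate.
std_error = sqrt(ss_error / df_error)

# d. Global F statistic: mean square regression over mean square error.
f_stat = (ss_regression / predictors) / (ss_error / df_error)
f_crit = 2.98  # .05 critical value for (3, 26) df, from an F table

print(f"a. {estimated_sales:.2f}  b. {r_squared:.4f}  c. {std_error:.2f}")
print(f"d. F = {f_stat:.2f}, reject Ho: {f_stat > f_crit}")
```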

e. Conduct tests of hypotheses to determine which of the independent variables have significant regression coefficients. Which variables would you consider eliminating? Use the .05 significance level.
To determine which variables are significant at the .05 level, we compare the absolute value of each t-ratio with the critical value of the t-distribution with n - k = 26 degrees of freedom and α/2 = 0.025, which is 2.06. Both X2 (t = 5.77) and X3 (t = 2.50) are significant, while X1 is not (|t| = 1.43 < 2.06). X1, the number of competitors, is therefore the variable to consider eliminating.

18. Suppose that the sales manager of a large automotive parts distributor wants to estimate as early as April the total annual sales of a region. On the basis of regional sales, the total sales for the company can also be estimated. If, based on past experience, it is found that the April estimates of annual sales are reasonably accurate, then in future years the April forecast could be used to revise production schedules and maintain the correct inventory at the retail outlets. Several factors appear to be related to sales, including the number of retail outlets in the region stocking the company's parts, the number of automobiles in the region registered as of April 1, and the total personal income for the first quarter of the year. Five independent variables were finally selected as being the most important (according to the sales manager). Then the data were gathered for a recent year. The total annual sales for that year for each region were also recorded. Note in the following table that for region 1 there were 1,739 retail outlets stocking the company's automotive parts, there were 9,270,000 registered automobiles in the region as of April 1, and so on. The sales for that year were $37,702,000.
Y = Annual Sales ($ millions)
X1 = Number of Retail Outlets
X2 = Number of Automobiles Registered (millions)
X3 = Personal Income ($ billions)
X4 = Average Age of Automobiles (years)
X5 = Number of Supervisors

Y          X1       X2      X3     X4     X5
37.702    1,739    9.27    85.4    3.5    9.0
24.196    1,221    5.86    60.7    5.0    5.0
32.055    1,846    8.81    68.1    4.4    7.0
 3.611      120    3.81    20.2    4.0    5.0
17.625    1,096   10.31    33.8    3.5    7.0
45.919    2,290   11.62    95.1    4.1   13.0
29.600    1,687    8.96    69.3    4.1   15.0
 8.114      241    6.28    16.3    5.9   11.0
20.116      649    7.77    34.9    5.5   16.0
12.994    1,427   10.92    15.1    4.1   10.0

a. Consider the following correlation matrix. Which single variable has the strongest correlation with the dependent variable? The correlations between the independent variables outlets and income and between cars and outlets are fairly strong. Could this be a problem? What is this condition called?

            sales   outlets     cars   income      age
outlets     0.899
cars        0.605     0.775
income      0.964     0.825    0.409
age        -0.323    -0.489   -0.447   -0.349
bosses      0.286     0.183    0.395    0.155    0.291

The independent variable income has the highest correlation with the dependent variable sales, with a correlation of 0.964. When independent variables are strongly correlated with each other (as opposed to with the dependent variable), the condition is called multicollinearity. This can be a problem because multicollinearity causes the coefficients of the correlated predictors to change dramatically with small changes in the data, which makes those coefficients unreliable.

b. The output for all five variables is on the following page. What percent of the variation is explained by the regression equation?
The regression equation is
sales = -19.7 - 0.00063 outlets + 1.74 cars + 0.410 income + 2.04 age - 0.034 bosses

Predictor        Coef      StDev    t-ratio
Constant      -19.672      5.422      -3.63
outlets     -0.000629   0.002638      -0.24
cars           1.7399     0.5530       3.15
income        0.40994    0.04385       9.35
age            2.0357     0.8779       2.32
bosses        -0.0344     0.1880      -0.18

Analysis of Variance
SOURCE        DF        SS        MS
Regression     5   1593.81    318.76
Error          4      9.08      2.27
Total          9   1602.89

R^2 = 1593.81/1602.89 = .994, so about 99% of the variation in sales is accounted for by the regression equation.
c. Conduct a global test of hypothesis to determine whether any of the regression coefficients are not zero. Use the .05 significance level.
Ho: β1 = β2 = β3 = β4 = β5 = 0 H1: Not all of the βs are 0 α = .05
F = MSreg/MSerr = 318.76/2.27 = 140.42
The .05 critical value of F with (5, 4) degrees of freedom is 6.26, and the p-value is less than .001.
At the .05 level of significance, we reject Ho because p-value (< .001) < α (.05). There is enough sample evidence to show that not all of the regression coefficients are equal to zero.
d. Conduct a test of hypothesis on each of the independent variables. Would you consider eliminating outlets and bosses? Use the .05 significance level.

Ho: β1 = 0 H1: β1 ≠ 0 (outlets) p-value = 0.8149
Ho: β2 = 0 H1: β2 ≠ 0 (cars) p-value = 0.0345
Ho: β3 = 0 H1: β3 ≠ 0 (income) p-value = 0.0007
Ho: β4 = 0 H1: β4 ≠ 0 (age) p-value = 0.0829
Ho: β5 = 0 H1: β5 ≠ 0 (bosses) p-value = 0.8733
At the .05 level of significance, the cars and income coefficients are significant. The age coefficient is not significant in this five-variable model (p-value .083 > .05), and the outlets and bosses coefficients are far from significant, with p-values above .80. Thus, we should consider eliminating outlets and bosses first and then re-examining the remaining variables.
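The individual tests can also be carried out by comparing each t-ratio from the five-variable output with a critical value. The sketch below uses the t-ratios reported in the regression output and the two-tailed .05 critical value for 4 error degrees of freedom, 2.776, which is taken from a t table rather than from the original answer.

```python
# t-ratios from the five-variable regression output.
t_ratios = {
    "outlets": -0.24,
    "cars": 3.15,
    "income": 9.35,
    "age": 2.32,
    "bosses": -0.18,
}

df_error = 4    # error degrees of freedom in the ANOVA table
t_crit = 2.776  # two-tailed .05 critical value for df = 4 (from a t table)

significant = sorted(name for name, t in t_ratios.items() if abs(t) > t_crit)
not_significant = sorted(name for name in t_ratios if name not in significant)

print("significant:    ", significant)
print("not significant:", not_significant)
```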

e. The regression has been rerun below with outlets and bosses eliminated. Compute the coefficient of determination. How much has R2 changed from the previous analysis?
The regression equation is
sales = -18.9 + 1.61 cars + 0.400 income + 1.96 age

Predictor       Coef      StDev    t-ratio
Constant     -18.924      3.636      -5.20
cars          1.6129     0.1979       8.15
income       0.40031    0.01569      25.52
age           1.9637     0.5846       3.36

Analysis of Variance
SOURCE        DF        SS        MS
Regression     3   1593.66    531.22
Error          6      9.23      1.54
Total          9   1602.89

R^2 = 1593.66/1602.89 = 0.994. The R^2 is essentially unchanged when the two variables are omitted from the regression: it falls by only about 0.0001, so about 99% of the variation is still explained by the regression equation.
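The comparison of the two models follows directly from the sums of squares in the two ANOVA tables. This is a minimal sketch using those reported figures; the helper name r_squared is illustrative.

```python
def r_squared(ss_regression: float, ss_total: float) -> float:
    """Coefficient of determination from an ANOVA table."""
    return ss_regression / ss_total


ss_total = 1602.89

# Full model: outlets, cars, income, age, bosses.
r2_full = r_squared(1593.81, ss_total)
f_full = (1593.81 / 5) / (9.08 / 4)   # global F statistic, (5, 4) df

# Reduced model: cars, income, age.
r2_reduced = r_squared(1593.66, ss_total)
f_reduced = (1593.66 / 3) / (9.23 / 6)  # global F statistic, (3, 6) df

print(f"R^2 full    = {r2_full:.4f}, F = {f_full:.2f}")
print(f"R^2 reduced = {r2_reduced:.4f}, F = {f_reduced:.2f}")
print(f"change in R^2 = {r2_full - r2_reduced:.5f}")
```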

f. Following is a histogram and a stem-and-leaf chart of the residuals. Does the normality assumption appear reasonable?

Histogram of residual N = 10
Midpoint   Count
   -1.5        1   *
   -1.0        1   *
   -0.5        2   **
   -0.0        2   **
    0.5        2   **
    1.0        1   *
    1.5        1   *

Stem-and-leaf of residual N = 10
Leaf Unit = 0.10
  1   -1   7
  2   -1   2
  2   -0
  5   -0   440
  5    0   24
  3    0   68
  1    1
  1    1   7

Yes. The residuals are roughly symmetric about zero with no extreme values, so the normality assumption appears reasonable, although 10 residuals are too few for a firm conclusion.
g. Following is a plot of the fitted values of Y (i.e., Y-hat) and the residuals. Do you see any violations of the assumptions?
The residuals appear to be scattered randomly about the fitted values, so it does not appear that any assumptions have been violated.

Chapter 17
22. Banner Mattress and Furniture Company wishes to study the number of credit applications received per day for the last 300 days. The information is reported below.
Number of Credit Applications    Frequency (Number of Days)
0                                50
1                                77
2                                81
3                                48
4                                31
5 or more                        13

To interpret, there were 50 days on which no credit applications were received, 77 days on which only one application was received, and so on. Would it be reasonable to conclude that the population distribution is Poisson with a mean of 2.0? Use the .05 significance level. Hint: To find the expected frequencies use the Poisson distribution with a mean of 2.0. Find the probability of exactly one success given a Poisson distribution with a mean of 2.0. Multiply this probability by 300 to find the expected frequency for the number of days in which there was exactly one application. Determine the expected frequency for the other days in a similar manner.
Ho: the distribution is Poisson with a mean of 2.0 H1: the distribution is not Poisson with a mean of 2.0

# of apps   Obs Freq   Poisson Prob   Expected       (O-E)^2/E
0           50         0.135335283    40.60058497    2.176052
1           77         0.270670566    81.20116994    0.217359
2           81         0.270670566    81.20116994    0.000498
3           48         0.180447044    54.13411329    0.695076
4           31         0.090223522    27.06705665    0.571471
5+          13         0.052653017    15.7959052     0.494881

X2 = 4.155338 df = 6 - 1 = 5 p-value = 0.5273 (the area above 4.155338 under the chi-square distribution with 5 degrees of freedom; the .05 critical value is 11.070)
At the .05 level of significance, we fail to reject Ho because p-value (.5273) > α (.05). There is not enough sample evidence to reject the claim that the distribution is a Poisson distribution with a mean of 2.0, so it is reasonable to conclude that it is.
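The goodness-of-fit calculation can be sketched with the standard library alone. The expected frequencies follow the hint in the problem (Poisson probabilities with a mean of 2.0, times 300 days). The upper-tail function below is a closed form that is valid only for exactly 5 degrees of freedom and is included as an assumption-laden convenience; a statistics package would normally supply this value.

```python
from math import erfc, exp, factorial, pi, sqrt

observed = [50, 77, 81, 48, 31, 13]  # days with 0, 1, 2, 3, 4, 5+ applications
mean, days = 2.0, 300


def poisson_pmf(x: int, m: float) -> float:
    """Poisson probability of exactly x events with mean m."""
    return m**x * exp(-m) / factorial(x)


# Probabilities for 0..4, then "5 or more" as the remainder.
probs = [poisson_pmf(x, mean) for x in range(5)]
probs.append(1 - sum(probs))
expected = [days * p for p in probs]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # 5; the mean was specified, not estimated


def chi2_upper_tail_df5(x: float) -> float:
    """Area above x under the chi-square distribution with exactly 5 df."""
    root = sqrt(x)
    return erfc(root / sqrt(2)) + sqrt(2 / pi) * exp(-x / 2) * (root + root**3 / 3)


p_value = chi2_upper_tail_df5(chi_sq)
print(f"chi-square = {chi_sq:.6f}, df = {df}, p-value = {p_value:.4f}")
```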
