Instructors Solutions Manual - Chapter 13

Chapter 13 Solutions

Develop Your Skills 13.1

1. The scatter diagram is shown below.

[Figure: Hendrick Software Sales. Scatter diagram of total sales ($000) against number of sales contacts, with fitted line y = 6.6519x + 4.7013.]

The least-squares regression line is:
total sales ($000) = 6.6519(number of sales contacts) + 4.7013

Interpretation: Each new sales contact results in an increase in sales of approximately $6,652. The y-intercept should not be interpreted, since the sample data did not contain any observations of 0 sales contacts.

2. The equation of the least-squares regression line is:
monthly spending on restaurant meals = 0.024144(monthly income) + $44.90

Interpretation: Each new dollar in monthly income increases spending on restaurant meals by about 2.4 cents.
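The two interpretations above are unit conversions of the reported slopes, and they can be checked directly. The short Python sketch below uses only the coefficients reported in Exercises 1 and 2; the function and variable names are illustrative and are not part of the original solution.

```python
# Slopes as reported in Exercises 1 and 2 (no other data are assumed).
SALES_SLOPE_THOUSANDS = 6.6519        # change in sales ($000) per extra sales contact
RESTAURANT_SLOPE_DOLLARS = 0.024144   # change in restaurant spending ($) per extra $1 of income


def slope_in_new_units(slope, units_per_original_unit):
    """Re-express a slope when the response variable is rescaled."""
    return slope * units_per_original_unit


# Sales are recorded in $000, so multiply by 1,000 to get dollars.
sales_slope_dollars = slope_in_new_units(SALES_SLOPE_THOUSANDS, 1000)

# Restaurant spending is recorded in dollars, so multiply by 100 to get cents.
restaurant_slope_cents = slope_in_new_units(RESTAURANT_SLOPE_DOLLARS, 100)

print(f"Each extra sales contact: about ${sales_slope_dollars:,.0f} in sales")
print(f"Each extra dollar of income: about {restaurant_slope_cents:.1f} cents on restaurant meals")
```

Stating the units of both variables before interpreting a slope helps avoid reading 6.6519 as dollars or 0.024144 as a percentage.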

Copyright 2011 Pearson Canada Inc.


3. A scatter diagram is shown below.

[Figure: Smith and Klein Manufacturing. Scatter diagram of sales against promotion expenditure, with fitted line y = 30.21x - 148770.]

The least-squares regression line is:
annual sales = 30.21(annual promotion spending) - $148,770

Interpretation: Each new dollar in promotion spending results in an increase in annual sales of approximately $30.21. The y-intercept should not be interpreted, since the sample data did not contain any observations of $0 annual promotion spending.

4. The response variable is the semester average mark, and the explanatory variable is the total number of hours spent working during the semester. The relationship is unlikely to be positive. The equation y = 0.1535x + 90.241 suggests that a student who worked no hours would get a mark of 90%, which seems a little high (but it may not be reasonable to interpret the intercept this way, depending on the range of hours worked in the sample data). It also suggests that for each hour worked, the student's mark would increase by 0.1535, which seems unlikely. It is more likely that the student's mark would decrease for each hour worked.


5. Because of the way the researcher has posed the question, the response variable is revenues, and the explanatory variable is the number of employees. The scatter diagram is shown below:

[Figure: Top 25 Global Research Organizations, 2007. Scatter diagram of global research revenues (US$ millions) against full-time employees, with fitted line y = 0.1338x + 140.56.]

The least-squares regression line is:
revenue (US$ millions) = 0.1338(number of full-time employees) + US$140.56 million

Interpretation: Each additional full-time employee results in increased revenue of US$0.1338 million (or US$133,800). The y-intercept should not be interpreted, since the sample data did not contain any observations of 0 employees.


Develop Your Skills 13.2

6. The scatter diagram showed an apparently linear relationship between software sales and the number of sales contacts (see Develop Your Skills 13.1, Exercise 1).

[Figure: Number of Sales Contacts Residual Plot. Residuals plotted against number of sales contacts.]

The residual plot shows residuals centred on zero, with fairly constant variability. There is no indication that the error terms are not independent. The data were collected over a random sample of months, but the dates of collection are not included, so it is not possible to check for independence of the residuals over time. A histogram of the residuals appears to be approximately normal.

[Figure: Hendrick Software Sales Residuals. Histogram of the residuals (frequency).]


A check of the scatter diagram and the standardized residuals does not reveal any outliers. There are no obvious influential observations. It appears that the sample data meet the requirements of the theoretical model.

7. The scatter diagram does not contain much of a pattern, but if there is a relationship, it appears to be linear.

[Figure: Spending on Restaurant Meals and Income. Scatter diagram of monthly spending on restaurant meals against monthly income, with fitted line y = 0.0241x + 44.903.]

[Figure: Monthly Income Residual Plot. Residuals plotted against monthly income.]

The residual plot shows a fairly constant variability, although the residuals appear to be a little larger on the positive side (except in the area of monthly incomes of around $3,500). There is no obvious dependence among the residuals.

A histogram of the residuals appears to be approximately normal.

[Figure: Residuals for Model of Restaurant Spending and Monthly Income. Histogram of the residuals (frequency).]

A check of the scatter diagram and the standardized residuals reveals six points that could be considered outliers. They are circled on the scatter diagram below.

[Figure: Spending on Restaurant Meals and Income. Scatter diagram of monthly spending on restaurant meals against monthly income, with the possible outliers circled.]


The presence of so many outliers is a cause for concern. [If we had access to the original data set, we would check to see that these observations were accurate.] These outliers obviously increase the variability of the error terms. Even if the data points identified as outliers are correct, they are an indication that the model will probably not be very useful for prediction purposes. There are two points in the data set that may be influential observations. They are indicated in the scatter diagram below.

[Figure: Spending on Restaurant Meals and Income. Scatter diagram of monthly spending on restaurant meals against monthly income, with the two possibly influential observations indicated.]

To investigate, each point is removed from the data set, to see the effect on the least-squares regression line. The least-squares line for the original sample data set was y = 0.0241x + 44.903. Without the circled point on the right-hand side, the equation changes to y = 0.0214x + 50.639, which is not that much of a change, relatively speaking. Similarly, the outlier at (1258.97, 154.68) could be having a large effect on the least-squares line. Removing it changes the equation to y = 0.0262x + 39.292, which has more of an effect. Still, neither point appears to be affecting the regression relationship by a large amount (relatively speaking).
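The remove-and-refit check described above can be sketched in a few lines of Python. The original restaurant data set is not reproduced in this manual, so the `income` and `spending` values below are hypothetical and chosen only to make the effect easy to see; the function names are also illustrative.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def refit_without(xs, ys, index):
    """Refit the line after dropping the observation at position `index`."""
    return fit_line(xs[:index] + xs[index + 1:], ys[:index] + ys[index + 1:])


# Hypothetical data (NOT the textbook data): the first point has low income
# but unusually high restaurant spending.
income = [1200.0, 1800.0, 2200.0, 2600.0, 3000.0, 3400.0, 4000.0]
spending = [150.0, 85.0, 100.0, 105.0, 120.0, 125.0, 140.0]

full_slope, full_intercept = fit_line(income, spending)
loo_slope, loo_intercept = refit_without(income, spending, 0)

print(f"all points:          y = {full_slope:.4f}x + {full_intercept:.2f}")
print(f"without first point: y = {loo_slope:.4f}x + {loo_intercept:.2f}")
```

In this made-up example the slope changes a great deal when the first point is dropped, which is what an influential observation looks like. In the exercise above, the corresponding changes were comparatively small.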


However, at this point in the analysis, it would be useful to go back to the beginning. It does not appear that monthly income is a strong predictor of monthly restaurant spending. There is too much variability in the restaurant spending data, for the various income levels, for us to develop a useful model.

8. The scatter diagram shows the points arranged in a linear fashion. However, the scatter around the regression line appears to widen as the amount of promotional spending increases. This shows quite clearly in the residual plot.

[Figure: Promotion Expenditure Residual Plot. Residuals plotted against promotion expenditure.]

At this point, it is clear that the data do not meet the requirements of the theoretical model. [For completeness, we will continue to check the other requirements.]


This is time-series data, and so the residuals should be plotted against time. The resulting plot shows a definite pattern over time, with the residuals widening in more recent years. This again indicates a problem; the current model does not meet the requirements of the theoretical model.

[Figure: Residuals Over Time, Smith and Klein Manufacturing. Residuals plotted by year, 1980 to 2010.]

At this point, it is clear that the model should be re-specified. Introducing time as an explanatory variable would probably be of interest.


9. With the two erroneous data points removed, the scatter diagram looks as shown below.

[Figure: Hours of Work and Semester Marks. Scatter diagram of semester average mark against total hours at paid job during semester, with fitted line y = -0.144x + 89.175.]

The relationship appears to be linear. The residual plot is shown below.

[Figure: Total Hours at Paid Job During Semester Residual Plot. Residuals plotted against total hours at paid job during semester.]

The residuals appear centred on zero, with fairly constant variability, although variability seems greatest in the middle of the range of hours worked.


There is no indication that the residuals are dependent. A histogram of the residuals is shown below.

[Figure: Residuals for Semester Mark and Hours of Work Data. Histogram of the residuals (frequency).]

The histogram is quite normal in shape. A check of the standardized residuals does not reveal any that are ≤ -2 or ≥ +2, although there is one observation with a standardized residual of -1.99. This is the observation (72, 65). [If we could, we would check this data point to make sure that it is accurate.] This point is quite obvious in both the scatter diagram and residual plot (the point is circled in these two graphs). There are no obvious influential observations, except perhaps for the almost-outlier. Removing this point from the data set does not affect the least-squares regression line significantly. Despite the one troublesome point, the data set does appear to meet the requirements of the theoretical model.
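The outlier screen used here (flagging standardized residuals at or beyond -2 or +2) can be sketched as follows. This is a minimal illustration on hypothetical data, not the semester-mark data set. It standardizes each residual by dividing by the standard error of the estimate, which is one common definition; spreadsheet output may use a slightly different divisor, so values can differ a little from those a given tool reports.

```python
import math


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def standardized_residuals(xs, ys):
    """Residuals divided by the standard error of the estimate, sqrt(SSE / (n - 2))."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    std_error = math.sqrt(sum(e * e for e in residuals) / (len(xs) - 2))
    return [e / std_error for e in residuals]


# Hypothetical data: points on the line y = 2x + 1, except the fifth point,
# whose y-value has been raised from 9 to 20.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [1.0, 3.0, 5.0, 7.0, 20.0, 11.0, 13.0, 15.0, 17.0, 19.0]

z_scores = standardized_residuals(xs, ys)
flagged = [i for i, z in enumerate(z_scores) if abs(z) >= 2]
print("observations flagged as possible outliers:", flagged)
```

A flagged observation is a prompt to check the data point, not a reason to delete it automatically.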


10. The relationship between revenues and number of employees appears to be linear. The residual plot is shown below.

[Figure: Full-Time Employees Residual Plot. Residuals plotted against full-time employees.]

The residuals do not appear to be centred on zero, and the variability is not constant. At this point, it appears that this sample data set does not meet the requirements of the theoretical model. A histogram of the residuals is shown below.

[Figure: Residuals for Top 25 Global Research Organizations. Histogram of the residuals (frequency).]


The histogram of residuals confirms what we saw in the residual plot. The residuals are highly skewed to the right. There is one observation with a standardized residual of 3.8. The corresponding point is circled on the residual plot above.

Develop Your Skills 13.3

11. Since the sample data meet the requirements, it is acceptable to proceed with the hypothesis test.
H0: β1 = 0 (that is, there is no linear relationship between the number of sales contacts and sales)
H1: β1 > 0 (that is, there is a positive linear relationship between the number of sales contacts and sales)
α = 0.05
From the Excel output, t = 7.64. The p-value is 9.38E-08, which is very small. The p-value for the one-tailed test is only half of this value, and is certainly < α. In other words, there is almost no chance of getting sample results like these, if in fact there is no linear relationship between the number of sales contacts and sales. Therefore, we can (with confidence) reject the null hypothesis and conclude there is evidence of a positive linear relationship between the number of sales contacts and sales for the Hendrick Software Sales Company.

12. We already expect that the model will not be particularly useful. The number of data points with standardized residuals ≥ +2 or ≤ -2 is a concern. However, the hypothesis test provides some evidence that there is a linear relationship between monthly income and monthly spending on restaurant meals.
H0: β1 = 0 (that is, there is no linear relationship between monthly income and monthly spending on restaurant meals)
H1: β1 > 0 (that is, there is a positive linear relationship between monthly income and monthly spending on restaurant meals)
α = 0.05
From the Excel output, t = 4.6. The p-value on the output is 1.338E-05, and the p-value for the one-tailed test is half of this. Reject H0 and conclude there is evidence of a positive linear relationship between monthly income and monthly spending on restaurant meals.

13. Since the sample data do not meet the requirements of the theoretical model, it is not appropriate to conduct a hypothesis test.
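Exercises 11 and 12 convert the two-tailed p-value printed in the Excel output into a one-tailed p-value by halving it. That shortcut is valid only when the sign of the t statistic agrees with the direction stated in H1. The sketch below makes that condition explicit; the function name and the `alternative` labels are illustrative choices, and the inputs are the values quoted in the two exercises.

```python
def one_tailed_p(two_tailed_p, t_stat, alternative):
    """Convert a two-tailed p-value for a slope into a one-tailed p-value.

    alternative: "greater" for H1: slope > 0, "less" for H1: slope < 0.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    sign_agrees = t_stat > 0 if alternative == "greater" else t_stat < 0
    # If the sample slope points the wrong way, the one-tailed p-value is large.
    return two_tailed_p / 2 if sign_agrees else 1 - two_tailed_p / 2


ALPHA = 0.05

# Values quoted from the Excel output in Exercises 11 and 12.
p_ex11 = one_tailed_p(9.38e-08, 7.64, "greater")
p_ex12 = one_tailed_p(1.338e-05, 4.6, "greater")

for label, p in (("Exercise 11", p_ex11), ("Exercise 12", p_ex12)):
    decision = "reject H0" if p < ALPHA else "do not reject H0"
    print(f"{label}: one-tailed p-value = {p:.3g}, {decision}")
```

Both one-tailed p-values are far below 0.05, which matches the decisions reached in the exercises.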


14. Since the sample data meet the requirements, it is acceptable to proceed with the hypothesis test.
H0: β1 = 0 (that is, there is no linear relationship between the number of hours worked during the semester and the semester average grade)
H1: β1 < 0 (that is, there is a negative linear relationship between the number of hours worked during the semester and the semester average grade)
α = 0.05
From the Excel output, t = -10.01. The p-value is 2.47086E-12, which is very small. The p-value for the one-tailed test is only half of this value, and is certainly < α. In other words, there is almost no chance of getting sample results like these, if in fact there is no linear relationship between the number of hours worked during the semester and the semester average grade. Therefore, we can (with confidence) reject the null hypothesis and conclude there is evidence of a negative linear relationship between the number of hours worked during the semester and the semester average grade.

15. Since the sample data do not meet the requirements of the theoretical model, it is not appropriate to conduct a hypothesis test.

Develop Your Skills 13.4

16. From the Excel output, R² = 0.72. This means that 72% of the variation in sales is explained by the number of sales contacts. This suggests a fairly strong linear association between the two variables, which is not surprising. Assuming the original data were collected correctly, it is possible that the other factors affecting sales have been randomized. In such a case, it would seem reasonable to conclude that increasing sales contacts would lead to increased sales. However, there will likely be limits to the positive impact that could be created. Presumably, salespeople contact their best prospective clients first, so additional contacts may not be as productive. As well, increasing the number of contacts may reduce the quantity of time spent with each contact, which could have a detrimental effect on sales.

17. The R² value for this data set is only 0.18. This is not surprising, because the scatter diagram of the relationship revealed scarcely any perceivable pattern. Only 18% of the variation in monthly spending on restaurant meals is explained by income. Earlier investigations suggested this model was not worth pursuing, and the low R² value reinforces that.

18. The R² value is fairly high, at 0.83. This means that 83% of the variation in Smith and Klein's sales is explained by sales promotion spending. However, while there is a strong association between the two variables, the linear regression model is not a good one.


19. The R² value, at 0.72, suggests that 72% of the variation in semester average marks is explained by hours spent working during the semester. (Note that this is for the amended data set, where the two erroneous grades have been removed; see Develop Your Skills 13.2, Exercise 9.) Obviously, there are many factors that affect semester average marks, for example, ability, study habits, past educational experience, and so on. If the original data were collected in a truly random fashion, these factors may have been randomized. It seems reasonable to conclude that students who work less will have more time for their studies, and it seems reasonable to think that marks improve with time spent studying. However, this data set does not guarantee that reducing work will lead to improved marks.

20. The R² value is 0.93. Notice that this value looks very promising. Remember, though, that the data did not meet the requirements of the theoretical model. A high R² value does not guarantee a cause-and-effect relationship, or a useful model.

Develop Your Skills 13.5

21. Since the requirements are met, it is appropriate to create a confidence interval. The Excel output is shown below.

Confidence Interval and Prediction Interval Calculations
Confidence level: 98%
Point number: 1
Number of sales contacts: 10

                      Lower limit   Upper limit
Prediction interval   44.96826      97.471443
Confidence interval   66.068659     76.37104

With 98% confidence, the interval ($66,069, $76,371) contains the average sales for 10 sales contacts.

22. We have already established this is not a good model. However, even if it were a good model, we would not use it to predict monthly spending on restaurant meals based on a monthly income of $6,000. The highest monthly income in the sample data set is $4,056, and so we should not rely on our model to make predictions for a monthly income of $6,000.

23. Since the requirements are not met, it is not appropriate to create a confidence interval.


24. The Excel output is shown below (note that this is for the amended data set, where the two erroneous grades have been removed; see Develop Your Skills 13.2, Exercise 9).

Confidence Interval and Prediction Interval Calculations
Confidence level: 95%
Point number: 1
Total hours at paid job during semester: 200

                      Lower limit   Upper limit
Prediction interval   46.027952     74.74128231
Confidence interval   58.1586403    62.61059452

With 95% confidence, the interval (58.2, 62.6) contains the mean semester average mark, when students work 200 hours in paid employment during the semester.

25. Since the requirements are not met, it is not appropriate to construct a prediction interval.

Chapter Review Exercises

1. The hypothesis test is only valid if the required conditions are met. If you don't check conditions, you may rely on a hypothesis test when it is misleading.

2. Regression prediction intervals are wider than confidence intervals because the interval has to account for the distribution of y-values around the regression line. The regression confidence interval has to take into account only that the sample regression line may not match the true population regression line.

3. A lower standard error means that confidence and prediction intervals will be narrower. Predictions made with the model will therefore be more useful.

4. You should not make predictions outside the range of the sample data on which the regression relationship is based because the relationship may be very different there. For example, a linear model may provide a good approximation of a portion of a relationship that is actually a curved line. However, if the line is extended beyond this portion, it could be quite misleading.

5. It is always tempting to just remove problem data points. However, if you do this, you will often find that the remaining data points also have outliers. If you persist in the practice of removing troublesome data points, you may not have much data left! Careful thinking is a better approach. The outlier may be telling you something really important about the actual relationship between the explanatory and response variables. You wouldn't want to miss this important clue to what is really going on.


6. The scatter diagram is shown below.

[Figure: List Price and Odometer Reading for 2006 Honda Civic Sedan (as of Fall 2008). Scatter diagram of list price against odometer reading, with fitted line y = -0.0374x + 18017.]

The relationship is:
list price ($) = -0.0374(odometer reading in kilometres) + $18,017

For this small car, the base asking price is $18,017, which is reduced by about 3.7 cents for every kilometre on the odometer. However, note that this base asking price should not be trusted for any cars with fewer than 8,600 kilometres, since no cars in the data set had odometer readings below that.


7. We have already examined the scatter diagram, which suggests a negative linear relationship. The residual plot is shown below. It has the desired appearance of constant variability, with the residuals centred on zero.

[Figure: Odometer Residual Plot. Residuals plotted against odometer reading.]

A histogram of the residuals is shown below. The histogram is not perfectly normally-distributed, but it is approximately so.

[Figure: Residuals for Honda Civic List Price Model, Based on Odometer. Histogram of the residuals (frequency).]


There are no standardized residuals ≥ +2 or ≤ -2. It appears the sample data meet the requirements of the theoretical model, and so it would be appropriate to use odometer readings to predict the list prices of these used cars. A 95% prediction interval for the list price for one of these cars with 50,000 kilometres on the odometer is ($12,683, $19,608). The Excel output is shown below.

Confidence Interval and Prediction Interval Calculations
Confidence level: 95%
Point number: 1
Odometer: 50000

                      Lower limit   Upper limit
Prediction interval   12683.4909    19607.9242
Confidence interval   15259.8312    17031.584
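A quick consistency check on output like this: both intervals should be centred on the value the regression line gives at 50,000 kilometres. The sketch below uses only the rounded coefficients from Exercise 6 and the limits reported above; the helper names are illustrative. Because the slope is rounded to four decimal places, agreement to within a dollar or two is the most that can be expected.

```python
# Rounded coefficients from Exercise 6 and interval limits from the Excel output.
SLOPE = -0.0374          # dollars of list price per kilometre
INTERCEPT = 18017.0      # dollars
ODOMETER_KM = 50000

PREDICTION_INTERVAL = (12683.4909, 19607.9242)
CONFIDENCE_INTERVAL = (15259.8312, 17031.584)


def midpoint(interval):
    """Centre of an interval given as (lower, upper)."""
    lower, upper = interval
    return (lower + upper) / 2


def half_width(interval):
    """Half the distance between the limits of an interval."""
    lower, upper = interval
    return (upper - lower) / 2


fitted_price = SLOPE * ODOMETER_KM + INTERCEPT
pi_mid = midpoint(PREDICTION_INTERVAL)
ci_mid = midpoint(CONFIDENCE_INTERVAL)

print(f"fitted list price from the rounded equation: {fitted_price:,.2f}")
print(f"midpoint of prediction interval:             {pi_mid:,.2f}")
print(f"midpoint of confidence interval:             {ci_mid:,.2f}")
print(f"half-widths (prediction, confidence):        "
      f"{half_width(PREDICTION_INTERVAL):,.2f}, {half_width(CONFIDENCE_INTERVAL):,.2f}")
```

If the two midpoints disagreed with each other, or were far from the fitted value, that would suggest the intervals had been computed for a different x-value or a different model.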

8. A scatter diagram showing the two stock market indexes is shown below. Note that the data used are the "adjusted close" figures. You must take care to match the dates; there are a few instances when one market is open and the other is not. Observations that did not have a match were removed from the data set.

[Figure: TSX and DJI, January-June, 2009. Scatter diagram of the S&P/TSX Composite Index against the Dow Jones Industrial Average, with fitted line y = 1.2553x - 894.84.]

The estimated relationship is as follows: TSX Composite Index = 1.255(DJI) - 895


Note that the choice of variable on the x or y axis is somewhat arbitrary here. Because Canada's economy is so dependent on exports to the US, the DJI is placed as the "explanatory" variable, but the cause and effect is not direct.

9. The coefficient of determination for the TSX and the DJI over the first six months of 2009 is 0.72. This measure suggests that 72% of the variation in the TSX is explained by variation in the DJI.

10. This data set is not a random sample, because it includes all matched observations over the period studied. Could this be considered a random sample? Probably not. The credit crisis and the recession that were having impacts on the stock markets in the first six months of 2009 made this period unreliable as a model of how the two indexes behave during more normal times. However, it is interesting to examine the patterns in the indexes over the period. The indexes were more closely related at the beginning of 2009 than they were later in the period. A time-series plot reveals this quite clearly.

[Figure: TSX and DJI, January-June 2009. Time-series plot of index values for the DJI and the TSX, 2 January 2009 to 19 June 2009.]

The required conditions are not met (as we might expect, given the graph above).


The residual plot clearly shows non-constant variability.

[Figure: DJI Residual Plot. Residuals plotted against the DJI.]

As well, the histogram of residuals shows marked negative skewness.

[Figure: Residuals, TSX and DJI Data, January-June 2009. Histogram of the residuals (frequency).]


A plot of the residuals over time clearly shows a time-related pattern.

[Figure: Residuals Over Time, TSX and DJI Data, January-June 2009. Residuals plotted by date, 2 January 2009 to 19 June 2009.]


11. A scatter diagram is shown below.

[Figure: Student Marks in Statistics. Scatter diagram of mark on final exam against mark on Test #2, with fitted line y = 0.9586x + 0.4464.]

The estimated relationship is as follows: Mark on final exam = 0.9586 (Mark on Test #2) + 0.4464 In other words, it appears the mark on the final exam is about 96% of the mark on Test #2.


12. The residual plot has the desired appearance.

[Figure: Mark on Test #2 Residual Plot. Residuals plotted against mark on Test #2.]

A histogram of the residuals appears approximately normally-distributed.

[Figure: Residuals for Final Exam Marks Prediction Model. Histogram of the residuals (frequency).]

There are no obvious influential observations or outliers. It appears that the sample data conform to the requirements of the theoretical model.


13. Since the sample data meet the requirements, it is acceptable to proceed with the hypothesis test.
H0: β1 = 0 (that is, there is no linear relationship between the mark on Test #2 and the final exam mark in Statistics)
H1: β1 > 0 (that is, there is a positive linear relationship between the mark on Test #2 and the final exam mark in Statistics)
α = 0.05
From the Excel output, t = 16.5. The p-value is 2.96E-14, which is very small. The p-value for the one-tailed test is only half of this value, and is certainly < 5%. In other words, there is almost no chance of getting sample results like these, if in fact there is no linear relationship between the mark on Test #2 and the final exam mark in Statistics. Therefore, reject H0 and conclude there is strong evidence of a positive linear relationship between the mark on Test #2 and the final exam mark in Statistics.

14a. The Excel output is shown below.

                      Lower limit    Upper limit
Prediction interval   51.78719489    73.7293732
Confidence interval   60.5028627     65.013705

The 95% confidence interval estimate for the average exam mark of students who had a mark of 65% on the second test in the Statistics course is (60.5, 65.0).

b. The 95% prediction interval estimate for the exam mark of a student who had a mark of 65% on the second test in the Statistics course is (51.8, 73.7). This interval is wider, because it has to take into account the variability in individual marks of the students.

c. The regression prediction interval is always wider than the confidence interval. The prediction interval has to take account of the distribution of exam marks around the regression line.
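The extra width of the prediction interval can be quantified from the reported limits. Under the standard formulas for simple linear regression (assumed here; the Excel template that produced the output is not shown), the squared half-width of the prediction interval equals the squared half-width of the confidence interval plus (t × s)², where s is the standard error of the estimate and t is the critical value. The sketch below recovers that extra component from the limits in Exercise 14; the variable names are illustrative.

```python
import math

# Interval limits reported in the Excel output for a Test #2 mark of 65.
PREDICTION_INTERVAL = (51.78719489, 73.7293732)
CONFIDENCE_INTERVAL = (60.5028627, 65.013705)


def half_width(interval):
    """Half the distance between the limits of an interval given as (lower, upper)."""
    lower, upper = interval
    return (upper - lower) / 2


pi_half = half_width(PREDICTION_INTERVAL)
ci_half = half_width(CONFIDENCE_INTERVAL)

# Assumption: pi_half**2 = ci_half**2 + (t * s)**2, as in the textbook formulas.
scatter_half = math.sqrt(pi_half ** 2 - ci_half ** 2)

print(f"half-width of prediction interval: {pi_half:.2f} marks")
print(f"half-width of confidence interval: {ci_half:.2f} marks")
print(f"part due to scatter of individual marks (t * s): {scatter_half:.2f} marks")
```

Almost all of the prediction interval's width comes from the scatter of individual exam marks around the line, not from uncertainty about where the line is.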


15. A scatter diagram of the data is shown below.

[Figure: Aries Car Parts. Scatter diagram of auditor's inventory value against recorded parts inventory value, with fitted line y = 0.9806x + 25.233.]

If the inventory records are generally accurate, we would expect the slope of the regression line to be very close to 1, as it appears to be. It appears there is a strong positive relationship between the recorded inventory value and the audited inventory value. The relationship is as follows: auditor's inventory value = 0.9806(recorded parts inventory value) + $25.23


16. As the scatter diagram created for Exercise 15 indicates, there appears to be a fairly strong positive linear relationship between the recorded and audited inventory values. The residual plot is shown below.

[Figure: Recorded Parts Inventory Value Residual Plot. Residuals plotted against recorded parts inventory value, with two residuals circled.]

The residual plot shows residuals fairly randomly distributed around zero, with about the same variability for all x-values. There are two residuals that show unusual variability. They are circled in the plot. The data were all collected at about the same point in time, so there is no need to check residuals against time. A review of the standardized residuals reveals two outliers, observation #1 and observation #25 (these are the two points that are circled in the residual plot). Since the auditor has realized that he misread the written records for both data points, we will amend the data, and re-do the analysis.


The new scatter diagram is as shown below.

[Figure: Aries Car Parts (amended data). Scatter diagram of auditor's inventory value against recorded parts inventory value, with fitted line y = 0.9783x + 25.227.]

The new regression relationship is as follows: audited inventory value = 0.9783(recorded inventory value) + $25.23


The residual plot for the amended data set is shown below.

[Figure: Recorded Parts Inventory Value Residual Plot (amended data). Residuals plotted against recorded parts inventory value.]

The residual plot for the amended data set looks acceptable. A histogram of the residuals for the amended data set is shown below.

[Figure: Residuals for Aries Car Parts Model. Histogram of the residuals (frequency).]

The histogram of residuals shows some positive skewness, and this is a cause for concern, suggesting caution in the use of the model.


A check of the standardized residuals does not reveal any outliers. There are no obviously influential observations. It appears the corrected data set meets the requirements for the linear regression model, although the distribution of the residuals is not as normal in shape as is desired.

17. While we have some concern about the distribution of residuals, we will proceed with the hypothesis test.
H0: β1 = 0 (that is, there is no linear relationship between the recorded inventory values and the audited inventory values)
H1: β1 ≠ 0 (that is, there is a linear relationship between the recorded inventory values and the audited inventory values)
α = 0.05
An excerpt of Excel's regression output is shown below.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.995213711
R Square             0.99045033
Adjusted R Square    0.990160946
Standard Error       16.61634358
Observations         35

ANOVA
             df   SS            MS            F
Regression    1   944994.372    944994.372    3422.616936
Residual     33   9111.394836   276.1028738
Total        34   954105.7668

                                 Coefficients   Standard Error   t Stat        P-value
Intercept                        25.22708893    8.612571593      2.929100636   0.006122286
Recorded Parts Inventory Value   0.978281557    0.016721865      58.50313612   6.47389E-35

From the Excel output, t = 58.503. The p-value is 6.47389E-35, which is very small, and certainly < 5%. In other words, there is almost no chance of getting sample results like these, if in fact there is no linear relationship between the recorded inventory values and the audited inventory values. Therefore, reject the null hypothesis and conclude there is evidence of a linear relationship between the recorded and audited inventory values.
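The entries in the Excel output are linked by simple identities, which makes it possible to cross-check the reported statistics. The sketch below recomputes R², the standard error, F, and the t statistic for the slope from the sums of squares and coefficients in the excerpt above. All input numbers are copied from that output; the variable names are illustrative.

```python
import math

# Values copied from the Excel regression output for the amended Aries Car Parts data.
SS_REGRESSION = 944994.372
SS_RESIDUAL = 9111.394836
SS_TOTAL = 954105.7668
DF_RESIDUAL = 33                 # n - 2, with n = 35 observations
SLOPE = 0.978281557
SLOPE_STD_ERROR = 0.016721865

# Coefficient of determination: share of total variation explained by the line.
r_squared = SS_REGRESSION / SS_TOTAL

# Standard error of the estimate: square root of the residual mean square.
ms_residual = SS_RESIDUAL / DF_RESIDUAL
standard_error = math.sqrt(ms_residual)

# F statistic: regression mean square (1 degree of freedom) over residual mean square.
f_stat = SS_REGRESSION / ms_residual

# t statistic for H0: slope = 0.
t_stat = SLOPE / SLOPE_STD_ERROR

print(f"R squared      = {r_squared:.6f}")
print(f"standard error = {standard_error:.4f}")
print(f"F              = {f_stat:.2f}")
print(f"t              = {t_stat:.3f}  (t squared = {t_stat ** 2:.2f})")
```

With a single explanatory variable, t² for the slope equals F, so the t test used in this exercise and the F test in the ANOVA table lead to the same conclusion.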


18. The coefficient of determination for the amended (corrected) data on actual and recorded inventory values for Aries Car Parts is 0.9905. This means that a little over 99% of the variation in the audited inventory values is explained by differences in the recorded inventory values. Such a strong relationship suggests confidence in the recorded inventory values.

19. The scatter diagram for these data is shown below.

[Scatter diagram: Revenue and Profit for a Random Sample of Top 1000 Canadian Companies, 2008. Vertical axis: Profit (000); horizontal axis: Revenue (000). Trendline: y = 5.8784x + 478280.]

Notice that the trendline is greatly influenced by the data points for the three largest organizations in the data set. If we remove these observations, the scatter diagram looks as shown below.


[Scatter diagram, three largest organizations removed: Revenue and Profit for a Random Sample of Top 1000 Canadian Companies, 2008. Vertical axis: Profit (000); horizontal axis: Revenue (000). Trendline: y = 1.3658x + 250301.]

The coefficient of determination for the full data set is 0.88, which is quite high. However, the measure is misleading. When the three largest data points are removed, the coefficient of determination is only 0.04, which seems more appropriate. A high coefficient of determination does not guarantee that a relationship is a good model, and it certainly does not in this case.
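The effect described here can be reproduced on a toy example. The sketch below is an illustration only: the data values are HYPOTHETICAL, invented to mimic the situation (they are not the company data), and `r_squared` is a helper name introduced for this example. It computes the coefficient of determination for a simple linear regression with and without three large observations.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a simple linear regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # In simple regression, R squared is the squared correlation coefficient.
    return sxy ** 2 / (sxx * syy)


# HYPOTHETICAL data, invented for illustration only.
small_x = [1, 2, 3, 4, 5, 6]      # a cluster with almost no linear pattern
small_y = [3, 1, 4, 1, 5, 2]
large_x = [50, 60, 70]            # three observations far from the cluster
large_y = [60, 55, 80]

r2_full = r_squared(small_x + large_x, small_y + large_y)
r2_trimmed = r_squared(small_x, small_y)

print(f"R squared, all points:          {r2_full:.2f}")
print(f"R squared, large points removed: {r2_trimmed:.2f}")
```

In this made-up data the three distant points dominate the total variation, so the fitted line mostly connects the cluster to those points and the coefficient of determination says little about the relationship within the cluster.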


20. A scatter diagram for the data is shown below.

[Scatter diagram: Performance of Graduates on Test Given During Job Interview. Vertical axis: Score on Test Given During Job Interview; horizontal axis: Final Overall Average Grade. Trendline: y = 0.6421x + 4.9775, R² = 0.7989.]

It appears there is a positive linear relationship between the final overall average grade and the score on the test given during the job interview. The regression relationship is as follows:
score on test given during job interview = 0.6421(final overall average grade) + 4.98
This is promising. Since the grades are marked out of 100, and the test scores are out of 70, the slope would be 0.70 if the relationship were perfect.


21. As discussed in Exercise 20 above, there appears to be a positive linear relationship between the final overall average grade and the score on the test given during the job interview. The residual plot is shown below.

[Residual plot: Final Average Mark Residual Plot. Vertical axis: Residuals; horizontal axis: Final Average Mark.]

The residuals appear randomly distributed around zero, with the same variability for all x-values. A histogram of the residuals is shown below.

[Histogram: Residuals for Test Score Model. Horizontal axis: Residual; vertical axis: Frequency.]

The residuals appear approximately normally distributed.


There are no outliers or obviously influential observations in the data set. It appears these data meet the requirements for the linear regression model.

22. Since the requirements are met, it is appropriate to test for a positive linear relationship.
H0: β1 = 0 (that is, there is no linear relationship between the final average mark and the score on the test given during the job interview)
H1: β1 > 0 (that is, there is a positive linear relationship between the final average mark and the score on the test given during the job interview)
α is not given
We are provided with only an excerpt of Excel output. However, we know that

$$t = \frac{b_1}{s_{b_1}} = \frac{0.642105}{0.060878} = 10.547$$
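As a quick check on this arithmetic, the sketch below (an illustration, not part of the original solution) recomputes the test statistic from the slope and its standard error. It also uses a standard identity for simple linear regression, R² = t² / (t² + n - 2), with the n - 2 = 28 degrees of freedom used in this solution, to confirm that the result is consistent with the value 0.7989 reported alongside the trendline in Exercise 20. The variable names are illustrative.

```python
b1 = 0.642105       # slope, from the Excel excerpt
se_b1 = 0.060878    # standard error of the slope, from the Excel excerpt
df = 28             # n - 2, the degrees of freedom used in this solution

# Test statistic for H0: beta1 = 0
t_stat = b1 / se_b1

# Identity for simple linear regression: R^2 = t^2 / (t^2 + df)
r_squared = t_stat ** 2 / (t_stat ** 2 + df)

print(f"t = {t_stat:.3f}")
print(f"implied R squared = {r_squared:.4f}")
```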

We can approximate the p-value using a t-table, with n - 2 = 28 degrees of freedom. Since t0.005 = 2.763 and the test statistic of 10.547 is far larger, we know the p-value is considerably less than 0.005. In other words, there is almost no chance of getting sample results like these, if in fact there is no linear relationship between the overall average mark of the graduate and the company test scores. Therefore, we can (with confidence) reject the null hypothesis and conclude there is evidence of a positive linear relationship.

23. Since the requirements are met, it is appropriate to create a confidence interval estimate. The Excel output is shown below.

Confidence Level (%)    98
Point Number            1
Final Average Mark      75

                       Lower limit    Upper limit
Prediction Interval    43.9917459     62.278853
Confidence Interval    51.4810585     54.78954

With 98% confidence, we estimate that the interval (51.5, 54.8) contains the average test score of graduates with an overall average mark of 75.
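Both intervals in the output are centred on the same point prediction, the regression line evaluated at a final average mark of 75. The sketch below is an illustration under stated assumptions: it takes the intercept 4.9775 from the trendline in Exercise 20 and the slope 0.642105 from Exercise 22, and the helper names `midpoint` and `half_width` are introduced only for this example.

```python
b0 = 4.9775       # intercept, from the trendline equation in Exercise 20
b1 = 0.642105     # slope, from the Excel excerpt in Exercise 22
x0 = 75           # final average mark of interest

# Point prediction from the regression line
y_hat = b0 + b1 * x0

# Interval limits copied from the Excel output above
confidence_interval = (51.4810585, 54.78954)
prediction_interval = (43.9917459, 62.278853)


def midpoint(interval):
    """Centre of an interval given as (lower, upper)."""
    return (interval[0] + interval[1]) / 2


def half_width(interval):
    """Margin of error of an interval given as (lower, upper)."""
    return (interval[1] - interval[0]) / 2


print(f"point prediction:          {y_hat:.2f}")
print(f"confidence interval:       {midpoint(confidence_interval):.2f} +/- {half_width(confidence_interval):.2f}")
print(f"prediction interval:       {midpoint(prediction_interval):.2f} +/- {half_width(prediction_interval):.2f}")
```

The prediction interval is the wider of the two because it covers the test score of an individual graduate, while the confidence interval covers only the average score for all graduates with that mark.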


24. Refer back to the output shown above in the solution to Exercise 23. With 98% confidence, we estimate that the interval (44.0, 62.3) contains the test score of a student with an overall average mark of 75. It is difficult to decide if the company should continue to administer its own test. The answer depends on how reliable a predictor of future performance the test has been, and what the costs of administering the tests have been. If the company test makes a major distinction between the predicted performance of someone with a test score of 44 and someone with a test score of 62, then the overall average grade may not be a good substitute. However, there is a fairly strong relationship between the two variables. Perhaps the company could pilot using the overall average grade with a random sample of graduates, to see how well they do.

25. No, it would not be appropriate to use package weight as a predictor of shipping cost. We can see from the residual plot that variability increases as package weight increases.

26. It is often suggested that the Canadian stock market is very closely tied to the price of oil. A data set of weekly values for the Toronto Stock Exchange Composite Index (TSX) and the Canadian spot price of oil in dollars per barrel for the period from January 2000 to June 2009 was examined. The scatter diagram (shown below) suggests that while there may be a relationship between the two variables, it is not linear.

[Scatter diagram: TSX and Canadian Oil Prices, January 2007 - June 2009. Vertical axis: S&P TSX Composite Index; horizontal axis: Weekly Canadian Par Spot Price (Dollars per Barrel). Trendline: y = 76.584x + 6039.5, R² = 0.6902.]


The non-linearity is evident in the residual analysis, as well.

[Residual plot: Weekly Canadian Par Spot Price FOB (Dollars per Barrel) Residual Plot. Vertical axis: Residuals; horizontal axis: Weekly Canadian Par Spot Price FOB (Dollars per Barrel).]

[Plot: Residuals Over Time, TSX and Oil Price Model. Vertical axis: Residual; horizontal axis: date, starting 03/01/2000.]

There appears to be a time-related pattern in the residuals. This is also apparent in the pattern of extreme residuals (those with standardized residuals of at least +2 or at most -2). They predictably occur in August 2000, January-July 2007, July 2008 and September-October 2008. While the model could probably be improved by the addition of a time variable, it is not clear how this could be used for predictive purposes. It would probably be more useful to investigate what other explanatory variables were affecting the stock market over this period. As well, non-linear models could be explored.
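Flagging extreme residuals of the kind described above can be sketched as follows. This is an illustration only: the residual values are HYPOTHETICAL (the TSX data are not reproduced here), the function name is introduced for this example, and the standardization simply divides each residual by the sample standard deviation of the residuals, which is an assumption; statistical packages may use a leverage-adjusted version.

```python
import statistics


def flag_extreme_residuals(residuals, cutoff=2.0):
    """Return positions whose standardized residual is at or beyond +/- cutoff.

    Assumption: residuals are standardized by dividing by their sample
    standard deviation (least-squares residuals have a mean of zero).
    """
    spread = statistics.stdev(residuals)
    return [
        i for i, r in enumerate(residuals)
        if abs(r / spread) >= cutoff
    ]


# HYPOTHETICAL residuals in time order, invented for illustration only.
residuals = [-1.0, -0.5, -1.5, -1.0, -0.5, -1.5, 0.0, 0.5, -0.5, 0.0, 6.0, 0.0]

flagged = flag_extreme_residuals(residuals)
print("positions of extreme residuals:", flagged)
```

When the flagged positions cluster in particular time periods, as in the solution above, that clustering is itself evidence of a time-related pattern the model has not captured.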
