
Bonus Problems for the Third Statistics Exam, Summer 2012

Regression data are often coded by subtracting the mean of each variable from the values of the respective variables. Consider the following coded data:

X   -3   -2   -1    0    1    2    3
Y    9    4    1    2    1    4    7

(a) Construct a scatter plot of the data.
(b) Does the scatter plot suggest a linear relationship?
(c) Fit a linear regression equation and then plot the residuals against $\hat{Y}$.
(d) Why does that plot have a parabolic shape?
(e) What does this suggest about how the prediction equation should be modified?

An engineer is interested in calibrating a flow meter to be used on a liquid-soap production line. For the test, 10 different flow rates are fixed and the corresponding meter readings observed. The data are shown here. Use these data to place a 95% prediction interval on x, the actual flow rate corresponding to an instrument reading of 4.0.

Flow Rate, x            1     2     3     4     5     6     7     8     9     10
Instrument Reading, y   1.4   2.3   3.1   4.2   5.1   5.8   6.8   7.6   8.7   9.5
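One way to set up the flow-meter calibration is sketched below. It fits the least-squares line of reading on flow rate, inverts it at y = 4.0, and attaches the usual approximate (large-sample) inverse-prediction interval. The helper names `fit_line` and `inverse_prediction` are illustrative, the critical value 2.306 is taken from a standard t table for 8 degrees of freedom, and the interval formula is an approximation rather than the exact calibration interval.

```python
import math

# Flow-meter calibration data from the problem statement.
flow_rate = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]                    # x
reading = [1.4, 2.3, 3.1, 4.2, 5.1, 5.8, 6.8, 7.6, 8.7, 9.5]   # y


def fit_line(x, y):
    """Return intercept, slope, residual std. deviation, mean of x and Sxx."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    return b0, b1, s, x_bar, sxx


def inverse_prediction(x, y, y0, t_crit):
    """Point estimate and approximate interval for the x that produced y0."""
    b0, b1, s, x_bar, sxx = fit_line(x, y)
    n = len(x)
    x0_hat = (y0 - b0) / b1
    # Approximate half-width: the prediction error in y is rescaled by 1/|b1|.
    half_width = t_crit * (s / abs(b1)) * math.sqrt(
        1 + 1 / n + (x0_hat - x_bar) ** 2 / sxx
    )
    return x0_hat, x0_hat - half_width, x0_hat + half_width


# Assumption: t_{0.025, 8} = 2.306 (n - 2 = 8 degrees of freedom).
x0_hat, lower, upper = inverse_prediction(flow_rate, reading, y0=4.0, t_crit=2.306)
print(f"estimated flow rate: {x0_hat:.3f}")
print(f"approximate 95% interval: ({lower:.3f}, {upper:.3f})")
```

The point estimate comes from solving the fitted line for x; the interval is wider than a confidence interval for the mean because it carries the scatter of a single new reading.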

A manufacturer of laundry detergent was interested in testing a new product prior to market release. One area of concern was the height of the detergent suds in a washing machine as a function of the amount of detergent added in the wash cycle. For a standard-size washing machine tub filled to the full level, the manufacturer made random assignments of amounts of detergent and tested them on the washing machine. The data appear next.

Amount, x    Height, y
6            28.1, 27.6
7            32.3, 33.2
8            34.8, 35.0
9            38.2, 39.4
10           43.5, 46.8

Suppose a researcher is interested in examining the relationship between the concentration of pectin (0%, 1.5%, and 3% by weight) and the firmness of canned sweet potatoes after storage in a controlled 25°C environment. The sample data for six cans are shown here.

y (firmness)                  50.5   46.8   62.3   67.7   80.1   79.2
x (concentration of pectin)   0      0      1.5    1.5    3.0    3.0

a. Obtain the least-squares estimates for the parameters in the model $y = \beta_0 + \beta_1 x + \varepsilon$.
b. Obtain an estimate of $\sigma_\varepsilon$.
c. Give the standard error of $\hat{\beta}_1$.
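The closed-form least-squares quantities for the pectin data can be sketched as follows. This is a minimal illustration using the textbook formulas (slope = Sxy/Sxx, intercept from the means, residual variance SSE/(n - 2)); the variable names are chosen here for readability and are not part of the problem.

```python
import math

# Pectin data from the problem statement.
concentration = [0.0, 0.0, 1.5, 1.5, 3.0, 3.0]        # x, % pectin by weight
firmness = [50.5, 46.8, 62.3, 67.7, 80.1, 79.2]       # y

n = len(concentration)
x_bar = sum(concentration) / n
y_bar = sum(firmness) / n

# Corrected sums of squares and cross-products.
s_xx = sum((x - x_bar) ** 2 for x in concentration)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(concentration, firmness))

slope = s_xy / s_xx                      # least-squares slope
intercept = y_bar - slope * x_bar        # least-squares intercept

# Residual sum of squares and the estimate of the error standard deviation.
sse = sum((y - (intercept + slope * x)) ** 2
          for x, y in zip(concentration, firmness))
s_eps = math.sqrt(sse / (n - 2))

# Standard error of the slope estimate.
se_slope = s_eps / math.sqrt(s_xx)

print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
print(f"s = {s_eps:.4f}, SE(slope) = {se_slope:.4f}")
```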

Postulate an equation model like that U =

The following data relate biomass production of soybeans to cumulative intercepted solar radiation over an eight-week period following emergence. Biomass production is the mean dry weight in grams of independent samples of four plants. (Data courtesy of Virginia Lesser and Dr. Mike Unsworth, North Carolina State University.)

Solar Radiation   Plant Biomass
29.7              16.6
68.4              49.1
120.7             121.7
217.2             219.6
313.5             375.5
419.1             570.8
535.9             648.2
641.5             755.6

An article in the Journal of the American Statistical Association ["Markov Chain Monte Carlo Methods for Computing Bayes Factors: A Comparative Review" (2001, Vol. 96, pp. 1122-1132)] analyzed the tabulated data on compressive strength parallel to the grain versus resin-adjusted density for specimens of radiata pine.

Compressive              Compressive
Strength      Density    Strength      Density
3040          29.2       3840          30.7
2470          24.7       3800          32.7
3610          32.3       4600          32.6
3480          31.3       1900          22.1
3810          31.5       2530          25.3
2330          24.5       2920          30.8
1800          19.9       4990          38.9
3110          27.3       1670          22.1
3160          27.1       3310          29.2
2310          24.0       3450          30.1
4360          33.8       3600          31.4
1880          21.5       2850          26.7
3670          32.2       1590          22.1
1740          22.5       3770          30.3
2250          27.5       3850          32.0
2650          25.6       2480          23.2
4970          34.5       3570          30.3
2620          26.2       2620          29.9
2900          26.7       1890          20.8
1670          21.1       3030          33.2
2540          24.1       3030          28.2

The vapor pressure of water at various temperatures follows:

Observation    Temperature    Vapor pressure
Number, i      (K)            (mm Hg)
1              273            4.6
2              283            9.2
3              293            17.5
4              303            31.8
5              313            55.3
6              323            92.5
7              333            149.4
8              343            233.7
9              353            355.1
10             363            525.8
11             373            760.0

(a) Draw a scatter diagram of these data. What type of relationship seems appropriate in relating y to x?
(b) Fit a simple linear regression model to these data.
(c) What do you conclude about model adequacy?
(d) The Clausius-Clapeyron relationship states that $\ln(P_v) \propto -1/T$, where $P_v$ is the vapor pressure of water. Repeat parts (a)-(c) using an appropriate transformation.
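The transformation suggested by the Clausius-Clapeyron relationship can be sketched as follows: regress ln(Pv) on 1/T and compare the coefficient of determination with that of the untransformed fit. The helper `least_squares` is an illustrative name, and the comparison by r² alone is a simplification; a residual plot would still be needed to judge adequacy.

```python
import math

# Vapor pressure data from the problem statement.
temperature_k = [273, 283, 293, 303, 313, 323, 333, 343, 353, 363, 373]
pressure_mmhg = [4.6, 9.2, 17.5, 31.8, 55.3, 92.5,
                 149.4, 233.7, 355.1, 525.8, 760.0]


def least_squares(x, y):
    """Return (intercept, slope, r_squared) for the simple linear fit of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Untransformed fit: pressure against temperature.
raw_fit = least_squares(temperature_k, pressure_mmhg)

# Transformed fit: ln(pressure) against 1/temperature.
inv_temperature = [1.0 / t for t in temperature_k]
log_pressure = [math.log(p) for p in pressure_mmhg]
transformed_fit = least_squares(inv_temperature, log_pressure)

print(f"raw fit:         slope = {raw_fit[1]:.3f}, r^2 = {raw_fit[2]:.4f}")
print(f"transformed fit: slope = {transformed_fit[1]:.1f}, r^2 = {transformed_fit[2]:.5f}")
```

A negative slope in the transformed fit is what the relationship predicts, since ln(Pv) falls as 1/T rises.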


An experiment was designed to compare several different types of air pollution monitors. The monitor was set up and then exposed to different concentrations of ozone, ranging between 15 and 230 parts per million (ppm), for periods of 8-72 hours. Filters on the monitor were then analyzed, and the amount (in micrograms) of sodium nitrate (NO3) recorded by the monitor was measured. The results for one type of monitor are given in the table.

Ozone, x (ppm/hr)   .8     1.3    1.7    2.2    2.7     2.9
NO3, y (μg)         2.44   5.21   6.07   8.98   10.82   12.16

The following data were obtained in an experiment relating the dependent variable, y (texture of strawberries), with x (coded storage temperature).

x   -2    -2    0     2     2
y   4.0   3.5   2.0   0.5   0.0

a. Find the least-squares line for the data.
b. Plot the data points and graph the least-squares line as a check on your calculations.
c. What is the best estimate of $\sigma^2$, the variance of the random error $\varepsilon$?

The Australian Public Service Commission's State of the Service Report 2002-2003 reported job satisfaction ratings for employees. One of the survey questions asked employees to choose the five most important workplace factors (from a list of factors) that most affected how satisfied they were with their job. Respondents were then asked to indicate their level of satisfaction with their top five factors. The following data show the percentage of employees who nominated the factor in their top five, and a corresponding satisfaction rating measured using the percentage of employees who nominated the factor in the top five and who were "very satisfied" or "satisfied" with the factor in their current workplace (http://www.apsc.gov.au/stateoftheservice).


a. Develop a scatter diagram with Top Five (%) on the horizontal axis and Satisfaction Rating (%) on the vertical axis.
b. What does the scatter diagram developed in part (a) indicate about the relationship between the two variables?
c. Develop the estimated regression equation that could be used to predict the Satisfaction Rating (%) given the Top Five (%).
d. Test for a significant relationship at the .05 level of significance.
e. Did the estimated regression equation provide a good fit? Explain.
f. What is the value of the sample correlation coefficient?

Infrared spectroscopy is often used to determine the natural rubber content of mixtures of natural and synthetic rubber. For mixtures of known percentages, the infrared spectroscopy gave the following readings:

Percentage   0      20     40      60      80      100
Reading      .734   .885   1.050   1.191   1.314   1.432

If a new mixture gives an infrared spectroscopy reading of 1.15, estimate its percentage of natural rubber.

In 1957 the Dutch industrial engineer J. R. DeJong proposed the following model for the time it takes to perform a simple manual task as a function of the number of times the task has been practiced: $T = t s^{-n}$, where T is the time, n is the number of times the task has been practiced, and t and s are parameters depending on the task and individual. Estimate t and s for the following data set.

T   22.4   21.3   19.7   15.6   15.2   13.9   13.7
n   0      1      2      3      4      5      6

The following data represent the relation between the number of cans damaged in a boxcar shipment of cans and the speed of the boxcar at impact.

Speed   Number of Cans Damaged
3       54
3       62
3       65
5       94
5       122
5       84
6       142
7       139
7       184
8       254


(a) Analyze as a simple linear regression model.
(b) Plot the standardized residuals.
(c) Do the results of part (b) indicate any flaw in the model?
(d) If the answer to part (c) is yes, suggest a better model and estimate all resulting parameters.

The following data record the amount of water (x), in centimeters, and the yield of hay (y), in metric tons per hectare, on an experimental farm.

water (x)   30     45     60    75     90     105    120
yield (y)   2.11   2.27   2.5   2.88   3.21   3.48   3.37

(a) Draw the scatter plot $(x_i, y_i)$.
(b) Calculate the least squares estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ and draw the graph of the estimated regression line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$. Use the same axes on which you graphed the scatter plot.
(c) Compute the sample correlation, coefficient of determination, and error sum of squares.
(d) Compute the predicted value and the residual for x = 30 and x = 75.

At temperatures approaching absolute zero (-273°C), helium exhibits traits that defy many laws of conventional physics. An experiment has been conducted with helium in solid form at various temperatures near absolute zero. The solid helium is placed in a dilution refrigerator along with a solid impure substance, and the fraction (in weight) of the impurity passing through the solid helium is recorded. (The phenomenon of solids passing directly through solids is known as quantum tunneling.) The data are given in the following table.

                        Proportion of Impurity
°C Temperature (x)      Passing Through Helium (y)
-262.0                  .315
-265.0                  .202
-256.0                  .204
-267.0                  .620
-270.0                  .715
-272.0                  .935
-272.4                  .957
-272.7                  .906
-272.8                  .985
-272.9                  .987


a) Fit a least-squares line to the data.
b) Test the null hypothesis $H_0: \beta_1 = 0$ against the alternative hypothesis $H_a: \beta_1 < 0$ at the $\alpha = .01$ level of significance.
c) Find a 95% prediction interval for the percentage of the solid impurity passing through solid helium at -273°C. (This value of x is outside the experimental region, where use of the model for prediction may be dangerous.)

In production flow-shop problems, performance is often evaluated by minimum makespan, this being the total elapsed time from starting the first job on the first machine until the last job is completed on the last machine. We might expect that minimum makespan would be linearly related, at least approximately, to the number of jobs. Consider the following data, with X denoting the number of jobs and Y denoting the minimum makespan in hours:

X   3      4      5      6      7      8       9       10      11      12      13
Y   6.50   7.25   8.00   8.50   9.50   10.25   11.50   12.25   13.00   13.75   14.50

a) From the standpoint of engineering economics, what would a nonlinear relationship signify?
b) What does a scatter plot of the data suggest about the relationship?
c) If appropriate, fit a simple linear regression model to the data and estimate the increase in the minimum makespan for each additional job. If doing this would be inappropriate, explain why.

For the past decade rubber powder has been used in asphalt cement to improve performance. The article "Experimental Study of Recycled Rubber-Filled High-Strength Concrete" (Mag. Concrete Res., 2009: 549-556) included a regression of y = axial strength (MPa) on x = cube strength (MPa) based on the following sample data:

x   112.3   97.0   92.7   86.0   102.0   99.2   95.8   103.5   89.0   86.7
y   75.0    71.0   57.7   48.7   74.3    73.3   68.0   59.3    57.8   48.5

a) Verify that a scatter plot supports the assumption that the two variables are related via the simple linear regression model.
b) Obtain the equation of the least squares line, and interpret its slope.
c) Calculate and interpret the coefficient of determination.
d) Calculate and interpret an estimate of the error standard deviation $\sigma$ in the simple linear regression model.
e) The largest x value in the sample considerably exceeds the other x values. What is the effect on the equation of the least squares line of deleting the corresponding observation?

The amounts of solids removed from a particular material when exposed to drying periods of different lengths are as shown.

x (hours)   y (grams)
4.4         13.1   14.2
4.5         9.0    11.5
4.8         10.4   11.5
5.5         13.8   14.8
5.7         12.7   15.1
5.9         9.9    12.7
6.3         13.8   16.5
6.9         16.4   15.7
7.5         17.6   16.9
7.8         18.3   17.2

a) Estimate the linear regression line.
b) Test at the 0.05 level of significance whether the linear model is adequate.

In biofiltration of wastewater, air discharged from a treatment facility is passed through a damp porous membrane that causes contaminants to dissolve in water and be transformed into harmless products. The accompanying data on x = inlet temperature (°C) and y = removal efficiency (%) were the basis for a scatter plot that appeared in the article "Treatment of Mixed Hydrogen Sulfide and Organic Vapors in a Rock Medium Biofilter" (Water Environment Research, 2001: 426-435).

Obs   Temp    Removal %    Obs   Temp    Removal %
1     7.68    98.09        17    8.55    98.27
2     6.51    98.25        18    7.57    98.00
3     6.43    97.82        19    6.94    98.09
4     5.48    97.82        20    8.32    98.25
5     6.57    97.82        21    10.50   98.41
6     10.22   97.93        22    16.02   98.51
7     15.69   98.38        23    17.83   98.71
8     16.77   98.89        24    17.03   98.79
9     17.13   98.96        25    16.18   98.87
10    17.63   98.90        26    16.26   98.76
11    16.72   98.68        27    14.44   98.58
12    15.45   98.69        28    12.78   98.73
13    12.06   98.51        29    12.25   98.45
14    11.44   98.09        30    11.69   98.37
15    10.17   98.25        31    11.34   98.36
16    9.64    98.36        32    10.97   98.45


a) Does a scatter plot of the data suggest appropriateness of the simple linear regression model?
b) Fit the simple linear regression model, obtain a point prediction of removal efficiency when temperature = 10.50, and calculate the value of the corresponding residual.
c) Roughly what is the size of a typical deviation of points in the scatter plot from the least squares line?
d) What proportion of observed variation in removal efficiency can be attributed to the model relationship?
e) Estimate the slope coefficient in a way that conveys information about reliability and precision, and interpret your estimate.
f) Personal communication with the authors of the article revealed that there was one additional observation that was not included in their scatter plot: (6.53, 96.55). What impact does this additional observation have on the equation of the least squares line and the values of s and $r^2$?
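The effect of the additional observation in part (f) can be explored with a short sketch that fits the line twice, once on the 32 tabulated points and once with (6.53, 96.55) appended. The function name `fit_summary` and the returned dictionary keys are illustrative choices, not part of the problem; the data are transcribed from the table in the problem statement.

```python
import math

# Inlet temperature (deg C) and removal efficiency (%) for observations 1-32.
temp = [7.68, 6.51, 6.43, 5.48, 6.57, 10.22, 15.69, 16.77,
        17.13, 17.63, 16.72, 15.45, 12.06, 11.44, 10.17, 9.64,
        8.55, 7.57, 6.94, 8.32, 10.50, 16.02, 17.83, 17.03,
        16.18, 16.26, 14.44, 12.78, 12.25, 11.69, 11.34, 10.97]
removal = [98.09, 98.25, 97.82, 97.82, 97.82, 97.93, 98.38, 98.89,
           98.96, 98.90, 98.68, 98.69, 98.51, 98.09, 98.25, 98.36,
           98.27, 98.00, 98.09, 98.25, 98.41, 98.51, 98.71, 98.79,
           98.87, 98.76, 98.58, 98.73, 98.45, 98.37, 98.36, 98.45]


def fit_summary(x, y):
    """Least-squares intercept and slope, residual std. deviation s, and r^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return {
        "intercept": intercept,
        "slope": slope,
        "s": math.sqrt(sse / (n - 2)),
        "r2": 1 - sse / syy,
    }


# Fit on the 32 published points, then again with the extra observation.
original = fit_summary(temp, removal)
augmented = fit_summary(temp + [6.53], removal + [96.55])

for label, fit in (("32 points", original), ("33 points", augmented)):
    print(f"{label}: y = {fit['intercept']:.4f} + {fit['slope']:.4f} x, "
          f"s = {fit['s']:.4f}, r^2 = {fit['r2']:.4f}")
```

Comparing the two printed lines shows how a single low-temperature point far below the others pulls on the slope and inflates s.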
