
QMM5100 W19 Multiple Regression HW Due: April 8, 2019

1) It appears that over the past 45 years, the number of farms in the United States declined while the
average size of farms increased. The data, provided by the U.S. Department of Agriculture,
appear in the Farms worksheet in the Multiple Regression HW data workbook on Moodle. The
data are recorded at five-year intervals.

a) Use JMP to fit the simple regression model using Average Size as the dependent or Y
variable and number of farms as the independent or X variable. Is the model statistically
significant at  = 0.05?

b) Interpret the regression model parameters (intercept and slope) in the context of relating the
number of farms to average farm size.

c) Perform a residual analysis. Do the residuals satisfy the model assumptions?

2) Larry Swattle, a local attorney, is interested in understanding the relationship between the outside
temperature and his energy usage. Larry finds his electric bills for the last 24 months and records the
electric consumption in kilowatt-hours. He then uses the internet to find the average monthly
temperature in Fahrenheit for his city for those months. The data appear in the Electricity worksheet
of the Multiple Regression Modeling HW data workbook on Moodle.

a) Fit the simple linear regression model using monthly electric consumption as the dependent or
Y variable and average daily temperature as the independent or X variable. Analyze the
residuals to determine if they satisfy the three regression model assumptions.

b) Calculate the Durbin Watson statistic. Do the residuals show autocorrelation?

c) Predict electric consumption for a month with an average temperature of 60.

d) Determine the leverage for each observation. Do any of the data points have excess leverage?

e) Remove any data points with excess leverage and fit the simple linear regression model using
monthly electric consumption as the dependent or Y variable and average daily temperature
as the independent or X variable. Using the model without the points that have excess leverage, predict
electric consumption for a month with an average temperature of 60. Do the point(s) with
excess leverage appear to unduly influence the model?

3) The quality of the orange juice produced by a manufacturer (e.g. Minute Maid, Tropicana) is
constantly monitored. There are numerous sensory and chemical components that combine to
make the best tasting orange juice. For example, one manufacturer has developed a quantitative
index of the “sweetness” of orange juice where the higher the index, the sweeter the juice. To
determine if there is a relationship between the sweetness index and the parts per million (ppm)
of water-soluble pectin (a chemical measure of the juice), data were collected on 24 production
runs at a juice plant. The data appear in the Sweetness worksheet of the Multiple Regression
HW data workbook on Moodle.

a) Use JMP to fit the simple regression model using the sweetness index as the Y variable and
pectin as the X variable. Is the model statistically significant at $\alpha = 0.05$?

b) What percentage of the variation in sweetness index has been explained by the linear
relationship with pectin?

c) Analyze the residuals to determine if they satisfy the three regression model assumptions.

d) Use a hypothesis test to determine if any of the data points have too much leverage.

e) Remove any data points that have excess leverage and refit the regression model. Does
removing these data points change the model?

4) A medical researcher is studying percent body fat. As part of the study, the researcher takes data
on 50 males aged 22 to 50. The dataset consists of the following variables: Fat% = percent body
fat, Age = age (yrs), Weight = weight (lbs), Height = height (in.), Neck = neck circumference
(cm), Chest = chest circumference (cm), Abdomen = abdomen circumference (cm), Hip = hip
circumference (cm), Thigh = thigh circumference (cm). The data appear in the BodyFat
worksheet of the Multiple Regression HW data workbook on Moodle.

a) Use JMP to fit the multiple regression model using Weight as the dependent or Y variable and
the other 8 variables as the independent or X variables. Is the model statistically significant
at  = 0.05? Are all 8 independent variables significant?

b) Remove insignificant independent variables one at a time until all remaining variables in the
model are significant. Provide the equation of the fit line. Interpret the regression model
parameters in the context of predicting weight.

c) Provide a point prediction for the weight of a male 70 inches tall with a 39 cm neck, a 94 cm
abdomen, and a 100 cm hip.

d) Provide a 95% confidence interval for the average weight of males 70 inches tall with 20%
body fat.

e) Use a hypothesis test to determine if an additional 1 cm in neck circumference corresponds to an
expected increase of more than 2 pounds, i.e., test the hypothesis $H_0\colon \beta_{\text{Neck}} \le 2$ versus
$H_1\colon \beta_{\text{Neck}} > 2$.

5) The State of Ohio Department of Education has a mandated ninth-grade proficiency test that
covers writing, reading, mathematics, citizenship (social studies) and science. The Ohio
worksheet in the Multiple Regression HW data workbook contains proficiency test results for 31
school districts in Ohio for 2000. The dataset contains the percent of students in the district
passing each of the 5 test sections (listed above) and the percent of students who passed all five
sections.

a) Use JMP to fit the multiple regression model using All as the dependent or Y variable and the
other five variables as the independent or X variables. Is the model statistically significant at
$\alpha = 0.05$? Are all five independent variables statistically significant at $\alpha = 0.05$?

b) Continue to use All as the dependent or Y variable. Remove the insignificant variables one at
a time (removing the variable with the highest p-value) and fit regression models until you
obtain a model with only significant independent or X variables. Provide the output for your
final model.

c) What percentage of the variation in All has been explained by the linear relationship with the
independent variables in your model?

d) Provide the equation of the fit line. Interpret the regression model parameters in the context
of passing proficiency tests.

e) Use a hypothesis test to determine if the increase in the percentage of students passing All sections
for a one-percentage-point increase in the percentage of students passing Math is greater than 0.50
percentage points, i.e., test the hypothesis $H_0\colon \beta_{\text{Math}} \le 0.50$ versus
$H_1\colon \beta_{\text{Math}} > 0.50$ using $\alpha = 0.05$.
