
Introduction to Probability and Statistics for Engineers

CME106, Winter 2018

Problem Set #8
(Regression and Correlation Analyses)

Date: 3/8/2018 Due: 3/15/2018

Statistics show that those people who celebrate the most birthdays become the oldest.

Topics: Regression
Distribution of the estimators
Hypothesis testing with regression parameters
Sample correlation coefficient

MATLAB Workbook (optional): Exercise 5

Problem 1 In the simple linear regression model discussed in class it was assumed that
the means of the response variable lie on a straight line in two-dimensional space. In
many applications a single independent variable is inadequate to describe the mean
behavior of the response variable. Consider a situation in which the mean of the response
variable Z is a linear superposition of two independent variables x and y, so that the
estimate for the mean can be written as:

$$\hat{z} = B_0 + B_x x + B_y y$$

If n data points … are given, obtain three equations that can be used
to solve for the values of B0, Bx, and By that minimize the sum of squares of the residuals.
Do not solve.
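A possible starting point, sketched here under the assumption that the i-th data point is written (x_i, y_i, z_i) (this notation is not given above), is to express the sum of squared residuals as a function of the three coefficients and require each partial derivative to vanish. Expanding the three conditions yields the required equations.

```latex
S(B_0, B_x, B_y) = \sum_{i=1}^{n} \left( z_i - B_0 - B_x x_i - B_y y_i \right)^2,
\qquad
\frac{\partial S}{\partial B_0} = \frac{\partial S}{\partial B_x} = \frac{\partial S}{\partial B_y} = 0
```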

Problem 2 The time to complete an assembly task is to be modeled as a linear function
of the complexity of the task and the time an assembler has been continuously working.
Complexity (x) is measured on a scale from 0 to 100, work time (y) is in hours prior to
beginning the timed assembly, and completion time (Z) is in minutes. Use your result
from Problem 1 and the following data to determine the equation of the regression plane
that predicts the expected completion time. [Hint: use MATLAB to solve the system of
three equations for the regression coefficients.]

Complexity x   Work time y (h)   Completion time Z (min)
90             8                 5.6
75             4                 3.8
60             0                 2.0
60             8                 2.4
90             4                 5.2
75             0                 2.9

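As a cross-check for the MATLAB solution (where the backslash operator A\b would typically be used), here is a minimal pure-Python sketch that forms the three normal equations from Problem 1 for the data above and solves them by Gaussian elimination with partial pivoting. The function names normal_equations and solve_augmented are hypothetical helpers introduced only for this illustration.

```python
# Least-squares fit of the plane z = b0 + bx*x + by*y by forming and solving
# the three normal equations (pure Python, no external libraries).

def normal_equations(xs, ys, zs):
    """Return the 3x4 augmented matrix [A | rhs] of the normal equations."""
    n = len(xs)
    sx, sy, sz = sum(xs), sum(ys), sum(zs)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxz = sum(x * z for x, z in zip(xs, zs))
    syz = sum(y * z for y, z in zip(ys, zs))
    return [
        [n, sx, sy, sz],        # from dS/dB0 = 0
        [sx, sxx, sxy, sxz],    # from dS/dBx = 0
        [sy, sxy, syy, syz],    # from dS/dBy = 0
    ]


def solve_augmented(m):
    """Gaussian elimination with partial pivoting on an augmented matrix."""
    m = [list(map(float, row)) for row in m]  # work on a float copy
    size = len(m)
    for col in range(size):
        # Swap in the row with the largest pivot to limit round-off error.
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    sol = [0.0] * size
    for r in range(size - 1, -1, -1):
        acc = m[r][size] - sum(m[r][c] * sol[c] for c in range(r + 1, size))
        sol[r] = acc / m[r][r]
    return sol


x = [90, 75, 60, 60, 90, 75]          # complexity
y = [8, 4, 0, 8, 4, 0]                # work time (h)
z = [5.6, 3.8, 2.0, 2.4, 5.2, 2.9]    # completion time (min)

b0, bx, by = solve_augmented(normal_equations(x, y, z))
print(f"z_hat = {b0:.4f} + {bx:.4f} x + {by:.4f} y")
```

Partial pivoting is not strictly needed for a well-conditioned 3x3 system like this one, but it keeps the helper safe to reuse on other data.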
Problem 3 A hypothesis has been proposed that the laws of inheritance caused
population extremes to regress towards the mean, i.e. that the children of individuals
having extreme values of a certain characteristic would tend to have less extreme values
of this characteristic than their parents. Given below are the heights of 10 randomly
chosen sons versus those of their fathers. Note that whereas the data appear to indicate
that taller fathers tend to have taller sons, they also appear to indicate that the sons
of fathers who are either extremely short or extremely tall tend to be more “average” than
their fathers. This would imply that the slope of the regression line of sons’ heights on
fathers’ heights is less than 1.

Fathers’ height (in)   60    62    64    65    66    67    68    70    72    74
Sons’ height (in)      63.6  65.2  66.0  65.5  66.9  67.1  67.4  68.3  70.1  70.0

a) determine the regression coefficients using the expressions derived in class
b) confirm your result in part a) using the polyfit function in MATLAB
c) on the same set of axes make a scatter plot of the data, the regression line, and a
line with a slope of 1 [Hint: use the lsline function in MATLAB]
d) determine whether the data strongly indicate a regression toward the mean by
testing the alternative hypothesis that the slope of the regression line is less than 1
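A minimal Python sketch of parts a) and d), offered as a cross-check rather than the intended MATLAB route (in MATLAB, polyfit(x, y, 1) returns the slope followed by the intercept). It uses the textbook estimators b1 = Sxy/Sxx and b0 = ybar - b1*xbar, then a one-sided t test of H0: beta1 = 1 against H1: beta1 < 1 with n - 2 = 8 degrees of freedom. The significance level of 0.05 and the table value t_{0.05,8} = 1.860 are assumptions of this sketch; simple_regression is a hypothetical helper name.

```python
import math

fathers = [60, 62, 64, 65, 66, 67, 68, 70, 72, 74]
sons = [63.6, 65.2, 66.0, 65.5, 66.9, 67.1, 67.4, 68.3, 70.1, 70.0]


def simple_regression(xs, ys):
    """Least-squares line y = b0 + b1*x; also returns s and Sxx for inference."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    # Residual standard deviation, using n - 2 degrees of freedom.
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    return b0, b1, s, sxx


b0, b1, s, sxx = simple_regression(fathers, sons)

# One-sided test of H0: beta1 = 1 against H1: beta1 < 1.
t_stat = (b1 - 1.0) / (s / math.sqrt(sxx))
T_CRIT = 1.860  # t_{0.05, 8} read from a t table (assumed alpha = 0.05)
reject_h0 = t_stat < -T_CRIT

print(f"b0 = {b0:.3f}, b1 = {b1:.4f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

The test statistic uses the standard error s/sqrt(Sxx) of the slope; rejecting H0 here supports regression toward the mean.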

Problem 4 The following data are from a study of the effect of the carbon content of
steel wires on their electrical resistance:

Carbon content x (%)   Resistance y (ohms)
0.10                   15.0
0.30                   18.0
0.40                   19.0
0.55                   21.0
0.70                   22.6
0.80                   23.8
0.95                   26.0

a) find the least squares regression line of y on x
b) determine the 95% confidence intervals for the slope and the intercept
c) find the point estimate of the resistance for a wire having 0.50% carbon content
d) calculate the sample correlation coefficient
e) verify your result in part d) using the corrcoef function in MATLAB
f) test the hypothesis that the population correlation coefficient is different from 0
at the 5% significance level
g) OPTIONAL find the 95% confidence interval for the correlation coefficient
[Hint: use Fisher’s transformation if the correlation coefficient is thought to be
significantly different from zero]
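The following Python sketch runs parts a) to d), f) and g) end to end as a sanity check on hand or MATLAB results. It relies on standard textbook formulas: SSE = Syy - b1*Sxy, standard errors s/sqrt(Sxx) for the slope and s*sqrt(1/n + xbar^2/Sxx) for the intercept, the statistic t = r*sqrt(n - 2)/sqrt(1 - r^2) for testing rho = 0, and Fisher's z = atanh(r) with approximate standard error 1/sqrt(n - 3). The critical values t_{0.025,5} = 2.571 and z_{0.025} = 1.96 are taken from standard tables and are assumptions of this sketch, as are all variable names.

```python
import math

carbon = [0.10, 0.30, 0.40, 0.55, 0.70, 0.80, 0.95]        # x, percent
resistance = [15.0, 18.0, 19.0, 21.0, 22.6, 23.8, 26.0]    # y, ohms

n = len(carbon)
xbar = sum(carbon) / n
ybar = sum(resistance) / n
sxx = sum((x - xbar) ** 2 for x in carbon)
syy = sum((y - ybar) ** 2 for y in resistance)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(carbon, resistance))

# a) least-squares line y = b0 + b1*x
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# b) 95% confidence intervals; t_{0.025, 5} = 2.571 from a t table (assumed)
T_CRIT = 2.571
s = math.sqrt((syy - b1 * sxy) / (n - 2))   # SSE = Syy - b1*Sxy
se_b1 = s / math.sqrt(sxx)
se_b0 = s * math.sqrt(1 / n + xbar ** 2 / sxx)
ci_b1 = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)
ci_b0 = (b0 - T_CRIT * se_b0, b0 + T_CRIT * se_b0)

# c) point estimate of the resistance at 0.50% carbon
y_hat = b0 + b1 * 0.50

# d) sample correlation coefficient
r = sxy / math.sqrt(sxx * syy)

# f) two-sided t test of H0: rho = 0 with n - 2 = 5 degrees of freedom
t_rho = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
reject_rho_zero = abs(t_rho) > T_CRIT

# g) Fisher transformation: atanh(r) is approximately normal, SE = 1/sqrt(n - 3)
Z_CRIT = 1.96
fisher_z = math.atanh(r)
half_width = Z_CRIT / math.sqrt(n - 3)
ci_rho = (math.tanh(fisher_z - half_width), math.tanh(fisher_z + half_width))

print(f"y = {b0:.3f} + {b1:.3f} x, y_hat(0.50) = {y_hat:.2f}, r = {r:.4f}")
print(f"CI slope {ci_b1}, CI intercept {ci_b0}, CI rho {ci_rho}")
```

Mapping the interval back with tanh keeps the correlation interval inside (-1, 1), which is why the transformation is preferred when r is close to 1.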
