
Chapter 17
Least-Squares Regression


Where substantial error is associated with data, polynomial interpolation is inappropriate and may yield unsatisfactory results when used to predict intermediate values. Experimental data is often of this type. For example, figure (a) below shows seven experimentally derived data points exhibiting significant variability. The data indicate that higher values of y are associated with higher values of x.

Now, if a sixth-order interpolating polynomial is fitted to this data (figure (b)), it will pass exactly through all of the points. However, because of the variability in the data, the curve oscillates widely in the interval between the points. In particular, the interpolated values at x = 1.5 and x = 6.5 appear to be well beyond the range suggested by the data.

A more appropriate strategy is to derive an approximating function that fits the shape, or general trend, of the data. Figure (c) illustrates how a straight line can be used to generally characterize the trend of the data without passing through any particular point.
One way to determine the line in figure (c) is to visually inspect the plotted data and then sketch a "best" line through the points. Such approaches are inadequate because they are arbitrary. That is, unless the points define a perfect straight line (in which case interpolation would be appropriate), different analysts would draw different lines. To avoid this, some criterion must be devised to establish a basis for the fit. One way to do this is to derive a curve that minimizes the discrepancy between the data points and the curve. One technique for doing this is called least-squares regression.

17.1 Linear Regression

The simplest example of a least-squares approximation is fitting a straight line to a set of paired observations: (x_1, y_1), (x_2, y_2), ..., (x_n, y_n). The mathematical expression for the straight line is

y = a_0 + a_1 x + e

where a_0 and a_1 are coefficients representing the intercept and the slope, respectively, and e is the error, or residual, between the model and the observations, which can be represented by rearranging the previous equation as

e = y - a_0 - a_1 x

Thus, the error is the discrepancy between the true value of y and the approximate value, a_0 + a_1 x, predicted by the linear equation.
17.1.1 Criteria for the Best Fit

One strategy for fitting a "best" line through the data would be to minimize the sum of the residual errors for all the available data, as in

\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)

where n = total number of points. However, this is an inadequate criterion, as illustrated by the next figure, which shows the fit of a straight line to two points.


Obviously, the best fit is the line connecting the points. However, any straight line passing through the midpoint of the connecting line results in a minimum value of the previous equation equal to zero because the positive and negative errors cancel.

Therefore, another logical criterion might be to minimize the sum of the absolute values of the discrepancies, as in

\sum_{i=1}^{n} |e_i| = \sum_{i=1}^{n} |y_i - a_0 - a_1 x_i|

The previous figure (b) demonstrates why this criterion is also inadequate. For the four points shown, any straight line falling within the dashed lines will minimize the sum of the absolute values. Thus, this criterion also does not yield a unique best fit.

A third strategy for fitting a best line is the minimax criterion. In this technique, the line is chosen that minimizes the maximum distance that an individual point falls from the line. As shown in the previous figure (c), this strategy is ill-suited for regression because it gives undue influence to an outlier, that is, a single point with a large error.


A strategy that overcomes the shortcomings of the previous approaches is to minimize the sum of the squares of the residuals between the measured y and the y calculated with the linear model:

S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_{i,measured} - y_{i,model})^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

This criterion has a number of advantages, including the fact that it yields a unique line for a given set of data.


17.1.2 Least-Squares Fit of a Straight Line

To determine values of a_0 and a_1, the previous equation is differentiated with respect to each coefficient:

\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i)

\frac{\partial S_r}{\partial a_1} = -2 \sum [(y_i - a_0 - a_1 x_i) x_i]

Note that we have simplified the summation symbols; unless otherwise indicated, all summations are from i = 1 to n. Setting these derivatives equal to zero will result in a minimum S_r:

0 = \sum y_i - \sum a_0 - \sum a_1 x_i

0 = \sum x_i y_i - \sum a_0 x_i - \sum a_1 x_i^2

Now, realizing that \sum a_0 = n a_0, we can express the equations as a set of two simultaneous linear equations with two unknowns (a_0 and a_1):

n a_0 + (\sum x_i) a_1 = \sum y_i        (17.4)

(\sum x_i) a_0 + (\sum x_i^2) a_1 = \sum x_i y_i

These are called the normal equations. They can be solved simultaneously for

a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - (\sum x_i)^2}

This result can then be used in conjunction with Eq. (17.4) to solve for

a_0 = \bar{y} - a_1 \bar{x}

where \bar{y} and \bar{x} are the means of y and x, respectively.

Example 17.1 Linear Regression

Problem Statement:
Fit a straight line to the x and y values in the first two columns of the next table.

Solution:
The following quantities can be computed:

n = 7        \sum x_i y_i = 119.5        \sum x_i^2 = 140

\sum x_i = 28        \bar{x} = 28/7 = 4

\sum y_i = 24        \bar{y} = 24/7 = 3.428571

Using the previous two equations,

a_1 = \frac{7(119.5) - 28(24)}{7(140) - (28)^2} = 0.8392857

a_0 = 3.428571 - 0.8392857(4) = 0.07142857

Therefore, the least-squares fit is

y = 0.07142857 + 0.8392857 x

The line, along with the data, is shown in the first figure (c).
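
Because the normal-equation formulas use only the summary quantities n, \sum x_i, \sum y_i, \sum x_i y_i, and \sum x_i^2, the fit can be reproduced directly from the sums tabulated in the example. The following Python sketch does exactly that (the function name is illustrative, not from the original text):

    def fit_line_from_sums(n, sx, sy, sxy, sxx):
        """Linear least-squares fit from precomputed sums.

        Implements a1 = (n*Sxy - Sx*Sy) / (n*Sxx - Sx**2)
        and        a0 = ybar - a1*xbar.
        """
        a1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
        a0 = sy / n - a1 * sx / n
        return a0, a1

    # Sums from Example 17.1
    a0, a1 = fit_line_from_sums(n=7, sx=28, sy=24, sxy=119.5, sxx=140)
    print(a0, a1)  # 0.07142857..., 0.8392857...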


17.1.3 Quantification of Error of Linear Regression
Any line other than the one computed in the previous example results in a larger sum of the squares of the residuals. Thus, the line is unique and, in terms of our chosen criterion, is the "best" line through the points.

A number of additional properties of this fit can be explained by examining more closely the way in which the residuals were computed. Recall that the sum of the squares is defined as

S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2

Notice the similarity between this equation and

S_t = \sum (y_i - \bar{y})^2

The similarity can be extended further for cases where (1) the spread of the points around the line is of similar magnitude along the entire range of the data and (2) the distribution of these points about the line is normal. It can be demonstrated that if these criteria are met, least-squares regression will provide the best (that is, the most likely) estimates of a_0 and a_1.

In addition, if these criteria are met, a "standard deviation" for the regression line can be determined as

s_{y/x} = \sqrt{\frac{S_r}{n - 2}}

where s_{y/x} is called the standard error of the estimate. The subscript notation y/x designates that the error is for a predicted value of y corresponding to a particular value of x.

Also, notice that we now divide by n - 2 because two data-derived estimates, a_0 and a_1, were used to compute S_r; thus, we have lost two degrees of freedom. Another justification for dividing by n - 2 is that there is no such thing as the spread of data around a straight line connecting two points.

The standard error of the estimate quantifies the spread of the data around the regression line, as shown in the next figure (b), in contrast to the original standard deviation s_y, which quantified the spread around the mean (figure (a)).




The above concepts can be used to quantify the "goodness" of our fit. This is particularly useful for comparing several regressions (next figure). To do this, we return to the original data and determine the total sum of the squares around the mean for the dependent variable (in our case, y). This quantity is designated S_t. This is the magnitude of the residual error associated with the dependent variable prior to regression. After performing the regression, we can compute S_r, the sum of the squares of the residuals around the regression line. This characterizes the residual error that remains after the regression. It is, therefore, sometimes called the unexplained sum of the squares.


The difference between the two quantities, S_t - S_r, quantifies the improvement or error reduction due to describing the data in terms of a straight line rather than as an average value. Because the magnitude of this quantity is scale-dependent, the difference is normalized to S_t to yield

r^2 = \frac{S_t - S_r}{S_t}

where r^2 is called the coefficient of determination and r is the correlation coefficient (r = \sqrt{r^2}). For a perfect fit, S_r = 0 and r = r^2 = 1, signifying that the line explains 100 percent of the variability of the data. For r = r^2 = 0, S_r = S_t and the fit represents no improvement.

An alternative formulation for r that is more convenient for computer implementation is

r = \frac{n \sum x_i y_i - (\sum x_i)(\sum y_i)}{\sqrt{n \sum x_i^2 - (\sum x_i)^2} \sqrt{n \sum y_i^2 - (\sum y_i)^2}}



Example 17.2 Estimation of Errors for the Linear Least-Squares Fit

Problem Statement:
Compute the total standard deviation, the standard error of the estimate, and the correlation coefficient for the data in Example 17.1.

Solution:
The summations are performed and presented in the previous example's table. The standard deviation is

s_y = \sqrt{\frac{S_t}{n - 1}} = \sqrt{\frac{22.7143}{7 - 1}} = 1.9457

and the standard error of the estimate is

s_{y/x} = \sqrt{\frac{S_r}{n - 2}} = \sqrt{\frac{2.9911}{7 - 2}} = 0.7735

Thus, because s_{y/x} < s_y, the linear regression model has merit. The extent of the improvement is quantified by

r^2 = \frac{S_t - S_r}{S_t} = \frac{22.7143 - 2.9911}{22.7143} = 0.868

or

r = \sqrt{0.868} = 0.932

These results indicate that 86.8 percent of the original uncertainty has been explained by the linear model.
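
As a quick check, these error statistics can be reproduced with a few lines of Python; this is a minimal sketch using only the values of S_t, S_r, and n quoted in the example (the helper name is illustrative):

    import math

    def regression_errors(st, sr, n):
        # Standard deviation about the mean, standard error of the
        # estimate, and coefficient of determination.
        sy = math.sqrt(st / (n - 1))
        syx = math.sqrt(sr / (n - 2))
        r2 = (st - sr) / st
        return sy, syx, r2

    sy, syx, r2 = regression_errors(st=22.7143, sr=2.9911, n=7)
    print(sy, syx, r2)  # 1.9457..., 0.7735..., 0.868...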

17.1.5 Linearization of Nonlinear Relationships
Linear regression provides a powerful technique for fitting a best line to data. However, it is predicated on the fact that the relationship between the dependent and independent variables is linear. This is not always the case, and the first step in any regression analysis should be to plot and visually inspect the data to determine whether a linear model applies. For example, the next figure shows some data that is obviously curvilinear. In some cases, techniques such as polynomial regression are appropriate. For others, transformations can be used to express the data in a form that is compatible with linear regression.



One example is the exponential model

y = \alpha_1 e^{\beta_1 x}        (17.12)

where \alpha_1 and \beta_1 are constants. As shown in the next figure, the equation represents a nonlinear relationship (for \beta_1 \neq 0) between x and y.

Another example of a nonlinear model is the simple power equation

y = \alpha_2 x^{\beta_2}        (17.13)

where \alpha_2 and \beta_2 are constant coefficients. As shown in the previous figure, the equation (for \beta_2 \neq 0 or 1) is nonlinear.
A third example of a nonlinear model is the saturation-growth-rate equation

y = \alpha_3 \frac{x}{\beta_3 + x}        (17.14)

where \alpha_3 and \beta_3 are constant coefficients. This model also represents a nonlinear relationship between y and x that levels off as x increases.
A simpler alternative is to use mathematical manipulations to transform the equations into a linear form. Then, simple linear regression can be employed to fit the equations to data.

Equation (17.12) can be linearized by taking its natural logarithm:

\ln y = \ln \alpha_1 + \beta_1 x \ln e

But because \ln e = 1,

\ln y = \ln \alpha_1 + \beta_1 x

Thus, a plot of \ln y versus x will yield a straight line with a slope of \beta_1 and an intercept of \ln \alpha_1 (previous figure (d)).
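
A brief Python sketch of this transform, using synthetic data generated from hypothetical constants \alpha_1 = 2 and \beta_1 = 0.5 (chosen only for illustration; they are not from the text):

    import numpy as np

    # Synthetic data from y = 2 * exp(0.5 * x) (hypothetical constants)
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = 2.0 * np.exp(0.5 * x)

    # Fit ln(y) = ln(alpha1) + beta1 * x with a first-degree polynomial
    beta1, ln_alpha1 = np.polyfit(x, np.log(y), 1)
    alpha1 = np.exp(ln_alpha1)
    print(alpha1, beta1)  # recovers 2.0 and 0.5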


Equation (17.13) is linearized by taking its base-10 logarithm to give

\log y = \beta_2 \log x + \log \alpha_2

Thus, a plot of \log y versus \log x will yield a straight line with a slope of \beta_2 and an intercept of \log \alpha_2 (previous figure (e)).

Equation (17.14) is linearized by inverting it to give

\frac{1}{y} = \frac{\beta_3}{\alpha_3} \frac{1}{x} + \frac{1}{\alpha_3}

Thus, a plot of 1/y versus 1/x will be linear, with a slope of \beta_3/\alpha_3 and an intercept of 1/\alpha_3 (previous figure (f)).

In their transformed forms, these models can be fit with linear regression to evaluate the constant coefficients. They can then be transformed back to their original state and used for predictive purposes. Example 17.4 illustrates this procedure for Eq. (17.13).


Example 17.4 Linearization of a Power Equation

Problem Statement:
Fit Eq. (17.13) to the data in the next table using a logarithmic transformation of the data.

Solution:
The next figure (a) is a plot of the original data in its untransformed state, and figure (b) shows the plot of the transformed data. A linear regression of the log-transformed data yields the result

\log y = 1.75 \log x - 0.300

Thus, the intercept, \log \alpha_2, equals -0.300, and therefore, by taking the antilogarithm, \alpha_2 = 10^{-0.3} = 0.5. The slope is \beta_2 = 1.75. Consequently, the power equation is

y = 0.5 x^{1.75}

This curve, as plotted in the next figure (a), indicates a good fit.
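
The same transform is easy to script. Since the example's data table did not survive, this sketch checks the method against synthetic points taken exactly on the fitted curve y = 0.5 x^{1.75} (the function name is illustrative):

    import numpy as np

    def power_fit(x, y):
        # Fit log10(y) = b2*log10(x) + log10(a2), then undo the transform.
        b2, log_a2 = np.polyfit(np.log10(x), np.log10(y), 1)
        return 10.0 ** log_a2, b2

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = 0.5 * x ** 1.75            # points on the reported curve
    a2, b2 = power_fit(x, y)
    print(a2, b2)                  # 0.5 and 1.75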


17.1.6 General Comments on Linear Regression

We have focused on the simple derivation and practical use of equations to fit data. Some statistical assumptions that are inherent in the linear least-squares procedures are:

1. Each x has a fixed value; it is not random and is known without error.
2. The y values are independent random variables and all have the same variance.
3. The y values for a given x must be normally distributed.

Such assumptions are relevant to the proper derivation and use of regression. For example, the first assumption means that (1) the x values must be error-free and (2) the regression of y versus x is not the same as that of x versus y.



17.2 Polynomial Regression

Some engineering data, although exhibiting a marked pattern, is poorly represented by a straight line. For these cases, a curve would be better suited to fit the data. One method to accomplish this objective is to use transformations. Another alternative is to fit polynomials to the data using polynomial regression.

The least-squares procedure can be readily extended to fit the data to a higher-order polynomial. For example, suppose that we fit a second-order polynomial or quadratic:

y = a_0 + a_1 x + a_2 x^2 + e

For this case the sum of the squares of the residuals is

S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i - a_2 x_i^2)^2

Following the procedure of the previous section, we take the derivative of this equation with respect to each of the unknown coefficients of the polynomial, as in

\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_i - a_2 x_i^2)

\frac{\partial S_r}{\partial a_1} = -2 \sum x_i (y_i - a_0 - a_1 x_i - a_2 x_i^2)

\frac{\partial S_r}{\partial a_2} = -2 \sum x_i^2 (y_i - a_0 - a_1 x_i - a_2 x_i^2)




These equations can be set equal to zero and rearranged to develop the following set of normal equations:

n a_0 + (\sum x_i) a_1 + (\sum x_i^2) a_2 = \sum y_i

(\sum x_i) a_0 + (\sum x_i^2) a_1 + (\sum x_i^3) a_2 = \sum x_i y_i

(\sum x_i^2) a_0 + (\sum x_i^3) a_1 + (\sum x_i^4) a_2 = \sum x_i^2 y_i

where all summations are from i = 1 through n. Note that the above three equations are linear and have three unknowns: a_0, a_1, and a_2. The coefficients of the unknowns can be calculated directly from the observed data. Thus, the problem of determining a least-squares second-order polynomial is equivalent to solving a system of three simultaneous linear equations, as sketched below.
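
A minimal Python sketch of this construction, building the normal-equation matrix from data arrays and solving it with NumPy (the function name is illustrative):

    import numpy as np

    def quadratic_fit(x, y):
        # Build the 3x3 normal equations for y = a0 + a1*x + a2*x^2
        s = lambda p: np.sum(x ** p)
        A = np.array([[len(x), s(1), s(2)],
                      [s(1),   s(2), s(3)],
                      [s(2),   s(3), s(4)]])
        b = np.array([np.sum(y), np.sum(x * y), np.sum(x ** 2 * y)])
        return np.linalg.solve(A, b)  # [a0, a1, a2]

Applied to the data of the following example, this reproduces a_0 = 2.47857, a_1 = 2.35929, and a_2 = 1.86071.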


Example: Polynomial Regression

Problem Statement:
Fit a second-order polynomial to the data in the first two columns of the next table.

Solution:
From the given data,

m = 2        n = 6        \bar{x} = 2.5        \bar{y} = 25.433

\sum x_i = 15        \sum y_i = 152.6        \sum x_i y_i = 585.6

\sum x_i^2 = 55        \sum x_i^3 = 225        \sum x_i^4 = 979        \sum x_i^2 y_i = 2488.8

Therefore, the simultaneous linear equations are

\begin{bmatrix} 6 & 15 & 55 \\ 15 & 55 & 225 \\ 55 & 225 & 979 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} 152.6 \\ 585.6 \\ 2488.8 \end{Bmatrix}

Solving these equations through a technique such as Gauss elimination gives a_0 = 2.47857, a_1 = 2.35929, and a_2 = 1.86071.



Continued:
Therefore, the least-squares quadratic equation for this case is

y = 2.47857 + 2.35929 x + 1.86071 x^2

The standard error of the estimate based on the regression polynomial is

s_{y/x} = \sqrt{\frac{S_r}{n - (m + 1)}} = \sqrt{\frac{3.74657}{6 - 3}} = 1.12

The coefficient of determination is

r^2 = \frac{S_t - S_r}{S_t} = \frac{2513.39 - 3.74657}{2513.39} = 0.99851

and the correlation coefficient is r = 0.99925.

These results indicate that 99.851 percent of the original uncertainty has been explained by the model. This result supports the conclusion that the quadratic equation represents an excellent fit, as is also evident from the next figure.





17.3 Multiple Linear Regression

A useful extension of linear regression is the case where y is a linear function of two or more independent variables. For example, y might be a linear function of x_1 and x_2, as in

y = a_0 + a_1 x_1 + a_2 x_2 + e

Such an equation is particularly useful when fitting experimental data where the variable being studied is often a function of two other variables. For this two-dimensional case, the regression "line" becomes a plane (next figure).





As with the previous cases, the best values of the coefficients are determined by setting up the sum of the squares of the residuals,

S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})^2

and differentiating with respect to each of the unknown coefficients:

\frac{\partial S_r}{\partial a_0} = -2 \sum (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})

\frac{\partial S_r}{\partial a_1} = -2 \sum x_{1i} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})

\frac{\partial S_r}{\partial a_2} = -2 \sum x_{2i} (y_i - a_0 - a_1 x_{1i} - a_2 x_{2i})


The coefficients yielding the minimum sum of the squares of the residuals are obtained by setting the partial derivatives equal to zero and expressing the result in matrix form as

\begin{bmatrix} n & \sum x_{1i} & \sum x_{2i} \\ \sum x_{1i} & \sum x_{1i}^2 & \sum x_{1i} x_{2i} \\ \sum x_{2i} & \sum x_{1i} x_{2i} & \sum x_{2i}^2 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} \sum y_i \\ \sum x_{1i} y_i \\ \sum x_{2i} y_i \end{Bmatrix}
Example 17.6 Multiple Linear Regression

Problem Statement:
The following data was calculated from the equation y = 5 + 4x_1 - 3x_2. Use multiple linear regression to fit this data.

Solution:
The summations required to develop the normal equations are computed from the tabulated data. The result is

\begin{bmatrix} 6 & 16.5 & 14 \\ 16.5 & 76.25 & 48 \\ 14 & 48 & 54 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \\ a_2 \end{Bmatrix} = \begin{Bmatrix} 54 \\ 243.5 \\ 100 \end{Bmatrix}

which can be solved using a method such as Gauss elimination for

a_0 = 5        a_1 = 4        a_2 = -3

which is consistent with the original equation from which the data was derived.
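
The 3x3 system above is again a one-liner to solve; a minimal sketch using the matrix and right-hand side quoted in the example:

    import numpy as np

    A = np.array([[6.0, 16.5, 14.0],
                  [16.5, 76.25, 48.0],
                  [14.0, 48.0, 54.0]])
    b = np.array([54.0, 243.5, 100.0])
    a0, a1, a2 = np.linalg.solve(A, b)
    print(a0, a1, a2)  # 5.0, 4.0, -3.0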

The foregoing two-dimensional case can be easily extended to m dimensions, as in

y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_m x_m + e

where the standard error is formulated as

s_{y/x} = \sqrt{\frac{S_r}{n - (m + 1)}}

and the coefficient of determination is computed as in Eq. (17.10).


Although there may be certain cases where a variable is linearly related to two or more other variables, multiple linear regression has additional utility in the derivation of power equations of the general form

y = a_0 x_1^{a_1} x_2^{a_2} \cdots x_m^{a_m}

Such equations are extremely useful when fitting experimental data. To use multiple linear regression, the equation is transformed by taking its logarithm to yield

\log y = \log a_0 + a_1 \log x_1 + a_2 \log x_2 + \cdots + a_m \log x_m

This transformation is similar in spirit to the one used to fit a power equation when y is a function of a single variable x.
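
A sketch of that transform for two predictors, using NumPy's least-squares solver on the log-transformed model (the function name and synthetic test values are illustrative only):

    import numpy as np

    def multi_power_fit(x1, x2, y):
        # Solve log10(y) = log10(a0) + a1*log10(x1) + a2*log10(x2)
        A = np.column_stack([np.ones_like(y), np.log10(x1), np.log10(x2)])
        log_a0, a1, a2 = np.linalg.lstsq(A, np.log10(y), rcond=None)[0]
        return 10.0 ** log_a0, a1, a2

    # Synthetic check: points generated from y = 2 * x1^1.5 * x2^-0.5
    x1 = np.array([1.0, 2.0, 3.0, 4.0])
    x2 = np.array([1.0, 2.0, 1.0, 3.0])
    y = 2.0 * x1 ** 1.5 * x2 ** -0.5
    print(multi_power_fit(x1, x2, y))  # ~ (2.0, 1.5, -0.5)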

Problem 17.5

Use least-squares regression to fit a straight line to the following data:

x | 11 | 15 | 17 | 21 | 23 | 29 | 29 | 37 | 39
y | 29 | 21 | 29 | 14 | 21 | 15 | .. | 13 | 3

Compute the standard error of the estimate and the correlation coefficient. Plot the data and the regression line. If someone made an additional measurement of x = 10, y = 10, would you suspect that the measurement was valid or faulty? Justify your conclusion.

Solution:
The results can be summarized as

y = 31.0589 - 0.78055 x        (s_{y/x} = 4.476306; r = 0.901489)

At x = 10, the best-fit equation gives 23.2543. The line and data can be plotted along with the point (10, 10). The value of y = 10 is nearly 3 times the standard error away from the line:

23.2543 - 10 = 13.2543 \approx 3(4.476306)

Thus, we can conclude that the value is probably erroneous.
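
The check is simple to express in code; a sketch using the fitted coefficients reported above:

    a0, a1, syx = 31.0589, -0.78055, 4.476306

    y_pred = a0 + a1 * 10        # ~23.25 at x = 10
    residual = abs(10 - y_pred)
    print(residual, 3 * syx)     # ~13.25 vs ~13.43: nearly 3 s_y/x
    if residual > 2 * syx:       # a common rule of thumb
        print("measurement is suspect")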
Problem 17.13

An investigator has reported the data tabulated below for an experiment to determine the growth rate of bacteria k (per d) as a function of oxygen concentration c (mg/L). It is known that such data can be modeled by the following equation:

k = \frac{k_{max} c^2}{c_s + c^2}

where c_s and k_{max} are parameters. Use a transformation to linearize this equation. Then use linear regression to estimate c_s and k_{max} and predict the growth rate at c = 2 mg/L.
c, mg/L | 0.5 | 0.8 | 1.5 | 2.5 | 4
k, /d   | 1.1 | 2.4 | 5.3 | 7.6 | 8.9

Solution:
The equation can be linearized by inverting it to yield

\frac{1}{k} = \frac{c_s}{k_{max}} \frac{1}{c^2} + \frac{1}{k_{max}}

Consequently, a plot of 1/k versus 1/c^2 should yield a straight line with an intercept of 1/k_{max} and a slope of c_s/k_{max}.
c, mg/L   k, /d   1/c^2       1/k        (1/c^2)(1/k)   (1/c^2)^2
0.5       1.1     4.000000    0.909091   3.636364       16.000000
0.8       2.4     1.562500    0.416667   0.651042       2.441406
1.5       5.3     0.444444    0.188679   0.083857       0.197531
2.5       7.6     0.160000    0.131579   0.021053       0.025600
4         8.9     0.062500    0.112360   0.007022       0.003906
Sum               6.229444    1.758375   4.399338       18.668444

Continued:
The slope and the intercept can then be computed as

a_1 = \frac{5(4.399338) - 6.229444(1.758375)}{5(18.668444) - (6.229444)^2} = 0.202489

a_0 = \frac{1.758375}{5} - 0.202489 \frac{6.229444}{5} = 0.099396

Therefore, k_{max} = 1/0.099396 = 10.06074 and c_s = 10.06074(0.202489) = 2.037189, and the fit is

k = \frac{10.06074 c^2}{2.037189 + c^2}

This equation can be plotted together with the data. The equation can then be used to compute the growth rate at c = 2 mg/L:

k = \frac{10.06074 (2)^2}{2.037189 + (2)^2} = 6.666
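
The whole solution can be scripted; a minimal sketch of the same transform-and-fit procedure using the tabulated data:

    import numpy as np

    c = np.array([0.5, 0.8, 1.5, 2.5, 4.0])
    k = np.array([1.1, 2.4, 5.3, 7.6, 8.9])

    # Linearize: 1/k = (cs/kmax)*(1/c**2) + 1/kmax
    slope, intercept = np.polyfit(1.0 / c ** 2, 1.0 / k, 1)
    kmax = 1.0 / intercept          # ~10.06
    cs = kmax * slope               # ~2.037

    print(kmax * 2.0 ** 2 / (cs + 2.0 ** 2))  # growth rate at c = 2: ~6.666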
