ECM6 Computational Methods


Lecture 11a
Linear Regression
Brian G. Higgins
Department of Chemical Engineering & Materials Science
University of California, Davis
April 2014, Hanoi, Vietnam


Background
A common task in engineering is to determine a formula that describes how one physical quantity, say heat capacity, varies as a function of a second quantity, say temperature.
As a result of an experiment you have made a set of measurements and have M data points in the form

$\{(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_M, y_M)\}$

Here $x_i$ represents, say, a temperature measurement and $y_i$ the corresponding heat capacity. When we plot the data we might get something like this:
[Plot: heat capacity $C_P$ versus temperature, with the measured data points scattered about a blue curve.]
Based on your understanding of the physics you may have an idea that the data should agree with the blue line shown in the above plot.
We call the blue line the guess function $g(x)$.
Since the points $(x_i, y_i)$ are experimental, they are not likely to lie on the curve given by the guess function. Thus we define

$d_k = g(x_k) - y_k$

where $d_k$ represents the vertical distance between the data point $(x_k, y_k)$ and the curve given by the guess function $g(x)$. Here is a graphical representation:


[Plot: $C_P$ versus temperature, showing the vertical distance $d_k = g(x_k) - y_k$ between the data point $(x_k, y_k)$ and the curve $g(x)$.]

Types of Errors
We can define three types of error:

Absolute Error of $g$:  $E_1(g) = |d_1| + |d_2| + \cdots + |d_M| = \sum_{k=1}^{M} |d_k|$  (1)

Square Error of $g$:  $E_2(g) = d_1^2 + d_2^2 + \cdots + d_M^2 = \sum_{k=1}^{M} d_k^2$  (2)

Maximum Error of $g$:  $E_3(g) = \max\{|d_1|, |d_2|, |d_3|, \ldots, |d_M|\}$  (3)

Note that if the graph passes through all the points $(x_k, y_k)$, then the error $E_i(g)$, no matter how it is defined, is zero.
On the other hand, as $E_i(g)$ gets larger (some or all of the data points do not lie on the curve), the guess function $g(x)$ fits the data less well. In short, the smaller $E_i(g)$ is, the better the fit.
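As a concrete illustration, here is a minimal Mathematica sketch of the three error measures (our addition, not part of the original notes); the names absError, sqError, and maxError and the argument pts, a list of {x, y} pairs, are illustrative choices:

absError[g_, pts_] := Total[Abs[g[#1] - #2] & @@@ pts]   (* E1: sum of |dk| *)
sqError[g_, pts_]  := Total[(g[#1] - #2)^2 & @@@ pts]    (* E2: sum of dk^2 *)
maxError[g_, pts_] := Max[Abs[g[#1] - #2] & @@@ pts]     (* E3: largest |dk| *)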

Squared Error
The square error $E_2(g)$ is the most widely used estimate of how well $g(x)$ fits the data. There are statistical reasons why this is the best way to estimate the goodness of a fit, but we will not discuss them in these lectures.
Thus our goal will be to

minimize $E_2(g)$, where  $E_2(g) = \sum_{k=1}^{M} [g(x_k) - y_k]^2$  (4)

We then say that we seek a $g(x)$ that has the least square error. Another way of saying this: the $g(x)$ with the minimum $E_2(g)$ is called the least-squares $g(x)$.


Example 1
Problem Statement
Suppose we are given the following list of data points $(x_k, y_k)$:

{{1, 1.5}, {2, 3.9}, {4, 6.6}, {7, 11.7}, {9, 15.6}}

(i) Determine the square error as a function of $k$ for

$g(x) = k x$

(ii) Find the parameter value of $k$ that minimizes $E_2(g)$. What is $E_2(g)$ for this value of $k$?

Solution Step 1: Define Sum of Squared Error Function


We will use Mathematica to do these calculations. The data are given by

data = {{1, 1.5}, {2, 3.9}, {4, 6.6}, {7, 11.7}, {9, 15.6}};

Next we define a function for the error $E_2(g)$. Note that in the notebook the name is written with a double-struck E (rendered here as E2), because the plain symbol E in Mathematica is the protected exponential constant 2.7182...

E2[g_, M_] := Sum[(g[x[i]] - y[i])^2, {i, 1, M}]

Note that the parameter M defines the number of data points.

Solution Step 2: Compute the Sum of Squared Error using the data
Here we have supposed that x[i] and y[i] return our data points. These are defined from the given data list as follows:

x[i_] := data[[i, 1]]; y[i_] := data[[i, 2]]; M = Length[data];

Next we must define our function g:

g[x_] := k x

Then the sum of squares is given by

E2[g, M]

(-1.5 + k)^2 + (-3.9 + 2 k)^2 + (-6.6 + 4 k)^2 + (-11.7 + 7 k)^2 + (-15.6 + 9 k)^2

We can also express this result as

Factor[E2[g, M]] // Simplify

441.27 - 516. k + 151. k^2

which symbolically can be expressed as

$E_2(g) = k^2 \sum_{i=1}^{M} x_i^2 - 2 k \sum_{i=1}^{M} x_i y_i + \sum_{i=1}^{M} y_i^2$  (5)
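As a quick check (our addition), we can evaluate the three sums in Eq. (5) directly from the data already defined above and confirm that they reproduce the expanded polynomial:

{Sum[x[i]^2, {i, 1, M}], Sum[x[i] y[i], {i, 1, M}], Sum[y[i]^2, {i, 1, M}]}

(* {151, 258., 441.27}, so E2(g) = 151 k^2 - 516. k + 441.27, as above *)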


Graphical View of Sum of Errors


Let us plot the square error as a function of the unknown parameter k in our model:

Plot[E2[g, M], {k, -3, 5}, PlotStyle -> {Blue, Thick},
 Frame -> True, FrameLabel -> {Style["k", 16], Style["E2", 16]}]
[Plot: $E_2$ versus $k$ for $-3 \le k \le 5$; a parabola with a single minimum.]
It is clear that $E_2$ has a minimum value near $k \approx 1.8$.

Computing the value of k


From calculus we know that the minimum value for $E_2$ occurs when

$\dfrac{\partial E_2}{\partial k} = 0, \qquad \dfrac{\partial^2 E_2}{\partial k^2} > 0 \quad \text{at } k = k_{\min}$

Differentiating $E_2$ gives

D[E2[g, M], k] // Simplify

302. (-1.70861 + k)

and solving for $k_{\min}$ where $\partial E_2 / \partial k = 0$ we get

sol = Flatten[Solve[D[E2[g, M], k] == 0]]

{k -> 1.70861}

We can check that this value of $k$ gives a local minimum:

D[E2[g, M], {k, 2}] /. sol

302.

Thus our "best" guess function is

$g(x) = 1.7086\, x$
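Since $E_2$ is quadratic in $k$ (Eq. (5)), the minimizer can also be written in closed form. This short derivation is our addition, but it matches the numerical result above:

$\dfrac{\partial E_2}{\partial k} = 2 k \sum_{i=1}^{M} x_i^2 - 2 \sum_{i=1}^{M} x_i y_i = 0 \quad \Longrightarrow \quad k_{\min} = \dfrac{\sum_{i=1}^{M} x_i y_i}{\sum_{i=1}^{M} x_i^2} = \dfrac{258}{151} \approx 1.70861$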

Plotting the Best Fit


Let us plot the results and compare the fit:

plt1 = Plot[g[x] /. sol, {x, 0, 10}, PlotStyle -> {Thick, Blue},
  FrameLabel -> {Style["x", 16], Style["g(x)", 16]}, Frame -> True];
plt2 = ListPlot[data, PlotStyle -> {PointSize[0.02], Red}];
Show[plt1, plt2]
[Plot: the fitted line $g(x) = 1.7086\,x$ (blue) through the data points (red).]
Finally, let us evaluate the least square error:

E2[g, M] /. sol

0.448808

Note that the actual value is not a small number, yet the fit looks good. We will address this point later.


General Theory for Linear Regression


Background
In the last example we considered a fitting function with a single parameter. Here we extend the method to two parameters, which gives the general theory for linear regression.
We will now extend the ideas from the previous section to find the best fit, in a least-squares sense, for the function

$L(x) = m x + b$

where $m$ is the slope and $b$ is the y-intercept. As before, we are given a set of M data points in the form

$\{(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_M, y_M)\}$

and our objective is to find the parameters $m$ and $b$ that minimize the least square error. Thus

$E_2(L) = \sum_{k=1}^{M} (m x_k + b - y_k)^2$

Minimizing the Squared Error


We know from calculus that the desired values of $b$ and $m$ must satisfy

$\dfrac{\partial E_2(L)}{\partial b} = 0, \qquad \dfrac{\partial E_2(L)}{\partial m} = 0$

Note that since the $x_k$'s and $y_k$'s are constant, we can evaluate the following sum:

$\sum_{k=1}^{M} b = M b$

Now evaluating the derivatives using the chain rule gives

$\dfrac{\partial E_2(L)}{\partial b} = \sum_{k=1}^{M} 2 (m x_k + b - y_k) \dfrac{\partial}{\partial b}(m x_k + b - y_k) = \sum_{k=1}^{M} 2 (m x_k + b - y_k) = 2 \left( M b + m \sum_{k=1}^{M} x_k - \sum_{k=1}^{M} y_k \right)$

Similarly we find

$\dfrac{\partial E_2(L)}{\partial m} = \sum_{k=1}^{M} 2 (m x_k + b - y_k)\, x_k = 2 \left( m \sum_{k=1}^{M} x_k^2 + b \sum_{k=1}^{M} x_k - \sum_{k=1}^{M} y_k x_k \right)$

Then setting the partial derivatives to zero gives the following linear set of equations for $m$ and $b$:

$M b + m \sum_{k=1}^{M} x_k - \sum_{k=1}^{M} y_k = 0$

$m \sum_{k=1}^{M} x_k^2 + b \sum_{k=1}^{M} x_k - \sum_{k=1}^{M} y_k x_k = 0$

We can write this system in matrix notation as

$\begin{pmatrix} M & \sum_{k=1}^{M} x_k \\ \sum_{k=1}^{M} x_k & \sum_{k=1}^{M} x_k^2 \end{pmatrix} \begin{pmatrix} b \\ m \end{pmatrix} = \begin{pmatrix} \sum_{k=1}^{M} y_k \\ \sum_{k=1}^{M} y_k x_k \end{pmatrix}$

Let us show how this can be programmed in Mathematica.
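Before working the next example by direct differentiation, here is a minimal sketch (our addition) of solving this 2x2 system directly with LinearSolve; the name normalEquationsFit and the argument pts, a list of {x, y} pairs, are illustrative choices, not part of the original notes:

normalEquationsFit[pts_] :=
 Module[{xs = pts[[All, 1]], ys = pts[[All, 2]], mat, rhs},
  (* Build the 2x2 normal-equations matrix and right-hand side from the sums above *)
  mat = {{Length[pts], Total[xs]}, {Total[xs], Total[xs^2]}};
  rhs = {Total[ys], Total[xs ys]};
  LinearSolve[mat, rhs]  (* returns {b, m} *)
 ]

Applied to the data of Example 2 below, normalEquationsFit[data2] returns {4.15298, -0.166026}, matching the result obtained there.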


Example 2
Problem Statement
Suppose we are given the data points

{{1, 5.12}, {3, 3}, {6, 2.48}, {9, 2.34}, {15, 2.18}}

and we want to find the parameters $m$ and $b$ that give the least square error for a guess function

$L(x) = m x + b$

Solution Step 1: Definition of Functions


We proceed as before and define the following quantities:

data2 = {{1, 5.12}, {3, 3}, {6, 2.48}, {9, 2.34}, {15, 2.18}};
x[i_] := data2[[i, 1]]; y[i_] := data2[[i, 2]]; M = Length[data2];
L[x_] := m x + b
E2[g_, M_] := Sum[(g[x[i]] - y[i])^2, {i, 1, M}]

Evaluating the least squares function gives

E2[L, M]

(-5.12 + b + m)^2 + (-3 + b + 3 m)^2 + (-2.48 + b + 6 m)^2 + (-2.34 + b + 9 m)^2 + (-2.18 + b + 15 m)^2

Solution Step 2: Determining the Parameters


Taking the derivatives gives the following equations:

eqns = {D[E2[L, M], m] == 0, D[E2[L, M], b] == 0}

{2 (-5.12 + b + m) + 6 (-3 + b + 3 m) + 12 (-2.48 + b + 6 m) +
   18 (-2.34 + b + 9 m) + 30 (-2.18 + b + 15 m) == 0,
 2 (-5.12 + b + m) + 2 (-3 + b + 3 m) + 2 (-2.48 + b + 6 m) +
   2 (-2.34 + b + 9 m) + 2 (-2.18 + b + 15 m) == 0}

sol2 = Solve[eqns]

{{b -> 4.15298, m -> -0.166026}}

The least square error is

E2[L, M] /. sol2

{2.54009}

Solution Step 3: Visualizing the Solution


Now let us visualize the fit graphically:

plt1 = Plot[L[x] /. sol2, {x, 0, 20}, PlotStyle -> {Thick, Blue},
  Frame -> True, FrameLabel -> {Style["x", 16], Style["L(x)", 16]}];
plt2 = ListPlot[data2, PlotStyle -> {PointSize[0.02], Red}];
Show[plt1, plt2]
[Plot: the fitted line $L(x)$ (blue) and the data points (red) for $0 \le x \le 20$.]
It is apparent from the plot that a straight line is not a particularly good fit to the data.


Using Mathematica's Fit Function


Mathematica has a built-in function, Fit, that uses least squares to determine a best fit to a given set of basis functions. Here is its usage message:

?Fit

Fit[data, funs, vars] finds a least-squares fit to a list of data as a linear combination of the functions funs of variables vars.

Let us apply it to our data:

linearFit = Fit[data2, {1, x}, x]

4.15298 - 0.166026 x

We see that we get the same result.
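As a quick follow-up (our addition, assuming linearFit and data2 are defined as above), we can confirm that the squared error of this fit matches $E_2(L)$ from Step 2:

resid = ((linearFit /. x -> #1) - #2) & @@@ data2;  (* dk for each data point *)
Total[resid^2]

(* 2.54009, the same value as E2[L, M] /. sol2 *)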


Final Comments
In these notes we have outlined the general principle of linear regression.
We have kept the discussion limited to fitting the data to a linear function

$L(x) = m x + b$

In this case there are two parameters, $m$ and $b$, and they appear linearly in the fitting function.
The term linear regression does not mean that the function is linear in the independent variable $x$, but that it is linear in the unknown parameters $m$ and $b$.
For example, the same ideas and programming steps apply to fitting the data to a polynomial function of $x$:

$P(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_n x^n$

Note that the unknown parameters $a_0, a_1, \ldots, a_n$ appear linearly in the function $P(x)$.
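As an illustration (our addition, assuming data2 from Example 2 is still defined), the Fit function shown earlier handles this case directly; extending the basis list is all that is required to fit, say, a quadratic:

quadFit = Fit[data2, {1, x, x^2}, x]  (* basis {1, x, x^2} gives a0 + a1 x + a2 x^2 *)

The least-squares problem remains linear in the coefficients $a_0, a_1, a_2$ even though the model is nonlinear in $x$.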

References
These notes and the examples are adapted from Maron (1987):
M. J. Maron, Numerical Analysis: A Practical Approach, 2nd Edition, Macmillan Publishing Company, 1987.
