
Dr Muhammad Al-Salamah, Industrial Engineering, KFUPM

Curve Fitting
Curve fitting techniques are used to fit curves to data to
obtain intermediate estimates or to derive a simpler
function from a complicated function.
Least squares regression is used when the data exhibits a
significant degree of error or noise.
Interpolation is used to fit curves that pass directly
through each of the points.
Least-Squares Regression
Least-squares regression techniques are used to fit a curve
to experimental data.
These techniques are used to derive an approximate function
that fits the shape or general trend of the data.
Techniques: linear regression, polynomial regression,
multiple linear regression
Linear Regression
Fit a straight line to a set of paired observations
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
The mathematical expression for the straight line is
$$y = a_0 + a_1 x + e$$
$e$ is called the error or residual.
The residual is the difference between the observation
and the line: $e = y - a_0 - a_1 x$

What are the values of $a_0$ and $a_1$?
Minimize the sum of the squares of the residuals
$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_{i,\text{measured}} - y_{i,\text{model}}\right)^2 = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right)^2$$
This criterion yields a unique line for a given set of data.
Least-Squares Fit of a Straight Line
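To determine the values of $a_0$ and $a_1$, differentiate $S_r$
with respect to each of the coefficients and set to zero:
$$\frac{\partial S_r}{\partial a_0} = -2 \sum \left(y_i - a_0 - a_1 x_i\right) = 0$$
$$\frac{\partial S_r}{\partial a_1} = -2 \sum \left[\left(y_i - a_0 - a_1 x_i\right) x_i\right] = 0$$
The equations become:
$$0 = \sum y_i - \sum a_0 - \sum a_1 x_i$$
$$0 = \sum y_i x_i - \sum a_0 x_i - \sum a_1 x_i^2$$
The normal equations are
$$n a_0 + \left(\sum x_i\right) a_1 = \sum y_i$$
$$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 = \sum x_i y_i$$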
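The slope and the y-intercept are given by
$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$
$$n a_0 + a_1 \sum x_i = \sum y_i \quad \Rightarrow \quad a_0 = \frac{\sum y_i - a_1 \sum x_i}{n}$$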
Example
Fit a straight line to the data
x_i y_i
1 0.5
2 2.5
3 2
4 4
5 3.5
6 6
7 5.5
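The slope and intercept formulas can be checked numerically on this data. The following is a minimal sketch in plain Python; the function name `fit_line` is an illustrative choice, not something defined in the slides.

```python
def fit_line(x, y):
    """Least-squares straight line y = a0 + a1*x from the closed-form formulas."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    # Slope from the two normal equations, then intercept by back-substitution
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a0 = (sum_y - a1 * sum_x) / n
    return a0, a1


# Data from the example table
x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]

a0, a1 = fit_line(x, y)
print(f"a0 = {a0:.4f}, a1 = {a1:.4f}")  # a0 = 0.0714, a1 = 0.8393
```

The fitted line is therefore approximately y = 0.0714 + 0.8393 x.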
Linearization of Nonlinear Relationships
Transformations can be used to express the data in a
form that is compatible with linear regression.
Suppose the relationship between x and y is
$$y = a_1 e^{b_1 x}$$
It can be linearized by taking the ln of both sides:
$$\ln y = \ln a_1 + b_1 x$$
Consider
$$y = a_2 x^{b_2}$$
It can be transformed into the linear form
$$\log y = \log a_2 + b_2 \log x$$
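The exponential case can be sketched as follows. The data here is hypothetical: it is generated without noise from a1 = 2.0 and b1 = 0.5, so the fit should recover those values. The helper `fit_line` and the variable names are illustrative assumptions.

```python
import math


def fit_line(x, y):
    """Least-squares straight line y = a0 + a1*x; returns (a0, a1)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a0 = (sum_y - a1 * sum_x) / n
    return a0, a1


# Hypothetical noise-free data generated from y = 2.0 * exp(0.5 * x)
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

# ln y = ln a1 + b1 x is a straight line in the (x, ln y) plane
intercept, slope = fit_line(x, [math.log(yi) for yi in y])
a1_model = math.exp(intercept)  # undo the logarithm on the intercept
b1_model = slope

print(a1_model, b1_model)
```

With noisy data, the fit minimizes the squared residuals of ln y rather than of y, so the result generally differs from a direct nonlinear fit.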
Consider
$$y = a_3 \frac{x}{b_3 + x}$$
It can be linearized by inverting both sides:
$$\frac{1}{y} = \frac{1}{a_3} \left( \frac{b_3 + x}{x} \right) = \frac{b_3}{a_3} \frac{1}{x} + \frac{1}{a_3}$$
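The inversion can be illustrated with a short sketch. The data is hypothetical and noise-free, generated from a3 = 4.0 and b3 = 2.0; the helper `fit_line` and the variable names are assumptions made for this illustration.

```python
def fit_line(x, y):
    """Least-squares straight line y = a0 + a1*x; returns (a0, a1)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a0 = (sum_y - a1 * sum_x) / n
    return a0, a1


# Hypothetical noise-free data generated from y = 4.0 * x / (2.0 + x)
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [4.0 * xi / (2.0 + xi) for xi in x]

# 1/y = (b3/a3) * (1/x) + 1/a3 is a straight line in the (1/x, 1/y) plane
intercept, slope = fit_line([1.0 / xi for xi in x], [1.0 / yi for yi in y])
a3_model = 1.0 / intercept    # intercept equals 1/a3
b3_model = slope * a3_model   # slope equals b3/a3

print(a3_model, b3_model)
```

The transformation requires nonzero x and y values, since both are inverted.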
Problem 2
Fit $y = a_2 x^{b_2}$ to the data
x y
1 0.5
2 1.7
3 3.4
4 5.7
5 8.4
Answer
Take the base-10 logarithm of both variables and fit the straight line
y' = a_0 + a_1 x' + e to the transformed data:
x' = log(x) y' = log(y)
0 -0.301
0.301 0.230
0.477 0.531
0.602 0.756
0.699 0.924
The fit gives
$$\log y = 1.75 \log x - 0.300$$
so that $b_2 = 1.75$ and $a_2 = 10^{-0.300} = 0.5$:
$$y = 0.5 x^{1.75}$$
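The answer to Problem 2 can be reproduced with a short sketch that applies the log-log transformation and then the straight-line formulas. The helper `fit_line` and the variable names are illustrative, not part of the slides.

```python
import math


def fit_line(x, y):
    """Least-squares straight line y = a0 + a1*x; returns (a0, a1)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a0 = (sum_y - a1 * sum_x) / n
    return a0, a1


# Data from Problem 2
x = [1, 2, 3, 4, 5]
y = [0.5, 1.7, 3.4, 5.7, 8.4]

# log y = log a2 + b2 log x is a straight line in the (log x, log y) plane
log_x = [math.log10(xi) for xi in x]
log_y = [math.log10(yi) for yi in y]
intercept, slope = fit_line(log_x, log_y)

b2 = slope
a2 = 10 ** intercept  # undo the base-10 logarithm on the intercept

print(f"a2 = {a2:.2f}, b2 = {b2:.2f}")
```

The values should come out close to the slide's answer of a2 = 0.5 and b2 = 1.75.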
Polynomial Regression
We need to fit a polynomial to data using polynomial
regression.
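A second-order polynomial or quadratic fit is
$$y = a_0 + a_1 x + a_2 x^2 + e$$
The sum of the squares of the residuals is
$$S_r = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)^2$$
Differentiate $S_r$ with respect to all parameters:
$$\frac{\partial S_r}{\partial a_0} = -2 \sum \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)$$
$$\frac{\partial S_r}{\partial a_1} = -2 \sum x_i \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)$$
$$\frac{\partial S_r}{\partial a_2} = -2 \sum x_i^2 \left(y_i - a_0 - a_1 x_i - a_2 x_i^2\right)$$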
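Set the partials to zero and arrange:
$$n a_0 + \left(\sum x_i\right) a_1 + \left(\sum x_i^2\right) a_2 = \sum y_i$$
$$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 + \left(\sum x_i^3\right) a_2 = \sum x_i y_i$$
$$\left(\sum x_i^2\right) a_0 + \left(\sum x_i^3\right) a_1 + \left(\sum x_i^4\right) a_2 = \sum x_i^2 y_i$$
These equations are called the normal equations.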
They form a system of linear equations with 3 equations
and 3 unknowns.
In general, an mth order polynomial requires solving a
system of m+1 linear equations.
Problem 3
Fit a second-order polynomial to the data
x y
0 2.1
1 7.7
2 14
3 27
4 41
5 61
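One way to work Problem 3 is to assemble the normal equations for the quadratic and solve the resulting 3 by 3 system. The sketch below uses plain Python with a small Gaussian elimination routine; the function names `solve` and `fit_poly` are illustrative choices, and a library solver could be substituted.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for k in range(n):
        # Swap in the row with the largest pivot for numerical stability
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    z = [0.0] * n
    for k in range(n - 1, -1, -1):  # back-substitution
        s = sum(M[k][c] * z[c] for c in range(k + 1, n))
        z[k] = (M[k][n] - s) / M[k][k]
    return z


def fit_poly(x, y, m=2):
    """Coefficients [a0, ..., am] of the least-squares polynomial of order m."""
    # Row j of the normal equations: sum_k a_k * sum_i x_i^(j+k) = sum_i x_i^j * y_i
    A = [[sum(xi ** (j + k) for xi in x) for k in range(m + 1)]
         for j in range(m + 1)]
    b = [sum((xi ** j) * yi for xi, yi in zip(x, y)) for j in range(m + 1)]
    return solve(A, b)


# Data from Problem 3
x = [0, 1, 2, 3, 4, 5]
y = [2.1, 7.7, 14, 27, 41, 61]

a0, a1, a2 = fit_poly(x, y)
print(f"y = {a0:.4f} + {a1:.4f} x + {a2:.4f} x^2")
```

The same `fit_poly` call with a different `m` builds the m+1 normal equations of a higher-order polynomial, although the system becomes increasingly ill-conditioned as the order grows.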
Multiple Linear Regression
The function y is a linear function of 2 or more
independent variables, such as
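$$y = a_0 + a_1 x_1 + a_2 x_2 + e$$
The sum of the squares of the residuals is
$$S_r = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}\right)^2$$
To minimize $S_r$, differentiate with respect to each coefficient and set the partials to zero:
$$\frac{\partial S_r}{\partial a_0} = -2 \sum \left(y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}\right) = 0$$
$$\frac{\partial S_r}{\partial a_1} = -2 \sum x_{1i} \left(y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}\right) = 0$$
$$\frac{\partial S_r}{\partial a_2} = -2 \sum x_{2i} \left(y_i - a_0 - a_1 x_{1i} - a_2 x_{2i}\right) = 0$$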
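The normal equations are
$$n a_0 + \left(\sum x_{1i}\right) a_1 + \left(\sum x_{2i}\right) a_2 = \sum y_i$$
$$\left(\sum x_{1i}\right) a_0 + \left(\sum x_{1i}^2\right) a_1 + \left(\sum x_{1i} x_{2i}\right) a_2 = \sum x_{1i} y_i$$
$$\left(\sum x_{2i}\right) a_0 + \left(\sum x_{1i} x_{2i}\right) a_1 + \left(\sum x_{2i}^2\right) a_2 = \sum x_{2i} y_i$$
This is a system of 3 linear equations in 3 unknowns.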
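The normal equations for two independent variables can be assembled and solved as in the sketch below. The data is hypothetical and does not come from the slides: it is generated without noise from y = 5 + 4 x1 - 3 x2, so the fit should recover those coefficients. The function names are illustrative.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    z = [0.0] * n
    for k in range(n - 1, -1, -1):  # back-substitution
        s = sum(M[k][c] * z[c] for c in range(k + 1, n))
        z[k] = (M[k][n] - s) / M[k][k]
    return z


def fit_multilinear(x1, x2, y):
    """Coefficients [a0, a1, a2] of y = a0 + a1*x1 + a2*x2 by least squares."""
    cols = [[1.0] * len(y), x1, x2]  # the constant term acts as a column of ones
    # Entry (j, k) is the sum of products of columns j and k,
    # which reproduces the 3 x 3 normal-equation matrix
    A = [[sum(u * v for u, v in zip(cj, ck)) for ck in cols] for cj in cols]
    b = [sum(u * yi for u, yi in zip(cj, y)) for cj in cols]
    return solve(A, b)


# Hypothetical noise-free data generated from y = 5 + 4*x1 - 3*x2
x1 = [0.0, 2.0, 2.5, 1.0, 4.0, 7.0]
x2 = [0.0, 1.0, 2.0, 3.0, 6.0, 2.0]
y = [5.0, 10.0, 9.0, 0.0, 3.0, 27.0]

a0, a1, a2 = fit_multilinear(x1, x2, y)
print(a0, a1, a2)
```

Adding a further independent variable only requires appending another column to `cols`.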
Q1. An electric heating coil is immersed in a stirred tank. Solvent at
15 °C with heat capacity 2.1 kJ kg⁻¹ °C⁻¹ is fed into the tank at a rate of
15 kg h⁻¹. Heated solvent is discharged at the same flow rate. The tank
is filled initially with 125 kg cold solvent at 10 °C. The rate of heating
by the electric coil is 800 W. Calculate the time required for the
temperature of the solvent to reach 60 °C.
Tutorial: 5
Regression in Matlab
Use the in polymath
Additional Example
The natural gas consumption for electric power generation
in the Kingdom from 1977 to 2000 is shown in the graph
below.
[Figure: natural gas consumption in million cubic meters (vertical axis, 0 to 12000) versus year (horizontal axis, 1975 to 2005)]
Observations from the data:
There is an upward trend in the observations.
The relation between gas consumption and year appears
to be linear; i.e. the general trend of the data is linear.
Can regression be used?
Yes, because the gas consumption values are not
precise (there are errors in the measurements).
We can assume that normality holds.
[Figure: natural gas consumption in million cubic meters (vertical axis, 0 to 12000) versus year (horizontal axis, 1976 to 1996)]
Using the equations:
$a_1 = 393.94$
$a_0 = -777828$
The coefficient of determination = 0.8811
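As a quick illustration of how these coefficients are used, the sketch below evaluates the fitted line at a few years. The function name `predict` and the chosen years are illustrative. Because the intercept is large and the slope is quoted to two decimals, the outputs are sensitive to rounding in the coefficients and should be read as approximate.

```python
# Coefficients reported on the slide for consumption = a0 + a1 * year
a0 = -777828.0
a1 = 393.94


def predict(year):
    """Estimated gas consumption (million cubic meters) from the fitted line."""
    return a0 + a1 * year


for year in (1977, 1990, 2000):
    print(year, round(predict(year), 1))
```

For the year 2000 the line gives roughly 10,000 million cubic meters, which lies within the vertical range of the plot.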
Quantification of Error of Linear Regression
To quantify the error reduction due to describing the
data in terms of a straight line, we use the coefficient of
determination which is defined as

$$r^2 = \frac{S_t - S_r}{S_t} \quad \text{where} \quad S_t = \sum \left(y_i - \bar{y}\right)^2$$
It represents the fraction of variability in y that can be
explained by the variability in x (how close the points are
to the line).
A value of $r^2 = 1$ signifies that the line explains 100% of the
variability of the data.
$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_{i,\text{measured}} - y_{i,\text{model}}\right)^2 = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right)^2$$
P448/Num-methods
If $r^2$ is 0.87, then 87% of the
original uncertainty has been
explained by the model.
Example
Compute the coefficient of determination for the linear
regression in the earlier straight-line example.
Answer
$S_t = 22.7143$
$S_r = 2.9911$
$r^2 = 0.868$
This indicates that 86.8% of the original uncertainty is
explained by the linear model.
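These values can be verified with a short sketch that refits the straight-line example data (x = 1 to 7) and evaluates both sums of squares. The function names `fit_line` and `r_squared` are illustrative.

```python
def fit_line(x, y):
    """Least-squares straight line y = a0 + a1*x; returns (a0, a1)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a0 = (sum_y - a1 * sum_x) / n
    return a0, a1


def r_squared(x, y):
    """Return (S_t, S_r, r^2) for the least-squares line through the data."""
    a0, a1 = fit_line(x, y)
    y_mean = sum(y) / len(y)
    s_t = sum((yi - y_mean) ** 2 for yi in y)                     # spread about the mean
    s_r = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))   # spread about the line
    return s_t, s_r, (s_t - s_r) / s_t


# Data from the straight-line example
x = [1, 2, 3, 4, 5, 6, 7]
y = [0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5]

s_t, s_r, r2 = r_squared(x, y)
print(f"S_t = {s_t:.4f}, S_r = {s_r:.4f}, r^2 = {r2:.3f}")
# S_t = 22.7143, S_r = 2.9911, r^2 = 0.868
```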