Lecture 7:
Least Squares Regression
Objectives
- Introduction
- Linear regression
- Polynomial regression
- Multiple linear regression
- General linear least squares
- Nonlinear regression
Introduction
- Experimentation: data are available at discrete points or times
- Estimates are required at points between the discrete values
- Curves are fit to the data in order to estimate the intermediate values
Interpolation
- Precise data
- Force the curve through each data point
[Figure: Temperature (deg F) vs Time (s), curve passing through every data point]
Regression
- Noisy data
[Figure: f(x) vs x, smooth curve through scattered data points]
Experimental Data
- Noisy (contains errors or inaccuracies)
- x values are accurate, y values are not

    x       y
    2.10    2.90
    6.22    3.83
    7.17    5.98
    10.5    5.71
    13.7    7.74

Model to fit: y = a0 + a1 x
[Figure: the data points and the fitted line f(x) on a plot of y vs x]
Residual
    e = y - (a0 + a1 x)
Regression model
    y = a0 + a1 x
[Figure: data points y1 ... y5, the fitted line f(x), and residuals (e.g. e2, e3) measured vertically from each point to the line]
Use the curve that minimizes the residuals between the data points and the line.
Model: y = a0 + a1 x
Residual at each data point: ei = yi - a0 - a1 xi
Sum of the squares of the residuals:
    Sr = Σ_{i=1..n} ei² = Σ_{i=1..n} (yi - a0 - a1 xi)²
Find the values of a0 and a1 that minimize Sr.
Dr. M. Hrairi MTH2212 - Computational Methods and Statistics 8
Linear Regression: Finding a0 and a1
Minimize Sr by taking derivatives with respect to a0 and a1 and setting them to zero.
First, a0:
    ∂Sr/∂a0 = ∂/∂a0 Σ_{i=1..n} (yi - a0 - a1 xi)²
            = -2 Σ_{i=1..n} (yi - a0 - a1 xi) = 0
Finally:
    n a0 + (Σ xi) a1 = Σ yi
Linear Regression: Finding a0 and a1 (continued)
Next, a1:
    ∂Sr/∂a1 = ∂/∂a1 Σ_{i=1..n} (yi - a0 - a1 xi)²
            = -2 Σ_{i=1..n} (yi - a0 - a1 xi) xi = 0
Finally:
    (Σ xi) a0 + (Σ xi²) a1 = Σ xi yi
Together with the first result, this gives the two normal equations:
    n a0 + (Σ xi) a1 = Σ yi
    (Σ xi) a0 + (Σ xi²) a1 = Σ xi yi
Solving the two equations simultaneously:
    a0 = [ (1/n) Σ yi · Σ xi² - (1/n) Σ xi · Σ xi yi ] / [ Σ xi² - (1/n)(Σ xi)² ]
    a1 = [ Σ xi yi - (1/n) Σ xi · Σ yi ] / [ Σ xi² - (1/n)(Σ xi)² ]
    x       y
    2.10    2.90
    6.22    3.83
    7.17    5.98
    10.5    5.71
    13.7    7.74
[Figure: the example data with the fitted line, illustrating the variation of the yi about the line]
Total variation in y = variation explained by the model + unexplained variation (error)
In addition to r² and r, define the standard error of the estimate:
    sy|x = sqrt( Sr / (n - 2) )
- Represents the distribution of the residuals around the regression line
- Large sy|x → large residuals
- Small sy|x → small residuals
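To make these error measures concrete, here is a small sketch (my own illustration, not from the lecture notes) that computes Sr, r², and sy|x for the straight-line fit of the five-point example data:

```python
import math

# 5-point example data from the lecture
x = [2.10, 6.22, 7.17, 10.5, 13.7]
y = [2.90, 3.83, 5.98, 5.71, 7.74]
n = len(x)

# Straight-line least-squares fit (normal-equation formulas)
Sx, Sy = sum(x), sum(y)
Sxx = sum(v * v for v in x)
Sxy = sum(u * v for u, v in zip(x, y))
a1 = (n * Sxy - Sx * Sy) / (n * Sxx - Sx ** 2)
a0 = Sy / n - a1 * Sx / n

ymean = Sy / n
St = sum((v - ymean) ** 2 for v in y)                    # total variation
Sr = sum((v - a0 - a1 * u) ** 2 for u, v in zip(x, y))   # unexplained variation

r2  = 1 - Sr / St                # coefficient of determination
syx = math.sqrt(Sr / (n - 2))    # standard error of the estimate

print(f"r^2 = {r2:.3f}, s_y|x = {syx:.3f}")
```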
Polynomial Regression
For the same example data:
    x       y
    2.10    2.90
    6.22    3.83
    7.17    5.98
    10.5    5.71
    13.7    7.74
Linear:    yi = a0 + a1 xi
Quadratic: yi = a0 + a1 xi + a2 xi²
Residual
    ei = yi - (a0 + a1 xi + a2 xi² + ... + am xi^m)
Normal Equations
For a quadratic (m = 2), with all sums taken over i = 1 to n:
    | n      Σxi     Σxi²  | | a0 |   | Σyi     |
    | Σxi    Σxi²    Σxi³  | | a1 | = | Σxi yi  |
    | Σxi²   Σxi³    Σxi⁴  | | a2 |   | Σxi² yi |
For a general m-th order polynomial:
    | n         Σxi        ...  Σxi^m      | | a0 |   | Σyi      |
    | Σxi       Σxi²       ...  Σxi^(m+1)  | | a1 | = | Σxi yi   |
    | ...       ...        ...  ...        | | .. |   | ...      |
    | Σxi^m     Σxi^(m+1)  ...  Σxi^(2m)   | | am |   | Σxi^m yi |
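These normal equations can be built and solved mechanically for any order m. The sketch below (assuming NumPy; the function name is my own) does exactly that:

```python
import numpy as np

def polyfit_normal_equations(x, y, m):
    """Fit an m-th order polynomial y = a0 + a1*x + ... + am*x^m by
    building and solving the normal equations shown above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Left-hand matrix: entry (j, k) is the sum of x_i^(j+k)
    A = np.array([[np.sum(x ** (j + k)) for k in range(m + 1)]
                  for j in range(m + 1)])
    # Right-hand side: entry j is the sum of x_i^j * y_i
    b = np.array([np.sum((x ** j) * y) for j in range(m + 1)])
    return np.linalg.solve(A, b)    # coefficients a0 ... am

# Quadratic fit to the 5-point example data
coeffs = polyfit_normal_equations([2.10, 6.22, 7.17, 10.5, 13.7],
                                  [2.90, 3.83, 5.98, 5.71, 7.74], m=2)
print(coeffs)
```

Note that forming the normal equations explicitly becomes ill-conditioned for high orders; library routines use more stable factorizations, but for low-order fits the direct approach matches.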
Example: fit a cubic polynomial to the following data
    x: 0    1.0  1.5  2.3  2.5  4.0  5.1  6.0  6.5  7.0  8.1  9.0
    y: 0.2  0.8  2.5  2.5  3.5  4.3  3.0  5.0  3.5  2.4  1.3  2.0
    x: 9.3  11.0 11.3 12.1 13.1 14.0 15.5 16.0 17.5 17.8 19.0 20.0
    y: -0.3 -1.3 -3.0 -4.0 -4.9 -4.0 -5.2 -3.0 -3.5 -1.6 -1.4 -0.1
For the cubic (m = 3), the normal equations are:
    | n      Σxi     Σxi²    Σxi³  | | a0 |   | Σyi      |
    | Σxi    Σxi²    Σxi³    Σxi⁴  | | a1 | = | Σxi yi   |
    | Σxi²   Σxi³    Σxi⁴    Σxi⁵  | | a2 |   | Σxi² yi  |
    | Σxi³   Σxi⁴    Σxi⁵    Σxi⁶  | | a3 |   | Σxi³ yi  |
Solving this system gives
    a0 = -0.3593, a1 = 2.3051, a2 = -0.3532, a3 = 0.0121
Regression Equation
    y = - 0.359 + 2.305x - 0.353x2 + 0.012x3
[Figure: the cubic regression curve f(x) plotted with the data points for 0 ≤ x ≤ 25]
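This cubic example can be reproduced numerically; the sketch below (my own check, not from the lecture) solves the 4x4 normal equations for the 24-point data set and cross-checks against NumPy's built-in polynomial fit:

```python
import numpy as np

# 24-point data set from the cubic regression example
x = np.array([0, 1.0, 1.5, 2.3, 2.5, 4.0, 5.1, 6.0, 6.5, 7.0, 8.1, 9.0,
              9.3, 11.0, 11.3, 12.1, 13.1, 14.0, 15.5, 16.0, 17.5, 17.8,
              19.0, 20.0])
y = np.array([0.2, 0.8, 2.5, 2.5, 3.5, 4.3, 3.0, 5.0, 3.5, 2.4, 1.3, 2.0,
              -0.3, -1.3, -3.0, -4.0, -4.9, -4.0, -5.2, -3.0, -3.5, -1.6,
              -1.4, -0.1])

# Solve the 4x4 cubic normal equations directly ...
A = np.array([[np.sum(x ** (j + k)) for k in range(4)] for j in range(4)])
b = np.array([np.sum((x ** j) * y) for j in range(4)])
coeffs = np.linalg.solve(A, b)       # a0, a1, a2, a3

# ... and cross-check against NumPy's least-squares polynomial fit
check = np.polyfit(x, y, 3)[::-1]    # polyfit returns highest order first
print(coeffs)                        # should approximate the slide's coefficients
```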
Multiple Linear Regression
    y = a0 + a1 x1 + a2 x2 + e
Again very similar: minimize the sum of the squared residuals.
Polynomial and multiple regression both fall within the definition of general linear least squares.
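A minimal sketch of a two-predictor fit (the data below are made up for illustration; they are generated exactly from a0 = 1, a1 = 2, a2 = -0.5, so the fit should recover those values):

```python
import numpy as np

# Multiple linear regression y = a0 + a1*x1 + a2*x2 by least squares
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([1.0, 0.5, 4.0, 2.0, 3.5, 6.0])
y  = 1.0 + 2.0 * x1 - 0.5 * x2          # noiseless synthetic data

# Design matrix: a column of ones for a0, then the two predictors
Z = np.column_stack([np.ones_like(x1), x1, x2])
a, *_ = np.linalg.lstsq(Z, y, rcond=None)

print(a)   # close to [1.0, 2.0, -0.5]
```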
General Linear Least Squares
    y = a0 z0 + a1 z1 + a2 z2 + ... + am zm + e
where z0, z1, ..., zm are m + 1 basis functions. In matrix form:
    Y = Z A + E
- Z = matrix of the calculated values of the basis functions at the measured values of the independent variable
- Y = observed values of the dependent variable
- A = unknown coefficients
- E = residuals
Sum of the squares of the residuals:
    Sr = Σ_{i=1..n} ( yi - Σ_{j=0..m} aj zji )²
Sr is minimized by taking its partial derivative with respect to each of the coefficients and setting the resulting equation equal to zero.
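Minimizing Sr leads to the normal equations (Zᵀ Z) A = Zᵀ Y. A short sketch (the basis functions here are my own illustrative choice, not from the lecture):

```python
import numpy as np

# General linear least squares: y = a0*z0(x) + a1*z1(x) + a2*z2(x) + e,
# with illustrative basis functions z0 = 1, z1 = x, z2 = sin(x).
x = np.linspace(0.0, 6.0, 25)
y = 0.5 + 1.5 * x - 2.0 * np.sin(x)      # noiseless synthetic data

# Z: values of each basis function at the measured x (one column per basis)
Z = np.column_stack([np.ones_like(x), x, np.sin(x)])

# Solve the normal equations (Z^T Z) A = Z^T Y
A = np.linalg.solve(Z.T @ Z, Z.T @ y)
print(A)   # close to [0.5, 1.5, -2.0]
```

The model is still linear in the coefficients even though sin(x) is nonlinear in x; that is what keeps polynomial, multiple, and basis-function regression inside the same framework.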
Nonlinear Regression
Some models cannot be written as a linear combination of basis functions, for example:
    y = a0 (1 - e^(-a1 x)) + e
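Such models are fitted iteratively. Below is a minimal Gauss-Newton sketch (my own illustration, not from the lecture) for y = a0 (1 - e^(-a1 x)), using noiseless synthetic data so the true parameters should be recovered:

```python
import numpy as np

def gauss_newton(x, y, a0, a1, iters=50):
    """Gauss-Newton iteration for the model y = a0 * (1 - exp(-a1 * x))."""
    for _ in range(iters):
        f = a0 * (1.0 - np.exp(-a1 * x))       # model predictions
        r = y - f                              # residuals
        # Jacobian of the model with respect to (a0, a1)
        J = np.column_stack([1.0 - np.exp(-a1 * x),
                             a0 * x * np.exp(-a1 * x)])
        # Solve (J^T J) delta = J^T r for the parameter update
        delta = np.linalg.solve(J.T @ J, J.T @ r)
        a0, a1 = a0 + delta[0], a1 + delta[1]
    return a0, a1

# Noiseless synthetic data generated from a0 = 2.5, a1 = 0.8
x = np.linspace(0.1, 5.0, 20)
y = 2.5 * (1.0 - np.exp(-0.8 * x))
a0, a1 = gauss_newton(x, y, a0=2.0, a1=1.0)
print(a0, a1)   # converges to (2.5, 0.8)
```

In practice a library routine with damping and convergence checks (e.g. Levenberg-Marquardt) is preferable; this bare iteration only shows the idea.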