
Artificial Neural Networks (Spring 2007)

Neural Networks for Solving Systems of Linear Equations
Seyed Jalal Kazemitabar
Reza Sadraei
Instructor: Dr. Saeed Bagheri
Artificial Neural Networks Course (Spring 2007)
Outline
Historical Introduction
Problem Formulation
Standard Least Squares Solution
General ANN Solution
Minimax Solution
Least Absolute Value Solution
Conclusion
History
70s:
Kohonen solved optimization problems
using Neural Networks.
80s:
Hopfield used the Lyapunov (energy) function to prove the convergence of iterative methods for optimization problems.
Differential equations ↔ neural networks mapping
History
Many problems in science and engineering
involve solving a large system of linear
equations:
Machine Learning
Physics
Image Processing
Statistics, ...
In many applications an on-line solution of
a set of linear equations is desired.
History
40s:
Kaczmarz introduced a method to solve linear
equations
50s-80s:
Different methods based on Kaczmarz's were proposed in different fields.
Conjugate Gradient method.
No good method for on-line solution of
large systems.
1990:
Andrzej Cichocki:
a Mathematician who received
his PhD in Electrical
Engineering
Proposed a Neural Network
for solving systems of linear
equations in real time
Problem Formulation
Linear Parameter Estimation model:

$Ax = b = b_{true} + r$ : Linear equation
$A = [a_{ij}] \in \mathbb{R}^{m \times n}$ : Model matrix
$x = [x_1, x_2, \ldots, x_n]^T \in \mathbb{R}^n$ : Unknown vector of the system parameters to be estimated
$b \in \mathbb{R}^m$ : Vector of observations
$r \in \mathbb{R}^m$ : Unknown measurement errors
$b_{true} \in \mathbb{R}^m$ : Vector of true values (usually unknown)

In expanded form:

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} = \begin{bmatrix} b_1^{true} \\ b_2^{true} \\ \vdots \\ b_m^{true} \end{bmatrix} + \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_m \end{bmatrix}$$
Types of Equations
A set of linear equations is said to be overdetermined if
m > n.
Usually inconsistent due to noise and errors.
e.g. Linear parameter estimation problems arising in signal
processing, biology, medicine and automatic control.
A set of linear equations is said to be underdetermined if
m < n (due to the lack of information).
Inverse and extrapolation problems.
Involves far fewer problems than the overdetermined case.
$$A = [a_{ij}] \in \mathbb{R}^{m \times n}, \qquad Ax = b = b_{true} + r$$
Mathematical Solutions
Why not use $x = A^{-1} b$?
It is not applicable since $m \neq n$ most of the time, so $A$ is not invertible.
What if we use the least squares error method?
$$y = (Ax - b)^T (Ax - b), \qquad y' = A^T (Ax - b) = 0,$$
$$A^T A x = A^T b, \qquad x = (A^T A)^{-1} A^T b$$
Inverting $A^T A$ is considered too time consuming for large $A$ in real-time systems.
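To make the trade-off concrete, here is a minimal numerical sketch (assuming NumPy; the data and variable names are illustrative, not from the slides) contrasting the explicit normal-equations solution with a library least-squares routine:

```python
import numpy as np

# Small overdetermined toy system (m = 4 equations, n = 2 unknowns); values are made up.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([0.1, 1.1, 1.9, 3.2])

# Normal-equations solution x = (A^T A)^{-1} A^T b.
# Forming and solving with A^T A is the step the slides call too costly for large real-time problems.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Library least-squares routine (QR/SVD based), usually preferred numerically.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal, x_lstsq)
```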
Least Squares Error Function
Find the vector $x^* \in \mathbb{R}^n$ that minimizes the least squares function
$$E(x) = \frac{1}{2}(Ax - b)^T (Ax - b) = \frac{1}{2}\sum_{i=1}^{m} r_i^2(x)$$
where
$$r_i(x) = A_i x - b_i = \sum_{j=1}^{n} a_{ij} x_j - b_i$$
represents the residual components of the residual vector
$$r(x) = [r_1(x), r_2(x), \ldots, r_m(x)]^T = Ax - b$$
Gradient Descent Approach
Basic idea: compute a trajectory $x(t)$ starting at an initial point $x(0)$ that has the solution $x^*$ as a limit point ($x(t) \to x^*$ as $t \to \infty$).
General gradient approach for minimization of a function $E(x)$:
$$\frac{dx}{dt} = -\mu \nabla E(x), \qquad \begin{bmatrix} \dfrac{dx_1}{dt} \\ \dfrac{dx_2}{dt} \\ \vdots \\ \dfrac{dx_n}{dt} \end{bmatrix} = -\begin{bmatrix} \mu_{11} & \mu_{12} & \cdots & \mu_{1n} \\ \mu_{21} & \mu_{22} & \cdots & \mu_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{n1} & \mu_{n2} & \cdots & \mu_{nn} \end{bmatrix} \begin{bmatrix} \dfrac{\partial E}{\partial x_1} \\ \dfrac{\partial E}{\partial x_2} \\ \vdots \\ \dfrac{\partial E}{\partial x_n} \end{bmatrix}$$
$\mu$ is chosen in a way that ensures the stability of the differential equations and an appropriate convergence speed.
Solving LE Using Least Squares Criterion
Gradient of the energy function:
$$\nabla E(x) = \left[\frac{\partial E}{\partial x_1}, \frac{\partial E}{\partial x_2}, \ldots, \frac{\partial E}{\partial x_n}\right]^T = A^T (Ax - b)$$
So
$$\frac{dx}{dt} = -\mu A^T (Ax - b)$$
Scalar representation:
$$\frac{dx_j}{dt} = -\sum_{p=1}^{n} \mu_{jp} \sum_{i=1}^{m} a_{ip} \left(\sum_{k=1}^{n} a_{ik} x_k - b_i\right), \qquad x_j(0) = x_j^{(0)}, \quad j = 1, 2, \ldots, n$$
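A minimal simulation sketch (assuming NumPy; the scalar step size mu stands in for the matrix μ, and all names are my own) of this gradient system using forward-Euler integration, which mimics in discrete time what the analog network does continuously:

```python
import numpy as np

def ls_gradient_flow(A, b, mu=0.1, dt=0.01, steps=5000, x0=None):
    """Forward-Euler integration of dx/dt = -mu * A^T (A x - b)."""
    x = np.zeros(A.shape[1]) if x0 is None else np.asarray(x0, dtype=float)
    for _ in range(steps):
        r = A @ x - b                 # residual vector r(x)
        x = x - dt * mu * (A.T @ r)   # move against the gradient of E(x)
    return x

# Illustrative use: x(t) should approach the least-squares solution x*.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([0.1, 1.1, 1.9, 3.2])
print(ls_gradient_flow(A, b))
```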
[Figure: ANN implementation of the least-squares gradient system above]
ANN With Identity Activation Function
General ANN Solution
The key step in designing an algorithm for
neural networks:
Construct an appropriate computational energy (Lyapunov) function $E(x)$
The lowest energy state will correspond to the desired solution $x^*$
By differentiation, the energy-function minimization problem is transformed into a set of ordinary differential equations
General ANN Solution
In general, the optimization problem can be formulated as:
Find the vector $x^* \in \mathbb{R}^n$ that minimizes the energy function
$$E(x) = \sum_{i=1}^{m} \rho_i(A_i x - b_i) = \sum_{i=1}^{m} \rho_i(r_i(x))$$
$\rho_i(\cdot)$ is called the weighting function.
The derivative of the weighting function is called the activation function:
$$g(r_i) = \frac{\partial \rho(r_i)}{\partial r_i} = \frac{\partial E}{\partial r_i}$$
General ANN Solution
Gradient descent approach:
The minimization of the energy function $E(x)$ leads to the set of differential equations
$$\frac{dx}{dt} = -\mu \nabla E(x)$$
which in scalar form, using the chain rule, gives
$$\frac{dx_j}{dt} = -\sum_{p=1}^{n} \mu_{jp} \sum_{i=1}^{m} \frac{\partial r_i}{\partial x_p}\,\frac{\partial E}{\partial r_i} = -\sum_{p=1}^{n} \mu_{jp} \sum_{i=1}^{m} a_{ip}\, g_i\!\left(\sum_{k=1}^{n} a_{ik} x_k - b_i\right), \qquad j = 1, 2, \ldots, n$$
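As a software analogue of this general network, here is a rough sketch (assuming NumPy; a scalar step size replaces the μ matrix, and the Huber-style activation shown is just one admissible choice of g, not the slides' circuit):

```python
import numpy as np

def robust_gradient_flow(A, b, g, mu=0.05, dt=0.01, steps=20000):
    """Forward-Euler integration of dx_j/dt = -mu * sum_i a_ij * g(r_i(x))."""
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        r = A @ x - b                    # residuals r_i(x)
        x = x - dt * mu * (A.T @ g(r))   # activation g applied element-wise to the residuals
    return x

# g is the activation function (the derivative of the weighting function rho):
g_identity = lambda r: r                               # identity -> least squares network
g_huber = lambda r, beta=1.0: np.clip(r, -beta, beta)  # Huber-style: linear inside, saturated outside
```

With `g_identity` this reduces to the least-squares network of the previous slides; a saturating choice such as `g_huber` limits the influence of large residuals (outliers).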
General ANN Architecture
$$\frac{dx_j}{dt} = -\sum_{p=1}^{n} \mu_{jp} \sum_{i=1}^{m} a_{ip}\, g_i\!\left(\sum_{k=1}^{n} a_{ik} x_k - b_i\right)$$
Remember that $g_i$ here is the activation function.
[Figure: general ANN architecture with activation units $g_1, g_2, \ldots, g_m$]
Drawbacks of Least Square Error Criterion
Why not always use least square energy
function?
Not robust when there are large outliers in the observations.
Only optimal for a Gaussian distribution of errors.
The proper choice of the criterion depends on:
The specific application.
The distribution of the errors in the measurement vector b:
Gaussian dist.* → least squares criterion
Uniform dist. → Chebyshev norm criterion
*However the assumption that the set of measurements or observations has a
Gaussian error distribution is frequently unrealistic due to different sources of
errors such as instrument errors, modeling errors, sampling errors, and human
errors.
Special Energy Functions
Huber's Function:
$$\rho_H(e) = \begin{cases} \dfrac{e^2}{2} & : |e| \le \beta \\[4pt] \beta|e| - \dfrac{\beta^2}{2} & : |e| > \beta \end{cases}$$
[Plots: weighting function and activation function]
Special Energy Functions
Talwar's Function:
$$\rho_T(e) = \begin{cases} \dfrac{e^2}{2} & : |e| \le \beta \\[4pt] \dfrac{\beta^2}{2} & : |e| > \beta \end{cases}$$
This function has a direct implementation.
[Plots: weighting function and activation function]
Special Energy Functions
Logistic Function:
$$\rho_L(e) = \beta^2 \ln\cosh\left(\frac{e}{\beta}\right)$$
The iteratively reweighted method uses this activation function.
[Plots: weighting function and activation function]
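For reference, here is a small transcription (my own, assuming NumPy, with a single threshold parameter beta) of the three weighting functions above and their derivatives, i.e. the corresponding activation functions:

```python
import numpy as np

def rho_huber(e, beta=1.0):
    return np.where(np.abs(e) <= beta, e**2 / 2, beta * np.abs(e) - beta**2 / 2)

def g_huber(e, beta=1.0):        # derivative: linear, then saturates at +/- beta
    return np.clip(e, -beta, beta)

def rho_talwar(e, beta=1.0):
    return np.where(np.abs(e) <= beta, e**2 / 2, beta**2 / 2)

def g_talwar(e, beta=1.0):       # derivative: linear, then cut to zero (rejects large outliers)
    return np.where(np.abs(e) <= beta, e, 0.0)

def rho_logistic(e, beta=1.0):
    return beta**2 * np.log(np.cosh(e / beta))

def g_logistic(e, beta=1.0):     # derivative: smooth saturation
    return beta * np.tanh(e / beta)
```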
Special Energy Functions
$L_p$-norm function:
$$E_p(x) = \frac{1}{p}\sum_{i=1}^{m} |r_i|^p$$
[Plot: activation function]
$L_p$-Norm Energy Functions
A well-known criterion is the $L_1$-norm energy function
$$E_1(x) = \sum_{i=1}^{m} |r_i(x)|$$
[Plots: weighting function and activation function]
Special Energy Functions
Another well-known criterion is the $L_\infty$-norm (Chebyshev) criterion, which can be formulated as the minimax problem
$$\min_{x \in \mathbb{R}^n} \left\{ \max_{1 \le i \le m} |r_i(x)| \right\}$$
This criterion is optimal for a uniform distribution of errors.
Minimax ($L_\infty$-Norm) Criterion
For the case $p = \infty$ of the $L_p$-norm problem, the activation function $g[r_i(x)]$ cannot be explicitly expressed by $|r_i(x)|^{p-1}$.
The error function can be defined as
$$E_\infty(x) = \max_{1 \le i \le m} |r_i(x)|$$
resulting in the following activation function:
$$g[r_i(x)] = \begin{cases} \operatorname{sign}[r_i(x)] & \text{if } |r_i(x)| = \max_{1 \le k \le m}\{|r_k(x)|\} \\ 0 & \text{otherwise} \end{cases}$$
Minimax ($L_\infty$-Norm) Criterion
Although straightforward, some problems arise in practical implementations of the system of differential equations:
Exact realization of the signum functions is rather difficult (electrically).
$E_\infty$ has a derivative discontinuity at $x$ if, for some $i \neq k$,
$$|r_i(x)| = |r_k(x)| = E_\infty(x)$$
*This is often responsible for various anomalous results (e.g. hysteresis phenomena).
Transforming the problem to an equivalent one
Rather than directly implementing the proposed system, we transform the minimax problem
$$\min_{x \in \mathbb{R}^n} \left\{ \max_{1 \le i \le m} |r_i(x)| \right\}$$
into an equivalent one:
Minimize $\beta$
subject to the constraints $|r_i(x)| \le \beta$, $\beta \ge 0$.
Thus the problem can be viewed as finding the smallest non-negative value $\beta^* = E_\infty(x^*) \ge 0$, where $x^*$ is the vector of the optimal values of the parameters.
New Energy Function
Applying the standard quadratic penalty function, we can consider the cost function
$$E(x, \beta) = \kappa\beta + \frac{\gamma}{2}\sum_{i=1}^{m}\left\{\left([r_i(x) + \beta]_-\right)^2 + \left([-r_i(x) + \beta]_-\right)^2\right\}$$
where $\kappa > 0$ and $\gamma > 0$ are coefficients and $[y]_- = \min\{0, y\}$.
New Energy Function
Applying now the gradient strategy, we obtain the associated system of differential equations
$$\frac{d\beta}{dt} = -\mu_0\left[\kappa + \gamma\sum_{i=1}^{m}\left((r_i(x) + \beta)S_{i1} + (-r_i(x) + \beta)S_{i2}\right)\right]$$
$$\frac{dx_j}{dt} = -\mu_j\,\gamma\sum_{i=1}^{m} a_{ij}\left[(r_i(x) + \beta)S_{i1} - (-r_i(x) + \beta)S_{i2}\right] \qquad (j = 1, 2, \ldots, n)$$
where
$$S_{i1} = \begin{cases} 0 & ; \ r_i(x) + \beta \ge 0 \\ 1 & ; \ \text{otherwise} \end{cases} \qquad\quad S_{i2} = \begin{cases} 0 & ; \ -r_i(x) + \beta \ge 0 \\ 1 & ; \ \text{otherwise} \end{cases}$$
Simplifying architecture
It is interesting to note that the system of differential equations can be simplified by introducing
$$\Psi(r_i(x), \beta) = \begin{cases} r_i + \beta & \text{if } r_i < -\beta \\ 0 & \text{if } -\beta \le r_i \le \beta \\ r_i - \beta & \text{if } r_i > \beta \end{cases}$$
This nonlinear function represents a typical dead-zone function.
Simplifying architecture
It is easy to check that
$$(r_i(x) + \beta)S_{i1} + (-r_i(x) + \beta)S_{i2} = -\left|\Psi(r_i(x), \beta)\right|$$
$$(r_i(x) + \beta)S_{i1} - (-r_i(x) + \beta)S_{i2} = \Psi(r_i(x), \beta)$$
Thus the system of differential equations can be simplified to the form
$$\frac{dx_j}{dt} = -\mu_j\,\gamma\sum_{i=1}^{m} a_{ij}\,\Psi(r_i(x), \beta), \qquad x_j(0) = x_j^{(0)} \quad (j = 1, 2, \ldots, n)$$
$$\frac{d\beta}{dt} = -\mu_0\left[\kappa - \gamma\sum_{i=1}^{m}\left|\Psi(r_i(x), \beta)\right|\right], \qquad \beta(0) = \beta^{(0)} \ge 0$$
[Figure: ANN architecture implementing $\dfrac{dx_j}{dt} = -\mu_j\,\gamma\sum_{i=1}^{m} a_{ij}\,\Psi(r_i(x), \beta)$]
[Figure: circuit implementing $\dfrac{d\beta}{dt} = -\mu_0\left[\kappa - \gamma\sum_{i=1}^{m}\left|\Psi(r_i(x), \beta)\right|\right]$]
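A rough discrete-time sketch of this simplified minimax system (assuming NumPy; the values of kappa, gamma and the step sizes are illustrative, and the initial beta is just a convenient non-negative start):

```python
import numpy as np

def dead_zone(r, beta):
    """Psi(r, beta): zero inside [-beta, beta], linear outside."""
    return np.where(r > beta, r - beta, np.where(r < -beta, r + beta, 0.0))

def minimax_gradient_flow(A, b, kappa=1.0, gamma=10.0, mu=0.05, mu0=0.05,
                          dt=0.001, steps=50000):
    x = np.zeros(A.shape[1])
    beta = float(np.max(np.abs(b)))          # non-negative initial beta(0)
    for _ in range(steps):
        r = A @ x - b
        psi = dead_zone(r, beta)
        x = x - dt * mu * gamma * (A.T @ psi)                            # dx/dt equation
        beta = beta - dt * mu0 * (kappa - gamma * np.sum(np.abs(psi)))   # d(beta)/dt equation
        beta = max(beta, 0.0)                # keep beta non-negative
    return x, beta
```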
Least Absolute Values ($L_1$-Norm) Energy Function
Find the design vector $x$ that minimizes the error function
$$E_1(x) = \sum_{i=1}^{m} |r_i(x)|$$
where
$$r_i(x) = \sum_{j=1}^{n} a_{ij} x_j - b_i$$
Why should one choose this function knowing that it has differentiation problems?
Important $L_1$-Norm Properties
1. Least absolute value problems are equivalent to linear programming problems and vice versa.
2. Although the energy function $E_1(x)$ is not differentiable, the terms $|r_i(x)|$ can be approximated very closely by smoothly differentiable functions.
3. For a full rank* matrix A, there always exists a minimum $L_1$-norm solution which passes through at least n of the m data points. The $L_2$-norm solution does not in general interpolate any of the points.
These properties are not shared by the $L_2$-norm.
* Matrix A is said to be of full rank if all its rows or columns are linearly independent.
Important $L_1$-Norm Properties
Theorem: There is a minimizer $x^* \in \mathbb{R}^n$ of the energy function
$$E_1(x) = \sum_{i=1}^{m} |r_i(x)|$$
for which the residuals $r_i(x^*) = 0$ for at least n values of i, say $i_1, i_2, \ldots, i_n$, where n denotes the rank of the matrix A.
We can say that the $L_1$-norm solution is the median solution, while the $L_2$-norm solution is the mean solution.
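A tiny numerical illustration of the median-versus-mean remark (assuming NumPy; the data are hypothetical): for the one-parameter system x = b_i, the L1 energy is minimized by the median of the observations and the L2 energy by their mean:

```python
import numpy as np

b = np.array([1.0, 2.0, 3.0, 4.0, 100.0])    # observations with one gross outlier
xs = np.linspace(-10.0, 110.0, 12001)        # grid of candidate x values

E1 = np.array([np.sum(np.abs(x - b)) for x in xs])   # L1 energy of the system x = b_i
E2 = np.array([np.sum((x - b) ** 2) for x in xs])    # L2 energy

print(xs[np.argmin(E1)], np.median(b))   # L1 minimizer = 3.0, the median (ignores the outlier)
print(xs[np.argmin(E2)], np.mean(b))     # L2 minimizer = 22.0, the mean (dragged by the outlier)
```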
Least Absolute Error Implementation
The algorithm is as follows:
1. First phase:
Solving the problem using ordinary least-square technique and
computing all m residuals
Selecting from them the n residuals which are smallest in
absolute value
2. Second phase:
Discarding the rest of the equations, the n equations related to the selected residuals are solved by driving their residuals to zero
The ANN implementation is done in three layers using an inhibition control circuit.
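A minimal sequential sketch of the two-phase procedure (assuming NumPy; this emulates in software what the inhibition control circuit does with its switches, uses simplistic tie-breaking, and assumes the selected n rows are linearly independent; the example below shows an extra rerun when residual magnitudes tie):

```python
import numpy as np

def two_phase_l1(A, b):
    m, n = A.shape
    # Phase 1: ordinary least-squares solution and all m residuals.
    x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
    r = A @ x_ls - b
    # Keep the n equations whose residuals are smallest in absolute value
    # (in the circuit, the inhibition network opens the switches of the others).
    keep = np.sort(np.argsort(np.abs(r))[:n])
    # Phase 2: discard the rest and solve the selected n equations exactly,
    # driving their residuals to zero.
    x = np.linalg.solve(A[keep], b[keep])
    return x, keep
```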
[Figures: ANN architecture for solving the $L_1$-norm estimation problem (Phase #1 and Phase #2)]
Example
Consider matrix A and observation vector b as below. Find the solution to Ax = b using the least absolute error energy function.
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ 1 & 4 & 16 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 2 \\ 1 \\ -1 \\ -10 \end{bmatrix}$$
In the first phase all the switches ($S_1$-$S_5$) were closed and the network was able to find the following standard least-squares solution:
$$x^*_I = \begin{bmatrix} 0.6 \\ 3.5 \\ -1.5 \end{bmatrix}, \qquad r(x^*_I) = \begin{bmatrix} -0.4 \\ 0.6 \\ 0.6 \\ -1.4 \\ 0.6 \end{bmatrix}$$
In this case it is impossible to select the two largest (in absolute value) residuals, because
$$|r_2| = |r_3| = |r_5| = 0.6$$
Phase one was rerun while switch $S_4$ was opened, and the network then found
$$x^*_{II} = \begin{bmatrix} 0.9182 \\ 2.6404 \\ -1.3409 \end{bmatrix}, \qquad r(x^*_{II}) = \begin{bmatrix} -0.0818 \\ 0.2182 \\ -0.1636 \\ -3.2273 \\ 0.0273 \end{bmatrix}$$
Cichocki's Circuit Simulation Results
In the second phase (and third run of the network) the inhibitive control network opened switch $S_2$, so in the third run only switches $S_1$, $S_3$, $S_5$ were closed, and the network found the equilibrium point:
$$x^* = \begin{bmatrix} 1 \\ 2.750 \\ -1.375 \end{bmatrix}, \qquad r(x^*) = \begin{bmatrix} 0 \\ 0.375 \\ 0 \\ -2.125 \\ 0 \end{bmatrix}$$
Cichocki's Circuit Simulation Results
The residuals of n = 3 of the m = 5 equations converge to zero within about 50 nanoseconds.
Using MATLAB, we observed that zeroing $r_1$, $r_3$ and $r_5$ results in the minimum value of
$$E_1(x) = \sum_{i=1}^{m} |r_i(x)|$$
Conclusion
Great need for real-time solution of linear equations.
Cichocki's proposed ANN is different from classical ANNs.
Choose a proper energy function whose minimization yields the optimal solution to Ax = b.
A "proper" function may mean different things in different applications.
The standard least squares error function gives the optimal answer for a Gaussian distribution of errors.
Conclusion (Cont.)
The least squares function does not behave well when there are large outliers in the observations.
Various energy functions have been proposed to solve the outlier problem (e.g. the logistic function).
Minimax gives the optimal answer for a uniform distribution of errors. It also has some implementation and mathematical problems that lead to an indirect approach to solving the problem.
The least absolute error function has some properties that distinguish it from the other error functions.