
Universität Duisburg-Essen

Fakultät für Ingenieurwissenschaften
AUTOMATISIERUNGSTECHNIK UND KOMPLEXE SYSTEME








Evolutionary Neuro-Fuzzy System with

Internal Dynamics for System Identification









Dipl.-Eng.
Cristian FLOREA

Coordinators
Prof. Dr. Eng. Steven X. DING
Dr. Eng. Lavinia FERARIU

June 2006


Abstract
The scope of this project is to explore the capabilities of a dynamic neuro-fuzzy structure in
system identification, which is the first step towards control.
The motivation of this study is based on the properties of the neuro-fuzzy systems, which are
summarized in the first chapter.
The visible part of a system's dynamics is represented by the measurements of its inputs and
outputs (and sometimes state variables). The results are presented as time series. Studying
them proved to be an appropriate way of determining the internal structure of the generating
system.
Neuro-fuzzy structures capable of generating similar time series can sometimes be
considered functionally equivalent to the real systems, thus providing an alternative way of
modelling.
Of course, today's mathematical background and technology already constitute powerful
modelling tools that are used in most cases, even when the model is relatively complex
(usually, only the main characteristics need to be modelled).
Neuro-fuzzy systems are an efficient option mainly for very complex systems, but they
remain an alternative way to model any given real system.
The practical component of this project provides a tool for system identification, developed
using the MATLAB development environment.
At this moment, validation of the toolbox uses simplified models of real systems, but a
milestone in the research process should be set by using more complex systems (social,
economic, financial).

I. Neuro-fuzzy systems
I.1. Neural Networks
I.2. Fuzzy Systems
I.3. Combining Neural Networks and Fuzzy Systems
I.3.1. Neuro-Fuzzy Systems Characteristics
I.3.2. Neuro-fuzzy Systems properties
I.3.3. Types of neuro-fuzzy systems
II. Time Series
II.1. Time series classification
II.1.1. Continuity of measuring
II.1.2. Number of variables
II.1.3. Linearity
II.1.4. Stationarity
II.2. Prediction strategies
II.2.1. Prerequisites
II.2.2. Strategies
III. Neuro-Fuzzy Systems in Time Series Analysis
III.1. Fuzzy Neurons
III.2. Neurons with fuzzy weights
III.3. The Yamakawa Neuron Model
III.4. The Dynamic Yamakawa Neuron Model
IV. The MATLAB Implementation
IV.1. Performance evaluation
IV.2. Training algorithms
IV.2.1. Gradient-based algorithms (Backpropagation)
IV.2.2. Genetic Algorithms
IV.3. Training strategies
IV.4. Data structures
IV.4.1. Class ARMA
IV.4.2. Class SFS
IV.4.3. Class BRANCH
IV.4.4. Class NFS
IV.5. Future development
IV.5.1. Evolutionary strategies
IV.5.2. Generating training data sets
IV.5.3. Graphical user interface
IV.6. Resources
V. Testing and results
V.1. Vehicle lateral dynamic model
V.1.1. Model description
V.1.2. Physical simplifications
V.1.3. Unknown input signal
V.1.4. Model parameter variation
V.1.5. Model noise
V.1.6. Typical failures
V.1.7. Physical parameters of the vehicle lateral dynamical model
V.1.7.1. System variables
V.1.7.2. Sensor noise data
V.1.8. Reference
V.2. Test 1
V.3. Test 2
VI. Conclusions
VII. Bibliography and references
I. Neuro-fuzzy systems
Neural networks and fuzzy systems, as alternative methods in data processing, are suited for
intelligent behaviour modelling and mimic the actions of a human expert capable of solving
complex problems.
This goal is achieved through observation and learning instead of classical mathematical
modelling (using classical laws of physics, chemistry, biology, economics and many more).
Considering the previous remarks, the process of knowledge assimilation plays the leading role.
According to knowledge classification, assimilation has three major components:
- interviewing and observing - suited for knowledge that can be expressed as a set of rules;
- instruction;
- learning.
I.1. Neural Networks
Neural networks are systems that try to make use of some of the known or expected
organizing principles of the human brain. They consist of a number of independent, simple
processors - the neurons. These neurons communicate with each other via weighted
connections.
At first, research in this area was driven by neurobiological interests. The modelling of single
neurons and the so-called learning rules for modifying synaptic weights were the initial
research topics.
Modern research in neural networks considers the development of architectures and learning
algorithms, and examines the applicability of these models to information processing tasks.
Although there are still many researchers who model biological neural networks with
artificial neural networks in order to learn more about the structure of the human brain and
the way it works, biological plausibility is usually neglected and only the problem of
information processing with artificial neural networks is considered. These models have in
common that they are based on rather simple processing units, or neurons, exchanging
information via weighted connections.
Different types of neural networks can solve different problems, like pattern recognition,
pattern completion, determining similarities between patterns or data - also in terms of
interpolation or extrapolation - and automatic classification.
Learning in neural networks means to determine a mapping from an input to an output space
by using example patterns. If the same or similar input patterns are presented to the network
after learning, it should produce an appropriate output pattern.
Neural networks can be used if training data is available. It is not necessary to have a
mathematical model of the problem of interest, and there is no need to provide any form of
prior knowledge. On the other hand the solution obtained from the learning process usually
cannot be interpreted.
Although there are some approaches to extract rules from neural networks, most neural
network architectures are black boxes. It cannot be checked whether their solution is
plausible, i.e. their final state cannot be interpreted in terms of rules. This also means that a
neural network usually cannot be initialized with prior knowledge, even if such knowledge is available, and
thus the network must learn from scratch. The learning process itself can take very long,
and there is usually no guarantee of success.
The following table synthesizes the advantages and drawbacks of neural networks:

Advantages:
- no mathematical model is needed;
- there is no need for prior knowledge;
- many training methods have been developed.

Disadvantages:
- black-box-like system;
- usually, the manner in which the results are obtained is not interpretable;
- adaptation to environment changes may prove difficult, thus retraining is needed;
- prior knowledge, if it exists, cannot be used;
- the training process is not guaranteed to converge.

Table 1
I.2. Fuzzy Systems
When using fuzzy set theory, it is easy to model the fuzzy boundaries of linguistic terms by
introducing gradual memberships. In contrast to classical set theory, in which an object or a
case either is a member of a given set (defined, e.g., by some property) or not, fuzzy set
theory makes it possible that an object or a case belongs to a set only to a certain degree.
Interpretations of membership degrees include similarity, preference, and uncertainty. They
can state how similar an object or case is to a prototypical one, they can indicate preferences
between sub-optimal solutions to a problem, or they can model uncertainty about the true
situation, if this situation is described in imprecise terms. In general, due to their closeness to
human reasoning, solutions obtained using fuzzy approaches are easy to understand and to
apply.
Due to these strengths, fuzzy systems are the method of choice if linguistic, vague, or
imprecise information has to be modelled.
Fuzzy systems are based on if-then rules. The antecedent of a rule consists of fuzzy
descriptions of input values, and the consequent defines a - possibly fuzzy - output value for
the given input. The benefits of these fuzzy systems lie in the suitable knowledge
representation. But problems may arise when fuzzy concepts have to be represented by
concrete membership degrees, which guarantee that a fuzzy system works as expected.
A fuzzy system can be used to solve a problem if knowledge about the solution is available in
the form of linguistic if-then rules. By defining suitable fuzzy sets to represent linguistic
terms used within the rules, a fuzzy system can be created from these rules.
No formal model of the problem of interest is needed, and no training data is required.
Let's summarize in the following table:

Advantages:
- no mathematical model is required;
- prior knowledge represented as if-then rules may be used;
- ease of implementation;
- if-then rules facilitate the interpretation of results.

Disadvantages:
- if-then rules are a prerequisite;
- no learning capabilities;
- there are no standard methods for parameter adjustment;
- difficulties in interpreting the results may occur;
- adaptation to a changing environment could be difficult;
- improvements are not guaranteed by parameter adaptation.

Table 2
I.3. Combining Neural Networks and Fuzzy Systems
Presently, the neuro-fuzzy approach is becoming one of the major areas of interest because it
combines the benefits of neural networks and fuzzy logic systems while removing their
individual disadvantages by building on their common features.
Different neuro-fuzzy architectures have been investigated and applied in many
applications, especially in process control.
Neural networks and fuzzy logic share some common features, such as the distributed
representation of knowledge, model-free estimation, the ability to handle uncertain and
imprecise data, etc.
Fuzzy logic has tolerance for imprecision of data, while neural networks have tolerance for
noisy data. A neural network's learning capability provides a good way to adjust expert
knowledge, and it automatically generates additional fuzzy rules and membership functions to
meet certain specifications. This reduces the design time and cost.
On the other hand, the fuzzy logic approach possibly enhances the generalization capability of
a neural network by providing more reliable output when extrapolation is needed beyond the
limits of the training data.
The basic idea of combining fuzzy systems and neural networks is to design an architecture
that uses a fuzzy system to represent knowledge in an interpretable manner and the learning
ability of a neural network to optimize its parameters.
The drawbacks of both of the individual approaches - the black box behaviour of neural
networks, and the problems of finding suitable membership values for fuzzy systems - could
thus be avoided.
A combination can constitute an interpretable model that is capable of learning and can use
problem-specific prior knowledge. Therefore, neuro-fuzzy methods are especially suited for
applications, where user interaction in model design or interpretation is desired.
I.3.1. Neuro-Fuzzy Systems Characteristics
Although there are a lot of different approaches, usually the term neuro-fuzzy system is used
for approaches which display the following properties:
- A neuro-fuzzy system is based on a fuzzy system which is trained by a learning
algorithm derived from neural network theory. The (heuristic) learning procedure
operates on local information, and causes only local modifications in the underlying
fuzzy system.
- A neuro-fuzzy system can be viewed as a three-layer feed-forward neural network. The
first layer represents input variables, the middle (hidden) layer represents fuzzy rules
and the third layer represents output variables. Fuzzy sets are encoded as (fuzzy)
connection weights. It is not necessary to represent a fuzzy system like this to apply a
learning algorithm to it. However, it can be convenient, because it represents the data
flow of input processing and learning within the model. (Sometimes 5-layer
architecture is used, where the fuzzy sets are represented in the units of the second and
fourth layer.)
- A neuro-fuzzy system can always (i.e. before, during and after learning) be interpreted
as a system of fuzzy rules. It is also possible to create the system from training data
from scratch, just as it is possible to initialize it with prior knowledge in the form of fuzzy
rules. (Not all neuro-fuzzy models specify learning procedures for fuzzy rule creation.)
- The learning procedure of a neuro-fuzzy system takes the semantic properties of the
underlying fuzzy system into account. This results in constraints on the possible
modifications applicable to the system parameters. (Not all neuro-fuzzy approaches
have this property.)
- A neuro-fuzzy system approximates an n-dimensional (unknown) function that is
partially defined by the training data. The fuzzy rules encoded within the system
represent vague samples, and can be viewed as prototypes of the training data. A
neuro-fuzzy system should not be seen as a kind of (fuzzy) expert system, and it has
nothing to do with fuzzy logic in the narrow sense.
I.3.2. Neuro-fuzzy Systems properties
From the fuzzy system's point of view, the main advantage is the learning capability; for
neural networks, the use of prior knowledge as initial conditions is an obvious gain,
potentially speeding up the training process. Also, the if-then rule set offers the possibility of
interpretable results.
The architecture of a neuro-fuzzy system is determined by the rules and fuzzy sets specific to
the problem. In this case there is no need to specify some network parameters, like the
number of hidden layers.
Unfortunately, combining the two types of systems is not a guaranteed route to success,
because training convergence is not assured.
An often-used method for training fuzzy systems consists of representing the fuzzy system as a
neural network and applying the specific training algorithms (e.g. backpropagation). This
approach implies some alteration of the structure and/or algorithm, because the functions
used for fuzzy inference (e.g. min, max) are not differentiable. There are two possibilities:
- replace the specific fuzzy inference functions with differentiable ones (in this case there may
be a loss of interpretability of the results);
- use training methods that are not gradient-based.
Modern neuro-fuzzy systems are usually modelled as multi-layer feed-forward neural networks.
Here we can mention:
- ANFIS (Adaptive-Network-based Fuzzy Inference System) - a Sugeno fuzzy system
modelled as a 5-layer feed-forward neural network;
- GARIC (Generalized Approximate Reasoning-based Intelligent Control) - implements a
fuzzy controller using specialized feed-forward neural networks.
Of course, there are other principles too, such as those used in fuzzy associative memories or
self-organizing feature maps (which transform an input of arbitrary dimension into a one- or
two-dimensional discrete map, subject to a topological neighbourhood-preserving constraint).

Advantages:
- no mathematical model is necessary;
- prior knowledge represented as if-then rules is not required, but it can be used;
- there are many training/learning algorithms;
- ease of implementation;
- interpretable results.

Disadvantages:
- training convergence is not guaranteed.

Table 3
I.3.3. Types of neuro-fuzzy systems
There are usually two major ways of combining neural networks and fuzzy systems.
The first case involves a neural network and a fuzzy system working independently: the
neural network is used to determine some parameters of the fuzzy system, and the adjustment
can be online or offline. Because of this co-operation, these systems are called cooperative
neuro-fuzzy systems.
In the second case, we have hybrid neuro-fuzzy systems, where a homogeneous neural
network-like architecture is obtained by interpreting a fuzzy system as a neural network or by
directly implementing it as one.
II. Time Series
Inertia is an intrinsic property of all systems, so observing their previous behaviour might
provide useful information allowing us to derive some laws and rules about their evolution.
A time series is a sequence of observations (data points), measured typically at successive
times, spaced apart at uniform time intervals.
There are two main goals of time series analysis:
- identifying the nature of the phenomenon represented by the sequence of observations;
- forecasting (predicting future values of the time series variable).
Both of these goals require that the pattern of observed time series data is identified and more
or less formally described. Once the pattern is established, we can interpret and integrate it
with other data (i.e., use it in our theory of the investigated phenomenon).
Regardless of the depth of our understanding and the validity of our interpretation (theory) of
the phenomenon, we can extrapolate the identified pattern to predict future events.
If a time series can be predicted exactly, it is called deterministic. Usually, in practice, time
series are stochastic processes due to noisy observations; consequently, future observations
are only partly determined by past values.
II.1. Time series classification
The essential aspects characterizing a time series determine the criterion, or set of criteria,
used for its classification.
II.1.1. Continuity of measuring
If it is possible to measure the values of a time series at any given time, then the time series is
called continuous.
In practice, generally, time series contain observations made at predetermined moments,
separated by constant time intervals. In this case we have discrete time series.
A continuous time series can be transformed into a discrete one by measuring its values at
equally distant moments, thus obtaining a sampled time series.
Another type is the integrative (cumulative) time series, where measuring is possible only for
cumulative values of a variable (i.e. rainfall quantities).
II.1.2. Number of variables
A time series that contains the values of a single variable is called monovariable. Otherwise,
it is a multivariable time series.
II.1.3. Linearity
The linearity of a time series can be determined using methods such as the Hurst coefficient,
the Lyapunov (characteristic) exponent and the correlation dimension.
II.1.4. Stationarity
A stationary process has the property that the mean, variance and autocorrelation structure do
not change over time.
Usually, time series are non-stationary, thus some methods were developed to transform a
non-stationary time series into a stationary one.
1. Difference the data. That is, given the series $Z_t$, we create the new series
$$Y_i = Z_i - Z_{i-1}$$
The differenced data will contain one point less than the original data. Although you can
difference the data more than once, one difference is usually sufficient.
2. If the data contain a trend, we can fit some type of curve to the data and then model
the residuals from that fit. Since the purpose of the fit is simply to remove the long-term
trend, a simple fit, such as a straight line, is typically used.
3. For non-constant variance, taking the logarithm or square root of the series may
stabilize the variance. For negative data, you can add a suitable constant to make the
entire data positive before applying the transformation. This constant can then be
subtracted from the model to obtain predicted (i.e., the fitted) values and forecasts for
future points.
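These transformations are easy to apply in practice; below is a minimal MATLAB sketch, with a synthetic example series (all numeric values are illustrative, not from the thesis):

% Minimal sketch of the three stationarizing transformations
% (z is a column vector holding the raw, non-stationary series).
z = cumsum(randn(200,1)) + 0.05*(1:200)';   % synthetic example data

% 1. Differencing: the new series has one point less than the original.
y1 = diff(z);

% 2. Detrending: fit a simple curve (here a straight line), keep residuals.
t  = (1:numel(z))';
p  = polyfit(t, z, 1);
y2 = z - polyval(p, t);

% 3. Variance stabilization: shift negative data before taking the log.
c  = max(0, -min(z)) + 1;   % constant making the series strictly positive
y3 = log(z + c);            % undo the shift later to obtain forecasts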
II.2. Prediction strategies
II.2.1. Prerequisites
Before presenting some prediction strategies, we need to define how to organize information
given by the time series in order to use it efficiently.
1. Data sets
Usually, time series data is divided into three sets:
- Training set - used to train the prediction system; by means of trial and error, the
dimension of the training set is varied until an optimal size is reached;
- Validation set - used for monitoring the training process, to make sure that the
prediction system has not over-learned the training set;
- Testing set - used after training to study the performance of the prediction system.
2. Sample delay and window size
Given a time series $x_t, x_{t-1}, \ldots, x_{t-i}, \ldots$ and considering that we should predict $x_{t+n}$, one
must decide how many samples are used (this is called the window size) and how the data is
sampled. For a time series $x_t, x_{t-k}, x_{t-2k}, \ldots, x_{t-ik}, \ldots$, $k$ is called the sample delay.
Both parameters will be determined experimentally.
3. Measure of prediction error
Given a time series $x_1, x_2, x_3, \ldots, x_n$ with mean value $x_{mean}$, the standard deviation
$$x_{sd} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - x_{mean}\right)^2} \qquad (1)$$
is used as prediction error, given that the prediction is always the mean value. Note that,
for random time series, the mean value is the best prediction.
A second method for measuring the prediction error starts from a series of predicted values
$\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n$; the prediction error is defined as
$$Error = \frac{x_{error}}{x_{sd}}, \quad \text{where} \quad x_{error} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^2} \qquad (2)$$
In this case, the prediction quality is measured as the improvement relative to the prediction
of the mean value.
II.2.2. Strategies
One approach consists of constructing a function $F_n$ that can directly predict $x_{t+n}$:
$$x_{t+n} = F_n\left(x_t, x_{t-k}, x_{t-2k}, \ldots, x_{t-ik}\right) \qquad (3)$$
Another way is to construct a function $F_I$,
$$\hat{x}_{t+1} = F_I\left(x_t, x_{t-k}, x_{t-2k}, \ldots, x_{t-Nk}\right)$$
which predicts one step ahead, $\hat{x}_{t+1}$, and then to apply this function iteratively $n$ times.
Finally, one can construct a function $F_1$ to predict $\hat{x}_{t+1}$, then retrain it, so that $F_2$ is
obtained, which predicts $\hat{x}_{t+2}$, and so on:
$$\hat{x}_{t+1} = F_1\left(x_t, x_{t-k}, x_{t-2k}, \ldots, x_{t-ik}\right)$$
$$\hat{x}_{t+2} = F_2\left(\hat{x}_{t+1}, x_{t+1-k}, x_{t+1-2k}, \ldots, x_{t+1-ik}\right)$$
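As an illustration, the iterative one-step-ahead strategy can be sketched in MATLAB; predict_one_step is a hypothetical handle to an already trained one-step predictor, and N, k, n follow the notations above:

% Iterative n-step prediction with a one-step-ahead predictor F_I.
% 'predict_one_step' is a hypothetical function handle; x is the known
% history, k the sample delay, N the window size, n the horizon.
function xhat = iterate_prediction(predict_one_step, x, k, N, n)
    buf = x(:);                               % working copy of the history
    for step = 1:n
        window = buf(end:-k:end-(N-1)*k);     % x_t, x_{t-k}, ..., x_{t-(N-1)k}
        buf(end+1) = predict_one_step(window);% append one-step prediction
    end
    xhat = buf(numel(x)+1:end);               % the n predicted values
end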
III. Neuro-Fuzzy Systems in Time Series Analysis
Classical time series analysis has two major components:
1. Time domain analysis - usually more profitable for stochastic signals, it uses correlation
techniques to study signal characteristics.
The correlation of two signals is defined as:
$$r_{xy}(t) = \int_{-\infty}^{+\infty} x(\tau)\, y(\tau + t)\, d\tau$$
In particular, when a function is correlated with itself, the operation is called
autocorrelation; otherwise it is called cross-correlation.
Correlation determines how much similarity there is between the two argument functions.
Some of the general properties are:
- the maximum value of the autocorrelation always occurs at t = 0, and the function always
decreases (or stays constant) as t approaches infinity;
- the larger the area under the correlation curve, the greater the similarity between the
two signals.
2. Frequency domain analysis - best suited for periodic signals, it is based on the Fourier
transform.
Suppose x is a complex-valued Lebesgue integrable function. The Fourier transform to the
frequency domain is given by the function
$$X(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} x(t)\, e^{-j\omega t}\, dt, \quad \omega \in \mathbb{R}$$
Fourier analysis also uses Fourier series, the discrete-time Fourier transform and the discrete
Fourier transform.

On the other hand, new approaches such as neural networks and especially neuro-fuzzy
systems offer ways of modelling a system's behaviour using techniques specific to artificial
intelligence.
Neuro-fuzzy systems are more suitable for modelling large-scale, complex systems that
would otherwise require very complex and hard-to-compute mathematical equations.
Nowadays, in many cases, the classical methods are, sometimes by far, more efficient than
neuro-fuzzy modelling, but the research community is seeking new, improved and efficient
ways of modelling and training these structures.
The scope of this project is to explore the capabilities of a dynamic neuro-fuzzy structure in
system identification, which is the first step towards control.
III.1. Fuzzy Neurons
The simplest neuro-fuzzy system is the fuzzy neuron which implements some basic fuzzy-
logic functions.

Figure 1 The Fuzzy Neuron
For example, f can be the MIN or the MAX function:
$$y_{min} = f\left(x_1, \ldots, x_i, \ldots, x_n\right) = MIN\left(x_1, \ldots, x_i, \ldots, x_n\right)$$
$$y_{max} = f\left(x_1, \ldots, x_i, \ldots, x_n\right) = MAX\left(x_1, \ldots, x_i, \ldots, x_n\right)$$

Using fuzzy logical neurons, the output is more or less influenced by the values of the inputs.
This influence depends on both the weights and the fusion operation:
- for a neuron of type AND, the inputs having a weak weight are the most influential;
- for a neuron of type OR, the inputs whose weight is significant are mainly taken into
account.

III.2. Neurons with fuzzy weights
Another way to fuzzify the neuron model is to use fuzzy weights instead of crisp values.
Fuzzy weights are interpreted as membership functions, so the linear synaptic connections
are replaced with non-linearities labelled as loosely connected or tightly connected.
Excitatory or inhibitory connections are represented through fuzzy intersection, or through
fuzzy complement followed by fuzzy intersection.
a. Conventional neuron with fuzzy weights
Considering the standard neuron
[Figure: a conventional neuron with inputs x_1, ..., x_n, weights w_1, ..., w_n, a non-linear function f and output y]
where $x_1, x_2, \ldots, x_n$ are the inputs, $w_1, w_2, \ldots, w_n$ the weights, $\theta$ the bias (offset) and $y$ the output.
Positive weights are excitatory and negative ones inhibitory. The model has a single
parameter for each synapse and one non-linear function f. Due to its simplicity, more complex
functions can only be achieved with large, complicated architectures.
A more powerful neuron model is obtained when fuzzy weights are used and, more importantly,
when the inputs and outputs are also membership functions (take fuzzy values). Such a model is the
Yamakawa neuron, which is presented in the next section.
b. Direct fuzzification of neural networks
In this case, inputs and/or outputs and/or weights are generalized to fuzzy values. The
following table presents all possible combinations:

Fuzzy Neural Network   Weights   Inputs   Outputs
Type 1                 Crisp     Fuzzy    Crisp
Type 2                 Crisp     Fuzzy    Fuzzy
Type 3                 Fuzzy     Fuzzy    Fuzzy
Type 4                 Fuzzy     Crisp    Fuzzy
Type 5                 Crisp     Crisp    Fuzzy
Type 6                 Fuzzy     Crisp    Crisp
Type 7                 Fuzzy     Fuzzy    Crisp
Type 1 networks were used to classify fuzzy input vectors into crisp classes, while types 2-4 were
used to implement fuzzy if-then rules.
According to some research, types 5-7 cannot be implemented: for type 5 the output will
always be crisp, while for types 6 and 7 there is no need to fuzzify the weights.
III.3. The Yamakawa Neuron Model
Let's consider the linear combinator model:

Figure 2 The Linear Combinator
The Yamakawa neuron is derived from the model above, where the weights $a_i$ are replaced by
non-linear functions implemented with Sugeno fuzzy systems (SISO - Single Input, Single
Output) and the bias $a_0$ is set to zero.

Figure 3 The Yamakawa Model
The structure of the non-linear synapse is presented in the next figure:

Figure 4 The Yamakawa Model Synapse Structure (Sugeno Fuzzy System)
and we have:
$$f_i(x_i) = \frac{\sum_{j=1}^{m} g_{i,j}(x_i)\, w_{i,j}}{\sum_{j=1}^{m} g_{i,j}(x_i)}$$
where:
- $g_{i,j}$ - the $j$-th membership function of the $i$-th Sugeno fuzzy system;
- $w_{i,j}$ - the $j$-th variable weight of the $i$-th Sugeno fuzzy system;
- $m$ - the number of membership functions.
III.4. The Dynamic Yamakawa Neuron Model
The Yamakawa model offers more computational power but is still a static structure. In order
to be more practical, one can transform it into a dynamic structure. Considering that,
nowadays, most time series in practice are discrete, ARMA (Auto-Regressive, Moving
Average) filters prove suitable for modelling the dynamic behaviour.
The proposed structure is represented in the following figure:


Figure 5 Dynamic Synapse

Reconsidering the Yamakawa model, the following representation is obtained:


Figure 6 The Dynamic Yamakawa Model

Let's consider a second-order ARMA filter with the following structure:


Figure 7 Structure of a second order ARMA filter

The input-output transfer of the ARMA filter is given by:
$$y(k) = \frac{b_0 + b_1 q^{-1} + b_2 q^{-2}}{1 + a_1 q^{-1} + a_2 q^{-2}}\, x(k)$$
where $q^{-1}$ is the delay operator.
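As a minimal illustration (coefficient values are made up for the example, not taken from the thesis), this transfer can be simulated with MATLAB's built-in filter function, which implements exactly the corresponding difference equation:

% Minimal sketch: simulating the second-order ARMA filter of Figure 7.
% y(k) = b0*x(k) + b1*x(k-1) + b2*x(k-2) - a1*y(k-1) - a2*y(k-2)
b = [0.5 0.2 0.1];        % numerator:   b0, b1, b2 (illustrative)
a = [1.0 -0.6 0.08];      % denominator: 1, a1, a2 (illustrative)
x = randn(100,1);         % input sequence
y = filter(b, a, x);      % output sequence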
For the Sugeno fuzzy systems, Gaussian membership functions are considered, with centres
uniformly distributed in $[-1, 1]$:
$$y(k) = \frac{\sum_{i=1}^{m} \beta_i\, e^{-\frac{(x(k) - c_i)^2}{2\sigma_i^2}}}{\sum_{i=1}^{m} e^{-\frac{(x(k) - c_i)^2}{2\sigma_i^2}}}$$
where:
- $\beta_i$ - output singletons (variable weights);
- $\sigma_i$ - dispersions of the Gaussian membership functions;
- $c_i$ - centres of the Gaussian membership functions;
- $m$ - number of membership functions.
In order to write the input-output transfer equation for the entire system, we need to agree on
some notations:

Figure 8 The Yamakawa Dynamic Model with signals notations

$$y(k) = \frac{b_0 + b_1 q^{-1} + b_2 q^{-2}}{1 + a_1 q^{-1} + a_2 q^{-2}}\, s(k)$$
$$s(k) = \sum_{i=1}^{n} v_i(k)$$
$$v_i(k) = \frac{b_0^{(2)} + b_1^{(2)} q^{-1} + b_2^{(2)} q^{-2}}{1 + a_1^{(2)} q^{-1} + a_2^{(2)} q^{-2}}\, w_i(k), \quad i = \overline{1, n}$$
$$w_i(k) = \frac{\sum_{j=1}^{m} \beta_{i,j}\, e^{-\frac{(z_i(k) - c_{i,j})^2}{2\sigma_{i,j}^2}}}{\sum_{j=1}^{m} e^{-\frac{(z_i(k) - c_{i,j})^2}{2\sigma_{i,j}^2}}}, \quad i = \overline{1, n}$$
$$z_i(k) = \frac{b_0^{(1)} + b_1^{(1)} q^{-1} + b_2^{(1)} q^{-2}}{1 + a_1^{(1)} q^{-1} + a_2^{(1)} q^{-2}}\, x_i(k), \quad i = \overline{1, n}$$
IV. The MATLAB Implementation
Having studied this dynamic structure's capabilities during my licence degree project, the
purpose of the current project is to make the training process more efficient (concerning duration,
performance and ease of use). The objective is to find the algorithms, strategies and training
parameters that determine fast training, maximum performance and a minimum of user-adjustable
parameters/variables.
IV.1. Performance evaluation
The structure proposed in this paper is intended to model a dynamic system using neuro-fuzzy
paradigm specific techniques. Thus, evaluation of modelling performance based on output
estimation error comes naturally.
In this case, the mean square error is considered:
$$J(k, \theta) = \frac{1}{2}\, e^2(k, \theta) = \frac{1}{2}\left[ y_d(k) - y(k, \theta) \right]^2$$
where $\theta$ is the neuro-fuzzy system parameter vector.
IV.2. Training algorithms
The training algorithms are implemented as sets of functions designed to alter the parameters
of the neuro-fuzzy system according to the criterion above.
Two types of algorithms were considered: first, gradient-descent based algorithms, specific to
neural network training, and second, a genetic algorithm.
Both types have the same logical structure:

scale data_sets
stop = false
while not stop
    output = evaluate(structure)
    error = target_output - output
    modify parameters
    if stop_criterion
        stop = true
    end if
end while

Data scaling is performed according to the following procedure:

min_training = min(training_set)
max_training = max(training_set)
scaled_training_set = 2 * (training_set - min_training) / (max_training - min_training) - 1
scaled_validation_set = 2 * (validation_set - min_training) / (max_training - min_training) - 1
scaled_testing_set = 2 * (testing_set - min_training) / (max_training - min_training) - 1
Although the training set values will be in [-1,1], for the validation and testing sets it is
possible to exceed this interval. This scaling procedure assures that the transformation is
bijective.
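A minimal MATLAB sketch of this procedure, including the inverse mapping needed to bring predictions back to the original scale (variable names and data values are assumed for illustration):

% Minimal sketch of the scaling procedure and its inverse (assumed names).
training_set   = rand(100,1)*5;          % illustrative data sets
validation_set = rand(30,1)*5;
min_t = min(training_set);
max_t = max(training_set);
scale   = @(s) 2*(s - min_t)/(max_t - min_t) - 1;  % affine map onto [-1,1]
unscale = @(s) (s + 1)*(max_t - min_t)/2 + min_t;  % inverse (bijective)
scaled_training   = scale(training_set);
scaled_validation = scale(validation_set);         % may exceed [-1,1]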
IV.2.1. Gradient-based algorithms (Backpropagation)
This class of algorithms uses the following formula for updating the system parameters:
$$\theta(k+1) = \theta(k) - \eta\, \frac{\partial J(k, \theta)}{\partial \theta} \qquad (4)$$
where $\eta$ is an algorithm-specific parameter called the learning rate.
Considering the expression for $J(k, \theta)$, the above formula becomes:
$$\theta(k+1) = \theta(k) + \eta\, e(k, \theta)\, \frac{\partial y(k, \theta)}{\partial \theta} \qquad (5)$$
So, besides the learning rate and the estimation error, the parameter variation depends only
on the derivative of the system's output.
Remark
For an ARMA filter having the input-output transfer described by
$$y(k) = \frac{b_0 + b_1 q^{-1} + b_2 q^{-2}}{1 + a_1 q^{-1} + a_2 q^{-2}}\, x(k) \qquad (6)$$
the following expressions were derived:
- derivative of output with respect to input:
$$\frac{\partial y}{\partial x}(k) = \frac{b_0 + b_1 q^{-1} + b_2 q^{-2}}{1 + a_1 q^{-1} + a_2 q^{-2}} \qquad (7)$$
- derivative of output with respect to denominator coefficients:
$$\frac{\partial y}{\partial a_i} = \frac{-q^{-i}}{1 + a_1 q^{-1} + a_2 q^{-2}}\, y(k), \quad i = \overline{1, 2} \qquad (8)$$
- derivative of output with respect to numerator coefficients:
$$\frac{\partial y}{\partial b_i} = \frac{q^{-i}}{1 + a_1 q^{-1} + a_2 q^{-2}}\, x(k), \quad i = \overline{0, 2} \qquad (9)$$

For this revision of the toolbox, a pole-zero factorized expression was preferred for the
ARMA filters. This allows better control of the filter's behaviour.
The following expressions were considered:
$$y(k) = k_p\, \frac{(q + z_1)(q + z_2)}{(q + p_1)(q + p_2)}\, x(k) \qquad (10)$$
$$y(k) = k_p\, \frac{1 + (z_1 + z_2)\, q^{-1} + z_1 z_2\, q^{-2}}{1 + (p_1 + p_2)\, q^{-1} + p_1 p_2\, q^{-2}}\, x(k) = \frac{b_0 + b_1 q^{-1} + b_2 q^{-2}}{1 + a_1 q^{-1} + a_2 q^{-2}}\, x(k) \qquad (11)$$
with $a_1 = p_1 + p_2$, $a_2 = p_1 p_2$, $b_0 = k_p$, $b_1 = k_p (z_1 + z_2)$, $b_2 = k_p z_1 z_2$.
- derivative of output with respect to input:
$$\frac{\partial y}{\partial x}(k) = k_p\, \frac{1 + (z_1 + z_2)\, q^{-1} + z_1 z_2\, q^{-2}}{1 + (p_1 + p_2)\, q^{-1} + p_1 p_2\, q^{-2}} \qquad (12)$$
- derivative of output with respect to gain:
$$\frac{\partial y}{\partial k_p}(k) = \frac{1 + (z_1 + z_2)\, q^{-1} + z_1 z_2\, q^{-2}}{1 + (p_1 + p_2)\, q^{-1} + p_1 p_2\, q^{-2}}\, x(k) \qquad (13)$$
- derivative of output with respect to poles:
$$\frac{\partial y}{\partial a_i} = \frac{-q^{-i}}{1 + a_1 q^{-1} + a_2 q^{-2}}\, y(k), \quad i = \overline{1, 2} \qquad (14)$$
$$\frac{\partial y}{\partial p_1} = \frac{\partial y}{\partial a_1}\,\frac{\partial a_1}{\partial p_1} + \frac{\partial y}{\partial a_2}\,\frac{\partial a_2}{\partial p_1}, \qquad \frac{\partial y}{\partial p_2} = \frac{\partial y}{\partial a_1}\,\frac{\partial a_1}{\partial p_2} + \frac{\partial y}{\partial a_2}\,\frac{\partial a_2}{\partial p_2} \qquad (15)$$
$$\frac{\partial y}{\partial p_1}(k) = \frac{-(q^{-1} + p_2\, q^{-2})}{1 + a_1 q^{-1} + a_2 q^{-2}}\, y(k), \qquad \frac{\partial y}{\partial p_2}(k) = \frac{-(q^{-1} + p_1\, q^{-2})}{1 + a_1 q^{-1} + a_2 q^{-2}}\, y(k) \qquad (16)$$
- derivative of output with respect to zeros:
$$\frac{\partial y}{\partial b_i} = \frac{q^{-i}}{1 + a_1 q^{-1} + a_2 q^{-2}}\, x(k), \quad i = \overline{0, 2} \qquad (17)$$
$$\frac{\partial y}{\partial z_1} = \frac{\partial y}{\partial b_1}\,\frac{\partial b_1}{\partial z_1} + \frac{\partial y}{\partial b_2}\,\frac{\partial b_2}{\partial z_1}, \qquad \frac{\partial y}{\partial z_2} = \frac{\partial y}{\partial b_1}\,\frac{\partial b_1}{\partial z_2} + \frac{\partial y}{\partial b_2}\,\frac{\partial b_2}{\partial z_2} \qquad (18)$$
$$\frac{\partial y}{\partial z_1}(k) = k_p\, \frac{q^{-1} + z_2\, q^{-2}}{1 + a_1 q^{-1} + a_2 q^{-2}}\, x(k), \qquad \frac{\partial y}{\partial z_2}(k) = k_p\, \frac{q^{-1} + z_1\, q^{-2}}{1 + a_1 q^{-1} + a_2 q^{-2}}\, x(k) \qquad (19)$$
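Each derivative above is itself the output of a linear filter, so it can be computed by filtering; below is a minimal MATLAB sketch for the numerator derivatives of equation (17), with illustrative coefficients:

% dy/db_i = q^(-i) / (1 + a1*q^(-1) + a2*q^(-2)) * x(k), eq. (17)
a = [1.0 -0.6 0.08];             % illustrative denominator coefficients
x = randn(100,1);                % input sequence
dydb0 = filter(1, a, x);         % i = 0
dydb1 = [0; dydb0(1:end-1)];     % i = 1: delay by one sample
dydb2 = [0; 0; dydb0(1:end-2)];  % i = 2: delay by two samples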

For the Sugeno fuzzy systems the expressions considered are:
- input-output transfer:
$$y(k) = \frac{\sum_{i=1}^{m} \beta_i\, e^{-\frac{(x(k) - c_i)^2}{2\sigma_i^2}}}{\sum_{i=1}^{m} e^{-\frac{(x(k) - c_i)^2}{2\sigma_i^2}}} \qquad (20)$$
Let
$$P(k) = \sum_{i=1}^{m} \beta_i\, e^{-\frac{(x(k) - c_i)^2}{2\sigma_i^2}} \qquad (21)$$
and
$$Q(k) = \sum_{i=1}^{m} e^{-\frac{(x(k) - c_i)^2}{2\sigma_i^2}} \qquad (22)$$
Then
$$y(k) = \frac{P(k)}{Q(k)} \qquad (23)$$
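A minimal MATLAB sketch of this evaluation (all numeric values are illustrative):

% Evaluation of the Sugeno fuzzy system, eqs. (20)-(23).
m     = 5;                            % number of membership functions
c     = linspace(-1, 1, m)';          % centres uniformly spread in [-1, 1]
sigma = 0.4*ones(m,1);                % illustrative dispersions
beta  = randn(m,1);                   % illustrative output singletons
x     = 0.3;                          % crisp input sample
g = exp(-(x - c).^2 ./ (2*sigma.^2)); % Gaussian membership degrees
P = sum(beta .* g);                   % eq. (21)
Q = sum(g);                           % eq. (22)
y = P / Q;                            % eq. (23)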
- derivative of output with respect to input:
Considering the notations above, one can write
$$\frac{\partial y}{\partial x}(k) = \frac{\frac{\partial P}{\partial x}(k)\, Q(k) - P(k)\, \frac{\partial Q}{\partial x}(k)}{Q^2(k)} \qquad (24)$$
where
$$\frac{\partial P}{\partial x}(k) = -\sum_{i=1}^{m} \beta_i\, \frac{x(k) - c_i}{\sigma_i^2}\, e^{-\frac{(x(k) - c_i)^2}{2\sigma_i^2}} \qquad (25)$$
and
$$\frac{\partial Q}{\partial x}(k) = -\sum_{i=1}^{m} \frac{x(k) - c_i}{\sigma_i^2}\, e^{-\frac{(x(k) - c_i)^2}{2\sigma_i^2}} \qquad (26)$$
- derivative of output with respect to singletons:
From the input-output transfer expression, one can derive
$$\frac{\partial y}{\partial \beta_i}(k) = \frac{\frac{\partial P}{\partial \beta_i}(k)}{Q(k)}, \quad i = \overline{1, m} \qquad (27)$$
having
$$\frac{\partial P}{\partial \beta_i}(k) = e^{-\frac{(x(k) - c_i)^2}{2\sigma_i^2}}, \quad i = \overline{1, m} \qquad (28)$$
- derivative of output with respect to centres:
The expressions above lead to
$$\frac{\partial y}{\partial c_i}(k) = \frac{\frac{\partial P}{\partial c_i}(k)\, Q(k) - P(k)\, \frac{\partial Q}{\partial c_i}(k)}{Q^2(k)}, \quad i = \overline{1, m} \qquad (29)$$
with
$$\frac{\partial P}{\partial c_i}(k) = \beta_i\, \frac{x(k) - c_i}{\sigma_i^2}\, e^{-\frac{(x(k) - c_i)^2}{2\sigma_i^2}}, \quad i = \overline{1, m} \qquad (30)$$
and
$$\frac{\partial Q}{\partial c_i}(k) = \frac{x(k) - c_i}{\sigma_i^2}\, e^{-\frac{(x(k) - c_i)^2}{2\sigma_i^2}}, \quad i = \overline{1, m} \qquad (31)$$
- derivative of output with respect to dispersions:
Some simple calculus gives
$$\frac{\partial y}{\partial \sigma_i}(k) = \frac{\frac{\partial P}{\partial \sigma_i}(k)\, Q(k) - P(k)\, \frac{\partial Q}{\partial \sigma_i}(k)}{Q^2(k)}, \quad i = \overline{1, m} \qquad (32)$$
where the derivatives involved have the following expressions:
$$\frac{\partial P}{\partial \sigma_i}(k) = \beta_i\, \frac{(x(k) - c_i)^2}{\sigma_i^3}\, e^{-\frac{(x(k) - c_i)^2}{2\sigma_i^2}}, \quad i = \overline{1, m} \qquad (33)$$
$$\frac{\partial Q}{\partial \sigma_i}(k) = \frac{(x(k) - c_i)^2}{\sigma_i^3}\, e^{-\frac{(x(k) - c_i)^2}{2\sigma_i^2}}, \quad i = \overline{1, m} \qquad (34)$$
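These analytic expressions can be checked numerically against finite differences; below is a sketch for the singleton derivative (27)-(28), continuing the previous MATLAB sketch (g, Q, beta and y as defined there):

% Finite-difference check of dy/dbeta_i, eqs. (27)-(28).
i  = 2;                         % which singleton to perturb
dy_analytic = g(i) / Q;         % (dP/dbeta_i)/Q, from (27) and (28)
h  = 1e-6;
beta_h = beta; beta_h(i) = beta_h(i) + h;
y_h = sum(beta_h .* g) / Q;     % memberships do not depend on beta
dy_numeric = (y_h - y) / h;     % should agree with dy_analytic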

With the notations from Figure 8, we can write the training formula for each component.
For the algorithm itself, several options are available:
i. Parameter alteration
Sometimes the training set will contain fewer but more relevant samples. In this case
sequential training may be used; the parameter change will occur after each
sample is evaluated. Otherwise, batch training is used.
For batch training, supposing the training set has m samples, the following
expressions are considered:
$$\theta(k+1) = \theta(k) + \eta \cdot mean\_error \cdot mean\_derivative \qquad (35)$$
where
$$mean\_error = \frac{1}{m}\sum_{t=1}^{m} e(t) \qquad (36)$$
and
$$mean\_derivative = \frac{1}{m}\sum_{t=1}^{m} \frac{\partial y}{\partial \theta}(t) \qquad (37)$$
ii. Learning rate
The implicit option is the modifiable but fixed learning rate, which means that for a
training session the learning rate will be constant, being specified before starting the
procedure.
Another option is the variable learning rate. In this case, during the training procedure, the
learning rate is modified according to a previously established rule:
- Annealing (gradually lower)
In order to reach the minimum, and stay there, we must anneal (gradually
lower) the global learning rate. A simple, non-adaptive annealing schedule for
this purpose is the search-then-converge schedule
$$\eta(k) = \frac{\eta(0)}{1 + k/T} \qquad (38)$$
Its name derives from the fact that it keeps η nearly constant for the first T
training patterns, allowing the network to find the general location of the
minimum, before annealing it at a (very slow) pace that is known from theory
to guarantee convergence to the minimum. The characteristic time T of this
schedule is a new free parameter that must be determined by trial and
error.
- Bold driver
A useful batch method for adapting the global learning rate is the bold
driver algorithm. Its operation is simple: after each epoch, compare the
network's loss e(k) to its previous value, e(k-1). If the error has decreased,
increase η by a small proportion (typically 1%-5%). If the error has increased
by more than a tiny proportion (say, $10^{-10}$), however, undo the last parameter
change, and decrease η sharply - typically by 50%. Thus bold driver will keep
growing η slowly until it finds itself taking a step that has clearly gone too far
up onto the opposite slope of the error function. Since this means that the
system has arrived in a tricky area of the error surface, it makes sense to reduce
the step size quite drastically at this point.
Unfortunately, bold driver cannot be used in this form for online learning: the
stochastic fluctuations in e(k) would hopelessly confuse the algorithm.
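A minimal MATLAB sketch of one bold driver adjustment step (all names are hypothetical; undo_last_update stands for restoring the pre-update parameters):

% Sketch of one bold driver step; eta is the global learning rate,
% e_prev/e_curr are the losses before/after the last epoch.
if e_curr < e_prev
    eta = eta * 1.05;                   % error decreased: grow eta ~5%
elseif e_curr > e_prev*(1 + 1e-10)      % more than a tiny increase
    undo_last_update();                 % hypothetical parameter rollback
    eta = eta * 0.5;                    % cut eta sharply (by 50%)
end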
iii. Momentum
Another technique that can help the system out of local minima is the use of a
momentum term. This is probably the most popular extension of the backpropagation
algorithm; it is hard to find cases where it is not used. With momentum m, the
parameter update at a given moment k becomes:
$$\Delta\theta(k) = \eta\, f(k) + m\, \Delta\theta(k-1) \qquad (39)$$
where f(k) is a factor depending on the current/mean error and the current/mean derivative,
and 0 < m < 1 is a new global parameter which must be determined by trial and
error. Momentum simply adds a fraction m of the previous parameter update to the
current one.
When the gradient keeps pointing in the same direction, this will increase the size of
the steps taken towards the minimum. It is therefore often necessary to reduce the
global learning rate when using a lot of momentum (m close to 1). Combining a high
learning rate with a lot of momentum, the system will rush past the minimum with
huge steps.
When the gradient keeps changing direction, momentum will smooth out the
variations.
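A minimal MATLAB sketch of the momentum update (39), with assumed variable names (f_k for the factor f(k), delta_prev for the previous update):

% Sketch of the momentum update, eq. (39).
delta      = eta*f_k + m_mom*delta_prev;  % m_mom plays the role of m, 0<m<1
theta      = theta + delta;               % apply the parameter update
delta_prev = delta;                       % remember it for the next step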
iv. Stopping condition
For sequential training, the algorithm stops after all data samples in the training set
have been presented to the system.
When using batch training, the algorithm stops after the specified number of
epochs.
In both cases, there is an option to stop the training process when a desired (imposed)
estimation error is reached, that is:
$$e(k) < e_{max} \qquad (40)$$
IV.2.2. Genetic Algorithms
The training procedure using genetic algorithms implies the following steps:
Step 1. From the system parameters, construct a vector with a predefined structure;
Step 2. Create a population consisting of a specified number (NPop) of randomly
generated vectors like the one defined in the previous step; NPop is an algorithm-specific
parameter;
Step 3. Initialize NGen (the number of generations);
Step 4. Evaluate the current population; for each individual the mean square estimation error
is computed;
Step 5. Select a number of individuals for reproduction;
Step 6. Apply the crossover operator with a specified probability Pc;
Step 7. Apply the mutation operator with a specified probability Pm;
Step 8. Select NPop individuals from the extended population (parents and offspring);
Step 9. If NGen is reached go to the next step, else repeat from step 4;
Step 10. From the final population select the best individual;
Step 11. Knowing the structure of an individual, set the system parameters.
A compact sketch of this loop is given below.
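In the MATLAB sketch below, every helper function (evaluate_population, select, crossover, mutate, survive) is a hypothetical placeholder, not the API of the Sheffield toolbox mentioned in section IV.6; nParams and data are also assumed to be defined:

% Generic sketch of the GA training loop, steps 1-11 (hypothetical helpers).
NPop = 40; NGen = 100; Pc = 0.7; Pm = 0.05;   % illustrative parameters
pop = 2*rand(NPop, nParams) - 1;              % step 2: random individuals
for gen = 1:NGen                              % steps 3 and 9
    fit = evaluate_population(pop, data);     % step 4: MSE per individual
    parents   = select(pop, fit);             % step 5
    offspring = crossover(parents, Pc);       % step 6
    offspring = mutate(offspring, Pm);        % step 7
    pop = survive([pop; offspring], NPop);    % step 8: keep NPop individuals
end
fit = evaluate_population(pop, data);         % step 10: final evaluation
[~, best] = min(fit);
theta = pop(best, :);                         % step 11: set the parameters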


Regarding the Sugeno fuzzy systems, when individuals are constructed by simply gathering all
parameters together, crossover might produce worse individuals, because one or more
Sugeno systems may have similar characteristics.
To overcome this drawback, two methods were considered:
i. Test for similarity
In this case, all vectors containing the centres of the Gaussian membership functions are
clustered based on the relative distance between them. From each cluster a
representative is chosen. Then these representatives are used to construct the initial
population.
The parameters of the clustering algorithm are chosen by trial and error.
ii. Replace genetic algorithms with evolutionary strategies
Compared with genetic algorithms, in evolutionary strategies there is no crossover (or
the probability of crossover is drastically lowered), the main operator is mutation,
the population contains fewer individuals, and a parent produces more offspring.
In this case, Gaussian mutation will be applied: it adds to each component of the
individual a small quantity drawn from a Gaussian distribution.
Because the implementation of evolutionary strategies is more complicated,
this aspect is left for future development of the toolbox.
IV.3. Training strategies
Because the structure implemented by this project has a relatively large number of
parameters, it is more efficient to separate them by their power to affect the
performance of the system.
In this revision of the toolbox a two-stage training procedure is implemented.
Mainly, the first stage works with the static version of the system, the ARMA filters being
characterized only by their gain.
The second stage starts with the insertion of randomly generated poles and zeros (numbers
in the [-1, 1] interval are generated to assure stability and minimum phase). After that, all
parameters are adjusted, making sure at the same time that the performance increases, thus
not ruining the work done in the first stage.
For the moment, two strategies were defined:
1) Stage 1 - gradient-descent batch training on the static structure;
Stage 2 - gradient-descent batch training on the dynamic structure;
2) Stage 1 - genetic algorithms on the static structure;
Stage 2 - gradient-descent batch training on the dynamic structure.


Remark
During early tests it has been observed that adjustments made to the Sugeno systems'
parameters sometimes cause the evaluation to yield NaN: no membership function is
activated, so the denominator of the input-output transfer evaluates to 0 (or very close
to 0, considering the error caused by number representation), and the final output of the
system is the result of a division by 0.
It is also worth mentioning that the initialization of the fuzzy systems guarantees that at
least one membership function is activated.
To overcome this behaviour, when a Sugeno fuzzy system evaluates to NaN, the last
parameter change is discarded, as sketched below.
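A minimal MATLAB sketch of this guard (the SFS object calls are hypothetical names, consistent with the class methods listed in section IV.4.2):

% Sketch of the NaN guard applied after each parameter change.
sfs_backup = sfs;                  % snapshot before the parameter change
sfs        = updategd(sfs, eta);   % assumed gradient-descent update call
if any(isnan(evaluate(sfs, x)))    % denominator collapsed to ~0
    sfs = sfs_backup;              % discard the last parameter change
end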
IV.4. Data structures
The current revision of the toolbox is implemented using the OOP capabilities of the MATLAB
environment.
Although the previous implementation was also modular and scalable, it was pretty hard to
debug training-related problems.
Encapsulation reduced the amount of code to debug and ensured minimum spread and
propagation of errors.
Another issue solved by this approach is the speed of training and evaluation.
In the following sections the implemented classes are enumerated. Although implemented,
standard methods like constructor, display, set and get will not be mentioned.
IV.4.1. Class ARMA

A. Fields
Gain - gain of the filter (scalar)
Poles - poles of the filter (1-by-2 vector)
Zeros - zeros of the filter (1-by-2 vector)
Input - present and last 2 input samples (1-by-3 vector)
Output - present and last 2 output samples (1-by-3 vector)
Doutdin - present and last 2 samples of the derivative of output with respect to input (1-by-3 vector)
Doutdgain - present and last 2 samples of the derivative of output with respect to gain (1-by-3 vector)
Doutdpoles - present and last 2 samples of the derivative of output with respect to poles (2-by-3 vector)
Doutdzeros - present and last 2 samples of the derivative of output with respect to zeros (2-by-3 vector)
B. Methods
Evaluate - evaluates the output
Evalderiv - evaluates the output and derivatives
Updategd - updates the filter parameters according to the gradient-descent expression
Reset - resets the initial conditions


IV.4.2. Class SFS

A. Fields
Center - centres of the Gaussian membership functions
Beta - output singletons
Sigma - dispersions of the Gaussian membership functions
Doutdin - derivative of output with respect to input (scalar)
Doutdcenter - derivative of output with respect to centres (nmf-by-1 vector)
Doutdbeta - derivative of output with respect to singletons (nmf-by-1 vector)
Doutdsigma - derivative of output with respect to dispersions (nmf-by-1 vector)
where nmf represents the number of membership functions
B. Methods
Evaluate - evaluates the output
Evalderiv - evaluates the output and derivatives
Updategd - updates the fuzzy system parameters according to the gradient-descent expression


IV.4.3. Class BRANCH

A. Fields
Arma1 - input filter (ARMA object)
Sfs - Sugeno fuzzy system (SFS object)
Arma2 - output filter (ARMA object)
B. Methods
Evaluate - evaluates the output expression (calls the object-specific evaluate method)
Evalderiv - evaluates the output and derivative expressions (calls the object-specific evalderiv method)
Updategd - updates the parameters according to the gradient-descent expression (calls the object-specific updategd method)
Reset - resets the initial conditions for arma1 and arma2
Initpz - initializes arma1 and arma2 poles and zeros with random values in [-1, 1]


IV.4.4. Class NFS

A. Fields
Branches - synapses (vector of BRANCH objects)
Arma - output filter (ARMA object)
B. Methods
Evaluate - evaluates the output expression (calls the object-specific evaluate method)
Evalderiv - evaluates the output and derivative expressions (calls the object-specific evalderiv method)
Updategd - updates the parameters according to the gradient-descent expression (calls the object-specific updategd method)
Reset - resets the initial conditions for the ARMA filter and each branch object
Initpz - initializes the ARMA objects' poles and zeros with random values in [-1, 1] (calls the object-specific initpz method)
Train - trains the structure according to the specified input, target and strategy
Scale - scales the data sets (private method called by train)
Ga - implements a genetic algorithm (private method called by train)
Sequentialgd - implements a sequential gradient-descent training algorithm (private method called by train)
Batchgd - implements a batch gradient-descent training algorithm (private method called by train)
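A hypothetical usage example composing these classes follows; the constructor and method signatures are assumptions, kept consistent only with the method names listed above:

% Hypothetical composition of the NFS class and its training interface.
nfs = NFS(nInputs, nMemberships);          % assumed constructor signature
nfs = initpz(nfs);                         % random poles/zeros in [-1, 1]
nfs = train(nfs, input, target, strategy); % two-stage training (IV.3)
nfs = reset(nfs);                          % clear filter initial conditions
y   = evaluate(nfs, input);                % simulate the trained structure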

IV.5. Future development
IV.5.1. Evolutionary strategies
Evolutionary strategies may prove more efficient than genetic algorithms in some cases,
especially for the final tuning of the parameters.
Currently, genetic algorithms are implemented using a free, specialized toolbox developed by
the Evolutionary Computation Research Group of the Department of Automatic Control and
Systems Engineering at the University of Sheffield, UK.
IV.5.2. Generating training data sets
Besides the training process itself, data preparation is the most time-consuming activity involved.
Choosing the most relevant input signals and then the best-suited data samples is of crucial
importance for successful training. Thus, an automatic or supervised data preparation process
could prove to be a useful tool.
Implementing such a process would also help online training.
IV.5.3. Graphical user interface
After the completion of the toolbox, a specialized graphical user interface would improve the
ergonomics of using this tool.
IV.6. Resources
The MATLAB resources, beside the standard package, that were used to implement this project
are:
- Statistics Toolbox - data clustering functions and structures;
- Genetic Algorithms Toolbox - free toolbox, not included in the MATLAB packages, developed
at the Department of Automatic Control and Systems Engineering of the University of
Sheffield, UK (http://www.shef.ac.uk/acse/research/ecrg/gat.html).

V. Testing and results
V.1. Vehicle lateral dynamic model
V.1.1. Model description


The model is a simplified one-track linear vehicle lateral dynamics model with roll.

System structure:
- one input: $\delta_L^*$ (steering angle);
- two outputs: $a_y$ and $r$ (lateral acceleration and yaw rate);
- two state variables: $\beta$ and $r$ (slip angle and yaw rate).
The model expression in state-space form:
$$\begin{bmatrix} \dot\beta \\ \dot r \end{bmatrix} = \begin{bmatrix} -\dfrac{K_{\varphi R}\left(C'_{\alpha V} + C'_{\alpha H}\right)}{m\, v_{ref}} & \dfrac{K_{\varphi R}\left(C'_{\alpha H}\, l_H - C'_{\alpha V}\, l_V\right)}{m\, v_{ref}^2} - 1 \\ \dfrac{C'_{\alpha H}\, l_H - C'_{\alpha V}\, l_V}{I_z} & -\dfrac{C'_{\alpha V}\, l_V^2 + C'_{\alpha H}\, l_H^2}{I_z\, v_{ref}} \end{bmatrix} \begin{bmatrix} \beta \\ r \end{bmatrix} + \begin{bmatrix} \dfrac{K_{\varphi R}\, C'_{\alpha V}}{m\, v_{ref}} \\ \dfrac{C'_{\alpha V}\, l_V}{I_z} \end{bmatrix} \delta_L^* + \begin{bmatrix} \dfrac{g}{v_{ref}} \\ 0 \end{bmatrix} \sin\alpha_x$$
$$\begin{bmatrix} a_y \\ r \end{bmatrix} = \begin{bmatrix} -\dfrac{C'_{\alpha V} + C'_{\alpha H}}{m} & \dfrac{C'_{\alpha H}\, l_H - C'_{\alpha V}\, l_V}{m\, v_{ref}} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \beta \\ r \end{bmatrix} + \begin{bmatrix} \dfrac{C'_{\alpha V}}{m} \\ 0 \end{bmatrix} \delta_L^* + \begin{bmatrix} n_{a_y} \\ n_r \end{bmatrix}$$

[Figure: one-track model with 3 DOF, showing yaw rate r, velocity v, the x and y axes and the centre of gravity CG]
V.1.2. Physical simplifications



Vehicle lateral dynamics is a very complicated physical phenomenon; here we use a
simplified model, the one-track model, to describe it. Some important assumptions
have been made for the application of the one-track model:
1) the height of the centre of gravity is zero,
2) there is no pitch and roll motion, and
3) the model is purely linear.
For the derivation of the lateral dynamics, a coordinate system is fixed to the centre of
gravity. The equations of motion are described according to the force balances and torque
balances at the centre of gravity.
Therefore, from the application viewpoint, due to the one-track model's simplifications,
especially the simplification of the tire model, it has been verified that it can be a good
approximation of the vehicle dynamics only when the lateral acceleration $a_y$ is smaller than 0.4g on
normal dry asphalt roads [1]. It is only valid for not-so-critical driving situations,
since the pitch motion and the roll motion have been neglected.

V.1.3. Unknown input signal



In this model, there exists one unknown input signal, the road bank angle $\alpha_x$. This signal
cannot be measured directly in the general vehicle system, so it is normally taken as an
unknown input signal.


V.1.4. Model parameter variation

Vehicle reference velocity $v_{ref}$
The system matrices A, B and C are functions of the vehicle reference velocity; therefore,
the system is strictly speaking an LTV system. But for the purpose of vehicle
lateral dynamics research, the variation of the longitudinal vehicle velocity is comparatively
slow, so it can be considered constant during one observation (such as a short time
window of 1 second for the residual evaluation).

Vehicle mass m
As the load of the vehicle varies, the vehicle sprung mass and the inertia change accordingly.
The changes are very large for trucks; but for a personal car, compared to the large total mass,
the change caused by the number of passengers can normally be neglected.


[Figure: vehicle on a banked road, showing the road bank angle α_x, the gravity force m·g and the y, z axes]
Vehicle cornering stiffness C_α


Cornering stiffness is the change in lateral force per unit slip angle change at a specified
normal load in the linear range of the tire.
$$ C_\alpha = \frac{dF_y}{d\alpha} = f(\alpha, F_z) $$

Nominal values for our research car (Mercedes-Benz S500):

$$ C'_{\alpha V} = 103600\ \mathrm{N/rad}, \qquad C'_{\alpha H} = 179000\ \mathrm{N/rad} $$
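As a rough illustration of the linear range, a front slip angle of 2° (about 0.035 rad) would give a lateral force of roughly F_y ≈ 103600 · 0.035 ≈ 3.6 kN.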

Actually, the tire sideslip stiffnesses C'_αV and C'_αH depend on the road-tire friction coefficient, wheel load, camber, toe-in, wheel pressure etc., see [3]. The problem is that the number of unknown parameters and functions is very large and they are very complex. There are some exact descriptions of the non-linear tire model, such as the magic formula and the HSRI model, but they can only be used in off-line tire or vehicle simulation.

The general, simple way to linearize the non-linear tire model is to linearize its characteristics at the origin, so the sideslip stiffness C_α is taken as a constant. However, this assumption is only valid for small sideslip angles and a constant road adhesion coefficient.

In some papers [1], [2], based on the stiffness of the steering mechanism (steering column, gear, etc.), the following assumption has been used:

$$ C'_{\alpha H} = k\, C_{\alpha V} $$

[Figure: definition of the slip angle, showing the tire heading direction and the lateral force]
V.1.5. Model noise

The sensor noises are caused by the lateral acceleration sensor, the yaw rate sensor and the steering angle sensor. All the sensor noise data were measured and supplied by the Bosch company [2]. The details are given in the table below.

V.1.6. Typical failures

Some typical failure types and values for the benchmark system are given in Table 1. The given values are only intended to show a realistic range for the faults; other fault values are also possible. For the steering angle, a ramp fault is improbable because of the sensor type, so no fault value is given. Multiplicative faults are also not very probable, and no realistic fault values are known to the authors at this moment.

Table 1: Typical failures for the benchmark system

Offset faults:
Signal                  Step                          Ramp
Yaw rate                2°/s, 5°/s, 10°/s             10°/s per minute
Lateral acceleration    2 m/s^2, 5 m/s^2              4 m/s^2 per s, 10 m/s^2 per s
Steering angle          15°, 30°                      --

Multiplicative faults:
Signal                  Step                          Ramp
Yaw rate                (100 ± 20) %, (100 ± 40) %    from 100 % to (100 ± 50) % in 10 s
Lateral acceleration    (100 ± 50) %, (100 ± 80) %    from 100 % to (100 ± 50) % in 10 s
Steering wheel angle    --                            --
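To illustrate how such an offset fault would be injected in a simulation, a brief sketch follows; the variable and index names are illustrative assumptions, not part of the benchmark:

    % Sketch: additive step offset fault of 5 deg/s on the yaw rate
    % measurement, active from sample k_f onward.
    f_step            = 5*pi/180;               % [rad/s] fault magnitude
    r_faulty          = r_meas;                 % copy of the healthy signal
    r_faulty(k_f:end) = r_faulty(k_f:end) + f_step;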


V.1.7. Physical parameters of the vehicle lateral dynamic model

Physical constants:
Symbol    Symbol in MATLAB   Value      Unit        Explanation
g         g                  9.80665    [m/s^2]     Gravity constant

Vehicle parameters:
Symbol    Symbol in MATLAB   Value      Unit        Explanation
i_L       i_L                18.0       [-]         Steering transmission ratio
m_R       m_R                1630       [kg]        Rolling (sprung) mass
m_NR      m_NR               220        [kg]        Non-rolling (unsprung) mass
m         m = m_R + m_NR     1850       [kg]        Total mass
l_V       l_V                1.52931    [m]         Distance from the vehicle CG to the front axle
l_H       l_H                1.53069    [m]         Distance from the vehicle CG to the rear axle
I_z       I_z                3870       [kg*m^2]    Moment of inertia about the z-axis of the vehicle
K_φR      K_phi              0.9429     [-]         Roll coefficient

Tire model parameters:
Symbol    Symbol in MATLAB   Value      Unit        Explanation
C'_αV     c_alpha_V          103600     [N/rad]     Front tire cornering stiffness
C'_αH     c_alpha_H          179000     [N/rad]     Rear tire cornering stiffness
V.1.7.1. System variables
Symbol    Symbol in MATLAB   Unit        Explanation
β         Beta               [rad]       Vehicle side slip angle
r         r                  [rad/s]     Vehicle yaw rate
δ_L*      Delta_L            [rad]       Vehicle steering angle
a_y       Ay                 [m/s^2]     Vehicle lateral acceleration
v_ref     v_ref              [km/h]      Vehicle longitudinal velocity
V.1.7.2. Sensor noise data
Symbol    Symbol in MATLAB   Standard deviation      Unit        Explanation
n_ay      N_ay               σ_ay = (0.2, 2.4)       [m/s^2]     Lateral acceleration sensor noise
n_r       N_r                σ_r = (0.2, 0.9)        [rad/s]     Yaw rate sensor noise
n_δL*     N_delta            σ_δL* = 2°              [rad]       Steering angle sensor noise
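For simulation purposes, such sensor noise can be approximated as zero-mean Gaussian noise; a minimal sketch, assuming the lower bounds of the reported standard deviation ranges:

    % Sketch: zero-mean Gaussian sensor noise with assumed standard
    % deviations (lower bounds of the ranges in the table above).
    N    = 3500;                % number of samples (illustrative)
    n_ay = 0.2 * randn(N, 1);   % lateral acceleration sensor noise [m/s^2]
    n_r  = 0.2 * randn(N, 1);   % yaw rate sensor noise [rad/s]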
V.1.8. References
[1] M. Börner, R. Isermann, "Adaptive one-track model for critical lateral driving situations".
[2] Bosch GmbH, "Fehlerarten für die ISP-Sensorik" (fault types for the ISP sensors), internal report, 2003.
[3] S. X. Ding, Y. Ma, H.-G. Schulz, B. Chu et al., "Fault tolerant estimation of vehicle lateral dynamics", IFAC Safeprocess, 2003.
[4] M. Mitschke, "Dynamik der Kraftfahrzeuge" (dynamics of motor vehicles), Band C, Springer-Verlag, 1990.

V.2. Test 1
The following graphic represents measurements of the steering angle.

Figure 9 The steering angle (sample period 0.1 s)

From this data set, a subset is selected for training. The main principle used for determining the most useful data subset concerns the bandwidth of the signal in the specified interval. As long as the right data set is selected, in most cases fewer samples produce better and more efficient training.
For the current stage of the project, defining a procedure and implementing an algorithm for choosing the training set is out of scope but, as mentioned before, such a procedure would make the toolbox much easier to use.
The output variable, the lateral acceleration, is plotted in the next graphic:

Figure 10 Lateral Acceleration
Observing the previous graphic, it is obvious that this data set includes a small amount of noise.
For the first training test, the following parameters were set:

Training subset:
  window_start   5000
  window_stop    8500
  step           10

Training strategy:
  Phase 1: gradient descent, batch training; learning rate 0.01; 100 epochs
  Phase 2: gradient descent, batch training; learning rate 0.005; 300 epochs
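A minimal sketch of this subset selection, assuming delta_L and a_y hold the full measured series; the variable names are illustrative, not the toolbox's actual interface:

    % Select every 10th sample between window_start and window_stop.
    idx     = 5000:10:8500;      % window_start : step : window_stop
    u_train = delta_L(idx);      % training input: steering angle
    y_train = a_y(idx);          % training target: lateral acceleration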
After the training process, the following response was obtained:

Figure 11 Training results

Using only the steering angle as input, the neuro-fuzzy structure obviously cannot be very precise, because it cannot model the internal noise that causes the small ripples in the reference output.
This suggests introducing an internal feedback.
Testing the structure on the initial data set, the following results were obtained:

Figure 12 Testing Set (MSE=0.028)
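The MSE reported in the figure captions is presumably the usual mean squared error between the measured and the modelled output over the plotted window:

    % Mean squared error between measured and modelled output.
    mse = mean((y_meas - y_model).^2);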
V.3. Test 2
As stated in the previous section, considering an internal feedback as an extra input signal for
the neuro-fuzzy structure might produce better results.
Thus, considering the following notations:
x = steering angle;
y = lateral acceleration;
for this second test, two inputs are considered for the neuro-fuzzy structure:

$$ \mathrm{input}(k) = \begin{bmatrix} x(k) \\ y(k-1) \end{bmatrix} \qquad (41) $$
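A sketch of the two-input construction of Eq. (41), where the measured output, delayed by one sample, is fed back as a second input; the variable names follow the notations above and are otherwise illustrative:

    % Build the training matrices for the feedback configuration.
    k       = 2:numel(x);          % start at k = 2 so that y(k-1) exists
    inputs  = [x(k); y(k-1)];      % 2 x (N-1) matrix of input vectors
    targets = y(k);                % corresponding desired outputs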
Training subset:
  window_start   5000
  window_stop    8500
  step           10

Training strategy:
  Phase 1: gradient descent, batch training; learning rate 0.009; 100 epochs
  Phase 2: gradient descent, batch training; learning rate 0.005; 300 epochs


Figure 13 Training results (2 inputs, MSE = 0.01)
For both tests, the training process took around 58 seconds, which is a major improvement over the previous version of the toolbox.
For the testing set, the following graphic shows the results:

Figure 14 Testing set (MSE=0.0047)

Other tests, with more inputs, were considered, but the results are inconclusive at this time.

VI. Conclusions
The main improvement brought by this project over the previous version of the MATLAB toolbox is speed. It is well known that artificial intelligence techniques like neural networks, fuzzy systems, neuro-fuzzy systems and evolutionary algorithms (genetic algorithms, evolution strategies, genetic programming) are time consuming and technology dependent.
This version of the toolbox has removed the time dependency of the training process, its performance now being influenced only by the training parameters.
On the other hand, there is a great number of parameters that need to be adjusted to obtain optimum performance; thus, a parameter management scheme is required.
One partial solution is to define training strategies which allow some degree of separation between the parameters.
A previous study of the proposed structure revealed a great sensitivity of the output to variations of the Sugeno fuzzy system's parameters: small updates can cause a big change in performance (in most cases, a decrease). This means that varying the fuzzy parameters enables a quick exploration of the search space and the location of an approximate global optimum.
The restrictions imposed on the ARMA filters, mainly due to stability considerations, make the filter parameters well suited for fine tuning the entire system's performance.
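As a minimal sketch of such a restriction, assuming the internal dynamics are discrete-time ARMA filters described by a denominator polynomial a(z); this convention is assumed here, not taken from the toolbox:

    % Sketch: accept a parameter update only if the filter stays stable,
    % i.e. all poles of a(z) = 1 + a1*z^-1 + ... lie inside the unit circle.
    a        = [1, -0.5, 0.1];            % illustrative coefficients
    isStable = all(abs(roots(a)) < 1);    % true if the filter is stable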
Another modification of the toolbox concerns the implementation of the data structures. An OOP approach was chosen, so monitoring the behaviour of each component became an easy task; finding and isolating errors and faults is also straightforward.
In conclusion, this version brings a great increase in speed, making the use of the toolbox more efficient.
Of course, future studies will have to find the best way to increase performance further. It is obvious that classic training algorithms are not well suited, so specialized versions must be developed.
The results presented confirm the modelling capabilities of the structure, and also that this modelling can be done efficiently.

VII. Bibliography and references
1) Kaufman, Arnold - Fundamental Theoretical Elements
2) Dasgupta, Dipankar - Evolutionary Algorithms in Engineering Applications
3) Hagan, Martin T. - Neural Network Design
4) Russell, Stuart J. - Artificial Intelligence: A Modern Approach
5) Bellman, Richard Ernest - Methods of Nonlinear Analysis
6) Bäck, Thomas - Evolutionary Algorithms in Theory and Practice (Evolution Strategies, Evolutionary Programming, Genetic Algorithms)
7) www.wikipedia.org
8) http://www.shef.ac.uk/acse/research/ecrg/gat.html
