

Engineering Applications of Artificial Intelligence 16 (2003) 453-463

Recurrent radial basis function network for time-series prediction


Ryad Zemouri*, Daniel Racoceanu, Noureddine Zerhouni
Laboratoire d'Automatique de Besançon, Groupe Maintenance et Sûreté de Fonctionnement, 25 Rue Alain Savary, 25000 Besançon, France

Abstract

This paper proposes a Recurrent Radial Basis Function network (RRBFN) that can be applied to dynamic monitoring and prognosis. Based on the architecture of conventional Radial Basis Function networks, the RRBFN has looped input neurons with sigmoid activation functions. These looped neurons represent the dynamic memory of the RRBF, and the Gaussian neurons represent the static one. The dynamic memory enables the network to learn temporal patterns without an input buffer to hold the recent elements of an input sequence. To test the dynamic memory of the network, we applied the RRBFN to two time-series prediction benchmarks (Mackey-Glass and Logistic Map). The third application concerns an industrial prognosis problem: nonlinear system identification using the Box and Jenkins gas furnace data. A two-step training algorithm is used: the RCE training algorithm for the prototype parameters, and multivariate linear regression for the output connection weights. The network is able to predict the two temporal series and gives good results for the nonlinear system identification. The advantage of the proposed RRBF network is to combine the learning flexibility of the RBF network with the dynamic performance of the local recurrence given by the looped neurons.
© 2003 Elsevier Ltd. All rights reserved.
Keywords: Neural network; Radial basis function; Dynamic neural networks; Recurrent neural networks; Neural predictive model; Time series prediction

1. Introduction

Modern industrial monitoring requires processing a number of sensor signals. It essentially concerns the detection of any deviation from a working reference, by generating an alarm, and the diagnosis of failures. The diagnosis operation has two main functions: the location of the failing system or subsystem, and the identification of the primary cause of the failure (Lefebvre, 2000). Monitoring methods can be classified in two categories (Dash and Venkatasubramanian, 2000): model-based monitoring methodologies and model-free monitoring. The first class essentially contains control-system techniques based on the difference between the outputs of a system model and the outputs of the equipment (Combacau, 1991). The major disadvantage of these techniques is the difficulty of obtaining a formal model, especially for complex or re-configurable equipment. The second class of monitoring techniques is not sensitive to this problem. These techniques are the probabilistic ones and the Artificial Intelligence (AI) ones.
*Corresponding author. URL: http://www.lab.cnrs.fr

The AI techniques are essentially based on a training process that gives a certain adaptability to the monitoring application (Rengaswamy and Venkatasubramanian, 1995). The use of Artificial Neural Networks (ANN) in a monitoring task can be viewed as a pattern-recognition application. The pattern to recognize is the measurable or observable equipment data, and the output classes are the different working and failure modes of the equipment (Koivo, 1994). Radial Basis Function (RBF) networks are well suited to this kind of application. Because the history database of the equipment operation is never exhaustive, RBF networks are able to detect new operating or failure modes thanks to their local generalization. This generalization is obtained with Gaussian basis functions, which are maximal at the core and decrease monotonically with distance. The second advantage of RBF networks is the flexibility of their training process. The problem with static classification methods is that the dynamic behavior of the process is not considered (Koivo, 1994). For example, distinguishing a true degradation from a false alarm requires dynamic processing of the sensor signals (Zemouri et al., 2002a).


In our previous work, we demonstrated that a dynamic RBF is able to distinguish between a peak of variation and a continuous variation of a sensor signal. This can be interpreted as a distinction between a false alarm and a true degradation. The prognosis function also depends strongly on the dynamic behavior of the process. The aim of the prognosis function is to predict the evolution of a sensor signal. This can be obtained either from a priori knowledge of the laws governing the ageing phenomena or by training on the signal evolution. In this way, the prognosis can identify degradations or predict the time remaining before breakdown (Brunet et al., 1990). For this purpose, we introduce a new Recurrent Radial Basis Function network (RRBF) architecture that is able to learn temporal sequences. The RRBF network builds on the advantages of Radial Basis Function networks in terms of training time. The recurrent or dynamic aspect is obtained by cascading looped neurons on the first layer. This layer represents the dynamic memory of the RRBF network, which allows it to learn temporal data. The proposed network combines the ease of use of the RBF network with the dynamic performance of the Locally Recurrent Globally Feedforward networks (Tsoi and Back, 1994). The prognosis function can be seen as a time-series prediction problem. In order to validate the prediction capability of the RRBFN, we test the network on two standard time-series prediction benchmarks: the Mackey-Glass series and the Logistic Map. The prognosis validation is made on a nonlinear system identification using the Box & Jenkins gas furnace data. The paper is organized as follows: a brief survey of RBF networks, their applications and their training algorithms is presented in the second section. The third section describes the architecture of the RRBF network for time-series prediction. Finally, we present the results obtained on the three benchmarks.

2. Radial basis function network overview

2.1. RBF networks definition

Radial Basis Function networks are able to provide a local representation of an N-dimensional space. This is achieved by the restricted influence zone of the basis functions. The parameters of a basis function are given by a reference vector (core or prototype) λ_j and the dimension of the influence field σ_j. The response of the basis function depends on the Euclidean distance between the input vector x and the prototype vector λ_j, and also on the size of the influence field:

φ_j(x) = exp( −‖x − λ_j‖² / 2σ_j² ).   (1)

For a given input, a restricted number of basis functions contributes to the calculation of the output. RBF networks can be classified in two categories, according to the type of output neuron: normalized and non-normalized (Mak and Kung, 2000; Moody and Darken, 1989; Xu, 1998; Ghosh and Nag, 2000). Moreover, RBF networks can be used in two kinds of application: regression and classification.

2.2. RBF training techniques

The parameters of an RBF network are the center and the influence field of each radial function and the output weights (between the intermediate-layer neurons and those of the output layer). These parameters are obtained by the training process. The training techniques can be classified into the three following groups.

2.2.1. Supervised techniques

The principle of these techniques is to minimize the quadratic error (Ghosh et al., 1992):

E = Σ_n Eⁿ.   (2)

At each step of the training process, we consider the variations Δw_ij of the weights, Δμ_jk of the centers and Δσ_j of the influence fields. The update law is obtained by gradient descent on Eⁿ (Rumelhart et al., 1986; Le Cun, 1985).

2.2.2. Heuristic techniques

The principle of these techniques is to determine the network parameters in an iterative way. Generally, the training process starts by initializing the network on one center with an initial influence field (λ_0, σ_0). Presenting the training vectors progressively creates the prototype centers. The aim of the next step is to modify the influence rays and the connection weights (only the weights between the intermediate layer and the output one). Some of the heuristic techniques used for RBF training are presented below.

2.2.2.1. RCE algorithm (Restricted Coulomb Energy) (Hudak, 1992). The RCE algorithm was inspired by the theory of particle charges. The principle of the training algorithm is to modify the network architecture in a dynamic way: intermediate neurons are added only when necessary. The influence field is then adjusted to minimize conflicting zones using a threshold θ (Fig. 1).

Fig. 1. Influence field adjustment by the RCE algorithm. Only one threshold is used. The reduction of the conflicting zone must respect the following relations: φ_B(x_A) < θ, φ_A(x_n) < θ, φ_A(x_B) < θ. No new prototype is added for the input vector x_n.


2.2.2.2. Dynamic Decay Adjustment algorithm (Berthold and Diamond, 1995). This technique, partially derived from the RCE algorithm, is used for classification applications (discrimination). The principle of this technique is to introduce two thresholds θ⁺ and θ⁻ in order to reduce the conflicting zone between prototypes. To ensure the convergence of the training algorithm, the neural network must satisfy the two inequalities (3) for each vector x of class c from the training set (Fig. 2):

∃ i: φ_i^c(x) ≥ θ⁺  and  ∀ k ≠ c, ∀ j: φ_j^k(x) < θ⁻.   (3)

Fig. 2. Influence field adjustment by the DDA algorithm. Two thresholds θ⁺ and θ⁻ are used for the conflict reduction according to the relations φ_B(x_A) < θ⁻, φ_A(x_n) < θ⁻, φ_A(x_B) < θ⁻. No prototype is added for the input vector x_n since φ_B(x_n) > θ⁺.

2.2.3. Two-phase training techniques

These techniques estimate the RBF parameters in two phases. A first phase determines the centers and the rays of the basis functions; in this step, only the input vectors are used (unsupervised training). The second step calculates the connection weights between the hidden layer and the output layer (supervised training). Some of these techniques are presented below.

2.2.3.1. First phase (unsupervised). The k-means algorithm: The prototype centers and the variance matrices can be calculated in two steps. In the first step, the k-means clustering algorithm determines the centers of the clusters of points with the same class. These centers are obtained by a segmentation of the training space χ^k of class k into J^k disjoint groups {χ_j^k}, j = 1, …, J^k, where group j contains N_j^k points. We then estimate the center λ_j of the basis function by the average

λ_j = (1 / N_j^k) Σ_{x ∈ χ_j^k} x.   (4)

The second step calculates the variance of the Gaussian function (influence field), using the following expression:

σ_j = (1 / N_j^k) Σ_{x ∈ χ_j^k} (x − λ_j)(x − λ_j)^T.   (5)

Expectation Maximization (EM) method (Dempster et al., 1977): This technique is based on the analogy between the RBF network and Gaussian mixture models. The Expectation Maximization algorithm determines, in an iterative way, the parameters of a Gaussian mixture (by maximum likelihood). The RBF parameters are obtained by two steps: step E, which calculates the mean of the unknown data given the known data, and step M, which maximizes the parameter vector of step E.

2.2.3.2. Second phase (supervised). Maximum of membership (Hernandez, 1999): This technique, used in classification applications, considers the largest of the basis function values φ_i(x):

φ_max = max_{i=1..N} φ_i,   (6)

where N is the number of basis functions over all the classes. The output of the neural network is then given by

y = class(φ_max).   (7)

Least-squares algorithm: Let us suppose that an empirical risk function R_emp to be minimized is fixed. As for the Multi-Layer Perceptron, the determination of the parameters can then be done in a supervised way by gradient descent. If the selected cost function is quadratic, with fixed basis functions Φ, the weight matrix W is obtained by solving a simple linear system. The solution is the weight matrix W that minimizes the empirical risk R_emp. By setting the derivative of this risk with respect to the weights to zero, we obtain the optimality conditions, which can be written in the following matrix form:

Φ^T Φ W^T = Φ^T Y.   (8)

Y represents the vector of desired outputs. If the matrix Φ^T Φ is square and non-singular (Michelli condition (Michelli, 1986)), the optimal solution for the weights, with fixed basis functions, can be written as

W^T = (Φ^T Φ)^{-1} Φ^T Y = Φ^{-1} Y.   (9)
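As a concrete illustration of the first (unsupervised) phase, the short NumPy sketch below clusters the inputs and then applies Eqs. (4) and (5): each cluster mean becomes a prototype center and the within-cluster spread its influence field. The function name, the plain k-means loop and the use of a scalar variance (instead of the full covariance of Eq. (5)) are simplifying assumptions, not the authors' exact implementation.

```python
import numpy as np

def kmeans_first_phase(X, n_prototypes, n_iter=20, seed=0):
    """Unsupervised first phase: k-means clustering, then Eq. (4) for the
    centers (cluster means) and Eq. (5) for the influence fields
    (here reduced to a scalar within-cluster variance).
    X has shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_prototypes, replace=False)]
    for _ in range(n_iter):
        # assign every sample to its nearest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Eq. (4): center = mean of the samples of the cluster
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(n_prototypes)])
    # Eq. (5), simplified: influence field = within-cluster variance
    sigmas = np.array([X[labels == j].var() + 1e-8 if np.any(labels == j) else 1.0
                       for j in range(n_prototypes)])
    return centers, sigmas

# toy usage: two well-separated groups of points
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(1.0, 0.1, (50, 2))])
print(kmeans_first_phase(X, n_prototypes=2))
```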

3. The recurrent radial basis function network

The proposed recurrent RBF neural network considers time as an internal representation (Chappelier, 1996; Elman, 1990). The dynamic aspect is obtained by adding a self-connection, with a sigmoid activation function, to the input neurons. These looped neurons are a special case of the Locally Recurrent Globally Feedforward architecture, called local output feedback (Tsoi and Back, 1994). The RRBF network can thus take into account a certain past of the input signal (Fig. 3).

Fig. 3. RRBF network (recurrent network with radial basis functions).

3.1. Looped neuron

Each neuron of the input layer computes, at instant t, the sum of its input I_i and of its previous output weighted by a self-connection w_ii. The output of its activation function is

a_i(t) = w_ii x_i(t − 1) + I_i(t),   (10)
x_i(t) = f(a_i(t)),   (11)

where a_i(t) and x_i(t) represent respectively the neuron activation and its output at instant t, and f is the sigmoid activation function:

f(x) = (1 − exp(−kx)) / (1 + exp(−kx)).   (12)

To highlight the influence of this self-connection, we let the neuron evolve without any external input (Frasconi et al., 1995; Bernauer, 1996). The initial conditions are I_i(t_0) = 0 and x_i(t_0) = 1. The output of the neuron then evolves according to the following expression:

x(t) = (1 − exp(−k w_ii x(t − 1))) / (1 + exp(−k w_ii x(t − 1))).   (13)

Fig. 4 shows the temporal evolution of the neuron output. This evolution depends on the slope of the straight line Δ, which itself depends on two parameters: the self-connection weight w_ii and the value of the activation function parameter k.

Fig. 4. Equilibrium points of the looped neuron: (a) the forgetting behavior (k w_ii ≤ 2) and (b) the temporal memorizing behavior (k w_ii > 2).

The equilibrium points of the looped neuron satisfy the following equation:

a(t) = w_ii f(a(t − 1)).   (14)

The point a_0 = 0 is a first obvious solution of this equation. The other solutions are obtained by studying the variations of the function

g(a) = w_ii f(a) − a.   (15)

According to k w_ii, the looped neuron has one or more equilibrium points:

* If k w_ii ≤ 2, the neuron has only one equilibrium point, a_0 = 0.
* If k w_ii > 2, the neuron has three equilibrium points: a_0 = 0, a⁺ > 0 and a⁻ < 0.

To study the stability of these points, we study the variations of a Lyapunov function (Frasconi et al., 1995; Bernauer, 1996). In the case where k w_ii ≤ 2, this function is defined by V(a) = a². We obtain

ΔV = (w_ii f(a))² − a² = g(a)(w_ii f(a) + a).   (16)

If a > 0, then f(a) > 0 and g(a) < 0; if w_ii > 0, we therefore have ΔV < 0. If a < 0, then f(a) < 0 and g(a) > 0; if w_ii > 0, we again have ΔV < 0. The point a_0 = 0 is thus a stable equilibrium point when k w_ii ≤ 2 and w_ii > 0.

In the case where k w_ii > 2, the looped neuron has three equilibrium points: a_0 = 0, a⁺ > 0 and a⁻ < 0. To study the stability of the point a⁺, we define the Lyapunov function V(a) = (a − a⁺)² (see Frasconi et al., 1995; Bernauer, 1996). We obtain

ΔV = (w_ii f(a) − a⁺)² − (a − a⁺)² = g(a)(g(a) + 2(a − a⁺)).

If a > a⁺, then g(a) < 0 and g(a) + 2(a − a⁺) > 0, so ΔV < 0. The calculation is the same in the case a < a⁺. The point a⁺ is therefore a stable equilibrium point. In the same way, we can prove that the point a⁻ is another stable equilibrium point, and that the point a_0 = 0 is an unstable equilibrium point.
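The two regimes can be checked numerically. The sketch below (illustrative function names) iterates Eq. (13) for a free-running looped neuron, with no external input and x(0) = 1, using the parameter values of Fig. 5 (k = 0.05).

```python
import numpy as np

def sigmoid(x, k=0.05):
    """Activation function of Eq. (12)."""
    return (1.0 - np.exp(-k * x)) / (1.0 + np.exp(-k * x))

def free_run(w_ii, k=0.05, x0=1.0, steps=200):
    """Iterate Eq. (13), x(t) = f(w_ii * x(t-1)), without external input."""
    x = np.empty(steps)
    x[0] = x0
    for t in range(1, steps):
        x[t] = sigmoid(w_ii * x[t - 1], k)
    return x

# k * w_ii <= 2: a_0 = 0 is the only equilibrium, the neuron forgets its initial state
print(free_run(w_ii=30)[-1])   # decays towards 0
# k * w_ii > 2: a stable non-zero equilibrium appears, the neuron keeps a trace of its past
print(free_run(w_ii=41)[-1])   # settles on a positive value
```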


The looped neuron can thus exhibit two behaviors according to k w_ii: a forgetting behavior (k w_ii ≤ 2) and a temporal memory behavior (k w_ii > 2). Fig. 5 shows the influence of the self-connection weight on the behavior of the looped neuron with k = 0.05. The self-connection gives the neuron the capacity to memorize a certain past of the input data. The weight of this self-connection can be obtained by training, but the easier way is to fix it a priori. We will see in the next section how this looped neuron enables the RRBF network to treat dynamic data, whereas the traditional RBF treats only static data.

Fig. 5. Influence of the self-connection on the behavior of the looped neuron with k = 0.05 (curves for w_ii = 30, 39, 40 and 41).

3.2. RRBF for the prognosis

After showing the effect of the self-connection on the dynamic behavior of the RRBF network, we present in this section the topology of the RRBF network and its training algorithm for time-series prediction applications (Fig. 6).

Fig. 6. Topology of the RRBF. The self-connection of the input neurons provides the network with a dynamic processing of the input data.

The cascade of looped neurons represents the dynamic memory of the neural network; the network therefore treats the data dynamically. The output vector of the looped neurons is the input vector of the RBF nodes. The neural network output is defined by

y(t) = Σ_{i=1..n} w_i φ_i(λ_i, σ_i),   (17)

where w_i represents the connection weight between the ith radial neuron and the output neuron. The output of the RBF nodes has the following expression:

φ_i(λ_i, σ_i) = exp( −Σ_{j=1..m} (x^j(t) − λ_i^j)² / σ_i² ),   (18)

where λ_i = {λ_i^j}, j = 1, …, m, and σ_i represent respectively the center and the dimension of the influence ray of the ith prototype. These radial neurons are the static memory of the network. The output x^j(t) of the jth looped neuron is the dynamic memory of the network, with the following expression:

x^j(t) = (1 − exp(−k(ω x^j(t − 1) + x^{j−1}(t)))) / (1 + exp(−k(ω x^j(t − 1) + x^{j−1}(t)))),   (19)

with j = 1, …, m, where m is the number of neurons of the input layer and ω is the self-connection weight. The first neuron of this layer has a linear activation function: x^1(t) = x(t). Fig. 7 shows the relation between the number of looped neurons and the length of the signal past taken into account. We introduced a variation Δ at instant t = 50 in a signal (Figs. 7(a) and (b)), the aim being to highlight the length of the dynamic memory of the RRBF shown in Fig. 6. A four-looped-neuron RRBF is stimulated by the signal of Fig. 7(a). Figs. 7(c)-(f) show the output error of each looped neuron caused by this variation Δ.

Fig. 7. Influence of the number of looped neurons on the length of the dynamic memory of the network: (a) signal evolution, (b) signal with variation Δ, (c) first looped neuron error, (d) second looped neuron error, (e) third looped neuron error, and (f) fourth looped neuron error.
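A minimal sketch of the forward pass defined by Eqs. (17)-(19), assuming a scalar input signal, m looped neurons in cascade and already-determined prototypes and weights. The class name, the attribute names and the way the memory state is stored are illustrative choices, not the authors' implementation.

```python
import numpy as np

def sigmoid(x, k=0.05):
    # activation function of Eq. (12)
    return (1.0 - np.exp(-k * x)) / (1.0 + np.exp(-k * x))

class RRBF:
    """Cascade of m looped input neurons (dynamic memory) feeding Gaussian
    prototypes (static memory) and a linear output neuron."""

    def __init__(self, centers, sigmas, weights, m, w_self=40.0, k=0.05):
        self.centers = np.asarray(centers)   # shape (n_prototypes, m)
        self.sigmas = np.asarray(sigmas)     # shape (n_prototypes,)
        self.weights = np.asarray(weights)   # shape (n_prototypes,)
        self.w_self = w_self                 # self-connection weight (omega)
        self.k = k
        self.state = np.zeros(m)             # outputs x^1(t), ..., x^m(t)

    def step(self, u):
        """One time step: update the dynamic memory (Eq. 19), then the output (Eqs. 17-18)."""
        new_state = np.empty_like(self.state)
        new_state[0] = u                     # first input neuron is linear: x^1(t) = x(t)
        for j in range(1, len(self.state)):
            # Eq. (19): x^j(t) = f(omega * x^j(t-1) + x^{j-1}(t))
            new_state[j] = sigmoid(self.w_self * self.state[j] + new_state[j - 1], self.k)
        self.state = new_state
        # Eq. (18): Gaussian activation of each prototype on the memory vector
        phi = np.exp(-((self.state - self.centers) ** 2).sum(axis=1) / self.sigmas ** 2)
        # Eq. (17): linear combination of the prototype activations
        return float(self.weights @ phi)

# toy usage with two hypothetical prototypes in a three-neuron memory space
net = RRBF(centers=[[0.2, 0.1, 0.05], [0.8, 0.4, 0.2]],
           sigmas=[0.5, 0.5], weights=[0.3, 0.9], m=3)
for u in (0.1, 0.4, 0.9):
    print(net.step(u))
```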

The network parameters are determined with a two-stage training process. During the first stage, an unsupervised learning algorithm is used to determine the parameters of the RBF nodes (the centers and the influence rays). In the second stage, linear regression is used to determine the weights between the hidden and the output layer.

3.3. Training process of the RRBF

3.3.1. The prototype parameters

The first step of the training process consists in determining the centers and the influence rays of the prototypes (static memory). These prototypes are extracted from the outputs of the looped neurons (dynamic memory): each temporal signal is thus characterized by a cluster point whose coordinates are the outputs of the looped neurons at every instant t. We have adopted the RCE training algorithm for this first stage of the training process. The influence rays are adjusted according to a threshold θ. A complete iteration of this algorithm is as follows:

// Training iteration
// Creation of new prototypes
for every training vector x do:
    add a new prototype p_{n+1} with λ_{n+1} = x; n = n + 1
end
// Adjustment of the influence rays
for every prototype λ_i do:
    σ_i = max { σ : φ_i(λ_j) < θ for all 1 ≤ j ≤ n, j ≠ i }
end
// End

3.3.2. Connection weights

The time-series prediction can be seen as an interpolation problem. The output of the RBF network is

h(x) = Σ_{i=1..n} w_i φ_i(‖x − λ_i‖),   (20)

where n represents the number of basis functions, centered on the n input points. The solution of this problem is obtained by solving the n linear equations for the weight coefficients:

[ φ_11  φ_12  …  φ_1n ] [ w_1 ]   [ y_1 ]
[ φ_21  φ_22  …  φ_2n ] [ w_2 ] = [ y_2 ]
[  ⋮      ⋮    ⋱   ⋮  ] [  ⋮  ]   [  ⋮  ]
[ φ_n1  φ_n2  …  φ_nn ] [ w_n ]   [ y_n ]   (21)

where y_i is the desired output and

φ_ij = φ(‖λ_i − λ_j‖),  i, j = 1, 2, …, n.   (22)

The system can be written as

Φ w = Y.   (23)

The weight vector is then

w = Φ^{-1} Y.   (24)
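The two training stages described above can be summarized by the following sketch. It assumes that, as in the RCE iteration, every training vector (expressed in the memory space of the looped neurons) becomes a prototype, that every other prototype is treated as conflicting when shrinking the influence rays, and that the exponent follows the form of Eq. (18); the pseudo-inverse replaces the plain inverse of Eq. (24) in case Φ is ill-conditioned. Names are illustrative, not the authors' code.

```python
import numpy as np

def train_two_stage(X, y, theta=0.8):
    """First stage (RCE-like): one prototype per training vector, with the
    influence ray of each prototype shrunk until phi_i(lambda_j) < theta for
    every other prototype. Second stage: weights from Phi w = Y (Eqs. 21-24).
    X: (n, m) memory-space training vectors, y: (n,) desired outputs."""
    centers = X.copy()                                  # lambda_i = x_i
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    sigmas = np.empty(len(X))
    for i in range(len(X)):
        others = np.delete(d2[i], i)
        # largest sigma such that exp(-d^2 / sigma^2) < theta for all other prototypes
        sigmas[i] = np.sqrt(others.min() / -np.log(theta))
    # Phi[i, j] = phi_j(x_i); square matrix since the centers are the training points (Eq. 21)
    Phi = np.exp(-d2 / sigmas[None, :] ** 2)
    w = np.linalg.pinv(Phi) @ y                         # Eq. (24), via the pseudo-inverse
    return centers, sigmas, w
```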

4. Application in prediction

We have tested the RRBF network on three time-series prediction applications. In all three applications, the goal is to predict the evolution of the input data from the knowledge of their past. The training process uses a part of the data set, and the network is then tested on the totality of the data. For each application we give two average prediction errors and two error standard deviations, according to whether the network is evaluated on the test population only or on both the test and training populations.

4.1. Mackey-Glass chaotic time series

The Mackey-Glass chaotic time series is generated by the following differential equation:

dx(t)/dt = a x(t − τ) / (1 + x¹⁰(t − τ)) − b x(t).   (25)

x(t) is quasi-periodic and chaotic for the following parameters: a = 0.2, b = 0.1 and τ = 17 (Jang, 1993; Chiu, 1994). The simulated data were obtained by applying the fourth-order Runge-Kutta method to Eq. (25) with the initial conditions x(0) = 1.2 and x(t − τ) = 0 for 0 ≤ t < τ. The simulation step is 1. The data of this series are available at http://neural.cs.nthu.edu.tw/jang/benchmark.

We have tested the RRBF network presented previously on the Mackey-Glass prediction. To obtain good results, we used six looped neurons. The parameters of these looped neurons are set so as to obtain the longest dynamic memory (Fig. 5). This characteristic is obtained with the self-connection value ω = 40 and the sigmoid function parameter k = 0.05. The parameters of the Gaussian functions as well as the connection weights are given by the training algorithms presented previously, with θ = 0.8.
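For reproducibility, here is a short sketch (assumed function name) that generates the series of Eq. (25) with a fourth-order Runge-Kutta step. The delayed term x(t − τ) is taken from the stored history and held constant within each step, a common simplification, and x(t) = 0 for t < 0 as stated above.

```python
import numpy as np

def mackey_glass(n_steps=1200, a=0.2, b=0.1, tau=17, x0=1.2, dt=1.0):
    """Integrate Eq. (25) with a fourth-order Runge-Kutta step of size dt."""
    x = np.zeros(n_steps)
    x[0] = x0

    def deriv(xt, x_delayed):
        # Eq. (25): dx/dt = a x(t - tau) / (1 + x(t - tau)^10) - b x(t)
        return a * x_delayed / (1.0 + x_delayed ** 10) - b * xt

    for t in range(n_steps - 1):
        x_delayed = x[t - tau] if t >= tau else 0.0
        k1 = deriv(x[t], x_delayed)
        k2 = deriv(x[t] + 0.5 * dt * k1, x_delayed)
        k3 = deriv(x[t] + 0.5 * dt * k2, x_delayed)
        k4 = deriv(x[t] + dt * k3, x_delayed)
        x[t + 1] = x[t] + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return x

series = mackey_glass()
print(series[:5])
```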

Table 1
Results of the RRBF test on the Mackey-Glass series prediction

Nb    Min                  Max             Moy1          Moy2          Dev Std1      Dev Std2
50    3.90e-4 (0.043%)     1.1669 (129%)   0.1862 (20%)  0.1776 (19%)  0.251 (27%)   0.2482 (27%)
100   3.27e-5 (0.0036%)    1.1632 (129%)   0.0969 (10%)  0.0879 (9%)   0.184 (20%)   0.1778 (19%)
150   4.13e-5 (0.00458%)   0.7129 (79%)    0.0655 (7%)   0.0564 (6%)   0.103 (11%)   0.0982 (11%)
200   2.60e-5 (0.00288%)   0.3915 (43%)    0.0502 (5%)   0.0408 (4%)   0.058 (6%)    0.0559 (6%)
250   4.54e-5 (0.00504%)   0.3000 (33%)    0.0480 (5%)   0.0369 (4%)   0.054 (6%)    0.0518 (5%)
300   1.46e-5 (0.00162%)   0.2727 (30%)    0.0441 (5%)   0.0318 (3%)   0.048 (5%)    0.0456 (5%)
350   2.45e-6 (0.00027%)   0.2874 (31%)    0.0439 (4%)   0.0296 (3%)   0.048 (5%)    0.0445 (5%)
400   3.35e-5 (0.0037%)    0.3114 (34%)    0.0375 (4%)   0.0236 (2%)   0.042 (4%)    0.0382 (4%)
450   9.56e-5 (0.01062%)   0.2893 (32%)    0.0360 (4%)   0.0209 (2%)   0.042 (4%)    0.0368 (4%)
500   1.50e-5 (0.00166%)   0.2789 (31%)    0.0380 (4%)   0.0203 (2%)   0.043 (4%)    0.0371 (4%)

Nb represents the number of training points. The columns Min and Max give the minimal and maximal prediction errors. Moy1 is the average prediction error on the data outside the training population, and Moy2 the average error on all the data. Dev Std1 and Dev Std2 are the corresponding standard deviations without and with the training data. The percentages are given relative to the amplitude of the signal (0.9).

Fig. 8. Prediction results: (a) neural network output and the Mackey-Glass series values and (b) error of the neural network prediction.

Table 1 presents the results obtained by the RRBF network for different numbers of training points (Nb), taken from the 118th data point onwards. The prediction errors between the network output and the real value of the series are presented in the columns of the table, together with the percentage of each error. This percentage is calculated relative to the amplitude (0.9) of the series. The network is able to predict the series evolution with a minimum of 50 training points, with a mean error of 19% and an error standard deviation of 27%. This error decreases as the number of training points increases, down to 2%. The training time corresponds to one iteration. Fig. 8 shows the results of the test with 500 training points.

4.2. Logistic Map

The Logistic Map series is defined by the expression

x(t + 1) = 4 x(t)(1 − x(t)).   (26)

This series is chaotic in the interval [0, 1], with x(0) = 0.2. The goal of this application is to predict the target value x(t + 1); the input value of the RRBF network is x(t). The best prediction results are obtained with one looped neuron having the parameters ω = 40 for the self-connection and k = 0.05 for the sigmoid function. The parameter θ = 0.999 was used for the first training stage. Table 2 shows the test results of the RRBF network for different numbers of training points (Nb). The network gives good results with only 10 training points. Fig. 9 shows the results of the test with 100 training data points.
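The series of Eq. (26) and the one-step-ahead input/target pairs used here can be generated with a few lines (illustrative names):

```python
import numpy as np

def logistic_map(n_steps=200, x0=0.2):
    """Generate the series of Eq. (26): x(t+1) = 4 x(t) (1 - x(t))."""
    x = np.empty(n_steps)
    x[0] = x0
    for t in range(n_steps - 1):
        x[t + 1] = 4.0 * x[t] * (1.0 - x[t])
    return x

x = logistic_map()
inputs, targets = x[:-1], x[1:]   # network input x(t) and desired output x(t+1)
print(inputs[:3], targets[:3])
```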

Table 2
Results of the RRBF test on the Logistic Map series prediction

Nb    Moy1                 Moy2                 Dev Std1             Dev Std2
10    0.0945 (9%)          0.0898 (9%)          0.0636 (6%)          0.0652 (6%)
20    7.26e-4 (7.26e-2%)   6.53e-4 (6.53e-2%)   5.11e-4 (5.11e-2%)   5.32e-4 (5.32e-2%)
30    1.59e-6 (1.59e-4%)   1.35e-6 (1.35e-4%)   1.69e-6 (1.69e-4%)   1.66e-6 (1.66e-4%)
40    4.69e-8 (4.69e-6%)   3.75e-8 (3.75e-6%)   3.66e-8 (3.66e-6%)   3.77e-8 (3.77e-6%)
50    1.33e-9 (1.33e-7%)   1.00e-9 (1.00e-7%)   1.64e-9 (1.64e-7%)   1.53e-9 (1.53e-7%)
60    4.29e-10 (4.29e-8%)  3.02e-10 (3.02e-8%)  8.06e-10 (8.06e-8%)  7.00e-10 (7.00e-8%)
70    7.11e-11 (7.11e-9%)  5.10e-11 (5.10e-9%)  1.90e-10 (1.90e-8%)  1.55e-10 (1.55e-8%)
80    4.23e-12 (4.23e-10%) 3.25e-12 (3.25e-10%) 9.86e-12 (9.86e-10%) 7.74e-12 (7.74e-10%)
90    1.51e-11 (1.51e-9%)  1.32e-11 (1.32e-9%)  1.23e-11 (1.23e-9%)  1.45e-11 (1.45e-9%)
100   2.14e-11 (2.14e-9%)  1.55e-11 (1.55e-9%)  1.68e-11 (1.68e-9%)  1.38e-11 (1.38e-9%)

Nb represents the number of training points. Moy1 is the average prediction error on the data outside the training population, and Moy2 the average error on all the data. Dev Std1 and Dev Std2 are the corresponding standard deviations without and with the training data. The percentages are given relative to the amplitude of the signal.

Fig. 9. (a) Comparison of the prediction results of the network with the values of the Logistic Map series and (b) prediction error of the neural network.

4.3. Nonlinear system prediction

The third application relates to nonlinear system prediction, using the Box and Jenkins (1970) gas furnace database, which is available at http://neural.cs.nthu.edu.tw/jang/benchmark. These data represent a time series of a gas furnace process, where u(t) is the input gas rate and y(t) is the output CO2 concentration. The goal of this application is to predict the value y(t) from the knowledge of y(t − 1) and u(t − 1). The RRBF network used thus has two inputs, one for y and one for u, and the past of each input signal is taken into account by a looped neuron. The output of the neural network gives the predicted value of y(t). The network is composed of four input neurons (a linear neuron and a looped neuron for each input signal) and one output neuron. The intermediate neurons are determined by the first training stage described previously. The first 145 points of the database are used for the training process, and the second training stage determines the connection weights. The best results were obtained with ω = 500 and k = 0.05 for the sigmoid function, and θ = 0.84 for the training of the influence rays.
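A small sketch of the data preparation and of the error measure reported in Tables 1-3, assuming the 296 gas-furnace samples u(t) and y(t) have already been loaded from the benchmark file; the helper names are illustrative and the percentage is computed relative to the signal amplitude, as in the table footnotes.

```python
import numpy as np

def make_pairs(u, y):
    """Inputs (y(t-1), u(t-1)) and target y(t) for the one-step-ahead prediction task."""
    X = np.column_stack([y[:-1], u[:-1]])
    return X, y[1:]

def error_stats(pred, target):
    """Average absolute prediction error, also expressed relative to the signal amplitude."""
    err = np.abs(pred - target)
    return err.mean(), 100.0 * err.mean() / (target.max() - target.min())

# toy usage with synthetic arrays standing in for the real series
rng = np.random.default_rng(0)
u, y = rng.normal(size=296), rng.normal(50.0, 3.0, size=296)
X, target = make_pairs(u, y)
print(X.shape, target.shape)   # (295, 2) (295,)
```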

Table 3 shows the results of the network test on this application. The RRBF network gives a prediction with an average error of about 8%. The training process takes one iteration.

5. Discussion

The Recurrent Radial Basis Function network presented in this article was successfully validated on the two time-series prediction problems. Figs. 8 and 9 show the prediction results and errors of the RRBF for the Mackey-Glass series and the Logistic Map series. This dynamic aspect is obtained thanks to the looped input nodes (Fig. 3): the local output feedback provides the neuron with a dynamic memory (Fig. 5).

Table 3
Results of the RRBF test on the nonlinear system prediction

Nb    Min              Max              Moy1           Moy2          Dev Std1       Dev Std2
145   0.0067 (0.04%)   18.0235 (120%)   1.5274 (10%)   1.2441 (8%)   2.3267 (15%)   3.4950 (23%)

Nb represents the number of training points. The columns Min and Max give the minimal and maximal prediction errors. Moy1 is the average prediction error on the data outside the training population, and Moy2 the average error on all the data. Dev Std1 and Dev Std2 are the corresponding standard deviations without and with the training data. The percentages are given relative to the amplitude of the signal.

Fig. 10. (a) CO2 output concentration y(t) of the gas furnace and (b) input gas rate u(t) of the furnace.

We therefore do not have to use temporal windows to store or block the input data, as some neural architectures do: NETtalk, introduced by Sejnowski and Rosenberg (1986), the TDNN of Waibel et al. (1989) and the TDRBF of Berthold (1994). These temporal-window techniques have several disadvantages (Elman, 1990). First, the data must be blocked by an external mechanism: when should the data be presented to the network? The second disadvantage is the limitation of the temporal window dimension. Recurrent networks are not affected by these points. We have shown in Fig. 7 that the RRBF with four looped neurons is sensitive to a past of about 100 time steps. A second advantage of the RRBF is the flexibility of the training process. A two-stage learning algorithm was used: the first stage concerns the determination of the RBF parameters, and the second stage the calculation of the output weights. Only a few seconds are required to train the RRBF on a personal computer with a 700 MHz processor. The main difficulty is to find the best parameters that optimize the output result. These parameters are: the number of input looped neurons N > 0, the self-connection value w_ii > 0, the sigmoid function parameter k > 0, and the parameter of the first training stage 0 < θ < 1. In most cases, we can obtain good results with only one looped neuron (N = 1).

This input neuron is configured to have the longest memory, obtained with k w_ii = 2 (Fig. 5). The parameter k is chosen so as to give a quasi-linear aspect to the sigmoid function around the initial point (k ≈ 0.05). The last parameter to adjust is the first-stage training threshold θ. The results obtained by the RRBF show that the RCE algorithm does not rigorously calculate the parameters of the Gaussian nodes: the neural network is over-trained. This result is completely coherent, because all the data of the training set are stored as prototypes. Clustering techniques like the k-means algorithm, which minimizes the sum of squared errors (SSE) between the inputs and the hidden node centers, will certainly give better results than the RCE algorithm. However, these techniques also have some disadvantages. We have presented in our previous work an example that highlights them (Zemouri et al., 2002b):
* There is no formal method for specifying the number of hidden nodes.
* The nodes are initialized randomly, so several runs are needed to obtain the best result (a simple mitigation is sketched below).
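One simple mitigation, sketched below with plain NumPy (illustrative names), is to restart k-means from several random initializations and keep the partition with the lowest SSE; this is an assumption about common practice, not a method used in the paper.

```python
import numpy as np

def kmeans_best_of(X, n_clusters, n_restarts=10, n_iter=30, seed=0):
    """Run k-means from several random initializations and keep the
    partition with the lowest sum of squared errors (SSE)."""
    rng = np.random.default_rng(seed)
    best_sse, best_centers = np.inf, None
    for _ in range(n_restarts):
        centers = X[rng.choice(len(X), n_clusters, replace=False)]
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
            labels = d2.argmin(axis=1)
            centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(n_clusters)])
        sse = ((X - centers[labels]) ** 2).sum()
        if sse < best_sse:
            best_sse, best_centers = sse, centers
    return best_centers, best_sse
```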

Our future work will concern the development of a new method that boosts the performance of the k-means algorithm (Figs. 10-12).

Fig. 11. Comparison of the test results of the CO2 concentration prediction of the gas furnace with the real values (training and test populations).

Fig. 12. Prediction error of the RRBF network.

6. Conclusion

We have presented in this article an application of the RRBF network to three time-series prediction problems: the Mackey-Glass series, the Logistic Map and the Box & Jenkins gas furnace data. Thanks to its dynamic memory, the RRBF network is able to learn temporal sequences. This dynamic memory is obtained by a self-connection on the input neurons: the input data are not blocked by an external mechanism, but are memorized by the input neurons. The training time is relatively short; it takes one iteration for the RBF parameter calculation and a matrix multiplication for the output weight calculation. In the three examples, all the training data were correctly tested. The results obtained in the three time-series prediction applications represent a validation of the dynamic data treatment by the RRBF network.

References
Bernauer, E., 1996. Les réseaux de neurones et l'aide au diagnostic: un modèle de neurones bouclés pour l'apprentissage de séquences temporelles. Ph.D. Thesis, LAAS, France.
Berthold, M.R., 1994. A time delay radial basis function network for phoneme recognition. Proceedings of the International Conference on Neural Networks, Orlando, Vol. 7, pp. 4470-4473.
Berthold, M.R., Diamond, J., 1995. Boosting the performance of RBF networks with dynamic decay adjustment. In: Tesauro, G., Touretzky, D.S., Leen, T.K. (Eds.), Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA, pp. 521-528.
Box, G.E.P., Jenkins, G.M., 1970. Time Series Analysis, Forecasting and Control. Holden Day, San Francisco, pp. 532-533.

" re, M., Rault, A., Verge ! , M., 1990. Brunet, J., Jaume, D., Labarre ! tection et diagnostic de panes, Approche par mode ! lisation. De ! rie diagnostic et mainTraitement des nouvelles technologies/se tenance, edition hermes FRANCE. Chappelier, J.C., 1996. RST: une architecture connexionniste pour la prise en compte de relations spatiales et temporelles. Ph.D. Thesis, ! rieure des Te ! le ! communications/France. Ecole Nationale Supe Chiu, S., 1994. Fuzzy model identication based on cluster estimation. Journal of Intelligent & Fuzzy Systems 2 (3), 267278. " mes a " Combacau, M., 1991. Commande et surveillance des syste ! ve ! nements discrets complexes: application aux ateliers exibles. e Ph.D. Thesis, University of.Sabatier Toulouse, France. Dash, S., Venkatasubramanian, V., 2000. Challenges in the industrial applications of fault diagnostic systems. Proceedings of the Conference on Process Systems Engineering Computing and Chemical Engineering 24(27). Keystone, Colorado, pp. 785791. Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistic Society, Series B 39, 138. Elman, J.L., 1990. Finding Structure in Time. Cognitive Science 14, 179211. Frasconi, P., Gori, M., Maggini, M., Soda, G., 1995. Unied integration of explicit knowledge and learning by example in recurrent networks. IEEE Transactions on Knowledge and Data Engineering 7 (2), 340346. Ghosh, J., Nag, A., 2000. In: Howlett, R.J., Jain, L.C. (Eds.), Radial Basis Function Neural Network Theory and Applications. PhysicaVerlag, Wurzburg. Ghosh, J., Beck, S., Deuser, L., 1992. A neural network based hybrid system for detection, characterization and classication of shortduration oceanic signals. IEEE Journal of Ocean Engineering 17 (4), 351363. ! me de diagnostic par re ! seaux de neurones Hernandez, N.G., 1999. Syste " la de ! tection dhypovigilance dun et statistiques: application a conducteur automobile. Ph.D. Thesis, LAAS/France. Hudak, M.J., 1992. RCE classiers: theory and practice. Cybernetics and Systems 23, 483515. Jang, J.-S.R., 1993. ANFIS: adaptive-network-based fuzzy inference systems. IEEE Transactions on Systems, Man, and Cybernetics 23, 665685. Koivo, H.N, 1994. Articial neural networks in fault diagnosis and control. Control in Engineering Practice 2 (1), 89101. ! dure dapprentissage pour re ! seau a " seuil Le Cun, Y., 1985. Une proce ! trique. Cognitiva 85, 599604. asyme " la mode ! lisation des syste ! mes Lefebvre, D., 2000. Contribution a " e ! ve ! nements discrets pour la commande et la dynamiques a " Diriger des Recherches, Universite ! de surveillance. Habilitation a ! / IUT Belfort, Montbe ! liard/France. Franche Comte Mak, M.W., Kung, S.Y., 2000. Estimation of elliptical basis function parameters by the EM algorithms with application to speaker verication. IEEE Transactions on Neural Networks 11 (4), 961969. Michelli, C.A., 1986. Interpolation of scattered data: distance matrices and conditionally positive denite functions. Constructive Approximation 2, 1122. Moody, J., Darken, J., 1989. Fast learning in networks of locally tuned processing units. Neural Computation 1, 281294. Rengaswamy, R., Venkatasubramanian, V., 1995. A syntactic pattern recognition approach for process monitoring and fault diagnosis. Engineering Applications of Articial Intelligence Journal 8 (1), 3551. Rumelhart, D.E, Hinton, G.E., Williams, R.J., 1986. Learning internal representation by error propagation. In: Rumelhart, D.E., McClelland, J.L. 
(Eds.), Parallel Distributed Processing Explorations in the Microstructure of Cognition, Vol. 1. The MIT Press, Bradford Books, Cambridge, MA, pp. 318362.

Sejnowski, T.J., Rosenberg, C.R., 1986. NETtalk: a parallel network that learns to read aloud. Electrical Engineering and Computer Science Technical Report, The Johns Hopkins University.
Tsoi, A.C., Back, A.D., 1994. Locally recurrent globally feedforward networks: a critical review of architectures. IEEE Transactions on Neural Networks 5 (2), 229-239.
Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., Lang, K., 1989. Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech and Signal Processing 37 (3), 328-339.
Xu, L., 1998. RBF nets, mixture experts, and Bayesian Ying-Yang learning. Neurocomputing 19 (1-3), 223-257.
Zemouri, R., Racoceanu, D., Zerhouni, N., 2002a. Application of the dynamic RBF network in a monitoring problem of the production systems. 15th IFAC World Congress on Automatic Control, Barcelona, Spain.
Zemouri, R., Racoceanu, D., Zerhouni, N., 2002b. Réseaux de neurones récurrents à fonctions de base radiales (RRFR): application au pronostic. Revue d'Intelligence Artificielle, RSTI série RIA 16 (3), 307-338.
