
TRAINING AND EXTRACTION OF FUZZY FINITE STATE AUTOMATA IN

RECURRENT NEURAL NETWORKS

Rohitash Chandra and Christian W. Omlin


School of Science and Technology, University of Fiji
Faculty of Science and Technology, University of the South Pacific, Fiji Islands
rohitashc@unifiji.ac.fj, omlin_c@usp.ac.fj

ABSTRACT

We present a machine learning approach for the extraction of fuzzy finite-state automata (FFAs) from recurrent neural networks. After successful training on strings with fuzzy membership µ ∈ [0,1], we apply a generalisation of the Trakhtenbrot-Barzdin algorithm to extract FFAs in symbolic form from the trained network, using the string labelling assigned by the trained network. Our results demonstrate that the approach successfully extracts the correct deterministic equivalent automata for strings much shorter than the longest string in the training set.

KEY WORDS
Recurrent neural networks, knowledge extraction, fuzzy finite-state automata, Trakhtenbrot-Barzdin algorithm

1. Introduction

In the past, knowledge extraction from recurrent neural networks (RNNs) was aimed at finding the underlying models of the learned knowledge, typically in the form of finite-state machines. Recurrent neural networks have been trained on deterministic finite-state automata (DFAs), and knowledge extraction methods have been applied to explore the knowledge representation in the weights of the trained network. Recently, machine learning methods have been applied to the extraction of symbolic knowledge from recurrent neural networks. In this paper, we apply a machine learning method to induce FFAs from the input-output mappings of trained recurrent neural networks. We use the observed input-output behaviour of the trained network to label strings that were not part of the training set, i.e. we use the trained network's generalisation capability for string classification. The unlabelled strings, together with the corresponding output state labels newly assigned by the trained network, were used for inducing FFAs with the Trakhtenbrot-Barzdin algorithm; the input-output mapping thus becomes the means for knowledge extraction.

Recurrent neural networks have been an important focus of research as they can be applied to difficult problems involving time-varying patterns. Their applications range from speech recognition and financial prediction to gesture recognition [1,2,3]. Recurrent neural networks contain feedback connections. They are composed of an input layer, a context layer which provides the recurrence, a hidden layer and an output layer. Each layer contains one or many processing units called neurons, which propagate information from one layer to the next by computing a non-linear function of the weighted sum of their inputs. Recurrent neural networks maintain information about their past states for the computation of future states and outputs [4]. We use the backpropagation-through-time (BPTT) algorithm for training first-order recurrent neural networks [5]. Recurrent neural networks are nonlinear dynamical systems, and it has previously been shown that RNNs can represent FFA states [4]. Other popular recurrent neural network architectures include second-order recurrent networks [6], NARX networks [7] and LSTM [8]. A detailed study of the vast variety of recurrent neural networks is beyond the scope of this paper.

Fuzzy finite-state automata have been shown to be useful for modelling dynamical fuzzy systems in conjunction with recurrent neural networks [9-13]. Design methods for constructing fuzzy finite-state automata with RNNs include the use of a linear output layer for computing the fuzzy string membership [14] and assigning multiple output neurons, one for every distinct membership value in the training set [4]. In the latter case, the number of neurons in the output layer is equal to the number of distinct fuzzy membership values that occur in the training set, as shown in Figure 1.
Knowledge extraction from recurrent neural networks aims at finding or building a formal computational model that mimics the network to a certain degree [15]. A recurrent neural network processes information by using its internal continuous state space as an implicit memory of past input patterns [5]. The two major forms of knowledge extraction methods are: 1) observation of the internal states of the network, which includes algorithms that construct clusters corresponding to the finite-state behaviour of the network [16], and 2) machine learning methods, where the extraction relies solely on the input-output behaviour of the network and the technique is not confined to a particular neural network architecture [17].

Figure 1: Recurrent Neural Network Architecture. The fuzzy output state membership is represented by multiple neurons in the output layer; their number is equal to the number of distinct fuzzy memberships in the training set.

2. Definitions and Methods

2.1 First Order Recurrent Neural Network

The dynamics of the hidden state neuron activations in first-order recurrent neural networks are given by:

$$S_i^{t+1} = g\left( \sum_{k=1}^{K} I_k V_{ik} + \sum_{j=1}^{J} S_j^{t} w_{ij} \right)$$

where $I_k$ is the $k$-th input neuron, $S_i^{t+1}$ is the output value of recurrent state neuron $S_i$ at time $t+1$, and $V_{ik}$ and $w_{ij}$ are the weights connecting input neuron $I_k$ and recurrent state neuron $S_j$ to recurrent state neuron $S_i$, respectively; $g(\cdot)$ is a sigmoidal discriminant function.
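For illustration, one step of this state update can be sketched as follows (a minimal NumPy sketch under our own assumptions; the dimensions, random weights and variable names are illustrative and not taken from the paper):

```python
import numpy as np

def sigmoid(x):
    # sigmoidal discriminant function g(.)
    return 1.0 / (1.0 + np.exp(-x))

def rnn_step(state, inputs, V, W):
    """One first-order recurrent update: S^{t+1} = g(V I + W S^t).

    state:  (J,)   recurrent state neuron activations S^t
    inputs: (K,)   input neuron values I
    V:      (J, K) input-to-state weights V_ik
    W:      (J, J) state-to-state (context) weights w_ij
    """
    return sigmoid(V @ inputs + W @ state)

# Example: 2 input neurons (one-hot symbols 0 and 1) and 10 recurrent state neurons.
rng = np.random.default_rng(0)
V = rng.normal(scale=0.5, size=(10, 2))
W = rng.normal(scale=0.5, size=(10, 10))
state = np.zeros(10)
for symbol in [1, 0, 1]:          # process the string "101"
    state = rnn_step(state, np.eye(2)[symbol], V, W)
```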
2.2 Regular Languages and Finite Automata

A finite-state automaton is a device that can be in one of a finite number of states. Under certain conditions it can switch to another state; this is called a transition. When the automaton starts processing input, it can be in one of its initial states. There is also another important subset of states of the automaton: the final states. If the automaton is in a final state after processing an input sequence, it is said to accept its input. Finite-state automata are used as test beds for training recurrent neural networks. Strings used for training do not need to undergo any feature extraction; they are used to show that recurrent neural networks can represent dynamical systems.

In this section, we discuss the transformation of fuzzy finite-state automata into their equivalent crisp representations, known as deterministic acceptors.

Definition 2.1: A fuzzy finite-state automaton M is a 6-tuple M = (Σ, Q, R, Z, δ, ω), where Σ and Q are the input alphabet and the finite set of states, respectively, R ∈ Q is the automaton's fuzzy start state, Z is a finite output alphabet, δ: Σ × Q × [0,1] → Q is the fuzzy transition map, and ω: Q → Z is the output map.

We consider a restricted type of fuzzy automaton whose initial state is not fuzzy and whose ω is a function from F to Z, where F is a non-fuzzy set of states, called final states. Any fuzzy automaton as described in Definition 2.1 is equivalent to a restricted fuzzy automaton [18]. Notice that an FFA reduces to a conventional DFA by restricting the transition weights to 1.

Figures 2 and 3 show examples of an FFA and its corresponding deterministic acceptor, respectively.

Figure 2: Fuzzy Finite-State Automaton with Weighted State Transitions. State 1 is the automaton's start state; accepting states are drawn with double circles. Only paths that can lead to an accepting state are shown (transitions to the garbage state are not shown explicitly). A transition from state qj to qi on input symbol ak with weight θ is represented as a directed arc from qj to qi labelled ak/θ.

Figure 3: Equivalent Deterministic Acceptor. The diagram is the deterministic acceptor corresponding to the FFA in Figure 2; it computes the membership function for strings. The accepting states are labelled with a degree of membership. Notice that all transitions in the DFA have weight 1.
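To make the notion of a deterministic acceptor concrete, the following sketch shows one way such an acceptor could be represented and queried for the fuzzy membership of a string (the transition table and membership values below are hypothetical and are not the automaton of Figures 2 and 3):

```python
# Hypothetical deterministic acceptor over the alphabet {0, 1}.
# All transitions are crisp (weight 1); accepting states carry a membership degree.
transitions = {                      # (state, symbol) -> next state
    (1, '0'): 2, (1, '1'): 3,
    (2, '0'): 2, (2, '1'): 4,
    (3, '0'): 4, (3, '1'): 3,
    (4, '0'): 4, (4, '1'): 4,
}
membership = {2: 0.3, 3: 0.7, 4: 1.0}   # accepting state -> degree of membership
start_state = 1

def string_membership(string):
    """Run the acceptor and return the fuzzy membership of the string (0.0 if rejected)."""
    state = start_state
    for symbol in string:
        state = transitions.get((state, symbol))
        if state is None:                 # implicit transition to the garbage state
            return 0.0
    return membership.get(state, 0.0)

print(string_membership("011"))           # -> 1.0 for this hypothetical acceptor
```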

2.3 Knowledge Extraction from Recurrent Neural Networks

Neural networks were once considered black boxes, i.e. it was believed to be difficult to understand the knowledge, or its representation in the weights, underlying the information processing in the network. Knowledge extraction is the process of finding the meaning of the internal weight representation of the network. There are two main approaches to knowledge extraction from trained recurrent neural networks: inducing finite automata by clustering the activation values of hidden state neurons [19], and applying machine learning methods to induce automata from the observed input-output mappings of the recurrent neural network. In this paper, we introduce a generalisation of a machine learning method for symbolic knowledge extraction; whereas the original method was conceived to extract DFAs from trained recurrent neural networks, the new method allows the extraction of fuzzy finite-state automata.
2.4 Trakhtenbrot-Barzdin Algorithm

The Trakhtenbrot-Barzdin algorithm [20] extracts minimal DFAs from a data set of input strings and their respective output states. The algorithm is guaranteed to induce a DFA in polynomial time. A restricting premise of the TB algorithm is that we must know the labels of all strings up to a certain length L if the desired DFA is to be extracted correctly. In this paper, we apply the Trakhtenbrot-Barzdin algorithm to the extraction of FFAs instead of DFAs from trained recurrent neural networks and induce a minimal-state automaton. The algorithm requires that the labels of all strings up to length L be known in order for a DFA to be extracted unambiguously; these strings can be represented in a so-called prefix tree. The prefix tree is collapsed into a smaller graph by merging all pairs of nodes that represent compatible mappings from suffixes to labels. The algorithm visits all nodes of the prefix tree in breadth-first order; each pair (i, j) of nodes is evaluated for compatibility by comparing the subtrees rooted at i and j, respectively. The subtrees are compatible if the labels of nodes in corresponding positions in the respective trees are identical. If all corresponding labels are the same, then the edge from i's parent to i is changed to point at j instead. Nodes which become inaccessible are discarded. The result is the smallest automaton consistent with the data set. The original Trakhtenbrot-Barzdin algorithm only operated on prefix trees whose nodes carry binary labels; here, we generalise the algorithm by allowing the node labels of the prefix tree to correspond to the fuzzy memberships that occur in the training set. The algorithm is summarised as follows:

Algorithm 1: Let T be a complete prefix tree with n nodes 1, …, n

    for i = 1 to n by 1 do
        for j = 1 to i - 1 by 1 do
            if subtree(i) ≡ subtree(j) then
                redirect the edge from parent(i) to i so that it points at j
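The merging step of Algorithm 1, generalised to fuzzy labels, can be sketched as follows (our own illustration, not the authors' code; the prefix tree is assumed to be stored as nested dictionaries, and for brevity the sketch does not skip nodes that have already become unreachable):

```python
from collections import deque

def make_node(label=None):
    # a prefix-tree node: fuzzy membership label plus symbol -> child map
    return {"label": label, "children": {}}

def compatible(a, b):
    """Subtrees are compatible if their labels agree at every position present in both."""
    if a["label"] is not None and b["label"] is not None and a["label"] != b["label"]:
        return False
    return all(compatible(a["children"][s], b["children"][s])
               for s in set(a["children"]) & set(b["children"]))

def tb_merge(root):
    """Collapse a fuzzy-labelled prefix tree by merging compatible subtrees (Algorithm 1)."""
    order, parent = [], {}                     # breadth-first node order and parent edges
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for sym, child in node["children"].items():
            parent[id(child)] = (node, sym)
            queue.append(child)
    merged = set()
    for i, node_i in enumerate(order):
        for node_j in order[:i]:
            if id(node_j) in merged:
                continue
            if compatible(node_i, node_j):
                p, sym = parent[id(node_i)]
                p["children"][sym] = node_j    # redirect the edge from i's parent to j
                merged.add(id(node_i))
                break
    return root                                # the collapsed graph is the extracted automaton
```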
Using the above algorithm, we generate a complete prefix tree for strings of length up to L = 1. For each successive string length, we extract an FFA from the corresponding prefix tree of increasing depth using the Trakhtenbrot-Barzdin algorithm. The algorithm then checks whether the extracted FFA is consistent with the entire training data set, i.e. whether the input-output labels of the entire training data set are explained by the extracted FFA. If the FFA is consistent, the algorithm terminates. The algorithm for running the extraction process is summarised as follows:

Algorithm 2: Let M be the universal FFA that rejects (or accepts) all strings, and let string length L = 0.

    Repeat
        L ← L + 1
        Generate all strings of length up to L
        Extract M_L using Trakhtenbrot-Barzdin(L)
    Until M_L is consistent with the training set
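Put together, the extraction loop of Algorithm 2 could look like the following sketch, which treats the trained network as an oracle for fuzzy string memberships (the function and parameter names are our own, and make_node and tb_merge refer to the merging sketch above):

```python
from itertools import product

def all_strings(alphabet, max_len):
    """All strings over the alphabet of length 1..max_len."""
    for length in range(1, max_len + 1):
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)

def build_prefix_tree(labels):
    """Complete prefix tree from {string: fuzzy membership}, using make_node from above."""
    root = make_node()
    for string, mu in labels.items():
        node = root
        for sym in string:
            node = node["children"].setdefault(sym, make_node())
        node["label"] = mu
    return root

def automaton_membership(automaton, string):
    """Membership the collapsed graph assigns to a string (None if the path is undefined)."""
    node = automaton
    for sym in string:
        node = node["children"].get(sym)
        if node is None:
            return None
    return node["label"]

def extract_ffa(network_membership, training_set, alphabet=("0", "1"), max_depth=15):
    """Grow the prefix tree until the extracted FFA explains every training string.

    network_membership(s): fuzzy membership the trained RNN assigns to string s (oracle)
    training_set: dict mapping training strings to their target fuzzy memberships
    """
    L = 0
    while L < max_depth:
        L += 1
        labels = {s: network_membership(s) for s in all_strings(alphabet, L)}
        ffa = tb_merge(build_prefix_tree(labels))
        if all(automaton_membership(ffa, s) == mu for s, mu in training_set.items()):
            return ffa, L
    raise RuntimeError("no consistent FFA found up to max_depth")
```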
3. Experimentation

3.1 Training Recurrent Neural Networks

The network topology consisted of two input neurons, one for each of the input symbols 0 and 1, ten hidden neurons, and four output neurons, where each output neuron represents one of the distinct fuzzy memberships that occur in the training data set. The labels of the training strings were generated by the automaton shown in Figure 3. We used an incremental training approach in which we initialised a working set with a subset of short strings and trained the network on it; we then expanded the working set and repeated these steps until the network converged.

Upon successful training, i.e. once the network correctly classified 100% of the training set of FFA strings of length 1-10, we tested the network on all strings of length 1-15. We observed that the network was able to correctly classify 100% of the testing data. This clearly demonstrates that recurrent neural networks are capable of representing FFA states internally.
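Since one output neuron is assigned to each distinct fuzzy membership, one way to decode a prediction is to pick the most active output neuron and return its associated membership value. The sketch below assumes this winner-take-all decoding and hypothetical membership values (the paper specifies neither):

```python
import numpy as np

MEMBERSHIP_VALUES = [0.2, 0.4, 0.7, 1.0]   # assumed: one value per output neuron

def outputs_to_membership(output_activations):
    """Winner-take-all decoding of the four output neurons into a fuzzy membership."""
    winner = int(np.argmax(output_activations))
    return MEMBERSHIP_VALUES[winner]

print(outputs_to_membership([0.1, 0.8, 0.05, 0.2]))   # -> 0.4
```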
3.2 Knowledge Extraction

As discussed in the previous section, we applied the knowledge extraction method in which the extraction depends on the input-output mapping of FFA strings obtained through the generalisation made by the trained network. Upon successful training and testing of the network, we proceeded with the knowledge extraction step in order to identify the knowledge represented in the weights of the network. We recorded the prediction made by the network for strings of increasing length, as discussed in Section 2.2. We transformed the string classification represented by the four output neurons into the corresponding fuzzy string membership values.

Once the network's classification had been recorded for all strings in the data set, the data set of input-output mappings made by the network was ready for FFA induction. A prefix tree was built from the samples of input strings and their corresponding fuzzy string membership values. We applied the FFA extraction algorithm as shown in Algorithm 2; for each string length L, we recorded the extracted FFA's fuzzy string classification performance on the training set.
4. Results and Discussion

The classification performance of the FFAs extracted for increasing string length L is shown in Table 1. We note that the FFAs extracted from lengths L = 2, …, 4 show 0% accuracy. The fuzzy string classification accuracy jumps to 92.8% for L = 5 and remains at 100% for all prefix trees of depth greater than 5.

We also ran experiments where the training set consisted of 50%, 30% and 10% of all strings up to length 10, i.e. the training data itself certainly no longer embodied the knowledge about the fuzzy membership values necessary in order to induce FFAs. In this case, the trained network had to rely on its generalisation capability in order to assign the fuzzy membership values missing from the prefix tree. Our experiments show that, even when "holes" are present in the training set, it is possible to extract the ideal deterministic finite-state acceptor by making use of the neural network for the missing fuzzy membership values.

    String length    Percentage consistent with the training set
    2                0%
    3                0%
    4                0%
    5                92.8%
    6                100%
    7                100%
    8                100%
    9                100%
    10               100%

Table 1: FFA Induction. The table shows the percentage of the entire training set that is consistent with the FFA extracted at each string length.

5. Conclusions

We have successfully trained recurrent neural networks on crisp representations of FFAs. We have seen that recurrent neural networks can represent FFAs in the internal state representation of their weights. We used a network architecture with a number of output neurons equal to the number of distinct fuzzy outputs of the FFA. This particular architecture was successful in training and gave 100% generalisation on a large testing data set. We used the Trakhtenbrot-Barzdin algorithm for knowledge extraction. This method has proven favourable and, in general, can be applied to various recurrent network architectures without constraints. The knowledge extraction results show that the ideal deterministic acceptor for an FFA could be extracted from prefix trees of depth at least 6, i.e. the extracted acceptor could explain the entire training data. In this paper, we have shown that the Trakhtenbrot-Barzdin algorithm can induce crisp representations of FFAs in a similar way to how it has been applied to DFA induction. We have also demonstrated that recurrent neural networks can represent fuzzy finite-state automata in similar ways to how they represent deterministic finite-state automata.
References

[1] A. J. Robinson, An application of recurrent nets to phone probability estimation, IEEE Transactions on Neural Networks, 5(2), 1994, 298-305.

[2] C. L. Giles, S. Lawrence & A. C. Tsoi, Rule inference for financial prediction using recurrent neural networks, Proc. of the IEEE/IAFE Computational Intelligence for Financial Engineering, New York City, USA, 1997, 253-259.

[3] K. Murakami & H. Taguchi, Gesture recognition using recurrent neural networks, Proc. of the SIGCHI Conference on Human Factors in Computing Systems: Reaching Through Technology, Louisiana, USA, 1991, 237-242.

[4] C. L. Giles, C. W. Omlin & K. Thornber, Equivalence in knowledge representation: Automata, recurrent neural networks, and dynamical systems, Proc. of the IEEE, 87(9), 1999, 1623-1640.

[5] J. L. Elman, Finding structure in time, Cognitive Science, 14, 1990, 179-211.

[6] R. L. Watrous & G. M. Kuhn, Induction of finite-state languages using second-order recurrent networks, Proc. of Advances in Neural Information Processing Systems, California, USA, 1992, 309-316.

[7] T. Lin, B. G. Horne, P. Tino & C. L. Giles, Learning long-term dependencies in NARX recurrent neural networks, IEEE Transactions on Neural Networks, 7(6), 1996, 1329-1338.

[8] S. Hochreiter & J. Schmidhuber, Long short-term memory, Neural Computation, 9(8), 1997, 1735-1780.

[9] E. B. Kosmatopoulos & M. A. Christodoulou, Neural networks for identification of fuzzy dynamical systems: An application to identification of vehicle highway systems, Proc. of the 4th IEEE Mediterranean Symposium on New Directions in Control and Automation, 1996, 23-38.

[10] F. E. Cellier & Y. D. Pan, Fuzzy adaptive recurrent counterpropagation neural networks: A tool for efficient implementation of qualitative models of dynamic processes, J. Syst. Eng., 5(4), 1995, 207-222.

[11] E. B. Kosmatopoulos & M. A. Christodoulou, Structural properties of gradient recurrent high-order neural networks, IEEE Trans. Circuits Syst., 42, 1995, 592-603.

[12] E. B. Kosmatopoulos, M. M. Polycarpou, M. A. Christodoulou & P. A. Ioannou, High-order neural networks for identification of dynamical systems, IEEE Trans. Neural Networks, 6, 1995, 422-431.

[13] E. B. Kosmatopoulos & M. A. Christodoulou, Recurrent neural networks for approximation of fuzzy dynamical systems, Int. J. Intell. Control Syst., 1(2), 1996, 223-233.

[14] C. W. Omlin, K. K. Thornber & C. L. Giles, Fuzzy finite-state automata can be deterministically encoded into recurrent neural networks, IEEE Trans. Fuzzy Syst., 6, 1998, 76-89.

[15] H. Jacobsson, Rule extraction from recurrent neural networks: A taxonomy and review, Neural Computation, 17(6), 2005, 1223-1263.

[16] S. Das & R. Das, Induction of discrete state-machine by stabilising a continuous recurrent neural network using clustering, Journal of Computer Science and Informatics, 21(2), 1991, 35-40.

[17] A. Vahed & C. W. Omlin, Rule extraction from recurrent neural networks using a symbolic machine learning algorithm, Proc. of the 6th International Conference on Neural Information Processing, Dunedin, New Zealand, 1999, 712-717.

[18] D. Dubois & H. Prade, Fuzzy sets and systems: Theory and applications, Mathematics in Science and Engineering, 14, 1980, 220-226.

[19] C. Omlin & C. L. Giles, Extraction of rules from discrete-time recurrent neural networks, Journal of the ACM, 43(6), 1996, 937-972.

[20] B. Trakhtenbrot & Y. Barzdin, Finite Automata: Behaviour and Synthesis, North-Holland, Amsterdam, 1973.
