2. Definitions and Methods

Figures 2 and 3 show examples of an FFA and its corresponding deterministic acceptor, respectively.

2.1 First Order Recurrent Neural Network

S_i(t+1) = g( Σ_{k=1}^{K} I_k V_ik + Σ_{j=1}^{J} S_j(t) w_ij )

where I_k is the k-th input neuron, S_i(t+1) is the output value of recurrent state neuron S_i at time t+1, V_ik and w_ij are the weights connecting input neuron I_k and state neuron S_j, respectively, to state neuron S_i, and g is the activation function.

Figure 2: Fuzzy Finite-State Automaton with Weighted State Transitions. State 1 is the automaton's start state; accepting states are drawn with double circles. Only paths that can lead to an accepting state are shown (transitions to a garbage state are not shown explicitly). A transition from state q_j to q_i on input symbol a_k with weight θ is represented as a directed arc from q_j to q_i labelled a_k/θ.
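The first-order recurrent state update of Section 2.1 can be sketched in plain Python. This is an illustrative rendering only: the sigmoid activation and the toy dimensions are our own assumptions, not taken from the paper.

```python
import math

def sigmoid(x):
    # g: a common choice of activation function for first-order recurrent networks
    return 1.0 / (1.0 + math.exp(-x))

def state_update(S, I, V, w):
    """One step of the first-order recurrent update:
    S_i(t+1) = g( sum_k I_k * V_ik + sum_j S_j(t) * w_ij )"""
    return [
        sigmoid(
            sum(I[k] * V[i][k] for k in range(len(I)))      # input term
            + sum(S[j] * w[i][j] for j in range(len(S)))    # recurrent term
        )
        for i in range(len(S))
    ]

# Toy example: 2 input neurons (one-hot symbols 0 and 1), 3 state neurons.
S = [0.0, 0.0, 1.0]                                 # S(t): current state activations
I = [1.0, 0.0]                                      # one-hot encoding of input symbol "0"
V = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]          # input weights V_ik
w = [[0.2, -0.1, 0.05], [0.0, 0.4, -0.3], [0.6, 0.1, 0.2]]  # recurrent weights w_ij
S_next = state_update(S, I, V, w)                   # S(t+1), one value per state neuron
```

Each state neuron combines the current input with the previous state vector and squashes the sum through g, so the state vector at time t+1 encodes the history of the input string read so far.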
In this paper, we apply the Trakhtenbrot-Barzdin algorithm to the extraction of FFAs instead of DFAs from trained recurrent neural networks, and induce a minimal automaton. The algorithm requires that the labels of all strings up to length L be known in order for a DFA to be extracted unambiguously; these strings can be represented in a so-called prefix tree. This prefix tree is collapsed into a smaller graph by merging all pairs of nodes that represent compatible mappings from suffixes to labels. The algorithm visits all nodes of the prefix tree in breadth-first order; each pair (i, j) of nodes is evaluated for compatibility by comparing the subtrees rooted at i and j, respectively. The subtrees are compatible if the labels of the nodes in corresponding positions in the respective trees are identical. If all corresponding labels are the same, the edge from i's parent to i is changed to point at j instead. Nodes which become inaccessible are discarded. The result is the smallest automaton consistent with the data set. The original Trakhtenbrot-Barzdin algorithm operated only on prefix trees whose nodes contain binary labels; here, we generalize the algorithm to allow the labels of nodes in the prefix tree to correspond to the fuzzy memberships that occur in the training set.
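The prefix-tree construction and merging step described above can be sketched in Python. This is our own illustrative code, not the authors' implementation; in particular, the compatibility test here compares labels only at positions present in both subtrees, which is one way to read the compatibility condition for trees of unequal depth.

```python
from collections import deque

class Node:
    def __init__(self):
        self.label = None       # fuzzy membership of the string reaching this node
        self.children = {}      # input symbol -> child Node

def build_prefix_tree(samples):
    """Build a prefix tree from (string, membership) pairs."""
    root = Node()
    for string, mu in samples:
        node = root
        for sym in string:
            node = node.children.setdefault(sym, Node())
        node.label = mu
    return root

def compatible(a, b):
    """Subtrees are compatible if node labels agree at corresponding positions."""
    if a.label != b.label:
        return False
    common = set(a.children) & set(b.children)
    return all(compatible(a.children[s], b.children[s]) for s in common)

def tb_merge(root):
    """Visit nodes breadth-first; redirect each node's incoming edge to the
    earliest compatible node kept so far, discarding the merged subtree."""
    kept = [root]
    queue = deque((child, root, sym) for sym, child in root.children.items())
    while queue:
        node, parent, sym = queue.popleft()
        target = next((k for k in kept if compatible(node, k)), None)
        if target is not None:
            parent.children[sym] = target       # edge from parent now points at target
            continue                            # node's subtree becomes inaccessible
        kept.append(node)
        queue.extend((child, node, s) for s, child in node.children.items())
    return kept                                 # states of the induced automaton
```

For example, labelling every string over {0, 1} up to length 3 by whether it ends in "1" and merging the resulting complete prefix tree collapses it to a two-state acceptor, with loops created by edges redirected back to earlier states.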
The algorithm is summarised as follows:

Algorithm 1: Let T be a complete prefix tree with n nodes 1,…,n
for i = 1 to n by 1 do
    for j = 1 to i - 1 by 1 do
        if subtree(i) ≡ subtree(j) then
            redirect the edge from parent(i) to i so that it points at j

Figure 3: Equivalent Deterministic Acceptor. The diagram shows the deterministic acceptor corresponding to the FFA in Figure 2, which computes the membership function of strings. The accepting states are labelled with a degree of membership. Notice that all transitions in the DFA have weight 1.

2.3 Knowledge Extraction from Recurrent Neural Networks
Neural networks were once considered black boxes, i.e. it was believed to be difficult to understand the knowledge, or its representation in the weights, as part of the information processing in the network. Knowledge extraction is the process of finding the meaning of the internal weight representation of the network. There are two main approaches for knowledge extraction from trained recurrent neural networks: inducing finite automata by clustering the activation values of hidden state neurons [19], and applying machine learning methods to induce automata from the observed input-output mappings of the recurrent neural network. In this paper, we introduce a generalization of a machine learning method for symbolic knowledge extraction; whereas the original method was conceived to extract DFAs from trained recurrent neural networks, this new method allows the extraction of fuzzy finite-state automata.

2.4 Trakhtenbrot-Barzdin Algorithm

The Trakhtenbrot-Barzdin algorithm [20] extracts minimal DFAs from a data set of input strings and their respective output states. The algorithm is guaranteed to induce a DFA in polynomial time. A restricting premise of the algorithm is that we must know the labels of all strings up to a certain length L if the desired DFA is to be extracted correctly.

Using the above algorithm, we generate a complete prefix tree for strings of up to length L=1. For each successive string length, we extract an FFA from the corresponding prefix tree of increasing depth using the Trakhtenbrot-Barzdin algorithm. The algorithm checks whether the extracted FFA is consistent with the entire training data set, i.e. the input-output labels of the entire training data set must be explained by the extracted FFA. If the FFA is consistent, the algorithm terminates. The extraction process is summarised as follows:

Algorithm 2: Let M be the universal FFA that rejects (or accepts) all strings, and let string length L = 0.
Repeat
    L ← L + 1
    Generate all strings of length up to L
    Extract M_L using Trakhtenbrot-Barzdin(L)
Until M_L is consistent with the test set
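The iterative-deepening loop of Algorithm 2 can be sketched as follows. The names `label_fn`, `induce` and `consistent` are illustrative stand-ins of our own (for the trained network's membership prediction, the Trakhtenbrot-Barzdin induction step, and the consistency check, respectively), not names from the paper.

```python
from itertools import product

def extract_automaton(label_fn, induce, consistent, alphabet=("0", "1"), max_len=15):
    """Iterative-deepening extraction: grow the labelled string set one length
    at a time, induce an automaton M_L, and stop once M_L explains the data."""
    strings = [""]
    for L in range(1, max_len + 1):
        strings += ["".join(p) for p in product(alphabet, repeat=L)]
        samples = [(s, label_fn(s)) for s in strings]   # labels queried from the network
        M = induce(samples)                             # e.g. the Trakhtenbrot-Barzdin merge
        if consistent(M):
            return L, M
    return None, None
```

Because the network can be queried for any string, the labelled set is complete up to depth L at every iteration, which is exactly the precondition the Trakhtenbrot-Barzdin algorithm needs.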
3. Experimentation

3.1 Training Recurrent Neural Networks

The network topology consisted of two input neurons, one for each of the input symbols 0 and 1, ten hidden neurons, and four output neurons, where each output neuron represents one of the fuzzy membership values that occur in the training data set. The labels of the training strings were generated by the automaton shown in Figure 3. We used an incremental training approach in which we initialised a working set with a subset of short strings and trained the network on it. We then expanded the working set and repeated these steps until the network converged.

Upon successful training, i.e. when the network correctly classified 100% of the training set of FFA strings of lengths 1-10, we tested the network on all strings of lengths 1-15. We observed that the network was able to correctly classify 100% of the testing data. This clearly demonstrates that recurrent neural networks are capable of representing FFA states internally.

3.2 Knowledge Extraction

As discussed in the previous section, we applied the knowledge extraction method in which the extraction depends on the input-output mappings of the FFA strings obtained through the generalisation made by the trained network. Upon successful training and testing of the network, we proceeded with knowledge extraction in order to identify the knowledge represented in the weights of the network. We recorded the predictions made by the network for increasing string lengths as discussed in Section 2.2. We transformed the string classification represented by the four output neurons into the corresponding fuzzy string membership values.

Once the network's predictions had been recorded for all strings, the data set of input-output mappings made by the network was ready for FFA induction. A prefix tree was built from the samples of input strings and their corresponding fuzzy string membership values. We applied the FFA extraction algorithm as shown in Algorithm 2; for each string length L, we recorded the extracted FFA's fuzzy string classification performance on the training set.

4. Results and Discussion

The recorded classification performance of the FFAs extracted with increasing string length L is shown in Table 1. We note that the FFAs extracted from lengths L=2,…,4 show 0% accuracy. The fuzzy string classification accuracy jumps to 92.8% for L=5 and remains at 100% for all prefix trees with depth larger than L=5.

String Length    Percentage correctly consistent with testing set
2                0%
3                0%
4                0%
5                92.8%
6                100%
7                100%
8                100%
9                100%
10               100%

Table 1: FFA Induction. The table above shows the percentage of data consistent with the entire training set for the corresponding string lengths.

We also ran experiments where the training set consisted of 50%, 30% and 10% of all strings up to length 10, i.e. the training data itself certainly no longer embodied the knowledge about the fuzzy membership values necessary in order to induce FFAs. In this case, the trained network had to rely on its generalization capability in order to assign the fuzzy membership values missing in the prefix tree. Our experiments show that, even when "holes" are present in the training set, it is possible to extract the ideal deterministic finite-state acceptor by making use of the neural network for the missing fuzzy membership values.

5. Conclusions

We have successfully trained crisp representations of FFAs with recurrent neural networks. We have seen that recurrent neural networks can represent FFAs in the internal state representation of their weights. We used a network architecture with the number of output neurons equal to the number of distinct fuzzy outputs in the FFA. This particular architecture was successful in training and gave 100% generalisation prediction on a large testing data set. We used the Trakhtenbrot-Barzdin algorithm for knowledge extraction. This method has proven favourable and can in general be applied to various recurrent network architectures without constraints. The knowledge extraction results show that the ideal deterministic acceptor for an FFA could be extracted from prefix trees of depth at least 6, i.e. the extracted acceptor could explain the entire training data. In this paper, we have shown that the Trakhtenbrot-Barzdin algorithm can induce crisp representations of FFAs in a similar way to how it has been applied to DFA induction. It also demonstrates that recurrent neural networks can represent fuzzy finite-state automata in similar ways to how they represent deterministic finite-state automata.
References

[1] A.J. Robinson, An application of recurrent nets to phone probability estimation, IEEE Transactions on Neural Networks, 5(2), 1994, 298-305.

[2] C.L. Giles, S. Lawrence & A.C. Tsoi, Rule inference for financial prediction using recurrent neural networks, Proc. of the IEEE/IAFE Computational Intelligence for Financial Engineering, New York City, USA, 1997, 253-259.

[3] K. Murakami & H. Taguchi, Gesture recognition using recurrent neural networks, Proc. of the SIGCHI Conference on Human Factors in Computing Systems: Reaching Through Technology, Louisiana, USA, 1991, 237-242.

[4] C.L. Giles, C.W. Omlin & K.K. Thornber, Equivalence in knowledge representation: Automata, recurrent neural networks, and dynamical systems, Proc. of the IEEE, 87(9), 1999, 1623-1640.

[5] J.L. Elman, Finding structure in time, Cognitive Science, 14, 1990, 179-211.

[6] R.L. Watrous & G.M. Kuhn, Induction of finite-state languages using second-order recurrent networks, Proc. of Advances in Neural Information Processing Systems, California, USA, 1992, 309-316.

[7] T. Lin, B.G. Horne, P. Tino & C.L. Giles, Learning long-term dependencies in NARX recurrent neural networks, IEEE Transactions on Neural Networks, 7(6), 1996, 1329-1338.

[8] S. Hochreiter & J. Schmidhuber, Long short-term memory, Neural Computation, 9(8), 1997, 1735-1780.

[9] E.B. Kosmatopoulos & M.A. Christodoulou, Neural networks for identification of fuzzy dynamical systems: An application to identification of vehicle highway systems, Proc. of the 4th IEEE Mediterranean Symposium, New Directions in Control and Automation, 1996, 23-38.

[…] networks for identification of dynamical systems, IEEE Trans. Neural Networks, 6, 1995, 422-431.

[13] E.B. Kosmatopoulos & M.A. Christodoulou, Recurrent neural networks for approximation of fuzzy dynamical systems, Int. J. Intell. Control Syst., 1(2), 1996, 223-233.

[14] C.W. Omlin, K.K. Thornber & C.L. Giles, Fuzzy finite state automata can be deterministically encoded into recurrent neural networks, IEEE Trans. Fuzzy Syst., 6, 1998, 76-89.

[15] H. Jacobsson, Rule extraction from recurrent neural networks: A taxonomy and review, Neural Computation, 17(6), 2005, 1223-1263.

[16] S. Das & R. Das, Induction of discrete state-machine by stabilising a continuous recurrent neural network using clustering, Journal of Computer Science and Informatics, 21(2), 1991, 35-40.

[17] A. Vahed & C.W. Omlin, Rule extraction from recurrent neural networks using a symbolic machine learning algorithm, Proc. of the 6th International Conference on Neural Information Processing, Dunedin, New Zealand, 1999, 712-717.

[18] D. Dubois & H. Prade, Fuzzy sets and systems: Theory and applications, Mathematics in Science and Engineering, 14, 1980, 220-226.

[19] C.W. Omlin & C.L. Giles, Extraction of rules from discrete-time recurrent neural networks, Journal of the ACM, 43(6), 1996, 937-972.

[20] B. Trakhtenbrot & Y. Barzdin, Finite automata: Behaviour and synthesis, North-Holland, Amsterdam, 1973.