Abstract - We present a hybrid recurrent neural network architecture, inspired by hidden Markov models, which has been shown to learn and represent dynamical systems. We use genetic algorithms for training the hybrid architecture and show their contribution to speech phoneme classification. We use Mel frequency cepstral coefficient feature extraction methods on three pairs of phonemes obtained from the TIMIT speech database. We use hybrid recurrent neural networks for modelling each pair of phonemes. Our results demonstrate that the hybrid architecture can successfully model speech sequences.

Keywords: Genetic algorithms, Hidden Markov models, Phoneme classification and Recurrent neural networks.

1 Introduction

Recurrent neural networks have been an important focus of research as they can be applied to difficult problems involving time-varying patterns. Their applications range from speech recognition and financial prediction to gesture recognition [1]-[3]. Hidden Markov models, on the other hand, have been very popular in the field of speech recognition [4]. They have also been applied to other problems such as gesture recognition [5].

Recurrent neural networks are capable of modelling complicated recognition tasks. They have shown better accuracy than hidden Markov models in speech recognition on low-quality, noisy data. However, hidden Markov models have been shown to perform better in large-vocabulary speech recognition. One limitation of hidden Markov models in speech recognition is their assumption that the probability of being in a state at time t depends only on the previous state, i.e. the state at time t-1. This assumption is inappropriate for speech signals, where dependencies often extend through several states; nevertheless, hidden Markov models have performed extremely well for certain types of speech recognition. Recurrent neural networks are dynamical systems, and it has been shown that they can represent deterministic finite automata in their internal weight representations [6].

The structural similarity between hidden Markov models and recurrent neural networks is the basis for constructing the hybrid recurrent neural network architecture: the recurrence equation in the recurrent neural network resembles the forward-algorithm equation in hidden Markov models. The combination of the two paradigms into a hybrid system may provide better generalization and training performance, which would be a useful contribution to the fields of machine learning and pattern recognition. We have previously introduced a slight variation of this architecture and shown that it can represent dynamical systems [7]. In this paper, we show that the hybrid recurrent neural network architecture can be applied to model speech sequences. We use Mel frequency cepstral coefficients (MFCC) for feature extraction from the TIMIT speech database. We extract three different pairs of phonemes and use hybrid recurrent neural networks for modelling.
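For illustration, MFCC features for a phoneme segment can be computed with standard tools. The sketch below assumes the librosa library and a hypothetical audio excerpt; the file name and frame settings are illustrative, not those of our experiments (TIMIT audio would first be segmented into phoneme regions using its .phn annotations).

```python
import librosa

# Load a speech excerpt at TIMIT's 16 kHz sampling rate
# (file name is illustrative).
signal, rate = librosa.load("phoneme_excerpt.wav", sr=16000)

# 12 Mel frequency cepstral coefficients per frame, using 25 ms
# windows with 10 ms hops -- typical settings for phoneme modelling.
mfcc = librosa.feature.mfcc(y=signal, sr=rate, n_mfcc=12,
                            n_fft=400, hop_length=160)

print(mfcc.shape)  # (12, number_of_frames): one feature vector per frame
```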
Evolutionary optimization techniques such as genetic algorithms have been popular for training neural networks as an alternative to gradient descent learning [8]. It has been observed that genetic algorithms overcome the problem of local minima, whereas in gradient descent search for the optimal solution it may be difficult to drive the network out of a local minimum, which in turn proves costly in terms of training time. In this paper, we show how genetic algorithms can be used for training the hybrid recurrent neural network architecture inspired by hidden Markov models. We use genetic algorithms to train hybrid recurrent neural networks for the classification of phonemes extracted from the TIMIT speech database. We end this paper with conclusions from our work and possible directions for future research.
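To make the training procedure concrete, the sketch below shows genetic-algorithm weight optimization in broad strokes: each chromosome is a flat vector of candidate network parameters, fitness is supplied by the caller (e.g. negative classification error), and tournament selection with uniform crossover and Gaussian mutation produces each new generation. This is a minimal illustration with assumed operators and rates, not the exact configuration used in our experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve(fitness, dim, pop_size=50, generations=100,
           mutation_rate=0.1, mutation_scale=0.2):
    """Minimal real-coded genetic algorithm for weight optimization."""
    pop = rng.normal(0.0, 1.0, size=(pop_size, dim))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        children = []
        for _ in range(pop_size):
            # Tournament selection: the fitter of two random parents.
            a, b = rng.integers(pop_size, size=2)
            p1 = pop[a] if scores[a] >= scores[b] else pop[b]
            c, d = rng.integers(pop_size, size=2)
            p2 = pop[c] if scores[c] >= scores[d] else pop[d]
            # Uniform crossover followed by Gaussian mutation.
            mask = rng.random(dim) < 0.5
            child = np.where(mask, p1, p2)
            mutate = rng.random(dim) < mutation_rate
            child = child + mutate * rng.normal(0.0, mutation_scale, dim)
            children.append(child)
        pop = np.array(children)
    scores = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(scores)]

# Toy usage: maximize a fitness that peaks at the zero vector.
best = evolve(lambda w: -np.sum(w ** 2), dim=10)
```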
2 Definitions and Methods

2.1 Recurrent Neural Networks

Recurrent neural networks maintain information about their past states for the computation of future states and outputs by using feedback connections. They are composed of an input layer, a context layer which provides state information, a hidden layer and an output layer, as shown in Fig. 1. Each layer contains one or more processing units called neurons, which propagate information from one layer to the next by computing a non-linear function of their weighted sum of inputs.
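A minimal sketch of the forward computation of such a network follows (Elman-style, with the context layer holding a copy of the previous hidden state; the dimensions, bias terms and the sigmoid choice are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_forward(inputs, V, W, U, b_h, b_o):
    """Run an Elman-style recurrent network over an input sequence.

    inputs : (T, J) sequence of feature vectors
    V      : (K, K) context-to-hidden (recurrent) weights
    W      : (K, J) input-to-hidden weights
    U      : (M, K) hidden-to-output weights
    """
    K = V.shape[0]
    state = np.zeros(K)            # context layer: previous hidden state
    outputs = []
    for x in inputs:
        # Non-linear function of the weighted sum of context and input.
        state = sigmoid(V @ state + W @ x + b_h)
        outputs.append(sigmoid(U @ state + b_o))
    return np.array(outputs)
```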
2.2 Hidden Markov Models

A hidden Markov model (HMM) describes a process which goes through a finite number of non-observable states whilst generating a signal that is either discrete or continuous in nature. In a first-order Markov model, the state at time t+1 depends only on the state at time t, regardless of the states at previous times [12]. Fig. 2 shows an example of a Markov model containing three states in a stochastic automaton.
The forward algorithm computes the likelihood of an observation sequence via the forward variable $\alpha_j(t)$, the probability of the partial observation sequence $O_1,\dots,O_t$ ending in state $j$ at time $t$:

$$\alpha_j(t) = b_j(O_t) \sum_{i=1}^{N} \alpha_i(t-1)\, a_{ij} \qquad (2)$$

where N is the number of hidden states in the HMM, $a_{ij}$ is the probability of making a transition from state $i$ to state $j$, and $b_j(O_t)$ is the Gaussian distribution for the observation at time $t$. The calculation in equation (2) is inherently recurrent and bears resemblance to the recursion of recurrent neural networks; refer to (3):
$$x_j(t) = f\left( \sum_{i=1}^{N} x_i(t-1)\, w_{ij} \right), \quad 1 \le j \le N \qquad (3)$$
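The structural similarity is apparent when the two recursions are written in code. The sketch below takes $f$ to be the logistic sigmoid (an assumption for illustration): both updates propagate a state vector by a weighted sum over its previous entries, followed by an element-wise transformation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(x_prev, W):
    """Equation (3): x_j(t) = f(sum_i x_i(t-1) w_ij)."""
    return sigmoid(x_prev @ W)

def forward_step(alpha_prev, A, b_t):
    """Equation (2): alpha_j(t) = b_j(O_t) * sum_i alpha_i(t-1) a_ij."""
    return (alpha_prev @ A) * b_t
```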
In the hybrid recurrent neural network, the recurrence of the network is combined with the observation term of the hidden Markov model:

$$S_i(t) = f\left( \sum_{k=1}^{K} V_{ik}\, S_k(t-1) + \sum_{j=1}^{J} W_{ij}\, I_j(t-1) \right) \cdot \frac{b_t(O)}{b_{t-1}(O)} \qquad (4)$$

where $S_k(t)$ is the output of state neuron $k$ at time $t$, $I_j(t)$ is network input $j$, $V$ and $W$ are the context and input weight matrices, and $K$ and $J$ are the numbers of state and input neurons, respectively. Fig. 3 shows how the Gaussian distribution of the hidden Markov model is used to build the hybrid recurrent neural network. The output of the multivariate Gaussian function depends solely on its mean, which is a vector equal in size to the input vector. This parameter will also be represented in the chromosomes, together with the weights and biases, and will be trained by genetic algorithms in order to model speech sequences.
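One time step of (4) can be sketched as follows, again taking $f$ to be the logistic sigmoid and assuming an identity-covariance Gaussian so that only the mean parameter matters; the function and argument names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gaussian(x, mean):
    """Multivariate Gaussian density with identity covariance
    (an assumption for this sketch; only the mean is evolved)."""
    d = x - mean
    return np.exp(-0.5 * d @ d) / np.sqrt((2.0 * np.pi) ** x.size)

def hybrid_step(S_prev, I_prev, O_prev, O_curr, V, W, mean):
    """One application of equation (4): the recurrent update is
    scaled by the ratio of observation terms at times t and t-1."""
    z = sigmoid(V @ S_prev + W @ I_prev)
    return z * (gaussian(O_curr, mean) / gaussian(O_prev, mean))
```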