
Hidden Markov Model

Zaheer Ahmad
PhD Scholar
ahmad.zaheer@yahoo.com
Department of Computer Science University of Peshawar
January 20, 2011
AGENDA
• Markov Process/Model/Chain
• Orders of Markov Models
• Working of Markov Models
• Hidden Markov Model: a doubly stochastic process
• How HMMs are used in NLP and DIP
• MATLAB Toolbox for Markov Model
Markov processes
Introduction
• Markov processes are examples of stochastic
processes—processes that generate random
sequences of outcomes or states according to
certain probabilities.
• Markov processes are distinguished by being
memoryless—their next state depends only
on their current state, not on the history that
led them there.
Example
• A game of snakes
and ladders or any
other game whose
moves are
determined entirely
by dice is a Markov
chain.

• The next move doesn't
depend on how the game
reached its current
state.
Usage: Sequential Data
• Sequential data consists of samples that are
not independent of each other.
• The Markov chain and the hidden Markov model are
probably the simplest models that can be
used to model sequential data.
Markov Model
• In probability theory, a Markov model is a
stochastic model that assumes the Markov
property.
• It gives an abstract description of a Markov
process.
Markov Chain
• Often, the term "Markov chain" is used to mean a
Markov process which has a discrete (finite or
countable) state-space.
where
a state space is a directed graph in which each
possible state of a dynamical system is represented
by a vertex, and there is a directed edge from a to b if
and only if f(a) = b, where the function f defines the
dynamical system.

• A Markov chain is a random process with the Markov
property, which can be stated simply as follows.
Markov property
• The Markov property states that the
conditional probability distribution for the
system at the next step (and in fact at all
future steps) given its current state depends
only on the current state of the system, and
not additionally on the state of the system at
previous steps.
Discrete Time random process
• A "discrete-time" random process means a
system which is in a certain state at each "step",
with the state changing randomly between steps.
• The steps are often thought of as time, but they
can equally well refer to physical distance or any
other discrete measurement;
• formally, the steps are just the integers or natural
numbers, and the random process is a mapping
of these to states.
Markov Models / Chains
• Markov chains are mathematical descriptions of Markov models with
a discrete set of states. Markov chains are characterized by:
• A set of states {1, 2, ..., M}
• An M-by-M transition matrix T whose (i, j) entry is the probability of a
transition from state i to state j. The sum of the entries in each row of
T must be 1, because this is the total probability of making a
transition from a given state to one of the M states (including itself).
• A set of possible outputs, or emissions, {s1, s2, ... , sN}. By default, the
set of emissions is {1, 2, ... , N}, where N is the number of possible
emissions, but you can choose a different set of numbers or symbols.
• An M-by-N emission matrix E whose i,k entry gives the probability of
emitting symbol sk given that the model is in state i.
• Markov chains begin in an initial state i0 at step 0. The chain then
transitions to state i1 with probability T(i0, i1), and emits an output s(k1) with
probability E(i1, k1). Consequently, the probability of observing the sequence
of states i1, i2, ..., ir and the sequence of emissions s(k1), s(k2), ..., s(kr) in the first r steps is

P = T(i0, i1) E(i1, k1) T(i1, i2) E(i2, k2) ··· T(i(r-1), ir) E(ir, kr)
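A minimal MATLAB sketch of this product (the matrices and sequences below are illustrative assumptions, not values from the slides):

% Probability of a given state/emission sequence under (T, E).
T = [0.9 0.1; 0.05 0.95];             % M-by-M transition matrix
E = [1/6 5/6; 1/2 1/2];               % M-by-N emission matrix
states    = [1 1 2 2];                % i1 .. ir  (i0 = 1)
emissions = [1 2 2 1];                % k1 .. kr
p = 1;  prev = 1;                     % chain begins in state i0 = 1
for t = 1:numel(states)
    p = p * T(prev, states(t)) * E(states(t), emissions(t));
    prev = states(t);
end
disp(p)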
Markov decision process

• A Markov decision process is a Markov chain
in which state transitions depend on the
current state and an action vector that is
applied to the system. Typically, a Markov
decision process is used to compute a policy
of actions that will maximize some utility with
respect to expected rewards. It is closely
related to reinforcement learning, and can be
solved with value iteration and related
methods, as sketched below.
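As a rough MATLAB sketch of value iteration (the transition tensor P, rewards R, and discount factor below are all illustrative assumptions):

% Value iteration on a toy 2-state, 2-action MDP.
% P(:,:,a) = transition matrix under action a; R(s,a) = reward.
P(:,:,1) = [0.8 0.2; 0.4 0.6];
P(:,:,2) = [0.5 0.5; 0.1 0.9];
R = [1 0; 0 2];
gamma = 0.9;                          % discount factor
V = zeros(2, 1);
for iter = 1:500
    Q = zeros(2, 2);
    for a = 1:2
        Q(:, a) = R(:, a) + gamma * P(:,:,a) * V;   % Bellman backup
    end
    V = max(Q, [], 2);                % act greedily over actions
end
[~, policy] = max(Q, [], 2)           % optimal action per state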
Order 0 Markov Models

• The simplest Markov model is an order 0
(zeroth-order) process, where the choice of state
is made with no memory of any previous state.

• Notice this is still not a deterministic
system, since we expect the choice to be
made probabilistically, not deterministically.
Order 1 Markov Models

• An order 1 (first-order) Markov model has a
memory of size 1. It is defined by a table of
probabilities P(xt = Si | xt-1 = Sj), for i = 1..k and
j = 1..k. You can think of this as k order 0 Markov
models, one for each Sj.
Order m Markov Models

• The order of a fixed-order Markov model is
the length of the history or context upon
which the probabilities of the possible values
of the next state depend.
For example,
• the next state of an order 2 (second-order)
Markov model depends upon the two
previous states.
How Markov Process Works
• Consider the weather over whole days,
which could be sunny (S), cloudy (C), or rainy
(R).
• From the weather history of the town
under investigation we have the following table.
• Table 1 shows the probability of each state of
tomorrow's weather given a certain condition
today.
• Each row of probabilities sums to 1.
• Assume tomorrow’s weather depends only
on today’s condition, in keeping with a
first-order Markov chain.
• We denote the weather condition sampled at
instant t by the state qt.
• The problem is to find the probability of
tomorrow's weather condition given
today's condition, P(qt+1 / qt).
• An acceptable approximation for a history of
n instants is:

P(qt+1 / qt, qt-1, qt-2, ..., qt-n) ≈ P(qt+1 / qt)


• Given that today is sunny (S), what is the
probability that the following five days are
S, C, C, R, and S under the above model?
• The answer follows from the first-order Markov
chain:
• P(q1=S, q2=S, q3=C, q4=C, q5=R, q6=S)
= P(S).P(q2=S/q1=S). P(q3=C/q2=S). P(q4=C/q3=C).
P(q5=R/q4=C). P(q6=S/q5=R)
= 1 x 0.7 x 0.2 x 0.8 x 0.15 x 0.15
= 0.00252
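As a check, here is a minimal MATLAB sketch of this product. Only the Table 1 entries actually quoted above are filled in; the remaining entries are left as NaN placeholders, since the full table is not reproduced here:

% States: 1 = S (sunny), 2 = C (cloudy), 3 = R (rainy).
A = nan(3);
A(1,1) = 0.7;   A(1,2) = 0.2;         % from S
A(2,2) = 0.8;   A(2,3) = 0.15;        % from C
A(3,1) = 0.15;                        % from R
seq = [1 1 2 2 3 1];                  % S S C C R S  (q1 .. q6)
p = 1;                                % P(q1 = S) taken as 1
for t = 2:numel(seq)
    p = p * A(seq(t-1), seq(t));
end
disp(p)                               % 0.00252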
Finite state representation of weather
forecast problem
(Figure: state-transition diagram with the three weather states S, C, and R and their transition probabilities.)
Transition Matrix and its Calculations

In general, if a Markov chain has r states, the
two-step transition probability from state i to state j is

p(2)ij = Σ k=1..r  pik · pkj ,

i.e. the (i, j) entry of the matrix P². For example, with

P = | 0.50 0.25 0.25 |
    | 0.50 0.00 0.50 |
    | 0.25 0.25 0.50 |

the first entries of P² are:

p(2)11 = 0.5·0.5 + 0.25·0.5 + 0.25·0.25 = 0.4375
p(2)12 = 0.5·0.25 + 0 + 0.25·0.25 = 0.1875
p(2)13 = 0.5·0.25 + 0.25·0.5 + 0.25·0.5 = 0.375
p(2)21 = 0.5·0.5 + 0 + 0.5·0.25 = 0.375
and so on.
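In MATLAB the same two-step probabilities come from a single matrix product (a minimal sketch using the matrix above):

P = [0.50 0.25 0.25; 0.50 0 0.50; 0.25 0.25 0.50];
P2 = P * P;          % P2(i,j) = sum over k of P(i,k) * P(k,j)
disp(P2)             % two-step transition probabilities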
Hidden Markov Model (HMM)

• In an HMM we observe a sequence of emissions, but do
not know the sequence of states the model went
through to generate the emissions. Analyses of hidden
Markov models seek to recover the sequence of states
from the observed data.
State-emission HMM


(Diagram: hidden states s1, s2, ..., sN, each emitting observable symbols such as w1, w3, w4, w5.)
Two kinds of parameters:
• Transition probability: P(sj | si)
• Output (Emission) probability: P(wk | si)
Hidden Markov model (HMM)
• This notion implies a doubly stochastic process.

• More precisely, the HMM is a probabilistic pattern-matching
technique in which the observations are
considered to be the output of a stochastic process
driven by an underlying Markov chain.

• It has two components: a finite-state Markov chain and
a finite set of output probability distributions.
Rabiner’s Example
• A simplification of Rabiner's genie (urn-and-ball) example.
• Assume that we have two persons: one doing an experiment
and the other an outside observer.
• Let us consider that we have N urns (states) numbered
from S1 to SN, and
• in each urn there are M coloured balls (observations)
distributed in different proportions. Also,
• each urn has a black bag, and each bag contains
100 counters labelled with urn numbers.
• These numbers are the current urn number Si and the
following two urn numbers Si+1 and Si+2, in probability
proportions of .8, .15, and .05 respectively.
• The counters in the bag of the urn just before the
last carry one of two numbers only, SN-1 and SN, with
probabilities of .9 and .1 respectively.
• We assume that the starting urn (state) is always urn 1 (S1)
and we end up in urn N (SN).
• The last urn needs no bag, since we stay there once we
reach it until the end of the experiment.
• We start the experiment at time t = 1 by drawing a ball from
urn 1, recording its colour, and returning it to the
urn.
• Then we draw a counter from the corresponding urn's bag.
• The possible numbers on the counters are: 1
(stay in urn 1), 2 (move to the next urn), or 3 (jump to
the third urn).
• We continue with the same procedure, drawing a
counter and then a ball from the corresponding urn and
recording the ball colours, until we reach state N and
stay there till the end of the experiment at instant T.
• The outcome of this experiment is a series of coloured
balls (observations), which can be considered a
sequence of events governed by the probability
distribution of the balls inside each urn and by the
counters in each bag.
• The outside observer has no idea which urn any
ball was drawn from (hidden states); all he
knows is the observation sequence of
coloured balls (observations).
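A minimal MATLAB simulation of this experiment (the colour proportions in each urn are made-up assumptions; the transition proportions are those given above):

N = 5;                                % number of urns (states)
A = zeros(N);                         % left-to-right transition matrix
for i = 1:N-2
    A(i, i) = 0.8;  A(i, i+1) = 0.15;  A(i, i+2) = 0.05;
end
A(N-1, N-1) = 0.9;  A(N-1, N) = 0.1;  % bag of the urn before the last
A(N, N) = 1;                          % last urn: stay until the end
B = rand(N, 3);                       % colour proportions per urn (made up)
B = B ./ repmat(sum(B, 2), 1, 3);     % normalise each row
T = 20;  q = 1;  obs = zeros(1, T);   % always start in urn 1
for t = 1:T
    obs(t) = find(rand < cumsum(B(q, :)), 1);   % draw a ball (colour)
    q      = find(rand < cumsum(A(q, :)), 1);   % draw a counter (move)
end
disp(obs)    % the outside observer sees only this colour sequence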
Some Conclusions
• Several things could be concluded from this
experiment :
1 – The starting urn is always urn 1 (S1).
2 – An urn that has been left cannot be
visited again (i.e., movement is from left to
right only).
3 – Movements are by either one or two urns to
the right.
4 – The last urn visited is always urn N (SN).
• A chain of 5 urns (states) is shown in Fig. 2.
The principal cases of HMM
• There are three main cases to be dealt with to
formulate a successful HMM

Case 1: Evaluation
Case 2: Decoding
Case 3: Training
Case 1: Evaluation
Given:
• a model λ = (A, B, π) ready to be used.
• a testing observation sequence O = O1, O2, O3,
.........., OT-1, OT.
Action:
• compute P(O/λ), the probability of the
observation sequence given the model.

• A naive approach is to enumerate all possible state paths
and sum the probability of each; the forward algorithm
computes the same quantity efficiently, as sketched below.
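A minimal MATLAB sketch of the forward algorithm (all matrices below are illustrative assumptions):

% lambda = (A, B, pi): transition, emission, initial distributions.
A   = [0.7 0.3; 0.4 0.6];
B   = [0.5 0.4 0.1; 0.1 0.3 0.6];     % B(i,k) = P(symbol k | state i)
pi0 = [0.6; 0.4];
O   = [1 3 2];                        % observation sequence
alpha = pi0 .* B(:, O(1));            % initialisation
for t = 2:numel(O)
    alpha = (A' * alpha) .* B(:, O(t));   % induction step
end
P_O = sum(alpha)                      % termination: P(O | lambda)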


Case 2: Decoding
Given:
• a model λ = (A, B, π) ready to be used.
• a testing or training observation sequence O = O1,
O2, O3, .........., OT-1, OT.
Action:
• track the optimum state sequence Q = q1,
q2, q3, ........., qT-1, qT
• that most likely produced the given observations,
using the given model.

• That is, find the state-sequence path of maximum
probability for the given observation sequence; the
Viterbi algorithm does this, as sketched below.
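A minimal MATLAB sketch of the Viterbi algorithm for this decoding step (the model matrices are the same illustrative assumptions as above):

A   = [0.7 0.3; 0.4 0.6];
B   = [0.5 0.4 0.1; 0.1 0.3 0.6];
pi0 = [0.6; 0.4];
O   = [1 3 2];  T = numel(O);  N = size(A, 1);
delta = pi0 .* B(:, O(1));            % best path probability so far
psi = zeros(N, T);                    % best predecessor per state
for t = 2:T
    [m, psi(:, t)] = max(A .* repmat(delta, 1, N), [], 1);
    delta = m' .* B(:, O(t));
end
[~, q] = max(delta);  path = zeros(1, T);  path(T) = q;
for t = T-1:-1:1
    path(t) = psi(path(t+1), t+1);    % backtrack
end
disp(path)                            % most likely state sequence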
Case 3: Training

A training procedure optimizes the model
parameters to obtain the model that best
represents a given set of observations.
Baum-Welch
(Forward–Backward) Algorithm
• It is an iterative method that climbs to a local
maximum of the probability function P(O/λ).
• The procedure always converges, but reaching the
global maximum cannot be assured.
Uses in NLP and DIP
• Sounds/phonemes (as a sequence) as states
– characters/words as observations

• Face recognition: facial regions (in top-to-bottom
sequence) as states
– different shapes/structures as observations
MATLAB toolbox for HMM
• Generating a Test Sequence
• The following commands create the transition
and emission matrices
• TRANS = [.9 .1; .05 .95];
• EMIS = [1/6 1/6 1/6 1/6 1/6 1/6; 7/12 1/12 1/12 1/12 1/12 1/12];
• To generate a random sequence of states and
emissions from the model, use hmmgenerate:
• [seq,states] = hmmgenerate(1000,TRANS,EMIS);
• Estimating the State Sequence
• Given the transition and emission matrices
TRANS and EMIS, the function hmmviterbi
uses the Viterbi algorithm to compute the
most likely sequence of states the model
would go through to generate a given
sequence seq of emissions:
• likelystates = hmmviterbi(seq, TRANS, EMIS);
• Estimating Transition and Emission Matrices:
using hmmtrain. If you do not know the sequence
of states (the variable states above), but you have
initial guesses for TRANS and EMIS, you can still
estimate TRANS and EMIS using hmmtrain.
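A typical call might look like this (the guess matrices below are illustrative assumptions, not values from the slides):

TRANS_GUESS = [.85 .15; .1 .9];
EMIS_GUESS  = [.5 .1 .1 .1 .1 .1; .1 .1 .2 .2 .2 .2];
[TRANS_EST, EMIS_EST] = hmmtrain(seq, TRANS_GUESS, EMIS_GUESS);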
Conclusion
• Advantages
– Has contributed quite a bit to speech recognition
– With algorithms we have described, computation is
reasonable
– Complex processes can be modeled with low-
dimensional data
– Works well for time varying classification
• other examples: gesture recognition, formant tracking
• Limitations
– Assumption that successive observations are independent
– First-order assumption: the probability of the state at time t depends only
on the state at time t-1
– Need to be “tailor made” for specific application
– Needs lots of training data, in order to see all observations
Thank You
References
• L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected
Applications in Speech Recognition".
• W. H. Abdulla and N. K. Kasabov, "The Concepts of Hidden Markov
Model in Speech Recognition", Technical Report TR99/09, Knowledge
Engineering Lab, Information Science Department, University of
Otago, New Zealand.
• A. V. Nefian, "Face Detection and Recognition using HMM".
• http://www.mathworks.com/help/toolbox/stats/f8368.html
• Wikipedia
