
Assignment Submission

Speech Recognition System Architectural Design


Course No. : SS ZG653
Course Title : Software Architecture

Definition:
A speech recognition system enables electronic devices such as computers and mobile devices to
recognize spoken language and translate it into text. It is also known as "automatic speech
recognition" (ASR), "computer speech recognition", or just "speech to text" (STT). It draws on
knowledge and research in linguistics, computer science, and electrical engineering.

Architectural Details:
Both acoustic modeling and language modeling are important parts of modern statistically based speech
recognition algorithms. I have used a Hidden Markov Model (HMM), the model on which many modern
speech recognition systems are based.

a. Reasons to use Hidden Markov Models (HMMs):

1. A Markov model suits stochastic signals such as speech, which can be approximated as a
stationary process over a short time scale (for example, 10 milliseconds).
2. HMMs can be trained automatically and are simple and computationally feasible to use.

b. Working of Hidden Markov Model:

1. The HMM outputs a sequence of n-dimensional real-valued vectors (n being a small integer,
such as 10), one vector every 10 milliseconds.
2. Each vector consists of cepstral coefficients, which are obtained by taking a Fourier transform
of a short time window of speech, de-correlating the spectrum using a cosine transform, and
keeping the first (most significant) coefficients.
3. Each state of the HMM has a statistical distribution that gives a likelihood for each
observed vector.
4. An HMM for a sequence of words or phonemes is built by concatenating the individually trained
hidden Markov models for the separate words and phonemes.
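
The feature extraction described in step 2 can be sketched roughly as follows. This is a simplified
illustration, not the exact front end of any particular recognizer: it omits the mel filterbank that
practical systems usually apply, and the function name cepstral_features, the 25 ms window, the 10 ms
hop, and the 13 coefficients are all assumptions chosen for the example.

```python
import numpy as np

def cepstral_features(signal, sample_rate, frame_ms=25.0, hop_ms=10.0, n_coeffs=13):
    """Return one cepstral vector per hop (simplified: no mel filterbank)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    window = np.hamming(frame_len)

    # DCT-II basis: the cosine transform that de-correlates the log spectrum.
    n_bins = frame_len // 2 + 1
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_bins)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_bins))

    features = np.empty((n_frames, n_coeffs))
    for i in range(n_frames):
        # Short time window of speech, tapered to reduce spectral leakage.
        frame = signal[i * hop_len : i * hop_len + frame_len] * window
        # Fourier transform, then log power spectrum (small offset avoids log(0)).
        power = np.abs(np.fft.rfft(frame)) ** 2
        log_spectrum = np.log(power + 1e-10)
        # Keep only the first n_coeffs cosine-transform coefficients.
        features[i] = dct_basis @ log_spectrum
    return features

# Usage: one second of a synthetic 440 Hz tone stands in for recorded speech.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t)
feats = cepstral_features(signal, sample_rate)
print(feats.shape)  # (number of frames, number of coefficients)
```

Each row of the returned array plays the role of one observed vector that the HMM states score.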

Hidden Markov Model:


A hidden Markov model is a Markov chain for which the state is only partially observable. In
other words, observations are related to the state of the system, but they are typically insufficient
to precisely determine the state. Several well-known algorithms for hidden Markov models exist.
For example, given a sequence of observations, the Viterbi algorithm will compute the most
likely corresponding sequence of states, the forward algorithm will compute the probability of the
sequence of observations, and the Baum-Welch algorithm will estimate the starting probabilities,
the transition function, and the observation function of a hidden Markov model.
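
As a rough illustration of the forward algorithm, the sketch below computes the probability of an
observation sequence for a small discrete HMM. The two-state model and all of its numbers are
invented for the example, and the function name forward_probability is hypothetical. A real
recognizer would work with continuous feature vectors and log probabilities instead.

```python
import numpy as np

def forward_probability(start, trans, emit, observations):
    """Probability of the observation sequence under a discrete HMM.

    start[i]    : probability of starting in state i
    trans[i, j] : probability of moving from state i to state j
    emit[i, o]  : probability of observing symbol o in state i
    """
    # alpha[i] = P(observations so far, current state = i)
    alpha = start * emit[:, observations[0]]
    for obs in observations[1:]:
        # Sum over all previous states, then weight by the new observation.
        alpha = (alpha @ trans) * emit[:, obs]
    return float(alpha.sum())

# Toy model (assumed values): 2 hidden states, 3 observation symbols.
start = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3],
                  [0.4, 0.6]])
emit = np.array([[0.5, 0.4, 0.1],
                 [0.1, 0.3, 0.6]])

observations = [0, 1, 2]
print(forward_probability(start, trans, emit, observations))
```
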
One common use is for speech recognition, where the observed data is the speech
audio waveform and the hidden state is the spoken text. In this example, the Viterbi algorithm
finds the most likely sequence of spoken words given the speech audio.
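
A minimal sketch of the Viterbi algorithm on a toy discrete HMM is shown here. The model values are
invented, the function name viterbi is hypothetical, and the integer observation symbols stand in
for audio features; the hidden states would correspond to words or phonemes in a real system.

```python
import numpy as np

def viterbi(start, trans, emit, observations):
    """Most likely hidden state sequence for a discrete HMM (log domain)."""
    n_states = len(start)
    n_steps = len(observations)
    # log_delta[j] = log probability of the best path ending in state j.
    log_delta = np.log(start) + np.log(emit[:, observations[0]])
    backptr = np.zeros((n_steps, n_states), dtype=int)

    for t in range(1, n_steps):
        # scores[i, j] = best path ending in i, followed by the move i -> j.
        scores = log_delta[:, None] + np.log(trans)
        backptr[t] = scores.argmax(axis=0)
        log_delta = scores.max(axis=0) + np.log(emit[:, observations[t]])

    # Trace the best path backwards from the most likely final state.
    path = [int(log_delta.argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Toy model (assumed values): 2 hidden states, 3 observation symbols.
start = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3],
                  [0.4, 0.6]])
emit = np.array([[0.5, 0.4, 0.1],
                 [0.1, 0.3, 0.6]])

best_path = viterbi(start, trans, emit, [0, 1, 2])
print(best_path)
```

Working in the log domain avoids numerical underflow on long sequences, which matters for speech,
where an utterance spans hundreds of frames.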

Mind Map:
Use Case Diagram:
Sequence Diagram:

Activity Diagram:
References and Tools:
Creately.com
"Speaker Independent Connected Speech Recognition - Fifth Generation Computer Corporation". Fifthgen.com. Retrieved 15 June 2013.
https://en.wikipedia.org/wiki/Hidden_Markov_model
