
A Project Report on

ANALYSIS OF LANGUAGE MODEL FOR BACKEND CLASSIFIER IN
LANGUAGE IDENTIFICATION SYSTEM

Submitted to the Dept. of Information Technology, SNIST


in partial fulfillment of the academic requirements for the award of
B.Tech (Information Technology) under JNTUH

by

Ms. B.JYOSTHNA (09311A1263)


Ms. K.SONAL (09311A1277)
Ms. P.SRAVANTHI (09311A1288)
Under the guidance of

Mr. Sreedhar Potla


Associate Professor in IT

Department of Information Technology


School of Computer Science and Informatics
SreeNidhi Institute of Science and Technology (An Autonomous Institute)
Yamnampet, Ghatkesar Mandal, R. R. Dist., Hyderabad 501301
affiliated to

Jawaharlal Nehru Technological University Hyderabad


Hyderabad 500 085

2013

Department of Information Technology


School of Computer Science and Informatics
SreeNidhi Institute of Science and Technology
(An Autonomous Institute)

CERTIFICATE
This is to certify that the Project report on Analysis of Language Model for Back-end
Classifier in Language Identification System is a bona fide work carried out by
Ms. B.JYOSTHNA (09311A1263), Ms. K.SONAL (09311A1277) and Ms. P.SRAVANTHI (09311A1288)
in partial fulfillment of the requirements for the award of the B.Tech degree in Information
Technology, SreeNidhi Institute of Science and Technology, Hyderabad, affiliated to Jawaharlal
Nehru Technological University, Hyderabad, under our guidance and supervision.
The results embodied in the project work have not been submitted to any other University or
Institute for the award of any degree or diploma.

Internal Guide
Mr. Sreedhar Potla
Associate Professor

Head of the Department
Dr. V. V. S. S. S. Balaram
Department of IT

ACKNOWLEDGEMENTS
This project report is the outcome of the efforts of many people who have driven
our passion to explore Natural Language Processing. We have received great
guidance, encouragement and support from them and have learned a lot because of their
willingness to share their knowledge and experience.
First and foremost, we express our deepest sense of gratitude to our guide, Mr.
Sreedhar Potla, Associate Professor, Department of IT. His guidance and his total support
on the core technical aspects of our work and on the organization of our thesis
documentation have been of immense help in surmounting various hurdles on the path to our goal.
We would like to express our sincere thanks to Dr. P. Narasimha Reddy,
Executive Director, Dr. V. Vasudeva Rao, Principal, Dr. V. Umakanth Sastry, Director,
School of Computer Science and Informatics, Dr. V.V.S.S.S. Balaram, Head of the
Department of Information Technology, Mr. Karthik Krishnamurthi, Project work
coordinator, SreeNidhi Institute of Science and Technology, Hyderabad for permitting us
to do our project work at SreeNidhi Institute of Science and Technology, Hyderabad.
We also thank our parents, and we are grateful to Almighty God, who brought us
into contact with such worthy people at the right time, provided us with all the
necessary resources and enabled us to accomplish this task.
Finally, we would like to thank all our faculty and supporting staff, and the
people who have directly or indirectly helped us.

B.Jyosthna (09311A1263)
K.Sonal (09311A1277)
P.Sravanthi (09311A1288)

TABLE OF CONTENTS
1. INTRODUCTION
1.1. Motivation
1.2. Language Identification Systems (LID)
1.2.1. Acoustic LID
1.2.2. Phonotactic LID
1.3. Features of Ideal LID
1.4. Problems with Language Identification Systems
1.5. Objective of the project
1.6. Limitations of the project

2. SPEECH PROCESSING: OVERVIEW
2.1. Classification
2.1.1. Analysis/Synthesis
2.1.2. Recognition
2.1.3. Coding
2.2. Area of Project
2.3. Processes in LID
2.4. Speech Database
2.5. Preprocessing
2.6. Features to be extracted and their Extraction
2.6.1. Prosodic Features
2.6.2. Feature Extraction Techniques
2.6.3. Tools for Feature Extraction
2.7. Language Modeling and Classifier
2.8. Framework of a LID

3. SYSTEM DESCRIPTION
3.1. Existing System
3.2. Proposed System
3.3. The Working
3.4. Difficulties faced
3.5. Requirements of the Project

4. LITERATURE SURVEY: STATISTICAL LANGUAGE MODELS
4.1. Definition and Use
4.2. Survey of Major SLM techniques
4.2.1. N-Gram Model
4.2.2. Decision Model
4.2.3. Exponential Model
4.2.4. Adaptive Model
4.2.5. Hidden Markov Model (HMM)
4.2.6. GMM
4.2.6.1. What is a Gaussian?
4.2.6.2. What is GMM?
4.2.6.3. EM Algorithm
4.2.6.4. Initialization and Convergence Issues for EM
4.2.6.5. Reason for choosing GMM
4.2.6.6. Applications of EM
4.3. Measures of Progress
4.4. Known weaknesses in current models

5. SYSTEM ANALYSIS AND DESIGN
5.1. Brief Description of Modules
5.1.1. Module 1: Input Pre-processing
5.1.2. Module 2: Initialization of EM algorithm parameters
5.1.3. Module 3: The EM Algorithm Processing
5.1.4. Module 4: Calculation of Final probabilities
5.2. MATLAB Environment
5.2.1. History and Development
5.2.2. MATLAB Application and its Setup

6. IMPLEMENTATION
6.1. Modules
6.1.1. Input Pre-processing
6.1.2. Initialization of EM algorithm parameters
6.1.3. The EM Algorithm Processing
6.1.4. Calculation of Final probabilities
6.2. Results and Screenshots

7. CONCLUSION AND FUTURE ENHANCEMENTS

8. REFERENCES

9. LIST OF FIGURES
