
HYBRID SPEECH RECOGNITION MODEL

1. INTRODUCTION
1.1 INTRODUCTION
Human speech has evolved over many thousands of years into an efficient method of sharing information and giving instructions. Designing a machine that understands human behaviour, in particular the capability of speaking naturally and responding appropriately to spoken language, has intrigued engineers and scientists for centuries. Speech recognition is a technology that allows a computer to identify and understand words spoken by a person through a microphone or telephone. The ultimate goal of the technology is a system that can recognize with accuracy all words spoken by any person. Software that understands your speech enables you to hold conversations with the computer, in which both you and the computer speak commands or respond to events, input, or other feedback.

Speech-based interface technology will enable easy-to-use automation of new and existing communication services, making human-machine interaction more natural. For disabled people, the absence of suitable databases and the diversity of articulatory handicaps are major obstacles to building reliable speech recognition systems, which explains the scarcity of commercial speech recognition systems for disabled users. If a person finds it difficult or impossible to operate a mouse and keyboard, or if the keyboard or mouse is faulty, there have to be other ways to control the operating system; speech can act as one of them. There is a growing demand for systems capable of controlling the operating system using only a person's voice commands.

This project lets the user control computer functions and dictate text by voice. For example, a person can open Microsoft Access with a voice command such as "Open Access", open Paintbrush by just saying "Open Paint", open control applications such as Windows Security Center or Power Options, or can even open


the browser by saying "Open Browser", and start the media player by saying "Start Music".

In speech recognition, the sounds uttered by a speaker are converted to a sequence of words recognized by a listener. The logic behind the speech recognition process is that information flows in one direction, from sounds to words. This direction of information flow is mandatory for a speech recognition model to function; in many practical circumstances, an automatic speech recognizer has to operate in several different but well-defined acoustic environments. Hybrid speech recognition systems are most effective in quiet environments. Signal modelling and pattern matching are the two basic operations carried out in a speech recognition system. Signal modelling is the process of converting the speech signal into a set of parameters, whereas pattern matching is the task of identifying the parameter set from memory that most closely matches the parameter set obtained from the input speech signal.

Speech output is more impressive and comprehensible than text output. A speech recognizer converts a spoken utterance into text; with the aid of speech recognition technology, computers can follow and understand human voice commands and languages. The speech recognition process is performed by a software component known as the speech recognition engine. The primary function of the speech recognition engine is to process spoken input and translate it into text that an application understands. The application can then do one of two things. It can interpret the result of the recognition as a command; in this case, the application is a command-and-control application. An example of a command-and-control application is one in which the caller says "check balance" and the application returns the current balance of the caller's account.
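The two ways an application can consume recognizer output can be sketched in Java. This is an illustrative sketch, not the project's actual code: the class name, the command table, and the reply string are invented for the example.

```java
import java.util.Map;

/**
 * Minimal sketch of the two ways an application can treat recognizer
 * output: as a command (command-and-control) or as plain text (dictation).
 * The command table and reply strings are illustrative only.
 */
public class RecognitionDispatcher {

    // Command-and-control: recognized text is looked up and acted upon.
    private static final Map<String, String> COMMANDS = Map.of(
            "check balance", "Your current balance is $100.00");

    /** Interprets the hypothesis as a command, or returns it verbatim. */
    static String handle(String hypothesis, boolean dictationMode) {
        if (dictationMode) {
            return hypothesis;              // dictation: return the text as-is
        }
        return COMMANDS.getOrDefault(hypothesis, "Unknown command");
    }

    public static void main(String[] args) {
        System.out.println(handle("check balance", false)); // command mode
        System.out.println(handle("check balance", true));  // dictation mode
    }
}
```

The same recognized string thus produces an action in command mode but plain text in dictation mode, which is exactly the distinction drawn in this section.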
If an application handles the recognized text simply as text, it is considered a dictation application. In a dictation application, if you said "check balance", the application would not interpret the result but would simply return the text "check balance".

Speech recognition is used for many different purposes and there is no definitive list; however, some of the most common applications include:

1. Health Care - One of the most promising areas for the application of speech recognition is in helping handicapped people. People with motor limitations, who cannot use a standard keyboard and mouse, can use their voices to navigate the computer and create documents. For example, Braille input/output devices, touch-screen systems and trackballs have all been used successfully in classrooms. Speech recognition technology has great potential to give people with disabilities greater access to computers and a world of opportunities.

2. Military
2.1 High-performance fighter aircraft: Speech recognizers have been operated successfully in fighter aircraft, with applications including setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and weapons release parameters, and controlling flight displays. Some important conclusions from this work were as follows:

1. Speech recognition has definite potential for reducing pilot workload.
2. Achievement of very high recognition accuracy (95% or more) was the most critical factor for making the speech recognition system useful.
3. A more natural vocabulary and grammar, and shorter training times, would be useful, but only if very high recognition rates could be maintained.
4. Laboratory research in robust speech recognition for military environments has produced promising results which, if extendable to the cockpit, should improve the utility of speech recognition in high-performance aircraft.
2.2 Training air traffic controllers: Speech recognition and synthesis techniques offer the potential to eliminate the need for a person to act as a pseudo-pilot, thus reducing training and support personnel. Air traffic controller tasks are also characterized by highly structured speech as the primary output of the controller, which reduces the difficulty of the speech recognition task.

2.3 Battle management: Human-machine interaction by voice has the potential to be very useful in these environments. A number of efforts have been undertaken to interface commercially available isolated-word recognizers into battle management environments. In one feasibility study, speech recognition equipment was tested in conjunction with an integrated information display for naval battle management applications.

3. Telephony and other domains - Speech recognition in the field of telephony is now commonplace, and in computer gaming and simulation it is becoming more widespread. Improvements in mobile processor speeds made speech-enabled Symbian and Windows Mobile smartphones feasible. Speech is used mostly as part of the user interface, for creating pre-defined or custom speech commands.


2. SYSTEM ANALYSIS
2.1 INTRODUCTION
Speech recognition (SR) is the translation of spoken words into text. It is also known as "automatic speech recognition" (ASR). Some SR systems use "training", where an individual speaker reads sections of text into the SR system. These systems analyze the person's specific voice and use it to fine-tune the recognition of that person's speech, resulting in more accurate transcription.

2.1.1 PURPOSE OF THE SYSTEM


The main goal of speech recognition is to develop techniques and systems for speech input to machines. For reasons ranging from technological curiosity about the mechanisms for mechanical realization of human speech capabilities to the desire to automate simple tasks that necessitate human-machine interaction, research in automatic speech recognition has attracted a great deal of attention. Automatic speech recognition systems today find widespread application in tasks that require a human-machine interface, such as automatic call processing in telephone networks; query-based information systems that provide updated travel information, stock price quotations and weather reports; data entry; voice dictation; access to information such as travel and banking; commands; avionics; automobile portals; speech transcription; aids for handicapped (for example, blind) people; supermarkets; and railway reservations.

2.1.2 SCOPE OF THE SYSTEM:


This proposed solution uses CMU Sphinx 4.0 as the speech recognition software and allows us to convert speech into text format and to open several kinds of applications using a single voice command. At some point in the future, speech recognition may become speech understanding: systems that can decide what a person just said may someday be able to grasp the meaning behind the words. Although this is a huge leap in terms of computational power and software sophistication, some researchers argue that speech recognition development offers the most direct line from the computers of today to true artificial intelligence.

2.2 EXISTING SYSTEM:


In the existing system, the mobile device was initially controlled through wired interfaces such as switches and buttons; later it was controlled through wireless means such as a remote control. In a wireless network, however, there is no proper acknowledgement for the transmitted signal, so there is no reliability or robustness. Likewise, we can use our computer only if the keyboard and mouse work properly; there is no alternative if either the mouse or the keyboard becomes faulty.

LIMITATIONS OF EXISTING SYSTEM:


No proper acknowledgement for the transmitted signal.
No reliability and robustness.
Consumes a lot of time.
Requires the direct role of the person to be involved in the work.

2.3 PROPOSED SYSTEM:


In the proposed system, about 95% of the existing system is computerized. The mobile device is controlled through a speech signal, which is given through a microphone. Using this we can achieve the maximum amount of accuracy; more acknowledgements are provided, which results in reliability and robustness. Modern speech recognition systems can achieve high recognition rates, but their accuracy often decreases dramatically in noisy and crowded environments. This is usually dealt with either by requiring an almost noise-free environment or by placing the microphone very close to the speaker's mouth.


ADVANTAGES OF PROPOSED SYSTEM:


Ability to carry out the maximum number of operations on an operating system.
Security of data.
Ensures maximum accuracy.
Greater efficiency.
User friendly and interactive.
Minimum time required.
Flexibility of the system.
Highly advantageous for people with both physical and mental disabilities.
Useful during natural calamities or typical situations.
Can also be used as a voice-controlled operating system.
Reduction in the cost of software as much as possible.

2.4 MODULES:

2.4.1 User Module:


In this module the user gives a speech or voice command through the microphone. The microphone captures the speech signal, which the system then converts into text format or into some other action, depending on the functionality.

2.4.2 Speech Recognition:


Speech recognition is used to recognize the word pattern data extracted from a speech signal. Once the speech data is in the proper format, the engine searches for the best match. It does this by taking into consideration the words and phrases it knows about (the active grammars), along with its knowledge of the environment in which it is operating. Here we use CMU Sphinx 4.0 as the speech recognition software.
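In Sphinx 4 the active grammar is typically written in JSGF (Java Speech Grammar Format), the format loaded by the loadJSGF() operation shown later in the class diagram. A minimal sketch of such a grammar follows; the grammar name and the exact command phrases are illustrative, not taken from the project's actual grammar file.

```jsgf
#JSGF V1.0;

grammar commands;

// Each alternative is one voice command the engine will listen for.
public <command> = open ( paint | access | browser ) |
                   start music |
                   check balance;
```

Constraining the search to such a small command set is what makes command-and-control recognition far more reliable than open dictation.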


2.4.3 Speech to text:


The speech signal given by the user is converted into a sequence of words, which is displayed in a Microsoft Word document.
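The module's behaviour can be sketched as appending each recognized hypothesis to a document. Writing to a plain text file here stands in for the Word document, and the class and file names are illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Appends each recognized utterance as one line of a transcript file. */
public class TranscriptWriter {

    /** Appends one hypothesis to the transcript, creating it if needed. */
    static void append(Path doc, String hypothesis) throws IOException {
        Files.writeString(doc, hypothesis + System.lineSeparator(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path doc = Path.of("transcript.txt");   // stand-in for the Word document
        append(doc, "hello world");
        append(doc, "open paint");
        System.out.println("utterances transcribed to " + doc);
    }
}
```

In the real system the hypothesis string would come from the recognizer rather than being hard-coded.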

2.4.4 Open applications:


This is an additional feature of the project. If the user gives a voice command, the application relating to that command is opened: the speech recognition engine compares the voice command with the commands in the database and, if it matches, the corresponding .exe file of the application is executed, so the application opens.
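The command-to-executable lookup can be sketched as below. The command names and .exe paths are illustrative, not the project's actual database; the launch step assumes the executables are on the system path.

```java
import java.io.IOException;
import java.util.Map;

/**
 * Sketch of the open-applications module: a recognized command is looked
 * up in a table and the matching executable is launched. The command
 * names and executable names are illustrative only.
 */
public class AppLauncher {

    private static final Map<String, String> APPS = Map.of(
            "open paint", "mspaint.exe",
            "open access", "msaccess.exe",
            "open browser", "firefox.exe");

    /** Returns the executable mapped to a command, or null if unknown. */
    static String resolve(String command) {
        return APPS.get(command.toLowerCase().trim());
    }

    /** Launches the application mapped to the recognized command. */
    static void launch(String command) throws IOException {
        String exe = resolve(command);
        if (exe != null) {
            new ProcessBuilder(exe).start();   // runs the matched .exe
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("open paint"));   // mspaint.exe
    }
}
```

Separating resolve() from launch() keeps the matching logic testable without actually starting any application.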


3. SYSTEM STUDY
3.1 FEASIBILITY STUDY:
A feasibility study is an evaluation and analysis of the potential of the proposed project, based on extensive investigation and research, to give full comfort to the decision makers. Feasibility studies aim to objectively and rationally uncover the strengths and weaknesses of an existing business or proposed venture, the opportunities and threats presented by the environment, the resources required to carry the project through, and ultimately the prospects for success.

Technical Feasibility
Operational Feasibility
Economical Feasibility

3.1.1 TECHNICAL FEASIBILITY:


The assessment is based on an outline design of system requirements, to determine whether the company has the technical expertise to handle completion of the project. When writing a feasibility report, the following should be taken into consideration:

A brief description of the business, to assess the possible factors which could affect the study
The part of the business being examined
The human and economic factors
The possible solutions to the problems

At this level, the concern is whether the proposal is both technically and legally feasible.

3.1.2 OPERATIONAL FEASIBILITY:


Operational feasibility is a measure of how well a proposed system solves the problems and takes advantage of the opportunities identified during scope definition, and how well it satisfies the requirements identified in the requirements analysis phase of system development.

Is there sufficient support for the system from management and from the users?
Will the system be used and work properly once it is developed and implemented?
Will there be any resistance from users that will undermine the possible application benefits?

Under this category of service, we conduct a study to analyze and determine whether the business need can be fulfilled by the proposed solution. The result of our operational feasibility study will clearly show whether the proposed solution is operationally workable and conveniently solves the problems under consideration once the proposal is implemented. This is sometimes referred to as a feasibility evaluation. We precisely describe how the system will interact with the systems and persons around it. Our feasibility report provides results of interest to all stakeholders.

3.1.3 ECONOMIC FEASIBILITY:


The purpose of an economic feasibility study (EFS) is to demonstrate the net benefit of a proposed project for accepting or disbursing electronic funds/benefits, taking into consideration the benefits and costs to the agency, other state agencies, and the general public as a whole. The EFS is composed of two required forms:

Business Case
Cost Benefit Analysis


4. SOFTWARE REQUIREMENT SPECIFICATION


4.1 SOFTWARE REQUIREMENT SPECIFICATION:
The software requirement specification (SRS) is the starting point of the software development activity. As systems grew more complex, it became evident that the goals of the entire system could not be easily comprehended; hence the need for a requirements phase arose. A software project is initiated by the client's needs. The SRS is the means of translating the ideas in the minds of the clients (the input) into a formal document (the output of the requirements phase). The SRS phase consists of two activities:

4.1.1 Problem/ Requirement Analysis:


This process is the harder and more nebulous of the two; it deals with understanding the problem, the goal and the constraints.

4.1.2 Requirement Specification:


The focus is on specifying what has been found during analysis. Issues such as representation, specification languages and tools, and checking of the specifications are addressed during this activity. The requirements phase terminates with the production of the validated SRS document. Producing the SRS document is the basic goal of this phase.

4.2 ROLE OF SRS:


The purpose of the software requirement specification is to reduce the communication gap between the clients and the developers. The SRS is the medium through which the client's and users' needs are accurately specified; it forms the basis of software development. A good SRS should satisfy all the parties involved in the system.


4.3 REQUIREMENT SPECIFICATION


A requirement is a feature that the system must have or a constraint that it must satisfy to be accepted by the client. Requirement engineering aims at defining the requirements of the system under construction. It includes two main activities: requirements elicitation, which results in a specification of the system that the client understands, and analysis, which results in an analysis model that the developer can unambiguously interpret. A requirement is a statement about what the proposed system will do. Requirements can be divided into two major categories: functional requirements and non-functional requirements.

4.4 FUNCTIONAL REQUIREMENTS:


Descriptions of data to be entered into the system
Descriptions of operations performed by each screen
Descriptions of work-flows performed by the system
Descriptions of system reports or other outputs
Who can enter the data into the system
How the system meets applicable regulatory requirements

4.5 NON-FUNCTIONAL REQUIREMENTS:

4.5.1 USABILITY:
The system is designed to have a good user interface driven by voice commands alone, with no use of keyboard or mouse, so that it can be learned within a very short period of time. This greatly increases its usability.

4.5.2 RELIABILITY:
We made the system reliable by handling all exceptions during development.


4.5.3 PERFORMANCE:
The system will exhibit high performance because it can handle any sort of application with a single voice command.

4.5.4 SUPPORTABILITY:
Since the project is developed in Java, it will run on any operating system and can be executed from any platform.

4.5.5 LEGAL:
The project is developed according to the GPL (General Public License). It is built using open source technologies, so the code can be given away for free, while services can be charged for in accordance with the software act of 2001.

4.6 SYSTEM REQUIREMENTS:


The user is expected to ensure that the minimum requirements for running the product are satisfied. The hardware and software environment in which this product was developed is specified below. It is necessary to make sure that the hardware and software the consumer uses are compatible with the specification given below.


4.7 HARDWARE REQUIREMENTS:


Processor : x86-compatible processor with 1.7 GHz clock speed
RAM : 512 MB
Hard Disk : 20 GB
External Device : Microphone

4.8 SOFTWARE REQUIREMENTS:


Programming Language : Java
Operating System : Windows family
Speech API : CMU Sphinx 4.0
Design Tool : Data Flow Diagram
Browser : Open source browser (e.g. Firefox, Safari, WebKit, Chrome, Opera)


SYSTEM DEVELOPMENT MODEL


The aim is to provide discipline to the development of software; a structured framework against which development takes place is advocated. A model of the process of system development is used by organizations to describe their approach to producing computer systems. Traditionally this has been a staged approach, known as the system life cycle or system development life cycle (SDLC). The SDLC is the overall process of developing information systems through a multi-step process, from investigation of initial requirements through analysis, design, implementation and maintenance. There are different models and methodologies, but each generally consists of a series of defined steps or stages. Software development once consisted of a programmer writing code to solve a problem or automate a procedure. Nowadays, systems are so big and complex that teams of architects, analysts, programmers, testers and users must work together to create the millions of lines of custom-written code that drive our enterprises. To manage this, a number of SDLC models have been created: waterfall, fountain, spiral, build-and-fix, rapid prototyping, incremental, and synchronize-and-stabilize. Of these, we have used the fountain model for our project.


5. SYSTEM DESIGN
5.1 INTRODUCTION
Design is concerned with identifying software components, specifying the relationships among them, specifying the software structure, and providing a blueprint for the implementation phase. Modularity is one of the desirable properties of large systems; it implies that the system is divided into several parts in such a manner that the interaction between the parts is minimal and clearly specified. Design describes the software components in detail. This helps the implementation of the system and guides further changes to the system to satisfy future requirements.

5.2 DATA FLOW DIAGRAM:


A data flow diagram is a graphical tool used to describe and analyze the movement of data through a system, manual or automated, including the processes, stores of data, and delays in the system. Data flow diagrams are the basis from which other components are developed. The transformation of data from input to output, through processes, may be described logically and independently of the physical components associated with the system. The DFD is also known as a data flow graph or a bubble chart. DFDs are a model of the proposed system: they should clearly show the requirements on which the new system is to be built, and later, during the design activity, they are taken as the basis for drawing the system's structure charts. The basic notation used to create DFDs is as follows:

1. Data Flow: Data move in a specific direction from an origin to a destination.



2. Process: People, procedures or devices that use or produce (transform) data. The physical component is not identified.

3. Source: External sources or destinations of data, which may be people, programs, organizations or other entities.

4. Data Store: Here data are stored or referenced by a process in the system.

Data Flow Diagram for Hybrid Speech Recognition Model
Level 0: INPUT (speech signal) -> SPEECH RECOGNITION -> OUTPUT (text format or action on the monitor)

Level 1: INPUT (speech signal) -> SPEECH RECOGNITION -> SPEECH TO TEXT / OPEN APPLICATIONS -> OUTPUT (text format or action on the monitor)

Level 2: INPUT (speech signal) -> SPEECH RECOGNITION -> SPEECH TO TEXT (text displayed in an MS Word document) / OPEN APPLICATIONS (opens Access, Paint, Notepad, Adobe, browser, Facebook, etc.) -> OUTPUT (text format or action on the monitor)

5.3 UNIFIED MODELLING LANGUAGE


The Unified Software Development Process is representative of a number of component-based development models that have been proposed in the industry. Using the Unified Modelling Language (UML), the Unified Process defines the components that will be used to build the system and the interfaces that will connect the components. Using a combination of iterative and incremental development, the Unified Process defines the function of the system by applying a scenario-based approach, and then couples it with an architectural framework that identifies the form the software will take. The UML captures information about the static structure and dynamic behaviour of a system: the static structure defines the kinds of objects, while the dynamic behaviour defines their history over time.


5.3.1 GOALS OF UML


There were a number of goals behind the development of UML:

1. UML is a general-purpose modelling language that all modellers can use.
2. UML was to be as simple as possible while still being capable of modelling the full range of practical systems that need to be built.
3. UML needs to be expressive enough to handle all the concepts that arise in a modern system, such as concurrency and distribution, as well as software engineering mechanisms such as encapsulation and components.
UML Documenting:

UML provides a variety of documents in addition to raw executable code. The use case view of a system encompasses the use cases that describe the behaviour of the system as seen by its end users, analysts, and testers.

Figure 5.3 Modeling a System Architecture using views of UML


The design view of a system encompasses the classes, interfaces, and collaborations that form the vocabulary of the problem and its solution. The process view of a system encompasses the threads and processes that form the system's concurrency and synchronization mechanisms. The implementation view of a system encompasses the components and files that are used to assemble and release the physical system. The deployment view of a system encompasses the nodes that form the system's hardware topology on which the system executes.

5.4 USE CASE DIAGRAM


USE CASE: A use case is a description of a system's behaviour from a user's standpoint. It is a tried-and-true technique for gathering system requirements from a user's point of view, which is important if the goal is to build a system that real people can use. In a use case diagram, the little stick figure that corresponds to a user is called an actor, and an ellipse represents a use case.

ACTOR: An actor represents a coherent set of roles that users of a system play when interacting with its use cases. An actor participates in use cases to accomplish an overall purpose, and can represent the role of a human, a device, or another system.


Use Case Diagram for Hybrid Speech Recognition Model: the User actor is connected to the use cases SPEECH RECOGNITION, SPEECH TO TEXT and OPEN APPLICATIONS.

Use Case Diagram for Open Applications: the User actor is connected to the use cases CALCULATOR, MUSIC, RUN PROGRAMS, OPEN ACCESS, SITE EMAIL and SITE FACEBOOK.


5.5 CLASS DIAGRAM:


Class diagrams are widely used to describe the types of objects in a system and their relationships. They model class structure and contents using design elements such as classes, packages and objects, and describe three different perspectives when designing a system: conceptual, specification, and implementation. These perspectives become evident as the diagram is created and help solidify the design. Class diagrams are arguably the most used UML diagram type: the class diagram is the main building block of any object-oriented solution. It shows the classes in a system, the attributes and operations of each class, and the relationships between classes. In most modelling tools a class has three parts: the name at the top, attributes in the middle, and operations or methods at the bottom. In large systems with many classes, related classes are grouped together to create class diagrams, and the different relationships between classes are shown by different types of arrows.

Class Diagram for Hybrid Speech Recognition Model

User
  Attributes: String text; String buffer
  Operations: speechcheck()

Speech recognition
  Attributes: String notepad; String paint
  Operations: speechtotext(), loadJSGF(), openapps()
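The two classes in the diagram can be sketched as Java skeletons. Only the class, attribute and operation names come from the diagram; the method bodies below are illustrative placeholders.

```java
/** Skeleton of the User class from the class diagram. */
class User {
    String text;
    String buffer;

    /** Placeholder: checks whether an utterance was captured. */
    boolean speechcheck() {
        return text != null && !text.isEmpty();
    }
}

/** Skeleton of the Speech recognition class from the class diagram. */
class SpeechRecognition {
    String notepad;
    String paint;

    /** Placeholder: converts the captured utterance to text. */
    String speechtotext(User user) {
        return user.text;
    }

    /** Placeholder: loads the JSGF grammar of known commands. */
    void loadJSGF() { }

    /** Placeholder: opens the application named by the command. */
    void openapps(String command) { }
}
```

The skeletons make the division of responsibility explicit: the User class holds the captured input, while SpeechRecognition performs conversion and command handling.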


5.6 SEQUENCE DIAGRAM


A sequence diagram is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart. A sequence diagram shows object interactions arranged in time sequence. It depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. Sequence diagrams are typically associated with use case realizations in the Logical View of the system under development. Sequence diagrams are sometimes called event diagrams, event scenarios, and timing diagrams. A sequence diagram shows, as parallel vertical lines (lifelines), different processes or objects that live simultaneously, and, as horizontal arrows, the messages exchanged between them, in the order in which they occur. This allows the specification of simple runtime scenarios in a graphical manner.

Each column represents an object that participates in the interaction.
Labels on solid arrows represent message names and may contain arguments.
The actor initiating the process is shown in the left column.

Sequence Diagram for Hybrid Speech Recognition Model


5.7 ACTIVITY DIAGRAM:


Activity diagrams describe the workflow behavior of a system. They are similar to state diagrams because activities are the state of doing something. These describe the state of activities by showing the sequence of activities performed. Activity diagrams can show activities that are conditional or parallel.

How to Draw: Activity Diagrams


Activity diagrams show the flow of activities through the system. Diagrams are read from top to bottom and have branches and forks to describe conditions and parallel activities. A fork is used when multiple activities are occurring at the same time. All branches at some point are followed by a merge to indicate the end of the conditional behavior started by that branch.

When to Use: Activity Diagrams


Activity diagrams should be used in conjunction with other modeling techniques such as interaction diagrams and state diagrams. The main reason to use activity diagrams is to model the workflow behind the system being designed. Activity Diagrams are also useful for: analyzing a use case by describing what actions need to take place and when they should occur; describing a complicated sequential algorithm; and modeling applications with parallel processes.

Activity Diagram for Hybrid Speech Recognition Model


6. SYSTEM DEVELOPMENT ENVIRONMENT


6.1 Introduction to Java:
Java has been around since 1991, developed by a small team of Sun Microsystems developers in a project originally called the Green Project. The intent of the project was to develop a platform-independent software technology that could be used in the consumer electronics industry. The language the team created was originally called Oak. The first implementation of Oak was in a PDA-type device called Star Seven (*7) that consisted of the Oak language, an operating system called GreenOS, a user interface, and hardware. The name *7 was derived from the telephone sequence used in the team's office, which was dialled in order to answer any ringing telephone from any other phone in the office.

Around the time the FirstPerson project was floundering in consumer electronics, a new craze was gaining momentum in America: "Web surfing." The World Wide Web, a name applied to the Internet's millions of linked HTML documents, was suddenly becoming popular for use by the masses. The reason for this was the introduction of a graphical Web browser called Mosaic, developed by NCSA. The browser simplified Web browsing by combining text and graphics into a single interface, eliminating the need for users to learn many confusing UNIX and DOS commands, and navigating around the Web became much easier.

It has only been since 1994 that Oak technology has been applied to the Web. In 1994, two Sun developers created the first version of HotJava, then called WebRunner, a graphical browser for the Web that exists today. The browser was coded entirely in the Oak language, by this time called Java. Soon after, the Java compiler was rewritten in the Java language from its original C code, thus proving that Java could be used effectively as an application language. Web surfing has become an enormously popular practice among millions of computer users. Until

25

HYBRID SPEECH RECOGNITION MODEL

Java, however, the content of information on the Internet has been a bland series of HTML documents. Web users are hungry for applications that are interactive, that users can execute no matter what hardware or software platform they are using, and that travel across heterogeneous networks and do not spread viruses to their computers. Java can create such applications. The Java programming language is a high-level language that can be characterized by all of the following buzzwords: Simple Architecture neutral Object oriented Portable Distributed High performance Interpreted Multithreaded Robust Dynamic Secure

With most programming languages, you either compile or interpret a program so that you can run it on your computer. The Java programming language is unusual in that a program is both compiled and interpreted. With the compiler, you first translate a program into an intermediate language called Java bytecodes, the platform-independent codes interpreted by the interpreter on the Java platform. The interpreter parses and runs each Java bytecode instruction on the computer. Compilation happens just once; interpretation occurs each time the program is executed. The following figure illustrates how this works.
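The compile-then-interpret cycle described above can be demonstrated with a minimal class; the file and class names here are only illustrative:

```java
// HelloJava.java -- compiled once to platform-independent bytecode,
// then interpreted by the Java VM each time it runs:
//
//   javac HelloJava.java   -> produces HelloJava.class (bytecode)
//   java HelloJava         -> the Java VM interprets the bytecode
//
public class HelloJava {
    // Returns the greeting; separated out so it can be checked directly.
    static String message() {
        return "Hello from the Java platform";
    }

    // The interpreter looks for this exact entry point, much like C's main().
    public static void main(String[] args) {
        // System.out is Java's standard output stream.
        System.out.println(message());
    }
}
```

Because the `.class` file contains bytecode rather than native machine code, the same file runs unchanged on any platform with a Java VM.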

Figure 6.1: Working of Java

You can think of Java bytecodes as the machine-code instructions for the Java Virtual Machine (Java VM). Every Java interpreter, whether it's a development tool or a Web browser that can run applets, is an implementation of the Java VM. Java bytecodes help make "write once, run anywhere" possible. You can compile your program into bytecodes on any platform that has a Java compiler, and the bytecodes can then be run on any implementation of the Java VM. That means that as long as a computer has a Java VM, the same program written in the Java programming language can run on Windows 2000, a Solaris workstation, or an iMac.

6.2 The Java Platform:
A platform is the hardware or software environment in which a program runs. We've already mentioned some of the most popular platforms, like Windows 2000, Linux, Solaris, and MacOS. Most platforms can be described as a combination of the operating system and hardware. The Java platform differs from most other platforms in that it's a software-only platform that runs on top of other, hardware-based platforms. The Java platform has two components:
The Java Virtual Machine (Java VM)
The Java Application Programming Interface (Java API)


The Java VM is the base for the Java platform and is ported onto various hardware-based platforms. The Java API is a large collection of ready-made software components that provide many useful capabilities, such as graphical user interface (GUI) widgets. The Java API is grouped into libraries of related classes and interfaces; these libraries are known as packages. The following figure depicts a program that's running on the Java platform. As the figure shows, the Java API and the virtual machine insulate the program from the hardware.

Figure 6.2: The Java Platform

Native code is code that, after compilation, runs on a specific hardware platform. As a platform-independent environment, the Java platform can be a bit slower than native code. However, smart compilers, well-tuned interpreters, and just-in-time bytecode compilers can bring performance close to that of native code without threatening portability.

6.3 Working of Java:
For those who are new to object-oriented programming, the concept of a class will be new to you. Simplistically, a class is the definition for a segment of code that can contain both data and functions. When the interpreter executes a class, it looks for a particular method by the name of main, which will sound familiar to C programmers. The main method is passed an array of strings as a parameter (similar to the argv[] of C) and is declared as a static method. To output text from the program, execute the println method of System.out, which is Java's output stream. UNIX users will appreciate the theory behind such a stream, as it is actually standard output. For those used to the Wintel platform instead, it writes the string passed to it to the user's program output.

6.4 What Can Java Technology Do?
The most common types of programs written in the Java programming language are applets and applications. If you've surfed the Web, you're probably already familiar with applets. An applet is a program that adheres to certain conventions that allow it to run within a Java-enabled browser. However, the Java programming language is not just for writing cute, entertaining applets for the Web. The general-purpose, high-level Java programming language is also a powerful software platform. Using the generous API, you can write many types of programs. An application is a standalone program that runs directly on the Java platform. A special kind of application known as a server serves and supports clients on a network. Examples of servers are Web servers, proxy servers, mail servers, and print servers. Another specialized program is a servlet. A servlet can almost be thought of as an applet that runs on the server side. Java Servlets are a popular choice for building interactive web applications, replacing the use of CGI scripts. Servlets are similar to applets in that they are runtime extensions of applications. Instead of working in browsers, though, servlets run within Java Web servers, configuring or tailoring the server. How does the API support all these kinds of programs? It does so with packages of software components that provide a wide range of functionality. Every full implementation of the Java platform gives you the following features:
The essentials: Objects, strings, threads, numbers, input and output, data structures, system properties, date and time, and so on.
Applets: The set of conventions used by applets.
Networking: URLs, TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) sockets, and IP (Internet Protocol) addresses.


Internationalization: Help for writing programs that can be localized for users worldwide. Programs can automatically adapt to specific locales and be displayed in the appropriate language.
Security: Both low level and high level, including electronic signatures, public and private key management, access control, and certificates.
Software components: Known as JavaBeans, these can plug into existing component architectures.
Object serialization: Allows lightweight persistence and communication via Remote Method Invocation (RMI).
Java Database Connectivity (JDBC): Provides uniform access to a wide range of relational databases.

The Java platform also has APIs for 2D and 3D graphics, accessibility, servers, collaboration, telephony, speech, animation, and more. The following figure depicts what is included in the Java 2 SDK.

6.5 Introduction to CMU Sphinx:


CMU Sphinx, also called Sphinx for short, is the general term for a group of speech recognition systems developed at Carnegie Mellon University. Sphinx-4, the current version, is a speech recognition engine written entirely in the Java programming language. Sphinx became one of the first high-performance speech-to-text systems, and among the first to use Hidden Markov Models (HMMs) to choose the word with the highest probability of matching the speech sounds. While the original Sphinx is no longer used, as developments have led to several updates, the latest of which is the modern Sphinx-4 system, the theory behind the first Sphinx system remains vital to understanding how speech-to-text systems work. The main concept behind Sphinx is the Hidden Markov Model, or HMM: a mathematical model for generating probabilities of change from one state to another.


An HMM attempts to determine, based on the syllables already spoken, what the next syllable will be. This is applied to words as well: the first Sphinx system kept a library of which word pairs were allowed by the language and which were not. The HMM would determine the highest probability for the next word based on which words had followed the spoken word in the past. This makes it easier for the HMM to learn how a speaker constructs sentences and phrases, as well as which words are most likely to be used in a given situation. The HMM approach used by Sphinx ensured, for the first time in the history of large-vocabulary speech-to-text systems, a very high accuracy rate.
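The word-pair idea above can be sketched as a toy bigram predictor that counts which word follows which and predicts the most frequent successor. This is a much-simplified illustration, not the actual Sphinx implementation; all class and method names are invented:

```java
import java.util.HashMap;
import java.util.Map;

// Toy bigram model: records every adjacent word pair seen in training
// sentences, then predicts the most frequent successor of a given word.
public class BigramPredictor {
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Record every adjacent word pair in a training sentence.
    public void train(String sentence) {
        String[] words = sentence.toLowerCase().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            counts.computeIfAbsent(words[i], k -> new HashMap<>())
                  .merge(words[i + 1], 1, Integer::sum);
        }
    }

    // Return the successor with the highest count, or null if the word is unseen.
    public String predictNext(String word) {
        Map<String, Integer> successors = counts.get(word.toLowerCase());
        if (successors == null) return null;
        return successors.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    public static void main(String[] args) {
        BigramPredictor p = new BigramPredictor();
        p.train("open paint");
        p.train("open paint");
        p.train("open note");
        System.out.println(p.predictNext("open")); // paint (seen twice vs once)
    }
}
```

A real recognizer stores smoothed probabilities rather than raw counts, but the idea of ranking next-word candidates by observed frequency is the same.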

6.6 Capabilities of CMU Sphinx 4.0:
Sphinx 4.0 has several capabilities, like:
Live mode and batch mode speech recognizers, capable of recognizing discrete and continuous speech.
Generalized pluggable front-end architecture.
Generalized pluggable language model architecture, including pluggable language model support for ASCII and binary versions of Java Speech API Grammar Format (JSGF) and ARPA-format FST grammars.
Generalized search management, including pluggable support for breadth-first and word-pruning searches.
Generalized acoustic model architecture, including pluggable support for Sphinx-3 acoustic models.

Acoustic Models: An acoustic model is created by taking audio recordings of speech, and their text transcriptions, and using software to create statistical representations of the sounds that make up each word. It is used by a speech recognition engine to recognize speech. Currently, Sphinx-4 uses models created with SphinxTrain. SphinxTrain generates acoustic models in the format used by Sphinx-3. The two main acoustic models that are used by Sphinx-4, TIDIGITS and Wall Street Journal, are already included in the "lib" directory of the binary distribution.


Language Models: Language modelling is used in many natural language processing applications. In speech recognition, the language model tries to capture the properties of a language and to predict the next word in a speech sequence. The language model used by Sphinx-4 follows the ARPA format.

BNF-Style Grammars: Sphinx-4 uses the Java Speech API Grammar Format (JSGF) to perform speech recognition using a BNF-style grammar. Currently, you can only use JSGF grammars with the FlatLinguist.
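The ARPA format mentioned above is a plain-text listing of n-gram log10 probabilities with optional back-off weights. A hypothetical fragment, with invented values, might look like this:

```
\data\
ngram 1=3
ngram 2=2

\1-grams:
-0.4771 open    -0.3010
-0.4771 paint   -0.3010
-0.4771 note    -0.3010

\2-grams:
-0.3010 open paint
-0.3010 open note

\end\
```

Each 1-gram line gives the log probability of a word followed by its back-off weight; each 2-gram line gives the log probability of a word pair.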

6.7 Overview of HMM Based Speech Recognition Systems:


Sphinx-4 is an HMM-based speech recognizer. HMM stands for Hidden Markov Model, a type of statistical model. In HMM-based speech recognizers, each unit of sound (usually called a phoneme) is represented by a statistical model that represents the distribution of all the evidence (data) for that phoneme. This is called the acoustic model for that phoneme. When creating an acoustic model, the speech signals are first transformed into a sequence of vectors that represent certain characteristics of the signal, and the parameters of the acoustic model are then estimated using these vectors (usually called features). This process is called training the acoustic models. During speech recognition, features are derived from the incoming audio in the same way as in the training process. The component of the recognizer that generates these features is called the front end. These live features are scored against the acoustic model; the score obtained indicates how likely it is that a particular set of features (extracted from live audio) belongs to the phoneme of the corresponding acoustic model. The process of speech recognition is to find the best possible sequence of words (or units) that fits the given input speech. It is a search problem, and in the case of HMM-based recognizers, a graph search problem. The graph represents all possible sequences of phonemes in the entire language of the task under consideration, and is typically composed of the HMMs of sound units concatenated in a guided manner, as specified by the grammar of the task.
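The graph search described above is, at its core, the Viterbi recursion: find the single most likely state sequence given the observations. The sketch below runs it on a toy two-state HMM; all model numbers are invented for illustration, and real recognizers search a far larger graph of concatenated phoneme HMMs:

```java
// Minimal Viterbi decoder over a toy HMM: finds the most likely state
// sequence for an observation sequence via dynamic programming.
public class Viterbi {
    public static int[] decode(double[] start, double[][] trans,
                               double[][] emit, int[] obs) {
        int n = start.length, t = obs.length;
        double[][] v = new double[t][n];   // best path probability so far
        int[][] back = new int[t][n];      // back-pointers for path recovery

        for (int s = 0; s < n; s++) v[0][s] = start[s] * emit[s][obs[0]];

        for (int i = 1; i < t; i++) {
            for (int s = 0; s < n; s++) {
                for (int prev = 0; prev < n; prev++) {
                    double p = v[i - 1][prev] * trans[prev][s] * emit[s][obs[i]];
                    if (p > v[i][s]) { v[i][s] = p; back[i][s] = prev; }
                }
            }
        }

        // Pick the best final state, then walk the back-pointers to the start.
        int[] path = new int[t];
        for (int s = 1; s < n; s++)
            if (v[t - 1][s] > v[t - 1][path[t - 1]]) path[t - 1] = s;
        for (int i = t - 1; i > 0; i--) path[i - 1] = back[i][path[i]];
        return path;
    }

    public static void main(String[] args) {
        double[] start = {0.6, 0.4};
        double[][] trans = {{0.7, 0.3}, {0.4, 0.6}};
        double[][] emit = {{0.9, 0.1}, {0.1, 0.9}};  // state 0 favours obs 0
        int[] path = decode(start, trans, emit, new int[]{0, 0, 1});
        System.out.println(java.util.Arrays.toString(path)); // [0, 0, 1]
    }
}
```

Production decoders work in log space to avoid underflow and prune unlikely paths (beam search), but the recurrence is the same.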


7. SAMPLE CODE
package edu.cmu.sphinx.demo.helloworld;

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class HelloWorld {

    public static void main(String[] args) throws IOException {
        ConfigurationManager cm;

        if (args.length > 0) {
            cm = new ConfigurationManager(args[0]);
        } else {
            cm = new ConfigurationManager(
                    HelloWorld.class.getResource("helloworld.config.xml"));
        }

        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.out.println("Cannot start microphone.");
            recognizer.deallocate();
            System.exit(1);
        }

        System.out.println("Say: ( this is a | speech project | designed in java"
                + " | uses sphinx to | convert the speech | into text format )");

        while (true) {
            System.out.println("Start speaking. Press Ctrl-C to quit.\n");

            Result result = recognizer.recognize();

            if (result != null) {
                String resultText = result.getBestFinalResultNoFiller();
                // Append the recognized text to a file on disk.
                FileOutputStream fos = new FileOutputStream("d:\\sptext.doc", true);
                BufferedOutputStream bos = new BufferedOutputStream(fos, 1024);
                bos.write(resultText.getBytes());
                bos.close();
                System.out.println("You said: " + resultText + '\n');
            } else {
                System.out.println("I can't hear what you said.\n");
            }
        }
    }
}
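The file-append step in the listing opens and closes a fresh stream on every recognized utterance and would leak the stream if the write failed. A safer variant (a sketch, with a hypothetical class name and output path) uses try-with-resources so the writer is always closed:

```java
import java.io.FileWriter;
import java.io.IOException;

// Append one recognized utterance per line; try-with-resources guarantees
// the writer is closed even when an exception occurs mid-write.
public class TranscriptWriter {
    public static void append(String path, String text) throws IOException {
        try (FileWriter out = new FileWriter(path, true)) { // true = append mode
            out.write(text);
            out.write(System.lineSeparator());
        }
    }

    public static void main(String[] args) throws IOException {
        append("sptext.txt", "this is a speech project");
    }
}
```

Each recognized utterance also lands on its own line, which makes the transcript easier to read than the original byte-for-byte concatenation.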


#JSGF V1.0;

/**
 * JSGF Grammar for Speech to text example
 */

grammar hello;

public <greet> = ( this is a | speech project | designed in java | uses sphinx | to convert the speech | into text format );

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;
import edu.cmu.sphinx.util.props.PropertyException;

import java.io.File;
import java.io.IOException;
import java.net.URL;

public class HelloWorld {

    static String resultText;

    public static void main(String[] args) throws InstantiationException {
        try {
            URL url;
            if (args.length > 0) {
                url = new File(args[0]).toURI().toURL();
            } else {
                url = HelloWorld.class.getResource("helloworld.config.xml");
            }

            System.out.println("Loading...");
            ConfigurationManager cm = new ConfigurationManager(url);
            Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
            Microphone microphone = (Microphone) cm.lookup("microphone");
            recognizer.allocate();

            if (microphone.startRecording()) {
                System.out.println("Say: ( Power Options | Music | Device Manager | Security Center | Calculator )");
                System.out.println("Say: ( open access | open note | open paint )");
                System.out.println("Say: ( run Adobe | run nero | run programs | run browser )");
                System.out.println("Say: ( site face book | site email )");

                while (true) {
                    System.out.println("Start speaking. Press Ctrl-C to quit.\n");
                    Result result = recognizer.recognize();

                    if (result != null) {
                        resultText = result.getBestFinalResultNoFiller();
                        System.out.println("You said: " + resultText + "\n");

                        // Launch the application matching the recognized phrase.
                        if (resultText.equalsIgnoreCase("Power Options")) {
                            try { Runtime.getRuntime().exec("cmd /c powercfg.cpl"); } catch (Exception ae) { }
                        } else if (resultText.equalsIgnoreCase("run Adobe")) {
                            try { Runtime.getRuntime().exec("cmd /c start photoshop"); } catch (Exception ae) { }
                        } else if (resultText.equalsIgnoreCase("calculator")) {
                            try { Runtime.getRuntime().exec("cmd /c calc"); } catch (Exception ae) { }
                        } else if (resultText.equalsIgnoreCase("Security Center")) {
                            try { Runtime.getRuntime().exec("cmd /c wscui.cpl"); } catch (Exception ae) { }
                        } else if (resultText.equalsIgnoreCase("Music")) {
                            try { Runtime.getRuntime().exec("cmd /c start wmplayer"); } catch (Exception ae) { }
                        } else if (resultText.equalsIgnoreCase("run Programs")) {
                            try { Runtime.getRuntime().exec("cmd /c start appwiz.cpl"); } catch (Exception ae) { }
                        } else if (resultText.equalsIgnoreCase("open paint")) {
                            try { Runtime.getRuntime().exec("cmd /c start mspaint"); } catch (Exception ae) { }
                        } else if (resultText.equalsIgnoreCase("run Browser")) {
                            try { Runtime.getRuntime().exec("C:\\Program Files\\Mozilla Firefox\\firefox.exe"); } catch (Exception ae) { }
                        } else if (resultText.equalsIgnoreCase("site email")) {
                            try {
                                String[] st = {"C:\\Program Files\\Mozilla Firefox\\firefox.exe", "http://www.gmail.com"};
                                Runtime.getRuntime().exec(st);
                            } catch (Exception ae) { }
                        } else if (resultText.equalsIgnoreCase("site face book")) {
                            try {
                                String[] st = {"C:\\Program Files\\Mozilla Firefox\\firefox.exe", "http://www.facebook.com"};
                                Runtime.getRuntime().exec(st);
                            } catch (Exception ae) { }
                        } else if (resultText.equalsIgnoreCase("open note")) {
                            try { Runtime.getRuntime().exec("cmd /c start notepad"); } catch (Exception ae) { }
                        } else if (resultText.equalsIgnoreCase("start word")) {
                            try { Runtime.getRuntime().exec("cmd /c start winword"); } catch (Exception ae) { }
                        } else if (resultText.equalsIgnoreCase("run nero")) {
                            try { Runtime.getRuntime().exec("cmd /c start nero"); } catch (Exception ae) { }
                        } else if (resultText.equalsIgnoreCase("open Access")) {
                            try { Runtime.getRuntime().exec("cmd /c start msaccess"); } catch (Exception ae) { }
                        } else if (resultText.equalsIgnoreCase("Device Manager")) {
                            try { Runtime.getRuntime().exec("cmd /c start devmgmt.msc"); } catch (Exception ae) { }
                        }
                    } else {
                        System.out.println("I can't hear what you said.\n");
                    }
                }
            } else {
                System.out.println("Cannot start microphone.");
                recognizer.deallocate();
                System.exit(1);
            }
        } catch (IOException e) {
            System.err.println("Problem when loading HelloWorld: " + e);
            e.printStackTrace();
        } catch (PropertyException e) {
            System.err.println("Problem configuring HelloWorld: " + e);
            e.printStackTrace();
        }
    }
}

#JSGF V1.0;

/**
 * JSGF Grammar for Open Applications example
 */

grammar hello;

public <greet> = ( Power Options | Music | Device Manager | Security Center | calculator );

public <command> = ( Open ) ( access | note | paint );

public <action> = ( run ) ( Adobe | nero | Programs | Browser );

public <net> = ( site ) ( face book | email );
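The long if/else chain in the listing above can be condensed into a lookup table keyed by the grammar's phrases. This is a sketch of the same idea with an invented class name; the exec call is left commented out so the mapping can be inspected without launching anything:

```java
import java.util.HashMap;
import java.util.Map;

// Maps recognized phrases (the grammar's terminals) to the Windows shell
// commands the sample code runs via Runtime.exec(). Keys are stored in
// lower case so lookups can ignore the recognizer's casing.
public class CommandDispatcher {
    private static final Map<String, String> COMMANDS = new HashMap<>();
    static {
        COMMANDS.put("power options",   "cmd /c powercfg.cpl");
        COMMANDS.put("music",           "cmd /c start wmplayer");
        COMMANDS.put("device manager",  "cmd /c start devmgmt.msc");
        COMMANDS.put("security center", "cmd /c wscui.cpl");
        COMMANDS.put("calculator",      "cmd /c calc");
        COMMANDS.put("open access",     "cmd /c start msaccess");
        COMMANDS.put("open note",       "cmd /c start notepad");
        COMMANDS.put("open paint",      "cmd /c start mspaint");
    }

    // Return the shell command for a recognized phrase, or null if unknown.
    public static String lookup(String phrase) {
        return COMMANDS.get(phrase.toLowerCase());
    }

    public static void main(String[] args) {
        String cmd = lookup("Open Paint");
        System.out.println(cmd); // cmd /c start mspaint
        // Runtime.getRuntime().exec(cmd); // would launch Paint on Windows
    }
}
```

Adding a new voice command then only requires a new grammar alternative and one map entry, rather than another else-if branch.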


8. SCREEN SHOTS

Screen shot for speech to text conversion


Screen shot for music


Screen shot for power options


Screen shot for device manager


Screen shot for security center


Screen shot for calculator


Screen shot for Microsoft access


Screen shot for notepad


Screen shot for paint


Screen shot for adobe photoshop


Screen shot for nero express essentials


Screen shot for run programs


Screen shot for run browser


Screen shot for site face book


Screen shot for site email


9. SYSTEM TESTING
9.1 TEST OBJECTIVES
Software testing is the process used to help identify the correctness, completeness, security, and quality of developed computer software. Testing is a process of technical investigation, performed on behalf of stakeholders, that is intended to reveal quality-related information about the product with respect to the context in which it is intended to operate. This includes, but is not limited to, the process of executing a program or application with the intent of finding errors. Quality is not an absolute; it is value to some person. With that in mind, testing furnishes a criticism or comparison that compares the state and behaviour of the product against a specification. An important point is that software testing should be distinguished from the separate discipline of software quality assurance (SQA), which encompasses all business process areas, not just testing. There are many approaches to software testing, but effective testing of a complex product is essentially a process of investigation, not merely a matter of creating and following routine procedure. One definition of testing is "the process of questioning a product in order to evaluate it", where the questions are operations the tester attempts to execute with the product, and the product answers with its behaviour in reaction to the probing of the tester. Although most of the intellectual processes of testing are nearly identical to those of review or inspection, the word testing also connotes the dynamic analysis of the product: putting the product through its paces. Some of the common quality attributes include capability, reliability, efficiency, portability, maintainability, compatibility, and usability. A good test is sometimes described as one which reveals an error; however, more recent thinking suggests that a good test is one which reveals information of interest to someone who matters within the project community.


Introduction
Software testing is a critical element of software quality assurance and represents the ultimate review of specification, design, and coding. In fact, testing is the one step in the software engineering process that could be viewed as destructive rather than constructive. A strategy for software testing integrates software test case design methods into a well-planned series of steps that result in the successful construction of software. Testing is the set of activities that can be planned in advance and conducted systematically. The underlying motivation of program testing is to affirm software quality with methods that can be applied economically and effectively to both large and small-scale systems. The best way to minimize errors is to detect and remove them during analysis and design, so that few errors enter the source code. There are three major categories of software errors:
Requirement errors
Design errors
Implementation errors
The following testing objectives are to be remembered:
Testing is a process of executing a program with the intent of identifying an error.
A good test case is one that has a high probability of finding an undiscovered error.
A successful test case is one that uncovers an error that is yet undiscovered.


The software engineering process can be viewed as a spiral. Initially, system engineering defines the role of software and leads to software requirements analysis, where the information domain, functions, behaviour, performance, constraints, and validation criteria for software are established. Moving inward along the spiral, we come to design and finally to coding. To develop computer software, we spiral in along streamlines that decrease the level of abstraction on each turn. A strategy for software testing may also be viewed in the context of the spiral. Unit testing begins at the vertex of the spiral and concentrates on each unit of the software as implemented in source code. Testing progresses by moving outward along the spiral to integration testing, where the focus is on the design and construction of the software architecture. Taking another turn outward on the spiral, we encounter validation testing, where requirements established as part of software requirements analysis are validated against the software that has been constructed. Finally, we arrive at system testing, where the software and other system elements are tested as a whole. The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, subassemblies, assemblies, and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of test; each test type addresses a specific testing requirement.

9.2 TYPES OF TESTS


The separation of debugging from testing was initially introduced by Glenford J. Myers in his 1979 book The Art of Software Testing. Although his attention was on breakage testing, it illustrated the desire of the software engineering community to separate fundamental development activities, such as debugging, from that of verification. In 1988, Dave Gelperin and William C. Hetzel classified the phases and goals of software testing as follows: until 1956 was the debugging-oriented period, when testing was often associated with debugging and there was no clear difference between the two. From 1957 to 1978 was the demonstration-oriented period, in which debugging and testing were now distinguished; in this period it was shown that software satisfies the requirements. The time between 1979 and 1982 is known as the destruction-oriented period, when the goal was to find errors. 1983 to 1987 is classified as the evaluation-oriented period, in which tests were intended to demonstrate that software satisfies its specification, to detect faults, and to prevent faults. Hetzel also wrote the book The Complete Guide to Software Testing. Both works were pivotal to today's testing culture and remain a consistent source of reference. Gelperin and Jerry E. Durant also went on to develop High Impact Inspection technology, which builds upon traditional inspections but utilizes a test-driven additive.

9.2.1 White Box and Black Box Testing:


White Box Testing is testing in which the software tester has knowledge of the inner workings, structure, and language of the software, or at least its purpose. It is used to test areas that cannot be reached from a black box level. Black Box Testing is testing the software without any knowledge of the inner workings, structure, or language of the module being tested. Black box tests, like most other kinds of tests, must be written from a definitive source document, such as a specification or requirements document. It is testing in which the software under test is treated as a black box: you cannot see into it. The test provides inputs and responds to outputs without considering how the software works.

9.2.2 Unit testing:


Unit testing is usually conducted as part of a combined code and unit test phase of the software lifecycle, although it is not uncommon for coding and unit testing to be conducted as two distinct phases.

9.2.3 Integration testing:


Software integration testing is the incremental integration testing of two or more integrated software components on a single platform to produce failures caused by interface defects. The task of the integration test is to check that components or software applications (e.g. components in a software system or, one step up, software applications at the company level) interact without error.

9.2.4 System Testing:


System testing ensures that the entire integrated software system meets requirements. It tests a configuration to ensure known and predictable results. An example of system testing is the configuration oriented system integration test. System testing is based on process descriptions and flows, emphasizing pre-driven process links and integration points.

9.2.5 Acceptance testing:


User Acceptance Testing is a critical phase of any project and requires significant participation by the end user. It also ensures that the system meets the functional requirements.

9.2.6 Regression Testing:


Regression testing refers to the repetition of earlier successful tests to ensure that changes made to the software have not introduced new bugs or side effects. In recent years the term grey box testing has come into common usage. The typical grey box tester is permitted to set up or manipulate the testing environment, such as seeding a database, and can view the state of the product after his actions, such as performing a SQL query on the database to be certain of the values of columns. Grey box testing is used almost exclusively by client-server testers or others who use a database as a repository of information, but can also apply to a tester who has to manipulate XML files (a DTD or an actual XML file) or configuration files directly. It can also apply to testers who know the internal workings or algorithm of the software under test and can write tests specifically for the anticipated results. For example, testing a data warehouse implementation involves loading the target database with information and verifying the correctness of data population and the loading of data into the correct tables.

9.3 TEST CASE RESULT


S.No | Test Case Description | Action | Expected Output
1 | Give a speech signal through the microphone | Speech is converted into text format | Resultant text is displayed in a Word document
2 | Say: Power Options | "cmd /c powercfg.cpl" gets invoked | Power Options window gets opened
3 | Say: Music | "cmd /c start wmplayer" gets invoked | Windows Media Player gets started
4 | Say: Device Manager | "cmd /c start devmgmt.msc" gets invoked | Device Manager window gets opened
5 | Say: Security Center | "cmd /c wscui.cpl" gets invoked | Windows Security Center gets opened
6 | Say: Calculator | "cmd /c calc" gets invoked | Calculator gets opened
7 | Say: Open Access | "cmd /c start msaccess" gets invoked | Microsoft Office Access gets opened
8 | Say: Open Note | "cmd /c start notepad" gets invoked | Notepad gets opened
9 | Say: Open Paint | "cmd /c start mspaint" gets invoked | Paint gets opened
10 | Say: Run Adobe | "cmd /c start photoshop" gets invoked | Adobe Photoshop gets opened
11 | Say: Run Nero | "cmd /c start nero" gets invoked | Nero Express Essentials gets opened
12 | Say: Run Programs | "cmd /c start appwiz.cpl" gets invoked | Programs and Features window gets opened
13 | Say: Run Browser | "C:\Program Files\Mozilla Firefox\firefox.exe" gets invoked | Mozilla Firefox gets opened
14 | Say: Site Face Book | firefox.exe with http://www.facebook.com gets invoked | www.facebook.com gets opened
15 | Say: Site Email | firefox.exe with http://www.gmail.com gets invoked | www.gmail.com gets opened

All the test cases mentioned above passed successfully; no defects were encountered.


10. CONCLUSION
Speech recognition is one of the most intriguing areas of machine intelligence, since humans perform speech recognition daily. Whether due to technological curiosity to build machines that mimic humans or a desire to automate work with machines, research in speech and speaker recognition, as a first step toward natural human-machine communication, has attracted much enthusiasm over the past five decades. We have also encountered a number of practical limitations which hinder widespread deployment of applications and services. In most speech recognition tasks, human subjects produce one to two orders of magnitude fewer errors than machines, and there is now increasing interest in finding ways to bridge this performance gap.

The advantages of our system in comparison to the more sophisticated approaches mentioned above are as follows. The project provides sufficiently accurate speech detection as a front end for ASR systems. Our approach is computationally efficient and relatively simple to implement, requiring no deep knowledge of speech recognition internals or sophisticated classifiers such as HMMs, GMMs, or LDA. It is therefore valuable for groups that lack background knowledge in speech recognition but aim to build a robust recognition system in a restricted domain.

What we know about human speech processing is still very limited. Although the engineering areas of investigation are important, the most significant advances will come from studies in acoustic phonetics, speech perception, linguistics, and psychoacoustics.

10.1 FUTURE WORK


Considerable effort should go into automating the translation of human language. The ability to translate human language would alleviate many of the barriers associated with global communication and has therefore captured the attention of various sectors. At some point in the future, speech recognition may become speech understanding: the statistical models that allow computers to decide what a person just said may someday allow them to grasp the meaning behind the words. Future systems will need an efficient way of representing, storing, and retrieving the knowledge required for natural conversation. Although this is a huge leap in terms of computational power and software sophistication, some researchers argue that speech recognition development offers the most direct line from the computers of today to true artificial intelligence. Speech recognition has attracted scientists as an important discipline, has created a technological impact on society, and is expected to flourish further in this area of human-machine interaction.


11. BIBLIOGRAPHY
1. Markoff, John; "Talking to Machines: Progress Is Speeded"; The New York Times, Business Technology; July 6, 1988.
2. Singh, Rita; "The Sphinx Speech Recognition Systems"; Encyclopedia of Human Computer Interaction; 2004.
3. Sriharuksa, Janwit; "An ASIC Design of Real Time Speech Recognition" (Master's research study, Asian Institute of Technology, 2002); Bangkok: Asian Institute of Technology; 2002.
4. Willie Walker, Paul Lamere, Philip Kwok, Bhiksha Raj, Rita Singh, Evandro Gouvea, Peter Wolf, Joe Woelfel; "Sphinx-4: A Flexible Open Source Framework for Speech Recognition"; Sun Microsystems Inc., Tech. Rep. TR-2004-139; 2004.
5. Terry Thompson; "Tech Tips: Are You Talking To Your Computer Again?"; Disabilities, Opportunities, Internetworking, and Technology, University of Washington; 2006.
6. R. Stuckless; "Recognition means more than just getting the words right: Beyond accuracy to readability"; Speech Technology, Oct./Nov. 1999, pp. 30-35; 1999.
7. Michael F. McTear; "Spoken dialogue technology: enabling the conversational user interface"; ACM Computing Surveys, Volume 34, pp. 90-169; 2002.

