

ABSTRACT
Our aim is to provide the computer with a natural interface, including the ability to understand human speech. For this purpose, we propose a way to control the computer system, specifically Windows 8, with voice commands. First, the user issues a command by voice through the microphone; the software of the proposed system then takes over to recognize the command. If the recognition succeeds, i.e. the utterance matches one of the defined voice commands, the system performs the operation corresponding to the speaker's command. In the proposed system we use the Microsoft Speech SDK for the voice recognition process and Voice-XML for creating the voice grammar in the software part. It has the flexibility to work with the speech of any user.

Keywords--

Dynamic Programming Algorithm, Hidden Markov Model, Microphone, Microsoft Speech SDK, Phonemes, Speech Recognition, Voice-XML, Windows 8.

1. INTRODUCTION:
In recent years the quality and performance of speech-based human-machine interaction have improved steadily. The next generation of speech-based interface technology will enable easy-to-use automation of new and existing communication services, making human-machine interaction more natural. For disabled people, the absence of suitable databases and the diversity of articulatory handicaps are major obstacles to building reliable speech recognition systems, which explains the scarcity of commercial speech recognition systems for disabled people. If a person finds it difficult or is not capable of handling the mouse and keyboard, or if the keyboard or mouse is faulty, there have to be other ways to operate the operating system; speech may act as one of them. There is a growing demand for systems capable of controlling an operating system using only the voice commands given by a person. This paper presents a way to control the OS by voice command. It also proves fruitful for surgeons who, while operating on a patient, need to retrieve the patient's previous records from the computer's database, and it is applicable to consumer electronics including games, mobile phones, vehicle navigation, spoken ticket reservations, and so on. As Windows 8 is about to be released, we are creating speech control for Windows 8.

Speech recognition (in many contexts also known as automatic speech recognition, computer speech recognition or erroneously as voice recognition) is the process of converting a speech signal to a sequence of words, by means of an algorithm implemented as a computer program. Speech recognition applications that have emerged over the last few years include voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), domotic appliances control and content-based spoken audio search (e.g. find a podcast where particular words were spoken). Voice recognition or speaker recognition is a related process that attempts to identify the person speaking, as opposed to what is being said.

DEFINING THE PROBLEM

Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words. The recognized words can be the final result, as in applications such as command and control, data entry, and document preparation. They can also serve as the input to further linguistic processing in order to achieve speech understanding. Speech recognition systems can be characterized by many parameters, some of the more important of which are shown in Table 1.1. An isolated-word speech recognition system requires that the speaker pause briefly between words, whereas a continuous speech recognition system does not. Spontaneous, or extemporaneously generated, speech contains disfluencies and is much more difficult to recognize than speech read from a script. Some systems require speaker enrollment: a user must provide samples of his or her speech before using them. Other systems are said to be speaker-independent, in that no enrollment is necessary. Some of the other parameters depend on the specific task. Recognition is generally more difficult when vocabularies are large or contain many similar-sounding words. When speech is produced as a sequence of words, language models or artificial grammars are used to restrict the combinations of words. The simplest language model can be specified as a finite-state network, where the permissible words following each word are given explicitly. More general language models approximating natural language are specified in terms of a context-sensitive grammar. One popular measure of the difficulty of a task, combining the vocabulary size and the language model, is perplexity, loosely defined as the geometric mean of the number of words that can follow a word after the language model has been applied.
Finally, there are some external parameters that can affect speech recognition system performance, including the characteristics of the environmental noise and the type and placement of the microphone.

Parameter        Range
Speaking mode    Isolated words to continuous speech
Speaking style   Read speech to spontaneous speech
Enrollment       Speaker-dependent to speaker-independent
Vocabulary       Small (< 20 words) to large (> 20,000 words)
Language model   Finite-state to context-sensitive
Perplexity       Small (< 10) to large (> 100)
SNR              High (> 30 dB) to low (< 10 dB)
Transducer       Voice-cancelling microphone to telephone

Table 1.1: Typical parameters used to characterize the capability of speech recognition systems
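The perplexity measure described above can be made concrete with a small sketch. This is an illustrative Python example (the project itself is written in C#); the command grammar below is invented for demonstration, and perplexity is computed as the geometric mean of the branching factor at each word.

```python
import math

def grammar_perplexity(grammar):
    """Geometric mean of the number of words that may follow each word
    in a finite-state command grammar (a loose perplexity measure)."""
    branching = [len(following) for following in grammar.values()]
    return math.exp(sum(math.log(b) for b in branching) / len(branching))

# Hypothetical command grammar: each word maps to the words allowed after it.
grammar = {
    "open":  ["notepad", "browser", "explorer", "calculator"],
    "close": ["window", "application"],
    "show":  ["desktop", "files"],
}

print(round(grammar_perplexity(grammar), 2))  # geometric mean of 4, 2, 2
```

A small perplexity (here about 2.5) means that at each point only a few words are possible, which is what makes restricted command-and-control grammars much easier to recognize than open dictation.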

Speech recognition is a difficult problem, largely because of the many sources of variability associated with the signal. First, the acoustic realizations of phonemes, the smallest sound units of which words are composed, are highly dependent on the context in which they appear. These phonetic variabilities are exemplified by the acoustic differences of the phoneme /t/ in two, true, and butter in American English. At word boundaries, contextual variations can be quite dramatic, making gas shortage sound like gash shortage in American English, and devo andare sound like devandare in Italian. Second, acoustic variabilities can result from changes in the environment as well as in the position and characteristics of the transducer. Third, within-speaker variabilities can result from changes in the speaker's physical and emotional state, speaking rate, or voice quality. Finally, differences in sociolinguistic background, dialect, and vocal tract size and shape can contribute to cross-speaker variabilities. Figure 1.1 shows the major components of a typical speech recognition system. The digitized speech signal is first transformed into a set of useful measurements or features at a fixed rate, typically once every 10-20 msec. These measurements are then used to search for the most likely word candidates, making use of constraints imposed by the acoustic, lexical, and language models. Throughout this process, training data are used to determine the values of the model parameters.

Figure 1.1: Components of a typical speech recognition system.
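The "fixed rate, typically once every 10-20 msec" step can be sketched directly. The following Python illustration (the project's own code is C#) splits a digitized signal into overlapping analysis frames; the 25 ms window and 10 ms hop at 16 kHz are typical textbook values, not taken from this paper.

```python
def frame_signal(samples, sample_rate=16000, win_ms=25, hop_ms=10):
    """Split a digitized signal into overlapping analysis frames.
    One feature vector is later computed per frame."""
    win = int(sample_rate * win_ms / 1000)   # samples per window (400 here)
    hop = int(sample_rate * hop_ms / 1000)   # samples per hop (160 here)
    frames = []
    for start in range(0, len(samples) - win + 1, hop):
        frames.append(samples[start:start + win])
    return frames

one_second = [0.0] * 16000                   # one second of silence
frames = frame_signal(one_second)
print(len(frames), len(frames[0]))           # number of frames, frame length
```

Each of these frames is then converted into a feature vector (for example, cepstral coefficients), so a one-second utterance yields on the order of a hundred measurement vectors for the search to work with.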

Speech recognition systems attempt to model the sources of variability described above in several ways. At the level of signal representation, researchers have developed representations that emphasize perceptually important, speaker-independent features of the signal and de-emphasize speaker-dependent characteristics. At the acoustic-phonetic level, speaker variability is typically modeled using statistical techniques applied to large amounts of data. Speaker adaptation algorithms have also been developed that adapt speaker-independent acoustic models to those of the current speaker during system use. Effects of linguistic context at the acoustic-phonetic level are typically handled by training separate models for phonemes in different contexts; this is called context-dependent acoustic modeling. Word-level variability can be handled by allowing alternate pronunciations of words in representations known as pronunciation networks. Common alternate pronunciations of words, as well as effects of dialect and accent, are handled by allowing search algorithms to find alternate paths of phonemes through these networks. Statistical language models, based on estimates of the frequency of occurrence of word sequences, are often used to guide the search through the most probable sequence of words. The dominant recognition paradigm of the past fifteen years is known as hidden Markov models (HMM). An HMM is a doubly stochastic model, in which both the generation of the underlying phoneme string and the frame-by-frame surface acoustic realizations are represented probabilistically as Markov processes. Neural networks have also been used to estimate the frame-based scores; these scores are then integrated into HMM-based system architectures, in what has come to be known as hybrid systems. An interesting feature of frame-based HMM systems is that speech segments are identified during the search process, rather than explicitly. An alternative approach is to first identify speech segments, then classify the segments and use the segment scores to recognize words. This approach has produced competitive recognition performance in several tasks.

2. LITERATURE SURVEY:
HIDDEN MARKOV MODEL (HMM)-BASED SPEECH RECOGNITION:

Modern general-purpose speech recognition systems are generally based on hidden Markov models (HMMs). An HMM is a statistical model which outputs a sequence of symbols or quantities. One reason HMMs are used in speech recognition is that a speech signal can be viewed as a piecewise stationary signal or a short-time stationary signal: over a short time span, on the order of 10 milliseconds, speech can be approximated as a stationary process. Speech can thus be thought of as a Markov model over many stochastic processes (known as states). Another reason HMMs are popular is that they can be trained automatically and are simple and computationally feasible to use. In speech recognition, to give the very simplest setup possible, the hidden Markov model outputs a sequence of n-dimensional real-valued vectors, with n around, say, 13, emitting one of these every 10 milliseconds. The vectors, again in the very simplest case, consist of cepstral coefficients, which are obtained by taking a Fourier transform of a short-time window of speech, decorrelating the spectrum using a cosine transform, and then taking the first (most significant) coefficients. The hidden Markov model tends to have, in each state, a statistical distribution called a mixture of diagonal-covariance Gaussians, which gives a likelihood for each observed vector. Each word, or (for more general speech recognition systems) each phoneme, has a different output distribution; a hidden Markov model for a sequence of words or phonemes is made by concatenating the individually trained hidden Markov models for the separate words and phonemes. Described above are the core elements of the most common, HMM-based approach to speech recognition. Modern speech recognition systems use various combinations of a number of standard techniques to improve results over the basic approach described above.
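The cepstral-coefficient recipe above (Fourier transform of a short window, log magnitude, cosine transform, keep the first coefficients) can be sketched in a few lines. This Python illustration uses a naive DFT and a hand-written DCT-II so it stays self-contained; real systems add mel filterbanks and use FFTs, so this shows only the bare idea.

```python
import math

def cepstral_coefficients(window, n_coeffs=13):
    """Toy cepstral analysis: DFT -> log magnitude spectrum -> DCT-II,
    keeping the first (most significant) coefficients."""
    n = len(window)
    # Magnitude spectrum via a naive DFT (fine for a tiny illustration).
    mags = []
    for k in range(n // 2):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(window))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(window))
        mags.append(math.sqrt(re * re + im * im))
    log_spec = [math.log(m + 1e-10) for m in mags]     # log magnitude
    # DCT-II of the log spectrum decorrelates it; keep the first coefficients.
    m = len(log_spec)
    return [sum(s * math.cos(math.pi * c * (i + 0.5) / m)
                for i, s in enumerate(log_spec))
            for c in range(n_coeffs)]

# One 64-sample analysis window containing a pure tone.
window = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
coeffs = cepstral_coefficients(window)
print(len(coeffs))  # a 13-dimensional feature vector for this frame
```

One such 13-dimensional vector is produced per frame, which is exactly the observation sequence the HMM states then score with their Gaussian mixtures.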
A typical large-vocabulary system would need context dependency for the phones (so phones with different left and right context have different realizations as HMM states); it would use cepstral normalization to normalize for different speakers and recording conditions; for further speaker normalization it might use vocal tract length normalization (VTLN) for male-female normalization and maximum likelihood linear regression (MLLR) for more general speaker adaptation. The features would have so-called delta and delta-delta coefficients to capture speech dynamics and, in addition, might use heteroscedastic linear discriminant analysis (HLDA); or the system might skip the delta and delta-delta coefficients and use splicing and an LDA-based projection, followed perhaps by heteroscedastic linear discriminant analysis or a global semi-tied covariance transform (also known as maximum likelihood linear transform, or MLLT). Many systems use so-called discriminative training techniques, which dispense with a purely statistical approach to HMM parameter estimation and instead optimize some classification-related measure of the training data. Examples are maximum mutual information (MMI), minimum classification error (MCE), and minimum phone error (MPE). Decoding of the speech (the term for what happens when the system is presented with a new utterance and must compute the most likely source sentence) would probably use the Viterbi algorithm to find the best path. Here there is a choice between dynamically creating a combined hidden Markov model, which includes both the acoustic and language model information, and combining it statically beforehand (the finite state transducer, or FST, approach).
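The Viterbi decoding mentioned above can be shown on a deliberately tiny model. This is an illustrative Python sketch (not the project's C# code); the two states and all probabilities below are invented solely to show how the best state path is found.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Viterbi search: for each observation, keep the most probable path
    ending in each state, then return the overall best path and its score."""
    best = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        layer = {}
        for s in states:
            prob, path = max(
                (best[-1][prev][0] * trans_p[prev][s] * emit_p[s][o],
                 best[-1][prev][1] + [s])
                for prev in states)
            layer[s] = (prob, path)
        best.append(layer)
    return max(best[-1].values())

# A made-up two-state model: silence vs. speech, observing frame loudness.
states = ["sil", "speech"]
start_p = {"sil": 0.8, "speech": 0.2}
trans_p = {"sil": {"sil": 0.7, "speech": 0.3},
           "speech": {"sil": 0.2, "speech": 0.8}}
emit_p = {"sil": {"quiet": 0.9, "loud": 0.1},
          "speech": {"quiet": 0.3, "loud": 0.7}}

prob, path = viterbi(["quiet", "loud", "loud"], states, start_p, trans_p, emit_p)
print(path)  # most likely hidden-state sequence
```

In a real decoder the states are context-dependent phone HMM states, the emissions are Gaussian-mixture likelihoods of feature vectors, and the language model reweights the transitions, but the dynamic-programming recursion is the same.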

NEURAL NETWORK-BASED SPEECH RECOGNITION:

Another approach in acoustic modeling is the use of neural networks. Neural networks are capable of solving much more complicated recognition tasks, but do not scale as well as HMMs when it comes to large vocabularies. Rather than being used in general-purpose speech recognition applications, they excel at handling low-quality, noisy data and at speaker independence. Such systems can achieve greater accuracy than HMM-based systems, as long as there is sufficient training data and the vocabulary is limited. A more general approach using neural networks is phoneme recognition. This is an active field of research, and generally the results are better than for HMMs. There are also NN-HMM hybrid systems that use the neural network part for phoneme recognition and the hidden Markov model part for language modeling.

DYNAMIC TIME WARPING (DTW)-BASED SPEECH RECOGNITION:

Dynamic time warping is an approach that was historically used for speech recognition but has now largely been displaced by the more successful HMM-based approach. Dynamic time warping is an algorithm for measuring similarity between two sequences which may vary in time or speed. For instance, similarities in walking patterns would be detected even if in one video the person was walking slowly and in another was walking more quickly, or even if there were accelerations and decelerations during the course of one observation. DTW has been applied to video, audio, and graphics; indeed, any data which can be turned into a linear representation can be analyzed with DTW. A well-known application has been automatic speech recognition, to cope with different speaking speeds. In general, DTW is a method that allows a computer to find an optimal match between two given sequences (e.g., time series) with certain restrictions, i.e., the sequences are "warped" non-linearly to match each other. This sequence alignment method is often used in the context of hidden Markov models.

LIMITATIONS:

A natural voice recognition system faces a major drawback in recognizing spontaneous speech, namely hesitations and out-of-vocabulary words. An efficient dialogue design can greatly improve the performance of the voice interface, and people should be trained in how the commands should be pronounced so as to get accurate results. This software will prove to be a boon to people who are physically disabled and are unable to use a mouse and keyboard as external input devices. If the ports of the mouse and keyboard do not work properly, one can also operate the operating system using this software, which may save the cost of purchasing a mouse and keyboard. However, the speech recognition engine may pick up unwanted signals, i.e. noise in the environment, which are not part of the command. Because of such unwanted signals, a command sometimes cannot be recognized properly and is executed incorrectly. This is the main limitation of this software.

3. PROJECT STATEMENT:

Fig 3.1:

As shown, the user provides voice commands through the microphone. The given command is converted into electrical pulses by the microphone, and the sound card converts the electrical pulses into a digital signal. The speech recognition engine then converts the digital signal into phonemes, from which we finally get a text command, and the respective operation is performed. This procedure repeats for every voice command.

Modules:
1. Phonemes Extraction
2. HMM
3. SAPI
4. XML Database
5. Action Applier

1) Phonemes Extraction: Phonemes are linguistic units. They are the sounds that group together to form our words, although how a phoneme converts into sound depends on many factors, including the surrounding phonemes, speaker accent, and age. English uses about 44 phonemes to convey the 500,000 or so words it contains, making them a relatively good data item for speech recognition engines to work with. These phonemes are extracted by the Microsoft Speech SDK.

SOME EXAMPLES OF PHONEMES USED IN WORDS

From the extracted phonemes we get the command in text format.

2) USE OF HMM (HIDDEN MARKOV MODEL): We now have a list of phonemes extracted from the given input command. These phonemes need to be combined and converted into a word. The most common method is to use a Hidden Markov Model (HMM). A Markov model (in a speech recognition context) is basically a chain of phonemes that represents a word. The chain can branch and, if it does, is statistically balanced. HMMs function as probabilistic finite state machines: the model consists of a set of states, and its topology specifies the allowed transitions between them. At every time frame, an HMM makes a probabilistic transition from one state to another and emits a feature vector with each transition.

The use of Hidden Markov Models (HMMs) can improve word recognition accuracy because an HMM takes into account the probabilities of transitions among phonemes.

Fig 3.2: Example of HMM
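The phoneme-chain idea above can be sketched very crudely. In this illustrative Python example (the project itself uses C# and the Microsoft Speech SDK), each word is a chain of expected phonemes, and an observed phoneme either matches the expected state with an invented probability p_match or not; real HMMs instead score continuous feature vectors with Gaussian mixtures, so this only shows the scoring-and-compare step.

```python
def word_likelihood(observed, word_phonemes, p_match=0.85):
    """Score an observed phoneme string against a word's phoneme chain.
    p_match is an invented stand-in for the per-state emission probability."""
    if len(observed) != len(word_phonemes):
        return 0.0
    prob = 1.0
    for got, expected in zip(observed, word_phonemes):
        prob *= p_match if got == expected else (1 - p_match)
    return prob

# Hypothetical phoneme dictionary for two commands.
lexicon = {"open": ["ow", "p", "ah", "n"], "over": ["ow", "v", "er"]}

observed = ["ow", "p", "ah", "n"]
best = max(lexicon, key=lambda w: word_likelihood(observed, lexicon[w]))
print(best)
```

The recognizer picks the word whose chain gives the observed phonemes the highest probability, which is the essence of concatenating per-word models and comparing their scores.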

3) SAPI (SPEECH APPLICATION PROGRAMMING INTERFACE): SAPI is an interface between our application platform and the Microsoft speech engine. It provides the word formed by the HMM to our programming platform, where it is compared with the Voice-XML database. The speech recognition engine utilized by this voice-controlled system is Microsoft's speech recognition engine with the associated development kit (Microsoft Speech SDK 5.1). The recognition rate of Microsoft's speech recognition engine is not high in continuous speech mode but is extremely high in command-and-control mode. We use SAPI to implement the voice functions. SAPI provides a high-level interface between applications and speech engines. Controlling and managing various speech engines requires real-time operation technology; SAPI implements this and hides the underlying technical detail.

There are two basic types of SAPI engines: text-to-speech (TTS) systems and speech recognizers. TTS systems synthesize text strings and files into spoken audio using synthetic voices, whereas speech recognizers convert human spoken audio into readable text strings and files. A speech engine communicates with SAPI through the device driver interface (DDI) layer, and SAPI communicates with applications through the API. Using these application interfaces, voice recognition and speech synthesis software can be developed.

Dynamic Programming Algorithm: In this type of speech recognition technique, the input voice data is converted to commands. The recognition process consists of matching the incoming speech with stored commands; the command with the lowest distance measure from the input pattern is the recognized word. The best match (lowest distance measure) is found using dynamic programming. Such a system is called a Dynamic Time Warping (DTW) word recognizer. Two important concepts in DTW are:
a) Features: the information in each signal has to be represented in some manner.
b) Distances: some form of metric is used in order to obtain a match path. There are two types:
Local: a computational difference between a feature of one signal and a feature of the other.
Global: the overall computational difference between an entire signal and another signal of possibly different length.
Speech is a time-dependent process, so utterances of the same word will have different durations, and utterances of the same word with the same duration will differ in the middle, due to different parts of the word being spoken at different rates. To obtain a global distance between two speech patterns, a time-versus-time comparison must be performed using a time-time matrix. As an illustration, consider the input SsPEEhH, which is a 'noisy' version of the reference word SPEECH. The time-time matrix for this illustration is as follows:

Fig 3.3: Time-Time matrix

If D(i, j) is the global distance up to (i, j) and the local distance at (i, j) is given by d(i, j), then

D(i, j) = min[ D(i-1, j-1), D(i-1, j), D(i, j-1) ] + d(i, j)    (1)

where d(i, j) is calculated using the Euclidean distance metric

d(x, y) = ( Σj (xj - yj)² )^(1/2)    (2)

The initial condition is D(1, 1) = d(1, 1). The final global distance D(n, N) is calculated recursively from this base condition and gives the overall matching score of the reference word against the input. The input word is then recognized as the word corresponding to the reference command in the database with the lowest matching score. This algorithm has polynomial complexity O(n²v), where n is the sequence length and v is the total number of commands in our dictionary.

XML Database: The grammar of the commands used in our paper is stored in an XML file, referred to as Voice-XML, which serves as our database. The reference word used in our algorithm for comparison with the input word is taken from the Voice-XML database. When the input command matches the stored grammar, the specific operation related to the command is executed.
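The DTW recursion of equation (1) can be run directly on the SsPEEhH / SPEECH illustration. In this Python sketch the local distance d is simply 0 for matching letters and 1 otherwise, standing in for the Euclidean metric of equation (2), which in a real recognizer is computed between feature vectors rather than characters.

```python
def dtw_distance(inp, ref):
    """Global DTW distance following equation (1):
    D(i,j) = min(D(i-1,j-1), D(i-1,j), D(i,j-1)) + d(i,j)."""
    n, N = len(inp), len(ref)
    INF = float("inf")
    # D[0][0] = 0 with infinite borders enforces the D(1,1) = d(1,1) base case.
    D = [[INF] * (N + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, N + 1):
            d = 0.0 if inp[i - 1] == ref[j - 1] else 1.0   # toy local distance
            D[i][j] = min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1]) + d
    return D[n][N]

# The noisy input "SsPEEhH" against the reference command "SPEECH".
print(dtw_distance("SsPEEhH", "SPEECH"))  # 2.0: the two noisy letters
```

The recognizer would evaluate this global distance against every reference command in the Voice-XML database and pick the command with the lowest score, at the stated cost of O(n²v).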

4. SYSTEM REQUIREMENT AND SPECIFICATION:


UML DIAGRAM

DFD

DFD Level 0: (context diagram showing the System as a single process)

DFD Level 1: (diagram refining the System)

DFD Level 2: (diagram showing the Speech Recognition and Text Command processes)

CONTROL FLOW DIAGRAM

CLASS DIAGRAM

ACTIVITY DIAGRAM

(Activity flow: Start -> send sound -> initialize speech engine -> receive sound -> receive speech engine -> call HMM -> compare text command -> compare XML -> command found? If yes, perform the action command; a stop command ends the flow at Stop.)

COMPONENT DIAGRAM

DEPLOYMENT DIAGRAM

(Deployment: a Computer node hosting the Speech Recognition Engine and the Text Commander, with a microphone device attached; deployment specifications: Windows 8 and .NET 4.0.)

HARDWARE AND SOFTWARE REQUIREMENT:

Hardware Requirements:
System       : Pentium IV, 2.4 GHz
Hard Disk    : 40 GB
Floppy Drive : 1.44 MB
Monitor      : 15" VGA Colour
Mouse        : Logitech
RAM          : 512 MB

Software Requirements:
Operating System : Windows 8
Coding Language  : C#, .NET (Visual Studio 2012)
Database         : MS SQL Server 2008

5. PLANNING AND SCHEDULING THE PROJECT WORK:

SOFTWARE ENGINEERING APPROACH:

Fig 5.1: Incremental Model

Incremental Development and Release: Developing systems through incremental release requires first providing essential operating functions, then providing system users with improved and more capable versions of the system at regular intervals. This model combines the classic software life cycle with iterative enhancement at the level of system development organization. It also supports a strategy for periodically distributing software maintenance updates and services to dispersed user communities, which in turn accommodates the provision of standard software maintenance contracts. It is therefore a popular model of software evolution used by many commercial software firms and system vendors. This approach has also been extended through the use of software prototyping tools and techniques, which more directly support incremental development and iterative release for early and ongoing user feedback and evaluation. Figure 5.1 shows an example of an incremental development, build, and release model for engineering large Ada-based software systems, with incremental release of software functions and/or subsystems (developed through stepwise refinement) to separate in-house quality assurance teams that apply statistical measures and analyses as the basis for certifying high-quality software systems.

REQUIREMENT ANALYSIS:

NORMAL REQUIREMENTS:
1. User interfaces: The system provides a GUI on both the server and client side. Users of the system can communicate over the LAN and use the GUI available to them.
2. Hardware interfaces: There are few hardware interfaces to the system: the MICROPHONE.
3. Software interfaces: There are few software interfaces for the system: the MICROPHONE drivers needed to install the MICROPHONE.
4. Communication interfaces: The following communication interface is required by the system: the MICROPHONE, in order to communicate with it.

Expected Requirements:
1. Performance requirements: The microphone used for capturing speech for recognition should be noise-free and should have sufficient bandwidth.

The switch needs to have exactly the same number of clients as mentioned in the system.
2. Safety requirements: To keep this system safe, care should be taken to avoid theft of the system's components. The input voltage for the MICROPHONE should not exceed the applicable standards.
3. Security requirements: The server need not share any drives for networking, thus avoiding data theft.
4. Software quality attributes: Interfacing of the MICROPHONE should be kept flexible.

NON-FUNCTIONAL REQUIREMENT STUDY: The non-functional requirements of the project are analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the non-functional requirement study of the proposed system is carried out to ensure that the proposed system is not a burden to the company. For non-functional requirement analysis, some understanding of the major requirements for the system is essential.

Three key considerations involved in the non-functional requirement analysis are:
ECONOMICAL NON-FUNCTIONAL REQUIREMENT
TECHNICAL NON-FUNCTIONAL REQUIREMENT
SOCIAL NON-FUNCTIONAL REQUIREMENT

ECONOMICAL NON-FUNCTIONAL REQUIREMENT: This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited, and the expenditures must be justified. The developed system is well within budget, which was achieved because most of the technologies used are freely available; only the customized products had to be purchased.

TECHNICAL NON-FUNCTIONAL REQUIREMENT: This study is carried out to check the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would in turn place high demands on the client. The developed system must have modest requirements, as only minimal or no changes are required for implementing this system.

SOCIAL NON-FUNCTIONAL REQUIREMENT: This aspect of the study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by the users depends on the methods employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised so that he is also able to offer constructive criticism, which is welcomed, as he is the final user of the system.

Excited Requirements:
1. The user should not enter a message or question that is not appropriate to the closed-domain FAQ.
2. The system should respond to each SMS in an appropriate manner by using the template matching algorithm.

REQUIREMENT VALIDATION

1. Organization and Completeness

1. Are all internal cross-references to other requirements correct? Yes
2. Are all requirements written at a consistent and appropriate level of detail? Yes
3. Do the requirements provide an adequate basis for design? Yes
4. Is the implementation priority of each requirement included? No
5. Are all external hardware, software, and communication interfaces defined? Yes
6. Have algorithms intrinsic to the functional requirements been defined? Yes
7. Does the SRS include all of the known customer or system needs? Yes
8. Is any necessary information missing from a requirement? If so, is it identified as TBD? No
9. Is the expected behavior documented for all anticipated error conditions? Yes

2. Correctness

1. Do any requirements conflict with or duplicate other requirements? No
2. Is each requirement written in clear, concise, unambiguous language? No
3. Is each requirement verifiable by testing, demonstration, review, or analysis? Yes
4. Is each requirement in scope for the project? Yes
5. Is each requirement free from content and grammatical errors? Yes
6. Can all of the requirements be implemented within known constraints? Yes
7. Are any specified error messages unique and meaningful? No

3. Quality Attributes

1. Are all performance objectives properly specified? Yes
2. Are all security and safety considerations properly specified? Yes
3. Are other pertinent quality attribute goals explicitly documented and quantified, with the acceptable tradeoffs specified? Yes

4. Traceability

1. Is each requirement uniquely and correctly identified? Yes
2. Can each software functional requirement be traced to a higher-level requirement (e.g., system requirement, use case)? Yes

5. Special Issues

1. Are all requirements actually requirements, not design or implementation solutions? Yes
2. Are the time-critical functions identified, and timing criteria specified for them? Yes
3. Are all significant consumers of scarce resources (memory, network bandwidth, processor capacity, etc.) identified, and is their anticipated resource consumption specified? Yes
4. Have internationalization issues been adequately addressed? No

SYSTEM IMPLEMENTATION PLAN:

1. EFFORT ESTIMATE TABLE:


Task                                                  Effort     Deliverables               Milestones
Analysis of existing systems & comparison
with the proposed one                                 4 weeks
Literature survey                                     1 week
Designing & planning
  o System flow                                       2 weeks
  o Designing modules & its deliverables              1 week     Modules design document    Formal
Implementation                                        7 weeks    Primary system
Testing                                               4 weeks    Test reports               Formal
Documentation                                         2 weeks    Complete project report

Table 5.1 : Effort Estimate Table

2. PHASE DESCRIPTION:

Phase     Task                Description
Phase 1   Analysis            Analyze the information given in the IEEE paper.
Phase 2   Literature survey   Collect raw data and elaborate on literature surveys.
Phase 3   Design              Assign the modules and design the process flow control.
Phase 4   Implementation      Implement the code for all the modules and integrate them.
Phase 5   Testing             Test the code and check whether the overall process works properly.
Phase 6   Documentation       Prepare the document for this project with conclusion and future enhancements.

Table 5.2: Phase Description

3. PROJECT PLAN

(Gantt-style schedule mapping Phases 1-6 to the months from Jun/11 through Mar/12.)

Table 5.3: Project Plan

4. ESTIMATION OF KLOC:

The number of lines required for implementation of various modules can be estimated as follows:
Sr. No.   Module                       KLOC
1         Graphical User Interface     0.50
2         User authentication Code     0.20
3         Database Code                0.60
4         Web Design Code              0.50
5         Device Drivers               0.40
6         Interfacing Code             0.20

Table 5.4: Estimation of KLOC

Thus the total number of lines required is approximately 2.40 KLOC.

Efforts:
E = 3.2 * (KLOC)^1.02
E = 3.2 * (2.40)^1.02
E = 7.82 person-months

Development Time (in months):
D = E / N
D = 7.82 / 3
D = 2.61 months

Number of Persons: 4 persons are required to complete the project successfully within the given time span.
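The estimate above can be reproduced directly. The following minimal Python sketch uses the module sizes from Table 5.4 and the basic COCOMO coefficients (a = 3.2, b = 1.02) and team size (N = 3) given in the text above; it is an illustration of the calculation, not part of the system itself.

```python
# Basic COCOMO effort/duration estimate for the module sizes in Table 5.4.
# Coefficients a = 3.2, b = 1.02 and N = 3 are taken from the text above.

modules = {
    "Graphical User Interface": 0.50,
    "User authentication Code": 0.20,
    "Database Code": 0.60,
    "Web Design Code": 0.50,
    "Device Drivers": 0.40,
    "Interfacing Code": 0.20,
}

kloc = sum(modules.values())          # 2.40 KLOC in total

effort = 3.2 * kloc ** 1.02           # effort in person-months
developers = 3
duration = effort / developers        # development time in months

print(f"KLOC = {kloc:.2f}")
print(f"Effort E = {effort:.2f} person-months")
print(f"Duration D = {duration:.2f} months")
```

Running the sketch confirms E of about 7.82 person-months and D of about 2.61 months for the 2.40 KLOC total.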

FEASIBILITY ASSESSMENT:

What are P, NP-Complete, and NP-Hard? When solving problems we have to decide the difficulty level of our problem. Three classes are provided for that:
1) P class
2) NP-hard class
3) NP-complete class
A decision problem is in P if there is a known polynomial-time algorithm that answers it. A decision problem is in NP if there is a known polynomial-time algorithm for a non-deterministic machine to get the answer. Problems known to be in P are trivially in NP: the non-deterministic machine simply never forks another process and acts just like a deterministic one. But there are some problems known to be in NP for which no polynomial-time deterministic algorithm is known; in other words, we know they are in NP, but we do not know whether they are in P. A problem is NP-complete if (1) it is in NP, and (2) some problem already known to be NP-complete is polynomial-time reducible to it. A problem is NP-hard if and only if it is at least as hard as an NP-complete problem. The more conventional Traveling Salesman Problem of finding the shortest route is NP-hard, not strictly NP-complete, because it is an optimization problem rather than a decision problem (its decision version, asking whether a tour shorter than a given bound exists, is NP-complete).
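The NP definition above can be made concrete: for a problem in NP, a proposed solution (a certificate) can be checked in polynomial time, even if finding one may not be possible in polynomial time. A minimal sketch using subset-sum as an example (the problem instance and numbers here are illustrative, not part of the project):

```python
# Subset-sum is in NP: verifying a candidate subset (certificate) takes
# polynomial time, even though finding one may take exponential time
# in the worst case.

def verify_subset_sum(numbers, target, certificate):
    """Check in polynomial time that the certificate is a valid solution:
    every element belongs to the input multiset and the elements sum
    to the target."""
    pool = list(numbers)
    for x in certificate:
        if x not in pool:
            return False
        pool.remove(x)
    return sum(certificate) == target

numbers = [3, 34, 4, 12, 5, 2]
print(verify_subset_sum(numbers, 9, [4, 5]))   # valid certificate
print(verify_subset_sum(numbers, 9, [3, 4]))   # sums to 7, rejected
```

The verifier is fast; the hard part is producing the certificate, which is exactly the P-versus-NP distinction drawn above.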

For the project:
A: Voice Communication
B: Algorithmic Processing

Time Complexity = A*m + B*n ------------ (1)

Explanation: Let m be the time taken to process voice into and out of the system, and let n be the time required by each of the algorithms. Because Equation (1) is linear in m and n, by the definition of the P class the project falls in the P (Polynomial) class. The project is therefore feasible.

Types of Feasibility:

ECONOMICAL FEASIBILITY
This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited, so the expenditures must be justified. The developed system is well within the budget; this was achieved because most of the technologies used are freely available, and only the customized products had to be purchased.

TECHNICAL FEASIBILITY
This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system has modest requirements, as only minimal or no changes are required for implementing it.

SOCIAL FEASIBILITY
This aspect of the study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by the users depends solely on the methods that are employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised so that he is also able to offer some constructive criticism, which is welcomed, as he is the final user of the system.

RISK MITIGATION, MONITORING AND MANAGEMENT PLAN

SCOPE AND INTENT OF RMMM ACTIVITIES
The goal of the risk mitigation, monitoring and management plan is to identify as many potential risks as possible. To help determine the potential risks, Game Forge will be evaluated using the checklists found in section 6.3 of Roger S. Pressman's Software Engineering: A Practitioner's Approach [reference is SEPA, 4/e; see the risk checklists contained within this Web site]. These checklists help to identify potential risks in a generic sense. The project will then be analyzed to determine any project-specific risks. When all risks have been identified, they will be evaluated to determine their probability of occurrence and how Game Forge will be affected if they occur. Plans will then be made to avoid each risk, to track each risk to determine whether it is becoming more or less likely to occur, and to prepare for those risks should they occur. It is the organization's responsibility to perform risk mitigation, monitoring, and management in order to produce a quality product. The quicker the risks can be identified and avoided, the smaller the chances of having to face a particular risk's consequences. The fewer consequences suffered as a result of a good RMMM plan, the better the product and the smoother the development process.

RISK MANAGEMENT ORGANIZATIONAL ROLE
Each member of the organization will undertake risk management. The development team will consistently monitor its progress and project status so as to identify present and future risks as quickly and accurately as possible. The members who are not directly involved with the implementation of the product will also need to keep their eyes open for any possible risks that the development team did not spot. The responsibility of risk management falls on each member of the organization, while William Lord maintains this document.

RISK IDENTIFICATION CHECKLIST:

Product Size Risks
Estimated size in lines of code (LOC): The project will have an estimated _______ lines of code.
Degree of confidence in estimated size: We are highly confident in our estimated size.
Estimated size in number of programs, files, and transactions:
1. We estimate 12 programs.
2. We estimate 10 large files for the engine and 5 large files for the user interface.
3. We estimate 40 or more transactions for the engine and 20 transactions for the user interface.
Percentage deviation in size from average for previous products: We allow for a 20% deviation from average.
Size of database created or used: The database will contain an estimated 7 tables. The number of fields will vary per table, with an overall average of 8 fields per table. The number of records in each table will vary with the number of sprites the user adds to the project and the number of instances of each sprite the user creates.
Number of users: The number of users will be fairly high. There will be 5 users per instance of the software running, as the software is client/server and intended for multi-user use.
Number of projected changes to the requirements: We estimate 3 possible projected changes to the requirements. These will result from our realization of what is and is not required as we get further into implementation, as well as from interaction with the customer and verification of the customer's requirements.
Amount of reuse of software: Reuse will be very important to get the project started. The GSM modem is very simple to reuse (for the most part), and previous programs used to code for the GSM modem will be reviewed and much GSM modem code will be recopied.

Business Impact Risks
Amount and quality of documentation that must be produced and delivered to the customer: The customer will be supplied with a complete online help file and user's manual for Game Forge. In addition, the customer will have access to all development documents for Game Forge, as the customer will also be grading the project.
Governmental constraints on the construction of the product: None known.
Costs associated with late delivery: Late delivery will prevent the customer from issuing a letter of acceptance for the product, which will result in an incomplete grade for the course for all members of the organization.
Costs associated with a defective product: Unknown at this time.

Customer Related Risks
Have you worked with the customer in the past? Yes; all team members have completed at least one project for the customer, though none of them has been of the magnitude of the current project.
Does the customer have a solid idea of what is required? Yes; the customer has access to both the System Requirements Specification and the Software Requirements Specification for the Game Forge project.
Will the customer agree to spend time in formal requirements-gathering meetings to identify project scope? Unknown. While the customer will likely participate if asked, the inquiry has not yet been made.

Process Risks
Does senior management support a written policy statement that emphasizes the importance of a standard process for software development? N/A; PA Software does not have senior management. It should be noted that the structured method has been adopted. At the completion of the project, it will be determined whether the software method is acceptable as a standard process or whether changes need to be implemented.
Has your organization developed a written description of the software process to be used on this project? Yes; it is under development using the structured method as described in part three of Roger S. Pressman's Software Engineering: A Practitioner's Approach.
Are staff members willing to use the software process? Yes; the software process was agreed upon before development work began.
Is the software process used for other products? N/A; PA Software has no other projects currently.

Technical Issues
Are facilitated application specification techniques used to aid in communication between the customer and the developer? The development team will hold frequent meetings directly with the customer. No formal meetings are held (all are informal). During these meetings the software is discussed and notes are taken for future review.
Are specific methods used for software analysis? Specific methods will be used to analyze the software's progress and quality. These are a series of tests and reviews to ensure the software is up to standard. For more information, see the Software Quality Assurance and Software Configuration Management documents.
Do you use a specific method for data and architectural design? Data and architectural design will be mostly object oriented. This allows for a higher degree of data encapsulation and modularity of code.

Technology Risks
Is the technology to be built new to your organization? No.
Does the software interface with new or unproven hardware? No.
Is a specialized user interface demanded by the product requirements? Yes.

Development Environment Risks
Is a software project management tool available? No; no software tools are to be used. Due to the existing deadline, the development team felt it would be more productive to begin implementing the project than to try to learn new software tools. After the completion of the project, software tools may be adopted for future projects.

Risk Table
Risks                                       Category   Probability (%)   Impact
Computer Crash                              TI         70                1
Late Delivery                               BU         30                1
Technology will not Meet Expectations       TE         25                1
End Users Resist System                     BU         20                1
Changes in Requirement                      PS         20                2
Lack of Development Experience              TI         20                2
Lack of Database Stability                  TI         40                2
Deviation from Software Engineering         PI         10                3
Poor Comments                               TI         20                4

Table 5.5: Risk Table
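The plan itself does not define a numeric prioritization for the risks in the risk table above. As one illustrative (not prescribed) way to rank them, probability can be combined with a severity weight derived from the impact category; the weighting scheme below is an assumption for illustration only.

```python
# Rank the risks from the risk table by a simple exposure score:
# probability (as a fraction) times a severity weight, where impact 1
# (catastrophic) weighs most and impact 4 (negligible) weighs least.
# The weighting scheme is illustrative, not part of the RMMM plan.

risks = [
    ("Computer Crash",                         "TI", 70, 1),
    ("Late Delivery",                          "BU", 30, 1),
    ("Technology will not Meet Expectations",  "TE", 25, 1),
    ("End Users Resist System",                "BU", 20, 1),
    ("Changes in Requirement",                 "PS", 20, 2),
    ("Lack of Development Experience",         "TI", 20, 2),
    ("Lack of Database Stability",             "TI", 40, 2),
    ("Deviation from Software Engineering",    "PI", 10, 3),
    ("Poor Comments",                          "TI", 20, 4),
]

severity_weight = {1: 4, 2: 3, 3: 2, 4: 1}   # invert: impact 1 is worst

def exposure(risk):
    _name, _category, probability, impact = risk
    return (probability / 100) * severity_weight[impact]

for name, _cat, prob, impact in sorted(risks, key=exposure, reverse=True):
    print(f"{name}: exposure {(prob / 100) * severity_weight[impact]:.2f}")
```

Under this scheme the computer crash risk dominates, matching the plan's decision to treat data loss as the most crucial risk.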

Impact Values: 1 - Catastrophic; 2 - Critical; 3 - Marginal; 4 - Negligible

Risk Refinement
At various points in the checklist, the lack of software tools is identified as a potential risk. Due to time constraints, the members of the design team felt that searching for and learning to use additional software tools could be detrimental to the project, as it would take time away from project development. For this reason, we have decided to forgo the use of software tools. It will not be explored as a potential risk because all planning will be done without considering their use.

STRATEGIES TO MANAGE RISK
Risk Mitigation, Monitoring and Management

RISK: COMPUTER CRASH
Mitigation
The cost associated with a computer crash resulting in a loss of data is crucial. A computer crash itself is not crucial, but rather the loss of data is. A loss of data will result in not being able to deliver the product to the customer, which will result in not receiving a letter of acceptance from the customer. Without the letter of acceptance, the group will receive a failing grade for the course. As a result, the organization is taking steps to keep multiple backup copies of the software in development, and of all documentation associated with it, in multiple locations.
Monitoring
When working on the product or documentation, staff members should always be aware of the stability of the computing environment they are working in. Any changes in the stability of the environment should be recognized and taken seriously.

Management
The lack of a stable computing environment is extremely hazardous to the software development team. In the event that the computing environment is found to be unstable, the development team should cease work on that system until the environment is made stable again, or should move to a system that is stable and continue working there.

RISK: LATE DELIVERY
Mitigation
The cost associated with a late delivery is critical. A late delivery will result in a late delivery of a letter of acceptance from the customer. Without the letter of acceptance, the group will receive a failing grade for the course. Steps have been taken to ensure a timely delivery by gauging the scope of the project based on the delivery deadline.
Monitoring
A schedule has been established to monitor project status. Falling behind schedule would indicate a potential for late delivery. The schedule will be followed closely during all development stages.
Management
Late delivery would be a catastrophic failure in the project development. If the project cannot be delivered on time, the development team will not pass the course. If it becomes apparent that the project will not be completed on time, the only course of action available would be to request an extension to the deadline from the customer.

RISK: TECHNOLOGY DOES NOT MEET SPECIFICATIONS Mitigation

In order to prevent this from happening, meetings (formal and informal) will be held with the customer on a routine basis. This ensures that the product we are producing and the specifications of the customer are equivalent.
Monitoring
The meetings with the customer should ensure that the customer and our organization understand each other and the requirements for the product.
Management
Should the development team come to the realization that their idea of the product specifications differs from that of the customer, the customer should be notified immediately and whatever steps are necessary to rectify the problem should be taken. Preferably a meeting should be held between the development team and the customer to discuss this issue at length.

RISK: END USERS RESIST SYSTEM
Mitigation
In order to prevent this from happening, the software will be developed with the end user in mind. The user interface will be designed in a way that makes use of the program convenient and pleasurable.
Monitoring
The software will be developed with the end user in mind. The development team will ask the opinion of various outside sources throughout the development phases. Specifically, the user-interface developer will be sure to get thorough opinions from others.
Management
Should the program be resisted by the end user, it will be thoroughly examined to find the reasons why. Specifically, the user interface will be investigated and, if necessary, revamped into a solution.

RISK: CHANGES IN REQUIREMENTS Mitigation

In order to prevent this from happening, meetings (formal and informal) will be held with the customer on a routine basis. This ensures that the product we are producing and the requirements of the customer are equivalent.
Monitoring
The meetings with the customer should ensure that the customer and our organization understand each other and the requirements for the product.
Management
Should the development team come to the realization that their idea of the product requirements differs from that of the customer, the customer should be notified immediately and whatever steps are necessary to rectify the problem should be taken. Preferably a meeting should be held between the development team and the customer to discuss this issue at length.

RISK: LACK OF DEVELOPMENT EXPERIENCE
Mitigation
In order to prevent this from happening, the development team will be required to learn the languages and techniques necessary to develop this software. The member of the team who is the most experienced in a particular facet of the development tools will need to instruct those who are not as well versed.
Monitoring
Each member of the team should watch for areas where another team member may be weak. Also, if a member is weak in a particular area, that member should bring it to the attention of the other members.
Management
The members who have the most experience in a particular area will be required to help those who do not, should it come to the attention of the team that a particular member needs help.

RISK: DATABASE IS NOT STABLE

Mitigation
In order to prevent this from happening, developers who are in contact with the database, and/or use functions that interact with it, should keep in mind the possible errors that could be caused by poor programming or error checking. These issues should be brought to the attention of each of the other members who are also in contact with the database.
Monitoring
Each user should be sure that the database is left in the condition it was in before it was touched, so that possible problems can be identified. The first sign of database errors should be brought to the attention of the other team members.
Management
Should this occur, the organization will call a meeting to discuss the causes of the database instability, along with possible solutions.

RISK: POOR COMMENTS IN CODE
Mitigation
Poor code commenting can be minimized if commenting standards are clearly expressed. While standards have been discussed informally, no formal standard yet exists. A formal written standard must be established to ensure the quality of comments in all code.
Monitoring
Reviews of code, with special attention given to comments, will determine whether they are up to standard. This must be done frequently enough to control comment quality. If it is not, comment quality could drop, resulting in code that is difficult to maintain and update.
Management
Should code comment quality begin to drop, time must be made available to bring comments up to standard. Careful monitoring will minimize the impact of poor commenting. Any problems are resolved by adding and refining comments as necessary.

FUTURE DIRECTIONS:
Robustness: In a robust system, performance degrades gracefully (rather than catastrophically) as conditions become more different from those under which the system was trained. Differences in channel characteristics and acoustic environment should receive particular attention.

Portability: Portability refers to the goal of rapidly designing, developing and deploying systems for new applications. At present, systems tend to suffer significant degradation when moved to a new task. In order to return to peak performance, they must be trained on examples specific to the new task, which is time consuming and expensive.

Adaptation: How can systems continuously adapt to changing conditions (new speakers, microphones, tasks, etc.) and improve through use? Such adaptation can occur at many levels in systems: sub-word models, word pronunciations, language models, etc.

Language Modeling: Current systems use statistical language models to help reduce the search space and resolve acoustic ambiguity. As vocabulary size grows and other constraints are relaxed to create more habitable systems, it will be increasingly important to get as much constraint as possible from language models, perhaps by incorporating syntactic and semantic constraints that cannot be captured by purely statistical models.

Confidence Measures: Most speech recognition systems assign scores to hypotheses for the purpose of rank ordering them. These scores do not provide a good indication of whether a hypothesis is correct, just that it is better than the other hypotheses. As we move to tasks that require actions, we need better methods to evaluate the absolute correctness of hypotheses.

Out-of-Vocabulary Words: Systems are designed for use with a particular set of words, but system users may not know exactly which words are in the system vocabulary. This leads to a certain percentage of out-of-vocabulary words in natural conditions. Systems must have some method of detecting such out-of-vocabulary words, or they will end up mapping a word from the vocabulary onto the unknown word, causing an error.

Spontaneous Speech: Systems that are deployed for real use must deal with a variety of spontaneous speech phenomena, such as filled pauses, false starts, hesitations, ungrammatical constructions and other common behaviors not found in read speech. Development on the ATIS task has resulted in progress in this area, but much work remains to be done.

Prosody: Prosody refers to acoustic structure that extends over several segments or words. Stress, intonation, and rhythm convey important information for word recognition and the user's intentions (e.g., sarcasm, anger). Current systems do not capture prosodic structure. How to integrate prosodic information into the recognition architecture is a critical question that has not yet been answered.

Modeling Dynamics: Systems assume a sequence of input frames which are treated as if they were independent. But it is known that perceptual cues for words and phonemes require the integration of features that reflect the movements of the articulators, which are dynamic in nature. How to model dynamics and incorporate this information into recognition systems is an unsolved problem.
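The out-of-vocabulary handling described above reduces, at its simplest, to checking each recognized token against the system vocabulary and flagging misses instead of forcing a match. A minimal sketch; the vocabulary words below are hypothetical examples, not the project's actual command set.

```python
# Minimal out-of-vocabulary (OOV) handling: any recognized token not in
# the system vocabulary is flagged rather than silently mapped onto the
# nearest in-vocabulary word. The vocabulary here is illustrative only.

VOCABULARY = {"open", "close", "file", "folder", "shutdown", "restart"}

def tag_oov(tokens):
    """Return (word, is_oov) pairs for a recognized token sequence."""
    return [(word, word not in VOCABULARY) for word in tokens]

for word, is_oov in tag_oov(["open", "spreadsheet", "file"]):
    print(f"{word}: {'OOV' if is_oov else 'in vocabulary'}")
```

Real recognizers do this with an explicit garbage or filler model in the decoder rather than a post-hoc lookup, but the rejection decision is the same in spirit.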

6. REFERENCES:

[1] B. Ben Mosbah, "Speech Recognition for Disabilities People", Information and Communication Technologies, ICTTA '06, 2nd, Volume 1, 24-28 April 2006, pp. 864-869.
[2] XiaoJie Yuan, Jing Fan, "Design and Implementation of Voice Controlled Tetris Game Based on Microsoft SDK", 978-1-61284-774-0/11, IEEE 2011.
[3] Mukund Padmanabhan, Michael Picheny, "Large-vocabulary speech recognition algorithms", IEEE Computer magazine, 0018-9162/02, pp. 42-50, 2002.
[4] Fengyu Zhou, Guohui Tian, Yang Yang, Hairong Xiao and Jingshuai Chen, "Research and Implementation of Voice Interaction System Based on PC in Intelligent Space", Proceedings of the 2010 IEEE International Conference on Automation and Logistics, August 16-20, 2010, Hong Kong and Macau, 978-1-4244-8376-1/10, IEEE 2010.
[5] Md. Abdul Kader, Biswajit Singha, and Md. Nazrul Islam, "Speech Enabled Operating System Control", Proceedings of the 11th International Conference on Computer and Information Technology (ICCIT 2008), 25-27 December 2008, Khulna, Bangladesh, 1-4244-2136-7/08, IEEE 2008.
[6] D. LeBlanc, Y. Ben Ahmed, S. Selouani, Y. Bouslimani, H. Hamam, "Computer Interface by Gesture and Voice for Users with Special Needs", 1-4244-0674-9/06, IEEE 2006.
[7] Mu-Chun Su and Ming-Tsang Chung, "Voice-controlled human-computer interface for the disabled", Computing & Control Engineering Journal, October 2001.
[8] Douglas O'Shaughnessy, "Interacting With Computers by Voice: Automatic Speech Recognition and Synthesis", Proceedings of the IEEE, Vol. 91, No. 9, September 2003, 0018-9219/03, IEEE 2003.
[9] Omar Florez-Choque, Ernesto Cuadros-Vargas, "Improving Human Computer Interaction Through Spoken Natural Language", 1-4244-0707-9/07, IEEE 2007.
[10] Gerhard Rigoll, "Baseform Adaptation for Large Vocabulary Hidden Markov Model Based Speech Recognition Systems", CH2847-2/90/0000-0141, IEEE 1990.
[11] http://www.johndavies.notts.sch.uk/children/documents/44PhonemesVoiced.ppt
[12] http://www.microsoft.com/speech/download/sdk51/
[13] Titus Felix Furtuna, "Dynamic Programming Algorithms in Speech Recognition", Revista Informatica Economica Nr. 2(46)/2008.
