
Voice Biometric Secured Control Unit

ABSTRACT
The proposed Voice Biometric Security for Industrial control verifies a person's claimed identity on the basis of his/her speech phrases. It is word-dependent, requiring the speaker to say key words or sentences containing the same words in both the enrollment and verification trials. The system uses a template-matching approach to authenticate the user. It processes the user's voice signal, which is given as input, prepares a template using the Mel Frequency Cepstrum Coefficients (MFCC) technique, and verifies it against the already stored template using the Vector Quantization (VQ) technique. It then declares the person a valid user or an imposter depending on the template match. Once verification succeeds, the user is allowed into a control panel and can control an LCD device from the PC.

Department Of Electronics & Communication Appa Institute of Engineering and Technology, Gulbarga

CHAPTER 1

Introduction
1.1 General introduction
The Voice Biometric Security for Industrial control is an application of biometrics. Biometrics refers to the automatic identification of a person based on his/her physiological or behavioral characteristics. Much research has been done in the area of speaker verification using cepstral analysis, endpoint detection algorithms, pattern recognition, neural networks, stochastic models and many distance-measuring algorithms. However, as per media reports, the performance of existing systems is still a matter of great concern because of the considerable number of false acceptances and false rejections. Speaker verification is a difficult task and is still an active research area. A speech technology research center at Sydney University, Australia is actively involved in speech recognition, human speech perception, and natural language processing. Microsoft is also an active participant in speech technology research, whose aim is to produce a completely speech-enabled computer; to reach this stage, Microsoft has also contributed research and development in several other areas, including noise robustness, automatic grammar learning and language modeling.

Speaker verification can be used in many areas, some of which are: access control to computers, cellular phones and databases; access control to professional or wide public sites; protection of confidential data; remote access to computer networks; electronic commerce; forensics; automatic door-opening systems; telephone-banking transactions; and voice commands on cellular phones. Many organizations such as banks, institutions and industries are currently using this technology to provide greater security for their vast databases.

Given the voice input, the objective of the proposed word-dependent Voice Biometric Security for Industrial control is to verify whether the speaker is who he/she claims to be. The system processes the user's voice signal, which is given as input, and finds the Mel Frequency Cepstrum Coefficients. It then generates a template, called a codebook, which is an array of acoustic vectors. In the enrollment phase the user's codebook is saved in the system. In the verification phase the user's codebook is compared against the claimed user's actual codebook, which was stored in the system during the enrollment phase. If the difference is below the threshold value the user is authenticated; otherwise the user is rejected.


1.2 Statement of the problem
With the increased use of computers as vehicles of information technology, it is necessary to restrict unauthorized access to systems and to sensitive or personal data. The traditional methods of authentication involving passwords, PINs and tokens have drawbacks: PINs and passwords may be forgotten, and token-based methods of verification such as passports and driver's licenses may be forged, stolen, or lost. Therefore, there is a need for a better method of authentication, such as voice verification, which eliminates these drawbacks because the person to be verified must be physically present at the point of verification. Other authentication mechanisms (such as face recognition and iris recognition) need a lot of equipment and controlled conditions to work properly. So there is a need for a cheaper authentication system, such as speaker verification, which works with less equipment.


1.3 Objective of the study
The objective of the study is to build a Voice Biometric Security for Industrial control system for the following reasons: given a sample of somebody saying a particular piece of text or pass phrase (for example, the person's name), the system should be able to verify whether the speaker is who he/she claims to be. Furthermore, the system should be able to catch "imposters" who try to say somebody else's pass phrase. Once the person is authenticated, the system allows him/her to control devices such as an LCD, a stepper motor, etc.


1.4 Scope of the study

As with all new security solutions, the application of voice biometrics technology within a given corporate environment will be based on set policies and known sensitivities. To aid its acceptance, voice verification can be combined with more traditional security features to provide an additional layer of authentication. The commercialization of voice technology offers network administrators new opportunities to enhance advanced user authentication methods, password control and innumerable user identification and network security applications. In a society where telecommunications and electronic commerce are the norm, the need to protect sensitive information will undoubtedly act as a catalyst for greater use of biometrics and, in turn, result in improved technology.

The present system can be extended to become an integral part of speech interfaces with voice identification. This is a complex task in which an unknown voice template has to be compared with a huge set of voice templates to make it a voice verification product. The product can also be used as part of multi-modal verification, which includes face, fingerprint and iris scans. In this way, voice biometric systems are expected to create new services that will make our daily lives more convenient.

1.5 Review of literature
As per the needs and requirements for the study and implementation of the Voice Biometric Security for Industrial control, information was gathered from various possible sources. In fact, no single piece of literature provides a complete picture of speaker recognition. Before actually implementing the system, I needed to study the recent progress, current applications and future trends in the area of speaker recognition, and I went through [4] to learn what is currently needed in the industry market. Since the Voice Biometric Security for Industrial control is an application of a biometric method of identification that authenticates a person on the basis of a behavioral characteristic like voice, I used [9] and [10] to get a range of information about biometrics and speech technology.



The information needed for the methodology, which involves the Mel Frequency Cepstrum Coefficients technique for the feature extraction process and the Vector Quantization technique for the feature matching process, was taken from [5] and [6]. Feature extraction is the process of extracting a small amount of data from the voice signal that can later be used to represent the speaker. Feature matching involves the actual procedure of verifying the speaker by comparing the features extracted from his/her voice input with the claimed speaker's features stored in the database. To understand how to proceed step by step in developing the software, and to analyze and design the system in the object-oriented paradigm, books [1] and [2] were consulted. The use-case-driven object-oriented analysis and object-oriented design are expressed in a modeling language called the Unified Modeling Language (UML), and book [3] was found very helpful for implementing the whole idea in Visual C++. The rest of the information needed about speaker recognition during development, including its benefits and natural advantages, was studied in [7] and [8].


1.6 Methodology


1.6.1 Organization of the system
The Voice Biometric Security for Industrial control mainly consists of two phases: the enrollment phase and the verification phase. The organization of the system is shown diagrammatically in figures Fig.1.6.1(a) and Fig.1.6.1(b). During the first phase, speaker enrollment, shown in Fig.1.6.1(a), features are extracted from the input speech signal given by the speaker by a process called feature extraction and are modeled as a template. Modeling is the process of enrolling a speaker into the verification system by constructing a model of his/her voice based on the features extracted from his/her speech sample. The collection of all such enrolled models is called the speaker database.


Fig.1.6.1(a): Schematic flow of the Enrollment Phase

In the second phase, the verification phase shown in Fig.1.6.1(b), features are extracted from the speech signal of a speaker, and these current features are compared with the claimed speaker's features stored in the database by a process called feature matching. Based on this comparison, the final decision about the speaker's identity is made.


Fig.1.6.1(b): Schematic flow of the Verification Phase

Both phases include feature extraction, which is used to extract speaker-dependent characteristics from speech. The main purpose of this process is to reduce the amount of test data while retaining speaker-discriminative information.
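As a rough illustration of the two phases, the sketch below stores one model per speaker at enrollment and compares features at verification. All names here are hypothetical, and the feature extractor is a trivial stand-in for the MFCC/VQ front end described in the following sections, so the control flow stays visible:

```cpp
#include <cmath>
#include <map>
#include <string>
#include <vector>

typedef std::vector<double> Model;

// Trivial stand-in for the front end: mean and average energy of the signal.
// The real system uses MFCC feature extraction and VQ codebooks instead.
Model extractFeatures(const std::vector<double>& s) {
    double mean = 0.0, energy = 0.0;
    for (double v : s) { mean += v; energy += v * v; }
    size_t n = s.size() ? s.size() : 1;
    return Model{mean / n, energy / n};
}

std::map<std::string, Model> speakerDatabase;  // collection of enrolled models

// Enrollment phase: extract features and store the speaker model.
void enroll(const std::string& name, const std::vector<double>& speech) {
    speakerDatabase[name] = extractFeatures(speech);
}

// Verification phase: compare current features with the claimed model.
bool verifySpeaker(const std::string& claimed,
                   const std::vector<double>& speech, double threshold) {
    Model cur = extractFeatures(speech), ref = speakerDatabase[claimed];
    double d = 0.0;
    for (size_t i = 0; i < cur.size(); ++i)
        d += (cur[i] - ref[i]) * (cur[i] - ref[i]);
    return std::sqrt(d) < threshold;  // decision by feature matching
}
```

The point of the sketch is the division of labor: enrollment only writes to the database, while verification reads the claimed model and applies a threshold decision.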

1.6.2 Feature extraction process


1.6.2.1 Introduction
In speaker recognition, the speech waveform is first converted to some type of parametric representation (at a considerably lower information rate) for further analysis and processing. This is often referred to as the signal-processing front end. A wide range of possibilities exists for parametrically representing the speech signal for the speaker recognition task, such as Linear Prediction Coding (LPC), Mel Frequency Cepstrum Coefficients (MFCC), and others. MFCC is perhaps the best known and most popular, and it is used in this project.

MFCCs are based on the known variation of the human ear's critical bandwidths with frequency; filters spaced linearly at low frequencies and logarithmically at high frequencies have been used to capture the phonetically important characteristics of speech. This is expressed in the mel-frequency scale, which has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz. The process of computing MFCCs is described in more detail next.
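The report describes the mel scale only qualitatively. A commonly used analytic approximation (the constants are not stated in this text, so treat them as an assumption) is m = 2595 log10(1 + f/700), which is roughly linear below 1 kHz and logarithmic above it:

```cpp
#include <cmath>

// Common analytic approximation of the mel scale: roughly linear below
// 1 kHz and logarithmic above it, matching the behavior described above.
double hz_to_mel(double f) { return 2595.0 * std::log10(1.0 + f / 700.0); }

// Inverse mapping, useful for placing filterbank edges on the mel axis.
double mel_to_hz(double m) { return 700.0 * (std::pow(10.0, m / 2595.0) - 1.0); }
```

On this scale 1000 Hz maps to roughly 1000 mel, and the two functions are exact inverses of each other.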

1.6.2.2 Mel-frequency cepstrum coefficients processor

A block diagram of the structure of an MFCC processor is given in Fig.1.6.2.2. The speech input is recorded at a sampling rate of 16 kHz. This sampling frequency was chosen to minimize the effects of aliasing in the analog-to-digital conversion.

continuous speech → Frame Blocking → frame → Windowing → FFT → spectrum → Mel-frequency Wrapping → mel spectrum → Cepstrum → mel cepstrum


Fig.1.6.2.2: Block diagram of the MFCC processor

The processor takes a continuous speech signal as input and generates mel-frequency cepstrum coefficients as output. It uses the following five steps to accomplish this task:

(1) Frame blocking:

In this step the continuous speech signal is blocked into frames of N = 16 samples, with adjacent frames separated by M = 10 samples. The first frame consists of the first N samples. The second frame begins M samples after the first frame and overlaps it by N - M samples. Similarly, the third frame begins 2M samples after the first frame (i.e. M samples after the second frame) and overlaps it by N - 2M samples. This process continues until all the speech is accounted for within one or more frames.

(2) Windowing:

The next step in the processing is to window each individual frame so as to minimize the signal discontinuities at the beginning and end of each frame. The idea is to reduce spectral distortion by using the window to taper the signal to zero at the beginning and end of each frame. If we define the window as w(n), 0 ≤ n ≤ N-1, where N = 16 is the number of samples in each frame, then the result of windowing is the signal

    y_l(n) = x_l(n) w(n),   0 ≤ n ≤ N-1        (1.1)

A Hamming window is used here, which has the form:

    w(n) = 0.54 - 0.46 cos(2πn / (N-1)),   0 ≤ n ≤ N-1        (1.2)
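Equations (1.1) and (1.2) translate directly into code. The sketch below builds the Hamming window and applies it sample by sample to a frame:

```cpp
#include <cmath>
#include <vector>

const double PI = 3.14159265358979323846;

// Hamming window of Eq. (1.2): w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)).
std::vector<double> hammingWindow(int N) {
    std::vector<double> w(N);
    for (int n = 0; n < N; ++n)
        w[n] = 0.54 - 0.46 * std::cos(2.0 * PI * n / (N - 1));
    return w;
}

// Eq. (1.1): y_l(n) = x_l(n) * w(n), applied sample by sample.
std::vector<double> applyWindow(const std::vector<double>& frame,
                                const std::vector<double>& w) {
    std::vector<double> y(frame.size());
    for (size_t n = 0; n < frame.size(); ++n) y[n] = frame[n] * w[n];
    return y;
}
```

Note that the window is symmetric and tapers to 0.08 (not exactly zero) at both ends, which is a property of the Hamming coefficients 0.54 and 0.46.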

(3) Fast Fourier Transform (FFT):

The next processing step is the Fast Fourier Transform, which converts each frame of N = 16 samples from the time domain into the frequency domain. The FFT is a fast algorithm for implementing the Discrete Fourier Transform (DFT), which is defined on the set of N samples {x_k} as follows:

    X_n = Σ_{k=0}^{N-1} x_k e^(-j2πkn/N),   n = 0, 1, 2, ..., N-1        (1.3)

Note that j is used here to denote the imaginary unit, i.e. j = √-1. In general the X_n are complex numbers. The resulting sequence {X_n} is interpreted as follows: the zero frequency corresponds to n = 0; positive frequencies 0 < f < Fs/2 correspond to values 1 ≤ n ≤ N/2 - 1, while negative frequencies -Fs/2 < f < 0 correspond to N/2 + 1 ≤ n ≤ N-1. Here Fs denotes the sampling frequency, which is 16 kHz. The result after this step is often referred to as the spectrum or periodogram.
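Equation (1.3) can be evaluated directly; this is the O(N²) definition, not the fast algorithm described next, but it is useful as a reference implementation:

```cpp
#include <cmath>
#include <complex>
#include <vector>

// Direct evaluation of Eq. (1.3): X_n = sum_k x_k * exp(-j*2*pi*k*n/N).
std::vector<std::complex<double>> dft(const std::vector<double>& x) {
    const double PI = 3.14159265358979323846;
    size_t N = x.size();
    std::vector<std::complex<double>> X(N);
    for (size_t n = 0; n < N; ++n)
        for (size_t k = 0; k < N; ++k)
            X[n] += x[k] *
                    std::exp(std::complex<double>(0.0, -2.0 * PI * k * n / N));
    return X;
}
```

A unit impulse (1 followed by zeros) has a flat spectrum: every X_n equals 1, which is a quick sanity check for the implementation.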



The FFT algorithm I have used was introduced by J.W. Cooley and J.W. Tukey in 1965. It is based on the following three steps:

1. Decompose an N-point time domain signal into N signals, each containing a single point. An interlaced decomposition is used each time a signal is broken in two; that is, the signal is separated into its even- and odd-numbered samples. There are log2 N stages in this decomposition. Since a 16-point signal is used here, it requires 4 stages.

2. Find the spectrum of each of the N one-point signals. Nothing needs to be done in this step, because the frequency spectrum of a one-point signal is equal to itself. Although no work is involved, it should be remembered that each of the one-point signals is now a frequency spectrum, not a time domain signal.

3. Synthesize the N frequency spectra into a single frequency spectrum. Here the N frequency spectra are combined in the exact reverse order of the time domain decomposition. In other words, this synthesis must undo the interlaced decomposition done in the time domain.



For example, an 8-point time domain signal can be formed from two 4-point signals in two steps: dilute each 4-point signal with zeros to make it an 8-point signal, and then add the signals together. Diluting the time domain with zeros corresponds to a duplication of the frequency spectrum; therefore, the frequency spectra are combined in the FFT by duplicating them and then adding the duplicated spectra together. In order to match up when added, the two time domain signals are diluted with zeros in slightly different ways: in one signal the odd points are zero, while in the other the even points are zero. In other words, one of the time domain signals is shifted to the right by one sample, and this time domain shift corresponds to multiplying the spectrum by a sinusoid.
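The even/odd decomposition and the sinusoid (twiddle-factor) combination described above can be sketched as a recursive radix-2 decimation-in-time FFT. This is an illustrative version, not the report's implementation:

```cpp
#include <cmath>
#include <complex>
#include <vector>

typedef std::complex<double> cd;

// Recursive radix-2 decimation-in-time FFT: split the signal into its
// even and odd samples (the interlaced decomposition), transform each
// half, then combine them with the twiddle factors (the multiplication
// by a sinusoid described above). N must be a power of two, e.g. 16.
std::vector<cd> fft(const std::vector<cd>& x) {
    const double PI = 3.14159265358979323846;
    size_t N = x.size();
    if (N == 1) return x;  // a one-point signal is its own spectrum
    std::vector<cd> even(N / 2), odd(N / 2);
    for (size_t k = 0; k < N / 2; ++k) {
        even[k] = x[2 * k];
        odd[k]  = x[2 * k + 1];
    }
    std::vector<cd> E = fft(even), O = fft(odd), X(N);
    for (size_t k = 0; k < N / 2; ++k) {
        cd t = std::exp(cd(0.0, -2.0 * PI * k / N)) * O[k];
        X[k]         = E[k] + t;  // synthesis: combine the half-spectra
        X[k + N / 2] = E[k] - t;
    }
    return X;
}
```

For a 16-point signal the recursion bottoms out after 4 stages, matching the log2 N = 4 stages counted in step 1 above.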

(4) Mel-frequency wrapping:

The mel-frequency cepstrum coefficients are the spectral components of the frequency response of the sound signal. This step is described in detail in a separate chapter on the DFT and cepstrums.

1.6.3 Feature matching process


1.6.3.1 Introduction:
The problem of speaker recognition belongs to a much broader topic in the scientific and engineering fields called pattern recognition. The goal of pattern recognition is to classify objects of interest into one of a number of categories or classes. The objects of interest are generically called patterns, and in our case they are sequences of acoustic vectors extracted from input speech using the MFCC technique described in the previous section. The classes here refer to individual speakers. Since the classification procedure in our case is applied to extracted features, it can also be referred to as feature matching.

The state of the art in feature matching techniques used in speaker recognition includes Dynamic Time Warping (DTW), Hidden Markov Modeling (HMM), and Vector Quantization (VQ). In this project the VQ approach is used, due to its ease of implementation and high accuracy. VQ is a process of mapping vectors from a large vector space to a finite number of regions in that space. Each region is called a cluster and can be represented by its center, called a codeword. The collection of all codewords is called a codebook.

1.6.3.2 Clustering the training vectors:


After the enrollment session, the acoustic vectors extracted from the input speech of a speaker provide a set of training vectors. The next important step is to build a speaker-specific VQ codebook for this speaker using those training vectors. There is a well-known algorithm, namely the LBG algorithm [Linde, Buzo and Gray, 1980], for clustering a set of L training vectors into a set of M codebook vectors. A simple version of the LBG algorithm is used here:

1. Determine the number of codewords, N, i.e. the size of the codebook.
2. Select N codewords at random and let that be the initial codebook. The initial codewords can be randomly chosen from the set of input vectors.
3. Using the Euclidean distance measure, cluster the vectors around each codeword. Do this by taking each input vector and finding the Euclidean distance between it and each codeword. The input vector belongs to the cluster of the codeword that yields the minimum distance.
4. Compute the new set of codewords by taking the average of each cluster: add the components of each vector in the cluster and divide by the number of vectors in it, as given below:

    y_i = (1/m) Σ_{j=1}^{m} x_ij        (1.6)

where i indexes the components of each vector and m is the number of vectors in the cluster.
5. Repeat steps 3 and 4 until either the codewords don't change or the change in the codewords is small.



1.6.3.3 Comparison of codebooks:
In the enrollment phase, a speaker-specific VQ codebook is generated for the speaker by clustering his/her training acoustic vectors. The distance from a vector to the closest codeword of a codebook is called the VQ distortion. In the verification phase, an input utterance of the speaker is vector-quantized using his/her trained codebook and the total VQ distortion is computed. If this total VQ distortion is below the threshold value, the speaker is authenticated; otherwise the speaker is rejected.
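The verification decision described above can be sketched as follows. The averaging of per-vector distortions and the threshold value are illustrative choices, not the report's exact formulation:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

typedef std::vector<double> Vec;

// Average VQ distortion: each input vector is quantized to its nearest
// codeword and the nearest-codeword distances are averaged. A low value
// means the utterance fits the claimed speaker's codebook well.
double avgDistortion(const std::vector<Vec>& vectors,
                     const std::vector<Vec>& codebook) {
    double total = 0.0;
    for (size_t t = 0; t < vectors.size(); ++t) {
        double best = 1e300;
        for (size_t c = 0; c < codebook.size(); ++c) {
            double d = 0.0;
            for (size_t i = 0; i < vectors[t].size(); ++i)
                d += (vectors[t][i] - codebook[c][i]) *
                     (vectors[t][i] - codebook[c][i]);
            best = std::min(best, std::sqrt(d));
        }
        total += best;
    }
    return total / vectors.size();
}

// Threshold decision of the verification phase.
bool verifyUtterance(const std::vector<Vec>& vectors,
                     const std::vector<Vec>& codebook, double threshold) {
    return avgDistortion(vectors, codebook) < threshold;
}
```

An utterance whose vectors coincide with the codewords has zero distortion and is accepted; vectors far from every codeword push the distortion over the threshold and the claim is rejected.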

1.7 Limitations of the study



The current Voice Biometric Security for Industrial control, though efficient as per the demands and requirements of present trends, still has the following limitations:
- It is hard for the system to distinguish background noise from the user's voice, which can affect the system's accuracy.
- A person's physical condition, such as sickness or a sore throat, affects performance. This can lead to false rejection, in which the right person is rejected as an imposter.
- A small number of false acceptances can be noticed, in which the system accepts an imposter as a valid person.
- The system performs poorly if the voice phrase is very short or if there is a considerable difference between the enrollment and verification recordings. The factors responsible for this difference may be variance in loudness, environmental conditions, or the distance between the speaker and the microphone.

CHAPTER 2

SYSTEM ANALYSIS
This chapter gives an analysis of the Voice Biometric Security for Industrial control. It includes the project overview, functional requirements, system requirements, technical specifications, developer's responsibility overview and a use-case-driven analysis of the system using use-case diagrams.

2.1 Documentation Overview


The document is subdivided into the following topics:
- Project overview
- Functional requirements
- System requirements
- Technical specifications
- Developer's responsibility overview

2.1.1 Project overview


The current work is to automate the process of voice authentication. Voice authentication can be divided into user registration and user verification. The user registration module registers the user's voice characteristics in the database file. The user verification module finds the correlation between the input voice template and the voice template stored in the database.



2.1.2 Functional requirements
The user asks to register his/her voice and username, and the application receives the request. The system checks for the existence of the corresponding username in the stored database file; if it is not present, the user is registered in the database file. The user then asks for verification of the voice; the system receives and processes the request and returns the credibility of the user. These requirements are explained by the use-case diagrams of the UML standard in the last section of this chapter.

2.1.3 System requirements


A noise-free environment is required for the successful operation of the proposed system.

2.1.3.1 Hardware Requirements

Processor          : Pentium or above
RAM                : 32 MB or above
Hard disk          : min. of 1 GB (with at least 30 MB of free space)
Peripheral Devices : SVGA Color Monitor, Mouse, Keyboard, High Quality Microphone, Speakers

2.1.3.2 Software Requirements

Operating System : Windows 95/98/Me/XP/2000/NT
Software         : Visual C++, Sound Card Interface, MSDN Online Help

2.1.4 Technical specifications


The specifications were drawn up using object-oriented analysis, and the design was constructed using the Unified Modeling Language (UML). The whole system was implemented in Visual C++.

2.1.5 Developer's responsibility overview

The overall objective is to deliver a fault-free product on time.

Assumptions: the voice is captured in a noise-free environment through a high-quality microphone and stored in a file; the file is then read for further processing.

- The architecture must be open so that new modules can be added.
- The product must be easy to operate and user friendly.
- The product must conform to the client's hardware.
- Moreover, the client is assumed to be inexperienced with computers; special attention was therefore paid to the specification phase and to communication with the client, and the product has to be made as user friendly as possible.
- There is always the possibility of a major design fault, so extensive testing must be done during the design phase.
- The product must meet the specified storage requirements and response times.

2.2 Use-Case Driven Analysis


The first step in the analysis is to define the use cases, which describe what the system provides in terms of functionality: the functional requirements of the system. A use-case analysis of the UML standard involves reading and analyzing the specifications as well as discussing the system with potential users. The next chapter introduces UML modeling and explains how the system is modeled using UML.

The use-case diagrams describe the functionality of the system and its users. A use case is a description of a system's behavior from the user's standpoint. The main elements of this type of diagram are:
1. Actors, representing the users of a system, including human beings and other systems.
2. Use cases, representing the functionality or services provided by the system to its users.

Now let us see use-case diagrams which depict the project at different levels:

(1) The following use-case diagram, Fig.2.2 (a), depicts the project at the first level, where the actor is shown as a stick figure with the actor's name below it and a use case is shown as an ellipse containing its name.



First Level Use-Case diagram: the actor User is connected to the use cases User Registration, User Verification and User Management.

Fig.2.2 (a): Use-Case diagram for the proposed system

Description of the diagram:

1. Use Case        : User Registration
   Actor           : User
   Desired Outcome : The user name is registered.
   Entered When    : The user submits his name, clicks on User Registration and records his voice (some kind of pass phrase).
   Finished When   : The user's name is registered.
   Description     : This use case registers the user name when it is not present in the database file.

2. Use Case        : User Verification
   Actor           : User
   Desired Outcome : Whether the user is authorized or not.
   Entered When    : The user submits his name, clicks on User Verification and submits his voice.
   Finished When   : The result is given.
   Description     : This use case checks the credibility of the user by comparing the submitted (recorded) pass phrase (after converting it into a template) with the already existing template in the database.

3. Use Case        : User Management
   Actor           : User
   Desired Outcome : The user is deleted or his/her pass phrase is changed according to the requirements.
   Entered When    : The user submits his name, clicks on User Management and, if required, submits his voice.
   Finished When   : The user request is processed.
   Description     : This is an option for the administrator to manage the users.



(2) The following use-case diagrams, Fig.2.2 (b), 2.2 (c) and 2.2 (d), depict the project at the second level, where the actor is shown as a class rectangle with the label <<actor>> and a use case is shown as an ellipse containing its name.

Second Level Use-Case diagrams:

a) Use-Case diagram for User Registration

The <<actor>> Registration module is connected to the use case Enroll, which uses the use cases Sample voice signal, Generate template and Read template file.
Fig.2.2 (b): Use-case Diagram for User Registration

Description of the diagram:

Use Case        : Enroll
Actor           : Registration module
Desired Outcome : The template is written to the database file.
Entered When    : The user selects registration and supplies a name and a pass phrase as voice; the voice signal is sampled.
Finished When   : The characteristics of the voice (template) and the username are written to the database file.
Description     : This use case writes the template to the database file.

b) Use-Case diagram for User Verification

The <<actor>> Verification module is connected to the use case Verify, which uses the use cases Sample voice signal, Generate template and Compare templates.

Fig.2.2 (c): Use-case Diagram for User Verification

Description of the diagram:

Use Case        : Verify
Actor           : Verification module
Desired Outcome : Verifies whether the user is valid or not.
Entered When    : The user selects verification and supplies a name and a pass phrase as voice; the voice signal is sampled.
Finished When   : The final result, accepted or rejected, is displayed.
Description     : This use case verifies the identity of a user.


c) Use-Case diagram for User Management

The <<actor>> Management module is connected to the use cases Delete User and Change Pass phrase, both of which use the use case Write to file.

Fig.2.2 (d): Use-case diagram for User Management

Description of the diagram:

1. Use Case        : Delete User
   Actor           : Management module
   Desired Outcome : The user template is deleted from the file.
   Entered When    : The user selects the Management module and submits his name.
   Finished When   : The user template is deleted from the database.
   Description     : This use case deletes the user template from the database. It in turn uses the use case Write to file.



2. Use Case        : Change Pass phrase
   Actor           : Management module
   Desired Outcome : The modified user template.
   Entered When    : The user selects the Management module and submits the name whose pass phrase has to be modified.
   Finished When   : The user template is modified.
   Description     : This use case modifies the user pass phrase and stores it back into the database. It in turn uses the use case Write to file.

CHAPTER 3

SYSTEM DESIGN
3.1 Introduction
Design is the first step into the solution domain. The Voice Biometric Security for Industrial control is designed around 5 main class modules: CVoiceBiometricDlg, CSound, CParam, CMath, and CVectorQuantization. The following diagram Fig.3.1 shows these classes used in the system:

(Diagram: CMath, CSound, CParam, CVectorQuantization, CVoiceBiometricDlg : Enroll, CVoiceBiometricDlg : Verify, and CVoiceBiometricApp.)

Fig.3.1: Classes used in the system



Description :
The Voice Biometric Security for Industrial control starts by creating an instance of the CVoiceBiometricApp object. Its InitInstance( ) function then creates a dialog window of type CVoiceBiometricDlg. The CVoiceBiometricDlg class mainly presents the user-interface features and acts as a coordination point for calling the various functions. When its OnEnroll( ) function is called, a CSound object is created and the pass phrase is recorded in a temp.wav file. This wave file generated by the CSound class is used by the CParam class to generate MFCCs in a temp.mfc file, which is then used by the CVectorQuantization class to produce a codebook file. In the enrollment phase this codebook file is saved as username.cb (for example, if the username is Kanchan, the codebook file is saved as kanchan.cb). In the verification phase the same steps are repeated and the codebook is saved as temp.cb, which is used by compcbmain( ) to compare the claimed user's codebook with the current codebook. If the matching score is within the acceptance threshold the user is authenticated; otherwise the user is rejected.
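The enrollment/verification file flow described above can be sketched as follows. This is a minimal sketch, not the actual MFC implementation: record_phrase, extract_mfcc and build_codebook are hypothetical stand-ins for the CSound, CParam and CVectorQuantization steps, and an in-memory dict stands in for the .cb files on disk.

```python
def record_phrase():
    # Stand-in for CSound: returns recorded pass-phrase samples (fixed here)
    return [0.1, 0.4, 0.2, 0.9]

def extract_mfcc(samples):
    # Stand-in for CParam: temp.wav -> temp.mfc feature vectors
    return [[s, s * s] for s in samples]

def build_codebook(mfcc):
    # Stand-in for CVectorQuantization: temp.mfc -> codebook
    return sorted(mfcc)

def enroll(username, db):
    """Enrollment: the codebook is stored under the user's name (username.cb)."""
    db[username + ".cb"] = build_codebook(extract_mfcc(record_phrase()))

def verify(username, db):
    """Verification: build a temp codebook and compare it with the claimed
    username.cb; compcbmain() computes a distortion score in the real system,
    replaced here by a trivial equality check."""
    temp_cb = build_codebook(extract_mfcc(record_phrase()))
    claimed_cb = db.get(username + ".cb")
    return claimed_cb is not None and temp_cb == claimed_cb
```

The point of the sketch is the naming convention: enrollment writes username.cb, verification writes temp.cb and compares the two.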

3.2 UML Representation Of The System Design
3.2.1. Introduction to UML
The Voice Biometric Security for Industrial control is designed using an object-oriented design methodology (the Unified Modeling Language). Over the past decade, Grady Booch, James Rumbaugh and Ivar Jacobson collaborated to combine the features of their individual object-oriented analysis and design methods into a unified method; the result, called the Unified Modeling Language (UML), has become widely used throughout the industry.

The Unified Modeling Language (UML) is a modeling language for specifying, visualizing, constructing, and documenting the artifacts of a system-intensive process. UML allows us to express an analysis model using a modeling notation governed by a set of syntactic, semantic and pragmatic rules, and it unifies the industry's best engineering practices for modeling systems. Some points worth noting about UML are:

The UML is not simply a notation for drawing diagrams, but a complete language for capturing knowledge about a subject and expressing that knowledge for the purpose of communication.

UML applies to modeling and systems. Modeling involves a focus on understanding a subject and capturing and being able to communicate this knowledge.



UML is used for specifying, visualizing, constructing, and documenting systems. It is based on the object-oriented paradigm.

3.2.2 UML Diagrams


The UML metamodel elements are organized into diagrams. Different diagrams are used for different purposes depending on the angle from which you are viewing the system. The different views are called architectural views. Architectural views facilitate the organization of knowledge, and diagrams enable the communication of knowledge. The UML defines nine types of diagrams: class, object, use case, sequence, collaboration, statechart, activity, component, and deployment diagram. All of these diagrams are based on the principle that concepts are depicted as symbols and relationships among concepts are depicted as paths connecting symbols, where both of these types of elements may be named. Now let us see an architectural view called Behavioral Model View, which encompasses the dynamic, or behavioral, aspects of a problem and solution. Our system design can be best explained using the following diagrams of this view:

(1) Sequence Diagrams:



These diagrams describe the behavior provided by a system to interactors. They contain classes that exchange messages within an interaction arranged in time sequence. In generic form, these diagrams describe a set of message exchange sequences among a set of classes; in instance form, they describe how objects of those classes interact with each other and how messages are sent and received.

The following Sequence Diagrams Fig.3.2.2(a), 3.2.2(b) and 3.2.2(c) show the interactions between the objects of the classes used in the system during different phases. These objects are represented in the usual way as named rectangles and appear on the horizontal axis of the diagram from left to right. Messages are represented as solid-line arrows, which give the interaction taking place among the objects in a specified sequence. Time is represented by a vertical line extending from the object, called the object's lifeline, which represents the object's existence during the interaction.

(a) The following sequence diagram Fig.3.2.2(a) shows the interactions between the objects of the classes used in the system during the enrollment phase:


(Diagram: the Enrollment actor calls OnEnroll( ) on CVoiceBiometricDlg, which calls RecordStart( ) on CSound; temp.wav passes to CParam, temp.mfc to CVectorQuantization, and the resulting temp.cb is saved as username.cb.)

Fig.3.2.2(a): Sequence diagram for User Enrollment



Description :
1. When the user tries to register his voice in the database, the OnEnroll( ) function of the CVoiceBiometricDlg object is called, which in turn calls the RecordStart( ) function of the CSound object to record the user's voice.
2. After recording, the voice is stored into the temp.wav file, which is then used by the CParam object to extract the voice features from it.
3. After extraction, the extracted features are stored into the temp.mfc file, which is then used by the CVectorQuantization object to generate the temp.cb codebook file.
4. The generated codebook file is then saved as username.cb into the database by the saveusercb( ) function of the CVoiceBiometricDlg object.



(b) The following sequence diagram Fig.3.2.2(b) shows the interactions between the objects of the classes used in the system during verification phase:

(Diagram: the Verification actor calls OnVerify( ) on CVoiceBiometricDlg, which calls RecordStart( ) on CSound; temp.wav passes to CParam, temp.mfc to CVectorQuantization, which produces temp.cb, compares it via compcbmain( ), and the Result is returned.)

Fig.3.2.2(b): Sequence diagram for User Verification


Description :
1. When the user tries to verify his registered voice, the OnVerify( ) function of the CVoiceBiometricDlg object is called, which in turn calls the RecordStart( ) function of the CSound object to record the user's voice.
2. Steps 2 and 3 of Fig.3.2.2(a) are followed similarly.
3. The generated temp.cb codebook is then compared with the claimed username.cb codebook by the compcbmain( ) function of the CVectorQuantization object.
4. Depending on the matching score, the CVoiceBiometricDlg object declares the Result to the user: authenticated or not authenticated.



(c) The following sequence diagram Fig.3.2.2(c) shows the interactions between the objects of the classes used in the system when the user modifies his voice:

(Diagram: the Modification actor calls OnModify( ) on CVoiceBiometricDlg, which calls RecordStart( ) on CSound; temp.wav passes to CParam, temp.mfc to CVectorQuantization, which produces temp.cb, and the old user codebook is replaced with the new one.)

Fig.3.2.2(c): Sequence diagram for User Modification


Description :
1. When the user tries to modify his registered voice, the OnModify( ) function of the CVoiceBiometricDlg object is called, which in turn calls the RecordStart( ) function of the CSound object to record the user's voice.
2. Steps 2 and 3 of Fig.3.2.2(a) are followed similarly.
3. The old username.cb is then replaced with the new username.cb codebook.

(2) Statechart Diagrams:
These diagrams render the states and responses of a class participating in behavior, and the life cycle of an object. They describe the behavior of a class in response to external stimuli. In the following statechart diagrams Fig.3.2.2(d), 3.2.2(e) and 3.2.2(f) for the different classes of the system, a state is represented as a rounded box that describes the class at a specific point in time. The states shown in all three diagrams are self-explanatory. The initial state is shown as a solid circle, and the final state as a circle surrounding a small solid circle, a bull's-eye. A transition is represented as a solid-line arrow, which shows the relationship



between the two states indicating that the class in the first state will enter the second state and perform certain actions when a specific event occurs.

(a) The following diagram Fig.3.2.2(d) shows the Statechart diagram for the CSound class in the system:

(Diagram: Start → Loads the default recording format / Gets the given recording format → Begin recording voice → Stop recording voice → Stop.)

Fig.3.2.2(d): Statechart diagram for CSound class


(b) The following diagram Fig.3.2.2 (e) shows the Statechart diagram for CParam class in the system:

(Diagram: Start → Continuous speech signal is divided into frames → Each frame is windowed → FFT is applied to each windowed frame → Spectrum coefficients are converted into mel-frequency spectrum coefficients → Cepstrum is applied to each mel spectrum, giving the MFCC coefficients → Stop.)

Fig.3.2.2(e): Statechart diagram for CParam class
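The first three states of this diagram (framing, windowing, transform) can be sketched as follows. The frame length, hop size, and the naive DFT used in place of a real FFT are illustrative assumptions, not values taken from the report.

```python
import math

def split_frames(signal, frame_len=256, hop=128):
    # State 1: divide the continuous speech signal into overlapping frames
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def hamming(frame):
    # State 2: window each frame (a Hamming window, as DoHamming() applies)
    n = len(frame)
    return [s * (0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)))
            for k, s in enumerate(frame)]

def magnitude_spectrum(frame):
    # State 3: transform to the frequency domain (naive DFT for clarity;
    # the real system uses an FFT with bit-reversal sorting)
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(frame))
        im = sum(s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(frame))
        spec.append(math.hypot(re, im))
    return spec
```

A quick sanity check for the chain: a 256-sample frame containing a sine at bin 8 should produce a spectral peak at index 8.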



(c) The following diagram Fig.3.2.2(f) shows the Statechart diagram for CVectorQuantization class in the system:

(Diagram: Start → Initialize the size of the codebook → By applying the VQ algorithm, clusterize the vectors to prepare the codebook → Save the codebook in a file (enrollment) / Compare this codebook with the trained codebook to calculate the total VQ distortion (verification) → Stop.)

Fig.3.2.2 (f): Statechart diagram for CVectorQuantization class
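The clusterize state can be sketched with the LBG splitting approach and Euclidean distance named in Chapter 4. This is a minimal sketch: the perturbation factor and iteration count are assumptions, and no file I/O (initcb/writecb) is shown.

```python
import math

def eucldist(a, b):
    # Euclidean distance between two acoustic vectors (eucldist())
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def clusterize(vectors, codebook_size, iters=20, eps=0.01):
    """LBG-style clustering (clusterize()): start from the global centroid,
    split every codeword into a perturbed pair, then refine with k-means
    passes until the requested codebook size is reached."""
    dim = len(vectors[0])
    codebook = [[sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]]
    while len(codebook) < codebook_size:
        # split each centroid into two slightly perturbed codewords
        codebook = [[c * (1 + s) for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            # assign every vector to its nearest codeword
            clusters = [[] for _ in codebook]
            for v in vectors:
                nearest = min(range(len(codebook)),
                              key=lambda j: eucldist(v, codebook[j]))
                clusters[nearest].append(v)
            # recompute centroids, keeping the old codeword for empty cells
            codebook = [[sum(v[d] for v in cl) / len(cl) for d in range(dim)]
                        if cl else cw
                        for cl, cw in zip(clusters, codebook)]
    return codebook
```

With two well-separated groups of vectors and a codebook size of 2, the returned codewords land near the two group centroids.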


(3) Activity Diagram: Activity diagrams render the activities or actions of a class participating in behavior. These diagrams describe the behavior of a class in response to internal processing rather than external events. Activity diagrams describe the processing activities within a class. An activity diagram is similar to a statechart diagram, where states are activities (rounded boxes) representing the performance of operations and the transitions (solid line arrows) are triggered by the completion of the operations. The following activity diagram Fig.3.2.2 (g) shows the activities of 4 main phases of the system: Enroll, Modify, Verify & Delete (shown from left to right). All the activities shown in this diagram are self-explanatory.


Fig.3.2.2(g): Activity diagram of the system

(Diagram: four parallel flows, Enroll, Modify, Verify and Delete, each starting with the user entering his name. In Enroll, Modify and Verify the user is asked for a pass phrase, MFCC coefficients are extracted from the recorded pass phrase, and the vectors are clusterized by applying the vector quantization algorithm; Enroll stores the voice features in a file, Modify stores them in a file which then replaces the old file, and Verify compares them with the features already stored in the file and displays a message for real user/imposter. Delete simply removes the user from the database.)

CHAPTER - 4

IMPLEMENTATION
4.1 Coding Details
During the course of my involvement with speaker verification, I have come across several papers about speaker verification technologies, but unfortunately most of them were not directly concerned with the topic: many present the theoretical aspects, but very few present implementation methodologies. I have implemented a Voice Biometric Security for Industrial control that uses Mel Frequency Cepstrum Coefficients (MFCC) for feature extraction and Vector Quantization (VQ) for feature matching.

For sound recording, the duration of recording has been set to three seconds. The exact timing of an utterance will generally not be the same as that of the template. If a person speaks just a small word (e.g. Hello), he may not be authenticated although he is a genuine user, and others may more easily log in to his account. For this reason, if the utterance length is less than one second the system asks the user to speak a longer phrase.
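The minimum-length rule above can be sketched as follows; the 8 kHz sample rate is an assumption, since the report does not state the recording format.

```python
def utterance_seconds(samples, sample_rate=8000):
    # Duration of the (silence-stripped) utterance in seconds
    return len(samples) / sample_rate

def phrase_long_enough(samples, sample_rate=8000, min_seconds=1.0):
    """Return False when the utterance is under one second, in which case
    the system asks the speaker for a longer phrase."""
    return utterance_seconds(samples, sample_rate) >= min_seconds
```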



Genuine users can be correctly authenticated provided that they speak in the same way (including emotion and loudness, bearing in mind that this is a word-dependent and speaker-dependent system) as they did at the time of enrollment. Authentication becomes easier if the speech fits within about two seconds of the recording time.

The system is implemented in the object-oriented paradigm and has been divided into 5 main classes: CSound for recording wave files, CParam for extracting the MFCCs from the wave file, and CVectorQuantization for generating the codebook from the MFCCs. CMath is a class which provides an easy way of calculating the Fast Fourier Transform and related operations. CVoiceBiometricDlg is the main dialog where the user interface is implemented. The other classes in the system are various controls and some ActiveX controls. The next section explains the functions used by these classes and how the development of the code proceeds.


4.2 Detail Implementation


The following Table 4.2 lists the main functions used by the classes in the system:

CLASS: CVoiceBiometricDlg
FUNCTIONS: OnEnroll( ), OnVerify( ), OnModify( ), OnDelete( ), registeruser( ), SetTimer( ), OnTimer( ), SaveFile( ), RemoveSilence( ), rkmain( ), WaveLoad( ), vq( ), saveusercb( )

CLASS: CSound
FUNCTIONS: RecordStart( ), RecordStop( )

CLASS: CParam
FUNCTIONS: InitFBank( ), Mel( ), InitMFCC( ), wave2MFCC( ), DoHamming( ), UnInitMFCC( ), UnInitFBank( )

CLASS: CMath
FUNCTIONS: fft( ), fftsort( ), dct( )

CLASS: CVectorQuantization
FUNCTIONS: initcb( ), clusterize( ), eucldist( ), writecb( ), compcbmain( )

Table 4.2: Classes and their functions used in the system


The CVoiceBiometricDlg class provides four important functions to manipulate the user's voice signal:
1. OnEnroll( ) : Registers the user's voice features in the database.
2. OnVerify( ) : Verifies the current user's voice features against the claimed user's voice features.
3. OnModify( ) : Replaces the old user's voice features with the new user's voice features.
4. OnDelete( ) : Deletes the user's voice features from the database.
The first three functions in turn use various functions of the other four classes, CSound, CParam, CMath and CVectorQuantization, to accomplish their tasks. Let us see how these functions proceed during the development of the code:

(1) OnEnroll( ) function: The steps followed by this function are as follows:
1. The user's voice is recorded with the help of the registeruser( ) function, which in turn takes the help of the following functions:
a) RecordStart( ) : To begin the recording of the user's voice.



b) SetTimer( ) : A function of the Microsoft Foundation Class (MFC) CWnd. It is used to set a timer of 3 seconds for recording the user's voice.
2. When the timer elapses, the OnTimer( ) function is called, in which the timer code is written. All of the following steps are carried out in this code.
3. The RecordStop( ) function is called to stop the recording.
4. The recorded voice is stored into a file called temp.wav by the SaveFile( ) function.
5. The RemoveSilence( ) function is called to remove the silence present in the temp.wav file.
6. The rkmain( ) function is called to extract the MFCCs from the temp.wav file. For this it uses the following steps:
a) First, the temp.wav file is loaded into memory by the WaveLoad( ) function.
b) Then the functions of the CParam class are called in the following order:
i) InitFBank( ) : To initialize the FBankInfo structure, which gives the information needed about the filter banks to be created. It in turn uses Mel( ) to apply the mel-frequency scale to each frame.
ii) InitMFCC( ) : To initialize the MFCCInfo structure, which gives the information needed about the MFCCs to be created.



iii) wave2MFCC( ) : To extract the MFCCs from the temp.wav file. The MFCCs are stored into a file called temp.mfc. This function in turn calls other functions in the following order: DoHamming( ) to apply windowing to each frame; fft( ) to apply the Fast Fourier Transform algorithm, which in turn calls fftsort( ) to implement the interlaced decomposition using the bit-reversal sorting algorithm, rearranging the order of the N time-domain samples by counting in binary with the bits flipped left-for-right; and dct( ) to apply the Discrete Cosine Transform to each frame.
iv) UnInitMFCC( ) : To uninitialize the MFCCInfo when the work is done.
v) UnInitFBank( ) : To uninitialize the FBankInfo when the work is done.
7. The vq( ) function is called to clusterize the acoustic vectors of temp.mfc into a codebook called temp.cb. It in turn calls the functions of the CVectorQuantization class in the following order:
a) initcb( ) : To initialize the size of the codebook.
b) clusterize( ) : To clusterize the acoustic vectors using the LBG algorithm. It in turn uses the eucldist( ) function to calculate the Euclidean distance between the vectors.
c) writecb( ) : To store the codebook in a file called temp.cb.



8. The temp.cb file is stored as username.cb into the database by the saveusercb( ) function.

(2) OnVerify( ) function: The steps followed by this function are as follows:
1. Steps 1 to 7 of the OnEnroll( ) function are followed similarly, to generate the codebook temp.cb for the current user.
2. To verify the current user's codebook against his trained codebook, the compcbmain( ) function is then called, where the total VQ distortion is calculated by comparing the current temp.cb with the claimed username.cb of the user.
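One plausible reading of the compcbmain( ) comparison is to average each test vector's distance to its nearest codeword; the exact scoring rule and the acceptance threshold are assumptions, as the report does not spell them out.

```python
import math

def eucldist(a, b):
    # Euclidean distance between two acoustic vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def total_vq_distortion(test_vectors, codebook):
    # Average nearest-codeword distance over all test vectors
    return sum(min(eucldist(v, c) for c in codebook)
               for v in test_vectors) / len(test_vectors)

def authenticated(test_vectors, codebook, threshold=1.0):
    # Accept the speaker when the distortion falls below the threshold
    return total_vq_distortion(test_vectors, codebook) < threshold
```

A genuine speaker's vectors lie close to their own codebook, so the distortion stays small; an imposter's vectors land far from every codeword and the score rises above the threshold.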

(3) OnModify( ) function: Steps 1 to 8 of the OnEnroll( ) function are followed similarly, but in step 8 the old username.cb is replaced with the new temp.cb of the user.
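Two of the helpers referred to in the steps above, Mel( ) and fftsort( ), can be sketched as follows. The mel-scale constants are the commonly used ones, an assumption since the report does not give the formula.

```python
import math

def mel(f_hz):
    # Common mel-scale mapping applied by Mel() (constants are an assumption)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def fftsort(samples):
    """Bit-reversal sorting: rearrange N (power-of-two) time-domain samples
    by counting in binary with the bits flipped left-for-right."""
    n = len(samples)
    bits = n.bit_length() - 1
    out = [0.0] * n
    for i, s in enumerate(samples):
        rev = int(format(i, "0{}b".format(bits))[::-1], 2)
        out[rev] = s
    return out
```

On the mel scale, 1000 Hz maps to roughly 1000 mel, and the mapping compresses higher frequencies, mirroring human pitch perception.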


CHAPTER - 5

Output


CHAPTER - 6

Hardware Specifications
Parallel port modes
The IEEE 1284 standard, published in 1994, defines five modes of data transfer for the parallel port:
1. Compatibility Mode
2. Nibble Mode
3. Byte Mode
4. EPP
5. ECP



As the name suggests, data is transferred over the Data lines, the Control lines are used to control the peripheral and, of course, the peripheral returns status signals back to the computer through the Status lines. These lines are connected internally to the Data, Control and Status registers. The details of the parallel port signal lines are given below:

Pin (DB25)   Signal name       Direction   Register bit   Inverted
1            nStrobe           Out         Control-0      Yes
2            Data0             In/Out      Data-0         No
3            Data1             In/Out      Data-1         No
4            Data2             In/Out      Data-2         No
5            Data3             In/Out      Data-3         No
6            Data4             In/Out      Data-4         No
7            Data5             In/Out      Data-5         No
8            Data6             In/Out      Data-6         No
9            Data7             In/Out      Data-7         No
10           nAck              In          Status-6       No
11           Busy              In          Status-7       Yes
12           Paper-Out         In          Status-5       No
13           Select            In          Status-4       No
14           Linefeed          Out         Control-1      Yes
15           nError            In          Status-3       No
16           nInitialize      Out         Control-2      No
17           nSelect-Printer   Out         Control-3      Yes
18-25        Ground            -           -              -

6.1 Parallel port registers


As you know, the Data, Control and Status lines are connected to their corresponding registers inside the computer. So, by manipulating these registers



in a program, one can easily read from or write to the parallel port with programming languages like 'C' and BASIC. The registers found in a standard parallel port are:
1. Data register
2. Status register
3. Control register
As their names specify, the Data register is connected to the Data lines, the Control register to the Control lines and the Status register to the Status lines. (Here the word connection does not mean there is a physical wire between the lines and the registers; the registers are virtually connected to the corresponding lines.) So, whatever you write to these registers will appear on the corresponding lines as voltages. Of course, you can measure it with a multimeter. And whatever you give to the parallel port as voltages can be read from these registers (with some restrictions). For example, if we write '1' to the Data register, the line Data0 will be driven to +5V. Just like this, we can programmatically turn on and off any of the Data lines and Control lines.

6.2 Where are these registers?

In an IBM PC, these registers are IO mapped and have a unique address. We have to find these addresses to work with the parallel port. For a typical PC, the base address of LPT1 is 0x378 and of LPT2 is 0x278. The Data register resides at this base address, the Status register at base address + 1 and the



Control register is at base address + 2. So, once we have the base address, we can calculate the address of each register in this manner. The table below shows the register addresses of LPT1 and LPT2:

Register                              LPT1    LPT2
Data register (base address + 0)      0x378   0x278
Status register (base address + 1)    0x379   0x279
Control register (base address + 2)   0x37a   0x27a
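The address arithmetic and the inverted control bits from the pin table can be modeled as below. Actual port I/O would additionally need an outb( )-style call (for example via an ioperm/outb pair on Linux or the inpout32 library on Windows), which is omitted here.

```python
DATA, STATUS, CONTROL = 0, 1, 2

def register_address(base, register):
    # Data at base+0, Status at base+1, Control at base+2
    return base + register

# Control bits flagged 'Inverted' in the pin table:
# C0 (nStrobe), C1 (Linefeed) and C3 (nSelect-Printer)
CONTROL_INVERT_MASK = 0b1011

def control_value(desired_lines):
    """Byte to write to the Control register so the physical pins carry
    desired_lines: the hardware-inverted bits must be pre-flipped."""
    return (desired_lines & 0b1111) ^ CONTROL_INVERT_MASK
```

For instance, driving all four control lines low still requires writing 0b1011, because three of the bits are inverted by the hardware.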

Standard LCD Pin Matches (Character number < 80)

Pin   Symbol/Alternate Symbol   Possibility   Function
1     Vss                       -             Power supply (GND)
2     Vdd/Vcc                   -             Power supply (+5V)
3     Vee/Vo                    -             Contrast adjust
4     RS                        0/1           0 = Instruction input / 1 = Data input
5     R/W                       0/1           0 = Write to LCD module / 1 = Read from LCD module
6     E                         1, 1->0       Enable signal
7     DB0                       0/1           Data pin 0
8     DB1                       0/1           Data pin 1
9     DB2                       0/1           Data pin 2
10    DB3                       0/1           Data pin 3
11    DB4                       0/1           Data pin 4
12    DB5                       0/1           Data pin 5
13    DB6                       0/1           Data pin 6
14    DB7                       0/1           Data pin 7

If your LCD has more than 80 characters (like 4x40):

15    E2                        1, 1->0       Enable signal, rows 2 & 3
16    -                         -             Mostly not used

HY1602B (Hyper 1602B) with KS0065 controller (compatible with HD44780) and backlight.


1602-04 with KS0066U controller (compatible with HD44780) and backlight.

These are two different 2x16 LCDs; as shown in the pictures, on these modules the 15th and 16th pins are for the backlight, as mentioned above.

Pinout Descriptions
Pin 1, 2, 3: According to the table, I call Pin 1 (Vss), 2 (Vdd/Vcc) and 3 (Vee/Vo) the power pins because they are the gates to power. Pin 1 is for ground, so you have to connect it to ground/earth, and Pin 2 is for the +5V power supply. 6V or 4.5V is mostly acceptable at a few amperes, and 3V is acceptable on some of the LCD modules (you can also power these modules with a battery in a very economical way). In my application I get the voltage from the Molex cable of the PC, which is



inside the case. And Pin 3 is for the LCD's contrast adjustment. I did not, but you could, use a potentiometer (a 10K pot will be OK) for changing the contrast of your LCD; see the schematics below.

Pin 4, 5, 6: I call Pin 4 (RS), 5 (R/W) and 6 (E) the control buddies, because these pins are the arms of your controller inside the LCD module. Pin 4 (RS) is register select: if this pin is low, the data presented on the data pins is taken as commands by the LCD; if it is high, the LCD can receive/send 8- or 4-bit character data. I call Pin 5 (R/W) the clerk, because when this pin is low you can write characters to the LCD, and when it is high you can read character data or status information from the LCD. I didn't make any read operations in my app, so I soldered this pin to the ground (by soldering it to the ground I made this pin low, "0"; see the circuits below). Pin 6 (E), which I call the guardian, is used to initiate the actual transfer of commands or character data between the LCD module and the data pins.

Pin 7, 8, 9, 10, 11, 12, 13, 14: The eight pins DB0-DB7 are the data pins, which I call the workers. Data can be transferred to or fetched from the LCD in 8 or 4 bits. Which one is better? This is up to you; by the way, if you are using a microcontroller and you have few pins, you can use your module in 4-bit mode (by using DB4-DB7). I used 8-bit mode in my LCD because I used the parallel port, which already has 8-bit data lines (remember my first article, part 1, D0-D7).

Pin 15, 16: These two pins are for the backlight of the LCD module. The 15th pin goes to the power supply (VB+) and the 16th pin goes to the ground (VB-). Backlight is very useful in dim environments, but some LCD modules don't have backlights. There are multicolored LCDs around as well.
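The RS/E handshake described above can be sketched as a bus-level simulation. This is only a sketch: lcd_write logs the pin operations rather than driving real hardware, and 0x01 is the standard HD44780 clear-display command.

```python
def lcd_write(byte, rs, bus_log):
    """Send one byte to an HD44780-style module: set RS (0 = command,
    1 = data), present the byte on DB0-DB7, then pulse E high-to-low to
    latch it.  R/W is soldered low (write-only), so it never changes."""
    bus_log.append(("RS", rs))
    bus_log.append(("DATA", byte))
    bus_log.append(("E", 1))   # the 'guardian' pin goes high
    bus_log.append(("E", 0))   # falling edge latches the transfer
    return bus_log

def lcd_print(text):
    # Clear the display, then send each character as data
    log = []
    lcd_write(0x01, rs=0, bus_log=log)   # clear-display command
    for ch in text:
        lcd_write(ord(ch), rs=1, bus_log=log)
    return log
```

Each byte, whether command or character, needs exactly one E pulse; that is the whole trick of talking to these modules over the parallel port.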

Circuit



What do we need to supply for our circuit? Below is a list of that:

1. 2x16 parallel LCD, must be HD44780 compatible
2. Normal parallel printer cable
3. Normal twin power cable
4. 10-way housing and the PCB header
5. 6-way housing and the PCB header
6. Hard-drive type power connector (Molex)
7. 16 PCB terminals
8. A digital multimeter (must measure a few amperes!), a soldering pen with some soldering iron
9. Some patience, and my program :)
*  10K potentiometer (not required; needed when you want to adjust the contrast of your LCD - see the circuit with potentiometer)

The circuit with potentiometer


If your soldering goes well, you get the typical test screen of a character-based LCD, as shown below:


Power Supply
Regulated power supply:


The +5V supply voltage is obtained from a 12-volt transformer. The output of the secondary is applied to a bridge rectifier, which converts the sinusoidal input into a full-wave rectified output. The filter capacitor at the output of the bridge rectifier charges to the peak value of the rectifier output voltage whenever the diodes are forward biased. Since the diodes are not forward biased during the entire positive and negative half cycles of the waveform, the voltage across the filter capacitor is a pulsating DC, i.e. a combination of DC and ripple voltage. From this pulsating DC voltage, a regulated DC voltage is obtained by the regulator IC 7805.

Pin configuration of LM 7805 IC:

Specification:
IC No.    Vout (V)   Imax (A)   Load regulation (mV)   Line regulation (mV)   Dropout (V)



LM7805    5          1.5        10                     3                      2
LM7806    6          1.5        12                     5                      2

Hardware:
Everybody knows what the parallel port is, where it can be found, and what it is used for. The primary use of the parallel port is to connect printers to the computer, and it is specifically designed for this purpose. Thus it is often called the printer port or Centronics port (this name came from a popular printer manufacturing company, 'Centronics', which devised some standards for the parallel port). You can see the parallel port connector on the rear panel of your PC. It is a 25-pin female (DB25) connector (to which the printer is connected). On almost all PCs only one parallel port is present, but you can add more by buying and inserting ISA/PCI parallel port cards.

The power supply is a 9-volt regulated power supply. We have used a step-down transformer of 9 volts. The output is rectified using a bridge rectifier circuit, passed through a regulator IC, and the output is coupled through a 1-microfarad capacitor for better impedance matching.

STEPPING MOTORS:



Motion control, in electronic terms, means to accurately control the movement of an object based on speed, distance, load, inertia or a combination of these factors. There are numerous types of motion control systems, including stepper motor, linear step motor, DC brush, brushless, servo, brushless servo and more. This document will concentrate on step motor technology. In theory, a stepper motor is a marvel of simplicity. It has no brushes or contacts; basically it is a synchronous motor with the magnetic field electronically switched to rotate the armature magnet around. A stepping motor system consists of three basic elements, often combined with some type of user interface (host computer, PLC or dumb terminal):

The Indexer (or Controller) is a microprocessor capable of generating step pulses and direction signals for the driver. In addition, the indexer is typically required to perform many other sophisticated command functions.

The Driver (or Amplifier) converts the indexer's command signals into the power necessary to energize the motor windings. There are numerous types of drivers, with different current ratings and construction technology. Not all drivers are suitable for all motors, so driver selection is a critical part of designing a motion control system.

The Step Motor is an electromagnetic device that converts digital pulses into mechanical shaft rotation.
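The indexer's role can be sketched as a small simulation: it issues step counts with a direction signal, and the driver/motor pair advances the shaft by a fixed angle per step. The class and names below are illustrative, not from any real motion-control library; the 1.8° default matches the standard 200-step hybrid motor discussed later.

```python
FULL_STEP_DEG = 1.8  # standard 200-step hybrid motor

class Indexer:
    """Toy indexer: issues step commands with a direction signal and
    tracks the shaft position the driver/motor pair would reach."""

    def __init__(self, step_angle=FULL_STEP_DEG):
        self.step_angle = step_angle
        self.position = 0.0  # shaft angle in degrees, modulo 360

    def step(self, count, direction=+1):
        # direction: +1 = clockwise, -1 = counterclockwise
        self.position = (self.position + direction * count * self.step_angle) % 360

idx = Indexer()
idx.step(50)         # 50 full steps = a quarter revolution clockwise
print(idx.position)  # → 90.0
```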



Advantages of step motors are low cost, high reliability, high torque at low speeds and a simple, rugged construction that operates in almost any environment. The main disadvantages of a step motor are the resonance effect often exhibited at low speeds and decreasing torque with increasing speed.

TYPES OF STEPPER MOTORS:


There are basically three types of stepping motors: variable reluctance, permanent magnet and hybrid. They differ in construction, based on the use of permanent magnets and/or iron rotors with laminated steel stators.

VARIABLE RELUCTANCE:
The variable reluctance motor does not use a permanent magnet. As a result, the motor rotor can move without constraint or "detent" torque. This type of construction is good in non-industrial applications that do not require a high degree of motor torque, such as the positioning of a micro slide. The variable reluctance motor in the illustration below has four "stator pole sets" (A, B, C), set 15 degrees apart. Current applied to pole A through the motor winding



causes a magnetic attraction that aligns the rotor (tooth) to pole A. Energizing stator pole B causes the rotor to rotate 15 degrees in alignment with pole B. This process will continue with pole C and back to A in a clockwise direction. Reversing the procedure (C to A) would result in a counterclockwise rotation.

FIG: VARIABLE RELUCTANCE MOTOR
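The A-to-B-to-C energizing sequence described above can be sketched as a small simulation: each transition to the next pole in order advances the rotor 15 degrees, and reversing the sequence reverses the rotation. The function name and three-pole encoding are ours; the 15° figure follows the illustration in the text.

```python
STEP_DEG = 15  # rotor movement per pole transition, as in the illustration

def rotate(sequence):
    """Return the net rotor angle (degrees) after energizing poles in order.

    Forward transitions (A->B, B->C, C->A) each add 15 degrees (clockwise);
    reversed transitions each subtract 15 degrees (counterclockwise)."""
    order = "ABC"
    angle = 0
    pos = order.index(sequence[0])
    for pole in sequence[1:]:
        nxt = order.index(pole)
        # a cyclic distance of 1 is a forward step; otherwise it is a reverse step
        angle += STEP_DEG if (nxt - pos) % 3 == 1 else -STEP_DEG
        pos = nxt
    return angle

print(rotate("ABCA"))  # clockwise sequence → 45
print(rotate("ACBA"))  # reversed sequence  → -45
```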


PERMANENT MAGNET:
The permanent magnet motor, also referred to as a "canstack" motor, has, as the name implies, a permanent magnet rotor. It is a relatively low speed, low torque device with large step angles of either 45 or 90 degrees. Its simple construction and low cost make it an ideal choice for non-industrial applications, such as a line printer print wheel positioner.


Fig: Permanent Magnet Motor

Unlike the other stepping motors, the PM motor rotor has no teeth and is designed to be magnetized at a right angle to its axis. The above illustration shows a simple, 90 degree PM motor with four phases (A-D). Applying current to each phase in sequence will cause the rotor to rotate by adjusting to the changing magnetic fields. Although it operates at fairly low speed, the PM motor has a relatively high torque characteristic.

HYBRID:
Hybrid motors combine the best characteristics of the variable reluctance and permanent magnet motors. They are constructed with multi-toothed stator poles and a permanent magnet rotor. Standard hybrid motors have 200 rotor teeth and rotate at 1.8 degree step angles. Other hybrid motors are available in 0.9 and 3.6 degree step angle configurations. Because they exhibit high static and dynamic torque



and run at very high step rates, hybrid motors are used in a wide variety of industrial applications.

Fig: Hybrid Motor

MOTOR WINDINGS:

UNIFILAR:


Unifilar, as the name implies, has only one winding per stator pole. Stepper motors with a unifilar winding will have 4 lead wires. The following wiring diagram illustrates a typical unifilar motor:

BIFILAR:
Bifilar wound motors have two identical sets of windings on each stator pole. This type of winding configuration simplifies operation, in that transferring current from one coil to another one, wound in the opposite direction, will reverse the rotation of the motor shaft. In a unifilar application, by contrast, changing direction requires reversing the current in the same winding.


Fig: Bifilar Motor Leads

The most common wiring configuration for bifilar wound stepping motors is 8 leads, because they offer the flexibility of either a series or parallel connection. There are, however, many 6 lead stepping motors available for series connection applications.

STEP MODES:
Stepper motor "step modes" include Full, Half and Microstep. The type of step mode output of any motor is dependent on the design of the driver.

FULL STEP:
Standard (hybrid) stepping motors have 200 rotor teeth, or 200 full steps per revolution of the motor shaft. Dividing the 360 degrees of rotation by 200 steps gives a 1.8 degree full step angle. Normally, full step mode is achieved by energizing both windings while reversing the



current alternately. Essentially one digital input from the driver is equivalent to one step.

HALF STEP:
Half step simply means that the motor is rotating at 400 steps per revolution. In this mode, one winding is energized and then two windings are energized alternately, causing the rotor to rotate at half the distance, or 0.9 degrees. (The same effect can be achieved by operating in full step mode with a 400 step per revolution motor.) Half stepping is, however, a more practical solution in industrial applications. Although it provides slightly less torque, half step mode reduces the amount of "jumpiness" inherent in running in full step mode.

MICROSTEP:
Microstepping is a relatively new stepper motor technology that controls the current in the motor winding to a degree that further subdivides the number of positions between poles. AMS microsteppers are capable of rotating at 1/256 of a full step, or over 50,000 steps per revolution.
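The three step modes differ only in how finely one revolution of the 200-tooth rotor is subdivided, which a quick arithmetic check makes concrete (the 1/256 microstep figure is the AMS value quoted above; the function name is ours):

```python
FULL_STEPS = 200  # rotor teeth on a standard hybrid motor

def step_angle(subdivisions_per_full_step):
    """Degrees of shaft rotation per (micro)step."""
    return 360.0 / (FULL_STEPS * subdivisions_per_full_step)

print(step_angle(1))     # full step  → 1.8
print(step_angle(2))     # half step  → 0.9
print(step_angle(256))   # microstep  → 0.00703125
print(FULL_STEPS * 256)  # → 51200 steps per revolution, i.e. "over 50,000"
```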


FIG: RESOLUTION VS CPU STEP FREQUENCY


Microstepping is typically used in applications that require accurate positioning and a fine resolution over a wide range of speeds. MAX-2000 microsteppers integrate state-of-the-art hardware with "VRMC" (Variable Resolution Microstep Control) technology developed by AMS. At slow shaft speeds, VRMC produces high resolution microstep positioning for silent, resonance-free operation. As shaft speed increases, the output step resolution is expanded using "on-motor-pole" synchronization. At the completion of a coarse index, the target micro position is trimmed to 1/100 of a (command) step to achieve and maintain precise positioning.

DESIGN CONSIDERATIONS:



The electrical compatibility between the motor and the driver is the most critical factor in a stepper motor system design. Some general guidelines for the selection of these components are:

INDUCTANCE:
Stepper motors are rated with a varying degree of inductance. A high inductance motor will provide a greater amount of torque at low speeds; a low inductance motor, conversely, performs better at higher speeds.

SERIES, PARALLEL CONNECTION:


There are two ways to connect a stepper motor: in series or in parallel. A series connection provides a high inductance and therefore greater performance at low speeds. A parallel connection will lower the inductance but increase the torque at faster speeds. The following is a typical speed/torque curve for an AMS driver and motor connected in series and parallel:


Fig: Torque versus speed for series and parallel connection
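A rough model suggests why the series connection raises inductance: the two bifilar coils on a pole are tightly coupled, so (assuming ideal coupling, k ≈ 1 — our simplification, not a figure from the text) the series connection sees about four times the inductance of a single coil, while the parallel connection sees roughly the same as a single coil. The standard coupled-coil formulas, with an illustrative 2 mH per coil:

```python
def series_inductance(L, k=1.0):
    """Two identical coupled coils in series (aiding): L_total = 2L(1 + k)."""
    return 2 * L * (1 + k)

def parallel_inductance(L, k=1.0):
    """Two identical coupled coils in parallel (aiding): L_total = L(1 + k)/2."""
    return L * (1 + k) / 2

L_coil = 2.0e-3  # 2 mH per coil, an illustrative value
print(series_inductance(L_coil))    # → 0.008 (8 mH, four times one coil)
print(parallel_inductance(L_coil))  # → 0.002 (2 mH, same as one coil)
```

The 4:1 inductance ratio between the two connections is what shifts the torque curve toward low speeds for series and toward high speeds for parallel.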

DRIVER VOLTAGE:
The higher the output voltage from the driver, the better the torque can be maintained as speed increases. Generally, the driver output voltage should be rated higher than the motor voltage rating.

MOTOR STIFFNESS:
By design, stepping motors tend to run stiff. Reducing the current flow to the motor by a small percentage will smooth the rotation. Conversely, increasing the motor current will increase the stiffness but will also provide more torque. Trade-offs between speed, torque and resolution are a main consideration in designing a step motor system.

MOTOR HEAT:
Step motors are designed to run hot (50-90 °C). However, too much current may cause excessive heating and damage to the motor



insulation and windings. AMS step motor products reduce the risk of overheating by providing a programmable Run/Hold current feature.

C#.NET:
.NET (dot-net) is the name Microsoft gives to its general vision of the future of computing, the view being of a world in which many applications run in a distributed manner across the Internet. We can identify a number of different motivations driving this vision. Firstly, distributed computing is rather like object oriented programming, in that it encourages specialized code to be collected in one place, rather than copied redundantly in lots of places. There are thus potential efficiency gains to be made in moving to the distributed model. Secondly, by collecting specialized code in one place and opening up a generally accessible interface to it, different types of machines (phones, handhelds, desktops, etc.) can all be supported with the same code. Hence Microsoft's run-anywhere aspiration.



Thirdly, by controlling real-time access to some of the distributed nodes (especially those concerning authentication), companies like Microsoft can more easily control the running of their applications. It moves applications further into the area of services provided rather than objects owned. Interestingly, in taking on the .NET vision, Microsoft seems to have given up some of its proprietary tendencies (whereby all the technology it touched was warped towards its Windows operating system). Because it sees its future as providing software services in distributed applications, the .NET framework has been written so that applications on other platforms will be able to access these services. At the development end of the .NET vision is the .NET framework. This contains the common language runtime (CLR), which manages the execution of code compiled for the .NET platform. The CLR has two interesting features. Firstly, its specification has been opened up so that it can be ported to non-Windows platforms. Secondly, any number of different languages can be used to manipulate the .NET framework classes, and the CLR will support them. This has led one commentator to claim that under .NET the language one uses is a lifestyle choice.



Not all of the supported languages fit entirely neatly into the .NET framework, however (in some cases the fit has been somewhat procrustean). But the one language that is guaranteed to fit in perfectly is C#. This new language, a successor to C++, has been released in conjunction with the .NET framework, and is likely to be the language of choice for many developers working on .NET applications.


CHAPTER 7

CONCLUSION
The aim of this research was to develop a Voice Biometric Security for Industrial control system, which can verify a person's claimed identity. Overall, the project was a success, with the basic requirements being satisfied. The finished product could enroll users, verify their voiceprints, and provide a graphical user interface for doing so. The current system verifies the voice templates in a one-to-one manner and offers a reliable and accurate way of verifying the user's voice. Simple algorithms for Mel Frequency Cepstrum Coefficients and Vector Quantization were used for the development of the system. The performance of the system was found to depend on the recording time and length of the sentence, the sampling frequency, the surrounding environment, the speaker's behavioral conditions and the threshold value. To allow a longer phrase, the recording time has to be increased. With a maximum of 2 seconds, a user was able to record only a short phrase like "VTU University". It was initially hard for the system to authenticate users with this kind of voice phrase, so tests were performed with increased recording times. As the



recording time increases (3-5 seconds), the system's verification process can authenticate more accurately, reducing the number of false rejections. The sampling frequency can also be changed from 16 kHz to 22 kHz; the best results were observed at a sampling frequency of 22 kHz. The performance of the system can be improved in several ways. The literature I have gone through specifies that normalization techniques can be used to alleviate noise effects; in future I plan to study normalization and implement it in my system. Another improvement I have come across is zero padding: if zeros are appended to the data file before the Fast Fourier Transform is applied, the transform yields more accurate results. I will try to implement zero padding in my system for performance improvement. The implemented secure Voice Biometric Security for Industrial control does not authenticate illegal users. However, a small degree of false rejections and a small number of false acceptances can be observed. This can be addressed by adjusting the threshold value.
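The zero padding idea mentioned above can be illustrated with a quick calculation: padding an N-sample frame to a longer length M before the FFT narrows the spacing between frequency bins from fs/N to fs/M, giving a more finely sampled spectrum. The 16 kHz rate matches the system's lower sampling frequency; the frame lengths below are illustrative, not values from the report.

```python
def bin_spacing_hz(sample_rate, fft_length):
    """Frequency spacing between adjacent FFT bins, in Hz."""
    return sample_rate / fft_length

fs = 16000      # Hz, the system's lower sampling rate
frame = 256     # samples in the analysis frame (illustrative)
padded = 1024   # same frame zero-padded to 1024 samples before the FFT

print(bin_spacing_hz(fs, frame))   # → 62.5  (Hz per bin, unpadded)
print(bin_spacing_hz(fs, padded))  # → 15.625 (Hz per bin, zero-padded)
```

Zero padding does not add information, but the denser bin spacing lets spectral peaks be located more precisely, which is the "more accurate results" effect described above.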


APPENDIX
Wave File Format
Wave files are part of a file interchange format called RIFF, created by Microsoft. The format is basically composed of a collection of data chunks. Each chunk has a 32-bit ID field, followed by a 32-bit chunk length, followed by the chunk data. This format is shown in the following table:

OFFSET  DESCRIPTION
0x00    Chunk id "RIFF"
0x04    Chunk size (32 bits)
0x08    Wave chunk id "WAVE"
0x0C    Format chunk id "fmt "
0x10    Format chunk size (32 bits)
0x14    Format tag
0x16    Number of channels (1 = mono, 2 = stereo)
0x18    Sample rate in Hz
0x1C    Average bytes per second
0x20    Number of bytes per sample (1 = 8-bit mono; 2 = 8-bit stereo or 16-bit mono; 4 = 16-bit stereo)
0x22    Number of bits in a sample
0x24    Data chunk id "data"
0x28    Length of data chunk (32 bits)
0x2C    Sample data

Table : Wave File Format
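The layout in the table can be read and written directly with Python's struct module. The sketch below builds a minimal 44-byte header for one second of 16-bit mono audio at 16 kHz and then parses a field back out at the offset the table gives; the function name is ours, and the field order follows the table above.

```python
import struct

def make_wav_header(sample_rate, channels, bits_per_sample, data_len):
    """Pack a minimal 44-byte PCM WAV header (little-endian throughout)."""
    block_align = channels * bits_per_sample // 8  # bytes per sample frame
    byte_rate = sample_rate * block_align          # average bytes per second
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_len, b"WAVE",
        b"fmt ", 16,             # format chunk id and size
        1,                       # format tag: 1 = PCM
        channels, sample_rate, byte_rate,
        block_align, bits_per_sample,
        b"data", data_len)

header = make_wav_header(16000, 1, 16, 16000 * 2)  # 1 s of 16-bit mono
rate, = struct.unpack_from("<I", header, 0x18)     # sample rate lives at 0x18
print(rate)         # → 16000
print(len(header))  # → 44
```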


BIBLIOGRAPHY
Text Books
[1] James F. Peters & Witold Pedrycz, Software Engineering: An Engineering Approach, John Wiley & Sons, Inc.
[2] Ali Bahrami, Object Oriented Systems Development, Tata McGraw-Hill Company.
[3] David J. Kruglinski, Scot Wingo, and George Shepherd, Programming with Visual C++, Microsoft Press.

Papers
[4] Douglas A. Reynolds, PhD, and Larry P. Heck, PhD, Automatic Speaker Recognition: Recent Progress, Current Applications, and Future Trends, AAAS 2000 Meeting, Humans, Computers and Speech Symposium, 19 February 2000.
[5] Minh N. Do, An Automatic Speaker Recognition System, Audio Visual Communications Laboratory, Swiss Federal Institute of Technology, Lausanne, Switzerland.
[6] J. W. Cooley and J. W. Tukey, An Algorithm for the Machine Calculation of Complex Fourier Series, Mathematics of Computation, Vol. 19, 1965, pp. 297-301.

[7] Dr. Simon Lucey, Speaker Verification Tutorial, Advanced Multimedia Processing (AMP) Labs, Carnegie Mellon University, 9/5/2002-9/6/2002.



[8] Joseph P. Campbell, Jr., Senior Member IEEE, Speaker Recognition: A Tutorial, Proc. IEEE, Vol. 85, pp. 1437-1462.

Websites

[9] www.rpmfreelancer.no-ip.com:8080/duncan21/biometrics/index.html (Biometrics: Identification and Verification).
[10] www.svr-www.eng.cam.ac.uk/comp.speech (comp.speech Frequently Asked Questions WWW site).
