
SOFTWARE REQUIREMENT SPECIFICATION


6. Introduction
6.1 Purpose
The purpose of this document is to present a detailed description of the Speaker Recognition System. This report discusses each stage of the project: the Requirements Specification phase, the System Design phase, the Implementation phase, and the Testing phase. It also presents recommendations for the project.

6.2 Intended Audience
This document is intended for developers, project managers, testers, documentation writers, and users, where users include the faculty members, the institute staff, students, and the alumni of the institute. All audiences except end users are advised to read through the complete document for a better understanding of this software product.

6.3 Scope of the Project
The Speaker Recognition System is a standalone application. It can be used to restrict access to confidential information, and it can be integrated into other systems to provide security.

6.4 References
IEEE. IEEE Std 830-1998, IEEE Recommended Practice for Software Requirements Specifications. IEEE Computer Society, 1998.


7. Requirement Model
The requirement model is used in systems and software engineering to refer to the process of building up a specification of the system. In this phase we identify actors and produce use cases, interface descriptions, and problem domain objects. All relevant diagrams and notations are drawn using UML (Unified Modeling Language).

7.1 User Requirements

7.1.1 Functional Requirements
a. The user should be able to enter records.
b. Each record represents information about a person and contains his/her voice sample.
c. Records may consist of:
   i. First name
   ii. Last name
   iii. Phone
   iv. Address (city, street address)
   v. Voiceprint
   vi. ID number

d. The system can apply a noise filter to the voice signal, which may contain noise from the environment or from the sensitivity of the microphone.
e. The system must be able to take a voiceprint and a user-id (in the case of speaker verification) as input, search for a match in the database, and then show the result.
f. The result should be presented by showing the user-ids matching the input.
g. The user should be able to see his/her full information upon successful identification/verification.

7.1.2 Non-Functional Requirements
a. Records are maintained in a database.
b. Every record shall be allocated a unique identifier (id-number).


c. The user should be able to retrieve data by entering an id and voiceprint on successful identification/verification.
d. To improve performance, the database should store a compressed codebook for each user instead of the voiceprint. The voiceprint of a user is discarded after the codebook is calculated.

7.2 System Requirements

7.2.1 Actors with their description
User: Provides the system with a voice sample and expects the system to show a match and details about the user.
Administrator: Manages the entire speaker recognition system.


Figure 1: Use Case Diagram (User: Enroll, Request Match, Edit Information, Remove; Administrator: Add Users, Remove Users, View Statistics)


7.2.2 Use Cases with their description

Use Case: Add Records
Description: The administrator adds new users to the system. The user must provide his/her details and a voice sample to the system during enrollment.

Use Case: Request Match
Description: The user requests a voice sample to be matched against a voiceprint in the database and retrieves details about it (on successful verification).

Use Case: Update Records
Description: Allows the user to add or update (remove) records in the system, such as name, ID, phone number, etc. (on successful verification).

Use Case: Remove User/Records
Description: The system allows the administrator to remove a user.

Use Case: View Statistics
Description: The administrator can view the performance statistics of the system.

7.2.2.1 Administrator Use Case

Figure 12: Administrator Use Case Diagram (Administrator: Add Users, Remove Users, View Statistics)


Use Case Name: Add Users
Brief Description: The administrator enrolls users into the system.
1. Preconditions:
   a. The system must be fully functional and connected to the database.
   b. The administrator should be logged into the system.
2. Main flow of events:
   a. The administrator inputs the user's details into the system.
   b. The administrator inputs the user's voice sample.
   c. A notification appears that the user is enrolled.
3. Post conditions:
   a. The user is enrolled.
   b. A user-id with relevant details is displayed.
4. Special Requirements: none

Use Case Name: Remove Users
Brief Description: The administrator removes users from the system.
1. Preconditions:
   a. The system must be fully functional and connected to the database.
   b. The administrator should be logged into the system.
2. Main flow of events:
   a. The administrator inputs a user-id into the system.
   b. A notification appears for confirmation.
   c. The user is removed.
3. Post conditions: The system no longer contains any information about the user.
4. Alternate flow:
   a. The user-id given by the administrator is not found in the system. The administrator enters the user-id again.
   b. On confirmation for removal of the user, the administrator selects No. The user is not removed from the system.
5. Special Requirements: none

Use Case Name: View Statistics
Brief Description: The administrator views the performance statistics of the system.
1. Preconditions:
   a. The system must be fully functional and connected to the database.
   b. The administrator should be logged into the system.
2. Main flow of events:
   a. The administrator selects to see the performance statistics.
   b. The statistics are shown.
3. Post conditions: None
4. Alternate flow: None
5. Special Requirements: none


7.2.2.2 User Use Case

Figure 13: User Use Case Diagram (User: Enroll, Request Match, Edit Information, Remove)

Use Case Name: Enroll
Brief Description: The user enrolls into the system.
1. Preconditions: The system must be fully functional and connected to the database.
2. Main flow of events:
   a. The user inputs his/her details into the system.
   b. The user inputs his/her voice sample.
   c. A notification appears that the user is enrolled.
3. Post conditions:
   a. The user is enrolled.
   b. A user-id with relevant details is displayed.
4. Special Requirements: none

Use Case Name: Remove
Brief Description: The user removes himself/herself from the system.
1. Preconditions: The system must be fully functional and connected to the database.
2. Main flow of events:
   a. The user inputs his/her user-id into the system.
   b. A notification appears for confirmation.
   c. The user is removed.
3. Post conditions: The system no longer contains any information about the user.
4. Alternate flow:
   a. The user-id given by the user is not found in the system. The user enters the user-id again.
   b. On confirmation for removal, the user selects No. The user is not removed from the system.
5. Special Requirements: none

Use Case Name: Request Match
Brief Description: The user enters his/her voice sample and runs the test phase.
1. Preconditions: The system must be fully functional and connected to the database.
2. Main flow of events:
   a. The user selects to test.
   b. The system asks the user to enter his/her user-id and voice sample.
   c. Matching is done and the result is shown to the user.
3. Post conditions: The user is allowed to log into the system.
4. Alternate flow: None
5. Special Requirements: none

Use Case Name: Edit Information
Brief Description: The user edits his/her information stored in the system. The user is not allowed to edit his/her already stored voice sample.
1. Preconditions: The system must be fully functional and connected to the database. The user must be logged into the system.
2. Main flow of events:
   a. The user selects to edit.
   b. The system displays the full details of the user.
   c. The user edits his/her information and selects Save.
3. Post conditions: The system contains the updated user information.
4. Alternate flow: None
5. Special Requirements: none

7.3 Safety Requirements
There are no safety requirements of concern, such as possible loss, damage, or harm that could result from the use of the Speaker Recognition System.

7.4 User Interfaces
In the main menu, the user is presented with several buttons: Enroll and Voiceprint Test. After the user logs in, he/she is also presented with Edit Information and Remove buttons. The administrator is presented with Add Users, Remove Users, and View Statistics buttons.

7.4.1 Enroll
Clicking on the New User button opens a dialog box with the title New User. The dialog box will have the following fields:

i. First name
ii. Last name
iii. Phone
iv. Address (city, street address)

It will also contain two buttons, Enroll and Cancel. Cancel returns the user to the main menu with no user created. Enroll prompts the user to speak so as to record his/her voice, with a countdown starting from 2 seconds. Note: the New User dialog box remains in the foreground when recording begins. After the countdown is complete, the system begins recording from the microphone for 10 seconds. If the recording is successful, the user is returned to the main menu. If an error occurs during recording (for example, silence), a descriptive message is displayed (for example, "no sound recorded") and the dialog box remains.

7.4.2 Voiceprint Match
Clicking on Test allows users to test their voiceprint against the implemented verification algorithm. A dialog box pops up with the title Voiceprint Test. Two buttons give the option to return to the main menu (OK button) or perform the test. Recording is carried out in the same way as specified for enrollment, but the responses at the end differ. Note: the Voiceprint Test dialog box remains in the foreground when recording begins. At the end of the recording, the program responds with a success, fail, or error.

7.4.3 Remove
The Remove option deletes a user's profile (user-id and voiceprint). Upon clicking Remove, a dialog box with the title Remove User pops up for confirmation. It also contains the buttons Cancel and Delete. Cancel brings the user back to the main menu. If Delete is clicked, the user is prompted with "Are you sure?" The user then has to hit y for yes or n for no. Hitting y and then Enter deletes the profile and returns the user to the main menu. Hitting n returns the user to the dialog box without deleting the profile.

7.4.4 Statistics
The display of statistics is an element that will be given flexibility. At this time, the only requirement is that performance statistics be available.


7.5 Hardware Interfaces
The Speaker Recognition System requires access to the system's microphone to capture the user's voice.

7.6 Software Interfaces
The Speaker Recognition System is built for the Windows operating system. It requires Microsoft Windows XP Service Pack 3 or above to run. Since the software is built on Matlab, it requires the Matlab runtime to function properly.


Problem Domain Objects (PDO)

Figure 14: Problem Domain Objects (User, Administrator, VoicePrint)


8. Analysis Model
The analysis model describes the structure of the system or application being modeled. It aims to structure the system independently of the actual implementation environment; the focus is on the logical structure of the system. Use cases can be classified into the following three types of analysis objects:

Figure 15: Types of Analysis Objects (Interface Objects, Entity Objects, Control Objects)

Entity objects: Model information to be held for a longer time, along with all behavior naturally coupled to that information. Example: a person with the associated data and behavior.
Interface objects: Model behavior and information that depend on the interface to the system. Example: user interface functionality for requesting information about a person.
Control objects: Model functionality that is not naturally tied to any other object. Their behavior consists of operating on several different entity objects, doing some computation, and then returning the result to an interface object. Example: calculating taxes using several different factors.

The analysis model identifies the main classes in the system and contains a set of use case realizations that describe how the system will be built. Sequence diagrams realize the use cases by describing the flow of events when the use cases are executed. These use case realizations model how the parts of the system interact within the context of a specific use case. The analysis model can be used as the foundation of the design model, since it describes the logical structure of the system but not how it will be implemented.


Interface Objects

Figure 16: Interface Objects (Microphone Interface)


Entity Objects

Figure 17: Entity Objects (User Information, Voice Sample)

Control Objects

Figure 18: Control Objects


Figure 19: Analysis Model (interface objects: Start Panel, User Panel, Admin Interface; control objects: Receive Information, Request Match, Generate Result, Add/Remove/Edit User, View Statistics; entity objects: User Information, Voice Sample)

9. SEQUENCE DIAGRAMS
A sequence diagram in the Unified Modeling Language (UML) is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart. A sequence diagram shows object interactions arranged in time sequence: it depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. Sequence diagrams are typically associated with use case realizations in the Logical View of the system under development. They are sometimes called event diagrams, event scenarios, or timing diagrams.

Sequence Diagram: User Enrollment

Figure 20: Sequence diagram for user enrollment (participants: Enrollment, Profile, Feature Extract, Codebook Calculation, Users; messages: request for voice sample, voice sample/training speech, acoustic vectors, codebook, add to user list, return user id)


Sequence Diagram: Voice Match

Figure 21: Sequence diagram for Voice Match (participants: User, Match Voice, Feature Extractor, Feature Comparator, Codebook; messages: request to initiate match, request for voice and user id input, voiceprint and user id, acoustic vectors, request for user's codebook, result)


Sequence Diagram: Edit Information

Figure 22: Sequence diagram for editing information (participants: User, Authenticator, Edit, User Database; messages: voice sample and user id, login success, request to retrieve information, user information, updated information, success)


10. Activity Diagrams
An activity diagram is like a state diagram, except that it has a few additional symbols and is used in a different context. In a state diagram, most transitions are caused by external events; however, in an activity diagram, most transitions are caused by internal events, such as the completion of an activity. An activity diagram is used to understand the flow of work that an object or component performs. It can also be used to visualize the interaction between different use cases.

1. Enroll new user

Figure 23: Activity Diagram for enrolling new user

2. Request Matching

Figure 24: Activity Diagram for voice matching


3. Remove User

Figure 25: Activity Diagram for removing user

4. Update user information

Figure 26: Activity Diagram for updating user information


11. Design Model


11.1 High Level Design
There are two main modules in this speaker recognition system: the User Enrollment Module and the User Verification Module.

Figure 27: High Level Block Diagram of Speaker Verification System

11.1.1 User Enrollment Module
The User Enrollment Module is used when a new user is added to the system. This module essentially teaches the system the new user's voice. The input to this module is a voiceprint of the user along with other details. By analyzing this training speech, the module outputs a model that parameterizes the user's voice. This model is used later in the User Verification Module.

11.1.1.1 Signal Preprocessing Subsystem
The signal preprocessing subsystem conditions the raw speech signal and prepares it for subsequent manipulation and analysis. This subsystem performs analog-to-digital conversion and any signal conditioning necessary.

11.1.1.2 Feature Extraction Subsystem
The feature extraction subsystem analyzes the user's digitized voice signal and creates a series of values to use as a model of the user's speech pattern.

11.1.1.3 Feature Data Compression Subsystem The disk size required for the model created in the Feature Extraction subsystem will be significant when many users are enrolled in the system. In order to store this data effectively, a form of data compression is used. After the model is compressed, it will be stored for later use in the User Verification Module.

11.1.2 Threshold Generation Module
This module sets the sensitivity level of the system for each user enrolled in the system. This sensitivity value is called the threshold and needs to be generated whenever a new user is enrolled. The module can also be invoked when users feel they are receiving too many false rejections and want to re-calculate an appropriate sensitivity level. After a user enrolls with the system, running this module essentially invokes a user verification session; however, instead of producing a pass or fail verdict, the system takes the similarity factor found in the Feature Comparison Subsystem and uses it to determine the threshold value. This similarity factor is scaled up and then saved as the threshold value. Scaling the value up should account for variances in future verification sessions. This module is required for speaker verification functionality. As of now, implementation of this module is suspended due to time constraints.

11.1.2.1 Threshold Generation Subsystem
This subsystem sets the user threshold to a scaled-up version of the similarity factor determined in the Feature Comparison Subsystem.
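A minimal sketch of this threshold rule (in Python, although the project itself is written in Matlab); the scaling factor of 1.2 is an assumed value, since the document does not specify one:

```python
def generate_threshold(similarity_factor: float, scale: float = 1.2) -> float:
    """Scale up the enrollment-time similarity (distortion) factor to
    tolerate natural variation in future verification sessions."""
    return similarity_factor * scale

# Example: an enrollment-time verification session measured an
# average distortion of 0.85; the stored threshold is slightly larger.
threshold = generate_threshold(0.85)
print(threshold)
```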

11.1.3 Verification Module The User Verification Module is used when the system tries to verify a user. The user informs the system that he or she is a certain user. The system will then prompt the user to say something. This utterance is referred to as the testing speech. The module performs the same signal pre-processing and feature extraction as the User Enrollment Module. The extracted speech parameterization data is then compared to the stored model. Based on the similarity, a verdict will be given to indicate whether the user has passed or failed the voice verification test.

11.1.3.1 Feature Comparison Subsystem
After the Feature Extraction Subsystem parameterizes the testing speech, this data is compared to the model of the user stored on disk. After all the data has been compared, a similarity factor is produced.

11.1.3.2 Decision Subsystem
Based on the similarity factor produced by the Feature Comparison Subsystem and the user's threshold value, this subsystem gives a verdict indicating whether the user has passed or failed the voice verification test.
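The decision rule reduces to a single comparison. This Python sketch assumes the convention, consistent with the distortion-based comparison described in the low-level design, that a lower average distortion means a closer match:

```python
def verify(average_distortion: float, threshold: float) -> str:
    """Accept the claimed identity when the test voice's average
    distortion falls below the user's stored threshold."""
    return "pass" if average_distortion < threshold else "fail"

print(verify(0.9, 1.02))   # distortion below threshold -> pass
print(verify(1.5, 1.02))   # distortion above threshold -> fail
```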

11.2 Low Level Design
The following section describes the information used for the implementation of each subsystem.

11.2.1 Signal Preprocessing Subsystem
Input: Raw speech signal
Output: Digitized and conditioned speech signal (one vector containing all sampled values)

Figure 28: Signal Preprocessing Subsystem Low-Level Block Diagram

The sampling will produce a digital signal in the form of a vector or array. The silence at the beginning and end of the speech sample will be removed.
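The silence-removal step can be sketched as follows, assuming a simple frame-energy threshold (the frame length and threshold fraction are illustrative assumptions, not values from this document):

```python
import numpy as np

def trim_silence(signal: np.ndarray, frame_len: int = 160,
                 rel_threshold: float = 0.01) -> np.ndarray:
    """Drop low-energy frames from the beginning and end of a
    sampled speech vector."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).sum(axis=1)
    active = np.where(energy > rel_threshold * energy.max())[0]
    if len(active) == 0:
        return signal[:0]                  # nothing but silence
    start, end = active[0], active[-1] + 1
    return frames[start:end].reshape(-1)

# Example: silence, a burst of signal, then silence again.
x = np.concatenate([np.zeros(800), np.ones(800), np.zeros(800)])
print(len(trim_silence(x)))  # 800
```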

11.2.2 Feature Extraction Subsystem
Input: Digital speech signal (one vector containing all sampled values)
Output: A set of acoustic vectors

Figure 29: Feature Extraction Subsystem Low-Level Block Diagram

Mel-cepstral coefficients will be used to parameterize the speech sample.

The original vector of sampled values will be framed into overlapping blocks. Each block will be windowed to minimize spectral distortion and discontinuities. A Hamming window will be used. The Fast Fourier Transform will then be applied to each windowed block as the beginning of the Mel-Cepstral Transform. After this stage, the spectral coefficients of each block are generated. The Mel Frequency Transform will then be applied to each spectral block to convert the scale to a mel-scale. The mel-scale is a logarithmic scale similar to the way the human ear perceives sound. Finally, the Discrete Cosine Transform will be applied to each Mel Spectrum to convert the values back to real values in the time domain.
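The pipeline described above (framing, Hamming window, FFT, mel filterbank, DCT) can be sketched in Python/NumPy. This is an illustrative stand-in for VOICEBOX's melcepst, not the project's actual code; the frame length, hop, filter count, and coefficient count below are typical values, not taken from this document:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_sketch(signal, fs=16000, frame_len=400, hop=160,
                n_filters=26, n_ceps=12, n_fft=512):
    # 1. Frame the signal into overlapping blocks.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len) + hop * np.arange(n_frames)[:, None]
    frames = signal[idx]
    # 2. Hamming window each block to minimize spectral distortion.
    frames = frames * np.hamming(frame_len)
    # 3. FFT -> power spectrum of each block.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # 4. Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_mel = np.log(np.maximum(power @ fbank.T, 1e-10))
    # 5. DCT of the log mel energies -> cepstral coefficients.
    k = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1),
                                  (2 * k + 1) / (2.0 * n_filters)))
    return log_mel @ dct.T   # one row of n_ceps coefficients per frame

rng = np.random.default_rng(0)
coeffs = mfcc_sketch(rng.standard_normal(16000))  # 1 s of noise as a stand-in
print(coeffs.shape)
```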

11.2.3 Feature Compression Subsystem
Input: A set of acoustic vectors
Output: Codebook

Figure 30: Feature Data Compression Subsystem Low-Level Block Diagram

The K-Means Vector Quantization Algorithm will be used.

11.2.4 Feature Data Comparison Subsystem
Inputs: Set of acoustic vectors from the testing speech; codebook
Output: Average distortion factor

Figure 31: Comparison Subsystem Low-Level Block Diagram

The acoustic vectors generated by the testing voice signal will be individually compared to the codebook. The codeword closest to each test vector is found based on Euclidean distance. This minimum Euclidean distance, or distortion factor, is stored until the distortion factor for every test vector has been calculated. The average distortion factor is then found and normalized.
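The codebook training (11.2.3) and distortion measurement (11.2.4) steps can be sketched together. This is an illustrative Python version assuming plain k-means with Euclidean distance; the codeword count, data shapes, and function names are made up for the example:

```python
import numpy as np

def train_codebook(vectors, n_codewords=8, iters=20, seed=0):
    """Plain k-means over acoustic vectors; the centroids form the codebook."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), n_codewords, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest codeword, then recompute centroids.
        d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        nearest = d.argmin(axis=1)
        for k in range(n_codewords):
            members = vectors[nearest == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codebook

def average_distortion(test_vectors, codebook):
    """Mean Euclidean distance from each test vector to its closest codeword."""
    d = np.linalg.norm(test_vectors[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

rng = np.random.default_rng(1)
train = rng.standard_normal((200, 12))       # stand-in acoustic vectors
cb = train_codebook(train, n_codewords=4)
same = average_distortion(train, cb)
other = average_distortion(rng.standard_normal((50, 12)) + 3.0, cb)
print(same < other)  # matching speech yields lower distortion
```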

Figure 32: Distortion Calculation Algorithm Flow Chart

11.2.5 Decision Subsystem
Inputs: Average distortion factor; user-specific threshold
Output: Verdict


Figure 33: Decision Subsystem Low-Level Block Diagram


12. Alternative Options


There is more than one way to perform speaker recognition. The methods chosen for this project were selected mostly for their ease of implementation and low complexity. The list of alternatives below is in no way complete.

12.1 Feature Extraction Alternatives
Linear Prediction Cepstrum: Identifies the vocal tract parameters. Used for text-independent recognition.
Discrete Wavelet Transform
Delta-Cepstrum: Analyzes changing tones.

12.2 Feature Matching Alternatives
Dynamic Time Warping: Accounts for inconsistencies in the rate of speech by stretching or compressing parts of the signal in the time domain.
AI-based: Hidden Markov Models, Gaussian Mixture Models, and Neural Networks.
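The dynamic time warping alternative mentioned above can be sketched in a few lines. This is an illustrative Python version of the classic dynamic-programming formulation; the project itself uses vector quantization, not DTW:

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic DTW between two feature sequences (rows are frames);
    returns the accumulated cost of the best alignment."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # stretch a
                                 cost[i, j - 1],      # stretch b
                                 cost[i - 1, j - 1])  # match
    return float(cost[n, m])

# A sequence aligns with a time-stretched copy of itself at zero cost.
seq = np.array([[0.0], [1.0], [2.0], [3.0]])
stretched = np.array([[0.0], [1.0], [1.0], [2.0], [3.0], [3.0]])
print(dtw_distance(seq, stretched))  # 0.0
```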

13. Implementation
13.1 Platform
Matlab was chosen as the platform for ease of implementation. A third-party GNU Matlab toolbox, VOICEBOX, was used. This toolbox provides functions that calculate mel-frequency coefficients and perform vector quantization.


Conclusion
In this project, we have developed a text-independent speaker identification system, that is, a system that identifies a person who speaks regardless of what he/she is saying. Our speaker verification system consists of two sections: (i) an enrollment section to build a database of known speakers, and (ii) an unknown-speaker identification section. The enrollment session is also referred to as the training phase, while unknown-speaker identification is also referred to as the operation session or testing phase. In the training phase, each registered speaker has to provide samples of their speech so that the system can build or train a reference model for that speaker. This consists of two main parts: the first processes each person's input voice sample to condense and summarize the characteristics of their vocal tract; the second pulls each person's data together into a single, easily manipulated matrix. In the testing phase, the calculated matrix is used for recognition.

Future Work
Currently, this application lacks an easy-to-use user interface. The application can be extended to provide a user interface, and it can be fine-tuned to meet real-time constraints. Other techniques may be used to implement this application to minimize the false-acceptance and false-rejection rates.


Snapshots

Figure 34: Matlab Command Window


Figure 35: Matlab Editor


REFERENCES
1. Manfred U. A. Bromba. "Biometrics: Frequently Asked Questions." http://www.bromba.com/faq/biofaqe.htm
2. Wikipedia, the free encyclopedia. "Biometrics." http://en.wikipedia.org/wiki/Biometrics
3. Douglas A. Reynolds. "Automatic Speaker Recognition Using Gaussian Mixture Speaker Models." http://www.ll.mit.edu/publications/journal/pdf/vol08_no2/8.2.4.speakerrecognition.pdf
4. Patricia Melin, Jerica Urias, Daniel Solano, Miguel Soto, Miguel Lopez, and Oscar Castillo. "Voice Recognition with Neural Networks, Type-2 Fuzzy Logic and Genetic Algorithms." Engineering Letters, 13:2, EL_13_2_9. http://www.engineeringletters.com/issues_v13/issue_2/EL_13_2_9.pdf

Code

speakerTest.m
function speakerTest(a)
% A speaker recognition program. a is a string of the filename to be tested
% against the database of sampled voices; the program evaluates whose voice it is.
%
% Mike Brookes, VOICEBOX, free toolbox for Matlab,
% www.ncl.ac.uk/CPACTsoftware/MatlabLinks.html
% disteusq.m, enframe.m, kmeans.m, melbankm.m, melcepst.m, rdct.m and rfft.m
% from VOICEBOX are used in this program.
voiceboxpath = 'C:/Users/test/voicebox';
addpath(voiceboxpath);

test.data = wavread(a);        % read the test file
name = ['vimal';'anand'];      % names of people in the database
fs = 16000;                    % sampling frequency
C = 8;                         % number of centroids

% Load data
disp('Reading data for training:')
[train.data] = Load_data(name);

% Calculate mel-frequency cepstral coefficients for training set
fprintf('\nCalculating mel-frequency cepstral coefficients for training set:\n')
[train.cc] = mfcc(train.data,name,fs);

% Perform K-means algorithm for clustering (Vector Quantization)
fprintf('\nApplying Vector Quantization (K-means) for feature extraction:\n')
[train.kmeans] = kmean(train.cc,C);

% Calculate mel-frequency cepstral coefficients for test set
test.cc = melcepst(test.data,fs,'x');

% Compute average distances between test.cc and all the codebooks in the
% database, and find the lowest distortion
[result index] = distmeasure(train.kmeans,test.cc);

% Display results - average distances between the features of the unknown
% voice (test.cc) and all the codebooks in the database; identify the
% person with the lowest distance
fprintf('\nDisplaying the result:\n')
dispresult(name,result,index)

Load_data.m
function [data] = Load_data(name)
% Training mode - load all the wave files into the database (codebooks)
data = cell(size(name,1),1);
for i = 1:size(name,1)
    temp = [name(i,:) '.wav'];
    tempwav = wavread(temp);
    data{i} = tempwav;
end

distmeasure.m
function [result,index] = distmeasure(x,y)
result = cell(size(x,1),1);
dist = cell(size(x,1),1);
mins = inf;
k = size(x,2);
for i = 1:size(x,1)
    dist{i} = disteusq(x{i}(:,1:k),y(:,1:k),'x');
    temp = sum(min(dist{i}))/size(dist{i},2);
    result{i} = temp;
    if temp < mins
        mins = temp;
        index = i;
    end
end

dispresult.m
function dispresult(x,y,z)
disp('The average of Euclidean distances between database and test wave file')
color = ['r'; 'g'; 'c'; 'b'; 'm'; 'k'];   % plot colors (unused)
for i = 1:size(x,1)
    disp(x(i,:))
    disp(y{i})
end
disp('The test voice is most likely from')
disp(x(z,:))

mfcc.m
function [cepstral] = mfcc(x,y,fs)
% Calculate MFCCs at sampling frequency fs and store them in the cepstral
% cell. Display y(i) as each x{i} is processed.
cepstral = cell(size(x,1),1);
for i = 1:size(x,1)
    disp(y(i,:))
    cepstral{i} = melcepst(x{i},fs,'x');
end

kmean.m
function [data] = kmean(x,C)
% Calculate k-means for x with C centroids
train.kmeans.x = cell(size(x,1),1);
train.kmeans.esql = cell(size(x,1),1);
train.kmeans.j = cell(size(x,1),1);
for i = 1:size(x,1)
    [train.kmeans.j{i} train.kmeans.x{i}] = kmeans(x{i}(:,1:12),C);
end
data = train.kmeans.x;

melcepst.m

function c=melcepst(s,fs,w,nc,p,n,inc,fl,fh)
%MELCEPST Calculate the mel cepstrum of a signal C=(S,FS,W,NC,P,N,INC,FL,FH)
%
% Simple use: c=melcepst(s,fs)        % mel cepstrum with 12 coefs, 256 sample frames
%             c=melcepst(s,fs,'e0dD') % include log energy, 0th cepstral coef,
%                                     % delta and delta-delta coefs
%
% Inputs:
%   s    speech signal
%   fs   sample rate in Hz (default 11025)
%   nc   number of cepstral coefficients excluding 0'th coefficient (default 12)
%   n    length of frame in samples (default power of 2 < (0.03*fs))
%   p    number of filters in filterbank (default: floor(3*log(fs)) = approx 2.1 per octave)
%   inc  frame increment (default n/2)
%   fl   low end of the lowest filter as a fraction of fs (default = 0)
%   fh   high end of highest filter as a fraction of fs (default = 0.5)
%
%   w    any sensible combination of the following:
%        'R' rectangular window in time domain
%        'N' Hanning window in time domain
%        'M' Hamming window in time domain (default)
%
%        't' triangular shaped filters in mel domain (default)
%        'n' hanning shaped filters in mel domain
%        'm' hamming shaped filters in mel domain
%
%        'p' filters act in the power domain
%        'a' filters act in the absolute magnitude domain (default)
%
%        '0' include 0'th order cepstral coefficient
%        'E' include log energy
%        'd' include delta coefficients (dc/dt)
%        'D' include delta-delta coefficients (d^2c/dt^2)
%
%        'z' highest and lowest filters taper down to zero (default)
%        'y' lowest filter remains at 1 down to 0 frequency and
%            highest filter remains at 1 up to nyquist frequency
%
%        If 'ty' or 'ny' is specified, the total power in the fft is preserved.
%
% Outputs: c  mel cepstrum output: one frame per row. Log energy, if requested,
%             is the first element of each row, followed by the delta and then
%             the delta-delta coefficients.

if nargin<2 fs=11025; end
if nargin<3 w='M'; end
if nargin<4 nc=12; end
if nargin<5 p=floor(3*log(fs)); end
if nargin<6 n=pow2(floor(log2(0.03*fs))); end
if nargin<9
   fh=0.5;
   if nargin<8
      fl=0;
      if nargin<7
         inc=floor(n/2);
      end
   end
end
if isempty(w)
   w='M';
end

% Frame the signal with the requested time-domain window
if any(w=='R')
   z=enframe(s,n,inc);
elseif any(w=='N')
   z=enframe(s,hanning(n),inc);
else
   z=enframe(s,hamming(n),inc);
end
f=rfft(z.');
[m,a,b]=melbankm(p,n,fs,fl,fh,w);
pw=f(a:b,:).*conj(f(a:b,:));
pth=max(pw(:))*1E-20;
if any(w=='p')
   y=log(max(m*pw,pth));
else
   ath=sqrt(pth);
   y=log(max(m*abs(f(a:b,:)),ath));
end
c=rdct(y).';
nf=size(c,1);
nc=nc+1;
if p>nc
   c(:,nc+1:end)=[];
elseif p<nc
   c=[c zeros(nf,nc-p)];
end
if ~any(w=='0')
   c(:,1)=[];
   nc=nc-1;
end
if any(w=='E')
   c=[log(sum(pw)).' c];
   nc=nc+1;
end

% calculate derivative


if any(w=='D')
   vf=(4:-1:-4)/60;
   af=(1:-1:-1)/2;
   ww=ones(5,1);
   cx=[c(ww,:); c; c(nf*ww,:)];
   vx=reshape(filter(vf,1,cx(:)),nf+10,nc);
   vx(1:8,:)=[];
   ax=reshape(filter(af,1,vx(:)),nf+2,nc);
   ax(1:2,:)=[];
   vx([1 nf+2],:)=[];
   if any(w=='d')
      c=[c vx ax];
   else
      c=[c ax];
   end
elseif any(w=='d')
   vf=(4:-1:-4)/60;
   ww=ones(4,1);
   cx=[c(ww,:); c; c(nf*ww,:)];
   vx=reshape(filter(vf,1,cx(:)),nf+8,nc);
   vx(1:8,:)=[];
   c=[c vx];
end

% Plot the cepstrogram if no output argument is requested
if nargout<1
   [nf,nc]=size(c);
   t=((0:nf-1)*inc+(n-1)/2)/fs;
   ci=(1:nc)-any(w=='0')-any(w=='E');
   imh = imagesc(t,ci,c.');
   axis('xy');
   xlabel('Time (s)');
   ylabel('Mel-cepstrum coefficient');
   map = (0:63)'/63;
   colormap([map map map]);
   colorbar;
end

