
- Historical introduction
- Mathematical background (e.g., pattern classification, acoustics)
- Feature extraction for speech recognition (and some neural processing)
- What sound units are typically defined
- Audio signal processing topics (pitch extraction, perceptual audio coding, source separation, music analysis)
- Now back to pattern recognition, but including time

- ASR = static pattern classification + sequence recognition
- Deterministic sequence recognition: template matching
- Templates are typically word-based; they don't need phonetic sound units per se
- Still need to combine local distances into something global (per word or utterance)

The basic approach is the same for deterministic and statistical ASR:

- 25 ms windows (e.g., Hamming), advanced in 10 ms steps (each step is a frame)
- Some kind of cepstral analysis (e.g., MFCC or PLP)
- The cepstral vector at time n is called $x_n$
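
As a minimal sketch of the framing step only (the cepstral analysis itself, MFCC or PLP, is not shown), the Python below cuts a sampled signal into 25 ms Hamming-windowed frames advanced in 10 ms steps. The names `hamming` and `frame_signal` are illustrative, and dropping a trailing partial frame is an assumption; a real front end might pad instead.

```python
import math

def hamming(n_len):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_len - 1)) for n in range(n_len)]

def frame_signal(samples, sample_rate, win_ms=25.0, step_ms=10.0):
    """Split a signal into overlapping Hamming-windowed frames."""
    win_len = int(round(sample_rate * win_ms / 1000.0))
    step = int(round(sample_rate * step_ms / 1000.0))
    window = hamming(win_len)
    frames = []
    # Only full-length frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(samples) - win_len + 1, step):
        chunk = samples[start:start + win_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames

# One second of (silent) audio at 16 kHz: 400-sample windows, 160-sample steps.
frames = frame_signal([0.0] * 16000, 16000)
print(len(frames), len(frames[0]))  # 98 400
```

Each windowed frame would then go through the cepstral analysis to give one vector $x_n$ per 10 ms.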

- Words and phones are the most common units
- For template-based ASR, mostly words
- For template-based ASR, local distances are computed between example frames (reference frames) and input frames

- Easy if local matches are all correct (never happens!)
- Local matches are unreliable
- Need a measure of goodness of fit
- Need to integrate local matches into a global measure
- Need to consider all possible sequences

- Matrix for comparison between frames
- Word template = multiple feature vectors
- Reference template = $X_k^{ref}$
- Input template = $X^{in}$
- Need to find $D(X_k^{ref}, X^{in})$

- Time normalization
- Which references to use
- Defining distances/costs
- Endpoints for input templates

Time Normalization

- Linear Time Normalization
- Nonlinear Time Normalization: Dynamic Time Warp (DTW)
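
Linear time normalization can be sketched as follows, assuming templates are lists of feature vectors (lists of floats) and a Euclidean local distance; `linear_time_distance` is a hypothetical name. Each reference frame is paired with the input frame at the same relative position, so every part of the word is stretched or compressed by the same factor.

```python
import math

def linear_time_distance(ref, inp):
    """Linear time normalization: map each reference frame j to the input frame
    at the same relative position, then average the local distances."""
    m, n = len(ref), len(inp)
    total = 0.0
    for j in range(m):
        # Relative position j/(m-1) in the reference -> nearest input frame index.
        i = round(j * (n - 1) / (m - 1)) if m > 1 else 0
        total += math.dist(ref[j], inp[i])
    return total / m  # average distance per reference frame

ref = [[0.0], [1.0], [1.0], [2.0]]
inp = [[0.0], [0.0], [0.0], [0.0], [1.0], [2.0]]  # only the first sound is held longer
print(linear_time_distance(ref, inp))  # 0.5
```

Here the two utterances contain the same sounds in the same order, but because only the first sound is lengthened, the uniform mapping pairs two reference frames with the wrong input frames.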

- Speech sounds stretch and compress differently
- Stop consonants versus vowels
- Need to normalize them differently

- Permit many more variations
- Ideally, compare all possible time warpings
- Vintsyuk (1968): use dynamic programming

Dynamic programming

- Bellman optimality principle (1962): an optimal policy is built from optimal policies for subproblems
- Best path through a grid: if the best path goes through a grid point, it includes the best partial path to that grid point
- Classic example: knapsack problem

Knapsack problem

- Stuffing a sack with items of different sizes and values
- Goal: maximize the value in the sack
- Key point 1: If the max size is 10, and we know the values of the best solutions for all smaller max sizes (9, 8, ...), we can compute the final answer from the value of adding each item
- Key point 2: Point 1 sounds recursive, but can be made efficient and nonrecursive by building a table
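
Key points 1 and 2 can be illustrated with a bottom-up table in Python. This sketch assumes the unbounded variant (each item may be packed any number of times), which follows the "size 9 to size 10" recursion most directly; the classic 0/1 variant needs a table indexed by item as well. The function name and data are made up for illustration.

```python
def knapsack_max_value(capacity, sizes, values):
    """Unbounded knapsack via a bottom-up table.
    best[c] = highest total value that fits in a sack of max size c."""
    best = [0] * (capacity + 1)
    for c in range(1, capacity + 1):
        best[c] = best[c - 1]  # leaving one unit of space unused is always allowed
        for size, value in zip(sizes, values):
            if size <= c:
                # Key point 1: reuse the already-solved smaller problem best[c - size].
                best[c] = max(best[c], best[c - size] + value)
    return best[capacity]

print(knapsack_max_value(8, sizes=[1, 3, 4, 5], values=[10, 40, 50, 70]))  # 110
```

Each table entry is computed once from smaller entries, so the cost is proportional to capacity times number of items, with no recursion.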

[Figure: basic DTW step with simple local constraints. Each (i,j) cell has a local distance d and a cumulative distortion D; the equation shows the basic computational step.]

Apply DP to ASR: Vintsyuk, Bridle, Sakoe
- Let $D(i,j)$ = total distortion up to frame i in the input and frame j in the reference
- Let $d(i,j)$ = local distance between frame i in the input and frame j in the reference
- Let $p(i,j)$ = set of possible predecessors of frame i in the input and frame j in the reference
- $D(i,j) = d(i,j) + \min_{p(i,j)} D(p(i,j))$
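
A direct, full-matrix implementation of the recursion for $D(i,j)$ might look like this. The predecessor set $p(i,j) = \{(i-1,j), (i-1,j-1), (i,j-1)\}$, the Euclidean local distance, and the requirement that the path start at (0,0) and end at the last frame of both templates are all assumptions; the slides leave the local constraints open.

```python
import math

def dtw_distance(ref, inp):
    """Full-matrix DTW: D(i,j) = d(i,j) + min over predecessors p(i,j) of D.
    i indexes input frames, j indexes reference frames."""
    n, m = len(inp), len(ref)
    INF = float("inf")
    D = [[INF] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(inp[i], ref[j])  # local distance d(i,j)
            if i == 0 and j == 0:
                D[i][j] = d  # path must start here
                continue
            preds = []  # assumed predecessors: (i-1,j), (i-1,j-1), (i,j-1)
            if i > 0:
                preds.append(D[i - 1][j])
            if i > 0 and j > 0:
                preds.append(D[i - 1][j - 1])
            if j > 0:
                preds.append(D[i][j - 1])
            D[i][j] = d + min(preds)
    return D[n - 1][m - 1]  # path must end at the last frame of both

# A held first sound costs nothing: the warp absorbs the extra frames.
print(dtw_distance([[0.0], [1.0], [2.0]], [[0.0], [0.0], [0.0], [1.0], [2.0]]))  # 0.0
```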

DTW steps

(1) Compute the local distance d in the 1st column (1st frame of the input) for each reference template. Let $D(0,j) = d(0,j)$ for each cell in each template.
(2) For $i = 1$ (2nd column), starting at $j = 0$, compute $d(i,j)$ and add it to the minimum D over all possible predecessors to get $D(i,j)$; repeat for each frame in each template.
(3) Repeat (2) for each column to the end of the input.
(4) For each template, find the best D in the last column of the input.
(5) Choose the word whose template has the smallest D.
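
The steps can be sketched for isolated-word recognition with only two columns of D stored per template (the current input frame and the previous one). For brevity this loops over templates in the outer loop rather than processing all templates column by column; the resulting scores are the same. The boundary conditions are an assumption and stricter than a literal reading of step (1): each path is forced to start at the first reference frame, and the score is read at the last reference frame of the last column. The template data and word labels are hypothetical.

```python
import math

def dtw_two_columns(ref, inp):
    """Column-by-column DTW keeping only the previous and current input columns.
    Assumed predecessors: (i-1,j), (i-1,j-1), (i,j-1)."""
    INF = float("inf")
    m = len(ref)
    prev = [INF] * m
    for i, x in enumerate(inp):
        cur = [INF] * m
        for j in range(m):
            d = math.dist(x, ref[j])
            if i == 0 and j == 0:
                cur[j] = d
                continue
            best = min(prev[j],
                       prev[j - 1] if j > 0 else INF,
                       cur[j - 1] if j > 0 else INF)
            cur[j] = d + best
        prev = cur  # the old column is discarded
    return prev[m - 1]

def recognize(inp, templates):
    """Steps (4) and (5): score every (word, template) pair, keep the smallest D."""
    best_word, best_score = None, float("inf")
    for word, ref in templates:
        score = dtw_two_columns(ref, inp)
        if score < best_score:
            best_word, best_score = word, score
    return best_word, best_score

templates = [("one", [[0.0], [1.0], [2.0]]), ("two", [[2.0], [1.0], [0.0]])]
print(recognize([[0.0], [0.0], [1.0], [2.0], [2.0]], templates))  # ('one', 0.0)
```

Several templates per word can simply be listed as separate pairs with the same word label.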

DTW Complexity

- Time: $O(N_{frames}^{ref} \cdot N_{frames}^{in} \cdot N_{templates})$
- Storage: naively the same, but can be just $O(N_{frames}^{ref} \cdot N_{templates})$ (store only the current and previous columns)
- Constant-factor reduction: global constraints
- Constant-factor reduction: local constraints
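
A global constraint can be sketched as a band around the diagonal (in the spirit of the Sakoe-Chiba band), so that only cells near the diagonal are ever computed. Scaling the diagonal for templates of different lengths, the `band` parameter, and the returned cell count are choices made for this illustration; the full matrix is still allocated here, so only computation, not storage, is reduced. A band that is too narrow can make the end cell unreachable, in which case the distance comes out infinite.

```python
import math

def dtw_banded(ref, inp, band):
    """DTW computed only for cells with |j - diag(i)| <= band, where diag(i) is the
    diagonal scaled to the reference length. Returns (distance, cells computed)."""
    n, m = len(inp), len(ref)
    INF = float("inf")
    D = [[INF] * m for _ in range(n)]
    computed = 0
    for i in range(n):
        center = i * (m - 1) / (n - 1) if n > 1 else 0.0
        lo = max(0, math.ceil(center - band))
        hi = min(m - 1, math.floor(center + band))
        for j in range(lo, hi + 1):
            computed += 1
            d = math.dist(inp[i], ref[j])
            if i == 0 and j == 0:
                D[i][j] = d
                continue
            # Cells outside the band stay at INF, so they are never chosen.
            best = min(D[i - 1][j] if i > 0 else INF,
                       D[i - 1][j - 1] if i > 0 and j > 0 else INF,
                       D[i][j - 1] if j > 0 else INF)
            D[i][j] = d + best
    return D[n - 1][m - 1], computed

seq = [[float(k)] for k in range(100)]
print(dtw_banded(seq, seq, band=5))  # (0.0, 1070) instead of 100 * 100 = 10000 cells
```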

- All examples? Prototypes?
- DTW-based global distances permit clustering

DTW-based K-means

(1) Initialize (how many centers, and where)
(2) Assign each example to the closest center (DTW distance)
(3) For each cluster, find the template whose maximum distance to the other members is smallest, and call it the center
(4) Repeat (2) and (3) until some stopping criterion is reached
(5) Use the center templates as references for ASR
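
The clustering loop can be sketched as below. Initialization is left to the caller (`init_centers` holds indices of examples chosen as starting centers), empty clusters are simply dropped, and iteration stops when the set of centers stops changing or after `max_iter` passes; all of these are assumptions. Centers are always actual example templates, chosen by the minimax rule in step (3), since averaging templates of different lengths is not straightforward without first aligning them.

```python
import math

def dtw(a, b):
    """Compact DTW with assumed predecessors (i-1,j), (i-1,j-1), (i,j-1)."""
    INF = float("inf")
    D = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    D[0][0] = 0.0  # padded row/column so the path starts at the first frames
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = math.dist(a[i - 1], b[j - 1])
            D[i][j] = d + min(D[i - 1][j], D[i - 1][j - 1], D[i][j - 1])
    return D[len(a)][len(b)]

def dtw_kmeans(examples, init_centers, max_iter=20):
    """Cluster example templates; returns indices of the center examples."""
    centers = list(init_centers)  # (1) initialization chosen by the caller
    for _ in range(max_iter):
        # (2) Assign each example to the closest center by DTW distance.
        clusters = {c: [] for c in centers}
        for idx, ex in enumerate(examples):
            nearest = min(centers, key=lambda c: dtw(ex, examples[c]))
            clusters[nearest].append(idx)
        # (3) New center: member with the smallest maximum distance to its cluster.
        new_centers = []
        for members in clusters.values():
            if members:  # empty clusters are dropped
                new_centers.append(min(
                    members,
                    key=lambda a: max(dtw(examples[a], examples[b]) for b in members)))
        # (4) Stop once the set of centers no longer changes.
        if sorted(new_centers) == sorted(centers):
            break
        centers = new_centers
    return centers  # (5) examples[c] for c in centers become the references

examples = [[[0.0], [0.0], [1.0]], [[0.0], [1.0], [1.0]], [[0.0], [1.0]],
            [[5.0], [6.0]], [[5.0], [5.0], [6.0]], [[5.0], [6.0], [6.0]]]
print(dtw_kmeans(examples, init_centers=[0, 1]))  # [0, 3]
```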

- Normalizing for scale
- Cepstral weighting
- Perceptual weighting, e.g., JND (just-noticeable difference)
- Learning distances, e.g., with ANNs or statistics
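
Normalizing for scale can be sketched as a weighted Euclidean local distance, with each cepstral dimension weighted by the inverse of its variance over some training frames. This is only one possible choice of weights (cepstral or perceptual weights would plug into the same `weights` argument); the function names and toy data are illustrative.

```python
import math

def inverse_variance_weights(frames):
    """Per-dimension weights 1/variance, estimated from a list of training frames."""
    n, dims = len(frames), len(frames[0])
    weights = []
    for k in range(dims):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        weights.append(1.0 / var if var > 0 else 1.0)  # guard constant dimensions
    return weights

def weighted_distance(x, y, weights):
    """Weighted Euclidean local distance sqrt(sum_k w_k * (x_k - y_k)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

train = [[0.0, 0.0], [2.0, 20.0]]  # the second coefficient has a 10x larger spread
w = inverse_variance_weights(train)
print(w)  # [1.0, 0.01]
print(weighted_distance([0.0, 0.0], [2.0, 20.0], w))  # sqrt(8), about 2.83
print(math.dist([0.0, 0.0], [2.0, 20.0]))  # unweighted, about 20.1, dominated by dim 2
```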

- Sounds easy
- Hard in practice (noise, reverb, gain issues)
- Simple systems use energy and time thresholds
- More complex ones also use the spectrum
- Can be tuned
- Not robust
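
A simple endpointer of this kind (energy and time thresholds only) can be sketched as below: a frame counts as speech when its log energy exceeds a fixed threshold, and a speech region is accepted only if it lasts at least a minimum number of frames, which rejects short clicks. Both thresholds are hypothetical tuning parameters, and this kind of rule is exactly what breaks down under noise, reverberation, or gain changes.

```python
import math

def frame_energy_db(frame):
    """Log energy of one frame in dB (a small floor avoids log(0))."""
    return 10.0 * math.log10(sum(s * s for s in frame) + 1e-10)

def find_endpoints(frames, energy_threshold_db, min_speech_frames):
    """Return (start, end) frame indices of the first run of at least
    min_speech_frames consecutive frames above the energy threshold, else None."""
    run_start = None
    for i, frame in enumerate(frames):
        if frame_energy_db(frame) > energy_threshold_db:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_speech_frames:
                return run_start, i - 1
            run_start = None  # too short (e.g., a click): discard
    if run_start is not None and len(frames) - run_start >= min_speech_frames:
        return run_start, len(frames) - 1
    return None

silence, click, speech = [0.0] * 160, [1.0] * 160, [0.5] * 160
frames = [silence] * 5 + [click] + [silence] * 3 + [speech] * 10 + [silence] * 5
print(find_endpoints(frames, energy_threshold_db=0.0, min_speech_frames=5))  # (9, 18)
```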

- Time normalization
- Recognition
- Segmentation
- Can't have templates for all utterances
- DP to the rescue

- Vintsyuk, Bridle, Sakoe
- Sakoe: two-level algorithm
- Vintsyuk, Bridle: one-stage algorithm
- Ney: explanation of the one-stage algorithm

Ney, H., The use of a one-stage dynamic programming algorithm for connected word recognition, IEEE Trans. Acoust. Speech Signal Process. 32: 263-271, 1984

Connected Algorithm

- In principle: one big distortion matrix (for 20,000 words, 50 frames/word, and a 1000-frame input [10 seconds], this would be $10^9$ cells!)
- Also required: a backtracking matrix (since the word segmentation is not known)
- Get the best distortion
- Backtrack to get the words
- Fundamental principle: find the best segmentation and classification as part of the same process, not as sequential steps

- In principle, the backtracking matrix points back to the best previous cell
- Mostly, we just need to backtrack to the end of the previous word
- Simplifications are possible

Storage efficiency

- Distortion matrix -> 2 columns
- Backtracking matrix -> 2 rows:
  - From template: points to the template with the lowest cost ending here
  - From frame: points to the end frame of the previous word
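
The one-stage idea with these storage reductions can be sketched in Python. For each template only two columns of D are kept, together with the start frame of the word hypothesis in each cell; per input frame, two arrays record the template with the lowest cost ending there ("from template") and the end frame of the previous word ("from frame"), which is all the backtracking needs. The local constraints, the free transition from any word end to any word start (no grammar), and the `word_penalty` transition cost are assumptions; the template data are hypothetical.

```python
import math

def one_stage_dp(inp, templates, word_penalty=0.0):
    """One-stage DP sketch for connected words. templates: list of (word, frames).
    Within a word, assumed predecessors are (i-1,j), (i-1,j-1), (i,j-1); a word's
    first cell may also be entered from the best word ending at frame i-1."""
    INF = float("inf")
    K = len(templates)
    D_prev = [[INF] * len(ref) for _, ref in templates]
    S_prev = [[0] * len(ref) for _, ref in templates]  # start frame of word per cell
    best_end, from_tmpl, from_frame = [], [], []
    for i, x in enumerate(inp):
        entry = best_end[i - 1] + word_penalty if i > 0 else 0.0
        D_cur, S_cur = [], []
        for k, (_, ref) in enumerate(templates):
            col, starts = [INF] * len(ref), [0] * len(ref)
            for j, r in enumerate(ref):
                d = math.dist(x, r)
                cands = [(D_prev[k][j], S_prev[k][j])]  # (cost, start frame) pairs
                if j > 0:
                    cands.append((D_prev[k][j - 1], S_prev[k][j - 1]))
                    cands.append((col[j - 1], starts[j - 1]))
                else:
                    cands.append((entry, i))  # start a new word at frame i
                cost, start = min(cands, key=lambda c: c[0])
                col[j], starts[j] = d + cost, start
            D_cur.append(col)
            S_cur.append(starts)
        # Record the best word ending at frame i for backtracking.
        k_best = min(range(K), key=lambda k: D_cur[k][-1])
        best_end.append(D_cur[k_best][-1])
        from_tmpl.append(k_best)
        from_frame.append(S_cur[k_best][-1] - 1)  # -1 means no previous word
        D_prev, S_prev = D_cur, S_cur  # keep only two columns per template
    words, i = [], len(inp) - 1
    while i >= 0:  # hop back word by word
        words.append(templates[from_tmpl[i]][0])
        i = from_frame[i]
    return list(reversed(words)), best_end[-1]

templates = [("a", [[0.0], [0.0]]), ("b", [[5.0], [5.0]])]
inp = [[0.0], [0.0], [5.0], [5.0], [5.0], [0.0], [0.0]]
print(one_stage_dp(inp, templates, word_penalty=1.0))  # (['a', 'b', 'a'], 2.0)
```

In the example every frame matches exactly, so the total cost of 2.0 comes only from the two word transitions.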

- Within-word local constraints
- Between-word local constraints
- Grammars
- Transition costs

Knowledge-based segmentation

- DTW combines segmentation, time normalization, and recognition; all segmentations are considered
- The same feature vectors are used everywhere
- Could segment separately, using acoustic-phonetic features cleverly
- Example: FEATURE, Ron Cole (1983)

- No structure from subword units
- Average or exemplar values only
- Cross-word pronunciation effects not handled
- Limited flexibility for distance/distortion
- Limited mathematical basis -> Statistics!

- Having examples can get interesting again when there are many of them
- Potentially an augmentation of statistical methods
- Recent experiments show decent results
- Somewhat different properties -> combination

- Statistical ASR
- Speech synthesis
- Speaker recognition
- Speaker diarization
- Oral presentations on your projects
- Written report on your project

- Week of April 30: no class Monday; double class Wednesday May 2 (is that what people want?)
- 8 oral presentations by individuals, 12 minutes each + 3 minutes for questions
- 2 oral presentations by pairs, 17 minutes each + 3 minutes for questions
- 3:10 PM to 6 PM with a 10-minute mid-session break
- Written report due Wednesday May 9, no late submissions (email attachment is fine)
