data with trends over time, and they have been an important component of speech
recognition systems for many years.
I found it very difficult to find a good example (with code!) of a simple speech
recognition system, so I decided to write this post. Although this implementation will
not win any awards for "Best Speech Recognizer", I hope it provides some insight into
how HMMs can be used for speech recognition and other tasks.
In this post, I will define what a hidden Markov model is, show how to implement one
form (the Gaussian mixture model HMM, GMM-HMM) using numpy + scipy, and show how to
use this algorithm for single-speaker speech recognition. For a more "production grade"
HMM implementation, see hmmlearn, which holds the HMM implementations that were
previously part of sklearn.
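To make the idea of scoring a sequence with an HMM concrete before the full GMM-HMM, here is a minimal sketch of the scaled forward algorithm in numpy. It assumes a toy HMM with discrete emissions rather than the Gaussian mixture emissions this post builds; the function name `forward_log_likelihood` and all probabilities are hypothetical values chosen for illustration.

```python
import numpy as np

def forward_log_likelihood(start_prob, trans_prob, emit_prob, observations):
    """Return log P(observations) for a discrete-emission HMM (scaled forward pass)."""
    # alpha[i] is proportional to P(observations so far, current state = i)
    alpha = start_prob * emit_prob[:, observations[0]]
    log_likelihood = 0.0
    for obs in observations[1:]:
        # Rescale at every step so long sequences do not underflow
        scale = alpha.sum()
        log_likelihood += np.log(scale)
        alpha = alpha / scale
        # Move one step through the transition matrix, then weight by the emission
        alpha = alpha.dot(trans_prob) * emit_prob[:, obs]
    return log_likelihood + np.log(alpha.sum())

# Hypothetical toy model: 2 hidden states, 3 observation symbols
start_prob = np.array([0.6, 0.4])
trans_prob = np.array([[0.7, 0.3],
                       [0.4, 0.6]])
emit_prob = np.array([[0.5, 0.4, 0.1],
                      [0.1, 0.3, 0.6]])

log_p = forward_log_likelihood(start_prob, trans_prob, emit_prob, [0, 1, 2])
```

For word recognition, one such model would typically be trained per word, and an utterance would be assigned to the word whose model gives the highest log-likelihood.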
Data
To demonstrate this algorithm, we need a dataset to work with. I chose to use the
sample data from this Google Code project by Hakon Sandsmark. I also used that code
as a reference when writing my own implementation of a Gaussian mixture model HMM
(GMM-HMM). This helped in testing my implementation and provided a frame of
reference for performance.
Other available datasets are largely multispeaker, but the simple frequency peak
features used in this example do not work in a multispeaker setting (different speakers
have different frequency content for the same word, let alone the differences between
male and female speech). Future work will cover more advanced feature extraction
techniques for audio and extend these examples to multispeaker recognition.
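The speaker dependence of peak features can be illustrated with a small sketch. It is not the feature extraction used in this post: it assumes two synthetic "voices" built from pure tones, and the helper `peak_frequencies`, the sample rate, and the tone frequencies are all hypothetical. It simply picks the strongest FFT bins, which works here only because the tones fall exactly on bin centers.

```python
import numpy as np

def peak_frequencies(signal, sample_rate, n_peaks=2):
    """Return the n_peaks strongest frequencies (Hz) of the magnitude spectrum, ascending."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    strongest = np.argsort(spectrum)[-n_peaks:]
    return np.sort(freqs[strongest])

sample_rate = 8000  # Hz, hypothetical
t = np.arange(sample_rate) / float(sample_rate)  # one second of samples

# Same harmonic structure, different pitch: a stand-in for two speakers
low_voice = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 240 * t)
high_voice = np.sin(2 * np.pi * 210 * t) + 0.5 * np.sin(2 * np.pi * 420 * t)

low_peaks = peak_frequencies(low_voice, sample_rate)
high_peaks = peak_frequencies(high_voice, sample_rate)
```

The two signals have the same shape but share no peak frequency, so a model trained on the peaks of one would not match the other.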
%matplotlib inline
from utils import progress_bar_downloader
import os
#Hosting files on my dropbox since downloading from google code is painful
#Original project hosting is here: https://code.google.com/p/hmm-speechrecognition/downloads/list
#Audio is included in the zip file
link = 'https://dl.dropboxusercontent.com/u/15378192/audio.tar.gz'
dlname = 'audio.tar.gz'
#Download and unpack the archive only if it is not already present
if not os.path.exists('./%s' % dlname):
    progress_bar_downloader(link, dlname)
    os.system('tar xzf %s' % dlname)
else:
    print('%s already downloaded!' % dlname)
#Each subdirectory of audio/ is named after the word spoken in its files
fpaths = []
labels = []
spoken = []
for f in os.listdir('audio'):
    for w in os.listdir('audio/' + f):
        fpaths.append('audio/' + f + '/' + w)
        labels.append(f)
        if f not in spoken:
            spoken.append(f)
print('Words spoken: %s' % spoken)
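The directory walk above can also be sketched as a small function that sorts its listings, so that file order and label numbering are reproducible, and then maps each word to an integer index. This is an illustrative variant, not the code used in this post: the function name `list_audio_files`, the throwaway directory tree, and the words in it are hypothetical.

```python
import os
import tempfile

def list_audio_files(root):
    """Collect file paths, per-file word labels, and the sorted list of words under root."""
    fpaths, labels = [], []
    for word in sorted(os.listdir(root)):
        word_dir = os.path.join(root, word)
        if not os.path.isdir(word_dir):
            continue  # skip stray files at the top level
        for fname in sorted(os.listdir(word_dir)):
            fpaths.append(os.path.join(word_dir, fname))
            labels.append(word)
    spoken = sorted(set(labels))
    return fpaths, labels, spoken

# Build a tiny throwaway tree with hypothetical words and empty files
root = tempfile.mkdtemp()
for word in ['apple', 'kiwi']:
    os.makedirs(os.path.join(root, word))
    for i in range(2):
        open(os.path.join(root, word, '%s%02d.wav' % (word, i)), 'w').close()

fpaths, labels, spoken = list_audio_files(root)
# Map each word to an integer index so the labels can be fed to a classifier
label_indices = [spoken.index(word) for word in labels]
```

Sorting matters because os.listdir returns entries in arbitrary order, so the unsorted version can assign different indices to the same word on different machines.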
[Cell output: an array of values between 0. and 6.]
(216, 32)
-c:9: RuntimeWarning: divide by zero encountered in log