GMM
13307130251@fudan.edu.cn
I. [2]SVM-based
WCCN
[3],I-vector
MFCC MEL
GMM GMM Microsoft Cortana, Apple Siri
(Speaker recognition)
II.
,
MFCCGMM
III.
MFCC
30
GMM 27
HMM ,
GMM-UBM 1994 [1] SVM
20
2006 GMM 8000 Hz
SVM
27
10
1
V.
19
"start"
i.
20*19=380,
20*19*27=10260,
540
18*20=360,
(2)
360*27=9720,1080
(3)
GMM
27
Figure 2:
Figure 1:
VI. (MFCC)
IV.
MFCC
Mel-scale Frequency Cepstral Coefficients (MFCC)
Mel
Mel
Table 1:
20 20 20 20 20 20 20 20 20 20
close file happy lucky open sound speech start stop voice
20 20 20 20 20 20 20 20 20 20
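$$f_{mel} = 2595 \log_{10}\left(1 + \frac{f}{700\,\mathrm{Hz}}\right)$$
where f is the frequency in Hz; Figure 5 shows the Mel scale against linear frequency.
iv. (FFT)
$$X_a(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \quad 0 \le k < N$$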
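The FFT step can be illustrated with a minimal sketch that computes the power spectrum of a single frame. The function name `power_spectrum` and the 1 kHz test tone are illustrative assumptions; the 8000 Hz sampling rate and 240-sample frame are the values quoted in this paper.

```python
import numpy as np

def power_spectrum(frame, n_fft=None):
    """Return |X(k)|^2 for the non-negative frequency bins of one real frame."""
    frame = np.asarray(frame, dtype=float)
    n_fft = n_fft or len(frame)
    spectrum = np.fft.rfft(frame, n=n_fft)  # DFT of a real signal, bins 0..n_fft/2
    return np.abs(spectrum) ** 2

# Illustrative input (assumption): a 1 kHz tone sampled at 8000 Hz, one 240-sample frame
fs = 8000
n = np.arange(240)
frame = np.sin(2 * np.pi * 1000 * n / fs)
power = power_spectrum(frame)
peak_bin = int(np.argmax(power))
peak_hz = peak_bin * fs / 240
```

Because the signal is real, only the first N/2 + 1 bins are kept; the remaining bins mirror them and carry no extra information.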
v.
Mel
(7) FFT
i. Pre-emphasis
$$H(z) = 1 - \mu z^{-1}$$
where μ lies between 0.9 and 1.0 and is usually set to 0.97.
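The pre-emphasis filter H(z) = 1 - μ z^{-1} corresponds in the time domain to y(n) = x(n) - μ x(n-1). The sketch below assumes μ = 0.97 and passes the first sample through unchanged; the function name `pre_emphasis` is illustrative.

```python
import numpy as np

def pre_emphasis(signal, mu=0.97):
    """Apply y[n] = x[n] - mu * x[n-1], the time-domain form of H(z) = 1 - mu z^-1."""
    signal = np.asarray(signal, dtype=float)
    # The first sample has no predecessor, so it is kept as-is (an assumption).
    return np.append(signal[0], signal[1:] - mu * signal[:-1])

# A constant signal is almost entirely suppressed after the first sample,
# which shows the high-pass character of the filter.
x = np.array([1.0, 1.0, 1.0, 1.0])
y = pre_emphasis(x)
```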
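ii. Framing
Each frame holds 240 samples (30 ms), and the frame shift is 80 samples (10 ms).
vi.
MFCC
$$s(m) = \ln\left(\sum_{k=0}^{N-1} |X_a(k)|^2 H_m(k)\right), \quad 0 \le m \le M$$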
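The log filter-bank energy s(m) can be sketched as follows. This is one common way to build triangular filters H_m(k) equally spaced on the Mel scale, not necessarily the construction used in this paper. The helper names, the 256-point FFT size, and the flat test spectrum are assumptions; the 16 filters, the 300 Hz to 3700 Hz band, and the 8000 Hz sampling rate are values that appear in the experiment section.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Mel scale: 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs, f_low, f_high):
    """Triangular filters H_m(k) with centres equally spaced on the Mel scale."""
    mel_points = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # peak and falling edge
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def log_mel_energies(power, fbank, eps=1e-10):
    """s(m) = ln(sum_k |X(k)|^2 H_m(k)); eps guards against log(0)."""
    return np.log(fbank @ power + eps)

fs, n_fft = 8000, 256                       # n_fft is an assumed FFT size
fbank = mel_filterbank(16, n_fft, fs, 300.0, 3700.0)
power = np.ones(n_fft // 2 + 1)             # flat spectrum, for illustration only
s = log_mel_energies(power, fbank)
```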
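$$C(n) = \sum_{m=0}^{N-1} s(m) \cos\left(\frac{\pi n (m - 0.5)}{M}\right), \quad n = 1, 2, ..., L$$
where L is the MFCC order, usually 12-16, and M is the number of triangular filters.
vii.
MFCC
iii. Windowing
Let S(n), n = 0, 1, ..., N-1, be the framed signal, where N is the frame size. After multiplication by the Hamming window:
$$S'(n) = S(n) \times W(n)$$
$$W(n, a) = (1 - a) - a \cos\left(\frac{2\pi n}{N - 1}\right), \quad 0 \le n \le N - 1$$
Different values of a give different Hamming windows; a is generally set to 0.46.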
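Framing and Hamming windowing can be sketched together as below, using the 240-sample frame, 80-sample shift, and a = 0.46 given in the text. The function names are illustrative, and the sketch assumes the signal is at least one frame long and simply drops any incomplete trailing frame.

```python
import numpy as np

def frame_signal(signal, frame_len=240, frame_shift=80):
    """Split a 1-D signal into overlapping frames (incomplete tail is dropped)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    idx = np.arange(frame_len)[None, :] + frame_shift * np.arange(n_frames)[:, None]
    return signal[idx]

def hamming(n_samples, a=0.46):
    """W(n, a) = (1 - a) - a * cos(2*pi*n / (N - 1)), 0 <= n <= N - 1."""
    n = np.arange(n_samples)
    return (1 - a) - a * np.cos(2 * np.pi * n / (n_samples - 1))

signal = np.arange(8000, dtype=float)   # one second at 8000 Hz, illustrative ramp
frames = frame_signal(signal)
windowed = frames * hamming(240)        # window applied to every frame
```

With a = 0.46 this window coincides with NumPy's built-in `np.hamming`, which can be used instead.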
Figure 3:
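$$d_t = \begin{cases} C_{t+1} - C_t, & t < K \\ \dfrac{\sum_{k=1}^{K} k\,(C_{t+k} - C_{t-k})}{\sqrt{2 \sum_{k=1}^{K} k^2}}, & \text{otherwise} \\ C_t - C_{t-1}, & t \ge Q - K \end{cases} \quad (1)$$
GMM log-likelihood function:
$$\sum_{i=1}^{N} \log\left\{\sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)\right\}$$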
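The GMM log-likelihood can be evaluated numerically as in the sketch below. It assumes diagonal covariance matrices, a simplification the paper does not state, and uses the log-sum-exp trick to avoid underflow; all function and variable names are illustrative.

```python
import numpy as np

def log_gaussian_diag(x, mean, var):
    """log N(x_i | mean, diag(var)) for every row x_i of x."""
    d = x.shape[1]
    return -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(var))
                   + np.sum((x - mean) ** 2 / var, axis=1))

def gmm_log_likelihood(x, weights, means, variances):
    """sum_i log sum_k pi_k N(x_i | mu_k, Sigma_k), with diagonal Sigma_k."""
    # log_prob[i, k] = log(pi_k) + log N(x_i | mu_k, Sigma_k)
    log_prob = np.stack([np.log(w) + log_gaussian_diag(x, m, v)
                         for w, m, v in zip(weights, means, variances)], axis=1)
    top = log_prob.max(axis=1, keepdims=True)            # log-sum-exp over components
    per_frame = top[:, 0] + np.log(np.exp(log_prob - top).sum(axis=1))
    return per_frame.sum()

# Two identical unit-variance components behave like a single standard normal.
x = np.array([[0.0], [1.0]])
weights = [0.5, 0.5]
means = np.array([[0.0], [0.0]])
variances = np.array([[1.0], [1.0]])
ll = gmm_log_likelihood(x, weights, means, variances)
```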
VII. (GMM) i. EM
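1. For each x_i, compute the probability that it was generated by component k:
$$\gamma(i, k) = \frac{\pi_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}$$
2. Re-estimate the parameters:
$$\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i, k)\, x_i$$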
1 N
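One EM iteration can be sketched as follows. The responsibility γ(i, k) and the mean update follow the two steps above. The variance and weight updates are the standard EM updates and are an assumption here, since they do not survive in the text. Diagonal covariances, the toy data, and the function names are likewise assumptions.

```python
import numpy as np

def e_step(x, weights, means, variances):
    """Responsibilities gamma[i, k] for a diagonal-covariance GMM."""
    diff = x[:, None, :] - means[None, :, :]                        # shape (N, K, D)
    log_pdf = -0.5 * (np.log(2 * np.pi * variances)[None]
                      + diff ** 2 / variances[None]).sum(axis=2)    # log N(x_i | mu_k, Sigma_k)
    log_joint = np.log(weights)[None, :] + log_pdf                  # log(pi_k * N(...))
    log_joint -= log_joint.max(axis=1, keepdims=True)               # stabilise before exp
    gamma = np.exp(log_joint)
    return gamma / gamma.sum(axis=1, keepdims=True)

def m_step(x, gamma):
    """Update weights, means and diagonal variances from the responsibilities."""
    n_k = gamma.sum(axis=0)                                         # N_k = sum_i gamma(i, k)
    means = (gamma.T @ x) / n_k[:, None]                            # mu_k
    diff = x[:, None, :] - means[None, :, :]
    variances = (gamma[:, :, None] * diff ** 2).sum(axis=0) / n_k[:, None]
    weights = n_k / x.shape[0]                                      # pi_k = N_k / N
    return weights, means, variances

# Toy 1-D data with two well-separated clusters (illustration only).
x = np.array([[-5.0], [-5.2], [-4.8], [5.0], [5.2], [4.8]])
weights = np.array([0.5, 0.5])
means = np.array([[-1.0], [1.0]])
variances = np.array([[1.0], [1.0]])
for _ in range(5):
    gamma = e_step(x, weights, means, variances)
    weights, means, variances = m_step(x, gamma)
```

In practice a variance floor is usually added so that a component collapsing onto a single point does not produce a zero variance.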
Figure 4: MFCC-FLOW
Figure 5: MEL-linear
MFCC
ii. VIII.
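X, S
$$\Pr(S \mid X) = \frac{p(X \mid S)\, \Pr(S)}{p(X)}$$
where p(X | S) is the likelihood of X given S, Pr(S) is the prior probability of S, and p(X) is the probability of X.
i.
MEL 16
14, MFCC
300 Hz to 3700 Hz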
Table 2:
(
mix
14,MFCC
16) 251
(19) 99.6%
(start) 89.4% 82%
(15) 88.9% 74%
(,) 90% 83%
(16) 89.9% 77.4%
() 88.7%
i.1 1
19
ii.
380
2,
540
i.2 2
start
,360
1080
MEL
i.3 3
MEL (MEL 12-16,
15 22-26)
MEL16
2414
i.4 4
199.6%(
3 ,
MFCCMFCC start
GMM
Figure 8: (GMM)
MFCC
90% 251
87%
100
5
498) 83%
start
Figure 6:
16
89.9%, 251 77.4%,
498 85%
251 88.7%
498 88%
iii.
Figure 7:
20
89.4%
50
251
82%
start
,MEL15
88.9% 251
74% MFCC
start
start
MFCC-GMM
GMM-UBM, GMM-SVM, I-VECTOR
I-VECTOR
[4]