
MFCC

GMM


13307130251@fudan.edu.cn

June 29, 2016

I. [2] SVM-based
WCCN
[3], I-vector


MFCC MEL
GMM GMM Microsoft Cortana, Apple Siri
(Speaker recognition)


II.



,

MFCCGMM

III.
MFCC
30
GMM 27
HMM ,
GMM-UBM 1994 [1] SVM
20
2006 GMM 8000 Hz
SVM


27
10
1

V.

19
"start"

i.
20 × 19 = 380,

20 × 19 × 27 = 10260,

540

18 × 20 = 360,
(2)
360 × 27 = 9720, 1080
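The sample counts above can be checked with a few lines of arithmetic. The sketch below assumes one particular reading of the figures: 27 speakers, 20 words and 20 recordings per word (as in Table 1), with 19 or 18 recordings per word used for training and the rest held out for testing. The function name `split_sizes` is hypothetical.

```python
# Assumed reading of the counts: 27 speakers, 20 words, 20 recordings per word.
SPEAKERS, WORDS, RECORDINGS = 27, 20, 20

def split_sizes(train_per_word):
    """Return (training samples per speaker, training total, test total)."""
    test_per_word = RECORDINGS - train_per_word
    train_per_speaker = WORDS * train_per_word
    return (train_per_speaker,
            train_per_speaker * SPEAKERS,
            WORDS * test_per_word * SPEAKERS)

print(split_sizes(19))  # (380, 10260, 540)
print(split_sizes(18))  # (360, 9720, 1080)
```

Under this reading, the 540 and 1080 that appear next to the training totals are the corresponding numbers of test utterances.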
(3)
GMM

27

Figure 2:

Figure 1:

VI. (MFCC)
IV.

MFCC stands for Mel-scale Frequency Cepstral Coefficients.
Mel
Mel

Table 1:


20 20 20 20 20 20 20 20 20 20
close file happy lucky open sound speech start stop voice
20 20 20 20 20 20 20 20 20 20

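$$ f_{mel} = 2595 \log_{10}\left(1 + \frac{f}{700\,\mathrm{Hz}}\right) $$

where $f$ is the frequency in Hz (see Figure 5 for the Mel scale).

i.

$$ H(z) = 1 - \mu z^{-1} $$

where $\mu$ lies between 0.9 and 1.0, typically 0.97.

ii.

240 samples (30 ms) per frame, 80 samples (10 ms).

iii.

Let the framed signal be $S(n)$, $n = 0, 1, \ldots, N-1$, where $N$ is the frame length. The windowed signal is $S'(n) = S(n) \times W(n)$, with

$$ W(n, a) = (1 - a) - a \cos\left(\frac{2\pi n}{N - 1}\right), \quad 0 \le n \le N - 1 $$

Different values of $a$ give different Hamming windows; $a$ is typically 0.46.

iv. (FFT)

$$ X_a(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \quad 0 \le k \le N - 1 $$

v.

Mel
(7) FFT

vi. MFCC

$$ s(m) = \ln\left(\sum_{k=0}^{N-1} |X_a(k)|^2 H_m(k)\right), \quad 1 \le m \le M $$

$$ C(n) = \sum_{m=1}^{M} s(m) \cos\left(\frac{\pi n (m - 0.5)}{M}\right), \quad n = 1, 2, \ldots, L $$

$L$ is the MFCC order, usually 12-16, and $M$ is the number of triangular filters.

vii.

MFCC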

Figure 3:

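$$ d_t = \begin{cases} C_{t+1} - C_t, & t < K \\ \dfrac{\sum_{k=1}^{K} k\,(C_{t+k} - C_{t-k})}{\sqrt{2 \sum_{k=1}^{K} k^2}}, & \text{otherwise} \\ C_t - C_{t-1}, & t \ge Q - K \end{cases} \tag{1} $$

VII. (GMM)

GMM GSM

$$ p(x) = \sum_{k=1}^{K} p(k)\, p(x \mid k) = \sum_{k=1}^{K} \pi_k\, N(x \mid \mu_k, \Sigma_k) \tag{2} $$

GMM log-likelihood function:

$$ \sum_{i=1}^{N} \log \left\{ \sum_{k=1}^{K} \pi_k\, N(x_i \mid \mu_k, \Sigma_k) \right\} $$

i. EM

1. For each sample $x_i$, estimate the probability that it was generated by component $k$:

$$ \gamma(i, k) = \frac{\pi_k\, N(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, N(x_i \mid \mu_j, \Sigma_j)} $$

2. Re-estimate the parameters of each component:

$$ \mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i, k)\, x_i $$

$$ \Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i, k)\, (x_i - \mu_k)(x_i - \mu_k)^T $$

where $N_k = \sum_{i=1}^{N} \gamma(i, k)$ and $\pi_k = N_k / N$.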
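As an illustration of the EM updates and the log-likelihood, here is a minimal NumPy sketch. It is not the implementation used in the experiments. It assumes diagonal covariance matrices (a common simplification of the full $\Sigma_k$ in the equations), random initialisation of the means, and a small variance floor. All function names (`log_gauss_diag`, `em_step`, `fit_gmm`, `identify`) are hypothetical. `identify` picks the speaker model with the highest log-likelihood, which corresponds to the Bayes decision when all speakers have equal priors.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """log N(x_i | mu_k, Sigma_k) with diagonal Sigma_k; result has shape (N, K)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2.0 * np.pi * variances), axis=1))

def log_likelihood(X, weights, means, variances):
    """sum_i log sum_k pi_k N(x_i | mu_k, Sigma_k)."""
    log_p = np.log(weights) + log_gauss_diag(X, means, variances)
    return float(np.sum(np.logaddexp.reduce(log_p, axis=1)))

def em_step(X, weights, means, variances, var_floor=1e-6):
    # E-step: responsibilities gamma(i, k), computed in the log domain.
    log_p = np.log(weights) + log_gauss_diag(X, means, variances)
    gamma = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
    # M-step: N_k, then the weights, means and (diagonal) variances.
    n_k = gamma.sum(axis=0) + 1e-10          # guard against empty components
    new_means = (gamma.T @ X) / n_k[:, None]
    new_vars = (gamma.T @ X ** 2) / n_k[:, None] - new_means ** 2
    return n_k / n_k.sum(), new_means, np.maximum(new_vars, var_floor)

def fit_gmm(X, n_components=4, n_iter=50, seed=0):
    """Run a fixed number of EM iterations from a random initialisation."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), n_components, replace=False)]
    variances = np.tile(X.var(axis=0), (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        weights, means, variances = em_step(X, weights, means, variances)
    return weights, means, variances

def identify(X, models):
    """Return the key of the model under which X is most likely (equal priors assumed)."""
    return max(models, key=lambda name: log_likelihood(X, *models[name]))
```

In a speaker-recognition setting, `X` would hold one MFCC vector per frame, and `models` would map each speaker to the parameters returned by `fit_gmm` on that speaker's training frames.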

Figure 4: MFCC-FLOW
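The first half of the MFCC flow (pre-emphasis, framing, Hamming windowing and FFT) can be sketched in NumPy as follows. This is a sketch under assumptions rather than the code behind the reported results: the 80-sample figure is interpreted as the frame shift, the FFT size of 256 is an assumed value, and the function names are hypothetical.

```python
import numpy as np

def preemphasize(signal, mu=0.97):
    """y[n] = x[n] - mu * x[n-1], the time-domain form of H(z) = 1 - mu * z^-1."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - mu * signal[:-1])

def frame_signal(signal, frame_len=240, hop=80):
    """Split into overlapping frames; assumes len(signal) >= frame_len."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def hamming(n_samples, a=0.46):
    """W(n, a) = (1 - a) - a * cos(2*pi*n / (N - 1))."""
    n = np.arange(n_samples)
    return (1.0 - a) - a * np.cos(2.0 * np.pi * n / (n_samples - 1))

def power_spectrum(frames, n_fft=256):
    """|X(k)|^2 of each windowed frame (n_fft = 256 is an assumed FFT size)."""
    windowed = frames * hamming(frames.shape[1])
    return np.abs(np.fft.rfft(windowed, n=n_fft, axis=1)) ** 2

# Two seconds of noise at 8000 Hz stands in for a real recording.
x = np.random.default_rng(0).standard_normal(16000)
frames = frame_signal(preemphasize(x))
print(frames.shape, power_spectrum(frames).shape)  # (198, 240) (198, 129)
```

With $a = 0.46$ the window coincides with NumPy's built-in `np.hamming`, which could be used instead.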

Figure 5: MEL-linear
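The Mel mapping $f_{mel} = 2595 \log_{10}(1 + f/700)$ and the remaining MFCC steps (triangular Mel filter bank, log energies, cosine transform) can be sketched as below. The defaults of 16 filters, 14 coefficients and a 300 Hz to 3700 Hz band are taken from values that appear in the experiments. The FFT size, the bin-placement rule and the indexing of filters as $m = 1, \ldots, M$ are assumptions, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f_hz):
    """f_mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters=16, n_fft=256, sample_rate=8000,
                   f_low=300.0, f_high=3700.0):
    """Triangular filters whose centres are equally spaced on the Mel scale."""
    mel_points = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):        # rising edge
            bank[m - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):       # falling edge, peak of 1 at the centre
            bank[m - 1, k] = (right - k) / (right - centre)
    return bank

def mfcc_from_power(power, bank, n_ceps=14):
    """Log filter-bank energies s(m) followed by the cosine transform C(n)."""
    log_energy = np.log(power @ bank.T + 1e-10)        # small offset avoids log(0)
    n_filters = bank.shape[0]
    m = np.arange(1, n_filters + 1)                     # filters 1..M
    n = np.arange(1, n_ceps + 1)[:, None]               # coefficients 1..L
    basis = np.cos(np.pi * n * (m - 0.5) / n_filters)   # shape (L, M)
    return log_energy @ basis.T

bank = mel_filterbank()
ceps = mfcc_from_power(np.ones((5, 129)), bank)
print(bank.shape, ceps.shape)  # (16, 129) (5, 14)
```

Here `power` is expected to hold one power spectrum per row, with `n_fft // 2 + 1` bins per frame.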


MFCC

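ii.

X, S

$$ \Pr(S \mid X) = \frac{p(X \mid S)\,\Pr(S)}{p(X)} $$

where $p(X \mid S)$ is the likelihood of $X$ given speaker $S$, $\Pr(S)$ is the prior probability of speaker $S$, and $p(X)$ is the probability of $X$.

VIII.

i.

MEL 16
14, MFCC
300 Hz to 3700 Hz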

Table 2:

(
mix
14,MFCC
16) 251

(19) 99.6%
(start) 89.4% 82%
(15) 88.9% 74%
(,) 90% 83%
(16) 89.9% 77.4%
() 88.7%


i.1 1

19
ii.
380
2,
540


i.2 2

start

,360

1080
MEL


i.3 3

MEL (MEL 12-16,
15 22-26)

MEL16
2414
i.4 4
199.6%(
3 ,

MFCCMFCC start
GMM

Figure 8: (GMM)

MFCC
90%251
87%
100
5
498) 83%
start

Figure 6:
16
89.9%, 251 77.4%,
498 85%
251 88.7
498 88%

iii.
Figure 7:

20
89.4%
50
251
82%
start

,MEL15
88.9%251
74% MFCC
start

2S

IX.

(2 s, 8000 Hz),
MFCC
GMM

start

MFCC-GMM
GMM-UBM, GMM-SVM, I-VECTOR
I-VECTOR
[4]

[1] J. Gauvain and C. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Trans. Speech Audio Processing, vol. 2, no. 2, pp. 291-298, 1994.

[2] W. M. Campbell, D. E. Sturim, and D. A. Reynolds, "Support vector machines using GMM supervectors for speaker verification," IEEE Signal Process. Lett., vol. 13, no. 5, pp. 308-311, 2006.

[3] A. O. Hatch, S. S. Kajarekar, and A. Stolcke, "Within-class covariance normalization for SVM-based speaker recognition," in Proc. Interspeech, Pittsburgh, PA, pp. 1471-1474, 2006.

[4] John H. L. Hansen and Taufiq Hasan, "Speaker Recognition by Machines and Humans," IEEE Signal Processing Magazine, [74], November 2015.
