Meng Chao-Hong
Advisor: Lee Lin-Shan, Ph.D.
June, 2009
Abstract

Statistical speech recognition decodes an utterance through Bayes' theorem, which factors the posterior probability of a word sequence into an acoustic model and a language model. The acoustic model is conventionally a hidden Markov model whose parameters are obtained by maximum likelihood estimation; discriminative training instead adjusts the parameters to reduce recognition errors directly. This thesis applies the structural support vector machine to phone recognition and compares it with conventional hidden Markov models on the TIMIT corpus. Using the same Tandem-system features, the structural support vector machine improves phone accuracy by about 1% over the best hidden Markov model baseline.
Contents

1 Introduction
  1.1 Statistical Speech Recognition
  1.2 A Brief History of Speech Recognition
  1.3 Related Work
  1.4 Research Direction
  1.5 Thesis Organization

2 Hidden Markov Models and the Tandem System
  2.1 Hidden Markov Models
    2.1.1 The Generative Model
    2.1.2 Parameter Estimation
  2.2 Multi-Layered Perceptrons
    2.2.1 Forward Computation
    2.2.2 Training by Backpropagation
  2.3 The Tandem System
  2.4 Summary

3 Support Vector Machines
  3.1 Introduction
  3.2 Primal Form
  3.3 Lagrange Duality
  3.4 Dual Form of the Support Vector Machine
  3.5 Kernels
  3.6 The Non-Separable Case
  3.7 Sequential Minimal Optimization
  3.8 Summary

4 Structural Support Vector Machines
  4.1 Introduction
  4.2 Primal Form and Dual Form
  4.3 The Cutting Plane Method
  4.4 Discussion
  4.5 Summary

5 Experiments
  5.1 Experimental Setup
    5.1.1 The TIMIT Corpus
    5.1.2 Acoustic Features
    5.1.3 Software Tools
    5.1.4 Baseline Systems
    5.1.5 Structural SVM for Phone Recognition
    5.1.6 Evaluation Metric
  5.2 Experimental Results
    5.2.1 Results and Discussion
  5.3 Summary

6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future Work

References
Chapter 1  Introduction

1.1 Statistical Speech Recognition

As surveyed in Rabiner's classic tutorial [1], the dominant approach to automatic speech recognition is statistical, built on the hidden Markov model (HMM). Since the 1970s [2], recognizers have decoded an utterance through Bayes' theorem: the posterior probability of a word sequence is factored into an acoustic model and a language model, and the acoustic model parameters are estimated from data, conventionally by maximum likelihood estimation (MLE). MLE maximizes the likelihood of the training transcriptions, but maximizing likelihood is not the same as maximizing the posterior probability of the correct transcription against its competing word sequences, which is what determines accuracy on the testing set. Starting in the 1980s, researchers at IBM, notably Bahl et al. [3], therefore proposed discriminative criteria such as maximum mutual information (MMI).

The distinction can be phrased in terms of what is modeled. Writing X for the acoustics and Y for the word sequence, one school models the generative quantities P(Y) and P(X|Y) and obtains P(Y|X) through Bayes' theorem; this thesis, following [25], refers to it loosely as the "Bayesian school." The other school, here called the "statistical school," models P(Y|X) directly: since the recognizer only ever compares candidates Y for a fixed input, modeling the posterior P(Y|X) itself can be more economical than modeling how Y generates X.
1.2 A Brief History of Speech Recognition

The earliest "speech recognizer" on record is a toy [8]: Radio Rex, sold in the 1920s, was a celluloid dog (Rex) that sprang out of its house when called (Figure 1.1). A spring inside was released by acoustic energy around 500 Hz, roughly the energy of the vowel in "Rex," so the dog reacted to its name, and, by the same token, to anything else with enough energy near 500 Hz.

Figure 1.1: Radio Rex, an early single-"word" speech recognizer.

Serious work followed in the 1940s and 1950s. Bell Laboratories built a recognizer for the 10 English digits spoken by a single speaker: it stored spectral patterns for each digit and matched incoming utterances by correlation coefficients, reaching roughly 97% to 99% accuracy for the speaker it was tuned to. Other early systems were phoneme recognizers that constrained their decisions with phoneme transition probabilities. In the 1960s attention shifted to feature extraction algorithms and to the variability of speaking rate: utterances of the same word differ in timing, so templates must be warped in time, and dynamic programming was introduced to find the best match. In 1966, Baum and colleagues published the statistical theory of hidden Markov chains [9]. In the 1970s, Baker at Carnegie Mellon University built a recognizer on hidden Markov models, while at IBM's Watson Lab the group led by Frederick Jelinek, trained in Shannon's information theory, developed the statistical approach that has dominated ever since; Baker later joined IBM as well. Baker used the Viterbi decoding algorithm, while IBM used stack decoding. Over the last twenty years, progress has been driven by shared evaluation corpora of increasing difficulty: Resource Management, Wall Street Journal, the Air Traffic Information System, Broadcast News, and CALLHOME.

The modern formulation is as follows. Let x be the observation, a feature vector sequence extracted from the speech signal, and let Y be the set of candidate word sequences. The recognizer outputs

ŷ = arg max_{y∈Y} P(y | x).   (1.1)

By Bayes' theorem,

ŷ = arg max_{y∈Y} P(y | x) = arg max_{y∈Y} P(x | y) P(y) / P(x),   (1.2)

and since P(x) does not depend on y,

ŷ = arg max_{y∈Y} P(x | y) P(y).   (1.3)

P(x|y), the probability of the acoustics given the word sequence, is the acoustic model; P(y), the prior probability of the word sequence, is the language model. Estimating these two distributions rather than P(y|x) directly is the generative route described in Section 1.1. In this thesis the task is phone recognition: Y is the set of phone sequences and decoding is free-phone decoding, without a lexicon.
1.3 Related Work

Discriminative methods enter the standard architecture at different components. One line keeps the HMM but feeds it discriminatively trained features, as in the Tandem system [10] and the Crandem system [11]. A second line keeps the model and features but replaces the MLE criterion with a discriminative one, such as minimum classification error [4] and minimum phone error [5], with many refinements since [6, 7]. A third line models P(Y|X) directly: hidden conditional random fields (HCRF) have been applied to phone classification [12, 13], and conditional random fields over multi-layered perceptron (MLP) outputs to phone recognition [14].

This thesis follows a fourth line: the structural support vector machine, proposed by Tsochantaridis et al. [15] for learning with interdependent and structured output spaces, together with the closely related hidden Markov support vector machine for sequence tagging [16]. The framework has since been applied to multivariate performance measures [17] and to optimizing average precision in information retrieval [18], and has been made fast enough for large problems by linear-time training [19]; Keerthi and Sundararajan compare it with conditional random fields for sequence labeling [20].
1.4 Research Direction

This thesis applies the structural support vector machine to phone recognition on the TIMIT corpus and compares it with hidden Markov models under matched conditions. Two standard acoustic features are used, the mel-frequency cepstral coefficient (MFCC) and perceptual linear prediction (PLP), both raw and passed through a Tandem front end, i.e. the posterior probabilities produced by a multi-layered perceptron. With the Tandem features, the structural support vector machine outperforms the corresponding hidden Markov model by about 1% in phone accuracy.
1.5 Thesis Organization

Chapter 2 reviews hidden Markov models, multi-layered perceptrons, and the Tandem system. Chapter 3 reviews the support vector machine and the optimization theory behind it. Chapter 4 introduces the structural support vector machine. Chapter 5 describes the experimental setup and reports phone recognition results on TIMIT. Chapter 6 concludes.
Chapter 2  Hidden Markov Models and the Tandem System

2.1 Hidden Markov Models

Many problems take a sequence as input and produce a sequence as output, either tagging each element (sequence tagging) or segmenting the sequence (sequence segmentation): part-of-speech tagging and statistical language modeling in natural language processing (NLP) are classic examples, and phone recognition has the same shape. The hidden Markov model is the standard generative tool for such problems. When the observations live in a discrete, finitely countable feature vector space, the emission probability attached to each state is a probability mass function and the model is a discrete hidden Markov model; when the observation space is uncountable, as with real-valued acoustic features, the emission probability is a probability density function.
2.1.1 The Generative Model

A hidden Markov model is a generative model: it describes a procedure that generates an observation sequence, given as Algorithm 1.

Algorithm 1 Generating a sequence from a hidden Markov model
1: Let M := {1, 2, . . . , M} be the set of states.
2: Draw the initial state from the initial distribution (in the simplest case a uniform distribution).
3: The transition distribution A conditions the next state on the previous l states.
4: The observation space may be finitely countable, infinitely countable, or uncountable; the emission distributions are chosen accordingly.
5: for t = 1, . . . , T do
6:   Draw the state s_t given the previous l states.
7:   Draw the observation X_t from the emission distribution of s_t.
8: end for

A hidden Markov model is conveniently drawn as a finite state machine (FSM) whose edges carry the transition probabilities, as in Figure 2.1; the probabilities on the outgoing edges of each state sum to one.

Figure 2.1: A hidden Markov model drawn as a finite state machine; each edge carries a transition probability.

Writing X_t for the observation at time t and s_t for the state at time t, a model in which the next state depends on the previous l states is an l-th-order hidden Markov model (l-th-order HMM). Throughout this thesis l = 1 unless stated otherwise, i.e. a first-order hidden Markov model (First-order HMM).
11
= {1, 2, . . . , M }
(Alphabet Set)
M 1
M
OW (Observation Alphabet Set)
(Finitly Countable)
(Infinitely Countable)
(Uncountable)
A = {aij }
A
l
(Transition Probability)
A
(Matrix)
aij i
j
aij = P (st = j|st1 = i)
(2.1)
B = {bj (x)}
B
j
(Observation)
x
(Emission
Probability)
(2.2)
bj (ok ) = P (Xt = ok |st = j)
(2.3)
ok k
B
: (
)(
)
(Probability Distribution Family)
12
bj (x) =
C
X
cjk N (x, jk , jk ) =
k=1
C
X
(2.4)
k=1
jk (Covariance)
jk cjk
(Component)
1
C
X
cjk = 1
(2.5)
k=1
= {i }:
i
(Initial Probability)
M
aij 0, bi (x) 0, i 0, i, j, k
M
X
aij = 1
(2.6)
(2.7)
j=1
Z
bi (x) = 1
(2.8)
xOW
M
X
i = 1
(2.9)
i=1
2.2
13
0.6
1.0
0.9
0.5
0.4
0.5
0.1
2.2:
:
= (A, B, )
(2.10)
A, B,
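To make Algorithm 1 and the notation λ = (A, B, π) concrete, the following sketch (Python with NumPy) samples a state and observation sequence from a discrete first-order HMM; the three-state example values here are hypothetical, not the model of Figure 2.1.

import numpy as np

rng = np.random.default_rng(0)

# A hypothetical discrete first-order HMM lambda = (A, B, pi):
# M = 3 states, an observation alphabet of 2 symbols.
A  = np.array([[0.7, 0.2, 0.1],    # a_ij = P(s_t = j | s_{t-1} = i)
               [0.3, 0.4, 0.3],
               [0.2, 0.3, 0.5]])
B  = np.array([[0.9, 0.1],         # b_j(o_k) = P(X_t = o_k | s_t = j)
               [0.5, 0.5],
               [0.1, 0.9]])
pi = np.array([1/3, 1/3, 1/3])     # uniform initial distribution

def sample(T):
    """Algorithm 1: draw (s_1..s_T, X_1..X_T) from the generative model."""
    states, obs = [], []
    s = rng.choice(3, p=pi)                 # initial state from pi
    for _ in range(T):
        obs.append(rng.choice(2, p=B[s]))   # emit X_t from b_s(.)
        states.append(s)
        s = rng.choice(3, p=A[s])           # next state from row s of A
    return states, obs

print(sample(10))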
2.1.2 Parameter Estimation

Given a training set of N observation sequences

Z = {X^(n)}_{n=1}^{N},   (2.11)

where X^(n) is the n-th sequence, maximum likelihood estimation chooses the parameters λ that maximize the likelihood of Z:

Π_{n=1}^{N} P(X^(n); λ),   (2.12)

or, equivalently and more conveniently, the log-likelihood

l(λ) = Σ_{n=1}^{N} log P(X^(n); λ).   (2.13)

Fitting λ directly is hard because the state sequence and the mixture components are hidden variables; the standard remedy is the expectation maximization (EM) algorithm. Writing S^(n) for the state sequence and K^(n) for the mixture component sequence of the n-th utterance,

l(λ) = Σ_{n=1}^{N} log P(X^(n); λ) = Σ_{n=1}^{N} log ( Σ_{S^(n)} Σ_{K^(n)} P(X^(n), S^(n), K^(n); λ) ).   (2.14)

Introduce, for each n, an arbitrary distribution Q_n over the hidden variables:

Σ_{S^(n)} Σ_{K^(n)} Q_n(S^(n), K^(n)) = 1,   (2.15)
Q_n(S^(n), K^(n)) ≥ 0.   (2.16)

Since the logarithm is concave, Jensen's inequality gives a lower bound:

l(λ) = Σ_n log Σ_{S^(n),K^(n)} Q_n(S^(n), K^(n)) [ P(X^(n), S^(n), K^(n); λ) / Q_n(S^(n), K^(n)) ]
     ≥ Σ_n Σ_{S^(n),K^(n)} Q_n(S^(n), K^(n)) log [ P(X^(n), S^(n), K^(n); λ) / Q_n(S^(n), K^(n)) ]
     = Σ_n Σ_{S^(n),K^(n)} Q_n log P(X^(n), S^(n), K^(n); λ) − Σ_n Σ_{S^(n),K^(n)} Q_n log Q_n.   (2.17)

The bound is tight when the ratio inside the logarithm is a constant c:

P(X^(n), S^(n), K^(n); λ) / Q_n(S^(n), K^(n)) = c.   (2.18)–(2.19)

Together with the normalization (2.15), this forces Q_n to be the posterior of the hidden variables under the current λ:

Q_n(S^(n), K^(n)) = P(X^(n), S^(n), K^(n); λ) / Σ_{S^(n)} Σ_{K^(n)} P(X^(n), S^(n), K^(n); λ) = P(S^(n), K^(n) | X^(n); λ).   (2.20)

This is the E step. In the M step the entropy term −Σ Q_n log Q_n of (2.17) does not depend on λ (2.21), so it suffices to maximize the expected complete-data log-likelihood

Σ_{n=1}^{N} Σ_{S^(n)} Σ_{K^(n)} Q_n(S^(n), K^(n)) log P(X^(n), S^(n), K^(n); λ).   (2.22)–(2.23)

Alternating the two steps never decreases l(λ). The complete-data likelihood factorizes along the sequence: ignoring mixtures, P(X, S; λ) = Π_t a_{s_{t−1}s_t} b_{s_t}(X_t); with mixture components,

P(X^(n), S^(n), K^(n) | λ) = Π_{t=1}^{T} a_{s_{t−1}s_t} c_{s_t k_t} b_{s_t k_t}(X_t^(n)),   (2.24)

where a_{s_0 s_1} is read as π_{s_1}, so that

log P(X^(n), S^(n), K^(n) | λ) = Σ_{t=1}^{T} log a_{s_{t−1}s_t} + Σ_{t=1}^{T} log c_{s_t k_t} + Σ_{t=1}^{T} log b_{s_t k_t}(X_t^(n)).   (2.25)

Substituting (2.25) into (2.17), the objective splits into separate terms for the transitions, the mixture weights, and the Gaussian parameters; for example, the Gaussian term is

Σ_{n=1}^{N} Σ_{j=1}^{M} Σ_{k=1}^{C} Σ_{t=1}^{T} P(s_t = j, k_t = k | X^(n); λ) log b_jk(x_t^(n)).   (2.26)

The transition and weight terms have the form

F(x) = Σ_i y_i log x_i,   (2.27)

to be maximized subject to Σ_i x_i = 1. By a Lagrange multiplier argument (setting ∂/∂x_i [F(x) + λ(1 − Σ_i x_i)] = y_i/x_i − λ = 0 gives x_i = y_i/λ, and the constraint fixes λ = Σ_i y_i), the maximizer is

x_i = y_i / Σ_i y_i.   (2.28)

Applying (2.28) yields the re-estimates

â_ij = Σ_{n=1}^{N} Σ_{t=1}^{T} ξ_t^(n)(i, j) / Σ_{k=1}^{M} Σ_{n=1}^{N} Σ_{t=1}^{T} ξ_t^(n)(i, k),   (2.29)

ĉ_jk = Σ_{n=1}^{N} Σ_{t=1}^{T} γ_t^(n)(j, k) / Σ_{k=1}^{C} Σ_{n=1}^{N} Σ_{t=1}^{T} γ_t^(n)(j, k),   (2.30)

where the posterior counts are

γ_t^(n)(j, k) = P(s_t = j, k_t = k | X^(n); λ),   (2.31)
ξ_t^(n)(i, j) = P(s_{t−1} = i, s_t = j | X^(n); λ).   (2.32)

For the Gaussian parameters {μ_jk, Σ_jk}, differentiating (2.26) with the chain rule and setting the gradients to zero gives the weighted means and covariances

μ̂_jk = Σ_{t=1}^{T} γ_t^(n)(j, k) x_t / Σ_{t=1}^{T} γ_t^(n)(j, k),   (2.33)

Σ̂_jk = Σ_{t=1}^{T} γ_t^(n)(j, k) (x_t − μ̂_jk)(x_t − μ̂_jk)^T / Σ_{t=1}^{T} γ_t^(n)(j, k),   (2.34)

with the sums understood to run over both n and t.
The posteriors γ and ξ are computed efficiently by the forward and backward recursions. Define the forward variable

α_t(i) = P(X_1, . . . , X_t, s_t = i; λ),   (2.35)

computed by Algorithm 2, and the backward variable

β_t(i) = P(X_{t+1}, . . . , X_T | s_t = i; λ),   (2.36)

computed by Algorithm 3.

Algorithm 2 The forward algorithm
1: for i = 1, . . . , M do
2:   α_1(i) = π_i b_i(X_1)
3: end for
4: for t = 2, . . . , T do
5:   for j = 1, . . . , M do
6:     α_t(j) = [ Σ_{i=1}^{M} α_{t−1}(i) a_ij ] b_j(X_t)
7:   end for
8: end for
9: P(X; λ) = Σ_{i=1}^{M} α_T(i)

Algorithm 3 The backward algorithm
1: for i = 1, . . . , M do
2:   β_T(i) = 1
3: end for
4: for t = T − 1, . . . , 1 do
5:   for i = 1, . . . , M do
6:     β_t(i) = Σ_{j=1}^{M} a_ij b_j(X_{t+1}) β_{t+1}(j)
7:   end for
8: end for

From α and β the posteriors follow: for instance, ξ_t^(n)(i, j) ∝ α_{t−1}(i) a_ij b_j(X_t) β_t(j), normalized by P(X^(n); λ), and the marginals are related by

γ_t^(n)(j) = Σ_{k=1}^{C} γ_t^(n)(j, k) = Σ_{i=1}^{M} ξ_t^(n)(i, j).   (2.37)–(2.38)

The resulting EM procedure, alternating the forward-backward computation of γ and ξ (E step) with the re-estimates (2.29)–(2.34) (M step), is the Baum-Welch algorithm.
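A direct transcription of Algorithms 2 and 3 and of the E-step posteriors follows; it is a minimal sketch for a discrete HMM (no probability scaling, so it underflows on long sequences), with A, B, and pi as NumPy arrays as in the earlier sampling sketch.

import numpy as np

def forward(A, B, pi, x):
    """Algorithm 2: alpha[t, i] with 0-based time index t."""
    T, M = len(x), len(pi)
    alpha = np.zeros((T, M))
    alpha[0] = pi * B[:, x[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]
    return alpha                      # P(X; lambda) = alpha[-1].sum()

def backward(A, B, x):
    """Algorithm 3: beta[t, i] with 0-based time index t."""
    T, M = len(x), A.shape[0]
    beta = np.ones((T, M))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, x[t + 1]] * beta[t + 1])
    return beta

def posteriors(A, B, pi, x):
    """State posteriors gamma_t(j) and pair posteriors xi_t(i, j),
    the E-step statistics behind (2.29)-(2.34)."""
    alpha, beta = forward(A, B, pi, x), backward(A, B, x)
    px = alpha[-1].sum()
    gamma = alpha * beta / px
    xi = np.array([np.outer(alpha[t], B[:, x[t + 1]] * beta[t + 1]) * A
                   for t in range(len(x) - 1)]) / px
    return gamma, xi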
2.2 Multi-Layered Perceptrons

The multi-layered perceptron is loosely inspired by neuroscience. Its basic unit, the neuron of Figure 2.3(a), computes a weighted sum of its inputs,

F(x) = Σ_i w_i x_i,   (2.39)

where x is the input vector and the weights w_i define a linear transformation; a nonlinearity or threshold is then applied. Connecting many such units into an acyclic network gives a neural network; when the units are arranged in successive layers, each fully connected to the next as in Figure 2.3(b), the network is a multi-layered perceptron. Such networks can be used for regression and for binary classification; this section describes the univariate regression case with a single output unit, from which the general case follows.

Figure 2.3: (a) A single neuron computes a weighted sum of its inputs. (b) A multi-layered perceptron with an input layer, a hidden layer, and an output layer (four inputs and one output shown).
2.2.1 Forward Computation

The network implements a function f: R^d → R, where d is the input dimension; the network's own function is written g, and training aims to make g approximate the target f. Let the network have L layers of neurons, and let layer l (1 ≤ l ≤ L) contain d^(l) units, with d^(0) = d and d^(L) = 1. Layer l is fully connected to layer l − 1: unit j of layer l receives the outputs x_i^(l−1) (1 ≤ i ≤ d^(l−1)) of layer l − 1, together with a constant bias input x_0^(l−1) = 1, through weights

w_ij^(l), 1 ≤ l ≤ L, 0 ≤ i ≤ d^(l−1), 1 ≤ j ≤ d^(l).

Unit j of layer l computes

x_j^(l) = θ( Σ_{i=0}^{d^(l−1)} w_ij^(l) x_i^(l−1) ),   (2.40)

where θ is the activation function, here the hyperbolic tangent θ(s) = tanh(s) = (e^s − e^{−s}) / (e^s + e^{−s}). Separating the two steps, write s_j^(l) for the weighted sum:

s_j^(l) = Σ_{i=0}^{d^(l−1)} w_ij^(l) x_i^(l−1),   (2.41)

x_j^(l) = θ(s_j^(l)).   (2.42)

The output g(x) is computed by setting x_1^(0), . . . , x_d^(0) to the input x, evaluating (2.41)–(2.42) layer by layer, and reading off the single unit of the last layer: g(x) = x_1^(L).
2.2.2 Training by Backpropagation

Given a training set Z = {(x_n, y_n)}_{n=1}^{N}, define the error function on example n as

E_n(w) = (g(x_n) − y_n)^2.   (2.43)

Training is an optimization over the weights, usually by gradient descent; updating after each single example n is stochastic gradient descent:

w_ij^(l) ← w_ij^(l) − η ∂E_n/∂w_ij^(l), 0 ≤ i ≤ d^(l−1), 1 ≤ j ≤ d^(l), 1 ≤ l ≤ L,   (2.44)

iterated over the data:

for t = 1, 2, . . . , T do
  w^(t) ← w^(t−1) − η ∇E(w^(t−1))
end for

The gradients are computed by the backpropagation algorithm. Since w_ij^(l) influences E_n only through s_j^(l), the chain rule gives

∂E_n/∂w_ij^(l) = (∂E_n/∂s_j^(l)) (∂s_j^(l)/∂w_ij^(l)) = δ_j^(l) x_i^(l−1),   (2.45)

where δ_j^(l) := ∂E_n/∂s_j^(l) is the error signal of unit j in layer l. At the output layer l = L, E_n = (x_1^(L) − y)^2, so

δ_1^(L) = (∂E_n/∂x_1^(L)) (∂x_1^(L)/∂s_1^(L)) = 2 (x_1^(L) − y) θ′(s_1^(L)).   (2.46)

For a hidden layer, unit i of layer l − 1 influences E_n through all units j of layer l:

δ_i^(l−1) = Σ_{j=1}^{d^(l)} (∂E_n/∂s_j^(l)) (∂s_j^(l)/∂x_i^(l−1)) (∂x_i^(l−1)/∂s_i^(l−1)) = Σ_{j=1}^{d^(l)} δ_j^(l) w_ij^(l) θ′(s_i^(l−1)).   (2.47)

Because θ = tanh satisfies θ′(s) = 1 − θ^2(s), we have θ′(s_i^(l−1)) = 1 − (x_i^(l−1))^2, so

δ_i^(l−1) = (1 − (x_i^(l−1))^2) Σ_{j=1}^{d^(l)} w_ij^(l) δ_j^(l).   (2.48)

One forward pass computes all x_j^(l) and s_j^(l); one backward pass propagates the δ's from the output layer down (Figure 2.4); (2.45) then yields every gradient.

Figure 2.4: Backpropagation: the error signals δ flow backwards through the network.
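The following sketch implements (2.40)–(2.48) for one hidden layer with tanh units and the squared error (2.43); it is a minimal stochastic-gradient version with arbitrary layer sizes, not the QuickNet configuration used later in the experiments.

import numpy as np

rng = np.random.default_rng(0)

# One hidden layer: d -> h -> 1, tanh activations, bias as weight row 0.
d, h = 4, 8
W1 = rng.normal(scale=0.1, size=(d + 1, h))   # w_ij^(1)
W2 = rng.normal(scale=0.1, size=(h + 1, 1))   # w_ij^(2)

def forward(x):
    x0 = np.concatenate(([1.0], x))            # x_0^(0) = 1
    s1 = x0 @ W1                               # (2.41)
    x1 = np.concatenate(([1.0], np.tanh(s1)))  # (2.42)
    g = np.tanh(x1 @ W2)                       # g(x) = x_1^(L)
    return x0, x1, g

def sgd_step(x, y, eta=0.05):
    global W1, W2
    x0, x1, g = forward(x)
    # Output delta (2.46): 2(g - y) * theta'(s) with theta' = 1 - g^2.
    d2 = 2 * (g - y) * (1 - g**2)
    # Hidden deltas (2.47)-(2.48); the bias row of W2 is skipped.
    d1 = (1 - x1[1:]**2) * (W2[1:] @ d2)
    W2 -= eta * np.outer(x1, d2)               # update (2.44)-(2.45)
    W1 -= eta * np.outer(x0, d1)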
2.3 The Tandem System

The Tandem system [10] connects the two models of this chapter in series. A multi-layered perceptron is first trained to classify subword units, typically phones, from the acoustic features; its output layer then produces, for every frame, a vector of posterior probabilities over the classes. Instead of using these posteriors for classification directly, as a hybrid system would, the Tandem system treats them as a new feature vector for a conventional GMM-HMM recognizer. Because the posteriors are heavily skewed toward 0 and 1, a logarithm is applied, and principal component analysis (PCA) decorrelates the resulting features, which suits the diagonal-covariance Gaussians of the back end. Hermansky et al. [10] reported large gains with this construction. Note that the perceptron is trained on the training set only; during evaluation it is simply applied, frame by frame, to the testing set.
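A schematic of the Tandem feature computation, assuming an already-trained MLP that outputs per-frame phone posteriors; the function name and the small epsilon are illustrative choices, and the log transform and PCA follow the description above.

import numpy as np

def tandem_features(posteriors, n_keep):
    """posteriors: (T, K) MLP outputs per frame; returns (T, n_keep)
    decorrelated features for the GMM-HMM back end."""
    logp = np.log(posteriors + 1e-10)        # log to reduce the skew
    logp -= logp.mean(axis=0)                # center before PCA
    cov = np.cov(logp, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)     # ascending eigenvalues
    proj = eigvec[:, ::-1][:, :n_keep]       # keep the leading components
    return logp @ proj                       # (n_keep chosen e.g. by the
                                             # eigenvalue mass, cf. 5.1.4)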
2.4 Summary

This chapter reviewed the hidden Markov model and its maximum likelihood training, the multi-layered perceptron and backpropagation, and the Tandem system that connects the two. These provide the baselines and the feature extractors for the experiments of Chapter 5.
Chapter 3  Support Vector Machines

3.1 Introduction

Consider a linear classifier for a binary classification problem whose two classes are linearly separable, as in Figure 3.1. Many separating lines exist: lines A, B, and C in the figure all separate the training points, but they are not equally good. Line B keeps the largest geometric distance to the nearest training points, so small perturbations of the data are least likely to cross it; lines A and C pass close to some points and generalize worse. The support vector machine (SVM) formalizes this intuition: it is the linear classifier that maximizes the margin.

Figure 3.1: Three separating lines A, B, C for a linearly separable problem; B, the maximum margin separator, is preferred.

A hyperplane is parameterized by (w, b) as the set of points x with ⟨w, x⟩ + b = 0, where ⟨·,·⟩ is the inner product (in one dimension, w_1 x_1 + b = 0). A hyperplane (w′, b′) classifies a point x by which side it falls on, i.e. by the sign of ⟨w′, x⟩ + b′, where

sign(z) = 1 if z ≥ 0, −1 otherwise,   (3.1)

so the predicted label is sign(f(x)) with f(x) = ⟨w, x⟩ + b. In R^n the separating line becomes a separating hyperplane, and the SVM is the maximum margin classifier.
3.2 Primal Form

Consider first the separable case, where some hyperplane classifies every training point correctly. Let the training set of N points be

Z = {(x_i, y_i)}_{i=1}^{N},   (3.2)

with x_i ∈ R^n and y_i ∈ {−1, 1}. We want the separating hyperplane of maximum thickness,

max_{w,b} Thickness(w, b),   (3.3)

where thickness is measured by the distance to the nearest training point. The geometric margin of point x_i is

γ_i = (distance of x_i to the hyperplane ⟨w, x⟩ + b = 0),   (3.4)

which for a correctly classified point equals

γ_i = y_i (⟨w, x_i⟩ + b) / ||w||.   (3.5)

Then

Thickness(w, b) = min_i γ_i,   (3.6)

and the problem reads

max_{w,b} min_i γ_i  s.t. ∀i: γ_i = y_i (⟨w, x_i⟩ + b) / ||w||.   (3.7)

The quantity y_i(⟨w, x_i⟩ + b) without the division by ||w|| is the functional distance (functional margin); rescaling (w, b) changes it arbitrarily, so a normalization is needed. One option is to require ||w|| = 1, making functional and geometric margins coincide:

max_{w,b} min_i γ_i  s.t. ||w|| = 1.   (3.8)

Introducing γ = min_i γ_i as an explicit variable,

max_{w,b} γ  s.t. ∀i: γ_i ≥ γ, ||w|| = 1,   (3.9)

i.e.

max_{w,b} γ  s.t. ∀i: y_i (⟨w, x_i⟩ + b) ≥ γ, ||w|| = 1.   (3.10)

The constraint ||w|| = 1 is non-convex. Instead, divide through by ||w|| and write γ̂ = γ ||w|| for the functional margin:

max_{w,b: w≠0} γ̂ / ||w||  s.t. ∀i: y_i (⟨w, x_i⟩ + b) ≥ γ̂.   (3.11)

Since (w, b) can still be rescaled freely, fix γ̂ = 1:

max_{w,b} 1 / ||w||  s.t. ∀i: y_i (⟨w, x_i⟩ + b) ≥ 1.   (3.12)–(3.13)

Maximizing 1/||w|| is equivalent to minimizing ½||w||², which gives the primal form of the SVM:

min_{w,b} ½ ||w||²  s.t. ∀i: y_i (⟨w, x_i⟩ + b) ≥ 1.   (3.14)
3.3 Lagrange Duality

The primal problem (3.14) is a constrained optimization problem, handled with Lagrange multipliers. Consider the generic problem with k inequality and l equality constraints:

min_w f(w)  s.t. g_i(w) ≤ 0, i = 1, . . . , k;  h_i(w) = 0, i = 1, . . . , l.   (3.15)

Define the Lagrangian

L(w, α, β) = f(w) + Σ_{i=1}^{k} α_i g_i(w) + Σ_{i=1}^{l} β_i h_i(w),   (3.16)

where the α_i and β_i are Lagrange multipliers with α_i ≥ 0. A stationary point (w, α, β) satisfies

∂L/∂w_i = 0,   (3.17)
∂L/∂α_i = 0,   (3.18)
∂L/∂β_i = 0.   (3.19)

Define the primal function

θ_P(w) = max_{α,β: α_i≥0} L(w, α, β),   (3.20)

and the primal problem

min_w θ_P(w).   (3.21)

If w violates some constraint, i.e. g_i(w) > 0 or h_i(w) ≠ 0 for some i, the maximization over α_i or β_i drives L to infinity; if w is feasible, the maximum is attained with α_i g_i(w) = 0 and equals f(w):

θ_P(w) = max_{α,β: α_i≥0} ( f(w) + Σ_i α_i g_i(w) + Σ_i β_i h_i(w) )   (3.22)
       = f(w) if w satisfies all constraints, ∞ otherwise.   (3.23)

Hence min_w θ_P(w) = min_w max_{α,β: α_i≥0} L(w, α, β) coincides with (3.15),   (3.24)

and we call its value

p* = min_w θ_P(w).   (3.25)

The dual form swaps min and max. Define

θ_D(α, β) = min_w L(w, α, β),   (3.26)

and the dual problem

max_{α,β: α_i≥0} θ_D(α, β) = max_{α,β: α_i≥0} min_w L(w, α, β),   (3.27)

with value

d* = max_{α,β: α_i≥0} θ_D(α, β).   (3.28)

Weak duality always holds:

d* = max_{α,β: α_i≥0} min_w L(w, α, β) ≤ min_w max_{α,β: α_i≥0} L(w, α, β) = p*.   (3.29)

If f and the g_i are convex, the h_i are affine, and the constraints are strictly feasible (there exists w with g_i(w) < 0 for all i), then d* = p*, attained at some w*, α*, β* satisfying the Karush-Kuhn-Tucker (KKT) conditions:

∂L(w*, α*, β*)/∂w_i = 0, 1 ≤ i ≤ n,   (3.30)
∂L(w*, α*, β*)/∂β_i = 0, 1 ≤ i ≤ l,   (3.31)
α_i* g_i(w*) = 0, 1 ≤ i ≤ k,   (3.32)
g_i(w*) ≤ 0, 1 ≤ i ≤ k,   (3.33)
α_i* ≥ 0, 1 ≤ i ≤ k.   (3.34)

Conversely, any w*, α*, β* satisfying the KKT conditions solves both problems. Equation (3.32) is the KKT dual complementarity condition: if α_i* > 0 then g_i(w*) = 0, i.e. the i-th constraint is active. In the SVM, the active constraints correspond to the training points on the margin, the support vectors.
3.4 Dual Form of the Support Vector Machine

Return to the primal problem

min_{w,b} ½ ||w||²  s.t. ∀i: y_i (⟨w, x_i⟩ + b) ≥ 1.   (3.35)

In the notation of Section 3.3 the constraints are

g_i(w) = −y_i (⟨w, x_i⟩ + b) + 1 ≤ 0,   (3.36)

and the Lagrangian is

L(w, b, α) = ½ ||w||² − Σ_{i=1}^{N} α_i ( y_i (⟨w, x_i⟩ + b) − 1 ),   (3.37)

with multipliers α_i ≥ 0 and no equality constraints. The primal is

min_{w,b} max_{α: α_i≥0} L(w, b, α);   (3.38)

since the problem is convex and feasible, we may instead solve the dual

max_{α: α_i≥0} min_{w,b} L(w, b, α).   (3.39)

The inner minimization over w and b is unconstrained:

∇_w L(w, b, α) = w − Σ_{i=1}^{N} α_i y_i x_i = 0,   (3.40)
∂L(w, b, α)/∂b = Σ_{i=1}^{N} α_i y_i = 0.   (3.41)

From (3.40),

w = Σ_{i=1}^{N} α_i y_i x_i.   (3.42)

Substituting (3.42) back into (3.37),

L(w, b, α) = Σ_{i=1}^{N} α_i − ½ Σ_{i,j=1}^{N} y_i y_j α_i α_j ⟨x_i, x_j⟩ − b Σ_{i=1}^{N} α_i y_i,   (3.43)

and by (3.41) the last term vanishes:

L(w, b, α) = Σ_{i=1}^{N} α_i − ½ Σ_{i,j=1}^{N} y_i y_j α_i α_j ⟨x_i, x_j⟩.   (3.44)

The dual problem is therefore

max_α W(α) = Σ_{i=1}^{N} α_i − ½ Σ_{i,j=1}^{N} y_i y_j α_i α_j ⟨x_i, x_j⟩
s.t. α_i ≥ 0 ∀i,  Σ_{i=1}^{N} α_i y_i = 0.   (3.45)

Section 3.7 gives an algorithm for solving (3.45). Once the optimal α is known, w follows from (3.42), and the threshold is fixed by placing the hyperplane midway between the two classes:

b* = − ( max_{i: y_i=−1} w^T x_i + min_{i: y_i=1} w^T x_i ) / 2.   (3.46)

By the complementarity condition (3.32), α_i > 0 only for points with functional margin y_i(⟨w, x_i⟩ + b) = 1, the points that lie exactly on the margin (Figure 3.2); these are the support vectors, and typically there are few of them.

Figure 3.2: Only the points on the margin, the support vectors, have α_i > 0.

To classify a new point x we evaluate ⟨w, x⟩ + b and take the sign; using (3.42),

⟨w, x⟩ + b = ⟨ Σ_{i=1}^{N} α_i y_i x_i, x ⟩ + b   (3.47)
           = Σ_{i=1}^{N} α_i y_i ⟨x_i, x⟩ + b.   (3.48)

Only the support vectors (α_i > 0) contribute, and both training and classification touch the data only through inner products ⟨x_i, x⟩; this observation is the key to the next section.
3.5 Kernels

In (3.48) the classifier depends on the data only through inner products. To obtain a richer, nonlinear classifier, map each input x to a higher-dimensional feature vector φ(x) (built, say, from x, x², x³) and run the SVM on φ(x); every inner product ⟨x, z⟩ is then replaced by ⟨φ(x), φ(z)⟩. The kernel of φ is

K(x, z) = ⟨φ(x), φ(z)⟩.   (3.49)

The point is that K(x, z) can often be computed without ever forming φ(x). For example, for x, z ∈ R^n take

K(x, z) = (x^T z)².   (3.50)

Expanding,

K(x, z) = ( Σ_{i=1}^{n} x_i z_i ) ( Σ_{j=1}^{n} x_j z_j ) = Σ_{i=1}^{n} Σ_{j=1}^{n} x_i x_j z_i z_j = Σ_{i,j=1}^{n} (x_i x_j)(z_i z_j),   (3.51)

which is exactly ⟨φ(x), φ(z)⟩ for the feature map whose components are all pairwise products x_i x_j; for n = 3,

φ(x) = [x₁x₁, x₁x₂, x₁x₃, x₂x₁, x₂x₂, x₂x₃, x₃x₁, x₃x₂, x₃x₃]^T.   (3.52)

Computing φ(x) explicitly costs O(n²) operations, while evaluating K(x, z) directly costs only O(n).
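The identity behind (3.50)–(3.52) can be checked numerically; a small sketch:

import numpy as np
from itertools import product

def phi(x):
    """Explicit feature map for K(x, z) = (x^T z)^2: all n^2 products."""
    return np.array([xi * xj for xi, xj in product(x, x)])

rng = np.random.default_rng(0)
x, z = rng.normal(size=3), rng.normal(size=3)
assert np.isclose(np.dot(x, z) ** 2,          # O(n) kernel evaluation
                  np.dot(phi(x), phi(z)))     # O(n^2) explicit inner product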
3.6 The Non-Separable Case

Real data are often not separable, and even when they are, a single outlier can swing the maximum margin hyperplane drastically, as in Figure 3.3.

Figure 3.3: One outlier can dramatically change the maximum margin separator; slack variables restore robustness.

The remedy is L1 regularization of the margin violations. The primal form acquires a slack variable ξ_i per training point:

min_{w,b} ½ ||w||² + C Σ_{i=1}^{N} ξ_i
s.t. ∀i: y_i (⟨w, x_i⟩ + b) ≥ 1 − ξ_i,  ξ_i ≥ 0.   (3.53)

A point may now have functional margin 1 − ξ_i < 1 (even negative), at a cost of C ξ_i in the objective; the constant C trades margin width against violations. The Lagrangian is

L(w, b, ξ, α, r) = ½ ⟨w, w⟩ + C Σ_{i=1}^{N} ξ_i − Σ_{i=1}^{N} α_i [ y_i (⟨w, x_i⟩ + b) − 1 + ξ_i ] − Σ_{i=1}^{N} r_i ξ_i,   (3.54)

with multipliers α_i, r_i ≥ 0. Working through the KKT conditions yields the dual complementarity relations

α_i = 0 ⇒ y_i (⟨w, x_i⟩ + b) ≥ 1,   (3.55)
α_i = C ⇒ y_i (⟨w, x_i⟩ + b) ≤ 1,   (3.56)
0 < α_i < C ⇒ y_i (⟨w, x_i⟩ + b) = 1,   (3.57)

and the L1-regularized dual

max_α W(α) = Σ_{i=1}^{N} α_i − ½ Σ_{i,j=1}^{N} y_i y_j α_i α_j ⟨x_i, x_j⟩
s.t. 0 ≤ α_i ≤ C ∀i,  Σ_{i=1}^{N} α_i y_i = 0.   (3.58)

The only change from the separable dual (3.45) is that the L1 constraint α_i ≥ 0 becomes the box constraint 0 ≤ α_i ≤ C.
3.7 Sequential Minimal Optimization

The dual (3.58) is a quadratic programming problem, and generic solvers do not scale well; sequential minimal optimization (SMO) [23] is a simple, fast, specialized algorithm. The problem is

max_α W(α) = Σ_{i=1}^{N} α_i − ½ Σ_{i,j=1}^{N} y_i y_j α_i α_j ⟨x_i, x_j⟩
s.t. ∀k: 0 ≤ α_k ≤ C,  Σ_{k=1}^{N} α_k y_k = 0.   (3.59)

One cannot apply coordinate-wise gradient ascent to a single α_i, because the equality constraint ties the variables together: fixing all but α_i forces

α_i y_i = − Σ_{j≠i} α_j y_j,   (3.60)

so α_i is already determined. SMO therefore optimizes two multipliers at a time, holding the rest fixed, until the KKT conditions hold within a tolerance tol:

Algorithm 5 SMO (outer loop)
1: repeat
2:   Select, by a heuristic, a pair of multipliers α_i and α_j.
3:   Re-optimize W(α) with respect to α_i and α_j, holding all other α_k fixed.
4: until convergence

With all other multipliers fixed, the equality constraint becomes

α_i y_i + α_j y_j = − Σ_{k≠i,j} α_k y_k = ζ,   (3.61)–(3.62)

a line in the (α_i, α_j) plane, intersected with the box [0, C] × [0, C] (Figure 3.4). The feasible segment gives bounds L ≤ α_j ≤ H:

If y_i ≠ y_j: L = max(0, α_j − α_i), H = min(C, C + α_j − α_i).
If y_i = y_j: L = max(0, α_i + α_j − C), H = min(C, α_i + α_j).

Figure 3.4: The pair (α_i, α_j) is confined to the box [0, C] × [0, C] and to the line α_i y_i + α_j y_j = ζ; the feasible values of α_j form the segment [L, H].

On the line,

α_i = (ζ − α_j y_j) y_i,   (3.63)

so W restricted to the pair is a quadratic function of α_j alone:

W(α_i, α_j, . . .) = W((ζ − α_j y_j) y_i, α_j, . . .) = a α_j² + b α_j + c.   (3.64)

Its unconstrained maximizer has a closed form:

α_j′ = α_j − y_j (E_i − E_j) / η,   (3.65)

where

E_k = f(x_k) − y_k   (3.66)

is the prediction error on point k and

η = 2 ⟨x_i, x_j⟩ − ⟨x_i, x_i⟩ − ⟨x_j, x_j⟩   (3.67)

is the (non-positive) curvature. Clipping α_j′ back to the segment [L, H] (Figure 3.4),

α_j^new = H if α_j′ > H;  α_j′ if L ≤ α_j′ ≤ H;  L if α_j′ < L,   (3.68)

and restoring the constraint fixes α_i:

α_i^new = α_i + y_i y_j (α_j^(old) − α_j^new).   (3.69)

After updating a pair, the threshold b must be refreshed so that the KKT conditions (3.55)–(3.57) hold for i and j.   (3.70)–(3.72)

If 0 < α_i < C, the margin constraint of x_i is tight and the valid threshold is

b₁ = b − E_i − y_i (α_i − α_i^(old)) ⟨x_i, x_i⟩ − y_j (α_j − α_j^(old)) ⟨x_i, x_j⟩;   (3.73)

if 0 < α_j < C, likewise

b₂ = b − E_j − y_i (α_i − α_i^(old)) ⟨x_i, x_j⟩ − y_j (α_j − α_j^(old)) ⟨x_j, x_j⟩.   (3.74)

If both multipliers sit at the bounds (α_i = 0 or C and α_j = 0 or C), any value between b₁ and b₂ is consistent, and their average is used:

b = b₁ if 0 < α_i < C;  b₂ if 0 < α_j < C;  (b₁ + b₂)/2 otherwise.   (3.75)
The complete procedure of Section 3.7, with the heuristic choice of the pair (i, j) simplified to a random choice of j (the full heuristics are given in [23]), is as follows.

Algorithm: Simplified SMO [23]
Require: C, tol, max_passes, Z = {(x_i, y_i)}_{i=1}^{N}
Ensure: α, b
1: α_i = 0 for all i, b = 0
2: passes = 0
3: while passes < max_passes do
4:   num_changed = 0
5:   for i = 1, 2, . . . , N do
6:     E_i = f(x_i) − y_i
7:     if (y_i E_i < −tol and α_i < C) or (y_i E_i > tol and α_i > 0) then
8:       pick j ≠ i at random
9:       E_j = f(x_j) − y_j
10:      compute the bounds L and H
11:      if L = H then
12:        continue
13:      end if
14:      compute η by (3.67)
15:      if η ≥ 0 then
16:        continue
17:      end if
18:      update α_j by (3.65) and clip it by (3.68)
19:      if |α_j − α_j^(old)| < 10⁻⁵ then
20:        continue
21:      end if
22:      update α_i by (3.69)
23:      compute b₁ and b₂ and update b by (3.73)–(3.75)
24:      num_changed = num_changed + 1
25:    end if
26:  end for
27:  if num_changed = 0 then
28:    passes = passes + 1
29:  else
30:    passes = 0
31:  end if
32: end while
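The core of the inner loop, the update of one pair (α_i, α_j) with clipping (3.63)–(3.69) and the threshold update (3.73)–(3.75), in a compact sketch; this follows the simplified variant above (linear kernel) rather than Platt's full heuristics.

import numpy as np

def smo_pair_update(i, j, alpha, b, X, y, C):
    """One SMO step on (alpha_i, alpha_j); returns (alpha, b) or None."""
    f = lambda k: (alpha * y) @ (X @ X[k]) + b          # f(x_k), linear kernel
    Ei, Ej = f(i) - y[i], f(j) - y[j]                   # errors (3.66)
    if y[i] != y[j]:                                    # box bounds L, H
        L, H = max(0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    eta = 2 * X[i] @ X[j] - X[i] @ X[i] - X[j] @ X[j]   # curvature (3.67)
    if L == H or eta >= 0:
        return None
    ai_old, aj_old = alpha[i], alpha[j]
    alpha[j] = np.clip(aj_old - y[j] * (Ei - Ej) / eta, L, H)   # (3.65)+(3.68)
    alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])       # (3.69)
    # Threshold update (3.73)-(3.75).
    b1 = b - Ei - y[i]*(alpha[i]-ai_old)*(X[i]@X[i]) - y[j]*(alpha[j]-aj_old)*(X[i]@X[j])
    b2 = b - Ej - y[i]*(alpha[i]-ai_old)*(X[i]@X[j]) - y[j]*(alpha[j]-aj_old)*(X[j]@X[j])
    if 0 < alpha[i] < C:
        b = b1
    elif 0 < alpha[j] < C:
        b = b2
    else:
        b = (b1 + b2) / 2
    return alpha, b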
3.8 Summary

This chapter derived the support vector machine: the maximum margin primal problem, Lagrange duality and the KKT conditions, the dual problem, kernels, the soft margin extension, and the SMO training algorithm. Chapter 4 extends the same ideas to structured outputs.
Chapter 4  Structural Support Vector Machines

4.1 Introduction

The support vector machine of Chapter 3 is a binary classifier. For multiclass classification with n classes, the standard recipes combine several binary machines: one-versus-one trains one classifier for each of the n(n−1)/2 pairs of classes and lets them vote, while one-versus-all trains n classifiers, each separating one class from the rest.

Many problems, however, have outputs far richer than a class label. In parsing, the output for a sentence is a parse tree; in the "statistical school" view of Chapter 1, we want a model that directly scores complete structured outputs given the input, rather than a generative "Bayesian school" model of how outputs produce inputs. The structural support vector machine (SVM-struct) [15] learns a function f mapping an input x ∈ X to an output y ∈ Y, where y need not lie in {−1, 1} or {1, 2, . . . , k}: Y may contain sequences, strings, trees, or graphs. In this thesis, f maps an utterance x to its phone sequence y. The training set consists of matched pairs

Z = {(x_i, y_i)}_{i=1}^{n},   (4.1)

with each x_i an input and y_i its correct structured output.
4.2 Primal Form and Dual Form

A multiclass linear classifier keeps one weight vector per class: with 9 classes, for example, one stores w₁ through w₉ and predicts the class whose score ⟨w_k, x⟩ + b_k is largest, as in Figure 4.1.

Figure 4.1: A multiclass linear classifier keeps one score function ⟨w_k, x⟩ + b_k per class; each region of the input space is the set of points one class wins.

For structured outputs, Y is far too large to keep one weight vector per y. Instead, all class weight vectors are conceptually stacked into a single vector w, and each input-output pair is described by a joint feature map Ψ(x, y) that extracts features of x jointly with (parts of) y, in the same spirit as the feature functions of conditional random fields (CRF). The prediction and the discriminant function F: X × Y → R are

f(x) = arg max_{y∈Y} F(x, y; w),   (4.2)
F(x, y; w) = ⟨w, Ψ(x, y)⟩.   (4.3)

Figures 4.2 and 4.3 illustrate what training should achieve: under a good w, for every training pair (x_i, y_i) the correct output scores highest, F(x_i, y_i; w) ≥ F(x_i, y; w) for all y ≠ y_i; under a poor w, some wrong y outscores y_i.

Figure 4.2: A good w: the correct output y_i attains the highest discriminant value for x_i.

Figure 4.3: A poor w: some incorrect output outscores the correct one.

(The design of Ψ is necessarily task-specific; in line with the no free lunch theorem, no single feature map serves every problem. Section 5.1.5 gives the map used for phone recognition.)

Define the margin of example i as the gap between the correct output's score and the best wrong output's score:

γ_i = F(x_i, y_i; w) − max_{y∈Y\{y_i}} F(x_i, y; w).   (4.4)–(4.5)

Correct classification of the training set means every margin is positive:

max_{y∈Y\{y_i}} ⟨w, Ψ(x_i, y)⟩ < ⟨w, Ψ(x_i, y_i)⟩, ∀i.   (4.6)

Writing δΨ_i(y) := Ψ(x_i, y_i) − Ψ(x_i, y) and normalizing the functional margin to 1 exactly as in Chapter 3 (each maximum over y is equivalent to one constraint per y), this becomes

⟨w, δΨ_i(y)⟩ ≥ 1, ∀i, ∀y ∈ Y\{y_i},   (4.7)

and maximizing the geometric margin subject to (4.7) is

min_w ½ ||w||².   (4.8)

Adding slack variables for the non-separable case, with the convention of [15] that the penalty is averaged over the n examples:

min_{w,ξ} ½ ||w||² + (C/n) Σ_{i=1}^{n} ξ_i
s.t. ∀i: ξ_i ≥ 0;  ∀i, ∀y ∈ Y\{y_i}: ⟨w, δΨ_i(y)⟩ ≥ 1 − ξ_i.   (4.9)

The 0-1 loss implicit in (4.9) does not fit structured outputs: some wrong y are much worse than others, so the loss should be a function Δ(y, ŷ). For a distribution P(x, y) over X × Y, the true risk of f is

R_P(f) = ∫_{X×Y} Δ(y, f(x)) dP(x, y),   (4.10)

and since P(x, y) is unknown, one works with the empirical risk on the sample:

R_S(f) = (1/n) Σ_{i=1}^{n} Δ(y_i, f(x_i)).   (4.11)

Two ways of folding Δ(y_i, y) into (4.9) are proposed in [15]. Slack rescaling divides the slack variable by the loss, so that outputs with large loss tolerate less violation:

min_{w,ξ} ½ ||w||² + (C/n) Σ_i ξ_i
s.t. ∀i: ξ_i ≥ 0;  ∀i, ∀y ∈ Y\{y_i}: ⟨w, δΨ_i(y)⟩ ≥ 1 − ξ_i / Δ(y_i, y).   (4.12)

Margin rescaling instead demands a margin proportional to the loss:

min_{w,ξ} ½ ||w||² + (C/n) Σ_i ξ_i
s.t. ∀i: ξ_i ≥ 0;  ∀i, ∀y ∈ Y\{y_i}: ⟨w, δΨ_i(y)⟩ ≥ Δ(y_i, y) − ξ_i.   (4.13)

Either way there are n(|Y| − 1) constraints, and |Y| is typically of exponential order in the input length (for phone recognition, the number of label sequences grows exponentially with the number of frames). The dual has one Lagrange multiplier α_iy per training example and wrong output:

max_α Σ_{i, y≠y_i} α_iy − ½ Σ_{i,y} Σ_{j,ŷ} α_iy α_jŷ ⟨δΨ_i(y), δΨ_j(ŷ)⟩
s.t. ∀i, ∀y ∈ Y\{y_i}: α_iy ≥ 0.   (4.14)

The multiplier α_iy measures how strongly the pair (x_i, y) constrains the solution, but the dual is just as large as the primal; the next section shows how to solve such problems without ever writing down more than a small fraction of the constraints.
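For the multiclass special case, Ψ(x, y) is simply x placed in the block belonging to class y, and F(x, y; w) = ⟨w, Ψ(x, y)⟩ recovers the per-class scores of Figure 4.1; a small sketch (class count and dimensions arbitrary, bias terms omitted):

import numpy as np

def psi(x, y, n_classes):
    """Joint feature map: x copied into the y-th block of a long vector."""
    out = np.zeros(n_classes * len(x))
    out[y * len(x):(y + 1) * len(x)] = x
    return out

def predict(w, x, n_classes):
    """f(x) = argmax_y <w, Psi(x, y)>, i.e. argmax over per-class scores."""
    scores = [w @ psi(x, y, n_classes) for y in range(n_classes)]
    return int(np.argmax(scores))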
4.3 The Cutting Plane Method

The idea is approximation: the cutting plane method solves an optimization problem with a huge set of linear inequality constraints by working with a small subset of them, adding constraints ("cuts") only as they are found to be violated; each cut removes part of the current feasible set without removing any point of the true one.

The classic illustration is integer linear programming. Consider

max x₁ + 5x₂
s.t. x₁ + 10x₂ ≤ 20
     x₁ ≤ 2
     x₁, x₂ ≥ 0
     x₁, x₂ ∈ Z.   (4.15)

The naive approach drops the integrality constraint, giving the linear programming relaxation

max x₁ + 5x₂
s.t. x₁ + 10x₂ ≤ 20,  x₁ ≤ 2,  x₁, x₂ ≥ 0,   (4.16)

whose optimum is the fractional point (2, 1.8) (Figure 4.4). Rounding up to (2, 2) is infeasible; rounding down to (2, 1) is feasible but gives objective value 7, whereas the true integer optimum is (0, 2) with value 10. Now add the cut

x₁ + 2x₂ ≤ 4,   (4.17)

which every integral feasible point satisfies (it cuts off only fractional points such as (2, 1.8)). Solving the relaxation with the cut added (Figure 4.5) yields the optimum (0, 2), which is integral and hence optimal for (4.15): a handful of constraints sufficed, and they were discovered one violation at a time.

Figure 4.4: The relaxed feasible region of (4.16), bounded by x₁ + 10x₂ = 20 and x₁ = 2; its optimum (2, 1.8) is not integral.

Figure 4.5: After adding the cut x₁ + 2x₂ = 4, the optimum of the relaxation is the integral point (0, 2).
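The two relaxations (4.16) and (4.17) can be solved with an off-the-shelf LP solver; a sketch using scipy.optimize.linprog, which minimizes, so the objective is negated:

from scipy.optimize import linprog

c = [-1, -5]                                  # maximize x1 + 5 x2
base = [[1, 10], [1, 0]]                      # x1 + 10 x2 <= 20, x1 <= 2
r1 = linprog(c, A_ub=base, b_ub=[20, 2], bounds=[(0, None)] * 2)
print(r1.x)                                   # -> [2.  1.8], fractional

cut = base + [[1, 2]]                         # add the cut x1 + 2 x2 <= 4
r2 = linprog(c, A_ub=cut, b_ub=[20, 2, 4], bounds=[(0, None)] * 2)
print(r2.x)                                   # -> [0. 2.], integral, optimal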
The same principle applies to convex optimization with many constraints, as in Figure 4.6: at the optimum only a few constraints are active, and if those could be guessed in advance, solving the problem with only the active constraints would give the same answer. The cutting plane method discovers them iteratively.

Figure 4.6: In a convex problem with many constraints, only a few are active at the optimum.

Tsochantaridis et al. [24] prove that this strategy works for the structural SVM: provided the feature differences are bounded, i.e. ||Ψ(x, y) − Ψ(x, ŷ)|| is bounded over y, ŷ ∈ Y, Algorithm 6 terminates after adding a number of constraints that is polynomial in 1/ε and independent of |Y|, and its solution satisfies all the exponentially many constraints of (4.12) or (4.13) up to precision ε.

Algorithm 6 proceeds example by example (Figure 4.7). For example i it sets up a cost function H(y) measuring how badly output y violates its constraint under the current w; for margin rescaling (4.13), H(y) = Δ(y_i, y) − ⟨w, δΨ_i(y)⟩. It then finds the most violated constraint ŷ = argmax_y H(y), compares H(ŷ) with the current slack ξ_i (the largest violation among the constraints already in the working set S_i), and if ŷ is worse by more than ε, adds ŷ to S_i and re-solves the quadratic program over the union of all working sets.

Algorithm 6 SVM-struct
1: S_i ← ∅ for all i = 1, . . . , n
2: repeat
3:   for i = 1, 2, . . . , n do
4:     set up the cost function H(y) (e.g. H(y) = Δ(y_i, y) − ⟨w, δΨ_i(y)⟩ for margin rescaling)
5:     ŷ = argmax_{y∈Y} H(y)
6:     ξ_i = max{0, max_{y∈S_i} H(y)}
7:     if H(ŷ) > ξ_i + ε then
8:       S_i ← S_i ∪ {ŷ}
9:       S ← ∪_i S_i
10:      solve the quadratic program restricted to the constraints in S and update w
11:    end if
12:  end for
13: until no S_i changes

Figure 4.7: One pass of Algorithm 6: find the most violated constraint, test it against the current slack, add it to the working set, and re-optimize.

Step 5 requires maximizing H(y) over the whole output space; this is the one task-specific component, and it must be solved by a task-specific algorithm (for label sequences, by dynamic programming, as in Section 5.1.5).
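A skeleton of Algorithm 6 in code form; solve_qp, most_violated, and loss are placeholders for the quadratic program over the working set, the loss-augmented argmax of line 5, and Δ, so this is a sketch of the control flow only, assuming margin rescaling.

def svm_struct(data, psi, most_violated, loss, solve_qp, C, eps):
    """Cutting plane training: grow the working sets S_i until no
    constraint is violated by more than eps (Algorithm 6)."""
    S = [[] for _ in data]          # one working set per training example
    w = None
    changed = True
    while changed:
        changed = False
        for i, (x, y) in enumerate(data):
            y_hat = most_violated(w, x, y)       # argmax of Delta - <w, dPsi>
            def H(yb):                           # margin-rescaled violation
                return loss(y, yb) - (0 if w is None else
                                      w @ (psi(x, y) - psi(x, yb)))
            xi = max([0.0] + [H(yb) for yb in S[i]])
            if H(y_hat) > xi + eps:
                S[i].append(y_hat)               # add the cut
                w = solve_qp(data, psi, loss, S, C)   # re-optimize over S
                changed = True
    return w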
4.4 Discussion

The structural SVM applies the large margin principle to the same kind of models that the conditional random field trains by maximum conditional likelihood: both are discriminative models in the sense of Minka [25], who distinguishes discriminative models from mere discriminative training of generative models and analyzes both in terms of Bayes risk. In the vocabulary of Chapter 1, both belong to the "statistical school" rather than the "Bayesian school": the CRF stands to the structural SVM roughly as logistic regression stands to the support vector machine, while linear discriminant analysis, a generative method, is the classical counterpart on the other side.

In speech recognition, the large margin idea has also been applied while keeping the generative HMM form: large margin HMMs [26] impose margin constraints on the HMM parameters, with approximations to keep training tractable, and were evaluated on the TI Digit task. The structural SVM differs in that the model itself is the linear discriminant ⟨w, Ψ(x, y)⟩.
4.5 Summary

To instantiate the structural SVM on a task, one must supply three ingredients: a joint feature map Ψ(x, y), an efficient algorithm for the argmax in decoding and in finding the most violated constraint, and a loss function Δ. The framework has been applied to parsing and to information retrieval [17, 18]; the next chapter instantiates it for phone recognition.
Chapter 5  Experiments

5.1 Experimental Setup

5.1.1 The TIMIT Corpus

All experiments use the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus (TIMIT), the standard benchmark for phone recognition. TIMIT was sponsored by the Defense Advanced Research Projects Agency (DARPA) and designed jointly by Texas Instruments (TI), the Massachusetts Institute of Technology (MIT), and the Stanford Research Institute (SRI); its name combines the first two (TI + MIT = TIMIT).

The corpus contains 6300 utterances: 630 speakers from eight dialect regions of American English, each reading 10 sentences. Table 5.1 gives the speaker distribution. The 10 sentences per speaker come in three types:

Dialect sentences (SA): 2 sentences read by every speaker, designed to expose dialectal variation.

Phonetically balanced sentences (SX): 5 sentences per speaker, drawn from a set of 450 phonetically balanced sentences, each sentence read by several speakers.

Diverse sentences (SI): 3 sentences per speaker, selected from existing text such as the Brown Corpus, 1890 sentences in total, each read by exactly one speaker (630 × 3 = 1890).

Following [27], the SA sentences are excluded from both training and testing: because every speaker reads the same two SA sentences, including them would bias the phone contexts and inflate the results. The training set then contains 3696 utterances (462 speakers × 8 sentences), and results are reported on the core test set of 192 utterances (24 speakers disjoint from the training speakers, each contributing 5 SX and 3 SI sentences).

Table 5.1: Speaker distribution over the eight dialect regions of TIMIT.

Dialect region        Male        Female      Total
dr1 (New England)     31 (63%)    18 (37%)    49 (8%)
dr2 (Northern)        71 (70%)    31 (30%)    102 (16%)
dr3 (North Midland)   79 (77%)    23 (23%)    102 (16%)
dr4 (South Midland)   69 (69%)    31 (31%)    100 (16%)
dr5 (Southern)        62 (63%)    36 (37%)    98 (16%)
dr6 (New York City)   30 (65%)    16 (35%)    46 (7%)
dr7 (Western)         74 (74%)    26 (26%)    100 (16%)
dr8 (Army Brat)       22 (67%)    11 (33%)    33 (5%)
Total                 438 (70%)   192 (30%)   630 (100%)

TIMIT is transcribed, with time alignments, using 64 phone labels (Table 5.2). For phone recognition it is customary to follow Lee and Hon [27]: the glottal stops are removed, and the 64 labels are folded into 48 phone classes for acoustic modeling (Table 5.4).
Table 5.2: The 64 phone labels of the TIMIT transcriptions, with example words.

Stops and flaps: b (bee), d (day), g (gay), p (pea), t (tea), k (key), dx (muddy, dirty), q (bat).
Affricates: jh (joke), ch (choke).
Fricatives: s (sea), sh (she), z (zone), zh (azure), f (fin), th (thin), v (van), dh (then).
Nasals: m (mom), n (noon), ng (sing), em (bottom), en (button), eng (washington), nx (winner).
Semivowels and glides: l (lay), r (ray), w (way), y (yacht), hh (hay), hv (ahead), el (bottle).
Vowels: iy (beet), ih (bit), eh (bet), ey (bait), ae (bat), aa (bott), aw (bout), ay (bite), ah (but), ao (bought), oy (boy), ow (boat), uh (book), uw (boot), ux (toot), er (bird), ax (about), ix (debit), axr (butter), ax-h (suspect).
Others: pau (pause), epi (epenthetic silence), h#, #h (utterance boundary silence).
Closures: pcl, tcl, kcl, qcl (unvoiced closures), bcl, dcl, gcl (voiced closures).

After decoding with the 48 classes, a further standard merge reduces them to 39 classes for scoring (Table 5.3): within each group in the table, confusions are not counted as errors.

Table 5.3: Phone groups merged for scoring, reducing the 48 classes to 39.

aa/ao;  ah/ax;  ih/ix;  l/el;  n/en;  sh/zh;  sil/cl/vcl/epi.

All phone recognition results in this thesis are computed after this mapping to 39 classes.
5.1.2 Acoustic Features

Two standard front ends are used; Figure 5.1 sketches both pipelines.

MFCC (Figure 5.1a). The waveform is analyzed with a 25 ms window shifted every 10 ms; each frame passes through a mel-scaled filter bank of 40 filters spanning 64 Hz to 8 kHz, followed by a logarithm and a cosine transform. The first 13 cepstral coefficients are kept, and their first and second time derivatives are appended, giving 39 dimensions per frame. Cepstral mean subtraction (CMS) [30] is applied.

PLP (Figure 5.1b). Perceptual linear prediction [29] applies pre-emphasis in the time domain, perceptually motivated processing in the frequency domain, and an all-pole fit yielding autoregressive coefficients; 13 cepstral coefficients plus their derivatives again give 39 dimensions.

Figure 5.1: The feature extraction pipelines: (a) MFCC; (b) PLP.

Table 5.4: The 48 phone classes used for acoustic modeling [27], with example words; labels of the 64-phone set folded into a class are listed after it.

iy (beat); ih (bit); eh (bet); ae (bat); ix (roses); ax (the; also ax-h); ah (butt); uw (boot; also ux); uh (book); ao (bought); aa (cot); ey (bait); ay (bite); oy (boy); aw (bough); ow (boat); er (bird; also axr); el (bottle); l (led); r (red); w (wet); y (yet); m (mom; also em); n (non; also nx); en (button); ng (sing; also eng); ch (church); jh (judge); dh (they); b (bob); d (dad); dx (butter); g (gag); p (pop); t (tot); k (kick); z (zoo); zh (measure); v (very); f (fief); th (thief); s (sis); sh (shoe); hh (hay; also hv); cl (unvoiced closure: pcl, tcl, kcl, qcl); vcl (voiced closure: bcl, dcl, gcl); epi (epenthetic closure); sil (silence: h#, #h, pau).
5.1.3 Software Tools

The HMM systems are built with the Hidden Markov Model Toolkit (HTK) [31]; the multi-layered perceptrons are trained with QuickNet [32]; the structural support vector machines are trained with the SVMstruct package [33].
5.1.4 Baseline Systems

The baselines are hidden Markov model (HMM) recognizers, plain and in the Tandem system configuration. Evaluation is free phone decoding on TIMIT: the recognizer outputs an unconstrained phone sequence, and performance is measured by phone accuracy, defined in Section 5.1.6.

HMM baseline. The 64 TIMIT labels are folded to the 48 classes of Table 5.4, and one HMM per class is trained with HTK on the 3696 training utterances; decoding likewise uses the 48 classes, mapped to 39 for scoring.

MLP. The perceptron has one hidden layer of 1000 units. Its input is a 351-dimensional window of 9 consecutive frames of 39-dimensional features (39 × 9 = 351); its output layer has 48 units, one per phone class, trained to estimate the frame's phone posterior probabilities.

Tandem features. The 48 log posteriors are decorrelated by principal component analysis (PCA); keeping the leading components that cover 95% of the total eigenvalue mass retains 37 dimensions.
5.1.5 Structural SVM for Phone Recognition

To instantiate the framework of Chapter 4, the key design decision is the joint feature map Ψ(x, y). The map used here mirrors the two ingredients of HMM decoding, the transition scores a_ij and the emission scores b_j(o_t), and is most easily motivated through the conditional random field (CRF). For a discrete observation alphabet O_w and label set S, a linear-chain CRF factorizes as

p(y | x) = (1/Z) exp( Σ_t Σ_{i,j∈S} λ_ij δ(y_t = i) δ(y_{t−1} = j) + Σ_t Σ_{i∈S} Σ_{o∈O_w} μ_oi δ(y_t = i) δ(x_t = o) )
         = (1/Z) exp( Σ_t Σ_k λ_k f_k(y_t, y_{t−1}, x_t) ),   (5.1)

a log-linear model over feature functions of the label pair and the (observable) frame. The hidden Markov model is the special case

λ_ij = log a_ji = log p(y_t = i | y_{t−1} = j),  μ_oi = log b_i(o) = log p(x_t = o | y_t = i),  Z = 1.

The structural SVM keeps exactly this linear form, ⟨w, Ψ(x, y)⟩, but learns w by the large margin criterion instead of conditional likelihood. The feature map collects the sufficient statistics multiplying each weight. With the tensor product

⊗: R^D × R^K → R^{DK},  (a ⊗ b)_{i+(j−1)D} = a_i b_j,   (5.2)

and Λ_c(y) the indicator (one-hot) vector of label y, define

Ψ(x, y) = [ Σ_{t=1}^{T} φ(x_t) ⊗ Λ_c(y_t) ;  η Σ_{t=1}^{T−1} Λ_c(y_t) ⊗ Λ_c(y_{t+1}) ],   (5.3)–(5.4)

where φ(x_t) is the feature vector of frame t (here simply φ(x_t) = x_t, the MFCC, PLP, or Tandem features) and η balances the emission block against the transition block. The first block accumulates, per label, the sum of the frames assigned to it (the emission counts); the second block accumulates the transition counts.

Figure 5.2 traces a small example with three labels {A, B, C}, two-dimensional frames, and label sequence A B B A (with A indexed 0, B indexed 1, C indexed 2, η = 1, and φ(x_t) = x_t). The transition counts and per-label feature sums are

A = [0 1 0; 1 1 0; 0 0 0],   B = [3.2 5; 3.7 6.4; 0 0],   (5.5)

since the sequence contains one A→B, one B→B, and one B→A transition, and row j of B is the sum of the frames labeled j. Flattening A and B and concatenating gives

Ψ(x, y) = (0 1 0 1 1 0 0 0 0 3.2 5 3.7 6.4 0 0).   (5.6)

Figure 5.2: Computing Ψ(x, y) for a three-label example: transition counts and per-label emission sums.

The weight vector w has the same length and block structure, so each component of w plays the role of a log transition or log emission parameter, as sketched in Figure 5.3.

Figure 5.3: The blocks of w correspond one-to-one to the transition and emission parameters of an HMM.

Because Ψ decomposes over frames and label bigrams, decoding, argmax_y ⟨w, Ψ(x, y)⟩, is solved exactly by the Viterbi algorithm. The loss between two label sequences is the frame (Hamming) loss

Δ(y, ŷ) = Σ_{t=1}^{T} δ(y_t ≠ ŷ_t),   (5.7)

which also decomposes over frames, so the search for the most violated constraint in Algorithm 6 is again a Viterbi search, with the per-frame loss added to the local scores; the constraints already selected remain in the working set as active constraints. With margin rescaling, the objective minimizes an upper bound on the empirical risk (4.11). As in the baseline, the 64 TIMIT labels are folded to 48 classes, and the frame-level reference labels y_t are read off TIMIT's time-aligned transcriptions.
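The example of Figure 5.2 and equations (5.5)–(5.6) can be reproduced directly; the four frame vectors below are hypothetical values chosen to give the sums in (5.5), since the original frames are not recoverable.

import numpy as np

labels = ['A', 'B', 'C']

def psi(x, y, eta=1.0):
    """Joint feature map (5.3)-(5.4): transition counts plus per-label
    emission sums, flattened into one vector (transition block first,
    matching (5.6))."""
    K, D = len(labels), x.shape[1]
    trans = np.zeros((K, K))
    emit = np.zeros((K, D))
    for t in range(len(y)):
        emit[y[t]] += x[t]                 # phi(x_t) (x) Lambda(y_t)
        if t + 1 < len(y):
            trans[y[t], y[t + 1]] += 1     # Lambda(y_t) (x) Lambda(y_{t+1})
    return np.concatenate([eta * trans.ravel(), emit.ravel()])

# Hypothetical frames consistent with (5.5); label sequence A B B A.
x = np.array([[1.2, 2.0], [1.7, 3.0], [2.0, 3.4], [2.0, 3.0]])
y = [0, 1, 1, 0]
print(psi(x, y))
# [0. 1. 0. 1. 1. 0. 0. 0. 0. 3.2 5. 3.7 6.4 0. 0.]  as in (5.6)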
5.1.6 Evaluation Metric

Phone recognition performance is measured by the phone correctness and accuracy

H = N − D − S,   (5.8)
Corr = H / N × 100%,   (5.9)
Acc = (H − I) / N × 100%,   (5.10)

where H (Hit) is the number of reference phones recognized correctly, D (Deletion) the number deleted, S (Substitution) the number substituted, I (Insertion) the number of spurious phones inserted, and N (Number) the number of phones in the reference transcription; the same definitions are used with words for word recognition. D, S, and I are determined by aligning the recognized sequence with the reference by edit distance, computed with the dynamic programming recursion of Algorithm 7.

Algorithm 7 Edit distance between a reference of length m and a hypothesis of length n
1: for i = 0, 1, . . . , m do
2:   d[i, 0] ← i
3: end for
4: for j = 0, 1, . . . , n do
5:   d[0, j] ← j
6: end for
7: for i = 1, 2, . . . , m do
8:   for j = 1, 2, . . . , n do
9:     if ref[i] = hyp[j] then
10:      cost ← 0
11:    else
12:      cost ← 1
13:    end if
14:    d[i, j] ← min( d[i−1, j] + 1, d[i, j−1] + 1, d[i−1, j−1] + cost )
15:  end for
16: end for
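Phone accuracy then follows from the alignment of Algorithm 7 with a backtrace counting hits, substitutions, deletions, and insertions; a sketch:

def phone_accuracy(ref, hyp):
    """Align hyp against ref by edit distance and return (Corr, Acc)
    as defined in (5.8)-(5.10)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1): d[i][0] = i            # deletions
    for j in range(n + 1): d[0][j] = j            # insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i-1] == hyp[j-1] else 1
            d[i][j] = min(d[i-1][j] + 1, d[i][j-1] + 1, d[i-1][j-1] + cost)
    # Backtrack to count D, S, I (ties broken arbitrarily).
    i, j, D, S, I = m, n, 0, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i-1][j-1] + (ref[i-1] != hyp[j-1]):
            S += ref[i-1] != hyp[j-1]; i -= 1; j -= 1
        elif i > 0 and d[i][j] == d[i-1][j] + 1:
            D += 1; i -= 1
        else:
            I += 1; j -= 1
    H = m - D - S                                  # (5.8)
    return 100.0 * H / m, 100.0 * (H - I) / m      # Corr (5.9), Acc (5.10)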
5.2 Experimental Results

5.2.1 Results and Discussion

Six feature configurations are compared throughout: raw MFCC and PLP; the Tandem features MLP-MFCC and MLP-PLP (the 48 log posteriors of the perceptron trained on MFCC or PLP input); and PCA-37-MLP-MFCC and PCA-37-MLP-PLP (the same posteriors after the 37-dimensional PCA of Section 5.1.4).

Baseline results are given in Table 5.5 and plotted in Figure 5.4, for two initialization schemes: Flat Start, where HTK initializes all models identically from global statistics, and Init By Label, where TIMIT's time-aligned labels provide the initial alignment (as in forced alignment). Init By Label is consistently slightly better than Flat Start. The raw features give 62–63% phone accuracy, the Tandem features 68–69%, and the PCA step adds about another 1%, to just over 70%.

Figure 5.4: Phone accuracy of the HMM baselines under the six feature configurations.

Table 5.5: Phone accuracy (%) of the HMM baselines.

Features           HMM (Flat Start)   HMM (Init By Label)
MFCC               62.47              63.26
PLP                62.69              62.91
MLP-MFCC           68.77              69.26
MLP-PLP            69.25              69.50
PCA-37-MLP-MFCC    70.19              70.30
PCA-37-MLP-PLP     70.26              70.42
MFCC and PLP behave almost identically, both raw and through the MLP, with PLP marginally ahead in most Tandem configurations.

The structural SVM of Section 5.1.5 was trained with the first-order Markov feature map Ψ(x, y) of (5.3) under several values of the slack penalty C; Table 5.6 and Figure 5.5 give the results. Accuracy rises steadily with C over the range tried. With the raw MFCC and PLP features, the structural SVM reaches only about 51% even at C = 1000, more than 11% below the corresponding HMM baselines of about 63%: a single linear function of the raw cepstra cannot reproduce what the Gaussian mixture emission densities of the HMM compute. With the Tandem features the picture reverses: already at C = 1 the model reaches about 57%, and at C = 1000 the best configurations reach 71.71–71.75%, about 1% above the best HMM baseline (70.42%).

Figure 5.5: Phone accuracy of the structural SVM (first-order features) as a function of C.

Table 5.6: Phone accuracy (%) of the structural SVM with first-order features.

C      MFCC    PLP     MLP-MFCC  MLP-PLP  PCA-37-MLP-MFCC  PCA-37-MLP-PLP
1      38.87   39.08   57.40     57.57    56.71            56.79
10     46.55   44.81   67.74     67.65    67.13            67.20
100    49.74   49.65   70.86     70.89    70.18            70.19
1000   51.29   51.22   71.71     71.75    71.29            71.32
The PCA columns tell a consistent story: PCA helps the HMM, whose diagonal-covariance Gaussians need decorrelated inputs, but slightly hurts the structural SVM, which as a discriminatively trained linear model gains nothing from decorrelation and loses a little information in the dimensionality reduction.

Finally, the feature map was extended to a second-order Markov model, in which the transition block of Ψ(x, y) counts label trigrams instead of bigrams. Because the feature dimension and the decoding cost grow sharply, these runs were completed only for C = 1 and C = 10; Tables 5.7 and 5.8 (with Figures 5.6 and 5.7) report the two second-order configurations evaluated. At C = 10 the best second-order system already reaches 70.07% (MLP-PLP, Table 5.8), close to what the first-order system needed C = 100 to achieve, but within the range of C that could be run it does not surpass the first-order results.

Figure 5.6: Phone accuracy of the first second-order structural SVM configuration.

Table 5.7: Phone accuracy (%) of the structural SVM with second-order features, first configuration.

C     MFCC    PLP     MLP-MFCC  MLP-PLP  PCA-37-MLP-MFCC  PCA-37-MLP-PLP
1     38.77   37.03   57.54     57.33    56.35            56.07
10    46.13   44.71   67.73     67.48    67.19            66.98

Figure 5.7: Phone accuracy of the second second-order structural SVM configuration.

Table 5.8: Phone accuracy (%) of the structural SVM with second-order features, second configuration.

C     MFCC    PLP     MLP-MFCC  MLP-PLP  PCA-37-MLP-MFCC  PCA-37-MLP-PLP
1     39.03   39.19   64.43     64.25    63.65            63.75
10    46.38   44.61   69.94     70.07    69.84            69.91
Figure 5.8: Summary comparison of the HMM baselines and the structural SVM systems.

5.3 Summary

On TIMIT phone recognition, the structural SVM is clearly worse than the HMM on raw cepstral features, where its linear discriminant cannot match Gaussian mixture emissions, but with Tandem features it outperforms the best HMM baseline by about 1% in phone accuracy.
Chapter 6  Conclusion and Future Work

6.1 Conclusion

This thesis applied the structural support vector machine to phone recognition and compared it with hidden Markov models on TIMIT under matched features. Three observations stand out. First, with Tandem features the structural SVM outperforms the best HMM baseline by about 1% in phone accuracy. Second, with raw MFCC or PLP features the linear structural SVM falls far behind the HMM, whose Gaussian mixtures are in effect a far richer emission model, so the quality of the frame-level features is decisive. Third, the PCA step that helps the HMM is unnecessary for the structural SVM.

6.2 Future Work

A natural next step is a controlled comparison with the conditional random field (CRF), the conditional-likelihood counterpart of the same linear model, and with discriminative training of conventional hidden Markov models on the same task.
References

[1] Lawrence R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1990.

[2] J. K. Baker, "The DRAGON system - an overview," IEEE Trans. Acoust., Speech, Signal Process., 1975, pp. 24-29.

[3] L. Bahl, P. Brown, P. de Souza, and R. Mercer, "Maximum mutual information estimation of hidden Markov model parameters for speech recognition," in ICASSP 1986, 1986.

[4] B.-H. Juang, W. Chou, and C.-H. Lee, "Minimum classification error rate methods for speech recognition," IEEE Transactions on Speech and Audio Processing, 1997.

[5] D. Povey and P. C. Woodland, "Minimum phone error and I-smoothing for improved discriminative training," in ICASSP 2002, 2002.

[6] J. Zheng and A. Stolcke, "Improved discriminative training using phone lattices," in Interspeech 2005, 2005.

[7] J. Du, P. Liu, F. K. Soong, J.-L. Zhou, and R.-H. Wang, "Minimum divergence based discriminative training," in Interspeech 2006, 2006.

[8] Daniel Jurafsky and James H. Martin, Speech and Language Processing, Pearson Education Taiwan Ltd., 2005.

[9] Leonard E. Baum and Ted Petrie, "Statistical inference for probabilistic functions of finite state Markov chains," The Annals of Mathematical Statistics, vol. 37, no. 6, pp. 1554-1563, 1966.

[10] Hynek Hermansky, Daniel P. W. Ellis, and Sangita Sharma, "Tandem connectionist feature extraction for conventional HMM systems," in Proc. ICASSP, 2000, pp. 1635-1638.

[11] Eric Fosler-Lussier and Jeremy Morris, "CRANDEM systems: Conditional random field acoustic models for hidden Markov models," in Proc. ICASSP, 2008, pp. 4049-4052.

[12] Asela Gunawardana, Milind Mahajan, Alex Acero, and John C. Platt, "Hidden conditional random fields for phone classification," in Interspeech, 2005, pp. 1117-1120.

[13] Yun-Hsuan Sung, Constantinos Boulis, Christopher Manning, and Dan Jurafsky, "Regularization, adaptation, and non-independent features improve hidden conditional random fields for phone classification," in IEEE ASRU 2007, 2007, pp. 639-642.

[14] J. Morris and E. Fosler-Lussier, "Discriminative phonetic recognition with conditional random fields," in HLT-NAACL Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing, 2006.

[15] Ioannis Tsochantaridis, Thomas Hofmann, Thorsten Joachims, and Yasemin Altun, "Support vector machine learning for interdependent and structured output spaces," in ICML '04, New York, NY, USA, 2004, p. 104, ACM.

[16] Yasemin Altun, Ioannis Tsochantaridis, and Thomas Hofmann, "Hidden Markov support vector machines," 2003.

[17] Thorsten Joachims, "A support vector method for multivariate performance measures," in ICML '05: Proceedings of the 22nd International Conference on Machine Learning, New York, NY, USA, 2005, pp. 377-384, ACM.

[18] Yisong Yue, Thomas Finley, Filip Radlinski, and Thorsten Joachims, "A support vector method for optimizing average precision," in SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA, 2007, pp. 271-278, ACM.

[19] Thorsten Joachims, "Training linear SVMs in linear time," in KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 2006, pp. 217-226, ACM.

[20] S. Sathiya Keerthi and S. Sundararajan, "CRF versus SVM-struct for sequence labeling," Tech. Rep., Yahoo Research Technical Report, 2007.

[21] Vladimir N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag New York, Inc., New York, NY, USA, 1995.

[22] Christopher J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, pp. 121-167, 1998.

[23] John C. Platt, "Sequential minimal optimization: A fast algorithm for training support vector machines," Tech. Rep., Advances in Kernel Methods - Support Vector Learning, 1998.

[24] Ioannis Tsochantaridis, Thorsten Joachims, Thomas Hofmann, and Yasemin Altun, "Large margin methods for structured and interdependent output variables," J. Mach. Learn. Res., vol. 6, pp. 1453-1484, 2005.

[25] Tom Minka, "Discriminative models, not discriminative training," Tech. Rep., Microsoft Research, 2005.

[26] X. Li, H. Jiang, and C. Liu, "Large margin HMMs for speech recognition," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '05), 2005, pp. 513-516.

[27] K. Lee and H. Hon, "Speaker-independent phone recognition using hidden Markov models," IEEE Transactions on Acoustics, Speech and Signal Processing, 1989, pp. 1641-1648.

[28] X. Huang, A. Acero, and H.-W. Hon, Spoken Language Processing, Pearson Education Taiwan Ltd., 2005.

[29] H. Hermansky, B. Hanson, and H. Wakita, "Perceptually based linear predictive analysis of speech," in Proc. ICASSP, Apr. 1985, vol. 10, pp. 509-512.

[30] S. Furui, "Cepstral analysis technique for automatic speaker verification," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 29, no. 2, pp. 254-272, Apr. 1981.

[31] Machine Intelligence Laboratory, Cambridge University Engineering Department (CUED), HTK, http://htk.eng.cam.ac.uk.

[32] The International Computer Science Institute (ICSI), QuickNet, http://www.icsi.berkeley.edu/Speech/qn.html.

[33] Thorsten Joachims, SVMstruct: Support Vector Machine for Complex Outputs, http://svmlight.joachims.org/svm_struct.html.