Journal of Beijing University of Posts and Telecommunications
Oct. 2009
Vol. 32 No. 5
( 1 , 100876; 2 , 100190)
: , , ( CRF)
. , ,
CRF CRF , . ,
F-score 90.67% 80.05% , ( ME)
3.62% 5.65% , .
: ; ; ;
: T P37
: A
DONG Yuan, ZHOU Tao, DONG Cheng-yu, WANG Hai-la
(1. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China; 2. France Telecom R&D, Beijing 100190, China)
Abstract: Prosodic structure prediction is an important component of a Mandarin text-to-speech (TTS) system. A prosodic structure prediction method based on the conditional random field (CRF) algorithm is proposed. The prosodic word model and the prosodic phrase model are trained with CRF on automatically segmented and tagged features and on hierarchical prosodic structure information extracted from a large-scale, manually labeled speech corpus. The approach achieves an F-score of 90.67% in prosodic word prediction and 80.05% in prosodic phrase prediction, 3.62% and 5.65% higher than the maximum entropy (ME) based method. Experimental results show that the CRF-based approach considerably improves prosodic structure prediction and works well in a real Mandarin TTS system.
Key words: text-to-speech; prosodic structure; conditional random field; machine learning
,
[ 1] .
.
,
[ 2] . ,
, .
.
[ 3] .
,
,
Received: 2009-03-09
: ( 108012)
:
( 1970
, .
1 .
. ,
; , .
,
.
( HMM ) ( MEM )
2 1
,
[ 6 7]
CRF
. HMM ,
, MEM ,
. CRF
,
. :
, ,
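Let $G = (V, E)$ be an undirected graph and $Y = \{ y_v \mid v \in V \}$ the set of random variables indexed by the vertices of $G$ [8-9], so that each vertex $v$ corresponds to a label variable $y_v$. If each $y_v$, conditioned on $X$, obeys the Markov property with respect to $G$, then $(X, Y)$ is a CRF. $C = \{ (x_c, y_c) \}$ denotes the set of cliques of $G$.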
, 3
, ( PW ) ( P P)
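Given an observation sequence $x$, the conditional probability $p(y \mid x)$ of a label sequence $y$ is

$$p(y \mid x) = \frac{1}{Z(x)} \prod_{c \in C} \exp\Big( \sum_k \lambda_k f_k(y_c, x_c) \Big)$$

where $f_k(y_c, x_c)$ is a feature function, $\lambda = \{ \lambda_k \}$ is the set of feature weights, and the normalization factor is

$$Z(x) = \sum_y \prod_{c \in C} \exp\Big( \sum_k \lambda_k f_k(y_c, x_c) \Big)$$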
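As a rough illustration of the conditional probability for the linear-chain case, the following Python sketch evaluates $p(y \mid x)$ by brute-force enumeration of all label sequences. The tag set `LABELS`, the layout of the `weights` dictionary, and all function names are hypothetical and do not come from the paper; practical CRF toolkits compute $Z(x)$ with the forward algorithm rather than by enumeration.

```python
import itertools
import math

# Hypothetical tag set: "B" = prosodic boundary after the word, "N" = no boundary.
LABELS = ["B", "N"]

def score(y, x, weights):
    """Sum of weighted feature values over all cliques of a linear chain."""
    total = 0.0
    for i in range(len(x)):
        # State clique: (observation, label) pair at position i.
        total += weights.get(("state", x[i], y[i]), 0.0)
        # Transition clique: (previous label, current label) pair.
        if i > 0:
            total += weights.get(("trans", y[i - 1], y[i]), 0.0)
    return total

def partition(x, weights):
    """Z(x): sum of exp(score) over every possible label sequence."""
    return sum(
        math.exp(score(y, x, weights))
        for y in itertools.product(LABELS, repeat=len(x))
    )

def prob(y, x, weights):
    """p(y | x) = exp(score(y, x)) / Z(x)."""
    return math.exp(score(y, x, weights)) / partition(x, weights)
```

With an empty weight dictionary every label sequence is equally likely, so `prob(("B", "B"), ["a", "a"], {})` evaluates to 0.25 for this two-label tag set.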
, G = ( V , E )
y
1
1 1
, 3
, 1~ 3 , .
, ,
.
1 2
, ,
. . ,
:
state feature functions $g(i, y_i, x)$ and transition feature functions $f(i, y_{i-1}, y_i, x)$.
$\{ (x^{(k)}, y^{(k)}) \}$ ,
CRF .
.
1 x ,
CRF , x
$$\hat{y} = \arg\max_y p(y \mid x) = \arg\max_y \sum_{c \in C} \sum_k \lambda_k f_k(y_c, x_c)$$
, 7 ,
The best label sequence $y$ can be found with the Viterbi algorithm.
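The decoding step can be sketched in Python as follows, assuming a linear-chain CRF whose score is a sum of state and transition weights. The function name `viterbi`, the `weights` dictionary layout, and the example tags are illustrative assumptions, not the paper's implementation.

```python
def viterbi(x, labels, weights):
    """Return the highest-scoring label sequence for the observations x."""
    if not x:
        return []
    # best[i][y]: best score of any labelling of x[0..i] that ends in label y.
    best = [{y: weights.get(("state", x[0], y), 0.0) for y in labels}]
    back = []  # back[i-1][y]: label at position i-1 on the best path ending in y at i
    for i in range(1, len(x)):
        scores, pointers = {}, {}
        for y in labels:
            # Choose the previous label that maximises score plus transition weight.
            prev = max(
                labels,
                key=lambda p: best[-1][p] + weights.get(("trans", p, y), 0.0),
            )
            scores[y] = (
                best[-1][prev]
                + weights.get(("trans", prev, y), 0.0)
                + weights.get(("state", x[i], y), 0.0)
            )
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical weights: strong penalty for two adjacent boundaries.
example_weights = {
    ("state", "a", "B"): 2.0,
    ("state", "c", "B"): 1.5,
    ("state", "b", "N"): 1.0,
    ("trans", "B", "B"): -3.0,
}
print(viterbi(["a", "c", "b"], ["B", "N"], example_weights))  # ['B', 'N', 'N']
```

Because the exponential is monotonic and $Z(x)$ does not depend on $y$, maximising the summed weights is equivalent to maximising $p(y \mid x)$, which is why the sketch never computes the normalization factor.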
2 2
CRF
.
1 3
, , 1
, ,
.
, .
.
, ,
, . ,
. CRF
6 , 1 .
,
2 CRF .
. ,
, ,
POS
Word
WLen
,
. 1
LastType
DistPre
DistNxt
,
: ( POS, part of speech)
( WLen, word length) ( Word,
1 4 ,
2 .
, 6 ,
. , Word
. ,
; ,
, Word .
,
POS
Word WLen 3
, . , POS
2, Word WLen 1, 2 .
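A minimal sketch of window-based feature extraction is given below. It assumes that the POS feature uses a window of 2 words on each side and that Word and WLen use a window of 1; this reading of the window sizes, the function name `window_features`, the padding symbol, and the example tags are assumptions for illustration only.

```python
def window_features(words, pos_tags, i, pos_window=2, word_window=1):
    """Collect POS, Word and WLen features around position i.

    Positions outside the sentence are padded with "<PAD>" (length 0).
    """
    n = len(words)
    feats = {}
    for off in range(-pos_window, pos_window + 1):
        j = i + off
        feats[f"POS{off:+d}"] = pos_tags[j] if 0 <= j < n else "<PAD>"
    for off in range(-word_window, word_window + 1):
        j = i + off
        inside = 0 <= j < n
        feats[f"Word{off:+d}"] = words[j] if inside else "<PAD>"
        # WLen: word length in characters (one character per syllable in Mandarin).
        feats[f"WLen{off:+d}"] = len(words[j]) if inside else 0
    return feats


# Hypothetical segmented and POS-tagged sentence.
feats = window_features(["我们", "喜欢", "北京"], ["r", "v", "ns"], 0)
```

Each position in the sentence yields one such feature dictionary, and the sequence of dictionaries is what a CRF trainer would consume together with the prosodic boundary labels.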
2
, LastType
.
,
. , LastType PP,
POS
, .
, .
DistPre DistNxt
, Dist Pre , Dis
POS-1
1
1
POS+2
Word-1
Word
. ,
, .
2 3
Word+1
WLen-1
WLen
WLen
WLen+1
, 2
, , 3 .
t Nxt .
,
,
POS+1
Word
. ,
, 5~ 7 .
POS
PP .
POS-2
Word-1POS-1
Word0POS0
Word+1POS+1
Word-2POS-2
1
2
,
, 2
. ,
34 38 .
PP , 1 PW
, PW PW, PP .
3 1
, CRF
0.75 , PP .
, 2
, 2
, 2 .
.
3 2 2
PP 0.75,
, CRF
,
. ,
#
. .
, CRF
.
, 2
, . ,
, ,
, 3 .
3 2
, .
, .
,
BaiLing ,
.
4 1
2000 %&,
3
3 2 1
,
. ,
,
7 , ,
. ,
, ,
,
. ,
, CRF
. 2000 %&,
, .
1 000
.
, 1 ,
PW , LW PW
, , ,
.
4 2
precision ( P ) and recall ( R )
PP , 1
0.65. PW ,
F-score ( F ) . ,
;
. , 1
PW , .
2 .
The F-score is $F = 2PR / (P + R)$.
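The evaluation measures can be sketched in Python as follows. Representing boundaries as sets of word indices and the function names are assumptions made for illustration; the paper does not describe its scoring script.

```python
def precision_recall(predicted, reference):
    """Precision and recall of predicted boundary positions against a reference."""
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall

def f_score(precision, recall):
    """Harmonic mean of precision and recall: F = 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical boundary positions (word indices after which a boundary occurs).
p, r = precision_recall({1, 3, 5}, {1, 5, 7, 9})
f = f_score(p, r)
```

Since the formula is scale-invariant, it can be applied directly to percentages: `f_score(87.95, 86.16)` gives about 87.05 and `f_score(78.61, 81.55)` gives about 80.05.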
4 3
, ,
CRF ,
, .
4 .
CRF .
,
F score, .
Model       P/%     R/%     F-score/%
ME (PW)     87.95   86.16   87.05
CRF (PW)    90.21   92.94   90.67
ME (PP)     71.77   77.24   74.40
CRF (PP)    78.61   81.55   80.05
[ 1]
4 ,
,
[ 2]
, F-score 87.05%
74.40% , 90.67%
[ 3]
,
. ,
Chai Peiqi.
Segmentation of prosodic
80.05% , ,
, 5.65% .
.
4 4
, #
705 IV 708.
[ 5]
, # /vq/ PW
/n/ PP /vi/ LW /vi/ LW,
# /vq/ PW /n/ PW /vi/ LW /vi/
LW. #
, .
[ 6]
[ 7]
2 : #
3, # ,
, ;
, # , 2000 %
&,
, .
2 , ,
,
[ 10] .
[ 9]
. ,