Journal of Beijing University of Posts and Telecommunications
Oct. 2009, Vol. 32, No. 5
Article ID: 1007-5321(2009)05-0036-05
CLC number: TP37
Document code: A

Prosodic Structure Prediction Based on Conditional Random Field Model

DONG Yuan, ZHOU Tao, DONG Cheng-yu, WANG Hai-la
(1. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China; 2. France Telecom R&D, Beijing 100190, China)

Abstract: Prosodic structure prediction is an important component of a Mandarin text-to-speech (TTS) system. A prosodic structure prediction method based on the conditional random field (CRF) algorithm is proposed. The prosodic word model and the prosodic phrase model are trained with the CRF method on automatically segmented and tagged features and on hierarchical prosodic structure information extracted from a large-scale, manually labeled speech corpus. The approach achieves an F-score of 90.67% in prosodic word prediction and 80.05% in prosodic phrase prediction, 3.62% and 5.65% higher than a maximum entropy (ME) based method. Experimental results show that the CRF-based approach considerably improves prosodic structure prediction and works well in a real Mandarin TTS system.

Key words: text-to-speech; prosodic structure; conditional random field; machine learning
,
[ 1] .
.
,
[ 2] . ,
, .
.
[ 3] .
,
,
Received: 2009-03-09
Foundation item: (108012)
Author: DONG Yuan (1970- ), E-mail: yuandong@bupt.edu.cn.
1 .
, classification and regression tree (CART),
[4-5]. CART
. ,
; , .
,
.
hidden Markov model (HMM), maximum entropy model (MEM)
2.1
,
[6-7]
CRF
. HMM ,
, MEM ,
. CRF
,
. :
, ,
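Prosodic structure is described at three levels: prosodic word (PW), prosodic phrase (PP) and intonation phrase.

Let $G = (V, E)$ be an undirected graph and $Y = \{ y_v \mid v \in V \}$ a set of random variables indexed by the vertices $v$ of $G$ [8-9]. If the variables $y_v$, conditioned on $X$, obey the Markov property with respect to $G$, then $(X, Y)$ is a CRF. Let $C = \{ (x_c, y_c) \}$ be the set of cliques of $G$. Given an observation sequence $x$, the conditional probability of a label sequence $y$ is

$$p(y \mid x) = \frac{1}{Z(x)} \prod_{c \in C} \exp\Big( \sum_k \lambda_k f_k(y_c, x_c) \Big),$$

where $f_k(y_c, x_c)$ is a feature function, $\Lambda = \{ \lambda_k \}$ is the set of parameters, and $Z(x)$ is the normalization factor

$$Z(x) = \sum_{y} \prod_{c \in C} \exp\Big( \sum_k \lambda_k f_k(y_c, x_c) \Big).$$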
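The definition of $p(y \mid x)$ and $Z(x)$ can be checked numerically on a toy linear chain. The sketch below is only an illustration: the label set, the POS-like observations and the weights are hypothetical values chosen for the example, not taken from the paper, and $Z(x)$ is computed by brute-force enumeration, which is feasible only for very short sequences.

```python
import itertools
import math

# Hypothetical label set and weights, chosen only for illustration.
LABELS = ["LW", "PW", "PP"]
WEIGHTS = {
    ("state", "n", "PW"): 1.2,   # observation "n" favours label PW
    ("state", "v", "LW"): 0.8,   # observation "v" favours label LW
    ("trans", "LW", "PW"): 0.5,  # transition LW -> PW is favoured
}


def score(y, x, weights=WEIGHTS):
    """Sum of lambda_k * f_k over all cliques of a linear chain."""
    total = 0.0
    for i, label in enumerate(y):
        total += weights.get(("state", x[i], label), 0.0)
        if i > 0:
            total += weights.get(("trans", y[i - 1], label), 0.0)
    return total


def partition(x, weights=WEIGHTS):
    """Z(x): sum of exp(score) over every possible label sequence."""
    return sum(
        math.exp(score(y, x, weights))
        for y in itertools.product(LABELS, repeat=len(x))
    )


def prob(y, x, weights=WEIGHTS):
    """p(y | x) = exp(score(y, x)) / Z(x)."""
    return math.exp(score(y, x, weights)) / partition(x, weights)


x = ["n", "v", "n"]
total_mass = sum(prob(y, x) for y in itertools.product(LABELS, repeat=len(x)))
```

Because of the normalization by $Z(x)$, `total_mass` equals 1 up to floating-point error, whatever weights are chosen.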
, G = ( V , E )
y
1
1.1
, 3
, 1~ 3 , .
, ,
.
1.2
, ,
. . ,
:
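The feature functions take two forms: a per-position function $g(i, y_i, x)$ and a transition function $f(i, y_{i-1}, y_i, x)$. The parameters of the CRF are estimated from a training set $\{ (x^{(k)}, y^{(k)}) \}$.

Given an input $x$, the CRF outputs the most probable label sequence

$$\hat{y} = \arg\max_{y} p(y \mid x) = \arg\max_{y} \sum_{c \in C} \sum_k \lambda_k f_k(y_c, x_c),$$

which is found with the Viterbi algorithm.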
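The Viterbi search for $\hat{y}$ can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the functions `g` and `f` mirror the per-position and transition feature forms, with the weights already folded in, and the label set and weight tables are hypothetical. A brute-force search over all label sequences is included only to check the result.

```python
import itertools

# Hypothetical labels and weights, for illustration only.
LABELS = ["LW", "PW", "PP"]
STATE_W = {("n", "PW"): 1.0, ("v", "LW"): 0.7, ("u", "PP"): 0.9}
TRANS_W = {("LW", "PW"): 0.4, ("PW", "PP"): 0.6, ("PW", "LW"): 0.3}


def g(i, y_i, x):
    """Weighted per-position feature score."""
    return STATE_W.get((x[i], y_i), 0.0)


def f(i, y_prev, y_i, x):
    """Weighted transition feature score."""
    return TRANS_W.get((y_prev, y_i), 0.0)


def sequence_score(y, x):
    """Total score of a complete label sequence."""
    states = sum(g(i, y[i], x) for i in range(len(x)))
    transitions = sum(f(i, y[i - 1], y[i], x) for i in range(1, len(x)))
    return states + transitions


def viterbi(x, labels=LABELS):
    """Return the highest-scoring label sequence by dynamic programming."""
    # best[i][l]: best score of any label prefix that ends with l at position i
    best = [{l: g(0, l, x) for l in labels}]
    back = []
    for i in range(1, len(x)):
        row, ptr = {}, {}
        for l in labels:
            prev = max(labels, key=lambda p: best[-1][p] + f(i, p, l, x))
            row[l] = best[-1][prev] + f(i, prev, l, x) + g(i, l, x)
            ptr[l] = prev
        best.append(row)
        back.append(ptr)
    # Trace the back-pointers from the best final label.
    path = [max(labels, key=lambda l: best[-1][l])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


x = ["n", "v", "n", "u"]
decoded = viterbi(x)
brute = max(itertools.product(LABELS, repeat=len(x)),
            key=lambda y: sequence_score(y, x))
```

Maximizing the unnormalized score is enough because $Z(x)$ does not depend on $y$. The dynamic program visits each position once, whereas the brute-force search grows exponentially with sentence length.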
2.2
CRF
.
1.3
, , 1
, ,
.
, .
.
, ,
, . ,
. CRF
6 , 1 .
,
2 CRF .
. ,
, ,
POS, Word, WLen, LastType, DistPre, DistNxt

These include the part of speech (POS), the word length (WLen) and the word itself (Word).

, Word
2 .
, 6 ,

. , Word
. ,
; ,
, Word .
,
For the three features POS, Word and WLen, context windows are used: the window size is 2 for POS and 1 for Word and WLen (Table 2).
2
, LastType
.
,
. , LastType PP,
DistPre, DistNxt

2.3

Feature templates (the suffix gives the offset from the current word):

POS-2, POS-1, POS, POS+1, POS+2
Word-1, Word, Word+1
WLen-1, WLen, WLen+1
Word-2POS-2, Word-1POS-1, Word0POS0, Word+1POS+1
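The window-based templates (POS with offsets -2 to +2, Word and WLen with offsets -1 to +1, and the combined Word/POS templates) can be expanded for a given position roughly as shown below. This is a sketch under stated assumptions: the function name `extract_features`, the `<PAD>` symbol for positions outside the sentence, the key naming and the example sentence are all hypothetical, and word length is taken as the number of characters.

```python
def extract_features(words, pos_tags, i):
    """Expand the window templates for the word at position i."""

    def at(seq, j):
        # Positions outside the sentence map to a padding symbol (assumption).
        return str(seq[j]) if 0 <= j < len(seq) else "<PAD>"

    word_lens = [len(w) for w in words]  # word length in characters
    feats = {}
    for off in range(-2, 3):             # POS window: -2 .. +2
        feats[f"POS{off:+d}"] = at(pos_tags, i + off)
    for off in range(-1, 2):             # Word and WLen window: -1 .. +1
        feats[f"Word{off:+d}"] = at(words, i + off)
        feats[f"WLen{off:+d}"] = at(word_lens, i + off)
    for off in range(-2, 2):             # combined Word/POS templates: -2 .. +1
        feats[f"Word{off:+d}POS{off:+d}"] = (
            at(words, i + off) + "/" + at(pos_tags, i + off)
        )
    return feats


# Hypothetical segmented and POS-tagged sentence.
words = ["我们", "喜欢", "音乐"]
pos_tags = ["r", "v", "n"]
feats = extract_features(words, pos_tags, 1)
```

Each key corresponds to one template and each value to the string that a CRF toolkit would turn into an indicator feature. The current position is written with offset `+0` here.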
,
, 2
. ,
34 38 .
PP , 1 PW
, PW PW, PP .
3.1
, CRF
0.75, PP.
, 2
, 2
, 2 .
.
3.2.2
PP 0.75,
, CRF
,
. ,
#
. .
, CRF
.
, 2
, . ,
, ,
, 3 .
3.2
, .
, .
,
BaiLing ,
.
4.1
2000 %&,
3
3.2.1
,
. ,
,
7 , ,
. ,
, ,
,
. ,
, CRF
. 2000 %&,
, .
1 000
.
, 1 ,
PW, LW PW
, , ,
.
4.2
precision (P), recall (R)
PP , 1
0.65. PW ,
F-score (F) . ,
;
. , 1
PW , .
2 .
; the F-score is $F = 2PR / (P + R)$.
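The F-score formula can be checked against the precision and recall figures reported in the paper. The helper name `f_score` is hypothetical. The three (P, R) pairs below are taken from the results table and reproduce the published F-scores of 87.05, 74.40 and 80.05 to within rounding.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall: F = 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs in percent, as reported in the results table.
me_pw = f_score(87.95, 86.16)   # ME model, prosodic word
me_pp = f_score(71.77, 77.24)   # ME model, prosodic phrase
crf_pp = f_score(78.61, 81.55)  # CRF model, prosodic phrase
```

Since the formula is scale-invariant, percentages can be passed directly and the result is again a percentage.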
4.3
, ,
CRF ,
, .
4 .
CRF .
,
F score, .
Table 4  Precision, recall and F-score of the ME and CRF models (%)

Model       P        R        F-score
ME (PW)     87.95    86.16    87.05
CRF (PW)    90.21    92.94    90.67
ME (PP)     71.77    77.24    74.40
CRF (PP)    78.61    81.55    80.05

As Table 4 shows, the ME-based models reach F-scores of 87.05% for prosodic words and 74.40% for prosodic phrases, while the CRF models reach 90.67% and 80.05%; the improvement for prosodic phrases is 5.65%.

4.4

/vq/ PW /n/ PP /vi/ LW /vi/ LW
/vq/ PW /n/ PW /vi/ LW /vi/ LW

References

[1] Chou F C, Tseng C Y, Lee L S, et al. Automatic generation of prosodic structure for high quality Mandarin speech synthesis[C] // Proceeding of the Fourth International Conference on Spoken Language Processing. Philadelphia: [s. n.], 1996: 1624-1627.
[2] Chu Min, Lu S N. A text-to-speech system with high intelligibility and high naturalness for Chinese[J]. Chinese Journal of Acoustics, 1996, 15(1): 81-90.
[3] Niu Zhengyu, Chai Peiqi. Segmentation of prosodic phrases for improving the naturalness of synthesized mandarin Chinese speech[C] // ICSLP 2000. Beijing: [s. n.], 2000: 350-353.
[4] Mao Xinnian, Dong Yuan, Han Jinyu, et al. Inequality maximum entropy classifier with character features for polyphone disambiguation in mandarin TTS systems[C] // IEEE International Conference on Acoustics, Speech, and Signal Processing. Honolulu: [s. n.], 2007: IV705-IV708.
[5] Wang M, Hirschberg J. Automatic classification of intonational phrase boundaries[J]. Computer Speech and Language, 1992, 16(6): 175-196.
[6] Taylor P, Black A W. Assigning phrase breaks from part-of-speech sequences[J]. Computer Speech and Language, 1998, 12(2): 99-117.
[7] Jia Yuxiang, Huang Dezhi, Liu Wu, et al. Text normalization in mandarin text-to-speech system[C] // IEEE International Conference on Acoustics, Speech, and Signal Processing. Las Vegas: [s. n.], 2008: 4693-4696.
[8] Lafferty J, McCallum A, Pereira F C N. Conditional random fields: probabilistic models for segmenting and labeling sequence data[C] // Proc of the 18th ICML. San Francisco: [s. n.], 2001: 282-289.
[9] Mao Xinnian, Dong Yuan, He Saike, et al. Chinese word segmentation and named entity recognition based on conditional random fields[C] // IJCNLP 2008. Hyderabad: [s. n.], 2008: 90-93.
[10] Brill E. Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging[J]. Computational Linguistics, 1995, 21(4): 543-565.
