
Journal of Beijing University of Posts and Telecommunications

Oct. 2009
Vol. 32 No. 5

Article ID: 1007-5321(2009)05-0036-05


CLC number: TP37

Document code: A

Prosodic Structure Prediction Based on Conditional Random Field Model

DONG Yuan, ZHOU Tao, DONG Cheng-yu, WANG Hai-la

(1. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China; 2. France Telecom R&D, Beijing 100190, China)

Abstract: Prosodic structure prediction is an important component of a Mandarin text-to-speech (TTS) system. A prosodic structure prediction method based on the conditional random field (CRF) algorithm is proposed. The prosodic word model and the prosodic phrase model are trained with the CRF method on automatically segmented and tagged features and on hierarchical prosodic structure information extracted from a large-scale, manually labeled speech corpus. The approach achieves an F-score of 90.67% in prosodic word prediction and 80.05% in prosodic phrase prediction, which is 3.62% and 5.65% higher than the maximum entropy (ME) based method. Experimental results show that the CRF-based approach considerably improves prosodic structure prediction and works well in a real Mandarin TTS system.
Key words: text-to-speech; prosodic structure; conditional random field; machine learning

,
[1].

.
,
[2]. ,

, .
.
[3].
,
,

Received: 2009-03-09
: ( 108012)
:

(1970- ), , , E-mail: yuandong@bupt.edu.cn.

, .


1 .

, classification and regression tree (CART),

[4-5]. CART

. ,
; , .

,
.
(HMM) (MEM)

2.1

,
[6-7]

CRF

. HMM,

, MEM ,
. CRF

,
. :

, ,

$G = (V, E)$, $Y = \{ y_v \mid v \in V \}$ $G$

[8-9]

$v$ $y_v$.
$y_v$, $(X, Y)$
CRF.

$C = \{ (x_c, y_c) \}$ $G$

, 3
, (PW) (PP)

, , x
y p ( y | x )

(intonation phrase).

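$$p(y \mid x) = \frac{1}{Z(x)} \prod_{c \in C} \exp\Big( \sum_k \lambda_k f_k(y_c, x_c) \Big)$$

where $f_k(y_c, x_c)$ is a feature function, $\lambda = \{ \lambda_k \}$ is the set of feature weights, and the normalization factor is

$$Z(x) = \sum_y \prod_{c \in C} \exp\Big( \sum_k \lambda_k f_k(y_c, x_c) \Big)$$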
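As a minimal sketch of the conditional probability and normalization factor defined above, the following brute-force computation assumes a linear-chain CRF in which each clique is a pair of adjacent labels. The feature functions, weights, label names and POS tags are hypothetical toy values chosen for illustration, not those used in the paper, and enumerating every label sequence is only practical for very short inputs.

```python
import itertools
import math


def score(y, x, weights, feature_fns):
    """Sum of lambda_k * f_k over every position (clique) of a linear chain."""
    total = 0.0
    for i in range(len(x)):
        prev = y[i - 1] if i > 0 else None
        for lam, f in zip(weights, feature_fns):
            total += lam * f(prev, y[i], x, i)
    return total


def conditional_probability(y, x, labels, weights, feature_fns):
    """p(y|x) = exp(score(y, x)) / Z(x), with Z(x) summed over all label sequences."""
    z = sum(
        math.exp(score(list(cand), x, weights, feature_fns))
        for cand in itertools.product(labels, repeat=len(x))
    )
    return math.exp(score(y, x, weights, feature_fns)) / z


# Hypothetical toy model: two binary features over POS tags and boundary labels.
feature_fns = [
    lambda prev, cur, x, i: 1.0 if x[i] == "n" and cur == "PW" else 0.0,
    lambda prev, cur, x, i: 1.0 if prev == "LW" and cur == "PW" else 0.0,
]
weights = [1.2, 0.5]
labels = ["LW", "PW"]
x = ["v", "n", "v"]  # assumed POS sequence

print(conditional_probability(["LW", "PW", "LW"], x, labels, weights, feature_fns))
```

Because Z(x) sums over all candidate sequences, the probabilities of all label sequences for a given x add up to one.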

, G = ( V , E )
y
1

1.1

, 3

, 1~ 3 , .
, ,
.
1.2

, ,

. . ,
:
$g(i, y_i, x)$ $f(i, y_{i-1}, y_i, x)$.
$\{ (x^{(k)}, y^{(k)}) \}$,
CRF .
.

1 x ,
CRF , x
$$\hat{y} = \arg\max_y p(y \mid x) = \arg\max_y \sum_{c \in C} \sum_k \lambda_k f_k(y_c, x_c)$$

, 7 ,

y Viterbi.
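The decoding step named above can be sketched with a standard Viterbi search. This is an illustrative implementation for a linear-chain model, not the authors' code; the feature functions, weights, labels and POS tags are hypothetical toy values.

```python
def viterbi(x, labels, weights, feature_fns):
    """Return the label sequence that maximizes the summed weighted features."""

    def local(prev, cur, i):
        # Weighted feature sum for one position of the chain.
        return sum(lam * f(prev, cur, x, i) for lam, f in zip(weights, feature_fns))

    if not x:
        return []
    # best[i][label]: best score of any label prefix ending in `label` at position i.
    best = [{lab: local(None, lab, 0) for lab in labels}]
    back = []
    for i in range(1, len(x)):
        scores, pointers = {}, {}
        for cur in labels:
            prev_best = max(labels, key=lambda p: best[-1][p] + local(p, cur, i))
            scores[cur] = best[-1][prev_best] + local(prev_best, cur, i)
            pointers[cur] = prev_best
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda lab: best[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical toy model over POS tags and boundary labels.
feature_fns = [
    lambda prev, cur, x, i: 1.0 if x[i] == "n" and cur == "PW" else 0.0,
    lambda prev, cur, x, i: 1.0 if prev == "LW" and cur == "PW" else 0.0,
    lambda prev, cur, x, i: 1.0 if x[i] == "v" and cur == "LW" else 0.0,
]
weights = [1.2, 0.5, 0.8]
labels = ["LW", "PW"]

print(viterbi(["v", "n", "v"], labels, weights, feature_fns))
```

The search keeps only the best-scoring prefix per label at each position, so its cost grows linearly with sentence length rather than exponentially as in exhaustive enumeration.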

2.2
CRF

.
1.3

, , 1

, ,
.
, .


.
, ,
, . ,

. CRF
6 , 1 .

,
2 CRF .

. ,

, ,

POS

Word

WLen

,
. 1

LastType

DistPre

DistNxt

,
: (POS, part of speech)
(WLen, word length) (Word,


1 4 ,

the word itself). , POS WLen


, Word

2 .

, 6 ,


. , Word
. ,
; ,
, Word .
,

POS
Word WLen 3
, . , POS
2, Word WLen 1, 2 .
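The context-window features described here can be sketched as follows. The sketch assumes a window of 2 for POS and a window of 1 for Word and WLen, as the surviving text suggests; the padding token, the feature-name format, the example sentence and its POS tags are hypothetical choices for illustration.

```python
PAD = "<PAD>"  # assumed placeholder for positions outside the sentence


def window_features(tokens, i, pos_window=2, word_window=1):
    """Build POS, Word and WLen context features for the token at index i.

    tokens is a list of (word, pos) pairs.
    """

    def at(j):
        return tokens[j] if 0 <= j < len(tokens) else (PAD, PAD)

    feats = {}
    for d in range(-pos_window, pos_window + 1):
        feats[f"POS{d:+d}"] = at(i + d)[1]
    for d in range(-word_window, word_window + 1):
        word = at(i + d)[0]
        feats[f"Word{d:+d}"] = word
        # WLen counts characters; padding positions get length 0.
        feats[f"WLen{d:+d}"] = 0 if word == PAD else len(word)
    return feats


# Hypothetical segmented and POS-tagged sentence.
tokens = [("我", "r"), ("喜欢", "v"), ("音乐", "n")]
for name, value in window_features(tokens, 1).items():
    print(name, value)
```

Each token yields five POS features, three Word features and three WLen features, which a CRF toolkit can then combine into its own template features.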
2

, LastType
.
,
. , LastType PP,

POS

, .

, .
DistPre DistNxt
, Dist Pre , Dis

POS-1

1
1

POS+2

Word-1

Word

. ,
, .
2.3

Word+1

WLen-1

WLen

WLen
WLen+1

, 2
, , 3 .

t Nxt .
,
,

POS+1

Word

. ,
, 5~ 7 .

POS

PP .

POS-2

Word-1POS-1
Word0POS0
Word+1POS+1
Word-2POS-2

1
2

,
, 2
. ,

34 38 .


PP , 1 PW
, PW PW, PP .

3.1

, CRF

0.75, PP.
, 2

, 2

, 2 .

.
3.2.2

PP 0.75,

, CRF
,
. ,
#
. .

, CRF
.

, 2

, . ,

, ,
, 3 .

3.2

, .
, .

,
BaiLing ,
.
4.1
2000 %&,
3

3.2.1
,
. ,
,

7 , ,
. ,
, ,
,

. ,

, CRF
. 2000 %&,

, .

1 000

.
, 1 ,
PW, LW PW

, , ,
.
4.2

(P) (R)

PP , 1
0.65. PW,

F-score (F). ,
;

. , 1

PW , .
2 .

; F-score $F = 2PR / (P + R)$.
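The F-score formula above can be checked with a short calculation. The sketch assumes that P and R are computed over sets of predicted and reference boundary positions, which the surviving text does not spell out; the function names and the toy boundary sets are hypothetical. The percentage inputs in the example are the P and R values of the first result row reported in the paper.

```python
def precision_recall(predicted, reference):
    """Precision and recall of predicted boundary positions against a reference set."""
    correct = len(predicted & reference)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


def f_score(precision, recall):
    """F = 2PR / (P + R), defined as 0 when both inputs are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy boundary sets (hypothetical positions within a sentence).
p, r = precision_recall({2, 5, 7}, {2, 5, 9, 11})
print(p, r, f_score(p, r))

# P = 87.95 and R = 86.16 reproduce the reported F-score of 87.05.
print(round(f_score(87.95, 86.16), 2))
```

Since F is the harmonic mean of P and R, it always lies between the two values and is pulled toward the smaller one.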


4.3


, ,

CRF ,

, .
4 .

CRF .

,
F-score, .

Model (level)    P / %    R / %    F-score / %
ME (PW)          87.95    86.16    87.05
CRF (PW)         90.21    92.94    90.67
ME (PP)          71.77    77.24    74.40
CRF (PP)         78.61    81.55    80.05

4 ,
,
, F-score 87.05%
74.40%, 90.67%
,
. ,
80.05%, ,
, 5.65%.
.

4.4

, #
, # /vq/ PW
/n/ PP /vi/ LW /vi/ LW,
# /vq/ PW /n/ PW /vi/ LW /vi/
LW. #
, .

2: #

3, # ,
, ;
, # , 2000 %&,
, .

2 , ,
,
[10].
. ,

References:

[1] Chou F C, Tseng C Y, Lee L S, et al. Automatic generation of prosodic structure for high quality Mandarin speech synthesis[C] // Proceedings of the Fourth International Conference on Spoken Language Processing. Philadelphia: [s.n.], 1996: 1624-1627.
[2] Chu Min, Lu S N. A text-to-speech system with high intelligibility and high naturalness for Chinese[J]. Chinese Journal of Acoustics, 1996, 15(1): 81-90.
[3] Niu Zheng-yu, Chai Peiqi. Segmentation of prosodic phrases for improving the naturalness of synthesized mandarin Chinese speech[C] // ICSLP 2000. Beijing: [s.n.], 2000: 350-353.
[4] Mao Xinnian, Dong Yuan, Han Jinyu, et al. Inequality maximum entropy classifier with character features for polyphone disambiguation in mandarin TTS systems[C] // IEEE International Conference on Acoustics, Speech, and Signal Processing. Honolulu: [s.n.], 2007: IV-705-IV-708.
[5] Wang M, Hirschberg J. Automatic classification of intonational phrase boundaries[J]. Computer Speech and Language, 1992, 16(6): 175-196.
[6] Taylor P, Black A W. Assigning phrase breaks from part-of-speech sequences[J]. Computer Speech and Language, 1998, 12(2): 99-117.
[7] Jia Yuxiang, Huang Dezhi, Liu Wu, et al. Text normalization in mandarin text-to-speech system[C] // IEEE International Conference on Acoustics, Speech, and Signal Processing. Las Vegas: [s.n.], 2008: 4693-4696.
[8] Lafferty J, McCallum A, Pereira F C N. Conditional random fields: probabilistic models for segmenting and labeling sequence data[C] // Proc of the 18th ICML. San Francisco: [s.n.], 2001: 282-289.
[9] Mao Xinnian, Dong Yuan, He Saike, et al. Chinese word segmentation and named entity recognition based on conditional random fields[C] // IJCNLP 2008. Hyderabad: [s.n.], 2008: 90-93.
[10] Brill E. Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging[J]. Computational Linguistics, 1995, 21(4): 543-565.
