STUDY ON SPECTRAL-BASED BLIND DEREVERBERATION ALGORITHMS FOR SPEECH ENHANCEMENT
Students: Nguyen Thi Phuong Mai, Tran Thuy Nguyen, Thi Hoang Yen
Class 05DT1,2, Faculty of Electronics and Telecommunications, University of Technology (Truong Dai hoc Bach khoa)
1. Introduction
Reverberation arises from [...]; it strongly degrades the quality and intelligibility of speech (Figure 1). Suppressing or reducing reverberation (dereverberation) is not a simple problem, because the properties of the source signal and the conditions of the acoustic channel are usually not known in advance, or only very little related knowledge is available.
To date, dereverberation techniques have been divided into two classes, reverberation suppression techniques and reverberation cancellation techniques [3], depending on whether or not the technique estimates the impulse response of the channel. This paper evaluates how well two reverberation suppression algorithms improve speech quality: spectral subtraction [1] and spectral masking [6]. Both algorithms are tested on a Vietnamese speech database, the influence of their parameters on performance is examined, and the two algorithms are compared at different source-microphone distances. The paper is divided into three parts, covering in turn the two algorithms, the methods used to evaluate their quality, and the evaluation results obtained.
2. Spectral subtraction algorithm
2.1. Algorithm
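Figure 2. Block diagram of the spectral subtraction algorithm: segment the signal; estimate the impulse response of the channel; subtract this estimate from the signal; reconstruct the original signal.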
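The subtraction step of Section 2.1 can be illustrated with a minimal Python sketch. It assumes the short-time Fourier magnitudes have already been computed and are stored as a list of frames; the function name `log_spectral_mean_subtraction` and the `floor` parameter are illustrative choices, not taken from [1] or [2]. The idea is that, with an analysis window much longer than the impulse response, the channel behaves roughly like a fixed gain in each frequency bin, so removing the per-bin mean of the log magnitude removes that gain.

```python
import math


def log_spectral_mean_subtraction(magnitudes, floor=1e-12):
    """Subtract the long-term mean of the log magnitude in every frequency bin.

    magnitudes[t][k] is the STFT magnitude of frame t at bin k.
    `floor` (an assumption of this sketch) avoids log(0).
    Returns a new list of frames holding the processed magnitudes.
    """
    if not magnitudes:
        raise ValueError("at least one frame is required")
    n_frames = len(magnitudes)
    n_bins = len(magnitudes[0])

    # Log magnitude of every time-frequency point.
    log_mag = [[math.log(max(m, floor)) for m in frame] for frame in magnitudes]

    # Mean of the log magnitude over all frames, one value per bin.
    mean_log = [
        sum(log_mag[t][k] for t in range(n_frames)) / n_frames
        for k in range(n_bins)
    ]

    # Subtract the mean in the log domain and return to linear magnitude.
    return [
        [math.exp(log_mag[t][k] - mean_log[k]) for k in range(n_bins)]
        for t in range(n_frames)
    ]


if __name__ == "__main__":
    clean = [[1.0, 2.0], [4.0, 8.0]]      # toy magnitudes: 2 frames, 2 bins
    channel_gain = [0.5, 3.0]             # hypothetical fixed gain per bin
    reverberant = [[m * g for m, g in zip(frame, channel_gain)] for frame in clean]
    # Both calls give the same result: the fixed per-bin gain is removed.
    print(log_spectral_mean_subtraction(clean))
    print(log_spectral_mean_subtraction(reverberant))
```

The sketch only handles magnitudes. A full implementation would reuse the phase of the reverberant signal and apply an inverse short-time Fourier transform with overlap-add to return to the time domain.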
This algorithm was proposed for automatic speech recognition (ASR) systems [2]. Its block diagram is shown in Figure 2. The reverberant signal is transformed with the short-time Fourier transform (Hanning window, 75% overlap). The window is chosen to be much longer than the impulse response, under the assumption that the impulse response does not change from one time frame to the next, so that the channel acts approximately as a fixed multiplicative factor in each frequency bin. Subtracting the mean of the logarithm of the spectrum then reduces the effect of reverberation on the signal. Because the long window gives a high frequency resolution, spectral subtraction introduces artifact noise. This artifact noise strongly degrades the quality and intelligibility of the speech signal (which matters little for an ASR system), so a post-processing stage is needed to reduce it.
2.2. Post Processing
Post processing is the procedure that handles the artifact noise produced by spectral subtraction. It works as follows. The log magnitudes of both the reverberant signal and the signal after spectral subtraction are normalized, using a window win that is much shorter than N. The magnitudes of the two signals are compared at the same frequency and time frame. If the magnitude after spectral subtraction is larger, the excess is attributed to artifact noise and is attenuated by a factor that depends on the amount of excess energy. At points where the energy after processing is lower than the energy before processing, this factor equals 1.
3. Spectral masking algorithm
Figure 3. Block diagram of the spectral masking algorithm: split the signal into frequency bands; estimate the RT60 of the channel in each band; build the mask.
The algorithm is outlined in Figure 3. The signal is decomposed into frequency bands; the envelope of each band is computed and converted to the dB scale. The time constant a of the reverberation is estimated in each band from the slope of the envelope over N samples (N is chosen by testing several values). A window of length N is slid along the envelope one sample at a time, and linear regression is used to compute the slope s_i at each position. Following the estimation approach in [7], the correct value of a is determined from the resulting set of slopes; this value is denoted s*. A binary mask is then used to remove the parts of the signal that are dominated by reverberation. The mask value at position i in a given band is defined as:
$$ m_i = \begin{cases} 0 & \text{if } s_i \; [\ldots] \\ 1 & \text{otherwise} \end{cases} \qquad (1) $$
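The threshold value in (1) is chosen experimentally. The effectiveness of the algorithm is also assessed by comparing the estimated mask with the true mask (computed from the clean signal and the noise signal):
$$ m_i = \begin{cases} 0 & \text{if } e_r \; [\ldots] \; e_c \\ 1 & \text{otherwise} \end{cases} \qquad (2) $$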
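The masking steps of Section 3 can be sketched in Python as follows. Several details are assumptions of this sketch, not statements from the text or from [6] and [7]: the decay slope `s_star` is passed in instead of being estimated, a position is treated as reverberation-dominated when its local slope lies within a tolerance `tol` of `s_star`, the reference mask is set to 0 where the reverberant energy exceeds the clean energy, and the two masks are compared by their disagreement rate. All function names are hypothetical.

```python
def sliding_slopes(envelope_db, n):
    """Least-squares slope (dB per sample) of every length-n window,
    moving the window along the envelope one sample at a time."""
    if n < 2 or n > len(envelope_db):
        raise ValueError("n must satisfy 2 <= n <= len(envelope_db)")
    x_mean = (n - 1) / 2.0
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slopes = []
    for start in range(len(envelope_db) - n + 1):
        window = envelope_db[start:start + n]
        y_mean = sum(window) / n
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(window))
        slopes.append(sxy / sxx)
    return slopes


def binary_mask(slopes, s_star, tol):
    """ASSUMED rule: 0 where the local slope is within tol of s_star
    (treated as reverberation-dominated), 1 elsewhere."""
    return [0 if abs(s - s_star) < tol else 1 for s in slopes]


def reference_mask(reverb_energy, clean_energy):
    """ASSUMED rule: 0 where reverberant energy exceeds clean energy."""
    return [0 if er > ec else 1 for er, ec in zip(reverb_energy, clean_energy)]


def mask_disagreement(estimated, reference):
    """Fraction of positions at which two masks differ (assumed metric)."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("masks must be non-empty and of equal length")
    return sum(a != b for a, b in zip(estimated, reference)) / len(estimated)


if __name__ == "__main__":
    # Toy envelope in dB: a linear decay followed by a flat segment.
    envelope_db = [0.0, -1.0, -2.0, -3.0, -4.0, -4.0, -4.0, -4.0]
    slopes = sliding_slopes(envelope_db, n=3)
    print(binary_mask(slopes, s_star=-1.0, tol=0.1))
```

Passing `s_star` as an argument keeps the sketch independent of the estimator in [7], which is not described in enough detail here to reproduce.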
4. Objective evaluation techniques
The preceding sections analyzed the dereverberation algorithms. To determine which algorithm is most effective, objective speech quality measures are used. Objective evaluation assesses quality from properties of the signal itself and is commonly used to evaluate speech enhancement methods. The measures considered are: signal-to-reverberation ratio measures; measures based on linear prediction coefficients, such as the Log-Likelihood Ratio (LLR), Itakura-Saito (IS), and cepstrum (CEP) distances, as in [4]; and perceptual measures, namely the weighted spectral slope (WSS) distance and the Perceptual Evaluation of Speech Quality (PESQ), which rates quality in terms of subjective perception.
Weighted Spectral Slope (WSS) distance: the spectral slopes of the two signals are computed first, and the distance between them is then given by:
$$ d_{WSS}(C_x, \hat{C}_x) = \sum_{k=1}^{L} W(k)\,\bigl(S_x(k) - \hat{S}_x(k)\bigr)^2 \qquad (3) $$
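As a minimal illustration of the WSS distance for a single frame, the Python sketch below sums the weighted squared differences between the spectral slopes of the clean and processed signals over the L frequency bands. The slope is taken as the difference between adjacent band log-spectra, which is a common definition (see [4]) and is assumed here because the text does not spell it out. The weights W(k) are simply passed in, and the function names are hypothetical.

```python
def spectral_slopes(band_log_spectrum):
    """Slope in each band, ASSUMED to be the difference between the
    log-spectrum values (dB) of adjacent bands."""
    return [b - a for a, b in zip(band_log_spectrum, band_log_spectrum[1:])]


def wss_distance(slopes_clean, slopes_processed, weights):
    """Weighted sum of squared spectral-slope differences over all bands."""
    if not (len(slopes_clean) == len(slopes_processed) == len(weights)):
        raise ValueError("all inputs must have the same length")
    return sum(
        w * (sc - sp) ** 2
        for w, sc, sp in zip(weights, slopes_clean, slopes_processed)
    )


if __name__ == "__main__":
    # Toy band log-spectra (dB) for one frame; all values are illustrative.
    clean_db = [10.0, 12.0, 15.0, 15.0]
    processed_db = [10.0, 11.0, 15.0, 14.0]
    weights = [1.0, 0.5, 2.0]
    d = wss_distance(spectral_slopes(clean_db), spectral_slopes(processed_db), weights)
    print(d)  # 3.5
```

In practice the per-frame distances would be averaged over all frames of an utterance to give one score.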
Perceptual Evaluation of Speech Quality (PESQ): of all the objective measures, PESQ is the most complex, but it also correlates best with subjective perception. PESQ values lie between -0.5 and 4.5, corresponding to the scales used in subjective evaluation.
5. Simulation results and evaluation
5.1. Database
To evaluate these techniques, a Vietnamese speech database was built from Vietnamese sentences taken from VOA, with an average length of 8 s and including both male and female voices. The channel impulse responses were derived from the impulse response of a meeting room at recording distances of [0.1 0.25 0.5 0.75 1 1.5 3 4] m, with an average impulse response length of about 0.3 s. The reverberant speech signals were obtained by convolving the clean signals with these impulse responses.
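The way the reverberant test signals are produced can be sketched in a few lines of Python: each clean signal is convolved with a room impulse response. The sample values below are toy numbers chosen for illustration, not data from the database, and the function name `convolve` is hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            # Each input sample launches a scaled copy of the impulse response.
            out[i + j] += s * h
    return out


if __name__ == "__main__":
    clean = [1.0, 0.0, 0.0, 2.0]         # toy clean signal
    room_ir = [1.0, 0.5, 0.25]           # toy decaying impulse response
    print(convolve(clean, room_ir))      # [1.0, 0.5, 0.25, 2.0, 1.0, 0.5]
```

The direct form costs on the order of len(signal) × len(impulse_response) operations. For 8 s utterances and 0.3 s impulse responses, an FFT-based convolution would normally be used instead.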
5.2. Simulation results
5.2.1. Spectral subtraction algorithm
The algorithm was applied to the database described above, using analysis window lengths N = 0.064, 0.256, 1.024 and post-processing window lengths win = 0.016, 0.032.
The simulation results lead to the following conclusions:
o According to Figure H4, a post-processing window length of 0.016 works better than 0.032.
o According to Figure H5, the optimal value of N does not have to be more than 4 times the length of the impulse response, as stated in [1]. Without post-processing, the shorter the window, the better.
o According to Figures H6, H7, and H8, WSS and PESQ are the most reliable of the measures used to assess signal quality after dereverberation, whereas other measures such as the CEP distance do not reflect the results correctly (the remaining measures behave similarly to the CEP distance). The conclusions below are therefore based only on WSS and PESQ.
o According to Figures H6 and H7, speech quality after processing depends on the distance between the source and the microphone; from 1 m upward the improvement in speech quality is more pronounced.
o According to Figures H5, H6, and H7, post-processing significantly improves the quality of the speech signal.
5.2.2. Spectral masking algorithm
The masking method was simulated on the same database, varying its parameters to find the optimal values for the algorithm. The results show that performance depends on the window length, the type of filter used in the filter bank, the type of envelope, and the values used for the mask. The optimal value of the mask parameter is 0.015, as in [6].
According to Figure H10, when windows of different lengths are used to compute the slope of the signal envelope (win_sz = 1600, 800, 400, 200), win_sz = 400 gives the best output quality. This differs from the value win_sz = 1600 used in [6] for a sampling frequency of 16000. The following filter types were tried in the filter bank: Butterworth with constant frequency shift, Butterworth with logarithmic-linear spacing, and GammaTone. The simulation results (Figure H9) show that the logbutter filter performs best. Inspection of the signal spectra shows that the masking method does not create artifact noise and does not shift frequencies, which makes the processed signal easier to listen to. The degree of speech quality improvement does not depend on the distance or on the signal, although the effect of the algorithm is more evident in the far field than in the near field.
6. Conclusion
Using a self-built database with an average utterance length of 8 s, and evaluating two dereverberation methods implemented as matlab programs, we observe the following:
o Both algorithms improve the quality of reverberant signals.
o Evaluating signal quality after dereverberation with PESQ and WSS is reliable and correlates with the subjective perception of listeners.
o With spectral subtraction, post processing clearly improves signal quality. Signal quality does not improve if spectral subtraction is used without post processing.
o Overall, the masking method gives better results than spectral subtraction: the processed signal is free of artifact noise, and the output quality does not depend on the characteristics of the speech.
o Both the masking method and spectral subtraction are more effective in the far_field than in the near_field.
In this study, the algorithms were evaluated in an environment without additive white noise. Future work will test the algorithms on a larger Vietnamese database, use impulse responses from other meeting rooms, and take into account the effect of background noise on their performance.
REFERENCES
[1] D. Gelbart, N. Morgan, Evaluating Long-Term Spectral Subtraction for Reverberant ASR, in ICSLP 2002.
[2] C. Avendano, S. Tibrewala, and H. Hermansky, Multiresolution Channel Normalization for ASR in Reverberant Environments, in EUROSPEECH 1997, Rhodes, Greece, 1997.
[3] Emanuel A. P. Habets, Single- and Multi-Microphone Speech Dereverberation using Spectral Enhancement, Eindhoven University Press, 2007.
[4] Philipos C. Loizou, Speech Enhancement: Theory and Practice, chapter 10, Evaluating Performance of Speech Enhancement Algorithms, CRC Press, June 2007.
[5] Patrick A. Naylor, Nikolay D. Gaubitch, and Emanuel A. P. Habets, Signal-Based Performance Evaluation of Dereverberation Algorithms, Journal of Electrical and Computer Engineering, volume 2010.
[6] Graham Grindlay, Blind Dereverberation of Audio Signals, E4810 Final Project, Columbia University, December 2008.
[7] R. Ratnam, D. Jones, C. Wheeler, and D. O'Brien, Blind Estimation of Reverberation Time, Journal of the Acoustical Society of America, 114(5): 2877-2892, 2003.