Proceedings of the First International Conference on Innovative Computing, Information and Control (ICICIC'06)
0-7695-2616-0/06 $20.00 2006
wavelet coefficients of every sub-band have the same statistical properties. Therefore, endpoint detection can be performed using the variance of the wavelet coefficients.

Suppose there is a discrete speech signal f[n]. After the wavelet transformation, its wavelet coefficients are $f_k$ and their variance is $(\sigma_f)^2$, as given in equation (2):

$$(\sigma_f)^2 = \frac{1}{N} \sum_{k \in N} \left( f_k - E(f_k) \right)^2 \tag{2}$$

where N is the total number of wavelet coefficients and k is the index of a wavelet coefficient. According to the property of the 1/f process, after the wavelet transformation is applied to the original signal, each wavelet coefficient can be viewed as a random variable with zero mean, so equation (2) becomes:

$$(\sigma_f)^2 = \frac{1}{N} \sum_{k \in N} (f_k)^2 \tag{3}$$

The wavelet coefficients of the noise, unvoiced and clean speech signals are extracted as prior knowledge, as shown in equation (4):

$$(\sigma_n)^2 = \frac{1}{N} \sum_{k \in N} (n_k)^2, \qquad (\sigma_q)^2 = \frac{1}{N} \sum_{k \in N} (q_k)^2, \qquad (\sigma_c)^2 = \frac{1}{N} \sum_{k \in N} (c_k)^2 \tag{4}$$

The probability that the example $x_i = \{a_1, a_2, \ldots, a_n\}$ belongs to class $c_j$ can be deduced from Bayes' theorem, as shown in equation (5):

$$P(c_j \mid a_1, a_2, \ldots, a_n) = \frac{P(a_1, a_2, \ldots, a_n \mid c_j)\, P(c_j)}{P(a_1, a_2, \ldots, a_n)} = \alpha\, P(c_j)\, P(a_1, a_2, \ldots, a_n \mid c_j) \tag{5}$$

where $\alpha = 1 / P(a_1, a_2, \ldots, a_n)$ is the normalizing factor, $P(c_j)$ is the prior probability of class $c_j$, and $P(c_j \mid a_1, a_2, \ldots, a_n)$ is the posterior probability of class $c_j$. The prior probability is independent of the sample data, whereas the posterior probability reflects the influence of the sample data on class $c_j$. From formula (5), we can calculate the probability that the sample $x_i$ belongs to class $c_j$. On this basis, a Bayes classification is built as shown in Table 1.

Table 1. Description of the three-layer Bayes classification

Class      ID    Input          Probability distribution
noise      V0    $\{s_k^m\}$    $\{N(0, \sigma_n^m)\}$
speech     V1    $\{s_k^m\}$    $\{N(0, \sigma_c^m)\}$
unvoiced   V2    $\{s_k^m\}$    $\{N(0, \sigma_q^m)\}$
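As a minimal sketch of equations (3) and (5), the snippet below computes the zero-mean variance of each sub-band and normalizes prior-times-likelihood products into posteriors. The Haar wavelet, the function names, and the example numbers are assumptions made for illustration; the paper does not state which wavelet it uses.

```python
import numpy as np


def haar_detail_subbands(signal, levels):
    """Detail coefficients per level (finest first) of a plain Haar DWT.

    The Haar wavelet is an assumption for illustration only.
    """
    approx = np.asarray(signal, dtype=float)
    subbands = []
    for _ in range(levels):
        if len(approx) % 2:  # drop the last sample so pairs line up
            approx = approx[:-1]
        even, odd = approx[0::2], approx[1::2]
        subbands.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return subbands


def zero_mean_variance(coeffs):
    """Equation (3): mean of the squared coefficients (zero mean assumed)."""
    coeffs = np.asarray(coeffs, dtype=float)
    return float(np.mean(coeffs ** 2))


def posterior(priors, likelihoods):
    """Equation (5): normalize prior * likelihood so the classes sum to 1."""
    joint = np.asarray(priors, dtype=float) * np.asarray(likelihoods, dtype=float)
    return joint / joint.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.normal(0.0, 1.0, 220)  # synthetic frame, not real speech
    for m, band in enumerate(haar_detail_subbands(frame, 3), start=1):
        print("sub-band", m, "variance", zero_mean_variance(band))
    print(posterior([0.5, 0.5], [0.2, 0.6]))
```

Running `zero_mean_variance` on labeled noise, unvoiced and clean-speech segments would give the three reference variances of equation (4), one set per sub-band.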
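$$P(\{s_k^m\} \mid V_2) = \prod_{m \in M} \prod_{k \in N(m)} p(\{s_k^m\} \mid V_2) = \prod_{m \in M,\, k \in N(m)} \frac{1}{\sqrt{2\pi (\sigma_q^m)^2}} \exp\left( -\frac{(s_k^m)^2}{2 (\sigma_q^m)^2} \right) \tag{8}$$

(2) Compute the mean $\bar{E}_i$ of $E_i^m$, as in formula (10):

$$\bar{E}_i = \frac{1}{M} \sum_{m=1}^{M} E_i^m \tag{10}$$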
index.
is speech. Otherwise, it is marked as noise. Conversely,
if it is found that the signal sub-band
energy value is large, the wavelet coefficient variance method is used. Equations (6) and (7) are used to compute the results: if $P(\{s_k^m\} \mid V_0) < P(\{s_k^m\} \mid V_1)$, the signal is speech; otherwise equation (8) is used. If $P(\{s_k^m\} \mid V_0) < P(\{s_k^m\} \mid V_2)$, the signal is judged to be speech; otherwise it is noise.
(4) If i > D, the algorithm ends; otherwise it returns to step (2).
(5) After all frames have been marked, post-processing starts. The minimum speech span is defined as 8 frames and the minimum noise span as 4 frames. Any span shorter than its defined minimum is discarded.

Table 2. Endpoint detection results (correct rate, %) for 160 speech sentences at different SNRs

Method   Clean   15 dB   10 dB   0 dB
EZCR     97.9    96.6    75.6    64.0
SBAV     98.5    83.3    80.1    72.1
WCV      97.2    96.7    90.4    85.6
SI       97.6    97.7    90.8    86.7
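The likelihood comparison in step (3) and the span filtering in step (5) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: likelihoods are compared in the log domain to avoid underflow, equations (6) and (7) are assumed to have the same Gaussian form as equation (8), and "discarded" is read as relabeling a short span to the opposite class, with short noise gaps filled before short speech spans are dropped. All function names are hypothetical.

```python
import numpy as np


def log_likelihood(subbands, sigmas):
    """Log of the zero-mean Gaussian product, one sigma per sub-band m."""
    total = 0.0
    for coeffs, sigma in zip(subbands, sigmas):
        coeffs = np.asarray(coeffs, dtype=float)
        var = float(sigma) ** 2
        total += float(np.sum(-0.5 * np.log(2.0 * np.pi * var)
                              - coeffs ** 2 / (2.0 * var)))
    return total


def is_speech(subbands, sigma_noise, sigma_speech, sigma_unvoiced):
    """Step (3): noise (V0) against speech (V1), then against unvoiced (V2)."""
    ll_noise = log_likelihood(subbands, sigma_noise)
    if ll_noise < log_likelihood(subbands, sigma_speech):
        return True
    return ll_noise < log_likelihood(subbands, sigma_unvoiced)


def runs(labels):
    """Yield (start, end, value) for each run of equal labels; end is exclusive."""
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            yield start, i, labels[start]
            start = i


def smooth(labels, min_speech=8, min_noise=4):
    """Step (5): relabel spans shorter than the minimum span lengths."""
    out = list(labels)
    # Assumed order: fill short interior noise gaps first ...
    for start, end, value in list(runs(out)):
        if not value and end - start < min_noise and start > 0 and end < len(out):
            out[start:end] = [True] * (end - start)
    # ... then drop speech spans that are still too short.
    for start, end, value in list(runs(out)):
        if value and end - start < min_speech:
            out[start:end] = [False] * (end - start)
    return out
```

Here `True` marks a speech frame and `False` a noise frame; the defaults of 8 and 4 frames are the minimum spans given in step (5).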
6. The experimental results and conclusions

The experiments are carried out under different noise conditions. First, the speech signal is sampled at 11.025 kHz, quantized to 16 bits, mixed with different levels of white noise, and then randomly mixed with colored noise. In all experiments, the speech signals are divided into frames of 220 samples each, and neighboring frames overlap by 50%. Each speech file is marked manually to distinguish the speech endpoints from the noisy background, and these marks are used to measure the accuracy of each endpoint detection method.

We use the Energy and Zero-Crossing Rate (EZCR), Sub-Band Amplitude Variance (SBAV), Wavelet Coefficient Variance (WCV) and Synthesis Implementation (SI) methods to detect the speech signal endpoints. The endpoint detection results of 160 speech sentences using the four methods are shown in Table 2. From the table we can see that, as the noise interference increases, the correct detection rate of the EZCR method decreases rapidly, from 97.9 for clean speech to 64.0 at 0 dB. The SBAV method is suited to signals with white noise but not to colored noise. Its advantage is that it is simple and fast. The WCV method performs much better than the EZCR and SBAV methods in white and colored noise environments. Its disadvantage is the complexity of the algorithm. The SI method makes full use of the merits of the SBAV and WCV methods and selects one of them by judging the noise type. Thus, it obtains the best detection results without wasting system resources. It can meet the demands of endpoint detection in practical applications, such as speech enhancement under strong noise interference and speech recognition.

7. Acknowledgements

The project is sponsored by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry of China ([2004] No.176), Natural Science Foundation of China (No.60472094), Shanxi Province Natural Science Foundation (No.20051039), and Shanxi Province Scientific Research Foundation for University Young Scholars ([2004] No.13). The authors gratefully acknowledge them.

8. References

[1] A.M. Nassar, N.S. Kader and A.M. Refat, Endpoints detection for noisy speech using a wavelet based algorithm, EUROSPEECH'99, Budapest, Kluwer Academic Publishers, pp. 903-906.

[2] J.B. Xu, C.S. Ran, The application of adaptive wavelet transformation in speech signal processing, Computer Engineering and Science, vol. 26, no. 7, 2004.

[3] F. Wang, F. Zheng, Speech detection in non-stationary noise based on 1/f process, Journal of Computer & Technology, vol. 17, no. 1, pp. 327-330, 2002.

[4] B. Wu, X.L. Ren, A Noise Model Based Method for Speech/Noise Discrimination, Journal of Shanghai Jiaotong University, no. 9, Sep. 2004.