1 Young Investigator supported by the MŠZŠ of the Republic of Slovenia and Socrates/Erasmus exchange student
under the multilateral agreement UL D-IV-1/99-JM/Kc. Research was also supported in part by the COST 277 project.
would reflect the differences in frequency
decomposition between the MFCC and WPP
parametrizations, it was decided that no dynamic
information should be included in the feature vectors.
It is well known that delta mel–cepstral features
improve the performance of hidden Markov models,
yet in our experimental setup we aimed instead to
analyze how the inherent differences in the underlying
transformations influence MFCC- and WPP-based
recognition performance. That turned us away from
using deltas.
Comparison of the recognition performance differences
between Slovenian and English SD2 databases were

[Figure: bar chart over the six standard testing types (cdigits, citynames, commands, digits, rwords, yesno); legend: EN_MFCC, EN_WPP; horizontal axis: standard testing type.]
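For reference, the dynamic (delta) features that were deliberately left out of the feature vectors are commonly obtained by a linear regression over neighbouring frames. The sketch below only illustrates that common formulation and is not part of the experimental setup reported here. The function name `delta`, the window of two frames, and the choice to repeat the first and last frame at the edges are assumptions made for the example.

```python
def delta(frames, window=2):
    """Regression-based delta coefficients for a sequence of feature vectors.

    frames : list of equal-length lists (one static feature vector per frame)
    window : number of frames on each side used in the regression (assumed value)
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    n = len(frames)
    # Normalisation term of the regression: 2 * sum_{k=1..window} k^2
    denom = 2.0 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        vec = []
        for d in range(len(frames[t])):
            acc = 0.0
            for k in range(1, window + 1):
                ahead = frames[min(t + k, n - 1)][d]   # repeat last frame at the end
                behind = frames[max(t - k, 0)][d]      # repeat first frame at the start
                acc += k * (ahead - behind)
            vec.append(acc / denom)
        out.append(vec)
    return out


# A linearly increasing coefficient has unit slope away from the edges.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
print(delta(ramp)[2])
```

Appending such vectors to the static MFCC or WPP coefficients would double the feature dimension, which is one reason the two parametrizations are easier to compare without them.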
content of the signal. This is due to the non-optimal
separability of the conjugate mirror filter that implements

[Bar chart; horizontal axis: standard testing type (cdigits, citynames, commands, digits, rwords, yesno).]
Figure 1 Word error rates for six standard tests on the
Slovenian SD2 using the mel–cepstral (MFCC) and
wavelet (WPP) features.

Additionally, we experienced a problem when we
applied a threshold to small energy values before
taking the log, which is followed by de-correlation with
the wavelet transform. The log tends to boost small
values. Since these values presumably belong to noise,
they represent additional data that the model has to
absorb. This possibly led to degraded overall
recognition performance. The empirical threshold
we used with the Slovenian SD2 did not work well
for the English SD2. HTK could not cope with the
English WPP features calculated with thresholding and
reported an "overpruning" error, which was remedied
by removing the thresholding.
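The effect of the log on small energies, and of flooring them first, can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the code used in the experiments: the function name `log_energies`, the sample energies, and the floor value of 1e-4 are all hypothetical, since the empirical threshold itself is not given in the text.

```python
import math


def log_energies(energies, floor=None):
    """Natural log of subband energies, optionally flooring small values first.

    energies : iterable of positive energy values
    floor    : hypothetical threshold; energies below it are raised to it
    Without a floor, an energy of exactly 0 raises ValueError in math.log.
    """
    out = []
    for e in energies:
        if floor is not None:
            e = max(e, floor)  # clamp small (presumably noise) energies
        out.append(math.log(e))
    return out


energies = [1e-8, 1e-3, 1.0]               # made-up subband energies
print(log_energies(energies))              # tiny energy becomes a large negative value
print(log_energies(energies, floor=1e-4))  # flooring limits that range
```

A floor that suits one database compresses or distorts the low-energy range of another if the recording levels differ, which is consistent with the threshold not transferring from the Slovenian to the English SD2.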
In conclusion, despite the preliminary stage of our
experimental setup in the field of non-linear speech
analysis, the results confirmed the hypothesis that
wavelets have potential in automatic speech
recognition. Further work and improvements should
incorporate delta and delta–delta
coefficients. A phoneme classification experiment
within and between languages could also be considered
in order to give additional information on the specific
properties of the parameterization techniques. Since
SpeechDat2 is a noisy telephone database,
wavelet de-noising could offer a solid
foundation for increasing the robustness of the wavelet
parameterization method to noise and for further
improving the recognition results.
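As an illustration of the wavelet de-noising idea, the sketch below applies soft thresholding to the detail coefficients of a single-level Haar transform. This is a simplified stand-in chosen for brevity, not a method evaluated in this work: the Haar filter, the single decomposition level, the function names, and the threshold value are all assumptions made for the example.

```python
import math


def haar_forward(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Inverse of haar_forward."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x


def soft_threshold(coeffs, t):
    """Shrink each coefficient towards zero by t; values within [-t, t] become 0."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise(x, t):
    """Soft-threshold the Haar detail coefficients and reconstruct the signal."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, t))


signal = [1.0, 1.0, 4.0, 4.2]   # made-up samples with a small fluctuation
print(denoise(signal, 1.0))     # small detail coefficients are suppressed
```

In practice the threshold would have to be estimated from the noise level of the recordings, and a longer filter with several decomposition levels would likely be preferred over Haar.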