2.3. Task
Since GHDSS is a linear separation technique, it
does not handle nonlinearly mixed signals. Meanwhile,
a model representation for noisy nonlinear ICA based
on a maximum-likelihood formulation has been proposed [3].
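As an illustration of why a linear technique such as GHDSS cannot undo a nonlinear mixture, consider a minimal numpy sketch (this is a toy example, not the GHDSS algorithm itself): a linear mixture x = A s is exactly inverted by the demixing matrix W = A^-1, but the same W fails once a nonlinearity enters the mixing process.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.standard_normal((2, 1000))   # two toy source signals
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])           # hypothetical mixing matrix

# Linear mixture: the demixing matrix W = A^-1 recovers the sources exactly.
x_lin = A @ s
W = np.linalg.inv(A)
assert np.allclose(W @ x_lin, s)

# Nonlinear mixture: the same linear demixing no longer recovers the sources.
x_nl = np.tanh(A @ s)
err = np.abs(W @ x_nl - s).max()
print(err > 0.1)  # True: linear separation fails on nonlinearly mixed signals
```

This is the limitation that motivates considering noisy nonlinear ICA models alongside GHDSS.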
Fig. 1. Proposed DNN model
4. EVALUATION

4.1. Audio Dataset
For the training and test stages we use the JNAS
dataset, which includes almost 70,000 Japanese speech
sentences from different speakers, at different
signal-to-noise ratios and of different lengths.
The dataset was prepared using a Hearbo robot
equipped with an 8-channel microphone array (Fig. 2).
A speech sentence was sampled every 10 deg. around
the robot, starting at 0 deg., and the fan noise of
the robot was used as the noise input.

Fig. 2. Hearbo

4.2. Training the DNN
For our experiments, we extract 27 MFB dimensions
from the audio data and use a sliding window of 11
frames, so the input is 297 dimensions per channel
and the output is 297 dimensions. For the first
experiment we use the 8-channel noisy data, so the
input becomes 2376 dimensions; for the second
experiment, which uses the GHDSS-HRLE cleaned output
as input, the DNN input becomes 594 dimensions.

We evaluated multiple types of DNN, and found the
most suitable structure to be three hidden layers
of 1600, 800, and 400 neurons, respectively. As
training data, we use random samples from the JNAS
dataset, without information about the speaker
location angle.

4.3. Results
We show the results of our experiments in Fig. 3,
which gives the accuracy rate of speech recognition
by the Julius engine for audio inputs from different
directions. We can see that the GHDSS-HRLE and DNN
methods achieve almost the same average result.
However, applying the DNN for denoising after a
GHDSS-HRLE preprocessing step further improves the
accuracy rate.

Fig. 3. Results

5. CONCLUSION
In this paper, we proposed an SSS method using a DNN.
As shown in the results, the combined DNN + GHDSS-HRLE
method achieved a higher recognition rate than either
method alone. Moreover, it is worth remarking that the
DNN achieves the effect of two different processes in a
single DNN pass, using no more information than the
noisy input and the clean target.

ACKNOWLEDGEMENTS
This work has been supported by MEXT Grant-in-Aid for
Scientific Research (A) 15H01710.

REFERENCES
[1] K. Nakadai, T. Lourens, H. G. Okuno, and H. Kitano,
"Active audition for humanoid," Proc. Natl. Conf. Artif.
Intell., pp. 832-839, 2000.
[2] K. Nakadai, G. Ince, K. Nakamura, and H. Nakajima,
"Robot audition for dynamic environments," Proc. 2012
IEEE Int. Conf. Signal Process. Commun. Comput.
(ICSPCC), pp. 125-130, 2012.
[3] S. Maeda and S. Ishii, "A noisy nonlinear independent
component analysis," Proc. 2004 14th IEEE Signal Process.
Soc. Workshop Mach. Learn. Signal Process., pp. 173-182,
2004.
[4] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed,
N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N.
Sainath, and B. Kingsbury, "Deep neural networks for
acoustic modeling in speech recognition," IEEE Signal
Process. Mag., vol. 29, no. 6, pp. 82-97, 2012.
[5] X. Feng, Y. Zhang, and J. Glass, "Speech feature
denoising and dereverberation via deep autoencoders for
noisy reverberant speech recognition," Proc. IEEE ICASSP,
pp. 1759-1763, 2014.
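The feature and network dimensions described in Sec. 4.2 can be checked with a small numpy sketch. The layer sizes (1600/800/400) and feature counts (27 MFB dimensions, 11-frame window) come from the text; the random weights, ReLU activations, and helper names are illustrative assumptions, not the authors' trained implementation.

```python
import numpy as np

MFB_DIMS = 27    # mel filter bank dimensions per frame (Sec. 4.2)
WINDOW = 11      # sliding window of 11 frames
FRAME_DIMS = MFB_DIMS * WINDOW   # 297 dimensions per channel

def stack_frames(feats, window=WINDOW):
    """Stack a sliding window of frames into one vector per position.
    feats: (n_frames, MFB_DIMS) array for one channel."""
    n = feats.shape[0] - window + 1
    return np.stack([feats[i:i + window].ravel() for i in range(n)])

# Input sizes for the two experiments reported in Sec. 4.2:
in_noisy = FRAME_DIMS * 8   # 8-channel noisy input -> 2376 dims
in_ghdss = FRAME_DIMS * 2   # GHDSS-HRLE cleaned input -> 594 dims (= 297 x 2)
out_dims = FRAME_DIMS       # 297-dimension output

# Forward pass through the reported structure (hidden layers 1600/800/400),
# with small random weights standing in for trained ones.
rng = np.random.default_rng(0)
sizes = [in_noisy, 1600, 800, 400, out_dims]
weights = [rng.standard_normal((a, b)) * 0.01 for a, b in zip(sizes, sizes[1:])]

def forward(x, weights):
    for W in weights[:-1]:
        x = np.maximum(x @ W, 0.0)   # ReLU hidden layers (an assumption)
    return x @ weights[-1]           # linear output layer

feats = rng.standard_normal((100, MFB_DIMS))          # fake MFB features
x = np.concatenate([stack_frames(feats)] * 8, axis=1) # fake 8-channel input
y = forward(x, weights)
print(x.shape, y.shape)  # (90, 2376) (90, 297)
```

With 100 frames and an 11-frame window there are 90 window positions, so the input matrix is (90, 2376) and the network maps each row to a 297-dimensional output, matching the dimensions stated in Sec. 4.2.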