
Blind Multichannel System Identification

with Applications in Speech Signal Processing


R. M. Nickel
Department of Electrical Engineering, 121 EE West,
The Pennsylvania State University, University Park, PA 16802
Email: rmn10@psu.edu
Abstract
We present a new approach for blind multichannel system
identification. The approach relies on the existence of so-called
exclusive activity periods (EAPs) in the source signals. EAPs are
time intervals during which only one source is active and all other
sources are inactive (i.e. zero). The existence of EAPs is not
guaranteed for arbitrary signal classes. EAPs occur very frequently,
however, in recordings of conversational speech. The methods proposed
in this paper show how EAPs can be exploited to improve the
performance of blind multichannel system identification systems in
speech processing applications. We show that for modestly complex
tasks the proposed method achieves an improvement of over 10 dB in
signal-to-interference ratio over conventional techniques.
1 Introduction
The general goal in blind multichannel system identification is to
estimate the transfer function of a multichannel system solely based
on its output signals and without any specific knowledge about the
input signals to the system. A practical example for the use of blind
multichannel system identification is given by the following
scenario: We are recording an acoustic scene with an array of
microphones. If we want to isolate the voice of a single speaker out
of a mixture of signals from different sources then it is implicitly
necessary to estimate the transmission properties (or equivalently
the inverse transmission properties) of all channels between all
microphones and all sources.
Historically, the first successful solutions to such problems were of
instantaneous mixture type [1], i.e. cases in which the transfer
function was merely a matrix of constants. A variety of tools, known
as independent component analysis (ICA) methods (Comon [2]), have
been developed for these cases. The general criterion behind ICA
methods is to achieve a maximization of a measure of independence
between the assumed input signals to the system.
Solutions to the more general convolutive mixture case are
significantly more complicated. Most solutions can be classified as
either time domain approaches (see [3] and the references therein) or
frequency domain approaches (see [4]). More specifically, we have to
distinguish between the domain in which we model the mixing process
(mixing domain) and the domain in which we model the statistics of
the source signals (source domain). A choice of either time or
frequency for each of these domains has significant advantages and
disadvantages (see [4]).
In this paper we are proposing an alternative approach
that does not explicitly rely on an independence assumption
between sources. Instead, we are assuming the existence
of exclusive activity periods (EAPs). EAPs are time inter-
vals during which only one source is active and all other
sources are silent. The existence of EAPs is not guaranteed
for arbitrary signal classes, but EAPs occur very frequently
in recordings of conversational speech.
2 Methods
A block diagram that depicts the considered scenario is shown in
figure 1. We assume that we have $M$ unknown source signals $x_i[n]$
for $i = 1 \ldots M$. The transmission path between source $i$ and
receiver $j$ is described by the transfer function of a linear
time-invariant system with impulse response $g_{ij}[n]$. The
resulting $M$ observation signals $y_j[n]$ are generated according
to:
\[ y_j[n] = \sum_{i=1}^{M} \left( \sum_{k=0}^{\infty} g_{ij}[k] \, x_i[n-k] \right) \quad \text{for } j = 1 \ldots M. \tag{1} \]
Equation (1) can also be expressed in the z-domain as
Y(z) = G(z) X(z), (2)
in which $X(z)$ is the z-transform of the multichannel signal
$x[n] = [\, x_1[n] \; x_2[n] \; \ldots \; x_M[n] \,]^T$, $Y(z)$ is the
z-transform of the multichannel signal
$y[n] = [\, y_1[n] \; y_2[n] \; \ldots \; y_M[n] \,]^T$, and $G(z)$ is
a matrix with the z-transforms of the impulse responses $g_{ij}[n]$
for all $i$ and $j$.

Proceedings of the 2005 International Conference on Computational Intelligence for Modelling, Control and Automation, and International Conference on
Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC05)
0-7695-2504-0/05 $20.00 2005 IEEE
Authorized licensed use limited to: Feng Chia University. Downloaded on March 15,2010 at 07:46:27 EDT from IEEE Xplore. Restrictions apply.
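The mixing model of equation (1) can be illustrated with a small sketch. It assumes finite impulse responses (so the inner sum is truncated to the available taps) and sources that are zero before n = 0. The function name `convolutive_mix` and the toy filter values are hypothetical and chosen only for illustration.

```python
def convolutive_mix(sources, g):
    """Mix M sources into M observations following equation (1).

    sources[i] is the sample list of source i (0-based index).
    g[i][j] is the finite impulse response from source i to receiver j
    (an assumption: equation (1) itself does not limit the filter length).
    """
    M = len(sources)
    N = len(sources[0])
    observations = []
    for j in range(M):
        y = [0.0] * N
        for i in range(M):
            for n in range(N):
                for k, tap in enumerate(g[i][j]):
                    if n - k >= 0:  # sources are taken as zero before n = 0
                        y[n] += tap * sources[i][n - k]
        observations.append(y)
    return observations


# Two toy sources; the second one is silent during the first three samples.
x = [[1.0, 2.0, 3.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0, -1.0]]
g = [[[1.0, 0.5], [0.3]],        # g_11, g_12
     [[0.2], [1.0, -0.4]]]       # g_21, g_22
y = convolutive_mix(x, g)
```

During the first three samples only source 1 is active, so both observations are filtered copies of the same signal. This is the situation that the exclusive activity periods discussed later rely on.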
The goal in blind system identification is to find a matrix of
transfer functions $H(z)$ such that $\hat{X}(z)$ given by
\[ \hat{X}(z) = H(z) \, Y(z) \tag{3} \]
is (in some appropriate metric) as close to $X(z)$ as possible.
Unfortunately, since we cannot measure the true source signals
$x[n]$, it is generally not possible to explicitly minimize the
deviation. Instead, the most commonly used criterion to find $H(z)$
is to choose it such that the components of the resulting source
reconstruction $\hat{x}[n]$ are statistically as independent as
possible.
In this paper we propose an alternative method that minimizes an
estimate of the deviation between $x[n]$ and $\hat{x}[n]$. The
approach can be divided into three steps:
1. Find a set of time intervals $[n_1, n_2]$ during which only one
source is active and all other sources are silent. We refer to such
time intervals as exclusive activity periods (EAPs).
2. Find a set of transfer functions that deconvolve the sources
during EAPs.
3. Construct $H(z)$ by combining the results from different EAPs from
different sources.
One of the caveats of the proposed approach is that it is
dependent on the existence of exclusive activity periods.
The existence of EAPs is not guaranteed for arbitrary signal
classes. EAPs do occur, however, very frequently in record-
ings of conversational speech, which makes the proposed
method particularly interesting for the solution of the blind
speech separation problem.
2.1 EAP Estimation for Conversational Speech Signals
An exclusive activity period is a time during which only one source
$x_i[n]$ is active and all other sources $x_k[n]$ are silent, i.e.
$x_k[n] = 0$ for $k \ne i$. The estimation of exclusive activity
periods for speech sources $x_i[n]$ can be based on the (almost)
periodic nature of vocalic sounds. If only a single person is
speaking then all observations $y_j[n]$ exhibit time intervals with a
periodic structure. When multiple persons are speaking the
periodicity is generally destroyed [5].
A robust short-time periodicity measure was proposed by Medan et
al. [6]. They consider the similarity between two adjacent
observation segments of length $k$:
\[ s_1^j[n,k] = [\, y_j[n-k] \; \ldots \; y_j[n-2] \; y_j[n-1] \,]^T \tag{4} \]
\[ s_2^j[n,k] = [\, y_j[n] \; y_j[n+1] \; \ldots \; y_j[n+k-1] \,]^T . \tag{5} \]
[Figure 1: "General Model Scenario". The sources $x_1[n], x_2[n], \ldots, x_M[n]$ pass through the mixture model (impulse responses $g_{ii}[n]$, $g_{ij}[n]$, $g_{ji}[n]$, $g_{jj}[n]$) and produce the observations $y_1[n], y_2[n], \ldots, y_M[n]$.]
Figure 1. A block diagram of the mixing scenario described by
equation (1). The $M$ unknown signal sources are labelled with
$x_i[n]$. The $M$ observations are labelled with $y_j[n]$.
A correlation measure $\mathrm{NCOR}_j[n,k]$ is defined through a
normalized inner product of vectors $s_1^j[n,k]$ and $s_2^j[n,k]$:
\[ \mathrm{NCOR}_j[n,k] = \frac{ s_1^j[n,k]^T \, s_2^j[n,k] }{ \| s_1^j[n,k] \| \; \| s_2^j[n,k] \| } . \tag{6} \]
The normalization ensures that the correlation measure is bounded
between zero and one, i.e. $0 \le \mathrm{NCOR}_j[n,k] \le 1$. The
correlation measure is equal to one at the true period $p$ of a
perfectly periodic signal. Less than perfectly periodic signals yield
correlation values less than one. As a consequence, we can define a
short-time periodicity measure as:
\[ \mathrm{STPM}_j[n] = \max_{p_{\min} \le k \le p_{\max}} \{\, \mathrm{NCOR}_j[n,k] \,\} . \tag{7} \]
The search range for the maximum should be bounded by the typical
pitch range of human speech (50 Hz to 500 Hz). For observation
signals that are sampled with sampling frequency $F_s$ we have:
\[ p_{\min} = F_s \,/\, 500\ \text{Hz} \quad \text{and} \quad p_{\max} = F_s \,/\, 50\ \text{Hz} . \tag{8} \]
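Equations (4) to (8) can be combined into a short sketch. The function names `ncor` and `stpm`, the integer rounding of the search range, and the handling of all-zero segments are assumptions made for illustration; they are not specified in the text.

```python
import math


def ncor(y, n, k):
    """Equation (6) for one channel y at time n and lag k."""
    s1 = y[n - k:n]        # y[n-k] ... y[n-1], equation (4)
    s2 = y[n:n + k]        # y[n] ... y[n+k-1], equation (5)
    num = sum(a * b for a, b in zip(s1, s2))
    den = math.sqrt(sum(a * a for a in s1)) * math.sqrt(sum(b * b for b in s2))
    return num / den if den > 0.0 else 0.0   # assumption: silence maps to 0


def stpm(y, n, fs):
    """Equation (7) with the search range of equation (8)."""
    p_min = int(fs / 500)  # shortest period, 500 Hz pitch
    p_max = int(fs / 50)   # longest period, 50 Hz pitch
    if n - p_max < 0 or n + p_max > len(y):
        raise ValueError("not enough samples around n for the search range")
    return max(ncor(y, n, k) for k in range(p_min, p_max + 1))


# A perfectly periodic test tone: 200 Hz at fs = 8000 Hz, i.e. period 40.
fs = 8000
tone = [math.sin(2.0 * math.pi * m / 40.0) for m in range(400)]
value = stpm(tone, 200, fs)
```

For the periodic test tone the maximum is reached at the lag that equals the period (or a multiple of it), so `value` is numerically one.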
A second feature that correlates well with EAPs is the so-called
short-time zero crossing rate [7]:
\[ \mathrm{STZC}_j[n] = \sum_{m=n-L}^{n+L} \big| \, \mathrm{sign}(y_j[m]) - \mathrm{sign}(y_j[m-1]) \, \big| . \tag{9} \]
In our notation $\mathrm{sign}(x)$ is equal to $+1$ for $x \ge 0$ and
$-1$ for $x < 0$. The zero crossing rate counts the number of
transitions from positive samples to negative samples within the
range $(n - L - 1) \ldots (n + L)$. The range length is usually
chosen as $L = F_s \cdot 10$ msec. Typically, the zero crossing rate
is low for EAP sections and high otherwise. For the STZC measure to
work properly it is important that possible quantization offsets in
the recorded speech signal are removed prior to processing.
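A minimal sketch of equation (9) follows. The helper `remove_offset`, which subtracts the signal mean, is only one assumed way of removing a quantization offset; the text does not prescribe a method. Note that with the sign convention above every sign change contributes 2 to the sum.

```python
def sign(v):
    """Sign convention of the text: +1 for v >= 0, -1 otherwise."""
    return 1 if v >= 0 else -1


def remove_offset(y):
    """Subtract the mean (assumed offset removal, not from the text)."""
    mean = sum(y) / len(y)
    return [v - mean for v in y]


def stzc(y, n, L):
    """Equation (9): sum of |sign(y[m]) - sign(y[m-1])| for m = n-L ... n+L."""
    if n - L - 1 < 0 or n + L >= len(y):
        raise ValueError("window exceeds the signal")
    return sum(abs(sign(y[m]) - sign(y[m - 1])) for m in range(n - L, n + L + 1))


samples = [1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0]
shifted = [v + 5.0 for v in samples]           # same signal with a DC offset

count_clean = stzc(samples, 3, 2)
count_shifted = stzc(shifted, 3, 2)
count_fixed = stzc(remove_offset(shifted), 3, 2)
```

The offset hides every crossing in `shifted`, which is why the offset has to be removed before the measure is computed.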
A normalized short-time zero crossing measure $\mathrm{NZCM}_j[n]$ is
constructed with the maximum $\mathrm{ZCmax}_j$ and the minimum
$\mathrm{ZCmin}_j$ of $\mathrm{STZC}_j[n]$ over all $n$:
\[ \mathrm{NZCM}_j[n] = \frac{ \mathrm{STZC}_j[n] - \mathrm{ZCmax}_j }{ \mathrm{ZCmax}_j - \mathrm{ZCmin}_j } . \tag{10} \]
The identification of EAP candidates is done as follows: 1) find all
times $n$ for which $\mathrm{STPM}_j[n] \ge 0.7$ for all observations
$j = 1 \ldots M$, 2) expand the sections so found in forward and
backward direction until $\mathrm{STPM}_j[n] < 0.6$, 3) remove times
for which $\mathrm{NZCM}_j[n] > 0.5$, and 4) retain only the
intersection of all EAP candidates across all channels
$j = 1 \ldots M$. An example for $\mathrm{STPM}_j[n]$,
$\mathrm{NZCM}_j[n]$, and the resulting set of EAP candidate sections
is shown in figure 3.
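The four steps above can be sketched as a hysteresis threshold followed by an intersection. This is one possible reading of the procedure, not a definitive implementation: the function name `eap_candidates`, the per-channel expansion, and the parameter names are assumptions, and the default thresholds are simply the values quoted in the text.

```python
def eap_candidates(stpm, nzcm, hi=0.7, lo=0.6, zc_limit=0.5):
    """Return a boolean mask over time marking EAP candidate samples.

    stpm[j][n] and nzcm[j][n] hold the two features for channel j.
    """
    M, N = len(stpm), len(stpm[0])
    # Step 1: seed times where every channel exceeds the high threshold.
    seed = [all(stpm[j][n] >= hi for j in range(M)) for n in range(N)]
    keep = [True] * N
    for j in range(M):
        mask = seed[:]
        # Step 2: grow the seeds forward, then backward, while STPM >= lo.
        for n in range(1, N):
            if mask[n - 1] and stpm[j][n] >= lo:
                mask[n] = True
        for n in range(N - 2, -1, -1):
            if mask[n + 1] and stpm[j][n] >= lo:
                mask[n] = True
        for n in range(N):
            # Step 3: drop times with a high zero crossing measure.
            if nzcm[j][n] > zc_limit:
                mask[n] = False
            # Step 4: intersect the candidates of all channels.
            keep[n] = keep[n] and mask[n]
    return keep


stpm_demo = [[0.10, 0.65, 0.80, 0.65, 0.50, 0.65],
             [0.20, 0.62, 0.90, 0.70, 0.61, 0.30]]
nzcm_demo = [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
             [0.0, 0.0, 0.0, 0.9, 0.0, 0.0]]
mask_demo = eap_candidates(stpm_demo, nzcm_demo)
```

In the toy data only n = 2 passes the high threshold in both channels. The section grows to n = 1 through 3 or 4 depending on the channel, n = 3 is removed by the zero crossing test in the second channel, and the intersection leaves n = 1 and n = 2.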
2.2 Blind EAP Deconvolution
In this section we discuss the subproblem of blind system
identification under an exclusive activity assumption, i.e. we assume
that we have identified a time interval $[n_1, n_2]$ during which
only source $x_i[n]$ is active and all other sources are silent, i.e.
$x_k[n] = 0$ for $k \ne i$. Under the EAP assumption we may attempt
to reconstruct source $x_i[n]$ from each observation $y_j[n]$ via an
appropriately chosen inverse filter $\hat{h}_{ij}[n]$:
\[ \hat{x}_i^j[n] = \sum_{k=0}^{P} \hat{h}_{ij}[k] \, y_j[n-k] . \tag{11} \]
Ideally $x_i[n] = \hat{x}_i^j[n]$ for all $j = 1 \ldots M$.
Practically, however, we have $\hat{x}_i^k[n] \ne \hat{x}_i^j[n]$ for
$k \ne j$ due to noise, imperfect estimation of the EAPs, improper
choice of $P$, non-minimum phase properties of $g_{ij}[n]$, and so
forth.
An estimate $E_i$ of the reconstruction error can be defined with
\[ E_i = \sum_{j=1}^{M} \sum_{n} \big| \, \hat{x}_i^j[n] - \bar{x}_i[n] \, \big|^2 \tag{12} \]
and
\[ \bar{x}_i[n] = \frac{1}{M} \sum_{j=1}^{M} \hat{x}_i^j[n] . \tag{13} \]
It is readily seen from equation (12) that a perfect reconstruction
with $x_i[n] = \hat{x}_i^j[n]$ yields a minimum error estimate of
$E_i = 0$. Unfortunately, $x_i[n] = \hat{x}_i^j[n]$ may not be the
only solution that satisfies $E_i = 0$ (e.g. if the $g_{ij}[n]$ are
linearly dependent). One may hope, however, that (if the $g_{ij}[n]$
are sufficiently different) a global minimization of $E_i$ will lead
to good estimates for the $\hat{h}_{ij}[n]$ for $j = 1 \ldots M$.
The computation of the global minimum of $E_i$ is aided by the
following notation. We define:
\[ \hat{x}_i^j = [\, \hat{x}_i^j[n_1+P] \; \ldots \; \hat{x}_i^j[n_2-1] \; \hat{x}_i^j[n_2] \,]^T , \]
\[ Y_j = \begin{bmatrix}
y_j[n_1] & y_j[n_1+1] & \cdots & y_j[n_1+P] \\
y_j[n_1+1] & y_j[n_1+2] & \cdots & \vdots \\
\vdots & \vdots & \ddots & \vdots \\
y_j[n_2-P] & \cdots & \cdots & y_j[n_2]
\end{bmatrix} , \]
and
\[ \hat{h}_i^j = [\, \hat{h}_{ij}[P] \; \ldots \; \hat{h}_{ij}[1] \; \hat{h}_{ij}[0] \,]^T . \tag{14} \]
Equation (11) can be rewritten as $\hat{x}_i^j = Y_j \, \hat{h}_i^j$
and:
\[ \bar{x}_i = \frac{1}{M} \sum_{j=1}^{M} \hat{x}_i^j = \frac{1}{M} \sum_{j=1}^{M} Y_j \, \hat{h}_i^j . \tag{15} \]
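The matrix form of equation (11) can be checked numerically. The sketch below uses NumPy; the helper names `data_matrix` and `reversed_taps` as well as the toy numbers are assumptions made for illustration.

```python
import numpy as np


def data_matrix(y, n1, n2, P):
    """Matrix Y_j of equation (14): one row [y[n-P], ..., y[n]] per n = n1+P ... n2."""
    return np.array([y[n - P:n + 1] for n in range(n1 + P, n2 + 1)])


def reversed_taps(h):
    """Tap vector of equation (14): [h[P], ..., h[1], h[0]]."""
    return np.asarray(h, dtype=float)[::-1]


y_obs = np.array([1.0, 4.0, 9.0, 16.0, 25.0, 36.0])   # toy observation
h_inv = [1.0, -0.5, 0.25]                              # toy inverse filter, P = 2
n1, n2, P = 0, 5, 2

Y = data_matrix(y_obs, n1, n2, P)
x_matrix = Y @ reversed_taps(h_inv)                    # matrix form of equation (11)
x_direct = np.convolve(y_obs, h_inv)[n1 + P:n2 + 1]    # direct evaluation of the sum
```

Both results cover the samples n = n1 + P to n2, which is why the vector in equation (14) starts at n1 + P: earlier samples would need observations from before the EAP.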
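The error estimate (12) becomes:
\[ E_i = \sum_{j=1}^{M} \Big\| \, Y_j \hat{h}_i^j - \frac{1}{M} \sum_{k=1}^{M} Y_k \hat{h}_i^k \, \Big\|^2 \tag{16} \]
\[ \phantom{E_i} = \sum_{j=1}^{M} [\, Y_j \hat{h}_i^j \,]^T \, Y_j \hat{h}_i^j \; - \; \frac{1}{M} \sum_{j=1}^{M} \sum_{k=1}^{M} [\, Y_k \hat{h}_i^k \,]^T \, Y_j \hat{h}_i^j . \]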
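We define the matrices $R_{jk} = [\, Y_k \,]^T \, Y_j$ and
\[ R_F = \begin{bmatrix}
R_{11} & R_{12} & \cdots & R_{1M} \\
R_{21} & R_{22} & & \vdots \\
\vdots & & \ddots & \vdots \\
R_{M1} & \cdots & \cdots & R_{MM}
\end{bmatrix} , \tag{17} \]
\[ R_D = \begin{bmatrix}
R_{11} & 0 & \cdots & 0 \\
0 & R_{22} & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & R_{MM}
\end{bmatrix} , \tag{18} \]
and
\[ \hat{H}_i = \big[\, [\hat{h}_i^1]^T \; [\hat{h}_i^2]^T \; \ldots \; [\hat{h}_i^M]^T \,\big]^T . \tag{19} \]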
Using equations (14) to (19) we can compactly write the error
estimate as
\[ E_i = \hat{H}_i^T \, \Big[\, R_D - \frac{1}{M} R_F \,\Big] \, \hat{H}_i . \tag{20} \]
In order to avoid the trivial minimization of equation (20) (with
$\hat{H}_i = 0$) we constrain the solution to
\[ \| \bar{x}_i \|^2 = \frac{1}{M^2} \, \hat{H}_i^T \, R_F \, \hat{H}_i = 1 . \tag{21} \]
We have thus reformulated the problem into that of finding the vector
$\hat{H}_i$ that minimizes $E_i$ subject to equation (21). It is
readily shown with Lagrange multipliers that the solution to the
above problem is provided by one of the generalized eigenvectors
$\psi_m$ of matrices $R_D$ and $R_F$:
\[ \lambda_m \, R_D \, \psi_m = R_F \, \psi_m . \tag{22} \]
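The Lagrange multiplier argument can be spelled out as follows. This is a sketch of the standard derivation, assuming that $R_D$ and $R_F$ are symmetric; the multiplier symbol $\mu$ and the shorthand $H$ for $\hat{H}_i$ are introduced here only for illustration.

```latex
% Lagrangian of "minimize (20) subject to (21)":
\mathcal{L}(H, \mu) = H^T \Big[ R_D - \tfrac{1}{M} R_F \Big] H
                      - \mu \Big( \tfrac{1}{M^2} H^T R_F H - 1 \Big)

% Setting the gradient with respect to H to zero:
2 \Big[ R_D - \tfrac{1}{M} R_F \Big] H - \tfrac{2\mu}{M^2} R_F H = 0
\quad \Longrightarrow \quad
R_D H = \Big( \tfrac{1}{M} + \tfrac{\mu}{M^2} \Big) R_F H

% With \lambda = ( 1/M + \mu/M^2 )^{-1} this is the generalized eigenproblem
\lambda \, R_D \, H = R_F \, H

% Multiplying from the left by H^T and using the constraint H^T R_F H = M^2:
H^T R_D H = \tfrac{M^2}{\lambda}
\quad \Longrightarrow \quad
E_i = \tfrac{M^2}{\lambda} - \tfrac{1}{M} M^2 = M \Big( \tfrac{M}{\lambda} - 1 \Big)
```

Every stationary point is therefore a generalized eigenvector, and the error decreases as the associated eigenvalue grows.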
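We assume that the eigenvalues $\lambda_m$ are sorted in decreasing
order $\lambda_1 \ge \lambda_2 \ge \lambda_3 \ge \lambda_4 \ldots$
and that the eigenvectors are normalized such that
$\frac{1}{M^2} \psi_m^T R_F \psi_m = 1$ for all $m$, as required by
equation (21). Choosing $\psi_m$ as the solution leads to an error
estimate of $E_i = M \, ( \frac{M}{\lambda_m} - 1 )$. The optimal
solution, i.e. the one that minimizes $E_i$, is thus given by
$\hat{H}_i = \psi_1$ and $E_i = M \, ( \frac{M}{\lambda_1} - 1 )$.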
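The eigenvalue solution can be tried on synthetic data. The sketch below is a minimal NumPy illustration under stated assumptions: the function names are hypothetical, the blocks of $R_F$ are formed as $Y_j^T Y_k$ so that the matrix is symmetric, and the toy channels are all-pole filters so that an exact inverse filter of order $P = 1$ exists.

```python
import numpy as np


def data_matrix(y, n1, n2, P):
    """Rows [y[n-P], ..., y[n]] for n = n1+P ... n2, as in equation (14)."""
    return np.array([y[n - P:n + 1] for n in range(n1 + P, n2 + 1)])


def eap_deconvolve(Y_list):
    """Solve lambda * R_D psi = R_F psi and return the top eigenpair and E_i."""
    M = len(Y_list)
    q = Y_list[0].shape[1]                      # q = P + 1 taps per channel
    R_F = np.block([[Yj.T @ Yk for Yk in Y_list] for Yj in Y_list])
    R_D = np.zeros_like(R_F)
    for j in range(M):                          # keep only the diagonal blocks
        R_D[j*q:(j+1)*q, j*q:(j+1)*q] = R_F[j*q:(j+1)*q, j*q:(j+1)*q]
    # The generalized problem is equivalent to R_D^{-1} R_F psi = lambda psi.
    lam, vecs = np.linalg.eig(np.linalg.solve(R_D, R_F))
    m = int(np.argmax(lam.real))
    psi = vecs[:, m].real
    psi = psi * (M / np.sqrt(psi @ R_F @ psi))  # enforce constraint (21)
    E = float(psi @ (R_D - R_F / M) @ psi)      # error estimate (20)
    return psi, float(lam[m].real), E


# Toy EAP: one active source observed through two all-pole channels
# y_j[n] = x[n] - a_j * y_j[n-1], so the exact inverse taps are [1, a_j].
rng = np.random.default_rng(0)
x = rng.standard_normal(300)
a = [0.5, -0.3]
Y_list = []
for a_j in a:
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n] - (a_j * y[n - 1] if n > 0 else 0.0)
    Y_list.append(data_matrix(y, 0, len(x) - 1, 1))

psi, lam1, E = eap_deconvolve(Y_list)
```

In this noise-free case the largest eigenvalue equals $M = 2$ and the error estimate vanishes. Each sub-vector of `psi` is proportional to the reversed inverse taps `[a_j, 1]`, which reflects the scale ambiguity that remains after the normalization.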
2.3 Blind System Identification
As a result of the methods described in sections 2.1 and 2.2 we
obtain an inverse filter estimate $\hat{H}_i$ (with its associated
eigenvalue $\lambda_1$) for each separately identified EAP section.
In a first step we discard all EAP sections (and $\hat{H}_i$) for
which $\log_{10} \frac{M}{M - \lambda_1}$ was greater than a certain
EAP acceptance threshold (EAT, see section 4). In a second step we
use a simple minimum Euclidean distance hierarchical clustering
method [8] to associate each vector $\hat{H}_i$ to one of the $M$
sources. All vectors associated with the same source $k$ are
averaged$^{1}$ (arithmetic mean) into an average eigenvector
$\bar{H}_k$ for each source $k = 1 \ldots M$. By extracting the
corresponding subvectors $\hat{h}_i^j$ in analogy to equation (19):
\[ \bar{H}_i = \big[\, [\hat{h}_i^1]^T \; [\hat{h}_i^2]^T \; \ldots \; [\hat{h}_i^M]^T \,\big]^T , \tag{23} \]
we obtain a complete set of inverse filter vectors $\hat{h}_i^j$:
\[ \hat{h}_i^j = [\, \hat{h}_{ij}[P] \; \ldots \; \hat{h}_{ij}[1] \; \hat{h}_{ij}[0] \,]^T . \tag{24} \]
An estimate for the mixing matrix $G(z)$ from equation (2) is
obtained from:
\[ \big[\, \hat{G}(z) \,\big]_{ij} = \frac{1}{ \sum_{k=0}^{P} \hat{h}_{ij}[k] \, z^{-k} } , \tag{25} \]
where notation $[G]_{ij}$ refers to the element of matrix $G$ in row
$i$ and column $j$. An estimate for the demixing matrix $H(z)$ from
equation (3) can be obtained by numerically inverting $\hat{G}(z)$
via Gaussian elimination. Unfortunately, the inversion process may
introduce unstable poles into the transfer functions of $\hat{H}(z)$.
The production of stable filters can be enforced by mirroring poles
that fall outside of the unit circle back into the inside of the unit
circle. The mirroring process distorts the correct phase response,
but leaves the magnitude response of individual channels intact.
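The pole mirroring step can be illustrated as follows. The sketch assumes that a pole $p$ outside the unit circle is reflected to $1/p^*$ and that the gain is scaled by $1/|p|$ so that the magnitude response is unchanged; this is the usual reflection rule and an assumption here, since the text does not spell out the mapping. The function names are hypothetical.

```python
import cmath


def mirror_poles(poles):
    """Reflect poles outside the unit circle to 1/conj(p).

    Returns the new poles and the gain factor that keeps the magnitude
    response of the all-pole factor prod 1 / (1 - p z^-1) unchanged.
    """
    mirrored, gain = [], 1.0
    for p in poles:
        if abs(p) > 1.0:
            mirrored.append(1.0 / p.conjugate())
            gain /= abs(p)
        else:
            mirrored.append(p)
    return mirrored, gain


def magnitude(poles, gain, w):
    """|H(e^{jw})| for H(z) = gain / prod (1 - p z^-1)."""
    z_inv = cmath.exp(-1j * w)
    h = complex(gain)
    for p in poles:
        h /= (1.0 - p * z_inv)
    return abs(h)


poles = [0.5 + 0.2j, 1.6 - 0.8j]          # the second pole is unstable
stable_poles, gain = mirror_poles(poles)
```

After mirroring all poles lie inside the unit circle and `magnitude(stable_poles, gain, w)` matches `magnitude(poles, 1.0, w)` at every frequency, while the phase differs.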
3 Experiments
Experiments were conducted to verify the performance of the proposed
method. As speech data we used the SI-subset of the TIMIT database
from the Linguistic Data Consortium$^{2}$. The chosen subset consists
of recordings from 630 subjects each uttering 3 phonetically-diverse
sentences$^{3}$. The sentences were recorded with a sampling
frequency of 16 kHz. The signals were low-pass filtered and
down-sampled to 8 kHz prior to processing. All 3 sentences from the
same speaker were concatenated and then truncated to 4 seconds. As a
result we obtained a total of 630 different 4-second-long source
signals $x[n]$.

$^{1}$ The weight for each vector was chosen proportional to the
number of samples contained in the associated EAP section. Longer EAP
sections thus had more weight than shorter ones.
$^{2}$ The data is available at <http://www.ldc.upenn.edu/>.
$^{3}$ None of the sentences are repeated more than once.

[Figure 2: "Source Signals". Two panels, Channel #1 and Channel #2, plotted over Time [sec] from 0 to 4.]
Figure 2. An example of two source signals $x_1[n]$ and $x_2[n]$ from
the TIMIT database. The signals were aligned to have a 30% overlap in
time.

[Figure 3: "Mixed Signals and Features". Two panels, Channel #1 and Channel #2, plotted over Time [sec] from 0 to 4.]
Figure 3. The resulting mixed signals $y_1[n]$ and $y_2[n]$ from the
example in figure 2. The upper dashed line in each axis indicates the
resulting $\mathrm{STPM}_j[n]$ contour (equation (7)) and the lower
dashed line indicates the resulting $\mathrm{NZCM}_j[n]$ contour
(equation (10)). The gray regions indicate the EAP sections that were
estimated from the mixed signals.

[Figure 4: "Demixed Signals EAP Method". Two panels, Channel #1 and Channel #2, plotted over Time [sec] from 0 to 4.]
Figure 4. The resulting demixed signals $\hat{x}_1[n]$ and
$\hat{x}_2[n]$ from the example in figure 2 after application of the
proposed EAP method. Both signals are very close to the original
source signals $x_1[n]$ and $x_2[n]$ depicted in figure 2.

[Figure 5: "Demixed Signals Parra/Spence Method". Two panels, Channel #1 and Channel #2, plotted over Time [sec] from 0 to 4.]
Figure 5. The demixing result of a commonly used conventional method
of blind source separation after L. Parra and C. Spence [9] (see
section 4).
We ran experiments with different source numbers ($M$ = 2, 3, and 4)
and different filter lengths ($P + 1$ = 5, 7, and 10). For each $M$
the available data was randomly split$^{4}$ into
$\lfloor 630/M \rfloor$ groups of $M$ source signals $x_i[n]$. To
simulate conversational speech the signals were partially faded out
to obtain a relative time-overlap between signals of roughly 30% (see
figure 2). The $M$ source signals of each group were mixed with
order-$P$ random minimum phase filters $g_{ij}[n]$ according to
equation (1). The resulting observations $y_j[n]$ for
$j = 1 \ldots M$ were then used to estimate the inverse filter matrix
$\hat{H}(z)$ according to section 2. The reconstructed source signal
estimates $\hat{x}_i[n]$ for $i = 1 \ldots M$ were computed according
to equation (3) via $\hat{X}(z) = \hat{H}(z) \, Y(z)$.

$^{4}$ Every source signal was only used once.
The quality of the estimated model was evaluated with the
Signal-to-Interference Ratio (SIR in [dB]) between the reconstructed
signal $\hat{x}_i[n]$ and the original signal $x_i[n]$:
\[ \mathrm{SIR}_i = \max_{p} \left\{ 10 \log_{10} \frac{ \sum_n | x_i[n] |^2 }{ \sum_n | x_i[n] - \hat{x}_i[n-p] |^2 } \right\} . \tag{26} \]
The evaluation of the SIR was performed under careful con-
sideration of possible numbering permutations between the
original signals and the reconstructions.
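Equation (26) can be sketched as a search over time shifts. The function name `sir_db`, the bounded shift range `max_shift`, and the zero padding outside the estimate are assumptions made for illustration; the text does not state the range of $p$. The sketch does not handle the permutation between originals and reconstructions mentioned above.

```python
import math


def sir_db(x, x_hat, max_shift):
    """Equation (26): best SIR in dB over shifts p = -max_shift ... max_shift."""
    energy = sum(v * v for v in x)
    best = -math.inf
    for p in range(-max_shift, max_shift + 1):
        err = 0.0
        for n in range(len(x)):
            m = n - p
            # Samples outside the estimate are treated as zero (assumption).
            est = x_hat[m] if 0 <= m < len(x_hat) else 0.0
            err += (x[n] - est) ** 2
        if err == 0.0:
            return math.inf                  # perfect reconstruction
        best = max(best, 10.0 * math.log10(energy / err))
    return best


x_true = [1.0, -2.0, 3.0, 0.5, 0.0, 0.0]
# Estimate delayed by one sample, with a small error on one sample.
x_est = [0.0, 1.1, -2.0, 3.0, 0.5, 0.0]
sir = sir_db(x_true, x_est, max_shift=2)
```

The maximum over $p$ makes the measure insensitive to a pure delay of the reconstruction: here the best alignment is found at $p = -1$, where only the single perturbed sample contributes to the error.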
Figures 2 to 4 show an example for an experiment with two sources.
The gray regions in the figures indicate the EAP sections that were
estimated from the mixed signals $y_1[n]$ and $y_2[n]$ from figure 3.
Figure 5 shows the result for a commonly used conventional method of
blind source separation after L. Parra and C. Spence [9] (see
section 4).
4 Results
The results of the experiments are summarized in tables I and II.
Table I lists the average SIR values (AvSIR) that were obtained by
averaging the $\mathrm{SIR}_i$ after equation (26) over all channels
and all experiments with the same source number $M$ and filter order
$P$. The third column reports the average SIR values for the proposed
EAP method. The fourth column reports the average SIR values that
resulted from an application of the popular blind source separation
method proposed by L. Parra and C. Spence [9]. The results for the
Parra/Spence method were computed with software written by
S. Harmeling (MATLAB function convbss.m, endorsed by L. Parra and
C. Spence). The last column reports the average SIR between the
observations $y_j[n]$ and the sources $x_i[n]$ as a reference.
Table II provides supplemental information for each experiment.
Column three of table II lists the average SIR results that are
obtained when the proposed method is applied to the true EAP
locations (and not the estimated EAP locations). The fourth column of
table II reports the number of instances (in %) in which the
numerical inversion of matrix $\hat{G}(z)$ after equation (25) led to
unstable poles that had to be mapped back into the unit circle.
Column five reports the chosen value for the EAP acceptance threshold
(EAT) as described in section 2.3.
It is clearly visible from table I that the proposed method achieves
significant improvements over the Parra/Spence method for small
complexity tasks with smaller source numbers $M$ and smaller filter
orders $P$. In the best case scenario, for two sources and with a
filter length of 5 taps, we can obtain an 11 dB improvement in
average signal-to-interference ratio. Unfortunately, the advantage
vanishes with larger complexity tasks. The reason for the decline is
partially due to the increasing number of pole location changes as
listed in column four of table II.
Table I

Source     Filter     AvSIR [dB]    AvSIR [dB]      AvSIR [dB]
Number     Length     EAP           Parra/          Mixed
M          P + 1      Method        Spence          Signals
2          5          16.06         4.96            4.47
2          7          10.45         5.11            4.50
2          10         7.37          5.35            4.76
3          5          11.24         5.17            3.77
3          7          8.41          5.16            3.88
3          10         6.22          5.40            4.02
4          5          6.67          5.17            3.10
4          7          2.72          5.10            3.26

Table I. Average signal-to-interference ratios for various source
numbers M, filter orders P, and algorithms.
Table II

Source     Filter     AvSIR [dB]    Pole
Number     Length     True          Mirror      EAT
M          P + 1      EAP           %
2          5          47.29         15.56       4.0
2          7          26.54         30.79       4.0
2          10         9.29          55.56       4.0
3          5          30.53         59.52       3.0
3          7          19.15         77.14       3.0
3          10         9.68          95.24       3.0
4          5          18.11         92.36       3.0
4          7          9.09          99.36       3.0

Table II. Supplemental statistics about the experiments with various
source numbers M and filter orders P.
A very promising result for future developments is contained in
column three of table II. If the result of the EAP estimation is
replaced with the true location of the EAPs in the given mixture
signals $y_j[n]$ for $j = 1 \ldots M$ then the average SIR is
dramatically improved over the Parra/Spence method even for higher
complexity cases. It is thus expected that the method will produce
significantly better results if equipped with a more robust EAP
detection strategy.
5 Conclusions
We have presented a new approach for blind multichannel system
identification. The approach relies on the detection of exclusive
activity periods (EAPs) in the source signals. The presence of EAPs
is exploited for a reliable blind estimation of source signals and
channel properties (between sources and observations). The advantage
of the proposed method was demonstrated experimentally within the
framework of convolutive blind separation of speech signals.
The goal of the paper was to provide a proof of concept for
EAP-based blind identification methods. Some of the methods presented
in this paper, especially the section on EAP detection, are, in their
current form, still suboptimal and deserve to be studied in greater
detail. Despite its suboptimality, however, the proposed method still
improves upon existing strategies (especially for lower complexity
tasks).
A caveat of the proposed method is that (currently) we have not
imposed a constraint that forces the optimal unmixing matrix
$\hat{H}(z)$ to be representative of a stable system. Instead, we
employed a simple pole-mirroring strategy that, by itself, is
responsible for a substantial part of the performance loss at higher
complexity tasks (see table II).
References
[1] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component
Analysis, Wiley-Interscience, 2001.
[2] P. Comon, "Independent component analysis: A new concept,"
Signal Processing, vol. 36, pp. 287-314, 1994.
[3] A. Cichocki and S. Amari, Adaptive Blind Signal and Image
Processing: Learning Algorithms and Applications, Wiley, Chichester,
U.K., 2002.
[4] N. Mitianoudis and M. E. Davies, "Audio source separation:
solutions and problems," International Journal of Adaptive Control
and Signal Processing, vol. 18, no. 3, pp. 299-314, Apr. 2004.
[5] J. R. Deller, J. G. Proakis, and J. H. Hansen, Discrete-Time
Processing of Speech Signals, Macmillan, New York, 1993.
[6] Y. Medan, E. Yair, and D. Chazan, "Super resolution pitch
determination of speech signals," IEEE Transactions on Signal
Processing, vol. 39, no. 1, pp. 40-48, January 1991.
[7] A. M. Kondoz, Digital Speech Coding for Low Bit Rate
Communication Systems, Wiley-Interscience, 2004.
[8] R. O. Duda and P. E. Hart, Pattern Classification and Scene
Analysis, Wiley-Interscience, Menlo Park, CA, 1973.
[9] L. Parra and C. Spence, "Convolutive blind separation of
non-stationary sources," IEEE Transactions on Speech and Audio
Processing, vol. 8, no. 3, pp. 320-327, May 2000.