$$y_j[n] = \sum_{i=1}^{M} \left( \sum_{k=0} g_{ij}[k]\, x_i[n-k] \right) \quad \text{for } j = 1 \ldots M. \qquad (1)$$
Equation (1) can also be expressed in the z-domain as

$$Y(z) = G(z)\, X(z), \qquad (2)$$

in which $X(z)$ is the z-transform of the multichannel signal $x[n] = [\, x_1[n] \;\; x_2[n] \;\ldots\; x_M[n] \,]^T$, $Y(z)$ is the z-transform of the multichannel signal $y[n] = [\, y_1[n] \;\; y_2[n] \;\ldots\; y_M[n] \,]^T$, and $G(z)$ is a matrix with the z-transforms of the impulse responses $g_{ij}[n]$ for all $i$ and $j$.

Proceedings of the 2005 International Conference on Computational Intelligence for Modelling, Control and Automation, and International Conference on
Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC05)
0-7695-2504-0/05 $20.00 2005 IEEE
Authorized licensed use limited to: Feng Chia University. Downloaded on March 15,2010 at 07:46:27 EDT from IEEE Xplore. Restrictions apply.
The goal in blind system identification is to find a matrix
of transfer functions $H(z)$ such that
$\hat{X}(z)$ given by
$$\ldots \sum_{m=n-L}^{n+L} \left|\, \operatorname{sign}(y_j[m]) - \operatorname{sign}(y_j[m-1]) \,\right|. \qquad (9)$$
In our notation $\operatorname{sign}(x)$ is equal to $+1$ for $x \geq 0$ and $-1$ for $x < 0$. The zero crossing rate counts the number of transitions from positive samples to negative samples within the range $(n - L - 1) \ldots (n + L)$. The range length is usually chosen as $L = F_s \cdot 10$ msec. Typically, the zero crossing rate is low for EAP sections and high otherwise. For the STZC measure to work properly it is important that possible quantization offsets in the recorded speech signal are removed prior to processing.
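The windowed sign-change count of equation (9) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, any normalization factor in front of the sum is left out because it is not visible in this excerpt, the offset is removed by subtracting the mean, and windows are simply truncated at the signal edges (an assumption).

```python
import numpy as np

def sign_pm1(v):
    """sign(x) as defined in the text: +1 for x >= 0 and -1 for x < 0."""
    return np.where(v >= 0, 1, -1)

def short_time_zero_crossings(y, L):
    """Sum of |sign(y[m]) - sign(y[m-1])| over m = n-L .. n+L for every n.

    Each sign change contributes 2 to the raw sum.
    """
    y = np.asarray(y, dtype=float)
    y = y - y.mean()                         # remove a constant (quantization) offset
    s = sign_pm1(y)
    d = np.abs(np.diff(s, prepend=s[0]))     # d[m] = |s[m] - s[m-1]|, with d[0] = 0
    c = np.concatenate(([0], np.cumsum(d)))  # c[k] = d[0] + ... + d[k-1]
    n = np.arange(len(y))
    lo = np.clip(n - L, 0, len(y))           # window start (truncated at the edge)
    hi = np.clip(n + L + 1, 0, len(y))       # window end, exclusive
    return c[hi] - c[lo]
```

With a sampling frequency of 8 kHz, the window choice quoted above corresponds to `L = int(0.010 * 8000)`, i.e. 80 samples.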
A normalized short-time zero crossing measure $\mathrm{NZCM}_j[n]$ is constructed with the maximum $\mathrm{ZCmax}_j$ and the minimum $\mathrm{ZCmin}_j$ of $\mathrm{STZC}_j[n]$ over all $n$:

$$\mathrm{NZCM}_j[n] = \frac{\mathrm{STZC}_j[n] - \mathrm{ZCmax}_j}{\mathrm{ZCmax}_j - \mathrm{ZCmin}_j}. \qquad (10)$$
The identification of EAP candidates is done as follows: 1) find all times $n$ for which $\mathrm{STPM}_j[n] \geq 0.7$ for all observations $j = 1 \ldots M$, 2) expand the sections so found in the forward and backward directions until $\mathrm{STPM}_j[n] < 0.6$, 3) remove times for which $\mathrm{NZCM}_j[n] > -0.5$, and 4) retain only the intersection of all EAP candidates across all channels $j = 1 \ldots M$. An example for $\mathrm{STPM}_j[n]$, $\mathrm{NZCM}_j[n]$, and the resulting set of EAP candidate sections is shown in figure 3.
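The four selection steps can be sketched as below. The sketch assumes that the STPM and NZCM contours have already been computed and are passed in as arrays of shape (M, N); the function names are hypothetical, and the three thresholds are plain arguments so that the values quoted in the text can be supplied by the caller.

```python
import numpy as np

def hysteresis_mask(v, seed_thr, stop_thr):
    """Steps 1 and 2: runs with v >= stop_thr that contain a sample with v >= seed_thr."""
    above = v >= stop_thr
    padded = np.concatenate(([False], above, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])  # alternating run starts and ends
    out = np.zeros(v.shape, dtype=bool)
    for start, stop in zip(edges[0::2], edges[1::2]):  # run covers [start, stop)
        if np.any(v[start:stop] >= seed_thr):
            out[start:stop] = True
    return out

def eap_candidates(stpm, nzcm, seed_thr, stop_thr, nzcm_thr):
    """Boolean mask over time of EAP candidates shared by all channels."""
    stpm = np.atleast_2d(np.asarray(stpm, dtype=float))
    nzcm = np.atleast_2d(np.asarray(nzcm, dtype=float))
    keep = np.ones(stpm.shape[1], dtype=bool)
    for s, z in zip(stpm, nzcm):
        mask = hysteresis_mask(s, seed_thr, stop_thr)  # steps 1 and 2
        mask &= ~(z > nzcm_thr)                        # step 3: drop high zero-crossing times
        keep &= mask                                   # step 4: intersection across channels
    return keep
```

Expanding a seed region "until the measure drops below the lower threshold" is implemented here as a hysteresis threshold, which is one straightforward reading of step 2.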
2.2 Blind EAP Deconvolution
In this section we discuss the subproblem of blind system identification under an exclusive activity assumption, i.e. we assume that we have identified a time interval $[n_1, n_2]$ during which only source $x_i[n]$ is active and all other sources are silent, i.e. $x_k[n] = 0$ for $k \neq i$. Under the EAP assumption we may attempt to reconstruct source $x_i[n]$ from each observation $y_j[n]$ via an appropriately chosen inverse filter $\hat{h}_{ij}[n]$:

$$\hat{x}_i^j[n] = \sum_{k=0}^{P} \hat{h}_{ij}[k]\, y_j[n-k]. \qquad (11)$$
Ideally $x_i[n] = \hat{x}_i^j[n]$ for all $j = 1 \ldots M$. Practically, however, we have $\hat{x}_i^k[n] \neq \hat{x}_i^j[n]$ for $k \neq j$ due to noise, imperfect estimation of the EAPs, improper choice of $P$, non-minimum phase properties of $g_{ij}[n]$, and so forth.
An estimate $E_i$ of the reconstruction error can be defined with

$$E_i = \sum_{j=1}^{M} \sum_{n} \left|\, \hat{x}_i^j[n] - \bar{x}_i[n] \,\right|^2 \qquad (12)$$

and

$$\bar{x}_i[n] = \frac{1}{M} \sum_{j=1}^{M} \hat{x}_i^j[n]. \qquad (13)$$
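As a small numerical illustration of equations (11) to (13), the sketch below filters one observation with a candidate inverse filter and measures how much the per-channel reconstructions disagree with their mean. The function names are hypothetical, and samples before the start of the signal are taken as zero (an assumption).

```python
import numpy as np

def reconstruct(y_j, h_ij):
    """Equation (11): sum over k of h_ij[k] * y_j[n-k], truncated to len(y_j)."""
    return np.convolve(y_j, h_ij)[:len(y_j)]

def reconstruction_error(x_hats):
    """Equations (12) and (13) for reconstructions stacked as an (M, N) array."""
    x_hats = np.asarray(x_hats, dtype=float)
    x_bar = x_hats.mean(axis=0)             # (13): average over the M channels
    E = np.sum((x_hats - x_bar) ** 2)       # (12): squared spread around the mean
    return E, x_bar
```

If all M reconstructions agree, the spread around the mean is zero and so is the error estimate.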
It is readily seen from equation (12) that a perfect reconstruction with $x_i[n] = \hat{x}_i^j[n]$ yields a minimum error estimate of $E_i = 0$. Unfortunately, $x_i[n] = \hat{x}_i^j[n]$ may not be the only solution that satisfies $E_i = 0$ (e.g. if the $g_{ij}[n]$ are linearly dependent). One may hope, however, that (if the $g_{ij}[n]$ are sufficiently different) a global minimization of $E_i$ will lead to good estimates for the $\hat{h}_{ij}[n]$ for $j = 1 \ldots M$.
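The computation of the global minimum of $E_i$ is aided by the following notation. We define:

$$\hat{x}_i^j = \big[\, \hat{x}_i^j[n_1+P] \;\ldots\; \hat{x}_i^j[n_2-1] \;\; \hat{x}_i^j[n_2] \,\big]^T,$$

$$Y^j = \begin{bmatrix} y_j[n_1] & y_j[n_1+1] & \cdots & y_j[n_1+P] \\ y_j[n_1+1] & y_j[n_1+2] & \cdots & y_j[n_1+P+1] \\ \vdots & \vdots & \ddots & \vdots \\ y_j[n_2-P] & \cdots & \cdots & y_j[n_2] \end{bmatrix},$$

and

$$\hat{h}_i^j = \big[\, \hat{h}_{ij}[P] \;\ldots\; \hat{h}_{ij}[1] \;\; \hat{h}_{ij}[0] \,\big]^T. \qquad (14)$$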
Equation (11) can be rewritten as $\hat{x}_i^j = Y^j \hat{h}_i^j$ and:

$$\bar{x}_i = \frac{1}{M} \sum_{j=1}^{M} \hat{x}_i^j = \frac{1}{M} \sum_{j=1}^{M} Y^j \hat{h}_i^j. \qquad (15)$$
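The error estimate (12) becomes:

$$E_i = \sum_{j=1}^{M} \left\| Y^j \hat{h}_i^j - \frac{1}{M} \sum_{k=1}^{M} Y^k \hat{h}_i^k \right\|^2 \qquad (16)$$

$$\phantom{E_i} = \sum_{j=1}^{M} [\, Y^j \hat{h}_i^j \,]^T\, Y^j \hat{h}_i^j - \frac{1}{M} \sum_{j=1}^{M} \sum_{k=1}^{M} [\, Y^k \hat{h}_i^k \,]^T\, Y^j \hat{h}_i^j.$$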
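We define the matrices $R_{jk} = [\, Y^j \,]^T Y^k$ (so that $R_{jk}$ occupies block row $j$ and block column $k$ below) and

$$R_F = \begin{bmatrix} R_{11} & R_{12} & \cdots & R_{1M} \\ R_{21} & R_{22} & \cdots & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ R_{M1} & \cdots & \cdots & R_{MM} \end{bmatrix}, \qquad (17)$$

$$R_D = \begin{bmatrix} R_{11} & 0 & \cdots & 0 \\ 0 & R_{22} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & R_{MM} \end{bmatrix}, \qquad (18)$$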
and

$$\hat{H}_i = \big[\, [\hat{h}_i^1]^T \;\; [\hat{h}_i^2]^T \;\ldots\; [\hat{h}_i^M]^T \,\big]^T. \qquad (19)$$
Using equations (14) to (19) we can compactly write the error estimate as

$$E_i = \hat{H}_i^T \left[\, R_D - \frac{1}{M} R_F \,\right] \hat{H}_i. \qquad (20)$$
In order to avoid the trivial minimization of equation (20) (with $\hat{H}_i = 0$) we constrain the solution to

$$\| \bar{x}_i \|^2 = \frac{1}{M^2}\, \hat{H}_i^T R_F \hat{H}_i = 1. \qquad (21)$$
We have thus reformulated the problem into that of finding the vector $\hat{H}_i$ that minimizes $E_i$ subject to equation (21). It is readily shown with Lagrange multipliers that the solution to the above problem is provided by one of the generalized eigenvectors $v_m$ of matrices $R_D$ and $R_F$:

$$\lambda_m R_D v_m = R_F v_m. \qquad (22)$$
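The Lagrange-multiplier step can be spelled out as follows. The symbols used here are notational assumptions for this sketch: $H$ stands for the stacked filter vector of equation (19), $\mu$ for the multiplier, and $\lambda$ for the resulting generalized eigenvalue; the derivation also assumes that $R_D$ and $R_F$ are symmetric.

$$\begin{aligned}
\mathcal{L}(H, \mu) &= H^T \Big[ R_D - \tfrac{1}{M} R_F \Big] H - \mu \Big( \tfrac{1}{M^2} H^T R_F H - 1 \Big), \\
\frac{\partial \mathcal{L}}{\partial H} &= 2 \Big[ R_D - \tfrac{1}{M} R_F \Big] H - \tfrac{2\mu}{M^2} R_F H = 0
\;\;\Longrightarrow\;\; R_D H = \tfrac{M + \mu}{M^2}\, R_F H, \\
\lambda\, R_D H &= R_F H \quad \text{with} \quad \lambda = \tfrac{M^2}{M + \mu}.
\end{aligned}$$

Under the constraint $H^T R_F H = M^2$, the eigenvalue relation gives $H^T R_D H = M^2 / \lambda$, so the error estimate evaluates to $E = M^2/\lambda - M = M\,(M/\lambda - 1)$, which is smallest for the largest eigenvalue.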
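We assume that the eigenvalues $\lambda_m$ are sorted in decreasing order $\lambda_1 \geq \lambda_2 \geq \lambda_3 \geq \lambda_4 \ldots$ and that the eigenvectors $v_m$ are normalized such that the constraint of equation (21) holds, i.e. $\frac{1}{M^2} v_m^T R_F v_m = 1$ for all $m$. Choosing the eigenvector $v_m$ as the solution leads to an error estimate of $E_i = M\,(M/\lambda_m - 1)$. The optimal solution, i.e. the one that minimizes $E_i$, is thus given by $\hat{H}_i = v_1$ and $E_i = M\,(M/\lambda_1 - 1)$.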
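The whole procedure of this section can be sketched numerically as below, using NumPy only. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the matrix $R_F$ is built as the Gram matrix of the horizontally stacked data matrices (block $(j,k)$ equal to $[Y^j]^T Y^k$), each $R_{jj}$ is assumed positive definite so that a Cholesky factor exists, and the generalized eigenproblem is reduced to an ordinary symmetric one through that factor.

```python
import numpy as np

def data_matrix(y, n1, n2, P):
    """Y^j as in equation (14): the row for time n holds y[n-P], ..., y[n], n = n1+P .. n2."""
    n = np.arange(n1 + P, n2 + 1)[:, None]
    return np.asarray(y, dtype=float)[n + np.arange(-P, 1)[None, :]]

def eap_deconvolution(Y_list):
    """Return (H, lam, E): stacked inverse filters, largest eigenvalue, error estimate."""
    M = len(Y_list)
    q = Y_list[0].shape[1]                        # filter length P + 1
    Y_all = np.hstack(Y_list)                     # [Y^1 Y^2 ... Y^M]
    R_F = Y_all.T @ Y_all                         # full block matrix, as in equation (17)
    R_D = np.zeros_like(R_F)
    for j in range(M):                            # keep only the diagonal blocks, equation (18)
        s = slice(j * q, (j + 1) * q)
        R_D[s, s] = R_F[s, s]
    # Solve R_F v = lam R_D v via R_D = L L^T and C = L^{-1} R_F L^{-T}.
    L = np.linalg.cholesky(R_D)
    C = np.linalg.solve(L, np.linalg.solve(L, R_F).T)
    lam, U = np.linalg.eigh((C + C.T) / 2)        # eigenvalues in ascending order
    v = np.linalg.solve(L.T, U[:, -1])            # eigenvector of the largest eigenvalue
    v *= M / np.sqrt(v @ R_F @ v)                 # enforce the constraint of equation (21)
    E = v @ (R_D - R_F / M) @ v                   # error estimate of equation (20)
    return v, lam[-1], E
```

The returned vector is only determined up to its sign, and each block of length P + 1 lists the filter taps in the reversed order of equation (14).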
2.3 Blind System Identification
As a result of the methods described in sections 2.1 and 2.2 we obtain an inverse filter estimate $\hat{H}_i$ (with its associated eigenvalue $\lambda_1$) for each separately identified EAP section. In a first step we discard all EAP sections (and $\hat{H}_i$)
for which log
10
M
M1
was greater than a certain EAP acceptance threshold (EAT, see section 4). In a second step we use a simple minimum Euclidean distance hierarchical clustering method [8] to associate each vector $\hat{H}_i$ with one of the $M$ sources. All vectors associated with the same source $k$ are averaged¹ (arithmetic mean) into an average eigenvector $\bar{H}_k$ for each source $k = 1 \ldots M$. By extracting the corresponding subvectors $\bar{h}_i^j$ in analogy to equation (19):

$$\bar{H}_i = \big[\, [\bar{h}_i^1]^T \;\; [\bar{h}_i^2]^T \;\ldots\; [\bar{h}_i^M]^T \,\big]^T, \qquad (23)$$

we obtain a complete set of inverse filter vectors $\bar{h}_i^j$:

$$\bar{h}_i^j = \big[\, \bar{h}_{ij}[P] \;\ldots\; \bar{h}_{ij}[1] \;\; \bar{h}_{ij}[0] \,\big]^T. \qquad (24)$$
An estimate for the mixing matrix $G(z)$ from equation (2) is obtained from:

$$\big[\, \hat{G}(z) \,\big]_{ij} = \frac{1}{\sum_{k=0}^{P} \bar{h}_{ij}[k]\, z^{-k}}, \qquad (25)$$
where notation $[G]_{ij}$ refers to the element of matrix $G$ in row $i$ and column $j$. An estimate for the demixing matrix $H(z)$ from equation (3) can be obtained by numerically inverting $\hat{G}(z)$ via Gaussian elimination. Unfortunately, the inversion process may introduce unstable poles into the transfer functions of $\hat{H}(z)$. The production of stable filters can be enforced by mirroring poles that fall outside of the unit circle back into the inside of the unit circle. The mirroring process distorts the correct phase response, but leaves the magnitude response of individual channels intact.
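The pole-mirroring step can be illustrated on a single denominator polynomial. The sketch below is a generic construction and not necessarily the authors' procedure: the function name is hypothetical, the polynomial is given by its coefficients in powers of $z^{-1}$ with a nonzero leading coefficient, each root $r$ outside the unit circle is replaced by $1/r^*$, and the gain is rescaled by $|r|$ so that the magnitude on the unit circle is unchanged.

```python
import numpy as np

def mirror_poles(a):
    """Reflect roots of A(z) = a[0] + a[1] z^-1 + ... + a[P] z^-P into the unit circle.

    Returns coefficients of a polynomial with the same magnitude on |z| = 1
    whose roots all satisfy |r| <= 1.
    """
    a = np.asarray(a, dtype=float)
    r = np.roots(a)                              # roots of a[0] z^P + ... + a[P]
    outside = np.abs(r) > 1.0
    gain = a[0] * np.prod(np.abs(r[outside]))    # compensates the magnitude change
    r[outside] = 1.0 / np.conj(r[outside])       # mirror across the unit circle
    return np.real(gain * np.poly(r))
```

For example, the polynomial with coefficients `[1.0, -2.5, 1.0]` has roots 2 and 0.5; after mirroring both roots sit at 0.5 and the coefficients become `[2.0, -2.0, 0.5]`, with the same magnitude response but a different phase.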
3 Experiments
Experiments were conducted to verify the performance of the proposed method. As speech data we used the SI-subset of the TIMIT database from the Linguistic Data Consortium². The chosen subset consists of recordings from 630 subjects each uttering 3 phonetically-diverse sentences³. The sentences were recorded with a sampling frequency of 16 kHz. The signals were low-pass filtered and down-sampled to 8 kHz prior to processing. All 3 sentences from the same speaker were concatenated and then truncated to 4 seconds. As a result we obtained a total of 630 different 4-second source signals $x[n]$.

¹ The weight for each vector was chosen proportional to the number of samples contained in the associated EAP section. Longer EAP sections thus had more weight than shorter ones.
² The data is available at <http://www.ldc.upenn.edu/>.
³ None of the sentences are repeated more than once.

[Figure 2: "Source Signals"; two panels, Channel #1 and Channel #2, versus Time [sec] from 0 to 4.]
Figure 2. An example of two source signals $x_1[n]$ and $x_2[n]$ from the TIMIT database. The signals were aligned to have a 30% overlap in time.

[Figure 3: "Mixed Signals and Features"; two panels, Channel #1 and Channel #2, versus Time [sec] from 0 to 4.]
Figure 3. The resulting mixed signals $y_1[n]$ and $y_2[n]$ from the example in figure 2. The upper dashed line in each axis indicates the resulting $\mathrm{STPM}_j[n]$ contour (equation (7)) and the lower dashed line indicates the resulting $\mathrm{NZCM}_j[n]$ contour (equation (10)). The gray regions indicate the EAP sections that were estimated from the mixed signals.

[Figure 4: "Demixed Signals EAP Method"; two panels, Channel #1 and Channel #2, versus Time [sec] from 0 to 4.]
Figure 4. The resulting demixed signals $\hat{x}_1[n]$ and $\hat{x}_2[n]$ from the example in figure 2 after application of the proposed EAP method. Both signals are very close to the original source signals $x_1[n]$ and $x_2[n]$ depicted in figure 2.

[Figure 5: "Demixed Signals Parra/Spence Method"; two panels, Channel #1 and Channel #2, versus Time [sec] from 0 to 4.]
Figure 5. The demixing result of a commonly used conventional method of blind source separation after L. Parra and C. Spence [9] (see section 4).
We ran experiments with different source numbers ($M$ = 2, 3, and 4) and different filter lengths ($P + 1$ = 5, 7, and 10). For each $M$ the available data was randomly split⁴ into $\lfloor 630/M \rfloor$ groups of $M$ source signals $x_i[n]$. To simulate conversational speech the signals were partially faded out to obtain a relative time-overlap between signals of roughly 30% (see figure 2). The $M$ source signals of each group were mixed with order-$P$ random minimum phase filters $g_{ij}[n]$ according to equation (1). The resulting observations $y_j[n]$ for $j = 1 \ldots M$ were then used to estimate the inverse filter matrix $\hat{H}(z)$ according to section 2. The reconstructed source signal estimates $\hat{x}_i[n]$ for $i = 1 \ldots M$ were computed according to equation (3) via $\hat{X}(z) = \hat{H}(z)\, Y(z)$.

⁴ Every source signal was only used once.
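A mixing setup of this kind can be simulated as sketched below. How the random minimum phase filters were actually drawn is not stated in the text; this sketch assumes one simple choice, namely placing all zeros of each order-P FIR filter at random positions inside the unit circle (in conjugate pairs, so that the coefficients are real). The function names and the `max_radius` parameter are hypothetical.

```python
import numpy as np

def random_min_phase_fir(P, rng, max_radius=0.95):
    """Order-P FIR filter (P + 1 taps, first tap 1) with all zeros inside the unit circle."""
    n_pairs, n_real = divmod(P, 2)
    radius = rng.uniform(0.0, max_radius, n_pairs)
    angle = rng.uniform(0.0, np.pi, n_pairs)
    pair = radius * np.exp(1j * angle)                      # one zero per conjugate pair
    real = rng.uniform(-max_radius, max_radius, n_real)     # extra real zero if P is odd
    zeros = np.concatenate([pair, pair.conj(), real])
    return np.atleast_1d(np.real(np.poly(zeros)))

def mix(x, g):
    """Convolutive mixture: y_j[n] = sum over i and k of g[i, j, k] * x_i[n-k].

    x has shape (M, N); g has shape (M, M, P + 1); the output has shape (M, N).
    """
    M, N = x.shape
    y = np.zeros((M, N))
    for j in range(M):
        for i in range(M):
            y[j] += np.convolve(x[i], g[i, j])[:N]
    return y
```

A filter bank for M sources can then be assembled as `np.array([[random_min_phase_fir(P, rng) for _ in range(M)] for _ in range(M)])` with `rng = np.random.default_rng()`.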
The quality of the estimated model was evaluated with the Signal-to-Interference Ratio (SIR in [dB]) between the reconstructed signal $\hat{x}_i[n]$ and the original signal $x_i[n]$:

$$\mathrm{SIR}_i = \max_{p} \left\{ 10 \log_{10} \frac{\sum_n |\, x_i[n] \,|^2}{\sum_n |\, x_i[n] - \hat{x}_i[n-p] \,|^2} \right\}. \qquad (26)$$
The evaluation of the SIR was performed under careful con-
sideration of possible numbering permutations between the
original signals and the reconstructions.
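The SIR of equation (26), together with a brute-force treatment of the numbering permutations, can be sketched as follows. Several details are assumptions of this sketch rather than statements of the paper: the lag search range `max_lag`, the zero-filling of samples shifted in from outside the signal, and the choice of the permutation with the highest mean SIR. The function names are hypothetical.

```python
from itertools import permutations
import numpy as np

def shift(v, p):
    """Return w with w[n] = v[n - p]; samples from outside the signal are zero."""
    w = np.zeros_like(v)
    if p > 0:
        w[p:] = v[:-p]
    elif p < 0:
        w[:p] = v[-p:]
    else:
        w[:] = v
    return w

def sir_db(x, x_hat, max_lag):
    """Equation (26): best SIR in dB over lags p = -max_lag .. max_lag."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    best = -np.inf
    for p in range(-max_lag, max_lag + 1):
        err = np.sum((x - shift(x_hat, p)) ** 2)
        val = np.inf if err == 0 else 10 * np.log10(np.sum(x ** 2) / err)
        best = max(best, val)
    return best

def best_permutation_sir(X, X_hat, max_lag):
    """Per-source SIRs under the source/estimate assignment with the highest mean SIR."""
    M = len(X)
    table = np.array([[sir_db(X[i], X_hat[k], max_lag) for k in range(M)]
                      for i in range(M)])
    candidates = (table[range(M), list(perm)] for perm in permutations(range(M)))
    return max(candidates, key=np.mean)
```

Trying all M! assignments is affordable for the source numbers considered here (M up to 4).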
Figures 2 to 4 show an example for an experiment with two sources. The gray regions in the figures indicate the EAP sections that were estimated from the mixed signals $y_1[n]$ and $y_2[n]$ from figure 3. Figure 5 shows the result for a commonly used conventional method of blind source separation after L. Parra and C. Spence [9] (see section 4).
4 Results
The results of the experiments are summarized in tables I and II. Table I lists the average SIR values (AvSIR) that were obtained by averaging the $\mathrm{SIR}_i$ after equation (26) over all channels and all experiments with the same source number $M$ and filter order $P$. The third column reports the average SIR values for the proposed EAP method. The fourth column reports the average SIR values that resulted from an application of the popular blind source separation method proposed by L. Parra and C. Spence [9]. The results for the Parra/Spence method were computed with software written by S. Harmeling (MATLAB function convbss.m, endorsed by L. Parra and C. Spence). The last column reports the average SIR between the observations $y_j[n]$ and the sources $x_i[n]$ as a reference.
Table II provides supplemental information for each experiment. Column three of table II lists the average SIR results that are obtained when the proposed method is applied to the true EAP locations (and not the estimated EAP locations). The fourth column of table II reports the number of instances (in %) in which the numerical inversion of matrix $\hat{G}(z)$ after equation (25) led to unstable poles that had to be mapped back into the unit circle. Column five reports the chosen value for the EAP acceptance threshold (EAT) as described in section 2.3.
It is clearly visible from table I that the proposed method achieves significant improvements over the Parra/Spence method for low-complexity tasks with smaller source numbers $M$ and smaller filter orders $P$. In the best case scenario, for two sources and with a filter length of 5 taps, we obtain an 11 dB improvement in average signal-to-interference ratio (16.06 dB versus 4.96 dB in table I). Unfortunately, the advantage vanishes with larger complexity tasks. The reason for the decline is partially due to the increasing number of pole location changes as listed in column four of table II.
Table I. Average signal-to-interference ratios (AvSIR, in dB) for various source numbers M, filter orders P, and algorithms.

Source    Filter    AvSIR     AvSIR     AvSIR
Number    Length    EAP       Parra/    Mixed
M         P + 1     Method    Spence    Signals
2         5         16.06     4.96      4.47
2         7         10.45     5.11      4.50
2         10         7.37     5.35      4.76
3         5         11.24     5.17      3.77
3         7          8.41     5.16      3.88
3         10         6.22     5.40      4.02
4         5          6.67     5.17      3.10
4         7          2.72     5.10      3.26
Table II. Supplemental statistics about the experiments with various source numbers M and filter orders P.

Source    Filter    AvSIR     Pole
Number    Length    True      Mirror    EAT
M         P + 1     EAP       %
2         5         47.29     15.56     4.0
2         7         26.54     30.79     4.0
2         10         9.29     55.56     4.0
3         5         30.53     59.52     3.0
3         7         19.15     77.14     3.0
3         10         9.68     95.24     3.0
4         5         18.11     92.36     3.0
4         7          9.09     99.36     3.0
A very promising result for future developments is contained in column three of table II. If the result of the EAP estimation is replaced with the true location of the EAPs in the given mixture signals $y_j[n]$ for $j = 1 \ldots M$ then the average SIR is dramatically improved over the Parra/Spence method even for higher complexity cases. It is thus expected that the method will produce significantly better results if equipped with a more robust EAP detection strategy.
5 Conclusions
We have presented a new approach for blind multichannel system identification. The approach relies on the detection of exclusive activity periods (EAPs) in the source signals. The presence of EAPs is exploited for a reliable blind estimation of source signals and channel properties (between sources and observations). The advantage of the proposed method was demonstrated experimentally within the framework of convolutive blind separation of speech signals.

The goal of the paper was to provide a proof of concept for EAP based, blind identification methods. Some of the methods presented in this paper, especially the section on EAP detection, are, in their current form, still suboptimal and deserve to be studied in greater detail. Despite its suboptimality, however, the proposed method still improves upon existing strategies (especially for lower complexity tasks).

A caveat of the proposed method is that (currently) we have not imposed a constraint that forces the optimal unmixing matrix $\hat{H}(z)$ to be representative of a stable system. Instead, we employed a simple pole-mirroring strategy that, by itself, is responsible for a substantial part of the performance loss at higher complexity tasks (see table II).
References
[1] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, Wiley-Interscience, 2001.
[2] P. Comon, "Independent component analysis: A new concept," Signal Processing, vol. 36, pp. 287-314, 1994.
[3] A. Cichocki and S. Amari, Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications, Wiley, Chichester, U.K., 2002.
[4] N. Mitianoudis and M. E. Davies, "Audio source separation: solutions and problems," International Journal of Adaptive Control and Signal Processing, vol. 18, no. 3, pp. 299-314, Apr. 2004.
[5] J. R. Deller, J. G. Proakis, and J. H. Hansen, Discrete-Time Processing of Speech Signals, Macmillan, New York, 1993.
[6] Y. Medan, E. Yair, and D. Chazan, "Super resolution pitch determination of speech signals," IEEE Transactions on Signal Processing, vol. 39, no. 1, pp. 40-48, January 1991.
[7] Kondoz, Digital Speech Coding for Low Bit Rate Communication Systems, Wiley-Interscience, 2004.
[8] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Wiley-Interscience, Menlo Park, CA, 1973.
[9] L. Parra and C. Spence, "Convolutive blind separation of non-stationary sources," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 3, pp. 320-327, May 2000.