
IEEE SIGNAL PROCESSING LETTERS, VOL. 15, 2008

A Linear Closed-Form Algorithm for Source Localization From Time-Differences of Arrival
Matthew D. Gillette and Harvey F. Silverman

Abstract: Microphone arrays often operate in the near field,
which complicates the problem of determining a source location
from time-difference-of-arrival (TDOA) measurements typically
derived from generalized cross-correlation functions. Each TDOA
satisfies the equation of a hyperboloid in space and methods have
been developed to either solve for intersecting hyperboloids or
make some approximation to them, keeping source-location determination a nonlinear, somewhat complex problem. We introduce a
closed-form, analytic solution for the problem (the GS algorithm).
It is so simple that we were surprised that, until very recently, there
have been no other solutions similar to ours. The method uses a
minimum of five microphones in three dimensions, one more than
other solutions, but, for nonsingular layouts of the microphones,
it is very fast and accurate. First, the new method is compared to
other closed-form methods for accuracy and sensitivity to noise
using simulated data. Then, several variants of GS are compared
to two other real-time algorithms, LEMSalg and SRP-PHAT,
using real, human-talker data from a large array in a noisy room.
Index Terms: Localization, microphone arrays, sensor arrays.

I. INTRODUCTION

Our group has been concerned with source localization
for microphone arrays since the early 1990s [2], [3]. We,
and many others, have published both two-stage and one-stage
algorithms. In the former, a set of time-differences-of-arrival
(TDOAs) is determined first and these are used to determine
a point-source location. One-stage methods maximize a measure of the source power over a room's focal area for the array.
Two-stage algorithms are typically faster, but suffer from inaccurate determinations of the TDOAs. While one-stage algorithms avoid this pitfall, they are computationally costly due to
searching over a space having many local maxima and minima.
One-stage methods, however, generally perform better in noisy environments [4], [5]. Each method usually works better as the
amount of data used grows and/or the number of microphones used
is increased.
One goal in our current research is to beamform a microphone array to a cloud, rather than to a point in space. We have
seen that this technique will enhance the higher frequencies of
the beamforming response. It should be noted that, while we
have seen this to be the case through experimentation and simulation, we are not aware of any publications to date proving
this improvement in beamforming to a cloud. We hope to address this in a future publication; doing so requires making many source-location determinations using many sets of microphones. If a set is to be four or five microphones, this task is
better suited for a two-stage algorithm for computational reasons. Also, the accuracy of a one-stage locator degrades seriously if the number of microphones is small.

Manuscript received June 25, 2007; revised August 20, 2007. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Vesa Valimaki.
The authors are with the LEMS, Division of Engineering, Brown University, Providence, RI 02912 USA (e-mail: mdg@lems.brown.edu; hfs@lems.brown.edu).
Digital Object Identifier 10.1109/LSP.2007.910324

Given a set of TDOAs from a small set of microphones, the
second stage of a two-stage algorithm determines the best point-source location in the focal volume of the room. For a single
TDOA, the three-dimensional contour whose points satisfy that
TDOA value is a hyperboloid. The earliest methods we found
were those that proposed solving a system of intersecting hyperboloids [2], [6], [7]. Other early work used a spherical method
[8], [9]. Ensuing work developed the method of linear intersection in which conic approximations to the asymptotes of the
hyperboloids were used but only for four microphones whose diagonals were orthogonal and bisecting [10]. Another quadratic
method was proposed [11], which started from the same initial equation used in this letter. It always produces two solutions and, occasionally, they are imaginary. Recently, an article
with a similar linear closed-form algorithm was published by
Militello and Buenafuente [14]. Their result was derived from a
physics perspective, while our derivation is from an engineering
perspective and is somewhat simpler. In this letter, we also test
the method using real data in three dimensions with 448 microphones in a noisy environment.
We fully derive a closed-form, linear solution to the problem
of determining a source-location point from a small set of
TDOAs for a near-field situation (the GS method). We show
its stability, computational advantage, and simplicity. We then
compare its sensitivity to noise in the TDOA estimations to two
other methods. Real data are used to compare GS to two other
real-time, implemented location-determination algorithms.
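1070-9908/$25.00 © 2007 IEEE

Authorized licensed use limited to: BROWN UNIVERSITY. Downloaded on September 10, 2009 at 16:47 from IEEE Xplore. Restrictions apply.

II. BASIC ALGORITHM DERIVATION
Let microphone $i$ be located at point $(x_i, y_i, z_i)$. Let the source be at $(x_s, y_s, z_s)$. Let $v$ be the speed of sound, $M$ be the number of non-reference microphones in the microphone array, and $N$ be the number of reference microphones; thus, there are $M + N$ total microphones in the array. For the proof of the basic algorithm, we assume a single reference ($N = 1$), microphone 0. Then we define the distance from the source to microphone $i$ as $D_i$:

$$D_i = \sqrt{(x_i - x_s)^2 + (y_i - y_s)^2 + (z_i - z_s)^2} \tag{1}$$

The TDOA that we derive from the data, $\tau_{i0}$, for microphone $i$ relative to a referent microphone 0 is

$$\tau_{i0} = \frac{D_i - D_0}{v} \tag{2}$$

As shown, for a constant value of the speed of sound, the TDOA yields the distance difference of arrival (DDOA), $d_{i0}$, as follows:

$$d_{i0} = v\,\tau_{i0} = D_i - D_0 \tag{3}$$

We assume we know the DDOAs for four additional microphones using microphone 0 as the reference. We further assume we know all the microphone positions, $(x_i, y_i, z_i)$. As in [11], we form for microphone $i$

$$D_i^2 - D_0^2 = \left[(x_i - x_s)^2 + (y_i - y_s)^2 + (z_i - z_s)^2\right] - \left[(x_0 - x_s)^2 + (y_0 - y_s)^2 + (z_0 - z_s)^2\right] \tag{4}$$

The right side can be expanded to yield

$$D_i^2 - D_0^2 = x_i^2 - 2x_i x_s + x_s^2 + y_i^2 - 2y_i y_s + y_s^2 + z_i^2 - 2z_i z_s + z_s^2 - x_0^2 + 2x_0 x_s - x_s^2 - y_0^2 + 2y_0 y_s - y_s^2 - z_0^2 + 2z_0 z_s - z_s^2 \tag{5}$$

and expanded again with some cancellation of the squared source-location terms and grouping

$$D_i^2 - D_0^2 = R_i^2 - R_0^2 - 2x_{i0}x_s - 2y_{i0}y_s - 2z_{i0}z_s \tag{6}$$

where $R_i^2 = x_i^2 + y_i^2 + z_i^2$, $x_{i0} = x_i - x_0$, $y_{i0} = y_i - y_0$, and $z_{i0} = z_i - z_0$. The left side may also be rewritten using (3) as

$$D_i^2 - D_0^2 = (D_0 + d_{i0})^2 - D_0^2 \tag{7}$$

or

$$D_i^2 - D_0^2 = d_{i0}^2 + 2d_{i0}D_0 \tag{8}$$

Equating both sides

$$d_{i0}^2 + 2d_{i0}D_0 = R_i^2 - R_0^2 - 2x_{i0}x_s - 2y_{i0}y_s - 2z_{i0}z_s \tag{9}$$

We next group all the known terms together and divide by 2, defining $w_{i0}$ as

$$w_{i0} = \tfrac{1}{2}\left(R_i^2 - R_0^2 - d_{i0}^2\right) \tag{10}$$

Rearranging and substituting gives

$$x_{i0}x_s + y_{i0}y_s + z_{i0}z_s + d_{i0}D_0 = w_{i0} \tag{11}$$

The unknowns are $x_s$, $y_s$, $z_s$, and $D_0$, and all these are seen to be linear in (11). With four unknowns, we need four such equations, giving

$$\begin{bmatrix} x_{10} & y_{10} & z_{10} & d_{10} \\ x_{20} & y_{20} & z_{20} & d_{20} \\ x_{30} & y_{30} & z_{30} & d_{30} \\ x_{40} & y_{40} & z_{40} & d_{40} \end{bmatrix} \begin{bmatrix} x_s \\ y_s \\ z_s \\ D_0 \end{bmatrix} = \begin{bmatrix} w_{10} \\ w_{20} \\ w_{30} \\ w_{40} \end{bmatrix} \tag{12}$$

If the matrix in (12) is not singular, then the Cartesian coordinates for the point source and the distance from it to the referent are solved simultaneously by solving the linear system of (12).
One might note that there are some useful arrangements of microphones for which the matrix is singular. The most notable of these might be a line of microphones with uniform spacing. As some of these configurations were popular in earlier microphone arrays, this might be the reason why we had not been able to find this algorithm in earlier literature. However, for microphones with random spacing, for example, the matrix is virtually always nonsingular.

III. EXTENSION OF THE ALGORITHM
There are two generalizations of the basic algorithm that may be useful. The first is that the number of non-reference microphones, $M$, need not be restricted to four, and the second is that the number of referents, $N$, need not be restricted to one. For the case of more microphones we go to matrix notation letting

$$\mathbf{A} = \begin{bmatrix} x_{10} & y_{10} & z_{10} & d_{10} \\ x_{20} & y_{20} & z_{20} & d_{20} \\ \vdots & \vdots & \vdots & \vdots \\ x_{M0} & y_{M0} & z_{M0} & d_{M0} \end{bmatrix} \tag{13}$$

$$\mathbf{u} = \begin{bmatrix} x_s & y_s & z_s & D_0 \end{bmatrix}^{T}, \qquad \mathbf{w} = \begin{bmatrix} w_{10} & w_{20} & \cdots & w_{M0} \end{bmatrix}^{T} \tag{14}$$

Thus

$$\mathbf{A}\mathbf{u} = \mathbf{w} \tag{15}$$

If $M > 4$ and the columns of $\mathbf{A}$ are linearly independent (full column rank), then we can derive [12]

$$\mathbf{u} = \mathbf{A}^{+}\mathbf{w} \tag{16}$$

and generate a least-squares error result, where $\mathbf{A}^{T}$ is the transpose and $\mathbf{A}^{+}$ is the pseudoinverse

$$\mathbf{A}^{+} = \left(\mathbf{A}^{T}\mathbf{A}\right)^{-1}\mathbf{A}^{T} \tag{17}$$

A second generalization is to use multiple references for a single solution, as demonstrated in the following for $M = 4$ and $N = 2$, using a total of six microphones with microphones 0 and 5 as references. Each pairing of a non-reference microphone $i$ with a reference $j$ contributes one equation of the form of (11),

$$x_{ij}x_s + y_{ij}y_s + z_{ij}z_s + d_{ij}D_j = w_{ij}, \qquad i = 1, \ldots, 4, \quad j \in \{0, 5\} \tag{18}$$

with $w_{ij} = \tfrac{1}{2}\left(R_i^2 - R_j^2 - d_{ij}^2\right)$, and these stack into

$$\begin{bmatrix} x_{10} & y_{10} & z_{10} & d_{10} & 0 \\ x_{20} & y_{20} & z_{20} & d_{20} & 0 \\ x_{30} & y_{30} & z_{30} & d_{30} & 0 \\ x_{40} & y_{40} & z_{40} & d_{40} & 0 \\ x_{15} & y_{15} & z_{15} & 0 & d_{15} \\ x_{25} & y_{25} & z_{25} & 0 & d_{25} \\ x_{35} & y_{35} & z_{35} & 0 & d_{35} \\ x_{45} & y_{45} & z_{45} & 0 & d_{45} \end{bmatrix} \begin{bmatrix} x_s \\ y_s \\ z_s \\ D_0 \\ D_5 \end{bmatrix} = \begin{bmatrix} w_{10} \\ w_{20} \\ w_{30} \\ w_{40} \\ w_{15} \\ w_{25} \\ w_{35} \\ w_{45} \end{bmatrix} \tag{19}$$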
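As a rough illustration of Sections II and III, the following NumPy sketch builds one linear equation per (non-reference, reference) microphone pair and solves the stacked system in the least-squares sense. The function name `gs_locate`, the dictionary layout used for the DDOAs, and the microphone and source coordinates are hypothetical choices made for this example and do not come from the letter; `np.linalg.lstsq` is used here in place of forming the pseudoinverse explicitly.

```python
import numpy as np


def gs_locate(mics, refs, ddoa):
    """Sketch of the linear closed-form solve (names are illustrative only).

    mics : (n, 3) array of microphone coordinates.
    refs : list of reference-microphone indices.
    ddoa : dict mapping (i, j) -> D_i - D_j for non-reference microphone i
           and reference microphone j.
    Returns (source_xyz, ref_distances); ref_distances[k] is the estimated
    distance from the source to microphone refs[k].
    """
    mics = np.asarray(mics, dtype=float)
    refs = list(refs)
    r2 = np.sum(mics**2, axis=1)               # squared norm of each position
    rows, rhs = [], []
    for (i, j), d in ddoa.items():
        row = np.zeros(3 + len(refs))
        row[:3] = mics[i] - mics[j]            # coordinate differences
        row[3 + refs.index(j)] = d             # DDOA multiplies the unknown D_j
        rows.append(row)
        rhs.append(0.5 * (r2[i] - r2[j] - d * d))   # known terms, divided by 2
    A = np.vstack(rows)
    w = np.array(rhs)
    u, *_ = np.linalg.lstsq(A, w, rcond=None)  # exact when A is square and nonsingular
    return u[:3], u[3:]


# Hypothetical, non-coplanar layout (meters) and source position.
mics = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0],
                 [0, 0, 1], [1, 1, 1], [1, 1, 0]], dtype=float)
source = np.array([0.4, -0.3, 2.5])
dist = np.linalg.norm(mics - source, axis=1)

# Basic case: microphone 0 is the reference, microphones 1..4 are non-reference.
ddoa_basic = {(i, 0): dist[i] - dist[0] for i in range(1, 5)}
xyz_basic, d0 = gs_locate(mics, [0], ddoa_basic)

# Two references (0 and 5) with the same four non-reference microphones.
ddoa_multi = {(i, j): dist[i] - dist[j] for i in range(1, 5) for j in (0, 5)}
xyz_multi, d_refs = gs_locate(mics, [0, 5], ddoa_multi)
```

With noise-free DDOAs both calls should recover the assumed source position to within rounding error. A coplanar layout would make one coordinate column of the matrix identically zero, so the sketch deliberately uses microphones that do not lie in a single plane.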

IV. PERFORMANCE AND COMPARISONS


Our simulations assume the same set of microphone locations
as our current real experimental array that has 448 microphones
on 14 panels [1]. Thirty-two microphones are randomly fixed to
each 138 cm (W) × 67 cm (H) panel made of acoustical foam in
an aluminum frame. There are three panels spanning the width
(X-direction) in both the front and back of the room. There are
four panels spanning the length (Z-direction) on both of the
sides of the room. The panels are fixed to the walls approximately 2 m from the ground, though this height varies for several
panels in the front of the room. All panels face inward towards
the center of the space. The origin is approximately 2 m high in
the center of the front wall; the coordinate axes are Cartesian.
An algorithm that is less sensitive to noisy estimates of
DDOAs is better. We therefore compared the proposed algorithm to the spherical algorithm [8], [9]. In a mathematical
simulation, we added noise to a selected DDOA using a uniform
distribution of error of a given size about a known DDOA
value. The standard deviation of the distance of the estimated
location from the correct location in the X-direction only (the
one best predicted by the 1 m aperture of the selected microphones) was used as the measure of an algorithm's sensitivity.
We used the minimal configuration for each algorithm, i.e., four
microphones for the spherical method and five for GS. Two
source positions were tested, one directly in front of the middle
of the array in X, and one 33.6 degrees off the normal to the
panel. Both are 3 m from the panel in the orthogonal direction.
The results are shown in Fig. 1. One can see that for small
DDOA perturbations due to noise, the GS algorithm is less
sensitive, especially when the source is closer to the normal of
the panel. Both methods are considerably more sensitive when
the source is off-angle from the normal. Also, for the spherical
method, there are two solutions and selecting the correct one is
not always simple to do. The thinner dashed lines indicate this
fact.
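The sensitivity measurement described above can be sketched for the GS side as a small Monte Carlo experiment: perturb one DDOA with uniformly distributed error, re-solve, and take the standard deviation of the X-error. This is only an illustrative sketch; the function names, the five-microphone layout, the source position, the noise sizes, and the trial count are all assumptions, and the spherical method of [8], [9] is not reproduced.

```python
import numpy as np


def gs_solve(mics, ddoa):
    """Single-reference solve: mics[0] is the reference, ddoa[k] = D_(k+1) - D_0."""
    mics = np.asarray(mics, dtype=float)
    ddoa = np.asarray(ddoa, dtype=float)
    r2 = np.sum(mics**2, axis=1)
    A = np.column_stack([mics[1:] - mics[0], ddoa])
    w = 0.5 * (r2[1:] - r2[0] - ddoa**2)
    u, *_ = np.linalg.lstsq(A, w, rcond=None)
    return u[:3]


def x_error_std(mics, source, delta, trials=1000, seed=0):
    """Std of the X-error when one DDOA is perturbed uniformly in [-delta, delta]."""
    rng = np.random.default_rng(seed)
    mics = np.asarray(mics, dtype=float)
    source = np.asarray(source, dtype=float)
    dist = np.linalg.norm(mics - source, axis=1)
    clean = dist[1:] - dist[0]                    # noise-free DDOAs
    errs = np.empty(trials)
    for t in range(trials):
        noisy = clean.copy()
        noisy[0] += rng.uniform(-delta, delta)    # perturb a single selected DDOA
        errs[t] = gs_solve(mics, noisy)[0] - source[0]
    return errs.std()


# Hypothetical minimal configuration: five non-coplanar microphones (meters).
mics = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
source = np.array([0.4, -0.3, 2.5])
for delta in (0.001, 0.005, 0.01):
    print(delta, x_error_std(mics, source, delta))
```

Plotting the returned standard deviation against `delta` for several source positions would give curves of the same kind as those described for Fig. 1, though the numbers depend entirely on the assumed geometry.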
Next, we compared the GS algorithm to some current
real-time algorithms, SRP-PHAT and LEMSalg, for accuracy,
estimation rate, and cost. SRP-PHAT [4], [13] is known as very
accurate in a noisy environment, although computationally
costly. However, it is viable as a real-time algorithm using
the method in [13]. LEMSalg has been used at Brown as our
real-time source locator for nearly ten years [5].
We made a 50-s recording of a loud, fixed-location, human
talker (average signal to background noise about 15 dB). In all
the algorithms, the same 24 microphones, 12 each on two orthogonal panels, were used; one panel was most sensitive to X
and the other most sensitive to Z.
Fig. 1. Mathematical simulation: comparison of sensitivity of the GS algorithm with the spherical solution [8], [9].

For the SRP-PHAT algorithm, all 276 unique pairings of different microphones taken from the 24 microphones were used
[13]. In LEMSalg, 16 pairs were taken from the 24 microphones
to generate a 16-vector of TDOAs. Then the simplex method
was used to find the best least-squares fit of the measured TDOA
vector with a hypothesized one. Fortunately, the search space
was quite smooth and the simplex method converged much of
the time, and indicated its own failure easily when it did not.
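To make the pair counts and the LEMSalg fitting criterion concrete, the sketch below enumerates the unique pairings of 24 microphones and defines a least-squares misfit between a measured TDOA vector and the one hypothesized for a candidate source position. It shows only the objective that a simplex search would minimize, not the search itself or the actual LEMSalg implementation. The function names, the random microphone layout, the choice of 16 pairs, and the speed of sound of 343 m/s are assumptions made for this example.

```python
import itertools

import numpy as np


def hypothesized_tdoas(candidate, mics, pairs, v=343.0):
    """TDOAs (seconds) a source at `candidate` would produce for each (i, j) pair."""
    candidate = np.asarray(candidate, dtype=float)
    dist = np.linalg.norm(np.asarray(mics, dtype=float) - candidate, axis=1)
    return np.array([(dist[i] - dist[j]) / v for i, j in pairs])


def tdoa_misfit(candidate, mics, pairs, measured, v=343.0):
    """Sum of squared differences between measured and hypothesized TDOAs."""
    diff = np.asarray(measured) - hypothesized_tdoas(candidate, mics, pairs, v)
    return float(np.sum(diff**2))


# All unique pairings of 24 microphones, as used for SRP-PHAT.
all_pairs = list(itertools.combinations(range(24), 2))
print(len(all_pairs))

# Hypothetical layout and an arbitrary selection of 16 pairs for the TDOA vector.
rng = np.random.default_rng(1)
mics = rng.uniform(0.0, 3.0, size=(24, 3))
pairs = all_pairs[:16]
source = np.array([1.0, 0.5, 2.0])
measured = hypothesized_tdoas(source, mics, pairs)   # noise-free "measurement"

print(tdoa_misfit(source, mics, pairs, measured))
print(tdoa_misfit(source + np.array([0.5, 0.0, 0.0]), mics, pairs, measured))
```

A search method would evaluate `tdoa_misfit` at trial positions and move toward smaller values; the closed-form GS solve avoids this iteration altogether.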
The exact same TDOA estimation technique and frame size
described in [5] were used for the GS algorithms and LEMSalg
(SRP-PHAT does not make TDOA decisions). Each has the
ability to discard bad TDOA estimates as they are detected. If
the number of good estimates for a frame becomes too small,
then the frame is discarded. The percentage of frames estimated
is taken from these data.
Results for four versions of the GS technique are shown in
Table I. The size of the result vector in the GS method is four
for a single referent and five for two referents.
GS-22 assumes two reference microphones, one in the middle of
each panel. However, it was later postulated
that computing the algorithm separately for each panel would
give better results, so GS-11-11, GS-21-21, and GS-30-30 use
11, 21, and 30 independent pairs, respectively.
Then the X-value of the location estimate was taken
from the first panel and the Z-value from the second. We
experimentally determined that the better result and standard deviation
for the Y-value were obtained by using the output from the panel closest
to the source, which here was the first panel.
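The per-panel combination rule just described amounts to picking one coordinate from each independent estimate. A minimal sketch follows, in which the function name, the argument layout, and the numerical values are hypothetical and chosen only to show the selection.

```python
import numpy as np


def fuse_panel_estimates(first, second, y_panel="first"):
    """Combine two per-panel (x, y, z) estimates into a single location.

    X comes from the first panel, Z from the second, and Y from whichever
    panel is designated as closest to the source.
    """
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    y = first[1] if y_panel == "first" else second[1]
    return np.array([first[0], y, second[2]])


# Made-up per-panel estimates (meters), for illustration only.
fused = fuse_panel_estimates([1.02, 0.48, 2.90], [0.91, 0.61, 3.05])
print(fused)
```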
First note that SRP-PHAT made good estimates for all
frames; it is the most reliable in this sense in a noisy environment. We measured the position of the human talker, but
experience has shown that the SRP-PHAT location is even
more accurate than our physical measurement (on the order
of several centimeters), so we took this as the correct location
point. LEMSalg showed relatively small standard deviations
and was between one and two orders of magnitude lower in
cost than SRP-PHAT, even with the improved method of [13].
However, only 71% of the frames were estimated. The GS
algorithms improved in accuracy and estimation rate as more
pairs were included. GS-30-30 showed about the same accuracy
as LEMSalg, although with higher standard deviation in Y. It
should be noted that the standard deviation is generally highest
in Y because the array aperture in the Y-direction is virtually
an order of magnitude smaller than the aperture in either of
the other directions. GS-30-30 also costs about four times the
computational cost of LEMSalg because it is necessary to
compute 60 IDFTs for the TDOA estimation versus only 16 for
LEMSalg (60/16 = 3.75). In both LEMSalg and the GS algorithm, the signal
processing of TDOAs dominates the overall computational
cost.

TABLE I
COMPARISON OF GS RESULTS TO SRP-PHAT AND LEMSalg

V. CONCLUSION
We have presented the GS linear, closed-form method for estimating source location from TDOAs for five or more microphones having linearly independent spacing.
The determination of the source location requires the computation of the solution of a fourth-order linear system of equations
for four non-reference microphones, or the pseudoinverse of an overdetermined
four-column system for more than four. It has been shown to have about the same sensitivity
to TDOA estimation errors as other TDOA-based, more complex quadratic solutions. On real data, the GS method essentially
has similar accuracy, estimation rate, and cost as the current estimator, LEMSalg. Higher accuracy and estimation rates can be
derived by doing some more computation, but, even for doing all
the pairs, this cost is one or two orders of magnitude smaller than
that of SRP-PHAT. As the GS method is closed-form, it does
not suffer from potential convergence issues of a search method
such as the simplex search used by LEMSalg. The GS method
also is nearly trivial to program and offers a large number of
variants, one of which may be the most suitable for any particular task.
REFERENCES
[1] H. F. Silverman, W. R. Patterson, III, and J. L. Flanagan, "The huge microphone array (HMA)," J. Acoust. Soc. Amer., vol. 101, no. 5, p. 3119, May 1997.
[2] H. F. Silverman and S. E. Kirtman, "A two-stage algorithm for determining talker location from linear microphone-array data," Comput., Speech, Lang., vol. 6, no. 2, pp. 129–152, Apr. 1992.
[3] M. S. Brandstein and H. F. Silverman, "A New Time-Delay Estimator for Finding Source Locations Using a Microphone Array," LEMS, Div. Eng., Brown Univ., Providence, RI, LEMS Tech. Rep. 116, Mar. 1993.
[4] J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays: Techniques and Applications, M. Brandstein and D. Ward, Eds. New York: Springer-Verlag, 2001, pp. 157–180.
[5] H. F. Silverman, Y. Yu, J. M. Sachar, and W. R. Patterson, III, "Performance of real-time source-location estimators for a large-aperture microphone array," IEEE Trans. Speech Audio Process., vol. 13, no. 4, pp. 593–606, Jul. 2005.
[6] R. O. Schmidt, "A new approach to geometry of range difference location," IEEE Trans. Aerosp. Electron. Syst., vol. AES-8, no. 6, pp. 821–835, Nov. 1972.
[7] H. F. Silverman and K. J. Doerr, "An Algorithm for Determining Talker Location Using a Linear Microphone Array and Optimal Hyperbolic Fit," LEMS, Div. Eng., Brown Univ., Providence, RI, LEMS Tech. Rep. 77, Jul. 1990.
[8] J. M. Delosme, M. Morf, and B. Friedlander, "Source location from time differences of arrival: Identifiability and estimation," in Proc. ICASSP-80, Denver, CO, Apr. 1980, pp. 818–824.
[9] H. C. Schau and A. Z. Robinson, "Passive source localization employing intersecting spherical surfaces from time-of-arrival differences," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-35, no. 8, pp. 1223–1225, Aug. 1987.
[10] M. S. Brandstein, J. E. Adcock, J. H. DiBiase, and H. F. Silverman, "A closed-form method for finding source locations from microphone-array time-delay estimates," in Proc. ICASSP-1995, Detroit, MI, May 1995, pp. 3019–3022.
[11] S. S. Reddi, "An exact solution to range computation with time delay information for arbitrary array geometries," IEEE Trans. Signal Process., vol. 41, no. 1, pp. 485–486, Jan. 1993.
[12] A. Ben-Israel and T. N. E. Greville, Generalized Inverses: Theory and Applications. New York: Wiley, 1974.
[13] H. Do, H. F. Silverman, and Y. Yu, "A real-time SRP-PHAT source location implementation using stochastic region contraction (SRC) on a large-aperture microphone array," in Proc. ICASSP-2007, Honolulu, HI, Apr. 2007, pp. I-121–I-124.
[14] C. Militello and S. R. Buenafuente, "An exact noniterative linear method for locating sources based on measuring receiver arrival times," J. Acoust. Soc. Amer., vol. 121, no. 6, pp. 3595–3601, Jun. 2007.
