
Ilmenau University of Technology

Master Thesis
Gyan Vardhan Singh

Psychoacoustic Investigation on the Auralization of Spherical Microphone Array Data using Wave Field Synthesis

Matrikel No.: 47816
Thesis No.: 2181/13MA/08
Professor: Univ.-Prof. Dr.-Ing. Karlheinz Brandenburg
Supervisor: Dipl.-Ing. Johannes Nowak
Department: Electronic Media Technology Lab (TU Ilmenau)
Date: 21-May-2014

ACKNOWLEDGEMENTS
This master's thesis would not have been possible without the support of many people.
Firstly, I wish to express my gratitude to Univ.-Prof. Dr.-Ing. Karlheinz Brandenburg
for giving me the chance to work on such an interesting topic in his group.
I owe my deepest gratitude to my supervisor Dipl.-Ing. Johannes Nowak for giving me
the opportunity to write this master's thesis under his supervision. His constant guidance,
assistance and support were invaluable for this thesis.
Further, I wish to express my love and gratitude to my beloved family, especially my
parents, for their love, understanding and support throughout my studies.
Finally, I thank all my friends for their support during my studies.

ABSTRACT

Microphone arrays are structures in which at least two microphones are placed
at different positions in space, generally in a geometrical arrangement. In many applications
we need, apart from the temporal characteristics, a spatial characterization of sound
fields, and microphone arrays are employed to achieve this goal.
Microphone arrays play a particularly important role in spatial sound reproduction.
Researchers have used different microphone array configurations for
sound recording, characterization of room acoustics and auralization.
As research in spatial sound reproduction progressed, it was found that rendering
sound using an array of loudspeaker elements alone is not sufficient to fully auralize an
acoustic scene. It was proposed that microphone arrays be employed on the recording side
in order to reproduce the complete three-dimensional acoustic behaviour. Researchers
have used different array configurations, such as planar or circular arrays, to map the
listening room acoustics for the purpose of auralization with a rendering system,
e.g. wave field synthesis (WFS).
A drawback was noticed with two-dimensional arrays: they were not able to sufficiently
characterize an acoustic scene in three dimensions, and hence the spherical microphone
array came into the picture. Spherical microphone arrays and their processing have been
described by many authors, but a perceptual analysis of the various factors that degrade
the performance of spherical microphone arrays has not yet been fully established.
In the present work we carry out a detailed analysis of the processing chain, which starts with the
simulation of room characteristics with a spherical microphone array and continues with wave field analysis
of the sound fields, classification of errors and auralization of the free field impulse
responses. We bring together the existing state of the art in spherical microphone
array processing and examine the perceptual impact of different factors. We use a rigid
sphere configuration and analyze three different error categories, namely positioning
error, spatial aliasing and microphone noise. We attempt to establish a qualitative
and quantitative relation between the errors and limitations encountered in spherical
microphone array processing and investigate the psychoacoustic effects by auralizing the
free field data through WFS.

A spherical microphone array gives a complete three-dimensional image of the acoustic
environment; the spherical microphone array data is decomposed into plane waves
using plane wave decomposition. In the process of plane wave decomposition the spherical
aperture of a spherical microphone array is discretized, and because of this, limitations
are imposed on the performance of the array.
We simulate the impact of an ideal full audio spectrum wave field on the continuous aperture
of a spherical microphone array and compare this with a sampled array aperture.
In the listening test we auralized a sound field based on the ideal wave field decomposition
of a continuous aperture and compared it with different degrees of errors in different
categories. By this comparison we attempt to establish the extent to which a given
error perceptually corrupts a reproduced sound field. We also examine the
extent to which some degree of error remains perceptually insignificant, in other
words, the extent of error which can be tolerated.
We examine the spatial aliasing limit imposed by the rendering system and on that
basis establish a baseline for the transform order (l = 3) used in the spherical array
processing. The perceptual analysis is done in two ways: we first obtain an error
level which, when incorporated in the auralization process (simulated for l = 3), is
perceptually insignificant, and we then look for the perceptual effects of this error
when the transform order l is changed stepwise.
We also try to establish a correspondence between wave field synthesis on the rendering
side and the spherical microphone array on the measurement side. We investigate to
what extent wave field synthesis can retain the perceptual quality by analyzing the
psychoacoustic effects of changing various parameters on the spherical microphone
array side. The independence of the rendering side with regard to the measurement side is
also analysed.

Contents

1 INTRODUCTION
   1.1 Preliminaries
   1.2 Auralization
   1.3 Motivation
   1.4 Organization of Thesis

2 MATHEMATICAL ANALYSIS AND STATE OF THE ART
   2.1 Acoustic wave equation
       2.1.1 Homogeneous acoustic wave equation
       2.1.2 Solution of wave equation in Cartesian coordinates
       2.1.3 Solution of wave equation in spherical coordinates
       2.1.4 Spherical Bessel and Hankel functions
       2.1.5 Legendre functions
       2.1.6 Spherical harmonics
       2.1.7 Radial velocity
   2.2 Spherical harmonic decomposition
       2.2.1 Interior and exterior problem
       2.2.2 Spherical wave spectrum
   2.3 Spherical wave sound fields
   2.4 Spherical harmonic expansion of plane wave
   2.5 Mode strength
   2.6 Discretization of spherical aperture and spatial aliasing
   2.7 Plane wave decomposition
   2.8 Spatial resolution in plane wave decomposition

3 ERROR ANALYSIS
   3.1 Measurement errors
   3.2 Description of measurement error function
   3.3 Microphone noise
   3.4 Spatial aliasing
   3.5 Positioning error

4 WAVE FIELD SYNTHESIS
   4.1 Physics behind WFS
   4.2 Mathematical description of WFS
   4.3 Synthesis operator
   4.4 Focusing operator
   4.5 Practical consequences

5 LISTENING TEST
   5.1 Listening test
   5.2 Reproduction set-up
   5.3 Auralization
       5.3.1 Aspects to be perceptually evaluated
       5.3.2 Processing
   5.4 Structure of listening test
       5.4.1 Audio tracks
       5.4.2 Listening test conditions
   5.5 Test subjects
   5.6 Evaluation
       5.6.1 Test subject screening
       5.6.2 Statistics for the evaluation of the listening test
       5.6.3 Definitions
   5.7 Spatial aliasing vs. transform order
   5.8 Evaluation of positioning error
   5.9 Microphone noise

6 Conclusions

Bibliography

List of Figures

List of Tables

APPENDIX

A Derivations
   A.1 Orthonormality of spherical harmonics and spherical Fourier transform
   A.2 Position vector and wave vector
   A.3 Plane wave pressure field for different levels
   A.4 Rigid sphere and open sphere configuration

Theses

1 INTRODUCTION

Digital processing of sounds so that they appear to come from particular locations in
three-dimensional space is an integral part of virtual acoustics.
In virtual acoustics the goal is the simulation of complex acoustic fields so that a listener
experiences a natural environment, and this is done by spatial sound reproduction
systems.
The realization of spatial sound reproduction systems builds on the concept of sound field
synthesis, which unifies various methodologies and analytical approaches.
In sound field synthesis we decompose the sound or audio into various components or
wave fields. In simple terms, we pull apart the basic components of sound, characterizing
various spatial and temporal properties. After applying complex signal processing
techniques we then reproduce the sound in such a way that these components merge
together in the propagation medium to auralize the complete three-dimensional
characteristics of the sound.
Hence, sound field synthesis is a principle whereby an acoustic environment is processed,
synthesized and reproduced or re-created such that the real acoustic scenario can
be perceived by the listener. Spatial sound, immersive audio, 3D sound and surround
sound systems are some of the terms often used to describe such audio systems.
Different aspects come into play in realizing a sound field reproduction system, and
broad research is being done attempting to understand the various factors.
Examples of sound field reproduction systems which deal with various conceptual
aspects of signal processing are wave field synthesis, which is also our choice
of reproduction system in this thesis, Higher Order Ambisonics [1], sound field
reproduction with MIMO acoustic channel inversion [2] and vector base amplitude
panning methods [3]. These are a few examples of spatial sound systems developed
by the respective researchers. In [4] the author presents a very detailed
mathematical treatment of various spatial sound reproduction techniques and attempts
to bring these related spatial sound systems onto a single mathematical plane
on the basis of functional analysis.
In the present work we put forward our analysis, in which we answer various questions
that come up when an acoustic environment is recreated. Sound reproduction
techniques for virtual sound systems have been studied, developed and implemented
in various ways and configurations. Acoustic auralization of sound fields in
this work focuses on wave field analysis (WFA) [5][6] of spherical microphone
arrays, and on their auralization over a two-dimensional geometry of loudspeaker array
following the principle of wave field synthesis (WFS).
In order to obtain the characteristics of an acoustic scene, researchers have proposed the
use of microphone arrays. Apart from temporal properties, spatial sound reproduction
also requires the spatial properties of the sound field, and microphone
arrays are therefore needed, as they can characterize the sound in space as well [7][5][6].
Auralization using microphone arrays has been attempted with various kinds of array
geometries; in [5] the author focused on a circular microphone array and used it for the
auralization of sound fields with wave field synthesis.
In [8], spatial sound design principles are explained for the auralization of room
acoustics.
In spatial sound design the spatial properties of an audio stream, such as position,
direction and orientation in a virtual room, as well as the room itself, are modified.
Two things are attempted: the first is the simulation of an acoustic environment,
and the other is the direction-dependent visualization and modification of the sound
field by the user. In this work we focus on the part where the simulation and
auralization of an acoustic scene is done.
More importantly, we investigate the factors influencing the microphone array used for
room impulse response (RIR) recording and analyse the perceptual effects that would
be observed during the auralization process when various parameters of the microphone
array are changed.
Any sound wave can be represented as a superposition of plane waves in the far field of its
sources [9][8], and consequently it can also be said that a room can be characterized
by its impulse responses, as it can be assumed to be linear time invariant (LTI). Hence,
if we are able to capture the impulse responses of a room, then we can fully
characterize the acoustic nature of that room, and in turn any acoustic event in that
room can be reproduced simply with the help of the plane wave decomposed components
of its room impulse responses.
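The LTI argument above can be made concrete with a small sketch. This is a minimal illustration with synthetic signals, not the thesis' actual processing: reproducing an acoustic event in an LTI room amounts to convolving the dry source signal with the room impulse response.

```python
import numpy as np

# Sketch (assumed setup): in an LTI room, any acoustic event is the
# convolution of the dry source signal with the room impulse response (RIR).
def auralize(dry_signal: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Reproduce an acoustic event by convolving dry audio with an RIR."""
    return np.convolve(dry_signal, rir)

# Toy RIR with a direct path and one reflection (values are illustrative).
fs = 8000                       # sample rate in Hz, assumed
rir = np.zeros(fs // 10)
rir[0] = 1.0                    # direct sound
rir[400] = 0.5                  # single reflection, 50 ms later
dry = np.random.default_rng(0).standard_normal(fs)

wet = auralize(dry, rir)
# Linearity check: scaling the input scales the output identically.
assert np.allclose(auralize(2.0 * dry, rir), 2.0 * wet)
```

Before the first reflection arrives, the reproduced signal equals the dry signal, which is exactly the LTI behaviour the paragraph relies on.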

1.1 Preliminaries
To understand how sound radiates in a medium we would like to introduce the
soap bubble analogy as explained by Zotter in [10, pages 6-10]. The sound radiation
is pictured as a soap bubble, as shown in Figure 1.1. We assume a free
sound field and an ideal soap bubble which is large enough to enclose a musician and
an instrument. When sound is produced by the instrument, the bubble surface
will vibrate according to the motion of the air, because as the sound propagates through the
medium it hits the bubble, and consequently the soap bubble vibrates with
the air molecules. At the respective observation points on the sphere, i.e. the soap bubble,
the waveform of the vibrating sphere can be said to represent the radiated sound.
In [9], Williams explains that the acoustic radiation from the instrument can
be completely defined if we are able to acoustically map the motion of this continuous
surface enclosing the sources. This kind of analysis of sound radiation is called the exterior
problem.
In a similar way, suppose there are no sources inside the soap bubble (rather, it encloses the
measurement set-up or listening area) and the sound instead propagates from outside,
i.e. the sources are outside and the waves hit the bubble from the exterior.
Again, as the bubble is in contact with the medium it will vibrate, and
identifying the motion of the surface of the bubble is sufficient to describe the
acoustic radiation; this is called the interior problem.
In [11], the exterior and interior problems are elaborated further. As the interior
problem is more important for our application, we present the interior
problem with respect to the spherical microphone array. A more mathematical treatment
is given in chapter 2, and for a more detailed analysis the reader is referred to [9, page 124].

Figure 1.1: Soap bubble model of acoustic radiation [10].
In auralization applications we follow the same analytical approach and characterize a
listening room environment by measuring the impulse responses coming from different
directions [7]; this in turn gives us the directional behaviour of the sound, i.e. how the direct
sound reaches and affects the spherical array and how the reflected sound behaves.

1.2 Auralization
In order to auralize the sound field while keeping the spatial characteristics of the sound alive,
a method based on WFS is applied in the present work. WFS is a consequence of
Huygens' principle, expressed mathematically by the Kirchhoff-Helmholtz integrals [12].
Wave field synthesis is discussed in explicit detail in [13], where Verheijen explains
the reproduction techniques with emphasis on the loudspeaker arrays. A mathematical
description of WFS follows in chapter 4. Here it is important to
point out that although WFS does reproduce the room effect, it is not sufficient for the
recreation of an acoustic virtual reality [14][5], as it lacks the knowledge of the acoustic
room impression which is obtained by wave field analysis of the room. Hence, WFS
combined with wave field analysis was proposed in [6]. The proposed techniques suggest
obtaining the room characteristics by measuring the room impulse responses;
the analysis of these impulse responses can then be used for calculating the driving
function for wave field synthesis. It is suggested that the reverberant part of the sound
can ideally be reproduced with 8-11 uncorrelated plane waves [6].
In our application we use different configurations of plane waves in order to
synthesize the sound waves. As mentioned by Sonke in [6], we examine plane wave
decomposition for different numbers of directions and finally settle on 12 directions.
We also compare the psychoacoustic effect of using different numbers of plane wave
sources and evaluate the optimal configuration.
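As a sketch of how such a direction grid might be chosen (an assumption for illustration; the thesis' exact directions are not specified here), twelve equally spaced azimuths in the horizontal plane are a natural choice for a 2D WFS rendering setup:

```python
import numpy as np

# Illustrative sketch: pick N plane-wave directions for the decomposition.
# Equally spaced azimuths in the horizontal plane are assumed here; the
# thesis' actual direction grid may differ.
def plane_wave_directions(n: int = 12) -> np.ndarray:
    """Unit direction vectors for n equally spaced azimuth angles."""
    az = 2.0 * np.pi * np.arange(n) / n
    return np.stack([np.cos(az), np.sin(az)], axis=1)

dirs = plane_wave_directions(12)
```

By symmetry the twelve unit vectors sum to zero, so the grid has no preferred direction.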
In [5] the techniques and suggestions proposed in [6] are further explored and implemented
using a circular microphone array. Two methodologies are explained in [5]:
Impulse response based auralization
Natural recording based auralization
Impulse response based auralization: In this approach the room acoustics are
measured and analyzed, i.e. impulse responses for the room are measured. For
reproduction, the room characteristics obtained from the impulse response
measurement are combined with a dry audio channel and then reproduced. In
simpler words, suppose an audio file is being played in a particular room and
this acoustic scene now has to be recreated in another room. Then the knowledge of the
room impulse responses (RIR) is sufficient to recreate the same acoustic scene
by convolving the dry audio file with the directional responses obtained by plane wave
decomposition of the RIR. Refer to Figure 1.2.
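The pipeline just described can be sketched as follows (array names, shapes and values are illustrative assumptions, not the thesis' implementation): each plane-wave component of the RIR is convolved with the dry audio and becomes the feed of one virtual source in the WFS renderer.

```python
import numpy as np

# Sketch of the impulse-response-based pipeline (shapes assumed):
# dir_rirs holds one directional RIR per plane-wave direction; each is
# convolved with the dry signal to produce one WFS source feed.
def directional_feeds(dry: np.ndarray, dir_rirs: np.ndarray) -> np.ndarray:
    """dir_rirs: (n_directions, rir_len); returns one feed per direction."""
    return np.stack([np.convolve(dry, h) for h in dir_rirs])

rng = np.random.default_rng(1)
dry = rng.standard_normal(1000)          # dry audio, synthetic
dir_rirs = rng.standard_normal((12, 64)) * 0.01  # 12 directional RIRs, synthetic
feeds = directional_feeds(dry, dir_rirs)
```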
Figure 1.2: Impulse response based auralization [5].

Natural recording based auralization: In this approach a real-time recording of the
sound field is made together with the audio signals. Instead of measuring impulse
responses separately, a live recording of the acoustic event is captured. In natural recording
based auralization no separate impulse responses or perceptual parameters are used;
refer to Figure 1.3.
In the present work we follow the impulse response based auralization technique. One
important factor for choosing this approach is that the measurement and reproduction
sites are independent of each other [8][11][5].
For the auralization of an acoustic environment, for example a hall, over an extended area,
impulse responses have to be measured at an array of microphone positions [15].
The impulse responses can be processed by three different techniques:
Holophony
Wave Field Extrapolation (WFE)
Wave Field Decomposition (WFD)

Figure 1.3: Natural recording based auralization [5].
Holophony: The impulse responses are measured at microphone positions which in
turn correspond to the loudspeaker positions in wave field synthesis. The impulse
responses in this approach can be used directly as convolution filters to drive the
corresponding loudspeakers, and no further processing is required. Although this technique
looks very straightforward, it is very inflexible, since the output cannot be
used for any WFS layout other than the one specifically designed for the corresponding
microphone array set-up used in the measurement. Moreover, holophony
requires microphones with very sharp directivity patterns, which is quite
unrealistic in practice [16][5].
Wave Field Extrapolation: In WFE the measurement of impulse responses is not
necessarily done at positions that correspond to the WFS loudspeaker configuration.
The impulse responses measured with some particular microphone array configuration
are extrapolated to the required WFS loudspeaker array positions (in principle
these positions are different from the microphone positions, hence extrapolation).
The extrapolation can be done using the Kirchhoff-Helmholtz integrals [5]. Although
auralization with wave field extrapolation has given satisfactory results, there are
some drawbacks: it requires a very large measurement array for a medium-sized
extrapolation area [7]. In [5] the author has shown that a microphone array at least
the size of the listening area is required in order to achieve satisfactory results.
Wave Field Decomposition, [17][15][18]: The wave field decomposition approach
decomposes the sound field into plane waves which arrive from different directions.
The plane wave decomposition can be considered an acoustic photograph of the
sound sources, including secondary sources, which can be regarded as the ones generating
reflections [7].
The impulse responses are decomposed into plane waves which give a directional
image of the sound field. These plane waves are then reproduced as point sources in
the WFS set-up. The measurement array and the reproduction site are independent of each
other in this approach, and we can reproduce the sound field for a larger area compared
with the other two approaches. The size of the measurement array and that of the
loudspeaker array are independent as long as the microphone array characterizes
the room sufficiently [5][8]. The plane waves obtained through plane wave decomposition
can optimally represent the sources and reflections, and hence in principle we can
reproduce the sound field satisfactorily. Due to the considerable advantages of wave field
decomposition over the other methods, we focus our work on plane wave decomposition
of the acoustic wave fields. In the next chapter we present the analysis for plane wave
decomposition with a spherical microphone array. In [5] a circular microphone array was
implemented for the purpose of auralization with WFS, but in order to obtain a three-dimensional
plane wave decomposition the use of a spherical microphone array is necessitated
[11][8].
In our work we simulate the acoustic characteristics of a free field full spectrum wave
impact on a spherical microphone array and analyze it to obtain plane waves representing
the direct sources, reflections and reverberant part; these plane wave responses are
fed into the driving filter of WFS, and we auralize the sound. As spherical microphone
arrays are important for the three-dimensional sampling of acoustic radiation, we study
different aspects of the spherical microphone array in this work and investigate their
influence on spatial sound reproduction.
Finally, we auralize the sound field for the different cases, and perceptual listening tests are
conducted. Test subjects are invited to listen to our simulated wave fields,
which are auralized using a WFS spatial sound renderer consisting of 88 loudspeaker
elements in a two-dimensional, nearly circular geometry.


1.3 Motivation
In this thesis we investigate the perceptual effects involved in spatial sound
reproduction. Specifically, we focus on spherical microphone arrays.
For auralization applications, microphone arrays have been used and the sound waves
reproduced using different methodologies, importantly wave field synthesis (WFS).
WFS is an effective tool for spatial sound reproduction as it can synthesize sound fields
for large listening areas; apart from that it shows robustness towards various practical
limitations [13][19]. It is important to mention this rendering technique again
here as our work can be divided into three parts, and we use this methodology in the second
part:
1. Recording
2. Auralization using WFS
3. Investigation of Psychoacoustic effects
The need to understand and explain the perceptual effects that get incorporated
is still fairly unexplored territory as far as the use of spherical microphone arrays
for auralization purposes with WFS is concerned.
There exist many mathematical parameters and inherent errors, namely:
1. Microphone Noise
2. Positioning error in the array structure
3. Spatial Aliasing
4. Transform order
These errors and artifacts are bound to perceptually influence the auralization process;
in our work we investigate these relatively unexplored areas both subjectively and
objectively.
We try to determine how well a sound field reproduction system with given parameters
and specifications tolerates these errors. We also investigate
to what extent the mathematics and theory hold good in perceptual terms.


1.4 Organization of Thesis


The thesis is divided into six chapters; the presentation of the work follows the
practical scheme of the work.
We start with the fundamentals and state of the art of spherical microphone arrays in
Chapter 2. We explain the basic mathematical fundamentals in this chapter and
continue with spherical harmonic decomposition, types of arrays and the behaviour of
their radial filters, spatial sampling in spherical microphone arrays, and then plane wave
decomposition. We also discuss the spatial resolution of plane wave decomposition
and its limitations.
In Chapter 3 we bring up the issues related to the errors which get involved in the
course of processing. Positioning error, microphone noise and spatial aliasing
are discussed in this chapter, and the state of the art on how these errors are
incorporated in the theoretical analysis is also presented.
Chapter 4 explains wave field synthesis, its basic theoretical background and
its limitations.
In Chapter 5 we first describe the auralization process and then the listening test,
and the related analysis of the perceptual effects of the errors and artifacts is presented.
Chapter 6 concludes; it draws out the results more prominently, and we
discuss the final suggestions of this work and future work.

2 MATHEMATICAL ANALYSIS AND STATE OF THE ART

In this chapter we discuss the fundamentals of wave propagation and sound fields and
summarize the existing state of the art in spherical microphone array processing and
its auralization using wave field synthesis.
The work presented in this thesis is based on simulating free field room impulse
responses with a spherical microphone array; these impulse responses are then
utilized for rendering the spatial sound with the WFS set-up.

2.1 Acoustic wave equation


The acoustic wave equation is the mathematical formulation of sound propagation
through a medium. This section provides the introductory basics of the wave equation;
for more discussion please refer to [9][20][21].

2.1.1 Homogeneous acoustic wave equation


For the derivation of the acoustic wave equation some basic assumptions are made [22]
[23] [24]:
1. The medium of propagation of the sound waves is homogeneous, i.e. the material
characteristics of the medium are time invariant.
2. The medium is quiescent, i.e. it remains in a state of inactivity or dormancy.

Master Thesis Gyan Vardhan Singh

2 MATHEMATICAL ANALYSIS AND STATE OF THE ART

12

3. The propagation medium is characterized as an ideal gas.
4. The state changes in the gas are modeled as an adiabatic process, i.e. a process
that takes place without the transfer of heat or matter between a system and its
surroundings.
5. The static pressure p0 and static density ρ0 are large in comparison to the
pressure and density perturbations of the wave propagation.
The first condition assures the independence of the relevant parameters of the medium.
The second condition assures that the parameters are independent of time
and that there is no gross movement of the medium. The laws of ideal gases can be applied
as a result of assumption three. The fourth assumption postulates that there is no energy
exchange in the form of heat conduction in the medium, i.e. there are no propagation
losses. And the fifth assumption tells us that we can linearize the field variables and
medium characteristics around an operating point.
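As a numerical aside (not from the thesis), the ideal-gas and adiabatic assumptions yield the familiar expression c = sqrt(gamma * p0 / rho0) for the speed of sound; the values below are typical for air at about 20 degrees Celsius and are assumed for illustration:

```python
import math

# Speed of sound under the ideal-gas, adiabatic assumptions above.
# All numeric values are assumed (typical for air at ~20 degrees C).
gamma = 1.4          # adiabatic index of air
p0 = 101325.0        # static pressure in Pa
rho0 = 1.204         # static density in kg/m^3

c = math.sqrt(gamma * p0 / rho0)
# c comes out close to the familiar 343 m/s.
```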
Two fundamental principles are used to derive the wave equation:
1. conservation of mass
2. the equation of momentum

Figure 2.1: Infinitesimal volume element used for the derivation of Euler's equation.
The momentum equation relates the force applied to a volume element to the
acceleration of the element due to this applied force. In Figure 2.1 an
infinitesimal volume element is considered; we use this picture for the derivation of
Euler's equation [20][9]. Consider an infinitesimal volume element of fluid with
dimensions Δx Δy Δz.


All six faces experience forces due to the pressure p(x, y, z) in the fluid. Assume that the pressure on one side is higher than on the other; a force is then exerted on the volume element and it tends to move along the direction of that force. From Newton's laws of motion we relate this force to an acceleration. If we carry out the same analysis for all three directions, we end up with Euler's equation, which relates the pressure applied to the fluid to changes in the particle velocity of the fluid.
\rho_0 \frac{\partial \vec{v}}{\partial t} = -\nabla p \qquad (2.1)

Here ρ_0 is the fluid density and \vec{v} is the velocity vector at any position (x, y, z) in the medium,

\vec{v} = u\,\vec{e}_x + v\,\vec{e}_y + w\,\vec{e}_z \qquad (2.2)
p is the pressure. ∇ is called the gradient or nabla operator and is defined as

\nabla \equiv \frac{\partial}{\partial x}\vec{e}_x + \frac{\partial}{\partial y}\vec{e}_y + \frac{\partial}{\partial z}\vec{e}_z \qquad (2.3)

where \vec{e}_x, \vec{e}_y and \vec{e}_z are unit vectors in the x, y, z directions respectively; in the literature they are sometimes written as \vec{i}, \vec{j}, \vec{k}. ∇p is the pressure gradient and ∂\vec{v}/∂t is the change in the particle velocity.
The second equation, which follows from conservation of mass, is given as [22][20]:

\frac{\partial \rho}{\partial t} + \rho_0\, \nabla \cdot \vec{v} = 0 \qquad (2.4)

where ρ is the density of the propagation medium and \vec{v} is the acoustic particle velocity. As the above five assumptions are assumed to hold, equation 2.4 signifies that the time rate of change of the density of the medium is proportional to the divergence of the particle velocity of the medium times the static density of the medium.
Further, taking the above assumptions into consideration, the time derivative in equation


2.5 expresses the proportionality between the time derivative of the acoustic pressure and that of the density ρ; refer to [20][25][22] for a more detailed description.

\frac{\partial p}{\partial t} = c^2 \frac{\partial \rho}{\partial t} \qquad (2.5)

where p is the pressure, which is a variable in position and time t, and c is the speed of sound.
Equation 2.5 gives the temporal derivative of the density of the propagation medium in terms of changes in pressure; combining equations 2.5 and 2.4 in view of the last assumption, we get

\frac{\partial p}{\partial t} = -\rho_0 c^2\, \nabla \cdot \vec{v} \qquad (2.6)

Equations 2.1 and 2.6, together with initial and boundary conditions, form a complete set of first order partial differential equations with a unique solution. These equations can be combined into a single second order equation [25][20]. The time derivative of equation 2.6 is

\frac{\partial^2 p}{\partial t^2} = -\rho_0 c^2\, \nabla \cdot \frac{\partial \vec{v}}{\partial t} \qquad (2.7)

Replacing the particle velocity term in equation 2.7 with the gradient of pressure from Euler's equation 2.1, we obtain the homogeneous wave equation

\nabla^2 p - \frac{1}{c^2}\frac{\partial^2 p}{\partial t^2} = 0 \qquad (2.8)

where p is the pressure, a function of position and time t. Equation 2.8 can also be represented in the frequency domain by applying the Fourier transform with respect to time t to the acoustic pressure p [26][9].

\nabla^2 P(\vec{r}, \omega) + \underbrace{\left(\frac{\omega}{c}\right)^2}_{k^2} P(\vec{r}, \omega) = 0 \qquad (2.9)

Equation 2.9 is known as the Helmholtz equation; \vec{r} = (x, y, z) is the position, k = ω/c is the wave number and ω = 2πf. Analytically it can be seen that k = 2π/λ, λ being the wavelength, so over one wavelength the phase advances by kλ = 2π; if we want to know the phase of a wave after it has travelled, say, 7λ/9, then (7λ/9)·k gives that phase.


2.1.2 Solution of Wave equation in cartesian coordinates


The general solution of the wave equation in cartesian coordinates is derived through the Helmholtz equation in three dimensions [9].

P(\vec{r}, \omega) = A(\omega)\, e^{i(k_x x + k_y y + k_z z)} \qquad (2.10)

where A(ω) is an arbitrary constant. Here we define k as

k^2 = k_x^2 + k_y^2 + k_z^2 \qquad (2.11)

Another notation for the plane wave solution is

P(\vec{r}, \omega) = A(\omega)\, e^{i \vec{k} \cdot \vec{r}} \qquad (2.12)

In the time domain the solution of the wave equation is

p(t) = A\, e^{i(k_x x + k_y y + k_z z - \omega_0 t)} \qquad (2.13)

p(t) = A\, e^{i(\vec{k} \cdot \vec{r} - \omega_0 t)} \qquad (2.14)

where A is a constant. This is the plane wave solution of the wave equation at a given frequency ω_0. We have directly put forward the solution of the wave equation in cartesian coordinates in an introductory form; for a detailed description please see [9].
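As a quick sanity check, the plane wave solution above can be verified numerically against the Helmholtz equation 2.9. The sketch below (frequency, propagation direction and grid spacing are illustrative choices, not values from this thesis) evaluates the Laplacian of e^{i k·r} by central finite differences and confirms that ∇²P + k²P vanishes up to discretization error.

```python
import numpy as np

# sanity check: P(r) = exp(i k.r) satisfies the Helmholtz equation (2.9),
# laplacian(P) + k^2 P = 0; all numeric values are illustrative
c = 343.0                                   # speed of sound [m/s]
f = 1000.0                                  # frequency [Hz]
k = 2 * np.pi * f / c                       # wave number k = omega / c
k_vec = k * np.array([1.0, 2.0, 2.0]) / 3.0 # wave vector with |k_vec| = k

h = 1e-3                                    # grid spacing [m]
ax = np.arange(-2, 3) * h                   # 5 grid points per axis
X, Y, Z = np.meshgrid(ax, ax, ax, indexing="ij")
P = np.exp(1j * (k_vec[0] * X + k_vec[1] * Y + k_vec[2] * Z))

# central second differences, evaluated at the grid centre (index 2,2,2)
lap = sum(
    (np.roll(P, -1, axis=a) - 2 * P + np.roll(P, 1, axis=a))[2, 2, 2] / h**2
    for a in range(3)
)
residual = lap + k**2 * P[2, 2, 2]          # ~0 up to O(h^2) error
assert abs(residual) / k**2 < 1e-3
```

The residual shrinks quadratically with the grid spacing h, as expected for a second order finite-difference stencil.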

2.1.3 Solution of the wave equation in spherical coordinates


Now we discuss in detail the solution of the wave equation in the spherical coordinate system, as this is directly related to the processing of spherical microphone arrays. We recall the wave equation 2.8 and present it here again:

\nabla^2 p(x, y, z, t) - \frac{1}{c^2}\frac{\partial^2 p(x, y, z, t)}{\partial t^2} = 0 \qquad (2.15)


\nabla^2, also called the Laplace operator Δ = \nabla^2, is defined in cartesian coordinates as

\nabla^2 \equiv \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} \qquad (2.16)

The spherical coordinate system shown in figure 2.2 will be followed in this thesis. Looking at figure 2.2 we can express the cartesian coordinates in terms of r, θ, φ.

Figure 2.2: Spherical coordinate system and its relation to the Cartesian coordinate system

x = r \sin\theta \cos\varphi, \quad y = r \sin\theta \sin\varphi, \quad z = r \cos\theta \qquad (2.17)

Here r denotes the length of the vector \vec{r} and the direction (θ, φ) represents the azimuth–elevation pair. Hence r = \sqrt{x^2 + y^2 + z^2}, θ = \tan^{-1}\!\left[\sqrt{x^2 + y^2}/z\right] and

φ = \tan^{-1}[y/x]. Considering equations 2.15 and 2.17 we can express the wave equation in spherical coordinates as
\frac{1}{r^2}\frac{\partial}{\partial r}\!\left(r^2 \frac{\partial p}{\partial r}\right) + \frac{1}{r^2 \sin\theta}\frac{\partial}{\partial \theta}\!\left(\sin\theta\, \frac{\partial p}{\partial \theta}\right) + \frac{1}{r^2 \sin^2\theta}\frac{\partial^2 p}{\partial \varphi^2} - \frac{1}{c^2}\frac{\partial^2 p}{\partial t^2} = 0 \qquad (2.18)

In this equation p is a function of (r, θ, φ, t). The zero on the right hand side reflects the assumption that there are no sources in the volume for which the equation is defined. The solutions of this wave equation in the frequency domain are explained in [9] and are given in two forms as

p(r, \Omega, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \left(A_{lm}(k)\, j_l(kr) + B_{lm}(k)\, y_l(kr)\right) Y_{lm}(\Omega) \qquad (2.19)

p(r, \Omega, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \left(C_{lm}(k)\, h_l^{(1)}(kr) + D_{lm}(k)\, h_l^{(2)}(kr)\right) Y_{lm}(\Omega) \qquad (2.20)

The two solutions represent the interior and the exterior problem: equation 2.20 refers to the exterior problem and equation 2.19 to the interior problem. We will elaborate on these two solutions and the coefficients A_lm(k), B_lm(k), C_lm(k) and D_lm(k) in later sections.
The level l and mode m are integers with values 0 ≤ l and −l ≤ m ≤ l. The acoustic wave number, as defined earlier, is k = ω/c = 2πf/c, where f is the frequency of the sound wave and c is the speed of sound in the medium. The functions j_l(kr) and y_l(kr) are the spherical Bessel functions of the first and second kind respectively. Similarly, h_l^(1)(kr) and h_l^(2)(kr) are known as the spherical Hankel functions of the first and second kind. Y_lm(Ω) is the spherical harmonic of level (or order) l and mode m and is defined as

Y_{lm}(\theta, \varphi) = \sqrt{\frac{(2l+1)}{4\pi}\frac{(l-m)!}{(l+m)!}}\; P_l^m(\cos\theta)\, e^{im\varphi} \qquad (2.21)


These expressions, which are an outcome of the derivation of the solution of wave equation 2.15, are achieved by separation of variables in equation 2.18. In [9, page 186], [25, page 380] and [20, page 337] the derivation and solutions are explained quite nicely; for a more detailed analysis of the separation-of-variables approach used in solving the wave equation please refer to [27].
In equation 2.21, P_l^m(cos θ) is the Legendre function of the first kind and i = \sqrt{-1}.

2.1.4 Spherical Bessel and Hankel functions


In [9], the solution of the spherical wave equation is given. We use the separation-of-variables approach [27] to solve the wave equation in the spherical coordinate system; in this process the spherical wave equation separates into four different differential equations. The solutions of these four constituent differential equations give us the solution of the wave equation in spherical coordinates, and lead us to the spherical Bessel functions, the Hankel functions and the Legendre polynomials which appear in the spherical harmonics.
j_l(kr) and y_l(kr) are related to the corresponding Bessel functions as [28][9]:

j_l(x) \equiv \sqrt{\frac{\pi}{2x}}\, J_{l+1/2}(x), \qquad y_l(x) \equiv \sqrt{\frac{\pi}{2x}}\, Y_{l+1/2}(x) \qquad (2.22)

The equations in 2.22 are valid for l ∈ ℝ. The spherical Hankel functions of the first and second kind, h_l^(1)(x) and h_l^(2)(x), are defined as

h_l^{(1)}(x) \equiv j_l(x) + i\, y_l(x), \qquad h_l^{(2)}(x) \equiv j_l(x) - i\, y_l(x) \qquad (2.23)

Here x is the argument, in our case kr. It is seen that when x is real, h_l^(1)(x) is the conjugate of h_l^(2)(x); in our case kr is always real, as it is the product of the wave number and the radius or distance from the origin. Asymptotically h_l^(1)(x) ∝ e^{ikr} and h_l^(2)(x) ∝ e^{−ikr} [9], hence the Hankel function of the first kind represents an outgoing wave whereas the second kind represents an incoming wave. These solutions are used depending

upon the location of our sources; in our case the sources lie outside the measurement sphere (refer to the explanation in chapter 1 about the soap bubble), hence we are interested in the incoming wave for the analysis of our spherical microphone array.

Figure 2.3: Spherical Bessel functions of the first kind j_l(x) (left) and the second kind y_l(x) (right) for orders l ∈ {0, 3, 6} [11]
Figure 2.3 shows the behaviour of these functions for different levels (orders) l with respect to the argument x. A few conclusions can be drawn from the plots. The spherical Bessel functions of the first kind are finite at the origin, but for higher orders, l > 0, there is an initial region where the function remains close to zero, the exception being j_0(x); the functions of the second kind diverge to negative infinity near the origin. Firstly, it follows directly from equation 2.23 that the spherical Hankel functions are singular at x = 0. The other consequence, which is of importance in a later part of our analysis, is the damped behaviour for l > x, where for us x is kr, the product of the wave number and the radius, which we may simply regard as a measure of the frequency of the acoustic wave. The spherical wave solution thus gives a kind of damped response in the lower frequency region, and in situations where we use a high value of the level l, also referred to as the transform order, we lose low frequency information of the acoustic wave; to retrieve it the signal must be amplified extensively. These conclusions will be recalled when we talk about the interior–exterior problem and the radial filter components, or mode strength, for the rigid sphere in the plane wave decomposition.

Master Thesis Gyan Vardhan Singh

2 MATHEMATICAL ANALYSIS AND STATE OF THE ART

20

2.1.5 Legendre functions


Referring to expression 2.21, the term P_l^m(x) appeared in the solution of the wave equation in the spherical coordinate system. This term is called the Legendre function. The Legendre functions for the case m = 0 are known as Legendre polynomials, denoted by P_l(x), and are expressed by Rodrigues' formula as [9]:

P_l(x) = \frac{1}{2^l\, l!} \frac{d^l}{dx^l}\left(x^2 - 1\right)^l \qquad (2.24)

The functions P_l^m(x), which carry two indices with m ≠ 0, are known as associated Legendre functions. For positive m

P_l^m(x) = (-1)^m \left(1 - x^2\right)^{m/2} \frac{d^m}{dx^m} P_l(x) \qquad (2.25)

and for negative m

P_l^{-m}(x) = (-1)^m \frac{(l-m)!}{(l+m)!}\, P_l^m(x), \qquad m > 0 \qquad (2.26)

The property of the Legendre functions which makes them attractive for us is that they form a set of orthogonal functions for each mode m. Hence the spherical harmonics are also a set of orthogonal functions. For further details the reader is referred to [9][25].
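The statements above can be checked numerically (the values of l, m and x are arbitrary; scipy's `lpmv` is assumed here as the implementation of the associated Legendre function):

```python
import numpy as np
from scipy.special import eval_legendre, lpmv

x = 0.3
# Rodrigues' formula (2.24) for l = 3 gives P_3(x) = (5x^3 - 3x)/2
assert np.isclose(eval_legendre(3, x), 0.5 * (5 * x**3 - 3 * x))

# associated Legendre function (2.25) for l = 3, m = 2:
# (-1)^2 (1 - x^2) d^2/dx^2 P_3(x) = 15 x (1 - x^2);
# scipy's lpmv uses the same (-1)^m Condon-Shortley convention
assert np.isclose(lpmv(2, 3, x), 15 * x * (1 - x**2))

# orthogonality over [-1, 1] for a fixed mode m -- the property that
# makes the spherical harmonics an orthogonal set
xs, ws = np.polynomial.legendre.leggauss(12)   # Gauss-Legendre rule
assert np.isclose(np.sum(ws * lpmv(1, 2, xs) * lpmv(1, 3, xs)), 0.0)
```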

2.1.6 Spherical harmonics


Any function on a sphere can be represented by a combination of spherical harmonics [9]; in our case the solution of the acoustic wave equation is obtained in terms of the spherical harmonics Y_lm(θ, φ), or Y_lm(Ω), given in equation 2.21. The spherical harmonics define the angular components of the wave solution. Considering equation 2.26, the spherical harmonic for negative m can be obtained from the solution for positive m as

Y_{l(-m)}(\Omega) = (-1)^m\, Y_{lm}^{*}(\Omega), \qquad m > 0 \qquad (2.27)


where Y*_lm(Ω) is the complex conjugate of Y_lm(Ω). There are 2l + 1 different spherical harmonics for each level l, as −l ≤ m ≤ l. One more property of the spherical harmonics is that they are not only orthogonal but orthonormal too [9, page 191].

\int_{S^2} Y_{lm}(\Omega)\, Y_{l'm'}^{*}(\Omega)\, d\Omega = \delta_{l'l}\, \delta_{m'm} \qquad (2.28)

Here δ_{l'l} is the Kronecker delta, which is 1 for l' = l and 0 otherwise. The surface integral is defined as

\int_{S^2} d\Omega = \int_{0}^{2\pi} d\varphi \int_{0}^{\pi} \sin\theta\, d\theta \qquad (2.29)

As said above, any function on a sphere can be decomposed into a sum of spherical harmonics [9, page 192], [29, page 202].

f(\Omega) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} f_{lm}(k)\, Y_{lm}(\Omega) \qquad (2.30)

This expression can also be termed the inverse spherical Fourier transform (ISFT) [29]. As the spherical harmonic functions are orthonormal, we can obtain the spherical Fourier transform coefficients as

f_{lm}(k) = \int_{S^2} f(\Omega)\, Y_{lm}^{*}(\Omega)\, d\Omega \qquad (2.31)

The derivation of this expression can be found in [29, page 202] and in appendix (A.1) of [11]. The importance of the expressions presented above is that with their help we obtain the spherical wave decomposition and, in turn, the plane wave decomposition.
The spherical harmonic functions are depicted in figure 2.4 for levels l ∈ {0, 1, 2, 3}. In the expression for the spherical harmonics in equation 2.21, the Legendre function P_l^m represents standing spherical waves in θ and the factor e^{imφ} represents traveling spherical waves in φ [17].
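The orthonormality relation 2.28 can be sketched numerically by building Y_lm from equation 2.21 and integrating with a quadrature rule on the sphere (the grid size and the (l, m) pairs are arbitrary choices):

```python
import math
import numpy as np
from scipy.special import lpmv

def ylm(l, m, theta, phi):
    # spherical harmonic of equation 2.21; lpmv already carries the
    # (-1)^m Condon-Shortley phase of equation 2.25
    norm = math.sqrt((2 * l + 1) / (4 * math.pi)
                     * math.factorial(l - m) / math.factorial(l + m))
    return norm * lpmv(m, l, np.cos(theta)) * np.exp(1j * m * phi)

# quadrature grid on the sphere: Gauss-Legendre in cos(theta),
# uniform in phi, matching the surface integral of equation 2.29
n = 16
xg, wg = np.polynomial.legendre.leggauss(n)
theta, phi = np.arccos(xg), 2 * np.pi * np.arange(2 * n) / (2 * n)
T, PH = np.meshgrid(theta, phi, indexing="ij")
W = np.outer(wg, np.full(2 * n, np.pi / n))    # surface weights, sum 4*pi

def inner(l1, m1, l2, m2):
    # discretized left-hand side of equation 2.28
    return np.sum(W * ylm(l1, m1, T, PH) * np.conj(ylm(l2, m2, T, PH)))

assert np.isclose(inner(3, 2, 3, 2), 1.0)              # orthonormal
assert np.isclose(inner(3, 2, 4, 2), 0.0, atol=1e-10)  # orthogonal
```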


Figure 2.4: Spherical harmonics Y_lm(Ω) for orders l ∈ {0, 1, 2, 3} [11]

2.1.7 Radial Velocity


So far we have talked about the pressure field; now we shed some light on the radial velocity of the sound wave. As the radial velocity in the plane wave decomposition of spherical waves will represent our directivity function, an introduction to this term is important before we go further with the spherical and plane wave decomposition concepts.
Equation 2.3 is written in spherical coordinates as

\nabla = \frac{\partial}{\partial r}\vec{e}_r + \frac{1}{r}\frac{\partial}{\partial \theta}\vec{e}_\theta + \frac{1}{r \sin\theta}\frac{\partial}{\partial \varphi}\vec{e}_\varphi \qquad (2.32)

where \vec{e}_{(\cdot)} represents the unit vectors in spherical coordinates.
Referring to Euler's equation 2.1, the Fourier transform of this equation gives us

i \rho_0 c k\, \vec{v} = \nabla p(x, y, z, k) \qquad (2.33)

Equation 2.33 is in cartesian coordinates. In spherical coordinates the particle velocity is written as

\vec{v} = u(r, \Omega, k)\, \vec{e}_\theta + v(r, \Omega, k)\, \vec{e}_\varphi + w(r, \Omega, k)\, \vec{e}_r \qquad (2.34)


Solving these equations we obtain the expression for the radial velocity component

w(r, \Omega, k) = \frac{1}{i \rho_0 c k}\, \frac{\partial p(r, \Omega, k)}{\partial r} \qquad (2.35)

2.2 Spherical harmonic decomposition


With the background described in the previous part, we now talk about the specific solutions. There were two solutions of wave equation 2.18, given by equations 2.19 and 2.20. In [30] the author explains the spherical harmonic decomposition for spherical microphone arrays together with its limitations. Further, [31] and [17] give the theoretical analysis of plane wave decomposition (PWD) using spherical convolution and then explain the technique of spherical Fourier transforms used for PWD. In this section we will derive the expressions for the spherical harmonic decomposition and discuss various consequences which are encountered during this part of the wave field analysis.

2.2.1 Interior and Exterior problem


The solution given by equation 2.20 describes the pressure field of the exterior problem [9]. Referring to figure 2.5, all the sources are inside the spherical volume defined by radius a. As the solution is valid only for the region without any sources, the pressure field is described for the region r ≥ a. As all sources lie inside the measurement sphere and, as per our discussion on the Hankel functions, we only take into consideration the first term of equation 2.20: the sound waves travel in an outgoing direction, and as there are no sources outside the region r = a there are no incoming waves, hence the second part of the solution is not considered. Therefore our solution is

p(r, \Omega, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} C_{lm}(k)\, h_l^{(1)}(kr)\, Y_{lm}(\Omega) \qquad (2.36)

Now we focus more rigorously on the interior problem, as this is more relevant for our work; all further explanations will therefore be given with regard to the interior problem. In the interior problem analysis, the sound sources are located outside the spherical


Figure 2.5: Exterior problem [11]


volume, and estimating the acoustic effect on the surface of this volume is sufficient to characterize the sound in the space. Going a bit further, we may say that in order to map this surface we use a spherical microphone array. Hence our spherical microphone array is enclosed by an imaginary spherical volume, and at each observation point of the array we attempt to measure the acoustic effect invoked by the external sources. Figure 2.6 shows the case of the interior problem: all the sources are present outside the measurement sphere r = b, and the regions Σ_1 and Σ_2 represent the sources outside the valid region of measurement. The solution of the interior problem comes from equation 2.19. The solution should be finite at all points within the measurement region r ≤ b, and from our discussion on the Hankel and spherical Bessel functions in section 2.1.4, at the origin r = 0 both the spherical Hankel functions and the spherical Bessel function of the second kind are not finite; hence our solution contains only the first term of equation 2.19 and is given as [9]:

p(r, \Omega, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} A_{lm}(k)\, j_l(kr)\, Y_{lm}(\Omega) \qquad (2.37)

where p(r, Ω, k) is the sound pressure at the point (r, Ω), k is the wavenumber, A_lm(k) is the coefficient of the spherical harmonic Y_lm(Ω) of order l and mode m, and j_l(kr) is the spherical Bessel function of the first kind.


Figure 2.6: Interior problem [11]


As we have defined the expression for the pressure field, we can also define an expression for the radial velocity w(r, Ω, k); using equation 2.37 in equation 2.35 we obtain

w(r, \Omega, k) = \frac{1}{i c \rho_0} \sum_{l=0}^{\infty} \sum_{m=-l}^{l} A_{lm}(k)\, j_l'(kr)\, Y_{lm}(\Omega) \qquad (2.38)

where j_l'(kr) is the derivative of j_l(kr) with respect to kr, so that

\frac{\partial j_l(kr)}{\partial r} = \frac{\partial j_l(kr)}{\partial (kr)}\frac{\partial (kr)}{\partial r} = j_l'(kr)\, k \qquad (2.39)

As we are using a spherical microphone array, we can describe the pressure at any point on the surface of the array in the same fashion as presented in the interior problem. This will become clearer in the later sections.
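The chain rule of equation 2.39 can be confirmed numerically; scipy exposes the derivative j_l' directly (the values of k, r and l below are arbitrary illustrations):

```python
import numpy as np
from scipy.special import spherical_jn

k, r, l = 40.0, 0.05, 2      # wave number [1/m], radius [m], order

# chain rule of equation 2.39: d/dr j_l(kr) = k * j_l'(kr)
analytic = k * spherical_jn(l, k * r, derivative=True)

# central finite-difference approximation of the same derivative
dr = 1e-6
numeric = (spherical_jn(l, k * (r + dr))
           - spherical_jn(l, k * (r - dr))) / (2 * dr)

assert np.isclose(analytic, numeric, rtol=1e-6)
```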

2.2.2 Spherical wave spectrum


As defined in equation 2.37, if we can obtain the coefficients A_lm(k) then we can easily define the pressure field p(r, Ω, k). Exploiting the orthonormality of the spherical harmonics and the fact that any arbitrary function on a sphere can be expanded in terms of its spherical harmonics [29], we follow the procedure described in Appendix A.1; applying the same treatment to equation 2.37 we obtain

A_{lm}(k) = \frac{1}{j_l(kr)} \int_{S^2} p(r, \Omega, k)\, Y_{lm}^{*}(\Omega)\, d\Omega \qquad (2.40)

The expression for A_lm(k) is also called the spherical wave spectrum, as it can be regarded as the spherical Fourier transform of p(r, Ω, k) [9], also written as

P_{lm}(r, k) = \frac{1}{j_l(kr)} \int_{S^2} p(r, \Omega, k)\, Y_{lm}^{*}(\Omega)\, d\Omega \qquad (2.41)

P_lm(r, k) describes the sound wave in frequency in terms of the wave number, i.e., in k-space.

2.3 Spherical wave sound fields


Before we go to the next sections, let us bring out some analysis of how to express a spherical wave at a point due to a given source. Refer to figure 2.7.

Figure 2.7: Geometrical description for the calculation of the pressure p(r, θ, φ, k) at point P for a source at Q


We consider a point source, also termed a monopole, at the origin O. The pressure p(r, k) at point P is given by the expression [9, page 198]

p(r, k) = i \rho_0 c k\, Q_s\, \frac{e^{ikr}}{4\pi r} \qquad (2.42)

Here r is the length of the position vector \vec{r} of point P, c is the speed of sound, and k is the wave number. Q_s represents the source strength; it is the amount of fluid volume injected into the medium per unit time [9, page 198, 37]. The sound radiation from a monopole is omnidirectional, hence it is independent of the angles θ and φ. ρ_0 is the static density of the propagation medium.
Now if we want to calculate the pressure field at point P due to a source located at point Q, this can be done by some geometrical manipulation of equation 2.42. Assume the same monopole to be located at Q at distance r_s = ‖\vec{r}_s‖ from the origin. If we shift the coordinates by −\vec{r}_s, the pressure at point P due to the source at Q is equivalent to the pressure at the shifted point P′ due to the source at the origin O. Therefore the pressure p(r, Ω, k) at point P for a source at Q is

p(r, \Omega, k) = i \rho_0 c k\, Q_s\, \underbrace{\frac{e^{ik\|\vec{r} - \vec{r}_s\|}}{4\pi \|\vec{r} - \vec{r}_s\|}}_{X} \qquad (2.43)

Here Ω ≡ (θ, φ). The significance of this equation is that we have derived an expression for the pressure field at a point on a sphere due to a source located at a position other than the origin; that means, if we draw an analogy with the spherical microphone array, we can consider the array as a spherical surface and describe the pressure field at any point on that surface due to a source located at any position Q. One more thing to be noted is that, as ‖\vec{r} − \vec{r}_s‖ depends on θ and φ, the sound pressure in equation 2.43 also depends on θ and φ.
Further, as derived in [9, page 198], the term X is equivalent to the Green function G(\vec{r}|\vec{r}_s) [9, page 198].
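A minimal sketch of equation 2.43, assuming illustrative values for the density, frequency and source strength; the 1/distance amplitude decay of the Green function term is checked.

```python
import numpy as np

rho0 = 1.2                        # static density of air [kg/m^3]
c = 343.0                         # speed of sound [m/s]
k = 2 * np.pi * 500.0 / c         # wave number at 500 Hz
Qs = 1e-5                         # source strength [m^3/s]

def monopole_pressure(r_vec, rs_vec):
    # equation 2.43: pressure at r_vec for a monopole at rs_vec; the
    # fraction is the free-field Green function G(r|r_s)
    d = np.linalg.norm(np.asarray(r_vec) - np.asarray(rs_vec))
    return 1j * rho0 * c * k * Qs * np.exp(1j * k * d) / (4 * np.pi * d)

# observation point P on a sphere of radius 5 cm, source Q on the x axis
p_near = monopole_pressure([0.05, 0.0, 0.0], [2.0, 0.0, 0.0])
p_far = monopole_pressure([0.05, 0.0, 0.0], [3.95, 0.0, 0.0])

# the magnitude decays as 1/distance: doubling the distance halves |p|
assert np.isclose(abs(p_near) / abs(p_far), 2.0)
```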


2.4 Spherical harmonic expansion of plane wave


In section 2.1.2 an expression was given to calculate the pressure of an ideal plane wave in cartesian coordinates; now we present a similar calculation in spherical coordinates.

p(r, \Omega, k) = p_0(k)\, e^{i \vec{k} \cdot \vec{r}} \qquad (2.44)

where p_0(k) is the magnitude of the plane wave, \vec{r} is the position vector (r, Ω), and \vec{k} is the wave vector. Assuming p_0(k) = 1 for the purpose of the derivation and using equation 2.44 in 2.37 we get

e^{i \vec{k} \cdot \vec{r}} = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} A_{lm}(k)\, j_l(kr)\, Y_{lm}(\Omega) \qquad (2.45)

Here \vec{k} and \vec{r} are the wave vector and position vector respectively. We would like to point out that the plane wave, which was described in the vector domain by the wave vector and position vector in equation 2.44, is expressed here in terms of the wave number k and the scalar distance r; more about this is given in A.2.
Equation 2.45 can be further transformed as explained in [9, page 227] and is given as

e^{i \vec{k} \cdot \vec{r}} = 4\pi \sum_{l=0}^{\infty} i^l\, j_l(kr) \sum_{m=-l}^{l} Y_{lm}(\Omega)\, Y_{lm}^{*}(\Omega_0) \qquad (2.46)

here Ω_0 ≡ (θ_0, φ_0) is the incidence direction of the plane wave, whereas Ω is the point where we want to observe the pressure field. From equations 2.45 and 2.46 we can draw the conclusion that

A_{lm} = 4\pi\, i^l\, Y_{lm}^{*}(\Omega_0) \qquad (2.47)

and we observe that the spherical wave coefficients A_lm of a plane wave sound field do not depend on k or on the frequency f of the wave.


In [11, page 18] equation 2.46 has been simulated for a plane wave sound field of 1 kHz. The simulation was shown for different maximum values of the level l, and it was deduced that the plane wave field can be approximated accurately only within a bounded region around the origin, and that this region is bigger for higher values of l. If in equation 2.46 we replace the ∞ in the first summation by a maximum level l = L, then we can establish the approximate rule

\frac{2\pi d}{\lambda} \approx L \qquad (2.48)

where d is the radius of the region, L is the maximum level l, and λ is the wavelength of the plane wave. This proportionality states that the region over which we can effectively describe the pressure field is proportional to the level L. For reference, plots are provided in Appendix A.3.
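Equation 2.46 can be verified numerically. To keep the sketch short, the m-sum is collapsed with the spherical harmonic addition theorem, Σ_m Y_lm(Ω) Y*_lm(Ω_0) = (2l+1)/(4π) P_l(cos γ), a standard identity; the frequency, incidence direction and observation point below are arbitrary choices, picked such that kr ≪ L and the truncated series has converged.

```python
import numpy as np
from scipy.special import eval_legendre, spherical_jn

c = 343.0
k = 2 * np.pi * 1000.0 / c                        # wave number at 1 kHz
r_vec = np.array([0.03, 0.02, 0.01])              # observation point [m]
k_hat = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)  # incidence direction

r = np.linalg.norm(r_vec)
cos_gamma = np.dot(r_vec, k_hat) / r              # angle between r and k

# equation 2.46 with the m-sum collapsed by the addition theorem:
# e^{ik.r} = sum_l i^l (2l+1) j_l(kr) P_l(cos gamma)
L = 20
series = sum((1j ** l) * (2 * l + 1) * spherical_jn(l, k * r)
             * eval_legendre(l, cos_gamma) for l in range(L + 1))

exact = np.exp(1j * k * np.dot(k_hat, r_vec))
assert abs(series - exact) < 1e-10    # converged, since kr << L here
```

Truncating at a level L comparable to kr (or below) makes the residual visible, which is the bounded-region effect quantified by the rule above.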

2.5 Mode strength


We now define an expression for the combination of Bessel and Hankel functions which appeared in earlier sections during the derivation of the coefficients A_lm of the spherical harmonics. In the measurement of sound fields using spherical microphone arrays, the interaction of the sound field with the array structure has to be taken into consideration [9] [17] [31].
We recall equations 2.37 and 2.40 and express them in a generalized form in order to associate them with different kinds of spherical microphone array structures. The equations are then written as

s(r, \Omega, k) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} A_{lm}(k)\, b_l(kr)\, Y_{lm}(\Omega) \qquad (2.49)

A_{lm}(k) = \frac{1}{b_l(kr)} \int_{S^2} s(r, \Omega, k)\, Y_{lm}^{*}(\Omega)\, d\Omega \qquad (2.50)


Here s(r, Ω, k) is the spherical microphone array response. The term b_l(kr) is called the mode strength. For different microphone array structures the interaction of the sound field with the array is approximated using this term [9] [32]. In general we define two types of spherical array structures:
1. Open sphere configuration
2. Rigid sphere configuration
In the open sphere configuration a single microphone is mounted on a robotic arm and, according to predefined microphone positions, measurements are made at the respective positions on the sphere. In the rigid sphere configuration the sensors are arranged on a solid sphere. In appendix A.4 images of the open sphere and rigid sphere configurations are given.

b_l(kr) =
\begin{cases}
4\pi\, i^l\, j_l(kr), & \text{open sphere arrays} \\[4pt]
4\pi\, i^l \left( j_l(kr) - \dfrac{j_l'(ka)}{h_l'^{(2)}(ka)}\, h_l^{(2)}(kr) \right), & \text{rigid sphere arrays}
\end{cases} \qquad (2.51)

here j_l(kr) is the spherical Bessel function of the first kind, h_l^(2)(kr) and h_l'^(2)(ka) are the spherical Hankel function of the second kind and its derivative, (·)′ denotes the derivative, and a is the radius of the sphere, where r ≥ a.
The rigid sphere configuration is better than the open sphere configuration [31] [17] [32]. The major disadvantage of the rigid sphere configuration is that it interferes or interacts with the surrounding sound field. The mode strength does account for the scattering caused by the rigid sphere when calculating the incident waves. Although the scattering effect is negligible for small spheres, it becomes more prominent when a larger sphere is used. Hence, in the case of a larger sphere, the measurement should be done more accurately, as the scattered waves can act as additional incident waves when they are reflected by other objects in the measurement environment and impinge on the sphere again [31].
In figure 2.8 the mode strength b_l(kr) is plotted as a function of kr for different orders l; in the figure the order l is represented by the letter n.
The major advantage of using the rigid sphere configuration is the improved numerical conditioning, as in equation 2.50 the spherical coefficient A_lm contains a term 1/b_l,

(a) Rigid sphere array

(b) Open sphere array

Figure 2.8: Mode strength for rigid sphere array and open sphere array [31]


and b_l is zero for some arguments in the open sphere configuration but not in the case of rigid spheres [17] [31] [33].
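A sketch of equation 2.51 (the radius and order below are arbitrary choices). It also illustrates the conditioning argument: the open sphere mode strength vanishes at the zeros of j_l, whereas the rigid sphere term stays away from zero there.

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def h2(l, x, derivative=False):
    # spherical Hankel function of the second kind (and its derivative)
    return (spherical_jn(l, x, derivative=derivative)
            - 1j * spherical_yn(l, x, derivative=derivative))

def mode_strength(l, k, r, a=None):
    # equation 2.51; a is the rigid sphere radius (a <= r), and a=None
    # selects the open sphere configuration
    if a is None:
        return 4 * np.pi * (1j ** l) * spherical_jn(l, k * r)
    scatter = (spherical_jn(l, k * a, derivative=True)
               / h2(l, k * a, derivative=True)) * h2(l, k * r)
    return 4 * np.pi * (1j ** l) * (spherical_jn(l, k * r) - scatter)

r = 0.05                  # array radius [m], an arbitrary choice
k0 = np.pi / r            # frequency where kr hits the first zero of j_0

# open sphere: b_0 vanishes at the zeros of j_0 (ill-conditioned 1/b_l)
assert abs(mode_strength(0, k0, r)) < 1e-10
# rigid sphere: the scattering term keeps b_0 away from zero there
assert abs(mode_strength(0, k0, r, a=r)) > 1e-3
```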

2.6 Discretization of the spherical aperture and spatial aliasing

The analysis presented so far described the sound field on a continuous spherical aperture, but in practice we can sample a sphere only at a finite number of microphone positions. Hence we need to translate the expression for the spherical coefficients A_lm(k), which are defined by the integral over a unit sphere in 2.50, into a finite summation. The approximation of finite integrals is known as quadrature, and the expression for A_lm(k) in terms of a finite summation is given as [8, page 43]:

A_{lm}(k) \approx \hat{A}_{lm}(k) = \frac{1}{b_l(kr)} \sum_{q=1}^{Q} w_q\, s(r, \Omega_q, k)\, Y_{lm}^{*}(\Omega_q) \qquad (2.52)

where \hat{A}_{lm}(k) is the approximated spherical coefficient, Q is the number of microphone positions and w_q are the quadrature weights. The weights w_q are compensation factors used in the different quadrature schemes so as to approximate the sound field of the continuous aperture as closely as possible.
Spherical microphone arrays perform spatial sampling of the sound pressure defined on a sphere, and similar to time-domain sampling, spatial sampling also requires the signal to be band-limited, i.e., limited in harmonic order l, to avoid aliasing [31] [34].
Hence, in order to avoid spatial aliasing, the following condition must hold [8, page 44]:

A_{lm}(k) = 0, \quad \text{for } l > L_{max} \qquad (2.53)

Here L_max is the highest order spherical coefficient of the sound field. The condition given in 2.53 must be ensured when sampling the sphere, otherwise spatial aliasing will

corrupt the coefficients at lower orders. A more detailed analysis of spatial aliasing in spherical microphone arrays is presented in [34].
The sampling of level-limited (the words level and order are used interchangeably and refer to l) sound fields can be done in many different ways, as explained in [35] [31] [8]. These quadratures allow us to perform sampling on the sphere with negligible or no aliasing as long as equation 2.53 holds.
Commonly there are three sampling schemes; a more detailed mathematical description of these schemes can be found in the references provided above.
1. Chebyshev quadrature: the sampling is characterized by uniform sampling in elevation θ and azimuth φ. The total number of microphones in this scheme is given as Q_ch = 2L_max(2L_max + 1).
2. Gauss–Legendre quadrature: the sphere is sampled uniformly in azimuth φ, but in elevation it is sampled at the zeros of the Legendre polynomial of level L_max + 1. The number of microphone positions required in this scheme is Q_GL = L_max(2L_max + 1).
3. Lebedev grid: in this quadrature scheme the microphone positions are spread uniformly over the surface of the sphere such that each point has the same distance to its nearest neighbours.

Q_{Lb} = \frac{4}{3}\left(L_{max} + 1\right)^2 \qquad (2.54)

In this work we use the Lebedev grid, as it has an advantage over the other two schemes: it uses a smaller number of microphone positions for the approximation. A more detailed description of the Lebedev grid is given in [36] [37] [38] [39]; reference [39] gives Fortran code for calculating the grid points and weights for levels up to l = 131.
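The quadrature step of equation 2.52 can be sketched as follows. The thesis uses a Lebedev grid; since Lebedev nodes are not shipped with numpy/scipy, this illustration uses the Gauss–Legendre scheme instead, and sets b_l = 1 to isolate the sampling itself. The test field and its coefficients are arbitrary choices.

```python
import math
import numpy as np
from scipy.special import lpmv

def ylm(l, m, theta, phi):
    # spherical harmonic of equation 2.21
    norm = math.sqrt((2 * l + 1) / (4 * math.pi)
                     * math.factorial(l - m) / math.factorial(l + m))
    return norm * lpmv(m, l, np.cos(theta)) * np.exp(1j * m * phi)

# Gauss-Legendre sampling of the sphere for a level limit Lmax = 4
Lmax = 4
n = Lmax + 1
xg, wg = np.polynomial.legendre.leggauss(n)
theta, phi = np.arccos(xg), 2 * np.pi * np.arange(2 * n) / (2 * n)
T, PH = np.meshgrid(theta, phi, indexing="ij")
W = np.outer(wg, np.full(2 * n, np.pi / n))     # quadrature weights w_q

# a level-limited test field with known coefficients; b_l = 1 is assumed
# so that only the quadrature step of equation 2.52 is exercised
s = 2.0 * ylm(1, 0, T, PH) + 0.5 * ylm(3, 2, T, PH)

def a_hat(l, m):
    # discrete approximation of equation 2.52
    return np.sum(W * s * np.conj(ylm(l, m, T, PH)))

assert np.isclose(a_hat(1, 0), 2.0)             # coefficients recovered
assert np.isclose(a_hat(3, 2), 0.5)
assert np.isclose(a_hat(2, 1), 0.0, atol=1e-10) # absent mode stays zero
```

Because the field is level-limited to L_max, the discrete sum reproduces the continuous integral exactly, which is precisely the condition of equation 2.53.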
Using the quadrature approach for the discretization of the sphere, we require a level-limited sound field in order to get aliasing-free sampling. For plane wave sound fields, however, the restriction to a maximum level L_max does not hold, as we can see from equations 2.45 and 2.46, which involve an infinite number of non-zero spherical coefficients A_lm(k). Hence some degree of spatial aliasing does occur. But referring to section 2.1.4 we know that the spherical Bessel functions j_l(kr) decay rapidly for orders l > kr,

Figure 2.9: Different quadrature schemes [8]


therefore the strength of the coefficients in equation 2.45 can be supposed to show a similar behaviour for l > kr. Hence, in theory, the aliasing error vanishes if the operating frequency of the microphone array satisfies kr ≪ L_max. Therefore we can conclude that, due to the spatial aliasing caused by the discretization of the measurement sphere, the performance of the array is limited at high frequencies and for large radii of the microphone array.

2.7 Plane wave decomposition


A sound field impinging on a sphere can be decomposed into its plane wave components by plane wave decomposition using the spherical Fourier transform, which was explained earlier in section 2.1.6 and is also given in appendix A.1.
Considering equations 2.31 and 2.50, the relation between the coefficients of the spherical Fourier transform (SFT) and the coefficients of the spherical harmonics is given as

f_{lm}(k) = A_{lm}(k)\, b_l(kr) \qquad (2.55)

where the mode strength b_l(kr) is defined in the previous section. Now, considering a single unit-amplitude plane wave arriving from Ω_0 = (θ_0, φ_0), we can get A_lm(k) from equation 2.47, and putting this value into equation 2.55 we get

f_{lm}(k) = 4\pi\, i^l\, b_l(kr)\, Y_{lm}^{*}(\Omega_0) \qquad (2.56)

This is the SFT coefficient for a single plane wave; we now generalize this to an infinite number of plane waves with the assumption that they have magnitudes w(Ω_0, k) and arrive from all directions Ω_0. Integrating equation 2.56 over all incidence directions, we have the expression for the spherical Fourier coefficients f_lm(k):

f_{lm}(k) = 4\pi\, i^l\, b_l(kr) \int_{S^2} w(\Omega_0, k)\, Y_{lm}^{*}(\Omega_0)\, d\Omega_0 \qquad (2.57)

The expression in equation 2.57 is termed as the spherical fourier transfor of amplitudes
w(0 , k) and we express it as wlm (k)
wlm (k) = flm (k)

1
4il b

(2.58)

l (kr)

To obtain the amplitude w_s(Ω_s, k) of a plane wave arriving from an arbitrary direction Ω_s, we perform an inverse SFT of equation 2.58

w_s(Ω_s, k) = \sum_{l=0}^{∞} \sum_{m=-l}^{l} f_{lm}(k) \frac{1}{4π i^l \, b_l(kr)} Y_l^m(Ω_s)   (2.59)

w_s(Ω_s, k) is also called the directivity function and describes the decomposed plane wave for a particular direction Ω_s. Ω_s is also known as the steering direction of the microphone array, and it specifies the direction for which the plane wave decomposition is computed.
Further, if we use equation 2.55 in equation 2.59, we obtain the expression for the plane wave decomposition in terms of the spherical harmonic coefficients A_lm(k)

w_s(Ω_s, k) = \sum_{l=0}^{∞} \sum_{m=-l}^{l} A_{lm}(k) \frac{1}{4π i^l} Y_l^m(Ω_s)   (2.60)
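Equation 2.60 is easy to check numerically. The following sketch (a minimal illustration, not code from this thesis) builds the coefficients of a unit plane wave as A_lm(k) = 4π i^l Y_l^m*(Ω_0) and evaluates a level-limited PWD with SciPy, whose sph_harm takes the order m, the degree l, the azimuth, and then the polar angle; the truncation level and the angles are arbitrary example values.

```python
import numpy as np
from scipy.special import sph_harm

L = 6                        # truncation level of the PWD (example value)
th0, ph0 = 1.0, 2.0          # arrival direction: polar, azimuth [rad]

def pwd(th_s, ph_s):
    """Directivity w_s(Omega_s) of eq. 2.60, truncated at level L."""
    w = 0j
    for l in range(L + 1):
        for m in range(-l, l + 1):
            # unit plane wave coefficients: A_lm = 4*pi*i^l * conj(Y_l^m(Omega_0))
            A_lm = 4 * np.pi * 1j**l * np.conj(sph_harm(m, l, ph0, th0))
            w += A_lm / (4 * np.pi * 1j**l) * sph_harm(m, l, ph_s, th_s)
    return w.real    # the imaginary part vanishes by the addition theorem

print(pwd(th0, ph0))         # main lobe: (L+1)^2 / (4*pi)
print(pwd(th0 + 1.0, ph0))   # much smaller off the steering direction
```

By the spherical harmonic addition theorem the main-lobe value equals the sum of (2l+1)/(4π) over l, i.e. (L+1)²/(4π), which gives a quick correctness check.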

2.8 Spatial resolution in plane wave decomposition


In [17] [32] the spatial resolution of the plane wave decomposition with respect to the level l has been analysed. We cannot use levels higher than kr, because this results in negligible amplitudes of the spherical harmonic coefficients in the lower frequency regions. Hence our plane wave decomposition remains level limited to a finite extent.


It has been shown in [17] that the directivity decreases for lower values of the level l. This directivity pattern has been quantified in [17] and [11, page 39] by the expression

w_s(Θ) = \frac{L+1}{4π(\cosΘ - 1)} \left( P_{L+1}(\cosΘ) - P_L(\cosΘ) \right)   (2.61)

Here Θ is the angle between the arrival direction of the plane wave Ω_0 and the steering direction of the microphone array Ω_s. P_L(·) is the Legendre polynomial of level L. w_s(Θ) is the directional weight, and it defines the spatial resolution of a plane wave decomposition calculated with a maximum level L. Refer to figure 2.10.

Figure 2.10: Directivity weights for PWD verses l [17]


In this figure the directivity weights w_s(Θ) are plotted for different levels l. It can be noticed that for Θ = 0, i.e. when the array looks towards the arrival direction of the plane wave, the directivity coefficient (main lobe) shows a very sharp peak for higher levels, and the peak broadens as the level is decreased.
The spatial resolution is further defined by a relation between the level l = L and the first (smallest) zero Θ_0 of w_s(Θ) for Θ > 0. Θ_0 is defined as half of the resolution of the PWD. The relation Θ_0 = 180°/L derived in [17] tells us the extent to which a plane wave decomposition with a particular level L can decompose a wave field into different plane waves in a spatial sense. Figure 2.11 is approximated by the relation Θ_0 = 180°/L.
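Equation 2.61 can be evaluated directly with SciPy's Legendre polynomials. The sketch below (example levels only) shows that the main-lobe value approaches (L+1)²/(4π) and that the weight has dropped close to zero near Θ = 180°/L, consistent with the half-resolution relation above.

```python
import numpy as np
from scipy.special import eval_legendre

def pwd_weight(theta, L):
    """Directional weight w_s(Theta) of eq. 2.61 for maximum level L."""
    c = np.cos(theta)
    return (L + 1) / (4 * np.pi * (c - 1)) * (eval_legendre(L + 1, c) - eval_legendre(L, c))

for L in (4, 8, 16):
    peak = pwd_weight(1e-6, L)                    # ~ (L+1)^2 / (4*pi)
    near_zero = pwd_weight(np.radians(180.0 / L), L)
    print(L, peak, near_zero)                     # main lobe narrows as L grows
```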

Figure 2.11: Half resolution of the PWD [17]


3 ERROR ANALYSIS

The performance of spherical microphone arrays is affected by errors and artifacts. These errors affect the plane wave decomposition of the impulse response data measured by spherical microphone arrays, and in auralization their impact degrades the quality of the spatial sound reproduction. The plane wave decomposition of the impulse responses measured by a spherical microphone array forms the basis of the driving functions used for the loudspeaker arrays on the rendering side; as a result, any kind of measurement error seeps into the auralization process when the sound field is reconstructed.

3.1 Measurement errors


The measurement errors evaluated in the listening test can be classified into two categories:
1. Sampling errors and artifacts: Due to the finite number of microphones imposed by the discretization of the sphere, spatial aliasing is observed. Moreover, inaccurate positioning of the microphone elements adds positioning errors.
2. Microphone noise: This is the error induced by the non-ideal characteristics of the microphones and the electronic noise of the microphone elements.
In this work we use the rigid sphere microphone configuration, hence all of the discussion is with regard to rigid sphere microphones. In [31] the different errors and their contributions to the plane wave decomposition are derived and analysed, and a framework is introduced. In [8] [11] microphone errors with respect to the open sphere configuration are analysed, although they use the same framework as introduced in [31].


[Figure 3.1 tree: errors in spherical microphone array measurement, divided into sampling errors and artifacts (spatial aliasing; positioning error, split into elevation error and azimuth error) and microphone noise.]

Figure 3.1: Errors in spherical microphone array measurement

3.2 Description of measurement error function


In this section we follow the framework given in [31], describe the measurement errors mathematically, and give their contributions to the spherical harmonic coefficients.
For the analytical description we assume an arbitrary sound field captured by a rigid sphere microphone array. The frequency domain output of a single microphone element, which is considered to be subject to all the errors depicted in figure 3.1, is

s(r, Ω'_q, k) + e_q   (3.1)

where k is the wave number, r is the radius of the sphere, e_q is the noise introduced by the microphone, and Ω'_q is the microphone position including positioning errors. The spherical harmonic coefficients A_lm(k) can be estimated using equation 2.52, which is explained in section 2.6. Keeping the said equations in mind, we obtain

Ã_{lm}(k) = \frac{1}{b_l(kr)} \left( \sum_{q=1}^{Q} w_q \, s(r, Ω'_q, k) \, Y_l^{m*}(Ω_q) + \sum_{q=1}^{Q} w_q e_q Y_l^{m*}(Ω_q) \right)   (3.2)


In this equation Q is the number of microphones, w_q are the quadrature weights, and b_l(kr) is the mode strength for the rigid sphere configuration (refer to section 2.5). The correct microphone positions as defined by the sampling scheme are denoted by Ω_q. We now express the sound field s(r, Ω'_q, k) in terms of the correct spherical harmonic coefficients A_{l'm'}(k) using equation 2.49 in section 2.5 and substitute it into equation 3.2

Ã_{lm}(k) = \frac{1}{b_l(kr)} \sum_{l'=0}^{∞} \sum_{m'=-l'}^{l'} A_{l'm'}(k) \, b_{l'}(kr) \underbrace{\sum_{q=1}^{Q} w_q Y_{l'}^{m'}(Ω'_q) Y_l^{m*}(Ω_q)}_{X} + \frac{1}{b_l(kr)} \sum_{q=1}^{Q} w_q e_q Y_l^{m*}(Ω_q)   (3.3)

The term X is equivalent to the orthonormality condition of the spherical harmonics given in section 2.1.6. In [31] this term has been extended to isolate the contributions of the aliasing error ε_a and the positioning error ε_p, and is expressed as

\sum_{q=1}^{Q} w_q Y_{l'}^{m'}(Ω'_q) Y_l^{m*}(Ω_q) =
\begin{cases}
δ_{ll'} δ_{mm'} + ε_p(l, m, l', m'), & \text{where } l, l' ≤ L_max \\
ε_p(l, m, l', m') + ε_a(l, m, l', m'), & \text{where } l ≤ L_max < l'
\end{cases}   (3.4)

Here δ_{ll'} and δ_{mm'} are Kronecker deltas. The maximum level L_max is the highest level of the spherical harmonic coefficients A_{l'm'}(k) of the sound field that can be sampled using Q microphone positions; the relation between L and Q for the Lebedev grid is given in section 2.6, equation 2.54. In the first case of equation 3.4 the levels satisfy l' ≤ L_max, hence no aliasing error appears in that expression.
Also, from the Kronecker deltas we see that if ε_p = 0 then Ω'_q and Ω_q must be equal; hence ε_p represents the positioning error.
In the lower part of equation 3.4 we consider l' > L_max, hence spatial aliasing is present. Since l and l' differ, the term δ_{ll'} δ_{mm'} does not appear in this part.


The aliasing error ε_a is given as [31]

ε_a(l, m, l', m') = \sum_{q=1}^{Q} w_q Y_{l'}^{m'}(Ω_q) Y_l^{m*}(Ω_q),   where l ≤ L_max < l'   (3.5)

The positioning error is obtained from equation 3.4 by subtracting the quadrature sum over the correct positions [31]

ε_p(l, m, l', m') = \sum_{q=1}^{Q} w_q \left( Y_{l'}^{m'}(Ω'_q) - Y_{l'}^{m'}(Ω_q) \right) Y_l^{m*}(Ω_q),   where l ≤ L_max, l' ≥ 0   (3.6)

Finally, if we use equation 3.4 in equation 3.3 and separate the summation over l', we obtain the expression for the spherical harmonic coefficients with all the error contributions [31]

Ã_{lm}(k) = \frac{1}{b_l(kr)} \sum_{l'=0}^{L_max} \sum_{m'=-l'}^{l'} A_{l'm'}(k) \, b_{l'}(kr) \, δ_{ll'} δ_{mm'}   [signal contribution Ã^{(s)}_{lm}(k)]

  + \frac{1}{b_l(kr)} \sum_{l'=0}^{∞} \sum_{m'=-l'}^{l'} A_{l'm'}(k) \, b_{l'}(kr) \, ε_p(l, m, l', m')   [positioning error Ã^{(ε)}_{lm}(k)]

  + \frac{1}{b_l(kr)} \sum_{l'=L_max+1}^{∞} \sum_{m'=-l'}^{l'} A_{l'm'}(k) \, b_{l'}(kr) \, ε_a(l, m, l', m')   [aliasing error Ã^{(a)}_{lm}(k)]

  + \frac{1}{b_l(kr)} \sum_{q=1}^{Q} w_q e_q Y_l^{m*}(Ω_q)   [microphone noise Ã^{(e)}_{lm}(k)]

(3.7)

In equation 3.7 the first term refers to the error-free contribution to the spherical harmonic coefficients A_lm(k); as the Kronecker deltas collapse the sums, the first term simplifies to A_lm(k). All the other terms represent the errors. From the equation itself we see that the errors depend on the level l, on kr, and on the quadrature. Although we use the rigid sphere configuration, the mode strength b_l(kr) has a different expression for each microphone configuration, hence the errors also depend on the array configuration.
Finally, we obtain the expression for the plane wave decomposition by substituting equation 3.7 into equation 2.60; this gives the directivity function of the plane wave decomposition. Each term Ã^{(·)}_{lm}(k) in equation 3.7 yields the contribution of that particular error to the directional weights w_s

w_s^{(·)}(Ω_s, k) = \sum_{l=0}^{L} \sum_{m=-l}^{l} \frac{1}{4π i^l} Ã^{(·)}_{lm}(k) \, Y_l^m(Ω_s)   (3.8)

where s is the steering direction of spherical microphone array and Alm (k) can any of
(s)
()
(a)
(e)
the four different components in equation 3.7; Alm (k), Alm (k), Alm (k) or Alm (k). In
order to get the effective influence of the measurement errors on results of plane wave
decomposition we relate the error contribution in equation 3.7 to corresponding signal
contribution and we look for relative error contribution by taking ratio of the squared
absolute values of different errors with respect to signal contribution [31].

E_a(kr) = \frac{|w_s^{(a)}(Ω_s, k)|^2}{|w_s^{(s)}(Ω_s, k)|^2}, \quad
E_ε(kr) = \frac{|w_s^{(ε)}(Ω_s, k)|^2}{|w_s^{(s)}(Ω_s, k)|^2}, \quad
E_e(kr) = \frac{|w_s^{(e)}(Ω_s, k)|^2}{|w_s^{(s)}(Ω_s, k)|^2}   (3.9)

In equation 3.9 the error-to-signal ratios are calculated. Figure 3.2 shows the behaviour of the different errors (noise, positioning and aliasing) for different levels l.
On comparing the various quadratures with respect to spatial aliasing, microphone noise and positioning error, the Lebedev quadrature is found to be more robust against these errors in general. Due to these characteristics we use the Lebedev grid together with the rigid sphere [31].


Figure 3.2: Errors in Spherical Microphone array measurement [31]

3.3 Microphone noise


Microphone noise is an important source of corruptive artifacts in the auralization process. Although contemporary microphone technology provides a very high signal-to-noise ratio (SNR), in general we cannot disregard the noise induced by the microphones.
It is important to note that the mode strengths (refer to section 2.5) have quite low values for small kr at higher levels l; the division by b_l(kr) therefore amplifies the spherical harmonic coefficients (refer to equation 2.52) considerably, and where noise is present it is amplified as well (figure 3.2). The increase in noise is more vigorous in the low kr range than in the high kr range.
Microphone noise also depends on the number of microphones used: simulations in [31] show that the higher the number of microphones, the better the robustness against noise. It is also seen that the influence of noise is lowest when the maximum level l ≈ kr. For higher kr the mode strengths converge somewhat towards 0 dB, so theoretically the increase in error for higher kr should not be too significant. The quadratures used for the discretization of the sphere do not have any significant effect with regard to the microphone noise; they all behave in a similar way. But as the noise effect is stronger at low kr, it limits the array performance at lower frequencies.
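To make this amplification concrete, the sketch below evaluates the rigid-sphere mode strength numerically, assuming the common textbook form b_l(kr) = 4π i^l [ j_l(kr) − j_l'(kr) h_l^(2)(kr) / h_l^(2)'(kr) ] for microphones on the sphere surface; phase and sign conventions differ between references, so treat the constants as an assumption. The inverse magnitude 1/|b_l(kr)| is the gain applied to the noise term of equation 3.7.

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def mode_strength(l, kr):
    """Rigid-sphere mode strength b_l(kr) for sensors on the sphere surface."""
    h2  = spherical_jn(l, kr) - 1j * spherical_yn(l, kr)           # h_l^(2)(kr)
    h2p = spherical_jn(l, kr, derivative=True) \
        - 1j * spherical_yn(l, kr, derivative=True)                # d/dx h_l^(2)
    return 4 * np.pi * 1j**l * (spherical_jn(l, kr)
                                - spherical_jn(l, kr, derivative=True) / h2p * h2)

kr = 2.0
for l in range(8):
    b = abs(mode_strength(l, kr))
    print(l, b, 1.0 / b)       # 1/|b_l| (the noise gain) explodes once l > kr
```

A useful check is the Wronskian identity j_l h_l^(2)' − j_l' h_l^(2) = −i/(kr)², which implies |b_l| = 4π / ((kr)² |h_l^(2)'|).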

3.4 Spatial aliasing


The problem of spatial aliasing is quite complex in spherical microphone arrays. As a continuous aperture is not practically feasible, we discretize the sphere using quadratures, which gives a relation between the number of microphones and the maximum level l. This discretization of the sphere leads to the spatial aliasing problem. In [31] [34] [40] the sampling techniques for spherical microphone arrays and their effect on the plane wave decomposition are analysed. Aliasing-free techniques for level-limited functions, and solutions such as spatial anti-aliasing filters for aliasing reduction, are proposed in [34].
Referring to figure 2.8(a), because of the nature of b_l(kr) the magnitude of the spherical harmonic coefficients of the sound pressure becomes increasingly insignificant for l > kr, where r is the radius of the sphere. The aliasing error is expected to be almost negligible if the operating frequency range of the array satisfies the condition kr ≪ L_max.
A sphere can only be sampled at a finite number of microphone positions, given by the sampling technique used on the sphere. For this discretization the integral expression for A_lm(k) has to be approximated by a finite summation. Referring to section 2.5, equation 2.50 is transformed into equation 2.52, which is

A_{lm}(k) ≈ Ã_{lm}(k) = \frac{1}{b_l(kr)} \sum_{q=1}^{Q} w_q \, s(r, Ω_q, k) \, Y_l^{m*}(Ω_q)   (3.10)

Q is the number of microphone positions and w_q are the quadrature weights for the respective microphone positions. As the number of microphones is finite, the transform order is also limited. For the different quadrature schemes there exists a relationship between the maximum level l which can be used for the calculation of the spherical harmonic coefficients and the number of microphones Q. Section 2.6 describes


various sampling techniques; our work is based on the Lebedev grid, for which equation 2.54 gives

Q_{Lb} = \frac{4}{3} (L_{max} + 1)^2   (3.11)

Here l = L_max is the maximum level used, and L_max in turn corresponds to the number of microphone positions: the number of microphones restricts l to L_max, or, for a particular L_max, the above relation for the Lebedev grid must hold.
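As a small worked example, the Lebedev relation of equation 3.11 can be combined with the operating limit kr ≤ L_max to estimate the microphone count and the upper alias-free frequency. The speed of sound c = 343 m/s and the radius r = 0.05 m below are assumed example values, not parameters taken from this thesis.

```python
import math

c, r = 343.0, 0.05                            # assumed example values [m/s], [m]
for L_max in (2, 4, 8):
    Q = math.ceil(4 * (L_max + 1) ** 2 / 3)   # Lebedev relation, eq. 3.11
    f_max = L_max * c / (2 * math.pi * r)     # from kr = L_max with k = 2*pi*f/c
    print(L_max, Q, round(f_max))
```

Doubling the maximum level roughly quadruples the microphone count but only doubles the usable bandwidth, which is the practical cost of higher spatial resolution.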
Plane wave fields, however, are not level limited, because they are represented by an infinite series of spherical harmonics, shown in equation 2.45 and reproduced here

e^{i \vec{k} \cdot \vec{r}} = \sum_{l=0}^{∞} \sum_{m=-l}^{l} A_{lm}(k) \, j_l(kr) \, Y_l^m(Ω)   (3.12)

We notice that they contain an infinite number of non-zero coefficients A_lm(k); hence aliasing arises from the higher levels.
We elaborate on this further here. First we look at the spherical Bessel functions in figure 2.3; for better readability the figure is reproduced here.

Figure 3.3: Spherical Bessel functions of the first kind j_l(x) (left) and of the second kind y_l(x) (right) for orders l ∈ {0, 3, 6} (x is the argument in the plots, x = kr) [11]

In the figure, plots for different levels are depicted: the spherical Bessel function j_l(kr) becomes more and more damped as the order increases, and for a fixed argument it decays rapidly once l > kr. If we now look at the spherical harmonic coefficient values A_lm(k) for each l, the contributions A_lm(k) j_l(kr) are significantly low for l > kr due to this behaviour of the spherical Bessel function, but they are not exactly zero: if a full-spectrum sound wave is considered, then to some extent coefficient values for l > kr are still present, because the plane wave field is not level limited.
Since the quadrature imposes the limitation Q_Lb = (4/3)(L_max + 1)², and we can only have a limited number of microphone positions, a wave field in which coefficients at higher frequencies and higher l are present would need a higher number of microphone positions in order to be sampled successfully. This is not possible, because Q, the number of microphones, is limited; hence the coefficients at the higher values of l are sampled erroneously, which means spatial aliasing occurs.
As the spherical Bessel functions take very low values for l > kr, the residual components above the sampling limit can be subdued by restricting the operating range to kr ≤ L_max.
In other words, the spherical coefficients of a plane wave given in equation 2.45 are formally not level limited and contain arbitrarily high levels l, but these contributions are insignificant because the expression for the plane wave contains the Bessel functions, which have very low values for l > kr. Hence only the levels up to l ≈ kr are effectively present in the spherical harmonic coefficients of the plane wave; the others are insignificant or non-existent.
Hence A_lm(k) ≈ 0 for l > L_max holds if the condition kr ≪ L_max is kept.
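The decay of the spherical Bessel functions that justifies the limit kr ≤ L_max is easy to inspect numerically; the sketch below prints |j_l(kr)| for a fixed kr and shows that the terms of the series are strongly attenuated, yet not exactly zero, once l exceeds kr.

```python
import numpy as np
from scipy.special import spherical_jn

# j_l(kr) versus l for fixed kr: the terms of eq. 3.12 above l ~ kr are
# strongly attenuated but never exactly zero, which is what aliases.
kr = 10.0
for l in (0, 5, 10, 15, 20):
    print(l, abs(spherical_jn(l, kr)))
```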
Considering the expression for the sound field in terms of the spherical harmonic coefficients (section 2.5, equation 2.49) and putting it into equation 3.10, after some rearrangement of the terms we get

Ã_{lm}(k) = \frac{1}{b_l(kr)} \sum_{l'=0}^{∞} \sum_{m'=-l'}^{l'} A_{l'm'}(k) \, b_{l'}(kr) \underbrace{\sum_{q=1}^{Q} w_q Y_{l'}^{m'}(Ω_q) Y_l^{m*}(Ω_q)}_{Z}   (3.13)


The term Z is an approximation of the orthonormality condition of the spherical harmonics, therefore we can write [34]

\sum_{q=1}^{Q} w_q Y_{l'}^{m'}(Ω_q) Y_l^{m*}(Ω_q) = δ_{l'l} δ_{m'm} + ε_a(l, m, l', m'),   where l ≤ L_max < l'   (3.14)

Since this is an approximation, we get an additional term ε_a(l, m, l', m'), which represents the aliasing error induced by the sampling. This aliasing error is the same term that appears in the second case of equation 3.4.
As stated in section 3.3, microphone noise limits the performance of a spherical microphone array in the lower frequency regions, i.e. at lower kr; in the case of spatial aliasing, the array performance is limited at higher frequencies or for larger array radii.
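The aliasing term ε_a can be reproduced numerically. The sketch below uses a Gauss-Legendre-by-uniform-azimuth grid instead of the Lebedev grid used in this work (an assumption made purely for compactness; this grid is likewise exact for all levels up to L_max): the discrete sum of equation 3.14 returns the Kronecker deltas as long as both levels stay at or below L_max, and a non-zero ε_a once l' exceeds it.

```python
import numpy as np
from scipy.special import sph_harm

L_max = 4
n_th, n_ph = L_max + 1, 2 * L_max + 1          # Gauss-Legendre x uniform azimuth
x, gw = np.polynomial.legendre.leggauss(n_th)  # nodes in cos(theta), weights
TH, PH = np.meshgrid(np.arccos(x), 2 * np.pi * np.arange(n_ph) / n_ph, indexing="ij")
W = np.repeat(gw[:, None], n_ph, axis=1) * (2 * np.pi / n_ph)   # weights w_q

def quad(l, m, lp, mp):
    """Discrete orthonormality sum of eq. 3.14 for the pair (l, m), (l', m')."""
    return np.sum(W * sph_harm(mp, lp, PH, TH) * np.conj(sph_harm(m, l, PH, TH)))

print(abs(quad(2, 1, 2, 1)))    # ~1: Kronecker delta, both levels resolved
print(abs(quad(2, 1, 3, 1)))    # ~0: orthogonal and still alias-free
print(abs(quad(0, 0, 10, 0)))   # clearly non-zero: epsilon_a for l' > L_max
```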

3.5 Positioning error


Positioning errors come into play when the microphone elements of a spherical microphone array are not placed at the exact positions defined by the sampling scheme (refer to section 2.6). A positioning error affects the directional correctness of the plane wave decomposition and finally corrupts the output: the required spacing between the microphone elements on the sphere is disturbed, and the measured impulse responses are not correct. Figure 3.2 shows the behaviour of the positioning error; as is the case for microphone noise, the positioning error has a higher impact at lower kr values. In the perceptual evaluation we have simulated positioning errors for elevation and azimuth separately. It is clear from the figure that the positioning error is influenced by the transform order, or level l: as the level increases, the influence of the positioning error increases. The mathematical expression for the positioning error is given in section 3.2. The impact of the positioning error is minimal if we use kr ≈ L_max.
It is concluded in [34] that the maximum robustness against microphone noise, and against positioning error as well, is obtained for a plane wave decomposition with L ≈ kr, where L is the value of the level l.
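The positioning error ε_p of equation 3.6 can be simulated by jittering the node angles of a quadrature. The grid below is a Gauss-Legendre-by-azimuth sketch, not the Lebedev grid of this work, and the jitter level is an arbitrary example value; note that ε_p vanishes identically for l' = 0, since Y_0^0 is constant.

```python
import numpy as np
from scipy.special import sph_harm

rng = np.random.default_rng(0)
L_max = 4
n_th, n_ph = L_max + 1, 2 * L_max + 1
x, gw = np.polynomial.legendre.leggauss(n_th)
TH, PH = np.meshgrid(np.arccos(x), 2 * np.pi * np.arange(n_ph) / n_ph, indexing="ij")
W = np.repeat(gw[:, None], n_ph, axis=1) * (2 * np.pi / n_ph)

sigma = 0.02                                       # angular jitter [rad], example value
THe = TH + sigma * rng.standard_normal(TH.shape)   # perturbed polar angles
PHe = PH + sigma * rng.standard_normal(PH.shape)   # perturbed azimuths

def eps_p(l, m):
    """Positioning-error term of eq. 3.6, evaluated for l' = l, m' = m."""
    dY = sph_harm(m, l, PHe, THe) - sph_harm(m, l, PH, TH)
    return np.sum(W * dY * np.conj(sph_harm(m, l, PH, TH)))

for l in range(L_max + 1):
    print(l, abs(eps_p(l, 0)))   # exactly 0 at l = 0; higher levels pick up error
```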


4 WAVE FIELD SYNTHESIS

The spherical microphone array data is auralized using wave field synthesis (WFS). In this chapter we explain the basic underlying principles of WFS and connect them to the spherical microphone array analysis done in chapter 3; finally, in chapter 5, we combine the spherical microphone array analysis and WFS and auralize the different error cases for the perceptual evaluation.

4.1 Physics behind WFS


WFS is based on wave physics to achieve spatial sound reproduction. The problem of spatial reproduction is defined as follows for WFS (or for any other method of sound field simulation): the objective is to reproduce the details (both amplitude and phase in space) of a virtual sound field over a large space.
Sound propagates in air as waves; these waves are small oscillations of the medium which are passed on from particle to particle until the wave reaches our ears.
Huygens's principle (or the secondary source principle) evokes the idea of reproducing a sound field by surrounding the listening zone with reproduction sources. Huygens's theoretical analysis is not directly concerned with spatial sound reproduction; it simply lays down the underlying principle that a wave field can be recreated by replacing a wave front to be reproduced (originally created by a primary source) with a continuous distribution of sources (the secondary sources) on the original wave front. With this understanding, it is possible to drop the primary source, because the cumulative contribution of the secondary sources recreates the wave front originally produced by the primary source.
According to Huygens's proposition, every particle P in the medium which encounters a wave propagates it further to the nearby particle P' that is directly aligned with it and with the source A of the wave, all lying in the direct path of the wave. Apart from communicating the wave to particle P', it also induces the wave's impact on the other particles P''. Hence it can be said that around every particle there is a wave of which this particle is the centre of propagation. In this explanation of Huygens's proposition, the particles P' correspond to the secondary sources and A is the primary source. This explanation becomes clearer with figure 4.1.


Figure 4.1: Illustration of Huygens's proposition [12]


The secondary source principle might suggest that it is only possible to reproduce a particular wave front when the secondary sources are placed on that very wave front. Contrary to this presumption, it is possible to place the secondary sources on any surface. The secondary sources are considered to be continuously distributed, but in a real system they are replaced by a finite number of sound sources, i.e. loudspeakers; see figure 4.2.
Since it is possible to replace a real source by an array of reproduction sources, i.e. secondary sources, we now need to understand how to feed the sources which replace the primary source. The mathematical formulation of Huygens's principle is based on the Kirchhoff-Helmholtz integral, which gives us the amplitudes

Figure 4.2: Placement of secondary sources in Huygens's principle and for sound field reproduction [12]

and phases for the secondary sources. The frequency-domain amplitudes and phases for the position of a given virtual source and a secondary source are transformed to the time domain, where they finally define the filters used for driving the WFS setup. For every virtual source and every loudspeaker a synthesis operator is defined, as shown in figure 4.2.
In figure 4.2, WFS_i represents the synthesis operator which processes the audio content of a wave file defining the sound to be reproduced at a prescribed virtual position for loudspeaker i. As an example, in this thesis we generate a 12-channel sound file


which is auralized by WFS: as we simulated the directional characteristics of the sound and structured a 12-channel audio file, the WFS system takes it up and reproduces it accordingly.
WFS was proposed by A. J. Berkhout [41] and can be considered a specific application of holophony [41].

4.2 Mathematical description of WFS

[Figure 4.4 flowchart: Kirchhoff-Helmholtz integral → elimination of dipoles (secondary source selection; linear/planar) → Neumann Green's function → correction of source mismatch (point sources, synthesis in a plane) → exact sound field synthesis → 2½-dimensional WFS.]

Figure 4.4: Overview of WFS [42]


We can reproduce spatially correct sound fields by applying the concept of wave field synthesis, although some assumptions are made in order to reduce the complexity. We now give a description of wave field synthesis and its formulation. We consider a space enclosed by a surface S, and a point A within the closed surface S. The Kirchhoff-Helmholtz integral for the sound pressure at point A can be derived using Green's theorem and the wave equation

P_A = \frac{1}{4π} \oint_S \left( P \frac{∂G}{∂n} - G \frac{∂P}{∂n} \right) dS   (4.1)


Figure 4.5: Derivation of the sound pressure using Green's theorem and the wave equation [13]

where G is called the Green's function, P is the pressure at the surface caused by an arbitrary source distribution outside the enclosure, and n is the inward-pointing normal unit vector to the surface. G satisfies the inhomogeneous wave equation

∇^2 G + k^2 G = -4π δ(r - r_A)   (4.2)

The pressure at a point A can be calculated if the wave field of an external source distribution is known at the surface of a source-free volume containing A. The general form of the Green's function is

G = \frac{e^{-jkr}}{r} + F   (4.3)

Here F may be any function satisfying the wave equation 4.2 with the right-hand term set to zero. For the derivation of the Kirchhoff integral F = 0 is chosen, and the space variable r is chosen with respect to A

r = \sqrt{(x - x_A)^2 + (y - y_A)^2 + (z - z_A)^2}   (4.4)

G then represents the wave field of a point source at A. The physical interpretation of these choices is that an imaginary point source must be placed at A to determine the acoustic wave paths from the surface towards A. We further use the equation of motion

\frac{∂P}{∂n} = -jωρ_0 V_n   (4.5)


Substituting the solution for G and the equation of motion 4.5 into integral 4.1, we finally obtain the Kirchhoff integral for homogeneous media

P_A = \frac{1}{4π} \int_S \left( P \frac{1 + jkr}{r^2} \cos φ \, e^{-jkr} + jωρ_0 V_n \frac{e^{-jkr}}{r} \right) dS   (4.6)

In equation 4.6 the first term represents a dipole source distribution driven by the pressure P at the surface S, and the second term represents a monopole source distribution driven by the normal component of the particle velocity V_n at the surface.

Figure 4.6: Monopole and dipoles on closed surface S [13]

The pressure at a point A can thus be synthesized by a monopole and a dipole source distribution (together called the secondary sources) on a surface S. The strengths of the distributions depend on the velocity and the pressure of the external sources measured at the surface.
Since A can be anywhere within the volume enclosed by S, the wave field within that volume is completely determined by equation 4.6. The positive lobes of the dipoles interfere constructively with the monopoles inside the surface, while outside the surface the negative lobes of the dipoles exactly cancel the single positive lobe of the monopoles; hence outside S the integral is zero. The complexity of the Kirchhoff integral in equation 4.6 is due to this cancellation of the secondary sources' wave field outside the surface S. For our analysis this property is not important, and hence we derive two special cases of the Green's function that simplify the Kirchhoff integral, under two conditions:
Fixed surface geometry


Non-zero wave field outside the closed surface.

Although the wave field inside the surface is still correctly approximated by these solutions, the complexity of the integral decreases under these assumptions.
Taking these two considerations into account, we define two solutions, the Rayleigh I and Rayleigh II integrals. The wave field at a point A inside a closed surface (figure 4.7) is completely given by these two solutions. They approximate the secondary sources in a plane and establish that the wave field inside the closed surface can be correctly described in this way: the primary source distribution lying outside the closed surface is mapped onto a plane of secondary sources which, via the Rayleigh integrals, give us the wave field at the desired point within the closed surface.
We explain the Rayleigh I integral here; Rayleigh II is similar, with some slight differences. Rayleigh I is defined for a monopole distribution, and Rayleigh II defines the integral for a dipole distribution. One more important fact to note is that we assume we are interested in the wave field at points inside the closed surface.

Figure 4.7: Assumed surface geometry [13]


The Rayleigh I integral can be found by choosing a particular surface of integration for integral equation 4.1 and a suitable function F as given in equation 4.3. The surface consists of a plane at z = 0 and a hemisphere in the half space z > 0, as drawn in the


figure above. All sources are located in the half space z < 0, so for any value of the radius R the volume enclosed by S_1 and S_2 is source-free. The pressure at A is now found by substituting equation 4.3 into integral equation 4.1, where S_1 is the infinite surface at z = 0

P_A = \frac{1}{4π} \int_{S_1} \left[ P \frac{∂}{∂n} \left( \frac{e^{-jkr}}{r} + F \right) - \left( \frac{e^{-jkr}}{r} + F \right) \frac{∂P}{∂n} \right] dS   (4.7)
We choose the function F such that it cancels the first term, i.e. the term representing the dipole distribution. (In the case of Rayleigh II, F is chosen such that the second term cancels, leaving only the first term of the integral, which represents the dipole distribution.) In order to obtain a monopole distribution only, the normal component of the gradient of F must have the opposite sign to the normal component of the gradient of e^{-jkr}/r.
In the case of Rayleigh I we take F = e^{-jkr'}/r', the field of A', the image of A mirrored in the plane S_1. Finally, after solving, we obtain the Rayleigh I integral

P_A = \frac{1}{2π} \int_{S_1} jωρ_0 V_n \frac{e^{-jkr}}{r} \, dS   (4.8)

This equation states that a monopole distribution in the plane z = 0, driven by two times the strength of the particle velocity component perpendicular to the surface, can synthesize in the half space z > 0 the wave field of a primary source distribution located somewhere in the other half space z < 0. Hence, with the aid of the Rayleigh I integral, the wave field of a primary source distribution can be synthesized by a monopole distribution at z = 0 if the velocity at that plane is known; refer to figure 4.8.
The Rayleigh I solution can be applied for synthesis if the wave field in the half space z < 0 is of no interest, as the monopole distribution radiates a mirror wave field into the half space z < 0, since there are no dipoles to cancel this wave field.
The Rayleigh II integral is

P_A = \frac{1}{2π} \int_{S_1} P \frac{1 + jkr}{r^2} \cos φ \, e^{-jkr} \, dS   (4.9)

With the Rayleigh II integral, the wave field of a primary source distribution can be synthesized by a dipole distribution at z = 0 if the pressure at that plane is known.


Figure 4.8: Rayleigh I (only monopoles) [13]

Figure 4.9: Rayleigh II (only dipoles) [13]

4.3 Synthesis operator


In the Rayleigh integrals the solution is provided in the form of a surface integral, i.e. a planar array of loudspeakers is considered; we now approximate this by a line distribution and thereby simplify the solution from a planar to a linear distribution.
We assume that all secondary sources have identical characteristics. In [43] an expression for the 2D operator is derived by reducing the (exact) 3D Rayleigh I surface integral

to a line integral. We consider a primary source S in the xz-plane; the pressure field of the primary source is given by
P(r, \omega) = S(\omega)\, G(\varphi, \vartheta, \omega)\, \frac{e^{-jkr}}{r} \qquad (4.10)

Figure 4.10: Coordinate system and approximation (4)


S(\omega) is the source function and G(\varphi, \vartheta, \omega) is the directivity characteristic of the source. A secondary monopole distribution is considered throughout the xy-plane, which synthesizes the wave field of the primary source at the receiver R according to the Rayleigh I integral, equation 4.8:
P_{synth} = \frac{1}{2\pi} \iint_{xy\text{-plane}} j\omega\rho_0 V_n(r, \omega)\, \frac{e^{-jk\Delta r}}{\Delta r}\, dx\, dy \qquad (4.11)

V_n(r, \omega) is the velocity component of the primary source perpendicular to the xy-plane. The surface integral reduces to a line integral along the x-axis by evaluating the integral in the y-direction along the line m. Applying equation 4.5, defined earlier, to equation 4.10, which


gives the pressure field of the source, and then applying the stationary phase method of Bleistein [44] to solve the integration, we finally obtain the wave field at point R.
P_{synth} = S(\omega) \sqrt{\frac{jk}{2\pi}} \int \sqrt{\frac{\Delta r}{r + \Delta r}}\, G(\varphi, 0, \omega) \cos\varphi\, \frac{e^{-jkr}}{\sqrt{r}}\, \frac{e^{-jk\Delta r}}{\Delta r}\, dx \qquad (4.12)

The Rayleigh I integral is evaluated for a primary source S and a receiver R, both located in the xz-plane. Vector r points from the primary source to a secondary source M on the line m; vector \Delta r points from the secondary source to the receiver. All secondary sources M along the line m are approximated by a single secondary point source at the intersection of m and the x-axis. We thus first approximate the secondary sources along the vertical line m and then integrate along the x-axis, taking care of the phase. The integration hence converges to one dimension along the x-axis but maps the whole surface. We define a driving function for a point monopole on the x-axis which approximates the line m:
Q_m(x, \omega) = S(\omega) \sqrt{\frac{jk}{2\pi}} \sqrt{\frac{\Delta r}{r + \Delta r}}\, G(\varphi, 0, \omega) \cos\varphi\, \frac{e^{-jkr}}{\sqrt{r}} \qquad (4.13)

Hence the synthesis equation can be represented as


P_{synth} = \int Q_m(x, \omega)\, \frac{e^{-jk\Delta r}}{\Delta r}\, dx \qquad (4.14)

Figure 4.11: Approximation to 2D linear form [13]


Finally we obtain the wave field synthesis equation, considering the driving function Q_m(x, \omega) and the stationary phase method:
P_{synth} = S(\omega)\, G(\varphi_0, 0, \omega)\, \frac{e^{-jk(r_0 + \Delta r_0)}}{r_0 + \Delta r_0} \qquad (4.15)

where r_0 is the vector pointing from the primary source to the stationary-phase point and \Delta r_0 the vector from the stationary-phase point to the receiver. The driving function is independent of the receiver position at z = z_0; hence for different receiver points on the line z = z_0 the synthesis still holds. The driving function becomes
Q_m(x, \omega) = S(\omega) \sqrt{\frac{jk}{2\pi}} \sqrt{\frac{\Delta z_0}{z_0 + \Delta z_0}}\, G(\varphi, 0, \omega) \cos\varphi\, \frac{e^{-jkr}}{\sqrt{r}} \qquad (4.16)

The reduction of the Rayleigh operator from a planar to a linear distribution introduces a gain factor \sqrt{\Delta z_0 / (z_0 + \Delta z_0)}. The reproduced wave field is therefore correct in phase but deviates slightly in amplitude because of this gain factor; by choosing the distance to the reference line appropriately, the amplitude error can be kept small over large listening regions.
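A minimal sketch of the driving function of eq. 4.16 for an omnidirectional primary source (G = 1) is given below; the array layout, source position and reference-line distance are illustrative assumptions, not thesis parameters:

```python
import numpy as np

c = 343.0
f = 500.0
k = 2 * np.pi * f / c

x_spk = np.arange(-4.0, 4.01, 0.18)   # linear array on the x-axis, 18 cm spacing
x_src, z0 = 0.0, 2.0                  # primary source 2 m behind the array
dz0 = 2.0                             # distance array -> reference line (assumed)

r = np.sqrt((x_spk - x_src)**2 + z0**2)   # primary source -> secondary sources
cos_phi = z0 / r                          # angle of incidence at the array

S = 1.0                                   # flat source spectrum S(omega)
Qm = (S * np.sqrt(1j * k / (2 * np.pi))
        * np.sqrt(dz0 / (z0 + dz0))
        * cos_phi * np.exp(-1j * k * r) / np.sqrt(r))

# the drive is strongest at the loudspeaker nearest the source's foot point
print(x_spk[np.argmax(np.abs(Qm))])
```

The cos φ and 1/√r factors concentrate the drive on the loudspeakers closest to the source, while the phase term e^{-jkr} delays each loudspeaker according to its distance from the primary source.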
With some further analysis, the wave field solutions of the previous section, which were defined for a straight-line geometry, can be extended to an arbitrary shape of the secondary source line and receiver line. Figure 4.12 depicts the scenario.
The pressure at the receiver line is approximated as:
P_R = \int Q_m(r, \omega)\, \frac{e^{-jk\Delta r}}{\Delta r}\, dl \qquad (4.17)

Here the integration runs along the secondary source line and the driving function is defined as
Q_m(x, \omega) = S(\omega) \sqrt{\frac{jk}{2\pi}} \sqrt{\frac{\Delta r}{r + \Delta r}}\, \cos\varphi\, \frac{e^{-jkr}}{\sqrt{r}} \qquad (4.18)
Here r is the length of the vector from the primary source to the secondary source, \varphi is the angle of incidence of r at the secondary source line, and \Delta r is the vector from the secondary source line to the reference line. This driving function allows flexibility in the geometry of the loudspeaker array. We have considered only the case of monopoles; a similar derivation holds for dipole arrays. There are other, similar formulations of wave field synthesis, for example by Spors and Rabenstein [45] [13].

Figure 4.12: Approximation to a line integral for a non-uniform line geometry [13]

4.4 Focusing operator


The principle of reciprocity between source and receiver states that acoustic paths are reversible; hence it is possible to focus sound waves from distributed secondary sources towards a focus point.

4.5 Practical Consequences


Loudspeaker arrays used for WFS reproduction are not ideal monopoles or dipoles. They exhibit radiation and phase deficiencies that distort the reproduced sound field. A solution based on a multichannel iterative procedure has been proposed, which intends to optimize the sound field produced by the loudspeaker array [46] [47].
According to the Kirchhoff-Helmholtz integral, sound recording should be performed with a microphone array. However, the WFS system as defined by Berkhout [48] [41] relies on the concept of notional sources, which consists in substituting close-up microphones for the microphone array. Each primary source is picked up by one individual microphone, and the microphone signal is then propagated to the virtual microphone array by applying an amplitude weight and a time delay. Each microphone signal can therefore be identified with one individual primary source and may be considered a virtual substitute for this source, i.e. a notional source.
As a result of this notional-source concept, WFS now deals only with the reproduction of the virtual sources, that is, the notional sources, by a loudspeaker array. These sources are described by specifying their position within the virtual sound scene according to the parameterization. They are reproduced as monopoles located outside or even inside the listening room.
It is found that auralization with a 2D WFS system has more advantages compared to a 2½D WFS system; this conclusion is suggested in [5]. The reasoning is that amplitude errors play a different, more prominent part in the reproduction of room acoustics than in dry source reproduction.
One more important limitation of WFS is the spatial aliasing artifact introduced during spatial sound reproduction. As in the case of microphone arrays, the loudspeaker array in WFS is discretized, so there is always a spatial aliasing limit of a WFS system. Frequencies beyond this limit are not reproduced accurately by the WFS system.


5 LISTENING TEST

The work presented above was done to understand the effect of the various errors and artifacts introduced by processing with spherical microphone arrays. The various factors for spherical microphone arrays have been analytically explained in the previous chapters. The extent to which these errors and artifacts seep through the rendering process during auralization, and interfere in a corruptive way, needs to be established. The spherical microphone array analysis discussed in this thesis is focused in particular on auralization purposes; hence, in the end, it all comes down to a listening test. It is important to know whether an artifact that is present mathematically also has a perceptual effect, and to what extent.

5.1 Listening Test


The analytical development of spherical microphone arrays has been established by many authors, but a clear relation between the various artifacts and errors in microphone arrays and their perceptual impact has not been presented for spatial sound reproduction systems. In [49] a perceptual evaluation of a spherical microphone array is given for binaural synthesis, evaluating spatial aliasing and spatial perception; the statistical analysis in [49] is performed with repertory grid analysis. Listening evaluation tests were performed in [50] for a rigid-sphere spherical microphone array in order to compare two different kinds of microphone capsule used in the array; the authors reproduced the sound field using a 5.1 system and investigated the performance of the microphone capsules.
Our work is concerned with the perceptual analysis of artifacts when a spherical microphone array is used for spatial sound reproduction using wave field synthesis. In


chapter 3 different errors are described. The aim of this chapter is to check and analyse the perceptual effect of those errors when spherical microphone array data is auralized using WFS.
The three factors which induce undesirable artifacts in spatial sound reproduction, defined in the previous chapters, are:
Microphone noise
Positioning error
Spatial aliasing

5.2 Reproduction set up


The spatial sound field reproduction in our study is realised using wave field synthesis (WFS). The WFS setup consists of a loudspeaker array of 88 elements with an inter-element spacing of approximately 18 cm. The spatial aliasing limit up to which the WFS reproduction setup can synthesize sound fields accurately is approximately 1000 Hz.
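The quoted limit follows directly from the half-wavelength criterion for a discretized array, f_al = c / (2·Δx); a quick check with the 18 cm spacing:

```python
# Spatial aliasing limit of a discretized loudspeaker array from the
# half-wavelength criterion; with 18 cm spacing this lands close to the
# quoted 1000 Hz.
c = 343.0      # speed of sound in m/s
dx = 0.18      # loudspeaker spacing in m
f_alias = c / (2 * dx)
print(round(f_alias))   # roughly 950 Hz
```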
The reproduction was done in the WAVE STUDIO laboratory of Fraunhofer IDMT (figure 5.1).

Figure 5.1: Wave Studio (Fraunhofer IDMT )

As shown in figure 5.1, the loudspeaker layout lies in a horizontal plane. The height of this loudspeaker layout above the ground is approximately equal to head level


when a test subject is in a sitting position. The horizontal plane thus contains the listener in the middle, surrounded by the array of loudspeakers.
Arrangement of listening position
The test subjects invited for the listening test were seated roughly in the middle of the room, so that the listener was almost equidistant from all the loudspeaker panels. Figure 5.2 shows where the test subjects were seated for the tests.

Figure 5.2: Arrangement of listening position (listener at the centre, surrounded by the loudspeakers)

5.3 Auralization
In order to auralize the effect of these errors, the impact of a full-spectrum free-field sound wave on a sampled spherical microphone array is simulated and plane wave decomposition (PWD) is performed. The direction of propagation of the sound wave is Ω = (azimuth, elevation) = (φ, ϑ) = (0°, 90°), i.e., the source lies in the horizontal plane with no vertical elevation. Plane wave decomposition is done for 12 directions, including the direction of propagation. The simulation is done for the free-field case. Figure 5.3 shows the 12 different directions for which the PWD of the spherical microphone array data is done. For the simulation the spherical microphone array radius r is taken as 15 cm. Free-field impulse responses for all the different cases were obtained for the 12 plane-wave-decomposed directions. These responses for the different directions, when convolved with the test audio

Figure 5.3: Depiction of the PWD for 12 directions (the source direction (0°, 90°) and the directions (90°, 90°), (180°, 90°) and (270°, 90°) are marked around the array)


signal, impose the behaviour of the sound field on it. As a result we obtain a 12-channel audio file, where each channel corresponds to one directional component of the test audio signal in space.
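The last step can be sketched as follows; the dry signal and the impulse responses are synthetic placeholders for the simulated data:

```python
import numpy as np

# Sketch of producing the 12-channel audio file: each PWD direction
# contributes one channel, obtained by convolving the dry test signal
# with that direction's free-field impulse response. Random data stands
# in for the simulated responses.
fs = 44100
rng = np.random.default_rng(0)
dry = rng.standard_normal(fs)                # 1 s dry test signal (placeholder)
irs = rng.standard_normal((12, 256))         # 12 impulse responses (placeholder)

channels = np.stack([np.convolve(dry, ir) for ir in irs])
print(channels.shape)                        # (12, len(dry) + 255)
```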

5.3.1 Aspects to be perceptually evaluated


The basic simulation parameters were designed keeping the rendering tool in mind. We simulate plane wave decomposition for the different error cases. The base transform order l is taken as 3; this follows from the fact that our reproduction system has a spatial aliasing limit of 1000 Hz, and for a spherical array of radius r = 15 cm this gives kr ≈ 3. For a given order l, b_l(kr) has the shape of a bandpass filter (except order zero, which behaves as a low-pass filter), with the peak around l ≈ kr. Hence when a plane wave is approximated with a finite summation of order l = L in the spherical harmonics domain, the reconstructed amplitudes are expected to be attenuated at frequencies corresponding to kr > L [49]. To be in line with this, we keep the base transform order L = 3, considering the spatial aliasing limit of the WFS reproduction setup.
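The kr ≈ 3 figure is easily verified at the aliasing limit of the reproduction setup:

```python
import numpy as np

# kr at the spatial aliasing limit of the WFS setup: with f = 1000 Hz and
# an array radius r = 0.15 m, kr is close to 3, motivating the base
# transform order L = 3.
c = 343.0
f = 1000.0
r = 0.15
kr = 2 * np.pi * f / c * r
print(round(kr, 2))   # about 2.75
```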
In order to establish a mechanism for comparison, we simulated and auralized an ideal full-spectrum wave impact on a continuous-aperture spherical microphone array for order L = 3. The impulse responses of the ideal full-spectrum wave were used to obtain our reference signal, against which all comparisons are made.


For the evaluation we first define the questions to be answered by the listening test.
1. The first question concerns spatial aliasing. Even if the microphone array has a sufficient number of measurement positions on the sphere as defined by 2.6, we still face spatial aliasing (refer to section 3.4), because when full-spectrum audio is auralized the limitation kr < l is not respected. We want to know the perceptual effect of this spatial aliasing.
We also investigate the perceptual effect of changing the transform order l for a fixed number of microphones. For these cases the number of microphones of the spherical array was fixed at Q = 302. The highest transform order auralized and perceptually analysed is L = 6, for which, according to equation 2.54, Q ≥ 66 is sufficient. The number of microphones is taken as 302 because, firstly, it is above the minimum number required by all orders L analysed in the listening test and, secondly, with 302 positions there is no noticeable spatial aliasing for our base transform order L = 3.
The value of 302 microphones comes from the first test, which establishes the minimum number of microphones at which no aliasing is perceptually observed. For our base transform order L = 3 with 302 sampling points there was no aliasing, and this fact is also substantiated by the listening test. Plots are provided in the appendix.
2. For microphone noise, we add additive white Gaussian noise to the frequency-domain output of the sampled sphere and then continue with the plane wave decomposition. There are two questions we attempt to answer.
For a fixed transform order and a fixed number of microphone positions, at what level does microphone noise become perceptually significant, and at what level does it remain perceptually insignificant?
The other question is the effect of the transform order l on the microphone noise: we change the transform order for a fixed noise level and observe the impact for the different orders.
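The noise-injection step can be sketched as below; the signal is a synthetic placeholder for the sampled-sphere output, and the dB value is interpreted here as a signal-to-noise ratio:

```python
import numpy as np

# Sketch of adding white Gaussian microphone self-noise at a prescribed
# level relative to the signal (an SNR in dB).
def add_awgn(x, snr_db, rng):
    """Return x plus white Gaussian noise at the requested SNR in dB."""
    p_sig = np.mean(np.abs(x)**2)
    p_noise = p_sig / 10**(snr_db / 10)
    noise = rng.standard_normal(x.shape) * np.sqrt(p_noise)
    return x + noise

rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)  # placeholder signal
y = add_awgn(x, 40.0, rng)

snr_measured = 10 * np.log10(np.mean(x**2) / np.mean((y - x)**2))
print(round(snr_measured, 1))   # close to the requested 40 dB
```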


3. The last aspect we check is the positioning error (refer to section 3.5 and equation 3.6). The positioning error is checked against varying transform orders L = 3, 4, 5, 6.
The positioning error is added to the quadrature values obtained from the Lebedev grid structure. The position error is an angular offset in azimuth and elevation which is added to Ω = (φ, ϑ). The positioning error values are normally distributed with a defined standard deviation (SD); that is, we simulate the sound field for normally distributed error values with a particular standard deviation.
Two types of positioning error are evaluated separately for their perceptual effect on the sound:
Positioning error in azimuth: the error is added only in azimuth
Positioning error in elevation: the error is added only in elevation
For both errors, a minimum error value which has no perceptual impact compared with the reference is first determined for the base transform order L = 3; this same error level is then investigated for the other transform orders.

5.3.2 Processing
The flow diagram in figure 5.4 shows the sequence of processing for the auralization of spherical microphone array data through the WFS system. The blocks labelled positioning error and microphone noise in figure 5.4 mark where these degradations are added: the noise is added in the second stage, to the pressure responses of the microphone elements, while the positioning error is added in the first stage, when the discretization is done and the quadratures are calculated.


Figure 5.4: Processing chain for the auralization of spherical microphone array data: sampling of the spherical microphone array aperture (with the positioning error injected at this stage), simulation of the full-audio-spectrum wave impact (with the microphone noise added at this stage), spherical harmonic decomposition, plane wave decomposition for 12 directions, impulse responses, convolution with dry audio into a 12-channel audio file, and finally the WFS rendering system.


5.4 Structure of listening test


The listening test was conducted using the MUSHRA (MUltiple Stimuli with Hidden Reference and Anchor) methodology [51], which is used for the subjective evaluation of audio quality. The listener is presented with a reference, labeled as such in the test, as well as an anchor and a number of test samples which include a hidden reference.
The reference is the audio track against which all other test samples are compared and graded; it is therefore the benchmark for all comparisons. The anchor is the worst-case audio track, which should stand out to be graded as bad. According to the MUSHRA recommendation, the anchor should be a 3.5 kHz low-pass filtered version of the reference. The purpose of the anchor is to make the scaling in the evaluation more accurate. The test samples are the different test conditions we want to evaluate perceptually. In MUSHRA the reference signal is also presented hidden among the test samples, and listeners must identify this hidden reference accurately, otherwise they are considered outliers.
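Generating such an anchor can be sketched as below; MUSHRA fixes the 3.5 kHz cutoff, but the Butterworth filter and its order are assumptions of this sketch, and the reference track is a synthetic placeholder:

```python
import numpy as np
from scipy.signal import butter, sosfilt

# Sketch of generating the MUSHRA anchor as a 3.5 kHz low-pass version of
# the reference.
fs = 44100
rng = np.random.default_rng(0)
reference = rng.standard_normal(fs)            # placeholder reference track

sos = butter(8, 3500, btype="low", fs=fs, output="sos")
anchor = sosfilt(sos, reference)

# energy well above the cutoff should be strongly attenuated
spec_ref = np.abs(np.fft.rfft(reference))
spec_anc = np.abs(np.fft.rfft(anchor))
freqs = np.fft.rfftfreq(fs, 1 / fs)
hi = freqs > 7000
atten_db = 10 * np.log10(np.sum(spec_anc[hi]**2) / np.sum(spec_ref[hi]**2))
print(atten_db < -40)    # True: the high band is suppressed by far more than 40 dB
```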
100-80   Excellent
80-60    Good
60-40    Fair
40-20    Poor
20-0     Bad

Table 5.1: MUSHRA scale


The test subjects were seated in the Wave Studio at the position shown in figure 5.2, with the listening test GUI on a computer.
The GUI contains the reference signal and the numbered test conditions 1, 2, 3, ... which have to be compared with the reference. The hidden reference and the anchor can be any two tracks among 1, 2, 3, 4, 5, .... Hence if there are four test conditions to evaluate, six test signals are presented in the GUI: one is the hidden reference, one is the anchor, and the remaining four are the test conditions. A valid listener has to identify the hidden reference and mark it excellent, and should also be able to identify the anchor and mark it as bad.


5.4.1 Audio Tracks


Three dry audio tracks were auralized for perceptual evaluation:
Castanet
Speech
Musical song
These three test audio tracks were convolved with the impulse responses of the different test conditions and converted into 12-channel audio files. The duration of each audio track was not more than 15 seconds. The listening test signals are categorised into six major categories, and for all categories four test conditions are evaluated. The duration of the complete listening test was about 45 minutes. There were a total of 126 individual audio tracks, including the reference tracks.

5.4.2 Listening test conditions


A total of six different test categories were structured. In total there were 18 different evaluation instances, each consisting of a reference, an anchor and four test conditions. The reference is the case where the sound wave is simulated for a continuous aperture, i.e., a spherical microphone array which is not discretized to a finite number of microphone positions.
Spatial aliasing
Under spatial aliasing, two different classes of tests were conducted to answer the questions raised in section 5.3.1. All three test tracks given above are auralized for the two different test cases; refer to table 5.2.
Spatial aliasing vs number of microphones: the impact of spatial aliasing is perceptually evaluated against different numbers of microphone positions on the spherical microphone array
Spatial aliasing vs transform order (L): we change the transform order and look for the extent to which spatial aliasing is induced


Table 5.2: Test conditions for perceptual evaluation of spatial aliasing

I. Spatial Aliasing vs Number of Microphones (tracks: (a) castanet, (b) speech, (c) music)
Condition 1: L=3, Mic=302
Condition 2: L=3, Mic=194
Condition 3: L=3, Mic=86
Condition 4: L=3, Mic=26
Condition 5 (Anchor): L=12, Mic=6
Reference: L=3, continuous aperture

II. Spatial Aliasing vs Transform order (L) (tracks: (d) castanet, (e) speech, (f) music)
Condition 1: L=3, Mic=302
Condition 2: L=4, Mic=302
Condition 3: L=5, Mic=302
Condition 4: L=6, Mic=302
Condition 5 (Anchor): L=12, Mic=6
Reference: L=3, continuous aperture

Each track is auralized under every condition.

Microphone noise
In the perceptual evaluation of microphone noise, again two different test cases were structured; refer to table 5.3.
Perceptual analysis of different microphone noise levels: we analyse different noise levels for a fixed number of microphones and a fixed transform order, and try to establish the noise level at which the noise becomes perceptually insignificant.
Microphone noise vs transform order: the noise level obtained in the above test is taken and checked for auralization with different transform orders.
Table 5.3: Test conditions for perceptual evaluation of microphone noise

I. Perceptual analysis of different microphone noise levels (tracks: (g) castanet, (h) speech, (i) music)
Condition 1: L=3, Mic=302, Noise=80 dB
Condition 2: L=3, Mic=302, Noise=65 dB
Condition 3: L=3, Mic=302, Noise=55 dB
Condition 4: L=3, Mic=302, Noise=40 dB
Condition 5 (Anchor): L=3, Mic=6, Noise=40 dB
Reference: L=3, continuous aperture

II. Microphone Noise vs Transform order (tracks: (j) castanet, (k) speech, (l) music)
Condition 1: L=3, Mic=302, Noise=80 dB
Condition 2: L=4, Mic=302, Noise=80 dB
Condition 3: L=5, Mic=302, Noise=80 dB
Condition 4: L=6, Mic=302, Noise=80 dB
Condition 5 (Anchor): L=12, Mic=302, Noise=40 dB
Reference: L=3, continuous aperture


Positioning error
For the positioning error two cases were formed, one for error in azimuth and the other for error in elevation; refer to table 5.4.
Positioning error (elevation) vs transform order
Positioning error (azimuth) vs transform order
Table 5.4: Test conditions for perceptual evaluation of positioning error

I. Positioning Error (Elevation) vs Transform order (tracks: (m) castanet, (n) speech, (o) music)
Condition 1: L=3, Mic=302, SD=0.15
Condition 2: L=4, Mic=302, SD=0.15
Condition 3: L=5, Mic=302, SD=0.15
Condition 4: L=6, Mic=302, SD=0.15
Condition 5 (Anchor): L=12, Mic=302, SD=10
Reference: L=3, continuous aperture

II. Positioning Error (Azimuth) vs Transform order (tracks: (p) castanet, (q) speech, (r) music)
Condition 1: L=3, Mic=302, SD=0.15
Condition 2: L=4, Mic=302, SD=0.15
Condition 3: L=5, Mic=302, SD=0.15
Condition 4: L=6, Mic=302, SD=0.15
Condition 5 (Anchor): L=12, Mic=302, SD=10
Reference: L=3, continuous aperture

5.5 Test subjects


A total of 21 test subjects participated in the listening test, 3 female and 17 male, with ages ranging from 23 to 30 years and a mean age of 25.42. None of the test subjects had any kind of hearing impairment. All test subjects were students of TU Ilmenau. 24% of the test subjects had some kind of listening test experience, but not with a spatial sound reproduction setup. As even these subjects were not really familiar with spatial sound systems, all test subjects were given an introductory


overview of spatial sound systems, and the listening test setup was explained to them. This kind of orientation was felt to be important because the listening test and the Wave Studio lab at first give an impression of three-dimensional surround sound auralization, whereas in reality our work focuses on the perceptual impact of various kinds of noise and artifacts. Hence introductory information and the main motive of the listening test were briefly explained to the listeners.

5.6 Evaluation
5.6.1 Test subject screening
Out of the 21 participants, 14 were able to identify the hidden reference and the anchor; 7 failed to identify one or both of them. Hence the scores of 14 test subjects were considered valid, and the other 7 were treated as outliers.

5.6.2 Statistics for the evaluation of the listening test

The distribution of the test data was checked, and it approximately follows a normal distribution for all six categories and all individual conditions. The basic statistical formulation of this part of our analysis is as follows: the mean X̄_tc is the mean for a particular track t and a particular test condition c (refer to tables 5.2, 5.3 and 5.4), and S_tc is the standard deviation.
\bar{X}_{tc} = \frac{1}{N} \sum_z x_{ztc} \qquad (5.1)

S_{tc} = \sqrt{\frac{N \sum_z x_{ztc}^2 - \left(\sum_z x_{ztc}\right)^2}{N(N-1)}} \qquad (5.2)

where t: track
c: test condition
z: index of the test subject
N : Number of test subjects

Master Thesis Gyan Vardhan Singh

5 LISTENING TEST

75

5.6.3 Definitions
Statistical significance (p-value): the statistical significance of a result is the probability that the observed relationship (e.g., between variables) or difference (e.g., between means) in a sample occurred by pure chance, and that in the population from which the sample was drawn no such relationship or difference exists. In less technical terms, the statistical significance of a result tells us something about the degree to which the result is true (in the sense of being representative of the population). More technically, the p-value represents a decreasing index of the reliability of a result (see Brownlee, 1960): the higher the p-value, the less we can believe that the observed relation between variables in the sample is a reliable indicator of the relation between the respective variables in the population. Specifically, the p-value represents the probability of error involved in accepting our observed result as valid, that is, as representative of the population [52].
Confidence interval: the confidence interval gives information about the reliability of the calculated mean. It is defined as the range in which the mean would lie with a given probability if the test were repeated [].
Calculation of the confidence interval:

\left[\bar{X}_{tc} - \delta_{tc},\; \bar{X}_{tc} + \delta_{tc}\right], \qquad \delta_{tc} = t_p \frac{S_{tc}}{\sqrt{N}} \qquad (5.3)

The value of t_p is taken from the t-distribution table according to the number of test subjects N.
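Equations 5.1 to 5.3 can be sketched in a few lines; the score vector is invented for illustration (N = 14, matching the number of valid subjects):

```python
import numpy as np
from scipy import stats

# Sketch of eqs. 5.1-5.3: mean, standard deviation and 95% confidence
# interval of the MUSHRA scores of one track/condition pair.
x = np.array([78, 82, 90, 71, 85, 88, 76, 80, 92, 84, 79, 87, 81, 86], float)
N = len(x)

mean = x.sum() / N                                               # eq. 5.1
std = np.sqrt((N * np.sum(x**2) - x.sum()**2) / (N * (N - 1)))   # eq. 5.2

tp = stats.t.ppf(0.975, df=N - 1)        # two-sided 95%, from the t table
delta = tp * std / np.sqrt(N)            # eq. 5.3
ci = (mean - delta, mean + delta)
print(round(mean, 2), round(delta, 2))
```

The algebraic form of eq. 5.2 is identical to the usual sample standard deviation with N - 1 in the denominator.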
Analysis of variance (ANOVA): the purpose of analysis of variance (ANOVA) is to test for significant differences between means. In ANOVA the statistical significance of differences between means is tested by comparing (i.e., analyzing) variances. In order to establish that the data obtained for the different conditions show a perceptual difference, we further analyse the measured characteristics [53].
From the 2-way ANOVA analysis we get three p-values (explained above under statistical significance). If a p-value is near zero, the associated null hypothesis is in doubt: a sufficiently small p-value suggests that at least one column sample mean is significantly different from the other column sample means. Interpreted for the test conditions used in our test, a sufficiently small p-value shows that there is some effect due to the conditions imposed by the transform order. The p-value for the test conditions is zero, which shows that the effect of the transform order is significant. Refer to [54] [55] for more description of 2-way ANOVA.
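A minimal balanced two-way ANOVA with interaction can be sketched as below; the scores are synthetic (with a deliberately strong condition effect), and only the condition factor is evaluated explicitly, the track and interaction factors following the same pattern:

```python
import numpy as np
from scipy import stats

# Minimal two-way ANOVA with interaction (balanced design): factor A =
# test track, factor B = test condition, r subjects per cell.
rng = np.random.default_rng(3)
a, b, r = 3, 4, 14                         # 3 tracks, 4 conditions, 14 subjects
cond_effect = np.array([0.0, -10.0, -20.0, -30.0])
y = 80 + cond_effect[None, :, None] + rng.normal(0, 5, (a, b, r))

grand = y.mean()
mA = y.mean(axis=(1, 2))                   # track means
mB = y.mean(axis=(0, 2))                   # condition means
mAB = y.mean(axis=2)                       # cell means

ssA = b * r * np.sum((mA - grand)**2)      # sum of squares, factor A
ssB = a * r * np.sum((mB - grand)**2)      # sum of squares, factor B
ssAB = r * np.sum((mAB - mA[:, None] - mB[None, :] + grand)**2)  # interaction
ssE = np.sum((y - mAB[:, :, None])**2)     # residual

dfA, dfB = a - 1, b - 1
dfAB, dfE = dfA * dfB, a * b * (r - 1)
FB = (ssB / dfB) / (ssE / dfE)
pB = stats.f.sf(FB, dfB, dfE)              # p-value for the condition factor
print(pB < 0.05)                           # the condition effect is significant
```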
condition 1: reference
condition 2: number of microphones = 302
condition 3: number of microphones = 194
condition 4: number of microphones = 86
condition 5: number of microphones = 26
condition 6: anchor

5.7 Spatial aliasing vs transform order


We analyse the impact of changing the transform order on spatial aliasing.
Figure 5.5 shows that the impact of spatial aliasing increases as the transform order of the microphone array processing is increased. In this case the number of sampling positions is fixed at 302. This sufficiently proves that for a transform order of L = 3 with spherical sampling at 302 positions there is no perceptual aliasing effect. It is important to note that the maximum number of required sample positions according to the Lebedev grid structure is 66 for L = 6 (the highest transform order auralized); theoretically, a sphere discretization with 302 positions should therefore be sufficient for an aliasing-free plane wave decomposition.
We see from the plots that the speech signal is worst affected by the spatial aliasing artifacts. The behaviour of music and castanets is almost similar, and their confidence intervals overlap. We perform a 2-way ANOVA analysis for the spatial aliasing case and look for statistical significance.
The table shows that the p-value for the transform order is zero or almost zero, which signifies that the main factor affecting our test cases is the transform order.


Figure 5.5: Aliasing vs transform order (perceptual scores for castanet, speech and music over the test conditions)


Looking at the second column, the p-value for the test items is also quite close to zero. As we assume a significance level of 95%, any p-value below 0.05 is considered statistically significant. Hence the test tracks also have an effect on the perceptual results of spatial aliasing.

Table 5.5: 2-way ANOVA for Spatial Aliasing


5.8 Evaluation of positioning error


The effect of the positioning error on the auralization of the plane wave decomposition is tested separately for
1. positioning error in elevation
2. positioning error in azimuth
The confidence interval plots for the positioning error are shown in figures 5.6 and 5.7.

Figure 5.6: Analysis of positioning error in azimuth for all three test items

Comparing both figures, we see an obvious pattern: in both error cases, increasing the transform order degrades the audio quality. The interesting part, however, is that the extent to which error in elevation corrupts the audio quality is not the same as in azimuth. In the case of the elevation error, all three test items show similar perceptual performance; the overlap of their confidence intervals further substantiates this conclusion.
In azimuth, on the other hand, we see a different behaviour: speech and music follow the same trend, as if elevation and azimuth had the same effect on them, but for the castanets the azimuth error does not seem to degrade the


Figure 5.7: Analysis of positioning error in elevation for all three test items
signal in the same way as it does for the other tracks. Although the confidence intervals overlap with each other, they do so only to a small extent.
With the above discussion in mind, we are tempted to investigate this issue further. In order to establish whether the perceptual effects cast by these two error cases are similar or not, we first check for hidden significance among the different test items in each error case.
In the case of elevation error, looking at the plots closely, the confidence intervals of the different test items, i.e. music, speech and castanet, overlap to a relatively high degree. Hence we can fairly conclude, on the basis of the confidence interval plots, that in the elevation error conditions 3, 4 and 5 (which correspond to transform orders 4, 5 and 6) the test items share a high degree of similarity in their corruptive effect on the auralization of the plane wave decomposition.
In the case of azimuth we need more evidence to establish the extent of the impact, so we perform a 2-way ANOVA analysis, comparing the effect of the test items and the effect of the different test conditions simultaneously. We again assume a confidence level of 95%; hence any p-value above 0.05 is considered high, i.e. not significant.
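The thesis performed this analysis with MATLAB's anova2 function [55]. To illustrate what the test computes, the sketch below derives the F statistics of the two main effects for a balanced conditions-by-items score table; the table values are made up, the design has no replication (so interaction cannot be separated from error), and turning an F statistic into a p-value would additionally require the F distribution:

```python
def two_way_anova_F(table):
    """Main-effect F statistics for a balanced two-factor table without
    replication (rows = test conditions, columns = test items)."""
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]
    # Sums of squares for rows, columns, total; error is the remainder
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_tot = sum((table[i][j] - grand) ** 2 for i in range(r) for j in range(c))
    ss_err = ss_tot - ss_rows - ss_cols
    df_rows, df_cols, df_err = r - 1, c - 1, (r - 1) * (c - 1)
    f_rows = (ss_rows / df_rows) / (ss_err / df_err)
    f_cols = (ss_cols / df_cols) / (ss_err / df_err)
    return f_rows, f_cols

# Hypothetical mean scores: 4 test conditions x 3 items (castanet, speech, music)
scores = [[80, 78, 82],
          [65, 60, 63],
          [50, 45, 48],
          [35, 30, 33]]
f_cond, f_item = two_way_anova_F(scores)
```

For this hypothetical table the condition factor dominates (F around 1139) while the item factor is much smaller (F around 18), mirroring the kind of pattern discussed above.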


Table 5.6: 2-way ANOVA analysis for azimuth error

The second p-value corresponds to the effect caused by the test items; it is 0.0006, also a very small value, and hence it suggests that the different test items do have an impact on the complete test scenario for azimuth error. The third value corresponds to the interaction between test items and test conditions; as we can observe, the p-value for this third category is quite high, so there is no significant interaction between them.
On performing a 2-way ANOVA on the elevation error we get the following values of statistical significance.
Table 5.7: 2-way ANOVA analysis for elevation error

The statistical significance values in the case of elevation suggest that only the test conditions have an influence on the perceptual scores and that the test items do not have any impact.
Finally, we plotted a combined confidence interval over all test items for both positioning errors, to investigate whether one of the two errors has a higher impact on the perceptual scores than the other.


Figure 5.8: Average positioning error in elevation and azimuth for all three test items (mean perceptual score over all three test items together)


5.9 Microphone noise


In the perceptual analysis of noise we first conducted tests for different levels of noise. The transform order was kept constant at L = 3, and different noise levels were tested. Figure 5.9 gives the mean and confidence interval plots for the different noise levels.
Figure 5.9: Noise levels (perceptual scores for castanet, speech and music over the test conditions)

It is observed from the figure that as the degree of noise is increased, the perceptual response goes down. For all the test items the response towards noise is rather similar. One case stands out and shows perceptual performance equivalent to the reference: at a noise level of -80 dB the test items are perceptually indistinguishable from the reference.
Table 5.8: 2-way ANOVA analysis for Noise levels


Table 5.8 gives the values of the 2-way ANOVA test. From the p-values it is evident that the test items had no hidden significant influence; as expected, only the noise levels cast a significant impact on the perceptual evaluation.
In figure 5.10 we compare a noise level of -80 dB against transform orders varying from 3 to 6.
Figure 5.10: Noise vs transform order (perceptual scores for castanet, speech and music over the test conditions)

The plot shows all the test items and their perceptual degradation when the transform order is increased. The noise gets heavily amplified even when the transform order is changed from 3 to 4. At a transform order of 3 the test items showed perceptual performance equivalent to the reference. A 2-way ANOVA test further substantiates the significant impact of the transform order on noise.
Table 5.9: 2-way ANOVA analysis for Noise levels vs transform order

The significance values in table 5.9 confirm the significant influence of the transform order on the perceptual evaluation of the different test items.


6 Conclusions

Spherical microphone arrays were studied and analysed. The purpose of this thesis was to auralize spherical microphone array data with the help of wave field synthesis and then examine the various errors and artifacts. Our aim was to simulate these errors in the simulation environment and then design a listening test for a perceptual evaluation, in order to establish whether a given parameter has any perceptual effect in practice, and to what extent.
Three major limitations which affect its performance are evaluated in this work:
1. spatial aliasing,
2. positioning error,
3. microphone noise.
A full spectrum wave impinging on a spherical microphone array is sampled and a plane wave decomposition for 12 directions is performed. We simulated many test cases and analysed them ourselves. After detailed analysis and auralization we designed the listening test.
For all auralization purposes the transform order L = 3 was selected as the base transform order. The WFS system has a spatial aliasing frequency of 1000 Hz. In order to have aliasing-free sampling, not only is a sufficient number of microphone positions required, but the product of the wavenumber and the radius of the sphere must satisfy kr ≤ L; this is one of the conditions which needs to be satisfied for spherical microphone arrays, while on the other side the spatial aliasing frequency of the WFS system should also not be exceeded. The modal filters have the shape of bandpass filters except for l = 0. Therefore, keeping L = 3 would restrict


the bandwidth on the spherical microphone array side as well, and that is why for L = 3 we do not see any significant corruption of the signal specifically by spatial aliasing artifacts.
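The condition kr ≤ L gives an upper bound on the aliasing-free frequency: with k = 2πf/c it becomes f ≤ Lc/(2πr). A small sketch (the 5 cm radius is an illustrative assumption, not the radius of the simulated array):

```python
from math import pi

def max_order_limited_frequency(L, r, c=343.0):
    """Highest frequency still satisfying kr <= L, i.e. f = L * c / (2 * pi * r)."""
    return L * c / (2.0 * pi * r)

# Transform order L = 3 on a hypothetical sphere of radius 5 cm
f_max = max_order_limited_frequency(3, 0.05)  # roughly 3.3 kHz
```

Raising L therefore widens the usable bandwidth, but, as the listening test shows, it also amplifies noise and positioning errors.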
Three different types of audio tracks were used: music, speech and castanet.
The following conclusions were drawn on the basis of the listening test.
1. For the perceptual analysis of each category of error we simulated the free field data with a fixed degree of errors and artifacts and presented it in the listening test experiment.
2. Spatial aliasing gets magnified as we increase the transform order in spherical microphone array processing.
3. The errors were evaluated for their perceptual identifiability as the transform order is increased. It is found from the test results that microphone noise gets amplified with the increase in transform order.
4. The simulated spherical microphone array was based on the Lebedev grid sampling scheme, and it is noticed that even with far more than the required number of sampling positions, aliasing effects were observed. It is concluded that the number of microphone positions as per the calculation of the Lebedev grid does not necessarily provide an aliasing-free impulse response measurement.
5. The degradation of perceptual quality with increasing transform order is very steep.
6. Positioning errors in azimuth and elevation also get amplified with increasing transform order.
7. The azimuth error is found to be influenced by the audio tracks as well (substantiated by the 2-way ANOVA test).
8. It was observed that speech is affected badly by all the error categories equally.
Further, as a next step, real room auralization could be implemented and these errors could then be checked again.


Bibliography

[1] D. Malham, Higher order ambisonic systems for the spatialisation of sound, in Proceedings, ICMC99, Beijing. Beijing, China: International Computer Music Association, 1999, pp. 484–487.
[2] M. Kolundzija, C. Faller, and M. Vetterli, Reproducing sound fields using MIMO acoustic channel inversion, J. Audio Eng. Soc, vol. 59, no. 10, pp. 721–734, 2011.
[3] V. Pulkki, Spatial sound generation and perception by amplitude panning techniques, Ph.D. dissertation, Helsinki University of Technology, Helsinki, Finland,
2001.
[4] F. M. Fazi, Sound field reproduction, February 2010.
[5] E. Hulsebos, Auralization using wave field synthesis, Ph.D. dissertation, Delft
University of Technology, 2004.
[6] J. Sonke, Variable acoustic by wave field synthesis, Ph.D. dissertation, Delft
University of Technology, 2000.
[7] D. de Vries and E. M. Hulsebos, Auralization of room acoustics by wave field synthesis based on array measurements of impulse responses, in 12th European Signal Processing Conference (EUSIPCO), 2004.
[8] F. Melchior, Investigations on spatial sound design based on measured room
impulse responses, Ph.D. dissertation, TU Ilmenau, 2011.
[9] E. G. Williams, Fourier Acoustics: Sound Radiation and Nearfield Acoustical
Holography. Academic Press, 1999.


[10] F. Zotter, Analysis and synthesis of sound- radiation with spherical arrays, Ph.D.
dissertation, University of Music and Performing Arts, 2009.
[11] O. Thiergart, Sound field analysis on the basis of a spherical microphone array
for auralization applications, 2007.
[12] P.-A. Gauthier, A. Berry, and W. Woszczyk, An introduction to the foundations, the technologies and the potential applications of the acoustic field synthesis for audio spatialization on loudspeaker arrays, in Proceedings of the Harvest Moon Symposium on Multichannel Sound, Montreal, Canada, 2004.
[13] E. Verheijen, Sound reproduction by wave field reproduction, Ph.D. dissertation,
Delft University of Technology, 1997.
[14] R. Boone, Design and development of a synthetic acoustic antenna for highly direction sound measurements, Ph.D. dissertation, Delft University of Technology,
1987.
[15] J. Meyer and T. Agnello, Spherical microphone array for spatial sound recording,
in Audio Engineering Society Convention 115, Oct 2003.
[16] J. Daniel, S. Moreau, and R. Nicol, Further investigations of high-order ambisonics and wavefield synthesis for holophonic sound imaging, in Audio Engineering
Society Convention 114, Mar 2003.
[17] B. Rafaely, Plane-wave decomposition of the sound field on a sphere by spherical convolution, Journal of the Acoustical Society of America, vol. 116, no. 4, pp. 2149–2157, 2004.
[18] M. A. Poletti, Three-dimensional surround sound systems based on spherical harmonics, J. Audio Eng. Soc, vol. 53, no. 11, pp. 1004–1025, 2005.
[19] K. Brandenburg, S. Brix, and T. Sporer, Wave field synthesis, in 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video, May 2009, pp. 1–4.
[20] D.T.Blackstock, Fundamentals of Physical Acoustics. John Wiley, 2000.
[21] S. Spors, Active listening room compensation for spatial sound reproduction
systems, Ph.D. dissertation, University of Erlangen-Nuremberg, 2006.


[22] A.D. Pierce, Acoustics-An Introduction to its physical principles and applications.
Acoustical Society of America, 1991.
[23] P. M. Morse and H. Feshbach, Methods of theoretical physics. New York: McGraw-Hill, 1953, vol. Part I.
[24] P. M. Morse and H. Feshbach, Methods of theoretical physics. New York: McGraw-Hill, 1953, vol. Part II.

[25] E. Skudrzyk, The foundations of acoustics: basic mathematics and basic acoustics.
Springer-Verlag, 1971.
[26] B. Girod, R. Rabenstein, and A. Stenger, Signals and systems. Wiley, 2001.
[27] J. Feldman, Solution of the wave equation by separation of variables, January
2007, lecture Notes.
[28] R. Collins, Mathematical Methods for Physicists and Engineers, ser. Dover books
on physics. Dover Publications, 1999.
[29] J. Driscoll and D. Healy, Computing Fourier transforms and convolutions on the 2-sphere, Advances in Applied Mathematics, vol. 15, no. 2, pp. 202–250, 1994.
[30] T. Abhayapala and D. B. Ward, Theory and design of high order sound field microphones using spherical microphone array, in Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on, vol. 2, May 2002, pp. II-1949–II-1952.
[31] B. Rafaely, Analysis and design of spherical microphone arrays, Speech and Audio Processing, IEEE Transactions on, vol. 13, no. 1, pp. 135–143, Jan 2005.
[32] J. Meyer and G. Elko, A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield, in Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on, vol. 2, May 2002, pp. II-1781–II-1784.
[33] G. B. Arfken, H.-J. Weber, and F. E. Harris, Mathematical Methods for Physicists.
Oxford: Academic, 2012.
[34] B. Rafaely, B. Weiss, and E. Bachmat, Spatial aliasing in spherical microphone arrays, Signal Processing, IEEE Transactions on, vol. 55, no. 3, pp. 1003–1010, March 2007.
[35] G. Galdo, Geometry-based channel modeling for multi-user MIMO systems and applications, Ph.D. dissertation, 2007.
[36] V. Lebedev, Quadratures on a sphere, USSR Computational Mathematics and Mathematical Physics, vol. 16, no. 2, pp. 10–24, 1976.
[37] V. Lebedev, A quadrature formula for the sphere of 59th algebraic order of accuracy, Russian Academy of Sciences, Doklady Mathematics, AMS Translation, vol. 50, no. 2, pp. 283–286, 1995.
[38] V. Lebedev and D. Laikov, A quadrature formula for the sphere of the 131st algebraic order of accuracy, vol. 59, no. 3, pp. 477–481, 1999.
[39] V. Lebedev, Fortran code for lebedev grids. Internet resource, 2009.
[40] Z. Li, R. Duraiswami, E. Grassi, and L. Davis, Flexible layout and optimal cancellation of the orthonormality error for spherical microphone arrays, in Acoustics, Speech, and Signal Processing, 2004. Proceedings (ICASSP '04), IEEE International Conference on, vol. 4, May 2004, pp. iv-41–iv-44.
[41] A. J. Berkhout, A holographic approach to acoustic control, J. Audio Eng. Soc, vol. 36, no. 12, pp. 977–995, 1988.
[42] F. Melchior and S. Spors, Spatial audio reproduction: from theory to production,
in tutorial, 129th Convention of the AES, 2010.
[43] E. W. Stuart, Application of curved arrays in wave field synthesis, in Audio
Engineering Society Convention 100, May 1996.
[44] N. Bleistein, Mathematical Methods for Wave Phenomena. Academic Press, July
1984.
[45] Wave field synthesis techniques for spatial sound reproduction, in Topics in Acoustic Echo and Noise Control, ser. Signals and Communication Technology, E. Hänsler and G. Schmidt, Eds. Springer Berlin Heidelberg, 2006, pp. 517–545.
[46] E. Corteel, U. Horbach, and R. Pellegrini, Multichannel inverse filtering of multi-exciter distributed mode loudspeakers for wave field synthesis, in Audio Engineering Society Convention 112, Apr 2002.


[47] U. Horbach, D. de Vries, and E. Corteel, Spatial audio reproduction using distributed mode loudspeaker arrays, in Audio Engineering Society Conference: 21st
International Conference: Architectural Acoustics and Sound Reinforcement, Jun
2002.
[48] A. J. Berkhout, D. de Vries, and P. Vogel, Acoustic control by wave field synthesis, The Journal of the Acoustical Society of America, vol. 93, no. 5, pp. 2764–2778, 1993.
[49] A. Avni, J. Ahrens, M. Geier, S. Spors, H. Wierstorf, and B. Rafaely, Spatial
perception of sound fields recorded by spherical microphone arrays with varying
spatial resolution, The Journal of the Acoustical Society of America, vol. 133, no. 5, pp. 2711–2721, 2013.
[50] D.-M. et. al., A comparative study of spherical microphone arrays based on subjective assessment of recordings reproduced over different audio systems, in Proceedings of Forum Acusticum 2011, Aalborg, Denmark, Jun 2011, pp. 2227–2230.
[51] ITU-R, Recommendation BS.1534-1: Method for the subjective assessment of intermediate quality level of coding systems, Tech. Rep., 2003.
[52] G. Enderlein, Brownlee, K. A.: Statistical theory and methodology in science and engineering, Biometrische Zeitschrift, vol. 3, no. 3, p. 221, 1961.
[53] R. A. Fisher, Statistical methods for research workers. Edinburgh: Oliver and Boyd, 1938.
[54] R. V. Hogg and J. Ledolter., Engineering Statistics. New York: MacMillan, 1987.
[55] MATLAB, 2-way-anova, Internet, May 2014.


List of Figures

1.1 Soap bubble model of acoustic radiation [10] . . . 4
1.2 Impulse response based auralization [5] . . . 6
1.3 Natural recording based auralization [5] . . . 7
2.1 Infinitesimal volume element used for the derivation of Euler's equation . . . 12
2.2 Spherical coordinate system and its relation to Cartesian coordinate system . . . 16
2.3 Spherical Bessel function of the first kind jl(x) (left) and the second kind yl(x) (right) for order l ∈ {0, 3, 6} [11] . . . 19
2.4 Spherical harmonics Ylm(Ω) for order l ∈ {0, 1, 2, 3} [11] . . . 22
2.5 Exterior problem [11] . . . 24
2.6 Interior problem [11] . . . 25
2.7 Geometrical description for the calculation of pressure p(r, θ, φ, k) at point P for source at Q . . . 26
2.8 Mode strength for rigid sphere array and open sphere array [31] . . . 31
2.9 Different quadrature schemes [8] . . . 34
2.10 Directivity weights for PWD versus l [17] . . . 36
2.11 Half resolution of the PWD [17] . . . 37
3.1 Errors in Spherical Microphone array measurement . . . 39
3.2 Errors in Spherical Microphone array measurement [31] . . . 43
3.3 Spherical Bessel function of the first kind jl(x) (left) and the second kind yl(x) (right) for order l ∈ {0, 3, 6} (x is the argument in the plots, x = kr) [11] . . . 45
4.1 Illustration of Huygens' proposition [12] . . . 49
4.2 Placement of secondary sources in Huygens' principle and for sound field reproduction [12] . . . 50
4.3 Placement of secondary sources in Huygens' principle and for sound field reproduction [12] . . . 50
4.4 Overview of WFS [42] . . . 51
4.5 Derivation of sound pressure using Green's theorem and wave equation [13] . . . 52
4.6 Monopole and dipoles on closed surface S [13] . . . 53
4.7 Assumed surface geometry [13] . . . 54
4.8 Rayleigh I (only monopoles) [13] . . . 56
4.9 Rayleigh II (only dipoles) [13] . . . 56
4.10 Coordinate system and approximation (4) . . . 57
4.11 Approximation to 2D linear form [13] . . . 58
4.12 Approximation to a line integral for a non uniform line geometry [13] . . . 60
5.1 Wave Studio (Fraunhofer IDMT) . . . 63
5.2 Arrangement of listening position . . . 64
5.3 Depiction of PWD for 12 directions . . . 65
5.4 Processing chain for auralization of spherical microphone array . . . 68
5.5 Aliasing vs transform order . . . 77
5.6 Analysis of positioning error in azimuth for all three test items . . . 78
5.7 Analysis of positioning error in elevation for all three test items . . . 79
5.8 Average positioning error in elevation and azimuth for all three test items . . . 81
5.9 Noise level . . . 82
5.10 Noise vs transform order . . . 83
A.1 Spherical coordinate system and its relation to Cartesian coordinate system . . . 97
A.2 Pressure field of a 1 kHz plane wave for different levels l using equation 2.46. The pressure field is shown for the (x, z) plane with y = 0. [11] . . . 99
A.3 Pressure field of a 1 kHz plane wave for different levels l using equation 2.46. The pressure field is shown for the (x, y) plane with z = 0. [11] . . . 99
A.4 A general pictorial view of rigid sphere configuration . . . 100
A.5 Open sphere configuration [8] . . . 100


List of Tables

5.1 MUSHRA scale . . . 69
5.2 Test conditions for perceptual evaluation of spatial aliasing . . . 71
5.3 Test conditions for perceptual evaluation of microphone noise . . . 72
5.4 Test conditions for perceptual evaluation of positioning error . . . 73
5.5 2-way ANOVA for Spatial Aliasing . . . 77
5.6 2-way ANOVA analysis for azimuth error . . . 80
5.7 2-way ANOVA analysis for elevation error . . . 80
5.8 2-way ANOVA analysis for Noise levels . . . 82
5.9 2-way ANOVA analysis for Noise levels vs transform order . . . 83


APPENDIX


A Derivations

A.1 Orthonormality of Spherical harmonics and Spherical Fourier transform

Spherical harmonic functions $Y_l^m(\Omega)$ are orthonormal [9, page 191]; therefore we have

$$\int_{S^2} Y_l^m(\Omega)\, Y_{l'}^{m'*}(\Omega)\, d\Omega = \delta_{l'l}\,\delta_{m'm} \tag{A.1}$$

where $Y_{l'}^{m'*}(\Omega)$ is the complex conjugate of $Y_{l'}^{m'}(\Omega)$ and $\delta_{l'l}$ is the Kronecker delta, which is defined as

$$\delta_{l'l} = \begin{cases} 1, & \text{if } l' = l \\ 0, & \text{otherwise} \end{cases} \tag{A.2}$$

Any arbitrary function on a sphere $f(\Omega)$ can be expanded in terms of spherical harmonics [29, page 202]:

$$f(\Omega) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} f_{lm}\, Y_l^m(\Omega) \tag{A.3}$$

Here the $f_{lm}$ are complex coefficients. Equation A.3 is also called the inverse spherical Fourier transform. Further exploiting this mathematical expression, multiplying equation A.3 with $Y_{l'}^{m'*}(\Omega)$ and integrating over the unit sphere, we get

$$\int_{S^2} f(\Omega)\, Y_{l'}^{m'*}(\Omega)\, d\Omega = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} f_{lm} \int_{S^2} Y_l^m(\Omega)\, Y_{l'}^{m'*}(\Omega)\, d\Omega \tag{A.4}$$

Considering equations A.1 and A.4 we have

$$\int_{S^2} f(\Omega)\, Y_{l'}^{m'*}(\Omega)\, d\Omega = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} f_{lm}\, \delta_{l'l}\,\delta_{m'm} \tag{A.5}$$

With the help of equation A.2, we finally obtain the spherical Fourier coefficient $f_{lm}$ as

$$f_{lm} = \int_{S^2} f(\Omega)\, Y_l^{m*}(\Omega)\, d\Omega \tag{A.6}$$

Also from [29], the spherical Fourier transform pair is given as

$$f_{lm} = \int_{S^2} f(\theta, \phi)\, Y_l^{m*}(\Omega)\, d\Omega = FT\{f(\theta, \phi)\}$$
$$f(\theta, \phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} f_{lm}\, Y_l^m(\Omega) = FT^{-1}\{f_{lm}\} \tag{A.7}$$

Also, the surface integral over the surface of the sphere is described as

$$\int_{S^2} d\Omega = \int_0^{2\pi} \int_0^{\pi} \sin\theta\, d\theta\, d\phi \tag{A.8}$$
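The orthonormality relation A.1 can be checked numerically for the lowest orders. The sketch below uses the standard closed-form expressions for $Y_0^0$ and $Y_1^0$ (textbook formulas, not derived in this thesis) together with the quadrature of equation A.8:

```python
from math import sin, cos, pi, sqrt

# Closed-form low-order spherical harmonics (real-valued for m = 0)
def Y00(theta, phi):
    return 1.0 / sqrt(4.0 * pi)

def Y10(theta, phi):
    return sqrt(3.0 / (4.0 * pi)) * cos(theta)

def sphere_integral(f, n=400):
    """Integrate f(theta, phi) * sin(theta) over the unit sphere
    (midpoint rule in both angles, cf. equation A.8)."""
    total = 0.0
    dt, dp = pi / n, 2.0 * pi / n
    for i in range(n):
        theta = (i + 0.5) * dt
        for j in range(n):
            phi = (j + 0.5) * dp
            total += f(theta, phi) * sin(theta) * dt * dp
    return total

# Orthonormality (equation A.1): <Y00, Y00> = 1 and <Y00, Y10> = 0
norm = sphere_integral(lambda t, p: Y00(t, p) * Y00(t, p))
cross = sphere_integral(lambda t, p: Y00(t, p) * Y10(t, p))
```

The same check extends to higher orders once the associated Legendre functions are available.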


Figure A.1: Spherical coordinate system and its relation to Cartesian coordinate system

A.2 Position vector and Wave vector

In figure A.1 the spherical coordinate system is given. $\Omega$ is defined as the pair $(\theta, \phi)$, where $\theta$ is the elevation and $\phi$ is the azimuth. $r$ is a scalar value which gives the radial distance of the point described as $(r, \theta, \phi)$.


The relation between spherical coordinates and Cartesian coordinates is

$$x = r \sin\theta \cos\phi, \quad y = r \sin\theta \sin\phi, \quad z = r \cos\theta$$
$$\phi = \arctan\frac{y}{x}, \quad \theta = \arccos\frac{z}{\sqrt{x^2 + y^2 + z^2}}, \quad r = \sqrt{x^2 + y^2 + z^2} \tag{A.9}$$

where $0 \le \phi \le 2\pi$, $0 \le \theta \le \pi$ and $0 \le r < \infty$.
$\mathbf{k}$ is the wave vector, defined by

$$k^2 = k_x^2 + k_y^2 + k_z^2$$
$$k_x = k \sin\theta \cos\phi, \quad k_y = k \sin\theta \sin\phi, \quad k_z = k \cos\theta \tag{A.10}$$

$k$ is the wave number; the wave vector points in the direction of propagation of the wave front of the sound field.
In some of the literature, pressure and the other quantities of a sound wave are represented in terms of the angular frequency $\omega$, in others in terms of the wave number $k$; both are equivalent, as $k = \omega/c$.
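The conversions in equations A.9 translate directly into code; note that atan2 is used in place of the plain arctan(y/x) so that the azimuth lands in the correct quadrant (an implementation choice, not stated in the text):

```python
from math import sin, cos, acos, atan2, sqrt, pi

def sph_to_cart(r, theta, phi):
    """(r, theta, phi) -> (x, y, z); theta is measured from the +z axis (eq. A.9)."""
    return (r * sin(theta) * cos(phi),
            r * sin(theta) * sin(phi),
            r * cos(theta))

def cart_to_sph(x, y, z):
    r = sqrt(x * x + y * y + z * z)
    theta = acos(z / r)            # 0 <= theta <= pi
    phi = atan2(y, x) % (2 * pi)   # 0 <= phi < 2*pi, robust for x = 0
    return r, theta, phi

# Round trip: spherical -> Cartesian -> spherical
x, y, z = sph_to_cart(2.0, pi / 3, pi / 4)
r, theta, phi = cart_to_sph(x, y, z)
```

The round trip recovers the original (r, theta, phi) up to floating-point precision.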


A.3 Plane wave pressure field for different levels


Figures A.2 and A.3 show the pressure field of a 1 kHz plane wave travelling along the negative z axis, i.e. $\Omega_0 = (0, 0)$, for different levels $l = L$, where $L = 5, 25, 50$.

Figure A.2: Pressure field of a 1 kHz plane wave for different levels l using equation
2.46. The pressure field is shown for (x, z) plane with y = 0. [11]

Figure A.3: Pressure field of a 1 kHz plane wave for different levels l using equation
2.46. The pressure field is shown for (x, y) plane with z = 0. [11]


A.4 Rigid sphere and open sphere configuration


The following figures show a general view of the rigid sphere and open sphere configurations.

Figure A.4: A general pictorial view of the rigid sphere configuration

Figure A.5: Open sphere configuration. [8]


Declaration
I hereby certify that this thesis was created autonomously, without using references other than those stated. All parts which are cited directly or indirectly are marked as such. This thesis has not been used in the same or a similar form, in part or in total, in other examinations.

Ilmenau, 14. 09. 2013

Gyan Vardhan Singh

Signature


Theses

1. A complete understanding of spherical microphone array processing was developed in this thesis.
2. In this thesis we simulated and auralized the free field spherical microphone array impulse response.
3. The auralization part was done with wave field synthesis.
4. Various errors which plague the performance of spherical microphone arrays are analysed.
5. We designed a listening test for the perceptual evaluation of the various errors and artifacts.
6. The effect of different transform orders on the errors was investigated, looking for patterns.
