Introduction To Ambisonics - Rev. 2015

Introduction to Ambisonics Francesca Ortolani Rev.
2015a
Introduction to Ambisonics
A tutorial for beginners in 3D audio
Francesca Ortolani
iron@ironbridge-elt.com
Ironbridge Electronics
(Excerpt from the technical papers written during the development of Ambisonic Auralizer)
1.1 Introduction to Surround and 3D audio techniques

During the development of techniques and technologies in the audio world, engineers have tried
since the early years of the twentieth century to reproduce recorded sources or live takes in a realistic
manner, with the aim of giving spaciousness to sound creations.
Research is divided into several branches. Even today most of the sound reproduction systems sold,
either consumer or professional, are based on two audio channels. The main reason for that is the high
cost of amplifiers, signal processors and speakers, which has often limited the number of channels to
two for musical productions, television, radio, etc.
In cinemas and theaters the widespread use of multi-channel systems started earlier than at home,
since it is easier to sell sound systems of medium-to-low quality with relatively low prices for home-
theatre applications.
However, only a few commercial post-production studios are suitable for multi-channel mixing.
The vast majority of post-production control rooms are equipped with 2 (at most 3) speakers for stereo
playback and a subwoofer.
Multi-channel audio has spread especially in film, theater and video games/virtual reality, whereas
98% of music production is still stereophonic.
This is not only due to a problem of costs, but also to the diffusion of the audio formats on which the
CD (2-ch, 44100 Hz, 16-bit), and earlier tapes and vinyl, prevailed.
Engineers also tried, with very little success, to carry the 4-channel information on stereo devices.
Among these attempts a remarkable solution was the use of subcarriers on vinyl records. Despite the
introduction of new formats such as DVD-Audio and Super Audio CD (SACD) or multichannel wave
files (Wave Ex), music is still stereo almost in its entirety.
Another issue that discouraged sound engineers from working on multichannel mixes has been the
need to keep compatibility of the finalized audio when the number of the channels is scaled down. For
example, it is a good practice to check how a stereo mix sounds when its channels are summed to
mono. The compatibility with monaural listening is a crucial problem and it should not be
underestimated. Because of the sum of the left and right channels, you may have phase cancellations
that affect countless hours of fine-tuning during the mixing.
Initially, for example, sound engineers were asked to preserve the sound quality down-mixing from
CD (stereo) to TV or radio (in the past these devices were only mono). However, the problem still
exists today in the case of live music in which, because of the size of venues and stadiums, most
listeners do not benefit from stereophonic listening.
Obviously, passing from surround to stereophonic or monophonic systems is even more critical.
Figure 1.1 In order to have a correct perception of the stereophonic sound, the speaker pair and the
listener must be located at the vertices of an equilateral triangle.
Over the past few decades, since the second half of the twentieth century, sound artists have tried to
give more and more spatial dimension to sound and their own artistic creations. This has led to the
development of techniques aimed at rendering 3D sound, as alternatives to classic
surround - according to its several standards imposed upon the market by Dolby - and stereo
techniques.
These include binaural audio, Wave Field Synthesis, OPSI and Ambisonics.
In particular, Ambisonics can coexist with stereo and surround sound systems such as 5.1, 7.1, etc.
This 3D audio technique was introduced in the 70s by the team led by Gerzon, Fellgett and Barton,
supported by the National Research Development Corporation and the British Technology Group. It is
compatible with a wide variety of speaker array configurations (either regular-symmetrical or
irregular-asymmetrical with various shapes). An in-depth explanation of the physical principles on which
Ambisonics is founded is given later on; for the moment it is useful to know that this technique is in
part an extension of the basic principles of the Mid-Side miking technique [1] patented by Blumlein. This
technique uses sum and difference signals between a microphone (from the family of cardioids) with
its axis pointing at 0° (MID) and a figure-of-8 microphone (SIDE) with its axis rotated by 90° with
respect to the MID. Figure 1.2 shows an example of Mid-Side configuration.
Figure 1.2 Example of Mid-Side configuration - polar diagram (MID: cardioid microphone, SIDE:
figure-of-8 microphone). LEFT = (MID+SIDE)/2, RIGHT = (MID-SIDE)/2 [14]
Figure 1.3 Quadraphonic system
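The sum and difference relations quoted in the Mid-Side caption can be sketched in a few lines. This is only an illustrative sketch: the function names `ms_to_lr` and `lr_to_ms` are hypothetical, and signals are assumed to be plain lists of samples.

```python
def ms_to_lr(mid, side):
    """Decode Mid-Side samples: LEFT = (MID + SIDE) / 2, RIGHT = (MID - SIDE) / 2."""
    left = [(m + s) / 2 for m, s in zip(mid, side)]
    right = [(m - s) / 2 for m, s in zip(mid, side)]
    return left, right


def lr_to_ms(left, right):
    """Inverse of ms_to_lr: MID = LEFT + RIGHT, SIDE = LEFT - RIGHT."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


# A source picked up only by the MID microphone ends up equally in both channels.
left, right = ms_to_lr([1.0, 0.5], [0.0, 0.0])
```

With a zero SIDE signal the two channels are identical, which is the centred image one would expect from a source on the MID axis.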
Ambisonics should not be confused with "traditional" surround. First of all, Ambisonics allows
including information relative to the height (classic surround techniques are 2D instead). The
principles of acoustics on which this technique is based will be explained in detail in the next section.
Furthermore, for example, considering a classic quadraphonic system, while the phase difference
between the signals received at the front speakers is processed quite effectively by the auditory system
(at least for low frequencies), this is not the case for the rear speaker pair, so classic surround systems,
quadraphonic or larger systems, do not allow a good source localization. This is due to the fact that
sources in classic surround are recorded on "discrete" channels, that is, channels independent of each
other, and the differences in level between channel pairs are used [2], [3].
Hence the layout of the loudspeakers relative to the listener becomes crucial: you can experience it
even in a simple stereo system where, if the listener and the speakers are not perfectly placed at the
vertices of an equilateral triangle, the exact source localization is lost. Tests have shown that the
quality of the ghost images¹ between speaker pairs is poor if these are spaced by an angle greater than
60 degrees (i.e. the equilateral triangle mentioned above).
In quadraphonic sound, for example, speakers are spaced by 90 degrees, causing a "hole in the
middle" effect. A homogeneous sound reproduction system is defined as a system in which no direction
is treated with any particular preference. Typical cinema surround systems are not homogeneous: in
fact, the sound coming from the front stage (screen) is usually controlled more accurately than the rear
channels, since a solid match between sound and image is sought so as not to distract
the audience. We can say, however, that surround systems are coherent, within certain limits, in the
sense that the sound image remains stable, that is, not subject to significant discontinuities, if the
listener changes position [4]. The consistency of the front image is guaranteed by the presence of
sounds uncorrelated from the rest of the system. This can be achieved, for example, by delaying and
spreading the signals sent to the surround system. What we ideally want is that the reproduced sound
field (recorded or synthesized) is homogeneous and consistent (coherent) at the same time.
¹ A ghost image is a sound source apparently coming from the middle of the stereo soundscape between 2
speakers.
In Ambisonics, on the other hand, the signals sent to the speakers contain information from each
microphone capsule used in the recording with different relations resulting from a decoding matrix.
The effect of spatialization here is much more robust than in traditional surround techniques, in
that the sweet spot, i.e. the optimum listening position, is wider. Ambisonics is not limited to a
precise number of speakers: the higher the number the better the directional resolution you can get.
The reason for that will be explained next, by introducing the concept of order in Ambisonics.
1.2 The Physics in Ambisonics: A comparison with other techniques

In sound field description, source characterization is one of the most important jobs of
Auralization. Auralization involves creating audio files from simulated, measured or synthesized
numerical data [5].
For example, it is possible to represent multipole or extended sources by summing a certain number
of monopoles², i.e. point sources whose dimensions are much smaller than the wavelength of the
incident sound wave, or by integrating over a distribution of monopoles or infinitesimally small surface
elements. Each source contributes in terms of sound pressure to the acoustic field.
According to source distribution, a specific spatial radiation pattern is created depending on the
position and the distance of the sources. In other words, this is expressed by Huygens' principle, which
states that a wavefront can be considered as a distribution of secondary sources. For example, the 3D
audio technique Wave Field Synthesis is based on this and works as the acoustic equivalent of holography.
In practice, the sound field can be considered emitted by the original source or by a secondary
source belonging to the wavefront. In mathematical terms, this is equivalent to saying that we can
obtain the sound pressure on the area A, knowing the sound pressure $p_0$ and its gradient
$\vec{\nabla} p_0$ on the boundary $\partial A$ of A, by calculating the Kirchhoff-Helmholtz integral:

$$p(\vec{r}) = \oint_{\partial A} \left[ \vec{\nabla} p_0 \cdot \vec{n} - p_0\, \vec{n} \cdot \frac{\vec{R}}{R}\, \frac{1 + jkR}{R} \right] \frac{e^{-jkR}}{4\pi R}\, dS_0, \qquad \vec{r} \in A \tag{1.1}$$

where k is the wave-number, $\vec{n}$ is the unit vector normal to the boundary and $\vec{R}$ is the vector
connecting the source with the listening point [6].
A detailed analysis of integral (1.1) shows how each secondary source is composed of a monopole
(relative to the pressure gradient signal) and a dipole (relative to the pressure signal). However, there
are slight conceptual differences in the formulations by Kirchhoff-Helmholtz and Huygens. The
former is more general. The shape of the boundary does not depend on the wavefront, in addition the
Kirchhoff-Helmholtz integral itself carries information relating to both amplitude and phase of the
acoustic signal, whereas in Huygens principle it is assumed that the secondary sources are located on
equiphase surfaces. In practice, we can conclude that the Kirchhoff-Helmholtz integral generalizes
Huygens principle.
² Monopole: point source that can be represented as a pulsating sphere with infinitesimal radius. For such
sources, emitted acoustic waves are a function of the radial distance r from the source only.
Dipole: sound source composed of 2 equal monopoles having opposite faces (rotated by 180°). The sound
field produced by a dipole has directional characteristics.
In practice, Kirchhoff-Helmholtz integral is used as represented by Figure 1.4 (Wave Field Synthesis):
Figure 1.4 Application of the Kirchhoff-Helmholtz integral in holophony/WFS
The listening area is surrounded by pairs of transducers composed of a pressure microphone and a
velocity/pressure gradient microphone. In section 1.8 some basics on these types of microphones are
given. So, the recorded field is due to the sources external to the microphone array.
Then, the Kirchhoff-Helmholtz integral can be interpreted considering that each secondary source can
be split into two elemental sources:
DIPOLE SOURCE: fed by a pressure signal $p_0$
MONOPOLE SOURCE: fed by a pressure gradient signal $\vec{\nabla} p_0$
During playback the mirror operation is performed: speakers with the physical characteristics
shown in Figure 1.4 are arranged in place of the microphones, that is, the pressure mics are replaced
with acoustic dipole speakers (these speakers radiate both forwards and backwards) and the pressure
gradient mics with monopole speakers (closed speakers radiating only forwards thus having a
directional characteristic). The geometrical layout of the microphone array and the speaker array has
to be the same. Each speaker is fed with the signal picked up by the respective microphone.
Similarly we can surround the source instead of the listener with the microphone array [6].
Such a system guarantees (ideally) the exact reproduction of the field within the listening area and,
if the array of transducers is continuous (not possible in reality), there is no need for processing to
reconstruct the field, that is, it is sufficient to feed each speaker with the respective microphone signal.
What happens in reality, where a continuous array (of good quality, if possible) covering the whole
surface is not available, is that the acoustic signal incident on the array may have too short a wavelength
compared with the distance between two transducers, so that it is not feasible to sample it correctly.
Possibly, in that case, we encounter spatial aliasing. As in the time domain, spatial aliasing occurs
when the signal is sampled in space taking an insufficient amount of points. Aliasing is revealed by the
appearance of fictitious sources.
The maximum frequency above which spatial aliasing occurs is calculated as (Nyquist Theorem):

$$f_{max} = \frac{c}{2\, d_{trans}} \tag{1.2}$$

where $d_{trans}$ is the distance between two transducers and c is the speed of sound.
A signal of frequency above $f_{max}$ can produce a time difference of arrival at two adjacent transducers
greater than half the signal period, so that the phase difference becomes ambiguous, while for a signal
of frequency below $f_{max}$ the time difference stays within half the signal period and the phase
difference at the transducers allows an unambiguous time difference evaluation.
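As a quick numerical illustration of Equation (1.2), the sketch below computes the spatial aliasing limit for a given transducer spacing. The function name and the default speed of sound (343 m/s, roughly air at room temperature) are assumptions made for the example, not values taken from the text.

```python
def spatial_aliasing_frequency(d_trans, c=343.0):
    """Return f_max = c / (2 * d_trans) in Hz.

    d_trans: distance between two adjacent transducers, in metres.
    c: speed of sound in m/s (assumed default: 343 m/s).
    """
    if d_trans <= 0:
        raise ValueError("transducer spacing must be positive")
    return c / (2.0 * d_trans)


# Transducers spaced 10 cm apart: aliasing starts at about 1.7 kHz.
f_max = spatial_aliasing_frequency(0.10)
```

Halving the spacing doubles the limit, which is why dense arrays are needed to cover the upper part of the audio band.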
Basically, some simplifications of the Kirchhoff-Helmholtz integral and its use are made. We try to
minimize the number of transducers to represent the most important secondary sources and we try not
to use both the monopole and dipole transducers: what we normally do is to send signals recorded with
cardioid or figure-of-8 microphones to monopole speakers. Note that the superposition of a monopole
and a dipole gives as a result a cardioid polar characteristic. Finally, with the aim of limiting the
number of spaced microphones used, it is preferred to build "virtual microphones" through the
processing of the recorded signals by weighting the amplitudes and delays in the time of arrival
appropriately, in order to improve the resolution of the system.
A constraint to the techniques based on the theory of the Kirchhoff-Helmholtz integral (with any
simplifying changes) is the fact that within the listening area, surrounded by the speaker array, primary
sources should be absent, i.e. the array is able to reproduce only the external sources. This is a false
problem as it is possible, however, to reverse the phase of the signals feeding the array relative to the
secondary sources and reproduce, in this way, the internal sources, too. Therefore, we can create a
concave wave front instead of a convex one [7].
Another way to describe a sound field, especially in the case of sources with spherical symmetry, is
based on the decomposition of the sound field into spherical harmonics. Ambisonics is founded on
this second descriptive approach. Spherical harmonics are also used in issues concerning quantum
mechanics, gravitational fields and can be found in 3D graphics applications and lighting engineering.
The starting point is to express the acoustic wave equation in spherical coordinates $(r, \theta, \varphi)$,
where $r$ is the radius, $\theta$ is the azimuth and $\varphi$ is the elevation.
The acoustic wave equation in the time domain is:

$$\nabla^2 p(r, \theta, \varphi, t) - \frac{1}{c^2} \frac{\partial^2 p(r, \theta, \varphi, t)}{\partial t^2} = 0 \tag{1.3}$$

where c is the speed of sound.
The acoustic pressure field, due to external sources, can be developed into a Fourier-Bessel series,
whose terms are weighted products of the directional functions $Y_{mn}^{\sigma}(\theta, \varphi)$ (spherical
harmonics) with the radial functions $j_m(kr)$ (spherical Bessel functions of the first kind):

$$p(\vec{r}) = \sum_{m=0}^{\infty} (2m+1)\, j^m j_m(kr) \sum_{0 \le n \le m,\ \sigma = \pm 1} B_{mn}^{\sigma}\, Y_{mn}^{\sigma}(\theta, \varphi) \tag{1.4}$$

with m = degree and n = order (the index $\sigma$ is the spin, whose meaning will be obvious looking at the
pictures further on); k is the wavenumber, $k = 2\pi f / c$. Equation (1.4) represents the solution of the
Wave Equation in the special case of a plane wave.
As shown later, the ambisonic signals in the transform domain [7] are represented by the coefficients
$B_{mn}^{\sigma}$ and behave like Fourier coefficients in a Fourier series.
Figure 1.5 Spherical Bessel functions of the first kind.
Note that, unlike WFS or Holophony, the sampling and the reconstruction of the sound field in
Ambisonics are executed pointwise, rather than on an area. It follows that the number of channels
needed to reconstruct the field is much smaller than in the other techniques mentioned
above. The information relative to sound direction is coded precisely into the coefficients
$B_{mn}^{\sigma}$ just introduced. Ambisonics produces, in theory, a coherent and homogeneous reconstruction
of the field for all frequencies and directions in the sweet spot, the optimal listening point. We will see
that the area affected by incoherence gets smaller as the order of Ambisonics increases.
Similarly, there is a frequency limit, which grows with the order, beyond which the reconstruction error
exceeds a given level. In other words, Ambisonics performs well in terms of coherence and homogeneity only
in the sweet spot and only for low frequencies. This problem can be countered by
using different types of decoding strategies according to the listener's position within the
speaker array (see Chapter 3 in Ambisonic Auralizer technical papers).
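Let's see the spherical harmonic functions in detail, analysing how ambisonic signals are obtained
from these functions. Spherical harmonics are defined as:

$$Y_{mn}^{\sigma}(\theta, \varphi) = \sqrt{2m+1}\, \sqrt{(2 - \delta_{0,n}) \frac{(m-n)!}{(m+n)!}}\; P_{mn}(\sin\varphi) \times \begin{cases} \cos n\theta & \text{if } \sigma = +1 \\ \sin n\theta & \text{if } \sigma = -1 \text{ (ignore if } n = 0) \end{cases} \tag{1.5}$$

where $P_{mn}(\eta)$ are associated Legendre functions of degree m and order n, and $\delta_{pq}$ represents the
Kronecker delta, equal to 1 if p = q and to 0 otherwise. The associated Legendre function is defined as:

$$P_{mn}(\eta) = (1 - \eta^2)^{n/2}\, \frac{d^n}{d\eta^n} P_m(\eta) = \frac{(-1)^m}{2^m\, m!}\, (1 - \eta^2)^{n/2}\, \frac{d^{m+n}}{d\eta^{m+n}} (1 - \eta^2)^m \tag{1.6}$$

where $\eta = \cos\vartheta$ and $\vartheta = \pi/2 - \varphi$ is the polar angle, so that $\eta = \sin\varphi$ in (1.5).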
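In Ambisonics some kind of normalization of Legendre functions often takes place [8].
Schmidt Semi-Normalization is defined by:

$$N_{mn} = \sqrt{(2 - \delta_{0,n}) \frac{(m-n)!}{(m+n)!}} = \sqrt{e_n \frac{(m-n)!}{(m+n)!}}, \qquad e_0 = 1 \text{ if } n = 0, \quad e_n = 2 \text{ if } n \ge 1 \tag{1.7}$$

The harmonic functions can be rewritten in Schmidt semi-normalized form (SN3D) by using (1.7) as the
normalization factor in (1.5), that is, by dropping the factor $\sqrt{2m+1}$:

$$Y_{mn}^{\sigma}(\theta, \varphi) = \tilde{P}_{mn}(\sin\varphi) \times \begin{cases} \cos n\theta & \text{if } \sigma = +1 \\ \sin n\theta & \text{if } \sigma = -1 \text{ (ignore if } n = 0) \end{cases} \tag{1.8}$$

where $\tilde{P}_{mn} = N_{mn} P_{mn}$ is the semi-normalized Legendre function.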
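To make the Schmidt semi-normalization concrete, the following sketch evaluates the factor sqrt(e_n (m-n)! / (m+n)!), which is the usual definition of that normalization. The function name `schmidt_factor` is hypothetical and only the standard library is used.

```python
import math


def schmidt_factor(m, n):
    """Schmidt semi-normalization factor for degree m and order n (0 <= n <= m).

    Computes sqrt(e_n * (m - n)! / (m + n)!), with e_0 = 1 and e_n = 2 for n >= 1.
    """
    if not 0 <= n <= m:
        raise ValueError("order n must satisfy 0 <= n <= m")
    e_n = 1 if n == 0 else 2
    return math.sqrt(e_n * math.factorial(m - n) / math.factorial(m + n))


# All first-order factors are 1, so the first-order SN3D harmonics are plain
# products of sines and cosines of the azimuth and elevation.
first_order = [schmidt_factor(1, 1), schmidt_factor(1, 0)]
```

From the second degree on the factors differ from 1, for example sqrt(1/12) for m = n = 2.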
The set of spherical harmonics forms an orthonormal basis in the sense of the spherical scalar product,
that is:

$$\langle f | g \rangle_{4\pi} = \frac{1}{4\pi} \iint f(\theta, \varphi)\, g(\theta, \varphi)\, d\Omega \tag{1.9}$$
So, they can be linearly combined in order to define functions on the surface of a sphere.
Figure 1.6 High Order Ambisonics (up to 3rd order) - 3D view [ D. Courville]
In practice, Equation (1.4) has to be truncated at a certain order M (for manageability),
also known as the order of Ambisonics. Writing it again for convenience:

$$p(\vec{r}) = \sum_{m=0}^{M} (2m+1)\, j^m j_m(kr) \sum_{0 \le n \le m,\ \sigma = \pm 1} B_{mn}^{\sigma}\, Y_{mn}^{\sigma}(\theta, \varphi) \tag{1.10}$$

We have seen that components $B_{mn}^{\sigma}$ are tied tightly to the acoustic pressure field and its
higher-order derivatives about the origin.
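In a vector form we have:

$$\begin{aligned}
\mathbf{B}_{M(3D)} &= \left[ B_{00}^{+1}\ B_{11}^{+1}\ B_{11}^{-1}\ B_{10}^{+1}\ \cdots\ B_{mm}^{+1}\ B_{mm}^{-1}\ \cdots\ B_{mn}^{+1}\ B_{mn}^{-1}\ \cdots\ B_{m0}^{+1}\ \cdots\ B_{MM}^{+1}\ B_{MM}^{-1}\ \cdots\ B_{M0}^{+1} \right]^T \\
\mathbf{Y}_{M(3D)} &= \left[ Y_{00}^{+1}\ Y_{11}^{+1}\ Y_{11}^{-1}\ Y_{10}^{+1}\ \cdots\ Y_{mm}^{+1}\ Y_{mm}^{-1}\ \cdots\ Y_{mn}^{+1}\ Y_{mn}^{-1}\ \cdots\ Y_{m0}^{+1}\ \cdots\ Y_{MM}^{+1}\ Y_{MM}^{-1}\ \cdots\ Y_{M0}^{+1} \right]^T
\end{aligned} \tag{1.11}$$

(Note: m is increasing from 0 to M, n is decreasing).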

So, components $B_{mn}^{\sigma}$ are bound to the vectors of order m (overall: 2m + 1 components for order m),
returning a total amount of $K = (M + 1)^2$ ambisonic CHANNELS.
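The channel count can be checked with a short sketch: summing the 2m + 1 components of each degree from 0 to M gives (M + 1)^2. The function names below are hypothetical and chosen only for illustration.

```python
def components_per_degree(m):
    """Number of ambisonic components contributed by degree m (2m + 1)."""
    return 2 * m + 1


def ambisonic_channels(order):
    """Total number of 3D ambisonic channels K = (M + 1)^2 for order M."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


# First order: W, X, Y, Z (4 channels); third order: 16 channels.
counts = [ambisonic_channels(M) for M in range(4)]  # [1, 4, 9, 16]
```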

The following example reveals how ambisonic signals can be achieved from coefficients $B_{mn}^{\sigma}$.
For the time being, we stop the Fourier-Bessel series at order M = 1 obtaining the signals called W,
X, Y, Z, which we will define better in the next sections:

$$\mathbf{B}_{M=1(3D)} = [W\ X\ Y\ Z]^T$$

$B_{00}^{+1} = W$: relative to the PRESSURE signal
$B_{11}^{+1} = X$, $B_{11}^{-1} = Y$, $B_{10}^{+1} = Z$: relative to PRESSURE GRADIENT signals (or to acoustic velocity)
As can be immediately seen from the 3D illustration of the spherical harmonics (Figure 1.6), in
order to achieve a higher directional resolution, the order of Ambisonics must increase.
Attention! So as to avoid confusion, it should be noted that order M of Ambisonics is different
from order n defined in Legendre functions. Rather, the ambisonic order M corresponds to the
maximum degree m of the Legendre functions.
Ambisonics is not only a 3D audio technique. Sound field representation can be specialized in 2D
environments. For this purpose, the sound field should be decomposed according to a system of
cylindrical coordinates for a horizontal-only reproduction system.
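One has:

$$\begin{aligned}
p(r, \theta) &= B_{00}^{+1} J_0(kr) + \sum_{m=1}^{\infty} \left( B_{mn}^{+1} \sqrt{2} \cos m\theta + B_{mn}^{-1} \sqrt{2} \sin m\theta \right) J_m(kr) \\
&= B_{00}^{+1} J_0(kr) + \sum_{m=1}^{\infty} \left( B_{mn}^{+1}\, Y_{mn}^{+1(2D)}(\theta, 0) + B_{mn}^{-1}\, Y_{mn}^{-1(2D)}(\theta, 0) \right) J_m(kr)
\end{aligned} \tag{1.12}$$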
Also in this case we get an orthonormal basis as seen in the set of 3D equations. The functions denoted
by $J_m(kr)$ are Bessel functions of the first kind.
Formalism unification is achieved by saying that cylindrical harmonics (horizontal) are a subset of
spherical harmonics and:

$$Y_{mn}^{\sigma(2D)}(\theta, \varphi) = \sqrt{\frac{2^{2m}\, m!^2}{(2m+1)!}}\; Y_{mn}^{\sigma(3D)}(\theta, \varphi) \tag{1.13}$$

where $\varphi = 0$.
2D representation in cylindrical coordinates is very useful to understand how the sound field is
decomposed and what the recorded signals actually mean.
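The sound field represented by Equation (1.10) and unpacked in 2D form in Equation (1.12) can be
rewritten in the explicit form for a plane wave of amplitude $P_\psi$ and angle of incidence $\psi$,
observed at the point of polar coordinates $(r, \theta)$, as (referring to Figure 1.7 and [9]):

$$\begin{aligned}
p(\vec{r}) &= P_\psi\, e^{jkr \cos(\theta - \psi)} = P_\psi J_0(kr) + 2 P_\psi \sum_{m=1}^{\infty} j^m J_m(kr) \cos m(\theta - \psi) \\
&= P_\psi \left[ J_0(kr) + \sum_{m=1}^{\infty} 2 j^m J_m(kr) \cos(m\theta) \cos(m\psi) + \sum_{m=1}^{\infty} 2 j^m J_m(kr) \sin(m\theta) \sin(m\psi) \right]
\end{aligned} \tag{1.14}$$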
Figure 1.7 Sound wave impinging on listening position [9]
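The plane-wave expansion can be checked numerically: truncating the series at a finite order reproduces the exact exponential only for small kr, which is the origin of the sweet spot. The sketch below is illustrative only; the helper names are hypothetical, and the Bessel function is computed from its power series (adequate for the small arguments used here) because the Python standard library does not provide one.

```python
import cmath
import math


def bessel_j(m, x, terms=40):
    """Bessel function of the first kind J_m(x), from its power series."""
    total = 0.0
    for k in range(terms):
        total += ((-1) ** k / (math.factorial(k) * math.factorial(m + k))
                  * (x / 2.0) ** (2 * k + m))
    return total


def plane_wave_exact(kr, theta, psi):
    """Unit-amplitude plane wave exp(j kr cos(theta - psi))."""
    return cmath.exp(1j * kr * math.cos(theta - psi))


def plane_wave_truncated(kr, theta, psi, order):
    """Series J_0 + 2 * sum_m j^m J_m(kr) cos(m (theta - psi)), truncated at `order`."""
    total = complex(bessel_j(0, kr))
    for m in range(1, order + 1):
        total += 2 * (1j ** m) * bessel_j(m, kr) * math.cos(m * (theta - psi))
    return total


def truncation_error(kr, theta, psi, order):
    """Magnitude of the difference between the exact and the truncated field."""
    return abs(plane_wave_exact(kr, theta, psi) - plane_wave_truncated(kr, theta, psi, order))
```

For a fixed kr the error drops quickly as the order grows, while for a fixed order it grows with kr, that is, with frequency or with distance from the centre.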
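Equation (1.14) can be transformed into matrix form:

$$p(\vec{r}) = P_\psi\, \mathbf{B}^T \mathbf{h} \tag{1.15}$$

where

$$\begin{aligned}
\mathbf{B}^T &= \left[ B_{00}^{+1}\ \cdots\ B_{mn}^{\sigma}\ \cdots \right] = \left[ 1\ \ \sqrt{2} \cos\psi\ \ \sqrt{2} \sin\psi\ \ \cdots\ \ \sqrt{2} \cos m\psi\ \ \sqrt{2} \sin m\psi\ \ \cdots \right] \\
\mathbf{h} &= \left[ J_0(kr)\ \ j \sqrt{2} J_1(kr) \cos\theta\ \ j \sqrt{2} J_1(kr) \sin\theta\ \ \cdots\ \ j^m \sqrt{2} J_m(kr) \cos m\theta\ \ j^m \sqrt{2} J_m(kr) \sin m\theta\ \ \cdots \right]^T
\end{aligned}$$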
The information about the spatial distribution of the plane wave is included in vector $\mathbf{B}^T$ (which is
dependent on the angle of incidence $\psi$ only). This means recording in Ambisonics consists of the
identification of the coefficients $\mathbf{B}^T$. The expression of these coefficients reveals that it is
required to have microphones with directivity patterns of the form $\cos m\psi$, $\sin m\psi$.
That is why finding or building microphones fit for the purpose gets problematic for m > 2, whereas
for m = 0 (omnidirectional microphone) and m = 1 (bidirectional microphone) it is possible to find
an extensive variety of microphones and capsules on the market.
On the other hand, the information about sound field variations relatively to the listening point is
included in vector h . Source directivity changes with the emitted frequency (directivity increases with
increasing frequency).
Furthermore, the use of microphones of the family of cardioids simplifies expressions (1.10) and
(1.14) in a way that, separating spatial dependence from frequency dependence, one has:

$$p(\omega) = \sum_{m=0}^{\infty} W_m(\omega) \sum_{0 \le n \le m,\ \sigma = \pm 1} B_{mn}^{\sigma}\, Y_{mn}^{\sigma}(\theta, \varphi) \tag{1.16}$$

where $W_m(\omega)$ is the weighting factor:

$$W_m(\omega) = j^m \left( \alpha\, j_m(kr_{MIC}) - j (1 - \alpha)\, j'_m(kr_{MIC}) \right) \tag{1.17}$$

Equation (1.17) highlights how the recorded field is dependent on the frequency (in fact, k is the wave
number $k = 2\pi / \lambda$, where $\lambda$ is the wavelength $\lambda = c / f$, c is the speed of sound and f is
the frequency).
Basically, when miking a source, besides considering source directivity, we must consider the
microphone polar characteristic with respect to the frequency.
In light of the above, recorded ambisonic signals correspond to coefficients $\mathbf{B}^T$ weighted by function
$W_m(\omega)$ which depends on frequency. Equation (1.17) was obtained by weighting Equation (1.10) by
a cardioid characteristic function of the kind $G(\theta) = \alpha + (1 - \alpha) \cos\theta$. Remember that a cardioid
microphone is generated by the superimposition of an omnidirectional microphone (responsive to
pressure) and a figure-of-8 microphone (responsive to pressure gradient and so the derivative of
pressure).
In Chapter 3 of the technical papers we get into the details of encoding and decoding. It will be
assumed that the listener is at a distance from the source such that the sound front can be approximated
as plane.
1.3 Ambisonic Formats

As we have seen, in Ambisonics, sound directional components are encoded vectorially in a set of
spherical harmonics. This section shows how audio signals are recorded and processed in
Ambisonics. Actually, Ambisonics is not limited to a particular number of channels: a greater number
of channels provides a higher directional resolution.
In Ambisonics several formats exist for microphone recording, broadcasting and reproduction of
recorded signals.
- A-Format: suitable for miking with a specific microphone (e.g. Soundfield mic);
- B-Format: suitable for miking and processing with studio equipment;
- C-Format/UHJ: suitable for mono, stereo, 3-channel systems and broadcasting;
- D-Format: suitable for decoding and playback through an array of speakers;
- G-Format: like D-Format, but a decoder is not required.
A-Format
A-Format is achieved from the recording of four signals using a microphone equipped with four
sub-cardioid capsules mounted on the faces of a tetrahedron and oriented as shown in Figure 1.8a:
Figure 1.8 Tetrahedral Soundfield mic for A-Format
The four signals picked up from the capsules correspond to the directions left-front (LF),
right-front (RF), left-back (LB) and right-back (RB). For reasons of physical dimensions of the capsules,
these will not be perfectly coincident. The same problem occurs when using other microphones in
B-Format. In Chapter 3 of the technical papers, dedicated to ambisonic decoding, phase equalization of
recorded signals will be discussed. Phase equalization is required in order to make the capsules
coincident and represent the sound field in a way that the capsules are virtually placed exactly in the
center of the tetrahedron.
A sub-cardioid capsule is characterized by a polar diagram of the type shown in Figure 1.9 and it is
described by the equation:

$$G(\theta) = 0.7 + 0.3 \cos\theta$$

where $\theta$ is the angle of incidence of the acoustic wave.
Figure 1.9 Polar diagram of a sub-cardioid capsule.
The summary table below includes microphones from the family of cardioids, their polar
characteristics and the equations describing them:
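TYPE OF MICROPHONE                       | EQUATION
-----------------------------------------+------------------------------------------------
Family of cardioids (general equation)   | $G(\theta) = \alpha + (1 - \alpha) \cos\theta$
OMNIDIRECTIONAL                          | $G(\theta) = 1$
SUB-CARDIOID                             | $G(\theta) = 0.7 + 0.3 \cos\theta$
CARDIOID                                 | $G(\theta) = 0.5 + 0.5 \cos\theta$
SUPERCARDIOID                            | $G(\theta) = 0.37 + 0.63 \cos\theta$
HYPERCARDIOID                            | $G(\theta) = 0.25 + 0.75 \cos\theta$
FIGURE-OF-8                              | $G(\theta) = \cos\theta$
Table 1.1 Polar diagrams and equations for the microphones from the family of cardioids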
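The equations of Table 1.1 all follow the same general form, so they can be evaluated with one small function. This is a minimal sketch: the names `PATTERNS` and `polar_gain` are hypothetical, and the angle of incidence is assumed to be in radians.

```python
import math

# Pressure weight (the constant term of the general equation) for each pattern of Table 1.1.
PATTERNS = {
    "omnidirectional": 1.0,
    "sub-cardioid": 0.7,
    "cardioid": 0.5,
    "supercardioid": 0.37,
    "hypercardioid": 0.25,
    "figure-of-8": 0.0,
}


def polar_gain(pattern, theta):
    """Gain of the named pattern for an angle of incidence theta (radians)."""
    alpha = PATTERNS[pattern]
    return alpha + (1.0 - alpha) * math.cos(theta)


# Rear rejection (theta = 180 degrees) of each pattern; a negative value means
# that the rear lobe has inverted polarity.
rear = {name: polar_gain(name, math.pi) for name in PATTERNS}
```

The cardioid is the only pattern of the family with a null exactly at the rear, while the figure-of-8 has its nulls at the sides.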
B-Format
B-Format consists of four signals called W, X, Y, Z. As already mentioned above, signal W is
relative to the pressure component of the sound field in all directions, while X, Y, Z refer to the
components of velocity: the two components on the horizontal plane (X, Y) and the vertical component
(Z). Microphone takes in B-Format are achieved using three figure-of-8 microphones for signals
X, Y, Z and an omnidirectional microphone for signal W.
The 0° axis of microphone X points at the source (it is equivalent to MID in MS);
microphone Y is rotated by 90° with respect to X, so that its 0° axis points leftwards (it is equivalent
to SIDE in MS). Microphone Z is oriented orthogonally to the plane described
by the axes X and Y (its 0° axis points upwards). Figure 1.10 shows the microphone layout just
described in words:
Figure 1.10 W, X, Y, Z components in B-Format
However, once the B-Format signals are recorded, it is possible to rotate the array of microphones
virtually through a rotation matrix.
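As an illustration of such a virtual rotation, the sketch below rotates a first-order B-Format frame about the vertical axis. It assumes the axis convention described above (X forwards, Y leftwards, Z upwards); the function name `rotate_z` is hypothetical, and rotations about the other axes are not covered.

```python
import math


def rotate_z(w, x, y, z, angle):
    """Rotate the encoded sound scene by `angle` radians about the vertical axis.

    A positive angle moves sources towards the left (counterclockwise seen from
    above), which is equivalent to turning the virtual microphone array the other
    way. W and Z are unaffected by a rotation about the vertical axis.
    """
    c, s = math.cos(angle), math.sin(angle)
    return w, c * x - s * y, s * x + c * y, z


# A source on the X axis (front) ends up on the Y axis (left) after a 90 degree rotation.
w, x, y, z = rotate_z(1.0, 1.0, 0.0, 0.5, math.pi / 2)
```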
The four B-Format polar patterns are obtained from Equation (1.8) and described as follows
(in a normalized form: see Chapter 3):
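
$$\begin{aligned}
W &= S \\
X &= S \sqrt{2} \cos\theta \cos\varphi \\
Y &= S \sqrt{2} \sin\theta \cos\varphi \\
Z &= S \sqrt{2} \sin\varphi
\end{aligned} \tag{1.18}$$

where S is the recorded source [1]. The reader is referred to Chapter 3 for further explanations about
the factor $\sqrt{2}$.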
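Encoding a mono source into B-Format with the normalized patterns of Equation (1.18) can be sketched as follows. The function name is hypothetical, angles are assumed to be in radians, and a single sample is encoded for simplicity.

```python
import math


def encode_b_format(s, azimuth, elevation):
    """Encode one sample s of a mono source into (W, X, Y, Z).

    Uses W = S and a factor sqrt(2) on X, Y, Z, as in Equation (1.18).
    azimuth and elevation are the source direction in radians.
    """
    g = math.sqrt(2.0)
    w = s
    x = s * g * math.cos(azimuth) * math.cos(elevation)
    y = s * g * math.sin(azimuth) * math.cos(elevation)
    z = s * g * math.sin(elevation)
    return w, x, y, z


# A source straight ahead only excites W and X; a source to the left excites W and Y.
front = encode_b_format(1.0, 0.0, 0.0)
left = encode_b_format(1.0, math.pi / 2, 0.0)
```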
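B-Format can be derived from A-Format through the following transformation:

$$\begin{aligned}
X &= 0.5 \left[ (LF - LB) + (RF - RB) \right] \\
Y &= 0.5 \left[ (LF - RB) - (RF - LB) \right] \\
Z &= 0.5 \left[ (LF - LB) + (RB - RF) \right] \\
W &= 0.5 \left( LF + LB + RF + RB \right)
\end{aligned} \tag{1.19}$$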
Signal W, being omnidirectional, is given by the sum of the contributions from the four capsules.
Moreover, the recorded signal W can be used to reinforce the lower frequencies, since other types
of microphones do not have as full a low-frequency response as omnidirectional microphones
do.
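The A-Format to B-Format transformation is a fixed set of sums and differences, as the following sketch shows. The function name `a_to_b` is hypothetical, the capsule signals are taken one sample at a time, and no phase equalization of the capsules is attempted here.

```python
def a_to_b(lf, rf, lb, rb):
    """Convert one sample of the four capsule signals into (W, X, Y, Z).

    lf, rf, lb, rb: left-front, right-front, left-back and right-back capsules.
    """
    w = 0.5 * (lf + lb + rf + rb)
    x = 0.5 * ((lf - lb) + (rf - rb))
    y = 0.5 * ((lf - rb) - (rf - lb))
    z = 0.5 * ((lf - lb) + (rb - rf))
    return w, x, y, z


# Identical capsule signals (a perfectly diffuse pressure) produce W only.
diffuse = a_to_b(1.0, 1.0, 1.0, 1.0)   # (2.0, 0.0, 0.0, 0.0)
# Signal on the two front capsules only: W and X, no Y or Z.
frontal = a_to_b(1.0, 1.0, 0.0, 0.0)   # (1.0, 1.0, 0.0, 0.0)
```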
Extensions of B-Format were introduced for high-definition TV: BF and BEF Formats, which
include the additional channels E and F, redundant in content with respect to the channels W, X, Y, Z
and used to bolster the stability of the front image and sharpen front/rear separation.
C-Format
Recordings in A and B-Format are not naturally compatible with mono and stereo systems. To
ensure the portability of songs recorded in formats A and B into 2-channel media such as CD or radio
and television broadcasting, a new format called C-Format or UHJ was introduced. The initials, UHJ,
stand for three systems developed by the team that introduced this format, that is, U: Universal (from
Nippon Columbia UD-4/UMX, quadraphonic system), H: Matrix H (quadraphonic system from BBC),
J: System 45J (ambisonic system in use at that time).
C-Format is a hierarchical encoding/decoding system for ambisonic signals. Depending on the
number of available channels, this system is capable of reproducing, with a certain degree of accuracy,
the recorded sound field.
C-Format consists of 4 signals (L, R, T, Q) and, although it allows using up to 4 channels, it is
typically used in the 2-channel UHJ format. L and R are the left and right signals, compatible with a
2-channel system, T is a third channel introduced for a more accurate horizontal decoding and Q
contains the information relative to the height.
We define $\Sigma = L + R$ (like signal M in MID-SIDE) and $\Delta = L - R$ (like signal S in MID-SIDE)
and we use the following relations to change from B to C-Format:
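
$$\begin{aligned}
\Sigma &= 0.9397\, W + 0.1856\, X \\
\Delta &= j \left( -0.3420\, W + 0.5099\, X \right) + 0.6555\, Y \\
T &= j \left( -0.1432\, W + 0.6512\, X \right) - 0.7071\, Y \\
Q &= 0.9772\, Z
\end{aligned} \tag{1.20}$$

and back from C to B-Format:

$$\begin{aligned}
W &= 0.982\, \Sigma + 0.197\, j \left( 0.828\, \Delta + 0.768\, T \right) \\
X &= 0.419\, \Sigma - j \left( 0.828\, \Delta + 0.768\, T \right) \\
Y &= 0.187\, j\, \Sigma + \left( 0.796\, \Delta - 0.676\, T \right) \\
Z &= 1.023\, Q
\end{aligned} \tag{1.21}$$

where j denotes a 90-degree phase advance.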
As it has been said, only signals L and R are exploited in stereo-compatible systems:

$$\begin{aligned}
L &= 0.5 \left( \Sigma + \Delta \right) \\
R &= 0.5 \left( \Sigma - \Delta \right)
\end{aligned} \tag{1.22}$$
The third channel, T, used in systems named 2½-channel systems, does not contain the whole audio
band, but it is limited to 5 kHz. This third channel can be transmitted via radio, in quadrature phase
modulation. The UHJ system including 2½ or 3 channels is theoretically as accurate as horizontal
B-Format (WXY). It is possible to achieve the same accuracy as WXYZ B-Format by adding a fourth
channel Q.
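The B-Format to C-Format relations and their inverse can be tried out on single-frequency signals, where the 90-degree phase advance j reduces to a multiplication by the imaginary unit. The sketch below works on complex amplitudes (phasors) under that assumption; a real implementation would need a wideband 90-degree phase shifter, which is not shown. The function names are hypothetical.

```python
def b_to_uhj(w, x, y, z=0j):
    """Encode phasors (W, X, Y, Z) into UHJ phasors (L, R, T, Q)."""
    sigma = 0.9397 * w + 0.1856 * x
    delta = 1j * (-0.3420 * w + 0.5099 * x) + 0.6555 * y
    t = 1j * (-0.1432 * w + 0.6512 * x) - 0.7071 * y
    q = 0.9772 * z
    left = 0.5 * (sigma + delta)
    right = 0.5 * (sigma - delta)
    return left, right, t, q


def uhj_to_b(left, right, t, q=0j):
    """Decode UHJ phasors (L, R, T, Q) back into (W, X, Y, Z)."""
    sigma = left + right
    delta = left - right
    w = 0.982 * sigma + 0.197j * (0.828 * delta + 0.768 * t)
    x = 0.419 * sigma - 1j * (0.828 * delta + 0.768 * t)
    y = 0.187j * sigma + (0.796 * delta - 0.676 * t)
    z = 1.023 * q
    return w, x, y, z


# Round trip on an arbitrary first-order phasor set (all four channels in use).
original = (1.0 + 0j, 0.5 + 0j, -0.3 + 0j, 0.2 + 0j)
recovered = uhj_to_b(*b_to_uhj(*original))
```

Because the published coefficients are rounded to three or four digits, the round trip recovers the original values only approximately (to within about one part in a thousand).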
The Format using only channels L and R is called BHJ. There exist some variants: THJ, including
channel T, and PHJ, including channel Q.
UHJ-Format has also been successful in its stereo un-decoded version. Playback of L and R signals
without the use of a decoder has the effect of a much wider stereophonic sound compared to the
soundscape obtained from a pair of conventional stereo signals. This result came about by chance,
but it had much success among listeners and was given the name Super Stereo.
Finally, in order to achieve compatibility with monophonic systems, signals L and R are summed.
Table 1.2 summarizes UHJ hierarchical system:

Number of Channels | Decoder | Capacity                 | Typical applications           | Signals              | Equivalent in B-Format | Original designation
-------------------+---------+--------------------------+--------------------------------+----------------------+------------------------+---------------------
4                  | Yes     | Full-sphere surround     | DVD, HD disc, SACD             | LRTQ                 | WXYZ                   | PHJ
3                  | Yes     | Full horizontal surround | DVD, HD disc, SACD             | LRT                  | WXY                    | THJ
2½                 | Yes     | Full horizontal surround | FM Radio                       | LRT (T band-limited) | WXY                    | SHJ
2                  | Yes     | Horizontal surround      | CD, Stereo Radio, 2-Ch systems | LR                   | -                      | BHJ
2                  | No      | Stereo                   | CD, Stereo Radio, 2-Ch systems | LR                   | -                      | -
1                  | No      | Mono                     | Radio                          | LR (summed)          | -                      | -
Table 1.2 Summary table of hierarchical system UHJ, C-Format [10].
D-Format
D-Format is the format that made Ambisonics compatible with common surround speaker systems,
such as 5.1, 7.1, but also with arrays of different sizes and geometries (either regular or irregular
geometries). Signals in D-Format can be derived from either B-Format or C-Format with the use of a
decoder. The number of speakers is not limited in theory. The minimum requirements are, however, 4
speakers for adequate surround playback, 6 is better and full periphony (and therefore the information
relative to the height) can be obtained through 8 speakers.
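For example, in a periphonic (i.e. 3D) system with B-Format input signals, the i-th of L loudspeakers, placed at azimuth $\theta_i$ and elevation $\varphi_i$, will be fed by the signal:

\[
S_i^{(\mathrm{SN3D})} = \frac{1}{L}\left[ W + X\cos\theta_i\cos\varphi_i + Y\sin\theta_i\cos\varphi_i + Z\sin\varphi_i \right]
\tag{1.23}
\]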
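A minimal sketch of the first-order decoding rule (1.23) follows. It assumes a plane-wave source encoded with the SN3D first-order definitions of Table 1.3 (W without the 0.707 weight) and a hypothetical cube of eight loudspeakers; the function names and the layout are illustrative and not taken from the original text.

```python
import math


def encode_first_order(s, azimuth, elevation):
    """Encode sample s of a plane wave arriving from (azimuth, elevation), in radians.

    Assumption: SN3D first-order definitions, i.e. W carries no 0.707 weight.
    """
    W = s
    X = s * math.cos(azimuth) * math.cos(elevation)
    Y = s * math.sin(azimuth) * math.cos(elevation)
    Z = s * math.sin(elevation)
    return W, X, Y, Z


def decode_first_order(b_format, speakers):
    """Loudspeaker feeds according to equation (1.23).

    speakers is a list of (azimuth, elevation) pairs in radians.
    """
    W, X, Y, Z = b_format
    count = len(speakers)
    feeds = []
    for az, el in speakers:
        feed = (W
                + X * math.cos(az) * math.cos(el)
                + Y * math.sin(az) * math.cos(el)
                + Z * math.sin(el)) / count
        feeds.append(feed)
    return feeds


# Hypothetical layout: the eight vertices of a cube around the listener.
cube = [(math.radians(az), math.radians(el))
        for el in (35.26, -35.26)
        for az in (45, 135, 225, 315)]

# Source placed exactly in the direction of the first loudspeaker.
b = encode_first_order(1.0, math.radians(45), math.radians(35.26))
feeds = decode_first_order(b, cube)
for index, value in enumerate(feeds):
    print(f"speaker {index}: {value:+.4f}")
```

With this encoding each feed equals (1 + cos γ)/L, where γ is the angle between source and loudspeaker: the loudspeaker in the source direction receives the largest signal and the diametrically opposite one receives nothing.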
Or, similarly, in the case of higher-order Ambisonics:
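\[
S_i^{(\mathrm{SN3D})} = \frac{1}{L}\Big[ W + X\cos\theta_i\cos\varphi_i + Y\sin\theta_i\cos\varphi_i + Z\sin\varphi_i + \frac{\sqrt{3}}{2}\,U\cos 2\theta_i\cos^2\varphi_i + \frac{\sqrt{3}}{2}\,V\sin 2\theta_i\cos^2\varphi_i + \cdots \Big]
\tag{1.24}
\]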
Higher-order Ambisonics in a 2D system produces a D-Format signal for the i-th loudspeaker as
follows:
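\[
S_i^{(\mathrm{N2D})} = \frac{1}{L}\Big[ W + X\sqrt{2}\cos\theta_i + Y\sqrt{2}\sin\theta_i + U\sqrt{\tfrac{8}{3}}\,\tfrac{\sqrt{3}}{2}\cos 2\theta_i + V\sqrt{\tfrac{8}{3}}\,\tfrac{\sqrt{3}}{2}\sin 2\theta_i + \cdots \Big]
\tag{1.25}
\]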
G-Format
G-Format has the same purpose as D-Format. The difference lies in the fact that this format is pre-decoded, i.e., the signals have already been decoded to speaker feeds and stored on multichannel audio formats such as Wave-Ex (the multi-channel version of the Wave file format), DVD-Audio or SACD. In this way, the listener does not need a decoder: playing the DVD or the Wave-Ex file is enough.
Obviously, a pre-decoded ambisonic track prevents custom adaptation of the signals to different speaker arrays. A track decoded for a 5.1 system will play correctly only on this type of surround sound system.
1.4 Higher Order Ambisonics
B-Format stops at first order. The reproduction accuracy of the sound field increases with
increasing order. Table 1.3 includes Furse-Malham and Schmidt (SN3D) coefficients, used to encode
the ambisonic channels of order higher than 1.
Spherical harmonics are represented in this form [8]:
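\[
Y_{mn}^{\sigma}(\theta,\varphi) = \tilde{P}_{mn}(\sin\varphi)\cdot
\begin{cases}
\cos n\theta & \text{if } \sigma = +1 \\
\sin n\theta & \text{if } \sigma = -1 \quad \text{(ignore if } n = 0\text{)}
\end{cases}
\tag{1.26}
\]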
where $\tilde{P}_{mn}$ are the semi-normalized Legendre functions of degree m and order n. This formulation is called SN3D encoding (SN2D in the 2-D modification) and it matches 1st order Ambisonics (see Paragraph 1.1), with the exception of the weight 0.707 applied to signal W.
Daniel's modification, called MaxNormalization (MaxN), is followed by the Furse-Malham (FuMa) coefficients as well, with the inclusion of the weight 0.707 on W (see Table 1.3 below).
The mathematical formulations of the spherical harmonics given here include weighting factors which ensure that the integration of each harmonic over the sphere returns 1. The value each harmonic can assume increases with order. This may cause a problem for nearby sources, for which the dynamics of the higher-order channel signals become difficult to manage.
The problem persists from miking to recording.
MaxN representations have weighting factors applied to each component above order zero (i.e., above W), so that the maximum value they can assume is limited to 1.
Above third order it becomes difficult to determine the maxima of each polynomial, which is why Table 1.3 reports the FuMa coefficients and SN3D representations only up to third order.
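Order | m, n, σ  | Channel | SN3D Definition                                            | FuMa Weight
0     | 0, 0, 1  | W       | $1$                                                        | $1/\sqrt{2}$
1     | 1, 1, 1  | X       | $\cos\theta\cos\varphi$                                    | $1$
1     | 1, 1, -1 | Y       | $\sin\theta\cos\varphi$                                    | $1$
1     | 1, 0, 1  | Z       | $\sin\varphi$                                              | $1$
2     | 2, 0, 1  | R       | $(3\sin^2\varphi - 1)/2$                                   | $1$
2     | 2, 1, 1  | S       | $(\sqrt{3}/2)\cos\theta\sin(2\varphi)$                     | $2/\sqrt{3}$
2     | 2, 1, -1 | T       | $(\sqrt{3}/2)\sin\theta\sin(2\varphi)$                     | $2/\sqrt{3}$
2     | 2, 2, 1  | U       | $(\sqrt{3}/2)\cos(2\theta)\cos^2\varphi$                   | $2/\sqrt{3}$
2     | 2, 2, -1 | V       | $(\sqrt{3}/2)\sin(2\theta)\cos^2\varphi$                   | $2/\sqrt{3}$
3     | 3, 0, 1  | K       | $\sin\varphi\,(5\sin^2\varphi - 3)/2$                      | $1$
3     | 3, 1, 1  | L       | $\sqrt{3/8}\,\cos\theta\cos\varphi\,(5\sin^2\varphi - 1)$  | $\sqrt{45/32}$
3     | 3, 1, -1 | M       | $\sqrt{3/8}\,\sin\theta\cos\varphi\,(5\sin^2\varphi - 1)$  | $\sqrt{45/32}$
3     | 3, 2, 1  | N       | $(\sqrt{15}/2)\cos(2\theta)\sin\varphi\cos^2\varphi$       | $3/\sqrt{5}$
3     | 3, 2, -1 | O       | $(\sqrt{15}/2)\sin(2\theta)\sin\varphi\cos^2\varphi$       | $3/\sqrt{5}$
3     | 3, 3, 1  | P       | $\sqrt{5/8}\,\cos(3\theta)\cos^3\varphi$                   | $\sqrt{8/5}$
3     | 3, 3, -1 | Q       | $\sqrt{5/8}\,\sin(3\theta)\cos^3\varphi$                   | $\sqrt{8/5}$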
Table 1.3 SN3D Definitions and FuMa Weights for ambisonic signals up to third order.
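The statement that the FuMa weights limit the maximum of each component to 1 can be checked numerically. The sketch below is only an illustration: it assumes the SN3D definitions and FuMa weights as read from Table 1.3 (azimuth `t`, elevation `p`) and searches a 1-degree grid over the sphere, so the peaks are approximate.

```python
from math import cos, sin, sqrt, radians

# (channel, SN3D definition as a function of azimuth t and elevation p, FuMa weight)
TABLE_1_3 = [
    ("W", lambda t, p: 1.0, 1 / sqrt(2)),
    ("X", lambda t, p: cos(t) * cos(p), 1.0),
    ("Y", lambda t, p: sin(t) * cos(p), 1.0),
    ("Z", lambda t, p: sin(p), 1.0),
    ("R", lambda t, p: (3 * sin(p) ** 2 - 1) / 2, 1.0),
    ("S", lambda t, p: sqrt(3) / 2 * cos(t) * sin(2 * p), 2 / sqrt(3)),
    ("T", lambda t, p: sqrt(3) / 2 * sin(t) * sin(2 * p), 2 / sqrt(3)),
    ("U", lambda t, p: sqrt(3) / 2 * cos(2 * t) * cos(p) ** 2, 2 / sqrt(3)),
    ("V", lambda t, p: sqrt(3) / 2 * sin(2 * t) * cos(p) ** 2, 2 / sqrt(3)),
    ("K", lambda t, p: sin(p) * (5 * sin(p) ** 2 - 3) / 2, 1.0),
    ("L", lambda t, p: sqrt(3 / 8) * cos(t) * cos(p) * (5 * sin(p) ** 2 - 1), sqrt(45 / 32)),
    ("M", lambda t, p: sqrt(3 / 8) * sin(t) * cos(p) * (5 * sin(p) ** 2 - 1), sqrt(45 / 32)),
    ("N", lambda t, p: sqrt(15) / 2 * cos(2 * t) * sin(p) * cos(p) ** 2, 3 / sqrt(5)),
    ("O", lambda t, p: sqrt(15) / 2 * sin(2 * t) * sin(p) * cos(p) ** 2, 3 / sqrt(5)),
    ("P", lambda t, p: sqrt(5 / 8) * cos(3 * t) * cos(p) ** 3, sqrt(8 / 5)),
    ("Q", lambda t, p: sqrt(5 / 8) * sin(3 * t) * cos(p) ** 3, sqrt(8 / 5)),
]

AZIMUTHS = [radians(a) for a in range(0, 360)]
ELEVATIONS = [radians(e) for e in range(-90, 91)]


def peak(fn):
    """Largest absolute value of fn on a 1-degree grid over the sphere."""
    return max(abs(fn(t, p)) for t in AZIMUTHS for p in ELEVATIONS)


# Peak of each channel after applying its FuMa weight.
fuma_peaks = {name: weight * peak(fn) for name, fn, weight in TABLE_1_3}
for name, value in fuma_peaks.items():
    print(f"{name}: {value:.4f}")
```

Every weighted component should peak at about 1, except W, which peaks at about 0.707 because of its explicit weight.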
1.5 Near sources
In 2003 Daniel, Nicol and Moreau proposed a new formulation of B-Format with the aim of
removing the limitation of the current formulation which allows reconstruction of plane waves only
[7].
This restriction implies that the system is not able to handle nearby sources well, especially when
these are within the array of loudspeakers.
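The Fourier-Bessel expression for the sound pressure on a spherical surface around a point (indicated by the radius vector $\vec{r}$) is reported below:

\[
p(\vec{r}) = \sum_{m=0}^{\infty} j^m j_m(kr) \sum_{0 \le n \le m,\ \sigma = \pm 1} B_{mn}^{\sigma}\, Y_{mn}^{\sigma}(\theta,\varphi) + \sum_{m=0}^{\infty} j^m h_m(kr) \sum_{0 \le n \le m,\ \sigma = \pm 1} A_{mn}^{\sigma}\, Y_{mn}^{\sigma}(\theta,\varphi)
\tag{1.27}
\]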
where the first addend on the right-hand side is equivalent to the current formulation of Ambisonics for sources external to the speaker array, expressed in the frequency domain.
The coefficients $B_{mn}^{\sigma}$ are the gains of the spherical harmonic components, assuming that the sources produce plane wave fronts. On the other hand, the second addend describes wave fronts from internal sources, which are curved and dependent on frequency.
Daniel et al. derived a formula which describes near field sources at distance R from the centre of
the sphere:

\[
B_{mn}^{\sigma} = S\, F_m^{(R/c)}(\omega)\, Y_{mn}^{\sigma}(\theta,\varphi)
\tag{1.28}
\]
where S is the pressure field at centre and
\[
F_m^{(R/c)}(\omega) = \sum_{n=0}^{m} \frac{(m+n)!}{(m-n)!\,n!} \left( \frac{-jc}{\omega R} \right)^{n}
\tag{1.29}
\]
where $c$ is the speed of sound and $\omega = 2\pi f$.

It follows that $F_m^{(R/c)}(\omega)$ has a gain that tends to infinity at low frequencies.
The compensation through the weighting factors mentioned in Paragraph 1.4 also helps to solve this problem (the gain is no longer infinite).
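The low-frequency behaviour of the near-field term can be illustrated with a short sketch that evaluates equation (1.29) as printed. The speed of sound (340 m/s), the source distance of 1 m and the function name are assumptions made only for this example.

```python
import math


def near_field_filter(m, freq_hz, radius_m, c=340.0):
    """Evaluate F_m(omega) of equation (1.29), as printed, for a source at radius_m.

    Assumption: c = 340 m/s unless stated otherwise.
    """
    omega = 2.0 * math.pi * freq_hz
    x = -1j * c / (omega * radius_m)
    return sum(
        math.factorial(m + n) / (math.factorial(m - n) * math.factorial(n)) * x ** n
        for n in range(m + 1)
    )


# Gain of the second-order term for a source 1 m from the centre.
frequencies = (20, 200, 2000, 20000)
gains = {f: abs(near_field_filter(2, f, radius_m=1.0)) for f in frequencies}
for f, g in gains.items():
    print(f"{f:>6} Hz: gain {20 * math.log10(g):7.2f} dB")
```

The gain grows without bound as the frequency decreases and approaches 1 at high frequencies, while the zeroth-order term is left untouched at every frequency.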
In this manner, with the use of this formulation, it is possible to reproduce sources which are
internal to the speaker array, since it allows the reconstruction of concave, plane and convex wave
fronts.
It is important to know beforehand, however, the dimension of the array when coding.
1.6 Pressure Microphones and Pressure Gradient Microphones
We conclude this section with some useful notes on the microphones that are commonly used in ambisonic arrays [11].
Pressure Microphones
These microphones show only the front face to the sound field and respond in the same way to
changes in the acoustic pressure for all the directions of the incident sound. In effect, pressure
microphones have no directional characteristic and they are also known as omnidirectional
microphones.
Actually, the microphone body will cause its response to tend to become directional with increasing
frequency, because its size becomes comparable with the wavelength of the incident sound on the
diaphragm (this is true, in general, for all microphones, which effectively tend to become
hypercardioid with increasing frequency).
Pressure Gradient Microphones
These microphones have a figure-of-8 polar diagram (see the note at the end of this paragraph) along the longitudinal axis. They respond to the pressure difference between two points A and B, shown in Figure 1.11, close together and immersed in the sound field.
The greatest pressure difference occurs at 0° and 180°, whereas a sound coming from 90° with respect to the axis of the microphone is received with the same sensitivity at both points A and B. In fact, naming TF the transmission factor of the sound field (or sensitivity), the following relation exists:

\[
TF = TF_0 \cos\theta
\tag{1.30}
\]

where $TF_0$ is the transmission factor we have when the sound is impinging from direction 0° (microphone axis) and $\theta$ is the angle of incidence of the acoustic wave.
If $\theta = 90^\circ$, then $TF = 0$.

Figure 1.11 Polar pattern of a pressure gradient microphone.
The incident pressure is compared at points A and B. This can be achieved electrically if you use two
identical adjacent capsules having opposite faces and measure the output voltages connected with
reverse polarity. Alternatively, the comparison is done mechanically in case of microphones having
both the front and the rear sides of the diaphragm exposed to the sound field. In this second case, only
the instant differences of the forces acting on the front and the rear result in a movement of the
diaphragm.
The pressure difference is due to the velocity of the particles of the medium in which sound
propagates. Since the microphone output voltage is proportional to the pressure difference, it is also
proportional to particle velocity, hence the name velocity microphones.
Mic behaviour in the presence of plane waves
Figure 1.12 shows how, in the presence of a plane wave front, sound affects points A and B with the same strength but with a phase difference. With a constant sound pressure, the angle swept by the sound and the pressure gradient increase with frequency.
In Figure 1.12b the acoustic wave has a frequency approximately twice that of Figure 1.12a and the same pressure. As you can see, approximately, a doubling of the pressure gradient occurs.
Reading the technical specifications of the microphones that are normally found on the market, specifically the frequency response graph, we note that all microphones have - sooner or later - a "hole" next to the so-called characteristic frequency of the microphone, as shown in Figure 1.13. Of course, all the technical specifications should always be taken into consideration, especially when you want to build a microphone array for such a delicate system like Ambisonics, which exhibits most of its problems at high frequencies.

Figure 1.12 Microphone behaviour with plane waves.
Usually, the distance between points A and B is very small in microphones. There is a limit beyond which the microphone no longer responds efficiently to very high frequencies.
The distance A-B must be shorter than half the wavelength we want to reproduce; the limit distance, equal to half a wavelength, defines the characteristic frequency $f_t$. For such a limit distance the phase difference between A and B is $\varphi = 180^\circ$.
Beyond the characteristic frequency the pressure gradient diminishes abruptly.

Figure 1.13 Frequency response of a pressure gradient microphone: characteristic frequency effect.
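The relation between the A-B distance, the phase difference and the characteristic frequency can be sketched as follows. The example assumes an ideal on-axis plane wave of unit pressure amplitude, a speed of sound of 340 m/s and a hypothetical A-B spacing of 10 mm; none of these values come from the original text.

```python
import math


def characteristic_frequency(spacing_m, c=340.0):
    """Frequency at which the A-B spacing equals half a wavelength."""
    return c / (2.0 * spacing_m)


def gradient_magnitude(freq_hz, spacing_m, c=340.0):
    """|p_A - p_B| for an on-axis plane wave of unit pressure amplitude."""
    phase_difference = 2.0 * math.pi * freq_hz * spacing_m / c  # radians
    return 2.0 * abs(math.sin(phase_difference / 2.0))


spacing = 0.01  # hypothetical A-B distance in metres
f_t = characteristic_frequency(spacing)

# Doubling the frequency well below f_t roughly doubles the pressure gradient.
ratio = gradient_magnitude(2000.0, spacing) / gradient_magnitude(1000.0, spacing)

print(f"characteristic frequency: {f_t:.0f} Hz")
print(f"gradient ratio 2 kHz / 1 kHz: {ratio:.3f}")
print(f"gradient at f_t: {gradient_magnitude(f_t, spacing):.3f}")
print(f"gradient at 1.5 f_t: {gradient_magnitude(1.5 * f_t, spacing):.3f}")
```

In this idealized model the gradient reaches its maximum at the characteristic frequency, where the phase difference is 180 degrees, and decreases above it.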
Mic behaviour in the presence of spherical waves
In the case of spherical waves (Figure 1.14), the pressure gradient at points A and B depends not only
on the phase difference, but also on the distance between source and microphone. In that case, for a
point source radiating a spherical front, pressure decreases with increasing distance from the source
(that is, the pressure is proportional to $1/r$).
Students at singing schools are taught that low frequencies can be enhanced by bringing the microphone closer to the mouth. This is called the proximity effect. It is noticeable especially at low frequencies, for which the forces acting on the diaphragm are weaker, because the phase shift is smaller than in the case of high frequencies.
The boost at frequency f is calculated as follows:

\[
\frac{v_8}{v_0} = \frac{1}{\cos\varphi}, \qquad \tan\varphi = \frac{\lambda}{2\pi r} = \frac{54.14}{f\,r}
\tag{1.31}
\]

where $r$ is the distance between the microphone and the source (in metres, with $f$ in Hz), $v_8/v_0$ represents the boost at frequency $f$ (wavelength $\lambda$), $v_8$ is the output voltage of a pressure gradient microphone having a figure-of-8 directivity pattern and $v_0$ is the output voltage of an omnidirectional microphone with the same sensitivity at 0° [30].

Figure 1.14 Pressure gradient microphone behaviour with spherical front.
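As a worked example of equation (1.31), the sketch below converts the boost into decibels. The function name and the chosen distances and frequencies are illustrative; the constant 54.14 is the one given in the text and implies distances in metres and frequencies in hertz.

```python
import math


def proximity_boost_db(freq_hz, distance_m):
    """Boost v8/v0 of equation (1.31), expressed in dB."""
    tan_phi = 54.14 / (freq_hz * distance_m)
    phi = math.atan(tan_phi)
    return 20.0 * math.log10(1.0 / math.cos(phi))


for distance in (0.05, 0.10, 0.50):
    row = ", ".join(
        f"{f} Hz: {proximity_boost_db(f, distance):5.2f} dB"
        for f in (50, 100, 1000, 10000)
    )
    print(f"r = {distance:.2f} m -> {row}")
```

The boost is large at low frequencies and short distances and becomes negligible at high frequencies, which is the proximity effect described above.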
Note
At the very beginning of this section, we said that pressure gradient microphones have a figure-of-8 polar diagram. This is worth pointing out because microphones belonging to the cardioid family are also referred to as pressure gradient microphones.
Cardioid polar characteristics can be achieved in different ways:

- Superimposition of a figure-of-8 and an omnidirectional mic;
- Microphone composed of a part of the diaphragm having only the front side exposed to the
sound field and another part having both sides exposed to the field;
- Microphone in which the sound reaches the rear side of the diaphragm by passing through a delay element.
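The first method in the list above, the superimposition of a figure-of-8 and an omnidirectional microphone, can be illustrated with a small sketch. It assumes ideal first-order patterns of the form a + b cos θ, with the figure-of-8 of equation (1.30) normalized to TF_0 = 1; the function name is illustrative.

```python
import math


def polar_response(angle_deg, omni, figure8):
    """First-order polar pattern: omni + figure8 * cos(angle)."""
    return omni + figure8 * math.cos(math.radians(angle_deg))


def figure_of_8(angle_deg):
    # Pure pressure gradient microphone, equation (1.30) with TF_0 = 1.
    return polar_response(angle_deg, omni=0.0, figure8=1.0)


def cardioid(angle_deg):
    # Equal mix of an omnidirectional and a figure-of-8 response.
    return polar_response(angle_deg, omni=0.5, figure8=0.5)


for angle in (0, 90, 180):
    print(f"{angle:>3} deg  figure-of-8: {figure_of_8(angle):+.2f}  cardioid: {cardioid(angle):+.2f}")
```

The figure-of-8 response vanishes at 90 degrees and changes sign at the rear, whereas the cardioid obtained from the equal mix is maximal at the front and null at 180 degrees.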
Bibliography
[1] F. Rumsey, Spatial Audio, Focal Press, 2001.
[2] M. A. Gerzon, "A year of surround sound," Hi-Fi News, August 1971.
[3] M. A. Gerzon, "A year of surround sound," Wireless World, December 1974.
[4] D. Malham, "Homogeneous and nonhomogeneous surround sound systems," in AES Second Century of Audio, London, 7-8th June 1999.
[5] M. Vorländer, Auralization: Fundamentals of Acoustics, Modelling, Simulation, Algorithms and Acoustic Virtual Reality, Springer, 2008.
[6] R. Nicol and M. Emerit, "3D-sound reproduction over an extensive listening area: a hybrid method derived from holophony and ambisonic," in AES 16th International Conference.
[7] J. Daniel, R. Nicol and S. Moreau, "Further Investigations of High Order Ambisonics and Wavefield Synthesis for Holophonic Sound Imaging," in AES 114th Convention, Amsterdam, The Netherlands, March 22-25th, 2003.
[8] J. Daniel, "Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia," 2001.
[9] R. Nicol and M. Emerit, "Reproduction of 3D sound for videoconferencing: a comparison between holophony and ambisonics," in Proc. Workshop on Digital Audio Effects (DAFx-98), Barcelona, Spain, November 19-21, 1998.
[10] M. A. Gerzon, "Ambisonics in Multichannel Broadcasting and Video," Journal of Audio Engineering Society, vol. 33, no. 11, 1985.
[11] G. Boré and S. Peus, Microphones for Studio and Home-Recording Applications - Operation Principles and Type Examples, Georg Neumann GmbH, 1999.