Jorge Castellanos
Approved by the
Examining Committee:
September 2006
(For Graduation September 2006)
© Copyright 2006
by
Jorge Castellanos
All Rights Reserved
CONTENTS

LIST OF FIGURES
ACKNOWLEDGMENTS
ABSTRACT

1. Introduction
   1.1 Motivation and Goals
   1.2 Scope
   1.3 Related Work
   1.4 Organization

I Foundations

2. Sound Properties and Perception
   2.1 Sound Properties
       2.1.1 Digital Representation of Sound
   2.2 Sound Perception
       2.2.1 Sound Localization
             2.2.1.1 Distance Cues

3. Spatialization Techniques
   3.1 Sound Positioning
       3.1.1 Amplitude Panning
             3.1.1.1 VBAP
       3.1.2 Ambisonic Rendering Technique
       3.1.3 Binaural Audio Rendering Technique
       3.1.4 WFS
       3.1.5 Hybrid Techniques
   3.2 Additional Spatialization Properties
       3.2.1 Distance Modeling
       3.2.2 Object Size and Radiation Pattern
   3.3 Room Acoustics Modeling
       3.3.1 Geometric Modeling Methods
       3.3.2 Wave Based Methods
       3.3.3 Statistical Methods
       3.3.4 Hybrid Modeling Methods
       3.3.5 Reverberation

6. Models
   6.1 Interaction Models
   6.2 System Meta-Model
   6.3 System Model
       6.3.1 Spatial Data
       6.3.2 Spatial Processing
       6.3.3 Speaker Layout
       6.3.4 Panner
       6.3.5 Distance Simulator
       6.3.6 Acoustic Modeler
   6.4 High-level Model
       6.4.1 Spatializer
       6.4.2 Auralizer

III Implementation

7. System Implementation
   7.1 CSL Overview
       7.1.1 CSL Core
       7.1.2 Connections
       7.1.3 Input / Output
   7.2 Spatial Audio Subsystem
       7.2.1 Position
       7.2.2 Spatial Sound Sources
       7.2.3 Loudspeaker Setup
       7.2.4 Panning
             7.2.4.1 Vector Base Amplitude Panning
             7.2.4.2 Binaural Panner
             7.2.4.3 Ambisonic Panner
       7.2.5 Distance Simulation
   7.3 Future Work

8. Conclusions

LITERATURE CITED
LIST OF FIGURES

2.3 Sound diffraction: waves wrap around objects smaller than their wavelength.
3.3 Sound cones used to represent sound radiation of objects [Mutanen, 2002]
6.6 Concrete subclasses of the Panner base class.
ACKNOWLEDGMENTS
ABSTRACT
CHAPTER 1
Introduction
The use of virtual reality and immersive systems, and of multimedia in general, has grown rapidly and continues to do so. Soon these systems will be part of the everyday life of most people, which calls for the development of applications that facilitate the use and understanding of this technology. An important component of such systems is the audio rendering engine.
As a consequence of the increased attention toward multimedia systems,
multiple spatial audio rendering techniques (also commonly called 3D audio)
have been developed. Their primary goal is to allow for accurate localization of
sounds, for an improved and possibly realistic immersive acoustical experience.
So far, none of the existing techniques perform well under all circumstances.
Quoting Dave Malham [Malham, 1998]:
Even the best systems in use today for sound spatialization are rela-
tively crude, allowing for little more than the creation of an illusion,
sometimes very good, more often poor.
The different techniques complement each other to some extent, each ad-
dressing particular aspects related to spatial hearing under different physical
setups and space limitations. Ideally, a single technique would work for any audio system configuration and in any environment, but such a system is not achievable. However, under certain constraints, one system could solve most issues.
As V. Pulkki wrote in relation to spatialization techniques [Pulkki, 1997]:
The design proposed in this thesis attempts to address this need by integrating many of the current audio spatialization techniques into one system.
The integration of all of these techniques into one engine or framework would provide a single, simple interface to any system, regardless of the loudspeaker or room setup. Such a system would silently adapt itself to the existing audio setup, improving and simplifying the user's experience.
research using any available surround sound techniques without having to pro-
gram or be familiar with the sound reproduction system/technique. Spatial
audio researchers would be able to extend the system, by just adding new al-
gorithms to the already implemented ones (i.e., the system will provide sound
source management, simplifying the implementation of new techniques).
The overall goal is to ease multimedia software development by designing
a system model of a 3D audio engine. Such description should act as a model
from which concrete systems, libraries, or frameworks are implemented. As a
proof of concept, a basic implementation of the design will be discussed in the
third section of this document.
Finally, music composers have many different tools for multi-channel audio reproduction, but each of them has proved to be difficult to use, inflexible, or not scalable, and to require in-depth knowledge of the different spatial audio methods. Most 3D audio rendering systems implement at most a couple of methods, and none of them provides an integrated interface to the many possibilities. This system should be a step toward a simple tool for composers.
1.2 Scope
This work elaborates on the design principles and practical implementa-
tion of a general-purpose spatial audio rendering framework. The design should
fully describe how the different pieces of the framework would interact if it were
to be implemented. Such a description is presented more as a specification, avoiding strong ties to a particular programming language. This language independence allows for future implementations in other languages and platforms.
The objective of the framework is to facilitate the implementation of audio and multimedia applications used in research, art, and entertainment. We aim to provide an engine so that users of any kind (artists, scientists, software engineers, and possibly others) can focus their attention on their projects without spending time re-implementing or understanding a complicated 3D audio engine. Hopefully such an engine will encourage the use of multi-speaker setups.
This project was divided into two stages: System Design and System
Implementation. The system design was the primary focus for the project.
A solid design is a key component for future improvements. The system was
implemented as part of a larger C++ synthesis framework. The implementation
should be taken as an example of a concrete instance of the model described.
The different implementation issues encountered are also discussed in Chapter 7, serving as a guide for future implementations.
Only a basic system was implemented; full documentation will be provided to facilitate its use and to make implementation details clear for future improvements. Contributions from future students are expected in order to build a robust framework for multimedia development.
video games, while a few others are non-real-time, with the goal of accurately
simulating the acoustical properties of an arbitrary space.
The most used spatial audio libraries for the development of multimedia applications are Microsoft's DirectX, OpenAL, Java3D, and X3D. These APIs (Application Programmer's Interfaces) comply with a set of guidelines defined by the Interactive Audio Special Interest Group (IASIG). The first of these guideline documents, "3D Audio Evaluation and Rendering Guidelines" (I3DL1) [I3D, 1998], defines the minimal functionality to be implemented in a 3D audio platform; it also explains and describes which techniques are and are not considered 3D audio. The second document, "Interactive 3D Audio Rendering Guidelines" (I3DL2) [I3D, 1999], defines a more robust and complete feature set (room acoustics, sound occlusion, sound obstruction), allowing for an improved 3D experience. Following is a short description of some of the libraries mentioned above. For a more detailed summary of the APIs described below, refer to R. Väänänen's PhD thesis [Väänänen, 2003].
• blue-c: A fully immersive reality system [Naef et al., 2002]. The audio
engine was designed to support live audio input, room simulation and
accurate localization.
1.4 Organization
This document is divided into three parts. The first part, consisting of
Chapters 2 to 4, serves as a review and introduction to the basic subjects needed
to understand the later chapters of the thesis. Chapter two introduces the ba-
sic properties, digital representation, and perception of sound. Chapter three
presents a survey of the current techniques used for audio spatialization. Chap-
ter four provides a brief introduction to Object Oriented Programming (OOP)
and Design Patterns.
Part two contains Chapters 5 and 6, describing the design of the frame-
work. Chapter five presents the results of user analysis performed to better
understand the system. Chapter six presents the design of the different mod-
els that make up the system. These chapters constitute the main focus of the
thesis.
Finally, in the third section, Chapter 7 provides an overview of CSL (The
CREATE Signal Library), followed by a description of the system implemented.
PART I
Foundations
CHAPTER 2
Sound Properties and Perception
This chapter introduces the basic properties, digital representation, and per-
ception of sound. This knowledge is required to understand some of the design
decisions presented later in the document. The introduction below is written in simple language, avoiding technical terms and using as little mathematics as possible, making it a brief tutorial on the world of sound. For more detailed information, the reader should consult the references given in the text.
In an enclosed space, sound waves hit the perimeter surfaces and are either absorbed, reflected back into the space, or a combination of both. Sound absorption is the property of the surface material to transform sound energy into a different type of energy, such as heat or mechanical energy. The sound energy can be absorbed partially or totally, as a function of the properties and dimensions of the material. The absorption coefficient (α) indicates what fraction of the total incident sound energy is attenuated, i.e., the total incident/direct sound energy minus the reflected energy:

α = 1 − |R|²   (2.1)

where R is the reflection coefficient of the surface.
source is moving relative to the receiver, the time necessary for the sound to
arrive to the listener also varies continuously. These changes cause an apparent
change of pitch, commonly known as Doppler effect (see Figure 2.4). If the
source and receiver approach each other, the apparent pitch increases, while
if they move away from each other the pitch is perceived as decreasing. The
perceived frequency for an observer moving toward a stationary source can be obtained as follows:

fobserved = fsource · (c + vobserver) / c   (2.2)

where fsource is the frequency of the sound source, vobserver is the speed at which the observer is moving, and c is the speed of sound.
of numbers. For example, a simple filter works by averaging each sample with the previous one. More in-depth introductory texts on this topic can be found in [Roads, 1996] and [Pohlmann, 2005].
ILD and ITD cues do not always provide the necessary localization information to the auditory system. A particular issue, known as the Cone of Confusion, occurs towards the sides of the listener: a conical region roughly centered on the axis through the ears (see Figure 2.6) where the ILD and ITD are almost identical due to the (quasi) symmetry of the head. A common manifestation of this problem is front-back reversal. Studies have shown that head movement can contribute to improved localization; such dynamic adjustments permit the brain to compare different ITDs and ILDs, which helps disambiguate conflicting cues [Begault, 1994].
Spectral Cues: Just as sound reflects off the objects present in a room,
the listener also interacts with incoming sound; depending on the angle of inci-
dence sound will reflect off of different surfaces of the listeners body. In addition,
the complexity and shape of the different ear cavities (concha, auditory canal
and the fossa create resonances at different frequencies, providing information
about the origin and type of sound. The spectral cues are primarily used for
high frequency sounds (above 6 kHz), and when the ITD and ILD are ambigu-
ous. Spectral cues are also the primary cues for discerning to some extent the
elevation of sounds.
Lrev = 1 − (Dref / (Ds + Dref))²   (2.3)
Cues obtained through senses other than hearing can also help localize sound. Visual contact with the emitting object, or with the environment, substantially helps to form expectations regarding the sound character and to construct a more accurate localization. The tactile perception of vibration can also assist the auditory system.
Cognitive cues also play a primary role in sound localization. Previous knowledge of a space and/or a sound allows comparison to previously heard sounds with known characteristics. John Chowning presents a good example
in [Chowning, 2000]:
of the same duration. The listener is asked which of the two tones
is the louder
In spite of the fact that the distant tone would be louder than the closer one, the
listener is capable of discerning which one is closer thanks to previous knowledge
of the sound properties.
CHAPTER 3
Spatialization Techniques
This chapter presents a survey of the current techniques used for audio spatial-
ization. Even though the main focus is on sound positioning techniques, room
modeling is also considered, as it constitutes a very important aspect of spatial
audio. These explanations should provide the reader with the knowledge needed
to understand the design decisions taken.
must be placed no more than 60° apart. The farther a virtual source is from a loudspeaker, the less stable the created image is. This condition is due to the frequency dependency of ILD.
3.1.1.1 VBAP
Vector Base Amplitude Panning (VBAP) is based on the amplitude panning technique described above, but formulated for an arbitrary number of loudspeakers and for two-dimensional (i.e., horizontal-only) or three-dimensional (i.e., periphonic) positioning. In addition to the regular amplitude panning previously described, this technique restates the problem using vectors and vector bases, simplifying the calculations and making them computationally more efficient [Pulkki, 1997].
rendering). In the same way, for periphonic rendering, at least three loudspeakers are needed. These loudspeakers form a triangle, and the virtual source can be positioned anywhere inside it (see Figure 3.1). To position a sound source using a multi-loudspeaker setup, the corresponding loudspeaker pair or triplet first has to be determined. Subsequently, the sound source gains are calculated only for those two or three loudspeakers. The gains are obtained by multiplying the position (vector) of the sound source by the inverse of the matrix of loudspeaker vectors:
gains = pT L−1mnk = [ p1 p2 p3 ] [ lk1 lk2 lk3 ; lm1 lm2 lm3 ; ln1 ln2 ln3 ]−1   (3.1)

where the rows of the matrix are the position vectors of the three loudspeakers.
³ For optimal decoding, the number of loudspeakers should be equal to or larger than the number of encoded channels.
⁴ Customized recordings provide better results (i.e., recordings performed by inserting microphones in the ear canal of the subject).
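As a rough illustration of equation 3.1, the two-dimensional (pairwise) case reduces to inverting a 2×2 matrix of loudspeaker unit vectors. The function below is a sketch with invented names, not CSL's actual VBAP implementation:

```cpp
#include <cmath>
#include <utility>

// 2D VBAP: gains of a loudspeaker pair for a virtual source, all angles
// in radians on the horizontal plane. Computes gains = p^T L^-1 for the
// 2x2 case of equation 3.1, where the rows of L are the speaker vectors.
std::pair<double, double> vbapGains(double sourceAngle,
                                    double speaker1Angle,
                                    double speaker2Angle) {
    // Unit vectors of the source (p) and the two loudspeakers (rows of L).
    double p1 = std::cos(sourceAngle), p2 = std::sin(sourceAngle);
    double l11 = std::cos(speaker1Angle), l12 = std::sin(speaker1Angle);
    double l21 = std::cos(speaker2Angle), l22 = std::sin(speaker2Angle);

    // Multiply p by the inverse of the 2x2 matrix L = [l11 l12; l21 l22].
    double det = l11 * l22 - l12 * l21;
    double g1 = (p1 * l22 - p2 * l21) / det;
    double g2 = (p2 * l11 - p1 * l12) / det;

    // Normalize so that g1^2 + g2^2 = 1 (constant perceived loudness).
    double norm = std::sqrt(g1 * g1 + g2 * g2);
    return { g1 / norm, g2 / norm };
}
```

A source positioned exactly at one loudspeaker yields gains (1, 0), and a source centered between the pair yields equal gains, as expected from amplitude panning.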
When dealing with discrete (sampled) signals, equation 3.2 can be expressed as follows:

y(n) = Σ (m = 0 to N−1) x(n − m) · h(m)   (3.3)
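A direct-form sketch of equation 3.3 follows (illustrative only; practical binaural renderers typically use block-based or frequency-domain convolution for efficiency):

```cpp
#include <vector>

// Direct-form convolution of an input signal x with an impulse response h
// of length N: y(n) = sum over m of x(n - m) * h(m), as in equation 3.3.
std::vector<double> convolve(const std::vector<double>& x,
                             const std::vector<double>& h) {
    std::vector<double> y(x.size(), 0.0);
    for (std::size_t n = 0; n < x.size(); ++n) {
        // Only past input samples exist, so m runs up to min(N-1, n).
        for (std::size_t m = 0; m < h.size() && m <= n; ++m) {
            y[n] += x[n - m] * h[m];
        }
    }
    return y;
}
```

Each output sample is a weighted sum of the current and past input samples, with the impulse response supplying the weights.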
3.1.4 WFS
Wave Field Synthesis, often referred to as Holophony (as the acoustical
analogy to holography), aims to fully reconstruct the physical properties of a
soundfield. In other words, instead of trying to reproduce the elements related to
the perceptual characteristics of the auditory system in terms of sound location
(such as the VBAP), WFS attempts to reconstruct the soundfield should the
source be present at the specified physical location. This can be explained
through the Huygens principle, which states that each point on a wave front
may be regarded as the source of secondary waves and that the surface that is
tangent to the secondary waves can be used to determine the future position of
the wave front.
WFS replaces with loudspeakers those secondary waves (see Figure 3.2).
Thus, any soundfield can be reconstructed by replacing each point of the wave-
front with a loudspeaker. In reality, the number of loudspeakers is limited,
quantizing the wavefront into discrete points. Just as with digital audio sam-
pling, to avoid aliasing, a minimum number of points is needed. In the case
of WFS, the number of loudspeakers required to truly reconstruct a soundfield covering the human audible range is very large, on the order of thousands. M. Gerzon proposed that about 400,000 loudspeakers would be needed in a listening space with a 2-meter diameter [Gerzon, 1974]. Current systems make use
codes the radiation pattern following the principles of the Ambisonic B Format
[Menzies, 2002].
used to build an impulse response of the room. Most real-time systems use
geometric methods for performing acoustical simulation of spaces, because of
their simplicity and generality.
The Image Source Method works by finding only those rays that will actually reach the listener. This is achieved by assuming that an exact image (hence the name) of the room is reflected on the other side of each wall. The vector that points from the image source to the listener then has the actual direction and length a reflected ray would take (see Figure 3.4). The Image Source method is efficient and simple but not necessarily the most convincing; it underperforms especially at low frequencies.
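The core geometric step can be sketched for a single axis-aligned wall (a simplified illustration with invented names): mirroring the source across the wall yields the image source, and the straight line from that image to the listener gives the direction and length of the first-order reflected ray.

```cpp
#include <cmath>

struct Point { double x, y; };

// Mirror a sound source across a vertical wall located at x = wallX,
// producing the first-order image source.
Point imageSource(Point source, double wallX) {
    return { 2.0 * wallX - source.x, source.y };
}

// Length of the reflected path: the straight-line distance from the image
// source to the listener equals the folded source-wall-listener path.
double reflectedPathLength(Point source, Point listener, double wallX) {
    Point img = imageSource(source, wallX);
    double dx = img.x - listener.x;
    double dy = img.y - listener.y;
    return std::sqrt(dx * dx + dy * dy);
}
```

Higher-order reflections are obtained by mirroring the image sources again across the remaining walls, which is why the number of image sources grows quickly with reflection order.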
Ray Tracing works by following the path of a very large number of rays
(or particles) that travel away from the sound source. When a ray hits a surface,
it reflects and continues its path, until it finds a receiver.
Beam Tracing works by following the paths of sets of rays (beams), creating transmission and reflection beams at polygon intersections. This method is more complex than the previous ones, but it needs fewer virtual sources than the Image Source method and does not suffer from the sampling problems specific to ray tracing [Funkhouser et al., 1998].
3.3.5 Reverberation
Simulating the effect of reverberation can be approached either from a physical or a perceptual point of view. Numerous algorithms have been proposed [Gardner, 2001]. In recent years, convolution reverberators have gained popularity. These work by recording the impulse response of a real space and then convolving it with the audio signal. An advantage of this
⁵ Even convolution reverberators have added controls that modify parameters of the room by modifying the impulse response.
CHAPTER 4
Object Oriented Programming and Design
Figure: UML class diagram notation used in this document. A class name in italics represents an abstract class; boxes list public (+) and private (−) attributes and operations with their parameter and return types; arrows denote inheritance (ClassB derives from ClassA); a filled diamond denotes composition (ClassB is made of one or more ClassX objects) and an empty diamond denotes aggregation (ClassB contains zero or more ClassY objects); dashed arrows denote dependency.
Figure: The Observer design pattern. The abstract Model keeps a list (0...*) of Observer objects and offers add(observer), remove(observer), and notifyObservers(); notification calls update() on every registered observer. ConcreteModel and ConcreteObserver are concrete subclasses implementing notify() and update().
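A minimal C++ sketch of the pattern in the figure (illustrative only; not CSL's actual classes):

```cpp
#include <algorithm>
#include <vector>

// Abstract observer: gets notified whenever the observed model changes.
class Observer {
public:
    virtual ~Observer() = default;
    virtual void update() = 0;
};

// Model holds a list of observers and notifies all of them on change.
class Model {
public:
    void add(Observer* o) { observers.push_back(o); }
    void remove(Observer* o) {
        observers.erase(std::remove(observers.begin(), observers.end(), o),
                        observers.end());
    }
    void notifyObservers() {
        for (Observer* o : observers) o->update();  // call update() on each
    }
private:
    std::vector<Observer*> observers;
};
```

This decoupling is what lets, for example, a Panner react to changes in a SpeakerLayout without the layout knowing anything about panning.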
subclasses that implement an algorithm. The client can then hold a reference to the Strategy (the base class), allowing the algorithm to be changed dynamically.

Facade (used in Spatializers): Simplifies the interaction with a subsystem by defining an interface that would otherwise be complex to use. When a particular task requires the client to deal with many different classes that can be put together to do something, a Facade can provide a better interaction: it deals with all those classes and lets the client deal only with the Facade. Oftentimes this pattern is used as a layering mechanism providing a single entry point to a subsystem (see Figure 4.3).
Figure 4.3: The Facade design pattern: the client interacts only with the Facade.
system requirements can be obtained based on user needs and usage, and not
just guessed.
Use Cases are also commonly used in the analysis phase and consist of
a step-by-step description of the user interaction with the system. Use Cases
are used for finding the functional requirements of a system. These are often
derived from the Use Scenarios.
4.3.2 Models
As already mentioned, the analysis stages help produce a list of system
requirements for designers to build a model or models of the system. These
models are then used to build the actual system. In this context a model refers
to a general description of the system.
A system can have many models, each with a different purpose. Users have their own models of how a system should work. These mental models can be classified in two groups: those that model how the system works (system models) and those that model how people interact with the system (interaction models). Oftentimes it is better to separate the interaction model from the system model; in this way, the interaction model can be designed with the user's mental model in mind. When these two models match, the interaction with the system is intuitive and easy, as it allows us to predict the effects of our actions.
Donald Norman [Norman, 2002] captures the importance of making the
distinction between the system and interaction models in the following sentence:
"... after all, scissors, pens and light switches are pretty simple devices. There's no need to understand the underlying physics or chemistry of each device we own, simply the relationship between the controls and the outcome."
PART II
CHAPTER 5
User Analysis
This chapter presents the results of a thorough user analysis performed to bet-
ter understand the system. One of the primary objectives in the development
of the proposed system was designing it with usability in mind. Most soft-
ware libraries are designed considering only functionality and then efficiency,
oftentimes ignorin or barely considering the target developers as users. The
assumption is that the program would be used by a scientist, therefore usability
does not matter. Such assumption has generated many non-friendly libraries
and programming languages.
Ordinary Usability Engineering analysis was performed, borrowing only
those aspects that applied to framework development. For example, Use Scenar-
ios cannot be written in the standard way, as there is no real-time interaction with the system. The information described in this section is later used to generate
a set of requirements for the system design, so that user needs are satisfied.
users will be multimedia developers. Strictly speaking, only those with software
programming knowledge and an interest in developing an audio application
would constitute the user base. The list below shows the primary target users,
followed by a description for each item.
All these scenarios have solutions with current technology, but the solution
is not always the most intuitive or most convenient one.
Flexibility and ease of use, two major requirements, conflict with each
other. Flexibility oftentimes adds complexity, producing a hard-to-use system.
Chapter 6 presents a solution to the problem, based on the idea of providing
different interaction interface levels. The high level interaction model should
hide the system model, presenting an intuitive and simple interface. Lower
level interaction models would provide increasing levels of control, sacrificing
some of the simplicity of the higher level models.
CHAPTER 6
Models
ular technique. Depending on the intended use of the framework and the user's background, different users will expect different types of interaction.

A user with no audio knowledge might prefer the most natural model, where setting an object's position takes care of rendering the sound from that position. To achieve this, an expert system behind the scenes might analyze the user's setup and choose the most adequate spatialization technique.
More experienced users could choose a particular rendering technique; the
proposed model does not account for this condition. In such a situation, users will have to deal directly with the system model.
is done using Ports (the Ports mechanism is analogous to the input and output ports found in a physical device). The state of the Processing objects is handled by an asynchronous Control mechanism.
Processing Data: Processing Data objects passively hold the data that Processing objects modify, offering "a homogeneous interface to media data" [Amatriain, 2004]. These objects are further subclassified, defining different types of data. In this particular design, Processing Data objects would carry audio data (i.e., a buffer of samples).
Processing: Processing objects are the main building blocks in a 4MS-based system. They encapsulate an algorithm meant to perform an action on the given data, transforming its state/essence. Data processing is done synchronously, as opposed to control data, which is sent as asynchronous events. The state of a Processing object is handled by a Configuration mechanism.
Figure: Class overview of the system model. SpatialProcessing and SpatialData derive from the Processing and ProcessingData base classes; the SpatialProcessing hierarchy includes the Panner (with its SpeakerLayout), the DistanceSimulator (with DistanceCue), and the AcousticModeler (with a Room Model); the high-level Spatializer and Auralizer classes build on these.
Figure: SpeakerLayout class diagram. A SpeakerLayout aggregates any number (0...*) of Speaker objects, each holding a private position with position(), azimuth(), elevation(), and distance() accessors. SpeakerLayout offers numSpeakers(), addSpeaker(), and positionOfSpeaker(index), plus a class-wide default layout managed through setDefaultSpeakerLayout() and defaultSpeakerLayout().
SpeakerLayout have to tell the layout they want to know when and if anything
changes in the layout (register themselves as observers).
To simplify usage, a default layout is created when instantiating any object that uses the speaker layout, so that if no layout is set, the default layout is employed. Any layout can be set as the user default, so that later instances of objects can reuse it without it having to be set manually for each instance. The use of multiple layouts is allowed by assigning the desired layout to the object that will use it; in that case, the object will use this layout instead of the default.
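The default-layout mechanism described above might be sketched as follows (a simplified illustration; class and method names follow the diagrams, but the real CSL code differs):

```cpp
#include <cstddef>
#include <vector>

struct Speaker { double azimuth, elevation, distance; };

class SpeakerLayout {
public:
    // Shared default layout: used whenever no layout is set explicitly.
    static SpeakerLayout* defaultLayout() {
        static SpeakerLayout stereo = makeStereo();  // created on first use
        return userDefault ? userDefault : &stereo;
    }
    static void setDefaultLayout(SpeakerLayout* layout) { userDefault = layout; }

    void addSpeaker(double azimuth, double elevation = 0.0, double distance = 1.0) {
        speakers.push_back({azimuth, elevation, distance});
    }
    std::size_t numSpeakers() const { return speakers.size(); }

private:
    static SpeakerLayout makeStereo() {
        SpeakerLayout l;
        l.addSpeaker(-30.0);  // conventional stereo pair at +/-30 degrees
        l.addSpeaker(30.0);
        return l;
    }
    static SpeakerLayout* userDefault;
    std::vector<Speaker> speakers;
};
SpeakerLayout* SpeakerLayout::userDefault = nullptr;
```

Objects that never call a set-layout method transparently fall back to the shared default, which is what allows simple programs to ignore loudspeaker configuration entirely.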
Figure: A Processing object observing the active SpeakerLayout.
6.3.4 Panner
Sound positioning received particular attention during the analysis and design phases, as it plays a major role in the design. The traditional, already familiar panning model, where the sound position is set using the pan pot of a mixing console, proved to be an inconvenient model for sound positioning when dealing with multimedia content. The model proposed, already described in the Interaction Model section, was not welcomed by all users, as expected. Still, it is sometimes better to have people learn a new model that is clear and consistent than to use a familiar model that does not fit [Lidwell et al., 2003].
The Panner abstract class is the base of any SpatialProcessing object capable of modifying a signal so that it appears to be placed at a particular position in the listening space. The name (Panner) was chosen because of the familiarity most people have with the concept of stereo panning. However, in this model the panner's capabilities were extended to cover the full 3D periphery, as opposed to only the 60° in the horizontal plane covered by a typical stereo panner. As with conventional 2D panners, distance information is not necessarily calculated at this level (i.e., by the panner).⁶
Figure 6.5: Panner class diagram. Panner inherits from both SpatialProcessing and Observer (top) and handles any number of SpatialData objects.
etc.). Keeping a common base class allows for dynamic substitution of panning techniques, as described in the Strategy design pattern [Gamma et al., 1995]. The classes described below (see Section 6.4.1) are implemented using this pattern. As a consequence of allowing such dynamism, distance processing had to be decoupled from panning and performed by a separate Processing object.
Panner
The Panner is the abstract class from which particular spatialization techniques are implemented. For example, a VBAP implementation inherits from the Panner, adding its particular algorithm to be performed on the sound data.

For simplicity, a Panner should be able to process multiple sources, and not just one as other Processing objects do. In a way, a Panner can be seen as a multichannel mixer of spatial sources, in that it receives any number of input data to process and outputs a "mixed-down" version.
Figure: DistanceSimulator class diagram. The DistanceSimulator is a SpatialProcessing object that applies one or more (1...*) DistanceCue objects.
source and the listener must be known. Each AcousticModeler can represent
one listener.
[Class diagram: AcousticModeler, with attributes sceneGraph : Room and
listener : Position, and methods setListener(listener : Position),
setRoom(sceneGraph : Room), and reflectionInfoForSource(source : SpatialData) :
ReflectionData.]
6.4.1 Spatializer
The Spatializer acts as a Facade, hiding the underlying system from its
clients. The user does not need to manage panning and distance simulation
for each audio source; the Spatializer accomplishes these tasks. Each time a
sound source (SpatialData) is added to the Spatializer, it internally creates a
DistanceSimulator and adds it to the Panner. Another advantage is that the
client is freed from knowing about the concrete panning techniques: if the user
does not specify a desired Panner, an object (SpeakerLayoutExpert) employs
heuristics to find the most adequate panning technique by analyzing the user's
loudspeaker setup.
[Class diagram: Spatializer inherits from SpatialProcessing, holds one Panner
and any number (*) of DistanceSimulators (each wrapping a SpatialData), exposes
setPanningMode(type : PannerType), and gets the PannerType from a
SpeakerLayoutExpert.]
6.4.2 Auralizer
The Auralizer is a special kind of Spatializer with the extra functionality of
calculating room acoustics. An Auralizer knows the properties of a space
(scene graph) and uses an AcousticModeler to obtain the acoustic
response of the modeled space. The room response is then spatialized, providing
a full acoustical experience. Only one listener can be specified per Auralizer,
so one Auralizer per listener is required when multiple listeners are needed.
[Class diagram: Auralizer inherits from Spatializer and holds one
AcousticModeler.]
Implementation
CHAPTER 7
System Implementation
This chapter provides an overview of CSL (the CREATE Signal Library), fol-
lowed by a description of the system implemented as a proof of concept of the
aforementioned design. The code is carefully documented using Doxygen-
formatted comments, and the documentation is included with CSL in HTML form.
This chapter is not meant to replace that documentation, but to serve as a com-
plement, particularly for those who want to improve or extend the system.
See Future Work at the end of this chapter for ideas on improvements.
method, which gets called by other UnitGenerators requesting to fill a buffer with
samples. UnitGenerator subclasses override this method to implement their
DSP.
Processing objects in CSL are called Effects. An Effect is a UnitGenerator
with an input port (see Controllable below); other UnitGenerators can be
connected to its input to be processed.
UnitGenerator has additional mechanisms for handling multiple outputs
and for notifying dependent objects (Observers) when new samples are com-
puted. More extensive documentation can be found with the CSL distribution.
7.1.2 Connections
Connections between UnitGenerators are hidden from the user. The class
Controllable takes care of making the connection between input and output
ports. The user does not have to deal with Ports; instead, connections are made
by setting one UnitGenerator as the input of another. In most
cases the input can be given as a parameter to the constructor.
7.2.1 Position
The position of any object is currently represented as a point in three-
dimensional space, stored internally in Cartesian coordinates. A
better representation would store both Cartesian and polar forms;
the tradeoff is memory usage versus accuracy and
performance. For some operations the Cartesian representation is ideal, but often-
times the values have to be converted.
// Add two loudspeakers at +30 and -30 degrees from the center of
// the listening space, with no elevation.
myAudioSetup.addSpeaker(-30);
myAudioSetup.addSpeaker(30);
7.2.4 Panning
Class Panner is a pure abstract base class that provides management ser-
vices to its subclasses; adding and removing sound sources is handled by this
class. The Panner class inherits from both UnitGenerator and Observer. In
addition, it is intended to play the role of the Strategy in the design pattern
of the same name.
Panners have a virtual void *cache(); method that has to be imple-
mented by subclasses, returning a pointer to an object that stores the state of
the spatial sources. This way, if the position of an object does not change, the
concrete Panner can request the previous state and use it without re-calculating
parameters unnecessarily.
As an Observer, csl::Panner registers itself with class csl::SpeakerLayout,
which sends a notification every time the layout changes. The Panner class
receives the notification and calls void speakerLayoutChanged();. Panner
subclasses should implement speakerLayoutChanged(); rather than update(void
*arg);.
All concrete Panners in CSL include code, or are based on code, written
by previous students in the Media Arts and Technology program. The VBAP
Panner, for example, borrows from the work of Doug McCoy [McCoy, 2005];
the Binaural Panner is based on a VST plug-in written by Ryan Avery; and
the Ambisonic subsystem was ported and extended from the Ambisonic li-
brary built by Graham Wakefield, Florian Hollerweger, and Jorge Castellanos
[Hollerweger, 2006].
[Class diagram: the Binaural Panner specializes Panner and holds an
HRTFDatabase (0...1), which exposes numHRTFs(), hrtfLength(), and
hrtfAtIndex() and contains a vector of HRTFs (0...*); concrete
databases A and B specialize HRTFDatabase.]
In the DSP loop, after getting the current position of the sound source,
the corresponding HRTF has to be determined. Currently, the HRTF closest
to the sound source's position is chosen, but ideally a set of them would be
interpolated to obtain a more accurate result. Finally, the HRTFs and
the input data are multiplied in the frequency domain (which is equivalent to
time-domain convolution) using the low-latency block-wise FFT method.
[Class diagram: AmbisonicUnitGenerator inherits from UnitGenerator and
holds an AmbisonicOrder.]
• WFS: Wave Field Synthesis is one of the primary techniques that any good
spatialization framework should offer. Even though it is of little use in most
practical situations today, improved future technology could make it widely
available.
The End
LITERATURE CITED
[I3D, 1998] (1998). 3D Audio Rendering and Evaluation Guidelines, Level 1.0.
IASIG.
[Daniel, 2003] Daniel, J. (2003). Spatial sound encoding including near field
effect: Introducing distance coding filters and a viable, new ambisonic
format. In Journal of the Audio Engineering Society: 23rd International
Conference, Copenhagen, Denmark.
[Daniel and Moreau, 2004] Daniel, J. and Moreau, S. (2004). Further study of
sound field coding with higher order ambisonics. In Journal of the Audio
Engineering Society, Berlin, Germany.
[Funkhouser et al., 1998] Funkhouser, T., Carlbom, I., Elko, G., Pingali, G.,
Sondhi, M., and West, J. (1998). A beam tracing approach to acoustic
modeling for interactive virtual environments. In International Conference
on Computer Graphics and Interactive Techniques (SIGGRAPH).
[Gamma et al., 1995] Gamma, E., Helm, R., Johnson, R., and Vlissides, J.
(1995). Design Patterns - Elements of Reusable Object-Oriented Software.
Addison-Wesley.
[Lidwell et al., 2003] Lidwell, W., Holden, K., and Butler, J. (2003).
Universal Principles of Design: 100 Ways to Enhance Usability, Influence
Perception, Increase Appeal, Make Better Design Decisions, and Teach
Through Design. Rockport Publishers.
[Menzies, 2002] Menzies, D. (2002). W-panning and O-format, tools for object
spatialization. In AES 22nd International Conference on Virtual, Synthetic
and Entertainment Audio.
[Naef et al., 2002] Naef, M., Staadt, O., and Gross, M. (2002). Spatialized
audio rendering for immersive virtual environments. ACM.
[Pope, 2005] Pope, S. T. (2005). Audio in the UCSB CNSI Allosphere. Technical
report, Media Arts and Technology, University of California, Santa Barbara.
[Pope et al., 2006] Pope, S. T., Amatriain, X., Putnam, L., Castellanos, J.,
and Avery, R. (2006). Metamodels and design patterns in CSL 4. To be
presented at the International Computer Music Conference.
[Roads, 1996] Roads, C. (1996). The computer music tutorial. The MIT Press.