Version 1.0.0-Beta
2|Introduction|Audio
OCULUS VR, OCULUS, and RIFT are trademarks of Oculus VR, LLC. (C) Oculus VR, LLC. All rights reserved.
BLUETOOTH is a registered trademark of Bluetooth SIG, Inc. All other trademarks are the property of their
respective owners. Certain materials included in this publication are reprinted with the permission of the
copyright holder.
Contents
Introduction to Virtual Reality Audio
Overview
Localization and the Human Auditory System
Directional Localization
Distance Localization
3D Audio Spatialization
Directional Spatialization with Head-Related Transfer Functions (HRTFs)
Distance Modeling
Listening Devices
Environmental Modeling
Sound Design for Spatialization
Mixing Scenes for Virtual Reality
VR Audio Glossary
Release Notes
Audio SDK 1.0 Release Notes
Audio SDK 0.11 Release Notes
Audio SDK 0.10 Release Notes
Overview
This document introduces fundamental concepts in audio development for virtual reality (VR) with an emphasis
on key factors that deserve development attention.
We hope to establish that audio is crucial for creating a persuasive VR experience. Because of the key role that
audio cues play in our cognitive perception of existing in space, any effort that development teams devote to
getting it right will pay off in spades, as it will contribute powerfully to the user's sense of immersion. This is as
true for small- or mid-sized teams as it is for large design houses, perhaps even more so.
Audio has been a crucial part of the computer and video gaming experience since the advent of the first coin-op
games, which filled arcades with bleeps, bloops, and digital explosions. Over time, the state of computer
audio has steadily improved, from simple wave generators (SID, 1983) to FM synthesis (AdLib, 1987), evolving
on to 8-bit mono samples (Amiga OCS, 1985; SoundBlaster, 1989) and 16-bit stereo samples (SoundBlaster
Pro), culminating in today's 5.1 surround sound systems on modern gaming consoles (Xbox, 2001).
Since the development of 5.1 surround, little has changed. The fundamental technology of playing waveforms
over speakers is the same, and the game playing environment is still primarily the living room or den with a
large television and speakers.
Virtual reality, however, is changing all this. Instead of a large environment with speakers, virtual reality brings
the experience in close to the player via a head-mounted display (HMD) and headphones. The ability to track
the user's head orientation and position significantly empowers audio technology.
Until now, the emphasis has typically been placed on the visual aspects of virtual reality (resolution, latency,
tracking), but audio must now catch up in order to provide the greatest sense of presence possible.
This document discusses the challenges, opportunities, and solutions related to audio in VR, and how some
of the techniques learned in traditional game development must be revisited and modified for VR. It is not
intended to be a rigorous scientific study of the nature of acoustics, hearing and human auditory perception.
Its intended audience includes anyone with an interest in audio and VR, including sound designers, artists, and
programmers.
If you are interested in learning about these details in greater depth, we recommend searching the Web for the
following terms:
Head-Related Impulse Response
Head-Related Transfer Function
Sound Localization
Localization and the Human Auditory System
Humans rely on psychoacoustics and inference to localize sounds in three dimensions, attending to factors such
as timing, phase, level, and spectral modifications.
This section summarizes how humans localize sound. Later, we will apply that knowledge to solving the
spatialization problem, and learn how developers can take a monophonic sound and transform its signal so that
it sounds like it comes from a specific point in space.
Directional Localization
The two key components of localization are direction and distance. In this section, we look at the cues humans
use to determine the direction to a sound source.
Lateral
Laterally localizing a sound is, as one would expect, the simplest type of localization. When a sound is closer to
the left, the left ear hears it before the right ear does, and it sounds louder in the left ear. Generally speaking,
the closer the two ears are to parity in timing and level, the more centered the sound.
There are, however, some interesting details. First, we may primarily localize a sound based on the delay
between the sound's arrival in both ears, or interaural time difference (ITD); or, we may primarily localize a
sound based on the difference in the sound's volume level in both ears, or the interaural level difference (ILD).
The localization technique we rely upon depends heavily on the frequency content of the signal.
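The size of the ITD can be estimated with Woodworth's spherical-head approximation, a standard textbook formula that this document does not itself specify: ITD ≈ (r / c)(θ + sin θ) for a distant source at azimuth θ from straight ahead. The minimal Python sketch below assumes a head radius of 8.75 cm and a speed of sound of 343 m/s; both are typical values, not measurements.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at about 20 °C (assumed)
HEAD_RADIUS = 0.0875     # m, a commonly used average head radius (assumed)


def woodworth_itd(azimuth_deg: float,
                  radius: float = HEAD_RADIUS,
                  c: float = SPEED_OF_SOUND) -> float:
    """Interaural time difference in seconds for a far-field source.

    Spherical-head approximation ITD = (r / c) * (theta + sin(theta)),
    where theta is the azimuth measured from straight ahead. Valid for
    |azimuth| <= 90 degrees; negative azimuths give negative ITDs, so the
    sign indicates which side the source is on.
    """
    if abs(azimuth_deg) > 90:
        raise ValueError("approximation assumes |azimuth| <= 90 degrees")
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))


for az in (0, 30, 60, 90):
    print(f"{az:3d} deg -> {woodworth_itd(az) * 1e6:6.1f} us")
```

For a source directly to one side, this gives roughly 0.66 ms, which conveys how small the interaural timing differences are that the auditory system relies on.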
Sounds below a certain frequency (anywhere from 500 to 800 Hz, depending on the source) are difficult to
distinguish based on level differences. However, sounds in this frequency range have half wavelengths greater
than the dimensions of a typical human head, allowing us to rely on timing information (or phase) between the
ears without confusion.
At the other extreme, sounds with frequencies above approximately 1500 Hz have half wavelengths smaller
than the typical head. Phase information is therefore no longer reliable for localizing the sound. At these
frequencies, we rely on level differences caused by head shadowing, or the sound attenuation that results from
our heads obstructing the far ear (see figure below).
We also key on the difference in time of the signal's onset. When a sound is played, which ear hears it first
is a big part of determining its location. However, this only helps us localize short sounds with transients as
opposed to continuous sounds.
There is a transitional zone between ~800 Hz and ~1500 Hz in which both level differences and time
differences are used for localization.
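The frequency ranges above can be expressed as a small lookup. This is a rough sketch: the 800 Hz and 1500 Hz boundaries are the approximate figures quoted above (the lower boundary varies between about 500 and 800 Hz), the speed of sound is an assumed 343 m/s, and the function names are illustrative.

```python
SPEED_OF_SOUND = 343.0  # m/s (assumed)


def half_wavelength(freq_hz: float, c: float = SPEED_OF_SOUND) -> float:
    """Half the wavelength, in meters, of a tone at freq_hz."""
    return c / freq_hz / 2.0


def dominant_lateral_cue(freq_hz: float,
                         low_hz: float = 800.0,
                         high_hz: float = 1500.0) -> str:
    """Coarse classification using the approximate boundaries given above."""
    if freq_hz < low_hz:
        return "ITD"        # half wavelength exceeds the head: phase is unambiguous
    if freq_hz > high_hz:
        return "ILD"        # head shadowing produces usable level differences
    return "ITD+ILD"        # transitional zone: both cues contribute
```

For example, half_wavelength(500) is about 0.34 m and half_wavelength(1500) about 0.11 m, bracketing the width of a typical head.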
Front/Back/Elevation
Front versus back localization is significantly more difficult than lateral localization. We cannot rely on
interaural time or level differences, since both may be zero for a sound directly in front of or behind the
listener.
In the following figure we can see how sounds at locations A and B would be indistinguishable from each other
since they are the same distance from both ears, giving identical level and time differences.
Humans rely on spectral modifications of sounds caused by the head and body to resolve this ambiguity.
These spectral modifications are filters and reflections of sound caused by the shape and size of the head,
neck, shoulders, torso, and especially, by the outer ears (or pinnae). Because sounds originating from different
directions interact with the geometry of our bodies differently, our brains use spectral modification to infer
the direction of origin. For example, sounds approaching from the front produce resonances created by the
interior of our pinnae, while sounds from the back are shadowed by our pinnae. Similarly, sounds from above
may reflect off our shoulders, while sounds from below are shadowed by our torso and shoulders.
All of these reflections and shadowing effects combine to create a direction-selective filter.
Head-Related Transfer Functions (HRTFs)
A direction-selective filter can be encoded as a head-related transfer function (HRTF). The HRTF is the
cornerstone for most modern 3D sound spatialization techniques. How we measure and create an HRTF is
described in more detail elsewhere in this document.
Head Motion
HRTFs by themselves may not be enough to localize a sound precisely, so we often rely on head motion to
assist with localization. Simply turning our heads changes difficult front/back ambiguity problems into lateral
localization problems that we are better equipped to solve.
In the following figure, sounds at A and B are indistinguishable from each other based on level or time
differences, since those differences are identical. By turning her head slightly, the listener alters the time and
level differences between ears, helping to disambiguate the location of the sound. D1 is shorter than D2, which
is a cue that the sound is to the left of (and thus behind) the listener.
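The head-turn cue can be checked with a toy 2D model. This sketch assumes that D1 and D2 in the figure are the path lengths to the left and right ears, models the ears as two points (ignoring head shadowing), and uses an assumed 8.75 cm ear offset and a 2 m source distance.

```python
import math


def ear_distances(source_xy, yaw_deg, ear_offset=0.0875):
    """Return (left_distance, right_distance) in meters.

    Coordinates: listener at the origin, +y straight ahead before the
    turn, +x to the right. yaw_deg > 0 turns the head to the left
    (counterclockwise seen from above). Ears are modeled as two points
    ear_offset meters either side of the head center (an assumed
    simplification that ignores the head itself).
    """
    yaw = math.radians(yaw_deg)
    # Rotate the unturned ear positions (-r, 0) and (r, 0) by yaw.
    left = (-ear_offset * math.cos(yaw), -ear_offset * math.sin(yaw))
    right = (ear_offset * math.cos(yaw), ear_offset * math.sin(yaw))
    sx, sy = source_xy
    d_left = math.hypot(sx - left[0], sy - left[1])
    d_right = math.hypot(sx - right[0], sy - right[1])
    return d_left, d_right


front, back = (0.0, 2.0), (0.0, -2.0)
print(ear_distances(front, 0))   # equal distances: ambiguous
print(ear_distances(back, 0))    # also equal
print(ear_distances(front, 10))  # after a left turn, the right ear is closer
print(ear_distances(back, 10))   # after a left turn, the left ear is closer
```

The same small turn moves the front and back sources in opposite directions relative to the ears, which is what resolves the ambiguity.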
Likewise, cocking our heads can help disambiguate objects vertically. In the following figure, the listener cocks
her head, which results in D1 shortening and D2 lengthening. This provides a cue that the object is above her
head instead of below it.
Distance Localization
ILD, ITD, and HRTFs help us determine the direction to a sound source, but they give relatively sparse cues for
determining the distance to a sound. To determine distance we use a combination of factors, including loudness,
initial time delay, the ratio of direct to reverberant sound, motion parallax, and high-frequency attenuation.
Loudness
Loudness is the most obvious distance cue, but it can be misleading. If we lack a frame of reference, we can't
judge how much the sound has diminished in volume from its source, and thus cannot estimate its distance.
Fortunately, we are familiar with many of the sound sources that we encounter daily, such as musical instruments,
human voices, animals, vehicles, and so on, so we can estimate their distances reasonably well.
For synthetic or unfamiliar sound sources, we have no such frame of reference, and we must rely on other cues
or relative volume changes to predict if a sound is approaching or receding.
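When the source is unfamiliar, the relative change in level is still informative. Assuming a point source in a free field (no reflections), sound pressure falls as 1/d, so a drop of about 6 dB corresponds to a doubling of distance. The sketch below, with hypothetical function names and an assumed 0.5 dB tolerance, turns a level change into an implied distance ratio and labels a sequence of readings.

```python
def distance_ratio_from_levels(level_before_db: float,
                               level_after_db: float) -> float:
    """Ratio d_after / d_before implied by a change in received level.

    Assumes a point source in a free field, where pressure falls as 1/d,
    so the level changes by 20*log10(d_before / d_after) dB.
    """
    return 10 ** ((level_before_db - level_after_db) / 20.0)


def trend(levels_db, tolerance_db=0.5):
    """Label successive level readings as approaching, receding, or steady."""
    labels = []
    for before, after in zip(levels_db, levels_db[1:]):
        delta = after - before
        if delta > tolerance_db:
            labels.append("approaching")
        elif delta < -tolerance_db:
            labels.append("receding")
        else:
            labels.append("steady")
    return labels
```

Note that the ratio only says how distance changed, not what it is; an absolute estimate still needs a known source level.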
Anechoic (echoless) or open environments such as deserts may not generate appreciable reflections, which
makes estimating distances more difficult.
Ratio of Direct Sound to Reverberation
In a reverberant environment there is a long, diffuse sound tail consisting of all the late echoes interacting with
each other, bouncing off surfaces, and slowly fading away. The more we hear of a direct sound in comparison
to the late reverberations, the closer we assume it is.
This property has been used by audio engineers for decades to move a musical instrument or vocalist to the
front or to the back of a song by adjusting the wet/dry mix of an artificial reverb.
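The wet/dry technique can be sketched as a distance-dependent mix. The model below is a deliberate simplification chosen for illustration, not taken from this document: the dry (direct) path follows inverse-distance attenuation, while the wet (reverberant) send stays constant, roughly as a diffuse reverberant field does in a room. The direct-to-reverberant ratio then falls as the source recedes; the distance at which it reaches 0 dB depends on the assumed reverb_gain (4 m with the defaults below).

```python
import math


def mix_gains(distance_m, reference_m=1.0, reverb_gain=0.25):
    """Return (dry_gain, wet_gain) for a source at distance_m.

    Hypothetical model: the dry (direct) path follows inverse-distance
    attenuation beyond reference_m, while the wet (reverberant) send is
    held constant, approximating a diffuse reverberant field whose level
    barely depends on source position.
    """
    dry = reference_m / max(distance_m, reference_m)
    return dry, reverb_gain


def direct_to_reverberant_db(distance_m, **kwargs):
    """Direct-to-reverberant ratio in dB under the model above."""
    dry, wet = mix_gains(distance_m, **kwargs)
    return 20.0 * math.log10(dry / wet)


def mix(dry_sample, wet_sample, distance_m, **kwargs):
    """Blend one dry sample with the reverb output for the same source."""
    dry, wet = mix_gains(distance_m, **kwargs)
    return dry * dry_sample + wet * wet_sample
```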
Motion Parallax
Motion parallax (the apparent movement of a sound source through space) indicates distance, since nearby
sounds typically exhibit a greater degree of parallax than far-away sounds. For example, a nearby insect can
traverse from the left to the right side of your head very quickly, but a distant airplane may take many seconds
to do the same. As a consequence, if a sound source travels quickly relative to a stationary perspective, we
tend to perceive that sound as coming from nearby.
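Motion parallax follows from simple geometry: for a source moving across the line of sight at speed v and distance d, the angular speed at the listener is v / d radians per second. The speeds and distances in this sketch are illustrative assumptions.

```python
import math


def angular_speed_deg_per_s(tangential_speed_m_s: float,
                            distance_m: float) -> float:
    """Apparent angular speed of a source moving across the line of sight.

    For motion perpendicular to the listener's line of sight,
    omega = v / d in radians per second; converted here to degrees.
    """
    return math.degrees(tangential_speed_m_s / distance_m)


insect = angular_speed_deg_per_s(1.0, 0.3)          # 1 m/s, 30 cm away
airplane = angular_speed_deg_per_s(250.0, 3000.0)   # 250 m/s, 3 km away
```

With those numbers the insect sweeps past at roughly 190 degrees per second and the airplane at under 5, even though the airplane is moving 250 times faster.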
High Frequency Attenuation
High frequencies attenuate faster than low frequencies, so over long distances we can infer a bit about distance
based on how attenuated those high frequencies are. This is often a little overstated in the literature, because
sounds must travel hundreds or thousands of feet before high frequencies (i.e., those well above 10 kHz) are
noticeably attenuated. The effect also depends on atmospheric conditions, such as temperature and humidity.
3D Audio Spatialization
The previous section discussed how humans localize the sources of sounds in three dimensions. We now invert
that and ask: can we apply that information to fool people into thinking that a sound is coming from a specific
point in space?
The answer, thankfully, is yes, otherwise this would be a pretty short document. A big part of VR audio is
spatialization: the ability to play a sound as if it is positioned at a specific point in three-dimensional space.
Spatialization is a key aspect of presence because it provides powerful cues suggesting the user is in an actual
3D environment, which contributes strongly to a sense of immersion.
As with localization, there are two key components to spatialization: direction and distance.
We know that sounds are transformed by our body and ear geometry differently depending on the incoming
direction. These different effects form the basis of HRTFs, which we use to localize a sound.
Capturing HRTFs
The most accurate method of HRTF capture is to take an individual, put a couple of microphones in their ears
(right outside the ear canal), place them in an anechoic chamber (i.e., an echoless environment), play sounds in
the chamber from every direction we care about, and record those sounds from the mics. We can then compare
the original sound with the captured sound and compute the HRTF that takes you from one to the other.
We have to do this for both ears, and we have to capture sounds from a sufficient number of discrete directions
to build a usable sample set.
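Comparing the played and recorded sounds amounts to deconvolution. One textbook way to do it, shown here as a sketch rather than the procedure behind any particular HRTF database, is to divide the recorded spectrum by the played spectrum with a small regularization term. Real measurements usually add swept-sine stimuli, windowing, and averaging, which this example omits; eps is an assumed value.

```python
import numpy as np


def estimate_transfer_function(played, recorded, n_fft=None, eps=1e-8):
    """Estimate H(f) such that recorded ≈ played convolved with h.

    Divides the recorded spectrum by the played spectrum. The small
    regularization term eps keeps bins where the stimulus has almost no
    energy from blowing up.
    """
    if n_fft is None:
        n_fft = len(played) + len(recorded)  # zero-pad to avoid circular wrap
    X = np.fft.rfft(played, n_fft)
    Y = np.fft.rfft(recorded, n_fft)
    return Y * np.conj(X) / (np.abs(X) ** 2 + eps)


def impulse_response(H, n_fft, length):
    """Convert a one-sided spectrum back to a truncated impulse response (HRIR)."""
    return np.fft.irfft(H, n_fft)[:length]
```

Run once per ear and per direction, this yields the pair of HRIRs for that direction.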
But wait: we have only captured HRTFs for a specific person. If our brains are conditioned to interpret the
HRTFs of our own bodies, why would someone else's HRTFs work for us? Don't we each have to go to a lab and
capture a personalized HRTF set?
In a perfect world, yes, we'd all have custom HRTFs measured that match our own body and ear geometry
precisely, but in reality this isn't practical. While our HRTFs are personal, they are similar enough to each other
that a generic reference set is adequate for most situations, especially when combined with head tracking.
Most HRTF-based spatialization implementations use one of a few publicly available data sets, captured either
from a range of human test subjects or from a synthetic head model such as the KEMAR.
Most HRTF databases do not have HRTFs in all directions. For example, there is often a large gap representing
the area beneath the subject's head, as it is difficult, if not impossible, to place a speaker one meter directly
below an individual's head. Some HRTF databases are sparsely sampled, including HRTFs only every 5 or 15
degrees.
Most implementations either snap to the nearest acquired HRTF (which exhibits audible discontinuities) or use
some method of HRTF interpolation. This is an ongoing area of research, but for VR applications on desktops, it
is often adequate to find and use a sufficiently dense data set.
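The two strategies can be compared in a few lines. This sketch assumes a horizontal-plane data set: measured_azimuths is a sorted array of angles in [0, 360) and hrirs is a NumPy array with one impulse response per row. Linearly blending impulse responses is the simplest interpolation and ignores differences in onset delay, which more careful schemes align first.

```python
import numpy as np


def nearest_hrir(azimuth_deg, measured_azimuths, hrirs):
    """Snap to the closest measured direction (can click when it switches)."""
    az = np.asarray(measured_azimuths, dtype=float)
    # Wrapped angular difference in [0, 180].
    diffs = np.abs((az - azimuth_deg + 180.0) % 360.0 - 180.0)
    return hrirs[int(np.argmin(diffs))]


def interpolated_hrir(azimuth_deg, measured_azimuths, hrirs):
    """Linearly blend the two measured HRIRs that bracket azimuth_deg.

    Assumes measured_azimuths is sorted, lies in [0, 360), and wraps
    around (the last measurement neighbors the first).
    """
    az = np.asarray(measured_azimuths, dtype=float)
    a = azimuth_deg % 360.0
    hi = int(np.searchsorted(az, a, side="right")) % len(az)
    lo = hi - 1  # -1 wraps to the last measurement
    span = (az[hi] - az[lo]) % 360.0
    t = ((a - az[lo]) % 360.0) / span if span else 0.0
    return (1.0 - t) * hrirs[lo] + t * hrirs[hi]
```

As a source moves, the nearest-neighbor version changes filters abruptly at each midpoint between measurements, while the interpolated version changes gradually.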
Applying HRTFs
Given an HRTF set, if we know the direction we want a sound to appear to come from, we can select
an appropriate HRTF and apply it to the sound. This is usually done either in the form of a time-domain
convolution or an FFT/IFFT pair.
If you don't know what these are, don't worry: those details are only relevant if you are implementing the HRTF
system yourself. Our discussion glosses over a lot of the implementation details (e.g., how we store an HRTF,
how we use it when processing a sound). For our purposes, what matters is the high-level concept: we are
simply filtering an audio signal to make it sound like it's coming from a specific direction.
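For readers implementing it themselves, the two approaches can be sketched with NumPy. Both functions below assume the left and right HRIRs have the same length, and they produce the same stereo output up to floating-point error; the FFT version is usually faster for long filters. A real-time renderer would process audio in blocks (for example with overlap-add) rather than convolving a whole clip at once.

```python
import numpy as np


def spatialize_direct(mono, hrir_left, hrir_right):
    """Time-domain convolution: one FIR filter per ear."""
    return np.stack([np.convolve(mono, hrir_left),
                     np.convolve(mono, hrir_right)])


def spatialize_fft(mono, hrir_left, hrir_right):
    """Same result via an FFT/IFFT pair."""
    n_out = len(mono) + len(hrir_left) - 1
    n_fft = 1 << (n_out - 1).bit_length()  # next power of two >= n_out
    X = np.fft.rfft(mono, n_fft)
    left = np.fft.irfft(X * np.fft.rfft(hrir_left, n_fft), n_fft)[:n_out]
    right = np.fft.irfft(X * np.fft.rfft(hrir_right, n_fft), n_fft)[:n_out]
    return np.stack([left, right])
```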
Since HRTFs take the listener's head geometry into account, it is important to use headphones when
performing spatialization. Without headphones, you are effectively applying two HRTFs: the simulated one, and
the actual HRTF caused by the geometry of your body.
Head Tracking
Listeners instinctively use head motion to disambiguate and fix sound in space. If we take this ability away, our
capacity to locate sounds in space is diminished, particularly with respect to elevation and front/back. Even
ignoring localization, if we are unable to compensate for head motion, then sound reproduction is tenuous at
best. When a listener turns their head 45 degrees to the side, we must be able to reflect that in their auditory
environment, or the soundscape will ring false.
VR headsets such as the Rift provide the ability to track a listener's head orientation (and, sometimes, position).
By providing this information to a sound package, we can project a sound in the listener's space, regardless of
their head position.
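Projecting a sound into the listener's space means re-expressing each source's world position relative to the tracked head before selecting an HRTF. The yaw-only 2D sketch below illustrates the idea; its coordinate conventions are assumptions, and a full implementation would apply the tracker's complete 3D orientation (typically a quaternion) and position.

```python
import math


def head_relative_azimuth(source_xy, head_xy, head_yaw_deg):
    """Azimuth of a world-space source relative to where the head faces.

    2D, yaw-only sketch: +y is the world's forward direction, head yaw is
    measured counterclockwise (a left turn is positive), and the result
    is in (-180, 180] degrees with positive values to the listener's right.
    """
    dx = source_xy[0] - head_xy[0]
    dy = source_xy[1] - head_xy[1]
    world_bearing = math.degrees(math.atan2(dx, dy))  # clockwise from +y
    # Turning the head left makes the source appear further to the right.
    relative = world_bearing + head_yaw_deg
    relative = (relative + 180.0) % 360.0 - 180.0
    return 180.0 if relative == -180.0 else relative
```

Because this runs every time the tracker reports a new pose, a source stays fixed in the world while the listener's head moves.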
This assumes that the listener is wearing headphones. It is possible to mimic this with a speaker array, but it is
significantly less reliable, more cumbersome, and more difficult to implement, and thus impractical for most VR
applications.
Distance Modeling
HRTFs help us identify a sound's direction, but they do not model our localization of distance. Humans use
several factors to infer the distance to a sound source. These can be simulated with varying degrees of accuracy
and cost in software:
Loudness, our most obvious cue, is trivial to model with simple attenuation based on the distance between the
source and the listener.
Initial Time Delay is significantly harder to model, as it requires computing the early reflections for a given
set of geometry, along with that geometry's characteristics. This is both computationally expensive and
awkward to implement architecturally (specifically, sending world geometry to a lower level API is often
complex). Even so, several packages have made attempts at this, ranging from simple shoebox models to
elaborate full scene geometric modeling.
Direct vs. Reverberant Sound (or, in audio production, the wet/dry mix) is a natural byproduct of any
system that attempts to accurately model reflections and late reverberations. Unfortunately, such systems
tend to be very expensive computationally. With ad hoc models based on artificial reverberators, the mix
setting can be adjusted in software, but these are strictly empirical models.
Motion Parallax comes for free, because it is a byproduct of the sound source's velocity relative to the listener.
High Frequency Attenuation due to air absorption is a minor effect, but it is also reasonably easy to model
by applying a simple low-pass filter, and by adjusting cutoff frequency and slope. In practice, HF attenuation
is not very significant in comparison to the other distance cues.
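The two cheapest cues above, loudness and high-frequency attenuation, can be sketched as a clamped inverse-distance gain plus a one-pole low-pass filter. The min/max distance parameters and the linear cutoff mapping in air_absorption_cutoff are hypothetical tuning choices, not values from this document or the Oculus Audio SDK. A one-pole filter has a fixed 6 dB per octave slope; adjusting the slope as well would require a higher-order filter.

```python
import math


def inverse_distance_gain(distance_m, min_distance=1.0, max_distance=100.0):
    """Clamped inverse-distance attenuation (about -6 dB per doubling).

    Gain is 1.0 inside min_distance and stops falling beyond
    max_distance; both limits are hypothetical tuning parameters.
    """
    d = min(max(distance_m, min_distance), max_distance)
    return min_distance / d


class OnePoleLowPass:
    """First-order low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""

    def __init__(self, cutoff_hz, sample_rate=48000.0):
        self.sample_rate = sample_rate
        self.y = 0.0
        self.set_cutoff(cutoff_hz)

    def set_cutoff(self, cutoff_hz):
        # Places the -3 dB point near cutoff_hz when it is well below
        # the sample rate.
        self.a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / self.sample_rate)

    def process(self, samples):
        out = []
        for x in samples:
            self.y += self.a * (x - self.y)
            out.append(self.y)
        return out


def air_absorption_cutoff(distance_m, near_hz=20000.0, far_hz=4000.0,
                          far_m=500.0):
    """Hypothetical mapping: slide the cutoff down linearly with distance."""
    t = min(distance_m / far_m, 1.0)
    return near_hz + t * (far_hz - near_hz)
```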
Listening Devices
Traditionally, high quality audio reproduction has been the domain of multi-speaker systems, often
accompanied by one or more subwoofers. However, with the rise of online gaming and voice chat, many
players have transitioned to headsets (headphones with integrated microphones).
For modern VR, especially with head tracking and user movement, speaker arrays are an evolutionary dead
end. Headphone audio will be the standard for VR into the future, as it provides better isolation, privacy,
portability, and spatialization.
Headphones
Headphones offer several significant advantages over free-field speaker systems for virtual reality audio:
Acoustic isolation from the listener's environment enhances realism and immersion.
Head tracking is greatly simplified.
HRTFs are more accurate since they don't suffer from the doubling down of HRTF effects (sounds
modified from the simulated HRTF, and again by the listener's actual body geometry).
Access to controls while wearing an HMD is far simpler when those controls are physically attached to the
listener.
Microphones are ideally placed and subject to much less echo/feedback.
While acoustic isolation can help with immersion, it cuts listeners off from their environment, so they may be
unable to hear others entering the room, a cell phone ringing, the doorbell, et cetera. Whether that is a good thing
or not is up to the individual.
Open Back Headphones
Open-back headphones are generally more accurate and comfortable than closed-back headphones, but they
do not isolate listeners from the exterior environment, and they also broadcast sound into the surrounding
environment. These are suitable for quiet areas devoted to a VR experience, possibly in conjunction with a subwoofer.
As with closed-back headphones, when placed on or over the ear, open-back headphones allow the pinnae to
impact sound reproduction slightly.
Earbuds
Earbuds (such as those that ship with cell phones or portable music players) are cheap, lightweight, and
very portable, though they typically lack bass. Some models, such as Apple EarPods, have surprisingly good
frequency response, albeit with a steady roll-off of bass frequencies. These are mostly ignored for spatialization.
Most earbuds are poor at isolation.
In-Ear Monitors
In-ear monitors offer superior isolation from your environment, are very lightweight, and have excellent
frequency response over the entire range. They remove the effects of the listener's pinnae from sound (unlike
on-ear headphones). Their downside is that they must be inserted into the ear canal, which eliminates the
effects of the ear canal from sound reproduction entirely. Because most HRTFs are captured with microphones
right outside the ear canal, those effects are not restored by the HRTF either.
Impulse Responses
Headphones, like all transducers, impart their own characteristics to signals, and since HRTFs are frequency-
sensitive, removing the headphone's character from the signal will usually be beneficial. This can be
accomplished by deconvolving the output signal with the headphone's impulse response.
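Deconvolution of this kind is commonly implemented as a regularized inverse filter. The sketch below assumes a measured headphone impulse response and a roughly minimum-phase headphone, so that the inverse is stable and causal; beta is an assumed regularization constant that limits how much the filter boosts frequencies where the headphone is weak. Production equalization filters typically also add a modeling delay and smooth the target response.

```python
import numpy as np


def headphone_inverse_filter(headphone_ir, n_fft=1024, beta=1e-3):
    """Regularized inverse (equalization) filter for a headphone IR.

    G(f) = conj(H) / (|H|^2 + beta), returned as an n_fft-tap FIR filter.
    """
    H = np.fft.rfft(headphone_ir, n_fft)
    G = np.conj(H) / (np.abs(H) ** 2 + beta)
    return np.fft.irfft(G, n_fft)


def equalize(signal, inverse_ir):
    """Pre-filter the rendered signal so the headphone's coloration cancels."""
    return np.convolve(signal, inverse_ir)
```

Convolving the headphone response with the resulting filter should give approximately a single impulse, meaning the pair is close to transparent.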
External Speaker Systems
Until recently, the most common way to provide sound immersion was to surround the listener with speakers,
such as a Dolby 5.1 or 7.1 speaker configuration. While partially effective for a fixed and narrow sitting position,
speaker array systems suffer from key drawbacks:
Imprecise imaging due to panning over large portions of the listening area.
No elevation cues; sounds only appear in a 360-degree circle around the listener.
Assumption of immobile listener; in particular, no head tracking.
Room effects such as reverberation and reflections impact the reproduced sound.
Poor isolation means that outside sounds can intrude on the VR experience.
It is doubtful that multi-speaker configurations will be common or effective for home VR applications, though
they may be viable for dedicated commercial installations.
Bluetooth
Bluetooth has become a popular method of wireless audio transmission. Unfortunately, modern
Bluetooth implementations often incur significant latency, sometimes as high as 500 milliseconds. As a result,
Bluetooth technology is not recommended for audio output.
Environmental Modeling
HRTFs in conjunction with attenuation provide an anechoic model of three-dimensional sound, which exhibits
strong directional cues but tends to sound dry and artificial because it lacks room ambiance. To compensate for
this, we can add environmental modeling to mimic the acoustic effects of nearby geometry.
Reverberation and Reflections
As sounds travel through space, they reflect off of surfaces, creating a series of echoes. The initial distinct
echoes (early reflections) help us determine the direction and distance to a sound. As these echoes propagate,
diminish, and interact, they create a late reverberation tail, which contributes to our sense of space.
Artificial Reverberations
Since modeling physical walls and late reverberations can quickly become computationally expensive,
reverberation is often introduced via artificial, ad hoc methods such as those used in digital reverb units of
the 1980s and 1990s. While less computationally intensive than physical models, they may also sound unrealistic,
depending on the algorithm and implementation, especially since they are unable to take the listener's
orientation into account.
Sampled Impulse Response Reverberation
Convolution reverbs use an impulse response sampled from a specific real-world location such as a recording
studio, stadium, or lecture hall. That impulse response can then be applied to a signal later, resulting in a signal
that sounds as if it were played back in that location. This can produce some phenomenally lifelike sounds, but
there are some drawbacks. Sampled impulse responses rarely match in-game synthetic environments; they
represent a fixed listener position and orientation; they are monophonic; and it is difficult to transition between
them when moving between different areas.
Even with these limitations, they still provide high-quality results in many situations.
World Geometry and Acoustics
The shoebox model attempts to provide a simplified representation of an environment's geometry. It
assumes no occlusion, equal frequency absorption on all surfaces, and six walls (three parallel pairs) at fixed
distances from the listener's head. Needless to say, this is a heavy simplification for the sake of performance,
and as VR environments become more complex and dynamic, it may not scale properly.
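Early reflections in a shoebox room are often computed with the image-source method: each wall acts as a mirror, and a reflection is modeled as a copy of the source mirrored across that wall. This first-order sketch follows the shoebox simplifications described above (no occlusion, one frequency-independent reflectivity for all walls), but it is not a description of the Oculus Audio SDK's internals; the 343 m/s speed of sound and the 0.8 reflectivity are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def first_order_reflections(room, source, listener, reflectivity=0.8):
    """Delays and gains of the six first-order reflections in a shoebox.

    room = (Lx, Ly, Lz) with walls at 0 and L on each axis; source and
    listener are (x, y, z) points inside the room. Returns a list of
    (delay_s, gain) pairs using 1/r spreading loss.
    """
    results = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]  # mirror across the wall
            r = math.dist(image, listener)
            results.append((r / SPEED_OF_SOUND, reflectivity / r))
    return results


def direct_path(source, listener):
    """Delay and gain of the direct sound, for comparison."""
    r = math.dist(source, listener)
    return r / SPEED_OF_SOUND, 1.0 / r
```

Every reflection arrives after the direct sound, and the gap between the direct path and the earliest reflection is the initial time delay discussed earlier.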
Some solutions exist today to simulate diffraction and complex environmental geometry, but support is not
widespread and performance implications are still significant.
Environmental Transitions
Modeling a specific area is complex, but still relatively straightforward. Irrespective of the choice of model,
however, there is a problem of audible discontinuities or artifacts when transitioning between areas. Some
systems require flushing and restarting the entire reverberator, and other systems introduce artifacts as
parameters are changed in real-time.
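One common way to soften real-time parameter changes, offered here as an assumption rather than a technique this document prescribes, is to glide each parameter toward its new value instead of jumping. The hypothetical SmoothedParameter below applies a one-pole smoother per sample; crossfading between two reverb instances is an alternative when a parameter cannot be changed smoothly.

```python
import math


class SmoothedParameter:
    """Glide a parameter toward its target instead of jumping.

    Each call to next() moves the value a fixed fraction of the remaining
    distance (a one-pole smoother), so a sudden change in, say, reverb
    decay time or wet level becomes an exponential ramp lasting a few
    time constants.
    """

    def __init__(self, value, time_constant_s=0.05, sample_rate=48000.0):
        self.value = float(value)
        self.target = float(value)
        self.coeff = 1.0 - math.exp(-1.0 / (time_constant_s * sample_rate))

    def set_target(self, target):
        self.target = float(target)

    def next(self):
        self.value += self.coeff * (self.target - self.value)
        return self.value
```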
Presence and Immersion
By creating audio that is on par with high quality VR visuals, developers immerse the user in a true virtual world,
giving them a sense of presence.
Audio immersion is maximized when the listener is located inside the scene, as opposed to viewing it from
afar. For example, a 3D chess game in which the player looks down at a virtual board offers less compelling
spatialization opportunities than a game in which the player stands on the play field. By the same token, an
audioscape in which moving elements whiz past the listener's head with auditory verisimilitude is far more
compelling than one in which audio cues cut the listener off from the action by communicating that they're
outside of the field of activity.
Note: While the pursuit of realism is laudable, it is also optional, as we want developers and sound designers
to maintain creative control over the output.
Avoid Pure Tones
HRTFs work by filtering frequency content, and since pure tones lack that content, they are difficult to
spatialize with HRTFs.
Any glitches or discontinuities in the HRTF process will be more audible since there is no additional
frequency content to mask the artifacts. A moving sine wave will often bring out the worst in a spatialization
implementation.
Use Wide Spectrum Sources
For the same reasons that pure tones are poor for spatialization, broad spectrum sounds work well by providing
lots of frequencies for the HRTF to work with. They also help mask audible glitches that result from dynamic
changes to HRTFs, pan, and attenuation. In addition to a broad spectrum of frequencies, ensure that there is
significant frequency content above 1500 Hz, since this is used heavily by humans for sound localization.
Low-frequency sounds are difficult for humans to locate; this is why home theater systems use a monophonic
subwoofer channel. If a sound is predominantly low-frequency (rumbles, drones, shakes, et cetera), then you
can avoid the overhead of spatialization and use pan/attenuation instead.
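A quick authoring check can measure how much of a sound's energy lies above 1500 Hz and, when little does, fall back to panning. In this sketch, fraction_above, should_spatialize, and the 10% threshold are hypothetical choices, and constant_power_pan uses the common sine/cosine pan law so that perceived loudness stays roughly steady across the stereo field.

```python
import math

import numpy as np


def fraction_above(signal, sample_rate, cutoff_hz=1500.0):
    """Share of spectral energy above cutoff_hz (0.0 to 1.0)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sample_rate)
    total = spectrum.sum()
    if total <= 0:
        return 0.0
    return float(spectrum[freqs > cutoff_hz].sum() / total)


def should_spatialize(signal, sample_rate, min_fraction=0.1):
    """Hypothetical authoring check; the 10% threshold is an assumption."""
    return fraction_above(signal, sample_rate) >= min_fraction


def constant_power_pan(pan):
    """(left_gain, right_gain) for pan in [-1, 1]; left^2 + right^2 == 1."""
    angle = (pan + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)
```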
Avoid Real-time Format Conversions
Converting from one audio format to another can be costly and introduce latency, so sounds should be
delivered in the same output format (sampling rate and bit depth) as the target device. For most PCs, this will
be 16-bit, 44.1 kHz PCM, but some platforms may have different output formats (e.g. 16-bit, 48 kHz on Gear
VR).
Spatialized sounds are monophonic and should thus be authored as a single channel to avoid stereo-to-mono
merging at run-time (which can introduce phase and volume artifacts).
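The risk in merging at run time is easy to demonstrate: averaging two channels preserves a signal that is identical in both, but cancels content that is out of phase between them. Authoring a dedicated mono file lets the sound designer hear and fix such problems in advance. The toy example below uses a synthetic sine wave.

```python
import math


def downmix_to_mono(left, right):
    """Average two channels into one (what a naive run-time merge does)."""
    return [(l + r) * 0.5 for l, r in zip(left, right)]


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


n = 480
left = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
in_phase = downmix_to_mono(left, left)                    # level preserved
out_of_phase = downmix_to_mono(left, [-s for s in left])  # cancels to silence
```

Real stereo recordings fall between these extremes, so a run-time merge can change both level and timbre in ways that are hard to predict.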
If your title ships with non-native format audio assets, consider converting to native format at installation or load
time to avoid a hit at run-time.
Directional Sources
The Oculus Audio SDK does not support directional sound sources (speakers, human voice, car horns, et
cetera). However, higher level SDKs often model these using angle-based attenuation that controls the
tightness of the direction. This directional attenuation should occur before the spatialization effect.
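Angle-based attenuation of this kind is often expressed with an inner and an outer cone. The sketch below is a hypothetical implementation of that idea (it is not part of the Oculus Audio SDK, and the default angles and outer gain are assumptions): full gain inside the inner cone, a fixed reduced gain outside the outer cone, and a linear blend in between.

```python
import math


def cone_gain(source_forward, to_listener, inner_deg=60.0, outer_deg=180.0,
              outer_gain=0.25):
    """Angle-based attenuation for a directional source.

    source_forward: the direction the source faces; to_listener: vector
    from the source to the listener. Cone angles are full widths, so a
    60-degree inner cone covers 30 degrees either side of forward.
    """
    dot = sum(a * b for a, b in zip(source_forward, to_listener))
    norm = math.hypot(*source_forward) * math.hypot(*to_listener)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
    half_inner, half_outer = inner_deg / 2.0, outer_deg / 2.0
    if angle <= half_inner:
        return 1.0
    if angle >= half_outer:
        return outer_gain
    t = (angle - half_inner) / (half_outer - half_inner)
    return 1.0 + t * (outer_gain - 1.0)
```

The resulting gain would be applied to the mono signal before it is passed to the spatializer, as described above.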
Area Sources
The Oculus Audio SDK does not support area sound sources such as waterfalls, rivers, crowds, and so on.
Doppler Effect
The Doppler effect is the apparent change of a sound's pitch as its source approaches or recedes. VR
experiences can emulate this by altering the playback based on the relative speed of a sound source and the
listener; however, it is very easy to introduce artifacts inadvertently in the process.
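The classic Doppler formula for motion along the line between source and listener is f' = f (c + v_listener) / (c - v_source), where each speed is positive when that party moves toward the other. The sketch below computes the resulting pitch ratio; 343 m/s is an assumed speed of sound. Applying it usually means changing the playback rate, and smoothing the factor over time is one way to limit the artifacts mentioned above.

```python
SPEED_OF_SOUND = 343.0  # m/s (assumed)


def doppler_factor(listener_speed_toward_source, source_speed_toward_listener,
                   c=SPEED_OF_SOUND):
    """Pitch ratio f_heard / f_emitted for motion along the line between them.

    f' = f * (c + v_listener) / (c - v_source), with each speed positive
    when that party moves toward the other.
    """
    if source_speed_toward_listener >= c:
        raise ValueError("source at or above the speed of sound")
    return ((c + listener_speed_toward_source)
            / (c - source_speed_toward_listener))
```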
The Oculus Audio SDK does not have native support for the Doppler effect, though some high-level SDKs do.
Sound Transport Time
In the real world, sound takes time to travel, so there is often a noticeable delay between seeing and hearing
something. For example, you would see the muzzle flash from a rifle fired at you 100 meters away roughly
290 ms before you would hear it. Modeling propagation time incurs some additional complexity and may
paradoxically make things seem less realistic, as we are conditioned by popular media to believe that loud
distant actions are immediately audible.
The Oculus Audio SDK supports time-of-arrival.
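The delay itself is simple arithmetic: distance divided by the speed of sound (about 343 m/s in air at 20 °C, an assumed value here), so 100 m works out to roughly 0.29 s, or about 14,000 samples at 48 kHz. The sketch below shows only that calculation; it does not reflect how the Oculus Audio SDK exposes its time-of-arrival support.

```python
SPEED_OF_SOUND = 343.0  # m/s at about 20 °C (assumed)


def arrival_delay_s(distance_m, c=SPEED_OF_SOUND):
    """Propagation delay in seconds for a source at distance_m."""
    return distance_m / c


def delay_in_samples(distance_m, sample_rate=48000):
    """Whole-sample delay-line length for a source at distance_m."""
    return round(arrival_delay_s(distance_m) * sample_rate)
```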
Non-Spatialized Audio
Not all sounds need to be spatialized. Plenty of sounds are static or head-relative, such as:
User interface elements, such as button clicks, bleeps, transitions, and other cues
Background music
Narration
Body sounds, such as breathing or heartbeats
Such sounds should be segregated during authoring as they will probably be stereo, and during mixing so they
are not inadvertently pushed through the 3D positional audio pipeline.
Performance
Spatialization incurs a performance hit for each additional sound that must be placed in the 3D sound field. This
cost varies, depending on the platform. For example, on a high-end PC, it may be reasonable to spatialize 50+
sounds, while you may only be able to spatialize one or two sounds on a mobile device.
Some sounds may not benefit from spatialization even if placed in 3D in the world. For example, very low
rumbles or drones offer poor directionality and could be played as standard stereo sounds with some panning
and attenuation.
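Given a per-platform limit on spatialized voices, one possible policy, sketched here with hypothetical names and not taken from this document, is to route predominantly low-frequency sounds to plain pan/attenuation and give the remaining HRTF budget to the most audible voices.

```python
from dataclasses import dataclass


@dataclass
class Voice:
    name: str
    audibility: float      # e.g. gain after distance attenuation
    low_frequency: bool    # rumbles/drones that localize poorly


def assign_render_paths(voices, max_spatialized):
    """Split voices into HRTF-spatialized and pan/attenuate-only lists.

    Hypothetical policy: skip spatialization for predominantly
    low-frequency sounds, then give the HRTF budget to the most audible
    remaining voices; everything else falls back to stereo panning.
    """
    candidates = sorted((v for v in voices if not v.low_frequency),
                        key=lambda v: v.audibility, reverse=True)
    spatialized = candidates[:max_spatialized]
    chosen = {id(v) for v in spatialized}
    panned = [v for v in voices if id(v) not in chosen]
    return spatialized, panned
```

max_spatialized would be set per platform, for example much higher on a high-end PC than on a mobile device.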
Ambiance
Aural immersion with traditional non-VR games was often impossible since many gamers or PC users relied on
low-quality desktop speakers, home theaters with poor environmental isolation, or gaming headsets optimized
for voice chat.
With headphones, positional tracking, and full visual immersion, it is now more important than ever that sound
designers focus on the user's audio experience.
This means:
Audible Artifacts
As a 3D sound moves through space, different HRTFs and attenuation functions may become active, potentially
introducing discontinuities at audio buffer boundaries. These discontinuities will often manifest as clicks, pops
or ripples. They may be masked to some extent by reducing the speed of traveling sounds and by ensuring that
your sounds have broad spectral content.
Latency
While latency affects all aspects of VR, it is often viewed as a graphical issue. However, audio latency can be
disruptive and immersion-breaking as well. Depending on the speed of the host system and the underlying
audio layer, the latency from buffer submission to audible output may be as short as 2 ms on high-performance
PCs using high-end, low-latency audio interfaces, or, in the worst case, as long as hundreds of milliseconds.
High system latency becomes an issue as the relative speed between an audio source and the listener's head
increases. In a relatively static scene with a slow-moving viewer, audio latency is harder to detect.
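The link between latency and relative speed can be made concrete. During the latency interval a turning head (or a moving source) changes the true direction of the sound, so the rendered direction lags by roughly angular speed multiplied by latency. A small worked sketch follows; the 200 deg/s head-turn speed is an illustrative assumption, not a measured figure.

```python
def direction_lag_deg(angular_speed_deg_s: float, latency_ms: float) -> float:
    """Approximate angular error of a rendered source caused by audio latency."""
    return angular_speed_deg_s * latency_ms / 1000.0


# A brisk head turn of 200 deg/s, across the latency range mentioned above.
for latency in (2.0, 20.0, 200.0):
    print(f"{latency:>5.0f} ms -> {direction_lag_deg(200.0, latency):.1f} deg")
```

At 2 ms the error is under half a degree, while at 200 ms the source appears to lag by tens of degrees, which is easily noticed.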
Effects
Effects such as filtering, equalization, distortion, flanging, and so on can be an important part of the virtual
reality experience. For example, a low pass filter can emulate the sound of swimming underwater, where high
frequencies lose energy much more quickly than in air, or distortion may be used to simulate disorientation.
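As an illustration of the underwater example, a one-pole low-pass filter attenuates high frequencies with very little code. This is a minimal sketch operating on a list of float samples. The 800 Hz cutoff is an arbitrary illustrative value, and production middleware would typically use a steeper filter.

```python
import math


def one_pole_lowpass(samples: list[float], cutoff_hz: float, sample_rate_hz: float) -> list[float]:
    """y[n] = y[n-1] + a * (x[n] - y[n-1]), with a derived from the cutoff frequency."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out = []
    y = 0.0
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out


# Hypothetical usage: dull a 5 kHz tone while the player is underwater.
block = [math.sin(2 * math.pi * 5000 * n / 48000) for n in range(480)]
underwater = one_pole_lowpass(block, cutoff_hz=800.0, sample_rate_hz=48000.0)
```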
VR Audio Glossary
Definitions of technical VR audio terms.
Anechoic
Attenuation
Direct Sound: Sound that has traveled directly to the listener without reflecting (versus reverberant sound).
Early Reflections: Reflected sounds that arrive relatively soon at a listener's location (i.e., before late reflections).
Head-Related Impulse Response (HRIR)
Head-Related Transfer Function (HRTF)
Head Shadowing: The attenuation of sound caused by the head lying between an ear and the sound source.
Initial Time Delay: The interval between the arrival of a direct sound and its first reflection.
Interaural Level Difference (ILD)
Interaural Time Difference (ITD): The length of the interval between when a sound arrives at the first ear and when it arrives at the second ear.
Late Reflections: Reflected sounds that arrive relatively late at a listener's location (i.e., after early reflections).
Motion Parallax: When moving objects are farther from a perceiver, their apparent speed of travel decreases; for example, a moving airplane on the horizon appears to be traveling more slowly than a nearby car. The apparent rate of travel of an object can therefore be used as a distance cue.
Pinnae: The visible portion of the ear that lies outside the head.
Reverberant Sound: Sound that has reflected or reverberated before arriving at a listener's location (versus direct sound).
Reverberation
Sound Localization
Requirements
Head Tracking
By tracking the listener's head position and orientation, we can achieve accurate 3D sound spatialization. As
the listener moves or rotates their head, they perceive the sound as remaining at a fixed location in the virtual
world.
Developers may pass Oculus PC SDK ovrPosef structures to the Oculus Audio SDK for head tracking support.
Alternatively, they can pass listener-space sound positions and no pose information for the same effect.
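Passing listener-space positions means the application performs the transform itself: subtract the head position, then rotate by the inverse of the head orientation. This is a minimal sketch. The `Pose` class only loosely mirrors the position/orientation pair carried by an `ovrPosef` and is hypothetical. Quaternions are assumed to be unit length in (x, y, z, w) order, and axis conventions should be checked against the SDK you use.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]
Quat = tuple[float, float, float, float]  # (x, y, z, w), assumed unit length


@dataclass
class Pose:
    position: Vec3     # head position in world space, meters
    orientation: Quat  # head orientation in world space


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate(v: Vec3, q: Quat) -> Vec3:
    """Rotate v by unit quaternion q: v' = v + 2w(u x v) + 2u x (u x v)."""
    u = q[:3]
    w = q[3]
    t = _cross(u, v)
    tt = _cross(u, t)
    return tuple(v[i] + 2.0 * w * t[i] + 2.0 * tt[i] for i in range(3))


def to_listener_space(head: Pose, world_pos: Vec3) -> Vec3:
    """Express a world-space sound position relative to the listener's head."""
    rel = tuple(world_pos[i] - head.position[i] for i in range(3))
    x, y, z, w = head.orientation
    return rotate(rel, (-x, -y, -z, w))  # conjugate equals inverse for a unit quaternion
```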
Headphones
The Oculus Audio SDK assumes that the end user is wearing headphones, which provide better isolation,
privacy, portability, and spatialization than free-field speaker systems. When combined with head tracking and
spatialization technology, headphones deliver an immersive sense of presence. For more on the advantages
and disadvantages of headphones for virtual reality, please refer to Listening Devices in Introduction to Virtual
Reality Audio.
Features
This section describes the features supported by the Oculus Audio SDK.
Supported Features
Unsupported Features
There are other aspects of a high-quality audio experience; however, these are often more appropriately
implemented by the application programmer or a higher-level engine.
Occlusion
Sounds interact with a user's environment in many ways. Objects and walls may obstruct, reflect, or propagate
a sound through the virtual world. The Oculus Audio SDK only supports direct reflections and does not factor in the
virtual world geometry. This problem needs to be solved at a higher level than the Oculus Audio SDK due to
the requirements of scanning and referencing world geometry.
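A common higher-level approach is to test the line segment from listener to source against simplified scene geometry and, when the segment is blocked, lower the gain (and often the cutoff of a low-pass filter). This is a minimal sketch using the slab method against axis-aligned boxes. The box list and the occluded gain value are hypothetical and would come from your engine.

```python
Vec3 = tuple[float, float, float]
Box = tuple[Vec3, Vec3]  # (min corner, max corner), axis-aligned


def segment_hits_box(p0: Vec3, p1: Vec3, box: Box) -> bool:
    """Slab test: does the segment p0->p1 intersect the axis-aligned box?"""
    lo, hi = box
    t_enter, t_exit = 0.0, 1.0
    for axis in range(3):
        d = p1[axis] - p0[axis]
        if abs(d) < 1e-12:
            # Segment parallel to this slab: it must already lie inside it.
            if p0[axis] < lo[axis] or p0[axis] > hi[axis]:
                return False
            continue
        t0 = (lo[axis] - p0[axis]) / d
        t1 = (hi[axis] - p0[axis]) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_enter = max(t_enter, t0)
        t_exit = min(t_exit, t1)
        if t_enter > t_exit:
            return False
    return True


def occlusion_gain(listener: Vec3, source: Vec3, occluders: list[Box],
                   occluded_gain: float = 0.3) -> float:
    """Return 1.0 for a clear path, occluded_gain if any box blocks it."""
    blocked = any(segment_hits_box(listener, source, b) for b in occluders)
    return occluded_gain if blocked else 1.0
```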
Doppler Effect
The Doppler effect is the perceived change in pitch that occurs when a sound source is moving at a rapid rate
towards or away from a listener, such as the pitch change that is perceived when a car whooshes by. It is often
emulated in middleware with a simple change in playback rate by the sound engine.
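The playback-rate approach can be sketched with the classical Doppler formula for a medium at rest: rate = (c + v_listener) / (c - v_source), where each speed is measured along the line between the two and is positive when moving toward the other. This is a minimal sketch under those assumptions (no wind, speeds well below c). The function and parameter names are illustrative.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at about 20 °C


def doppler_rate(src_pos, src_vel, lst_pos, lst_vel, c=SPEED_OF_SOUND_M_S):
    """Playback-rate multiplier for a moving source heard by a moving listener."""
    dx = [l - s for s, l in zip(src_pos, lst_pos)]
    dist = math.sqrt(sum(d * d for d in dx))
    if dist == 0.0:
        return 1.0
    unit = [d / dist for d in dx]  # direction from source to listener
    v_src_toward = sum(v * u for v, u in zip(src_vel, unit))
    v_lst_toward = -sum(v * u for v, u in zip(lst_vel, unit))
    # Clamp so a source near the speed of sound cannot give a huge or negative rate.
    denom = max(c - v_src_toward, 1e-3 * c)
    return max(c + v_lst_toward, 0.0) / denom
```

For example, a source approaching a stationary listener at 10% of the speed of sound plays back about 11% faster.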
Creative Effects
Effects such as equalization, distortion, flanging, and so on can be used to great effect in a virtual reality
experience. For example, a low pass filter can emulate the sound of swimming underwater, where high
frequencies lose energy much faster than in air, distortion may be used to simulate disorientation, a narrow
bandpass equalizer can give a 'radio' effect on sound sources, and so on.
The Oculus Audio SDK does not provide these effects. Typically, these would be applied by a higher level
middleware package either before or after the Audio SDK, depending on the desired outcome. For example, a
low-pass filter might be applied to the master stereo buffers to simulate swimming underwater, but a distortion
effect may be applied pre-spatialization for a broken radio effect in a game.
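The ordering described above can be sketched as a mono pre-spatialization stage followed by a stereo post-spatialization stage. In this sketch `spatialize` is a deliberately crude constant-power pan standing in for the real HRTF spatializer (it is not the Oculus algorithm), the distortion is a simple hard clip, and all names are hypothetical.

```python
import math


def hard_clip(mono: list[float], drive: float) -> list[float]:
    """Pre-spatialization distortion (e.g., a broken radio) on the mono source."""
    return [max(-1.0, min(1.0, s * drive)) for s in mono]


def spatialize(mono: list[float], azimuth_rad: float) -> tuple[list[float], list[float]]:
    """Placeholder spatializer: constant-power pan, -pi/2 = full left, +pi/2 = full right."""
    theta = (azimuth_rad + math.pi / 2) / 2  # maps azimuth to [0, pi/2]
    gl, gr = math.cos(theta), math.sin(theta)
    return [s * gl for s in mono], [s * gr for s in mono]


def lowpass(x: list[float], a: float) -> list[float]:
    """One-pole low-pass applied to one channel of the stereo mix."""
    y, out = 0.0, []
    for s in x:
        y += a * (s - y)
        out.append(y)
    return out


def render(mono, azimuth_rad, radio_broken, underwater):
    src = hard_clip(mono, drive=8.0) if radio_broken else mono  # before spatialization
    left, right = spatialize(src, azimuth_rad)
    if underwater:  # after spatialization, on the stereo output
        left, right = lowpass(left, 0.1), lowpass(right, 0.1)
    return left, right
```

Distorting before spatialization keeps the effect attached to the source's position, while filtering the stereo output affects everything the listener hears equally.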
Area and Directional Sound Sources
The Oculus Audio SDK supports monophonic point sources. When a sound is specified, it is assumed that the
waveform data represents the sound as audible to the listener. It is up to the caller to attenuate the incoming
sound to reflect speaker directional attenuation (e.g., someone speaking while facing away from the listener)
and to account for area sources such as waterfalls or rivers.
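Both cases can be approximated by the caller before the sound reaches the SDK. This is a minimal sketch under stated assumptions. A speaker's facing is modeled with a simple inner/outer cone gain (similar in spirit to cone attenuation in other audio APIs), and a river is treated as a line segment whose closest point to the listener becomes the point-source position. The names and default angles are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cone_gain(src_pos: Vec3, src_forward: Vec3, listener: Vec3,
              inner_deg: float = 90.0, outer_deg: float = 300.0,
              outer_gain: float = 0.4) -> float:
    """1.0 inside the inner cone, outer_gain outside the outer cone, linear in between."""
    to_l = _sub(listener, src_pos)
    norm = math.sqrt(_dot(to_l, to_l)) * math.sqrt(_dot(src_forward, src_forward))
    if norm == 0.0:
        return 1.0
    cos_a = max(-1.0, min(1.0, _dot(to_l, src_forward) / norm))
    angle = math.degrees(math.acos(cos_a)) * 2.0  # full cone angle that contains the listener
    if angle <= inner_deg:
        return 1.0
    if angle >= outer_deg:
        return outer_gain
    t = (angle - inner_deg) / (outer_deg - inner_deg)
    return 1.0 + t * (outer_gain - 1.0)


def closest_point_on_segment(a: Vec3, b: Vec3, p: Vec3) -> Vec3:
    """Point on segment a-b nearest to p, used as the position of an area source."""
    ab = _sub(b, a)
    denom = _dot(ab, ab)
    t = 0.0 if denom == 0.0 else max(0.0, min(1.0, _dot(_sub(p, a), ab) / denom))
    return (a[0] + t * ab[0], a[1] + t * ab[1], a[2] + t * ab[2])
```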
Platform Notes
The Oculus Audio SDK currently supports Windows 7+, Android (Gear VR), Mac OS X, and Linux. This section
covers issues that may arise with different versions.
Component       Windows   Gear VR / Android   Mac     Linux
C/C++ SDK       yes       yes                 yes     yes
Wwise plugin    yes       TBD                 TBD     no
Unity plugin    yes       yes                 yes     TBD
FMOD plugin     yes       yes                 yes     no
VST plugin      yes       no                  yes     no
AAX plugin      soon      no                  soon    no
Windows: The Oculus Audio SDK supports Windows 7 and later, both 32 and 64-bit.
Mac: The Oculus Audio SDK supports Mac OS X 10.9 and later, 32 and 64-bit.
Linux: The Oculus Audio SDK has been ported to Linux Ubuntu 14 (64-bit).
Android/Gear VR: The Oculus Audio SDK supports Android phones on the Gear VR platform.
Middleware Support
Very few Oculus developers will use the Oculus Audio C/C++ SDK directly. Most developers use a middleware
framework, such as Audiokinetic Wwise or FMOD, and/or an engine such as Unity or Epic's Unreal. For this
reason, we support middleware packages and engines commonly used by developers.
Audiokinetic Wwise: Oculus provides a Wwise compatible plugin for Windows. More information about this
plugin can be found in the Oculus Spatializer for Wwise Integration Guide.
FMOD: The Oculus Audio SDK supports FMOD on the Windows, Mac and Android platforms. More
information about this plugin can be found in the Oculus Spatializer for FMOD Integration Guide.
Unity3D: The Oculus Audio SDK supports Unity 4.6 on Android, Mac OS X and Windows. More information
about this plugin can be found in the Oculus Spatializer for Unity Integration Guide.
Unreal Engine: Epic's Unreal Engine 4 supports numerous different audio subsystems. The Wwise
integration (available directly from Audiokinetic) has been tested with our Wwise Spatializer plugin (see
above).
Overview
This guide describes how to install and use the Oculus Native Spatializer plugin in Unity 5.2+ and in end-user
applications.
The Oculus Native Spatializer Plugin (ONSP) is an add-on plugin for Unity that allows monophonic sound
sources to be spatialized in 3D relative to the user's head location.
The Oculus Native Spatializer is built on Unity's Native Audio Plugin, which removes redundant spatialization
logic and provides a first-party HRTF. It supersedes our previous Oculus Spatializer Plugin for Unity (available
prior to Audio SDK 1.0), which used scripts to set settings on an Audio Source.
Our ability to localize audio sources in three-dimensional space is a fundamental part of how we experience
sound. Spatialization is the process of modifying sounds to make them localizable, so they seem to originate
from distinct locations relative to the listener. It is a key part of creating a sense of presence in virtual reality
games and applications.
For a detailed discussion of audio spatialization and virtual reality audio, we recommend reviewing our
Introduction to Virtual Reality Audio guide before using the Oculus Native Spatializer. If you're unfamiliar with
Unity's audio handling, be sure to review the Unity Audio guide.
Note: Our previous OSP for Unity is now available as the Legacy Oculus Spatializer, and is intended
primarily for users of Unity 4.
Updating to Oculus Native Spatializer for Unity from previous OSP for Unity Versions
1. Note the settings used in OSPManager in your project.
2. Replace OSPAudioSource.cs (from previous OSP) on AudioSources with OculusSpatializerUserParams.cs in
<project>/Assets/OSP.
3. Set the appropriate values previously used in OSPManager in the plugin effect found on the mixer channel.
Note that the native plugin adds functionality, so you will need to adjust to this new set of parameters.
4. Remove OSPManager from the project by deleting OSPManager*.* from <project>/Assets/OSP except
your newly-added OculusSpatializerUserParams.cs.
5. Verify that OculusSpatializer is set in the Audio Manager and that Spatialization is enabled for that voice.
All functions such as Play, Stop, et cetera, that used to be on the previous OSP no longer exist. Instead, use
the functions on AudioSource to start, stop, and modify sounds as required.
This simple scene includes a red ball and a green ball, which illustrate different spatializer settings. A looping
electronic music track is attached to the red ball, and a short human voice sequence is attached to the green
ball.
Launch the scene in the Unity Game View, navigate with the arrow keys, and control the camera orientation
with your mouse to quickly hear the spatialization effects.
To import and open RedBallGreenBall:
1.
2.
3.
4.
3. In Build Settings
4.
5.
6.
7.
Applying Spatialization
Attach the helper script ONSPAudioSource.cs, found in Assets/OSPNative/scripts, to an AudioSource. This
script accesses the extended parameters required by the Oculus Native Spatializer. Note that all parameters
native to an AudioSource are still available, though some values may not be used if spatialization is enabled on
the audio source.
In this example, we look at the script attached to the green sphere in our sample RedBallGreenBall:
OculusSpatializerUserParams Properties
Enable Spatialization
Effectively, a bypass button for spatialization. If disabled, the attached Audio Source
will act as a native audio source without spatialization. This setting is linked to the
corresponding parameter in the Audio Source expandable pane (collapsed in the above
capture).
Disable Reflections
Select to disable early reflections and reverb for the spatialized audio source.
To use early reflections and reverb, you must deselect this value and add an
OculusSpatializerReflection plugin to the channel where you send the AudioSource in
the Audio Mixer. See Audio Mixer Setup below for more details.
Gain
Adds up to 24 dB of gain to the audio source volume, with 0 dB equal to unity gain.
If selected, the audio source will use an internal attenuation falloff curve controlled by
the Near and Far parameters. If deselected, the attenuation falloff will be controlled by
the authored Unity Volume curve within the Audio Source Inspector panel.
Note: We strongly recommend enabling internal attenuation falloff for a more accurate
rendering of spatialization. The internal curves match both the direct audio falloff
and the way early reflections and reverb are modeled.
Falloff Near
Sets the point at which the audio source starts attenuating, in meters. It also influences
the reflection/reverb system, whether or not internal inverse square attenuation is
enabled. Larger values will result in less noticeable attenuation when the listener is near
the sound source.
Falloff Far
Sets the point at which the audio source reaches full volume attenuation, in meters.
It also influences the reflection/reverb system, whether or not internal inverse square
attenuation is enabled. Larger values allow for loud sounds that can be heard from a
distance.
E.Rflt Rev On
Set to any non-zero value to enable global reverb (requires early reflection (E.Rflt On) to
be enabled).
Room X/Y/Z
Sets the dimensions of the theoretical room used to calculate reflections, in meters. The
greater the dimensions, the further apart the reflections. Range: 0 - 200 m.
Sets the percentage of sound reflected by each respective wall. At 0, the reflection is
fully absorbed. At 1.0, the reflection bounces from the wall without any absorption.
Capped at 0.97 to avoid feedback.
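Two of the numeric settings above can be illustrated with a little arithmetic. A gain in dB maps to a linear amplitude factor of 10^(dB/20), so the 24 dB maximum is roughly a 15.8x boost. For Falloff Near/Far, the sketch assumes an inverse-square intensity law (amplitude proportional to Near/distance) between the two distances and silence at or beyond Far. This is an approximation for intuition, not the plugin's exact internal curve, and the function names are hypothetical.

```python
def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude multiplier (0 dB = unity)."""
    return 10.0 ** (gain_db / 20.0)


def falloff_gain(distance_m: float, near_m: float, far_m: float) -> float:
    """Approximate distance attenuation (assumed curve; see the note above)."""
    if distance_m <= near_m:
        return 1.0              # no attenuation inside the Near radius
    if distance_m >= far_m:
        return 0.0              # simplification: hard cutoff at Far
    return near_m / distance_m  # amplitude ~ 1/r, i.e. intensity ~ 1/r^2


print(round(db_to_linear(24.0), 2))  # 15.85
```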
RedBallGreenBall Example
To see how this works in RedBallGreenBall, access the Audio Mixer by selecting the Audio Mixer tab in your
Project View. Then select Master under Groups.
Select the green sphere in your Scene View. Note that the Output of the attached Audio Source vocal1 is set to
our Master (SpatializerMixer):
You can now set reflection/reverberation settings to globally affect spatialized voices:
Overview
The Legacy Oculus Spatializer Plugin (OSP) is an add-on plugin for the Unity tool set that allows monophonic
sound sources to be spatialized in 3D relative to the user's head location. This integration guide outlines how to
install and use OSP in the Unity editor and end-user application.
Note: Developers using Unity 5 should use the Oculus Native Spatializer Plugin. The Legacy Oculus
Spatializer Plugin is provided for Unity 4 development.
Version Compatibility
This integration has been tested with Unity 5.1 on Windows 7 / 8 (32 and 64-bit), as well as on Mac OS X
10.8.5 and Android 4.4.4 (KitKat). It is also known to work with Unity 4.5 and later.
Both Unity Professional and Free are supported.
General OSP Limitations
1. CPU usage increases when early reflections are turned on, and increases proportionately as room
dimensions become larger.
Navigate to https://unity3d.com.
Click Download.
Click Looking for a patch releases?
Select Unity v4.6.1p4 or higher.
Android drivers; you may hear more buffer over-run issues depending on the complexity of your audio
scene. We recommend treating OpenSL as a preview of low-latency audio on mobile VR.
For up-to-date information about OpenSL support in Unity, please refer to the Oculus Unity Integration Guide.
Once installed, you will see the following folders in your project:
OSP
OSPTestScene
Plugins: includes the OculusSpatializerPlugins for various platforms.
To add spatialization to audio in Unity, start with the prefabs OSPManager and OSPAudioSource in /OSP/
Prefabs.
OSPManager
Add OSPManager to a scene to initialize the spatialization engine. OSPManager contains global properties that
the application can use to change the characteristics of the spatializer.
OSPManager Properties
Bypass
Disables spatialization. All sounds routed through OSPAudioSource will get Unity native
2D panning.
Global Scale
The scale of positional values fed into the spatializer must be set to meters. Some
applications have different scale values assigned to a unit. For such applications, use
this field to adjust the scale for the spatializer. Unity defaults to 1 meter per unit.
Example: for an application with unit set to equal 1 cm, set the Global Scale value to
0.01 (1 unit = 0.01 m).
Gain
Sets a global gain to all spatialized sounds. Because the spatializer attenuates the
overall volume, it is important to adjust this value so spatialized sounds play at the same
volume level as non-spatialized (or native panning) sounds.
Allows early reflections to be turned on. Early reflections enhance the spatialization
effect but incur a CPU hit.
Reverb On
Adds a fixed reverb tail to the output when enabled. This reverb is calculated based on
the room parameters that are set up to calculate the reflections. While its use does not
necessarily incur a CPU hit, changing the room parameters may cause CPU hitches to
occur.
Room Dimensions
Sets the dimensions of the theoretical room used to calculate reflections. The greater
the dimensions, the further apart the reflections. Room size currently caps at the range
shown in the descriptor.
Reflection Values
Sets the percentage of sound reflected by each wall (Left/Right, Forward/Backward, Up/
Down). At 0, the reflection is fully absorbed. At 1.0, the reflection bounces from the wall
without any absorption. Caps below 1.0 to avoid feedback.
Reflection Gain
This value adjusts the gain on the reflection signal (both early reflections and reverb).
It is expressed in meters and represents how far the reflection signal would travel before
attenuating to zero. The greater the number, the louder the reflection signal will be in
the final output.
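The Global Scale arithmetic is a single multiplication from application units to meters. The following minimal sketch uses the 1 unit = 1 cm example; the function name is hypothetical.

```python
def to_spatializer_meters(position_units, global_scale):
    """Convert a position in application units to meters for the spatializer."""
    return tuple(c * global_scale for c in position_units)


# An application where 1 unit = 1 cm uses Global Scale = 0.01:
print(to_spatializer_meters((250.0, 0.0, -1000.0), 0.01))  # (2.5, 0.0, -10.0)
```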
OSPAudioSource
Use OSPAudioSource to create sounds that will be spatialized. OSPAudioSource consists of a native Unity
Audio Source component and an OSPAudioSource script component that interacts with the spatializer.
OSPAudioSource Properties
OSP works as an add-on to the native Unity Audio Source component. It derives much of its functionality from
the Audio Source component, including the distance attenuation curves.
Note: When triggering the sounds manually, you must call the OSPAudioSource component Play and
Stop functions. Please look at the OSPAudioSource.cs script for a list of the public functions available to
control a spatialized sound source.
A few caveats are detailed in Notes and Best Practices below.
Bypass
Use this toggle if you want the audio to skip 3D spatialization and play via native 2D
Unity panning. This feature can be toggled for convenience during run-time, and may
make integration of OSP Audio Sources into existing audio managers easier and more
flexible.
Play On Awake
We recommend that you use this toggle in the OSPAudioSource instead of the one
found in the native Unity AudioSource component. Enabling Play On Awake in the
native AudioSource component will work, but you may hear a hiccup in the sound as
the spatializer tries to obtain spatialization resources.
Disable Reflections
If set to true, this sound instance will not calculate reflections/reverberation. Use this
for sounds that do not need to have reflections enabled. Reflections take up extra CPU,
so this is a good way to reduce the overall audio CPU cost. Note that if reflections are
disabled globally within OSPManager, this setting will have no effect.
It is important not to modify these values while sound is running through the spatializer, or it will dramatically
affect the output of the spatialization.
Please ensure that you check the 3D Sound check box in audio asset import settings (Unity 4 only; Unity 5 does
not include this option).
Currently, only monophonic sounds will spatialize. A stereo sound will not be collapsed down to a monophonic
sound (even if it is set by the user in Unity as a 3D sound). As such, a spatialized sound may not use stereo
spread to make a sound encompass the listener as it gets closer, which is a common practice in current
spatialization techniques.
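Because stereo assets are not collapsed automatically, one option is to downmix them to mono before assigning them to a spatialized source. This is a minimal sketch on interleaved float samples with a hypothetical helper name. Averaging the two channels is a common convention that avoids clipping, but it can cause comb-filtering on material whose channels differ in phase.

```python
def downmix_to_mono(interleaved: list[float]) -> list[float]:
    """Average L/R pairs of an interleaved stereo buffer into a mono buffer."""
    if len(interleaved) % 2:
        raise ValueError("interleaved stereo buffer must have an even length")
    return [(interleaved[i] + interleaved[i + 1]) * 0.5
            for i in range(0, len(interleaved), 2)]
```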
Overview
The Oculus Spatializer Plugin (OSP) is an add-on plugin for the Audiokinetic Wwise tool set that allows
monophonic sound sources to be spatialized in 3D relative to the user's head location. This integration guide
describes how to install and use OSP in both the Wwise application and the end-user application.
Version Compatibility
Two distributions of the Wwise plugin are provided. Please choose the appropriate one, based on the version
of Wwise used by your application:
AudioSDK\Plugins\Wwise
AudioSDK\Plugins\Wwise2015
Installation
Copy the files found in the folder <platform>\bin\plugins to the folder Wwise\Authoring\<platform>\Release\bin\plugins.
Note: "platform" = "Win32" or "x64".
Note: If the Mixer Plug-in tab is not visible, hit the "+" tab and verify that mixer plugins are enabled
(check box is selected) for buses.
Under the Mixer Plug-in tab, click the "..." button at the right-hand side of the property window. This will
open up the Effect Editor (global properties) for OSP:
Global Properties
The following properties are found within the OSP effect editor:
Bypass Use native
panning
Disables spatialization. All sounds that are routed through this bus will get Wwise native
2D panning.
Gain (+/-24 dB)
Sets a global gain to all spatialized sounds. Because the spatializer attenuates the
overall volume, it is important to adjust this value so spatialized sounds play at the same
volume level as non-spatialized (or native panning) sounds.
Global Scale
The scale of positional values fed into the spatializer must be set to meters. Some
applications have different scale values assigned to a unit. For such applications, use
this field to adjust the scale for the spatializer. Unity defaults to 1 meter per unit.
(1 unit = 1 m)
Example: for an application with unit set to equal 1 cm, set the Global Scale value to
0.01 (1 unit = 0.01 m).
Enable Reflections
Allows early reflections to be turned on. This greatly enhances the spatialization effect,
but incurs a CPU hit.
Reverb On
If this field is set, a fixed reverberation calculated from the early reflection room size
and reflection values will be mixed into the output (see below). This can help diffuse the
output and give a more natural sounding spatialization effect. Note that if this is turned
on, you may encounter CPU hitching when changing the Room Size and Reflection
Values parameters at run-time. Enable Reflections must be on for reverberations to be
applied.
Reflections Range Max [100 - 100000]
Range of the attenuation curve for reflections and late reverberation. This is the distance
at which the reflections go silent, so it should roughly match the attenuation curve in
Wwise.
Room Size X/Y/Z [1 - 200 m]
Sets the dimensions of the theoretical room used to calculate reflections. The greater
the dimensions, the further apart the reflections. Value range is 1-200 meters for each
X/Y/Z component.
Reflection Values [0 - 0.97]
Sets the percentage of sound reflected by walls for each wall specified for a room
(Left/Right, Forward/Backward, Up/Down). At 0, the reflection is fully absorbed. At 1.0,
the reflection bounces from the wall without any absorption. Capped at 0.97 to avoid
feedback.
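To build intuition for the room parameters, the sketch below computes first-order image sources for a shoebox room, assuming the room is centered on the listener. It uses the textbook image-source method. This is an assumption for illustration, not a description of the Oculus implementation. Larger rooms push the images farther away, which lengthens the reflection delays, and each reflection coefficient is clamped to 0.97 as described above.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at about 20 °C
MAX_REFLECTION = 0.97       # documented cap to avoid feedback


def first_order_reflections(source, room_size, reflections):
    """Return (image_position, delay_s, amplitude) for each of the six walls.

    source: (x, y, z) in meters, relative to the listener at the room center.
    room_size: (X, Y, Z) room dimensions in meters.
    reflections: ((x-, x+), (y-, y+), (z-, z+)) wall reflection coefficients.
    """
    images = []
    for axis in range(3):
        half = room_size[axis] / 2.0
        for side, wall in ((0, -half), (1, half)):
            coeff = min(max(reflections[axis][side], 0.0), MAX_REFLECTION)
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]  # mirror the source across the wall plane
            dist = math.sqrt(sum(c * c for c in image))
            amp = coeff / max(dist, 1e-6)  # 1/r spreading times wall reflection
            images.append((tuple(image), dist / SPEED_OF_SOUND_M_S, amp))
    return images
```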
Sound Properties
For a sound to be spatialized, you must ensure that sounds are set to use the bus to which you added OSP:
Upon setting the sound to the OSP bus, a Mixer plug-in tab will show up on the sound's Sound Property Editor:
Disable Reflections
Applies a realistic built-in inverse square attenuation curve for this sound.
The spatializer assumes that only one listener is being used to drive the spatialization. The listener is
equivalent to the user's head location in space, so be sure to update it as frequently as possible. See Wwise
documentation for any caveats on placing a listener update to an alternative thread from the main thread.
Provided that the listener and sounds are properly updated within the application, the sounds that have been
set to the OSP bus will have a greater sense of 3D presence!
Overview
The Oculus Spatializer Plugin (OSP) is an add-on plugin for FMOD Studio that allows monophonic sound
sources to be properly spatialized in 3D relative to the user's head location. This integration guide outlines how
to install and use OSP in both FMOD Studio and the end-user application.
General OSP Limitations
1. CPU usage increases when early reflections are turned on, and increases proportionately as room
dimensions become larger.
Adding the Oculus Spatializer Plugin to your FMOD Studio project
For FMOD Studio 1.07 and later:
1. Add the 64-bit version of ovrfmod.dll to Program Files\FMOD SoundSystem\FMOD Studio 1.07.00\plugins.
The folders Win32 and x64 contain the plugin DLL for Windows 32-bit and 64-bit, respectively.
1. Create a directory named Plugins in your FMOD Studio project directory, if one does not already exist.
2. Copy 32-bit ovrfmod.dll into that directory. (FMOD Studio is a 32-bit application.)
Make sure that your Project Output Format is set to stereo in FMOD Studio (Edit > Preferences > Format >
Stereo).
Note that the theoretical room used to calculate reflections follows the listener's position and rotates with the
listener's orientation. Future implementation of early reflections will allow for the listener to freely walk around a
static room.
When using early reflections, be sure to set non-symmetrical room dimensions. A perfectly cubical room may
create reinforcing echoes that can cause sounds to be poorly spatialized. The room size should roughly match
the size of the room in the game so the audio reinforces the visuals. The shoebox model works best when
simulating rooms. For large spaces and outdoor areas, it should be complemented with a separate reverb.
Parameters
Sound Properties
Prefer mono sounds and/or set the master track input format to mono (by right-clicking
on the metering bars on the left side).
Attenuation
Enables the internal distance attenuation model. If attenuation is disabled, you can
create a custom attenuation curve using a volume automation on a distance parameter.
Range Max
Disable Reflections
If set to true, this sound instance will not calculate reflections/reverberation. Use this
for sounds that do not need to have reflections enabled. Reflections take up extra CPU,
so this is a good way to reduce the overall audio CPU cost. Note that if reflections are
disabled globally through the OSP FMOD API, this setting will have no effect.
OSP_FMOD_SetEarlyReflectionsEnabled
OSP_FMOD_SetLateReverberationEnabled
OSP_FMOD_SetGlobalScale
OSP_FMOD_SetSimpleBoxRoomParameters
OSP_FMOD_SetBypass
OSP_FMOD_SetGain
(for 64 bit)
fmodSystem.loadPlugin("ovrfmod64", out handle);
Overview
This integration guide describes how to install and use the Oculus Spatializer VST.
The Oculus Spatializer VST plugin is an add-on plugin for professional Digital Audio Workstations (DAWs). It
allows for monophonic sound sources to be properly spatialized in 3D relative to the user's head location.
The VST plugin incorporates the same spatialization algorithms found in our other plugin formats (Wwise,
FMOD, and Unity). These other formats are typically used to generate real-time spatialization within virtual
environments. For the audio designer, the VST plugin comes in handy for setting up mixes within your favorite
DAW, and for hearing what the mix will sound like prior to being spatialized in virtual reality.
Version Compatibility
This VST has been tested with various DAWs on Windows 7 and 8.1 (32-bit and 64-bit) and Mac OS X 10.8.5+.
For more information, please see DAW-Specific Notes.
General OSP Limitations
1. CPU usage increases when early reflections are turned on, and increases proportionately as room
dimensions become larger.
Limitations Specific to VST
1. All parameters are assigned to MIDI controllers. However, most parameter ranges fall outside of the current
MIDI mapping range of 0.0 - 1.0. Range settings for each parameter will be resolved in a future release of
the plugin.
2. Please see DAW-Specific Notes for information about each DAW tested with the spatializer.
3. You must set your DAW sample rate to 44.1 kHz or 48 kHz for optimal fidelity. Note that values below
16 kHz or above 48 kHz will result in no sound.
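Until the parameter ranges are resolved, a MIDI controller value in 0.0 - 1.0 has to be mapped onto each parameter's native range by hand (or in the DAW's controller mapping). This is a minimal linear-mapping sketch. The ranges are the ones listed under Track Parameters, the function names are hypothetical, and the plugin's own internal mapping may differ.

```python
def from_normalized(value: float, lo: float, hi: float) -> float:
    """Map a 0.0-1.0 controller value linearly onto [lo, hi], clamping out-of-range input."""
    value = min(max(value, 0.0), 1.0)
    return lo + value * (hi - lo)


def to_normalized(param: float, lo: float, hi: float) -> float:
    """Inverse mapping: a parameter value in [lo, hi] back to 0.0-1.0."""
    return (min(max(param, lo), hi) - lo) / (hi - lo)


NEAR_RANGE_M = (0.0, 175.0)  # NEAR (m) range from the track parameters
GAIN_RANGE_DB = (0.0, 24.0)  # GAIN (dB) range from the track parameters
```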
Installation
Copy the relevant VST files to your system or to your DAW's VST installation directory.
PC
On Windows, most DAWs require you to specify custom locations for 32-bit and 64-bit VST plugins. If you have
not already set up custom locations for your VST folders, we recommend using the following:
64-bit
C:\Program Files\Steinberg\VSTPlugins
32-bit
Mac
In OS X, VST plugins must be installed to either the global or user Library folder.
Global
/Library/Audio/Plug-Ins/VST
User
~/Library/Audio/Plug-Ins/VST
We recommend assigning the Oculus Spatializer to your DAW tracks as an insert effect. Any sound that plays
through this track will then be globally affected by the spatialization process.
Track Parameters
Local Track Parameters
The top section of the plugin interface contains parameters that affect the individual track.
Local track parameters are used to set up the location of a sound source in 3D space, and to shape the
attenuation of the sound as it gets further away from the listener.
These parameters are stored with the project when saving and loading.
Bypass
GAIN(dB)
[0.0 - 24.0]
NEAR (m)
[0.0 - 175.0]
Sets the distance from the listener (in meters) at which the sound source begins to
attenuate. The attenuation curve approximates an inverse square and reaches maximum
attenuation at the Far distance.
In the 2D grid display, the radius is represented by an orange disk around the sound
position.
FAR (m)
[0.0 - 175.0]
Sets the distance from the listener at which a sound source reaches maximum
attenuation value (in meters).
In the 2D grid display, the radius will be represented by a red disk around the sound
position.
Note: You may encounter CPU hitching when changing the Near/Far parameters at
run-time. This will only occur if both Reflections and Reverb are turned on (see
"Global Track Parameters" below).
Sets the location of the sound relative to the listener (in meters). The co-ordinate system
is right-handed, with Y-axis pointing up and the Z-axis pointing toward the screen (a.k.a.
Oculus coordinate system).
Sets the scale of the 2D grid display (in meters). This allows the user to have greater
control over the sound position placement.
Toggles the reflection engine, as defined by the reflection global parameters. This
enhances the spatialization effect but incurs a commensurate performance penalty.
REVERB
When enabled, a fixed reverberation is mixed into the output, providing a more
natural-sounding spatialization effect. Based on room size and reflection values (see
X/Y/Z Size and Left/Right, Forward/Backward, Up/Down Refl.). Reflections must be
enabled to use.
Note: You may encounter CPU hitching when changing the Room Size and
Reflection Values parameters at run-time.
X/Y/Z SIZE
Sets the dimensions of the theoretical room used to calculate reflections. The
greater the dimensions, the further apart the reflections.
In the 2D grid display, the room will be represented by a cyan box (this will only
display if Reflections are enabled).
LEFT/RIGHT, FORWARD/BACKWARD, UP/DOWN REFL.
[0.0 - 0.97]
Sets the percentage of sound reflected by each wall in a room with the dimensions
specified by the X/Y/Z SIZE parameters. At 0, the reflection is fully absorbed. At
1.0, the reflection bounces from the wall without any absorption. Capped at 0.97 to
avoid feedback.
Other Buttons and Toggles
Note: If you are using a mouse with a scroll wheel, you can change a rotary by placing the cursor over
it and scrolling up or down.
ABOUT
Displays the current version of the VST (which matches the version of the Oculus Audio
SDK being used). The Update button navigates to the latest Audio SDK on the Oculus
developer site.
XZ/XY (toggle)
Changes the 2D grid display from top-down (XZ) to front-face (XY). The head model in
the center of the grid will change to indicate which view you are in, making it easier to
understand the relationship between the sound location and head position.
DAW-Specific Notes
This section discusses DAW-specific caveats and issues for the Oculus Spatializer Plugin.
The Oculus Spatializer DAW plugin is a full stereo effect (Left/Right in, Left/Right out). It will only process the
incoming Left channel and allow you to move this monophonic signal through 3D space. Some DAWs will work
properly when assigning a stereo effect to a monophonic channel, while others will require workarounds.
Up to 64 sounds running through the bus may be spatialized.
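The channel handling described above can be illustrated with a short Python sketch operating on interleaved stereo sample buffers. Only the left channel reaches the spatializer, so converting a mono source to stereo by duplicating it guarantees that the left channel carries the full signal. The buffer layout and function names are assumptions for illustration; this is not the plugin's code.

```python
from typing import List, Sequence

MAX_SPATIALIZED_SOUNDS = 64  # bus limit stated in the text


def left_channel(interleaved: Sequence[float]) -> List[float]:
    """Return the left channel of an interleaved L/R buffer.

    Mirrors the documented behavior: only the incoming Left channel
    is spatialized and the Right channel is ignored.
    """
    if len(interleaved) % 2 != 0:
        raise ValueError("an interleaved stereo buffer needs an even length")
    return list(interleaved[0::2])


def mono_to_stereo(mono: Sequence[float]) -> List[float]:
    """Duplicate a mono signal into both channels of an interleaved buffer."""
    stereo: List[float] = []
    for sample in mono:
        stereo.extend((sample, sample))
    return stereo


def can_spatialize(active_sounds: int) -> bool:
    """True if another sound fits under the 64-sound bus limit."""
    return active_sounds < MAX_SPATIALIZED_SOUNDS


if __name__ == "__main__":
    mono = [0.1, -0.2, 0.3]
    stereo = mono_to_stereo(mono)
    print(stereo)
    print(left_channel(stereo) == mono)
    print(can_spatialize(63), can_spatialize(64))
```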
DAW
Windows
OS X
Yes
Partial
Reaper 6.5
Yes
Partial
Additional Notes
Mono tracks are not supported. Use a stereo track; the
plugin will collapse channels to mono automatically.
You may also send the mono track to the plugin as a
send/return effect instead of as an insert effect.
Placing a stereo insert effect onto a mono track is
not supported. Solution 1) Place your mono sound
on a stereo track, with the OSP as an insert effect.
Solution 2) Convert your mono source into a stereo
source. Currently, only the left channel of the source is
processed; the plugin has no right-channel selection or
stereo-to-mono collapse feature. Solution 3) Use
a stereo send from a mono source.
Legal Notifications
VST is a trademark and software of Steinberg Media Technologies GmbH.
Overview
The Oculus Spatializer AAX plugin is an add-on plugin for Avid's Pro Tools audio production platform.
This plugin allows for monophonic sound sources to be properly spatialized in 3D relative to the user's head
location.
The AAX plugin incorporates the same spatialization algorithms found in our other plugin formats (e.g., Wwise,
Unity). These other formats are typically used to generate real-time spatialization within virtual environments.
For the audio designer, the AAX plugin is useful for setting up mixes within the Pro Tools DAW (Digital Audio
Workstation) and for hearing how the mix will sound before it is spatialized in virtual reality.
Version Compatibility
The AAX plugin has been tested in Pro Tools 11 (64-bit) on Windows 7 and 8, and OS X 10.10+.
General OSP Limitations
1. CPU usage increases when early reflections are turned on, and increases proportionately as room
dimensions become larger.
Limitations Specific to AAX
1. All parameters are assigned to MIDI controllers. However, most parameter ranges fall outside of the current
MIDI mapping range of 0.0 - 1.0. Range settings for each parameter will be resolved in a future release of
the plugin.
2. Set the Pro Tools sample rate to 44.1 kHz or 48 kHz for optimal fidelity. Sample rates below 16 kHz or
above 48 kHz produce no sound.
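Both limitations can be checked up front. In this Python sketch, `from_normalized` shows why a 0.0 - 1.0 MIDI value cannot address a range such as NEAR's [0.0 - 175.0] without rescaling (a linear mapping is assumed; how the plugin will eventually resolve parameter ranges is not documented), and `sample_rate_status` encodes the sample-rate rules stated above. All function names are hypothetical.

```python
def from_normalized(value: float, lo: float, hi: float) -> float:
    """Map a 0.0-1.0 controller value linearly onto [lo, hi] (assumed mapping)."""
    value = min(max(value, 0.0), 1.0)
    return lo + value * (hi - lo)


def to_normalized(param: float, lo: float, hi: float) -> float:
    """Inverse of from_normalized: map a parameter value back to 0.0-1.0."""
    if hi == lo:
        raise ValueError("empty parameter range")
    return (min(max(param, lo), hi) - lo) / (hi - lo)


def sample_rate_status(rate_hz: int) -> str:
    """Classify a Pro Tools session sample rate against the AAX limitations."""
    if rate_hz < 16_000 or rate_hz > 48_000:
        return "no sound"
    if rate_hz in (44_100, 48_000):
        return "optimal"
    return "sub-optimal"  # produces sound, but is not a recommended rate


if __name__ == "__main__":
    print(from_normalized(0.5, 0.0, 175.0))  # NEAR (m) at half travel
    print(to_normalized(24.0, 0.0, 24.0))    # GAIN (dB) at its maximum
    for rate in (8_000, 32_000, 44_100, 96_000):
        print(rate, sample_rate_status(rate))
```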
Installation
Copy Oculus Spatializer.aaxplugin to the Avid Pro Tools Plug-Ins folder.
On a Mac:
Macintosh HD/Library/Application Support/Avid/Audio/Plug-Ins
On a PC:
C:\Program Files\Common Files\Avid\Audio\Plug-Ins
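The copy step can be scripted. The Python sketch below picks the Plug-Ins folder listed above for the current OS and copies the plugin there, replacing any older copy. The script and its names are our own assumptions, not an Oculus-provided installer, and writing to either folder normally requires administrator rights.

```python
import platform
import shutil
from pathlib import Path
from typing import Optional

BUNDLE_NAME = "Oculus Spatializer.aaxplugin"

# Plug-Ins folders from the installation instructions above.
PLUGIN_DIRS = {
    "Darwin": "/Library/Application Support/Avid/Audio/Plug-Ins",
    "Windows": r"C:\Program Files\Common Files\Avid\Audio\Plug-Ins",
}


def plugin_dir(system: Optional[str] = None) -> Path:
    """Return the Pro Tools Plug-Ins folder for `system` (defaults to this OS)."""
    system = system or platform.system()
    if system not in PLUGIN_DIRS:
        raise RuntimeError(f"no Pro Tools Plug-Ins folder known for {system!r}")
    return Path(PLUGIN_DIRS[system])


def install(bundle: Path, dest_dir: Path) -> Path:
    """Copy the plugin into dest_dir, replacing any previous copy."""
    target = dest_dir / bundle.name
    if target.is_dir():
        shutil.rmtree(target)
    elif target.exists():
        target.unlink()
    if bundle.is_dir():  # AAX plug-ins are usually folder bundles
        shutil.copytree(bundle, target)
    else:
        shutil.copy2(bundle, target)
    return target


if __name__ == "__main__":
    bundle = Path(BUNDLE_NAME)
    if bundle.exists():
        print("installed to", install(bundle, plugin_dir()))
    else:
        print(f"{BUNDLE_NAME} not found in the current folder")
```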
We recommend assigning the Oculus Spatializer as an insert effect to a mono track in Pro Tools. Any sound that
plays through this track will then be globally affected by the spatialization process.
Track Parameters
Local Track Parameters
The top section of the plugin interface contains parameters that affect the individual track.
Local track parameters are used to set up the location of a sound source in 3D space, and to shape the
attenuation of the sound as it gets further away from the listener.
These parameters are stored with the project when saving and loading.
Bypass
GAIN (dB)
[0.0 - 24.0]
NEAR (m)
[0.0 - 175.0]
Sets the distance from the listener (in meters) at which the sound source begins to attenuate. The
attenuation curve approximates an inverse-square falloff and reaches maximum attenuation at the Far
distance.
In the 2D grid display, the radius is represented by an orange disk around the sound
position.
FAR (m)
[0.0 - 175.0]
Sets the distance from the listener at which a sound source reaches maximum
attenuation value (in meters).
In the 2D grid display, the radius will be represented by a red disk around the sound
position.
Note: You may encounter CPU hitching when changing the Near/Far parameters at
run-time. This will only occur if both Reflections and Reverb are turned on (see
"Global Track Parameters" below).
Sets the location of the sound relative to the listener (in meters). The coordinate system
is right-handed, with the Y-axis pointing up and the Z-axis pointing toward the screen (a.k.a.
the Oculus coordinate system).
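The X/Y/Z position and the NEAR/FAR settings combine into a distance-based gain. The exact curve the Oculus Spatializer uses is not published here, so this minimal Python sketch makes three labeled assumptions: full level inside NEAR, inverse-distance amplitude (that is, inverse-square intensity) between NEAR and FAR, and the FAR attenuation held for anything farther away. The listener is taken to be at the origin of the position parameters, and the function names are hypothetical.

```python
import math


def distance_to_listener(x: float, y: float, z: float) -> float:
    """Distance in meters from a listener at the origin to a source at (x, y, z)."""
    return math.hypot(x, y, z)  # multi-argument hypot needs Python 3.8+


def distance_gain(distance_m: float, near_m: float, far_m: float) -> float:
    """Linear amplitude gain under the assumed Near/Far model."""
    if not 0.0 < near_m <= far_m:
        raise ValueError("expected 0 < near_m <= far_m")
    # Hold full level inside Near and the maximum attenuation beyond Far.
    clamped = min(max(distance_m, near_m), far_m)
    return near_m / clamped  # 1/d amplitude corresponds to 1/d^2 intensity


def gain_db(gain: float) -> float:
    """Express a linear amplitude gain in decibels."""
    return 20.0 * math.log10(gain)


if __name__ == "__main__":
    near, far = 1.0, 20.0
    for pos in [(0.0, 0.0, -0.5), (0.0, 0.0, -2.0), (3.0, 0.0, -4.0), (0.0, 0.0, -50.0)]:
        d = distance_to_listener(*pos)
        g = distance_gain(d, near, far)
        print(f"{d:5.1f} m -> {gain_db(g):6.1f} dB")
```

Under this model each doubling of distance past NEAR lowers the level by about 6 dB, the familiar inverse-square figure. A NEAR value of 0.0 is allowed by the parameter range, but the sketch rejects it rather than guess at the plugin's behavior in that case.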
SCALE (m)
Sets the scale of the 2D grid display (in meters), giving the user greater control over sound position
placement.
Global Track Parameters
REFLECTIONS
Toggles the reflection engine, as defined by the reflection global parameters. This enhances the
spatialization effect but incurs a commensurate performance penalty.
REVERB
When enabled, a fixed reverberation is mixed into the output, providing a more natural-sounding
spatialization effect. The reverb is based on the room size and reflection values (see X/Y/Z Size and
Left/Right, Forward/Backward, Up/Down Refl.). Reflections must be enabled to use reverb.
Note: You may encounter CPU hitching when changing the Room Size and
Reflection Values parameters at run-time.
X/Y/Z SIZE
Sets the dimensions of the theoretical room used to calculate reflections. The greater the dimensions,
the farther apart the reflections.
In the 2D grid display, the room is represented by a cyan box (displayed only if Reflections are
enabled).
LEFT/RIGHT, FORWARD/BACKWARD, UP/DOWN REFL.
[0.0 - 0.97]
Sets the fraction of sound reflected by each wall of a room with the dimensions specified by the X/Y/Z
SIZE parameters. At 0.0, the reflection is fully absorbed; at 1.0, it would bounce off the wall without
any absorption. Values are capped at 0.97 to avoid feedback.
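The room-size parameters control how far apart the reflections arrive. One standard way to model this for a box-shaped room is the image-source method, sketched below in Python for first-order (single-bounce) reflections. The documentation does not say which method the plugin uses, so treat this only as an illustration; it also assumes the room is centered on the listener and uses 343 m/s as the speed of sound.

```python
import math
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumption)
ORIGIN: Vec3 = (0.0, 0.0, 0.0)  # listener position (assumed room center)


def first_order_images(source: Vec3, room: Vec3) -> Dict[str, Vec3]:
    """Mirror the source across each of the six walls of a box room.

    `room` holds the X/Y/Z sizes in meters; walls sit at +/- size/2.
    """
    images: Dict[str, Vec3] = {}
    for axis, name in enumerate("xyz"):
        for sign in (1.0, -1.0):
            wall = sign * room[axis] / 2.0
            mirrored = list(source)
            mirrored[axis] = 2.0 * wall - source[axis]  # reflect across the wall plane
            label = ("+" if sign > 0 else "-") + name
            images[label] = (mirrored[0], mirrored[1], mirrored[2])
    return images


def reflection_delays_ms(source: Vec3, room: Vec3) -> Dict[str, float]:
    """Arrival time of each first-order reflection after the direct sound."""
    direct = math.dist(ORIGIN, source)  # math.dist needs Python 3.8+
    return {
        wall: 1000.0 * (math.dist(ORIGIN, image) - direct) / SPEED_OF_SOUND
        for wall, image in first_order_images(source, room).items()
    }


if __name__ == "__main__":
    source = (1.0, 0.0, -2.0)
    for size in ((4.0, 3.0, 5.0), (8.0, 6.0, 10.0)):
        delays = reflection_delays_ms(source, size)
        print(size, {k: round(v, 1) for k, v in delays.items()})
```

With the source and listener both at the room center, doubling every dimension doubles each reflection's delay, which matches the statement that larger rooms push the reflections farther apart.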
Other Buttons and Toggles
Note: If your mouse has a scroll wheel, you can adjust a rotary by placing the cursor over it and
scrolling up or down.
ABOUT
Displays the current version (which matches the version of the Oculus Audio SDK being
used). The Update button navigates to the latest Audio SDK on the Oculus developer
site.
XZ/XY (toggle)
Changes the 2D grid display from top-down (XZ) to front-face (XY). The head model in
the center of the grid will change to indicate which view you are in, making it easier to
understand the relationship between the sound location and head position.
Release Notes
This section describes changes for each version release.
0.11
Overview of Major Changes
This release introduces the OculusHQ (OHQ) spatializer provider, which combines the quality of the former
High Quality Provider with the performance of the Simple Provider. Plugins no longer require you to select
the HQ or Simple path. Existing implementations use OHQ by default, with reflections enabled.
New Features
Minor VST changes.
Added AAX.
Added Wwise 2015.1 support.
Improved PC and Android performance.
Known Issues
FastPath is currently not supported on Android. As of this release, it cannot be disabled in Unity 5.1, which
causes intermittent audio issues. To work around this, use Unity 4.6 until the next Unity 5.1 patch release.