Professional Documents
Culture Documents
Technology Trends
in Audio Engineering
A report by the AES Technical Council
INTRODUCTION
Technical Committees are centers of technical expertise within the AES. Coordinated by the AES Technical Council, these
committees track trends in audio in order
to recommend to the Society papers,
workshops, tutorials, master classes, standards, projects, publications, conferences,
and awards in their fields. The Technical
Council serves the role of the CTO for the
society. Currently there are 23 such
groups of specialists within the council.
Each consists of members from diverse
backgrounds, countries, companies, and
interests. The committees strive to foster
wide-ranging points of view and
approaches to technology. Please go to:
http://www.aes.org/technical/ to learn
more about the activities of each committee and to inquire about membership.
Membership is open to all AES members
as well as those with a professional interest in each field.
Technical Committee meetings and
informal discussions held during regular
conventions serve to identify the most
current and upcoming issues in the specific technical domains concerning our
Society. The TC meetings are open to all
convention registrants. With the addition
of an internet-based Virtual Office, committee members can conduct business at
any time and from any place in the world.
One of the functions of the Technical
ARCHIVING, RESTORATION,
AND DIGITAL LIBRARIES
David Ackerman, Chair
Chris Lacinak, Vice Chair
Practical observations
The Broadcast Wave File (BWF) format has become the de facto standard for preservation of audio content within the field, as has a digital audio resolution of 24-bit/96 kHz. Time-based metadata is also of particular interest, including time-stamped
descriptive metadata and closed captions.
Manufacturers have begun to enable preservation activities through additional
metadata capabilities and support for open
formats.
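The descriptive metadata that makes BWF attractive for preservation lives in a fixed-layout "bext" chunk inside an ordinary RIFF/WAVE file. As an illustration, a minimal reader for a few of those fields might look like the sketch below (field offsets follow EBU Tech 3285; the function name is my own, and production tools handle many more cases):

```python
import struct

def read_bext_chunk(path):
    """Scan a RIFF/WAVE file for the Broadcast Wave 'bext' chunk
    and return a few of its fixed-layout descriptive fields."""
    with open(path, "rb") as f:
        riff, _, wave = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                return None  # no bext chunk found
            ident, length = struct.unpack("<4sI", header)
            data = f.read(length + (length & 1))  # chunks are word-aligned
            if ident == b"bext":
                return {
                    # fixed offsets per EBU Tech 3285, version 0 and later
                    "description": data[0:256].split(b"\0")[0].decode("ascii", "replace"),
                    "originator": data[256:288].split(b"\0")[0].decode("ascii", "replace"),
                    "origination_date": data[320:330].decode("ascii", "replace"),
                    # 64-bit sample count since midnight, little-endian
                    "time_reference": struct.unpack("<Q", data[338:346])[0],
                }
```

The time reference, a sample count since midnight, is what makes sample-accurate time-stamping of archival transfers possible.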
Sound for moving image is somewhat in
limbo, currently being grouped with moving image preservation for the most part.
Preservation of sound for moving image is a focus for future attention of this committee. Moving image and sound preservation graduate programs are emerging.
X98C, metadata for process history of audio
objects.
AES SC-03 was retired this October.
The Federal Agencies Audio Visual Digitization Working Group (digitizationguidelines.gov) is investigating tools for evaluating the performance of analog-to-digital converters and for detecting interstitial errors.
The Indiana University Archives of Traditional Music (ATM) and the Archive of
World Music (AWM) at Harvard University
have received a grant from the National
Endowment for the Humanities to undertake a joint technical archiving project, a
collaborative research and development initiative with tangible end results that will
create best practices and test emerging
standards for digital preservation of archival
audio. This is known as Sound Directions.
The National Recording Preservation
Board, mandated by the National Recording
Preservation Act of 2000, is an advisory
group bringing together a number of professional organizations and expert individuals concerned with the preservation of
recorded sound. The group has published a report from its engineers' roundtable (CLIR).
The National Digital Information Infra-
and Android devices) and non-contact technology (e.g., Microsoft Kinect, PlayStation
Eye). These are able to track player position
or gestures and are beginning to find useful
applications in game-audio. 3-D video has yet to demonstrate a counterpart in audio.
Spatial audio
Audio input
Speech input is now used in a number of
games and devices for character control or
player-to-player communication. Speech
analysis and processing is a key research
area in game-audio. Analysis of singing is a further research area that has been applied in a number of leading console game titles.
Rhythm-based games (e.g., Rock Band, Guitar Hero) make use of varying degrees of instrument-style peripherals, such as guitar controllers, piano keyboards, and virtual drum kits, as well as motion controllers and
touch screens. New technologies, such as
Education and standards
Standards activity continues in the games
industry and becomes more relevant as the
industry matures. Current standards activity includes interoperable file formats, digital audio workstation design, and loudness
AUDIO FOR
TELECOMMUNICATIONS
Bob Zurek, Chair
Antti Kelloniemi, Vice Chair
AUDIO FORENSICS
Enhancement
The enhancement of forensic audio
recordings remains the most common
task for forensic audio practitioners. The
goal of forensic audio enhancement is to
increase the intelligibility of voice information or improve the signal-to-noise ratio of a target signal by reducing the effects of interference that masks it. Many tools are available from various software developers, the most common being noise reduction, either adaptive or linear. Difficulties in this area are caused by the lossy data compression common to small digital recorders, the data compression and bandwidth-limited signals of telecommunications, and the non-ideal recording environments common to surveillance and security. One growing area of research is
the assessment of speech intelligibility
with multiple papers presented on the
topic at the AES 39th Conference on
Audio Forensics in 2010.
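Speech-intelligibility assessment generally works from band-by-band signal-to-noise ratios. The simplest equal-weight, articulation-index-style form of that idea can be sketched as follows (a toy illustration only, not any of the standardized measures; methods such as STI add band weighting and modulation analysis):

```python
def articulation_index(band_snrs_db):
    """Toy articulation-index-style score: clip each band's SNR to the
    0-30 dB range that contributes to intelligibility, then average.
    Returns 0.0 (unintelligible) through 1.0 (fully intelligible)."""
    clipped = [max(0.0, min(30.0, snr)) for snr in band_snrs_db]
    return sum(clipped) / (30.0 * len(clipped))
```

Under this toy measure, a recording with a uniform 15 dB SNR across its bands scores 0.5.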
Authentication
The majority of audio media presented to
the forensic examiner are digital recordings on optical disc, HDD, flash memory,
and solid-state recorders. However, the
analysis of analog tape cassettes and
microcassettes is still required of examiners. In the area of forensic media authentication, digitally recorded audio files
may be subject to various kinds of manipulation that are harder to detect than
those in the analog domain. This leaves
the forensic audio examiner with new
challenges regarding the authentication
of these recordings. Many new techniques
have been developed in recent years for
use in these analyses. These techniques
continue to be published and presented
through the AES Journal and proceedings
of AES Conferences and Conventions.
Among these techniques is the analysis of the electric network frequency (ENF) component of a recording. If present, the remains of the ENF may be compared to a database of ENF from the same grid to authenticate the date and time the recording was made. In addition to automatic database comparison, it is possible to learn several other things from ENF analysis, including whether portions of the recording were removed and whether an audio recording was digitized multiple times.
Other considerations
Since the fundamental aspect of forensic audio is its application to law, with the litigation process benefitting from audio enhancement and analysis, it is important for the practitioner to be aware of this process and of the need for proper evidence handling and laboratory procedures. As digital audio proliferates, so too has the identification of proper practices for imaging media, hashing file duplicates, and recovering and/or repairing corrupt or carved files.
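The hashing step can be illustrated directly: the original evidence is fingerprinted once, and every working copy is verified as bit-for-bit identical before analysis begins. A minimal sketch using the Python standard library (the function names are my own):

```python
import hashlib

def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in streaming chunks so large media images can be
    fingerprinted without loading them into memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

def verify_duplicate(original, working_copy):
    """The examiner works on a copy; matching digests show the copy
    is bit-for-bit identical to the seized original."""
    return file_digest(original) == file_digest(working_copy)
```

Recording the digest in the case notes also lets any later party re-verify that the evidence has not changed in custody.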
Additionally, it is common not only for forensic audio to be played in a courtroom but also for typed transcripts of recorded conversations to be prepared for the individuals involved in a case: the lawyers, judge(s), and/or jury. Specific to these needs, there are developments in addressing the inherent bias present in the human preparation of these transcripts. Also, the forensic audio practitioner must be aware of how the audio samples are presented, taking into consideration courtroom acoustics, psychoacoustics, and the hearing abilities of these individuals.
AES activities
Numerous papers on audio forensics appear in the Journal of the AES and are presented at AES conventions each year. There have also been three AES conferences on audio forensics since 2005 (the AES 26th, 33rd, and 39th), and the next will be held in Denver, CO, in 2012. Regular workshops and tutorials appear at AES conventions as well. At the AES
130th Convention in London there was a
tutorial on forensic audio enhancement,
and at the AES 131st Convention in New
York there was a workshop on forensic
audio authentication.
AUDIO RECORDING
AND MASTERING SYSTEMS
AUTOMOTIVE AUDIO
Richard Stroud, Chair
Tim Nind, Vice Chair
Surround sound is becoming mandatory in high-end automotive systems even when the source is limited to two channels (in which case it is implemented using upmix algorithms). Some listeners sense that some surround systems provide limited envelopment on both stereo and much surround source material.
There is almost universal branding of
audio systems in luxury cars, and newer
brands are emerging. The maximum number of speakers used in luxury vehicle systems seems to be leveling out at 182. Aftermarket audio now represents a very small
part of the automotive audio market. There
are still parts of the world where 5.1 and
high-level premium audio are not featured
in most vehicles' audio line-ups. These systems can perhaps take advantage of inexpensive, powerful audio DSP systems to
improve performance. Rear seat audio performance may be important in China and
other countries, as some who can afford
automobiles can also afford drivers.
Voice recognition systems for telephone
and navigation functions are becoming
more sophisticated and enjoy wider application. Automatic equalization is being
offered for audio system tuning. Use of such
automatic systems can significantly speed
the tuning process but may not be ready to
completely replace tuning for on-road performance by trained listeners. Active noise
cancellation by the audio system is being used for exhaust drone under conditions of
cylinder deactivation. Active road noise bass
CODING OF AUDIO SIGNALS
Overview
Audio coding has emerged as a critical
technology in numerous audio applications. In particular, it is a key component
of mobile multimedia applications in the
consumer market. Examples include
wireless audio broadcast, internet radio
and streaming music, music download,
storage and playback, mobile audio
recording, and Internet-based teleconferencing. Example platforms include digital
audio broadcast radio receivers, portable
music players, mobile phones, and personal computers. From this, a variety of
implications and trends can be discerned.
Digital distribution of content is
offered to the consumer in many formats with varying quality/bitrate trade-offs, depending on the application context. This
ranges from very compact formats (e.g.,
MPEG HE-AACv2 and MPEG USAC) for
wireless mobile distribution to perceptually transparent, scalable-to-lossless and
lossless formats for regular IP-based
distribution (e.g., MPEG AAC, HD-AAC
and ALS).
The frontiers of compression have been pushed further, allowing carriage of full-bandwidth signals at very low bit rates, to the point where recent coding systems are considered appropriate for some broadcasting applications, particularly for relatively expensive wireless communication channels such as satellite or cellular channels. While such technologies predominantly make use of parametric approaches (at least in part) to achieve the highest possible quality at the lowest bit rates, they are typi-
Parametric coding of audio object signals provides, similarly to parametric
coding of multichannel audio, a very
compact representation of a scene consisting of several audio objects (e.g.,
music instruments, talkers, etc.). Rather
than transmitting discrete object signals,
the (downmixed) scene is transmitted,
plus parametric side information describing the properties of the individual
objects. At the decoder side, the scene
can be modified by the user according to
his/her preference, e.g., the level of a particular object can be attenuated or
boosted. A recent example of such a technology is MPEG Spatial Audio Object Coding (SAOC).
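The encode/decode split described above can be caricatured in a few lines: transmit a mono downmix plus, per frame, each object's share of the frame energy, and let the decoder re-weight the downmix from those ratios and the user's gains. This is a deliberately crude sketch of the principle only (real SAOC operates on time-frequency tiles and reconstructs objects far more faithfully; all names and the frame size are illustrative):

```python
FRAME = 256  # granularity of the side information, in samples (illustrative)

def encode_scene(objects):
    """Toy object coder: transmit a mono downmix plus, per frame,
    each object's share of the frame energy as side information."""
    n = len(objects[0])
    downmix = [sum(obj[i] for obj in objects) for i in range(n)]
    side = []
    for start in range(0, n, FRAME):
        energies = [sum(s * s for s in obj[start:start + FRAME])
                    for obj in objects]
        total = sum(energies) or 1.0
        side.append([e / total for e in energies])
    return downmix, side

def render(downmix, side, user_gains):
    """Toy decoder: re-weight the downmix using only the transmitted
    energy ratios and the user's per-object gains (1.0 = unchanged)."""
    out = []
    for j, sample in enumerate(downmix):
        ratios = side[j // FRAME]
        gain = sum(g * r for g, r in zip(user_gains, ratios))
        out.append(sample * gain)
    return out
```

The point of the structure is the bit-rate economy: the side information is a handful of ratios per frame, regardless of how many objects share the downmix.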
There has been significant progress on the challenge of developing a truly universal coder that can deliver state-of-the-art performance for all kinds of input signals, including music and speech. Hybrid coders, such as
MPEG USAC (Unified Speech and Audio
Coding), have a structure combining elements from the speech and the audio
coding architectures and, over a wide
range of bit rates, perform better than
coders designed for only speech or only
audio.
HEARING AND HEARING LOSS
PREVENTION
Introduction
The AES TC on Hearing and Hearing Loss Prevention was established in 2005 with five initial goals focused on informing the membership as to important aspects of the hearing process and issues related to hearing loss, so as to promote engineering-based solutions to improve hearing and
reduce hearing loss. Its aims include the
following: raising AES member awareness
of the normal and abnormal functions of
the hearing process; raising AES member
awareness of the risk and consequences of
hearing loss resulting from excessive sound
exposure; coordinating and providing technical guidance for the AES-supported hearing testing and consultation programs at
U.S. and European conventions; facilitating
the maintenance and refinement of a database of audiometric test results and exposure information on AES members; forging
a cooperative union between AES members,
audio equipment manufacturers, hearing
instrument manufacturers, and the hearing
conservation community for purposes of
developing strategies, technologies, and
tools to reduce and prevent hearing loss.
Tinnitus
Another hearing disorder, tinnitus, is commonly experienced by individuals, often as
a result of ear infections, foreign objects or
wax in the ear, and injury from loud noises.
Tinnitus can be perceived in one or both
ears or in the head. It is usually described as a ringing or buzzing noise, or as a pure-tone perception. Certain treatments have been developed for severe cases in the form of audio masking; however, most research is directed toward pharmaceutical solutions and prevention.
We are also seeing the emergence of electro-acoustic techniques for treating what is
commonly referred to as idiopathic tinnitus, or tinnitus with no known medical
cause. About 95% of all tinnitus is considered idiopathic. These treatments involve
prescriptive sound stimuli protocols based
on the spectral content and intensity of the tinnitus. In Europe, psychological assistance to help individuals live with their tinnitus is a well-established procedure.
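One family of such electro-acoustic approaches, tailor-made notched stimulation, removes a narrow band around the patient's tinnitus pitch from an otherwise broadband stimulus. A minimal sketch using a standard biquad notch (the coefficients follow the common RBJ "cookbook" form; the center frequency, Q, and function names here are illustrative, not a clinical protocol):

```python
import math
import random

def notch_coefficients(f0, fs, q=5.0):
    """Biquad notch (RBJ cookbook form) centered on the tinnitus pitch f0."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = [1 / a0, -2 * math.cos(w0) / a0, 1 / a0]   # feed-forward
    a = [-2 * math.cos(w0) / a0, (1 - alpha) / a0]  # feedback (a1, a2)
    return b, a

def notch_filter(samples, b, a):
    """Direct-form I biquad: removes a narrow band, passes the rest."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        out = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, out
        y.append(out)
    return y

def notched_noise(seconds, fs, f0):
    """Broadband stimulus with the band around f0 notched out."""
    rng = random.Random(0)
    noise = [rng.uniform(-1.0, 1.0) for _ in range(int(seconds * fs))]
    b, a = notch_coefficients(f0, fs)
    return notch_filter(noise, b, a)
```

Because the notch zeros sit exactly on the unit circle at the center frequency, energy at the tinnitus pitch itself is fully suppressed in steady state while the rest of the spectrum passes nearly unchanged.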
HIGH RESOLUTION AUDIO
Within the past decade, the types, distribution, and uses of audio have greatly diversified. Portables and internet sourcing have
flourished and disc sales have fallen,
although the balance between the two
varies by country. High quality audio for
formal listening has evolved simultaneously
and mirrors many of the same influences.
There is a notable broad trend toward
increasing quality in many aspects of audio,
and together with promised developments
such as cloud storage and HD streaming,
digital audio including high quality formal
listening will continue to grow and evolve.
Music sources
High resolution remains a mainstay of professional recording and archiving due to its
extended headroom, precision, and frequency capture. In the consumer marketplace, the principal current high resolution
sources are discs, especially Blu-ray, and
internet downloads. The music for these
releases reflects a range of eras and recording techniques as well as resolutions, and
may have been remastered, transcoded, or
upsampled. Thus the frequency extension and dynamic range are in some cases less
than that of newer recordings made directly
at high resolution.
The original high resolution disc formats
have not achieved wide success although
SACDs continue to be released in small
numbers, notably in classical music. SACD-capable players continue to be available, and today's universal players may play Blu-ray
Disc (BD), DVD, SACD, and CD. Some support for Direct Stream Digital, the single bit
encoding technique behind SACD, can be
found in professional recorders, players,
and modern interfaces, but LPCM has
largely supplanted single bit techniques as
release and recording formats.
Growth of computer
and server-based audio
There is a strong trend toward adoption of
computers and file servers into all areas of
audio, especially evident in the U.S. and Far
East. For high quality audio, there are
excellent opportunities but a range of new
technical and delivery issues. The term
computer audio covers numerous configurations where the computer may act as
front end disc player or file server; may output audio via a PCI sound card, external
sound card, or motherboard ports; and may
access downloads or streamed radio and AV
from the internet. Files may be stored on
hard drives, flash, network-attached storage
(NAS), or redundant arrays with backup; and network devices other than a computer may act as software players.
The traditional audiophile two-channel,
music-only marketplace has embraced
computers and file servers due to the convenience of file storage and downloads. In
this market, which overlaps professional
audio, the design ethos of low distortion,
high quality engineering has spurred
manufacturer research in identifying and
eliminating technical problems associated
with computers as front end devices.
These include isolation of noisy computer
power supplies, avoidance of jittered computer clocks, RFI shielding, special attention to computer layouts by makers of
PCI sound cards, and design of digital
interfaces to avoid contaminating an
external DAC master clock with the jitter
and noise from the PC. Examples of the
latter include asynchronous USB, PLL
chips in association with Firewire and
SPDIF, and DAC-controlled data transmission. Much ongoing effort in computer-related software aims to provide bit-accurate decoding, ripping, playback, and
transcoding.
A trend to include computer audio in
home theater is underway as well but
with a greater mix of challenges for high
quality audio. Home theater is above all a
rapidly evolving and richly diverse area of
wide price range and capability. HT components routinely support the lower resolution compressed formats streamed from
the internet and cable, and variously the
high resolution AV needed for DVD, BD,
and HDTV. Support for the file types and
resolutions typical of downloads, disc
rips, and AV from other recording or non-movie sources may be absent. It continues to be challenging to transmit files
without invoking unwanted sample rate
conversion, unintended transcoding (e.g.,
FLAC to MP3), bit truncation, and loss of
metadata.
Research
High resolution formats in general are
mature, although efforts to improve lossless compression continue. Inquiry continues into the perceptual characteristics and audibility of resolutions above 44.1 kHz/16 bit, and of the associated filtering and data-conversion processes.
Design research continues on loudspeakers, class D amps, and microphones in
support of the wide bandwidth, low distortion, wide dynamic range requirements of
high resolution. Also, surround algorithms
emphasizing enhanced spatial coding are
an especially active research area that
should be mentioned in context of high
resolution because of the improved spatial
resolution they afford.
HUMAN FACTORS
IN AUDIO SYSTEMS
Michael Hlatky, Chair
William Martens, Vice Chair
Jörn Loviscach
with digital audio: Touch screens commonly lack pixel-precise navigation, parts of
the screen will be visually obstructed by the
user's hand and arm when manipulating an
on-screen control, and there is little to no tactile feedback during the interaction process.
These three reasons alone make the
design of, for instance, a touch-controlled
on-screen fader quite cumbersome. While
the precision achievable by touch manipulation of an on-screen fader might be
enough to set the playback volume when
listening to MP3s on a phone, it is far from sufficient for setting parameters when
mixing music. Some manufacturers have
therefore enabled swiping gestures on
touch-controlled faders to increase precision; this does, however, take away direct
controllability, as several micro-actions
might be necessary to achieve a desired
parameter value. Furthermore, the lack of
pressure-sensitive touch screens on the
mass market renders the expressive control
of musical instruments with such devices
nearly impossible. To enable an additional
new set of problems to designers of the
common digital audio workstation (DAW).
How does a future digital audio workstation
that is targeted at producing audio for
interactive applications integrate itself well
into the development environments for the
iPhone and its siblings? Hints might be
taken from software employed to design
interactive music scores and dynamic
sounds for computer games, such as Crytek's CryEngine, or visual programming languages such as Cycling '74's Max or Pure Data.
The cloud
Another trend to be observed at Music
Hackdays is the rise of web-based APIs
(application programming interfaces).
Whether it is finding new audio content,
processing audio, or simply listening to
music, companies such as SoundCloud, The Echo Nest, or Spotify have an API for that.
Music discovery and recommendation via
interconnected web services are topics
taken on now by Facebook and Google, and even Pro Tools, in its tenth incarnation, gained a function to bounce a mix directly to SoundCloud. Even the DAW has
moved into the cloud, with, for instance,
PowerFX's Soundation Studio or Ohm Force's Ohm Studio.
The key benefit of these new audio production platforms is the enhanced possibility for remote collaboration in comparison to traditional DAWs. The move to the
cloud does, however, also enable a whole
new approach to designing user interfaces
through so-called perpetual betas. Because applications run in the browser, update cycles are frictionless: each time the user loads a session, a new version of the software can be delivered. Another fact
to keep in mind is that the computing
power in the cloud is decentralized. A limit
to the number of plugins running in parallel might be a problem of the past as soon
as audio processing has moved to the cloud.
With all this computational power avail-
MICROPHONES
AND APPLICATIONS
Eddy B. Brixen, Chair
David Josephson, Vice Chair
Transducer technology
There has been no major breakthrough in transducer technology in recent years. Microelectromechanical systems (MEMS) are not yet on the market for professional audio. However, in the near future
Digital adaptation
Innovation in the field of modern microphone technology is to some degree concentrated around adaptation to the digital age.
In particular, interfacing problems are being addressed. The continued updating of the
AES42 standard is essential in this respect.
Now dedicated input/control stages for
microphones with integrated interfaces are
available. However, different widely implemented device-to-computer standards like
USB and FireWire, which are not specifically reserved for audio, have also been applied in this field. Regarding the data streams, USB 3 is fully satisfactory for most audio purposes, but USB microphones fall outside existing audio standards. They have nevertheless reached a much higher level of popularity in semi-pro audio and home recording than AES42 has.
DSP-controlled microphones are still
developing. This includes directional pattern
control of multi-transducer units providing
steering or multichannel output for surround
recordings. These techniques are not necessarily applicable in professional audio. However, in the field of surveillance and security
EBU N/ACIP
The European Broadcasting Union (EBU)
together with many equipment manufacturers has defined a common framework for
Audio Contribution over IP in order to
achieve interoperability between products.
The framework defines RTP as a common
protocol and media payload type formats
according to IETF definitions. SIP is used as
signaling for call setup and control, along
with SDP for the session description. The
recommendation is currently published as
document EBU Tech 3326-2008.
Ethernet networks. The IEEE is the organization that maintains Ethernet standards
including wired and wireless Ethernet (principally 802.3 and 802.11 respectively). AVB
adds several new services to Ethernet
switches to bring this about. The new
switches interoperate with existing Ethernet
gear but AVB-compliant media equipment
interconnected through these switches enjoy
performance currently only available from
proprietary network systems.
AVB consists of a number of interacting
standards:
802.1AS Timing and Synchronization
802.1Qat Stream Reservation Protocol
802.1Qav Forwarding and Queuing
802.1BA AVB System
IEEE 1722 Layer 2 Transport Protocol
IEEE P1722.1 Discovery, enumeration,
connection management and control
IEEE 1733 Layer 3 Transport Protocol.
AVB standardization efforts began in
earnest in late 2006. As of November 2011,
all but the P1722.1 work have been ratified
by the IEEE.
RAVENNA
A consortium of European audio companies
has announced an initiative called RAVENNA
for real-time distribution of audio and other
media content in IP-based network environments. RAVENNA uses protocols from the
IETF's RTP suite for media transport. IEEE
1588-2008 is used for clock distribution.
Performance and capacity scale with the
capabilities of the underlying network architecture. RAVENNA emphasizes data transparency, tight synchronization, low latency,
and reliability. It is aimed at applications in
AES X192
Audio Engineering Society Standards Committee Task Group SC-02-12-H is developing
an interoperability standard for high-performance media networking. The project has
been designated X192.
High-performance media networks support professional quality audio (16 bit,
48 kHz and higher) with low latencies (less
than 10 ms) compatible with live sound reinforcement. The level of network performance required is
achievable on enterprise-scale networks but
generally not on wide-area networks or the
public internet.
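The bandwidth and latency arithmetic behind these requirements is simple enough to sketch. Assuming, illustratively, 48 samples per packet and roughly 54 bytes of Ethernet/IP/UDP/RTP header overhead per packet (both figures are assumptions for this sketch, not values from any of the standards discussed):

```python
def stream_cost(channels, rate=48000, bits=24,
                samples_per_packet=48, header_bytes=54):
    """Rough cost of one uncompressed audio stream over IP: line rate in
    bits per second, plus the packetization delay in milliseconds that
    the packet size alone contributes to end-to-end latency."""
    packets_per_s = rate / samples_per_packet
    payload_bytes = channels * samples_per_packet * bits // 8
    bits_per_s = packets_per_s * (payload_bytes + header_bytes) * 8
    packet_time_ms = 1000.0 * samples_per_packet / rate
    return bits_per_s, packet_time_ms
```

Under these assumptions, eight channels of 24-bit/48 kHz audio cost about 9.6 Mbit/s with 1 ms of packetization delay, which illustrates why such networks fit comfortably on gigabit enterprise infrastructure but not on typical wide-area links.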
The most recent generation of these media networks uses a diversity of proprietary
and standard protocols (see Table 1). Despite
Table 1
Technology   Purveyor              Transport
RAVENNA      ALC NetworX           RTP
AVB          IEEE, AVnu            Ethernet, RTP
Q-LAN        QSC Audio Products    IEEE 1588-2002, UDP
Dante        Audinate              IEEE 1588-2002, UDP
LiveWire     Telos/Axia (2004)     Proprietary (native), UDP
a common basis in Internet Protocol, the systems do not interoperate. This latest crop of
technologies has not yet reached a level of
maturity that precludes changes to improve
interoperability.
The X192 project endeavors to identify
the region of intersection between these
technologies and to define an interoperability standard within that region. The initiative will focus on defining how existing
protocols are used to create an interoperable system. It is believed that no new protocols need be developed to achieve this.
Developing interoperability is therefore a
relatively small investment with potentially
huge return for users, audio equipment
manufacturers, and network equipment
providers.
While the immediate X192 objective is to
define a common interoperability mode the
different technologies may use to communicate with one another, it is believed that the
mode will have the potential to eventually
become the default mode for all systems. It
will be compatible with and receive performance benefits from an AVB infrastructure. Use of the standard will allow AVB
implementations to reach beyond Ethernet
into wider area applications.
While the initial X192 target application
is audio distribution, it is assumed that the
framework developed by X192 will be substantially applicable to video and other
types of media data.
Dante
Dante is a media networking solution
developed by Audinate. In addition to providing basic synchronization and transport
protocols, it provides simple plug-and-play operation, PC sound-card interfacing via software or hardware, glitch-free redundancy, support for AVB, and support for
routed IP networks. The first Dante product arrived in 2008 via a firmware upgrade
for the Dolby Lake Processor and since
then many professional audio and broadcast manufacturers have adopted Dante.
From the beginning, Dante implementations have been fully IP-based, using the IEEE 1588-2002 standard for synchronization and UDP/IP for audio transport, and are designed to exploit standard gigabit Ethernet switches and VoIP-style QoS (quality-of-service) technology (e.g., DiffServ). Dante
is evolving with new networking standards.
Audinate has produced versions of Dante
that use the new Ethernet Audio Video
Bridging (AVB) protocols, including IEEE
802.1AS for synchronization and RTP
transport protocols. It is committed to supporting both IEEE 1733 and IEEE 1722.
Existing Dante hardware devices can be
firmware upgraded as Dante evolves, providing a migration path from existing
equipment to new AVB capable Ethernet
equipment.
Recent developments include announced
support for routing audio signals between
IP subnets and the demonstration of low
latency video. Audinate is a member of the
AVnu Alliance and the AES X192 working
group.
Q-LAN
Q-LAN is a third-generation networked
media distribution technology providing
high quality, low latency, and ample scalability aimed primarily at commercial and
professional audio systems. Q-LAN operates over gigabit and higher-rate IP networks. Q-LAN is a central component of QSC's Q-Sys integrated system platform.
Q-Sys was introduced by QSC Audio Products in June 2009. Q-LAN carries up to 512
channels of uncompressed digital audio in
floating-point format with a latency of 1
millisecond.
XFN command and control protocol
XFN is an IP-based peer to peer audio network control protocol, in which any device
on the network can send or receive connection management, control, and monitoring
messages. The size and capability of devices
on the network will vary. Some devices will
be large, and will incorporate extensive
functionality, while other devices will be
small with limited functionality. The XFN
protocol is undergoing standardization
within the AES, and AES project X170 has
been assigned to structure the standardization process. A draft standards document
has been written and presented to the SC-02-12 working group for approval.
large variety of content. Apple is also driving this trend with iCloud, released with iOS 5. Consumer devices are becoming
more complicated and connecting devices
to the network has been difficult for users,
resulting in many calls to tech support.
The good news is that devices are becoming easier to set up. The Wi-Fi Alliance has created an easy setup method called Wi-Fi Protected Setup (WPS). This makes attaching a new device to the home network as
easy as pressing a button or entering a simple numeric code.
Another trend driven by the adoption of
home wireless LAN technologies is in the
user interface (UI) of networked audio
devices. More and more audio products are
using the iPhone or iPad as the primary
method of device control, via the home
Wi-Fi network. Some commentators are
even announcing the death of the infrared
remote control. Consumer Audio/Video
Receiver manufacturers such as Denon and
Pioneer offer free iPhone/iPad apps that allow complete and intuitive control of
their devices. This leads to another emerging trend, that of the display-less networked audio player. Once the player can
be conveniently controlled from your
smartphone, it may not be necessary for
the device to continue to include an expensive display and user controls. Display-less
high-end audio players are already selling well (for example, the B&W Zeppelin Air). Such
display-less networked audio players will
become ubiquitous and be available for
under $100.
International Telecommunication
Union: Future Networks
ITU-T Question 21 of Study Group 13 (Q21/13) is looking at Future Networks, which are expected to be deployed during 2015 to 2020. So far an objectives and design goals document has been published (Y.3001), and the study group is working on virtualization and energy saving (soon to be published as Y.3011 and Y.3021, respectively) and on identifiers. These deliberations are at a very early stage, and a clear direction is not yet apparent. The underlying technology could be a clean-slate design, or it could be a small increment to the NGN (Next Generation Network, which is based on IPv6).
SEMANTIC AUDIO ANALYSIS
Mark Sandler, Chair
Dan Ellis and Jay LeBoeuf, Vice Chairs
Gyorgy Fazekas
Educational games
New advances in semantic audio technologies enable the creation of interactive educational games for music learners. It is now
possible to analyze the sound played on real
instruments and thus avoid the need for
using MIDI controllers, extract symbolic
information such as chords or note names,
and align this information with musical
scores in real-time. Applications like
Song2See demonstrate how semantic audio
technologies may help to create content for
music learners by using automated transcription, keep the user in the loop by allowing the correction of transcription errors,
use the content to ease the learning process
with fingering suggestions for each instrument, and provide real-time feedback about
the quality of playing by means of sound
analysis. The appearance of web-based platforms for content and metadata sharing,
and advances in semantic analysis and recommendation technologies also provide for
creating novel applications for music education. There is a growing trend in using community created web content, including lead
sheets and chord charts, and to analyze
YouTube videos to enhance machine analyses, or to create interactive games that are
not limited to expert generated content. The
use of the web thus provides an advantage
over games like Rock Band or Guitar Hero.
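As a rough illustration of the kind of analysis involved, the sketch below estimates a chord label from audio by accumulating pitch-class (chroma) energy with the Goertzel algorithm and matching it against binary major/minor chord templates. All names and parameters are illustrative, not taken from Song2See or any other product.

```python
import math

FS = 8000  # sample rate in Hz

def goertzel_power(x, freq, fs=FS):
    """Signal power at an arbitrary frequency (Goertzel recurrence)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for sample in x:
        s = sample + coeff * s1 - s2
        s2, s1 = s1, s
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def chroma(x):
    """12-bin pitch-class energy accumulated over MIDI notes C3..B5."""
    bins = [0.0] * 12
    for midi in range(48, 84):
        f = 440.0 * 2.0 ** ((midi - 69) / 12.0)
        bins[midi % 12] += goertzel_power(x, f)
    return bins

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]

def detect_chord(x):
    """Score the chroma vector against major (0,4,7) and minor (0,3,7) templates."""
    c = chroma(x)
    best, best_score = None, -1.0
    for root in range(12):
        for name, intervals in (("maj", (0, 4, 7)), ("min", (0, 3, 7))):
            score = sum(c[(root + i) % 12] for i in intervals)
            if score > best_score:
                best, best_score = f"{NOTE_NAMES[root]}:{name}", score
    return best

# Synthesize one second of a C major triad (C4, E4, G4) and classify it.
triad = [sum(math.sin(2 * math.pi * f * t / FS)
             for f in (261.63, 329.63, 392.00)) / 3.0
         for t in range(FS)]
print(detect_chord(triad))  # a C major triad should yield "C:maj"
```

Real systems add onset detection, tuning compensation, and learned chord models, but a chroma-plus-template pipeline of this kind is the common starting point.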
SIGNAL PROCESSING
FOR AUDIO
Christoph Musialik, Chair
James Johnston, Vice Chair
Observations
First, DSP has emerged as a technical
mainstay in the audio engineering field.
Paper submissions on DSP are now among
the most popular topics at AES Conventions, while only a few years ago DSP sessions were rare at AES. DSP is also a key
field for other professional conferences,
including those sponsored by IEEE and
ASA.
Second, the consumer and professional marketplaces continue to show growth in signal processing applications: an increasing number of discrete audio channels, increasing audio quality per channel (in both word width and sampling frequency), and increasing quality of building-block electronics such as sample rate converters, ADCs, and DACs, due to the continuously growing availability of consumer-ready DSP hardware.
Third, there is growing interest in intelligent signal processing for music information retrieval (MIR), such as query-by-humming, automatically generating playlists that mimic user preferences, or searching large databases with semantic queries such as style, genre, and aesthetic similarity.
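A minimal sketch of the playlist idea, assuming each track has already been reduced to a small feature vector; the features and values below are invented for illustration, and real MIR systems use far richer learned features.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical per-track feature vectors (e.g., tempo, brightness, energy).
library = {
    "track_a": [0.90, 0.20, 0.80],
    "track_b": [0.85, 0.25, 0.75],
    "track_c": [0.10, 0.90, 0.20],
}

def playlist(seed, library, length=2):
    """Rank all other tracks by similarity to the seed track."""
    ranked = sorted(
        (t for t in library if t != seed),
        key=lambda t: cosine(library[seed], library[t]),
        reverse=True,
    )
    return ranked[:length]

print(playlist("track_a", library))  # → ['track_b', 'track_c']
```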
Fourth, there are emerging algorithmic methods designed to deliver an optimal listening experience for the particular audio reproduction system chosen by the listener. These methods include transcoding and up-converting audio material to take advantage of the available playback channels, numerical precision, frequency range, and spatial distribution of the playback system. Other user benefits may include level matching for programs with differing loudness, frequency filtering to match loudspeaker capabilities, room correction, clock recovery circuits, and output amplifiers that match the specifications of the digital components.
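As a concrete example of adapting material to the available playback channels, the sketch below downmixes 5.1 surround to two-channel stereo using the ITU-R BS.775 coefficients (center and surrounds attenuated by 3 dB). The helper name is ours, and the handling of the LFE channel, omitted here as is common practice, varies between implementations.

```python
import math

ATT = 1.0 / math.sqrt(2.0)  # -3 dB attenuation for center and surrounds

def downmix_51_to_stereo(fl, fr, c, lfe, ls, rs):
    """Per-sample 5.1 -> 2.0 downmix per ITU-R BS.775; LFE is omitted."""
    left = fl + ATT * c + ATT * ls
    right = fr + ATT * c + ATT * rs
    return left, right

# Example: a centered voice (C channel only) lands equally in both channels.
l, r = downmix_51_to_stereo(0.0, 0.0, 1.0, 0.0, 0.0, 0.0)
print(round(l, 3), round(r, 3))  # 0.707 0.707
```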
SPATIAL AUDIO
Loudspeaker layouts
Nowadays surround sound is available in many households, where the 5.1 layout is the most widely deployed loudspeaker configuration. The production chain for cinema surround sound, from recording through coding and transmission to reproduction, is also well established. So far, the consumer market for surround sound has mainly been driven by movie titles; audio-only content is still quite rare. As successors to the 5.1 layout, various layouts with more loudspeakers arranged in a plane have been proposed, for instance the 7.1 layout, but none has had the commercial success of the 5.1 layout. Layouts that allow for the reproduction of height seem to be the next natural step in the evolution of surround sound. A number of proposed layouts
3-D
With the increasing spread of 3-D video in cinema and home cinema, new requirements must be met by spatial audio reproduction. While 3-D video adds depth to the image, adding matching depth to the sound is not a straightforward task with stereophonic techniques. This holds especially for sound sources closer to the listener than the loudspeakers. Future spatial audio techniques will have to provide solutions to the challenges imposed by 3-D video; first concepts have been presented.
Psychoacoustic motivation
Upcoming trends in spatial audio reproduction besides traditional stereophony are multichannel reproduction systems that are psychoacoustically motivated. Several techniques have been developed on the basis of wave field synthesis (WFS) that aim at spatial reproduction with almost arbitrary layouts, using fewer loudspeakers than traditional WFS; such approaches are already commercially available. Multichannel time-frequency approaches use techniques from short-term signal analysis to analyze and synthesize sound fields; Directional Audio Coding (DirAC) and Binaural Cue Coding (BCC) are representatives of these techniques. Time-frequency processing seems to be a promising concept, since its basic idea is related to the analysis of sound fields performed by the human auditory system.
The psychoacoustic mechanisms underlying the perception of synthetic sound fields have been investigated in quite some detail. However, there are still plenty of open issues.
Headphone listening
Although spatial audio is routinely used by the gaming industry, advanced techniques with better quality and realism can be expected with further increases in processing power. This holds especially for mobile devices, where spatial audio is currently rarely deployed. Due to the general shift toward mobile devices, spatial audio will also be finding its way into the mobile world. As a
Diverse applications
Besides its traditional application fields of cinema and home cinema, spatial audio is increasingly being deployed in other areas, for instance in teleconferencing systems, in cars, and as the auditory component of advanced human-machine interfaces. Here the use of spatial audio is expected to provide a clear benefit in terms of naturalness and the transport of additional information. Another important area of application is virtual concert hall and stage acoustics using active architecture systems, where spatial audio enhances the environment with which musicians and audiences interact during performance. Modern multichannel systems offer adjustable acoustics and high sound quality suitable for live performance and recording.
Network standards
With respect to cabling and coping with the ever-increasing number of loudspeakers, the new IEEE standards for Audio Video Bridging (AVB) seem promising. The standards are designed for the fully synchronized transmission of a large number of input/output channels via Ethernet. They are being developed and supported by all major players in the field, and devices are expected to be available in the near future. Standards that use intelligent processing to detect the listening setup are expected to be proposed soon.
Digital terrestrial TV broadcasting
for mobile receivers
DVB-T2 Lite (Europe) is still under development, while ISDB-T is used in Japan. DMB is employed in Korea, and there have been a few trials in Europe.
In the U.S., the Advanced Television Systems Committee (ATSC) has published final
reports of two critical industry planning
committees that have been investigating
likely methods of enhancing broadcast TV
with next-generation video compression,
transmission and Internet Protocol technologies and developing scenarios for the transmission of three-dimensional (3-D) programs
via local broadcast TV stations. The final
reports of the ATSC Planning Teams on 3-D TV (PT-1) and on ATSC 3.0 Next Generation
Broadcast Television (PT-2) are available now
for free download from the ATSC web site.
Loudness
Loudness and True Peak measurements are
replacing the conventional VU/PPM methods
of controlling program levels. This has
largely eliminated significant differences in
the loudness of different programs (and
advertisements) and the need for listeners to
keep adjusting their volume controls. Supporting international standards and operating practices have been published by several
organizations such as the ITU-R, EBU, and ATSC, listed below. More and more broadcasters
now apply these standards in their program
production and transmission chains.
ITU-R: BS.1770: Algorithms to measure
audio programme loudness and true-peak
audio level; BS.1771: Requirements for
loudness and true-peak indicating meters.
The following five documents provide the
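The core of the BS.1770 measurement can be sketched for a single channel as K-weighting (a high-frequency shelf followed by a high-pass biquad, with the filter coefficients tabulated in BS.1770 for 48 kHz) followed by mean-square integration. The simplified version below omits the gating introduced in later revisions of the standard and the multichannel weighting, so it is illustrative rather than compliant.

```python
import math

FS = 48000

# BS.1770 K-weighting biquads for fs = 48 kHz: (b0, b1, b2, a1, a2)
STAGE1 = (1.53512485958697, -2.69169618940638, 1.19839281085285,
          -1.69065929318241, 0.73248077421585)   # high-frequency shelf
STAGE2 = (1.0, -2.0, 1.0,
          -1.99004745483398, 0.99007225036621)   # high-pass

def biquad(x, coeffs):
    """Direct-form-I biquad filter over a list of samples."""
    b0, b1, b2, a1, a2 = coeffs
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for s in x:
        out = b0 * s + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, s
        y2, y1 = y1, out
        y.append(out)
    return y

def loudness_lkfs(x):
    """Ungated loudness of one channel in LKFS (no gating, mono only)."""
    w = biquad(biquad(x, STAGE1), STAGE2)
    mean_square = sum(s * s for s in w) / len(w)
    return -0.691 + 10.0 * math.log10(mean_square)

# A full-scale 997 Hz sine should read close to -3.01 LKFS.
sine = [math.sin(2 * math.pi * 997.0 * n / FS) for n in range(2 * FS)]
print(round(loudness_lkfs(sine), 2))
```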
Lip sync
The lip-sync issue remains unsolved, but is
being discussed in digital broadcasting
groups. Some international standards
development organizations such as IEC and
SMPTE are discussing new standards for
measuring the time differences between
audio and video.
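One common way to measure such a time difference is to cross-correlate a reference test signal with its captured, delayed version; the lag that maximizes the correlation is the offset. The sketch below, whose test signal and parameters are our own rather than from any IEC or SMPTE method, illustrates the principle.

```python
import math

FS = 8000

def estimate_delay(ref, captured, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation."""
    best_lag, best = 0, float("-inf")
    for lag in range(max_lag + 1):
        c = sum(ref[n] * captured[n + lag]
                for n in range(len(ref) - max_lag))
        if c > best:
            best_lag, best = lag, c
    return best_lag

# Reference: a short linear chirp; capture: the same signal delayed by
# 120 samples (15 ms at 8 kHz), padded with leading silence.
ref = [math.sin(2 * math.pi * (200 + 300 * n / FS) * n / FS)
       for n in range(FS // 2)]
captured = [0.0] * 120 + ref
delay = estimate_delay(ref, captured, max_lag=500)
print(delay, "samples =", 1000.0 * delay / FS, "ms")  # 120 samples = 15.0 ms
```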
Internet streaming
The use of new methods for the distribution
of signals to the home via the Internet with
streaming services is an increasing trend.
Web radio and IPTV are now getting audience figures that in a number of years from
now will be closing in on the traditional systems. Distribution technologies with rapid
growth in many countries are: ADSL/VDSL
over copper or fiber, combined with WiFi in
homes; WIMAX and 3G/UMTS; 4G and wi-fi
hot spots for distribution to handheld
devices.
J. Audio Eng. Soc., Vol. 60, No. 1/2, 2012 January/February