
Symposium on Image and Signal Processing and Analysis (ISPA01), Pula, Croatia, June 2001

Digital Sound Synthesis by Physical Modelling


Rudolf Rabenstein and Lutz Trautmann
Telecommunications Laboratory
University of Erlangen-Nürnberg
D-91058 Erlangen, Cauerstr. 7
{rabe, traut}@LNT.de
Abstract
After recent advances in coding of natural speech and
audio signals, the synthetic creation of musical sounds
is also gaining importance. Various methods for waveform synthesis are currently used in digital instruments and software
synthesizers. A family of new synthesis methods is based on
physical models of vibrating structures (string, drum, etc.)
rather than on descriptions of the resulting waveforms. This
article describes various approaches to digital sound synthesis in general and discusses physical modelling methods
in particular. Physical models in the form of partial differential equations are presented. Then it is shown how to
derive discrete-time models which are suitable for real-time
DSP implementation. Applications to computer music are
given as examples.

1. Introduction
The last 150 years have seen tremendous advances in
electrical, electronic, and digital information transmission
and processing. From the very beginning, the available
technology has not only been used to send written or spoken
messages but also for more entertaining purposes: to make
music! An early example is the Musical Telegraph of Elisha
Gray in 1876, based on the telephone technology of that
time. Later examples used vacuum tube oscillators throughout the first half of last century, transistorized analog synthesizers in the 1960s, and the first digital instruments in
the 1970s. By the end of last century, digital soundcards
with various methods for sound reproduction and generation were commonplace in any personal computer.
The development is rapidly going on. One driving force
is certainly the availability of ever more powerful hardware.
Cheap memory makes it possible to store sound samples in high quality and astonishing variety. The increase in processing
power makes it possible to compute sounds in real time.
New algorithms and more powerful software also
give desktop computers the functionality of stereo equipment or sound studios. One example is new coding schemes
for high quality audio. Together with rising bitrates for
file transmission on the internet, they have made digital
music recordings freely available on the world wide web.
Another example is the combination of high performance
sound cards, high capacity and fast access hard disks, and
sophisticated software for audio recording, processing and
mixing. A high-end personal computer equipped with these
components and programs provides the full functionality for
a small home recording studio.
While more powerful hardware and software turn a single
computer into a music machine, advances in standardization pave the way to networked solutions. The benefits of
audio coding standards have already been mentioned. But
the new MPEG-4 video and audio coding standard does not
only provide natural but also synthetic audio coding. This
means that not only compressed samples of recorded music
can be transmitted, but also digital scores similar to MIDI
in addition to algorithms for the sound generation. Finally,
the concept of Structured Audio makes it possible to break down an
acoustic scene into its components and to transmit and
manipulate them independently.
While natural audio coding is a well researched subject
with widespread applications, the creation of synthetic high
quality music is a topic of active development. For some
time, applications have been confined to the refinement of
digital musical instruments and software synthesizers. Recently, digital sound synthesis has found its way into the MPEG-4 video and audio coding standard. The most recent and
maybe most interesting family of synthesis algorithms is
based on physical models of vibrating structures.
This article will highlight some of the methods for digital sound synthesis with special emphasis on physical modelling. Section 2 presents a survey of synthesis methods.
Two algorithms for physical modelling are described in section 3. Applications to computer music are given in section 4.

2. Digital Sound Synthesis


2.1. Overview
Four methods for the synthesis of musical sounds are
presented in ascending order of modelling complexity [5].
The first method, wavetable synthesis, is based on samples
of recorded sounds with little consideration of their physical nature. Spectral synthesis creates sounds from models of their time-frequency behaviour. The parameters of
these models are derived from descriptions of the desired
waveforms. Nonlinear synthesis makes it possible to create spectrally
rich sounds with very modest complexity of the synthesis
algorithms. In contrast to spectral synthesis, the parameters
of these nonlinear models are not related to the produced
waveforms in a straightforward way. The most advanced
method, physical modelling, is based on models of the physical properties of the vibrating structure which produces the
sound. Rather than imitating a waveform, these models simulate the
physical behaviour of a string, drum, etc. Such simulations
are numerically demanding, but modern hardware allows
real-time implementations under practical conditions.

2.2. Wavetable Synthesis


The most widespread method for sound generation in
digital musical instruments today is wavetable synthesis,
also simply called sampling. Here, the term wavetable synthesis will be used, since sampling strictly denotes time discretization of continuous signals in the sense of signal theory.
In wavetable synthesis recorded or synthesized musical
events are stored in the internal memory and are played back
on demand. Therefore wavetable synthesis does not require
a parameterized sound source model. It only consists of
a database of digitized musical events (the wavetable) and
a set of playback tools. The musical events are typically
temporal parts of single notes recorded from various instruments and at various frequencies. The musical events
must be long enough to capture the attack of the real sounds
as well as a portion of the sustain. Capturing the attack is
necessary to reproduce the typical sound of an instrument.
Recording a sufficiently long sustain period avoids a strict
periodicity during playback.
The playback tools consist of various techniques for
sound variation during reproduction. The most important
components of this toolset are pitch shifting, looping, enveloping, and filtering. They are discussed here only briefly.
See [3, chapter 8] and [5] for a more detailed treatment.
Pitch shifting makes it possible to play a wavetable at different
pitches. Recording notes at all possible frequencies for
all instruments of interest would require excessive memory. To avoid this situation only a subset of the frequency
range is recorded. Missing keys are reconstructed from the
closest recorded frequency by pitch variation during playback. Pitch shifting is accomplished by sample rate conversion techniques. Pitch variation is only possible within
the range of a few semitones without noticeable alteration
of the sound characteristics (Mickey Mouse effect).
Looping stands for recursive read out of the wavetable
during playback. It is applied due to memory limitations as
well as length variations of the played notes. As mentioned
above, only a certain period is recorded, long enough to capture the richness of the sound. This period is extended by
looping to produce the required duration of the tone. Care
has to be taken to avoid discontinuities at the loop boundaries.
Enveloping denotes the application of a time varying
gain function on the looped wavetable. Since the typical
attack-decay-sustain-release (ADSR) envelope of an instrument is destroyed by looping, it can be reconstructed or
modified by enveloping.
Filtering modifies the time dependent spectral content of
a note as enveloping changes its amplitude. Usually recursive digital filters of low order with adjustable coefficients
are used. This allows not only a better sound-variability
than present in the originally recorded wavetables but also
time-varying effects which are not possible with acoustic
instruments.
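The playback tools described above can be illustrated with a minimal Python sketch. It is not taken from any particular instrument: the function names `play_wavetable` and `linear_envelope` are hypothetical, linear interpolation stands in for a proper sample rate converter, and the envelope is a simple linear attack and release rather than a full ADSR curve. The resampling ratio for a shift of s semitones is assumed to be 2^(s/12) (equal temperament).

```python
import math

def play_wavetable(table, loop_start, loop_end, ratio, n_out):
    """Pitch-shifted, looped playback of a wavetable.

    The table is read at `ratio` times the recorded speed using linear
    interpolation. Once the read position passes `loop_end`, it jumps back
    by the loop length, so the segment [loop_start, loop_end) repeats.
    Assumes ratio < loop_end - loop_start and len(table) >= loop_end.
    """
    out = []
    pos = 0.0
    loop_len = loop_end - loop_start
    for _ in range(n_out):
        i = int(pos)
        frac = pos - i
        j = i + 1
        if j >= loop_end:              # interpolation neighbour wraps into the loop
            j = loop_start + (j - loop_end)
        out.append((1.0 - frac) * table[i] + frac * table[j])
        pos += ratio
        if pos >= loop_end:            # looping
            pos -= loop_len
    return out

def linear_envelope(n_out, attack, release):
    """Time-varying gain with a linear attack and a linear release."""
    env = []
    for n in range(n_out):
        g = 1.0
        if n < attack:
            g = n / attack
        if n >= n_out - release:
            g = min(g, (n_out - 1 - n) / release)
        env.append(g)
    return env

# Hypothetical usage: one sine period as wavetable, shifted up two semitones
table = [math.sin(2.0 * math.pi * n / 64) for n in range(64)]
ratio = 2.0 ** (2 / 12)
note = play_wavetable(table, 0, 64, ratio, 200)
env = linear_envelope(200, 20, 50)
shaped = [g * v for g, v in zip(env, note)]
```

Choosing the loop boundaries so that the loop spans whole periods, as in the usage lines, is what avoids the discontinuities mentioned above.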
Despite these playback tools for sound alteration
(and others not mentioned here), the sound variation of
wavetable synthesis is limited by the recorded material.
However, with the availability of cheap memory, wavetable
synthesis has become popular for two reasons: Low computational cost and ease of operation. More advanced synthesis techniques need more processing power and require
more skill of the performing musician to fully exploit their
advantages.

2.3. Spectral Synthesis


While wavetable synthesis is based on sampled waveforms in the time domain, spectral synthesis produces
sounds from frequency domain models. There is a variety of
methods based on a common generic signal representation:
the superposition of basis functions $\psi_l(t)$ with time-varying
amplitudes $F_l(t)$

$$f(t) = \sum_l F_l(t)\,\psi_l(t) . \tag{1}$$

Only a short description of the main approaches is given
here, based on [3, chapter 9], [5], and [2]. Practical implementations often consist of combinations of these methods.

Additive Synthesis In additive synthesis, (1) describes
the superposition of sinusoids

$$f(t) = \sum_l F_l(t)\,\sin(\theta_l(t)) + n(t) . \tag{2}$$

Sometimes a noise source $n(t)$ is added to account for the
stochastic character which is not modelled well by sinusoids. In the simplest case, each frequency component $\theta_l(t)$
is given by a constant frequency and phase term $\theta_l(t) = \omega_l t + \varphi_l$. In practical synthesis, the time signals in (2) are
represented by samples and the synthesized sound is processed in subsequent frames. The time variation of the amplitude and the frequency of the sinusoids are considered by
changing the values of $F_l$, $\omega_l$, and possibly $\varphi_l$ from frame
to frame.
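The frame-based evaluation of (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `additive_synth` and the frame layout are hypothetical, the noise term n(t) is omitted, all initial phases are set to zero, and the phase of each sinusoid is accumulated across frames so that a frequency change does not cause a phase jump.

```python
import math

def additive_synth(frames, frame_len, fs):
    """Sum of sinusoids with amplitude and frequency updated once per frame.

    frames    : list of frames, each a list of (amplitude, frequency_hz)
                pairs, one pair per sinusoid (same count in every frame)
    frame_len : number of output samples per frame
    fs        : sampling rate in Hz
    """
    phase = [0.0] * len(frames[0])      # running phase of each sinusoid
    out = []
    for frame in frames:
        for _ in range(frame_len):
            sample = 0.0
            for l, (amp, freq) in enumerate(frame):
                sample += amp * math.sin(phase[l])
                phase[l] += 2.0 * math.pi * freq / fs
            out.append(sample)
    return out

# Hypothetical usage: one partial at 100 Hz whose amplitude drops in frame 2
signal = additive_synth([[(1.0, 100.0)], [(0.5, 100.0)]], frame_len=50, fs=8000)
```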
Subtractive Synthesis Subtractive synthesis shapes signals by taking away frequency components from a spectrally rich excitation signal. This is achieved by exciting
time-varying filters with noise. This approach is closely related to filtering in wavetable synthesis. However, in subtractive synthesis, the filter input is a synthetic signal rather
than a wavetable. Since harmonic tones cannot be well approximated by filtered noise, subtractive synthesis is mostly
used in conjunction with other synthesis methods.
Granular Synthesis In granular synthesis the basis functions $\psi_l(t)$ in (1) are chosen to be concentrated in time and
frequency. These basis functions are called atoms or grains
here. Building sounds from such grains is called granular
synthesis. Sound grains can be obtained by various means:
from windowed sine segments, from wavetables, from Gabor expansions, or with wavelet techniques.
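As an illustration of the first option, the following sketch builds grains from windowed sine segments and places them on a time axis by overlap-add. The Hann window and the names `make_grain` and `scatter_grains` are assumptions made for this example only; other windows and grain sources work in the same way.

```python
import math

def make_grain(freq, length, fs, phase=0.0):
    """Sine segment of `length` samples shaped by a (periodic) Hann window."""
    return [
        (0.5 - 0.5 * math.cos(2.0 * math.pi * n / length))
        * math.sin(2.0 * math.pi * freq * n / fs + phase)
        for n in range(length)
    ]

def scatter_grains(grains, onsets, n_out):
    """Overlap-add each grain into the output, starting at its onset sample."""
    out = [0.0] * n_out
    for grain, onset in zip(grains, onsets):
        for n, value in enumerate(grain):
            if 0 <= onset + n < n_out:
                out[onset + n] += value
    return out

# Hypothetical usage: four 440 Hz grains with 50 % overlap
grains = [make_grain(440.0, 64, 8000) for _ in range(4)]
cloud = scatter_grains(grains, [0, 32, 64, 96], 160)
```

With a hop of half the grain length, the Hann windows add up to a constant, so the grain envelopes themselves do not impose an amplitude ripple on the result.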

2.4. Nonlinear Synthesis


In the previous sections linear sound synthesis methods
have been described. They varied from the computationally
cheap wavetable synthesis with low variability to the computationally expensive additive synthesis, which gives arbitrary access to the basic components of a sound.
Using nonlinear models for sound synthesis leads to
computationally cheap methods with rich spectra. The disadvantage of these methods is that the resulting time functions or spectra cannot be calculated analytically in most
cases. Also the effect of parameter changes on the timbre of the sound cannot be predicted except for very simple
schemes. Nevertheless, nonlinear synthesis provides computationally low-cost synthetic sounds with a wide variety of
time functions and spectra.
The simplest case of nonlinear synthesis is discussed
here. Making the phase term in the sine function time-dependent leads to the frequency modulation (FM) method. In
its simplest form, the time function $f(t)$ is given by

$$f(t) = F(t)\,\sin(\omega_0 t + \phi(t)) . \tag{3}$$

Figure 1. Frequency Modulation (block labels: $\omega_m$, VCO 1, $\omega_0$, VCO 2, $f(t)$)

The implementation consists of at least two coupled oscillators. In (3) the carrier $\sin(\omega_0 t)$ is modulated by the time-dependent modulator $\phi(t)$ such that the frequency becomes
time-dependent with $\omega(t) = \omega_0 + (\partial/\partial t)\phi(t)$. If the modulator is also sinusoidal with $\phi(t) = \eta \sin(\omega_m t)$ as shown
in Fig. 1, the resulting spectrum consists of the carrier frequency $\omega_0$ and side frequencies at $\omega_0 \pm n\omega_m$, $n \in \mathbb{N}$. The
relations between the amplitudes of the discrete frequencies can be varied with the modulation index $\eta$. They are
given by the values of the Bessel functions of order $n$ with
argument $\eta$. Four different FM spectra for $\omega_0 = 1$ kHz
and different modulator frequencies and different modulation indices $\eta$ are shown in Fig. 2. The spectrum for $\eta = 1$
has a simple rational relation between $\omega_0$ and $\omega_m$ resulting
in a harmonic spectrum. Increasing the modulation index
to $\eta = 2$ preserves the distance of the frequency lines but
increases their number (top right). A slight decrease of $\omega_m$
moves the frequency components closer together and produces a non-harmonic spectrum (bottom left). Spectrally
very rich sounds can be produced by combining small values of the modulation frequency $\omega_m$ with high modulation
indices, as shown for $\eta = 8$. However, due to the dependence on only a few parameters, arbitrary spectra as in additive synthesis cannot be produced. Therefore this method
fails to reproduce natural instruments. Nevertheless, FM is
frequently used in synthesizers and in sound cards for personal computers, often with more than just two oscillators
in a variety of different connections.
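The relation between the side frequency amplitudes and the Bessel functions can be checked numerically. The sketch below generates the FM signal (3) with constant amplitude and a sinusoidal modulator, measures the amplitude at selected frequencies by correlation with a complex exponential, and compares it with the Bessel function of the first kind computed from its power series. The parameter values (1 kHz carrier, 200 Hz modulator, modulation index 2, 8 kHz sampling rate) and all function names are hypothetical choices for this example.

```python
import math

def fm_signal(f0, fm, index, fs, n_samples):
    """Samples of sin(2*pi*f0*t + index*sin(2*pi*fm*t)) at rate fs."""
    return [
        math.sin(2.0 * math.pi * f0 * k / fs
                 + index * math.sin(2.0 * math.pi * fm * k / fs))
        for k in range(n_samples)
    ]

def amplitude_at(x, f, fs):
    """Amplitude of the sinusoidal component of x at frequency f.

    Exact when all components complete a whole number of periods in len(x).
    """
    re = sum(v * math.cos(2.0 * math.pi * f * k / fs) for k, v in enumerate(x))
    im = sum(v * math.sin(2.0 * math.pi * f * k / fs) for k, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / len(x)

def bessel_j(order, x, terms=30):
    """Bessel function of the first kind, integer order, via its power series."""
    return sum(
        (-1) ** k / (math.factorial(k) * math.factorial(k + order))
        * (x / 2.0) ** (2 * k + order)
        for k in range(terms)
    )

fs, f0, fm, index = 8000, 1000, 200, 2.0
x = fm_signal(f0, fm, index, fs, 800)
for n in range(4):
    measured = amplitude_at(x, f0 + n * fm, fs)
    predicted = abs(bessel_j(n, index))
    print(f"n={n}: measured {measured:.4f}, |J_n| {predicted:.4f}")
```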

2.5. Physical Modelling


Wavetable synthesis, spectral synthesis as well as the
nonlinear synthesis are based on sound descriptions in the
time and frequency domain. A family of methods called
physical modelling goes one step further by modelling
directly the sound production mechanism instead of the
sound. Invoking the laws of acoustics and elasticity theory results in the physical description of the main vibrating structures of musical instruments by partial differential
equations. Most methods are based on the wave equation
which describes wave propagation in solids and in air ([17]).

Figure 2. Typical FM spectra (four panels with modulation indices $\eta$ = 1, 2, 2, and 8; horizontal axes: f in kHz)


Finite Difference Methods The most direct approach is
the discretization of the wave equation by finite difference
approximations of the partial derivatives with respect to
time and space. However, a faithful reproduction of the harmonic spectrum of an instrument requires small step sizes in
time and space. The resulting numerical expense is considerable. The application of this approach to piano strings has,
for example, been shown by [1]. A physical motivation of
the space discretization is given by the mass-spring models
described in [4].
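A minimal sketch of the finite difference idea for the undamped wave equation with fixed ends is given below. It uses the standard explicit scheme with central differences in time and space, which is not necessarily the scheme used in [1]. The names `pluck_shape`, `simulate_string` and the parameter `lam` (the ratio cT/h of wave speed times time step to spatial step) are assumptions for this example.

```python
def pluck_shape(M, p):
    """Triangular initial deflection on M+1 grid points with its peak at point p."""
    return [m / p if m <= p else (M - m) / (M - p) for m in range(M + 1)]

def simulate_string(y0, steps, lam=1.0):
    """Explicit finite difference scheme for the ideal string, fixed ends.

    y0    : initial deflection on the grid (zero at both ends)
    steps : number of time steps to advance
    lam   : c*T/h; the scheme is stable for lam <= 1
    The initial velocity is assumed to be zero (plucked string).
    """
    M = len(y0) - 1
    lam2 = lam * lam
    prev = list(y0)
    if steps == 0:
        return prev
    # first time step, derived from zero initial velocity
    curr = [0.0] * (M + 1)
    for m in range(1, M):
        curr[m] = prev[m] + 0.5 * lam2 * (prev[m + 1] - 2.0 * prev[m] + prev[m - 1])
    # remaining time steps: central differences in time and space
    for _ in range(steps - 1):
        nxt = [0.0] * (M + 1)
        for m in range(1, M):
            nxt[m] = (2.0 * curr[m] - prev[m]
                      + lam2 * (curr[m + 1] - 2.0 * curr[m] + curr[m - 1]))
        prev, curr = curr, nxt
    return curr

# Hypothetical usage: string with 20 segments, plucked at grid point 5
y0 = pluck_shape(20, 5)
y_half = simulate_string(y0, 20)   # after half a period
y_full = simulate_string(y0, 40)   # after a full period
```

For `lam = 1` the scheme reproduces the travelling wave solution exactly on the grid, so the initial shape reappears after 2M time steps. For the more detailed string models with stiffness and damping no such special case exists, which is where the small step sizes mentioned above come in.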
Modal Synthesis Vibrating structures can also be described in terms of their characteristic frequencies or modes
and the associated decay rates. This approach allows the
formulation of couplings between different substructures.
Except for simple cases, the determination of the eigenmodes can only be conducted by experiments ([4]).
Digital Waveguides A well known theoretical approach
to the solution of the wave equation in one spatial dimension is the d'Alembert solution. It separates the wave propagation process into a pair of waves travelling in opposite directions without dispersion or losses. This separation
is the basis of the digital waveguides described in [9], [3,
chapter 10], and [5, chapter 7]. The digital model consists
of a bidirectional delay line with coupling coefficients between the taps approximating losses and dispersion. The
digital waveguide method has been refined by proper adjustment of the delay lines using fractional delay filters [15].
Applications to string instruments are found in [14] and to
woodwind instruments in [7].
Couplings between sections with different wave
impedances are modelled by scattering junctions. They
approximate the partial reflections at discontinuities.
Waveguide methods have also been extended to two
and three spatial dimensions, however with considerable
increase in computational demand.
Physical modeling by digital waveguides is incorporated
into various commercial musical instruments using appropriate models for excitation (e.g. plucked, struck, and
bowed strings) and boundary conditions. Furthermore, it
provides a sound basis for the creation of artificial instruments like bowed flutes.
Transfer Function Models This relatively new approach
starts directly at the partial differential equation (PDE) describing the continuous vibrations in a musical instrument.
It transforms the PDE with suitable functional transformations into a multidimensional (MD) transfer function model
(TFM). For the time variable the Laplace transformation is
used. The spatial transformation depends on the PDE and
its boundary conditions. This leads to a generalized Sturm-Liouville type problem whose solutions are the eigenfunctions $K(x, \beta)$ and the eigenvalues $\beta$. They are used in
the spatial transformation as transformation kernel and as
spatial frequency variable [12].
The physical effects modelled by the PDE like longitudinal and transversal oscillations, loss and dispersion are
treated with this method analytically. Moreover, the TFM
explicitly takes initial and boundary conditions, as well as
linear and nonlinear excitation functions into account. The
discretization of this continuous model for computer implementation based on analog-to-discrete transformations preserves not only the inherent stability, but also the natural
frequencies of the oscillating body.
All parameters of this method are strictly based on physical parameters (dimensions as well as material parameters)
and the output signal is calculated analytically from these
parameters.
Digital waveguides and multidimensional transfer function models are covered in more detail in section 3.

2.6. Structured Audio


The techniques described above had initially been confined to proprietary hardware and software in musical instruments or dedicated programs for the generation of digital music. Although there have been tremendous efforts
in the standardization of multimedia services, they were
mostly directed to the compression of natural audio and
video material. Synthetic sounds were of no concern until the emergence of MPEG-4 standardization. While still
advancing the coded representation of natural audiovisual
scenes, MPEG-4 introduced a tool for digital sound synthesis under the name of structured audio (SA) [16, 8, 13]. The
idea is not to transmit coded sounds as in natural audio but
a highly parametric description of music (such as a musical score) from which the sound is synthesized at the decoder. Among other tools, structured audio provides score
languages to encode the musical parameters (pitch, duration, etc.) as well as methods for sound synthesis.
To describe musical scores, the very popular MIDI standard has been included into MPEG-4 structured audio.
In addition, a more advanced structured audio score language
(SASL) has been created to provide enhanced control of algorithmic and wavetable synthesis.
Two different methods also exist for sound synthesis:
a programming language for musical synthesis algorithms,
the structured audio orchestra language (SAOL), and a standard
for the storage and transmission of wavetables, the structured
audio sample bank format (SASBF).
SAOL is an object oriented programming language with
special commands and variable types for real-time sound
synthesis. It differs from conventional programming languages by providing three different time scales for the generation of synthetic waveforms. Each time scale has
its own data type, such that variables of that type are automatically evaluated at the corresponding rate. The fastest
time scale is the a-rate, which runs at the sampling
rate. A medium time scale is the control rate (k-rate) for updating envelopes and other control signals. Typical values
for the k-rate are a few cycles per second. An even slower
rate is the instrument rate (i-rate) for the initialization of
timbre parameters. They may be updated asynchronously,
e.g. at the beginning of each note. Furthermore, SAOL provides special high level sound processing commands like
signal and envelope generation, parametric filtering, evaluation of MIDI data, and access to wavetables.
Since SAOL is a general purpose language, it can be used
to realize any of the sound synthesis algorithms described
above. Since SAOL code and sample banks are transmitted together with the score data, the synthesized sound at
the decoder will be exactly the same as intended at the encoder side. This is in contrast to the reproduction of MIDI
files, where the sound quality is determined by the individual
instrument or wavetable of the listener's audio equipment.
Although there is no real-time structured audio encoder to date, it can be expected that synthetic audio reproduction will become an alternative to coded natural audio in
the near future [18]. It has already been demonstrated that
structured audio provides highest sound quality at very low
bitrates, not attainable by natural audio coders. Of course
synthetic audio is restricted to music which can be completely described by musical scores of some kind. It cannot
reproduce sounds without an underlying model, e.g. from
microphone recordings. Furthermore, the available synthesis methods and sound production models have to be
refined. Advances can be expected in the field of physical
models, which are discussed in the following section.

3. Physical Modelling
Sound synthesis by physical modelling requires two essential steps: the description of a vibrating structure by the
principles of physics and the transformation into a discrete-time, discrete-space model which is suitable for computer
implementation. Each step requires certain simplifications
and allows variations. These are discussed in the following
sections.

3.1. Vibration Models


Deformable bodies may exhibit vibrations in various frequency ranges. The exact description of such vibrations requires breaking up the body into small volume elements.
Setting up a balance between the forces of inertia and deformation for each element leads to a PDE for the deflection from the rest position. The derivation of these PDE descriptions for vibrating strings, reeds, membranes, and other
elastic bodies can be found e.g. in [6, 10, 17].
Various PDE models for a vibrating string are presented
as examples. The time and space coordinates are denoted by
$t$ and $x$. Only one space coordinate is considered for simplicity. $y(x,t)$ denotes the deflection of the string from the
rest position. Furthermore a number of material constants
and shape parameters are required such as Young's modulus $E$ and the density $\rho$ of the material, the cross section
area $A$ and the moment of inertia $I$ of the string.
The simplest model results for undamped longitudinal
waves. It takes the form of the well-known wave equation
with second order derivatives for time and space

$$E \frac{\partial^2 y(x,t)}{\partial x^2} - \rho \frac{\partial^2 y(x,t)}{\partial t^2} = 0 . \tag{4}$$

For sound generation, transversal waves are more important, since they transmit energy to the resonance body and
the surrounding air. They are characterized by a fourth-order spatial derivative

$$EI \frac{\partial^4 y(x,t)}{\partial x^4} + \rho A \frac{\partial^2 y(x,t)}{\partial t^2} = 0 . \tag{5}$$

Typically, a string is under tension from a certain force $F$, resulting in an additional second order term

$$EI \frac{\partial^4 y(x,t)}{\partial x^4} - F \frac{\partial^2 y(x,t)}{\partial x^2} + \rho A \frac{\partial^2 y(x,t)}{\partial t^2} = 0 . \tag{6}$$

Further refinement of the model by inclusion of rotational
vibrations and shear effects finally leads to the Timoshenko
equation from elasticity theory.
Rather than refining the model in this direction, we extend (6) by an external force per length $f(x,t)$. Furthermore, damping is considered by additional terms with the
decay variables $d_1$ and $d_3$

$$EI \frac{\partial^4 y(x,t)}{\partial x^4} - F \frac{\partial^2 y(x,t)}{\partial x^2} + \rho A \frac{\partial^2 y(x,t)}{\partial t^2} + d_1 \frac{\partial y(x,t)}{\partial t} + d_3 \frac{\partial^3 y(x,t)}{\partial t\,\partial x^2} = f(x,t) . \tag{7}$$

Note that for perfectly flexible ($E = 0$) or very thin ($I = 0$) strings with
no damping ($d_1 = d_3 = 0$), (7) has the same structure as the
wave equation (4), however with different coefficients. Its
solution can be written as superposition of a forward and a
backward travelling wave (d'Alembert solution)

$$y(x,t) = y_l(x + ct) + y_r(x - ct) \tag{8}$$

where $c = \sqrt{F/(\rho A)}$ is the propagation speed of the waves.
If the first term in (7) does not vanish, then the travelling
waves are subject to dispersion. If the decay variables $d_1$
and $d_3$ are nonzero, then the term with $d_1$ introduces frequency independent damping and the term with $d_3$ introduces frequency dependent losses.
The vibration modes of a string are not only determined
by the PDE but also by the boundary conditions at the ends
$x = x_0$ and $x = x_1$. To solve (7) we need four boundary
conditions because the highest order of spatial derivatives is
four. In most musical instruments the string is fixed at the
ends, as shown in Fig. 3.

Figure 3. Mechanical fixing of a string

The boundary conditions for this situation require that
the deflection (9) and the skewness (10) at these points are
zero [6] (see Fig. 4)

$$y(x_0,t) = 0, \qquad y(x_1,t) = 0, \tag{9}$$
$$y''(x_0,t) = 0, \qquad y''(x_1,t) = 0 . \tag{10}$$

Figure 4. Boundary conditions for a string fixed at both ends

The double prime denotes the second order spatial derivative $y'' = \partial^2 y/\partial x^2$.
Elastic fixing at the ends of the string or interface conditions to the sound board are described in the same way,
e.g. by prescribing a certain linear combination of $y(x_0,t)$
and $y''(x_0,t)$. The boundary conditions can also include an
excitation function at the boundary as they occur in woodwind instruments.
Typical excitation modes for musical instruments are to
pluck or to strike the string. These modes are expressed
in mathematical terms as initial conditions of the PDE. Because the highest time derivative in (7) is two, we need two
initial conditions: one for the initial value of the deflection
and one for its time derivative.
The plucked string is characterized by a given deflection
profile for $t = 0$, while the time derivatives are zero [6].
The struck string is given by the first order time derivatives
while the deflection is zero at $t = 0$. The corresponding
initial conditions are

$$y(x,0) = y_{i0}(x), \tag{11}$$
$$\dot{y}(x,0) = y_{i1}(x) . \tag{12}$$

The dot denotes time derivation $\dot{y}(x,t) = \partial y/\partial t$. The initial profile of a string plucked close to $x_1$ is shown in Fig. 5,
while Fig. 6 shows the initial velocity of a string struck by
a hammer at the position $x_e$. In general, both $y_{i0}(x)$ and
$y_{i1}(x)$ can be specified independently from each other.

Figure 5. Initial conditions for a plucked string

Figure 6. Initial conditions for a struck string


The PDE descriptions presented in this section constitute
highly accurate physical models of strings and other vibrating structures. Extensions to two and three space dimensions are straightforward with a more general definition of
the spatial differentiation operators. However, a computer
implementation of these models requires suitable discretization schemes for time and space coordinates.
Two approaches of practical importance are the digital waveguide
method and the functional transformation method.
They are described below in detail.
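To give a feeling for the orders of magnitude, the propagation speed of the ideal string, the square root of F/(ρA), can be evaluated for a concrete case. The material values below describe a hypothetical steel string and are not taken from the paper. The relation f1 = c/(2L) for the lowest mode of an ideal string of length L = x1 - x0 fixed at both ends is a standard result that is used here as an additional assumption; it neglects stiffness and damping.

```python
import math

def wave_speed(F, rho, A):
    """Propagation speed of transversal waves on an ideal string.

    F   : tension in N
    rho : density in kg/m^3
    A   : cross section area in m^2
    """
    return math.sqrt(F / (rho * A))

def fundamental_frequency(F, rho, A, length):
    """Lowest mode of an ideal string of the given length, fixed at both ends."""
    return wave_speed(F, rho, A) / (2.0 * length)

# Hypothetical steel string: 1 mm diameter, 0.65 m long, 100 N tension
radius = 0.5e-3
A = math.pi * radius ** 2
rho = 7800.0
F = 100.0
length = 0.65

c = wave_speed(F, rho, A)
f1 = fundamental_frequency(F, rho, A, length)
print(f"c = {c:.1f} m/s, f1 = {f1:.1f} Hz")
```

Since c grows with the square root of the tension, quadrupling F doubles the pitch, which is the familiar tuning behaviour of string instruments.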

3.2. Digital Waveguide Method


The digital waveguide method is based on the analogy
between wave propagation mechanisms in elastic bodies,
reeds and electromagnetic waveguides and their counterparts in digital delay lines. Its application to computer music has been presented e.g. in [9], [3, chapter 10], [5, chapter
7] and [15, 14, 7].
The simplest vibration model is the wave equation, either according to (4) for longitudinal waves or by simplification
of (7) for transversal waves. A representation of the corresponding travelling wave solution (8) by a continuous
waveguide is shown in Fig. 7.

Figure 7. Continuous waveguide

It can be transformed into an equivalent discrete structure
by sampling the solution $y(x,t)$ on a space-time grid with
$x = mh$ and $t = kT$ where $m$ and $k$ are the discrete space
and time coordinates and $h$ and $T$ are the corresponding
step sizes. When the spatial step size $h$ is set equal to the
distance that a wave with propagation speed $c$ travels during
the time step $T$, i.e. $h = cT$, then the sampled waves
$y[m,k] = y(mh,kT)$ are related by

$$y[m,k] = y_l[m+k] + y_r[m-k] . \tag{13}$$

The spatial shifts by the distance $h$ are realized by sampling
the continuous waveguide in x-direction and the time shifts
are realized by delay elements ($z^{-1}$). The resulting dual delay line structure of a digital waveguide is shown in Fig. 8.

Figure 8. Digital waveguide

It is capable of reproducing the travelling wave solution
of the wave equation, but it does not consider loss or dispersion from more detailed vibration models such as (7).
These effects can be approximated by including additional
filter elements $H(z)$ in the delay lines (see Fig. 9). These
elements may consist of

- a multiplication to model frequency independent damping,
- a selective filter to model frequency dependent damping, or
- an allpass to model frequency dependent delay.

Figure 9. Digital waveguide with loss and dispersion filters

Boundary conditions are considered by a proper termination of the dual delay line waveguide. These are also realized by digital filters as shown in Fig. 10. Similar to the
filters $H(z)$ for loss and dispersion, the boundary reflection
filters $R_l(z)$ and $R_r(z)$ represent

- a phase shift for an open or closed line termination,
- a real constant $0 < r < 1$ as frequency independent reflection factor, or
- a digital filter for frequency dependent reflections.

Figure 10. Digital waveguide with boundary reflection filters

The double delay line waveguide structure in Fig. 10 gives
a complete picture of how wave propagation, loss, dispersion,
and reflection at the boundaries can be approximated. In
addition, initial conditions are realized by the initial values
of the delay elements for k = 0.
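The dual delay line structure can be sketched in a few lines. The code below is an illustration under simplifying assumptions, not the structure of any cited implementation: there are no loss or dispersion filters in the lines, both terminations are modelled by a sign inversion scaled with a real reflection factor `r`, and the initial deflection is split equally between the two travelling waves (plucked string with zero initial velocity). The function names are hypothetical.

```python
def waveguide_step(yl, yr, r=1.0):
    """Advance both delay lines by one time step.

    yl : left-going wave samples on grid points 0..M
    yr : right-going wave samples on grid points 0..M
    r  : reflection factor at both ends (r = 1 for lossless fixed ends)
    """
    M = len(yl) - 1
    new_yl = yl[1:] + [0.0]          # left-going wave moves one point to the left
    new_yr = [0.0] + yr[:-1]         # right-going wave moves one point to the right
    new_yr[0] = -r * new_yl[0]       # reflection with sign inversion at the left end
    new_yl[M] = -r * new_yr[M]       # reflection with sign inversion at the right end
    return new_yl, new_yr

def simulate_waveguide(y0, steps, r=1.0):
    """Deflection after `steps` time steps, starting from the shape y0 at rest."""
    yl = [0.5 * v for v in y0]       # initial values of the delay elements
    yr = [0.5 * v for v in y0]
    for _ in range(steps):
        yl, yr = waveguide_step(yl, yr, r)
    return [a + b for a, b in zip(yl, yr)]

# Hypothetical usage: triangular pluck on a string with M = 20 segments
M, p = 20, 5
y0 = [m / p if m <= p else (M - m) / (M - p) for m in range(M + 1)]
y_lossless = simulate_waveguide(y0, 2 * M)        # one full period
y_lossy = simulate_waveguide(y0, 2 * M, r=0.9)    # two lossy reflections
```

Each sample passes both terminations once per period of 2M time steps, so with `r = 0.9` the shape returns scaled by 0.81. Only shifts and sign changes are needed per time step, which is the reason for the low computational cost of waveguide models.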
On the other hand, a practical implementation under real-time constraints calls for some simplifications. First, all
delay elements can be combined into a single delay line represented by a multiple delay element of 2M samples. Then
the various filters $H(z)$ at each delay element, as well as the
boundary reflection filters $R_l(z)$ and $R_r(z)$, are combined
into a smaller number of different filters. In practical implementations, their transfer functions are not derived directly
from $H(z)$, $R_l(z)$, and $R_r(z)$. Instead they are designed to
produce a certain waveform. It has turned out that three different filters are adequate to model the correct pitch, dispersion and frequency dependent damping [14]. Fig. 11 shows
the resulting arrangement of a single delay line and three
digital filters.

Figure 11. Efficient realization of a digital waveguide with excitation function $f(t)$ and output $y(t)$ (block labels: $H_{fd}$, $H_{disp}$, $H_{TP}$)

The functions of these filters are

- $H_{fd}(z)$: fractional delay filter to produce the exact pitch,
- $H_{disp}(z)$: dispersion filter for deviations from pure wave propagation,
- $H_{TP}(z)$: lowpass filter to model frequency dependent damping.

They are assumed to be orthogonal in the sense that they
can be designed independently from each other.
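A strongly simplified version of such a single delay line loop is sketched below. It is not the filter design of [14]: a two-point average with gain `g` stands in for the lowpass, a first-order allpass stands in for the fractional delay filter, and the dispersion filter is omitted altogether. The result is the well-known Karplus-Strong type string loop. All names and parameter values are assumptions for this example.

```python
def plucked_string(excitation, n_out, delay, frac=0.5, g=0.995):
    """Single delay line loop with a lowpass and a fractional delay allpass.

    excitation : input samples f fed into the loop (zero afterwards)
    n_out      : number of output samples
    delay      : integer length of the delay line in samples
    frac       : additional fractional delay of the allpass (roughly 0.1 .. 1.1)
    g          : loop gain below 1, sets the overall decay
    The loop delay is about delay + 0.5 + frac samples (0.5 from the average).
    """
    a = (1.0 - frac) / (1.0 + frac)      # first-order allpass coefficient
    line = [0.0] * delay                 # circular buffer used as delay line
    idx = 0
    lp_in_prev = 0.0                     # previous input of the lowpass
    ap_in_prev = 0.0                     # previous input of the allpass
    ap_out_prev = 0.0                    # previous output of the allpass
    out = []
    for n in range(n_out):
        x = excitation[n] if n < len(excitation) else 0.0
        delayed = line[idx]                               # y[n - delay]
        lp = g * 0.5 * (delayed + lp_in_prev)             # lowpass stand-in
        lp_in_prev = delayed
        ap = a * lp + ap_in_prev - a * ap_out_prev        # allpass stand-in
        ap_in_prev, ap_out_prev = lp, ap
        y = x + ap
        line[idx] = y
        idx = (idx + 1) % delay
        out.append(y)
    return out

# Hypothetical usage: impulse excitation, loop delay of about 51 samples
tone = plucked_string([1.0], 4000, delay=50)
```

In this sketch the two filters can indeed be adjusted independently: `frac` shifts the pitch without changing the decay, and `g` changes the decay without changing the pitch.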

3.3. Functional Transformation Method


The functional transformation method derives a discrete model of the vibrating body from its multidimensional transfer function description. Such an approach
is well-known from the design of digital filters in the
one-dimensional case. It starts from the description of a
continuous-time electrical network by an ordinary differential equation. Application of Laplace transformation turns
the differential equation into a transfer function. Suitable
discretisation schemes like impulse-invariant transformation
or others convert the transfer function of the continuous-time network into the transfer function of a discrete-time
system, which is suitable for computer implementation. It is
worthwhile to review the reasons why the Laplace transformation is well-suited for the derivation of transfer functions
from ordinary differential equations:
1. Laplace transformation turns the time derivatives into
multiplications with algebraic functions of the complex frequency variable.

2. Laplace transformation turns the initial conditions of a
differential equation into additive terms.
By virtue of these properties, the differential equation with
initial conditions is converted into an algebraic equation.
Solving this equation for the Laplace transform of the output quantity yields the transfer function of the network.
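
As a brief worked example of these two properties (an illustration added here; the first-order system and the symbols $a$, $y_0$, and $T$ are assumptions, not taken from the text), consider $\dot{y}(t) + a\,y(t) = x(t)$ with $y(0) = y_0$:

$$
\begin{aligned}
s\,Y(s) - y_0 + a\,Y(s) &= X(s) \\
Y(s) &= \frac{1}{s+a}\,X(s) + \frac{1}{s+a}\,y_0 \\
\frac{1}{s+a} &\;\to\; \frac{z}{z - e^{-aT}}
\end{aligned}
$$

The time derivative has become a multiplication by $s$, and the initial value $y_0$ appears as an additive term. The last line is the impulse-invariant transformation with sampling interval $T$: the impulse response $e^{-at}$ is sampled at $t = kT$, and the z-transform of $e^{-akT}$ is $z/(z - e^{-aT})$, up to a constant scaling factor that depends on the convention used.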
The same approach can also be applied to multidimensional systems described by PDEs. Again, Laplace transformation can be applied with respect to the time variable.
However, the result will still contain derivatives with respect
to space. Now assume that a transformation exists for the
space variable with properties similar to those of the Laplace
transformation for the time variable. Then application of
this transformation would turn the boundary-value problem
into an algebraic equation.
This approach relies on the existence of a transformation
with respect to space with differentiation properties similar
to the Laplace transformation. It has been shown how such
transformations can be obtained for the PDEs presented in
section 3.1 and many others [12, 11].
Then the following four-step procedure can be applied
to derive a discrete model for vibrating bodies from a PDE
model:
1. Application of the Laplace transformation with respect
to time removes the time derivatives and turns the
initial-boundary-value problem into a boundary value
problem for the space variable.
2. Application of a suitable transformation for the space
variable which removes the spatial derivatives and
turns the boundary value problem into an algebraic
equation.
3. Solution of the algebraic equation for the transform
of the solution of the PDE. The resulting multidimensional (MD) transfer
function is the frequency domain equivalent to the initial continuous-time, continuous-space
PDE description.
4. Discretization of the MD transfer function to obtain
a discrete-time discrete-space model of the vibrating
body.
This procedure is now demonstrated by a simple PDE
model already considered in section 3.2. Then extensions to
other PDE models and other types of boundary conditions
will be discussed.
The derivation of a transfer function model of a vibrating
string is presented by a simple example. We consider the
wave equation for a string with fixed ends (boundary conditions (9))
and with initial conditions according to (11, 12):

$$
\begin{aligned}
\ddot{y}(x,t) - c^2\, y''(x,t) &= 0, & x_0 < x < x_1 \\
y(x,0) &= y_{i0}(x), & x_0 < x < x_1 \\
\dot{y}(x,0) &= y_{i1}(x), & x_0 < x < x_1 \\
y(x_0,t) &= 0, & x = x_0 \\
y(x_1,t) &= 0, & x = x_1
\end{aligned}
\tag{14}
$$

$\ddot{y}(x,t)$ and $y''(x,t)$ denote second order derivatives with
respect to time and space.
Laplace transformation with respect to the time variable, $Y(x,s) = \mathcal{L}\{y(x,t)\}$,
turns this initial-boundary value
problem into a boundary-value problem for the space variable $x$:

$$
\begin{aligned}
s^2\, Y(x,s) - c^2\, Y''(x,s) &= s\, y_{i0}(x) + y_{i1}(x) \\
Y(x_0,s) &= 0 \\
Y(x_1,s) &= 0
\end{aligned}
\tag{15}
$$

Note that the second order time derivative has turned into a
multiplication with $s^2$ and that the initial values from (14)
appear as additive terms on the right hand side.
To remove also the spatial derivative and to consider the
boundary conditions, we apply the spatial transformation

$$
T\{Y(x)\} = \bar{Y}(\beta_\mu) = \int_{x_0}^{x_1} Y(x)\, K(\beta_\mu, x)\, dx
\tag{16}
$$

with the transformation kernel

$$
K(\beta_\mu, x) = K \sin\bigl(\beta_\mu (x - x_0)\bigr)
\tag{17}
$$

and the discrete spatial frequency

$$
\beta_\mu = \mu\, \frac{\pi}{x_1 - x_0}, \qquad \mu \in \mathbb{N}.
\tag{18}
$$

This special form of the spatial transformation (finite sine transformation) has been chosen
because the transformation kernel from (17) fulfills the same boundary conditions
as the deflection $y(x,t)$ of the string (compare (14)):

$$
K(\beta_\mu, x_0) = 0 \tag{19}
$$

$$
K(\beta_\mu, x_1) = 0 \tag{20}
$$

The transformation kernel $K(\beta_\mu, x)$ represents the spatial
eigenfunction of the string. In other words, the frequency
domain quantities $\bar{Y}(\beta_\mu)$ represent the amplitudes of the
corresponding eigenfunctions. In reverse, the inverse transformation constitutes an expansion
of the deflection $y(x)$ in
terms of the eigenfunctions $K(\beta_\mu, x)$

$$
y(x) = T^{-1}\{\bar{Y}(\beta_\mu)\} = \sum_{\mu} \frac{1}{N_\mu}\, \bar{y}(\beta_\mu)\, K(\beta_\mu, x)
\tag{21}
$$

with suitable normalization factors $N_\mu$. The shape of the
first three eigenfunctions is shown in Fig. 12.

Figure 12. Shape of the eigenfunctions $K(\beta_\mu, x)$, $\mu = 1, 2, 3$

Using the conditions (19) and (20) and integration by
parts, we can show the differentiation property of the transformation $T$

$$
T\{Y''(x)\} = \int_{x_0}^{x_1} Y''(x)\, K(\beta_\mu, x)\, dx = -\beta_\mu^2\, \bar{Y}(\beta_\mu).
\tag{22}
$$

Application of (16) and (22) now turns the boundary-value problem (15) into an algebraic
equation

$$
s^2\, \bar{Y}(\beta_\mu, s) + c^2 \beta_\mu^2\, \bar{Y}(\beta_\mu, s) = s\, \bar{y}_{i0}(\beta_\mu) + \bar{y}_{i1}(\beta_\mu).
\tag{23}
$$

It is straightforward to solve (23) for the transform of the
solution $\bar{Y}(\beta_\mu, s) = T\{\mathcal{L}\{y(x,t)\}\}$:

$$
\bar{Y}(\beta_\mu, s) = \frac{s}{s^2 + c^2 \beta_\mu^2}\, \bar{y}_{i0}(\beta_\mu) + \frac{1}{s^2 + c^2 \beta_\mu^2}\, \bar{y}_{i1}(\beta_\mu)
\tag{24}
$$

This result is the desired transfer function model. It can also
be written in the form

$$
\bar{Y}(\beta_\mu, s) = \bar{G}_{i0}(\beta_\mu, s)\, \bar{y}_{i0}(\beta_\mu) + \bar{G}_{i1}(\beta_\mu, s)\, \bar{y}_{i1}(\beta_\mu)
\tag{25}
$$

with the transfer functions for the initial values $\bar{y}_{i0}(\beta_\mu)$ and
$\bar{y}_{i1}(\beta_\mu)$

$$
\bar{G}_{i0}(\beta_\mu, s) = \frac{s}{s^2 + c^2 \beta_\mu^2}, \tag{26}
$$

$$
\bar{G}_{i1}(\beta_\mu, s) = \frac{1}{s^2 + c^2 \beta_\mu^2}. \tag{27}
$$
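
The differentiation property of the finite sine transformation can be checked numerically. The sketch below is only an illustration under stated assumptions: the kernel amplitude is set to one, the test function $Y(x) = (x - x_0)(x_1 - x)$ is chosen because it vanishes at both ends, and the helper `sine_transform` with its Simpson quadrature is hypothetical, not part of the method itself.

```python
import math


def sine_transform(f, mu, x0, x1, n=2000):
    """Finite sine transform of f on [x0, x1] by the composite Simpson rule."""
    beta = mu * math.pi / (x1 - x0)       # discrete spatial frequency
    h = (x1 - x0) / n                     # n must be even for Simpson's rule
    total = 0.0
    for i in range(n + 1):
        x = x0 + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * f(x) * math.sin(beta * (x - x0))
    return total * h / 3.0


x0, x1 = 0.0, 1.0


def Y(x):
    """Test deflection that fulfills the fixed-end boundary conditions."""
    return (x - x0) * (x1 - x)


def Y_xx(x):
    """Second spatial derivative of the test deflection."""
    return -2.0


for mu in (1, 2, 3):
    beta = mu * math.pi / (x1 - x0)
    lhs = sine_transform(Y_xx, mu, x0, x1)             # transform of Y''
    rhs = -beta ** 2 * sine_transform(Y, mu, x0, x1)   # -beta^2 times transform of Y
    print(mu, lhs, rhs)
```

Both columns agree to within the quadrature error, which reflects that the boundary terms from the integration by parts vanish for a kernel that is zero at $x_0$ and $x_1$.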

These transfer functions describe the string in the same way
as the initial-boundary value problem from (14). However, in contrast to the original PDE
model, they provide a convenient
transition to a discrete-time, discrete-space model.
First, we note that the spatial frequency $\beta_\mu$ is a discrete
variable. Thus it is sufficient to discretize the time variable.
This is accomplished by any analog-to-discrete transformation, e.g. the impulse-, step-, or
ramp-invariant or the bilinear transformation. Since the initial values may be seen as the result
of an impulse function, the impulse-invariant transformation
provides optimal results. It turns the second order transfer
functions into

$$
\frac{s}{s^2 + c^2 \beta_\mu^2} \;\to\; \frac{z^2 - z \cos(\omega_\mu T)}{z^2 - 2z \cos(\omega_\mu T) + 1},
\qquad
\frac{1}{s^2 + c^2 \beta_\mu^2} \;\to\; \frac{z \sin(\omega_\mu T)/\omega_\mu}{z^2 - 2z \cos(\omega_\mu T) + 1}
$$

with $\omega_\mu = c\, \beta_\mu$. Then the discrete-time transfer function
model takes the form

$$
\bar{Y}^d(\beta_\mu, z) = \frac{z^2 - z \cos(\omega_\mu T)}{z^2 - 2z \cos(\omega_\mu T) + 1}\, \bar{y}_{i0}(\beta_\mu)
+ \frac{z \sin(\omega_\mu T)/\omega_\mu}{z^2 - 2z \cos(\omega_\mu T) + 1}\, \bar{y}_{i1}(\beta_\mu).
\tag{28}
$$

Inverse z-transformation finally gives, for each value of the
discrete spatial frequency index $\mu$, one difference equation
for the discrete time variable $k$

$$
\begin{aligned}
\bar{y}^d(\beta_\mu, k) ={} & 2 \cos(\omega_\mu T)\, \bar{y}^d(\beta_\mu, k-1) - \bar{y}^d(\beta_\mu, k-2) \\
& + \bar{y}_{i0}(\beta_\mu)\, \delta_0(k)
+ \left( \frac{\sin(\omega_\mu T)}{\omega_\mu}\, \bar{y}_{i1}(\beta_\mu) - \cos(\omega_\mu T)\, \bar{y}_{i0}(\beta_\mu) \right) \delta_0(k-1)
\end{aligned}
\tag{29}
$$

where $\delta_0(k)$ denotes the discrete-time impulse sequence.
The structure of this difference equation is shown in Fig. 13.
The spatial transforms of the initial value profiles act as inputs for the first time step
$k = 0$. The second order recursive
system computes the time history of the eigenfunction with
frequency $\omega_\mu$.

Figure 13. Second order difference equation

Since the most simple model, the lossless wave equation, has been assumed, the second order
system in Fig. 13
shows no decay and would ring forever. Furthermore, there
exists such a second order system for each value of $\mu$, and
the final output has to be recovered by the inverse spatial
transformation (21) from all partial results $\bar{y}^d(\beta_\mu, k)$ in the
audible frequency range. Fig. 14 shows this situation for the
more general string model which contains loss terms. Different from Fig. 13, each second order
system now exhibits
a decaying oscillation. The individual decay rate for each
frequency is determined by the coefficient $c_\mu$. The partial
results from each recursive system are weighted with the
values of the eigenfunctions $K(\beta_\mu, x_a)$ at a certain listening
position $x_a$ along the string. The final result $y^d(x_a, k)$ represents the sampled
oscillation of a string element at position $x_a$.

Figure 14. Parallel arrangement of recursive systems

Although based on many simplifications, this example
showed that the functional transformation method (FTM)
provides an exact and systematic way from the PDE description of a vibrating string to a
discrete model suitable
for computer implementation. The coefficients of the discrete model are expressed directly by
the parameters of the
physical model.
Extensions of the FTM in many directions have been
presented in [12, 11] and other literature cited there. The
main topics are briefly discussed:

• The PDEs in section 3.1 exhibit more complex differentiation operators in time and space
than the wave
equation presented above. Higher order differential
operators with respect to time simply introduce higher
order polynomials in $s$ into the transfer functions, resulting in recursive systems of higher
order.

• Higher order spatial operators require a careful construction of the spatial transformation
$T$. The suitable
theoretical framework for this task is the theory of special boundary-value problems of the
Sturm-Liouville type.

• A more realistic treatment of the fixing of a string or
other vibrating bodies requires considering boundary conditions of the second or third kind as
well. This is also
possible in the context of the Sturm-Liouville theory
mentioned above.

• So far, only problems with one spatial dimension have
been presented for simplicity. The extension to two or
three dimensions poses no fundamental difficulty. An
example for two spatial dimensions is given in [11].
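
The difference equation (29) and the parallel arrangement of Fig. 14 can be sketched in a few lines for the lossless string. The following is a minimal illustration, not the authors' implementation: the function names, the triangular initial deflection with zero initial velocity, the unit kernel amplitude (which gives the normalization factor $(x_1 - x_0)/2$ for $x_0 = 0$), and the parameter values are all assumptions.

```python
import math


def mode_response(omega, T, yi0, yi1, n_steps):
    """Second order recursion (29) for one spatial frequency."""
    c1 = 2.0 * math.cos(omega * T)
    # weights of the impulses at k = 0 and k = 1
    b0 = yi0
    b1 = math.sin(omega * T) / omega * yi1 - math.cos(omega * T) * yi0
    y1 = y2 = 0.0                       # y(k-1) and y(k-2), initially at rest
    out = []
    for k in range(n_steps):
        y = c1 * y1 - y2
        if k == 0:
            y += b0
        elif k == 1:
            y += b1
        out.append(y)
        y2, y1 = y1, y
    return out


def plucked_string(length, c, fs, x_pluck, x_listen, n_modes, n_steps):
    """Sum of parallel second order systems, observed at x_listen."""
    T = 1.0 / fs
    norm = length / 2.0                 # normalization for a unit-amplitude sine kernel
    y = [0.0] * n_steps
    for mu in range(1, n_modes + 1):
        beta = mu * math.pi / length    # discrete spatial frequency
        omega = c * beta
        if omega * T >= math.pi:        # keep modes below half the sampling rate
            break
        # sine transform of a triangle of height one with its peak at x_pluck
        yi0 = length * math.sin(beta * x_pluck) / (
            beta ** 2 * x_pluck * (length - x_pluck))
        weight = math.sin(beta * x_listen) / norm
        partial = mode_response(omega, T, yi0, 0.0, n_steps)
        for k in range(n_steps):
            y[k] += weight * partial[k]
    return y


# Example: string of 0.65 m with a fundamental of 110 Hz (c = 2 * 0.65 * 110)
samples = plucked_string(0.65, 143.0, 44100.0, 0.2, 0.3, 60, 200)
```

For a single mode the recursion reproduces the sampled continuous-time solution $\bar{y}_{i0} \cos(\omega_\mu k T) + \bar{y}_{i1} \sin(\omega_\mu k T)/\omega_\mu$, and at $k = 0$ the weighted sum approximates the initial deflection at the listening position.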

4. Applications to Computer Music


After a presentation of sound synthesis methods by physical modelling, we show some applications to computer music. The focus is on the functional transformation method
(FTM) because it is the most flexible and most accurate
physical modelling method.

4.1. Modelling of Musical Instruments

The parallel arrangement of recursive systems shown in
Fig. 14 is the core of a number of musical instrument models. They all share the same physical
model of a vibrating
string, but they differ in the kind of excitation. The
simplest one is a string plucked with a certain force profile.
The other models for a bowed string and for a string struck
by a hammer employ different nonlinear excitation models.
Finally, an FTM model of a drum is presented.

Plucked String. The simplest way to model a plucked
string is to choose the initial value $y_{i0}(x)$ according to Fig. 5
and to set $y_{i1}(x)$ to zero. A more advanced model uses a
certain time and space profile for the excitation by a suitable
force per length $f(x,t)$ as in (7). Including this excitation
model in the discrete-time system from Fig. 14 results in
an excitation of the inputs $a(\mu)$ while the inputs $b(\mu)$ remain
unaffected. This simple but versatile excitation model is
shown in Fig. 15.

Figure 15. FTM model of a plucked string

Bowed String. The action of the bow on a string is not
only time dependent but also depends on the velocity of the
string. It can be described as a nonlinear stick and slip action between bow and string. It can
be realized with the
feedback structure shown in Fig. 16. The input variable is
the bow velocity.

Figure 16. FTM model of a bowed string

Hammered String. To model a real hammer-string interaction, the dynamics of the hammer have to
be taken into
account. The hammer deflection can be modeled by one
second order recursive system. The input force for this recursive system is the negative input
force for the recursive
systems of the string. The hammer interacts nonlinearly with
the string because of the nonlinearity of the force-deflection
law of the hammer felt. The input variable here is the initial
hammer velocity $v_h$. The algorithm is shown in Fig. 17.
The nonlinear operation includes a delay for computability.

Figure 17. FTM model of a hammered string

Vibrations of a Drum. The extension of the above string
models to membranes leads to a spatial transformation with
two-dimensional eigenfunctions. A result from [11] shows
the vibrations of a circular drum excited with a drum stick
at different points. As is well known among drummers, an
excitation closer to the boundary produces a more interesting
sound than an excitation in the center.

Figure 18. Vibrations of a circular drum with
excitation in the center and close to the
boundary

4.2. Musical Instrument Morphing

So far, we have assumed that the physical properties of
the instrument models do not change with time. This is a
reasonable constraint for most real instruments. However,
for virtual instruments, the model parameters are also at the
disposal of the player. The FTM described above also permits
sound variations of the following kind: During operation of an instrument, its physical
parameters are slowly
changed from one set of parameters to another. As a consequence, the timbre of the instrument
changes gradually,
e.g. from a guitar string to a xylophone. Of course, rare
combinations of material parameters that cannot appear in real instruments are also possible.
This well-directed change
of the sound characteristic of a virtual instrument is called
instrument morphing. It requires close control over the
physical parameters of the model, as provided by the
FTM.

5. Conclusions
Digital sound synthesis is an emerging application for
multimedia processing. With ever increasing computing
power, real-time implementation of demanding physical
models has become feasible. The advantage of physical
modelling over conventional sound reproduction or synthesis methods lies in the combination of
highly flexible and at
the same time physically correct models. The high flexibility
allows the player of a virtual instrument to control all parameters of the model during
operation, while the physical
correctness ensures stable operation and meaningful results
with all parameter variations.
Future developments are expected in different directions. The complexity of the models for
strings, membranes, bells, tubes, and other objects will certainly increase.
Furthermore, the interactions between different kinds
of models for different components of an instrument have
to be established and implemented. Finally, the control of
the player over the virtual instrument will be extended by
new, human gesture based interfaces.

References

[1] A. Chaigne and A. Askenfelt. Numerical simulations of piano strings. I. A physical model for a struck string using
finite difference methods. J. Acoust. Soc. Am., 95(2):1112-1118, 1994.
[2] M. Goodwin and M. Vetterli. Time-frequency signal models
for music analysis, transformation, and synthesis. In Proc.
IEEE Int. Symp. on Time-Frequency and Time-Scale Analysis, pages 133-136, 1996.
[3] M. Kahrs and K. Brandenburg, editors. Application of Digital Signal Processing to Audio and Acoustics. Kluwer Academic Publishers, Boston, 1998.
[4] G. D. Poli, A. Piccialli, and C. Roads. Representation of
Musical Signals. MIT Press, Cambridge, Mass., 1991.
[5] C. Roads, S. Pope, A. Piccialli, and G. D. Poli, editors. Musical Signal Processing. Swets & Zeitlinger, Lisse, 1997.
[6] T. Rossing and N. Fletcher. Principles of Vibration and
Sound. Springer, New York, 1995.
[7] G. P. Scavone. Digital waveguide modeling of the non-linear
excitation of single reed woodwind instruments. In Proc. Int.
Computer Music Conference, 1995.
[8] E. D. Scheirer, R. Väänänen, and J. Huopaniemi. AudioBIFS: Describing audio scenes with the MPEG-4 multimedia
standard. IEEE Transactions on Multimedia, 1(3):237-250,
September 1999.
[9] J. O. Smith. Physical modeling using digital waveguides.
Computer Music Journal, 16(4):74-91, 1992.
[10] M. Tohyama, H. Suzuki, and Y. Ando. The Nature and Technology of Acoustic Space. Academic Press, London, 1995.
[11] L. Trautmann, S. Petrausch, and R. Rabenstein. Physical
modeling of drums by transfer function methods. In Proc.
Int. Conf. Acoustics, Speech, and Signal Proc. (ICASSP01).
IEEE, 2001.
[12] L. Trautmann and R. Rabenstein. Digital sound synthesis
based on transfer function models. In Proc. Workshop on
Applications of Signal Processing to Audio and Acoustics
(WASPAA). IEEE, 1999.
[13] R. Väänänen. Synthetic audio tools in MPEG-4 standard. In
Proc. 108th AES Convention. Audio Engineering Society,
February 2000. Preprint 5080.
[14] V. Välimäki, J. Huopaniemi, and M. Karjalainen. Physical
modeling of plucked string instruments with application to
real-time sound synthesis. Journal Audio Engineering Soc.,
44(5):331-353, 1996.
[15] V. Välimäki and T. Takala. Virtual musical instruments -
natural sound using physical models. Organised Sound,
1(2):75-86, 1996.
[16] B. L. Vercoe, W. G. Gardner, and E. D. Scheirer. Structured
audio: Creation, transmission, and rendering of parametric
sound representations. Proc. of the IEEE, 86(5):922-940,
1998.
[17] L. J. Ziomek. Fundamentals of Acoustic Field Theory and
Space-Time Signal Processing. CRC Press, Boca Raton,
1995.
[18] G. Zoia and C. Alberti. An audio virtual DSP for multimedia frameworks. In Proc. Int. Conf. Acoustics, Speech, and
Signal Proc. (ICASSP01), 2001.
