
AUDIO DSP FOR THE

BRAINDEAD

INTERNAL DEVELOPMENT VERSION 2000.4.30


Bleeding-edge version at:
http://www.student.oulu.fi/~oniemita/DSP/INDEX.HTM

I, as the author and copyright holder, allow you to do anything you wish with this
book free of charge, including copying, printing and republishing. In return, you
must preserve this notification and the book’s website URL on the title page.

Olli Niemitalo
Contents

About this book

1 Sampling basics
  1.1 What is sound?
  1.2 From air pressure to analog
  1.3 From analog to digital
  1.4 Quantization error
  1.5 Frequencies
  1.6 Angular frequency
  1.7 Frequency range and aliasing
  1.8 Nyquist, we have a problem!

2 Sinusoids
  2.1 Amplitude
  2.2 Phase
  2.3 Addition

3 Processing
  3.1 Mathematical model of sampling
  3.2 Discrete processing

Collection of filter formulae
  1 IIR filters
    1.1 Fastest and simplest “lowpass” ever!
  2 FIR filters

Internet references

Symbol chart

About this book

The purpose of the book is to be a tutorial for people who want to learn audio
digital signal processing, but find the academic books too cryptic and impractical.
Softsynth and audio software makers, game programmers, computer musicians
etc. could fall into this league. You must know how to program and have some
basic math knowledge, that’s all.
Printing should be done on both sides of the paper, preferably with a color printer.
If your printer is not capable of printing on both sides, first print odd pages on one
side of the paper, re-insert the papers (check you got the right order and position),
and print the even pages on the blank sides. Finally, check that there is no lonely
odd page left in the paper tray. If you can’t trust your printer, do the printing
chapter by chapter.
Don’t forget to check out the book’s website for the latest version!
I started this project because my older, similar text (DSPSTUFF.TXT) began to
seem a bit naive, and I wanted to rewrite the whole thing. ASCII art is not that
accurate, so I chose to use graphics, specifically vector graphics. The quest for the
right software led me to LaTeX (MiKTeX), Adobe products (Illustrator) and GnuPlot.
My motivation is sharing knowledge, and probably a tiny bit of that built-in desire
for 15 minutes of fame. For me, this book also works as an answer to all those
“How was it again...?” questions that hit me every now and then.
I’d like to thank Timo Tossavainen for teaching me stuff, and my big brother Kalle
Niemitalo for helping with the math and writing. Thanks to all the people I’ve
got feedback from. The coolest thing yet is that I have received free software,
documents and even job offers in return for my work! :-) Please don’t stop! It’s
great hearing this is of use. Also, I’d like to know if you have found errors or have
suggestions or questions - updating is easy, as this is published electronically.

Olli Niemitalo a.k.a. Yehar / Sublevel 3


Student in information technology at the University of Oulu
http://www.student.oulu.fi/~oniemita (English homepage)
http://www.sublevel3.org (Music from our label)
oniemitalo@sublevel3.org



***
1. Sampling basics

1.1 What is sound?

Sound is pressure changes traveling in the air, or in some other medium like
water. It can be caused by vibrating objects like guitar strings stirring the air, or
by air turbulence. A nuclear explosion does make a loud bang also.
An increase in air pressure practically means an increased number of air molecules
in a volume. Low pressure would mean a lack of air molecules. Whenever there’s a
thinner (local low pressure) spot in the air, surrounding air molecules are pushed
there to fill it, but as they move, they leave behind another thin layer, which is again
filled by surrounding molecules. And the sound travels. In air, at about 330 m/s.
Hey, it isn’t really that simple, but you don’t necessarily need to know more! It’s
the huge number of molecules that turns it all into statistics...

1.2 From air pressure to analog

A microphone converts the instantaneous pressure levels into instantaneous electric
voltage levels. If one farts into the microphone – don’t mind me not telling how
the sound is constructed – the voltage(time) plot could look (did look) something
like this:

[Figure: Analog, continuous fart sound. Voltage plotted against time.]

This voltage(time) signal is called an analog signal – referring to the fact that the
voltage is analogous to air pressure. In this form, the sound can be recorded for
example mechanically on a vinyl disk or magnetically on a tape, or after amplification
(voltage is scaled by multiplication), sent to speakers to convert the voltage changes
back to pressure changes, sound.

1.3 From analog to digital

The computer’s memory cannot store an infinite amount of data. The memory
is not continuous like the curve on a vinyl disk. Instead it is divided into a finite
number of memory slots, bits, and they have only two states, 0 and 1 – black or
white, no greys, could one say – this is called digital.
Therefore it is not possible to save the original sound in digital form in all its detail.
Luckily (in this context!) our hearing is limited and we cannot hear very
quiet sounds or very high frequencies, so the amount of information needed to
store an accurate sounding representation of a sound is finite, and can be easily
reached using today’s equipment.
The way it is done is called sampling. Here’s a sampled version of the fart sound:
[Figure: Digital, discrete fart sound. Amplitude plotted against time.]

The vertical axis is now titled amplitude, since we are no longer dealing with a
real quantity like voltage. Sometimes they say things like: “The amplitude of this
signal is 5 volts”. In that case, they are talking of the total zero-level to top height
of the waveform. Here, instead, we mean instantaneous values. Just try to grasp
the concepts, and you’ll be all right with the twisted terminology. You may even
become friends! (Hope not too good ones)
The sampled sound is not a continuous curve. Instead it is a set of peaks of
different amplitudes, spread in time at equal intervals [1] (meaning the time between
adjacent peaks is constant, same everywhere). This kind of a signal, where time
is quantized, is called a discrete signal. The amplitudes of the peaks are taken
from the instantaneous voltage levels of the original sound. Hence the name
sampling. Another name for a single peak is samplepoint, or sample for short.
Samplepoint is preferred, since sample could mean a longer piece of sound too.
To limit the amount of memory required to save the amplitude of a single
samplepoint, amplitude is also quantized, meaning it can only have values that are
multiples of a constant. The relation of quantized amplitude to unquantized
amplitude is a staircase function, from which the closest step is always taken:

[Figure: Amplitude quantization. Quantized amplitude plotted against unquantized amplitude as a staircase; the height of one stair is the quantization step.]

[1] With uniform sampling. You’d have to be real wicked to want to do non-uniform
sampling, where samplepoints are spaced unevenly in time.

At this point some would say we have a PCM [2] signal.
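
The staircase relation can be sketched in a few lines of Python. This is only an illustration: the function name quantize and the step value 0.25 are made up for the example, and Python’s built-in round() is assumed to be an acceptable way of picking the closest step.

```python
def quantize(amplitude, step):
    """Return the multiple of `step` that lies closest to `amplitude`."""
    # round() picks the nearest whole number of steps. Exact ties
    # (x.5 steps) go to the even neighbour in Python 3.
    return step * round(amplitude / step)


unquantized = [0.10, 0.30, -0.40, 0.99]
quantized = [quantize(x, 0.25) for x in unquantized]
print(quantized)
```

A smaller step gives a finer staircase, at the cost of more possible values to store.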


In the computer’s memory, the amplitudes of the samplepoints are saved in an
array. The most used data format is stereo 16-bit at 44100Hz, which gives you
a precision of $2^{16} = 65536$ different amplitude levels ($-32768 \ldots 32767$) on two
separate channels (left and right). 44100Hz is the frequency of the peaks,
samplepoints. That means, 44100 samplepoints are recorded during a single second.
This frequency is called the sampling frequency, sampling rate or samplerate.
Some other commonly used bit depths are 8-bit, 24-bit and frequencies 22050Hz,
48000Hz, 88200Hz, 96000Hz. Floating point formats are also possible.
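
The figures for the 16-bit stereo format are plain arithmetic, sketched below in Python. The helper names are invented for the example, and the signed range assumes the usual two's complement integer storage, which the text does not spell out.

```python
def amplitude_levels(bits):
    """Number of different amplitude levels available with `bits` bits."""
    return 2 ** bits


def signed_range(bits):
    """Smallest and largest value, assuming two's complement integers."""
    half = 2 ** (bits - 1)
    return -half, half - 1


def bytes_per_second(samplerate, bits, channels):
    """Raw data rate of uncompressed samplepoints."""
    return samplerate * channels * bits // 8


print(amplitude_levels(16))             # 65536
print(signed_range(16))                 # (-32768, 32767)
print(bytes_per_second(44100, 16, 2))   # 176400
```

So one second of CD-style sound takes 176400 bytes before any compression.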
In the rest of this book, we will use signal(time) notation for continuous signals
and signal[time] for their sampled representations.

1.4 Quantization error

It would seem logical that the better the amplitude precision, the better the quality.
Right! Adding one more bit to the bit depth doubles the number of available amplitude
levels, and drops the quantization error to half. Quantization error is the unwanted
addition to your signal due to quantization, and it can be calculated through:

Quantization error = Quantized signal - Original signal

This is a common procedure. To extract the error from a spoiled signal, for closer
investigation, you subtract the original from it.
Let’s try the formula visually and see what we get from quantizing a sinusoid [3],
one of the most basic waveforms:

[Figure: Extraction of quantization error. Original is subtracted from quantized. Three panels: Quantized - Original = Error.]

[2] Stands for Pulse Code Modulation (mumble jumble ;)... Modulation means that the
signal is transformed into another signal representing the original. Pulse refers
to that the discrete signal is a series of narrow peaks, pulses. Coded is a synonym
for digital.
[3] Analogous to humanoid, human-like :-)

We can immediately see that the amplitude of the quantization error is strictly limited
to a range. The top limit of the range is equal to half the quantization step.
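
The half-step limit is easy to check numerically. The sketch below is a toy under stated assumptions: amplitudes are taken to lie between -1 and +1, the step is chosen as 2 / 2^bits, and the test sinusoid's angular step of 0.1 is arbitrary. None of these choices come from the text.

```python
import math


def quantize(amplitude, step):
    """Return the multiple of `step` that lies closest to `amplitude`."""
    return step * round(amplitude / step)


def largest_error(bits, count=1000):
    """Return (step, largest absolute quantization error) for a test sinusoid."""
    step = 2.0 / 2 ** bits              # assumed full range of -1..+1
    worst = 0.0
    for n in range(count):
        original = math.cos(0.1 * n)
        error = quantize(original, step) - original   # quantized - original
        worst = max(worst, abs(error))
    return step, worst


for bits in (4, 5, 8):
    step, worst = largest_error(bits)
    print(bits, step / 2, worst)        # worst stays at or below step / 2
```

Each extra bit halves the step, and with it the limit that the error has to stay under.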

1.5 Frequencies

By frequencies, we mean sinusoids, such as $\sin(\text{time})$, present in sound. We’ll
use the following abbreviations:

$f_s$ = Sampling frequency
$f$ = Frequency of the sinusoid
$t$ = Time
$\alpha_0$ = Initial phase, alpha_0
$A$ = Amplitude

Sinusoidal frequencies are of the general form:

$$\text{Amplitude} \cdot \cos(2\pi \cdot \text{Frequency} \cdot \text{time} + \text{Initial phase})$$

And the same using the abbreviations:

$$A \cos(2\pi f t + \alpha_0)$$

Amplitude defines the height of the sinusoid, measured from the zero level to the
top. Initial phase defines the phase of the sinusoid at $t = 0$, a cosine being the
result from $\alpha_0 = 0$. Commonly, the time unit is seconds, the frequency unit is Hz
and no unit is used with the amplitude.
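
As a rough sketch, the general form can be turned into samplepoints by evaluating it at the times n / f_s, where n counts the samplepoints. The function and parameter names are made up for the example, and the 11025Hz test tone at 44100Hz is just a convenient choice.

```python
import math


def sample_sinusoid(amplitude, frequency, initial_phase, sampling_frequency, count):
    """Evaluate A*cos(2*pi*f*t + alpha_0) at `count` equally spaced times."""
    samples = []
    for n in range(count):
        t = n / sampling_frequency      # time of samplepoint n, in seconds
        samples.append(amplitude * math.cos(2 * math.pi * frequency * t + initial_phase))
    return samples


# 11025Hz at 44100Hz: the values go roughly 1, 0, -1, 0, 1, ...
for value in sample_sinusoid(1.0, 11025.0, 0.0, 44100.0, 8):
    print(round(value, 6))
```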

1.6 Angular frequency

A more convenient way to express the frequency in a discrete signal is angular
frequency, which we denote by $\omega$ (omega, a Greek letter looking quite like double-u).
We also introduce a new letter to express discrete time, $T$ (upper case!). These
new variables are related to $f$ and $t$ by:

$$\omega = \frac{2\pi f}{f_s}, \qquad T = f_s t$$

There are no units for these quantities, and the general sinusoid formula is
simplified into:

$$A \cos(\omega T + \alpha_0)$$

The convenience gained is not only the simplified formula, but also that now $T$
can be used as samplepoint number and the frequency is expressed as parts
of the sampling frequency, kicking it out of the calculations. For example, if we
are assigned to create a sampled sinusoid of some angular frequency, we don’t
need to know the sampling frequency to be able to start typing in the samplepoint
values.
Here are some possibilities for $\omega$ and the corresponding real freqs:

$\omega = 0 \Leftrightarrow f = 0\,\mathrm{Hz}$
$\omega = \pi/2 \Leftrightarrow f = f_s/4$
$\omega = \pi \Leftrightarrow f = f_s/2$
$\omega = 2\pi \Leftrightarrow f = f_s$
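
The conversion to angular frequency, and the way the sampling frequency then drops out, can be sketched like this. The function names are invented for the example; the 44100Hz sampling frequency is only used to produce a test value of the angular frequency.

```python
import math


def angular_frequency(frequency, sampling_frequency):
    """omega = 2*pi*f / f_s, in radians per samplepoint."""
    return 2 * math.pi * frequency / sampling_frequency


def cosine_samples(omega, count, amplitude=1.0, initial_phase=0.0):
    """A*cos(omega*T + alpha_0) for T = 0, 1, 2, ... No f_s needed here."""
    return [amplitude * math.cos(omega * T + initial_phase) for T in range(count)]


omega = angular_frequency(11025.0, 44100.0)      # a quarter of f_s, so pi/2
print(omega, math.pi / 2)
print([round(x, 6) for x in cosine_samples(omega, 5)])
```

Once the angular frequency is known, the second function never asks what the sampling frequency was.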

Let’s visualize a sinusoid (cosine) with the following constants:

Amplitude $A = 1$
Angular frequency $\omega = \pi/2$
Initial phase $\alpha_0 = 0$

[Figure: Example sinusoid $\cos(\frac{\pi}{2} T)$, $A = 1$, $\omega = \pi/2$, $\alpha_0 = 0$. Amplitude (-1.0 to +1.0) plotted against time (sample number, 0 to 8).]

The markers are the samplepoints. Since the angular frequency is $\pi/2 = 90^\circ$, a
quarter of the sampling frequency, the sinusoid goes a full cycle every 4 samples
– $90^\circ$ is the fourth of a full circle ($360^\circ = 2\pi$). Starting to understand angular
frequency? You can also consider $\omega$ as a phase increase that is added to the
phase of the sampled sinusoid at every sampling step.
A good visualization aid is a marker going counterclockwise [4] around a unit circle
(radius 1, centered at the origin). The circumference (the length of the circle
straightened) of a unit circle is $2\pi$. At every sampling step the arc traveled by the
marker is of length $\omega$, and so is the angle the marker rotates around the origin. If we
are creating a sampled cosine wave, taking samples of the horizontal coordinate of the
marker and starting from coordinates (1,0) at time $T = 0$ does the job:

[Figure: The example sinusoid $\cos(\frac{\pi}{2} T)$ with unit circle illustration, $A = 1$, $\omega = \pi/2$, $\alpha_0 = 0$. Amplitude (-1.0 to +1.0) plotted against time (sample number, 0 to 8).]
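
The marker on the unit circle can be imitated with a complex number that is rotated by the angular frequency at every sampling step. Using complex multiplication for the rotation is a choice made for this sketch, not something the text prescribes, and the function name is made up.

```python
import cmath
import math


def cosine_by_rotation(omega, count):
    """Sample the horizontal coordinate of a marker circling the unit circle."""
    marker = complex(1.0, 0.0)          # start at coordinates (1, 0), T = 0
    step = cmath.exp(1j * omega)        # counterclockwise turn of omega radians
    samples = []
    for _ in range(count):
        samples.append(marker.real)     # horizontal coordinate is the sample
        marker *= step                  # advance the phase by omega
    return samples


rotated = cosine_by_rotation(math.pi / 2, 9)
direct = [math.cos(math.pi / 2 * T) for T in range(9)]
print(max(abs(a - b) for a, b in zip(rotated, direct)))   # tiny rounding noise
```

The vertical coordinate of the same marker (marker.imag) would give the matching sine wave.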

1.7 Frequency range and aliasing

Increasing the sampling frequency allows representing higher frequencies
in the sampled sound. This is specified by the Nyquist criterion: “A sampled
representation of a signal is exact if the highest frequency in the signal is less than
half the sampling frequency”, or a version even further tailored for our purposes:
“Only frequencies smaller than half the sampling frequency can be represented.”
So if you use 44100Hz (CD quality) as sampling frequency, the highest frequency
you can have is 22050Hz, consequently called the Nyquist frequency, $f_N$.
Expressed in terms of angular frequency, Nyquist frequency is always $\pi$, since it is
half the sampling frequency, $2\pi$.
A cheesy proof follows. You need to store at least two samplepoints per wave
cycle, the top and the bottom, to be able to represent a sinusoid:

[Figure: Nyquist frequency sinusoid $\cos(\pi T)$, $\omega = \pi$, the maximum freq! Amplitude (-1.0 to +1.0) plotted against time (sample number, 0 to 8).]

[4] Yet another investment for the future. An attempt to adapt beforehand to commonly
accepted conventions.

If we have a higher frequency than this Nyquist frequency, and use the same
sampling freq, shit will happen. Here we have a frequency that is 4/3 of the
Nyquist frequency:

[Figure: Example sinusoid $\cos(\frac{4}{3}\pi T)$, $\omega = \frac{4}{3}\pi$, too high a freq! Amplitude (-1.0 to +1.0) plotted against time (sample number, 0 to 8).]

Now we wipe out the continuous waveform and store only the discrete samples:

[Figure: Discrete samples of the example sinusoid. Amplitude (-1.0 to +1.0) plotted against time (sample number, 0 to 8).]

Based on this data, it is impossible to retrieve the original sinusoid, because
there’s another, lower frequency sinusoid, that has the same discrete representation,
and at resynthesis it is brought up instead of the original higher-than-Nyquist
frequency sinusoid. Here you see it happen:

[Figure: Reconstructed, aliased sinusoid $\cos(\frac{2}{3}\pi T)$, $\omega = \frac{2}{3}\pi$. Amplitude (-1.0 to +1.0) plotted against time (sample number, 0 to 8). (Notice the identical positions of the markers)]

This process of higher-than-Nyquist frequencies transforming into lower frequencies
is called aliasing. Reconstruction only produces Nyquist-range frequencies,
and due to sampling, the out-of-range frequencies cannot be discriminated from
the corresponding in-range frequencies having identical discrete representations.
You should note that sinusoids stay as sinusoids through aliasing, even though
changes in frequency take place.
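
That the too-high sinusoid and its lower-frequency alias really share one discrete representation can be checked with a few lines of Python. The two angular frequencies are the ones from the example; the variable names and the nine-sample length are arbitrary choices for the sketch.

```python
import math

COUNT = 9   # samplepoints 0..8, as in the figures

too_high = [math.cos(4 / 3 * math.pi * T) for T in range(COUNT)]
aliased = [math.cos(2 / 3 * math.pi * T) for T in range(COUNT)]

# The two lists agree except for floating point rounding noise.
max_difference = max(abs(a - b) for a, b in zip(too_high, aliased))
print(max_difference)
```

Nothing in the stored samples tells the two sinusoids apart, so the reconstruction has no way of choosing the higher one.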
A graph will give a general rule to aliasing. On the horizontal axis we have the
original unaliased frequency, and on the vertical the (possibly) aliased one:

[Figure: Dependence of aliased frequency on unaliased frequency. Horizontal axis: unaliased frequency, 0Hz to 2.5 fs. Vertical axis: aliased freq, 0Hz to 0.5 fs.]

Now this explains why it is called aliasing! First, as we increase the frequency
above $f_N$, it bounces off and aliases over the already used range, and when
increased more, it bounces off the 0Hz. And so on, infinitely.
The amplitude of a sinusoid is preserved in aliasing. What happens to the phase
is usually unimportant, and will not be discussed here.
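
The bouncing rule can be written as a small function. This is one possible way to express it, assuming non-negative frequencies; the function name is made up and, like the text, the sketch ignores what happens to the phase.

```python
def aliased_frequency(frequency, sampling_frequency):
    """Frequency that comes out of sampling, for frequency >= 0."""
    # The pattern repeats every sampling_frequency, so first wrap the
    # input into the range 0..sampling_frequency.
    wrapped = frequency % sampling_frequency
    # The upper half of that range mirrors back down towards 0Hz.
    return min(wrapped, sampling_frequency - wrapped)


for f in (10000, 22050, 29400, 44100, 54100):
    print(f, aliased_frequency(f, 44100))
```

With a 44100Hz sampling frequency, 29400Hz (two thirds of it) comes out as 14700Hz (one third of it).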
In our example, the unaliased frequency is in the range $0.5 f_s$ – $1.0 f_s$, so by reading
from the graph, we can write a mathematical equation for the aliasing relation
(applicable in this specific frequency range only):

Aliased frequency = Sampling frequency - Unaliased frequency

A quick review on the symbols:

$f$ = Unaliased, original frequency
$f_a$ = Aliased frequency
$f_s$ = Sampling frequency
$f_N$ = Nyquist critical frequency = $f_s / 2$

We just declared:

$$f_a = f_s - f$$

And we had $\frac{4}{3} f_N = \frac{2}{3} f_s$ as the unaliased frequency $f$, so in our example:

$$f_a = f_s - \frac{2}{3} f_s = \frac{1}{3} f_s$$

... Meaning that the aliased frequency is a third of the sampling frequency, so it
goes one full cycle every three samplepoints, as you correctly see happening a
couple of figures back.
Aliasing is one of the most annoying artifacts of sampling. The first situation where
it is encountered is when doing the analog-to-digital conversion. If there are
above-$f_N$ frequencies in the analog signal, they will be aliased into unpleasant distortion
(that’s what any unwanted frequencies are called) in the digital signal. This is why
there has to be a hardware analog filter to remove above-Nyquist frequencies
before the ADC (Analog to Digital Converter).
Some non-audio designs use the property of aliasing in transforming higher
frequencies to the Nyquist range, where they can be analyzed digitally.

1.8 Nyquist, we have a problem!

Perfectly accurate reconstruction is theoretically possible if the Nyquist criterion is
satisfied. Still, it is very common to hear claims, mostly from audiophiles, that
having double the highest existing frequency as sampling frequency is not enough.
They mostly go like this: “If you sample a sine that is of half the sampling
frequency, you sample the zero crossings. That gives you nothing but silence!” Yes,
that’s true, as you can see for yourself:

[Figure: Sampled $\sin(\pi T)$ equals silence! Amplitude (-1.0 to +1.0) plotted against time (sample number, 0 to 8).]

The Nyquist frequency could be thought of as a special case, where the phase
information of the sinusoid is lost, as it is always reconstructed as $A \cos(\pi T)$.
Here’s an example showing how the phase disappears:

[Figure: The phase information of a Nyquist frequency sinusoid is lost in sampling along with some of its amplitude. Shown are the original, a phase-shifted $\cos(\pi T + \alpha_0)$, and the reconstructed, a scaled $\cos(\pi T)$. Amplitude (-1.0 to +1.0) plotted against time (sample number, 0 to 8).]

The amplitude of the reconstructed cosine is not that of the original sinusoid. It is,
as easily interpretable from the figure, the same as the value of the original sinusoid
at $T = 0$. In short, Nyquist frequency sinusoids are attenuated or even muted
depending on their initial phases.
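
Both effects, the silence and the lost phase, can be reproduced numerically. The initial phase of pi/3 below is an arbitrary example value, not the one used in the figure, and the variable names are made up.

```python
import math

COUNT = 8

# sin(pi*T) hits a zero crossing at every samplepoint (up to rounding noise).
silence = [math.sin(math.pi * T) for T in range(COUNT)]

initial_phase = math.pi / 3      # arbitrary example value
original = [math.cos(math.pi * T + initial_phase) for T in range(COUNT)]

# A plain cosine scaled by the original's value at T = 0 has the same samples.
scale = math.cos(initial_phase)
reconstructed = [scale * math.cos(math.pi * T) for T in range(COUNT)]

print(max(abs(s) for s in silence))
print(max(abs(a - b) for a, b in zip(original, reconstructed)))
```

With this phase the scale is 0.5, so half of the amplitude is gone. With a phase of pi/2 the scale is zero, which is the silent case again.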
The good thing is that this special problem is limited to Nyquist frequency only. A
frequency a tiny bit less does not have the problem. Therefore, the audiophile’s
intuitive argument loses its point: sampling at 40001Hz is enough for representing
any 20000Hz sinusoid.
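
A quick numerical check shows that a 20000Hz sine sampled at 40001Hz is not stuck on its zero crossings. The sketch only looks at the raw samplepoints over one second and does no reconstruction, so it illustrates the claim rather than proving it; the variable names are made up.

```python
import math

sampling_frequency = 40001
frequency = 20000

# One second of samplepoints of a unit-amplitude sine.
samples = [
    math.sin(2 * math.pi * frequency * n / sampling_frequency)
    for n in range(sampling_frequency)
]

peak = max(abs(s) for s in samples)
print(peak)     # very close to 1, so the samples are far from silent
```

The sampling instants slowly slide along the waveform, so sooner or later they land near the tops and bottoms as well as near the zero crossings.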
Perfect reconstruction is an extremely heavy process, and practical reconstructors
are far from perfect. Still, there is no similar phase-selective attenuation below the
Nyquist frequency. Other kinds of problems, mostly aliasing-related, exist.