intermusic.com

Robovox

DIY techniques for vocoders, talkboxes and more...

Along with the renewed interest in all things analogue, the 90s have also seen the revival of another classic 70s
sound: synthesized vocals. The last few years have seen vocoders, talkboxes and synth vocals back in fashion
with everyone from Adam F and Air to Beck and the Beastie Boys.
In this article I'll be looking at a few DIY techniques for adding synthetic vocal noises to your music, particularly the
use of vocal filters and Kraftwerk-style phoneme sequencing. You'll need a sampler, a fairly comprehensive EQ
and no real social life to speak of.
Freq speak
'Vocal' or 'formant' filters are sets of resonant band-pass filters that produce vocal tones by emphasising the same
frequencies that the human voice uses to produce different vowels. By moving our mouths and vocal cords we
cause various frequencies to resonate and it is these sustainable resonances that form the basic vowel sounds.
The ability of complex synth filters to mimic these sounds has been understood by designers for some time but
finding the right way to control them is still proving rather tricky. E-mu is able to make its Z-plane filters enunciate
sounds, but the problem of providing an intuitive method of control means the facility has yet to appear on a
production instrument.
Similarly, Korg's modelling super-synth, the OASYS, features a 'Glottal' model but, unfortunately, it now seems the
OASYS is destined just to be a development tool. Simplified vowel-only versions of both have appeared, with
E-mu providing vocal filter types in its EOS samplers, Morpheus module and Orbit range, and Korg including the
'Talking Modulator' algorithm in both the Trinity and the Z1's effects sections.
The Ensoniq DP-Pro, Waveboy Voder, TC Electronic FireworX and the Yamaha EX5 also feature vowel-sound
effect algorithms but what do you do if you don't own any of this gear? Well, as long as you can put your hands on
a three- or four-band fully parametric EQ you're laughing. While this statement would have made a good joke a
few years ago, the proliferation of dirt cheap digital desks and programs like Cubase VST and Logic Audio means
that most people now have access to this sort of EQ.
Tone control
By boosting two or three narrow frequencies it is possible to make your own vocal filters. Start by boosting a
narrow frequency by about 15dB and sweeping it around the lower range until you get the right sort of tone for the
vowel you're after. Try varying the width of the EQ and the amount of gain to see if it improves the sounds. Now
use another band and do the same with the higher range. You may need to add a third boost, either high or low,
to define the sound.
If you have a variable low-pass filter available you can roll off everything above 4kHz as well, as it will not
contribute anything to the vowel sound. I found that the following settings worked quite well on an 02R for the 'ay'
sound: -15dB at 630Hz with a narrow width; +10dB at 2.52kHz with a medium width; +15dB at 2.99kHz with a
medium width; and a low-pass filter at 3.17kHz.
The 'oo' sound worked with +15dB at 198Hz with a narrow width; +10dB at 334Hz, narrow width; +10dB at 707Hz,
narrow width; and a low-pass filter again at 3.17kHz.
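If you would rather experiment in code than on a desk, the boosts described above can be sketched as follows. This is a minimal illustration, not a description of what the 02R does internally: it uses the widely published peaking-EQ biquad from Robert Bristow-Johnson's Audio EQ Cookbook with the 'oo' boosts quoted above. The 44.1kHz sample rate, the Q of 8 standing in for 'narrow width' and all the function names are assumptions.

```python
import cmath
import math

SAMPLE_RATE = 44100.0  # assumed; the article does not state a sample rate
NARROW_Q = 8.0         # assumed stand-in for "narrow width"


def peaking_coeffs(freq_hz, gain_db, q, fs=SAMPLE_RATE):
    """Peaking-EQ biquad coefficients (b, a), after the RBJ Audio EQ Cookbook."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * freq_hz / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp)
    a = (1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp)
    return b, a


def band_gain_db(coeffs, freq_hz, fs=SAMPLE_RATE):
    """Gain in dB that one band applies at freq_hz."""
    b, a = coeffs
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / fs)  # z^-1 on the unit circle
    z2 = z1 * z1
    h = (b[0] + b[1] * z1 + b[2] * z2) / (a[0] + a[1] * z1 + a[2] * z2)
    return 20.0 * math.log10(abs(h))


def chain_gain_db(filters, freq_hz):
    """Total gain in dB of several bands in series (their dB gains add)."""
    return sum(band_gain_db(c, freq_hz) for c in filters)


def process(samples, coeffs):
    """Run a list of float samples through one band (direct form I)."""
    b, a = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = (b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2) / a[0]
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


# The 'oo' boosts quoted above: (centre frequency in Hz, gain in dB).
OO_BANDS = [(198.0, 15.0), (334.0, 10.0), (707.0, 10.0)]
OO_FILTERS = [peaking_coeffs(f, g, NARROW_Q) for f, g in OO_BANDS]
```

To filter audio, pass the samples through each entry of OO_FILTERS in turn with process(). The low-pass at 3.17kHz is left out here to keep the sketch short, and chain_gain_db() is only there so you can check the shape of the curve before you listen to it.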
Passing drum loops through these EQ settings gives them a strange lo-fi sound (as you'll hear on track 26) while
simple synth sounds take on a vocal quality. Try multisampling the same synth sound through different EQ
settings then stack the samples in velocity switched layers.
By playing parts into your sequencer and then editing the velocity of each note to trigger the desired 'vowel' you
can achieve a sort of talkbox effect, like the example on track 27 which uses a sawtooth wave. Some settings
work better than others and the sort of sound you are working on will also have an effect on the quality of the
results.
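The velocity editing can be worked out on paper first. The sketch below assumes five vowel layers split evenly across MIDI velocities 1 to 127; the layer names, their order and the function names are all hypothetical, so substitute whatever your own sampler program uses.

```python
# Hypothetical layer order, quietest to loudest; use whichever vowels you sampled.
VOWEL_LAYERS = ["oo", "oh", "ah", "ay", "ee"]


def layer_bounds(n_layers):
    """Split MIDI velocities 1-127 into n_layers contiguous (low, high) ranges."""
    return [(1 + (127 * i) // n_layers, (127 * (i + 1)) // n_layers)
            for i in range(n_layers)]


def velocity_for_vowel(vowel, layers=VOWEL_LAYERS):
    """Velocity in the middle of the range that triggers the given vowel layer."""
    low, high = layer_bounds(len(layers))[layers.index(vowel)]
    return (low + high) // 2


def vowel_for_velocity(velocity, layers=VOWEL_LAYERS):
    """Vowel layer that a given MIDI velocity (1-127) would trigger."""
    for vowel, (low, high) in zip(layers, layer_bounds(len(layers))):
        if low <= velocity <= high:
            return vowel
    raise ValueError("velocity must be between 1 and 127")


def set_vowels(notes, vowels):
    """Return (pitch, velocity) notes with each velocity rewritten to pick a vowel."""
    return [(pitch, velocity_for_vowel(v)) for (pitch, _), v in zip(notes, vowels)]
```

For example, set_vowels([(48, 100), (48, 100)], ["ee", "oo"]) rewrites two identical notes so that the first triggers the 'ee' layer and the second the 'oo' layer. Aiming for the middle of each range leaves some headroom if you later nudge velocities by hand.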
Bits n' speeches
Now we're going to take a look at phonemic sequencing, as pioneered by Kraftwerk. The most famous Kraftwerk
robot voices were made with EMS vocoders but another of their methods was to string together words from a
sampled bank of phonetic vowels and consonants. The result is a half-robot, half-human voice, rather like Tony
Blair's.
The first thing to do is to get your microphone out and record some phonemes. There is still academic debate over
the exact number of English language phonemes, but for our purposes, it's possible to work with the 41 listed
below.
The list is in alphabetical order, with the letters in bold type denoting the actual phonemes (so don't say the whole
word, just the bold bit): say, cat, wah, air; back; chuck; dim; glee, red; face; good; happy; bid, tried; jury; kid; late;
man; no, bang; toe, nod, cord, look, fool, now, toy; pen; run; sit, shout, measure; ten, that, thin; cut, curt; vine;
wall; yes; zip. [Phew - Ed]
Keep it short
Before you start recording, a few tips. It's very easy to add an 'uh' to the end of consonants when you say them.
Remember you only want the sound of the consonant itself. Of course, it's possible to trim the 'uh' sound off later
but, as always, try and get the best possible source material into the sampler. When recording sibilant or plosive
sounds (like s, f, b and p) turn your head slightly away from the mic and use a pop shield if you can, otherwise
you'll get unusable amounts of breath on your sample.
The boldened phonemes in the list are sustainable sounds and, if possible, you should try to make loops out of
these sounds. This requires a level pitch and volume so record several seconds of each sound as the first couple
of seconds are usually fairly wobbly. Try and hold a constant pitch from one phoneme to the next, to keep that
Kraftwerk-style monotone. The loops need not be very long and if you're after a very robotic sound, the shorter the
better.
You'll find that some sounds are harder than others to sustain ('ie' can be tricky, while both versions of the 'oo'
sound are comparatively easy) so don't worry about getting all of them. Last of all, and most importantly, under no
circumstances record these sounds where anybody else can hear you. That is, unless you're the sort of person
who isn't embarrassed about standing in the middle of a studio with a duvet over your head (for those of us
without a vocal booth), making a sound like a Gregorian monk auditioning for the Wurzels. For those of a more
self-conscious disposition, before you utter a sound, make sure everybody you live with is either out of the house,
deaf or dead. Unfortunately, there's nothing to stop someone coming home in the middle of your session and, as
deafness is sometimes curable, it's safest to kill everyone.
Voices in my head
Of course, there's nothing to stop you using someone else's voice, particularly someone off the television or radio.
It's likely to take much longer to get a full set of phonemes, and they may well be mismatched, but the results can
be interesting to say the least. Technically, it's an infringement of somebody's copyright of course, so I'll leave that
to you, the MCPS and your conscience to sort out. As a final alternative, I've included the set I recorded myself
(track 28 on the CD) and you can use these as either a reference or a shortcut.
The sustained sounds are very short, but should form perfect loops with a little trial and error. If you're worried
about letting my voice loose inside your sampler in case it starts talking a load of old bollocks or something, then
rest easy. I heavily processed my voice so that it sounds fairly unrecognisable, something you might want to try
yourself.
I added a fast, wide chorus and just a hint of distortion, before formant shifting the whole thing (more on formant
shifting later). Once you've got the sounds into the sampler, cut them to size and spread them across the
keyboard. If your sampler has a monophonic mode, you should use it to prevent two phonemes sounding at once.
Speak to me
Now, at long last, it's time to string a phrase together and what could be better than the two most inspiring words
in the dictionary: 'future' and 'music'? First, work out which phonemes make up the words you're trying to
sequence, in this case: f y oo ch ur and m y oo z i k. Try and make each phoneme follow directly after the one that
precedes it to avoid gaps in the word and use your own discretion to decide how long to hold the sustainable
sounds. Generally, the shorter you hold the sound, the more fluent the speech will sound.
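The butt-joining of phonemes can be sketched as a list of note events. Everything here is an assumption made for illustration: the key map starting at MIDI note 36, the tick lengths, and the choice of which phonemes count as sustained. The only thing taken from the text above is the phoneme spelling of the two words and the rule that each phoneme starts exactly where the previous one stops.

```python
# Hypothetical key map: one phoneme sample per key, starting at MIDI note 36.
PHONEMES = ["f", "y", "oo", "ch", "ur", "m", "z", "i", "k"]
KEY_MAP = {name: 36 + i for i, name in enumerate(PHONEMES)}

# Assumption: these are the looped, sustainable samples in your own set.
SUSTAINED = {"oo", "ur", "i"}


def sequence_word(phonemes, start=0, short_ticks=60, long_ticks=120):
    """Return (start_tick, length_ticks, midi_note) events with no gaps between them."""
    events = []
    position = start
    for name in phonemes:
        length = long_ticks if name in SUSTAINED else short_ticks
        events.append((position, length, KEY_MAP[name]))
        position += length  # the next phoneme begins exactly where this one ends
    return events


FUTURE = sequence_word(["f", "y", "oo", "ch", "ur"])
MUSIC = sequence_word(["m", "y", "oo", "z", "i", "k"], start=480)
```

Shortening long_ticks is the quickest way to try the advice about holding sustained sounds for less time; the events still join up because every start position is derived from the previous length.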

If the varying volume of the phonemes makes the word sound jumpy, use a compressor with fast attack and
release settings to help smooth it out. This whole process can take some time but persevere and you'll get there.
You can hear my attempt on track 29 and I dare say you can do better. Once you're happy with the phrase, try
varying the sequencer tempo as you may find it actually sounds better played slightly faster or slower.
If you get tired of hearing everything in a flat monotone, you can use pitchbend to add a tune to your phrases and
if you want to add rhythm then try quantising some of the phonemes, as on track 30. As you can hear, as long as
you're prepared to put the time in, you can achieve all sorts of results. The only thing I couldn't do was stop it
making everything sound German.
Speak easy
If all this phonetic sequencing sounds too much like hard work you could try using a computerised text-to-speech
program to do the tricky stuff for you. Apple's SimpleText program includes a facility to make your computer
speak any section of text, in any of a variety of different voices. Although PCs don't come with built-in text-to-speech
facilities, similar programs can be found on the Internet. Even better news for Mac owners is a program called
Vocal
All sorts of palm-top translators have been available for a few years now and what was state-of-the-art a while ago
is now turning up for £50 or so in Loot. If you're really lucky, a quick rummage through your (or someone else's)
attic or local junk shop might turn up an old Texas Instruments Speak & Spell machine, another Kraftwerk
favourite.
Alternatively, forget the phonemes and just try sampling whole words off the telly then rearranging them into
phrases. Back in the early 90s, an outfit named Fortran 5 pieced together a whole song using snippets of Sid
James' voice lifted from old Carry On films and episodes of Hancock's Half Hour. Quite why they did this remains
a mystery but they weren't going to let a little thing like not having a reason stop them, so nor should you.
Speak up (or down)
Before we wrap up, a couple of words about pitchshifters. Aside from their obvious use as harmonisers,
pitchshifters can also add a synthetic slurring sound to vocals, as on Daft Punk's WDPK 83.7 FM. Most pitchshifters
have a finetune setting rated in hundredths of a semitone so set this to about -20 and then set the feedback
parameter fairly high, around 70%. If the pitchshifter features an integral delay line you could set it to a few
milliseconds to slightly emphasise the slurring.
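One way to picture why these settings slur is to follow the signal round the feedback loop. The sketch below assumes a simple model in which every trip through the loop is shifted by a further -20 cents and comes back at 70% of its previous level; real units may well behave differently, and the function names and the -60dB cut-off are illustrative choices only.

```python
import math


def cents_to_ratio(cents):
    """Frequency ratio for a shift in cents (100 cents = 1 semitone)."""
    return 2.0 ** (cents / 1200.0)


def feedback_passes(detune_cents=-20.0, feedback=0.7, floor_db=-60.0):
    """List (total_cents, level_db) per trip round the loop until below floor_db.

    Assumed model: each trip adds detune_cents and scales the level by feedback.
    """
    if not 0.0 < feedback < 1.0:
        raise ValueError("feedback must be between 0 and 1 (exclusive)")
    db_per_pass = 20.0 * math.log10(feedback)
    passes = []
    n = 1
    while (n - 1) * db_per_pass >= floor_db:
        passes.append((n * detune_cents, (n - 1) * db_per_pass))
        n += 1
    return passes
```

Under this model the first shifted copy sits 20 cents flat at full level, and each later copy is flatter and roughly 3dB quieter than the one before, which is one plausible reading of the downward smear you hear.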
Another pitchshifting curiosity is formant shifting, generally found in programs like Logic Audio and Digital
Performer or as Pro Tools plug-ins. Originally intended to improve the quality of pitchshifting, formant shifting
attempts to retain the voice's original resonances and character while shifting just the pitch. Where things get
interesting is when you shift the formants but leave the pitch, resulting in an entirely different vocal character.
Dropping a vocal's formants one or two semitones often gives an older as well as deeper sound, but still retains the original
key, as you'll hear on track 32. This is ideal for anyone self-conscious about using their own voice on tracks.
Extreme settings can work well, particularly if you drop the formants a whole octave and then take them back up
an octave to the original pitch. This can degrade the signal in an interesting way.
So, now you know all about things that make you go "Hmm". Not to mention "Ooh", "Aah" and "Wap bam boogie"
as well. Next time you're stuck for an idea on your new track, why not give these techniques a try? It never hurts
to spread the word.
Here's a basic guide to the frequencies associated with the most common phonemes of all: the vowel sounds.

Phoneme   Low frequency    High frequency
say       600Hz-1.2kHz     1.8-3kHz
cat       500Hz-1.2kHz     1.4-2.5kHz
car       600Hz-1.3kHz     900Hz-1.6kHz
glee      200-400Hz        2-4kHz
bid       300-600Hz        1.7-3.8kHz
toe       350-550Hz        800Hz-1.4kHz
cord      400-700Hz        500Hz-1.1kHz
fool      175-400Hz        500Hz-1.2kHz
cut       500Hz-1.1kHz     1.2-1.7kHz
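The table above drops straight into a lookup if you want starting points for your EQ sweeps. The ranges are copied from the table; picking the geometric centre of each range as a first guess is my own assumption rather than anything the figures prescribe, and the function names are hypothetical.

```python
import math

# ((low range in Hz), (high range in Hz)) copied from the table above.
VOWEL_BANDS = {
    "say":  ((600, 1200), (1800, 3000)),
    "cat":  ((500, 1200), (1400, 2500)),
    "car":  ((600, 1300), (900, 1600)),
    "glee": ((200, 400), (2000, 4000)),
    "bid":  ((300, 600), (1700, 3800)),
    "toe":  ((350, 550), (800, 1400)),
    "cord": ((400, 700), (500, 1100)),
    "fool": ((175, 400), (500, 1200)),
    "cut":  ((500, 1100), (1200, 1700)),
}


def starting_points(vowel):
    """Geometric centre of each range in Hz, as a first guess for the two boosts."""
    return tuple(round(math.sqrt(lo * hi)) for lo, hi in VOWEL_BANDS[vowel])


def fits(vowel, low_hz, high_hz):
    """True if a pair of boost frequencies falls inside the table's ranges."""
    (lo1, hi1), (lo2, hi2) = VOWEL_BANDS[vowel]
    return lo1 <= low_hz <= hi1 and lo2 <= high_hz <= hi2
```

As a sanity check, fits("fool", 198, 707) returns True, so the lowest and highest 'oo' boosts given earlier sit inside the ranges the table lists for 'fool'.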

http://intermusic.com/print.asp?ReviewId=97&ArticleTable=Features&FeatureType=TUT&...

9/17/01
