
UNIVERSITY OF VICTORIA

Department of Electrical and Computer Engineering

ELEC 484 Audio Signal Processing


Final Project

Direct FFT/IFFT based Phase Vocoder


for Implementing Digital Audio Effects

Prepared by:
Tim I. Perry
V00213455
May, 2009

Prepared for:
Dr. Peter Driessen
ELEC 484
University of Victoria

Tim Perry
V00213455
2009-05-21

Contents
1. Design Overview
  1.1. Analysis Portion
    1.1.1. Windowing
    1.1.2. Circular Shift and FFT
    1.1.3. Overview of Analysis Stage
  1.2. Resynthesis Portion
  1.3. Phase Unwrapping
2. Testing Hop Size with Vectors
3. Testing with Cosine Waves
  3.1. Cosine wave input with integer # of samples/cycle
  3.2. Cosine wave input with non-integer # of samples/cycle
  3.3. Non-integer windowing with cosine wave input
4. Waterfall Plots
  4.1. Phase vs. Time, Instantaneous Frequency, and Frequency Resolution
  4.2. Amplitude, Magnitude, and Phase in Time-Frequency Plane
    Non-Integer Samples per Cycle, Fractional Cycle Per Segment
5. Cyclic Shift
  5.1. Cyclic Shift with Cosine Wave Input
    Without Cyclic Shift
  5.2. Cyclic Shift with Kronecker Delta Function Input
    With Cyclic Shift
    Without Cyclic Shift
6. Preliminary Audio Testing
7. Implementing Audio Effects
  7.1. Time Stretching
  7.2. Pitch Shifting
  7.3. Stable/Transient Components Separation
  7.4. Robotization
  7.5. Whisperization
  7.6. Denoising
  7.7. Wha-Wha Filter in Freq Domain
    Preliminary BPF Design
    Frequency Domain Wha-Wha Implementation
8. Audio Compression Using Phase Vocoder
  8.1. Data Compression using a Threshold Amplitude for Eliminating Bins
  8.2. Data Compression by Keeping N Strongest Frequency Components
9. Conclusions
REFERENCES
APPENDIX A List of MATLAB files
APPENDIX B Two Example Implementations
  1. pv_WhaBPF.m
  2. pv_Pitchshift.m
APPENDIX C waterfall_Plot.m


ELEC 484 Final Project
Phase Vocoder Implementation

PROJECT GUIDELINES FOR PHASE 1

Implement a phase vocoder using the ideas in DAFX chapter 8.

a) Review Assignment 5 question 1c: windowed overlapping segments, raised cosine windows, cyclic
shift, and overlap-add (DAFX figure 8.5).

b) Test with a cosine wave with an integer and a non-integer number of samples per cycle, verify that
after the overlap-add the output is the same as the input. Choose the frequency such that there are
many cycles per segment, about 1 cycle per segment and a small fraction of a cycle per segment (6
frequencies total).

c) Plot the amplitudes in the time/frequency plane (3D or color coded plot of amplitude vs time vs
frequency) for each case and interpret the result.

d) Plot the phases in the time/frequency plane (3D or color coded plot of amplitude vs time vs
frequency) for each case. Also plot the phase versus time for frequency bins close to the frequency of
the cosine wave. Explain why the phases change with time for cosine waves of different frequencies.
Identify the amount of phase change as a function of frequency and explain why it is so. Hint:
consider the idea of instantaneous frequency.

e) Investigate the effect of the cyclic shift. Test with and without cyclic shift using a cosine wave and an
impulse signal, and explain the results.

f) Test also with the Tom_diner signal and verify the output is the same as the input.

g) Implement the following audio effects by manipulating the amplitudes and phases, as explained in
DAFX chapter 8: time stretching, pitch shifting, stable/transient components separation,
robotization, whisperization, denoising. Also implement and test the filter of assignment 2 in the
frequency domain. Test with the Tom_diner signal, cosine wave signals and other signals which
demonstrate the effect clearly.


1. Design Overview
For practical use with digital audio effects, a phase vocoder was implemented based on a block-by-block
FFT/IFFT approach. This technique makes use of a kernel algorithm that uses a sliding window to
perform FFTs on successive segments of audio. Phase and amplitude values are computed in the
frequency domain, as well as any additional desired processing, such as the implementation of audio
effects. Next, the IFFT is performed on the segment, which is re-windowed and overlap-added in the time
domain to preceding segments, recovering the signal (or a processed version of the signal). This process
can conceptually be broken into three stages:
1. Analysis
2. Frequency Domain Processing (implementation of specific DAFX)
3. Synthesis
This documentation will frequently refer to these stages; however, they are not separate entities
in the implementation used. To allow for a shorter computation time, the kernel algorithm iterates
through each stage on a segment of audio, all within the same loop. With each successive windowing
of a block of audio, the kernel algorithm executes one iteration of analysis-processing-synthesis.

Figure 1: Abstracted phase vocoder design.


Figure 2: Direct FFT/IFFT (Block-by-Block) Analysis/Synthesis Phase Vocoder model, as illustrated in
Zölzer's DAFX [1].

While computation speed was the main factor for choosing the block-by-block phase vocoder approach,
the intention here is not so much to develop an efficient phase vocoder. Instead, the goal was to develop a
robust design that can easily be tested, and easily be modified to accommodate various frequency domain
processing schemes. Keeping the conceptual segregation between the three stages of Analysis,
Processing, and Synthesis is helpful when designing for flexibility. Additionally, for the purpose of
learning, it is important to be able to abstract certain elements of the phase vocoder while focusing on the
details of a specific element, such as the frequency domain implementation of an effect.

In the spirit of robustness, and with heavy emphasis on analysis, this phase vocoder loses some of its
elegance and simplicity due to input data type checking and real-time plot generation. Having spent
significant time testing and troubleshooting, I found it useful to be able to throw multiple data types
(numerical vectors or wavfile names) into the same phase vocoder, and have it generate appropriate plots
and feedback catered to that data type. More than anything else, I have found myself using this phase
vocoder as a tool for time-frequency analysis on audio signals.

1.1. Analysis Portion
The analysis portion of the kernel algorithm serves the purpose of bringing a windowed segment of
audio into the frequency domain. Segments, or blocks, are windowed in such a way that they overlap
(share some samples) with their neighbouring windows in the time domain. This overlap is determined by
the hop size, Ra. The effect is a sliding FFT (actually, a hopping FFT, but an appropriate hop size
and windowing scheme can be used to obtain smooth sounding results, despite discrete steps between
each analysis window). The analysis part can be broken up into 3 steps:
1. Window the current block, forming the analysis grain.
2. Perform a circular shift on the analysis grain
3. Take the FFT of the circular shifted grain

1.1.1. Windowing
Several different windowing schemes were tried. Windowing was applied to the block during the
analysis stage, and inverse windowing was applied to the IFFT during the resynthesis stage. The
goal was to minimize spectral leakage, and obtain an accurate reconstruction of the original
signal. With rectangular windowing, truncation is abrupt, and significant spectral leakage occurs.
For a smooth transition between blocks, raised cosine windowing of overlapping segments was
used.
Three potential raised cosine windows to use are outlined in Figure 3. Typically, when
frequency selectivity is important, the use of a Hamming or Blackman window would be
preferred over the Hann window. However, these windowing functions produce more spectral
leakage than the Hann window. Also, for successful overlap adding with this phase vocoder
implementation, the standard raised cosine window will have to be modified.
Three raised cosine window forms based on the Hann window shape are defined as follows:
%----Hann Window: "Hanning" removes multiplications by zero-------
w1 = .5*(1 - cos(2*pi*(0:WLen-1)'/WLen)); % periodic Hann for overlap add
w_han = .5*(1 - cos(2*pi*(0:WLen-1)'/(WLen-1))); % Raised Cosine (Hann) window
w_hng = .5*(1 - cos(2*pi*(1:WLen)'/(WLen+1))); % Raised Cosine (Hanning) window

[Plot for Figure 3: time domain (Amplitude vs. Samples) and frequency domain (Magnitude (dB) vs. Normalized Frequency (×π rad/sample)).]

Figure 3: Three potential raised cosine windows to use. The blue is a Hann window, the green is a Hanning
window (removes multiplications by zero), and the red is a periodic Hann window, designed for
overlap-adding of successive blocks. The periodic Hann window was used.

The Hanning window was modified according to the recommendations in [1] to facilitate smooth
overlap adding. The resulting window, w1, is the periodic Hann, or hanningz as it is referred to
in [1] and [2]. This window is designed to begin with a sample of value zero, and to end with a
non-zero sample having the same value as the second sample [2]. That is, for a window w(n) of
length N = WLen:

$$w(n),\; n = 0, 1, \ldots, N-1: \qquad w(0) = 0, \qquad w(N-1) = w(1) \neq 0$$


So, applying this requirement to the raised cosine window, we get the general form as shown
below.
%==============================================================
% Create framing window (modified hanning window for OLA [2])
% w = [0, k_1, k_2,..., k_n-1 = k1]
%==============================================================
w1 = 0.5*(1-cos(2*pi*(0:WLen-1)'/(WLen))); % analysis window
w2 = w1;                                   % synthesis window
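
As a quick check of the requirement above, the following sketch (not part of the original project files) builds the periodic Hann window and tests its two end-point properties. WLen = 8 and the 1e-12 tolerance are assumed example values.

```matlab
% Sketch: verify the end-point properties of the periodic Hann window.
% WLen = 8 is an assumed example value (the size used for vector testing).
WLen = 8;
w1 = 0.5*(1 - cos(2*pi*(0:WLen-1)'/WLen));        % periodic Hann (hanningz)

starts_at_zero   = (w1(1) == 0);                  % first sample is zero
ends_like_second = abs(w1(end) - w1(2)) < 1e-12;  % last sample equals second sample
fprintf('w1(1) = %g, w1(2) = %g, w1(end) = %g\n', w1(1), w1(2), w1(end));
```

Because the last sample matches the second rather than the first, the window continues smoothly into the next period when shifted copies are overlap-added.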

Window sizes and hop sizes are discussed later on, with accompanying examples. A larger
analysis window size typically results in better frequency precision [2]. Smaller analysis
windows, however, offer superior tracking of rapid spectral changes, such as bursts of high
frequency energy during transients.
A workaround to allow for enhanced frequency precision with a smaller window size is to zero
pad the windowed grain:
grain_zp = [zeros(1.5*WLen, 1); grain; zeros(1.5*WLen, 1)]; % zero pad analysis grain for greater frequency precision

In the end, this was not implemented. The reason is that time-frequency plots were generated for
many audio examples, and the computer system was running out of memory. Also, having to
make design decisions over what particular window size should be used for each application was
constructive in general. There is a trade-off between frequency precision and representation of
transients, which will be frequently discussed.

[Plot for Figure 4: time domain (Amplitude vs. Samples) and frequency domain (Magnitude (dB) vs. Normalized Frequency (×π rad/sample)).]


Figure 4: Modified hanning window for WLen = 8 (used for vector testing)

[Plot for Figure 5: time domain (Amplitude vs. Samples) and frequency domain (Magnitude (dB) vs. Normalized Frequency (×π rad/sample)).]


Figure 5: Modified hanning window for WLen = 256 (used for cosine wave testing)


1.1.2. Circular Shift and FFT
Circular/cyclic shift centers the window maximum at the origin, giving us zero phase at the middle
of the analysis grain, and a centered FFT. A simple phase relationship can be achieved in this
way, which is important when it comes time for phase unwrapping (discussed later). Without
cyclic shift, an impulse that is centered in the window will have oscillating phase values between
consecutive frequency bins, and as a result, the phases will unwrap in opposite directions [2]. By
default, a raised cosine window is zero at the origin. If, however, we shift the windowed segment
(grain) such that it is centered at the origin, we now have a grain with zero phase at its center.
When we perform successive FFTs on grains centered about the origin, we obtain a phase
relationship that can be measured with a simple, systematic phase unwrapping algorithm.
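
The oscillating phase described above can be reproduced with a few lines. This is a minimal sketch, assuming a grain of WLen = 8 samples that contains a single impulse at its centre; the variable names are illustrative and not taken from the project files.

```matlab
% Sketch: effect of the cyclic shift on the phase of a centred impulse.
% WLen = 8 and the impulse position are assumed example values.
WLen = 8;
grain = zeros(WLen, 1);
grain(WLen/2 + 1) = 1;                 % impulse at the centre of the grain

X_noshift = fft(grain);                % origin at the first sample of the grain
X_shift   = fft(fftshift(grain));      % centre of the grain moved to the origin

phase_noshift = angle(X_noshift);      % alternates between 0 and pi (in magnitude)
phase_shift   = angle(X_shift);        % zero in every bin
disp([phase_noshift, phase_shift]);
```

Without the shift, the impulse sits WLen/2 samples from the origin, so each successive bin picks up an extra half cycle of phase. With the shift, the impulse is at the origin and every bin has zero phase.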

1.1.3. Overview of Analysis Stage
The analysis stage, by taking the FFT of a windowed, circular shifted segment, reveals the
measured phase and amplitude in each frequency bin. The next stage is either direct resynthesis,
or frequency domain processing followed by resynthesis. Many frequency domain effects require
phase unwrapping to be performed after the analysis stage, which will be discussed later.
Figure 6 highlights the analysis stage of the phase vocoder in context with the other stages. The
Matlab code for the basic operations of the analysis stage is included.


Figure 6: Phase Vocoder implementation with analysis stage documented.


1.2. Resynthesis Portion
The resynthesis portion reconstructs the signal, or a processed version of the signal, such that it regains a
time domain representation suitable for audio playback. For the definition of the resynthesis stage that
is applied in this design, it is assumed that the target phase, denoted phi_t, is known. For the basic
I/O functionality of the phase vocoder without effects, phi_t is simply set equal to the measured phase
from the analysis section, phi. In cases where FX processing has occurred in the frequency domain, we
will assume for now that phi_t (as well as any amplitude scaling on the FFT) has been calculated
specifically for that effect. This is done so that we can hold off on an explanation of phase unwrapping
until it becomes relevant, when we dive deeper into the specifics of phase vocoder effects processing.
The resynthesis part can be broken up into 4-5 steps for now:
1. Take the IFFT of the current frame, which is the analysis FFT frame with any newly
calculated target phase/amplitude values.
2. Compensate for the earlier circular shift by un-shifting the segment.
3. Apply inverse windowing (window tapering) to the segment, to correct for phase
discontinuities that may have occurred at the edges of a frame [2]. We will call the result
the synthesis grain, grain_t.
4. *FX specific: for certain effects, such as pitch shifting, an interpolation
scheme/resampling may be implemented, which requires redefined synthesis grains. This
will be discussed later when it is applied.
5. Overlap add the synthesis grains back into the time domain.
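
The analysis and resynthesis steps can be sketched together as one loop with no effect applied. This is a simplified illustration under stated assumptions, not the project's kernel: the 500 Hz test signal, the zero padding, and the variable names are chosen for the example, and the synthesis hop is taken equal to the analysis hop. The division by 1.5 is also an assumption of this sketch: it is the constant sum of the overlapped w1.*w2 products for a periodic Hann window at a hop of WLen/4.

```matlab
% Sketch of the analysis/resynthesis loop with no frequency domain effect.
% Signal, padding and the 1.5 gain correction are assumptions for this example.
fs = 8000;  WLen = 256;  Ra = WLen/4;         % synthesis hop equals Ra here
x  = cos(2*pi*500*(0:999)'/fs);               % test input (column vector)
w1 = 0.5*(1 - cos(2*pi*(0:WLen-1)'/WLen));    % analysis window
w2 = w1;                                      % synthesis window

x_pad = [zeros(WLen,1); x; zeros(WLen,1)];    % pad so every sample gets full overlap
y_pad = zeros(size(x_pad));
for n0 = 0:Ra:(length(x_pad) - WLen)
    grain   = x_pad(n0+1:n0+WLen) .* w1;      % window the block
    X       = fft(fftshift(grain));           % cyclic shift, then FFT
    r       = abs(X);  phi = angle(X);        % measured amplitude and phase
    phi_t   = phi;                            % no effect: target phase = measured phase
    grain_t = fftshift(real(ifft(r .* exp(1i*phi_t)))) .* w2;  % IFFT, un-shift, window
    y_pad(n0+1:n0+WLen) = y_pad(n0+1:n0+WLen) + grain_t;       % overlap-add
end
y = y_pad(WLen+1:WLen+length(x)) / 1.5;       % undo the gain of the summed w1.*w2
max_err = max(abs(y - x));
fprintf('max reconstruction error: %g\n', max_err);
```

Since WLen is even, fftshift is its own inverse, so the same call both applies and undoes the cyclic shift.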


Figure 7: Phase Vocoder implementation with resynthesis stage documented.


1.3. Phase Unwrapping
Appearing in the resynthesis portion of the analysis-synthesis based phase vocoder of [2], phase
unwrapping will instead be conceptually placed before resynthesis, in the frequency domain
processing stage. This is because phase unwrapping is a precursor to many frequency domain
audio effects.
Taking $\varphi(n,k)$ as the phase variation, the phase can be represented in a general form as follows:

$$\tilde{\varphi}(n,k) = \Omega_k n + \varphi(n,k), \qquad \Omega_k = \frac{2\pi k}{\mathrm{WLen}}$$

From the above expression, we will conceptually represent the phase in a single frequency bin k,
and express phase purely as a function of n (but in application, each expression below will be
applied to every frequency bin). For specific phase values, n will be expressed explicitly in terms
of the block index s (represented as i in the Matlab implementations), and the hop size $R_a$.
In order to obtain the exact phase value for each bin, phase unwrapping was performed. The
difference between the measured phase phi0 and the target phase phi_t which corresponds to each
bin's nominal frequency was first computed. Next, the phase increment was calculated relative to
one sample. The vector delta_phi contains the phase difference between two adjacent frames for
each bin, and the nominal phase of the bin [2].
The following computations are based on the measured phase values of two consecutive FFT
frames. If $\Omega_k$ corresponds to the frequency of a stable sinusoid, then the target phase phi_t can be
computed from the previous measured phase phi0 [1] by adding $\Omega_k R_a$, where $R_a$ is the hop size:

$$\tilde{\varphi}_t((s+1)R_a) = \tilde{\varphi}(sR_a) + \Omega_k R_a \qquad (1)$$

The unwrapped phase can be computed from the target phase phi_t and deviation phase phi_d as:

$$\tilde{\varphi}_u((s+1)R_a) = \tilde{\varphi}_t((s+1)R_a) + \tilde{\varphi}_d((s+1)R_a) \qquad (2)$$

The principal argument function is used in the calculation of the deviation phase [1]. princarg
returns the nominal initial phase of each bin, placing it in the range $[-\pi, \pi]$. The deviation phase
phi_d was computed as the principal argument of the difference between the measured phase phi
and the target phase phi_t:

$$\tilde{\varphi}_d((s+1)R_a) = \mathrm{princarg}\left[\tilde{\varphi}((s+1)R_a) - \tilde{\varphi}_t((s+1)R_a)\right] \qquad (3)$$

%============================================================
% princarg.m
%
% Function that returns the principal argument of the nominal
% initial phase of each frame (for use with pVocoder_FFT). [2]
%============================================================
function Phase = princarg(Phasein)
a = Phasein/(2*pi);        % phase expressed in cycles
k = round(a);              % nearest whole number of cycles
Phase = Phasein - k*2*pi;  % remove whole cycles, leaving a value in [-pi, pi]
end

Figure 8: Phase computations for frequency bin k [1].

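From Figure 8, the unwrapped phase difference between two consecutive frames is the difference
between the unwrapped phase and the previous phase value:

$$\Delta\varphi((s+1)R_a) = \tilde{\varphi}_u((s+1)R_a) - \tilde{\varphi}(sR_a) = \Omega_k R_a + \tilde{\varphi}_d((s+1)R_a) \qquad (4)$$

Here $\tilde{\varphi}_u$ is the unwrapped phase, $\tilde{\varphi}_d$ is the deviation phase, $\Omega_k = 2\pi k/\mathrm{WLen}$ is the nominal
frequency of bin k in radians per sample, and $R_a$ is the hop size.
The instantaneous frequency for frequency bin k can be calculated at time $(s+1)R_a$ as:

$$f_i((s+1)R_a) = \frac{1}{2\pi}\,\frac{\Delta\varphi((s+1)R_a)}{R_a}\, f_s \qquad (5)$$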

The unwrapped phase and instantaneous frequency values will be important for the
implementation of many frequency domain FX, which will be shown later.
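
As an illustration of the phase computations in this section, the sketch below estimates the instantaneous frequency of one bin from two consecutive frames. It is a hedged example rather than the project code: the 510 Hz test tone, the inline princarg handle, and the variable names are assumptions, and the steps follow the standard phase vocoder relations (target phase, deviation phase, unwrapped phase difference).

```matlab
% Sketch: instantaneous frequency of one bin from two consecutive frames.
% The 510 Hz test tone and all variable names are assumptions for this example.
fs = 8000;  WLen = 256;  Ra = WLen/4;
f  = 510;                                      % lies between bins 16 and 17
x  = cos(2*pi*f*(0:999)'/fs);
w1 = 0.5*(1 - cos(2*pi*(0:WLen-1)'/WLen));
princarg = @(p) p - 2*pi*round(p/(2*pi));      % same mapping as princarg.m

k       = round(f*WLen/fs);                    % nearest bin (0-based)
omega_k = 2*pi*k/WLen;                         % nominal phase advance per sample
X0   = fft(fftshift(x(1:WLen) .* w1));         % frame s
X1   = fft(fftshift(x(Ra+1:Ra+WLen) .* w1));   % frame s+1
phi0 = angle(X0(k+1));                         % previous measured phase
phi  = angle(X1(k+1));                         % current measured phase

phi_t     = phi0 + omega_k*Ra;                 % target phase
phi_d     = princarg(phi - phi_t);             % deviation phase
delta_phi = omega_k*Ra + phi_d;                % unwrapped phase difference
f_inst    = delta_phi/(2*pi*Ra) * fs;          % instantaneous frequency in Hz
fprintf('bin centre: %.2f Hz, instantaneous frequency: %.2f Hz\n', k*fs/WLen, f_inst);
```

The bin is centred on 500 Hz, yet the estimate lands on the frequency of the input tone, because the deviation phase carries the offset between the tone and the bin centre.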

2. Testing Hop Size with Vectors


Preliminary and intermediate testing was conducted on vectors. The function kernalPlot.m is
automatically called to plot the analysis grains of each iteration of the kernel algorithm if the total number
of windowed grains will be few enough to view on a single plot (i.e. the input is a test vector, not an audio
signal). This was useful when experimenting with various windowing and hop size schemes. Since
fftshift is used to perform the cyclic shift, kernelPlot plots the grain before it has been centered about
the origin.
With the modified hanning window (called hanningz in [1] and [2]), it was found that the ratio of analysis
hop size to window length had to be small enough (a hop/win ratio of 1/2 was not sufficient) for the signal
to be properly reconstructed during overlap-adding (OLA). For the basic phase vocoder implementation
(without effects), the ratio that produced the best OLA results was:

$$\frac{R_a}{\mathrm{WLen}} = \frac{1}{4} \qquad (6)$$


Figure 9: Successive analysis grains generated by the kernel algorithm (only non-zero grains
displayed). WLen = 8, Ra = 4.

Figure 10: With a hop/win ratio of Ra/WLen = 1/2, modulation of the output signal occurred during the
synthesis overlap-add process.


Figure 11: Successive analysis grains generated by the kernel algorithm. WLen = 8, Ra = 2.

Figure 12: With a hop/win ratio of Ra/WLen = 1/4, no output modulation occurred during the synthesis
overlap-add process.
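
The difference between the two hop sizes can be checked numerically by summing the overlapped products of the analysis and synthesis windows, which is the gain applied to each output sample when no effect is used. The sketch below is an illustration under assumed values (WLen = 8 as in the vector tests, 12 frames), not a reproduction of the project's test script.

```matlab
% Sketch: sum of overlapped w1.*w2 products for two hop sizes.
% WLen = 8 matches the vector tests; n_frames is an arbitrary choice.
WLen = 8;
w1 = 0.5*(1 - cos(2*pi*(0:WLen-1)'/WLen));   % analysis window
w2 = w1;                                     % synthesis window
n_frames = 12;
hops   = [WLen/2, WLen/4];
ripple = zeros(1, 2);
for h = 1:2
    Ra  = hops(h);
    ola = zeros((n_frames-1)*Ra + WLen, 1);
    for s = 0:n_frames-1
        idx = s*Ra + (1:WLen)';
        ola(idx) = ola(idx) + w1 .* w2;      % gain contributed by frame s
    end
    steady = ola(WLen+1:end-WLen);           % ignore the fade-in and fade-out
    ripple(h) = max(steady) - min(steady);
    fprintf('Ra = WLen/%d: gain between %.3f and %.3f\n', WLen/Ra, min(steady), max(steady));
end
```

With a hop of WLen/2 the summed gain varies from sample to sample, which appears as amplitude modulation of the output. With a hop of WLen/4 the gain is constant, so a single scale factor restores the input level.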


3. Testing with Cosine Waves


In order to test the I/O operation of the basic phase vocoder at various frequencies, cosine wave testing
was employed. Testing was conducted on sampled cosine wave segments with both integer and
non-integer numbers of samples per cycle. Several different input signal lengths were used in order to allow for
plots with easy visibility at all frequencies tested. The frequencies and input parameters that were used for the
majority of the tests are displayed below:
%=========================================================================
% Testing PV on cosine wave with int/non-int # of samples/cycle
%=========================================================================
fs = 8000;            % sampling frequency
Ts = 1/fs;            % sampling period
Nx = 1000;            % duration of signal
nT = (0:Nx-1)*Ts;     % Nx length time vector
% integer # of samples/cycle
f1 = 4;               % 2000 samples/cycle
f2 = 31.25;           % 256 samples/cycle
f3 = 500;             % 16 samples/cycle
f4 = 2000;            % 4 samples/cycle
% non-integer # of samples/cycle
f5 = 7;
f6 = 33;
f7 = 300;
f8 = 1500;

%----------- cosine wave input parameters---------
f = f3;               % choose freq of input
x = cos(2*pi*f*nT);   % cosine wave input
WLen = 256;           % window length
Ra = WLen/4;          % analysis hop size
Rs = WLen/4;          % synthesis hop size

3.1. Cosine wave input with integer # of samples/cycle
The following universal parameters were used for testing the phase vocoder with cosine waves having
an integer number of samples per cycle:

Sampling frequency:               fs = 8000Hz
Window length:                    WLen = 256 samples
Duration of sinusoid (unpadded):  N_orig = 1000 samples

The testing results are visible in the plots below. For each figure, the details of the test are outlined in
the caption. Plots of the analysis grains (prior to cyclic shift) are also included, as they are helpful for
illustrating the windowing at each frequency. The I/O plots demonstrate that the reconstructed
waveform is identical to the input waveform at all frequencies tested (both integer and non-integer
number of samples per cycle).

Figure 13: Successive analysis grains for integer sampled cosine wave with freq 4Hz (2000
samples per cycle, a fraction of a cycle per segment). WLen = 256, Ra = WLen/4.

Figure 14: Input and output for 4 Hz cosine wave.


Figure 15: Successive analysis grains for integer sampled cosine wave with freq 31.25Hz (256 samples per
cycle, 1 cycle per segment). WLen = 256, Ra = WLen/4.

Figure 16: Input and output for 31.25Hz cosine wave (256 samples/cycle).


Figure 17: Successive analysis grains for integer sampled cosine wave with freq 100Hz. WLen = 256,
Ra = WLen/4.

Figure 18: Input and output for 100Hz cosine wave (80 samples/cycle).


Figure 19: Successive analysis grains for integer sampled cosine wave with freq 500Hz. WLen = 256, Ra =
WLen/4.

Figure 20: Input and output for 500Hz cosine wave (16 samples/cycle).


Figure 21: Successive analysis grains for integer sampled cosine wave with freq 1000Hz. WLen = 256,
Ra = WLen/4.

Figure 22: Input and output for 1000Hz (8 samples/cycle) cosine wave, zoomed in for clarity.


3.2. Cosine wave input with non-integer # of samples/cycle
The analysis grain and I/O plots below reveal a perfect resynthesis for cosine waves with a fractional
number of samples per cycle.

Figure 23: Successive analysis grains for non-integer sampled cosine wave with freq 7Hz (1142.86 samples per
cycle, a fraction of a cycle per segment). WLen = 256, Ra = WLen/4.

Figure 24: Input and output for 7 Hz cosine wave.


Figure 25: Successive analysis grains for non-integer sampled cosine wave with freq 33Hz (242.424 samples per
cycle, 1.056 cycles per segment). WLen = 256, Ra = WLen/4.

Figure 26: Input and output for 33 Hz cosine wave (242.424 samples per cycle, 1.056
cycles/segment).


Figure 27: Successive analysis grains for non-integer sampled cosine wave with freq 300Hz. WLen = 256, Ra =
WLen/4.

Figure 28: Input and output for 300Hz cosine wave (8000/300 ~ 26.667 samples/cycle).


3.3. Non-integer windowing with cosine wave input
For contrast, non-integer windowing is demonstrated below. Here, we see the results of truncation, which
leads to spectral leakage in the frequency domain. The resulting signal is not reconstructed accurately.
This is most apparent when viewing the amplitude envelope of the output signal. The envelope is no longer
rectangular, as it has undergone a small amount of amplitude modulation. Interestingly, the spectral
leakage bears resemblance to a simple AM modulated waveform when viewed in the frequency domain:
a carrier wave/band with two sidebands. For the phase vocoder, however, we want to avoid this, which
means no non-integer windowing (we will, however, apply non-integer hop sizes for certain effects).

Figure 29: Successive analysis grains for non-integer windowing of 100Hz (80 samples/cycle) cosine wave
(top). Input and output of PV (bottom). Nx_in = 874.3, WLen = 293.7.


Figure 30: Successive analysis grains for non-integer windowing of 300Hz (~ 26.667 samples/cycle) cosine
wave (top). Input and output of PV (center). Zoomed view of output, showing the results of truncation, which
leads to spectral leakage (bottom). Nx_in = 874.3, WLen = 293.7.


4. Waterfall Plots
The time-frequency (waterfall) representation of a signal is useful for visualizing and analyzing the spectral
content as it changes with time. Also, the relationship between frequency resolution and window size
becomes very apparent when it can be seen, and not only heard. Waterfall plots represent a signal in terms
of its successive FFT frames; in other words, as a function of the sample index of windowed time
segments, and of the frequency bins. For a window length of WLen samples, we have an FFT size of
WLen frequency bins.
The waterfall plots throughout this paper evolved, following the evolution of the waterfall plot function,
which has undergone frequent modifications. Four different time-frequency representations will be used
for analysing this phase vocoder design and operation:
1. Amplitude Waterfall (both linear and logarithmic frequency scales are used)
2. Phase Waterfall
3. Magnitude Waterfall (both linear and logarithmic frequency scales are used)
4. Phase vs. Time plot of Frequency Bins near the signal's maximum amplitude in the
frequency domain. This type of plot is essentially a phase waterfall that has been zoomed in to a
typical area of interest. In the case of cosine waves, the center bin is the bin closest to the wave's
frequency. In the case of more complex signals, the plot will often be centered close to the
fundamental.
Before plotting the 3D time-frequency representations, a test plot of the FFT frames was completed, as
shown in Figure 31. Each subplot corresponds to the FFT of each analysis grain. The corresponding
waterfall plot will simply involve linearly orienting the FFT frames along the time axis, according to the
index of each window (the block index). The spacing (number of samples) between frames in the time
domain is determined by the hop size.
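
The arrangement of FFT frames along the time axis can be sketched as a matrix with one column per block. This is a minimal illustration and not the waterfall plot function from Appendix C: the variable names, the 500 Hz test tone, and the amplitude scaling by WLen/4 (chosen so that an on-bin unit-amplitude cosine under this window peaks at 1) are assumptions of the example.

```matlab
% Sketch: collect the FFT frames into a matrix for a waterfall plot.
% Variable names and the 500 Hz test tone are assumptions for this example.
fs = 8000;  WLen = 256;  Ra = WLen/4;
x  = cos(2*pi*500*(0:999)'/fs);
w1 = 0.5*(1 - cos(2*pi*(0:WLen-1)'/WLen));

starts   = 0:Ra:(length(x) - WLen);            % first sample of each block
n_frames = length(starts);
amp = zeros(WLen/2 + 1, n_frames);             % bins 0..WLen/2, one column per block
for i = 1:n_frames
    grain = x(starts(i)+1:starts(i)+WLen) .* w1;
    X = fft(fftshift(grain));
    amp(:, i) = abs(X(1:WLen/2 + 1)) / (WLen/4);   % scaled so an on-bin unit cosine peaks at 1
end
f_axis = (0:WLen/2)' * fs/WLen;                % bin centre frequencies in Hz
[~, k_peak] = max(amp(:, 1));
fprintf('peak at %.2f Hz in frame 1\n', f_axis(k_peak));
% waterfall(starts, f_axis, amp);              % uncomment to draw the surface
```

Each column of amp is one FFT frame, and the column spacing in samples equals the hop size Ra.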


Figure 31: FFTs of each frame for a 500Hz cosine wave 1000 samples in length. WLen = 256, Ra = WLen/4.

4.1. Phase vs. Time, Instantaneous Frequency, and Frequency Resolution
To properly interpret the phase vs. time plots for frequency bins close to the frequency of a cosine wave,
we will first derive an expression for instantaneous frequency. The goal is to understand why the phases
change with time for cosine waves of different frequencies.
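Let the input cosine waveform be expressed by:

$$x(t) = \cos(\varphi(t)) = \cos(2\pi f t)$$

Instantaneous frequency is defined as the rate of change in the phase:

$$f_i(t) = \frac{1}{2\pi}\,\frac{d\varphi(t)}{dt}$$

From this we can conclude that when the rate of change of the phase is constant, the instantaneous
frequency $f_i(t)$ and the frequency of the cosine wave are constant.
After taking the FFT of a segment of x, we will focus on the instantaneous frequency of one frequency
bin:

$$f_i(n,k) = \frac{1}{2\pi}\,\frac{d\tilde{\varphi}(n,k)}{dn}\, f_s$$

where $\tilde{\varphi}(n,k)$ is the phase of bin k at sample n.
This can be expressed in terms of the unwrapped phase difference, bringing us back to equation (5). The
instantaneous frequency for bin k at time $(s+1)R_a$, with $\Delta\varphi$ the unwrapped phase difference
between consecutive frames and $R_a$ the hop size, is:

$$f_i((s+1)R_a) = \frac{1}{2\pi}\,\frac{\Delta\varphi((s+1)R_a)}{R_a}\, f_s$$
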

Since frequency bins are spaced according to the frequency resolution f0, some frequencies will be
directly centered on a bin, and some will lie between two bins. For maximum frequency resolution and
the most accurate measurement of the sinusoid's phase and amplitude, we could use rectangular
windowing. However, this phase vocoder will not simply be used for analysis of sinusoids; it will be used
for audio effects. Rectangular windowing causes spectral leakage, which we would like to minimize.
For cosine wave frequencies that have both an integer number of samples per cycle and lie precisely on a
multiple of f0, the Phase vs. Time plot shows unity phase in the frequency bin where maximum
amplitude occurs. This makes sense, considering that the frequency of the cosine is not changing, that is,
the instantaneous frequency is constant. The exception is at the truncation points where the input cosine
has initially been rectangular windowed. At these points in time, spectral leakage is visibly occurring,
which is indicated by the presence of visible sidebands. Even with modified Hann window tapering
specifically catered to OLA, we notice spectral leakage at the end points, where no overlap adding is
occurring.
When the cosine frequency is not a multiple of f0 (it does not lie directly on the bin), the plots below
show that we do not get perfectly linear phase in the nearest bin; the phase is time varying. This is
because we are not actually measuring the phase of the input cosine wave; we are measuring the phase of
the nearest frequency bin. The nearest frequency bin is centered on a slightly different frequency than the
cosine wave. Since it is a slightly different frequency, it will sometimes be in phase with our cosine wave
(having 0 phase at these points), and sometimes be out of phase (the relative phase changes with time).
The closer the measured bin is to the center frequency, the slower the phase will change with time. This
confirms that for the most accurate phase measurements, we want a high frequency resolution.
The above is analogous to beat frequencies: two nearby tones exhibit a beat frequency when they are
slightly off in frequency. As we tune one tone to the other, the beat frequency slows, and when the signals
have the exact same pitch, it stops. In terms of musical applications, a higher frequency resolution can be
compared to a better tuner. With a better tuner, the frequency of an out of tune tone can be measured
more precisely.
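The bin-centred versus off-bin behaviour described above can be reproduced with a short MATLAB sketch. It is only an illustration under stated assumptions: a plain periodic Hann window stands in for the modified Hann window used in this report, the 2010 Hz tone is an arbitrary off-bin choice, and the names freqs, phases, wrap and drift are hypothetical. The hop size Ra = WLen/4 matches the later tests.

```matlab
fs = 8000;  WLen = 256;  Ra = WLen/4;
n  = (0:WLen-1)';
w  = 0.5*(1 - cos(2*pi*n/WLen));        % periodic Hann (assumed stand-in window)
wrap = @(p) mod(p + pi, 2*pi) - pi;     % wrap phase to [-pi, pi)

freqs = [2000 2010];                    % bin-centred tone, and one 10 Hz off centre
numFrames = 8;
phases = zeros(numFrames, numel(freqs));
for c = 1:numel(freqs)
    f1 = freqs(c);
    k  = round(f1*WLen/fs);             % nearest FFT bin (0-based index)
    for s = 1:numFrames
        idx   = (s-1)*Ra + n;           % sample indices covered by frame s
        grain = w .* cos(2*pi*f1*idx/fs);
        X     = fft(fftshift(grain));   % cyclic shift, then FFT
        phases(s,c) = angle(X(k+1));    % phase of the nearest bin
    end
end
drift = wrap(diff(phases));             % phase change per hop, one column per tone
disp(drift(1,:));
```

For the 2000 Hz tone the phase of bin 64 stays at zero from frame to frame, while for the 2010 Hz tone the same bin advances by roughly 0.16*pi per hop, which is the slow "beating" of the tone against the bin frequency.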


4.2. Amplitude, Magnitude, and Phase in Time-Frequency Plane
The following time-frequency plots coincide with the earlier testing from section 3. The constant
parameters are:

Sampling frequency: fs = 8000 Hz
Window length: WLen = 256 samples
Frequency resolution: f0 = 31.25 Hz
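As a quick cross-check of the cases that follow, the sketch below tabulates samples per cycle, cycles per segment, and bin position for the test frequencies used in this section. The variable names (samplesPerCycle, cyclesPerSegment, binPosition, onBin) are illustrative and do not come from the phase vocoder code.

```matlab
fs = 8000;  WLen = 256;
f0 = fs/WLen;                                  % frequency resolution: 31.25 Hz
f1 = [2000 31.25 4 1500 33 7];                 % test frequencies used in this section
samplesPerCycle  = fs ./ f1;
cyclesPerSegment = WLen ./ samplesPerCycle;
binPosition      = f1 ./ f0;                   % an integer means centred on an FFT bin
onBin = abs(binPosition - round(binPosition)) < 1e-12;
for m = 1:numel(f1)
    fprintf('%8.2f Hz: %9.3f samples/cycle, %7.3f cycles/segment, bin %7.3f, on bin: %d\n', ...
            f1(m), samplesPerCycle(m), cyclesPerSegment(m), binPosition(m), onBin(m));
end
```

Only 2000 Hz, 31.25 Hz and 1500 Hz land exactly on a bin; 4 Hz, 33 Hz and 7 Hz fall between bins, which is where the time-varying phase behaviour is expected.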

Integer Samples per Cycle, Many Cycles per Segment


In the example below, the 2000 Hz cosine wave input corresponds to 4 samples per cycle. With the exception of spectral leakage caused by the discontinuous cosine wave, the amplitude spectrum is well represented, with a peak at 1. The cosine wave frequency f1 = 2000 Hz is centered on one of the FFT bins. As a result, we have a 0 phase representation in the phase waterfall. The phase appears to be random everywhere else in the spectrum (and appears even more so with a higher frequency resolution). The phase at all values other than f1 should be ignored, as the points were calculated from round-off errors at frequencies where the spectrum should be zero. This problem can be largely solved if a centered Hann window is used, but we have already discussed the need for an offset window for best results with the overlap add process.

Figure 32: Amplitude and Phase waterfall representations 2000Hz Cosine Wave input (Nx = 5000 samples)

The plot of the bins near the maximum amplitude bin visually confirms the discussion in the previous section. Here, f1 is bin centered, and the phase is constantly 0. In the directly neighbouring bins, which are slightly off in frequency, the phase is still constant up to the points where truncation occurs, and sidebands cause their phase to jump from zero to ±180 degrees (opposite for upper and lower sidebands).
[Plot: Phase vs Time for Freq Bins Near Fundamental (2000Hz-CosineWave). Axes: n [samples], f [Hz], ArgXw(f) [rad].]

Figure 33: Phases of nearby bins for 2000Hz Cosine Wave input (signal length = 5000 samples)
[Plot: Phase vs Time for Freq Bins Near Fundamental (2000Hz-CosineWave). Axes: n [samples], f [Hz], ArgXw(f) [rad]. Data tip: n = 256, f = 2000 Hz, phase = 0 rad.]

Figure 34: Phases of nearby bins for 2000Hz Cosine Wave input (signal length = 1000 samples).

Integer Samples per Cycle, 1 Cycle per Segment


For WLen = 256 samples and fs = 8000 Hz, we achieve 1 cycle per segment for:
f1 = 31.25 Hz (256 samples per cycle)
31.25 Hz is again centered on an FFT bin (the first bin). The amplitude spectrum between the truncated edges of the cosine wave segment is slightly misshapen, and we don't have zero phase. The phase is changing with time in the 31.25 Hz bin. This is because successive windows are out of phase with one another. We need to go through four successive windowings (since Ra = WLen/4) to obtain two identical phases. The phase starts at 0, and is again 0 after every 4th successive windowing. The kernel plot in Figure 37 helps to illustrate this.
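The four-hop periodicity can be checked with a short calculation, assuming the usual nominal phase advance of bin k over one analysis hop. The symbol $\Delta\phi_k$ is introduced here only for illustration:

```latex
\Delta\phi_k = \frac{2\pi k R_a}{WLen}, \qquad
k = 1,\; R_a = \frac{WLen}{4}
\;\Rightarrow\; \Delta\phi_1 = \frac{\pi}{2}, \qquad
4\,\Delta\phi_1 = 2\pi \equiv 0 \pmod{2\pi}

k = 64 \;(2000\ \mathrm{Hz})
\;\Rightarrow\; \Delta\phi_{64} = \frac{2\pi \cdot 64 \cdot 64}{256} = 32\pi \equiv 0 \pmod{2\pi}
```

Under this assumption the 31.25 Hz bin returns to its starting phase only every fourth hop, whereas the 2000 Hz bin advances by a whole number of cycles per hop and therefore appears constant.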

Figure 35: Amplitude and Phase waterfall representations of 31.25 Hz Cosine Wave input (Nx = 2048 samples)


Figure 36: Amplitude and Phase waterfall representations of 31.25 Hz Cosine Wave input (Nx = 2048 samples)

Figure 37: Successive analysis grains (prior to cyclic shift) for 31.25 Hz Cosine Wave input (Nx = 2048 samples)

Integer Samples per Cycle, Fractional Cycle Per Segment


For WLen = 256 samples and fs = 8000 Hz, we can achieve a fraction of a cycle per segment with integer samples per cycle for a number of frequencies. The chosen frequency is:
f1 = 4 Hz (2000 samples per cycle)
Since the period of this cosine wave is much larger than the window size, the actual amplitude of the cosine wave (its maximum or minimum value) will only be measured once for roughly every 4 full window lengths traversed by the kernel algorithm (corresponding to 16 hops/successive windowings). The trend is related, but different, for the phase. The phase readings are constant between consecutive hops, but after every 4 full window lengths the value jumps between 0 and pi. Each time we hit zero phase in the bin closest to f1, we see a recurring pattern across the spectrum that results from measured phase values in the sidebands (the result of spectral leakage). Since f1 = 4 Hz is below the lowest nonzero frequency bin, a similar trend is also occurring just below the Nyquist frequency. In an attempt to represent the signal with the frequency bin below f1, which does not exist, the phases are being folded back into the bin at the top end of the spectrum.

Figure 38: Amplitude and Phase waterfall representations for 4Hz Cosine Wave input (Nx = 5 000 samples).


Figure 39: Magnitude Spectrum in dB for 4Hz Cosine Wave input (signal length = 5 000 samples).
[Plot: Phase vs Time for Freq Bins Near Fundamental (4Hz-CosineWave). Axes: n [samples], f [Hz], ArgXw(f) [rad]. Data tips: (n = 4928, f = 0 Hz, phase = 3.142 rad) and (n = 4544, f = 0 Hz, phase = 0 rad).]

Figure 40: Phases of nearby bins for 4 Hz Cosine Wave input (Nx = 5000 samples).


Non-Integer Samples per Cycle, Many Cycles per Segment


With a frequency f1 = 1500 Hz, we have an example using non-integer samples per cycle that lines up perfectly with the center of a bin. As in the case with integer samples per cycle, the phase is constantly 0 in the center bin. Neighbouring bins experience deviation when the sidebands leak into them at points of truncation.

Figure 41: Amplitude and Phase waterfall representations for 1500Hz Cosine Wave input (Nx = 1025
samples).

Figure 42: Phases of nearby bins for 1500Hz Cosine Wave input (Nx = 1024 samples).


Non-Integer Samples per Cycle, ~1 Cycle Per Segment


With a frequency f1 = 33 Hz, we have 242.424 samples per cycle, and 1.056 cycles per segment. Viewing the amplitude spectrum, the trend resembles what we saw with 1 cycle per segment. There are some inconsistencies in the shape of the amplitude spectrum. Some of the energy from the fundamental has leaked into other parts of the spectrum. The phase jumps between 0 and 180 degrees after every 2 successive windowings (nonlinear phase). This jump in phase means that the measured instantaneous frequency of the cosine wave is not constant. If we zoom in closely on the magnitude spectrum, this is apparent.


Figure 43: Amplitude and Phase waterfall representations for 33 Hz Cosine Wave input (Nx = 2048 samples).


Figure 44: Phase waterfall for 33Hz Cosine Wave input (Nx = 2048 samples).

Figure 45: Phases of nearby bins for 33Hz Cosine Wave input (Nx = 2048 samples).


Figure 46: Zoomed view of a portion of the magnitude spectrum [dB] that displays the peak amplitude of the
bins closest to f1. The instantaneous frequency changes with the phase.

Non-Integer Samples per Cycle, Fractional Cycle Per Segment


A frequency of f1 = 7 Hz was chosen for a cosine wave that has non-integer samples per cycle, and undergoes less than one cycle per segment. The general trends look similar to the integer samples per cycle example (correct amplitude values are only measured when the cosine wave maxima/minima are windowed); however, spectral leakage is more severe in this case. Since we are no longer centered on a frequency bin, the phase exhibits nonlinear behaviour, and creates sidebands that fluctuate in intensity (the energy that they tap from the fundamental is greatest when the amplitude of the center bin is lowest). At the first truncation point, the sidebands peak at -20 dB in neighbouring bins.


Figure 47: Amplitude and Phase waterfall representations for 7 Hz Cosine Wave input (Nx = 2048 samples).


Figure 48: Magnitude spectrum of 7Hz cosine wave input (Nx = 2048 samples)

Figure 49: Phases of nearby bins for 7Hz Cosine Wave input (Nx = 2048 samples).


5. Cyclic Shift
5.1. Cyclic Shift with Cosine Wave Input
All previous signals were tested with a cyclic shift applied in order to center the FFT (fftshift was
used). A 500 Hz cosine wave will be used here, first represented with cyclic shift applied:

Figure 50: 500 Hz cosine wave amplitude and phase waterfalls with cyclic shift applied.


Figure 51: cosine wave phase waterfalls with cyclic shift applied (alternative view)

Figure 52: 500Hz cosine wave phase magnitude waterfall with cyclic shift applied.


Figure 53: Phases of nearby bins for 500Hz Cosine Wave input (Cyclic shift applied).

Without Cyclic Shift

When we remove the cyclic shift, the phase response in bins that don't have zero phase is altered quite significantly. It's difficult to interpret the relationship with a cosine wave, however. We have removed a common 0 phase reference at the window center, and the resulting FFTs are not centered with a phase of 0 in frequency bin 0. Phase unwrapping can still be performed, but it will require a more complicated algorithm than the one outlined here. Where we would likely have significant problems is with frequencies that lie between frequency bin centers, in schemes with limited frequency precision (such as the one used here, with the modified Hann window). Figure 54 shows a drastic change in the phase relationship between the bins directly next to the center bin, which has for practical purposes retained its 0 phase. The neighbouring bins, however, are 180 degrees out of phase, whereas with a cyclic shift applied, they lie at zero between the truncation regions. The phase relationship seems to be the same as the cyclic shifted phase relationship in every second bin out from the center bin (which has 0 phase). However, every odd bin out from this reference point has a very different phase relationship.
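One way to make sense of the 180 degree difference in odd-numbered bins is the DFT shift property: a circular shift by WLen/2 samples multiplies bin k by exp(-j*pi*k) = (-1)^k. The sketch below checks this numerically. It is not taken from the phase vocoder code; the periodic Hann window is an assumed stand-in for the modified Hann window, and X_shift, X_noshift and err are illustrative names.

```matlab
N  = 256;                           % frame length (WLen in this report)
n  = (0:N-1)';
k  = (0:N-1)';                      % FFT bin indices (0-based)
fs = 8000;  f1 = 500;               % 500 Hz sits on bin 16 when fs/N = 31.25 Hz
w  = 0.5*(1 - cos(2*pi*n/N));       % periodic Hann window (assumed stand-in)
grain = w .* cos(2*pi*f1*n/fs);

X_shift   = fft(fftshift(grain));   % cyclic shift applied before the FFT
X_noshift = fft(grain);             % no cyclic shift

% A circular shift by N/2 samples multiplies bin k by (-1)^k
err = max(abs(X_shift - ((-1).^k) .* X_noshift));
fprintf('max deviation from the (-1)^k relation: %g\n', err);
```

Even-numbered bins (including bin 16, which holds the 500 Hz tone) are therefore unchanged by removing the shift, while odd-numbered bins are flipped in sign, which shows up as a 180 degree phase offset.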


Figure 54: Phases of nearby bins for 500Hz Cosine Wave input (Cyclic shift not applied)


5.2. Cyclic Shift with Kronecker Delta Function Input

%=========================================================================
% Testing Phase Vocoder on Unit Impulse
%[y,ModIn,PhasesIn,ModOut,PhasesOut]=PVOCODER_FFT(x,fs,WLen,Ra,Rs,TAG)
%=========================================================================
x_imp = zeros(1000, 1);
x_imp(1) = 1;      % impulse input
WLen = 256;        % window length
Ra = WLen/4;       % analysis hop size
Rs = WLen/4;       % synthesis hop size

By analyzing a burst of broadband energy, it is somewhat easier to observe what is happening across the frequency spectrum when cyclic shift is/is not applied. A Kronecker Delta Function will be used as the input, as defined above.

With Cyclic Shift
After framing the input segment, a cyclic (circular) shift is applied, centering the analysis grain at the time origin and thereby assigning 0 phase to the window center. The resulting FFT has 0 phase associated with bin 0.
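The zero phase reference can be illustrated with an idealised grain that contains a single unit impulse at the window center. This is a sketch under that assumption only; the test signal above places its impulse at the first sample, and the windowing step is omitted here for clarity. The names grain, X_shift and X_noshift are illustrative.

```matlab
N = 256;                               % frame length (WLen in this report)
grain = zeros(N,1);
grain(N/2 + 1) = 1;                    % unit impulse at the window centre

X_shift   = fft(fftshift(grain));      % impulse moved to the time origin
X_noshift = fft(grain);                % impulse left at the centre

phaseShift   = angle(X_shift);         % zero in every bin
phaseNoShift = angle(X_noshift);       % alternates between 0 and +/-pi

fprintf('max |phase| with shift:              %g rad\n', max(abs(phaseShift)));
fprintf('max |phase| without shift, odd bins: %g rad\n', max(abs(phaseNoShift(2:2:end))));
```

With the shift, the impulse sits at the time origin and every bin has zero phase. Without it, the spectrum is (-1)^k, so the odd-numbered bins read 180 degrees even though the signal itself has not changed.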


Figure 55: Amplitude (top) and phase (bottom) waterfall representations of delta function (cyclic shift is
used)

[Plot: Phase vs Time for Freq Bins Near Fundamental (Kronecker-Delta). Axes: n [samples], f [Hz], ArgXw(f) [rad].]

Figure 56: Time-Frequency representations of amplitude and phase

Without Cyclic Shift
Without cyclic shift, there is again a change in the phase spectrum. With the simple input signal used, we can see that the bin closest to the origin has a maximum value reaching toward 180 degrees, while in the cyclic shifted example, the same bin has a value closer to zero phase, which is the value at the origin. The phase values exhibit more erratic behaviour along both the sample index axis and the frequency bin axis when cyclic shift is not used. Phase unwrapping will certainly be more complex, and the correct target phase values may be ambiguous.

Figure 57: Input and Output signals for Impulse input (top), selected analysis grains (bottom).


6. Preliminary Audio Testing


Before implementing any effects, the basic phase vocoder I/O operation was tested to verify that the
analysis-resynthesis stages perform correctly, and do not introduce significant artefacts to the sound.
Testing was performed on various audio signals with different timbres and attack styles. For signals that
are defined largely by transient components, such as percussion recordings, it was found that a smaller
window size (WLen = 1024 samples seemed to work) was more effective than a large window. This is
because with a smaller window, FFTs are taken more frequently, and hence the rapidly changing
transients are better tracked. With a larger window size on percussive material, transients were noticeably
distorted, with a smeared and less percussive attack. On the other hand, for pitched instruments with more
subtle transients, a larger window size is preferred (WLen = 4096 samples was frequently used). The
higher frequency precision that a larger window/FFT size provides is particularly well suited to largely
stable signals with a complex timbre (such as a bowed violin). The catch here, however, is that a rich
timbre is typically characterized as being abundant in partial harmonics and broadband spectral content,
much of which is high frequency energy that decays quickly. Still, for slowly changing signals in general,
the larger FFT size facilitates a reconstructed signal with a spectral representation that is more faithful to
that of the original signal.
For many test signals, in general, a reasonable compromise between transient response tracking and
frequency resolution was achieved with a window size of 2048 samples.
The flute2.wav was chosen to demonstrate the input/output performance of the phase vocoder, as the
waveform is quite simple in shape (a single note with a volume swell and changing dynamics). The flute
vibrato is actually more of a tremolo, as it consists mainly of a rapid variation in amplitude and timbre,
not in pitch. This shows up in the envelope of the waveform, and seems to gain emphasis when certain
effects are applied, such as time stretching. Figure 58 shows an apparently clean re-synthesis of the input
signal. This audio example and the topic of frequency resolution will be explored in the wha-wha filter
implementation section.

Figure 58: IO of flute2.wav (WLen = 4096).


Figure 59: Magnitude [dB] waterfall of flute2.wav.

Figure 60: Diner Input Amplitude Waterfall


Figure 61: Diner Input Magnitude Waterfall [dB]

7. Implementing Audio Effects


For the most part, audio effects were implemented in the frequency domain between the analysis and
resynthesis stages of the kernel algorithm. Certain effects, such as pitch shifting, required additional
considerations during resynthesis (interpolation/resampling in the case of pitch shifting, which was also
used in robotization). In such cases, having all stages of the processing in a single kernel loop was
convenient. An alternative phase vocoder implementation [2] that uses separate analysis/synthesis
functions was very helpful as a learning tool, but less convenient in practice when effects processing
required special considerations during the analysis and synthesis stages.

7.1. Time Stretching
To implement time stretching, a time stretch ratio was defined in the kernel algorithm initialization stage
as follows.
tStretch = Rs/Ra     % time stretch ratio

Ra is the analysis hop size, and Rs is the synthesis hop size. The time stretching function is dependent on different hop sizes for analysis and synthesis. For the basic I/O phase vocoder outlined earlier, tStretch was equal to 1 (no time stretching). By using a different synthesis hop size, we are essentially re-sampling the signal during synthesis at a different sampling rate. This will produce a time stretch, but on its own, it will also produce pitch shifting.
In order to time stretch the signal without applying pitch shifting, we have to do some frequency domain
processing on the phase. Finally, phase unwrapping is put to use. The frequency domain processing
portion of the code for the time stretching implementation of the phase vocoder is shown below. This
code segment fits between the analysis and synthesis portions of the kernel algorithm.
%==========================================================
% Frequency Domain Processing
%==========================================================
%----------- Phase Unwrapping ---------------
phi_d = princarg(phi-phi0-omega);               % deviation phase (3)
% phase increment delta_phi: the phase difference between two adjacent frames
% for each bin, added to nominal phase of the bin
delta_phi = omega + phi_d;
phi0 = phi;                                     % measured phase
%--------- Target Phase Calculation ---------
% implements time stretching by ratio Rs/Ra
phi_t = princarg(phi_t + delta_phi*tStretch);   % target phase

The phase unwrapping section was described earlier. The target phase calculation, however, will make use of the time stretching ratio. By scaling the phase increment with the stretch ratio, we calculate new target phase values that preserve the instantaneous frequency of each bin during resynthesis. This results in the same time stretching provided by the different Ra and Rs hop sizes, but the pitch of the original signal is retained. Window size is important (WLen should be the same size as the FFT), and should ideally be selected based on the signal type (for example, a smaller window size for signals that have a significant transient component will allow rapid spectral changes to be tracked, at the expense of frequency precision: listen to 'EidolDrum-TimeStretch2-L1024-Ra128Rs256.wav'). Also, sufficient overlap must be preserved between window segments. The synthesis hop size in relation to the window size is critical. For the moderate time stretching that was performed, a suitable analysis hop size/window length ratio was found to be:

Ra/WLen = 1/8

If this ratio is too large, for example, and the signal is stretched to several times its length, a situation can arise where there is insufficient overlap to perform overlap adding in the resynthesis stage, resulting in a butchered output signal.
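To see why scaling the phase increment preserves pitch, the sketch below follows a single bin for a steady tone and checks the frequency implied by the target phase at the synthesis hop size. It is a simplified illustration, not the kernel algorithm: the "measured" phase is the ideal phase of the tone rather than an FFT output, princarg is re-implemented as an anonymous function, and f1 = 510 Hz, omega_s and f_est are hypothetical choices made for this example.

```matlab
princarg = @(p) mod(p + pi, 2*pi) - pi;    % wrap phase to [-pi, pi)

fs = 8000;  WLen = 256;                    % values from the cosine tests
Ra = WLen/8;  Rs = 2*Ra;                   % analysis/synthesis hops (stretch by 2)
tStretch = Rs/Ra;
f1 = 510;                                  % test tone, deliberately off bin centre
k  = round(f1*WLen/fs);                    % nearest bin index (0-based)
omega   = 2*pi*Ra*k/WLen;                  % nominal phase advance per analysis hop
omega_s = 2*pi*Rs*k/WLen;                  % nominal phase advance per synthesis hop

numFrames = 20;
phi0 = 0;  phi_t = 0;
f_est = zeros(numFrames,1);
for s = 1:numFrames
    phi = princarg(2*pi*f1*s*Ra/fs);       % ideal phase of the tone at frame s
    phi_d = princarg(phi - phi0 - omega);  % deviation phase
    delta_phi = omega + phi_d;             % unwrapped increment over Ra samples
    phi0 = phi;
    phi_t_prev = phi_t;
    phi_t = princarg(phi_t + delta_phi*tStretch);            % target phase
    % unwrap the target phase step and convert it to a frequency at hop Rs
    step = omega_s + princarg(phi_t - phi_t_prev - omega_s);
    f_est(s) = step/(2*pi*Rs)*fs;
end
fprintf('frequency implied by target phase: %.4f Hz\n', f_est(end));
```

The estimate stays at 510 Hz for every frame, so the tone keeps its frequency even though successive grains are laid down twice as far apart.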


Figure 62: Time stretching on a cosine wave by a factor of 2. The envelope of the reconstructed
signal is not ideal at the ends of the signal, as the original cosine wave was rectangular windowed
before being applied to the phase vocoder. Overlap adding in the middle of the signal, however, is
preserved. With real audio signals, we typically have a smoother transition to and from silence.


Figure 63: Phases near max amplitude bins for time stretching 2 cosine waves, 500Hz (top) and 2000Hz
(bottom).


Figure 64: diner.wav time stretched by a factor of 2.

Figure 65: Closer look at the end of both input and output waveforms for diner.wav time stretched by a
factor of 2.


7.2. Pitch Shifting
To perform pitch shifting without affecting the duration of the signal, a system of integrated resampling was combined with a linear interpolation scheme, as outlined in [1]. The two basic steps are as follows:
1. For each grain, a time stretching is performed with a stretch ratio of Rs/Ra. This changes the pitch and duration.
2. To retain the new pitch, but keep the original signal duration, each FFT is resampled to a length NFFT*Ra/Rs, where NFFT is the FFT size. The integrated resampling is conducted using an interpolation scheme, and the resulting interpolated grains are overlap added in the time domain.

Figure 66: Overview of block-by-block pitch shifting with integrated resampling (an interpolation
scheme) [1].

A linear interpolation scheme was used for integrated resampling, and has also found its way into
alternative implementations of some of the other audio effects that were tested. Since pitch shifting
involved many initializations for the interpolation scheme, and interpolation was performed in the
resynthesis stage of the kernel algorithm, the entire kernel algorithm of the function pv_Pitchshift.m
will be included here (the full phase vocoder function is included in the appendix).
First, the pitch shifting specific initializations will be highlighted:

%--------------------------------------------------
% FX-specific initializations for pitch shifting
%--------------------------------------------------
tStretch = Rs/Ra                                 % time stretch ratio

%-------Linear Interpolation Parameters---------
Lresamp = floor(WLen/tStretch);                  % length of resampled/interpolated grain
nInterpSpace = linspace(0,Lresamp-1,Lresamp)';   % linearly spaced time column vector
nfracInterp = 1 + nInterpSpace*WLen/Lresamp;     % fractional read positions in the grain
nInterp0 = floor(nfracInterp);                   % Lresamp length vector of integer sample
                                                 % values between 1 and WLen
nInterp1 = nInterp0 + 1;                         % Lresamp length vector of integer sample
                                                 % values between 2 and WLen+1
frac0 = nfracInterp - nInterp0;                  % fractional distances of integer samples
                                                 % below interpolation points
frac1 = 1-frac0;                                 % fractional distances of integer samples
                                                 % above interpolation points

Output = zeros(Lresamp+Nx,1);                    % initialize output vector (overlap-added
                                                 % interpolated synthesis grains)

The analysis portion is shared with the original implementation that was outlined. In the frequency
domain, identical processing is performed as with time stretching. However, the resynthesis portion
introduces linear interpolation between successive grains, using the frac0 and frac1 fractional
distances (which compare integer sample points to interpolation points) that are denoted above.
Frequency domain processing and resynthesis portions of the pitch shifting kernel algorithm are
included below:
%==========================================================
% Resynthesis Portion with Linear Interpolation
%==========================================================
ft = r.*exp(j*phi_t);                % FFT with ith grain target phase
rt = abs(ft);                        % output amplitude

ModuliOut(1:win_end, i+1) = rt;      % store output moduli (same as input)
PhasesOut(1:win_end, i+1) = phi_t;   % build matrix of output phases
%-------------- Inverse FFT & Windowing ------------
tIFFT = fftshift(real(ifft(ft)));    % shifted IFFT
grain_t = tIFFT.*w2(1:win_end);      % inverse windowing (tapering)

%----------------- Interpolation --------------------
grain_t2 = [grain_t;0];              % pad w/ single zero to allow interpolation
                                     % between successive grains
% apply linear interpolation (integrated resampling):
grain_t3 = grain_t2(nInterp0).*frac1 + grain_t2(nInterp1).*frac0;

if (numWin_s <= 24)                  % plot this grain
    kernalPlot(grain_t3,WLen,i,numWin_s,grainFIGs);
end
%----------Overlap Adding of Resampled Grains---------
Output(vOut:vOut+Lresamp-1) = Output(vOut:vOut+Lresamp-1) + grain_t3;
vIn = vIn + Ra;
vOut = vOut + Ra;                    % sample index for start of next block
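The interpolation indices are easier to follow on a toy grain. The sketch below reuses the index formulas from the initializations above with deliberately small sizes (WLen = 8, Ra = 2, Rs = 3, chosen only for illustration) and a ramp as the grain, so that each interpolated value should equal its fractional read position.

```matlab
WLen = 8;  Ra = 2;  Rs = 3;                % toy sizes, for illustration only
tStretch = Rs/Ra;
Lresamp = floor(WLen/tStretch);            % 5 output samples
nInterpSpace = linspace(0,Lresamp-1,Lresamp)';
nfracInterp = 1 + nInterpSpace*WLen/Lresamp;   % read positions 1, 2.6, 4.2, 5.8, 7.4
nInterp0 = floor(nfracInterp);             % integer sample below each position
nInterp1 = nInterp0 + 1;                   % integer sample above each position
frac0 = nfracInterp - nInterp0;            % weight of the upper sample
frac1 = 1 - frac0;                         % weight of the lower sample

grain_t  = (1:WLen)';                      % ramp: value equals sample index
grain_t2 = [grain_t; 0];                   % single zero pad, as in the listing
grain_t3 = grain_t2(nInterp0).*frac1 + grain_t2(nInterp1).*frac0;
disp([nfracInterp grain_t3]);              % the two columns should agree
```

Because the ramp is linear, linear interpolation reproduces the read positions exactly, which confirms that frac1 weights the lower sample and frac0 the upper one.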

A rudimentary harmonizer is built into the pv_Pitchshift.m function, which keeps the left channel of the original signal, and combines it with a pitch shifted version of the signal, stored in the right channel. The harmonizer is not key centered; it simply harmonizes based on a specified interval. While this is not very useful in practice, it was interesting to experiment with adding specific harmonies to percussion lines and pitched instruments. Also, I did some experimentation with microtonal pitch shifting and harmonies. This, of course, is highly dependent on the frequency resolution used. The various intervals and input parameters are listed below for interest. Most of the audio examples are minor third and tritone intervals. Of course, few of the intervals are obtained exactly using the rounding system used. One exception is the perfect fifth, a ratio of 1.5.
%-----choose parameters by ~pitch shifting interval------
Ra = WLen/8;                % analysis hop size
%interval = 32805/32768;    % skhisma (results in bad approximation)
%interval = 2048/2025       % diaschisma
%interval = 81/80           % syntonic comma
%interval = 1/4             % 2 octaves below
%interval = 1/2             % octave below
%interval = 5/7             % tritone (septimal/Huygens) below
%interval = 3/4             % perfect fourth below
%interval = 5/6             % minor third below
%interval = 7/6             % augmented second (septimal minor 3rd)
%interval = 6/5             % minor 3rd
interval = 7/5              % tritone (septimal/Huygens)
%interval = 10/7            % diminished 5th
%interval = 3/2             % perfect 5th
%interval = 8/5             % minor 6th
%interval = 13/7            % tridecimal minor third
%interval = 7/4             % harmonic seventh (septimal min/subminor)
%interval = 16/9            % minor 7th
Rs = round(Ra*interval)     % synthesis hop size for ~tStretchRatio
tStretchRatio = Rs/Ra       % time stretching ratio
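The effect of rounding Rs can be quantified with a short sketch. WLen = 2048 is an assumed window length for this example, and the names intervals, achieved and centsErr are illustrative; only a few intervals from the listing above are included.

```matlab
WLen = 2048;                       % assumed window length for this example
Ra = WLen/8;                       % analysis hop size, as in the listing above
intervals = [81/80, 6/5, 7/5, 3/2];
Rs = round(Ra*intervals);          % synthesis hop sizes after rounding
achieved = Rs/Ra;                  % ratios actually realised
centsErr = 1200*log2(achieved./intervals);   % deviation from the requested interval
for m = 1:numel(intervals)
    fprintf('requested %.5f  Rs = %d  achieved %.5f  error %+.2f cents\n', ...
            intervals(m), Rs(m), achieved(m), centsErr(m));
end
```

With these values the perfect fifth is reproduced exactly (Rs = 384), while the other three intervals are slightly flat because Ra*interval is not an integer. A larger Ra makes the rounding error smaller, at the cost of less overlap between grains.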

%==============================================================
% Create Harmony (occurs after kernel loop of pv_Pitchshift.m)
%==============================================================
harmony = zeros(Ny,2);                         % stereo output file
% assign input to left channel and zero pad to output length
harmony(:,1)=[Input; zeros(Ny-Nx,1)]*0.9;
harmony(:,2)=y*0.9;                            % assign shifted signal to right channel

harmTAG = [wavfile(1:length(wavfile)-4),'-Harmony',num2str(tStretch)];
harmTAG = strcat(harmTAG,'-L',num2str(WLen)...
,'-Ra',num2str(Ra),'Rs',num2str(Rs),'.wav');   % for file naming
wavwrite(harmony,fs,harmTAG);

7.3. Stable/Transient Components Separation
The separation of stable and transient components results not so much in a complete separation, but in
two distinctively different sounds. The transient portion can be considered a fractalization of the
sound, and the stable component is an etherization of the sound [1]. If the process is performed
correctly, these two signals should be able to reproduce the original signal if added back together.
The basic operation was achieved as follows:
1. The instantaneous frequency in each bin (the derivative of the phase with respect to time) was
calculated.
2. The instantaneous frequency was checked against a threshold value, which was used to define
whether or not it was in a stable range.
3. In the case of stable component preservation, bins defined as stable were kept and used to
reconstruct the stable part of the signal (the ethereal part). In the case of transient component
preservation, the unstable components of the signal are kept for reconstruction (the fractal
part).
The conditional part of this algorithm for defining stability can be mathematically expressed as [1]:

(DAFX 8.47)

(7)

What is happening here is that we are monitoring the instantaneous frequency with time. In the case of a pure sinusoid, the instantaneous frequency does not change with time (the phase remains 0); this is a stable signal. In the case of a snare drum blast, there is a rapid impulse of broadband energy that decays quickly; this is a transient, or unstable component.
Equation 7 maps to the diagram below, and the phase unwrapping notation that was used earlier. The area enclosed by the angle df*Ra is where we define stability, which is calculated with reference to the expected target phase.

Figure 67: Defining the stable range for transient component separation [1].

The implementation of this algorithm in MATLAB can be used to provide a less conceptual explanation.
Referring to the kernel algorithm block diagram that was developed earlier, two parts of the code are
modified: the FX specific initializations, and the frequency domain processing:
%----------------------------------------------
% FX-specific initializations
%----------------------------------------------
df = weight*2*pi/WLen;   % preset threshold value (corresponds to (8.46), DAFX)
threshAngle = df*Ra;     % angular range about target phase defined as stable
phi1 = zeros(WLen,1);
phi2 = zeros(WLen,1);
grain = zeros(WLen,1);
Output = zeros(Nx,1);    % initialize output vector

%======================================================================
% Frequency Domain Processing (FX-specific for Transient-Stable separation)
%======================================================================
%-----Find Phase increment per sample for all frequency bins---
phi_d = princarg(phi-2*phi1+phi2);        % deviation phase
phi_t = phi;                              % target phase = measured phase
%-----Check if frequency within stable range for each bin------
% -if not stable & mode = 0, set amplitude to 0 in bin (keep stable)
% -if stable & mode = 1, set amplitude to 0 in bin (keep transients)
if mode == 0
    r = r.*(abs(phi_d) < threshAngle);    % remove transients, keep stable
else
    r = r.*(abs(phi_d) >= threshAngle);   % remove stable, keep transients
end
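The stability test can be tried on idealised phase tracks before applying it to FFT data. The sketch below is an assumption-laden illustration: it uses the ideal phases of a steady tone and of a tone whose frequency rises from frame to frame, rather than measured bin phases, and weight = 1, the 510 Hz tone and the 0.15 chirp coefficient are hypothetical values.

```matlab
princarg = @(p) mod(p + pi, 2*pi) - pi;    % wrap phase to [-pi, pi)

WLen = 256;  Ra = WLen/4;  fs = 8000;
weight = 1;                                % hypothetical threshold weight
df = weight*2*pi/WLen;
threshAngle = df*Ra;                       % pi/2 for these values

s = (0:9)';                                % frame indices
phiSteady = princarg(2*pi*510*s*Ra/fs);                 % steady 510 Hz tone
phiChirp  = princarg(2*pi*(510*s*Ra/fs + 0.15*s.^2));   % frequency rises per frame

% wrapped second-order difference phi - 2*phi1 + phi2 over three frames
d2Steady = princarg(phiSteady(3:end) - 2*phiSteady(2:end-1) + phiSteady(1:end-2));
d2Chirp  = princarg(phiChirp(3:end)  - 2*phiChirp(2:end-1)  + phiChirp(1:end-2));

stableSteady = abs(d2Steady) < threshAngle;   % steady tone: classified stable
stableChirp  = abs(d2Chirp)  < threshAngle;   % rising tone: classified transient
fprintf('steady tone stable in %d of %d frames\n', sum(stableSteady), numel(stableSteady));
fprintf('rising tone stable in %d of %d frames\n', sum(stableChirp),  numel(stableChirp));
```

The steady tone gives a second-order difference of zero and passes the test in every frame, while the rising tone gives 0.6*pi, which exceeds the pi/2 threshold. Lowering weight narrows the stable range and moves more bins into the transient output.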

The results of this effect were less inspiring than most of the other effects. It is rather difficult to obtain anything close to what one would expect as a clean separation of components, even for relatively simple signals. Regardless of the quality of the effect, however, this was a very interesting and productive exercise that yielded some unexpected results when exploring stability thresholds.

7.4. Robotization
Robotization is inherently a very simple effect as far as the phase vocoder is concerned. The basic effect has one operation in the frequency domain, and that is to set the target phase in every bin on every FFT to zero. This forces the sound to become periodic; that is, mostly stable. The reason that we get periodicity is that each IFFT (synthesis grain) is essentially a burst of sound. When we overlap add these pulses back into the time domain, the result is a periodic train of pulsed sounds. This will impose a pitch on the output signal, which is defined by the hop size Ra and the sampling rate:
$$f = \frac{f_s}{R_a}$$
The result of robotization is a very inorganic, synthesized version of the input sound, lacking expression from the original performance. Transient based signals such as drums undergo a complete transformation. For the basic implementation, the single line of code placed in the frequency domain processing portion of the kernel algorithm is:
%==========================================================
% Frequency Domain Processing (FX-specific for Roboto)
%==========================================================
%----Put 0 phase values on every FFT for robotization----
phi_t = 0;                  % set target phase as zero
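
As a small numerical check of the two ideas above, the sketch below computes the pitch imposed by a given hop size (assuming the usual relation f0 = fs/Ra) and confirms that a zero-phase grain is a burst with its peak at the first sample, before the fftshift applied in the resynthesis stage. The hop size and the test grain are hypothetical values chosen for illustration.

```matlab
fs = 44100;                 % sampling rate (Hz)
Ra = 441;                   % hypothetical hop size (samples)
f0 = fs/Ra;                 % pitch imposed by robotization (Hz)

% Zero-phase grain: keep the moduli, discard the phase
n = (0:63)';
grain = sin(2*pi*5.3*n/64) + 0.5*cos(2*pi*11.7*n/64);   % arbitrary test grain
r = abs(fft(grain));        % moduli of the analysis frame
grain_t = real(ifft(r));    % equivalent to phi_t = 0 in every bin
[peakVal, peakIdx] = max(abs(grain_t));   % location of the burst
```

With Ra = 441 at fs = 44100 Hz the imposed pitch is 100 Hz. Because every bin has zero phase, all cosine components add in phase at the first sample, which is where the burst peaks.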

However, we can expand on this. The basic Robotization scheme requires that Ra is an integer. For
added flexibility, with the goal of calculating the hop size from a defined frequency value, linear
interpolation was employed in the function pv_RobotizeLinterp.m. In this way, audio was robotized
using fractional hop size values. Referring to the long list of linear interpolation initializations
provided in the pitch shifting code overview, the resynthesis portion becomes:

%==========================================================
% Resynthesis Portion with Linear Interpolation
%==========================================================
ft = r.*exp(j*phi_t);                   % FFT with ith grain target phase
rt = abs(ft);                           % output amplitude
ModuliOut(1:win_end, i+1) = rt;         % store output moduli (same as input)
PhasesOut(1:win_end, i+1) = phi_t;      % build matrix of output phases
%-------------- Inverse FFT & Windowing ------------
tIFFT = fftshift(real(ifft(ft)));       % shifted IFFT
grain_t = tIFFT.*w2(1:win_end);         % inverse windowing (tapering)

%----------------- Interpolation --------------------
grain_t2 = [grain_t;0];                 % pad w/ single zero to allow interpolation
                                        % between successive grains
% apply linear interpolation (integrated resampling):
grain_t3 = grain_t2(nInterp0).*frac1 + grain_t2(nInterp1).*frac0;

if (numWin_s <= 24)                     % plot this grain
    kernalPlot(grain_t3,WLen,i,numWin_s,grainFIGs);
end
%----------Overlap Adding of Resampled Grains---------
Output(vOut:vOut+Lresamp-1) = Output(vOut:vOut+Lresamp-1) + grain_t3;
vIn = vIn + Ra;                         % sample index for start of next block
vOut = vOut + Ra;
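
The interpolation variables used above (nInterp0, nInterp1, frac0, frac1, Lresamp) are initialized elsewhere in the project. The sketch below shows one plausible way such initializations could look; the definitions, the intermediate name xPos, and the grain and resampled lengths are assumptions for illustration, not a copy of the project code. A ramp is used as the grain because linear interpolation of a ramp is exact, which makes the result easy to check.

```matlab
WLen = 8;                               % small grain length for illustration
Lresamp = 6;                            % hypothetical resampled grain length
grain_t = (1:WLen)';                    % ramp test grain

% Assumed initializations (would be computed once, before the kernel loop)
xPos = 1 + (0:Lresamp-1)'*WLen/Lresamp; % fractional read positions
nInterp0 = floor(xPos);                 % index of lower neighbour
nInterp1 = nInterp0 + 1;                % index of upper neighbour
frac0 = xPos - nInterp0;                % weight of upper neighbour
frac1 = 1 - frac0;                      % weight of lower neighbour

% Same interpolation expression as in the resynthesis portion
grain_t2 = [grain_t; 0];                % pad so nInterp1 never exceeds the length
grain_t3 = grain_t2(nInterp0).*frac1 + grain_t2(nInterp1).*frac0;
```

For the ramp, grain_t3 reproduces the fractional read positions themselves, which confirms that frac1 weights the lower neighbour and frac0 the upper one.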


Figure 68: Robotization applied to the input signal TranSiberianDrum.wav.

7.5. Whisperization
Whisperization is another effect that is simple to implement within the existing phase vocoder
structure. The effect contrasts with Robotization, in that a random phase is imposed on each FFT
frame before resynthesis. The effect removes periodicity (pitched components) and harmonic
relationships from a signal. The degree to which a sound is Whisperized is controlled by the
window length and hop size. For sufficient removal of nearly all pitched components, a window size
of 512 or less and a hop size of WLen/8 was effective.
The single FX processing operation required between the analysis and resynthesis stages of the
outlined phase vocoder model is shown below.
%==========================================================
% Frequency Domain Processing (FX-specific for Whisperization)
%==========================================================
%----Randomize phase values on every FFT for whisperization----
phi_t = 2*pi*rand(WLen,1);      % set target phase as pseudorandom
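
A minimal sketch of what this operation does to a single frame is given below. The test grain and window length are arbitrary illustration values; the point is only that the moduli are untouched while the phase is replaced.

```matlab
WLen = 16;                              % small frame for illustration
grain = cos(2*pi*3*(0:WLen-1)'/WLen);   % pitched test grain (3 cycles per frame)
f = fft(grain);
r = abs(f);                             % moduli are kept
phi_t = 2*pi*rand(WLen,1);              % pseudorandom target phase
ft = r.*exp(1i*phi_t);                  % frame with randomized phase
grain_t = real(ifft(ft));               % synthesis grain (real part only)
```

The moduli of ft equal those of f, so only the phase relationships are destroyed. The randomized spectrum is in general no longer conjugate symmetric, so taking the real part of the IFFT discards an imaginary component.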

An interesting effect was achieved by using whisperization on the audio file TranSiberianDrum.wav,
a percussive audio sample that was already somewhat lacking in pitched components. The
whisperization function is called pv_Demonize.m, as the whispering that is produced on vocals
typically sounds sinister.

7.6. Denoising
Denoising was originally attempted by applying a time domain compressor concept, but along the
frequency bins of each FFT. The idea was that attack and release time could conceptually be replaced
by gain ramps (in essence, resulting in a filter that is dynamically applied to scale the FFTs). The
purpose would be to shape the spectrum when a defined threshold is reached in a particular range of
bins, removing noise from the overall signal. This idea was simpler in concept than in
application, and a working model was not successfully implemented.
Instead of a brute force (and highly computationally complex) denoising algorithm, a more elegant
approach was taken, as suggested in [1]. A noise threshold was roughly tuned with a scalar coefficient
(taken into the phase vocoder as an input parameter), and a nonlinear function of the FFT
magnitudes and this coefficient was used as a simple noise gate. The basic frequency domain portion
of the code is below, where:

• the FFT of the current frame is defined by f
• the FFT amplitude is defined by r = abs(f)
• the output FFT with noise gating applied is ft, which gets its IFFT taken in the resynthesis
  stage in preparation for OLA
• the nonlinear filter coefficient is coef [a typical value was found to be coef = 0.0002]

%==========================================================
% Frequency Domain Processing (FX-specific for denoising)
%==========================================================
%--------------Apply denoising-------------
r_mag = 2*r/WLen;                   % magnitude of f in quantity peak
ft = f.*r_mag./(r_mag + coef);      % nonlinear function for noise gate
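
To see how this nonlinear function behaves as a gate, the per-bin gain r_mag/(r_mag + coef) can be evaluated for a few magnitudes. The magnitudes below are hypothetical; coef is the typical value quoted above.

```matlab
coef = 0.0002;                      % typical gate coefficient from the text
r_mag = [1e-6; 2e-4; 1e-2; 1];      % hypothetical normalized bin magnitudes
gain = r_mag./(r_mag + coef);       % gain applied to each bin of f
```

Bins far below coef are strongly attenuated, a bin with magnitude equal to coef is halved, and bins well above coef pass almost unchanged. This is why the quiet high-frequency content is the first to be removed.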

Denoising modifies the timbre of a sound, typically by smoothing out the spectrum and removing a
significant portion of the high frequency energy. This very simple algorithm is quite effective at
removing noise; as a consequence, instruments with rich timbres undergo significant colouration.
The audio examples provided are based more on the sound colouration attribute of the denoising than
the act of removing unwanted noise. In one example, the audio file Tristram2.wav receives denoising in
an attempt to remove some guitar fret noise, which was masking the vibrato that was being applied on
a double stop. Of course, the resulting audio has undergone a drastic change in spectral content, so
this is not a truly practical application for the basic denoiser.

7.7. Wha-Wha Filter in Freq Domain
Preliminary BPF Design
To achieve a frequency domain implementation of a wha-wha filter using a phase vocoder, the
bandpass filter from Assignment 2 was first redesigned to new, realistic specifications (based on
Assignment 3). The specifications for the bandpass filter to be used to create a wha-wha effect were
chosen to be:

• sampling rate fs = 44,100 Hz
• center frequency f1 = 44,100/64 = 689 Hz
• 3dB bandwidth B = 100 Hz
• implementation uses 2 poles and 2 zeros

The structure for the preliminary BPF design is based on the allpass filter (page 41 of DAFX [1]).
The center frequency is altered often enough to make a smooth sound without noticeable
transients.
Figure 69 is the block diagram for a tuneable second order allpass filter, denoted A(z). From this
structure, the tuning parameters c and d are used to implement a second order BPF.

Figure 69: Second-order allpass filter block diagram [1]

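For the allpass filter of Figure 69:

$$A(z) = \frac{-c + d(1-c)z^{-1} + z^{-2}}{1 + d(1-c)z^{-1} - cz^{-2}} \qquad (8)$$

For a second-order bandpass/bandreject filter (minus sign for bandpass, plus sign for bandreject):

$$H(z) = \frac{1}{2}\left[1 \mp A(z)\right] \qquad (9)$$

$$c = \frac{\tan(\pi B/f_s) - 1}{\tan(2\pi B/f_s) + 1} \qquad (10)$$

$$d = -\cos(2\pi f_c/f_s) \qquad (11)$$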

Figure 70: Second-order bandpass and bandreject implemented with allpass filter [1]

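Therefore, the second order bandpass transfer function can be expressed as:

$$H(z) = \frac{1}{2}\left[1 - \frac{-c + d(1-c)z^{-1} + z^{-2}}{1 + d(1-c)z^{-1} - cz^{-2}}\right] \qquad (12)$$

• the cut-off frequency (phase = -180 degrees, group delay is maximum) is controlled by the
  coefficient d
• the bandwidth is controlled by the coefficient c

Expanding this, we can find the a and b parametric filter coefficients. The $z^{-1}$ terms in the
numerator cancel out, and the resulting form is analogous to the BPF seen in Assignment 2:

$$H(z) = \frac{(1+c) - (1+c)z^{-2}}{2\left[1 + d(1-c)z^{-1} - cz^{-2}\right]}$$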
For constructing the filter in Matlab using the filter function, the coefficient arrays would be
implemented as:
b = [(c+1) 0 -(c+1)];
a = 2*[1 d*(1-c) -c];
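
As a quick sanity check on these coefficient arrays, the transfer function can be evaluated directly on the unit circle. The sketch below assumes the numerator (c+1)(1 - z^-2) obtained from the expansion and uses the filter specifications listed earlier; the helper Hz is a hypothetical name introduced here, and polyval is used only to avoid relying on any toolbox function.

```matlab
fs = 44100;  f1 = fs/64;  B = 100;          % specifications from the text
c = (tan(pi*B/fs) - 1)/(tan(2*pi*B/fs) + 1);% bandwidth parameter, as written in this report
d = -cos(2*pi*f1/fs);                       % center frequency parameter
b = [(c+1) 0 -(c+1)];                       % numerator coefficients
a = 2*[1 d*(1-c) -c];                       % denominator coefficients

% H(z) = (b1*z^2 + b2*z + b3)/(a1*z^2 + a2*z + a3), evaluated at z = exp(j*2*pi*f/fs)
Hz = @(f) polyval(b, exp(1i*2*pi*f/fs)) ./ polyval(a, exp(1i*2*pi*f/fs));

gainCenter = abs(Hz(f1));                   % gain at the center frequency
gainDC = abs(Hz(0));                        % gain at DC
gainNyq = abs(Hz(fs/2));                    % gain at Nyquist
```

The gain is unity at f1 and zero at DC and at Nyquist, which is the expected bandpass behaviour: the allpass section contributes a phase of -180 degrees at the center frequency, so 1 - A(z) equals 2 there.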

However, since the BPF will be implemented as a wha filter in the phase vocoder, it will be
represented manually for convenient manipulation in the frequency domain:
kBins = linspace(0,fs/2,WLen/2 + 1);    % lin spaced freq bins up to Nyquist
z = exp(2*pi*kBins/fs*j);               % for manually expressing filter transfer function

% ... c and d defined in (10) and (11)

Num = (((c+1).*z.*z)-(c+1));
Den = 2*(z.*z + d*(1-c).*z - c);
H = Num./Den;

Figure 71: Preliminary BPF design. The frequency domain implementation will apply the appropriate scaling
to the amplitudes and phases (or optionally, only the amplitudes) of each FFT frame based on this filter
shape, and move the center frequency with each successive window.

Frequency Domain Wha-Wha Implementation

%----------------------------------------------
% FX-specific initializations (wha filter)
%----------------------------------------------
whaSpeed = WLen/whaRate;            % Multiplier for filter sweep frequency
kBins = linspace(0,fs/2,WLen);      % linear spaced freq bins up to Nyquist
z = exp(2*pi*kBins/fs*j);           % for manually expressing filter transfer function
Hframes = zeros(WLen,numWin_a+1);   % Hframes(i*Ra,k) to store each block's filter H(k)
Output = zeros(Nx,1);               % initialize output vector

% initialize output vector

%==========================================================
% Frequency Domain Processing (FX-specific for Wha BPF)
%==========================================================
fc = f1*(1 + sweepRange*cos(2*pi*whaSpeed*i/fs));   % change center freq

%-----------------------------------------------
% Build Transfer Function w/ Filter Coefficients
% allpass filter form (Pg 41-43, DAFX)
d = -cos(2*pi*fc/fs);                       % apply new f_c to d param
c = (tan(pi*B/fs)-1)/(tan(2*pi*B/fs)+1);    % controls the bandwidth
Num = (((c+1).*z.*z)-(c+1));                % b = [(c+1) 0 -(c+1)]
Den = 2*(z.*z + d*(1-c).*z - c);            % a = 2*[1 d*(1-c) -c]
H = Num./Den;                               % BPF transfer function H(k)
Hframes(1:win_end, i+1) = H;                % store filter H(k) in Hframes(k,i*Ra)
%------------------------------------------------------
% Filter this block by multiplication with its FFT
% choose to keep current phase, or let filter change phase response
keepPhase = 0;
if keepPhase == 1
    rt = r.*abs(H.');               % scale amplitudes with filter
    phi_t = phi;                    % keep original phase
    ft = rt.*exp(j*phi_t);          % FFT filtered with ith grain BPF
else
    ft = f.*H.';                    % filter the current block (.' transposes without conjugating)
    phi_t = angle(ft);              % output phase
    rt = abs(ft);                   % output amplitude
end
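
The range covered by the sweeping center frequency follows directly from the fc expression. The sketch below evaluates it over a run of frames, using parameter values that match the flute example file names (rate 8, range 0.8, window 1024); the number of frames is an arbitrary choice for illustration.

```matlab
fs = 44100;  f1 = fs/64;            % sampling rate and default center frequency
WLen = 1024; whaRate = 8;           % window size and wha rate (from the flute example)
sweepRange = 0.8;                   % sweep range multiplier (from the flute example)
whaSpeed = WLen/whaRate;            % multiplier for filter sweep frequency

i = (0:2000)';                      % frame indices (arbitrary number of frames)
fc = f1*(1 + sweepRange*cos(2*pi*whaSpeed*i/fs));   % center frequency per frame
fcMax = max(fc);                    % upper end of the sweep
fcMin = min(fc);                    % lower end of the sweep
```

The sweep runs between f1*(1 - sweepRange) and f1*(1 + sweepRange), roughly 138 Hz to 1240 Hz for these values. Note that a sweepRange greater than 1 makes this expression negative for part of each cycle.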


Figure 72: Amplitude waterfall of Wha filter applied to white noise.

Figure 73: Magnitude waterfall of Wha filter applied to white noise; log frequency scale (top), linear
freq scale (bottom). Axis with label not visible is freq.
Output file: whitenoise-Wha-fc689B100rate8rng1-L1024-Ra128Rs128.wav


Figure 74: Input (top) and output (bottom) magnitude waterfall of flute.wav. Output file is flute2-Whafc689B100rate8rng0.8-L1024-Ra128Rs128.wav. Window size is 1024 samples.

Comparing Figure 74 with Figure 75 shows the effect of window size on frequency resolution. The
harmonics in the flute spectrum are better represented with the larger FFT size of 4096. With a smaller
window size, we get smoother tracking for the wha sweeps as there are smaller increments in filter center
frequency due to more frequent windowing and more FFTs. However, with a larger window size, the
bandpass filter specs are better approximated due to the higher frequency resolution (more frequency bins
in each FFT, with closer spacing between bins). Without applying an interpolation scheme, however, a
wide sweeping wha filter with a 4096 window size will not sound as smooth to the listener's ear. This is
due to less frequent (and as a result, larger) increments of the BPF center frequency.
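
The trade-off described here can be put into numbers. The sketch below computes the bin spacing and the time between filter updates for the two window sizes, assuming the hop size is WLen/8 as in the appendix code (an assumption; the hop size can be set independently of the window size).

```matlab
fs = 44100;                         % sampling rate (Hz)
WLen = [1024 4096];                 % the two window sizes compared
binSpacing = fs./WLen;              % frequency resolution per bin (Hz)
Ra = WLen/8;                        % assumed hop size (samples)
updateInterval = Ra/fs;             % time between center-frequency updates (s)
```

The bin spacing drops from about 43.1 Hz to about 10.8 Hz with the larger window, while the filter is updated every 11.6 ms instead of every 2.9 ms.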

Figure 75: Input (top) and output (bottom) magnitude waterfall of flute.wav. Output file is flute2-Whafc689B100rate8rng0.8-L4096-Ra128Rs128.wav. Window size is 4096 samples.


Figure 76: flute2.wav I/O of pv_whaBPF for window size 4096 (left) and 1024 (right)

Figure 77: Wha filtering on TyrSmidir1.wav. Output audio file is TyrSmidur1-Whaf1689B689spd130rng0.8L1024-Ra128Rs128.wav.


Figure 78: Magnitude waterfall plots on log freq scale (input top, output bottom) of Wha filtering on
TyrSmidir1.wav. Output audio file is TyrSmidur1-Whaf1689B689spd130rng0.8-L1024-Ra128Rs128.wav.

Applying the wha filter implementation to TyrSmidir1.wav produced interesting results, as there is a
timbre change when the distorted guitars come in. Since there is more high frequency noise after this
mark, we can hear the wha filter bringing out distortion harmonics in the upper end of the wha sweep
range. This is particularly noticeable with a large window size, as the frequency resolution is higher
and therefore the filter pass band is narrower. The effect of bringing out harmonics is made more
dramatic with a wider sweep that reaches into the higher frequency part of the spectrum (reaching over
8 kHz, for example) or a higher default center frequency.


Figure 79: Wha filtering on TyrSmidir1.wav. Output audio file is TyrSmidur1-Wha-fc689B100spd520rng5L1024-Ra128Rs128.wav.


8. Audio Compression Using Phase Vocoder


Two different algorithms for data compression were used in the frequency domain. Aside from one
initialization, as documented below, the entire code fits between the analysis and synthesis portions
of the phase vocoder kernel algorithm that has been outlined earlier. The two methods of compression
are discussed on the following pages.

Figure 80: Frequency domain processing portion for data compression. Occurs between the Analysis
and Synthesis stages of the phase vocoder. Mode = 0 corresponds to compression based on a
Threshold; Mode = 1 corresponds to compression based on a specified number of bins to keep.

8.1. Data Compression using a Threshold Amplitude for Eliminating Bins
Data compression on audio was first implemented by setting all frequency components (corresponding to
the bins of the FFT) below a threshold amplitude to 0. Higher threshold values correspond to a higher
level of compression. A threshold of 1 corresponds to the maximum value of that particular windowed FFT.
With a threshold of 1, every component of the signal represented is one of the maximum amplitude values
in its FFT. With a threshold of 0, the output is identical to the input.
This technique required a hop size of WLen/8 to produce less than atrocious results. With medium levels
of compression using this method (a threshold value of about 0.5), the reconstructed signal has
significant noise components. With very high levels of compression, however (thresholds closer to 1),
the output sound consisted primarily of the fundamental, harmonics, and other key frequencies (such as
those in the formants for vocals). This method produces unpredictable results, as each successive frame
will often be represented by a different number of bins. Over the duration of several frames, it is
possible to jump from a signal represented by many bins to a single sinusoid.
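
The thresholding described above can be sketched in a few lines. This is an illustrative reading of the method rather than the code shown in Figure 80: the helper name compressThresh and the moduli values are hypothetical.

```matlab
% Keep only bins whose modulus is at least thresh times the frame maximum
compressThresh = @(r, thresh) r.*(r >= thresh*max(r));

r = [0.1; 4; 0.5; 2; 0.05; 2; 0.5; 4];  % hypothetical moduli of one FFT frame
r_half = compressThresh(r, 0.5);        % medium compression
r_none = compressThresh(r, 0);          % threshold 0: nothing removed
r_max  = compressThresh(r, 1);          % threshold 1: only maximum-valued bins survive
numKept = [sum(r_none > 0), sum(r_half > 0), sum(r_max > 0)];
```

For this frame the number of surviving bins is 8, 4 and 2 for thresholds of 0, 0.5 and 1. Because the count depends on the shape of each frame's spectrum, it changes from frame to frame, which is the source of the unpredictable results noted above.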

Figure 81: compression using a threshold of 0.5 times the maximum FFT amplitude of each frame on
Diner.wav. Output audio is diner-DataCompThresh0.2-L4096-Ra512Rs512. The large difference in signal
envelope scaling is due primarily to normalization of the output signal to the peak level of a noise artefact in
the waveform. FFT size is 4096.


Figure 82: Amplitude waterfall representations (input top, output bottom) of compression using a threshold
of 0.5 times the maximum FFT amplitude of each frame. Output audio is Tristram2-DataCompThresh0.5L4096-Ra512Rs512.wav. FFT size is 4096.


Figure 83: Magnitude waterfall representations (input top, output bottom) of compression using a threshold
of 0.5 times the maximum FFT amplitude of each frame. Output audio is Tristram2-DataCompThresh0.5L4096-Ra512Rs512.wav. FFT size is 4096.


8.2. Data Compression by Keeping N Strongest Frequency Components
By keeping only a specified number of bins, Nkeep, which have the highest amplitude values in each
FFT, a more consistent and slightly better form of compression was obtained. Nkeep was chosen based
on a scalar value that we can call keepRatio, which is multiplied by the window size. Alternatively, the
number of bins to keep can be manually entered, but this is less consistent if window sizes are being
changed.
The value Nkeep = floor(keepRatio*WLen) has a range of acceptable values that compress the audio without
creating noticeable artefacts. The value depends primarily on the timbre of the audio. For example, a
single sinusoid will sound fine if it is represented by a single frequency bin (in this case, keepRatio =
1/WLen). For most audio, the minimum value of keepRatio for a reasonable quality compressed reproduction
was found to be 0.7. If it is set to be less than 0.7, audible artefacts become apparent. True perceptual
encoders can achieve much higher compression ratios than this by exploiting the masking effects of nearby
frequency bins and applying the most compression at frequencies where we have the least sensitive hearing.
Additionally, perceptual encoders exploit pre and post masking to hide quantization noise around
transients. However, the primitive model used here did achieve success with keepRatio values as low as
0.7, which I did not expect.
Most of the frequency bins that were scrapped were in the high end of the spectrum, at the edge or outside
of our hearing range. As soon as frequency bins that are well perceived by human hearing are cut out, in
this case, artefacts are noticeable. One reason that these artefacts sound unpleasant is that the subtle
richness of the timbre is being removed, but some of the high frequency harmonics are kept, sticking
out in the spectrum. An alternative (but still simple) scheme is to scale the weaker bins to lesser
values, and allocate fewer bits to them. A potentially better scheme is to simply allocate fewer bits to
these values by increasing the discrete step between quantization levels, rather than truncating the
dynamic range. For example, in the extreme case, a frequency bin that is rather unimportant for human
perception could have 2 values, on (1) and off (0).
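
Selecting the Nkeep strongest bins can be sketched with a sort. As with the threshold example, this is an illustration of the idea rather than the project code; the moduli values and the keepRatio of 0.5 are hypothetical, and the names mask, order and rSorted are introduced here.

```matlab
r = [0.1; 4; 0.5; 2; 0.05; 3; 0.7; 1];  % hypothetical moduli of one FFT frame
WLen = length(r);
keepRatio = 0.5;                        % hypothetical ratio of bins to keep
Nkeep = floor(keepRatio*WLen);          % number of bins to keep

[rSorted, order] = sort(r, 'descend');  % strongest bins first
mask = zeros(WLen,1);
mask(order(1:Nkeep)) = 1;               % mark the Nkeep strongest bins
r_c = r.*mask;                          % all other bins set to zero
```

Unlike the threshold method, the number of surviving bins is the same in every frame, which is consistent with the more predictable results reported for this approach.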

Figure 84: Data compression on Tristram2.wav, keeping 0.7*WLen frequency bins (Len = 1024)


Figure 85: Input (top) and Output (bottom) amplitude waterfalls for data compression on
Tristram2.wav, keeping 0.7*WLen frequency bins (Len = 1024). The amplitude spectrum appears
unchanged (even with somewhat higher levels of compression), as there is less energy in the
frequencies where compression occurred (high frequencies). Output file is Tristram2DataCompN716.8-L1024-Ra128Rs128.wav


Figure 86: Input (top) and Output (bottom) Magnitude waterfalls for data compression on
Tristram2.wav, keeping 0.2*WLen frequency bins (Len = 1024). The upper end of the magnitude
spectrum shows that many bins have been set to zero.


Figure 87: Input (top) and Output (bottom) amplitude waterfalls for data compression on diner.wav, using
floor(0.1*WLen) frequency bins (Len = 1024).


9. Conclusions
A robust phase vocoder model, outlined in the block diagram of Figure 1, was implemented based on the
block-by-block FFT approach in [1]. The phase vocoder evolved over time; the last
two implementations (wha filter and data compression) represent the most up to date version of the phase
vocoder framework. These phase vocoders call the function waterfall_Plot.m, according to their input
parameters, to produce time-frequency representations of the input and output audio or vectors. This
functionality can easily be extended to the other phase vocoders, which were implemented with an older
version of the waterfall plotting function. The pitch shifting implementation, on the other hand, is the
most full featured of the basic effects, incorporating linear interpolation and a rudimentary harmonizer.
This project has helped develop my practical and theoretical experience with digital audio effects, and
DSP in general. The time-frequency representation functions that were developed for this phase vocoder
have proven to be useful tools outside of the project for analysis of audio signals, and signals in general.
Had additional time been available, many of the effects that were implemented could have been
expanded. With the fundamental framework in place, creative effects that are based on
frequency-domain processing can be designed and easily integrated into the phase vocoder.

REFERENCES
[1] Udo Zölzer, DAFX. John Wiley & Sons, 2002.

[2] A. Götzen, N. Bernardini, D. Arfib, Traditional (?) Implementations of a Phase-Vocoder: The
    Tricks of the Trade, COST G-6 Conference Proceedings (DAFX-00), 2000.


APPENDIX A List of MATLAB files

LIST OF MATLAB FILES

M-FILE                    DESCRIPTION
PhaseVocode_wav.m         Tests implementations on audio
PhaseVocode_vec.m         Tests implementations on vectors
pVocoder_FFT.m            Original I/O operation
pv_Timestretch.m          Time Stretching
pv_Pitchshift.m           Pitch Shifting/Basic Harmonizer
pv_Transient.m            Stable/Transient component separation
pv_Robotize.m             Basic robotization
pv_RobotizeLinterp.m      Robotization with linear interpolation
pv_Demonize.m             Whisperization
pv_Denoise.m              Denoising
pv_WhaBPF.m               Wha wha filter (freq domain implementation)
pv_DataComp.m             Audio data compression experimentation

kernalPlot.m              Plots analysis grains
waterfall_Plot.m          Plots time-frequency representations
waterfall_surf.m          Plots time-frequency representations (old version)
design_whaBPF.m           Bandpass filter plots for pv_WhaBPF
princarg.m                Computes principal argument for phase unwrapping


APPENDIX B Two Example Implementations


1. pv_WhaBPF.m
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% pv_WhaBPF                                         Author: Tim Perry
% Elec484: DAFX                                             V00213455
% Final Project, Phase 1                                   2009-07-18
%
% FFT/IFFT implementation of the Phase Vocoder (Block-by-Block Approach)
% to be used for implementing a wha-wha bandpass filter in the freq domain.
%   -based on concept of the direct FFT/IFFT method used in Zolzer's DAFX
%
% [y,ModuliIn,PhasesIn,ModuliOut,PhasesOut,waterFIG_i,waterFIG_o]...
%     =PV_WHABPF(x,WLen,f1,B,whaRate,sweepRange,TAG,waterplot,plotCODE)
%
%   x          = input vector or .wav file
%   WLen       = analysis & synthesis window size
%   f1         = Default center freq (sweep fc w/ respect to f1) (ex.689Hz)
%   B          - 3dB bandwidth (recommend 100 Hz)
%   whaRate    - Controls wha-wha rate independent of WLen
%   sweepRange - Multiplier for range of filter sweep
%   TAG        - 'String-For-Naming-Plots'
%   waterplot  = [0 0 0] to plot no time-freq representations
%                [1 0 0] to plot input waterfalls on lin freq scale
%                [0 1 1] to plot output waterfalls on log scale
%                [1 1 1] to plot I/O waterfalls on log scale, etc.
%   plotCODE   - [1 1 1 1] plots Amp, Phase, Mag, Phase @ maxAmp Bins
%              - [0 0 0 0] plots nothing
%
%   y = output vector
%   ModuliIn,ModuliOut: input & output moduli (amplitude) matrices
%   PhasesIn,PhasesOut: input & output phase matrices
%   waterFIG_i = Waterfall fig handles for input.
%                [waterAmpFIG waterPhaseFIG waterMagFIG pMaxBinFIG]
%   waterFIG_o = Waterfall fig handles but for output.
%
% FILTER SPECS:
%   sampling rate fs = 44,100 Hz
%   center frequency example:  f_1 = 44,100/64 = 689 Hz.
%   3dB bandwidth example:     B = 100 Hz
%   implementation uses 2 poles and 2 zeros
%
% REFERENCES:
% [1] Udo Zölzer, DAFX. John Wiley & Sons, 2002.
% [2] A. Götzen, N. Bernardini, D. Arfib, 2000. (see PDF documentation)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [y,ModuliIn,PhasesIn,ModuliOut,PhasesOut,waterFIG_i,waterFIG_o]...
    = pv_WhaBPF(x,WLen,f1,B,whaRate,sweepRange,TAG,waterplot,plotCODE)

% x = 'white_noise.wav';        %for testing
% WLen = 1024;
% TAG = 'WhaTest';
% waterplot = 0;

Ra = WLen/8;                    % analysis hop size
Rs = Ra;                        % synthesis hop size

%==============================================================
% Input processing & zero padding
%==============================================================
figNum = get(0,'CurrentFigure');
%------------------------check input type--------------------
[xtype xinfo] = wavfinfo(x);            % input data info
disp(xinfo)
if length(xtype) == 0
    iswav = 0;                          % vector input
    fs = 8000;
    grainFIGa = figure('Name','Analysis Grains (kernalPlot)', ...
        'Position',[100,100,1000,800]);
    clear figure(figNum);
    figNum = figNum + 1;
    grainFIGs = figure('Name','Synthesis Grains (kernalPlot)', ...
        'Position',[100,100,1000,800]);
    clear figure(figNum);
    figNum = figNum + 1;
else
    iswav = 1;                          % wav file input
    wavfile = x;
    TAG=strcat(wavfile(1:length(wavfile)-4),'-',TAG);
    [x fs nbits] = wavread(wavfile);    % read input audio
end

%------use left channel for mono treatment of stereo input-----
x = x(:,1);                             % left channel (col vector)
N_orig = length(x);                     % original length of input vector

if size(x,2)>1,                         % ensure column vectors for matrix operations
    x=x';
end

%----------------zero-pad Input----------------------
% WLen zeros before x,
% WLen-(N_orig-n.*Ra) zeros after x, where n = floor(X./Y)
Input = [zeros(WLen, 1); x; zeros(WLen-mod(N_orig,Ra),1)];
Input = Input/max(abs(x));              % normalize Input
Nx = length(Input);                     % zero-padded input length

%for file naming
fileTAG = strcat(TAG,'-L',num2str(WLen)...
    ,'-Ra',num2str(Ra),'Rs',num2str(Rs),'.wav');
%for plot naming
plotTAG = strcat(TAG,': WLen=',num2str(WLen)...
    ,' Ra=',num2str(Ra),' Rs=',num2str(Rs));

%HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
%==============================================================
% Create framing window (modified hanning window for OLA [2])
% w = [0, k_1, k_2,..., k_n-1 = k1]
%==============================================================

w1 = 0.5*(1-cos(2*pi*(0:WLen-1)'/(WLen)));  % analysis window
w2 = w1;                                    % synthesis window
%winview = wvtool(w1,w2);                   % plot windows
numWin_a = ceil((Nx-WLen+Ra)/Ra);           % # of analysis windows
numWin_s = ceil((Nx-WLen+Rs)/Rs);           % # of synthesis windows

%==============================================================
% Initializations for Kernel Algorithm
%==============================================================
%----------------------------------------------
% FX-specific initializations (wha filter)
%----------------------------------------------
% f1 = fs/64;           % Default center frequency (sweep fc with respect to f1)
% B = 100;              % 3dB bandwidth is 100 Hz
% whaSpeed = 50;        % Multiplier for filter sweep frequency
% sweepRange = 3;       % Multiplier for range of filter sweep
whaSpeed = WLen/whaRate;            % Multiplier for filter sweep frequency
kBins = linspace(0,fs/2,WLen);      % lin spaced freq bins up to Nyquist
z=exp(2*pi*kBins/fs*j);             % for manually expressing filter transfer function
Hframes = zeros(WLen, numWin_a+1);  % Hframes(i*Ra,k) to store each block's filter H(k)
Output = zeros(Nx,1);               % initialize output vector

%----------------------------------------------
% General PV initializations
%----------------------------------------------
%omega = 2*pi*Ra*[0:WLen-1]'/WLen;  % nominal phase increment for Ra
%phi0 = zeros(WLen,1);              % previous measured phase
phi_t = zeros(WLen,1);              % target phase
ModuliIn = zeros(WLen, numWin_a+1);
PhasesIn = ModuliIn;
ModuliOut = zeros(WLen, numWin_s+1);
PhasesOut = ModuliOut;
nPlotIndices_a = zeros(numWin_a+1,1);   % to store sample indexes for plots
nPlotIndices_s = zeros(numWin_s+1,1);

tic
%HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
%=======================================================================
%------------------------Kernel Algorithm-----------------------
% -performs FFTs, IFFTs, and overlap-add of successive grains
% -implements analysis and resynthesis frame-by-frame
% -stores phase and moduli for input and output signals
%=======================================================================
vIn = 1;                    % analysis sample index @ frame start
vOut = 1;                   % synthesis sample index @ frame start

for i = 0:numWin_a - 2      % processing on ith frame (i = 0:floor((Nx - WLen)/Ra))

%==========================================================

    % Analysis Portion
    %==========================================================
    if ((vIn + WLen - 1) >= Nx)
        frame_end = Nx - vIn;           % end index of ith frame (if last)
    else
        frame_end = WLen - 1;           % end index offset of ith frame
    end
    win_end = frame_end + 1;            % end index offset of window
    %--------Window input, forming kernel--------
    grain = Input(vIn : vIn + frame_end).*w1(1:win_end);
    %grain = [zeros(1.5*WLen, 1); grain; zeros(1.5*WLen, 1)]; % zero pad analysis grain for greater frequency precision

    if (numWin_a <= 24)                 % plot this grain
        kernalPlot(grain,WLen,i,numWin_a,grainFIGa);
    end
    %--------FFT on circular shifted grain --------
    f = fft(fftshift(grain));           % FFT of ith grain
    r = abs(f);                         % amplitude
    phi = angle(f);                     % phase
    %----------Store analysis results-----------
    nPlotIndices_a(i+1) = vIn + frame_end;  % store sample index for FFT frame plot
    ModuliIn(1:win_end, i+1) = r;           % store analysis results
    PhasesIn(1:win_end, i+1) = phi;

    %==========================================================
    % Frequency Domain Processing (FX-specific for Wha BPF)
    %==========================================================
    fc = f1*(1 + sweepRange*cos(2*pi*whaSpeed*i/fs));   % change center freq

    %-----------------------------------------------
    % Build Transfer Function w/ Filter Coefficients
    % allpass filter form (Pg 41-43, DAFX)
    d = -cos(2*pi*fc/fs);                       % apply new f_c to d param
    c = (tan(pi*B/fs)-1)/(tan(2*pi*B/fs)+1);    % controls the bandwidth
    Num = (((c+1).*z.*z)-(c+1));                % b = [(c+1) 0 -(c+1)]
    Den = 2*(z.*z + d*(1-c).*z - c);            % a = 2*[1 d*(1-c) -c]
    H = Num./Den;                               % BPF transfer function H(k)
    % Hframes(1:win_end, i+1) = H;              % store filter H(k) in Hframes(k,i*Ra)
    %------------------------------------------------------
    % Filter this block by multiplication with its FFT
    % choose to keep current phase, or let filter change phase response
    keepPhase = 0;
    if keepPhase == 1
        rt = r.*abs(H.');               % scale amplitudes with filter
        phi_t = phi;                    % keep original phase
        ft = rt.*exp(j*phi_t);          % FFT filtered with ith grain BPF
    else
        ft = f.*H.';                    % filter the current block (.' transposes without conjugating)
        phi_t = angle(ft);              % output phase
        rt = abs(ft);                   % output amplitude
    end


    %==========================================================
    % Resynthesis Portion
    %==========================================================
    ModuliOut(1:win_end, i+1) = rt;         % store output moduli
    PhasesOut(1:win_end, i+1) = phi_t;      % build matrix of output phases

    %----------- Inverse FFT & Windowing ---------
    tIFFT = fftshift(real(ifft(ft)));       % shifted IFFT
    grain_t = tIFFT.*w2(1:win_end);         % inverse windowing (tapering)

    %------------ Overlap Adding ---------------
    Output(vOut:vOut+frame_end) = Output(vOut:vOut+frame_end) + grain_t;

    vIn = vIn + Ra;                         % sample index for start of next block
    vOut = vOut + Rs;

end
%HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
toc

%======================================================================
%------------ Output Processing & Plotting------------------
%======================================================================
y = Output/max(abs(Output));            % normalize output vector
plotTAG_in = ['INPUT for', plotTAG];
plotTAG_out = ['OUTPUT', plotTAG];
if iswav == 1
    %=====================================================
    % Time Domain Plots for .WAV Input
    %=====================================================
    Ny = length(y);
    Ts = 1/fs;                          % sampling period
    nT_in = (0:Nx-1)*Ts;                % time vector (zero-padded input)
    nT_out = (0:Ny-1)*Ts;               % time vector (output)
    axis_time = [0, max(Nx,Ny), -1.2, 1.2];
    figure('Name','I/O');
    clear figure(figNum);
    figNum = figNum + 1;
    colordef white;
    %-------input (integer samples/cycle)---------
    subplot(2,1,1)
    hold on;
    plot(nT_in*fs,Input);
    %plot(nT_in*fs,Input, 'b.');
    %stem(nT(1:WLen2)*fs,x2(1:WLen2),'.','MarkerSize',13);
    grid on;

axis(axis_time);
title(['Input x[n]
(', wavfile, ') , normalized & padded']);
ylabel('x[n]')
xlabel('n (samples)')
hold off;
%-------output (integer samples/cycle)---------
subplot(2,1,2)
hold on;
plot(nT_out*fs,y, 'r');
%plot(nT_out*fs,y, 'r.');
%stem(nT(1:WLen2)*fs,x2(1:WLen2),'.','MarkerSize',13);
grid on;
axis(axis_time)
title(['Output y[n] (', plotTAG, ') , normalized']);
ylabel('y[n]')
xlabel('n (samples)')
hold off;
%==================================================================
% Time-Frequency Plots
% Plots the following:
%   - Amplitude & Phase Waterfalls
%   - Magnitude Waterfall (dB scale)
%   - Phase vs. Time plot of Freq Bins near max amplitude
%     (in the case of a single tone sinusoid, will be centered
%     around pitch)
%==================================================================
freqScale = waterplot(3);             % linear or log frequency scale
%plotCODE = [1 1 1 0];                % plot amp, phase, and mag waterfalls
if waterplot(1) == 1                  % input waterfall plots
waterFIG_i = waterfall_Plot(ModuliIn,PhasesIn,nPlotIndices_a,...
    fs,WLen,Ra,Nx,plotCODE,freqScale,wavfile);
end
if waterplot(2) == 1                  % output waterfall plots
waterFIG_o = waterfall_Plot(ModuliOut,PhasesOut,nPlotIndices_a,...
    fs,WLen,Ra,Ny,plotCODE,freqScale,plotTAG);
end

%==============================================================
% Output Audio File
%==============================================================
y = y*0.999;                          % lower level a bit more to remove wavwrite clip warning
x = x/max(abs(x));                    % normalize input for comparison playback
wavwrite(y, fs, nbits, fileTAG);      % write output audio file

%wavplay(x, fs);                      % play PV input
wavplay(y, fs);                       % play PV output

%HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
else
%======================================================
% Time Domain Plots for Vector Input
%======================================================
Ny = length(y);
Nsamples_in = linspace(0,Nx,Nx);
Nsamples_out = linspace(0,Ny,Ny);

axis_time = [0, max(Nx,Ny), -1.2, 1.2];
figure (figNum);
clear figure(figNum);
%------------normalized input plot--------------
subplot(2,1,1);
hold on;
plot(Nsamples_in, Input,'b-');
stem(Nsamples_in, Input,'.','b-','MarkerSize',9);
axis(axis_time);
grid on;
title(['Input x[n] (', plotTAG, ') , normalized']);
ylabel('x[n]');
xlabel('n');
hold off;
%------------normalized output plot--------------
subplot(2,1,2);
hold on;
plot(Nsamples_out, y,'b-');
stem(Nsamples_out, y,'.','b-','MarkerSize',9);
axis(axis_time);
grid on;
title(['Output y[n] (', plotTAG, ') , normalized']);
ylabel('y[n]');
xlabel('n');
hold off;
%==================================================================
% Time-Frequency Plots for vector input
% Plots the following:
%   - Amplitude & Phase Waterfalls
%   - Magnitude Waterfall (dB scale)
%   - Phase vs. Time plot of Freq Bins near max amplitude
%     (in the case of a single tone sinusoid, will be centered
%     around pitch)
%==================================================================
if waterplot == 1
waterfallFIG_in = waterfall_vecPlot(ModuliIn,PhasesIn,nPlotIndices_a,...
    fs,WLen,Ra,Rs,Nx,plotTAG_in);
waterfallFIG_out = waterfall_vecPlot(ModuliOut,PhasesOut,nPlotIndices_a,...
    fs,WLen,Ra,Rs,Ny,plotTAG_out);
end
end
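
The overlap-add step in the listing above reconstructs the signal at constant gain only if the shifted, windowed grains sum to a constant. The following is a minimal sketch, not part of the original project code, that checks this for a hop of WLen/4. It assumes the same periodic Hann window is applied at analysis and at synthesis (as w1 and w2 are in pv_Pitchshift.m) and that no spectral modification is made between them. The names R, numFrames, ola, steady, gain and ripple are illustrative.

```matlab
% Sketch: overlap-add of the squared Hann window at hop R = WLen/4.
% Assumption: analysis window = synthesis window = periodic Hann.
WLen = 64;                                   % window length (illustrative)
R = WLen/4;                                  % hop size under test
w = 0.5*(1 - cos(2*pi*(0:WLen-1)'/WLen));    % periodic Hann window

numFrames = 16;                              % number of overlapping frames
ola = zeros((numFrames-1)*R + WLen, 1);      % overlap-add accumulator
for i = 0:numFrames-1
    idx = i*R + (1:WLen);                    % sample indices of ith frame
    ola(idx) = ola(idx) + w.^2;              % analysis * synthesis window
end

steady = ola(WLen+1 : end-WLen);             % skip fade-in / fade-out regions
gain = mean(steady);                         % steady-state overlap gain
ripple = max(abs(steady - gain));            % deviation from constant gain
```

Under these assumptions the steady-state gain comes out as 1.5 with negligible ripple. The listings do not compensate for this factor explicitly; they normalize the output to its peak value instead.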


2. pv_Pitchshift.m
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% pv_Pitchshift.m                               Author: Tim Perry
% Elec484: DAFX                                 V00213455
% Final Project, Phase 1                        2009-07-13
%
% FFT/IFFT implementation of the Phase Vocoder (Block-by-Block Approach)
% to be used for pitch shifting, using integrated resampling.
% -based on concept of the direct FFT/IFFT method used in Zölzer's DAFX
% -uses different hop sizes for analysis and resynthesis
% -time stretch ratio tStretch = Rs/Ra
% -for each grain, a time stretching and resampling is performed
%
% [y,ModuliIn,PhasesIn,ModuliOut,PhasesOut]...
%       =PV_PITCHSHIFT(x,WLen,Ra,Rs,TAG,waterplot)
%
%   x = input vector or .wav file
%   WLen = analysis window/grain size & synthesis window size
%   Ra = analysis hop size   (Ra <= WLen/4)
%   Rs = synthesis hop size  (Rs <= WLen/4)
%   TAG = 'String-For-Naming-Plots'
%   waterplot = 1 to plot time-freq representations, 0 otherwise
%
%   y = output vector
%   ModuliIn,ModuliOut: input & output moduli (amplitude) matrices
%   PhasesIn,PhasesOut: input & output phase matrices
%
% REFERENCES:
% [1] Udo Zölzer, DAFX. John Wiley & Sons, 2002.
% [2] A. Götzen, N. Bernardini, D. Arfib, 2000. (see PDF documentation)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function [y,ModuliIn,PhasesIn,ModuliOut,PhasesOut] ...
    = pv_Pitchshift(x,WLen,Ra,Rs,TAG,waterplot)

%==============================================================
% Input processing & zero padding
%==============================================================
figNum = get(0,'CurrentFigure');
%------------------------check input type--------------------
[xtype xinfo] = wavfinfo(x);          % input data info
disp(xinfo)
if length(xtype) == 0
iswav = 0;                            % vector input
fs = 8000;

grainFIGa = figure('Name','Analysis Grains (kernalPlot)', ...
    'Position',[100,100,1000,800]);
clear figure(figNum);
figNum = figNum + 1;
grainFIGs = figure('Name','Synthesis Grains (kernalPlot)', ...
'Position',[100,100,1000,800]);
clear figure(figNum);

figNum = figNum + 1;
else
iswav = 1;                            % wav file input
wavfile = x;
TAG=strcat(wavfile(1:length(wavfile)-4),'-',TAG);
[x fs nbits] = wavread(wavfile);      % read input audio
end

%------use left channel for mono treatment of stereo input-----
x = x(:,1);                           % left channel (col vector)
N_orig = length(x);                   % original length of input vector

if size(x,2)>1,                       % ensure column vectors for matrix operations
x=x';
end

%----------------zero-pad Input----------------------
% WLen zeros before x,
% WLen-(N_orig-n.*Ra) zeros after x, where n = floor(X./Y)
Input = [zeros(WLen, 1); x; zeros(WLen-mod(N_orig,Ra),1)];
Input = Input/max(abs(x));            % normalize Input
Nx = length(Input);                   % zero-padded input length

fileTAG = strcat(TAG,'-L',num2str(WLen)...
    ,'-Ra',num2str(Ra),'Rs',num2str(Rs),'.wav');      %for file naming

plotTAG = strcat(TAG,': WLen=',num2str(WLen)...
    ,' Ra=',num2str(Ra),' Rs=',num2str(Rs));          %for plot naming

%HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
%==============================================================
% Create framing window (modified hanning window for OLA [2])
% w = [0, k_1, k_2,..., k_n-1 = k1]
%==============================================================
w1 = 0.5*(1-cos(2*pi*(0:WLen-1)'/(WLen)));   % analysis window
w2 = w1;                                     % synthesis window
%winview = wvtool(w1,w2);                    % plot windows
numWin_a = ceil((Nx-WLen+Ra)/Ra);            % # of analysis windows
numWin_s = ceil((Nx-WLen+Rs)/Rs);            % # of synthesis windows

%==============================================================
% Initializations for Kernal Algorithm
%==============================================================
%--------------------------------------------------
% FX-specific initializations for pitch shifting
%--------------------------------------------------
tStretch = Rs/Ra                      % time stretch ratio

%-------Linear Interpolation Parameters---------

Lresamp = floor(WLen/tStretch);       % length of resampled/interpolated grain

nInterpSpace = linspace(0,Lresamp-1,Lresamp)'; % linear spaced time column vec
nfracInterp = 1 + nInterpSpace*WLen/Lresamp;
nInterp0 = floor(nfracInterp);        % Lresamp length vector of integer sample
                                      % values between 1 and WLen
nInterp1 = nInterp0 + 1;              % Lresamp length vector of integer sample
                                      % values between 2 and WLen+1
frac0 = nfracInterp - nInterp0;       % fractional distances of integer samples
                                      % below interpolation points
frac1 = 1-frac0;                      % fractional distances of integer samples
                                      % above interpolation points

Output = zeros(Lresamp+Nx,1);         % initialize output vector (overlap-added
                                      % interpolated synthesis grains)

%----------------------------------------------
% General PV initializations
%----------------------------------------------
omega = 2*pi*Ra*[0:WLen-1]'/WLen;     % nominal phase increment for Ra
phi0 = zeros(WLen,1);                 % previous measured phase
phi_t = zeros(WLen,1);                % target phase
ModuliIn = zeros(WLen, numWin_a+1);
PhasesIn = ModuliIn;
ModuliOut = zeros(WLen, numWin_s+1);
PhasesOut = ModuliOut;
nPlotIndices_a = zeros(numWin_a+1,1); % to store sample indexes for plots
nPlotIndices_s = zeros(numWin_s+1,1);

tic
%HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
%=======================================================================
%------------------------Kernal Algorithm-----------------------
% -performs FFTs, IFFTs, and overlap-add of successive grains
% -implements analysis and resynthesis frame-by-frame
% -stores phase and moduli for input and output signals
%=======================================================================
vIn = 1;                              % analysis sample index @ block start
vOut = 1;                             % synthesis sample index @ block start

for i = 0:numWin_a - 2                % processing on ith block (i = 0:floor((Nx - WLen)/Ra))

%==========================================================
% Analysis Portion
%==========================================================
if ((vIn + WLen - 1) >= Nx)
frame_end = Nx - vIn;                 % end index of ith frame (if last)
else
frame_end = WLen - 1;                 % end index offset of ith frame
end
win_end = frame_end + 1;              % end index offset of window

%--------Window input, forming grain--------
grain = Input(vIn : vIn + frame_end).*w1(1:win_end);
% zero pad analysis grain for greater frequency precision
%grain = [zeros(1.5*WLen, 1); grain; zeros(1.5*WLen, 1)];

if (numWin_a <= 24)                   % plot this grain
kernalPlot(grain,WLen,i,numWin_a,grainFIGa);
end
%--------FFT on circular shifted grain --------
f = fft(fftshift(grain));             % FFT of ith grain
r = abs(f);                           % amplitude
phi = angle(f);                       % phase
%----------Store analysis results-----------
nPlotIndices_a(i+1) = vIn + frame_end;   % store sample index for FFT frame plot
ModuliIn(1:win_end, i+1) = r;            % store analysis results
PhasesIn(1:win_end, i+1) = phi;

%==========================================================
% Frequency Domain Processing
%==========================================================
%----------- Phase Unwrapping ---------------
phi_d = princarg(phi-phi0-omega);     % deviation phase (3)
delta_phi = omega + phi_d;            % phase difference between two
                                      % adjacent frames for each bin, added
                                      % to nominal phase of the bin
phi0 = phi;                           % measured phase

%--------- Target Phase Calculation ---------
% -implements time stretching by ratio Rs/Ra
% -phase increment is stretched, then added to previous phase
phi_t = princarg(phi_t + delta_phi*tStretch);

%==========================================================
% Resynthesis Portion
%==========================================================
ft = r.*exp(j*phi_t);                 % FFT with ith grain target phase
rt = abs(ft);                         % output amplitude

ModuliOut(1:win_end, i+1) = rt;       % store output moduli (same as input)
PhasesOut(1:win_end, i+1) = phi_t;    % build matrix of output phases
%-------------- Inverse FFT & Windowing ------------
tIFFT = fftshift(real(ifft(ft)));     % shifted IFFT
grain_t = tIFFT.*w2(1:win_end);       % inverse windowing (tapering)

%----------------- Interpolation --------------------
grain_t2 = [grain_t;0];               % pad w/ single zero to allow interpolation
                                      % between successive grains
grain_t3 = grain_t2(nInterp0).*frac1 + grain_t2(nInterp1).*frac0; % linear interp

if (numWin_s <= 24)
% plot this grain
kernalPlot(grain_t3,WLen,i,numWin_s,grainFIGs);
end
%----------Overlap Adding of Resampled Grains---------
Output(vOut:vOut+Lresamp-1) = Output(vOut:vOut+Lresamp-1) + grain_t3;
vIn = vIn + Ra;                       % sample index for start of next block
vOut = vOut + Ra;

end
%HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
toc

%======================================================================
%------------ Output Processing & Plotting-----------------
%======================================================================
y = Output/max(abs(Output));          % normalize output vector
plotTAG_in = ['INPUT for', plotTAG];
plotTAG_out = ['OUTPUT', plotTAG];
if iswav == 1
%=====================================================
% Time Domain Plots for .WAV Input
%=====================================================
Ny = length(y);
Ts = 1/fs;                            % sampling period
nT_in = (0:Nx-1)*Ts;                  % time vector (zero-padded input)
nT_out = (0:Ny-1)*Ts;                 % time vector (output)
axis_time = [0, max(Nx,Ny), -1.2, 1.2];
figure('Name','I/O');
clear figure(figNum);
figNum = figNum + 1;

%-------input---------
subplot(2,1,1)
hold on;
plot(nT_in*fs,Input);
%plot(nT_in*fs,Input, 'b.');
%stem(nT(1:WLen2)*fs,x2(1:WLen2),'.','MarkerSize',13);
grid on;
axis(axis_time);
title(['Input x[n] (', plotTAG, ') , normalized & padded']);
ylabel('x[n]')
xlabel('n (samples)')
hold off;
%-------output ---------
subplot(2,1,2)
hold on;
plot(nT_out*fs,y, 'r');
%plot(nT_out*fs,y, 'r.');
%stem(nT(1:WLen2)*fs,x2(1:WLen2),'.','MarkerSize',13);

grid on;
axis(axis_time)
title(['Output y[n] (', plotTAG, ') , normalized']);
ylabel('y[n]')
xlabel('n (samples)')
hold off;

%==================================================================
% Time-Frequency Plots
% Plots the following:
%   - Amplitude & Phase Waterfalls
%   - Magnitude Waterfall (dB scale)
%   - Phase vs. Time plot of Freq Bins near max amplitude
%     (in the case of a single tone sinusoid, will be centered
%     around pitch)
%==================================================================
if waterplot == 1
waterfallFIG = waterfall_surf(ModuliIn,PhasesIn,nPlotIndices_a,...
    fs,WLen,Ra,Rs,Nx,plotTAG);
end

%==============================================================
% Output Audio File
%==============================================================
y = y*0.999;                          % lower level a bit more to remove wavwrite clip warning
x = x/max(abs(x));                    % normalize input for comparison playback
wavwrite(y, fs, nbits, fileTAG);      % write output audio file

wavplay(x, fs);                       % play PV input
wavplay(y, fs);                       % play PV output

%==============================================================
% Create Harmony
%==============================================================
harmony = zeros(Ny,2);                % stereo output file
harmony(:,1)=[Input; zeros(Ny-Nx,1)]*0.9; % assign input to left channel and zero
                                          % pad to output length
harmony(:,2)=y*0.9;                   % assign shifted to right channel
harmTAG = [wavfile(1:length(wavfile)-4),'-Harmony',num2str(tStretch)];
harmTAG = strcat(harmTAG,'-L',num2str(WLen)...
    ,'-Ra',num2str(Ra),'Rs',num2str(Rs),'.wav');      %for file naming
wavwrite(harmony,fs,harmTAG);

else
%======================================================
% Time Domain Plots for Vector Input
%======================================================
Ny = length(y);
Nsamples_in = linspace(0,Nx,Nx);
Nsamples_out = linspace(0,Ny,Ny);
axis_time = [0, max(Nx,Ny), -1.2, 1.2];
figure (figNum);
clear figure(figNum);

%------------normalized input plot--------------
subplot(2,1,1);
hold on;
plot(Nsamples_in, Input,'b-');
stem(Nsamples_in, Input,'.','b-','MarkerSize',9);
axis(axis_time);
grid on;
title(['Input x[n] (', plotTAG, ') , normalized']);
ylabel('x[n]');
xlabel('n');
hold off;
%------------normalized output plot--------------
subplot(2,1,2);
hold on;
plot(Nsamples_out, y,'b-');
stem(Nsamples_out, y,'.','b-','MarkerSize',9);
axis(axis_time);
grid on;
title(['Output y[n] (', plotTAG, ') , normalized']);
ylabel('y[n]');
xlabel('n');
hold off;
%==================================================================
% Time-Frequency Plots for vector input
% Plots the following:
%   - Amplitude & Phase Waterfalls
%   - Magnitude Waterfall (dB scale)
%   - Phase vs. Time plot of Freq Bins near max amplitude
%     (in the case of a single tone sinusoid, will be centered
%     around pitch)
%==================================================================
if waterplot == 1
waterfallFIG_in = waterfall_vecPlot(ModuliIn,PhasesIn,nPlotIndices_a,...
    fs,WLen,Ra,Rs,Nx,plotTAG_in);
waterfallFIG_out = waterfall_vecPlot(ModuliOut,PhasesOut,nPlotIndices_a,...
    fs,WLen,Ra,Rs,Ny,plotTAG_out);
end
end
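
The phase unwrapping step in pv_Pitchshift.m can be checked in isolation on a single test tone. The following is a minimal sketch, not part of the original project code: it analyses two frames of a cosine whose frequency lies between two bins and recovers that frequency from the unwrapped phase increment. The helper princarg is not listed in this appendix, so the one-line definition below is an assumption (a common way to wrap a phase into (-pi, pi]). The names f_true, F0, F1, kmax and f_est are illustrative.

```matlab
% Sketch: instantaneous frequency from the phase increment between two frames.
% Assumption: princarg wraps a phase into the interval (-pi, pi].
princarg = @(p) mod(p + pi, -2*pi) + pi;

fs = 8000;  WLen = 1024;  Ra = WLen/4;       % illustrative parameters
f_true = 1003.7;                             % test tone [Hz], between two bins
n = (0:WLen+Ra-1)';
x = cos(2*pi*f_true*n/fs);
w = 0.5*(1 - cos(2*pi*(0:WLen-1)'/WLen));    % same window form as w1

F0 = fft(fftshift(x(1:WLen).*w));            % frame starting at sample 1
F1 = fft(fftshift(x(Ra+1:Ra+WLen).*w));      % frame one analysis hop later

[peakVal, kmax] = max(abs(F1(1:WLen/2)));    % peak bin (1-based index)
k = kmax - 1;                                % zero-based bin number
omega = 2*pi*Ra*k/WLen;                      % nominal phase increment of bin k
phi_d = princarg(angle(F1(kmax)) - angle(F0(kmax)) - omega);  % deviation phase
delta_phi = omega + phi_d;                   % unwrapped phase increment per hop
f_est = delta_phi/(2*pi*Ra)*fs;              % instantaneous frequency [Hz]
```

With these values f_est should land close to f_true even though the bin spacing is fs/WLen = 7.8125 Hz. The estimate stays unambiguous only while the deviation phase is below pi in magnitude, which is one reason for keeping the analysis hop small relative to WLen.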


APPENDIX C waterfall_Plot.m
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% waterfall_Plot.m                              Author: Tim Perry
% Elec484: DAFX                                 V00213455
% Final Project, Phase 1 part c                 2009-07-08
%
% Function to perform time-frequency representation plots (waterfall plots)
% from the phase and moduli data provided by a phase vocoder.
% Plots the following:
%   - Amplitude Waterfall (linear scale)
%   - Phase Waterfall
%   - Magnitude Waterfall (dB scale)
%   - Phase vs. Time plot of Freq Bins near max amplitude
%     (in the case of a single tone sinusoid, will be centered
%     around pitch)
%
% FIGS = WATERFALL_PLOT(Moduli,Phases,nPlotIndices,fs,WLen,Ra,...
%                       N,plotCODE,freqScale,plotTAG)
%
%   Moduli       - amplitude matrix from successive FFT frames
%   Phases       - phase matrix from successive FFT frames
%   nPlotIndices - sample indexes for FFT frame plots
%   fs           - sampling frequency
%   WLen         - analysis window size
%   Ra           - analysis hop size
%   N            - time axis length [samples] (ex: signal length)
%   plotCODE     - [1 1 1 1] plots Amp, Phase, Mag, Phase @ maxAmp Bins
%                - [0 0 0 0] plots nothing
%   LogFreq      - 1 for logarithmic freq scale, 0 for linear freq scale
%   plotTAG      - 'String-For-Naming-Plots'
%
%   FIGS = fig handles [waterAmpFIG waterPhaseFIG waterMagFIG pMaxBinFIG]
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [waterAmpFIG waterPhaseFIG waterMagFIG pMaxBinFIG]= waterfall_Plot...
(Moduli,Phases,nPlotIndices,fs,WLen,Ra,N,plotCODE,LogFreq,plotTAG)
colordef black;
figNumStart = get(0,'CurrentFigure');
%clear figure(figNumStart);
figNum = figNumStart + 1;
waterAmpFIG = [];                     %return empty array if figures not created
waterPhaseFIG = [];
waterMagFIG = [];
pMaxBinFIG = [];

%==============================================================
% Time-Frequency Plots Parameters
%==============================================================
%ModuliIn_norm = 2*ModuliIn/(WLen/2); %Amp spectrum in quantity peak

Amp_norm = Moduli/Ra;                 % Amp spectrum in quantity peak
Mag_dB = 20*log10(Moduli/max(max(Moduli)));
f0 = fs/WLen;                         % frequency resolution
numBins = WLen;                       % # of frequency bins
%kBins = linspace(0,fs/2,numBins/2 + 1); %lin spaced freq bins up to Nyquist
kBins = linspace(0,fs/2,numBins/2 + 1); %lin spaced freq bins up to Nyquist
[n, k] = meshgrid(nPlotIndices,kBins);  % rectangular domain

if plotCODE(1) == 1;
%=================================================================
%-----------------Amplitude Waterfall---------------
%=================================================================
waterAmpFIG = figure('Colormap',jet(128),'Name',...
'Waterfall Amplitude Plot','Position',[10,20,1500,950]);
clear figure(figNum);
figNum = figNum + 1;

hold on
%C = del2(n,k,AmpIn_norm(1:(numBins/2+1), :));    %colour mapping
%waterAmp = surf(n,k,Amp_norm(1:(numBins/2 + 1), :)); %plot input amps
waterAmp = meshz(n,k,Amp_norm(1:(numBins/2 + 1), :));
set(waterAmp,'MeshStyle','col')
%colormap winter
colorbar
axis([0,N,0,fs/2,0,max(max(Amp_norm))])
title(['Amplitude Waterfall of Short-time FFTs (', plotTAG, ')']);
grid on
ylabel('f [Hz]')
xlabel('n [samples]')
zlabel('|X_w(f)|')
hold off
if LogFreq == true
set(gca,'yscale','log')
%set(waterAmp,'MeshStyle','both')
%view(viewmtx(-70,8,10));             % set viewpoint
end
view(viewmtx(-70,25,25));             % set viewpoint
%view(viewmtx(-55,25,25));            % set viewpoint
ylabh = get(gca,'YLabel');
set(ylabh,'Position',get(ylabh,'Position') + [-1*fs 0 0])
end

if plotCODE(2) == 1;
%=================================================================
%-----------------Phase Waterfall-------------------
%=================================================================
waterPhaseFIG = figure('Colormap',jet(128),'Name',...
'Waterfall Phase Plot','Position',[20,20,1800,950]);
clear figure(figNum);
figNum = figNum + 1;

hold on
% waterPhase = surf(n,k,Phases(1:(numBins/2 + 1), :));
waterPhase = meshz(n,k,Phases(1:(numBins/2 + 1), :));    %plot phases

set(waterPhase,'MeshStyle','col')
colormap('jet')
colorbar
axis([0,N,0,fs/2,-3.2,3.2])
title(['Phase Waterfall of Short-time FFTs (', plotTAG, ')']);
grid on
ylabel('f [Hz]')
xlabel('n [samples]')
zlabel('Arg{X_w_(f)} [rad]')
hold off
if LogFreq == true
set(gca,'yscale','log')
end
view(viewmtx(-55,75,25));             % set viewpoint

end

if plotCODE(3) ==1;
%=================================================================
% Magnitude Waterfall (dB)
%=================================================================
waterMagFIG = figure('Name','Magnitude Waterfall [dB]','Position'...
,[300,300,1200,650]);
clear figure(figNum);
figNum = figNum + 1;
% Mag_dB(numBins/2 + 1, numWin_a) = -190; % set to expand colourmap scale

hold on
%surf(n*Ts,k,Mag_dB(1:(numBins/2 + 1), :)); %plot input mag
%waterfall(k,n*Ts,Mag_dB(1:(numBins/2 + 1), :)); %plot input mag
waterMag = meshz(n,k,Mag_dB(1:(numBins/2 + 1), :));
set(waterMag,'MeshStyle','col')
colormap('jet')
colorbar('location','East')
axis([0,N,0,fs/2,1.2*min(min(Mag_dB)),0])
title(['Magnitude [dB] Waterfall of Short-time FFTs (', plotTAG, ')']);
grid on
ylabel('f [Hz]')
xlabel('n [samples]')
zlabel('20log|X_w(f)| [dB]')
hold off
if LogFreq == true
set(gca,'yscale','log')
colormap('winter')
%set(waterMag,'MeshStyle','both')
%view(viewmtx(-70,8,10));
end
view(viewmtx(-75,20,10));             % set viewpoint

ylabh = get(gca,'YLabel');            % make freq axis label visible
set(ylabh,'Position',get(ylabh,'Position') + [-1*fs*(N/100000) 0 0])
end

if plotCODE(4) ==1;
%=================================================================
%----------Phase vs. Time (@ Freqs Bins near max amplitude)-------
%=================================================================
pMaxBinFIG = figure('Name','Phase vs. Time','Position',...
[800,50,850,450]);
clear figure(figNum);
%figNum = figNum + 1;
%------find freq bins where max amplitude occurs-----
[maxAmps, bins_max]=max(Moduli, [ ], 1);           % get indices @ max
center_bin = median(bins_max) - 1;                 % bin w/ most peaks
bins_around = 5;                                   % # bins above and below to include
%--------define bins to plot (close to center_bin)-------
if (center_bin <= bins_around)                     % freq bins above f
k_closeBins = [center_bin:center_bin+bins_around - 1];
elseif (center_bin >= numBins/2 - bins_around)     % freq bins below f
k_closeBins = [center_bin-5:center_bin];
else                                               % freq bins surrounding f
k_closeBins = [center_bin-bins_around:center_bin+bins_around - 1];
end

[n, k_close] = meshgrid(nPlotIndices,k_closeBins*f0);    % rect domain

hold on
waterfall(n,k_close,Phases(k_closeBins + 1, :));
%stem3(n,k_close,PhasesIn(k_closeBins, :),'.');
colorbar
axis([0,N,min(min(k_close)),max(max(k_close)),-3.2,3.2])
title(['Phase vs Time for Freq Bins Near Fundamental (', plotTAG, ')']);
grid on
ylabel('f [Hz]')
xlabel('n [samples]')
zlabel('Arg{X_w_(f)} [rad]')
hold off
view(114,77)                          % set viewpoint (azimuth, elevation)

end

colordef white;
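
One detail of the dB conversion in waterfall_Plot.m is worth illustrating. Wherever a modulus is exactly zero (for example in columns of a preallocated moduli matrix that were never filled), 20*log10 returns -Inf, so the lower z limit 1.2*min(min(Mag_dB)) is no longer a finite number. The following is a minimal sketch of one possible guard, not the method used in the original code: it clamps the moduli at eps before the logarithm and then applies a display floor. The value floor_dB = -120 and the small test matrix are assumptions chosen for illustration.

```matlab
% Sketch: dB conversion with a guard against log10(0).
% Assumptions: floor_dB and the test matrix are illustrative values only.
Moduli = [0 1 4; 2 8 0; 1 0.5 2];            % test moduli containing zeros
floor_dB = -120;                             % display floor [dB]

Mag_dB = 20*log10(max(Moduli, eps)/max(Moduli(:)));  % 0 dB at the peak
Mag_dB = max(Mag_dB, floor_dB);              % clip very small values

zLimits = [1.2*min(Mag_dB(:)), 0];           % finite z-axis limits for plotting
```

With this guard the largest modulus maps to 0 dB, zero-valued entries map to floor_dB, and the z-axis limits stay finite.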
