
1/f STRUCTURE OF TEMPORAL FLUCTUATION IN RHYTHM

PERFORMANCE AND RHYTHMIC COORDINATION


by
Summer K. Rankin

A Dissertation Submitted to the Faculty of


The Charles E. Schmidt College of Science
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy

Florida Atlantic University


Boca Raton, FL
August 2010
Copyright by Summer K. Rankin 2010

VITA

Music has always been an important part of Summer’s life. She spent the
majority of her time from the ages of 6 to 23 performing musical theater. She noticed
early on that music also affects others deeply and always had an interest in the effect
of music on human behavior and its ability to evoke strong memories or feelings. For
her undergraduate education she pursued a Bachelor of Arts degree in Music (voice).
However, because of her passionate interest in the link between music and behavior,
she switched her focus (full-time) from performance to science and earned a second Bachelor's degree in Psychology. She then joined Dr. Large's Music Dynamics Laboratory
in 2003 as a graduate student and research assistant/teaching assistant/lab manager
at the Center for Complex Systems and Brain Sciences.

ACKNOWLEDGEMENTS

I would like to thank my committee, Dr. Ed Large, Dr. Betty Tuller, and
Dr. Larry Liebovitch. Ed Large, who took a chance on a music major who came to
his office and said she was interested in music cognition. Your belief that I would be
able to handle whatever it takes has been instrumental to my success. Ted Zanto,
who taught me all sorts of things about methodology, MATLAB, house buying, more
cowbell, and trying to enjoy the process as much as possible. Daniel Shaffer, who was
extremely helpful with data collection and matching our MIDI performances. Heather
Chapin, who has always supported me with unconditional love and powerful compas-
sion. I don’t know where I would be without your mentorship, our conversations, and
belly dance excursions along the way. Thank you, to Carol Tran, for giving me faith
in the process and myself when I had none, and teaching me about the important
things in taking it all one day at a time. My family who has always been there for
me through thick and thin–you never gave up. Grandma(s), Granddad(s), Heather,
Ashley, Mama, and Dad for supporting my decision to completely change my career
choice and stay in school “forever”. It was all of you who first taught me how to
problem solve, work hard, take risks, give freely, and enjoy it. I wouldn’t have been
able to stay the course without all of your generosity and support. I hope to pay it
all forward throughout my life. Dad and Mom, you’ve been the best parents anyone
could have asked for. I do not easily forget all the rehearsals that you drove me to,
lessons you paid for, performances you videotaped, and wounds–both physical and
emotional–that you have kissed to make it all better. You two have taught me so much about the world, life, and most importantly, what it means to be a good person.
Last, but not least, Ajay, who was there for me through it all. Thank you for loving
me so completely.

ABSTRACT

Author: Summer K. Rankin


Title: 1/f Structure of Temporal Fluctuation in Rhythm
Performance and Rhythmic Coordination
Institution: Florida Atlantic University
Dissertation Advisor: Dr. Edward W. Large
Degree: Doctor of Philosophy
Year: 2010

This dissertation investigated the nature of pulse in the tempo fluctuation of music performance and how people entrain with these performed musical rhythms.
In Experiment 1, one skilled pianist performed four compositions with natural tempo
fluctuation. The changes in tempo showed long-range correlation and fractal (1/f )
scaling for all four performances. To determine whether the finding of 1/f structure
would generalize to other pianists, musical styles, and performance practices, fractal
analyses were conducted on a large database of piano performances in Experiment 3.
Analyses revealed significant long-range serial correlations in 96% of the performances.
Analysis also showed that the degree of fractal structure depended on the piece, suggesting that something in a composition's musical structure leads different pianists' tempo fluctuations to have a similar degree of fractal structure. Thus, musical tempo
fluctuations exhibit long-range correlations and fractal scaling.
To examine how people entrain to these temporal fluctuations, a series of
behavioral experiments was conducted in which subjects were asked to tap the pulse (beat) of temporally fluctuating stimuli. The stimuli for Experiment 2 were musical performances from Experiment 1, with mechanical versions serving as controls.
Subjects entrained to all stimuli at two metrical levels, and predicted the tempo fluc-
tuations observed in Experiment 1. Fractal analyses showed that the fractal structure
of the stimuli was reflected in the inter-tap intervals, suggesting a possible relationship
between fractal tempo scaling, pulse perception, and entrainment. Experiments 4-7
investigated the extent to which people use long-range correlation and fractal scaling to predict tempo fluctuations in rhythmic sequences. Both natural and synthetic long-range correlations enabled prediction, as did shuffled versions that contained no long-range fluctuations. The fractal structure of the stimuli was again reflected in the inter-tap intervals, with persistence for the fractal stimuli and anti-persistence for the shuffled stimuli. Thus, 1/f temporal structure is sufficient, though not necessary, for predicting the fluctuations of a stimulus with large temporal fluctuations.

This manuscript is dedicated to Margaret K. Harmon & Deborah H. Rankin, the women who taught me that I could do anything.
Contents

List of Tables

List of Figures

1 Introduction
  1.1 Motivation
  1.2 Pulse and Meter
  1.3 Tempo Fluctuation in Music Performance
    1.3.1 Types of Structure
    1.3.2 Origins of Temporal Fluctuation
  1.4 1/f Structure
  1.5 Entrainment
    1.5.1 Continuation tapping
    1.5.2 Synchronization
  1.6 Models of Pulse and Meter
    1.6.1 Tempo Tracking
    1.6.2 Tracking vs. Prediction
  1.7 Summary and Perspective

2 Fractal Fluctuation and Pulse Prediction
  2.1 Experiment 1: Fractal Structure in Piano Performance
    2.1.1 Method
    2.1.2 Results
    2.1.3 Discussion
  2.2 Experiment 2: Synchronizing with Piano Performances
    2.2.1 Method
    2.2.2 Results
    2.2.3 Discussion: Experiment 2
  2.3 General Discussion

3 1/f^β Tempo Fluctuations in Skilled Piano Performances
  3.1 Introduction
  3.2 Methods
    3.2.1 Beat extraction
    3.2.2 Fractal analyses
  3.3 Results
    3.3.1 ANOVA
  3.4 Discussion

4 Entrainment with Temporally Fluctuating Stimuli
  4.1 Introduction
    4.1.1 Is 1/f Structure Sufficient?
    4.1.2 Is 1/f Structure Necessary?
  4.2 Stimuli
    4.2.1 Tempo Fluctuations
  4.3 Experiments
  4.4 Experiment 4: Quarter-note Beats
    4.4.1 Method
    4.4.2 Analysis
    4.4.3 Results
    4.4.4 Summary
  4.5 Experiment 5: Eighth-note Beats
    4.5.1 Method
    4.5.2 Analysis
    4.5.3 Results
    4.5.4 Summary
  4.6 Experiment 6: Rhythm
    4.6.1 Method
    4.6.2 Analysis
    4.6.3 Results
    4.6.4 Summary
  4.7 Experiment 7: Music
    4.7.1 Method
    4.7.2 Analysis
    4.7.3 Results
    4.7.4 Summary
  4.8 Meta-analysis for Experiments 4-7
    4.8.1 Cross-correlations
    4.8.2 Fractal Analyses
    4.8.3 Summary
  4.9 General Discussion: Experiments 4-7
    4.9.1 Is 1/f Structure Strongly Sufficient?
    4.9.2 Is 1/f Structure Necessary?
    4.9.3 Fractal Structure of Tapping Data

5 Summary and Conclusions
  5.1 Pulse in Music is 1/f
  5.2 Coordination With Large Fluctuations
    5.2.1 1/f Structure is Sufficient
    5.2.2 1/f Structure is not Necessary
  5.3 Models
  5.4 1/f Structure in the World

A Circular Statistics

B Fractal Analysis
  B.1 Power Spectral Density
  B.2 Hurst's Rescaled Range Analysis

C Synthesis of 1/f^β noise

D Stimuli: Experiment 1

Bibliography

List of Tables

2.1 Number of events and mean H for the IBIs of each performance at three metrical levels. The mean tempo for each piece is listed in the last column.

2.2 Mean Hurst exponent (H) for the inter-tap intervals and relative phase time series, averaged across subjects and trials. Percentage of significant trials (persistent (P) or anti-persistent (AP)) based on R/S analysis (p < .05).

3.1 Fractal results of all performances averaged across composition (*p < .05).

4.1 Mean IBI, number of events, H_fGn, and the autocorrelation at lag 1 (ac1) for the IBIs of each fluctuation at all three metrical levels (sixteenth-note, eighth-note, quarter-note).

4.2 Fractal statistics for the ITIs and relative phase time series from Experiment 4: Quarter-note Beats. Mean Hurst exponents (H from R/S analysis) averaged across subjects, and the percentage of trials significantly persistent (P), anti-persistent (AP), or not significant (NS), are displayed (p < .05).

4.3 Fractal statistics for the ITIs and relative phase time series from Experiment 5: Eighth-note Beats. Mean Hurst exponents (H from R/S analysis) averaged across subjects, and the percentage of trials significantly persistent (P), anti-persistent (AP), or not significant (NS), are displayed (p < .05).

4.4 Fractal statistics for the ITIs and relative phase time series from Experiment 6: Rhythm. Mean Hurst exponents (H from R/S analysis) averaged across subjects, and the percentage of trials significantly persistent (P), anti-persistent (AP), or not significant (NS), are displayed (p < .05).

4.5 Fractal statistics for the ITIs and relative phase time series from Experiment 7: Music. Mean Hurst exponents (H from R/S analysis) averaged across subjects, and the percentage of trials significantly persistent (P), anti-persistent (AP), or not significant (NS), are displayed (p < .05).

4.6 Fractal statistics for the ITIs and relative phase time series from Experiments 4-7. Mean Hurst exponents (H from R/S analysis) averaged across subjects, and the percentage of trials significantly persistent (P), anti-persistent (AP), or not significant (NS), are displayed (p < .05).

List of Figures

1.1 The first four measures of J.S. Bach's Goldberg Variations. The musical notation (A), metronomic piano roll (B), IBI plots at the eighth-note metrical level (C), and performed piano roll (D), with metrical hierarchies for each.

2.1 Notation, tempo, and dynamics of the expressive performance of J.S. Bach's Goldberg Variations, Aria. Top panel: Piano roll showing onset time, pitch, duration, and velocity (color). Bottom panel: Tempo map showing beats per minute (bpm = 60/IBI).

2.2 Notation, tempo, and dynamics of the expressive performance of Piano Sonata No. 8 in C minor, Op. 13, Mvt. 1, by Ludwig van Beethoven. Top panel: Piano roll showing onset time, pitch, duration, and velocity (color). Bottom panel: Tempo map showing beats per minute (bpm = 60/IBI).

2.3 Notation, tempo, and dynamics of the expressive performance of Etude in E major, Op. 10, No. 3 by Frédéric Chopin. Top panel: Piano roll showing onset time, pitch, duration, and velocity (color). Bottom panel: Tempo map showing beats per minute (bpm = 60/IBI).

2.4 Notation, tempo, and dynamics of the expressive performance of I Got Rhythm by George Gershwin. Top panel: Piano roll showing onset time, pitch, duration, and velocity (color). Bottom panel: Tempo map showing beats per minute (bpm = 60/IBI).

2.5 The matcher window after each note in the Bach performance was matched to a note in the score. The notes in blue represent notes in the performance (bottom) that have been matched to a chord in the score (top).

2.6 Spectral density (A-D) and rescaled range (E-H) analyses of the IBIs at the eighth-note level of the performances for Bach (A, E), Beethoven (B, F), Chopin (C, G), and Gershwin (D, H).

2.7 Mean relative phase values for subject 5 tapping to each condition of Goldberg Variations, Aria by J.S. Bach. Values were averaged over all trials for each condition.

2.8 Mean relative phase values for subject 5 when tapping to the expressive version of Frédéric Chopin's Etude in E major, Op. 10, No. 3. Values were averaged over all trials for the eighth-note condition (top) and quarter-note condition (bottom). The boxes emphasize the measures where subjects consistently tapped out of phase during the expressive conditions.

2.9 Measures 43-53 of Chopin's Etude in E major, Op. 10, No. 3, where subjects shifted the phase of their taps.

2.10 Mean (A, B) and angular deviation (C, D) of relative phase, averaged across the entire performance, as a function of performance type for each piece (A, C) and tapping level (B, D). Error bars represent one standard error.

2.11 Prediction and tracking indices for the expressive versions of Bach and Chopin at the eighth- and quarter-note levels. Error bars represent one standard error.

2.12 The tempo map (bpm = 60/IBI) for three different metrical levels (sixteenth-note, eighth-note, quarter-note) of Chopin's Etude in E major, Op. 10, No. 3, illustrating fractal scaling of tempo fluctuations. Fractal scaling implies that changes at fast time scales could facilitate prediction of changes at slow time scales (downward arrow); persistence implies that changes early in the sequence could facilitate prediction of changes later in the sequence (arched arrow).

3.1 An example of the beat extraction process for the audio recordings of the Mazurka performances. Time is shown on the x-axis, instantaneous amplitude on the y-axis. A) The audio waveform with locations of taps (green); the output from the spectrogram (onset detection) is displayed as the black line. B) Taps (green) have been snapped to the nearest onset (purple). C) Error-corrected beats (purple), from Sapp (2009).

3.2 A) A measure of the musical notation from Mazurka in A minor, Op. 17, No. 4. B) The corresponding measure of audio from a performance and the corresponding corrected beat times (purple), onset detections (orange vertical lines), and function from the spectrogram (black). C) Harmonic spectrogram.

3.3 A one-way ANOVA was performed on the β values of the 17 pianists' performances of all five Chopin mazurkas. Error bars represent standard error of the mean.

4.1 Raw time series (IBI plot), distribution (histogram), PSD, and R/S analyses of the three types of tempo fluctuation at the sixteenth-note metrical level. A) Human tempo fluctuations, B) shuffled tempo fluctuations, C) synthetic tempo fluctuations.

4.2 Hypothesized lag 0 and lag 1 cross-correlation coefficients (ITI x IBI) for each type of tempo fluctuation.

4.3 Hypothesized lag 0 cross-correlation coefficients (ITI x IBI) for the human fluctuation condition of each experiment.

4.4 Mean coefficients from the cross-correlation between ITIs and IBIs at lag -1, lag 0, and lag 1 for Experiment 4: Quarter-note Beats. The percentage of trials for which the cross-correlation coefficient was significantly different from zero is listed if the percentage was less than 100%. Error bars equal standard error of the mean.

4.5 Mean coefficients from the cross-correlation between ITIs and IBIs at lag -1, lag 0, and lag 1 for Experiment 5: Eighth-note Beats. The percentage of trials for which the cross-correlation coefficient was significantly different from zero is listed if the percentage was less than 100%. Error bars equal standard error of the mean.

4.6 Mean coefficients from the cross-correlation between ITIs and IBIs at lag -1, lag 0, and lag 1 for Experiment 6: Rhythm. The percentage of trials for which the cross-correlation coefficient was significantly different from zero is listed if the percentage was less than 100%. Error bars equal standard error of the mean.

4.7 Mean coefficients from the cross-correlation between ITIs and IBIs at lag -1, lag 0, and lag 1 for Experiment 7: Music. The percentage of trials for which the cross-correlation coefficient was significantly different from zero is listed if the percentage was less than 100%. Error bars equal standard error of the mean.

4.8 Means of the lag 0 cross-correlation coefficients (ITI x IBI) for all experiments and fluctuations. A) Shows the main effect of Fluctuation, B) shows the main effect of Experiment, and C) shows the interaction between Experiment and Fluctuation.

4.9 Means of the lag 1 cross-correlation coefficients (ITI x IBI) for all experiments and fluctuations. A) Shows the main effect of Fluctuation, B) shows the interaction between Experiment and Fluctuation.

4.10 Means of the lag -1 cross-correlation coefficients (ITI x IBI) for all experiments and fluctuations. A) Shows the main effect of Fluctuation, B) shows the interaction between Experiment and Fluctuation.

4.11 Means of the Hurst exponents for the inter-tap intervals (ITIs). A) Displays the main effect of Fluctuation, B) displays the interaction between Experiment and Fluctuation. The percentage of trials significantly different from H = .5 (blue line) is listed. Error bars equal standard error of the mean.

4.12 Means of the Hurst exponents for the relative phase time series. A) Shows the main effect of Experiment, B) shows the main effect of Fluctuation. Error bars equal standard error of the mean. (H = .5, blue line)

Chapter 1

Introduction

1.1 MOTIVATION

Music is a form of auditory communication involving complex, temporally structured sequences of events. Certain temporal regularities are found in the music of all
known cultures (McNeill, 1995; Nettl, 2000; Clayton et al., 2004), suggesting that
general principles of neural dynamics underlie temporal aspects of music perception
and production. In other words, our experiences of music reflect not merely sound pat-
terns, but also the dynamical interaction of musical events with the neural processes.
Thus, investigating the universal temporal structure of music provides a unique way
of understanding the fundamental properties of neural dynamics. Western music is
composed and notated according to an underlying, temporally regular pulse. But musical notation is only an approximation of a more complex reality: pulse fluctuates when music is performed. Yet humans perceive and effortlessly adapt to fluctuations
in musical performance. What is the fundamental nature of pulse in music and how
do people achieve the complex task of entraining to a temporally fluctuating stimu-
lus? This dissertation addresses this question by examining the structure of temporal
fluctuation in musical performance and the extent to which humans coordinate motor
behavior with musical performances.

1.2 PULSE AND METER

A rhythm is a sequence of sounds and silences organized in time, and is an intrinsic feature of music and speech. Beat, or pulse, is the perceived periodicity of a rhythm
(Lerdahl and Jackendoff, 1983). Cooper and Meyer (1960) define pulse as an internal,
psychological percept of regularly recurring, precisely equivalent psychological events.
Pulse is not a property of the acoustic stimulus, but is an endogenous process that
arises in response to a musical rhythm (Large, 2008; Cooper and Meyer, 1960). Meter,
which is thought to be inferred from a rhythm (Cooper and Meyer, 1960; Lerdahl and
Jackendoff, 1983), is the perception of an alternation between strong and weak pulses.
Lerdahl and Jackendoff (1983) define metrical structure as the regular, hierarchical
pattern of beats to which the listener relates musical events. Meter can be notated
as beats at different time scales or metrical levels, with stronger beats occurring at
more than one metrical level (Lerdahl and Jackendoff, 1983; London, 2004). For this
dissertation, the term pulse will be used to refer to the most salient level of beats
(Large, 2008). For example, when you tap your foot to a song, you are tapping at the level of the pulse.
Figure 1.1A illustrates metrical structure of the first four measures of J.S. Bach’s
Goldberg Variations. The dots under the musical notation represent beats at three
different metrical levels (one row for each level). In musical notation, beats are un-
derstood to be equally spaced in time. However, the piano roll from a mechanical
performance of these measures is much more appropriate for illustrating the period-
icity of the metrical hierarchy, because time is plotted on the x-axis. In the piano
roll (Figure 1.1B), each rectangle represents a note played on the piano. The length
of the rectangle denotes the duration of the note; therefore, the rectangles appear as they occur in time. This particular piano roll was sequenced to contain no tempo

fluctuations; the music was played exactly as written in the musical notation (A). Thus it has a metronomic, or isochronous (events evenly spaced in time), pulse, as illustrated by the equal spacing between beats in the metrical hierarchy under the piano roll.
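The nested periodicities of a metrical hierarchy can be illustrated numerically: each slower metrical level selects every n-th beat of the level below it, and in a metronomic performance the inter-beat intervals at every level are exactly equal. The following is a minimal sketch, not taken from any performance; the 0.25 s eighth-note period and the number of measures are assumptions chosen only to match the 3/4 grouping (eighth, quarter, dotted-half) shown in Figure 1.1.

```python
# Sketch: isochronous beat times at three metrical levels, as in Figure 1.1B.
# In 3/4 meter, the quarter-note level takes every 2nd eighth-note beat and
# the dotted-half (bar) level every 6th. The 0.25 s period is illustrative.

EIGHTH_IBI = 0.25  # seconds between eighth-note beats (assumed)
N_EIGHTHS = 24     # four 3/4 measures of eighth notes (assumed)

eighth = [i * EIGHTH_IBI for i in range(N_EIGHTHS)]
quarter = eighth[::2]      # every 2nd eighth-note beat
dotted_half = eighth[::6]  # every 6th: one beat per 3/4 measure

# In a metronomic performance every inter-beat interval at a level is equal.
ibis = [t2 - t1 for t1, t2 in zip(quarter, quarter[1:])]
print(ibis[:3])  # -> [0.5, 0.5, 0.5]
```

The slicing step is the whole idea: a metrical hierarchy is just the same timeline sampled at nested periods, which is why equal spacing at one level implies equal spacing at every level above it.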
It has been proposed that pulse and meter underlie the development of temporal
expectancy for when acoustic events will occur. Theorists concerned primarily with
musicological analysis emphasize periodicity of beat and pulse in music (Cooper and
Meyer, 1960; Lerdahl and Jackendoff, 1983; Yeston, 1976; Zuckerkandl, 1956). This
implicitly assumes that the pulse of music is isochronous (Large and Palmer, 2002;
Large and Kolen, 1994; Wing and Kristofferson, 1973). However, other researchers
emphasize that pulse in performed music is flexible (e.g., Epstein, 1995). Empirical
research has demonstrated that pulse in musical performance is not periodic, and im-
portant relationships between musical structure and patterns of temporal fluctuation
exist (for a review, see Palmer, 1997).
Despite, and perhaps because of, fluctuations in tempo, listeners perceive temporal
regularity, including pulse and meter (Epstein, 1995; Large and Palmer, 2002). Pulse
is induced in response to a rhythm but is stable in the absence of the stimulus or
in the presence of temporal fluctuations (Cooper and Meyer, 1960; Large, 2008).
Are tempo fluctuations deviations from isochrony, or are fluctuations intrinsic to
pulse? If tempo fluctuations violate intrinsically isochronous expectancies, listeners
would be expected to react to, or track, temporal fluctuations. If, on the other hand,
fluctuations are intrinsic to pulse, then listeners may predict temporal fluctuations,
resulting in synchronization of pulse with a fluctuating tempo. This dissertation
investigates the nature of pulse and temporal expectancy, and whether subjects track
or predict changes in tempo.
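One way to operationalize the tracking/prediction distinction is to cross-correlate the stimulus inter-beat intervals (IBIs) with a subject's inter-tap intervals (ITIs): covariation at lag 0 suggests prediction (the tap interval matches the current beat interval), while covariation at lag 1 suggests tracking (the tap interval echoes the previous beat interval). The sketch below is illustrative only; the series, the helper functions, and the "pure tracker" tapper are all invented, not taken from the experiments.

```python
# Sketch: lag-0 vs lag-1 cross-correlation between stimulus IBIs and tap ITIs.
# A pure "tracker" reproduces the previous interval, so its ITIs correlate
# with the IBIs at lag 1, not at lag 0. Series are invented for illustration.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def xcorr_at_lag(ibis, itis, lag):
    # Correlate ITI[i] with IBI[i - lag]; lag 1 means taps echo the prior beat.
    if lag >= 0:
        return pearson(ibis[: len(ibis) - lag], itis[lag:])
    return xcorr_at_lag(itis, ibis, -lag)

ibis = [0.50, 0.55, 0.52, 0.60, 0.58, 0.54, 0.61, 0.57]
itis = [0.52] + ibis[:-1]  # tracker: each tap interval copies the previous IBI

print(round(xcorr_at_lag(ibis, itis, 1), 2))  # -> 1.0 (pure tracking)
print(round(xcorr_at_lag(ibis, itis, 0), 2))  # much lower: little prediction
```

A predicting tapper would show the opposite pattern: ITIs that covary with the concurrent IBIs at lag 0, even though those intervals have not yet been fully heard when the tap is produced.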

Figure 1.1: The first four measures of J.S. Bach’s Goldberg Variations.
The musical notation (A), metronomic piano roll (B), IBI plots at the
eighth-note metrical level (C), and performed piano roll (D), with metrical
hierarchies for each.

1.3 TEMPO FLUCTUATION IN MUSIC PERFORMANCE

A piece of music is never performed precisely as notated. Composers notate music to communicate a composition in written form. Based on the notation, performers pro-
duce motor movements interacting with a musical instrument, which produces sound.
Musicians produce intentional and unintentional tempo changes during performance
(Palmer, 1997) that highlight important aspects of musical structure (Clarke, 1987;
Shaffer and Todd, 1987; Sloboda, 1985; Todd, 1985) and convey affect and emotion
(Gabrielsson, 1995; Bhatara et al., 2010; Chapin et al., 2010; Sloboda and Juslin,
2001). There are three main types of structural relationships between temporal fluc-
tuations and music that have been identified and examined by several researchers:
grouping, meter, and melody. Although rhythms in performed music may be played
with tempo fluctuation, the underlying perceived metrical structure is maintained,
resulting in the perception of pulse and beats. This underlying temporal reference is
used to analyze the temporal fluctuations in performance.
Tempo means time in Italian; in music, tempo refers to the speed at which music
is performed. The tempo of a piece of music can be quantified by extracting the
time of each beat at a particular metrical level and taking differences to produce
a time series of inter-beat intervals (IBIs). An IBI plot shows the structure of the
tempo fluctuations, and an IBI time series allows for analyses that can characterize
these fluctuations. If the tempo of a rhythm is steady (mechanical, metronomic,
isochronous), each IBI will be equal, and the IBI plot will be a horizontal straight
line. If the tempo fluctuates, the IBIs will have different values. An example of the
IBI plot at the eighth-note metrical level for a mechanical (red) and an expressive
(blue) performance of the first four measures of Bach’s Goldberg Variations is shown
in Figure 1.1C. Time is on the x-axis and IBI is on the y-axis. There is a point on the

graph for each beat in the metrical hierarchy (B, D) with the exception of the first
beat. Each point indicates the time interval between the previous and current beats.
Increasing IBI indicates a greater amount of time between beats, and a deceleration
in tempo. Decreasing IBI indicates a shorter amount of time between beats, and
an acceleration in tempo. When the line is decreasing, the IBIs are smaller and the
tempo is getting faster.
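The IBI computation just described is a one-liner: given beat onset times at a chosen metrical level, successive differences give the IBI series, and each interval converts to tempo via bpm = 60/IBI, the conversion used in the figure captions. A minimal sketch, with beat times invented for illustration:

```python
# Sketch: inter-beat intervals (IBIs) and tempo from beat onset times.
# The beat_times below are invented; real values would come from a
# performance at a chosen metrical level.

beat_times = [0.00, 0.52, 1.06, 1.57, 2.12, 2.71]  # seconds (illustrative)

# Successive differences give the IBI series (one fewer point than beats).
ibis = [b - a for a, b in zip(beat_times, beat_times[1:])]

# Tempo in beats per minute for each interval: bpm = 60 / IBI.
bpm = [60.0 / ibi for ibi in ibis]

print([round(x, 2) for x in ibis])  # -> [0.52, 0.54, 0.51, 0.55, 0.59]
# The final, larger IBI (0.59 s) means the tempo is slowing at the end.
```

Plotting `ibis` against `beat_times[1:]` would reproduce an IBI plot of the kind shown in Figure 1.1C.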

1.3.1 Types of Structure

Music can be divided into successive temporal segments, or groups, allowing for de-
scription of temporal characteristics at different time scales (i.e. rhythmic groups,
measures, phrases, movements, pieces; Palmer and van de Sande, 1995). Grouping of
musical sequences into smaller sub-sequences is a common method of analyzing mu-
sic, for musicians, music theorists, and researchers (Lerdahl and Jackendoff, 1983).
Performers use tempo fluctuations to mark the boundaries between groups, or to increase the perceptual cohesiveness within groups (Palmer, 1992). A phrase is defined as "a short section of a composition into which the music seems to naturally fall. The art of phrasing by a performer is often instinctive and is one of the features by which a supreme artist may be distinguished from one of lesser inspiration" (Oxf, 2010).
It has been said that grouping at the level of the phrase is the most salient and
important in terms of musical expression (Palmer, 1997).
Phrases have been characterized as having an acceleration-deceleration pattern,
with more exaggerated decelerations at boundaries (Penel and Drake, 1998; Todd,
1985). This is called phrase-final lengthening and is also found in speech (Klatt,
1975). The amount of slowing at a boundary reflects the depth of embedding (Shaffer
and Todd, 1987; Todd, 1985). For example, the largest deceleration is typically at
the end of a piece. Figure 1.1A shows the phrase and the length of a measure with

curved lines above the musical notation and an IBI plot (C, blue) of a performance
of this notation. Note the increase in IBI at the end of the phrase.
Metrical structure is another aspect of music which performers highlight with
tempo fluctuation. Meter implies which beats should be strong and which beats
should be weak. Performers emphasize these distinctions with lengthened durations
when a note corresponds with a strong beat, delayed onsets of weak beats, and
smooth or detached articulation1 (Henderson, 1936; Sloboda, 1985; Palmer, 1992;
Repp, 1999c).
A common form of melodic accentuation in temporal fluctuation is termed melody
leading (Palmer, 1989; Palmer, 1996). In polyphonic music, multiple notes are written
to be played at the same time; this is called a chord and is illustrated in Figure 1.1A.
Typically, one note in a chord belongs to the melody (melodic voice), and performers
emphasize the melody by playing it sooner than the other notes in a chord (7-50 ms).
This is done even when each voice is played by a different person/instrument. Palmer
proposes that melody leads may serve to separate voices perceptually (Palmer, 1996),
and experiments with simple tone sequences show that listeners perceive tones that
are temporally offset as belonging to separate streams (Bregman and Pinker, 1978).
Similarities between pianists’ temporal fluctuations indicate that compositional
structure leads them to make these fluctuations. Researchers have examined the re-
lationship between structure and fluctuation by comparing different performances of
the same piece of music, asking pianists to play a piece of music with a different
interpretation, and having pianists sight-read music (playing music without having
seen it before). Pianists are able to replicate their fluctuations across performances,
with very low variability (Repp, 1999c; Shaffer and Todd, 1987; Henderson, 1936).
1. A small (smooth) or large (detached) amount of time between the offset of a note and the onset of the next note.

When asked to play a piece with a different interpretation, they are able to do so
without practicing (Palmer, 1989; Palmer, 1996). If the structure of a piece of music
is what dictates fluctuation, then pianists should not be able to imitate a perfor-
mance with inappropriate fluctuations. However, Clarke (1993) found that pianists
could accurately reproduce such performances. Repp analyzed 28 performances of Schumann's Träumerei and found that variability increased for grouping levels that were smaller than the phrase level (Repp, 1992b). Averaging the IBIs across performances revealed shared fluctuation patterns; this average is called a typical timing profile.
Penel and Drake (1998) also examined three musical performances and one mechanical performance from eight pianists performing the first eight measures of Träumerei
and found that hierarchical and rhythmic groups accounted for approximately 60%
of the variance for both musical and mechanical performances.
Moreover, musicians have difficulty playing in a perfectly metronomic, or mechani-
cal, fashion. When asked to play without tempo fluctuations, pianists’ fluctuations are
greatly reduced, but not absent (Palmer, 1989; Repp, 1998; Bengtsson and Gabriels-
son, 1983). These small, unintentional timing fluctuations are highly correlated with
the timing fluctuations that appear in intentionally expressive performances (r=.54)
from the same pianists (Repp, 1999c). In other words, the deviations from regularity
were reliable and similar among pianists. In another experiment, the pianists were
given a metronome to play with (Repp, 1999c). Pianists’ deviations were highly cor-
related with their previous metronomic performances (r=.57) and their expressive
performances (r=.32). Pianists fluctuate the tempo for mechanical performances in
a similar manner as for expressive performances, but to a lesser degree. For this rea-
son, it is believed that fluctuations present in mechanical performances are residual
expressive fluctuations that musicians are unable to suppress.

1.3.2 Origins of Temporal Fluctuation

Tempo fluctuations are thought to be the result of the musician’s conscious and sub-
conscious structural analysis of the score combined with emotional expression and
motor constraints (Repp, 1992a; Clarke, 1985; Clarke, 1987). Penel and Drake (1998)
hypothesized that fluctuations in tempo come from biomechanical constraints, oblig-
atory perceptual-motor patterns, and intentional communication of structural and
emotional aspects of the music. They proposed that both the high-level (expression)
processes, where musicians’ fluctuations are an attempt to communicate a specific
structural interpretation to the listener, and the lower-level (perceptual) processes,
where timing fluctuations are attributed to local analysis of surface acoustic features
of the music, contribute to the fluctuations in expressive performance. Only the
lower-level processes are proposed to contribute to the temporal fluctuations in a
mechanical performance (Penel and Drake, 1998). Penel and Drake’s (1998) results
suggest that the timing fluctuations that occur when musicians attempt to play in
a metronomic fashion compensate for perceptual distortions of temporal intervals
caused by the grouping structure of the music.
Musical characteristics such as phrases and metrical structure are proposed to
determine musicians’ patterns of temporal fluctuation, and fluctuations that cannot
be attributed to musical structure or emotional communication are either unmodelled
or dismissed as white noise (Palmer, 1997; Repp, 1992b). Experiments 1 and 3
explore the nature of tempo fluctuation in music performance through analysis of the
sequential aspects of timing fluctuations (IBIs) in piano performances. Nine complete
pieces of music, played by multiple expert pianists, were analyzed to examine whether
there are generalizable properties in tempo fluctuations that transcend individual
performers, composers, or styles of music.

1.4 1/f STRUCTURE

1/f, or fractal, structure is prevalent in physiological temporal fluctuations, such as
heart rate, and psychological fluctuations, such as reaction time (West and Shlesinger,
1989; West and Shlesinger, 1990; Gilden, 2001; van Orden et al., 2003). Fractal analy-
sis characterizes long-term correlation and self-similarity by taking sequential aspects
of the time series into account. The sequential aspects of fluctuations in fractal time
series are not statistically independent, but exhibit serial dependence or long-term
correlations. Moreover, fractal time series exhibit ever smaller amplitude fluctuations
at ever shorter time scales, a property referred to as self-similarity. 1/f structure has
been reported in a wide range of phenomena at different levels of organization, in-
cluding rhythmic behavior and aspects of musical structure. However, it has not been
established whether the tempo of music performance exhibits 1/f structure, nor have
any studies used musical patterns to gauge the long-term structure. Experiments 1
and 3 use ecologically valid, complete pieces of music as stimuli to establish whether
pulse in music performance exhibits 1/f structure. Experiments 2 and 4-7 use mu-
sic performances that contain large tempo fluctuations to investigate the structure
of taps, errors, and the relationship between the structure of the tapping data and the
structure of the music.
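For illustration (this sketch is not part of the dissertation's methods), a time series with approximately 1/f spectral structure can be synthesized by shaping the Fourier amplitudes of random-phase noise; the exponent beta controls the family, with beta = 0 giving white noise and beta = 1 giving 1/f noise:

```python
import numpy as np

def synthesize_one_over_f(n, beta=1.0, seed=0):
    """Generate a unit-variance series whose power spectrum falls off
    as 1/f**beta (beta = 0: white; beta = 1: 1/f; beta = 2: Brownian-like)."""
    rng = np.random.default_rng(seed)
    freqs = np.fft.rfftfreq(n)
    amps = np.zeros_like(freqs)
    amps[1:] = freqs[1:] ** (-beta / 2.0)   # power ~ amplitude**2 ~ 1/f**beta
    phases = rng.uniform(0.0, 2.0 * np.pi, size=freqs.size)
    series = np.fft.irfft(amps * np.exp(1j * phases), n=n)
    return series / series.std()

pink = synthesize_one_over_f(4096, beta=1.0)
```

Successive segments of such a series resemble rescaled copies of the whole, which is the self-similarity described above.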

1.5 ENTRAINMENT

1.5.1 Continuation tapping

Studies have used periodic motor behavior to probe the perception of pulse (reviewed
by Repp, 2005). Self-paced tapping, specifically continuation tapping, has been studied by many researchers to probe people's ability to perceive, internally estimate,
remember and reproduce time intervals (Wing and Kristofferson, 1973). Listeners

are presented with an isochronous auditory rhythm and asked to tap in synchrony
with each event (1:1 synchronization), and continue tapping at the same rate after
the stimulus stops. Subjects have little difficulty synchronizing with the stimulus and
maintaining a tapping rate that matches the period of the stimulus after it has ceased
(Stevens, 1886).
Linear timing models, called timekeeper models, use a central timekeeper to track
temporal intervals (Wing and Kristofferson, 1973). The timekeeper model is based on
two independent processes: an internal clock and a motor component. First, the tem-
poral intervals are perceived and encoded, which sets the internal clock (timekeeper).
The timekeeper facilitates the production of motor commands, which generate periodic temporal intervals. The properties of the central timekeeper are inferred from the produced inter-tap intervals (ITIs). Large (2008) cited this as evidence of pulse stability. Vorberg and Hambuch (1978) studied continuation tapping in which the synchronization portion was n:1 and found that the ITIs had recurring timing patterns reflecting the period of the IOI (inter-onset interval) induced by the preceding synchronization task, evidence of a persisting hierarchical metrical organization.
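The two-level timekeeper model can be made concrete with a small simulation (a sketch under the standard Wing and Kristofferson (1973) assumptions; the noise magnitudes are hypothetical). Each produced interval is a timekeeper interval plus the difference of successive motor delays, which predicts a negative lag-1 autocovariance of the intervals equal to minus the motor-delay variance:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20000
mean_interval = 0.5                                 # 500 ms target interval

# Two independent white-noise sources (Wing and Kristofferson, 1973).
clock = mean_interval + rng.normal(0.0, 0.02, n)    # timekeeper intervals C_k
motor = rng.normal(0.0, 0.01, n + 1)                # motor delays M_k

# Produced inter-tap interval: I_k = C_k + M_{k+1} - M_k.
itis = clock + motor[1:] - motor[:-1]

# The shared motor delay induces a negative lag-1 autocovariance,
# predicted to equal -var(M) = -(0.01)**2.
dev = itis - itis.mean()
lag1_cov = np.mean(dev[:-1] * dev[1:])
print(lag1_cov)
```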
Because continuation tapping is used as an index of time perception, most con-
tinuation tapping studies only collect tens of taps, and do not consider sequential
aspects of the ITIs or error time series. Recently, researchers collected considerably
longer sequences of continuation tapping (hundreds to thousands of intervals) and
found that the power spectrum of the ITIs (inter-tap intervals) is characterized by
a linear negative slope of log-power vs. log-frequency (e.g., Delignières et al., 2004;
Gilden et al., 1995; Lemoine et al., 2006; Madison, 2004; Chen et al., 2002), which
is typical of 1/f noise. In other words, while the continuation taps matched the
period of the stimuli, the variations (fluctuations) in the ITIs exhibited fractal struc-
ture. This 1/f structure violates the assumption that the central timekeeper is a

source of white noise (Wing and Kristofferson, 1973). Researchers assume that this
1/f structure represents the variability inherent to the internal timekeeping process
(Gilden et al., 1995; Delignières et al., 2004; Delignières et al., 2008) and propose
models in which the central timekeeper is a source of 1/f noise (Gilden et al., 1995;
Delignières et al., 2008). Delignières et al. (2008) showed that this model–which uses
a fractal timekeeper instead of an isochronous timekeeper, and Gaussian white noise
for error correction–accounted for the statistical properties observed in the ITIs of
self-paced tapping.
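The spectral criterion described above, a linear negative slope of log power versus log frequency, can be sketched with simulated data (illustrative only; a slope near 0 indicates white noise, a slope near -1 indicates 1/f structure):

```python
import numpy as np

def spectral_slope(x):
    """Slope of log10 power vs. log10 frequency for a time series.
    Near 0: white noise; near -1: 1/f structure; near -2: Brownian noise."""
    x = np.asarray(x, dtype=float)
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(x.size)
    keep = freqs > 0                                  # drop the DC bin
    slope, _ = np.polyfit(np.log10(freqs[keep]), np.log10(power[keep]), 1)
    return slope

rng = np.random.default_rng(1)
white_itis = 0.5 + rng.normal(0.0, 0.02, 2048)        # uncorrelated intervals
drifting = np.cumsum(rng.normal(0.0, 0.02, 2048))     # strongly correlated series
print(spectral_slope(white_itis))                     # near 0
print(spectral_slope(drifting))                       # steeply negative
```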

1.5.2 Synchronization

With externally paced tapping, or synchronization, people are asked to maintain a
constant phase relationship between their movements and the stimulus. Over the
past century, numerous studies have probed the coordination of periodic behavior
with periodic auditory sequences (for a recent review, see Repp, 2005). As with
continuation tapping, one way to view synchronization is in terms of the perception
of time, generation of temporal intervals, and addition of Gaussian white noise. The
addition of a linear error correction mechanism allows for motor synchronization
with an external stimulus (Vorberg and Wing, 1996; Schulze et al., 2005). Both
the timekeeper and the error correction processes were regarded as separate sources
of independent white noise (Wing and Kristofferson, 1973). In general, the linear
models need a dedicated component to keep the time (assumed to be an isochronous
pulse) in addition to the error correction mechanism (white noise) (Ivry and Schlerf,
2008; Wing and Kristofferson, 1973; Vorberg and Wing, 1996). Many studies have
provided evidence and support for this interpretation of the mechanisms for self-paced
and synchronization tapping (see Vorberg and Wing, 1996). An alternative is the class of oscillator models used in nonlinear dynamics, which do not parse synchronization

into perception, timing, and motor components, but are holistic models emphasizing
dynamical properties (Large and Kolen, 1994; deGuzman and Kelso, 1991). The
nonlinear dynamic models attempt to explain meter perception, temporal expectancy,
and rhythmic attending (Large and Palmer, 2002; Jones and Boltz, 1989; Large and
Jones, 1999).

1:1 Tapping The measures that are used to index synchronization behavior are
mean asynchrony and tapping variability. When tapping to an isochronous stimulus (in 1:1 tapping, one tap for each event), subjects consistently precede the stimulus rather than varying their asynchronies symmetrically about it; this is known as negative mean asynchrony (NMA; Woodrow, 1932). The variability of ITIs and asynchronies has been shown to depend on the rate of the stimulus and the amount of information between taps. When synchronizing with a
metronome (1:1 tapping) at intervals between 250 ms and 2000 ms, variability in interval perception and production increases with interval duration, following Weber's law; at
intervals larger than 2000 ms or smaller than 250 ms, variability increases dispro-
portionately (Michon, 1967). Recent studies have shown that the asynchronies, or
error time series, exhibit 1/f structure (Chen et al., 1997; Chen et al., 2002; Pressing
and Jolley-Rogers, 1997). However, the ITIs typically exhibit anti-persistence, which means the time series contain negative long-term correlations (Chen et al., 1997;
Chen et al., 2002). This finding is evidence against the timekeeper model, which
assumes that synchronization is a result of an isochronous timekeeper plus white
noise, or even local correction as a first-order auto-regressive process (Vorberg and
Schulze, 2002). Gilden (2001) proposed a model which used 1/f noise as a source
of variance in the timekeeper model. Temporal fluctuations were added to sequences
as perturbations of phase or tempo to test synchronization mechanisms. People can

adapt to these fluctuations, and it has been suggested that phase coupling and tempo
adaptation depend upon different mechanisms (Repp, 2001c; Thaut et al., 1998;
Thaut et al., 2009). People respond quickly and automatically to phase perturbations
of periodic sequences (Repp, 2001b; Repp, 2002b; Repp, 2003; Thaut et al., 1998),
and phase correction response profiles are nonlinear (Repp, 2002c).

1:n Tapping 1:n tapping means that there are n events between each tap. NMA
is significantly reduced or absent when the task changes to 1:n tapping (see Aschersleben, 2002; Repp, 2005). The perceptual hypothesis proposed by Wohlschlager and
Koch (2000) claims that NMA is a result of a perceptual underestimation of IOIs.
This is consistent with either timekeeper or oscillator models whose period is shorter
than the interval. In other words, this underestimation shortens the internal time-
keeper and makes the listener more likely to anticipate the tap following the silent
interval. Repp (2002c) found no NMA when interpolating three additional tones
between taps. Wohlschlager and Koch (2000) found a reduction in NMA when the ITI was subdivided by asking subjects to make an extra (non-contact) tap between taps, or when the IOI was subdivided with clicks randomly interspersed throughout. This evidence illustrates
the reduction of NMA when IOIs are subdivided by any sort of event (for a review,
see Repp, 2005). Variability also decreases when IOIs are shortened independently
of ITIs in 1:n tapping. This decrease in variability is referred to as the subdivision
benefit (see Repp, 2005).
When subjects were asked to tap antiphase with the stimulus (tapping between the sounds), Chen et al. (2001) found that the error time series again exhibited 1/f structure and the ITIs were anti-persistent. The slope was shallower for this task, indicating a whitening of the time series, and the scaling exponent became steeper when subjects changed their cognitive strategy for the antiphase tapping: thinking about extending the finger on the sound rather than making the tap between sounds, which has been shown to make the task easier. It is interesting to note that as a task gets harder, the spectral slope gets whiter (Grigolini et al., 2009).

Meter Meter may be an important factor in the perception of pulse. Large et
al. (2002) explicitly instructed subjects to synchronize at different metrical levels on
different trials with complex rhythms that contained embedded phase and tempo
perturbations. They observed that adaptation to perturbations at each tapping fre-
quency reflected information from other metrical levels. These results show that
synchronization to complex rhythms is not merely a process of error correction. This
suggests that listeners are simultaneously sensitive to multiple levels of temporal
structure.
In a study by Repp (2008b), subjects tapped on target tones of isochronous tone
sequences consisting of beats and subdivisions (1:n tapping). Phase perturbations at
subdivisions perturbed tapping responses despite the fact that both task instructions
and stimulus design encouraged listeners to ignore the perturbations. Moreover, sim-
ilar responses were observed when subdivisions were present throughout the sequence
and when subdivisions were introduced only in the cycle containing the perturbation.

Music NMA is significantly decreased or absent in studies in which complex rhythms
or music is used as the stimulus (Snyder and Krumhansl, 2001; Toiviainen and Snyder,
2003). Investigations on tapping to music (Dixon and Goebl, 2002; Repp, 1999c;
Repp, 2001a; Snyder and Krumhansl, 2001; van Noorden and Moelants, 1999) have
established that subjects can tap the beat of music and other rhythmically complex
patterns, and that subjects prefer temporal structures that have simple time ratios

(2:1, 3:1, 3:2). Drake et al. (2000) asked listeners to tap the pulse of musical excerpts
in varied Western tonal styles. Each stimulus was either mechanically synthesized
and accented, or expressively performed by a concert pianist. Results confirmed
that musicians and non-musicians are readily able to coordinate with temporally
fluctuating musical performances. Entrainment with expressive versions occurred at
slower frequencies within a narrower range of synchronization levels and corresponded
more frequently to the theoretically correct metrical hierarchy.
With mechanical excerpts of Chopin's Preludes No. 15 and No. 6, both perception and synchronization measures exhibited consistent patterns of fluctuation across
trials and subjects, reflecting not only phrase structure but also metrical structure
(Repp, 1999c; Repp, 1999b).

1.6 MODELS OF PULSE AND METER

Existing models of pulse and meter in the literature can be broadly classified as a)
linear models with explicit timekeeper(s) and b) nonlinear models that use nonlinear
oscillators with intrinsic dynamics as their basis.
The timekeeper models utilize information processing approaches to predict the
statistical properties of the time series being modelled. With the help of a cen-
tral timekeeper, these linear models are able to track temporal intervals (Wing and
Kristofferson, 1973). Oscillator models, on the other hand, use nonlinear systems
with intrinsic dynamical properties as the basic component. Even in the absence
of external input, they exhibit oscillations. Since the components themselves exhibit periodic behavior, they are ideally suited to model periodically fluctuating phenomena. In addition, oscillator models share certain universal mathematical properties, such as multifrequency entrainment to external stimuli (Large and Kolen, 1994;

deGuzman and Kelso, 1991). The entrainment can be qualitatively characterized us-
ing models such as the sine circle map (Pikovsky et al., 2001). The circle map predicts
the relative phase between the external stimulus and the intrinsic oscillation of the
oscillator.
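The sine circle map has a standard closed form (see Pikovsky et al., 2001); the parameter values below are illustrative:

```python
import numpy as np

def circle_map(theta, omega, k):
    """One iteration of the sine circle map: the relative phase theta
    (in cycles) advances by the frequency ratio omega and is pulled
    back by sinusoidal coupling of strength k."""
    return (theta + omega - (k / (2.0 * np.pi)) * np.sin(2.0 * np.pi * theta)) % 1.0

# With omega near an integer and sufficient coupling, iterates settle
# on a fixed relative phase: 1:1 entrainment (mode locking).
theta = 0.3
for _ in range(500):
    theta = circle_map(theta, omega=1.02, k=1.0)
print(theta)   # a fixed relative phase near 0.02
```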
How do these models deal with tempo fluctuations in music? Neither phase entrainment nor error correction alone provides sufficient temporal flexibility to accommodate large changes in tempo. Therefore, a class of oscillator models known as adaptive oscillator models includes tempo adaptation as a parameter dynamics, capturing perception and attention in a temporally flexible way that can adapt to temporal fluctuations in music (e.g., Large and Kolen, 1994; McAuley, 1995; Large, 2008). There are also linear error-correction models: interval-based models, which use linear interval correction to account for differences between the timekeeper intervals and the stimulus intervals (Mates et al., 1994; Mates, 1994), and asynchrony-based models, in which linear period corrections are based on deviations of taps from stimulus events (Schulze et al., 2005).
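A minimal sketch of such linear correction (hypothetical parameter values; the cited models differ in detail): each tap is advanced by the current period, and both the phase and the period are corrected by a fraction of the last asynchrony.

```python
import numpy as np

def synchronize(onsets, period0, alpha=0.5, beta=0.1):
    """Tap times from linear phase (alpha) and period (beta) correction
    against a sequence of stimulus onset times."""
    taps = [onsets[0]]
    period = period0
    for k in range(1, len(onsets)):
        asyn = taps[-1] - onsets[k - 1]      # asynchrony on the previous event
        period -= beta * asyn                # period (tempo) correction
        taps.append(taps[-1] + period - alpha * asyn)   # phase correction
    return np.array(taps)

# Stimulus with a step tempo change: 0.5 s intervals, then 0.6 s intervals.
onsets = np.concatenate([0.5 * np.arange(20), 10.0 + 0.6 * np.arange(20)])
taps = synchronize(onsets, period0=0.5)
print(taps[-1] - onsets[-1])   # asynchrony shrinks again after the change
```

With these illustrative gains, the asynchrony jumps at the tempo change and then decays back toward zero as the period converges on the new interval.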
These models all predict that people track tempo fluctuations based on the assumption of a steady pulse and a mechanism for period adaptation/error correction (see Large, 2008). Large and Kolen (1994) described the process of tracking complex events as nonlinear oscillations that entrain to event periodicities at various time
scales. Large and Palmer (2002) showed that nonlinear oscillators can track tempo
changes successfully and that deviations from temporal expectancies (embodied in
the oscillations) could be successfully used to discern the structural interpretations
(phrase and melody) intended by the performers. When stimulated with temporally
fluctuating rhythms, internally coupled oscillations (with metrically related frequen-
cies) are more resilient than single oscillators in tracking temporal fluctuations such
as rubato. These predictions have recently been evaluated with musical stimuli and

more complex rhythms. The findings of Large and Palmer (2002) suggest that per-
ception of temporal regularity in complex musical sequences is based on temporal
expectancies that adapt in response to temporally fluctuating input. Adaptation of
oscillator frequency happens in response to changes in stimulus tempo.

1.6.1 Tempo Tracking

Honing (2006) has provided an analysis suggesting that perceptual limitations on tracking ability may be taken into account as performers shape temporal fluctuations.
A number of findings support tempo tracking (Michon, 1967; Thaut et al., 1998).
Michon (1967) studied subjects’ timing responses to temporally fluctuating stimuli.
Continuous random variability was introduced into the sequence, and the subjects’
ITIs echoed the pattern at a positive lag of one. Tracking was also observed with
sinusoidally modulated sequences as stimuli (Michon, 1967; Thaut et al., 1998). How-
ever, these fluctuations were small and the adaptations could be a reflection of phase
dynamics. Dixon et al. (2006) asked listeners to rate the correspondence of click
tracks to musical excerpts and, on different trials, to tap along with the excerpts.
Smoothed click tracks were always preferred over unsmoothed, and subjects’ taps
had smoother tempo curves than the IBIs calculated from the note onsets. The
data suggested that in performed music, perceived pulses did not coincide precisely
with onsets of sounded events. Listeners heard smooth tempo changes instead, such
that some events were early and others late with respect to perceived pulse. This
observation is consistent with tempo tracking dynamics that have been proposed
for nonlinear oscillators (Large and Kolen, 1994). Such findings are consistent with
the hypothesis of a network of oscillators of different frequencies, coupled together
in the perception of a complex rhythm (cf. Large, 2000; Large and Jones, 1999;
Large and Palmer, 2002). However, these observations have been limited to complex

rhythms, and this dissertation extends these experiments to entire musical perfor-
mances.

1.6.2 Tracking vs. Prediction

There is also some evidence for prediction of temporal fluctuations. Linear models
were not able to account for subjects’ behavior during smooth accelerations or smooth
decelerations in tempi of a monotonic sequence of tones (Schulze et al., 2005). Schulze
et al. (2005) suggest that subjects predict temporal fluctuations. If the stimulus
fluctuates, and subjects are reacting to the fluctuations, then they will track the
onsets. However, if subjects are able to synchronize or entrain with the stimulus
events then they are predicting when the next onset will occur in order to coordinate
their motor behavior with it.
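The distinction can be made concrete with a simulation (illustrative; the idealized "tracker" and "predictor" here are this sketch's constructions, not models from the cited studies): a pure tracker reproduces each stimulus interval one event late, a pure predictor reproduces it on time, so their cross-correlations with the stimulus IBIs peak at lag 1 and lag 0, respectively.

```python
import numpy as np

rng = np.random.default_rng(7)
ibis = 0.5 + rng.normal(0.0, 0.03, 1000)    # fluctuating stimulus intervals

tracker = np.empty_like(ibis)               # reacts: echoes the previous interval
tracker[0] = 0.5
tracker[1:] = ibis[:-1]
predictor = ibis.copy()                     # anticipates: matches each interval

def xcorr(a, b, lag):
    """Pearson correlation between a[t] and b[t - lag]."""
    if lag > 0:
        a, b = a[lag:], b[:-lag]
    return np.corrcoef(a, b)[0, 1]

print(xcorr(tracker, ibis, 1))    # near 1: the lag-1 signature of tracking
print(xcorr(tracker, ibis, 0))    # near 0
print(xcorr(predictor, ibis, 0))  # near 1: the lag-0 signature of prediction
```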
In an expressive performance of a Chopin Etude, Repp (2002a) found strong lag 0
cross-correlation between listeners’ ITIs and the IBIs, showing that subjects predict
tempo changes. The results showed that synchronization with expressively timed
music was better than synchronization with a monotonic sequence that mimicked the
expressive timing pattern or synchronization with music that followed a structurally
inappropriate (phase-shifted) expressive timing pattern. Prediction performance in-
creased across trials for music but not for the series of clicks that mimicked the
expressive timing pattern, suggesting that musical information provided a structural
framework that facilitated pattern learning. Repp’s study emphasized the importance
of musical information beyond onset timing.
Repp (2002a) considered both perception of timing and synchronization of taps
with a mechanical excerpt of Chopin’s Etude in E major, Op.10, No.3., sequenced
on a computer with precise note durations and an isochronous tempo. In perception
experiments, accuracy of time change detection exhibited a consistent pattern, even

though the music was metronomic. In synchronization experiments, accuracy also
exhibited a consistent pattern across trials and subjects; moreover, synchronization
accuracy profiles correlated strongly with detection accuracy profiles. Listeners’ en-
dogenous temporal fluctuations reflected mainly the metrical structure of the music.

1.7 SUMMARY AND PERSPECTIVE

The theoretical question is whether pulse is isochronous and fluctuations in tempo are deviations, so that people track the fluctuations while utilizing non-temporal musical information, or whether tempo fluctuations are intrinsic to pulse and therefore expected, allowing them to be predicted. This dissertation addresses the fundamental nature of pulse in music perception and performance. The assumption that
people merely track, or follow, changes in tempo is also discussed. Pulse is the per-
ceived periodicity of a rhythm. Pulse in musical performance is not purely periodic,
but fluctuates in time. What is the nature of these fluctuations, and how do they change with different compositions and performers? Experiment 1 investigated the
characteristics of pulse via the nature of tempo fluctuations produced by a musician in
piano performance. Patterns of temporal fluctuation in music performances were an-
alyzed to identify long-range correlation and fractal scaling. Experiment 3 examined
whether the 1/f structure in tempo fluctuation generalized to other pianists.
Performed music gives rise to a pulse to which listeners entrain, as can be demonstrated through motor coordination tasks. Fractal structure has been found in both continuation tapping and synchronization with a metronome. Studies in which subjects synchronize with an unpredictably fluctuating stimulus show that people track the changes at a lag of one (e.g., Thaut et al., 1998). Models attempting to capture
this behavior assume that tempo fluctuations in music, or any external stimulus, are

violations of expectancy. These models assume that expectancy is an underlying
isochronous pulse. However, if subjects are capable of tapping in synchrony with the
pulse of a fluctuating piece of music, then they are not only reacting to changes, but
are also predicting the fluctuations. Experiment 2 examined the question of whether
subjects could entrain to large temporal fluctuations and, if so, track or predict these
changes. Musical performances with tempo fluctuations and mechanical versions of
the same music were used as stimuli. Additionally, the role of meter in pulse was investigated by asking subjects to tap the beat at two different metrical levels on different trials; the focus was whether subjects were tracking or predicting. Experiments 4-7 address
the question of whether 1/f structure is sufficient and/or necessary for coordination
with stimuli containing large temporal fluctuations, and the effect of varying amounts
of information on ability to predict fluctuations.

Chapter 2

Fractal Fluctuation and Pulse Prediction

Long-range correlation and fractal scaling are characteristic of fractal temporal pro-
cesses (Mandelbrot, 1977), and have been observed in many natural systems (Chen et
al., 1997; Dunlap, 1910; Hurst, 1951; Mandelbrot and Wallis, 1969; van Hateren, 1997;
Yu et al., 2005). Long-term correlations of a persistent nature are observed in a time
series when the adjacent values are statistically dependent. Fractal scaling means that
measured properties of the time series depend upon the resolution of the measure-
ments, and can be seen in a scaling function, which describes how the values change
with the resolution at which the measurement is done. Both long-range correlation
and fractal scaling imply self-similarity: the parts resemble the whole. However,
the structure of tempo fluctuations in musical performance has not been analyzed in
this way. Some researchers explain temporal fluctuations in music as resulting from
grouping or metrical structure of the piece (for a review, see Palmer, 1997). But musical structure does not account for all of the tempo fluctuations. Previously, fractal structure has been found in other musical properties, including interval frequency
fluctuation (melody) and amplitude fluctuation (loudness) (Hsü and Hsü, 1990; Hsü
and Hsü, 1991; Voss and Clarke, 1975). The presence of fractal structure in biological, psychological, and musical processes led to the hypothesis that tempo fluctuations in music would also exhibit similar fractal statistics, and that this structure is an important

aspect of pulse perception, entrainment, and aesthetics.
Experiment 1 investigated whether long-range correlation and fractal scaling were
present in the tempo fluctuations of expressive music performance. Such analyses
require long time series (Delignières et al., 2006); thus entire performances, including hundreds of beats, were recorded and analyzed. Performances of four pieces of music
from different styles that contained different rhythmic and structural characteristics
were collected to assess whether such structures would be found across musical styles.
Fractal analyses (Bassingthwaighte et al., 1994; Feder, 1988) were calculated on the inter-beat intervals (IBIs) from multiple metrical levels to look for long-range correlation and fractal scaling of tempo fluctuations.
The aim of Experiment 2 was to understand how people adapt to naturally fluc-
tuating tempi in music performance and whether their tapping data contain fractal
structure. Subjects were asked to tap the beat to two of the performances from Ex-
periment 1 and mechanical versions (no tempo fluctuations) of the same two pieces.
Subjects performed the tapping task for two different metrical levels (quarter- and
eighth-note).

2.1 EXPERIMENT 1: FRACTAL STRUCTURE IN PIANO PERFORMANCE

2.1.1 Method

Stimuli The stimuli consisted of four musical compositions: (1) Aria from Goldberg
Variations, by J. S. Bach; (2) Piano Sonata No. 8 in C minor Op. 13, Mvt. 1, by
Ludwig van Beethoven (measures 11-134); (3) Etude in E major, Op. 10, No. 3 by
Frédéric Chopin; and (4) I got rhythm by George Gershwin (see Appendix D). The
pieces were chosen as exemplars of different musical styles: baroque (Bach), classical

(Beethoven), romantic (Chopin), and jazz (Gershwin). These styles differ in mean-
ingful ways, including rhythmic characteristics, such as, level of syncopation, absolute
tempo, and amount of tempo fluctuation. Selecting pieces from diverse musical styles
spanning several centuries of keyboard music allowed for a test of generalizability
to multiple musical styles and composers. Figures 2.1–2.4 show the performances of
each piece as piano roll notation and a tempo map (beats per minute=60/IBI). Note
that each piece has a distinct temporal structure. Baroque music (Bach, Figure 2.1) is
typically performed with little tempo variation, or rubato, while other styles (Chopin,
Figure 2.3) are performed with much tempo variation. There are also differences in
note density, grouping structure, rhythmic structure, harmonic structure, dynamics,
and tonality.

Task A piano performance major (female, 22 years old) from The Harid Conserva-
tory (www.harid.edu) with 20 years of musical training was paid $100 to prepare all
four pieces as if for a concert, including natural tempo fluctuations. The pianist was
instructed not to use any ornamentation or add notes beyond what was written in
the score. The pieces were recorded on a Kawai CA 950 digital piano that records
the timing, key velocity, and pedal position via MIDI1 technology. The pianist was
allowed to record each piece until she was satisfied with a performance, and then
chose the best performance, which was analyzed as described next.

Beat extraction Each performance was matched to its notation, or score, using
a custom dynamic programming algorithm (Large, 1992; Large and Rankin, 2007;
1. Musical Instrument Digital Interface

Figure 2.1: Notation, tempo, and dynamics of the expressive performance of J.S. Bach's Goldberg Variations, Aria. Top panel: Piano roll showing onset time, pitch, duration, and velocity (color). Bottom panel: Tempo map showing beats per minute (bpm = 60/IBI).
Figure 2.2: Notation, tempo, and dynamics of the expressive performance of Piano Sonata No. 8 in C minor Op. 13, Mvt. 1, by Ludwig van Beethoven. Top panel: Piano roll showing onset time, pitch, duration, and velocity (color). Bottom panel: Tempo map showing beats per minute (bpm = 60/IBI).
Figure 2.3: Notation, tempo, and dynamics of the expressive performance of Etude in E major, Op. 10, No. 3 by Frédéric Chopin. Top panel: Piano roll showing onset time, pitch, duration, and velocity (color). Bottom panel: Tempo map showing beats per minute (bpm = 60/IBI).
Figure 2.4: Notation, tempo, and dynamics of the expressive performance of I Got Rhythm by George Gershwin. Top panel: Piano roll showing onset time, pitch, duration, and velocity (color). Bottom panel: Tempo map showing beats per minute (bpm = 60/IBI).
Figure 2.5: The matcher window after each note in the Bach performance was matched to a note in the score. The notes in blue represent notes in the performance (bottom) that have been matched to a chord in the score (top).
see Figure 2.5). Chords consisted of notes which were written to occur at the same
time, according to the musical score. Notes in the performance were grouped into
chords based on notes that were grouped into chords in the notation. The onset
time for each chord was defined to be the average of all note onset times in the
chord; this was done to account for chord asynchronies which are common in human
piano performances.2 Beats were calculated by designating each chord as one event.
Beat times were extracted at three metrical levels (sixteenth-note, eighth-note, and
quarter-note3 ). Beats to which no event corresponded were interpolated using local
tempo. In other words, for every event there is a corresponding beat time, but not
every beat–at one particular metrical level (i.e., sixteenth-note)–has a corresponding
event. IBIs were calculated by subtracting successive beat times, providing an IBI
time series for each performance. The temporal structure of each entire musical
performance was analyzed at different time scales (different metrical levels) using the
methods described below.
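The chord-grouping and beat-interpolation steps above can be sketched in a few lines (an illustrative Python fragment, not the dissertation's matching code; the function names and toy onset times are hypothetical, and linear interpolation between surrounding events stands in for "local tempo"):

```python
import numpy as np

def chord_onsets(note_onsets_by_chord):
    """Collapse each score-defined chord to a single event time: the mean
    of its matched note onsets, which absorbs chord asynchronies."""
    return np.array([np.mean(c) for c in note_onsets_by_chord])

def beat_times(event_times, event_beats, all_beats):
    """Assign a time to every beat at one metrical level.  Beats with no
    corresponding event are filled in from the local tempo (here, linear
    interpolation between the surrounding events)."""
    return np.interp(all_beats, event_beats, event_times)

# Toy example: events fall on beats 0, 1, 2, 4; beat 3 has no note onset.
onset = chord_onsets([[0.00, 0.02, 0.04]])[0]        # mean onset of one chord
times = beat_times([0.0, 0.5, 1.0, 2.1], [0, 1, 2, 4], np.arange(5))
ibis = np.diff(times)                                # IBI series from beat times
```

Successive differences of the interpolated beat times yield one IBI series per metrical level, which is the input to the analyses that follow.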

Fractal Analysis Sequential aspects of temporal fluctuations were considered with the goal of determining whether these fluctuations were random and independent
or if they exhibited long-term correlations and fractal scaling. Fractal analysis was
used to identify self-similarity and a power-law scaling relationship in the time series.
Self-similarity means that pieces of fractal objects are similar to the whole. When
similarity is present between statistical populations of observations of a specific feature
made at different scales, there is statistical self-similarity. This means that the scaling
2. Pianists do not always play the notes of a chord at exactly the same time. This is defined as chord asynchrony and can be the result of intentional or unintentional fluctuations.
3. The Beethoven piece was composed of running eighth-notes. Due to the fast tempo of the performance, in order to compare it with the other three pieces, the notated eighth-note, quarter-note, and half-note levels of the Beethoven performance were extracted and are referred to as sixteenth-, eighth-, and quarter-note levels, respectively.

is identical in all directions. However, fractal time series have statistical self-affinity,4
which means that the scaling is not identical for all directions. Proportions between
the enlarged pieces in one direction are different from those in another. The power-law scaling
relationship is the hallmark of fractal structure. When a property, q, is measured in
quantities of s, its value depends on s according to the relationship

q = f(s). (2.1)

When an object is non-fractal, q will converge to a single number with smaller units
of measure, s. For a fractal, q exhibits a power-law scaling relationship with s. As s
decreases, q increases without any limit5

q = p·s^η (2.2)

where p is the factor of proportionality and η is a negative scaling exponent. η is calculated as the slope of the linear regression fit to the data on a plot of log q by log s:

log q = log p + η log s (2.3)

Long-range correlations and scaling in the IBI time-series were assessed with both
a Power Spectral Density (PSD) analysis and a rescaled range analysis (Bassingth-
waighte et al., 1994; Feder, 1988; Rangarajan and Ding, 2000; see Appendix B).
These two types of analyses were chosen for specific reasons outlined below.

Signal Classification It is necessary to classify signals as either fractional Gaussian noise (fGn; i.e., β < 1) or fractional Brownian motion (fBm; i.e., β > 1) before
the application of fractal analyses (Eke et al., 2000). The reason is that one must use
different methods to analyze the fractal structure of a stationary time series (fGn)
4. When self-similarity is referred to in this dissertation, self-affinity is technically what is intended.
5. Fractals in nature have an upper and lower boundary.

than one would use to analyze non-stationary signals (fBm). A stationary signal has
constant descriptive statistics over time (i.e., mean, variance, and correlation struc-
ture), whereas non-stationary signals have variance that increases or decreases with
time. An fGn signal can be cumulatively summed to yield an fBm signal, and vice
versa. PSD is the preferred method for distinguishing between fGn and fBm series
because it can be used to calculate the spectral slope for signals that are fGn or fBm
(for a review, see Eke et al., 2000; Eke et al., 2002).
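The relationship between the two signal classes can be illustrated directly (a Python sketch in which ordinary white noise stands in for fGn; the seed and series length are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
fgn = rng.standard_normal(512)    # a stationary noise series (fGn-like)
fbm = np.cumsum(fgn)              # cumulative summation yields a motion (fBm-like) series
recovered = np.diff(fbm)          # differencing recovers the original increments
```

The summed series is non-stationary (its variance grows with time) even though its increments are stationary, which is why the two classes require different analysis methods.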
Another issue to consider when choosing analyses is the length of the signal. It has
been argued that short (i.e., N < 1000) time series cannot give reliable results with
fractal analysis (Eke et al., 2002; Bassingthwaighte et al., 1994). However, Delignières
et al. (2006) show that when there are at least 128 data points, the fractal structure
can be accurately estimated if the correct methods are used. Delignières et al. (2006)
found that for short time series the Power Spectral Density (PSD) is the best method
for classifying signals as fGn or fBm. Admittedly, the longer the time series, the less
variable the results will be.

Power Spectral Density The PSD is a method which characterizes power-law scaling in the frequency domain. It is related to the autocorrelation in that the Fourier
transform of the autocorrelation yields the power spectrum of a signal (see Appendix
B; Beran, 1994). The PSD analysis is based on the periodogram of the FFT.

S(f) ∼ f^−β (2.4)

Power, S(f), was plotted against frequency, f, on a log-log plot. β was estimated by calculating the negative slope of the line relating log(S(f)) to log(f), using linear regression. A straight line on a log-log plot suggests that the spectral density, S(f), scales with frequency, f, as a power law, S(f) ∼ f^−β or 1/f^β. A time series is

32
considered to have long-range correlation when β is different from zero (Malamud
and Turcotte, 1999).
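A minimal version of this estimate might look as follows (an illustrative Python sketch of the periodogram-regression idea, not the analysis pipeline actually used here; the function name is ours):

```python
import numpy as np

def spectral_beta(x):
    """Estimate the spectral exponent beta: regress log S(f) on log f over
    the periodogram, so that S(f) ~ 1/f**beta.  beta < 1 would classify the
    series as fGn, beta > 1 as fBm (Eke et al., 2000)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    freqs = np.fft.rfftfreq(len(x), d=1.0)[1:]        # drop the DC bin
    power = np.abs(np.fft.rfft(x))[1:] ** 2 / len(x)  # periodogram
    slope, _intercept = np.polyfit(np.log(freqs), np.log(power), 1)
    return -slope                                     # beta = -(spectral slope)

white = np.random.default_rng(1).standard_normal(1024)
beta = spectral_beta(white)        # near 0: white noise has a flat spectrum
```

Applying the same function to the cumulative sum of the noise yields an exponent well above 1, illustrating the fGn/fBm boundary at β = 1.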

Hurst's Rescaled Range analysis One reason that a second analysis should be used is that the PSD can be weak in estimating the true spectral exponent of a time series (Eke et al., 2000). Thus, a second, theoretically equivalent analysis was performed
using Hurst’s rescaled range (R/S) analysis (Mandelbrot and Wallis, 1969; Hurst,
1951; Hurst et al., 1965). The R/S analysis is the range, R, of cumulative deviations
normalized by the standard deviation, S, as a function of length of the signal. This
analysis characterizes temporal dependence by looking at how the range of cumulative
fluctuations depends on the length of the subset of data analyzed (see Appendix B).
The R/S analysis yields a parameter, H, as a measure of fractal dimension. H is
theoretically related to β by the identity

HfGn = (β + 1)/2, HfBm = (β − 1)/2 (2.5)
H can assume any value between 0 and 1 for either fGn or fBm, which is why it is important to report the class of the signal along with H. When H = .5, the points in the time-series are uncorrelated and independent. When H ≠ .5, each increment is statistically dependent on the previous increment. The job
of the Hurst exponent is to characterize this dependence. When 0 < H < 0.5 the
self-similar correlations are anti-persistent: increases at any one time are more likely
to be followed by decreases over all later time scales. When 0.5 < H < 1, the
self-similar correlations at all time scales are persistent: increases at any one time
are more likely to be followed by increases over all later time scales. Most natural
fractals are persistent (Bassingthwaighte et al., 1994; Feder, 1988).
Statistical significance of the parameter H was obtained by performing the anal-
ysis on 1000 runs of the shuffled data (randomly ordered versions of the same data

set) and comparing the results. Shuffling the data eliminates correlational structure
and yields a result near H = .5. Both the spectral method and the R/S method
can be susceptible to artifacts (e.g., in estimation of the spectral slope), such that
reliance on either method in isolation can lead to faulty conclusions. Therefore, both
methods were used and convergence was required to establish long-range correlations
(Rangarajan and Ding, 2000).
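The R/S procedure and the shuffling baseline can be sketched as follows (illustrative Python following the textbook algorithm; the window sizes and non-overlapping segmentation are our assumptions):

```python
import numpy as np

def hurst_rs(x, sizes=(8, 16, 32, 64, 128)):
    """Hurst exponent via rescaled range: for each window size, average R/S
    over non-overlapping windows, then take the slope of log(R/S) against
    log(window size)."""
    x = np.asarray(x, dtype=float)
    mean_rs = []
    for s in sizes:
        ratios = []
        for i in range(0, len(x) - s + 1, s):
            w = x[i:i + s]
            z = np.cumsum(w - w.mean())       # cumulative deviations
            r = z.max() - z.min()             # range R
            sd = w.std()                      # standard deviation S
            if sd > 0:
                ratios.append(r / sd)
        mean_rs.append(np.mean(ratios))
    slope, _ = np.polyfit(np.log(sizes), np.log(mean_rs), 1)
    return slope                              # H

rng = np.random.default_rng(0)
series = rng.standard_normal(1024)
h = hurst_rs(series)                            # uncorrelated data: roughly .5
h_shuffled = hurst_rs(rng.permutation(series))  # shuffling also gives ~.5
```

For uncorrelated input both estimates fall near H = .5 (the R/S method is known to be biased slightly high for short series); a persistent series would separate clearly from its shuffled counterpart.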

2.1.2 Results

Results of the beat extraction process are shown in Table 2.1 for each of three metrical
levels. The shortest time-series yielded 143 events (Gershwin, quarter-note level)
while the longest time-series yielded 1945 events (Beethoven, sixteenth-note level).
Thus, each time-series had a sufficient number of data points for fractal analysis
(Delignières et al., 2006). Most beats corresponded to note onsets and others were
inserted in the time-series using local tempo. At the eighth-note level, a total of
20 beats (5.2%) were added to the Bach, 2 beats (0.7%) to the Chopin, 10 beats
(1.5%) to the Beethoven, and 15 beats (5.2%) to the Gershwin.6 More beats were
added at the sixteenth-note level, while fewer were added at the quarter-note level.
At the eighth- and quarter-note levels, the IBI time-series for the first five measures of the Chopin was correlated with Repp's (1998) typical timing profile for this piece, r(16) = .86, p < .001 (eighth-note), r(7) = .54, p = .13 (quarter-note), illustrating that for measures 1-5, this pianist produced fluctuations similar to those in the professional pianists' recordings analyzed by Repp (1998).
Spectral density plots for the eighth-note level IBI time series are shown in Figure 2.6 (A-D).
6. Analysis of a created fractal time series containing the same variance and number of points showed no effect on estimates of β and H following removal of the same number of points as in the experiment.

Figure 2.6: Spectral density (A-D) and rescaled range (E-H) analyses of the IBIs at the eighth-note level of the performances for Bach (A, E), Beethoven (B, F), Chopin (C, G), and Gershwin (D, H). Fitted exponents: Bach, β = 0.42, H = 0.73; Beethoven, β = 0.73, H = 0.71; Chopin, β = 0.63, H = 0.92; Gershwin, β = 0.73, H = 0.75 (all H significant at p < .001 versus shuffled data).

Table 2.1: Number of events and mean H for the IBIs of each performance at three metrical levels. The mean tempo for each piece is listed in the last column.

Piece       Number of Events         HfGn                 Mean Tempo
            1/16    1/8    1/4       1/16   1/8    1/4    (1/4-note IBI)
Bach         767    384    192       .68    .73    .76    1259 ms
Beethoven   1945    973    487       .69    .71    .72     487 ms
Chopin       611    306    153       .90    .92    .94    1383 ms
Gershwin     572    286    143       .75    .75    .76     626 ms

For each performance, log power decreased with log frequency in a
manner consistent with a 1/f power-law distribution. Using linear regression, all
slopes were estimated to be different from zero and less than one, which classified the
performances as fGn. R/S analyses were calculated and H agreed with β according
to

HfGn = (β + 1)/2 (2.6)
The results for the R/S analysis are shown in Figure 2.6 (E-H). All Hurst coeffi-
cients were significantly greater than the shuffled data (p < .001), which is indicative
of persistent (non-random) processes that can be characterized as fractal. Thus, the
PSD and R/S analyses revealed long-range dependence.
Next, PSD and R/S analyses were applied to IBIs from the three metrical levels
(sixteenth-, eighth-, and quarter-note). All results yielded β < 1 and R/S analyses
were also calculated. Table 2.1 provides H values from each level for each piece. The
R/S analysis exhibits a slightly increasing H value at each level. This is expected,
and it indicates that the amount of structure in the temporal fluctuations does not

change with metrical level. This result is considered to be a clear indication of fractal
scaling (Malamud and Turcotte, 1999). The meaning of this result is underscored in
Figure 2.12, which shows tempo maps for the Chopin at each of the three metrical
levels for visual comparison. The same structure is apparent, regardless of the time
scale at which the process is measured.

2.1.3 Discussion

The analysis of IBIs revealed both long-term correlation and fractal scaling. Long-
term correlation–persistence–means that fluctuations are systematic, such that in-
creases in the tempo tend to be followed by further increases, and decreases followed
by further decreases. Moreover, IBIs early in the time series are correlated with IBIs
found much later in the time series, implying structure in the performer’s dynamic
expression of tempo. Although differences between the musical styles were apparent,
all were significantly fractal, suggesting that this finding generalizes across musical
and rhythmic styles. Similar H values at each metrical level (Table 2.1) were found,
due to similar fluctuations at each time scale.
The finding of fractal structure is considered evidence against a central time-keeper
mechanism (cf. Madison, 2004), a functional clock that produces near isochronous
intervals with stationary, random variability (e.g., Vorberg and Wing, 1996). Admit-
tedly, our pianist did not intend to produce isochronous intervals. However, if the
intention was to play the same pieces without tempo fluctuation, the timing profiles
would likely correlate with those of the expressive performances (Penel and Drake,
1998; Repp, 1999a), and if so, they would display similar fractal structure. Chen et
al. (2001) proposed that long-range dependence in timing fluctuations is the outcome
of distributed neural processes acting on multiple time scales.
As discussed above, there are important relationships between music structure

(such as phrasing patterns) and patterns of temporal fluctuation (see Palmer, 1997).
Fractal structures also include embedded regularities (i.e., scaling), so the types of reg-
ularities we observe here are not different from previously observed patterns; rather,
they represent a different approach to measuring music structure. This approach fa-
cilitates the analysis of long performances, and does not require measurement of–or
correlation with–other aspects of music structure. However, such correlations may
be assumed to exist based on previous studies. Additionally, we found correlation
with Repp’s typical timing profile for the beginning of the Chopin, suggesting that
the results from this study would generalize to other professional pianists. More-
over, 1/f distributions have been shown along other musical dimensions, including
frequency fluctuation (related to melody), for a wide variety of musical styles includ-
ing classical, jazz, blues, and rock (Voss and Clarke, 1975). Thus, the measurement
of structure along one dimension may reflect structure along other dimensions. The
effect of 1/f structure of performance tempo on temporal coordination with expres-
sively performed rhythms was considered in Experiment 2.

2.2 EXPERIMENT 2: SYNCHRONIZING WITH PIANO PERFORMANCES

The aim of Experiment 2 was to understand how people adapt to naturally fluctuating
tempi in music performance. The first issue was whether or not people could adapt
to large tempo fluctuations. Would subjects continue to produce taps on the beat at
a specific metrical level despite large fluctuations? Would the fractal structure in
the tempo fluctuations be enough for subjects to predict the next beat, resulting in
successful entrainment with the stimulus? Or would subjects react to the changes and
track the tempo at a lag of 1? Finally, do the ITIs or errors exhibit fractal structure

similar to the structure of the stimuli?

2.2.1 Method

Stimuli Two of the four pieces analyzed in Experiment 1, Goldberg Variations, Aria by J. S. Bach and Etude in E major, Op. 10, No. 3 by Frédéric Chopin, were
chosen as stimuli for Experiment 2 because they had similar mean tempi but different
rhythmic characteristics and levels of tempo fluctuation. The expressive performances
recorded in Experiment 1 were used as one set of stimuli, and controls (mechanical
performances) were created from the score using Cubase, running on a Macintosh G3
450 MHz computer. No timing or dynamic changes/fluctuations were contained in
the mechanical versions; each note’s pitch and duration was produced as it appeared
in the score. The tempo was set to the mean tempo of the corresponding expressive
version (Bach quarter-note IBI = 1259 ms; Chopin quarter-note IBI = 1383 ms).

Subjects Seven right-handed volunteers from the Florida Atlantic University com-
munity participated (1 female, 6 male). Music training ranged from zero to eight
years. Each subject signed an informed consent form that was approved by the In-
stitutional Review Board of Florida Atlantic University. One subject was excluded
from the analysis due to incomplete data.

Procedure Subjects were seated in an IAC sound-attenuated experimental chamber wearing Sennheiser HD250 linear II headphones. The music was presented by a
custom Max/MSP (http://cycling74.com) program running on a Macintosh G3 com-
puter. Sounds were generated using the “Piano 1” patch on a Kawai CA 950 digital
piano. Subjects tapped on a Roland Handsonic HPD-15 drumpad that sent the time
and velocity of the taps to the Max/MSP program. An induction sequence of eight

beats was provided to illustrate the correct period (quarter- or eighth-note) and phase
at which to tap. Continuing from the induction sequence, subjects tapped the beat
with their index finger on the drumpad for the entire duration of the piece. Six trials
were collected for the mechanical and expressive versions at both the quarter- and
eighth-note level. To minimize learning effects, trials were blocked by piece, perfor-
mance, and metrical level. Data for different pieces were collected on different days
within one week, and the order of pieces was randomized. On each day, tapping
with mechanical performances was recorded first, and eighth-note trials were followed
by quarter-note trials. This procedure was intended to maximize learning of each
piece during the mechanical trials and minimize learning while tapping to expressive
performances (cf. Repp, 2002a).

Analysis The relative phase, φn, of each tap, n, relative to the local beat was
calculated as
φn = (Tn − Bm) / (Bm − Bm−1) (2.7)
where Tn is the time of the tap, Bm is the time of the closest beat in the musical
piece, Bm−1 is the preceding beat, and φn is the relative phase of tap n. The resulting
variable is circular on the interval (0, 1), and was reset to the interval (−.5, .5) by
subtracting 1 from all values greater than 0.5. Although past research often has
used mean and standard deviation of timing errors, those methods ignore the circular
nature of relative phase and treat −.5 and .5 as describing different values. Instead,
circular statistics (Batschelet, 1981; Beran, 2004) were used to calculate the mean and
angular deviation (analogous to standard deviation of a linear variable) of relative
phase (see Appendix A). In addition to the mean and angular deviation of relative
phase, Inter-Tap Interval (ITI; the time between successive taps) and relative phase
were analyzed using the PSD and R/S analyses described in Experiment 1.
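Equation 2.7, the wrapping step, and the circular statistics can be sketched as follows (an illustrative Python fragment; the helper names and toy tap and beat times are ours, and the first beat, which has no predecessor, is handled with the first IBI):

```python
import numpy as np

def relative_phase(taps, beats):
    """Eq. 2.7: phase of each tap relative to the nearest beat, in units of
    the local inter-beat interval, wrapped onto (-.5, .5)."""
    beats = np.asarray(beats, dtype=float)
    phis = []
    for t in taps:
        m = int(np.argmin(np.abs(beats - t)))            # nearest beat B_m
        ibi = beats[m] - beats[m - 1] if m > 0 else beats[1] - beats[0]
        phi = (t - beats[m]) / ibi
        phis.append(phi - 1.0 if phi > 0.5 else phi)     # wrap to (-.5, .5)
    return np.array(phis)

def circular_mean(phi):
    """Circular mean of phases given in cycles (radians / 2*pi)."""
    z = np.mean(np.exp(2j * np.pi * np.asarray(phi)))
    return np.angle(z) / (2 * np.pi)

def angular_deviation(phi):
    """Angular deviation, sqrt(2*(1 - R)), with R the mean resultant
    length; converted back to cycles to match the phase units."""
    R = np.abs(np.mean(np.exp(2j * np.pi * np.asarray(phi))))
    return np.sqrt(2 * (1 - R)) / (2 * np.pi)

beats = [0.0, 0.5, 1.0, 1.5]               # hypothetical beat times (s)
phi = relative_phase([0.02, 0.54, 0.98], beats)
```

Unlike a linear mean of signed asynchronies, the circular mean treats a tap at phase −.5 and one at +.5 as the same point on the cycle.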

Finally, a prediction index and a tracking index (Repp, 2002a) were used to de-
scribe how subjects adapted to changes in tempo. These measures were based on a
cross-correlation between the ITIs and the IBIs of the expressive performance. The
normalization in this procedure (1 − ac1) allows for comparison of pieces with different
types and amounts of fluctuation. The prediction index (r0*) is a (normalized) lag 0 cross-correlation of ITIs with IBIs; it therefore indicates how well subjects predict when the next beat will occur.

r0* = (r0 − ac1) / (1 − ac1) (2.8)

where r0 is the lag 0 cross-correlation between ITI and IBI, and ac1 is the lag 1 autocorrelation of the IBIs. Perfect anticipation of tempo changes results in r0* = 1. The tracking index (r1*) is a (normalized) lag 1 cross-correlation of ITI with IBI; it therefore indicates the extent to which subjects track tempo changes.

r1* = (r1 − ac1) / (1 − ac1) (2.9)

where r1 is the lag 1 cross-correlation between ITI and IBI, and ac1 is the lag 1 autocorrelation of the IBIs. If the subjects are responding to the tempo fluctuations by matching the previous IBI, they will lag the expressive performance by one beat (e.g., Michon, 1967); thus, r1* = 1.
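The two normalized indices can be computed as follows (an illustrative Python sketch assuming aligned, equal-length ITI and IBI series; the synthetic tempo curve is ours):

```python
import numpy as np

def prediction_tracking(iti, ibi):
    """Normalized prediction (r0*) and tracking (r1*) indices, Eqs. 2.8-2.9.
    r0: lag-0 cross-correlation of ITIs with IBIs; r1: lag-1 (taps trailing
    the performance by one beat); ac1: lag-1 autocorrelation of the IBIs."""
    iti = np.asarray(iti, dtype=float)
    ibi = np.asarray(ibi, dtype=float)
    r0 = np.corrcoef(iti, ibi)[0, 1]
    r1 = np.corrcoef(iti[1:], ibi[:-1])[0, 1]
    ac1 = np.corrcoef(ibi[1:], ibi[:-1])[0, 1]
    return (r0 - ac1) / (1 - ac1), (r1 - ac1) / (1 - ac1)

# Synthetic IBI series: slow tempo drift plus a little jitter.
rng = np.random.default_rng(2)
ibi = 0.5 + 0.05 * np.sin(np.arange(200) / 5.0) + 0.005 * rng.standard_normal(200)

r0_star, r1_star = prediction_tracking(ibi, ibi)       # perfect prediction
lagged = np.concatenate(([ibi[0]], ibi[:-1]))          # copy the previous IBI
t0_star, t1_star = prediction_tracking(lagged, ibi)    # pure tracking
```

Feeding the IBI series in as its own tap series gives r0* = 1 (perfect prediction), while a tap series that simply copies the previous IBI gives r1* = 1 (pure tracking), matching the interpretations above.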

2.2.2 Results

Figure 2.7 shows mean relative phase (φ̄) (averaged over all 6 trials) for a typical
musician subject, for each condition of the Bach. This subject entrained successfully
with each stimulus. Not surprisingly, relative phase is less variable for the mechanical
than for the expressive versions. Also, very few negative mean relative phases are

Figure 2.7: Mean relative phase values for subject 5 tapping to each condition of Goldberg Variations, Aria by J.S. Bach (eighth- and quarter-note levels, mechanical and expressive versions). Values were averaged over all trials for each condition.

observed (i.e., no negative mean asynchrony). Moreover, for the expressive version,
variability decreases at the quarter-note level, compared with the eighth-note level.
Figure 2.8 shows mean relative phase (φ̄) (averaged over all 6 trials) for the same
subject, tapping to the expressive Chopin performance. Similar patterns are observed
in terms of lack of negative mean asynchrony and decreased variability at the quarter-
note level. Interestingly, for the expressive performances, all subjects shifted taps at
the beginning of measure 43 (t = 120 s) by half (1/2) of one cycle at the eighth-note

Figure 2.8: Mean relative phase values for subject 5 when tapping to the expressive version of Frédéric Chopin's Etude in E major, Op.10, No.3. Values were averaged over all trials for the eighth-note condition (top) and quarter-note condition (bottom). The boxes emphasize the measures where subjects consistently tapped out of phase during the expressive conditions.

level and by one quarter (1/4) of one cycle at the quarter-note level. All subjects began tapping in-phase again around the beginning of measure 55. This appears to be due
to a compositional technique in which Chopin shifts the perceptual downbeat for approximately 12 measures (Figure 2.9). The cues that point to the downbeat become misleading, and what was the weak beat becomes the strong beat. For the
mechanical condition, only two subjects shifted–and both only when tapping at the

Figure 2.9: Measures 43-53 of Chopin's Etude in E major, Op.10 No.3 where subjects shifted the phase of their taps.
eighth-note level. Thus, the tempo fluctuations in performance appear to amplify the
sense of a shifted downbeat. Due to this shift, only the tapping data from the first 43 measures of the Chopin piece were analyzed.

Figure 2.10: Mean (A, B) and angular deviation (C, D) of relative phase, averaged across the entire performance, as a function of performance type for each piece (A, C) and tapping level (B, D). Error bars represent one standard error (** p < .01, * p < .05).

Mean and Angular Deviation of Relative Phase Mean and angular deviation
of relative phase were analyzed using three-way ANOVAs, with within-subject factors: Performance Type (mechanical vs. expressive), Metrical Level (quarter- vs.
eighth-note), and Piece (Bach vs. Chopin). Pairwise t-tests were used for posthoc
comparisons. A four-way ANOVA with factors Performance Type, Metrical Level,
Piece, and Trial (6) was computed to determine if there was an effect of trial, which
would imply that there was learning. No significant effects of Trial or interactions
containing Trial were found, implying that no learning had taken place, so this factor
was not considered further.
Mean relative phase values, shown in Figure 2.10 (A, B), were relatively small–less
3% of the IBI–indicating that subjects were able to successfully entrain to the stimu-
lus. Negative mean asynchrony was not observed; on average, taps fell slightly after
the beat, regardless of performance type. Statistical testing revealed that mean rela-
tive phase was significantly greater for Chopin than for Bach, F (1, 5) = 17.00, p < .01.
Significant two-way interactions were also found between Piece and Performance
Type, F (1, 5) = 31.75, p < .01, and between Performance Type and Metrical Level,
F (1, 5) = 5.42, p < .05. These interactions arose because mean relative phase was
significantly greater for the expressive Chopin when compared to the mechanical Chopin (p < .001) and for the expressive Chopin when compared to the expressive Bach (p < .001). Mean relative phase was also significantly greater for the eighth-note level expressive condition than for either the eighth-note level mechanical (p < .02) or the quarter-note level expressive condition (p < .05). No other significant effects were
found.
For angular deviation of relative phase, shown in Figure 2.10 (C, D), the ANOVA
revealed significant main effects of Performance Type, F (1, 5) = 214.26, p < .001,
Piece, F (1, 5) = 18.18, p < .01, and Metrical Level, F (1, 5) = 137.79, p < .001. Me-
chanical was less variable than expressive; Bach was less variable than Chopin, and the
quarter-note level was less variable than the eighth-note level. A significant two-way

interaction was found between Performance Type and Piece, F (1, 5) = 38.64, p < .01.
Both pieces were more variable for the expressive performance (p < .001), and the
Chopin expressive was significantly more variable than the Bach expressive perfor-
mance (p < .001). Also, a significant two-way interaction was found between Perfor-
mance Type and Metrical Level, F (1, 5) = 81.56, p < .001; tapping to the expressive
performances was significantly more variable than tapping to the mechanical versions
at both the quarter- and eighth-note levels (p < .001), and eighth-note variability for the expressive performances was significantly greater than quarter-note variability (p < .001).
The results for the mean and angular deviation of relative phase indicate that
overall, subjects were able to entrain. For the expressive performances, subjects
were more accurate and more precise at the quarter-note level. Moreover, subjects
were less accurate and more variable for the Chopin expressive performance, whose
IBIs showed greater variability than the Bach. No other significant main effects or
interactions were found.
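The accuracy (mean relative phase) and precision (angular deviation) measures used above are circular statistics: each relative phase is treated as a point on the unit circle, so values near the wrap-around point (e.g., +0.49 and −0.49 cycles) average correctly. A minimal sketch, with a hypothetical helper name and phases assumed to be expressed in cycles wrapped to [−0.5, 0.5):

```python
import numpy as np

def circular_stats(phases):
    """Circular mean and angular deviation of relative phases (in cycles).

    Each phase is mapped to a unit vector; the resultant length r measures
    concentration, so sqrt(2 * (1 - r)) gives the angular deviation in
    radians, converted back to cycles here.
    """
    angles = 2 * np.pi * np.asarray(phases, float)
    c, s = np.cos(angles).mean(), np.sin(angles).mean()
    r = np.hypot(c, s)                            # resultant length, 0..1
    mean_phase = np.arctan2(s, c) / (2 * np.pi)   # circular mean, in cycles
    angular_dev = np.sqrt(2 * (1 - r)) / (2 * np.pi)
    return mean_phase, angular_dev
```

Unlike a linear average, the circular mean correctly treats taps at +0.49 and −0.49 cycles as clustering near anti-phase (0.5) rather than averaging to zero.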

Prediction and Tracking Prediction and tracking indices, shown in Figure 2.11,
were used to identify patterns of anticipation and reaction to changes in tempo.
Because changes in tempo are required for the calculation of these measures, they
were only calculated for the expressive performances. Prediction and tracking re-
sults were analyzed using three-way ANOVAs, with within-subject factors: Metrical
Level (quarter- vs. eighth-note), Piece (Bach vs. Chopin), and Index (prediction vs.
tracking). Pairwise t-tests were used for posthoc comparisons. A four-way ANOVA
with factors Metrical Level, Piece, Index, and Trial (6) was computed to determine
if there was an effect of trial. No significant effects of Trial or interactions containing
Trial were found, implying that no learning had taken place, so this factor was not

[Bar plot: mean index (r0* or r1*) by metrical level (eighth- vs. quarter-note), prediction vs. tracking, for Bach and Chopin; **p < .01, *p < .05]

Figure 2.11: Prediction and tracking indices for the expressive versions
of Bach and Chopin at the eighth- and quarter-note levels. Error bars
represent one standard error.

considered further.
Significant main effects of Piece, F (1, 5) = 70.95, p < .001, and Index, F (1, 5) =
50.18, p < .001, were found, and their interaction was significant, F (1, 5) = 19.76, p <
.01. In general, subjects predicted tempo changes, and prediction was more
efficient for the Chopin than for the Bach. No other two-way interactions were
significant. The three-way interaction between Index, Level, and Piece was significant,
F (1, 5) = 10.69, p < .02. For the Chopin, prediction was significantly stronger than
tracking at both metrical levels (p < .01). For the Bach, prediction was significantly
greater than tracking at the quarter-note level (p < .05), but at the eighth-note level
prediction was not significantly different from tracking (p = .80). In the Bach piano
performances, temporal intervals were highly variable, so the eighth-note level was
not consistently subdivided, whereas from the point of view of a quarter-note
referent, more subdivisions were available. The Chopin performances contained running
sixteenth-notes throughout the piece; thus, subdivisions were present at both tapping

levels (see Appendix D). Overall, this suggests that the presence of subdivisions may
aid or be partially responsible for the prediction effect.
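The prediction and tracking indices are cross-correlational measures. A common definition in the tapping literature, and a plausible reading of the r0* and r1* labels in Figure 2.11, is the lag-0 and lag-1 correlation between ITIs and performance IBIs; the sketch below assumes that definition (function names are hypothetical, and the dissertation's exact computation may include additional steps such as detrending):

```python
import numpy as np

def lag_corr(iti, ibi, lag):
    """Pearson correlation of ITI(n) with IBI(n - lag)."""
    iti, ibi = np.asarray(iti, float), np.asarray(ibi, float)
    if lag > 0:
        iti, ibi = iti[lag:], ibi[:-lag]
    return np.corrcoef(iti, ibi)[0, 1]

def prediction_tracking(iti, ibi):
    """Prediction index r0: taps anticipate the current interval (lag 0).
    Tracking index r1: taps reproduce the previous interval one beat late."""
    return lag_corr(iti, ibi, 0), lag_corr(iti, ibi, 1)
```

A tapper who anticipates tempo changes yields r0 > r1; a tapper who merely reacts, copying each interval one beat later, yields r1 > r0.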

Power Spectral Density and Rescaled Range Analysis The same Power Spectral
Density (PSD) and Hurst’s Rescaled Range (R/S) analyses as described in Experi-
ment 1 were calculated for the ITIs and the relative phases for each trial. The H
and β values agreed according to the equation β = 2H − 1, as β values were less
than 1, which indicates stationary fGn. Note that a significance measure is available
for each trial using the R/S analysis (see Experiment 1; see Appendix B); the re-
sults for the R/S analysis (H) are reported in Table 2.2. Repp (2002) found phrase
structure modulations in the ITIs of subjects tapping to music without tempo fluc-
tuations. Therefore, it might be expected that tapping to mechanical performances
would show some rudiments of fractal structure. However, for ITIs from the
mechanical performances, 77 − 97% of trials were anti-persistent (p < .05; H < .5,
which implies negative long-range correlation), a result that is comparable to
synchronization with a metronome (Chen et al., 1997). Chen et al. (1997) attributed this to the
nature of the ITI calculation, suggesting that ITI is not an appropriate variable for
fractal analysis in synchronization tasks. However, results indicate a clear difference
when compared to the expressive performances, in which 100% of the ITIs were sig-
nificantly persistent (p < .05; H > .5, which implies positive long-range correlation).
Moreover, mean H values for the expressive trials matched H values for the respec-
tive performances, suggesting that ITIs from the expressive performances reflected
the fractal structure of the performances.
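The two analyses can be sketched as follows (simplified, with hypothetical helper names; Appendix B of the original gives the exact procedures). For white noise the estimates come out near H ≈ 0.5 and β ≈ 0, consistent with the β = 2H − 1 relation for fGn cited above:

```python
import numpy as np

def spectral_beta(x):
    """beta from the log-log slope of the PSD (PSD ~ 1/f**beta)."""
    x = np.asarray(x, float) - np.mean(x)
    psd = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x))
    keep = freqs > 0
    slope, _ = np.polyfit(np.log10(freqs[keep]), np.log10(psd[keep]), 1)
    return -slope

def hurst_rs(x, min_win=8):
    """Hurst exponent via rescaled range: mean R/S grows as window**H."""
    x = np.asarray(x, float)
    n = len(x)
    sizes, rs = [], []
    win = min_win
    while win <= n // 2:
        vals = []
        for start in range(0, n - win + 1, win):
            seg = x[start:start + win]
            dev = np.cumsum(seg - seg.mean())   # cumulative deviation from mean
            if seg.std() > 0:
                vals.append((dev.max() - dev.min()) / seg.std())
        sizes.append(win)
        rs.append(np.mean(vals))
        win *= 2
    h, _ = np.polyfit(np.log10(sizes), np.log10(rs), 1)
    return h
```

H > .5 (persistent) and H < .5 (anti-persistent) then classify each trial, as in Table 2.2.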
The H values for relative phases also told an interesting story. For relative phases
from the mechanical performances, 63 − 72% of the trials were significantly persistent
(p < .05) at the eighth-note level. While this is less than the 100% expected from the

literature (Chen et al., 2001; Chen et al., 1997), it is still a large percentage of trials.
However, when tapping to mechanical performances at the quarter-note level, only
22 − 27% of the trials were significantly persistent (p < .05). Thus, at the quarter-
note level, relative phase time series were less fractal for mechanical performances.
A surprising result for relative phase in the expressive performances was also found.
Overall, only about one-third of the trials were significantly persistent. While that
number is far greater than chance, it is far fewer than would be expected based on
synchronization with periodic sequences (Chen et al., 1997; Chen et al., 2001). Note
that these are the trials in which the ITIs were 100% persistent, suggesting that
the fractal structure somehow migrates from the errors to the ITIs when tapping to
fractally structured expressive performances. For the H values for relative phase,
a four-way ANOVA was calculated, with within-subject factors: Performance Type
(expressive vs. mechanical), Metrical Level (quarter- vs. eighth-note), Piece (Bach vs.
Chopin), and Trial (6). The ANOVA revealed an interaction between Performance
Type and Metrical Level, F (1, 5) = 9.29, p < .05, confirming the findings.

2.2.3 Discussion: Experiment 2

Results show that subjects successfully entrained to complex musical rhythms, and
their performance can be comparable for mechanical and expressive versions even
when tempo fluctuations are large. For mechanical versions, there was no difference
in accuracy (i.e., mean relative phase) between eighth- and quarter-note levels, but
there was a small, significant advantage in terms of precision (i.e., low variability)
at the quarter-note level, possibly due to the presence of subdivisions in ITIs. The
large drop in number of persistent trials for relative phase at the eighth- vs. quarter-
note levels suggests a related effect of time scale. For the expressive performances

Table 2.2: Mean Hurst exponent (H) for the inter-tap interval and
relative phase time series, averaged across subjects and trials. Percentage
of significant trials (persistent (P) or anti-persistent (AP)) based on R/S
analysis (p < .05).

                                Inter-Tap Interval         Relative Phase
  Stimulus            Level   H (fGn)   % P   % AP     H (fGn)   % P   % AP
  Bach Mechanical      1/8      .408      0     94       .692     72     0
  Bach Mechanical      1/4      .466      0     77       .633     22     0
  Bach Expressive      1/8      .725    100      0       .654     36     0
  Bach Expressive      1/4      .720    100      0       .671     36     0
  Chopin Mechanical    1/8      .428      0     80       .696     63     0
  Chopin Mechanical    1/4      .402      0     97       .666     27     0
  Chopin Expressive    1/8      .860    100      0       .679     47     0
  Chopin Expressive    1/4      .898    100      0       .682     30     0

there was a large improvement in both mean relative phase and angular deviation
from eighth- to quarter-note levels. Subjects were equally accurate for expressive and
mechanical performances at the quarter-note level, and nearly as precise.
Cross-correlational measures yielded significantly higher prediction than tracking
indices; thus, people tend to anticipate, rather than react to, tempo fluctuations. For
the expressive performances, 100% of the trials’ ITIs showed significant persistence,
with fractal coefficients matching their respective performances, whereas fractal anal-
ysis on relative phase time series showed far fewer persistent trials than would be
expected from the synchronization literature. This suggests, albeit in a non-specific
way, that fractal structure is related to the prediction of tempo fluctuations.
Moreover, for the expressive Bach performance, prediction increased at the quarter-

note level. The same increase was not observed in the Chopin, however, indicating
that it was not an artifact of the blocked design. The rhythm of the Bach consisted
of highly varied temporal intervals, such that subdivisions were more often available
at the quarter-note level, whereas for the Chopin, subdivisions were always present at
both eighth- and quarter-note levels. Thus, prediction may have been partially due
to the existence of subdivisions of the beat, providing information about the length
of the IBI in progress. This interpretation is supported by the finding that accuracy
and precision were superior at the quarter-note level for the Bach and Chopin.

2.3 GENERAL DISCUSSION

How are the two main findings of the current study, fractal structuring of temporal
fluctuations in piano performance and prediction of temporal fluctuations by
listeners, related? Two of the properties implied by the results are fractal scaling
and long-range correlation. Fractal scaling implies that fluctuation at lower levels
of metrical structure (e.g., sixteenth-note) provides information about fluctuation at
higher levels of metrical structure (e.g., quarter-note). Thus, the fact that tempo
fluctuations scale implies that small time scale fluctuations are useful in predicting
larger time scale fluctuations. Perturbations of subdivisions have been shown to pro-
duce positively correlated perturbations in on-beat synchronization responses, even
when subjects attempt to ignore the perturbations (Repp, 2008a), and sensitivity to
multiple metrical levels occurs in adapting to both phase and tempo perturbations
(Large and Palmer, 2002). Such responses would automatically exploit fractal scaling
properties, enabling short-term prediction of tempo fluctuations.
These observations could be explained by the Large and Jones (1999) model in
which tempo tracking takes place at multiple time scales simultaneously via neural

oscillations of different frequencies that entrain to stimuli and communicate with one
another. This model successfully tracked temporal fluctuations in expressive per-
formances, and systematic temporal structure characteristic of human performances
improved tracking but randomly generated temporal irregularities did not (Large
and Palmer, 2002). Scaling does not tell the whole story, however. Tempo tracking
would imply smoothed IBIs as found by Dixon et al. (2006) because tempo adapta-
tions within a stable parameter range (cf. Large and Palmer, 2002) would effectively
low-pass filter the fluctuations. But if subjects’ ITIs were smoothed versions of the
veridical IBIs, one would expect greater fractal magnitudes because smoothing means
removing higher frequencies, which results in steeper slopes. However, the H values
for ITIs were approximately equal to the H values of the IBIs themselves. Tempo
tracking appears to be ruled out by this finding. Figure 2.12 illustrates how frac-
tal structure may enable prediction in two related ways. Scaling (downward arrow)
enables prediction if oscillations adapting to tempo changes at multiple time scales
communicate with one another. Persistence (arched arrow) enables prediction within
a given time scale, because it implies long-range correlation.

[Plot: tempo (bpm) vs. time (sec) at the 1/16, 1/8, and 1/4 metrical levels, with an arched arrow marking persistence within a level and a downward arrow marking scaling across levels]

Figure 2.12: The tempo map (bpm = 60/IBI) for three different metrical
levels (sixteenth-note, eighth-note, quarter-note) of Chopin’s Etude in
E major, Op.10, No.3, illustrating fractal scaling of tempo fluctuations.
Fractal scaling implies that changes at fast time scales could facilitate
prediction of changes at slow time scales (downward arrow); persistence
implies that changes early in the sequence could facilitate prediction of
changes later in the sequence (arched arrow).
Chapter 3

1/f^β Tempo Fluctuations in Skilled Piano Performances

3.1 INTRODUCTION

In Chapter 2, the performances of one skilled pianist revealed 1/f type serial corre-
lations and fractal scaling. The performances were collected from a student pianist
with 20 years of experience including instruction. Palmer (1989) showed that three
expert pianists (15-37 years of performing) exhibited more tempo fluctuation than
three student pianists (13-16 years of instruction). Is it possible that there are also
individual differences in the degree of fractal structure produced by different perform-
ers or that performing in a laboratory is different from a concert hall or recording
studio? Musical interpretations may also be sensitive to current performance practice,
such that pianists interpret music differently and performance practice differs between
today and the early 20th century. Thus, it makes sense to ask: How does this 1/f
structure in the tempo fluctuations of piano performances generalize to other pianists
and compositions? For Experiment 3, data from multiple pianists performing five
different compositions were analyzed. The data were obtained from a database that
included commercial audio recordings of professional pianists (experts). IBIs (inter-
beat intervals) were subjected to fractal analyses to assess the long-term structure of
the tempo fluctuations.

The database consists of commercial audio recordings of professional pianists per-
forming mazurkas written by Frédéric Chopin. The mazurka was originally a folk
dance genre in Poland. The tempo of a mazurka is characteristically uneven through-
out a measure, with the duration of the first beat typically shortened, and the duration
of the second or third beat lengthened. The data were collected over two years
during the Mazurka Project at CHARM (Centre for the History and Analysis of Recorded
Music) at Royal Holloway, University of London (http://mazurka.org.uk). An
advantage of this database is the range of recording dates. The earliest recording was made
by the Austrian pianist Alfred Grünfeld in 1902, and the most recent were recorded in
2006 and posted on YouTube (www.youtube.com). All recordings consisted of the
entire musical composition. Some recordings are from live performances in front of
an audience, and others were recorded in a studio. The five pieces of music in this
database were Mazurka in A minor Op.17, No.4 (63 performances); Mazurka in C
major Op.24, No.2 (63 performances); Mazurka in B minor Op.30, No.2 (34 perfor-
mances); Mazurka in C# minor Op.63, No.3 (87 performances); Mazurka in F major
Op.68, No.3 (50 performances).

3.2 METHODS

3.2.1 Beat extraction

Figure 3.1 shows an example of the beat extraction process.¹ The first step in beat
extraction was to listen to the music while tapping to the beat at the quarter-note
level. Next, the tapped time of each beat was corrected using a partly automatic,
partly manual process. First, onsets were detected as sudden changes in spectral
content of the audio signal (see Figure 3.2 for a spectrogram example), which is
represented by the black line in Figure 3.1. Then, taps were automatically moved to
the nearest onset peak location (shown as purple vertical lines of panel B, Figure 3.1).
Beats were manually corrected by listening to the music with the clicks representing
the adjusted beat times and visually comparing with the waveform and spectrogram.
Beat times were then recorded, and IBIs were calculated by taking the difference of
the beat times.

¹ Thanks to Craig Sapp for sharing the database and manually extracted beat times.
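The automatic snapping step can be sketched as a nearest-onset assignment (a simplified, hypothetical helper; the actual procedure also involved manual correction by ear):

```python
import numpy as np

def snap_to_onsets(taps, onsets):
    """Move each tapped beat time to the nearest detected onset time."""
    taps = np.asarray(taps, float)
    onsets = np.sort(np.asarray(onsets, float))
    # Locate each tap between two onsets, then pick the closer neighbor.
    idx = np.clip(np.searchsorted(onsets, taps), 1, len(onsets) - 1)
    left, right = onsets[idx - 1], onsets[idx]
    return np.where(np.abs(taps - left) <= np.abs(right - taps), left, right)
```

The IBIs then follow from the corrected beat times as `np.diff(snapped_beat_times)`.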

[Waveform panels A-C: instantaneous amplitude vs. time]

Figure 3.1: An example of the beat extraction process for the audio
recordings of the Mazurka performances. Time is shown on the x-axis,
instantaneous amplitude on the y-axis. A) The audio waveform with
locations of taps (green) and the output from the spectrogram (onset
detection) displayed as the black line. B) Taps (green) have been
snapped to the nearest onset (purple). C) Error-corrected beats (purple),
from Sapp (2009).

Figure 3.2: A) A measure of the musical notation from Mazurka in A
minor Op.17, No.4. B) The corresponding measure of audio from a
performance with the corrected beat times (purple), onset detections
(orange vertical lines), and function from the spectrogram (black).
C) Harmonic Spectrogram.

3.2.2 Fractal analyses

Fractal analyses were calculated on the IBIs for each performance to assess the long-
range correlations. The data included both fGn (fractional Gaussian noise) and fBm
(fractional Brownian motion), which makes the analyses slightly more complicated
than the ones described in Chapter 2. The following process was used. First, the

power spectral density (PSD) analysis was calculated for the IBI time series of each
performance. The negative of the spectral slope, β, is used to classify the series as
fGn (β < 1) or fBm (β > 1) in order to choose the most appropriate fractal analysis
(see Appendix B). Next, the Hurst exponent was estimated either by the Rescaled
Range (R/S) analysis (fGn) or the signal scaled windowed variance method (fBm;
Eke et al., 2000). The mean values are shown in Table 3.1.
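A simplified sketch of the scaled windowed variance idea for fBm (helper name hypothetical; the variants in Eke et al., 2000 additionally detrend each window): the mean within-window standard deviation of an fBm series grows as window_size**H, so H is the slope of the log-log fit. β < 1 routes the series to R/S (fGn), β > 1 to SWV (fBm).

```python
import numpy as np

def hurst_swv(x, min_win=8):
    """Scaled windowed variance (simplified, no detrending): for fBm the
    mean within-window standard deviation grows as window_size**H."""
    x = np.asarray(x, float)
    n = len(x)
    sizes, sds = [], []
    win = min_win
    while win <= n // 2:
        segs = [x[i:i + win] for i in range(0, n - win + 1, win)]
        sds.append(np.mean([seg.std() for seg in segs]))
        sizes.append(win)
        win *= 2
    h, _ = np.polyfit(np.log10(sizes), np.log10(sds), 1)
    return h
```

A random walk (fBm with H = 0.5) should yield an estimate near 0.5, while applying SWV to stationary noise yields a slope near 0, which is one way the fGn/fBm dichotomy shows up in practice.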

3.3 RESULTS

Fractal analyses showed 96% of the 299 recordings to be significantly fractal. Most
of the performances were fGn. However, the Mazurka in F major Op.68, No.3 had
many performances that were fBm (β > 1), resulting in the mean β = 1.2. Table 3.1
shows that when the average H is greater, there is a higher percentage of significantly
fractal performances.

Table 3.1: Fractal results of all performances averaged across composition (∗p < .05).

  Composition               # of    # of         # signif.   % signif.      H         β
  “Mazurka in”              Beats   Recordings   fractal     fractal
  A minor Op.17 No.4         395     63           63*          100       .77 (fGn)   .71
  C major Op.24 No.2         360     64           59*           92       .67 (fGn)   .43
  B minor Op.30 No.2         192     34           34*          100       .79 (fGn)   .49
  C# minor Op.63 No.3        228     88           81*           92       .68 (fGn)   .32
  F major Op.68 No.3         180     50           50*          100       .17 (fBm)  1.2
  Total or Grand Average     271     299          287           96       .62         .63

[Bar plot “Mazurkas: 17 Pianists”: mean β by piece (17−4, 24−2, 30−2, 63−3, 68−3)]

Figure 3.3: A one-way ANOVA was performed on the β values of the 17
pianists’ performances of all five Chopin mazurkas. Error bars represent
standard error of the mean.

3.3.1 ANOVA

The next analysis addressed the question: Is the degree of fractal structure deter-
mined by the composition? Seventeen pianists recorded all five mazurkas. A one-way
ANOVA with the factor Piece (5) was calculated on the β values for these 85 perfor-
mances. Pairwise t-tests were used for posthoc comparisons. The ANOVA on the β
values revealed a main effect, F (4, 64) = 40.20, p < .01, shown in Figure 3.3. The
mean β value for the performances of Mazurka in F major Op. 68, No. 3, was signifi-
cantly greater than mean β values for performances of the other four pieces (p < .05).
In fact, it was the only piece which yielded fBm. This indicates that the performances
of this particular piece had stronger persistence in the long-term correlations than the
performances for the other pieces. The mean β value of the performances of Mazurka
in A minor Op.17, No.4 was significantly less than the mean β value for the Mazurka

in F major Op. 68, No. 3 (p < .05), and significantly greater than the mean β value
from the performances of the other three mazurkas (p < .05).

3.4 DISCUSSION

Experiment 1 demonstrated fractal structure in the tempo fluctuations of four
performances, but left open the question of whether 1/f structure generalizes to other
pianists, styles of music, and time periods. In Experiment 3, these questions were ad-
dressed by analyzing multiple expert pianists’ performances of five Chopin mazurkas.
The tempo fluctuations of the vast majority (96%) of the recordings were significantly
fractal. The number of performances that were significantly fractal was greater when
the spectral slope was steep, that is, when the value of β was large. A higher value of
β implies stronger long-range correlation and stronger persistence in the serial structure.
This means that the tempo fluctuations are more predictable, or smoother.
The ANOVA on the 17 pianists’ performances of the five mazurkas revealed a
main effect of piece. This result indicates that β and H were dependent upon the
particular musical composition. Thus, something about the composition may elicit
similar fluctuations in the tempo from each pianist. It is not clear what aspects of
a musical composition lead to this similarity in fluctuation. The two main findings,
fractal structure in tempo fluctuations generalizes to other pianists, and the degree
of fractal structure is dependent on the musical composition, give new insights into
the nature of pulse in musical performance.

Chapter 4

Entrainment with Temporally Fluctuating Stimuli

4.1 INTRODUCTION

Rosen (1985) defined an anticipatory system as “a system containing a predictive
model of itself and/or of its environment, which allows it to change state at an
instant in accord with the model’s predictions pertaining to a later instant.” It
has been established that people are capable of synchronizing to a periodic signal
(reviewed by Repp, 2005). This synchronization has been assumed to be an example
of weak anticipation, or anticipation of an external system that arises from an internal
model (Dubois, 2003). The regularity of an isochronous signal lends itself to this
type of prediction. However, tempo in musical performance is not isochronous; it
fluctuates. Relatively few studies have investigated entrainment with temporally
fluctuating stimuli (Thaut et al., 1998; Thaut et al., 2009; Drake et al., 2000; Dixon
et al., 2006). Experiments 1 and 3 reported that the tempo fluctuations of skilled
pianists are characterized by 1/f -type long-term correlations and fractal scaling. This
finding holds for multiple pianists, styles, and time periods. In Experiment 2 it was
shown that subjects are able to predict 1/f tempo fluctuations in music and the
fractal structure of the stimuli is reflected in the inter-tap intervals (ITIs). How are
subjects able to entrain to a temporally fluctuating stimulus? What information is

required to predict fluctuations?
Stepp and Turvey (2009) argue that an anticipatory system does not need to be
explicitly early or late for any individual event. The system is anticipatory because
of its dependence on future states, not because of its state at any particular instant.
Stepp and Turvey (2009) propose that when a stimulus contains temporal fluctua-
tions, people will coordinate on a non-local time scale, or strong anticipation. This
adaptation to the statistical structure of the stimulus can be measured by compar-
ing the long-range correlations of the behavior and the long-range correlations of the
system to find a relationship. The scaling exponent of the organism should depend
on the scaling parameter of the environment if strong anticipation is taking place.
They conclude that adaptation of behavior to the environment’s statistical structure
requires neither explicit statistical inference nor explicit prediction; the adaptation is
a necessary consequence of a strongly anticipatory system.
Stephen et al. (2008) replaced the isochronous metronome with a chaotic signal
in an attempt to give subjects a stimulus that would not allow the same sort
of internal model to predict the timing locally. They found that the long-range
correlation of the taps was strongly correlated with the long-range correlation of
the onsets, suggesting that participants were doing a mixture of reaction (tracking),
proaction, and synchrony (prediction). The results of Experiment 2 and the results
of Stephen et al. (2008) revealed a scaling relationship between fractal structure in
the stimulus and the structure of subjects’ ITIs. These results support the idea
that synchronization with a fluctuating stimulus that contains 1/f structure is an
example of strong anticipation. Based on 1) the results from Experiments 1 and 3
that pulse in piano performance exhibits 1/f structure, 2) the results from Experiment
2 that people can predict tempo fluctuations when tapping to musical performances,
and 3) the framework of strong anticipation, this chapter explores the importance of

1/f structure in subjects’ ability to predict temporal fluctuations.

4.1.1 Is 1/f Structure Sufficient?

The first question addressed is whether fractal structure is sufficient to enable pre-
diction of temporal fluctuations. Will subjects predict large tempo changes if the
fluctuations contain long-term correlations, regardless of the origin of these fluctua-
tions? Specifically, will there be a difference in subjects’ ability to predict fluctuations
that are created by a human during a musical performance versus statistically similar
synthetically created tempo fluctuations? It is possible that the tempo fluctuations
from a human musical performance contain different information, not captured with
typical statistical analyses, than statistically similar synthetic fluctuations. It is hy-
pothesized that 1/f structure is sufficient in this sense.
This leads to the issue of tempo fluctuation and musical structure. If 1/f struc-
ture is strongly sufficient, the addition of musical information should not further aid
prediction. Tempo fluctuations are highly correlated with aspects of musical
structure (e.g., phrasing, meter; Palmer, 1997; Sloboda and Juslin, 2001), and studies
by Repp (1999b) revealed that listeners expect fluctuations. As was shown in Exper-
iments 1 and 3, the degree of fractal structure in a particular musical performance
depends upon the particular composition. This evidence suggests that there are as-
pects of the musical composition which cause pianists’ tempo fluctuations to exhibit
similar degrees of fractal structure. Does this reflect a learned or innate preference
for these fluctuations and is it present in the average listener as well as the trained
musician? If listeners have expectancies about when and how the tempo of a piece of
music should fluctuate, then 1/f fluctuations that do not correspond to the musical
structure should be harder to predict than tempo fluctuations which are appropri-
ately correlated with the musical structure. The hypothesis about the sufficiency of

1/f structure, then becomes more specific: Subjects’ prediction will improve with
the addition of rhythm and pitch information only if this information is appropriate
for the musical structure. In other words, simply adding 1/f tempo fluctuations to
music in an arbitrary manner will not help subjects predict.

4.1.2 Is 1/f Structure Necessary?

The next main question is whether 1/f structure is necessary for subjects’ prediction
of temporal fluctuations. Can subjects predict tempo fluctuations without fractal
structure? If subjects exploit long-term correlations, then prediction should be com-
promised or impossible when entraining to a temporally fluctuating stimulus which
contains no long-term correlations. Short-term correlation may be enough to help
people track small fluctuations and make local phase corrections, but it is unlikely
that this type of structure leads to prediction of large temporal fluctuations. In a
study by Thaut et al. (1998), subjects tracked random fluctuations at a lag of one.
Models of entrainment to temporally fluctuating stimuli forecast that subjects will
track any temporally fluctuating stimulus. However, results from Experiment 2 and
Repp (2002a) show that subjects predict large 1/f fluctuations in music tempi. Based
on these previous findings, the hypothesis is that subjects will predict fractally struc-
tured fluctuations and track random fluctuations.
Can subjects predict temporal fluctuations that exhibit 1) serial long-term cor-
relation only, and 2) long-term correlation with additional scaling information? If
more scaling information improves prediction, this would be support for the necessity
of 1/f structure. Results from Experiment 2 suggest that more scaling information
improves prediction of tempo fluctuation. Subjects were able to predict the fluctu-
ations significantly better than they tracked them for the Chopin at both metrical
levels. But, for the Bach, subjects did not show significant prediction at the faster

metrical level. This could have been due to fewer subdivisions of ITIs in the Bach at the
faster metrical level. Based on this finding, the hypothesis is that subjects’ prediction
will improve with the addition of scaling information. The extent to which subjects use
fractal structure to predict fluctuations is addressed in this chapter through a series
of experiments.

4.2 STIMULI

The issues described above were examined in four experiments using a task in
which subjects were asked to entrain with temporally fluctuating stimuli. The
conditions consisted of three different kinds of temporal fluctuation. The time series
used to create the temporal fluctuations are described below.

4.2.1 Tempo Fluctuations

The stimuli were based on an expert piano performance recorded at the biannual Min-
nesota International Piano-e-competition, a highly competitive, judged piano compe-
tition. In this competition, performances were recorded on a Yamaha CFIIIS concert
grand piano equipped with Disklavier Pro recording technology, which collects the
MIDI¹ data via fiber optics. The winners’ performances are available to the public,
online in a variety of formats (www.piano-e-competition.com). The MIDI format was
chosen for this study, as it allows for the extraction of beat times at multiple metrical
levels in a systematic way.
Jie Chen’s performance of Triana from Isaac Albeniz’s Iberia II, the winning
performance from the 2004 competition, was selected to create the stimuli because the
performance had a strong pulse which could easily be felt by the average listener,
there was rhythmic activity at the sixteenth-note level throughout, and the composition
would be unfamiliar to an average listener. Most importantly, the performance
contained large tempo fluctuations that exhibited fractal structure. As described in
Chapter 2, the notes in the performance were matched to its score using a custom
dynamic programming algorithm (Large, 1992; Large and Rankin, 2007). The first
2:07 minutes from the performance were used in order to have each experiment last
approximately one hour.

¹ Musical Instrument Digital Interface
Beat times were extracted from the performance at sixteenth-note, eighth-note,
and quarter-note metrical levels. Beats without a corresponding event were interpo-
lated using local tempo. The Power Spectral Density (PSD) and Hurst’s Rescaled
Range (R/S) analyses were calculated, as described in Appendix B, for the IBIs from
each metrical level. The length of the time series, fractal statistics, and the lag 1 au-
tocorrelation are shown in the top row of Table 4.1. The tempo from the performance
exhibited persistent fractional Gaussian noise (fGn; H = .76). The beat times from
the sixteenth-note level formed the basis for the tempo fluctuations described next.
This level of metrical structure was chosen because there was a large number of time
points (N = 785) with enough notes occurring at this metrical level to eliminate the
need for excessive interpolation of beat times.
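The interpolation of beats without a corresponding event can be realized in several ways; linear interpolation of beat times over beat index, which assumes the local tempo of the surrounding beats, is one simple sketch (helper name hypothetical; the dissertation does not specify the exact scheme):

```python
import numpy as np

def interpolate_beats(beat_times):
    """Fill missing beat times (NaN) by linear interpolation over beat
    index, i.e. assuming the local tempo of the surrounding beats."""
    t = np.asarray(beat_times, float)
    idx = np.arange(len(t))
    known = ~np.isnan(t)
    return np.interp(idx, idx[known], t[known])
```

For example, a missing beat between beats at 0.0 s and 1.0 s is placed at 0.5 s, preserving the local inter-beat interval.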
Three different types of temporally fluctuating signals were generated for Exper-
iments 4-7; two fractal signals, and one non-fractal signal. The first type, human,
was fractal and consisted of sixteenth-note level IBIs extracted from the piano perfor-
mance as described above. The second type of fluctuation, synthetic, was also fractal
and had similar statistics as the human fluctuations, but was created using the spec-
tral synthesis method (Turcotte, 1997; Voss, 1988). The two fractal signals were used
to address the issue of whether there is an effect of source for the 1/f structure, which
enables subjects to predict tempo fluctuations. The third type of fluctuation, shuf-
fled, was a shuffled version of the human fluctuations and had non-fractal structure

which was used to examine whether or not the long-range correlations in the fractal
structure are necessary for prediction. The details of these tempo fluctuations are as
follows:

1. Human Fluctuations The tempo fluctuations were created by extracting the


beat times from the piano performance at the sixteenth-note level. The differ-
ence of the beat times was taken to yield the IBI time series. The descriptive
and fractal statistics for the IBI time series are shown in Table 4.1 and Figure
4.1A.

2. Shuffled Fluctuations The shuffled fluctuations were created by shuffling the


human fluctuations (sixteenth-note level IBIs) until the time series had a flat
spectral slope (β = 0.00). The resulting time series contained no long-term
serial correlations. Thus, the fluctuations were random and independent while
preserving the statistical properties of the human fluctuations, including the
number of beat times, distribution, mean, and variance (see Table 4.1 and
Figure 4.1B). The IBIs were cumulatively summed to obtain a series of beat
times at the sixteenth-note level.

3. Synthetic Fluctuations A 1/f time-series of sixteenth-note level IBIs was


synthesized from the shuffled IBIs using the spectral synthesis method (Tur-
cotte, 1997; Voss, 1988; see Appendix C). The synthetic IBI time series had
the same number of points, distribution, mean tempo, variance, and spectral
slope, as the human fluctuations, but the fluctuations no longer corresponded to
the musical structure of the composition. The IBIs were cumulatively summed
to obtain a series of sixteenth-note level beat times (see Table 4.1 and Figure
4.1C).
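The three fluctuation types can be sketched in Python (a simplified illustration: psd_slope estimates β from a log-log fit, shuffle_to_flat implements the shuffle-until-flat-slope procedure, and spectral_synthesis follows the standard spectral synthesis recipe of Voss, 1988; the rank-matching helper is an assumption about how the synthetic series' distribution could be matched to the human IBIs, not necessarily the exact procedure used):

```python
import numpy as np

def psd_slope(x):
    """Spectral exponent beta from a least-squares fit of log power
    against log frequency (power ~ 1/f^beta)."""
    x = np.asarray(x, float) - np.mean(x)
    f = np.fft.rfftfreq(len(x))[1:]          # drop the DC bin
    p = np.abs(np.fft.rfft(x))[1:] ** 2
    slope, _ = np.polyfit(np.log10(f), np.log10(p), 1)
    return -slope

def shuffle_to_flat(x, tol=0.1, max_iter=2000, seed=0):
    """Permute the series until its spectral slope is ~0: serial
    correlations are destroyed; the distribution is untouched."""
    rng = np.random.default_rng(seed)
    y = np.asarray(x, float).copy()
    for _ in range(max_iter):
        rng.shuffle(y)
        if abs(psd_slope(y)) < tol:
            break
    return y

def spectral_synthesis(n, beta, seed=0):
    """Spectral synthesis of 1/f^beta noise: amplitudes ~ f^(-beta/2),
    uniformly random phases, inverse FFT (cf. Voss, 1988)."""
    rng = np.random.default_rng(seed)
    f = np.fft.rfftfreq(n)
    amp = np.zeros_like(f)
    amp[1:] = f[1:] ** (-beta / 2.0)
    spec = amp * np.exp(1j * rng.uniform(0, 2 * np.pi, len(f)))
    y = np.fft.irfft(spec, n)
    return (y - y.mean()) / y.std()

def match_distribution(template, target):
    """Reorder the template's values by the target's ranks, so the
    output keeps the template's exact distribution (mean, variance)
    while approximating the target's serial structure."""
    ranks = np.argsort(np.argsort(target))
    return np.sort(np.asarray(template, float))[ranks]

# Demo: a persistent 'human-like' IBI series, its flat-spectrum shuffle,
# and a distribution-matched 1/f synthesis.
human = 0.75 + 0.01 * np.cumsum(np.random.default_rng(1).normal(size=256))
shuffled = shuffle_to_flat(human)
synthetic = match_distribution(human, spectral_synthesis(len(human), beta=0.55, seed=2))
```

Because both derived series are permutations (or rank rearrangements) of the same values, the number of beats, distribution, mean, and variance are preserved exactly, as the text requires.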

Figure 4.1: Raw time series (IBI plot), distribution (histogram), PSD, and R/S analyses of the three types of tempo fluctuation at the 16th-note metrical level. A) Human tempo fluctuations (β = 0.55, H = 0.76), B) shuffled tempo fluctuations (β = −0.00, H = 0.51), C) synthetic tempo fluctuations (β = 0.53, H = 0.76).
Table 4.1: Mean IBI, number of events, HfGn, and the autocorrelation at lag 1 (ac1) are listed for the IBIs of each fluctuation at all three metrical levels (1/16-note, 1/8-note, 1/4-note).

           Mean IBI       # Events             HfGn                 ac1
Fluc        (1/4)      1/16   1/8   1/4    1/16  1/8  1/4    1/16    1/8     1/4
Human       741 ms      784   392   196    .76   .70  .67    .5659   .4018   .3026
Synthetic   746 ms      784   392   196    .76   .77  .78    .3832   .4510   .4697
Shuffled    743 ms      784   392   196    .51   .53  .54   -.0073   .0565   .1442

The hypotheses for the three tempo fluctuations are shown in Figure 4.2. First,
1/f structure will be sufficient. In other words, the subjects’ prediction (lag 0 cross
correlation between ITIs and IBIs) will be stronger for the tempo changes that ex-
hibit 1/f structure than the shuffled fluctuations, regardless of the source of the 1/f
structure. This is based on the results from Experiment 2 and Stephen et al. (2008)
who showed that subjects could predict fractally structured stimuli. The subsidiary
hypothesis for the fluctuations is that subjects will track (lag 1 cross correlation)
the non-fractal fluctuations more than subjects will track the fractal fluctuations.
This is based on the work of Thaut et al. (1998) which showed that subjects tracked
randomly fluctuating stimuli.

4.3 EXPERIMENTS

In Experiments 4-7 subjects tapped the beat to an auditory stimulus which fluc-
tuated in time according to the three types of tempo fluctuation described above.
The amount of rhythmic and pitch information was controlled and varied for each
experiment in order to further examine the sufficiency of 1/f structure in temporal

Figure 4.2: Hypothesized lag 0 and lag 1 cross-correlation coefficients (ITI x IBI) for each type of tempo fluctuation. Panels show prediction (lag 0) and tracking (lag 1) for the human, synthetic, and shuffled conditions, labeled as fractal vs. random and appropriate vs. inappropriate.

prediction. Starting from Experiment 4, information in terms of beat (scaling), rhythm, and pitch was progressively added. The stimuli for Experiment
4 consisted of a sound for each beat at the quarter-note metrical level. Out of the
four experiments, this experiment had the least amount of information in the stim-
uli; subjects tapped once for each event and had no information between taps. For
Experiment 5, the stimuli consisted of beats at the eighth-note level. These stimuli included information from a lower (faster) metrical level, resulting in one event between each tap. This additional metrical information (i.e., fractal scaling) is hypothesized
to help subjects predict tempo changes. Experiment 6 consisted of the rhythm of
the Albeniz composition. This made the stimulus more variable, thus some taps had
multiple events between them and others had none. Subjects were asked to tap with
the beat. This stimulus contained additional information about metrical structure and
rhythmic grouping. If this information helps subjects predict fluctuations, then 1/f

structure is not sufficient for prediction. Experiment 7 consisted of the rhythm of
Experiment 6 with the addition of the pitches from the Albeniz composition. This
stimulus contained additional melodic and harmonic information. If this information
improves prediction by giving harmonic and melodic cues about how the tempo will
change, then 1/f structure is not sufficient for prediction.

Figure 4.3: Hypothesized lag 0 cross-correlation coefficients (ITI x IBI) for the human fluctuation condition of each experiment (Quarter, Eighth, Rhythm, Music).

Figure 4.3 shows the specific hypotheses for each experiment. The assumption
is that subjects will get better at predicting the tempo fluctuations with each new
piece of information, because each kind of information gives the subject more cues
about how the tempo will fluctuate. However, it is important to note that this is only
for the human fluctuation condition. The other conditions were synthesized and do
not fluctuate in ways which are appropriate for the musical composition. Thus, the
rhythm (Experiment 6) and pitch (Experiment 7) information may hinder prediction
for the synthetic and shuffled fluctuations because the temporal fluctuations would
be inappropriate and violate expectancy.

4.4 EXPERIMENT 4: QUARTER-NOTE BEATS

This experiment was designed to investigate the role of 1/f structure in subjects’
ability to predict temporal fluctuations. Beat times from each of the fluctuation
conditions at the quarter-note metrical level were given as stimuli. Subjects tapped
with every event (beat). There was no (scaling) information between taps or musical
information, which allowed for the investigation of the extent to which subjects use long-term correlational structure to predict temporal fluctuations. The hypothesis is that
subjects will predict the fluctuations in the two 1/f conditions equally well and they
will not be able to predict the fluctuations in the stimulus which did not exhibit
long-term correlation.

4.4.1 Method: Quarter-note Beats

Stimuli The stimuli for Experiment 4 consisted of beats at the quarter-note metrical
level (average IBI=743 ms), in which temporal fluctuations were assigned to the three
types of fluctuation described above. The stimulus was created by extracting every
fourth beat from the sixteenth-note level time series, resulting in quarter-note level
beat times for each of the three conditions (human, synthetic, shuffled ). The quarter-
note level beat times from each condition were rendered as a series of clicks created
from a clave sample (see below).
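Extracting the slower metrical levels from the sixteenth-note beat grid is simple decimation, which can be sketched as follows (a hypothetical, perfectly isochronous grid is used for clarity; the actual stimuli carried the human, synthetic, or shuffled fluctuations):

```python
import numpy as np

# Hypothetical sixteenth-note beat times in seconds: 0.1875 s per
# sixteenth note, i.e., 4 x 0.1875 = 750 ms per quarter note.
sixteenth_beats = np.cumsum(np.full(16, 0.1875))

quarter_beats = sixteenth_beats[::4]   # every fourth beat (Experiment 4)
eighth_beats = sixteenth_beats[::2]    # every other beat (Experiment 5)
```

Because the tempo fluctuations live in the sixteenth-note beat times, decimation automatically carries the same fluctuation structure up to the slower levels.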

Participants Twelve volunteers (8 males, 4 females) from the Florida Atlantic Uni-
versity community participated in the experiment. Musical training ranged from zero
to ten years, none were professional musicians. Each participant signed an informed
consent form that was approved by the Institutional Review Board of Florida Atlantic
University.

Apparatus Subjects were seated in an IAC sound-attenuated experimental cham-
ber wearing Ultrasone PROline 550 Studio headphones. The stimulus was presented
by a custom Max/MSP (http://cycling74.com) program running on a Macintosh G4
computer (OSX 10.4.11). Sounds were generated using the General MIDI percussion
clave patch on a Kurzweil 2500R sampler. Subjects tapped on a Roland Hand-
sonic HPD-15 drumpad that sent the time and velocity of the taps to the Max/MSP
program.

Procedure Subjects were informed that they would hear a series of clicks that con-
tained temporal fluctuations. They were asked to tap with each click, and to keep
up with the tempo changes to the best of their ability. An induction sequence of six
clicks, with IOIs equal to the first stimulus IBI, was provided to illustrate the correct
period and phase at which to tap. Continuing from the induction sequence, partic-
ipants tapped the beat for the entire duration of the stimulus (2:07 min). Subjects
tapped on the drumpad using the index finger of their dominant hand. Six trials of
each tempo condition were collected in blocks. The order of the blocks was human,
synthetic, and shuffled.

4.4.2 Analysis: Quarter-note Beats

For this study, the raw cross-correlations were used instead of the prediction and tracking indices (see Chapter 2). The prediction and tracking indices normalize, which was not desirable here, as all the stimuli had the same amount of tempo fluctuation but different serial structures. Another concern was that the normalization in the prediction and tracking indices could divide out the short-term correlations (ac1), and these correlations are an important part of the fractal structure: one cannot have long-term correlation without short-term correlation.
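For reference, the lag 1 autocorrelation (ac1) referred to here is the standard estimator sketched below (variable names are illustrative):

```python
import numpy as np

def lag1_autocorrelation(x):
    """ac1: Pearson correlation between a series and itself shifted by
    one sample -- the short-term serial dependence discussed above."""
    x = np.asarray(x, float)
    return np.corrcoef(x[:-1], x[1:])[0, 1]

# A perfectly alternating series is maximally anti-correlated at lag 1;
# a slowly varying series is strongly positively correlated.
x_alt = np.tile([1.0, 0.0], 50)
x_slow = np.sin(np.linspace(0, 4 * np.pi, 200))
```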

Cross-correlations Cross-correlations between ITIs and IBIs were calculated at lag
zero (lag 0), positive lag one (lag 1), and negative lag one (lag-1) for each trial (12
subjects x 6 trials = 72 trials per condition). This calculation assessed the extent to which subjects were 1) predicting tempo fluctuations, which would result in a high
lag 0 cross-correlation (ITI fluctuations were simultaneous with the IBI fluctuations),
or 2) tracking tempo fluctuations, which would result in a high positive lag 1 cross-
correlation (ITI fluctuations were a beat behind the IBI fluctuations). The negative
lag-1 cross-correlation was calculated to assess another kind of prediction, which is
referred to as proaction. Stepp and Turvey (2009) proposed a model where subjects
may increase or decrease their ITIs before the increase or decrease in the IOIs, which
would result in a high negative lag-1 cross-correlation (ITI fluctuations ahead of the
IOI fluctuations).
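A minimal sketch of these lagged cross-correlations (sign convention as described above, with positive lag meaning the taps follow the stimulus; all names and demo series are illustrative):

```python
import numpy as np

def lagged_xcorr(iti, ibi, lag):
    """Pearson correlation between ITIs and IBIs at a given lag.
    lag 0  : prediction (ITI[n] covaries with IBI[n]);
    lag +1 : tracking   (ITI[n] covaries with IBI[n-1], a beat behind);
    lag -1 : proaction  (ITI[n] covaries with IBI[n+1], a beat ahead)."""
    iti = np.asarray(iti, float)
    ibi = np.asarray(ibi, float)
    if lag > 0:
        a, b = iti[lag:], ibi[:-lag]
    elif lag < 0:
        a, b = iti[:lag], ibi[-lag:]
    else:
        a, b = iti, ibi
    return np.corrcoef(a, b)[0, 1]

# Demo: an idealized tracker reproduces each interval one tap late,
# and an idealized proactor changes one tap early.
rng = np.random.default_rng(0)
ibi = 0.75 + 0.02 * rng.normal(size=200)
iti_track = np.concatenate([[0.75], ibi[:-1]])  # ITI[n] = IBI[n-1]
iti_pro = np.concatenate([ibi[1:], [0.75]])     # ITI[n] = IBI[n+1]
```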
Two-way ANOVAs were conducted on the correlation coefficients with the co-
efficients for each lag as the dependent variable. The analyses used within-subject
factors: Fluctuation (human vs. synthetic vs. shuffled ), and Trial (6). Significance
was computed at p < .01. No significant effects of trial or interactions containing trial were found, implying that no learning had taken place, so trial effects were not considered further. The ANOVAs were rerun collapsing over Trial, as a one-way ANOVA with
the within-subjects factor of Fluctuation (human vs. synthetic vs. shuffled ). Pairwise
t-tests were used for post-hoc comparisons.

Fractal Analyses Fractal analyses were used to examine whether the 1/f structure
in the stimulus caused scaling in the tapping data. As in Experiment 2 (Chapter
2), Power Spectral Density (PSD) and Hurst’s Rescaled Range (R/S) analyses were
calculated for the ITI and the relative phase time-series for each trial and subject (72
trials per condition). The results of the PSD revealed that for every trial, all β < 1,

thus all trials were categorized as fGn and the R/S analysis was also calculated to
confirm the results from the PSD and obtain the Hurst exponent (H ). In general, the
H and β values agreed according to the equation β = 2H − 1. Note that a significance measure is available for each trial with the R/S analysis (see Chapter 2 and
Appendix B). Thus, the mean H -values from the R/S analysis are reported along
with the percentage of trials that were significantly different from H = 0.5 in Table
4.2. The results are separated into not significant, persistent or anti-persistent. When
H is not significantly different from 0.5 there is no long-term correlation present in
the series. Persistence, 0.5 < H < 1, indicates positive long-term correlation; the
increments are dependent on previous elements. Anti-persistence, 0 < H < 0.5, in-
dicates negative long-term correlation; the increments are negatively dependent on
previous elements.
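The R/S procedure and the persistence classification can be sketched as follows (a simplified implementation without the per-trial significance test described in Appendix B; for fGn the estimate can be cross-checked against the PSD via β = 2H − 1):

```python
import numpy as np

def hurst_rs(x, min_window=8):
    """Hurst exponent via rescaled range (R/S) analysis: the slope of
    log(R/S) against log(window size), averaging R/S over
    non-overlapping windows at each doubling window size."""
    x = np.asarray(x, float)
    n = len(x)
    sizes, rs = [], []
    w = min_window
    while w <= n // 2:
        vals = []
        for start in range(0, n - w + 1, w):
            seg = x[start:start + w]
            dev = np.cumsum(seg - seg.mean())   # cumulative deviation
            r = dev.max() - dev.min()           # range
            s = seg.std()                       # standard deviation
            if s > 0:
                vals.append(r / s)
        sizes.append(w)
        rs.append(np.mean(vals))
        w *= 2
    h, _ = np.polyfit(np.log(sizes), np.log(rs), 1)
    return h

def classify(h):
    """Rough persistence label, ignoring the per-trial significance
    test: H > 0.5 persistent, H < 0.5 anti-persistent."""
    return "persistent" if h > 0.5 else "anti-persistent" if h < 0.5 else "uncorrelated"

# White noise has no memory, so its estimate should fall near H = 0.5.
h_white = hurst_rs(np.random.default_rng(3).normal(size=1024))
```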

4.4.3 Results: Quarter-note Beats

Cross-correlations

Lag 0 All (100%) of the lag 0 cross-correlation coefficients from each trial for
the human and synthetic conditions were significantly greater than zero. Only about
half (47%) of the coefficients for the shuffled condition were significantly greater than
zero. An ANOVA on the lag 0 cross-correlation coefficients revealed a significant
main effect of Tempo Fluctuation, F (2, 22) = 140.56, p < .01. The coefficients for
the human and synthetic conditions were significantly greater than the coefficients for
the shuffled condition (p < .01; Figure 4.4 (middle)). Thus, prediction was greater for
the fractal fluctuations but also significant for almost half of the trials in the shuffled
condition.

Positive lag 1 Virtually all of the lag 1 cross-correlation coefficients for all three
fluctuation conditions were significantly greater than zero, human (99%), synthetic
(100%), and shuffled (100%). Yet, an ANOVA on the lag 1 cross-correlation coeffi-
cients revealed a significant main effect of Tempo Fluctuation, F (2, 22) = 25.56, p <
.01. The coefficients for the human and synthetic conditions were significantly less
than the coefficients for the shuffled condition, and coefficients for the human condi-
tion were significantly less than the coefficients for the synthetic condition (p < .01).
Thus, tracking was strongest for the shuffled condition and weakest for the human
condition. These results are shown in Figure 4.4 (right).

Negative lag-1 For the human condition, 90% of the cross-correlation coeffi-
cients were significantly different from zero, and for the synthetic condition 100% of
the cross-correlation coefficients were significantly different from zero; however, for
the shuffled condition 7% of the cross-correlation coefficients were significantly dif-
ferent from zero. An ANOVA on the lag-1 cross-correlation coefficients revealed a
significant main effect of Tempo Fluctuation, F (2, 22) = 33.30, p < .01. The coef-
ficients for the human and synthetic conditions were significantly greater than the
coefficients for the shuffled condition (p < .01; Figure 4.4 (left)). Thus, subjects were
not able to proact when tapping to fluctuations which had no long term correlations.

Fractal Analyses

Inter-Tap Intervals Results are displayed in Table 4.2. For ITIs from the fractal
conditions, human and synthetic, 49% and 89% of trials were persistent (0.5 < H <
1; p < .05) respectively; none of the trials from the fractal conditions were anti-
persistent. Approximately half (46%) of the trials from the shuffled condition were

Figure 4.4: Mean coefficients from the cross-correlation between ITIs and IBIs at lag-1, lag 0, and lag 1 for Experiment 4: Quarter-note Beats. The percentage of trials for which the cross-correlation coefficient was significantly different from zero is listed if the percentage was less than 100%. Error bars equal standard error of the mean. Asterisks mark significant differences (p < .01).

anti-persistent (0 < H < 0.5; p < .01); none of the trials from the shuffled condition
were persistent. Moreover, mean H values for each condition scaled with the H
values of the stimuli, as expected based on the results from Experiment 2. Thus,
ITIs exhibited fractal structure which was dependent upon the presence of fractal
structure in the stimuli.

Relative Phase For relative phase from the fractal stimuli, human and synthetic,
40% and 53% of the trials were significantly persistent (p < .05) respectively, and
only 4% of trials were anti-persistent (p < .01). The shuffled condition also showed
persistent trials (36%), and few anti-persistent trials (7%). The H values appear to
scale with condition. Thus, the errors had positive long-term correlation and the
fractal structure in the stimuli was reflected in the structure of the error time series.

Table 4.2: Fractal statistics for the ITIs and relative phase time series
from Experiment 4: Quarter-note Beats. Mean Hurst exponents (H from
R/S analysis) averaged across subjects, and the percentage of trials sig-
nificantly persistent (P), anti-persistent (AP), or not significant (NS), are
displayed (p < .05).
                       Stimulus   Inter-Tap Interval         Relative Phase
Fluctuation            HfGn       HfGn   %P  %NS  %AP        HfGn   %P  %NS  %AP
Quarter  Human         .67        .57    49   51    0        .55    40   56    4
Quarter  Synthetic     .78        .62    89   11    0        .59    53   47    0
Quarter  Shuffled      .54        .42     0   54   46        .54    36   57    7

4.4.4 Summary: Quarter-note Beats

The fractal tempo fluctuation conditions (human and synthetic) showed significantly
stronger prediction (lag 0) and proaction (lag-1) than the shuffled condition. Prediction for the fractal conditions was much greater than for the shuffled condition; this result indicates that subjects used the long-term correlation to improve prediction (at both lag 0 and lag-1). However, the fact that subjects were able to predict fluctuations (at lag 0) in the shuffled condition for about half of the trials indicates that 1/f structure is not necessary for prediction. This means that something other than long-term correlation
enables prediction. It is likely that subjects used the existing short-term correlations
(ac1=.1442) to predict fluctuations at lag 0, but this was not enough to enable pre-
diction at lag-1 (proaction). Thus, subjects exploit long-term correlations when they
are present, but are capable of using other short-term structure to predict fluctua-
tions. As was hypothesized, tracking was stronger for the shuffled condition than
the fractal conditions. This indicates that, on average, subjects were tapping after
the fluctuations rather than predicting them. There were no significant differences
between the results from the two fractal conditions at lag 0 or lag-1. This indicates

that, for a fluctuating stimulus, 1/f structure is sufficient for prediction. Fractal con-
ditions yielded persistent ITI time series, whereas the random fluctuations yielded
anti-persistent ITIs. Fractal analysis of the relative phase time series showed that trials were either persistent or uncorrelated. H values from the ITIs and relative phase also appear to scale with the H values from the stimuli. The finding from the cross-correlations that subjects perform a mixture of prediction, proaction, and tracking, along with the finding that the tapping data scale with the stimuli, provides support for strong anticipation as the mechanism for prediction. Results from Experiment 4
show that 1/f structure is sufficient but not necessary for prediction of large temporal
fluctuations.

4.5 EXPERIMENT 5: EIGHTH-NOTE BEATS

Results from Experiment 4 showed that subjects exploited long-term correlational


structure in order to better predict temporal fluctuations, and that the structure
of the stimuli is reflected in the behavior. Experiment 5 continued to investigate
the role of 1/f structure in prediction by examining another aspect of fractal struc-
ture: scaling. In Experiment 2 subjects consistently predicted the fluctuations in
the Chopin while tapping at two different metrical levels, but for the Bach, subjects
had a significant decrease in their ability to predict the fluctuations when tapping at
the eighth-note metrical level. This could have been a result of the number of notes
between taps. The Chopin had more notes at the sixteenth-note metrical level, and thus more subdivisions of ITIs when subjects were tapping at the eighth-note metrical level, compared with the Bach, which had a more variable rhythm resulting in some ITIs (at the eighth-note level) without subdivisions. In other words, subjects had a
consistent amount of information at a lower (faster) time scale (metrical level) for the

Chopin. Experiment 5 systematically explored the improvement of prediction due to
additional scaling information. Does information from a lower time scale (more time
points) improve prediction?

4.5.1 Method: Eighth-note Beats

Stimuli In Experiment 5, the stimuli were created from the same fractal and random
tempo fluctuations as in Experiment 4 (human, synthetic, shuffled). The stimuli were
rendered as clicks as in Experiment 4, however, in addition to the quarter-note level
beats, subjects also heard the beats from the eighth-note level. To produce the
onset times at the eighth-note beat level, every other beat was extracted from the sixteenth-note level time series. The eighth-note level beat times from each condition were played as a series of clicks, with the quarter-note level beat times presented simultaneously at a lower pitch. Subjects tapped with the quarter-note
level beats. Thus, subjects heard one sound, for each beat at the eighth-note level,
between each tap. There were twice as many events in these stimuli compared with
the stimuli of Experiment 4, in which no events intervened between taps.

Participants The same 12 subjects from Experiment 4 participated in this experi-


ment.

Apparatus Same as Experiment 4.

Procedure Subjects were informed that they would hear a series of clicks that
contained temporal fluctuations. They were asked to tap with each lower pitched
click (i.e., at the quarter-note metrical level), and to keep up with the tempo changes
to the best of their ability. An induction sequence of six clicks, with IOIs equal to the

first quarter-note level IBI, was provided to illustrate the correct period and phase
at which to tap. The quarter-note level beats continued throughout the stimulus at
a lower pitch than the eighth-note beat clicks, giving the subjects an unambiguous
cue for the correct phase. Continuing from the induction sequence, subjects tapped
the beat for the entire duration of the stimulus (2:07 min). Subjects tapped on the
drumpad using the index finger of their dominant hand. Six trials of each tempo
condition were collected in blocks. The order of the blocks was human, synthetic, and
shuffled.

4.5.2 Analysis: Eighth-note Beats

Cross-correlations Cross-correlations between ITIs and IBIs were calculated for


each trial (72 trials per condition). As in Experiment 4, quarter-note level IBIs were
used, because this was the level at which subjects tapped. Two-way ANOVAs were
conducted on the correlation coefficients with the coefficients for each lag as the
dependent variable. The analyses used within-subject factors: Fluctuation (human
vs. synthetic vs. shuffled), and Trial (6). Significance was computed at p < .01. No significant effects of trial or interactions containing trial were found, implying that no learning had taken place, so trial effects were not considered further. The ANOVAs were
rerun, collapsing over Trial, as a one-way ANOVA with the within-subjects factor
of Fluctuation (human vs. synthetic vs. shuffled ). Pairwise t-tests were used for
post-hoc comparisons.

Fractal Analyses As in Experiments 2 and 4, the Power Spectral Density (PSD)


and Hurst’s Rescaled Range (R/S) analyses were calculated for the ITI and the rela-
tive phase time-series from each trial across subjects (72 trials per condition).

4.5.3 Results: Eighth-note Beats

Cross-correlations

Lag 0 All of the lag 0 cross-correlation coefficients for all three fluctuation con-
ditions were significantly greater than zero, human (100%), synthetic (100%), and
shuffled (100%). Yet, an ANOVA on the lag 0 cross-correlation coefficients revealed
a significant main effect of Tempo Fluctuation, F (2, 22) = 198.00, p < .01. The
coefficients for the human and synthetic conditions were significantly greater than
the coefficients for the shuffled condition. There was also a small but significant difference: coefficients for the human condition were less than the coefficients for the synthetic condition (p < .01; see Figure 4.5 middle).

Positive lag 1 All of the lag 1 cross-correlation coefficients for all three fluctua-
tion conditions were significantly greater than zero, human (100%), synthetic (100%),
and shuffled (100%). An ANOVA on the lag 1 cross-correlation coefficients revealed
a significant main effect of Tempo Fluctuation, F (2, 22) = 21.09, p < .01. The differences between conditions were small but significant. The coefficients for the human
and shuffled conditions were significantly less than the coefficients for the synthetic
condition (p < .01; see Figure 4.5 right).

Negative lag-1 For the human condition, 15% of the cross-correlation coeffi-
cients were significantly different from zero, and for the synthetic condition 100%
of the cross-correlation coefficients were significantly different from zero; however, for
the shuffled condition only 5% of the cross-correlation coefficients differed significantly
from zero. An ANOVA on the lag-1 cross-correlation coefficients revealed a significant
main effect of Tempo Fluctuation, F (2, 22) = 169.28, p < .01. The coefficients for

the human and shuffled conditions were significantly less than the coefficients for the
synthetic condition (p < .01). These results are shown in Figure 4.5 (left).

Figure 4.5: Mean coefficients from the cross-correlation between ITIs and IBIs at lag-1, lag 0, and lag 1 for Experiment 5: Eighth-note Beats. The percentage of trials for which the cross-correlation coefficient was significantly different from zero is listed if the percentage was less than 100%. Error bars equal standard error of the mean. Asterisks mark significant differences (p < .01).

Fractal Analyses The results showed that for every trial, all β < 1, thus all trials
were categorized as fGn and the R/S analysis was also calculated. In general, H and
β values agreed according to the equation β = 2H − 1. Mean H -values from the
R/S analysis are reported along with the percentage of trials that were significantly
different from H = 0.5 in Table 4.3.

Inter-Tap Intervals For ITIs from the fractal conditions, human and synthetic,
34% and 100% of trials were persistent (p < .05), respectively, which indicates positive
long-range correlation (0.5 < H < 1). None of the trials from the fractal conditions

Table 4.3: Fractal statistics for the ITIs and relative phase time series
from Experiment 5: Eighth-note Beats. Mean Hurst exponents (H from
R/S analysis) averaged across subjects, and the percentage of trials sig-
nificantly persistent (P), anti-persistent (AP), or not significant (NS), are
displayed (p < .05).
                       Stimulus   Inter-Tap Interval         Relative Phase
Fluctuation            HfGn       HfGn   %P  %NS  %AP        HfGn   %P  %NS  %AP
Eighth   Human         .70        .55    34   66    0        .53    28   68    4
Eighth   Synthetic     .77        .65   100    0    0        .55    38   62    0
Eighth   Shuffled      .53        .42     0   49   51        .50    19   62   19

were anti-persistent. 51% of trials from the shuffled condition were anti-persistent
(p < .01); none of the trials from the shuffled condition were persistent. Moreover,
mean H values for each condition (H = .55 human; H = .65 synthetic; H = .42
shuffled ) scaled with the H values of the stimuli (H = .70 human; H = .77 synthetic;
H = .53 shuffled ), as expected based on the results from Experiment 2.

Relative Phase For relative phase from the fractal stimuli, human and synthetic,
28% and 38% of the trials were significantly persistent (p < .05) respectively, and only
4% of trials from the human were anti-persistent (p < .01). The shuffled condition
showed an equal number of persistent (19%) and anti-persistent (19%) trials. Al-
though the H values were similar across conditions, these small differences appear to
be scaling with the H values of the stimuli. Results are displayed in Table 4.3.

4.5.4 Summary: Eighth-note Beats

Prediction was significantly better for the fractal conditions than for the shuffled con-
dition. There were significant differences between the human and synthetic conditions
at lag 0, lag 1, and lag-1, with coefficients for synthetic greater than human. These
results are further evidence that for a fluctuating stimulus, 1/f structure improves
prediction, but there is something other than the presence of long-term correlation
which enables subjects to predict fluctuations in the shuffled condition. Thus, sub-
jects exploit long-term correlations when they are present, but are capable of using
other short-term structure to predict fluctuations.
Fractal conditions yielded persistent ITI time series, whereas the random fluctu-
ations yielded anti-persistent ITIs. Fractal analysis of the relative phase time series
showed persistence for the fractal conditions and a small, but equal number of trials
persistent and anti-persistent for the shuffled condition. H values from the ITIs and
relative phase appear to scale with the H values from the stimuli.

4.6 EXPERIMENT 6: RHYTHM

Experiment 6 examined the issue of whether 1/f structure is sufficient for prediction
of temporal fluctuations. The stimulus for this study was the rhythm from Triana
(Albeniz) and subjects tapped the beat which was explicitly marked in the stimu-
lus. The addition of rhythmic (musical) information was hypothesized to improve
subjects’ ability to predict temporal fluctuations for the human fluctuations because
the rhythm contains the grouping and metrical information which are known to be
highly correlated with temporal fluctuations. However, this now brings us to the
issue of appropriate vs. inappropriate temporal fluctuations, in terms of the musical
composition. The synthetic and shuffled fluctuations were created without regard

to the musical structures and were therefore inappropriate when imposed onto the
rhythm of the music. This was not an issue in Experiments 4 and 5 where only
the beat times were used. If 1/f structure is sufficient, then the addition of musical
information (rhythm) should not improve prediction. If the appropriateness of the
fluctuations (source) is necessary, then subjects may struggle with the inappropriate
conditions due to violations of expectancy.

4.6.1 Method: Rhythm

Stimuli The stimuli were series of clicks; however, instead of beat times, the clicks rendered the rhythm of Triana from Isaac Albeniz’s Iberia II. The rhythm did not include chord asynchronies or dynamics² from the performance, because chord asynchronies are known to improve tempo tracking (Palmer, 1997), and dynamics have
been found to be highly correlated with tempo fluctuations (Sloboda and Juslin,
2001). Thus, these two musical parameters were eliminated in order to control for
effects of chord asynchronies and dynamics. The rhythm had different tempo fluctu-
ations for each condition. The rhythm of each condition was presented as monotonic
clicks, simultaneously with the quarter-note beats (lower pitch). Subjects tapped to
the quarter-note level as in Experiments 4 and 5. The sixteenth-note level beats from
each type of fluctuation were used to create the fluctuations in the rhythms as follows:

1. Human Fluctuation The rhythm from Jie Chen’s piano performance was
rendered as clicks (one sound for each chord) instead of different pitches on a
piano.
² Dynamics is a musical term that refers to changes in loudness.

2. Synthetic Fluctuation Using a procedure that was the opposite of extracting
beats from a musical performance, the IBIs from the sixteenth-note level synthetic
condition were imposed onto the rhythm of Triana from Isaac Albeniz's Iberia II.

3. Shuffled Fluctuation Using a procedure that was the opposite of extracting
beats from a musical performance, the IBIs from the sixteenth-note level shuffled
condition were imposed onto the rhythm of Triana from Isaac Albeniz's Iberia II.
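The "imposing" procedure in items 2 and 3 amounts to a simple time warp of score positions onto a fluctuating timeline. The sketch below is a hypothetical reconstruction (the function and variable names are mine, not the dissertation's actual analysis code), assuming rhythm onsets are expressed in sixteenth-note units relative to the score:

```python
def impose_fluctuation(ibis, event_positions):
    """Warp score positions onto a fluctuating timeline.

    ibis            : inter-beat intervals (seconds) at the sixteenth-note level
    event_positions : rhythm onsets in sixteenth-note units (may be fractional)
    Returns the onset time (seconds) of each rhythm event.
    """
    # Cumulative beat times: beat k occurs after the first k intervals.
    beat_times = [0.0]
    for ibi in ibis:
        beat_times.append(beat_times[-1] + ibi)

    onsets = []
    for pos in event_positions:
        k = int(pos)          # index of the interval containing this event
        frac = pos - k        # fractional position within that interval
        onsets.append(beat_times[k] + (frac * ibis[k] if frac else 0.0))
    return onsets
```

For example, with IBIs of [0.5, 0.6, 0.4] seconds, an event at score position 1.5 lands at 0.5 + 0.5 × 0.6 = 0.8 s.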

Participants The same 12 subjects from Experiments 4 and 5 participated in this
experiment.

Apparatus Same as Experiment 4.

Procedure The procedure followed was the same as in Experiment 5.

4.6.2 Analysis: Rhythm

Cross-correlations Cross-correlations between ITIs and IBIs were calculated for
each trial (72 trials per condition). Also, as in Experiments 4 and 5, quarter-note
level IBIs were used, because this was the level at which subjects tapped.
Two-way ANOVAs were conducted on the correlation coefficients with the co-
efficients for each lag as the dependent variable. The analyses used within-subject
factors: Fluctuation (human vs. synthetic vs. shuffled ), and Trial (6). Significance
was computed at p < .01. No significant effects of trial or interactions involving
trial were found, implying that no learning had taken place; trial was therefore not
considered further. The ANOVAs were rerun collapsing over Trial, as a one-way ANOVA with
the within-subjects factor of Fluctuation (human vs. synthetic vs. shuffled ). Pairwise
t-tests were used for post-hoc comparisons.
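The lag convention used in these analyses (lag 0 = prediction, lag +1 = tracking, lag −1 = proaction) can be made concrete with a plain Pearson correlation on shifted series. This is an illustrative reimplementation, not the original analysis code:

```python
from statistics import mean

def lag_xcorr(iti, ibi, lag):
    """Pearson correlation between inter-tap and inter-beat intervals.

    lag 0  pairs ITI(n) with IBI(n)   -> prediction
    lag +1 pairs ITI(n) with IBI(n-1) -> tracking (taps echo the last beat)
    lag -1 pairs ITI(n) with IBI(n+1) -> proaction (taps anticipate the next)
    """
    # Shift one series against the other; zip truncates to the overlap.
    pairs = list(zip(iti[lag:], ibi)) if lag >= 0 else list(zip(iti, ibi[-lag:]))
    x, y = [p[0] for p in pairs], [p[1] for p in pairs]
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den
```

A tapper whose intervals perfectly echo the previous inter-beat interval yields a lag 1 correlation of 1.0 under this convention.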

Fractal Analyses As in Experiments 2, 4, and 5, the Power Spectral Density (PSD)
and Hurst's Rescaled Range (R/S) analyses were calculated for the ITI and the relative
phase time-series from each trial across subjects (72 trials per condition).
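A minimal version of the R/S estimate of H could look like the following. Window sizes and the fitting range are free parameters here, so treat this as a sketch of the technique rather than the exact analysis pipeline:

```python
import math
from statistics import mean, pstdev

def rescaled_range(window):
    """R/S for one window: range of the cumulative mean-adjusted sum,
    rescaled by the window's standard deviation."""
    m = mean(window)
    cum, s = [], 0.0
    for v in window:
        s += v - m
        cum.append(s)
    return (max(cum) - min(cum)) / pstdev(window)

def hurst_rs(series, min_n=8):
    """Estimate H as the slope of log(R/S) against log(n), averaging R/S
    over non-overlapping windows whose size n doubles each step."""
    logs_n, logs_rs = [], []
    n = min_n
    while n <= len(series) // 2:
        windows = [series[i:i + n] for i in range(0, len(series) - n + 1, n)]
        logs_n.append(math.log(n))
        logs_rs.append(math.log(mean(rescaled_range(w) for w in windows)))
        n *= 2
    # Least-squares slope of log(R/S) on log(n) is the Hurst exponent.
    mx, my = mean(logs_n), mean(logs_rs)
    return (sum((a - mx) * (b - my) for a, b in zip(logs_n, logs_rs))
            / sum((a - mx) ** 2 for a in logs_n))
```

For white noise the estimate should sit near H = 0.5 (R/S is known to be somewhat biased upward at small window sizes); persistent fGn yields H > 0.5, anti-persistent fGn H < 0.5.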

4.6.3 Results: Rhythm

Cross-correlations

Lag 0 All of the lag 0 cross-correlation coefficients for all three fluctuation con-
ditions were significantly greater than zero, human (100%), synthetic (100%), and
shuffled (100%). An ANOVA on the lag 0 cross-correlation coefficients revealed a
significant main effect of Tempo Fluctuation, F (2, 22) = 66.82, p < .01. The coef-
ficients for the human and synthetic conditions were significantly greater than the
coefficients for the shuffled condition. A small but significant difference was observed
between the human and synthetic conditions: coefficients for the human condition
were greater than the coefficients for the synthetic condition (p < .01; see Figure
4.6(center)). Thus, subjects were able to predict fluctuations for all three condi-
tions but were much better for the fractal conditions. The decrease in prediction for
the synthetic and shuffled conditions indicates that even when fluctuations contained
1/f structure, the inappropriate fluctuations were more difficult to predict than the
human (appropriate) fluctuations.

Positive lag 1 All of the lag 1 cross-correlation coefficients for all three fluctua-
tion conditions were significantly greater than zero, human (100%), synthetic (100%),
and shuffled (100%). An ANOVA on the lag 1 cross-correlation coefficients revealed
a significant main effect of Tempo Fluctuation, F (2, 22) = 27.34, p < .01. The co-
efficients for the synthetic and shuffled conditions were significantly greater than the
coefficients for the human condition (p < .01; Figure 4.6 (right)). Subjects showed
stronger tracking for the shuffled and synthetic conditions which suggests that the
appropriateness of the fluctuations was important for this measure. Thus, subjects
tracked more for inappropriate tempo fluctuations, even when 1/f structure was
present.

Negative lag-1 For the human condition, 23% of the cross-correlation coefficients
were significantly different from zero; for the synthetic condition, 100% were. In the
shuffled condition, however, only 5% differed significantly from zero. An ANOVA on
the lag-1 cross-correlation coefficients revealed a significant main effect of Tempo
Fluctuation, F (2, 22) = 60.25, p < .01. The coefficients
for the human and shuffled conditions were significantly less than the coefficients for
the synthetic condition (p < .01). These results are shown in Figure 4.6 (left). Thus,
subjects proacted most for the synthetic condition, and very little for the shuffled
condition.

Fractal Analyses The results showed that β < 1 for every trial; thus all trials
were categorized as fGn, and the R/S analysis was also calculated. In general, the H
and β values agreed according to the equation β = 2H − 1. Mean H-values from the
R/S analysis are reported along with the percentage of trials that were significantly
different from H = 0.5 in Table 4.4.
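The classification rule and the β = 2H − 1 relation used throughout these analyses can be written down directly (the helper names are illustrative, not from the original code):

```python
def classify_series(beta):
    """PSD-based dichotomy: beta < 1 is treated as fractional Gaussian
    noise (fGn), for which the R/S analysis is appropriate."""
    return "fGn" if beta < 1 else "fBm"

def hurst_from_beta(beta):
    """For fGn the spectral and Hurst exponents satisfy beta = 2H - 1."""
    return (beta + 1) / 2

def persistence(h):
    """Qualitative label for a Hurst exponent."""
    if h > 0.5:
        return "persistent"
    if h < 0.5:
        return "anti-persistent"
    return "uncorrelated"
```

So white noise (β = 0) maps to H = 0.5, and any β between 0 and 1 maps to a persistent fGn with 0.5 < H < 1.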

[Figure: bar plots of the mean cross-correlation coefficients (r) at lag-1, lag 0, and
lag 1 for the human, synthetic, and shuffled conditions; *p < .01. Lag-1 significant-
trial percentages: human 23%, shuffled 5%.]

Figure 4.6: Mean coefficients from the cross-correlation between ITIs
and IBIs at lag-1, lag 0, and lag 1 for Experiment 6: Rhythm. The
percentage of trials for which the cross-correlation coefficient was signifi-
cantly different from zero is listed if the percentage was less than 100%.
Error bars equal standard error of the mean.

Inter-Tap Intervals For ITIs from the fractal conditions, human and synthetic,
29% and 100% of trials, respectively, were persistent (p < .05), indicating positive
long-range correlation (0.5 < H < 1). None of the trials from the fractal conditions
were anti-persistent. Approximately one quarter (24%) of trials from the shuffled
condition were anti-persistent (p < .01); none of the trials from the shuffled condition
were persistent. Moreover, mean ITI H values for each condition (H = .56 human;
H = .65 synthetic; H = .44 shuffled ) scaled with the H values of the stimuli at the
quarter-note metrical level (H = .67 human; H = .78 synthetic; H = .54 shuffled ),
not with the stimuli at the sixteenth-note level (H = .76 human; H = .76 synthetic;
H = .51 shuffled ). Results are displayed in Table 4.4.

Table 4.4: Fractal statistics for the ITIs and relative phase time series
from Experiment 6: Rhythm. Mean Hurst exponents (H from R/S analysis)
averaged across subjects, and the percentage of trials significantly
persistent (P), anti-persistent (AP), or not significant (NS), are displayed
(p < .05).

                     Stimulus        Inter-Tap Interval           Relative Phase
Fluctuation          H(fGn)     H(fGn)   %P   %NS   %AP     H(fGn)   %P   %NS   %AP
Rhythm Human          .76        .56     29    71    0       .57     31    69    0
Rhythm Synthetic      .76        .65    100     0    0       .56     42    58    0
Rhythm Shuffled       .51        .44      0    76   24       .51     18    72   10

Relative Phase For relative phase from the fractal stimuli, human and synthetic,
31% and 42% of the trials were significantly persistent (p < .05) respectively, and none
of the trials were anti-persistent. The shuffled condition also showed persistent trials
(18%), and some anti-persistent trials (10%). The mean H value was smallest for
the shuffled condition (H = .51) and for the two fractal conditions mean H was
nearly equal (H = .57 human; H = .56 synthetic); thus the H of relative phase was
scaling with the H values of the stimuli (H = .51 shuffled ; H = .76 human; H = .76
synthetic).
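The relative phase measure analyzed here can be computed in the usual discrete way for tapping data: each tap's asynchrony from its target beat, normalized by the local inter-beat interval. This sketch assumes taps have already been matched one-to-one with beats, which glosses over the matching step:

```python
def relative_phase(taps, beats):
    """Discrete relative phase (in cycles) of each tap with respect to
    its matched beat; negative values mean the tap led the beat."""
    phases = []
    for n in range(min(len(taps), len(beats) - 1)):
        local_ibi = beats[n + 1] - beats[n]   # normalize by the local IBI
        phases.append((taps[n] - beats[n]) / local_ibi)
    return phases
```

A tapper who consistently lags each beat by one tenth of the inter-beat interval produces a constant relative phase of 0.1 cycles.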

4.6.4 Summary: Rhythm

Subjects’ prediction was significantly better for the fractal conditions than for the
shuffled condition, and best for the human condition. Tracking was greater for the
synthetic and shuffled conditions as compared to the human. Proaction (lag-1) was
highest for the synthetic condition. These results indicate that subjects continue to
exploit the long-term structure to predict significantly better than when there is only
short term structure. Another interesting result was that subjects predicted more
for the human fluctuations than for the synthetic and shuffled, and tracked more
for the synthetic and shuffled than for the human fluctuations. This could be a
result of the inappropriateness of the synthetic and shuffled fluctuations, which
became apparent in this experiment because subjects were given the rhythm, which
carries a great deal of information cueing how and when the tempo should fluctuate.
Thus, when musical information is included, source becomes important because of
the appropriateness of the fluctuations with respect to the musical structure.
Fractal conditions yielded persistent ITI time series, whereas the random
fluctuations yielded anti-persistent ITIs. Fractal analysis of the relative phase time
series showed persistence for the fractal conditions and a small number of persistent
trials for the shuffled condition. H values from the ITIs and relative phase appear
to scale with the H values of the stimuli at the quarter-note metrical level, most
likely because this is the level at which subjects were tapping. The fractal scaling
is, again, support in favor of strong anticipation as the mechanism for prediction,
but the significant prediction for the shuffled condition indicates otherwise.

4.7 EXPERIMENT 7: MUSIC

Experiment 7 continued to examine the issue of whether 1/f structure is sufficient for
prediction of temporal fluctuations. Music (rhythm and pitch) with fluctuating tempi
were used as the stimuli and subjects tapped the beat which was explicitly marked
in the stimulus. The addition of pitch information was hypothesized to improve sub-
jects’ ability to predict temporal fluctuations for the human fluctuations because the
pitches provide melodic and harmonic structure, and further information about the
grouping structure, which is known to be highly correlated with temporal fluctuation.
This means that the additional musical information would give even more cues
about the fluctuations, but only if they are appropriate for the music as they are in
the human condition. The synthetic and shuffled fluctuations were created without
regard to the musical structure and were therefore inappropriate when imposed onto
the rhythm of the music. If 1/f structure is sufficient, then the addition of pitch
(musical) information should not improve prediction. If the appropriateness of the
fluctuations (source) is necessary, then subjects may struggle with the inappropriate
fluctuation conditions due to violations of expectancy.

4.7.1 Method: Music

Stimuli The final experiment used the same rhythm and fluctuations from Exper-
iment 6. However, instead of being rendered as clicks, the actual pitches were used.
As in Experiment 6, the stimuli contained no chord asynchronies or dynamics.

Participants The same 12 subjects from Experiments 4, 5, and 6 participated in
this experiment.

Apparatus The same as in Experiments 4, 5, and 6.

Procedure The same as in Experiment 6.

4.7.2 Analysis: Music

Cross-correlations Cross-correlations between ITIs and IBIs were calculated for
each trial (72 trials per condition). Also, as in Experiments 4, 5, and 6, quarter-note
level IBIs were used, because this was the level at which subjects tapped.
Two-way ANOVAs were conducted on the correlation coefficients with the co-
efficients for each lag as the dependent variable. The analyses used within-subject
factors: Fluctuation (human vs. synthetic vs. shuffled ), and Trial (6). Significance
was computed at p < .01. No significant effects of trial or interactions involving
trial were found, implying that no learning had taken place; trial was therefore not
considered further. The ANOVAs were rerun collapsing over Trial, as a one-way ANOVA with
the within-subjects factor of Fluctuation (human vs. synthetic vs. shuffled ). Pairwise
t-tests were used for post-hoc comparisons.

Fractal Analyses As in Experiments 2, 4, 5, and 6, the Power Spectral Density
(PSD) and Hurst's Rescaled Range (R/S) analyses were calculated for the ITI and
the relative phase time-series from each trial across subjects (72 trials per condition).

4.7.3 Results: Music

Cross-correlations

Lag 0 All of the lag 0 cross-correlation coefficients for all three fluctuation con-
ditions were significantly greater than zero, human (100%), synthetic (100%), and
shuffled (100%). An ANOVA on the lag 0 cross-correlation coefficients revealed a
significant main effect of Tempo Fluctuation, F (2, 22) = 124.00, p < .01. The co-
efficients for the human and synthetic conditions were significantly greater than the
coefficients for the shuffled condition (p < .01; see Figure 4.7 middle).

Positive lag 1 All of the lag 1 cross-correlation coefficients for all three fluctua-
tion conditions were significantly greater than zero, human (100%), synthetic (100%),
and shuffled (100%). An ANOVA on the lag 1 cross-correlation coefficients revealed
no significant main effect (see Figure 4.7 right).

Negative lag-1 For the human condition, 46% of the cross-correlation coefficients
were significantly different from zero; for the synthetic condition, 100% were. In the
shuffled condition, however, only 16% differed significantly from zero. An ANOVA
on the lag-1 cross-correlation coefficients revealed a significant
main effect of Tempo Fluctuation, F (2, 22) = 62.69, p < .01. The coefficients for
the human and shuffled conditions were significantly less than the coefficients for the
synthetic condition (p < .01). These results are shown in Figure 4.7 (left).

[Figure: bar plots of the mean cross-correlation coefficients (r) at lag-1, lag 0, and
lag 1 for the human, synthetic, and shuffled conditions; *p < .01. Lag-1 significant-
trial percentages: human 46%, shuffled 16%.]

Figure 4.7: Mean coefficients from the cross-correlation between ITIs
and IBIs at lag-1, lag 0, and lag 1 for Experiment 7: Music. The
percentage of trials for which the cross-correlation coefficient was signifi-
cantly different from zero is listed if the percentage was less than 100%.
Error bars equal standard error of the mean.

Table 4.5: Fractal statistics for the ITIs and relative phase time series
from Experiment 7: Music. Mean Hurst exponents (H from R/S analysis)
averaged across subjects, and the percentage of trials significantly
persistent (P), anti-persistent (AP), or not significant (NS), are displayed
(p < .05).

                     Stimulus        Inter-Tap Interval           Relative Phase
Fluctuation          H(fGn)     H(fGn)   %P   %NS   %AP     H(fGn)   %P   %NS   %AP
Music Human           .76        .57     56    44    0       .66     85    15    0
Music Synthetic       .76        .65    100     0    0       .63     74    26    0
Music Shuffled        .51        .45      0    89   11       .59     50    46    4

Fractal Analyses The results showed that β < 1 for every trial; thus all trials
were categorized as fGn, and the R/S analysis was also calculated. In general, the H
and β values agreed according to the equation β = 2H − 1. Mean H-values from the
R/S analysis are reported along with the percentage of trials that were significantly
different from H = 0.5 in Table 4.5.

Inter-Tap Intervals For ITIs from the fractal stimuli, human and synthetic, 56%
and 100% of trials, respectively, were persistent (p < .05), indicating positive long-
range correlation (0.5 < H < 1). None of the trials from the fractal conditions were
anti-persistent. Of the trials from the shuffled condition, 11% were anti-persistent
(p < .01); none of the trials from the shuffled condition were persistent. Moreover,
mean H values for each condition (H = .57 human; H = .65 synthetic; H = .45
shuffled ) scaled with the H values of the stimulus at the quarter-note metrical level
(H = .67 human; H = .78 synthetic; H = .54 shuffled ), rather than with the stimuli
at the sixteenth-note level (H = .76 human; H = .76 synthetic; H = .51 shuffled ).
This is similar to the results from Experiment 6, most likely because this is the level
at which subjects were tapping.

Relative Phase For relative phase from the fractal stimuli, human and synthetic,
85% and 74% of the trials were significantly persistent (p < .05) respectively, and none
of the trials were anti-persistent. The shuffled condition also showed persistent trials
(50%), and few anti-persistent trials (4%). All three of these percentages are much
larger than the corresponding relative phase percentages from Experiments 4-6, and
the mean H values appear to scale with the H values of the stimuli at the quarter-note
level. Results are displayed in Table 4.5.

4.7.4 Summary: Music

Prediction was significantly better for the fractal conditions than the shuffled, and
there was no difference between the two fractal conditions. Tracking was not sig-
nificantly different for the three conditions. Proaction was again greatest for the
synthetic condition. These results indicate that subjects continue to exploit the long-
term structure to predict significantly better than when there is only short term
structure.
Fractal conditions yielded persistent ITI time series, whereas the random
fluctuations yielded anti-persistent ITIs. Fractal analysis of the relative phase time
series showed persistence for the fractal conditions and a small number of persistent
trials for the shuffled condition. H values from the ITIs and relative phase appear
to scale with the H values of the stimuli at the quarter-note metrical level, most
likely because this is the level at which subjects were tapping. The fractal scaling
is evidence which supports strong anticipation as the mechanism for prediction, but
the significant prediction for the shuffled condition indicates otherwise.

4.8 META-ANALYSIS FOR EXPERIMENTS 4-7

A meta-analysis was conducted to compare the results of all four experiments. This
enabled a thorough investigation of the sufficiency and necessity of 1/f structure in
temporal prediction. Specifically, it compared the results obtained when different
types and amounts of information were presented in the stimulus: 1) fractal
information (long-term correlation and scaling) and 2) musical information (rhythm
and pitch). The factors included Experiment (Quarter, Eighth, Rhythm, Music) and
Tempo Fluctuation (human, synthetic, shuffled ).

4.8.1 Cross-correlations

Two-way ANOVAs were conducted on the correlation coefficients with the coefficients
for each lag as the dependent variable. The ANOVA was calculated across all four
experiments, with within-subject factors: Fluctuation (human vs. synthetic vs. shuf-
fled ), and Experiment (quarter vs. eighth vs. rhythm vs. music). Significance was
computed at p < .01. Pairwise t-tests were used for post-hoc comparisons.
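The pairwise post-hoc comparisons reduce to paired t-tests on per-subject coefficients. A bare-bones version, returning the t statistic and degrees of freedom without the p-value lookup, might be (names illustrative):

```python
import math
from statistics import mean, stdev

def paired_t(cond_a, cond_b):
    """Paired t statistic and degrees of freedom for two conditions
    measured on the same subjects (one value per subject per condition)."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    # t = mean difference over its standard error; df = n - 1.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

With 12 subjects, as here, each comparison has 11 degrees of freedom, and the resulting t is compared against the critical value at p < .01.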

Lag 0 Significant main effects of Fluctuation, F (2, 22) = 250.40, p < .01, and Ex-
periment, F (3, 33) = 40.40, p < .01, were found, and their interaction was significant,
F (6, 66) = 11.74, p < .01. These main effects and interaction are shown in Figure
4.8. Post-hoc tests revealed the coefficients from the human and synthetic conditions
were significantly greater than the coefficients from the shuffled condition (p < .01;
Figure 4.8A). This result shows that prediction was significantly better when subjects
tapped to a fractal stimulus as compared to tapping to a shuffled stimulus. Coeffi-
cients from the Quarter Experiment were significantly less than coefficients from the
Eighth, Rhythm, and Music Experiments (p < .01; Figure 4.8B). This shows that
subjects improved prediction with the addition of scaling information, but not musical
information (rhythm, pitch). Coefficients from the Eighth Experiment were significantly
greater than the Quarter and Rhythm Experiments (p < .01). This result
can be accounted for by the interaction shown in Figure 4.8C. There is no significant
difference between the results from the Eighth, Rhythm and Music Experiments for
the human fluctuations, but there is a significant decrease in the results from the
Rhythm and Music Experiments when subjects tapped to the synthetic and shuffled
fluctuations. This may be because the synthetic and shuffled fluctuations were not
created to be appropriate for the structure of the music; once the rhythm and pitch
were added, the task became more difficult because the extra information was not
congruent with subjects' expectations of where and when the tempo should fluctuate
according to the musical structure.

Positive lag 1 The ANOVA on the lag 1 cross-correlation coefficients (ITI x IBI)
revealed a significant main effect of Fluctuation, F (2, 22) = 38.93, p < .01, and the in-
teraction between Fluctuation and Experiment was significant, F (6, 66) = 14.92, p <
.01. The main effect and interaction are shown in Figure 4.9. Post-hoc tests re-
vealed the coefficients from the synthetic and shuffled conditions were significantly
greater than coefficients from the human (p < .05). This indicates that subjects
were tracking more for the non-human fluctuations, synthetic and shuffled. Overall,
subjects tracked more for the inappropriate fluctuations. When examining the interaction,
note that tracking is strongest for the Quarter Experiment, shuffled fluctuations,
which is expected, as this is the condition with the least information and no long-term
structure. Tracking is weakest for the Quarter Experiment, human fluctuations,
which is surprising because the stimulus contains only long-term structure.

[Figure: bar plots of the mean lag 0 cross-correlation coefficients (r) by fluctuation
(Human, Synthetic, Shuffled) and experiment (Quarter, Eighth, Rhythm, Music);
*p < .01.]

Figure 4.8: Means of the lag 0 cross-correlation coefficients (ITI x IBI)
for all experiments and fluctuations. A) Shows the main effect of Fluctu-
ation, B) shows the main effect of Experiment, and C) shows the inter-
action between Experiment and Fluctuation.
Negative lag-1 The ANOVA on the negative lag-1 cross-correlation coefficients (ITI
x IBI) revealed a significant main effect of Fluctuation, F (2, 22) = 146.99, p < .01,
and the interaction between Fluctuation and Experiment was significant, F (6, 66) =
6.26, p < .01. The main effect and interaction are shown in Figure 4.10. Post-hoc
tests revealed that coefficients from the synthetic condition were significantly greater
than both the coefficients from the human and shuffled conditions (p < .01), and
coefficients for the human fluctuation condition were significantly greater than the
shuffled (p < .01). This indicates that subjects tapped ahead of the fluctuations for
the synthetic condition; thus there was something in the structure of this stimulus
that caused subjects to proact more often than for the other conditions. The
interaction shows that for the human condition, the Quarter Experiment coefficients were
significantly greater than the coefficients from the Eighth, Rhythm, and Music Ex-
periments. Thus, for the Quarter Experiment, human fluctuations, subjects tapped
ahead of the fluctuations more often than for the other experiments (human). This
indicates that there is something different about the human fluctuations that was not
present in the shuffled fluctuations.

4.8.2 Fractal Analyses

Table 4.6 shows the mean H values and percent of significantly fractal trials for ITIs
and relative phase for each condition of each experiment. Two-way ANOVAs were
calculated for H values from ITIs and relative phase with within-subject factors:
Experiment (Quarter vs. Eighth vs. Rhythm vs. Music), and Fluctuation (human
vs. synthetic vs. shuffled ). Pairwise t-tests were used for posthoc comparisons.

[Figure: bar plots of the mean lag 1 cross-correlation coefficients (r) by fluctuation
and experiment; *p < .01.]

Figure 4.9: Means of the lag 1 cross-correlation coefficients (ITI x IBI)
for all experiments and fluctuations. A) Shows the main effect of Fluctu-
ation, B) shows the interaction between Experiment and Fluctuation.

[Figure: bar plots of the mean lag-1 cross-correlation coefficients (r) by fluctuation
and experiment; *p < .01.]

Figure 4.10: Means of the lag-1 cross-correlation coefficients (ITI x IBI)
for all experiments and fluctuations. A) Shows the main effect of Fluctu-
ation, B) shows the interaction between Experiment and Fluctuation.

Inter-Tap Intervals Table 4.6 shows that H values from the ITIs for the fractal
fluctuations of each experiment were either persistent or not significantly different
from white, uncorrelated noise (p < .05); and H values from the ITIs for the shuffled
conditions of each experiment, were either anti-persistent or not significantly different
from white, uncorrelated noise (p < .05). The ANOVA on the H values from the ITIs
revealed a significant main effect of Fluctuation, F (2, 22) = 2202.96, p < .01, and
the interaction between Experiment and Fluctuation was also significant, F (6, 66) =
6.50, p < .01. Post-hoc tests revealed that H values from the human and synthetic
conditions were significantly greater than the H values from the shuffled condition
(p < .01). This indicates that the type of fractal structure in the ITIs was dependent
upon whether the stimulus had a persistent fractal structure or no fractal structure.
This main effect is shown in Figure 4.11A.
The H values from the ITIs scaled at an even finer level than simply persistent
vs. anti-persistent. The largest mean H value from the ITIs for each experiment
always corresponded to the synthetic condition, which had the largest H value of the
three conditions at the quarter-note and eighth-note metrical levels. The smallest
mean H value from the ITIs corresponded to the shuffled condition, which always
had the smallest H value of the three fluctuation conditions. Post-hoc tests also re-
vealed that H values from the synthetic condition were significantly greater than the
H values from the human and shuffled conditions (p < .01). Moreover, the ITIs did
not scale with each experiment. The mean H values within each condition did not
change significantly for the different experiments, except for the synthetic Quarter
Experiment. Post-hoc tests revealed that H values from the synthetic, Quarter Ex-
periment were significantly less than H values from the Eighth, Rhythm, and Music
Experiments (p < .01). It appears that the ITIs scaled with the H values from the
Quarter Experiment regardless of how much fractal or musical information was added
to the stimulus. It is possible that this is due to the fact that subjects always tapped
at the quarter-note metrical level. It is also possible that subjects were streaming
the two different pitches and only paying attention to the pitches occurring at the
quarter-note metrical level, which caused the ITIs to reflect only the structure of this
time scale. This indicates that the structure of the ITIs scaled not just with the
general type of fractal structure, or lack thereof (persistent, anti-persistent, random),
but that the H values of the ITIs scaled with the specific H value of the stimulus
level at which they tapped.

[Figure: bar plots of mean Hurst exponents (H) for the ITIs by fluctuation and
experiment, with the percentage of significantly fractal trials beneath each bar;
*p < .01.]

Figure 4.11: Means of the Hurst Exponents for the inter-tap intervals
(ITIs). A) Displays the main effect of Fluctuation, B) displays the in-
teraction between Experiment and Fluctuation. The percentage of trials
significantly different from H = .5 (blue line) is listed. Error bars equal
standard error of the mean.

[Figure: bar plots of mean Hurst exponents (H) for the relative phase time series
by experiment and by fluctuation; *p < .01.]

Figure 4.12: Means of the Hurst Exponents for the relative phase time
series. A) Shows the main effect of Experiment, B) shows the main effect
of Fluctuation. Error bars equal standard error of the mean. (H = .5,
blue line)

Relative Phase Table 4.6 shows that H values from the relative phase time series
were mostly persistent or not significantly different from white, uncorrelated noise,
with few anti-persistent trials for the shuffled condition of each experiment (p < .05).
There was an increase in both the mean H values (p < .05) and the percentage of
significantly fractal trials (54-85%) for the Music Experiment. The ANOVA on
the H values of the relative phase time series revealed significant main effects of
Fluctuation, F (2, 22) = 9.81, p < .01, and Experiment F (3, 33) = 11.49, p < .01.
Post-hoc tests revealed that H values from the human and synthetic conditions were
significantly greater than the H values from the shuffled (p < .01). The H values
from the Music Experiment were significantly greater than the H values from the
Quarter, Eighth, and Rhythm Experiments (p < .01). These effects are shown in
Figure 4.12. Thus, the errors became significantly more persistent with the addition
of pitches, and errors were significantly less fractal (more random) for the shuffled
(random) condition as compared to the fractal conditions.

4.8.3 Summary: Meta-analysis

Prediction was equal for the two fractal conditions (human and synthetic) and im-
proved with the addition of scaling information (Experiment 5), but not with the
addition of rhythm or pitch information. These results demonstrate that 1/f struc-
ture is sufficient for prediction. There was a main effect showing that prediction
was significantly better for the fractal conditions than the shuffled, which indicates
that 1/f structure improves prediction. However, subjects were able to predict the
shuffled fluctuations better than chance, which indicates that 1/f structure is not
necessary for prediction. This is consistent with the initial hypotheses that subjects
would predict the human and synthetic conditions better than the shuffled.

Table 4.6: Fractal statistics for the ITIs and relative phase time series
from Experiments 4-7. Mean Hurst exponents (H from R/S analysis)
averaged across subjects, and the percentage of trials significantly
persistent (P), anti-persistent (AP), or not significant (NS), are displayed
(p < .05).

                     Stimulus        Inter-Tap Interval           Relative Phase
Fluctuation          H(fGn)     H(fGn)   %P   %NS   %AP     H(fGn)   %P   %NS   %AP
Quarter Human         .67        .57     49    51    0       .55     40    56    4
Quarter Synthetic     .78        .62     89    11    0       .59     53    47    0
Quarter Shuffled      .54        .42      0    54   46       .54     36    57    7
Eighth Human          .70        .55     34    66    0       .53     28    68    4
Eighth Synthetic      .77        .65    100     0    0       .55     38    62    0
Eighth Shuffled       .53        .42      0    49   51       .50     19    62   19
Rhythm Human          .76        .56     29    71    0       .57     31    69    0
Rhythm Synthetic      .76        .65    100     0    0       .56     42    58    0
Rhythm Shuffled       .51        .44      0    76   24       .51     18    72   10
Music Human           .76        .57     56    44    0       .66     85    15    0
Music Synthetic       .76        .65    100     0    0       .63     74    26    0
Music Shuffled        .51        .45      0    89   11       .59     50    46    4

However, subjects' prediction was better than chance for the shuffled condition despite
the absence of long-term structure. This indicates that there is something other than
the 1/f structure which enables subjects to predict fluctuations. It is possible that
the short-term structure enabled prediction, because the shuffled condition contained
some short-term correlation at the eighth- and quarter-note levels (see Table 4.1).
The mean H values for the ITIs did not change much with the addition of infor-
mation at a finer time scale or the addition of rhythm or pitch information, which
could be a result of having subjects tap at the same (quarter-note) metrical level for
each experiment. However, the H values did scale with the H values of the tempo
fluctuations (stimuli) from the Quarter-note Experiment. The presence of long-term
correlation yielded persistent ITIs, and the absence of long-term correlation yielded
anti-persistent ITIs. This result is consistent with results from Chen et al. (1997)
and Experiment 2, which show that ITIs exhibit anti-persistence when subjects tap
to non-fluctuating stimuli.
Fractal analyses of the relative phase time series were less clear. The mean H
values were similar from one experiment to the next, except for the Music Experiment.
With the addition of pitch information, there was a significant increase in the number
of fractal trials, for all three conditions. Most trials were persistent, and H values
scaled in a general way with the H values of the stimuli: higher H for fractal stimuli,
lower (whiter) H for shuffled.

4.9 GENERAL DISCUSSION: EXPERIMENTS 4-7

This chapter addressed the role of 1/f structure in prediction and asked whether
fractal structure is necessary and sufficient for prediction of temporal fluctuations.
The following predictions were made: 1) there will be no significant difference be-
tween the two 1/f conditions (human and synthetic), 2) prediction will improve with
the addition of scaling and musical information, 3) the 1/f conditions (human and
synthetic) will elicit stronger prediction (lag 0) than the shuffled condition, and 4)
subjects will show stronger tracking (lag 1) for the shuffled condition than the fractal
conditions.

4.9.1 Is 1/f Structure Strongly Sufficient?

If the fractal structure is sufficient, the source of the 1/f structure will not affect
prediction, and the addition of musical information will not improve prediction. This
was what the results showed. There was no significant difference between subjects’
ability to predict the two fractal conditions (human and synthetic), which confirms
the hypothesis regarding source of 1/f structure. The addition of musical (rhythm
and pitch) information did not improve prediction for the human fluctuations. This
result did not support the hypothesis and indicates that the 1/f structure is more
important for temporal prediction than once believed. Also, prediction significantly
decreased for the synthetic and shuffled fluctuations. This decrease is most likely due
to the fact that the fluctuations for these two conditions were inappropriate for the
piece of music. Thus, when subjects were given the musical information, rather than
just the beat information, the inappropriate fluctuations violated their expectations
which were likely determined by the grouping, harmonic, and melodic structure in
the composition. These results fully support the strong sufficiency of 1/f structure
for prediction in fluctuating stimuli.

4.9.2 Is 1/f Structure Necessary?

Though there was evidence in favor of the necessity of 1/f structure, ultimately, this
cannot be claimed. The results showed that subjects were significantly better at pre-
dicting the fluctuations in stimuli when scaling information was added. Also, subjects
were consistently better at predicting the stimuli which had long-term correlations,
when compared to their ability to predict fluctuations in a non-fractal stimulus which
contained no long-term correlations. These results support the necessity of 1/f struc-
ture, however, a surprising result was how well subjects were able to predict the
shuffled fluctuations. Even when there was no information between taps for Experi-
ment 4 (Quarter Beats), prediction was still significantly greater than zero for about
half (47%) of the trials. This indicates that long-term correlations are not necessary
for prediction of fluctuations. This result was found in each experiment, suggesting
that subjects exploit long-term structure to predict as accurately as possible, but
when there is no long-term structure, subjects are using something else to predict.
It is possible that subjects were following the local fluctuations (short-term correla-
tions) in order to predict. Short-term correlations were present in the Quarter-note
Experiment shuffled stimulus: the autocorrelation at lag 1 was .1442, which is
approximately the mean value of the lag-0 cross-correlation. This short-term struc-
ture may, in fact, be what is necessary for prediction. Because this aspect was not
controlled in this study, it remains a speculation.

4.9.3 Fractal Structure of Tapping Data

The fractal analyses of the ITIs show that the fractal structure in the stimulus is
reflected in the taps via scaling. The mean H for the ITIs was always the smallest for
the shuffled condition, which was also the smallest stimulus H . ITIs were persistent
for the fractal conditions and anti-persistent for the shuffled condition. This is similar
to what was found in Experiment 2: the ITIs were persistent when subjects tapped to
the expressive versions of the music, and anti-persistent when subjects tapped to the
mechanical versions of the music. Chen et al. (2002) also found that when subjects
tapped to a metronome the ITIs were anti-persistent. This evidence supports the
hypothesis of strong anticipation.
The results from this study are evidence against most models of synchronization,
which predict that subjects will track fluctuations in tempo. A few studies (e.g.,
Thaut et al., 1998) have shown that subjects track temporal fluctuations, but the
fluctuations in the stimuli for these previous studies were much smaller than those
seen in music performance, thus the finding of tracking could have been a result of
the phase relationship. Another interesting finding was that the mean H values for
the relative phase time series had stronger persistence for the fractal conditions than
the shuffled, and stronger persistence for all three types of fluctuation in the Music
Experiment (Experiment 7). It can be seen in Figure 4.12 that the relative phase
appears to whiten for the shuffled condition. It has been suggested by Grigolini et
al. (2009) that the H value for the errors gets more white, or random, when task
difficulty increases. The data from this chapter seem to agree with this idea that
easier tasks yield a strongly persistent error time series. Experiment 2 showed similar
results, with a higher percentage of significantly persistent trials for the mechanical
conditions, for both the Bach and Chopin (see Table 2.2). The results from this
chapter show that 1/f structure is sufficient though not necessary for prediction of
temporal fluctuations and that fractal structure of the tapping data scaled with the
fractal structure of the stimuli.

Chapter 5

Summary and Conclusions

Pulse is a musical universal which provides a unique opportunity to study the per-
ception of time. In this dissertation, the fundamental nature of pulse was examined.
Many theorists assume that the underlying pulse in music is isochronous, and that
temporal fluctuations are deviations from isochrony. However, previous research has
shown that pulse is not isochronous or purely periodic; it contains temporal fluc-
tuations. Correlations between tempo fluctuations and musical structure, such as
meter and phrase, have been identified (reviewed by Palmer, 1997; Sloboda and
Juslin, 2001). However, the residual fluctuations have been either unmodelled or
assumed to be white noise (Repp, 1999c; Palmer, 1997; Vorberg and Wing, 1996;
Schulze et al., 2005; Large, 2000). The results obtained here may shed new light on
the way we think about pulse in music.
Fractal structure is found throughout nature in biology, psychology, rhythmic
motor activities, and music. It has been reported that fluctuations in loudness and
frequency (Voss and Clarke, 1975), and melody (Hsü and Hsü, 1991; Hsü and Hsü,
1990; Brothers, 2007) exhibit 1/f structure. Other studies have found 1/f struc-
ture in rhythmic behavior, including continuation tapping and synchronization with
a metronome (Torre and Delignières, 2008; Lemoine et al., 2006; Chen et al., 2002).
These findings, in combination with the known structure of tempo fluctuations, led to
the hypothesis that the tempi of musical performances may exhibit fractal structure.
Few studies have analyzed the nature of tempo fluctuations in entire performances
(Repp, 1992b; Repp, 1995), and until now, no studies have used fractal analyses on
pulse in music performance. In performed music, temporal fluctuations can be com-
plex, yet listeners are able to coordinate movements with the pulse. If 1/f fluctuations
are intrinsic to pulse, and to rhythmic behavior, what does this mean for expectations
of listeners and their ability to coordinate with a fluctuating stimulus?

5.1 PULSE IN MUSIC IS 1/f

In Experiments 1 and 3, entire musical performances were analyzed to investigate
the nature of pulse. The results showed that tempo fluctuations exhibit long-term
correlation (persistence, 0.5 < H < 1) and fractal scaling. Persistence is typical of
the fractal structure that is found in nature (Feder, 1988; Bassingthwaighte et al.,
1994), and indicates that tempo changes are smooth. The finding of fractal structure
generalized to multiple composers, pianists, performance practices, time
periods, and musical styles. These results suggest that fluctuations are intrinsic to
pulse. Such observations provide evidence against theoretical models that describe
pulse as an isochronous signal plus white noise (see Madison, 2004). Is this fractal
structure simply a result of the fractal nature of humans? After all, fractal struc-
ture has been found in the self-paced activities of walking and tapping (Torre and
Delignières, 2008). Perhaps a certain amount of fractal structure is inherent in motor
activity. It is unlikely that this is the only explanation, because the degree of fractal
structure found in the present studies was determined by the specific piece of music.
Thus, there is some aspect specific to the composition that causes expert pianists to
fluctuate the tempo of a piece of music in similar ways.

One possible explanation for the similar tempo structure among different pianists
is meter and phrasing of the compositions, which are both nested structures and are
known to be correlated with tempo fluctuations (Sloboda, 1985). Music is often ana-
lyzed and notated according to these nested structures. The finding of 1/f structure
in the tempi does not discount the evidence of musical structure’s causal relationship
with tempo fluctuation. In fact, it is likely that these structures are, in part, what
creates scaling and persistence. However, it has been shown that these musical as-
pects cannot account for all of the tempo fluctuations and the differences that exist
among performers (Repp, 1992b; Repp, 1995; Penel and Drake, 1998). Instead, frac-
tal analysis is a way of characterizing these fluctuations, and it allows for the entire
performance to be analyzed objectively without the time consuming subjective work
of musicological analysis. Another important aspect of this approach is that it does
not assume the fluctuations in tempo are deviations from isochrony, but are inherent
to the performance. This suggests fluctuations are intrinsic to pulse.

5.2 COORDINATION WITH LARGE FLUCTUATIONS

How is pulse perceived and what is the mechanism for coordinating movement with
the pulse of music that fluctuates in time? In Experiment 2, it was shown that subjects
can perceive and entrain to pulse, at multiple metrical levels, amid a fluctuating
tempo. However, instead of tracking, subjects predicted the tempo fluctuations.
Fractal analyses of the ITIs and the error time series revealed that the fractal structure
of the IBIs in the expressive versions was reflected in the ITIs via scaling, whereas the
mechanical versions caused the ITIs to be anti-persistent and the error time series
to be persistent. These results are in agreement with those found in tapping to a
metronome (Chen et al., 1997; Chen et al., 2002).

The importance of 1/f structure in pulse prediction was investigated through a
series of experiments (Experiments 4-7) that varied the amount and type of musical
information present in stimuli having either fractal or random fluctuations. The re-
sults show that fractal structure is sufficient, if not necessary, for prediction in stimuli
with large temporal fluctuations. The addition of scaling information to the stimuli
improved subjects’ ability to predict. The further addition of musical information
(rhythm and pitch) did not further improve prediction.

5.2.1 1/f Structure is Sufficient

Experiments 4-7 illustrated that 1/f structure is sufficient for prediction of tempo-
ral fluctuations in stimuli with large tempo changes. The first issue addressed was
the importance of the source of the 1/f fluctuations. When subjects tapped with
fluctuating beat times, the source of 1/f structure was not a determinant of predic-
tion. When scaling information (more events) was added to the stimuli, subjects’
prediction improved significantly for the two different 1/f conditions. The addition
of rhythm and pitch information to the stimulus allowed for the investigation of the
second issue: whether musical information helps subjects predict fluctuations. The
rhythm gave subjects cues to the metrical and grouping structure of the music and the
pitches gave subjects the melodic and harmonic structure. Since musical structure
(meter, grouping, melody) is known to be highly correlated with tempo fluctuation
and the degree of fractal structure depends on the composition, the addition of musi-
cal information should have given subjects more information about potential tempo
fluctuations. Yet, subjects did not improve in their ability to predict, even when the
fluctuations were appropriate for the musical structure. These results indicate that
1/f structure is sufficient for prediction of temporal fluctuations.

5.2.2 1/f Structure is not Necessary

Experiments 4-7 demonstrated that fractal structure is not necessary for subjects
to predict fluctuations for stimuli containing large temporal fluctuations. This was
investigated by isolating the structure of the fluctuations. Subjects were always sig-
nificantly better at predicting fluctuations exhibiting fractal structure than predicting
shuffled (random) fluctuations. However, subjects were able to predict shuffled fluctu-
ations better than chance. Prediction of shuffled fluctuations may have been possible
because subjects may have used short term correlations (local changes in tempo)
present in the shuffled time series (ac1 = .1442). Subjects showed almost no
ability to proact for the shuffled condition. The addition of scaling information in the
stimuli increased prediction significantly. Prediction was increased for all the condi-
tions, but the fractal conditions still had significantly higher prediction rates than the
shuffled condition. The addition of beats from a lower metrical level allowed subjects
to use both long-term and short-term structure to coordinate their taps with the
stimulus. Thus, although fractal fluctuations enabled much stronger prediction, 1/f
structure was not strictly necessary, as short term correlations appear to enable some
prediction as well. This is evidence that 1/f structure is not necessary for prediction.

5.3 MODELS

The scaling properties of the metrical hierarchy could explain why subjects’ pre-
diction improved with the addition of beats at the eighth-note level in Experiment
5: another time scale of information was added to the stimuli. The fact that this
addition also improved prediction for the random fluctuations means that this moni-
toring of multiple metrical levels exploits both long- and short-term correlation and
scaling information.¹ One might explain these observations with the Large and
Jones (1999) model, in which entrainment takes place at multiple time scales si-
multaneously via neural oscillations of different frequencies that entrain to stimuli
and communicate with one another. This model successfully tracked temporal fluc-
tuations in expressive performances. Systematic temporal structure that is charac-
teristic of human performances improved tracking, but randomly generated temporal
irregularities did not (Large and Palmer, 2002). However, the observations from this
dissertation are not consistent with tempo tracking models (Large and Palmer, 2002;
Vorberg and Schulze, 2002). Tempo tracking would imply smoothed IBIs, as found by
Dixon et al. (2006). In Chapters 2 and 4, the H values for ITIs were approximately
equal to, or less than, the H values of the veridical IBIs; and the H values from the
ITIs scaled with the H values from the IBIs. Tempo tracking appears to be ruled
out by this finding as ITIs would be smoother (H greater) than IBIs due to low-pass
filtering of fluctuations. This opens up new lines of inquiry into the nature of pulse,
not just at the behavioral level, but also at the level of neural processes. Chen et
al. (1997) suggest that long-range dependence in timing fluctuations is the outcome
of distributed neural processes acting on multiple time scales.
This brings up the issue of temporal expectation and violations of temporal ex-
pectancy. How are listeners able to predict temporal fluctuations? Prediction is
evidence against the models which propose tempo tracking in response to fluctua-
tions. The timekeeper model (Wing and Kristofferson, 1973) assumes isochrony for
the pacemaker and adds white noise to account for variance. This cannot account
for the behavior of synchronization with temporally fluctuating musical stimuli. The
oscillator models (Large and Kolen, 1994; Large, 2008; Large and Palmer, 2002) use
¹The shuffled condition did not have an autocorrelation of 0, and thus exhibited some short-term
correlation.

the neurodynamic properties of multi-frequency entrainment. In these models, there
is a nonlinear interaction between periodic phase coupling and period adaptation.
The main discrepancy between these models and the data is that the models assume
that the underlying expectations (pulse) are isochronous. These models show that,
due to this violation of expectancy, subjects will track temporal fluctuations, rather
than predict them. However, the results from Chapters 2 and 4 indicate that subjects
predict more than track tempo changes. One possibility is that subjects are expect-
ing these fluctuations and that the underlying pulse is not isochronous. Torre and
Delignières (2008) proposed a model for continuation tapping to an isochronous stim-
ulus which matched the data quite well. However, perhaps their model is accurate
only when there is an expectation of isochrony. It may be the case that expectancy
is not always isochronous or always fractal but is context dependent. For example,
whether expectancy is violated by the doubling of one IBI may depend upon whether
the altered IBI is embedded in an isochronous rhythm or in a rhythm containing
large tempo fluctuations. When fluctuations are present, expectancy may exist on a
continuum.

Strong Anticipation The term anticipating synchronization has been used to refer
to the temporal relation that emerges from time delays in appropriately coupled slave
and master dynamical systems (Voss, 2001). The slave system anticipates the master by
synchronizing with a state that is in the future of the master system. Stephen et
al. (2008) state that strong anticipation is an assertion of non-local dependence of an
organism’s current behavior on the global temporal structure within its environment.
The current states of the organism-environment system depend on initial conditions
and possibilities for future events. When anticipating, an organism acts on the past
history and the potential outcomes.
Stephen et al. (2008) argue that an anticipatory system does not need to be
explicitly early or late for any individual event. The system is anticipatory because
of its dependence on future states, not because of its state at any particular instant.
There is global coordination on a non-local time scale which can be measured by
comparing the long-range correlations of the behavior and the long-range correlations
of the system to find a relationship. The scaling exponent of the organism should
depend on the scaling parameter of the environment if strong anticipation is taking
place. They conclude that adaptation of behavior to the environment’s statistical
structure requires neither explicit statistical inference nor explicit prediction; the
adaptation is a necessary consequence of a strongly anticipatory system.
Expectancy is correlated with musical structure (Repp, 1999b; Palmer, 1997). For
example, tempo is expected to decrease at the end of phrases, and unless the tempo
increases, stays constant, or decreases a great deal, it will not violate expectation.
It is possible that this range of expectation provides a window of interest for the
listener. Having a range of expectancy over which tempo can vary may allow for mu-
sicians to satisfy some expectancy while deviating from expectation enough to create
an engaging performance that will be novel and interesting. Tonal complexity may
operate in a similar manner; too much violation and average listeners get confused,
not enough and they are bored. These ideas could partially explain the difficulty of
teaching the stylistic nuances of how to use violations of expectancy in performance
expression. Technique can be taught, but the proper way to violate expectations
(melodic, rhythmic, tempo) is not a formula that can be applied.

5.4 1/f STRUCTURE IN THE WORLD

In general, many natural signals have 1/f characteristics, and some authors hypoth-
esize that neural systems have evolved to encode these signals more efficiently than
others (Yu et al., 2005). Additionally, humans prefer stochastic compositions in
which frequency and duration are determined by a 1/f¹ noise source over compositions
generated with 1/f⁰ or 1/f² scaling (Voss and Clarke, 1978; Voss, 1989).
Thus, fractal temporal structuring of performance fluctuations may be well matched
with human perceptual mechanisms. Moreover, several researchers have suggested
a deep relationship between musical and other biological rhythms (Fraisse, 1984;
Iyer, 1988). Other biological rhythms, such as heart rate, exhibit 1/f type tem-
poral dependencies (for a review, see Glass, 2001); thus, if musical rhythm owes
its structure, in part, to other biological rhythms, 1/f fluctuations would be per-
ceived as more natural than mechanical, or isochronous, rhythms. Indeed, fractal
structure may not only enable prediction of temporal fluctuations, but may enhance
affective or aesthetic judgements for music performances (cf. Bhatara et al., 2010;
Chapin et al., 2010).
There is a continuous interaction between endogenous control mechanisms and en-
vironmental stimuli (Gibson, 1966); usually, these are impossible to separate when it
comes to physiological rhythms (Glass, 2001). The research carried out in this disser-
tation shows that in music there is structure, in both the exogenous and endogenous
processes (pulse), that persists throughout the interaction. Like many physiological
rhythms, fluctuations in periodic tapping are structured in the absence of external
stimuli, in the presence of periodic stimuli, and when there is long-term structure in
the external stimulus (i.e., music). Moreover, fractal structure in stimuli may enhance
interaction, such that endogenous processes are better able to lock on, perceive
structure, and adapt to changes. Such processes may be the mechanism behind successful
interaction with the environment and between individuals.

Appendix A

Circular Statistics

Past research has analyzed the mean and variance of timing errors (Woodrow, 1932;
Dunlap, 1910). These measures are appropriate if the stimulus is purely periodic, or
isochronous. The stimuli used for this thesis are not uniformly periodic. The times
at which subjects should tap have a varying frequency; the lengths of the inter-beat
intervals are not equal. Circular measures have the advantage of implicitly normalizing for cycle
time and errors are calculated in reference to the structure of the sequence.
Circular statistics was used to analyze the tapping data (Batschelet, 1981; Beran,
2004). It was assumed that the periodicity of the stimulus translated to the behavioral
data. A period of a cyclic event, such as a beat period, can be represented as a circle.
An event that occurs within a beat period, such as a tap, can be represented as a
point, φ̂, on a unit circle. Thus relative phase angle is the important variable. The
term relative phase refers to the time of a tap, n, relative to the phase of the time
where the subject was supposed to tap. The relative phase, φn , of each tap was
calculated as

    φ_n = (T_n − B_m) / (B_m − B_{m−1})                        (A.1)
T_n is the time of the subject's nth tap, B_m is the time of the closest beat, and B_{m−1}
is the time of the preceding beat. The arithmetic mean of multiple angles is not
meaningful. Mean relative phase requires conversion into a complex number, z_n:

    z_n = cos(2πφ_n) + i sin(2πφ_n)                            (A.2)

Then, the mean is calculated as:

    z̄ = (1/N) Σ_{n=1}^{N} z_n                                  (A.3)
The magnitude and phase of the mean are then calculated,

    r̄ = |z̄|,    0 ≤ r̄ ≤ 1                                     (A.4)

r̄ represents the magnitude, or mean vector length, and is a measure of variability
that lies between 0 and 1. φ̄ is the mean relative phase angle, which is circular on
the interval (0, 1) and was reset to the interval (−.5, .5):

    φ̄ = arg(z̄)/(2π),    −.5 ≤ φ̄ ≤ .5                          (A.5)

This corrects for problems that occur when taps stray far from the intended beat.
r̄ is sometimes used directly as a measure of synchronization strength (Goldberg
and Brown, 1969). However, for error bars one wants a measure which, like standard
deviation, goes to zero as variability diminishes. The angular deviation, s̄, does this.
    s̄ = √(2(1 − r̄))                                           (A.6)

s̄ is a measure of dispersion about the mean.
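The computations in Eqs. A.1-A.6 can be collected into a short routine. The following is an illustrative NumPy sketch, not the dissertation's code; the function and variable names are mine, and referring taps near the first beat to the first inter-beat interval is a simplifying assumption:

```python
import numpy as np

def circular_stats(tap_times, beat_times):
    """Relative phase and circular summary statistics (Eqs. A.1-A.6)."""
    tap_times = np.asarray(tap_times, dtype=float)
    beat_times = np.asarray(beat_times, dtype=float)

    # Eq. A.1: phase of each tap relative to the closest beat, normalized
    # by the preceding inter-beat interval.
    m = np.abs(beat_times[None, :] - tap_times[:, None]).argmin(axis=1)
    m = np.clip(m, 1, len(beat_times) - 1)
    phi = (tap_times - beat_times[m]) / (beat_times[m] - beat_times[m - 1])

    # Eqs. A.2-A.3: represent each phase on the unit circle and average.
    z_bar = np.exp(2j * np.pi * phi).mean()

    r_bar = np.abs(z_bar)                     # Eq. A.4: mean vector length
    phi_bar = np.angle(z_bar) / (2 * np.pi)   # Eq. A.5: mean phase in (-.5, .5]
    s_bar = np.sqrt(2 * (1 - r_bar))          # Eq. A.6: angular deviation
    return phi, r_bar, phi_bar, s_bar
```

For taps that land exactly on the beats, r̄ = 1 and s̄ = 0; as taps scatter around the beat, r̄ shrinks and s̄ grows like a standard deviation.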

Appendix B

Fractal Analysis

B.1 POWER SPECTRAL DENSITY

Power Spectral Density (PSD) analysis is widely used for computing the fractal prop-
erties of time series. The power spectrum describes the fluctuations of the time series
at different frequencies, and for fractal series it has an inverse power-law form. The
correlation function for such a process increases algebraically with time, as t^α. Its
Fourier transform, the power spectral density, depends inversely on the frequency,
as 1/f^(α+1); this is also called an inverse power-law spectrum. A power law graphs
as a straight line on a log-log plot, with a slope determined by the power-law index.
Power-law distributions are known to have statistical self-similarity: the distribution
density follows a functional scaling law (a renormalization-group property). In other
words, the probability density function (PDF) is not curved; it is a straight line on a
plot of log(PDF(x)) versus log(x).
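In practice, the power-law index is estimated by fitting a straight line to the log-log periodogram. A minimal sketch (assuming a stationary input series; the function name is mine, not from the dissertation):

```python
import numpy as np

def spectral_exponent(x):
    """Estimate beta in PSD(f) ~ 1/f**beta from the log-log periodogram slope."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    psd = np.abs(np.fft.rfft(x)) ** 2      # periodogram, up to a constant factor
    f = np.fft.rfftfreq(len(x))
    # Fit log(PSD) against log(f), skipping the DC bin at f = 0.
    slope, _ = np.polyfit(np.log(f[1:]), np.log(psd[1:]), 1)
    return -slope                          # beta > 0 for 1/f-type noise
```

White noise gives β near 0 (a flat spectrum), while a random walk gives β near 2.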

B.2 HURST’S RESCALED RANGE ANALYSIS

The Hurst exponent, H, was developed when Hurst (1951) was trying to define an
empirical descriptor of temporal signals in natural phenomena. He based this on
the statistical assessment of many observations as a civil, geographical, and ecolog-
ical engineer while working with the Nile River. Hurst was trying to determine the
necessary capacity of the Aswan dam. He found that there was correlation between
successive years, thus requiring the dam to be much larger than if the annual rainfalls
and river flows were random. While comparing random signals to biased ones he de-
veloped an approach to examining accumulations or integrals of naturally fluctuating
events (Hurst et al., 1965). The Hurst exponent gives a measure of smoothness of a
fractal object or time series. H can assume any value between zero and one. A low
H indicates a high degree of roughness and a high H indicates smoothness.
Hurst’s Rescaled Range Analysis (R/S analysis) is the range of cumulative devi-
ations normalized by the standard deviation S as a function of length of the signal.
What we are looking at is how the range of cumulative fluctuations depends on the
length of the subset of data analyzed. Start with the whole set of observations F (τ )
or data points that cover an entire duration τ . First, calculate the mean over the
whole of the data collected
    F̄(τ) = (1/L) Σ_{i=1}^{L} F(t_i)                            (B.1)
F(t_i) represents individual data points, and L represents the number of values that
correspond to a particular t. t is a discrete time point with 0 ≤ t ≤ τ; it gives the
duration of the subset, or interval, and its size is varied over a large range. Let X(t)
be the accumulated departure of the tempo F(t_i) from the mean F̄(τ). This gives a
cumulative total at each time point:
    X(t, τ) = Σ_{i=1}^{t} [F(t_i) − F̄(τ)]                      (B.2)

Now, find the range R(τ ) by locating the maximum value Xmax and the minimum
value Xmin of X(t, τ ) and take the difference

R(τ ) = Xmax − Xmin (B.3)

Calculate the standard deviation, S (the square root of the variance), of the values
F(t_i) over the period τ, during which the local mean is F̄(τ):
    S = { (1/τ) Σ_{i=1}^{τ} [F(t_i) − F̄(τ)]² }^{1/2}           (B.4)

Calculate R/S = R(τ )/S(τ ) and plot the values on a log-log plot.
In the first stage, N equals the total number of points in the time series. In sub-
sequent stages, N covers only a fraction of the values in the series. N is varied
according to K, where K = log₂(total number of points) is the number of subset
lengths for which these first steps are repeated, and N is the number of values in
each subset. This yields an R/S value for each of the K subset lengths. For our
data, K = 9 and N = [2, 4, 8, 16, 32, 64, 128, 256, 512]. These results are plotted
for each N.
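The procedure above can be sketched in a few lines. This is an illustrative NumPy implementation of R/S analysis, not the dissertation's code; the non-overlapping-window scheme and least-squares fit are my assumptions about details the text leaves open:

```python
import numpy as np

def rescaled_range(F, n):
    """Mean R/S over non-overlapping windows of length n (Eqs. B.1-B.4)."""
    rs = []
    for start in range(0, len(F) - n + 1, n):
        w = F[start:start + n]
        X = np.cumsum(w - w.mean())   # Eq. B.2: accumulated departures from the mean
        R = X.max() - X.min()         # Eq. B.3: range of the cumulative deviations
        S = w.std()                   # Eq. B.4: standard deviation of the window
        if S > 0:
            rs.append(R / S)
    return np.mean(rs)

def hurst_rs(F, sizes=(2, 4, 8, 16, 32, 64, 128, 256, 512)):
    """Estimate H as the slope of log(R/S) against log(n)."""
    F = np.asarray(F, dtype=float)
    ns = [n for n in sizes if n <= len(F)]
    rs = [rescaled_range(F, n) for n in ns]
    H, _ = np.polyfit(np.log(ns), np.log(rs), 1)
    return H
```

Small-sample R/S estimates are known to be biased upward for short series, so white noise typically yields H somewhat above .5; persistent series yield clearly larger values.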

Appendix C

Synthesis of 1/f^β noise

Log-normally distributed 1/f noise (used in the Synthetic condition) was synthesized
to have the same fractal characteristics as the tempo fluctuations from Chen's piano
performance of Albéniz's Triana from Iberia II. The synthesized time series had the
same strength of long-range persistence (spectral exponent β = .48; Hurst exponent
H = .76), and was therefore also stationary and persistent; the same length
(N = 784); and the same parameters for the lognormal distribution: mean value
µ = .1588 and coefficient of variation cv = σ/µ = .334. The following describes how
the self-affine, log-normally distributed 1/f noise was created (see Figure 4.1).

1. A time series of Inter-Beat Intervals (IBIs) was extracted from a piano per-
formance (Human condition) and shuffled until it had a spectral slope of zero
(β = 0.00). This resulted in a random time series, yn , n = 1, 2, 3, . . . , N , with
the same empirical distribution. This was used for the Shuffled condition.

2. Then the Shuffled time series was transformed to have a standard Gaussian dis-
tribution using the relation

    y¹_n = [log(y_n) − µ(log(y_n))] / σ(log(y_n))              (C.1)

This yields a time series, y¹_n, with a Gaussian distribution. The total time interval,
T, is divided into N equal intervals of length δ = T/N. The units of δ are those of
T; N is dimensionless.

3. A Discrete Fourier Transform was taken of the time series, yn1 . The Fourier
coefficients are given by

    Y_m = δ Σ_{n=1}^{N} y¹_n e^{2πinm/N},   m = 1, 2, 3, …, N  (C.2)

This transform maps N real numbers (y¹_n) into N complex numbers (Y_m). The
Fourier spectrum of white noise is flat, β = 0: except for statistical scatter, the
amplitudes |Y_m| are equal.

4. The resulting Fourier coefficients, Ym , were filtered using the relation

    Y′_m = (m/N)^{−β/2} Y_m                                    (C.3)

The power β/2 is used because the power spectral density is proportional to
the amplitude squared. The amplitudes of the small-m coefficients correspond
to long wavelengths and low frequencies; the large-m coefficients correspond to
short wavelengths and high frequencies.

5. An Inverse Discrete Fourier Transform (IDFT) was taken of the filtered Fourier
coefficients. The sequence of points is given by

   X_n = (1/(Nδ)) Σ_{m=1}^{N} Y'_m e^{−2πinm/N},   n = 1, 2, 3, . . . , N        (C.4)

These points constitute the fractional Gaussian noise.

6. Many naturally occurring time series take only positive values, which results in
a non-Gaussian distribution (e.g., the volumetric flow of a river; Turcotte, 1997;
Malamud and Turcotte, 1999). The original performance data exhibited a lognormal
distribution, so the fractional Gaussian noise sequence was converted back into a
fractional lognormal noise sequence by inverting the standardization of Equation
C.1:

   Z_n = exp( X_n √(log(1 + c_v²)) + log(µ/√(1 + c_v²)) ),   n = 1, 2, 3, . . . , N        (C.5)

where log(µ/√(1 + c_v²)) and √(log(1 + c_v²)) are the mean and standard deviation
of log(Z_n), chosen so that Z_n has mean µ and coefficient of variation c_v.

The resulting time series is a realization of lognormally distributed 1/f^β noise whose
spectral exponent β sets the strength of long-range persistence. This was used to
create the tempo fluctuations in the Synthetic condition of Experiments 4-7.
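The six steps above can be condensed into a short numerical sketch. This is an illustrative Python/NumPy rendering, not the original analysis code (the lab's tools were MATLAB-based); the input series here is a synthetic lognormal sample standing in for the shuffled IBI data, the filtered series is re-standardized before Equation C.5 so the target µ and c_v are matched exactly (a step not stated explicitly in the text), and because NumPy's FFT orders frequencies symmetrically, the (m/N)^{−β/2} filter is applied to |m| with the DC coefficient left unscaled.

```python
import numpy as np

def synthesize_lognormal_1f(y, beta, delta=1.0, seed=0):
    """Sketch of the Appendix C procedure: turn a positive-valued series y
    into lognormally distributed 1/f^beta noise with the same mean and
    coefficient of variation as y."""
    rng = np.random.default_rng(seed)
    N = len(y)
    mu = y.mean()
    cv = y.std() / y.mean()

    # Step 1: shuffle to destroy temporal correlations (spectral slope ~ 0).
    y_shuf = rng.permutation(y)

    # Step 2 (Eq. C.1): standardize the log to obtain a Gaussian series.
    log_y = np.log(y_shuf)
    y1 = (log_y - log_y.mean()) / log_y.std()

    # Step 3 (Eq. C.2): Discrete Fourier Transform.
    Y = delta * np.fft.fft(y1)

    # Step 4 (Eq. C.3): filter coefficients by (m/N)^(-beta/2).
    # NumPy orders frequencies symmetrically, so |m| is used here.
    m = np.abs(np.fft.fftfreq(N, d=1.0 / N))
    m[0] = N  # DC term: (N/N)^(-beta/2) = 1, i.e., left unscaled
    Y_f = Y * (m / N) ** (-beta / 2.0)

    # Step 5 (Eq. C.4): inverse DFT gives the fractional Gaussian noise.
    x = np.real(np.fft.ifft(Y_f)) / delta

    # Re-standardize so Eq. C.5 hits the target mu and cv exactly
    # (an added convenience, not stated explicitly in the text).
    x = (x - x.mean()) / x.std()

    # Step 6 (Eq. C.5): invert Eq. C.1 to recover a lognormal series.
    sigma_l = np.sqrt(np.log(1.0 + cv**2))
    mu_l = np.log(mu / np.sqrt(1.0 + cv**2))
    return np.exp(sigma_l * x + mu_l)

# Illustrative stand-in for the 784 inter-beat intervals (not the actual
# Triana performance data).
rng = np.random.default_rng(1)
ibis = rng.lognormal(mean=np.log(0.1588), sigma=0.33, size=784)
z = synthesize_lognormal_1f(ibis, beta=0.48)
```

Because the scaling factor depends only on |m|, the Hermitian symmetry of the spectrum is preserved and the inverse transform is real up to rounding error.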

Appendix D

Stimuli: Experiment 1

130
131
132
133
134
135
136
137
138
139
140
141
Bibliography

(2010) Phrase. The Oxford Dictionary of Music.

Aschersleben G (2002) Temporal control of movements in sensorimotor synchronization. Brain and Cognition 48:66-79.

Bassingthwaighte J, Liebovitch L, West B (1994) Fractal Physiology. Oxford University Press, New York.

Batschelet E (1981) Circular Statistics in Biology. Academic Press, London.

Bengtsson I, Gabrielsson A (1983) Analysis and synthesis of musical rhythm. In Sundberg J, editor, Studies of music performance, Vol. 39, pp. 27-60. Publications issued by the Royal Swedish Academy of Music.

Beran J (1994) Statistics for Long-Memory Processes. Chapman & Hall/CRC, New York.

Beran J (2004) Statistics in Musicology. Chapman & Hall/CRC, Boca Raton, FL.

Bhatara A, Duan L, Tirovolas A, Levitin D (2010) Musical expression and emotion: Influences of temporal and dynamic variation. Manuscript submitted for publication.

Bregman A, Pinker S (1978) Auditory streaming and the building of timbre. Canadian Journal of Psychology 32:19-31.

Brothers H (2007) Structural scaling in Bach's Cello Suite no. 3. Fractals 15:89-95.

Chapin H, Large E, Jantzen K, Kelso J, Steinberg F (2010) Dynamic emotional and neural responses to music depend on performance expression and listener experience. Manuscript submitted for publication.

Chen Y, Ding M, Kelso JAS (1997) Long memory processes (1/f type) in human coordination. Physical Review Letters 79:4501-4504.

Chen Y, Ding M, Kelso J (2001) Origins of timing errors in human sensorimotor coordination. Journal of Motor Behavior 33:3-8.
Chen Y, Repp B, Patel A (2002) Spectral decomposition of variability in synchronization and continuation tapping: Comparisons between auditory and visual pacing and feedback conditions. Human Movement Science 21:515-532.

Clarke E (1985) Structure and expression in rhythmic performance. In Howell P, Cross I, West R, editors, Musical Structure and Cognition, pp. 209-236. Academic Press, London.

Clarke E (1987) Categorical rhythm perception: An ecological perspective. In Gabrielsson A, editor, Action and Perception in Rhythm and Music, Vol. 55, pp. 19-33. Royal Swedish Academy of Music.

Clarke E (1987) Levels of structure in the organization of musical time. Contemporary Music Review 2:211-238.

Clarke E (1993) Imitating and evaluating real and transformed musical performances. Music Perception 10:317-341.

Clayton M, Sager R, Will U (2004) In time with the music: The concept of entrainment and its significance for ethnomusicology. ESEM CounterPoint 1:1-82.

Cooper G, Meyer L (1960) The rhythmic structure of music. The University of Chicago Press, Chicago.

deGuzman G, Kelso J (1991) Multifrequency behavioral patterns and the phase attractive circle map. Biological Cybernetics 64:485-495.

Delignières D, Lemoine L, Torre K (2004) Time intervals production in tapping and oscillatory motion. Human Movement Science 23:87-103.

Delignières D, Ramdani S, Lemoine L, Torre K, Fortes M, Ninot G (2006) Fractal analyses for "short" time series: A re-assessment of classical methods. Journal of Mathematical Psychology 50:525-544.

Delignières D, Torre K, Lemoine L (2008) Fractal models for event-based and dynamical timers. Acta Psychologica 127:382-397.

Dixon S, Goebl W (2002) Pinpointing the beat: Tapping to expressive performances. In 7th International Conference on Music Perception and Cognition, pp. 614-620, Adelaide, Australia. ICMPC7, Causal Productions.

Dixon S, Goebl W, Cambouropoulos E (2006) Perceptual smoothness of tempo in expressively performed music. Music Perception 23:195-214.

Drake C, Penel A, Bigand E (2000) Tapping in time with mechanically and expressively performed music. Music Perception 18:1-24.
Dubois D (2003) Mathematical foundations of discrete and functional systems with strong and weak anticipations. Lecture Notes in Computer Science 2684:110-132.

Dunlap K (1910) Reactions to rhythmic stimuli, with attempt to synchronize. Psychological Review 17:399-416.

Eke A, Hermán P, Bassingthwaighte J, Raymond G, Percival D, Cannon M, Balla I, Ikrényi C (2000) Physiological time series: Distinguishing fractal noises from motions. Pflügers Archiv: European Journal of Physiology 439:403-415.

Eke A, Herman P, Kocsis L, Kozak LR (2002) Fractal characterization of complexity in temporal physiological signals. Physiological Measurement 23:R1-R38.

Epstein D (1995) Shaping Time: Music, the Brain, and Performance. Schirmer Books, London.

Feder J (1988) Fractals. Plenum Press, New York.

Fraisse P (1984) Perception and estimation of time. Annual Review of Psychology 35:1-36.

Gabrielsson A (1995) Expressive intention and performance. In Steinberg R, editor, Music and the Mind Machine, pp. 35-47. Springer-Verlag, Heidelberg.

Gibson J (1966) The senses considered as perceptual systems. Houghton Mifflin, Boston, MA.

Gilden D (2001) Cognitive emissions of 1/f noise. Psychological Review 108:33-56.

Gilden D, Thornton T, Mallon M (1995) 1/f noise in human cognition. Science 267:1837-1839.

Glass L (2001) Synchronization and rhythmic processes in physiology. Nature 410:277-284.

Goldberg J, Brown P (1969) Responses of binaural neurons of dog superior olivary complex to dichotic tonal stimuli: Some physiological mechanisms of sound localization. Journal of Neurophysiology 32:613-636.

Grigolini P, Aquino G, Bologna M, Luković M, West B (2009) A theory of 1/f noise in human cognition. Physica A 388:4192-4204.

Henderson M (1936) Rhythmic organization in artistic piano performance. In Seashore C, editor, Objective analysis of musical performance, Vol. 4, pp. 281-305. University of Iowa Press, Iowa City.

Honing H (2006) Computational modeling of music cognition: A case study on model selection. Music Perception 23:365-376.
Hsü K, Hsü A (1990) Fractal geometry of music. Proceedings of the National Academy of Sciences USA 87:938-941.

Hsü K, Hsü A (1991) Self-similarity of the "1/f noise" called music. Proceedings of the National Academy of Sciences USA 88:3507-3509.

Hurst H (1951) Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers 116:770-808.

Hurst H, Black R, Simaika Y (1965) Long-Term Storage: An Experimental Study. Constable, London.

Ivry R, Schlerf J (2008) Dedicated and intrinsic models of time perception. Trends in Cognitive Sciences 12:273-280.

Iyer VS (1998) Microstructures of feel, macrostructures of sound: Embodied cognition in West African and African-American musics. Ph.D. diss., University of California, Berkeley.

Jones M, Boltz M (1989) Dynamic attending and responses to time. Psychological Review 96:459-491.

Klatt DH (1975) Vowel lengthening is syntactically determined in a connected discourse. Journal of Phonetics 3:129-140.

Large E (1992) Judgements of similarity for musical sequences. Unpublished manuscript.

Large E (2000) On synchronizing movements to music. Human Movement Science 19:527-566.

Large E (2008) Resonating to musical rhythm: Theory and experiment. In Grondin S, editor, The Psychology of Time, pp. 189-231. Emerald Group Publishing, Ltd., Bingley, UK.

Large E, Fink P, Kelso J (2002) Tracking simple and complex sequences. Psychological Research 66:3-17.

Large E, Jones M (1999) The dynamics of attending: How people track time-varying events. Psychological Review 106:119-159.

Large E, Kolen J (1994) Resonance and the perception of musical meter. Connection Science 6:177-208.

Large E, Palmer C (2002) Perceiving temporal regularity in music. Cognitive Science 26:1-37.
Large E, Rankin S (2007) Matching performance to notation. In Eerola T, Toiviainen P, editors, MIDI Toolbox: MATLAB Tools for Music Research. University of Jyväskylä, Kopijyvä, Jyväskylä, Finland. Available at http://www.jyu.fi/musica/miditoolbox/.

Lemoine L, Torre K, Delignières D (2006) Testing for the presence of 1/f noise in continuation tapping data. Canadian Journal of Experimental Psychology 60:247-257.

Lerdahl F, Jackendoff R (1983) A generative theory of tonal music. MIT Press, Cambridge.

London J (2004) Hearing in Time: Psychological aspects of musical meter. Oxford University Press, Oxford.

Madison G (2004) Fractal modeling of human isochronous serial interval production. Biological Cybernetics 90:105-112.

Malamud B, Turcotte D (1999) Self-affine time series: I. Generation and analysis. Advances in Geophysics 40:1-90.

Mandelbrot B (1977) The fractal geometry of nature. W. H. Freeman and Company, New York.

Mandelbrot B, Wallis J (1969) Robustness of the rescaled range R/S in the measurement of noncyclic long-run statistical dependence. Water Resources Research 5:967-988.

Mates J (1994) A model of synchronization of motor acts to a stimulus sequence. I. Timing and error corrections. Perception and Psychophysics 52:691-704.

Mates J, Muller U, Radil T, Poppel E (1994) Temporal integration in sensorimotor synchronization. Journal of Cognitive Neuroscience 6:332-340.

McAuley J (1995) Perception of time as phase: Toward an adaptive-oscillator model of rhythmic pattern processing. Ph.D. diss., Indiana University, Bloomington.

McNeill WH (1995) Keeping Together in Time: Dance and Drill in Human History. Harvard University Press, Cambridge, MA.

Michon J (1967) Timing in temporal tracking. Van Gorcum, Assen, NL.

Nettl B (2000) An ethnomusicologist contemplates universals in musical sound and musical culture. In Wallin N, Merker B, Brown S, editors, The Origins of Music, pp. 463-472. MIT Press, Cambridge, MA.

Palmer C (1989) Mapping musical thought to musical performance. Journal of Experimental Psychology: Human Perception and Performance 15:331-346.
Palmer C (1992) The role of interpretive preferences in music performance. In Jones M, Holleran S, editors, Cognitive bases of musical communication, pp. 249-262. American Psychological Association, Washington, DC.

Palmer C (1996) On the assignment of structure in music performance. Music Perception 14:23-56.

Palmer C (1997) Music performance. Annual Review of Psychology 48:115-138.

Palmer C, van de Sande C (1995) Range of planning in music performance. Journal of Experimental Psychology: Human Perception and Performance 21:947-962.

Penel A, Drake C (1998) Sources of timing variations in music performance: A psychological segmentation model. Psychological Research 61:12-32.

Pikovsky A, Rosenblum M, Kurths J (2001) Synchronization: A universal concept in nonlinear sciences. Cambridge University Press, Cambridge, England.

Pressing J, Jolley-Rogers G (1997) Spectral properties of human cognition and skill. Biological Cybernetics 76:339-347.

Rangarajan G, Ding M (2000) Integrated approach to the assessment of long-range correlation in time series data. Physical Review E 61:4991-5001.

Repp B (1992a) A constraint on the expressive timing of a melodic gesture: Evidence from performance and aesthetic judgment. Music Perception 10:221-242.

Repp B (1992b) Diversity and commonality in music performance: An analysis of timing microstructure in Schumann's "Traumerei". Journal of the Acoustical Society of America 92:2546-2568.

Repp B (1995) Expressive timing in Schumann's "Traumerei": An analysis of performances by graduate student pianists. Journal of the Acoustical Society of America 98:2413-2427.

Repp B (1998) Variations on a theme by Chopin: Relations between perception and production of timing in music. Journal of Experimental Psychology: Human Perception and Performance 24:791-811.

Repp B (1999a) Control of expressive and metronomic timing in pianists. Journal of Motor Behavior 31:145-164.

Repp B (1999b) Detecting deviations from metronomic timing in music: Effects of perceptual structure on the mental timekeeper. Perception and Psychophysics 61:529-548.
Repp B (1999c) Relationships between performance timing, perception of timing perturbations, and perceptual-motor synchronisation in two Chopin preludes. Australian Journal of Psychology 51:188-203.

Repp B (2001a) Effects of music perception and imagery on sensorimotor synchronization with complex timing patterns. In Zatorre R, Peretz I, editors, The Biological Foundations of Music, Vol. 930 of Annals of the New York Academy of Sciences, pp. 409-411.

Repp B (2001b) Phase correction, phase resetting, and phase shifts after subliminal timing perturbations in sensorimotor synchronization. Journal of Experimental Psychology: Human Perception and Performance 27:600-621.

Repp B (2001c) Processes underlying adaptation to tempo changes in sensorimotor synchronization. Human Movement Science 20:277-312.

Repp B (2002a) The embodiment of musical structure: Effects of musical context on sensorimotor synchronization with complex timing patterns. In Prinz W, Hommel B, editors, Common Mechanisms in Perception and Action: Attention and Performance XIX, pp. 245-265. Oxford University Press, Oxford.

Repp B (2002b) Phase correction following a perturbation in sensorimotor synchronization depends on sensory information. Journal of Motor Behavior 34:291-298.

Repp B (2002c) Phase correction in sensorimotor synchronization: Nonlinearities in voluntary and involuntary responses to perturbations. Human Movement Science 21:1-37.

Repp B (2003) Phase attraction in sensorimotor synchronization with auditory sequences: Effects of single and periodic distractors on synchronization accuracy. Journal of Experimental Psychology: Human Perception and Performance 29:290-309.

Repp B (2005) Sensorimotor synchronization: A review of the tapping literature. Psychonomic Bulletin and Review 12:969-992.

Repp B (2008a) Metrical subdivision results in subjective slowing of the beat. Music Perception 26:19-39.

Repp B (2008b) Multiple temporal references in sensorimotor synchronization with metrical auditory sequences. Psychological Research 72:79-98.

Rosen R (1985) Anticipatory Systems. Pergamon Press, New York.

Schulze HH, Cordes A, Vorberg D (2005) Keeping synchrony while tempo changes: Accelerando and ritardando. Music Perception 22:461-477.
Shaffer L, Todd N (1987) The interpretive component in musical time. In Gabrielsson A, editor, Action and Perception in Rhythm and Music, pp. 139-152. The Royal Swedish Academy of Music, Stockholm.

Sloboda J (1985) Expressive skill in two pianists: Metrical communication in real and simulated performances. Canadian Journal of Psychology 39:273-293.

Sloboda J, Juslin P (2001) Psychological perspectives on music and emotion. In Juslin P, Sloboda J, editors, Music and emotion: Theory and research, pp. 71-104. Oxford University Press, Oxford.

Snyder J, Krumhansl C (2001) Tapping to ragtime: Cues to pulse finding. Music Perception 18:455-489.

Stephen D, Stepp N, Dixon J, Turvey M (2008) Strong anticipation: Sensitivity to long-range correlations in synchronization behavior. Physica A 387:5271-5278.

Stepp N, Turvey M (2009) On strong anticipation. Cognitive Systems Research 11:148-164.

Stevens L (1886) On the time sense. Mind 11:393-404.

Thaut M, Stephan K, Wunderlich G, Schicks W, Tellmann L, Herzog H, McIntosh G, Seitz R, Hömberg V (2009) Distinct cortico-cerebellar activations in rhythmic auditory motor synchronization. Cortex 45:44-53.

Thaut M, Tian B, Azimi-Sadjadi M (1998) Rhythmic finger tapping to cosine-wave modulated metronome sequences: Evidence of subliminal entrainment. Human Movement Science 17:839-863.

Todd N (1985) A model of expressive timing in tonal music. Music Perception 3:33-58.

Toiviainen P, Snyder J (2003) Tapping to Bach: Resonance-based modeling of pulse. Music Perception 21:43-80.

Torre K, Delignières D (2008) Unraveling the finding of 1/f^β noise in self-paced and synchronized tapping: A unifying mechanistic model. Biological Cybernetics 99:159-170.

Turcotte DL (1997) Fractals and Chaos in Geology and Geophysics. Cambridge University Press, Cambridge.

van Hateren J (1997) Processing of natural time series of intensities by the visual system of the blowfly. Vision Research 37:3407-3416.
van Noorden L, Moelants D (1999) Resonance in the perception of musical pulse. Journal of New Music Research 28:43-66.

van Orden G, Holden J, Turvey M (2003) Self-organization of cognitive performance. Journal of Experimental Psychology: General 132:331-350.

Vorberg D, Hambuch R (1978) On the temporal control of rhythmic performance. In Requin J, editor, Attention and Performance VII, pp. 535-555. Erlbaum, Hillsdale, NJ.

Vorberg D, Schulze H (2002) Linear phase-correction in synchronization: Predictions, parameter estimation, and simulations. Journal of Mathematical Psychology 46:56-87.

Vorberg D, Wing A (1996) Modeling variability and dependence in timing. In Heuer H, Keele S, editors, Handbook of Perception and Action: Motor Skills, Vol. 2, pp. 181-262. Academic Press, London.

Voss H (2001) Dynamic long-term anticipation of chaotic states. Physical Review Letters 87:014102.

Voss R (1989) Random fractals: Self-affinity in noise, music, mountains, and clouds. Physica D 38:362-371.

Voss R, Clarke J (1975) "1/f noise" in music and speech. Nature 258:317-318.

Voss R, Clarke J (1978) 1/f noise in music: Music from 1/f noise. Journal of the Acoustical Society of America 63:258-263.

Voss RF (1988) Fractals in nature: From characterization to simulation. In Peitgen HO, Saupe D, editors, The Science of Fractal Images, pp. 21-70. Springer-Verlag, New York.

West B, Shlesinger M (1989) On the ubiquity of 1/f noise. International Journal of Modern Physics B 3:795-819.

West B, Shlesinger M (1990) The noise in natural phenomena. American Scientist 78:40-45.

Wing A, Kristofferson A (1973) Response delays and the timing of discrete motor responses. Perception and Psychophysics 14:5-12.

Wohlschlager A, Koch R (2000) Synchronization error: An error in time perception. In Desain P, Windsor L, editors, Rhythm perception and production, pp. 115-127. Swets & Zeitlinger, Lisse, The Netherlands.
Woodrow H (1932) The effects of rate of sequence upon the accuracy of synchronization. Journal of Experimental Psychology 15:357-379.

Yeston M (1976) The stratification of musical rhythm. Yale University Press, New Haven, CT.

Yu Y, Romero R, Lee TS (2005) Preference of sensory neural coding for 1/f signals. Physical Review Letters 94:108103.

Zuckerkandl V (1956) Sound and Symbol: Music and the External World. Princeton University Press, Princeton.