REFERENCES
Linked references are available on JSTOR for this article:
http://www.jstor.org/stable/25653513?seq=1&cid=pdf-reference#references_tab_contents
The MIT Press is collaborating with JSTOR to digitize, preserve and extend access to Computer Music
Journal
This content downloaded from 67.184.195.145 on Fri, 14 Jul 2017 00:14:12 UTC
All use subject to http://about.jstor.org/terms
Description-Based Design of Melodies

François Pachet
Sony CSL Paris
6 Rue Amyot
75005 Paris, France
pachet@csl.sony.fr
Computer Music Journal, 33:4, pp. 56-68, Winter 2009
© 2009 Massachusetts Institute of Technology.

Most current approaches in computer-aided composition (CAC) are based on an explicit construction paradigm: users build musical objects by assembling components using various construction tools. Virtually all technologies developed by computer science and artificial intelligence have been applied to CAC, thereby progressively increasing the sophistication of music-composition tools. Composers can choose between many programming paradigms to express the compositions they "have in mind," from the now-standard time-lined sequencers (e.g., Steinberg's Cubase) to advanced programming languages or libraries (e.g., OpenMusic; Assayag et al. 1999). Although these explicit constructions do benefit from abstractions of increasing sophistication (objects, constraints, rules, flow diagrams, etc.), CAC always remains based on an explicit construction paradigm: Users must give the computer a clear and complete definition of their material. This approach has the enormous advantage of letting users control all dimensions of their work. However, it also requires from users a fine understanding of the technicalities at work. For instance, composing music with object-orientation requires the understanding of objects, classes, and message passing. Using constraints requires the understanding of constraint satisfaction, filtering, and of the basic constraint libraries, etc.

An interesting attempt to escape these technical requirements is the Elody system (Letz, Orlarey, and Fober 1998) in which the user can create arbitrary abstractions by selecting musical material together with a specific dimension of music (e.g., pitch or rhythm). These abstractions can then be applied to other musical material to create yet more complex objects. But here again, the user must mentally maintain a model of the abstraction algorithm at work, a task that can be particularly difficult as the complexity of the composition grows. Other approaches propose construction tools that do not require explicit programming skills. For instance, Hamanaka, Hirata, and Tojo (2008) propose a morphing metaphor in which melodies can be created as interpolations between two given melodies. But this approach is limited to the context of the generative theory of tonal music (Lerdahl and Jackendoff 1983), and is not extensible to arbitrary categories, as we will show later.

We propose here a novel approach to music composition called description-based design that attempts to remove the need for the user to understand anything technical related to the target objects. In this article, we focus on the creation of simple musical objects (unaccompanied melodies) as a working example, but our paradigm is general and can be applied to many other fields of design.

First, we introduce the general description-based design mechanism, and then we describe the type of melodies we target. Finally, we describe experiments demonstrating the functionality of the algorithm and its potential.

Description-Based Design

Description-based design stems from the paradigm of Reflexive Interaction (Pachet 2008). The idea is to let users manipulate images of themselves, produced by an interactive machine-learning component. The creation of objects (musical objects in our case) is performed as a side-effect of the interaction, as opposed to traditional interactive systems in which target objects are produced up-front as the result of a controlled process. A typical example of reflexive interactive system is the Continuator (Pachet 2004), a system that continuously learns stylistic information coming from the user's performance and generates music "in the same style" in the form of real-time answers to, or continuations of, the music performed by the user. The Continuator was shown to trigger spectacular interactions with professional jazz musicians (Pachet 2004) as well as with children (Addessi and Pachet 2005) involved in free, unstructured improvisation. However, this type of interaction shows limitations when users want to structure their production; in other words,
when they want to shift from improvisation to composition.

Description-based design adds a further component to the Continuator-like interaction by introducing an explicit linguistic construct, precisely aimed at addressing this "structure" problem inherent in free-form improvisation systems. The idea is as follows. In the first phase, the system generates objects (melodies, in our case) randomly or according to specific generators. The user can then freely tag these objects with words, "jumpy," "flat," "tonal," "dissonant," etc. Each object can be tagged by several words, or by none. In the second phase, the user selects a starting object (say, a flat melody), and one of the previously assigned tags (say "jumpy"). The user can then ask the system to produce a new object that will be "close" to the selected one, but "more jumpy" or "less jumpy." More generally, the user can reuse any of the tags to modify a given object in the semantic direction of the tag. The system will then attempt to generate a new object that optimally satisfies two conditions: being "close" to the starting object, and increasing (or decreasing) the probability of being of a certain tag. The new object (in our case, a progressively "jumpier" melody) is then added to the palette of objects created by the system. It can, in turn, be tagged or refined at will. The design activity is therefore strictly restricted to tagging objects and creating variations using these tags. The implementation of this scheme requires a combination of components that we briefly describe here.

Object Generator

We call the objects that the user wants to produce target objects. In our context, these objects are defined by a set of technical features that describe these objects and a generator that produces sets of objects. The generator is a program that should be able to randomly generate every possible object of interest. The choice of the feature set will influence the capacity of the system to faithfully learn user tags, but identifying a reasonable feature set is usually straightforward. These two ingredients are therefore easy to design. We give the details for the particular case of unaccompanied melodies subsequently.

A Machine-Learning Tagging System

The second component is a tagging system, associated with a machine-learning algorithm that learns a mapping between user tags and the feature set. In our case, this machine-learning component is a support vector machine (SVM). SVMs are automatic classifiers routinely used in many data-mining applications (Burges 1998). In the training phase, the SVM builds an optimal hyperplane that separates two classes, so as to maximize the so-called "margin" between the classes. This margin can then be used to classify new points automatically using a geometrical distance as illustrated subsequently. SVMs have been used extensively to learn music information, in particular in the audio domain, typically using spectral features (Mandel, Poliner, and Ellis 2006). In our case, we use them to learn classes from symbolic features, as described herein.

The tags are entered by the users as free text. To each tag is associated an SVM classifier that is retrained each time a user adds or removes a tag for an object. To avoid undesirable effects such as overfitting, a feature-selection algorithm is applied prior to the training phase. We use the IGR (information gain ratio) algorithm (Quinlan 1993) by which only a limited number of features are kept, maximizing the "information gain" of each retained feature. Once trained, this classifier can compute a probability for that tag to be true for any object. Similarly, the classifier computes the probabilities, for a selected melody, of all learned tags, according to the current state of the system in the session. More precisely, for a given tag, we use Boolean classifiers trained on positive and negative examples. Positive examples are all the objects having been tagged by the user with this tag. Negative examples are chosen automatically by the system, according to various heuristics. (Notably, it chooses approximately the same number of negative examples as positive ones, and it chooses only objects that have been tagged, obviously with tags other than the one under consideration.)

Of course, the accuracy of this prediction depends on many factors, including the feature set, but also the number of examples (i.e., objects having been tagged by the user). In the experiment described in this article, we show empirically that the predictions
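The choice of training examples for one tag's Boolean classifier, as described in the tagging section (positives are the objects carrying the tag; negatives are roughly as many objects carrying only other tags), might be sketched as follows. This is a minimal illustration, not the system's implementation: the function name `build_training_set`, the dictionary layout, and the use of uniform random sampling for the negatives are assumptions, since the article only says that "various heuristics" are used.

```python
import random

def build_training_set(tagged_objects, tag, rng=None):
    """Pick positive and negative examples for one tag's Boolean classifier.

    tagged_objects maps an object id to the set of tags the user gave it
    (hypothetical layout, chosen for this sketch).
    """
    rng = rng or random.Random(0)
    # Positives: every object the user tagged with `tag`.
    positives = [obj for obj, tags in tagged_objects.items() if tag in tags]
    # Negative candidates: objects that have been tagged, but never with `tag`.
    candidates = [obj for obj, tags in tagged_objects.items()
                  if tags and tag not in tags]
    # Keep about as many negatives as positives (uniform sampling is an assumption).
    k = min(len(positives), len(candidates))
    negatives = rng.sample(candidates, k)
    return positives, negatives

tagged = {
    "m1": {"tonal"}, "m2": {"tonal", "short"}, "m3": {"serial"},
    "m4": {"brown"}, "m5": set(), "m6": {"serial", "long"},
}
positives, negatives = build_training_set(tagged, "tonal")
```

Untagged objects (here `m5`) are never used as negatives, which matches the statement that only tagged objects are chosen.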
Figure 1. A support vector machine defines a class separation in an n-dimensional space as a hyperplane of dimension n − 1, dividing the space into two regions. In this figure, the heavy lines are defined by particular "margin" points called support vectors and define the margin. The lighter line, in the middle of the margin, represents the class separation itself. The distance between a point and the hyperplane is traditionally interpreted as the inverse of the probability for the item to belong to the class outside the boundary. In this case, variation2 is closer to A than variation1, so its probability to belong to A is greater. Note that the distance between an item and its variations (variation1 and variation2) is not necessarily meaningful.
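To make the geometric reading of Figure 1 concrete, the sketch below computes the signed distance of a point to a hyperplane w·x + b = 0 and turns it into a value between 0 and 1. The logistic squashing and its `scale` parameter are assumptions made for illustration; the caption only states that the distance is interpreted as a probability and does not specify the mapping. The weights and the two variation points are made-up numbers.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def class_probability(w, b, x, scale=1.0):
    """Map the signed distance to (0, 1) with a logistic curve (assumed mapping)."""
    return 1.0 / (1.0 + math.exp(-scale * signed_distance(w, b, x)))

# Hypothetical 2-D separator: class A lies on the side where x[0] > 0.
w, b = (1.0, 0.0), 0.0
variation1 = (0.5, 2.0)
variation2 = (2.0, -1.0)
p1 = class_probability(w, b, variation1)
p2 = class_probability(w, b, variation2)
```

In this toy setting `variation2` lies deeper inside the A region than `variation1`, so it receives the larger probability of belonging to A.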
The task of the combinatorial generator is to generate variations of an object that maximize two properties: being as close as possible to the initial object, and increasing (or decreasing) the probability of a given tag/classifier by a given ratio. A naive, combinatorial version of this algorithm is given in Figure 2.

The generation of variations is domain-specific. We describe a simple variation generator for

Melodies are a good example to illustrate our approach because they are both technical objects and subjective ones.

Five Types of Melodies

There are many known technical features to describe melodies, notably related to pitch
Figure 2. The combinatorial search algorithm. The "FindLess" algorithm is similar, and extension to compound commands is straightforward. Here, N is a predetermined number of variations to be generated.

Figure 3. A typical tonal melody (My Rifle, My Pony and Me, originally sung by Dean Martin and Ricky Nelson), here, stylized.
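The body of Figure 2 is not reproduced here, so the following is only a hedged sketch of a naive search in the spirit of its caption: generate N variations, keep those whose tag probability has risen by the requested ratio, and return the one closest to the starting object. The name `find_more` (by analogy with the caption's "FindLess"), the relative reading of the ratio, the Levenshtein edit distance as the closeness measure, and the toy classifier and variation generator are all assumptions made for this illustration.

```python
import random

C_MAJOR = {0, 2, 4, 5, 7, 9, 11}  # pitch classes of the C-major scale

def edit_distance(a, b):
    """Levenshtein distance between two sequences (assumed closeness measure)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution or match
        prev = cur
    return prev[-1]

def find_more(start, probability, vary, ratio, n_variations, rng):
    """Closest of n_variations candidates whose probability rose by `ratio`, else None."""
    p0 = probability(start)
    target = min(1.0, p0 * (1.0 + ratio))  # relative increase (assumption)
    best, best_dist = None, None
    for _ in range(n_variations):
        candidate = vary(start, rng)
        p = probability(candidate)
        if p <= p0 or p < target:
            continue  # not "more" enough
        dist = edit_distance(start, candidate)
        if best is None or dist < best_dist:
            best, best_dist = candidate, dist
    return best

# Toy stand-ins for a trained tag classifier and a domain-specific variation generator.
def toy_tonal_probability(melody):
    return sum(p % 12 in C_MAJOR for p in melody) / len(melody)

def toy_vary(melody, rng):
    out = list(melody)
    out[rng.randrange(len(out))] = rng.randint(60, 72)  # replace one note
    return out

start = [61, 63, 66, 68, 60, 62]  # MIDI pitches, mostly outside C major
result = find_more(start, toy_tonal_probability, toy_vary,
                   ratio=0.1, n_variations=200, rng=random.Random(0))
```

A "less" search would flip the two comparisons on the probability. When the starting probability is already 1.0, no candidate can qualify and the sketch returns `None`.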
Figure 4. The melody of With a Little Help from My Friends (here, stylized) is a typical brown melody.

Figure 5. Examples of (a) tonal, (b) serial, and (c) brown melodies generated by our three generators.
Figure 6. Initial melody created by the serial melody generator. The melody is perfectly serial according to our definition (all pitches are different, though here not all pitch classes). The serial classifier yields a probability of 1.0. The tonal classifier yields a probability of 8 × 10^-3.
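The notion of "serial" used in the caption of Figure 6 (all pitches are different) can be checked in a few lines. This sketch assumes melodies are lists of MIDI note numbers, and it reads the parenthetical as saying that pitch classes, unlike pitches, may repeat; both points are assumptions, and the function names are hypothetical.

```python
def is_serial(pitches):
    """True when no pitch occurs twice in the melody."""
    return len(set(pitches)) == len(pitches)

def pitch_classes_all_different(pitches):
    """True when no pitch class (pitch modulo 12) occurs twice."""
    classes = [p % 12 for p in pitches]
    return len(set(classes)) == len(classes)

melody = [60, 62, 65, 72, 70]  # C4 and C5 share pitch class 0
serial = is_serial(melody)
strict = pitch_classes_all_different(melody)
```

Here `melody` counts as serial in the first sense but not in the stricter pitch-class sense.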
Figure 7. The same melody, a bit "more tonal." The differences are highlighted. The tonal classifier has increased its probability to 5.7 × 10^-2.

Figure 8. Still a bit "more tonal." Probability is now 1.4 × 10^-1.

Figure 9. Again, "more tonal." Probability is now 3.11 × 10^-1.

Figure 10. "More tonal," again. Probability is 5.3 × 10^-1.

Figure 11. Probability is now 7.7 × 10^-1.
Training Phase
Figure 12. Final step; the melody is now perfectly tonal (in B-flat major, when notes are spelled enharmonically, with a probability of the tonal classifier of 9.6 × 10^-1), yet "similar" to the initial serial melody. The accidentals are displayed without correction and thus do not reflect the tonality, which contains flats rather than sharps.

Figure 13. A slightly more tonal version of the melody. The melody is, strictly speaking, not more tonal than the preceding one. However, because the notion of "tonalness" was not learned perfectly by the classifier, the algorithm found an artificial way of improving its "understanding" of "tonalness" by reducing the number of notes.
this problem consists of explicitly programming a function to add notes to a given melody. But we can again avoid the use of such explicit programming. Instead, we can reuse the two tags long and short, trained with examples generated with the other three generators. By convention, we tag melodies with fewer than eight notes as short, and melodies with more than twelve notes as long. As a consequence, the system automatically learns long and short, with examples coming from all three generators. We can check that the system has correctly learned the tags long and short by observing the selected features, as indicated in Table 2.

The feature "number of notes" was selected as a primary feature for long and short. This feature-selection process also shows, incidentally, that the system has not simply associated long and short with the number of notes, but rather with a more complex configuration of features. For instance, it turns out that most of the long melodies also have more repetition in their interval sequence. The system has no way to generalize better in our context ("better" would be to consider only the feature nbNotes). But as we will see, this approximation does not prevent it from producing "meaningful" variations.

We now consider a melody that is tagged both as tonal and short, illustrated in Figure 14 (the first melody). In a first step we will make it longer as in the previous experiment, i.e., through the command more long. As we can observe, this command indeed results in a similar melody, with more notes. The sequence of "more long" commands is illustrated in Figure 14, and it can be noted that the melodies indeed have progressively more notes (from seven initially up to ten).

However, it can also be noted that the notes that have been added to the melody by the combinatorial algorithm make it not tonal any longer: The initial melody is in C major, but added notes (D-sharp, C-sharp, and F-sharp) are not in the C-major scale. This is highlighted by the fact that the corresponding probabilities of being tonal have shifted from 0.99 to 0.07 (see Table 4). This phenomenon is not unexpected, however, as the system has just been asked to make the melody "more long," but it was not given any constraint on tonality or any other property.

A natural way to address this problem consists in issuing a compound conjunctive command of the form "more long AND as tonal." Such a query is made by selecting the tags and the corresponding modifiers through a specific interface (see Figure 15). In this case, starting from the same melody, we obtain the melodies in Figure 16. We can observe that the algorithm has now progressively added only tonal notes (F and G). Most importantly, these added notes have been chosen as a "side effect" of the compound command, and not through the introduction of an explicit representation of tonality in the program.

Note that this compound query triggers a non-trivial search. Figure 17 illustrates the search process corresponding to a "more long AND as tonal" query. At each iteration (x axis), the probabilities for tags long (dashed line) and tonal (plain line) are displayed. A solution is found when the tonal plain line is within the two horizontal dashed lines (which represent the bounds ±10 percent of the starting "tonalness") and simultaneously the long (dashed) line is above the horizontal large line (which represents "more long" by 15 percent). It can be seen that the system explores about 300 melodies before finding a solution.

Making a Tonal Melody More Brown

The last example starts from a tonal melody that is progressively made more brown. We illustrate again the process step-by-step in Figure 18; note also the increase in brownness in Table 5.
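The labeling convention used to produce training examples for the tags long and short (fewer than eight notes is short, more than twelve notes is long) is simple enough to write down. The sketch below assumes a melody is a list of notes and uses a hypothetical function name; it only labels examples, whereas in the system itself the tags are then learned by the classifiers from the selected features rather than hard-coded.

```python
def length_tags(melody):
    """Return the length tags assigned by convention to a melody."""
    n = len(melody)
    if n < 8:
        return {"short"}
    if n > 12:
        return {"long"}
    return set()  # melodies with 8 to 12 notes get neither tag

seven_notes = length_tags([60] * 7)
ten_notes = length_tags([60] * 10)
thirteen_notes = length_tags([60] * 13)
```

Melodies of intermediate length stay untagged, so they contribute neither positive nor negative evidence for these two tags.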
Figure 14. A short and tonal melody as a starting point for repeated "stretching" operations. The probability of being long is initially (a) 1.06 × 10^-7. Successive probabilities of being long are (b) 1.25 × 10^-3, (c) 9.77 × 10^-1, and (d) 1.0. The latest melody cannot be stretched any farther, as its probability of being long is 1.0.

Figure 15. An interface for specifying conjunctive compound queries holding on several tags simultaneously. For each tag the user can specify the type of action (as, more, less, or ignore) and the corresponding ratios. Here, a query to produce an item that is "more long by 15 percent AND as tonal by 10 percent," while the other tags are ignored.
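A compound conjunctive query of the kind specified through this interface can be viewed as a predicate over the tag probabilities of the starting melody and of a candidate. The sketch below is one possible reading, not the system's code: the function name `satisfies`, the dictionary layout, and the interpretation of each ratio as relative to the starting probability are assumptions. The probability values are made up for illustration.

```python
def satisfies(query, start_probs, cand_probs):
    """Check a conjunctive query such as 'more long by 15% AND as tonal by 10%'.

    query maps a tag to (action, ratio), with action in
    {"as", "more", "less", "ignore"}.
    """
    for tag, (action, ratio) in query.items():
        if action == "ignore":
            continue
        p0, p = start_probs[tag], cand_probs[tag]
        if action == "more":
            ok = p >= p0 * (1.0 + ratio)
        elif action == "less":
            ok = p <= p0 * (1.0 - ratio)
        elif action == "as":
            ok = abs(p - p0) <= ratio * p0  # stay within +/- ratio of the start
        else:
            raise ValueError("unknown action: " + action)
        if not ok:
            return False  # conjunctive: every clause must hold
    return True

query = {"long": ("more", 0.15), "tonal": ("as", 0.10), "serial": ("ignore", 0.0)}
start = {"long": 0.2, "tonal": 0.9, "serial": 0.1}
longer_and_tonal = {"long": 0.3, "tonal": 0.85, "serial": 0.7}
longer_not_tonal = {"long": 0.3, "tonal": 0.07, "serial": 0.7}
accepted = satisfies(query, start, longer_and_tonal)
rejected = satisfies(query, start, longer_not_tonal)
```

Plugging such a predicate into a generate-and-test loop gives the behavior described for the compound command: candidates that get longer but lose their "tonalness" are discarded.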
Table 3. The Progressive Increase in "Tonalness" at Each Step of the Process Leading to Sequences in Figures 7-14

Melody Versions    Serial Classifier    Tonal Classifier    "Tonalness"
1: Initial         1.0                  8 × 10^-3           0.54
2: More tonal      1.0                  5.7 × 10^-2         0.625
3: More tonal      1.0                  1.4 × 10^-1         0.66
4: More tonal      9.95 × 10^-1         3.11 × 10^-1        0.70
5: More tonal      8.15 × 10^-1         8.15 × 10^-1        0.75
6: More tonal      1.82 × 10^-2         7.7 × 10^-1         0.79
7: More tonal      3.87 × 10^-5         9.63 × 10^-1        0.87
8: More tonal      1.72 × 10^-7         9.98 × 10^-1        1.0
Figure 16. The (a) short and tonal melody now progressively made "more long and as tonal." Respective probabilities of being long and tonal are given in Table 4. The final melody is (b) close to the original, (c) longer, and (d) still as tonal as the starting one.

Figure 17. The evolution of the search process during the "more long by 15 percent AND as tonal by 10 percent" query.

from the user. The only programming constraint lies in the variation generators: They must be designed in such a way that they generate at least the target objects (together with possibly many unwanted objects). However, this is a relatively weak constraint, as these generators can be designed once for all, for a particular domain (here, melodies). Of course, more- or less-efficient generators could be considered, but default naive generators are easy to design.

This article only aims at demonstrating the nature of the underlying algorithm using simple, well-understood examples. The experiment presented here used controlled categories to illustrate the algorithm and to show its capacity to produce musical objects without explicit programming or

to use the system to control the generation of jazz improvisations as an extension of the Continuator system (Pachet 2004). In this latter case, subjective tags are used as control handles to influence, in real time, the quality of generated solos.

Note that the working example shown here is based on the SVM, but other classifiers could be used. Decision trees, for instance, have been shown recently to exhibit better geometrical properties than SVMs (Alvarez, Bernard, and Deffuant 2007). They could be substituted for SVMs without changing the framework described here.

The algorithm we propose is blind, bearing some similarity with other blind search algorithms like genetic algorithms (GAs), often used in music generation (Biles 1994). GAs could indeed be used to build our variations, instead of specifically designed random generators. However, GAs are more difficult to control than random generators. Most importantly, we believe that our naive search algorithm can be optimized by exploiting information about the features used by the classifier, and this constitutes a current avenue of research. Such an optimization is not possible by definition in GAs, which operate on
chromosomes, which are independent of the feature sets used by the classifiers.

Description-based design attempts to bridge the gap between description and construction, thereby reducing the need for users to learn technical languages of the objects they "have in mind." This approach is well suited to domains in which the features to describe objects are known and accurate, and users have the capacity to express subjective judgments in a consistent way. This applies to most of the musical objects created in the context of computer-assisted composition. For instance, description-based design is currently being applied to other types of musical objects, in particular, chords, chord sequences, and harmonized melodies.

Audio synthesis is also being investigated. For instance, programming FM sounds requires notoriously complex knowledge of FM synthesis. Several approaches have attempted to provide users with more subjective means of programming sound synthesizers (Rolland and Pachet 1996; Sarkar, Vercoe, and Yang 2007). But these approaches are always based on a fixed, pre-programmed representation of supposedly universal subjective judgments.

Table 5. The Progressive Increase in Brownness, Starting from an Initial Tonal Melody

Melody Versions      Brown
1: tonal             0.0
2: 1 More brown      0.0
3: 2 More brown      6.65 × 10^-35
4: 3 More brown      4.67 × 10^-33
5: 4 More brown      4.14 × 10^-32
6: 5 More brown      1.15 × 10^-29
7: 6 More brown      2.88 × 10^-28
8: 7 More brown      4.71 × 10^-22
9: 8 More brown      3.31 × 10^-20
10: 9 More brown     1.41 × 10^-8
11: 10 More brown    9.95 × 10^-1
12: 11 More brown    1.0
Figure 18. Various steps in making a tonal melody "more brown." Note that at (f), the algorithm finds no other solution to increase brownness than to remove a note, to later add another note back (h). At the last step (l), the only way to improve (slight brownness) is to remove a note.
personal subjective judgments about sound textures and reuse these judgments to explore sound spaces in an intuitive and personal way. In the case of audio loops, our approach can benefit from two sets of technologies: The large corpus of studies in the domain of audio features that yield efficient representations of audio objects, and the emerging technologies of concatenative sound synthesis (Schwartz 2006), which provide us with the variation generators needed for description-based design.
References
Levenshtein, V. I. 1965. "Binary Codes Capable of Correcting Deletions, Insertions, and Reversals." Cybernetics and Control Theory 10(8):707-710.

Mandel, M., G. Poliner, and D. Ellis. 2006. "Support Vector Machine Active Learning for Music Retrieval." Multimedia Systems 12(1):3-13.

Pachet, F. 2004. "Beyond the Cybernetic Jam Fantasy: The Continuator." IEEE Computer Graphics and Applications 4(1):31-35.

Pachet, F. 2008. "The Future of Content Is in Ourselves." ACM Computers in Entertainment 6(3). Available at www.acm.org/pubs/cie/.

Quinlan, J. R. 1993. C4.5: Programs for Machine Learning. Los Altos, California: Morgan Kaufmann.

Rolland, P.-Y., and F. Pachet. 1996. "A Framework for Representing Knowledge about Synthesizer Programming." Computer Music Journal 20(3):47-58.

Sarkar, M., B. Vercoe, and Y. Yang. 2007. "Words that Describe Timbre: A Study of Auditory Perception Through Language." Paper presented at the Language and Music as Cognitive Systems Conference (LMCS-2007), Cambridge, UK, 11-13 May.

Schwartz, D. 2006. "Concatenative Synthesis: The Early Years." Journal of New Music Research 35(1):3-22.

Temperley, D. 2007. Music and Probability. Cambridge, Massachusetts: MIT Press.

Voss, R. F., and J. Clarke. 1978. "1/f Noise in Music: Music From 1/f Noise." Journal of the Acoustical Society of America 63(1):258-261.