
Proceedings of the International Symposium on Musical Acoustics, March 31st to April 3rd 2004 (ISMA2004), Nara, Japan

3-S2-2

Auditory Grouping in the Perception of Roughness Induced by
Subharmonics: Empirical Findings and a Qualitative Model
Chen-Gia Tsai
Department of Musicology, Humboldt University Berlin, Germany
tsai.cc@lycos.com

subharmonics at multiples of f0/n can occur in human
vocalizations. Typically, lower subharmonics are too
weak to evoke the subharmonic pitch f0/2, but upper
subharmonics are strong enough to induce the rough
sound quality. This study addressed some empirical
findings against existing psychoacoustic models of
roughness and provided a new model that qualifies the
perception of roughness induced by subharmonics.

Abstract
Quasi-periodic sounds with subharmonics at (2n-1)f0/2
(where f0 is the perceived pitch, n = 1, 2, 3...) can be
produced by musical instruments such as the saxophone,
the trombone, the violin, and the Chinese membrane
flute. Lower subharmonics in a natural sound are always
too weak to evoke the pitch f0/2, but upper
subharmonics (>11f0/2) can be strong enough to affect
the sound quality. Subharmonics are common in human
vocalizations and have been identified as a source of
roughness. However, this type of roughness cannot be
explained by existing psychoacoustic models and
appears to contradict the theory of consonance-dissonance.
The present study provided a qualitative
model of roughness induced by subharmonics with the
consideration of higher-order mechanisms of auditory
grouping. The key assumption was that interference
between components at nf0 lying in the same critical
bands would be largely reduced once they are grouped
by a robust pitch sensation of f0. Roughness induced by
subharmonics reflects a limitation of the pitch-based
grouping mechanism, as the perceived pitch is too high
for grouping the subharmonics.

2. Background
2.1. Neural correlates of musical dissonance
Helmholtz explained the perception of musical
dissonance in terms of beats evoked by adjacent
harmonics of two simultaneously sounding musical
tones. These beats could result in intermittent neural
activity. This approach implies that the degree of
dissonance depends on the spectral content of the tones.
Recent research on the neurophysiological basis for
dissonance perception mainly concerned temporal
coding at various levels of the auditory nervous system,
such as the inferior colliculus [4] and the primary
auditory cortex [5]. However, such phase-locked firing
patterns should be interpreted as no more than the
maintenance of temporal information at lower levels of
auditory processing. Their correlation with the unpleasant
sensation associated with dissonance or roughness, which
is likely coded in the prefrontal cortex, remains to be
demonstrated.
A case study of a lesion in the auditory cortex
showed a dissociation between harmony perception
and roughness perception [6]. The authors
thus suggested that pitch relationships influenced
harmony perception in the vertical dimension, with
roughness playing a secondary role. In other words,
harmonics of two musical tones may not produce
unpleasant beats when lying in the same critical band.

1. Introduction
Auditory roughness is an important parameter that
induces unpleasant qualities of a sound. Since its
introduction by Helmholtz [1], roughness has been
considered to be due to rapid beating in auditory
peripheral channels, or critical bands. Aside from
psychoacoustic research, roughness as an indicator of
pathological voices has been extensively studied by
clinicians.
Roughness evaluations of human voices have
provided new data for examining psychoacoustic
models of roughness. In Reuter's dissertation [2], a
psychoacoustic model for roughness calculation [3] was
applied to pathological voices. Strikingly, the
computed results showed only a medium correlation with
the perceived roughness.
A conflict between psychoacoustic and clinical
studies of human voices can be found in a specific type
of roughness: roughness induced by subharmonics.
Subharmonics are spectral components at multiples of a
low integer fraction of the perceived pitch f0. This paper
focused on the subharmonics at (2n-1)f0/2, although

2.2. Pleasant beating in low-pitched singing


It is also unclear whether beats arising from the
harmonics of a single musical tone can induce
unpleasant qualities. According to psychoacousticians, a
harmonic-rich voice with f0 < 200 Hz would have a high
value of roughness, because upper harmonics could
interfere with each other within critical bands. However,
such low-pitched voices are not rough according to
voice clinicians. In singing practice, the voice of a
professional bass is characterized by a formant around
3 kHz. Although the unresolved harmonics clustering
around this singer's formant could give the voice a
rough quality, the audience favors bright low-pitched
voices rather than dull ones that are free of roughness.
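
The claim that upper harmonics of a low-pitched voice share critical bands near 3 kHz can be checked with a rough calculation. The sketch below is illustrative only: the paper does not specify a critical-band formula, so the equivalent rectangular bandwidth (ERB) approximation of Glasberg and Moore is assumed here, and the function names are hypothetical.

```python
def erb_hz(fc):
    """Equivalent rectangular bandwidth in Hz at centre frequency fc (Hz).

    Assumption: Glasberg and Moore approximation, ERB = 24.7 * (4.37 * F + 1)
    with F in kHz. The paper itself does not name a bandwidth formula.
    """
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def harmonics_per_band(f0, fc):
    """Approximate number of harmonics of f0 falling inside one band at fc."""
    return erb_hz(fc) / f0


# A bass voice (f0 = 100 Hz) versus a soprano voice (f0 = 400 Hz),
# both evaluated near the 3 kHz singer's formant.
bass = harmonics_per_band(100.0, 3000.0)      # more than three per band
soprano = harmonics_per_band(400.0, 3000.0)   # fewer than one per band
print(f"bass: {bass:.2f} harmonics per band, soprano: {soprano:.2f}")
```

Under this assumption several harmonics of the bass voice interact within one auditory channel around 3 kHz, which is why a purely band-based model predicts roughness there.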

whereas steady but rough tones with subharmonics can
be found in wind instruments (e.g. the trombone and
the bassoon, see [7]) and bowed string instruments [8].
Although such tones are generally avoided in Western
Classical Music because of the unpleasant quality, they
are deliberately used in Russian lament [9], jazz and
Chinese membrane flute music [10].
It is important to note that stationary violin tones
with subharmonics were related to the rustle quality
[11], which appears similar to roughness. This is in
sharp contrast to the general belief that roughness is
always induced by rapidly modulated signals, raising
the question of multidimensionality of roughness.
Bergan and Titze [12] investigated the effect of
amplitude- and frequency-modulations on the
perception of pitch and roughness in voices with
subharmonics. In the introduction of their paper, the
authors mentioned that roughness induced by
subharmonics may be distinguishable from roughness
due to aperiodicity.

2.3. Unpleasant consonance: roughness induced by subharmonics
According to Helmholtz, consonant intervals were
pleasant because very few beats were produced in
auditory channels. However, human voices with
subharmonics appear to contradict this theory.
Fig. 1 displays two spectrograms of human voices
with subharmonics. The prominent pitch f0 is
represented as the first strong spectral component in
each spectrogram (marked with arrows). In Fig. 1a, the
frequencies of subharmonics (2n-1)f0/2 are integer
multiples of f0/2. Therefore, a sudden appearance of
these subharmonics could be regarded as adding a tone
one octave below to the voice. In general, the
frequencies of subharmonics in a voice are always
multiples of a low integer fraction of the fundamental
frequency, such as f0/2, f0/3, f0/4, or even f0/6 (Fig. 1b).
As the corresponding subharmonic pitch and the
fundamental frequency strictly stand in simple integer
ratios, it is a puzzle that such consonances in voices
are characterized by a rough quality.
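
The integer relationship described above can be made concrete with a short sketch. It lists the components at multiples of f0/m that are not harmonics of f0, and confirms that for m = 2 the subharmonics at (2n-1)f0/2 and the harmonics at nf0 together form the complete harmonic series of f0/2. The value f0 = 220 Hz and the function names are hypothetical choices for illustration.

```python
from fractions import Fraction


def harmonic_series(f0, n_max):
    """Frequencies n*f0 for n = 1..n_max."""
    return [Fraction(f0) * n for n in range(1, n_max + 1)]


def subharmonic_series(f0, m, n_max):
    """Multiples of f0/m up to n_max*f0 that are not harmonics of f0."""
    step = Fraction(f0) / m
    return [k * step for k in range(1, n_max * m + 1) if k % m != 0]


f0 = 220  # Hz, hypothetical perceived pitch
subs = subharmonic_series(f0, 2, 3)    # (2n-1)*f0/2 for n = 1, 2, 3
harms = harmonic_series(f0, 3)

# Interleaved, the two sets are exactly the harmonics of f0/2,
# i.e. of a tone one octave below the perceived pitch.
merged = sorted(subs + harms)
print([float(f) for f in merged])
```

Exact fractions are used so that cases such as f0/3 or f0/6 can be compared without rounding error.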

3. Auditory modeling
3.1. Auditory grouping and interference reduction
As the spectral distribution of subharmonics may be a
new dimension of roughness, psychoacoustic models
based on the notion of critical bandwidth are unsuitable
for roughness induced by subharmonics. I suggest that
this type of roughness cannot be explained without
taking into account the pitch-based grouping mechanism
in auditory scene analysis.
Auditory scene analysis deals with the organization
of auditory scene which breaks a sound mixture into
elements and groups proximate elements into discrete
objects [13]. Grouping mechanisms are considered to
be governed by some grouping rules such as
harmonicity, coherent modulation, common onset and
spatial location.
A qualitative model of roughness induced by
subharmonics is proposed here with two assumptions.
First, the grouping rule of harmonicity is modified so
that components at nf0 are grouped only when the pitch
sensation of f0 is robust. In other words, if the pitch
strength of f0 is low, components at nf0 will not be
grouped despite their harmonicity. Second, unpleasant beats
between components lying in the same critical bands
will be largely reduced once they are grouped. This
assumption is supported by the fact that bright,
low-pitched singing can have a low value of roughness.

Figure 1: Human voices with subharmonics.


2.4. Roughness vs. aperiodicity
In phoniatrics, several indicators have been used for
describing pathological voice quality, including
roughness, breathiness and hoarseness. Roughness has
been related to aperiodicity features of human voices,
which arise from glottal instability. This approach
parallels the evaluations of psychoacoustic roughness in
amplitude- or frequency-modulated tones.
The correlation between roughness and modulated
sounds appears fairly low in music. A soprano's voice
can be rapidly and deeply modulated but still beautiful,

3.2. Model description


3.2.1. Stage 1: Pitch extraction
The definition of subharmonics and harmonics demands
a definition of pitch, which is not obvious when the
subharmonic pitch f0/2 competes with f0.


Figs. 2a and 2b display the spectra of a pleasant
throat-singing voice (kargyraa) and a rough voice with
subharmonics. Psychoacoustic models of roughness
cannot explain why the throat-singing voice is less
rough than the voice with subharmonics. The perceived
drone pitch of the throat-singing voice is the frequency
of the first component, whereas the perceived pitch of
the rough voice is the frequency of the second
component. This can be related to the degree of predominance of the even-numbered components at lower
frequencies. For the throat-singing voice, no such
predominance is notable (Fig. 2a). For the rough voice,
the lowest six even-numbered components dominate
(Fig. 2b), so that the pitch f0 tends to be extracted
according to them; they are harmonics at nf0. Fig. 2c
displays the spectrum of a saxophone tone with
subharmonics. The predominance of lower harmonics
is also noticeable.
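
The pitch decision described in this stage can be sketched as a comparison of even- and odd-numbered component levels among the lowest components. This is a minimal illustration, not the model's actual pitch extractor: the 10 dB threshold, the use of the twelve lowest components, and the function names are all assumptions introduced here, since the paper gives no numerical criterion.

```python
import math


def _rms(values):
    """Root-mean-square of a list of linear amplitudes."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def even_predominance_db(amplitudes, n_low=12):
    """Level of even-numbered over odd-numbered components, in dB.

    amplitudes[0] is component 1 (the lowest spectral component).
    Only the n_low lowest components are considered (assumed value).
    """
    low = amplitudes[:n_low]
    odd = low[0::2]    # components 1, 3, 5, ...
    even = low[1::2]   # components 2, 4, 6, ...
    return 20.0 * math.log10(max(_rms(even), 1e-12) / max(_rms(odd), 1e-12))


def extract_pitch(f_lowest, amplitudes, threshold_db=10.0):
    """Return 2*f_lowest if even components dominate, else f_lowest.

    threshold_db is a hypothetical decision criterion.
    """
    if even_predominance_db(amplitudes) >= threshold_db:
        return 2.0 * f_lowest
    return f_lowest


# Hypothetical spectra: flat (throat-singing-like) versus
# weak odd-numbered components (rough voice with subharmonics).
flat_spectrum = [1.0] * 12
rough_spectrum = [0.05, 1.0] * 6
print(extract_pitch(110.0, flat_spectrum), extract_pitch(110.0, rough_spectrum))
```

With the flat spectrum the lowest component is taken as the pitch, whereas with weak odd-numbered components the pitch is assigned to the second component, as described for Figs. 2a and 2b.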

3.2.2. Stage 2: Sifting with harmonic sieve
In this stage a harmonic sieve is constructed according
to the pitch f0. This harmonic sieve consists of a series
of harmonic holes at nf0. Harmonics pass the sieve,
while subharmonics are rejected by it.
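
A harmonic sieve of this kind can be sketched in a few lines. The width of each hole is not given in the paper; the sketch assumes a fixed width of 10% of f0 on either side of each multiple nf0, and the function name is hypothetical.

```python
def harmonic_sieve(freqs, f0, tolerance=0.1):
    """Split spectral components into those passing holes at n*f0 and the rest.

    tolerance is the half-width of each hole as a fraction of f0
    (an assumed value; the paper does not specify the hole width).
    """
    passed, rejected = [], []
    for f in freqs:
        n = max(1, round(f / f0))          # nearest harmonic number
        if abs(f - n * f0) <= tolerance * f0:
            passed.append(f)
        else:
            rejected.append(f)
    return passed, rejected


components = [110.0, 220.0, 330.0, 440.0, 550.0, 660.0]  # hypothetical, Hz
harmonics, subharmonics = harmonic_sieve(components, f0=220.0)
print(harmonics, subharmonics)
```

A fixed absolute hole width is used here so that components halfway between high-ranked harmonics are still rejected; a width proportional to frequency would eventually let them through.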
3.2.3. Stage 3: Grouping harmonics
In this stage, the components that have passed the
harmonic sieve are grouped as a single entity, which is
the pure part of the sound because the unpleasant beats
between harmonics are largely reduced. Rejected by the
harmonic sieve, subharmonics remain ungrouped,
evoking many entities in the auditory scene.
3.2.4. Stage 4: Higher-order grouping
Although subharmonics are not grouped by the pitch of
f0/2, the auditory system still recognizes that both
harmonics and subharmonics have arisen from the same
source. This implies a higher-order grouping.
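
Stages 3 and 4 can be pictured as a two-level data structure: the sifted harmonics form one entity, the rejected subharmonics remain separate entities, and all of them are bound to a single source. The sketch below only illustrates this bookkeeping; the class names are hypothetical, and treating each subharmonic as its own entity is an assumption about what "many entities" means.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    label: str            # "pure part" or "impurity"
    freqs: List[float]    # component frequencies in Hz


@dataclass
class Source:
    entities: List[Entity]   # everything attributed to one sound source


def group_and_bind(passed, rejected):
    """Stage 3: group sifted harmonics; stage 4: bind all entities to a source."""
    pure_part = Entity("pure part", list(passed))
    # Assumption: each ungrouped subharmonic counts as a separate entity.
    impurities = [Entity("impurity", [f]) for f in rejected]
    return Source([pure_part] + impurities)


source = group_and_bind([220.0, 440.0, 660.0], [110.0, 330.0, 550.0])
print(len(source.entities), "entities bound to one source")
```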

4. Discussion
4.1. Cancellation filtering
The present model differs from previous roughness
models in stages 2 and 3, where component
segregation and grouping take place. Interference
reduction of unresolved components is accomplished
through the sifting in stage 2. To segregate the
harmonics lying in the same critical band, one should
assume a mechanism of f0-guided cancellation filtering
within auditory channels. A temporal model of
harmonics segregation was proposed in [14]. This
model offered a putative neural mechanism supporting
the idea that beats induced by unresolved harmonics are
cancelled at a higher level of the auditory processing
hierarchy.
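
The idea of f0-guided cancellation can be illustrated with a time-domain comb filter that subtracts a copy of the signal delayed by one period of f0. This is a simplified signal-processing sketch in the spirit of harmonic cancellation, not the neural model of [14]; the sampling rate, f0, and signal length are hypothetical values chosen so that the period is a whole number of samples.

```python
import math


def cancellation_filter(x, period):
    """y[t] = x[t] - x[t - period]: removes components periodic in `period` samples."""
    return [x[t] - x[t - period] for t in range(period, len(x))]


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


fs, f0 = 8000, 200            # hypothetical sampling rate and pitch, Hz
period = fs // f0             # 40 samples per cycle of f0
t_axis = range(800)

# Five harmonics at n*f0 and one subharmonic at 3*f0/2.
harmonic_part = [sum(math.sin(2 * math.pi * n * f0 * t / fs) for n in range(1, 6))
                 for t in t_axis]
subharmonic = [math.sin(2 * math.pi * 1.5 * f0 * t / fs) for t in t_axis]

residual_h = rms(cancellation_filter(harmonic_part, period))  # essentially zero
residual_s = rms(cancellation_filter(subharmonic, period))    # not cancelled
print(residual_h, residual_s)
```

The harmonics of f0 vanish from the filter output, while the subharmonic survives (here with doubled amplitude, because its phase advances by an odd multiple of pi over one period of f0). A filter tuned to f0 therefore separates the two sets of components even when they share an auditory channel.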
Figure 2: Pitch extraction and the predominance of
odd-numbered components. (a) Spectrum of a throat-singing
voice (kargyraa). (b) Spectrum of a rough voice.
(c) Spectrum of a saxophone tone.
Although subharmonics are often thought of as weaker
than their flanking harmonics, it is important to note
that the predominance of harmonics depends on the
frequency range. Typically, lower subharmonics are
much weaker than their flanking harmonics and
partially masked by the latter. This harmonic
predominance is less significant at higher frequencies.
Figs. 2b and 2c show that the subharmonics above
6f0 are comparable to their flanking harmonics in
magnitude. However, these upper subharmonics are
unresolved components (rank>12) and unable to evoke
a robust pitch sensation. Consequently, the pitch
strength of f0/2 is fairly low.

4.2. Lower vs. upper subharmonics


While subharmonics have been identified as a source of
roughness, the present model distinguishes between the
roles of lower and upper subharmonics. It was
suggested that the pitch could be determined in terms of
subharmonic-to-harmonic ratio [15]. However, the
upper subharmonics contribute little to pitch
perception. They cannot elicit a pitch to group
themselves, thus inducing roughness. In contrast, lower
subharmonics contribute to the pitch sensation of f0/2.
When these lower subharmonics become stronger, the
pitch and roughness of the sound will be reduced. This
effect has been verified in a perceptual experiment [10].
4.3. Subharmonics as auditory impurities
Stimuli with subharmonics shed new light on auditory
scene analysis by introducing the notion of auditory
impurity; a sound with subharmonics is perceived as a
pure part plus impurities. As impurities, subharmonics
are more or less segregated from the pure
part of the sound composed of the well-grouped
harmonics, but still bound to it through a higher-order
grouping. This grouping, distinguishable from that
based on pitch, may stem from experiences and learning.
Since we often hear sounds with subharmonics emitted
from one oscillator such as the glottis and musical
instruments, the auditory system may have learned to
bind subharmonics with harmonics.
4.4. Sounds of self-sustained oscillators
The pitch sensation is of great importance for grouping
all components emitted by the same oscillator. However,
this grouping mechanism has some limitations. First,
self-sustained oscillators with a torus or strange attractor
in the phase space produce inharmonic components.
Second, even for the oscillators with a one-dimensional
attractor (limit cycle) that produce periodic sounds, the
pitch-based grouping sometimes fails because of peculiar
spectral features. For example, the sound of the
oscillator that has undergone a period-doubling can
have weak odd-numbered components at lower
frequencies. The pitch f0, which is extracted on the basis
of the lower even-numbered components (the
harmonics), is too high for grouping all components. The
pitch sensation of f0/2 can accomplish this task, but the
auditory system fails to perceive this pitch when the
lower odd-numbered components (the subharmonics)
are weak and masked by adjacent harmonics. From this
standpoint, roughness induced by subharmonics reflects
a performance limit of the pitch-based auditory
grouping.
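
How a period-doubled oscillation yields weak odd-numbered components can be seen in a toy signal. The sketch builds one doubled period from two pulses of slightly different height and computes its spectrum with a direct DFT. The pulse shape, the 32-sample cycle, and the 20% alternation depth are hypothetical; unlike the natural sounds discussed above, this toy signal has equally weak odd-numbered components at all frequencies.

```python
import cmath


def alternating_pulse_cycle(samples_per_cycle, depth):
    """One doubled period: a pulse of height 1, then a pulse of height 1 - depth."""
    x = [0.0] * (2 * samples_per_cycle)
    x[0] = 1.0
    x[samples_per_cycle] = 1.0 - depth
    return x


def dft_magnitudes(x, k_max):
    """Magnitudes of DFT bins 0..k_max; bin k is the k-th multiple of f0/2."""
    n = len(x)
    return [abs(sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, v in enumerate(x)))
            for k in range(k_max + 1)]


mags = dft_magnitudes(alternating_pulse_cycle(32, depth=0.2), k_max=6)
odd = [mags[k] for k in (1, 3, 5)]     # subharmonics at (2n-1)*f0/2
even = [mags[k] for k in (2, 4, 6)]    # harmonics at n*f0
print(odd, even)
```

For this signal the odd-numbered bins have magnitude equal to the alternation depth and the even-numbered bins have magnitude 2 minus the depth, so a small cycle-to-cycle difference leaves the odd-numbered components far weaker than the harmonics of f0.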

5. Conclusions
This study has reviewed some evidence against the
psychoacoustic models of roughness based on the
notion of critical bandwidth. A model qualifying the
perception of roughness induced by subharmonics was
proposed with two assumptions: (1) grouping demands
a robust pitch sensation; (2) unpleasant beatings caused
by components in the same critical bands are largely
reduced by this grouping. Since lower subharmonics in
rough voices or musical tones are always much weaker
than their flanking harmonics, the pitch of f0/2 is very
weak. Therefore, the subharmonics are not grouped and
are perceived as impurities. This new model takes into
consideration two higher-level mechanisms: (1)
grouping harmonics across critical bands, and (2)
binding the subharmonics with the well-grouped
harmonics to form a single auditory entity.
Future research should be dedicated to a calculation
model of roughness induced by subharmonics. This
demands the estimation of the relative strength of pitch
candidates, as the performance of grouping strongly
depends on the pitch strength.

References
[1] von Helmholtz, H. L. F. On the Sensations of Tone,
Dover, New York, 1954/1877.
[2] Reuter, R. Untersuchung der Rauhigkeit
menschlicher Stimmen auf der Grundlage der
nichtlinearen Dynamik und der Psychoakustik. PhD
thesis, Technical University Berlin, 2000.
[3] Aures, W. Ein Berechnungsverfahren der
Rauhigkeit, Acustica 58:268-281, 1985.
[4] McKinney, M. F., Tramo, M. J., and Delgutte, B.
Neural correlates of musical dissonance in the
inferior colliculus, in: Physiological and Psychophysical Bases of Auditory Function, Shaker
Publishing BV, 83-89, 2001.
[5] Fishman, Y., Reser, D. H., Arezzo, J. C. and
Steinschneider, M. Complex tone processing in
primary auditory cortex of the awake monkey. I.
Neural ensemble correlates of roughness, J.
Acoust. Soc. Am. 108(1):235-246, 2000.
[6] Tramo, M. J., Cariani, P. A., Delgutte, B., and
Braida, L. D. Neurobiological foundations for the
theory of harmony in western tonal music, in: The
Biological Foundations of Music, The New York
Academy of Sciences, 92-116, 2001.
[7] Gibiat, V., and Castellengo, M. Period doubling
occurrences in wind instruments musical
performance, Acustica 86:746-754, 2000.
[8] Kimura, M. How to produce subharmonics on the
violin, J. New Music Res. 28(2):178-184, 1999.
[9] Mazo, M., Erickson, D., and Harvey, T. Emotion
and expression: Temporal data on voice quality in
Russian lament, in: Vocal Fold Physiology: Voice
Quality Control, Singular, San Diego, 173-178,
1995.
[10] Tsai, C. -G. The Chinese Membrane Flute (dizi):
Physics and Perception of its Tones. PhD thesis,
Humboldt University Berlin, 2003.
[11] Štěpánek, J., and Otčenášek, Z. Rustle as an
attribute of timbre of stationary violin tones, J.
Catgut Acoust. Soc. 3(8):32-38, 1999.
[12] Bergan, C. C., and Titze, I. R. Perception of pitch
and roughness in vocal signals with subharmonics,
J. Voice 15:165-175, 2001.
[13] Bregman, A. S. Auditory Scene Analysis, MIT Press,
1990.
[14] de Cheveigné, A. Concurrent vowel identification.
III. A neural model of harmonic interference cancellation, J. Acoust. Soc. Am. 101:2857-2865,
1997.
[15] Sun, X., and Xu, Y. Perceived pitch of synthesized
voice with alternate cycles, J. Voice 16(4):443-459,
2002.
