SC12 Kreiman Jody

Perceptual assessment of voice - past, present and future
Jody Kreiman, PhD
University of California, Los Angeles

Credits
The ideas presented here were developed in collaboration with Bruce Gerratt (UCLA). This research was supported by grant DC01797 from the National Institute on Deafness and Other Communication Disorders.
Outline
Quick review of the venerable history of current quality assessment protocols
Discussion of theoretical reasons why these protocols remain unsatisfactory measurement tools
Presentation of a psychological model of quality perception
Description of the way in which this perceptual model can lead to psychoacoustic models of voice quality and reliable, valid, practical clinical measurement protocols
Learner objectives and outcomes

Address historical changes in dysphonia assessment as well as future directions for researchers and clinicians.
Describe key aspects of the perception of voice.
I. INTRODUCTION
Why care about voice quality?

Most relevant aspect of voice to patient
Document treatment progress
Assess treatment efficacy
Voices convey substantial information about speakers
Why include the listener? Why not just measure the acoustic signal?
Just as loudness and pitch do not exist without the listener, vocal quality is an acoustic-PERCEPTUAL phenomenon. We must be able to model listeners' responses in order to reach our ultimate goal: a theoretical understanding sufficient to relate the perceived sound of a voice to the physiology that produced it, and physiology to the resultant percept.
II. THE PAST
The venerable approach to quality measurement

Create lists of terms to describe listeners' auditory impressions
Long history of verbal rating scales for voice quality
They are ingrained in Western culture, familiar, easy to apply, easy to understand, and have the ring of truth.
Ancient and modern labels for voice quality

Julius Pollux     Moore, 1964          Gelfer, 1988
Brassy            Brassy, metallic     Metallic
Brilliant         Brilliant, bright    Bright, vibrant
Clear             Clear, white         Clear
Deep              Deep                 Resonant, low
Dull              Dull, dead           Dull
Harsh             Harsh, strident      Harsh
Shrill, sharp     Shrill, sharp        Shrill, sharp
Thin              Thin                 Thin
Rating scale approaches to quality measurement

GRBAS protocol
Grade, Roughness, Breathiness, Asthenicity, Strain
CAPE-V protocol
Developed from a consensus meeting
Design goals:
minimal set of meaningful parameters
measures obtainable expediently
applicable to a broad range of voices and settings
reliable and valid, with exemplars available for training
Stockholm Voice Evaluation Consensus Model

Aphonic, Breathy, Tense, Lax, Creaky, Rough, Grating, Unstable, Voice Breaks, Diplophonic
Same familiar, traditional scales (breathiness, roughness, strain, loudness, pitch)
Why consider alternate approaches?

Atheoretical approach
Which scales to include?
Redundancies and ambiguities
Multidimensional scaling and factor analytic studies have not resolved this problem
Vagaries of scale definition

Breathiness = dry, hard, excited, pointed, cold, choked, rough, cloudy, sharp, poor, bad (Isshiki et al.)
or:
Breathiness = breathy, wheezing, lack of timbre, moments of aphonia, husky, not creaky? (Hammarberg et al.)
Voice profile analysis

Consistent with phonetic theory
Specifies how scales are related to each other: e.g., hoarse voice = deep, (loud), harsh/ventricular, whispery voice; gruff voice = deep, harsh, whispery, creaky voice
Specifies where information about quality might be, but does not model listeners' behavior
The first elephant in the room: Validity

The problem of what qualities to measure has never been solved, leaving validity up in the air.
The second elephant: reliability

What we REALLY want to know: what is the likelihood that another rater will produce the same rating for a given voice sample? This is not what most assessments of reliability measure.
Standard reliability approaches

Standard reliability tests use statistics that measure the likelihood that a new random sample of raters would produce the same mean rating as the group studied, averaged across all the voices studied.
How to measure reliability?

These two approaches can lead to very different conclusions about reliability.
Kreiman et al., 1993: ICC = 0.99, [symbol lost] = 0.99
Kreiman et al., 1994: ICC = 0.93, [symbol lost] = 0.97
Kreiman & Gerratt, 1996: ICC = 0.89, [symbol lost] = 0.90
Probabilities of exact agreement
P (exact) = 0.32; P (exact) = 0.21; P (exact) = 0.26
[Figure panels: Roughness ratings; Breathiness ratings]
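The contrast between the two approaches can be sketched on made-up data. The ratings below are invented for illustration (5 voices, 4 raters, 7-point scale), and Cronbach's alpha is used here only as a stand-in for a group-level reliability statistic; it is an assumption, not necessarily the statistic reported in the studies above. The function names are hypothetical.

```python
from itertools import combinations
from statistics import variance

# Invented ratings: one row per voice, one column per rater (7-point scale).
ratings = [
    [1, 2, 1, 3],
    [3, 4, 2, 4],
    [4, 5, 3, 6],
    [6, 7, 5, 6],
    [2, 3, 2, 4],
]

def cronbach_alpha(rows):
    """Group-level reliability of the summed (or mean) rating across raters."""
    k = len(rows[0])
    rater_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(rater_vars) / total_var)

def prob_exact_agreement(rows):
    """Share of rater pairs, pooled over voices, that gave identical ratings."""
    agree = 0
    total = 0
    for row in rows:
        for a, b in combinations(row, 2):
            agree += int(a == b)
            total += 1
    return agree / total

alpha = cronbach_alpha(ratings)
p_exact = prob_exact_agreement(ratings)
print(f"alpha = {alpha:.2f}, P(exact) = {p_exact:.2f}")
```

On these invented ratings the group-level statistic is high while pairs of raters rarely give the same rating, which is the pattern the slide describes: raters rank the voices similarly but differ in where they place them on the scale.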
Solutions to unreliable rater problem

Average ratings to achieve reliable mean
Other creative statistical techniques
Train listeners
Use fewer scale values
Anchored protocols
Give up and just ask the patient about satisfaction with voice/quality of life
Substitute objective measures
Objective approaches to quality assessment

Acoustic assessment protocols
Dysphonia Severity Index
Hoarseness Diagram
Multidimensional Voice Program (MDVP)
Depend on inconsistent correlations with perceptual measures for validity as measures of quality
What to do?
Find the sources of variability.
Develop alternative measurement approaches that target and reduce this variability.
III. THE PRESENT

A psychological model of voice perception
How listeners introduce variability

The literature provides evidence for four factors that introduce variability into measurements of quality:
Instability of internal standards for different qualities;
Difficulties isolating individual attributes in complex acoustic voice patterns;
Measurement scale resolution;
The magnitude of the attribute being measured.
Experimental evidence
Four experimental factors, corresponding to these four theoretical factors:
Presence/absence of comparison stimuli;
Comparison stimuli that were/were not matched to the voices being rated;
Visual analog versus 6-point rating scales;
The overall mean rating for each voice.
Listeners should agree best when all factors are controlled, and worst when nothing is controlled.
Controlling the factors

Six experimental tasks:
Four with and two without comparison stimuli
Two with custom comparison stimuli, and two with generic comparison stimuli
Results
These four factors accounted for 84.2% of the variance in the likelihood that listeners would agree exactly in their ratings.
Continuous (visual analog) versus 6-point scales for breathiness
Overall mean rating for each voice included as a covariate in the ANCOVA
Unmatched anchors are worse than no anchors
So: An ideal quality assessment protocol would

1. Avoid reliance on internal standards and help listeners focus attention
2. Not depend on selection/definition of labels for quality dimensions
3. Have fine scale resolution
An analysis-by-synthesis approach meets these criteria.
Continuous scale, matched anchor stimuli
Six point scale, generic anchor stimuli
IV. THE FUTURE
What now?
So:
We have a model of voice quality perception that shows us how to measure quality reliably and validly.
Based on this model, we have developed a tool for assessing the perceptual importance of different acoustic parameters.
Unfortunately, this model may never translate directly into a practical clinical application.
HOWEVER
What now?
We can use these methods to perceptually validate acoustic measures and derive a true PSYCHOACOUSTIC model of voice quality.
For example, dB is a perceptually validated measure that relates intensity to perceived loudness.
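As a worked illustration of the dB example, the standard definition of sound intensity level can be sketched as follows. The slide does not give a formula; the function name and the reference intensity of 1e-12 W/m^2 (the conventional reference for sound intensity level) are assumptions added here. The code computes a level on a logarithmic scale only and is not a model of perceived loudness.

```python
import math

def intensity_level_db(intensity_w_m2, reference_w_m2=1e-12):
    """Sound intensity level in dB: 10 * log10(I / I0)."""
    return 10 * math.log10(intensity_w_m2 / reference_w_m2)

# Doubling the physical intensity raises the level by about 3 dB,
# regardless of the starting intensity.
step = intensity_level_db(2e-6) - intensity_level_db(1e-6)
print(round(step, 2))
```

The logarithmic compression is what makes the scale track listeners' judgments more closely than raw intensity does.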
Psychoacoustic modeling
Such a psychoacoustic model could eliminate the need for subjective quality measures, because:
The perceptual importance of each acoustic parameter can be established;
interactions among parameters can be modeled; and
the composite set of parameters can be selected so that it is adequate to specify voice quality.
How to build a model

Listeners typically pay attention to attributes that vary from voice to voice, so
Find the acoustic attributes that vary the most from voice to voice.
Test the perceptual significance of these parameters.
Using speech synthesis, evaluate the effectiveness of the parameters as a set for modeling voice quality.
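The first step above can be sketched very roughly as ranking candidate parameters by how much they vary across voices. Everything in this example is an assumption made for illustration: the parameter names, the measurement values, and the use of the coefficient of variation as the variability index (a real analysis would more likely use a multivariate method such as principal component analysis on a large voice sample).

```python
from statistics import mean, stdev

# Invented measurements: one value per voice for each hypothetical parameter.
measurements = {
    "f0_hz": [110, 205, 180, 95, 230],
    "h1_h2_db": [2.0, 9.5, 6.0, 1.0, 12.0],
    "noise_ratio": [0.10, 0.11, 0.09, 0.10, 0.10],
}

def coefficient_of_variation(values):
    """Between-voice standard deviation relative to the mean (unitless)."""
    return stdev(values) / abs(mean(values))

def rank_by_variability(table):
    """Parameter names ordered from most to least variable across voices."""
    return sorted(table, key=lambda name: coefficient_of_variation(table[name]),
                  reverse=True)

ranked = rank_by_variability(measurements)
for name in ranked:
    print(name, round(coefficient_of_variation(measurements[name]), 2))
```

The ranking only nominates candidates; the second and third steps (perceptual testing and synthesis-based evaluation) are what establish whether listeners actually use a parameter.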
How to build a model

The end point of psychoacoustic modeling efforts is a set of perceptually valid acoustic parameters. These parameters can be used to evaluate changes in voice quality in the clinic, because they are objective measures whose relationship to quality is understood theoretically.
The final step: A theory of voice

Development of a set of perceptually valid acoustic measures would represent a step towards a complete theory of voice that relates changes in patterns of vocal fold vibration to changes in quality. Such a model would provide a theoretical basis for clinical assessment, because it would specify causal links from laryngeal physiology, to voice acoustics, to quality, and back.
A theory of voice
We submit that development of such a comprehensive theory should be the primary goal of voice research.
Summary
The last 2000 years have produced awareness and descriptions of the importance of voice and its uses.
Previous work has not led to very much understanding of the whys of quality, so measurement techniques remain unsatisfying.
We may be quite near a solution to this long-term problem.
Even more ambitious goals are obtainable once the problem of generating reliable and valid measures of voice is solved.
Conclusion
When we cannot measure, our knowledge is meager and unsatisfactory.
Attributed to Lord Kelvin
If it exists, it exists in amounts, and if it exists, it can be measured.

Attributed to E. L. Thorndike
