Measuring Speech Intelegibility Using Dirac

TN002
Technical Note
MEASURING SPEECH INTELLIGIBILITY USING DIRAC
Copyright 2008 Acoustics Engineering
www.acoustics-engineering.com
October 2008
Technical Note
TN002 Measuring Speech Intelligibility Using DIRAC
This page intentionally left blank.
October 2008
Technical Note

Contents
1 Speech Intelligibility Basics............................................................................5
1.1
1.2
1.3
1.4
Modulation Transfer Function MTF..........................................................................................5

Modulation reduction matrix.....................................................................................................6
Measuring MTF: modulated noise versus impulse response...................................................7
MTF measurement conditions and limitations..........................................................................7
2 Parameters Related to Speech Intelligibility ................................................9

2.1
2.2
2.3
2.4
2.5
2.6
2.7
Speech Transmission Index STI..............................................................................................9

Room Acoustics Speech Transmission Index RASTI..............................................................9
Speech Transmission Index for Telecommunication Systems STITEL..................................10
Speech Transmission Index for PA Systems STIPA..............................................................10
Percentage Articulation Loss of Consonants % ALC.............................................................10
Signal to Noise Ratio SNR.....................................................................................................11
Early Decay Time EDT...........................................................................................................11
3 Measuring the Speech Intelligibility............................................................12

3.1
3.2
3.3
3.4
3.5
Measurement techniques.......................................................................................................12
Hardware Components..........................................................................................................12
Measuring the SNR................................................................................................................13
Level calibration.....................................................................................................................14
Combining the Impulse Response with SNR Values.............................................................17
4 Measurement Situations..............................................................................19
4.1
4.2
4.3
4.4
Impulse response measurement procedure...........................................................................19

Situation 1: Talker and listeners in same room without sound system...................................19
Situation 2: Talker and listeners in same room with sound system........................................20
Situation 3: Talker and listeners in different areas.................................................................22
5 Appendix A: Measuring the IR in the presence of background noise......25

6 Appendix B: The impact of the SNR on speech intelligibility...................26
7 References......................................................................................................30
October 2008
Technical Note
1 Speech Intelligibility Basics

1.1 Modulation Transfer Function MTF
The speech intelligibility parameters in Dirac are based on the relation between perceived
speech intelligibility and the intensity modulations in the talkers voice, as described by
Houtgast et al. [1] and [6]. When a sound source in a room is producing noise that is intensity
modulated by a low frequency sinusoidal modulation of 100% depth, the modulation at the
receiver position will be reduced due to room reflections and background noise. The
modulation transfer function (MTF) describes to what extent the modulation m is transferred
from source to receiver, as a function of the modulation frequency F, which ranges from 0.63
to 12.5 Hz. Hence, the MTF depends on the system properties and the background noise (see
figure 1).
reflections
background
noise
direct sound
mouth
simulator
omni-directional
microphone
room under test
intensities
MTF(F)
msource(F)
mreceiver(F)
Figure 1: Relation between speech intelligibility and modulation depth.
October 2008
Technical Note

1.2 Modulation reduction matrix
For each octave frequency band involved, the MTF is determined. High MTF values indicate
a good transfer of the level modulations in a talkers voice at the source position, as perceived
by a listener at the receiver position, and hence a good speech intelligibility. Low MTF values
indicate a significant reduction of the speech intelligibility, due to the acoustical system
properties and/or background noise. The MTF values for the 14 modulation frequencies are
averaged, resulting in the so called modulation transmission index (MTI). The modulation
transmission indices for the 7 octave band frequencies can be processed to arrive at the
Speech Transmission Index STI (see also [1]). The parameters RASTI (STI for room
acoustics), STITEL (STI for telecommunication systems) and STIPA (STI for public address
systems) are obtained in a similar way, but cover fewer frequency bands and/or different
modulation frequencies. Table 1 shows an example of a so called modulation reduction
matrix. In the matrix, the given modulation and octave band frequencies apply directly to STI.
The RASTI and STITEL frequency pairs are projected as well, most F-values laying in
between the STI F-values.
Table 1: Example of STI modulation reduction matrix.
The displayed matrix is for Male speech.
Female Speech parameters exclude the 125 octave frequency band.
RASTI octave bands and modulation frequencies
STITEL octave bands and modulation frequencies
Note that most RASTI and STITEL modulation frequencies are intermediate
between the listed modulation frequencies.
Modulation
frequency
F [Hz]
0.63
0.8
1.0
1.25
1.6
2.0
2.5
3.15
4.0
5
6.3
8.0
10
12.5
MTI
Octave band frequency [Hz]

125
250
500
1k
2k
4k
8k
1.000
1.000
1.000
1.000
0.858
0.858
0.858
0.858
0.858
0.651
0.651
0.506
0.506
0.444
1.000
1.000
1.000
1.000
0.866
0.866
0.866
0.87
0.866
0.676
0.676
0.533
0.533
0.460
1.000
1.000
1.000
1.000
0.806
0.806
0.806
0.806
0.806
0.543
0.543
0.377
0.377
0.267
1.000
1.000
1.000
1.000
0.852
0.852
0.852
0.852
0.852
0.664
0.664
0.531
0.531
0.427
1.000
1.000
1.000
1.000
0.837
0.837
0.837
0.837
0.837
0.630
0.630
0.506
0.506
0.406
1.000
1.000
1.000
1.000
0.841
0.841
0.841
0.841
0.841
0.633
0.633
0.502
0.502
0.413
1.000
1.000
1.000
1.000
0.835
0.835
0.835
0.835
0.835
0.612
0.612
0.471
0.471
0.373
0.75
0.76
0.7
0.75
0.73
0.74
0.73
October 2008
Technical Note

From the modulation reduction matrix, you can obtain information on the reliability of the
results and the cause of the reduction of the speech intelligibility. A constant MTF over F
indicates background noise, a constantly decreasing MTF indicates reverberation and an MTF
first decreasing and then increasing with F indicates an echo. For example, the modulation
reduction matrix of table 1 reflects a case where only reverberation plays a role, while the
influence of background noise and echoes is negligible.
The used octave frequency bands are related to the typical frequency range of a human voice.
A differentiation between typical male and typical female voice spectra is made in IEC
60268-16 [2]. The female voice spectrum model does not include the 125 Hz octave band.
1.3 Measuring MTF: modulated noise versus impulse response

Two commonly used MTF measuring methods are the modulated noise method and the
impulse response method.
The modulated noise method is straightforward. A source produces an excitation signal
basically consisting of 7 x 14 = 98 summed noise signals, each of which is filtered and 100%
modulated according to the matrix in table 1. A receiver picks up the signal at the listener
position and measures for each octave band and modulation frequency the reduced
modulation MTF(F). Due to the randomness of the excitation signal, a measurement takes a
relatively long time to obtain reproducible results (some 30 s on average). Because
background noise may fluctuate, the receiver may misinterpret these fluctuations as signal
modulations, and overestimate the speech intelligibility at low SNR values.
Schroeder [3] has shown that the MTF, hence the STI, can be derived from the impulse
response through the complex MTF, which is the Fourier transform of the squared impulse
response. Rife [4] has used [1] and [3] to include the impact of background noise. If the
impulse response is measured through deconvolution of a deterministic signal, such as an
MLS or sweep signal, the measurement takes much less time than with the modulated noise
method for the same reproducibility (some 5 s on average).
1.4 MTF measurement conditions and limitations

The MTF as a basis for the speech intelligibility also has its limitations. Distortions in the
system under test may affect the MTF (hence the measured speech intelligibility) differently
from the real speech intelligibility. For instance, a recorded voice that is played back at a
slightly higher speed is still very well intelligible, but the measured MTF may drop
significantly. Center clipping (cross-over distortion) may affect the real speech intelligibility
much more severely than the measured MTF. The same holds for signal drop outs (Houtgast,
Steeneken et al. [6]). In general the deviation between real and measured speech intelligibility
will be different for the 2 measuring methods, mentioned in the previous paragraph.
In the IEC 60268-16 standard [2], some conditions are given to avoid problems:
6
October 2008
Technical Note

1. The system under test should not introduce frequency shifts or use frequency
multiplication.
2. The system under test should not contain vocoders, such as LPC, CELP and RELP.
3. The speech transmission should be essentially linear, with amplitude compression or
expansion limited to 1 dB, and no peak clipping.
Obviously, we can add:
4. The system under test should not introduce center clipping.
5. The system under test should not introduce drop outs.
It is therefore important to be aware of any nonlinear behaviour when measuring the speech
intelligibility through a sound system. If the system is behaving linear, the measured and real
speech intelligibilities correlate very well for both methods
October 2008
Technical Note
2 Parameters Related to Speech Intelligibility

2.1 Speech Transmission Index STI
The speech transmission index STI is the most comprehensive and important speech
intelligibility parameter in Dirac. Although not usable for transmission channels that
introduce frequency shifts or frequency multiplication, or include vocoders, the STI takes into
account most effects that could cause deterioration of the speech intelligibility. For details,
refer to [2].
Technically, the STI is calculated as the weighted sum of modulation transfer indices MTI,
one for each octave frequency band from 125 Hz through 8 kHz (where each MTI value is
derived from MTF values over 14 different modulation frequencies, see table 1) taking into
account auditory effects according to IEC 60268-16.
Through different MTI weighting factors, a differentiation has been made between male and
female STI values. The STI male is based on a standard male voice frequency spectrum.
Similarly, the STI female is based on a standard female voice frequency spectrum.
The STI relates to the speech intelligibility according to table 2.
Table 2: Relation between STI and speech intelligibility.
STI
0.00 - 0.30
0.30 - 0.45
0.45 - 0.60
Speech intelligibility
Bad
Poor
Fair
0.60 - 0.75
Good
0.75 - 1.00
Excellent
2.2 Room Acoustics Speech Transmission Index RASTI

The RASTI is a simplified version of the STI, intended to approach the STI under typical
room acoustical conditions. The RASTI was originally developed to reduce the time required
to perform the measurement (using modulated noise) and the time to compute the final result.
In order to obtain correct RASTI values, in addition to the requirements for the STI method
the overall system frequency response must be uniform from the 125 Hz through the 8 kHz
octave band, the background noise must be smooth in time and frequency, the space must be
substantially free of discrete echoes and the reverberation time must not be strongly frequency
dependent. For details, refer to [2].
The RASTI is calculated as the weighted sum of MTIs over the 500 and 2000 Hz octave
bands, where the MTI values are derived from MTF values over 4 and 5 different modulation
frequencies respectively.
In Dirac, where the STI is measured through impulse responses rather than modulated noise,
the RASTI has no advantage over the STI with respect to measurement or computation time.
8
October 2008
Technical Note

Nevertheless, the RASTI requires a smaller measurement system bandwidth than the STI, and
it can be used for survey measurements in most practical room acoustical situations.
2.3 Speech Transmission Index for Telecommunication Systems STITEL

The STITEL is another simplified version of the STI, also meant to reduce measurement and
calculation time. In order for the STITEL values to approach the corresponding STI values, a
number of measurement conditions have to be met, which are typical for telecommunication
systems.
The STITEL uses the same octave bands as the STI, but in each band only 1 octave band
specific modulation frequency is used.
In Dirac, where the STI is measured through impulse responses rather than modulated noise,
the STITEL has no advantage over the STI with respect to measurement or computation time,
but it is meant for comparability with other measured STITEL values.
2.4 Speech Transmission Index for PA Systems STIPA

The STIPA is a simplified version of the STI, intended to emulate STI under conditions,
typical for public address systems. STIPA was developed to reduce the time required to
perform a measurement (using modulated noise), and the time to compute the final result. In
order for STIPA values to approximate corresponding STI values, several measurement
conditions have to be met, which are typical for public address systems.
The STIPA uses the same, partly combined, octave bands as the STI, but in each band only 2
octave band specific modulation frequencies are used.
In DIRAC, where STI is measured through impulse responses rather than modulated noise,
STIPA has no advantages over STI with respect to measurement or computation time, but is
meant for comparisons with other measured STIPA values.
2.5 Percentage Articulation Loss of Consonants % ALC

The % ALC (also called % ALcons) is originally based on the reception of words by listeners.
In Dirac, the % ALC is derived from the STI through a widely used approximation formula
by Farrel Becker1 [6]:
% ALC = 170.5405 e-5.419(STI)
The editor of [6] notes on p.81: This equation, referred to as Farrel Becker equation, is often used to relate STI to ALcons
scores. It appears that the source of this equation is not documented in open literature. However, a remarkable
correspondence is observed with the empirical data reported (in a figure rather than as an equation) by Houtgast et al. [1]. It
seems reasonable to assume that the equation was either obtained through similar experiments, or derived from the data
reported by Houtgast et.al.
October 2008
Technical Note

The same formula is used with RASTI and STITEL. The % ALC values normally range from
0 (corresponding to an excellent speech intelligibility) to 100 (corresponding to an extremely
bad speech intelligibility), but the % ALC value calculated from the above-mentioned
approximation formula, will exceed 100 at a very low STI.
The % ALC in Dirac is mainly meant for comparability with other calculated or measured %
ALC values.
2.6 Signal to Noise Ratio SNR

The SNR is defined as the logarithmic ratio of the signal level and the noise level, and is
therefore related to signals rather than systems. In Dirac, the SNR is meaningful only if 3
conditions are met:
1. The SNR is obtained using any but the External Noise measuring method.
2. The measurement system noise is negligible compared to the measured background
noise. This is normally the case when a loopback 2 impulse response, measured under
condition 1, does not show clearly visible noise or spikes.
3. With MLS or sweep, the Pre-Average is set to 1 and the system under test is timeinvariant. For instance, during a measurement in a room, temperature changes, air
movements or persons walking around should be avoided.
2.7 Early Decay Time EDT

Because the EDT relates more than the other reverberation parameters to the initial and
highest level part of the decaying energy, it also relates most to modulation reduction. The
EDT is derived from the decay curve section between 0 dB and 10 dB below the initial level.
From the corresponding slope, the EDT is calculated as the time to reach 60 dB.
Using internal generator: sound device output connected to sound device input. Using external generator: generator output
connected to sound device input.
10
October 2008
Technical Note
3 Measuring the Speech Intelligibility

3.1 Measurement techniques
There are two factors that determine the speech intelligibility. The background noise, or rather
the signal to noise ratio SNR, and the acoustics of the room (e.g. the sound reflections in the
room). Dirac derives the acoustics of a room from a measured impulse response. Rife [4] has
found that (under specific conditions) the SNR can also be calculated from the measured
impulse response. This means that both factors influencing the speech intelligibility can be
measured simultaneously.
However, it is also possible to measure both factors independently. In fact, measuring the
room acoustics using an impulse response without the presence of (significant) background
noise gives much better results, as you can use pre-averaging, longer sequence lengths etc. to
improve the INR of the impulse response. Also, it is not always possible or even desirable to
measure the impulse response in the presence of background noise.
The preferred method of measuring the speech intelligibility using Dirac is through separate
measurements of the impulse response and the SNR. These can then be combined when
calculating the speech intelligibility. This also makes it possible to use a single impulse
response measurement to study different scenarios with different SNR values.
Despite the obvious advantages of using separate measurements, it may still be necessary to
use a single measurement with Rife's method for calculating the SNR. Appendix A
summarizes this combined measurement procedure.
In Appendix B the impact of the SNR on speech intelligiblity is detailed, which leads to the
important 15 dB SNR criterion:
The background noise is negligible if the SNR exceeds 15 dB in each relevant octave
frequency band and the STI does not exceed 0.8.
In practice, STI values rarely exceed 0.8 and most likely, only a few SNR values, if any, will
come close to 15 dB. Therefore, in most practical cases it will be sufficient to meet only the
SNR > 15 dB condition.
3.2 Hardware Components

Mouth-directional loudspeaker sound sources
Mouth-directional loudspeaker sound sources, in short mouth simulators, have a directivity
similar to a human mouth. This directivity is relevant for the speech intelligibility, in that
speech intelligibility is highest on the axis of the source. An artificial mouth with directivity
October 2008
11
Technical Note

characteristics according to ITU-T P.51 [5] is preferred. A small high quality loudspeaker
with diameter not exceeding 100 mm is also usable.
A mouth simulator is normally used in the situation of an unamplified talker or in a situation
with a sound system equipped with a close-talking microphone. It is driven by a power
amplifier having an output power in the order of magnitude of 10 W.
External signal generators

MLS or sweep generator: digital sound recording device
With External MLS or Sweep measurements, you have to use a generator that produces the
same periodic MLS or sweep signal as Dirac does in the Internal MLS or Sweep mode. Slight
timing difference between the generator and the sound device that would normally cause the
INR to decrease substantially, are compensated for by Dirac automatically [7].
A convenient generator is either a CD or MP3 player [9]. Dirac comes with an audio CD
containing a variety of stimuli that can be used for open-loop measurements. The Dirac
product CDROM contains a number of MP3 files that can be copied to any MP3 player and
used for this same purpose.
Noise generator
With the External Noise measuring method, you can use any broadband random signal as
excitation signal. This may be useful for speech intelligibility measurements in occupied
rooms, where music may be more convenient than noise. However care must be taken that all
relevant frequency components are sufficiently represented in the broadband signal.
3.3 Measuring the SNR

To measure the SNR, you can use Dirac or a handheld sound level meter. Sometimes the SNR
does not need to be measured because the sound system is specified to deliver a given
minimal SNR. In other cases, only the noise levels need to be measured because the signal
from the PA system is set to a fixed level.
To measure the SNR with Dirac (or a handheld meter), you would typically first measure the
signal levels, in all relevant octave bands, for a speech shaped signal. This gives you the S+N
values. Next, a measurement without the signal gives you the N values. The SNR can then be
calculated for each octave band as SNR = ((S+N)N)/N.
The test signals used for SNR measurements should have a spectrum corresponding to
average male or female speech. For STI, STITEL and STIPA measurements the octave bands
should have the following levels (relative to a long term A-weighted level of 0 dB):
12
October 2008
Technical Note

Table 3: Relative octave band levels [dB].
Octave band [Hz]
125
250
Male
2.9
2.9
Female
-5.3
500
1000
2000
4000
8000
-0.8
-1.9
-6.8
-9.1
-12.8
-15.8
-18.8
-16.7
-24.8
-18.0
For RASTI measurements, where only 2 octave bands are relevant, the relative levels are as
follows:
Table 4: Relative octave band levels for the RASTI test signal.
Octave band [Hz]
500
2000
Level [dB]
-1.0
-10.0
Signals with the correct spectrum (Male, Female or RASTI) can be generated by Dirac, and
are also available as MP3 files on the Dirac product CD.
Note that when such a signal is played through a mouth simulator, the spectrum may change
due to the response characteristics of the mouth simulator. In this case it may be necessary to
use an equaliser to compensate. This can be done by playing a pink noise signal through the
equalizer and measuring the Leq values in the relevant octave bands of the sound emitted
through the mouth simulator. For a flat response, all values measured Leq values should be
equal.
When the signal is injected directly in an existing PA system, no equalisation is necessary, as
the response of the PA system always influences the speech signal, and is part of the system
that needs to be measured.
3.4 Level calibration

In order to measure absolute sound levels in Dirac, an input level calibration must be
performed. The input level calibration requires that the sound device is calibrated. To perform
an input level calibration, follow these steps:
1. Click the
Measure button and select the source and receiver settings:
Source Signal: External Impulse.

Capture Length: exceeding 5 s.
Receiver Type: dual omnidirectional:
2. Connect the omnidirectional microphone to the sound device input you want to use.
3. Insert the microphone into a single tone sound calibrator and switch the calibrator on.
October 2008
13
Technical Note

4. Adjust the recording gains upward, by means of the sliders in the Measurement window
or externally, keeping dark at least 1 red element of the level meter.
Figure 2: Adjusting the received calibration tone level.
5. Click the Start button and wait until the measurement is finished. Make sure that the
calibration signal did not stop before the measurement finished.
6. Check the recorded sine by zooming in. It must be a clearly visible, non clipping sine
wave.
14
October 2008
Technical Note
Figure 3: Checking the recorder calibration signal.
7. From the Setup menu, choose Calibrate Input Level.

8. In the Level Calibration dialog box, fill in the sound calibrator level, given by the
manufacturer, select the channel on which you recorded the calibrator tone, and then click
OK. Now the input level is calibrated for the selected channel.
Figure 4: Entering the sound calibrator level.
Notes
The most recent input level calibration date is displayed in the lower right-hand corner
of the Measurement window.
October 2008
15
Technical Note
You can clear the calibration results by clicking the Clear button. This will also clear
the input calibration notification in the Measurement window.
If you calibrate channel 1, while channel 2 was not yet calibrated since the last
calibration reset, channel 2 will adopt the calibration result of channel 1. This is not as
accurate as calibrating channel 2 separately (which is recommended anyway), but with
2 microphones of the same type, this is probably better than not calibrating channel 2
at all. Channel 1 will never adopt the calibration result of channel 2.
The input level calibration is valid only for the microphone used with the calibration.
The procedure is described for 1 microphone. You can also calibrate 2 microphones
simultaneously, using a sound intensity microphone probe calibrator.
Input level calibration is useful only if the sound device input frequency characteristic
is flat within 1 dB over 88 Hz through 11.3 kHz.
It is recommended to repeat the input level calibration before each speech

intelligibility or background noise measurement session.
You can save the recorded calibration tone for later usage, but this is not necessary.
Make sure the sound device is calibrated before calibrating the input level.
3.5 Combining the Impulse Response with SNR Values

Measured or estimated SNR values can be entered on the speech intelligibility window in
Dirac to evaluate the speech intelligibility parameters under different background noise
conditions. Initially, Dirac will display the SNR values that were calculated from the impulse
response using Rife's method. These SNR values are used in the calculation of the speech
intelligibility parameters.
Note that the SNR values are only accurate when the INR values in the corresponding octave
bands are larger than 15 dB. If this is not the case, the derived speech intelligibility values are
displayed in grey, suggesting that these value may be inaccurate and should be used as an
indication only. Obviously, in this case an attempt should be made to improve the INR of the
measured impulse response by increasing the pre-average count or by increasing the capture
length.
16
October 2008
Technical Note
Figure 5: Speech intelligibility with SNR calculated from impulse response.
When 'Manual SNR' is checked, SNR values can be entered manually, and the speech
intelligibility parameters are updated immediately based on these values.
Figure 6: Speech intelligibility with manual SNR values.
October 2008
17
Technical Note

4
Measurement Situations
4.1 Impulse response measurement procedure

To carry out a speech intelligibility measurement, you can follow the general procedure 3
below to acquire the impulse response.
1. Connect the excitation signal to a mouth simulator at the talker position or to the
sound system input.
2. Connect an omnidirectional microphone at a listener position to sound device input 1.
3. In Dirac, select the source and receiver settings:
Select an excitation signal. Normally the e-Sweep is the stimulus of choice, but in
some situations the MLS may be preferred because it sounds less obtrusive.
Select a capture length exceeding 2 s and the estimated reverberation time. Note
that a longer capture length generally results in a better INR.
Select a Pre-Average value. Higher pre-average values result in a higher INR.
Click the
single omnidirectional receiver button
4. Measure and save the impulse response.

5. If applicable, repeat the measurement with the receiver at different listener positions.
6. Analyse the impulse responses.
At this point it is important to note that the INR in all relevant octave bands of the measured
impulse response should be higher than 15 dB. This is required for Dirac to be able to
calculate the SNR. In most situations this requirement can easily be met by a combination of
techniques such as longer capture lengths, higher pre-average values and higher stimulus
output levels.
Hereafter, for several situations the measurement setup will be described in more detail.
4.2 Situation 1: Talker and listeners in same room without sound system
Figure 7 shows a practical situation with a talker and listeners in a room without sound
reinforcement system.
For other details, such as dual channel measurement, microphone amplifiers, power amplifiers and the use of sound level
meters instead of microphones, refer to the Dirac user manual.
18
October 2008
Technical Note
reflected
sound
direct sound
background
noise
Figure 7: Room without sound system.
The listeners receive direct sound from the talker, reflected sound, such as reverberation or
echoes, and background noise, for instance from an HVAC system. The direct sound
contributes positively to the speech intelligibility. Reflected sound may contribute positively
(e.g. via the front board) or negatively to the speech intelligibility. Background noise
contributes negatively to the speech intelligibility.
In this situation, a mouth simulator positioned at the talker's position should be used. The
measurement can be performed with a closed-loop configuration using an internal MLS or eSweep stimulus.
mouth
simulator
omni-directional
microphone
room under test

Dirac running
on notebook
Figure 8: Closed loop measurement setup in room

without sound system.
4.3 Situation 2: Talker and listeners in same room with sound system
Figure 10 shows a talker and listeners in a room with sound reinforcement system. Primarily,
the listeners receive direct sound from the sound reinforcement system, reflected sound, such
as reverberation or echoes, and background noise, for instance from an HVAC system. There
are also some secondary effects. There is direct sound coming from the talker, which is a low
level delayed version of the primary direct sound arriving at the listeners. Furthermore, the
October 2008
19
Technical Note

sound system microphone not only picks up the direct sound from the talker, but also
reflections from the lectern, fed-back sound and background noise.
reflected
sound
background
noise
direct sound
Figure 9: Room with sound reinforcement system.
In this situation, there are 2 ways to measure the speech intelligibility. To include the sound
system microphone characteristic and all secondary effects, use a mouth simulator.
Figure 10: Closed loop measurement setup with mouth simulator, in room with sound system.
20
October 2008
Technical Note
Otherwise, you can inject the excitation signal directly into the sound system.
Figure 11: Closed loop measurement setup using direct injection, in room with sound system.
4.4 Situation 3: Talker and listeners in different areas

Figure 12 shows a talker and listeners in different areas connected through a long distance
sound system. Primarily, the listeners receive direct sound from a nearby loudspeaker array,
delayed direct sound from a loudspeaker array further away, reflected sound, such as
reverberation or echoes, and background noise, for instance from the environment. There are
also some secondary effects. The sound system microphone not only picks up the direct sound
from the talker, but also reflections and background noise from within the talkers room.
In the proposed open loop measurement technique, a CD or MP3 player is used to play the
stimulus. This is the same stimulus that would be generated by Dirac in a closed loop
measurement setup. (A range of stimuli for this purpose can be found on the audio CD
accompanying Dirac). Dirac automatically compensates for clock rate differences between the
playback device and the sound device connected to the notebook PC.
October 2008
21
Technical Note

first
arriving
sound
background
noise
delayed
sound
Figure 12: Separate talker and listener areas with long distance sound system.
Also in this situation, there are 2 ways to measure the speech intelligibility. To include the
sound system microphone characteristic and all secondary effects, use a mouth simulator.
Figure 13: Open loop measurement setup with mouth simulator.
Otherwise, you can inject the excitation signal directly into the sound system.
22
October 2008
Technical Note
Figure 14: Open loop measurement setup, using direct injection.
October 2008
23
Technical Note
5 Appendix A: Measuring the IR in the presence of background

noise.
Measuring an impulse response in the presence of significant background noise, to determine
the speech intelligibility, can be difficult. Dirac demands an INR larger than 15 dB to be able
to reliably calculate the SNR. At the same time, a low SNR value during the measurement is
detrimental to the quality of the impulse response and hence the INR. To add to these
difficulties, the method devised by Rife to calculate the SNR from the impulse response does
not give the correct results when the SNR is increased artificially through pre-averaging.
1. Calibrate the sound device
2. Calibrate the input level
3. Use an equalizer and a pink noise signal to flatten the response of a mouth simulator
when applicable.
4. Choose an MLS or Lin-Sweep stimulus.
5. Select the source filter depending on the parameter that needs to be calculated:
Table 5: Output filter versus speech intelligibility parameter.
Speech intelligibility parameter to be measured
Output filter
STI Male or corresponding % ALC value
Male
STI Female or corresponding % ALC value
Female
RASTI or corresponding % ALC value
RASTI
STIPA for male voice or corresponding % ALC value
Male
STIPA for female voice or corresponding % ALC value
Female
6. Set the output level to a practical value or 60 dB(A).

7. Measure the IR with a pre-average value of 1.
Note that it is possible to use either internally or externally generated stimuli. When using an
external stimulus, the stimulus type, source filter and sequence length of this external signal
need to conform to the settings in the measurement dialog. You can find a number of speech
filtered MLS and Lin-Sweep stimuli on the Dirac CD.
If little or no background noise is present during the measurement, then the impulse response
acquired through the above method can also be used to add synthetic noise, or to mix with a
file containing a measured noise signal.
24
October 2008
Technical Note
6 Appendix B: The impact of the SNR on speech intelligibility

If the background noise level is negligible, there is no need to measure the SNR and
incorporate it in the speech intelligibility calculations. Therefore it may be worthwhile to take
a closer look at the impact of the SNR on the speech intelligibility in practical situations, and
formulate the exact condition under which the SNR is relevant.
The background noise is assumed to be negligible if its presence results in a STI decrease of
less than 5% of the STI without background noise. To figure out what the consequence is for
the allowed SNR, we write the MTF as a product of 2 modulation reduction factors, m0(k,F)
due to system properties (reverberation, echoes) and mSNR(k,F) due to background noise:
MTF (k , F ) = m0 ( k , F ) m SNR (k ) =
m0 ( k , F )
1 + 10
SNR ( k )
10
(1)
where k is the octave number. By definition, the so-called effective SNR, SNR eff(k,F) relates
to MTF(k,F) as does the SNR to mSNR:
MTF (k , F )
SNReff (k , F ) = 10 log
1 MTF (k , F )
Hence, SNReff equals SNR if the modulation is reduced by background noise only. For each k
from a set of octave frequency bands, and each F from a set of modulation frequencies,
SNReff(k,F) is calculated, and clipped to 15 dB, before being further processed to calculate
the STI. The clipping operation reflects that SNReff values exceeding 15 dB cannot have any
negative impact on the speech intelligibility, while SNReff values lower than 15 dB cannot
have any positive impact on the speech intelligibility. The clipped SNReff values are converted
to transmission indices TI(k,F) that range from 0 to 1, and each contribute to the STI.
TI ( k , F ) =
SNR(k , F ) + 15
30
For the final STI, the TI values are averaged over the modulation and octave band frequencies
in a special way, thereby taking into account auditory masking and the absolute hearing
threshold. However, to get an idea of the impact of the SNR on the STI, it is sufficient to
evaluate TI for several values of m0 and SNR. Figure 15 shows the relative change of TI when
going from a situation with SNR = to a situation with the given finite SNR:
TI =
TI SNR TI
100%
TI
October 2008
25
Technical Note

The relative changes in TI would also hold for the STI in case of equal weighting factors for
each octave frequency band. Actually, the SNR values of the octave bands from 500 Hz
through 4 kHz are the most significant.
From both parts of figure 15 it can be seen that as m0 increases from 0.5, TI changes faster
with m0 (hence with SNR), until SNReff clips to 15 dB at TI = 1, resulting in a dip at m0 =
0.97. A much steeper, yet practically irrelevant dip at m0 = 0.03 is not shown.
Using figure 15 and table 6, in which relevant octave frequency bands are defined, we now
define the 15dB SNR criterion:
The background noise is negligible if the SNR exceeds 15 dB in each relevant octave
frequency band and the STI does not exceed 0.8.
26
October 2008
Technical Note
Transmission Index without background noise
1.00
0.90
0.80
0.70
0.60
0.50
0.40
0.30
0.20
0.10
0.00
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.6
0.7
0.8
0.9
1.0
mo
0.0
0
0.1
0.2
0.3
0.4
0.5
0.8
0.9
1.0
24 dB
21 dB
18 dB
-2
15 dB
Relative change of TI [%]
-4
12 dB
-6
-8
-10
SNR=9 dB
-12
-14
Figure 15: Impact of SNR on transmission index TI.
October 2008
27
Technical Note
Table 6: Relevant octave frequency bands.

Octave frequency band
Source excitation signal
[Hz]
Male
Female
125
250
500
1k
2k
4k
8k
RASTI
If only the condition of SNR exceeding 15 dB is met, this could theoretically lead to an
underestimation of the STI of 10%, namely if the SNR is close to 15 dB for all relevant
octave frequency bands and the STI without background noise would be 0.97, resulting in a
measured STI of 0.9. In practice however, STI values rarely exceed 0.8 and most likely, only
a few SNR values, if any, will come close to 15 dB. Therefore, in most practical cases it will
be sufficient to meet only the SNR > 15 dB condition.
28
October 2008
Technical Note
7 References
[1] T. Houtgast, H.J.M. Steeneken and R. Plomp, Predicting Speech Intelligibility in Rooms from the
Modulation Transfer Function. I. General Room Acoustics, Acustica 46, 60 72 (1980).
[2] IEC 60268-16 (2003) Sound system equipment. Part 16: Objective rating of speech intelligibility
by speech transmission index.
[3]
M.R. Schroeder, Modulation Transfer Functions: Definition and Measurement, Acustica 49, 179
182 (1981).
[4]
D.D. Rife, Modulation Transfer Function Measurement with Maximum-Length Sequences, J.

Audio Eng. Soc. 40, 779 790 (1992).
[5]
ITU-T Recommendation P.51 (03/93), Telephone Transmission Quality Objective Measuring

Apparatus - Artificial Mouth
[6] T. Houtgast, H.J.M. Steeneken et al., Past, present and future of the Speech Transmission
Index, TNO Human Factors, Soesterberg, The Netherlands, 2002, ISBN 90-76702-02-0.
[7]
C.C.J.M. Hak and J.P.M. Hak, Effect of Stimulus Speed Error on Measured Room Acoustic
Parameters, Proceedings ICA 2007.
[8]
C.C.J.M. Hak and J.S. Vertegaal, MP3 Stimuli in Room Acoustics, Proceedings ICA 2007.
[9]
A New DIRAC, Measuring Speech Intelligibility. Brel & Kjr Magazine No. 1, 2008, p26-27.
October 2008
29
Acoustics Engineering develops systems for the prediction and measurement of

acoustical parameters, resulting in user-friendly tools that enable you to perform fast and
accurate acoustical measurements and calculations.
For information on our products, please contact

Acoustics Engineering
Email:
info@acoustics-engineering.com
Phone/Fax:
+31 485 520996
Mail:
Acoustics Engineering
Groenling 43-45
5831 MZ Boxmeer
The Netherlands
Website:
Brel & Kjr is the sole worldwide distributor of DIRAC. For information on DIRAC, please
contact your local B&K representative or the B&K headquarters in Denmark:
Brel & Kjr
Email:
info@bksv.com
Phone:
Fax:
+45 45 80 05 00
+45 45 80 14 05
Mail:
Brel & Kjr A/S

Skodsborgvej 307
DK-2850 Nrum
Denmark
Website:
www.bksv.com
Copyright Acoustics Engineering 2008

All rights reserved. No part of this document may be reproduced or transmitted in any form or by any
means, electronic or mechanical, without the prior written permission of Acoustics Engineering.

Measuring Speech Intelegibility Using Dirac

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Measuring Speech Intelegibility Using Dirac

Uploaded by

Copyright:

Available Formats

TN002

Copyright 2008 Acoustics Engineering

TN002 Measuring Speech Intelligibility Using DIRAC

This page intentionally left blank.

Copyright 2008 Acoustics Engineering

TN002 Measuring Speech Intelligibility Using DIRAC

Modulation Transfer Function MTF..........................................................................................5

2 Parameters Related to Speech Intelligibility ................................................9

Speech Transmission Index STI..............................................................................................9

3 Measuring the Speech Intelligibility............................................................12

Impulse response measurement procedure...........................................................................19

5 Appendix A: Measuring the IR in the presence of background noise......25

Copyright 2008 Acoustics Engineering

TN002 Measuring Speech Intelligibility Using DIRAC

1 Speech Intelligibility Basics

Figure 1: Relation between speech intelligibility and modulation depth.

Copyright 2008 Acoustics Engineering

TN002 Measuring Speech Intelligibility Using DIRAC

Octave band frequency [Hz]

Copyright 2008 Acoustics Engineering

TN002 Measuring Speech Intelligibility Using DIRAC

1.3 Measuring MTF: modulated noise versus impulse response

1.4 MTF measurement conditions and limitations

Copyright 2008 Acoustics Engineering

TN002 Measuring Speech Intelligibility Using DIRAC

Copyright 2008 Acoustics Engineering

TN002 Measuring Speech Intelligibility Using DIRAC

2 Parameters Related to Speech Intelligibility

2.2 Room Acoustics Speech Transmission Index RASTI

Copyright 2008 Acoustics Engineering

TN002 Measuring Speech Intelligibility Using DIRAC

2.3 Speech Transmission Index for Telecommunication Systems STITEL

2.4 Speech Transmission Index for PA Systems STIPA

2.5 Percentage Articulation Loss of Consonants % ALC

Copyright 2008 Acoustics Engineering

TN002 Measuring Speech Intelligibility Using DIRAC

2.6 Signal to Noise Ratio SNR

2.7 Early Decay Time EDT

Copyright 2008 Acoustics Engineering

TN002 Measuring Speech Intelligibility Using DIRAC

3 Measuring the Speech Intelligibility

3.2 Hardware Components

TN002 Measuring Speech Intelligibility Using DIRAC

External signal generators

3.3 Measuring the SNR

Copyright 2008 Acoustics Engineering

TN002 Measuring Speech Intelligibility Using DIRAC

3.4 Level calibration

Measure button and select the source and receiver settings:

Source Signal: External Impulse.

Copyright 2008 Acoustics Engineering

TN002 Measuring Speech Intelligibility Using DIRAC

Figure 2: Adjusting the received calibration tone level.

Copyright 2008 Acoustics Engineering

TN002 Measuring Speech Intelligibility Using DIRAC

Figure 3: Checking the recorder calibration signal.

7. From the Setup menu, choose Calibrate Input Level.

Figure 4: Entering the sound calibrator level.

Copyright 2008 Acoustics Engineering

TN002 Measuring Speech Intelligibility Using DIRAC

It is recommended to repeat the input level calibration before each speech

3.5 Combining the Impulse Response with SNR Values

Copyright 2008 Acoustics Engineering

TN002 Measuring Speech Intelligibility Using DIRAC