SwissQual License AG
Allmendweg 8 CH-4528 Zuchwil Switzerland
t +41 32 686 65 65 f +41 32 686 65 66 e info@swissqual.com
www.swissqual.com
SwissQual has made every effort to ensure that any instructions contained in the document are adequate and free
of errors and omissions. SwissQual will, if necessary, explain issues which may not be covered by the documents.
SwissQual's liability for any errors in the documents is limited to the correction of errors and the aforementioned advisory
services.
Copyright 2000 - 2012 SwissQual AG. All rights reserved.
No part of this publication may be copied, distributed, transmitted, transcribed, stored in a retrieval system, or translated
into any human or computer language without the prior written permission of SwissQual AG.
Confidential materials.
All information in this document is regarded as commercially valuable, protected and privileged intellectual property, and is
provided under the terms of existing Non-Disclosure Agreements or as commercial-in-confidence material.
When you refer to a SwissQual technology or product, you must acknowledge the respective text or logo trademark
somewhere in your text.
SwissQual, Seven.Five, SQuad, QualiPoc, NetQual, VQuad, Diversity as well as the following logos are
registered trademarks of SwissQual AG.
Diversity Explorer, Diversity Ranger, Diversity Unattended, NiNA+, NiNA, NQAgent, NQComm, NQDI,
NQTM, NQView, NQWeb, QPControl, QPView, QualiPoc Freerider, QualiPoc iQ, QualiPoc Mobile,
QualiPoc Static, QualiWatch-M, QualiWatch-S, SystemInspector, TestManager, VMon, VQuad-HD are
trademarks of SwissQual AG.
SwissQual acknowledges the following trademarks for company names and products:
Adobe, Adobe Acrobat, and Adobe Postscript are trademarks of Adobe Systems Incorporated.
Apple is a trademark of Apple Computer, Inc.
DIMENSION, LATITUDE, and OPTIPLEX are registered trademarks of Dell Inc.
ELEKTROBIT is a registered trademark of Elektrobit Group Plc.
Google is a registered trademark of Google Inc.
Intel, Intel Itanium, Intel Pentium, and Intel Xeon are trademarks or registered trademarks of Intel Corporation.
INTERNET EXPLORER, SMARTPHONE, TABLET are registered trademarks of Microsoft Corporation.
Java is a U.S. trademark of Sun Microsystems, Inc.
Linux is a registered trademark of Linus Torvalds.
Microsoft, Microsoft Windows, Microsoft Windows NT, and Windows Vista are either registered trademarks or
trademarks of Microsoft Corporation in the United States and/or other countries.
NOKIA is a registered trademark of Nokia Corporation.
Oracle is a registered US trademark of Oracle Corporation, Redwood City, California.
SAMSUNG is a registered trademark of Samsung Corporation.
SIERRA WIRELESS is a registered trademark of Sierra Wireless, Inc.
TRIMBLE is a registered trademark of Trimble Navigation Limited.
U-BLOX is a registered trademark of u-blox Holding AG.
UNIX is a registered trademark of The Open Group.
Contents
Introduction
Listening Quality
Introduction
The Definition of Listening Quality
Subjective and Objective Quality assessment
Assessment of Intrusive-/Non-Intrusive Calls
Figures
Figure 2-1 Subjective versus objective quality assessment
Figure 3-1 NiNA+ Listening Quality values for noise-free speech transmissions
Figure 3-2 NiNA+ Listening Quality values in GSM connections using real handsets
Figure 3-3 Example of NiNA+ measurements shown in NQDI
Figure 3-4 Average NiNA+ results
Figure 3-5 Signal Envelope [dB] (Received Speech Signal)
Figure 3-6 Time Domain Chart (Received Speech Signal)
Tables
Table 3-1 Correlation coefficients between MOS values obtained in auditory tests and scores of NiNA+
Table 3-2 Typical MOS values of auditory tests and NiNA+
Introduction
This document describes the technical background, the application scenarios, and the parameters measured by
the single-ended NiNA+ voice quality measurement. The measurements were made with the SwissQual QoS
Measurement System; the screenshots are taken from the SwissQual post-processing system NQDI.
NiNA+ assesses the quality of a signal transmitted via a telecommunications network without knowledge of the
originally transmitted signal. The speech quality is determined from the output signal alone. SwissQual's NiNA+
solution can rate any connection in which a self-answering far-end side plays back human speech (e.g. a weather
forecast or similar). Since NiNA+ can be applied on the mobile unit, the radio link forms part of the tested
connection. NiNA+ can also rate any fixed-line connection, including Voice over IP.
Furthermore, the NiNA+ method is not restricted to end-to-end measurements; it can be used at any location
in the transmission chain, for example for quality monitoring at any electrical measuring point within a real
established voice link (e.g. in a VoIP gateway or at an E1/T1 interface). The calculated score reflects the
speech quality from the perspective of an end user listening through a conventionally shaped handset at this
measuring point.
Listening Quality
Introduction
For network operators and equipment manufacturers, it is important to know where and why speech quality is
degraded. Since listening quality is a major factor determining customer satisfaction, encoding techniques
must be designed for optimal speech quality. Large-scale auditory tests are commonly employed to assess the
quality of speech encoding techniques. However, results obtained in this way are very difficult to reproduce,
and they depend on the level of motivation of the individual test candidates. It is therefore a big advantage
to have an automated method that physically measures speech quality parameters and produces results that
correlate as closely as possible with subjectively acquired results.
Listening quality is a vague term compared with bit rate, echo or loudness. Since customer satisfaction is
determined directly by the quality of the transmitted speech, encoding techniques must be selected and
optimized based on their listening quality.
Listening Quality: covers the listening situation between the two calling parties, where one party is
talking and the other party is listening (not active).
Talking Quality: the quality perceived by the talker during their own speech activity (mainly influenced by
echoes and side tones).
Conversational Quality: the overall quality perceived in a human conversation. It combines Listening and
Talking Quality with signal delay and double-talk interference.
Detailed definitions of these dimensions and test scenarios for auditory tests can be found in the ITU-T P.800
series.
This procedure is, however, very time-consuming and therefore expensive.
Objective assessment: An automated speech quality assessment method that either
evaluates and rates the received signal against the known reference (double-ended, intrusive method;
requires a test call), or
evaluates and rates the received signal alone (single-ended method; this might be a test call to an
answering machine or even live monitoring).
The basic relationship between subjective/objective assessment and double-ended/single-ended methods is shown
in Figure 2-1.
Figure 2-1 Subjective versus objective quality assessment (human listener with experience, expectation and
semantic; methods requiring a reference signal; methods requiring no reference; each producing a quality rating)
Intrusive and double-ended: Both ends of the connection are under control and a defined audio signal
is transmitted over this test connection.
Intrusive and single-ended: A test connection is established to any answering station that plays back
a voice signal (e.g. a weather forecast). Here the same model is applied as for non-intrusive
in-service monitoring.
Intrusive and double-ended Speech Quality Assessment: Here the methods that require a known reference
signal are normally applied. Both ends of the connection are under control and a pre-defined voice signal
is transmitted.
This approach generally has the disadvantage that it requires intervention in the network under test.
To determine the signal quality, at least one transmission channel must be occupied by the reference signal,
and this channel cannot be used for data transfer during that time. In a broadcasting system such as a radio
service, it is in principle possible to use the signal source for transmitting test signals, but since all
channels would then be occupied and the test signal would be transmitted to all receivers, this procedure
is extremely impractical. Intrusive procedures are likewise unsuitable for simultaneously monitoring the
quality of a large number of transmission channels.
The advantage of the double-ended method is that the input (reference) signal is known, which allows a very
accurate and detailed analysis of voice quality impairments. Each change in the signal during its transmission
can be detected and its impact on perceived quality assessed by applying psycho-acoustic models. Such models
are well suited to optimization processes in laboratories as well as in real networks. They are able to
predict even minimal degradations of the signal and can be applied to compare different or similar
transmission scenarios.
Non-intrusive and single-ended Speech Quality Assessment: Models that assess speech quality without a
pre-defined reference speech signal having to be transmitted are often called non-intrusive or single-ended
models. These models analyse the transmitted and possibly distorted speech without any possibility of
comparing it with a separate input or known reference signal.
Single-ended models often look for pre-defined distortions by applying conventional signal analysis
methods, i.e. they look for background noise, interruptions, frame repeats and so on. More advanced
solutions try to reconstruct a reference speech signal from the distorted one and then apply psycho-acoustic
comparison methods similar to those of the intrusive, double-ended methods.
The accuracy of a single-ended approach is lower than that of an intrusive, double-ended approach. However,
due to the advanced integrated speech extraction and the psycho-acoustic calculations, the single-ended
approach is now accurate enough to be applied in real environments.
A non-intrusive, single-ended algorithm has two basic applications, namely:
In-Service Monitoring: Here the speech signal of a real conversation is assessed. This can be done
with a terminal or, perhaps more efficiently, at the PBX side on an E1/T1 link or even in a VoIP gateway.
The advantages are two-fold:
the ability to collect a large amount of measurement data without allocating network resources, and
a more realistic overview of the speech quality as perceived by the subscribers, because the impact on
speech quality from the sending side (e.g. background noise) is included in the measurement and the
end result.
NiNA+ is connected at an electrical interface, so the real acoustical environment of the listener cannot be
measured. Instead, a modelled handset is applied to the signal to act as an intermediate receiving
function.
For the network operator, the quality monitoring scenario can be used as a powerful quality reporting tool.
Further applications beyond pure quality reporting are possible, such as quality-based routing or
quality-based billing.
Intrusive and single-ended Speech Quality Assessment: Here a test connection has to be established between
both ends, but the far-end side is not required to play back a pre-defined signal. This is an advantage because
there is no need to install a dedicated answering station. The model works with any speech signal from the
far end, for example public numbers such as the weather forecast or the time service. This is very helpful for
monitoring multi-link connections, especially to other providers or other countries. A test system has to be
installed only at the listening side. Furthermore, network providers can monitor their own voice-based
announcement services for possible impairments or accessibility.
NiNA+ is SwissQual's solution for predicting MOS-LQO with a single-ended approach. It includes signal
pre-processing and calculates additional parameters such as causes of quality degradation and noise and
speech levels. As a stand-alone solution, NiNA+ is a complete suite for non-intrusive listening quality
assessment.
SwissQual's NiNA+ solution runs on the Windows 32-bit platform. It requires only a speech signal with 8000 Hz
sampling frequency as input. Because of SwissQual's consistent run-time optimization, the complete calculation
takes only 0.25% of the speech sample duration on a state-of-the-art Pentium 4 processor (2.6 GHz). For
comparison, it runs nearly 100 times faster than ITU-T P.563 and more than 20 times faster than SwissQual's
speed-optimized solution for P.563.
This low complexity makes NiNA+ an ideal component for low-performance platforms such as mobile phone
operating systems and digital signal processors.
Furthermore, the NiNA+ method places some requirements on the speech signal to be assessed in order to avoid
false predictions or malfunctions.
Sampling Frequency:
The sampling frequency has to be 8000 Hz and a linearly quantized PCM signal (16 bit) is required. The
conversion from other formats is not part of the algorithm itself and has to be done separately. SwissQual's
QoS measurement systems perform this conversion automatically, so no further work is required from the
customer.
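As an illustration of the input requirement above, the following minimal sketch checks whether a WAV file already has the required format (8000 Hz, 16-bit linear PCM). It is not part of NiNA+ or of SwissQual's systems; the function names `check_wav_format` and `make_test_wav` are hypothetical and only Python's standard `wave` module is used.

```python
import io
import wave

REQUIRED_RATE_HZ = 8000    # sampling frequency stated in the text
REQUIRED_WIDTH_BYTES = 2   # 16-bit linear PCM

def check_wav_format(source):
    """Return a list of format problems for a WAV path or file-like object."""
    problems = []
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
    if rate != REQUIRED_RATE_HZ:
        problems.append(f"sampling frequency is {rate} Hz, expected {REQUIRED_RATE_HZ} Hz")
    if width != REQUIRED_WIDTH_BYTES:
        problems.append(f"sample width is {8 * width} bit, expected 16 bit")
    return problems

def make_test_wav(rate_hz, width_bytes=2, seconds=1):
    """Build a silent mono WAV in memory (helper for trying out the check)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(width_bytes)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00" * (rate_hz * width_bytes * seconds))
    buffer.seek(0)
    return buffer
```

A file that fails the check would have to be converted before assessment; the conversion itself is outside the scope of this sketch.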
Speech Sample Length:
A sample length between 5 and 20 seconds is recommended. The signal length is checked by SwissQual's QoS
system. Sample lengths below 5 seconds are not accepted. Sample lengths above 20 seconds result in a warning
message, and the sample is truncated at 20 seconds. It is recommended that the speech activity be at least
25% and more than three seconds, and that it not exceed 90% (especially for short samples).
Minimum Speech Activity:
The main requirement is the minimum amount of active speech in the file. To obtain accurate results, the
speech signal should contain at least 3 seconds of active speech. Otherwise, the processing might lead to
wrong results because the balance between voiced and unvoiced sections is no longer given. Even for
auditory tests with human listeners, a minimum speech activity of 4 seconds is recommended. To avoid a
malfunction, the configuration of the measurement probe does not allow speech sample lengths below
5 seconds to be defined. Nevertheless, the active speech might still fall below the minimum speech activity.
Consequently, SwissQual's QoS system is configured not to process speech samples with less than 3 seconds of
active speech; instead, a warning message is displayed.
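The length and speech activity rules stated above can be sketched as a simple pre-check. This is only an illustration under assumptions: the function name `validate_sample` is hypothetical, the amount of active speech is taken as a given input (no voice activity detection is shown), and it is assumed to refer to the part of the sample that is actually analysed.

```python
MIN_LENGTH_S = 5.0          # shorter samples are not accepted
MAX_LENGTH_S = 20.0         # longer samples are truncated with a warning
MIN_ACTIVE_SPEECH_S = 3.0   # below this the sample is not processed
MIN_ACTIVITY = 0.25         # recommended lower bound of speech activity
MAX_ACTIVITY = 0.90         # recommended upper bound of speech activity

def validate_sample(length_s, active_speech_s):
    """Return (accepted, analysed_length_s, warnings) for one speech sample."""
    warnings = []
    if length_s < MIN_LENGTH_S:
        return False, 0.0, ["sample length below 5 s is not accepted"]
    analysed_s = min(length_s, MAX_LENGTH_S)
    if length_s > MAX_LENGTH_S:
        warnings.append("sample length above 20 s, truncated to 20 s")
    if active_speech_s < MIN_ACTIVE_SPEECH_S:
        warnings.append("less than 3 s of active speech, sample not processed")
        return False, analysed_s, warnings
    activity = active_speech_s / analysed_s
    if activity < MIN_ACTIVITY or activity > MAX_ACTIVITY:
        # The 25% to 90% range is a recommendation, so this is only a warning.
        warnings.append("speech activity outside the recommended 25% to 90% range")
    return True, analysed_s, warnings
```

For example, an 8-second sample with 4 seconds of active speech passes without warnings, while a 10-second sample with only 2 seconds of active speech is rejected.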
Speech Level:
NiNA+ accepts active speech levels in the range from -16 dBov down to -45 dBov. Higher levels lead to
annoying clipping of the higher amplitudes. If the high speech level is caused by the network under test, it
should be reflected in the quality score; if the clipping is caused by the measurement interface, however, it
leads to artificial quality impacts.
Likewise, measurements with a low speech level have a reduced signal-to-noise ratio because of the limited
digital resolution of the A/D converter used in the measurement environment. This also leads to
additional quality impacts. SwissQual's QoS system ensures the proper level adjustment for all supported
cellular phones and ISDN/PSTN cards. Only in transparent mode, when arbitrary terminals are used, do
customers have to control the level adjustment themselves. For that reason, speech levels outside
the recommended range are highlighted in red when the results are analysed in SwissQual's NQDI data
interface. Please note that files with a speech level below -65 dBov are not analysed and a warning
message is displayed.
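To make the dBov figures above more concrete, the following sketch computes a level relative to the overload point of 16-bit PCM and compares it with the stated limits. It is a simplification under assumptions: it uses the plain RMS over all samples passed in, whereas NiNA+ refers to the active speech level, and the function names are hypothetical.

```python
import math

FULL_SCALE = 32768.0        # overload point of 16-bit linear PCM
MAX_LEVEL_DBOV = -16.0      # upper end of the accepted range
MIN_LEVEL_DBOV = -45.0      # lower end of the accepted range
REJECT_BELOW_DBOV = -65.0   # files below this level are not analysed

def level_dbov(samples):
    """RMS level of 16-bit samples in dB relative to the overload point."""
    if not samples:
        raise ValueError("no samples given")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_square / FULL_SCALE ** 2)

def classify_level(level):
    """Map a level in dBov to the handling described in the text."""
    if level < REJECT_BELOW_DBOV:
        return "not analysed"
    if level > MAX_LEVEL_DBOV or level < MIN_LEVEL_DBOV:
        return "out of range"
    return "ok"
```

A signal with a constant magnitude of half the full scale yields about -6 dBov and would therefore be classified as out of range.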
Accuracy of predicted Listening Quality:
The accuracy of the NiNA+ model was evaluated using large speech databases covering the complete scope of
today's public switched telephone networks.
The performance against well-known databases from the ITU-T set is shown below. Because of the target
applications of SwissQual's QoS system, a strong focus was placed on outstanding performance in real-life
network connections, such as the test in real GSM connections with handset variations. The numbers
are the correlation coefficients between the MOS values obtained in the auditory tests and the
scores predicted by NiNA+. A third-order mapping was applied before the correlation was calculated.
The results below compare the NiNA+ performance with the current ITU-T standard P.563.
Table 3-1 Correlation coefficients between MOS values obtained in auditory tests and scores of NiNA+
Speech Database                   ITU-T P.563   NiNA+
Suppl. 23, Exp. 1 Am. English     0.902         0.905
0.842
0.918
0.916
0.857
0.929
0.903
0.895
0.925
0.935
Real VoIP
0.950
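The correlation coefficients in Table 3-1 are calculated after a third-order mapping. A minimal sketch of such a calculation is given below, assuming an ordinary least-squares cubic fit with NumPy; the source does not state how the mapping was fitted (for example, whether it was constrained to be monotonic), and the function name is hypothetical.

```python
import numpy as np

def correlation_after_mapping(predicted, subjective, order=3):
    """Pearson correlation between subjective MOS and polynomially mapped scores.

    A polynomial of the given order is fitted from the predicted scores to
    the subjective MOS values; the correlation is then computed between the
    mapped predictions and the subjective values.
    """
    predicted = np.asarray(predicted, dtype=float)
    subjective = np.asarray(subjective, dtype=float)
    coefficients = np.polyfit(predicted, subjective, order)
    mapped = np.polyval(coefficients, predicted)
    return float(np.corrcoef(mapped, subjective)[0, 1])
```

The mapping compensates for systematic offsets between the scale of an individual auditory test and the model scale, so that the coefficient reflects how well the model ranks and spaces the conditions.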
Figure 3-1 NiNA+ Listening Quality values for noise-free speech transmissions (NiNA+ score plotted against a
second axis, both scaled from 1.0 to 5.0)
The database shown in Figure 3-1 is taken from the G.729 characterization phase of ITU-T and covers a
wide range of existing codecs and combinations thereof. The results are given on a so-called per-condition
basis, which means that the results of four samples transmitted through the same application scenario were
averaged.
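The per-condition averaging mentioned above can be sketched as follows. The function name and the representation of the data as (condition, score) pairs are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean

def per_condition_means(scores):
    """Average the scores of all samples that share a test condition.

    scores: iterable of (condition_id, score) pairs, for example four
    speech samples per condition as described in the text.
    """
    grouped = defaultdict(list)
    for condition_id, score in scores:
        grouped[condition_id].append(score)
    return {condition_id: mean(values) for condition_id, values in grouped.items()}
```

The same averaging is applied to the auditory MOS values and to the predicted scores before the two are compared.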
Figure 3-2 NiNA+ Listening Quality values in GSM connections using real handsets (NiNA+ score plotted against
a second axis, both scaled from 1.0 to 5.0)
The database shown in Figure 3-2 is taken from a subjective test performed for
ITU-T within the P.563 competition phase. It was organized by SwissQual in the Deutsche Telekom
Laboratories in Berlin. In contrast to the common ITU-T databases, which use simulated speech files,
this test contains speech recordings made in real GSM circuits. The speech signals were inserted into the
handset microphone using an artificial mouth in different acoustical environments.
Typically, the value of most interest to users is the Listening Quality value obtained by applying NiNA+
(see Figure 2-1). In line with ITU-T Recommendation P.800.1 it is called MOS-LQO, where LQO stands for
Listening Quality Objective. The MOS-LQO is defined in the range 1 to 5, where 1 stands for bad and 5 for
excellent speech quality. In real measurements, the value will scarcely exceed 4.5.
In addition to the MOS-LQO, further analysis can be carried out using the average section shown in
Figure 3-4:
Static SNR in dB
Amplitude Clipping in %
Speech Activity in %
DC Offset in %
Pitch frequency in Hz
Signal Class
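Two of the parameters listed above, DC offset and amplitude clipping, are given in percent. The sketch below shows one plausible way to compute such percentages from 16-bit samples. The exact definitions used by NiNA+ are not given in this document, so both formulas (mean value relative to full scale, and share of samples at the saturation limits) are assumptions.

```python
FULL_SCALE = 32768.0   # overload point of 16-bit linear PCM
INT16_MAX = 32767
INT16_MIN = -32768

def dc_offset_percent(samples):
    """Mean sample value as a percentage of full scale (assumed definition)."""
    return 100.0 * (sum(samples) / len(samples)) / FULL_SCALE

def clipping_percent(samples):
    """Share of samples sitting at the 16-bit limits (assumed definition)."""
    clipped = sum(1 for s in samples if s >= INT16_MAX or s <= INT16_MIN)
    return 100.0 * clipped / len(samples)
```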
The MOS-LQO is the main result of the analysis and summarizes the quality in a single number. To give a
feeling for the results that can be expected, the following table lists results obtained by analyzing
speech coded with typical speech codecs.
Table 3-2 Typical MOS values of auditory tests and NiNA+
Codec               Auditory test   NiNA+
G.711               4.3             4.4
G.729               3.8             3.8
G.728               3.7             3.7
G.726 (32 kbit/s)   3.9             3.8
GSM-FR              3.5             3.2
GSM-EFR             3.9             3.8
Clean speech
Noisy speech
No speech
It is possible to see more than one cause (code) in the average section. There are eight different problem
codes:
Background noise is signalled if the noise level is higher than -50 dB or the static SNR is below 20 dB.
Modulated noise is signalled when the segmental SNR falls below a defined multi-dimensional threshold. It
mainly indicates waveform speech codecs.
Interruptions is set to true if one or more signal interruptions are detected in the speech signal.
Level problem is signalled if the signal level exceeds the nominal level by more than 10 dB or falls more
than 12 dB below the nominal level. The nominal speech level is -26 dBov (dB relative to the digital
overload point).
DC offset problem is shown when the DC offset of the speech signal exceeds the predefined
thresholds of +/- 0.2 %.
Amplitude clipping is shown if saturation of the signal leads to significant distortions.
Restricted audio bandwidth is flagged if a significant limitation relative to the expected telephone
band (300 to 3400 Hz) is detected.
NotSpecified signals that the speech quality is degraded but no outstanding reason for the
degradation could be classified.
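Three of the problem codes above are defined by explicit numeric thresholds and can be sketched as simple rules. The function below is an illustration only: its name and parameters are hypothetical, the measured values are assumed to be available as inputs, and the remaining codes (modulated noise, interruptions, clipping, bandwidth, NotSpecified) are omitted because the text does not give their thresholds.

```python
NOMINAL_LEVEL_DBOV = -26.0   # nominal speech level stated in the text

def problem_codes(noise_level_db, static_snr_db, speech_level_dbov, dc_offset_percent):
    """Return the threshold-based problem codes that apply to one sample."""
    codes = []
    if noise_level_db > -50.0 or static_snr_db < 20.0:
        codes.append("Background noise")
    if (speech_level_dbov > NOMINAL_LEVEL_DBOV + 10.0
            or speech_level_dbov < NOMINAL_LEVEL_DBOV - 12.0):
        codes.append("Level problem")
    if abs(dc_offset_percent) > 0.2:
        codes.append("DC offset")
    return codes
```

Because the rules are independent, more than one code can be returned for the same sample, which matches the remark that several causes may appear in the average section.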
The next step in the analysis is to look at the signal envelope and to listen to the live
recordings.
Analyzing Envelope of Received Signal:
The signal envelope is presented graphically. It gives the experienced user visual information
on amplitude clipping, background noise and interruptions. The locations of interruptions are
marked separately by vertical lines. At the top of each line, the detected length of the interruption is
printed in ms (Figure 3-5).
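How an envelope in dBov and the length of an interruption might be derived from the samples can be sketched as follows. This is not the NiNA+ detection method, which is not described in this document; the frame length, the -70 dBov threshold and the minimum gap length are assumptions chosen for illustration, and only gaps that have signal on both sides are reported.

```python
import math

FULL_SCALE = 32768.0  # overload point of 16-bit linear PCM

def envelope_dbov(samples, rate_hz=8000, frame_ms=10, floor_db=-90.0):
    """Frame-wise RMS level in dBov, limited to floor_db at the bottom."""
    frame_len = rate_hz * frame_ms // 1000
    envelope = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_square = sum(s * s for s in frame) / frame_len
        if mean_square > 0:
            level = 10.0 * math.log10(mean_square / FULL_SCALE ** 2)
        else:
            level = floor_db
        envelope.append(max(level, floor_db))
    return envelope

def find_interruptions(envelope, frame_ms=10, threshold_db=-70.0, min_ms=20):
    """Return (start_ms, length_ms) for gaps below threshold_db inside the signal."""
    gaps = []
    run_start = None
    for index, level in enumerate(envelope):
        if level < threshold_db:
            if run_start is None:
                run_start = index
        else:
            # A gap counts only if signal was present before it started.
            if run_start is not None and run_start > 0:
                length_ms = (index - run_start) * frame_ms
                if length_ms >= min_ms:
                    gaps.append((run_start * frame_ms, length_ms))
            run_start = None
    return gaps
```

Applied to a signal with 80 ms of silence between two 100 ms tone bursts, the sketch reports a single gap starting at 100 ms with a length of 80 ms.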
Figure 3-5 Signal Envelope [dB] (Received Speech Signal): envelope in dBov (0 to -90) over time in seconds
(0.00 to 6.00), with detected interruptions of 82 ms, 107 ms and 71 ms marked
The chart below presents the signal in the common time-domain format (Figure 3-6). Here, too, the
experienced user can obtain information such as peaks and amplitude clipping.
Figure 3-6 Time Domain Chart (Received Speech Signal): coded sample level (-30'000 to 30'000) over time in
seconds (0.00 to 6.00)
Furthermore, the NQDI presentation sheet allows the received sample to be played back using the
default or a specified audio player and offers several options for exporting the results into external
tables or text documents.