
Speech processing in the Pan-European Digital Mobile Radio System (GSM) - System Overview

Jon E Natvig 1), Stein Hansen 1), and Jorge de Brito 2)

1) Norwegian Telecommunications Administration Research Department, Kjeller, Norway
2) Centre National d'Etudes des Telecommunications (CNET), Paris, France

ABSTRACT

A digital cellular mobile radio system has been under development in Europe since 1982, under the coordination of the CEPT working group GSM (Groupe Special Mobile). By the end of 1988, most of the specifications were finalized, and it is expected that the first systems will go into operation as early as 1991. Among the many crucial issues in the design of this system were the development of a low bit rate speech codec and a voice activated transmission scheme. After a brief review of the historical background, the paper provides an overview of the system, with particular emphasis on the factors that interacted with the design of the speech processing functions, e.g. access and transmission methods, error protection and network organization.

1 INTRODUCTION

1.1 Background

In 1991, a new Pan-European Digital Mobile Cellular Telephone system will be opened for service simultaneously in 16 European countries. This event will represent the conclusion of a co-ordinated development and implementation programme that was initiated almost 10 years earlier, when, in 1982, the European Conference of Post and Telecommunications Administrations (CEPT) recommended that its members reserve the frequency bands 890 - 915 MHz and 935 - 960 MHz for mobile radio. By this, the basic pre-requisite for introducing a multi-national mobile service in Europe was established. At the same time, the CEPT created the "Groupe Special Mobile" (GSM) to establish the specifications of a harmonized public mobile radio communications system.

An important motivation for this initiative was the tremendous growth in the demand for the mobile service which was demonstrated in the first generation cellular systems that were deployed in the European countries in the 80s. Currently, there are more than 1 million cellular phones in Europe, 75 % of these being concentrated in the UK and Scandinavia. It is expected that the potential for the mobile service in Europe could be 10-20 million or even more. However, one limiting factor in the current mobile scenario in Europe is the fragmentation of the service into a number of incompatible first generation systems. This implies that a mobile terminal designed for one system will not work with any of the others.

One fundamental objective of the new Pan-European system was therefore to define a common air interface to allow the users of the system to make and receive calls all over Europe. This "roaming" facility is believed to be one of the more important advantages of the new system as seen from the users. A common system specification with defined interfaces allows exchange of equipment from different manufacturers and will give the customers as well as the network operators a greater number of suppliers.

The success of the cellular radio service has also brought problems of overload and congestion, especially in larger cities. A second major objective of the new system was therefore to achieve significantly better spectrum efficiency than the first generation systems.

Thirdly, since the new system would have to be introduced in parallel to existing and well established first generation systems, the new system should be competitive in terms of functionality, performance and, not least, cost.

In some first generation systems, the percentage of handheld terminals has reached 20 %. For the GSM system, 50 % or more of the terminals are expected to be of the handheld type. One essential design parameter of a handheld terminal is the operational duration of the unit before the battery must be recharged. This duration is highly dependent on the battery capacity and the average power consumption. The battery capacity will of course in practice be limited by weight, so the importance allocated to the handheld terminals led to rather severe limitations on the allowed power consumption and the implementation complexity that could be allowed for the speech codec and the associated speech functions.

1.2 Development of the system

During the course of the studies it fairly soon became clear that a digital system would offer a number of advantages compared to analogue solutions.

The spectrum efficiency of cellular radio is restricted by co-channel interference and the occupied channel bandwidth per user. Studies showed that a given voice quality could be achieved at considerably higher co-channel interference levels in a digital system, partly due to the fact that effective forward error protection techniques may be applied in the digital case to increase error robustness. This in turn allows a more frequent re-use of frequencies, and thus a larger number of channels per square km can be accommodated.

A second driving force towards the digital solution was the evolution of digital transmission in the Public Switched Telephone Network (PSTN) and the introduction of the Integrated Services Digital Network (ISDN). When the Pan-European cellular system goes into operation in the early nineties, the penetration of ISDN in the European countries will have reached a considerable degree. The extension of a subset (as large as possible) of ISDN services to the mobile subscriber was therefore seen as an important objective.

In cellular systems the transmission over the radio path poses problems of security in two respects: i) listening into conversations (privacy) and ii) unauthorized access and use of the system (non-paying user). A digital solution opens the possibility of relatively easy solutions to these problems by use of encryption of signalling and user data as well as by implementing secure authentication procedures.

A decision to implement a digital system was reached in 1986, and since then a considerable development effort has been undertaken by European telecommunications administrations and private industry to establish the specifications of the system. This

29B.1.1.
1060 CH2682-3/89/0000-1060 $1.00 © 1989 IEEE
process involved exploration of new territory in a number of technical areas [1, 2].

Even though the provision of a wide range of data services will be a major advantage of the new system, the transmission of speech is still considered to be by far the most important service in the system. Crucial issues in the design of the system were therefore the development of the two basic speech functions:
- a low bit rate speech codec,
- a scheme for voice activated transmission intended to increase spectrum efficiency even further and to save battery power in handheld portable terminals.

This paper provides an overview of the major components of the system, with emphasis on those aspects which interacted with or imposed constraints on the design of the speech functions, and outlines some of the major trade-offs that had to be considered.

2 OVERVIEW OF THE SYSTEM

2.1 The access scheme

The radio access strategy chosen for the GSM system is Time Division Multiple Access (TDMA). In the scheme chosen, each carrier supports a bit stream of 271 kbit/s which is partitioned into time frames, each consisting of eight time slots. The recurrence of a particular time slot in consecutive TDMA frames represents the physical channel allocated to one user. Fig. 1 illustrates the composition of the basic TDMA frame and the time slot.

Of the 148 bits in each time slot, 114 bits (available every 4.615 ms time frame) are available per user channel, whereas the remaining bits are utilized by the system for proper transmission and reception of the burst. The gross bit rate of the user channel thus amounts to 114 bits / 4.615 ms = 24.7 kbit/s.

In order to support various monitoring and control functions, most notably channel observation for handover operations, each speech Traffic Channel (TCH) is associated with a so-called Slow Associated Control Channel (SACCH), which is transmitted over the same physical user channel. This is achieved by organizing the bit stream of the physical channel into a 120 ms multiframe structure consisting of 26 frames of 114 bits, as shown in Fig. 2. In this scheme, each TCH uses 24 frames of 114 bits per 120 ms, resulting in a bit rate of 22.8 kbit/s, whereas the SACCH uses one frame, corresponding to a bit rate of 950 bit/s. The remaining frame is left unused in this case. This has been done as a preparation for a future speech channel using half the bit rate of the current one. As shown in part b) of Fig. 2, two TCH/SACCH combinations are then accommodated without any change to the system. The available bit rate per channel is then 11.4 kbit/s. This facility will effectively double the traffic capacity of the system when a speech codec capable of offering a reasonable performance at 6-7 kbit/s is available.

In certain situations, in particular when a handover is to take place, the signalling capacity of the SACCH is not sufficient. For this situation, a special mode has been defined where a Fast Associated Control Channel is created by "stealing" bits from the speech traffic channel. In order to indicate this situation to the decoder, a special "frame stealing flag" is set in the control part of the burst. In the case of stolen frames, the speech decoder is notified and is supposed to bridge short gaps by using a prediction based on previous frames, or to mute the output in the case of longer gaps.

For radio transmission, each burst corresponding to a time slot is modulated using the Gaussian Minimum Shift Keying (GMSK) modulation technique with a normalized bandwidth BT = 0.3, which is a constant envelope modulation scheme with a compact spectrum well suited for cellular mobile radio. As an option, a slow frequency hopping scheme may be utilized. This implies that the carrier frequency of a particular time slot is changed between the available frequencies for each new burst.

2.2 Speech coding, error protection and interleaving

The transmission path of a mobile call is typically characterized by rapid fluctuations in the propagation conditions, which result in transmission error probabilities up to 10^-1. Typically, the errors will occur in bursts. In order to reduce the subjective effects of transmission errors, three basic techniques are used in the GSM system: interleaving, error correction and error detection.

Typically, the bits produced by a low bit rate speech coder consist of encoded parameters describing various aspects of the speech signal, usually referred to as "side information", and a residual signal which describes the remaining signal after the information described by the side information has been removed. The side information is normally updated at a relatively slow rate, typically every 10 - 30 ms. Thus an error in the side information could result in an erroneous reproduction of the speech for at least this duration. It is therefore clear that the bits produced by such a speech coder carry different amounts of information and need different amounts of protection.

In the speech coding algorithm chosen by GSM [3, 4], the Regular Pulse Excitation-Long Term Prediction LPC (RPE-LTP), 260 bits are produced every 20 ms, resulting in a bit rate of 13 kbit/s. These bits are distributed as 72 bits of side information (filter coefficients, gain and pitch information) and 188 bits for encoding the residual bit stream.

These bits are organized into 3 classes of importance according to the effect of errors on speech quality [4], and the three classes are given different protection. Basically, a half rate convolutional code is applied to the most important bits whereas the less important ones are left unprotected. In addition, a small block code is applied to the 50 most sensitive bits for error detection. If the decoder detects uncorrected errors in these bits, the complete speech frame is discarded, since an error in these bits has been found to result in serious speech degradation. In this case, a prediction based on previous frames has been found to be subjectively more acceptable.

The encoding process results in a speech frame of 456 bits ready for transmission. In order to randomize the effect of burst errors and improve the working conditions for the convolutional code, an interleaving over 8 time slots is applied. In this scheme, each burst carries information from two speech frames.

2.3 Network organization

In a cellular network, the spectrum efficiency or capacity of the network is obtained by re-using frequencies in defined areas (cells) with a defined distance between them. A cell is controlled by a Base Transceiver System (BTS), and several BTSs may be under the management of a Base Station Controller (BSC). The BSC and its BTSs constitute a Base Station System (BSS), which is connected to a Mobile Switching Centre (MSC). The MSC controls one or several BSSs and mainly performs normal switching functions. The MSC is very similar to an ISDN exchange, and is a point of interconnection of the GSM system to the ISDN/PSTN.

Due to the cellular structure of the radio network, the Mobile Station must transfer its communication with the network from one BTS to another when moving from one cell to another during a call (handover). In GSM there are 4 possible types of handover: within a BTS (frequency change), between two BTSs within a BSC, between two BSCs within an MSC, and between two MSCs. These 4 types of handover give rise to extensive signalling in the network and to different interruptions in communication.
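The 8-slot interleaving described in section 2.2 can be pictured with a small sketch. The code below is a simplified illustration and not the permutation defined in the GSM recommendations: it assumes that bit k of a 456-bit frame goes to sub-block k mod 8, and that the first four sub-blocks of a frame share bursts with the last four sub-blocks of the preceding frame. The function names and the exact bit ordering are assumptions made for the example.

```python
FRAME_BITS = 456      # coded speech frame (section 2.2)
SUB_BLOCKS = 8        # interleaving depth in time slots
SUB_BLOCK_BITS = FRAME_BITS // SUB_BLOCKS   # 57 bits


def split_frame(frame):
    """Split one 456-bit frame into 8 sub-blocks (bit k goes to block k % 8)."""
    assert len(frame) == FRAME_BITS
    return [frame[i::SUB_BLOCKS] for i in range(SUB_BLOCKS)]


def interleave(frames):
    """Map coded frames onto 114-bit burst payloads.

    Frame n occupies bursts 4n .. 4n+7, so every burst mixes two
    consecutive frames. Missing neighbours are padded with None.
    """
    blocks = [split_frame(f) for f in frames]
    filler = [None] * SUB_BLOCK_BITS
    bursts = []
    for b in range(4 * len(frames) + 4):
        new, old = b // 4, b // 4 - 1      # the two frames feeding burst b
        slot = b % 4
        part_new = blocks[new][slot] if new < len(frames) else filler
        part_old = blocks[old][slot + 4] if old >= 0 else filler
        # Alternate the bits of the two frames inside the payload.
        bursts.append([bit for pair in zip(part_new, part_old) for bit in pair])
    return bursts


def deinterleave(bursts, n_frames):
    """Inverse of interleave(): rebuild the 456-bit frames from the bursts."""
    frames = [[None] * FRAME_BITS for _ in range(n_frames)]
    for b, payload in enumerate(bursts):
        new, old = b // 4, b // 4 - 1
        slot = b % 4
        if new < n_frames:
            frames[new][slot::SUB_BLOCKS] = payload[0::2]
        if old >= 0:
            frames[old][slot + 4::SUB_BLOCKS] = payload[1::2]
    return frames
```

In this sketch a burst that is lost completely costs each of the two frames it carries 57 of their 456 bits, at every eighth bit position, instead of wiping out a contiguous run of bits in one frame.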

An important feature of the GSM system is automatic international roaming, which means that the mobile station is able to make and receive calls anywhere in the GSM service area, i.e. anywhere in the countries implementing the GSM system, without any special actions taken by the mobile subscriber. This implies, however, that the mobile station must update the network about its location also when in idle mode, and that the network must handle this location updating. The location information is handled in the network by 2 different kinds of data bases, a Home Location Register (HLR) and a Visited Location Register (VLR).

The HLR is the data base which contains all data concerning the subscription of the mobile subscriber. The HLR also contains information about the VLR which is currently handling the mobile station. When the mobile changes location, the HLR is updated accordingly.

The mobile station is always handled by a VLR, which is related to a certain geographical area where the mobile is currently roaming. When a PSTN/ISDN subscriber then calls a mobile subscriber, the HLR will be interrogated about the address of the VLR currently handling the mobile. The call will then be routed to the correct MSC, which extends the call through the BSS over the radio path.

The procedures to handle the mobile aspects like location updating and handover etc. between the MSC, HLR and VLR have been designed using the same principles as those used in the ISDN. In particular, a special applications part, the Mobile Application Part (MAP), which uses the Transaction Capabilities Application Part (TCAP), has been defined in CCITT SS No. 7 [8].

3 SPEECH PERFORMANCE CONSIDERATIONS

As a public cellular radio system, the GSM system will interface to the national telephone network and must be considered as part of the international telephone network. In the design of the GSM system, this fact resulted in requirements (and constraints) in the design of the speech functions. Due to the difficult technical and economical trade-offs involved in the design of a cellular system, e.g. economizing of the spectrum resource, the voice performance and the cost and complexity of the equipment, full compliance with the specifications governing normal telephone calls cannot be achieved. In the following, some of the considerations concerning the transmission performance aspects of the GSM system are outlined.

3.1 Speech Quality

The basic operation offered by the system is to set up and maintain the communication channel between the customers with a certain specified performance level. The digital cellular system connects to the public telephone network at the MSC through a normal A-law PCM interface as shown in Fig. 3, and the whole mobile system from the microphone input to this interface could in many respects be regarded as a digital telephone. One difference from a normal telephone is of course that the "subscriber line" of this DMR telephone consists of the transmission path of the radio system, which is subject to a high degree of transmission errors. Even though a quality as close as possible to the normal telephone quality would be desirable, this would use too much spectrum and result in an uneconomical system. Therefore, the minimum requirement was instead to provide an average speech quality, as perceived by subscribers, equal to or better than the quality offered in first generation 900 MHz analogue mobile systems in operation today. The system should at the same time be as spectrum effective as possible, the aim being to offer a capacity superior to that of existing systems.

This requirement has been achieved, and the superiority to FM was demonstrated before the final decision to go for a digital solution was made [3]. In a first round of tests it was found that a number of digital codecs would outperform systems based on analogue FM when the average criterion was used. In error free conditions, it was found that "clean" FM gave a substantially transparent quality that could not be matched by the digital speech codecs under test. However, the better performance of the digital solutions in error conditions more than compensates for this difference. A comprehensive evaluation of the speech performance is reported in [5].

3.2 Echo and Delay

Transmission delay causes problems for two reasons: a) because a long delay between a subscriber talking and receiving a reply often disturbs the flow of the conversation, and b) because reflections in the network generate echoes which disturb the talker.

The overall delay in a DMR system will have the following contributions:

Speech codec algorithmic delay ........ 20 ms
Interleaving delay .................... 38 ms
Speech codec processing delay ......... ~8 ms
Other delay of the radio subsystem .... ~15 ms

resulting in an overall delay of about 80 ms.

The delay was probably the most difficult parameter to consider in the system specification and can be used to illustrate the difficult trade-off decisions that had to be taken in the system design. Basically, each component, when considered in isolation, could make good use of an increased delay budget.

Speech algorithm: at least ideally, it could be assumed that the longer the delay available to the speech codec algorithm, the better the possibilities to improve speech quality or, alternatively, to reduce the necessary bit rate for a given speech performance.

Interleaving: As was outlined in section 2.2, the 456-bit encoded and error protected speech frame is interleaved by a factor of 8. However, 456 bits could fit exactly into 4 time slots (114 x 4), which would have reduced the interleaving delay by about 20 ms. This would, however, have deteriorated the working conditions for the error correcting algorithm considerably. The result would be a lower speech quality in error conditions and a reduced spectrum efficiency of the system. Conversely, an even deeper interleaving would improve the speech performance even more, but was decided against because of the negative effect of increased delay on the transmission performance.

Processing time: As was explained in the introduction, the provision of handheld terminals was a basic requirement which placed rather tough constraints on the implementation of the mobile terminal. A basic parameter in this context was the power consumption of the various components. Because power consumption in CMOS VLSI is almost a linear function of the clock rate of the processing machine, a relaxation of the delay requirement could make it possible to distribute the processing over a longer period with a lower clock rate, thus reducing the power consumption considerably.

The overall delay of about 80 ms represents the outcome of this difficult trade-off process. This delay is believed to be short enough to avoid conversational delay problems (case a) above) for a majority of calls in the system.

The overall delay of a GSM connection will add 80 ms to the delay in the PSTN part of the connection. Consequently, there is a need for echo control in all calls including a GSM terminal. For this reason, an echo canceller will be placed permanently at the PSTN interface of the GSM system to remove any echoes being returned from the PSTN to the mobile user. The echo control in

the mobile terminal poses no problem in the handset case, since the radio system is effectively a four-wire telephone. However, it is expected that many manufacturers will offer loudspeaking terminals. In this case, the echo is caused by the acoustic coupling between the receiver and the microphone of the terminal. The acoustic echo tends to be time variant and is to a large extent influenced by the room reverberation. Echo cancellers for this application are still under study, e.g. in CCITT, and it is expected that at least for first generation terminals, an echo control based on voice switching will be used.

Clearly, in some cases, e.g. in mobile to mobile calls being routed via a satellite link in the PSTN, the maximum delay of 400 ms recommended by CCITT will be exceeded.

In order to ensure optimal performance, and to minimize the number of calls having delay problems, network planning guidelines have been worked out [6], which recommend a number of ways to minimize performance degradation due to long delay, e.g. by avoiding satellite routings and disabling intermediate echo control equipment.

3.3 Discontinuous transmission

During the effort to maximize the spectrum efficiency of the GSM system, it was found that a significant increase in spectrum efficiency could be achieved by utilizing voice activated transmission. The basic principle, termed Discontinuous Transmission (DTX), is to switch the transmitter on only for those periods when there is active speech to transmit. In this way, the average interference on the air will be reduced, thus allowing a smaller frequency re-use cluster size. It was found that, given an average voice activity of 50 %, the spectrum efficiency could be doubled under certain idealized conditions. Clearly, the prospect of even a considerably lower gain than this would be a strong driving force for an investigation of this approach. The power saving effect on hand-portable terminals has been found to be an even more important advantage of the DTX feature.

The basic problem of DTX, if implemented uncritically, is the potential degradation of the voice performance by speech clipping (speech detected as noise) and by noise contrast effects.

The design of a speech detector has to weigh the risk of clipping of the talkspurts against the risk that noise is incorrectly classified as speech. Clipping of the speech is a serious impairment that will reduce the overall quality of the system. On the other hand, false classification of noise as speech must be minimized, since an increased activity factor would increase the interference on the air.

When the Voice Activity Detector (VAD) is used to switch the transmitter on and off, the effect will be a modulation of the background noise at the receiving end. When speech is detected, the background noise is transmitted together with the active speech to the receiving end. As the speech burst ends, the connection is broken, and the received noise drops to a very low level. This step modulation of noise is perceived as very annoying and may in high-contrast cases reduce intelligibility.

A way to reduce the annoying effect of the receive channel noise modulation is to insert noise (comfort noise) into the receive channel when the connection is broken. The improvement achieved is dependent on the accuracy with which the inserted noise matches the transmitted noise.

The noise transmitted from a mobile station will normally be the acoustical noise that is picked up by the microphone of the handset or loudspeaking terminal. As can be imagined, this noise has been found to vary significantly from location to location and also as a function of time during a single call (e.g. when a car is accelerating). This poses special problems, both in the design of a reliable VAD and for the comfort noise updating and generation.

The solutions finally adopted [7] consist of
- a VAD algorithm which utilizes a number of parameters of the RPE-LTP speech codec in its discrimination process,
- a mechanism that estimates background noise characteristics at the transmit side,
- an in-band transmission scheme for updating of comfort noise parameters utilizing the normal speech codec frame,
- a mechanism for generation of constant comfort noise between updates.

This VAD has proven to be extremely effective in mobile noise environments, but will of course have problems in situations where the interference is speech. The comfort noise generation is based on the speech decoder, and provides basically a frequency shaped random noise which has been judged as acceptable by test subjects. Updating of the comfort noise characteristics occurs at the end of every speech burst as well as at a regular rate of twice a second during speech pauses.

DTX is associated with the risk of degradation of voice performance. Therefore, signalling means have been specified so that the network operator may invoke (or prohibit) the use of DTX by all mobiles in a certain area by command. An obvious way of utilizing this function is to activate it in congestion situations, where the access to the system is more important to the customer than the voice performance.

4 CONCLUSION

An overview of the speech functions in the GSM system has been given. Possibly, today's knowledge would have resulted in other and better solutions. It should be borne in mind, however, that the major technical decisions in the GSM system had to be taken on the basis of what was seen to be stable and reliable solutions in 1986/87. In any case, the solutions chosen will indeed meet the initial design objectives for the system and will offer far better spectrum efficiency and listening transmission quality than is offered by current analogue cellular systems. A challenge for the future still remains: the half-rate speech codec which will increase the spectrum efficiency of the system even further.

5 ACKNOWLEDGEMENTS

The authors acknowledge the efforts of all members of the TM3/COST 207 Speech Coding Experts Group in supporting the work described in this and other papers to this conference.

6 REFERENCES

[1] International Conference on Digital Land Mobile Radio Communication. Proceedings. Venice, 30 June - 3 July 1987
[2] Digital Cellular Radio Conference. Proceedings. Hagen, Fed. Rep. of Germany, October 1988
[3] J E Natvig, "Evaluation of Six Medium Bit-Rate Coders for the Pan-European Digital Mobile Radio System", IEEE Journal on Sel. Areas in Communications, Vol 6, February 1988, pp 324-331
[4] P Vary et al., "Speech Codec for the European Mobile Radio System", Globecom '89
[5] A Coleman et al., "Subjective Performance of the RPE-LTP Codec for the Pan-European Cellular Digital Mobile Radio System", Globecom '89
[6] GSM Recommendation 03.50, "Transmission Planning Aspects of the Speech Service in the GSM PLMN System"
[7] C Southcott et al., "Voice Control of the Pan-European Cellular Digital Mobile Radio System", Globecom '89
[8] CCITT Recommendation Q.1051
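As a cross-check of the delay budget in section 3.2, the contributions can be summed in a few lines. The sketch below uses only the figures quoted in the paper; the dictionary keys are illustrative names, the 8 ms and 15 ms entries are treated as approximate, and the estimate of the saving from a 4-slot interleaver assumes that it corresponds to four fewer TDMA frames, which is an assumption of this example and not a statement from the paper.

```python
# Delay contributions from section 3.2, in milliseconds.
delay_ms = {
    "speech codec algorithmic": 20,
    "interleaving": 38,
    "speech codec processing": 8,    # approximate figure
    "other radio subsystem": 15,     # approximate figure
}

total_ms = sum(delay_ms.values())    # quoted as "about 80 ms" in the text

# One TDMA frame lasts 120 ms / 26 frames (section 2.1).
tdma_frame_ms = 120 / 26

# Assumption: interleaving over 4 instead of 8 time slots saves the time
# needed to receive four further bursts, i.e. four TDMA frames.
four_slot_saving_ms = 4 * tdma_frame_ms

print(f"total one-way delay: {total_ms} ms")
print(f"estimated saving with 4-slot interleaving: {four_slot_saving_ms:.1f} ms")
```

Under this assumption the estimated saving comes out slightly below the "about 20 ms" quoted in section 3.2, which is consistent with the rounded figures used there.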

[Figure: time slot of 156.25 bits (0.577 ms); burst fields of 3, 57 (data), 1, 26 (training), 1, 57 (data) and 3 bits.]

Fig.1. Basic TDMA frame, time slot, and burst structures.

[Figure: (b) Two Half-Rate Channels. TCx: TDMA frame no. x for traffic data; ACCH: TDMA frame for signalling data; Idle: frame not in use.]

Fig.2. Mapping of traffic channels on the physical channel.
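The figures in section 2.1 and in Figs. 1 and 2 can be cross-checked with a few lines of arithmetic. The sketch below only reproduces numbers quoted in the paper; the variable names are chosen for the example, and the burst field widths are those legible in Fig. 1.

```python
# Burst fields in bits, as read from Fig. 1: two 57-bit data fields, a 26-bit
# training sequence, two 1-bit flags and 3 bits at each end.
burst_fields = [3, 57, 1, 26, 1, 57, 3]
slot_bits = 156.25            # time slot length in bit periods (Fig. 1)
slot_ms = 0.577               # time slot duration (Fig. 1)
slots_per_frame = 8

burst_bits = sum(burst_fields)            # bits actually transmitted per burst
user_bits = 2 * 57                        # user bits per burst
carrier_kbps = slot_bits / slot_ms        # bits per ms equals kbit/s
frame_ms = slots_per_frame * slot_ms      # TDMA frame duration

# 26-frame multiframe of 120 ms (Fig. 2): 24 TCH frames, 1 SACCH, 1 idle.
multiframe_ms = 120.0
gross_kbps = user_bits / (multiframe_ms / 26)
tch_kbps = 24 * user_bits / multiframe_ms
sacch_bps = 1000 * user_bits / multiframe_ms
half_rate_kbps = tch_kbps / 2

print(f"burst: {burst_bits} bits, of which {user_bits} user bits")
print(f"carrier rate: {carrier_kbps:.1f} kbit/s, frame: {frame_ms:.3f} ms")
print(f"gross user channel: {gross_kbps:.1f} kbit/s")
print(f"TCH: {tch_kbps:.1f} kbit/s, SACCH: {sacch_bps:.0f} bit/s")
print(f"half-rate channel: {half_rate_kbps:.1f} kbit/s")
```

The carrier rate computed from the slot length and duration agrees with the rounded 271 kbit/s of section 2.1, and the 24 + 1 + 1 frame split explains why the traffic channel gets 22.8 kbit/s out of the 24.7 kbit/s gross rate.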

[Figure: MOBILE STATION (MS) with electro-acoustic input (analog speech), analog to 13-bit uniform conversion (8000 samples/s, 13-bit uniform code), RPE-LTP encoder (50 frames/s, 260 bits/frame), error coding, interleaving and modulation (encoded speech, 456 bits/frame); radio path; BASE STATION SYSTEM (BSS) with demodulation, de-interleaving, error correction, RPE-LTP decoder (8000 samples/s, 13-bit uniform code) and conversion from 13-bit uniform to 8-bit A-law (8000 samples/s; CCITT Rec. G.711, G.712, G.713, G.714); MOBILE SWITCHING CENTRE; SWITCHED TELEPHONE NETWORK.]

Fig.3. Block diagram of the digital mobile radio system
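The sample and frame rates annotated in Fig. 3 translate directly into the bit rates quoted in section 2.2. The short sketch below works through that arithmetic; it uses only numbers given in the paper, and the variable names are illustrative.

```python
sample_rate = 8000             # samples/s at both ends of the codec (Fig. 3)
frames_per_s = 50              # one speech frame every 20 ms

samples_per_frame = sample_rate // frames_per_s
uniform_kbps = sample_rate * 13 / 1000    # 13-bit uniform PCM into the encoder
alaw_kbps = sample_rate * 8 / 1000        # 8-bit A-law PCM towards the PSTN

side_bits, residual_bits = 72, 188        # split given in section 2.2
speech_bits = side_bits + residual_bits   # bits per speech frame
coded_bits = 456                          # bits per frame after error protection

codec_kbps = speech_bits * frames_per_s / 1000
channel_kbps = coded_bits * frames_per_s / 1000
redundancy_bits = coded_bits - speech_bits
payloads_per_frame = coded_bits / 114     # 114 user bits per burst (section 2.1)

print(f"{samples_per_frame} samples per 20 ms frame")
print(f"uniform PCM: {uniform_kbps:.0f} kbit/s, A-law PCM: {alaw_kbps:.0f} kbit/s")
print(f"speech codec: {codec_kbps:.1f} kbit/s, coded channel: {channel_kbps:.1f} kbit/s")
print(f"error protection adds {redundancy_bits} bits per frame")
print(f"one coded frame fills {payloads_per_frame:.0f} burst payloads")
```

The coded rate matches the 22.8 kbit/s available on the traffic channel, so one coded speech frame exactly fills the user bits of four bursts, even though section 2.2 spreads it over eight.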
