

Data Mapping onto Speech-like Signal to Transmission over the GSM Voice Channel

Conference Paper · April 2008


DOI: 10.1109/SSST.2008.4480189 · Source: IEEE Xplore



40th Southeastern Symposium on System Theory MA2.5
University of New Orleans
New Orleans, LA, USA, March 16-18, 2008

Data Mapping onto Speech-like Signal to Transmission over the GSM Voice Channel

Mahsa Rashidi, MSc student, Electrical Engineering Department, Amirkabir University of Technology, m_rashidi@aut.ac.ir
Abolghasem Sayadiyan, Associate professor, Electrical Engineering Department, Amirkabir University of Technology, eea35@aut.ac.ir
Pejman Mowlaee, PhD student, Electrical Engineering Department, Amirkabir University of Technology, P_Mowlaee@ieee.org

Abstract- One of the most important objectives in mobile communication systems is secure voice and data communication (including text, picture, video and voice), especially at high bit rates. In this paper, a new procedure is proposed in which the intended data or voice is encrypted and modulated onto speech-like waveforms. The modulated waveforms are transmitted over the global system for mobile communications (GSM) voice channel and then demodulated and decrypted at the receiver. We propose an appropriate model for the GSM Full Rate (FR) speech codec by mapping data onto the fundamental parameters related to formants in a speech-like waveform, including phases, frequencies and pitch frequencies. The proposed model has been evaluated for a GSM-to-GSM connection. Conducting different simulations, we observed that the proposed approach results in a bit error rate (BER) of 0.02% when the Signal-to-Noise Ratio (SNR) is 15 dB in a 1.5 kbps channel. As a result, the proposed method can be considered a favorable choice for robustness to additive noise.

Keywords - formants, GSM, LSF, speech-like waveform.

1. INTRODUCTION

Hardware and protocol deficiencies are two drawbacks of the 2nd generation communications systems that make them capable of data transmission only at low bit rates (e.g. 1120 bits per page) for the Short Message Service (SMS) in the G.7 signaling channel. However, as the data channel is available only to a limited number of subscribers, data transmission is possible only up to a maximum rate of 9.6 kbps.

In contrast to the data channel, using a voice channel can result in negligible time delays, as reported in [1]. In addition, one of the most important problems in data transmission over the GSM voice channel is to make sure that the transmitted data is highly secure. To cope with this problem, the bit stream resulting from a low bit rate speech coder implemented for voice channel adaptation usually enters a data encryption block [2]. Data is then modulated onto speech-like waveforms prior to entering the global system for mobile communications (GSM) network. The resulting waveform enters the first GSM handset and the communications channel, and then reaches the second GSM handset. The waveform received from the second handset is demodulated, decrypted and finally decoded [3].

The overall system block diagram is shown in Fig. 1. Due to the low rate of speech coding in communications, we require modems with data transfer capability at low bit rates. As a result, based on the method proposed in this paper, an appropriate modem is presented. In the recent work by Katugampala et al. reported in [3], codebooks containing the values of the speech-like waveform parameters, including pitch frequency, Line Spectral Frequencies (LSF) coefficients and frame energy, are defined on the modulator side. Next, these parameters are used for waveform synthesis. Finally, the encrypted data are mapped onto these waveforms. On the demodulator side, the parameters are derived from the received speech-like waveforms and compared to the codebook, and the best match is chosen [3]. That approach has been adopted for the 12.2 kbps GSM Enhanced Full Rate (EFR) speech codec, whereas the approach proposed in this paper is designed for the 13 kbps GSM FR speech codec, as reported in ETSI GSM 06.10 [4].

This paper is organized as follows. In Section 2, the complete procedure of speech-like signal production and data mapping is presented. Section 3 is dedicated to synchronization. In Section 4, simulation results are reported. Section 5 concludes.

2. SPEECH-LIKE SIGNAL PRODUCTION PROCEDURE

We need to map the data bit stream onto speech-like waveforms of 20 ms length (equal to the frame length usually available in the GSM speech coder). Therefore, in this paper, we produce speech-like waveforms based on an auto-regressive (AR) model. Waveforms should be produced with four formants so that they can be

978-1-4244-1807-7/08/$25.00 ©2008 IEEE.



adapted to the GSM coder. As formants are sensitive to changes, and to simplify extraction of the signal characteristics in demodulation, we prefer to parallelize the resulting transfer function according to the work in [5]. Finally, by applying an excitation signal to the resulting transfer function, an appropriate speech-like signal is produced. As a result, the transfer function of the $i$-th formant corresponds to a second-order difference equation with the following coefficients:

$$A_i = 1 + B_i + C_i, \qquad B_i = 2\exp(-\pi \Delta f_i / f_s)\cos(2\pi f_i / f_s), \qquad C_i = -\exp(-2\pi \Delta f_i / f_s) \tag{1}$$

where $f_s$ is the sampling frequency, and $f_i$ and $\Delta f_i$ are the formant frequencies and bandwidths, respectively. In order to have a meaningful comparison between the received spectral envelope and the one at the transmitter, the paralleling of $H_i(z)$ is done under the following conditions:

* Firstly, we normalize the transfer function related to each formant at its central frequency:

$$\left| H_i(e^{j\omega}) \right|_{\omega = 2\pi f_i / f_s} = 1, \qquad i = 1,2,3,4 \tag{2}$$

* Secondly, another normalization should be employed in the parallel resultant formant transfer function, which can be written as below:

[Equation (3): normalization of the parallel combination of $H_1, \dots, H_4$, weighted by the coefficients $\alpha_n$, $\beta_n$, $\rho_n$ and $\lambda_n$, for $n = 1,\dots,4$.]

where $\alpha_n$, $\beta_n$, $\rho_n$ and $\lambda_n$ are the normalized equation coefficients. Finally, the speech-like waveform results from the overall spectral envelope by employing the harmonic model synthesis method based on [6]. As a result, the complete process for waveform production can be demonstrated as follows:

[Equation (4): harmonic-model synthesis of the speech-like waveform as a sum of $M$ cosines with amplitudes $a$ taken from $H_{Total}(f)$, frequencies $f$ and phases $\varphi$, using the window lengths $N_{ana}$ and $N_{fr}$.]

where $H_{Total}(z)$ is the same parallel transfer function, $N_{ana}$ is the analysis window length, and $N_{fr}$ is the frame shift length. Note that the vectors $a$, $f$ and $\varphi$ in (4) are obtained from the $M$ prominent peaks found through the peak-picking procedure reported in [6]. Fig. 2 shows a prototype of the speech-like waveforms with 20 ms length produced by the harmonic modeling approach discussed above.

Figure 1: Overview of the complete system.

Figure 2: Synthesized speech-like signal (time axis in samples at 8 kHz).

2.2. Data mapping on speech-like signal

One of the key points is to correctly select the formant frequencies within the telephone voice band (300-3400 Hz). During experiments and investigations we concluded that, among the formant parameters mentioned, only the frequencies and phases can be detected after the signal passes through the speech-coded voice channel. As a result, we explain in detail how to select the parameters and how to allocate data bits to the frequency and phase parameters. The 1st and 2nd formant frequencies are selected in the range 300 to 1000 Hz. These frequencies are encoded by 3 bits. The third formant frequency ranges from 1400 to 2500 Hz and is coded by 3 bits, and the fourth formant ranges from 2900 to 3400 Hz and is coded by 2 bits. Note that since the harmonic model is used in the proposed method, the formant frequencies discussed above should be selected as multiples of the pitch frequency, which results in a negligible error in


extracting information. The selection criteria are as follows:

1. The received signal amplitude should be more than 70 percent of the transmitted amplitude.

2. The frequency displacement of the received formants should not exceed a default frequency step for each formant. Otherwise, it causes incorrect extraction of the information mapped onto the resulting frequencies. Selecting the frequency steps as multiples of the given pitch frequency fulfills this condition. However, a larger frequency step is selected for the 4th formant due to its high sensitivity to displacement.

3. Another important point is the lack of proximity between two adjacent formants. As a result, there are unusable regions at the band boundaries between formants. This is due to the fact that the minimum distance between two adjacent formants is twice the bandwidth considered, given that their bandwidths are the same. Due to the lack of fidelity of the GSM coder/decoder to formant bandwidths, we only consider a constant and identical bandwidth of $\Delta f = 160$ Hz in the whole synthesis process.

As the phase fidelity only holds for frequencies under 1 kHz, some information should be preserved in the phases related to the first and second formants. As a result, the difference between the phase extracted from the received signal envelope and the phase mapped at that particular frequency should be coded within 3 bits. Another important parameter is the pitch frequency, whose selection is related to the choice of the synthesis window employed in the harmonic analysis procedure discussed earlier in Section 2.1. We observed that using pitch frequencies $f_p = 123$ Hz and $f_p = 125$ Hz results in acceptable performance. Therefore we coded the mapped data on the pitch frequencies using 1 bit. Finally, the whole speech-like waveform can be modulated by 12 data bits in a 20 ms frame. In addition, we demonstrate in the simulation results that the proposed technique achieves a bit rate of 600 bps.

2.3. Intra-frame Interpolation

In order to achieve phase continuity, which is an important characteristic of speech signals, and to handle some jumps occurring at frame boundaries, it is necessary to overlap the speech-like waveforms produced with the above approach. It should also be ensured that the data bit streams on the speech-like signal remain undamaged. To this end, it is important to select the pitch period, i.e. $1/f_p$, which has a direct relation to the data mapped onto each frame. The GSM codec performs a linear interpolation between the Log Area Ratio (LAR) coefficients of two adjacent frames (each frame consisting of 160 samples). To avoid spurious transients, the LAR coefficients of the last frame are interpolated with the LAR coefficients of the current frame over the first 40 samples [4]. This motivates the idea that adjacent frames should have the minimum overlap. The reason is that, when a PCM waveform signal enters a GSM tandem connection, a high overlap between frames does not cause large changes in the reflection coefficients of each frame. This, as a result, causes incorrect detection of the transmitted data. Note that the overlapping samples in each frame should not be chosen, in order to prevent inter-modulation effects. As a result, (5) presents the linear interpolation for the proposed modulator:

$$
\begin{aligned}
y_1 &= \frac{1 + a + 160 - n}{2a + 1}, & y_2 &= \frac{n + a - 160}{2a + 1}, & n &= (160 - a), \dots, 160 \\
y_3 &= \frac{m + a}{2a + 1}, & y_4 &= \frac{1 + a - m}{2a + 1}, & m &= 1, \dots, (1 + a)
\end{aligned}
\tag{5}
$$

where $a$ equals the number of overlapping samples in each frame. Note that $y_1$ and $y_2$ are multiplied by samples $(160 - a)$ to $160$ of the last frame, $s_{i-1}$, and $y_3$ and $y_4$ are multiplied by samples $1$ to $(1 + a)$ of the current frame, $s_i$, as presented in (6):

$$
\begin{aligned}
L_1 &= y_1 \times s_{i-1}(k_1) \\
L_2 &= y_2 \times s_{i-1}(k_1) \\
L_3 &= y_3 \times s_i(k_2) \\
L_4 &= y_4 \times s_i(k_2) \\
\mathrm{Interpolation}_1 &= L_1 + L_2 \\
\mathrm{Interpolation}_2 &= L_3 + L_4 \\
\mathrm{Interpolation} &= \mathrm{Interpolation}_1 + \mathrm{Interpolation}_2
\end{aligned}
\tag{6}
$$

where $k_1$ and $k_2$ are the numbers of samples interpolated, Interpolation1 and Interpolation2 are the overlapped


samples of each frame, and Interpolation is the whole set of interpolated samples of two adjacent frames. Note that in the demodulator this region should not be chosen. Finally, an appropriate PCM waveform signal has been prepared to enter the speech-coded voice channel.

3. Synchronization

One of the important issues in the simulation is the synchronization of the system elements. In order to simulate the synchronization of the speech codec frames in two base stations, we modeled this effect by inserting a random number of samples before the signal is passed to the second codec. Then, to simulate the synchronization of the modulator and demodulator, a predefined synchronization sequence is sent from the modulator to the demodulator at the start of any communication. This sequence of samples is known to both. Since in the simulation it is known that there will be a synchronization sequence in the input signal, the synchronization module cross-correlates a fixed predefined number of input samples at the beginning of the transmission with the predefined synchronization sequence. The sample sequence that matches best is used for synchronization in the demodulator.

4. Simulation Results

We tested our system on the GSM-to-GSM connection. In our simulation we generate a speech-like signal by the proposed method with a length of 2.5 s, consisting of about 120 different waveforms (each of 20 ms length). The best interpolation occurs at a = 7. To synchronize the modulator and demodulator, a predefined synchronization sequence is sent before the start of any transmission; the synchronization occurs at the 22nd sample of this sequence. Next, the generated signal is transmitted from the modulator through the coder, channel, decoder and demodulator. Fig. 3 compares the 3rd frame of the synthesized and received signals.

In order to extract the important speech-like parameters, we need the spectral envelope of the received signal. Hence, Fig. 4 illustrates the spectral envelopes of the signals. Note that Figs. 3 and 4 have been generated with a pitch frequency of 125 Hz, and the signals are not selected from the interpolated samples, as explained in Section 2.3. The four peaks with maximum amplitudes show the displacement of the central frequencies of the 3rd frame's formants between the synthesized and received signals, as depicted in Fig. 4.

Figure 3: Synthesized and Received signals (time axis in samples at 8 kHz; curves: received signal, synthesized signal).

Figure 4: Synchronized and Received envelopes (frequency axis in Hz; curves: received spectral envelope, synchronized spectral envelope).

We achieved a throughput of 2 kbps with a 0.3% bit error rate (BER). Using a punctured 1/2-rate convolutional code with a constraint length of 7 on the 2 kbps stream, we achieved a 1.15 kbps channel with 0.02% BER for SNR = 15 dB. As a result, the proposed method can be considered a favorable choice due to its robustness to additive noise, as depicted in Fig. 5.

Fig. 5. BER over fading channel with additive noise for BT = 0.3 (horizontal axis: SNR per bit, Eb/N0 (dB); curve: empirical).
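As a rough illustration of the linear interpolation weights in (5), the sketch below evaluates them for the overlap a = 7 reported in this section. The function and variable names are hypothetical, and the code only evaluates the weights; it is not the authors' modulator. It also checks that the paired weights are complementary at every sample of the overlap.

```python
from fractions import Fraction

FRAME_LEN = 160  # samples per 20 ms frame at 8 kHz


def interpolation_weights(a, frame_len=FRAME_LEN):
    """Evaluate the weight sequences y1..y4 of (5) for an overlap of a samples."""
    denom = 2 * a + 1
    tail = range(frame_len - a, frame_len + 1)  # n = (160 - a), ..., 160
    head = range(1, a + 2)                      # m = 1, ..., (1 + a)
    y1 = [Fraction(1 + a + frame_len - n, denom) for n in tail]
    y2 = [Fraction(n + a - frame_len, denom) for n in tail]
    y3 = [Fraction(m + a, denom) for m in head]
    y4 = [Fraction(1 + a - m, denom) for m in head]
    return y1, y2, y3, y4


y1, y2, y3, y4 = interpolation_weights(7)  # a = 7 as reported in Section 4

# The paired weights sum to one at every sample of the overlap region.
pairs_sum_to_one = all(u + v == 1 for u, v in zip(y1, y2)) and all(
    u + v == 1 for u, v in zip(y3, y4)
)
print([str(w) for w in y1])
print(pairs_sum_to_one)
```

Exact fractions are used here only so that the complementarity check is not affected by floating-point rounding.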

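The synchronization step of Section 3, in which a fixed number of input samples is cross-correlated with the known sequence, might be sketched as follows. This is an idealized toy under stated assumptions: the sequence length, the delay bound, the payload and all names are hypothetical, and no speech codec or channel distortion is modeled.

```python
import random


def best_alignment(received, sync_seq, search_len):
    """Return the offset within the first search_len samples whose window
    has the largest correlation with the known synchronization sequence."""
    n = len(sync_seq)
    best_offset, best_score = 0, float("-inf")
    for offset in range(search_len - n + 1):
        window = received[offset:offset + n]
        score = sum(x * y for x, y in zip(window, sync_seq))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset


rng = random.Random(0)
sync_seq = [rng.choice((-1.0, 1.0)) for _ in range(64)]  # hypothetical sequence
max_delay = 40                                           # hypothetical bound
delay = rng.randrange(max_delay + 1)  # random number of inserted samples
payload = [rng.uniform(-0.5, 0.5) for _ in range(160)]   # stand-in for one frame
received = [0.0] * delay + sync_seq + payload

found = best_alignment(received, sync_seq, len(sync_seq) + max_delay)
print(found == delay)
```

In a real GSM-to-GSM link the codec distorts the sequence, so a normalized correlation and a detection threshold would likely be needed; those are left out of this sketch.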

5. Conclusion
A robust method is proposed for secure data transmission over a GSM voice channel. The method is based on mapping the data onto the fundamental parameters related to formants in a speech-like waveform, including phases, frequencies and pitch, which results in transferring 12 data bits on each speech-like waveform using a frame size of 20 ms.
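The 12-bit frame budget can be checked with a small sketch. The field layout below is one reading of Section 2.2 that adds up to the stated total (3 bits shared by the first and second formant frequencies, 3 for the third, 2 for the fourth, 3 for the phase difference, 1 for the pitch); the field names, their order, and the assignment of the pitch bit to 123 Hz or 125 Hz are assumptions made for illustration only.

```python
FRAME_MS = 20  # frame length in milliseconds

# Assumed field layout (name, width in bits); see the lead-in for caveats.
BIT_FIELDS = [
    ("f1_f2_frequency", 3),
    ("f3_frequency", 3),
    ("f4_frequency", 2),
    ("phase_difference", 3),
    ("pitch", 1),
]
PITCH_HZ = {0: 123, 1: 125}  # assumed mapping of the pitch bit


def pack_frame(values):
    """Pack one frame's field values into a single integer word."""
    word = 0
    for name, width in BIT_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_frame(word):
    """Recover the field values from a packed word."""
    values = {}
    for name, width in reversed(BIT_FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


bits_per_frame = sum(width for _, width in BIT_FIELDS)
bit_rate_bps = bits_per_frame * 1000 // FRAME_MS

example = {"f1_f2_frequency": 5, "f3_frequency": 2, "f4_frequency": 3,
           "phase_difference": 6, "pitch": 1}
round_trip = unpack_frame(pack_frame(example))
print(bits_per_frame, bit_rate_bps, round_trip == example)
```

With 12 bits every 20 ms, the raw rate works out to 600 bps, which matches the bit rate quoted in Section 2.2.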

References
[1] M. Street, "Interoperability and international operation: An introduction to end to end mobile security", IEE Secure GSM and Beyond: End to End Security for Mobile Communications, London, Feb. 2003.
[2] M. Stefanovic, Y. D. Cho, S. Villette, and A. M. Kondoz, "A 2.4/1.2 kb/s speech coder with noise pre-processor", Proceedings EUSIPCO 2000, Tampere, Finland, pp. 4-8, Sept. 2000.
[3] N. Katugampala, S. Villette, and A. Kondoz, "Secure voice over GSM and other low bit rate systems", IEE Secure GSM and Beyond: End to End Security for Mobile Communications, London, Feb. 2003.
[4] J. Degener and C. Bormann, "GSM 06.10 lossy speech compression", ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/gsm-1.0.10.tar.gz.
[5] D. H. Klatt, "Software for a cascade/parallel formant synthesizer", J. Acoust. Soc. Am., vol. 67, no. 3, pp. 971-996, Mar. 1980.
[6] R. J. McAulay and T. F. Quatieri, "Speech analysis/synthesis based on a sinusoidal representation", IEEE Trans. ASSP, vol. 34, pp. 744-754, Aug. 1986.
