Data Mapping onto Speech-like Signal to Transmission over the GSM Voice Channel
3 authors, including:
Pejman Mowlaee
All content following this page was uploaded by Pejman Mowlaee on 25 January 2017.
Abstract- One of the most important objectives in mobile communication systems is secure voice and data communication (including text, picture, video and voice), especially at high bit rates. In this paper, a new procedure is proposed in which the intended data or voice is encrypted and modulated onto speech-like waveforms. The modulated waveforms are transmitted over the global system for mobile communications (GSM) voice channel and then demodulated and decrypted at the receiver. We propose an appropriate model for the GSM Full Rate (FR) speech codec by mapping data onto the fundamental parameters related to formants in a speech-like waveform, including phases, frequencies and pitch frequencies. The proposed model has been evaluated for a GSM-to-GSM connection. Conducting different simulations, we observed that the proposed approach results in a bit error rate (BER) of 0.020o when the Signal-to-Noise Ratio (SNR) is 15 dB in a 1.5 kbps channel. As a result, the proposed method can be considered a favorable choice for robustness to additive noise.

Keywords - formants, GSM, LSF, speech-like waveform.

1. INTRODUCTION

Hardware and protocol deficiencies are two drawbacks of 2nd generation communications systems that make them capable of data transmission only at low bit rates (e.g. 1120 bits per page for Short Message Service (SMS) in the G.7 signaling channel). However, as the data channel is available for a limited number of subscribers, data transmission is still possible up to a maximum rate of 9.6 kbps.
In contrast to the data channel, using a voice channel can result in negligible time delays, as reported in [1]. In addition, one of the most important problems in data transmission over the GSM voice channel is to make sure that the transmitted data is highly secure. To cope with this problem, the bit stream resulting from a low bit rate speech coder implemented for voice channel adaptation usually enters a data encryption block [2]. The data are then modulated onto speech-like waveforms prior to entering the global system for mobile communications (GSM) network. The resulting waveform enters the first GSM handset, passes through the communications channel and then reaches the second GSM handset. The waveform received from the second handset is demodulated, decrypted and finally decoded [3].
The overall system block diagram is shown in Fig. 1.
[Fig. 1: caption not recoverable from the source]
[...] communications, we require modems having data transfer capability at low bit rates. As a result, based on the method proposed in this paper, an appropriate modem is presented. In some recent work by Katugampala reported in [3], codebooks containing the values of speech-like waveform parameters, including pitch frequency, Line Spectral Frequencies (LSF) coefficients and frame energy, are defined on the modulator side. Next, these parameters are used for waveform synthesis. Finally, the encrypted data are mapped onto these waveforms. On the demodulator side, these parameters are derived from the received speech-like waveforms and compared to the codebook, and the best match is chosen [3]. That approach has been adopted for the 12.2 kbps GSM Enhanced Full Rate (EFR) speech codec, whereas the approach proposed in this paper is intended for the 13 kbps GSM FR speech codec as reported in ETSI GSM 06.10 [4].
This paper is organized as follows. In Section 2, the complete procedure of speech-like signal production and data mapping is presented. Section 3 is dedicated to synchronization. In Section 4, simulation results are reported. Section 5 concludes.

2. SPEECH-LIKE SIGNAL PRODUCTION PROCEDURE

We need to map the data bit stream onto speech-like waveforms of 20 ms length (equal to the frame length usually used in the GSM speech coder). Therefore, in this paper, we produce speech-like waveforms based on an auto-regressive (AR) model. Waveforms should be produced with four formants so that they can be
adapted to the GSM coder. As formants are sensitive to changes, and to simplify the extraction of signal characteristics in the demodulator, we prefer to parallelize the resulting transfer function according to the work in [5]. Finally, by applying an excitation signal to the resulting transfer function, an appropriate speech-like signal is produced. The transfer function of the i-th formant corresponds to a second-order difference equation as follows:

[Equation (3), garbled in the source: it relates the formant transfer functions H_1(e^{j\omega_n}), ..., H_4(e^{j\omega_n}) for n = 1, ..., 4.]

where \alpha_n, \beta_n, \rho_n and \lambda_n are the normalized equation coefficients. Finally, the speech-like waveform is obtained from the overall spectral envelope by employing the harmonic model synthesis method based on [6]. The complete process for waveform production can be written as follows:

[Equation (4), garbled in the source: it obtains the amplitude vector a from H_Total(f), defines a term involving N_ana and N_fr, and synthesizes the waveform s as a sum of M cosine components.]

where H_Total(z) is the same parallel transfer function, N_ana is the analysis window length, and N_fr is the frame shift length. Note that the vectors a, f and T in (4) are obtained from the M prominent peaks found through the peak-picking procedure reported in [6]. Fig. 2 shows a prototype speech-like waveform of 20 ms length produced by the harmonic modeling approach discussed above.

[Fig. 2: prototype speech-like waveform, 20 ms length]

One of the key points is to correctly select the formant frequencies within the telephone voice band (300-3400 Hz). During experiments and investigations we concluded that, among the formant parameters mentioned, only the frequencies and phases can be detected after the signal has passed through the speech codec in a voice channel. As a result, we explain in detail how to select the parameters and how to allocate data bits to the frequency and phase parameters. The 1st and 2nd formant frequencies should be selected within the range 300 to 1000 Hz. These frequencies are encoded by 3 bits. The third formant ranges from 1400 to 2500 Hz and is coded by 3 bits, and the fourth formant ranges from 2900 to 3400 Hz and is coded by 2 bits. Note that, since the harmonic model is used in the proposed method, the formant frequencies discussed above should be selected as multiples of the pitch frequency, which results in a negligible error in
extracting information. The selection criteria are as follows:

1. The received signal amplitude should be more than 70 percent of the transmitted amplitude.

2. The frequency displacement of the received formants should not exceed a default frequency step for each formant. Otherwise, it causes incorrect extraction of the information mapped onto the resulting frequencies. Selecting the frequency steps as multiples of the given pitch frequency fulfills this condition. However, a larger frequency step is selected for the 4th formant due to its high sensitivity to displacement.

3. Another important point is the lack of proximity between two adjacent formants. As a result, there are unusable regions in the boundary band between formants. This is due to the fact that the minimum distance between two adjacent formants is twice the bandwidth considered, while their bandwidths are the same. Due to the lack of fidelity of the GSM coder/decoder to formant bandwidths, we only consider a constant and identical bandwidth of \Delta f = 160 Hz in the whole synthesis process.

As phase fidelity only holds for frequencies under 1 kHz, some information should be carried in the phases of the first and second formants. As a result, the difference between the phase extracted from the received signal envelope and the mapped phase at that particular frequency should be coded within 3 bits. Another important parameter is the pitch frequency, whose selection depends on the choice of the synthesis window employed in the harmonic analysis procedure discussed earlier in Section 2.1. We observed that the pitch frequencies fp = 123 Hz and fp = 125 Hz result in acceptable performance. Therefore, we mapped data onto the pitch frequency using 1 bit. Finally, the whole speech-like waveform can be modulated by 12 data bits in a 20 ms frame. In addition, we demonstrate in the simulation results that the proposed technique achieves a bit rate of 600 bps.

2.3. Intra-frame Interpolation

In order to achieve phase continuity, which is an important characteristic of speech signals, and to avoid jumps occurring at frame boundaries, it is necessary to overlap the speech-like waveforms produced with the above approach. It should also be ensured that the data bit streams on the speech-like signal remain undamaged. To this end, it is important to select the pitch period, i.e. 1/fp, which is directly related to the data mapped onto each frame. The GSM codec performs a linear interpolation between the Log Area Ratio (LAR) coefficients of two adjacent frames (each frame consisting of 160 samples). To avoid spurious transients, it interpolates the LAR coefficients of the previous frame with those of the current frame over the first 40 samples [4]. This motivates the idea that adjacent frames should have the minimum overlap. This is because, when a PCM waveform enters a GSM tandem connection, high overlap between frames does not cause tremendous changes in the reflection coefficients of each frame. This, as a result, causes incorrect detection of the transmitted data. Note that the overlapping samples in each frame should not be chosen in order to prevent inter-modulation effects. As a result, (5) presents the linear interpolation for the proposed modulator:

$$
\begin{aligned}
Y_1 &= \frac{1 + a + 160 - n}{2a + 1}, \qquad
Y_2 = \frac{n + a - 160}{2a + 1}, \qquad n = (160 - a), \ldots, 160 \\
Y_3 &= \frac{m + a}{2a + 1}, \qquad
Y_4 = \frac{1 + a - m}{2a + 1}, \qquad m = 1, \ldots, (1 + a)
\end{aligned}
\tag{5}
$$

where a equals the number of overlapping samples in each frame. Note that Y_1 and Y_2 are multiplied by samples (160 - a) to 160 of the last frame, s_{i-1}, and Y_3 and Y_4 are multiplied by samples 1 to (1 + a) of the current frame, s_i, as presented in (6), respectively.

$$
\begin{aligned}
L_1 &= Y_1 \times s_{i-1}(k_1) \\
L_2 &= Y_2 \times s_{i-1}(k_1) \\
L_3 &= Y_3 \times s_i(k_2) \\
L_4 &= Y_4 \times s_i(k_2) \\
\mathrm{Interpolation}_1 &= L_1 + L_2 \\
\mathrm{Interpolation}_2 &= L_3 + L_4 \\
\mathrm{Interpolation} &= \mathrm{Interpolation}_1 + \mathrm{Interpolation}_2
\end{aligned}
\tag{6}
$$

where k_1, k_2 are the numbers of samples interpolated, Interpolation_1, Interpolation_2 are the overlapped
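The weighting functions of (5) can be tabulated with a short script. The following is only a minimal sketch: the function name `overlap_weights` and the overlap value a = 4 are assumptions chosen for illustration and are not taken from the paper. It computes Y1 to Y4 over the index ranges given in (5) and checks that each pair of weights is complementary.

```python
import math

FRAME_LEN = 160  # samples per 20 ms frame, as stated in the text


def overlap_weights(a, frame_len=FRAME_LEN):
    """Return (Y1, Y2, Y3, Y4) of (5) for an overlap of `a` samples."""
    denom = 2 * a + 1
    n_values = range(frame_len - a, frame_len + 1)  # n = (160 - a), ..., 160
    m_values = range(1, a + 2)                      # m = 1, ..., (1 + a)
    y1 = [(1 + a + frame_len - n) / denom for n in n_values]
    y2 = [(n + a - frame_len) / denom for n in n_values]
    y3 = [(m + a) / denom for m in m_values]
    y4 = [(1 + a - m) / denom for m in m_values]
    return y1, y2, y3, y4


# a = 4 is a hypothetical overlap length used only for this illustration.
y1, y2, y3, y4 = overlap_weights(a=4)

# Each pair of weights sums to one at every sample index.
assert all(math.isclose(p + q, 1.0) for p, q in zip(y1, y2))
assert all(math.isclose(p + q, 1.0) for p, q in zip(y3, y4))

print("Y1:", [round(v, 3) for v in y1])
print("Y3:", [round(v, 3) for v in y3])
```

Under these assumptions, Y1 decreases linearly over the tail of the last frame while Y3 increases linearly over the head of the current frame, which is the linear transition across the frame boundary that (5) describes.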
5. Conclusion
A robust method is proposed for secure data transmission over a GSM voice channel. The method is based on mapping the data onto the fundamental parameters related to formants in a speech-like waveform, including phases, frequencies and pitch, which results in transferring 12 data bits on a speech-like waveform with a frame size of 20 ms.
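Two of the figures used in the paper can be checked with a few lines of arithmetic: the bit rate implied by 12 bits per 20 ms frame, and the set of pitch multiples that fall inside each formant band. The sketch below is illustrative only; the helper names `bit_rate` and `harmonics_in_band` are assumptions, and the band limits and the pitch frequency fp = 125 Hz are the values quoted in Section 2.

```python
def bit_rate(bits_per_frame, frame_ms):
    """Bits per second when every frame of `frame_ms` ms carries `bits_per_frame` bits."""
    return bits_per_frame * 1000 / frame_ms


def harmonics_in_band(pitch_hz, low_hz, high_hz):
    """Integer multiples of `pitch_hz` lying inside [low_hz, high_hz]."""
    first = -(-low_hz // pitch_hz)  # ceiling division for integers
    last = high_hz // pitch_hz
    return [k * pitch_hz for k in range(first, last + 1)]


# 12 data bits per 20 ms frame, as stated in the paper.
print("bit rate (bps):", bit_rate(12, 20))

# Formant bands quoted in Section 2, evaluated for fp = 125 Hz.
bands = {
    "formants 1 and 2": (300, 1000),
    "formant 3": (1400, 2500),
    "formant 4": (2900, 3400),
}
for name, (low, high) in bands.items():
    print(name, harmonics_in_band(125, low, high))
```

For the fourth formant band this yields four candidate frequencies, which agrees with the 2 bits allocated to that formant; the exact frequency grids used by the authors are not given in the text, so the lists printed here should be read as one plausible interpretation.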
Reference
[1] M. Street, "Interoperability and international operation:
An introduction to end to end mobile security", IEE Secure
GSM and Beyond: End to End Security for Mobile
Communications, London, Feb., 2003.
[2] M. Stefanovic, Y. D. Cho, S. Villette, and A. M. Kondoz,
"A 2.4/1.2 kb/s speech coder with noise pre-processor",
proceedings EUSIPCO 2000, Tampere, Finland, pp. 4-8,
Sept., 2000.
[3] N. Katugampala, S. Villette, and A. Kondoz, "Secure
voice over GSM and other low bit rate systems," IEE Secure
GSM and Beyond: End to End Security for Mobile
Communications, London, Feb., 2003.
[4] J. Degener and C. Bormann, "GSM 06.10 lossy speech
compression", ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/gsm-1.0.10.tar.gz.
[5] D. H. Klatt, "Software for a cascade/parallel formant
synthesizer", J. Acoust. Soc. Am., vol. 67, no. 3, pp.
971-996, Mar., 1980.
[6] R.J. McAulay and T.F. Quatieri, "Speech
analysis/synthesis based on a sinusoidal representation,"
IEEE Trans. ASSP, vol.34, pp. 744-754, Aug., 1986.