Umeå University
Department of Computing Science
SE-901 87 UMEÅ
SWEDEN
Abstract
This master's thesis report describes the technology and implementation of a system prototype to stream audio books to mobile phones in GPRS and 3G networks. The application on the mobile phone is developed using Java, J2ME. The audio books are streamed from a streaming server to mobile phones in real time. The audio format used for streaming is AMR audio.
Contents
1 Introduction
1.1 Background
1.2 Goal
1.3 Purpose
1.4 Methods
1.5 Tools
1.5.1 J2ME
2 Pre-study
3 Requirements
4.1 GPRS
4.2 EDGE
4.3 3G
4.4 Mobile network architecture
5 Streaming technology
6 Results
7 User's Guide
8 Conclusions
9 Acknowledgments
References
B Abbreviations
List of Figures
6.7 An overview of the buffering and playback of the audio
Chapter 1
Introduction
This is the final report for the master's thesis project to develop a system prototype for delivering real-time streaming of audio books to mobile phones supporting Java[20].
1.1 Background
During the last few years the interest in listening to audio books in Sweden has increased. In 2004 over 160 new Swedish audio book titles were published and the total turnover for the audio book business was 96.4 million Swedish crowns, according to statistics from Förläggareföreningen[7].
Through new technology and higher Internet bandwidth, distribution of audio books in mp3 format (MPEG-1 Audio Layer 3) to computers has started. This, in combination with the fact that ordinary audio books do not use DRM (Digital Rights Management), has put the audio book publishing houses in a tricky situation concerning pirate copying versus new technology. The pirate copying of audio books has increased dramatically and costs the publishing houses a lot of money every year.
In cooperation with Bonnier Audio[4], the largest audio book publishing house in Sweden, the idea arose of creating a system prototype for delivering real-time streaming of audio books to the mobile phone. In such a system the mobile phone users could have easy access to audio books without getting an actual copy of the audio data that could potentially be saved and redistributed, because no data is permanently stored on the mobile phone. This could be one way for the publishing houses to take advantage of the new technology to reach new users without having to worry about pirate copying.
1.2 Goal
The goal of this thesis project is to develop a system prototype to deliver real-time streaming of audio books to mobile phones supporting Java. The main task will be to implement streaming using J2ME[19] (Java 2 Platform, Micro Edition), a version of Java that is used in most modern mobile phones. More concretely, this means that the following protocols need to be implemented in J2ME: RTSP[11], SDP[12] and RTP[10]. These protocols are described in the chapter "Streaming technology".
Another goal was that the application on the mobile phone should be easy to download, install and use, and should run independently of the mobile phone's operating system.
1.3 Purpose
Since more and more people seem to like the idea of listening to audio books while on the move, an application for this purpose in the mobile phone, which people often carry around, would be a great feature. The purpose of this thesis project is to develop a system prototype for such a feature using J2ME.
1.4 Methods
The following steps were used as the method for this project:
- Use case scenarios: To get a better overview of how the system should work and how the user should interact with the application on the mobile phone, use case scenarios were used.
- Technology: An in-depth study of the technology used for streaming had to be done.
- Design and implementation: Design, implementation and testing of the system.
- Documentation: Writing the final thesis report.
1.5 Tools
The following tools were used during the development of the system:
- Platform: Windows XP
- Programming language: Java, J2ME
- Editor: jEdit
- Version control system: CVS
- Stream server: Darwin Streaming Server[16]
- Database: MySQL[9]
- Servlet container: Tomcat[6]
- Mobile phone: Nokia 6680
- Audio books: A few audio books from Bonnier Audio.
- Network sniffing program: Ethereal[5]
- Mobile phone simulation environment: Sony Ericsson J2ME SDK 2.2.0[1]
1.5.1 J2ME
J2ME stands for Java 2 Platform, Micro Edition. This is a version of Java that is designed for devices with limited memory, display and processing power. Mobile phones and PDAs (Personal Digital Assistants) are examples of such devices.
The configuration that is used is CLDC (Connected, Limited Device Configuration). A configuration defines the Java language features and the core Java libraries of the JVM (Java Virtual Machine).
The JVM is a virtual machine that runs the Java applications. It translates the class files into machine code for the platform where the JVM is running. It is the use of a virtual machine that makes Java independent of underlying operating systems.
When using CLDC the virtual machine is called KVM (K Virtual Machine). It is a virtual machine designed for limited devices.
Apart from the configuration, CLDC, a profile is needed to further specify what kind of device the application will operate on. The profile is like an extension of the configuration. The profile used in this project is MIDP 2.0 (Mobile Information Device Profile).
Figure 1.1 shows the hierarchy of the virtual machine (KVM), the configuration (CLDC) and the profile (MIDP 2.0).
[Figure 1.1: Layers from top to bottom: MID Profile, CLDC Core Libraries, K Virtual Machine (KVM), Host operating system]
To play back sound using J2ME an API called MMAPI (Mobile Media API) is needed.
MMAPI
- Player. A Player accepts and decodes the media data. It is neutral to whatever media data it receives: there is no difference between a Player for audio data and one for video data. Differences between audio and video data are handled by the controls associated with the Player.
- Controls are used to render media of a specific type. Every media type requires or recommends that one or more controls are added to the Player. An example of a control is a volume control for audio data.
- DataSource. The DataSource provides protocol handling and methods that control media playback and synchronization. It hides the details of how the media data is read from its source to the Player. For example, the media can be read from HTTP[13], a file or other mechanisms.
- Manager. The Manager puts all these pieces together by letting the user create Players and associate them with a DataSource.
Chapter 2
Pre-study
In the early stage of the pre-study, before any implementation decisions were made, some use case scenarios were written to point out the issues that can arise when streaming audio in a mobile environment.
The next stage was to choose a mobile application platform. To make the application easy to install and independent of underlying operating systems, a decision was made to implement the application in J2ME. Another contributing factor for the choice of J2ME was that most modern mobile phones support J2ME.
When the platform had been chosen the focus turned to investigating the pros and cons of different streaming protocols. To make the solution independent of underlying transport protocols a decision was made to use RTSP (Real-Time Streaming Protocol) in combination with RTP (Real-Time Transport Protocol). These protocols are standard in most streaming servers. However, these protocols have some limitations in their ability to pass through firewalls and networks with NAT/NAPT (Network Address Translation/Network Address Port Translation), which can be a major issue when streaming data in mobile networks.
The trade-off between quality and bandwidth requirements of the audio data is a big issue when streaming data in a mobile environment. The system prototype required an audio format that was optimized for speech and demanded low bandwidth.
Two formats stood out, AMR[3] (Adaptive Multi-Rate codec) and Speex[18], both built on CELP (Code Excited Linear Prediction) technology. Speex is an open-source, patent-free audio compression format, in contrast to the proprietary AMR format patented by VoiceAge. Both are optimized for speech and have low bandwidth requirements.
The fact that Speex is still under development means that it suffers from some major drawbacks concerning efficiency on devices with low processing capacity. AMR on the other hand has been used for a very long time and is supported in the hardware of all mobile phones with MMS capability. After careful consideration a decision was made to use AMR as the audio format.
Figure 2.1 shows the difference in size between WAV, MP3 and AMR (AMR-NB) audio. The three blocks represent an audio book CD in these audio formats. The sizes of these files are:
- WAV: 357.25 MB
- MP3: 48.61 MB
- AMR: 14.77 MB
The AMR file is approximately 4.1% of the size of the WAV file and approximately 30.4% of the size of the MP3 file. This makes AMR audio the best choice of these three audio types for this system.
Figure 2.1: Difference in size between WAV, MP3 and AMR audio (bar chart; vertical axis: Size (MB), 0 to 400; bars: WAV, MP3, AMR)
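The size comparison can be checked with a short calculation. The following is a minimal sketch; the class and method names are illustrative and not part of the thesis system.

```java
final class SizeRatio {
    /** Returns the size of one file as a percentage of the size of another. */
    static double percentOf(double partMb, double wholeMb) {
        return 100.0 * partMb / wholeMb;
    }

    public static void main(String[] args) {
        // File sizes in MB, taken from the comparison of WAV, MP3 and AMR.
        double wav = 357.25;
        double mp3 = 48.61;
        double amr = 14.77;
        System.out.println("AMR as % of WAV: " + percentOf(amr, wav));
        System.out.println("AMR as % of MP3: " + percentOf(amr, mp3));
    }
}
```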
Chapter 3
Requirements
To get organized and to keep focus on the main tasks during this project some requirements were set. The requirements have been divided into two parts, SR (system requirements) and AR (application requirements).
- AR6: The application should have an audio book toplist and an offers-list.
- AR7: Bookmarks and information about books that have been bought should be stored in the memory of the mobile phone.
- AR8: From the application a user should be able to play, pause, rewind, fast forward, set a bookmark and change the volume.
- AR9: The application should be able to resume listening from where the user last stopped listening.
- AR10: The application should have support for multiple languages.
[Figure: system overview with the labels Mobile phone, 3G/GPRS network, Internet and Stream server]
As the application should be able to run in GPRS[17], EDGE[17] and 3G[17][8] networks, this chapter explains these technologies and the differences between them.
4.1 GPRS
GPRS (General Packet Radio Service) is an additional component to the existing GSM (Global System for Mobile Communications) architecture that supports packet-switched protocols for transferring data.
GPRS was introduced because the existing GSM network was not suited to handle packet-switched data in an efficient way. GPRS has two main components, Coding Schemes (up to four schemes) and Time Slots (up to eight slots).
GPRS theoretically consists of four channel coding schemes, CS-1, CS-2, CS-3 and CS-4. The schemes have different properties: CS-1 has the most efficient error correction and is suited for use when the quality of the radio link is poor. CS-4 has no error correction and should only be used in very good conditions. Today mostly CS-1 and CS-2 are implemented in actual mobile networks.
GPRS allows several mobile stations (usually mobile phones) to share the same frequency by dividing it into different timeslots. Due to the packet-switched characteristics of GPRS the allocation of the available timeslots may vary from one instant to the next (e.g. a mobile station may have eight timeslots at one time and four later on). This allows multiple users to share the same transmission medium by using only the part of the bandwidth they require. This could potentially be a problem when dealing with real-time streaming, where one would like to have a steady bit rate at all times.
The main features of GPRS are speed, immediacy and better utilization of network resources. By using multiple timeslots simultaneously and more efficient algorithms for channel coding, GPRS can achieve higher data rates (speed) than GSM. Immediacy means that no "dial-up procedure" must be used as in circuit-switched data networks. The data is instead transferred in packets and routed individually, which means that there is no need to establish connections between the network nodes. Therefore the data can be transferred almost immediately to the mobile station upon request.
For the end user this also makes billing more flexible, because the user only has to pay for the actual data transferred and not for connection time. GPRS can also use timeslots that are left over from circuit-switched connections to transfer data, which improves the utilization of the radio resources in the network.
GPRS can theoretically transfer data at 171.2 kbps, by using all eight timeslots simultaneously and a channel coding with reduced error correction.
4.2 EDGE
EDGE (Enhanced Data Rates for Global Evolution) is an upgrade of the GSM and GPRS network that improves the air interface between a mobile station, for example a mobile phone, and a base station. Using EDGE the speed of transferring data is improved in both packet-switched and circuit-switched connections.
The network's capacity will grow using EDGE because it makes it possible for more users to share the same timeslots. EDGE can also share timeslots with conventional GPRS networks, which improves the utilization of the radio resources.
EDGE and GSM/GPRS operate on the same frequency, but they use different radio channel modulations and protocols. EDGE uses 8-Phase Shift Keying, 8PSK, and GSM/GPRS uses Gaussian Minimum Shift Keying, GMSK. 8PSK is usually more effective than GMSK, since it carries three bits per modulated symbol where GMSK carries one.
Where GSM/GPRS uses four different coding schemes, CS-1 to CS-4, EDGE uses nine different schemes, called MCS-1 to MCS-9. The first four schemes use GMSK and are effective in conditions where the data rate is poor. The last five schemes use 8PSK and offer higher data rates. The first four schemes give data rates from 8.8 kbps to 17.6 kbps per timeslot and the last five give data rates from 22.4 kbps to 59.2 kbps per timeslot.
The theoretical maximum data rate using EDGE with eight timeslots is 473.6 kbps.
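The theoretical maxima quoted for GPRS (171.2 kbps) and EDGE (473.6 kbps) both assume eight timeslots, so the per-timeslot rate and the effect of losing timeslots can be worked out directly. The sketch below is illustrative only; the class and method names are not part of the thesis system.

```java
final class SlotRates {
    /** Per-timeslot rate in kbps, given the aggregate rate over a number of timeslots. */
    static double perSlotKbps(double totalKbps, int timeslots) {
        if (timeslots <= 0) {
            throw new IllegalArgumentException("timeslots must be positive");
        }
        return totalKbps / timeslots;
    }

    /** Aggregate rate in kbps when a mobile station holds the given number of timeslots. */
    static double aggregateKbps(double perSlotKbps, int timeslots) {
        return perSlotKbps * timeslots;
    }

    public static void main(String[] args) {
        double gprsSlot = perSlotKbps(171.2, 8); // GPRS with the least protected coding scheme
        double edgeSlot = perSlotKbps(473.6, 8); // EDGE with the fastest scheme
        System.out.println("GPRS per timeslot: " + gprsSlot + " kbps");
        System.out.println("EDGE per timeslot: " + edgeSlot + " kbps");
        // If the allocation drops from eight timeslots to four, the rate halves.
        System.out.println("GPRS with four timeslots: " + aggregateKbps(gprsSlot, 4) + " kbps");
    }
}
```

This illustrates why a varying timeslot allocation matters for streaming: the available bit rate scales with the number of timeslots the network grants at any moment.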
4.3 3G
3G is short for third-generation mobile system. One 3G system is UMTS[8] (Universal Mobile Telecommunications System). The radio interface of UMTS is completely different from that of GSM/GPRS.
UMTS uses a technology called WCDMA (Wideband Code Division Multiple Access) that does not use timeslots like GSM/GPRS. Instead the devices that use WCDMA, such as mobile phones, share the same frequency and are separated from each other by the use of codes (so-called spreading codes).
Because of this, UMTS networks need different base station subsystems than GSM/GPRS networks. UMTS base station subsystems are called UTRAN (UMTS Terrestrial Radio Access Network).
UMTS uses the same GPRS core network as GSM/GPRS systems, and UMTS mobile stations are also backward compatible with GSM/GPRS so that they can use GSM/GPRS if there is no UMTS connection available.
UMTS has much higher bit rates than GSM, GPRS and EDGE, with theoretical maximum speeds of 384 kbps for circuit-switched connections (voice and video calls) and 2 Mbps for packet-switched connections (data and Internet connections).
UMTS has four different quality-of-service classes. The list below gives examples of applications that use the different classes.
- Conversational class: For real-time services like voice and video calls and real-time gaming.
- Streaming class: For streaming multimedia content.
- Interactive class: For web browsing and non-real-time gaming.
- Background class: For background download of emails, for example.
The first two classes are transmitted as real-time connections over the WCDMA air interface and the last two are transmitted as scheduled non-real-time packet data. The conversational and the streaming classes are for services that need lower response times and higher throughput than those in the interactive and the background classes.
4.4 Mobile network architecture
Figure 4.1 below gives a simplified view of how the mobile technologies (GPRS/EDGE/3G) link together as a network. Both UTRAN and GSM BSS (GSM Base Station Subsystem) share the same GPRS backbone to send and receive data. From the GPRS backbone the data is passed through a firewall to reach the Internet and the streaming server.
[Figure 4.1: 3G mobiles connect through UTRAN and GPRS/EDGE mobiles through GSM BSS to the GPRS backbone, which connects through a firewall to the Internet and the streaming server]
Chapter 5
Streaming technology
This chapter describes the streaming technology used in the system.
[Figure: protocol stack; recoverable labels: payload formats, SDP, TCP, UDP, IP]
5.1.1 RTSP
RTSP stands for Real-Time Streaming Protocol. RTSP is used for establishment and control of time-synchronized streams of continuous media such as audio and video. One could think of RTSP as a remote control for multimedia servers. To make the interaction between RTSP servers and clients flexible, the protocol does not have a notion of a connection. Instead an RTSP server maintains a session identifier associated with the media streams and their state. This means that during an RTSP session a client may use many different types of reliable transport protocols to issue RTSP requests to the RTSP server as long as it knows the session identifier.
The multimedia streams controlled by RTSP are not required to use any specific transport protocol to deliver the media. This makes RTSP very general and extensible. However, the most common transport protocol for the media in use with RTSP is RTP (Real-Time Transport Protocol).
RTSP is text-based and has to a high degree inherited its design from HTTP/1.1. This makes it easy to read, understand and debug. However, RTSP differs in some key aspects from HTTP:
- RTSP uses a session concept in the protocol.
- An RTSP server needs to maintain state, as opposed to the stateless nature of HTTP.
- Both an RTSP server and client can issue requests.
- The Request-URI always contains the absolute URI.
The fact that RTSP is text-based (UTF-8) means that it is vulnerable to bit errors and should not be exposed to them. The messages can be carried over any low-level transport protocol that is 8-bit clean. Every line is terminated by CRLF (carriage return followed by a line feed). The basic structure of the text-based messages in RTSP is shown in figure 5.2. Explanation to figure 5.2:
[Figure 5.2: Message type | General header* | Message type header* | Entity header* | CRLF | [Message-body]]
As mentioned before, RTSP request messages may be issued by either the client or the server. Figure 5.3 below shows the structure of a request message.
[Figure 5.3: Request-Line | General header* | Request header* | Entity header* | CRLF]
The most important part of the request message is the so-called Request-Line, see figure 5.4. The Request-Line is the first line of a request message and consists of the Method, the Request-URI and the protocol version.
Figure 5.4: Structure of the RTSP Request-Line. WS stands for whitespace and CRLF means carriage return followed by a line feed.
- DESCRIBE
- GET_PARAMETER
- OPTIONS
- PAUSE
- PLAY
- PING
- REDIRECT
- SETUP
- SET_PARAMETER
- TEARDOWN
Below is an example of how a request method could be used by a client. The client would like to know which methods the server supports and sends the following request message.
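As a hedged illustration of such a request, the sketch below assembles a header-only OPTIONS message from a Request-Line, a CSeq header and the terminating empty line. The class name, the URI `rtsp://example.com/book.amr` and the sequence number are assumptions made for this example; they are not taken from the thesis system.

```java
final class RtspRequest {
    private static final String CRLF = "\r\n";

    /** Builds a header-only RTSP/1.0 request: Request-Line, CSeq header, empty line. */
    static String build(String method, String uri, int cseq) {
        // StringBuffer is used because it is also available on CLDC devices.
        StringBuffer sb = new StringBuffer();
        // Request-Line: Method WS Request-URI WS protocol version CRLF
        sb.append(method).append(' ').append(uri).append(' ').append("RTSP/1.0").append(CRLF);
        // CSeq pairs each request with its response.
        sb.append("CSeq: ").append(cseq).append(CRLF);
        // An empty line ends the header section.
        sb.append(CRLF);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical URI, for illustration only.
        System.out.print(build("OPTIONS", "rtsp://example.com/book.amr", 1));
    }
}
```

The same helper could be reused for the other methods by adding further headers (for example Transport or Session) before the empty line.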
For more about the methods and how they link together, see the section "RTSP Methods" in this chapter.
A response message is the reply to an RTSP request message. Figure 5.5 below shows the structure of such a message.
[Figure 5.5: Status-Line | General header* | Response header* | Entity header* | CRLF]
The first line of a response message is the Status-Line, see figure 5.6. The Status-Line consists of the protocol version followed by a numeric status code. Each status code is associated with a textual phrase, a so-called Reason-Phrase.
Figure 5.6: The Status-Line of the RTSP response message. WS stands for whitespace and CRLF means carriage return followed by a line feed.
The status codes of response messages are divided into the general groups listed below:
- 1XX - Informational
- 2XX - Success
- 3XX - Redirection
- 4XX - Client Error
- 5XX - Server Error
Each of these groups is divided into specific messages that are used to give more specific information. For example a server may send the client "RTSP/1.0 551 Option not supported" when the client requests an option that is not implemented. For a complete list see Appendix A.
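A client has to split the Status-Line into its three parts and decide from the first digit of the status code how to react. The following is a minimal sketch of that step; the class `RtspStatus` is a hypothetical helper, not code from the thesis.

```java
final class RtspStatus {
    final String version;
    final int code;
    final String reason;

    private RtspStatus(String version, int code, String reason) {
        this.version = version;
        this.code = code;
        this.reason = reason;
    }

    /** Parses a Status-Line such as "RTSP/1.0 200 OK" (without the trailing CRLF). */
    static RtspStatus parse(String line) {
        int first = line.indexOf(' ');
        int second = (first < 0) ? -1 : line.indexOf(' ', first + 1);
        if (first < 0 || second < 0) {
            throw new IllegalArgumentException("malformed Status-Line: " + line);
        }
        String version = line.substring(0, first);
        int code = Integer.parseInt(line.substring(first + 1, second));
        // The Reason-Phrase may itself contain spaces, so take the rest of the line.
        String reason = line.substring(second + 1);
        return new RtspStatus(version, code, reason);
    }

    /** Maps the first digit of the status code to its general group. */
    String group() {
        switch (code / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default: return "Unknown";
        }
    }
}
```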
RTSP Methods
The most important RTSP methods are described in the following section in order to give a more practical view of RTSP. Figure 5.7 shows an overview of a streaming session using RTSP, SDP and RTP.
[Figure 5.7: Message sequence over time between streaming client and streaming server: DESCRIBE, RTSP/1.0 200 OK (with SDP description); SETUP, RTSP/1.0 200 OK; PLAY, RTSP/1.0 200 OK]
DESCRIBE
The DESCRIBE method retrieves the description of a presentation or media object identified by the request URI from a server. The DESCRIBE request-response pair can be seen as an initialization phase of RTSP. The description of the media in the example below is in SDP (Session Description Protocol) format. For easier reading the CRLF ending of every line will only be printed in this example.
SETUP
The SETUP request for a URI specifies the transport mechanism to be used for the streamed media. The client specifies its acceptable transport parameters in the Transport header. The server response will contain the transport parameters for both client and server, including an SSRC (synchronization source) number used by RTCP[10]. Even port numbers will be used for transmitting the data and odd port numbers will be used by RTCP.
PLAY
The PLAY request tells the server to start streaming the data to the client. The server will send the data according to the transport parameters agreed upon in the SETUP request. By including the session identification number, retrieved from the SETUP response, the client lets the server know what data to send. In the example below, a Range header is also included that tells the server to stream a specific interval of the media. The server response to the PLAY request, in this example, also includes an RTP-Info header. The RTP-Info header consists of a semicolon-separated string.
PAUSE
The PAUSE request causes the stream to be halted temporarily. The server's resources are kept until a PLAY request is sent to the server or the session times out.
TEARDOWN
The TEARDOWN request stops the stream for the given URI, freeing the resources associated with it on the server side.
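The way these methods link together can be summarised as a small client-side state machine. The sketch below is a simplification made for illustration: it tracks only three states and rejects transitions that the full protocol would allow (for example a second SETUP in the ready state). The class name and state names are assumptions, and the code is written for standard Java rather than for a CLDC device.

```java
final class RtspSession {
    enum State { INIT, READY, PLAYING }

    private State state = State.INIT;

    State state() {
        return state;
    }

    /** Applies a request that the server answered with 200 OK and returns the new state. */
    State onSuccess(String method) {
        if ("DESCRIBE".equals(method) || "OPTIONS".equals(method)) {
            return state; // informational requests do not change the session state
        }
        if ("TEARDOWN".equals(method)) {
            state = State.INIT;      // server resources are freed
        } else if ("SETUP".equals(method) && state == State.INIT) {
            state = State.READY;     // transport agreed, nothing streamed yet
        } else if ("PLAY".equals(method) && state == State.READY) {
            state = State.PLAYING;   // server starts sending data
        } else if ("PAUSE".equals(method) && state == State.PLAYING) {
            state = State.READY;     // stream halted, resources kept
        } else {
            throw new IllegalStateException(method + " not allowed in state " + state);
        }
        return state;
    }
}
```

A typical session then runs DESCRIBE, SETUP, PLAY, optionally PAUSE and PLAY again, and finally TEARDOWN.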
5.1.2 RTP
RTP stands for Real-Time Transport Protocol. It is a protocol for streaming data in real time. RTP delivers the data from the server to the party that requested the data.
RTP does not ensure that the packets will be delivered to the receiver in the right order. That is up to the receiver to ensure. To handle this all packets have a sequence number that increments by one for every RTP packet sent.
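Because the RTP sequence number is a 16-bit field, it wraps around from 65535 to 0, so a receiver cannot order packets with a plain numeric comparison. The following is a minimal sketch of a wrap-aware comparison, assuming that packets are never reordered by more than half the sequence number space; the class and method names are illustrative.

```java
final class RtpSeq {
    /** True if sequence number a comes before b, allowing for 16-bit wrap-around. */
    static boolean isBefore(int a, int b) {
        // Forward distance from a to b, modulo 2^16.
        int diff = (b - a) & 0xFFFF;
        // b is "ahead" of a if the forward distance is non-zero and less than half the range.
        return diff != 0 && diff < 0x8000;
    }
}
```

A receiver could use such a comparison to insert each arriving packet at the right position in its buffer before the audio data is handed to the player.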
Another thing RTP does not ensure is timely delivery or other quality-of-service guarantees. For this it relies on lower-layer services.
The structure of an RTP packet header is shown in figure 5.8. The header is followed by the payload, which contains the media that is streamed. Each number at the top of the figure represents one bit.
[Figure 5.8: RTP packet header, 32 bits per row]
Bits 0-1: V=2 | bit 2: P | bit 3: X | bits 4-7: CC | bit 8: M | bits 9-15: PT | bits 16-31: Sequence number
Timestamp
Synchronization source (SSRC) identifier
Contributing source (CSRC) identifiers
...
The fixed header does not contain the CSRC (contributing source) identifiers. CSRC identifiers are present only when a mixer has combined the data of several sources into one packet. The audio book system in this project has a single source and a single receiver in each session, so no CSRC identifiers are necessary.
- V (2 bits) = Version. Describes which version of RTP this packet uses. In the figure the version is 2.
- P (1 bit) = Padding. If this bit is set to 1 the RTP packet has at least one padding octet at the end. The last octet of the packet gives the number of padding octets in the packet.
- X (1 bit) = Extension. If the extension bit is set to 1 the fixed header is followed by one header extension. That is not relevant in this project but more information about it can be found in [10].
- CC (4 bits) = CSRC count. The CC contains the number of CSRC identifiers contained in the RTP header. The CSRC identifiers follow the fixed header.
- M (1 bit) = Marker. How to interpret the marker bit is defined by a profile. For example, events like frame boundaries can be marked this way.
- PT (7 bits) = Payload type. The PT holds information about what kind of payload the packet contains. In this system the PT identifies that AMR audio is streamed.
- Sequence number (16 bits). The sequence number increments by one for each packet sent by the server. The initial value of the sequence number is random and unpredictable, which makes the protocol more secure. The sequence number is used by the receiver to put all the received packets in the right order.
- Timestamp (32 bits). The timestamp reflects the sampling instant of the first octet of data in the payload of the RTP packet. The initial value is random, and it increases linearly and monotonically in time.
- SSRC (32 bits). SSRC identifies the synchronization source. It is chosen randomly so that two or more synchronization sources in the same RTP session are unlikely to get the same SSRC identifier.
- CSRC (0-15 items, 32 bits each). This list of CSRC identifiers identifies the contributing sources for the payload contained in this packet.
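The field layout above translates directly into bit masks and shifts on the first twelve bytes of a received packet. The sketch below shows one possible way to decode the fixed header; the class `RtpHeader` is a hypothetical helper and not the thesis implementation, and header extensions are deliberately ignored.

```java
final class RtpHeader {
    final int version;
    final boolean padding;
    final boolean extension;
    final int csrcCount;
    final boolean marker;
    final int payloadType;
    final int sequenceNumber;
    final long timestamp;
    final long ssrc;

    private RtpHeader(byte[] p) {
        version = (p[0] >> 6) & 0x03;        // V: top two bits of the first byte
        padding = (p[0] & 0x20) != 0;        // P
        extension = (p[0] & 0x10) != 0;      // X
        csrcCount = p[0] & 0x0F;             // CC: low four bits
        marker = (p[1] & 0x80) != 0;         // M: top bit of the second byte
        payloadType = p[1] & 0x7F;           // PT: low seven bits
        sequenceNumber = ((p[2] & 0xFF) << 8) | (p[3] & 0xFF);
        timestamp = readUint32(p, 4);
        ssrc = readUint32(p, 8);
    }

    /** Parses the 12-byte fixed RTP header at the start of a packet. */
    static RtpHeader parse(byte[] packet) {
        if (packet.length < 12) {
            throw new IllegalArgumentException("packet shorter than the 12-byte fixed header");
        }
        return new RtpHeader(packet);
    }

    /** Offset of the payload: fixed header plus 4 bytes per CSRC identifier. */
    int payloadOffset() {
        return 12 + 4 * csrcCount;
    }

    /** Reads a big-endian unsigned 32-bit value into a long. */
    private static long readUint32(byte[] p, int off) {
        return ((p[off] & 0xFFL) << 24) | ((p[off + 1] & 0xFFL) << 16)
                | ((p[off + 2] & 0xFFL) << 8) | (p[off + 3] & 0xFFL);
    }
}
```

The timestamp and SSRC are stored in `long` fields because Java has no unsigned 32-bit integer type.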
The implementation of RTP in this project has UDP[15] (User Datagram Protocol) as the underlying transport protocol. UDP in turn runs on top of IP[14] (Internet Protocol). See figure 5.9 for an overview of the different layers that are used in the RTP implementation.
[Figure 5.9: Protocol layers from top to bottom: RTP, UDP, IP]
5.1.3 RTCP
There was no time to implement this protocol, and RTCP is not strictly necessary for the application. But RTCP is often used in systems that use RTP, so here is a short description of it. For further information about RTCP, see [10].
RTCP stands for RTP Control Protocol. As the name indicates it is used as a control protocol while streaming with RTP. RTCP is used to monitor the quality of service in the streaming session. It is also used to convey information about the members in a streaming session.
Using RTCP it is possible to monitor delay and bandwidth quality and to gather statistics. This can be used to deliver the best quality possible to the members in a session. For example, if there is a video conference with three participants and one of them does not have as good bandwidth as the other two, that member can receive its data in lower quality than the others, using a so-called mixer.
Since there will only be one member in each streaming session of the audio book system, one mobile phone that receives the data for each specific stream, the use of RTCP is not as necessary as in sessions with more than one member. For now the implementation of RTCP is left for future work.
5.2 Session Description Protocol
SDP (Session Description Protocol) is used to describe general multimedia sessions. The protocol describes the media a user wants to receive, such as audio, video or both, which codecs to use and so on. SDP is used in the DESCRIBE request-response pair to describe the media session of the audio book the user wants to listen to.
An SDP session description consists of a number of text lines of the form Type=Value. Type is always exactly one character and is case-significant. The different types that are used are:
5.3.1 AMR
The header of the AMR payload within the RTP packets is shown in figure 5.10. It is one byte long.
Figure 5.10: Header of AMR and AMR-WB payload within an RTP packet (bit 0: F | bits 1-4: FT | bit 5: Q | bits 6-7: P P)
- F (1 bit). If this bit is set to 1 it indicates that this frame is followed by another frame in this payload; otherwise it is set to 0.
- FT (4 bits) = Frame type. This indicates which AMR or AMR-WB[2] (Wide-Band) speech coding mode or comfort noise mode (using SID, Silence Descriptor) the frame is in. From this it is possible to look up the bit rate, and thereby the size, of the frame.
- Q (1 bit) is a frame quality indicator. If Q is set to 0 it means that the frame is severely damaged.
- P (2 bits) are padding bits and must be set to zero.
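The one-byte header can be decoded with three bit operations. The sketch below follows the bit layout given in the list above (F in the most significant bit, then four FT bits, the Q bit and two padding bits); the class name `AmrTocEntry` is an assumption made for this example.

```java
final class AmrTocEntry {
    final boolean followed;   // F: another frame follows in this payload
    final int frameType;      // FT: 4-bit frame type (coding mode or comfort noise)
    final boolean qualityOk;  // Q: false means the frame is severely damaged

    /** Decodes one header byte; only the low eight bits of the argument are used. */
    AmrTocEntry(int headerByte) {
        followed = (headerByte & 0x80) != 0;
        frameType = (headerByte >> 3) & 0x0F;
        qualityOk = (headerByte & 0x04) != 0;
    }
}
```

For example, the header byte 0xBC (binary 1011 1100) decodes to F = 1, FT = 7 and Q = 1.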
If the RTP payload only contains one AMR or AMR-WB frame, the actual AMR/AMR-WB audio data follows after the header. If it contains more than one frame the payload is structured as in figure 5.11.
Figure 5.11: The structure of AMR and AMR-WB payload within an RTP packet if the payload contains more than one AMR or AMR-WB frame (the one-byte headers Header1 to Header4 come first, followed by the variable-length frames Frame 1 to Frame 4)
- Header1 - Header4 are all headers of the type shown in figure 5.10. Each header is one byte long.
- Frame 1 - Frame 4 contain the AMR/AMR-WB audio data. Frame 1 is the payload data corresponding to Header1 and so forth. The frames are marked with "..." because they vary in size. The size of a frame depends on the bit rate of the AMR/AMR-WB audio.
5.3.2 AMR-WB
AMR-WB (Wide-Band) has much better sound quality than AMR audio. But AMR-WB needs more bandwidth because it contains more data.
The bit rate of AMR-WB audio varies from 6.60 kbit/s up to 23.85 kbit/s.
The header of the AMR-WB payload within an RTP packet is the same as for AMR audio, see figure 5.10. Also, as in the case of AMR audio, AMR-WB audio can be sent with more than one frame per RTP packet, see figure 5.11.
5.3.3 AMR RTP data to AMR audio data
The AMR/AMR-WB format is different when streamed as payload within an RTP packet from when it is played back as an audio file. This means that the AMR/AMR-WB audio data received from the RTP packets needs to be converted to the format used when playing an AMR/AMR-WB audio file.
A header that identifies the audio type is needed at the beginning of the file. For AMR audio the header looks like figure 5.12 and for AMR-WB like figure 5.13. Each character in the two headers is one byte long.
[Figure 5.12: #!AMR\n]
[Figure 5.13: #!AMR-WB\n]
The audio data from the RTP packets needs to be appended to the header. Each audio frame needs a header that looks like figure 5.14, where each character represents one bit. P stands for padding, T for payload type and V for valid. The frame header is one byte long.
[Figure 5.14: PTTTTVPP]
The audio data frames need to be packed in big-endian order, that is with the most significant bit of each byte as the first bit.
Figure 5.15 shows how the headers and the audio data are built up into an AMR audio file. The figure does not show the size of the file or of the different parts in it, only the structure.
[Figure 5.15: #!AMR\n, followed by Header1, Frame1, Header2, Frame2, ..., HeaderN, FrameN]
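The conversion from the payload layout of figure 5.11 to the file layout of figure 5.15 can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: it handles narrow-band AMR only, it assumes the payload starts directly with the one-byte headers exactly as described in this chapter (payloads following the standard RTP format for AMR carry an additional leading byte that would have to be skipped first), and the frame sizes in `FRAME_BYTES` are the standard AMR-NB speech data sizes per frame type rather than values taken from this report.

```java
import java.io.ByteArrayOutputStream;

final class AmrFileBuilder {
    // File header that identifies narrow-band AMR audio.
    private static final byte[] MAGIC = {'#', '!', 'A', 'M', 'R', '\n'};
    // Speech data bytes per frame for AMR-NB frame types 0-7 and SID (type 8).
    private static final int[] FRAME_BYTES = {12, 13, 15, 17, 19, 20, 26, 31, 5};

    /** Starts a new AMR file image containing only the file header. */
    static ByteArrayOutputStream start() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(MAGIC, 0, MAGIC.length);
        return out;
    }

    /** Appends the frames of one payload, each preceded by its one-byte frame header. */
    static void appendFrames(byte[] payload, ByteArrayOutputStream out) {
        // Step 1: count the headers. The F bit is 1 on every header except the last.
        int headers = 0;
        boolean more = true;
        while (more) {
            if (headers >= payload.length) {
                throw new IllegalArgumentException("payload ends inside the header list");
            }
            more = (payload[headers] & 0x80) != 0;
            headers++;
        }
        // Step 2: copy each frame, prefixed by its header with F and padding bits cleared.
        int pos = headers;
        for (int i = 0; i < headers; i++) {
            int frameType = (payload[i] >> 3) & 0x0F;
            if (frameType >= FRAME_BYTES.length) {
                throw new IllegalArgumentException("unsupported frame type " + frameType);
            }
            int size = FRAME_BYTES[frameType];
            if (pos + size > payload.length) {
                throw new IllegalArgumentException("payload too short for frame " + (i + 1));
            }
            out.write(payload[i] & 0x7C); // keeps the frame type and quality bits only
            out.write(payload, pos, size);
            pos += size;
        }
    }
}
```

Calling `start()` once and `appendFrames` for every received payload, in sequence number order, yields a byte array with the structure shown in figure 5.15.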
Chapter 6
Results
The result of this master's thesis project is a system prototype that shows that it is possible to stream audio books to mobile phones using J2ME and the RTSP, SDP and RTP protocols.
Application requirements:
[Figure: system parts with the labels Servlet/Database, J2ME application and Streaming server]
The J2ME application on the mobile phone can be divided into five submodules, see figure 6.2.
[Figure 6.2: Submodules of the J2ME application: Main-menu, Servlet communication, Streaming communication, Player-GUI, Audio player]
Figure 6.3: Overview of the different data communication protocols used in the system (the mobile phone reaches the servlet and database over HTTP and the stream server over RTSP/SDP and RTP, through the 3G/GPRS network and the Internet)
As seen in figure 6.3 there are four different data communication protocols in the system: HTTP, RTSP, SDP and RTP.
HTTP stands for Hypertext Transfer Protocol. This is the standard protocol for the WWW (World Wide Web). The mobile phone uses HTTP to communicate with the servlet and vice versa. The information that is sent between these two parts consists of text strings containing information about audio books stored in the database.
So when searching for books, getting the toplist and the offers-list, buying books and
receiving more information about books, HTTP is the protocol that is used.
The string with book information is built up as in figure 6.4.
Author;Title;Info;Category;Year;Length;ISBN;
Price;Image_URL;Stream_server_URL;Nr_of_parts$
Figure 6.4: Book information sent from the servlet to the mobile phone
Author;Title;Category;ISBN;...;...$
Figure 6.5: Toplist information sent from the servlet to the mobile phone
Author;Title;Category;Price;ISBN;...;...$
Figure 6.6: Offers-list information sent from the servlet to the mobile phone
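The semicolon-separated record in figure 6.4 could be split on the phone roughly as follows. This is only a sketch: the class name and the sample values in the usage note are hypothetical, the line break inside figure 6.4 is assumed to be layout only, and the code relies on indexOf and substring rather than String.split, which CLDC does not provide.

```java
// Hypothetical parser for one book record as laid out in figure 6.4.
final class BookInfoParser {
    // Author, Title, Info, Category, Year, Length, ISBN,
    // Price, Image_URL, Stream_server_URL, Nr_of_parts
    static final int FIELD_COUNT = 11;

    // Splits a record terminated by '$' into its fields.
    static String[] parse(String record) {
        int end = record.indexOf('$');
        if (end < 0) {
            throw new IllegalArgumentException("missing '$' terminator");
        }
        String body = record.substring(0, end);
        String[] fields = new String[FIELD_COUNT];
        int start = 0;
        for (int i = 0; i < FIELD_COUNT; i++) {
            int sep = body.indexOf(';', start);
            if (sep < 0) {
                // Only the last field may end at the terminator instead of a ';'.
                if (i != FIELD_COUNT - 1) {
                    throw new IllegalArgumentException("too few fields");
                }
                sep = body.length();
            }
            fields[i] = body.substring(start, sep);
            start = sep + 1;
        }
        return fields;
    }
}
```

With the made-up record "Author A;Title T;Info;Crime;2004;360;1234567890;99;http://img;rtsp://srv/book;3$", the call BookInfoParser.parse(...) yields eleven fields, with "Author A" first and "3" last.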
An explanation of the different parts of figure 6.4, figure 6.5 and figure 6.6 follows below.
RTSP is described in chapter "Streaming Technology" and is used to set up and control
the state of the audio book stream, such as play, pause etc. It is used between the
streaming server and the mobile phone.
RTP is also described in chapter "Streaming Technology". It is the protocol that
transports the audio data from the streaming server to the mobile phone.
6.4 Playing streaming audio in J2ME
J2ME and MMAPI do not support playback of streaming media. The player in MMAPI needs
to buffer (realize) a whole audio file to be able to play it. To get around this problem
the application uses two buffers that take turns receiving the audio data from the RTP
stream. With this "two buffer" method the application can continue receiving data in one
buffer while feeding the other buffer to the player as an audio file that it can realize.
One problem with this solution arises when the switch from one buffer to the other
takes place in the middle of a word. When this happens there is a short break in the
middle of the word. To get around this the system cuts the audio at the last silent
part of the buffer before sending it to the player. The part that has been cut off is
added to the next audio part.
The silent parts of the AMR audio that is streamed in the system are represented by
SID (Silence Descriptor) frames.
This solution cannot be used when streaming music, because music generally does not
have recurring silent parts throughout a song.
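The cut at the last silent part can be sketched as follows. The names are hypothetical and the details are assumptions rather than the prototype's implementation: each frame is a byte array whose first byte is the frame header, the SID frame is assumed to have type 8 (the value used for AMR narrowband, AMR-WB uses a different value), and the SID frame itself is assumed to stay with the part that is played.

```java
// Hypothetical sketch of cutting a buffer at its last silent (SID) frame.
final class SilenceCutter {
    // Assumption: frame type 8 marks a SID frame in AMR narrowband.
    static final int AMR_SID_TYPE = 8;

    // Frames cut off from the previous buffer, played with the next one.
    private byte[][] carried = new byte[0][];

    // Number of frames to play: up to and including the last SID frame,
    // or all frames if the buffer contains no SID frame.
    static int cutPoint(byte[][] frames) {
        for (int i = frames.length - 1; i >= 0; i--) {
            if (frames[i].length > 0
                    && ((frames[i][0] >> 3) & 0x0F) == AMR_SID_TYPE) {
                return i + 1;
            }
        }
        return frames.length;
    }

    // Prepends the carried-over frames to the newly received ones, cuts at
    // the last SID frame and keeps the remainder for the next call.
    byte[][] nextPlayable(byte[][] received) {
        byte[][] all = new byte[carried.length + received.length][];
        System.arraycopy(carried, 0, all, 0, carried.length);
        System.arraycopy(received, 0, all, carried.length, received.length);

        int cut = cutPoint(all);
        byte[][] playable = new byte[cut][];
        System.arraycopy(all, 0, playable, 0, cut);

        carried = new byte[all.length - cut][];
        System.arraycopy(all, cut, carried, 0, carried.length);
        return playable;
    }
}
```

In this sketch a buffer without any SID frame is played in full, which corresponds to the case where a break in the middle of a word cannot be avoided.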
Figure 6.7 shows an overview of how the buffering and playback of the audio is done.
[Figure: timeline in which RTP packets are received alternately in buffer 1 and buffer 2.
While one buffer is receiving RTP packets, the audio data received in the other buffer is
converted to a playable AMR audio file and played.]
Figure 6.7: An overview of the buffering and playback of the audio
It is not possible to tell how the application will behave, or whether it will work at
all, from simulations alone. So testing is the most critical part, to see if the
application really works.
Unfortunately, the test equipment, that is mobile phones and money for traffic costs, was
not received until the end of the project. So no real testing was done, except for
simulations, until the project was almost finished.
The "Main menu"-GUI and the HTTP connections between the servlet and the phone
worked just fine.
But there was trouble with the RTSP/SDP protocol. The way the TCP packets were read on
the phone from the streaming server had to be modified so that a stream session could be
started.
This is one thing that worked fine in the simulation environment, but not on the phone.
Rumors that it is not possible to connect to a mobile phone using UDP, without starting
the UDP connection from the phone first, were a worrying factor. If this was true the
streaming server would not be able to connect to the phone and stream the RTP packets
containing the audio data.
NAT and NAPT in the mobile networks can make it difficult to establish UDP connections
to mobile phones. This means that the streaming server might not be able to connect to
the phone and stream the RTP packets containing the audio data.
But once the TCP connection worked well between the server and the phone, the UDP
connection also worked fine without any problems. Different mobile network operators
handle NAT/NAPT in different ways, so the system might not necessarily work in all
mobile networks. But it worked in the networks that Telia and 3 provide.
The connection worked, but there was another problem. The application on the phone
received the streamed audio data, but what came out of the phone's speaker was a
lot of noise. The Player-GUI did not update as smoothly on the real phone as on the
simulated phone.
It turned out that the way the GUI was updated took too much processing power from the
other threads in the application, so everything was severely slowed down, including the
Player-GUI itself. There were also a few busy waits in the player-GUI that took up a lot
of cycles.
When the player-GUI was optimized and the busy waits eliminated, everything worked
much better. The player-GUI updated smoothly and the noise turned into the audio
that was streamed to the phone.
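The text does not show how the busy waits were removed. One common way in Java, sketched here with hypothetical names, is to let a thread block in wait() until another thread calls notifyAll(), instead of polling a flag in a loop that burns processor cycles.

```java
// Hypothetical signal that replaces a polling loop such as
//     while (!ready) { /* spin */ }
// with a blocking wait, so other threads get the processor.
final class ReadySignal {
    private boolean ready = false;

    // Called by the producing thread, for example when new data is available.
    synchronized void markReady() {
        ready = true;
        notifyAll();
    }

    // Blocks until markReady() has been called or the timeout expires.
    // Returns true if the signal was received, false on timeout or interrupt.
    synchronized boolean awaitReady(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!ready) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            try {
                wait(remaining); // releases the lock while sleeping
            } catch (InterruptedException e) {
                return false;
            }
        }
        ready = false; // consume the signal so the next wait blocks again
        return true;
    }
}
```

A GUI thread could call awaitReady before repainting, while the thread that receives audio data calls markReady. Whether the prototype was structured this way is not stated in the text.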
The main problem when testing the system was the update/repainting of the player-GUI.
It is best to test the system in as many different mobile operators' networks as possible.
The system was tested in the following operators' networks:
- Telia
- 3
The first tests were in Telia's network. When the system worked well in Telia's network
it was also tested in 3's network. It worked fine there as well.
The system was also tested outdoors while walking around and driving in a car. There
were no problems with this on Telia's and 3's networks.
The system has only been tested using the Nokia 6680 mobile phone. It would be better to
test the application on different phones and compare the results.
- Disc Memory: 60 KB
- Java version: J2ME - MIDP 2.0
- Communication: Socket and HTTP support
- Additional J2ME APIs: MMAPI
- Supported sound formats: AMR, AMR-WB
Chapter 7
User's Guide
This chapter is a guide to using the application on the mobile phone.
Explanation of the main menu list:
- My books: Choose this menu to go to "My books", which is where the books that
have been bought are kept. The number in parentheses indicates how many books
are stored in "My books". In this example there is one book. If no books
have been bought, the number in parentheses is "0".
- Toplist: Choose this to open the audio book toplist.
- Offers: Choose this to open the audio book offers.
- Book search: Choose this to search for audio books.
- Help: Under help there should be a user's guide and other relevant information,
but this is not implemented in this version.
- Settings: Under settings it should be possible to set language, network options,
performance etc. Only the language setting is implemented in this version.
7.3 My books
Figure 7.2 shows what the "My books" menu looks like.
The books that have been bought are listed here. If no books have been bought, a
message saying that there are no books in "My books" will appear and the application
will go back to the main menu. If a book under "My books" is chosen, another menu
will appear, see figure 7.3.
Figure 7.3: Book options
- Resume: Choose this to resume listening from where it was last stopped. By
default, if the audio book has never been played, it will start from the beginning.
- Listen from bookmark: Choose this to start listening from the bookmark that has
been set by the user. By default, if the bookmark is not set, the audio book will
be played from the beginning.
- Listen from beginning: Choose this to listen from the beginning of the book.
- Information: Choose this to get more information about the book, such as author,
publishing year etc.
- Remove the book: Choose this to remove the book from the "My books"-list. A
confirmation screen will appear to ask if you really want to remove the book from
the application.
How to interact with the audio book player is described in section "Player" in this
chapter.
7.4 Toplist
The toplist is shown in figure 7.4.
Here the user can choose to get more information about a book. Just press "Information"
and more information about that book will be shown, such as book cover, author and
price etc. Figure 7.5 shows what that looks like.
From this menu the book can be bought by pressing "Buy". Then a confirmation screen
will be shown, see figure 7.6. After the purchase has been confirmed the book will be
added to the "My books"-list. Then the application will show the "My books"-list and
will be ready to start playing the book.
7.5 Offers
The offers list has the same functions as the "Toplist". The only exception is that the
price of each book is shown in the offers list, see figure 7.7. See section "Toplist" in
this chapter for more information.
Here the user can choose to search for audio books by different categories. These
categories are:
The search result list looks like, and has the same functionality as, the toplist. See
section "Toplist" in this chapter for more information about that.
If no results are found by the search, a window will appear notifying that no search
results were found. Then the application will automatically go back to the
"Search book" window.
7.7 Player
This section goes through how to use the audio book player. When the user has
chosen to listen to a book from the "My books"-list the player-GUI will be shown on
the phone. The player will start buffering the book immediately. This is shown in figure
7.10.
When the application has finished buffering, the player will start to play the audio book
automatically, see figure 7.11.
- Play/Pause: Press Play or Pause, depending on whether the player is in playing mode
or in paused mode, or press "2" on the phone to play or pause the audio book.
- Stop: To stop playing press Stop. Then the player will be closed and the application
will go back to the "My books"-list.
- Rewind: To rewind press, and hold down, "1" on the phone.
- Fast forward: To fast forward press, and hold down, "3" on the phone.
- Set bookmark: To set a bookmark, press "5" on the phone. A message indicating
that a bookmark is set will appear on the screen, see figure 7.12, and then disappear
automatically after a few seconds.
- Control the volume: Press the "*"-key to decrease and the "#"-key to increase
the volume.
- Exit: To exit press Stop.
Chapter 8
Conclusions
The hardest part of the project was to implement the streaming protocols RTSP, SDP
and RTP in J2ME. No information about such implementations by other people or
organizations was found, so it was a challenge to implement them.
During the project many ideas for extra features that could be added to the system
came up. Not all of these features had to do with the audio streaming, so it was
important that the focus did not slip away from the main task: to implement streaming
protocols in J2ME.
To make this a full, redundant and scalable system, the work would take much longer
than a master's thesis project allows. But that was not the goal of this project.
The goal was to see if it was possible to stream audio books to mobile phones using
J2ME and to make a prototype of the system.
The billing part of the system is not fully developed. The idea was that billing should
be done by sending an SMS.
The reason that this was not fully developed was that there were not enough resources
for doing it; for example, an SMS server and a deal with a mobile network operator would
be needed.
8.2 Future work
It would be nice to implement the control protocol RTCP to measure data rates, delay
and packet loss of the RTP streaming. With this information it would be possible to
make the application more dynamic in data rate, so that it could adjust the sound
quality to the available bandwidth in the mobile network.
More testing on different mobile phones and in different environments would be good.
There were no resources for doing this during the project.
A user interface for updating the audio book database would make the system easier
to administer and use.
Chapter 9
Acknowledgments
References
[1] Sony Ericsson J2ME SDK 2.2.0. http://developer.sonyericsson.com/site/global/
docstools/java/p_java.jsp (visited 2005-06-10).
[2] 3GPP. Arib std-t63-26.201 v5.0.0 speech codec speech processing functions; amr
wideband speech codec; frame structure (release 5). 2001.
[3] 3GPP. Arib std-t63-26.101 v4.2.0 - mandatory speech codec speech processing
functions; amr speech codec frame structure (release 4). 2002.
[4] Bonnier Audio. http://www.bonnieraudio.se (visited 2005-06-10).
[5] Ethereal. http://www.ethereal.com (visited 2005-06-10).
[6] The Apache Software Foundation. Apache tomcat.
http://jakarta.apache.org/tomcat (visited 2005-06-10).
[7] Forlaggareforeningen. http://www.forlaggareforeningen.se (visited 2005-06-10).
[8] Harri Holma and Antti Toskala. WCDMA for UMTS - Radio Access For Third
Generation Mobile Communications. Wiley, 2000.
[9] MySQL. http://www.mysql.com (visited 2005-06-10).
[10] Rfc-1889. Rtp: A transport protocol for real-time applications.
http://www.faqs.org/rfcs/rfc1889.html (visited 2005-05-06), 1996.
[11] Rfc-2326. Real time streaming protocol. http://www.rtsp.org/2003/drafts/draft05/
draft-ietf-mmusic-rfc2326bis-05.pdf (visited 2005-05-12), 2003.
[12] Rfc-2327. Sdp: Session description protocol.
http://www.faqs.org/rfcs/rfc2327.html (visited 2005-05-12), 1998.
[13] Rfc-2616. Hypertext transfer protocol - http/1.1.
http://www.w3.org/Protocols/rfc2616/rfc2616.html (visited 2005-05-12), 1999.
[14] Rfc-760. Internet protocol. http://www.faqs.org/rfcs/rfc760.html (visited
2005-05-12), 1980.
[15] Rfc-768. User datagram protocol. http://www.faqs.org/rfcs/rfc768.html (visited
2005-05-12), 1980.
[16] Darwin Streaming Server. http://developer.apple.com/darwin/projects/streaming
(visited 2005-06-10).
[17] MediaLab Telia Sonera. Streaming in mobile networks - white paper. 2004.
[18] Speex. http://www.speex.org (visited 2005-06-10).
[19] Sun. J2ME. http://java.sun.com/j2me/index.jsp (visited 2005-05-05).
[20] Sun. Java. http://java.sun.com (visited 2005-05-05).
[21] Eric Woudenberg. Conversion between amr (adaptive multi-rate codec) file formats.
http://www.connactivity.com/~eaw/amrwork/ (visited 2005-05-23), 2003.
Appendix A
Appendix B
Abbreviations
- ISBN - International Standard Book Number
- J2ME - Java 2 Platform, Micro Edition
- JVM - Java Virtual Machine
- KVM - K Virtual Machine
- MIDP - Mobile Information Device Profile
- MMAPI - Mobile Media API
- MMS - Multimedia Messaging Service
- mp3 - MPEG-1 Audio Layer 3
- MPEG-1 - Moving Pictures Experts Group - 1
- NAPT - Network Address Port Translation
- NAT - Network Address Translation
- PDA - Personal Digital Assistant
- RMS - Record Management System
- RTCP - RTP Control Protocol
- RTP - Real-Time Transport Protocol
- RTSP - Real-Time Streaming Protocol
- SDP - Session Description Protocol
- SID - Silence Descriptor
- SR - System Requirement
- SSRC - Synchronization Source
- TCP - Transmission Control Protocol
- UDP - User Datagram Protocol
- UMTS - Universal Mobile Telecommunications System
- URI - Uniform Resource Identifier
- URL - Uniform Resource Locator
- UTF-8 - 8-bit Unicode Transformation Format
- UTRAN - UMTS Terrestrial Radio Access Network
- WAV - Waveform Audio
- WCDMA - Wideband Code Division Multiple Access
- WS - White-Space
- WWW - World-Wide Web