Video Conferencing System
with
Multimedia Capabilities
Janet Adams
April 2005
BACHELOR OF ENGINEERING
IN
TELECOMMUNICATIONS ENGINEERING
Supervised by Dr. Derek Molloy
Acknowledgements
I would like to thank Dr. Derek Molloy, who supervised me on this project, for his
enthusiasm and guidance. I would also like to thank Edward Casey, whom I
collaborated with on certain areas of the project, and his supervisor, Dr. Gabriel
Muntean, for his support and advice. My thanks also go to my friends Edward Casey,
Edel Harrington and Hector Climent for listening to me and guiding me through my
initial presentation. I would like to dedicate this project to my parents, who have
supported me throughout all my time in college and especially during this, my final
year.
Declaration
I hereby declare that, except where otherwise indicated, this document is entirely my
own work and has not been submitted in whole or in part to any other university.
Signed: ......................................................................
Date: ......................................
Abstract
This document will describe the development of a video conferencing system with
multimedia capabilities. The concept of multicasting will be explored, as it was used
in the development of the video conferencing feature. Other concepts used in the
development of the system, such as the Java Media Framework, the Real-time Transport
Protocol and a number of encoding schemes, will also be investigated.
The design of the system will explain how each of the features was planned and
developed, and will provide the reader with an understanding of video conferencing,
client-server communications, motion detection and much more. The implementation
section reads like a user manual. On completion of this section, the reader should be
able to make full use of all of the features within the application and should
understand the depth to which each of the features can be used.
When this document has been read, the reader will understand both how the system
was developed and how it can be used, as well as the technical background needed to
understand how the different features work.
Table of Contents
ACKNOWLEDGEMENTS.........................................................................................II
DECLARATION..........................................................................................................II
ABSTRACT ................................................................................................................ III
TABLE OF CONTENTS........................................................................................... IV
TABLE OF FIGURES............................................................................................ VIII
TABLE OF TABLES...................................................................................................X
1   INTRODUCTION ....................................................................................... 1
    1.3.3   Laptop ........................................................................................... 2
2   TECHNICAL BACKGROUND
    2.1.1   Introduction .................................................................................. 3
    2.1.6   Alternatives to JMF .................................................................... 15
    2.1.7   Summary ..................................................................................... 15
    2.2.1   Introduction ................................................................................ 16
    2.2.6   Summary ..................................................................................... 25
    2.3.1   Introduction ................................................................................ 26
    2.3.2   Encoder Principles ..................................................................... 26
    2.3.3   Decoder Principles ..................................................................... 27
    2.3.5   Summary ..................................................................................... 28
    2.4.1   Introduction ................................................................................ 28
    2.4.4   Summary ..................................................................................... 29
    2.6     MULTICASTING ......................................................................... 32
    2.7     SUMMARY ................................................................................... 34
3   DESIGN OF THE SYSTEM
    3.4     CONFERENCING ........................................................................ 42
    3.6.1   Login ........................................................................................... 47
    3.6.3   Call Teardown ............................................................................ 49
    3.6.4   Logout ......................................................................................... 50
    3.7     OTHER FEATURES WITHIN THE APPLICATION ................. 50
4   IMPLEMENTATION OF THE SYSTEM
    4.1     INTRODUCTION ......................................................................... 51
    4.2     LOGGING IN ............................................................................... 51
    4.3     CALLS .......................................................................................... 52
    4.4     MESSAGES .................................................................................. 58
    4.4.3   Videomail Messages ................................................................... 63
    4.5.2   Adaption ..................................................................................... 64
5   TESTING
6   CONCLUSIONS AND FURTHER RESEARCH
7   APPENDIX 1 ............................................................................................ 79
8   APPENDIX 2 ............................................................................................ 89
Table of Figures
FIGURE 2.1 - MEDIA PROCESSING MODEL .......................................................................4
FIGURE 2.2 - SYSTEM PROCESSING MODEL .....................................................................5
FIGURE 2.3 - JMF BASIC SYSTEM MODEL.......................................................................6
FIGURE 2.4 - RTP AND THE OSI MODEL .......................................................................17
FIGURE 2.5 - RTP PACKET HEADER FORMAT ...............................................................20
FIGURE 2.6 - RTCP SENDER REPORT STRUCTURE ........................................................24
FIGURE 2.7 - RTCP RECEIVER REPORT STRUCTURE .....................................................25
FIGURE 2.8 - G.723.1 ENCODER ....................................................................................27
FIGURE 2.9 - G.723.1 DECODER .....................................................................................27
FIGURE 2.10 - H.263 BASELINE ENCODER ....................................................................29
FIGURE 2.11 - MACROBLOCKS WITHIN H.263 ...............................................................31
FIGURE 2.12 - MOTION PREDICTION ..............................................................................31
FIGURE 2.13 - ORIGINAL CONFERENCING PLAN ............................................................33
FIGURE 2.14 - MULTICASTING THROUGH ROUTER ........................................................34
FIGURE 3.1 - CLIENT TO SERVER COMMUNICATION ......................................................35
FIGURE 3.2 - CLIENT TO CLIENT COMMUNICATION ......................................................37
FIGURE 3.3 - SERVER CLASS DIAGRAM .........................................................................38
FIGURE 3.4 - EXAMPLE OF PUSH PULL MESSAGE SETUP ...............................................39
FIGURE 3.5 - CLIENT CLASS DIAGRAM ..........................................................................41
FIGURE 3.6 - ALLOCATING A CONFERENCE POSITION ...................................................43
FIGURE 3.7 - MESSAGE SEQUENCE CHART FOR CONFERENCE CALL .............................44
FIGURE 3.8 - CONFERENCING SETUP .............................................................................45
FIGURE 3.9 - IMAGE OBSERVATION AVERAGES.............................................................46
FIGURE 3.10 - MESSAGE SEQUENCE CHART FOR LOGIN ................................................47
FIGURE 3.11 - MESSAGE SEQUENCE CHART FOR CALL SETUP ......................................48
FIGURE 3.12 - MESSAGE SEQUENCE CHART FOR CALL TEARDOWN ..............................49
FIGURE 3.13 - MESSAGE SEQUENCE CHART FOR LOGOUT.............................................50
FIGURE 4.1 - LOGIN SCREEN .........................................................................................52
FIGURE 4.2 - HOME SCREEN ..........................................................................................53
FIGURE 4.3 - MAKING A P2P CALL ...............................................................................54
FIGURE 4.4 - DURING A CALL........................................................................................55
FIGURE 4.5 - CALL ACCEPT /REJECT .............................................................................56
FIGURE 4.6 - INITIATING A CONFERENCE CALL .............................................................57
FIGURE 4.7 - CONFERENCE REQUEST ............................................................................58
FIGURE 4.8 - MMS SCREEN ..........................................................................................59
FIGURE 4.9 - ATTACH BUTTON FILE CHOOSER .............................................................60
FIGURE 4.10 - MMS SCREEN READY TO SEND ..............................................................61
FIGURE 4.11 - UNIFIED INBOX SCREEN .........................................................................62
Table of Tables
TABLE 2.1 - JMF COMMON VIDEO FORMATS ...............................................................10
TABLE 2.2 - JMF COMMON AUDIO FORMATS...............................................................11
TABLE 5.1 - TESTING SCENARIOS: LOGIN / LOGOFF ......................................................70
TABLE 5.2 - TESTING SCENARIOS: MAKING A CALL ...................................................71
TABLE 5.3 - TESTING SCENARIOS: SENDING A MESSAGE ..............................................72
TABLE 5.4 - TESTING SCENARIOS: CONFERENCE CALL .................................................73
TABLE 5.5 - OTHER TESTING SCENARIOS ......................................................................74
Chapter 1
1 Introduction
Almost all organisations, for example office blocks, colleges and shopping centres,
have telephone systems installed. These telephone systems provide features such as
call forwarding, call divert, voicemail, free extension dialling to other users within
the same network, and so on. Another item found in almost all of these facilities is
the computer, usually one per user. Therefore, in the majority of establishments, every
employee has both a telephone handset and a computer. A cost-effective and
space-saving idea is to combine these two everyday utilities so that the computer can
also be used as a phone. People want their lives and work to be as simple and
time-efficient as possible, and one way to achieve this is to have a software-based
telephony system on their computers. Why do they need a physical telephone handset
when it is possible to attain all the same features on their computers, cutting out the
expense of the handset?
1.3.3 Laptop
Testing was difficult as very few of the features could be tested alone. Almost all
testing required two computers. For this reason, it was most efficient to use two
laptops connected to two webcams.
Chapter 2
2 Technical Background
In this chapter, the various standards used in the design of this system will be
discussed. The standards chosen were based on what is supported by the Java Media
Framework. There were possibly more suitable options available, for example among
the encoding schemes, but the choice was limited to what is supported by the Java
Media Framework and the Real-time Transport Protocol. The standards discussed
within this chapter were the basic building blocks on which this project was built.
Pull data source: Here the data flow is initiated by the client and the data flow
from the source is controlled by the client.
Push data source: Here the data flow is initiated by the server and the data flow
from the source is controlled by the server.
Several data sources can be combined into one. So if you are capturing a live scene
with two data sources: audio and video, these can be combined for easier control.
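As an illustration of combining sources, the following minimal sketch (not taken from the project code) merges an audio and a video capture DataSource with JMF's Manager. The capture locators "vfw://0" and "javasound://44100" are typical JMF examples and are assumptions about the local machine.

// A minimal sketch of combining an audio and a video DataSource into one,
// assuming two capture locators that will differ from machine to machine.
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.protocol.DataSource;

public class MergedSourceSketch {
    public static void main(String[] args) throws Exception {
        DataSource video = Manager.createDataSource(new MediaLocator("vfw://0"));
        DataSource audio = Manager.createDataSource(new MediaLocator("javasound://44100"));

        // Combine the two capture sources so a single Player or Processor can control both.
        DataSource combined = Manager.createMergingDataSource(new DataSource[] { video, audio });
        System.out.println("Merged content type: " + combined.getContentType());
    }
}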
Capture Device
A capture device is the piece of hardware that you would use to capture the data,
which you would connect to the DataSource. Examples would be a microphone or
a webcam. The captured media can then be sent to the Player, converted into
another format or even stored to be used at a later stage.
Like DataSources, capture devices can be either a push or a pull source. If a
capture device is a pull source, then the user controls when to capture the image, if it is
a push source, then the user has no control over when the data is captured, it will be
captured continuously.
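The sketch below, again assumed rather than taken from the project, shows how a capture device is typically located through JMF's CaptureDeviceManager and turned into a DataSource.

// A minimal sketch, assuming JMF is installed and at least one capture device
// is registered: list the devices JMF knows about and build a DataSource from
// the first audio device found.
import java.util.Vector;
import javax.media.CaptureDeviceInfo;
import javax.media.CaptureDeviceManager;
import javax.media.Manager;
import javax.media.format.AudioFormat;
import javax.media.protocol.DataSource;

public class CaptureDeviceSketch {
    public static void main(String[] args) throws Exception {
        // Ask JMF for every device that can capture raw (linear) audio.
        Vector devices = CaptureDeviceManager.getDeviceList(new AudioFormat(AudioFormat.LINEAR));
        if (devices.isEmpty()) {
            System.out.println("No audio capture device registered with JMF");
            return;
        }
        CaptureDeviceInfo info = (CaptureDeviceInfo) devices.get(0);
        System.out.println("Using device: " + info.getName());

        // The device locator is what connects the capture hardware to a DataSource.
        DataSource source = Manager.createDataSource(info.getLocator());
        source.connect();
    }
}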
Player
As mentioned above, a Player takes a stream of data and renders it to an output
device. A Player can be in any one of a number of states. Usually, a Player would
go from one state to the next until it reaches the final state. The reason for these states
is so the data can be prepared before it is played. JMF defines the following six states
for the Player:
Unrealized: In this state, the Player object has just been instantiated and
does not yet know anything about its media.
Realizing: A Player moves from the unrealized state to the realizing state
when the Player's realize() method is called. In this state, the Player is
in the process of determining its resource requirements
Realized: Transitioning from the realizing state, the Player comes into the
realized state. In this state the Player knows what resources it needs and has
information about the type of media it is to present. It can also provide visual
components and controls, and its connections to other objects in the system are
in place. A player is often created already in this state, using the
createRealizedPlayer() method.
Prefetching: Entered when the prefetch() method is called. The Player is
acquiring the exclusive-use resources it needs and is filling its buffers with
media data.
Prefetched: The state where the Player has finished prefetching media data
- it's ready to start.
Started: This state is entered when you call the start() method. The
Player is now ready to present the media data.
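A minimal sketch of these state transitions, using a placeholder media file rather than anything from the project, might look as follows.

// A minimal sketch of walking a Player through the states listed above.
// The file name is a placeholder.
import javax.media.ControllerEvent;
import javax.media.ControllerListener;
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.Player;
import javax.media.PrefetchCompleteEvent;

public class PlayerStatesSketch {
    public static void main(String[] args) throws Exception {
        // createRealizedPlayer() blocks until the Player has reached the realized state.
        final Player player = Manager.createRealizedPlayer(new MediaLocator("file:clip.mov"));

        player.addControllerListener(new ControllerListener() {
            public void controllerUpdate(ControllerEvent event) {
                // Report the remaining transitions as they happen.
                if (event instanceof PrefetchCompleteEvent) {
                    System.out.println("Prefetched - ready to start");
                }
            }
        });

        player.prefetch();   // realized -> prefetching -> prefetched
        player.start();      // prefetched -> started; media is rendered
    }
}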
Processor
A Processor is a type of Player, which has added control over what processing is
performed on the input media stream. As well as the six aforementioned Player
states, a Processor includes two additional states that occur before the
Processor enters the realizing state but after the unrealized state:
Configuring: Entered when configure() is called; the Processor is connecting
to its DataSource and gathering information about the format of its input data.
Configured: The Processor has connected to its DataSource and determined the
format of its input data.
Format
A Format object describes the exact type of a piece of media data. JMF provides
format subclasses such as:
AudioFormat
VideoFormat
H261Format
H263Format
IndexedColorFormat
JPEGFormat
RGBFormat
YUVFormat
As will be discussed in more detail later on in this report, the formats that were chosen
for this project were H.263 for the video and G.723.1 mono for the audio.
Manager
A manager, an intermediary object, integrates implementations of key interfaces that
can be used seamlessly with existing classes. JMF offers four managers:
Manager: Use Manager to create Players, Processors, DataSources and DataSinks.
PackageManager: Maintains a registry of the packages that contain JMF classes,
such as custom Players, Processors, DataSources and DataSinks.
CaptureDeviceManager: Maintains a registry of the capture devices available on
the system.
PlugInManager: Maintains a registry of available JMF plug-in processing
components, such as multiplexers, demultiplexers, codecs, effects and renderers.
Looking at Table 2.2, for the audio, it can be seen that there are two formats that meet
the low bandwidth requirement. These are GSM and G.723.1. Of these two, the former
has low quality while the latter has medium quality. It therefore made more sense to
choose G.723.1. The chosen encoding schemes are marked with an asterisk in the
tables below.
Table 2.1 - JMF Common Video Formats

Format      Content Type               Quality   CPU Requirements   Bandwidth Requirements
Cinepak     AVI, QuickTime             Medium    Low                High
MPEG-1      MPEG                       High      High               High
H.261       AVI, RTP                   Low       Medium             Medium
H.263 *     QuickTime, AVI, RTP        Medium    Medium             Low
JPEG        QuickTime, AVI, RTP        High      High               High
Indeo       QuickTime, AVI             Medium    Medium             Medium

Table 2.2 - JMF Common Audio Formats

Format               Content Type                Quality   CPU Requirements   Bandwidth Requirements
PCM                  AVI, QuickTime, WAV         High      Low                High
Mu-Law               AVI, QuickTime, WAV, RTP    Low       Low                High
ADPCM (DVI, IMA4)    AVI, QuickTime, WAV, RTP    Medium    Medium             Medium
MPEG-1               MPEG                        High      High               High
MPEG Layer3          MPEG                        High      High               Medium
GSM                  WAV, RTP                    Low       Low                Low
G.723.1 *            WAV, RTP                    Medium    Medium             Low
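As a hedged illustration of how this choice is applied, the sketch below configures a Processor to encode its tracks as H.263/RTP and G.723/RTP. It is an assumed example, not the project's code, and the crude wait loop stands in for proper state handling.

// A minimal sketch of asking a Processor to encode its video track as H.263/RTP
// and its audio track as G.723/RTP. "source" stands for a merged capture
// DataSource such as the one built earlier.
import javax.media.Manager;
import javax.media.Processor;
import javax.media.control.TrackControl;
import javax.media.format.AudioFormat;
import javax.media.format.VideoFormat;
import javax.media.protocol.ContentDescriptor;
import javax.media.protocol.DataSource;

public class RtpEncodingSketch {
    public static Processor buildProcessor(DataSource source) throws Exception {
        Processor processor = Manager.createProcessor(source);
        processor.configure();
        while (processor.getState() < Processor.Configured) {
            Thread.sleep(50);   // crude wait for the configured state
        }

        // Output will be raw RTP-ready data rather than a file format.
        processor.setContentDescriptor(new ContentDescriptor(ContentDescriptor.RAW_RTP));

        for (TrackControl track : processor.getTrackControls()) {
            if (track.getFormat() instanceof VideoFormat) {
                track.setFormat(new VideoFormat(VideoFormat.H263_RTP));
            } else if (track.getFormat() instanceof AudioFormat) {
                track.setFormat(new AudioFormat(AudioFormat.G723_RTP));
            }
        }

        processor.realize();
        return processor;
    }
}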
media streams that have been captured from a local capture device using a capture
DataSource or that have been stored to a file using a DataSink. Similarly, JMF can be
extended to support additional RTP formats and payloads through the standard plugin
mechanism.
[Java™ Media Framework API Guide,
http://java.sun.com/products/java-media/jmf/2.1.1/guide/index.html, November 19]
Session Statistics: The session manager maintains statistics on all of the RTP
and RTCP packets sent and received in the session. The session manager
provides access to two types of global statistics:
o GlobalReceptionStats: Maintains global reception statistics for the
session.
o GlobalTransmissionStats: Maintains cumulative transmission statistics for the
session.
RTP Events
RTP-specific events are used to report on the state of the RTP session and its streams.
To receive notification of RTP events, you implement the appropriate RTP listener and
register it with the session manager. A ReceiveStreamListener, for example, reports on
the state of each stream that is being received; you can implement it to be told when a
new receive stream arrives.
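A minimal sketch of this pattern, with placeholder addresses and ports, might register a ReceiveStreamListener with an RTPManager as follows; the project's own session handling may differ.

// A minimal sketch of a two-party RTP session: register a listener so the
// application is told when a new stream arrives, then point the session at a
// placeholder remote address.
import java.net.InetAddress;
import javax.media.Manager;
import javax.media.Player;
import javax.media.rtp.RTPManager;
import javax.media.rtp.ReceiveStream;
import javax.media.rtp.ReceiveStreamListener;
import javax.media.rtp.SessionAddress;
import javax.media.rtp.event.NewReceiveStreamEvent;
import javax.media.rtp.event.ReceiveStreamEvent;

public class RtpSessionSketch {
    public static void main(String[] args) throws Exception {
        RTPManager manager = RTPManager.newInstance();

        // Fired for RTP events on received streams; other listeners cover
        // session, send-stream and remote events.
        manager.addReceiveStreamListener(new ReceiveStreamListener() {
            public void update(ReceiveStreamEvent event) {
                if (event instanceof NewReceiveStreamEvent) {
                    ReceiveStream stream = ((NewReceiveStreamEvent) event).getReceiveStream();
                    try {
                        Player player = Manager.createRealizedPlayer(stream.getDataSource());
                        player.start();   // render the incoming audio/video
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        });

        SessionAddress local = new SessionAddress(InetAddress.getLocalHost(), 22222);
        SessionAddress remote = new SessionAddress(InetAddress.getByName("192.168.0.2"), 22222);
        manager.initialize(local);
        manager.addTarget(remote);
    }
}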
2.1.7 Summary
As can be seen from the above sections, JMF is a very powerful tool. It is very easy to
work with and the best way to understand it is to use it. It is fair to say that there is a
lot of information, such as forums, help-sites etc. on the World Wide Web regarding
this subject. However, there is not a lot of information on using JMF for projects
similar to this one. Perhaps one of the best features of JMF is that it does not require
one to learn everything about it before using it. With a basic understanding of Java, it
is possible to teach yourself as you go along.
The real-time transport protocol (RTP), to carry data that has real-time
properties,
The RTP control protocol (RTCP), to monitor the quality of service and to
convey information about the participants in an on-going session.
The diagram shown in Figure 2.4 below shows how RTP is incorporated into the OSI
model. RTP fits into the session layer of the model, between the application layer and
the transport layer. RTP and RTCP work independently of the underlying transport
layer and network layer protocols.
Information in the RTP header tells the receiver how to reconstruct the data and
describes how the codec bit streams are packetized.
RTP payload: The data transported by RTP in a packet, for example audio
samples or compressed video data.
RTP packet: A data packet consisting of the fixed RTP header, a possibly
empty list of contributing sources, and the payload data. Some underlying
protocols may require an encapsulation of the RTP packet to be defined.
Typically one packet of the underlying protocol contains a single RTP packet,
but several RTP packets may be contained if permitted by the encapsulation
method.
though all the audio packets contain the same SSRC identifier (that of the
mixer).
Mixer: An intermediate system that receives RTP packets from one or more
sources, possibly changes the data format, combines the packets in some
manner and then forwards a new RTP packet. Since the timing among multiple
input sources will not generally be synchronized, the mixer will make timing
adjustments among the streams and generate its own timing for the combined
stream. Thus, all data packets originating from a mixer will be identified as
having the mixer as their synchronization source.
X is the Extension bit, when set, the fixed header is followed by exactly one
header extension with a defined format.
CSRC count contains the number of CSRC identifiers that follow the fixed
header.
M is the Marker bit. Its interpretation is defined by a profile; it is intended to
allow significant events such as frame boundaries to be marked in the packet
stream.
Payload type - Identifies the format of the RTP payload and determines its
interpretation by the application. A profile specifies a default static mapping of
payload type codes to payload formats. Additional payload type codes may be
defined dynamically through non-RTP means.
Sequence number increments by one for each RTP data packet sent, and may
be used by the receiver to detect packet loss and to restore packet sequence.
Timestamp reflects the sampling instant of the first octet in the RTP data
packet. The sampling instant must be derived from a clock that increments
monotonically and linearly in time to allow synchronization and jitter
calculations.
SSRC is an identifier that is chosen randomly, with the intent that no two
synchronization sources within the same RTP session have the same SSRC
identifier.
CSRC identifies the contributing sources for the payload contained in this
packet. This is another layer of identification for sessions that have the same
SSRC number, but the data in the stream needs to be differentiated further.
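To make the layout concrete, the following small parser (illustrative only, not part of the project) extracts these fixed header fields from the first twelve bytes of an RTP packet.

// A small illustrative parser for the fixed RTP header fields described above.
public class RtpHeaderSketch {
    public static void parse(byte[] packet) {
        int b0 = packet[0] & 0xFF;
        int version   = b0 >>> 6;           // V: 2 bits
        boolean pad   = (b0 & 0x20) != 0;   // P
        boolean ext   = (b0 & 0x10) != 0;   // X
        int csrcCount = b0 & 0x0F;          // CC

        int b1 = packet[1] & 0xFF;
        boolean marker  = (b1 & 0x80) != 0; // M
        int payloadType = b1 & 0x7F;        // PT

        int sequence = ((packet[2] & 0xFF) << 8) | (packet[3] & 0xFF);

        long timestamp = ((packet[4] & 0xFFL) << 24) | ((packet[5] & 0xFFL) << 16)
                       | ((packet[6] & 0xFFL) << 8)  |  (packet[7] & 0xFFL);

        long ssrc = ((packet[8] & 0xFFL) << 24) | ((packet[9] & 0xFFL) << 16)
                  | ((packet[10] & 0xFFL) << 8) |  (packet[11] & 0xFFL);

        System.out.println("V=" + version + " PT=" + payloadType
                + " seq=" + sequence + " ts=" + timestamp + " SSRC=" + ssrc
                + " marker=" + marker + " pad=" + pad + " ext=" + ext
                + " CSRC count=" + csrcCount);
        // csrcCount further 32-bit CSRC identifiers would follow the fixed header.
    }
}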
[Figure 2.5 - RTP Packet Header Format: V, P, X, CC, M, PT, sequence number,
timestamp, SSRC and CSRC list, followed by the payload]
by having each participant send its control packets to all the others, each can
independently observe the number of participants and this number is used to
calculate the rate at which the packets are sent,
Functions 1-3 are mandatory when RTP is used in the IP multicast environment, and
are recommended for all environments. RTP application designers are advised to avoid
mechanisms that can only work in unicast mode and will not scale to larger numbers.
RTCP Packet Format
As mentioned above, RTCP packets are sent periodically to all participants as well as
the data packets. There are a number of types of RTCP packets:
Sender Report
Receiver Report
Source Description
Bye
Application-specific
All participants in a session send RTCP packets. A participant that has recently sent
data packets issues a Sender Report (SR). The sender report contains the total number
of packets and bytes sent as well as information that can be used to synchronize media
streams from different sessions. The structure of the RTCP SR is shown in Figure 2.6
below. It consists of three sections, possibly followed by a fourth profile-specific
extension section if defined.
The first section, the header, is 8 octets long, with the following fields:
The version (V) is 2 bits and identifies the version of RTP, which is the same
in RTCP packets as in RTP data packets.
The padding (P) is 1 bit, if the padding bit is set, this RTCP packet contains
some additional padding octets at the end which are not part of the control
information. The last octet of the padding is a count of how many padding
octets should be ignored.
The reception report count (RC) is 5 bits and represents the number of
reception report blocks contained in this packet.
The packet type (PT) is 8 bits and contains the constant 200 to identify this as
an RTCP SR packet.
The length is 16 bits, the length of this RTCP packet in 32-bit words minus one
including the header and any padding.
The SSRC is 32 bits and is the synchronization source identifier for the
originator of this SR packet.
The second section, the sender information, is 20 octets long and is present in every
sender report packet. It summarizes the data transmissions from this sender and has the
following fields:
The NTP timestamp is 64 bits and indicates the wallclock time when this
report was sent so that it may be used in combination with timestamps returned
in reception reports from other receivers to measure round-trip propagation to
those receivers.
The RTP timestamp is 32 bits and corresponds to the same time as the NTP
timestamp (above), but in the same units and with the same random offset as
the RTP timestamps in data packets.
The sender's packet count is 32 bits and is the total number of RTP data
packets transmitted by the sender since starting transmission up until the time
this SR packet was generated. The count is reset if the sender changes its
SSRC identifier.
The sender's octet count is 32 bits and is the total number of payload octets
(i.e., not including header or padding) transmitted in RTP data packets by the
sender since starting transmission up until the time this SR packet was
generated. The count is reset if the sender changes its SSRC identifier. This
field can be used to estimate the average payload data rate.
The third section contains zero or more reception report blocks depending on the
number of other sources heard by this sender since the last report. Each reception
report block conveys statistics on the reception of RTP packets from a single
synchronization source. Receivers do not carry over statistics when a source changes
its SSRC identifier due to a collision. These statistics are:
The SSRC_n (source identifier) is 32 bits and is the SSRC identifier of the
source to which the information in this reception report block pertains.
The fraction lost is 8 bits and is the fraction of RTP data packets from source
SSRC_n lost since the previous SR or RR packet was sent, expressed as a fixed
point number with the binary point at the left edge of the field.
The cumulative number of packets lost is 24 bits and is the total number of
RTP data packets from source SSRC_n that have been lost since the beginning
of reception. This number is defined to be the number of packets expected less
the number of packets actually received, where the number of packets received
includes any which are late or duplicates.
The extended highest sequence number received is 32 bits. The low 16 bits
contain the highest sequence number received in an RTP data packet from
source SSRC_n, and the most significant 16 bits extend that sequence number
with the corresponding count of sequence number cycles.
The last SR timestamp (LSR) is 32 bits and is the middle 32 bits out of 64 in
the NTP timestamp received as part of the most recent RTCP sender report
(SR) packet from source SSRC_n. If no SR has been received yet, the field is
set to zero.
The delay since last SR (DLSR) is 32 bits and is expressed in units of 1/65536
seconds, between receiving the last SR packet from source SSRC_n and
sending this reception report block. If no SR packet has been received yet from
SSRC_n, the DLSR field is set to zero.
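As a worked example of what the LSR and DLSR fields enable, a sender can estimate the round-trip time to a receiver as arrival - LSR - DLSR, all in units of 1/65536 seconds. The sketch below assumes the arrival time has already been converted to that fixed-point format.

// A short worked example of estimating round-trip time from LSR and DLSR.
public class RoundTripSketch {
    // lsr, dlsr and arrival are 32-bit "middle of NTP timestamp" values (1/65536 s units).
    public static double roundTripSeconds(long arrival, long lsr, long dlsr) {
        long rtt = (arrival - lsr - dlsr) & 0xFFFFFFFFL;   // wrap-safe 32-bit arithmetic
        return rtt / 65536.0;                              // convert to seconds
    }

    public static void main(String[] args) {
        // Example: the report arrives 250 ms after the SR it refers to,
        // and the receiver held the report for 100 ms before sending it.
        long lsr = 0;
        long arrival = (long) (0.250 * 65536);
        long dlsr = (long) (0.100 * 65536);
        System.out.println(roundTripSeconds(arrival, lsr, dlsr) + " s");   // about 0.15 s
    }
}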
[Figure 2.6 - RTCP Sender Report Structure: header (V, P, RC, PT = SR = 200, length,
SSRC of sender), sender information, reception report blocks (one per source),
profile-specific extensions]

[Figure 2.7 - RTCP Receiver Report Structure: header (V, P, RC, PT, length, SSRC of
sender), reception report blocks, profile-specific extensions]
2.2.6 Summary
The Real-time Transport Protocol is far more expansive than described above.
However, for the purposes for which it was used within this project, the detail given
above is more than adequate. It is important to understand the different packet
structures that are shown, as these form the basis on which all data within the system
was sent.
actual time spent processing the data in the encoder and decoder,
[Figure 2.8 - G.723.1 Encoder]

[Figure 2.9 - G.723.1 Decoder]
2.3.5 Summary
This format was a good choice as it is ideal for its purpose within this application,
which is essentially the voice part of the video conferencing. Although it is possible to
go very deep into the workings of the coder and decoder, that is not necessary for this
project. It is sufficient to know the basics of how it works and what it is suitable for.
2.4.4 Summary
H.263 can be used for compressing the moving picture component of audio-visual
services at low bit rates. It is ideal for use in video conferencing, as there is not much
movement involved and low bit rates are used. This makes it the ideal encoding
scheme for this application.
Fixed Size Block Matching: each image frame is divided into a fixed number
of blocks. For each block in the frame, a search is made in the reference frame
over an area of the image for the best matching block, to give the least
prediction error.
After examining the specification for H.263, it was discovered that there were motion
detection and compensation algorithms built into it. This meant that the algorithm did
not have to be coded, it was already there and available to use. RTCP reports were
used to show the byte rate of the video stream, which was then used to implement the
image observation.
The way that the above was used for the image observation is as follows. When a
frame hasn't changed, a reference to a previous frame is sent. Basically, the image
observation feature exploits the temporal redundancy inherent in a video sequence.
The redundancy is larger when the camera is focused on a scene that does not contain
a lot of movement, which is the case when a user leaves the shot. This redundancy is
reflected in a reduced RTCP byte rate.
Displaying a reference frame requires a lower byte rate than sending a new frame. The
RTCP reports monitor the byte rate of the video stream. If the byte rate drops, and
stays dropped for a certain period of time, then the call is ended. The procedure to end
the call is explained in more detail in section 3.5.
2.6 Multicasting
For the conferencing feature of this application, multicasting was used. All of the
participants within the conference transmit to a multicast address.
Multicast addresses fall within the Class D range 224.0.0.0 to 239.255.255.255. The
diagram in Figure 2.14 shows how the data is distributed to all members of the group.
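A minimal sketch of joining such a group, assuming the conference address used later in this project (224.122.122.122, section 3.4) and a placeholder port, is shown below.

// A minimal sketch of joining a multicast group and reading the datagrams sent to it.
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;

public class MulticastJoinSketch {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("224.122.122.122");
        MulticastSocket socket = new MulticastSocket(22224);

        socket.joinGroup(group);   // the router now forwards the group's traffic to us

        byte[] buffer = new byte[2048];
        DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
        socket.receive(packet);    // blocks until a conference packet arrives
        System.out.println("Received " + packet.getLength() + " bytes from " + packet.getAddress());

        socket.leaveGroup(group);
        socket.close();
    }
}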
2.7 Summary
The information contained within this chapter has been an invaluable asset in
developing this application. A firm understanding of all the standards was required
before coding could even begin. JMF placed a lot of restrictions on the standards that
could be used. JMF does provide the ability to implement custom packetizers and
custom encoders; however, to do so would have been time consuming and unnecessary
for this application.
Chapter 3
3 Design of the System
3.1 System Architecture
The system as it stands consists of two different communication architectures. One is
client to server and the other is client to client. The reason that there are two different
methods is to make the system as efficient as possible. There was the possibility of
using client to server for all communication; however, it was felt that this would be
inefficient, as the server did not need to be part of a call between two clients and its
involvement would have been an unnecessary use of system resources. For this reason,
calls between clients are peer to peer and all other communication goes through the
server.
The connections between the server and the clients are bidirectional TCP connections.
It was not necessary to use RTP here as they are not real time connections. RTP is
described in section 2.2 as being ideal for real time communication. The messages that
are sent between the clients and server will include login, logoff, messages to be sent,
calls to be made etc. which are not time dependent. The server plays an integral part in
the system. Basically, all communication between any two clients must first go
through the server. So if a client wishes to call another client, they must send a call
request to the server. The server will then proceed to set up the call between the
clients. The code for this is shown in Appendix 1 in section 7.1. Also included in
Appendix 1 are the code extracts for login request (section 7.2), logoff request (section
7.3), call end request (section 7.4), conference setup request (section 7.5), request to
add a participant to a conference (section 7.6), request to end a conference (section
7.7), request to send a message (section 7.8) and request to receive a message (section
7.9).
The purpose of including these code extracts is to show that the server really does
control everything that the clients want to do. It will be the server that will check if the
other party is online and available, and the server that will set up the call. If a client is
unavailable when a message is sent, the server will store the message until they
become available and will then forward it on. Some might ask why a server is
required; why not just let the clients communicate directly? This was basically a design
choice. It was the opinion of the developer that direct client to client communication
for all tasks would place quite a large load on the clients, which was unnecessary. If it
were up to the clients to do everything, then the system would be slowed down
significantly. The server acts as a centre point, where clients can
contact each other. Without the server, the clients would have difficulty in contacting
another client. It was also a lot more efficient to let the server take some of the load
and leave all administration to the server, leaving the clients free to partake in calls,
send messages etc. It also meant that messages could be sent while clients are on calls
because the server can store the message, and messages can also be sent when the
receiving client is offline and stored until their next login, something that would not
have been possible without a server.
pull message, and both these messages can be sent or received. There are push and
pull links on all clients and on the server. The reason is that normally in a client server
application, only the client can start communication with the server, but by using push
and pull either can initiate communication. A push message is sent by either the client
or server, depending on who initialises communication, and the response to a push
message is a pull message. The person who sends a push will receive back a push, and
the person who sends a pull will receive a pull. An example is shown below, in Figure
3.4. This type of communication would be used in a situation where the user presses a
button on the client side that initializes communication with the server. However,
within this application, there will usually only be one send and one receive per task
(request and confirmation / error), as opposed to two of each, as shown in the diagram.
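The following sketch illustrates the push/pull idea under the assumption of a plain TCP connection carrying serialized objects; the Message class here is only a stand-in for the project's MessageObject, and the address, port and payload type code are placeholders.

// A minimal sketch of one push/pull exchange over a TCP connection.
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.Socket;

public class PushPullSketch {
    static class Message implements Serializable {
        final int senderID, destinationID, payloadType;
        final Object payload;
        Message(int senderID, int destinationID, int payloadType, Object payload) {
            this.senderID = senderID;
            this.destinationID = destinationID;
            this.payloadType = payloadType;
            this.payload = payload;
        }
    }

    public static void main(String[] args) throws Exception {
        // The side that initiates (here, the client) sends a push message...
        Socket socket = new Socket("192.168.0.1", 22220);
        ObjectOutputStream out = new ObjectOutputStream(socket.getOutputStream());
        ObjectInputStream in = new ObjectInputStream(socket.getInputStream());

        out.writeObject(new Message(1001, 2002, 300, "example payload"));
        out.flush();

        // ...and the reply that comes back is the corresponding pull message
        // (a confirmation or an error, as described in the text).
        Message reply = (Message) in.readObject();
        System.out.println("Reply payload type: " + reply.payloadType);
        socket.close();
    }
}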
The next child class is the ServerSideUtilities. This class is responsible for
all the requests which were discussed in section 3.1.1 and whose code extracts are
included in Appendix 1. So basically this class is responsible for all of the things that
the server does. All calls made or received, all messages sent or received and all
system requests such as login and logoff, will go through this class. It is also within
ServerSideUtilities that messageObjects, profileObjects and
mappingObjects are compiled. There will only ever be one of these classes
created for any server.
The final child class of the server is ServerSideStorage. Here the server will
store anything that needs to be stored. Examples include messageObjects,
conferenceObjects, mappingObjects etc. It is here that messages will be
stored if the receiver is unavailable and it is this class that will store conference
information, for example how many people are involved in a particular conference.
Once again, there will only ever be one ServerSideStorage class related to any
one server.
The client, in turn, has four child classes, including ClientHandle and
MediaManager.
senderID: This is the phone number of the client who sent the message
destinationID: This is the phone number of the client that the message is
being sent to
payloadType: This is the type of message that is being sent. It identifies the
purpose of the message and/or the payload object
3.4 Conferencing
The conferencing capabilities built into this application allow up to four people to take
part in one call at the same time, using multicasting, which is described above in
section 2.6.2. There is also a limit of four conferences taking place at one time. The
reason behind these limitations was so the feature could be tested and these limits
could be extended without difficulty.
The conference function is based around a conferenceObject, which consists of
the following fields:
conferenceID: this is the ID of the conference and will be the same as the
phone number of the client who initiated the conference
participantPortBase: this will be the port that the individual participants will
transmit from, and the port that they will not have to listen to
conferenceAddress: this will be the multicast address that the conference will
be transmitted to, which will always be 224.122.122.122
The setup for the conference is quite simple. All media will be sent to a multicast
address, which will be the same for all conferences. The destinationID field of
the conference object will be set to this multicast address. A reference port will be set
for each conference, which can be found in the conferencePortBase field of the
conference object. From this reference it is known that this and the next seven ports
will be used for this particular conference. Each user is allocated a port to transmit to;
this is in the participantPortBase field of the conference object. The user will
then know that it does not have to listen to this port base, only to the other six, as it
does not have to listen to itself. This is shown in detail in Figure 3.8 below.
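A small sketch of the port arithmetic is given below. It is not taken from the project code, and it assumes each participant's base port covers an RTP/RTCP pair, which would reconcile the eight conference ports with the six ports that are listened to.

// A minimal sketch of working out which conference ports a participant listens to.
import java.util.ArrayList;
import java.util.List;

public class ConferencePortsSketch {
    public static List<Integer> portsToListenTo(int conferencePortBase, int participantPortBase) {
        List<Integer> listen = new ArrayList<Integer>();
        for (int port = conferencePortBase; port < conferencePortBase + 8; port++) {
            // Skip our own transmit pair (RTP + RTCP); listen to the other six ports.
            if (port != participantPortBase && port != participantPortBase + 1) {
                listen.add(port);
            }
        }
        return listen;
    }

    public static void main(String[] args) {
        // Example: conference ports 22300-22307, this participant transmits on 22302/22303.
        System.out.println(portsToListenTo(22300, 22302));   // [22300, 22301, 22304, 22305, 22306, 22307]
    }
}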
[Figure 3.9 - Image Observation Averages: byte rate (0 to 35,000) plotted against
sample number (0 to 100), with a series showing someone talking]
The threshold was set at 77% of the running average, a midrange value between the
two averages in the above graph. If the byte rate was less than 77% of the average,
then it was not used in the calculation of the next average. A count was incremented
each time the byte rate fell below the threshold. The call was cut off when the count
reached ten. The code for the image observation is shown in Appendix 2. If the count
increments without the person leaving the conversation, it is unlikely to get anywhere
near ten, and usually drops back to zero after one or two.
3.6.4 Logout
The users can logout at any time, once they are logged in, by selecting the logout
option from the file menu.
Chapter 4
4 Implementation of the System
4.1 Introduction
This chapter aims to give a graphical and textual description of how to use each of the
features within the system. On completing this chapter, the reader should be familiar
with all of the available features and should be able to make full use of these features.
The chapter is written in a step-by-step manner, beginning with the login, including
all possible actions that can be carried out once logged in, and ending with the logoff
process. Images are included to illustrate graphically what is described in the text.
4.2 Logging In
Once the application has been started, the first thing that can be seen is the login
screen. There are username and password fields, which must be correctly completed in
order to login. The username and password must be obtained from the system
administrator in advance of attempting to login. Once the username has been allocated,
it will be the same for every subsequent login. The server IP and server port fields
should not, in normal usage, need to be changed. However, if these fields do need to be
altered, then the IP must be padded, i.e. any of the sections with fewer than three digits
must be padded with preceding zeroes. Figure 4.1 below shows the login screen as it
will be seen when the application is first run. As can be seen, the username and
password fields are empty, and the server IP and port fields are hard coded.
If an invalid username or incorrect password is entered, an error message will pop up
informing the user of the exact reason for the failed login. If for some reason the
server cannot be contacted, an error message will also inform the user of this. If all
information is entered correctly and there are no problems connecting to the server,
the user will be brought to the home screen.
4.3 Calls
4.3.1 Making a Peer to Peer Call
Once the user is correctly logged in, they will see the home screen, which is shown in
Figure 4.2. From here, they can now access all of the screens, except for the login
screen. If at any time, they wish to logout, this can be done from the file menu. It is
important to logout of the system correctly, before exiting.
4.4 Messages
4.4.1 Sending an MMS Message
To send a message, first go to the MMS screen, which is shown below in Figure 4.8.
If a picture is desired, then do not press the send button yet. Press the attach button and
a file chooser will open as follows,
The required file can then be chosen and will be attached to the message. A preview of
the picture can then be seen in the image preview box, in the top right hand corner of
the window. Once all information has been entered correctly and the desired image has
been attached, the message can be sent as described above. The MMS screen, ready to
send a message with an image attached, is shown below in Figure 4.10.
message, the message will be moved from the inbox. If the user selects to open,
(which can also be done by double clicking), then a window will open showing the
content of the message, including the image. This window is shown in Figure 4.12.
4.5.2 Adaption
Another feature that is part of this application is automatic adaption. This will not
really affect the user as such, in that they do not have to do anything and the changes
made will not disrupt calls or messages. Basically, if packet loss (congestion) is
detected, steps are taken to counteract it and the image will therefore be clearer.
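A hedged sketch of one way such adaption could be driven is shown below. It assumes packet loss is read from JMF's per-stream ReceptionStats and that the outgoing Processor exposes a QualityControl; the project's own mechanism, based on RTCP reports, may differ in detail.

// A minimal sketch: lower the encoder quality when new packet loss is seen,
// and creep it back up otherwise.
import javax.media.Processor;
import javax.media.control.QualityControl;
import javax.media.rtp.ReceiveStream;
import javax.media.rtp.ReceptionStats;

public class AdaptionSketch {
    private int lastLost = 0;

    // Called periodically with the stream being monitored and the local encoder.
    public void adapt(ReceiveStream stream, Processor encoder) {
        ReceptionStats stats = stream.getSourceReceptionStats();
        int lost = stats.getPDUlost();

        QualityControl quality =
                (QualityControl) encoder.getControl("javax.media.control.QualityControl");
        if (quality == null) {
            return;   // this encoder does not expose a quality knob
        }

        if (lost > lastLost) {
            // Congestion detected: trade image quality for a lower bit rate.
            quality.setQuality(Math.max(0.1f, quality.getQuality() - 0.1f));
        } else {
            // No new loss: raise the quality again gradually.
            quality.setQuality(Math.min(1.0f, quality.getQuality() + 0.05f));
        }
        lastLost = lost;
    }
}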
Chapter 5

5 Testing

Table 5.1 - Testing Scenarios: Login / Logoff (Number, Scenario Details, Results /
Comments). Scenarios covered invalid logins, logins with an incorrect password, login
from another location and logoff; most scenarios displayed the expected error message,
and results included both passes and failures, one failure noting that code is needed on
the server to detect a destroyed handle.
Table 5.2 - Testing Scenarios: Making a Call (Number, Scenario Details, Results /
Comments). Scenarios covered calls being sent to Videomail, calls to invalid numbers
and call timeouts.
Table 5.3 - Testing Scenarios: Sending a Message (Number, Scenario Details, Results).
Table 5.4 - Testing Scenarios: Conference Call (Number, Scenario Details, Results /
Comments).
Table 5.5 - Other Testing Scenarios (Number, Scenario Details, Results / Comments).
Chapter 6
6 Conclusions and Further Research
6.1 The Benefits of this Project
The obvious benefit of this project is that it provides advanced communication over a
network. It enables users to have video conferences with one or more people, send
and receive messages and send Videomail, all over a simple LAN connection. The
users do not need to be connected to the internet and, once set up, there is no cost for
calls or messages. The advanced image observation and adaption features bring
characteristics to the application that have not been seen in similar applications that
have been developed. The application is coded using the Java programming language.
There are not really any similar applications that have been coded in Java; the majority
use C++. This project provides a good learning tool by showing the extent to which
Java and the Java Media Framework (JMF) can be used for telephony and video
conferencing applications. The application is portable, in that it can be used in
different network situations (although it will perform best on a wired network, it will
also work on a wireless one). In addition, the application can be moved between
different platforms, an inherent benefit of Java.
the play. The likes of the adaption and the image observation could be made optional
at the higher levels.
7 Appendix 1
7.1 Call Setup Request
protected synchronized void processCallSetupRequest(CallObject tempCallObject,
        ServerHandle tempServerHandleSender) {
    MappingObject tempMappingObject;
    ProfileObject tempProfileObject;
    MessageObject tempMessageObject;
    ServerHandle tempServerHandleDestination;
    int tempDestinationID = tempCallObject.getDestinationID();
    int tempSenderID = tempCallObject.getSenderID();

    // Look up the mapping for the dialled number; a null mapping means the number does not exist.
    tempMappingObject = this.server.serverSideStorage.getMapping(tempDestinationID);
    if (tempMappingObject == null) {
        this.updateSystemLog("Call setup failed");
        tempMessageObject = this.compileMessageObject(0000000, tempSenderID, 302,
                "The number you dialled is incorrect, please try again!");
        tempServerHandleSender.sendPullMessage(tempMessageObject);
        return;
    }

    // A status other than 1 means the destination client is not free to take a call.
    if (tempMappingObject.getClientStatus() != 1) {
        this.updateSystemLog("Call setup failed");
        tempMessageObject = this.compileMessageObject(0000000, tempSenderID, 302,
                "The number you dialed cannot be reached at this time, please try again later");
        tempServerHandleSender.sendPullMessage(tempMessageObject);
        return;
    }

    // The destination exists and is available: forward the call request (payload type 300) to it.
    tempProfileObject = tempMappingObject.getClientProfile();
    tempServerHandleDestination = tempMappingObject.getServerHandle();
    tempCallObject.setDestinationInetAddress(tempProfileObject.getInetAddress());
    tempMessageObject = this.compileMessageObject(0000000, tempDestinationID, 300, tempCallObject);
    tempServerHandleDestination.sendPushMessage(tempMessageObject);

    // Wait for the destination's answer and relay it to the caller.
    tempMessageObject = tempServerHandleDestination.receivePushMessage();
    if (tempMessageObject.getPayloadType() == 301) {
        // Call accepted: mark both clients as busy (status 2).
        this.updateSystemLog("Call setup complete");
        this.server.serverSideStorage.updateMapping(tempDestinationID, 2);
        this.server.serverSideStorage.updateMapping(tempSenderID, 2);
        tempServerHandleSender.sendPullMessage(tempMessageObject);
        return;
    } else if (tempMessageObject.getPayloadType() == 302) {
        // Call rejected: simply forward the rejection to the caller.
        this.updateSystemLog("Call setup complete");
        tempServerHandleSender.sendPullMessage(tempMessageObject);
        return;
    }
}
7.2 Login Request

        return;
    } else {
        // Password did not match: send an error pull message back and log the failure.
        tempMessageObject = this.compileMessageObject(0000000, 0000000, 102,
                "ERROR: Password incorrect");
        tempServerHandle.sendPullMessage(tempMessageObject);
        this.updateSystemLog("Login Failed");
        return;
    }
}
            this.server.serverSideStorage.updateMapping(
                    tempConferenceParticipant.getParticipantID(), 1);
            this.updateSystemLog("Participant" + tempConferenceParticipant
                    + " removed from conference");
        } else {
            this.updateSystemLog("Participant removal error");
        }
    }

    // Fetch the participant record for this conference position, mark that client as
    // available again (status 1) and send a type 313 message back through the requesting handle.
    tempConferenceParticipant = this.server.serverSideStorage.getConferenceParticipant(
            tempConferencePosition, null);
    this.server.serverSideStorage.updateMapping(
            tempConferenceParticipant.getParticipantID(), 1);
    tempMessageObject = this.compileMessageObject(0000000,
            tempConferenceParticipant.getParticipantID(), 313, tempConferenceParticipant);
    tempServerHandleSender.sendPullMessage(tempMessageObject);
    }
}
8 Appendix 2
8.1 Image Observation Code
int imageObservationAverage = 0;
this.imageObservationLastSampleTime = System.currentTimeMillis();
System.out.println("Bytes Sent: " + bytesSent);
System.out.println("Byte Rate: " + byteRate);
System.out.println("timeDelay Rate: " + timeDelay);
this.streamTotalBytesSent = byteSentTotal;

// Average the byte rates of the recent samples held in imageObservationArray.
for (int j = 0; j < imageObservationArray.length; j++) {
    imageObservationAverage = imageObservationArray[j] + imageObservationAverage;
}
imageObservationAverage = (int) (imageObservationAverage / imageObservationArray.length);

// The threshold is 77% of the running average (see section 3.5).
int temp = (int) (imageObservationAverage * 0.77);
if (byteRate < temp) {
    // Byte rate has dropped below the threshold: count it, but do not let the
    // low sample drag the running average down.
    imageObservationMarkCount++;
} else {
    // Normal sample: shift the window along, store the new byte rate and reset the count.
    System.arraycopy(imageObservationArray, 1, imageObservationArray, 0,
            imageObservationArray.length - 1);
    imageObservationArray[3] = byteRate;
    imageObservationMarkCount = 0;
}
System.out.println("AV: " + imageObservationAverage + " Mark: "
        + imageObservationMarkCount + " Temp: " + temp);

// Ten consecutive low samples are taken to mean the user has left the shot: end the call.
if (imageObservationMarkCount > 10) {
    this.client.clientSideUtilities.processCallTeardown();
    this.imageObservationMarkCount = 0;
}