Multimedia Communications :

1. Introduction
Institut Sains dan Teknologi Nasional - Jakarta
Reference
Text: Multimedia Communications: Applications,
Networks, Protocols and Standards, Fred
Halsall, Addison-Wesley, 1st edition (2002), ISBN
0-201-39818-4.

What is Multimedia?
01/22/2007

Multimedia is a combination of text, art, sound,
animation, and video.




Slide: Courtesy, Hung Nguyen
Multimedia Description
Introduction to Multimedia 4
Multimedia
is an integration of continuous media (e.g. audio, video) and
discrete media (e.g. text, graphics, images) through which
digital information can be conveyed to the user in an appropriate
way.
Multi
many, much, multiple
Medium
An intervening substance through which something is
transmitted or carried on
Why Multimedia Computing?
Application driven
e.g. medicine, sports, entertainment, education
Information can often be better represented using
audio/video/animation rather than using text, images and
graphics alone.
Information is distributed using computer and
telecommunication networks.
Integration of multiple media places demands on
computation power
storage requirements
networking requirements
Multimedia Information Systems
Technical challenges
Sheer volume of data
Need to manage huge volumes of data
Timing requirements
among components of data computation and communication.
Must work internally with given timing constraints - real-time
performance is required.
Integration requirements
need to process traditional media (text, images) as well as
continuous media (audio/video).
Media are not always independent of each other -
synchronization among the media may be required.
High Data Volume of Multimedia Information
Medium             Parameters                                Data volume
Speech             8000 samples/s                            8 Kbytes/s
CD Audio           44,100 samples/s, 2 bytes/sample          176 Kbytes/s
Satellite Imagery  180 x 180 km^2, 30 m^2 resolution         600 MB/image (60 MB compressed)
NTSC Video         30 fps, 640 x 480 pixels, 3 bytes/pixel   30 Mbytes/s (2-8 Mbits/s compressed)
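The uncompressed rates in the table can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, speech is assumed to use 1 byte per sample, and CD audio is assumed to be stereo (2 channels), which is what the 176 Kbytes/s figure implies.

```python
def audio_rate_bytes_per_s(samples_per_s, bytes_per_sample, channels=1):
    """Raw (uncompressed) audio data rate in bytes per second."""
    return samples_per_s * bytes_per_sample * channels


def video_rate_bytes_per_s(width, height, bytes_per_pixel, frames_per_s):
    """Raw (uncompressed) video data rate in bytes per second."""
    return width * height * bytes_per_pixel * frames_per_s


# Assumption: 1 byte per speech sample.
speech = audio_rate_bytes_per_s(8000, 1)
# Assumption: CD audio carries 2 channels (stereo).
cd_audio = audio_rate_bytes_per_s(44_100, 2, channels=2)
# 640 x 480 pixels, 3 bytes/pixel, 30 frames/s, as in the table.
ntsc = video_rate_bytes_per_s(640, 480, 3, 30)

print(speech, cd_audio, ntsc)
```

The video result is about 27.6 million bytes/s, which the table rounds to 30 Mbytes/s.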

Technology Incentive
Growth in computational capacity
MM workstations with audio/video processing capability
Dramatic increase in CPU processing power
Dedicated compression engines for audio, video etc.
Rise in storage capacity
Large capacity disks (several gigabytes)
Increase in storage bandwidth, e.g. disk array technology
Surge in available network bandwidth
high speed fiber optic networks - gigabit networks
fast packet switching technology
Application Areas
Residential Services
video-on-demand
video phone/conferencing systems
multimedia home shopping (MM catalogs, product demos and
presentation)
self-paced education
Business Services
Corporate training
Desktop MM conferencing, MM e-mail
Application Areas (cont.)
Education
Distance education - MM repository of class videos
Access to digital MM libraries over high speed networks
Science and Technology
computational visualization and prototyping
astronomy, environmental science
Medicine
Diagnosis and treatment - e.g. MM databases that provide
support for queries on scanned images, X-rays, assessments,
response etc.
Classification of Media
Perception Medium
How do humans perceive information in a computer?
Through seeing - text, images, video
Through hearing - music, noise, speech
Representation Medium
How is the computer information encoded?
Using formats for representing information
ASCII(text), JPEG(image), MPEG(video)
Presentation Medium
Through which medium is information delivered by the computer
or introduced into the computer?
Via I/O tools and devices
paper, screen, speakers (output media)
keyboard, mouse, camera, microphone (input media)
Classification of Media (cont.)
Storage Medium
Where will the information be stored?
Storage media - floppy disk, hard disk, tape, CD-ROM etc.
Transmission Medium
Over what medium will the information be transmitted?
Using information carriers that enable continuous data transmission
- networks
wire, coaxial cable, fiber optics
Information Exchange Medium
Which information carrier will be used for information exchange
between different places?
Direct transmission using computer networks
Combined use of storage and transmission media (e.g. electronic
mail).
Media Concepts
Each medium defines
Representation values - determine the information
representation of different media
Continuous representation values (e.g. electro-magnetic waves)
Discrete representation values(e.g. text characters in digital form)
Representation space determines the surrounding where the
media are presented.
Visual representation space (e.g. paper, screen)
Acoustic representation space (e.g. stereo)
Media Concepts (cont.)
Representation dimensions of a representation
space are:
Spatial dimensions:
two dimensional (2D graphics)
three dimensional (holography)
Temporal dimensions:
Time independent (document) - Discrete media
Information consists of a sequence of individual elements without a
time component.
Time dependent (movie) - Continuous media
Information is expressed not only by its individual value but also by
its time of occurrence.
Multimedia Systems
Qualitative and quantitative evaluation of multimedia
systems
Combination of media
continuous and discrete.
Levels of media-independence
some media types (audio/video) may be tightly coupled, others
may not.
Computer supported integration
timing, spatial and semantic synchronization
Communication capability
Data Streams
Distributed multimedia communication systems
data of discrete and continuous media are broken into individual
units (packets) and transmitted.
Data Stream
sequence of individual packets that are transmitted in a
time-dependent fashion.
Transmission of information carrying different media leads to
data streams with varying features
Asynchronous
Synchronous
Isochronous
Data Stream Characteristics
Asynchronous transmission mode
provides for communication with no time restriction
Packets reach receiver as quickly as possible, e.g. protocols for
email transmission
Synchronous transmission mode
defines a maximum end-to-end delay for each packet of a data
stream.
May require intermediate storage
E.g. audio connection established over a network.
Isochronous transmission mode
defines a maximum and a minimum end-to-end delay for each
packet of a data stream. Delay jitter of individual packets is
bounded.
E.g. transmission of video over a network.
Intermediate storage requirements reduced.
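The delay bounds that separate the synchronous and isochronous modes can be written as simple checks on measured per-packet end-to-end delays. This is a minimal sketch; the function names and the millisecond values are hypothetical, not taken from any standard.

```python
def meets_synchronous(delays_ms, max_delay_ms):
    """Synchronous mode: every packet arrives within the maximum delay."""
    return all(d <= max_delay_ms for d in delays_ms)


def meets_isochronous(delays_ms, min_delay_ms, max_delay_ms):
    """Isochronous mode: every packet delay lies between a minimum and a maximum."""
    return all(min_delay_ms <= d <= max_delay_ms for d in delays_ms)


def delay_jitter(delays_ms):
    """Spread between the slowest and fastest packet; bounded in isochronous mode."""
    return max(delays_ms) - min(delays_ms)


# Hypothetical measured delays for three packets of one stream.
delays = [40, 55, 48]
print(meets_synchronous(delays, max_delay_ms=60))
print(meets_isochronous(delays, min_delay_ms=45, max_delay_ms=60))
print(delay_jitter(delays))
```

Asynchronous mode imposes no such bound, so there is nothing to check for it.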
Data Stream Characteristics (cont.)
Data Stream characteristics for continuous media can be
based on
Time intervals between complete transmission of consecutive
packets
Strongly periodic data streams - constant time interval
Weakly periodic data streams - periodic function with finite period.
Aperiodic data streams
Data size - amount of data in consecutive packets
Strongly regular data streams - constant amount of data
Weakly regular data streams - varies periodically with time
Irregular data streams
Continuity
Continuous data streams
Discrete data streams
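The strong / weak / none distinction can be sketched as a test on a list of measurements: inter-packet time intervals for periodicity, or packet sizes for regularity. The function name and the tolerance are assumptions, and "weak" is interpreted here as a pattern that repeats with a finite period within the observed window.

```python
def classify_sequence(values, tol=1e-9):
    """Return 'strong' if all values are equal, 'weak' if the values
    repeat with a finite period, and 'none' otherwise."""
    n = len(values)
    if all(abs(v - values[0]) <= tol for v in values):
        return "strong"
    # Try every candidate period that fits at least twice in the window.
    for period in range(2, n // 2 + 1):
        if all(abs(values[i] - values[i - period]) <= tol for i in range(period, n)):
            return "weak"
    return "none"


# Time intervals between consecutive packets (hypothetical values).
print(classify_sequence([20, 20, 20, 20]))        # strongly periodic
print(classify_sequence([10, 30, 20, 10, 30, 20]))  # weakly periodic
print(classify_sequence([10, 20, 50, 30]))        # aperiodic
```

Passing packet sizes instead of intervals gives the strongly regular, weakly regular and irregular classes in the same way.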
Classification based on time intervals
[Figure: timelines of packet arrivals for a strongly periodic, a weakly periodic and an aperiodic data stream; intervals are labeled T, T1, T2, T3]
Classification based on packet size
[Figure: packet size plotted against time t for a strongly regular, a weakly regular and an irregular data stream; packet sizes are labeled D1, D2, D3, ..., Dn and the period T]
Classification based on continuity
[Figure: a continuous data stream and a discrete data stream, each shown as packets D1, D2, D3, D4]
Logical Data Units
Continuous media consist of a time-dependent sequence
of individual information units called Logical Data Units
(LDU).
a symphony consists of independent movements
a movement consists of notes
notes are sequences of samples
Granularity of LDUs
symphony, movement, individual notes, grouped samples, individual
samples
film, clip, frame, raster, pixel
Duration of LDU:
open LDU - duration not known in advance
closed LDU - predefined duration
Granularity of Logical Data Units
[Figure: LDU granularity from coarse to fine: Film, Clip, Frame, Blocks, Pixels]
Multimedia Components Simplified
Multimedia can be viewed as the combination of audio, video, data
and how they interact with the user (more than the sum of the
individual components)
[Figure: diagram relating Audio, Video and Data to Multimedia]
Background
Fast paced emergence in applications in medicine,
education, travel etc
Characterized by large documents that must be
communicated with short delays
Glamorous applications such as distance learning,
video teleconferencing
Applications that are enhanced by video are often
seen as drivers for the development of multimedia
networks
Forces Driving Communications That
Facilitate Multimedia Communications
Evolution of communications and data networks
Increasing availability of almost unlimited bandwidth
on demand
Availability of ubiquitous access to the network
Ever increasing amount of memory and
computational power
Sophisticated terminals
Digitization of virtually everything
New Information System Paradigm
[Figure: Integration of Multimedia Integrated Communication (Broadband Link) and Multimedia Processing (Workstation, PC)]
Elements of Multimedia Systems
Two key communication modes
Person-to-person
Person-to-machine
[Figure: the two modes as chains of blocks. Person-to-person: User Interface, Transport, User Interface. Person-to-machine: User Interface, Transport, Processing, Storage and Retrieval]
Multimedia Networks
The world has been wrapped in copper and glass
fiber and can be viewed as a hair ball with physical,
wireless and satellite entry/exit points.
Physical: LAN-WAN connections
Wireless: Cellular telephony, wireless PC
connectivity
Satellite: INMARSAT, Thuraya, ACeS etc
Multimedia Communication Model
Partitioning of information objects into distinct types,
e.g., text, audio, video
Standardization of service components per
information type
Creation of platforms at two levels: network service
and multimedia communication
Define general applications for multiple use in
various multimedia environments
Define specific applications, e.g. e-commerce, tele-
training, using building blocks from platform and
general applications
Requirements
User Requirements
Fast preparation and presentation
Dynamic control of multimedia applications
Intelligent support to users
Standardization
Network Requirements
High speed and variable bit rates
Multiple virtual connections using the same access
Synchronization of different information types
Suitable standardized services along with support
Network Requirements
ATM-BISDN and SS7 have enabled the switching
based communications capabilities over the
PSTN that support the necessary services
ATM-BISDN-SS7 will evolve to all optical
switchless networks based on packet transfer
Packet Transfer Concept
Allows voice, video and data to be dealt with in a
common format
More flexible than circuit switching which it can
emulate while allowing the multiplexing of varied
bit rate data streams
Dynamic allocation of bandwidth
Handle Variable Bit Rate (VBR) directly
Considerations
Buffering required for constant bit rate data such as
audio
Re-sequencing and recovery capabilities must be
provided over networks where packets may be
received either in an order different from that
transmitted or dropped
In an ATM network some packets can be dropped while
others may not (i.e. voice vs bank transfer data packets)
Optimum packet lengths for voice video and data differ in
an ATM network
IP packets over the internet may arrive in a different order
or be dropped.
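Re-sequencing and loss detection at the receiver can be sketched with per-packet sequence numbers. This is an illustration only: the `(sequence_number, payload)` packet layout and the function name are assumptions, and a real receiver would also bound how long it waits for late packets.

```python
def resequence(packets):
    """Reorder packets by sequence number and report gaps.

    packets: iterable of (sequence_number, payload) in arrival order.
    Returns (payloads in sequence order, list of missing sequence numbers).
    """
    received = dict(packets)
    if not received:
        return [], []
    first, last = min(received), max(received)
    ordered = [received[s] for s in range(first, last + 1) if s in received]
    missing = [s for s in range(first, last + 1) if s not in received]
    return ordered, missing


# Packets 1..4 were sent; 2 arrived before 1 and 3 was dropped.
ordered, missing = resequence([(2, "b"), (1, "a"), (4, "d")])
print(ordered, missing)
```

Whether a missing packet is retransmitted or concealed depends on the media type, as the voice versus bank transfer example above indicates.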
Digital Video Signal Transport
[Figure: block diagram of the transport chain from Video to Users. Labels in order: Encoder (Transformation, Quantization, Entropy Coding, Bit-Rate Control); Application (Data Structuring); Network (Multiplexing/Routing, Overhead (FEC), Re-Trans); Error detection, Loss detection, Error correction, Erasure correction; Application (Re-Synch); Decoder (De-quantization, Entropy decode, Inv Trans, Loss conceal, Post process)]
The following figure will be examined over the course of the semester
Quality of Service (QoS)
The set of parameters that defines the properties
of media streams
Can define four QoS layers:
1. User QoS: Perception of the multimedia data at the
user interface (qualitative)
2. Application QoS: Parameters such as end-to-end delay
(quantitative)
3. System QoS: Requirements on the communications
services derived from the application QoS
4. Network QoS: Parameters such as network load and
performance
Applications of Multimedia
Business - Business applications for multimedia
include presentations, training, marketing,
advertising, product demos, databases, catalogues,
instant messaging, and networked communication.

Schools - Educational software can be developed to
enrich the learning process.
Applications of Multimedia (cont.)
Home - Most multimedia projects reach the homes
via television sets or monitors with built-in user
inputs.

Public places - Multimedia will become available at
stand-alone terminals or kiosks to provide
information and help.
Compact Disc Read-Only Memory (CD-ROM)
CD-ROM is the most cost-effective distribution
medium for multimedia projects.
It can contain up to 80 minutes of full-screen video or
sound.
CD burners are used to read discs and to write
audio, video, and data to recordable
discs.
Digital Versatile Disc (DVD)
Multilayered DVD technology increases the capacity
of current optical technology to 18 GB.
DVD authoring and integration software is used to
create interactive front-end menus for films and
games.
DVD burners are used to read discs and to write
audio, video, and data to recordable discs.
Multimedia Communications
Multimedia communications is the delivery of
multimedia to the user by electronic or digitally
manipulated means.

[Figure: Multimedia Communications comprises Audio Communications (telephony, sound, broadcast), Video Communications (video telephony, TV/HDTV) and Data, text, image Communications (data transfer, fax)]
[Figure: Multimedia Terms]
[Figure: Alternative Types of Media used in Multimedia Applications]
[Figure: Multimedia Communications Networks]
[Figure: Multimedia Networks and Their Services]
[Figure: Multimedia Networks and Their Services (cont.)]
Audio-Visual Integration
Application in Biometrics Bimodal Person
Verification

Existing methods for person verification are mainly
based on a single modality, which limits their
security and robustness

Audio-visual integration using a camera and
microphone makes person verification more
reliable
Joint Audio-Video Coding
Correlation between audio and video can be used to
achieve more efficient coding
Predictive coding of audio and video information used to
construct estimate of current frame (cross-modal
redundancy)
Difference between original and estimated signal can be
transmitted as parameters
Decision on what and how to send is based on Rate
Distortion (R-D) criteria
Reconstruction done at receiver according to
agreed-upon decoding rules
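A rate-distortion decision of this kind is commonly expressed as minimizing a Lagrangian cost J = D + lambda * R over the candidate options. The sketch below assumes that formulation; the option names, rates and distortions are invented for illustration and are not the values used by any particular coder.

```python
def choose_option(options, lam):
    """Pick the option with the lowest cost J = distortion + lam * rate.

    options: list of (name, rate_in_bits, distortion) tuples.
    lam: Lagrange multiplier trading rate against distortion.
    """
    return min(options, key=lambda o: o[2] + lam * o[1])[0]


# Hypothetical candidates for one frame.
candidates = [
    ("send nothing", 0, 9.0),     # rely entirely on the cross-modal prediction
    ("send parameter", 8, 4.0),   # send a coarse correction
    ("send residual", 64, 0.5),   # send the full prediction error
]

print(choose_option(candidates, lam=0.01))
print(choose_option(candidates, lam=0.1))
print(choose_option(candidates, lam=1.0))
```

A small lambda favors quality (more bits), while a large lambda favors sending nothing when the prediction is good enough.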
Cross-Modal Predictive Coding
[Figure: block diagram with Visual Analysis, A-to-V Mapping and a Decision Module (R-D); outputs are labeled Parameter X, X (further symbols lost in extraction) and Nothing]
Importance of Interaction
Multimedia is more than the combination
of text, audio, video and data
Interaction among media is important
Consider a poorly dubbed movie
Audio not synchronized with video
Lip movements inconsistent with language
Audio dynamic range inconsistent with the
scene
Media Interaction
[Figure: Process and Model diagram linking Audio, Text, Image and Video within Multimedia. Interactions shown: lip synch, face animation, joint A/V coding, compression, synthesis, 3D sound, sign language, lip reading, speech recognition, text-to-speech, graphics, database indexing/retrieval, translation, natural language]
Bimodality of Human Speech
Human speech is produced by vibration of the vocal
cords and by configuration of the vocal tract, with muscles
that also generate facial expressions
Audio   Visual   Perceived
ba      ga       da
pa      ga       ta
ma      ga       na
Basic Definitions
The basic unit of acoustic speech is called a
phoneme
In the visual domain, the basic unit of mouth
movement is called viseme
A viseme is the smallest visibly distinguishable unit of
speech
Can contain several phonemes and thus form one viseme
group
A many-to-one mapping between phonemes and visemes
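The many-to-one mapping can be pictured as a look-up table. The grouping below is a small illustrative subset with hypothetical group names: it relies only on the fact that bilabial sounds (p, b, m) look alike on the lips, as do labiodental sounds (f, v). It is not a complete or standard viseme inventory.

```python
# Hypothetical, partial phoneme-to-viseme table (illustration only).
PHONEME_TO_VISEME = {
    "p": "bilabial",
    "b": "bilabial",
    "m": "bilabial",
    "f": "labiodental",
    "v": "labiodental",
}


def visemes_for(phonemes):
    """Map a phoneme sequence to viseme groups; unknown phonemes are flagged."""
    return [PHONEME_TO_VISEME.get(p, "unknown") for p in phonemes]


print(visemes_for(["b", "a", "m"]))
# Several phonemes share one viseme, so the mapping cannot be inverted.
print(len(PHONEME_TO_VISEME), len(set(PHONEME_TO_VISEME.values())))
```

Because the table is many-to-one, lip shape alone cannot distinguish "p" from "b" or "m".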
Lip Reading System
Application to support hearing-impaired persons
People learn to understand spoken language by
combining visual content with lexical, syntactic,
semantic and pragmatic information
Automated lip reading systems
Speech recognition possible using only visual information
Integrated with speech recognition systems to improve
accuracy
Lip Synchronization
Applications
In VTC (video teleconferencing), where video frames are
dropped (low bandwidth requirement) but audio must still
be continuous
In non-real-time use such as dubbing in a studio, where
the recorded voice is full of background noise
Time-warping commonly used in both audio and
video modes
Time-frequency analysis
Video time-warping could be used for VTC
Audio time-warping could be used for dubbing
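One widely used form of time-warping is dynamic time warping (DTW), which aligns two sequences that run at different speeds. The slides do not say which variant is meant, so the sketch below is only an example of the general idea, applied to one-dimensional feature sequences with an absolute-difference cost.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# The second sequence holds one value for longer; warping absorbs the difference.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(dtw_distance([0, 0], [1, 1]))
```

For lip synchronization the sequences would be audio or mouth-shape features per frame rather than the toy numbers used here.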
Lip Tracking

To prevent too much jerkiness in the motion rendering
and too much loss in lip synchronization
Involves real-time analysis of the video signal over three
dimensions plus one temporal dimension
Produces meaningful parameters
Classification of mouth images into visemes
Measures of dimension, e.g. mouth widths and heights
Analysis tools: Fourier Transform, Karhunen-Loeve
Transform (KLT), Probability Density Function (pdf)
Estimation
Audio-to-Visual Mapping for Lip Tracking
Conversion of acoustic speech to mouth shape
parameters
A mapping of phonemes to visemes
Could be most precisely implemented with a complete
speech recognizer followed by a look-up table
High computational overhead plus table look-up complexity
Do not need to recognize spoken word to achieve audio-to-visual
mapping
Physical relationships exist between vocal tract shape
and the sound produced; functional relationships exist
between speech and visual parameters
Classification-Based Conversion Approaches
for Lip Tracking
Two-step process
Classification of acoustic signal using VQ (vector
quantization), HMM (hidden Markov model) and NN
(neural network)
Mapping of the acoustic classes into corresponding
visual outputs, then averaged to get centroid
Shortcomings
Error resulting from averaging visual vectors to get the visual
centroid
Not a continuous mapping: finite output levels
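The two-step process can be sketched with a nearest-codeword classifier standing in for the VQ step. The function names, the tiny codebook and the training pairs are hypothetical; a real system would learn the codebook from data and could use an HMM or neural network as the classifier instead.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], vec)),
    )


def train_centroids(codebook, pairs):
    """Step 2: average the visual vectors of each acoustic class into a centroid."""
    sums, counts = {}, {}
    for acoustic, visual in pairs:
        k = nearest(codebook, acoustic)          # step 1: classify the acoustic vector
        acc = sums.setdefault(k, [0.0] * len(visual))
        for i, x in enumerate(visual):
            acc[i] += x
        counts[k] = counts.get(k, 0) + 1
    return {k: tuple(s / counts[k] for s in sums[k]) for k in sums}


def acoustic_to_visual(codebook, centroids, acoustic):
    """Map an acoustic vector to the visual centroid of its class."""
    return centroids[nearest(codebook, acoustic)]


codebook = [(0.0, 0.0), (10.0, 10.0)]            # two acoustic classes
pairs = [((0, 1), (1.0, 2.0)), ((1, 0), (3.0, 4.0)), ((9, 9), (10.0, 10.0))]
centroids = train_centroids(codebook, pairs)
print(acoustic_to_visual(codebook, centroids, (0.5, 0.5)))
```

Every acoustic input in a class yields the same centroid, which makes both shortcomings visible: the output is an average rather than the true mouth shape, and only as many output levels exist as there are classes.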
Classification-Based Conversion
[Figure: classes in Phoneme Space mapped to a Centroid in Viseme Space]
Audio and Visual Integration for Lip
Reading Applications
Three major steps
Audio-visual pre-processing: Principal Component
Analysis (PCA) has been used for feature extraction
Pattern recognition strategy (HMM, NN, time-warping)
Integration strategy (decision making)
Heuristic rules to incorporate knowledge of phonemes about the
two modalities
Combination of independent evaluation scores for each
modality
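Combining independent evaluation scores can be sketched as a weighted sum per candidate class, followed by picking the best class. The function name, the weighting scheme and the scores below are assumptions chosen for illustration; the slides do not specify a particular combination rule.

```python
def fuse_scores(audio_scores, visual_scores, audio_weight=0.5):
    """Weighted sum of per-class scores from the two modalities.

    Returns (best class, fused score per class). Only classes scored by
    both modalities are considered.
    """
    classes = audio_scores.keys() & visual_scores.keys()
    fused = {
        c: audio_weight * audio_scores[c] + (1.0 - audio_weight) * visual_scores[c]
        for c in classes
    }
    return max(fused, key=fused.get), fused


# Hypothetical recognizer outputs for one utterance.
audio = {"ba": 0.6, "ga": 0.4}
visual = {"ba": 0.1, "ga": 0.9}

print(fuse_scores(audio, visual, audio_weight=0.5)[0])
print(fuse_scores(audio, visual, audio_weight=0.9)[0])
```

Lowering `audio_weight` when the audio is noisy is one simple way to encode a heuristic rule about the reliability of each modality.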